D6.1 – State-of-the-art DEEP Learning (DL), Neural Network (NN) and Machine Learning (ML) frameworks and libraries

This document provides an overview of the state of the art in Deep Learning (DL), Neural Network (NN) and Machine Learning (ML) frameworks and libraries to be used as building blocks in the DEEP Open Catalogue. The initial state of the catalogue will be built on the outcome of this document and on the initial user community requirements for scientific data analytics and ML/DL tools coming from WP2.

DEEP-JRA3-D6.1

Scientific Publications

New publication: “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey”

We are thrilled to announce that we have published a new paper entitled “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey” in the Springer journal Artificial Intelligence Review.

The paper, which is published as Open Access and can be downloaded via its DOI (10.1007/s10462-018-09679-z), is authored by Giang Nguyen, Stefan Dlugolinsky, Martin Bobák, Viet Tran, Álvaro López García, Ignacio Heredia, Peter Malík and Ladislav Hluchý, from the Institute of Informatics of the Slovak Academy of Sciences (IISAS) and the Institute of Physics of Cantabria (IFCA – CSIC – UC).

Abstract: The combined impact of new computing resources and techniques with an increasing avalanche of large datasets, is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In the recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real world examples in a disparate formats. While the number of Machine Learning algorithms is extensive and growing, their implementations through frameworks and libraries is also extensive and growing too. The software development in this field is fast paced with a large number of open-source software coming from the academy, industry, start-ups or wider open-source communities. This survey presents a recent time-slide comprehensive overview with comparisons as well as trends in development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.

Deliverables

D4.2 – First implementation of software platform for accessing accelerators and HPC

This deliverable describes the first implementation of the software platform for accessing accelerators and HPC. The list of components included in the software platform is based on the analysis provided by Deliverable D4.1. This document provides detailed descriptions of the software components used in the platform, the work done on each component and its current status. An evaluation of the results achieved and the implementation plan for the next periods are also included.

http://hdl.handle.net/10261/168086

D5.2 – High Level Hybrid Cloud solutions prototype

This document complements deliverable D5.1 (Definition of the Architecture of the Hybrid Cloud) with the specific prototype developments carried out to support the deployment of hybrid infrastructures across multiple IaaS Cloud sites. The document describes the technical challenges, the evolution of the components to support this prototype and an implementation roadmap towards the final release of the High Level Hybrid Cloud solutions.

http://hdl.handle.net/10261/168087

D6.3 – First prototype of the DEEP as a Service

This document provides an updated description of the prototype implementation of the DEEP as a Service solution that is being developed within Work Package 6 (WP6) of the DEEP-Hybrid-DataCloud project. It gives an overview of the state of the art of the relevant components and technologies, a technology readiness level assessment with regard to the required functionality, the required interactions with other work packages in the project, and the detailed work plan and risk assessment for each of the activities.

http://hdl.handle.net/10261/168088

D3.2 – Pilot testbed and integration architecture with EOSC large scale infrastructures

The deliverable contains the plan, design, architecture and deployment of the Pilot Preview testbed, based on technical requirements and descriptions from the WP2 use cases. The services and components developed during the DEEP-HybridDataCloud project by the WP4, WP5 and WP6 teams are deployed, tested and validated by end users in this testbed. The deliverable also describes how the Pilot Preview testbed services and components will be integrated with the EOSC production infrastructure and other external resource providers.

http://digital.csic.es/handle/10261/168084

News & Events

DEEP-Genesis: first software release is out

The first DEEP-HybridDataCloud software release is out!

The DEEP-HybridDataCloud project is pleased to announce the availability of its first public software release, codenamed DEEP Genesis. The release notes can be found here.

This release comes after an initial phase of requirements gathering which involved several research communities in areas as diverse as citizen science, computing security, physics, earth observation, and biological and medical science. This resulted in the development of a set of software components under the common label of DEEP as a Service (DEEPaaS), enabling the easy development and integration of applications requiring cutting-edge techniques such as artificial intelligence (deep learning and machine learning), data mining or analysis of massive online data streams. These components are now released as a consistent and modular suite, with the aim of being integrated into the EOSC ecosystem.

DEEP Genesis provides open source modules that allow users from research communities to easily develop, build and deploy complex models as a service on their local laptop, on a production server or on top of e-Infrastructures supporting the DEEP-Hybrid-DataCloud stack.

High-level modules covering three types of users:

Basic users can browse and download already built-in models and reuse them for training on their own data.
Intermediate users can retrain the available models to perform the same tasks, fine-tuning them on their own data.
Advanced users can develop their own deep learning tools from scratch and easily deploy them within the DEEP infrastructure.

All models can be exposed through a user-friendly front end, and a RESTful API (the DEEPaaS API) allows easy integration within larger scientific tools or mobile applications. More details can be found here.
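
To illustrate what exposing a model behind a RESTful API can look like, here is a minimal sketch using only the Python standard library. It is not the DEEPaaS API: the routes (/models, /models/<name>/predict), the "word-length" model and the helper names are all hypothetical and chosen only for the example. A small server lists the registered models and runs one on a JSON payload, and a client calls it the way a larger tool or mobile application might.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical model registry. The model name and all routes below are
# illustrative assumptions, not the actual DEEPaaS API.
MODELS = {"word-length": lambda text: {"length": len(text)}}

# Talk to localhost directly, ignoring any proxy settings in the environment.
_opener = build_opener(ProxyHandler({}))


class ModelHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET /models lists the available models.
        if self.path == "/models":
            self._send_json(200, {"models": sorted(MODELS)})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        # POST /models/<name>/predict runs one model on a JSON payload.
        parts = self.path.strip("/").split("/")
        known = (len(parts) == 3 and parts[0] == "models"
                 and parts[2] == "predict" and parts[1] in MODELS)
        if not known:
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length))
        self._send_json(200, MODELS[parts[1]](data["text"]))

    def log_message(self, *args):
        pass  # keep the example quiet


def call(url, payload=None):
    """Send a GET (no payload) or a JSON POST and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = Request(url, data=data, headers={"Content-Type": "application/json"})
    with _opener.open(req) as resp:
        return json.load(resp)


def demo():
    server = HTTPServer(("127.0.0.1", 0), ModelHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        listing = call(f"{base}/models")
        prediction = call(f"{base}/models/word-length/predict", {"text": "deep"})
    finally:
        server.shutdown()
        server.server_close()
    return listing, prediction
```

Calling demo() starts the server on a free local port, queries both routes and shuts the server down again. Because the client only needs HTTP and JSON, the same pattern applies to any language or platform that integrates such a service.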

Key components of the release:

The DEEP Open Catalogue, where individual users and communities can browse, store and download relevant modules for building up their applications (such as ready-to-use machine learning frameworks, tutorial notebooks, complex application architectures, etc.).
A runtime engine able to supply the required computing resources and deploy related applications.
The DEEP PaaS layer, which coordinates the overall workflow execution to select the appropriate resources (cloud, HPC, HTC and others) and manages the deployment of the applications to be executed.
The DEEP as a Service solution offering the application functionality to the user.

All the DEEP components are integrated into a comprehensive and flexible architecture that can be deployed and used according to user requirements. The DEEP Authentication and Authorization approach follows the AARC blueprint. It supports user authentication through multiple methods via INDIGO IAM (SAML, OpenID Connect and X.509), distributed authorization policies, and a Token Translation Service that creates credentials for services that do not natively support OpenID Connect.

The DEEP-HybridDataCloud software is built upon the INDIGO-DataCloud components and is released under the Apache License Version 2.0 (approved by the Open Source Initiative), except for the modifications contributed to existing projects, where the corresponding open source license has been used. The services can be deployed on both public and private cloud infrastructures. The installation and configuration guides, together with the documentation, can be consulted here. The initial set of ready-to-use models from a variety of domains can be found in the DEEP Open Catalogue.

Get in touch

If you want more information about the scientific applications adopting the DEEP-Hybrid-DataCloud solutions, or you want to adopt them for your own application, please contact us!