DEEP-Hybrid-DataCloud announces the availability of the second software release and platform

  • New platform release allows data scientists and machine learning practitioners to build, develop, train and deploy machine learning services easily, with a comprehensive set of tools and services covering the whole machine learning application lifecycle.
  • The new DEEP as a Service allows machine learning models to be deployed as services, following a serverless approach with horizontal scalability.
  • The redesigned marketplace makes it easy to train existing modules.

It is a pleasure to announce that the DEEP-HybridDataCloud project has published its second software release and platform, codenamed DEEP Rosetta. DEEP Rosetta expands on the first version of software generated by the project, called DEEP Genesis, enlarging its functionality to cover the whole machine learning cycle, enhancing the stability of the different components and adding new features. Developing, training, sharing and deploying your model has never been easier!

All the changes in this new release are oriented towards the common project goal of easing the path for scientific communities to develop, build and deploy complex models as a service on their local laptop, on a production server or on top of e-Infrastructures supporting the DEEP-Hybrid-DataCloud stack.

As in the previous release, the DEEP components are integrated into a comprehensive and flexible architecture that can be deployed and exploited according to user requirements.

Comprehensive set of services for machine learning

  • The DEEP training facility, accessible through the DEEP training dashboard, allows data scientists to develop and train their models, with access to the latest generation of EU computing e-Infrastructures.
  • DEEP as a Service is a fully managed service that allows developed applications to be deployed easily and automatically as services, with horizontal scalability thanks to a serverless approach. The pre-trained applications that are published in the catalog are automatically deployed as services to make them available for general use.
  • The DEEP Open Catalog and marketplace comprise a curated set of applications ready to use or extend, fostering knowledge exchange and reusability of applications. This open exchange aims to serve as a central knowledge hub for machine learning applications that leverage the DEEP-Hybrid-DataCloud stack, breaking knowledge barriers across distributed teams. Moreover, pre-configured Docker containers, repository templates and other related components and tools are also part of this catalog.
  • The DEEPaaS API enables data scientists to expose their applications through an HTTP endpoint, delivering a common interface for machine learning, deep learning and artificial intelligence applications.
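As a rough illustration of the idea of exposing a model through an HTTP endpoint, the sketch below wraps a placeholder predict function in a single JSON route. The route, payload shape and handler are assumptions made for this example, not the actual DEEPaaS API interface.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(inputs):
    """Placeholder model: returns the number of input items.
    A real DEEP module would run actual inference here."""
    return {"n_inputs": len(inputs)}

class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict, accepting and returning JSON."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        result = predict(payload.get("inputs", []))
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve locally (blocks until interrupted):
#   HTTPServer(("localhost", 8080), PredictHandler).serve_forever()
```

Because the model sits behind a plain HTTP interface, any client that speaks JSON over HTTP can consume it, which is what makes a common endpoint convention useful across otherwise unrelated applications.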

The platform has been extended to support asynchronous training, allowing users to launch, monitor, stop and delete trainings directly from the web browser. The trained models used to perform inference can now be chosen from the models available in the training history. The documentation on these new features has been updated accordingly here.
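The asynchronous training lifecycle (launch, monitor, stop, delete, with a browsable history) can be sketched in miniature as an in-memory manager. The class and method names below are illustrative assumptions for this example, not the platform's actual API.

```python
import uuid

class TrainingManager:
    """In-memory sketch of an asynchronous training lifecycle:
    launch a training, poll its status, stop it, delete it."""

    def __init__(self):
        self._trainings = {}

    def launch(self, module, config):
        # Launching is non-blocking: the caller gets an id back
        # immediately and can query progress later.
        tid = str(uuid.uuid4())
        self._trainings[tid] = {"module": module,
                                "config": config,
                                "status": "running"}
        return tid

    def status(self, tid):
        return self._trainings[tid]["status"]

    def stop(self, tid):
        self._trainings[tid]["status"] = "stopped"

    def delete(self, tid):
        del self._trainings[tid]

    def history(self):
        # Models for inference can be picked from this history.
        return list(self._trainings)
```

The key design point is that launching returns a handle rather than blocking until training finishes, which is what lets a web dashboard drive long-running jobs.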

In addition, the user-friendly training dashboard now allows modules to be deployed easily and transparently in a cloud environment. From the dashboard, the user can choose the resources needed for the deployment: the amount of memory, the type of processing unit (CPU or GPU), the storage client to be used, or even a manual configuration of the scheduling.
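As an illustration of the kind of resource selection described above, the helper below collects those choices into a plain dictionary. Every field name and default value here is a hypothetical stand-in for the actual dashboard form, chosen only to show the shape of such a request.

```python
def deployment_options(memory_mb=8192, processing_unit="GPU",
                       storage_client="rclone", scheduling=None):
    """Assemble the resource choices for a module deployment.
    All field names are hypothetical, for illustration only."""
    if processing_unit not in ("CPU", "GPU"):
        raise ValueError("processing unit must be 'CPU' or 'GPU'")
    opts = {
        "memory_mb": memory_mb,
        "processing_unit": processing_unit,
        "storage_client": storage_client,
    }
    if scheduling is not None:
        # Optional manual scheduling override; omitted by default
        # so the platform can place the deployment automatically.
        opts["scheduling"] = scheduling
    return opts
```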

DEEP training dashboard screenshot
The DEEP training dashboard makes it easy to train any existing module, or your own.

The DEEP Open Catalogue has been renewed with a more appealing design, improving the organisation of the modules and the general site interactivity. 

The DEEP Rosetta release consists of:

  • 10 products distributed via 22 software packages and tarballs, supporting the CentOS 7, Ubuntu 16.04 and Ubuntu 18.04 operating systems.
  • 15 fully containerised, ready-to-use models from a variety of domains, available at the DEEP Open Catalogue.

The release notes can be found here. The full list of products, together with the installation and configuration guides and documentation, can be consulted here.

The EOSC ecosystem

The release components have been built into a consistent and modular suite, always with the aim of being integrated into the EOSC ecosystem. As part of our integration path into the EOSC, we have published four different services and components in the EOSC portal:

  • DEEPaaS training facility: Tools for building, training, testing and evaluating Machine Learning, Artificial Intelligence and Deep Learning models over distributed e-Infrastructures leveraging GPU resources. Models can be built from scratch or from existing and pre-trained models (transfer learning or model reuse).
  • Application composition tool: An easy and user-focused way of building complex application topologies based on the TOSCA open standard, which can be deployed on any e-Infrastructure using the orchestration services.
  • Infrastructure Manager: An open-source service that deploys complex and customised virtual infrastructures on multiple back-ends.
  • PaaS Orchestrator: A service that coordinates the provisioning of virtualised compute and storage resources on distributed cloud infrastructures and the deployment of Dockerised services and jobs on Mesos clusters.

Collaboration with Industry

The EOSC Digital Innovation Hub (DIH) is in charge of establishing collaborations between private companies and the public sector to provide access to technological services, research data and human capital. The DEEP project and the DIH have established a collaboration in which DEEP provides services and the DIH seeks SMEs to conduct application pilots. The services included in this agreement are:

  • ML application porting to EOSC technological infrastructure
  • ML implementation best practices
  • AI-enabled services prototyping in EOSC landscape

Get in touch

If you want to get more information about the scientific applications adopting the DEEP-Hybrid-DataCloud solutions, please contact us!

Paving the path towards the implementation of the EOSC ecosystem

The European Open Science Cloud (EOSC) aims to be Europe’s virtual environment for all researchers to store, manage, analyse and re-use data for research, innovation and educational purposes.

INDIGO-DataCloud (INDIGO) and its two follow-up projects, DEEP-HybridDataCloud (DEEP) and eXtreme-DataCloud (XDC), are considered to be among the key contributors to the actual implementation of the EOSC.

Funded by the EC, the projects aim to develop cloud-oriented, scalable technologies capable of operating at the unprecedented scale required by the most demanding, data-intensive research experiments in Europe and worldwide.

Thanks to the service components developed by INDIGO, researchers in Europe are now using public and private cloud resources to handle the large volumes of data that enable new research and innovation across different scientific disciplines and research fields. In particular, many INDIGO services are included – or in the process of being included – in the unified service catalogue provided by the EOSC-hub project, which endeavours to put in place the basic layout for the European Open Science Cloud.

Built upon the already existing INDIGO service components, DEEP released Genesis, a set of software components enabling the easy development and integration of applications requiring cutting-edge techniques such as artificial intelligence (deep learning and machine learning), data mining and the analysis of massive online data streams. These components are now available as a consistent and modular suite, ready to be included in the service catalogue.

The INDIGO service components are also the building blocks of Pulsar, the first XDC release, featuring new or improved functionalities that cover important topics such as the federation of storage resources, smart caching solutions, policy-driven data management based on Quality of Service, data lifecycle management, metadata handling and manipulation, and optimised data management based on access patterns. Some of these are ready to be included in the service catalogue.

Suitable to run on both the already existing and the next-generation e-Infrastructures deployed in Europe, the INDIGO, DEEP and XDC solutions have been implemented following a community-driven approach, addressing requirements from research communities belonging to a wide range of scientific domains (Life Science, Biodiversity, Clinical Research, Astrophysics, High Energy Physics and Photon Science) that are representative of computational needs in Europe and worldwide.

This is exactly the way forward for EOSC: an advanced and all-encompassing research environment that founds its core mission on community-driven open source solutions.

New publication: “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey”

We are thrilled to announce that we have published a new paper entitled “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey” in Springer’s Artificial Intelligence Review journal.

The paper, which is published as Open Access and can be downloaded via its DOI: 10.1007/s10462-018-09679-z, is authored by Giang Nguyen, Stefan Dlugolinsky, Martin Bobák, Viet Tran, Álvaro López García, Ignacio Heredia, Peter Malík and Ladislav Hluchý, from the Institute of Informatics of the Slovak Academy of Sciences (IISAS) and the Institute of Physics of Cantabria (IFCA – CSIC – UC).

Abstract: The combined impact of new computing resources and techniques with an increasing avalanche of large datasets is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, their implementations through frameworks and libraries are also extensive and growing. The software development in this field is fast-paced, with a large number of open-source software packages coming from academia, industry, start-ups and wider open-source communities. This survey presents a recent comprehensive overview, with comparisons as well as trends, of the development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.

DEEP-Genesis: first software release is out

The first DEEP-HybridDataCloud software release is out!

The DEEP-HybridDataCloud project is pleased to announce the availability of its first public software release, codenamed DEEP Genesis. The release notes can be found here.

This release comes after an initial phase of requirement gathering, which involved several research communities in areas as diverse as citizen science, computing security, physics, earth observation, and biological and medical science. This resulted in the development of a set of software components under the common label of DEEP as a Service (DEEPaaS), enabling the easy development and integration of applications requiring cutting-edge techniques such as artificial intelligence (deep learning and machine learning), data mining or the analysis of massive online data streams. These components are now released as a consistent and modular suite, with the aim of being integrated under the EOSC ecosystem.

DEEP Genesis provides open source modules that allow users from research communities to easily develop, build and deploy complex models as a service on their local laptop, on a production server or on top of e-Infrastructures supporting the DEEP-Hybrid-DataCloud stack.

High-level modules covering three types of users:

  • Basic users can browse and download already built-in models and reuse them for training on their own data.
  • Intermediate users can retrain the available models to perform the same tasks, fine-tuning them to their own data.
  • Advanced users can develop their own deep learning tools from scratch and easily deploy them within the DEEP infrastructure.

All models can be exposed through a friendly front-end, allowing easy integration within larger scientific tools or mobile applications thanks to a RESTful API (the DEEPaaS API). More details can be found here.
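To give a flavour of what integrating with such a RESTful endpoint could look like from the client side, the sketch below builds a prediction request for a deployed model using only the standard library. The base URL and route layout are assumptions made for this example; consult the DEEPaaS API documentation for the real paths and parameters.

```python
import json
import urllib.request

# Hypothetical endpoint layout, for illustration only.
BASE_URL = "http://localhost:5000/v2/models"

def predict_request(model_name, inputs):
    """Build the HTTP request that asks a deployed model for a
    prediction through its RESTful endpoint."""
    url = f"{BASE_URL}/{model_name}/predict"
    data = json.dumps({"inputs": inputs}).encode()
    return urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )

# Sending the request (requires a running service):
#   with urllib.request.urlopen(predict_request("my-model", [0.1, 0.2])) as r:
#       prediction = json.load(r)
```

Because the interface is plain HTTP plus JSON, the same call can be issued from a mobile application, a web front-end or a larger scientific workflow without any DEEP-specific client library.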

Key components of the release:

  • The DEEP Open Catalogue, where individual users and communities can browse, store and download relevant modules for building up their applications (such as ready-to-use machine learning frameworks, tutorial notebooks, complex application architectures, etc.).
  • A runtime engine able to supply the required computing resources and deploy related applications.
  • The DEEP PaaS layer coordinating the overall workflow execution to select the appropriate resources (cloud and others, HPC, HTC) and manage the deployment of the applications to be executed.
  • The DEEP as a Service solution offering the application functionality to the user.

All the DEEP components are integrated into a comprehensive and flexible architecture that can be deployed and exploited according to user requirements. The DEEP Authentication and Authorization approach follows the AARC blueprint, with support for user authentication through multiple methods thanks to INDIGO IAM (SAML, OpenID Connect and X.509), support for distributed authorization policies, and a Token Translation Service that creates credentials for services that do not natively support OpenID Connect.
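As a minimal sketch of how a client might present an OpenID Connect access token to such services, the helper below attaches a bearer token to an HTTP request. The URL and token values are placeholders; in the DEEP platform the access token would be obtained from INDIGO IAM through one of the supported authentication flows.

```python
import urllib.request

def authenticated_request(url, access_token):
    """Attach an OAuth2/OIDC bearer token to a service request,
    as commonly done when calling OpenID Connect protected APIs."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
    )

# Usage (placeholder URL and token):
#   req = authenticated_request("https://deep.example.org/api", token)
#   with urllib.request.urlopen(req) as r:
#       ...
```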

The DEEP-HybridDataCloud software is built upon the INDIGO-DataCloud components and is released under the Apache License, Version 2.0 (approved by the Open Source Initiative), except for the modifications contributed to existing projects, where the corresponding open source license has been used. The services can be deployed on both public and private cloud infrastructures. Installation and configuration guides and documentation can be consulted here. The initial set of ready-to-use models from a variety of domains can be found at the DEEP Open Catalogue.

Get in touch

If you want to get more information about the scientific applications adopting the DEEP-Hybrid-DataCloud solutions or you want to become one, please contact us!