News & Updates

DEEP as a Service: Deep Learning for everybody

First part: Running a module locally for prediction

Deep Learning is nowadays at the forefront of Artificial Intelligence, shaping tools that are used to achieve very high levels of accuracy in many different research fields. Training a Deep Learning model is a complex and computationally intensive task that requires a full setup: specific hardware, the right drivers, dedicated software, and enough memory and storage resources. Very often the Deep Learning practitioner is not a computing expert and wants all of this technology to be as accessible and transparent as possible, in order to focus on creating a new model or applying a prebuilt one to some data.

With the DEEP-HybridDataCloud solutions you will be able to start working from the very first moment!

The DEEP-HybridDataCloud project offers a framework for all users, not just a few experts, enabling the transparent training, sharing and serving of Deep Learning models both locally and on hybrid cloud systems.

The DEEP Open Catalog (https://marketplace.deep-hybrid-datacloud.eu/, also known as the “marketplace”) provides the universal point of entry to all services offered by DEEP. It offers several options for users of all levels to get acquainted with DEEP:

  • Basic users can browse the DEEP Open Catalog, download a model and apply it to some local or remote data for inference/prediction.
  • Intermediate users can also browse the DEEP Open Catalog, download a model and train it on their own data, easily changing the training parameters.
  • Advanced users can do all of the above. In addition, they can work on more complex tasks that involve larger amounts of data.

The DEEP-HybridDataCloud solution is based on Docker containers that already package all the tools needed to deploy and run Deep Learning models in the most transparent way. There is no need to worry about compatibility problems: everything has already been tested and encapsulated, so the user has a fully working model in just a few minutes.

To make things even easier, we have developed an API that allows the user to interact with the model directly from the web browser. It is possible to perform inference, train the model or check its metadata with a simple click!

Let’s see how all this works!

In this post we will show how to download and use one of the available models from the DEEP Open Catalog on our local machine. These instructions assume the user is running on Linux, but the Docker containers can run on any platform.

First we browse the catalog and click on the model we are interested in among the many that are already in place. Once we click on the model of our choice we will see something similar to this:

In this case we have selected a module that classifies plant images according to their species, using a convolutional neural network architecture developed in TensorFlow. Under the name of each module in the DEEP Open Catalog we find some useful links:

  • A link to the GitHub repository containing the model source code.
  • A link to the Docker Hub repository of the Docker image containing all the needed software, configured and ready to use.
  • If the module is a pretrained model, a link to the original dataset used for training.

Before starting we need to have either docker or udocker installed on our computer. We will be using udocker, since it allows running Docker containers without root privileges. To install udocker you can just follow these simple instructions:

 # create and activate a Python virtual environment for udocker
 virtualenv udocker
 source udocker/bin/activate

 # download the udocker sources and install them into the virtual environment
 git clone https://github.com/indigo-dc/udocker
 cd udocker
 pip install .

We can now just follow the instructions on the right part of the module page and type the following commands:

udocker pull deephdc/deep-oc-plants-classification-tf

udocker run -p 5000:5000 deephdc/deep-oc-plants-classification-tf

This will download (pull) the Docker container from Docker Hub and run it on our local machine. The run command includes the option -p 5000:5000, which maps port 5000 on our local machine to port 5000 in the container.

We now have the DEEP API running on localhost!

You can go to your preferred web browser and enter localhost:5000 in the address bar. This will open the DEEP as a Service API endpoint. It looks like this:

As you can see in the image, different methods can be chosen. You can return the list of loaded models (in this case we are just running the plant classification example) or the metadata of your models. You can also run a prediction on a plant image of your choice, or even train the classification neural network on a completely new dataset. All of this directly from your web browser.
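The same methods can also be called outside the browser. Below is a minimal sketch, in Python, of listing the loaded models through the DEEPaaS API using the requests library; note that the endpoint path (/v2/models) and the response structure are assumptions based on recent DEEPaaS versions, so check the Swagger page served at localhost:5000 for the exact routes of your installation.

# Minimal sketch: list the models served by the container (assumed /v2/models route)
import requests

BASE_URL = "http://localhost:5000"  # where the DEEPaaS API is exposed

response = requests.get(f"{BASE_URL}/v2/models")  # assumed route; see the Swagger page
response.raise_for_status()
for model in response.json().get("models", []):   # assumed response structure
    print(model.get("id"), "-", model.get("description"))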

Let’s now try out the prediction method. We can either use a local file or the URL of an online plant image to perform the classification. For this example we will use a locally stored image.

We click on Select File and browse our file system for the image we are interested in. In this case we will use the image of a rose. If you want to reproduce this example you can find the image here.

Now that we have selected the image, we can click on Execute. The first time we perform a prediction with a given model the process takes a little while, since the TensorFlow environment must be initialized. Afterwards, predictions will be extremely quick (less than one second in many cases).

The prediction for our roses gives us the following output:

The result shows us the five most probable species. The most probable one is Rosa chinensis, with a probability of 80%. Our module has predicted correctly! Together with the prediction we find a link pointing to Wikipedia where we can check the species.

The output is given in JSON format, which can easily be integrated with any other application that needs to access the results.
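For instance, the same prediction can be scripted instead of clicked. The sketch below posts a local image to the prediction endpoint with Python's requests library; the route, the multipart field name ("data") and the model identifier are assumptions used only for illustration, and the Swagger page at localhost:5000 lists the exact parameters expected by each model.

# Hedged sketch: run a prediction from a script and read the JSON result
import requests

BASE_URL = "http://localhost:5000"
MODEL = "plants-classification"  # hypothetical model identifier; list the models to find the real one

with open("rose.jpg", "rb") as image:
    response = requests.post(
        f"{BASE_URL}/v2/models/{MODEL}/predict/",  # assumed route
        files={"data": image},                     # assumed field name
        timeout=120,                               # the first prediction includes TensorFlow start-up
    )
response.raise_for_status()

result = response.json()
print(result)  # e.g. the five most probable species with their probabilities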

In this example we have seen how to use one of the DEEP-HybridDataCloud modules running a Deep Learning model in just a few simple steps on our local machine.

If you want more detail, you can find the full documentation here.

In upcoming posts we will see how to train a model using the DEEP API and how to run on a cloud system. Stay tuned!

Paving the path towards the implementation of the EOSC ecosystem

The European Open Science Cloud (EOSC) aims to be Europe’s virtual environment for all researchers to store, manage, analyse and re-use data for research, innovation and educational purposes.

INDIGO-DataCloud (INDIGO) and its two follow-up projects, DEEP-HybridDataCloud (DEEP) and eXtreme-DataCloud (XDC), are considered key contributors to the actual implementation of EOSC.

Funded by the EC, the projects aim to develop cloud-oriented, scalable technologies capable of operating at the unprecedented scale required by the most demanding, data-intensive research experiments in Europe and worldwide.

Thanks to the service components developed by INDIGO, researchers in Europe are now using public and private cloud resources to handle large-volume data, enabling new research and innovation across different scientific disciplines and research fields. In particular, many INDIGO services are included – or in the process of being included – in the unified service catalogue provided by the EOSC-hub project, which endeavours to put in place the basic layout for the European Open Science Cloud.

Built upon the already existing INDIGO service components, DEEP released Genesis, a set of software components enabling the easy development and integration of applications requiring cutting-edge techniques such as artificial intelligence (deep learning and machine learning), data mining and the analysis of massive online data streams. These components are now available as a consistent and modular suite, ready to be included in the service catalogue.

The INDIGO service components are also the building blocks of Pulsar, the first XDC release, featuring new or improved functionalities that cover important topics such as federation of storage resources, smart caching solutions, policy-driven data management based on Quality of Service, data lifecycle management, metadata handling and manipulation, and optimised data management based on access patterns. Some of them are ready to be included in the service catalogue.

Suitable to run on both the existing and the next-generation e-Infrastructures deployed in Europe, the INDIGO, DEEP and XDC solutions have been implemented following a community-driven approach, addressing requirements from research communities belonging to a wide range of scientific domains: Life Science, Biodiversity, Clinical Research, Astrophysics, High Energy Physics and Photon Science, which are representative of the computational needs in Europe and worldwide.

This is exactly the way forward for EOSC: an advanced and all-encompassing research environment that builds its core mission on community-driven open-source solutions.

D6.1 – State-of-the-art DEEP Learning (DL), Neural Network (NN) and Machine Learning (ML) frameworks and libraries

This document provides an overview of the state of the art in Deep Learning (DL), Neural Network (NN) and Machine Learning (ML) frameworks and libraries to be used as building blocks in the DEEP Open Catalogue. The initial state of the catalogue will be built based on the outcome of this document and on the initial user-community requirements for scientific data analytics and ML/DL tools coming from WP2.

 

DEEP-JRA3-D6.1

New publication: “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey”

We are thrilled to announce that we have published a new paper entitled “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey” in the Springer journal Artificial Intelligence Review.

The paper, which is published as Open Access and can be downloaded via its DOI: 10.1007/s10462-018-09679-z, is authored by Giang Nguyen, Stefan Dlugolinsky, Martin Bobák, Viet Tran, Álvaro López García, Ignacio Heredia, Peter Malík and Ladislav Hluchý, from the Institute of Informatics of the Slovak Academy of Sciences (IISAS) and the Institute of Physics of Cantabria (IFCA, CSIC-UC).

Abstract: The combined impact of new computing resources and techniques with an increasing avalanche of large datasets is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, their implementations through frameworks and libraries are also extensive and growing. The software development in this field is fast paced, with a large amount of open-source software coming from academia, industry, start-ups or wider open-source communities. This survey presents a recent time-slide comprehensive overview with comparisons, as well as trends in the development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.

D4.2 – First implementation of software platform for accessing accelerators and HPC

This deliverable describes the first implementation of the software platform for accessing accelerators and HPC. The list of components included in the software platform is based on the analysis provided by Deliverable D4.1. This document provides detailed descriptions of the software components used in the platform, the work done on each component and its current status. An evaluation of the achieved results and an implementation plan for the next periods are also included.

http://hdl.handle.net/10261/168086

D5.2 – High Level Hybrid Cloud solutions prototype

This document complements deliverable D5.1 (Definition of the Architecture of the Hybrid Cloud) with the specific prototype developments carried out to support the deployment of hybrid infrastructures across multiple IaaS cloud sites. The document describes the technical challenges, the evolution of the components to support this prototype and a roadmap of implementation towards the final release of the High Level Hybrid Cloud solutions.

http://hdl.handle.net/10261/168087

D6.3 – First prototype of the DEEP as a Service

This document provides an updated description of the prototype implementation of the DEEP as a Service solution that is being developed within the DEEP-Hybrid-DataCloud project Work Package 6 (WP6). As such, it provides an overview of the state of the art of the relevant components and technologies, a technology readiness level assessment with regard to the required functionality, the required interactions with other work packages in the project, and the detailed work plan and risk assessment for each of the activities.

http://hdl.handle.net/10261/168088

D3.2 – Pilot testbed and integration architecture with EOSC large scale infrastructures

The deliverable contains the plan, design, architecture and deployment of the Pilot Preview testbed, based on the technical requirements and descriptions from the WP2 use cases. The services and components developed during DEEP-HybridDataCloud by the teams in WP4, WP5 and WP6 are deployed, tested and validated by end users in this testbed. It also describes how the Pilot Preview testbed services and components will be integrated with the EOSC production infrastructure and other external resource providers.

http://digital.csic.es/handle/10261/168084

DEEP-Genesis: first software release is out

The first DEEP-HybridDataCloud software release is out!

The DEEP-HybridDataCloud project is pleased to announce the availability of its first public software release, codenamed DEEP Genesis. The release notes can be found here.

This release comes after an initial phase of requirements gathering, which involved several research communities in areas as diverse as citizen science, computing security, physics, earth observation, and biological and medical science. This resulted in the development of a set of software components under the common label of DEEP as a Service (DEEPaaS), enabling the easy development and integration of applications requiring cutting-edge techniques such as artificial intelligence (deep learning and machine learning), data mining or the analysis of massive online data streams. These components are now released as a consistent and modular suite, with the aim of being integrated into the EOSC ecosystem.

DEEP Genesis provides open-source modules that allow users from research communities to easily develop, build and deploy complex models as a service on their local laptop, on a production server, or on top of e-Infrastructures supporting the DEEP-Hybrid-DataCloud stack.

The high-level modules cover three types of users:

  • Basic users can browse and download the already built-in models and reuse them for training on their own data.
  • Intermediate users can retrain the available models to perform the same tasks, fine-tuning them to their own data.
  • Advanced users can develop their own deep learning tools from scratch and easily deploy them within the DEEP infrastructure.

All models can be exposed through a user-friendly front end that allows easy integration into larger scientific tools or mobile applications thanks to a RESTful API (the DEEPaaS API). More details can be found here.

Key components of the release:

  • The DEEP Open Catalogue, where individual users and communities can browse, store and download relevant modules for building their applications (such as ready-to-use machine learning frameworks, tutorial notebooks, complex application architectures, etc.).
  • A runtime engine able to supply the required computing resources and deploy related applications.
  • The DEEP PaaS layer coordinating the overall workflow execution to select the appropriate resources (cloud and others, HPC, HTC) and manage the deployment of the applications to be executed.
  • The DEEP as a Service solution offering the application functionality to the user.

All the DEEP components are integrated into a comprehensive and flexible architecture that can be deployed and exploited according to user requirements. The DEEP Authentication and Authorization approach follows the AARC blueprint, with support for user authentication through multiple methods thanks to INDIGO IAM (SAML, OpenID Connect and X.509), support for distributed authorization policies, and a Token Translation Service that creates credentials for services that do not natively support OpenID Connect.

The DEEP-HybridDataCloud software is built upon the INDIGO-DataCloud components and is released under the Apache License Version 2.0 (approved by the Open Source Initiative), except for the modifications contributed to existing projects, where the corresponding open-source license has been used. The services can be deployed on both public and private cloud infrastructures. Installation and configuration guides and documentation can be consulted here. The initial set of ready-to-use models from a variety of domains can be found in the DEEP Open Catalogue.

Get in touch

If you want more information about the scientific applications adopting the DEEP-Hybrid-DataCloud solutions, or if you would like to become one of them, please contact us!

DEEP Hybrid DataCloud accepted paper in 22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES2018), Sep 3-5, 2018, Belgrade, Serbia

We are proud to announce that a research paper developed under the DEEP-Hybrid-DataCloud project has been accepted for inclusion in the 22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES2018), to be held on 3-5 September 2018 in Belgrade (Serbia). The paper will be published online by Elsevier Science in the open-access Procedia Computer Science series.

Title: A multivariate fuzzy time series resource forecast model for clouds using LSTM and data correlation analysis

Authors: Nhuan Tran (a), Thang Nguyen (a), Binh Minh Nguyen (a), Giang Nguyen (b)

(a) School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam

(b) Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia

Abstract

Today, almost all clouds only offer auto-scaling functions based on resource-usage thresholds defined by users. Meanwhile, applying prediction-based auto-scaling functions to clouds still faces the problem of inaccurate forecasts during operation in practice, even though such functions only deal with univariate monitoring data. Up until now, there have been very few efforts to simultaneously process multiple metrics to predict resource utilization. The motivation for this multivariate processing is that there could be correlations among metrics, and these have to be examined in order to increase the applicability of the model in practice. In this paper, we built a novel forecast model for cloud proactive auto-scaling systems by combining several mechanisms. In the data preprocessing phase, we exploit a fuzzification technique to reduce the fluctuation of monitoring data. We evaluate the correlations between different metrics to select suitable data types as inputs for the prediction model. In addition, a long short-term memory (LSTM) neural network is employed to predict the resource consumption from multivariate time series data. Our model is thus called multivariate fuzzy LSTM (MF-LSTM). The proposed system is tested with Google trace data to prove its efficiency and feasibility when applied to clouds.
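To give a flavour of the kind of model the abstract refers to, below is a minimal, generic sketch of multivariate time-series forecasting with an LSTM in Keras. It is not the MF-LSTM model from the paper: the fuzzification and metric-correlation steps are omitted, and the data, metric names and hyperparameters are placeholders chosen only for illustration.

# Generic multivariate LSTM forecasting sketch (NOT the paper's MF-LSTM model);
# random data stands in for monitored cloud metrics such as CPU, memory and I/O.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

window, n_metrics = 30, 3                                        # 30 past time steps, 3 monitored metrics
X = np.random.rand(1000, window, n_metrics).astype("float32")    # toy input windows
y = np.random.rand(1000, 1).astype("float32")                    # next-step target metric (e.g. CPU usage)

model = Sequential([
    LSTM(64, input_shape=(window, n_metrics)),  # learns temporal patterns across all metrics at once
    Dense(1),                                   # one-step-ahead forecast of the target metric
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

print(float(model.predict(X[:1])[0, 0]))        # predicted usage for the next time step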