DEEP support to fight COVID-19

Artistic representation of a coronavirus.

In recent years, artificial intelligence, and more specifically deep learning, has proved to be a very useful tool for biomedical research, medicine-related problems and clinical assistance. In the current health emergency, a massive amount of data is being produced and needs to be understood using the most powerful tools available. The DEEP project is contributing to the fight against the COVID-19 emergency on several fronts, thanks to its capacity to process huge amounts of data, to develop and share deep learning applications quickly and easily, and to the resources available at the project testbed. Currently, we are involved in the following initiatives:

Genetic studies

DEEP has been requested to join a project coordinated by the Institut d’Investigacions Biomèdiques de Barcelona that aims at discovering any genetic traits explaining why some people without previous pathologies develop severe forms of COVID-19 that lead them to the ICU or even to death. The study will take genetic material (together with population and clinical information) from 200 patients who are under 60 years old and who do not have any previous serious chronic diseases. We want to study the difference between the patients who evolve well and those who get worse and end up in the ICU, by discovering whether, at the genetic level, the latter have a special susceptibility. If so, this will give us an indicator of which cases are the most vulnerable and should be protected. If this indicator is found, patients without such a genetic condition could be discharged earlier, and we could protect those who, besides the elderly, are likely to develop serious symptoms of the disease. DEEP will provide extensive data analysis, including the development of a deep learning model that will then be published and made available in our Open Catalog, as well as dedicated testbed resources.

X-ray images classification

Building on our image classification module, DEEP is collaborating with the University Hospital Marqués de Valdecilla to develop and share a new module trained to classify chest x-ray images, which will act as an assistant for the physician and help with patient triage. In the current state of health emergency, huge numbers of plain chest x-rays are being produced daily. Due to the saturation of the medical system, professionals with no x-ray experience are being forced to interpret the chest images and must systematically resort to the advice of a radiologist who is overwhelmed, with consequent delays in diagnosis. Under these circumstances, a reliable automatic triage system to assist diagnosis using plain chest x-rays would greatly expedite patient management.
Although this project focuses on patients with COVID-19, the developed tools will be equally applicable to other diseases presenting with pneumonia and will be made available in the Open Catalog.
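
In terms of technique, this is essentially transfer learning on a pretrained convolutional backbone, as in the existing image classification module. The snippet below is a minimal sketch of that idea, assuming a TensorFlow/Keras environment and purely illustrative class names; it is not the actual model being developed with the hospital.

    # Minimal transfer-learning sketch (illustrative only, not the actual DEEP module)
    import tensorflow as tf

    NUM_CLASSES = 3  # hypothetical classes, e.g. normal / COVID-19 pneumonia / other pneumonia

    # Pretrained backbone with frozen weights, plus a small classification head
    base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    # Training would then use a labelled set of chest x-ray images, for example:
    # train_ds = tf.keras.utils.image_dataset_from_directory("xrays/", image_size=(224, 224))
    # model.fit(train_ds, epochs=10)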

Data science to understand confinement effectiveness

European countries have adopted strict confinement measures to fight the spread of COVID-19. The Spanish National Research Council, in cooperation with the Spanish National Microbiology Center of the Health Institute Carlos III, is using data science and computing techniques to understand the effectiveness of these measures in Spain. The project follows a multidisciplinary approach involving computing, demography, physics and migration experts, studying high-resolution massive data to gain insights into how mobility and social contacts have changed since the measures were enforced and how these changes are influencing the incidence of COVID-19. These data are then leveraged by computational models (based both on artificial intelligence and on mechanistic models), making it possible to study different scenarios for the end of the confinement measures. In this regard, the DEEP-Hybrid-DataCloud stack is being used to develop the AI models, which will be published in the Open Catalog and served through the DEEP as a Service component.

Accelerated computing in the DEEP-HybridDataCloud project

Accelerated computing systems play an important role in delivering energy-efficient and powerful computing capabilities for compute-intensive applications. However, supporting accelerated computing in the cloud is not straightforward. Unlike common computing resources (CPU, RAM), accelerators need special treatment and support at every software layer. The maturity of this support strongly depends on the specific hardware/software combination. A mismatch at any software layer will make the accelerators unavailable to end users.

The DEEP-Hybrid-DataCloud project aims at developing a distributed architecture to leverage intensive computing techniques for deep learning. One of the objectives of the project is to develop innovative services to support intensive computing techniques that require specialized HPC hardware, such as GPUs or low-latency interconnects, to explore very large datasets. In the project, support for accelerators is carefully addressed at all software layers:

  • Support for accelerators at the hypervisor/container level: During the project, GPU support in udocker, the portable tool to execute simple Docker containers in user space, has been significantly improved. The current version of udocker can automatically detect the GPU drivers on the host machine and mount them into the container. This improvement allows udocker to execute standard GPU containers from Docker Hub, such as tensorflow:latest-gpu, without modification (a quick way to verify this is sketched after this list). Support for GPUs in other container and hypervisor drivers is also analyzed, tested and deployed on the project testbed in combination with higher-level cloud middleware frameworks whenever possible.
  • Support for accelerators at the cloud middleware framework level: The project testbed consists of sites with different cloud middleware frameworks, including OpenStack, Apache Mesos, Kubernetes and also HPC clusters. All of these cloud middleware platforms are deployed with GPU support. As GPU virtualization is supported only on newer GPU cards, OpenStack sites mostly provide GPU access via the PCI passthrough approach in the KVM hypervisor. Kubernetes sites have GPU support via the NVIDIA device plugin, and Mesos provides access to GPUs via its own executor, which mimics the nvidia-docker approach. Finally, GPU support on HPC sites is provided by udocker, the portable user-space execution tool mentioned above.
  • Support for accelerators at the PaaS orchestrator level: In the project, the information system (Cloud Info Provider + CMDB) has been extended to collect information about the availability of GPUs at the sites. The GPUs can be made available through different services at the IaaS level, e.g. through native Cloud Management Framework interfaces (such as OpenStack- or Amazon-specific flavors) or through Container Orchestration Platforms like Mesos. The TOSCA model for compute and container nodes has been extended to include user-specified requirements for specialized devices such as GPUs. The Orchestrator bases its scheduling mechanism on the provided information to select the best site where the resources will be allocated.
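
As a quick sanity check of the container-level GPU support described in the first point, one can confirm from inside a container started with udocker (for example from the tensorflow:latest-gpu image) that the framework actually sees the host GPUs. The check below is generic TensorFlow code, not a DEEP-specific tool:

    # Run inside a GPU-enabled container (e.g. one started with udocker) to confirm
    # that the drivers mounted from the host are visible to the framework.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    print(f"Visible GPUs: {len(gpus)}")
    for gpu in gpus:
        print(" -", gpu.name)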

DEEP testbed

The DEEP Preview Testbed, exposing the enhanced DEEP Rosetta services, is now available for users to experience the ease of building, developing, training and deploying machine learning models and to exploit the new DEEP training dashboard.

Although called a testbed, it is in reality a small-scale production infrastructure: all software components and services run the versions released in DEEP-2 (code-named Rosetta) and are operated and managed like any other production service or platform. A diagram of the Pilot Preview is shown in the figure below.

The resources made available by the project partners are nonetheless significant: one can exploit about 30 high-end NVIDIA GPUs distributed across 3 cloud e-infrastructures, and a data/storage management system federated between 3 providers with about 80 TB of total storage. One of the main features is the locality of the data with respect to the computing resources, allowing more efficient computation.

Other features worth mentioning are: the cloud resource providers are part of the production EGI FedCloud infrastructure; the data/storage management system is the result of a tight collaboration between the DEEP-HybridDataCloud and eXtreme DataCloud (XDC) projects, where storage resources from both projects are federated through the Onedata service (bottom of the diagram); and users are authenticated and authorized through the federated AAI service called DEEP-IAM (right side of the diagram).

Finally, and most importantly, users can execute ML/AI applications in production mode, with long model trainings and large datasets (in some cases of the order of terabytes).

Integrating HPC resources with a PaaS Cloud approach in the DEEP-HybridDataCloud project

HPC environments are typically characterized by software and hardware stacks optimized for maximum performance at the cost of flexibility in terms of OS, system software and hardware configuration. This close-to-metal approach creates a steep learning curve for new users and makes it hard to cooperate with external services, especially cloud-oriented ones. In exchange for the lost flexibility, one gets access to tens of thousands of CPUs and to high-performance network and storage, but with little isolation between jobs and little or no possibility for applications to interact with services outside a particular cluster.

The DEEP-Hybrid-DataCloud project aims at developing a distributed architecture to leverage intensive computing techniques for deep learning. One of the objectives of the project is to promote the integration of existing HPC resources under a hybrid cloud approach, so that they can be used on demand by researchers of different communities.

The abstraction offered by our solution simplifies the interaction for end users thanks to the following key features:

  • Promoting container technologies for application development, delivery and execution: This approach enables easier application development, integration and delivery with CI/CD practices. It also makes applications portable, so they can be deployed and executed on any platform, independently of the OS, libraries and software installed on the host. Such containerized applications can be used on both Cloud and HPC platforms without modifications.
  • Using a portable container execution tool in user space on HPC platforms: udocker is a basic user tool to execute simple Docker containers in user space without requiring root privileges. It enables the download and execution of Docker containers by non-privileged users on Linux systems where Docker is not available. It can be used to pull and execute Docker containers in Linux batch systems and interactive clusters that are managed by other entities, such as grid infrastructures or externally managed batch or interactive systems.
  • Using standard interfaces to manage different workloads and environments, both cloud- and HPC-based: the TOSCA language is used to model the jobs, and the PaaS Orchestrator creates a single point of access for the submission of processing requests. The DEEP PaaS layer features advanced federation and scheduling capabilities, ensuring transparent access to different IaaS back-ends including OpenStack, OpenNebula, Amazon Web Services, Microsoft Azure, Apache Mesos, Kubernetes and, finally, HPC environments. The user request is expressed in the TOSCA templating language and submitted to the PaaS Orchestrator. Depending on the type of request, the appropriate plugin is activated to dispatch the task to the best compute service.
  • Adopting a unified AAI throughout the whole stack, from the PaaS to the data and compute layers: this is implemented by the INDIGO IAM service, which provides federated authentication based on OpenID Connect/OAuth 2.0 mechanisms. An SSH PAM module has been developed to allow users to log in via SSH using their IAM access token instead of a password. Users are automatically provisioned on the HPC cluster from the list of users registered in IAM and belonging to a specific group. Each IAM user is mapped onto a local account.
  • Providing a REST API gateway for submitting and monitoring jobs from outside the HPC site: QCG-Computing is an open-architecture implementation of a SOAP web service providing multi-user access and policy-based job control routines for the various queuing and batch systems managing local computational resources. This key QCG service uses the Distributed Resource Management Application API (DRMAA) to communicate with the underlying queuing systems. QCG-Computing has been designed to support a variety of plugins and modules for external communication, as well as to handle a large number of concurrent requests from external clients and services.

Collaboration agreement between DEEP-Hybrid-DataCloud and the EOSC DIH

We are pleased to announce that the DEEP-Hybrid-DataCloud consortium has signed a collaboration agreement with the EOSC DIH, aiming to boost the dissemination of the DEEP offering and foster the adoption of project solutions by SMEs. This collaboration will make it possible to run industrial pilots, which will increase the market acceptance of DEEP solutions.

The EOSC DIH is a mechanism for private companies to collaborate with public sector institutions to access technical services, research data, and human capital.

The goal of the collaboration agreement is twofold: the promotion of collaboration between DEEP-Hybrid-DataCloud and SMEs, and the dissemination of DEEP results through the EOSC-hub channels. The envisioned activities are:

  • Inclusion of the DEEP-Hybrid-DataCloud services in the EOSC DIH offering
  • ML implementation best practices and ML application porting to EOSC technological infrastructures
  • AI-enabled services prototyping in EOSC landscape
  • Identification of SMEs interested in DEEP results, provision of the resources to set up a business pilot, and facilitation of the adoption of the DEEP offering by SMEs
  • Dissemination of the DEEP offering through the EOSC DIH channels to promote interaction among providers, SMEs and the DEEP consortium

Event-Driven Execution of DEEP Open Catalog Modules for Prediction on Amazon Web Services

The DEEP Open Catalog provides ready-to-use modules for Artificial Intelligence, Machine Learning and Deep Learning models that can be executed on a wide variety of computing platforms. These include local laptops, production servers, supercomputers and e-infrastructures supporting the DEEP Hybrid-DataCloud software stack.

The versatility of the DEEPaaS component, which provides a REST API to serve machine learning and deep learning models, has made it possible to introduce the additional functionality required to perform the prediction phase from the command-line interface. This is needed to run batch prediction jobs, for example on Local Resource Management Systems (LRMS) such as SLURM, within a cluster of PCs. This makes it possible, for example, to classify thousands of audio files with the Audio Classifier module from the DEEP Open Catalog in an unattended manner.
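
To give a sense of what such unattended processing looks like from the client side, the sketch below loops over a folder of audio files and sends each one to an already running DEEPaaS instance serving the audio classifier. The endpoint path, model name and upload field follow the DEEPaaS V2 REST API but are illustrative and should be checked against the /docs page of the deployed module.

    # Illustrative batch-prediction loop against a running DEEPaaS instance.
    # The model name "audioclas" and the "data" upload field are assumptions;
    # check the Swagger page (/docs) of your deployment for the exact names.
    import glob
    import requests

    API = "http://127.0.0.1:5000/v2/models/audioclas/predict/"

    for path in sorted(glob.glob("audio/*.wav")):
        with open(path, "rb") as f:
            response = requests.post(API, files={"data": f}, timeout=300)
        print(path, response.json())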

We wanted to determine how easy it was to run these modules on a public Cloud provider such as Amazon Web Services. Indeed, existing services such as AWS Batch provide the ability to deploy virtual elastic clusters, even with GPU support, that execute Docker-based jobs and can auto-scale to zero in order to support a pay-per-usage approach. 

To this end, we used the open-source SCAR tool, which makes it possible to create highly parallel, event-driven, file-processing serverless applications that execute both on customised runtime environments on AWS Lambda and on AWS Batch compute environments. As can be seen in the newly added deep-audio use case for SCAR, the tool uses a YAML file to describe the job to be executed (in this case based on the deephdc/deep-oc-audio-classification-tf Docker image on Docker Hub).

The following figure shows the dashboard of the developed service. It consists of a web-based application that provides seamless access to the service for the DEEP user community. The service provides the ability to select a model from the DEEP Open Catalog (from those integrated so far) so that, whenever a new file is uploaded, the prediction phase of the model is triggered using this file as input. This is executed on a dynamically provisioned cluster of machines that can leverage both CPUs and GPUs (if the model supports this feature). Additional computing nodes are added if many files are pending to be processed. Also, the virtual cluster auto-scales to zero whenever all the files have been processed, thus providing seamless event-driven prediction for DEEP models.

The following figure summarises the architecture of the service. The web service has been integrated with the DEEP IAM through Amazon Cognito’s Federated Identities in order to provide easy access for existing DEEP users. Uploading a file to the Amazon S3 bucket triggers the execution of an AWS Lambda function (created through SCAR) that automatically converts the request into a Batch job submitted to a specific compute environment, which triggers the deployment of additional virtual machines to perform the processing (prediction) of the files.
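
From the user’s point of view, triggering the whole pipeline is just an object upload. Below is a minimal sketch using boto3, assuming AWS credentials are already configured and using a purely hypothetical bucket name and input prefix:

    # Uploading an input file to the configured S3 bucket is what fires the
    # Lambda function and, in turn, the AWS Batch prediction job.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        "samples/dog-bark.wav",       # local file to classify
        "deep-audio-input-bucket",    # hypothetical bucket name
        "input/dog-bark.wav",         # key under the watched input prefix
    )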

In summary, the flexibility of DEEPaaS and the availability of pre-trained modules in the DEEP Open Catalog have facilitated the highly scalable execution of these models on a public Cloud provider such as Amazon Web Services. SCAR is able to provide serverless computing for scientific applications to be run directly on AWS Lambda. However, the large size of the Docker images in the DEEP Open Catalog required supporting the event-driven execution on more resource-intensive computing services, as is the case of AWS Batch.

Training your model with the DEEP platform is now easier than ever.

The second DEEP release was recently published and it comes with plenty of useful new functionalities, all of them with the common goal of easing the path for scientific communities to develop, build and deploy complex models as a service on their local laptop, on a production server or on a cloud infrastructure. One of the main novelties is that the platform now supports asynchronous training and allows you to launch, monitor, stop and delete trainings directly from the web browser in a transparent way. The prediction features were already introduced and explained in detail in a previous post. Now it is the turn of the training functionalities of the DEEP platform and the recent additions that improve user interactivity.

To make things easy, we have developed a platform that allows the user to interact with the model directly from the web browser. Let’s see how all this works!

In this post we will show how to train a model using the training dashboard. All the modules included in the marketplace are available from the dashboard. Let’s take the image classifier as an example. We are going to perform a new training to distinguish between pictures of roses and daisies.

The data must first be uploaded to the storage system of your choice. For this example we will use NextCloud, where we need to create a folder called dataset_files that must include two text files:

  • train.txt: containing the paths within NextCloud to the images to be used for training, each followed by a number indicating the category of the image (0 or 1 in this case, since this is a binary classification)
  • classes.txt: containing the name of the different classes. In this case rose and asteraceae (the scientific name of the daisies).
Example of the content of the train.txt and classes.txt files
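
To make the expected layout concrete, the short script below writes a toy train.txt and classes.txt for this rose/daisy example; the image paths and labels are made up for illustration.

    # Toy example of the two files expected in the dataset_files folder.
    # Paths are hypothetical NextCloud locations; 0 = rose, 1 = asteraceae (daisy).
    train_lines = [
        "/flowers/roses/rose_001.jpg 0",
        "/flowers/roses/rose_002.jpg 0",
        "/flowers/daisies/daisy_001.jpg 1",
        "/flowers/daisies/daisy_002.jpg 1",
    ]
    classes = ["rose", "asteraceae"]

    with open("train.txt", "w") as f:
        f.write("\n".join(train_lines) + "\n")
    with open("classes.txt", "w") as f:
        f.write("\n".join(classes) + "\n")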

Once we have our training data in place, we open the dashboard and browse the different models until we find the one titled “Train an image classifier”.

Then, we can just click on the Train module button. This will take us to a web form where we can specify the resources needed for the training: CPU or GPU, the amount of memory, or even the specific site where we want to run, in case we have a preference. We can also set the identification parameters (ID and password) needed to access the storage system where we have our training data (NextCloud in this particular case).

Form to be filled specifying the resources needed for the training

Once we have submitted the form, we are redirected to a dashboard showing us the status of our deployment.

Dashboard showing the status of our deployments

When the CREATE COMPLETE label appears, we can click on the Access menu and select DEEPaaS. This will open a new tab with a nice user interface allowing us to interact with the image classification model.

From this web user interface we can:

  • Check the model metadata and details
  • Retrain a certain model with our own data
  • Get the list of trainings currently running
  • Get the status of a training
  • Cancel the training
  • Make a prediction with a certain trained model
DEEPaaS web user interface. It allows checking the model metadata, launching and interacting with trainings, and predicting using a trained model
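
The operations listed above are also available programmatically, since this web interface is just a front end to the DEEPaaS REST API. The sketch below uses the V2 endpoint paths and a hypothetical module name; the exact paths should be confirmed on the /docs page of your own deployment.

    # Querying a DEEPaaS deployment programmatically (illustrative endpoints).
    import requests

    BASE = "http://127.0.0.1:5000/v2"   # replace with your deployment's URL

    # Model metadata and details
    print(requests.get(f"{BASE}/models/").json())

    # List of trainings for a given module ("imgclas" is a hypothetical name)
    print(requests.get(f"{BASE}/models/imgclas/train/").json())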

If we click on the train button we can set the training options, such as the number of epochs, the type of architecture, the batch size, or whether or not to use data augmentation or early stopping, among many others. Once we are happy with the training options we can launch it just by clicking on the Execute button.

During the training, we can monitor the learning metrics (accuracy and loss) using Tensorboard by clicking on the Monitor option from the deployment Access menu in the dashboard. This will open a new tab where we will see the following information:

Monitoring of the training metrics using Tensorboard

As mentioned earlier, the main change in this new release is the asynchronous support for training. This means that you do not have to wait for the training to finish in order to continue using the web user interface. For example you can launch a training and, while you wait for it to finish, you can immediately do predictions on a previously trained model or close the browser and come back later.

Also, you can check the status of your training, or even DELETE it, by using the identifier (UUID) provided when the training is created. For instance, let’s check the current status of our image classification training from the web user interface by using the UUID:

Monitoring the status of the training. We can see that the training finished (status: done) and also some additional information about it (duration, finish date, etc…)
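
The same check can be scripted against the API; a minimal sketch, assuming the V2 training endpoints, a hypothetical module name and a placeholder UUID:

    # Check (and optionally cancel) a training by its UUID (paths illustrative).
    import requests

    BASE = "http://127.0.0.1:5000/v2/models/imgclas/train"   # hypothetical module name
    uuid = "00000000-0000-0000-0000-000000000000"            # UUID returned at launch time

    print(requests.get(f"{BASE}/{uuid}").json())   # e.g. {"status": "done", ...}

    # To cancel a running training instead:
    # requests.delete(f"{BASE}/{uuid}")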

Another useful feature in this new release is the ability to store the history of all performed trainings. This makes it possible to monitor the status of your trainings directly from the training dashboard.

Monitoring the status of your training from the training dashboard

Once our training is over, we can use the web user interface to perform some predictions with our brand new model. Let’s use an image of a daisy that we found on the internet:

Prediction with the new trained model

The prediction output looks like this:

Prediction output

We can see that the prediction is correct! Our image is predicted as a daisy with a probability of 100%.

In this post we have seen how to browse a model from the DEEP Open Catalog, train it with our own data, monitor the training and make predictions with our newly trained model. Everything from the web browser!

Soon there will be another post in this saga on how to use the development Docker with Jupyter. Stay tuned!

DEEP-Hybrid-DataCloud announces the availability of the second software release and platform

  • The new platform release allows data scientists and machine learning practitioners to build, develop, train and deploy machine learning services easily, with a comprehensive set of tools and services covering the whole machine learning application lifecycle.
  • The new DEEP as a Service allows machine learning models to be deployed as services, following a serverless approach, with horizontal scalability.
  • The redesigned marketplace makes it easy to train existing modules.

It is a pleasure to announce that the DEEP-HybridDataCloud project has published its second software release and platform, code-named DEEP Rosetta. DEEP Rosetta expands on the first version of software generated by the project, called DEEP Genesis, enlarging its functionalities to cover the whole machine learning cycle, enhancing the stability of the different components and adding new features. Developing, training, sharing and deploying your model has never been easier!

All the changes in this new release are oriented towards the common project goal of easing the path for scientific communities to develop, build and deploy complex models as a service on their local laptop, on a production server or on top of e-Infrastructures supporting the DEEP-Hybrid-DataCloud stack.

As in the previous release, the DEEP components are integrated into a comprehensive and flexible architecture that can be deployed and exploited following the user requirements. 

Comprehensive set of services for machine learning

  • The DEEP training facility, accessible through the DEEP training dashboard, allows data scientists to develop and train their models, with access to the latest generation of EU computing e-Infrastructures.
  • DEEP as a Service is a fully managed service that makes it possible to easily and automatically deploy the developed applications as services, with horizontal scalability thanks to a serverless approach. The pre-trained applications published in the catalog are automatically deployed as services to make them available for general use.
  • The DEEP Open Catalog and marketplace comprise a curated set of applications ready to use or extend, fostering knowledge exchange and the re-usability of applications. This open exchange aims to serve as a central knowledge hub for machine learning applications that leverage the DEEP-Hybrid-DataCloud stack, breaking knowledge barriers across distributed teams. Moreover, pre-configured Docker containers, repository templates and other related components and tools are also part of this catalog.
  • The DEEPaaS API enables data scientists to expose their applications through an HTTP endpoint, delivering a common interface for machine learning, deep learning and artificial intelligence applications.

The platform has been extended to support asynchronous training, allowing trainings to be launched, monitored, stopped and deleted directly from your web browser. The trained models used to perform inference can now be chosen from the models available in the training history. All the documentation on these new features has been updated accordingly here.

In addition, the user-friendly training dashboard now makes it easy to deploy the modules transparently in a cloud environment. From the dashboard, the user can choose the resources needed for the deployment in terms of memory, type of processing unit (CPU or GPU) and the storage client to be used, or even configure the scheduling manually.

DEEP training dashboard screenshot
The DEEP training dashboard makes it easy to train any existing module, or your own.

The DEEP Open Catalogue has been renewed with a more appealing design, improving the organisation of the modules and the general site interactivity. 

 The DEEP Rosetta release consists of:

  • 10 products distributed via 22 software packages and tarballs supporting the CentOS 7, Ubuntu 16.04 and 18.04 operating systems.
  • 15 fully containerised ready-to-use models from a variety of domains available at the DEEP Open Catalogue

The release notes can be found here. The full list of products, together with the installation and configuration guides and documentation, can be consulted here.

The EOSC ecosystem

The release components have been built into a consistent and modular suite, always with the aim of being integrated into the EOSC ecosystem. As part of our integration path into the EOSC, we have published four different services and components in the EOSC portal:

  • DEEPaaS training facility: Tools for building, training, testing and evaluating Machine Learning, Artificial Intelligence and Deep Learning models over distributed e-Infrastructures leveraging GPU resources. Models can be built from scratch or from existing, pre-trained models (transfer learning or model reuse).
  • Application composition tool: An easy and user-focused way of building complex application topologies based on the TOSCA open standard, which can be deployed on any e-Infrastructure using the orchestration services.
  • Infrastructure Manager: Open-source service that deploys complex and customised virtual infrastructures on multiple back-ends.
  • PaaS Orchestrator: A service that coordinates the provisioning of virtualized compute and storage resources on distributed cloud infrastructures and the deployment of dockerized services and jobs on Mesos clusters.

Collaboration with Industry

The EOSC Digital Innovation Hub (DIH) is in charge of establishing collaborations between private companies and the public sector to access technological services, research data and human capital. The DEEP project and the DIH have established a collaboration in which DEEP provides services and the DIH seeks SMEs to conduct application pilots. The services included in this agreement are:

  • ML application porting to EOSC technological infrastructure
  • ML implementation best practices
  • AI-enabled services prototyping in EOSC landscape

Get in touch

If you want to get more information about the scientific applications adopting the DEEP-Hybrid-DataCloud solutions, please contact us!

DEEP Marketplace

We are glad to present an open ecosystem to foster the exchange of machine learning modules in the scientific community: the DEEP Marketplace. The catalogue includes all the applications developed in the DEEP Hybrid DataCloud (DEEP) project, as well as external modules developed by users who want to share their applications and solution approaches with the AI community, thus promoting collaboration between research and development (R&D) groups.

The DEEP framework supports artificial intelligence modules, especially compute-intensive deep learning modules, over distributed e-infrastructures in the European Open Science Cloud (EOSC), covering the most compute-intensive phases of the intelligent software development and deployment life cycle. Moreover, the DEEP Open Catalogue (available in the DEEP Marketplace), the DEEP as a Service solution and the DEEP learning facilities are connected with the EOSC storage and data services, which make it easy to share, publish, exchange, collaborate on and deploy the developed models as services. These, in turn, support Open Science, in which transparency and reproducibility of computational work are important.

The DEEP use cases, packaged as deep learning modules, are publicly available as open source in the DEEP Open Catalogue and publicly accessible in the DEEP Marketplace. They are categorized into the following main extensible groups:

  • Earth observations: deep neural network applications that perform pattern recognition on satellite images. They can be combined with other in-situ measurements for ecosystems and biodiversity to perform tasks such as remote object detection, terrain segmentation or meteorological prediction. Currently we offer a super-resolution module to upscale low-resolution bands to high resolution for the most popular multispectral satellites from around the world.
  • Biological and medical science: deep learning modules for biomedical image analysis that have opened new opportunities in how diseases are diagnosed and treated. We currently provide an automated retinopathy classification module based on color fundus retinal photographs.
  • Cyber-security and network monitoring: we provide modules offering complementary functions for Intrusion Detection Systems that supervise the network traffic flows of a computing infrastructure.
  • Citizen science: deep learning modules for leveraging citizen science in large-scale biodiversity monitoring. The available modules include automatic identification of species from images for a wide range of categories (plants, seeds, conus and phytoplankton). Most of these modules are also available as mobile applications.
  • General purpose: this includes modules that can be used across a wide range of domains. We currently provide modules for image classification, audio classification, speech-to-text synthesis and pose detection.

These groups are open, extensible categories, offering a flexible way to add new and interesting applications to the DEEP Marketplace for a wide, public audience.

Figure 1. A snapshot of the application catalogue.

All modules in the DEEP Open Catalogue are categorized by name and by tags such as #tensorflow, #keras or #docker, with full-text search capability coming soon. For every module, a set of metadata entries is provided. The full list of metadata entries includes:

  • Title
  • Summary
    A quick one-liner describing the module.
  • Description
    An extended description of what the module does, what problem it solves, what machine learning tools it uses, which data types it needs for prediction, etc.
  • Tags
    Tags that make the module easier to find. They usually refer to the overall features or tools used in the module.
  • License
    Under what license the module can be used, modified and shared.
  • Creation date
  • “Cite this module” – URL for citation (for modules based on a published paper)
    A URL pointing to the DOI of the paper.
  • “Get this module” – URL for the module’s source code
    This points to the source code used to run the application. If you plan to use the application (especially if you want to retrain it), it is always good to have a look at the README in this repository, as it explains in detail what you need to successfully retrain the application.
  • URL for the module’s Docker image in Dockerhub
    This is the Docker image of the module. It can be hosted in the deephdc organization or somewhere else.
  • URL for the training weights of the module
    This is usually a compressed file with all the information from the training (including the final weights). Together with the module’s code, it should be enough to reproduce any of the model’s functionality.
  • URL pointing to the dataset used for training
    For example, ImageNet for some image classification modules.

Not all modules need to define all the metadata fields. On the module’s page you will also see a small side panel with quick one-liner instructions on how to run the module with Docker and start making predictions: a simple example would be to run the Docker container locally and go to http://127.0.0.1:5000/docs in your local browser to interact with the API.
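
Once the container is running, the same predictions can also be sent from a script instead of the browser. Below is a minimal sketch, assuming the default DEEPaaS port, the V2 predict endpoint, and a hypothetical module name and input file:

    # Send a prediction request to a locally running module (names and paths illustrative).
    import requests

    url = "http://127.0.0.1:5000/v2/models/mymodule/predict/"   # hypothetical module name
    with open("example.jpg", "rb") as f:
        result = requests.post(url, files={"data": f}).json()
    print(result)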

As you can see, we provide all the tools to make the code exportable to any platform, as the code is completely open source and not linked to the DEEP infrastructure in any way. Any user can access the code of any application and modify it freely.

As explained in the documentation, modules in the Marketplace are intended for two purposes:

  • So that a user can use a module “as is” for prediction.
  • So that a user can fine-tune the module for a specific task. For example, users can reuse the image classification module to train a classifier for x-ray scans.

Sometimes users will find that their task requires code that is not covered by any of the available modules’ functionalities. In that case, users can develop their own custom code from scratch using the DEEP template if they want to, or start from another module’s code, as all modules have open source licenses. Once users have developed their new module to perform the desired task, they can request to upload it to the Marketplace to share it with the community. For detailed, up-to-date instructions on all these steps, please visit the DEEP documentation.

So, with all of this, we have covered the basics of the Marketplace:

  • How to browse for modules
  • How to browse the metadata inside the module
  • How modules are structured: module’s code, container’s code, weights, etc.
  • How a user can deploy the modules, either for prediction or for retraining
  • How users can develop their own modules to perform a task that is not available in the Marketplace. These new modules can then be shared with the whole community.