The DEEP Open Catalog provides ready-to-use modules for Artificial Intelligence, Machine Learning and Deep Learning models that can be executed on a wide variety of computing platforms. These include local laptops, production servers, supercomputers and e-infrastructures supporting the DEEP Hybrid-DataCloud software stack.
The versatility of the DEEPaaS component, which provides a REST API to serve machine learning and deep learning models, has made it possible to add the functionality required to perform the prediction phase from the command-line interface. This is needed for the batch execution of prediction jobs, for example on Local Resource Management Systems (LRMS) such as SLURM within a cluster of PCs. It makes it possible, for instance, to classify thousands of audio files with the Audio Classifier module from the DEEP Open Catalog in an unattended manner.
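As an illustration, such unattended batch prediction can be driven by a small script that submits each file to a running DEEPaaS instance. The sketch below follows the shape of the DEEPaaS V2 REST API, but the host, model name and input field name are placeholders to be adapted to the module actually deployed:

```python
# Minimal sketch: batch prediction against a running DEEPaaS instance.
# The host, model name and "data" field below are illustrative; check the
# API documentation of the deployed module for the exact names.
import glob

import requests

DEEPAAS_URL = "http://localhost:5000"   # where the DEEPaaS API is listening
MODEL = "audio-classification"          # hypothetical model name

for path in sorted(glob.glob("audio/*.wav")):
    with open(path, "rb") as f:
        resp = requests.post(
            f"{DEEPAAS_URL}/v2/models/{MODEL}/predict/",
            files={"data": f},
            timeout=300,
        )
    resp.raise_for_status()
    print(path, resp.json())
```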
We wanted to determine how easy it would be to run these modules on a public Cloud provider such as Amazon Web Services. Indeed, existing services such as AWS Batch can deploy elastic virtual clusters, even with GPU support, that execute Docker-based jobs and can auto-scale to zero in order to support a pay-per-usage approach.
To this end, we used the open-source SCAR tool, which makes it possible to create highly parallel, event-driven, file-processing serverless applications that execute both on customised runtime environments in AWS Lambda and in AWS Batch compute environments. As can be seen in the newly added deep-audio use case for SCAR, this tool uses a YAML file to describe the job to be executed (in this case based on the deephdc/deep-oc-audio-classification-tf Docker image on Docker Hub).
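A job description of this kind might look roughly as follows. This is an illustrative sketch rather than a copy of the actual file: the function name, bucket paths and exact schema are assumptions, and the deep-audio example in the SCAR repository is the authoritative reference for the precise layout supported by each SCAR version:

```yaml
functions:
  aws:
  - lambda:
      name: scar-deep-audio            # hypothetical function name
      container:
        image: deephdc/deep-oc-audio-classification-tf
      # Delegate the actual processing to an AWS Batch compute environment
      execution_mode: batch
      input:
      - storage_provider: s3
        path: scar-deep-audio/input    # hypothetical bucket/folder to watch
      output:
      - storage_provider: s3
        path: scar-deep-audio/output   # where prediction results are stored
```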
The following figure shows the dashboard of the developed service. It consists of a web-based application that gives the DEEP user community seamless access to the service. The service allows a model to be selected from the DEEP Open Catalog (among those integrated so far) so that, whenever a new file is uploaded, the prediction phase of the model is triggered with this file as input. This is executed on a dynamically provisioned cluster of machines that can leverage both CPUs and GPUs (if the model supports this feature). Additional computing nodes are added when many files are pending to be processed, and the virtual cluster auto-scales to zero once all the files have been processed, thus providing seamless event-driven prediction for DEEP models.
The following figure summarises the architecture of the service. The web service has been integrated with the DEEP IAM via Amazon Cognito’s Federated Identities in order to provide easy access for existing DEEP users. Uploading a file to the Amazon S3 bucket triggers the execution of an AWS Lambda function (created through SCAR) that automatically converts the request into a Batch job submitted to a specific compute environment, which triggers the deployment of additional virtual machines to perform the processing (prediction) of the files.
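From the user's side, kicking off the whole pipeline amounts to uploading an object to the input bucket. The snippet below is a minimal sketch using boto3; the bucket and key names are placeholders matching the hypothetical configuration above:

```python
# Minimal sketch: uploading an audio file to the S3 input location.
# Creating the object is what fires the SCAR-created Lambda function,
# which in turn submits the corresponding AWS Batch job.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "sample.wav",          # local file to classify
    "scar-deep-audio",     # hypothetical input bucket
    "input/sample.wav",    # key under the folder the service watches
)
```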
In summary, the flexibility of DEEPaaS and the availability of pre-trained modules in the DEEP Open Catalog have facilitated the highly scalable execution of these models on a public Cloud provider such as Amazon Web Services. SCAR can provide serverless computing for scientific applications that run directly on AWS Lambda. However, the large size of the Docker images in the DEEP Open Catalog required supporting their event-driven execution on a more resource-intensive computing service, in this case AWS Batch.