The Spanish National Research Council (CSIC), as coordinator of the DEEP-Hybrid-DataCloud project, has organized, together with the eXtreme-DataCloud project, the “New challenges in Data Science” workshop as part of the Advanced Summer Courses offered by the prestigious Universidad Internacional Menéndez Pelayo (UIMP). The workshop took place from June 18th to 22nd at the UIMP venue in the Palacio de la Magdalena, Santander, with the participation of more than 20 experts and students from across Europe.
The objective of the course was to review and discuss current research trends and European initiatives on infrastructure support for compute-intensive data analytics over massive amounts of data, with special emphasis on Deep Learning running on High Performance Computing (HPC) and hybrid Cloud Platforms.
The first half of the course opened with an introductory session by Fernando Aguilar (CSIC), entitled “Understanding Researchers Requirements: a Data Science perspective”, which framed the discussion held over the next two days. Scientists from the Italian National Institute for Nuclear Physics (INFN), the German Electron Synchrotron (DESY), the Annecy-le-Vieux Particle Physics Laboratory (LAPP, France), CSIC and the European Clinical Research Infrastructure Network (ECRIN) analyzed use cases from several areas, such as Astrophysics and Particle Physics, Bioinformatics and Biodiversity, in order to understand the present and future challenges these scientific fields will have to tackle in the coming years. This first part of the course closed with a joint wrap-up and conclusions session by Daniele Cesini and Alessandro Costantini, both from INFN and coordinators of the eXtreme-DataCloud project.
Wednesday 20th began with an introduction to the European Open Science Cloud (EOSC) by Isabel Campos Plasencia (CSIC, member of the European High Level Expert Group for the EOSC), followed by a presentation by Giacinto Donvito (INFN) on the catalogue of services and rules of engagement of EOSC-hub, an EU project implementing a service hub for the EOSC. The morning session concluded with Pablo Orviz (CSIC) presenting software quality procedures and trends in the EOSC.
The last part of the course started on Wednesday afternoon and focused on practical deployments and implementations of the tools needed to perform this kind of massive data processing on top of cloud infrastructures. Wolfgang zu Castell, from the Helmholtz Zentrum Muenchen, focused on current deep learning techniques and on the gaps that current e-Infrastructures still need to close for these techniques to be exploited effectively.
Thursday 21st started with the DEEP-Hybrid-DataCloud approach to deploying advanced services over hybrid clouds. First, the project architecture was described and demonstrated by Álvaro Lopez (CSIC, project co-coordinator), followed by Andy S. Alic (Universitat Politècnica de València, UPV), who described how complex applications and services can be composed graphically in order to be deployed over hybrid clouds. The day closed with an overview of the High Performance Computing landscape and of how to exploit these resources effectively using advanced computing techniques such as containerization, a session given by Jorge Gomes from the Portuguese Laboratory of Instrumentation and Experimental Particle Physics (LIP).
The workshop concluded on Friday with two general sessions. First, a debate on data science ethics took place, moderated by Steve Canham from ECRIN, in which participants raised interesting questions about the ethics of data, data science, privacy and security. The final wrap-up and conclusions session was led by Jesús Marco, coordinator of the DEEP-Hybrid-DataCloud project and Director of the course, and laid the groundwork for future work in the data science area. The official closing ceremony of the event was conducted by Miguel Ángel Casermeiro, General Secretary of the UIMP.