Methodology

The DEEP Hybrid DataCloud project is structured into six work packages, covering Networking Activities (NA), devoted to coordination, communication and community liaison; Service Activities (SA), focused on provisioning the services and resources needed to execute the data analysis challenges; and Joint Research Activities (JRA), dealing with the development of new components and technologies to support data analysis. Figure 7 describes the interaction between the different work packages.

This work package will perform the global oversight of the activities carried out within the project, ensuring that they are aligned with the DEEP Hybrid DataCloud work programme. WP1 will also coordinate the consortium management through the governance structure described in Section 3.2, including promoting adequate interaction among the work packages through the steering committee.
WP1 will also monitor sex/gender issues. During the project preparatory phase, an initial exploration of potential issues was carried out, including a review of the ideas from the gendered innovation documentation 29. In agreement with the wide experience of the consortium (which includes leading female researchers in several areas), no specific issues were identified to be addressed within the project. However, feedback from user communities and developers will be used to assess potential gender-related issues within the project.

Communication, dissemination and exploitation activities will be carried out within this work package, including the identification of the key outcomes of the project and their exploitation in the context of the European Open Science Cloud (EOSC).

This work package is responsible for the definition and correct understanding of the pilot usage scenarios with regard to the project’s technical architecture. This task will craft and propose an architecture that is applicable to the identified applications.

Moreover, NA2 will interact with SA1 and JRA3 to ensure that the delivered outcomes are aligned with the expectations of the user communities, comply with the proposed scenarios and are validated against the user applications.

The service activities of the project will be supported by WP3, which will guarantee that the project pilot testbeds are correctly integrated with other state-of-the-art e-Infrastructures and services from the European Open Science Cloud (EOSC), so that the project can easily exploit their services.

Moreover, this work package will supervise the software development within the project, providing a continuous software improvement process that will involve quality assurance activities, software release management, maintenance and support. These activities will ensure that the delivered software and services reach the expected TRL by the end of the project, raising it to production grade (TRL8) whenever applicable. This know-how will be extended to the developers of research applications, to improve the final quality of their solutions.

This key research activity will be carried out close to the hardware and infrastructure, addressing the gaps that currently exist in the support of accelerators (such as GPUs), specialized hardware (such as low-latency interconnects) and HPC systems in general. This task will ensure that the adopted solution delivers bare-metal-like performance, and that the resources can be shared in multi-tenant environments.

Proper interfaces will be exposed to the upper layers, from the virtualization layer to the cloud management framework to the platform layer. On top of that, high-level access to HPC resources will be investigated, providing seamless access and data sharing from Cloud infrastructures.

WP5 will take care of provisioning the platform, exploiting the outcomes from JRA1 in a hybrid approach and delivering an execution platform for JRA2, ensuring that applications can be spawned across several cloud infrastructures. This will be done by enhancing the current orchestration components (especially those from the INDIGO-DataCloud project) to enable fine-grained multi-site orchestration.

Moreover, we will work on providing secure network interconnects between the different sites participating in this multi-cloud deployment, transparently for the user, so that all the provisioned resources can communicate with each other as if they were on the same network segment. This is important when users have their data available on one infrastructure while requiring access to hardware or resources not available in that infrastructure. By using a hybrid approach we will be able to access data transparently, as if it were stored in the same cloud.

This activity focuses on bridging the outcomes of NA2, JRA1 and JRA2 so as to deliver the final solution to the users in the form of a DEEP as a Service solution. This service will ensure that scientists have an easy way to deploy and execute their compute-intensive applications based on containers (from NA2), which will be executed in a hybrid cloud platform (JRA2), exploiting the specialized hardware that their application requires (JRA1). This will be done by composing a set of defined building blocks that will model the user application. Secondly, this activity aims to make it possible to deploy these applications as services that can be offered to final users, as a way to deliver scientific results to a wider range of stakeholders.
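
As an illustration only, the composition of building blocks described above could be sketched as follows. This is a minimal sketch under our own assumptions, not the project's actual data model: the class names (`BuildingBlock`, `HardwareRequirement`, `Application`, `Site`), the helper `candidate_sites` and the container image names are all hypothetical, introduced purely to show how an application, its containers and its hardware needs might be related.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class HardwareRequirement:
    """Specialized hardware a block needs (hypothetical model of the JRA1 concern)."""
    gpus: int = 0
    low_latency_interconnect: bool = False


@dataclass(frozen=True)
class BuildingBlock:
    """One containerized component of a user application (hypothetical)."""
    name: str
    container_image: str  # illustrative image reference, not a real registry entry
    hardware: HardwareRequirement = field(default_factory=HardwareRequirement)


@dataclass
class Application:
    """A user application modelled as a composition of building blocks."""
    name: str
    blocks: list[BuildingBlock] = field(default_factory=list)

    def total_gpus(self) -> int:
        # Sum the GPU demand over every block in the application.
        return sum(block.hardware.gpus for block in self.blocks)


@dataclass(frozen=True)
class Site:
    """A cloud site taking part in the hybrid deployment (hypothetical)."""
    name: str
    free_gpus: int
    low_latency_interconnect: bool = False


def candidate_sites(block: BuildingBlock, sites: list[Site]) -> list[str]:
    """Return the names of the sites whose resources satisfy the block's needs."""
    return [
        site.name
        for site in sites
        if site.free_gpus >= block.hardware.gpus
        and (site.low_latency_interconnect
             or not block.hardware.low_latency_interconnect)
    ]


# Usage: a two-block application placed over two sites.
training = BuildingBlock("training", "example/training:latest",
                         HardwareRequirement(gpus=2))
serving = BuildingBlock("serving", "example/serving:latest")
app = Application("demo", [training, serving])
sites = [Site("site-a", free_gpus=0), Site("site-b", free_gpus=4)]

print(app.total_gpus())
print(candidate_sites(training, sites))
print(candidate_sites(serving, sites))
```

In this sketch the block needing GPUs can only be placed on the site that offers them, while the block with no special requirements can run on either site, which is the kind of decision the multi-site orchestration is expected to take on behalf of the user.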

Relationship and interaction with existing operational services

The project will rely on existing components and services provided by other e-Infrastructures, in which most of the partners participate. Regarding authentication and authorization, we will leverage the AAI systems provided by the EOSC e-Infrastructure. We will follow standard solutions, best practices and recommendations (like those from the AARC and AARC-2 projects) to continue the path towards seamless and interoperable AAI solutions.

It is important to highlight that the DEEP Hybrid DataCloud project focuses on services for exploiting extremely large datasets from a computational perspective. Regarding data management activities (like data staging, data caching, data sharing, etc.) we will rely on, consume and integrate the solutions delivered by other EINFRA-12 and EINFRA-21 funded projects and particularly by the EOSC e-Infrastructure (such as the EGI DataHub or EUDAT B2 services). The services developed will be made available through an on-line catalogue of services, which will be integrated into the EOSC catalogue of services following the defined procedures. This integration will respond to the requests of the different research communities and to the potential interest of other research communities, innovation companies, educational institutions or citizen science.