New publication: “A Cloud-Based Framework for Machine Learning Workloads and Applications”

We are thrilled to announce that we have published a new paper entitled “A Cloud-Based Framework for Machine Learning Workloads and Applications” in IEEE Access, available on IEEE Xplore.

The paper, which is published as Open Access, can be downloaded via its DOI: 10.1109/ACCESS.2020.2964386. It is authored by Álvaro López García, Jesús Marco de Lucas, Marica Antonacci, Wolfgang zu Castell, Mario David, Marcus Hardt, Lara Lloret Iglesias, Germán Moltó, Marcin Plociennik, Viet Tran, Andy S. Alic, Miguel Caballer, Isabel Campos Plasencia, Alessandro Costantini, Stefan Dlugolinsky, Doina Cristina Duma, Giacinto Donvito, Jorge Gomes, Ignacio Heredia Cacha, Keiichi Ito, Valentin Y. Kozlov, Giang Nguyen, Pablo Orviz Fernández, Zdeněk Sustr and Pawel Wolniewicz, from the Institute of Physics of Cantabria (CSIC-UC), the Laboratory of Instrumentation and Experimental Particle Physics (Lisbon), INFN Bari, INFN CNAF, the Poznan Supercomputing and Networking Center, the Karlsruhe Institute of Technology, the Instituto de Instrumentación para la Imagen Molecular (i3M) of the Universitat Politècnica de València, the Institute of Informatics of the Slovak Academy of Sciences (IISAS), Helmholtz Zentrum München and CESNET.

Abstract: In this paper we propose a distributed architecture that provides machine learning practitioners with a set of tools and cloud services covering the whole machine learning development cycle: from model creation, training, validation, and testing to serving models as a service, sharing, and publication. In this respect, the DEEP-Hybrid-DataCloud framework allows transparent access to existing e-Infrastructures, effectively exploiting distributed resources for the most compute-intensive tasks of the machine learning development cycle. Moreover, it provides scientists with a set of Cloud-oriented services to make their models publicly available, adopting a serverless architecture and a DevOps approach that allow the developed models to be easily shared, published, and deployed.
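The “serving models as a service” idea described above can be sketched with a minimal, self-contained example using only the Python standard library. The endpoint path, payload shape, and the trivial stand-in model are our own illustrative assumptions, not the DEEP-Hybrid-DataCloud API:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model: a trivial averaging scorer."""
    return sum(features) / len(features)

class ModelHandler(BaseHTTPRequestHandler):
    """Exposes the model behind a plain JSON-over-HTTP endpoint."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        payload = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)
    def log_message(self, *args):  # keep the demo quiet
        pass

# Port 0 asks the OS for any free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), ModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"features": [1.0, 2.0, 3.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
server.shutdown()
print(result)
```

In the actual framework this role is played by serverless cloud services rather than a hand-rolled HTTP server; the sketch only illustrates the request/response pattern of a model served as a service.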

New publication: “An information-centric approach for slice monitoring from edge devices to clouds”

We are thrilled to announce that we have published a new paper entitled “An information-centric approach for slice monitoring from edge devices to clouds” in Elsevier’s Procedia Computer Science.

The paper, which is published as Open Access, can be downloaded via its DOI: 10.1016/j.procs.2018.04.046. It is authored by Binh Minh Nguyen, Huan Phan, Duong Quang Ha and Giang Nguyen, from the Institute of Informatics, Slovak Academy of Sciences (IISAS) and the School of Information and Communication Technology, Hanoi University of Science and Technology.

Abstract: The Internet of Things (IoT) has enabled physical devices and virtual objects to be connected so that they can share data, coordinate, and automatically make smart decisions to serve people. Recently, many IoT resource slicing studies have been proposed that allow devices, IoT platforms, network functions, and clouds to be managed under a single unified programming interface. Although they have helped IoT developers create IoT services more easily, these efforts still have not dealt with the monitoring problem for the slice components. This can cause an issue: thing states cannot be tracked continuously, and hence the effectiveness of controlling the IoT components is decreased significantly because of the lack of updated information. In this paper, we introduce an information-centric approach to the multi-source monitoring problem in IoT. The proposed model is designed to provide a generic and extensible data format for diverse IoT objects. Through this model, IoT developers can build smart services smoothly without worrying about the diversity and heterogeneity of the collected data. We also propose an overall monitoring architecture for deploying the information-centric model in IoT environments, together with a prototype of its monitoring API. The paper also presents our experiments and evaluations, which demonstrate the feasibility of the proposals in practice.
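The “generic and extensible data format for diverse IoT objects” can be illustrated with a small sketch. The field names and record shape below are our own assumptions for illustration, not the paper’s actual schema:

```python
import json
import time

def make_monitoring_record(source_id, source_type, metrics, extensions=None):
    """Wrap heterogeneous IoT measurements in one generic envelope.

    `metrics` is a free-form dict, so edge devices, IoT platforms,
    network functions, and clouds can all report through the same API.
    """
    record = {
        "source_id": source_id,      # unique id of the monitored slice component
        "source_type": source_type,  # e.g. "edge-device", "vm", "network-function"
        "timestamp": time.time(),    # collection time (epoch seconds)
        "metrics": dict(metrics),    # heterogeneous payload, schema-free
    }
    if extensions:                   # extensible: extra fields do not break consumers
        record["extensions"] = dict(extensions)
    return record

# Two very different sources, one uniform envelope:
edge = make_monitoring_record("sensor-42", "edge-device", {"temperature_c": 21.5})
vm = make_monitoring_record("vm-7", "vm", {"cpu_pct": 63.0, "mem_mb": 2048})

print(json.dumps(edge))
print(json.dumps(vm))
```

Because every source reports through the same envelope, a monitoring consumer can process slice components from edge devices to clouds without per-source parsing logic.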

New publication: “A multivariate fuzzy time series resource forecast model for clouds using LSTM and data correlation analysis”

We are thrilled to announce that we have published a new paper entitled “A multivariate fuzzy time series resource forecast model for clouds using LSTM and data correlation analysis” in Elsevier’s Procedia Computer Science.

The paper, which is published as Open Access, can be downloaded via its DOI: 10.1016/j.procs.2018.07.298. It is authored by Nhuan Tran, Thang Nguyen, Binh Minh Nguyen and Giang Nguyen, from the Institute of Informatics, Slovak Academy of Sciences (IISAS) and the School of Information and Communication Technology, Hanoi University of Science and Technology.

Abstract: Today, almost all clouds only offer auto-scaling functions based on resource usage thresholds defined by users. Meanwhile, prediction-based auto-scaling functions for clouds still suffer from inaccurate forecasts during operation in practice, even though such functions only deal with univariate monitoring data. Until now, there have been very few efforts to process multiple metrics simultaneously in order to predict resource utilization. The motivation for this multivariate processing is that there may be correlations among metrics, and these have to be examined in order to increase the model’s applicability in practice. In this paper, we build a novel forecast model for cloud proactive auto-scaling systems by combining several mechanisms. In the data preprocessing phase, we exploit a fuzzification technique to reduce the fluctuation of monitoring data. We evaluate the correlations between different metrics to select suitable data types as inputs for the prediction model. In addition, a long short-term memory (LSTM) neural network is employed to predict resource consumption from multivariate time series data. Our model is thus called multivariate fuzzy LSTM (MF-LSTM). The proposed system is tested on Google trace data to prove its efficiency and feasibility when applied to clouds.
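The two preprocessing steps described above (fuzzifying noisy monitoring data, then selecting correlated metrics as forecaster inputs) can be sketched in plain Python. The fuzzy level count, correlation threshold, and sample series below are illustrative assumptions, not the paper’s parameters or data:

```python
import math

def fuzzify(series, n_levels=5):
    """Map raw monitoring values onto a few fuzzy levels, smoothing
    short-term fluctuations before forecasting."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_levels or 1.0
    return [min(int((v - lo) / width), n_levels - 1) for v in series]

def pearson(xs, ys):
    """Plain Pearson correlation, used to decide which metrics to
    feed into the multivariate forecaster together."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

cpu = [10.0, 12.0, 55.0, 60.0, 58.0, 15.0]
mem = [20.0, 22.0, 70.0, 75.0, 72.0, 25.0]   # tracks CPU closely
disk = [5.0, 80.0, 6.0, 79.0, 7.0, 81.0]     # unrelated pattern

print(fuzzify(cpu))  # coarse fuzzy levels instead of raw values

# Keep only metrics strongly correlated with the forecast target (CPU):
selected = [name for name, s in [("mem", mem), ("disk", disk)]
            if abs(pearson(cpu, s)) > 0.8]
print(selected)
```

In the paper the selected multivariate series are then fed to the LSTM network; the sketch stops before that step, since a runnable LSTM would need a deep learning framework.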

New publication: “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey”

We are thrilled to announce that we have published a new paper entitled “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey” in Springer’s Artificial Intelligence Review journal.

The paper, which is published as Open Access, can be downloaded via its DOI: 10.1007/s10462-018-09679-z. It is authored by Giang Nguyen, Stefan Dlugolinsky, Martin Bobák, Viet Tran, Álvaro López García, Ignacio Heredia, Peter Malík and Ladislav Hluchý, from the Institute of Informatics, Slovak Academy of Sciences (IISAS) and the Institute of Physics of Cantabria (IFCA, CSIC-UC).

Abstract: The combined impact of new computing resources and techniques, together with an increasing avalanche of large datasets, is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, their implementations through frameworks and libraries are extensive and growing too. Software development in this field is fast-paced, with a large amount of open-source software coming from academia, industry, start-ups, and wider open-source communities. This survey presents a comprehensive overview of a recent time slice, with comparisons as well as trends in the development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.

DEEP-Hybrid-DataCloud accepted paper at the 22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES2018), September 3-5, 2018, Belgrade, Serbia

We are proud to announce that a research paper developed under the DEEP-Hybrid-DataCloud project has been accepted for inclusion in the 22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES2018), to be held on 3-5 September 2018 in Belgrade, Serbia. The paper will be published online by Elsevier Science in the open-access Procedia Computer Science series.

Title: A multivariate fuzzy time series resource forecast model for clouds using LSTM and data correlation analysis

Authors: Nhuan Tran (a), Thang Nguyen (a), Binh Minh Nguyen (a), Giang Nguyen (b)

(a) School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam

(b) Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia

Abstract

Today, almost all clouds only offer auto-scaling functions based on resource usage thresholds defined by users. Meanwhile, prediction-based auto-scaling functions for clouds still suffer from inaccurate forecasts during operation in practice, even though such functions only deal with univariate monitoring data. Until now, there have been very few efforts to process multiple metrics simultaneously in order to predict resource utilization. The motivation for this multivariate processing is that there may be correlations among metrics, and these have to be examined in order to increase the model’s applicability in practice. In this paper, we build a novel forecast model for cloud proactive auto-scaling systems by combining several mechanisms. In the data preprocessing phase, we exploit a fuzzification technique to reduce the fluctuation of monitoring data. We evaluate the correlations between different metrics to select suitable data types as inputs for the prediction model. In addition, a long short-term memory (LSTM) neural network is employed to predict resource consumption from multivariate time series data. Our model is thus called multivariate fuzzy LSTM (MF-LSTM). The proposed system is tested on Google trace data to prove its efficiency and feasibility when applied to clouds.

DEEP-Hybrid-DataCloud accepted paper at the 9th International Conference on Ambient Systems, Networks and Technologies, May 8-11, 2018, Porto, Portugal

We are proud to announce that a research paper developed under the DEEP-Hybrid-DataCloud project has been accepted for inclusion in the 9th International Conference on Ambient Systems, Networks and Technologies, to be held on 8-11 May 2018 in Porto, Portugal. The paper will be published online by Elsevier Science in the open-access Procedia Computer Science series.

Title: An Information-centric Approach for Slice Monitoring from Edge Devices to Clouds

Authors: Binh Minh Nguyen (a), Huan Phan (a), Duong Quang Ha (a), Giang Nguyen (b)

(a) School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam

(b) Institute of Informatics, Slovak Academy of Sciences, 845 07 Bratislava, Slovakia

ABSTRACT

The Internet of Things (IoT) has enabled physical devices and virtual objects to be connected so that they can share data, coordinate, and automatically make smart decisions to serve people. Recently, many IoT resource slicing studies have been proposed that allow devices, IoT platforms, network functions, and clouds to be managed under a single unified programming interface. Although they have helped IoT developers create IoT services more easily, these efforts still have not dealt with the monitoring problem for the slice components. This can cause an issue: thing states cannot be tracked continuously, and hence the effectiveness of controlling the IoT components is decreased significantly because of the lack of updated information. In this paper, we introduce an information-centric approach to the multi-source monitoring problem in IoT. The proposed model is designed to provide a generic and extensible data format for diverse IoT objects. Through this model, IoT developers can build smart services smoothly without worrying about the diversity and heterogeneity of the collected data. We also propose an overall monitoring architecture for deploying the information-centric model in IoT environments, together with a prototype of its monitoring API. The paper also presents our experiments and evaluations, which demonstrate the feasibility of the proposals in practice.

DEEP-Hybrid-DataCloud accepted paper in the Data and Knowledge Engineering journal (Elsevier)

We are proud to announce that a research paper developed under the DEEP-Hybrid-DataCloud project has been accepted by the Data and Knowledge Engineering journal (Elsevier), available online since 12 March 2018.

Title: A heuristics approach to mine behavioural data logs in mobile malware detection system

Authors: Giang Nguyen (a), Binh Nguyen (b), Dang Tran (b), Ladislav Hluchý (a)

(a) Institute of Informatics, Slovak Academy of Sciences, Dubravska cesta 9, 845 07 Bratislava, Slovakia

(b) School of Information and Communication Technology, Hanoi University of Science and Technology, Vietnam

ABSTRACT

Nowadays, in the era of the Internet of Things, when everything is connected via the Internet, the number of mobile devices has risen exponentially to billions around the world. In line with this increase, the volume of data generated is enormous and has attracted malefactors who do ill deeds to others. For hackers, one of the popular threats to mobile devices is spreading malware. These attacks are very difficult to prevent because application installation and configuration rights are set by the owners, who usually have very little knowledge of, or do not care about, security. In this study, our aim is to improve security in the mobile device environment by proposing a novel system that detects malware intrusions automatically. Our solution is based on modelling user behaviours and applying a heuristic analysis approach to the mobile logs generated during device operation. Although the behaviour of individual users has a significant impact on social cyber-security, achieving user awareness still remains one of the major challenges today. For this task, we propose a lightweight semantic formalization in the form of a physical and logical taxonomy for classifying the collected raw log data. A set of techniques, such as sliding windows, lemmatization, feature selection, and term weighting, is then used to process the data. Meanwhile, malware detection tasks are performed using incremental machine learning mechanisms, because of the potential complexity of these tasks. The solution is developed in a manner that allows scalability, with several blocks that cover pre-processing the raw logs collected from mobile devices, automatically creating datasets for machine learning methods, using the best selected model for detecting suspicious activity surrounding malware intrusions, and supporting decision making through a predictive risk factor. We experimented carefully with the proposal, and the achieved test results confirm the effectiveness and feasibility of the proposed system when applied to a large-scale mobile environment.
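Two of the log-processing techniques named above, sliding windows and term weighting, can be sketched in plain Python. The event names, window size, and tf-idf weighting scheme below are our own illustrative assumptions, not the paper’s actual dataset or feature pipeline:

```python
import math
from collections import Counter

def sliding_windows(events, size=3, step=1):
    """Cut a stream of log events into fixed-size overlapping windows."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, step)]

def tf_idf(windows):
    """Weight each event type in each window by how distinctive it is:
    frequent-in-window but rare-across-windows events score highest."""
    df = Counter(e for w in windows for e in set(w))  # document frequency
    n = len(windows)
    return [{e: (c / len(w)) * math.log(n / df[e])
             for e, c in Counter(w).items()}
            for w in windows]

# A toy behavioural log: mostly routine reads, one suspicious SMS event.
log = ["open", "read", "read", "send_sms", "read", "open"]
windows = sliding_windows(log)
weights = tf_idf(windows)
print(weights)
```

Events that occur in every window (like "read" here) are weighted down to zero, while rarer events (like "send_sms") stand out; the resulting feature vectors would then feed the incremental machine learning detectors described in the paper.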