We are proud to announce that a research paper developed under the Deep-Hybrid-DataCloud project has been accepted by the journal Data and Knowledge Engineering (Elsevier) and is available online as of 12 March 2018.

Title: A heuristics approach to mine behavioural data logs in mobile malware detection system

Authors: Giang Nguyen a, Binh Nguyen b, Dang Tran b, Ladislav Hluchy a

a Institute of Informatics, Slovak Academy of Sciences, Dubravska cesta 9, 845 07 Bratislava, Slovakia

b School of Information and Communication Technology, Hanoi University of Science and Technology, Vietnam

ABSTRACT

Nowadays, in the era of the Internet of Things, when everything is connected via the Internet, the number of mobile devices has risen exponentially to billions around the world. In line with this increase, the volume of data generated is enormous and has attracted malefactors who do ill deeds to others. For hackers, one of the popular threats to mobile devices is to spread malware. These actions are very difficult to prevent because the application installation and configuration rights are set by owners, who usually have very little knowledge of security or do not care about it. In this study, our aim is to improve security in the environment of mobile devices by proposing a novel system to detect malware intrusions automatically. Our solution is based on modelling user behaviours and applying the heuristic analysis approach to mobile logs generated during the device operation process. Although behaviours of individual users have a significant impact on social cyber-security, achieving user awareness still remains one of the major challenges today. For this task, a light-weight semantic formalization is proposed in the form of a physical and logical taxonomy for classifying the collected raw log data. Then a set of techniques, such as sliding windows, lemmatization, feature selection and term weighting, is used to process the data. Meanwhile, malware detection tasks are performed based on incremental machine learning mechanisms, because of the potential complexity of these tasks. The solution is developed in a scalable manner, with several blocks that cover pre-processing raw logs collected from mobile devices, automatically creating datasets for machine learning methods, using the best selected model for detecting suspicious activity surrounding malware intrusions, and supporting decision making using a predictive risk factor. We experimented carefully with the proposal, and the achieved test results confirm the effectiveness and feasibility of the proposed system when applied to a large-scale mobile environment.
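
For readers unfamiliar with the pre-processing steps named in the abstract, the following is a minimal sketch of two of them: sliding windows over a log stream and term weighting. It is not the paper's implementation. The event names, the window size and step, and the choice of plain TF-IDF as the weighting scheme are all assumptions made for illustration; the abstract does not say which weighting scheme the authors use.

```python
import math
from collections import Counter


def sliding_windows(events, size, step):
    """Yield windows of up to `size` events, advancing by `step` each time.

    If there are fewer than `size` events, a single shorter window is yielded.
    """
    for start in range(0, max(len(events) - size, 0) + 1, step):
        yield events[start:start + size]


def tf_idf(windows):
    """Return one {term: weight} dict per window, using plain TF-IDF.

    TF is the relative frequency of a term inside its window; IDF is
    log(number of windows / number of windows containing the term).
    """
    n = len(windows)
    doc_freq = Counter(term for w in windows for term in set(w))
    weighted = []
    for w in windows:
        counts = Counter(w)
        weighted.append({
            term: (count / len(w)) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Hypothetical, already-normalised log events (not taken from the paper).
events = ["screen_on", "app_open", "net_connect", "sms_send",
          "net_connect", "sms_send", "sms_send", "screen_off"]

windows = list(sliding_windows(events, size=4, step=2))
weights = tf_idf(windows)

for window, weight in zip(windows, weights):
    print(window, weight)
```

In this toy input, events that occur in every window receive a weight of zero, while events confined to a single window stand out. Vectors of this kind could then be passed to an incremental learner, which is the role the abstract assigns to its machine learning block.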