Exploring the usage of Topic Modeling for Android Malware Static Analysis

posted Jun 3, 2016, 10:08 AM by Eric Medvet   [ updated Dec 19, 2016, 1:15 AM ]
The rapid growth in smartphone and tablet usage over the last years has led to the inevitable rise in targeting of these devices by cyber-criminals. The exponential growth of Android devices, and the buoyant and largely unregulated Android app market, produced a sharp rise in malware targeting that platform. Furthermore, malware writers have been developing detection-evasion techniques which rapidly make anti-malware technologies ineffective. It is hence advisable that security expert are provided with tools which can aid them in the analysis of existing and new Android malware.
In this paper, we explore the use of topic modeling as a technique which can assist experts to analyse malware applications in order to discover their characteristic. We apply Latend Dirichlet Allocation (LDA) to mobile applications represented as opcode sequences, hence considering a topic as a discrete distribution of opcode. Our experiments on a dataset of 900 malware applications of different families show that the information provided by topic modeling may help in better understanding malware characteristics and similarities.
Eric Medvet,
Dec 19, 2016, 2:13 AM