Introduction
Malware (malicious software) refers to various forms of hostile or intrusive software, such as viruses, worms, backdoors, spyware, Trojan horses, and rootkits (Malware Definition 2019). Malware is usually designed to attack computers, for example by damaging the file system, stealing information, or executing other undesirable or illegal actions. Smart mobile devices are now widely used because of their portability and convenience. A recent report by StatCounter shows that about 42% of mobile devices run the Android system. The popularity of Android has also led to the wide spread of Android malware (Aafer et al. 2013). These malicious applications are mainly distributed through third-party markets, but even the Google Android Market cannot guarantee that all of its listed applications are threat free. Android malware exposes users to serious security threats, including phishing, banking Trojans, spyware, bots, root exploits, SMS fraud, and premium dialers.
To protect the security of the Android system, various malware detection methods have been proposed. Currently, one of the most widely used is the signature-based method. It requires experts to define malware signatures manually, and malware is detected by searching applications for those signatures. Its disadvantage is that malware writers can use obfuscation techniques to change the signature of a malware sample, so the malware can easily evade the detection engine.
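As a minimal sketch of this weakness, the following Python example scans raw application bytes for fixed byte patterns and then applies a toy single-byte XOR transformation to the same sample. The signature names, byte patterns, and the helper functions `scan` and `xor_obfuscate` are hypothetical and chosen only for illustration. Real detection engines and real obfuscators are considerably more sophisticated.

```python
from typing import List

# Hypothetical signature database: name -> byte pattern (invented for illustration).
SIGNATURES = {
    "Demo.SmsFraud": b"sendTextMessage:premium",
    "Demo.RootExploit": b"exploit_payload_v1",
}


def scan(app_bytes: bytes) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the bytes."""
    return [name for name, pattern in SIGNATURES.items() if pattern in app_bytes]


def xor_obfuscate(data: bytes, key: int) -> bytes:
    """Toy obfuscation (an assumption, not a real packer): XOR each byte with a key."""
    return bytes(b ^ key for b in data)


# A made-up "application" that embeds one of the known patterns.
original = b"header..." + b"exploit_payload_v1" + b"...footer"
obfuscated = xor_obfuscate(original, 0x5A)

print(scan(original))    # the plain sample matches a signature
print(scan(obfuscated))  # the transformed sample no longer matches
```

Because the scanner compares exact byte sequences, any transformation that changes those bytes while preserving behavior after unpacking is enough to make the match fail.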
To address this problem, intelligent malware detection methods have been proposed. These methods use machine learning to detect malware. Machine learning algorithms can learn hidden patterns of malware that generalize well and can discriminate malicious applications from benign ones. In general, traditional machine learning based detection methods rely on decompilation or dynamic monitoring to analyze malware and then require a hand-designed feature representation. This procedure is time consuming and depends strongly on the skills of experts. Furthermore, malware can hide its malicious behavior when it detects that it is running in a monitored environment. Malware can also be packed or encrypted to defeat decompiling tools (Lindorfer et al. 2012). In these cases it is hard to obtain a feature representation of the malware.
To solve this...