A Comprehensive Survey on Machine Learning Techniques for Android Malware Detection
Abstract
1. Introduction
- Feature ranking and selection, which computes an importance score for each feature and keeps the highest-ranked ones.
- Dimensionality reduction, which transforms the features into a lower-dimensional space to reduce noise and redundancy.
- Ensemble learning, which combines the outputs of multiple base models to improve overall classification performance and can be used in conjunction with either of the two previous techniques.
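As an illustration of the first technique, the following Python sketch scores binary features with information gain (the reduction in class-label entropy obtained by splitting the samples on a feature) and keeps the top k. Information gain is only one of several importance scores found in the literature, and this is not the scoring method of any particular surveyed work; the permission names and the four toy samples are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(column, labels):
    """Entropy reduction obtained by splitting the samples on one binary feature."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == value]
        if subset:  # skip empty branches to avoid dividing by zero
            gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain


def rank_features(rows, labels, names):
    """Return (feature name, score) pairs sorted by descending information gain."""
    scores = [
        (name, information_gain([row[j] for row in rows], labels))
        for j, name in enumerate(names)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def select_top_k(rows, ranking, names, k):
    """Keep only the k highest-ranked feature columns in every sample."""
    keep = [names.index(name) for name, _ in ranking[:k]]
    return [[row[j] for j in keep] for row in rows]


# Hypothetical toy data: 1 = permission requested; label 1 = malware, 0 = benign.
names = ["SEND_SMS", "INTERNET", "READ_CONTACTS"]
rows = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 1, 1]]
labels = [1, 1, 0, 0]

ranking = rank_features(rows, labels, names)
print(ranking)  # [('SEND_SMS', 1.0), ('INTERNET', 0.0), ('READ_CONTACTS', 0.0)]
print(select_top_k(rows, ranking, names, k=1))  # [[1], [1], [0], [0]]
```

A feature that never varies (INTERNET here) or that is independent of the label scores zero, so it is among the first to be discarded.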
- Provides a detailed mapping of the contemporary ML techniques for Android malware detection proposed in the literature during the last 7 years, namely from 2014 to 2021.
- Categorizes each contribution based on four distinct criteria, i.e., the chosen metrics, dataset age, classification models, and performance improvement techniques.
- Introduces a convergent decision-making scheme to guide future work in this ecosystem.
2. Relevant Surveys
3. Literature Survey
- The analysis type, namely static, dynamic, or hybrid.
- The feature extraction method, namely Manifest Analysis (MA), Source Code Analysis (CA), Network Traffic Analysis (NTA), Code Instrumentation (CI), System Calls Analysis (SCA), System Resources Analysis (SRA), and User Interaction Analysis (UIA).
- The features collected, as listed in Table 1.
- The classification approach, i.e., base models and possible performance improvement techniques, including Feature Importance (FI) metrics, Dimensionality Reduction (DR), and Ensemble Learning (EL).
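To make manifest analysis (MA) concrete, here is a minimal Python sketch that turns an AndroidManifest.xml into a binary feature vector of requested permissions and intent-filter actions, two of the manifest features in Table 1. It assumes the manifest has already been decoded from the binary XML format stored inside the APK into plain-text XML (for instance with a tool such as apktool); the sample manifest and the four-entry feature vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME_ATTR = f"{{{ANDROID_NS}}}name"


def manifest_features(manifest_xml, vocabulary):
    """Map a decoded manifest to a 0/1 vector aligned with `vocabulary`."""
    root = ET.fromstring(manifest_xml)
    found = set()
    # Requested permissions, e.g. android.permission.SEND_SMS.
    for element in root.iter("uses-permission"):
        found.add(element.get(NAME_ATTR))
    # Actions declared inside intent filters, e.g. BOOT_COMPLETED.
    for element in root.iter("action"):
        found.add(element.get(NAME_ATTR))
    return [1 if feature in found else 0 for feature in vocabulary]


# Hypothetical decoded manifest of a sample app.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.sample">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

# Hypothetical feature vocabulary (one vector position per entry).
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_CONTACTS",
    "android.permission.SEND_SMS",
    "android.intent.action.BOOT_COMPLETED",
]

print(manifest_features(SAMPLE_MANIFEST, VOCABULARY))  # [1, 0, 1, 1]
```

Because every position corresponds to one vocabulary entry, vectors from many apps can be stacked into the sample matrix that the base models consume.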
4. Discussion
- Ensemble models are considered by 7 of the 19 surveyed works, i.e., roughly one-third.
- Feature importance scores are calculated in 5 works, i.e., about one-fourth.
- Dimensionality reduction techniques are used in only 2 works.
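Ensemble learning can take many forms, such as bagging, boosting, stacking, or voting. The simplest, hard majority voting, is sketched below in Python: each base model casts one label per app and the most frequent label wins. The three prediction lists are made-up toy values chosen so that each model errs on a different app; they illustrate the mechanism and say nothing about the performance reported by the surveyed works.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard-voting ensemble: predictions_per_model[m][i] is model m's label for app i."""
    if len(predictions_per_model) % 2 == 0:
        # With two classes, an odd number of voters guarantees no ties.
        raise ValueError("use an odd number of base models for binary labels")
    combined = []
    for votes in zip(*predictions_per_model):  # all models' votes for one app
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(y_true, y_pred):
    """Fraction of apps whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels: 1 = malware, 0 = benign.
y_true = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]  # misclassifies app 2
model_b = [1, 1, 1, 1, 0]  # misclassifies app 1
model_c = [0, 0, 1, 1, 0]  # misclassifies app 0

ensemble = majority_vote([model_a, model_b, model_c])
for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, accuracy(y_true, pred))
# a 0.8, b 0.8, c 0.8, vote 1.0
```

The gain appears only because the base models make different errors; if they all fail on the same apps, voting cannot correct them.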
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
1. Mobile Threat Report 2020. Available online: https://www.mcafee.com/content/dam/consumer/en-us/docs/2020-Mobile-Threat-Report.pdf (accessed on 10 March 2021).
2. Kouliaridis, V.; Potha, N.; Kambourakis, G. Improving Android Malware Detection Through Dimensionality Reduction Techniques. In Machine Learning for Networking; Springer International Publishing: Paris, France, 2021; pp. 57–72.
3. Bacci, A.; Bartoli, A.; Martinelli, F.; Medvet, E.; Mercaldo, F.; Visaggio, C. Impact of Code Obfuscation on Android Malware Detection based on Static and Dynamic Analysis; Funchal: Madeira, Portugal, 2018; pp. 379–385.
4. Kouliaridis, V.; Kambourakis, G.; Geneiatakis, D.; Potha, N. Two Anatomists Are Better than One—Dual-Level Android Malware Detection. Symmetry 2020, 12, 1128.
5. Petsas, T.; Voyatzis, G.; Athanasopoulos, E.; Polychronakis, M.; Ioannidis, S. Rage against the virtual machine. In Proceedings of the Seventh European Workshop on System Security—EuroSec ’14, Amsterdam, The Netherlands, 13 April 2014; ACM Press: New York, NY, USA, 2014.
6. Roy, S.; DeLoach, J.; Li, Y.; Herndon, N.; Caragea, D.; Ou, X.; Ranganath, V.; Li, H.; Guevara, N. Experimental Study with Real-World Data for Android App Security Analysis Using Machine Learning. In Proceedings of the 31st Annual Computer Security Applications Conference, ACSAC 2015, Los Angeles, CA, USA, 7–11 December 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 81–90.
7. Yan, P.; Yan, Z. A survey on dynamic mobile malware detection. Softw. Qual. J. 2017, 26, 891–919.
8. Odusami, M.; Abayomi-Alli, O.; Misra, S.; Shobayo, O.; Damasevicius, R.; Maskeliunas, R. Android Malware Detection: A Survey. In Communications in Computer and Information Science; Springer International Publishing: New York, NY, USA, 2018; pp. 255–266.
9. Kouliaridis, V.; Barmpatsalou, K.; Kambourakis, G.; Chen, S. A Survey on Mobile Malware Detection Techniques. IEICE Trans. Inf. Syst. 2020, E103.D, 204–211.
10. Liu, K.; Xu, S.; Xu, G.; Zhang, M.; Sun, D.; Liu, H. A Review of Android Malware Detection Approaches Based on Machine Learning. IEEE Access 2020, 8, 124579–124607.
11. Gibert, D.; Mateu, C.; Planes, J. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. J. Netw. Comput. Appl. 2020, 153, 102526.
12. Shabtai, A.; Tenenboim-Chekina, L.; Mimran, D.; Rokach, L.; Shapira, B.; Elovici, Y. Mobile malware detection through analysis of deviations in application network behavior. Comput. Secur. 2014, 43, 1–18.
13. Canfora, G.; Mercaldo, F.; Visaggio, C.A. Mobile Malware Detection using Op-code Frequency Histograms. In Proceedings of the 12th International Conference on Security and Cryptography, SCITEPRESS—Science and Technology Publications, Colmar, France, 20–22 July 2015.
14. Jang, J.; Kang, H.; Woo, J.; Mohaisen, A.; Kim, H.K. Andro-AutoPsy: Anti-malware system based on similarity matching of malware and malware creator-centric information. Digit. Investig. 2015, 14, 17–35.
15. VirusShare. Available online: https://virusshare.com/ (accessed on 10 September 2020).
16. Contagio. Available online: http://contagiominidump.blogspot.com/ (accessed on 10 September 2020).
17. Google Play. Available online: https://play.google.com/ (accessed on 10 September 2020).
18. Yerima, S.Y.; Sezer, S.; Muttik, I. High accuracy android malware detection using ensemble learning. IET Inf. Secur. 2015, 9, 313–320.
19. Coronado-De-Alba, L.D.; Rodríguez-Mota, A.; Escamilla-Ambrosio, P.J. Feature selection and ensemble of classifiers for Android malware detection. In Proceedings of the 2016 8th IEEE Latin-American Conference on Communications (LATINCOM), Medellin, Colombia, 15–17 November 2016; pp. 1–6.
20. Arp, D.; Spreitzenbarth, M.; Huebner, M.; Gascon, H.; Rieck, K. Drebin: Efficient and Explainable Detection of Android Malware in Your Pocket. In Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 23–26 February 2014; Volume 12, p. 1128.
21. Milosevic, N.; Dehghantanha, A.; Choo, K.K.R. Machine learning aided Android malware classification. Comput. Electr. Eng. 2017, 61, 266–274.
22. Damshenas, M.; Dehghantanha, A.; Choo, K.K.; Mahmud, R. M0Droid: An Android Behavioral-Based Malware Detection Model. J. Inf. Priv. Secur. 2015, 11, 141–157.
23. Idrees, F.; Rajarajan, M.; Conti, M.; Chen, T.M.; Rahulamathavan, Y. PIndroid: A novel Android malware detection system using ensemble learning methods. Comput. Secur. 2017, 68, 36–46.
24. Android Malware Genome Project. Available online: http://www.malgenomeproject.org/ (accessed on 11 August 2020).
25. The Zoo Aka Malware DB. Available online: https://thezoo.morirt.com/ (accessed on 11 August 2020).
26. MalShare Project. Available online: https://malshare.com/about.php (accessed on 11 August 2020).
27. Alam, S.; Qu, Z.; Riley, R.; Chen, Y.; Rastogi, V. DroidNative: Automating and optimizing detection of Android native code malware variants. Comput. Secur. 2017, 65, 230–246.
28. Android Runtime (ART) and Dalvik. Available online: https://source.android.com/devices/tech/dalvik (accessed on 10 September 2020).
29. Kouliaridis, V.; Barmpatsalou, K.; Kambourakis, G.; Wang, G. Mal-Warehouse: A Data Collection-as-a-Service of Mobile Malware Behavioral Patterns. In Proceedings of the 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Guangzhou, China, 8–12 October 2018; pp. 1503–1508.
30. Tao, G.; Zheng, Z.; Guo, Z.; Lyu, M.R. MalPat: Mining Patterns of Malicious and Benign Android Apps via Permission-Related APIs. IEEE Trans. Reliab. 2018, 67, 355–369.
31. Shen, F.; Vecchio, J.D.; Mohaisen, A.; Ko, S.Y.; Ziarek, L. Android Malware Detection Using Complex-Flows. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, 5–8 June 2017; pp. 2430–2437.
32. Wang, S.; Chen, Z.; Yan, Q.; Yang, B.; Peng, L.; Jia, Z. A mobile malware detection method using behavior features in network traffic. J. Netw. Comput. Appl. 2019, 133, 15–25.
33. Allix, K.; Bissyandé, T.F.; Klein, J.; Traon, Y.L. AndroZoo: Collecting Millions of Android Apps for the Research Community. In Proceedings of the 13th International Conference on Mining Software Repositories, Austin, TX, USA, 14–15 May 2016; ACM: New York, NY, USA, 2016; pp. 468–471.
34. Potha, N.; Kouliaridis, V.; Kambourakis, G. An extrinsic random-based ensemble approach for android malware detection. Connect. Sci. 2020, 1–17.
35. Alzaylaee, M.K.; Yerima, S.Y.; Sezer, S. DL-Droid: Deep learning based android malware detection using real devices. Comput. Secur. 2020, 89, 101663.
36. Taheri, R.; Ghahramani, M.; Javidan, R.; Shojafar, M.; Pooranian, Z.; Conti, M. Similarity-based Android malware detection using Hamming distance of static binary features. Future Gener. Comput. Syst. 2020, 105, 230–247.
37. Millar, S.; McLaughlin, N.; del Rincon, J.M.; Miller, P.; Zhao, Z. DANdroid: A Multi-View Discriminative Adversarial Network for Obfuscated Android Malware Detection; Association for Computing Machinery: New York, NY, USA, 2020.
38. Cai, L.; Li, Y.; Xiong, Z. JOWMDroid: Android malware detection based on feature weighting with joint optimization of weight-mapping and classifier parameters. Comput. Secur. 2021, 100, 102086.
39. Kouliaridis, V.; Kambourakis, G.; Peng, T. Feature Importance in Android Malware Detection. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; pp. 1449–1454.
40. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
Analysis Type | Feature Extraction Method | Features Extracted |
---|---|---|
Static | Manifest analysis | Package name, Permissions, Intents, Activities, Services, Providers |
Static | Code analysis | API calls, Information flow, Taint tracking, Opcodes, Native code, Cleartext analysis |
Dynamic | Network traffic analysis | URLs, IPs, Network Protocols, Certificates, Non-encrypted data |
Dynamic | Code instrumentation | Java classes, intents, network traffic |
Dynamic | System calls analysis | System calls |
Dynamic | System resources analysis | CPU, Memory, and Battery usage, Process reports, Network usage |
Dynamic | User interaction analysis | Buttons, Icons, Actions/Events |
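On the dynamic side of the table, system calls are frequently encoded as frequency features. The Python sketch below shows one such encoding, relative frequencies of consecutive system-call pairs (bigrams) in a trace; the trace (as might be collected with strace during sandboxed execution), the bigram encoding, and the three-entry vocabulary are illustrative assumptions, not the representation used by any specific surveyed work.

```python
from collections import Counter


def syscall_bigram_frequencies(trace):
    """Relative frequency of each consecutive pair of system calls in a trace."""
    bigrams = Counter(zip(trace, trace[1:]))
    total = sum(bigrams.values())
    return {pair: count / total for pair, count in bigrams.items()}


def to_vector(frequencies, vocabulary):
    """Align a frequency dict with a fixed bigram vocabulary (missing -> 0.0)."""
    return [frequencies.get(pair, 0.0) for pair in vocabulary]


# Hypothetical trace of system-call names recorded while an app runs.
trace = ["openat", "read", "read", "close", "openat", "read"]
freqs = syscall_bigram_frequencies(trace)

# Hypothetical vocabulary of bigrams used as feature columns.
vocabulary = [("openat", "read"), ("read", "read"), ("sendto", "recvfrom")]
print(to_vector(freqs, vocabulary))  # [0.4, 0.2, 0.0]
```

Normalizing by the number of bigrams keeps traces of different lengths comparable; traces shorter than two calls simply yield an all-zero vector.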
Work | Year | Performance | PEM | DT | ML | AM | FE | DL | Datasets | ML PI |
---|---|---|---|---|---|---|---|---|---|---|
[7] | 2017 | + | + | - | - | - | - | - | - | - |
[8] | 2018 | - | + | + | + | - | - | - | - | - |
[9] | 2020 | + | - | + | - | + | + | - | - | - |
[10] | 2020 | + | - | - | + | - | + | - | + | - |
[11] | 2020 | - | - | - | + | - | - | + | - | - |
Current | 2021 | + | + | + | + | + | + | - | + | + |
Work | Year | Analysis | Method(s) | Feature(s) | Dataset(s) | ML Technique(s) |
---|---|---|---|---|---|---|
[12] | 2014 | Dynamic | NTA | Network traffic | N/A | Base models |
[13] | 2015 | Static | CA | Opcodes | Drebin | Base models, DR |
[14] | 2015 | Static | MA, CA | Package name, Permissions, API calls, Intents, Opcodes | Contagio Mobile, VirusShare | Base models |
[18] | 2015 | Static | CA | Permissions, API Calls | McAfee | EL |
[19] | 2016 | Static | CA | Permissions, Intents | Drebin | EL |
[21] | 2017 | Static | CA | Permissions, Source code | M0Droid | EL |
[23] | 2017 | Static | CA | Permissions, Intents | Contagio, MalGenome, theZoo, Malshare, VirusShare | FI, EL |
[27] | 2017 | Static | CA | Native code | Contagio Mobile, Drebin | Base models |
[29] | 2018 | Dynamic | SRA | CPU, Memory, and Battery usage, Process reports, Network usage | N/A | Base models |
[30] | 2018 | Static | CA | API calls | N/A | Base models |
[31] | 2018 | Static | CA | Information flow | N/A | Base models |
[32] | 2019 | Dynamic | NTA | Network traffic | Drebin | Base models |
[4] | 2020 | Hybrid | MA, CA, CI | Permissions, Intents, API calls, Java classes, inter-process communication, network traffic | Drebin, VirusShare, AndroZoo | Base models, FI, EL |
[34] | 2020 | Static | MA | Permissions, Intents | Drebin, VirusShare, AndroZoo | Base models, EL |
[35] | 2020 | Hybrid | MA, CA, UIA | Permissions, Intents, API Calls, Actions/Events | McAfee | Base models, FI |
[36] | 2020 | Static | MA, CA | Permissions, Intents, API Calls | Drebin, Contagio mobile, MalGenome | Base models, FI |
[37] | 2020 | Static | MA, CA | Permissions, Opcodes, API Calls | Drebin | Base models |
[38] | 2021 | Static | MA, CA | Permissions, Intents, Features, Components, API Calls, Shell commands | Drebin, AMD | Base models plus weight-mapping, FI |
[2] | 2021 | Static | MA | Permissions, Intents | AndroZoo | Base models, EL, DR |
Category | Option | No. of Works |
---|---|---|
Analysis type | Static | 14 |
Analysis type | Dynamic | 3 |
Analysis type | Hybrid | 2 |
Feature extraction method | Source code analysis | 14 |
Feature extraction method | Manifest analysis | 8 |
Feature extraction method | Network traffic analysis | 2 |
Feature extraction method | Code instrumentation | 1 |
Feature extraction method | System resources analysis | 1 |
Feature extraction method | User interaction analysis | 1 |
Dataset age | 2010 to 2014 | 11 |
Dataset age | 2015 to 2016 | 5 |
Dataset age | 2017 to 2020 | 3 |
ML techniques | Base models | 15 |
ML techniques | Ensemble learning | 7 |
ML techniques | Feature importance | 5 |
ML techniques | Dimensionality reduction | 2 |
Metrics | Accuracy as a metric | 13 |
Metrics | AUC as a metric | 7 |
Metrics | Other metric | 4 |
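According to the summary table, accuracy (13 works) and AUC (7 works) are the most commonly reported metrics. The Python sketch below contrasts them: accuracy requires fixing a decision threshold on the classifier's malware score (0.5 here, an arbitrary assumption), whereas the area under the ROC curve is threshold-independent and equals the probability that a randomly chosen malware sample is scored higher than a randomly chosen benign one, with ties counted as one half (see Fawcett's introduction to ROC analysis in the References). The labels and scores are hypothetical.

```python
def accuracy_at_threshold(y_true, scores, threshold=0.5):
    """Fraction of correct labels after thresholding the malware scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the fraction of (malware, benign) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one malware and one benign sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # pair ordered correctly
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(positives) * len(negatives))


# Hypothetical ground truth (1 = malware, 0 = benign) and classifier scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]

print(round(accuracy_at_threshold(y_true, scores), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))  # 0.778
```

The pairwise loop costs O(P·N) for P malware and N benign samples and is written for readability; a sorting-based computation scales better to large test sets.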
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kouliaridis, V.; Kambourakis, G. A Comprehensive Survey on Machine Learning Techniques for Android Malware Detection. Information 2021, 12, 185. https://doi.org/10.3390/info12050185