A Survey of the Recent Trends in Deep Learning Based Malware Detection
Abstract
1. Introduction
- Description of malware classification and identification strategies
- Mechanisms for classifying and detecting malware and a comparative analysis between these methods
- Potential issues and challenges in the different categories of proposed solutions
- The future direction of research in this domain
2. Trends in Malware Detection
2.1. Malware Detection with Primitive Methods (Statistical Analysis Based Methods)
2.2. Malware Detection with Conventional Machine Learning Based Methods
2.3. Malware Detection with Deep Learning Based Methods
3. Issues and Challenges
3.1. Shortcomings of Primitive Methods (Statistical Analysis Based Methods) for Detecting Malware
3.2. Shortcomings of Conventional Machine Learning Based Methods for Detecting Malware
3.3. Shortcomings of Deep Learning Based Methods for Malware Detection
4. Direction for Future Work
4.1. Moderate Sized and Updated Dataset
4.2. Using Significant Features
4.3. Handling Evasion Techniques
4.4. Combating Anti-Analysis Techniques
- Because malware writers use sophisticated techniques that let malware change its form easily, future research should aim at handling metamorphic, polymorphic, and obfuscated malware.
- The daily growth in malware is the main reason for the increasing number of malware families, and new forms of malware keep surfacing over time. Future research should focus on developing a generic model capable of detecting zero-day malware.
- To be deployed as a real-time solution, a model must be reliable enough to handle previously unseen malware as well.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Taxonomy of Malware Analysis
Appendix A.1. Malware Types
- Virus: It replicates itself by attaching to a file or document. It can corrupt the system, destroy data, and pose a serious threat to assets.
- Worm: It behaves like a virus but can also replicate itself over the network.
- Trojan horse: It masquerades as a useful program but contains malicious code.
- Backdoor: It installs itself on the system and gives the attacker access with little or no authentication.
- Botnet: It behaves like a backdoor. The difference lies in the command and control server: all systems compromised by the botnet receive the same commands from the same command and control server.
- Spyware: It poses as a useful application but leaks users’ data.
- Downloader: It is normally installed by the attacker on victims’ machines. Its sole purpose is to download malicious code onto the system.
- Rootkit: It is paired with other malware and hides the existence of that malware. Another damaging effect of a rootkit is the root-level access that it gives to the malware.
- Scareware: It frightens users into buying a product that supposedly keeps their data and system safe.
- Many malware samples fall into more than one category because they exhibit features of more than one malware type.
Appendix A.2. Malware Analysis
- To gain the capability of responding to network intrusion
- To determine how systems and files can be infected
- To analyze the potential of suspected binaries/PE files
- To devise the mechanism for identifying malware
- To find host-based signatures or indicators
- To find network-based signatures or indicators
- To assess the scale of damage that malware can cause
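A host-based indicator of the kind listed above can be pictured as a file-hash lookup. The sketch below is a minimal illustration and not a method taken from this survey: the helper names `sha256_of` and `matches_known_indicator` and the sample byte strings are hypothetical, and a real indicator set would come from an analyst's findings.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string (e.g., a file's contents)."""
    return hashlib.sha256(data).hexdigest()


def matches_known_indicator(data: bytes, known_hashes: set) -> bool:
    """Check whether the content hash appears in a set of known-bad hashes."""
    return sha256_of(data) in known_hashes


# Hypothetical indicator set built from one previously analyzed sample.
known_hashes = {sha256_of(b"hypothetical malicious payload")}

print(matches_known_indicator(b"hypothetical malicious payload", known_hashes))
print(matches_known_indicator(b"hypothetical benign file", known_hashes))
```

Exact-hash matching only recognizes byte-identical files, so a single changed byte defeats it. This is one reason such indicators are usually combined with behavioral or learned features.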
Analysis Type | Advantages | Disadvantages |
---|---|---|
Static Analysis | | |
Dynamic Analysis | | |
Appendix B. Glossary of All Terms
References
- PandaLabs Annual Report 2018; Panda Security: Chertsey, UK, 2018.
- FBI. Addressing Threats to the Nations Cybersecurity 1. FBI Report, Retrieved 3 August 2022. Available online: https://www.fbi.gov/file-repository/addressing-threats-to-the-nations-cybersecurity-1.pdf/view (accessed on 10 August 2022).
- Manavi, F.; Hamzeh, A. A novel approach for ransomware detection based on PE header using graph embedding. J. Comput. Virol. Hacking Tech. 2022, 14, 1–12.
- Zahoora, U.; Rajarajan, M.; Pan, Z.; Khan, A. Zero-day Ransomware Attack Detection using Deep Contractive Autoencoder and Voting based Ensemble Classifier. Appl. Intell. 2022, 1–20.
- Mohurle, S.; Patil, M. A brief study of Wannacry Threat: Ransomware Attack 2017. Int. J. Adv. Res. Comput. Sci. 2017, 8, 1938–1940.
- Maria Vergelis, T.S. Spam and Phishing in Q2 2019; SecureList by Kaspersky: Moscow, Russia, 2019.
- ISTR Internet Security Threat Report; Symantec: Tempe, AZ, USA, 2019; Volume 24.
- Cyberattacks. Available online: https://www.cnbc.com/2019/10/13/cyberattacks-cost-small-companies-200k-putting-many-out-of-business.html (accessed on 9 March 2022).
- Baezner, M.; Robin, P.; Wenger, A. Stuxnet. 2017. Available online: https://css.ethz.ch/ (accessed on 5 July 2020).
- Mo, Y.; Chabukswar, R.; Sinopoli, B. Detecting integrity attacks on SCADA systems. IEEE Trans. Control Syst. Technol. 2014, 22, 1396–1407.
- Marelli, D.; Sui, T.; Fu, M.; Lu, R. Statistical Approach to Detection of Attacks for Stochastic Cyber-Physical Systems. IEEE Trans. Autom. Control 2021, 66, 849–856.
- Sui, T.; Mo, Y.; Marelli, D.; Sun, X.; Fu, M. The Vulnerability of Cyber-Physical System under Stealthy Attacks. IEEE Trans. Autom. Control 2021, 66, 637–650.
- Aslan, O.; Samet, R. A Comprehensive Review on Malware Detection Approaches. IEEE Access 2020, 8, 6249–6271.
- Souri, A.; Hosseini, R. A state-of-the-art survey of malware detection approaches using data mining techniques. Hum. Cent. Comput. Inf. Sci. 2018, 8, 3.
- Ucci, D.; Aniello, L.; Baldoni, R. Survey of machine learning techniques for malware analysis. Comput. Secur. 2019, 81, 123–147.
- Mahdavifar, S.; Ghorbani, A.A. Application of deep learning to cybersecurity: A survey. Neurocomputing 2019, 347, 149–176.
- Berman, D.S.; Buczak, A.L.; Chavis, J.S.; Corbett, C.L. A survey of deep learning methods for cyber security. Information 2019, 10, 122.
- Komatwar, R.; Kokare, M. A Survey on Malware Detection and Classification. J. Appl. Secur. Res. 2021, 16, 390–420.
- Christodorescu, M.; Jha, S. Static analysis of executables to detect malicious patterns. In Proceedings of the 12th USENIX Security Symposium (USENIX Security 03), Washington, DC, USA, 4–8 August 2003.
- Santos, I. Idea: Opcode-sequence-based malware detection. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2010; Volume 5965.
- Sabbatel, G.B.; Korczynski, M.; Duda, A. Architecture of a Platform for Malware Analysis and Confinement. In Proceedings of MCSS 2010: Multimedia Communications, Services and Security, Cracow, Poland, 2–3 June 2011.
- Elhadi, A.A.E.; Maarof, M.A.; Osman, A.H. Malware detection based on hybrid signature behavior application programming interface call graph. Am. J. Appl. Sci. 2012, 9, 283–288.
- Fleck, D.; Tokhtabayev, A.; Alarif, A.; Stavrou, A.; Nykodym, T. PyTrigger: A system to trigger & extract user-activated malware behavior. In Proceedings of the 2013 International Conference on Availability, Reliability and Security, Regensburg, Germany, 2–6 September 2013.
- Berlin, K.; Slater, D.; Saxe, J. Malicious behavior detection using windows audit logs. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, Denver, CO, USA, 16 October 2015.
- Kumar, G.; Thakur, K.; Ayyagari, M.R. MLEsIDSs: Machine learning-based ensembles for intrusion detection systems—A review. J. Supercomput. 2020, 76, 8938–8971.
- Chen, L.; Li, T.; Abdulhayoglu, M.; Ye, Y. Intelligent malware detection based on file relation graphs. In Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015), Anaheim, CA, USA, 7–9 February 2015.
- Elhadi, A.A.E.; Maarof, M.A.; Barry, B.I.A. Improving the detection of malware behaviour using simplified data dependent API call graph. Int. J. Secur. Its Appl. 2013, 7, 29–42.
- Feng, Z.; Xiong, S.; Cao, D.; Deng, X.; Wang, X.; Yang, Y.; Zhou, X.; Huang, Y.; Wu, G. HRS: A Hybrid Framework for Malware Detection. In Proceedings of the 2015 ACM International Workshop on Security and Privacy Analytics, San Antonio, TX, USA, 4 March 2015.
- Ghiasi, M.; Sami, A.; Salehi, Z. Dynamic VSA: A framework for malware detection based on register contents. Eng. Appl. Artif. Intell. 2015, 44, 111–122.
- Kwon, B.J.; Dumitras, T. The Dropper Effect: Insights into Malware Distribution with Downloader Graph Analytics. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15), Denver, CO, USA, 12–16 October 2015.
- Mao, W.; Cai, Z.; Towsley, D.; Guan, X. Probabilistic inference on integrity for access behavior based malware detection. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2015; Volume 9404.
- Piyanuntcharatsr, S.S.W.; Adulkasem, S.; Chantrapornchai, C. On the comparison of malware detection methods using data mining with two feature sets. Int. J. Secur. Its Appl. 2015, 9, 293–318.
- Wüchner, T.; Ochoa, M.; Pretschner, A. Robust and effective malware detection through quantitative data flow graph metrics. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2015; Volume 9148.
- Raff, E.; Nicholas, C. An alternative to NCD for large sequences, Lempel-Ziv Jaccard distance. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; Volume 129685.
- Khodamoradi, P.; Fazlali, M.; Mardukhi, F.; Nosrati, M. Heuristic metamorphic malware detection based on statistics of assembly instructions using classification algorithms. In Proceedings of the 18th CSI International Symposium on Computer Architecture and Digital Systems (CADS 2015), Tehran, Iran, 7–8 October 2015.
- Upchurch, J.; Zhou, X. Variant: A malware similarity testing framework. In Proceedings of the 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, PR, USA, 20–22 October 2015.
- Liang, G.; Pang, J.; Dai, C. A Behavior-Based Malware Variant Classification Technique. Int. J. Inf. Educ. Technol. 2016, 6, 291.
- Vadrevu, P.; Perdisci, R. MAXS: Scaling malware execution with sequential multi-hypothesis testing. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, Xi’an, China, 30 May–3 June 2016.
- Dahl, G.E.; Stokes, J.W.; Deng, L.; Yu, D. Large-scale malware classification using random projections and neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013.
- Ravi, V.; Alazab, M.; Selvaganapathy, S.; Chaganti, R. A Multi-View attention-based deep learning framework for malware detection in smart healthcare systems. Comput. Commun. 2022, 195, 73–81.
- Rama, K.; Kumar, P.; Bhasker, B. Deep Learning to Address Candidate Generation and Cold Start Challenges in Recommender Systems: A Research Survey. arXiv 2019, arXiv:1907.08674.
- Rhode, M.; Burnap, P.; Jones, K. Early-stage malware prediction using recurrent neural networks. Comput. Secur. 2018, 77, 578–594.
- Kolosnjaji, B.; Zarras, A.; Webster, G.; Eckert, C. Deep learning for classification of malware system call sequences. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2016; Volume 9992.
- Hardy, W.; Chen, L.; Hou, S.; Ye, Y.; Li, X. DL4MD: A Deep Learning Framework for Intelligent Malware Detection; CSREA Press: Las Vegas, NV, USA, 2016; pp. 61–67.
- Saxe, J.; Berlin, K. eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys. arXiv 2017, arXiv:1702.08568.
- Azmoodeh, A.; Dehghantanha, A.; Choo, K.K.R. Robust Malware Detection for Internet of (Battlefield) Things Devices Using Deep Eigenspace Learning. IEEE Trans. Sustain. Comput. 2019, 4, 88–95.
- Cui, Z.; Xue, F.; Cai, X.; Cao, Y.; Wang, G.G.; Chen, J. Detection of Malicious Code Variants Based on Deep Learning. IEEE Trans. Ind. Inform. 2018, 14, 3187–3196.
- Ni, S.; Qian, Q.; Zhang, R. Malware identification using visualization images and deep learning. Comput. Secur. 2018, 77, 871–885.
- Rosenberg, I.; Sicard, G.; David, E. End-to-end deep neural networks and transfer learning for automatic analysis of nation-state malware. Entropy 2018, 20, 390.
- Kolosnjaji, B.; Eraisha, G.; Webster, G.; Zarras, A.; Eckert, C. Empowering convolutional networks for malware classification and analysis. In Proceedings of the International Joint Conference on Neural Networks, Anchorage, AK, USA, 14–19 May 2017.
- Xiao, F.; Lin, Z.; Sun, Y.; Ma, Y. Malware Detection Based on Deep Learning of Behavior Graphs. Math. Probl. Eng. 2019, 2019, 8195395.
- Tobiyama, S.; Yamaguchi, Y.; Shimada, H.; Ikuse, T.; Yagi, T. Malware Detection with Deep Neural Network Using Process Behavior. In Proceedings of the 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Atlanta, GA, USA, 10–14 June 2016; Volume 2.
- Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Venkatraman, S. Robust Intelligent Malware Detection Using Deep Learning. IEEE Access 2019, 7, 46717–46738.
- David, O.E.; Netanyahu, N.S. DeepSign: Deep learning for automatic malware signature generation and classification. In Proceedings of the International Joint Conference on Neural Networks, Killarney, Ireland, 12–17 July 2015.
- Saxe, J.; Berlin, K. Deep neural network based malware detection using two dimensional binary program features. In Proceedings of the 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, PR, USA, 20–22 October 2015.
- Tran, T.K.; Sato, H.; Kubo, M. One-shot learning approach for unknown malware classification. In Proceedings of the 2018 5th Asian Conference on Defense Technology (ACDT), Hanoi, Vietnam, 25–26 October 2018.
- Raff, E.; Sylvester, J.; Nicholas, C. Learning the PE header, malware detection with minimal domain knowledge. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, TX, USA, 3 November 2017.
- Bensaoud, A.; Kalita, J. Deep multi-task learning for malware image classification. J. Inf. Secur. Appl. 2022, 64, 103057.
- Kumar, S.; Janet, B. DTMIC: Deep transfer learning for malware image classification. J. Inf. Secur. Appl. 2022, 64, 103063.
- Mohammadi, F.G.; Amini, M.H.; Arabnia, H.R. An introduction to advanced machine learning: Meta-learning algorithms, applications, and promises. In Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2020; Volume 1123.
- Kadam, S.; Vaidya, V. Review and analysis of zero, one and few shot learning approaches. In Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2020; Volume 940.
- Hsiao, S.C.; Kao, D.Y.; Liu, Z.Y.; Tso, R. Malware image classification using one-shot learning with siamese networks. Procedia Comput. Sci. 2019, 159, 1863–1871.
- Tran, T.K.; Sato, H.; Kubo, M. Image-based unknown malware classification with few-shot learning models. In Proceedings of the 2019 Seventh International Symposium on Computing and Networking Workshops (CANDARW), Nagasaki, Japan, 26–29 November 2019.
- Tang, Z.; Wang, P.; Wang, J. ConvProtoNet: Deep prototype induction towards better class representation for few-shot malware classification. Appl. Sci. 2020, 10, 2847.
- Atapour-Abarghouei, A.; Bonner, S.; McGough, A.S. A King’s Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019.
- Lee, J.; Jeong, K.; Lee, H. Detecting metamorphic malwares using code graphs. In Proceedings of the 2010 ACM Symposium on Applied Computing, Sierre, Switzerland, 22–26 March 2010.
- Santos, I.; Devesa, J.; Brezo, F.; Nieves, J.; Bringas, P.G. OPEM: A static-dynamic approach for machine-learning-based malware detection. In Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2013; Volume 189.
- Pai, S.; di Troia, F.; Visaggio, C.A.; Austin, T.H.; Stamp, M. Clustering for malware classification. J. Comput. Virol. Hacking Tech. 2017, 13, 95–107.
- Polino, M.; Scorti, A.; Maggi, F.; Zanero, S. Jackdaw: Towards automatic reverse engineering of large datasets of binaries. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2015; Volume 9148.
- Sexton, J.; Storlie, C.; Anderson, B. Subroutine based detection of APT malware. J. Comput. Virol. Hacking Tech. 2016, 12, 225–233.
- Lin, C.T.; Wang, N.J.; Xiao, H.; Eckert, C. Feature selection and extraction for malware classification. J. Inf. Sci. Eng. 2015, 31, 965–992.
- Mohaisen, A.; Alrawi, O.; Mohaisen, M. AMAL: High-fidelity, behavior-based automated malware analysis and classification. Comput. Secur. 2015, 52, 251–266.
- Lindorfer, M.; Kolbitsch, C.; Milani Comparetti, P. Detecting environment-sensitive malware. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2011; Volume 6961.
- Santos, I.; Brezo, F.; Ugarte-Pedrero, X.; Bringas, P.G. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf. Sci. 2013, 231, 64–82.
- Park, Y.; Reeves, D.; Mulukutla, V.; Sundaravel, B. Fast malware classification by automated behavioral graph matching. In Proceedings of the 6th Annual Workshop on Cyber Security and Information Intelligence Research (CSIIRW ’10), Oak Ridge, TN, USA, 21–23 April 2010.
- Islam, R.; Tian, R.; Batten, L.M.; Versteeg, S. Classification of malware based on integrated static and dynamic features. J. Netw. Comput. Appl. 2013, 36, 646–656.
- Nari, S.; Ghorbani, A.A. Automated malware classification based on network behavior. In Proceedings of the 2013 International Conference on Computing, Networking and Communications (ICNC), San Diego, CA, USA, 28–31 January 2013.
- Kawaguchi, N.; Omote, K. Malware function classification using APIs in initial behavior. In Proceedings of the 2015 10th Asia Joint Conference on Information Security, Kaohsiung, Taiwan, 24–26 May 2015.
- Gharacheh, M.; Derhami, V.; Hashemi, S.; Fard, S.M.H. Proposing an HMM-based approach to detect metamorphic malware. In Proceedings of the 2015 4th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), Zahedan, Iran, 9–11 September 2015.
- Loi, N.; Borile, C.; Ucci, D. Towards an Automated Pipeline for Detecting and Classifying Malware through Machine Learning. arXiv 2021, arXiv:2106.05625.
- Azeez, N.A.; Odufuwa, O.E.; Misra, S.; Oluranti, J.; Damaševičius, R. Windows PE malware detection using ensemble learning. Informatics 2021, 8, 10.
- Damaševičius, R.; Venčkauskas, A.; Toldinas, J.; Grigaliūnas, Š. Ensemble-based classification using neural networks and machine learning models for Windows PE malware detection. Electronics 2021, 10, 485.
- Langner, R. Stuxnet: Dissecting a cyberwarfare weapon. IEEE Secur. Priv. 2011, 9, 49–51.
- Roseline, S.A.; Geetha, S.; Kadry, S.; Nam, Y. Intelligent Vision-Based Malware Detection and Classification Using Deep Random Forest Paradigm. IEEE Access 2020, 8, 206303–206324.
- Barriga, J.J.A.; Yoo, S.G. Malware detection and evasion with machine learning techniques: A survey. Int. J. Appl. Eng. Res. 2017, 12, 7207–7214.
- Kim, K.; Moon, B.R. Malware detection based on dependency graph using hybrid genetic algorithm. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, Portland, OR, USA, 7–11 July 2010.
- Sanders, C.; Smith, J. Applied Network Security Monitoring; Elsevier: Amsterdam, The Netherlands, 2014.
- Stallings, W.; Brown, L. Computer Security: Principles and Practice, 4th ed.; Pearson: Upper Saddle River, NJ, USA, 2021.
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516.
- Vinh, P.C. Context-Aware Systems and Applications (ICCASA 2018) and Nature of Computation and Communication (ICTCC 2018). Mob. Netw. Appl. 2019, 24, 80–81.
- Chouhan, N.; Khan, A.; Rasheed, R.; Khan, H. Network anomaly detection using channel boosted and residual learning based deep convolutional neural network. Appl. Soft Comput. J. 2019, 83, 105612.
Coverage | Other Papers | Our Survey Paper |
---|---|---|
Survey of statistical based methods for malware detection | [13,14] | |
Survey of machine learning based algorithms for malware detection | [15] | |
Survey of deep learning based techniques to detect malware | [13,16,17] | |
Analysis of problems associated with statistical based approaches of detecting malware | [18] | |
Analysis of shortcomings of machine learning based solutions for detecting malware | [15] | |
Analysis of disadvantages of using deep learning based methods to detect malware | [13,16] | |
Survey of FSL methods in the domain of malware detection | | |
Title | Author | Dataset Source | Malicious Samples | Benign Samples | Performance Metrics Used |
---|---|---|---|---|---|
Support Vector Machine for malware analysis and classification | M. Kruczkowski, E. N. Szynkiewicz | N6 Platform | - | - | Classification Accuracy = 0.9498 Sensitivity = 0.9774 Specificity = 0.8971 AUC = 0.9901 F1 = 0.9623 Precision = 0.9475 |
Improving the detection of malware behavior using simplified data dependent API call graph | E. Elhadi, M. A. Maarof, B. Barry | VxHeavens | 75 | 10 | Detection Rate = 98.6% Accuracy = 98.8% False Alarm = 0% |
Dynamic VSA: a framework for malware detection based on register contents | M. Ghiasi, A. Sami, Z. Salehi | Windows XP system, Program Files Folder, and Private Repository | 850 | 390 | TP = 0.988 FP = 0.125 Recall = 0.988 Precision = 0.888 F-Measure = 0.940 Accuracy = 0.930 |
Novel feature extraction, selection, and fusion for effective malware family classification | M. Ahmadi, G. Giacinto, D. Ulyanov, S. Semenov, M. Trofimov | Microsoft’s Malware classification challenge | 21,741 | 0 | Accuracy, Logloss |
Probabilistic inference on integrity for access behavior based malware detection | W. Mao, Z. Cai, D. Towsley, X. Guan | Windows XP SP3 VxHeavens | 7257 | 534 | TPR, AUC |
Robust and effective malware detection through quantitative data flow graph metrics | T. Wüchner, M. Ochoa, A. Pretschner | Legitimate app downloads, Malicia | 6994 | 513 | Detection Rate, FPR, Precision, F-Measure |
An alternative to NCD for large sequences, Lempel Ziv Jaccard distance | E. Raff, C. Nicholas | Industry Partner | 237,349 | 240,000 | Balanced Accuracy |
Proposing a HMM-based approach to detect metamorphic malware | M. Gharacheh, V. Derhami, S. Hashemi, S. M. H. Fard | Cygwin VxHeavens | - | - | Detection Rate = 0.9803 FPR = 0.0058 Accuracy = 0.9833 |
Heuristic metamorphic malware detection based on statistics of assembly instructions using classification algorithms | P. Khodamoradi, M. Fazlali, F. Mardukhi, M. Nosrati | Windows XP system and Program Files folder Self-generated metamorphic malware | 280 | 550 | Accuracy |
A malware similarity testing framework | J. Upchurch, X. Zhou | Sampled from security incidents | 85 | 0 | PR Curve |
A behavior based malware variant classification technique | G. Liang, J. Pang, C. Dai | Anubis Website | 330,248 | 0 | Similarity measure |
Scaling Malware Execution with Sequential Multi Hypothesis Testing | P. Vadrevu, R. Perdisci | Security Company and Large Research Institute | 1,651,906 | 0 | Jaccard Index |
Fast malware classification by automated behavioral graph matching | Y. Park, D. Reeves, V. Mulukutla, B. Sundaravel | Legitimate apps Anubis Sandbox | 300 | 80 | Similarity measurement |
Automated malware classification based on network behavior | S. Nari, A. A. Ghorbani | Communication Research Centre Canada | 3768 | 0 | Accuracy = 94.5783% |
Malware function classification using APIs in initial behavior | N. Kawaguchi, K. Omote | FFRI Inc. | 408 | 236 | Accuracy, FPR, FNR |
Feature selection and extraction for malware classification | C.-T. Lin, N.-J. Wang, H. Xiao, C. Eckert | Sandbox | 3899 | 389 | Micro Precision, Micro Recall, Micro Specificity, Macro Precision, Macro Recall, Macro F1 |
High fidelity, behavior based automated malware analysis and classification | A. Mohaisen, O. Alrawi, M. Mohaisen | AMAL system | 115,157 | 0 | |
Clustering for malware classification | S. Pai, F. Di Troia, C. A. Visaggio, T. H. Austin, M. Stamp | Cygwin utility files and Malicia | 8052 | 213 | Silhouette coefficient, purity |
Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries | M. Polino, A. Scorti, F. Maggi, S. Zanero | - | - | - | Jaccard Index |
Subroutine based detection of APT malware | J. Sexton, C. Storlie, B. Anderson | - | 197 | 4622 | Similarity index |
A static signal processing based malware triage | D. Kirat, L. Nataraj, G. Vigna, B. Manjunath | Windows XP, ZDNet, NSRL, Anubis | 1,200,000 | 52,750 | Precision and Recall |
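Several rows above report a Jaccard index as their similarity measure. As a minimal sketch of what that metric computes, the following compares two byte strings through their sets of byte n-grams. The n-gram featurization, the function names, and the sample strings are assumptions made for illustration. They are not the feature sets used by the cited papers.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte substrings of `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_index(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical samples that differ only in their final call target.
sample_a = b"push ebp; mov ebp, esp; call foo"
sample_b = b"push ebp; mov ebp, esp; call bar"

similarity = jaccard_index(byte_ngrams(sample_a), byte_ngrams(sample_b))
print(f"Jaccard index = {similarity:.3f}")
```

A value near 1 means the two samples share most of their n-grams, and a value near 0 means they share almost none.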
Title | Author | Year | Dataset Samples | Performance Metrics | ||||
---|---|---|---|---|---|---|---|---|
Source | Malicious | Benign | ||||||
Early Stage Malware Prediction Using Recurrent Neural Networks | Rhode, Matilda, et al. | 2018 | Machine Activity collected in VM using Cuckoo Sandbox | 594 | 594 | Accuracy = 93% (After 4 min of malware execution) | ||
DL4MD: A Deep Learning Framework for Intelligent Malware Detection | Hardy, William, et al. | 2016 | Comodo Cloud Security Centre | 22,500 | 22,500 | TP = 22,035 FP = 953 TN = 21,547 FN = 465 Accuracy = 96.85% | ||
eXpose: A Character Level Convolutional Neural Network with Embeddings for Detecting Malicious URLs, File Paths and Registry Key | Saxe, Joshua, and Konstant-in Berlin. | 2017 | VirusTotal | URLs | 7,211,705 | 1,496,198 | TPR = 0.77 × 10−4 FPR = 0.84 × 10−3 AUC = 0.993 | |
File Paths | 869,836 | 3,677,404 | TPR = 0.16 × 10−4 FPR = 0.43 × 10−3 AUC = 0.978 | |||||
Regist-ry Keys | 250,819 | 1,282,292 | TPR = 0.51 × 10−4 FPR = 0.62 × 10−3 AUC = 0.992 | |||||
Robust Malware Detection for the Internet of (Battlefield) Things Devices Using Deep Eigenspace Learning | Azmood-eh, Amin, Ali Dehghanta-nha, and Kim Kwang Raymond Choo. | 2018 | VirusTotal | 1078 | 128 | Accuracy = 99% Recall = 98% | ||
Detection of Malicious Code Variants Based on Deep Learning | Cui, Zhihua, et al. | 2018 | Vision Research Lab | 9342 (25 Malware Families) | - | Accuracy = 94.5 Precision = 94.6 Recall = 94.5 Runtime = 20 ms | ||
Malware Identification Using visualization images and deep learning | Ni, Sang, Quan Qian, and Rui Zhang | 2018 | Kaggle 2015 | 10,085 (9 Malware Families) | - | Accuracy = 99% | ||
End-to-End Deep Neural Networks and Transfer Learning for Automatic Analysis of Nation State Malware | Rosenberg, Ishai, Guillaume Sicard, and Eli David. | 2018 | Cuckoo Sandbox | 3200 (2 APT classes) | - | Accuracy = 98.6% | ||
Empowering Convolutional Networks for Malware Classification and Analysis | Kolosnjaji, Bojan, et al. | 2017 | Virusshar, Maltrieve, Private Collection | - | - | Precision = 0.93 Recall = 0.93 F-1 Score = 0.92 | ||
Malware Detection Based on Deep Learning of Behavior Graphs | Fei Xiao et al. | 2019 | Vx Heaven | 880 | 880 | Precision = 0.986 Recall = 0.992 F-1 Score = 0.989 | ||
Deep Learning for Classification of Malware System Call Sequences | Bojan et al. | 2016 | Virusshar, Maltrieve, Private Collection | 4753 | - | Precision = 85.6% Recall = 89.4% | ||
Malware Detection with Deep Neural Network Using Process Behavior | Shun Tobiyama et al. | 2016 | NTT Secure Platform Laboratory | 81 | 69 | AUC = 0.96 | ||
Robust Intelligent Malware Detection Using Deep Learning | R. Vinaya Kumar et al. | 2018 | WSBD | Ember | 70,140 | 69,860 | Accuracy = 98.9% Precision = 99.7% Recall = 98.1% F-1 score = 98.9% | |
WDBD | Cukoo Sandbox | 173,946 | 169,509 | Accuracy = 93.6% Precision = 94.8% Recall = 92.0% F-1 Score = 93.4% | ||||
DIMD | Malimg, Virus-sign, Virus-share | 24,851 | - | Accuracy = 96.3% | ||||
Deep Neural Network Based Malware Detection Using Two Dimensional Binary Program Features | Joshua et al. | 2015 | 81,910 | 350,016 | TPR = 95.2% AUC = 0.999 | |||
Learning the PE Header, Malware Detection With Minimal Domain Knowledge | Edward Raff, Jared Sylvester, Charles Nicholas | 2017 | Group A | Virus- share | 301,575 | 291,285 | Accuracy = 90.8% AUC = 97.7% | |
Group B | Industry Partner | 240,000 | 237,349 | Accuracy = 83.7% AUC = 91.4% | ||||
One Shot Learning Approach for Unknown Malware Classification | True Kien, Hiroshi Sato, Masao Kubo | 2018 | Malicia Project, Virustotal | 23,080 | Accuracy (with training) = 0.74 Accuracy (without training) = 0.85 | |||
DTMIC: Deep transfer learning for malware image classification | Sanjeev Kumar, B. Janet | 2022 | MalImg and MS BIG dataset | 9339 + 10,868 | Accuracy on MalImg = 98.92% Accuracy on BIG dataset = 93.19 | |||
Deep multitask learning for malware image classification | Ahmed Bensaoud, Jugal Kalita | 2022 | Virusshare, Virus total, contagio | Accuracy = 99.97% TPR = 99.98 FPR = 0.73 | ||||
DTMIC: Deep transfer learning for malware image classification | Sanjeev Kumar, B. Janet | 2022 | MalImg and Microsoft | 9339 + 21,741 | Accuracy = 98.92 Precision = 99 Recall = 99 |
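
Several rows above report precision, recall, and F-1 score together. As a rough sanity check, the sketch below recomputes F-1 as the harmonic mean of precision and recall for three rows whose values are copied from the table. The function name `f1_score` and the row labels are illustrative choices, not something taken from the surveyed papers, and the check assumes each paper reports a single (not per-class averaged) precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (result is on the same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall, reported F-1) copied from rows of the table above.
# The labels are shorthand introduced here for readability.
rows = [
    ("Behavior graphs (Xiao et al.)", 0.986, 0.992, 0.989),
    ("WSBD (Ember)", 99.7, 98.1, 98.9),
    ("WDBD (Cuckoo Sandbox)", 94.8, 92.0, 93.4),
]

for label, precision, recall, reported in rows:
    computed = f1_score(precision, recall)
    print(f"{label}: computed F-1 = {computed:.3f}, reported F-1 = {reported}")
```

For these three rows the recomputed value agrees with the reported one after rounding. A reported F-1 score that differs slightly from the harmonic mean of the reported precision and recall may simply reflect averaging over classes; the table does not say which averaging each paper used.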
Title | Author | Year | Weakness/Limitation |
---|---|---|---|
The Architecture of a Platform for Malware Analysis and Confinement | Gilles Berger et al. | 2010 | Low-interaction honeypots are used, so malware that becomes active only under certain conditions might not be detected. |
A Comparison of Static, Dynamic, and Hybrid Analysis for Malware Detection | Anusha Damodaran et al. | 2015 | The comparison is performed only on opcodes and system calls, although many other useful static and dynamic features exist. |
PyTrigger: A System to Trigger & Extract User-Activated Malware Behavior | Dan Fleck et al. | 2013 | Only selected features were used, which could have discarded information that other features would provide. |
Malicious Behavior Detection Using Windows Audit Logs | Konstantin Berlin et al. | 2015 | The researchers do not mention whether they dealt with obfuscated logs. Moreover, each sample is run for only four minutes, and this audit-log window is not sufficient for slow-moving malware. |
Static Analysis of executables to Detect Malicious Patterns | Mihai et al. | 2006 | The detection algorithm is context insensitive and cannot track the calling context of the executable. It can be made context-sensitive. |
Idea: Opcode-Sequence-Based Malware Detection | Igor Santos, Felix Brezo et al. | 2010 | The authors did not deal with packed executables, which make up a major part of real-world data. They also used quite a small dataset. |
Malware Detection Based on Hybrid Signature Behavior Application Programming Interface Call Graph | Ammar Ahmed et al. | 2012 | Evasion techniques were not addressed. |
A Heuristic Approach for the Detection of Obfuscated Malware | Scott Treadwell, Mian Zhou | 2009 | Some legitimate applications are reported as malware, so the false positive rate is high. |
Detecting Metamorphic Malware Using Code Graphs | Jusuk Lee, Kyoochang Jeong, Heejo Lee | 2010 | Only three obfuscation techniques were mitigated, whereas malware writers normally apply six to eight more. |
Title | Author | Year | Weakness/Limitation |
---|---|---|---|
Improving the detection of malware behavior using simplified data dependent API call graph | E. Elhadi, M. A. Maarof, B. Barry | 2015 | One of the major limitations of this research work was the small dataset. |
Dynamic VSA: a framework for malware detection based on register contents | M. Ghiasi, A. Sami, Z. Salehi | 2015 | The authors of this paper used a small dataset which means there is a great chance that models were not trained well. Secondly, a subset of features was used which means there might be many more useful features that were ignored while training. |
Novel feature extraction, selection, and fusion for effective malware family classification | M. Ahmadi, G. Giacinto, D. Ulyanov, S. Semenov, M. Trofimov | 2015 | Feature optimization ignored real distribution |
Probabilistic inference on integrity for access behavior based malware detection | W. Mao, Z. Cai, D. Towsley, X. Guan | 2015 | This research work was carried out on a small dataset, and evasion techniques were not addressed, which compromises reliability. Some malware needs human interaction to become active because it is triggered by certain input. This fact was also ignored during feature extraction. |
Robust and effective malware detection through quantitative data flow graph metrics | T. Wüchner, M. Ochoa, A. Pretschner | 2015 | One major limitation of this research work was the use of a small dataset. In addition, obfuscation was not dealt with during training, so the model would not perform well on obfuscated malware. |
An alternative to NCD for large sequences, Lempel-Ziv Jaccard distance | E. Raff, C. Nicholas | 2017 | The researchers did not consider obfuscation while training the models, so the models would not perform well on real-world data. |
Proposing a HMM-based approach to detect metamorphic malware | M. Gharacheh, V. Derhami, S. Hashemi, S. M. H. Fard | 2015 | The authors used a small dataset. In addition, only a subset of features was used, so many potentially useful features may have been ignored during training. |
Heuristic metamorphic malware detection based on statistics of assembly instructions using classification algorithms | P. Khodamoradi, M. Fazlali, F. Mardukhi, M. Nosrati | 2015 | This research work also used a small dataset and only a subset of features, which means that many useful features were ignored. |
A malware similarity testing framework | J. Upchurch, X. Zhou | 2015 | An extremely small dataset was used, which raises serious questions about the reliability of the model’s training. |
A behavior-based malware variant classification technique | G. Liang, J. Pang, C. Dai | 2016 | Small dataset, non-optimized feature set, ignored real distribution |
Scaling Malware Execution with Sequential Multi Hypothesis Testing | P. Vadrevu, R. Perdisci | 2016 | This research work ignored evasion techniques and the user interaction needed to trigger malware behavior during feature extraction, so the model might not perform well on real-world data. |
Automated malware classification based on network behavior | S. Nari, A. A. Ghorbani | 2013 | Using a small dataset to train machine learning algorithms compromises the reliability of the results. |
Malware function classification using APIs in initial behavior | N. Kawaguchi, K. Omote | 2015 | The small dataset used in this research work is the main limitation of this paper. |
Feature selection and extraction for malware classification | C.-T. Lin, N.-J. Wang, H. Xiao, C. Eckert | 2015 | This research work ignored evasion techniques and the user interaction needed to trigger malware behavior during feature extraction, so the model might not perform well on real-world data. Moreover, a small dataset was used for training, which suggests that the models were not well trained. |
High-fidelity, behavior based automated malware analysis and classification | A. Mohaisen, O. Alrawi, M. Mohaisen | 2015 | This research work ignored evasion techniques. |
Clustering for malware classification | S. Pai, F. Di Troia, C. A. Visaggio, T. H. Austin, M. Stamp | 2015 | The researchers did not consider obfuscation while training the models, so the models would not perform well on real-world data. In addition, a small dataset was used for training the models, which is not recommended. |
Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries | M. Polino, A. Scorti, F. Maggi, S. Zanero | 2015 | Evasion techniques, packed malware, and user interaction for triggering malware behavior were ignored while extracting features, although all of these occur in real-world data. Therefore, the model will not perform accurately on real-world data. |
Subroutine based detection of APT malware | J. Sexton, C. Storlie, B. Anderson | 2015 | Obfuscated samples are common in real-world data, and this research work ignored them while training the model. |
On the comparison of malware detection methods using data mining with two feature sets | S. Srakaew, W. Piyanuntcharatsr, S. Adulkasem | 2015 | Ignored real distribution |
A static signal processing based malware triage | D. Kirat, L. Nataraj, G. Vigna, B. Manjunath | 2013 | Ignored real distribution |
Towards an Automated Pipeline for Detecting and Classifying Malware through Machine Learning | Nicola Loi et al. | 2021 | Used static features only |
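
Several rows above list "ignored real distribution" as a limitation. One plausible reading, and it is an assumption here rather than something the table spells out, is that the ratio of malware to benign files in the evaluation set did not match the ratio seen in deployment. The sketch below shows why that matters: for a fixed true positive rate and false positive rate, precision depends strongly on how rare malware is. The detector figures (TPR = 0.95, FPR = 0.01) and the function name are hypothetical and chosen only for illustration.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision implied by a detector's TPR and FPR when a fraction
    `prevalence` of all files is malware (Bayes' rule)."""
    true_pos = tpr * prevalence            # malware correctly flagged
    false_pos = fpr * (1.0 - prevalence)   # benign files wrongly flagged
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


# Hypothetical detector, not taken from any surveyed paper.
tpr, fpr = 0.95, 0.01

for prevalence in (0.5, 0.1, 0.01):
    precision = precision_at_prevalence(tpr, fpr, prevalence)
    print(f"malware share = {prevalence:.2f} -> precision = {precision:.3f}")
```

With these illustrative numbers, precision is about 0.99 on a balanced test set but drops to about 0.49 when only 1% of files are malicious, even though the detector itself has not changed.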
Title | Author | Year | Weakness/Limitation |
---|---|---|---|
Early Stage Malware Prediction Using Recurrent Neural Networks | Rhode, Matilda, et al. | 2018 | The system should be tested on larger data. The approach can also be evaded if an attacker learns that the file is monitored only during its first 5 s. |
DL4MD: A Deep Learning Framework for Intelligent Malware Detection | Hardy, William, et al. | 2016 | Sparsity constraints, which could improve malware detection, are not imposed on the stacked autoencoder (SAE). |
eXpose: A Character-Level Convolutional Neural Network with Embeddings for Detecting Malicious URLs, File Paths and Registry Keys | Saxe, Joshua, and Konstantin Berlin. | 2017 | The computational cost of training on long strings is very high. The approach labeled any sample that had 0 occurrences in the malware data as benign and labeled the rest as malware. Limited training data for strings, file paths, and registry keys can thus reduce the model’s generalizability. |
Robust Malware Detection for the Internet of (Battlefield) Things Devices Using Deep Eigenspace Learning | Azmoodeh, Amin, Ali Dehghantanha, and Kim-Kwang Raymond Choo. | 2018 | Dataset was small for training the neural network which implies that the network could not learn the features at its best. |
Detection of Malicious Code Variants Based on Deep Learning | Cui, Zhihua, et al. | 2018 | The model required all input images to be of a fixed size, so images could have lost meaningful information during image processing. |
Malware Identification Using Visualization Images and Deep Learning | Ni, Sang, Quan Qian, and Rui Zhang | 2018 | Detection of packed or encrypted malware, or of malware using anti-debugging and anti-disassembly approaches, is not performed. Real-world data contains all of these kinds of malware, so the network might not work well on it. |
End-to-End Deep Neural Networks and Transfer Learning for Automatic Analysis of Nation State Malware | Rosenberg, Ishai, Guillaume Sicard, and Eli David. | 2018 | Because nation-state APT samples were unavailable, the proposed classifier was not evaluated on them. Moreover, the static features provided by Cuckoo Sandbox were not verified. |
Empowering Convolutional Networks for Malware Classification and Analysis | Kolosnjaji, Bojan, et al. | 2017 | Only the results of static malware analysis are used. Code obfuscation can affect the proposed approach, and a malware family unknown to the system can affect its performance. |
Malware Detection Based on Deep Learning of Behavior Graphs | Fei Xiao et al. | 2019 | The execution time used for extracting API calls is not mentioned. If the execution time was short, the results would not be reliable. |
Deep Learning for Classification of Malware System Call Sequences | Kolosnjaji, Bojan, et al. | 2016 | The approach did not consider evasion of malware detectors by inserting noise into system calls. Moreover, the work depends on system-call paths, which depend on the input data. |
Malware Detection with Deep Neural Network Using Process Behavior | Shun Tobiyama et al. | 2016 | Due to the small amount of data, large Deep Neural Networks are not used. |
Robust Intelligent Malware Detection Using Deep Learning | R. Vinaya Kumar et al. | 2018 | Malware samples are transformed into fixed-size images; converting them to variable-size images could improve model learning. |
Deep Neural Network-Based Malware Detection Using Two Dimensional Binary Program Features | Saxe, Joshua, and Konstantin Berlin | 2015 | A small amount of data is used for obtaining low false positives. Moreover, only syntactic features are used, and semantic features are not considered. |
Learning the PE Header, Malware Detection With Minimal Domain Knowledge | Edward Raff, Jared Sylvester, Charles Nicholas | 2017 | In one of the baseline approaches, features were not normalized. |
One-Shot Learning Approach for Unknown Malware Classification | True Kien, Hiroshi Sato, Masao Kubo | 2018 | For training, only malware samples were given. Malware families given for training were insufficient |
Windows PE Malware Detection Using Ensemble Learning | Nureni Ayofe Azeez et al. | 2021 | Base ensemble classifiers only used static features. |
Ensemble-Based Classification Using Neural Networks and Machine Learning Models for Windows PE Malware Detection | Robertas Damaševičius et al. | 2021 | Base ensemble classifiers only used static features. |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tayyab, U.-e.-H.; Khan, F.B.; Durad, M.H.; Khan, A.; Lee, Y.S. A Survey of the Recent Trends in Deep Learning Based Malware Detection. J. Cybersecur. Priv. 2022, 2, 800-829. https://doi.org/10.3390/jcp2040041