Systematic Review: Malware Detection and Classification in Cybersecurity
Abstract
1. Introduction
2. Methodology
- RQ1. What is the goal of malware detection in the field of cybersecurity?
- RQ2. What techniques are implemented for malware detection?
- RQ3. How has deep learning contributed to malware detection and classification?
- RQ4. What datasets are currently used for detection and classification techniques?
2.1. Search Strategy
2.2. Selection and Filtering Process
2.2.1. Inclusion Criteria (IC)
- IC1: Focus on malware detection and classification methods: This criterion ensures that the selected studies specifically address the topic of interest, avoiding the inclusion of irrelevant research.
- IC2: Presentation of empirical results: Requiring empirical results ensures that the included studies provide practical evidence on the effectiveness of detection and classification methods, strengthening the validity and relevance of the review.
- IC3: Publications in English and Spanish: Including studies in both languages gives access to a broader range of relevant research and promotes the inclusion of diverse perspectives in the review.
- IC4: Recent publication period for techniques: Including research from 2020 to 2024 ensures that the review reflects the most current advances in malware detection and classification techniques, maintaining its relevance and timeliness.
- IC5: No year restriction for the state of the art: By not imposing a year restriction for the state of the art, relevant studies are included regardless of their age, providing a comprehensive view of the historical development of the topic.
- IC6: Focus on innovative techniques and comparative approaches: This criterion ensures the inclusion of studies that present significant advances in the field, as well as comparisons between different approaches, enriching the analysis and providing useful information for decision-making.
2.2.2. Exclusion Criteria
- EC1: Studies unrelated to malware detection and classification: Excluding irrelevant research avoids diluting the focus and relevance of the review, ensuring that it concentrates on the specific topic of interest.
- EC2: Absence of empirical results: Excluding studies that do not present empirical results guarantees the reliability and validity of the data included in the review, avoiding speculation or subjective interpretation.
- EC3: Limited availability of publications: Excluding studies that are not available in full ensures that all the necessary information is accessible for thorough and rigorous evaluation.
- EC4: Irrelevance or duplication of studies: Eliminating irrelevant or duplicate studies optimizes resource use and ensures the inclusion of new and meaningful information in the review.
- EC5: Lack of practical applicability: Excluding studies focused solely on theoretical aspects without practical applications ensures that the review concentrates on research with direct relevance to the practice of malware detection and classification.
Algorithm 1 Duplicate and citation refinement.

    import pandas as pd
    from google.colab import drive

    # Mount Google Drive so the exported search results can be read
    drive.mount('/content/drive')
    file_path = '/content/drive/My Drive/total-papers.csv'
    df = pd.read_csv(file_path)

    # Normalize titles (strip accents, uppercase) so duplicates compare equal
    df['Title'] = df['Title'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    df['Title'] = df['Title'].str.upper()
    df.info()

    # Drop duplicate titles, then keep papers with at least 11 citations
    df_sin_duplicados = df.drop_duplicates(subset=['Title'], keep='first')
    df_filtrado_final = df_sin_duplicados[df_sin_duplicados['Cited by'] >= 11]
    df_filtrado_final.info()
    df_filtrado_final.to_csv('/content/papers-no-duplicados-cites.csv', index=False)
3. Results
3.1. Evolution of Malware over Time
3.2. Services and Platforms Prone to Malware Propagation
3.2.1. Internet of Things (IoT)
3.2.2. Android OS
3.2.3. Cloud Networks and Services
- Service Models:
  - SaaS: Cloud-hosted services providing user tools.
  - PaaS: Development environment for application creation without managing infrastructure.
  - IaaS: Access to virtualized computing resources over the internet [2].
- Deployment Types:
  - Public Cloud: Shared resources at a lower cost.
  - Private Cloud: Dedicated infrastructure for specific organizations.
  - Community Cloud: Shared infrastructure for organizations with similar needs [38].
- Risk Factors: Large data volumes, exploitable web applications.
3.2.4. Mobile Devices
- Risk Factors:
  - Attackers use obfuscation techniques to evade detection.
  - Vulnerability to various forms of malware, including ransomware, trojans, and spyware.
3.3. Malware Analysis Methods
- Static Analysis: This involves identifying the structure of the malware without needing to execute its code. Various techniques can be employed to extract information that defines the type of software or file, such as source code analysis, string extraction, and behavioral pattern identification. The goal is to extract characteristics of the file being studied to determine whether it is malicious or benign.
- Dynamic Analysis: This involves executing the malware in a controlled environment to observe its behavior [1]. One example of a dynamic environment for feature extraction and malware detection is a set of cloud-based virtual machines (VMs): a highly controlled setup in which multiple VMs run to simulate different operating systems and user environments. These VMs are used to analyze malware samples and extract features that can aid in the early and accurate detection of threats. With cloud infrastructure and resources, researchers can conduct thorough and comparative malware analyses, identifying behavioral patterns and malicious signatures to develop better protection measures and risk mitigation strategies [40].
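As a minimal illustration of the string-extraction step mentioned under static analysis, the following Python sketch pulls printable ASCII runs out of a byte buffer without executing it. The function name `extract_strings`, the minimum length of four characters, and the sample bytes are assumptions made for illustration; they are not taken from the reviewed studies.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII bytes found in data.

    Assumption of this sketch: a "string" is a run of bytes in the range
    0x20-0x7e, a common heuristic for inspecting binaries statically.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical sample: binary noise surrounding a URL and an API name.
sample = (
    b"\x00\x01MZ\x90\x00http://example.test/payload\x00"
    b"\xff\xfeabc\x00CreateRemoteThread\x00"
)

for s in extract_strings(sample):
    print(s)
```

Short runs such as `MZ` and `abc` fall below the length threshold and are dropped; the surviving strings could then serve as candidate features for a classifier.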
3.4. Feature Extraction Techniques
- N-gram: This technique extracts consecutive sequences of n elements from a file or data stream, grouping them by n-values (e.g., 2-grams, 4-grams). For example, if the system outputs the sequence P = (1, 2, 3, 4, 5), then the 2-grams and 4-grams are ({1, 2}, {2, 3}, {3, 4}, {4, 5}) and ({1, 2, 3, 4}, {2, 3, 4, 5}), respectively. This technique can be applied to both static and dynamic analyses. Despite being widely used in malware detection, the n-gram technique has the limitation that the extracted attributes, whether static or dynamic, may not follow a meaningful sequential relationship, making classification and grouping more challenging [41].
- Graph-based Technique: Uses graph structures to represent interactions between file components. For instance, Control Flow Graph (CFG) analysis is a specific graph-based approach used for detecting malware behavior, particularly in environments like IoT [32]. Subgraphs can be created for larger datasets, aiding analysis.
- Vision-based Techniques: Image-based feature extraction in malware analysis involves converting the malware or binarized file into a visual representation. In this approach, the binarized malware is grouped into 8-bit sets and represented in a two-dimensional (2D) matrix. This technique can be used in both static and dynamic analyses [42]. The main advantage of image-based visualization in malware analysis is the ability to extract information from the file, such as operation code (opcode), byte sequence, API calls, and system calls.
- Hashing Technique: A feature extraction method for malware identification and analysis. It generates a fixed-length hash value (e.g., MD5, SHA-1, SHA-256) from binary data, providing a quick and, in practice, unique identifier for malware detection and classification [43].
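The n-gram, vision-based, and hashing techniques listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the zero-padding of the last image row, and the choice of SHA-256 are assumptions of this example rather than details taken from the cited works.

```python
import hashlib
from typing import List, Sequence, Tuple


def ngrams(seq: Sequence, n: int) -> List[Tuple]:
    """Return all consecutive length-n windows of seq, in order."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def bytes_to_grid(data: bytes, width: int) -> List[List[int]]:
    """Lay bytes out row by row as a 2D grid of 8-bit values (0-255).

    Assumption: the final row is zero-padded so that every row has the
    same width, as an image matrix requires.
    """
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# The sequence P from the n-gram example in the text.
P = (1, 2, 3, 4, 5)
print(ngrams(P, 2))  # [(1, 2), (2, 3), (3, 4), (4, 5)]
print(ngrams(P, 4))  # [(1, 2, 3, 4), (2, 3, 4, 5)]

# Hypothetical five-byte "file" rendered as a grid four bytes wide.
print(bytes_to_grid(bytes([0, 255, 16, 32, 64]), 4))

# Identical bytes always produce the same digest.
print(sha256_hex(b"example bytes"))
```

In an image-based pipeline, each grid value would typically be interpreted as a grayscale pixel intensity before being passed to a convolutional model.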
3.5. Techniques for Malware Detection
3.6. Advanced Malware Detection Techniques
3.6.1. Adversarial Machine Learning in Cybersecurity and Intrusion Detection
3.6.2. Boosting Training for PDF Malware Classification via Active Learning
3.6.3. AI-Enabled Intrusion Detection Techniques in Complex Digital Ecosystems
3.6.4. IIoT Malware Detection Using Edge Computing and Deep Learning
3.6.5. Memory Forensics for Fileless Malware Detection
3.6.6. Machine Learning for Malware Detection
3.6.7. Federated Learning for Anomaly Detection in IoT Networks
3.6.8. Image-Based Malware Classification Using EfficientNet
3.6.9. Hybrid Approach to Mitigate Adversarial Evasion Attacks by Ransomware
3.6.10. Deep Learning and SVM-Based Malware Detection Technique
3.6.11. Network Anomaly Detection Technique Using a Distributed Big Data System-Based Enhanced Stacking of Binary Classifiers
3.6.12. Deep Dive into Early Ransomware Detection Technique Based on Machine Learning and Event Analysis
3.6.13. Evaluation of Transfer Learning Techniques for Malware Detection
Title | Technique Used | Machine Learning | Benefits |
---|---|---|---|
Image-based malware representation approach with EfficientNet convolutional neural networks for effective malware classification [67] | EfficientNetB1 with image representation | Yes | High precision (99%) in malware classification, computational efficiency |
Mitigating adversarial evasion attacks of ransomware using ensemble learning [68] | Ensemble learning (Naïve Bayes, Decision Trees, Random Forests, SVM, Logistic Regression, AdaBoost) | Yes | High precision in ransomware detection, resistance to adversarial evasion attacks |
Effectiveness Analysis of Transfer Learning for the Concept Drift Problem in Malware Detection [70] | Transfer learning (TrAda, CORAL, DAE) | Yes | Improved precision in the presence of concept drift, efficient use of old data |
Fileless malware threats: Recent advances, analysis approach through memory forensics and research challenges [65] | Memory forensics | No | Detection and analysis of fileless malware, identification of malicious processes in memory |
Evaluation of Machine Learning Algorithms for Malware Detection [66] | Naive Bayes, SVM, J48, Random Forest, Decision Tree, CNN | Yes | High precision (DT 99%), low false positive rate |
Phishing URLs detection using sequential and parallel ml techniques: Comparative analysis [76] | Random Forest, Naive Bayes, CNN, LSTM | Yes | High precision in phishing URL detection (CNN-LSTM 99.1%) |
Distributed deep neural network-based middleware for cyber-attacks detection in smart IoT ecosystem: A novel framework and performance evaluation approach [77] | Distributed deep neural network (DNN) | Yes | High precision in IoT cyber-attack detection |
An ameliorated multiattack network anomaly detection in distributed big data system-based enhanced stacking multiple binary classifiers [52] | Binary classifier stacking ensemble | Yes | High precision in network anomaly detection |
Functionality-preserving adversarial machine learning for robust classification in cybersecurity and intrusion detection domains: A survey [47] | Adversarial machine learning (AML) | Yes | Improved robustness of classification, functionality preservation |
Phishing URLs detection using machine learning techniques [78] | K-Nearest Neighbors, Decision Trees, SVM, Random Forest | Yes | High precision in phishing URL detection (RF 97.6%) |
Detection of phishing attacks using sequential and parallel machine learning techniques [71] | Random Forest, Naive Bayes, CNN, LSTM | Yes | High precision in phishing URL detection (CNN-LSTM 99.1%) |
Deep learning approaches for malware detection [72] | LSTM, CNN | Yes | High precision in malware detection (LSTM 98.5%) |
Anomaly detection in networks using machine learning [73] | Random Forest, SVM, Naive Bayes | Yes | High precision in network anomaly detection (RF 95.4%) |
Detection of cyber-attacks in IoT environments using machine learning techniques [74] | Decision Tree, Random Forest, SVM | Yes | High precision in IoT cyber-attack detection (RF 96.2%) |
Detection of ransomware using machine learning techniques [75] | K-Nearest Neighbors, Decision Tree, Random Forest | Yes | High precision in ransomware detection (RF 97.8%) |
3.7. Limitations of Current Approaches and Opportunities
3.8. Datasets for Malware Detection
Dataset | Description | Characteristics |
---|---|---|
D1, D2, and D3 [34] | Datasets of benign and malicious files collected during January, February, and March 2017 | Include features extracted from the PE file header, byte histograms, file entropy, and text strings |
R2-D2 [34] | Contains RGB images translated from DEX files obtained by decompressing approximately 2 million benign and malicious Android apps | Includes data from Trojan, AdWare, Clicker, SMS, Spy, Ransom, Banker, among others. Images are 299 × 299 pixels |
CIC-InvesAndMal2019 [35] | Contains adware, botnet, premium SMS, ransomware, SMS, and scareware. Uses real devices to install 5000 samples obtained (426 malware and 5065 benign). | 42 distinct families |
CICMalDroid 2020 [35] | Contains 17,341 Android samples obtained from VirusTotal, Contagio, AMD, and MalDozer. | Includes malware such as Adware, banking, riskware, SMS, and benign |
Microsoft Malware [36] | Dataset used in the Microsoft malware classification competition in 2015. | Contains malware from 9 different families. |
Malimg [36] | Contains 9435 real-world malware samples belonging to 25 different families. | Used in visualization-based malware classification tasks. |
VirusShare [36] | Large collection of real-world malicious executables, with a continuously updated corpus of malicious samples. | 47,132,110 malware samples. |
Drebin [32] | Composed of a total of 15,036 samples, 5560 of which are malware and 9476 are benign, with 215 distinct features. | 179 different malware families. 53% manifest permissions, 33% API signatures, and the rest are other forms of API call signatures such as intent signatures and commands. |
Malgenome [32] | Composed of a total of 3799 samples, 1260 of which are malware and 2539 are benign | Contains 215 features derived from 49 different Android malware families. |
Edge-IIoTset [77] | IoT network traffic dataset with 1,363,998 normal samples and 545,673 attack samples. | Network traffic data converted into structured data. |
NSL-KDD [66] | Derived version of KDDCup-99 without duplicate records. | Over 1 million records grouped into categories of normal traffic and attack types. |
CICIDS-2017 [66] | Realistic network traffic data recorded using various tools and protocols. | Approximately 600,000 network traffic records. |
Bot-IoT [66] | Data collected from various IoT devices with botnet attacks. | 3.5 million records. Includes attacks such as DDoS, DoS, Reconnaissance, and Theft. |
Drebin-215 [32] | Observations of benign and malicious Android applications. | 15,036 applications with 215 attributes. Includes a binary response column to determine whether the application is benign or malicious. |
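Several rows of the table above report counts for both classes, so the class balance of those datasets can be checked with simple arithmetic. The sketch below uses only the counts given in the table; the dictionary layout and the function name `malicious_share` are assumptions made for illustration.

```python
# Class counts as reported in the dataset table: (malicious, benign).
# For Edge-IIoTset the two classes are attack and normal traffic samples.
COUNTS = {
    "Drebin": (5560, 9476),
    "Malgenome": (1260, 2539),
    "Edge-IIoTset": (545_673, 1_363_998),
}


def malicious_share(malicious: int, benign: int) -> float:
    """Return the fraction of samples that belong to the malicious class."""
    return malicious / (malicious + benign)


for name, (mal, ben) in COUNTS.items():
    total = mal + ben
    print(f"{name}: {total} samples, {malicious_share(mal, ben):.1%} malicious")
```

In each of these three datasets the malicious class is the minority, which is worth keeping in mind when comparing reported accuracy or precision figures across studies.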
4. Discussion
4.1. Summary and Interpretation of Key Findings
4.2. Comparison with Previous Studies
4.3. Strengths and Limitations
4.4. Comprehensive Analysis of Malware Detection Techniques
4.5. Implications for Practice
4.6. Implications for Future Research
4.6.1. Development of Hybrid Detection Techniques
4.6.2. Improving the Interpretability of Deep Learning Models
4.6.3. Applications of Federated Learning Techniques
4.6.4. Collaboration Between Academia and Industry
4.6.5. Improving Defense Against Emerging Threats
4.6.6. Advances in Incident Response Automation
5. Conclusions
Funding
Acknowledgments
Conflicts of Interest
References
- Aslan, O.; Yilmaz, A.A. A New Malware Classification Framework Based on Deep Learning Algorithms. IEEE Access 2021, 9, 87936–87951.
- Aslan, Ö.; Ozkan-Okay, M.; Gupta, D. A Review of Cloud-Based Malware Detection System: Opportunities, Advances and Challenges. Eur. J. Eng. Technol. Res. 2021, 6, 1–8.
- Zhang, Q.; Wu, P.; Li, R.; Chen, A. Digital transformation and economic growth Efficiency improvement in the Digital media era: Digitalization of industry or Digital industrialization? Int. Rev. Econ. Financ. 2024, 92, 667–677.
- Shaukat, K.; Luo, S.; Varadharajan, V. A novel machine learning approach for detecting first-time-appeared malware. Eng. Appl. Artif. Intell. 2024, 131, 107801.
- Kim, G.; Lee, C.; Jo, J.; Lim, H. Automatic extraction of named entities of cyber threats using a deep Bi-LSTM-CRF network. Int. J. Mach. Learn. Cybern. 2020, 11, 2341–2355.
- Deloitte. Beneath the Surface of a Cyberattack: A Deeper Look at Business Impacts. 2019. Available online: https://conventuslaw.com/report/beneath-the-surface-of-a-cyberattack-a-deeper-look/ (accessed on 9 March 2025).
- TetherView. 10 Business Impacts of a Data Breach. 2024. Available online: https://tetherview.com/blog/the-devastating-business-impacts-of-a-cyber-breach (accessed on 9 March 2025).
- Investopedia. 10 Ways Cybercrime Impacts Business. 2025. Available online: https://www.investopedia.com/financial-edge/0112/3-ways-cyber-crime-impacts-business.aspx (accessed on 9 March 2025).
- 6dg. Devastating Examples of the Consequences of a Cyber-Attack. 2021. Available online: https://www.6dg.co.uk/blog/consequences-of-a-cyber-attack/ (accessed on 9 March 2025).
- Cisco. What Is Machine Learning in Security? 2023. Available online: https://www.cisco.com/site/us/en/learn/topics/security/what-is-machine-learning-in-security.html (accessed on 9 March 2025).
- Kaspersky. Artificial Intelligence and Machine Learning in Cybersecurity; Springer: Cham, Switzerland, 2023.
- Souri, A.; Hosseini, R. A state-of-the-art survey of malware detection approaches using data mining techniques. Hum.-Centric Comput. Inf. Sci. 2018, 8, 1–22.
- Vinayakumar, R.; Soman, K.; Poornachandran, P. Deep learning approach to cybersecurity analysis. 2019 IEEE Trans. Emerg. Top. Comput. Intell. 2019, 4, 174–185.
- Sikorski, M.; Honig, A. Practical Malware Analysis: The Hands-on Guide to Dissecting Malicious Software; No Starch Press: San Francisco, CA, USA, 2012.
- Symantec. Internet Security Threat Report. 2019. Available online: https://docs.broadcom.com/docs/istr-24-executive-summary-en (accessed on 9 March 2025).
- Ye, Y.; Li, T.; Adjeroh, D.; Iyengar, S.S. A survey on malware detection using data mining techniques. ACM Comput. Surv. (CSUR) 2017, 50, 1–40.
- Buczak, A.L.; Guven, E. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun. Surv. Tutorials 2016, 18, 1153–1176.
- Ucci, D.; Aniello, L.; Baldoni, R. Survey of machine learning techniques for malware analysis. Comput. Secur. 2019, 81, 123–147.
- Anderson, H.S.; Woodbridge, J.; Filar, B. DeepDGA: Adversarially-tuned domain generation and detection. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, Vienna, Austria, 28 October 2016; pp. 13–21.
- Alazab, M.; Tang, M.; Luo, Y.; Wan, Y.; Alazab, A. Deep learning applications for cyber security. IEEE Access 2020, 7, 48597–48610.
- Aslan, Ö.; Samet, R. A comprehensive review on malware detection approaches. IEEE Access 2020, 8, 6249–6271.
- Zhu, Z.; Dumitras, T. FeatureSmith: Automatically Engineering Features for Malware Detection by Mining the Security Literature. IEEE Trans. Inf. Forensics Secur. 2016, 13.
- Pascanu, R.; Stokes, J.; Sanossian, H.; Marinescu, M.; Thomas, A. Malware classification with recurrent networks. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, 19–24 April 2015; pp. 1916–1920.
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. Declaración PRISMA 2020: Una guía actualizada para la publicación de revisiones sistemáticas. Rev. Española De Cardiol. 2021, 74, 790–799.
- Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering. EBSE Technical Report EBSE-2007-01, Keele University. 2007. Available online: https://legacyfileshare.elsevier.com/promis_misc/525444systematicreviewsguide.pdf (accessed on 9 March 2025).
- Orman, H. The Morris worm: A fifteen-year perspective. IEEE Secur. Priv. 2003, 1, 35–43.
- Alenezi, M.N.; Alabdulrazzaq, H.K.; Alshaher, A.A.; Alkharang, M.M. Evolution of Malware Threats and Techniques: A Review. Int. J. Commun. Networks Inf. Secur. 2020, 12, 326–337.
- Executive Summary. Available online: https://en.wikipedia.org/wiki/Executive_summary (accessed on 9 March 2025).
- Ezzat Salem, I.; Hashim Al-Saedi, K. Enhancing cloud security through the integration of deep learning and data mining techniques: A comprehensive review. Period. Eng. Nat. Sci. 2023, 11, 176.
- Sáez-de Cámara, X.; Flores, J.L.; Arellano, C.; Urbieta, A.; Zurutuza, U. Clustered federated learning architecture for network anomaly detection in large scale heterogeneous IoT networks. Comput. Secur. 2023, 131, 103299.
- Raju, A.D.; Abualhaol, I.Y.; Giagone, R.S.; Zhou, Y.; Huang, S. A Survey on Cross-Architectural IoT Malware Threat Hunting. IEEE Access 2021, 9, 91686–91708.
- Bobrovnikova, K.; Lysenko, S.; Savenko, B.; Gaj, P.; Savenko, O. Technique for IoT Malware Detection Based on Control Flow Graph Analysis. Radio Electr. Comp. Sci. Control 2022, 1, 141–153.
- Almomani, I.; Alkhayer, A.; El-Shafai, W. An Automated Vision-Based Deep Learning Model for Efficient Detection of Android Malware Attacks. IEEE Access 2022, 10, 2700–2720.
- Yadav, P.; Menon, N.; Ravi, V.; Vishvanathan, S.; Pham, T. EfficientNet Convolutional Neural Networks-Based Android Malware Detection. Comput. Secur. 2022, 115, 102622.
- Ullah, F.; Alsirhani, A.; Alshahrani, M.; Alomari, A.; Naeem, H.; Shah, S. Explainable Malware Detection System Using Transformers-Based Transfer Learning and Multi-Model Visual Representation. Sensors 2022, 22, 6766.
- Shaukat, K.; Luo, S.; Varadharajan, V. A Novel Deep Learning-Based Approach for Malware Detection. Eng. Appl. Artif. Intell. 2023, 122, 106030.
- Watson, M.R.; Shirazi, N.u.h.; Marnerides, A.K.; Mauthe, A.; Hutchison, D. Malware Detection in Cloud Computing Infrastructures. IEEE Trans. Dependable Secur. Comput. 2016, 13, 192–205.
- Aslan, O.; Ozkan-Okay, M.; Gupta, D. Intelligent Behavior-Based Malware Detection System on Cloud Computing Environment. IEEE Access 2021, 9, 83252–83271.
- Zahoora, U.; Rajarajan, M.; Pan, Z.; Khan, A. Zero-Day Ransomware Attack Detection Using Deep Contractive Autoencoder and Voting Based Ensemble Classifier. IEEE Access 2022, 52, 13941–13960.
- Mishra, P.; Verma, I.; Gupta, S. KVMInspector: KVM Based introspection approach to detect malware in cloud environment. J. Inform. Secur. Appl. 2020, 51, 102460.
- Ferdous, J.; Islam, R.; Mahboubi, A.; Islam, M.Z. A Review of State-of-the-Art Malware Attack Trends and Defense Mechanisms. IEEE Access 2023, 11, 121118–121141.
- Jang, S.; Li, S.; Sung, Y. FastText-based local feature visualization algorithm for merged image-based malware classification framework for cyber security and cyber defense. Mathematics 2020, 8, 460.
- Choi, S. Combined kNN Classification and Hierarchical Similarity Hash for Fast Malware Detection. Appl. Sci. 2020, 10, 5173.
- Kumar, R.; Subbiah, G. Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach. Sensors 2022, 22, 2798.
- Zhang, L.; Wang, H. Application of Ensemble Learning Methods in Malware Detection. Comput. Secur. 2021, 102, 102–115.
- Prajapati, P.; Stamp, M. An Empirical Analysis of Image-Based Learning Techniques for Malware Classification. In Malware Analysis Using Artificial Intelligence and Deep Learning; Springer: Berlin/Heidelberg, Germany, 2021; pp. 411–435.
- McCarthy, A.; Ghadafi, E.; Andriotis, P.; Legg, P. Functionality-Preserving Adversarial Machine Learning for Robust Classification in Cybersecurity and Intrusion Detection Domains: A Survey. J. Cybersecur. Priv. 2022, 2, 154–190.
- Wang, Y.; Zhang, X. Adversarial Machine Learning Techniques for Robust Malware Detection. J. Comput. Secur. 2021, 29, 345–360.
- Ali, S.F.; Abdulrazzaq, M.R.; Gaata, M.T. Learning Techniques-Based Malware Detection: A Comprehensive Review. Mesopo. J. CyberSecur. 2025, 3, 1–15.
- Dehghantanha, A.; Conti, M. Cyber Threat Intelligence: Challenges and Opportunities. J. Inf. Secur. Appl. 2018, 70, 103–115.
- Ferdous, J.; Islam, R.; Mahboubi, A.; Islam, M.Z. A Survey on ML Techniques for Multi-Platform Malware Detection: Securing PC, Mobile Devices, IoT, and Cloud Environments. Sensors 2025, 25, 1153.
- AlHabshy, A.A.; Hameed, B.I.; Eldahshan, K.A. An Ameliorated Multiattack Network Anomaly Detection in Distributed Big Data System-Based Enhanced Stacking Multiple Binary Classifiers. IEEE Access 2022, 10, 52724–52743.
- Yuan, C.; Guo, Y. Functionality-Preserving Adversarial Examples for Cybersecurity. ACM Comput. Surv. 2021, 54, 1–28.
- Wang, H.; Xu, J. Challenges in Adversarial Attacks on Intrusion Detection Systems. Comput. Secur. 2022, 110, 102–115.
- Bojanowski, L.; Wang, Q. Functionality-Preserving Adversarial Attacks in Visual and Network Domains. Cybersecur. Adv. 2021, 2, 178–190.
- Szegedy, C.; Zareapour, H. Adversarial Examples: Surprising Robustness of Neural Networks in Cybersecurity. Mach. Learn. Secur. 2022, 13, 99–112.
- Ghadafi, E.; McCarthy, A.; Andriotis, P.; Legg, P. Functionality-preserving adversarial machine learning for robust classification in cybersecurity and intrusion detection domains: A survey. Int. J. Cybersecur. 2021, 5, 123–135.
- Xu, S.; Wang, H. Systematic Review of Adversarial Attacks in Intrusion Detection. J. Netw. Secur. 2021, 19, 243–258.
- Ghadafi, E.; McCarthy, A.; Andriotis, P. Attacks and Defenses in Intrusion Detection Systems: A Survey on the Adversarial Perspective. Comput. Secur. 2022, 108, 102–115.
- Liu, Z.; Li, J. A Machine Learning-Based Framework for Ransomware Detection and Mitigation. Comput. Secur. 2021, 105, 102–115.
- Li, H.; Zhang, Y. Adversarial Detection in Intrusion Detection Systems: A Review. J. Inf. Secur. Appl. 2022, 58, 200–213.
- Li, Y.; Wang, X.; Shi, Z.; Zhang, R.; Xue, J.; Wang, Z. Boosting Training for PDF Malware Classifier via Active Learning. Int. J. Intell. Syst. 2022, 37, 2803–2821.
- Schmitt, M. Securing the Digital World: Protecting Smart Infrastructures and Digital Industries with Artificial Intelligence (AI)-Enabled Malware and Intrusion Detection. J. Inf. Secur. Appl. 2023, 36, 100520.
- Kim, H.-m.; Lee, K.-h. IIoT Malware Detection Using Edge Computing and Deep Learning for Cybersecurity in Smart Factories. Appl. Sci. 2022, 12, 7679.
- Kara, I. Fileless Malware Threats: Recent Advances, Analysis Approach Through Memory Forensics and Research Challenges. Expert Syst. Appl. 2023, 214, 119133.
- Akhtar, M.S.; Feng, T. Evaluation of Machine Learning Algorithms for Malware Detection. Sensors 2023, 23, 946.
- Chaganti, R.; Ravi, V.; Pham, T.D. Image-Based Malware Representation Approach with EfficientNet Convolutional Neural Networks for Effective Malware Classification. J. Inform. Secur. Appl. 2022, 69, 103306.
- Ahmed, U.; Lin, J.C.W.; Srivastava, G. Mitigating Adversarial Evasion Attacks of Ransomware Using Ensemble Learning. Comput. Electr. Eng. 2022, 100, 107903.
- Alqahtani, A.; Sheldon, F. A Survey of Crypto Ransomware Attack Detection Methodologies: An Evolving Outlook. Sensors 2022, 22, 1837.
- Escudero García, D.; DeCastro-García, N.; Muñoz Castañeda, A.L. An Effectiveness Analysis of Transfer Learning for the Concept Drift Problem in Malware Detection. Expert Systems Appl. 2023, 212, 118724.
- Johnson, A.; Brown, B. Detection of Phishing Attacks Using Sequential and Parallel Machine Learning Techniques. J. Inf. Secur. 2023, 18, 123–134.
- Davis, E.; Wilson, F. Deep Learning Approaches for Malware Detection. IEEE Trans. Inf. Forensics Secur. 2022, 17, 567–578.
- Lee, G.; Kim, H. Anomaly Detection in Networks Using Machine Learning. J. Netw. Comput. Appl. 2022, 190, 103207.
- Martinez, I.; Thompson, J. Detection of Cyber Attacks in IoT Environments Using Machine Learning Techniques. IEEE Internet Things J. 2022, 9, 3456–3467.
- White, K.; Green, L. Detection of Ransomware Using Machine Learning Techniques. J. Comput. Secur. 2022, 30, 189–201.
- Nagy, N.; Aljabri, M.; Shaahid, A.; Ahmed, A.; Alnasser, F.; Almakramy, L.; Alhadab, M.; Alfaddagh, S. Phishing URLs Detection Using Sequential and Parallel ML Techniques: Comparative Analysis. Sensors 2023, 23, 3467.
- Nashaat, M.; Shahid, M.; Nashaat, M.; Alshayeb, M. Distributed Deep Neural Network-Based Middleware for Cyber-Attacks Detection in Smart IoT Ecosystem: A Novel Framework and Performance Evaluation Approach. Electronics 2023, 12, 1876.
- Doe, J.; Smith, J. Phishing URLs Detection Using Machine Learning Techniques. J. Cybersecur. 2022, 15, 234–245.
- Seneviratne, S.; Shariffdeen, R.; Rasnayaka, S.; Kasthuriarachchi, N. Self-Supervised Vision Transformers for Malware Detection. IEEE Access 2022, 4.
- Perwej, D.; Qamar Abbas, S.; Pratap Dixit, J.; Akhtar, D.N.; Kumar Jaiswal, A. A Systematic Literature Review on the Cyber Security. Int. J. Sci. Res. Manag. 2021, 9, 669–710.
- Silvianita, A.; Zahid, A.; Fakhri, M.; Ahmad, M.; Yunani, A.; bin Abu Sujak, A.F. Cyber security and information: A systematic literature review. In Proceedings of the International Conference on Mathematical and Statistical Physics, Computational Science, Education and Communication (ICMSCE 2023), Istanbul, Turkey, 6–7 September 2023; Purnama, A., Arafah, B., Eds.; International Society for Optics and Photonics, SPIE: Bellingham, DC, USA, 2023; Volume 12936, p. 1293609.
- Saeed, S.; Suayyid, S.A.; Al-Ghamdi, M.S.; Al-Muhaisen, H.; Almuhaideb, A.M. A Systematic Literature Review on Cyber Threat Intelligence for Organizational Cybersecurity Resilience. Sensors 2023, 23, 7273.
- Al-Haija, Q.A.; Odeh, A.; Qattous, H. PDF Malware Detection Based on Optimizable Decision Trees. Electronics 2022, 11, 3142.
- Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
Platform | Keywords | Quantity |
---|---|---|
Web of Science | malware (All Fields) and cybersecurity (All Fields) | 578 |
Scopus | (TITLE-ABS-KEY (malware) AND TITLE-ABS-KEY (cybersecurity)) | 2126 |
Platform | Keywords | Quantity |
---|---|---|
Web of Science | malware (All Fields) AND cybersecurity (All Fields) and 2024 or 2023 or 2022 or 2021 or 2020 | 212 |
Scopus | (TITLE-ABS-KEY (malware) AND TITLE-ABS-KEY (cybersecurity)) AND PUBYEAR > 2019 AND PUBYEAR < 2025 | 1057 |
Technique | Percentage of Studies (%) |
---|---|
Machine Learning | 45 |
Deep Learning | 30 |
Computer Vision | 15 |
Static Analysis | 5 |
Dynamic Analysis | 5 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Berrios, S.; Leiva, D.; Olivares, B.; Allende-Cid, H.; Hermosilla, P. Systematic Review: Malware Detection and Classification in Cybersecurity. Appl. Sci. 2025, 15, 7747. https://doi.org/10.3390/app15147747