Article

A Tri-Axis Systematic Literature Review of AI-Powered Cyber Defense: ATT&CK-Aligned Analysis of Cyberattacks, Machine Learning Methods, and Datasets

School of Business, Computing and Social Sciences, University of Gloucestershire, Park Campus, Cheltenham GL50 2RH, UK
* Author to whom correspondence should be addressed.
Electronics 2026, 15(13), 2804; https://doi.org/10.3390/electronics15132804 (registering DOI)
Submission received: 13 March 2026 / Revised: 8 May 2026 / Accepted: 12 May 2026 / Published: 25 June 2026

Abstract

The increasing complexity and sophistication of cyberattacks have made machine learning (ML) and artificial intelligence (AI) central to modern cyber defense. However, existing surveys typically examine attacks, ML methods, or datasets separately, limiting understanding of how methodological choices align with adversarial behaviours and benchmark availability. This paper presents a systematic literature review (SLR) of AI- and ML-based cyber defense studies published between 2019 and 2025, framed as an ATT&CK-aligned tri-axis synthesis of cyberattacks, machine learning methods, and datasets. Across 99 primary studies, the review maps 312 attack labels to MITRE ATT&CK tactics and techniques, categorises the ML methods applied, and organizes 96 datasets into a refined taxonomy spanning NIDD, IoT-NIDD, malware, Spam and Phishing, ICS, Insider Threat, custom-collected, and other datasets. Rather than treating attacks, ML methods, and datasets as separate descriptive dimensions, the review analyses them jointly through a tri-axis cross-reference framework, enabling the identification of benchmark dependence, methodological concentration, and underexplored attack–method–dataset intersections that are not visible in single-axis or model-centred surveys. The synthesis shows that the literature is strongly concentrated on externally visible attacks associated with Impact, Initial Access, and Execution, that ensemble and deep learning models dominate high-frequency detection settings, and that dataset usage remains heavily skewed toward a small set of public benchmarks, particularly CIC-IDS2017, UNSW-NB15, and NSL-KDD. This review further identifies persistent blind spots, including limited coverage of post-compromise ATT&CK behaviours, sparse use of ICS and insider-threat datasets, and weak support for multi-stage or multi-dataset evaluation. These findings provide a more focused and actionable evidence base for future ML-based cyber defense research.

1. Introduction

Cybersecurity has become one of the most pressing challenges in the digital era, primarily due to the exponential growth of interconnected systems, cloud infrastructures, and Internet of Things (IoT) devices [1]. This rapid expansion of the attack surface has been accompanied by a marked increase in the complexity and sophistication of cyber threats [2]. Traditional attacks such as brute-force password guessing or denial-of-service (DoS) have evolved into highly coordinated, multi-vector campaigns that exploit advanced malware, supply-chain vulnerabilities, and even adversarial artificial intelligence (AI) [3,4]. As a result, conventional signature-based or rule-based detection mechanisms are increasingly insufficient for addressing modern adversarial tactics, techniques, and procedures (TTPs) [5,6].
The evolution of cyberattacks reflects a growing asymmetry between adversarial innovation and defensive readiness [7]. Recent years have witnessed a proliferation of sophisticated attacks targeting diverse domains, ranging from ransomware and distributed denial-of-service (DDoS) campaigns to stealthy insider threats and industrial control system (ICS) compromises [8,9]. While some research areas, such as intrusion detection [10,11], have attracted significant attention, others remain comparatively underexplored. This imbalance suggests critical research gaps in which adversaries may enjoy a strategic advantage owing to limited academic or operational focus. Understanding not only which attack types dominate the research landscape, but also which ones are neglected, is essential for shaping future defensive strategies [12].
In response to these challenges, the field has increasingly turned to machine learning (ML) methods for adaptive and data-driven cyber defense [13]. The machine learning methods used in cybersecurity span a wide spectrum of techniques, ranging from foundational optimization and classical methods to advanced ensemble and hybrid deep learning architectures [14]. These methods are deployed to address critical challenges, including intrusion detection, malware classification, phishing detection, and insider threat monitoring [3,15]. Deep learning approaches such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) have further demonstrated strong capabilities in capturing temporal, spatial, and multi-modal attack patterns, thereby enhancing detection accuracy [16,17,18]. However, this diversity of techniques raises important questions: Which ML paradigms are most effective for particular adversarial behaviours or attack stages? Why do certain methods dominate in specific contexts while others remain underutilized?
A systematic way to address these questions is by mapping ML methods onto structured attack frameworks. The MITRE ATT&CK framework provides a comprehensive taxonomy of adversarial TTPs across stages such as Initial Access, Lateral Movement, and Exfiltration [19]. By aligning ML approaches with these tactics and techniques, researchers can gain deeper insight into the suitability of specific algorithms for particular adversarial behaviours. For example, ensemble learning methods have frequently demonstrated utility in classifying attack behaviours, while sequential models such as LSTM and GRU are often applied to detect exfiltration or credential access attacks [17,18]. Such cross-referencing not only illuminates prevailing trends but also highlights underutilized approaches with potential for novel defensive applications.
Equally critical to AI-driven cybersecurity research is the question of datasets. The reliability and generalisability of ML models are fundamentally dependent on the representativeness of their training data [18]. Despite the frequent use of benchmark datasets such as NSL-KDD [20], UNSW-NB15 [21], and CIC-IDS2017 [22], these resources have well-documented limitations, including outdated traffic patterns and limited coverage of modern adversarial techniques [1,8]. While more specialized datasets such as ToN-IoT [23] and BoT-IoT [9] have emerged for IoT security, other critical domains—including insider threats, multi-vector attack simulations, and ICS environments—remain comparatively under-represented. The lack of systematic integration across dataset categories further constrains the robustness of existing ML models and leaves significant blind spots in defensive research.
Several systematic literature reviews (SLRs) have examined the intersection of AI and cybersecurity from different perspectives. Foundational surveys such as those by Buczak and Guven [6] and Sommer and Paxson [1] established early discussions around machine learning for intrusion detection, while Ring et al. [17] and Yang et al. [18] examined the growing use of deep learning for cybersecurity tasks. More recent reviews have expanded the field in different directions. For example, Sowmya et al. [24] focused on AI-based intrusion detection, Mvula et al. [8] reviewed cybersecurity data repositories and performance metrics for semi-supervised learning, Salem et al. [25] surveyed AI-driven cyber-threat detection techniques, Ofusori et al. [26] provided a broad review of AI applications in cybersecurity, and Rehman et al. [27], Hozouri et al. [28], and Dobler et al. [29] examined IDS-oriented or dataset-oriented aspects of the field.
Despite these valuable contributions, most existing reviews remain limited in three respects:
  • They emphasise ML models without systematically connecting them to adversarial behaviours or structured taxonomies such as MITRE ATT&CK;
  • They treat datasets in isolation, without a comprehensive categorisation across domains such as NIDD, IoT-NIDD, ICS, and Insider Threat datasets;
  • They rarely provide cross-reference analyses that jointly consider attacks, ML paradigms, and datasets.
To address these limitations, this paper is positioned as a systematic literature review with tri-axis synthesis, rather than as a general survey of AI in cybersecurity. This review is organized around three linked evidence dimensions: (i) cyberattacks, represented through MITRE ATT&CK tactics and techniques; (ii) machine learning methods, spanning classical, ensemble, hybrid, and deep learning approaches; and (iii) datasets, categorised into a refined taxonomy of cybersecurity data sources. The novelty of this review lies not only in aligning attacks to MITRE ATT&CK or in organising datasets into a refined taxonomy, but in cross-referencing cyberattacks, ML methods, and datasets jointly as three linked evidence axes. This tri-axis perspective makes it possible to identify concentration patterns, benchmark dependence, and missing attack–method–dataset intersections that are not visible in single-axis or model-centred surveys.
The scope of this review is intentionally bounded. It includes peer-reviewed studies published between 2019 and 2025 that apply ML- or AI-based methods to cyberattack detection, classification, or mitigation and that report identifiable attack types and datasets. It does not aim to review all applications of AI in cybersecurity (e.g., cryptography, privacy-preserving AI, or secure software engineering), and it excludes purely conceptual discussions and studies that do not expose a clear attack–method–dataset relationship. This boundary is necessary to support a consistent and reproducible cross-reference synthesis across the three evidence axes.
Accordingly, this paper should be read not as a general review of AI-powered cybersecurity, but as an ATT&CK-aligned SLR that synthesizes the literature through the linked lenses of cyberattacks, machine learning methods, and datasets.

Positioning Against Existing Reviews

Recent review papers provide valuable but partial views of the literature. Sowmya et al. [24] review AI-based intrusion detection and organize work around ML, DL, and ensemble methods, but remain IDS-centred rather than ATT&CK-centred. Mvula et al. [8] provide an SLR of cybersecurity datasets and performance metrics for semi-supervised learning, but their emphasis is dataset repositories and evaluation metrics rather than cross-referencing attacks, methods, and datasets. Salem et al. [25] survey AI-driven detection techniques across more than sixty recent studies and cover broad cyber threats such as malware, network intrusion, and spam, but they do not organize the literature through an ATT&CK-aligned tri-axis synthesis. Ofusori et al. [26] review AI applications in cybersecurity at a broader level, but without a structured ATT&CK-based threat synthesis. More recent IDS-focused reviews, such as Rehman et al. [27] and Hozouri et al. [28], concentrate on detection models, benchmark datasets, evaluation metrics, and IDS architectures, again without an explicit ATT&CK mapping or a structured attack–method–dataset cross-reference. Dataset-focused work also remains more specialized: Dobler et al. [29], for instance, systematically characterize malicious industrial network traffic datasets for ML evaluation, but their contribution is dataset-centric rather than a broader synthesis across attacks, ML methods, and dataset usage. Table 1 summarises how this review differs from representative recent surveys.
Taken together, these reviews offer complementary but largely single-axis perspectives. Broad AI–cybersecurity surveys tend to emphasise methods and applications at a high level, IDS-focused reviews concentrate on detection architectures, datasets, and performance measures, and dataset-oriented studies examine repositories or benchmark suitability within narrower methodological settings. In contrast, the present study is designed as an ATT&CK-aligned systematic literature review with tri-axis synthesis. Its contribution is not merely to summarize attacks, models, or datasets separately, but to examine how these three evidence dimensions interact across the literature. This framing makes it possible to identify underexplored ATT&CK behaviours, methodological concentration around particular ML families, benchmark dependence, and missing intersections across attack types, model classes, and dataset categories.
Accordingly, the present review is differentiated not simply by broader coverage, but by its tri-axis analytical design, which cross-references attacks, methods, and datasets jointly rather than discussing each dimension in isolation.
The contributions of this study are threefold:
  • ATT&CK-aligned attack synthesis: This review identifies and maps 312 attack labels from 99 studies to MITRE ATT&CK tactics and techniques, providing a structured threat-oriented view of the literature.
  • Refined cross-domain dataset taxonomy: This review organizes 96 datasets into a refined taxonomy spanning NIDD, IoT-NIDD, malware, Spam and Phishing, ICS, Insider Threat, custom-collected, and other datasets, thereby clarifying the benchmark landscape used in AI-driven cyber defense research.
  • Tri-axis cross-reference analysis: The central contribution of this study is a joint cross-reference of cyberattacks, ML methods, and datasets as three linked evidence axes. This tri-axis analysis reveals methodological concentration, benchmark dependence, and underexplored attack–method–dataset intersections that are not visible when attacks, models, or datasets are examined separately.
Through this systematic approach, the review aims to bridge fragmented research streams in AI-driven cybersecurity and provide actionable insights for both academic and operational communities. The findings highlight not only current practices but also critical gaps—such as the underutilization of insider-threat and ICS datasets, and the persistent over-reliance on single-dataset experiments—that must be addressed to develop more resilient and generalizable ML-based defense systems.
The remainder of this paper is organized as follows. Section 2 outlines the methodology used to conduct the SLR. Section 3 presents a taxonomy and analysis of cyberattacks in line with MITRE ATT&CK tactics and techniques. Section 4 reviews and analyses the ML techniques used in the literature. Section 5 explores the datasets employed, their frequency, and the research gaps associated with them. Section 6 provides a cross-reference analysis of attack tactics, techniques, ML models, and datasets. Section 7 discusses the main gaps and limitations of the SLR, and Section 8 outlines implications for future research. Finally, Section 9 summarizes the main findings.

2. Methodology

Systematic Literature Reviews (SLRs) are widely used to identify research gaps, synthesize evidence, and structure knowledge within a specific domain. According to Alnabhan and Branco (2024) [30], an SLR provides a structured and reproducible approach for synthesising evidence and identifying trends in emerging research areas. This study follows the guidelines for conducting a Systematic Literature Review proposed by Kitchenham and Charters (2007) [31], which have been widely applied in software engineering and cybersecurity research. The review protocol was designed to ensure reproducibility, minimise bias, and systematically address the research questions. The process consists of seven linked phases:
  • Review Scope and Analytical Frame;
  • Research Questions;
  • Search Strategy and Literature Identification;
  • Study Selection and Eligibility Screening;
  • Data Extraction and Coding Procedure;
  • Attack Mapping, Method Grouping, and Dataset Categorisation Rules;
  • Ambiguity Handling, Consistency Checking, and Quality Appraisal.

2.1. Review Scope and Analytical Frame

This review was designed as a systematic literature review with tri-axis evidence synthesis. The analytical unit of the review is not the ML model alone, nor the attack category or dataset in isolation, but the reported relationship among attack, method, and dataset within each included study. Accordingly, studies were retained only when they provided sufficient evidence to identify: (i) the cyberattack type, family, or ATT&CK-relevant behaviour under study; (ii) the machine learning or deep learning method applied; and (iii) the dataset or data source used for evaluation. This design choice narrows the scope of the review to the literature that can support consistent cross-reference analysis across the three axes.
To preserve analytical coherence, this review does not seek to cover the full landscape of AI in cybersecurity. Studies were considered outside scope when they addressed cybersecurity in a broad conceptual sense without a clear attack focus, lacked explicit dataset grounding, or did not report a machine learning pipeline suitable for structured comparison. This bounded scope is aligned with the research questions and supports reproducible synthesis of trends, dominant practices, and gaps across attacks, methods, and datasets.

2.2. Review Research Questions

To guide the review and define the research aims, five research questions (RQs) were formulated, as shown in Table 2. These questions address the landscape of cyberattacks, machine learning methods, datasets, and gaps in the literature.

2.3. Search Strategy and Literature Identification

A systematic search strategy was adopted to retrieve literature relevant to AI- and ML-based cybersecurity methods for cyberattack detection, classification, or mitigation. To maximize coverage, four bibliographic sources were searched: IEEE Xplore, ACM Digital Library, Scopus, and Google Scholar. The search string was derived from the research questions and combined terms related to cyberattacks, machine learning, mitigation, and datasets:
(“cyberattack” OR “cyber security”) AND (“ML” OR “DL” OR “machine learning” OR “deep learning” OR “AI” OR “artificial intelligence”) AND (“mitigate” OR “mitigation”) AND (“dataset” OR “benchmarking”).
The search covered the period from January 2019 to March 2025 and produced an initial retrieval of 1960 papers.
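The search string has a regular structure: synonyms within each concept group are joined with OR, and the four concept groups are joined with AND. The Python sketch below regenerates the string from its term groups, which makes the query easier to audit or extend. The helper name `build_query` is hypothetical, and database-specific syntax (field tags, wildcard rules, query-length limits) is not modelled, so the generated string would still need adapting to each source.

```python
# Concept groups taken from the search string above.
SEARCH_GROUPS = [
    ["cyberattack", "cyber security"],
    ["ML", "DL", "machine learning", "deep learning", "AI", "artificial intelligence"],
    ["mitigate", "mitigation"],
    ["dataset", "benchmarking"],
]


def build_query(groups: list[list[str]]) -> str:
    """Quote each term, OR the terms inside a group, then AND the groups together."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_query(SEARCH_GROUPS))
```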

2.4. Study Selection

Figure 1 illustrates the study selection process. An initial total of 1960 articles was retrieved from four databases: Scopus (1075), ACM (661), Google Scholar (82), and IEEE Xplore (142). Duplicates and non-relevant records (non-English papers, posters, reviews, surveys, non-scientific publications, books, book chapters, newer versions of studies, guideline documents, and editorials) were then removed, excluding 715 records and leaving 1245 studies.
These 1245 studies were then subjected to title and abstract screening, during which the research methodology, results, and contributions were examined. This step excluded 787 articles, leaving 458 studies. Studies whose title and abstract did not make the application domain or contribution sufficiently clear were retained for the next stage.
Subsequently, full-text screening was carried out to assess the originality of the research and its relevance to cyberattacks, machine learning methods, and datasets. This process led to the exclusion of a further 359 studies. Ultimately, 99 primary studies were included in this systematic literature review and formed the basis for the findings and analysis presented in the following sections.
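The screening counts reported in this section are internally consistent, and a short check makes that explicit. This minimal Python sketch recomputes the selection flow from the per-database totals and the three exclusion stages; the stage names are paraphrased from the text.

```python
# Records retrieved per source, as reported for Figure 1.
retrieved = {"Scopus": 1075, "ACM": 661, "Google Scholar": 82, "IEEE Xplore": 142}

# Each stage: (stage name, number of records excluded at that stage).
stages = [
    ("duplicate and non-relevant removal", 715),
    ("title and abstract screening", 787),
    ("full-text screening", 359),
]


def flow(total: int, stages: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return the number of records remaining after each exclusion stage."""
    remaining = []
    for name, excluded in stages:
        total -= excluded
        remaining.append((name, total))
    return remaining


total_retrieved = sum(retrieved.values())
print(f"retrieved: {total_retrieved}")
for name, left in flow(total_retrieved, stages):
    print(f"after {name}: {left}")
```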

Inclusion and Exclusion Criteria

Explicit inclusion and exclusion criteria were applied to ensure quality and relevance (Table 3).

2.5. Data Extraction and Coding Procedure

For each included study, a structured extraction form was used to record bibliographic and analytical metadata. The extracted fields included publication year, application domain, reported attack type(s), machine learning (ML) or deep learning method(s), dataset(s) used, performance metrics, and any study-specific contextual notes required for subsequent coding. The extraction process was designed to support the tri-axis analytical frame introduced in Section 2.1, in which each study was represented through the linked dimensions of attack, method, and dataset.
The coding process proceeded in three stages. First, raw study-level descriptors were extracted as reported by the original authors wherever possible. Second, these raw descriptors were normalized into comparable analytical units. For example, reported attack names were standardized into consistent attack labels, model names were grouped into method families and subfamilies, and datasets were assigned to a unified taxonomy. Third, the normalized entries were linked across the three axes to enable cross-reference analysis of attack–method, attack–dataset, method–dataset, and attack–method–dataset relationships.
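The third coding stage, linking normalized entries across the three axes, amounts to counting co-occurrences within each study. The sketch below assumes each study has already been reduced to lists of normalized attack, method, and dataset labels; the example records are invented for illustration and are not drawn from the review data. Each study contributes at most once to any given pair or triple.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical normalized study records (illustrative values only).
studies = [
    {"attack": ["DoS", "Brute Force"], "method": ["Ensemble"], "dataset": ["NSL-KDD"]},
    {"attack": ["DoS"], "method": ["Ensemble", "LSTM/GRU"], "dataset": ["UNSW-NB15"]},
    {"attack": ["Phishing"], "method": ["Tree-Based Models"], "dataset": ["Custom-collected"]},
]

AXES = ("attack", "method", "dataset")


def cross_reference(records):
    """Count within-study co-occurrences for every pair of axes and for the full triple."""
    pairs = {pair: Counter() for pair in combinations(AXES, 2)}
    triples = Counter()
    for rec in records:
        for a, b in pairs:
            # set() ensures a study counts at most once per combination
            pairs[(a, b)].update(set(product(rec[a], rec[b])))
        triples.update(set(product(*(rec[axis] for axis in AXES))))
    return pairs, triples


pair_counts, triple_counts = cross_reference(studies)
print(pair_counts[("attack", "method")].most_common(3))
```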
To preserve traceability, coding was based on a principle of closest explicit evidence. When a study directly stated the attack type, ML method, or dataset used, that information was recorded without reinterpretation. Additional interpretation was introduced only when the study used broad or incomplete terminology and a more specific label could be derived from the dataset description, malware family, or evaluation context provided in the paper. This approach reduced unnecessary inference while allowing consistent aggregation across studies.

2.6. Attack Mapping, Method Grouping, and Dataset Categorisation Rules

Attack extraction and normalization. Attack evidence was extracted at the most specific level reported in each study. When authors explicitly named attack types (e.g., DoS, SQL injection, phishing, ransomware, or brute force), these labels were retained and later normalized for consistency across spelling variants, abbreviations, and closely related naming conventions. When a study referred only to a broad task such as “network intrusion detection” without enumerating attack classes, the attack labels were derived from the documented attack classes in the dataset used by that study. When a paper focused on malware families rather than named attack types, the malware family name was first recorded as the raw label and then mapped to the most appropriate ATT&CK-relevant adversarial behaviour based on the primary function or operational objective described in the study.
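The attack extraction rules can be read as an ordered fallback. The following Python sketch illustrates that ordering under stated assumptions: the synonym table, the malware-family lookup, and the family name `ExampleStealer` are hypothetical, while the four NSL-KDD attack categories (DoS, Probe, R2L, U2R) are those documented for that dataset. Each result keeps the raw label alongside its normalized form to preserve traceability.

```python
# Hypothetical normalisation table for spelling variants and abbreviations.
SYNONYMS = {"dos": "DoS", "ddos": "DDoS", "sqli": "SQL Injection", "sql injection": "SQL Injection"}

# Documented attack categories per dataset (NSL-KDD shown as an example).
DATASET_ATTACK_CLASSES = {"NSL-KDD": ["DoS", "Probe", "R2L", "U2R"]}

# Hypothetical malware family -> primary operational objective.
MALWARE_FAMILY_BEHAVIOUR = {"ExampleStealer": "Credential theft"}


def derive_attack_labels(study):
    """Return (raw_label, normalised_label) pairs following the extraction rules."""
    if study.get("named_attacks"):
        # Explicitly named attacks: keep them, normalising spelling variants.
        return [(raw, SYNONYMS.get(raw.strip().lower(), raw.strip()))
                for raw in study["named_attacks"]]
    if study.get("malware_family"):
        # Malware-family studies: record the family, then map it to its primary behaviour.
        raw = study["malware_family"]
        return [(raw, MALWARE_FAMILY_BEHAVIOUR.get(raw, raw))]
    # Broad task only: fall back to the attack classes documented for each dataset used.
    labels = []
    for dataset in study.get("datasets", []):
        labels.extend((cls, cls) for cls in DATASET_ATTACK_CLASSES.get(dataset, []))
    return labels
```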
ATT&CK mapping rules. After normalization, each attack label was mapped to one primary MITRE ATT&CK tactic and one primary technique to maintain a consistent unit of comparison across this review. The mapping followed a hierarchical decision rule: the adversary’s operational objective determined the tactic, and the most representative observable behaviour determined the technique. When a label could plausibly correspond to multiple ATT&CK entries, the mapping favoured the behaviour that best matched the study’s detection target rather than the full attack chain. For example, a denial-of-service label was mapped to an Impact tactic and the corresponding denial-of-service technique, whereas a malware family used primarily for credential theft was mapped to Credential Access rather than to malware execution in a generic sense. To maintain comparability across domains, MITRE ATT&CK for Enterprise was used as the primary reference framework throughout this review, including for studies involving ICS-related datasets.
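The one-tactic, one-technique rule gives every normalized label a single primary ATT&CK entry, which keeps frequency counts comparable across studies. The sketch below shows that structure with a small, illustrative lookup table. The tactic and technique identifiers are real Enterprise ATT&CK IDs, but the specific label-to-technique choices are assumptions made for the example and are not the review's full mapping table.

```python
from collections import Counter
from typing import NamedTuple, Optional


class AttackMapping(NamedTuple):
    """One primary ATT&CK tactic and technique for a normalised attack label."""
    tactic: str
    tactic_id: str
    technique: str
    technique_id: str


# Illustrative Enterprise ATT&CK entries (not the review's actual mapping table).
ATTACK_MAP = {
    "DoS": AttackMapping("Impact", "TA0040", "Network Denial of Service", "T1498"),
    "DDoS": AttackMapping("Impact", "TA0040", "Network Denial of Service", "T1498"),
    "Ransomware": AttackMapping("Impact", "TA0040", "Data Encrypted for Impact", "T1486"),
    "Brute Force": AttackMapping("Credential Access", "TA0006", "Brute Force", "T1110"),
    "Phishing": AttackMapping("Initial Access", "TA0001", "Phishing", "T1566"),
    "SQL Injection": AttackMapping(
        "Initial Access", "TA0001", "Exploit Public-Facing Application", "T1190"
    ),
}


def map_label(label: str) -> Optional[AttackMapping]:
    """Return the single primary mapping, or None if the label is not yet mapped."""
    return ATTACK_MAP.get(label)


def tactic_counts(labels):
    """Aggregate mapped labels by tactic and report any labels left unmapped."""
    counts = Counter()
    unmapped = []
    for label in labels:
        mapping = map_label(label)
        if mapping is None:
            unmapped.append(label)
        else:
            counts[mapping.tactic] += 1
    return counts, unmapped
```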
Machine learning method grouping. ML methods were first recorded using the terminology provided in the original study. They were then normalized into method families and subcategories to support meaningful aggregation across heterogeneous naming practices. Closely related variants were grouped under shared analytical labels; for example, decision-tree derivatives were grouped under Tree-Based Models, autoencoder variants under Autoencoders, and recurrent sequence models under LSTM/GRU families where appropriate. After this normalization step, the methods were organized into higher-level categories used throughout the review, namely Classical Machine Learning Models, Deep Learning Models, Hybrid/Ensemble/Explainable Methods, and Learning Paradigms and Optimization.
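Method grouping is a two-level normalization: reported model names are first matched to a family, and families are then rolled up into the higher-level categories used in the review. The regular-expression rules below are a hypothetical sketch covering only the three families named in the text; a name that matches no rule is flagged as unassigned for manual review rather than guessed.

```python
import re

# Hypothetical rules: (pattern on the reported model name, method family).
FAMILY_RULES = [
    (re.compile(r"\b(decision tree|c4\.5|cart|j48)\b", re.IGNORECASE), "Tree-Based Models"),
    (re.compile(r"autoencoder|\bvae\b|\bsae\b", re.IGNORECASE), "Autoencoders"),
    (re.compile(r"lstm|\bgru\b", re.IGNORECASE), "LSTM/GRU"),
]

# Family -> higher-level category used throughout the review.
FAMILY_TO_CATEGORY = {
    "Tree-Based Models": "Classical Machine Learning Models",
    "Autoencoders": "Deep Learning Models",
    "LSTM/GRU": "Deep Learning Models",
}


def group_method(reported_name):
    """Return (family, category); unmatched names are left for manual coding."""
    for pattern, family in FAMILY_RULES:
        if pattern.search(reported_name):
            return family, FAMILY_TO_CATEGORY[family]
    return reported_name, "Unassigned"
```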
Dataset categorisation rules. Datasets were recorded using the names reported by the study authors and then assigned to a refined taxonomy developed for this review. The categorisation was based on the dataset’s operational context, data source, and intended cybersecurity use case rather than on name alone. Accordingly, datasets were grouped into categories such as NIDD, IoT-NIDD, malware, Spam and Phishing, ICS, Insider Threat, custom-collected, and other datasets. When prior survey papers classified a dataset differently, reassignment in this review followed the way the dataset was actually used in the reviewed study. For example, datasets used in industrial monitoring or cyber-physical intrusion contexts were categorised under ICS even if they had been placed elsewhere in earlier surveys. This rule was adopted to prioritise analytical consistency with study usage over inheritance from prior taxonomies.
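The dataset rule gives the usage context in the reviewed study precedence over any category inherited from earlier surveys. A minimal sketch of that precedence follows. The prior-category table, the usage-context vocabulary, and the dataset name `ExampleFlows` are hypothetical; only the category names come from the taxonomy described above.

```python
# Hypothetical categories inherited from earlier surveys.
PRIOR_CATEGORY = {
    "NSL-KDD": "NIDD",
    "UNSW-NB15": "NIDD",
    "ToN-IoT": "IoT-NIDD",
    "BoT-IoT": "IoT-NIDD",
    "ExampleFlows": "NIDD",  # hypothetical dataset name
}

# Assumed usage-context vocabulary that overrides inherited categories.
USAGE_OVERRIDES = {
    "industrial monitoring": "ICS",
    "cyber-physical intrusion": "ICS",
}


def categorise_dataset(name, usage_context=None, custom_collected=False):
    """Assign a taxonomy category, letting study usage override prior classifications."""
    if custom_collected:
        return "Custom-collected"
    if usage_context in USAGE_OVERRIDES:
        return USAGE_OVERRIDES[usage_context]
    return PRIOR_CATEGORY.get(name, "Other")
```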

2.7. Ambiguity Handling, Consistency Checking, and Quality Appraisal

Ambiguities were handled through an explicit decision hierarchy. First, the study’s own wording was used whenever it was sufficiently specific. Second, if the wording was broad or underspecified, the dataset documentation and evaluation context reported in the study were used to recover the likely attack class, method category, or dataset role. Third, if multiple interpretations remained plausible, the more conservative and more general label was retained to avoid over-specification. This rule was particularly important for malware-family studies, multi-stage attacks, and papers that used broad labels such as “intrusion” or “malicious traffic”.
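The three-step hierarchy can be sketched as a function that prefers the study's own label, then a single label recovered from dataset documentation, and otherwise falls back to the most specific label that all remaining candidates share. The small parent-child label tree below is a hypothetical illustration of what “more general” means; the review does not necessarily organize its labels as an explicit tree.

```python
# Hypothetical label tree: child label -> more general parent label.
PARENT = {
    "SYN Flood": "DoS",
    "UDP Flood": "DoS",
    "DoS": "Malicious Traffic",
    "Port Scan": "Reconnaissance",
    "Reconnaissance": "Malicious Traffic",
}


def ancestors(label):
    """Return the label followed by its ancestors, most specific first."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def most_specific_common(candidates):
    """Most specific label under which every candidate falls, or 'Unresolved'."""
    if not candidates:
        return "Unresolved"
    shared = set(ancestors(candidates[0]))
    for label in candidates[1:]:
        shared &= set(ancestors(label))
    for label in ancestors(candidates[0]):
        if label in shared:
            return label
    return "Unresolved"


def code_label(study_label, dataset_candidates):
    """Decision hierarchy: study wording, then dataset evidence, then a conservative label."""
    if study_label:                                   # 1. sufficiently specific study wording
        return study_label
    if len(dataset_candidates) == 1:                  # 2. one label recovered from dataset docs
        return dataset_candidates[0]
    return most_specific_common(dataset_candidates)   # 3. more general, conservative label
```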
Initial screening, extraction, and coding were conducted by the first author. To improve coding consistency, ambiguous cases were flagged during extraction and revisited in iterative consistency passes using the shared coding rules described above. These passes harmonized earlier and later cases under a shared codebook, refined the normalization rules, removed duplicate or near-duplicate labels, and ensured that similar study designs were treated under the same analytical criteria across the final review dataset.
A lightweight quality appraisal was also applied at the included-study level. This appraisal did not exclude studies after screening, but it was used to judge the interpretive strength of the evidence base. The appraisal considered whether a study clearly specified the following: (i) the attack target, (ii) the ML method or pipeline, (iii) the dataset or data source, (iv) the evaluation setting or metrics, and (v) sufficient methodological detail to support comparative interpretation.
Representative difficult cases included studies that used broad labels such as “intrusion” without enumerating attack classes, malware-family studies without explicit behavioural labels, and datasets whose operational use in the reviewed study differed from earlier survey classifications. These cases were resolved using the decision hierarchy described above, with preference given to the study’s explicit evaluation target and the dataset’s documented usage context.
Table 4 summarises the lightweight quality-appraisal criteria used to assess evidence clarity.
The quality-appraisal score was used descriptively to assess evidence clarity and interpretive confidence, rather than as an exclusion threshold after screening.
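As an illustration of how a descriptive appraisal score might be computed, the sketch below assumes that each of the five criteria listed above is scored as met or not met and summed to a score from 0 to 5; Table 4 may use a different scale. The criterion keys are hypothetical names. Every study receives a score and none is dropped, consistent with the score being used descriptively rather than as an exclusion threshold.

```python
from collections import Counter

# The five appraisal criteria listed in Section 2.7 (keys are hypothetical names).
CRITERIA = ("attack_target", "ml_method", "dataset", "evaluation", "method_detail")


def appraisal_score(study):
    """Count satisfied criteria (0-5), assuming each is scored as met or not met."""
    return sum(1 for criterion in CRITERIA if study.get(criterion, False))


def score_distribution(studies):
    """Descriptive profile of how many studies reach each score; nothing is filtered."""
    counts = Counter(appraisal_score(s) for s in studies)
    return {score: counts.get(score, 0) for score in range(len(CRITERIA) + 1)}
```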
Together, these procedures established a traceable pipeline from search and screening to coding, normalization, and tri-axis synthesis, thereby supporting reproducible comparison across cyberattacks, machine learning methods, and datasets.

3. Cyberattacks Across Reviewed Studies

From the 99 included studies, 312 unique raw attack labels were extracted. Attack evidence was recorded at the most specific level reported in each paper and then normalized according to the coding rules defined in Section 2.6. When studies explicitly named attack classes, those labels were retained and standardized for consistency. When studies used broad task descriptions such as “network intrusion detection” without enumerating attack classes, the corresponding attack labels were derived from the attack classes documented in the dataset used by the study. For studies centred on malware families, the malware family name was first retained as the raw label and then mapped to the most appropriate ATT&CK-relevant adversarial behaviour on the basis of the malware’s primary operational objective. This process provided a traceable bridge between heterogeneous study descriptions and the unified analytical framework used in the review.

3.1. Attack Taxonomy Selection and ATT&CK Mapping

Numerous taxonomies have been developed to classify cyberattacks based on various criteria, including attack vectors, targets, impacts, and methodologies [32]. While these taxonomies offer valuable perspectives, not all are well-suited to the scope and objectives of this systematic literature review (SLR), which analyses over 300 distinct cyberattack labels extracted from 99 academic studies. The decision to adopt a single, unified taxonomy necessitated a careful evaluation of several popular existing frameworks, their strengths, and their limitations in relation to the reviewed studies and datasets [33].
Several taxonomies focus on attack methods and locations, particularly within manufacturing and industrial contexts. For instance, studies by Wu and Moon [34,35], Pan et al. [36], and Tuptuk and Hailes [37] emphasise physical-domain attack vectors and the impact of cyberattacks on manufacturing systems. These taxonomies often target Industry 4.0 applications, additive manufacturing, and quality inspection processes [38,39,40]. While valuable for sector-specific threat modelling, these approaches are too domain-specific to generalise to the broader range of network, malware, and intrusion-based attacks covered in this review.
The Common Attack Pattern Enumeration and Classification (CAPEC) schema [41] provides a comprehensive catalogue of attack patterns. While highly detailed, CAPEC focuses primarily on describing how attacks are executed (i.e., attack patterns), making it more useful for secure software development and threat modelling than for high-level threat categorisation or mapping to real-world datasets. Its granularity, while useful for certain applications, makes it difficult to map general attack types or dataset labels, which are often abstract or ambiguous.
Another taxonomy, proposed by Hansman and Hunt [42], classifies attacks along four dimensions: vector, target, vulnerability, and payload. This multi-dimensional structure offers a detailed framework but leans heavily toward malware-centric attacks, potentially under-representing human-focused or access-based threats. Similarly, the taxonomy by Meyers et al. [43] emphasises subtypes and granular descriptions but retains a malware-heavy orientation.
Chapman et al. [44] introduced a novel approach based on access requirements, which is insightful in evaluating preconditions for different attacks. However, this method lacks detail in key aspects such as privilege escalation, and its practical application across diverse attack types is limited. Zhu et al. [32] proposed a taxonomy tailored to Operational Technology (OT) and ICS environments, categorising attacks across hardware, software, and communication stacks. While effective in the ICS domain, this model does not generalize well to IT systems or network-level attacks.
Simmons et al. [45] proposed the AVOIDIT taxonomy (Attack vector, Operational impact, Victim, Objective, Defense, Information impact, and Target), using a tree-based structure. While AVOIDIT provides a well-rounded view of the attack lifecycle, it suffers from uneven detail across categories—some attack vectors are richly described, while others remain vague.
MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) [19] is a globally recognized, continuously updated knowledge base that categorises adversarial behaviours observed in real-world cyber incidents. The MITRE ATT&CK framework provides broad coverage, including a wide spectrum of tactics and techniques. This makes it compatible with the diverse nature of the reviewed studies, as it encompasses both malware behaviours and network-based intrusions. The framework’s separation of tactics (the adversary’s goal) and techniques (how the goal is achieved) enables hierarchical classification and flexible mapping of ambiguous or compound attack labels. MITRE ATT&CK is derived from empirical evidence and adversary behaviours, providing a practical foundation for aligning academic research with operational threat intelligence. While separate frameworks exist for Enterprise and ICS environments, ATT&CK allows for cross-domain integration. In this review, the MITRE ATT&CK for Enterprise was used as a unifying framework to maintain consistency, even when analysing ICS-related datasets. MITRE ATT&CK is supported by a wide ecosystem of tools and is widely adopted in both academia and industry, enhancing the reproducibility and applicability of this review. Given the diversity and often loosely defined nature of attack labels in the reviewed literature and datasets, the MITRE ATT&CK framework was selected as the most suitable unifying taxonomy for this review.
Despite its advantages, ATT&CK is not without limitations. Notably, it assumes the existence of network activity or observable system behaviour, which made it challenging to categorise studies that focused purely on malware binaries without behavioural context. In such cases, attack labels were mapped according to the malware sample’s primary function or operational objective (e.g., credential access or lateral movement). A further limitation is that some datasets pertain to Industrial Control Systems (ICS), for which MITRE ATT&CK for ICS provides a separate taxonomy. As noted above, the MITRE ATT&CK Enterprise framework was nevertheless used as the unified categorisation scheme wherever feasible, to keep the mapping consistent across the review.
For analytical consistency, each normalized attack label was assigned one primary ATT&CK tactic and one primary technique. This one-to-one assignment was used as a comparative device rather than as a claim that real attacks are single-stage phenomena. In cases where an attack could plausibly map to multiple ATT&CK entries, the selected mapping prioritised the study’s explicit detection focus and the most representative observable behaviour. This rule improved comparability across the full corpus while preserving interpretive transparency in ambiguous cases.
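The one-primary-mapping rule can be illustrated with a small lookup. The sketch below is hypothetical: the candidate mappings for the three labels are illustrative choices, not the review’s Appendix A assignments, and the tie-break (prefer the candidate matching the study’s stated detection focus, otherwise the first-listed, most representative behaviour) is one possible encoding of the rule described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Mapping:
    tactic: str
    technique: str


# Hypothetical candidate mappings for three normalised labels. Illustrative only;
# the review's actual assignments are listed in its Appendix A.
CANDIDATES: Dict[str, List[Mapping]] = {
    "sql injection": [Mapping("Initial Access", "Exploit Public-Facing Application")],
    "portscan": [
        Mapping("Reconnaissance", "Active Scanning"),
        Mapping("Discovery", "Network Service Discovery"),
    ],
    "syn flood": [
        Mapping("Impact", "Network Denial of Service"),
        Mapping("Impact", "Endpoint Denial of Service"),
    ],
}


def primary_mapping(label: str, detection_focus: Optional[str] = None) -> Mapping:
    """Return exactly one (tactic, technique) pair for a normalised attack label.

    Assumed tie-break: prefer the candidate whose tactic matches the study's
    explicit detection focus; otherwise fall back to the first-listed candidate,
    which stands in for the most representative observable behaviour.
    """
    candidates = CANDIDATES[label.strip().lower()]
    if detection_focus is not None:
        for candidate in candidates:
            if candidate.tactic == detection_focus:
                return candidate
    return candidates[0]


print(primary_mapping("PortScan"))
print(primary_mapping("PortScan", detection_focus="Discovery"))
```

Whatever the tie-break, the function always returns a single pair, which is what makes label counts comparable across the corpus.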

3.2. Cyberattack Frequency Analysis

Across the 99 reviewed papers, a total of 312 unique cyberattack labels were identified, each mapped to a single MITRE ATT&CK tactic and technique (see Appendix A). The distribution reveals a strong concentration around a limited subset of highly visible attack behaviours, a pattern examined further in the subsequent trend, co-occurrence, and breadth analyses. The top-ten most frequent labels (counted once per paper) are the following: DoS (30 papers), SQL Injection (26), XSS (26), PortScan (23), Backdoor (23), DoS Slowloris (20), SYN Flood (18), Reconnaissance (18), Worms (18), and Exploits (17). This distribution highlights the dominance of Denial-of-Service (DoS) attacks (including specific variants such as Slowloris and SYN Flood) and common web-based attacks (SQL Injection; XSS). These categories remain popular due to their high visibility, ease of deployment, and immediate operational consequences.
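Counting each label once per paper means that repeated mentions inside a single study do not inflate its frequency. A minimal sketch of that counting rule, assuming each paper is represented as a list of its normalised labels (the paper IDs and labels below are a hypothetical toy corpus, not review data):

```python
from collections import Counter
from typing import Dict, List


def label_frequencies(papers: Dict[str, List[str]]) -> Counter:
    """Count the number of papers that mention each attack label.

    Each label is counted at most once per paper, so a label listed
    several times within one study still contributes a single count.
    """
    counts: Counter = Counter()
    for labels in papers.values():
        counts.update(set(labels))  # set() removes within-paper repeats
    return counts


# Hypothetical toy corpus; paper IDs and labels are illustrative only.
papers = {
    "P1": ["DoS", "PortScan", "DoS"],
    "P2": ["DoS", "SQL Injection"],
    "P3": ["XSS", "SQL Injection", "DoS"],
}
print(label_frequencies(papers).most_common(2))
```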
Table 5 summarises the distribution of attack labels across MITRE ATT&CK tactics. The most frequent tactic is Impact (72, 14.55%), reflecting the prevalence of attacks aimed at disrupting availability and causing direct operational harm, most notably Denial-of-Service (DoS) and its sub-variants. Following closely are Initial Access (59, 11.92%), Execution (58, 11.72%), and Command and Control (55, 11.11%), which together describe how an adversary gains, uses, and sustains a foothold. Reconnaissance (54, 10.91%) also features prominently, underscoring its role in both attack preparation and intrusion attempts.
By contrast, tactics that represent deeper compromise stages—such as Persistence (3.43%), Collection (3.43%), Privilege Escalation (2.63%), and Resource Development (1.01%)—appear far less frequently. This suggests that current datasets and experimental studies focus disproportionately on early-stage, externally observable threats rather than long-term adversarial presence within systems.
Importantly, the dominance of Impact, Initial Access, and Execution in the literature should not be interpreted as evidence that these are the only or most consequential phases in real intrusions. Rather, these phases are more visible, easier to simulate, and better supported by public datasets, which makes them more attractive for benchmark-driven research.
The ten most common MITRE ATT&CK techniques are shown in Table 6. A clear skew exists towards availability-related behaviours: Network Denial of Service (61, 7.71%) and Endpoint Denial of Service (44, 5.56%) together account for over 13% of all labels. Other frequently observed techniques include Exploit Public-Facing Application (5.44%), Active Scanning (5.06%), and Gather Victim Host Information (4.55%), reflecting the emphasis on perimeter-based exploitation and preparatory reconnaissance. Credential-related attacks (Brute Force, 4.30%) and command-driven execution (Command-Line Interface, 4.05%) also feature prominently.
Notably, techniques associated with stealthier, post-compromise activity—such as privilege escalation, persistence, or covert exfiltration—do not appear in the top 10, again illustrating the field’s bias towards high-visibility attacks that are easier to simulate and label in controlled datasets.
Taken together, these findings reveal a research landscape that strongly favours high-impact, externally observable attacks (e.g., DoS, scanning, and web exploitation). This emphasis is likely driven by the relative ease of reproducing such attacks in experimental testbeds, as well as the availability of public datasets containing these behaviours. However, the relative scarcity of advanced, low-and-slow tactics such as privilege escalation, persistence, lateral movement, and exfiltration highlights a notable gap between experimental research and real-world adversary tradecraft. Addressing this imbalance would require richer datasets and experimental designs capable of capturing stealthier post-compromise activities, which are often more critical in determining the true severity of a cyber intrusion.

3.3. Cyberattack Trends (2019–2024)

Since the data for 2025 are incomplete, including them would not provide an accurate reflection of the overall trend. Therefore, the trend analysis presented here is limited to the period 2019–2024. The longitudinal view of tactic usage (Figure 2) shows a clear maturation pattern in the literature between 2019 and 2024. Three tactics—Impact, Initial Access, and Execution—dominate the growth trajectory, rising steadily after 2021 and reaching sharp peaks in 2024 (Impact = 30, Initial Access = 23, and Execution = 24). This reflects an increased focus on high-visibility behaviours such as denial-of-service and exploitation of external-facing assets, which are easier to reproduce in experimental datasets.
Command and Control (23 in 2024) and Reconnaissance (23 in 2024) also display consistent upward trends, suggesting stronger coverage of adversarial activities both before and after gaining access. Credential Access grows more moderately but still peaks at 18 in 2024, highlighting rising attention to authentication-based threats.
By contrast, stealthier or deeper-compromise tactics such as Privilege Escalation (5 in 2024), Persistence (10 in 2024), and Exfiltration (8 in 2024) remain under-represented throughout the period. While they do show modest increases, they never surpass 10 annual instances, indicating that datasets and studies still place less emphasis on long-term intrusion behaviours compared to disruptive, high-impact attacks. Similarly, Resource Development appears sporadically and with negligible representation.
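The per-year series behind these trends can be built by tallying paper-level tactic mentions by publication year and dropping the incomplete 2025 records. The sketch below assumes the input is a flat list of (year, tactic) records, one per paper-level mention; the record format and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# 2025 is excluded because its records are incomplete.
COMPLETE_YEARS = range(2019, 2025)  # 2019..2024 inclusive


def yearly_counts(records: Iterable[Tuple[int, str]]) -> Dict[str, Dict[int, int]]:
    """Build a tactic -> {year -> count} table from (year, tactic) records.

    Each record is assumed to be one paper-level mention of a tactic.
    Years outside COMPLETE_YEARS are dropped; years with no mentions stay
    at zero so every series covers the same 2019-2024 axis.
    """
    table: Dict[str, Dict[int, int]] = defaultdict(
        lambda: {year: 0 for year in COMPLETE_YEARS}
    )
    for year, tactic in records:
        if year in COMPLETE_YEARS:
            table[tactic][year] += 1
    return dict(table)


records = [(2019, "Impact"), (2024, "Impact"), (2024, "Impact"), (2025, "Impact")]
print(yearly_counts(records)["Impact"])
```

Zero-filling the missing years keeps every tactic on the same horizontal axis, which matters when several series are plotted together.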
At the technique level (Figure 3), the dominance of denial-of-service behaviours is unmistakable. Network Denial of Service rises from just 1 instance in 2019 to 27 in 2024, while Endpoint Denial of Service follows a similar trajectory, reaching 20 in 2024. These two techniques account for the largest share of technique-level observations, underlining the field’s reliance on availability-disruption scenarios.
Other techniques focused on exploiting the external perimeter, namely Exploit Public-Facing Application (18 in 2024) and Active Scanning (19 in 2024), also show strong upward trends. Meanwhile, Gather Victim Host Information peaks at 16 in 2024, reflecting a growing but secondary emphasis on reconnaissance and victim profiling.
Taken together, these patterns reveal that between 2019 and 2024, cyberattack research and datasets have increasingly centred on high-impact, high-detection-rate behaviours—particularly denial-of-service, scanning, and web-facing exploitation. This emphasis likely reflects the relative ease of simulating and labelling such behaviours in testbeds. However, stealth-oriented tactics such as persistence, lateral movement, and exfiltration remain consistently under-represented, pointing to a persistent gap between experimental focus and the realities of advanced adversarial operations. Addressing this gap requires richer datasets that capture the “low-and-slow” phases of intrusions, which are often decisive in real-world incidents but challenging to model experimentally.

3.4. Co-Occurrence of ATT&CK Tactics and Techniques

Examining the co-occurrence of MITRE ATT&CK tactics and techniques across the 99 reviewed papers reveals which adversarial behaviours most frequently appear together within the same study (Figure 4 and Figure 5). Such analysis highlights patterns of behavioural clustering, in which specific adversarial actions may be operationally dependent or strategically complementary.
These co-occurrences should be interpreted as patterns of joint coverage within the reviewed studies rather than as direct evidence of temporal sequencing within individual real-world incidents.
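A co-occurrence count of this kind can be obtained by enumerating every unordered pair of distinct tactics covered by each paper. The sketch below assumes the same hypothetical per-paper list representation as before and counts a pair at most once per paper. It measures joint coverage only, not temporal order.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def pair_cooccurrence(papers: Dict[str, List[str]]) -> Counter:
    """Count, for every unordered pair of tactics, the papers that cover both.

    The count measures joint coverage within a study; it says nothing about
    the order in which the behaviours occur in a real intrusion.
    """
    pairs: Counter = Counter()
    for tactics in papers.values():
        # Sorting the de-duplicated tactics gives each pair one canonical key.
        for pair in combinations(sorted(set(tactics)), 2):
            pairs[pair] += 1
    return pairs


papers = {  # hypothetical toy corpus
    "P1": ["Execution", "Command and Control", "Impact"],
    "P2": ["Execution", "Command and Control"],
    "P3": ["Impact", "Reconnaissance", "Impact"],
}
print(pair_cooccurrence(papers).most_common(1))
```

The same function applies unchanged at the technique level by passing technique labels instead of tactics.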
At the tactic level, the strongest association is between Execution and Command and Control (50 co-occurrences). This is consistent with the operational pattern in which, once a foothold is established, adversaries execute commands or payloads through their C2 infrastructure, so that C2 channels serve as an active conduit for directing malicious activity rather than as a passive link.
Impact co-occurs heavily with Reconnaissance (47) and Initial Access (46), consistent with a commonly modelled attack progression in which adversaries probe and map the target environment, establish access, and then deliver disruptive or destructive payloads. Similarly, Execution pairs strongly with Impact (42) and Reconnaissance (42), reinforcing the tight coupling between operational actions and observable consequences.
Reconnaissance itself is present in seven of the top-ten tactic pairings, underscoring its role as both a preparatory and supporting behaviour across different attack stages. Meanwhile, deeper-compromise tactics such as Persistence, Privilege Escalation, and Exfiltration appear less frequently in high-ranking pairs, reflecting their secondary treatment in experimental and dataset-driven research.
At the technique level, the dominance of denial-of-service behaviours is clear. The strongest pairing is Network Denial of Service with Endpoint Denial of Service (43), indicating that many studies cover both network-level saturation and endpoint-level service disruption. Closely following are Network Denial of Service with Exploit Public-Facing Application (41) and Network Denial of Service with Active Scanning (39), reflecting a pattern in which service exploitation and reconnaissance are studied alongside disruptive campaigns.
Techniques linked to credential compromise and direct execution, such as Brute Force with Command-Line Interface (30) and Brute Force with Active Scanning (32), appear frequently in mid-ranked pairings. This is consistent with an integrated attack workflow: reconnaissance to identify weak points, brute-forcing to obtain credentials, and CLI-based execution to solidify control.
By contrast, stealthier or persistence-related techniques, such as Phishing and Remote Access Software or Remote Access Services, show weaker associations overall, appearing less prominently in the top co-occurrence patterns. This reflects the broader trend across the literature, where high-impact, high-visibility behaviours are emphasised, while subtle post-compromise techniques remain underexplored.
Overall, co-occurrence analysis reinforces the conclusion that cyberattack research between 2019 and 2025 has centred on externally visible, easily simulated behaviours (DoS, scanning, public-facing exploitation). The lack of strong co-occurrence signals for stealth-oriented tactics and techniques highlights a persistent gap between academic research and real-world adversary operations, where multi-stage, low-and-slow intrusions are often decisive.

3.5. Breadth of ATT&CK Coverage per Paper

To better understand the breadth of adversarial coverage in the literature, the number of distinct MITRE ATT&CK tactics and the number of distinct techniques addressed in each of the 99 papers were analysed. The results are summarised in Table 7 and Table 8.
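A breadth distribution of this kind is a histogram of the number of distinct tactics per paper, expressed as a share of the corpus. The sketch below uses a hypothetical four-paper toy corpus. With the real corpus the denominator is 99; for example, 19 single-tactic papers give round(100 * 19 / 99, 1) = 19.2%, matching the figure reported below.

```python
from collections import Counter
from typing import Dict, List, Tuple


def breadth_distribution(papers: Dict[str, List[str]]) -> Dict[int, Tuple[int, float]]:
    """Map 'number of distinct tactics per paper' to (paper count, % of corpus)."""
    n_papers = len(papers)
    # Duplicate mentions within a paper do not increase its breadth.
    hist = Counter(len(set(tactics)) for tactics in papers.values())
    return {
        breadth: (count, round(100 * count / n_papers, 1))
        for breadth, count in sorted(hist.items())
    }


papers = {  # hypothetical toy corpus
    "P1": ["Impact"],
    "P2": ["Impact", "Impact"],
    "P3": ["Impact", "Initial Access"],
    "P4": ["Impact", "Initial Access", "Reconnaissance"],
}
print(breadth_distribution(papers))
```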
Of the 99 papers, 19 (19.2%) concentrated exclusively on a single tactic. These were often highly specialised studies targeting niche attack behaviours (e.g., denial-of-service, phishing, or command-and-control detection). While such targeted research yields deep insights into specific adversary techniques, it inherently limits coverage of the wider attack lifecycle. For instance, a DoS study examining only Impact overlooks how attackers gain initial access—a critical gap for defensive planning.
At the other end, only 16 papers (16.2%) addressed nine or more tactics, highlighting that comprehensive coverage of the ATT&CK framework remains uncommon. Notably, around half of the reviewed studies (49.5%) engaged with four tactics or fewer, reflecting either a deliberate focus on niche adversarial behaviour, dataset limitations, or methodological simplicity. This suggests that while breadth of coverage is possible, many works prioritise depth in specific phases of the attack lifecycle.
The analysis of techniques reveals a similar skew. Fourteen papers (14.1%) employed only a single ATT&CK technique, while a comparable proportion (12.1%) examined four techniques. A small subset of highly comprehensive works mapped over 20 techniques, with the most extensive study covering 27. These broad-spectrum studies are rare (9%), but they demonstrate the potential of multi-technique modelling to capture the complexity of real-world adversarial behaviour.
Overall, the distributions for both tactics and techniques confirm a tendency towards narrower research scopes: roughly half of the studies explored only a limited set of adversarial behaviours, while only a minority attempted wide coverage across the attack lifecycle. The field therefore currently favours depth (specialised studies) over breadth (holistic approaches).

3.6. Cyberattack-Side Key Findings and Research Gaps

The cyberattack analysis reveals a structurally imbalanced evidence base. The literature is concentrated on ATT&CK tactics such as Impact, Initial Access, Execution, Reconnaissance, and Command and Control, while post-compromise phases such as Persistence, Privilege Escalation, Lateral Movement, Collection, and Exfiltration remain comparatively under-represented. This pattern should not be interpreted as evidence that the latter tactics are less important in practice. Rather, it reflects the fact that externally visible, short-horizon, and high-signal attacks are easier to reproduce in laboratory settings, easier to label in public datasets, and therefore easier to benchmark.
This concentration has two important implications for the field. First, model development is being guided disproportionately by attacks that are operationally loud and experimentally convenient, such as DoS/DDoS, scanning, and web-facing exploitation. As a result, the literature gives stronger coverage to disruption-centric threats than to the stealthier phases that often determine the success, persistence, and severity of real intrusions. Second, the relative scarcity of studies covering broader ATT&CK trajectories suggests that many detection pipelines are being evaluated on isolated attack moments rather than on full adversarial workflows. This limits their value for understanding how AI-based cyber defense performs against coordinated, multi-stage campaigns.
The main research gap exposed by this section is therefore not simply the absence of certain attack labels, but the lack of sustained empirical attention to low-and-slow post-compromise behaviour. Future work should prioritise datasets and experimental designs that better capture persistence, lateral movement, privilege escalation, exfiltration, and other ATT&CK phases that remain difficult to observe but are central to real-world adversary operations.
This attack-side imbalance becomes even more significant when interpreted alongside the model and dataset distributions examined in the following sections.

4. Machine Learning Methods Across Reviewed Studies

In this section, the machine learning (ML) approaches employed in the reviewed studies are examined through the normalization and grouping rules defined in Section 2.6. Across the 99 included studies, 143 unique raw ML method labels were identified. Because the literature uses heterogeneous naming conventions, closely related algorithms and architectural variants were first standardized into comparable analytical units before higher-level aggregation. For example, decision-tree derivatives were grouped under Tree-Based Models, autoencoder variants under Autoencoders, and recurrent sequence models under their corresponding sequence-learning families. This two-stage coding process allowed this review to compare methodological patterns without losing the connection to the original study terminology.
Following this initial classification, this study adopted the high-level taxonomy proposed by Emmert-Streib et al. [46], which categorises machine learning approaches into Classical Machine Learning Models, Deep Learning Models, and Learning Paradigms. This framework encompassed the majority of the identified method groups. However, in line with von Rueden et al. [47], an additional category—Hybrid and Ensemble Learning—was introduced, reflecting the importance of integrating multiple models or knowledge sources to enhance performance and robustness.
Two remaining categories, Optimization Algorithms (e.g., Stochastic Gradient Descent, Adam, and RMSProp) and Explainable AI (XAI) methods (e.g., SHAP; LIME), did not directly align with these existing groups. For the purposes of this review, explainable AI techniques were incorporated into the Hybrid, Ensemble, and Explainable Methods category, acknowledging that interpretability methods are often developed and applied alongside hybrid or ensemble approaches [48]. Optimization algorithms were grouped under the Learning Paradigms and Optimization category because they function primarily as training and performance-enhancement mechanisms rather than as standalone detection models [46].
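The two-stage coding process (raw label to subcategory, then subcategory to main category) can be sketched as two lookup tables behind a light string normalisation. Everything in the tables below is a hypothetical excerpt: the raw spellings are illustrative, "Explainable AI" is an assumed subcategory name, and placing Autoencoders under Deep Learning Models is an assumption. The review’s full coding rules are those defined in its Section 2.6.

```python
from typing import Tuple

# Stage 1: raw label -> analytical subcategory (hypothetical excerpt).
RAW_TO_SUB = {
    "decision tree": "Tree-Based Models",
    "c4.5": "Tree-Based Models",
    "lstm": "LSTM and Variants",
    "bi lstm": "LSTM and Variants",
    "stacked autoencoder": "Autoencoders",
    "adam": "Optimization Algorithms",
    "shap": "Explainable AI",
}

# Stage 2: subcategory -> main category, following the taxonomy above.
# Autoencoders -> Deep Learning Models is an assumption.
SUB_TO_MAIN = {
    "Tree-Based Models": "Classical Machine Learning Models",
    "LSTM and Variants": "Deep Learning Models",
    "Autoencoders": "Deep Learning Models",
    "Optimization Algorithms": "Learning Paradigms and Optimization",
    "Explainable AI": "Hybrid, Ensemble, and Explainable Methods",
}


def normalise(raw_label: str) -> Tuple[str, str]:
    """Return (subcategory, main category) for a raw method label."""
    # Case-fold, treat hyphens as spaces, and collapse repeated whitespace.
    key = " ".join(raw_label.lower().replace("-", " ").split())
    sub = RAW_TO_SUB[key]
    return sub, SUB_TO_MAIN[sub]


print(normalise("Bi-LSTM"))
```

Keeping the two stages as separate tables preserves the link back to each study’s original terminology while still allowing aggregation at the main-category level.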

4.1. ML Method Frequency Analysis

The most frequently reported individual method across the 99 reviewed papers is the Random Forest (RF) algorithm, appearing in 22 studies. RF’s popularity in cybersecurity research is linked to its robustness and strong performance on tabular, high-dimensional datasets [49]. Long Short-Term Memory (LSTM) networks appear in 18 studies, while Convolutional Neural Networks (CNNs) appear in 15. The frequent use of LSTMs and CNNs reflects the growing application of deep learning in intrusion detection to sequential data (e.g., network traffic) and to spatial or image-like data representations, respectively. Support Vector Machines (SVMs) (15 studies) and Decision Trees (DTs) (12 studies) also remain widely used.
Table 9 presents the frequency of machine learning models across the main categories and their subcategories. Note that one paper may employ multiple models, leading to a total frequency larger than the number of reviewed studies (99).
The prevalence of deep learning and ensemble methods in these counts should be interpreted in light of the data regimes most commonly studied: their dominance reflects not only methodological strength but also the structure of available benchmark tasks.
At the main category level, Deep Learning Models dominate the literature, appearing in 72 studies, followed by Hybrid, Ensemble, and Explainable Methods in 46 studies and Classical Machine Learning Models in 34 studies. Learning Paradigms and Optimization represent the smallest share, appearing in 18 studies, typically as supporting techniques rather than standalone detection frameworks. The prominence of deep learning is consistent with its ability to automatically extract complex features from raw data, making it highly effective for processing the high-volume and high-velocity data streams typical of cybersecurity contexts [50,51].
At the subcategory level, Ensemble Learning Methods were the most common (29 papers), reflecting the strong emphasis on improving accuracy and reducing variance through model combination—particularly important in imbalanced intrusion detection datasets [52]. Within deep learning, LSTM and Variants (27 papers), Feedforward Networks (24 papers), and Core CNN Architectures (22 papers) were the leading approaches, demonstrating the relevance of sequence modelling and convolution-based feature extraction for time-series and packet-level cybersecurity data.
Classical methods still retain a significant role, with Statistical Models and SVM variants each appearing in 17 studies, showing that interpretable and well-established algorithms remain valuable, especially in settings requiring transparency or where computational resources are limited.
The results also show the importance of hybridization: Boosting (16 papers) and Hybrid Architectures (13 papers) frequently appeared, often combining deep learning with traditional models to exploit complementary strengths. Meanwhile, optimization techniques (11 papers) and explainability-focused methods (4 studies) were less frequent, indicating that while they are recognized as critical areas, they remain underexplored in the current cybersecurity ML literature.

4.2. ML Method Trends (2019–2024)

As the data for 2025 are incomplete, including them would not provide an accurate reflection of the overall trend. Consequently, the trend analysis presented here is limited to the period from 2019 to 2024. As illustrated in Figure 6, the adoption of machine learning (ML) methods in cybersecurity shows dynamic growth patterns between 2019 and 2024. Among the main categories, Deep Learning Models show the most consistent rise, starting from a single study in 2019 and reaching a peak of 26 in 2024. This trend highlights both the rapid adoption of deep architectures in cybersecurity research and the possible transition toward more integrated frameworks in recent years.
Hybrid, Ensemble, and Explainable Methods also demonstrate a steady increase, with no recorded studies in 2019 but 17 studies in 2024. Their emergence reflects increasing recognition of the benefits of combining multiple models and prioritising interpretability, which is particularly relevant in security-sensitive contexts.
Classical Machine Learning Models maintain a relatively stable presence across the review period, peaking at nine studies in 2022 and sustaining eight studies during 2023–2024. This suggests that while traditional algorithms such as SVMs and decision trees remain relevant, they are increasingly supplemented by more complex architectures. Learning Paradigms and Optimization methods appear later in the timeline, from 2022 onward, gradually increasing to eight studies in 2024 and indicating a growing interest in adaptive learning strategies and optimization-based improvements.
At the subcategory level, as shown in Figure 7, the top ten approaches reveal more nuanced patterns. Ensemble Learning Methods steadily gain traction from 2020, reaching their highest point in 2024 with 11 studies, reinforcing their reputation for robustness across heterogeneous cyber datasets. LSTM and Variants show sharp growth from 2021 onward, peaking at 10 studies in 2024, consistent with their strength in modelling sequential attack patterns.
Feedforward Networks and Variants demonstrate continuous adoption, rising from one study in 2020 to eight studies in 2023 before stabilising, confirming their value as general-purpose deep models. Similarly, Core CNN Architectures surge in 2024, with nine studies, reflecting renewed interest in image-like cybersecurity representations. Classical approaches such as SVM and Variants, Tree-Based Models, and Statistical Models show moderate but sustained usage during 2022–2024, often within hybrid or explainable frameworks.
A notable observation is the trajectory of Boosting Methods, which rise steadily to five studies in 2024, suggesting sustained relevance as a performance-enhancement strategy in evolving attack settings. Hybrid Architectures maintain a low but stable presence across the years, while Optimization Algorithms display a late surge in 2024, with six studies, indicating growing experimentation with parameter tuning and adaptive optimization for cybersecurity tasks.

4.3. Co-Occurrence of Main Categories and Subcategories

At the main category level, as shown in Figure 8, the strongest co-occurrence is observed between Deep Learning Models and Hybrid, Ensemble, and Explainable Methods (26 instances, 26.26%). This reflects a consistent trend in the literature of augmenting deep neural architectures with ensemble strategies to improve accuracy, robustness, and generalization—especially in adversarial or imbalanced cybersecurity contexts. The second most frequent pairing is between Classical Machine Learning Models and Hybrid, Ensemble, and Explainable Methods (21 instances, 21.21%), suggesting that ensembles remain a vital strategy to leverage the diversity of traditional models such as decision trees, SVMs, and statistical techniques. Classical Machine Learning Models and Deep Learning Models co-occurred 18 times (18.18%), often in comparative evaluations or hybrid frameworks designed to combine interpretability and efficiency with the representation learning power of deep networks. Finally, Learning Paradigms and Optimization methods appeared as secondary but non-negligible companions to Deep Learning Models (14 instances), indicating growing interest in reinforcement learning, optimization algorithms, and training strategies to fine-tune model performance.
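The reported percentages match each pair count divided by the 99 included papers (for example, 26/99 gives 26.26%). They therefore describe the share of the corpus in which both categories appear, not a share of all pairings. A minimal sketch of that normalisation, using the counts stated above (the function and constant names are illustrative):

```python
N_PAPERS = 99  # number of included studies


def share_of_corpus(count: int, n_papers: int = N_PAPERS) -> float:
    """Express a co-occurrence count as a percentage of all reviewed papers."""
    return round(100 * count / n_papers, 2)


# Main-category pair counts as reported in the text above.
reported = {
    ("Deep Learning Models", "Hybrid, Ensemble, and Explainable Methods"): 26,
    ("Classical Machine Learning Models", "Hybrid, Ensemble, and Explainable Methods"): 21,
    ("Classical Machine Learning Models", "Deep Learning Models"): 18,
}
for (first, second), count in reported.items():
    print(f"{first} + {second}: {count} ({share_of_corpus(count):.2f}%)")
```

Because the denominator is the corpus size, these shares can sum to more than 100% across pairs, since one paper can contribute to several pairings.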
These co-occurrences should be interpreted as patterns of joint methodological use within the reviewed studies rather than as direct evidence that the paired methods were always integrated into a single deployable detection pipeline.
At the subcategory level, the co-occurrence heatmap in Figure 9 highlights several important patterns. The strongest link is observed between Tree-Based Models and Ensemble Learning Methods (12 instances, 12.12%), reaffirming the widespread adoption of ensemble strategies such as bagging and boosting applied to decision-tree families (e.g., Random Forest; XGBoost). Ensemble Learning Methods also co-occur frequently with SVM and Variants (nine instances) and with Statistical Models (nine instances), underlining a broader trend of combining interpretable, mathematically grounded models with ensemble techniques to balance transparency and predictive power.
On the deep learning side, a notable co-occurrence is seen between LSTM and Variants and Core CNN Architectures (10 instances, 10.10%). This indicates the prevalence of hybrid temporal–spatial deep models, where CNNs extract structural or spatial patterns (e.g., from network flows or packet payloads) while LSTMs capture sequential dependencies, making them well suited, at least conceptually, to attacks with both spatial and temporal signatures. Feedforward Networks and Variants also frequently pair with LSTM and Variants (seven instances) and Ensemble Learning Methods (six instances), highlighting their role as baseline or complementary components in more complex architectures.
Boosting techniques are another recurring theme, with frequent co-occurrence alongside Tree-Based Models (seven instances), SVM and Variants (six instances), and Statistical Models (five instances). This suggests that boosting remains a key strategy to elevate the performance of both classical and statistical learners in cybersecurity tasks. Finally, Optimization Algorithms, though less frequently paired overall, appear most often in combination with LSTM and Variants (six instances), reflecting ongoing research into improving deep sequence models through enhanced optimization strategies.
Overall, these patterns suggest a field that is moving increasingly towards integrative strategies: ensembles to improve robustness, hybrid architectures that blend classical and deep models, and optimization-driven refinements for specialized deep learning subcategories. This co-occurrence analysis highlights not only the dominance of ensemble thinking across both classical and deep paradigms but also a gradual convergence between spatial and temporal deep networks, which appears especially promising for detecting multi-faceted cyber threats.

4.4. Method Breadth per Paper

Table 10 presents the number of different methods employed within individual papers. A notable 22.2% of studies applied only a single method, indicating a focused, single-technique approach, often for benchmarking or proof-of-concept purposes. The most common configuration, however, involved two methods (25.3%), reflecting a tendency either to pair complementary algorithms for improved robustness and performance or to compare novel techniques against established baselines. Only a small number of studies adopted extensive methodological portfolios: 3.0% of papers tested seven methods, and single papers (1.0% each) tested eight, eleven, and thirteen methods. Such high counts are typically associated with large-scale comparative studies or advanced ensemble frameworks that aim for comprehensive performance evaluations.
At the main category level, the analysis (Table 11) shows that the majority of papers restricted themselves to a limited number of categories. Specifically, 43 papers (43.4%) relied on only one main category, while another 42 papers (42.4%) combined two categories. This suggests that most research either focused narrowly on one paradigm (e.g., only Classical Machine Learning Models or only Deep Learning Models) or paired two distinct categories to balance interpretability and predictive strength. Only 13 papers (13.1%) employed three categories, and just 1 study (1.0%) explored four, highlighting the rarity of highly diversified category-level integrations. Overall, more than 85% of the studies used one or two main categories, underscoring the preference for simplicity or targeted methodological design.
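The "more than 85%" figure is a cumulative share: papers using one category plus papers using two, divided by the corpus size. A small sketch that computes cumulative shares from a breadth histogram, fed with the main-category counts reported above (43, 42, 13, and 1 papers for one to four categories); the function name is illustrative:

```python
from itertools import accumulate
from typing import Dict


def cumulative_shares(hist: Dict[int, int]) -> Dict[int, float]:
    """Map each breadth k to the % of papers using k or fewer categories."""
    total = sum(hist.values())
    breadths = sorted(hist)
    running = accumulate(hist[k] for k in breadths)
    return {k: round(100 * c / total, 2) for k, c in zip(breadths, running)}


# Papers per number of main categories, as reported above (sums to 99).
main_category_breadth = {1: 43, 2: 42, 3: 13, 4: 1}
print(cumulative_shares(main_category_breadth))
```

Applied to these counts, the cumulative share at two categories is 85/99, or 85.86%; the same function can be reused for the subcategory distribution that follows.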
At the subcategory level, shown in Table 12, a similar but more nuanced trend emerges. Around one-quarter of the papers (25.3%) limited themselves to a single subcategory, while the largest share (37.4%) combined two subcategories. Taken together, more than 60% of studies relied on only one or two subcategories, indicating a preference for simpler designs or narrowly targeted approaches to cyberattack detection. More complex integrations also appear, with 12.1% of papers using three subcategories, another 12.1% using four, and smaller proportions using five (7.1%), six (4.0%), or seven (2.0%). These higher counts are characteristic of ensemble and hybrid designs, where multiple algorithm families contribute to a unified detection system.
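The breadth percentages in Tables 10 to 12 are simple shares of the 99 reviewed papers. The following is a minimal sketch of how such a distribution could be derived from per-paper coding and then converted into percentages. The helper names (`breadth_distribution`, `percent_shares`, `cumulative_share`) and the toy coding are hypothetical. Only the Table 11 counts come from the text above.

```python
from collections import Counter

def breadth_distribution(paper_to_items):
    """Map k -> number of papers that use exactly k distinct items
    (methods, main categories or subcategories)."""
    return Counter(len(set(items)) for items in paper_to_items.values())

def percent_shares(dist, n_papers, ndigits=1):
    """Express each count as a percentage of all reviewed papers."""
    return {k: round(100 * v / n_papers, ndigits) for k, v in sorted(dist.items())}

def cumulative_share(dist, n_papers, max_k):
    """Percentage of papers that use at most max_k distinct items."""
    return 100 * sum(v for k, v in dist.items() if k <= max_k) / n_papers

# Main-category counts as reported in Table 11 (99 papers in total).
table11 = Counter({1: 43, 2: 42, 3: 13, 4: 1})
print(percent_shares(table11, 99))                 # {1: 43.4, 2: 42.4, 3: 13.1, 4: 1.0}
print(round(cumulative_share(table11, 99, 2), 1))  # 85.9

# Hypothetical per-paper coding, showing how such a distribution is derived;
# duplicate entries within one paper are counted once.
toy = {
    "P1": ["Ensemble"],
    "P2": ["Deep", "Ensemble", "Deep"],
    "P3": ["Classical", "Deep", "Ensemble"],
}
print(dict(breadth_distribution(toy)))             # {1: 1, 2: 1, 3: 1}
```

The cumulative share of 85.9% for one or two main categories matches the "more than 85%" statement for Table 11.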

4.5. ML Method-Side Key Findings and Research Gaps

The machine learning results indicate that current AI-driven cyber defense research is shaped as much by benchmark structure as by algorithmic suitability. The strong presence of deep learning and ensemble methods suggests that the field is optimizing for settings in which large volumes of structured or semi-structured traffic data are available and where classification performance can be improved through feature learning, model combination, and robustness to class imbalance. This helps explain why Random Forest, LSTM, CNN, and hybrid ensemble architectures appear so frequently across the reviewed studies.
At the same time, the dominance of these families should not be read as proof that they are universally the best choices for cyber defense. Their concentration is partly a function of the tasks and datasets most commonly studied. High-volume intrusion datasets and overt attack classes naturally favour methods that perform well on tabular traffic features, sequential traces, or benchmark-style supervised learning problems. By contrast, method families that may be valuable for relational reasoning, adaptive defense, or sparse high-context environments—such as graph neural networks, reinforcement learning, and more systematic explainability-oriented approaches—remain comparatively rare. This indicates a mismatch between the richness of the available ML toolbox and the narrowness of its current application.
A second gap concerns operational realism. Classical models remain present not only because they are familiar, but because they are computationally efficient, interpretable, and easier to deploy in constrained environments. This suggests that real-world deployment constraints continue to matter, even when the literature favours high-capacity deep models. Future work should therefore move beyond benchmark-driven performance comparisons and examine which method families are most appropriate for specific ATT&CK stages, data regimes, interpretability requirements, and deployment contexts. In particular, more work is needed on methods suited to stealthy, long-horizon, and low-resource detection settings rather than only high-signal benchmark tasks.
These methodological patterns become more interpretable when considered alongside the benchmark concentration and category imbalances in the dataset landscape examined in the next section.

5. Datasets Across Reviewed Studies

The datasets used across the reviewed literature were categorised using the dataset-categorisation rules defined in Section 2.6. In total, 96 unique datasets were identified across the 99 included studies. Each dataset was recorded using the name reported by the study authors and then assigned to a refined taxonomy based on operational context, data source, and intended cybersecurity use case. This approach allowed datasets with similar names but different analytical roles to be categorised consistently, while also permitting reassignment when prior survey classifications did not match how the dataset was actually used in the reviewed study.
Selecting an appropriate dataset is central to AI-driven cybersecurity research, because the reliability and generalisability of machine learning models depend heavily on how well the underlying data represent operational threats and environments [8,53]. To support comparative analysis, datasets in this review were organised using prior survey frameworks [17,18] and then refined where necessary to better reflect their observed use in the reviewed studies.
On this basis, the datasets were divided into eight primary categories:
  • NIDD (Network-based Intrusion Detection Datasets);
  • IoT-NIDD (IoT-specific Network-based Intrusion Detection Datasets);
  • Malware;
  • S&P (Spam and Phishing Datasets);
  • ICS (Industrial Control System Datasets);
  • Insider Threat;
  • Custom-Collected Datasets;
  • Other (e.g., computer vision, NLP, or behavioural datasets).
Two refinements are particularly important. First, an additional IoT-NIDD category was introduced to distinguish IoT-specific intrusion datasets from broader NIDD resources, reflecting the distinct traffic characteristics and communication constraints of IoT environments. Second, some datasets were reassigned when their actual use in the reviewed studies differed from earlier survey classifications. For example, the IEEE Bus Distribution and Gas Pipeline datasets [54] were categorised under ICS because they were used in industrial or cyber–physical contexts in the reviewed literature. Likewise, computer-vision datasets such as Udacity, GTSRB, and UAVid were placed in the “Other” category because of their relevance to autonomous systems, smart mobility, and broader cyber–physical settings. The full dataset-level information used to support this taxonomy is provided in Appendix B.

5.1. Dataset Frequency Analysis

Table 13 summarizes the frequency of dataset usage across the refined dataset categories. Network-based Intrusion Detection Datasets (NIDDs) are by far the most frequently used category, appearing in 65 studies. They are followed by IoT-NIDD datasets, which appear in 31 studies, less than half the usage frequency of standard NIDD resources. Malware datasets are used in 20 studies, while Spam and Phishing (S&P) datasets appear in 17. At the lower end, ICS datasets appear in 12 studies, Other datasets in 7, and Insider Threat datasets in only 4.
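Because a single study can use datasets from several categories, the category counts in Table 13 are study-level counts that overlap rather than a partition of the 99 studies. The sketch below illustrates that counting rule. The dataset-to-category assignments and study coding are hypothetical and are chosen only for illustration. The final lines reuse the Table 13 figures reported above.

```python
from collections import Counter

def category_frequency(study_to_datasets, dataset_category):
    """Count studies per dataset category. A study is counted once per category,
    however many datasets of that category it uses."""
    freq = Counter()
    for datasets in study_to_datasets.values():
        freq.update({dataset_category[d] for d in datasets})
    return freq

# Hypothetical category assignments and study coding, for illustration only.
dataset_category = {
    "NSL-KDD": "NIDD",
    "UNSW-NB15": "NIDD",
    "BoT-IoT": "IoT-NIDD",
    "Enron-Spam": "S&P",
}
studies = {
    "S1": ["NSL-KDD", "UNSW-NB15"],  # two NIDD datasets -> NIDD counted once
    "S2": ["NSL-KDD", "BoT-IoT"],
    "S3": ["Enron-Spam"],
}
print(category_frequency(studies, dataset_category))
# Counter({'NIDD': 2, 'IoT-NIDD': 1, 'S&P': 1})

# Table 13 counts: the total exceeds the 99 studies because categories overlap.
table13 = {"NIDD": 65, "IoT-NIDD": 31, "Malware": 20, "S&P": 17,
           "ICS": 12, "Other": 7, "Insider Threat": 4}
print(sum(table13.values()))                            # 156
print(round(table13["IoT-NIDD"] / table13["NIDD"], 2))  # 0.48
```

The ratio of 0.48 confirms that IoT-NIDD usage is slightly under half that of NIDD.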
This distribution indicates a strong research concentration around NIDD-, IoT-NIDD-, and malware-related benchmarks, while insider-threat and ICS-oriented data remain comparatively under-represented. In practical terms, this means that the literature is richest in domains where public benchmark datasets are already mature and easily reusable, and much thinner in settings where data collection is operationally sensitive, difficult to label, or harder to share.
The benchmark concentration is also visible within categories. In NIDDs, a small set of widely reused public datasets dominates, especially CSE-CIC-IDS2017, UNSW-NB15, and NSL-KDD. Similar concentration patterns appear within IoT-NIDD and S&P categories. This repeated reliance on a limited family of public datasets has methodological consequences: it may inflate apparent model maturity while masking sensitivity to domain shift, class imbalance, and attack distributions outside the benchmark setting.

5.2. Dataset Trends (2019–2024)

Because the 2025 data are incomplete, the trend analysis is limited to the period 2019–2024. Figure 10 shows the yearly usage patterns of the major dataset categories across the reviewed studies.
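Restricting a trend analysis to complete years can be made explicit in code so that a partially indexed final year is not mistaken for a decline. The following is a minimal sketch that assumes each study contributes one `(year, category)` record per dataset category it uses. The function name and records are hypothetical.

```python
from collections import defaultdict

def yearly_category_counts(records, first_year=2019, last_complete_year=2024):
    """records: iterable of (year, category) pairs, one per study-category use.
    Returns {category: {year: count}}, keeping only complete years so that a
    partially indexed final year (here 2025) does not look like a decline."""
    years = range(first_year, last_complete_year + 1)
    table = defaultdict(lambda: dict.fromkeys(years, 0))
    for year, category in records:
        if first_year <= year <= last_complete_year:
            table[category][year] += 1
    return dict(table)

# Hypothetical records; the 2025 entry is dropped by the year filter.
records = [(2019, "NIDD"), (2024, "NIDD"), (2024, "NIDD"),
           (2025, "NIDD"), (2021, "Malware")]
trend = yearly_category_counts(records)
print(trend["NIDD"])  # {2019: 1, 2020: 0, 2021: 0, 2022: 0, 2023: 0, 2024: 2}
```

With the filter in place, years without records still appear with a count of zero. This keeps each category's time series aligned for plotting.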
NIDD exhibits the clearest and most sustained growth, increasing steadily from 2019 to a peak in 2024. This confirms its continued centrality in AI-based cybersecurity research and reflects the long-standing availability of reusable public intrusion-detection benchmarks. IoT-NIDD also grows gradually through 2023 before rising more sharply in 2024, suggesting increasing academic and industrial attention to the security of connected and resource-constrained devices.
Malware datasets peak earlier, particularly in 2021, and then decline before stabilising. A similar pattern appears for S&P datasets, which rise to a peak in 2022 and then level off. These trajectories may indicate bursts of concentrated research attention followed by partial saturation, rather than the steady benchmark reuse observed in NIDD. By contrast, ICS datasets remain relatively sparse throughout the period, although they show modest growth toward 2024, indicating gradually increasing interest in critical infrastructure and cyber–physical security.
Custom-collected datasets increase through 2023 and then decline in 2024. This pattern may reflect the growing availability of standardized public datasets, which reduce the need for researchers to construct new datasets for every study. Taken together, the trend analysis suggests that dataset usage is not only shaped by threat priorities, but also by the relative availability, maturity, and reusability of public benchmark resources.

5.3. Co-Occurrence of Dataset Categories

Only 42 of the 99 reviewed studies used more than one dataset. For this subset, co-occurrence analysis was used to examine which dataset categories most often appeared together within the same paper (Figure 11). These co-occurrences should be interpreted as patterns of joint dataset use within the reviewed studies rather than as evidence that the paired categories are naturally equivalent or directly interoperable.
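One subtlety of category-level co-occurrence is that a study can use several datasets and still contribute no category pair, for example when both datasets are NIDDs. The following is a minimal sketch of pair counting under that rule, using hypothetical study coding. Each unordered pair is counted at most once per study.

```python
from collections import Counter
from itertools import combinations

def category_pair_counts(study_to_categories):
    """Count unordered category pairs that appear together in the same study.
    Each study contributes at most once per pair."""
    pairs = Counter()
    for cats in study_to_categories.values():
        for a, b in combinations(sorted(set(cats)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical studies, each listing the category of every dataset it uses.
studies = {
    "S1": ["NIDD", "NIDD"],             # multi-dataset, but no category pair
    "S2": ["NIDD", "IoT-NIDD"],
    "S3": ["ICS", "NIDD", "IoT-NIDD"],
}
pairs = category_pair_counts(studies)
print(pairs.most_common(1))  # [(('IoT-NIDD', 'NIDD'), 2)]
```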
The strongest co-occurrence is between NIDD and IoT-NIDD datasets (eight studies). This suggests that researchers increasingly view conventional network intrusion benchmarks and IoT-specific intrusion datasets as complementary resources for evaluating detection methods across heterogeneous network environments. However, the relatively small number of such studies also indicates that broad comparative validation across conventional and IoT traffic settings remains limited.
ICS datasets co-occur less frequently, appearing three times with NIDD and two times with IoT-NIDD. These pairings suggest growing efforts to adapt or compare intrusion-detection methods across conventional, IoT, and industrial cyber–physical environments. At the same time, the low co-occurrence counts indicate that cross-domain evaluation remains uncommon, especially in settings that bridge ICS and IIoT security. This is an important gap, as industrial systems increasingly incorporate IoT-like devices, communication layers, and distributed sensing infrastructures.
Overall, the co-occurrence results suggest that multi-domain dataset evaluation is still relatively rare. Most studies continue to remain within a single benchmark family or a narrow pair of related categories, limiting evidence of how well published methods transfer across substantially different operational settings.

5.4. Dataset Breadth per Paper

Figure 12 shows the distribution of dataset usage per paper across the 99 reviewed studies. A substantial majority of studies rely on a small number of datasets: 58 papers use only one dataset, 21 use two, and 12 use three. Only nine studies incorporate four or more datasets.
This pattern indicates that narrow evaluation settings still dominate the literature. In some cases, such focused designs are appropriate because the study targets a specific attack type, domain, or detection scenario. However, the prevalence of single-dataset experiments also raises concerns about limited generalisability, as methods validated in one benchmark environment may not transfer well to other organisations, traffic conditions, or operational domains.
The small number of studies using four or more datasets suggests that broad validation remains the exception rather than the norm. This is important because multi-dataset evaluation provides stronger evidence of robustness, transferability, and resistance to overfitting benchmark-specific properties. Overall, the distribution reinforces the concern that much of the current literature remains benchmark-convenient but only weakly validated across diverse real-world conditions.
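Two common ways to structure multi-dataset evaluation are leave-one-dataset-out, where each dataset is held out once as an unseen domain, and a pairwise train-on-A, test-on-B transfer grid. The sketch below only enumerates these splits. It assumes that a shared feature space and label mapping across the datasets has already been defined. This step is non-trivial in practice because public benchmarks ship different feature sets and label vocabularies. The function names are hypothetical.

```python
def leave_one_dataset_out(dataset_names):
    """Yield (training datasets, held-out dataset) so that every dataset is used
    once as an unseen evaluation domain."""
    names = list(dataset_names)
    for i, held_out in enumerate(names):
        yield names[:i] + names[i + 1:], held_out

def pairwise_transfer_grid(dataset_names):
    """All ordered (train, test) pairs with train != test, for a
    train-on-A, test-on-B transfer matrix."""
    return [(a, b) for a in dataset_names for b in dataset_names if a != b]

benchmarks = ["NSL-KDD", "UNSW-NB15", "CSE-CIC-IDS2017"]
for train, test in leave_one_dataset_out(benchmarks):
    print(f"train on {train}, test on {test}")
print(len(pairwise_transfer_grid(benchmarks)))  # 6
```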

5.5. Dataset-Side Key Findings and Research Gaps

The dataset analysis shows that current AI-based cyber defense research is strongly conditioned by benchmark availability. Network intrusion datasets dominate the literature, with repeated reliance on a relatively small set of public benchmarks such as NSL-KDD, UNSW-NB15, and CSE-CIC-IDS2017. This concentration creates an availability bias: researchers are more likely to study attack classes that are already well represented in public data and are correspondingly less likely to investigate attack types, environments, and operational contexts for which benchmark datasets are scarce.
This has important consequences for how published performance should be interpreted. High reported scores on widely reused datasets do not necessarily indicate broad real-world robustness; in many cases, they may instead reflect repeated optimization against familiar data distributions, familiar feature structures, and familiar attack classes. The prevalence of single-dataset studies further reinforces this concern, as it limits evidence of transferability across domains, organisations, or operational settings. As a result, benchmark progress may be overstating methodological maturity while understating generalization risk.
The weakest parts of the current evidence base are also the most strategically important. ICS, insider-threat, and other under-represented categories remain sparse, while multi-dataset evaluations are still uncommon. This means that the literature is richest where public data are easiest to obtain, not necessarily where defensive need is greatest. Future work should therefore prioritise broader dataset diversity, clearer reporting of dataset provenance and limitations, and more systematic evaluation across multiple datasets and domains. Without this shift, progress in AI-driven cyber defense risks remaining benchmark-strong but deployment-fragile.
These dataset-side constraints become even more revealing when interpreted alongside the attack and model distributions in the cross-reference analysis presented in the next section.

6. Cross-Reference Analysis

6.1. Cyberattacks × ML Overview

This section examines how machine learning (ML) approaches are distributed across adversarial behaviours by cross-referencing MITRE ATT&CK tactics and techniques with ML model families. Heatmaps are used to make relative concentration patterns visible across attack phases, model categories, and dataset-linked problem settings. Unless otherwise noted, counts are aggregated at the paper level across the 2019–2025 review window and should be interpreted as patterns of research attention rather than direct evidence of model superiority or causal attack sequencing.
The heatmaps should be interpreted as showing relative concentration of research activity and paper-level co-occurrence, not comparative model performance.
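Paper-level aggregation means that a paper mapped to several tactics and several method categories contributes one count to every tactic-category cell in their cross product, and never more than one count to the same cell. The following is a minimal sketch of that counting rule with hypothetical paper coding. It also shows why cell values measure research attention rather than model performance.

```python
from collections import Counter
from itertools import product

def cooccurrence_cells(papers):
    """papers: iterable of dicts with 'tactics' and 'methods' collections.
    Each (tactic, method category) pair is counted at most once per paper."""
    cells = Counter()
    for paper in papers:
        cells.update(product(set(paper["tactics"]), set(paper["methods"])))
    return cells

def to_grid(cells, rows, cols):
    """Arrange counts as a rows x cols matrix (missing cells become 0)."""
    return [[cells[(r, c)] for c in cols] for r in rows]

# Hypothetical coding of two papers.
papers = [
    {"tactics": ["Impact", "Initial Access"], "methods": ["Deep", "Deep"]},
    {"tactics": ["Impact"], "methods": ["Deep", "Ensemble"]},
]
grid = to_grid(cooccurrence_cells(papers),
               rows=["Impact", "Initial Access"],
               cols=["Deep", "Ensemble"])
print(grid)  # [[2, 1], [1, 0]]
```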

6.2. Tactics × ML Main Categories

Figure 13 shows a pronounced concentration of studies on Impact, Execution, Initial Access, Command and Control (C2), and Reconnaissance when paired with Deep Learning Models. In particular, Impact (55), Execution (42), and Initial Access (40) form the hottest region, followed by C2 (38) and Reconnaissance (35). Hybrid, Ensemble, and Explainable Methods also show notable intensity for Impact (36), Execution (26), and Initial Access (28), indicating that ensemble and hybrid designs are frequently adopted alongside deep architectures in critical, high-visibility phases.
By contrast, Collection, Persistence, and Resource Development remain consistently cool across all main categories, with maximum counts of 14, 14, and 3, respectively. Privilege Escalation exhibits a relatively flat profile across families: Classical = 6; Deep = 6; Hybrid = 6; Learning Paradigms and Optimization = 5.
Viewed jointly, these hot and cool regions suggest that benchmark availability is shaping both which attacks are studied and which ML paradigms appear most successful, thereby narrowing the practical meaning of published performance gains.

Interpretation of Tactics × ML Main Categories Patterns

(1) The strong emphasis on Impact and Execution aligns with datasets that are easier to generate and label, such as volumetric DoS/DDoS, overt service disruption, and malware execution traces. These settings favour deep models that learn complex, non-linear signatures from high-dimensional traffic or telemetry.
(2) The elevated presence of ensembles in Impact, Initial Access, and Execution suggests a stabilizing role for bagging/boosting and model stacking when class imbalance and dataset heterogeneity are pronounced.
(3) The consistently low-intensity regions, including Persistence, Collection, and Resource Development, likely reflect data scarcity and measurement challenges associated with stealthy, low-and-slow behaviours and therefore represent a substantive research gap.
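To illustrate the stabilizing role attributed to ensembles in (2), the sketch below uses soft voting, one of the simplest combination rules. In soft voting, per-sample attack probabilities from several detectors are averaged before thresholding. Bagging, boosting, and stacking differ in how members are trained and combined, and they are not reproduced here. The detector outputs are hypothetical.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Combine per-sample attack probabilities from several detectors by
    (weighted) averaging, then flag samples at or above the threshold."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]
    return averaged, [p >= threshold for p in averaged]

# Hypothetical outputs of three detectors on the same three samples.
detector_a = [0.9, 0.2, 0.6]
detector_b = [0.8, 0.4, 0.3]
detector_c = [0.7, 0.1, 0.4]
scores, flags = soft_vote([detector_a, detector_b, detector_c])
print(flags)  # [True, False, False]
```

On its own, detector A would flag the third sample (0.6). The averaged score (about 0.43) does not, which shows how a single member's outlying output is moderated by the others.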

6.3. Tactics × ML Subcategories

Figure 14 resolves the main-category picture into specific architectures. Three patterns stand out:
  • Ensemble Learning as a cross-tactic workhorse. The hottest subcategory cells appear under Impact (23), Execution (19), and Initial Access (19), with consistently high values for C2 (17) and Reconnaissance (15). This suggests that ensembles serve as a default stabilizer across diverse data modalities, including netflows, logs, and host events, and across different class distributions.
  • Temporal models where sequences matter. LSTM and Variants are prominent for Impact (21), Initial Access (16), Execution (16), C2 (15), and Reconnaissance (15), which is consistent with the appeal of sequence modelling in settings where temporal ordering, staged activity, or periodic communication patterns are expected.
  • CNNs for structured/spatialized representations. Core CNN Architectures show meaningful intensity for Impact (16), Execution (15), Initial Access (13), Credential Access (10), and C2 (14). This is consistent with representations that transform packets, flows, binaries, or logs into grids, images, or embeddings exhibiting local spatial patterns.
Figure 14. Heatmap of MITRE ATT&CK tactics versus the ten most frequently represented ML method subcategories across the reviewed studies. Each cell reports the number of papers in which a given tactic was associated with a given ML subcategory. Darker cells indicate higher frequencies of co-occurrence in the literature.
Supporting families appear in narrower roles: SVM and Variants and Tree-Based Models maintain moderate presence across Execution, Initial Access, and Impact, while Optimization Algorithms and Statistical Models are frequently used as feature selection, calibration, or baseline components rather than as final detectors.
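As an example of the feature-selection role described above, the sketch below ranks features with the two-class Fisher score (mu1 - mu0)^2 / (var1 + var0). This is one simple filter criterion and is used here as a stand-in. It is not the optimization algorithms used in the reviewed studies, such as metaheuristic search. The feature values and function names are hypothetical, and both classes are assumed to be present.

```python
from statistics import fmean, pvariance

def fisher_scores(rows, labels):
    """Two-class Fisher score per feature: (mu1 - mu0)^2 / (var1 + var0).
    Higher scores mean better-separated class-conditional means.
    Assumes labels contain both 0 (benign) and 1 (attack)."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        num = (fmean(pos) - fmean(neg)) ** 2
        denom = pvariance(pos) + pvariance(neg)
        # A zero-variance feature with different means separates perfectly.
        scores.append(num / denom if denom > 0 else (float("inf") if num > 0 else 0.0))
    return scores

def select_top_k(rows, labels, k):
    """Indices of the k highest-scoring features."""
    scores = fisher_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Hypothetical features: [packet_rate, uninformative_field]; 1 = attack, 0 = benign.
rows = [[100, 5], [120, 6], [10, 5], [12, 6]]
labels = [1, 1, 0, 0]
print(select_top_k(rows, labels, 1))  # [0]
```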

Interpretation of Tactics × ML Subcategories

(1) Researchers combine deep sequence learners, such as LSTM models, and spatial learners, such as CNN models, to capture complementary temporal and structural cues; ensembles are then layered to improve generalization under distribution shift.
(2) The persistence of classical learners within high-frequency tactics implies that interpretability and efficiency remain important in operational contexts, such as on-appliance detection and explainable triage.
(3) Cooler subcategories under Defense Evasion, Discovery, and Exfiltration likely result from sparser ground truth and harder labelling, providing further evidence of underexplored, high-value detection targets.
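The complementary temporal and structural cues mentioned in (1) usually start from two different input layouts: ordered windows of events for LSTM-style models, and fixed-size grids of raw bytes for CNN-style models. The following is a minimal sketch of both preprocessing steps. The window length, stride, and grid side are illustrative assumptions, and no model is trained here.

```python
def sliding_windows(events, window, stride=1):
    """Split an ordered event sequence into fixed-length, possibly overlapping
    windows, the typical input shape for LSTM-style sequence models."""
    return [events[i:i + window] for i in range(0, len(events) - window + 1, stride)]

def bytes_to_grid(payload: bytes, side: int):
    """Truncate or zero-pad raw bytes to side*side values and reshape them into a
    side x side grid scaled to [0, 1], a common input layout for CNNs."""
    size = side * side
    padded = payload[:size].ljust(size, b"\x00")
    return [[b / 255 for b in padded[r * side:(r + 1) * side]] for r in range(side)]

print(sliding_windows([1, 2, 3, 4, 5], window=3))            # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(sliding_windows([1, 2, 3, 4, 5], window=3, stride=2))  # [[1, 2, 3], [3, 4, 5]]
print(bytes_to_grid(b"\xff\x00", side=2))                    # [[1.0, 0.0], [0.0, 0.0]]
```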

6.4. Techniques × ML Subcategories

At the technique level (Figure 15), the dataset-driven bias becomes most visible:
  • The DoS/DDoS family is hottest. Network Denial of Service shows high co-occurrence with Ensembles (18), LSTM (17), Feedforward (17), and CNN (12), with Statistical Models (11) also notable. Endpoint Denial of Service exhibits a similar spread, with Ensembles/LSTM = 13 and Feedforward = 12.
  • Perimeter techniques are broadly covered. Exploit Public-Facing Application is strongly represented across Ensembles (14), LSTM (12), CNN (10), and Feedforward (11). Active Scanning shows a balanced profile, with Ensembles 12, LSTM 11, CNN 10, and Hybrid 8, indicating that reconnaissance is modelled with both temporal and structural features.
  • Protocol and host-information signals skew simpler. Application Layer Protocol (Ensembles 13; CNN 9; Feedforward 8) and Gather Victim Host Information (Feedforward 11; LSTM 10; Optimization 8) suggest that tabular/engineered features remain competitive where semantics are well captured by aggregate statistics or handcrafted indicators.
Figure 15. Heatmap of high-frequency MITRE ATT&CK techniques versus the ten most frequently represented ML method subcategories across the reviewed studies. Each cell reports the number of papers in which a given technique was associated with a given ML subcategory. Darker cells indicate higher frequencies of co-occurrence in the literature.
Less intense regions, such as Remote Access Software or Remote Access Services, likely reflect a gap in the literature rather than diminished operational importance.
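The observation that tabular or engineered features remain competitive for techniques such as Application Layer Protocol rests on aggregate statistics computed per flow. The following is a minimal sketch of such aggregation. The feature names are hypothetical and follow the general spirit of the flow statistics distributed with intrusion benchmarks, not any specific dataset's schema.

```python
from statistics import fmean, pstdev

def flow_features(packets):
    """packets: list of (timestamp_seconds, size_bytes) for one flow, in time order.
    Returns a small dict of aggregate flow statistics."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    duration = times[-1] - times[0]
    return {
        "packet_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_size": fmean(sizes),
        "std_size": pstdev(sizes),
        "duration": duration,
        # Rate is undefined for a zero-length flow; 0.0 is a placeholder.
        "packets_per_second": len(packets) / duration if duration > 0 else 0.0,
    }

# Hypothetical three-packet flow.
feats = flow_features([(0.0, 60), (0.5, 1500), (1.0, 60)])
print(feats["total_bytes"], feats["mean_size"], feats["packets_per_second"])  # 1620 540.0 3.0
```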

Interpretation of Techniques × ML Subcategories

(1) Technique-level intensities mirror data availability: where labelled, shareable corpora exist, such as DoS, scanning, and exploitation datasets, the literature shows deep coverage across multiple families.
(2) Diverse co-occurrence for the same technique, such as Network Denial of Service across LSTM/CNN/FFN/Ensembles, indicates algorithmic complementarity: temporal cues, localized structure, and robust aggregation each capture distinct signal facets.
(3) For textual/metadata-centric problems, such as phishing, feedforward models remain frequently used, consistent with structured features and a lower need for temporal context.

6.5. What the Heatmaps Mean for Research and Practice

Bias toward visible, high-signal phases. Across all views, the field concentrates on Impact, Execution, and Initial Access—phases that are easier to emulate and label and that produce strong signatures. This improves benchmark progress but risks blind spots in stealthy, post-compromise behaviours, including Persistence, Exfiltration, Privilege Escalation, and Resource Development.
Architectural division of labour. LSTM models are especially prominent where order and timing are likely to matter, such as beaconing, staged execution, or flow-based exfiltration patterns. Ensembles appear to serve a stabilizing role under class imbalance, heterogeneous datasets, and potential domain shift, which may help explain their ubiquity across the most intensively studied tactics and techniques.
Operational and data constraints shape model choice. Classical learners, including SVMs, tree-based models, and statistical models, persist in mid-to-hot zones because they are computationally efficient, interpretable, and deployable on constrained sensors. Optimization and feature selection appear as enablers that make higher-capacity detectors viable in production.

6.6. Cyberattacks × ML × Datasets

This subsection examines the combined relationship among cyberattacks, ML paradigms, and datasets. When the attack, model, and dataset dimensions are examined together, a clearer picture of current research practice emerges. Methodological choices are not made independently; they are strongly shaped by the types of attacks represented in public benchmarks and by the availability of reusable datasets. This helps explain why denial-of-service, brute force, scanning, and other highly visible network-centric behaviours are repeatedly associated with deep learning, tree-based ensembles, and hybrid methods.
The tri-axis view also exposes important absence patterns. Post-compromise ATT&CK phases such as Persistence, Lateral Movement, Privilege Escalation, Collection, and Exfiltration are not only less studied overall, but are also supported by a much narrower dataset base and a thinner range of ML methods. Likewise, ICS and insider-threat settings remain methodologically sparse, with limited evidence of broad model exploration or robust cross-dataset validation.
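Absence patterns of this kind can be listed mechanically by enumerating the full tactic × method × dataset-category product and subtracting the triples that at least one paper covers. The following is a minimal sketch with hypothetical axes and observations. A real analysis would use the full ATT&CK tactic list and the review's own coding.

```python
from itertools import product

def missing_intersections(observed_triples, tactics, methods, dataset_categories):
    """Return every (tactic, method family, dataset category) combination that
    no reviewed paper covers, given the observed triples."""
    seen = set(observed_triples)
    return [t for t in product(tactics, methods, dataset_categories) if t not in seen]

# Hypothetical axes and observed coverage.
tactics = ["Impact", "Persistence"]
methods = ["Deep", "Classical"]
dataset_categories = ["NIDD", "ICS"]
observed = [("Impact", "Deep", "NIDD"), ("Impact", "Classical", "NIDD")]

gaps = missing_intersections(observed, tactics, methods, dataset_categories)
print(len(gaps))  # 6 of the 8 possible combinations are uncovered
```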
Taken together, these patterns suggest that current progress in AI-driven cyber defense is constrained less by a shortage of algorithms than by the narrowness of representative evaluation settings. This motivates the integrated synthesis presented in the following subsection.

6.7. Key Findings and Research Gaps

The cross-reference analysis provides the clearest evidence that methodological choices in AI-driven cyber defense are not independent of the attacks and datasets being studied. Instead, they are tightly coupled. High-frequency benchmark datasets are strongly associated with a narrow band of ATT&CK tactics and techniques, especially those linked to visible network intrusion, scanning, brute force, and denial-of-service behaviour. These settings, in turn, favour model families such as deep learning architectures, tree-based ensembles, and hybrid methods that perform well on high-volume supervised traffic analysis tasks. As a result, the apparent dominance of particular ML paradigms is partly a reflection of benchmark structure rather than a neutral indicator of universal suitability.
This combined perspective also makes absence patterns more visible. Some of the most important gaps in the literature do not appear when attacks, models, or datasets are viewed separately, but emerge only at their intersections. For example, post-compromise ATT&CK phases such as Persistence, Lateral Movement, Privilege Escalation, Collection, and Exfiltration are not only less studied overall; they are also associated with a much narrower range of datasets and a thinner range of ML methods. Likewise, ICS and insider-threat settings remain methodologically sparse, with limited evidence of diverse model exploration, robust cross-dataset validation, or sustained attention to long-horizon adversarial behaviour.
The main implication is that current progress in AI-driven cyber defense is constrained less by a shortage of algorithms than by a shortage of representative evaluation settings. The field has developed a rich catalogue of ML methods, but these methods are repeatedly validated on a relatively narrow subset of benchmark tasks. Consequently, there is a risk that methodological sophistication is being mistaken for broad defensive readiness. Future work should therefore target missing attack–model–dataset intersections, especially those involving stealthy post-compromise tactics, under-represented operational domains, and multi-dataset or cross-domain validation settings.
These tri-axis gaps directly motivate the broader limitations and structural bottlenecks discussed in the following section.

7. Gaps and Limitations

This section synthesises findings across attacks, ML methods, and datasets to identify where current AI-driven cyber defense research is well covered, where it is thin, and why. We also acknowledge threats to validity in our own review.

7.1. Coverage Gaps Across ATT&CK Tactics and Techniques

The cross-reference heatmaps reveal a systematic concentration on Initial Access, Execution, Command and Control, Reconnaissance, and Impact. By contrast, Persistence, Collection, and Resource Development remain consistently underexplored. Post-compromise behaviours such as Privilege Escalation, Lateral Movement, and Exfiltration also exhibit flatter, fragmented coverage. These phases are inherently stealthier, unfold over longer horizons, and are harder to label with reliable ground truth, which depresses dataset availability and skews model development toward high-signal, easily benchmarked problems (e.g., DoS/DDoS, scanning, and overt malware execution). The result is an ecosystem of models that perform strongly on visible, short-horizon events but provide limited assurance against low-and-slow, multi-stage adversaries.

7.2. Methodological Limitations in Model Development and Evaluation

Over-reliance on single-dataset studies. A substantial proportion of studies relied on only one dataset for model development and evaluation. While this can support focused benchmarking or proof-of-concept analysis, it also reflects simplified experimental settings that do not adequately capture the variability, complexity, and unpredictability of real-world cyber threats. As a result, many published models are optimized for narrow benchmark environments rather than for transfer across organisations, infrastructures, or attack settings. This contributes to limited generalisability and weak evidence for robustness against multi-stage or multi-vector adversaries.
Shallow alignment to ATT&CK semantics. Although many papers reference ATT&CK, the alignment between model inputs, labels, and outputs and specific tactics or techniques is often indirect. Models trained on coarse labels such as “attack” versus “benign” are sometimes discussed as though they detect fine-grained TTPs. Without semantically aligned labels and evaluation procedures, it is difficult to claim genuine coverage of stealthier adversarial behaviours such as credential abuse, persistence, or living-off-the-land activity.
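One way to make the label-to-ATT&CK alignment explicit is to keep a mapping from dataset-specific labels to technique IDs. Labels that cannot be mapped, such as a generic "attack" class, are then reported as unable to support technique-level claims. In the sketch below, the label strings and mapping are hypothetical. The technique IDs are real ATT&CK Enterprise identifiers: T1498 Network Denial of Service, T1595 Active Scanning, T1190 Exploit Public-Facing Application, and T1110 Brute Force.

```python
# Hypothetical mapping from dataset-specific attack labels to ATT&CK technique IDs.
LABEL_TO_TECHNIQUE = {
    "ddos": "T1498",            # Network Denial of Service
    "portscan": "T1595",        # Active Scanning
    "web_exploit": "T1190",     # Exploit Public-Facing Application
    "ssh_bruteforce": "T1110",  # Brute Force
}

def technique_coverage(labels, mapping=LABEL_TO_TECHNIQUE):
    """Split labels into those that support a technique-level claim and
    coarse labels (e.g., 'attack') that do not. Benign labels are ignored."""
    mapped, unmapped = {}, []
    for label in labels:
        key = label.lower()
        if key in mapping:
            mapped[label] = mapping[key]
        elif key != "benign":
            unmapped.append(label)
    return mapped, unmapped

mapped, unmapped = technique_coverage(["BENIGN", "DDoS", "PortScan", "attack"])
print(mapped)    # {'DDoS': 'T1498', 'PortScan': 'T1595'}
print(unmapped)  # ['attack']
```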
Limited treatment of drift and adversarial pressure. Few studies explicitly address concept drift, evolving baselines, software changes, or adversarial ML threats such as poisoning, evasion, or backdoor manipulation. Consequently, many proposed detectors may be brittle in long-running deployments and vulnerable to targeted manipulation or degradation over time.
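A lightweight starting point for monitoring drift is to compare a feature's distribution in recent traffic against a reference window. The sketch below uses the Population Stability Index (PSI), which is substituted here as one common drift statistic, over shared bin edges. The bin edges and the small floor `eps` (which avoids division by zero in empty bins) are assumptions. Thresholds such as 0.1 and 0.25 are often quoted as rules of thumb, but they are conventions rather than standards.

```python
import math

def population_stability_index(reference, current, edges, eps=1e-6):
    """PSI between two samples of one feature, using shared ascending bin edges:
    PSI = sum((c_i - r_i) * ln(c_i / r_i)) over bins of proportions r_i, c_i.
    Empty bins are floored at eps to keep the logarithm finite."""
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1  # bin index = edges passed
        return [max(c / len(values), eps) for c in counts]

    r, c = proportions(reference), proportions(current)
    return sum((ci - ri) * math.log(ci / ri) for ri, ci in zip(r, c))

# Hypothetical feature values from a reference window and a shifted recent window.
reference = [1, 2, 3, 4, 5, 6, 7, 8]
shifted = [7, 8, 9, 10, 11, 12, 13, 14]
edges = [3, 6, 9]
print(population_stability_index(reference, reference, edges))       # 0.0
print(population_stability_index(reference, shifted, edges) > 0.25)  # True
```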
Underuse of complementary learning paradigms. While ensembles, CNNs, and LSTMs dominate the literature, several potentially valuable paradigms remain underexplored. These include graph-based learning for host–process–identity relationships, long-context sequence models for low-and-slow behaviours, self-/semi-supervised pretraining on large unlabelled telemetry, and reinforcement learning for adaptive defense or sensing. Where such methods do appear, they are typically validated on the same narrow set of benchmark datasets, which limits the strength of the conclusions that can be drawn about their broader practical value.

7.3. Dataset-Specific Gaps

Despite the growing diversity of publicly available cybersecurity datasets, this review reveals several persistent gaps in how these resources are used. These gaps directly affect the generalisability, applicability, and resilience of AI-powered cyber defense solutions.
Under-representation of critical domains. Some of the most strategically important domains remain sparsely represented in the literature. Insider-threat datasets appear in only a very small number of studies, despite the operational importance of insider risk in enterprise environments. ICS datasets also remain limited, and only a small number of studies explore their overlap with IoT-NIDD settings. Likewise, phishing and social-engineering datasets, although present, are rarely integrated with network or malware datasets, which restricts the study of multi-vector attacks that cross technical and human attack surfaces.
Limited heterogeneity and co-occurrence. Multi-dataset studies do exist, but they typically combine closely related sources, such as NIDD with IoT-NIDD, rather than truly heterogeneous pairings such as NIDD with ICS, insider-threat, or phishing data. As a result, models are rarely evaluated on composite, multi-stage scenarios such as phishing leading to credential theft, lateral movement, and exfiltration. This limits the realism of current evaluation practice.
Scarce use of dataset expansion and adaptation. Only a limited number of studies use custom-collected datasets, and even fewer combine them with public datasets to broaden coverage. In addition, techniques such as data augmentation, synthetic generation, simulation-based expansion, and domain adaptation remain comparatively rare. This constrains the ability of models to generalise to unseen environments, novel attack vectors, and changing threat conditions.
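One concrete form of dataset expansion is SMOTE-style interpolation, a widely used oversampling technique for rare classes. The sketch below is offered as an illustration only; it does not claim that the reviewed studies used this method. It assumes numeric feature vectors, and `smote_like` and its parameters are hypothetical names. Synthetic rows should be generated from the training split only, because generating them before the split leaks information into evaluation.

```python
import numpy as np


def smote_like(X_minority, n_new, k=3, seed=0):
    """Generate n_new synthetic rows by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (the core idea of SMOTE)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                  # random minority sample
        b = neighbours[a, rng.integers(k)]   # one of its nearest neighbours
        gap = rng.random()                   # position along the segment X[a] -> X[b]
        synthetic[i] = X[a] + gap * (X[b] - X[a])
    return synthetic


X_rare = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_like(X_rare, n_new=10)
print(synthetic.shape)
```

Because every new row lies on a segment between two real minority samples, this approach densifies known regions but cannot invent genuinely novel attack behaviour. That limitation is one reason simulation-based expansion is also needed.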

7.4. Limitations

This systematic literature review is subject to several limitations. First, the search strategy was bounded by the selected databases, search string, and time window (2019–2025), which means that some relevant studies may not have been captured. Second, the review included only English-language, peer-reviewed studies, so potentially relevant preprints, technical reports, and practitioner-oriented system documents were excluded. Third, the mapping of studies to ATT&CK tactics and techniques, ML categories, and dataset categories involved structured judgement; although coding rules and consistency checks were applied, some residual misclassification remains possible. Fourth, the review necessarily relies on author-reported metrics and experimental settings, which differ across preprocessing choices, sampling procedures, feature engineering, and evaluation protocols. This limits the comparability of reported performance across studies. Finally, the counts and heatmap intensities presented in this paper reflect patterns of emphasis within the literature, not the true prevalence of attacks in operational environments, and should therefore be interpreted as indicators of research attention rather than incident frequency.
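The review does not specify which consistency check was applied to the coding. For structured coding by two reviewers, one common choice is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes two coders assigning ATT&CK tactics to the same attack labels. The example codes are invented.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected if both coders labelled independently at their own base rates
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


coder_a = ["Impact", "Impact", "Initial Access", "Execution"]
coder_b = ["Impact", "Impact", "Initial Access", "Initial Access"]
print(round(cohens_kappa(coder_a, coder_b), 2))
```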
Taken together, these gaps and limitations suggest that the main challenge facing AI-driven cyber defense is not simply model selection, but the interaction among semantic labelling quality, benchmark diversity, and realistic cross-domain evaluation.

8. Future Research Directions

Building on the gaps identified across attacks, methods, and datasets, this section outlines several actionable directions for advancing AI-driven cyber defense research. These directions are intended not as generic future-work statements, but as priorities derived directly from the tri-axis synthesis presented in this review.

8.1. Develop Multi-Domain and Multi-Stage Benchmarking Datasets

Future work should move beyond single-domain benchmarks toward multi-dataset integration. Composite datasets that link, for example, NIDD traffic with phishing payloads, insider-threat activity, and ICS telemetry would enable the study of adversaries who operate across vectors and stages. Curating or simulating multi-stage attack chains (e.g., phishing → credential theft → lateral movement → data exfiltration) will be critical both to reducing the current fragmentation and to testing model resilience against realistic, multi-hop kill chains.
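A simple way to check whether a candidate benchmark can support such chains is to map its labels to ATT&CK tactics and test which chain stages are covered. The sketch below assumes a small excerpt of the label-to-tactic mappings from Table A1 (Appendix A). The chain encoding and the `chain_coverage` helper are hypothetical.

```python
# Excerpt of label-to-tactic mappings taken from Table A1 (Appendix A).
LABEL_TO_TACTIC = {
    "PortScan": "Reconnaissance",
    "Phishing": "Initial Access",
    "SSH-Patator": "Credential Access",
    "smb-exploit": "Lateral Movement",
    "Data Exfiltration": "Exfiltration",
    "DDoS": "Impact",
}

# The example chain above, expressed as ATT&CK tactics:
# phishing -> credential theft -> lateral movement -> data exfiltration
CHAIN = ["Initial Access", "Credential Access", "Lateral Movement", "Exfiltration"]


def chain_coverage(dataset_labels, chain=CHAIN):
    """Return the fraction of chain stages covered by a dataset's labels and the missing stages."""
    covered = {LABEL_TO_TACTIC[label] for label in dataset_labels if label in LABEL_TO_TACTIC}
    missing = [stage for stage in chain if stage not in covered]
    return 1 - len(missing) / len(chain), missing


# Hypothetical label set of a single-domain NIDD benchmark
fraction, missing = chain_coverage(["PortScan", "DDoS", "SSH-Patator"])
print(fraction, missing)
```

Stage coverage is necessary but not sufficient. A benchmark can contain every tactic in isolated sessions without ever linking them into one chain.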

8.2. Generate Fine-Grained Datasets Aligned with ATT&CK TTPs

Most current benchmarks offer only coarse-grained attack labels (e.g., “DoS” or “malware”). Future datasets should explicitly encode MITRE ATT&CK tactics and techniques, enabling models to learn at the semantic level of adversary behaviour. This includes collecting detailed telemetry for stealthy tactics (Persistence, Credential Access, and Lateral Movement) and ensuring balanced representation of underexplored techniques. Synthetic augmentation (e.g., GAN-based traffic generation, attack simulations in controlled labs) could help fill blind spots where real-world data are scarce.
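The sketch below shows one way an ATT&CK-aligned label might be represented in a dataset schema. It is an assumption for illustration: the review does not prescribe a schema, and `AttackLabel` is a hypothetical record type. The regular expression checks only the shape of a technique ID, not whether the ID exists in the current ATT&CK release.

```python
import re
from dataclasses import dataclass

# ATT&CK technique IDs take the form T + four digits (e.g. T1110), with an optional
# three-digit sub-technique suffix (e.g. T1110.001). ICS-matrix IDs such as T0851
# (used in Table A1) share the same shape.
TECHNIQUE_ID = re.compile(r"^T\d{4}(\.\d{3})?$")


@dataclass(frozen=True)
class AttackLabel:
    """One fine-grained label: the dataset's coarse label plus its ATT&CK mapping."""
    coarse_label: str
    tactic: str
    technique_id: str

    def __post_init__(self):
        # Reject malformed IDs when the record is created rather than at analysis time
        if not TECHNIQUE_ID.match(self.technique_id):
            raise ValueError(f"malformed ATT&CK technique ID: {self.technique_id!r}")

    @property
    def parent_technique(self) -> str:
        """Roll a sub-technique (T1110.001) up to its parent technique (T1110)."""
        return self.technique_id.split(".")[0]


label = AttackLabel("Hydra-SSH (SSH brute-force attacks)", "Credential Access", "T1110.001")
print(label.parent_technique)
```

Keeping the original coarse label alongside the mapping preserves traceability, so a later remapping does not require relabelling the raw data.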

8.3. Expand Methodological Horizons Beyond Standard Models

While ensembles, CNNs, and LSTMs dominate, more diverse ML paradigms should be tested against cyber defense challenges. Promising directions include the following:
  • Graph-based learning for modelling host-process-identity relationships, naturally capturing lateral movement or privilege escalation.
  • Long-context sequence models (e.g., Transformers) for detecting low-and-slow behaviours.
  • Self-/semi-supervised pretraining on large unlabelled telemetry for better generalisation.
  • Reinforcement learning for adaptive detection and proactive defense.
Crucially, such methods should not be validated only on narrow benchmarks but on heterogeneous, realistic scenarios.
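Graph-based learning would operate on a host–process–identity graph. A much simpler illustration of why the graph view helps with lateral movement is to compute one hand-crafted graph feature: the number of distinct hosts a user account logs on to within a sliding time window. This is the user node's windowed out-degree in a user–host graph. The sketch below computes that feature, which a learned model could consume; it is not graph learning itself. The event format, threshold, and function name are assumptions.

```python
from collections import Counter, defaultdict


def max_hosts_in_window(events, window):
    """events: iterable of (timestamp, user, host) logon records (a hypothetical format).
    Returns, per user, the largest number of distinct hosts reached within any
    time span of length `window`, i.e. the user's windowed out-degree in a user-host graph."""
    by_user = defaultdict(list)
    for ts, user, host in events:
        by_user[user].append((ts, host))

    result = {}
    for user, logons in by_user.items():
        logons.sort()
        hosts = Counter()  # host -> number of logons inside the current window
        start, best = 0, 0
        for ts, host in logons:
            hosts[host] += 1
            # Drop logons that fall outside the window ending at `ts`
            while ts - logons[start][0] > window:
                old_host = logons[start][1]
                hosts[old_host] -= 1
                if hosts[old_host] == 0:
                    del hosts[old_host]
                start += 1
            best = max(best, len(hosts))
        result[user] = best
    return result


events = [
    (0, "alice", "ws-01"), (10, "alice", "srv-02"), (20, "alice", "srv-03"),
    (500, "alice", "srv-04"), (0, "bob", "ws-07"), (5, "bob", "ws-07"),
]
fanout = max_hosts_in_window(events, window=60)
suspicious = [user for user, n in fanout.items() if n >= 3]  # threshold is an assumption
print(fanout, suspicious)
```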

8.4. Address Under-Represented Critical Domains

Insider threats, ICS/SCADA, IoT convergence, and phishing/social engineering remain substantially underexplored. Researchers should prioritise these domains, not in isolation, but by integrating them with traditional NIDD or malware datasets to reflect enterprise and cyber-physical system realities. Realistic cross-domain datasets would allow defense models to detect hybrid threats (e.g., phishing-initiated attacks against IoT-enabled critical infrastructure).

8.5. Promote Heterogeneity and Co-Occurrence in Evaluation

Future evaluation protocols should require models to demonstrate robustness across heterogeneous datasets and attack contexts, not just within one dataset family. Cross-domain benchmarking (e.g., training on NIDD and testing on IoT or ICS) would stress-test generalisation and reduce overfitting to dataset artefacts. Co-occurrence evaluation—where models face blended attack types within the same session—would better reflect real-world complexity.
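The cross-domain protocol can be expressed as leave-one-dataset-out evaluation: train on all datasets except one, then test on the held-out dataset. In the minimal sketch below, the nearest-centroid classifier stands in for any detector, and the toy datasets are invented. The sketch also assumes the features have already been mapped into a shared schema, which in practice is the hard part.

```python
from statistics import mean


def fit_centroids(rows):
    """Nearest-centroid placeholder model: the mean feature vector of each class label."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {y: [mean(column) for column in zip(*xs)] for y, xs in by_label.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def leave_one_dataset_out(datasets):
    """Train on every dataset except one and test on the held-out one, for each dataset in turn."""
    scores = {}
    for held_out, test_rows in datasets.items():
        train_rows = [row for name, rows in datasets.items() if name != held_out for row in rows]
        model = fit_centroids(train_rows)
        correct = sum(predict(model, x) == y for x, y in test_rows)
        scores[held_out] = correct / len(test_rows)
    return scores


# Toy (features, label) data; 0 = benign, 1 = attack. The "iot" rows sit in a
# shifted feature range to mimic a domain gap.
datasets = {
    "nidd_a": [([0.0, 0.0], 0), ([1.0, 1.0], 1)],
    "nidd_b": [([0.1, 0.0], 0), ([0.9, 1.0], 1)],
    "iot": [([5.0, 5.0], 0), ([6.0, 6.0], 1)],
}
print(leave_one_dataset_out(datasets))
```

In this toy setup, a model trained on `nidd_a` alone classifies `nidd_b` perfectly, yet every held-out score drops to 0.5 once the shifted domain enters the protocol. Within-family testing alone would hide this effect.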

8.6. Advance Data Expansion, Domain Adaptation, and Drift-Resilience

There is a clear need for systematic methods and evaluation frameworks that can cope with evolving adversarial environments. Techniques such as domain adaptation, transfer learning, and drift detection should be embedded into experimental pipelines. Data expansion through adversarial generation, simulation environments, and continual learning frameworks can provide models with resilience against concept drift and novel attack vectors. This direction is particularly critical for long-horizon tactics like persistence and stealthy credential abuse.
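Drift detection can be embedded in a pipeline by monitoring a stream such as a detector's per-batch error rate. The sketch below implements the Page–Hinkley test, an established drift detector chosen here for illustration; the review does not single it out. The `delta` and `threshold` values are illustrative assumptions that would need tuning for any real stream.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a monitored stream
    (for example, a detector's per-batch error rate)."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # magnitude of change tolerated as noise
        self.threshold = threshold  # alarm level (lambda in the usual notation)
        self.reset()

    def reset(self):
        """Clear all statistics, e.g. after a drift has been handled."""
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, x):
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n               # running mean
        self.cumulative += x - self.mean - self.delta       # m_t
        self.minimum = min(self.minimum, self.cumulative)   # M_t = min over m_1..m_t
        return self.cumulative - self.minimum > self.threshold


def first_alarm(stream, detector):
    """Index of the first observation that triggers an alarm, or None."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


stream = [0.0] * 100 + [1.0] * 50  # error rate jumps at index 100
print(first_alarm(stream, PageHinkley()))
```

A detector like this only signals that something changed. Deciding whether to retrain, adapt, or escalate remains a separate design choice in the pipeline.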

8.7. Key Research Priorities

Taken together, the future directions outlined in this section point toward a common priority: AI-driven cyber defense research must move beyond narrow benchmark optimisation and toward richer, ATT&CK-aligned, and cross-domain evaluation ecosystems. The strongest pattern across the identified priorities is the need for multi-stage datasets, heterogeneous validation settings, and broader testing of methods capable of handling long-horizon and structurally complex adversarial behaviour. The central gap is therefore not the absence of promising algorithms, but the absence of sufficiently realistic data and evaluation conditions in which such methods can be meaningfully assessed. Without these shifts, the field is likely to keep producing strong benchmark results without corresponding confidence in real-world defensive readiness or operational generalisability.

9. Conclusions

This review mapped the contemporary landscape of AI-powered cyber defense across three linked evidence axes: cyberattacks, machine learning methods, and datasets. By analysing 99 peer-reviewed studies published between 2019 and 2025, aligning 312 attack labels to the MITRE ATT&CK framework, and organising 96 datasets into a refined taxonomy, this study provides an integrated perspective that goes beyond model-centric or dataset-centric surveys. In particular, it contributes the following: (i) an ATT&CK-aligned view of the attack landscape, (ii) a structured synthesis of ML method usage across attack contexts, and (iii) a tri-axis cross-reference analysis showing how attacks, models, and datasets interact to shape current research practice. Among these, the tri-axis cross-reference analysis is the distinctive contribution: by examining attacks, ML methods, and datasets jointly, it surfaces benchmark dependencies and missing intersections that are obscured in model-centred or dataset-centred reviews.
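The tri-axis cross-reference can be pictured as a three-way contingency table in which empty cells are candidate blind spots. The sketch below is a minimal illustration. It assumes each study result has been coded as a (tactic, ML family, dataset category) triple. The records are invented, and the function is hypothetical; it is not the review's analysis code.

```python
from collections import Counter
from itertools import product

# Illustrative coded evidence, one (tactic, ML family, dataset category) triple per
# study result; these records are invented for the sketch.
RECORDS = [
    ("Impact", "Ensemble", "NIDD"),
    ("Impact", "Deep learning", "NIDD"),
    ("Impact", "Ensemble", "IoT-NIDD"),
    ("Initial Access", "Deep learning", "Spam and Phishing"),
]


def tri_axis(records, tactics=None, methods=None, datasets=None):
    """Count each attack-method-dataset intersection and list the empty ones.
    Pass full axis vocabularies (e.g. all ATT&CK tactics) to expose true blind spots;
    by default only values seen in `records` are used."""
    counts = Counter(records)
    tactics = tactics or sorted({t for t, _, _ in records})
    methods = methods or sorted({m for _, m, _ in records})
    datasets = datasets or sorted({d for _, _, d in records})
    gaps = [cell for cell in product(tactics, methods, datasets) if counts[cell] == 0]
    return counts, gaps


counts, gaps = tri_axis(RECORDS)
print(len(gaps), "empty intersections among observed axis values")
_, gaps_full = tri_axis(RECORDS, tactics=["Impact", "Initial Access", "Lateral Movement"])
print(len(gaps_full), "empty intersections once Lateral Movement is included")
```

Supplying the full tactic vocabulary matters. A tactic that no study addresses never appears in the records, so its blind spot is invisible unless the axis is declared explicitly.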
Three findings stand out. First, research attention remains strongly concentrated on high-visibility phases such as Initial Access, Execution, Command and Control, Reconnaissance, and Impact, while stealthier and longer-horizon phases such as Persistence, Privilege Escalation, Lateral Movement, and Exfiltration remain comparatively underexplored. Second, deep learning and ensemble-based approaches dominate the most intensively studied settings, especially those supported by large public intrusion-detection benchmarks, while classical models remain important because of their efficiency, interpretability, and deployability in constrained environments. Third, dataset availability exerts a strong shaping effect on the field: repeated reliance on a narrow group of public NIDD benchmarks supports progress on visible threats such as DoS/DDoS, scanning, and web exploitation, but provides far weaker evidence of robustness against multi-stage, cross-domain, and low-and-slow adversaries.
These findings make the principal gaps in the field explicit. Research remains concentrated on benchmark-friendly, high-visibility attack settings, leaving the post-compromise ATT&CK phases (Persistence, Privilege Escalation, Lateral Movement, and Exfiltration) comparatively underexplored. The literature is also shaped by strong benchmark dependence: a small set of public NIDD datasets is reused repeatedly, while evidence from multi-dataset, cross-domain, or long-horizon evaluation remains limited. As a result, under-represented but strategically important domains, including ICS/IIoT environments, insider threats, and phishing-led multi-stage attack scenarios, remain much less studied than conventional intrusion-detection settings. Together, these gaps reflect three structural bottlenecks that constrain real-world impact: dataset concentration, which channels evaluation toward a narrow set of reusable benchmarks; semantic shallowness, where coarse labels are sometimes interpreted as though they provide fine-grained ATT&CK coverage; and weak evaluation diversity, reflected in limited cross-dataset, cross-domain, and long-horizon validation.
Future research should therefore prioritise the development of richer ATT&CK-aligned datasets, especially for stealthy and long-horizon adversarial behaviours that remain difficult to observe in existing benchmarks. Greater emphasis is needed on multi-domain and multi-stage evaluation settings that combine network, IoT, phishing, insider-threat, and ICS/IIoT evidence, rather than validating models only within a single benchmark family. In methodological terms, the field would benefit from broader testing of models suited to structurally complex and low-and-slow environments, together with stronger evaluation under dataset shift, concept drift, and cross-organisational variation. From a practical standpoint, security operations and platform teams can translate these insights into near-term gains by combining heterogeneous telemetry sources, balancing interpretable and high-capacity detectors, and evaluating systems using operator-relevant criteria such as transferability, latency, alert burden, and investigation cost. Without these changes, progress in AI-driven cyber defense risks remaining strong at the benchmark level but weak in deployment realism.
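Several of the operator-relevant criteria above can be computed directly from the alerts produced in an evaluation run. The minimal sketch below assumes a hypothetical alert format of (timestamp, is-true-positive) pairs and a placeholder triage time per alert. Transferability is better measured with a cross-dataset protocol and is not covered here.

```python
def operator_metrics(alerts, attack_start, days, triage_minutes=15.0):
    """alerts: list of (timestamp_seconds, is_true_positive) pairs from one evaluation run.
    triage_minutes is a placeholder assumption for analyst time spent per alert."""
    if days <= 0:
        raise ValueError("days must be positive")
    true_times = [ts for ts, is_tp in alerts if is_tp and ts >= attack_start]
    n_true = sum(1 for _, is_tp in alerts if is_tp)
    n_false = len(alerts) - n_true
    return {
        "alerts_per_day": len(alerts) / days,               # alert burden
        "false_alerts_per_day": n_false / days,
        "precision": n_true / len(alerts) if alerts else 0.0,
        # seconds from attack start to the first true alert; None if never detected
        "detection_latency_s": min(true_times) - attack_start if true_times else None,
        "triage_hours": len(alerts) * triage_minutes / 60,  # rough investigation cost
    }


alerts = [(100, False), (3700, True), (4000, True), (90000, False)]
print(operator_metrics(alerts, attack_start=3600, days=2))
```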
Ultimately, the next stage of progress in AI-powered cyber defense will depend less on adding new models and more on building richer datasets, stronger cross-domain evaluations, and more realistic ATT&CK-aligned evidence.

Author Contributions

Conceptualization, M.C., A.A. and Q.K.A.M.; Methodology, M.C., A.A. and Q.K.A.M.; Software, M.C.; Validation, M.C. and H.C.; Formal Analysis, M.C. and H.C.; Investigation, M.C.; Data Curation, M.C.; Writing—Original Draft Preparation, M.C.; Writing—Review and Editing, A.A., Q.K.A.M. and H.C.; Visualization, M.C.; Supervision, A.A. and Q.K.A.M.; Project Administration, M.C. All authors have read and agreed to the published version of this manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments and helpful suggestions, which helped improve the quality and clarity of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Mapping of Identified Cyberattacks to MITRE ATT&CK Tactics and Techniques

Table A1. Mapping of identified cyberattacks to MITRE ATT&CK tactics, techniques, and IDs.
Attack/KeywordTacticTechniqueTechnique ID
Reconnaissance
 Vulnerability ScanningReconnaissanceActive Scanning: Vulnerability ScanningT1595.002
 mscanReconnaissanceActive Scanning: Scanning IP BlocksT1595.001
 saintReconnaissanceActive Scanning: Vulnerability ScanningT1595.002
 portsweepReconnaissanceActive Scanning: Scanning IP BlocksT1595.001
 satanReconnaissanceActive Scanning: Vulnerability ScanningT1595.002
 ipsweepReconnaissanceActive Scanning: Scanning IP BlocksT1595.001
 nmapReconnaissanceActive ScanningT1595
 PortScanReconnaissanceActive ScanningT1595
 PortScan OSReconnaissanceGather Victim Host Information: Client ConfigurationsT1592.004
 ReconnaissanceReconnaissanceActive Scanning, Gather Victim Host InformationT1595, T1592
 OS FingerprintingReconnaissanceGather Victim Host Information: Client ConfigurationsT1592.004
 ProbeReconnaissanceActive ScanningT1595
 OS ScanningReconnaissanceGather Victim Host Information: Client ConfigurationsT1592.004
 ScanningReconnaissanceActive ScanningT1595
 Browsing job-hunting or competitor websitesReconnaissanceGather Victim Org InformationT1591
 Information GatheringReconnaissanceGather Victim Host InformationT1592
 SYN ScanReconnaissanceActive ScanningT1595
 TCP Connect ScanReconnaissanceActive ScanningT1595
 UDP ScanReconnaissanceActive ScanningT1595
 Network Sweep (IP range scanning)ReconnaissanceActive ScanningT1595
 Mirai–Junk/ScanReconnaissanceActive ScanningT1595
 Bashlite–Junk/ScanReconnaissanceActive ScanningT1595
 Service ScanReconnaissanceActive ScanningT1595
 pingsweepReconnaissanceActive ScanningT1595
Resource Development
 Domain AbuseResource DevelopmentAcquire Infrastructure: DomainsT1583.001
 Malicious DomainResource DevelopmentAcquire Infrastructure: DomainsT1583.001
 Account creation for fake reviewsResource DevelopmentEstablish Accounts: Social Media AccountsT1585.001
 Phishing KitsResource DevelopmentDevelop CapabilitiesT1587
 Hosting InfraResource DevelopmentAcquire Infrastructure: Web ServicesT1583.006
 Botnet fake accountResource DevelopmentEstablish Accounts: Social Media AccountsT1585.001
 Sybil AttackResource DevelopmentCompromise AccountsT1586
Initial Access
 sendmailInitial AccessExploit Public-Facing ApplicationT1190
 namedInitial AccessExploit Public-Facing ApplicationT1190
 ftp_writeInitial AccessValid AccountsT1078
 Web Attack–SQL InjectionInitial AccessExploit Public-Facing ApplicationT1190
 LDAP InjectionInitial AccessExploit Public-Facing ApplicationT1190
 XPath InjectionInitial AccessExploit Public-Facing ApplicationT1190
 mysqlInitial AccessValid AccountsT1078
 Unintentional Illegal RequestsInitial AccessExploit Public-Facing ApplicationT1190
 sqlattackInitial AccessExploit Public-Facing ApplicationT1190
 SSI InjectionInitial AccessExploit Public-Facing ApplicationT1190
 phfInitial AccessExploit Public-Facing ApplicationT1190
 ExploitsInitial AccessExploit Public-Facing ApplicationT1190
 Phishing EmailInitial AccessPhishingT1566
 InfiltrationInitial AccessPhishingT1566
 CVE-2017-5638 (Struts2)Initial AccessExploit Public-Facing ApplicationT1190
 proftpdInitial AccessExploit Public-Facing ApplicationT1190
 apache-strutsInitial AccessExploit Public-Facing ApplicationT1190
 Spam EmailInitial AccessPhishing: Spearphishing via EmailT1566.001
 SpamInitial AccessPhishingT1566
 PhishingInitial AccessPhishingT1566
 Fake PagesInitial AccessPhishingT1566
 Phishing Site DeploymentInitial AccessPhishing: Spearphishing LinkT1566.002
 AI-generated phishing URLsInitial AccessPhishing: Spearphishing LinkT1566.002
 Payload via EmailInitial AccessPhishing: Spearphishing via EmailT1566.001
 Telnet exploitInitial AccessExploit Public-Facing ApplicationT1190
 Generic (Generic Exploits)Initial AccessExploit Public-Facing ApplicationT1190
 Code Red WormInitial AccessExploit Public-Facing ApplicationT1190
 Parameter TamperingInitial AccessExploit Public-Facing ApplicationT1190
 Path TraversalInitial AccessExploit Public-Facing ApplicationT1190
 Opportunistic Service Attack (OSA)Initial AccessExploit Public-Facing ApplicationT1190
 Replay AttacksInitial AccessValid AccountsT1078
 SpearphishingInitial AccessPhishing: Spearphishing AttachmentT1566.001
 S7 unauthorized accessInitial AccessExploit Public-Facing ApplicationT1190
 Unauthorized access to HMI or SCADAInitial AccessValid AccountsT1078
Execution
 Web Attack–XSSExecutionCommand-Line Interface: JavaScriptT1059.007
 Server-Side IncludeExecutionServer Software ComponentT1505.003
 loadmoduleExecutionProcess Injection: Dynamic-Link Library InjectionT1055.001
 Web Attack-Command InjectionExecutionCommand-Line InterfaceT1059
 perlExecutionCommand-Line InterfaceT1059
 xtermExecutionCommand-Line InterfaceT1059
 ShellcodeExecutionExploitation for Client ExecutionT1203
 OS Command ExecutionExecutionCommand and Scripting InterpreterT1059
 Downloaders/DroppersExecutionIngress Tool TransferT1105
 S7 command injectionExecutionManipulation of Control/Command MessageT0851
 Command injection to PLC or RTUExecutionCommand and Scripting InterpreterT1059
 warezmasterExecutionCommand-Line InterfaceT1059
 JavaMeterpreterExecutionCommand-Line Interface: JavaScriptT1059.007
 Windows-RCEExecutionExploitation for Client ExecutionT1203
 TrojanExecutionUser ExecutionT1204
 Installing unauthorized softwareExecutionUser Execution: Malicious FileT1204.002
 Unauthorized command via MODBUSExecutionCommand and Scripting InterpreterT1059
 VirusesExecutionUser ExecutionT1204
 Fileless MalwareExecutionCommand and Scripting Interpreter: PowerShellT1059.001
 Allaple.A/LExecutionUser ExecutionT1204
 Agent.FYIExecutionCommand and Scripting InterpreterT1059
 Dialplatform.BExecutionUser ExecutionT1204
 InstantaccessExecutionUser ExecutionT1204
 VB.ATExecutionCommand and Scripting InterpreterT1059
 Yuner.AExecutionCommand and Scripting InterpreterT1059
 GatakExecutionCommand and Scripting InterpreterT1059
 LollipopExecutionCommand and Scripting InterpreterT1059
 TracurExecutionUser ExecutionT1204
 SimdaExecutionCommand and Scripting InterpreterT1059
 HtbotExecutionCommand and Scripting InterpreterT1059
 MiurefExecutionCommand and Scripting InterpreterT1059
 NerisExecutionCommand and Scripting InterpreterT1059
 KenjiroExecutionCommand and Scripting InterpreterT1059
 Hide and SeekExecutionCommand and Scripting InterpreterT1059
 MiraiExecutionCommand and Scripting InterpreterT1059
Persistence
 Wintrim.BXPersistenceBoot or Logon Autostart Execution: Registry Run Keys/Startup FolderT1547.001
 Autorun.KPersistenceBoot or Logon Autostart ExecutionT1547
 Murlo (TCP-based backdoor)PersistenceCreate or Modify System ProcessT1569
 Browser HijackingPersistenceSoftware Extensions: Browser ExtensionsT1176.001
 Uploading AttackPersistenceServer Software Component: Web ShellT1505.003
 Modification of control logicPersistenceEvent Triggered ExecutionT1546
 Malware with Persistence via RegistryPersistenceBoot or Logon Autostart Execution: Registry Run Keys/Startup FolderT1547.001
 Alueron.gen!JPersistenceBoot or Logon Autostart ExecutionT1547
 Dontovo.APersistenceBoot or Logon Autostart ExecutionT1547
 Lolyda.AA1/AA2/AA3/ATPersistenceBoot or Logon Autostart ExecutionT1547
 Malex.gen!JPersistenceBoot or Logon Autostart ExecutionT1547
 Skintrim.NPersistenceBoot or Logon Autostart ExecutionT1547
 Kelihos_ver3PersistenceBoot or Logon Autostart ExecutionT1547
 Kelihos_ver1PersistenceBoot or Logon Autostart ExecutionT1547
 VundoPersistenceBoot or Logon Autostart ExecutionT1547
 ShifuPersistenceBoot or Logon Autostart ExecutionT1547
 ToriiPersistenceBoot or Logon Autostart ExecutionT1547
Privilege Escalation
 xlockPrivilege EscalationExploitation for Privilege EscalationT1068
 Buffer OverflowPrivilege EscalationExploitation for Privilege EscalationT1068
 Adduser (Unauthorized)Privilege EscalationCreate AccountT1136
Defense Evasion
 CRLF InjectionDefense EvasionObfuscated Files or InformationT1027
 RootkitDefense EvasionRootkitT1014
 Logging in outside working hoursDefense EvasionValid AccountsT1078
 Obfuscator.ADDefense EvasionObfuscated Files or InformationT1027
 Obfuscator.ACYDefense EvasionObfuscated Files or InformationT1027
 URL ObfuscationDefense EvasionObfuscated Files or InformationT1027
 Avoiding spam detectionDefense EvasionEmail Obfuscation/Content SpoofingT1566/T1027
 Log deletion or obfuscationDefense EvasionIndicator Removal on HostT1070.001
 Decreased Rank AttackDefense EvasionImpair DefensesT1562
 hash-based malwareDefense EvasionObfuscated Files or InformationT1027
 SpoofingDefense EvasionModify Authentication ProcessT1556
 Disabling alarms or event logsDefense EvasionIndicator RemovalT1070
 AI-generated malwareDefense EvasionObfuscated Files or InformationT1027
 Obfuscated MalwareDefense EvasionObfuscated Files or InformationT1027
 Polymorphic malwareDefense EvasionObfuscated Files or InformationT1027
 Metamorphic malwareDefense EvasionObfuscated Files or InformationT1027
 Packed malwareDefense EvasionObfuscated Files or Information: Software PackingT1027.002
 FakereanDefense EvasionObfuscated Files or InformationT1027
 Swizzor.gen!E/IDefense EvasionObfuscated Files or InformationT1027
 Nsis-ayDefense EvasionObfuscated Files or InformationT1027
Credential Access
 imapCredential AccessExploitation of Remote ServicesT1210
 Infiltration-mitmCredential AccessAdversary-in-the-MiddleT1557
 MITMCredential AccessAdversary-in-the-MiddleT1557
 Telnet remote access attemptsCredential AccessBrute ForceT1110
 xsnoopCredential AccessInput CaptureT1056
 spyCredential AccessInput CaptureT1056
 KeyloggerCredential AccessKeyloggingT1056.001
 Web Brute ForceCredential AccessBrute ForceT1110
 guess_passwdCredential AccessBrute ForceT1110
 FTP-PatatorCredential AccessBrute ForceT1110
 SSH-PatatorCredential AccessBrute ForceT1110
 SSH Brute ForceCredential AccessBrute ForceT1110
 SSH AttackCredential AccessBrute ForceT1110
 RDP Brute ForceCredential AccessBrute ForceT1110.003
 Password CrackingCredential AccessBrute Force: Password SprayingT1110.004
 RTSP Brute ForceCredential AccessBrute ForceT1110
 Credential HarvestingCredential AccessInput Capture/PhishingT1056/T1566
 SFTP attackCredential AccessBrute ForceT1110
 Brute ForceCredential AccessBrute ForceT1110
 HeartbleedCredential AccessExploitation for Credential AccessT1212
 Hydra-FTP (FTP brute-force attacks)Credential AccessBrute Force: Password GuessingT1110.001
 Hydra-SSH (SSH brute-force attacks)Credential AccessBrute Force: Password GuessingT1110.001
 Bashlite–bruteCredential AccessBrute Force: Password GuessingT1110.001
 DNS SpoofingCredential AccessAdversary-in-the-MiddleT1557
 Dictionary BruteForceCredential AccessBrute Force: Password GuessingT1110.001
 Wormhole Attack (WHA)Credential AccessAdversary-in-the-MiddleT1557
 Hijacking or spoofing PLC communicationsCredential AccessAdversary-in-the-MiddleT1557
 Evil Twin AttacksCredential AccessAdversary-in-the-MiddleT1557
 RamnitCredential AccessCredentials from Password StoresT1555
 CridexCredential AccessCredentials from Password StoresT1555
 ZeusCredential AccessCredentials from Password StoresT1555
 TinbaCredential AccessCredentials from Password StoresT1555
Discovery
 Fuzzers AttackDiscoveryNetwork Service ScanningT1046
 snmpguessDiscoveryNetwork Service ScanningT1046
 psDiscoveryProcess DiscoveryT1057
 snmpgetattackDiscoveryNetwork Service ScanningT1046
 AnalysisDiscoveryNetwork SniffingT1040
 Sniffing AttacksDiscoveryNetwork SniffingT1040
 ARP SpoofingDiscoveryNetwork SniffingT1040
 Host DiscoveryDiscoveryRemote System DiscoveryT1018
 Sinkhole Attack (SHA)DiscoveryNetwork SniffingT1040
Lateral Movement
 smb-exploitLateral MovementExploitation of Remote ServicesT1210
 WormsLateral MovementRemote ServicesT1021
 W32.Blaster WormLateral MovementExploitation of Remote ServicesT1210
 Reaper WormLateral MovementExploitation of Remote ServicesT1210
 Virut (Malware propagation)Lateral MovementReplication Through Removable MediaT1091
 ConfickerLateral MovementExploitation of Remote ServicesT1210
 HakaiLateral MovementRemote ServicesT1021
 MuhstikLateral MovementRemote ServicesT1021
Collection
 File DisclosureCollectionData from Local SystemT1005
 Searching for sensitive documentsCollectionData from Local SystemT1005
 SpywareCollectionInput CaptureT1056
 ARP MitMCollectionAdversary-in-the-MiddleT1557
 Active WiretapCollectionAdversary-in-the-MiddleT1557
 Accessing sensitive files repeatedlyCollectionData from Local SystemT1005
 Printing sensitive informationCollectionData from Local SystemT1005
 InfostealersCollectionInput CaptureT1056
 Adialer.CCollectionInput CaptureT1056
 Alueron.gen!JCollectionIngress Tool TransferT1105
Command and Control
 BackdoorCommand and ControlRemote Access SoftwareT1219
 httptunnelCommand and ControlApplication Layer Protocol: Web ProtocolsT1071.001
 multihopCommand and ControlProxy: Multi-hop ProxyT1090.003
 warezclientCommand and ControlApplication Layer ProtocolT1071
 Botnet ARESCommand and ControlApplication Layer Protocol: Web ProtocolsT1071.001
 Spam botnetCommand and ControlApplication Layer ProtocolT1071
 Meterpreter (Metasploit post-exploitation activity)Command and ControlApplication Layer ProtocolT1071
 IRC BotnetCommand and ControlApplication Layer ProtocolT1071
 HTTP BotnetCommand and ControlApplication Layer ProtocolT1071
 DNS TunnelingCommand and ControlApplication Layer Protocol: DNST1071.004
 BotnetCommand and ControlApplication Layer ProtocolT1071
 NerisCommand and ControlApplication Layer ProtocolT1071
 RbotCommand and ControlApplication Layer ProtocolT1071
 Menti (P2P botnet)Command and ControlNon-Application Layer ProtocolT1095
 Sogou (HTTP-based C2)Command and ControlApplication Layer Protocol: Web ProtocolsT1071.001
 NSIS.ay (Downloader trojan)Command and ControlIngress Tool TransferT1105
 Remote Access Trojan (RAT)Command and ControlRemote Access ToolsT1219
 Use of anonymizing toolsCommand and ControlProxy: Multi-hop ProxyT1090.003
 Command & Control using HTTPCommand and ControlApplication Layer Protocol: Web ProtocolsT1071.001
 C2LOP.P/gen!gCommand and ControlNon-Application Layer ProtocolT1095
 Rbot!genCommand and ControlIngress Tool TransferT1105
 GeodoCommand and ControlIngress Tool TransferT1105
 GafgytCommand and ControlIngress Tool TransferT1105
 Linux.HajimeCommand and ControlIngress Tool TransferT1105
Exfiltration
 Sending confidential info to competitorsExfiltrationExfiltration Over Web ServiceT1041
 Data ExfiltrationExfiltrationExfiltration Over Command and Control ChannelT1041
 Data Exfiltration ToolsExfiltrationExfiltration ToolsT1567
 Uploading to personal email/cloudExfiltrationExfiltration Over Web ServiceT1567.002
 Use of removable mediaExfiltrationExfiltration Over Physical Medium: Exfiltration over USBT1052.001
Impact
 mailbombImpactEmail BombingT1667
 smurf (ICMP amplification)ImpactNetwork Denial of Service: Reflection AmplificationT1498.002
 neptuneImpactNetwork Denial of Service: Direct Network FloodT1498.001
 backImpactService StopT1489
 teardropImpactEndpoint Denial of Service: Application or System ExploitationT1499.004
 pod (Ping of Death)ImpactNetwork Denial of Service: Direct Network FloodT1498.001
 land DoSImpactEndpoint Denial of Service: OS Exhaustion FloodT1499.001
 apache2ImpactEndpoint Denial of Service: Service Exhaustion FloodT1499.002
 processtableImpactEndpoint Denial of Service: OS Exhaustion FloodT1499.001
 udpstormImpactNetwork Denial of Service: Direct Network FloodT1498.001
 Packts fragmentation attackImpactNetwork Denial of Service: Direct Network FloodT1498.001
 UDP FragmentationImpactNetwork Denial of Service: Direct Network FloodT1498.001
 ACK FragmentationImpactNetwork Denial of Service: Direct Network FloodT1498.001
 RSTFIN FloodImpactNetwork Denial of Service: Direct Network FloodT1498.001
 PSHACK FloodImpactNetwork Denial of Service: Direct Network FloodT1498.001
 ICMP FragmentationImpactNetwork Denial of Service: Direct Network FloodT1498.001
 SynonymousIP FloodImpactNetwork Denial of Service: Direct Network FloodT1498.001
 SYN FloodImpactNetwork Denial of Service: Direct Network FloodT1498.001
 UDP FloodImpactNetwork Denial of Service: Direct Network FloodT1498.001
 ICMP FloodImpactNetwork Denial of Service: Direct Network FloodT1498.001
 HTTP FloodImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 Apache Range HeaderImpactEndpoint Denial of Service: Application or System ExploitationT1499.004
 Slow POSTImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 HTTP/2 Rapid ResetImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 GraphQL OverloadImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 API Parameter AbuseImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 WS AmplificationImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 DoS GoldenEyeImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 DoS HulkImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 DoS SlowhttptestImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 DoS SlowlorisImpactEndpoint Denial of Service: Application Exhaustion FloodT1499.003
 DoSImpactNetwork Denial of ServiceT1498
DDoS | Impact | Network Denial of Service | T1498
DDoSsim | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003
DDoS LOIC-UDP | Impact | Network Denial of Service: Direct Network Flood | T1498.001
DDoS LOIC-HTTP | Impact | Network Denial of Service: Direct Network Flood | T1498.001
DDoS HOIC | Impact | Network Denial of Service: Direct Network Flood | T1498.001
DDoS Bot | Impact | Network Denial of Service | T1498
DDoS Stomp | Impact | Network Denial of Service: Direct Network Flood | T1498.001
DDoS DYN | Impact | Network Denial of Service: Direct Network Flood | T1498.001
DDoS TCP | Impact | Network Denial of Service: Direct Network Flood | T1498.001
DNS (DNS amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
LDAP (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
MSSQL (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
NetBIOS (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
NTP (NTP amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
Portmap (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
SNMP (SNMP amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
CLDAP Reflection | Impact | Network Denial of Service: Reflection Amplification | T1498.002
SSDP (SSDP amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
UDP (Generic UDP flood) | Impact | Network Denial of Service: Direct Network Flood | T1498.001
UDPLag (UDP with response lag) | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003
TFTP (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002
WebDDoS (HTTP flood) | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003
Memcached | Impact | Network Denial of Service: Reflection Amplification | T1498.002
Mirai–TCP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001
Mirai–UDP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001
Mirai–HTTP Flood | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003
Mirai–GREIP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001
Mirai–Greeth Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001
Mirai–UDPPlain | Impact | Network Denial of Service: Direct Network Flood | T1498.001
Bashlite–TCP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001
Bashlite–UDP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001
Bashlite–HTTP Flood | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003
Bashlite–ACK/other floods | Impact | Network Denial of Service: Direct Network Flood | T1498.001
SSL Renegotiation DoS | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003
Flooding Attack | Impact | Network Denial of Service | T1498
DODAG Version Number Attack | Impact | Endpoint Denial of Service (ICS-specific) | T1499.004
Ransomware | Impact | Data Encrypted for Impact | T1486
Blackhole Attack | Impact | Network Denial of Service | T1498
Video Injection | Impact | Defacement | T1491
Gear Spoofing Attack | Impact | Data Manipulation | T1565
RPM Spoofing Attack | Impact | Data Manipulation | T1565
False Data Injection Attack | Impact | Data Manipulation | T1565
Tolerable FDIA | Impact | Data Manipulation | T1565
Posting fake reviews | Impact | Data Manipulation | T1565
Opinion Spam (Fake Review Attack) | Impact | Data Manipulation | T1565
Disinformation/fake content | Impact | Data Manipulation | T1565
Coordinated campaign (review farms) | Impact | Data Manipulation | T1565
Targeting businesses’ reputation | Impact | Data Manipulation | T1565
Review flooding | Impact | Data Manipulation | T1565
Malicious actuator control | Impact | Endpoint Denial of Service | T1499
PTP Attack (time sync manipulation) | Impact | Data Manipulation | T1565
De-authentication DoS | Impact | Endpoint Denial of Service | T1499
Fake Landing (tricking the UAV into landing) | Impact | Data Manipulation | T1565
Adware | Impact | Resource Hijacking | T1496
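
Because every row pairs a technique name with an ATT&CK ID, the two can be cross-checked mechanically. The minimal Python sketch below assumes rows are stored as pipe-delimited text (`label | tactic | technique | ID`); the lookup dictionary covers only the Impact-tactic technique IDs that occur in the mapping table above, and `ATTACK_NAMES`, `parse_row`, `check_rows`, and `count_by_parent` are hypothetical helpers, not part of any published tooling. The sample rows are illustrative; the last one deliberately pairs a Direct Network Flood name with an Endpoint Denial of Service ID.

```python
from collections import Counter

# ATT&CK Enterprise names for the Impact-tactic technique IDs used in the
# mapping table above (sub-techniques written as "Parent: Sub-technique").
ATTACK_NAMES = {
    "T1486": "Data Encrypted for Impact",
    "T1491": "Defacement",
    "T1496": "Resource Hijacking",
    "T1498": "Network Denial of Service",
    "T1498.001": "Network Denial of Service: Direct Network Flood",
    "T1498.002": "Network Denial of Service: Reflection Amplification",
    "T1499": "Endpoint Denial of Service",
    "T1499.003": "Endpoint Denial of Service: Application Exhaustion Flood",
    "T1499.004": "Endpoint Denial of Service: Application or System Exploitation",
    "T1565": "Data Manipulation",
}


def parse_row(line):
    """Split a 'label | tactic | technique | ID' row into four stripped fields."""
    label, tactic, technique, tid = (field.strip() for field in line.split("|"))
    return label, tactic, technique, tid


def check_rows(lines):
    """Return (label, ID, technique) for rows whose name disagrees with the ID."""
    mismatches = []
    for line in lines:
        label, _tactic, technique, tid = parse_row(line)
        expected = ATTACK_NAMES.get(tid)
        # Drop qualifiers such as "(ICS-specific)"; accept a parent-technique
        # name for a sub-technique ID, since some rows cite only the parent.
        name = technique.split(" (")[0]
        if expected is None or not expected.startswith(name):
            mismatches.append((label, tid, technique))
    return mismatches


def count_by_parent(lines):
    """Count rows per parent technique (T1498.002 is counted under T1498)."""
    return Counter(parse_row(line)[3].split(".")[0] for line in lines)


# Hypothetical sample rows; the last one is deliberately inconsistent.
SAMPLE_ROWS = [
    "DDoS | Impact | Network Denial of Service | T1498",
    "NTP (NTP amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002",
    "Ransomware | Impact | Data Encrypted for Impact | T1486",
    "Bashlite–ACK/other floods | Impact | Network Denial of Service: Direct Network Flood | T1499.003",
]

if __name__ == "__main__":
    print(check_rows(SAMPLE_ROWS))
    print(count_by_parent(SAMPLE_ROWS))
```

Only the deliberately inconsistent row is reported. The prefix comparison is a design choice: it tolerates rows that cite only a parent technique name (for example, `Endpoint Denial of Service (ICS-specific)` with T1499.004) while still catching names that belong to a different technique.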

Appendix B. Reviewed Datasets Information and Refined Taxonomy

Table A2. Properties of the datasets used in the SLR.
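Dataset | Category | Year | Normal | Attack | Metadata | Format | Count | Duration | Kind | Network | Complete | Splits | Balanced | Labeled | Classes
KDD Cup 99 [55] | NIDD | 1999 | yes | yes | no | other | 5 M | - | emulated | small | yes | yes | no | yes | 4
NSL-KDD [20] | NIDD | 2009 | yes | yes | no | other | 148,517 | - | emulated | small | yes | yes | no | yes | 4
UNSW-NB15 [21] | NIDD | 2015 | yes | yes | yes | packet, other | 2,540,044 | 31 h | emulated | small | yes | yes | no | yes | 9
CSE-CIC-IDS2017 [22] | NIDD | 2017 | yes | yes | yes | packet, bi.flow | 3,100,000 | 5 days | emulated | small | yes | no | no | yes | 9
CSE-CIC-IDS2018 [56] | NIDD | 2018 | yes | yes | yes | packet, bi.flow | 16,000,000 | 10 days | emulated | small | yes | no | no | yes | 15
CIC-DDoS2019 [57] | NIDD | 2019 | yes | yes | yes | packet, bi.flow | 50,000,000 | 5 days | emulated | small | yes | no | no | yes | 11
LITNET-2020 [58] | NIDD | 2020 | yes | yes | yes | flow, packet | 50,000,000 | months | real | large ISP | yes | no | no | yes | 2
NetML-2020 [59] | NIDD | 2020 | yes | yes | yes | flow | 3,000,000 | days | emulated | small | yes | no | yes | yes | 10
5G-NIDD [60] | NIDD | 2021 | yes | yes | yes | flow | 15,000 | hours | emulated | 5G wireless | yes | no | yes | yes | 2
FLNET2023 [61] | NIDD | 2023 | yes | yes | yes | flow | 6,000,000 | 24 h | real + emulated | various | yes | no | no | yes | 11
NGIDS-DS [62] | NIDD | 2022 | yes | yes | yes | flow | 20,000,000 | days | emulated | small | yes | no | no | yes | 9
CAIDA 2007 [63] | NIDD | 2007 | no | yes | yes | packet | huge | minutes | real | various | partial | no | no | no | -
BoNeSi Dataset [64] | NIDD | 2010 | no | yes | no | packet | 100,000 | minutes | emulated | lab | no | no | no | no | -
DDoSDB [65] | NIDD | since 2020 | varies | yes | yes | packet | - | - | real + synthetic | various | yes | no | no | no | -
App Layer DoS [66] | NIDD | 2017 | yes | yes | yes | flow | 1,000,000 | 8 h | emulated | small | yes | no | no | yes | 4
CSIC HTTP 2010 [67] | NIDD | 2010 | no | yes | yes | HTTP logs | 67,000 | sessions | emulated | application-level | yes | yes | yes | yes | 2
ECML/PKDD 2007 [68] | NIDD | 2007 | no | yes | yes | session | 600,000 | weeks | real | telecom | partial | yes | yes | yes | 2
NPS-2009-Casper-Rw [69] | NIDD | 2009 | no | yes | yes | packet, flow | 1,000,000 | hours | emulated | small | no | no | no | no | -
NCC Dataset [70] | NIDD | 2022 | yes | yes | yes | flow | 5,000,000 | hours | real + emulated | huge | yes | no | no | yes | 14
NCC-2 Dataset [71] | NIDD | 2023 | yes | yes | yes | flow | 10,000,000 | hours | real + emulated | huge | yes | yes | no | yes | 18
InSDN Dataset [72] | NIDD | 2022 | yes | yes | yes | flow | 4,000,000 | hours | emulated | SDN | yes | no | yes | yes | 10
Benign and Malicious [73] | NIDD | 2021 | yes | yes | yes | other | 90,000 | - | real | various | no | - | yes | yes | 2
CTU-13 Dataset [74] | NIDD | 2011 | yes | yes | yes | packet | 13 scenarios | days | real + emulated | botnet traffic | yes | no | no | yes | 13
USTC-TFC2016 [75] | NIDD | 2017 | yes | yes | yes | packet | 750,000 | hours | real | malware dataset | yes | no | no | yes | 10
N-BaIoT [76] | IoT-NIDD | 2018 | yes | yes | yes | flow | 100,000 | days | emulated | IoT | yes | no | no | yes | 2
BoT-IoT [9] | IoT-NIDD | 2018 | yes | yes | yes | flow | 70,000,000 | days | emulated | IoT | yes | yes | no | yes | 5
IoTPOT [77] | IoT-NIDD | 2015 | no | yes | yes | packet | - | weeks | real | honeypot | partial | no | no | no | -
ToN-IoT [23] | IoT-NIDD | 2020 | yes | yes | yes | flow, syslogs | 25,000,000 | days | emulated + real | IoT | yes | yes | yes | yes | 9
IoT-23 [78] | IoT-NIDD | 2020 | yes | yes | yes | flow | 20,000,000 | days | emulated + real | IoT | yes | no | no | yes | 10
EdgeIIoT 2023 [79] | IoT-NIDD | 2023 | yes | yes | yes | flow | 2,000,000 | days | emulated | edge IoT | yes | no | yes | yes | 15
CIC-IoT2022 [80] | IoT-NIDD | 2022 | yes | yes | yes | flow, packet | 4,000,000 | days | emulated | small IoT lab | yes | no | no | yes | 6
CICIoT-2023 [81] | IoT-NIDD | 2023 | yes | yes | yes | flow, packet | 10,000,000 | days | emulated | IoT/5G hybrid | yes | no | no | yes | 8
UNSW IoT Traffic [82] | IoT-NIDD | 2019 | yes | yes | yes | flow | 1,000,000 | hours | emulated | IoT | yes | no | yes | yes | 10
Distributed IoT [83] | IoT-NIDD | 2021 | yes | yes | yes | flow | 3,000,000 | hours | emulated | IoT | yes | no | yes | yes | 10
ROUT-4-2023 [84] | IoT-NIDD | 2023 | yes | yes | yes | flow | 2,000,000 | hours | emulated | hybrid SDN & IoT | yes | no | no | yes | 9
Kitsune Dataset [85] | IoT-NIDD | 2018 | yes | yes | yes | packet, flow | 100,000,000 | days | emulated | IoT & smart home | yes | no | no | yes | 21
Wi-Fi Dataset [86] | IoT-NIDD | 2022 | yes | yes | yes | flow | 1,000,000 | hours | emulated | Wi-Fi lab | yes | no | yes | yes | 7
HCRL CAN [87] | IoT-NIDD | 2020 | yes | yes | yes | other, packet | 4,500,000 | hours | real + synthetic | vehicle CAN bus | yes | no | no | yes | 5
HCRL Car Hacking [88] | IoT-NIDD | 2020 | yes | yes | yes | other | 4,300,000 | 40 min | real | vehicle CAN bus | yes | no | no | yes | 5
Malimg [89] | Malware | 2011 | no | yes | yes | images | 9,339 | - | static | malware dataset | yes | no | no | yes | 25
BIG 2015 [90] | Malware | 2015 | no | yes | yes | binaries | 10 GB files | - | static | malware dataset | yes | yes | no | yes | 9
MaleVis [91] | Malware | 2020 | no | yes | yes | images | 14,226 | - | static | malware dataset | yes | no | yes | yes | 26
Malicia [92] | Malware | 2013 | no | yes | yes | binaries | 11,668 | - | static | malware dataset | yes | no | no | yes | 2
Drebin project [93] | Malware | 2014 | no | yes | yes | APK files | 129,013 | - | static | mobile malware | yes | yes | yes | yes | 2
VX-Heavens [94] | Malware | since 2010 | no | yes | limited | binaries | 30,000 | - | static | malware dataset | no | no | no | no | -
VirusShare [95] | Malware | since 2010 | no | yes | limited | binaries | - | - | static | malware dataset | no | no | no | no | -
VirusTotal [96] | Malware | since 2004 | no | yes | yes | binaries | - | - | static + dynamic | malware dataset | no | no | no | yes | 1
CIC-MalMem-2022 [97] | Malware | 2022 | no | yes | yes | memory | 100,000 | hours | dynamic | malware memory | yes | yes | yes | yes | 6
MemMal-D2024 [98] | Malware | 2024 | no | yes | yes | memory | 100,000 | hours | dynamic | malware memory | yes | yes | yes | yes | 2
CIC-CMD-2024 [99] | Malware | 2024 | yes | yes | yes | flow | 10,000,000 | days | real + emulated | malware dataset | yes | yes | no | yes | -
SpamEmail [100] | S&P | 1999 | yes | yes | yes | other | 4,601 | - | static | spam emails | - | no | no | yes | 2
SpamClassification [101] | S&P | 2021 | yes | yes | yes | other | 5,796 | - | emulated | spam messages | - | yes | no | yes | 2
Email Spam [102] | S&P | 2020 | yes | yes | yes | other | 5,172 | - | emulated | spam emails | - | yes | yes | yes | 2
SpamAssassin [103] | S&P | since 2021 | yes | yes | partial | other | 6,047 | 1 year | real | spam emails | - | yes | no | yes | 2
Benign Email [104] | S&P | 2013 | yes | no | yes | other | 14,043 | - | real | benign emails | - | no | - | yes | 1
Phishing Email [105] | S&P | 2020 | yes | yes | yes | other | - | - | emulated | phishing emails | - | yes | no | yes | 2
Bot Account [106] | S&P | 2023 | yes | yes | yes | other | 8,574 | - | real | social media | no | no | yes | yes | 2
STIX & Curated [107] | S&P | 2015 | no | yes | yes | other | - | - | emulated | Threat Indicators | no | no | - | yes | -
Alexa Phishing [108] | S&P | since 2020 | yes | yes | yes | other | 1M+ | - | real | Phishing URLs | no | no | no | yes | 2
PhishTank [109] | S&P | - | no | yes | yes | other | 100K+ | - | real | Phishing URLs | yes | no | no | yes | 2
OpenPhish [110] | S&P | - | no | yes | yes | other | - | - | real | Phishing URLs | no | no | no | yes | 2
Anti-Phishing WG [111] | S&P | - | no | yes | yes | other | - | months | real | Phishing incidents | no | no | - | yes | 2
YelpChi [112] | S&P | 2013 | yes | yes | yes | other | 45,000+ | years | real | reviews | yes | yes | yes | yes | 2
YelpNYC [113] | S&P | 2015 | yes | yes | yes | other | 160,000+ | years | real | reviews | yes | yes | yes | yes | 2
YelpZip [113] | S&P | 2015 | yes | yes | yes | other | 60,000+ | years | real | reviews | yes | yes | yes | yes | 2
Gas Pipeline [54,114] | ICS | 2011 | yes | yes | yes | flow, packet | 100,000 | hours | emulated | ICS network | yes | no | no | yes | 2
SWaT dataset [115] | ICS | 2015 | yes | yes | yes | other | 946,722 | 11 days | real | ICS network | yes | yes | no | yes | 2
Necon-IIUM ICS Dataset [116] | ICS | 2022 | yes | yes | yes | other | 1,500,000 | 7 days | emulated | ICS network | yes | no | no | yes | 5
ERENO IEC-61850 [117] | ICS | 2020 | yes | yes | yes | packet, flow | - | 2 h | real | ICS network | yes | no | no | yes | 5
IEEE 118-bus dataset [118] | ICS | 2001 | yes | yes | yes | other | 118 buses | - | synthetic | ICS network | yes | no | - | yes | 3
IEEE 123-bus dataset [118] | ICS | 1991 | yes | no | yes | other | 123 buses | - | synthetic | ICS network | yes | no | no | yes | 3
IEEE 13-bus dataset [118] | ICS | 1992 | yes | no | yes | other | 13 buses | - | synthetic | ICS network | yes | no | no | yes | 2
IEEE 14-bus dataset [118] | ICS | 2018 | yes | yes | yes | other | 14 buses | 24 h | synthetic | ICS network | yes | no | - | yes | 2
CERT Insider Threat [119] | Insider Threat | 2016 | yes | yes | yes | user logs | 1,000,000 | months | emulated | enterprise | yes | no | no | yes | 2
Udacity dataset [120] | Other | 2016 | yes | no | yes | images | - | - | real | simulation | yes | no | no | yes | -
GTSRB [121] | Other | 2011 | - | - | yes | images | 51,839 | - | real | - | - | yes | no | yes | 43
UAVid dataset [122] | Other | 2020 | - | - | yes | images | 3,000 | - | real | UAV/Aerial | no | yes | no | yes | 8
ConsumerComplaint [123] | Other | 2018 | - | - | yes | other | 1,200,000 | 8 years | real | - | - | no | no | yes | 10
SpeechCommands [124] | Other | 2017 | - | - | yes | wav | 105,829 | - | real | voice command | - | yes | - | yes | 35
IMDB [125] | Other | 2011 | - | - | yes | other | 50,000 | - | real | - | - | yes | yes | yes | 2
CIFAR-10 [126] | Other | 2009 | - | - | yes | images | 60,000 | - | real | - | - | yes | yes | yes | 10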
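
Table A2 can also be read as a small queryable data structure, for example to shortlist labeled, balanced datasets with predefined splits when planning the multi-dataset evaluations that the review finds under-supported. The Python sketch below is a minimal illustration under stated assumptions: the `DatasetRow` type and the helpers `benchmark_ready` and `category_counts` are hypothetical, only five of the sixteen columns are kept, and the seven rows are a subset copied from the table, so any counts it produces describe that subset rather than the full taxonomy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRow:
    """A few Table A2 columns; cells keep the table's "yes"/"no"/"-" strings."""
    name: str
    category: str
    splits: str
    balanced: str
    labeled: str


# Illustrative subset transcribed from Table A2 (not the full table).
ROWS = [
    DatasetRow("NSL-KDD", "NIDD", "yes", "no", "yes"),
    DatasetRow("UNSW-NB15", "NIDD", "yes", "no", "yes"),
    DatasetRow("CSE-CIC-IDS2017", "NIDD", "no", "no", "yes"),
    DatasetRow("ToN-IoT", "IoT-NIDD", "yes", "yes", "yes"),
    DatasetRow("CIC-MalMem-2022", "Malware", "yes", "yes", "yes"),
    DatasetRow("SWaT dataset", "ICS", "yes", "no", "yes"),
    DatasetRow("CERT Insider Threat", "Insider Threat", "no", "no", "yes"),
]


def benchmark_ready(rows):
    """Names of datasets that are labeled, balanced, and ship predefined splits."""
    return [r.name for r in rows if r.labeled == r.balanced == r.splits == "yes"]


def category_counts(rows):
    """Number of rows per taxonomy category."""
    return Counter(r.category for r in rows)


if __name__ == "__main__":
    print(benchmark_ready(ROWS))
    print(category_counts(ROWS).most_common())
```

On this subset, only ToN-IoT and CIC-MalMem-2022 satisfy all three conditions. Cells are kept as the table's literal strings rather than booleans so that "-" (not reported) entries are not silently treated as "no".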

References

  1. Sommer, R.; Paxson, V. Outside the closed world: On using machine learning for network intrusion detection. In Proceedings of the 2010 IEEE Symposium on Security and Privacy (SP); IEEE: Los Alamitos, CA, USA, 2010; pp. 305–316.
  2. Li, Y.; Liu, Q. A comprehensive review study of cyber-attacks and cyber security; Emerging trends and recent developments. Energy Rep. 2021, 7, 8176–8186.
  3. Hizal, S.; Cavusoglu, U.; Akgun, D. A novel deep learning-based intrusion detection system for IoT DDoS security. Internet Things 2024, 28, 101336.
  4. Jada, I.; Mayayise, T.O. The impact of artificial intelligence on organisational cyber security: An outcome of a systematic literature review. Data Inf. Manag. 2024, 8, 100063.
  5. Baron Garcia, A. Machine Learning and Artificial Intelligence Methods for Cybersecurity Data Within the Aviation Ecosystem. Ph.D. Thesis, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA, 2022.
  6. Buczak, A.L.; Guven, E. A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection. IEEE Commun. Surv. Tutor. 2015, 17, 2501–2528.
  7. Kaur, R.; Gabrijelčič, D.; Klobučar, T. Artificial intelligence for cybersecurity: Literature review and future research directions. Inf. Fusion 2023, 97, 101804.
  8. Mvula, P.K.; Branco, P.; Jourdan, G.V.; Viktor, H.L. A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning. Discov. Data 2023, 1, 4.
  9. Koroniotis, N.; Moustafa, N.; Turnbull, B.; Choo, K.K.R. Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: The Bot-IoT Dataset. Future Gener. Comput. Syst. 2019.
  10. Guhan, N.K.; Ramachandran, M.; Ravindran, S.; Vijean, V. A Deep and Systematic Review of the Intrusion Detection Systems based on Machine Learning and Deep Learning Techniques. In Proceedings of the 2024 10th International Conference on Advanced Computing and Communication Systems (ICACCS); IEEE: New York, NY, USA, 2024; p. 1564.
  11. Bhavyashree, Y.R.; Kavyashree, M.K.; Amrutha, K.R. Systematic Review on Frameworks for Intrusion Detection using Machine Learning and Deep Learning Algorithms. In Proceedings of the 2024 Second International Conference on Networks, Multimedia and Information Technology (NMITCON); IEEE: New York, NY, USA, 2024; pp. 1–12.
  12. Ali, T.; Eleyan, A.; Bejaoui, T. Detecting Conventional and Adversarial Attacks Using Deep Learning Techniques: A Systematic Review. In Proceedings of the 2023 International Symposium on Networks, Computers and Communications (ISNCC); IEEE: New York, NY, USA, 2023.
  13. Gamage, S.; Samarabandu, J. Deep learning methods in network intrusion detection: A survey and an objective comparison. J. Netw. Comput. Appl. 2020, 169, 102767.
  14. Tsai, C.F.; Hsu, Y.F.; Lin, C.Y.; Lin, W.Y. Intrusion detection by machine learning: A review. Expert Syst. Appl. 2009, 36, 11994–12000.
  15. Pingala Suthishni, D.N.; Kumar, K.S.S. A Review on Machine Learning based Security Approaches in Intrusion Detection System. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom); IEEE: Piscataway, NJ, USA, 2022; pp. 101–105.
  16. Azmoodeh, A.; Al-Rawi, W.; Al-Dahhan, M.; Ghita, B. Detecting Cyber Attacks in Industrial Control Systems Using Convolutional Neural Networks. arXiv 2018.
  17. Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A survey of network-based intrusion detection data sets. Comput. Secur. 2019, 86, 147–167.
  18. Yang, Z.; Liu, X.; Li, T.; Wu, D.; Wang, J.; Zhao, Y.; Han, H. A systematic literature review of methods and datasets for anomaly-based network intrusion detection. Comput. Secur. 2022, 116, 102675.
  19. Strom, B.; Applebaum, A.; Miller, D.; Nickels, K.; Pennington, A.; Thomas, C. MITRE ATT&CK™: Design and Philosophy; MITRE Corporation: Bedford, MA, USA, 2018.
  20. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set (NSL-KDD). In Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), Ottawa, ON, Canada, 8–10 July 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–6.
  21. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS); IEEE: Piscataway, NJ, USA, 2015; pp. 1–6.
  22. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CSE-CIC-IDS2017 Dataset: Intrusion Detection Evaluation Dataset; Canadian Institute for Cybersecurity (CIC), University of New Brunswick: Fredericton, NB, Canada, 2017.
  23. Gad, A.R.; Nashat, A.A.; Barkat, T.M. Intrusion Detection System Using Machine Learning for Vehicular Ad Hoc Networks Based on ToN-IoT Dataset. IEEE Access 2021, 9, 142206–142217.
  24. Sowmya, T.; Mary Anita, E.A. A Comprehensive Review of AI Based Intrusion Detection System. Meas. Sens. 2023, 28, 100827.
  25. Salem, A.H.; Azzam, S.M.; Emam, O.E.; Abohany, A.A. Advancing Cybersecurity: A Comprehensive Review of AI-Driven Detection Techniques. J. Big Data 2024, 11, 105.
  26. Ofusori, L.; Bokaba, T.; Mhlongo, S. Artificial Intelligence in Cybersecurity: A Comprehensive Review and Future Direction. Appl. Artif. Intell. 2024, 38, 2439609.
  27. Rehman, H.M.R.U.; Liaquat, S.; Gul, M.J.; Jhandir, M.Z.; Gavilanes, D.; Masias Vergara, M.; Ashraf, I. A Systematic Literature Study of Machine Learning Techniques Based Intrusion Detection: Datasets, Models, Challenges, and Future Directions. J. Big Data 2025, 12, 264.
  28. Hozouri, A.; Mirzaei, A.; Effatparvar, M. A Comprehensive Survey on Intrusion Detection Systems with Advances in Machine Learning, Deep Learning and Emerging Cybersecurity Challenges. Discov. Artif. Intell. 2025, 5, 314.
  29. Dobler, M.; Hellwig, M.; Lopes, N.; Oakley, K.; Winterburn, M. Systematic Review and Characterisation of Malicious Industrial Network Traffic Datasets. Int. J. Inf. Secur. 2025, 24, 208.
  30. Alnabhan, M.Q.; Branco, P. Fake News Detection Using Deep Learning: A Systematic Literature Review. IEEE Access 2024, 12, 114435–114459.
  31. Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; Technical Report EBSE-2007-01; Keele University: Staffordshire, UK, 2007.
  32. Zhu, B.; Joseph, A.; Sastry, S. A Taxonomy of Cyber Attacks on SCADA Systems. In Proceedings of the 2011 IEEE International Conferences on Internet of Things, and Cyber, Physical and Social Computing; IEEE: Piscataway, NJ, USA, 2011; pp. 380–385.
  33. Ozkan Okay, M.; Iliev, T.; Akin, E.; Aslan, O.; Kosunalp, S.; Stoyanov, I.; Beloev, I. A Comprehensive Survey: Evaluating the Efficiency of Artificial Intelligence and Machine Learning Techniques on Cyber Security Solutions. IEEE Access 2024, 12, 12229–12255.
  34. Wu, M.; Moon, Y.B. Taxonomy of Cross-Domain Attacks on CyberManufacturing System. Procedia Comput. Sci. 2017, 114, 367–374.
  35. Wu, M.; Moon, Y.B. Taxonomy for secure cybermanufacturing systems. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition Proceedings; ASME: New York, NY, USA, 2018; Volume 2, pp. 1–10.
  36. Pan, Y.; White, J.; Schmidt, D.C.; Elhabashy, A.; Sturm, L.; Camelio, J.; Williams, C. Taxonomies for Reasoning About Cyber-physical Attacks in IoT-based Manufacturing Systems. Int. J. Interact. Multimed. Artif. Intell. 2017, 4, 45–54.
  37. Tuptuk, N.; Hailes, S. Security of smart manufacturing systems. J. Manuf. Syst. 2018, 47, 93–106.
  38. Wu, D.; Ren, A.; Zhang, W.; Fan, F.; Liu, P.; Fu, X.; Terpenny, J. Cybersecurity for digital manufacturing. J. Manuf. Syst. 2018, 48, 3–12.
  39. Yampolskiy, M.; King, W.E.; Gatlin, J.; Belikovetsky, S.; Brown, A.; Skjellum, A.; Elovici, Y. Security of additive manufacturing: Attack taxonomy and survey. Addit. Manuf. 2018, 21, 431–457.
  40. Elhabashy, A.E.; Wells, L.J.; Camelio, J.A.; Woodall, W.H. A cyber-physical attack taxonomy for production systems: A quality control perspective. J. Intell. Manuf. 2019, 30, 2489–2504.
  41. Barnum, S. Common Attack Pattern Enumeration and Classification (CAPEC) Schema Description; Technical Report; MITRE Corporation: McLean, VA, USA, 2008.
  42. Hansman, S.; Hunt, R. A taxonomy of network and computer attacks. Comput. Secur. 2005, 24, 31–43.
  43. Meyers, C.A.; Powers, S.S.; Faissol, D.M. Taxonomies of Cyber Adversaries and Attacks: A Survey of Incidents and Approaches; Technical Report; Lawrence Livermore National Laboratory: Livermore, CA, USA, 2009.
  44. Chapman, I.M.; Leblanc, S.P.; Partington, A. Taxonomy of cyber attacks and simulation of their effects. In Proceedings of the Military Modeling and Simulation Symposium; The Society for Modeling and Simulation International (SCS): San Diego, CA, USA, 2011; pp. 73–80.
  45. Simmons, C.B.; Shiva, S.G.; Bedi, H.; Dasgupta, D. AVOIDIT: A Cyber Attack Taxonomy. In Proceedings of the 9th Annual Symposium on Information Assurance; University at Albany, State University of New York: Albany, NY, USA, 2014; pp. 2–12.
  46. Emmert-Streib, F.; Dehmer, M. Taxonomy of machine learning paradigms: A data-centric perspective. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2022, 12, e1470.
  47. von Rueden, L.; Mayer, S.; Beckh, K.; Georgiev, B.; Giesselbach, S.; Heese, R.; Kirsch, B.; Pfrommer, J.; Pick, A.; Bauckhage, C.; et al. Informed machine learning—A taxonomy and survey of integrating knowledge into learning systems. arXiv 2019, arXiv:1903.12394. Available online: https://arxiv.org/abs/1903.12394 (accessed on 11 May 2026).
  48. Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
  49. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  50. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  51. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
  52. Dietterich, T.G. Ensemble methods in machine learning. In Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15.
  53. Ozlem, M.; Turk, A.; Yavuz, A. A review on cyber security dataset for machine learning algorithms. In Proceedings of the IEEE International Conference on Big Data (Big Data); IEEE: Piscataway, NJ, USA, 2017; pp. 2186–2193.
  54. Mississippi State University. Gas Pipeline Intrusion Dataset; Mississippi State University, Critical Infrastructure Protection Center: Starkville, MS, USA, 2020; Available online: https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets (accessed on 13 January 2025).
  55. Stolfo, S.; Fan, W.; Lee, W.; Prodromidis, A.; Chan, P. KDD Cup 1999 Data. UCI Machine Learning Repository. 1999. Available online: https://archive.ics.uci.edu/dataset/130/kdd+cup+1999+data (accessed on 13 February 2025).
  56. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CSE-CIC-IDS2018 Dataset; Canadian Institute for Cybersecurity (CIC), University of New Brunswick: Fredericton, NB, Canada, 2018.
  57. Sharafaldin, I.; Lashkari, A.H.; Hakak, S.; Ghorbani, A.A. Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In Proceedings of the 2019 International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8.
  58. Damasevicius, R.; Venckauskas, A.; Grigaliunas, S.; Toldinas, J.; Morkevicius, N.; Aleliunas, T.; Smuikys, P. LITNET-2020: An annotated real-world network flow dataset for network intrusion detection. Electronics 2020, 9, 800.
  59. Barut, O.; Luo, Y.; Zhang, T.; Li, W.; Li, P. NetML: A challenge for network traffic analytics. arXiv 2020, arXiv:2004.13006.
  60. Samarakoon, S.; Siriwardhana, Y.; Porambage, P.; Liyanage, M.; Chang, S.; Kim, J.; Kim, J.; Ylianttila, M. 5G-NIDD: A Comprehensive Network Intrusion Detection Dataset Generated over 5G Wireless Network; IEEE Dataport; IEEE: Piscataway, NJ, USA, 2022.
  61. Kumar, P.; Liu, J.; Tayeen, A.S.M.; Misra, S.; Cao, H.; Harikumar, J.; Perez, O. FLNET2023: Realistic Network Intrusion Detection Dataset for Federated Learning. In Proceedings of the MILCOM 2023–IEEE Military Communications Conference (MILCOM), Boston, MA, USA, 30 October–3 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 345–350.
  62. Haider, W.; Hu, J.; Slay, J.; Turnbull, B.; Xie, Y. Generating realistic intrusion detection system dataset based on fuzzy qualitative modeling. J. Netw. Comput. Appl. 2017, 87, 185–192.
  63. CAIDA. The CAIDA DDoS Attack 2007 Dataset; Center for Applied Internet Data Analysis, University of California San Diego: San Diego, CA, USA, 2007; Available online: https://www.caida.org/catalog/datasets/ddos-20070804_dataset/ (accessed on 10 May 2025).
  64. BoNeSi—The DDoS Botnet Simulator. 2020. Available online: https://github.com/Markus-Go/bonesi (accessed on 26 February 2022).
  65. Jonker, M.; Sperotto, A.; Pras, A. DDoSDB dataset: DDoS Mitigation: A Measurement-Based Approach. In Proceedings of the NOMS 2020–IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 20–24 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
  66. Samiullah, H. Application Layer DoS Attack Dataset. 2021. Available online: https://www.kaggle.com/hamzasamiullah/ml-analysis-application-layer-dos-attack-dataset (accessed on 30 August 2021).
  67. Giménez, C.T.; Villegas, A.P.; Marañón, G.Á. HTTP Data Set CSIC 2010; Information Security Institute of CSIC (Spanish Research National Council): Madrid, Spain, 2010.
  68. Raïssi, C.; Brissaud, J.; Dray, G.; Poncelet, P.; Roche, M.; Teisseire, M. Web Analyzing Traffic Challenge: Description and Results. In Proceedings of the ECML PKDD 2007 Discovery Challenge, Warsaw, Poland, 17–21 September 2007.
  69. Digital Corpora. NPS-2009-Casper-Rw Dataset. 2009. Available online: https://digitalcorpora.org/corpora/disk-images/ (accessed on 11 May 2026).
  70. Hostiadi, D.P.; Ahmad, T. Dataset for Botnet group activity with adaptive generator. Data Brief 2021, 38, 107334.
  71. Putra, M.A.R.; Hostiadi, D.P.; Ahmad, T. Botnet dataset with simultaneous attack activity. Data Brief 2022, 45, 108628.
  72. Tayfour, O.E.; Mubarakali, A.; Tayfour, A.E.; Marsono, M.N.; Hassan, E.; Abdelrahman, A.M. Adapting deep learning-LSTM method using optimized dataset in SDN controller for secure IoT. Soft Comput. 2023, 27, 1–9.
  73. Benign and Malicious Domains Based on DNS Logs. Benign and Malicious Domains Based on DNS Logs Dataset. 2022. Available online: https://data.mendeley.com/datasets/623sshkdrz/5 (accessed on 24 May 2022).
  74. Garcia, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. In Proceedings of the 2014 IEEE 32nd International Conference on Performance, Computing and Communications Conference (IPCCC); Elsevier: Amsterdam, The Netherlands, 2014; pp. 1–8.
  75. Wang, W.; Zhu, M.; Zeng, X.; Ye, X.; Sheng, Y. Malware traffic classification using convolutional neural network for representation learning. In 2017 International Conference on Information Networking (ICOIN); IEEE: Piscataway, NJ, USA, 2017; pp. 712–717.
  76. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-BaIoT: Network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput. 2018, 17, 12–22.
  77. Pa, Y.M.P.; Suzuki, S.; Yoshioka, K.; Matsumoto, T.; Kasama, T.; Rossow, C. IoTPOT: Analysing the rise of IoT compromises. In Proceedings of the 9th USENIX Workshop on Offensive Technologies (WOOT), Washington, DC, USA, 10–11 August 2015.
  78. García, S.; Shuvaev, S.; Uritskaya, A. IoT-23: A Labeled Dataset with Malicious and Benign IoT Network Traffic; Stratosphere Laboratory, Czech Technical University: Prague, Czech Republic, 2020.
  79. Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access 2022, 10, 40281–40306.
  80. Dadkhah, S.; Mahdikhani, H.; Danso, P.K.; Zohourian, A.; Truong, K.A.; Ghorbani, A.A. Towards the development of a realistic multidimensional IoT profiling dataset. In Proceedings of the 19th Annual International Conference on Privacy, Security and Trust (PST); IEEE: Piscataway, NJ, USA, 2022; pp. 1–11.
  81. Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CicIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors 2023, 23, 5941.
  82. Hamza, A.; Gharakheili, H.H.; Benson, T.A.; Sivaraman, V. UNSW IoT Traffic Attack Dataset. In Proceedings of the 2019 ACM Symposium on SDN Research (SOSR); ACM: New York, NY, USA, 2019; pp. 36–48.
  83. Aramini, A.; Arazzi, M.; Facchinetti, T.; Ngankem, L.S.; Nocera, A. Distributed IoT Traffic Attack Dataset. In Proceedings of the 2022 IEEE 18th International Conference on Factory Communication Systems (WFCS); IEEE: Piscataway, NJ, USA, 2022; pp. 1–8.
  84. Emec, M. ROUT-4-2023: RPL Based Routing Attack Dataset for IoT. IEEE Dataport. 2023. Available online: https://ieee-dataport.org/documents/rout-4-2023-rpl-based-routing-attack-dataset-iot (accessed on 14 June 2024).
  85. Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An ensemble of autoencoders for online network intrusion detection. arXiv 2018, arXiv:1802.09089.
  86. Samson, K. Wi-Fi Association and Disassociation Dataset. 2023. Available online: https://github.com/samsonkg/Wi-Fi-Association_Disassociation-Dataset (accessed on 26 August 2023).
  87. Pazul, K. Controller Area Network (CAN) Basics, 1999. Available online: https://cika.com/soporte/Information/Microchip/AnalogInterface/CAN/AppNotes/AN713(DS00713a).pdf (accessed on 20 May 2025).
  88. Song, H.M.; Woo, J.; Kim, H.K. In-vehicle network intrusion detection using deep convolutional neural network. Veh. Commun. 2020, 21, 100198.
  89. Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images. In Proceedings of the 8th International Symposium on Visualization for Cyber Security (VizSec), Pittsburgh, PA, USA, 20 July 2011; ACM: New York, NY, USA, 2011; pp. 1–7.
  90. Ronen, R.; Radu, M.; Feuerstein, C.; Yom-Tov, E.; Ahmadi, M. Microsoft Malware Classification Challenge. arXiv 2018, arXiv:1802.10135.
  91. Bozkir, A.S.; Cankaya, A.O.; Aydos, M. Utilization and Comparison of Convolutional Neural Networks in Malware Recognition. In Proceedings of the 27th Signal Processing and Communications Applications Conference (SIU), Sivas, Turkey, 24–26 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–4.
  92. Nappa, A.; Rafique, M.Z.; Caballero, J. The MALICIA dataset: Identification and analysis of drive-by download operations. Int. J. Inf. Secur. 2015, 14, 15–33.
  93. Drebin: Android Malware Dataset. 2014. Available online: https://drebin.mlsec.org/ (accessed on 16 July 2025).
  94. VX Heavens. 2021. Available online: https://vx-underground.org/Archive (accessed on 6 July 2021).
  95. VirusShare Dataset. 2021. Available online: https://virusshare.com/ (accessed on 6 July 2021).
  96. VirusTotal. VirusTotal: Free Online Virus, Malware and URL Scanner. Available online: https://www.virustotal.com/ (accessed on 11 May 2026).
  97. Ullah, I.; Ahmad, J.; Ahmed, I.; Amin, R.; Imran, M. CIC-MalMem-2022: Malware detection in memory dumps using machine learning. In Proceedings of the 2022 International Conference on Cyber Security and Resilience (CSR); IEEE: Piscataway, NJ, USA, 2022; pp. 153–159.
  98. Maniriho, P.; Mahmood, A.N.; Chowdhury, M.J.M.C. MeMalDet: A Memory Analysis-Based Malware Detection Framework Using Deep Autoencoders and Stacked Ensemble under Temporal Evaluations. Comput. Secur. 2024, 142, 103864.
  99. Canadian Institute for Cybersecurity (CIC). CIC-CMD-2024: Command and Control Malware Dataset. 2024. Available online: https://www.kaggle.com/datasets/datasetengineer/cybertec-iiot-malware-dataset-cimd-2024 (accessed on 11 May 2026).
  100. Hopkins, M.; Reeber, E.; Forman, G.; Suermondt, J. Spambase Dataset UCI Machine Learning Repository. 1999. Available online: https://archive.ics.uci.edu/dataset/94/spambase (accessed on 11 May 2026).
  101. Biswas, B. Email Spam Classification Dataset CSV. 2020. Available online: https://www.kaggle.com/balaka18/email-spam-classification-dataset-csv (accessed on 5 May 2022).
  102. Nitisha. Email Spam Dataset. 2020. Available online: https://www.kaggle.com/nitishabharathi/email-spam-dataset (accessed on 1 May 2022).
  103. Naidu, C. Spam Classification for Basic NLP. 2021. Available online: https://kaggle.com/chandramoulinaidu/spam-classification-for-basic-nlp (accessed on 15 January 2022).
  104. Murthy, M.Y.B.; Mastanbi, S.; Sujitha, B.; Babu, K.R. Evaluating deep learning algorithms for natural language processing. In Algorithms for Intelligent Systems; Springer Nature: Singapore, 2023; pp. 709–720. [Google Scholar]
  105. Kaggle. Phishing Email Collection. 2020. Available online: https://www.kaggle.com/datasets/akashsurya156/phishing-paper1 (accessed on 11 May 2026).
  106. Jagtap, S. Kaggle Bot Account Detection Dataset. Available online: https://www.kaggle.com/datasets/shriyashjagtap/kaggle-bot-account-detection/data (accessed on 11 May 2026).
  107. MITRE. Sharing Threat Intelligence Just Got a lot Easier. 2018. Available online: https://oasis-open.github.io/cti-documentation/stix/intro (accessed on 31 December 2022).
  108. Zeng, V.; Baki, S.; Aassal, A.E.; Verma, R.; Moraes, L.F.T.D.; Das, A. Diverse datasets and a customizable benchmarking framework for phishing. In Proceedings of the Proceedings 6th International Workshop on Security and Privacy Analytics, New Orleans, LA, USA, 18 March 2020; ACM: New York, NY, USA, 2020; pp. 35–41. [Google Scholar]
  109. Chiew, K.L.; Chang, E.H.; Tan, C.L.; Abdullah, J.; Yong, K.C. Building standard offline anti-phishing dataset for benchmarking. Int. J. Eng. Technol. 2018, 7, 71–74. [Google Scholar] [CrossRef]
  110. Ariyadasa, S.; Fernando, S.; Fernando, S. Phishing Websites Dataset. Mendeley Data. 2021. Available online: https://data.mendeley.com/datasets/n96ncsr5g4/1 (accessed on 10 May 2025).
  111. Bahnsen, A.C.; Bohorquez, E.C.; Villegas, S.; Vargas, J.; Gonzalez, F.A. Classifying phishing URLs using recurrent neural networks. In Proceedings of the Proceedings APWG Symposium on Electronic Crime Research (eCrime), Scottsdale, Arizona, USA, 25–27 April 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8. [Google Scholar]
  112. Mukherjee, A.; Venkataraman, V.; Liu, B.; Glance, N. What yelp fake review filter might be doing? In Proceedings of the 7th International Conference on Weblogs and Social Media (ICWSM); AAAI: Palo Alto, CA, USA, 2013; pp. 409–418. [Google Scholar]
  113. Rayana, S.; Akoglu, L. Collective opinion spam detection: Bridging review networks and metadata. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD); ACM: New York, NY, USA, 2015; pp. 985–994. [Google Scholar] [CrossRef]
  114. Wang, W.; Harrou, F.; Bouyeddou, B.; Senouci, S.M.; Sun, Y. A stacked deep learning approach to cyber-attacks detection in industrial systems: Application to power system and gas pipeline systems. Clust. Comput. 2022, 25, 561–578. [Google Scholar] [CrossRef]
  115. Mathur, A.P.; Tippenhauer, N.O. SWaT: A water treatment testbed for research and training on ICS security. In Proceedings of the 2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater); IEEE: Piscataway, NJ, USA, 2016. [Google Scholar] [CrossRef]
  116. Mubarak, S.; Habaebi, M.H.; Islam, M.R.; Balla, A.; Tahir, M. Industrial datasets with ICSs testbed and attack detection using machine learning techniques. Intell. Autom. Soft Comput. 2022, 31, 1345–1360. [Google Scholar] [CrossRef]
  117. Quincozes, S.E.; Albuquerque, C.; Passos, D.G.; Mossé, D. ERENO: A framework for generating realistic IEC-61850 intrusion detection datasets for smart grids. IEEE Trans. Dependable Secur. Comput. 2023, 21, 3851–3865. [Google Scholar] [CrossRef]
  118. Xu, Z. IEEE 118-Bus, 300-Bus and 3266-Bus System Dataset for Unit Commitment; IEEE DataPort; IEEE: Piscataway, NJ, USA, 2020. [Google Scholar] [CrossRef]
  119. Software Engineering Institute (CERT Division), Carnegie Mellon University. Insider Threat Test Dataset (Versions r4–r6). Data Set, 2020. Synthetic Insider Threat Logs, Including Releases r4.x Through r6.x. Available online: https://kilthub.cmu.edu/articles/dataset/Insider_Threat_Test_Dataset/12841247/1 (accessed on 10 May 2025).
  120. Udacity. An Open Source Self-Driving Car. 2016. Available online: https://www.udacity.com/ (accessed on 10 May 2025).
  121. Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 2012, 32, 323–332. [Google Scholar] [CrossRef] [PubMed]
  122. Unmanned Aerial Vehicle (UAV) Intrusion Detection. 2020. Available online: https://archive.ics.uci.edu/dataset/564/unmanned+aerial+vehicle+uav+intrusion+detection (accessed on 10 May 2025).
  123. Consumer Complaint Database. 2019. Available online: https://catalog.data.gov/dataset/consumer-complaint-database (accessed on 10 May 2025).
  124. TensorFlow Speech Recognition Challenge. 2019. Available online: https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data (accessed on 11 January 2025).
  125. IMDB Dataset of 50K Movie Reviews. 2019. Available online: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews (accessed on 1 May 2024).
  126. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical report; Citeseer: University Park, PA, USA, 2009. [Google Scholar]
Figure 1. Study selection process: filtering and inclusion workflow.
Figure 2. Trend of the ten most frequently represented MITRE ATT&CK tactics across the reviewed studies from 2019 to 2024. Each line shows the yearly frequency with which a given tactic appeared in the reviewed literature.
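The yearly trend lines in Figures 2, 3, 6, 7 and 10 are per-year frequencies of labels across the reviewed studies. As a minimal sketch, assuming each study is coded as a publication year plus a set of labels (tactics, techniques, ML categories, or dataset categories), such series could be tallied as below. The `(year, labels)` record layout, the function names and the example studies are hypothetical illustrations, not the review's own data or tooling.

```python
from collections import Counter, defaultdict


def yearly_label_counts(studies):
    """Map each year to a Counter of how many studies carry each label.

    Assumption: a label is counted at most once per study (labels form a set).
    """
    trend = defaultdict(Counter)
    for year, labels in studies:
        trend[year].update(set(labels))
    return trend


def top_k_series(trend, k=10):
    """Return the k most frequent labels overall as {label: yearly counts}, plus the years."""
    totals = Counter()
    for counts in trend.values():
        totals.update(counts)
    years = sorted(trend)
    top = [label for label, _ in totals.most_common(k)]
    # A Counter returns 0 for labels absent in a given year.
    return {label: [trend[y][label] for y in years] for label in top}, years


# Hypothetical coded studies (not the review's data).
studies = [
    (2019, {"Impact", "Initial Access"}),
    (2020, {"Impact", "Execution"}),
    (2021, {"Impact", "Execution"}),
]
series, years = top_k_series(yearly_label_counts(studies), k=2)
print(years)   # [2019, 2020, 2021]
print(series)  # {'Impact': [1, 1, 1], 'Execution': [0, 1, 1]}
```

Counting each label once per study is a design choice for this sketch; counting repeated mentions instead would give more weight to studies that report many experiments.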
Figure 3. Trend of the ten most frequently represented MITRE ATT&CK techniques across the reviewed studies from 2019 to 2024. Each line shows the yearly frequency with which a given technique appeared in the reviewed literature.
Figure 4. Co-occurrence heatmap of MITRE ATT&CK tactics across the 99 reviewed studies. Each cell reports the number of papers in which the corresponding pair of tactics co-appeared within the same study. Darker cells indicate higher co-occurrence frequencies, and diagonal values are set to zero because self-co-occurrence is suppressed.
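Figures 4, 5, 8, 9 and 11 share the paper-level co-occurrence construction described in their captions. A minimal sketch, assuming each paper is represented by the set of labels coded for it: every unordered pair of distinct labels within a paper adds one to both symmetric cells, so the diagonal stays zero by construction. Labels outside the chosen list (for example, outside a top-ten selection) are simply ignored. The function name and example data are illustrative only.

```python
from itertools import combinations


def cooccurrence_matrix(papers, labels):
    """Symmetric paper-level co-occurrence counts with a zero diagonal.

    Cell (i, j) is the number of papers whose label set contains both
    labels[i] and labels[j]; self-pairs are never generated.
    """
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for paper_labels in papers:
        present = sorted({index[l] for l in paper_labels if l in index})
        for i, j in combinations(present, 2):  # distinct pairs only
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Hypothetical papers (not the review's data).
labels = ["Impact", "Initial Access", "Execution"]
papers = [
    {"Impact", "Initial Access"},
    {"Impact", "Initial Access", "Execution"},
    {"Execution"},
]
for row in cooccurrence_matrix(papers, labels):
    print(row)
# [0, 2, 1]
# [2, 0, 1]
# [1, 1, 0]
```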
Figure 5. Co-occurrence heatmap of the ten most frequently represented MITRE ATT&CK techniques across the 99 reviewed studies. Each cell reports the number of papers in which the corresponding pair of techniques co-appeared within the same study. Darker cells indicate higher co-occurrence frequencies, and diagonal values are set to zero because self-co-occurrence is suppressed.
Figure 6. Trend of machine learning main-category usage across the reviewed studies from 2019 to 2024. Each line shows the yearly frequency with which a given ML main category appeared in the reviewed literature.
Figure 7. Trend of the ten most frequently represented ML method subcategories across the reviewed studies from 2019 to 2024. Each line shows the yearly frequency with which a given subcategory appeared in the reviewed literature.
Figure 8. Co-occurrence heatmap of ML main categories across the reviewed studies. Each cell reports the number of papers in which the corresponding pair of ML main categories co-appeared within the same study. Darker cells indicate higher co-occurrence frequencies, and diagonal values are set to zero because self-co-occurrence is suppressed.
Figure 9. Co-occurrence heatmap of the ten most frequently represented ML method subcategories across the reviewed studies. Each cell reports the number of papers in which the corresponding pair of ML subcategories co-appeared within the same study. Darker cells indicate higher co-occurrence frequencies, and diagonal values are set to zero because self-co-occurrence is suppressed.
Figure 10. Trend of dataset-category usage across the reviewed studies from 2019 to 2024. Each line shows the yearly frequency with which datasets from a given category appeared in the reviewed literature.
Figure 11. Co-occurrence heatmap of the six most frequently represented dataset categories across the reviewed studies. Each cell reports the number of papers in which the corresponding pair of dataset categories co-appeared within the same study. Darker cells indicate higher co-occurrence frequencies, and diagonal values are set to zero because self-co-occurrence is suppressed.
Figure 12. Distribution of dataset usage across the reviewed studies. Most studies relied on a single dataset, whereas a smaller proportion evaluated models using multiple datasets. Percentages are reported relative to the total number of reviewed papers represented in the figure.
Figure 13. Heatmap of MITRE ATT&CK tactics versus ML main categories across the reviewed studies. Each cell reports the number of papers in which a given tactic was associated with a given ML main category. Darker cells indicate higher frequencies of co-occurrence in the literature.
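Unlike the within-axis heatmaps, Figure 13 crosses two axes. The caption does not say how a tactic is tied to an ML main category inside a paper; the sketch below assumes the simplest reading, that a tactic and a category are associated whenever both are coded for the same paper, and counts each pair once per paper. The data layout and example papers are hypothetical.

```python
from collections import Counter


def cross_tab(papers):
    """Count papers per (tactic, ML main category) pair.

    Assumption: a tactic and a category are 'associated' when both are
    recorded for the same paper; each pair is counted once per paper.
    """
    cells = Counter()
    for tactics, categories in papers:
        for tactic in set(tactics):
            for category in set(categories):
                cells[(tactic, category)] += 1
    return cells


# Hypothetical papers: (tactic set, ML main-category set).
papers = [
    ({"Impact", "Initial Access"}, {"Deep Learning Models"}),
    ({"Impact"}, {"Deep Learning Models", "Classical Machine Learning Models"}),
]
cells = cross_tab(papers)
print(cells[("Impact", "Deep Learning Models")])                     # 2
print(cells[("Initial Access", "Classical Machine Learning Models")])  # 0
```

A stricter alternative would link a tactic only to the specific model evaluated on it, which requires finer-grained coding than paper-level sets.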
Table 1. Positioning of this review against representative recent surveys.
| Review | Time Window/Focus | Main Analytical Emphasis | Attack Taxonomy | Dataset Taxonomy Depth | ATT&CK Mapping | Attack–Method–Dataset Cross-Reference | Main Limitation Relative to This Study |
|---|---|---|---|---|---|---|---|
| Sowmya et al. (2023) [24] | 72-paper review of AI-based IDS | ML, DL, and ensemble methods for intrusion detection | No explicit ATT&CK-style taxonomy | Moderate | No | Limited | IDS-centered rather than ATT&CK-aligned or tri-axis. |
| Mvula et al. (2023) [8] | SSL-focused SLR on cybersecurity datasets and metrics | Dataset repositories and performance metrics | No explicit attack taxonomy | Strong | No | Limited | Dataset- and metric-centred, not a tri-axis synthesis. |
| Salem et al. (2024) [25] | Review of more than sixty AI-driven cyber-threat studies | Broad comparison of ML, DL, and metaheuristics | Broad attack coverage, but no ATT&CK-based taxonomy | Moderate | No | Limited | Technique-centred rather than ATT&CK-organized. |
| Ofusori et al. (2024) [26] | Broad review of AI in cybersecurity | Applications, trends, and future directions | No structured threat taxonomy | Limited–moderate | No | No | Too high-level to expose specific attack–method–dataset gaps. |
| Rehman et al. (2025) [27] | Systematic review of ML-based intrusion detection | Models, datasets, metrics, and challenges | IDS/domain framing rather than ATT&CK tactics/techniques | Moderate–strong | No | Limited | Close in topic, but still IDS-centric and not ATT&CK-aligned. |
| Hozouri et al. (2025) [28] | Survey of IDS with ML/DL advances | IDS architectures, benchmark datasets, and emerging challenges | No explicit ATT&CK-based taxonomy | Moderate | No | Limited | Strong IDS synthesis, but not a behavioural cross-reference review. |
| Dobler et al. (2025) [29] | Systematic review of malicious industrial traffic datasets | Dataset characterization and ML-oriented selection | Industrial attack types, but not ATT&CK as the main frame | Strong | No | Limited | Domain-specific dataset review rather than a broader tri-axis synthesis. |
| This review | 2019–2025; 99 studies | Tri-axis synthesis across attacks, ML methods, and datasets | MITRE ATT&CK-aligned | Strong | Yes | Yes | Designed to expose underexplored intersections, benchmark dependence, and gaps across attack behaviours, model families, and dataset categories. |
Dataset taxonomy depth is reported qualitatively. Limited indicates illustrative mention only, moderate indicates structured but bounded discussion, and strong indicates a dedicated taxonomy or systematic dataset characterization. Attack–method–dataset cross-reference indicates whether a review explicitly synthesizes these three dimensions together rather than discussing them separately.
Table 2. Research questions.
| RQ | Research Question |
|---|---|
| RQ1 | What types of cyberattacks are most frequently studied, and how have they evolved? |
| RQ2 | Which machine learning and deep learning techniques are applied to mitigate attacks? |
| RQ3 | What datasets are commonly used in AI-powered cybersecurity research? |
| RQ4 | Which ML techniques are associated with specific categories of cyberattacks? |
| RQ5 | What are the key gaps and limitations in applying AI-powered methods for attack mitigation? |
Table 3. Inclusion and exclusion criteria.
| Criterion | Description |
|---|---|
| Inclusion | Peer-reviewed articles published between 2019 and 2025, written in English, addressing AI- or ML-based methods for cyberattack detection, classification, or mitigation, and reporting identifiable model/classifier and dataset information. |
| Exclusion | Non-peer-reviewed works, duplicate records, papers outside computer science, cybersecurity, or closely related AI-for-security domains, short or insufficiently detailed papers (fewer than five pages), and studies lacking the methodological detail required for structured comparison. |
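The Table 3 criteria amount to a conjunction of record-level checks plus de-duplication. The sketch below is a hypothetical illustration of such a screen, assuming each record has already been coded with the boolean fields shown; the `Record` fields, the title-based duplicate key, and the reduction of "methodological detail" to identifiable model and dataset information are all simplifying assumptions, not the authors' screening procedure.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical screening fields used only for this illustration.
    title: str
    year: int
    language: str
    peer_reviewed: bool
    pages: int
    in_scope: bool         # CS, cybersecurity, or closely related AI-for-security
    ml_defense: bool       # AI/ML method for detection, classification, or mitigation
    reports_model: bool    # identifiable model/classifier
    reports_dataset: bool  # identifiable dataset


def meets_criteria(r: Record) -> bool:
    """Return True if a record satisfies every inclusion condition in Table 3."""
    return (
        2019 <= r.year <= 2025
        and r.language == "English"
        and r.peer_reviewed
        and r.in_scope
        and r.ml_defense
        and r.pages >= 5  # papers with fewer than five pages are excluded
        and r.reports_model
        and r.reports_dataset
    )


def screen(records):
    """Drop duplicate titles (case-insensitive, an assumed key), then keep qualifying records."""
    seen = set()
    kept = []
    for r in records:
        key = r.title.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        if meets_criteria(r):
            kept.append(r)
    return kept


records = [
    Record("Study A", 2021, "English", True, 12, True, True, True, True),
    Record("study a", 2021, "English", True, 12, True, True, True, True),  # duplicate
    Record("Study B", 2018, "English", True, 10, True, True, True, True),  # outside window
    Record("Study C", 2023, "English", True, 4, True, True, True, True),   # too short
    Record("Study D", 2024, "English", True, 9, True, True, True, False),  # no dataset info
]
print([r.title for r in screen(records)])  # ['Study A']
```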
Table 4. Lightweight quality-appraisal criteria applied to included studies.
| Criterion | Description | Scoring |
|---|---|---|
| Q1 | The study clearly specifies the attack type, family, or adversarial behaviour under analysis. | 0/1 |
| Q2 | The study clearly identifies the ML/DL method, model family, or detection pipeline used. | 0/1 |
| Q3 | The dataset or data source is clearly reported and sufficiently identifiable. | 0/1 |
| Q4 | The evaluation setting, metrics, or experimental design is sufficiently described for interpretation. | 0/1 |
| Q5 | The study provides enough methodological detail to support comparative synthesis. | 0/1 |
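With five binary criteria, each study's appraisal reduces to a score between 0 and 5. A minimal sketch, assuming the per-criterion marks are stored in a dictionary keyed by criterion ID; the table does not state whether any cut-off score was applied, so none is assumed here, and the function name is illustrative.

```python
CRITERIA = ("Q1", "Q2", "Q3", "Q4", "Q5")


def appraisal_score(marks: dict) -> int:
    """Sum the five binary appraisal marks into a total between 0 and 5."""
    missing = [q for q in CRITERIA if q not in marks]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    invalid = [q for q in CRITERIA if marks[q] not in (0, 1)]
    if invalid:
        raise ValueError(f"criteria must be scored 0 or 1: {invalid}")
    return sum(marks[q] for q in CRITERIA)


# Hypothetical study that leaves the evaluation setting under-described (Q4 = 0).
study = {"Q1": 1, "Q2": 1, "Q3": 1, "Q4": 0, "Q5": 1}
print(appraisal_score(study))  # 4
```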
Table 5. MITRE ATT&CK tactic counts.
| Tactic | Count | Percentage |
|---|---|---|
| Impact | 72 | 14.55% |
| Initial Access | 59 | 11.92% |
| Execution | 58 | 11.72% |
| Command and Control | 55 | 11.11% |
| Reconnaissance | 54 | 10.91% |
| Credential Access | 42 | 8.48% |
| Defense Evasion | 31 | 6.26% |
| Discovery | 28 | 5.66% |
| Lateral Movement | 23 | 4.65% |
| Exfiltration | 21 | 4.24% |
| Persistence | 17 | 3.43% |
| Collection | 17 | 3.43% |
| Privilege Escalation | 13 | 2.63% |
| Resource Development | 5 | 1.01% |
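Because one paper can be mapped to several tactics, the Table 5 percentages are shares of all tactic assignments rather than shares of the 99 papers. The short check below recomputes them from the reported counts: the counts sum to 495, and dividing each count by 495 reproduces every percentage in the table to two decimal places. Only the table's own numbers are used; the variable names are illustrative.

```python
# (tactic, count, reported percentage) as listed in Table 5.
TABLE5 = [
    ("Impact", 72, 14.55),
    ("Initial Access", 59, 11.92),
    ("Execution", 58, 11.72),
    ("Command and Control", 55, 11.11),
    ("Reconnaissance", 54, 10.91),
    ("Credential Access", 42, 8.48),
    ("Defense Evasion", 31, 6.26),
    ("Discovery", 28, 5.66),
    ("Lateral Movement", 23, 4.65),
    ("Exfiltration", 21, 4.24),
    ("Persistence", 17, 3.43),
    ("Collection", 17, 3.43),
    ("Privilege Escalation", 13, 2.63),
    ("Resource Development", 5, 1.01),
]

total = sum(count for _, count, _ in TABLE5)
recomputed = {name: round(100 * count / total, 2) for name, count, _ in TABLE5}
mismatches = [name for name, _, reported in TABLE5 if recomputed[name] != reported]

print(total)       # 495 tactic assignments, not 99 papers
print(mismatches)  # []
```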
Table 6. Top 10 most frequent MITRE ATT&CK techniques.
| Technique | Count | Percentage |
|---|---|---|
| Network Denial of Service | 61 | 7.71% |
| Endpoint Denial of Service | 44 | 5.56% |
| Exploit Public-Facing Application | 43 | 5.44% |
| Active Scanning | 40 | 5.06% |
| Gather Victim Host Information | 36 | 4.55% |
| Brute Force | 34 | 4.30% |
| Application Layer Protocol | 32 | 4.05% |
| Command-Line Interface | 32 | 4.05% |
| Phishing | 25 | 3.16% |
| Input Capture | 23 | 2.91% |
Table 7. Distribution of tactics per paper (total papers: 99).
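| Number of Tactics | Number of Papers | Percentage |
|---|---|---|
| 1 | 19 | 19.19% |
| 2 | 8 | 8.08% |
| 3 | 11 | 11.11% |
| 4 | 11 | 11.11% |
| 5 | 7 | 7.07% |
| 6 | 8 | 8.08% |
| 7 | 10 | 10.10% |
| 8 | 9 | 9.09% |
| 9 | 7 | 7.07% |
| 10 | 6 | 6.06% |
| 11 | 1 | 1.01% |
| 12 | 2 | 2.02% |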
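Table 7 can be reconciled with Table 5. Weighting each row by its number of tactics gives the total number of tactic assignments across the 99 papers; the result, 495, is exactly the total behind the Table 5 percentages, which corresponds to an average of five tactics per paper. The check uses only the table's own numbers.

```python
# (tactics per paper, number of papers) as listed in Table 7.
TABLE7 = [(1, 19), (2, 8), (3, 11), (4, 11), (5, 7), (6, 8),
          (7, 10), (8, 9), (9, 7), (10, 6), (11, 1), (12, 2)]

papers = sum(n for _, n in TABLE7)
assignments = sum(k * n for k, n in TABLE7)

print(papers)                # 99
print(assignments)           # 495, the Table 5 total
print(assignments / papers)  # 5.0 tactics per paper on average
```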
Table 8. Distribution of techniques per paper (total papers: 99).
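| Number of Techniques | Number of Papers | Percentage |
|---|---|---|
| 1 | 14 | 14.14% |
| 2 | 11 | 11.11% |
| 3 | 6 | 6.06% |
| 4 | 12 | 12.12% |
| 5 | 5 | 5.05% |
| 6 | 4 | 4.04% |
| 7 | 4 | 4.04% |
| 8 | 11 | 11.11% |
| 9 | 1 | 1.01% |
| 10 | 1 | 1.01% |
| 11 | 7 | 7.07% |
| 12 | 5 | 5.05% |
| 13 | 2 | 2.02% |
| 16 | 1 | 1.01% |
| 17 | 2 | 2.02% |
| 18 | 3 | 3.03% |
| 19 | 1 | 1.01% |
| 22 | 2 | 2.02% |
| 23 | 2 | 2.02% |
| 24 | 1 | 1.01% |
| 26 | 3 | 3.03% |
| 27 | 1 | 1.01% |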
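The same reconciliation applies to techniques. Weighting Table 8 by the number of techniques per paper gives 791 technique assignments across 99 papers, and dividing each Table 6 count by 791 reproduces all ten reported percentages. This shows that Table 6 percentages are shares of all technique assignments, not of the top ten alone or of the papers. Only the tables' own numbers are used.

```python
# (techniques per paper, number of papers) as listed in Table 8.
TABLE8 = [(1, 14), (2, 11), (3, 6), (4, 12), (5, 5), (6, 4), (7, 4), (8, 11),
          (9, 1), (10, 1), (11, 7), (12, 5), (13, 2), (16, 1), (17, 2),
          (18, 3), (19, 1), (22, 2), (23, 2), (24, 1), (26, 3), (27, 1)]

papers = sum(n for _, n in TABLE8)
assignments = sum(k * n for k, n in TABLE8)

# (count, reported percentage) for the ten techniques in Table 6.
TABLE6 = {
    "Network Denial of Service": (61, 7.71),
    "Endpoint Denial of Service": (44, 5.56),
    "Exploit Public-Facing Application": (43, 5.44),
    "Active Scanning": (40, 5.06),
    "Gather Victim Host Information": (36, 4.55),
    "Brute Force": (34, 4.30),
    "Application Layer Protocol": (32, 4.05),
    "Command-Line Interface": (32, 4.05),
    "Phishing": (25, 3.16),
    "Input Capture": (23, 2.91),
}
mismatches = [t for t, (count, reported) in TABLE6.items()
              if round(100 * count / assignments, 2) != reported]

print(papers, assignments)  # 99 791
print(mismatches)           # []
```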
Table 9. Distribution of machine learning models across main categories and subcategories.
| Main Category | Count | Subcategory | Count |
|---|---|---|---|
| Deep Learning Models | 72 | LSTM & Variants | 27 |
| | | Feedforward Networks & Variants | 24 |
| | | Core CNN Architectures | 22 |
| | | Transformer-Based Models | 9 |
| | | Autoencoders | 8 |
| | | Specialized/Advanced CNNs | 8 |
| | | GAN & Variants | 5 |
| | | GRU & Variants | 4 |
| | | Graph Neural Networks (GNN) | 3 |
| Hybrid, Ensemble & Explainable | 46 | Ensemble Learning Methods | 29 |
| | | Boosting | 16 |
| | | Hybrid Architectures | 13 |
| | | Interpretability | 4 |
| Classical Machine Learning Models | 34 | Statistical Models | 17 |
| | | SVM & Variants | 17 |
| | | Tree-Based Models | 16 |
| | | Bayesian Models | 9 |
| | | Clustering | 6 |
| | | Hidden Markov Models | 1 |
| Learning Paradigms and Optimization | 18 | Optimization Algorithms | 11 |
| | | Learning Paradigms & Feature Selection | 7 |
Table 10. Distribution of papers by the number of machine learning methods used.
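| Number of ML Methods Used | Number of Papers | Percentage |
|---|---|---|
| 1 | 22 | 22.2% |
| 2 | 25 | 25.3% |
| 3 | 17 | 17.2% |
| 4 | 13 | 13.1% |
| 5 | 7 | 7.1% |
| 6 | 9 | 9.1% |
| 7 | 3 | 3.0% |
| 8 | 1 | 1.0% |
| 11 | 1 | 1.0% |
| 13 | 1 | 1.0% |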
Table 11. Number of ML main categories used per paper.
| Number of ML Main Categories Used | Number of Papers | Percentage |
|---|---|---|
| 1 | 43 | 43.4% |
| 2 | 42 | 42.4% |
| 3 | 13 | 13.1% |
| 4 | 1 | 1.0% |
Table 12. Number of ML subcategories used per paper.
| Number of ML Subcategories Used | Number of Papers | Percentage |
|---|---|---|
| 1 | 25 | 25.3% |
| 2 | 37 | 37.4% |
| 3 | 12 | 12.1% |
| 4 | 12 | 12.1% |
| 5 | 7 | 7.1% |
| 6 | 4 | 4.0% |
| 7 | 2 | 2.0% |
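Tables 9, 11 and 12 can be reconciled with each other. Weighting Table 11 by the number of main categories per paper gives 170, which equals the sum of the main-category counts in Table 9; weighting Table 12 in the same way gives 256, which equals the sum of the subcategory counts. This is consistent with reading Table 9 as counts of papers per category, where (as an interpretation, not a statement from the tables) a paper using both an LSTM and a CNN would count once under Deep Learning Models but twice among its subcategories. The check below uses only the tables' numbers.

```python
# Main-category and subcategory counts as listed in Table 9.
MAIN = {
    "Deep Learning Models": 72,
    "Hybrid, Ensemble & Explainable": 46,
    "Classical Machine Learning Models": 34,
    "Learning Paradigms and Optimization": 18,
}
SUB = {
    "Deep Learning Models": [27, 24, 22, 9, 8, 8, 5, 4, 3],
    "Hybrid, Ensemble & Explainable": [29, 16, 13, 4],
    "Classical Machine Learning Models": [17, 17, 16, 9, 6, 1],
    "Learning Paradigms and Optimization": [11, 7],
}
# (items used per paper, number of papers) from Tables 11 and 12.
TABLE11 = [(1, 43), (2, 42), (3, 13), (4, 1)]
TABLE12 = [(1, 25), (2, 37), (3, 12), (4, 12), (5, 7), (6, 4), (7, 2)]


def weighted_total(rows):
    """Total number of (paper, item) assignments implied by a per-paper distribution."""
    return sum(k * n for k, n in rows)


print(sum(MAIN.values()), weighted_total(TABLE11))           # 170 170
print(sum(map(sum, SUB.values())), weighted_total(TABLE12))  # 256 256
```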
Table 13. Dataset usage frequency by category.
| Category | Frequency | Most Used Datasets within Category |
|---|---|---|
| NIDD | 65 | CSE-CIC-IDS2017 (14), UNSW-NB15 (11), NSL-KDD (10), CSE-CIC-IDS2018 (4) |
| IoT-NIDD | 31 | ToN-IoT (5), EdgeIIoT 2023 (5), BoT-IoT (4), N-BaIoT (3) |
| Malware | 20 | Malimg (4), BIG 2015 (3) |
| Spam and Phishing (S&P) | 17 | Phishing Email Collection (4), PhishTank (3) |
| Custom-Collected Datasets | 16 | |
| ICS | 12 | SWaT dataset (3), Gas Pipeline (2) |
| Other | 7 | |
| Insider Threat | 4 | CERT Insider Threat (4) |
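The concentration described in the abstract can be quantified from Table 13 alone. The sketch below converts the category frequencies into shares and computes how much of the NIDD category is taken up by CSE-CIC-IDS2017, UNSW-NB15 and NSL-KDD. It assumes that the category frequencies and the per-dataset counts are on the same scale (occurrences across the reviewed studies), which the table does not state explicitly.

```python
# Category frequencies and top NIDD dataset counts as listed in Table 13.
TABLE13 = {
    "NIDD": 65, "IoT-NIDD": 31, "Malware": 20, "Spam and Phishing": 17,
    "Custom-Collected Datasets": 16, "ICS": 12, "Other": 7, "Insider Threat": 4,
}
NIDD_TOP3 = {"CSE-CIC-IDS2017": 14, "UNSW-NB15": 11, "NSL-KDD": 10}

total = sum(TABLE13.values())
shares = {cat: round(100 * n / total, 1) for cat, n in TABLE13.items()}
# Assumes per-dataset counts are comparable with the category frequency.
top3_share_of_nidd = round(100 * sum(NIDD_TOP3.values()) / TABLE13["NIDD"], 1)

print(total)                               # 172
print(shares["NIDD"], shares["IoT-NIDD"])  # 37.8 18.0
print(top3_share_of_nidd)                  # 53.8
```

Under that assumption, the two network-intrusion categories account for over half of all category occurrences, and three public benchmarks account for just over half of NIDD usage.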
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chizari, M.; Alam, A.; Ali Mirza, Q.K.; Chizari, H. A Tri-Axis Systematic Literature Review of AI-Powered Cyber Defense: ATT&CK-Aligned Analysis of Cyberattacks, Machine Learning Methods, and Datasets. Electronics 2026, 15, 2804. https://doi.org/10.3390/electronics15132804

AMA Style

Chizari M, Alam A, Ali Mirza QK, Chizari H. A Tri-Axis Systematic Literature Review of AI-Powered Cyber Defense: ATT&CK-Aligned Analysis of Cyberattacks, Machine Learning Methods, and Datasets. Electronics. 2026; 15(13):2804. https://doi.org/10.3390/electronics15132804

Chicago/Turabian Style

Chizari, Mohammad, Abu Alam, Qublai Khan Ali Mirza, and Hassan Chizari. 2026. "A Tri-Axis Systematic Literature Review of AI-Powered Cyber Defense: ATT&CK-Aligned Analysis of Cyberattacks, Machine Learning Methods, and Datasets" Electronics 15, no. 13: 2804. https://doi.org/10.3390/electronics15132804

APA Style

Chizari, M., Alam, A., Ali Mirza, Q. K., & Chizari, H. (2026). A Tri-Axis Systematic Literature Review of AI-Powered Cyber Defense: ATT&CK-Aligned Analysis of Cyberattacks, Machine Learning Methods, and Datasets. Electronics, 15(13), 2804. https://doi.org/10.3390/electronics15132804

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
