A Tri-Axis Systematic Literature Review of AI-Powered Cyber Defense: ATT&CK-Aligned Analysis of Cyberattacks, Machine Learning Methods, and Datasets
Abstract
1. Introduction
- They emphasise ML models without systematically connecting them to adversarial behaviours or structured taxonomies such as MITRE ATT&CK;
- They treat datasets in isolation, without a comprehensive categorisation across domains such as NIDD, IoT-NIDD, ICS, and Insider Threat datasets;
- They rarely provide cross-reference analyses that jointly consider attacks, ML paradigms, and datasets.
Positioning Against Existing Reviews
- ATT&CK-aligned attack synthesis: This review identifies and maps 312 attack labels from 99 studies to MITRE ATT&CK tactics and techniques, providing a structured threat-oriented view of the literature.
- Refined cross-domain dataset taxonomy: This review organizes 96 datasets into a refined taxonomy spanning NIDD, IoT-NIDD, malware, Spam and Phishing, ICS, Insider Threat, custom-collected, and other datasets, thereby clarifying the benchmark landscape used in AI-driven cyber defense research.
- Tri-axis cross-reference analysis: The central contribution of this study is a joint cross-reference of cyberattacks, ML methods, and datasets as three linked evidence axes. This tri-axis analysis reveals methodological concentration, benchmark dependence, and underexplored attack–method–dataset intersections that are not visible when attacks, models, or datasets are examined separately.
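The tri-axis cross-reference can be sketched as a count over (tactic, method, dataset category) triples. Cells of the full grid that no study covers are then flagged as candidate underexplored intersections. The sketch is illustrative only. The record layout, the function names `tri_axis_counts` and `empty_cells`, and the three example studies are hypothetical. They do not reproduce this review's coding sheet or its results.

```python
from collections import Counter
from itertools import product

# Hypothetical coded records, one per study; labels are placeholders, not review data.
studies = [
    {"tactics": {"Impact", "Reconnaissance"}, "methods": {"Ensemble", "LSTM"}, "datasets": {"NIDD"}},
    {"tactics": {"Impact"}, "methods": {"CNN"}, "datasets": {"IoT-NIDD"}},
    {"tactics": {"Credential Access"}, "methods": {"Ensemble"}, "datasets": {"NIDD", "Custom"}},
]


def tri_axis_counts(records):
    """Count the studies covering each (tactic, method, dataset category) triple."""
    counts = Counter()
    for rec in records:
        # Labels are sets, so a study adds at most 1 to any given triple.
        counts.update(product(rec["tactics"], rec["methods"], rec["datasets"]))
    return counts


def empty_cells(counts, tactics, methods, datasets):
    """List triples of the full grid that no study covers (candidate gaps)."""
    return [cell for cell in product(tactics, methods, datasets) if counts[cell] == 0]


counts = tri_axis_counts(studies)
axes = [sorted({label for s in studies for label in s[key]}) for key in ("tactics", "methods", "datasets")]
gaps = empty_cells(counts, *axes)
print(counts[("Impact", "Ensemble", "NIDD")], len(gaps))  # 1 20
```

A zero count marks an intersection that is absent from the sample. It is not evidence that the intersection is unimportant. The grid is built only from labels observed in the sample, so any category that no study uses would need to be added explicitly.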
2. Methodology
- Review Scope and Analytical Frame;
- Research Questions;
- Search Strategy and Literature Identification;
- Study Selection and Eligibility Screening;
- Data Extraction and Coding Procedure;
- Attack Mapping, Method Grouping, and Dataset Categorisation Rules;
- Ambiguity Handling, Consistency Checking, and Quality Appraisal.
2.1. Review Scope and Analytical Frame
2.2. Review Research Questions
2.3. Search Strategy and Literature Identification
2.4. Study Selection
Inclusion and Exclusion Criteria
2.5. Data Extraction and Coding Procedure
2.6. Attack Mapping, Method Grouping, and Dataset Categorisation Rules
2.7. Ambiguity Handling, Consistency Checking, and Quality Appraisal
3. Cyberattacks Across Reviewed Studies
3.1. Attack Taxonomy Selection and ATT&CK Mapping
3.2. Cyberattack Frequency Analysis
3.3. Cyberattack Trends (2019–2024)
3.4. Co-Occurrence of ATT&CK Tactics and Techniques
3.5. Breadth of ATT&CK Coverage per Paper
3.6. Cyberattack-Side Key Findings and Research Gaps
4. Machine Learning Methods Across Reviewed Studies
4.1. ML Method Frequency Analysis
4.2. ML Method Trends (2019–2024)
4.3. Co-Occurrence of Main Categories and Subcategories
4.4. Method Breadth per Paper
4.5. ML Method-Side Key Findings and Research Gaps
5. Datasets Across Reviewed Studies
- NIDD (Network-based Intrusion Detection Datasets);
- IoT-NIDD (IoT-specific Network-based Intrusion Detection Datasets);
- S&P (Spam and Phishing Datasets);
- ICS (Industrial Control System Datasets);
- Insider Threat;
- Custom-Collected Datasets;
- Other (e.g., computer vision, NLP, or behavioural datasets).
5.1. Dataset Frequency Analysis
5.2. Dataset Trends (2019–2024)
5.3. Co-Occurrence of Dataset Categories
5.4. Dataset Breadth per Paper
5.5. Dataset-Side Key Findings and Research Gaps
6. Cross-Reference Analysis
6.1. Cyberattacks × ML Overview
6.2. Tactics × ML Main Categories
Interpretation of Tactics × ML Main Categories Patterns
6.3. Tactics × ML Subcategories
- Ensemble Learning as a cross-tactic workhorse. The hottest subcategory cells appear under Impact (23), Execution (19), and Initial Access (19), with consistently high values for C2 (17) and Reconnaissance (15). This suggests that ensembles serve as a default stabilizer across diverse data modalities, including netflows, logs, and host events, and across different class distributions.
- Temporal models where sequences matter. LSTM and Variants are prominent for Impact (21), Initial Access (16), Execution (16), C2 (15), and Reconnaissance (15), which is consistent with the appeal of sequence modelling in settings where temporal ordering, staged activity, or periodic communication patterns are expected.
- CNNs for structured/spatialized representations. Core CNN Architectures show relatively high co-occurrence counts for Impact (16), Execution (15), C2 (14), Initial Access (13), and Credential Access (10). This is consistent with representations that transform packets, flows, binaries, or logs into grids, images, or embeddings that exhibit local spatial patterns.
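The subcategory values above are paper-level co-occurrence counts. The sketch below shows one way such a matrix might be built. It assumes each paper is coded with a list of ATT&CK tactics, possibly repeated because there is one entry per attack label, and with the ML subcategories it evaluates. The data and the helper names `cooccurrence` and `hottest` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical coding: tactics may repeat because one paper reports several attack labels.
papers = [
    {"tactics": ["Impact", "Impact", "Reconnaissance"], "subcats": ["Ensemble Learning", "LSTM and Variants"]},
    {"tactics": ["Impact", "Execution"], "subcats": ["Ensemble Learning"]},
    {"tactics": ["Initial Access"], "subcats": ["Core CNN Architectures", "Ensemble Learning"]},
]


def cooccurrence(papers):
    """Subcategory -> Counter of tactics, counting each paper at most once per cell."""
    matrix = defaultdict(Counter)
    for paper in papers:
        for sub in set(paper["subcats"]):
            # set() stops repeated labels (e.g. several DoS variants) inflating a cell.
            for tactic in set(paper["tactics"]):
                matrix[sub][tactic] += 1
    return matrix


def hottest(matrix, subcat, k=3):
    """Top-k (tactic, count) pairs for one subcategory; ties broken alphabetically."""
    return sorted(matrix[subcat].items(), key=lambda kv: (-kv[1], kv[0]))[:k]


m = cooccurrence(papers)
print(hottest(m, "Ensemble Learning"))
# [('Impact', 2), ('Execution', 1), ('Initial Access', 1)]
```

Counting papers rather than attack labels prevents datasets with many DoS variants from inflating the Impact column on their own. Label-level counting would answer a different question.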

Interpretation of Tactics × ML Subcategories
6.4. Techniques × ML Subcategories
- The DoS/DDoS family shows the highest co-occurrence. Network Denial of Service co-occurs most often with Ensembles (18), LSTM (17), Feedforward (17), and CNN (12), with Statistical Models (11) also notable. Endpoint Denial of Service exhibits a similar spread, with Ensembles and LSTM at 13 each and Feedforward at 12.
- Perimeter techniques are broadly covered. Exploit Public-Facing Application is strongly represented across Ensembles (14), LSTM (12), Feedforward (11), and CNN (10). Active Scanning shows a balanced profile, with Ensembles (12), LSTM (11), CNN (10), and Hybrid (8), indicating that reconnaissance is modelled with both temporal and structural features.
- Protocol and host-information signals skew simpler. Application Layer Protocol (Ensembles 13; CNN 9; Feedforward 8) and Gather Victim Host Information (Feedforward 11; LSTM 10; Optimization 8) suggest that tabular/engineered features remain competitive where semantics are well captured by aggregate statistics or handcrafted indicators.
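Appendix A maps many labels to sub-techniques, for example T1498.001 and T1498.002. As a result, technique-level cells such as Network Denial of Service depend on whether sub-techniques are rolled up to their parent before counting. The sketch below assumes the roll-up is done per paper. Under that rule, a paper coded with two Network Denial of Service sub-techniques still adds only one to the parent cell. The paper records, the `DOS_FAMILY` grouping, and the function names are illustrative assumptions.

```python
import re
from collections import Counter

# ATT&CK technique IDs: "T1498" for a technique, "T1498.001" for a sub-technique.
TECHNIQUE_ID = re.compile(r"^(T\d{4})(?:\.\d{3})?$")

# Illustrative grouping (an assumption): the DoS/DDoS family as its two ATT&CK parents.
DOS_FAMILY = {"T1498", "T1499"}

# Hypothetical papers coded with technique IDs and ML subcategories.
papers = [
    {"techniques": ["T1498.001", "T1498.002"], "subcats": {"Ensemble Learning"}},
    {"techniques": ["T1499.003", "T1190"], "subcats": {"Ensemble Learning", "LSTM and Variants"}},
    {"techniques": ["T1595"], "subcats": {"LSTM and Variants"}},
]


def parent_id(technique_id):
    """Strip a sub-technique suffix, e.g. 'T1498.001' -> 'T1498'."""
    match = TECHNIQUE_ID.match(technique_id)
    if match is None:
        raise ValueError(f"not an ATT&CK technique ID: {technique_id!r}")
    return match.group(1)


def parent_counts(papers, subcat):
    """Parent-technique counts for one subcategory, rolled up per paper."""
    counts = Counter()
    for paper in papers:
        if subcat in paper["subcats"]:
            # The set comprehension deduplicates sibling sub-techniques within a paper.
            counts.update({parent_id(t) for t in paper["techniques"]})
    return counts


def family_papers(papers, subcat, family):
    """Papers using `subcat` that cover at least one technique in `family`."""
    return sum(
        1
        for paper in papers
        if subcat in paper["subcats"] and {parent_id(t) for t in paper["techniques"]} & family
    )


print(parent_counts(papers, "Ensemble Learning")["T1498"])     # 1, not 2
print(family_papers(papers, "Ensemble Learning", DOS_FAMILY))  # 2
```

If counts were summed per sub-technique first and aggregated afterwards, the first paper would contribute twice to T1498. This is one reason technique-level figures are sensitive to the counting rule.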

Interpretation of Techniques × ML Subcategories
6.5. What the Heatmaps Mean for Research and Practice
6.6. Cyberattacks × ML × Datasets
6.7. Key Findings and Research Gaps
7. Gaps and Limitations
7.1. Coverage Gaps Across ATT&CK Tactics and Techniques
7.2. Methodological Limitations in Model Development and Evaluation
7.3. Dataset-Specific Gaps
7.4. Limitations
8. Future Research Directions
8.1. Develop Multi-Domain and Multi-Stage Benchmarking Datasets
8.2. Generate Fine-Grained Datasets Aligned with ATT&CK TTPs
8.3. Expand Methodological Horizons Beyond Standard Models
- Graph-based learning for modelling host-process-identity relationships, naturally capturing lateral movement or privilege escalation.
- Long-context sequence models (e.g., Transformers) for detecting low-and-slow behaviours.
- Self-/semi-supervised pretraining on large unlabelled telemetry for better generalisation.
- Reinforcement learning for adaptive detection and proactive defense.
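The sketch below illustrates the host-identity relationships that graph-based learners would operate on. It builds a directed host graph from authentication events, lists the hosts reachable from a starting machine, and flags hosts where an outgoing account differs from every incoming one. That account switch is a simple proxy for a credential pivot during lateral movement. The sketch is a hand-written precursor, not a graph neural network. The event tuples, host names, and function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical authentication events: (account, source_host, destination_host).
events = [
    ("alice", "ws-01", "fs-01"),
    ("alice", "fs-01", "db-01"),
    ("svc-backup", "db-01", "dc-01"),
    ("bob", "ws-02", "fs-01"),
]


def build_graph(events):
    """Directed host graph; each edge records the accounts that traversed it."""
    graph = defaultdict(lambda: defaultdict(set))
    for account, src, dst in events:
        graph[src][dst].add(account)
    return graph


def reachable(graph, start):
    """Hosts reachable from `start` by following observed logons (breadth-first search)."""
    seen, queue = {start}, deque([start])
    while queue:
        host = queue.popleft()
        for nxt in graph.get(host, {}):  # .get avoids creating empty entries
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def account_switches(graph):
    """Hosts entered with some accounts and left with a different one (possible pivot)."""
    inbound = defaultdict(set)
    for dsts in graph.values():
        for dst, accounts in dsts.items():
            inbound[dst] |= accounts
    pivots = {}
    for host, dsts in graph.items():
        new_accounts = set().union(*dsts.values()) - inbound.get(host, set())
        if inbound.get(host) and new_accounts:
            pivots[host] = sorted(new_accounts)
    return pivots


graph = build_graph(events)
print(sorted(reachable(graph, "ws-01")))  # ['db-01', 'dc-01', 'fs-01']
print(account_switches(graph))            # {'db-01': ['svc-backup']}
```

A learned graph model would replace these hand-coded rules with node and edge representations. The point here is only that the multi-hop path and the identity change at `db-01` are explicit in a graph but scattered across rows in a flat feature table.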
8.4. Address Under-Represented Critical Domains
8.5. Promote Heterogeneity and Co-Occurrence in Evaluation
8.6. Advance Data Expansion, Domain Adaptation, and Drift-Resilience
8.7. Key Research Priorities
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Mapping of Identified Cyberattacks to MITRE ATT&CK Tactics and Techniques
| Attack/Keyword | Tactic | Technique | Technique ID |
|---|---|---|---|
| Reconnaissance | |||
| Vulnerability Scanning | Reconnaissance | Active Scanning: Vulnerability Scanning | T1595.002 |
| mscan | Reconnaissance | Active Scanning: Scanning IP Blocks | T1595.001 |
| saint | Reconnaissance | Active Scanning: Vulnerability Scanning | T1595.002 |
| portsweep | Reconnaissance | Active Scanning: Scanning IP Blocks | T1595.001 |
| satan | Reconnaissance | Active Scanning: Vulnerability Scanning | T1595.002 |
| ipsweep | Reconnaissance | Active Scanning: Scanning IP Blocks | T1595.001 |
| nmap | Reconnaissance | Active Scanning | T1595 |
| PortScan | Reconnaissance | Active Scanning | T1595 |
| PortScan OS | Reconnaissance | Gather Victim Host Information: Client Configurations | T1592.004 |
| Reconnaissance | Reconnaissance | Active Scanning, Gather Victim Host Information | T1595, T1592 |
| OS Fingerprinting | Reconnaissance | Gather Victim Host Information: Client Configurations | T1592.004 |
| Probe | Reconnaissance | Active Scanning | T1595 |
| OS Scanning | Reconnaissance | Gather Victim Host Information: Client Configurations | T1592.004 |
| Scanning | Reconnaissance | Active Scanning | T1595 |
| Browsing job-hunting or competitor websites | Reconnaissance | Gather Victim Org Information | T1591 |
| Information Gathering | Reconnaissance | Gather Victim Host Information | T1592 |
| SYN Scan | Reconnaissance | Active Scanning | T1595 |
| TCP Connect Scan | Reconnaissance | Active Scanning | T1595 |
| UDP Scan | Reconnaissance | Active Scanning | T1595 |
| Network Sweep (IP range scanning) | Reconnaissance | Active Scanning | T1595 |
| Mirai–Junk/Scan | Reconnaissance | Active Scanning | T1595 |
| Bashlite–Junk/Scan | Reconnaissance | Active Scanning | T1595 |
| Service Scan | Reconnaissance | Active Scanning | T1595 |
| pingsweep | Reconnaissance | Active Scanning | T1595 |
| Resource Development | |||
| Domain Abuse | Resource Development | Acquire Infrastructure: Domains | T1583.001 |
| Malicious Domain | Resource Development | Acquire Infrastructure: Domains | T1583.001 |
| Account creation for fake reviews | Resource Development | Establish Accounts: Social Media Accounts | T1585.001 |
| Phishing Kits | Resource Development | Develop Capabilities | T1587 |
| Hosting Infra | Resource Development | Acquire Infrastructure: Web Services | T1583.006 |
| Botnet fake account | Resource Development | Establish Accounts: Social Media Accounts | T1585.001 |
| Sybil Attack | Resource Development | Compromise Accounts | T1586 |
| Initial Access | |||
| sendmail | Initial Access | Exploit Public-Facing Application | T1190 |
| named | Initial Access | Exploit Public-Facing Application | T1190 |
| ftp_write | Initial Access | Valid Accounts | T1078 |
| Web Attack–SQL Injection | Initial Access | Exploit Public-Facing Application | T1190 |
| LDAP Injection | Initial Access | Exploit Public-Facing Application | T1190 |
| XPath Injection | Initial Access | Exploit Public-Facing Application | T1190 |
| mysql | Initial Access | Valid Accounts | T1078 |
| Unintentional Illegal Requests | Initial Access | Exploit Public-Facing Application | T1190 |
| sqlattack | Initial Access | Exploit Public-Facing Application | T1190 |
| SSI Injection | Initial Access | Exploit Public-Facing Application | T1190 |
| phf | Initial Access | Exploit Public-Facing Application | T1190 |
| Exploits | Initial Access | Exploit Public-Facing Application | T1190 |
| Phishing Email | Initial Access | Phishing | T1566 |
| Infiltration | Initial Access | Phishing | T1566 |
| CVE-2017-5638 (Struts2) | Initial Access | Exploit Public-Facing Application | T1190 |
| proftpd | Initial Access | Exploit Public-Facing Application | T1190 |
| apache-struts | Initial Access | Exploit Public-Facing Application | T1190 |
| Spam Email | Initial Access | Phishing: Spearphishing Attachment | T1566.001 |
| Spam | Initial Access | Phishing | T1566 |
| Phishing | Initial Access | Phishing | T1566 |
| Fake Pages | Initial Access | Phishing | T1566 |
| Phishing Site Deployment | Initial Access | Phishing: Spearphishing Link | T1566.002 |
| AI-generated phishing URLs | Initial Access | Phishing: Spearphishing Link | T1566.002 |
| Payload via Email | Initial Access | Phishing: Spearphishing Attachment | T1566.001 |
| Telnet exploit | Initial Access | Exploit Public-Facing Application | T1190 |
| Generic (Generic Exploits) | Initial Access | Exploit Public-Facing Application | T1190 |
| Code Red Worm | Initial Access | Exploit Public-Facing Application | T1190 |
| Parameter Tampering | Initial Access | Exploit Public-Facing Application | T1190 |
| Path Traversal | Initial Access | Exploit Public-Facing Application | T1190 |
| Opportunistic Service Attack (OSA) | Initial Access | Exploit Public-Facing Application | T1190 |
| Replay Attacks | Initial Access | Valid Accounts | T1078 |
| Spearphishing | Initial Access | Phishing: Spearphishing Attachment | T1566.001 |
| S7 unauthorized access | Initial Access | Exploit Public-Facing Application | T1190 |
| Unauthorized access to HMI or SCADA | Initial Access | Valid Accounts | T1078 |
| Execution | |||
| Web Attack–XSS | Execution | Command and Scripting Interpreter: JavaScript | T1059.007 |
| Server-Side Include | Execution | Server Software Component | T1505 |
| loadmodule | Execution | Process Injection: Dynamic-Link Library Injection | T1055.001 |
| Web Attack-Command Injection | Execution | Command and Scripting Interpreter | T1059 |
| perl | Execution | Command and Scripting Interpreter | T1059 |
| xterm | Execution | Command and Scripting Interpreter | T1059 |
| Shellcode | Execution | Exploitation for Client Execution | T1203 |
| OS Command Execution | Execution | Command and Scripting Interpreter | T1059 |
| Downloaders/Droppers | Execution | Ingress Tool Transfer | T1105 |
| S7 command injection | Execution | Manipulation of Control | T0831 |
| Command injection to PLC or RTU | Execution | Command and Scripting Interpreter | T1059 |
| warezmaster | Execution | Command and Scripting Interpreter | T1059 |
| JavaMeterpreter | Execution | Command and Scripting Interpreter: JavaScript | T1059.007 |
| Windows-RCE | Execution | Exploitation for Client Execution | T1203 |
| Trojan | Execution | User Execution | T1204 |
| Installing unauthorized software | Execution | User Execution: Malicious File | T1204.002 |
| Unauthorized command via MODBUS | Execution | Command and Scripting Interpreter | T1059 |
| Viruses | Execution | User Execution | T1204 |
| Fileless Malware | Execution | Command and Scripting Interpreter: PowerShell | T1059.001 |
| Allaple.A/L | Execution | User Execution | T1204 |
| Agent.FYI | Execution | Command and Scripting Interpreter | T1059 |
| Dialplatform.B | Execution | User Execution | T1204 |
| Instantaccess | Execution | User Execution | T1204 |
| VB.AT | Execution | Command and Scripting Interpreter | T1059 |
| Yuner.A | Execution | Command and Scripting Interpreter | T1059 |
| Gatak | Execution | Command and Scripting Interpreter | T1059 |
| Lollipop | Execution | Command and Scripting Interpreter | T1059 |
| Tracur | Execution | User Execution | T1204 |
| Simda | Execution | Command and Scripting Interpreter | T1059 |
| Htbot | Execution | Command and Scripting Interpreter | T1059 |
| Miuref | Execution | Command and Scripting Interpreter | T1059 |
| Neris | Execution | Command and Scripting Interpreter | T1059 |
| Kenjiro | Execution | Command and Scripting Interpreter | T1059 |
| Hide and Seek | Execution | Command and Scripting Interpreter | T1059 |
| Mirai | Execution | Command and Scripting Interpreter | T1059 |
| Persistence | |||
| Wintrim.BX | Persistence | Boot or Logon Autostart Execution: Registry Run Keys/Startup Folder | T1547.001 |
| Autorun.K | Persistence | Boot or Logon Autostart Execution | T1547 |
| Murlo (TCP-based backdoor) | Persistence | Create or Modify System Process | T1543 |
| Browser Hijacking | Persistence | Software Extensions: Browser Extensions | T1176.001 |
| Uploading Attack | Persistence | Server Software Component: Web Shell | T1505.003 |
| Modification of control logic | Persistence | Event Triggered Execution | T1546 |
| Malware with Persistence via Registry | Persistence | Boot or Logon Autostart Execution: Registry Run Keys/Startup Folder | T1547.001 |
| Alueron.gen!J | Persistence | Boot or Logon Autostart Execution | T1547 |
| Dontovo.A | Persistence | Boot or Logon Autostart Execution | T1547 |
| Lolyda.AA1/AA2/AA3/AT | Persistence | Boot or Logon Autostart Execution | T1547 |
| Malex.gen!J | Persistence | Boot or Logon Autostart Execution | T1547 |
| Skintrim.N | Persistence | Boot or Logon Autostart Execution | T1547 |
| Kelihos_ver3 | Persistence | Boot or Logon Autostart Execution | T1547 |
| Kelihos_ver1 | Persistence | Boot or Logon Autostart Execution | T1547 |
| Vundo | Persistence | Boot or Logon Autostart Execution | T1547 |
| Shifu | Persistence | Boot or Logon Autostart Execution | T1547 |
| Torii | Persistence | Boot or Logon Autostart Execution | T1547 |
| Privilege Escalation | |||
| xlock | Privilege Escalation | Exploitation for Privilege Escalation | T1068 |
| Buffer Overflow | Privilege Escalation | Exploitation for Privilege Escalation | T1068 |
| Adduser (Unauthorized) | Privilege Escalation | Create Account | T1136 |
| Defense Evasion | |||
| CRLF Injection | Defense Evasion | Obfuscated Files or Information | T1027 |
| Rootkit | Defense Evasion | Rootkit | T1014 |
| Logging in outside working hours | Defense Evasion | Valid Accounts | T1078 |
| Obfuscator.AD | Defense Evasion | Obfuscated Files or Information | T1027 |
| Obfuscator.ACY | Defense Evasion | Obfuscated Files or Information | T1027 |
| URL Obfuscation | Defense Evasion | Obfuscated Files or Information | T1027 |
| Avoiding spam detection | Defense Evasion | Email Obfuscation/Content Spoofing | T1566/T1027 |
| Log deletion or obfuscation | Defense Evasion | Indicator Removal | T1070 |
| Decreased Rank Attack | Defense Evasion | Impair Defenses | T1562 |
| hash-based malware | Defense Evasion | Obfuscated Files or Information | T1027 |
| Spoofing | Defense Evasion | Modify Authentication Process | T1556 |
| Disabling alarms or event logs | Defense Evasion | Indicator Removal | T1070 |
| AI-generated malware | Defense Evasion | Obfuscated Files or Information | T1027 |
| Obfuscated Malware | Defense Evasion | Obfuscated Files or Information | T1027 |
| Polymorphic malware | Defense Evasion | Obfuscated Files or Information | T1027 |
| Metamorphic malware | Defense Evasion | Obfuscated Files or Information | T1027 |
| Packed malware | Defense Evasion | Obfuscated Files or Information: Software Packing | T1027.002 |
| Fakerean | Defense Evasion | Obfuscated Files or Information | T1027 |
| Swizzor.gen!E/I | Defense Evasion | Obfuscated Files or Information | T1027 |
| Nsis-ay | Defense Evasion | Obfuscated Files or Information | T1027 |
| Credential Access | |||
| imap | Credential Access | Exploitation of Remote Services | T1210 |
| Infiltration-mitm | Credential Access | Adversary-in-the-Middle | T1557 |
| MITM | Credential Access | Adversary-in-the-Middle | T1557 |
| Telnet remote access attempts | Credential Access | Brute Force | T1110 |
| xsnoop | Credential Access | Input Capture | T1056 |
| spy | Credential Access | Input Capture | T1056 |
| Keylogger | Credential Access | Input Capture: Keylogging | T1056.001 |
| Web Brute Force | Credential Access | Brute Force | T1110 |
| guess_passwd | Credential Access | Brute Force | T1110 |
| FTP-Patator | Credential Access | Brute Force | T1110 |
| SSH-Patator | Credential Access | Brute Force | T1110 |
| SSH Brute Force | Credential Access | Brute Force | T1110 |
| SSH Attack | Credential Access | Brute Force | T1110 |
| RDP Brute Force | Credential Access | Brute Force | T1110 |
| Password Cracking | Credential Access | Brute Force: Password Cracking | T1110.002 |
| RTSP Brute Force | Credential Access | Brute Force | T1110 |
| Credential Harvesting | Credential Access | Input Capture/Phishing | T1056/T1566 |
| SFTP attack | Credential Access | Brute Force | T1110 |
| Brute Force | Credential Access | Brute Force | T1110 |
| Heartbleed | Credential Access | Exploitation for Credential Access | T1212 |
| Hydra-FTP (FTP brute-force attacks) | Credential Access | Brute Force: Password Guessing | T1110.001 |
| Hydra-SSH (SSH brute-force attacks) | Credential Access | Brute Force: Password Guessing | T1110.001 |
| Bashlite–brute | Credential Access | Brute Force: Password Guessing | T1110.001 |
| DNS Spoofing | Credential Access | Adversary-in-the-Middle | T1557 |
| Dictionary BruteForce | Credential Access | Brute Force: Password Guessing | T1110.001 |
| Wormhole Attack (WHA) | Credential Access | Adversary-in-the-Middle | T1557 |
| Hijacking or spoofing PLC communications | Credential Access | Adversary-in-the-Middle | T1557 |
| Evil Twin Attacks | Credential Access | Adversary-in-the-Middle | T1557 |
| Ramnit | Credential Access | Credentials from Password Stores | T1555 |
| Cridex | Credential Access | Credentials from Password Stores | T1555 |
| Zeus | Credential Access | Credentials from Password Stores | T1555 |
| Tinba | Credential Access | Credentials from Password Stores | T1555 |
| Discovery | |||
| Fuzzers Attack | Discovery | Network Service Discovery | T1046 |
| snmpguess | Discovery | Network Service Discovery | T1046 |
| ps | Discovery | Process Discovery | T1057 |
| snmpgetattack | Discovery | Network Service Discovery | T1046 |
| Analysis | Discovery | Network Sniffing | T1040 |
| Sniffing Attacks | Discovery | Network Sniffing | T1040 |
| ARP Spoofing | Discovery | Network Sniffing | T1040 |
| Host Discovery | Discovery | Remote System Discovery | T1018 |
| Sinkhole Attack (SHA) | Discovery | Network Sniffing | T1040 |
| Lateral Movement | |||
| smb-exploit | Lateral Movement | Exploitation of Remote Services | T1210 |
| Worms | Lateral Movement | Remote Services | T1021 |
| W32.Blaster Worm | Lateral Movement | Exploitation of Remote Services | T1210 |
| Reaper Worm | Lateral Movement | Exploitation of Remote Services | T1210 |
| Virut (Malware propagation) | Lateral Movement | Replication Through Removable Media | T1091 |
| Conficker | Lateral Movement | Exploitation of Remote Services | T1210 |
| Hakai | Lateral Movement | Remote Services | T1021 |
| Muhstik | Lateral Movement | Remote Services | T1021 |
| Collection | |||
| File Disclosure | Collection | Data from Local System | T1005 |
| Searching for sensitive documents | Collection | Data from Local System | T1005 |
| Spyware | Collection | Input Capture | T1056 |
| ARP MitM | Collection | Adversary-in-the-Middle | T1557 |
| Active Wiretap | Collection | Adversary-in-the-Middle | T1557 |
| Accessing sensitive files repeatedly | Collection | Data from Local System | T1005 |
| Printing sensitive information | Collection | Data from Local System | T1005 |
| Infostealers | Collection | Input Capture | T1056 |
| Adialer.C | Collection | Input Capture | T1056 |
| Alueron.gen!J | Collection | Ingress Tool Transfer | T1105 |
| Command and Control | |||
| Backdoor | Command and Control | Remote Access Tools | T1219 |
| httptunnel | Command and Control | Application Layer Protocol: Web Protocols | T1071.001 |
| multihop | Command and Control | Proxy: Multi-hop Proxy | T1090.003 |
| warezclient | Command and Control | Application Layer Protocol | T1071 |
| Botnet ARES | Command and Control | Application Layer Protocol: Web Protocols | T1071.001 |
| Spam botnet | Command and Control | Application Layer Protocol | T1071 |
| Meterpreter (Metasploit post-exploitation activity) | Command and Control | Application Layer Protocol | T1071 |
| IRC Botnet | Command and Control | Application Layer Protocol | T1071 |
| HTTP Botnet | Command and Control | Application Layer Protocol | T1071 |
| DNS Tunneling | Command and Control | Application Layer Protocol: DNS | T1071.004 |
| Botnet | Command and Control | Application Layer Protocol | T1071 |
| Neris | Command and Control | Application Layer Protocol | T1071 |
| Rbot | Command and Control | Application Layer Protocol | T1071 |
| Menti (P2P botnet) | Command and Control | Non-Application Layer Protocol | T1095 |
| Sogou (HTTP-based C2) | Command and Control | Application Layer Protocol: Web Protocols | T1071.001 |
| NSIS.ay (Downloader trojan) | Command and Control | Ingress Tool Transfer | T1105 |
| Remote Access Trojan (RAT) | Command and Control | Remote Access Tools | T1219 |
| Use of anonymizing tools | Command and Control | Proxy: Multi-hop Proxy | T1090.003 |
| Command & Control using HTTP | Command and Control | Application Layer Protocol: Web Protocols | T1071.001 |
| C2LOP.P/gen!g | Command and Control | Non-Application Layer Protocol | T1095 |
| Rbot!gen | Command and Control | Ingress Tool Transfer | T1105 |
| Geodo | Command and Control | Ingress Tool Transfer | T1105 |
| Gafgyt | Command and Control | Ingress Tool Transfer | T1105 |
| Linux.Hajime | Command and Control | Ingress Tool Transfer | T1105 |
| Exfiltration | |||
| Sending confidential info to competitors | Exfiltration | Exfiltration Over Web Service | T1567 |
| Data Exfiltration | Exfiltration | Exfiltration Over Command and Control Channel | T1041 |
| Data Exfiltration Tools | Exfiltration | Exfiltration Over Web Service | T1567 |
| Uploading to personal email/cloud | Exfiltration | Exfiltration Over Web Service: Exfiltration to Cloud Storage | T1567.002 |
| Use of removable media | Exfiltration | Exfiltration Over Physical Medium: Exfiltration over USB | T1052.001 |
| Impact | |||
| mailbomb | Impact | Email Bombing | T1667 |
| smurf (ICMP amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| neptune | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| back | Impact | Service Stop | T1489 |
| teardrop | Impact | Endpoint Denial of Service: Application or System Exploitation | T1499.004 |
| pod (Ping of Death) | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| land DoS | Impact | Endpoint Denial of Service: OS Exhaustion Flood | T1499.001 |
| apache2 | Impact | Endpoint Denial of Service: Service Exhaustion Flood | T1499.002 |
| processtable | Impact | Endpoint Denial of Service: OS Exhaustion Flood | T1499.001 |
| udpstorm | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| Packet fragmentation attack | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| UDP Fragmentation | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| ACK Fragmentation | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| RSTFIN Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| PSHACK Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| ICMP Fragmentation | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| SynonymousIP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| SYN Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| UDP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| ICMP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| HTTP Flood | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| Apache Range Header | Impact | Endpoint Denial of Service: Application or System Exploitation | T1499.004 |
| Slow POST | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| HTTP/2 Rapid Reset | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| GraphQL Overload | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| API Parameter Abuse | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| WS Amplification | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| DoS GoldenEye | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| DoS Hulk | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| DoS Slowhttptest | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| DoS Slowloris | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| DoS | Impact | Network Denial of Service | T1498 |
| DDoS | Impact | Network Denial of Service | T1498 |
| DDoSsim | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| DDoS LOIC-UDP | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| DDoS LOIC-HTTP | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| DDoS HOIC | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| DDoS Bot | Impact | Network Denial of Service | T1498 |
| DDoS Stomp | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| DDoS DYN | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| DDoS TCP | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| DNS (DNS amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| LDAP (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| MSSQL (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| NetBIOS (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| NTP (NTP amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| Portmap (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| SNMP (SNMP amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| CLDAP Reflection | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| SSDP (SSDP amplification) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| UDP (Generic UDP flood) | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| UDPLag (UDP with response lag) | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| TFTP (UDP flood) | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| WebDDoS (HTTP flood) | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| Memcached | Impact | Network Denial of Service: Reflection Amplification | T1498.002 |
| Mirai–TCP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| Mirai–UDP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| Mirai–HTTP Flood | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| Mirai-GREIP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| Mirai-Greeth Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| Mirai-UDPPlain | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| Bashlite–TCP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| Bashlite–UDP Flood | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| Bashlite–HTTP Flood | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| Bashlite–ACK/other floods | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |
| SSL Renegotiation DoS | Impact | Endpoint Denial of Service: Application Exhaustion Flood | T1499.003 |
| Flooding Attack | Impact | Network Denial of Service | T1498 |
| DODAG Version Number Attack | Impact | Endpoint Denial of Service: Application or System Exploitation | T1499.004 |
| Ransomware | Impact | Data Encrypted for Impact | T1486 |
| Blackhole Attack | Impact | Network Denial of Service | T1498 |
| Video Injection | Impact | Defacement | T1491 |
| Gear Spoofing Attack | Impact | Data Manipulation | T1565 |
| RPM Spoofing Attack | Impact | Data Manipulation | T1565 |
| False Data Injection Attack | Impact | Data Manipulation | T1565 |
| Tolerable FDIA | Impact | Data Manipulation | T1565 |
| Posting fake reviews | Impact | Data Manipulation | T1565 |
| Opinion Spam (Fake Review Attack) | Impact | Data Manipulation | T1565 |
| Disinformation/fake content | Impact | Data Manipulation | T1565 |
| Coordinated campaign (review farms) | Impact | Data Manipulation | T1565 |
| Targeting businesses’ reputation | Impact | Data Manipulation | T1565 |
| Review flooding | Impact | Data Manipulation | T1565 |
| Malicious actuator control | Impact | Endpoint Denial of Service | T1499 |
| PTP Attack (time sync manipulation) | Impact | Data Manipulation | T1565 |
| De-authentication DoS | Impact | Endpoint Denial of Service | T1499 |
| Fake Landing (tricking the UAV into landing) | Impact | Data Manipulation | T1565 |
| Adware | Impact | Resource Hijacking | T1496 |
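Section 2.7 lists consistency checking among the review's procedures. In that spirit, a lightweight structural check can be run over rows of the mapping table above. A sub-technique ID (Txxxx.yyy) should carry a "Parent: Sub-technique" name, and a parent ID should not. The sketch below is an assumption about how such a check might be automated, not the procedure used in this review. It catches only structural mismatches. Confirming that each name matches its ID would require a lookup against the official ATT&CK data. The example rows are hypothetical.

```python
import re

SUB_ID = re.compile(r"^T\d{4}\.\d{3}$")
PARENT_ID = re.compile(r"^T\d{4}$")


def check_row(row):
    """Return structural problems for one '| label | tactic | technique | ID |' row."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    # Skip tactic separator rows ('| Impact | |||'), the header, and the '---' rule.
    if len(cells) != 4 or not re.match(r"T\d", cells[3]):
        return []
    label, _tactic, technique, tid = cells
    if SUB_ID.match(tid):
        if ":" not in technique:
            return [f"{label}: sub-technique ID {tid} lacks a 'Parent: Sub-technique' name"]
    elif PARENT_ID.match(tid):
        if ":" in technique:
            return [f"{label}: parent ID {tid} paired with a sub-technique name"]
    else:
        return [f"{label}: composite or non-standard ID {tid!r}, review manually"]
    return []


# Hypothetical rows for illustration only.
rows = [
    "| Example A | Credential Access | Brute Force | T1110.001 |",
    "| Example B | Impact | Network Denial of Service: Direct Network Flood | T1498.001 |",
    "| Example C | Discovery | Remote System Discovery: Extra | T1018 |",
    "| Impact | |||",
]
for row in rows:
    for problem in check_row(row):
        print(problem)
```

Composite entries such as "T1595, T1592" are reported for manual review rather than rejected. The table deliberately uses them for labels that span more than one technique.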
Appendix B. Reviewed Datasets Information and Refined Taxonomy
| Dataset | Category | Year | Normal | Attack | Metadata | Format | Count | Duration | Kind | Network/Domain | Complete | Predefined Splits | Balanced | Labeled | Classes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KDD Cup 99 [55] | NIDD | 1999 | yes | yes | no | other | 5 M | - | emulated | small | yes | yes | no | yes | 4 |
| NSL-KDD [20] | NIDD | 2009 | yes | yes | no | other | 148,517 | - | emulated | small | yes | yes | no | yes | 4 |
| UNSW-NB15 [21] | NIDD | 2015 | yes | yes | yes | packet, other | 2,540,044 | 31 h | emulated | small | yes | yes | no | yes | 9 |
| CSE-CIC-IDS2017 [22] | NIDD | 2017 | yes | yes | yes | packet, bi.flow | 3,100,000 | 5 days | emulated | small | yes | no | no | yes | 9 |
| CSE-CIC-IDS2018 [56] | NIDD | 2018 | yes | yes | yes | packet, bi.flow | 16,000,000 | 10 days | emulated | small | yes | no | no | yes | 15 |
| CIC-DDoS2019 [57] | NIDD | 2019 | yes | yes | yes | packet, bi.flow | 50,000,000 | 5 days | emulated | small | yes | no | no | yes | 11 |
| LITNET-2020 [58] | NIDD | 2020 | yes | yes | yes | flow, packet | 50,000,000 | months | real | large ISP | yes | no | no | yes | 2 |
| NetML-2020 [59] | NIDD | 2020 | yes | yes | yes | flow | 3,000,000 | days | emulated | small | yes | no | yes | yes | 10 |
| 5G-NIDD [60] | NIDD | 2021 | yes | yes | yes | flow | 15,000 | hours | emulated | 5G wireless | yes | no | yes | yes | 2 |
| FLNET2023 [61] | NIDD | 2023 | yes | yes | yes | flow | 6,000,000 | 24 h | real + emulated | various | yes | no | no | yes | 11 |
| NGIDS-DS [62] | NIDD | 2022 | yes | yes | yes | flow | 20,000,000 | days | emulated | small | yes | no | no | yes | 9 |
| CAIDA 2007 [63] | NIDD | 2007 | no | yes | yes | packet | huge | minutes | real | various | partial | no | no | no | - |
| BoNeSi Dataset [64] | NIDD | 2010 | no | yes | no | packet | 100,000 | minutes | emulated | lab | no | no | no | no | - |
| DDoSDB [65] | NIDD | since 2020 | varies | yes | yes | packet | - | - | real + synthetic | various | yes | no | no | no | - |
| App Layer DoS [66] | NIDD | 2017 | yes | yes | yes | flow | 1,000,000 | 8 h | emulated | small | yes | no | no | yes | 4 |
| CSIC HTTP 2010 [67] | NIDD | 2010 | no | yes | yes | HTTP logs | 67,000 | sessions | emulated | application-level | yes | yes | yes | yes | 2 |
| ECML/PKDD 2007 [68] | NIDD | 2007 | no | yes | yes | session | 600,000 | weeks | real | telecom | partial | yes | yes | yes | 2 |
| NPS-2009-Casper-Rw [69] | NIDD | 2009 | no | yes | yes | packet, flow | 1,000,000 | hours | emulated | small | no | no | no | no | - |
| NCC Dataset [70] | NIDD | 2022 | yes | yes | yes | flow | 5,000,000 | hours | real + emulated | huge | yes | no | no | yes | 14 |
| NCC-2 Dataset [71] | NIDD | 2023 | yes | yes | yes | flow | 10,000,000 | hours | real + emulated | huge | yes | yes | no | yes | 18 |
| InSDN Dataset [72] | NIDD | 2022 | yes | yes | yes | flow | 4,000,000 | hours | emulated | SDN | yes | no | yes | yes | 10 |
| Benign and Malicious [73] | NIDD | 2021 | yes | yes | yes | other | 90,000 | - | real | various | no | - | yes | yes | 2 |
| CTU-13 Dataset [74] | NIDD | 2011 | yes | yes | yes | packet | 13 scenarios | days | real + emulated | botnet traffic | yes | no | no | yes | 13 |
| USTC-TFC2016 [75] | NIDD | 2017 | yes | yes | yes | packet | 750,000 | hours | real | malware dataset | yes | no | no | yes | 10 |
| N-BaIoT [76] | IoT-NIDD | 2018 | yes | yes | yes | flow | 100,000 | days | emulated | IoT | yes | no | no | yes | 2 |
| BoT-IoT [9] | IoT-NIDD | 2018 | yes | yes | yes | flow | 70,000,000 | days | emulated | IoT | yes | yes | no | yes | 5 |
| IoTPOT [77] | IoT-NIDD | 2015 | no | yes | yes | packet | - | weeks | real | honeypot | partial | no | no | no | - |
| ToN-IoT [23] | IoT-NIDD | 2020 | yes | yes | yes | flow, syslogs | 25,000,000 | days | emulated + real | IoT | yes | yes | yes | yes | 9 |
| IoT-23 [78] | IoT-NIDD | 2020 | yes | yes | yes | flow | 20,000,000 | days | emulated + real | IoT | yes | no | no | yes | 10 |
| Edge-IIoTset [79] | IoT-NIDD | 2022 | yes | yes | yes | flow | 2,000,000 | days | emulated | edge IoT | yes | no | yes | yes | 15 |
| CIC-IoT2022 [80] | IoT-NIDD | 2022 | yes | yes | yes | flow, packet | 4,000,000 | days | emulated | small IoT lab | yes | no | no | yes | 6 |
| CICIoT-2023 [81] | IoT-NIDD | 2023 | yes | yes | yes | flow, packet | 10,000,000 | days | emulated | IoT/5G hybrid | yes | no | no | yes | 8 |
| UNSW IoT Traffic [82] | IoT-NIDD | 2019 | yes | yes | yes | flow | 1,000,000 | hours | emulated | IoT | yes | no | yes | yes | 10 |
| Distributed IoT [83] | IoT-NIDD | 2021 | yes | yes | yes | flow | 3,000,000 | hours | emulated | IoT | yes | no | yes | yes | 10 |
| ROUT-4-2023 [84] | IoT-NIDD | 2023 | yes | yes | yes | flow | 2,000,000 | hours | emulated | hybrid SDN & IoT | yes | no | no | yes | 9 |
| Kitsune Dataset [85] | IoT-NIDD | 2018 | yes | yes | yes | packet, flow | 100,000,000 | days | emulated | IoT & smart home | yes | no | no | yes | 21 |
| Wi-Fi Dataset [86] | IoT-NIDD | 2022 | yes | yes | yes | flow | 1,000,000 | hours | emulated | Wi-Fi lab | yes | no | yes | yes | 7 |
| HCRL CAN [87] | IoT-NIDD | 2020 | yes | yes | yes | other, packet | 4,500,000 | hours | real + synthetic | vehicle CAN bus | yes | no | no | yes | 5 |
| HCRL Car Hacking [88] | IoT-NIDD | 2020 | yes | yes | yes | other | 4,300,000 | 40 min | real | vehicle CAN bus | yes | no | no | yes | 5 |
| Malimg [89] | Malware | 2011 | no | yes | yes | images | 9339 | - | static | malware dataset | yes | no | no | yes | 25 |
| BIG 2015 [90] | Malware | 2015 | no | yes | yes | binaries | 10 GB files | - | static | malware dataset | yes | yes | no | yes | 9 |
| MaleVis [91] | Malware | 2020 | no | yes | yes | images | 14,226 | - | static | malware dataset | yes | no | yes | yes | 26 |
| Malicia [92] | Malware | 2013 | no | yes | yes | binaries | 11,668 | - | static | malware dataset | yes | no | no | yes | 2 |
| Drebin project [93] | Malware | 2014 | no | yes | yes | APK files | 129,013 | - | static | mobile malware | yes | yes | yes | yes | 2 |
| VX-Heavens [94] | Malware | since 2010 | no | yes | limited | binaries | 30,000 | - | static | malware dataset | no | no | no | no | - |
| VirusShare [95] | Malware | since 2010 | no | yes | limited | binaries | - | - | static | malware dataset | no | no | no | no | - |
| VirusTotal [96] | Malware | since 2004 | no | yes | yes | binaries | - | - | static + dynamic | malware dataset | no | no | no | yes | 1 |
| CIC-MalMem-2022 [97] | Malware | 2022 | no | yes | yes | memory | 100,000 | hours | dynamic | malware memory | yes | yes | yes | yes | 6 |
| MemMal-D2024 [98] | Malware | 2024 | no | yes | yes | memory | 100,000 | hours | dynamic | malware memory | yes | yes | yes | yes | 2 |
| CIC-CMD-2024 [99] | Malware | 2024 | yes | yes | yes | flow | 10,000,000 | days | real + emulated | malware dataset | yes | yes | no | yes | - |
| SpamEmail [100] | S&P | 1999 | yes | yes | yes | other | 4601 | - | static | spam emails | - | no | no | yes | 2 |
| SpamClassification [101] | S&P | 2021 | yes | yes | yes | other | 5796 | - | emulated | spam messages | - | yes | no | yes | 2 |
| Email Spam [102] | S&P | 2020 | yes | yes | yes | other | 5172 | - | emulated | spam emails | - | yes | yes | yes | 2 |
| SpamAssassin [103] | S&P | since 2021 | yes | yes | partial | other | 6047 | 1 year | real | spam emails | - | yes | no | yes | 2 |
| Benign Email [104] | S&P | 2013 | yes | no | yes | other | 14,043 | - | real | benign emails | - | no | - | yes | 1 |
| Phishing Email [105] | S&P | 2020 | yes | yes | yes | other | - | - | emulated | phishing emails | - | yes | no | yes | 2 |
| Bot Account [106] | S&P | 2023 | yes | yes | yes | other | 8574 | - | real | social media | no | no | yes | yes | 2 |
| STIX & Curated [107] | S&P | 2015 | no | yes | yes | other | - | - | emulated | Threat Indicators | no | no | - | yes | - |
| Alexa Phishing [108] | S&P | since 2020 | yes | yes | yes | other | 1M+ | - | real | Phishing URLs | no | no | no | yes | 2 |
| PhishTank [109] | S&P | - | no | yes | yes | other | 100K+ | - | real | Phishing URLs | yes | no | no | yes | 2 |
| OpenPhish [110] | S&P | - | no | yes | yes | other | - | - | real | Phishing URLs | no | no | no | yes | 2 |
| Anti-Phishing WG [111] | S&P | - | no | yes | yes | other | - | months | real | Phishing incidents | no | no | - | yes | 2 |
| YelpChi [112] | S&P | 2013 | yes | yes | yes | other | 45,000+ | years | real | reviews | yes | yes | yes | yes | 2 |
| YelpNYC [113] | S&P | 2015 | yes | yes | yes | other | 160,000+ | years | real | reviews | yes | yes | yes | yes | 2 |
| YelpZip [113] | S&P | 2015 | yes | yes | yes | other | 60,000+ | years | real | reviews | yes | yes | yes | yes | 2 |
| Gas Pipeline [54,114] | ICS | 2011 | yes | yes | yes | flow, packet | 100,000 | hours | emulated | ICS network | yes | no | no | yes | 2 |
| SWaT dataset [115] | ICS | 2015 | yes | yes | yes | other | 946,722 | 11 days | real | ICS network | yes | yes | no | yes | 2 |
| Necon-IIUM ICS Dataset [116] | ICS | 2022 | yes | yes | yes | other | 1,500,000 | 7 days | emulated | ICS network | yes | no | no | yes | 5 |
| ERENO IEC-61850 [117] | ICS | 2020 | yes | yes | yes | packet, flow | - | 2 h | real | ICS network | yes | no | no | yes | 5 |
| IEEE 118-bus dataset [118] | ICS | 2001 | yes | yes | yes | other | 118 buses | - | synthetic | ICS network | yes | no | - | yes | 3 |
| IEEE 123-bus dataset [118] | ICS | 1991 | yes | no | yes | other | 123 buses | - | synthetic | ICS network | yes | no | no | yes | 3 |
| IEEE 13-bus dataset [118] | ICS | 1992 | yes | no | yes | other | 13 buses | - | synthetic | ICS network | yes | no | no | yes | 2 |
| IEEE-14-bus dataset [118] | ICS | 2018 | yes | yes | yes | other | 14 buses | 24 h | synthetic | ICS network | yes | no | - | yes | 2 |
| CERT Insider Threat [119] | Insider Threat | 2016 | yes | yes | yes | user logs | 1,000,000 | months | emulated | enterprise | yes | no | no | yes | 2 |
| Udacity dataset [120] | Other | 2016 | yes | no | yes | images | - | - | real | simulation | yes | no | no | yes | - |
| GTSRB [121] | Other | 2011 | - | - | yes | images | 51,839 | - | real | - | - | yes | no | yes | 43 |
| UAVid dataset [122] | Other | 2020 | - | - | yes | images | 3000 | - | real | UAV/Aerial | no | yes | no | yes | 8 |
| ConsumerComplaint [123] | Other | 2018 | - | - | yes | other | 1,200,000 | 8 years | real | - | - | no | no | yes | 10 |
| SpeechCommands [124] | Other | 2017 | - | - | yes | wav | 105,829 | - | real | voice command | - | yes | - | yes | 35 |
| IMDB [125] | Other | 2011 | - | - | yes | other | 50,000 | - | real | - | - | yes | yes | yes | 2 |
| CIFAR-10 [126] | Other | 2009 | - | - | yes | images | 60,000 | - | real | - | - | yes | yes | yes | 10 |
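Summary counts over the taxonomy can be recomputed directly from the table. Examples include the number of datasets in each category, or which datasets are both labeled and ship predefined splits. The sketch below uses only the Python standard library. `DatasetRecord`, `count_by_category`, and `with_labels_and_splits` are hypothetical names. Only a few rows are transcribed for illustration, so the counts the sketch produces are not the review's totals.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    category: str
    splits: str   # "Predefined Splits" cell: "yes", "no", or "-"
    labeled: str  # "Labeled" cell: "yes", "no", or "-"


# A handful of rows transcribed from the table above (not the full list).
RECORDS = [
    DatasetRecord("NSL-KDD", "NIDD", "yes", "yes"),
    DatasetRecord("UNSW-NB15", "NIDD", "yes", "yes"),
    DatasetRecord("CSE-CIC-IDS2017", "NIDD", "no", "yes"),
    DatasetRecord("BoT-IoT", "IoT-NIDD", "yes", "yes"),
    DatasetRecord("ToN-IoT", "IoT-NIDD", "yes", "yes"),
    DatasetRecord("Malimg", "Malware", "no", "yes"),
    DatasetRecord("SWaT dataset", "ICS", "yes", "yes"),
    DatasetRecord("CERT Insider Threat", "Insider Threat", "no", "yes"),
]


def count_by_category(records):
    """Tally how many transcribed datasets fall into each taxonomy category."""
    return Counter(r.category for r in records)


def with_labels_and_splits(records):
    """Names of datasets that are labeled and also ship predefined splits."""
    return [r.name for r in records if r.labeled == "yes" and r.splits == "yes"]


print(count_by_category(RECORDS))
print(with_labels_and_splits(RECORDS))
# ['NSL-KDD', 'UNSW-NB15', 'BoT-IoT', 'ToN-IoT', 'SWaT dataset']
```

The sketch keeps the table's own "yes"/"no"/"-" strings rather than converting them to booleans. This preserves the distinction between a property that is absent and one that is not reported or not applicable.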
References
- Sommer, R.; Paxson, V. Outside the closed world: On using machine learning for network intrusion detection. In Proceedings of the 2010 IEEE Symposium on Security and Privacy (SP); IEEE: Los Alamitos, CA, USA, 2010; pp. 305–316. [Google Scholar]
- Li, Y.; Liu, Q. A comprehensive review study of cyber-attacks and cyber security; Emerging trends and recent developments. Energy Rep. 2021, 7, 8176–8186. [Google Scholar] [CrossRef]
- Hizal, S.; Cavusoglu, U.; Akgun, D. A novel deep learning-based intrusion detection system for IoT DDoS security. Internet Things 2024, 28, 101336. [Google Scholar] [CrossRef]
- Jada, I.; Mayayise, T.O. The impact of artificial intelligence on organisational cyber security: An outcome of a systematic literature review. Data Inf. Manag. 2024, 8, 100063. [Google Scholar] [CrossRef]
- Baron Garcia, A. Machine Learning and Artificial Intelligence Methods for Cybersecurity Data Within the Aviation Ecosystem. Ph.D. Thesis, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA, 2022. [Google Scholar]
- Buczak, A.L.; Guven, E. A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection. IEEE Commun. Surv. Tutor. 2015, 17, 2501–2528. [Google Scholar] [CrossRef]
- Kaur, R.; Gabrijelčič, D.; Klobučar, T. Artificial intelligence for cybersecurity: Literature review and future research directions. Inf. Fusion 2023, 97, 101804. [Google Scholar] [CrossRef]
- Mvula, P.K.; Branco, P.; Jourdan, G.V.; Viktor, H.L. A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning. Discov. Data 2023, 1, 4. [Google Scholar] [CrossRef]
- Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: Bot-IoT Dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [Google Scholar] [CrossRef]
- Guhan, N.K.; Ramachandran, M.; Ravindran, S.; Vijean, V. A Deep and Systematic Review of the Intrusion Detection Systems based on Machine Learning and Deep Learning Techniques. In Proceedings of the 2024 10th International Conference on Advanced Computing and Communication Systems (ICACCS); IEEE: New York, NY, USA, 2024; p. 1564. [Google Scholar] [CrossRef]
- Bhavyashree, Y.R.; Kavyashree, M.K.; Amrutha, K.R. Systematic Review on Frameworks for Intrusion Detection using Machine Learning and Deep Learning Algorithms. In Proceedings of the 2024 Second International Conference on Networks, Multimedia and Information Technology (NMITCON); IEEE: New York, NY, USA, 2024; pp. 1–12. [Google Scholar] [CrossRef]
- Ali, T.; Eleyan, A.; Bejaoui, T. Detecting Conventional and Adversarial Attacks Using Deep Learning Techniques: A Systematic Review. In Proceedings of the 2023 International Symposium on Networks, Computers and Communications (ISNCC); IEEE: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
- Gamage, S.; Samarabandu, J. Deep learning methods in network intrusion detection: A survey and an objective comparison. J. Netw. Comput. Appl. 2020, 169, 102767. [Google Scholar] [CrossRef]
- Tsai, C.F.; Hsu, Y.F.; Lin, C.Y.; Lin, W.Y. Intrusion detection by machine learning: A review. Expert Syst. Appl. 2009, 36, 11994–12000. [Google Scholar] [CrossRef]
- Pingala Suthishni, D.N.; Kumar, K.S.S. A Review on Machine Learning based Security Approaches in Intrusion Detection System. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom); IEEE: Piscataway, NJ, USA, 2022; pp. 101–105. [Google Scholar] [CrossRef]
- Azmoodeh, A.; Al-Rawi, W.; Al-Dahhan, M.; Ghita, B. Detecting Cyber Attacks in Industrial Control Systems Using Convolutional Neural Networks. arXiv 2018. [Google Scholar] [CrossRef]
- Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A survey of network-based intrusion detection data sets. Comput. Secur. 2019, 86, 147–167. [Google Scholar] [CrossRef]
- Yang, Z.; Liu, X.; Li, T.; Wu, D.; Wang, J.; Zhao, Y.; Han, H. A systematic literature review of methods and datasets for anomaly-based network intrusion detection. Comput. Secur. 2022, 116, 102675. [Google Scholar] [CrossRef]
- Strom, B.; Applebaum, A.; Miller, D.; Nickels, K.; Pennington, A.; Thomas, C. MITRE ATT&CK™: Design and Philosophy; MITRE Corporation: Bedford, MA, USA, 2018. [Google Scholar]
- Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set (NSL-KDD). In Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), Ottawa, ON, Canada, 8–10 July 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–6. [Google Scholar] [CrossRef]
- Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS); IEEE: Piscataway, NJ, USA, 2015; pp. 1–6. [Google Scholar] [CrossRef]
- Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CSE-CIC-IDS2017 Dataset: Intrusion Detection Evaluation Dataset; Canadian Institute for Cybersecurity (CIC), University of New Brunswick: Fredericton, NB, Canada, 2017. [Google Scholar]
- Gad, A.R.; Nashat, A.A.; Barkat, T.M. Intrusion Detection System Using Machine Learning for Vehicular Ad Hoc Networks Based on ToN-IoT Dataset. IEEE Access 2021, 9, 142206–142217. [Google Scholar] [CrossRef]
- Sowmya, T.; Mary Anita, E.A. A Comprehensive Review of AI Based Intrusion Detection System. Meas. Sens. 2023, 28, 100827. [Google Scholar] [CrossRef]
- Salem, A.H.; Azzam, S.M.; Emam, O.E.; Abohany, A.A. Advancing Cybersecurity: A Comprehensive Review of AI-Driven Detection Techniques. J. Big Data 2024, 11, 105. [Google Scholar] [CrossRef]
- Ofusori, L.; Bokaba, T.; Mhlongo, S. Artificial Intelligence in Cybersecurity: A Comprehensive Review and Future Direction. Appl. Artif. Intell. 2024, 38, 2439609. [Google Scholar] [CrossRef]
- Rehman, H.M.R.U.; Liaquat, S.; Gul, M.J.; Jhandir, M.Z.; Gavilanes, D.; Masias Vergara, M.; Ashraf, I. A Systematic Literature Study of Machine Learning Techniques Based Intrusion Detection: Datasets, Models, Challenges, and Future Directions. J. Big Data 2025, 12, 264. [Google Scholar] [CrossRef]
- Hozouri, A.; Mirzaei, A.; Effatparvar, M. A Comprehensive Survey on Intrusion Detection Systems with Advances in Machine Learning, Deep Learning and Emerging Cybersecurity Challenges. Discov. Artif. Intell. 2025, 5, 314. [Google Scholar] [CrossRef]
- Dobler, M.; Hellwig, M.; Lopes, N.; Oakley, K.; Winterburn, M. Systematic Review and Characterisation of Malicious Industrial Network Traffic Datasets. Int. J. Inf. Secur. 2025, 24, 208. [Google Scholar] [CrossRef]
- Alnabhan, M.Q.; Branco, P. Fake News Detection Using Deep Learning: A Systematic Literature Review. IEEE Access 2024, 12, 114435–114459. [Google Scholar] [CrossRef]
- Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; Technical Report EBSE-2007-01; Keele University: Staffordshire, UK, 2007. [Google Scholar]
- Zhu, B.; Joseph, A.; Sastry, S. A Taxonomy of Cyber Attacks on SCADA Systems. In Proceedings of the 2011 IEEE International Conferences on Internet of Things, and Cyber, Physical and Social Computing; IEEE: Piscataway, NJ, USA, 2011; pp. 380–385. [Google Scholar]
- Ozkan Okay, M.; Iliev, T.; Akin, E.; Aslan, O.; Kosunalp, S.; Stoyanov, I.; Beloev, I. A Comprehensive Survey: Evaluating the Efficiency of Artificial Intelligence and Machine Learning Techniques on Cyber Security Solutions. IEEE Access 2024, 12, 12229–12255. [Google Scholar] [CrossRef]
- Wu, M.; Moon, Y.B. Taxonomy of Cross-Domain Attacks on CyberManufacturing System. Procedia Comput. Sci. 2017, 114, 367–374. [Google Scholar] [CrossRef]
- Wu, M.; Moon, Y.B. Taxonomy for secure cybermanufacturing systems. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition Proceedings; ASME: New York, NY, USA, 2018; Volume 2, pp. 1–10. [Google Scholar] [CrossRef]
- Pan, Y.; White, J.; Schmidt, D.C.; Elhabashy, A.; Sturm, L.; Camelio, J.; Williams, C. Taxonomies for Reasoning About Cyber-physical Attacks in IoT-based Manufacturing Systems. Int. J. Interact. Multimed. Artif. Intell. 2017, 4, 45–54. [Google Scholar] [CrossRef]
- Tuptuk, N.; Hailes, S. Security of smart manufacturing systems. J. Manuf. Syst. 2018, 47, 93–106. [Google Scholar] [CrossRef]
- Wu, D.; Ren, A.; Zhang, W.; Fan, F.; Liu, P.; Fu, X.; Terpenny, J. Cybersecurity for digital manufacturing. J. Manuf. Syst. 2018, 48, 3–12. [Google Scholar] [CrossRef]
- Yampolskiy, M.; King, W.E.; Gatlin, J.; Belikovetsky, S.; Brown, A.; Skjellum, A.; Elovici, Y. Security of additive manufacturing: Attack taxonomy and survey. Addit. Manuf. 2018, 21, 431–457. [Google Scholar] [CrossRef]
- Elhabashy, A.E.; Wells, L.J.; Camelio, J.A.; Woodall, W.H. A cyber-physical attack taxonomy for production systems: A quality control perspective. J. Intell. Manuf. 2019, 30, 2489–2504. [Google Scholar] [CrossRef]
- Barnum, S. Common Attack Pattern Enumeration and Classification (CAPEC) Schema Description; Technical Report; MITRE Corporation: McLean, VA, USA, 2008. [Google Scholar]
- Hansman, S.; Hunt, R. A taxonomy of network and computer attacks. Comput. Secur. 2005, 24, 31–43. [Google Scholar] [CrossRef]
- Meyers, C.A.; Powers, S.S.; Faissol, D.M. Taxonomies of Cyber Adversaries and Attacks: A Survey of Incidents and Approaches; Technical Report; Lawrence Livermore National Laboratory: Livermore, CA, USA, 2009. [Google Scholar]
- Chapman, I.M.; Leblanc, S.P.; Partington, A. Taxonomy of cyber attacks and simulation of their effects. In Proceedings of the Military Modeling and Simulation Symposium; The Society for Modeling and Simulation International (SCS): San Diego, CA, USA, 2011; pp. 73–80. [Google Scholar]
- Simmons, C.B.; Shiva, S.G.; Bedi, H.; Dasgupta, D. AVOIDIT: A Cyber Attack Taxonomy. In Proceedings of the 9th Annual Symposium on Information Assurance; University at Albany, State University of New York: Albany, NY, USA, 2014; pp. 2–12. [Google Scholar]
- Emmert-Streib, F.; Dehmer, M. Taxonomy of machine learning paradigms: A data-centric perspective. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2022, 12, e1470. [Google Scholar] [CrossRef]
- von Rueden, L.; Mayer, S.; Beckh, K.; Georgiev, B.; Giesselbach, S.; Heese, R.; Kirsch, B.; Pfrommer, J.; Pick, A.; Bauckhage, C.; et al. Informed machine learning—A taxonomy and survey of integrating knowledge into learning systems. arXiv 2019, arXiv:1903.12394. Available online: https://arxiv.org/abs/1903.12394 (accessed on 11 May 2026). [CrossRef]
- Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Dietterich, T.G. Ensemble methods in machine learning. In Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15. [Google Scholar] [CrossRef]
- Ozlem, M.; Turk, A.; Yavuz, A. A review on cyber security dataset for machine learning algorithms. In Proceedings of the IEEE International Conference on Big Data (Big Data); IEEE: Piscataway, NJ, USA, 2017; pp. 2186–2193. [Google Scholar] [CrossRef]
- Mississippi State University. Gas Pipeline Intrusion Dataset; Mississippi State University, Critical Infrastructure Protection Center: Starkville, MS, USA, 2020; Available online: https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets (accessed on 13 January 2025).
- Stolfo, S.; Fan, W.; Lee, W.; Prodromidis, A.; Chan, P. KDD Cup 1999 Data. UCI Machine Learning Repository. 1999. Available online: https://archive.ics.uci.edu/dataset/130/kdd+cup+1999+data (accessed on 13 February 2025).
- Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. CSE-CIC-IDS2018 Dataset; Canadian Institute for Cybersecurity (CIC), University of New Brunswick: Fredericton, NB, Canada, 2018. [Google Scholar]
- Sharafaldin, I.; Lashkari, A.H.; Hakak, S.; Ghorbani, A.A. Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In Proceedings of the 2019 International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8. [Google Scholar] [CrossRef]
- Damasevicius, R.; Venckauskas, A.; Grigaliunas, S.; Toldinas, J.; Morkevicius, N.; Aleliunas, T.; Smuikys, P. LITNET-2020: An annotated real-world network flow dataset for network intrusion detection. Electronics 2020, 9, 800. [Google Scholar] [CrossRef]
- Barut, O.; Luo, Y.; Zhang, T.; Li, W.; Li, P. NetML: A challenge for network traffic analytics. arXiv 2020, arXiv:2004.13006. [Google Scholar] [CrossRef]
- Samarakoon, S.; Siriwardhana, Y.; Porambage, P.; Liyanage, M.; Chang, S.; Kim, J.; Kim, J.; Ylianttila, M. 5G-NIDD: A Comprehensive Network Intrusion Detection Dataset Generated over 5G Wireless Network; IEEE Dataport; IEEE: Piscataway, NJ, USA, 2022. [Google Scholar]
- Kumar, P.; Liu, J.; Tayeen, A.S.M.; Misra, S.; Cao, H.; Harikumar, J.; Perez, O. FLNET2023: Realistic Network Intrusion Detection Dataset for Federated Learning. In Proceedings of the MILCOM 2023–IEEE Military Communications Conference (MILCOM), Boston, MA, USA, 30 October–3 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 345–350. [Google Scholar] [CrossRef]
- Haider, W.; Hu, J.; Slay, J.; Turnbull, B.; Xie, Y. Generating realistic intrusion detection system dataset based on fuzzy qualitative modeling. J. Netw. Comput. Appl. 2017, 87, 185–192. [Google Scholar] [CrossRef]
- CAIDA. The CAIDA DDoS Attack 2007 Dataset; Center for Applied Internet Data Analysis, University of California San Diego: San Diego, CA, USA, 2007; Available online: https://www.caida.org/catalog/datasets/ddos-20070804_dataset/ (accessed on 10 May 2025).
- BoNeSi—The DDoS Botnet Simulator. 2020. Available online: https://github.com/Markus-Go/bonesi (accessed on 26 February 2022).
- Jonker, M.; Sperotto, A.; Pras, A. DDoSDB dataset: DDoS Mitigation—A Measurement-Based Approach. In Proceedings of the NOMS 2020—IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 20–24 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Samiullah, H. Application Layer DoS Attack Dataset. 2021. Available online: https://www.kaggle.com/hamzasamiullah/ml-analysis-application-layer-dos-attack-dataset (accessed on 30 August 2021).
- Giménez, C.T.; Villegas, A.P.; Marañón, G.Á. HTTP Data Set CSIC 2010; Information Security Institute of CSIC (Spanish Research National Council): Madrid, Spain, 2010. [Google Scholar]
- Raïssi, C.; Brissaud, J.; Dray, G.; Poncelet, P.; Roche, M.; Teisseire, M. Web Analyzing Traffic Challenge: Description and Results. In Proceedings of the ECML PKDD 2007 Discovery Challenge, Warsaw, Poland, 17–21 September 2007. [Google Scholar]
- Digital Corpora. NPS-2009-Casper-Rw Dataset. 2009. Available online: https://digitalcorpora.org/corpora/disk-images/ (accessed on 11 May 2026).
- Hostiadi, D.P.; Ahmad, T. Dataset for Botnet group activity with adaptive generator. Data Brief 2021, 38, 107334. [Google Scholar] [CrossRef] [PubMed]
- Putra, M.A.R.; Hostiadi, D.P.; Ahmad, T. Botnet dataset with simultaneous attack activity. Data Brief 2022, 45, 108628. [Google Scholar] [CrossRef]
- Tayfour, O.E.; Mubarakali, A.; Tayfour, A.E.; Marsono, M.N.; Hassan, E.; Abdelrahman, A.M. Adapting deep learning-LSTM method using optimized dataset in SDN controller for secure IoT. Soft Comput. 2023, 27, 1–9. [Google Scholar] [CrossRef]
- Benign and Malicious Domains Based on DNS Logs. Benign and Malicious Domains Based on DNS Logs Dataset. 2022. Available online: https://data.mendeley.com/datasets/623sshkdrz/5 (accessed on 24 May 2022).
- García, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. Comput. Secur. 2014, 45, 100–123. [Google Scholar] [CrossRef]
- Wang, W.; Zhu, M.; Zeng, X.; Ye, X.; Sheng, Y. Malware traffic classification using convolutional neural network for representation learning. In Proceedings of the 2017 International Conference on Information Networking (ICOIN); IEEE: Piscataway, NJ, USA, 2017; pp. 712–717. [Google Scholar] [CrossRef]
- Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-BaIoT: Network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput. 2018, 17, 12–22. [Google Scholar] [CrossRef]
- Pa, Y.M.P.; Suzuki, S.; Yoshioka, K.; Matsumoto, T.; Kasama, T.; Rossow, C. IoTPOT: Analysing the rise of IoT compromises. In Proceedings of the 9th USENIX Workshop on Offensive Technologies (WOOT), Washington, DC, USA, 10–11 August 2015. [Google Scholar]
- García, S.; Shuvaev, S.; Uritskaya, A. IoT-23: A Labeled Dataset with Malicious and Benign IoT Network Traffic; Stratosphere Laboratory, Czech Technical University: Prague, Czech Republic, 2020. [Google Scholar]
- Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access 2022, 10, 40281–40306. [Google Scholar] [CrossRef]
- Dadkhah, S.; Mahdikhani, H.; Danso, P.K.; Zohourian, A.; Truong, K.A.; Ghorbani, A.A. Towards the development of a realistic multidimensional IoT profiling dataset. In Proceedings of the 19th Annual International Conference on Privacy, Security and Trust (PST); IEEE: Piscataway, NJ, USA, 2022; pp. 1–11. [Google Scholar] [CrossRef]
- Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CicIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors 2023, 23, 5941. [Google Scholar] [CrossRef] [PubMed]
- Hamza, A.; Gharakheili, H.H.; Benson, T.A.; Sivaraman, V. UNSW IoT Traffic Attack Dataset. In Proceedings of the 2019 ACM Symposium on SDN Research (SOSR); ACM: New York, NY, USA, 2019; pp. 36–48. [Google Scholar] [CrossRef]
- Aramini, A.; Arazzi, M.; Facchinetti, T.; Ngankem, L.S.; Nocera, A. Distributed IoT Traffic Attack Dataset. In Proceedings of the 2022 IEEE 18th International Conference on Factory Communication Systems (WFCS); IEEE: Piscataway, NJ, USA, 2022; pp. 1–8. [Google Scholar] [CrossRef]
- Emec, M. ROUT-4-2023: RPL Based Routing Attack Dataset for IoT. IEEE Dataport. 2023. Available online: https://ieee-dataport.org/documents/rout-4-2023-rpl-based-routing-attack-dataset-iot (accessed on 14 June 2024).
- Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An ensemble of autoencoders for online network intrusion detection. arXiv 2018, arXiv:1802.09089. [Google Scholar] [CrossRef]
- Samson, K. Wi-Fi Association and Disassociation Dataset. 2023. Available online: https://github.com/samsonkg/Wi-Fi-Association_Disassociation-Dataset (accessed on 26 August 2023).
- Pazul, K. Controller Area Network (CAN) Basics, 1999. Available online: https://cika.com/soporte/Information/Microchip/AnalogInterface/CAN/AppNotes/AN713(DS00713a).pdf (accessed on 20 May 2025).
- Song, H.M.; Woo, J.; Kim, H.K. In-vehicle network intrusion detection using deep convolutional neural network. Veh. Commun. 2020, 21, 100198. [Google Scholar] [CrossRef]
- Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images. In Proceedings of the 8th International Symposium on Visualization for Cyber Security (VizSec), Pittsburgh, PA, USA, 20 July 2011; ACM: New York, NY, USA, 2011; pp. 1–7. [Google Scholar] [CrossRef]
- Ronen, R.; Radu, M.; Feuerstein, C.; Yom-Tov, E.; Ahmadi, M. Microsoft Malware Classification Challenge. arXiv 2018, arXiv:1802.10135. [Google Scholar] [CrossRef]
- Bozkir, A.S.; Cankaya, A.O.; Aydos, M. Utilization and Comparison of Convolutional Neural Networks in Malware Recognition. In Proceedings of the 27th Signal Processing and Communications Applications Conference (SIU), Sivas, Turkey, 24–26 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–4. [Google Scholar] [CrossRef]
- Nappa, A.; Rafique, M.Z.; Caballero, J. The MALICIA dataset: Identification and analysis of drive-by download operations. Int. J. Inf. Secur. 2015, 14, 15–33. [Google Scholar] [CrossRef]
- Drebin: Android Malware Dataset. 2014. Available online: https://drebin.mlsec.org/ (accessed on 16 July 2025).
- VX Heavens. 2021. Available online: https://vx-underground.org/Archive (accessed on 6 July 2021).
- VirusShare Dataset. 2021. Available online: https://virusshare.com/ (accessed on 6 July 2021).
- VirusTotal. VirusTotal: Free Online Virus, Malware and URL Scanner. Available online: https://www.virustotal.com/ (accessed on 11 May 2026).
- Ullah, I.; Ahmad, J.; Ahmed, I.; Amin, R.; Imran, M. CIC-MalMem-2022: Malware detection in memory dumps using machine learning. In Proceedings of the 2022 International Conference on Cyber Security and Resilience (CSR); IEEE: Piscataway, NJ, USA, 2022; pp. 153–159.
- Maniriho, P.; Mahmood, A.N.; Chowdhury, M.J.M.C. MeMalDet: A Memory Analysis-Based Malware Detection Framework Using Deep Autoencoders and Stacked Ensemble under Temporal Evaluations. Comput. Secur. 2024, 142, 103864.
- Canadian Institute for Cybersecurity (CIC). CIC-CMD-2024: Command and Control Malware Dataset. 2024. Available online: https://www.kaggle.com/datasets/datasetengineer/cybertec-iiot-malware-dataset-cimd-2024 (accessed on 11 May 2026).
- Hopkins, M.; Reeber, E.; Forman, G.; Suermondt, J. Spambase Dataset UCI Machine Learning Repository. 1999. Available online: https://archive.ics.uci.edu/dataset/94/spambase (accessed on 11 May 2026).
- Biswas, B. Email Spam Classification Dataset CSV. 2020. Available online: https://www.kaggle.com/balaka18/email-spam-classification-dataset-csv (accessed on 5 May 2022).
- Nitisha. Email Spam Dataset. 2020. Available online: https://www.kaggle.com/nitishabharathi/email-spam-dataset (accessed on 1 May 2022).
- Naidu, C. Spam Classification for Basic NLP. 2021. Available online: https://kaggle.com/chandramoulinaidu/spam-classification-for-basic-nlp (accessed on 15 January 2022).
- Murthy, M.Y.B.; Mastanbi, S.; Sujitha, B.; Babu, K.R. Evaluating deep learning algorithms for natural language processing. In Algorithms for Intelligent Systems; Springer Nature: Singapore, 2023; pp. 709–720.
- Kaggle. Phishing Email Collection. 2020. Available online: https://www.kaggle.com/datasets/akashsurya156/phishing-paper1 (accessed on 11 May 2026).
- Jagtap, S. Kaggle Bot Account Detection Dataset. Available online: https://www.kaggle.com/datasets/shriyashjagtap/kaggle-bot-account-detection/data (accessed on 11 May 2026).
- MITRE. Sharing Threat Intelligence Just Got a Lot Easier. 2018. Available online: https://oasis-open.github.io/cti-documentation/stix/intro (accessed on 31 December 2022).
- Zeng, V.; Baki, S.; Aassal, A.E.; Verma, R.; Moraes, L.F.T.D.; Das, A. Diverse datasets and a customizable benchmarking framework for phishing. In Proceedings of the 6th International Workshop on Security and Privacy Analytics, New Orleans, LA, USA, 18 March 2020; ACM: New York, NY, USA, 2020; pp. 35–41.
- Chiew, K.L.; Chang, E.H.; Tan, C.L.; Abdullah, J.; Yong, K.C. Building standard offline anti-phishing dataset for benchmarking. Int. J. Eng. Technol. 2018, 7, 71–74.
- Ariyadasa, S.; Fernando, S.; Fernando, S. Phishing Websites Dataset. Mendeley Data. 2021. Available online: https://data.mendeley.com/datasets/n96ncsr5g4/1 (accessed on 10 May 2025).
- Bahnsen, A.C.; Bohorquez, E.C.; Villegas, S.; Vargas, J.; Gonzalez, F.A. Classifying phishing URLs using recurrent neural networks. In Proceedings of the APWG Symposium on Electronic Crime Research (eCrime), Scottsdale, AZ, USA, 25–27 April 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8.
- Mukherjee, A.; Venkataraman, V.; Liu, B.; Glance, N. What Yelp fake review filter might be doing? In Proceedings of the 7th International Conference on Weblogs and Social Media (ICWSM); AAAI: Palo Alto, CA, USA, 2013; pp. 409–418.
- Rayana, S.; Akoglu, L. Collective opinion spam detection: Bridging review networks and metadata. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD); ACM: New York, NY, USA, 2015; pp. 985–994.
- Wang, W.; Harrou, F.; Bouyeddou, B.; Senouci, S.M.; Sun, Y. A stacked deep learning approach to cyber-attacks detection in industrial systems: Application to power system and gas pipeline systems. Clust. Comput. 2022, 25, 561–578.
- Mathur, A.P.; Tippenhauer, N.O. SWaT: A water treatment testbed for research and training on ICS security. In Proceedings of the 2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater); IEEE: Piscataway, NJ, USA, 2016.
- Mubarak, S.; Habaebi, M.H.; Islam, M.R.; Balla, A.; Tahir, M. Industrial datasets with ICSs testbed and attack detection using machine learning techniques. Intell. Autom. Soft Comput. 2022, 31, 1345–1360.
- Quincozes, S.E.; Albuquerque, C.; Passos, D.G.; Mossé, D. ERENO: A framework for generating realistic IEC-61850 intrusion detection datasets for smart grids. IEEE Trans. Dependable Secur. Comput. 2023, 21, 3851–3865.
- Xu, Z. IEEE 118-Bus, 300-Bus and 3266-Bus System Dataset for Unit Commitment; IEEE DataPort; IEEE: Piscataway, NJ, USA, 2020.
- Software Engineering Institute (CERT Division), Carnegie Mellon University. Insider Threat Test Dataset (Versions r4–r6): Synthetic Insider Threat Logs, Including Releases r4.x Through r6.x. Data Set. 2020. Available online: https://kilthub.cmu.edu/articles/dataset/Insider_Threat_Test_Dataset/12841247/1 (accessed on 10 May 2025).
- Udacity. An Open Source Self-Driving Car. 2016. Available online: https://www.udacity.com/ (accessed on 10 May 2025).
- Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 2012, 32, 323–332.
- Unmanned Aerial Vehicle (UAV) Intrusion Detection. 2020. Available online: https://archive.ics.uci.edu/dataset/564/unmanned+aerial+vehicle+uav+intrusion+detection (accessed on 10 May 2025).
- Consumer Complaint Database. 2019. Available online: https://catalog.data.gov/dataset/consumer-complaint-database (accessed on 10 May 2025).
- TensorFlow Speech Recognition Challenge. 2019. Available online: https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data (accessed on 11 January 2025).
- IMDB Dataset of 50K Movie Reviews. 2019. Available online: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews (accessed on 1 May 2024).
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical report; Citeseer: University Park, PA, USA, 2009.

| Review | Time Window/Focus | Main Analytical Emphasis | Attack Taxonomy | Dataset Taxonomy Depth | ATT&CK Mapping | Attack–Method–Dataset Cross-Reference | Main Limitation Relative to This Study |
|---|---|---|---|---|---|---|---|
| Sowmya et al. (2023) [24] | 72-paper review of AI-based IDS | ML, DL, and ensemble methods for intrusion detection | No explicit ATT&CK-style taxonomy | Moderate | No | Limited | IDS-centred rather than ATT&CK-aligned or tri-axis. |
| Mvula et al. (2023) [8] | SSL-focused SLR on cybersecurity datasets and metrics | Dataset repositories and performance metrics | No explicit attack taxonomy | Strong | No | Limited | Dataset- and metric-centred, not a tri-axis synthesis. |
| Salem et al. (2024) [25] | Review of more than sixty AI-driven cyber-threat studies | Broad comparison of ML, DL, and metaheuristics | Broad attack coverage, but no ATT&CK-based taxonomy | Moderate | No | Limited | Technique-centred rather than ATT&CK-organized. |
| Ofusori et al. (2024) [26] | Broad review of AI in cybersecurity | Applications, trends, and future directions | No structured threat taxonomy | Limited–moderate | No | No | Too high-level to expose specific attack–method–dataset gaps. |
| Rehman et al. (2025) [27] | Systematic review of ML-based intrusion detection | Models, datasets, metrics, and challenges | IDS/domain framing rather than ATT&CK tactics/techniques | Moderate–strong | No | Limited | Close in topic, but still IDS-centric and not ATT&CK-aligned. |
| Hozouri et al. (2025) [28] | Survey of IDS with ML/DL advances | IDS architectures, benchmark datasets, and emerging challenges | No explicit ATT&CK-based taxonomy | Moderate | No | Limited | Strong IDS synthesis, but not a behavioural cross-reference review. |
| Dobler et al. (2025) [29] | Systematic review of malicious industrial traffic datasets | Dataset characterization and ML-oriented selection | Industrial attack types, but not ATT&CK as the main frame | Strong | No | Limited | Domain-specific dataset review rather than a broader tri-axis synthesis. |
| This review | 2019–2025; 99 studies | Tri-axis synthesis across attacks, ML methods, and datasets | MITRE ATT&CK-aligned | Strong | Yes | Yes | Designed to expose underexplored intersections, benchmark dependence, and gaps across attack behaviours, model families, and dataset categories. |

| RQ | Research Question |
|---|---|
| RQ1 | What types of cyberattacks are most frequently studied, and how have they evolved? |
| RQ2 | Which machine learning and deep learning techniques are applied to mitigate attacks? |
| RQ3 | What datasets are commonly used in AI-powered cybersecurity research? |
| RQ4 | Which ML techniques are associated with specific categories of cyberattacks? |
| RQ5 | What are the key gaps and limitations in applying AI-powered methods for attack mitigation? |
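RQ4 asks which ML techniques are associated with which categories of attack, and the review's tri-axis analysis extends this question to datasets. One plausible way to tabulate such associations is to store each coded study as a record of its ATT&CK tactics, ML method categories, and dataset categories, then count co-occurrences. The sketch below works under that assumption. The `CodedStudy` record, its field names, the `cross_reference` helper, and the two example studies are all hypothetical; they do not reproduce the review's actual coding sheet.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CodedStudy:
    """Hypothetical coding record for one reviewed study (illustrative fields only)."""
    study_id: str
    tactics: frozenset   # ATT&CK tactics mapped from the study's attack labels
    methods: frozenset   # ML main categories used in the study
    datasets: frozenset  # dataset categories, e.g. "NIDD", "IoT-NIDD"


def cross_reference(studies, *axes):
    """Count the studies that cover each combination of values on the given axes.

    Each axis holds a set, so a study contributes at most once to any
    combination. The counts are therefore numbers of studies, not experiments.
    """
    counts = Counter()
    for study in studies:
        value_lists = [sorted(getattr(study, axis)) for axis in axes]
        for combo in product(*value_lists):
            counts[combo] += 1
    return counts


# Two made-up studies, for illustration only.
studies = [
    CodedStudy("S1", frozenset({"Impact", "Reconnaissance"}),
               frozenset({"Deep Learning Models"}), frozenset({"NIDD"})),
    CodedStudy("S2", frozenset({"Impact"}),
               frozenset({"Deep Learning Models", "Classical Machine Learning Models"}),
               frozenset({"IoT-NIDD"})),
]

tactic_method = cross_reference(studies, "tactics", "methods")
tri_axis = cross_reference(studies, "tactics", "methods", "datasets")

print(tactic_method[("Impact", "Deep Learning Models")])                       # 2
print(tri_axis[("Impact", "Deep Learning Models", "NIDD")])                    # 1
print(tactic_method[("Reconnaissance", "Classical Machine Learning Models")])  # 0
```

The `Counter` returns 0 for any combination that no study covers. This gives a simple way to surface the kind of underexplored attack–method–dataset intersections that the tri-axis analysis looks for.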
| Criterion | Description |
|---|---|
| Inclusion | Peer-reviewed articles published between 2019 and 2025, written in English, addressing AI- or ML-based methods for cyberattack detection, classification, or mitigation, and reporting identifiable model/classifier and dataset information. |
| Exclusion | Non-peer-reviewed works, duplicate records, papers outside computer science, cybersecurity, or closely related AI-for-security domains, short or insufficiently detailed papers (fewer than five pages), and studies lacking the methodological detail required for structured comparison. |
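Taken together, the inclusion and exclusion criteria act as a conjunction of checks on each candidate record, and the sketch below encodes them that way. It relies on several assumptions. The `Candidate` record is hypothetical, and its boolean fields stand in for reviewer judgements, such as whether a paper falls within an AI-for-security domain. The title-normalisation rule used to remove duplicates is also an assumption, not the review's stated procedure.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative, not the review's."""
    title: str
    year: int
    language: str
    peer_reviewed: bool
    ai_for_cyber_defense: bool  # AI/ML for attack detection, classification, or mitigation
    in_scope_domain: bool       # CS, cybersecurity, or closely related AI-for-security
    pages: int
    reports_model: bool         # identifiable model/classifier
    reports_dataset: bool       # identifiable dataset
    enough_method_detail: bool  # supports structured comparison


def is_eligible(c: Candidate) -> bool:
    """True only if every inclusion criterion holds and no exclusion criterion applies."""
    return (
        2019 <= c.year <= 2025
        and c.language == "English"
        and c.peer_reviewed
        and c.ai_for_cyber_defense
        and c.in_scope_domain
        and c.pages >= 5  # papers shorter than five pages are excluded
        and c.reports_model
        and c.reports_dataset
        and c.enough_method_detail
    )


def deduplicate(candidates):
    """Keep the first record for each case- and whitespace-normalised title (assumed rule)."""
    seen = set()
    unique = []
    for c in candidates:
        key = " ".join(c.title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique


# Made-up candidates: one duplicate, one too short, one outside the time window.
pool = [
    Candidate("Deep IDS for IoT", 2022, "English", True, True, True, 12, True, True, True),
    Candidate("Deep  IDS for  IoT", 2022, "English", True, True, True, 12, True, True, True),
    Candidate("A Short Note", 2021, "English", True, True, True, 4, True, True, True),
    Candidate("Legacy IDS", 2017, "English", True, True, True, 10, True, True, True),
]
included = [c for c in deduplicate(pool) if is_eligible(c)]
print([c.title for c in included])  # ['Deep IDS for IoT']
```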
| Criterion | Description | Scoring |
|---|---|---|
| Q1 | The study clearly specifies the attack type, family, or adversarial behaviour under analysis. | 0/1 |
| Q2 | The study clearly identifies the ML/DL method, model family, or detection pipeline used. | 0/1 |
| Q3 | The dataset or data source is clearly reported and sufficiently identifiable. | 0/1 |
| Q4 | The evaluation setting, metrics, or experimental design is sufficiently described for interpretation. | 0/1 |
| Q5 | The study provides enough methodological detail to support comparative synthesis. | 0/1 |
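Each of Q1–Q5 is scored 0 or 1, so a study's quality score is the number of criteria it satisfies, from 0 to 5. A minimal sketch of that tally follows. The dictionary layout and the `quality_score` name are illustrative, and no cut-off is applied because the table does not state one.

```python
QUALITY_CRITERIA = ("Q1", "Q2", "Q3", "Q4", "Q5")


def quality_score(ratings):
    """Return the number of criteria met (0-5) from binary Q1-Q5 ratings."""
    total = 0
    for criterion in QUALITY_CRITERIA:
        value = ratings[criterion]  # KeyError if a criterion was left unrated
        if value not in (0, 1):
            raise ValueError(f"{criterion} must be scored 0 or 1, got {value!r}")
        total += value
    return total


# A made-up appraisal: everything reported except the evaluation setting (Q4).
example = {"Q1": 1, "Q2": 1, "Q3": 1, "Q4": 0, "Q5": 1}
print(quality_score(example))  # 4
```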
| Tactic | Count | Percentage |
|---|---|---|
| Impact | 72 | 14.55% |
| Initial Access | 59 | 11.92% |
| Execution | 58 | 11.72% |
| Command and Control | 55 | 11.11% |
| Reconnaissance | 54 | 10.91% |
| Credential Access | 42 | 8.48% |
| Defense Evasion | 31 | 6.26% |
| Discovery | 28 | 5.66% |
| Lateral Movement | 23 | 4.65% |
| Exfiltration | 21 | 4.24% |
| Persistence | 17 | 3.43% |
| Collection | 17 | 3.43% |
| Privilege Escalation | 13 | 2.63% |
| Resource Development | 5 | 1.01% |
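The tactic percentages are consistent with dividing each count by the sum of all tactic counts (495 study–tactic mappings), not by the 99 reviewed studies. Dividing by 99 would give 72.73% for Impact rather than 14.55%. The sketch below recomputes every percentage under that reading, using only values copied from the table.

```python
# Tactic counts copied from the table above.
TACTIC_COUNTS = {
    "Impact": 72, "Initial Access": 59, "Execution": 58, "Command and Control": 55,
    "Reconnaissance": 54, "Credential Access": 42, "Defense Evasion": 31,
    "Discovery": 28, "Lateral Movement": 23, "Exfiltration": 21,
    "Persistence": 17, "Collection": 17, "Privilege Escalation": 13,
    "Resource Development": 5,
}
# Percentages as reported in the table.
REPORTED_PCT = {
    "Impact": 14.55, "Initial Access": 11.92, "Execution": 11.72,
    "Command and Control": 11.11, "Reconnaissance": 10.91, "Credential Access": 8.48,
    "Defense Evasion": 6.26, "Discovery": 5.66, "Lateral Movement": 4.65,
    "Exfiltration": 4.24, "Persistence": 3.43, "Collection": 3.43,
    "Privilege Escalation": 2.63, "Resource Development": 1.01,
}

total = sum(TACTIC_COUNTS.values())  # denominator: all study-tactic mappings
recomputed = {t: round(100 * c / total, 2) for t, c in TACTIC_COUNTS.items()}
mismatches = [t for t in TACTIC_COUNTS if recomputed[t] != REPORTED_PCT[t]]

print(total)       # 495
print(mismatches)  # []
```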
| Technique | Count | Percentage |
|---|---|---|
| Network Denial of Service | 61 | 7.71% |
| Endpoint Denial of Service | 44 | 5.56% |
| Exploit Public-Facing Application | 43 | 5.44% |
| Active Scanning | 40 | 5.06% |
| Gather Victim Host Information | 36 | 4.55% |
| Brute Force | 34 | 4.30% |
| Application Layer Protocol | 32 | 4.05% |
| Command-Line Interface | 32 | 4.05% |
| Phishing | 25 | 3.16% |
| Input Capture | 23 | 2.91% |

| Number of Tactics | Number of Papers | Percentage |
|---|---|---|
| 1 | 19 | 19.19% |
| 2 | 8 | 8.08% |
| 3 | 11 | 11.11% |
| 4 | 11 | 11.11% |
| 5 | 7 | 7.07% |
| 6 | 8 | 8.08% |
| 7 | 10 | 10.10% |
| 8 | 9 | 9.09% |
| 9 | 7 | 7.07% |
| 10 | 6 | 6.06% |
| 11 | 1 | 1.01% |
| 12 | 2 | 2.02% |
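This distribution can be reconciled with the tactic table. Multiply each number of tactics by its number of papers and sum the products: the result is 495, the same as the sum of the tactic counts. The two tables therefore describe the same set of study–tactic mappings, which implies an average of exactly five tactics per study. A short check with values copied from the table:

```python
# Number of tactics per paper -> number of papers, copied from the table above.
TACTICS_PER_PAPER = {1: 19, 2: 8, 3: 11, 4: 11, 5: 7, 6: 8,
                     7: 10, 8: 9, 9: 7, 10: 6, 11: 1, 12: 2}

papers = sum(TACTICS_PER_PAPER.values())
mappings = sum(k * n for k, n in TACTICS_PER_PAPER.items())

print(papers, mappings)   # 99 495
print(mappings / papers)  # 5.0
```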
| Number of Techniques | Number of Papers | Percentage |
|---|---|---|
| 1 | 14 | 14.14% |
| 2 | 11 | 11.11% |
| 3 | 6 | 6.06% |
| 4 | 12 | 12.12% |
| 5 | 5 | 5.05% |
| 6 | 4 | 4.04% |
| 7 | 4 | 4.04% |
| 8 | 11 | 11.11% |
| 9 | 1 | 1.01% |
| 10 | 1 | 1.01% |
| 11 | 7 | 7.07% |
| 12 | 5 | 5.05% |
| 13 | 2 | 2.02% |
| 16 | 1 | 1.01% |
| 17 | 2 | 2.02% |
| 18 | 3 | 3.03% |
| 19 | 1 | 1.01% |
| 22 | 2 | 2.02% |
| 23 | 2 | 2.02% |
| 24 | 1 | 1.01% |
| 26 | 3 | 3.03% |
| 27 | 1 | 1.01% |
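The same reconciliation works for techniques, and it recovers a denominator that the technique table does not state. That table lists only the ten most frequent techniques, so its counts cannot be summed to a total. The weighted sum of the distribution above, however, gives 791 study–technique mappings, and every reported technique percentage equals its count divided by 791. A sketch of the check, with values copied from both tables:

```python
# Number of techniques per paper -> number of papers, copied from the table above.
TECHNIQUES_PER_PAPER = {1: 14, 2: 11, 3: 6, 4: 12, 5: 5, 6: 4, 7: 4, 8: 11,
                        9: 1, 10: 1, 11: 7, 12: 5, 13: 2, 16: 1, 17: 2, 18: 3,
                        19: 1, 22: 2, 23: 2, 24: 1, 26: 3, 27: 1}
# Count and reported percentage for each of the ten most frequent techniques.
TOP_TECHNIQUES = {
    "Network Denial of Service": (61, 7.71),
    "Endpoint Denial of Service": (44, 5.56),
    "Exploit Public-Facing Application": (43, 5.44),
    "Active Scanning": (40, 5.06),
    "Gather Victim Host Information": (36, 4.55),
    "Brute Force": (34, 4.30),
    "Application Layer Protocol": (32, 4.05),
    "Command-Line Interface": (32, 4.05),
    "Phishing": (25, 3.16),
    "Input Capture": (23, 2.91),
}

papers = sum(TECHNIQUES_PER_PAPER.values())
mappings = sum(k * n for k, n in TECHNIQUES_PER_PAPER.items())
mismatches = [name for name, (count, pct) in TOP_TECHNIQUES.items()
              if round(100 * count / mappings, 2) != pct]

print(papers, mappings)  # 99 791
print(mismatches)        # []
```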
| Main Category | Count | Subcategory | Count |
|---|---|---|---|
| Deep Learning Models | 72 | LSTM & Variants | 27 |
| | | Feedforward Networks & Variants | 24 |
| | | Core CNN Architectures | 22 |
| | | Transformer-Based Models | 9 |
| | | Autoencoders | 8 |
| | | Specialized/Advanced CNNs | 8 |
| | | GAN & Variants | 5 |
| | | GRU & Variants | 4 |
| | | Graph Neural Networks (GNN) | 3 |
| Hybrid, Ensemble & Explainable | 46 | Ensemble Learning Methods | 29 |
| | | Boosting | 16 |
| | | Hybrid Architectures | 13 |
| | | Interpretability | 4 |
| Classical Machine Learning Models | 34 | Statistical Models | 17 |
| | | SVM & Variants | 17 |
| | | Tree-Based Models | 16 |
| | | Bayesian Models | 9 |
| | | Clustering | 6 |
| | | Hidden Markov Models | 1 |
| Learning Paradigms and Optimization | 18 | Optimization Algorithms | 11 |
| | | Learning Paradigms & Feature Selection | 7 |

| Number of ML Methods Used | Number of Papers | Percentage |
|---|---|---|
| 1 | 22 | 22.2% |
| 2 | 25 | 25.3% |
| 3 | 17 | 17.2% |
| 4 | 13 | 13.1% |
| 5 | 7 | 7.1% |
| 6 | 9 | 9.1% |
| 7 | 3 | 3.0% |
| 8 | 1 | 1.0% |
| 11 | 1 | 1.0% |
| 13 | 1 | 1.0% |

| Number of ML Main Categories Used | Number of Papers | Percentage |
|---|---|---|
| 1 | 43 | 43.4% |
| 2 | 42 | 42.4% |
| 3 | 13 | 13.1% |
| 4 | 1 | 1.0% |

| Number of ML Subcategories Used | Number of Papers | Percentage |
|---|---|---|
| 1 | 25 | 25.3% |
| 2 | 37 | 37.4% |
| 3 | 12 | 12.1% |
| 4 | 12 | 12.1% |
| 5 | 7 | 7.1% |
| 6 | 4 | 4.0% |
| 7 | 2 | 2.0% |
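The method-taxonomy table can be reconciled with the two per-paper distribution tables. The main-category counts sum to 170 and the subcategory counts to 256, and the weighted sums of the two per-paper distributions give the same totals. This is consistent with each taxonomy count being the number of papers that use that category, with a paper that uses several categories counted once in each. The sketch below performs the check with values copied from the tables; the `weighted_total` helper is illustrative.

```python
# Paper counts per main category and per subcategory, from the taxonomy table.
MAIN_CATEGORY_COUNTS = {
    "Deep Learning Models": 72,
    "Hybrid, Ensemble & Explainable": 46,
    "Classical Machine Learning Models": 34,
    "Learning Paradigms and Optimization": 18,
}
SUBCATEGORY_COUNTS = {
    "Deep Learning Models": [27, 24, 22, 9, 8, 8, 5, 4, 3],
    "Hybrid, Ensemble & Explainable": [29, 16, 13, 4],
    "Classical Machine Learning Models": [17, 17, 16, 9, 6, 1],
    "Learning Paradigms and Optimization": [11, 7],
}
# Categories used per paper -> number of papers, from the two distribution tables.
MAIN_PER_PAPER = {1: 43, 2: 42, 3: 13, 4: 1}
SUB_PER_PAPER = {1: 25, 2: 37, 3: 12, 4: 12, 5: 7, 6: 4, 7: 2}


def weighted_total(distribution):
    """Total paper-category assignments implied by a per-paper distribution."""
    return sum(k * n for k, n in distribution.items())


main_total = sum(MAIN_CATEGORY_COUNTS.values())
sub_total = sum(sum(v) for v in SUBCATEGORY_COUNTS.values())

print(main_total, weighted_total(MAIN_PER_PAPER))  # 170 170
print(sub_total, weighted_total(SUB_PER_PAPER))    # 256 256
```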
| Category | Frequency | Most Used Datasets within Category |
|---|---|---|
| NIDD | 65 | CSE-CIC-IDS2017 (14), UNSW-NB15 (11), NSL-KDD (10), CSE-CIC-IDS2018 (4) |
| IoT-NIDD | 31 | ToN-IoT (5), EdgeIIoT 2023 (5), BoT-IoT (4), N-BaIoT (3) |
| Malware | 20 | Malimg (4), BIG 2015 (3) |
| Spam and Phishing (S&P) | 17 | Phishing Email Collection (4), PhishTank (3) |
| Custom-Collected Datasets | 16 | |
| ICS | 12 | SWaT dataset (3), Gas Pipeline (2) |
| Other | 7 | |
| Insider Threat | 4 | CERT Insider Threat (4) |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chizari, M.; Alam, A.; Ali Mirza, Q.K.; Chizari, H. A Tri-Axis Systematic Literature Review of AI-Powered Cyber Defense: ATT&CK-Aligned Analysis of Cyberattacks, Machine Learning Methods, and Datasets. Electronics 2026, 15, 2804. https://doi.org/10.3390/electronics15132804
Chicago/Turabian Style: Chizari, Mohammad, Abu Alam, Qublai Khan Ali Mirza, and Hassan Chizari. 2026. "A Tri-Axis Systematic Literature Review of AI-Powered Cyber Defense: ATT&CK-Aligned Analysis of Cyberattacks, Machine Learning Methods, and Datasets" Electronics 15, no. 13: 2804. https://doi.org/10.3390/electronics15132804

