Introducing UWF-ZeekData24: An Enterprise MITRE ATT&CK Labeled Network Attack Traffic Dataset for Machine Learning/AI
Abstract
1. Introduction
2. Related Works
3. Methodology
3.1. Experimental Setup
3.2. Overall Architectural Framework


3.3. Hadoop Cluster

3.4. The Enterprise MITRE ATT&CK Framework
3.5. Generating and Collecting the Data
3.5.1. Labs Used to Generate the Data
| Lab | Description of Lab | MITRE ATT&CK Tactic | MITRE ATT&CK Technique to Be Used for Data Collection | 
|---|---|---|---|
| Network Mapping | | Reconnaissance | Active Scanning |
| | | | Gather Victim Host Information |
| | | | Gather Victim Identity Information |
| | | | Gather Victim Network Information |
| Enumeration | | Reconnaissance | Active Scanning |
| | | | Gather Victim Host Information |
| | | | Gather Victim Identity Information |
| | | | Gather Victim Network Information |
| Attack Metasploit | | Initial Access | External Remote Services |
| Password Attacks | | Credential Access | Brute Force |
| | | | OS Credential Dumping |
| Reconnaissance | | Reconnaissance | Active Scanning |
| | | | Gather Victim Host Information |
| | | | Gather Victim Identity Information |
| | | | Gather Victim Network Information |
| Gaining Access | | Initial Access | Exploit Public Facing Application |
| | | | External Remote Services |
| | | | Valid Accounts |
| | | Credential Access | Brute Force |
| | | | Credentials from Password Stores |
| | | | Input Capture |
| | | | OS Credential Dumping |
| | | Lateral Movement | Exploitation of Remote Services |
| | | | Lateral Tool Transfer |
| | | | Remote Services Session Hijacking |
| | | | Remote Services |
| Execution | | Collection | Automated Exfiltration |

3.5.2. Scripts Used to Generate the Data
| Pseudocode 1: Nmap Scan | 
| for time in [00:00, 05:00, 11:00, 17:00] do | 
| sleep_random(3400) | 
| exploit_mitre_att&ck_t1595() | 
| Pseudocode 2: Psexec Exploit | 
| for time in [01:00, 07:00, 12:00, 18:00] do | 
| sleep_random(3500) | 
| exploit_mitre_att&ck_t1078() | 
| Pseudocode 3: GlassFish Exploit | 
| for time in [02:00, 08:00, 14:00, 19:00] do | 
| sleep_random(3500) | 
| exploit_mitre_att&ck_t1110() | 
| Pseudocode 4: ProFTPD Exploit | 
| for time in [03:00, 09:00, 15:00, 21:00] do | 
| sleep_random(3500) | 
| exploit_mitre_att&ck_t1190() | 
| Pseudocode 5: SMB Exploit | 
| for time in [04:00, 10:00, 16:00, 22:00] do | 
| sleep_random(3500) | 
| exploit_mitre_att&ck_t1078() | 
| exploit_mitre_att&ck_t1048() | 
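Pseudocode 1 to 5 share one pattern: each script has four fixed start times per day, waits a random number of seconds, and then runs its exploit. A minimal Python sketch of that pattern follows. The start times, maximum delays, and technique IDs are copied from the pseudocode; the function name `plan_day`, the calendar date, and the reading of `sleep_random(n)` as a uniform delay between 0 and n seconds are assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta

# Start times, maximum random delay (seconds), and technique IDs from Pseudocode 1-5.
SCHEDULE = {
    "nmap_scan": (["00:00", "05:00", "11:00", "17:00"], 3400, ["T1595"]),
    "psexec_exploit": (["01:00", "07:00", "12:00", "18:00"], 3500, ["T1078"]),
    "glassfish_exploit": (["02:00", "08:00", "14:00", "19:00"], 3500, ["T1110"]),
    "proftpd_exploit": (["03:00", "09:00", "15:00", "21:00"], 3500, ["T1190"]),
    "smb_exploit": (["04:00", "10:00", "16:00", "22:00"], 3500, ["T1078", "T1048"]),
}


def plan_day(day, rng):
    """Return (launch_time, script_name, technique_ids) tuples sorted by launch time."""
    plan = []
    for name, (starts, max_delay, techniques) in SCHEDULE.items():
        for hhmm in starts:
            hour, minute = map(int, hhmm.split(":"))
            base = day.replace(hour=hour, minute=minute, second=0, microsecond=0)
            # Assumption: sleep_random(n) waits a uniform number of seconds in [0, n].
            delay = rng.randint(0, max_delay)
            plan.append((base + timedelta(seconds=delay), name, techniques))
    return sorted(plan)


# Hypothetical date and seed, used only to make the example repeatable.
plan = plan_day(datetime(2024, 1, 1), random.Random(0))
for launch, name, techniques in plan[:3]:
    print(launch.isoformat(), name, techniques)
```

Because every maximum delay is below 3600 s and no two scripts share a start hour, each of the 20 daily launches stays inside its own one-hour slot under this reading.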
| Pseudocode 6: Attack Script | 
| # Define the CSV file path | 
| Set CSV_FILE to “/home/kali/nmap_scan.csv” | 
| # Get the current timestamp in “MM/DD/YYYY HH:MM:SS” format | 
| Set TIMESTAMP to current date in “MM/DD/YYYY HH:MM:SS” format | 
| # Set constants | 
| Set GROUP_NUMBER to “1” | 
| Set TACTIC_ID to “T1595” | 
| Set SOURCE_IP to “143.88.1.18” | 
| Set SOURCE_PORT to “” (empty) | 
| Set TARGET_IP to “143.88.2.1-21” | 
| Set TARGET_PORT to “445” | 
| # Get the start time and date in UTC and split it into components | 
| Set START_TIME_DATE to current date in “YYYY-MM-DDTHH:MM:SSZ” format (UTC) | 
| Set START_YEAR to current year in “YYYY” format | 
| Set START_MONTH to current month in “MM” format | 
| Set START_DAY to current day in “DD” format | 
| Set START_TIME to current time in “HH:MM:SS” format | 
| # Run the nmap scan with specified target IP, port, and output file | 
| Run nmap with options: | 
| - Timing template “T4” | 
| - Port set to TARGET_PORT | 
| - Target IP set to TARGET_IP | 
| - Output results in XML format to “nmapOut.xml” | 
| # Get the end time and date components | 
| Set END_YEAR to current year in “YYYY” format | 
| Set END_MONTH to current month in “MM” format | 
| Set END_DAY to current day in “DD” format | 
| Set END_TIME to current time in “HH:MM:SS” format | 
| # Write all collected data into the CSV file | 
| Append to CSV_FILE: | 
| TIMESTAMP, GROUP_NUMBER, TACTIC_ID, SOURCE_IP, SOURCE_PORT, TARGET_IP, TARGET_PORT, | 
| START_TIME_DATE, START_YEAR, START_MONTH, START_DAY, START_TIME, END_YEAR, | 
| END_MONTH, END_DAY, END_TIME | 
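Pseudocode 6 can be sketched in Python as shown below. This is an illustrative translation, not the authors' script: the function names, the injectable `run` and `now` parameters (added so the logic can be exercised without launching a scan), and the choice to record the end-time components in UTC are assumptions. The constants and the 16-column row layout come from the pseudocode, and `-T4`, `-p`, and `-oX` are standard Nmap options. Note that the value stored under TACTIC_ID, T1595, is the ATT&CK technique ID for Active Scanning.

```python
import csv
import subprocess
from datetime import datetime, timezone


def build_nmap_command(target_ip, target_port, xml_out="nmapOut.xml"):
    # -T4: timing template, -p: port, -oX: write results as XML to the given file.
    return ["nmap", "-T4", "-p", str(target_port), "-oX", xml_out, target_ip]


def run_and_log(csv_file, run=subprocess.run, now=datetime.now):
    """Run the scan and append one mission-log row to csv_file; return the row."""
    group_number, technique_id = "1", "T1595"
    source_ip, source_port = "143.88.1.18", ""
    target_ip, target_port = "143.88.2.1-21", "445"

    timestamp = now().strftime("%m/%d/%Y %H:%M:%S")  # local time, MM/DD/YYYY HH:MM:SS
    start = now(timezone.utc)
    run(build_nmap_command(target_ip, target_port), check=False)
    end = now(timezone.utc)  # assumption: end components are also taken in UTC

    row = [
        timestamp, group_number, technique_id, source_ip, source_port,
        target_ip, target_port,
        start.strftime("%Y-%m-%dT%H:%M:%SZ"), start.strftime("%Y"),
        start.strftime("%m"), start.strftime("%d"), start.strftime("%H:%M:%S"),
        end.strftime("%Y"), end.strftime("%m"), end.strftime("%d"),
        end.strftime("%H:%M:%S"),
    ]
    with open(csv_file, "a", newline="") as handle:
        csv.writer(handle).writerow(row)
    return row


# Usage on the attacker VM (requires nmap to be installed):
# run_and_log("/home/kali/nmap_scan.csv")
```

Passing the command as a list avoids shell quoting issues, and appending rather than overwriting keeps one row per scan in the mission log.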
3.6. Mapping and Labeling Data
1. Preprocessing Mission Logs
   - Time Conversion: Mission log timestamps are converted to epoch time.
   - Array Creation: Arrays are built for specific features within the logs, such as source/destination ports, source/destination IP addresses, and attack indicators.
2. Preprocessing Conn Data
   - As with the mission logs, timestamps are converted, and attribute names that contain “.” are renamed to maintain compatibility with Spark processing.
3. Joining Mission Logs with Conn Data
   - Mission logs and Conn data are joined on time intervals (with a slop factor, which was 1 min in this case), IP addresses, and port numbers.
   - After the join, the Conn data inherits the attack information from the mission logs.
4. Merging with STIX Data
   - Labeled Conn data is combined with STIX data to enhance MITRE technique-to-tactic mappings.
   - Flattening the array structures in the IP and attack fields allows for cases where a single technique relates to multiple tactics.
5. Final Labeled Conn Data
   - Benign entries are labeled with mitre_attack == none and label_tactic == none.
6. Final Labeled DNS Data
   - Finally, the labeled DNS dataset is created by joining the labeled Conn data with the raw DNS data on unique identifiers (uid).
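The six steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the Spark implementation used for the dataset: the mission-log field names (`technique`, `start`, `end`, `src_ip`, `dst_ips`, `dst_port`), the small `TECHNIQUE_TO_TACTICS` lookup standing in for the STIX data, and the sample records are all hypothetical. The Conn field names (`ts`, `uid`, `id.orig_h`, `id.resp_h`, `id.resp_p`) follow Zeek's conn log.

```python
from datetime import datetime, timezone

SLOP_SECONDS = 60  # 1 min slop factor from step 3

# Hypothetical stand-in for the technique-to-tactic mappings taken from STIX data.
TECHNIQUE_TO_TACTICS = {
    "T1595": ["Reconnaissance"],
    "T1078": ["Defense Evasion", "Persistence", "Privilege Escalation", "Initial Access"],
}


def to_epoch(timestamp):
    """Step 1: convert a UTC 'YYYY-MM-DDTHH:MM:SSZ' string to epoch seconds."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def rename_dotted(record):
    """Step 2: replace '.' in attribute names, e.g. id.orig_h -> id_orig_h."""
    return {name.replace(".", "_"): value for name, value in record.items()}


def label_conn(conn_records, missions):
    """Steps 3-5: join on time window, IPs, and port, then emit one row per tactic."""
    labeled = []
    for record in map(rename_dotted, conn_records):
        techniques = [
            m["technique"]
            for m in missions
            if m["start"] - SLOP_SECONDS <= record["ts"] <= m["end"] + SLOP_SECONDS
            and record["id_orig_h"] == m["src_ip"]
            and record["id_resp_h"] in m["dst_ips"]
            and record["id_resp_p"] == m["dst_port"]
        ]
        if not techniques:  # benign traffic
            labeled.append({**record, "mitre_attack": "none", "label_tactic": "none"})
        for technique in techniques:
            for tactic in TECHNIQUE_TO_TACTICS.get(technique, ["none"]):
                labeled.append({**record, "mitre_attack": technique, "label_tactic": tactic})
    return labeled


def label_dns(dns_records, labeled_conn):
    """Step 6: attach Conn labels to DNS records that share a uid."""
    by_uid = {}
    for row in labeled_conn:
        by_uid.setdefault(row["uid"], []).append(row)
    return [
        {**dns, "mitre_attack": row["mitre_attack"], "label_tactic": row["label_tactic"]}
        for dns in dns_records
        for row in by_uid.get(dns["uid"], [])
    ]


# Hypothetical sample data for illustration only.
mission = {
    "technique": "T1595",
    "start": to_epoch("2024-01-01T00:10:00Z"),
    "end": to_epoch("2024-01-01T00:12:00Z"),
    "src_ip": "143.88.1.18",
    "dst_ips": {"143.88.2.1", "143.88.2.2"},
    "dst_port": 445,
}
conn = [
    {"ts": mission["start"] - 30, "uid": "C1", "id.orig_h": "143.88.1.18",
     "id.resp_h": "143.88.2.1", "id.resp_p": 445},   # inside the slop window
    {"ts": mission["start"] - 300, "uid": "C2", "id.orig_h": "143.88.1.18",
     "id.resp_h": "143.88.2.1", "id.resp_p": 445},   # too early, stays benign
    {"ts": mission["start"], "uid": "C3", "id.orig_h": "143.88.2.5",
     "id.resp_h": "143.88.2.1", "id.resp_p": 53},    # unrelated DNS connection
]
labeled = label_conn(conn, [mission])
dns_labeled = label_dns([{"uid": "C3", "query": "example.com"}], labeled)
```

Emitting one output row per tactic is what makes a multi-tactic technique such as T1078 appear several times in the labeled data, which is the effect the flattening in step 4 describes.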
 
 

4. The Dataset
4.1. Zeek Logs
| Name | Total Count | Description | 
|---|---|---|
| mission_logs | 29,550 | Used for collating records. | 
| broker | 19,818 | Communication file used to enforce asynchronous distributed communication, as well as to interact with persistent data stores. | 
| capture_loss | 19,746 | Shows how well Zeek’s management and analysis tools are working. A missing TCP sequence set is correlated to a “gap” of lost data. This lost data results in a capture_loss file. | 
| cluster | 84 | Zeek cluster messages. | 
| conn-summary | 4433 | |
| conn | 46,991,170 | Tracks protocols and associated information such as IP addresses, durations, transferred (two way) bytes, states, packets, and tunnel information. Conn files provide all data regarding the connection between two points. | 
| dhcp | 32,113 | Helps correlate IP addresses and MAC addresses and potentially hostnames. From a security standpoint, this allows for the confirmation of connected systems/services and potential intrusion detection by determining which system is assigned to which IP address. | 
| dns | 59,041,059 | Provides a swath of information on how specific systems access and utilize the internet and other systems; focuses on the system that is asking a question and all elements of the question and its associated answer. | 
| loaded_scripts | 1455 | |
| notice | 7111 | An event that Zeek learning has determined to be inspection-worthy; these are often higher-level alerts such as self-signed certs and are Zeek’s approximate equivalent to IDS alerts. | 
| reporter | 4 | Internal error/warning messages. | 
| stats | 34,549 | Memory/event/packet/lag statistics. | 
| stderr | 21 | Captures standard errors when Zeek is started from ZeekControl. | 
| stdout | 32 | Captures standard outputs when Zeek is started from ZeekControl. | 
| weird | 4000 | Anything that does not fall into any other category. | 
4.2. Mission Logs
4.3. Tactics and Techniques in UWF-ZeekData24
| Attack Type | Description | 
|---|---|
| Reconnaissance | Active or passive tactics for gathering information that can be used to plan future operations. | 
| Discovery | Tactics that may be used to gain knowledge about the system and internal network. | 
| Credential access | Tactics for stealing credentials such as account names and passwords. | 
| Privilege escalation | Tactics used to gain higher-level permissions on systems or networks. | 
| Exfiltration | Tactics used to steal data from the network. | 
| Initial access | Tactics that use various entry vectors to gain an initial foothold within the network. | 
| Persistence | Tactics used to keep access to systems across restarts, changed credentials, and other interruptions. | 
| MITRE Tactic Attack Type | Count | % | 
|---|---|---|
| Credential Access | 871,188 | 90.88 | 
| Reconnaissance | 58,095 | 6.06 | 
| Initial Access | 10,662 | 1.11 | 
| Privilege Escalation | 6048 | 0.631 | 
| Persistence | 6048 | 0.631 | 
| Defense Evasion | 6048 | 0.631 | 
| Exfiltration | 559 | 0.0583 | 
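The percentage column can be recomputed from the counts as a quick check. The sketch below assumes each percentage is the tactic count divided by the sum of all tactic counts; that sum is 958,648, which matches the malicious traffic count reported in Section 5. The variable names are illustrative.

```python
# Tactic counts copied from the table above.
tactic_counts = {
    "Credential Access": 871_188,
    "Reconnaissance": 58_095,
    "Initial Access": 10_662,
    "Privilege Escalation": 6_048,
    "Persistence": 6_048,
    "Defense Evasion": 6_048,
    "Exfiltration": 559,
}

total = sum(tactic_counts.values())
percentages = {tactic: 100 * count / total for tactic, count in tactic_counts.items()}

for tactic, share in percentages.items():
    print(f"{tactic:<22}{tactic_counts[tactic]:>9,}{share:>10.4f}")
```

Under this assumption, Exfiltration accounts for roughly 0.058% of the labeled rows, that is, a fraction of about 5.83 × 10⁻⁴.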
| MITRE Technique | Count | % | 
|---|---|---|
| T1110 | 871,188 | 90.87 | 
| T1595 | 58,095 | 6.06 | 
| T1078 | 6048 | 1.89 | 
| T1190 | 4614 | 0.63 | 
| T1048 | 559 | 0.48 | 
5. Traffic Analysis
| Traffic_Type_Relabelled | Count | % | 
|---|---|---|
| Malicious traffic | 958,648 | 50 | 
| Non-malicious traffic | 958,561 | 50 | 
Traffic Analysis of Cumulative Flows
| Features | Sub-Features | Counts | 
|---|---|---|
| src_bytes | | 1,063,460,303 |
| dest_bytes | | 12,461,401,543 |
| src_pkts | | 45,053,268 |
| dest_pkts | | 53,723,672 |
| Protocol Types | udp | 928,896 |
| | icmp | 2691 |
| | tcp | 985,622 |
| Unique | src_ip | 55 |
| | dest_ip | 166 |
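The counts in the two tables of this section can be cross-checked against each other. The short sketch below, with illustrative variable names, confirms that the per-protocol flow counts add up to the combined malicious and non-malicious flow count and derives each share; it is a consistency check on the published figures, not part of the authors' analysis.

```python
# Counts copied from the two traffic tables above.
class_counts = {"malicious": 958_648, "non_malicious": 958_561}
protocol_counts = {"udp": 928_896, "icmp": 2_691, "tcp": 985_622}

total_flows = sum(class_counts.values())

# Every flow has exactly one protocol, so both breakdowns should share one total.
totals_agree = sum(protocol_counts.values()) == total_flows

class_share = {name: 100 * n / total_flows for name, n in class_counts.items()}
protocol_share = {name: 100 * n / total_flows for name, n in protocol_counts.items()}

print(total_flows, totals_agree)
print({name: round(share, 2) for name, share in class_share.items()})
print({name: round(share, 2) for name, share in protocol_share.items()})
```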
6. Crowd-Sourced Data Versus Controlled Data
6.1. UWF-ZeekData22 Plots










6.2. UWF-ZeekData24 Plots










7. Conclusions
8. Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| NTP | Network Time Protocol | 
| IP | Internet Protocol | 
| FTP | File Transfer Protocol | 
| DNS | Domain Name System | 
| DoS | Denial of Service | 
| LAN | Local Area Network | 
| WAN | Wide Area Network | 
| VLAN | Virtual Local Area Network | 
| STIX | Structured Threat Information Expression | 
| PCAP | Packet Capture | 
| HDFS | Hadoop Distributed File System | 
| VM | Virtual Machine | 
| IDS | Intrusion Detection System | 
| IPS | Intrusion Prevention System | 
| TTP | Tactics, Techniques, Procedures | 
| SMB | Server Message Block | 
 
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Elam, M.; Mink, D.; Bagui, S.S.; Plenkers, R.; Bagui, S.C. Introducing UWF-ZeekData24: An Enterprise MITRE ATT&CK Labeled Network Attack Traffic Dataset for Machine Learning/AI. Data 2025, 10, 59. https://doi.org/10.3390/data10050059
