Detecting AI-Generated Network Traffic Using Transformer–MLP Ensemble
Abstract
1. Introduction
- We propose a dual-stage intrusion detection system that combines a lightweight LSTM-based primary detector for conventional malicious traffic with a Transformer–MLP ensemble secondary verifier designed to identify AI-generated traffic.
- We introduce a message-type transition–based feature representation tailored to the MQTT protocol, enabling the detection of adversarial traffic that imitates legitimate communication patterns but fails to reproduce deeper protocol-level behaviours.
- We present a log-ratio–based anomaly scoring mechanism that captures subtle probabilistic irregularities while reducing false positives, thereby enhancing overall detection reliability.
- We validate the proposed approach through extensive experiments on the MQTTset dataset under two representative scenarios (DoS and multi-attack), achieving an average accuracy of 99.1% and 100% detection of AI-generated traffic. Additionally, we analyse inference latency, computational complexity, and trade-offs between detection accuracy and real-time deployment feasibility to provide practical insights for real-world IoT applications.
2. Related Work
2.1. Malicious Network Traffic Detection
2.2. Generative AI-Based Data Generation and Detection
3. Fake Network Traffic Detection
3.1. MQTT
3.2. Malicious Network Traffic Detection (LSTM)
3.2.1. Dataset Preprocessing and Feature Extraction
- Source Port Index: To reflect the characteristic behaviour of a single publisher initiating multiple ephemeral connections during DoS or Brute-force attacks, each new source port was assigned a sequential index. For example, the first port was indexed as 1001, and each subsequent new port received the next index (1002, 1003, and so on). Frequent port changes indicate a high connection attempt rate typical of resource-exhaustion or brute-force login attempts.
- TCP Length: In Flood attacks, attackers typically transmit unusually large packets to overload the broker. To capture this behaviour, packets with payload sizes below 10,000 bytes were normalised to −1, while packets exceeding this threshold retained their original length values.
- MQTT Message Type: The frequency and sequence of MQTT message types can vary significantly during Flood and Malformed attacks. For instance, Flood attacks generate a high volume of PUBLISH packets (type 3), while Malformed attacks exhibit abnormal sequences, such as a SUBSCRIBE request immediately followed by a PUBLISH message, that attempt to trigger exceptions on the broker.
- Keep Alive: In SlowITe attacks [22], attackers exploit the broker’s timeout mechanism by sending connection packets with abnormally high Keep Alive values (e.g., 65,535), forcing the broker to wait excessively before terminating the session. Values below 1000 were normalised to −1, while higher values were preserved to capture this anomaly.
- Connection ACK: In Brute-force attacks, repeated authentication failures result in frequent CONNACK responses with refusal codes (e.g., 5). These raw response codes were retained because they strongly indicate repeated failed login attempts.
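The five preprocessing rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the packet dictionary keys (`src_port`, `tcp_len`, `msg_type`, `keep_alive`, `connack_code`) and the function names are hypothetical, and two details are assumptions that the text leaves open, namely that values exactly equal to a threshold are kept and that fields absent from a packet are encoded as −1.

```python
from typing import Dict, Iterable, List, Optional

TCP_LEN_THRESHOLD = 10_000    # bytes; smaller packets are mapped to -1
KEEP_ALIVE_THRESHOLD = 1_000  # smaller Keep Alive values are mapped to -1
FIRST_PORT_INDEX = 1001       # index given to the first source port seen


def keep_if_large(value: Optional[int], threshold: int) -> int:
    """Return value if it reaches the threshold, else -1 (assumed boundary rule)."""
    if value is None or value < threshold:
        return -1
    return value


def extract_features(packets: Iterable[dict]) -> List[List[int]]:
    """Map each packet to [port index, TCP length, message type, keep alive, CONNACK code]."""
    port_index: Dict[int, int] = {}
    rows = []
    for pkt in packets:
        port = pkt["src_port"]
        if port not in port_index:
            # Each previously unseen source port gets the next sequential index.
            port_index[port] = FIRST_PORT_INDEX + len(port_index)
        connack = pkt.get("connack_code")
        rows.append([
            port_index[port],
            keep_if_large(pkt.get("tcp_len"), TCP_LEN_THRESHOLD),
            pkt["msg_type"],  # raw MQTT message type
            keep_if_large(pkt.get("keep_alive"), KEEP_ALIVE_THRESHOLD),
            -1 if connack is None else connack,  # raw CONNACK code, -1 if absent (assumption)
        ])
    return rows


# Toy packets (made-up values) covering a SlowITe-style CONNECT, a refused CONNACK,
# and an oversized PUBLISH from a new source port.
packets = [
    {"src_port": 50412, "tcp_len": 60, "msg_type": 1, "keep_alive": 65535},
    {"src_port": 50412, "tcp_len": 4, "msg_type": 2, "connack_code": 5},
    {"src_port": 50413, "tcp_len": 12000, "msg_type": 3},
]
features = extract_features(packets)
```

Mapping sub-threshold values to a single sentinel keeps the detector focused on the anomalous range of each field, at the cost of discarding variation among ordinary packets.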
3.2.2. Malicious Network Detection Model
3.3. Fake Network Traffic Generation
3.4. Fake Network Traffic Detection Using Transformer and MLP
3.4.1. Feature Selection for Fake Detection
3.4.2. Detection Model
3.5. Dual Detection Workflow Summary
4. Experimental Results
4.1. Experiment Setup
4.2. Malicious Network Traffic Detection Results
4.3. Fake Network Traffic Generation Results
4.4. Fake Network Traffic Detection Results
4.4.1. Model Performance Comparison
4.4.2. Inference Efficiency Analysis
4.4.3. Comparison with LSTM
4.5. Malicious and Fake Network Traffic Detection Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
GPT Prompt for Synthetic Traffic Generation
References
- Electricity AMI. Available online: https://www.aitimes.com/news/articleView.html?idxno=141421 (accessed on 22 August 2025).
- Gas AMI. Available online: https://www.gasnews.com/news/articleView.html?idxno=104555 (accessed on 22 August 2025).
- Water AMI. Available online: https://www.boannews.com/media/view.asp?idx=85538 (accessed on 22 August 2025).
- Global ICT Research. Available online: https://www.globalict.kr/product/product_list.do?menuCode=040200&knwldNo=143735 (accessed on 22 August 2025).
- MQTT. Available online: https://mqtt.org (accessed on 22 August 2025).
- IOT Security Threat Report. Available online: https://www.msit.go.kr/bbs/view.do?sCode=user&nttSeqNo=3185279&pageIndex=&searchTxt=&searchOpt=ALL&bbsSeqNo=94&mId=307&mPid=208 (accessed on 22 August 2025).
- Three Open Source MQTT Message Brokers Found Vulnerable Against a DoS. Available online: https://www.secureblink.com/cyber-security-news/three-open-source-mqtt-message-brokers-found-vulnerable-against-a-dos-cyrc-alerted (accessed on 22 August 2025).
- 32,000 Smart Homes Can Be Easily Hacked Due to Misconfigured MQTT Servers. Available online: https://www.csoonline.com/article/566079/32000-smart-homes-can-be-easily-hacked-due-to-misconfigured-mqtt-servers.html (accessed on 22 August 2025).
- Ullah, F.; Ullah, S.; Srivastava, G.; Lin, J.C.W. IDS-INT: Intrusion Detection System Using Transformer-Based Transfer Learning for Imbalanced Network Traffic. Digit. Commun. Netw. 2024, 10, 190–204.
- Bazaluk, B.; Hamdan, M.; Ghaleb, M.; Gismalla, M.S.M.; Correa da Silva, F.S.; Batista, D.M. Towards a Transformer-Based Pre-trained Model for IoT Traffic Classification. In Proceedings of the 2024 IEEE Network Operations and Management Symposium (NOMS), Seoul, Republic of Korea, 20–24 May 2024; pp. 1–7.
- Al Hanif, A.; Ilyas, M. Enhance the Detection of DoS and Brute-Force Attacks within the MQTT Environment through Feature Engineering and Employing an Ensemble Technique. arXiv 2024, arXiv:2408.00480.
- Choi, S.; Cho, J. Novel Feature-Extraction Method for Detecting Malicious MQTT Traffic Using Seq2Seq. Appl. Sci. 2022, 12, 12306.
- Vaccari, I.; Chiola, G.; Aiello, M.; Mongelli, M.; Cambiaso, E. MQTTset, a New Dataset for Machine Learning Techniques on MQTT. Sensors 2020, 20, 6578.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2690.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 22 August 2025).
- Lee, S.; Kim, B.; Choi, S. MQTT-based IDS Evasion Method using GPT. J. Korean Inst. Inf. Technol. 2024, 22, 175–182.
- Mitchell, E.; Lee, Y.; Khazatsky, A.; Manning, C.D.; Finn, C. DetectGPT: Zero-Shot Machine-Generated Text Detection Using Probability Curvature. In Proceedings of the 40th International Conference on Machine Learning (ICML), Honolulu, HI, USA, 23–29 July 2023; pp. 24950–24968.
- Tian, E. GPTZero. Available online: https://gptzero.me (accessed on 22 August 2025).
- Elaziz, M.A.; Fares, I.A.; Dahou, A.; Shrahili, M. Federated learning framework for IoT intrusion detection using tab transformer and nature-inspired hyperparameter optimization. Front. Big Data 2025, 8, 1526480.
- Yuan, X.; Han, S.; Huang, W.; Ye, H.; Kong, X.; Zhang, F. A Simple Framework to Enhance the Adversarial Robustness of Deep Learning-based Intrusion Detection System. Comput. Secur. 2024, 137, 103644.
- Wali, S.; Farrukh, Y.A.; Khan, I. Explainable AI and Random Forest Based Reliable Intrusion Detection System. Comput. Secur. 2025, 157, 104542.
- Vaccari, I.; Aiello, M.; Cambiaso, E. SlowITe, a Novel Denial of Service Attack Affecting MQTT. Sensors 2020, 20, 2932.
- OpenAI. GPT-4o: Omni Model for Text, Audio, and Vision. Available online: https://openai.com/index/hello-gpt-4o (accessed on 22 August 2025).








| Attack | Source Port Index | TCP Length | Message Type | Keep Alive | Connection ACK |
|---|---|---|---|---|---|
| DoS Attack | ○ | - | - | - | - |
| Flood Attack | - | ○ | ○ | - | - |
| SlowITe Attack | - | - | - | ○ | - |
| Malformed Attack | - | - | ○ | - | - |
| Brute-force Attack | ○ | - | - | - | ○ |
| Message Type | Next Message Type (Most Frequent) | Transition Probability (%) |
|---|---|---|
| 1 (CONNECT) | 2 (CONNACK) | 100 |
| 2 (CONNACK) | 8 (SUBSCRIBE) | 100 |
| 3 (PUBLISH) | 3 (PUBLISH) | 99.45 |
| 8 (SUBSCRIBE) | 9 (SUBACK) | 100 |
| 9 (SUBACK) | 3 (PUBLISH) | 100 |
| 12 (PINGREQ) | 13 (PINGRESP) | 100 |
| 13 (PINGRESP) | 3 (PUBLISH) | 100 |
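
A transition table of this kind can be derived by counting consecutive message-type pairs in a capture. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, and it is not the authors' feature-extraction code.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def most_frequent_transitions(msg_types: Sequence[int]) -> Dict[int, Tuple[int, float]]:
    """For each message type, return (most frequent next type, its probability in %)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    # Count every (current, next) pair of consecutive message types.
    for current, nxt in zip(msg_types, msg_types[1:]):
        counts[current][nxt] += 1

    table = {}
    for current, next_counts in counts.items():
        nxt, n = next_counts.most_common(1)[0]
        table[current] = (nxt, 100.0 * n / sum(next_counts.values()))
    return table


# Toy sequence (made up): CONNECT, CONNACK, SUBSCRIBE, SUBACK, three PUBLISH,
# PINGREQ, PINGRESP, PUBLISH.
sequence = [1, 2, 8, 9, 3, 3, 3, 12, 13, 3]
transitions = most_frequent_transitions(sequence)
```

Traffic whose observed transitions deviate from such a table, for example a CONNACK that is not followed by a SUBSCRIBE, is a candidate for closer inspection.
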
| Data Type | Train Data | Test Data |
|---|---|---|
| Normal | 40,000 | 10,000 |
| Malicious (DoS) | 40,000 | 10,000 |
| Fake (GPT-4o) | - | 10,000 |
| Fake (GAN) | - | 10,000 |
| Data Type | Train Data | Test Data |
|---|---|---|
| Normal | 12,000 | 3000 |
| Malicious (All types) | 12,000 | 3000 |
| Fake (GPT-4o) | - | 3000 |
| Fake (GAN) | - | 3000 |
| Experiment | Traffic Type | Accuracy (%) |
|---|---|---|
| DoS | Normal (10,000) | 100 |
| DoS | Malicious (10,000) | 100 |
| All types | Normal (3000) | 99.13 |
| All types | Malicious (3000) | 94.83 |
| Experiment | Traffic Type | Accuracy (%) |
|---|---|---|
| DoS | Fake (GPT-4o) | 66.09 |
| DoS | Fake (GAN) | 57.9 |
| All types | Fake (GPT-4o) | 98.09 |
| All types | Fake (GAN) | 36.87 |
| Scenario | Model | Normal | Malicious | Fake (GPT-4o) | Fake (GAN) |
|---|---|---|---|---|---|
| DoS | Transformer | 100% | 99.41% | 100% | 100% |
| DoS | MLP | 100% | 100% | 100% | 100% |
| DoS | Ensemble | 100% | 100% | 100% | 100% |
| All types | Transformer (Ensemble) | 100% | 87.69% | 100% | 100% |
| All types | MLP | 91.63% | 25.93% | 100% | 100% |
| Model | Total Time (s) | Per Sequence Latency (ms) | Throughput (seq/s) |
|---|---|---|---|
| MLP | 88.74 | 8.874 | 112.7 |
| Transformer | 209.54 | 20.954 | 47.7 |
| Ensemble | 238.73 | 23.873 | 41.9 |
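
The three columns of the efficiency table are related by simple arithmetic, which the sketch below reproduces. The sequence count of 10,000 is an assumption inferred from the reported figures (88.74 s divided by 8.874 ms); the function name is hypothetical.

```python
from typing import Tuple

N_SEQUENCES = 10_000  # assumed: total time / per-sequence latency in the table


def latency_and_throughput(total_time_s: float, n_sequences: int) -> Tuple[float, float]:
    """Return (per-sequence latency in ms, throughput in sequences per second)."""
    per_sequence_ms = 1000.0 * total_time_s / n_sequences
    throughput = n_sequences / total_time_s
    return per_sequence_ms, throughput


# Total inference times in seconds, as reported in the table.
total_times = {"MLP": 88.74, "Transformer": 209.54, "Ensemble": 238.73}
derived = {
    model: latency_and_throughput(t, N_SEQUENCES) for model, t in total_times.items()
}
```

Rounded to the table's precision, the derived values match the reported latency and throughput for all three models.
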
| Experiment | Traffic Type | Accuracy (%) |
|---|---|---|
| DoS | Normal | 98.9 |
| DoS | Malicious | 100 |
| DoS | Fake (GPT-4o) | 100 |
| DoS | Fake (GAN) | 100 |
| All types | Normal | 99.13 |
| All types | Malicious | 94.83 |
| All types | Fake (GPT-4o) | 100 |
| All types | Fake (GAN) | 100 |
| Scenario | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| DoS | 99.73% | 99.63% | 100% | 99.81% |
| All types | 98.49% | 99.71% | 98.28% | 98.99% |
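
As a consistency check, the F1-scores in this table follow from the reported precision and recall through the standard harmonic-mean formula. The snippet below is a minimal sketch; the function and variable names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as fractions, taken from the table above.
f1_dos = 100 * f1_score(0.9963, 1.0)
f1_all = 100 * f1_score(0.9971, 0.9828)
```

Rounded to two decimals, these give 99.81% for the DoS scenario and 98.99% for the all-types scenario, in agreement with the table.
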
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, B.; Chaudhary, A.; Choi, S. Detecting AI-Generated Network Traffic Using Transformer–MLP Ensemble. Appl. Sci. 2025, 15, 11338. https://doi.org/10.3390/app152111338

