Shield-X: Vectorization and Machine Learning-Based Pipeline for Network Traffic Threat Detection
Abstract
1. Introduction
2. Proposed Method
3. Results
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Model | Accuracy | Macro Precision | Macro Recall | Macro F1-Score |
|---|---|---|---|---|
| XGBoost | 0.9700 | 0.95 | 0.94 | 0.94 |
| Random Forest | 0.9600 | 0.98 | 0.95 | 0.96 |
| KNN | 0.9736 | 0.9787 | 0.9775 | 0.9773 |
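
The macro-averaged columns in the table above weight every traffic class equally, regardless of how many samples each class has. As a minimal sketch of how such figures are typically computed (the function name `macro_metrics` and the toy labels are illustrative assumptions, not the authors' evaluation code), the per-class precision, recall, and F1 are computed first and then averaged:

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p gains a false positive
            fn[t] += 1          # true class t gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Classes with no predictions (or no samples) score 0.0 by convention here.
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels, invented for this example.
y_true = ["benign", "benign", "benign", "attack", "attack", "scan"]
y_pred = ["benign", "benign", "attack", "attack", "attack", "benign"]
acc, macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred)
print(f"{acc:.4f} {macro_p:.4f} {macro_r:.4f} {macro_f1:.4f}")
```

Because macro F1 is the mean of the per-class F1 scores, it need not equal the harmonic mean of macro precision and macro recall, which is one reason a reported macro F1 can sit slightly outside what the other two columns might suggest.
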

| Metric | Simulated Traffic | Real Traffic |
|---|---|---|
| Detection Rate | 99.9% | 97.8% |
| False Positives | 1% | 5% |
| Average Detection Time | 8 ms | 12 ms |
| Throughput (packets/s) | 1.2 M | 800 K |
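
The table above does not state how its rates are defined. As a rough sketch, assuming the detection rate is the fraction of malicious samples that are flagged and the false-positive figure is the fraction of benign samples that are flagged (both definitions, the function name, and the counts below are assumptions made for illustration), the two quantities follow from confusion-matrix counts:

```python
def detection_summary(tp, fn, fp, tn):
    """Return (detection rate, false positive rate) from confusion counts.

    Assumed definitions, not confirmed by the source:
      detection rate      = TP / (TP + FN)
      false positive rate = FP / (FP + TN)
    """
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
dr, fpr = detection_summary(tp=90, fn=10, fp=5, tn=95)
print(f"detection rate = {dr:.1%}, false positive rate = {fpr:.1%}")
```

If the false-positive figure were instead measured as a share of all raised alerts, the denominator would be FP + TP, so the definition matters when comparing results across papers.
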
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
de Oliveira, C.H.M.; Ladeira, M.; Van Erven, G.C.G.; Gondim, J.J.C. Shield-X: Vectorization and Machine Learning-Based Pipeline for Network Traffic Threat Detection. Eng. Proc. 2026, 123, 10. https://doi.org/10.3390/engproc2026123010

