1. Introduction
Artificial Intelligence (AI) has experienced a remarkable boom, driven by ever wider access to computing and by the time savings that solutions based on this technology make possible. However, this same advance has created new risks: cybercriminals actively seek to obtain personal data and compromise users’ privacy. Among the most innovative applications of AI is the protection of computer systems and networks; however, these defences can also become attractive targets for attackers attempting to breach an organisation’s security. One of the most significant risks facing AI models is adversarial attacks, which can allow unauthorised access to sensitive data or cause models to make erroneous predictions. It is therefore essential to consider these types of attacks during the development and training of models in order to identify and reinforce their weaknesses.
This paper studies attacks targeting AI algorithms used to classify traffic from messaging applications. Three algorithms were analysed: K-Nearest Neighbours, Decision Trees, and Multilayer Perceptron, comparing the results obtained before and after each model was subjected to an attack. The experiments show that, through attacks generated with Generative Adversarial Networks (GANs), these models are susceptible to malicious manipulation, although the effectiveness and complexity of the attacks depend on the algorithm and the structure of the classification model. It should be noted that such attacks compromise not only the accuracy of the classifications but also the privacy and security of messaging app users. The rest of the work is organised as follows:
Section 2 describes the importance and different types of attacks and their modalities according to the attacker’s knowledge.
Section 3 describes the adversarial threat model, focusing on AI.
Section 4 describes experiments to confuse AI models responsible for traffic filtering.
Finally, Section 5 presents the conclusions of the work.
2. Security in Artificial Intelligence
Security in Artificial Intelligence (AI) has become an increasingly relevant issue in the field of technology. As AI is integrated into a wide variety of systems and applications, ensuring that they are secure and reliable becomes ever more critical. The incorporation of AI into cybersecurity has proven essential for identifying, mitigating, and responding to new types of attacks more quickly and efficiently. Thanks to its capabilities, AI can protect user data and identity, optimise threat detection, streamline incident response, and automate complex processes. However, its application also carries risks. Among the main challenges are data security, attacks targeting AI models, the difficulty of code maintenance, and the increasing complexity of systems; addressing them is essential for the ethical, responsible, and secure implementation of AI in security and defence. One of the main concerns in this context is data privacy: since AI algorithms require large volumes of information for training, there is a risk that this data may be misused or shared with third parties without users’ consent. It is therefore essential to establish robust security and privacy measures that guarantee the responsible, transparent, and secure handling of information.
Attacks Based on the Attacker’s Knowledge
In the context of IT security applied to Artificial Intelligence (AI), attacks can be classified according to the attacker’s level of knowledge and experience. This classification ranges from basic techniques to highly specialised methods that exploit specific vulnerabilities in the system. At one end of the spectrum are general attacks, which do not require in-depth knowledge of the internal structure of the system; at the other end are sophisticated, targeted threats, which demand a detailed understanding of the defences and functioning of the target. As AI becomes increasingly integrated into security systems, attacks aimed at evading AI-based defensive mechanisms or manipulating machine learning models have become more frequent. This trend has significantly broadened the threat landscape in the field of cybersecurity, highlighting the need to develop more robust and adaptive protection strategies.
White Box Attacks: In white-box attacks, the adversary has complete knowledge of the target model: the training data, the architecture, and the parameters. To induce a failure, the attacker analyses the model’s behaviour and determines its vulnerabilities, for example, which inputs produce incorrect outputs; with this information, it is easier to design the generator and transfer knowledge from the classifier. This type of attack is widely used in the literature. In [1], a white-box attack is proposed that generates subtle perturbations capable of compromising machine learning models, achieving the highest success rate against adversarially trained models such as TRADES; the reported attacks reduce the accuracy of classifiers by between 16% and 31% in black-box transfer scenarios. In addition, Wang et al. [2] present an approach to minimise superfluous information in the generation of gradient-based adversarial examples and introduce two specific algorithms: the Integrated Finite Point Attack Algorithm (IFPA) and the Integrated Universe Attack Algorithm (IUA).
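To make the white-box setting concrete, the sketch below shows a minimal gradient-based perturbation in the style of the Fast Gradient Sign Method; it illustrates the general principle rather than the specific attacks of [1] or [2]. It assumes a differentiable Keras classifier `model` that outputs softmax probabilities, an input batch `x`, integer labels `y`, and an illustrative perturbation budget `epsilon`.

```python
import tensorflow as tf

def fgsm_perturb(model, x, y, epsilon=0.05):
    """White-box evasion sketch: perturb x along the sign of the loss
    gradient so that the classifier's error increases."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        # Assumes the model outputs softmax probabilities and y holds integer labels.
        loss = tf.keras.losses.sparse_categorical_crossentropy(y, model(x))
    grad = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grad)
```

Because the attacker can read the model’s gradients directly, a single step already pushes each sample towards the decision boundary; iterative variants simply repeat this step with a smaller epsilon.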
Black Box Attacks: Unlike the white-box case, black-box attacks do not require knowledge of the model, its configuration, or the technologies that support it [3]. The only information available about the attacked model is the output it produces for a given input. The adversarial network model that has yielded the best results in this type of attack is the MalGAN structure [4]. There are proposals that minimise the incidence of black-box attacks on ML algorithms; in [5], a strategy for protection against adversarial perturbations using GANs is presented. To reduce the dimensionality of the dataset and improve model performance, experiments were conducted using a random forest classifier together with dimensionality-reduction techniques such as principal component analysis and recursive feature elimination, and it was found that GAN-based adversarial training increased the model’s resilience and was able to mitigate black-box attacks. In [6], the security of attack detectors is studied under a realistic black-box threat model, in which various approaches are applied to the samples collected on the attacker’s side for query reduction and for understanding the anomaly-detection threshold.
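The defining constraint of the black-box setting is that only the target’s outputs are observable. A common way to work around it, and the idea underlying substitute-based structures such as MalGAN [4], is to query the target for labels, train a local surrogate on those labels, and craft perturbations against the surrogate that then transfer to the target. The sketch below illustrates that query-then-transfer loop with a scikit-learn surrogate; `black_box_predict`, the layer sizes, and the random-search step are illustrative placeholders, not the methods of [4–6].

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def black_box_predict(x):
    """Stand-in for the attacked model: only its output labels are observable."""
    return (x.sum(axis=1) > 0).astype(int)  # hypothetical decision rule

# 1. Query the target on attacker-chosen inputs and record its labels.
queries = rng.normal(size=(500, 63))              # 63 traffic features, as in Section 4
labels = black_box_predict(queries)

# 2. Train a local substitute that imitates the observed behaviour.
substitute = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(queries, labels)

# 3. Search for perturbations using only the substitute, then transfer them.
x = rng.normal(size=(10, 63))
x_adv = x.copy()
for i in range(len(x)):
    for _ in range(50):  # naive random search; MalGAN uses a trained generator instead
        candidate = x[i] + 0.3 * rng.normal(size=63)
        if substitute.predict(candidate[None])[0] != substitute.predict(x[i][None])[0]:
            x_adv[i] = candidate
            break

flipped = (black_box_predict(x) != black_box_predict(x_adv)).mean()
print(f"target labels changed for {flipped:.0%} of the samples")
```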
3. Adversarial Threat Model
Threat modelling is an essential component of system security analysis, whose main objective is to identify potential threats and vulnerabilities in order to establish security objectives and define policies that prevent or mitigate their impact. This stage is fundamental to maximising a system’s protection. Machine learning (ML) models are vulnerable to adversarial attacks. In the field of network security, these attacks are particularly frequent due to the critical nature of their applications, such as malware detection, intrusion identification, and spam filtering. In [7], a taxonomy is proposed for the generation of adversarial attacks against ML models implemented in security platforms that monitor network traffic, along with two classification approaches and various defence strategies against such attacks.
Classification of Adversarial Threats
Adversarial models consider different types of attacks based on multiple factors, including: (1) the attacker’s knowledge, which measures how much information is available about the AI architecture; (2) the timing of the attack, which distinguishes evasion attacks, which seek to confuse the model after training, from poisoning attacks, which alter the training data before learning in order to induce systematic errors (illustrated in the sketch below); (3) the frequency, which concerns how often the adversarial example is updated; and (4) falsification, which refers to classification errors, specifically false positives and false negatives, in the model’s inferences.
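A minimal sketch of the timing distinction in point (2): poisoning corrupts the training data before learning, whereas evasion perturbs inputs only at inference time. The flip fraction and noise scale are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def poison_labels(y_train, fraction=0.1):
    """Poisoning sketch: corrupt a fraction of the labels before the model is trained."""
    y = y_train.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y[idx] = rng.integers(0, y.max() + 1, size=len(idx))  # reassign random classes
    return y

def evade(x_test, scale=0.05):
    """Evasion sketch: perturb inputs at prediction time; the trained model is untouched."""
    return x_test + scale * rng.normal(size=x_test.shape)
```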
4. Experiments and Results
To verify the security of the machine learning models, tests were conducted to try to deceive each model at prediction time. For these tests, adversarial attacks based on GANs were carried out against three of the most popular models: decision trees, k-nearest neighbours, and multilayer perceptrons. The ML models are trained to filter network traffic from popular messaging applications, namely Confide, Discord, Instagram, Dust, Facebook Messenger, Kakao Talk, Kik, Line, Meet, Signal, Skype, Snapchat, Telegram, Twitter, Viber, WhatsApp, and Wickr Me. The dataset consists of 233 samples of mobile application traffic in PCAP format with a total size of 16.70 GB and contains 63 characteristics for each type of traffic, including aggregate, statistical, and temporal characteristics. The dataset is divided into 80% for training, 10% for testing, and 10% for validation. For training, the set is split into X and Y, where X contains the traffic features and Y the labels. It should be noted that the traffic payload is not used in the experiments because this traffic is encrypted end-to-end.
The generator consists of three dense layers interleaved with two batch normalisation layers. It receives an input tensor and generates an output tensor. All layers use LeakyReLU, i.e. ReLU with a negative slope of 0.2, which contributes to the stability and rapid convergence of the model. The discriminator, responsible for differentiating between real and generated data, consists of four dense layers, three batch normalisation layers, and a softmax activation layer. It receives an input tensor and produces a one-dimensional output indicating the probability that the sample is real or synthetic. The intermediate layers use ReLU activation functions, which promotes training stability, while the final softmax layer allows for probabilistic traffic classification.
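Since the exact layer widths and input dimensions are not specified above, the following Keras sketch reproduces only the described structure (generator: three dense layers interleaved with two batch normalisation layers and LeakyReLU with slope 0.2; discriminator: four dense layers, three batch normalisation layers, and a softmax output); the latent dimension and unit counts are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

FEATURES = 63      # traffic features per sample, as described above
LATENT_DIM = 100   # assumption: the generator's input size is not given in the text

def build_generator():
    """Three dense layers interleaved with two batch-normalisation layers;
    LeakyReLU with a negative slope of 0.2, as described in the text."""
    return keras.Sequential([
        keras.Input(shape=(LATENT_DIM,)),
        layers.Dense(128), layers.LeakyReLU(0.2), layers.BatchNormalization(),
        layers.Dense(256), layers.LeakyReLU(0.2), layers.BatchNormalization(),
        layers.Dense(FEATURES),
    ])

def build_discriminator(n_classes=2):
    """Four dense layers, three batch-normalisation layers and a softmax output
    that scores each sample as real or synthetic."""
    return keras.Sequential([
        keras.Input(shape=(FEATURES,)),
        layers.Dense(256, activation="relu"), layers.BatchNormalization(),
        layers.Dense(128, activation="relu"), layers.BatchNormalization(),
        layers.Dense(64, activation="relu"), layers.BatchNormalization(),
        layers.Dense(n_classes, activation="softmax"),
    ])
```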
For the experiments with the proposed method, a GAN trained with WhatsApp traffic data is used as a basis. Once the GAN has been trained and the attack executed, a clear decrease in the accuracy of each model can be observed. In the case of decision trees, the initial accuracy was 77.72%, correctly identifying 150 of the 193 samples entered; with the samples modified by the GAN, the accuracy drops to 4.76%, with only 9 of 193 predictions correct. In the K-Nearest Neighbours classifier, the accuracy was 68.39% on the original data, correctly identifying 132 of 193 samples; on the adversarial data generated by the GAN it correctly identifies only 3 of 193 samples, i.e. its accuracy falls to 1.55%. Finally, something similar occurs with the multilayer perceptron classifier, whose accuracy drops from 75.13% to 5.18%, with only 10 correct predictions out of 193.
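A sketch of the evaluation protocol implied above: train the three classifiers on the clean split, then compare their accuracy on the original test samples and on the GAN-modified samples. The arrays below are synthetic placeholders for the real traffic split and the adversarial set, and the classifiers use default scikit-learn hyperparameters rather than the paper's configurations.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Placeholders standing in for the 63-feature traffic split and the
# GAN-perturbed test set (purely synthetic, for illustration only).
X_train, y_train = rng.normal(size=(800, 63)), rng.integers(0, 17, 800)
X_test, y_test = rng.normal(size=(193, 63)), rng.integers(0, 17, 193)
X_adv = X_test + rng.normal(scale=0.5, size=X_test.shape)

models = {
    "Decision Tree": DecisionTreeClassifier(),
    "K-Nearest Neighbours": KNeighborsClassifier(),
    "Multilayer Perceptron": MLPClassifier(max_iter=500),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    clean = accuracy_score(y_test, clf.predict(X_test))
    attacked = accuracy_score(y_test, clf.predict(X_adv))
    print(f"{name}: {clean:.2%} on clean samples, {attacked:.2%} under attack")
```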
5. Conclusions
The growing use of artificial-intelligence-based IT security solutions has heightened concerns about the robustness of the underlying models and their resistance to adversarial attacks. By circumventing conventional defences with relative ease, adversarial attacks pose a significant threat to the integrity and effectiveness of these systems. Research into adversarial models has proven to be a promising way of improving AI’s ability to resist such attacks, allowing vulnerabilities to be identified and reduced before they are exploited in production environments. Anomaly detection and adversarial data generation are emerging as important areas in this context, providing essential tools for evaluating security and strengthening resistance against malicious attacks. In this work, some of the weaknesses of widely used AI models were demonstrated through experiments designed to disrupt the accuracy of three of the most popular ML models (Decision Trees, K-Nearest Neighbours, and Multilayer Perceptron), trained on messaging-application traffic samples; the attacks achieved a very high success rate, reducing the accuracy of the models to less than 6%. This highlights the need to verify the security of AI models, especially when they are relied upon for anomaly detection, malicious traffic classification, and similar tasks.