1. Introduction
Artificial Intelligence (AI) has experienced a remarkable boom, driven by ever wider access to computing and by the time savings that solutions based on this technology make possible. However, this same advance has created new risks: cybercriminals actively seek to obtain personal data and compromise users’ privacy. Among the most innovative applications of AI is the protection of computer systems and networks; however, these defences can also become attractive targets for attackers attempting to breach an organisation’s security. One of the most significant risks facing AI models is adversarial attacks, which can allow unauthorised access to sensitive data or cause models to make erroneous predictions. It is therefore essential to consider these types of attacks during the development and training of models in order to identify and reinforce their weaknesses.
This paper studies attacks targeting AI algorithms used to classify traffic from messaging applications. Three algorithms were analysed: K-Nearest Neighbours, Decision Trees, and Multilayer Perceptron, comparing the results obtained before and after each model was subjected to an attack. The experiments show that, through attacks generated with Generative Adversarial Networks (GANs), these models are susceptible to malicious manipulation, although the effectiveness and complexity of the attacks depend on the algorithm and the structure of the classification model. It should be noted that such attacks compromise not only the accuracy of the classifications but also the privacy and security of messaging app users. The rest of the work is organised as follows:
Section 2 describes the importance and different types of attacks and their modalities according to the attacker’s knowledge.
Section 3 describes the adversarial threat model, focusing on AI.
Section 4 describes experiments to confuse AI models responsible for traffic filtering.
Finally, Section 5 presents the conclusions of the work.
2. Security in Artificial Intelligence
Security in Artificial Intelligence (AI) has become an increasingly relevant issue in the field of technology. As AI is integrated into a wide variety of systems and applications, ensuring that they are secure and reliable becomes ever more critical. The incorporation of AI into cybersecurity has proven essential for identifying, mitigating, and responding to new types of attacks more quickly and efficiently. Thanks to its capabilities, AI can protect user data and identity, optimise threat detection, streamline incident response, and automate complex processes. However, its application also carries risks. Among the main challenges are data security, attacks targeting AI models, the difficulty of code maintenance, and the increasing complexity of systems; addressing them is essential for the ethical, responsible, and secure implementation of AI in security and defence. One of the main concerns in this context is data privacy: since AI algorithms require large volumes of information for training, there is a risk that this data may be misused or shared with third parties without users’ consent. It is therefore essential to establish robust security and privacy measures that guarantee the responsible, transparent, and secure handling of information.
Attacks Based on the Attacker’s Knowledge
In the context of IT security applied to Artificial Intelligence (AI), attacks can be classified according to the attacker’s level of knowledge and experience. This classification ranges from basic techniques to highly specialised methods that exploit specific vulnerabilities in the system. At one end of the spectrum are general attacks, which do not require in-depth knowledge of the internal structure of the system; at the other end are sophisticated, targeted threats, which demand a detailed understanding of the defences and functioning of the target. As AI becomes increasingly integrated into security systems, attacks aimed at evading AI-based defensive mechanisms or manipulating machine learning models have become more frequent. This trend has significantly broadened the threat landscape in the field of cybersecurity, highlighting the need to develop more robust and adaptive protection strategies.
White Box Attacks: In white-box attacks, the adversary has complete knowledge of the target model: the training data, the architecture, and the parameters. To induce a failure, the attacker analyses the model’s behaviour and determines its vulnerabilities, for example, which inputs produce incorrect outputs; with this information, it is easier to design the generator and transfer knowledge from the classifier. This type of attack is widely used in the literature. In [1], a white-box attack is proposed that generates subtle perturbations capable of compromising machine learning models, achieving the highest success rate against adversarially trained models such as TRADES; the reported attacks reduce the accuracy of classifiers by between 16% and 31% in black-box transfer scenarios. In addition, Wang et al. [2] present an approach to minimise superfluous information in the generation of gradient-based adversarial examples and introduce two specific algorithms: the Integrated Finite Point Attack Algorithm (IFPA) and the Integrated Universe Attack Algorithm (IUA).
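To make the white-box setting concrete, the sketch below shows a minimal gradient-based perturbation in the style of the Fast Gradient Sign Method; it illustrates the general principle rather than the specific attacks of [1] or [2]. It assumes a differentiable Keras classifier `model` that outputs softmax probabilities, an input batch `x`, integer labels `y`, and an illustrative perturbation budget `epsilon`.

```python
import tensorflow as tf

def fgsm_perturb(model, x, y, epsilon=0.05):
    """White-box evasion sketch: perturb x along the sign of the loss
    gradient so that the classifier's error increases."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        # Assumes the model outputs softmax probabilities and y holds integer labels.
        loss = tf.keras.losses.sparse_categorical_crossentropy(y, model(x))
    grad = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grad)
```

Because the attacker can read the model’s gradients directly, a single step already pushes each sample towards the decision boundary; iterative variants simply repeat this step with a smaller epsilon.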
Black Box Attacks: Unlike the white-box case, black-box attacks do not require knowledge of the model, its configuration, or the technologies that support it [3]. The only information available about the attacked model is the output it produces for a given input. The adversarial network model that has yielded the best results in this type of attack is the MalGAN structure [4]. There are proposals that minimise the incidence of black-box attacks on ML algorithms; in [5], a strategy for protection against adversarial perturbations using GANs is presented. To reduce the dimensionality of the dataset and improve model performance, experiments were conducted using a random forest classifier together with dimensionality-reduction techniques such as principal component analysis and recursive feature elimination, and it was found that GAN-based adversarial training increased the model’s resilience and was able to mitigate black-box attacks. In [6], the security of attack detectors is studied under a realistic black-box threat model, in which various approaches are applied to the samples collected on the attacker’s side for query reduction and for understanding the anomaly-detection threshold.
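The defining constraint of the black-box setting is that only the target’s outputs are observable. A common way to work around it, and the idea underlying substitute-based structures such as MalGAN [4], is to query the target for labels, train a local surrogate on those labels, and craft perturbations against the surrogate that then transfer to the target. The sketch below illustrates that query-then-transfer loop with a scikit-learn surrogate; `black_box_predict`, the layer sizes, and the random-search step are illustrative placeholders, not the methods of [4–6].

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def black_box_predict(x):
    """Stand-in for the attacked model: only its output labels are observable."""
    return (x.sum(axis=1) > 0).astype(int)  # hypothetical decision rule

# 1. Query the target on attacker-chosen inputs and record its labels.
queries = rng.normal(size=(500, 63))              # 63 traffic features, as in Section 4
labels = black_box_predict(queries)

# 2. Train a local substitute that imitates the observed behaviour.
substitute = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(queries, labels)

# 3. Search for perturbations using only the substitute, then transfer them.
x = rng.normal(size=(10, 63))
x_adv = x.copy()
for i in range(len(x)):
    for _ in range(50):  # naive random search; MalGAN uses a trained generator instead
        candidate = x[i] + 0.3 * rng.normal(size=63)
        if substitute.predict(candidate[None])[0] != substitute.predict(x[i][None])[0]:
            x_adv[i] = candidate
            break

flipped = (black_box_predict(x) != black_box_predict(x_adv)).mean()
print(f"target labels changed for {flipped:.0%} of the samples")
```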
3. Adversarial Threat Model
Threat modelling is an essential component of system security analysis, whose main objective is to identify potential threats and vulnerabilities in order to establish security objectives and define policies that prevent or mitigate their impact. This stage is fundamental to maximising a system’s protection. Machine learning (ML) models are vulnerable to adversarial attacks. In the field of network security, these attacks are particularly frequent due to the critical nature of their applications, such as malware detection, intrusion identification, and spam filtering. In [7], a taxonomy is proposed for the generation of adversarial attacks against ML models implemented in security platforms that monitor network traffic, along with two classification approaches and various defence strategies against such attacks.
Classification of Adversarial Threats
Adversarial models consider different types of attacks based on multiple factors, including: (1) the attacker’s knowledge, which measures how much information is available about the AI architecture; (2) the timing of the attack, which distinguishes evasion attacks, which seek to confuse the model after training, from poisoning attacks, which alter the training data before learning in order to induce systematic errors (illustrated in the sketch below); (3) the frequency, which concerns how often the adversarial example is updated; and (4) falsification, which refers to classification errors, specifically false positives and false negatives, in the model’s inferences.
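A minimal sketch of the timing distinction in point (2): poisoning corrupts the training data before learning, whereas evasion perturbs inputs only at inference time. The flip fraction and noise scale are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def poison_labels(y_train, fraction=0.1):
    """Poisoning sketch: corrupt a fraction of the labels before the model is trained."""
    y = y_train.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y[idx] = rng.integers(0, y.max() + 1, size=len(idx))  # reassign random classes
    return y

def evade(x_test, scale=0.05):
    """Evasion sketch: perturb inputs at prediction time; the trained model is untouched."""
    return x_test + scale * rng.normal(size=x_test.shape)
```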
4. Experiments and Results
To verify the security of the machine learning models, tests were conducted to try to deceive each model at prediction time. For these tests, adversarial attacks based on GANs were carried out against three of the most popular models: decision trees, k-nearest neighbours, and multilayer perceptrons. The ML models are trained to filter network traffic from popular messaging applications, namely Confide, Discord, Instagram, Dust, Facebook Messenger, Kakao Talk, Kik, Line, Meet, Signal, Skype, Snapchat, Telegram, Twitter, Viber, WhatsApp, and Wickr Me. The dataset consists of 233 samples of mobile application traffic in PCAP format with a total size of 16.70 GB and contains 63 characteristics for each type of traffic, including aggregate, statistical, and temporal characteristics. The dataset is divided into 80% for training, 10% for testing, and 10% for validation. For training, the set is split into X and Y, where X contains the traffic features and Y the labels. It should be noted that the traffic payload is not used in the experiments because this traffic is encrypted end-to-end.
The generator consists of three dense layers interleaved with two batch normalisation layers. It receives an input tensor and generates an output tensor. All layers use LeakyReLU, i.e. ReLU with a negative slope of 0.2, which contributes to the stability and rapid convergence of the model. The discriminator, responsible for differentiating between real and generated data, consists of four dense layers, three batch normalisation layers, and a softmax activation layer. It receives an input tensor and produces a one-dimensional output indicating the probability that the sample is real or synthetic. The intermediate layers use ReLU activation functions, which promotes training stability, while the final softmax layer allows for probabilistic traffic classification.
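Since the exact layer widths and input dimensions are not specified above, the following Keras sketch reproduces only the described structure (generator: three dense layers interleaved with two batch normalisation layers and LeakyReLU with slope 0.2; discriminator: four dense layers, three batch normalisation layers, and a softmax output); the latent dimension and unit counts are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

FEATURES = 63      # traffic features per sample, as described above
LATENT_DIM = 100   # assumption: the generator's input size is not given in the text

def build_generator():
    """Three dense layers interleaved with two batch-normalisation layers;
    LeakyReLU with a negative slope of 0.2, as described in the text."""
    return keras.Sequential([
        keras.Input(shape=(LATENT_DIM,)),
        layers.Dense(128), layers.LeakyReLU(0.2), layers.BatchNormalization(),
        layers.Dense(256), layers.LeakyReLU(0.2), layers.BatchNormalization(),
        layers.Dense(FEATURES),
    ])

def build_discriminator(n_classes=2):
    """Four dense layers, three batch-normalisation layers and a softmax output
    that scores each sample as real or synthetic."""
    return keras.Sequential([
        keras.Input(shape=(FEATURES,)),
        layers.Dense(256, activation="relu"), layers.BatchNormalization(),
        layers.Dense(128, activation="relu"), layers.BatchNormalization(),
        layers.Dense(64, activation="relu"), layers.BatchNormalization(),
        layers.Dense(n_classes, activation="softmax"),
    ])
```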
For the experiments with the proposed method, a GAN trained with WhatsApp traffic data is used as a basis. Once the GAN has been trained and the attack executed, a clear decrease in the accuracy of each model can be observed. In the case of decision trees, the initial accuracy was 77.72%, correctly identifying 150 of the 193 samples entered; with the samples modified by the GAN, the accuracy drops to 4.76%, with only 9 of 193 predictions correct. In the K-Nearest Neighbours classifier, the accuracy was 68.39% on the original data, correctly identifying 132 of 193 samples; on the adversarial data generated by the GAN it correctly identifies only 3 of 193 samples, i.e. its accuracy falls to 1.55%. Finally, something similar occurs with the multilayer perceptron classifier, whose accuracy drops from 75.13% to 5.18%, with only 10 correct predictions out of 193.
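A sketch of the evaluation protocol implied above: train the three classifiers on the clean split, then compare their accuracy on the original test samples and on the GAN-modified samples. The arrays below are synthetic placeholders for the real traffic split and the adversarial set, and the classifiers use default scikit-learn hyperparameters rather than the paper's configurations.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Placeholders standing in for the 63-feature traffic split and the
# GAN-perturbed test set (purely synthetic, for illustration only).
X_train, y_train = rng.normal(size=(800, 63)), rng.integers(0, 17, 800)
X_test, y_test = rng.normal(size=(193, 63)), rng.integers(0, 17, 193)
X_adv = X_test + rng.normal(scale=0.5, size=X_test.shape)

models = {
    "Decision Tree": DecisionTreeClassifier(),
    "K-Nearest Neighbours": KNeighborsClassifier(),
    "Multilayer Perceptron": MLPClassifier(max_iter=500),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    clean = accuracy_score(y_test, clf.predict(X_test))
    attacked = accuracy_score(y_test, clf.predict(X_adv))
    print(f"{name}: {clean:.2%} on clean samples, {attacked:.2%} under attack")
```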
5. Conclusions
The growing use of artificial-intelligence-based IT security solutions has heightened concerns about the robustness of the underlying models and their resistance to adversarial attacks. By circumventing conventional defences with relative ease, adversarial attacks pose a significant threat to the integrity and effectiveness of these systems. Research into adversarial models has proven to be a promising way of improving AI’s ability to resist such attacks, allowing vulnerabilities to be identified and reduced before they are exploited in production environments. Anomaly detection and adversarial data generation are emerging as important areas in this context, providing essential tools for evaluating security and strengthening resistance against malicious attacks. In this work, some of the weaknesses of widely used AI models were demonstrated through experiments designed to disrupt the accuracy of three of the most popular ML models (Decision Trees, K-Nearest Neighbours, and Multilayer Perceptron), trained on messaging-application traffic samples; the attacks achieved a very high success rate, reducing the accuracy of the models to less than 6%. This highlights the need to verify the security of AI models, especially when they are relied upon for anomaly detection, malicious traffic classification, and similar tasks.