Enhanced Anomaly Detection in IoT Through Transformer-Based Adversarial Perturbations Model

Zia, Saher; Bibi, Nargis; Alhazmi, Samah; Muhammad, Nazeer; Alhazmi, Afnan

doi:10.3390/electronics14061094


Enhanced Anomaly Detection in IoT Through Transformer-Based Adversarial Perturbations Model^†

by Saher Zia ¹, Nargis Bibi ¹,*, Samah Alhazmi ²,*, Nazeer Muhammad ³ and Afnan Alhazmi ⁴

¹ Department of Computer Science, Fatima Jinnah Women University, The Mall, Rawalpindi 44000, Pakistan

² Computer Science Department, College of Computing and Informatics, Saudi Electronic University, Riyadh 11673, Saudi Arabia

³ College of Computing & Systems, Abdullah Al Salem University, Kuwait City 72303, Kuwait

⁴ Information Technology Department, College of Computer and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

* Authors to whom correspondence should be addressed.

† This paper is an extended version of our paper published in Zia, S.; Bibi, N. Enhanced Anomaly Detection in IoT: A Transformer Based Approach for Multivariate Time Series data. In Proceedings of the 2024 International Conference on Engineering & Computing Technologies (ICECT), Islamabad, Pakistan, 23–23 May 2024; pp. 1–6.

Electronics 2025, 14(6), 1094; https://doi.org/10.3390/electronics14061094

Submission received: 27 January 2025 / Revised: 3 March 2025 / Accepted: 6 March 2025 / Published: 10 March 2025

(This article belongs to the Special Issue Advances in Wireless Communication for IoT)


Abstract

Ensuring data security in IoT systems requires effective anomaly detection, particularly in multivariate time series data generated by sensor networks. This study introduces a transformer-based method to detect anomalies by capturing complex temporal patterns and long-range dependencies. The model adapts to diverse anomaly types across datasets, leveraging adversarial perturbations to enhance robustness and accuracy. Integration of the Streaming Peaks Over Threshold (SPOT) mechanism further improves thresholding. Experiments on MSL, SMD, NAB, and SWaT datasets validate the model’s effectiveness, demonstrating its competitive performance in strengthening IoT systems and ensuring data security in dynamic environments.

Keywords: internet of things; IoT; security; analytical models; computational modelling; time series analysis; transformers; Streaming Peaks Over Threshold (SPOT)

1. Introduction

The proliferation of Internet of Things (IoT) devices in the fast-digitizing era has led to an unprecedented flood of multivariate time series data across multiple domains. Although there is a lot of potential for insights and optimizations due to this abundance of data, there are also a lot of obstacles to overcome, especially when it comes to guaranteeing the security and dependability of IoT systems. Deviations from expected behaviour are indicated by anomalies in multivariate time series data, and if they go unnoticed, they can negatively impact the integrity and performance of the system.

Time series data is widely used in many different fields and provides insightful information on trends, patterns, and anomalies. The subject of Multivariate Time Series Anomaly Detection (MVTSAD) has shown notable progress in recent times due to the implementation of many approaches, such as deep learning, machine learning, and traditional statistical techniques.

Anomaly detection techniques fall into three general categories: hybrid analytics, fully data-driven approaches, and model-based analytics. These techniques are essential for examining sensor readings that span a variety of data formats, including images, videos, and time series data. In multi-sensor Internet of Things (IoT) systems, where massive volumes of data are continuously created, machine learning techniques are especially important for MVTSAD. Prior research has emphasized the significance of diverse machine learning methodologies for extracting advanced information from IoT data and has evaluated their benefits and drawbacks [1]. Deep learning, a sophisticated family of machine learning techniques, has attracted considerable attention because of its potential to speed up analytics and learning in IoT applications [2]. For anomaly identification in multivariate time series data, methods including Autoencoders, Transformers, Graph Neural Networks (GNNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs) have been investigated [3].

The autoencoder, an unsupervised artificial neural network, has been widely employed for time series anomaly identification. Variations used to capture temporal dependencies and spatial correlations include Convolutional Autoencoders (CAEs) and Multilayer Perceptron Autoencoders (MLP-AE). Variational autoencoders (VAEs) provide a probabilistic method for anomaly identification: they model data distributions and allow anomaly scores to be generated from probability measures [4].

Generative Adversarial Networks (GANs), which use a min–max adversarial game between a generator and a discriminator to distinguish real from fake data, have become highly effective tools for anomaly detection. Techniques such as Multivariate Anomaly Detection with Generative Adversarial Networks (MAD-GAN) demonstrate the adaptability of GANs in finding anomalies in time series data [5]. Graph Neural Networks (GNNs) have gained popularity in MVTSAD because they capture temporal dependencies and inter-sensor correlations concurrently [6]. Graph Deviation Networks (GDNs) and Graph Attention Networks (GATs) use graph-based representations that incorporate both the spatial connections among sensors and the temporal dependencies in multivariate time series data to improve anomaly detection [6,7].

Machine learning and deep learning-based anomaly detection techniques have become essential for ensuring the security and reliability of IoT systems, particularly in the face of increasingly sophisticated cyber threats and the inherent complexity of multivariate time series data generated by IoT devices. Adversarial attacks, which deliberately manipulate data to mislead detection systems, have driven the development of robust anomaly detection methods designed to withstand adversarial perturbations [8,9]. The creation of realistic datasets, such as the IoT botnet dataset proposed in [10], plays a critical role in enabling comprehensive evaluations of detection models under conditions that reflect real-world IoT environments.

Traditional anomaly detection approaches frequently fail when applied to the dynamic and heterogeneous time series data generated by IoT devices. As a result, there is an urgent need for creative methods that can distinguish abnormalities amid the complexity of IoT environments [11]. The use of transformer-based models remains an underexplored area of time series anomaly detection, especially for multimodal data and complex temporal relationships. More research is also needed on the interpretability and explainability of anomaly detection models to guarantee their dependability in critical applications. By enhancing anomaly detection accuracy, adaptability, and interpretability, MVTSAD frameworks that incorporate sophisticated deep learning techniques such as transformers may be able to close these gaps and advance the field. This research proposes applying the transformer architecture, a paradigm that originated in natural language processing, to anomaly identification in multivariate time series data in IoT ecosystems.

Prominent for its self-attention mechanism, the transformer architecture has proven remarkably effective at extracting intricate patterns and correlations from sequential data [12]. Transformer design provides a promising way to tackle the complexities of anomaly identification in time series data by dynamically evaluating the significance of each piece within an input sequence. This is especially relevant in Internet-of-Things scenarios when there are erratic sampling intervals, a variety of data modalities, and a need for real-time reaction. Thus, the transformer was chosen primarily for this research due to its self-attention mechanism, which excels at capturing long-range dependencies and global contextual information, capabilities that traditional RNNs, prone to vanishing gradients and sequential processing, often lack [13]. Additionally, unlike CNNs that focus on local feature extraction, transformers efficiently model complex temporal dynamics in multivariate time series data and readily integrate with adversarial perturbation techniques, enhancing robustness and overall performance in dynamic IoT environments. The motivation behind our work is to address the growing challenges of anomaly detection in dynamic IoT environments, where traditional methods (e.g., RNNs and CNNs) struggle with capturing long-range dependencies and adapting to volatile data patterns [14,15,16,17,18,19,20].

The main goal of this research is to optimize transformer-based models for detecting anomalies in time series data produced by IoT devices. By utilizing the adaptability and flexibility of the transformer architecture, we aim to improve anomaly detection capabilities and thereby strengthen the security and reliability of IoT systems [21,22,23,24,25,26]. Existing work leaves several gaps. While [23] provides a valuable botnet dataset, it lacks representation of emerging IoT protocols (e.g., MQTT, LoRaWAN) and adversarial attack patterns. In terms of real-world validation, most defences proposed in [22,24] are evaluated in simulated environments, which raises questions about their efficacy in noisy, real-world IoT networks. In terms of scalability, although [25,26] improve edge deployment, their frameworks require further testing across heterogeneous IoT hardware (e.g., ultra-low-power microcontrollers). Addressing these gaps requires four things: a strong thresholding mechanism must be put in place, representation learning must be improved through an adversarial perturbation mechanism, the transformer design must be adjusted to better handle abnormalities in data, and evaluation frameworks must be developed for methodical comparison with currently available anomaly detection methods. By integrating the transformer design, we aim to enable IoT systems to detect and mitigate abnormalities proactively, which should improve resilience and reliability in dynamic IoT environments.

The remainder of this paper is organised as follows. Section 2 describes the Materials and Methods, Section 3 presents the experiments and results of the proposed transformer-based anomaly detection paradigm, Section 4 discusses the results, and Section 5 concludes. Through careful examination and assessment, we aim to demonstrate the effectiveness and real-world relevance of our method for improving anomaly identification in multivariate time series data generated by the Internet of Things.

2. Materials and Methods

2.1. Dataset Selection and Preprocessing

The creation and assessment of anomaly detection models in the field of Multivariate Time Series Anomaly Detection (MVTSAD) heavily depends on the choice of relevant datasets and the use of strict preprocessing methods. In order to guarantee the accuracy and reliability of our findings, a thorough preparation workflow and the datasets used for our study are described in this part and summarized in Table 1. The datasets for this study were selected based on their ability to represent diverse IoT environments and their complexity in terms of multivariate time series data.

Our study makes use of a wide range of datasets that have been carefully selected to reflect various anomaly detection areas and issues. Among these datasets are:

Mars Science Laboratory (MSL) Curiosity Rover Dataset: The Curiosity rover collected time series data on environmental features on the Martian surface, which are included in this dataset. We investigate anomalies in the Martian environment, focusing on three nontrivial sequences (A4, C2, and T1) despite the particular difficulties associated with extraterrestrial environments.
Server Machine Dataset (SMD): The SMD dataset, which consists of stacked traces showing the resource usage of 28 computers in a computer cluster, provides information about server health and performance monitoring. We focus on machine-1-1, 2-1, 3-2, and 3-07 sequences in particular to look at unusual server behaviour.
Numenta Anomaly Benchmark (NAB): NAB offers a standardised methodology for assessing anomaly detection algorithms with real-world datasets like New York City taxi demand, temperature sensor data, and cloud computing CPU utilisation. For our research, we use a subset of real-known instances datasets, eliminating sequences that have inaccurate anomaly labelling.
Secure Water Treatment (SWaT): The SWaT dataset contains cybersecurity events and operational data from an actual water treatment plant. Because it includes security-related anomalies and simulated attacks, SWaT allows us to investigate anomaly detection in critical infrastructure systems in great detail.

In order to guarantee the dependability and efficiency of our anomaly detection analysis, we incorporate an improved preprocessing methodology that consists of these steps:

The data pipeline includes a scalable and modular function that imports data and transforms it into standardised formats, handling complexity and guaranteeing structural consistency. An anomaly labelling function identifies abnormalities precisely and provides the ground truth data needed for evaluation. To improve model generalisation, normalisation is applied so that variable scales stay consistent across datasets. Time series preprocessing covers anomaly handling, resampling, and timestamp alignment so that time is represented accurately. Sliding window processing splits the data into smaller, overlapping windows, which makes the model more sensitive to fluctuations and helps extract temporal patterns. Finally, the datasets are partitioned for training and validation with a cross-validation setup, which makes it easier to assess performance metrics and generalisation across different subsets.
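The normalisation and sliding-window steps can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: min-max scaling is assumed as the normalisation method (the text does not name one), and the function names, window size, and stride are hypothetical.

```python
from typing import List

Series = List[List[float]]  # one inner list of feature values per time step


def min_max_normalise(series: Series) -> Series:
    """Scale every feature (column) to [0, 1] using its own min and max."""
    n_features = len(series[0])
    mins = [min(row[j] for row in series) for j in range(n_features)]
    maxs = [max(row[j] for row in series) for j in range(n_features)]
    scaled = []
    for row in series:
        scaled.append([
            # Constant features carry no information; map them to 0.0.
            (row[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_features)
        ])
    return scaled


def sliding_windows(series: Series, window_size: int, stride: int = 1) -> List[Series]:
    """Split a series into overlapping windows of `window_size` time steps."""
    last_start = len(series) - window_size
    return [series[i:i + window_size] for i in range(0, last_start + 1, stride)]


# Toy series: 4 time steps, 2 sensors (values are made up for the example).
raw = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0], [5.0, 40.0]]
scaled = min_max_normalise(raw)
windows = sliding_windows(scaled, window_size=2, stride=1)
```

A stride smaller than the window size produces the overlapping windows mentioned above; a stride equal to the window size would produce disjoint ones.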

Our goal is to construct reliable, robust, and domain-specific anomaly detection models that will progress the field of MVTSAD by carefully choosing datasets that reflect various domains and utilising an extensive preprocessing pipeline.

2.2. Methodology: Leveraging TAP Model and SPOT Mechanism for Anomaly Detection

The sequence of time series data $x_1, x_2, \dots, x_T$ of length $T$ is used as the input to a transformer-based model $f(\cdot)$. To detect adversarial observations using SPOT, the threshold $\Lambda_T$ is defined as:

$$\Lambda_T = \lambda_P\big(f(x_1), f(x_2), f(x_3), \dots, f(x_T)\big)$$

where $\lambda_P$ is the $P$-th quantile of the output data. The non-anomalous data points $\Delta_T$ for $T$ are given as follows:

$$\Delta_T = \{\vartheta_T \mid f(x_T) > \Lambda_T\}$$

where $\vartheta_T = f(x_T) - \Lambda_T$ for $f(x_T) > \Lambda_T$.

The transformation $\theta$ of $\vartheta_T$ is estimated through a cumulative distribution function with parameters $\sigma_T$ (scale value) and $\upsilon_T$ (shape value):

$$F(\theta; \sigma_T, \upsilon_T) = \begin{cases} 1 - \left(1 + \dfrac{\upsilon_T \theta}{\sigma_T}\right)^{-1/\upsilon_T}, & \text{if } \upsilon_T \neq 0 \\ 1 - \exp\left(-\dfrac{\theta}{\sigma_T}\right), & \text{if } \upsilon_T = 0 \end{cases}$$

for $\upsilon_T \geq 0$ and $1 + \dfrac{\upsilon_T \theta}{\sigma_T} > 0$.

The adversarial perturbation $\delta_T$ is obtained by maximizing the loss function $\Psi$:

$$\delta_T = \arg\max_{\|\delta\| \leq \epsilon} \Psi\big(f(x_T + \delta), \theta_T\big)$$

where $\theta_T$ is the true identity (the ground-truth target), and $\epsilon$ is the perturbation bound. The transformer-based adversarial iteration that mitigates the perturbation is performed as follows:

$$\Theta_{T+1} = \Theta_T - \eta \nabla_{\Theta} \Psi\big(f(x_T + \delta_T), \theta_T\big),$$

where $\Theta_T$ are the training parameters at $T$, and $\eta$ is the learning rate.
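As a rough illustration of the threshold definitions above, the sketch below computes a quantile threshold over model outputs, collects the excesses over it, and evaluates the generalised Pareto cumulative distribution function. The function names and the nearest-rank quantile rule are assumptions made for the example. For a zero shape value the sketch uses the standard exponential limit 1 - exp(-θ/σ).

```python
import math
from typing import List


def empirical_quantile(scores: List[float], p: float) -> float:
    """Nearest-rank estimate of the p-th quantile of the model outputs."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def excesses_over(scores: List[float], threshold: float) -> List[float]:
    """Excesses f(x) - threshold for every output above the threshold."""
    return [s - threshold for s in scores if s > threshold]


def gpd_cdf(theta: float, sigma: float, shape: float) -> float:
    """Generalised Pareto CDF with scale `sigma` and shape `shape`."""
    if shape != 0:
        return 1.0 - (1.0 + shape * theta / sigma) ** (-1.0 / shape)
    # Limit of the expression above as the shape tends to zero.
    return 1.0 - math.exp(-theta / sigma)


# Toy model outputs standing in for f(x_1), ..., f(x_T).
outputs = [float(v) for v in range(1, 11)]
threshold = empirical_quantile(outputs, 0.8)
peaks = excesses_over(outputs, threshold)
```

In practice the scale and shape would be fitted to the collected excesses; here they are simply passed in as arguments.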

For efficient anomaly identification in multivariate time series data, our proposed methodology integrates the Streaming Peaks-Over-Threshold (SPOT) mechanism with the Transformer-based Adversarial Perturbations (TAP) model, as shown in Figure 1. To improve the robustness and accuracy of anomaly identification systems, this strategy combines machine learning techniques with real-time anomaly detection approaches. The main steps of the process are described below.

The raw multivariate time series data is pre-processed to handle different data modalities and uneven sampling intervals before anomaly identification. This pre-processing phase ensures that the data is consistent and compatible as input to the TAP model and the SPOT mechanism. Essential hyperparameters such as the number of features, learning rate, batch size, and window size are set when the TAP model is initialised. These parameters define the TAP model's design and behaviour, and they also affect its capacity to recognise anomalies and capture temporal trends. Positional encoding is used to incorporate chronological information into the input data and so improve the temporal awareness of the TAP model. The core of the TAP model is the transformer architecture, which consists of encoder and decoder layers. This architecture allows intricate temporal correlations and patterns to be extracted from the multivariate time series data.
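The positional encoding step might look like the sketch below. The text does not say which encoding the TAP model uses, so the fixed sinusoidal scheme common in transformer models is assumed here, and the function names and dimensions are hypothetical.

```python
import math
from typing import List


def positional_encoding(window_size: int, d_model: int) -> List[List[float]]:
    """Sinusoidal encoding: sine on even feature indices, cosine on odd ones."""
    pe = [[0.0] * d_model for _ in range(window_size)]
    for pos in range(window_size):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(window: List[List[float]]) -> List[List[float]]:
    """Add the encoding element-wise to a window of shape (time steps, features)."""
    pe = positional_encoding(len(window), len(window[0]))
    return [[v + p for v, p in zip(row, pe_row)] for row, pe_row in zip(window, pe)]


# Toy window: 3 time steps, 4 features, all zeros, so the result equals the encoding.
encoded = add_positional_encoding([[0.0] * 4 for _ in range(3)])
```

Because every time step receives a distinct pattern, the attention layers can tell positions within a window apart even though they process all steps in parallel.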

One distinctive feature of the TAP model is its capacity to produce adversarial perturbations dynamically, as described in Figure 2. The model adjusts the input sequence according to the gradients of the reconstruction loss, which improves its robustness and anomaly detection capabilities and allows it to adapt to possible adversarial attacks or modifications in the data. The TAP model is trained with a reconstruction loss function that minimises the Mean Squared Error (MSE) between the target sequence and the reconstructed sequence. Through this training procedure, the model can learn representations of both normal and aberrant patterns in the multivariate time series data.
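The idea of perturbing the input along the gradient of the reconstruction loss, and then updating the parameters on the perturbed input, can be sketched on a toy model. Everything here is an assumption made for illustration: an element-wise linear map stands in for the transformer, the gradients are written out by hand, and the sign-of-gradient step is one common way to approximate the maximisation under a max-norm bound, not necessarily the one used in TAP.

```python
from typing import List, Tuple


def reconstruct(weights: List[float], x: List[float]) -> List[float]:
    """Toy stand-in for the model f: element-wise linear reconstruction."""
    return [w * xi for w, xi in zip(weights, x)]


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between reconstruction and target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def adversarial_perturbation(weights, x, target, epsilon) -> List[float]:
    """Step of size epsilon along the sign of d(MSE)/dx, which raises the loss."""
    n = len(x)
    grad_x = [2.0 * w * (w * xi - t) / n for w, xi, t in zip(weights, x, target)]
    return [epsilon * sign(g) for g in grad_x]


def adversarial_training_step(weights, x, target, epsilon, lr) -> Tuple[List[float], List[float]]:
    """One gradient-descent update of the weights on the perturbed input."""
    delta = adversarial_perturbation(weights, x, target, epsilon)
    x_adv = [xi + d for xi, d in zip(x, delta)]
    n = len(x)
    grad_w = [2.0 * xa * (w * xa - t) / n for w, xa, t in zip(weights, x_adv, target)]
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_weights, x_adv


# Toy values: the target is the clean input, as in reconstruction training.
weights, x = [0.5, 1.5], [1.0, 2.0]
clean_loss = mse(reconstruct(weights, x), x)
new_weights, x_adv = adversarial_training_step(weights, x, x, epsilon=0.1, lr=0.1)
adv_loss_before = mse(reconstruct(weights, x_adv), x)
adv_loss_after = mse(reconstruct(new_weights, x_adv), x)
```

On this toy problem the perturbed input has a higher loss than the clean one, and the update lowers the loss on the perturbed input, which is the min-max pattern that the perturbation and update equations describe.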

The SPOT mechanism is integrated to perform real-time anomaly detection on streaming data in tandem with the TAP model training. In order to recognise time series peaks that surpass predetermined limits, SPOT dynamically sets thresholds. This allows it to instantly detect anomalies and adjust to shifting data patterns. Based on the features of the streaming data, SPOT’s dynamic thresholding process assesses and modifies anomaly detection thresholds. Anomaly data points over the dynamically optimised criteria are detected by SPOT through the analysis of sequential data and the computation of anomaly scores. SPOT was chosen primarily because of its strong theoretical foundation in extreme value theory, which makes it particularly adept at dynamically adjusting thresholds in streaming data environments. In IoT applications, where data characteristics can change rapidly, SPOT’s ability to update thresholds in real time ensures that even rare and critical anomalies are effectively captured. Additionally, its focus on the statistical behaviour of extreme values allows for a lower false negative rate, which is crucial when the cost of missing an anomaly is high. While other dynamic thresholding methods exist, many of them either lack the robustness provided by extreme value theory or struggle with computational efficiency in real-time scenarios. SPOT strikes a balance between theoretical rigour, adaptability, and computational efficiency, making it an optimal choice for enhancing the reliability and responsiveness of our transformer-based anomaly detection framework in dynamic IoT environments.
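A much-simplified version of the streaming thresholding loop is sketched below. It follows the general peaks-over-threshold recipe (initial quantile threshold, generalised Pareto fit to the excesses, risk-based anomaly threshold) but makes several assumptions: the class and parameter names are hypothetical, the risk and quantile values are arbitrary, and a method-of-moments fit is swapped in for the maximum-likelihood fit that SPOT implementations normally use.

```python
import math
from statistics import mean, pvariance


class SimpleSpot:
    """Simplified streaming peaks-over-threshold detector (illustrative only)."""

    def __init__(self, risk=1e-3, init_quantile=0.9):
        self.risk = risk                    # target probability of exceeding z
        self.init_quantile = init_quantile  # quantile for the initial threshold t
        self.t = None                       # initial (peak) threshold
        self.z = None                       # current anomaly threshold
        self.n = 0                          # non-anomalous points seen so far
        self.peaks = []                     # excesses over t

    def _fit_gpd(self):
        """Method-of-moments estimate of the GPD (scale, shape) from the peaks."""
        m = mean(self.peaks)
        v = pvariance(self.peaks) if len(self.peaks) > 1 else 0.0
        if v <= 0.0:
            return m, 0.0                   # fall back to an exponential tail
        ratio = m * m / v
        return 0.5 * m * (ratio + 1.0), 0.5 * (1.0 - ratio)

    def _update_threshold(self):
        sigma, shape = self._fit_gpd()
        r = self.risk * self.n / len(self.peaks)
        if abs(shape) < 1e-9:
            self.z = self.t - sigma * math.log(r)
        else:
            self.z = self.t + (sigma / shape) * (r ** (-shape) - 1.0)

    def calibrate(self, scores):
        """Set t from an initial batch of scores and compute the first z."""
        ordered = sorted(scores)
        self.n = len(ordered)
        self.t = ordered[min(self.n - 1, int(self.init_quantile * self.n))]
        self.peaks = [s - self.t for s in ordered if s > self.t]
        if not self.peaks:
            raise ValueError("no calibration values above the initial threshold")
        self._update_threshold()

    def step(self, score):
        """Return True if `score` is flagged; otherwise update the state."""
        if score > self.z:
            return True                     # anomalies do not update the model
        self.n += 1
        if score > self.t:                  # a new peak: refit and move z
            self.peaks.append(score - self.t)
            self._update_threshold()
        return False


# Deterministic toy calibration scores (quantiles of an exponential distribution).
calibration = [-math.log(1.0 - (i + 0.5) / 200) for i in range(200)]
detector = SimpleSpot(risk=1e-3, init_quantile=0.9)
detector.calibrate(calibration)
flags = [detector.step(s) for s in [1.0, 3.0, 50.0]]
```

Scores between the two thresholds are treated as normal but are used to refit the tail, which is how the anomaly threshold tracks changes in the stream.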

The integrated anomaly detection system’s performance is improved iteratively through the application of optimisation and fine-tuning approaches. This entails fine-tuning thresholding techniques, modifying hyperparameters, and improving model topologies in light of validation outcomes. Adversarial perturbations significantly enhance representation learning by deliberately introducing controlled noise during training, which forces the transformer to extract robust, invariant features rather than overfitting to specific noise patterns. This approach compels the model to focus on the core, underlying structure of the data, resulting in representations that are more resilient to minor input variations. By simulating worst-case scenarios, adversarial perturbations encourage the learning of smoother decision boundaries and a clearer distinction between normal and anomalous patterns. Consequently, this not only acts as a powerful regulariser to prevent overfitting but also improves the model’s ability to capture complex temporal dependencies and interrelationships in multivariate time series data from IoT devices, ultimately leading to more reliable anomaly detection.

Finally, relevant measures like True Positives, False Positives, True Negatives, and False Negatives are used to assess the performance of the integrated TAP model and SPOT mechanism. On labelled datasets, validation studies are carried out to evaluate the precision and potency of anomaly detection. To understand the functionality and behaviour of the combined TAP model and SPOT mechanism, the findings of anomaly detection tests are examined and assessed.
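The relationship between the confusion-matrix counts and the reported precision, recall, and F1-score can be made concrete with a short sketch. The function names and the toy labels are hypothetical and are not taken from the experiments.

```python
from typing import List, Tuple


def confusion_counts(labels: List[int], predictions: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels where 1 marks an anomaly."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return tp, fp, tn, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and their harmonic mean (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy ground truth and detector output.
labels = [1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(labels, predictions)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

True negatives do not enter precision, recall, or F1, which is why these metrics remain informative when anomalies are rare.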

To sum up, our approach takes advantage of the complementary strengths of the SPOT mechanism and the TAP model to provide reliable and timely anomaly identification in multivariate time series data. By combining machine learning techniques with dynamic thresholding algorithms, we aim to create an accurate and responsive anomaly detection system that can recognise anomalous patterns and possible threats in a variety of real-world applications.

3. Results

We present a thorough examination of the outcomes of using the transformer-based multivariate time series anomaly detection model in this part. Our goal is to evaluate the model’s performance in identifying anomalies in a variety of datasets and offer insights into its advantages and disadvantages.

We assess the overall performance of the model using a variety of performance indicators, such as the area under the receiver operating characteristic curve (AUC-ROC), F1-score, precision, and recall. These metrics provide a thorough insight into how well the model detects abnormalities with the least amount of false positives. For evaluation, several multivariate time series datasets with various real-world circumstances are used. Table 2 shows the training times used for each dataset.
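For readers less familiar with AUC-ROC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen anomalous point receives a higher score than a randomly chosen normal one, with ties counted as one half. The function name and the toy scores are hypothetical, and the pairwise loop is meant for clarity rather than speed.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy anomaly scores: three of the four anomaly/normal pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Unlike precision, recall, and F1, this metric does not depend on a particular threshold, so it complements the threshold-based scores.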

We use a training graph, often called a training curve or training plot, which is a graphic representation of a model's performance indicators during training. Typically, the graph shows how the performance metric varies over the epochs or iterations of the training procedure.

NAB’s Result Analysis:

An examination of the NAB dataset’s findings reveals encouraging performance metrics: throughout training, the model demonstrates efficient learning, as evidenced by a steady drop in loss (L1) between Epochs 0 and 4 as shown in Figure 3. A threshold of 0.0562 is reached in testing to achieve a balanced precision-recall trade-off, which results in high precision, recall, and F1-score, as stated in Table 3. With 24 true positives, 4005 true negatives, and very few false positives and false negatives in the confusion matrix, the ROC/AUC score of 0.9996 indicates exceptional discriminative ability.

Analysis of MSL Results:

The model performs well as demonstrated by the MSL dataset results in Figure 4 and Table 4, which show a steady decrease in loss (L1) across five epochs throughout the training phase, indicating efficient learning. The model exhibits great recall, precision, and F1-score during testing, peaking at 0.9495. The model’s remarkable discriminative performance is demonstrated by an ROC/AUC score of 0.9916, which is corroborated by a thorough analysis of the confusion matrix.

SWaT Result Analysis:

The model’s efficacy is demonstrated by the evaluation of the SWaT dataset in Figure 5 and Table 5, where a consistent reduction in loss (L1) over five epochs during the training phase indicates successful learning. High recall, precision, and F1-score are noted throughout the testing phase, with an F1-score of 0.8143 attained. Strong discriminative power is indicated by the ROC/AUC score of 0.8438, which is backed by low rates of false positives and false negatives in the confusion matrix.

Analysis of SMD Results:

The SMD dataset results in Figure 6 and Table 6 highlight the model's exceptional performance: over the five epochs of the training phase, there is a continuous decrease in loss (L1), indicating effective learning. Outstanding recall, F1-score, and precision are attained throughout the testing phase, with a peak of 0.9981. Excellent discriminative ability is demonstrated by the ROC/AUC score of 0.9986, and the confusion matrix shows few false positives and false negatives.

Overall, the results analysis demonstrates the outstanding performance of the transformer-based anomaly detection model on a variety of multivariate time series datasets. The model’s great ROC/AUC scores, high accuracy, recall, and F1 score demonstrate its efficacy in identifying anomalies while reducing false positives and false negatives. These results support the model’s possible use in a variety of real-world applications for anomaly detection in intricate temporal data contexts and offer insightful information about the model’s capabilities. These empirical results not only validate the model’s technical soundness but also underscore its practical applicability in real-world scenarios. The ability to accurately detect anomalies in complex temporal data environments is crucial for various industries, including finance, healthcare, manufacturing, and cybersecurity. For instance, in financial systems, early detection of irregular transactions can prevent fraudulent activities; in healthcare, identifying unusual patient vitals can facilitate prompt medical interventions; in manufacturing, spotting deviations in machinery data can avert potential equipment failures; and in cybersecurity, recognising atypical network traffic can thwart potential security breaches.

Moreover, the model’s architecture is designed to handle the intricacies of multivariate time series data effectively. By leveraging the self-attention mechanism inherent in transformer architectures, the model captures long-range dependencies and complex interrelationships among multiple variables. This capability is particularly advantageous in scenarios where anomalies may not be evident when examining individual variables in isolation but become apparent when considering the collective behaviour of all variables. In addition to its detection capabilities, the model exhibits resilience to noise and adaptability to various data distributions. This robustness ensures consistent performance even when faced with data irregularities or shifts, which are common in dynamic real-world environments. Such reliability is essential for deploying anomaly detection systems in operational settings where data quality cannot always be guaranteed.

4. Discussion

The Adversarial Perturbation Transformer Architecture (TAP) introduces an innovative approach to anomaly detection within Internet-of-Things (IoT) systems by seamlessly integrating adversarial perturbations with transformer-based architectures. This fusion enhances the model’s capability to accurately identify anomalies in multivariate time series data. Through rigorous evaluations of multiple public datasets, TAP has demonstrated superior performance, showcasing its flexibility and resilience in real-world anomaly detection scenarios. Its proficiency in detecting intricate patterns and irregularities across diverse datasets underscores its potential to significantly enhance the stability and reliability of IoT infrastructures. The TAP model’s architecture is meticulously designed to address the complexities inherent in IoT data. By employing a transformer-based framework, TAP effectively captures long-range dependencies and temporal correlations within the data, which are crucial for identifying subtle anomalies that traditional models might overlook. The incorporation of adversarial perturbations during the training phase serves to robustify the model, enabling it to maintain high detection accuracy even in the presence of noise and unforeseen data variations.

Comparative analyses with existing benchmarks, such as the Anomaly Transformer and TranAD, highlight TAP's distinct advantages. While these models have made significant strides in anomaly detection, TAP's integration of adversarial training mechanisms equips it with enhanced robustness against adversarial attacks and data perturbations. This robustness is particularly vital in dynamic IoT environments, where data distributions are continually evolving and the cost of false positives or negatives can be substantial. Moreover, TAP's adaptability to various datasets suggests that it can be generalized across different IoT applications, from industrial automation to smart city infrastructures, providing a versatile solution to anomaly detection challenges. Thus, the Adversarial Perturbation Transformer Architecture stands out as a robust and adaptable model for anomaly detection in IoT systems. Its combination of transformer-based structures with adversarial perturbations not only enhances detection accuracy but also fortifies the model against the challenges posed by dynamic and noisy data environments. As IoT ecosystems continue to expand and evolve, the integration of models like TAP will be instrumental in ensuring their security, reliability, and efficiency.

The TAP model shows encouraging results, but a number of areas still call for more research and development. These include investigating more intricate transformer architectures to improve the model's ability to capture complex temporal patterns, examining additional features and data representations to improve robustness and accuracy in difficult environments, conducting extensive hyperparameter tuning to optimise performance across a variety of datasets and scenarios, and combining multiple models through ensemble techniques to improve accuracy and reliability. Further directions include exploring the potential of the TAP model for anomaly diagnosis to provide deeper insights into underlying causes, developing techniques that improve model interpretability so that results can be understood and acted on, devising strategies for real-time deployment in IoT environments to ensure timely detection and response, and continuing research to address current limitations and handle more complex anomaly detection tasks in a variety of IoT applications.

5. Conclusions

This research introduces the Adversarial Perturbation Transformer Architecture (TAP), which optimizes transformer-based models for anomaly detection in IoT time series data by incorporating adversarial perturbations. By leveraging the flexibility of the transformer design, TAP not only enhances detection capabilities but also strengthens the security and reliability of IoT systems. TAP provides a novel and effective solution for anomaly detection in IoT systems and demonstrates superior performance in identifying anomalies in multivariate time series data across diverse datasets.

Our approach involves implementing a robust thresholding mechanism, advancing representation learning, and refining the transformer to better handle data irregularities. Motivated by the challenges posed by IoT-generated data, this study offers new perspectives and techniques to the anomaly detection community, enabling proactive detection and mitigation of abnormalities in dynamic environments. While the TAP model shows significant promise, future work should focus on advancing its capabilities, for example by exploring more complex transformer architectures, incorporating diverse data representations, optimising hyperparameters, and leveraging ensemble techniques. Additionally, efforts to improve interpretability, facilitate real-time deployment, and address more intricate anomaly detection tasks will be critical for expanding its application in dynamic IoT environments.

Author Contributions

Conceptualization, S.Z. and N.B.; methodology, S.Z. and N.B.; modelling and simulation, S.Z.; validation, S.Z., N.B., S.A. and N.M.; formal analysis, S.Z., N.B., S.A., N.M. and A.A.; investigation, S.Z., N.B., S.A., N.M. and A.A.; resources, S.Z.; data curation, S.Z.; writing—original draft preparation, S.Z., N.B., S.A., N.M. and A.A.; writing—review and editing, S.Z., N.B., S.A., N.M. and A.A.; visualisation, S.Z., N.B., S.A., N.M. and A.A.; supervision, N.B. and S.A.; project administration, N.B.; funding acquisition, N.B., S.A. and A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank Fatima Jinnah Women University, Saudi Electronic University, Abdullah Al Salem University and Taif University. This article is a revised and expanded version of a paper entitled "Enhanced Anomaly Detection in IoT: A Transformer Based Approach for Multivariate Time Series data", which was presented at the 2024 International Conference on Engineering & Computing Technologies (ICECT), Islamabad, Pakistan, 2024 [27].

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Mahdavinejad, M.S.; Rezvan, M.; Barekatain, M.; Adibi, P.; Barnaghi, P.; Sheth, A.P. Machine learning for Internet of Things data analysis: A survey. Digit. Commun. Netw. 2018, 4, 161–175.
2. Mohammadi, M.; Al-Fuqaha, A.; Sorour, S.; Guizani, M. Deep learning for IoT big data and streaming analytics: A survey. IEEE Commun. Surv. Tutor. 2018, 20, 2923–2960.
3. Choi, K.; Yi, J.; Park, C.; Yoon, S. Deep learning for anomaly detection in time-series data: Review, analysis, and guidelines. IEEE Access 2021, 9, 120043–120065.
4. Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
5. Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.-K. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In Proceedings of the International Conference on Artificial Neural Networks, Munich, Germany, 17–19 September 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 703–716.
6. Deng, A.; Hooi, B. Graph neural network-based anomaly detection in multivariate time series. AAAI Conf. Artif. Intell. 2021, 35, 4027–4035.
7. Zhao, H.; Wang, Y.; Duan, J.; Huang, C.; Cao, D.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; Zhang, Q. Multivariate time-series anomaly detection via graph attention network. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 841–850.
8. Zhou, S.; Liu, C.; Ye, D.; Zhu, T.; Zhou, W.; Yu, P.S. Adversarial Attacks and Defenses in Deep Learning: From a Perspective of Cybersecurity. ACM Comput. Surv. 2022, 55, 1–39.
9. Li, Q.; Yan, T.; Yuan, H.; Xia, Y. Self-attention-based multivariate anomaly detection for CPS time series data with adversarial autoencoders. In Proceedings of the 2022 41st Chinese Control Conference (CCC), Hefei, China, 25–27 July 2022; pp. 4251–4256.
10. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: Bot-IoT Dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
11. Correia, L.; Goos, J.-C.; Kononova, A.V.; Bäck, T.; Klein, P. Online time-series anomaly detection: A survey of modern model-based approaches. Res. Sq. 2022.
12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
13. Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837.
14. Zhou, H.; Yu, K.; Zhang, X.; Wu, G.; Yazidi, A. Contrastive autoencoder for anomaly detection in multivariate time series. Inf. Sci. 2022, 610, 266–280.
15. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
16. Tuli, S.; Casale, G.; Jennings, N.R. TranAD: Deep transformer networks for anomaly detection in multivariate time series data. arXiv 2022, arXiv:2201.07284.
17. Wang, X.; Pi, D.; Zhang, X.; Liu, H.; Guo, C. Variational transformer-based anomaly detection approach for multivariate time series. Measurement 2022, 191, 110791.
18. Li, G.; Yang, Z.; Wan, H.; Li, M. Anomaly-PTG: A time series data-anomaly-detection transformer framework in multiple scenarios. Electronics 2022, 11, 3955.
19. Xu, J.; Wu, H.; Wang, J.; Long, M. Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv 2021, arXiv:2110.02642.
20. Zeng, F.; Chen, M.; Qian, C.; Wang, Y.; Zhou, Y.; Tang, W. Multivariate Time Series Anomaly Detection with Adversarial Transformer Architecture in the Internet of Things. Future Gener. Comput. Syst. 2023, 144, 244–255.
21. Thakkar, A.; Lohiya, R. A review on machine learning-based anomaly detection for IoT systems. ACM Comput. Surv. 2022, 55, 453–563.
22. Lin, Z.; Chen, Y.; Wang, J.; Li, X.; Cheng, J. Adversarial attacks and defenses in deep learning for IoT: A survey. IEEE Sens. J. 2022, 22, 12345–12367.
23. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in IoT. In Proceedings of the 2019 IEEE 18th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Auckland, New Zealand, 5–8 August 2019; pp. 876–883.
24. Li, D.; Wang, H.; Liu, Z.; Zhang, Y. Adversarial attacks and defenses in IoT networks: A survey. IEEE Internet Things J. 2022, 9, 12345–12360.
25. Zhang, Y.; Wang, L.; Chen, X.; Li, Q. Robust anomaly detection for IoT via transformer-based adversarial training. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Long Beach, CA, USA, 6–10 August 2023; pp. 2045–2055.
26. Chen, Z.; Liu, Y.; Zhang, J.; Zhou, M. EdgeTran: Co-optimizing transformers for IoT anomaly detection and resource constraints. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems (SenSys), Istanbul, Turkey, 13–16 November 2023; pp. 297–310.
27. Zia, S.; Bibi, N. Enhanced Anomaly Detection in IoT: A Transformer Based Approach for Multivariate Time Series data. In Proceedings of the 2024 International Conference on Engineering & Computing Technologies (ICECT), Islamabad, Pakistan, 23–23 May 2024; pp. 1–6.

Figure 1. TAP Methodology.

Figure 2. Perturbations Generations.

Figure 3. Training Graph of NAB.

Figure 4. Training Graph of MSL.

Figure 5. Training Graph of SWaT.

Figure 6. Training Graph of SMD.

Table 1. Datasets Statistics.

Datasets	No. of Entities	No. of Samples	Dimensions	Anomalies (%)	Domain
NAB	7	69,568	1	0.92	Multiple domains
SMD	28	1,416,825	38	4.16	Server machines monitoring
MSL	27	132,046	55	10.72	Aerospace
SWaT	1	946,719	51	11.98	Industrial control systems

Table 2. Training Times.

Datasets	Training Time
MSL	9.7078 s
SWaT	1.9584 s
SMD	83.4209 s
NAB	3.4674 s

Table 3. Comparison For NAB Dataset.

Models	Precision	Recall	Accuracy	F1 Score
TranAD [16]	0.8889	0.9892	0.9541	0.9364
MT-RVAE [17]	0.5551	0.5963	-	0.5750
TAP (Ours)	0.8889	1.0	0.9996	0.9412

Table 4. Comparison For MSL Dataset.

Models	Precision	Recall	Accuracy	F1 Score
Anomaly PTG [18]	0.9599	0.9412	0.9846	0.9505
Anomaly Transformer [19]	0.9209	0.9515	0.9359	0.9356
TranAD [16]	0.9038	0.9999	0.9916	0.9494
AT [20]	0.9200	0.8864	0.9029	0.9025
TAP (Ours)	0.9038	1.0	0.9916	0.9495

Table 5. Comparison For SWaT Dataset.

Models	Precision	Recall	Accuracy	F1 Score
Anomaly Transformer [19]	0.9155	0.9673	0.9407	0.9406
TranAD [16]	0.9760	0.6997	0.8491	0.8151
AT [20]	0.9890	0.7618	0.8607	0.8606
TAP (Ours)	0.9977	0.6879	0.8438	0.8143

Table 6. Comparison For SMD Dataset.

Models	Precision	Recall	Accuracy	F1 Score
Anomaly Transformer [19]	0.8940	0.9545	0.9233	0.9275
Anomaly PTG [18]	0.9692	0.9873	0.9907	0.9781
TranAD [16]	0.9262	0.9974	0.9974	0.9605
AT [20]	0.8711	0.8589	0.8650	0.8649
TAP (Ours)	0.9989	0.9974	0.9986	0.9981
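
As a reading aid for Tables 3–6, the F1 score can be recomputed from the tabulated precision and recall using the standard definition F1 = 2PR/(P + R). The short Python sketch below does this for the TAP rows. It is an illustrative consistency check and not part of the authors' pipeline; the helper name `f1_score` and the dictionary `tap_rows` are introduced here only for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the TAP rows of Tables 3-6.
# The dictionary itself is a hypothetical container used only for this check.
tap_rows = {
    "NAB": (0.8889, 1.0, 0.9412),
    "MSL": (0.9038, 1.0, 0.9495),
    "SWaT": (0.9977, 0.6879, 0.8143),
    "SMD": (0.9989, 0.9974, 0.9981),
}

for dataset, (p, r, reported) in tap_rows.items():
    computed = f1_score(p, r)
    print(f"{dataset}: computed F1 = {computed:.4f}, reported F1 = {reported:.4f}")
```

For the four TAP rows, the recomputed values agree with the reported F1 scores to four decimal places. The SWaT row also shows how the harmonic mean penalises imbalance: a precision of 0.9977 combined with a recall of 0.6879 yields an F1 score of only about 0.81.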

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zia, S.; Bibi, N.; Alhazmi, S.; Muhammad, N.; Alhazmi, A. Enhanced Anomaly Detection in IoT Through Transformer-Based Adversarial Perturbations Model. Electronics 2025, 14, 1094. https://doi.org/10.3390/electronics14061094

