TSA-Net: Multivariate Time Series Anomaly Detection Based on Two-Stage Temporal Attention
Abstract
1. Introduction
- We propose a novel spatio-temporal feature extractor that integrates RepVGG-TCN and GAT. This design captures both intrinsic temporal dynamics and spatial correlations while ensuring model compactness and high training efficiency through structural reparameterization.
- We develop a unique cascade feedback mechanism that uses the initial prediction from the first stage to iteratively refine the input for the second stage, significantly improving the model’s sensitivity to subtle anomalies.
- We introduce an adaptive gated fusion strategy to dynamically weigh the importance of spatio-temporal features, enhancing the detection of composite anomalies.
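The reparameterization idea behind the first contribution can be sketched in miniature. The following is a single-channel NumPy illustration under assumed branch shapes (a 3-tap causal convolution, a 1×1 convolution, and an identity path), not the paper's exact RepVGG-TCN block: because every branch is linear, the branch weights fold into one equivalent kernel at inference time, which is what makes the fused model compact and fast.

```python
import numpy as np

def causal_conv1d(x, w, b=0.0):
    """Causal 1-D convolution: y[t] = sum_i w[i] * x[t - i] + b."""
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])  # left-pad so no future leakage
    return np.array([xp[t:t + k][::-1] @ w for t in range(len(x))]) + b

rng = np.random.default_rng(0)
x = rng.standard_normal(64)

# Training-time multi-branch block: 3-tap conv + 1x1 conv + identity (assumed shapes).
w3, b3 = rng.standard_normal(3), 0.1
w1, b1 = rng.standard_normal(1), -0.2
y_train = causal_conv1d(x, w3, b3) + causal_conv1d(x, w1, b1) + x

# Inference-time fusion: zero-pad the 1x1 and identity branches to 3 taps, then sum.
w_fused = w3 + np.array([w1[0], 0.0, 0.0]) + np.array([1.0, 0.0, 0.0])
b_fused = b3 + b1
y_infer = causal_conv1d(x, w_fused, b_fused)

assert np.allclose(y_train, y_infer)  # same outputs, one branch instead of three
```

In the full RepVGG scheme each branch's batch-normalization statistics are also folded into the weights before fusion; the equality above rests only on the linearity of the branches.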
2. Related Work
2.1. Temporal-Centric Modeling Approaches
2.2. Joint Spatio-Temporal Modeling Approaches
2.3. Hierarchical Architectures and Computational Efficiency
3. Methodology
3.1. Problem Definition
3.2. TSA-Net Framework
3.2.1. Spatio-Temporal Feature Extraction
3.2.2. Adaptive Gated Fusion
3.2.3. Encoder-Decoder
3.2.4. Cascade Feedback
3.3. Training Process
3.3.1. Objective Function
3.3.2. Anomaly Scoring and Thresholding
4. Experiments
4.1. Dataset Description
- Soil Moisture Active Passive (SMAP): A 25-dimensional publicly available dataset collected by NASA [9], containing telemetry data and anomaly labels extracted from Incident Surprise Anomaly (ISA) reports of spacecraft monitoring systems.
- Mars Science Laboratory (MSL): A dataset similar to SMAP, containing actuator and sensor data from the Mars rover [5].
- Server Machine Dataset (SMD): A 5-week dataset collected from a large Internet company, divided into equally sized training and test sets [13].
4.2. Data Preprocessing
- Normalization: To address varying scales across features, Min-Max normalization is applied to all datasets, mapping the values to the interval [0, 1].
- Noise Enhancement: Gaussian white noise is added to the datasets to enhance model robustness.
- Sliding Window Processing: Slicing the original time series with a sliding window of size w converts it into a dataset of overlapping, fixed-length subsequences, which enables the model to process the data efficiently.
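The three steps above can be sketched as one pipeline. This is a minimal NumPy illustration; the window size w = 10, noise level σ = 0.01, and stride 1 are illustrative assumptions, not values fixed by the paper:

```python
import numpy as np

def min_max_normalize(X, eps=1e-8):
    """Map each feature column to [0, 1] using its min/max."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn + eps)

def add_gaussian_noise(X, sigma=0.01, seed=0):
    """Inject zero-mean Gaussian white noise for robustness."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, sigma, size=X.shape)

def sliding_windows(X, w, stride=1):
    """Slice a (T, D) series into overlapping (num_windows, w, D) subsequences."""
    T = X.shape[0]
    return np.stack([X[i:i + w] for i in range(0, T - w + 1, stride)])

X = np.random.default_rng(1).uniform(-5, 5, size=(100, 3))  # toy (T=100, D=3) series
Xn = add_gaussian_noise(min_max_normalize(X))
W = sliding_windows(Xn, w=10)
print(W.shape)  # (91, 10, 3): T - w + 1 overlapping windows at stride 1
```

With stride 1, a series of length T yields T − w + 1 overlapping windows, each of which the model processes as one fixed-length input.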
4.3. Evaluation Metrics
4.4. Baseline Methods
- LSTM-NDT [36]: An LSTM-based prediction model that detects anomalies by applying nonparametric dynamic thresholding to the prediction errors, capturing multiple levels of temporal dependence in the series.
- DAGMM [37]: A classical approach combining deep autoencoders and Gaussian mixture models.
- OmniAnomaly [38]: A stochastic recurrent neural network that samples from a learned probability distribution and decodes the reconstruction, thereby capturing the stochasticity and dynamics of normal data and detecting anomalies via reconstruction probability.
- MAD-GAN [39]: A GAN-based reconstruction method that uses LSTM-RNNs to capture the temporal correlation of time-series distributions and computes anomaly scores from the reconstruction and discrimination losses.
- MTAD-GAT [5]: A model that combines graph attention networks and recurrent neural networks: GAT layers capture inter-variable and temporal dependencies, and the enhanced feature representation is fed into a GRU to model temporal dynamics.
- USAD [40]: Uses one encoder and two independent decoders to form two competing autoencoders: during training, one autoencoder minimizes the reconstruction error, while the other learns to distinguish real data from the first's reconstructions.
- GDN [41]: Learns the relationships between sensors as a graph and then detects and interprets deviations from the learned patterns.
- CAE-M [42]: A convolutional autoencoder that treats a time-series window as a 2D matrix and encodes/decodes it with standard and transposed convolutional layers, exploiting the local feature extraction of convolutions to learn normal patterns and detect anomalies through reconstruction error.
- TranAD [13]: An advanced Transformer-based model that amplifies anomalous signals by enlarging the error and steers its self-attention with focus scores derived from reconstruction errors, making the model more sensitive to deviations.
- DTAAD [3]: Proposes an autoregressive-autoencoder (AR-AE) framework that extracts long- and short-term dependencies separately via a parallel two-branch TCN combined with a Transformer, and uses a callback mechanism to amplify reconstruction errors and enhance the detection of weak anomalies.
4.5. Hyperparameters
4.6. Performance Comparison
4.7. Computational Efficiency Analysis
4.7.1. Training Efficiency and Convergence
4.7.2. Inference Latency and Deployment
4.7.3. Training Dynamics and Convergence
4.8. Visualization Analysis
4.8.1. Spatio-Temporal Correlation Analysis
4.8.2. Case Study
4.9. Ablation Studies
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sun, Y.; Chen, T.; Nguyen, Q.V.H.; Yin, H. TinyAD: Memory-efficient anomaly detection for time-series data in industrial IoT. IEEE Trans. Ind. Inform. 2023, 20, 824–834. [Google Scholar] [CrossRef]
- Zhou, X.; Hu, Y.; Liang, W.; Ma, J.; Jin, Q. Variational LSTM enhanced anomaly detection for industrial big data. IEEE Trans. Ind. Inform. 2020, 17, 3469–3477. [Google Scholar] [CrossRef]
- Yu, L.R.; Lu, Q.H.; Xue, Y. DTAAD: Dual TCN-attention networks for anomaly detection in multivariate time series data. Knowl.-Based Syst. 2024, 295, 111849. [Google Scholar] [CrossRef]
- Kang, H.; Kang, P. Transformer-based multivariate time series anomaly detection using inter-variable attention mechanism. Knowl.-Based Syst. 2024, 290, 111507. [Google Scholar] [CrossRef]
- Zhao, H.; Wang, Y.; Duan, J.; Huang, C.; Cao, D.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; Zhang, Q. Multivariate time-series anomaly detection via graph attention network. In 2020 IEEE International Conference on Data Mining (ICDM); IEEE: Piscataway, NJ, USA, 2020; pp. 841–850. [Google Scholar]
- Shuaiyi, L.; Wang, K.; Zhang, L.; Wang, B. Global-local integration for GNN-based anomalous device state detection in industrial control systems. Expert Syst. Appl. 2022, 209, 118345. [Google Scholar]
- Han, J.; Chen, Z.; Zhou, D.; Hu, B.; Xia, T.; Pan, E. Unsupervised motion-based anomaly detection with graph attention networks for industrial robots labeling. Eng. Appl. Artif. Intell. 2025, 146, 110298. [Google Scholar] [CrossRef]
- Zeng, M.; Wu, F.; Cheng, Y. Remaining useful life prediction via spatio-temporal channels and transformer. IEEE Sens. J. 2023, 23, 29176–29185. [Google Scholar] [CrossRef]
- Wang, C.; Liu, G. From anomaly detection to classification with graph attention and transformer for multivariate time series. Adv. Eng. Inform. 2024, 60, 102357. [Google Scholar] [CrossRef]
- Belay, M.A.; Rasheed, A.; Rossi, P.S. MTAD: Multiobjective transformer network for unsupervised multisensor anomaly detection. IEEE Sens. J. 2024, 24, 20254–20265. [Google Scholar] [CrossRef]
- Cao, Y.; Tang, X.; Deng, X.; Wang, P. Fault detection of complicated processes based on an enhanced transformer network with graph attention mechanism. Process Saf. Environ. Prot. 2024, 186, 783–797. [Google Scholar] [CrossRef]
- Tu, F.F.; Liu, D.J.; Yan, Z.W.; Jin, X.B.; Geng, G.G. STFT-TCAN: A TCN-attention based multivariate time series anomaly detection architecture with time-frequency analysis for cyber-industrial systems. Comput. Secur. 2024, 144, 103961. [Google Scholar] [CrossRef]
- Tuli, S.; Casale, G.; Jennings, N.R. TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data. Proc. VLDB Endow. 2022, 15, 2568–2581. [Google Scholar] [CrossRef]
- Cao, W.; Meng, Z.; Li, J.; Wu, J.; Fan, F. A remaining useful life prediction method for rolling bearing based on TCN transformer. IEEE Trans. Instrum. Meas. 2024, 73, 3501309. [Google Scholar] [CrossRef]
- Ge, D.; Dong, Z.; Cheng, Y.; Wu, Y. An enhanced spatio-temporal constraints network for anomaly detection in multivariate time series. Knowl.-Based Syst. 2024, 283, 111169. [Google Scholar] [CrossRef]
- Ding, C.; Sun, S.; Zhao, J. MST-GAT: A multimodal spatial–temporal graph attention network for time series anomaly detection. Inf. Fusion 2023, 89, 527–536. [Google Scholar] [CrossRef]
- Fan, Y.; Fu, T.; Listopad, N.I.; Liu, P.; Garg, S.; Hassan, M.M. Utilizing correlation in space and time: Anomaly detection for Industrial Internet of Things (IIoT) via spatiotemporal gated graph attention network. Alex. Eng. J. 2024, 106, 560–570. [Google Scholar] [CrossRef]
- Xiong, W.; Wang, P.; Sun, X.; Wang, J. SiET: Spatial information enhanced transformer for multivariate time series anomaly detection. Knowl.-Based Syst. 2024, 296, 111928. [Google Scholar] [CrossRef]
- Gao, Y.; Su, R.; Ben, X.; Chen, L. EST transformer: Enhanced spatiotemporal representation learning for time series anomaly detection. J. Intell. Inf. Syst. 2025, 63, 783–805. [Google Scholar] [CrossRef]
- Xu, Y.; Ding, Y.; Jiang, J.; Cong, R.; Zhang, X.; Wang, S.; Kwong, S.; Yang, S.H. Skip-patching spatial–temporal discrepancy-based anomaly detection on multivariate time series. Neurocomputing 2024, 609, 128428. [Google Scholar] [CrossRef]
- Gao, C.; Ma, H.; Pei, Q.; Chen, Y. Dynamic graph-based graph attention network for anomaly detection in industrial multivariate time series data. Appl. Intell. 2025, 55, 517. [Google Scholar] [CrossRef]
- Zheng, Y.; Koh, H.Y.; Jin, M.; Chi, L.; Phan, K.T.; Pan, S.; Chen, Y.P.P.; Xiang, W. Correlation-aware spatial–temporal graph learning for multivariate time-series anomaly detection. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 11802–11816. [Google Scholar] [CrossRef] [PubMed]
- Liu, L.; Tian, L.; Kang, Z.; Wan, T. Spacecraft anomaly detection with attention temporal convolution networks. Neural Comput. Appl. 2023, 35, 9753–9761. [Google Scholar] [CrossRef]
- Gu, W.; Liu, J.; Chen, Z.; Zhang, J.; Su, Y.; Gu, J.; Feng, C.; Yang, Z.; Yang, Y.; Lyu, M.R. Identifying performance issues in cloud service systems based on relational-temporal features. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–31. [Google Scholar] [CrossRef]
- Wu, P.; Tian, E.; Tao, H.; Chen, Y. Transfer learning-motivated intelligent fault diagnosis framework for cross-domain knowledge distillation. Neural Netw. 2025, 190, 107699. [Google Scholar] [CrossRef]
- Tang, S.; Ding, Y.; Wang, H. An Interpretable Method for Anomaly Detection in Multivariate Time Series Predictions. Appl. Sci. 2025, 15, 7479. [Google Scholar] [CrossRef]
- Chen, N.; Tu, H.; Zeng, H.; Ou, Y. Anomaly detection for key performance indicators by fusing self-supervised spatio-temporal graph attention networks. Knowl.-Based Syst. 2024, 300, 112167. [Google Scholar] [CrossRef]
- Li, J.; Deng, X.; Yao, B. Enhanced anomaly detection of industrial control systems via graph-driven spatio-temporal adversarial deep support vector data description. Expert Syst. Appl. 2025, 270, 126573. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, Y.; Wei, W.; Peng, Y.; Niu, Y. Less Is More: Fast Multivariate Time Series Forecasting with LightTS. arXiv 2022, arXiv:2202.03661. [Google Scholar]
- Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets great again. In IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 13733–13742. [Google Scholar]
- Tang, N.; Wang, X.; Zhou, F.; Tang, S.; Lyu, Y. Reparameterization causal convolutional network for automatic modulation classification. IEEE Trans. Veh. Technol. 2024, 73, 8576–8583. [Google Scholar] [CrossRef]
- Wu, C.W.; Squillante, M.S.; Kalantzis, V.; Horesh, L. Stable iterative refinement for solving linear systems with inaccurate computation. J. Comput. Appl. Math. 2025, 471, 116746. [Google Scholar] [CrossRef]
- Yan, J.; Ma, L.; Jiang, T.; Zheng, J.; Li, D.; Teng, X. Microseismic moment tensor inversion based on ResNet model. Artif. Intell. Geosci. 2025, 6, 100107. [Google Scholar] [CrossRef]
- Wang, H.; Liu, Y.; Yin, H.; Zheng, X.; Zha, Z.; Lv, M.; Guo, Z. Res2coder: A two-stage residual autoencoder for unsupervised time series anomaly detection. Appl. Intell. 2025, 55, 804. [Google Scholar] [CrossRef]
- Deng, M. Extreme Value Theory in High Dimensional Statistical Models and its Application in Anomaly Detection. In 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC); IEEE: Piscataway, NJ, USA, 2024; pp. 1–5. [Google Scholar]
- Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: New York, NY, USA, 2018; pp. 387–395. [Google Scholar]
- Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: New York, NY, USA, 2019; pp. 2828–2837. [Google Scholar]
- Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.K. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In Proceedings of the International Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2019; pp. 703–716. [Google Scholar]
- Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. Usad: Unsupervised anomaly detection on multivariate time series. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: New York, NY, USA, 2020; pp. 3395–3404. [Google Scholar]
- Deng, A.; Hooi, B. Graph neural network-based anomaly detection in multivariate time series. In Proceedings of the AAAI Conference on Artificial Intelligence; Association for the Advancement of Artificial Intelligence: Washington, DC, USA, 2021; Volume 35, pp. 4027–4035. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, Y.; Wang, J.; Pan, Z. Unsupervised deep anomaly detection for multi-sensor time-series signals. IEEE Trans. Knowl. Data Eng. 2021, 35, 2118–2132. [Google Scholar] [CrossRef]

| Dataset | Dimensions | Train | Test | Anomalies (%) |
|---|---|---|---|---|
| SMD | 38 (4) | 708,420 | 708,420 | 4.16 |
| SMAP | 25 (55) | 135,183 | 427,617 | 13.13 |
| MSL | 55 (3) | 58,317 | 73,729 | 10.72 |
| Method | SMD P | SMD R | SMD AUC | SMD F1 | SMAP P | SMAP R | SMAP AUC | SMAP F1 | MSL P | MSL R | MSL AUC | MSL F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSTM-NDT | 0.9734 | 0.8430 | 0.9669 | 0.9034 | 0.8523 | 0.7317 | 0.8600 | 0.7874 | 0.6288 | 0.9991 | 0.9530 | 0.7721 |
| DAGMM | 0.9103 | 0.9905 | 0.9952 | 0.9488 | 0.8067 | 0.9882 | 0.9883 | 0.8874 | 0.7363 | 0.9992 | 0.9714 | 0.8482 |
| OmniAnomaly | 0.8879 | 0.9975 | 0.9944 | 0.9395 | 0.8128 | 0.9409 | 0.9887 | 0.8723 | 0.7846 | 0.9915 | 0.9780 | 0.8762 |
| MAD-GAN | 0.9981 | 0.8431 | 0.9931 | 0.9142 | 0.8157 | 0.9207 | 0.9889 | 0.8655 | 0.8516 | 0.9921 | 0.9860 | 0.9166 |
| MTAD-GAT | 0.8210 | 0.9208 | 0.9921 | 0.8681 | 0.7993 | 0.9984 | 0.9845 | 0.8880 | 0.7917 | 0.9817 | 0.9889 | 0.8762 |
| USAD | 0.9062 | 0.9967 | 0.9933 | 0.9493 | 0.7480 | 0.9619 | 0.9889 | 0.8412 | 0.7948 | 0.9904 | 0.9794 | 0.8817 |
| GDN | 0.7170 | 0.9965 | 0.9924 | 0.8340 | 0.7482 | 0.9884 | 0.9864 | 0.8521 | 0.9308 | 0.9885 | 0.9814 | 0.9588 |
| CAE-M | 0.9080 | 0.9662 | 0.9781 | 0.9360 | 0.8193 | 0.9559 | 0.9901 | 0.8824 | 0.7753 | 0.9991 | 0.9903 | 0.8735 |
| TranAD | 0.9052 | 0.9964 | 0.9932 | 0.9487 | 0.8103 | 0.9989 | 0.9886 | 0.8950 | 0.9036 | 0.9990 | 0.9914 | 0.9492 |
| DTAAD | 0.8464 | 0.9966 | 0.9891 | 0.9152 | 0.8221 | 0.9991 | 0.9910 | 0.9026 | 0.9037 | 0.9991 | 0.9917 | 0.9494 |
| TSA-Net | 0.9795 | 0.9769 | 0.9865 | 0.9782 | 0.8724 | 0.9952 | 0.9945 | 0.9297 | 0.8950 | 0.9873 | 0.9892 | 0.9389 |
| Method | Epoch Time SMD (s) | Epoch Time SMAP (s) | Epoch Time MSL (s) | Convergence Time SMD (s) | Convergence Time SMAP (s) | Convergence Time MSL (s) |
|---|---|---|---|---|---|---|
| LSTM-NDT | 57.63 | 25.52 | 24.25 | 10,373 | 4441 | 4074 |
| DAGMM | 31.59 | 8.68 | 8.28 | 1011 | 278 | 314 |
| OmniAnomaly | 35.39 | 17.31 | 14.26 | 2760 | 1264 | 984 |
| MAD-GAN | 30.14 | 15.71 | 15.63 | 1356 | 597 | 594 |
| MTAD-GAT | 1128.82 | 157.11 | 198.73 | 45,600 | 4713 | 5563 |
| USAD | 58.19 | 5.71 | 5.14 | 2036 | 348 | 288 |
| GDN | 144.92 | 10.51 | 17.43 | 4203 | 736 | 1134 |
| CAE-M | 517.13 | 41.34 | 120.21 | 10,922 | 1984 | 3607 |
| TranAD | 22.59 | 15.71 | 16.27 | 905 | 628 | 520 |
| DTAAD | 21.27 | 1.51 | 2.81 | 894 | 105 | 78 |
| TSA-Net | 13.63 | 1.48 | 1.19 | 545 | 59 | 38 |
| Scope | Structural Topology | Latency (ms) | Speed-Up | Throughput (Samples/s) |
|---|---|---|---|---|
| TCN Module | Multi-branch (Unfused) | 0.79 | - | - |
| TCN Module | Single-branch (Fused) | 0.34 | 2.31× | - |
| Overall System | TSA-Net (Unfused) | 5.01 | - | ∼199 |
| Overall System | TSA-Net (Fused) | 4.35 | 1.15× | ∼230 |
| Method | SMD F1 | SMAP F1 | MSL F1 |
|---|---|---|---|
| TSA-Net | 0.9782 | 0.9297 | 0.9389 |
| w/o Local-STF | 0.9660 | 0.8714 | 0.8896 |
| w/o Global-STF | 0.9679 | 0.8735 | 0.8957 |
| w/o Feedback | 0.9731 | 0.9072 | 0.8458 |
| w/o Gated Fusion | 0.9679 | 0.8676 | 0.8416 |
| w/o Transformer | 0.9582 | 0.8869 | 0.7806 |
| w/o RepVGG | 0.9655 | 0.8812 | 0.8530 |
| Head Count (h) | Dim per Head | Test MSE | Configuration Strategy |
|---|---|---|---|
| 1 | 38 | 0.0030 | Single-Head Baseline |
| 2 | 19 | 0.0040 | Conservative Partitioning |
| 19 | 2 | 0.0025 | Proposed (Dynamic Rule) |
| 38 | 1 | 0.0031 | Fully-Split (Limiting Case) |
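One way to read this table: for a 38-dimensional input, the head count h must divide the feature dimension so that each head receives an integer slice, and the four rows are exactly the four divisors of 38. A minimal check (the function name is illustrative, not from the paper):

```python
# Valid (head count, dim-per-head) pairs: h must divide the model dimension D
# so that each attention head gets an integer-sized slice of the features.
def head_configs(d_model):
    return [(h, d_model // h) for h in range(1, d_model + 1) if d_model % h == 0]

print(head_configs(38))  # [(1, 38), (2, 19), (19, 2), (38, 1)] -- the four table rows
```

This divisibility constraint is the standard requirement of multi-head attention (embedding dimension divisible by head count), which is why only these four configurations can be compared.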
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wu, H.; Le, W.; Jia, Z.-H.; Zhao, H.; Zhang, S.; Zhang, Z.-S. TSA-Net: Multivariate Time Series Anomaly Detection Based on Two-Stage Temporal Attention. Sensors 2026, 26, 1062. https://doi.org/10.3390/s26031062