Multivariate Time Series Anomaly Detection Based on Inverted Transformer with Multivariate Memory Gate
Abstract
1. Introduction
- We propose a multi-memory gated Transformer framework built on inverted embedding, which strengthens global correlation learning through fine-grained modeling of the complex interactions and dependencies among variables;
- We propose an improved multi-memory unit and self-supervised modeling mechanism that captures the specific normal pattern of each variable and adaptively adjusts the memory state;
- Extensive experiments on four widely used benchmark datasets show that our method achieves state-of-the-art performance.
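The inverted embedding named in the first contribution treats each variable's entire series as one token, rather than tokenizing time steps. A minimal NumPy sketch of this idea (illustrative only; the function name and dimensions are our assumptions, not the paper's implementation):

```python
import numpy as np

def inverted_embedding(x, W, b):
    """Variate-as-token embedding: each variable's full length-T series
    is linearly projected to a single d_model-dimensional token.
    x: (T, N) window -> tokens: (N, d_model)."""
    return x.T @ W + b  # (N, T) @ (T, d_model) -> (N, d_model)

rng = np.random.default_rng(0)
T, N, d_model = 100, 25, 64          # window length, variables, token width
x = rng.standard_normal((T, N))      # one multivariate window
W = rng.standard_normal((T, d_model)) / np.sqrt(T)
b = np.zeros(d_model)
tokens = inverted_embedding(x, W, b)
print(tokens.shape)  # (25, 64): one token per variable
```

Self-attention over these tokens then models cross-variable dependencies directly, which is the basis of the global correlation learning claimed above.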
2. Related Work
2.1. Detection of Time Series Anomalies
2.2. Embedding and Tokenization of Time Series Data
2.3. Transformer-Based Time Series Analysis
3. Methods
3.1. Definition of the Problem
3.2. Model Overview
3.3. Input Embedding
3.4. Feature Extraction Encoder
3.4.1. Encoder Structure
3.4.2. Reconstruction Decoder
3.5. Multivariate Memory Module
3.5.1. Multivariate Memory Update
3.5.2. Latent Vector Update
3.5.3. Loss Function
Algorithm 1 ITMMG Training Algorithm
Require: Training data X; ITMMG model; number of epochs; encoder block number Lenc.
01: n ← 1
02: while n ≤ epochs do
03:   H ← Embedding(X)
04:   for l = 1 to Lenc do
05:     Apply the self-attention layer to the variate tokens
06:     Apply the feed-forward layer
07:   End for
08:   for each variable in H do
09:     Compute attention scores against the memory items
10:     Update the memory
11:   End for
12:   Reconstruct X̂ from the memory-guided latent vectors
13:   Compute the loss and update the model parameters
14:   n ← n + 1
15: End while
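The per-variable memory read and update inside Algorithm 1 can be sketched as follows. This is a minimal illustration assuming a softmax-attention read and a gated moving-average memory update; the paper's exact update rule may differ:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_read_update(tokens, memory, gate=0.1):
    """One memory step: each variate token attends over the memory items
    to produce a memory-guided latent, then each memory item is nudged
    toward the tokens that attended to it, scaled by the gate.
    tokens: (N, d), memory: (m, d)."""
    scores = softmax(tokens @ memory.T)      # (N, m): attention per variable
    read = scores @ memory                   # (N, d): memory-guided latents
    weights = softmax(scores.T, axis=-1)     # (m, N): per-item token weights
    memory = (1 - gate) * memory + gate * (weights @ tokens)  # gated update
    return read, memory

rng = np.random.default_rng(1)
tokens = rng.standard_normal((25, 64))  # one token per variable
memory = rng.standard_normal((10, 64))  # 10 memory items
read, memory = memory_read_update(tokens, memory)
print(read.shape, memory.shape)  # (25, 64) (10, 64)
```

During training the reconstruction would be decoded from `read`, and the gate keeps the memory tracking only slowly varying normal patterns.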
3.6. Anomaly Criterion
Algorithm 2 ITMMG Testing Algorithm
Require: Test data X; trained model parameters; encoder block number Lenc.
01: H ← Embedding(X)
02: for l = 1 to Lenc do
03:   Apply the self-attention layer to the variate tokens
04:   Apply the feed-forward layer
05: End for
06: for each variable in H do
07:   Compute attention scores between the tokens and the memory items
08:   Read the memory-guided latent vector
09: End for
10: Reconstruct X̂
11: Compute the anomaly score for X
12: Compare the score against the threshold to flag anomalies
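The final scoring and thresholding steps of Algorithm 2 can be sketched with a common reconstruction-based criterion. This assumes the score is the per-point squared reconstruction error and the threshold is set from a target anomaly ratio, both standard choices in this literature; the paper's exact criterion may combine additional terms:

```python
import numpy as np

def anomaly_scores(x, x_hat):
    """Point-wise anomaly score: squared reconstruction error
    summed over variables. x, x_hat: (T, N) -> scores: (T,)."""
    return ((x - x_hat) ** 2).sum(axis=-1)

def detect(scores, ratio=0.01):
    """Flag the top `ratio` fraction of points as anomalous."""
    thresh = np.quantile(scores, 1 - ratio)
    return scores > thresh

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 25))
x_hat = x + 0.1 * rng.standard_normal((1000, 25))  # near-perfect reconstruction
s = anomaly_scores(x, x_hat)
flags = detect(s, ratio=0.05)  # roughly 5% of points flagged
```

In practice the anomaly ratio is tuned on a validation split, or the threshold is derived from the score distribution on normal training data.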
4. Experiment
4.1. Benchmark Dataset
- PSM (Pooled Server Metrics [37]): The PSM dataset, which comes from eBay server computers, is openly accessible. It has 25 characteristics that describe server machine metrics, including memory and CPU use.
- SMAP (Soil Moisture Active Passive [38]): The SMAP dataset is released by NASA and contains soil samples and telemetry information collected by the Soil Moisture Active Passive satellite. It consists of 25 features, primarily used for studying the spatiotemporal variations of soil moisture.
- SWaT (Secure Water Treatment [39]): This dataset is collected from a real-world water treatment testbed and contains sensor data with 51 dimensions, collected from a continuously operating critical infrastructure system.
- NIPS-TS-SWAN: Derived from the SWAN-SF benchmark of solar photospheric magnetic field parameters, this dataset contains 38 features and is widely used in recent time series anomaly detection evaluations.
4.2. Evaluation Metrics
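Precision, recall, and F1 on these benchmarks are conventionally computed after point adjustment, in which a true anomaly segment counts as fully detected if any of its points is flagged. A sketch, assuming that standard protocol:

```python
import numpy as np

def point_adjust(pred, gt):
    """Point adjustment: if any point inside a ground-truth anomaly
    segment is flagged, mark the whole segment as detected."""
    pred = pred.copy()
    i, n = 0, len(gt)
    while i < n:
        if gt[i]:
            j = i
            while j < n and gt[j]:      # find the end of this segment
                j += 1
            if pred[i:j].any():         # any hit -> whole segment detected
                pred[i:j] = True
            i = j
        else:
            i += 1
    return pred

def prf1(pred, gt):
    """Precision, recall, and F1 from boolean predictions and labels."""
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gt = np.array([0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=bool)
pred = np.array([0, 0, 1, 0, 0, 0, 0, 0, 1], dtype=bool)
adj = point_adjust(pred, gt)   # first segment fully credited, second missed
p, r, f1 = prf1(adj, gt)
```

Point adjustment substantially inflates all scores relative to raw point-wise evaluation, which is worth keeping in mind when comparing the tables below with other protocols.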
4.3. Implementation Details
4.4. Detection Results
4.5. Ablation Studies
4.6. Parameter Sensitivity
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Gong, D.; Liu, L.; Le, V.; Saha, B.; Mansour, M.R.; Venkatesh, S.; Hengel, A.V.D. Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1705–1714. [Google Scholar]
- Park, H.; Noh, J.; Ham, B. Learning memory-guided normality for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14372–14381. [Google Scholar]
- Chen, X.; Deng, L.; Huang, F.; Zhang, C.; Zhang, Z.; Zhao, Y.; Zheng, K. Daemon: Unsupervised anomaly detection and interpretation for multivariate time series. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chiang Mai, Thailand, 13–16 April 2021. [Google Scholar]
- Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.-K. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In Proceedings of the International Conference on Artificial Neural Networks, Munich, Germany, 12–15 September 2019; Springer International Publishing: Cham, Switzerland, 2019. [Google Scholar]
- Zhou, B.; Liu, S.; Hooi, B.; Cheng, X.; Ye, J. Beatgan: Anomalous rhythm detection using adversarially generated time series. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019. [Google Scholar]
- Bao, H.; Dong, L.; Piao, S.; Ye, Q.; Wei, F. BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers. arXiv 2022, arXiv:2208.09210. [Google Scholar]
- Abdulaal, A.; Liu, Z.; Lancewicki, T. Practical approach to asynchronous multivariate time series anomaly detection and localization. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Singapore, 14–18 August 2021. [Google Scholar]
- Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 19–23 August 2018. [Google Scholar]
- Mathur, A.P.; Tippenhauer, N.O. SWaT: A water treatment testbed for research and training on ICS security. In Proceedings of the 2016 International Workshop on Cyber-Physical Systems for Smart Water Networks (CySWater), Vienna, Austria, 11 April 2016. [Google Scholar]
- Angryk, R.; Martens, P.; Aydin, B.; Kempton, D.; Mahajan, S.; Basodi, S.; Ahmadzadeh, A.; Cai, X.; Filali Boubrahimi, S.; Hamdi, S.M.; et al. SWAN-SF. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EBCFKM (accessed on 20 May 2025).
- Lai, K.H.; Zha, D.; Xu, J.; Zhao, Y.; Wang, G.; Hu, X. Revisiting time series outlier detection: Definitions and benchmarks. In Proceedings of the Thirty-Fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), Virtual Event, 7–10 December 2021. [Google Scholar]
- Xu, H.; Chen, W.; Zhao, N.; Li, Z.; Bu, J.; Li, Z. Unsupervised anomaly detection via variational auto-encoder for seasonal kpis in web applications. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference for Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Shin, Y.; Lee, S.; Tariq, S.; Lee, M.S.; Jung, O.; Chung, D.; Woo, S.S. Itad: Integrative tensor-based anomaly detection system for reducing false positives of satellite systems. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, Virtual Event, 19–23 October 2020. [Google Scholar]
- Shen, L.; Li, Z.; Kwok, J. Timeseries anomaly detection using temporal hierarchical one-class network. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–14 December 2020. [Google Scholar]
- Manevitz, L.M.; Yousef, M. One-class SVMs for document classification. J. Mach. Learn. Res. 2001, 2, 139–154. [Google Scholar]
- Borghesi, M.; Molan, M.; Milano, A.; Bartolini, A. Anomaly detection and anticipation in high performance computing systems. In Proceedings of the 2021 International Conference on High Performance Computing Systems, New York, NY, USA, 20–22 April 2021. [Google Scholar]
- Fraikin, A.; Bennetot, A.; Allassonnière, S. T-Rep: Representation Learning for Time Series using Time-Embeddings. arXiv 2024, arXiv:2310.04486. [Google Scholar]
- Song, J.; Kim, K.; Oh, J.; Cho, S. Memto: Memory-guided transformer for multivariate time series anomaly detection. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- Yang, Y.; Zhang, C.; Zhou, T.; Wen, Q.; Sun, L. Dcdetector: Dual attention contrastive representation learning for time series anomaly detection. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Singapore, 28 August–1 September 2023. [Google Scholar]
- Cook, A.A.; Mısırlı, G.; Fan, Z. Anomaly detection for IoT time-series data: A survey. IEEE Internet Things J. 2019, 7, 6481–6494. [Google Scholar] [CrossRef]
- Schmidl, S.; Wenig, P.; Papenbrock, T. Anomaly detection in time series: A comprehensive evaluation. Proc. VLDB Endow. 2022, 15, 1779–1797. [Google Scholar] [CrossRef]
- Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), Pisa, Italy, 15–19 December 2008. [Google Scholar]
- Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471. [Google Scholar] [CrossRef] [PubMed]
- Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 16–18 May 2000. [Google Scholar]
- Canizo, M.; Triguero, I.; Conde, A.; Onieva, E. Multi-head CNN–RNN for multi-time series anomaly detection: An industrial case study. Neurocomputing 2019, 363, 246–260. [Google Scholar] [CrossRef]
- Park, D.; Hoshi, Y.; Kemp, C.C. A multimodal anomaly detector for robot-assisted feeding using an lstm-based variational autoencoder. IEEE Robot. Autom. Lett. 2018, 3, 1544–1551. [Google Scholar] [CrossRef]
- Tuli, S.; Casale, G.; Jennings, N.R. Tranad: Deep transformer networks for anomaly detection in multivariate time series data. arXiv 2022, arXiv:2201.07284. [Google Scholar] [CrossRef]
- Xu, J. Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv 2021, arXiv:2110.02642. [Google Scholar]
- Jeong, Y.; Yang, E.; Ryu, J.H.; Park, I.; Kang, M. Anomalybert: Self-supervised transformer for time series anomaly detection using data degradation scheme. arXiv 2023, arXiv:2305.04468. [Google Scholar]
- Liu, Y.; Hu, T.G.; Zhao, H.R.; Wu, H.X.; Wang, S.Y.; Ma, L.T.; Long, M.S. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024. [Google Scholar]
- Vaswani, A. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Deng, A.; Hooi, B. Graph neural network-based anomaly detection in multivariate time series. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021. [Google Scholar]
- Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar]
- Chen, Z.; Chen, D.; Zhang, X.; Yuan, Z.; Cheng, X. Learning graph structures with transformer for multivariate time-series anomaly detection in IoT. IEEE Internet Things J. 2021, 9, 9179–9189. [Google Scholar] [CrossRef]
- Wang, S.Y.; Wu, H.X.; Shi, X.M.; Hu, T.G.; Luo, H.K.; Ma, L.T.; Zhang, J.Y.; Zhou, J. Timemixer: Decomposable multiscale mixing for time series forecasting. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024. [Google Scholar]
- Huang, Q.; Shen, L.; Zhang, R.; Cheng, J.; Ding, S.; Zhou, Z.; Wang, Y. HDMixer: Hierarchical dependency with extendable patch for multivariate time series forecasting. In Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI), Vancouver, BC, Canada, 20–27 February 2024. [Google Scholar]
- Yu, M.; Sun, S. Policy-based reinforcement learning for time series anomaly detection. Eng. Appl. Artif. Intell. 2020, 95, 103919. [Google Scholar] [CrossRef]
- Ping, C.X. Application Research of Edge Computing Technology Oriented to 5G in Railway Marshalling Stations. Proc. Railw. Signal. Commun. 2023, 59, 7–14. [Google Scholar]
- Fang, S.S. Research on Structural Strain Threshold Setting of Bridge Health Monitoring System Based on Deep Learning. Master’s Thesis, China Three Gorges University, Yichang, China, May 2023. [Google Scholar]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Curran Associates Inc.: Red Hook, NY, USA; pp. 3111–3119. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 3–7 May 2021. [Google Scholar]
- Hamilton, W.L. Graph Representation Learning; Morgan & Claypool Publishers: San Rafael, CA, USA, 2020. [Google Scholar]
- Yeh, C.-F.; Mahadeokar, J.; Kalgaonkar, K.; Wang, Y.; Le, D.; Jain, M.; Schubert, K.; Fuegen, C.; Seltzer, M.L. Transformer-transducer: End-to-end speech recognition with self-attention. arXiv 2019, arXiv:1910.12977. [Google Scholar]
- Trirat, P.; Shin, Y.; Kang, J.; Nam, Y.; Na, J.; Bae, M.; Kim, J.; Kim, B.; Lee, J.-G. Universal Time-Series Representation Learning: A Survey. arXiv 2024, arXiv:2401.03717. [Google Scholar] [CrossRef]
- Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.X.; Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Adv. Neural Inf. Process. Syst. 2019, 32, 5243–5253. [Google Scholar]
Dataset | Dimension | Train | Validation | Test | Anomalies (%) |
---|---|---|---|---|---|
PSM | 25 | 105,984 | 26,497 | 87,841 | 27.75 |
SMAP | 25 | 108,146 | 27,037 | 427,617 | 13.13 |
SWaT | 51 | 396,000 | 99,000 | 449,919 | 11.98 |
NIPS-TS-SWAN | 38 | 48,000 | 12,000 | 708,420 | 32.6 |
Dataset | PSM |  |  | SMAP |  |  | SWaT |  |  |
---|---|---|---|---|---|---|---|---|---|
Metric | P | R | F1 | P | R | F1 | P | R | F1 |
OC-SVM | 62.75 | 80.89 | 70.67 | 53.85 | 59.07 | 56.34 | 45.39 | 49.22 | 47.23 |
IFOREST | 76.09 | 92.45 | 83.48 | 52.39 | 52.39 | 55.53 | 49.29 | 44.95 | 47.02 |
DAGMM | 93.49 | 70.03 | 80.08 | 86.45 | 56.73 | 68.51 | 89.92 | 57.84 | 70.4 |
VAR | 90.71 | 83.82 | 87.17 | 81.38 | 53.88 | 64.83 | 81.59 | 60.29 | 69.34 |
LSTM-VAE | 73.62 | 89.92 | 80.96 | 92.2 | 67.75 | 78.1 | 76.03 | 89.5 | 82.2 |
OmniAnomaly | 88.39 | 74.46 | 80.83 | 92.49 | 81.99 | 86.92 | 81.42 | 84.3 | 82.83 |
BeatGAN | 90.3 | 93.84 | 92.04 | 92.38 | 55.85 | 69.61 | 64.01 | 87.46 | 73.92 |
THOC | 88.14 | 90.99 | 89.54 | 92.06 | 89.34 | 90.68 | 93.94 | 86.36 | 85.13 |
Anomaly Transformer | 96.91 | 98.9 | 97.89 | 94.13 | 99.4 | 96.69 | 91.55 | 86.73 | 91.07 |
DCdetector | 96.62 | 97.32 | 96.97 | 94.44 | 97.8 | 96.09 | 92.53 | 88.59 | 90.52 |
ITMMG (Ours) | 97.61 | 98.47 | 98.04 | 98.86 | 94.82 | 96.8 | 92.38 | 91.49 | 91.93 |
Dataset | NIPS-TS-SWAN | ||
---|---|---|---|
Metric | P | R | F1 |
OC-SVM | 47.4 | 49.8 | 48.5 |
Iforest | 56.9 | 59.8 | 58.3 |
MatrixProfile | 16.7 | 17.5 | 17.1 |
GBRT | 44.7 | 37.5 | 40.8 |
LSTM-RNN | 52.7 | 22.1 | 31.2 |
Autoregression | 42.1 | 35.4 | 38.5 |
AutoEncoder | 49.7 | 52.2 | 50.9 |
Anomaly Transformer | 90.7 | 47.4 | 62.3 |
DCdetector | 95.5 | 59.6 | 73.4 |
ITMMG (Ours) | 99.32 | 58.44 | 73.58 |
All values are F1 scores (%).

Method | PSM | SMAP | SWaT | NIPS-TS-SWAN | Avg F1 |
---|---|---|---|---|---|
w/o Inverted Tokenization | 77.93 | 79.45 | 72.67 | 58.76 | 72.20 |
w/o multi-memory | 81.57 | 80.26 | 82.68 | 59.57 | 76.02 |
w/o memory | 83.21 | 81.23 | 73.23 | 60.12 | 74.44 |
ITMMG (Ours) | 98.04 | 96.8 | 91.93 | 73.58 | 90.09 |
All values are F1 scores (%).

Method | PSM | SMAP | SWaT | NIPS-TS-SWAN | Avg F1 |
---|---|---|---|---|---|
LSD | 82.79 | 80.34 | 70.83 | 65.46 | 74.86 |
RDF | 78.41 | 79.32 | 75.85 | 68.46 | 74.51 |
Addition | 92.79 | 92.27 | 88.49 | 69.73 | 85.82 |
Multiplication | 98.04 | 96.8 | 91.93 | 73.58 | 90.09 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ma, Y.; Liu, W.; Xu, C.; Bai, L.; Zhang, E.; Wang, J. Multivariate Time Series Anomaly Detection Based on Inverted Transformer with Multivariate Memory Gate. Entropy 2025, 27, 939. https://doi.org/10.3390/e27090939