On Cost-Effectiveness of Language Models for Time Series Anomaly Detection
Abstract
1. Introduction
- We propose a model-agnostic, residual-driven approach for Time Series Anomaly Detection (TSAD) using LMs, reducing the need for extensive training data and computational resources.
- We systematically compare pretrained and fine-tuned LMs with state-of-the-art LLM-based approaches, including both open- and closed-source models.
- We perform a cost–benefit analysis considering predictive performance (F1 score) and efficiency metrics, such as inference time, memory usage, and estimated monetary cost, to identify the most practical LM-based TSAD solutions.
- We demonstrate that lightweight LMs can achieve competitive TSAD performance compared to proprietary LLMs while substantially reducing computational costs, offering practical insights for real-world deployment.
2. Related Work
2.1. Time Series Forecasting Using Language Models
2.2. Time Series Anomaly Detection Using Language Models
2.3. Position of Our Work
3. Method
3.1. Preliminaries
3.2. The Proposed Pipeline
Language Model-Based Time Series Forecasting
- Warm-up context: The first window of length w serves as a warm-up: no forecasts are generated for it, since it only provides the context needed to predict the first forecast horizon h.
- Sliding-window forecasting: We slide a window of size w across the series with step size s. At each position, the model generates forecasts for horizon h based on the current context window. Forecasts are produced for all time steps beyond the warm-up window (see Figure 3). Importantly, the data in the context (observed) window are always true values; forecasts do not overwrite or replace observed data.
- Handling overlapping forecasts: Depending on the forecast horizon h and the step size s, a future time step may fall within multiple forecasted intervals (this happens whenever $h > s$). In such cases, the final forecast for time step i is computed as the average of all predictions generated for that timestamp, $\hat{x}_i = \frac{1}{k}\sum_{j=1}^{k}\hat{x}_i^{(j)}$, where k is the number of overlapping forecasts for time step i and $\hat{x}_i^{(j)}$ is the j-th forecast for that step.
- Residual computation: Once forecasts are obtained, residuals are calculated as $r_i = x_i - \hat{x}_i$, quantifying the deviation from the expected temporal pattern and serving as input to residual-based anomaly detection methods; a minimal sketch of the full procedure follows this list.
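To make the sliding-window procedure concrete, the sketch below walks through it in Python. It assumes a generic `forecast_fn` (any LM forecaster, e.g., a Chronos pipeline, that returns h values for a given context window); the function and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np

def rolling_forecast_residuals(series, forecast_fn, w=100, h=10, s=5):
    """Slide a context window of size w over `series` with step s, collect
    h-step-ahead forecasts, average overlapping predictions, and return
    forecasts and residuals for every step after the warm-up window."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    pred_sum, pred_cnt = np.zeros(n), np.zeros(n)

    # The first window [0, w) is warm-up context only: no forecasts for it.
    for start in range(0, n - w, s):
        context = series[start:start + w]            # always true observed values
        forecast = np.asarray(forecast_fn(context))  # length-h prediction
        lo, hi = start + w, min(start + w + h, n)
        pred_sum[lo:hi] += forecast[:hi - lo]
        pred_cnt[lo:hi] += 1                         # overlap bookkeeping (h > s)

    covered = pred_cnt > 0
    y_hat = np.full(n, np.nan)
    y_hat[covered] = pred_sum[covered] / pred_cnt[covered]  # average overlaps
    residuals = series[covered] - y_hat[covered]            # r_i = x_i - x_hat_i
    return y_hat, residuals
```

With a naive persistence forecaster, e.g., `rolling_forecast_residuals(x, lambda c: np.repeat(c[-1], 10))`, the same mechanics can be exercised end to end without any language model.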
3.3. Time Series Anomaly Detection
| Algorithm 1 Unified Anomaly Detection Framework |
|---|
| Require: Residuals $r$, detection method $m$ |
| 1: $\tilde{r} \leftarrow \mathrm{Preprocess}(r)$ ▹ Preprocessing step |
| 2: $\tau \leftarrow \mathrm{Threshold}_m(\tilde{r})$ ▹ Thresholding function |
| 3: $A \leftarrow \{\, i : \tilde{r}_i > \tau \,\}$ ▹ Mark it as anomaly |
| 4: return $A$ |
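Algorithm 1 reduces to a few lines of code. The sketch below assumes the preprocessing step takes absolute residual values and that each method m supplies a thresholding function; both are illustrative assumptions, since the paper evaluates several alternative functions (Section 3.4).

```python
import numpy as np

def detect_anomalies(residuals, threshold_fn):
    """Unified framework: preprocess residuals, derive a method-specific
    threshold, and mark every point exceeding it as anomalous."""
    r = np.abs(np.asarray(residuals, dtype=float))  # preprocessing step (assumed: |r|)
    tau = threshold_fn(r)                           # thresholding function of method m
    return np.flatnonzero(r > tau)                  # indices marked as anomalies
```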
3.4. Residual Functions
3.4.1. Standard Deviation (STD)
3.4.2. Interquartile Range (IQR)
3.4.3. Median Absolute Deviation (MAD)
3.4.4. Percentile-Based Filter (PBF)
3.4.5. Robust Z-Score
3.4.6. Extreme Value Detection
3.4.7. Relative Difference (RD)
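The filters in Sections 3.4.1–3.4.7 follow standard robust-statistics definitions. The sketch below gives conventional versions usable as the `threshold_fn` in the framework above; the multipliers shown are common defaults and may differ from the paper's tuned values, and the extreme-value and relative-difference variants are omitted because their exact formulations are paper-specific.

```python
import numpy as np

# Conventional residual-threshold functions; each maps residual magnitudes r
# to a scalar cutoff tau. Multipliers are common defaults, not the paper's.

def std_threshold(r, k=3.0):                # 3.4.1: mean + k standard deviations
    return r.mean() + k * r.std()

def iqr_threshold(r, k=1.5):                # 3.4.2: Tukey-style IQR fence
    q1, q3 = np.percentile(r, [25, 75])
    return q3 + k * (q3 - q1)

def mad_threshold(r, k=3.0):                # 3.4.3: median + k scaled MADs
    med = np.median(r)
    mad = np.median(np.abs(r - med))
    return med + k * 1.4826 * mad           # 1.4826 rescales MAD to a std estimate

def percentile_threshold(r, q=99.0):        # 3.4.4: flag the top (100 - q)% of residuals
    return np.percentile(r, q)

def robust_z_threshold(r, k=3.5):           # 3.4.5: cutoff where the robust
    med = np.median(r)                      # z-score 0.6745*(r - med)/MAD equals k
    mad = np.median(np.abs(r - med)) + 1e-12
    return med + k * mad / 0.6745
```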
3.4.8. Consensus-Based Detection
- Standard Deviation: 3.0
- Percentile: 3.0
- Interquartile Range (IQR): 2.0
- Relative Difference: 2.0
- Z-score: 1.0
- Extreme Value: 0.5
- Median Absolute Deviation (MAD): 0.5
- Threshold (τ): 6.0
| Algorithm 2 Consensus-Based Anomaly Detection |
|---|
| Require: Detected index sets $A_1,\dots,A_M$, method weights $w_1,\dots,w_M$, threshold $\tau$ |
| 1: Initialize consensus set $C \leftarrow \emptyset$ |
| 2: for each data point $i$ do |
| 3:  $s_i \leftarrow \sum_{m=1}^{M} w_m \cdot \mathbb{1}[i \in A_m]$ |
| 4:  if $s_i \geq \tau$ then |
| 5:   $C \leftarrow C \cup \{i\}$ |
| 6:  end if |
| 7: end for |
| 8: return $C$ |
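A compact Python sketch of Algorithm 2, wired to the weights and threshold listed above; the method keys and function names are illustrative.

```python
import numpy as np

# Method weights and consensus threshold from Section 3.4.8.
WEIGHTS = {"std": 3.0, "percentile": 3.0, "iqr": 2.0, "rd": 2.0,
           "zscore": 1.0, "extreme": 0.5, "mad": 0.5}
TAU = 6.0

def consensus_detect(detected, n, weights=WEIGHTS, tau=TAU):
    """`detected` maps a method name to the set of indices it flagged.
    A point joins the consensus set when the summed weights of the
    methods flagging it reach the threshold tau."""
    score = np.zeros(n)
    for method, indices in detected.items():
        score[list(indices)] += weights[method]   # weighted vote per method
    return np.flatnonzero(score >= tau)           # consensus anomaly indices
```

With τ = 6.0, a point must be flagged by, for example, both STD and Percentile (3.0 + 3.0), or by STD together with IQR and Z-score (3.0 + 2.0 + 1.0), before it is reported, which suppresses spurious detections from any single filter.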
4. Experimental Settings
4.1. Datasets
- NAB contains 58 univariate time series from domains such as server metrics, network traffic, and IoT sensors. Each series includes labeled point anomalies and exhibits challenges such as seasonality, concept drift, and noise.
- Yahoo S5 includes 367 univariate time series across four subsets, representing a mixture of real and synthetic data. Labeled anomalies include point-wise, collective, and contextual anomalies [5], capturing spikes, drops, and behavioral shifts.
- NASA (MSL & SMAP) comprises the Mars Science Laboratory (MSL) and Soil Moisture Active Passive (SMAP) datasets. Each series contains univariate sensor readings with labeled anomalies representing complex system failures. These datasets were introduced and used in [29] for spacecraft anomaly detection.
4.2. Language Models
- T5-Large is a transformer-based encoder–decoder language model suited to sequence-to-sequence tasks; this architecture allows it to capture complex temporal dependencies in time series data. We fine-tuned a Chronos-pretrained T5-Large checkpoint on the target datasets, leveraging prior knowledge from large-scale time series. Fine-tuning enables the model to adapt to dataset-specific patterns, improving forecast accuracy and supporting residual-based anomaly detection.
- GPT-2 is an autoregressive transformer-based language model originally developed for natural language generation. Its architecture models sequential dependencies, making it applicable to time series forecasting tasks once adapted to numerical sequences. Although it was not pretrained for time series data, its ability to model long-range dependencies enables it to capture temporal dynamics effectively.
- BART is an encoder–decoder transformer architecture that combines bidirectional encoding with autoregressive decoding. While originally designed for text-to-text generation tasks, its sequence modeling capabilities make it suitable for forecasting applications. Unlike T5-Large, BART was not pretrained within the Chronos framework, but it serves as a useful baseline to assess the benefits of domain-specific pretraining.
4.3. Evaluation Metrics
4.4. Parameter Settings
- Warm-up window and overlapping forecasts
- The first window served as a warm-up period, providing the initial context for the model without generating forecasts.
- Subsequent windows slid across the series by step size s, potentially producing overlapping forecasts.
- For timestamps with overlapping forecasts, the final prediction was computed as the average of overlapping values.
4.5. Hardware and Computational Resources
4.6. Reproducibility
5. Experimental Results
5.1. Further Experiments
5.2. Benchmarking Anomaly Detection Methods
Ablation Study
5.3. Comparison of Fine-Tuning Strategies and Computational Efficiency
6. Discussion
Limitations
7. Conclusions and Future Research Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Schmidl, S.; Wenig, P.; Papenbrock, T. Anomaly detection in time series: A comprehensive evaluation. Proc. VLDB Endow. 2022, 15, 1779–1797. [Google Scholar] [CrossRef]
- Paoletti, G.; Giobergia, F.; Giordano, D.; Cagliero, L.; Ronchiadin, S.; Moncalvo, D.; Mellia, M.; Baralis, E. MAD: Multicriteria Anomaly Detection of Suspicious Financial Accounts from Billions of Cash Transactions. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, Toronto, ON, Canada, 3–7 August 2025; pp. 4751–4760. [Google Scholar] [CrossRef]
- Buccafusco, S.; Cagliero, L.; Megaro, A.; Vaccarino, F.; Loti, R.; Salvatori, L. Learning industrial vehicles’ duty patterns: A real case. Comput. Ind. 2023, 145, 103826. [Google Scholar] [CrossRef]
- Lu, Y.; Wu, R.; Mueen, A.; Zuluaga, M.A.; Keogh, E. DAMP: Accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams. Data Min. Knowl. Discov. 2023, 37, 627–669. [Google Scholar] [CrossRef]
- Zamanzadeh Darban, Z.; Webb, G.I.; Pan, S.; Aggarwal, C.; Salehi, M. Deep Learning for Time Series Anomaly Detection: A Survey. ACM Comput. Surv. 2024, 57, 1–42. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
- Ansari, A.F.; Stella, L.; Turkmen, C.; Zhang, X.; Mercado, P.; Shen, H.; Shchur, O.; Rangapuram, S.S.; Pineda Arango, S.; Kapoor, S.; et al. Chronos: Learning the Language of Time Series. Trans. Mach. Learn. Res. 2024, 10, 1–25. [Google Scholar]
- Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.; Chen, P.Y.; Liang, Y.; Li, Y.F.; Pan, S.; et al. Time-LLM: Time series forecasting by reprogramming large language models. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024. [Google Scholar]
- Kang, H.; Kang, P. Transformer-based multivariate time series anomaly detection using inter-variable attention mechanism. Knowl.-Based Syst. 2024, 290, 111507. [Google Scholar] [CrossRef]
- Wang, X.; Pi, D.; Zhang, X.; Liu, H.; Guo, C. Variational transformer-based anomaly detection approach for multivariate time series. Measurement 2022, 191, 110791. [Google Scholar] [CrossRef]
- Tuli, S.; Casale, G.; Jennings, N.R. TranAD: Deep transformer networks for anomaly detection in multivariate time series data. Proc. VLDB Endow. 2022, 15, 1201–1214. [Google Scholar] [CrossRef]
- Naveed, H.; Khan, A.U.; Qiu, S.; Saqib, M.; Anwar, S.; Usman, M.; Akhtar, N.; Barnes, N.; Mian, A. A Comprehensive Overview of Large Language Models. ACM Trans. Intell. Syst. Technol. 2025, 16, 1–72. [Google Scholar] [CrossRef]
- Alnegheimish, S.; Nguyen, L.; Berti-Equille, L.; Veeramachaneni, K. Can Large Language Models be Anomaly Detectors for Time Series? In Proceedings of the 2024 IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA), San Diego, CA, USA, 6–10 October 2024; IEEE: Piscataway, NJ, USA, 2024. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Keogh, E. Time Series Data Mining: A Unifying View. Proc. VLDB Endow. 2023, 16, 3861–3863. [Google Scholar] [CrossRef]
- Gruver, N.; Finzi, M.; Qiu, S.; Wilson, A.G. Large Language Models Are Zero-Shot Time Series Forecasters. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- Dong, M.; Huang, H.; Cao, L. Large Language Models Can Be Zero-Shot Anomaly Detectors for Time Series. In Proceedings of the 11th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2024), San Diego, CA, USA, 6–10 October 2024. [Google Scholar]
- OpenAI. GPT-3.5 Turbo. 2025. Available online: https://platform.openai.com/docs/models/gpt-3.5-turbo (accessed on 13 October 2025).
- Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; de las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825. [Google Scholar] [CrossRef]
- Xue, H.; Salim, F.D. PromptCast: A New Prompt-Based Learning Paradigm for Time Series Forecasting. IEEE Trans. Knowl. Data Eng. 2024, 36, 6851–6864. [Google Scholar] [CrossRef]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Online, 5–10 July 2020; pp. 7871–7880. [Google Scholar] [CrossRef]
- Zhang, Y.; Gong, K.; Zhang, K.; Li, H.; Qiao, Y.; Ouyang, W.; Yue, X. Meta-Transformer: A Unified Framework for Multimodal Learning. arXiv 2023, arXiv:2307.10802. [Google Scholar] [CrossRef]
- Zhou, T.; Niu, P.; Wang, X.; Sun, L.; Jin, R. One Fits All: Power General Time Series Analysis by Pretrained LM. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023; pp. 1–34. [Google Scholar]
- Liu, C.; He, S.; Zhou, Q.; Li, S.; Meng, W. Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024), Jeju, Republic of Korea, 3–9 August 2024; pp. 1–9. [Google Scholar]
- Zhou, Z.; Yu, R. Can LLMs Understand Time Series Anomalies? In Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, 24–28 April 2025. [Google Scholar]
- Dong, M.; Huang, H.; Cao, L. Can LLMs Serve As Time Series Anomaly Detectors? arXiv 2024, arXiv:2408.03475. [Google Scholar] [CrossRef]
- Kaptein, M.; van den Heuvel, E. Statistics for Data Scientists: An Introduction to Probability, Statistics, and Data Analysis; Springer: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
- Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 19–23 August 2018; pp. 387–395. [Google Scholar] [CrossRef]
- Wu, R.; Keogh, E.J. Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress. IEEE Trans. Knowl. Data Eng. 2023, 35, 2421–2429. [Google Scholar] [CrossRef]
- Keogh, E. Irrational Exuberance: Why We Should Not Believe 95% of Papers on Time Series Anomaly Detection. 2025. Available online: https://kdd-milets.github.io/milets2021/slides/Irrational%20Exuberance_Eammon_Keogh.pdf (accessed on 12 December 2025).
- Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 2010, 11, 2079–2107. [Google Scholar]
- Tatbul, N.; Salinas, D.; Zhao, F.; Madden, S. Precision and Recall for Time Series. In Proceedings of the Advances in Neural Information Processing Systems, Red Hook, NY, USA, 10–16 December 2018; Volume 31. [Google Scholar]
- Liu, J.; Zhang, C.; Qian, J.; Ma, M.; Qin, S.; Bansal, C.; Lin, Q.; Rajmohan, S.; Zhang, D. Large Language Models Can Deliver Accurate and Interpretable Time Series Anomaly Detection. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Toronto, ON, Canada, 3–7 August 2025; pp. 4623–4634. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 25–29 April 2022. [Google Scholar]
| Dataset | Number of Series | Average Number of Anomalies per Series | Series Length (Mean ± Std) |
|---|---|---|---|
| NASA | 80 | 1.28 | 8715 ± 3121 |
| Yahoo S5 | 367 | 5.87 | 1562 ± 140 |
| NAB | 45 | 2.09 | 6088 ± 3150 |
| Group | Approach | Model | Setting | F1 | Inference Time (s) | Cost ($) | Memory (GB) |
|---|---|---|---|---|---|---|---|
| LMs | Pretrained | Chronos * | T5-Large | 0.679 | 57 | – | 2.5 |
| LMs | Fine-tuned | Chronos * | T5-Large | 0.730 | 120 | – | 2.5 |
| LMs | Fine-tuned | Chronos * | Bart-Large | 0.421 | 83 | – | 1.5 |
| LMs | Fine-tuned | Chronos * | GPT2 | 0.692 | 34 | – | 1.1 |
| LLMs | Pretrained | LLaMA3-8B [27] | Zero-shot | 0.018 | 36 | – | 22 |
| LLMs | Fine-tuned | LLaMA3-8B [27] | LoRA | 0.039 | 36 | – | 22 |
| LLMs | Pretrained | PROMPTER Mistral-7B [17] | Zero-shot | 0.283 | 46 | – | 30 |
| LLMs | Pretrained | PROMPTER GPT-3.5-turbo [17] | Zero-shot | 0.143 | 4 | 1.69 | – |
| LLMs | Pretrained | DETECTOR Mistral-7B [17] | Zero-shot | 0.534 | 280 | – | 30 |
| LLMs | Pretrained | LLMAD Llama-3-70B [34] | AnoCoT (Few-shot) | 0.490 | – | – | – |
| LLMs | Pretrained | LLMAD GPT-3.5 [34] | AnoCoT (Few-shot) | 0.204 | – | – | – |
| LLMs | Pretrained | LLMAD GPT-4 [34] | No CoT (Few-shot) | 0.621 | – | – | – |
| LLMs | Pretrained | LLMAD GPT-4 [34] | CoT (Few-shot) | 0.645 | – | – | – |
| LLMs | Pretrained | LLMAD GPT-4 [34] | AnoCoT (Few-shot) | 0.724 | 14 | 0.260 | – |
| LLMs | Pretrained | Gemini-1.5-Flash [26] | Zero-shot Vision | 0.317 | 1.9 | 0.001 | – |
| LLMs | Pretrained | GPT-4o-Mini [26] | Zero-shot Vision | 0.513 | 2.1 | 2.146 | – |
| LLMs | Pretrained | Internvlm-76B [26] | Zero-shot Text-S0.3-PAP | 0.534 | 6 | 0.017 | – |
| Method | Precision | Recall | F1 Score |
|---|---|---|---|
| Chronos Zero-shot | 0.743 * ± 0.175 | 0.716 ± 0.155 | 0.679 * ± 0.134 |
| Chronos Fine-tuned | 0.804 ± 0.133 | 0.7165 ± 0.162 | 0.730 ± 0.168 |
| GPT Fine-tuned | 0.775 * ± 0.178 | 0.692 ± 0.162 | 0.692 * ± 0.183 |
| BART Fine-tuned | 0.455 * ± 0.206 | 0.538 * ± 0.250 | 0.421 * ± 0.207 |
| Model | Yahoo A1 | Yahoo A2 | Yahoo A3 | Yahoo A4 | NASA MSL | NASA SMAP | NAB Art | NAB AWS | NAB AdEx | NAB Traf | Avg F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AER | 0.799 | 0.987 | 0.892 | 0.709 | 0.587 | 0.819 | 0.714 | 0.741 | 0.690 | 0.703 | 0.764 ± 0.108 |
| LSTM DT | 0.728 | 0.985 | 0.744 | 0.646 | 0.471 | 0.726 | 0.400 | 0.468 | 0.786 | 0.585 | 0.654 ± 0.168 |
| ARIMA | 0.728 | 0.856 | 0.797 | 0.686 | 0.525 | 0.411 | 0.308 | 0.382 | 0.727 | 0.467 | 0.589 ± 0.183 |
| MP | 0.507 | 0.897 | 0.793 | 0.825 | 0.474 | 0.423 | 0.571 | 0.440 | 0.692 | 0.305 | 0.593 ± 0.188 |
| TadGAN | 0.578 | 0.817 | 0.416 | 0.340 | 0.560 | 0.605 | 0.500 | 0.623 | 0.818 | 0.452 | 0.571 ± 0.149 |
| LSTM AE | 0.595 | 0.867 | 0.466 | 0.239 | 0.545 | 0.662 | 0.667 | 0.741 | 0.500 | 0.500 | 0.578 ± 0.163 |
| VAE | 0.592 | 0.803 | 0.438 | 0.230 | 0.494 | 0.613 | 0.667 | 0.689 | 0.583 | 0.483 | 0.559 ± 0.150 |
| AT | 0.571 | 0.565 | 0.760 | 0.576 | 0.400 | 0.266 | 0.414 | 0.430 | 0.500 | 0.371 | 0.485 ± 0.132 |
| MAvg | 0.713 | 0.356 | 0.647 | 0.615 | 0.171 | 0.092 | 0.222 | 0.408 | 0.880 | 0.157 | 0.426 * ± 0.259 |
| MS Azure | 0.280 | 0.653 | 0.702 | 0.344 | 0.051 | 0.019 | 0.056 | 0.112 | 0.163 | 0.117 | 0.250 * ± 0.235 |
| PROMPTER MISTRAL [17] | 0.194 | 0.235 | 0.338 | 0.336 | 0.160 | 0.154 | 0.370 | 0.268 | 0.000 | 0.135 | 0.219 * ± 0.108 |
| PROMPTER GPT [17] | 0.143 | 0.078 | 0.157 | 0.195 | 0.049 | 0.110 | 0.154 | 0.194 | 0.133 | 0.133 | 0.135 * ± 0.044 |
| DETECTOR [17] | 0.615 | 0.828 | 0.376 | 0.363 | 0.429 | 0.431 | 0.400 | 0.362 | 0.727 | 0.480 | 0.501 ± 0.157 |
| Chronos Pretrained | 0.543 | 0.789 | 0.747 | 0.570 | 0.553 | 0.508 | 0.092 | 0.238 | 0.387 | 0.428 | 0.485 ± 0.201 |
| Chronos Fine-tuned | 0.533 | 0.866 | 0.874 | 0.647 | 0.510 | 0.389 | 0.346 | 0.141 | 0.392 | 0.509 | 0.521 ± 0.217 |
| Method | Model | Yahoo A1 | Yahoo A2 | Yahoo A3 | Yahoo A4 | NASA MSL | NASA SMAP | NAB Art | NAB AWS | NAB AdEx | NAB Traf | Avg F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Consensus | FT | 0.505 | 0.859 | 0.828 | 0.609 | 0.329 | 0.378 | 0.056 | 0.151 | 0.246 | 0.409 | 0.437 ± 0.268 |
| Consensus | PT | 0.528 | 0.810 | 0.777 | 0.586 | 0.386 | 0.407 | 0.000 | 0.241 | 0.234 | 0.373 | 0.434 ± 0.250 |
| Extreme | FT | 0.000 | 0.000 | 0.000 | 0.000 | 0.041 | 0.065 | 0.000 | 0.000 | 0.000 | 0.000 | 0.011 * ± 0.023 |
| Extreme | PT | 0.000 | 0.000 | 0.000 | 0.000 | 0.075 | 0.140 | 0.000 | 0.000 | 0.000 | 0.000 | 0.022 * ± 0.048 |
| IQR | FT | 0.459 | 0.774 | 0.558 | 0.324 | 0.186 | 0.334 | 0.010 | 0.123 | 0.303 | 0.402 | 0.347 * ± 0.220 |
| IQR | PT | 0.505 | 0.630 | 0.213 | 0.107 | 0.195 | 0.326 | 0.007 | 0.199 | 0.303 | 0.404 | 0.289 * ± 0.187 |
| MAD | FT | 0.139 | 0.164 | 0.294 | 0.210 | 0.247 | 0.206 | 0.011 | 0.020 | 0.081 | 0.075 | 0.145 * ± 0.096 |
| MAD | PT | 0.140 | 0.231 | 0.394 | 0.261 | 0.271 | 0.249 | 0.022 | 0.023 | 0.076 | 0.086 | 0.175 * ± 0.124 |
| Percentile | FT | 0.533 | 0.577 | 0.520 | 0.482 | 0.509 | 0.375 | 0.000 | 0.140 | 0.392 | 0.434 | 0.396 ± 0.186 |
| Percentile | PT | 0.540 | 0.593 | 0.570 | 0.484 | 0.553 | 0.427 | 0.000 | 0.150 | 0.386 | 0.428 | 0.413 ± 0.194 |
| RD | FT | 0.224 | 0.345 | 0.263 | 0.160 | 0.353 | 0.389 | 0.251 | 0.113 | 0.033 | 0.193 | 0.232 * ± 0.113 |
| RD | PT | 0.224 | 0.166 | 0.172 | 0.115 | 0.494 | 0.508 | 0.092 | 0.150 | 0.032 | 0.205 | 0.216 * ± 0.160 |
| STD | FT | 0.495 | 0.866 | 0.874 | 0.647 | 0.207 | 0.327 | 0.056 | 0.141 | 0.282 | 0.509 | 0.440 ± 0.289 |
| STD | PT | 0.542 | 0.789 | 0.747 | 0.570 | 0.211 | 0.360 | 0.000 | 0.238 | 0.275 | 0.422 | 0.415 ± 0.249 |
| Z-score | FT | 0.182 | 0.412 | 0.602 | 0.341 | 0.221 | 0.191 | 0.016 | 0.036 | 0.115 | 0.123 | 0.224 * ± 0.181 |
| Z-score | PT | 0.198 | 0.573 | 0.676 | 0.410 | 0.263 | 0.214 | 0.022 | 0.041 | 0.089 | 0.126 | 0.261 * ± 0.224 |
| Model | Method | Dataset | Epochs | Training Time (h:mm:ss) | Memory (GB) |
|---|---|---|---|---|---|
| BART-Large | Full fine-tuning | Yahoo S5 | 1 | 1:26:00 | 12–16 |
| GPT-2 | Full fine-tuning | Yahoo S5 | 1 | 0:29:28 | 6–8 |
| Chronos T5-Large | Full fine-tuning | Yahoo S5 | 1 | 2:24:51 | 24–32 |
| LLaMA3-8B | LoRA | Synthetic | 1 | 2:00:00 (2 GPUs) | 130 |