Timestamp-Guided Knowledge Distillation for Robust Sensor-Based Time-Series Forecasting
Abstract
1. Introduction
- (1) We propose a novel temporal knowledge distillation framework consisting of two prediction branches that learn from each other. The framework is highly flexible: the Backbone Model can be any sequence-based prediction model, and the Timestamp Mapper serves as a plug-and-play component that collaborates seamlessly with the Backbone Model (see the sketch after this list).
- (2) We design a unique Timestamp Mapper with self-distillation, which transfers knowledge from deeper to shallower layers within the same model and thereby enhances its capacity to learn broader contextual information from the timestamps.
- (3) We conduct experiments on several real-world datasets collected from sensor-based systems to demonstrate the effectiveness of the proposed framework.
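To make contribution (1) concrete, the following is a minimal PyTorch sketch of the two-branch layout. It assumes a simplified interface in which the backbone maps the historical sequence to a forecast and the Timestamp Mapper maps future timestamp features to a forecast of the same shape; all names here (`TKDF`, `TimestampMapper`, the fusion weight `alpha`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TimestampMapper(nn.Module):
    """Maps future timestamp features (e.g., hour/day/month encodings) to a
    forecast, one step at a time. Illustrative stand-in, not the paper's code."""
    def __init__(self, ts_dim: int, n_channels: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ts_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_channels),
        )

    def forward(self, future_ts: torch.Tensor) -> torch.Tensor:
        # future_ts: (batch, pred_len, ts_dim) -> (batch, pred_len, n_channels)
        return self.net(future_ts)

class TKDF(nn.Module):
    """Two-branch wrapper: any sequence backbone plus the Timestamp Mapper,
    fused with a hypothetical fixed weight `alpha`."""
    def __init__(self, backbone: nn.Module, mapper: TimestampMapper, alpha: float = 0.5):
        super().__init__()
        self.backbone = backbone
        self.mapper = mapper
        self.alpha = alpha

    def forward(self, x_hist: torch.Tensor, future_ts: torch.Tensor) -> torch.Tensor:
        y_backbone = self.backbone(x_hist)   # sequence branch (any forecaster)
        y_mapper = self.mapper(future_ts)    # timestamp branch
        return self.alpha * y_backbone + (1.0 - self.alpha) * y_mapper
```

Because the wrapper only ever calls `self.backbone(x_hist)`, any of the backbones evaluated in Section 4.2 (Informer, TimesNet, DLinear, iTransformer) could be dropped in without modifying the timestamp branch.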
2. Related Work
2.1. Knowledge Distillation
2.2. Time-Series Prediction
3. Methods
3.1. Overview
3.2. Timestamp Mapper
3.3. Self-Distillation Phase
3.4. Mutual Learning Phase
Algorithm 1: TKDF for Time-Series Prediction
Input: historical timestamps T_h, future timestamps T_f, historical sequence X
Output: future sequence Ŷ
Hyperparameters: learning rate η, weight λ
1. Initialize Backbone Network A and Timestamp Mapper Network B
2. [Pre-training stage]: use Network B to predict Ŷ_B
3. for each pre-training epoch do
4.     Calculate the self-distillation loss L_SD
5.     Update the parameters of the Timestamp Mapper: θ_B ← θ_B − η∇θ_B L_SD
6. end for
7. [Multi-branch prediction stage]: use Network A and Network B to predict Ŷ_A and Ŷ_B
8. for each training epoch do
9.     Calculate the mutual learning loss of Network A: L_A
10.    Calculate the mutual learning loss of Network B: L_B
11.    Update the parameters of Network A: θ_A ← θ_A − η∇θ_A L_A
12.    Update the parameters of Network B: θ_B ← θ_B − η∇θ_B L_B
13.    Update the prediction result Ŷ
14. end for
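To illustrate how the two stages of Algorithm 1 fit together, here is a minimal PyTorch training sketch. It uses plain MSE both as the prediction loss and as the distillation signal, a deeper auxiliary head `deep_head` as the self-distillation teacher, and a loss weight `lam`; these are simplifying assumptions standing in for the paper's exact self-distillation, mutual learning, and quantile-based losses.

```python
import torch
import torch.nn.functional as F

def pretrain_mapper(mapper, deep_head, loader, epochs=10, eta=1e-3, lam=0.5):
    """Stage 1 (lines 2-6): self-distillation pre-training of the Timestamp Mapper.
    A deeper head over the same timestamp features acts as teacher for the
    shallower mapper output (hypothetical formulation)."""
    params = list(mapper.parameters()) + list(deep_head.parameters())
    opt = torch.optim.Adam(params, lr=eta)
    for _ in range(epochs):
        for future_ts, y in loader:
            y_shallow = mapper(future_ts)      # shallow prediction
            y_deep = deep_head(future_ts)      # deeper "teacher" prediction
            loss = (F.mse_loss(y_deep, y)      # teacher fits the target
                    + lam * F.mse_loss(y_shallow, y_deep.detach()))  # deep-to-shallow transfer
            opt.zero_grad()
            loss.backward()
            opt.step()

def train_mutual(backbone, mapper, loader, epochs=10, eta=1e-3, lam=0.5):
    """Stage 2 (lines 7-14): mutual learning. Each branch fits the target and is
    softly pulled toward the other branch's (detached) prediction."""
    opt_a = torch.optim.Adam(backbone.parameters(), lr=eta)  # Network A
    opt_b = torch.optim.Adam(mapper.parameters(), lr=eta)    # Network B
    for _ in range(epochs):
        for x_hist, future_ts, y in loader:
            y_a, y_b = backbone(x_hist), mapper(future_ts)
            loss_a = F.mse_loss(y_a, y) + lam * F.mse_loss(y_a, y_b.detach())
            loss_b = F.mse_loss(y_b, y) + lam * F.mse_loss(y_b, y_a.detach())
            opt_a.zero_grad()
            loss_a.backward()
            opt_a.step()
            opt_b.zero_grad()
            loss_b.backward()
            opt_b.step()
```

Detaching the peer prediction in each mutual-learning term ensures that each branch is only pulled toward the other's current output rather than backpropagating through both networks at once.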
4. Experiments
4.1. Datasets
- Electricity comprises hourly electricity consumption data from 321 customers, collected between 2012 and 2014.
- Traffic contains hourly road occupancy rates collected by 862 sensors installed on freeways in the San Francisco Bay Area, covering the period from 2015 to 2016.
- Weather contains 21 meteorological variables recorded every 10 min at stations across Germany in 2020.
- ETT records the oil temperature and load characteristics of two power transformers from 2016 to 2018. Measurements were collected at two resolutions (15 min and 1 h) for each transformer, yielding four datasets: ETTh1, ETTh2, ETTm1, and ETTm2.
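All seven benchmarks are used with the standard sliding-window protocol: each sample pairs an input window with the next `pred_len` steps and their timestamps, where `pred_len` ranges over the horizons 96/192/336/720 evaluated below. The helper below is a generic NumPy sketch of this preparation; the function name and the input length of 96 are our assumptions, not the authors' pipeline.

```python
import numpy as np

def make_windows(values, ts_feats, input_len=96, pred_len=96):
    """Slice a multivariate series into (history, future, future-timestamp) triples.
    values: (T, channels) sensor readings; ts_feats: (T, ts_dim) timestamp encodings."""
    xs, ys, fts = [], [], []
    for start in range(len(values) - input_len - pred_len + 1):
        mid, end = start + input_len, start + input_len + pred_len
        xs.append(values[start:mid])    # historical sequence (backbone input)
        ys.append(values[mid:end])      # future sequence (target)
        fts.append(ts_feats[mid:end])   # future timestamps (mapper input)
    return np.stack(xs), np.stack(ys), np.stack(fts)
```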
4.2. Backbone Model
4.3. Evaluation Metrics
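The experiments report mean squared error (MSE) and mean absolute error (MAE), averaged over all prediction steps and channels; lower is better. A minimal NumPy reference implementation (function names are ours):

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error over all horizons and channels."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error over all horizons and channels."""
    return float(np.mean(np.abs(y_true - y_pred)))
```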
4.4. Experiment 1: Prediction Performance of the Proposed Framework
4.5. Experiment 2: Parameter Determination
4.6. Experiment 3: Ablation Experiments
5. Discussion
5.1. Strengths and Innovations
5.2. Comparison with Existing Models
5.3. Computational Complexity Analysis
5.4. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, J.; Zheng, Y.; Qi, D. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Liu, Y.; Yu, J.J.Q.; Kang, J.; Niyato, D.; Zhang, S. Privacy-Preserving Traffic Flow Prediction: A Federated Learning Approach. IEEE Internet Things J. 2020, 7, 7751–7763.
- Zhang, L.; Zhu, J.; Jin, B.; Wei, X. Multiview Spatial-Temporal Meta-Learning for Multivariate Time Series Forecasting. Sensors 2024, 24, 4473.
- Pinto, T.; Praça, I.; Vale, Z.; Silva, J. Ensemble Learning for Electricity Consumption Forecasting in Office Buildings. Neurocomputing 2021, 423, 747–755.
- Du, S.; Li, T.; Yang, Y.; Horng, S.J. Deep Air Quality Forecasting Using Hybrid Deep Learning Framework. IEEE Trans. Knowl. Data Eng. 2019, 33, 2412–2424.
- He, Q.Q.; Siu, S.W.I.; Si, Y.W. Instance-Based Deep Transfer Learning with Attention for Stock Movement Prediction. Appl. Intell. 2023, 53, 6887–6908.
- Liang, P.P.; Zadeh, A.; Morency, L.P. Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. ACM Comput. Surv. 2024, 56, 1–42.
- Yin, X.; Wu, G.; Wei, J.; Shen, Y.; Qi, H.; Yin, B. Deep Learning on Traffic Prediction: Methods, Analysis, and Future Directions. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4927–4943.
- Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023.
- Yan, J.; Li, H.; Zhang, D.; Bai, Y.; Xu, Y.; Han, C. A Multi-Feature Spatial–Temporal Fusion Network for Traffic Flow Prediction. Sci. Rep. 2024, 14, 14264.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021.
- Wang, C.; Qi, Q.; Wang, J.; Sun, H.; Zhuang, Z.; Wu, J.; Liao, J. Rethinking the Power of Timestamps for Robust Time Series Forecasting: A Global-Local Fusion Perspective. In Proceedings of the Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024.
- Pereira, L.M.; Salazar, A.; Vergara, L. A Comparative Analysis of Early and Late Fusion for the Multimodal Two-Class Problem. IEEE Access 2023, 11, 84283–84300.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
- Ma, R.; Zhang, C.; Kang, Y.; Wang, X.; Qiu, C. MCD: Multi-Stage Catalytic Distillation for Time Series Forecasting. In Proceedings of the International Conference on Database Systems for Advanced Applications, Gifu, Japan, 2–5 July 2024.
- Wu, R.; Feng, M.; Guan, W.; Wang, D.; Lu, H.; Ding, E. A Mutual Learning Method for Salient Object Detection with Intertwined Multi-Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Li, Y.; Li, P.; Yan, D.; Liu, Z. Deep Knowledge Distillation: A Self-Mutual Learning Framework for Traffic Prediction. Expert Syst. Appl. 2024, 252, 124138.
- Kim, K.; Ji, B.M.; Yoon, D.; Hwang, S. Self-Knowledge Distillation with Progressive Refinement of Targets. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montréal, QC, Canada, 11–17 October 2021.
- Ji, J.; Yu, F.; Lei, M. Self-Supervised Spatiotemporal Graph Neural Networks with Self-Distillation for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2022, 24, 1580–1593.
- Li, W.; Law, K.L.E. Deep Learning Models for Time Series Forecasting: A Review. IEEE Access 2024, 12, 92306–92327.
- Ariyo, A.A.; Adewumi, A.O.; Ayo, C.K. Stock Price Prediction Using the ARIMA Model. In Proceedings of the International Conference on Computer Modelling and Simulation, Cambridge, UK, 26–28 March 2014.
- Ghaderpour, E.; Dadkhah, H.; Dabiri, H.; Bozzano, F.; Scarascia Mugnozza, G.; Mazzanti, P. Precipitation Time Series Analysis and Forecasting for Italian Regions. Eng. Proc. 2023, 39, 23.
- Puri, C.; Kooijman, G.; Vanrumste, B.; Luca, S. Forecasting Time Series in Healthcare with Gaussian Processes and Dynamic Time Warping Based Subset Selection. IEEE J. Biomed. Health Inform. 2022, 26, 6126–6137.
- Agarap, A.F.M. A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection in Network Traffic Data. In Proceedings of the 10th International Conference on Machine Learning and Computing (ICMLC), Macau, China, 26–28 February 2018.
- Bouhali, A.; Zeroual, A.; Harrou, F. Enhancing Traffic Flow Prediction with Machine Learning Models: A Comparative Study. In Proceedings of the 2024 International Conference of the African Federation of Operational Research Societies, Tlemcen, Algeria, 3–5 November 2024.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. arXiv 2018, arXiv:1707.01926.
- Jia, Y.; Lin, Y.; Hao, X.; Lin, Y.; Guo, S.; Wan, H. WITRAN: Water-Wave Information Transmission and Recurrent Acceleration Network for Long-Range Time Series Forecasting. In Proceedings of the Annual Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
- Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3848–3858.
- Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
- Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
- Kaushik, S.; Choudhury, A.; Sheron, P.K.; Dasgupta, N.; Natarajan, S.; Pickett, L.A.; Dutt, V. AI in Healthcare: Time-Series Forecasting Using Statistical, Neural, and Ensemble Architectures. Front. Big Data 2020, 3, 4.
| Dataset | Channels | Length (time steps) | Frequency | Domain |
|---|---|---|---|---|
| Electricity | 321 | 26,304 | 1 h | Energy |
| Traffic | 862 | 17,544 | 1 h | Transportation |
| Weather | 21 | 52,696 | 10 min | Climate |
| ETTh1 and ETTh2 | 7 | 17,420 | 1 h | Energy |
| ETTm1 and ETTm2 | 7 | 69,680 | 15 min | Energy |
| Dataset | Horizon | Informer MSE | Informer MAE | Informer+Ours MSE | Informer+Ours MAE | TimesNet MSE | TimesNet MAE | TimesNet+Ours MSE | TimesNet+Ours MAE | DLinear MSE | DLinear MAE | DLinear+Ours MSE | DLinear+Ours MAE | iTransformer MSE | iTransformer MAE | iTransformer+Ours MSE | iTransformer+Ours MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Traffic | 96 | 0.641 | 0.347 | 0.598 | 0.332 | 0.598 | 0.321 | 0.565 | 0.306 | 0.602 | 0.390 | 0.561 | 0.368 | 0.397 | 0.273 | 0.385 | 0.265 |
| | 192 | 0.708 | 0.398 | 0.677 | 0.375 | 0.623 | 0.339 | 0.601 | 0.325 | 0.637 | 0.398 | 0.510 | 0.320 | 0.438 | 0.281 | 0.429 | 0.274 |
| | 336 | 0.840 | 0.439 | 0.702 | 0.391 | 0.645 | 0.343 | 0.627 | 0.331 | 0.653 | 0.437 | 0.514 | 0.341 | 0.456 | 0.315 | 0.437 | 0.289 |
| | 720 | 0.864 | 0.452 | 0.798 | 0.406 | 0.662 | 0.358 | 0.642 | 0.344 | 0.697 | 0.469 | 0.535 | 0.384 | 0.469 | 0.332 | 0.441 | 0.297 |
| Electricity | 96 | 0.334 | 0.420 | 0.296 | 0.383 | 0.186 | 0.279 | 0.164 | 0.245 | 0.198 | 0.281 | 0.152 | 0.242 | 0.150 | 0.241 | 0.142 | 0.231 |
| | 192 | 0.365 | 0.434 | 0.343 | 0.406 | 0.203 | 0.298 | 0.180 | 0.267 | 0.204 | 0.290 | 0.175 | 0.271 | 0.169 | 0.257 | 0.168 | 0.250 |
| | 336 | 0.387 | 0.441 | 0.369 | 0.417 | 0.224 | 0.319 | 0.192 | 0.293 | 0.211 | 0.309 | 0.194 | 0.286 | 0.178 | 0.267 | 0.170 | 0.237 |
| | 720 | 0.406 | 0.449 | 0.394 | 0.428 | 0.240 | 0.327 | 0.222 | 0.319 | 0.253 | 0.338 | 0.230 | 0.314 | 0.229 | 0.324 | 0.217 | 0.304 |
| ETTh1 | 96 | 0.938 | 0.742 | 0.632 | 0.639 | 0.456 | 0.487 | 0.437 | 0.462 | 0.419 | 0.442 | 0.403 | 0.412 | 0.400 | 0.427 | 0.398 | 0.401 |
| | 192 | 1.297 | 0.836 | 0.872 | 0.698 | 0.543 | 0.531 | 0.523 | 0.513 | 0.485 | 0.507 | 0.470 | 0.483 | 0.462 | 0.541 | 0.441 | 0.528 |
| | 336 | 1.368 | 0.878 | 0.903 | 0.704 | 0.621 | 0.572 | 0.581 | 0.556 | 0.526 | 0.523 | 0.498 | 0.504 | 0.503 | 0.458 | 0.474 | 0.512 |
| | 720 | 1.364 | 0.892 | 0.973 | 0.737 | 0.801 | 0.691 | 0.704 | 0.653 | 0.712 | 0.612 | 0.697 | 0.600 | 0.721 | 0.631 | 0.698 | 0.603 |
| ETTh2 | 96 | 0.789 | 0.671 | 0.768 | 0.654 | 0.347 | 0.374 | 0.296 | 0.329 | 0.342 | 0.374 | 0.324 | 0.352 | 0.306 | 0.349 | 0.298 | 0.347 |
| | 192 | 1.197 | 0.699 | 1.132 | 0.637 | 0.432 | 0.425 | 0.367 | 0.396 | 0.480 | 0.473 | 0.453 | 0.451 | 0.383 | 0.413 | 0.374 | 0.409 |
| | 336 | 1.541 | 1.017 | 1.507 | 1.001 | 0.467 | 0.458 | 0.392 | 0.431 | 0.592 | 0.517 | 0.567 | 0.498 | 0.430 | 0.438 | 0.423 | 0.421 |
| | 720 | 1.797 | 1.208 | 1.763 | 1.107 | 0.483 | 0.469 | 0.416 | 0.402 | 0.643 | 0.596 | 0.609 | 0.507 | 0.433 | 0.454 | 0.424 | 0.443 |
| ETTm1 | 96 | 0.598 | 0.561 | 0.510 | 0.483 | 0.369 | 0.417 | 0.349 | 0.381 | 0.332 | 0.387 | 0.312 | 0.350 | 0.396 | 0.407 | 0.343 | 0.372 |
| | 192 | 0.621 | 0.582 | 0.539 | 0.532 | 0.442 | 0.438 | 0.411 | 0.394 | 0.392 | 0.392 | 0.367 | 0.383 | 0.418 | 0.433 | 0.392 | 0.403 |
| | 336 | 0.892 | 0.781 | 0.724 | 0.639 | 0.503 | 0.499 | 0.482 | 0.463 | 0.449 | 0.436 | 0.438 | 0.424 | 0.472 | 0.476 | 0.439 | 0.441 |
| | 720 | 1.083 | 0.794 | 0.928 | 0.732 | 0.551 | 0.527 | 0.513 | 0.513 | 0.501 | 0.483 | 0.493 | 0.487 | 0.523 | 0.516 | 0.503 | 0.498 |
| ETTm2 | 96 | 0.237 | 0.312 | 0.218 | 0.270 | 0.183 | 0.267 | 0.161 | 0.232 | 0.197 | 0.298 | 0.190 | 0.293 | 0.189 | 0.268 | 0.179 | 0.263 |
| | 192 | 0.314 | 0.347 | 0.297 | 0.323 | 0.240 | 0.313 | 0.227 | 0.268 | 0.293 | 0.371 | 0.287 | 0.369 | 0.264 | 0.318 | 0.248 | 0.309 |
| | 336 | 0.478 | 0.482 | 0.463 | 0.384 | 0.317 | 0.335 | 0.304 | 0.297 | 0.375 | 0.435 | 0.363 | 0.427 | 0.320 | 0.364 | 0.307 | 0.342 |
| | 720 | 0.878 | 0.673 | 0.732 | 0.591 | 0.414 | 0.398 | 0.403 | 0.372 | 0.568 | 0.538 | 0.550 | 0.528 | 0.423 | 0.427 | 0.401 | 0.413 |
| Weather | 96 | 1.248 | 0.867 | 0.649 | 0.567 | 0.178 | 0.231 | 0.174 | 0.224 | 0.198 | 0.257 | 0.172 | 0.247 | 0.180 | 0.225 | 0.161 | 0.220 |
| | 192 | 1.272 | 0.882 | 0.872 | 0.678 | 0.223 | 0.279 | 0.221 | 0.279 | 0.242 | 0.231 | 0.224 | 0.229 | 0.228 | 0.263 | 0.213 | 0.261 |
| | 336 | 1.306 | 1.012 | 0.998 | 0.861 | 0.289 | 0.313 | 0.286 | 0.297 | 0.391 | 0.339 | 0.369 | 0.317 | 0.285 | 0.313 | 0.275 | 0.308 |
| | 720 | 1.363 | 1.131 | 1.320 | 0.892 | 0.368 | 0.369 | 0.352 | 0.362 | 0.350 | 0.386 | 0.334 | 0.365 | 0.361 | 0.350 | 0.350 | 0.346 |
Ablation results with iTransformer as the backbone; "w/o Backbone" and "w/o Timestamp Mapper" denote the two w/o Mutual Learning variants.

| Dataset | Horizon | iTransformer MSE | iTransformer MAE | +Ours MSE | +Ours MAE | w/o Backbone MSE | w/o Backbone MAE | w/o Timestamp Mapper MSE | w/o Timestamp Mapper MAE | w/o Quantile MSE | w/o Quantile MAE | w/o Self-Distillation MSE | w/o Self-Distillation MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Traffic | 96 | 0.397 | 0.273 | 0.385 | 0.265 | 0.432 | 0.297 | 0.395 | 0.271 | 0.389 | 0.271 | 0.389 | 0.270 |
| | 192 | 0.438 | 0.281 | 0.429 | 0.274 | 0.463 | 0.303 | 0.443 | 0.285 | 0.433 | 0.280 | 0.431 | 0.279 |
| | 336 | 0.456 | 0.315 | 0.437 | 0.289 | 0.478 | 0.336 | 0.457 | 0.313 | 0.446 | 0.297 | 0.444 | 0.293 |
| | 720 | 0.469 | 0.332 | 0.441 | 0.297 | 0.510 | 0.351 | 0.470 | 0.336 | 0.459 | 0.322 | 0.451 | 0.310 |
| Electricity | 96 | 0.150 | 0.241 | 0.142 | 0.231 | 0.189 | 0.268 | 0.153 | 0.244 | 0.146 | 0.235 | 0.144 | 0.233 |
| | 192 | 0.169 | 0.257 | 0.168 | 0.250 | 0.210 | 0.279 | 0.172 | 0.259 | 0.169 | 0.254 | 0.168 | 0.256 |
| | 336 | 0.178 | 0.267 | 0.170 | 0.237 | 0.235 | 0.294 | 0.180 | 0.271 | 0.174 | 0.248 | 0.172 | 0.241 |
| | 720 | 0.229 | 0.324 | 0.217 | 0.304 | 0.267 | 0.348 | 0.231 | 0.330 | 0.223 | 0.317 | 0.220 | 0.310 |
| Weather | 96 | 0.180 | 0.225 | 0.161 | 0.220 | 0.226 | 0.249 | 0.182 | 0.227 | 0.171 | 0.223 | 0.169 | 0.224 |
| | 192 | 0.228 | 0.263 | 0.213 | 0.261 | 0.249 | 0.289 | 0.229 | 0.266 | 0.220 | 0.262 | 0.217 | 0.261 |
| | 336 | 0.285 | 0.313 | 0.275 | 0.308 | 0.317 | 0.343 | 0.287 | 0.315 | 0.279 | 0.311 | 0.279 | 0.308 |
| | 720 | 0.361 | 0.350 | 0.350 | 0.346 | 0.384 | 0.376 | 0.365 | 0.351 | 0.358 | 0.349 | 0.357 | 0.347 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).