Predicting the Remaining Service Life of Power Transformers Using Machine Learning
Abstract
1. Introduction
2. Theoretical Foundations
2.1. Transformer
2.2. BiGRU-GlobalAttention
2.3. Cross-Attention Feature Fusion
2.4. Parallel Transformer–GlobalAttention–BiGRU Model
3. Experimental Design
3.1. Data Source
3.2. Data Preprocessing
3.3. Model Parameter Settings
4. Experimental Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Mishra, S.R.; Pattnaik, P.K.; Baithalu, R.; Ratha, P.K.; Panda, S. Predicting heat transfer performance in transient flow of CNT nanomaterials with thermal radiation past a heated spinning sphere using an artificial neural network: A machine learning approach. Partial Differ. Equ. Appl. Math. 2024, 12, 100936.
- Vita, V.; Fotis, G.; Chobanov, V.; Pavlatos, C.; Mladenov, V. Predictive maintenance for distribution system operators in increasing transformers’ reliability. Electronics 2023, 12, 1356.
- Sayyad, S.; Kumar, S.; Bongale, A.; Kamat, P.; Patil, S.; Kotecha, K. Data-driven remaining useful life estimation for milling process: Sensors, algorithms, datasets, and future directions. IEEE Access 2021, 9, 110255–110286.
- Sun, L.; Huang, X.; Liu, J.; Song, J.; Wu, S. Remaining useful life prediction of lithium batteries based on jump connection multi-scale CNN. Sci. Rep. 2025, 15, 12345.
- Xu, T.; Nan, X.; Cai, X.; Zhao, P. Industrial sensor time series prediction based on CBDAE and TCN-Transformer. J. Nanjing Univ. Inf. Sci. Technol. 2025, 17, 455–466.
- Gillioz, A.; Casas, J.; Mugellini, E.; Abou Khaled, O. Overview of the Transformer-based Models for NLP Tasks. In Proceedings of the 2020 15th Conference on Computer Science and Information Systems (FedCSIS), Sofia, Bulgaria, 6–9 September 2020; pp. 179–183.
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
- Shewalkar, A. Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM and GRU. J. Artif. Intell. Soft Comput. Res. 2019, 9, 235–245.
- Mehrish, A.; Majumder, N.; Bharadwaj, R.; Mihalcea, R.; Poria, S. A review of deep learning techniques for speech processing. Inf. Fusion 2023, 99, 101869.
- Feng, L.; Tung, F.; Hajimirsadeghi, H.; Ahmed, M.O.; Bengio, Y.; Mori, G. Attention as an RNN. arXiv 2024, arXiv:2405.13956.
- Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014.
- Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G. A dual-stage attention-based recurrent neural network for time series prediction. arXiv 2017, arXiv:1704.02971.
- Hao, Y.; Zhang, Y.; Liu, K.; He, S.; Liu, Z.; Wu, H.; Zhao, J. An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 221–231.
- Yeh, C.; Chen, Y.; Wu, A.; Chen, C.; Viégas, F.; Wattenberg, M. AttentionViz: A global view of transformer attention. IEEE Trans. Vis. Comput. Graph. 2023, 30, 262–272.
- Li, H.; Wu, X.-J. CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach. Inf. Fusion 2024, 103, 102147.
- Cai, W.; Wei, Z. Remote sensing image classification based on a cross-attention mechanism and graph convolution. IEEE Geosci. Remote Sens. Lett. 2020, 19, 1–5.
- Yan, J.; Zhuang, X.; Zhao, X.; Shao, X.; Han, J. CAMSNet: Few-Shot Semantic Segmentation via Class Activation Map and Self-Cross Attention Block. Comput. Mater. Contin. 2025, 82, 5363–5386.
- Cai, M.; Zhan, J.; Zhang, C.; Liu, Q. Fusion k-means clustering and multi-head self-attention mechanism for a multivariate time prediction model with feature selection. Int. J. Mach. Learn. Cybern. 2024, 1–19.
- Li, Z.; Luo, S.; Liu, H.; Tang, C.; Miao, J. TTSNet: Transformer–Temporal Convolutional Network–Self-Attention with Feature Fusion for Prediction of Remaining Useful Life of Aircraft Engines. Sensors 2025, 25, 432.
- Wang, L.; Wang, X.; Dong, C.; Sun, Y. Wave predictor models for medium and long term based on dual attention-enhanced Transformer. Ocean Eng. 2024, 310, 118761.
- Liu, M.; Wang, W.; Hu, X.; Fu, Y.; Xu, F.; Miao, X. Multivariate long-time series traffic passenger flow prediction using causal convolutional sparse self-attention MTS-Informer. Neural Comput. Appl. 2023, 35, 24207–24223.
- Xu, L.; Lv, Y.; Moradkhani, H. Daily multistep soil moisture forecasting by combining linear and nonlinear causality and attention-based encoder-decoder model. Stoch. Environ. Res. Risk Assess. 2024, 38, 4979–5000.
- Wang, Y.; Li, Y.; Lu, H.; Wang, D. Method for remaining useful life prediction of rolling bearings based on deep reinforcement learning. Rev. Sci. Instrum. 2024, 95, 095112.
- Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
- Shumway, R.H.; Stoffer, D.S. ARIMA models. In Time Series Analysis and Its Applications: With R Examples; Springer: Berlin/Heidelberg, Germany, 2017; pp. 75–163.
- Chua, L.O.; Roska, T. The CNN paradigm. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 1993, 40, 147–156.
- Chen, Z.; Zhao, Y.-L.; Pan, X.-Y.; Dong, Z.-Y.; Gao, B.; Zhong, Z.-W. An overview of Prophet. In Proceedings of the Algorithms and Architectures for Parallel Processing: 9th International Conference, ICA3PP 2009, Taipei, Taiwan, 8–11 June 2009; pp. 396–407.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 11106–11115.
- Fan, Y.; Zhang, L.; Li, K. AE-BiLSTM: Multivariate Time-Series EMI Anomaly Detection in 5G-R High-Speed Rail Wireless Communications. In Proceedings of the 2024 IEEE International Conference on Communications Workshops (ICC Workshops), Denver, CO, USA, 9–13 June 2024; pp. 439–444.
- Li, Z. A Combination Perception Model Based on CNN-BiGRU-Attention and Its Application in Signal Processing. In Proceedings of the 2024 IEEE 4th International Conference on Data Science and Computer Application (ICDSCA), Dalian, China, 22–24 November 2024; pp. 694–701.
- Bian, Q.; As’arry, A.; Cong, X.; Rezali, K.A.B.M.; Raja Ahmad, R.M.K.B. A hybrid Transformer-LSTM model apply to glucose prediction. PLoS ONE 2024, 19, e0310084.

| Field | Date | HUFL | HULL | MUFL | MULL | LUFL | LULL | OT |
|---|---|---|---|---|---|---|---|---|
| Description | Record date | High useful load | High useless load | Medium useful load | Medium useless load | Low useful load | Low useless load | Oil temperature (target) |
| Unit | — | MW | MW | MW | MW | MW | MW | °C |
| Date | HUFL (MW) | HULL (MW) | MUFL (MW) | MULL (MW) | LUFL (MW) | LULL (MW) | OT (°C) |
|---|---|---|---|---|---|---|---|
| 1 July 2016 0:00 | 41.130 | 12.480 | 36.536 | 9.354 | 4.423 | 1.310 | 38.662 |
| 1 July 2016 1:00 | 37.527 | 10.135 | 33.936 | 7.532 | 4.434 | 1.215 | 37.124 |
| 1 July 2016 2:00 | 37.946 | 11.309 | 35.330 | 9.006 | 2.100 | 0.000 | 36.465 |
| 1 July 2016 3:00 | 38.951 | 11.895 | 35.543 | 9.435 | 3.380 | 1.215 | 33.608 |
| 1 July 2016 4:00 | 38.113 | 11.475 | 35.409 | 9.623 | 2.036 | 0.000 | 31.850 |
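
The sample rows above are hourly load readings with the oil temperature (OT) as the prediction target. As a minimal sketch of how such records can be framed as a supervised learning problem, the Python snippet below slides a fixed-length window over the series; the 24-hour lookback, one-step horizon, and synthetic placeholder frame are illustrative assumptions, not the preprocessing settings of Section 3.2.

```python
# Minimal sketch: turning hourly transformer-load records into supervised
# samples for oil-temperature (OT) forecasting. The window length, horizon,
# and the synthetic placeholder data are assumptions for illustration.
import numpy as np
import pandas as pd

FEATURES = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]


def make_windows(df: pd.DataFrame, lookback: int = 24, horizon: int = 1):
    """Return X of shape (n, lookback, n_features) and y of shape (n,),
    where y is the oil temperature `horizon` hours after each window."""
    values = df[FEATURES].to_numpy(dtype=np.float32)
    target_col = FEATURES.index("OT")
    X, y = [], []
    for start in range(len(values) - lookback - horizon + 1):
        X.append(values[start:start + lookback])
        y.append(values[start + lookback + horizon - 1, target_col])
    return np.stack(X), np.asarray(y, dtype=np.float32)


if __name__ == "__main__":
    # Placeholder frame with the same schema as the table above; swap in the
    # real hourly CSV when reproducing the experiment.
    rng = np.random.default_rng(0)
    demo = pd.DataFrame(rng.normal(size=(200, len(FEATURES))), columns=FEATURES)
    X, y = make_windows(demo)
    print(X.shape, y.shape)   # (176, 24, 7) (176,)
```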
| Module Name | Parameter | Value |
|---|---|---|
| Transformer | Attention Dimension | 128 |
| | Encoder Layers | 2 |
| Cross-Attention | Multi-Head Attention Heads | 2 |
| BiGRU | Layers | 32 |
| | Dimension | 64 |
| Global Attention | Attention Layer Dimension | 64 |
| Hyperparameters | Learning Rate | 0.003 |
| | Loss Function | MSE |
| | Dropout | 0.5 |
| | Optimizer | Adam |
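
To show how the listed modules could fit together, below is a minimal PyTorch sketch of a parallel Transformer / BiGRU design with a global-attention summary and cross-attention fusion, wired with the values from the table (attention dimension 128, 2 encoder layers, 2 cross-attention heads, BiGRU dimension 64, dropout 0.5, Adam with learning rate 0.003, MSE loss). The branch layout, the use of the BiGRU summary as the cross-attention query, and the regression head are assumptions inferred from the section titles, not the authors' exact implementation; positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn


class GlobalAttention(nn.Module):
    """Score every time step and return an attention-weighted summary."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                         # h: (batch, time, dim)
        weights = torch.softmax(self.score(h), dim=1)
        return (weights * h).sum(dim=1)           # (batch, dim)


class ParallelTransformerBiGRU(nn.Module):
    def __init__(self, n_features: int = 7, d_model: int = 128,
                 gru_hidden: int = 64, n_heads: int = 2, dropout: float = 0.5):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # BiGRU branch: hidden size 64 per direction, 128 after concatenation.
        self.bigru = nn.GRU(n_features, gru_hidden, batch_first=True,
                            bidirectional=True)
        self.global_attn = GlobalAttention(2 * gru_hidden)
        # Cross-attention fusion: the BiGRU summary queries the encoded
        # Transformer sequence (assumed fusion direction).
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=d_model, num_heads=n_heads, dropout=dropout,
            batch_first=True)
        self.head = nn.Linear(d_model, 1)         # oil-temperature regressor

    def forward(self, x):                         # x: (batch, time, n_features)
        trans_seq = self.encoder(self.embed(x))             # (batch, time, 128)
        gru_seq, _ = self.bigru(x)                           # (batch, time, 128)
        query = self.global_attn(gru_seq).unsqueeze(1)       # (batch, 1, 128)
        fused, _ = self.cross_attn(query, trans_seq, trans_seq)
        return self.head(fused.squeeze(1)).squeeze(-1)       # (batch,)


model = ParallelTransformerBiGRU()
optimizer = torch.optim.Adam(model.parameters(), lr=0.003)
loss_fn = nn.MSELoss()
```

Using the BiGRU summary as the query lets the recurrent branch select which Transformer time steps to attend to during fusion; the reverse direction would be an equally plausible reading of the cross-attention block.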
| Method | MSE | MAE | RMSE | MAPE (%) | R² |
|---|---|---|---|---|---|
| LSTM [24] | 0.155 | 0.307 | 14.94 | 8.5 | 0.86 |
| ARIMA [25] | 3.554 | 0.445 | 14.77 | 12.1 | 0.87 |
| CNN [26] | 0.197 | 0.357 | 12.61 | 7.8 | 0.90 |
| Prophet [27] | 0.199 | 0.381 | 13.53 | 9.2 | 0.88 |
| Informer [28] | 0.093 | 0.240 | 13.71 | 6.5 | 0.92 |
| AE-BiLSTM [29] | 0.107 | 0.274 | 12.45 | 6.8 | 0.91 |
| CNN-BiGRU-Attention [30] | 0.087 | 0.255 | 12.37 | 6.0 | 0.93 |
| Transformer-LSTM [31] | 0.083 | 0.246 | 12.21 | 5.5 | 0.94 |
| Transformer-BiGRU-GlobalAttention | 0.078 | 0.233 | 11.13 | 4.8 | 0.95 |
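
For completeness, the five metrics in the table can be computed as in the plain NumPy sketch below. MAPE is expressed in percent and R² measures explained variance; the arrays shown are placeholders, not the paper's test outputs.

```python
# Plain NumPy sketch of the five reported metrics. The arrays are
# placeholders for the test-set oil temperatures and model predictions.
import numpy as np


def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(mse))
    mape = float(np.mean(np.abs(err / y_true)) * 100)    # percent
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - np.mean(y_true)) ** 2))
    return {"MSE": mse, "MAE": mae, "RMSE": rmse,
            "MAPE": mape, "R2": 1.0 - ss_res / ss_tot}


y_true = np.array([38.7, 37.1, 36.5, 33.6, 31.9])   # e.g., observed OT (°C)
y_pred = np.array([38.2, 37.5, 36.0, 34.1, 31.5])   # e.g., predicted OT (°C)
print(regression_metrics(y_true, y_pred))
```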
| Method | MSE | MAE | RMSE |
|---|---|---|---|
| Transformer | 0.097 | 0.246 | 13.12 |
| BiGRU | 0.093 | 0.241 | 12.87 |
| Transformer+BiGRU | 0.085 | 0.237 | 12.59 |
| Transformer–BiGRU–GlobalAttention | 0.078 | 0.233 | 11.13 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
