P2IFormer: A Multi-Granularity Patch-to-Image Embedding Model for Fault Diagnosis of High-Speed Train Axle-Box Bearings
Abstract
1. Introduction
- (1) A fault diagnosis model for high-speed train axle-box bearings, named P2IFormer, is developed on a patch-to-image embedding framework. Multi-granularity segmentation of the time series is combined with image transformation: patch sequences at each granularity are converted into multi-channel Gramian Angular Field (GAF) images, which strengthens the modeling of local temporal features.
- (2) A granularity-specific image embedding module is designed to generate feature representations for each granularity. Patch images at each granularity pass through feature extraction, pooling, and linear projection, and are then encoded in a uniform format as granularity-specific tokens. These tokens are the input representations for multi-granularity interaction modeling.
- (3) The proposed method is evaluated under constant-speed (20, 40, 60, and 80 Hz) and variable-speed operating conditions. P2IFormer achieves over 99.5% accuracy in every scenario, outperforming the compared CNN- and Transformer-based methods in most settings.
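
The multi-granularity segmentation named in contribution (1) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the patch lengths (16, 32, 64), the input shape, and the zero-padding of the tail are all assumptions made for illustration.

```python
import numpy as np


def split_into_patches(x: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a (T, C) series into non-overlapping patches of shape (N, patch_len, C).

    The tail is zero-padded so the length becomes a multiple of patch_len.
    Zero-padding is an assumption of this sketch.
    """
    T, C = x.shape
    pad = (-T) % patch_len
    if pad:
        x = np.concatenate([x, np.zeros((pad, C), dtype=x.dtype)], axis=0)
    return x.reshape(-1, patch_len, C)


def multi_granularity_patches(x: np.ndarray, patch_lens) -> dict:
    """Return one patch tensor per granularity, keyed by patch length."""
    return {p: split_into_patches(x, p) for p in patch_lens}


# Hypothetical input: 1000 time steps, 3 sensor channels.
signal = np.random.default_rng(0).standard_normal((1000, 3))
patches = multi_granularity_patches(signal, patch_lens=(16, 32, 64))
for p, arr in patches.items():
    print(p, arr.shape)
```

Each granularity yields a different number of patches for the same signal, which is why a separate embedding per granularity is needed before the tokens can interact.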
2. Related Work
2.1. Transformer-Based Approaches for Bearing Fault Diagnosis
2.2. Image-Based Fault Diagnosis Through Time-Series Transformation
2.3. Multi-Scale Feature Extraction for Fault Diagnosis
3. Methodology
3.1. Problem Definition
3.2. Overview of the P2IFormer Architecture
3.3. Multi-Granularity Patch-to-Image Embedding
3.3.1. Multi-Granularity Patch Splitting
3.3.2. Patch-to-Image Embedding
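- Step 1: Normalization. Each channel-wise sequence is rescaled to a bounded interval using min–max scaling.
- Step 2: Polar coordinate mapping. Each normalized data point is mapped to a point in the polar coordinate system, with its angle $\phi_i$ given by the arccosine of the normalized value.
- Step 3: GAF image generation. The Gramian Angular Summation Field (GASF) and Gramian Angular Difference Field (GADF) are defined as $\mathrm{GASF}_{ij} = \cos(\phi_i + \phi_j)$ and $\mathrm{GADF}_{ij} = \sin(\phi_i - \phi_j)$.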
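
The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function name `gaf_images` is hypothetical, the target range of the min–max scaling is assumed to be [-1, 1], and a constant patch is mapped to zeros to avoid division by zero.

```python
import numpy as np


def gaf_images(patch: np.ndarray) -> np.ndarray:
    """Return a (2, P, P) array holding the GASF and GADF of a 1-D patch."""
    x = np.asarray(patch, dtype=np.float64)
    lo, hi = x.min(), x.max()
    # Step 1: min-max scaling (target range [-1, 1] is an assumption).
    if hi == lo:
        x_scaled = np.zeros_like(x)
    else:
        x_scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    x_scaled = np.clip(x_scaled, -1.0, 1.0)  # guard against rounding error
    # Step 2: polar mapping; only the angle is needed to build the images.
    phi = np.arccos(x_scaled)
    # Step 3: pairwise angular sum and difference fields.
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return np.stack([gasf, gadf])


img = gaf_images(np.array([0.0, 1.0, 2.0, 3.0]))
print(img.shape)
```

Stacking GASF and GADF gives a two-channel image per input channel; for a multi-channel patch the function can be applied per channel and the results concatenated along the first axis.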
3.4. Multi-Granularity Self Attention
3.4.1. Intra-Granularity Self-Attention
3.4.2. Inter-Granularity Self-Attention
3.5. Multi-Granularity Feature Aggregation
3.6. Classifier
4. Experiments and Results
4.1. Dataset Description
4.2. Experimental Setup
4.3. Performance Evaluation Under Constant-Speed Conditions
4.4. Performance Evaluation Under the Variable-Speed Condition
4.5. Ablation Study
4.6. Robustness Evaluation Under Noisy Conditions
4.7. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Hyperparameter | Value |
---|---|
Input sequence shape | |
Number of granularities n | 3 |
Patch lengths | |
Embedding dimension d | 128 |
Encoder layers L | 2 |
Attention heads H | 4 |
Dropout rate | |
Learning rate | |
Batch size | 64 |
Fusion vector dimension | 768 |
Loss function | Cross-Entropy |
Optimizer | AdamW |
Weight decay | |
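
The hyperparameters listed above can be collected in a small configuration object. The sketch below is illustrative only: the class name `P2IFormerConfig` and its field names are hypothetical, and the entries whose values are blank in the table are left as `None` rather than guessed. The divisibility check reflects the usual requirement of standard multi-head attention, which is assumed here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class P2IFormerConfig:  # hypothetical name, not from the paper
    # Values taken from the hyperparameter table.
    num_granularities: int = 3
    embed_dim: int = 128
    encoder_layers: int = 2
    attention_heads: int = 4
    batch_size: int = 64
    fusion_dim: int = 768
    loss: str = "cross_entropy"
    optimizer: str = "adamw"
    # Blank in the table; must be supplied by the user.
    input_shape: Optional[Tuple[int, ...]] = None
    patch_lengths: Optional[Tuple[int, ...]] = None
    dropout: Optional[float] = None
    learning_rate: Optional[float] = None
    weight_decay: Optional[float] = None

    def validate(self) -> None:
        """Check basic consistency between the fields."""
        if self.embed_dim % self.attention_heads != 0:
            raise ValueError("embed_dim must be divisible by attention_heads")
        if (self.patch_lengths is not None
                and len(self.patch_lengths) != self.num_granularities):
            raise ValueError("need one patch length per granularity")


cfg = P2IFormerConfig()
cfg.validate()
head_dim = cfg.embed_dim // cfg.attention_heads  # per-head width
print(head_dim)
```
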
Model | 20 Hz Acc (%) | 20 Hz F1 (%) | 40 Hz Acc (%) | 40 Hz F1 (%) | 60 Hz Acc (%) | 60 Hz F1 (%) | 80 Hz Acc (%) | 80 Hz F1 (%) | Params (M) | Time/Epoch (s) |
---|---|---|---|---|---|---|---|---|---|---|
WDCNN | 91.29 | 90.94 | 98.78 | 98.77 | 99.45 | 99.45 | 99.56 | 99.56 | 0.029 | 3 |
CNN-LSTM | 90.99 | 90.95 | 97.51 | 97.50 | 99.27 | 99.27 | 99.67 | 99.67 | 0.093 | 5 |
DenseNet | 97.84 | 97.82 | 99.29 | 99.28 | 99.80 | 99.80 | 99.98 | 99.98 | 6.959 | 81 |
ResNet | 92.93 | 92.90 | 99.39 | 99.38 | 99.28 | 99.28 | 99.40 | 99.40 | 42.510 | 66 |
ViT | 95.12 | 95.10 | 99.12 | 99.11 | 99.42 | 99.42 | 99.38 | 99.38 | 85.801 | 140 |
ECMCTP | 96.38 | 96.36 | 99.41 | 99.40 | 99.52 | 99.52 | 99.96 | 99.96 | 0.781 | 25 |
P2IFormer | 99.76 | 99.76 | 99.61 | 99.61 | 99.89 | 99.89 | 99.90 | 99.90 | 17.143 | 43 |
Model | 20 Hz | 40 Hz | 60 Hz | 80 Hz | Variable Speed |
---|---|---|---|---|---|
P2I-MG (multi) | 99.76 | 99.61 | 99.89 | 99.90 | 99.64 |
P2I-3 (single) | 96.93 | 98.70 | 98.88 | 98.90 | 97.41 |
Drop vs. P2I-MG | 2.83 | 0.91 | 1.01 | 1.00 | 2.23 |
P2I-5 (single) | 96.87 | 98.39 | 99.02 | 99.09 | 97.87 |
Drop vs. P2I-MG | 2.89 | 1.22 | 0.87 | 0.81 | 1.77 |
P2I-7 (single) | 97.48 | 98.25 | 99.04 | 99.18 | 97.23 |
Drop vs. P2I-MG | 2.28 | 1.36 | 0.85 | 0.72 | 2.41 |
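
The unlabeled rows of the ablation table are the accuracy gaps between each single-granularity variant and the multi-granularity model. A short script, using only the accuracies from the table, reproduces that arithmetic; the variable names are illustrative.

```python
conditions = ["20 Hz", "40 Hz", "60 Hz", "80 Hz", "Variable Speed"]

# Accuracies (%) copied from the ablation table.
multi = dict(zip(conditions, [99.76, 99.61, 99.89, 99.90, 99.64]))
single = {
    "P2I-3": dict(zip(conditions, [96.93, 98.70, 98.88, 98.90, 97.41])),
    "P2I-5": dict(zip(conditions, [96.87, 98.39, 99.02, 99.09, 97.87])),
    "P2I-7": dict(zip(conditions, [97.48, 98.25, 99.04, 99.18, 97.23])),
}

# Accuracy lost (percentage points) when only one granularity is used.
drops = {
    name: {c: round(multi[c] - acc, 2) for c, acc in row.items()}
    for name, row in single.items()
}

# Condition with the largest loss for each single-granularity variant.
worst = {name: max(d, key=d.get) for name, d in drops.items()}

for name in single:
    print(name, drops[name], "largest drop at", worst[name])
```

Across the three variants, the loss is 1.77 to 2.89 points at 20 Hz and under variable speed, against 0.72 to 1.36 points at 40 to 80 Hz, so the multi-granularity design matters most in the low-speed and variable-speed cases.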
SNR (dB) | Model | 20 Hz | 40 Hz | 60 Hz | 80 Hz | Variable-Speed |
---|---|---|---|---|---|---|
−6 | WDCNN | 57.46 | 63.75 | 77.19 | 76.44 | 50.35 |
−6 | CNN-LSTM | 60.12 | 65.20 | 80.19 | 79.88 | 51.62 |
−6 | DenseNet | 88.24 | 94.15 | 98.53 | 99.43 | 87.41 |
−6 | ResNet | 86.51 | 94.59 | 94.40 | 98.45 | 86.98 |
−6 | ViT | 77.18 | 90.14 | 91.69 | 92.05 | 76.93 |
−6 | ECMCTP | 78.97 | 92.39 | 98.47 | 99.61 | 79.52 |
−6 | P2IFormer | 90.26 | 97.46 | 98.58 | 97.76 | 90.23 |
−3 | WDCNN | 58.75 | 65.82 | 78.09 | 79.10 | 52.50 |
−3 | CNN-LSTM | 62.38 | 71.19 | 83.27 | 85.97 | 55.23 |
−3 | DenseNet | 90.28 | 97.70 | 98.70 | 98.57 | 88.70 |
−3 | ResNet | 89.02 | 96.35 | 90.29 | 98.39 | 88.79 |
−3 | ViT | 79.35 | 92.93 | 93.47 | 94.02 | 82.60 |
−3 | ECMCTP | 81.98 | 92.85 | 97.87 | 99.69 | 81.10 |
−3 | P2IFormer | 91.51 | 97.26 | 98.80 | 97.73 | 91.91 |
0 | WDCNN | 61.80 | 75.27 | 87.21 | 87.50 | 59.86 |
0 | CNN-LSTM | 65.71 | 78.74 | 88.28 | 91.04 | 61.47 |
0 | DenseNet | 90.89 | 98.57 | 99.10 | 99.21 | 91.77 |
0 | ResNet | 90.29 | 96.19 | 98.03 | 98.75 | 90.45 |
0 | ViT | 82.26 | 94.61 | 94.28 | 96.29 | 84.28 |
0 | ECMCTP | 84.18 | 94.20 | 99.06 | 99.90 | 84.57 |
0 | P2IFormer | 92.64 | 98.78 | 99.10 | 98.60 | 93.71 |
3 | WDCNN | 69.14 | 84.91 | 92.36 | 94.80 | 65.82 |
3 | CNN-LSTM | 70.54 | 86.62 | 91.62 | 95.79 | 65.12 |
3 | DenseNet | 91.81 | 98.78 | 99.32 | 99.36 | 93.27 |
3 | ResNet | 90.79 | 98.06 | 99.02 | 99.09 | 92.90 |
3 | ViT | 83.48 | 95.65 | 97.00 | 97.07 | 85.94 |
3 | ECMCTP | 85.08 | 96.07 | 98.99 | 99.93 | 85.60 |
3 | P2IFormer | 93.91 | 99.06 | 99.10 | 99.15 | 95.32 |
6 | WDCNN | 75.22 | 87.05 | 96.56 | 97.93 | 72.34 |
6 | CNN-LSTM | 76.91 | 91.17 | 96.77 | 98.72 | 73.70 |
6 | DenseNet | 94.07 | 99.10 | 99.84 | 99.65 | 94.20 |
6 | ResNet | 91.73 | 99.30 | 99.24 | 99.25 | 93.16 |
6 | ViT | 84.70 | 96.87 | 97.88 | 97.95 | 87.88 |
6 | ECMCTP | 89.41 | 97.18 | 99.27 | 99.95 | 88.81 |
6 | P2IFormer | 95.51 | 99.28 | 99.32 | 99.23 | 95.48 |
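
The SNR levels in the table correspond to a ratio of signal power to noise power expressed in decibels. A common way to build such test signals is to add white Gaussian noise scaled to the target SNR; the sketch below assumes that noise model, which this section does not confirm, and uses a hypothetical 50 Hz sine wave in place of a real vibration signal.

```python
import numpy as np


def add_noise_at_snr(signal: np.ndarray, snr_db: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise so that the result has the requested SNR (dB)."""
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(signal.shape) * np.sqrt(noise_power)
    return signal + noise


def measured_snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Empirical SNR of a noisy signal relative to its clean version."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20000, endpoint=False)
clean = np.sin(2.0 * np.pi * 50.0 * t)  # stand-in for a vibration signal

noisy = {snr: add_noise_at_snr(clean, snr, rng) for snr in (-6, -3, 0, 3, 6)}
for snr, sig in noisy.items():
    print(snr, round(measured_snr_db(clean, sig), 2))
```

At −6 dB the noise carries roughly four times the power of the signal, which gives a sense of how severe the hardest setting in the table is.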
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ma, W.; Zhang, C.; Chen, L.; Wang, Z.; Fan, X.; Cui, Y. P2IFormer: A Multi-Granularity Patch-to-Image Embedding Model for Fault Diagnosis of High-Speed Train Axle-Box Bearings. Sensors 2025, 25, 5138. https://doi.org/10.3390/s25165138