Transformer-Based Bearing Fault Classification with VMD-Based Noise Suppression and rCCA-Enhanced Correlation Modeling
Abstract
1. Introduction
- A VMD-based adaptive noise suppression framework is employed to enhance fault-related frequency components in bearing vibration signals, improving signal quality before feature extraction.
- A Transformer-based frequency-domain representation is developed by tokenizing band energy distributions, enabling the model to capture global dependencies and complex spectral patterns directly from vibration data.
- A correlation-aware hybrid feature construction strategy is introduced by integrating analytical spectral descriptors with Transformer-learned deep representations.
- A regularized Canonical Correlation Analysis (rCCA)-based feature fusion mechanism is proposed to model and strengthen the relationship between spectral and Transformer feature spaces, resulting in a more discriminative and compact hybrid feature vector.
- A comprehensive experimental evaluation using multiple classifiers (SVM, Random Forest, and XGBoost) shows that the proposed rCCA-enhanced Transformer framework outperforms standalone spectral or deep representations, reaching up to 98.36% accuracy (XGBoost, 10-fold cross-validation).

The proposed framework provides a robust and generalizable solution for vibration-based bearing fault diagnosis, offering practical potential for industrial condition monitoring and predictive maintenance applications.
2. Related Works
3. Method and Materials
Algorithm 1. VMD-Denoising + Frequency-Token Transformer + rCCA-Based Hybrid Feature Learning

| Step | Operation |
|---|---|
| | Input: …; Transformer params …. Output: … |
| 1. | Segment |
| 2. | VMD: Decompose each window |
| 3. | Compute mode energies |
| 4. | Compute ratios |
| 5. | Select |
| 6. | Reconstruct |
| 7. | Compute DFT |
| 8. | Map frequency bins |
| 9. | |
| 10. | |
| 11. | |
| 12. | |
| 13. | |
| 14. | |
| 15. | |
| 16. | |
| 17. | |
| 18. | rCCA: Regularize |
| 19. | |
| 20. | (e.g., SVM/RF/XGBoost). |
3.1. Denoising via Variational Mode Decomposition (VMD)
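As a rough illustration of the energy-based mode selection in Steps 3 to 6 of Algorithm 1, the sketch below takes modes that have already been produced by VMD, computes each mode's energy and energy ratio, keeps the modes whose ratio passes a threshold, and sums them into a denoised signal. The function name `select_and_reconstruct`, the threshold `ratio_threshold=0.1`, and the toy two-mode input are assumptions made for illustration only; the selection rule used in this study may differ, and the VMD decomposition itself is not shown.

```python
import numpy as np


def select_and_reconstruct(modes, ratio_threshold=0.1):
    """Keep high-energy modes and sum them (hypothetical selection rule).

    modes: array of shape (K, N), one decomposed mode per row.
    ratio_threshold: assumed cut-off on the energy ratio, not from the paper.
    """
    modes = np.asarray(modes, dtype=float)
    energies = np.sum(modes ** 2, axis=1)      # energy of each mode
    ratios = energies / np.sum(energies)       # share of the total energy
    keep = ratios >= ratio_threshold           # boolean mask of retained modes
    denoised = modes[keep].sum(axis=0)         # reconstruction from kept modes
    return denoised, ratios, keep


# Toy stand-in for VMD output: one dominant mode and one weak mode.
t = np.arange(1024) / 1024.0
strong = np.sin(2 * np.pi * 50 * t)
weak = 0.05 * np.sin(2 * np.pi * 300 * t)
denoised, ratios, keep = select_and_reconstruct(np.vstack([strong, weak]))
```

In this toy case only the dominant mode survives the threshold, so the reconstruction equals that mode. A fixed threshold is the simplest choice; an adaptive rule (for example, one derived from the ratio distribution per window) would slot into the same place.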
3.2. Spectral Feature Extraction in the Frequency Domain
3.3. Construction of Frequency Tokens and Token Embedding
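The tokenization of band energy distributions (Steps 7 and 8 of Algorithm 1) can be sketched as follows: take the DFT of a denoised window, group the frequency bins into contiguous bands, and use the normalized energy of each band as one token. The function name `frequency_tokens`, the number of bands, the equal-width band layout, and the normalization to unit sum are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def frequency_tokens(x, n_bands=16):
    """Return one normalized band-energy value per frequency band.

    The band count and the equal-width split are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2            # one-sided power spectrum
    bands = np.array_split(power, n_bands)         # contiguous, near-equal bands
    energy = np.array([band.sum() for band in bands])
    total = energy.sum()
    return energy / total if total > 0 else energy


# Toy window: a single sinusoid that falls exactly on DFT bin 170.
n = np.arange(2048)
window = np.sin(2 * np.pi * 170 * n / 2048)
tokens = frequency_tokens(window, n_bands=16)
band_with_most_energy = int(np.argmax(tokens))
```

Each window therefore yields a short sequence of `n_bands` scalars, which a learned embedding layer would then lift to the model dimension before the encoder.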
3.4. Transformer Encoder
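To make the idea of capturing global dependencies across frequency tokens concrete, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention over a token sequence, followed by mean pooling into one feature vector. It is not the encoder used in this study: the random weights, the single head, the model width `d_model=8`, the linear token embedding, and the omission of positional encoding, residual connections, and feed-forward layers are all simplifying assumptions.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    H: (n_tokens, d_model) embedded tokens. Returns (context, weights).
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])     # pairwise token similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
n_tokens, d_model = 16, 8                      # hypothetical sizes
tokens = rng.random(n_tokens)                  # stand-in band-energy tokens
W_embed = rng.normal(size=(1, d_model))        # stand-in for a learned embedding
H = tokens[:, None] @ W_embed                  # (n_tokens, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
context, attn = self_attention(H, Wq, Wk, Wv)
pooled = context.mean(axis=0)                  # one deep feature vector per window
```

Because every token attends to every other token, a band at one end of the spectrum can directly influence the representation of a band at the other end, which is the property the frequency-token design relies on.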
3.5. Regularized Canonical Correlation Analysis (rCCA)
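One common way to realize regularized CCA is to add a ridge term to each view's covariance matrix, whiten both views, and take the SVD of the whitened cross-covariance. The sketch below follows that textbook formulation and concatenates the projected views into a fused vector. The function names `rcca_fit` and `rcca_transform`, the regularization strength `reg=1e-2`, the number of components, and the synthetic two-view data are assumptions; the exact rCCA variant and fusion rule used in this study may differ.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def rcca_fit(X, Y, reg=1e-2, n_components=2):
    """Fit ridge-regularized CCA between two views (rows are samples)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])   # regularized
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])   # regularized
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    Wx = Kx @ U[:, :n_components]          # projection for view X
    Wy = Ky @ Vt.T[:, :n_components]       # projection for view Y
    return mx, my, Wx, Wy, s[:n_components]


def rcca_transform(X, Y, model):
    """Project both views and concatenate them into one fused vector."""
    mx, my, Wx, Wy, _ = model
    return np.hstack([(X - mx) @ Wx, (Y - my) @ Wy])


# Synthetic views sharing one latent factor (stand-ins for the
# spectral descriptors and the Transformer features).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = z @ np.ones((1, 5)) + 0.1 * rng.normal(size=(200, 5))
Y = z @ np.ones((1, 8)) + 0.1 * rng.normal(size=(200, 8))
model = rcca_fit(X, Y, reg=1e-2, n_components=2)
fused = rcca_transform(X, Y, model)
canonical_corrs = model[4]
```

The ridge term keeps both covariance matrices invertible when features are collinear or when the feature dimension approaches the number of samples. In a classification setting the projections would be fitted on training data only and then applied to held-out data.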
4. Results
4.1. Dataset
4.2. Results Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Randall, R.B.; Antoni, J. Rolling Element Bearing Diagnostics—A Tutorial. Mech. Syst. Signal Process. 2011, 25, 485–520.
2. Tian, Y.; Lu, C.; Wang, Z.L. Approach for Hydraulic Pump Fault Diagnosis Based on WPT-SVD and SVM. Appl. Mech. Mater. 2015, 764, 191–197.
3. Tandon, N.; Choudhury, A. A Review of Vibration and Acoustic Measurement Methods for the Detection of Defects in Rolling Element Bearings. Tribol. Int. 1999, 32, 469–480.
4. Li, Y.; Xu, M.; Wang, R.; Huang, W. A Fault Diagnosis Scheme for Rolling Bearing Based on Local Mean Decomposition and Improved Multiscale Fuzzy Entropy. J. Sound Vib. 2016, 360, 277–299.
5. Shang, Y.; Tang, X.; Zhao, G.; Jiang, P.; Lin, T.R. A Remaining Life Prediction of Rolling Element Bearings Based on a Bidirectional GRU and CNN. Measurement 2022, 202, 111893.
6. Lei, Y.; Li, N.; Guo, L.; Yan, T.; Lin, J. Machinery Health Prognostics: A Systematic Review from Data Acquisition to RUL Prediction. Mech. Syst. Signal Process. 2018, 104, 799–834.
7. Guo, J.; Li, Z.; Li, M. A Review on Prognostics Methods for Engineering Systems. IEEE Trans. Reliab. 2019, 69, 1110–1129.
8. Gao, Z.; Cecati, C.; Ding, S.X. A Survey of Fault Diagnosis and Fault-Tolerant Techniques—Part I: Fault Diagnosis with Model-Based and Signal-Based Approaches. IEEE Trans. Ind. Electron. 2015, 62, 3757–3767.
9. Widodo, A.; Yang, B.-S. Support Vector Machine in Machine Condition Monitoring and Fault Diagnosis. Mech. Syst. Signal Process. 2007, 21, 2560–2574.
10. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
11. Guo, J.; Yang, Y.; Li, H.; Wang, J.; Tang, A.; Shan, D.; Huang, B. A Hybrid Deep Learning Model towards Fault Diagnosis of Drilling Pump. Appl. Energy 2024, 372, 123773.
12. Alonso-González, M.; Díaz, V.G.; Pérez, B.L.; G-Bustelo, B.C.P.; Anzola, J.P. Bearing Fault Diagnosis With Envelope Analysis and Machine Learning Approaches Using CWRU Dataset. IEEE Access 2023, 11, 57796–57805.
13. Zhang, X.; Zhao, B.; Lin, Y. Machine Learning Based Bearing Fault Diagnosis Using the Case Western Reserve University Data: A Review. IEEE Access 2021, 9, 155598–155608.
14. Hendriks, J.; Dumond, P.; Knox, D.A. Towards Better Benchmarking Using the CWRU Bearing Fault Dataset. Mech. Syst. Signal Process. 2022, 169, 108732.
15. Neupane, D.; Seok, J. Bearing Fault Detection and Diagnosis Using Case Western Reserve University Dataset with Deep Learning Approaches: A Review. IEEE Access 2020, 8, 93155–93178.
16. Prabhakar, S.; Mohanty, A.R.; Sekhar, A.S. Application of Discrete Wavelet Transform for Detection of Ball Bearing Race Faults. Tribol. Int. 2002, 35, 793–800.
17. Malhi, A.; Gao, R.X. PCA-Based Feature Selection Scheme for Machine Defect Classification. IEEE Trans. Instrum. Meas. 2004, 53, 1517–1525.
18. Nouri Khajavi, M.; Norouzi Keshtan, M. Intelligent Fault Classification of Rolling Bearings Using Neural Network and Discrete Wavelet Transform. J. Vibroeng. 2014, 16, 761–769.
19. Gupta, P.; Pradhan, M.K. Fault Detection Analysis in Rolling Element Bearing: A Review. Mater. Today Proc. 2017, 4, 2085–2094.
20. Soleimani, A.; Khadem, S.E. Early Fault Detection of Rotating Machinery Through Chaotic Vibration Feature Extraction of Experimental Data Sets. Chaos Solitons Fractals 2015, 78, 61–75.
21. Li, H.; Huang, J.; Ji, S. Bearing Fault Diagnosis with a Feature Fusion Method Based On an Ensemble Convolutional Neural Network and Deep Neural Network. Sensors 2019, 19, 2034.
22. Magar, R.; Ghule, L.; Li, J.; Zhao, Y.; Farimani, A.B. FaultNet: A Deep Convolutional Neural Network for Bearing Fault Classification. IEEE Access 2021, 9, 25189–25199.
23. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep Learning and Its Applications to Machine Health Monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
24. Shao, H.; Jiang, H.; Wang, F.; Wang, Y. Rolling Bearing Fault Diagnosis Using Adaptive Deep Belief Network. ISA Trans. 2017, 69, 187–201.
25. Luo, T.; Qiu, M.; Wu, Z.; Zhao, Z.; Zhang, D. Bearing Fault Diagnosis Based on Multi-Scale Spectral Images and Convolutional Neural Network. arXiv 2025, arXiv:2503.21566.
26. Alam, T.E.; Ahsan, M.M.; Raman, S. Multimodal Bearing Fault Classification under Variable Conditions: A 1D CNN with Transfer Learning. Mach. Learn. Appl. 2025, 21, 100682.
27. Hatipoğlu, A.; Süpürtülü, M.; Yılmaz, E. Enhanced Fault Classification in Bearings: A Multi-Domain Feature Extraction Approach with LSTM-Attention and LASSO. Arab. J. Sci. Eng. 2024, 50, 10795–10812.
28. Sinitsin, V.; Ibryaeva, O.; Sakovskaya, V.; Eremeeva, V. Intelligent Bearing Fault Diagnosis Method Combining Hybrid CNN-MLP Model. Mech. Syst. Signal Process. 2022, 180, 109454.
29. Chen, J.; Lin, C.; Peng, D.; Ge, H. Fault Diagnosis of Rotating Machinery: A Review and Bibliometric Analysis. IEEE Access 2020, 8, 224985–225003.
30. Sahu, D.; Dewangan, R.K.; Matharu, S.P.S. An Investigation of Fault Detection Techniques in Rolling Element Bearing. J. Vib. Eng. Technol. 2024, 12, 5585–5608.
31. Jamil, M.A.; Khanam, S. Influence of One-Way ANOVA and Kruskal–Wallis Based Feature Ranking. J. Vib. Eng. Technol. 2024, 12, 3101–3132.
32. Ali, U.; Ramzan, U.; Ali, W.; Al-Jaafari, K.A. An Improved Fault Diagnosis Strategy For Induction Motors Using Weighted Probability Ensemble Deep Learning. IEEE Access 2025, 13, 106958–106973.
33. Abbasi, M.A.; Huang, S.; Khan, A.S. Fault Detection and Classification of Motor Bearings under Multiple Operating Conditions. ISA Trans. 2025, 156, 61–69.
34. Liu, B.; Yan, C.; Liu, Y.; Wang, Z.; Huang, Y.; Wu, L. Multiscale Residual Antinoise Network via Interpretable Dynamic Recalibration Mechanism for Rolling Bearing Fault Diagnosis With Few Samples. IEEE Sens. J. 2023, 23, 31425–31439.
35. Babiker, A.; Yan, C.; Li, Q.; Meng, J.; Wu, L. Initial Fault Time Estimation of Rolling Element Bearing by Backtracking Strategy, Improved VMD and Infogram. J. Mech. Sci. Technol. 2021, 35, 425–437.
36. Li, F.; Zhang, B.; Verma, S.; Marfurt, K.J. Seismic Signal Denoising Using Thresholded Variational Mode Decomposition. Explor. Geophys. 2017, 49, 450–461.
37. Zhang, L.; Tang, J.; Li, G.; Chen, W. Audio Magnetotelluric Denoising via Variational Mode Decomposition and Adaptive Dictionary Learning. J. Appl. Geophy. 2022, 204, 104748.
38. Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
39. Irfan, M.; Alwadie, A.S.; AlThobiani, F.; Quraishi, K.S.; Jalalah, M.; Abbass, A.; Rahman, S.; Khan, M.K.A.; Alqhtani, S. A Comparison of Machine Learning Methods for the Diagnosis of Motor Faults Using Automated Spectral Feature Extraction Technique. J. Nondestr. Eval. 2022, 41, 31.
40. Wang, K.; Guo, P.; Luo, A.-L. A New Automated Spectral Feature Extraction Method and Its Application in Spectral Classification and Defective Spectra Recovery. Mon. Not. R. Astron. Soc. 2017, 465, 4311–4324.
41. Tian, J.; Morillo, C.; Azarian, M.H.; Pecht, M. Motor Bearing Fault Detection Using Spectral Kurtosis-Based Feature Extraction Coupled with K-Nearest Neighbor Distance Analysis. IEEE Trans. Ind. Electron. 2015, 63, 1793–1803.
42. Li, P.; Lang, Z.; Zhao, L.; Tian, G.; Neasham, J.A.; Zhang, J.; Graham, D.J. System Identification-Based Frequency Domain Feature Extraction for Defect Detection and Characterization. NDT E Int. 2018, 98, 70–79.
43. Al-Fahoum, A.S.; Al-Fraihat, A.A. Methods of EEG Signal Features Extraction Using Linear Analysis in Frequency and Time-frequency Domains. Int. Sch. Res. Not. 2014, 2014, 730218.
44. Ma, B.; Zhang, W.; Jin, Z.; Li, J.; Zhang, P.; Song, X.; Jin, B. Frequency-Aware Token-Filtered Transformer for Fine-Grained Species Recognition. Eng. Sci. 2026, 39, 2040.
45. Irani, H.; De, B.; Metsis, V. WaveFormer: Wavelet Embedding Transformer for Biomedical Signals. arXiv 2026, arXiv:2602.12189.
46. Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in Transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919.
47. Zhang, X.; Liu, Y.; Gong, C.; Nie, Y.; Rodriguez, J. Electric Motor Bearing Fault Noise Detection via Mel-Spectrum-Based Contrastive Self-Supervised Transformer Model. IEEE Trans. Ind. Appl. 2024, 60, 8755–8765.
48. Abdollah, M.A.F.; Scoccia, R.; Aprile, M. Transformer Encoder Based Self-Supervised Learning for HVAC Fault Detection with Unlabeled Data. Build. Environ. 2024, 258, 111568.
49. Li, J.; Bao, Y.; Liu, W.; Ji, P.; Wang, L.; Wang, Z. Twins Transformer: Cross-Attention Based Two-Branch Transformer Network for Rotating Bearing Fault Diagnosis. Measurement 2023, 223, 113687.
50. Fu, Z.; Liu, Z.; Ping, S.; Li, W.; Liu, J. TRA-ACGAN: A Motor Bearing Fault Diagnosis Model Based on an Auxiliary Classifier Generative Adversarial Network and Transformer Network. ISA Trans. 2024, 149, 381–393.
51. Raganato, A.; Tiedemann, J. An Analysis of Encoder Representations in Transformer-Based Machine Translation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 287–297.
52. Huang, J.; Yuan, S.-J.; Li, D.; Li, H. A Kernel Canonical Correlation Analysis Approach for Removing Environmental and Operational Variations for Structural Damage Identification. J. Sound Vib. 2023, 548, 117516.
53. Chen, L.; Wang, K.; Li, M.; Wu, M.; Pedrycz, W.; Hirota, K. K-Means Clustering-Based Kernel Canonical Correlation Analysis for Multimodal Emotion Recognition in Human–Robot Interaction. IEEE Trans. Ind. Electron. 2023, 70, 1016–1024.
54. Zhou, X.; Shen, H. Regularized Canonical Correlation Analysis with Unlabeled Data. J. Zhejiang Univ.-Sci. A 2009, 10, 504–511.
55. Tuzhilina, E.; Tozzi, L.; Hastie, T. Canonical Correlation Analysis in High Dimensions with Structured Regularization. Stat. Model. 2023, 23, 203–227.
56. Case Western Reserve University. Bearing Data Center. Available online: https://engineering.case.edu/bearingdatacenter (accessed on 12 December 2025).






| Study | Main Strategy | Signal/View | Fusion Mechanism | Gap Addressed in This Study |
|---|---|---|---|---|
| Neupane and Seok [15] | Review of deep-learning-based bearing fault diagnosis studies using the CWRU dataset | CWRU vibration data in prior deep-learning studies. | No explicit feature-level fusion; survey of model-centric DL methods. | Motivates a framework that goes beyond stand-alone deep models by explicitly integrating complementary representations rather than only comparing architectures. |
| Hendriks et al. [14] | Benchmarking study for CWRU under a more realistic train/test split, showing flaws in the common setup | CWRU vibration data; original vs. proposed benchmark partitions with independent bearings. | No correlation-aware feature fusion; includes a time-frequency data fusion benchmark variant, but the paper’s central contribution is benchmarking rigor. | Highlights that evaluation protocol and leakage-resistant benchmarking are crucial, not just high reported accuracy; this supports the need for a more principled methodology and fair validation. |
| Zhang et al. [13] | Review of machine-learning-based CWRU fault diagnosis methods, including dataset characteristics, feature selection, and classifiers. | CWRU vibration signals with emphasis on engineered features + ML pipelines. | Mostly direct/model-specific combinations, not an explicit cross-view alignment framework. | Motivates explicit study of complementary handcrafted and learned views, rather than relying only on conventional feature engineering or classifier selection. |
| Alonso-González et al. [12] | Envelope analysis with classical machine-learning classifiers for bearing diagnosis | Frequency-domain/envelope-spectrum features from CWRU vibration data; amplitudes at characteristic fault frequencies. | No multi-view fusion; conventional ML over envelope-derived predictors. | Shows that informative spectral features are useful, but cross-view dependency modeling is limited; this leaves room for integrating spectral descriptors with learned contextual features. |
| This study | VMD + frequency-token Transformer + rCCA | Spectral descriptors + deep contextual tokens | Correlation-aware rCCA alignment | Reduces redundancy and strengthens complementary information across handcrafted spectral and learned token-level views. |

| Class No. | Fault Type | Fault Diameter (in) |
|---|---|---|
| 0 | Normal | — |
| 1 | Inner Race | 0.007 |
| 2 | Inner Race | 0.014 |
| 3 | Inner Race | 0.021 |
| 4 | Ball | 0.007 |
| 5 | Ball | 0.014 |
| 6 | Ball | 0.021 |
| 7 | Outer Race | 0.007 |
| 8 | Outer Race | 0.014 |
| 9 | Outer Race | 0.021 |

| Method | Spectral | Transformer | Hybrid Feature Without rCCA | Hybrid Feature with rCCA | Split | Acc (%) | Prec (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|---|
| SVM | √ | | | | 70–30% | 89.42 | 88.98 | 88.11 | 88.53 |
| SVM | | √ | | | 70–30% | 93.25 | 92.84 | 92.61 | 92.73 |
| SVM | | | √ | | 70–30% | 95.87 | 95.41 | 95.09 | 95.22 |
| SVM | | | | √ | 70–30% | 97.54 | 97.02 | 96.81 | 96.92 |
| SVM | √ | | | | 10-fold | 89.86 | 89.33 | 89.02 | 89.17 |
| SVM | | √ | | | 10-fold | 93.98 | 93.44 | 93.22 | 93.31 |
| SVM | | | √ | | 10-fold | 96.48 | 96.03 | 95.66 | 95.79 |
| SVM | | | | √ | 10-fold | 98.21 | 97.90 | 97.42 | 97.64 |

| Method | Spectral | Transformer | Hybrid Feature Without rCCA | Hybrid Feature with rCCA | Split | Acc (%) | Prec (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|---|
| Random Forest | √ | | | | 70–30% | 83.77 | 83.12 | 82.54 | 82.66 |
| Random Forest | | √ | | | 70–30% | 89.21 | 88.76 | 88.31 | 88.42 |
| Random Forest | | | √ | | 70–30% | 92.94 | 92.21 | 91.88 | 92.01 |
| Random Forest | | | | √ | 70–30% | 95.63 | 95.07 | 94.78 | 94.91 |
| Random Forest | √ | | | | 10-fold | 84.20 | 83.75 | 83.09 | 83.31 |
| Random Forest | | √ | | | 10-fold | 89.98 | 89.51 | 89.20 | 89.34 |
| Random Forest | | | √ | | 10-fold | 93.54 | 92.97 | 92.61 | 92.75 |
| Random Forest | | | | √ | 10-fold | 96.74 | 96.11 | 95.84 | 95.97 |

| Method | Spectral | Transformer | Hybrid Feature Without rCCA | Hybrid Feature with rCCA | Split | Acc (%) | Prec (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost | √ | | | | 70–30% | 85.63 | 85.22 | 84.81 | 84.93 |
| XGBoost | | √ | | | 70–30% | 91.40 | 91.05 | 90.63 | 90.81 |
| XGBoost | | | √ | | 70–30% | 94.88 | 94.31 | 93.94 | 94.12 |
| XGBoost | | | | √ | 70–30% | 97.42 | 96.98 | 96.63 | 96.78 |
| XGBoost | √ | | | | 10-fold | 86.12 | 85.71 | 85.20 | 85.35 |
| XGBoost | | √ | | | 10-fold | 92.04 | 91.68 | 91.22 | 91.36 |
| XGBoost | | | √ | | 10-fold | 95.61 | 95.04 | 94.68 | 94.79 |
| XGBoost | | | | √ | 10-fold | 98.36 | 97.92 | 97.41 | 97.63 |

| Classifier | Spectral Acc (%) | Transformer Acc (%) | Hybrid Feature Without rCCA Acc (%) | Hybrid Feature with rCCA Acc (%) | rCCA Gain (percentage points) |
|---|---|---|---|---|---|
| SVM | 89.86 | 93.98 | 96.48 | 98.21 | +1.73 |
| Random Forest | 84.20 | 89.98 | 93.54 | 96.74 | +3.20 |
| XGBoost | 86.12 | 92.04 | 95.61 | 98.36 | +2.75 |
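
The gain column in the summary table is the difference between the hybrid accuracy with rCCA and the hybrid accuracy without rCCA, both taken from the 10-fold rows of the per-classifier tables. The short check below recomputes it from the reported values; the variable names are illustrative.

```python
# 10-fold accuracies (%) as reported: (hybrid without rCCA, hybrid with rCCA)
reported = {
    "SVM": (96.48, 98.21),
    "Random Forest": (93.54, 96.74),
    "XGBoost": (95.61, 98.36),
}

# Gain in percentage points, rounded to the two decimals used in the table.
gains = {
    name: round(with_rcca - without_rcca, 2)
    for name, (without_rcca, with_rcca) in reported.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```

The recomputed values agree with the table, and the largest improvement from rCCA appears for Random Forest.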

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Koca, T.; Er, M.B.; Çıtlak, A. Transformer-Based Bearing Fault Classification with VMD-Based Noise Suppression and rCCA-Enhanced Correlation Modeling. Machines 2026, 14, 507. https://doi.org/10.3390/machines14050507