Proceeding Paper

Research on Bearing Fault Diagnosis Method Based on Multi-Scale Convolution and Attention Mechanism in Strong Noise Environment †

School of Mechanical and Electrical Engineering, Wenzhou University, Wenzhou 325035, China
* Author to whom correspondence should be addressed.
Presented at the 5th International Conference on Advances in Mechanical Engineering (ICAME-25), Islamabad, Pakistan, 26 August 2025.
Eng. Proc. 2025, 111(1), 14; https://doi.org/10.3390/engproc2025111014
Published: 17 October 2025

Abstract

To address the poor diagnostic performance of conventional approaches to rolling bearing fault diagnosis under complex operating conditions, particularly under intense noise contamination, this study introduces a fault diagnosis framework that combines a multi-scale convolutional neural network (MSCNN), a gated recurrent unit (GRU), and attention mechanisms. The workflow begins by preprocessing vibration signals from the Case Western Reserve University (CWRU) bearing dataset, after which the MSCNN extracts multi-scale fault features. A GRU is then integrated to capture temporal dependencies in the signals. The base model is further extended with either Transformer layers or a convolutional block attention module (CBAM). Experiments compare the diagnostic performance of three configurations (MSCNN+GRU, MSCNN+GRU+Transformer, and MSCNN+GRU+CBAM) under varying noise conditions. The results show that the MSCNN+GRU+CBAM model exhibits the highest fault detection accuracy and the best generalization across different load and rotational speed settings, confirming its advantage in rolling bearing fault diagnosis.

1. Introduction

Rolling bearings are key components of rotating machinery and have a critical impact on the safety, reliability, and operational efficiency of mechanical systems. Conventional fault diagnosis approaches for rolling bearings rely primarily on vibration signal processing in the time domain, the frequency domain, and the time-frequency domain. Techniques such as the fast Fourier transform (FFT) and empirical mode decomposition (EMD) have been widely applied in bearing fault detection [1]. However, these traditional methods depend heavily on expert knowledge and manual feature extraction, which limits them under strong background noise and complex operating conditions and often degrades diagnostic performance [2].
In recent years, driven by rapid progress in deep learning, data-driven fault diagnosis has become a major research direction in mechanical equipment fault detection. Unlike traditional methods, deep learning extracts fault-related features automatically, which improves diagnostic accuracy and robustness [3]. Since LeCun introduced the LeNet-5 model, classical convolutional neural network (CNN) architectures such as AlexNet, VGGNet, Inception, and ResNet have been developed and applied successfully to image processing and signal analysis [4].
In bearing fault diagnosis, researchers have investigated the application of CNNs. Li et al. developed a CNN model utilizing Continuous Wavelet Transform (CWT) for classifying bearing faults with 2D time-frequency images, demonstrating good diagnostic performance [5]. However, while these methods enhance fault classification accuracy, transforming 1D vibration signals into 2D images not only elevates computational complexity but also unavoidably introduces information degradation during the signal conversion process. To avoid these issues, one-dimensional convolutional neural networks (1D-CNN) have gained widespread attention. 1D-CNNs directly extract features from one-dimensional vibration signals, effectively preserving the original characteristics of the signal while reducing computational load. Zhao et al. proposed a 1D-CNN-based bearing fault diagnosis model, confirming its effectiveness in diagnostic tasks [6].
Considering that bearing vibration signals inherently exhibit temporal characteristics, recurrent neural networks (RNNs) have also been gradually applied in bearing fault diagnosis. In particular, gated recurrent units (GRU), with their high training efficiency and strong ability to capture long-term dependencies in sequential data, have been increasingly used for learning sequential features of vibration signals. Chen et al. developed an integrated CNN-GRU bearing fault diagnosis model, achieving high classification accuracy in complex noisy environments, thus confirming the feasibility of combining GRU with CNN [7].
Meanwhile, attention mechanisms have been introduced to sharpen the model’s focus on fault-sensitive features. The Transformer self-attention mechanism captures global features effectively but is computationally expensive. In contrast, the Convolutional Block Attention Module (CBAM), which computes attention jointly over the channel and spatial dimensions, is lightweight and noise-resistant, making it suitable for real-time diagnosis [8]. Zhang et al. developed a CNN-based bearing fault diagnosis network integrated with CBAM, confirming its ability to focus on fault signals [9].
This research proposes a bearing fault diagnosis framework leveraging an MSCNN (multi-scale convolutional neural network) integrated with GRU and the CBAM (convolutional block attention module) attention mechanism. The proposed approach utilizes multi-scale convolutions to extract rich feature information, employs GRU to effectively capture temporal features, and further focuses on key fault features in the signals through the CBAM. The method is designed to better adapt to bearing fault diagnosis tasks under strong noise environments and complex operating conditions, achieving high diagnostic accuracy and strong generalization capability.

2. Problem Description

Fault diagnosis of rolling bearings faces several challenges, especially under strong noise and complex operating conditions. Conventional signal processing techniques, including the Fourier transform and empirical mode decomposition (EMD), are effective in simpler environments. However, when vibration signals are corrupted by noise or the operating conditions change, diagnostic accuracy often suffers. These methods also rely predominantly on manual feature extraction and therefore lack automation and adaptability. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) can extract discriminative features from data without human intervention and thereby classify faults effectively. However, they typically learn features at a single scale, which makes it difficult to handle multi-scale signal features simultaneously, and their stability and robustness remain limited under high noise levels.
To tackle these challenges, this research introduces an innovative fault diagnosis approach for rolling bearings, integrating MSCNN, GRU, and CBAM. The method leverages MSCNN to effectively extract multi-scale local features and global trend characteristics, uses GRU to model temporal dependencies in the signal, and incorporates CBAM to enhance the focus on key features, thus improving diagnostic accuracy and robustness in complex environments.
The goal of this study is to overcome the limitations of traditional methods in complex noise environments and multi-scale feature extraction, by proposing a fault diagnosis framework that better adapts to complex operating conditions and noisy environments, providing more accurate and stable solutions for industrial applications.

3. Bearing Fault Diagnosis Approach Utilizing MSCNN Integrated with GRU and Attention Mechanism

By integrating a multi-scale convolutional neural network (MSCNN), a gated recurrent unit (GRU), and a convolutional block attention module (CBAM), we have developed a comprehensive diagnostic framework. This framework encompasses stages including data preprocessing, feature derivation, network structure design, and diagnostic result generation, as shown in Figure 1.

3.1. Multi-Scale Convolution Feature Extraction Module

Rolling bearing vibration signals exhibit pronounced nonlinearity and non-stationarity under actual operating conditions, and features at different scales influence diagnostic accuracy differently; consequently, traditional single-scale CNN architectures cannot capture the full range of bearing vibration characteristics. In this study, the proposed MSCNN module comprises two parallel convolutional branches, each using kernels and strides of different scales to extract multi-scale features simultaneously. The first branch uses larger kernels (kernel sizes of 20 and 10) with a stride of 2, extracting low-frequency global features that characterize the overall fault trend. The second branch uses a smaller kernel (kernel size of 6) with strides of 1 and 2, which captures high-frequency local features and fine fault details. Algorithm 1 lists the layer settings of the first branch.
Finally, the outputs of both branches are pooled and then fused via element-wise multiplication, which strengthens the network’s ability to model inter-scale feature correlations and yields more discriminative fault features.
Algorithm 1. Multi-Scale Convolutional Neural Network Channel Design
import torch.nn as nn


class AdvancedMSCNN(nn.Module):
    def __init__(self):
        super(AdvancedMSCNN, self).__init__()
        # The first convolutional path
        self.layer1 = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=50, kernel_size=20, stride=2),
            nn.BatchNorm1d(50),
            nn.ReLU(inplace=True),
            nn.Conv1d(in_channels=50, out_channels=30, kernel_size=10, stride=2),
            nn.BatchNorm1d(30),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=2)
        )
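
As a rough check on the listing above, the sequence length after each layer of the first path can be computed with the standard output-length formula for unpadded 1D convolution and pooling, floor((L − k) / s) + 1. The sketch below assumes the 2000-point input samples described in Section 4.1, no padding (the default for nn.Conv1d), and a pooling stride equal to the pooling kernel size (the default for nn.MaxPool1d). The helper names `out_len` and `first_path_lengths` are introduced here for illustration only.

```python
def out_len(length: int, kernel_size: int, stride: int) -> int:
    """Output length of an unpadded 1D convolution or pooling layer."""
    return (length - kernel_size) // stride + 1


def first_path_lengths(input_len: int = 2000) -> list:
    """Sequence length after each layer of the first convolutional path."""
    # (kernel_size, stride) for Conv1d(k=20), Conv1d(k=10), MaxPool1d(k=2).
    # The pooling stride of 2 is an assumption (PyTorch default = kernel size).
    layers = [(20, 2), (10, 2), (2, 2)]
    lengths = []
    current = input_len
    for kernel_size, stride in layers:
        current = out_len(current, kernel_size, stride)
        lengths.append(current)
    return lengths


print(first_path_lengths())  # [991, 491, 245]
```

Under these assumptions the first path would map a single-channel 2000-point sample to a feature map with 30 channels and 245 time steps.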
        

3.2. Gated Recurrent Unit Module

The GRU architecture, utilizing an update gate and a reset gate mechanism, modulates information flow dynamics, enabling the model to capture both long-term and short-term temporal dependencies while strengthening the robustness of feature extraction. The update gate determines the proportion of historical information to retain in the current state, ensuring effective propagation of long-range dependencies, while the reset gate controls the degree to which past information influences the current computation, allowing the model to adaptively adjust the utilization of short-term information. The mathematical formulation of the GRU is given by:
$$R_t = \sigma\left(W_r X_t + U_r H_{t-1} + b_r\right),$$
$$Z_t = \sigma\left(W_z X_t + U_z H_{t-1} + b_z\right).$$
Here, $R_t$ and $Z_t$ are the reset gate and the update gate, $W_r$ and $W_z$ are the weight matrices applied to the input $X_t$, $U_r$ and $U_z$ are the weight matrices applied to the hidden state $H_{t-1}$ of the previous time step, $b_r$ and $b_z$ are the bias terms, and $\sigma(\cdot)$ is the sigmoid activation function. In the computation, the weighted input forms a matrix of shape $(N, d_h)$. The previous hidden state $H_{t-1}$ is transformed into a matrix of the same shape, and the two are summed together with the bias term. Since $b_r$ and $b_z$ are vectors, they are broadcast to the matching matrix shape during the addition.
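
The gate equations above cover only the first half of a GRU step. As a minimal sketch, the full step can be illustrated with scalar inputs and a one-dimensional hidden state. The candidate state and the final update below follow the standard GRU formulation, in which the update gate weights the previous state; they are not given in the text, so the parameter names `w_h`, `u_h`, and `b_h` are assumptions introduced for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x_t: float, h_prev: float, p: dict) -> tuple:
    """One scalar GRU step; returns (reset gate, update gate, new state)."""
    # Reset and update gates, as in the two equations above.
    r_t = sigmoid(p["w_r"] * x_t + p["u_r"] * h_prev + p["b_r"])
    z_t = sigmoid(p["w_z"] * x_t + p["u_z"] * h_prev + p["b_z"])
    # Candidate state (standard GRU form, assumed here): the reset gate
    # scales how much of the previous state enters the candidate.
    h_tilde = math.tanh(p["w_h"] * x_t + p["u_h"] * (r_t * h_prev) + p["b_h"])
    # The update gate decides how much of the previous state is retained.
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde
    return r_t, z_t, h_t


names = ("w_r", "u_r", "b_r", "w_z", "u_z", "b_z", "w_h", "u_h", "b_h")
params = {name: 0.0 for name in names}
print(gru_step(x_t=1.0, h_prev=0.8, p=params))  # (0.5, 0.5, 0.4)
```

With all parameters set to zero both gates equal 0.5 and the candidate state is zero, so half of the previous state is carried over. A large positive `b_z` would push the update gate toward 1 and keep the previous state almost unchanged.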

3.3. Transformer Attention Module

The attention mechanism, originally developed for natural language processing (NLP) tasks, has been increasingly applied to mechanical fault diagnosis in recent years. Central to the Transformer architecture is its self-attention mechanism, which calculates pairwise correlations among elements in the input sequence and dynamically assigns weights to features, thereby capturing long-range dependencies. The Transformer module primarily comprises a multi-head self-attention architecture, which performs attention in parallel across distinct feature subspaces and improves the extraction of fault-relevant signal characteristics. In this study, the input to the Transformer module is the temporal feature sequence generated by the GRU; by applying self-attention, the module acquires global contextual information and strengthens the model’s emphasis on critical signal features.
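
The pairwise weighting described above can be sketched as single-head scaled dot-product self-attention. This is a simplified illustration, not the module used in the experiments: it assumes identity projections (queries, keys, and values all equal the input sequence), whereas a Transformer layer learns separate projections and uses several heads. The function names and the toy sequence are hypothetical.

```python
import math


def softmax(scores: list) -> list:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: list) -> tuple:
    """Scaled dot-product self-attention with Q = K = V = seq (assumption)."""
    dim = len(seq[0])
    scale = math.sqrt(dim)
    weights = []
    for query in seq:
        # Pairwise correlation of this time step with every time step.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in seq]
        weights.append(softmax(scores))
    # Each output step is a weighted sum of all value vectors.
    output = [
        [sum(w * value[j] for w, value in zip(row, seq)) for j in range(dim)]
        for row in weights
    ]
    return weights, output


# Toy sequence of three time steps with two features each.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn_weights, attended = self_attention(sequence)
for row in attn_weights:
    print([round(w, 3) for w in row])
```

Each row of the weight matrix sums to one, so every output step is a convex combination of all time steps, which is how the module can draw on distant parts of the GRU feature sequence.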

3.4. Convolutional Block Attention Module

Although the Transformer module captures features well, its high computational complexity and limited real-time performance restrict its practical deployment in industrial scenarios. To address these limitations while maintaining robust feature extraction, this study incorporates the lightweight Convolutional Block Attention Module (CBAM) after the GRU layer. CBAM comprises two sequentially cascaded submodules, channel attention and spatial attention, that jointly refine fault features by focusing on critical channels and spatial regions. Specifically, the channel attention submodule employs global average and max pooling to aggregate channel-wise information and generates weights that amplify informative feature channels; the spatial attention submodule then pools the channel-refined features across channels and computes a spatial attention map that highlights salient regions within the feature map. This dual mechanism enables CBAM to suppress noise-dominated channels and regions while enhancing fault-relevant details, which matches the need for efficient yet precise feature selection in bearing fault diagnosis.
Because of its linear computational complexity and minimal parameter count, CBAM adds little overhead, and it is placed after the GRU layer rather than at the front end of the network. This placement lets the GRU first extract sequential fault patterns (encoding both short-term and long-term characteristics), so that CBAM refines these learned features rather than processing raw noisy signals. CBAM thus focuses its attention on fault signatures already identified by the GRU, further improving noise robustness and diagnostic accuracy.
To validate the contribution of CBAM to the proposed framework, ablation studies compare three configurations: MSCNN+GRU, MSCNN+GRU+Transformer, and MSCNN+GRU+CBAM. As reported in Table 1, under −4 dB noise the CBAM variant improves accuracy by 2.48 percentage points over MSCNN+GRU alone (97.63% versus 95.15%), whereas the Transformer variant improves it by 1.79 percentage points (96.94%), so the CBAM variant outperforms the Transformer variant by 0.69 percentage points in the same condition.
The detailed architecture of CBAM is illustrated in Figure 2, where the channel and spatial attention submodules are cascaded to form a hierarchical feature refinement pipeline. Together, these components enable the model to dynamically allocate weights to both channel-wise and spatially critical information, ultimately enhancing its responsiveness to essential fault characteristics in complex noise environments.
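
The channel-then-spatial refinement described in this section can be sketched for a one-dimensional feature map (a list of channels, each a list of time steps). This is an illustrative adaptation of CBAM [8] to 1D features using plain Python lists; the weights `w1`, `w2`, `k_avg`, and `k_max` are hypothetical fixed values standing in for learned parameters, and the kernel length of 3 is chosen only to keep the example small.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat: list, w1: list, w2: list) -> list:
    """One weight per channel from average- and max-pooled statistics."""
    avg = [sum(ch) / len(ch) for ch in feat]
    mx = [max(ch) for ch in feat]

    def shared_mlp(v: list) -> list:
        # Two-layer MLP with ReLU, shared by both pooled descriptors.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg), shared_mlp(mx))]


def spatial_attention(feat: list, kernel_avg: list, kernel_max: list) -> list:
    """One weight per time step from channel-wise average and max maps."""
    n_ch, length = len(feat), len(feat[0])
    avg = [sum(ch[i] for ch in feat) / n_ch for i in range(length)]
    mx = [max(ch[i] for ch in feat) for i in range(length)]
    pad = len(kernel_avg) // 2
    weights = []
    for i in range(length):
        s = 0.0
        for j in range(len(kernel_avg)):
            idx = i + j - pad
            if 0 <= idx < length:  # zero padding outside the sequence
                s += kernel_avg[j] * avg[idx] + kernel_max[j] * mx[idx]
        weights.append(sigmoid(s))
    return weights


def cbam_1d(feat, w1, w2, kernel_avg, kernel_max):
    """Apply channel attention first, then spatial attention."""
    ch_w = channel_attention(feat, w1, w2)
    refined = [[ch_w[c] * v for v in ch] for c, ch in enumerate(feat)]
    sp_w = spatial_attention(refined, kernel_avg, kernel_max)
    return [[sp_w[i] * v for i, v in enumerate(ch)] for ch in refined]


# Toy feature map: 2 channels x 4 time steps (hypothetical values).
features = [[0.1, 0.9, 0.2, 0.1], [0.3, 0.2, 0.4, 0.3]]
w1 = [[0.5, 0.5]]      # hidden layer: 1 unit x 2 channels
w2 = [[1.0], [-1.0]]   # output layer: 2 channels x 1 unit
k_avg = [0.2, 0.5, 0.2]
k_max = [0.1, 0.3, 0.1]
out = cbam_1d(features, w1, w2, k_avg, k_max)
print(len(out), len(out[0]))  # 2 4
```

Both attention maps pass through a sigmoid, so every output value is a scaled-down copy of the corresponding input and the feature map keeps its shape, which is what allows the module to be dropped in after the GRU without changing downstream layers.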

4. Experimental Validation

4.1. Rolling Bearing Dataset Construction and Noise Environment Simulation

This research uses the openly accessible rolling bearing vibration dataset provided by Case Western Reserve University (CWRU) for experimental testing. SKF 6205 bearings were tested under a steady load, with vibration signals collected at a sampling frequency of 12 kHz. The experiments covered the normal operating state, inner raceway faults, outer raceway faults, and rolling element faults; each fault type was introduced at defect sizes of 0.007, 0.014, and 0.021 inches, fabricated via electrical discharge machining (EDM). To simulate high-noise industrial environments, synthetic Gaussian white noise was added to the raw vibration signals at signal-to-noise ratios (SNRs) of −4 dB (high noise), 0 dB (moderate noise), and 4 dB (low noise). Each sample was fixed at a length of 2000 data points.
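
The noise injection step can be sketched as follows, using the usual definition SNR(dB) = 10 log10(P_signal / P_noise), so that the noise variance for a target SNR is P_signal / 10^(SNR/10). The sketch makes several assumptions that the text does not state: a synthetic sine wave stands in for a vibration record, noise is added before segmentation, and the 2000-point samples are cut as non-overlapping windows. All function names are hypothetical.

```python
import math
import random


def add_white_noise(signal: list, snr_db: float, rng: random.Random) -> list:
    """Add zero-mean Gaussian white noise at approximately the target SNR."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean: list, noisy: list) -> float:
    """Empirical SNR in dB computed from the clean and noisy signals."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def segment(signal: list, length: int = 2000) -> list:
    """Split into non-overlapping windows, dropping any incomplete tail."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, length)]


rng = random.Random(0)  # fixed seed for reproducibility
clean = [math.sin(2 * math.pi * 0.01 * i) for i in range(20000)]
for target in (-4, 0, 4):
    noisy = add_white_noise(clean, target, rng)
    samples = segment(noisy)
    print(target, round(measured_snr_db(clean, noisy), 2), len(samples))
```

The measured SNR differs slightly from the target because the noise power is estimated from a finite number of random draws; the deviation shrinks as the record gets longer.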

4.2. Comprehensive Performance Validation

4.2.1. Ablation Studies on Model Components

To validate the effectiveness of the CBAM and Transformer modules, we conducted ablation experiments under three noise conditions (−4 dB, 0 dB, and 4 dB). Three model variants were compared: the baseline MSCNN+GRU, MSCNN+GRU+Transformer, and the full model MSCNN+GRU+CBAM. As illustrated in Figure 3, the full model outperformed the other architectures at all three noise levels. Notably, under −4 dB noise, the CBAM-integrated model achieved an accuracy 2.48 percentage points higher than the baseline, demonstrating its role in noise suppression. This improvement is attributed to CBAM’s dual mechanisms: channel attention suppresses noise-dominated feature channels, while spatial attention highlights localized fault patterns. In contrast, the Transformer variant gained only 1.79 percentage points over the baseline at −4 dB, which supports the advantage of CBAM for feature refinement. Collectively, these findings highlight CBAM’s ability to enhance diagnostic robustness without compromising computational efficiency.

4.2.2. Diagnostic Performance Variation with Signal-to-Noise Ratio Under Varying Noise Conditions

Additionally, the signal-to-noise ratio (SNR) was varied from −10 dB to 10 dB to evaluate how the diagnostic accuracy of the proposed framework changes with noise level, as shown in Table 1. The MSCNN+GRU+CBAM model matches or exceeds the other two configurations across the full SNR range. As the SNR increases from −10 dB to 4 dB, the accuracy of the proposed framework rises sharply, from 90.35% to 99.99%, and then stabilizes at 100% for 8 dB and 10 dB, reflecting its noise resistance and operational reliability.
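
The accuracy gains can be recomputed directly from the values reported in Table 1. The short script below does only this arithmetic; the dictionary layout and the function name `gains` are introduced here for illustration.

```python
# Accuracy (%) from Table 1, keyed by SNR in dB:
# (MSCNN+GRU, MSCNN+GRU+Transformer, MSCNN+GRU+CBAM)
accuracy = {
    -10: (84.73, 87.91, 90.35),
    -8: (87.88, 89.99, 92.77),
    -4: (95.15, 96.94, 97.63),
    0: (99.19, 99.31, 99.72),
    4: (99.33, 99.42, 99.99),
    8: (99.93, 100.0, 100.0),
    10: (100.0, 100.0, 100.0),
}


def gains(snr_db: int) -> dict:
    """Accuracy differences in percentage points at one SNR level."""
    base, transformer, cbam = accuracy[snr_db]
    return {
        "cbam_vs_base": round(cbam - base, 2),
        "transformer_vs_base": round(transformer - base, 2),
        "cbam_vs_transformer": round(cbam - transformer, 2),
    }


for snr_db in sorted(accuracy):
    print(snr_db, gains(snr_db))
```

At −4 dB this gives 2.48 points for CBAM over the baseline, 1.79 points for the Transformer over the baseline, and 0.69 points for CBAM over the Transformer. The gain of CBAM over the baseline shrinks from 5.62 points at −10 dB to zero at 10 dB, where all three models reach 100%.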

5. Conclusions

Overall, the experimental results demonstrate that the proposed MSCNN+GRU+CBAM fault diagnosis approach offers significant advantages in the context of rolling bearing fault detection. In particular, under severe noise interference, the method achieves superior accuracy and stability. Furthermore, the CBAM attention mechanism markedly enhances the model’s sensitivity to both channel-wise and spatial features, while the GRU module substantially improves the extraction of temporal dependencies within vibration signal sequences. These strengths confer notable theoretical significance and practical applicability to the proposed framework in the field of bearing fault diagnosis.

Author Contributions

Conceptualization, Y.Z. and J.W. and W.S.; writing—review, J.W.; Data curation, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Basic Scientific Research Project of Wenzhou City (G2023066) and the NSFC (12202318).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Randall, R.B.; Antoni, J. Rolling element bearing diagnostics—A tutorial. Mech. Syst. Signal Process. 2011, 25, 485–520.
  2. Liu, R.; Yang, B.; Zio, E.; Chen, X. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mech. Syst. Signal Process. 2018, 108, 33–47.
  3. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
  4. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
  5. Li, C.X.; Yin, X.Q.; Chen, J.X.; Yang, H.; Hong, L. Bearing Fault Diagnosis Based on Wavelet Transform and Convolutional Neural Network. Open Access Libr. J. 2022, 9, 1–14.
  6. Zhao, X.; Jia, M.; Liu, Z. Semi-supervised graph convolution deep belief network for fault diagnosis of electromechanical systems. IEEE Trans. Ind. Inform. 2021, 17, 5657–5667.
  7. Chen, Z.; Li, W. Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network. IEEE Trans. Instrum. Meas. 2017, 66, 1693–1702.
  8. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  9. Zhang, Q.; Wei, X.; Wang, Y.; Hou, C. Convolutional Neural Network with Attention Mechanism and Visual Vibration Signal Analysis for Bearing Fault Diagnosis. Sensors 2024, 24, 1831.
Figure 1. Diagnostic Framework Based on MSCNN–GRU with Integrated Attention Mechanism.
Figure 2. Convolutional Block Attention Module Diagram.
Figure 3. Comparison chart of confusion matrices for the three models under different signal-to-noise ratios.
Table 1. Comparative Performance of Attention-Enhanced Models for Bearing Fault Diagnosis Under Multi-Noise Conditions.
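| Noise level and metric | MSCNN+GRU | MSCNN+GRU+Transformer | MSCNN+GRU+CBAM |
|---|---|---|---|
| −10 dB Accuracy | 84.73% | 87.91% | 90.35% |
| −10 dB AUC-ROC | 0.89 | 0.91 | 0.94 |
| −8 dB Accuracy | 87.88% | 89.99% | 92.77% |
| −4 dB Accuracy | 95.15% | 96.94% | 97.63% |
| −4 dB AUC-ROC | 0.92 | 0.94 | 0.97 |
| 0 dB Accuracy | 99.19% | 99.31% | 99.72% |
| 4 dB Accuracy | 99.33% | 99.42% | 99.99% |
| 4 dB AUC-ROC | 0.97 | 0.98 | 0.99 |
| 8 dB Accuracy | 99.93% | 100% | 100% |
| 10 dB Accuracy | 100% | 100% | 100% |
| 10 dB AUC-ROC | 1.00 | 1.00 | 1.00 |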

Share and Cite

MDPI and ACS Style

Zhang, Y.; Wang, J.; Sun, W. Research on Bearing Fault Diagnosis Method Based on Multi-Scale Convolution and Attention Mechanism in Strong Noise Environment. Eng. Proc. 2025, 111, 14. https://doi.org/10.3390/engproc2025111014
