Fault Diagnosis of Rolling Bearing Acoustic Signal Under Strong Noise Based on WAA-FMD and LGAF-Swin Transformer

Wang, Hengdi; Wang, Haokui; Xie, Jizhan; Ma, Zikui

doi:10.3390/pr13092742

¹ School of Mechanical and Electrical Engineering, Henan University of Science and Technology, Luoyang 471023, China

² Schaeffler Trading (Shanghai) Co., Ltd., Shanghai 201804, China

* Author to whom correspondence should be addressed.

Processes 2025, 13(9), 2742; https://doi.org/10.3390/pr13092742

Submission received: 3 April 2025 / Revised: 23 May 2025 / Accepted: 12 June 2025 / Published: 27 August 2025

(This article belongs to the Section Process Control and Monitoring)

Abstract

To address the challenges of low diagnostic accuracy arising from the non-stationary and nonlinear time-varying characteristics of acoustic signals in rolling bearing fault diagnosis, as well as their susceptibility to noise interference, this paper proposes a fault diagnosis method based on a Weighted Average Algorithm–Feature Mode Decomposition (WAA-FMD) and a Local–Global Adaptive Multi-scale Attention Mechanism (LGAF)–Swin Transformer. First, the WAA is utilized to optimize the key parameters of FMD, thereby enhancing its signal decomposition performance while minimizing noise interference. Next, a bilateral expansion strategy is implemented to extend both the time window and frequency band of the signal, which improves the temporal locality and frequency globality of the time–frequency diagram, significantly enhancing the ability to capture signal features. Finally, depthwise separable convolution is introduced to optimize the receptive field and improve the computational efficiency of shallow networks. When combined with the Swin Transformer, which incorporates LGAF and adaptive feature selection modules, the model further enhances its perceptual capabilities and feature extraction accuracy through dynamic kernel adjustment and deep feature aggregation strategies. The experimental results indicate that the signal denoising performance of WAA-FMD significantly outperforms traditional denoising techniques. On the KAIST dataset (NSK 6205: inner raceway fault and outer raceway fault) and the experimental dataset (FAG 30205: inner raceway fault, outer raceway fault, and rolling element fault), the accuracies of the proposed model reach 100% and 98.62%, respectively, both exceeding those of other deep learning models. In summary, the proposed method demonstrates substantial advantages in noise reduction performance and fault diagnosis accuracy, providing valuable theoretical insights for practical applications.

Keywords: acoustic signal; feature mode decomposition; Swin Transformer; fault diagnosis; rolling bearing

1. Introduction

Rolling bearings are essential components in mechanical systems [1], and their failures account for a significant proportion of faults in rotating machinery. When a failure occurs, it can lead to abnormal vibrations, jeopardizing the safe operation of the equipment. Therefore, it is critical to extract fault signals accurately and promptly. While vibration signal analysis is widely used [2], contact sensors face limitations in harsh environments such as high temperatures and corrosive conditions. In contrast, sound signal analysis offers advantages such as non-contact measurement and remote monitoring, making it suitable for bearing monitoring in complex environments [3].

Traditional signal decomposition methods primarily include empirical mode decomposition (EMD) [4], ensemble empirical mode decomposition (EEMD) [5], complementary ensemble empirical mode decomposition (CEEMD) [6], and variational mode decomposition (VMD) [7], all of which have been widely applied in signal denoising. Miao et al. [8] proposed feature mode decomposition (FMD), a novel adaptive decomposition method specifically designed for mechanical fault feature extraction. Unlike VMD and wavelet transform, which impose constraints on filter type and bandwidth, FMD utilizes correlation kurtosis (CK) to adaptively update FIR filter banks, thereby enabling the extraction of more fault information. This advantage makes FMD particularly effective in mechanical signal decomposition, especially in scenarios where prior knowledge of fault periodicity is lacking.

Traditional bearing fault diagnosis methods typically rely on feature extraction in the time or frequency domain. However, these approaches are often constrained by the subjectivity and complexity of manual feature selection. In recent years, deep learning-based fault diagnosis methods have witnessed rapid development [9]. By enabling automatic feature extraction, deep learning allows models to learn and identify critical information directly from raw signals, providing an efficient and precise solution for fault diagnosis [10].

Currently, widely used deep learning techniques include convolutional neural networks (CNNs), graph convolutional networks (GCNs), recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and attention mechanism-based models [11,12,13]. These methods have demonstrated remarkable performance in fault diagnosis, particularly in extracting complex features and handling nonlinear relationships.

Lou [14] proposed using the wavelet transform to process vibration signals and generate feature vectors, followed by an adaptive neuro-fuzzy inference system (ANFIS) to classify normal conditions, inner raceway faults, and ball faults. Guo [15] proposed a novel hierarchical adaptive deep convolutional neural network to overcome the reliance on expert knowledge in feature extraction, enabling both fault pattern recognition and fault size estimation. Ruan [16] proposed a physics-guided CNN that leverages fault periodicity and signal attenuation characteristics, thereby addressing the suboptimal design of conventional CNNs in terms of convolutional kernel size and input alignment for bearing vibration signals.

Fernández [17] proposed combining a one-class ν-SVM with bandpass filtering and the Hilbert transform to extract fault characteristics from the envelope spectrum, achieving early fault detection as well as a qualitative evaluation of fault location and evolution trends. Liang [18] proposed a fault diagnosis approach based on the wavelet transform and an improved residual neural network (IResNet) that incorporates a novel pooling layer, a global singular value decomposition strategy, and an enhanced loss function to significantly improve system robustness and diagnostic accuracy under harsh environmental interference and label noise.

Song [19] proposed a hybrid Transformer–ResNet architecture integrated with transfer learning strategies for joint feature extraction, maintaining high prediction accuracy even in high-noise environments. Yu [20] proposed a fault diagnosis framework based on graph neural networks (GNNs) and dynamic graph embedding (DGE) to address operating condition fluctuations. This framework first extracts hidden features using a CNN, then reconstructs dynamic features via an edge dropout mechanism and node voting mechanism to enhance fault pattern recognition stability.

Liu [21] proposed a Cycle-GAN-based transfer learning model leveraging numerical simulation to overcome challenges in fault data acquisition in real industrial scenarios. By generating simulated vibration signals using a bearing dynamics model and transforming them into near-realistic signals through Cycle-GAN, this approach alleviates limitations caused by a lack of fault data. Ni [22] proposed a dual-stream CNN model for fault diagnosis in CNC machine tools, where one stream processes one-dimensional vibration signal spectra and the other handles two-dimensional time–frequency representations derived from the same signal. By incorporating convolutional attention mechanisms, dropout, and batch normalization, the model effectively integrates spatiotemporal features. Wang [23] proposed a hybrid approach that combines deep convolutional neural networks (DCNNs) with a deep extreme learning machine (DELM) optimized by the whale optimization algorithm (WOA). By integrating Efficient Channel Attention (ECA-Net) and BiLSTM, the enhanced DCNN achieves high-efficiency fault identification across multiple working conditions.

However, due to the intricate and highly variable time–frequency characteristics of bearing fault signals, traditional deep learning models often struggle with challenges such as capturing long-range dependencies, feature redundancy, and high computational complexity [24].

Traditional denoising methods exhibit limited adaptability when dealing with the non-stationary and nonlinear noise characteristics of audio signals. Their ability to perceive entropy variations in the multidimensional spectral properties and time–frequency coupling relationships of signals is relatively weak, making it difficult to effectively distinguish between signal components and noise. Consequently, these conventional methods struggle to maintain robustness in complex and dynamic noise environments.

To address these challenges, this study first employs the Weighted Average Algorithm (WAA) [25] to optimize FMD for signal denoising. The most informative intrinsic mode functions (IMFs) containing fault characteristics are then selected. Subsequently, Continuous Wavelet Transform (CWT) is utilized to convert one-dimensional signals into two-dimensional time–frequency representations, serving as the input for the deep learning model.

Despite its strengths, the local window self-attention mechanism in the Swin Transformer [26] has limitations in capturing long-range dependencies in sequential signals, restricting the effective fusion of global features. Furthermore, the Swin Transformer exhibits a weaker generalization ability when handling imbalanced datasets.

To overcome these limitations, this paper proposes the Local–Global Attention and Adaptive Feature Selection Module–Swin Transformer (LGAF-Swin Transformer), a novel deep learning model that integrates a local–global adaptive multi-scale attention mechanism with an adaptive feature selection module. The proposed method introduces a Bilateral Augmentation (BA) strategy to expand both the time window and frequency bandwidth of the signal. Additionally, depthwise separable convolutions are employed to rapidly enlarge the receptive field in shallow layers, thereby reducing the number of parameters and computational resources required for optimal performance. Lastly, a hybrid activation layer replaces the conventional Softmax layer in the Swin Transformer, enhancing nonlinear expressiveness and improving the contribution of minority-class samples during backpropagation. The experimental results on both open-source datasets and real-world tests validate the superior fault feature extraction capability of the proposed model under complex conditions.

The research process presented in this paper encompasses optimization algorithms, signal processing algorithms, and deep learning algorithms. The characteristics of the different algorithms are illustrated in Table 1.

2. Methodologies

2.1. Noise Reduction in Signals

In industrial settings, bearing acoustic signals are often overwhelmed by complex noise, making fault feature extraction particularly challenging. Directly feeding raw signals into deep learning models not only risks the loss of critical information due to noise masking but also exposes the models to interference from irrelevant features, which can lead to overfitting and ultimately reduce both diagnostic accuracy and generalization capabilities. Therefore, effective denoising to extract diagnostically valuable signal components is a prerequisite for establishing an intelligent diagnostic system. Subsequently, the optimal denoised components are transformed into time–frequency maps using CWT, providing an intuitive representation of fault features. These time–frequency maps are then used as inputs for the deep learning model, enabling precise fault pattern recognition and enhancing the stability and reliability of the diagnostic process.

Traditional FMD decomposes the signal into multiple modes using an FIR filter bank and uses the CK, which reflects both the impulsiveness and the periodicity of the signal, as the objective for deconvolution. Because it takes both properties into account, FMD can effectively resist the influence of other interference and noise. The CK of the k-th decomposition function u_k is as follows:

CK_{M}(u_{k}) = \frac{\sum_{n = 1}^{N} {(\sum_{m = 0}^{M} u_{k}(n - m T_{s}))}^{2}}{\sum_{n = 1}^{N} u_{k}{(n)}^{2}}

(1)

In this equation, N represents the total number of sampling points of the decomposition function u_k; n denotes the index of the current sampling point; M indicates the maximum delay when calculating CK; m is the index of the correlation delay; T_s is the sampling time interval; and u_k(n) refers to the amplitude of the k-th decomposition function at the n-th sampling point in the time series. The representation used to update the coefficient f_k of the k-th FIR filter is as follows:

R_{XWX} f_{k} = R_{XX} f_{k} λ

(2)

In this equation, R_XWX represents the weighted correlation matrix, which enhances the significance of the frequency components by considering the statistical characteristics of the signal; R_XX denotes the autocorrelation matrix, measuring the similarity of the signal with itself at different time lags; f_k is the coefficient vector of the k-th FIR filter; and λ is the eigenvalue associated with f_k.

However, this process requires multiple iterations and filter updates and lacks an adaptive adjustment mechanism. For long time series signals, this leads to information overload, which in turn degrades the accuracy of mode selection. To solve this problem, this paper introduces the WAA to optimize the key parameters of the FMD, taking the minimum envelope entropy as the objective function. The filter length and cycle period are updated adaptively, which overcomes the iterative complexity and non-adaptive nature of the traditional FMD. This optimization improves the frequency resolution and feature selection accuracy, as well as the fault feature retention ability and convergence rate.

The implementation steps of the WAA are as follows:

1. The population is randomly initialized, which provides the algorithm with an evenly distributed set of initial solutions in the search space and enables it to explore on a global scale.

X_{i,j}(0) = LB_{j} + rand \times (UB_{j} - LB_{j})

(3)

In this equation, X_i,j(0) represents the initial position of the i-th individual in the j-th dimension; LB_j denotes the lower bound of the j-th dimension; UB_j denotes the upper bound of the j-th dimension; and rand is a random number between 0 and 1.

2. The weighted average position X_Min(it) is calculated. It reflects the trend in high-quality solutions within the current population and provides guidance for subsequent individual position updates.

X_{Min}(it) = \frac{\sum_{i = 1}^{N_{Candidate}} Fitness(X_{i}) X_{i}}{\sum_{i = 1}^{N_{Candidate}} Fitness(X_{i})}

(4)

In this equation, X_Min(it) represents the weighted average position at the it-th iteration; N_Candidate denotes the number of candidate individuals used to compute the weighted average position; Fitness(X_i) refers to the fitness value of the i-th candidate individual, reflecting the quality of the individual; and X_i indicates the position of the i-th candidate individual.

3. Based on X_Min(it), each individual position is updated to move towards high-quality solutions, while random factors maintain exploration capability, thus balancing global exploration and local exploitation.

X_{i}(it + 1) = X_{i}(it) + rand \times (X_{Min}(it) - X_{i}(it))

(5)

In this equation, X_i(it + 1) and X_i(it) represent the positions of the i-th individual at the (it + 1)-th and it-th iterations, respectively.
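The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation. The fitness weight `1 / (1 + objective)`, the choice of the lowest-objective individuals as candidates, and the use of one random scalar per individual are not specified in the text and are assumed here. The `sphere` objective is a placeholder standing in for the envelope entropy of an FMD result.

```python
import random


def init_population(rng, lower, upper, pop_size):
    """Eq. (3): uniform random positions inside the box [lower, upper]."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(pop_size)
    ]


def weighted_mean_position(candidates, weights):
    """Eq. (4): weighted average of the candidate positions."""
    total = sum(weights)
    dim = len(candidates[0])
    return [
        sum(w * x[j] for w, x in zip(weights, candidates)) / total
        for j in range(dim)
    ]


def waa_minimize(objective, lower, upper, pop_size=20, n_candidates=5,
                 iterations=50, seed=0):
    """Minimize a non-negative objective with the three steps described above."""
    rng = random.Random(seed)
    population = init_population(rng, lower, upper, pop_size)
    best = min(population, key=objective)
    for _ in range(iterations):
        # Assumption: candidates are the n_candidates lowest-objective individuals.
        candidates = sorted(population, key=objective)[:n_candidates]
        # Assumption: fitness grows as the objective shrinks (needs objective >= 0).
        weights = [1.0 / (1.0 + objective(x)) for x in candidates]
        x_mean = weighted_mean_position(candidates, weights)
        # Eq. (5): move each individual towards the weighted mean by a random
        # fraction (one scalar per individual, also an assumption).
        moved = []
        for x in population:
            r = rng.random()
            moved.append([xj + r * (mj - xj) for xj, mj in zip(x, x_mean)])
        population = moved
        challenger = min(population, key=objective)
        if objective(challenger) < objective(best):
            best = challenger
    return best, objective(best)


def sphere(x):
    """Placeholder objective (sum of squares); not the envelope entropy."""
    return sum(v * v for v in x)


best_x, best_val = waa_minimize(sphere, lower=[-5.0, -5.0], upper=[5.0, 5.0])
print(best_x, best_val)
```

Because every update is a convex combination of points already inside the bounds, the positions stay within the search box without explicit clipping.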

To verify the superiority of the WAA, this study uses the Dragonfly Algorithm (DA) and the Grey Wolf Optimizer (GWO) as benchmark methods and analyzes the average convergence curves on four standard test functions, given in Equation (6). The four test functions cover a range of optimization scenarios and characteristics, including convex optimization, non-convex optimization, and stochastic optimization, thereby providing a comprehensive assessment of the performance of parameter optimization algorithms under different conditions. Among them, F1 is the typical sphere function in unconstrained optimization problems, which examines the algorithm’s ability to find the global minimum; F2 consists of a sum and a product of absolute values in unconstrained optimization problems, aimed at testing the global search capability of optimization algorithms; F3 is a quadratic function with a random term, used to evaluate the stability and robustness of optimization algorithms in a stochastic environment; and F4 is a complex nonlinear function in unconstrained optimization problems, designed to test the global search ability and convergence speed of optimization algorithms in complex scenarios.

\begin{matrix} F_{1}(x) = \sum_{i = 1}^{n} x_{i}^{2} \\ F_{2}(x) = \sum_{i = 1}^{n} |x_{i}| + \lim_{x \to \infty} \prod_{i = 1}^{n} |x_{i}| \\ F_{3}(x) = \sum_{i = 1}^{n} i x_{i}^{2} + random(0, 1) \\ F_{4}(x) = - 20 \exp (- \frac{1}{5} \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x_{i}^{2}}) \end{matrix}

(6)

In the equation, x_i represents the i-th component in the variable dimension, where n denotes the dimension of the variable. The limit is indicated by lim, while random (0, 1) signifies a random number between 0 and 1, and exp refers to the exponential function.
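For readers who want to reproduce the comparison, the first three test functions can be written directly from Equation (6). This is a hedged sketch: the function names are hypothetical, the limit appearing in F2 is not modelled (the plain sum-plus-product form is assumed), and F4 is left out because only part of its expression is given.

```python
import math
import random


def f1(x):
    """Sphere function: sum of squared components."""
    return sum(v * v for v in x)


def f2(x):
    """Sum of absolute values plus their product (limit in Eq. (6) not modelled)."""
    return sum(abs(v) for v in x) + math.prod(abs(v) for v in x)


def f3(x, rng=random):
    """Index-weighted quadratic plus a uniform random term in [0, 1)."""
    return sum(i * v * v for i, v in enumerate(x, start=1)) + rng.random()


point = [1.0, -2.0, 3.0]
print(f1(point), f2(point), f3(point, random.Random(1)))
```

Passing an explicit `random.Random` instance to `f3` keeps repeated runs comparable when averaging convergence curves.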

The test results, presented in Figure 1, indicate that while the DA demonstrates a strong exploratory capability, its convergence speed is relatively slow, making it prone to premature convergence. The GWO algorithm introduces strategic adjustments that enhance its convergence performance slightly over the DA; however, it still exhibits limitations in maintaining population diversity. In contrast, the WAA integrates information through a weighted averaging strategy, significantly improving convergence speed and stability while demonstrating an excellent robustness and adaptive capability. Overall, the WAA outperforms the DA and GWO in preventing premature convergence and optimizing performance.

Through the weighted balancing mechanism of WAA, the modal selection and filter design in the iterative process of FMD are optimized to ensure the effective separation of signals in different frequency bands and reduce the interference of low-frequency noise, while retaining the important effective information of the fault features. The Improved Signal Extraction Index (ISEI) proposed in this paper is used as the criterion for selecting the optimal intrinsic mode function following FMD. The ISEI is calculated from the Z-score normalized values of weighted kurtosis and the Harmonic Noise Ratio (HNR):

ISEI = w_{1} \frac{K - μ_{K}}{σ_{K}} + w_{2} \frac{HNR - μ_{HNR}}{σ_{HNR}}

(7)

In this equation, K represents kurtosis, μ_K is the mean value of kurtosis, σ_K is the standard deviation of kurtosis, HNR denotes the harmonic noise ratio, μ_HNR and σ_HNR are its mean and standard deviation, and w_1 and w_2 are the weights for kurtosis and HNR, set to 0.3 and 0.7, respectively, according to [27]. The specific flow of the WAA-optimized FMD is shown in Figure 2.
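As an illustration of Equation (7), the sketch below scores a set of candidate modes and picks the one with the highest ISEI. It assumes that the kurtosis and HNR of each mode have already been computed, that the means and standard deviations are taken across the candidate modes, and that the population standard deviation is used; none of these details is stated in the text. The numerical values are made up for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score normalization across the candidate modes."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


def isei(kurtosis_values, hnr_values, w1=0.3, w2=0.7):
    """Eq. (7): weighted sum of normalized kurtosis and normalized HNR."""
    zk = z_scores(kurtosis_values)
    zh = z_scores(hnr_values)
    return [w1 * a + w2 * b for a, b in zip(zk, zh)]


# Hypothetical per-mode indicators for four decomposed modes
kurt = [3.1, 4.8, 9.5, 3.6]
hnr = [0.8, 1.9, 4.2, 1.1]

scores = isei(kurt, hnr)
best_mode = max(range(len(scores)), key=scores.__getitem__)
print(best_mode, scores)
```

Normalizing both indicators before weighting keeps the larger-valued one from dominating the index purely because of its scale.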

2.2. Swin Transformer

Swin Transformer [26] is a hierarchical visual Transformer model specifically designed for vision tasks. It employs a window-based self-attention mechanism to extract image features by limiting the attention scope to local windows and utilizing a hierarchical architecture to process multi-scale information, thereby significantly improving both feature extraction efficiency and computational performance.

As shown in Figure 3, the architecture of Swin Transformer is organized in layers, where each layer performs self-attention calculations within fixed-size windows to extract features. The main modules include the following: Patch Partition, which divides the input image into non-overlapping patches; Linear Embedding, which maps each patch into a high-dimensional vector space; Swin Transformer Block, which comprises both window-based multi-head self-attention (W-MSA) and shifted window multi-head self-attention (SW-MSA); and the Patch Merging Layer, which aggregates features from adjacent patches, reducing resolution while increasing channel dimensions. In Figure 3, ‘H’ and ‘W’ represent the height and width of the image, respectively, while C(3) denotes the number of channels.

Within two consecutive Swin Transformer blocks, Z^{l - 1} is the output feature of the (l − 1)th layer, and Z^{l} is the final output of the lth layer, obtained after processing by the MLP and through a residual connection. {\hat{Z}}^{l} is the intermediate output of the lth layer, generated after processing by W-MSA and through a residual connection. {\hat{Z}}^{l + 1} is the intermediate output of the (l + 1)th layer, generated after processing by SW-MSA and through a residual connection. Z^{l + 1} is the final output of the (l + 1)th layer, obtained after processing by the MLP and through a residual connection.
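For reference, the relationships between these quantities can be written compactly as in the original Swin Transformer formulation [26]. LN denotes layer normalization, which is applied before each attention and MLP module; it is not mentioned in the description above and is included here from [26].

```latex
\begin{aligned}
\hat{Z}^{l} &= \text{W-MSA}\left(\text{LN}\left(Z^{l-1}\right)\right) + Z^{l-1} \\
Z^{l} &= \text{MLP}\left(\text{LN}\left(\hat{Z}^{l}\right)\right) + \hat{Z}^{l} \\
\hat{Z}^{l+1} &= \text{SW-MSA}\left(\text{LN}\left(Z^{l}\right)\right) + Z^{l} \\
Z^{l+1} &= \text{MLP}\left(\text{LN}\left(\hat{Z}^{l+1}\right)\right) + \hat{Z}^{l+1}
\end{aligned}
```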

The traditional Swin Transformer model performs excellently in structured tasks such as image recognition; however, it exhibits significant limitations when applied to bearing fault diagnosis in strong-noise acoustic environments. On one hand, the standard Transformer primarily relies on a global attention mechanism and lacks the capability to precisely model local short-term impulse features and non-stationary harmonic patterns, making it difficult for the model to recognize weak fault signals under high noise interference. On the other hand, while the fixed window attention introduced by the Swin Transformer does incorporate some locality, its rigid window size and channel structure still struggle to adapt to the complex and variable local and global features present in actual acoustic signals. Furthermore, in the absence of integrated denoising strategies, the Transformer structure is prone to overemphasizing background noise, leading to feature degradation and reduced diagnostic performance. To address these issues, this paper proposes the LGAF-Swin Transformer, which combines four components: a bilateral augmentation module that enhances the representation of time–frequency map structures; depthwise separable convolutions that improve shallow local feature perception; a local–global attention mechanism that integrates multi-scale convolutions and self-attention to better model non-uniform fault regions; and an adaptive feature selection module that suppresses redundant information. Together, these significantly improve the model’s robustness and diagnostic accuracy in strong-noise acoustic fields.

2.3. Model of LGAF-Swin Transformer

2.3.1. The Structure of the Network

In view of the sparse nature of time–frequency maps and the significant impact of local features on rolling bearing fault diagnosis, this paper proposes the LGAF-Swin Transformer model. First, a BA strategy is employed to extend the signal’s time window and frequency band. By using a 3 × 3 convolution kernel to increase the number of channels from 256 to 512, the receptive field is enhanced, thereby improving the perception of both local and global features in the time–frequency map while effectively reducing the computational burden. To balance computational efficiency with feature extraction capability, depthwise separable convolution (DSC) is introduced, which further expands the receptive field, compresses the parameter count, and lowers computational costs. This optimization ensures that the network can accurately extract fault features even under limited computational resources.

A Local–Global Attention Mechanism (LGAM) is then applied to adaptively adjust the weights of multi-scale features. The dynamic convolution kernel strategy within the LGAM adjusts the kernel size based on different fault patterns, effectively mitigating the loss of adjacent information in deep networks and further enhancing the receptive field. Positioned after the feature extraction layer, this module also reduces the complexity of shallow networks. Additionally, the model incorporates an Adaptive Feature Selection Module (AFSM), which uses attention mechanisms at each layer to automatically filter for the most discriminative feature channels. By weighting the feature outputs, the model prioritizes high-weight features as primary inputs while discarding low-weight redundant features, thereby effectively reducing the influence of extraneous information.

Furthermore, Attention Pooling (AP) is utilized to focus on salient features. This module employs a 3 × 3 convolution kernel with 128 output channels to effectively concentrate on key regions within the time–frequency map. Finally, the traditional Softmax layer in the Swin Transformer is replaced with a Hybrid Activation Layer (HAL) that combines ReLU and Sigmoid activation functions. This modification significantly enhances the model’s ability to fit complex nonlinear features, particularly improving backpropagation for imbalanced samples, thus ensuring training stability and optimal overall performance. The overall structure and parameter settings of the model are illustrated in Figure 4 and detailed in Table 2.

The LGAF-Swin Transformer model proposed in this paper aims to address the challenges posed by the strong non-stationarity of rolling bearing acoustic signals, the weakness of fault features, and significant noise interference through the collaborative effect of multiple modules. The four core modules are developed from four dimensions: perception range expansion, feature sparse modeling, multi-scale information fusion, and discriminative feature enhancement. Each module constructs an overall deep network architecture in a hierarchical, progressive, and functionally complementary manner, with the following interactive relationships and design logic:

1. Bilateral Augmentation:

The BA module is located at the network input end, and its core objective is to extend the temporal locality and global frequency characteristics of the signal, allowing the initial time–frequency representation to encompass a more complete set of periodic impulse information and harmonic structures. By dynamically selecting neighborhood points through a time–frequency-weighted Euclidean distance, it integrates geometric offset (Δp) and semantic difference (Δf), thereby enhancing the local periodicity and global band distribution of the time–frequency representation. Two parameter choices are specified: the time weight α is set larger than the frequency weight β, and the Gaussian kernel width σ is initialized based on the resolution of the time–frequency representation and optimized through backpropagation.

Interaction Design:

(1): Input Optimization: The enhanced time–frequency representation outputted by the BA module (with the number of channels expanded from 256 to 512) serves as the input for subsequent DSC, ensuring that the shallow network can capture richer time–frequency coupling features.
(2): Neighborhood Awareness: By applying Gaussian kernel weighting to suppress low-frequency noise, the BA module elevates the SNR to 5.38 dB under −13 dB noise conditions, providing high SNR features for the subsequent attention mechanism.

2. Depthwise Separable Convolution:

The DSC module decomposes the ordinary convolution operation into a depthwise convolution followed by a pointwise convolution, which reduces the parameter count from 256 × 256 × 3² for a standard convolution to 256 × 3² + 256 × 256. This approach maintains the local feature distribution and cross-channel coupling relationships while decreasing both the parameter count and computational complexity.

Interaction Design:

(1): Shallow Feature Compression: DSC, as a subsequent layer of the BA module, compresses the feature map from 512 channels to 256 channels, preventing information overload.
(2): Receptive Field Expansion: By utilizing 3 × 3 depthwise convolution, the receptive field of the shallow network is rapidly expanded, providing multi-scale feature primitives for the Local–Global Attention Mechanism (LGAM).
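The parameter counts quoted for the DSC module can be checked with a short calculation. The sketch below assumes 256 input channels, 256 output channels, a 3 × 3 kernel, and no bias terms, which matches the expressions given above; the function names are illustrative.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


std = standard_conv_params(256, 256, 3)
dsc = depthwise_separable_params(256, 256, 3)
print(std, dsc, round(std / dsc, 2))
```

With these settings the separable form needs 67,840 weights instead of 589,824, which is roughly an 8.7-fold reduction.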

3. Local–Global Attention Mechanism:

LGAM integrates multi-scale convolutional receptive fields with self-attention mechanisms, aiming to address the issues of insufficient local feature capture and weak global relationship modeling in traditional Swin Transformers when processing non-uniform feature areas. By introducing a dynamic convolution kernel adjustment mechanism, the size of the convolution kernel can be adaptively adjusted based on the input feature patterns, thereby enhancing the feature receptive strength in different regions.

Interaction Design:

(1): Multi-scale feature fusion: The 256-channel features output by the DSC are utilized to extract microscopic deformation features (such as the seventh harmonic of rolling element faults) through local convolution, while global attention is employed to model macroscopic degradation trends (such as the variation in fault frequency with load).
(2): Dynamic weight allocation: The multi-scale features output by LGAM are input into the Adaptive Feature Selection Module (AFSM), which reinforces key frequency bands through attention weights.

4. Adaptive Feature Selection Module:

AFSM constructs a nonlinear attention weight mechanism through a hybrid activation function of Sigmoid and ReLU, automatically learning the discriminative capability of each feature channel. This approach achieves the high retention of discriminative features while suppressing low-value features, and further enhances the response capability to boundary features and small sample characteristics through a dynamic threshold mechanism.

Interaction Design:

(1): Feature Channel Optimization: The 256-channel features output by LGAM are weighted, retaining high-weight channels (such as the time–frequency regions corresponding to fault impacts) and discarding low-weight noise channels.
(2): Classifier Input Adaptation: The features filtered by AFSM are input to the fully connected layer, where attention pooling (AP) focuses on key regions, and the conventional Softmax layer is replaced by a Hybrid Activation Layer (HAL), enhancing the nonlinear expressive capability.

Summary of Module Synergy:

(1): Data Flow Synergy: BA → DSC → LGAM → AFSM forms a hierarchical progressive link.
(2): BA and DSC: The former enhances noise reduction, while the latter optimizes compression, ensuring the signal-to-noise ratio and efficiency of shallow features.
(3): LGAM and AFSM: The former extracts and fuses multi-scale features, while the latter selects discriminative regions, enhancing the capability of weak fault identification.
(4): Performance Verification: Testing on the KAIST dataset achieved an accuracy of 100%, significantly surpassing the standard Swin Transformer (93.67%).

2.3.2. Enhancement of Data and Extraction of Features

BA [28] is a method based on point cloud feature enhancement. In bearing fault diagnosis, the horizontal axis of the time–frequency map must capture the periodic fault impacts, while the frequency axis should represent the distribution of resonance bands. However, conventional convolution operations treat the time–frequency map X \in R^{H \times W \times C} as a two-dimensional grid without an inherent topological structure, and their fixed-size kernels are unable to adaptively capture dynamic variations within the time–frequency domain. Therefore, this paper proposes a Time–Frequency-Aware Bilateral Augmentation (TF-BA) approach, which regards the time–frequency map as point cloud data. In this framework, each point p_{i} = (t_{i}, f_{i}) corresponds to a time–frequency coordinate, along with its associated feature vector f_{i} \in R^{C}. The TF-BA module enhances feature representation through the following steps:

(1): Time–Frequency Distance Metric

The neighborhood relationship for each pixel

(t, f)

in the time–frequency map is defined using a time–frequency-weighted Euclidean distance

d (p_{i}, p_{j}) = \sqrt{α {(t_{i} - t_{j})}^{2} + β {(f_{i} - f_{j})}^{2}}

(8)

In this equation,

α > β

represents a higher weight in the time dimension, enhancing the continuity of periodic impact signals, while the frequency dimension weight controls the sensitivity to the local range of the resonance band.

(2): Dynamic Neighborhood Selection and Feature Fusion

For a central point

p_{i}

, the top k most relevant neighboring points are selected based on the time–frequency distance

d (p_{i}, p_{j})

. A Gaussian kernel function is then applied for weighting, where the weight

w_{i j}

is given by the following equation:

w_{i j} = \exp (- \frac{d {(p_{i}, p_{j})}^{2}}{2 σ^{2}})

(9)

In this equation,

σ

is the bandwidth of the Gaussian kernel; its initial value is determined by the resolution of the time–frequency map, and it is then optimized through backpropagation.

(3): Feature Enhancement

\begin{array}{l} G_{i} = concat (M ({\tilde{G}}_{w} (p_{i})), M ({\tilde{G}}_{w} (f_{i}))) \\ Δ p_{i j} = (t_{j} - t_{i}, f_{j} - f_{i}) \\ Δ f_{i j} = f_{j} - f_{i} \end{array}

(10)

In this equation, the geometric context

M ({\tilde{G}}_{w} (p_{i}))

comprises the time–frequency offset

Δ p_{i j}

between the central point

p_{i}

and its neighboring points, capturing the spatiotemporal distribution of periodic impacts. The semantic context

M ({\tilde{G}}_{w} (f_{i}))

consists of the feature difference

Δ f_{i j}

between the central point feature

f_{i}

and its neighboring point features, and it is used to capture energy variations in the resonance frequency bands. The multilayer perceptron M learns the mapping relationships of geometric and semantic contexts through a fully connected network. The concatenation operation

concat (\cdot)

fuses both types of contextual information, generating robust feature representations.
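The three TF-BA steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy points and features, and the values of α, β, σ, and k are all hypothetical. The learned MLP M is omitted, so the code stops at the concatenated raw offsets.

```python
import math

def tf_distance(p, q, alpha=2.0, beta=1.0):
    """Eq. (8): weighted Euclidean distance between points p = (t, f) and q = (t, f)."""
    return math.sqrt(alpha * (p[0] - q[0]) ** 2 + beta * (p[1] - q[1]) ** 2)

def gaussian_neighbors(points, i, k, sigma=1.0, alpha=2.0, beta=1.0):
    """Eq. (9): indices and Gaussian weights of the k nearest neighbors of point i."""
    ranked = sorted(
        (tf_distance(points[i], q, alpha, beta), j)
        for j, q in enumerate(points)
        if j != i
    )
    return [(j, math.exp(-d ** 2 / (2.0 * sigma ** 2))) for d, j in ranked[:k]]

def context_offsets(points, feats, i, neighbor_ids):
    """Eq. (10) without the MLP: concatenate delta_p_ij and delta_f_ij per neighbor."""
    rows = []
    for j in neighbor_ids:
        delta_p = [points[j][0] - points[i][0], points[j][1] - points[i][1]]
        delta_f = [fj - fi for fi, fj in zip(feats[i], feats[j])]
        rows.append(delta_p + delta_f)
    return rows

# Toy data: four time-frequency points with two-channel features (hypothetical values).
points = [(0, 0), (1, 0), (0, 1), (3, 3)]
feats = [[1.0, 2.0], [1.5, 1.0], [0.0, 2.0], [4.0, 4.0]]
neighbors = gaussian_neighbors(points, i=0, k=2)
offsets = context_offsets(points, feats, 0, [j for j, _ in neighbors])
```

Because α > β in this sketch, a one-step offset along the time axis counts as farther away than a one-step offset along the frequency axis, so the neighbor at the same time index is ranked first.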

2.3.3. Depthwise Separable Convolution

The computational complexity of Swin Transformer increases rapidly with model depth and input dimensions. To alleviate this computational burden, a DSC layer is introduced. DSC first performs depthwise convolution (DC), which applies independent convolution operations to each input channel separately. This is followed by pointwise convolution (PC), utilizing 1 × 1 convolutions to combine features across different channels. Each convolution operation is followed by batch normalization and a ReLU activation function to further enhance feature representation and nonlinear characteristics.

Y = D (X) \circ W + b

(11)

In this equation,

D (X)

represents the DC operation,

W

denotes the PC operation, ∘ signifies element-wise multiplication, and

b

is the bias term.

In diagnostic tasks, DSC optimizes the feature extraction process of time–frequency maps by reducing redundant computations and the number of parameters. As shown in Figure 5, DC applies a 3 × 3 convolution kernel independently to each channel of the input feature map, where Din input channels correspond to Din separate kernels. Each kernel operates only on its corresponding channel, and the number of parameters is Din × K². In Figure 6, PC uses a 1 × 1 convolution kernel to combine the Din output channels from DC into Dout final output channels, with the number of parameters being Din × Dout. The total parameters of the structure are Din × K² + Din × Dout, which are significantly fewer than the standard convolution’s Din × Dout × K² parameters. This reduction in parameters and computational complexity makes the network more lightweight, effectively lowering the learning difficulty of the mapping and accelerating the model’s convergence speed while maintaining performance.
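The parameter counts above can be checked with a few lines of Python. The layer sizes below are hypothetical and are chosen only to make the saving concrete; bias terms are ignored, as in the text.

```python
def standard_conv_params(d_in, d_out, k):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return d_in * d_out * k * k

def dsc_params(d_in, d_out, k):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    return d_in * k * k + d_in * d_out

d_in, d_out, k = 64, 128, 3  # hypothetical layer sizes, not taken from the paper
standard = standard_conv_params(d_in, d_out, k)
separable = dsc_params(d_in, d_out, k)
ratio = separable / standard  # equals 1 / d_out + 1 / k**2
```

For these sizes the separable layer needs 8768 weights against 73,728 for the standard convolution, roughly 12% of the original count.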

2.3.4. Local–Global Attention Mechanism

The LGAM [29] combines multi-scale convolution and self-attention mechanisms to effectively capture fine-grained local features and global context information in time–frequency maps. For local feature extraction, multiple convolution kernels are used to extract local features at different scales, and these features are then weighted and fused through the self-attention mechanism. Meanwhile, global features are extracted using larger convolution kernels to capture global information. The structure of the LGAM is shown in Figure 7.

2.3.5. Adaptive Feature Selection Module

The AFSM [30] learns the attention weights of each channel to select the most important features, enhancing their influence while suppressing irrelevant or redundant features. In traditional convolutional operations, all channels are typically treated as equivalent; however, the AFSM allows the model to adaptively select features based on the specific characteristics of the data. This mechanism improves the model’s accuracy and robustness in high-dimensional, noisy tasks such as rolling bearing fault diagnosis. The HAL is used in place of Softmax to compute attention weights, employing the following hybrid activation function:

\mathrm{HAL} (x) = β \times σ (x) + (1 - β) \times R (x)

(12)

In this equation,

σ (x)

represents the Sigmoid function,

R (x)

denotes the ReLU function, and

β

is a parameter between 0 and 1, which is used to balance the outputs of the Sigmoid and ReLU functions.

The calculation formula for the attention weight

α_{i}

is as follows:

α_{i} = \mathrm{HAL} (W_{a} \times \mathrm{ReLU} (W_{e} \times F_{i}))

(13)

In this equation,

W_{a}

and

W_{e}

are learnable weight matrices, and

F_{i}

is the output feature of the i-th layer.

F_{weighted} = \sum_{i \in selected indices} α_{i} \cdot F_{i}

(14)

In this equation, selected indices refer to the set of indices selected based on the attention weights

α_{i}

and the dynamic threshold

θ

. The AFSM integrates the selected features through weighted summation to form the final feature representation, which is then used for subsequent fault diagnosis tasks.
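A minimal sketch of Equations (12) and (14) is given below. Two simplifications are assumptions made for illustration: the scalar `scores` stand in for the learned projection inside Equation (13), and a feature is treated as selected when its weight is at least θ, a rule the text does not spell out. All names and numbers are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def hal(x, beta=0.5):
    """Eq. (12): blend of Sigmoid and ReLU, balanced by beta in [0, 1]."""
    return beta * sigmoid(x) + (1.0 - beta) * relu(x)

def afsm_select(scores, features, theta, beta=0.5):
    """Eq. (14): weighted sum over the features whose weight reaches theta."""
    alphas = [hal(s, beta) for s in scores]
    selected = [i for i, a in enumerate(alphas) if a >= theta]  # assumed rule
    weighted = [0.0] * len(features[0])
    for i in selected:
        for d, value in enumerate(features[i]):
            weighted[d] += alphas[i] * value
    return alphas, selected, weighted

scores = [-2.0, 0.0, 2.0]                        # hypothetical pre-activations
features = [[1.0, 1.0], [2.0, 0.0], [0.0, 4.0]]  # hypothetical feature vectors
alphas, selected, weighted = afsm_select(scores, features, theta=0.2)
```

Unlike Softmax, the HAL weights are not normalized to sum to 1, so a strongly activated feature can receive a weight above 1 through the ReLU branch.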

2.3.6. Fault Diagnosis Model of LGAF-Swin Transformer

A multi-layer feature enhancement LGAF-Swin Transformer model for rolling bearing fault diagnosis is proposed. To address the limitations in time–frequency feature extraction, a TF-BA module is designed. This module enhances signal time–frequency features by extending the time domain window and applying frequency band decomposition strategies. A DSC module is used to construct a lightweight feature extraction structure. The cascade design of DC and PC reduces the number of parameters while efficiently representing shallow features.

The LGAM captures microscopic fault deformation features using 3 × 3 local convolution kernels and extracts macroscopic degradation trends by combining cross-channel self-attention mechanisms. Dynamic weights are learned to achieve the adaptive fusion of dual-scale features.

To further optimize the feature transfer path, an AFSM and HAL are introduced. These modules enable the dynamic selection of key fault components and a segmented nonlinear activation strategy to suppress noise interference. This approach maintains the model’s nonlinear expressive power while enhancing the discriminative ability of the feature space.

The fault diagnosis process of the LGAF-Swin Transformer is shown in Figure 8. The process involves collecting bearing operation acoustic signals, applying WAA-FMD for noise reduction, and using CWT for data preprocessing. The preprocessed data is then input into the LGAF-Swin Transformer for fault classification tasks. In Figure 8,

Z^{l - 1}

represents the output features of the (l − 1)th layer in the MLP;

Z^{l}

denotes the final output of the lth layer, obtained after processing by the MLP and through a residual connection;

{\hat{Z}}^{l}

is the intermediate output of the lth layer, generated after processing by W-MSA and via a residual connection;

{\hat{Z}}^{l + 1}

is the intermediate output of the (l + 1)th layer, produced after processing by SW-MSA and through a residual connection; and

Z^{l + 1}

is the final output of the (l + 1)th layer, obtained after processing by the MLP and through a residual connection.
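For reference, the two consecutive blocks of a standard Swin Transformer can be written as follows. This assumes that Figure 8 follows the standard Swin Transformer layout; LN denotes layer normalization, a symbol that is not defined in the text of this section.

```latex
\begin{aligned}
{\hat{Z}}^{l} &= \text{W-MSA}(\text{LN}(Z^{l-1})) + Z^{l-1} \\
Z^{l} &= \text{MLP}(\text{LN}({\hat{Z}}^{l})) + {\hat{Z}}^{l} \\
{\hat{Z}}^{l+1} &= \text{SW-MSA}(\text{LN}(Z^{l})) + Z^{l} \\
Z^{l+1} &= \text{MLP}(\text{LN}({\hat{Z}}^{l+1})) + {\hat{Z}}^{l+1}
\end{aligned}
```

Each line adds its input back to the result, which is the residual connection mentioned in the symbol definitions.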

To avoid overfitting, this paper incorporates structural optimization strategies in its model design, including depthwise separable convolutions, the adaptive feature selection module, and the hybrid activation function. Additionally, bilateral augmentation is employed to increase the diversity of the input data. Furthermore, the model’s generalization ability is validated through its consistent performance across the training set, validation set, and independent test set, as well as the stable convergence trends of the loss function and accuracy.

3. Simulation Verification Analysis

To comprehensively evaluate the denoising performance of WAA-FMD under different SNR conditions, this experiment constructs rolling bearing fault simulation signals at SNR levels of −13 dB, −11 dB, −9 dB, −7 dB, and −5 dB. The denoising effects of different optimization algorithms on FMD are compared, including the DA-, GWO-, and WAA-optimized FMD models (i.e., DA-FMD, GWO-FMD, and WAA-FMD), with performance comparisons against fixed-parameter FMD and the CEEMDAN method. The expression of the simulation signal is as follows:

\begin{cases} x (t) = s (t) + n (t) = \sum_{i} A_{i} h (t - i T) + n (t) \\ A_{i} = 1 + A_{0} \cos (2 π f_{r} t) \\ h (t) = e^{- C t} \cos (2 π f_{n} t) \end{cases}

(15)

In this equation,

s (t)

represents the periodic impulse component; the amplitude

A_{0}

is 0.3; the rotational frequency

f_{r}

is 30 Hz; the decay coefficient C is 700; the resonance frequency

f_{n}

is 4 kHz; the sampling frequency f_s is 16 kHz; the number of analysis points is 4096; the inner raceway fault characteristic frequency

f_{i} = 1 / T = 120 Hz

; and

n (t)

represents Gaussian white noise, scaled so that the SNR of the simulated signal is −13 dB, −11 dB, −9 dB, −7 dB, and −5 dB, respectively.
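The simulated signal of Equation (15) can be generated with the standard library alone, as sketched below. Several choices are assumptions: the impulse response is taken as a decaying cosine that starts at each impulse onset, the amplitude A_i is evaluated at the onset time iT, and the noise is rescaled so that the RMS-based SNR of Equation (16) matches the target exactly. The function and variable names are hypothetical.

```python
import math
import random

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def simulate_inner_race_signal(snr_db, n=4096, fs=16000.0, a0=0.3, fr=30.0,
                               c=700.0, fn=4000.0, fi=120.0, seed=0):
    """Return (noisy, clean, noise) for the inner raceway fault model."""
    period = 1.0 / fi  # T in Eq. (15)
    clean = []
    for k in range(n):
        t = k / fs
        total = 0.0
        i = 0
        while i * period <= t:  # sum every impulse that has already started
            tau = t - i * period
            amp = 1.0 + a0 * math.cos(2.0 * math.pi * fr * i * period)
            total += amp * math.exp(-c * tau) * math.cos(2.0 * math.pi * fn * tau)
            i += 1
        clean.append(total)
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Rescale the white noise so that 20*log10(rms(clean)/rms(noise)) == snr_db.
    scale = rms(clean) / (10.0 ** (snr_db / 20.0)) / rms(raw)
    noise = [scale * v for v in raw]
    noisy = [s + v for s, v in zip(clean, noise)]
    return noisy, clean, noise

noisy, clean, noise = simulate_inner_race_signal(-13.0)
```

At −13 dB the noise RMS is about 4.5 times the RMS of the impulse train, which is why the periodic impacts are no longer visible in the time domain.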

Figure 9 illustrates the optimization convergence curves of DA-FMD, GWO-FMD, and the proposed WAA-FMD during the denoising process of the simulated signals, with the minimum envelope entropy as the objective function. The experimental results indicate that the DA converges slowly and exhibits poor stability in high noise environments such as −9 dB, −11 dB, and −13 dB, making it prone to local optima. The FMD parameter combinations obtained are often insufficient for effective feature extraction tasks. Although the GWO algorithm outperforms DA in search performance and can somewhat improve convergence effects, its global optimization capability still has limitations. In contrast, the WAA optimization algorithm maintains a fast and stable convergence trend under strong noise conditions by dynamically adjusting the weights of the FMD parameters, demonstrating greater robustness and adaptability. It effectively overcomes the problem of premature convergence commonly seen in traditional optimization methods, allowing for a more comprehensive acquisition of the optimal FMD parameter combinations. Overall, WAA shows significant advantages over DA and GWO in terms of optimization efficiency, global search capability, and algorithm robustness.

3.1. Indicators of Noise Reduction Effect

3.1.1. Signal-to-Noise Ratio (SNR)

A higher SNR indicates better denoising performance. It measures the ratio of the signal’s power to the noise’s power, with a higher value signifying that the signal is clearer and less affected by noise.

\mathrm{SNR} = 20 \log_{10} (\frac{v_{s}}{v_{n}})

(16)

In the equation,

v_{s}

and

v_{n}

represent the effective values of the shock component and the noise component, respectively.

3.1.2. Normalized Correlation Coefficient (NCC)

The NCC is used to measure the similarity between the denoised signal and the original signal. The closer the NCC value is to 1, the more similar the signals are, indicating better denoising performance.

\mathrm{NCC} = \frac{\sum_{i = 1}^{N} (x_{denoised} (i) - μ_{denoised}) (x_{original} (i) - μ_{original})}{\sqrt{\sum_{i = 1}^{N} {(x_{denoised} (i) - μ_{denoised})}^{2} \sum_{i = 1}^{N} {(x_{original} (i) - μ_{original})}^{2}}}

(17)

In this equation,

x_{denoised} (i)

and

x_{original} (i)

represent the i-th sample point of the denoised signal and the original signal, respectively;

μ_{denoised}

and

μ_{original}

represent the mean values of the denoised signal and the original signal, respectively.

3.1.3. Mean Square Error (MSE)

The MSE is used to measure the average difference between the denoised signal and the original signal. The larger the MSE value, the greater the difference, indicating a poorer denoising effect.

\mathrm{MSE} = \frac{1}{N} \sum_{i = 1}^{N} {(x_{denoised} (i) - x_{original} (i))}^{2}

(18)

In this formula,

N

is the signal length, and

x_{denoised} (i)

and

x_{original} (i)

represent the i-th sample points of the denoised signal and the original signal, respectively.
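The three indicators in Equations (16) to (18) translate directly into code. The sketch below uses only the standard library; the function names and the four-point test signals are hypothetical, and the effective value is taken to be the root mean square.

```python
import math

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def snr_db(signal, noise):
    """Eq. (16): ratio of the effective (RMS) values, in decibels."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

def ncc(denoised, original):
    """Eq. (17): normalized correlation coefficient of two equal-length signals."""
    mu_d = sum(denoised) / len(denoised)
    mu_o = sum(original) / len(original)
    num = sum((d - mu_d) * (o - mu_o) for d, o in zip(denoised, original))
    den = math.sqrt(sum((d - mu_d) ** 2 for d in denoised)
                    * sum((o - mu_o) ** 2 for o in original))
    return num / den

def mse(denoised, original):
    """Eq. (18): mean squared difference of two equal-length signals."""
    return sum((d - o) ** 2 for d, o in zip(denoised, original)) / len(original)

original = [0.0, 1.0, 0.0, -1.0]   # hypothetical clean samples
denoised = [0.1, 0.9, -0.1, -0.9]  # hypothetical denoised samples
```

NCC is insensitive to a constant gain or offset between the two signals, whereas MSE penalizes both, so the two indicators complement each other.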

3.2. Simulated Signals with Different Signal-to-Noise Ratios

As shown in Figure 10, the impact waveform of the simulated signal illustrates the inner raceway fault of a rolling bearing under different levels of Gaussian white noise. Specifically, noise levels of −13 dB, −11 dB, −9 dB, −7 dB, and −5 dB were introduced to simulate varying SNR conditions.

At lower noise levels, as depicted in Figure 10a,b, where the −5 dB and −7 dB noises were added, the simulated signals in the time domain still exhibit distinguishable periodic impact characteristics. However, as the noise level increases to −9 dB, −11 dB, and −13 dB, as shown in Figure 10c–e, the periodic impact features in the time domain waveform become increasingly obscured, with the inner raceway fault signal almost entirely masked by the strong noise.

3.3. Performance Evaluation of Different Noise Reduction Methods

CEEMDAN, fixed-parameter FMD, DA-FMD, GWO-FMD, and the proposed method correspond to methods A, B, C, D, and E, respectively. These methods were applied to denoise signals at SNR levels of −5 dB, −7 dB, −9 dB, −11 dB, and −13 dB. Figure 11 presents the SNR, MSE, and NCC metrics of the denoised signals.

As shown in Figure 11, the proposed method demonstrates significant advantages in denoising fault simulation signals under different SNR conditions. In a −5 dB noise environment, its SNR improves to 11.97 dB, and even under severe noise conditions at −13 dB, the SNR remains at 5.38 dB, far surpassing the other methods. In contrast, the fixed-parameter FMD achieves only 6.13 dB and 0.85 dB at −5 dB and −13 dB, respectively, indicating its strong dependence on parameter settings. Meanwhile, GWO-FMD and DA-FMD suffer from the local convergence and insufficient global search capability of the GWO and DA, limiting their denoising performance.

Regarding the MSE metric, the proposed method achieves the lowest mean square error, with values of just 0.11 at −5 dB and 0.43 at −13 dB, significantly lower than those of the other methods. This demonstrates that the proposed approach not only effectively suppresses noise and minimizes signal distortion but also preserves the key characteristics of the original signal, ensuring high similarity between the denoised and original impact signals.

Additionally, in terms of the NCC metric, the proposed method attains an NCC value of 0.94 under −5 dB noise, indicating an almost perfect match with the original impact waveform. Even at −13 dB, the NCC remains as high as 0.81, significantly outperforming the other methods. In summary, these results clearly validate the superior ability of the proposed denoising method in enhancing signal quality, reducing noise interference, and preserving essential signal features.

Under the noise conditions of −5 dB, −9 dB, and −13 dB, simulated inner raceway fault signals were selected and processed using CEEMDAN, fixed-parameter FMD, DA-FMD, GWO-FMD, and the proposed method to extract fault-related information. Based on the maximum ISEI criterion, the optimal component was selected for envelope spectrum analysis.

As illustrated in Figure 12, a comparative analysis of the optimal component envelope spectrum obtained by different methods under varying SNRs reveals that even under severe noise interference at −13 dB, the proposed method effectively extracts fault features, distinctly exhibiting the inner raceway fault frequency

f_{i} = 120 Hz

along with its four harmonic components. In contrast, fixed-parameter FMD and CEEMDAN struggle to preserve critical fault-related information at −13 dB, leading to the loss of essential fault characteristics. These methods can only marginally extract weak fault features under −5 dB conditions, with limited distinguishability.

Comparatively, while DA-FMD and GWO-FMD can identify the primary fault frequency

f_{i} = 120 Hz

under −13 dB noise conditions, their harmonic components are entirely submerged in noise, making accurate fault diagnosis challenging. This limitation primarily stems from the tendency of the DA and GWO algorithms to fall into local optima during the optimization process, restricting search precision and hindering their ability to achieve globally optimal solutions. Overall, the proposed method demonstrates superior robustness and noise immunity across all SNR conditions and extracts the fault information clearly.

Furthermore, in terms of denoising performance, GWO-FMD outperforms DA-FMD, CEEMDAN, and fixed-parameter FMD. Therefore, subsequent experiments will focus on a comparative evaluation between GWO-FMD and the proposed method to further assess their fault diagnosis capabilities in greater depth.

Although traditional FMD possesses strong adaptive decomposition characteristics, it heavily relies on its parameter settings and is susceptible to noise interference. WAA-FMD enhances the identification accuracy of modal separation through a parameter-adaptive combination adjustment mechanism, significantly improving the ability to retain fault features and stability in noise reduction under strong noise conditions. Compared to the other methods, the advantages of WAA-FMD are shown in Table 3.

4. Open-Source Data Verification

4.1. Test Bearings and Test Stands

The rotating machinery test platform at the Korea Advanced Institute of Science and Technology (KAIST) [31] is shown in Figure 13. It is driven by a Siemens 3-horsepower asynchronous motor and operates at a subsynchronous speed of 3010 rpm through a gearbox with a speed ratio of 2.07, effectively avoiding power frequency interference.

The load system consists of an AHB-3A hysteresis brake and an M425 non-contact torque sensor, forming a closed-loop loading mechanism. Acoustic signal acquisition is performed using a PCB378B02 microphone, capturing bearing housing acoustic signals at a sampling rate of 51.2 kHz, with 8192 analysis points. The sensor measurement point is located at the radial position of bearing housing A.

The test object is an NSK 6205 DDU bearing, with a pitch diameter of 38.5 mm, a rolling element diameter of 7.90 mm, and nine rolling elements, as well as a contact angle of 0°. The calculated inner raceway fault frequency is 272.07 Hz, and the outer raceway fault frequency is 179.43 Hz.
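The two fault frequencies quoted above can be reproduced from the bearing geometry with the standard kinematic formulas, assuming a stationary outer ring, a rotating inner ring, and no slip. The function name is hypothetical.

```python
import math

def bearing_fault_frequencies(rpm, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Return (inner, outer) raceway fault frequencies in Hz."""
    fr = rpm / 60.0  # shaft rotation frequency in Hz
    ratio = ball_d / pitch_d * math.cos(math.radians(contact_deg))
    inner = n_balls / 2.0 * fr * (1.0 + ratio)
    outer = n_balls / 2.0 * fr * (1.0 - ratio)
    return inner, outer

# NSK 6205 DDU at 3010 rpm, using the dimensions given in the text.
inner, outer = bearing_fault_frequencies(3010.0, 9, 7.90, 38.5, 0.0)
```

With these inputs the results round to 272.07 Hz and 179.43 Hz, in agreement with the values stated in the text.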

The fault classification of the KAIST acoustic data is shown in Table 4, including five sets of acoustic data corresponding to inner raceway faults, outer raceway faults, and normal bearings under fault diameters of 0.3 mm and 1.0 mm. The KAIST dataset employed in this study only contains sound signals under the conditions of inner and outer raceway faults at 0.3 mm and 1.0 mm. Therefore, the analysis of fault feature extraction in this section is limited to these two typical fault types.

4.2. WAA-FMD Noise Reduction Effect Verification

Figure 14 shows the convergence curves of WAA-FMD and GWO-FMD during the denoising process of the sound signals for different fault types in the KAIST dataset. The results indicate that WAA-FMD can quickly converge within approximately 30 iterations across four types of fault signals, demonstrating high convergence stability, significantly improving computational efficiency, and ensuring the smoothness of parameter optimization. In contrast, GWO-FMD exhibits noticeable fluctuations in its convergence curve even after 50 iterations, making it difficult to obtain a stable and reliable combination of FMD parameters. Furthermore, the corresponding envelope entropy values are generally higher than those of WAA-FMD, leading to deficiencies in the global optimality and stability of the obtained FMD parameter solutions, which results in overall weaker convergence performance.

As shown in Figure 15, the acoustic signals of the bearing faults from the KAIST dataset were denoised using WAA-FMD and GWO-FMD, followed by optimal component extraction and envelope spectrum analysis. The proposed method accurately identifies inner and outer raceway fault frequencies along with their harmonic components, demonstrating superior noise immunity.

In contrast, GWO-FMD remains susceptible to low-frequency noise interference when the fault diameter is 0.3 mm, making fault characteristics difficult to discern. Although it can extract primary harmonics at a fault diameter of 1.0 mm, rotational frequency interference compromises diagnostic accuracy. For outer raceway faults, frequency drift occurs at 0.3 mm, while at 1.0 mm, certain harmonics remain affected by noise, increasing the risk of misclassification.

Overall, the proposed method exhibits greater stability, efficiency, and robustness in fault feature extraction across different fault sizes.

4.3. Ablation Study of the Improved Swin Transformer

The specific implementation steps of signal preprocessing are the following:

1.: Signal Denoising Processing (WAA-FMD)

In order to extract the bearing fault information contained in the original signal, a denoising process is performed on the raw collected signal prior to the wavelet transform. This paper employs the proposed WAA-FMD to process the original signal. This method decomposes the original signal into multiple Intrinsic Modal Function (IMF) components, and utilizes the ISEI to filter out the optimal component that contains the richest fault information, which serves as the input for the subsequent feature extraction.

2.: Data Sample Construction (Sliding Window Segmentation)

After extracting the optimal IMF components, the sliding window method is used to segment the signal and construct the data samples required for training. There is an overlap of 548 data points between each sample, with a sliding window step size set to 1500 data points. The sample division method is illustrated in Figure 16.

3.: Normalization processing

To eliminate the interference of dimensional differences among the sensors on feature extraction, the time series sample

X = \{\begin{matrix} x_{1}, x_{2}, \dots, x_{i} \end{matrix}\}

was standardized. The data was processed to conform to a standard normal distribution, resulting in a mean of 0 and a standard deviation of 1. This preprocessing step aimed to enhance the stability and accuracy of model training, and the calculation formula is as follows:

\bar{x} = \frac{x - μ}{σ}

(19)

In this equation, μ is the mean of the sample data and σ is the standard deviation of the sample data.

4.: Feature extraction (CWT transformation)

Because fixed-window functions adapt poorly to variations in the different frequency components of the signal, CWT is employed to process the optimal components obtained after WAA-FMD denoising. After experimentation, the Daubechies5 wavelet is selected as the basis function, with the total number of scales set to 1024. A scale sequence is constructed based on the central frequency of the wavelet, thereby generating the time–frequency representation. The transformation results are shown in Figure 17.

After the CWT transformation, the time–frequency representation dataset contains 600 data samples for each type of bearing fault, totaling 3000 samples. The dataset is then divided into training, validation, and test sets in a 6:2:2 ratio.
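Steps 2 and 3 and the 6:2:2 split can be sketched with the standard library as follows; the CWT step is left out because it needs a wavelet library. The window length of 2048 points is inferred from the stated step of 1500 points plus the overlap of 548 points, and the population standard deviation and the random, non-stratified split are assumptions. All names are hypothetical.

```python
import math
import random

def sliding_windows(signal, window=2048, step=1500):
    """Cut a 1-D sequence into overlapping windows (overlap = window - step)."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, step)]

def standardize(sample):
    """Eq. (19): zero mean and unit standard deviation (population form)."""
    mu = sum(sample) / len(sample)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / len(sample))
    if sigma == 0.0:
        raise ValueError("a constant sample cannot be standardized")
    return [(x - mu) / sigma for x in sample]

def split_indices(n_samples, ratios=(6, 2, 2), seed=0):
    """Shuffle sample indices and split them into train, validation, and test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

signal = [math.sin(0.01 * n) for n in range(8192)]  # placeholder for a denoised IMF
windows = sliding_windows(signal)
samples = [standardize(w) for w in windows]
train_idx, val_idx, test_idx = split_indices(3000)
```

For 3000 samples the split yields 1800 training, 600 validation, and 600 test samples.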

To validate the effectiveness of the LGAF-Swin Transformer fault diagnosis algorithm, a deep learning framework based on PyTorch v2.1.1 was employed, with Python v3.11.9 as the programming language. The experiments were conducted on a system equipped with an Intel(R) Core(TM) i7-14650HX processor, 16 GB of RAM, an NVIDIA GeForce RTX 4060 GPU, and a Windows 11 operating system. The batch size was set to 16 samples, and the optimization method used was stochastic gradient descent (SGD), which updates the model parameters using gradients computed through backpropagation. The learning rate was set to 0.001, and the classical cross-entropy loss function was used for training.

To assess the effectiveness of each module incorporated into the LGAF-Swin Transformer, we designed four variants of the baseline Swin Transformer (ST) architecture: ST-BA introduces a bilateral augmentation (BA) layer at the model input, expanding the channel count from 256 to 512 to generate richer time–frequency features; ST-DSC replaces the first 3 × 3 standard convolution with a depthwise separable convolution, significantly reducing the number of parameters while accelerating shallow feature extraction; ST-LGAM inserts a local–global attention mechanism (LGAM) layer after each Transformer Block, utilizing 3 × 3 and 7 × 7 convolutions to capture multi-scale features, followed by self-attention fusion, enhancing the model’s capability for cross-region semantic modeling; and ST-AFSM adds an adaptive feature selection module (AFSM) before the final feature aggregation, highlighting discriminative channels and suppressing noise redundancy through hybrid activation (HAL) and dynamic thresholding, thereby further improving the accuracy and robustness of fault identification. The hyperparameter settings for the Swin Transformer and its four variants are listed in Table 5, with all models adopting the same number of training epochs (50 epochs for the KAIST dataset and 100 epochs for the experimental dataset) so that training length is held constant across the comparison; an overview of the optimization strategies is presented in Table 6.

As illustrated in Figure 18a,c, the experimental results on the KAIST dataset reveal that the loss curves of the ST model remain relatively flat during the first 20 training epochs. Between epochs 22 and 39, the loss function declines more rapidly before stabilizing, albeit with minor fluctuations and rebounds. Eventually, the training and validation losses stabilize at 0.4362 and 0.5214, respectively, corresponding to the accuracy rates of 95.61% and 94.83% shown in Figure 18b,d.

In contrast, as depicted in Figure 18a,c, the loss function curves of the ST-BA, ST-DSC, ST-LGAM, and ST-AFSM models, incorporating the proposed enhancements, exhibit a rapid decline within the first 10 epochs before leveling off. Moreover, their training and validation losses are significantly lower than those of the ST model. Notably, LGAF-Swin Transformer achieves convergence as early as the eighth epoch, with final loss values of 0.0362 and 0.0378, which are substantially lower than those of other models. In the accuracy convergence curves presented in Figure 18b,d, ST-BA, ST-DSC, ST-LGAM, and ST-AFSM demonstrate a rapid increase in accuracy within the first 25 epochs, significantly outperforming the ST model. Most impressively, LGAF-Swin Transformer achieves full convergence within just 12 epochs, attaining 100% accuracy on both the training and validation sets.

As shown in Table 7, the accuracy of each model on the test set indicates that the improved LGAF-Swin Transformer model achieves an accuracy of 100% on the KAIST test set, whereas the unimproved ST model only reaches 93.67%. Additionally, the accuracy of the improved ST-BA, ST-DSC, ST-LGAM, and ST-AFSM models also surpasses that of the ST model. The experimental results demonstrate that incorporating the BA, DSC, LGAM, and AFSM modules effectively enhances the model’s performance and optimization.

A systematic evaluation of the LGAF-Swin Transformer model was conducted on the KAIST dataset, which consists of five classes (the normal state and four fault conditions) and 3000 samples, with 20% allocated for testing. The diagnostic results of the different models are shown in Table 8. The proposed model achieved a TPR and F1 score of 1.000 across all categories, with an FNR of 0.000, and both the Kappa and AUC values also reached 1.000, indicating that every test sample was classified correctly. The LGAF-Swin Transformer outperformed the other mainstream deep learning models on all metrics: the Swin Transformer had a TPR of 93.65%, an F1 score of 93.67%, a Kappa value of 92.08%, and an AUC of 96.04%; ResNet had a TPR of 92.50%, an F1 score of 92.90%, a Kappa of 90.63%, and an AUC of 95.32%; LeNet achieved a TPR of 93.33%, an F1 score of 93.34%, a Kappa of 91.67%, and an AUC of 96.23%; and the metrics for CNN were relatively lower. In summary, the LGAF-Swin Transformer demonstrated high accuracy, robustness, and stability in the multi-class rolling bearing fault diagnosis task, particularly in distinguishing classes with subtle differences, which supports the effectiveness and practical value of its multi-scale attention mechanism and adaptive feature selection strategy.
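As a reminder of how two of these metrics are obtained, the sketch below computes the per-class TPR and Cohen's Kappa from a confusion matrix. The 3 × 3 matrix is purely illustrative and is not taken from Table 8; the function names are hypothetical.

```python
def per_class_tpr(cm):
    """True positive rate of each class (rows: true labels, columns: predictions)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]

def cohen_kappa(cm):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = sum(sum(row) for row in cm)
    classes = range(len(cm))
    p_o = sum(cm[i][i] for i in classes) / n
    p_e = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in classes) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

cm = [[9, 1, 0],   # illustrative counts only
      [0, 10, 0],
      [1, 0, 9]]
tpr = per_class_tpr(cm)
kappa = cohen_kappa(cm)
```

When the five true classes are equally represented, the chance agreement is 0.2 and Kappa reduces to (accuracy − 0.2) / 0.8. Under that assumption an accuracy of 93.67% maps to a Kappa of about 0.921, in line with the 92.08% reported for the Swin Transformer.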

To further validate the superiority of LGAF-Swin Transformer, it is compared with deep learning models such as CNN, ResNet, and LeNet. The processed time–frequency images are used as the input for each model for comparative analysis. As shown in Table 9, the test accuracy of CNN is only 88.67%, which is lower than that of Swin Transformer (93.67%). Although ResNet and LeNet achieve over 92% accuracy on the test set, they still fall short of the standard Swin Transformer’s 93.67%. The diagnostic performance of LGAF-Swin Transformer is significantly superior to the other deep learning models.

5. Experimental Verification of Acoustic Signals

To further validate the effectiveness of the WAA-FMD method in acoustic signal denoising and the outstanding performance of the LGAF-Swin Transformer in fault diagnosis, this study conducts experimental verification by collecting real bearing acoustic signal data for evaluation.

5.1. Test Equipment and Data Collection

The experimental procedure of this study is illustrated in Figure 19, and includes three components: the experimental apparatus, the hardware wiring of the acquisition system, and the interface for sound signal acquisition [32].

Table 10 presents the performance parameters of the acoustic sensor. The acoustic data is sampled using the NI 9234 acquisition card, with a sampling frequency of 51.2 kHz and 51,200 sampling points. The test subject is the FAG 30205-XL bearing, with its fault frequencies calculated based on its dimensional parameters, as shown in Table 11. The test conditions are set as follows: the rotational speed is constant at 5000 r/min, with an axial load of 5 kN and a radial load of 10 kN.

5.2. Experimental Verification of WAA-FMD Denoising on Measured Acoustic Signals

Figure 20 shows microscope photographs of the inner raceway fault, outer raceway fault, and rolling element fault. The specific fault locations are marked with red boxes. The acoustic signals for these three fault types are collected for comparative verification.

Figure 21 shows the time domain waveforms of the sound signals under an inner raceway fault, outer raceway fault, and rolling element fault. The fault information of the bearing is completely submerged in the noise: the periodic impacts cannot be seen, and diagnosis cannot be carried out directly. GWO-FMD and the proposed method are each used to denoise these signals, and the optimal component from each method is then selected for envelope spectrum analysis.

Figure 22 presents the convergence curves obtained when the proposed WAA-FMD method and GWO-FMD optimize the FMD parameters during denoising of the measured sound signals for the three bearing fault types. The results indicate that WAA-FMD achieves rapid and stable convergence in approximately 35 iterations across all three fault signal categories, demonstrating an excellent convergence speed and parameter tuning capability. In contrast, the convergence curve of GWO-FMD under the same conditions eventually flattens but fluctuates more along the way, resulting in significantly slower convergence. Additionally, GWO-FMD is prone to becoming trapped in local optima, making it difficult to reach the global optimum of the FMD parameters.

Figure 23 shows the envelope spectra of the optimal components obtained for each fault type with the two denoising methods. The results of the proposed method correspond to Figure 23a,c,e, and the GWO-FMD results correspond to Figure 23b,d,f. For the inner raceway fault, the proposed method clearly extracts the rotation frequency, the inner raceway fault frequency, and its second harmonic, whereas GWO-FMD identifies only the rotation frequency and the fault frequency; the harmonics are submerged in noise and the extracted information is insufficient. For the outer raceway fault, the proposed method identifies the rotation frequency, the outer raceway fault frequency, and its fifth harmonic, while GWO-FMD extracts only the second harmonic amid strong residual noise, so the noise reduction is incomplete. For the rolling element fault, the proposed method completely extracts the rotation frequency, outer raceway fault frequency, and its seventh harmonic, while GWO-FMD identifies only the rotation frequency and outer raceway fault frequency. The second and third harmonics are almost concealed by noise, and the fault characteristics are seriously lost.

The GWO failed to find a global optimal solution when optimizing FMD, resulting in insufficient decomposition, as well as a loss or aliasing of some key feature modes, making it difficult to effectively separate weak fault features, and ultimately affecting the accuracy of fault identification. In contrast, the proposed method is superior to GWO-FMD in feature retention and noise reduction performance, which verifies its reliability and effectiveness under complex operating conditions.

To further highlight the effectiveness and superiority of the proposed denoising method, this study applies the proposed method, GWO-FMD, DA-FMD, fixed-parameter FMD, and CEEMDAN (corresponding to A, B, C, D, E) to denoise the fault data of bearing 30205, and selects the optimal components of each method to compute the Harmonic Signal-to-Noise Ratio (HSN) and ISEI. In Figure 24, the HSN and ISEI after denoising with different methods are displayed.

From Figure 24a–c, it can be observed that the proposed method achieves HSN values of 4.37 dB, 7.68 dB, and 11.37 dB for inner raceway, outer raceway, and rolling element faults, respectively. The corresponding ISEI values are 4.86, 9.83, and 14.62, which are significantly higher than the HSN values of GWO-FMD after denoising (2.18 dB, 3.84 dB, and 6.49 dB, respectively) and the ISEI values of GWO-FMD (2.79, 4.52, and 8.71), as well as the results from the other methods.

The experimental results indicate that the proposed method is more effective in extracting fault features and preserving multi-order harmonic information when processing the bearing 30205 acoustic signals under complex working conditions, significantly enhancing the signal’s SNR. Especially in rolling element fault diagnosis, the HSN and ISEI values of the proposed method are much higher than those of the other methods, demonstrating its ability to maintain excellent denoising performance even under complex conditions. Although GWO-FMD outperforms DA-FMD, fixed-parameter FMD, and CEEMDAN in denoising, its inability to find the optimal FMD parameters results in an insufficient feature separation ability, making its denoising performance less effective than the proposed method in complex environments.

5.3. Experimental Verification of LGAF-Swin Transformer on Measured Acoustic Signals

After collecting sound signals for four bearing conditions—normal, inner raceway fault, outer raceway fault, and rolling element fault—the raw data undergoes WAA-FMD denoising. A sliding sampling method is then applied to construct data samples, followed by normalization to ensure consistency in feature scaling. Subsequently, the processed time domain signals are transformed into time–frequency representations using CWT.
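The normalization and time–frequency conversion can be sketched with NumPy alone. The paper does not state the normalization type or the mother wavelet in this section, so both choices below (zero-mean, unit-variance standardization and a complex Morlet wavelet with `w0 = 6`) are assumptions made for illustration, as are the function names and the 3.2 kHz test tone.

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance scaling (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def morlet_scalogram(x, fs, freqs_hz, w0=6.0):
    """Magnitude of a complex Morlet CWT, one row per analysis frequency."""
    x = np.asarray(x, dtype=float)
    rows = []
    for f in freqs_hz:
        scale = w0 / (2.0 * np.pi * f)            # wavelet scale in seconds
        half = int(np.ceil(4.0 * scale * fs))     # truncate at +/- 4 scales
        if 2 * half + 1 > len(x):
            raise ValueError("analysis frequency too low for this segment length")
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(1j * w0 * t / scale) * np.exp(-0.5 * (t / scale) ** 2)
        wavelet /= scale * fs                     # equal gain at every scale
        rows.append(np.abs(np.convolve(x, wavelet, mode="same")))
    return np.vstack(rows)

fs = 51_200
t = np.arange(2048) / fs                          # one 2048-point sample
segment = standardize(np.sin(2 * np.pi * 3_200 * t))   # hypothetical test tone
freqs_hz = np.arange(800, 6401, 400)
scalogram = morlet_scalogram(segment, fs, freqs_hz)
dominant_hz = freqs_hz[np.argmax(scalogram.mean(axis=1))]
print(scalogram.shape, dominant_hz)
```

The resulting array has one row per analysis frequency and one column per time sample; rendering it as an image would give a time–frequency representation of the kind fed to the classifier.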

In this experiment, a total of 3,602,192 sound data points were selected, with each sample comprising 2048 data points. A sliding window approach was utilized for data sampling, where adjacent samples overlapped by 548 data points, and the step size was set to 1500 data points. Ultimately, 600 samples were obtained for each bearing condition, resulting in a total of 2400 samples. The dataset distribution is presented in Table 12.
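The sample counts above can be checked with a short sketch. It assumes the 3,602,192 points are split evenly over the four bearing conditions (900,548 points each), which is consistent with the stated window length, step size, and 600 samples per condition; the function name is illustrative.

```python
def sliding_windows(signal, window=2048, step=1500):
    """Return overlapping segments; a trailing partial window is dropped."""
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, step)]

# Assumption: the total is divided equally among the four conditions.
points_per_condition = 3_602_192 // 4          # 900,548
record = list(range(points_per_condition))     # stand-in for one denoised recording

samples = sliding_windows(record)
overlap = 2048 - 1500
print(len(samples), overlap)                   # 600 548
```

The last window starts at index 599 × 1500 = 898,500 and ends exactly at the final point, so no data are discarded under this assumption.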

Figure 25a,b illustrate the loss function convergence curves and accuracy curves for LGAF-Swin Transformer, Swin Transformer, ResNet, LeNet, and CNN after 100 training iterations. As shown in these figures, the training loss for LGAF-Swin Transformer, Swin Transformer, and ResNet exhibits a significant decline. Notably, the proposed LGAF-Swin Transformer outperforms the other models in terms of training speed and convergence efficiency, achieving a loss convergence of 0.0016 after just 12 iterations. Furthermore, the training accuracy stabilizes at 99.67% as early as the ninth iteration, which is significantly higher than that of the other models.

Although Swin Transformer demonstrates strong feature extraction capabilities, its sliding window mechanism struggles to fully capture long-range dependencies, limiting its sensitivity to complex fault features. Additionally, the hierarchical design of Swin Transformer leads to information loss when handling high-dimensional data and causes boundary effects when processing feature edges, ultimately affecting the model’s accuracy and stability. As a result, Swin Transformer only converges after 34 iterations, with a final training loss of 0.1356—superior to ResNet, LeNet, and CNN. However, while its training accuracy stabilizes at 93.89%, surpassing ResNet (90.38%), LeNet (87.72%), and CNN (69.94%), it remains lower than that of LGAF-Swin Transformer.

Figure 25c,d present the loss function convergence curves and accuracy curves on the validation set for the five models. LGAF-Swin Transformer exhibits the fastest decline in validation loss, beginning to converge after 16 iterations and eventually stabilizing at 0.01432, which is lower than the other four models. Moreover, its validation accuracy reaches 99.56% after just eight iterations, significantly outperforming the other models. By overcoming Swin Transformer’s limitations in global feature capture and boundary feature processing, LGAF-Swin Transformer demonstrates a superior capability in extracting complex fault characteristics, leading to faster convergence and higher fault diagnosis accuracy.

As shown in Figure 26, the comparison of confusion matrices for different models on the test set demonstrates that the proposed LGAF-Swin Transformer achieves the highest accuracy of 98.13%. This represents an improvement of 8.34, 14.80, 17.71, and 21.67 percentage points over the standard Swin Transformer (89.79%), ResNet (83.33%), LeNet (80.42%), and CNN (76.46%), respectively.

Although Swin Transformer possesses strong feature extraction capabilities, its windowed self-attention mechanism limits its ability to capture global features. Additionally, its fixed window size struggles to adapt to time–frequency images of varying resolutions, negatively impacting classification accuracy for complex faults. ResNet, despite mitigating the vanishing gradient problem through residual connections, suffers from feature redundancy, insufficient generalization on small datasets, and poor adaptability to unstructured signals, restricting its diagnostic performance in complex conditions. LeNet, as an early CNN architecture, is hindered by its shallow network depth and limited feature extraction capability, making it ineffective in recognizing high-dimensional fault patterns and resulting in lower accuracy. Traditional CNNs, constrained by a limited receptive field, a lack of global feature modeling, and high computational complexity due to large parameter sizes, struggle to fully capture the intricate patterns of bearing fault signals, further reducing classification accuracy.

In contrast, LGAF-Swin Transformer, incorporating local–global adaptive multi-scale attention mechanisms, adaptive feature selection, and dynamic convolutional kernels, effectively overcomes these limitations. It exhibits superior feature perception and classification accuracy in complex environments, significantly enhancing the reliability and robustness of bearing fault diagnosis.

On the measured rolling bearing sound signals, which cover four categories (normal, inner raceway fault, outer raceway fault, and rolling element fault), the LGAF-Swin Transformer continues to demonstrate excellent fault identification capability. The diagnostic results of the different models are shown in Table 13. Despite complex noise interference, the model achieves an accuracy of 98.13% and an F1 score of 98.23%. Its Kappa coefficient of 97.50% and AUC value of 98.78% are both the highest among the compared models (Swin Transformer, ResNet, LeNet, and CNN). This further indicates that the model not only performs exceptionally well under ideal conditions but also remains robust and generalizes well against a high-noise background, demonstrating significant practical engineering value.
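For readers who wish to reproduce such metrics from a confusion matrix, the following sketch computes accuracy, macro-averaged F1, and Cohen's Kappa. The 4 × 4 matrix is hypothetical and is not taken from Figure 26; whether the paper's F1 score is macro-averaged is also an assumption.

```python
def classification_metrics(cm):
    """Accuracy, macro-F1 and Cohen's kappa from a square confusion matrix
    (rows: true class, columns: predicted class)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    accuracy = sum(cm[i][i] for i in range(k)) / total
    f1_scores = []
    for i in range(k):
        denom = row_sums[i] + col_sums[i]      # 2*TP + FN + FP
        f1_scores.append(2 * cm[i][i] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / k
    # Chance agreement from the row and column marginals.
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return accuracy, macro_f1, kappa

# Hypothetical confusion matrix with four balanced classes (illustration only).
cm = [
    [118, 2, 0, 0],
    [0, 118, 2, 0],
    [0, 0, 118, 2],
    [2, 0, 0, 118],
]
accuracy, macro_f1, kappa = classification_metrics(cm)
print(f"accuracy={accuracy:.4f}, macro-F1={macro_f1:.4f}, kappa={kappa:.4f}")

# With four balanced classes and balanced predictions, chance agreement is 0.25.
print(round((0.9813 - 0.25) / 0.75, 4))        # 0.9751
```

Under the balanced-class assumption, an accuracy of 98.13% corresponds to a Kappa of about 0.975, which agrees with the reported 97.50%.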

To ensure the stability of the experimental results and minimize the impact of random factors, this study conducts ten independent evaluations on the test set for five models: LGAF-Swin Transformer, Swin Transformer, ResNet, LeNet, and CNN (corresponding to A, B, C, D, E). The average accuracies of these models are 98.62%, 90.88%, 83.12%, 80.75%, and 75.82%, respectively (as shown in Figure 27). It is evident that the LGAF-Swin Transformer consistently maintains an accuracy above 98%, significantly outperforming the other models, demonstrating its outstanding fault diagnosis capability and robustness. In comparison, although the Swin Transformer achieves an accuracy of 90.88%, it is still limited by its insufficient global feature extraction ability. While ResNet mitigates the gradient vanishing issue with residual connections, it is prone to overfitting in small sample cases, with an accuracy of only 83.12%. As an earlier deep learning network, LeNet has a relatively shallow structure, making it less capable of adapting to complex fault features, resulting in an accuracy of only 80.75%. CNN, due to its lack of a deep information extraction ability and limited modeling capacity for high-dimensional time–frequency data, performs the worst, with an accuracy of just 75.82%. These experimental results further validate the superiority of LGAF-Swin Transformer under complex operating conditions and demonstrate the effectiveness of the proposed improvements. The final accuracy after averaging the results of 10 trials is shown in Table 14.

5.4. Analysis of Test Results

This study verified the noise reduction performance of the WAA-FMD method under various signal-to-noise ratios (SNRs) using simulated signals. At a noise level of −5 dB, the SNR after denoising improved to 11.97 dB (compared with 6.13 dB for fixed-parameter FMD and 4.25 dB for CEEMDAN), the normalized correlation coefficient (NCC) reached 0.94, and the mean squared error (MSE) was only 0.11. Under the extreme noise condition of −13 dB, the SNR still reached 5.38 dB (compared with 0.85 dB for GWO-FMD), with an NCC of 0.81 and an MSE of 0.43. Notably, the fourth harmonic of the fault frequency was clearly extracted from the envelope spectrum, while traditional methods could only identify the fundamental frequency. In the KAIST open-source dataset, the ISEI of the fault components extracted by WAA-FMD reached 9.83 for inner raceway faults (1.0 mm), compared to 4.52 for GWO-FMD, representing an improvement of 117%. Meanwhile, the LGAF-Swin Transformer achieved a classification accuracy of 100% on the KAIST dataset, outperforming Swin Transformer (93.67%) and ResNet (83.33%), with both the F1 score and AUC value reaching 100%. In the validation of the measured data, the ISEI of WAA-FMD for rolling element faults was 14.62, compared to 8.71 for GWO-FMD, indicating a 68% improvement. The test accuracy of LGAF-Swin Transformer was 98.13%, with an F1 score of 98.23% and a Kappa value of 97.50%, significantly surpassing CNN (76.46% accuracy) and Swin Transformer (89.79% accuracy). These results demonstrate that this method exhibits excellent noise suppression, feature retention, and classification capabilities even under strong noise conditions.
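The relative ISEI improvements quoted above follow from a simple ratio, which the short check below reproduces from the stated values (the helper name is illustrative).

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

kaist_gain = relative_improvement(9.83, 4.52)      # KAIST, inner raceway (1.0 mm)
measured_gain = relative_improvement(14.62, 8.71)  # measured data, rolling element

print(round(kaist_gain))      # 117
print(round(measured_gain))   # 68
```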

5.5. Generalization Ability Tests of WAA-FMD and LGAF-Swin Transformer

To verify the diagnostic performance of this method across various bearings, we conducted an experimental analysis on the FAG NU218-E-XL-TVP2 bearing and the FAG 7205-B-XL-TVP bearing. During the testing process, the NU218 radial load was set at 10 kN with a rotational speed of 300 r/min, while the 7205 bearing experienced an axial load of 8 kN and a radial load of 8 kN, maintaining a constant rotational speed of 12,000 r/min. All other testing equipment and experimental conditions remained consistent with those previously described. As illustrated in Table 15 and Table 16, the LGAF-Swin Transformer achieved the best performance metrics across both test sets. For the NU218 bearing, it attained a true positive rate (TPR) of 98.54%, an F1 score of 98.64%, a Kappa value of 98.05%, and an Area Under the Curve (AUC) value of 99.03%. Similarly, in the 7205 bearing test set, it maintained high accuracy, with a TPR of 98.19%, an F1 score of 98.40%, a Kappa value of 97.50%, and an AUC value of 98.75%. These results robustly demonstrate the model’s exceptional diagnostic capabilities across different equipment and operating conditions. In contrast, the original Swin Transformer, while possessing some fault identification capability, performed slightly less effectively under complex working conditions. Traditional models such as ResNet, LeNet, and CNN exhibit noticeable deficiencies in key metrics like F1, Kappa, and AUC, particularly demonstrating significant performance fluctuations under high-speed rotational conditions, as well as a limited generalization ability. Overall, the LGAF-Swin Transformer demonstrates remarkable robustness and broad adaptability in complex and variable industrial environments. Ten trials were conducted for each type of bearing, and the average values were computed to ensure the scientific rigor of the experiments while minimizing randomness. The final experimental results of the model proposed in this paper are presented in Table 17.

6. Conclusions and Prospects

6.1. Conclusions

This study validated the noise reduction performance of the WAA-FMD method across various signal-to-noise ratios (SNRs) using simulated signals. At a noise level of −5 dB, the SNR increased to 11.97 dB, with fixed-parameter FMD yielding 6.13 dB and CEEMDAN achieving 4.25 dB. The normalized correlation coefficient (NCC) reached 0.94, while the mean squared error (MSE) was only 0.11. Under extreme noise conditions of −13 dB, the SNR still attained 5.38 dB, with GWO-FMD producing 0.85 dB. The NCC was 0.81, and the MSE was 0.43, enabling the clear extraction of the fourth harmonic of the fault frequency from the envelope spectrum, whereas traditional methods could only identify the fundamental frequency. In the KAIST open-source dataset, the ISEI for fault components extracted by WAA-FMD reached 9.83 for inner ring faults (1.0 mm), surpassing GWO-FMD’s score of 4.52, marking a 117% improvement. When combined with the LGAF-Swin Transformer, the classification accuracy achieved 100%, compared to 93.67% for the Swin Transformer and 83.33% for ResNet. The F1 score was 0.983, with an AUC value of 0.999. In the validation of the measured data, WAA-FMD achieved an ISEI of 14.62 for rolling element faults, outperforming GWO-FMD’s score of 8.71, which represents a 68% improvement. The LGAF-Swin Transformer also demonstrated a test accuracy of 98.13%, with true positive rates (TPRs) of 0.974 for inner ring faults, 0.983 for outer ring faults, and 0.975 for rolling element faults, yielding an F1 score of 0.987 and a Kappa statistic of 0.975. This significantly outperformed CNN, which had an accuracy of 76.46%, and Swin Transformer, with an accuracy of 89.79%. The main contributions and conclusions of this study are summarized as follows:

This paper introduces an adaptive approach for optimizing FMD key parameters using WAA, which dynamically updates filter length and cyclic period by minimizing envelope entropy. This effectively mitigates the issues of iterative complexity and information overload in traditional FMD when processing long-sequence signals. A comparative experiment was conducted using inner raceway fault simulation signals with noise levels ranging from −5 dB to −13 dB, and the results demonstrate that, compared to GWO-FMD, DA-FMD, fixed-parameter FMD, and CEEMDAN, the proposed method achieves optimal performance in the SNR, NCC, and MSE metrics. Moreover, at −5 dB, −9 dB, and −13 dB noise levels, the envelope spectrum of the denoised signal reveals richer fault frequency harmonic components. Validation using the KAIST sound dataset further confirms that the proposed method outperforms GWO-FMD in both denoising effectiveness and fault feature extraction.
This paper proposes the LGAF-Swin Transformer model, which enhances receptive fields and computational efficiency through a bilateral expansion strategy and DSC. The model incorporates an LGAM and dynamic convolution kernels for optimized feature extraction, while an AFSM reduces feature redundancy. The AP module strengthens key feature focus, and an HAL replaces Softmax to improve complex feature fitting capabilities. Ablation experiments on the KAIST dataset indicate that the classification accuracies of ST, ST-BA, ST-DSC, ST-LGAM, ST-AFSM, and LGAF-Swin Transformer are 93.67%, 98.17%, 96.83%, 98.83%, 97.33%, and 100%, respectively, validating the effectiveness of the BA, DSC, LGAM, and AFSM modules in optimizing the model. In comparison experiments, CNN achieves an 88.67%/86.83% accuracy on the test set, while ResNet and LeNet, despite surpassing 90%, still fall short of the standard Swin Transformer’s 93.67%/94.83% performance.
An experimental analysis on real-world data shows that, for inner raceway, outer raceway, and rolling element faults, the HSN values of the denoised signals using WAA-FMD reach 4.37 dB, 7.68 dB, and 11.37 dB, respectively, while the ISEI indicators are 4.86, 9.83, and 14.62, all significantly higher than those of the four baseline methods. The LGAF-Swin Transformer achieves an average accuracy of 98.62% on the real-world audio dataset, significantly outperforming Swin Transformer (90.88%), ResNet (83.12%), LeNet (80.75%), and CNN (75.82%), demonstrating superior feature perception and classification capabilities in complex environments.
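The envelope entropy objective mentioned in the first contribution above can be sketched as follows. This shows only a commonly used definition of envelope entropy (Shannon entropy of the normalized Hilbert envelope); the WAA search and the FMD decomposition are not reproduced, and the test signals are synthetic. A lower value indicates a sparser, more impulsive envelope, which is why an optimizer would seek to minimize it.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the FFT-based analytic signal."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def envelope_entropy(x):
    """Shannon entropy of the envelope normalized to sum to one."""
    env = hilbert_envelope(np.asarray(x, dtype=float))
    p = env / env.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

n = 2048
k = np.arange(n)
rng = np.random.default_rng(0)

# Synthetic signals (illustration only): periodic decaying impacts vs. noise.
impulsive = np.exp(-(k % 256) / 10.0) * np.sin(2 * np.pi * 0.2 * k)
noise = rng.standard_normal(n)
tone = np.sin(2 * np.pi * 64 * k / n)       # constant envelope, maximum entropy

e_impulsive = envelope_entropy(impulsive)
e_noise = envelope_entropy(noise)
e_tone = envelope_entropy(tone)
print(f"impulsive={e_impulsive:.3f}, noise={e_noise:.3f}, tone={e_tone:.3f}")
```

In a parameter search, each candidate pair of filter length and cyclic period would be scored by the entropy of the resulting mode, and the pair with the lowest score would be retained.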

6.2. Prospects

Limitations of this study: Although the LGAF-Swin Transformer demonstrates excellent performance in suppressing Gaussian noise and diagnostic capabilities under laboratory conditions, challenges remain regarding computational load and memory usage when processing low sampling rates or long sequences. The WAA-FMD is highly sensitive to noise levels, often requiring significant time for parameter tuning. Furthermore, this study focuses solely on a single fault type and additive Gaussian noise, lacking a sufficient recognition capability for multi-source composite faults or impulse noise environments. Experiments were conducted under conditions where the bearing and acoustic sensors were fixed with no relative motion, and the applicability of this method in dynamic conditions such as high-speed trains or machine tools has yet to be validated.
Expanding fault types: Future research will introduce various fault modes such as cage cracks and lubrication faults, constructing a composite fault identification framework to enhance the model’s diagnostic capability for multi-source faults.
Multi-modal sensor fusion: Data will be synchronously collected from multiple sensors, including vibration, temperature, and current, employing cross-modal attention mechanisms or graph neural networks to collaboratively model the features from each channel, and achieving information complementarity and noise suppression, thereby further improving the robustness and accuracy of fault diagnosis.

Author Contributions

Conceptualization, H.W. (Hengdi Wang) and H.W. (Haokui Wang); methodology, H.W. (Hengdi Wang) and H.W. (Haokui Wang); software, J.X.; validation, Z.M. and J.X.; formal analysis, H.W. (Haokui Wang); investigation, J.X.; resources, Z.M.; data curation, H.W. (Haokui Wang); writing—original draft preparation, H.W. (Haokui Wang); writing—review and editing, H.W. (Haokui Wang); visualization, J.X.; supervision, H.W. (Hengdi Wang); project administration, J.X.; funding acquisition, H.W. (Hengdi Wang). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Program Project of the State Administration for Market Regulation (2023MK039); Ningbo Key Research and Development Program and “Unveiling and Leading” Project (2023T016); and National Natural Science Foundation of China Project: Research on Tribological Effects and Behavior Regulation Mechanisms of Magnetorheological Fluids (52305190) and Research on Grease Degradation Patterns and State Evaluation for High-power Wind Turbine Main Bearings (52105182).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Zikui Ma was employed by the company Schaeffler Trading (Shanghai) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Wang, H.; Deng, S.; Yang, J.; Liao, H. A fault diagnosis method for rolling element bearing (REB) based on reducing REB foundation vibration and noise-assisted vibration signal analysis. J. Mech. Eng. 2019, 233, 2574–2587.
2. Chen, J.; Xu, T.; Huang, Z.; Sun, T.; Li, X.; Ji, L.; Yang, H. Fault diagnosis of rolling bearings based on acoustic signals. J. Vib. Shock. 2023, 42, 237–244.
3. Cempel, C. Diagnostically oriented measures of vibration acoustical processes. J. Sound Vib. 1980, 73, 547–561.
4. Chen, S.; Peng, Z.; Zhou, P. Review of Signal Decomposition Theory and Its Applications in Machine Fault Diagnosis. J. Mech. Eng. 2020, 56, 91–107.
5. Wu, Z.; Huang, N. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
6. Yeh, J.; Shieh, J.; Huang, N. Complementary Ensemble Empirical Mode Decomposition: A Novel Noise Enhanced Data Analysis Method. Adv. Adapt. Data Anal. 2010, 2, 135–156.
7. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
8. Miao, Y.; Zhang, B.; Li, C.; Lin, J.; Zhang, D. Feature mode decomposition: New decomposition theory for rotating machinery fault diagnosis. IEEE Trans. Ind. Electron. 2022, 70, 1949–1960.
9. Shi, P.; Wu, S.; Yu, Y.; Zhang, Y.; Xu, F. Rolling bearing fault diagnosis based on attention mechanism and depth residual network. J. Yan Shan Univ. 2024, 48, 39–47.
10. Zhang, H.; Lu, G.; Zhan, M.; Zhang, B. Semi-supervised classification of graph convolutional networks with Laplacian rank constraints. Neural Process. Lett. 2022, 54, 2645–2656.
11. Hu, Y.; Guo, L.; Zhang, Z.; Kang, J.; Cui, C.; Qiao, G. Fault diagnosis method based on optimized convolutional neural network for chemical process. J. Yan Shan Univ. 2024, 48, 550–560.
12. Wu, H.; Xu, Y.; Zhou, J.; Xie, X.; Peng, D. Fault diagnosis of pumping unit based on convolutional neural network. J. Yan Shan Univ. 2024, 48, 30–38.
13. Ding, X.; He, Q. Energy-fluctuated multiscale feature learning with deep convnet for intelligent spindle bearing fault diagnosis. IEEE Trans. Instrum. Meas. 2017, 66, 1926–1935.
14. Lou, X.; Loparo, K.A. Bearing fault diagnosis based on wavelet transform and fuzzy inference. Mech. Syst. Signal Process. 2004, 18, 1077–1095.
15. Guo, X.; Chen, L.; Shen, C. Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis. Measurement 2016, 93, 490–502.
16. Ruan, D.; Wang, J.; Yan, J.; Gühmann, C. CNN parameter design based on fault signal analysis and its application in bearing fault diagnosis. Adv. Eng. Inform. 2023, 55, 101877.
17. Fernández-Francos, D.; Martínez-Rego, D.; Fontenla-Romero, O.; Alonso-Betanzos, A. Automatic bearing fault diagnosis based on one-class ν-SVM. Comput. Ind. Eng. 2013, 64, 357–365.
18. Liang, P.; Wang, W.; Yuan, X.; Liu, S.; Zhang, L.; Cheng, Y. Intelligent fault diagnosis of rolling bearing based on wavelet transform and improved ResNet under noisy labels and environment. Eng. Appl. Artif. Intell. 2022, 115, 105269.
19. Song, B.; Liu, Y.; Fang, J.; Liu, W.; Zhong, M.; Liu, X. An optimized CNN-BiLSTM network for bearing fault diagnosis under multiple working conditions with limited training samples. Neurocomputing 2024, 574, 127284.
20. Yu, Z.; Zhang, C.; Deng, C. An improved GNN using dynamic graph embedding mechanism: A novel end-to-end framework for rolling bearing fault diagnosis under variable working conditions. Mech. Syst. Signal Process. 2023, 200, 110534.
21. Liu, X.; Liu, S.; Xiang, J.; Sun, R. A transfer learning strategy based on numerical simulation driving 1D Cycle-GAN for bearing fault diagnosis. Inf. Sci. 2023, 642, 119175.
22. Ni, Z.; Tong, Y.; Song, Y.; Wang, R. Enhanced Bearing Fault Diagnosis in NC Machine Tools Using Dual-Stream CNN with Vibration Signal Analysis. Processes 2024, 12, 1951.
23. Wang, L.; Ping, D.; Wang, C.; Jiang, S.; Shen, J.; Zhang, J. Fault diagnosis of rotating machinery bearings based on improved DCNN and WOA-DELM. Processes 2023, 11, 1928.
24. Zhang, W.; Peng, G.; Li, C.; Chen, Y.; Zhang, Z. A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals. Sensors 2017, 17, 425.
25. Cheng, J.; De, W. Weighted average algorithm: A novel meta-heuristic optimization algorithm based on the weighted average position concept. Knowl. Based Syst. 2024, 305, 112564.
26. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
27. Wang, H.; Wang, Z.; He, Y. Application of parameter adaptive FMD in early fault diagnosis of bearings. J. Vib. Eng. 2025, 4, 1–11. (In Chinese)
28. Qiu, S.; Anwar, S.; Barnes, N. Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1757–1767.
29. Shao, Y. Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration. arXiv 2024, arXiv:2411.09604.
30. Cheng, R.; Razani, R.; Taghavi, E.; Li, E.; Liu, B. 2-s3net: Attentive feature fusion with adaptive feature selection for sparse semantic segmentation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12547–12556.
31. Jung, W.; Kim, S.-H.; Yun, S.-H.; Bae, J.; Park, Y.-H. Vibration, acoustic, temperature, and motor current dataset of rotating machine under varying operating conditions for fault diagnosis. Data Brief 2023, 48, 109049.
32. Wang, H.; Xie, J. Fault Diagnosis of Rolling Bearings Based on Acoustic Signals in Strong Noise Environments. Appl. Sci. 2025, 15, 1389.

Figure 1. Convergence curves of the fitted values for the selected functions under different optimization methods. (a) Convergence curve of the F1 test function; (b) convergence curve of the F2 test function; (c) convergence curve of the F3 test function; (d) convergence curve of the F4 test function.

Figure 2. Flow chart of WAA-optimized FMD.

Figure 3. Model structure diagram of Swin Transformer.

Figure 4. Structural diagram of LGAF-Swin Transformer model.

Figure 5. Structure diagram of depthwise convolution.

Figure 6. Structure diagram of pointwise convolution.

Figure 7. Structure diagram of local–global attention.

Figure 8. Fault diagnosis model of LGAF-Swin Transformer.

Figure 9. Convergence curves of the algorithm optimizing FMD under different SNRs. (a) Optimization convergence curves at −5 dB; (b) optimization convergence curves at −7 dB; (c) optimization convergence curves at −9 dB; (d) optimization convergence curves at −11 dB; (e) optimization convergence curves at −13 dB.

Figure 10. Time domain diagrams of simulated inner raceway faults under different SNRs. (a) Waveform diagram of signal impact; (b) time domain plot with −5 dB noise added; (c) time domain plot with −7 dB noise added; (d) time domain plot with −9 dB noise added; (e) time domain plot with −11 dB noise added; (f) time domain plot with −13 dB noise added.

Figure 11. Noise reduction indicators of various methods under different SNRs. (a) SNR of different methods under various SNRs; (b) MSE of different methods under various SNRs; (c) NCC of different methods under various SNRs.

Figure 12. Envelope spectra of optimal components of each noise reduction method under different SNRs. (a) CEEMDAN (−5 dB); (b) CEEMDAN (−9 dB); (c) CEEMDAN (−13 dB); (d) fixed-parameter FMD (−5 dB); (e) fixed-parameter FMD (−9 dB); (f) fixed-parameter FMD (−13 dB); (g) DA-FMD (−5 dB); (h) DA-FMD (−9 dB); (i) DA-FMD (−13 dB); (j) GWO-FMD (−5 dB); (k) GWO-FMD (−9 dB); (l) GWO-FMD (−13 dB); (m) proposed method (−5 dB); (n) proposed method (−9 dB); (o) proposed method (−13 dB).

Figure 13. Structural diagram of the test bench.

Figure 14. Convergence curves of different algorithms optimizing FMD. (a) Inner raceway (0.3 mm); (b) inner raceway (1.0 mm); (c) outer raceway (0.3 mm); (d) outer raceway (1.0 mm).

Figure 15. Envelope spectrum of the optimal components decomposed by WAA-FMD and GWO-FMD. (a) WAA-FMD (inner raceway fault with a diameter of 0.3 mm); (b) GWO-FMD (inner raceway fault with a diameter of 0.3 mm); (c) WAA-FMD (inner raceway fault with a diameter of 1.0 mm); (d) GWO-FMD (inner raceway fault with a diameter of 1.0 mm); (e) WAA-FMD (outer raceway fault with a diameter of 0.3 mm); (f) GWO-FMD (outer raceway fault with a diameter of 0.3 mm); (g) WAA-FMD (outer raceway fault with a diameter of 1.0 mm); (h) GWO-FMD (outer raceway fault with a diameter of 1.0 mm).

Figure 16. Data graph of sample segmentation.

Figure 17. Transformation of time–frequency graph. (a) Time domain waveform of inner raceway fault (diameter: 0.3 mm); (b) time domain waveform of inner raceway fault (diameter: 1.0 mm); (c) time–frequency representation of inner raceway fault (diameter: 0.3 mm); (d) time–frequency representation of inner raceway fault (diameter: 1.0 mm); (e) time domain waveform of outer raceway fault (diameter: 0.3 mm); (f) time domain waveform of outer raceway fault (diameter: 1.0 mm); (g) time–frequency representation of outer raceway fault (diameter: 0.3 mm); (h) time–frequency representation of outer raceway fault (diameter: 1.0 mm); (i) time domain waveform of the normal state; (j) time–frequency representation of the normal state.

Figure 18. Comparison of different models. (a) Iteration curve of training loss function; (b) accuracy curve for training set; (c) iteration curve of validation loss function; (d) accuracy curve for validation set.

Figure 19. Experimental flowchart.

Figure 20. Images of bearings with different fault types. (a) Image of inner raceway fault; (b) image of outer raceway fault; (c) image of rolling element fault.

Figure 21. Time domain graphs under different fault types. (a) Time domain graph of the inner raceway fault; (b) time domain graph of the outer raceway fault; (c) time domain graph of the rolling element fault.

Figure 22. Convergence curves of different algorithms optimizing FMD. (a) Inner raceway; (b) outer raceway; (c) rolling element.

Figure 23. Envelope spectra of acoustic signals for different fault types after denoising with different methods. (a) Envelope spectrum of the inner raceway fault after denoising using the proposed method; (b) envelope spectrum of the inner raceway fault after denoising using GWO-FMD; (c) envelope spectrum of the outer raceway fault after denoising using the proposed method; (d) envelope spectrum of the outer raceway fault after denoising using GWO-FMD; (e) envelope spectrum of the rolling element fault after denoising using the proposed method; (f) envelope spectrum of the rolling element fault after denoising using GWO-FMD.

Figure 24. HSN and ISEI values of optimal components under different noise reduction methods. (a) HSN and ISEI under the inner raceway fault; (b) HSN and ISEI under the outer raceway fault; (c) HSN and ISEI under the rolling element fault.

Figure 25. Comparison of different models under experimental datasets. (a) Iteration curve of training set loss function; (b) accuracy curve for training set; (c) iteration curve of validation set loss function; (d) accuracy curve for validation set.

Figure 26. Confusion matrices for different models under test sets. (a) Confusion matrix of LGAF-Swin Transformer; (b) confusion matrix of Swin Transformer; (c) confusion matrix of ResNet; (d) confusion matrix of LeNet; (e) confusion matrix of CNN.

Figure 27. Ten tests using different models on the test set.

Table 1. Summary of the research methods in this article.

Algorithm Type	Algorithm Name	Advantages	Disadvantages
Parameter Optimization	DA	Few parameters and easy to adjust.	Prone to premature convergence; efficiency decreases severely in high-dimensional problems.
Parameter Optimization	GWO	Strong global search capability.	Prone to falling into local optima, with suboptimal search precision.
Parameter Optimization	WAA	Relatively strong optimization efficiency and robustness, with adaptive capabilities.	Sensitive to weight settings (this paper sets the weights based on the literature [25] and experimental results).
Signal Processing	CEEMDAN	Suppresses modal aliasing with minimal noise residue.	High computational load, sensitivity to noise, and endpoint effects.
Signal Processing	FMD	High computational efficiency, enabling rapid decomposition and processing of bearing signals; signal complexity and noise interference have minimal impact on the decomposition results.	Lacks parameter self-adaptation (this paper employs WAA to optimize the key parameters of FMD).
Deep Learning	CNN	The model structure is relatively flexible.	Sensitive to noise; requires a high volume of data.
Deep Learning	LeNet	Relatively simple structure that is easy to understand and implement.	Relatively weak feature extraction capability and limited model capacity.
Deep Learning	ResNet	Strong feature representation and adaptability.	High computational complexity and large model parameter size.
Deep Learning	Swin Transformer	Strong capability in capturing global features, high flexibility, and excellent adaptability.	Insufficient ability to capture long-range dependencies, along with high computational resource and time costs (mitigated in this paper through the use of LGAF).

Table 2. Parameter table of LGAF-Swin Transformer model.

Layer Name	Output Size	Kernel Size	Stride	Number of Layers	Output Channels
Image	256 × 256	—	—	—	3
BA Layer	512 × 512	3 × 3	1	1	64
DSC block	128 × 128	3 × 3	2	1	256
AP Layer	128 × 128	3 × 3	1	1	256
LGAM block	128 × 128	3 × 3	1	2	256
LGA block	64 × 64	3 × 3	2	1	256
AFSM block	64 × 64	3 × 3	1	1	256
DC Layer	32 × 32	—	—	1	256
Global Pool	32 × 32	4 × 4	2	1	512
HAL	32 × 32	1 × 1	—	1	—
FC Layer	1 × 4	—	—	—	4

Table 3. Improvements in WAA-FMD.

Method	Characteristics	Improvements in WAA-FMD
CEEMDAN	A variant of empirical mode decomposition based on additive noise, suitable for non-stationary signals.	CEEMDAN is sensitive to noise and lacks an adaptive mechanism for parameter tuning. The WAA-FMD method introduces the ISEI to dynamically select the optimal IMF, enhancing the objectivity and accuracy of modal selection.
Fixed-parameter FMD	The original FMD, with fixed parameters, cannot adapt to different operating conditions.	WAA-FMD can adaptively adjust its FMD parameters under varying noise intensities and signal characteristics, thereby avoiding issues of under-decomposition or over-decomposition.
DA-FMD	Uses DA to optimize the FMD parameters; has a certain search capability.	DA explores strongly but converges slowly, whereas WAA integrates multi-solution information using a weighted strategy, resulting in faster convergence and better stability.
GWO-FMD	Uses GWO to optimize the FMD parameters; converges well but is prone to falling into local optima.	WAA considers the quality of multiple solutions and enhances population diversity through a balancing strategy, thereby avoiding local optima and improving global search capability.

Table 4. Fault types of bearing 6205.

Fault Type	Fault Diameter (mm)	Speed (RPM)
Normal	-	3010
Inner raceway fault	0.3	3010
Inner raceway fault	1.0	3010
Outer raceway fault	0.3	3010
Outer raceway fault	1.0	3010

Table 5. Hyperparameter settings.

Parameter Name	Numerical/Configuration	Explanation
Batch Size	16	Chosen under GPU memory constraints (RTX 4060, 16 GB VRAM); the small batch size also introduces moderate gradient noise, which helps prevent overfitting.
Maximum Epochs	50/100	Maximum training epochs.
Early Stopping (patience)	10	Training stops early if the validation loss shows no improvement for 10 consecutive epochs, to prevent overfitting.
Optimizer	SGD (lr = 0.001; momentum = 0.9; weight_decay = 1 × 10⁻⁴)	Enhances generalization capability.
Learning Rate Scheduler	StepLR (γ = 0.1 at epochs = [30,40])	Grid search validation (range 0.0001–0.01), balancing convergence speed and stability.
Loss Function	CrossEntropyLoss	Applicable to multi-class classification tasks.
Input Dimensions	256 × 256	Dimensions of the time–frequency map output by CWT.
Patch Size	4 × 4	Consistent with the original Swin Transformer paper.
Embed Dim	100	Embedding dimension.
Window Size	7	Local attention window size.
MLP Ratio	4.0	FFN hidden layer multiplier.
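
The learning rate schedule and early stopping rule in Table 5 can be sketched in plain Python. This is a minimal illustration and not the authors' training code: `stepped_lr` and `EarlyStopping` are hypothetical names, and epochs are assumed to be counted from 0. If the training framework is PyTorch, a decay at a list of milestone epochs corresponds to `MultiStepLR` rather than the fixed-interval `StepLR`.

```python
def stepped_lr(epoch, base_lr=0.001, gamma=0.1, milestones=(30, 40)):
    """Learning rate at a 0-based epoch: multiply by gamma once per milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Learning rate before the first milestone, after it, and after the second one.
for epoch in (0, 29, 30, 39, 40, 49):
    print(epoch, stepped_lr(epoch))
```

With the Table 5 values, the rate stays at 0.001 through epoch 29, drops to about 0.0001 at epoch 30, and to about 0.00001 at epoch 40. Early stopping can end training before either drop takes effect if the validation loss plateaus.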

Table 6. Overview of optimization strategies.

Variant	Module	Key Parameters	Explanation
ST-BA	BA	Kernel size: 3 × 3; Channels: 256→512.	Enhance the density of the time–frequency diagram; strengthen the input.
ST-DSC	DSC	Depthwise 3 × 3 + Pointwise 1 × 1; Channel: 256→256; BatchNorm + ReLU.	Reduce the number of parameters; accelerate shallow feature extraction.
ST-LGAM	LGAM	Local convolution 3 × 3; global convolution 7 × 7.	Integrate multi-scale features; enhance cross-region modeling.
ST-AFSM	AFSM	HAL activation β = 0.5; dynamic threshold θ = 0.7.	Select high-discriminative channels and suppress noise redundancy.
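
The parameter saving claimed for the DSC variant can be checked with simple arithmetic. The sketch below counts only convolution weights for the configuration listed in Table 6 (depthwise 3 × 3 plus pointwise 1 × 1, 256→256 channels); biases and BatchNorm parameters are ignored, and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

std = standard_conv_params(256, 256, 3)
dsc = depthwise_separable_params(256, 256, 3)
print(std, dsc, round(dsc / std, 4))  # 589824 67840 0.115
```

For this layer the depthwise separable form needs roughly 11.5% of the weights of a standard 3 × 3 convolution, which equals 1/9 + 1/256.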

Table 7. Accuracy of test set.

Model	KAIST Accuracy (%)
ST	93.67
ST-BA	98.17
ST-DSC	96.83
ST-LGAM	98.33
ST-AFSM	97.33
LGAF-Swin Transformer	100

Table 8. Diagnostic results of different models.

Model	TPR	FNR	F1 Score	Kappa Value	AUC Value
LGAF-Swin Transformer	100%	0%	100%	100%	100%
Swin Transformer	93.65%	6.35%	93.67%	92.08%	96.04%
ResNet	92.50%	7.50%	92.90%	90.63%	95.32%
LeNet	93.33%	6.67%	93.34%	91.67%	96.23%
CNN	88.67%	11.33%	88.68%	85.83%	93.98%

Table 9. Accuracy of different models under test sets.

Model	KAIST Accuracy (%)
CNN	88.67
ResNet	93.33
LeNet	92.50
Swin Transformer	93.67
LGAF-Swin Transformer	100.00

Table 10. Parameters of the acoustic sensor.

Parameter	Value
Sensitivity, mV/Pa	50
Dynamic range, dB	20–142
Frequency range, Hz	10–20,000
Diameter φ, mm	12.7
Frequency response characteristics	free field

Table 11. Fault frequencies of bearing 30205.

Fault Characteristic Frequency	Value
Inner raceway fault frequency $f_{i}$ , Hz	775
Outer raceway fault frequency $f_{o}$ , Hz	557
Rolling element fault frequency $f_{b}$ , Hz	241
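
Characteristic frequencies such as those in Table 11 are usually derived from the bearing geometry and shaft speed. The sketch below uses the standard kinematic relations for a fixed outer ring and a rotating inner ring without slip. The geometry and speed in the example are hypothetical placeholders, not the FAG 30205 data, so the printed numbers are not expected to match Table 11. Some references take the rolling element fault frequency as twice the spin frequency computed here.

```python
import math

def bearing_fault_frequencies(shaft_hz, n_rollers, roller_d, pitch_d, contact_deg):
    """Return inner raceway, outer raceway, and rolling element spin frequencies in Hz."""
    ratio = (roller_d / pitch_d) * math.cos(math.radians(contact_deg))
    f_o = 0.5 * n_rollers * shaft_hz * (1 - ratio)                    # outer raceway
    f_i = 0.5 * n_rollers * shaft_hz * (1 + ratio)                    # inner raceway
    f_b = 0.5 * (pitch_d / roller_d) * shaft_hz * (1 - ratio ** 2)    # rolling element spin
    return {"f_i": f_i, "f_o": f_o, "f_b": f_b}

# Hypothetical geometry (assumption): 17 rollers, 6.5 mm roller diameter,
# 38 mm pitch diameter, 13 degree contact angle, 50 Hz shaft rotation.
freqs = bearing_fault_frequencies(50.0, 17, 6.5, 38.0, 13.0)
print(freqs)
```

A useful sanity check on any such table is that the inner and outer raceway frequencies sum to the number of rolling elements times the shaft rotation frequency, since the two formulas differ only in the sign of the geometry term.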

Table 12. Explanation of test data samples.

Fault Types	Training Set	Validation Set	Test Set	Label Value
Normal Condition	360	120	120	0
Inner raceway fault	360	120	120	1
Outer raceway fault	360	120	120	2
Rolling element fault	360	120	120	3
Total	1440	480	480

Table 13. Diagnostic results of different models.

Model	TPR	FNR	F1 Score	Kappa Value	AUC Value
LGAF-Swin Transformer	98.13%	1.87%	98.23%	97.50%	98.78%
Swin Transformer	89.79%	10.21%	89.80%	86.39%	93.19%
ResNet	83.33%	16.67%	83.40%	77.78%	88.89%
LeNet	80.42%	19.58%	80.15%	73.89%	86.94%
CNN	76.46%	23.54%	76.43%	68.61%	84.31%
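
The relationship between the columns of Table 13 can be illustrated from a confusion matrix. The sketch below assumes that TPR is the macro-averaged recall, FNR is its complement, F1 is macro-averaged, and Kappa is Cohen's kappa; these are common conventions and are not confirmed by the source. The matrix is invented for illustration (120 test samples per class, as in Table 12) and is not the paper's result. AUC is omitted because it needs class scores rather than hard predictions.

```python
def classification_metrics(cm):
    """Compute accuracy, macro TPR/FNR/F1, and Cohen's kappa from a square confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    recalls = [cm[i][i] / row_sums[i] for i in range(k)]
    precisions = [cm[i][i] / col_sums[i] if col_sums[i] else 0.0 for i in range(k)]
    f1s = [2 * p * r / (p + r) if (p + r) else 0.0 for p, r in zip(precisions, recalls)]

    accuracy = sum(cm[i][i] for i in range(k)) / total
    chance = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2
    tpr = sum(recalls) / k
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "fnr": 1.0 - tpr,
        "f1": sum(f1s) / k,
        "kappa": (accuracy - chance) / (1.0 - chance),
    }

# Hypothetical 4-class confusion matrix with 120 samples per true class.
cm = [
    [118, 1, 1, 0],
    [2, 115, 2, 1],
    [0, 3, 116, 1],
    [1, 0, 2, 117],
]
metrics = classification_metrics(cm)
print(metrics)
```

With balanced classes the chance agreement is 1/4, so kappa reduces to (accuracy − 0.25)/0.75 and macro TPR equals accuracy. The rows of Table 13 are consistent with this: for ResNet, (0.8333 − 0.25)/0.75 ≈ 0.7778.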

Table 14. Accuracy of different models under test sets.

Model	Experimental Dataset Accuracy (%)
CNN	75.82 ± 2.49
ResNet	83.12 ± 3.33
LeNet	80.75 ± 3.25
Swin Transformer	90.88 ± 1.50
LGAF-Swin Transformer	98.62 ± 0.29

Table 15. Metrics of various models on the NU218 test set.

Model	TPR	FNR	F1 Score	Kappa Value	AUC Value
LGAF-Swin Transformer	98.54%	1.46%	98.64%	98.05%	99.03%
Swin Transformer	91.67%	8.33%	91.60%	88.89%	95.00%
ResNet	84.58%	15.42%	84.56%	79.44%	90.90%
LeNet	79.17%	20.83%	79.86%	82.22%	87.50%
CNN	77.92%	22.08%	77.75%	70.56%	86.67%

Table 16. Metrics of each model on the 7205 test set.

Model	TPR	FNR	F1 Score	Kappa Value	AUC Value
LGAF-Swin Transformer	98.19%	1.81%	98.40%	97.50%	98.75%
Swin Transformer	90.52%	9.48%	90.20%	88.05%	94.03%
ResNet	82.92%	17.08%	82.80%	77.23%	88.61%
LeNet	76.88%	23.12%	76.90%	69.17%	84.58%
CNN	74.79%	25.21%	75.0%	66.39%	83.19%

Table 17. Detection results of generalization test.

Bearing Model Number	Accuracy (%)
NU 218	98.75 ± 0.21
7205	98.44 ± 0.52

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

