Bearing Fault Vibration Signal Denoising Based on Adaptive Denoising Autoencoder

Abstract: Vibration signal analysis is regarded as a fundamental approach in diagnosing faults in rolling bearings, and recent advancements have shown notable progress in this domain. However, the presence of substantial background noise often results in the masking of these fault signals, posing a significant challenge for researchers. In response, an adaptive denoising autoencoder (ADAE) approach is proposed in this paper. The data representations are learned by the encoder through convolutional layers, while the data reconstruction is performed by the decoder using deconvolutional layers. Both the encoder and decoder incorporate adaptive shrinkage units to simulate denoising functions, effectively removing interfering information while preserving sensitive fault features. Additionally, dropout regularization is applied to sparsify the network and prevent overfitting, thereby enhancing the overall expressive power of the model. To further enhance ADAE's noise resistance, shortcut connections are added. Evaluation using publicly available datasets under scenarios with known and unknown noise demonstrates that ADAE effectively enhances the signal-to-noise ratio in strongly noisy backgrounds, facilitating accurate diagnosis of faults in rolling bearings.


Introduction
Rolling bearings are crucial components in rotating machinery, significantly influencing equipment reliability and stability [1]. However, their prolonged exposure to complex working environments makes them susceptible to malfunctions, which can negatively impact machine performance and pose safety risks [2,3]. Effective fault diagnosis techniques for rolling bearings are therefore essential [4,5].
Empirical mode decomposition (EMD) and its variants, such as ensemble EMD (EEMD), have been widely used for non-stationary signal analysis. EMD decomposes a signal into intrinsic mode functions (IMFs) but can suffer from mode mixing. Variational mode decomposition (VMD) offers an alternative by decomposing the signal into modes with specific sparsity properties in the frequency domain, yet it requires the number of modes to be specified in advance and can be sensitive to noise.
The wavelet transform (WT) provides a multi-resolution analysis of signals and has been effective in fault diagnosis. It offers a time-frequency representation, making it suitable for transient feature extraction. However, the choice of mother wavelet and decomposition level significantly influences its performance.
Threshold denoising techniques, often combined with WT, apply hard or soft thresholds to the decomposed signal coefficients to reduce noise. Although these methods can enhance signal clarity, they are typically heuristic and may not adapt well to varying noise levels.
Recent advancements in deep learning, particularly convolutional neural networks (CNNs), have significantly enhanced fault diagnosis capabilities by automating feature extraction from raw data [21][22][23]. CNNs excel in learning hierarchical representations, which are vital for distinguishing fault features from noise [24][25][26]. Studies have demonstrated that CNN-based models outperform traditional methods in both accuracy and robustness under varying noise conditions [27][28][29]. These models can capture complex patterns and interactions within the data that are often missed by traditional techniques.
Building upon these advancements, we propose an adaptive denoising autoencoder (ADAE) framework. The ADAE framework introduces a novel adaptive shrinkage unit (ASU) that employs local attention mechanisms to dynamically adjust shrinkage coefficients. This approach effectively removes noise while preserving critical fault features. Additionally, the integration of dropout regularization within the encoder enhances the network's resilience by simulating real-world noise conditions during training. This dual strategy of local attention and dropout regularization not only improves denoising performance but also ensures the model's robustness and adaptability across different industrial applications.
The ADAE framework holds broad applicability in industries reliant on precise machinery health monitoring, including the aerospace, manufacturing, and energy sectors.By effectively mitigating noise interference in vibration signals, ADAE paves the way for early detection of equipment anomalies and predictive maintenance strategies, thereby reducing unscheduled downtime and maintenance costs.
In summary, our ADAE framework addresses the limitations of traditional denoising techniques and leverages the strengths of CNNs to provide a more adaptable and accurate solution for vibration signal analysis in noisy environments. This work represents a significant step forward in the development of robust diagnostic tools for industrial applications.

Problem Statement
In rolling bearings, the measured vibration signal x is a combination of the original fault signal s and a noise signal n, i.e., x = s + n. In real-world scenarios, these signals are often contaminated with noise from various sources, typically modeled as additive white Gaussian noise (AWGN). The goal is to remove the noise n from the observed vibration signal x to recover the original fault pulse signal s. Since obtaining a pure signal is challenging, we use noisy signals as the training set, adding Gaussian white noise to simulate the contamination and train the model. This approach allows the model to learn the characteristics of the noise within the contaminated signals and extract the clean signal.
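As a concrete illustration, corrupting a signal with AWGN at a prescribed SNR can be sketched as below (a minimal NumPy sketch; the function name `add_awgn` and the toy sinusoid are illustrative, not part of the original experiments):

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Corrupt a clean 1-D signal with additive white Gaussian noise
    whose power is scaled so that the resulting SNR (in dB) is snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Toy periodic signal corrupted at -6 dB, the hardest level used later
t = np.linspace(0, 1, 2048, endpoint=False)
clean = np.sin(2 * np.pi * 50 * t)
noisy = add_awgn(clean, snr_db=-6, rng=np.random.default_rng(0))
```

The noise standard deviation follows directly from the SNR definition in dB, SNR = 10 log10(P_s / P_n).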
For the training process, a training set D = {(x_i, s_i)}, i = 1, ..., N, is utilized, where x_i and s_i denote the noisy and clean signals, respectively. The network parameters are trained to minimize the loss L between the clean signal s_i and the network output ŝ_i.

Basic Components of CNN
Convolutional neural networks (CNNs) are comprised of various fundamental components that play essential roles in processing and extracting features from input data. These components include convolution (Conv), deconvolution (Deconv), nonlinear activation functions (ReLU), batch normalization (BN), and fully connected (FC) layers, among others [30,31].

Dropout: Addressing Overfitting in Deep Learning
Dropout [32] is a widely adopted technique in deep learning models, renowned for its efficacy in mitigating overfitting. By randomly dropping out neurons during training, Dropout encourages the network to learn more robust and diverse representations of the data. This prevents the model from relying too heavily on specific features or training samples, thereby improving its generalization performance on unseen data. Dropout is straightforward to implement and does not require significant modifications to the network architecture, making it a popular choice among researchers and practitioners in the field of deep learning. Its ability to enhance model performance while maintaining simplicity has contributed to its widespread adoption across various applications.
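The mechanism can be sketched as inverted dropout, the variant most frameworks implement (a hedged NumPy sketch; the `dropout` helper name is illustrative):

```python
import numpy as np

def dropout(x, p, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p during
    training and rescale survivors by 1/(1-p), so the expected output
    matches the input; the layer is the identity at inference time."""
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p   # keep mask, True with prob. 1-p
    return x * mask / (1.0 - p)
```

At inference no neurons are dropped and no rescaling is needed, which is why the rescaling happens during training.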

Theoretical Basis of Autoencoders
Autoencoders [33] represent a class of neural networks designed to replicate input data with precision, aiming to distill essential features while discarding redundant information. Comprising input, hidden, and output layers, autoencoders are structured to encode input data into a lower-dimensional latent space and subsequently decode it back to its original form [34]. Positioned between the input and output layers, the hidden layer in autoencoders acts as a bottleneck, deliberately restricting the network's capacity to learn a compressed representation of the input data.
This component transforms the input data x into a latent representation h by means of a nonlinear transformation, h = f(Wx + b), where f is the activation function, W the weight matrix, and b the bias vector.
Subsequently, the decoder network reconstructs the latent representation h back into an approximation x̂ of the original input: x̂ = f'(W'h + b'), where f', W', and b' are the activation function, weight matrix, and bias vector of the decoder, respectively.
The performance of an autoencoder is evaluated based on its ability to reconstruct the input faithfully. The reconstruction error [35], denoted L(x, x̂), measures the discrepancy between the original input x and its reconstruction x̂. The optimization process fine-tunes the network parameters to minimize the reconstruction error across the training samples, so that the learned representations can faithfully reconstruct the original inputs.
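The encode/decode mappings can be sketched as follows (a minimal NumPy sketch with an illustrative 16-to-4 bottleneck; the random weights stand in for trained parameters):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def encode(x, W, b):
    # h = f(Wx + b): nonlinear projection into the latent space
    return relu(W @ x + b)

def decode(h, W_dec, b_dec):
    # x_hat = f'(W'h + b'): reconstruction from the latent code
    return W_dec @ h + b_dec  # linear output for real-valued signals

rng = np.random.default_rng(0)
x = rng.normal(size=16)
W, b = rng.normal(size=(4, 16)) * 0.1, np.zeros(4)        # 16 -> 4
W_dec, b_dec = rng.normal(size=(16, 4)) * 0.1, np.zeros(16)  # 4 -> 16
x_hat = decode(encode(x, W, b), W_dec, b_dec)
mse = np.mean((x - x_hat) ** 2)  # reconstruction error to minimize
```

Training would adjust W, b, W_dec, and b_dec by gradient descent on the mean squared reconstruction error.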

Shortcut Connection
Challenges related to gradients, such as gradient vanishing and gradient explosion, are prevalent in the training of deep neural networks and often lead to convergence issues or sluggish training. To ensure that gradients converge meaningfully rather than diverging, this study adopts several strategies: adaptive learning rate scheduling via the Adam algorithm, which adjusts the learning rate dynamically during training; proper weight initialization techniques; regularization methods; dropout [32]; and skip connections [36]. Skip connections enable the network to bypass one or more layers, directly adding the input from a previous layer to the output of a subsequent layer, thereby facilitating smoother gradient flow and enhancing training stability. This facilitates: 1. Gradient Propagation: providing a direct path for gradients to propagate back to earlier layers helps mitigate the vanishing gradient problem, thus facilitating smoother training. 2. Information Flow: faster information flow within the network is enabled, aiding in quicker and more efficient learning of effective feature representations. 3. Feature Reuse: accessing original inputs or feature maps from preceding layers directly prevents information loss, crucial for tasks such as image super-resolution and segmentation, where preserving detailed information is vital.
In summary, the incorporation of shortcut connections enhances training efficiency, stability, and generalization ability, leading to improved performance across various deep learning tasks.
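The identity-addition idea can be sketched in a few lines (illustrative NumPy sketch; `layer_fn` stands in for any learned transformation F, so the block computes y = F(x) + x):

```python
import numpy as np

def residual_block(x, layer_fn):
    """Shortcut (skip) connection: add the input directly to the output
    of the wrapped transformation, y = F(x) + x, so gradients can flow
    through the identity path even when F's gradients shrink."""
    return layer_fn(x) + x

# Toy transformation whose small output would otherwise dominate
f = lambda x: 0.1 * np.tanh(x)
x = np.ones(8)
y = residual_block(x, f)
```

During backpropagation the derivative of the identity term is exactly 1, which is what keeps gradients from vanishing through the shortcut path.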

Adaptive Shrinkage Unit (ASU)
Traditional denoising techniques such as filter design, wavelet thresholding, and sparse representation are effective in retaining useful information while removing noise from signals. However, these methods often demand substantial domain expertise. Deep learning approaches such as the deep residual shrinkage network (DRSN) [37] combine soft thresholding with deep learning to automatically learn channel thresholds. Nevertheless, manually defining threshold functions, whose impacts vary, makes it challenging to adaptively select appropriate functions for specific problems.
The ASU employs a local attention mechanism to train shrinkage coefficients, which are then used to attenuate noise in the input signal.This approach allows the ASU to adaptively determine the optimal shrinkage values for different parts of the signal, thus effectively removing noise while preserving essential fault-related features.
Training of Shrinkage Coefficients: The shrinkage coefficients are trained using convolutional and deconvolutional operations, as depicted in Figure 1. Specifically, the ASU consists of a convolutional layer followed by a sigmoid activation function to constrain the shrinkage coefficients within the [0, 1] range. The sigmoid function ensures that the coefficients scale the input signal appropriately, reducing the noise components while retaining significant fault information.
Significance in the Denoising Process: The ASU enhances the network's generalization ability and overall performance by reducing the reliance on manually defined threshold functions and improving adaptability to different datasets and noise levels. By incorporating the ASU within both the encoder and decoder of our adaptive denoising autoencoder (ADAE) framework, we ensure that noise is effectively attenuated during both feature extraction and signal reconstruction phases, resulting in more accurate fault diagnosis.
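A simplified one-dimensional sketch of the ASU's gating follows, assuming a single convolution kernel followed by a sigmoid (the kernel here is random and untrained, purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def asu(x, kernel, bias):
    """Adaptive shrinkage unit (simplified 1-D sketch): a learned
    convolution followed by a sigmoid yields shrinkage coefficients
    alpha in (0, 1), which gate the input element-wise."""
    alpha = sigmoid(np.convolve(x, kernel, mode="same") + bias)
    return alpha * x, alpha

rng = np.random.default_rng(0)
x = rng.normal(size=64)                  # noisy input segment
kernel, bias = rng.normal(size=5) * 0.1, 0.0
denoised, alpha = asu(x, kernel, bias)
```

Because every coefficient lies strictly between 0 and 1, each output sample is attenuated, never amplified, which is the shrinkage behavior described above.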

Architecture of the Proposed Method
Based on the fundamental denoising autoencoder architecture, our work introduces an innovative vibration signal denoising framework, the adaptive denoising autoencoder (ADAE), illustrated in Figure 2. The proposed ADAE framework consists of several key modules: Encoder: Utilizes convolutional layers to learn data representations, focusing on compressing temporal information.
Decoder: Employs deconvolutional layers for data reconstruction, ensuring the recovery of signal details.
Adaptive Shrinkage Unit (ASU): Trains shrinkage coefficients through local attention mechanisms, effectively removing noise while preserving fault features.
Dropout Regularization: Introduced after the initial convolutional layer to mimic real-world noise conditions, enhancing the network's resilience and adaptability.
Shortcut Connection: Enhances training efficiency, stability, and generalization ability, leading to improved performance across various deep learning tasks.
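The data flow through these modules can be sketched as below (a heavily simplified NumPy sketch: fixed illustrative kernels replace the learned convolutions, and the shortcut placement is one plausible reading of the Position III connection discussed later, not the paper's exact architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaky_relu(z, alpha=0.3):
    return np.where(z > 0, z, alpha * z)

def asu(x):
    # shrinkage gate; a fixed averaging kernel stands in for learning
    gate = sigmoid(np.convolve(x, np.ones(5) / 5.0, mode="same"))
    return gate * x

def adae_forward(x, p_dropout=0.2, training=False, rng=None):
    """Sketch of the ADAE forward pass: conv-like encoder with dropout
    and ASU, deconv-like decoder with ASU, plus a shortcut that adds
    the decoder input to its output."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.array([0.25, 0.5, 0.25])                 # toy 1-D kernel
    h = leaky_relu(np.convolve(x, k, mode="same"))  # encoder
    if training:   # dropout emulates additional corruption
        h = h * (rng.random(h.shape) >= p_dropout) / (1 - p_dropout)
    h = asu(h)                                      # encoder-side ASU
    d = leaky_relu(np.convolve(h, k, mode="same"))  # decoder
    return asu(d) + h                               # shortcut addition

x = np.random.default_rng(0).normal(size=128)
y = adae_forward(x)
```

In the real model each stage would be a stack of (de)convolutional layers with learned weights; this sketch only mirrors the ordering of the modules.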

Experimental Validation
The denoising performance of ADAE is rigorously evaluated on the CWRU dataset across different noise levels to ascertain its effectiveness in real-world scenarios. Our experiments were conducted using MATLAB R2022a in a Windows environment.

Experimental Setup and Data Description
The CWRU dataset is a commonly utilized third-party dataset that offers a reliable basis for assessing the effectiveness of existing algorithms. This dataset includes drive-end samples captured at a sampling frequency of 12 kHz, containing signals for four different health conditions: normal, rolling element fault, inner race fault, and outer race fault. Each health condition is recorded under four different operating loads, and within each fault mode, three different fault sizes are examined. Therefore, there are a total of ten categories of bearing data, each corresponding to four operating loads.
To ensure data consistency and comparability, all collected data undergo z-score standardization, mitigating the potential impact of sensor placement. Gaussian white noise (AWGN) is subsequently added to the data at a prescribed signal-to-noise ratio (SNR) for performance validation of ADAE and comparison with other mainstream algorithms. The dataset is split into training and testing samples at a ratio of 4:1. In our denoising process, the presence of AWGN necessitates an assessment of the denoising algorithm's effectiveness. We employ the SNR as a fundamental metric, defined as SNR = 10 log10(P_s / P_n), where P_s and P_n denote the power of the clean signal and of the noise, respectively. We aim to eliminate AWGN, mimicking real industrial scenarios. To gauge the algorithm's performance comprehensively, we utilize two evaluation metrics [11]: the SNR improvement, SNR_imp = SNR_out − SNR_in, and the root mean square error, RMSE = sqrt((1/N) Σ_{i=1}^{N} (ŝ_i − s_i)²), where s_i and ŝ_i denote the clean and denoised signals as before. A higher SNR_imp and a lower RMSE indicate stronger denoising capability.
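The evaluation metrics can be computed as follows (a minimal NumPy sketch; the function names are illustrative):

```python
import numpy as np

def snr_db(clean, observed):
    """SNR = 10 log10(P_signal / P_noise), with the noise taken as the
    difference between the observed and clean signals."""
    noise = observed - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

def snr_improvement(clean, noisy, denoised):
    # SNR_imp = output SNR minus input SNR; higher is better
    return snr_db(clean, denoised) - snr_db(clean, noisy)

def rmse(clean, denoised):
    # root mean square error against the clean target; lower is better
    return np.sqrt(np.mean((denoised - clean) ** 2))
```

For example, a denoiser that leaves 10% of the noise amplitude behind yields an SNR improvement of exactly 20 dB, independent of the input noise level.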

Hyperparameter Optimization and Ablation Study
In this section, critical structures and parameters are optimized and tested using the CWRU dataset. All experiments are conducted under a noise level of SNR = −6 dB. We employ a sliding window approach to construct the sample set, with each sample containing 2048 data points and no overlap between adjacent samples. For each fault category and each load, 50 samples are generated, resulting in a total of 2000 samples. This experimental design ensures a comprehensive evaluation and comparison of model performance under different conditions.
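The non-overlapping window construction can be sketched as (the helper name `make_samples` is illustrative, and the random recording stands in for a real CWRU record):

```python
import numpy as np

def make_samples(signal, window=2048, n_samples=50):
    """Cut non-overlapping windows of `window` points from a long
    recording, matching the sample-set construction described above."""
    needed = window * n_samples
    if len(signal) < needed:
        raise ValueError("recording too short for requested samples")
    return signal[:needed].reshape(n_samples, window)

recording = np.random.default_rng(0).normal(size=2048 * 60)
samples = make_samples(recording)   # 50 samples of 2048 points each
```

With 10 fault categories and 4 loads, 50 such samples per combination give the 2000 samples used in the experiments.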

Impact of Dropout Probability on Denoising Performance
This section delves into the significance of the dropout probability (p) in dictating the denoising efficiency of the adaptive denoising autoencoder (ADAE) model, specifically when the signal-to-noise ratio (SNR) is set at −6 dB. An exploratory study was undertaken, in which p was systematically varied from 0 to 0.6, yielding a comprehensive set of denoising outcomes summarized in Table 1. The findings highlight a peak in denoising performance within the p range of 0.1 to 0.3 across all evaluation metrics. A marked deterioration occurs beyond p = 0.3, suggesting that excessive dropout disrupts the signal too strongly, hindering effective learning. Conversely, lower p values maintain a balance between regularization and information retention, fostering an environment conducive to noise reduction.
In anticipation of variable noise profiles in practical applications, we introduced a novel strategy: the randomization of p within the interval [0.1, 0.3]. This decision was carefully considered to imbue the ADAE model with heightened versatility. By dynamically selecting p during training epochs, the model becomes resilient to fluctuations in noise intensity, thereby ensuring consistent denoising performance across a broader spectrum of SNR conditions. Specifically, this stochastic element emulates real-world variability, forcing the model to adapt and learn from a more diverse set of signal-disruption patterns, which, in turn, bolsters its generalization capabilities.
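The randomized-p strategy amounts to drawing a fresh dropout probability at each training epoch (a trivial sketch; the helper name is illustrative):

```python
import numpy as np

def sample_dropout_p(rng, low=0.1, high=0.3):
    """Draw a fresh dropout probability uniformly from [low, high] at
    each epoch, as in the randomized-p strategy described above."""
    return rng.uniform(low, high)

rng = np.random.default_rng(0)
ps = [sample_dropout_p(rng) for _ in range(100)]  # one p per epoch
```

Each epoch then configures its dropout layers with the sampled p before iterating over the training batches.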
In summary, the analysis underscores the critical role of dropout probability in optimizing the denoising performance of the ADAE model, with randomized p values proving instrumental in enhancing adaptability and efficacy across varying noise levels.

Impact of Initial Convolutional Kernel Width
In delving into the intricacies of the neural network architecture, the initial choice of the convolutional kernel width significantly influences the model's feature extraction efficacy from raw inputs. This section provides a comprehensive elaboration on this parameter, addressing the oversight in previous explanations and enhancing the holistic understanding of the architecture.
Intuitive Understanding of Kernel Width: Analogous to the human visual system, the convolutional kernel operates as a feature perception mechanism, scanning over the data. The width of the kernel determines the "field of view" for feature detection, affecting the balance between capturing local details and integrating broader contextual information. Wider kernels are proficient at incorporating extensive contextual cues, whereas narrower kernels excel in discerning fine-grained, localized features.
Experimental Framework and Insights: Our study intentionally selected kernel widths as powers of 2 (4, 8, 16, ..., 1024) for systematic comparison and computational efficiency. Under a fixed dropout rate (p = 0.2), we assessed the denoising performance of ADAE across these varying widths, presenting the findings in Table 2 through SNR improvement (SNR_imp) and root mean square error (RMSE) metrics under −6 dB SNR conditions.
Core Findings: The analysis revealed that a kernel width of 128 optimized the ADAE's denoising capability, striking a balance between detailed local feature extraction and the integration of global structural information.While narrower and wider kernels exhibit unique advantages, they do not match the comprehensive denoising efficiency achieved with a medium-sized kernel width.
Conclusion and Optimization Guidance: This expanded explanation not only elucidates the direct impact of kernel width on model performance but also offers guidance for future researchers in adjusting this parameter for different tasks and datasets. The careful calibration of the initial convolutional kernel width is thus a pivotal design consideration in deep learning models.

Ablation Study on Shortcut Connections
In this ablation study, the integration of shortcut connections at various stages within the ADAE model was investigated, targeting three strategic locations: Position I (connecting the input to the output of the encoder), Position II (linking the input of the encoder to the output of the decoder), and Position III (establishing a direct path from the input to the output of the decoder). The assessment encompassed eight distinct configurations, the outcomes of which are summarized in Table 3. The remarkable performance of configurations featuring solely a Position III shortcut stands out, surpassing all other arrangements by a substantial margin. This superiority can be attributed to the direct transmission path from the input to the decoder output, which bypasses the potential accumulation of error in intermediate layers. By maintaining a purer representation of the input signal, Position III shortcuts facilitate more effective feature reconstruction, resulting in significantly reduced noise levels, as reflected in the lowest RMSE and highest SNR_imp.
Combinations inclusive of Position III shortcuts consistently showed favorable performance, validating the strategic advantage of this connection scheme. Conversely, Position I and Position II shortcuts, alone or in certain combinations, provided less pronounced improvements. Consequently, our optimized ADAE model, depicted in Figure 3, exclusively integrates Position III shortcuts to harness their exceptional denoising capabilities. For completeness, the finalized network specifications are detailed in Table 4.

Impact of Activation Function
Convolution and deconvolution operations are inherently linear processes. However, the relationship between signals and noise is complex, requiring nonlinear transformations for effective denoising. While rectified linear unit (ReLU) activation functions are commonly used in computer vision tasks, they tend to disregard negative values during signal denoising. Negative values are pivotal in denoising tasks, especially because signal means are normalized to zero, rendering negative values significant. ReLU simply nullifies negative values, resulting in considerable information loss.
To address this limitation, Leaky ReLU activation functions are introduced, which scale negative values by a leakage factor (α). In this section, we replace all ReLU activations with Leaky ReLU and explore the impact of different α values on the denoising performance of ADAE.
As depicted in Figure 4, denoising performance initially increases with α, reaching a peak before diminishing returns are observed. This trend suggests that increasing the leakage factor accentuates the importance of negative values, thereby enhancing denoising effectiveness. ADAE achieves maximal denoising efficacy when α is set to 0.3. Beyond this value, denoising capability starts to deteriorate, indicating a delicate balance between capturing negative values and nonlinear learning capability. In summary, a judicious selection of the leakage factor α is crucial to achieving optimal denoising performance in ADAE. The findings suggest that α = 0.3 offers the best balance, thereby maximizing denoising effectiveness, and this configuration is adopted for subsequent experiments to ensure robust denoising performance.
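Leaky ReLU with the leakage factor fixed at the value found best here can be sketched as:

```python
import numpy as np

def leaky_relu(z, alpha=0.3):
    """Leaky ReLU with leakage factor alpha = 0.3: negative inputs are
    scaled by alpha rather than zeroed, preserving the negative half of
    zero-mean vibration signals."""
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
out = leaky_relu(z)
```

Unlike ReLU, which maps every negative value to zero, this keeps a scaled copy of the negative half of the signal in the forward pass.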

Known Noise Intensity
To underscore the superiority of the proposed method, we conducted a comparative evaluation with several state-of-the-art and traditional denoising techniques, including WT [18], EMD [13], JL-CNN [4], SEAEFD [38], and NL-FCNN [11].The denoising performance of ADAE was assessed under varying SNRs of −6 dB, −3 dB, and 0 dB, with the results summarized in Table 5.
ADAE demonstrates significantly superior denoising performance compared to other methods across all noise intensities, with SNR_imp nearly doubling or more in comparison. Even when noise intensity is low, ADAE achieves an SNR_imp greater than 15, showcasing its robustness and its ability to learn noise characteristics while preserving original fault information.
To provide a more intuitive understanding of ADAE's denoising efficacy, we visualize both temporal and frequency waveforms of signals under the three noise intensities.These visualizations underscore ADAE's remarkable denoising capabilities, effectively removing noise while preserving fault information across various fault modes and noise intensities.The distinct denoising effects observed in the frequency spectra demonstrate ADAE's ability to remove irrelevant frequency components, further validating its effectiveness even under challenging noise conditions.

Unknown Noise Intensity
To further evaluate the denoising performance of ADAE in real-world scenarios where noise levels are unpredictable, we conducted experiments with randomly varied SNR ranging from 0 dB to −6 dB. The results, compared with other denoising methods, are presented in Table 6. In all tested scenarios, ADAE consistently showcases superior denoising capabilities. When trained on a dataset with randomly varied SNR, ADAE exhibits even better performance than in the previous scenario with constant noise intensity. Particularly notable is the significant enhancement in the SNR_imp metric under 0 dB and −3 dB conditions, where ADAE achieves values surpassing 22, an improvement of about 7 over the previous results. This substantial improvement underscores ADAE's adaptability and effectiveness in handling varying noise levels.
Furthermore, relative to other denoising methods, ADAE consistently outperforms them across all tested scenarios, highlighting its unparalleled advantages in signal denoising.This reaffirms ADAE's robustness and generality, making it a highly reliable solution for denoising tasks in diverse environments.
To provide further insights into ADAE's denoising efficacy, temporal and frequency waveforms of test samples under an SNR of −6 dB are presented in Figure 6. These samples encompass signals from ten bearing health states under 1 hp load. As observed in the figures, ADAE successfully preserves temporal fault features while effectively removing irrelevant frequency components. This capability underscores ADAE's role as an efficient preprocessing method for rolling bearing fault diagnosis, particularly in environments characterized by strong noise backgrounds.

Conclusions
This study pioneers the adaptive denoising autoencoder (ADAE), an innovative framework designed to mitigate noise in vibration signals with unprecedented effectiveness. By introducing an adaptive shrinkage unit with local attention mechanisms within its architectural core, ADAE innovatively discriminates between noise and fault signatures, preserving vital diagnostic information while suppressing irrelevant noise. The strategic integration of a dynamic dropout strategy further augments its adaptability, endowing the model with versatility across diverse noise profiles.
Our research breaks new ground by exhaustively examining the influence of Leaky ReLU activation functions and convolutional kernel dimensions on denoising efficacy, contributing fresh insights to the design of neural network architectures for signal processing. Moreover, the exploration of shortcut connections validates their utility in enhancing feature preservation amidst noise suppression.
The empirical validations affirm ADAE's dominance over conventional denoising techniques, evidencing exceptional competence in isolating and removing noise in complex, real-world conditions. Its proficiency in both time and frequency domains, particularly in low SNR environments, underscores its potential as a transformative tool in the realm of fault diagnosis, especially critical for bearings operating under heavy noise interference.
In conclusion, the proposed ADAE framework stands as a groundbreaking advancement in the realm of signal denoising, offering unparalleled performance in challenging noise scenarios and charting a new course for enhancing diagnostic accuracy under severe noise interference, thereby solidifying its value as a pivotal preprocessing technology for robust machinery health monitoring.

Figure 1. Principle of ASU. Application to Noisy Signals: Once the shrinkage coefficients are learned, they are applied to the noisy input signal. For a given noisy signal x, the ASU computes the shrinkage coefficients α as α = σ(Wx + b), where W and b are the weights and biases learned during training and σ is the sigmoid activation function. The denoised signal x̂ is then obtained by element-wise multiplication of the shrinkage coefficients with the noisy signal: x̂ = α ⊙ x. This process effectively filters out the noise components, as the shrinkage coefficients adapt to the noise level in different segments of the signal.

Figure 2. Illustrative overview of the proposed model's network structure.

Figure 5 depicts the denoising results of ADAE under SNR = −6 dB, when noise intensity is known.

Figure 5. Denoising results of ADAE with SNR = −6 dB when noise intensity is known.

Figure 6. Denoising results of ADAE with SNR = −6 dB when noise intensity is unknown.

Table 2. Denoising performance of ADAE with different kernel widths at −6 dB.

Table 3. Denoising performance of ADAE with shortcut connections at −6 dB.

Table 5. Denoising results of different methods under constant noise intensity.

Table 6. Denoising results of different methods under random noise intensity.
Author Contributions: Conceptualization, H.L. and K.Z.; methodology, K.Z. and L.H.; writing (original draft preparation), H.L.; project administration, K.Z.; funding acquisition, K.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Joint Funds of Equipment Advance Research of China, grant number 6141B02040207.