1. Introduction
Bearings are critical components of rotating machinery and are widely used in the mechanical transmission systems of large-scale equipment such as electric motors, gearboxes, and cranes [1]. Increasingly complex mechanical structures and progressively more extreme operating conditions subject rolling bearings to more severe loads, heightening their susceptibility to failure [2]. The operational state of a bearing directly governs equipment functionality, reliability, and safety, and a failure can trigger system breakdowns and substantial economic losses [3]. Consequently, continuous reliability monitoring of rolling bearings has become imperative [4,5]. Vibration signals are an essential information source for condition monitoring; they are recognized as the most accurate indicators of bearing performance and are the prevalent choice for fault diagnosis [6]. Researchers often process bearing vibration signals with the Fourier transform, envelope analysis, spectral kurtosis, the wavelet transform, and empirical mode decomposition [7]. Bearing fault detection methods consist primarily of two sequential stages: feature extraction and fault identification [8].
Fault characteristics can be extracted in the time and frequency domains. However, bearing vibration signals under actual operating conditions are non-stationary and nonlinear, which complicates feature extraction and degrades diagnostic accuracy [9]. Time-frequency analysis is a prevalent signal processing approach for characterizing how a signal varies across both domains. Commonly employed techniques include the Short-Time Fourier Transform (STFT), Empirical Mode Decomposition (EMD), Variational Mode Decomposition (VMD), and the Wavelet Transform (WT). Classical bearing fault diagnosis methods include Hilbert transform-based envelope analysis and the Shock Pulse Method (SPM). Wang et al. [10] used the entire Hilbert envelope spectrum of the resampled signal directly as a feature vector to characterize bearing fault types and built a DBN classifier to identify bearing faults. However, background noise in the original vibration signal, particularly high-frequency noise, may persist in the envelope spectrum after the Hilbert transform and be misinterpreted as fault features. Zhang et al. [11] combined the SPM with frequency analysis for early bearing fault detection. However, the SPM is highly dependent on rotational speed: at low speeds the impact signals may be too weak for reliable detection, while at high speeds increased noise may mask genuine fault signals.
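To make the envelope-analysis idea concrete, the following is a minimal sketch, not the procedure of any cited work. It builds the analytic signal with an FFT-based Hilbert transform and takes the spectrum of its magnitude. The sampling rate, carrier frequency, and the 100 Hz "fault" modulation are hypothetical values chosen only for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC/Nyquist, double positive bins, zero negative bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def envelope_spectrum(x, fs):
    """Return (frequencies, amplitudes) of the spectrum of the Hilbert envelope."""
    envelope = np.abs(analytic_signal(x))
    envelope = envelope - envelope.mean()          # remove DC so the modulation peak stands out
    amplitude = np.abs(np.fft.rfft(envelope)) / len(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs, amplitude

# Synthetic example (all values hypothetical): a 3 kHz resonance
# amplitude-modulated at a 100 Hz "fault" rate, plus mild noise.
fs = 12_000
t = np.arange(fs) / fs                             # one second of data
rng = np.random.default_rng(0)
x = (1.0 + 0.8 * np.cos(2 * np.pi * 100.0 * t)) * np.sin(2 * np.pi * 3_000.0 * t)
x = x + 0.1 * rng.standard_normal(len(t))

freqs, amplitude = envelope_spectrum(x, fs)
peak_freq = freqs[np.argmax(amplitude)]
print(f"dominant envelope frequency: {peak_freq:.1f} Hz")
```

In this toy signal the modulation rate dominates the envelope spectrum. With real data, broadband noise that survives the Hilbert transform also appears in that spectrum, which is the limitation noted above.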
Geraei et al. [12] developed a bearing fault detection method that integrates Adaptive Local Binary Patterns (ALBPs) with STFT. Whereas conventional Fourier analysis provides only global frequency information, STFT divides the signal into short segments using window functions (e.g., Hann, rectangular) that are zero outside a specified interval, and then applies the Fourier transform to each segment. This localized analysis makes it possible to locate fault signatures in both time and frequency.
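The windowed-segment procedure described above can be sketched in a few lines. This is an illustrative implementation under assumed settings (a Hann window of 256 samples with a hop of 128), not the configuration used in the cited work. Note that the window length is fixed for the whole signal.

```python
import numpy as np

def stft(x, fs, window_length=256, hop=128):
    """Magnitude STFT: slice the signal into overlapping frames, window each, then FFT."""
    window = np.hanning(window_length)
    n_frames = 1 + (len(x) - window_length) // hop
    frames = np.stack([
        x[i * hop:i * hop + window_length] * window
        for i in range(n_frames)
    ])
    spectrogram = np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (frequency bins, frames)
    freqs = np.fft.rfftfreq(window_length, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + window_length / 2) / fs
    return freqs, times, spectrogram

# Hypothetical test signal: 500 Hz for the first half second, 2 kHz afterwards.
fs = 8_000
t = np.arange(fs) / fs
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2_000 * t))

freqs, times, spec = stft(x, fs)
dominant = freqs[np.argmax(spec, axis=0)]                 # strongest frequency in each frame
print(dominant[0], dominant[-1])
```

A global Fourier transform of the same signal would show both tones but not when each occurs. The per-frame spectrum recovers that ordering, at a frequency resolution set by the chosen window length.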
Wang et al. [13] also used STFT to convert one-dimensional time-domain bearing vibration signals into two-dimensional time-frequency images, which were then fed into a generative adversarial network for bearing fault diagnosis. Nevertheless, the efficacy of STFT depends critically on predefined window parameters (shape and length), which remain fixed regardless of signal characteristics. Because STFT is essentially a sequence of windowed Fourier transforms, noise contamination inevitably corrupts the time-frequency distribution, which is particularly detrimental at low signal-to-noise ratios. Qi et al. [14] used EMD to decompose bearing vibration signals and fed the resulting intrinsic mode functions (IMFs) into a hierarchical 1D convolutional neural network (1D-CNN) for fault classification. However, high-frequency impact components of a bearing fault signal may end up in the same IMF as low-frequency background noise (mode mixing), which masks the fault signature and compromises diagnostic precision. Taibi et al. [15] used VMD to decompose raw bearing vibration signals into multiple IMFs and then applied the Discrete Wavelet Transform (DWT) to filter noisy IMFs and extract features for motor bearing diagnostics. Notably, VMD requires predefined parameters (the number of decomposition modes k and the penalty factor α), whose selection lacks theoretical guidance and relies on empirical or trial-and-error approaches. Li et al. [16] employed a genetic algorithm to optimize the number of modes and the penalty factor in VMD for rolling bearing fault diagnosis across the full life cycle. Compared with other optimization algorithms such as the whale optimization algorithm (WOA), genetic algorithms exploit population diversity (through crossover and mutation) and a global parallel search to explore multiple candidate solutions in multidimensional parameter spaces, which helps them avoid local optima and makes them well suited to VMD parameter optimization. Liu et al. [17] proposed an interpretable domain-adaptive Transformer (IDAT) for cross-condition and cross-machine bearing fault diagnosis. The core of this approach is a multi-layer domain-adaptive Transformer that extracts useful features, with interpretation provided through multi-head attention maps. However, the cost of self-attention grows quadratically with sequence length, so processing multiple feature subspaces in parallel with multi-head attention imposes a heavy computational burden for bearing vibration signals, particularly with long time series or high-resolution time-frequency images. Yu et al. [18] developed a post-processing time-frequency analysis (TFA) method, the Wavelet-Based Time Synchro Extracting Transform (WTSET), to precisely capture fault characteristic frequencies in flexible thin-walled bearings. This technique enhances the energy concentration of the time-frequency representation (TFR) while maintaining signal invertibility. Lu et al. [19] applied wavelet threshold denoising to the acquired bearing vibration signals, effectively reducing noise interference. The diagnostic efficacy of wavelet-based methods inherently depends on the choice of wavelet basis function (ψ) and decomposition level (L). Different combinations of ψ and L differ significantly in their feature representation capability, yet their optimization lacks a systematic theoretical framework and must be determined empirically.
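The role of the decomposition level L in wavelet threshold denoising can be illustrated with a small sketch. It makes several assumptions that do not come from the cited works: the basis ψ is fixed to Haar because it can be written in a few lines, the threshold is the universal threshold with a median-based noise estimate, and the test signal is synthetic.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; those below the threshold become zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def wavelet_denoise(x, level):
    """Haar soft-threshold denoising with `level` decomposition levels (assumed settings)."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    approx, details = x, []
    for _ in range(level):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    sigma = np.median(np.abs(details[0])) / 0.6745        # noise estimate from finest details
    threshold = sigma * np.sqrt(2.0 * np.log(len(x)))     # universal threshold
    for detail in reversed(details):
        approx = haar_idwt(approx, soft_threshold(detail, threshold))
    return approx

# Hypothetical example: a slow sine wave buried in white noise.
rng = np.random.default_rng(0)
n = 1024
clean = np.sin(2 * np.pi * 5 * np.arange(n) / n)
noisy = clean + 0.3 * rng.standard_normal(n)
denoised = wavelet_denoise(noisy, level=3)

rmse_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
print(f"RMSE before: {rmse_noisy:.3f}, after: {rmse_denoised:.3f}")
```

Changing `level` or swapping the Haar filters for another basis changes the result, which is why the choice of ψ and L matters in practice.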
Following feature extraction from vibration signals, the fault classification phase typically employs diagnostic models such as decision trees, support vector machines (SVMs), k-nearest neighbors (k-NN), naïve Bayes classifiers, convolutional neural networks (CNNs), and long short-term memory (LSTM) networks. Briglia et al. [20] used decision trees for motor bearing fault detection from power current signatures; the tree structure is learned inductively from observational data, and a fault is categorized simply by traversing a path through the tree. The decision tree algorithm recursively partitions the feature space, yet its performance diminishes significantly in high-dimensional spaces, and it tends to generate overly complex structures that are prone to overfitting. Wang et al. [21] proposed a rolling bearing fault diagnosis technique based on recurrence quantification analysis (RQA) and a Bayesian-optimized support vector machine (RQA-Bayes-SVM). Bayesian optimization was used to search for the optimal penalty factor C and kernel function parameter g of the SVM, thereby establishing an optimal Bayes-SVM model. Maincer et al. [22] applied SVM and k-NN to robotic manipulator fault diagnosis, using a Gaussian-kernel SVM tuned by particle swarm optimization (PSO) to maximize diagnostic accuracy. Sun et al. [23] developed a k-NN attention-enhanced Video Vision Transformer (k-ViViT) network for action recognition, replacing conventional self-attention with k-NN attention to mitigate noise interference from irrelevant tokens. SVM performance depends critically on kernel selection (linear, polynomial, radial basis function) and parameter tuning (regularization parameter C, kernel coefficient γ). Suboptimal parameter combinations degrade model efficacy, necessitating computationally intensive cross-validation and grid search. k-NN, in turn, is computationally inefficient in high-dimensional feature spaces and on large datasets because it must compute the distance between each query instance and every training sample. Peretz et al. [24] proposed an enhanced naïve Bayes classifier for multidimensional and multivariate datasets, termed the Naïve Bayes Enrichment Method (NBEM). This method uses multiple naïve Bayes classifiers based on different distributions, and combinations of them, to classify new observations. However, as probabilistic models, naïve Bayes classifiers rely on the probability distribution of the training data; significant discrepancies between the training and testing distributions may limit the model’s generalization capability.
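The cost of k-NN noted above follows directly from how a brute-force prediction is computed. The sketch below uses hypothetical four-dimensional feature vectors for two classes and evaluates one distance per training sample for every query, so the work per query grows with both the number of samples and the feature dimension. The data and the value of k are illustrative assumptions.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Brute-force k-NN: distance to every training sample, then a majority vote."""
    distances = np.linalg.norm(train_x - query, axis=1)   # one distance per training sample
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical features: class 0 ("healthy") near the origin, class 1 ("faulty") near 3.
rng = np.random.default_rng(0)
healthy = rng.normal(loc=0.0, scale=0.5, size=(50, 4))
faulty = rng.normal(loc=3.0, scale=0.5, size=(50, 4))
train_x = np.vstack([healthy, faulty])
train_y = np.array([0] * 50 + [1] * 50)

pred_healthy = knn_predict(train_x, train_y, np.zeros(4))
pred_faulty = knn_predict(train_x, train_y, np.full(4, 3.0))
print(pred_healthy, pred_faulty)
```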
Ni et al. [25] constructed a dual-stream CNN model for bearing fault diagnosis. The first stream processes 1D vibration signal spectra, while the second handles 2D time-frequency representations derived from the same signals. CNNs extract local features through convolutional layers and may therefore capture global features or complex spatiotemporal relationships inadequately in some fault diagnosis scenarios. Chen et al. [26] designed a multi-scale CNN-LSTM model combined with deep residual learning for rolling bearing fault diagnosis; the architecture integrates multi-scale wide CNN-LSTM modules with deep residual modules. Han et al. [27] proposed a hybrid diagnostic method combining CNN, LSTM, and gated recurrent unit (GRU) models. In this approach, LSTM networks are embedded in the output processing of the CNN to analyze long-sequence variation characteristics of rolling bearing vibration signals, enabling long-term time series prediction by capturing long-range dependencies in the sequences. Xu et al. [28] developed a multi-level residual convolutional neural network with dynamic feature fusion (MRCNN-DFF) for mechanical fault diagnosis. The MRCNN-DFF incorporates depthwise separable (DS) convolution to reduce the number of trainable parameters, which improves optimization efficiency.
In the CNN-LSTM architecture, a unidirectional LSTM processes the sequence in one direction only and may miss dependencies on later samples, whereas a bidirectional LSTM (BiLSTM) captures both forward and backward temporal dependencies in fault signals. Standard convolutional layers in a CNN typically involve a large number of parameters; this burden can be reduced by decomposing the standard convolution into a depthwise convolution followed by a pointwise convolution. The static convolution operations in CNN-LSTM are of limited effectiveness in suppressing noise interference. To address this limitation, a Hybrid Attention module integrating channel attention and spatial attention can be employed, which dynamically enhances critical fault features through energy gating. As for hyperparameter tuning, manual adjustment of the learning rate and other parameters often lacks precision; this can be mitigated by using random search for automated hyperparameter optimization together with dynamic learning rate scheduling. Building on the CNN-BiLSTM fault classification framework, we propose an enhanced architecture: the Hybrid Attention-Based Depthwise Separable CNN-BiLSTM (HADS-CNN-BiLSTM) model.
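The parameter saving from splitting a standard convolution into depthwise and pointwise parts can be checked with simple arithmetic. The sketch below assumes a 1D convolution without bias terms; the kernel size and channel counts are hypothetical and are not taken from the proposed model.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights of a standard 1D convolution (bias ignored)."""
    return kernel_size * in_channels * out_channels

def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * in_channels      # one filter per input channel
    pointwise = in_channels * out_channels     # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: kernel size 7, 64 input channels, 128 output channels.
kernel_size, in_channels, out_channels = 7, 64, 128
standard = standard_conv_params(kernel_size, in_channels, out_channels)
separable = depthwise_separable_params(kernel_size, in_channels, out_channels)
ratio = separable / standard

print(standard, separable, round(ratio, 4))    # 57344 8640 0.1507
```

In general the ratio equals 1/out_channels + 1/kernel_size, so the saving grows with wider layers and larger kernels.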
Based on the analysis above, this study proposes a bearing fault diagnosis method that integrates algorithm-optimized VMD-DWT with the HADS-CNN-BiLSTM model to address low diagnostic accuracy. To reduce strong noise interference and resolve the difficulty of configuring parameters for VMD and DWT decomposition, a genetic algorithm (GA) optimizes the VMD parameters (number of decomposition modes k and penalty factor α) using a permutation entropy (PE) and kurtosis (K) objective function. The correlation coefficient between each IMF and the original signal is then calculated. IMFs with coefficients less than or equal to 0.1 are discarded, those with coefficients between 0.1 and 0.8 undergo further wavelet denoising, and those with coefficients greater than 0.8 are retained. The PSO algorithm optimizes the DWT parameters (wavelet basis and decomposition level) under a minimum Root Mean Square Error (RMSE) criterion, and the optimized DWT performs secondary denoising on the IMFs in the 0.1 to 0.8 range. The denoised IMFs and the retained components are then reconstructed into a single signal. Features are extracted from the reconstructed signal and input into the HADS-CNN-BiLSTM model for fault diagnosis. The experimental results demonstrate that this method effectively improves bearing fault diagnosis accuracy and has promising application prospects. The main contributions of this study are as follows:
- (1) GA-optimized VMD parameters: to address the difficulty of selecting the number of decomposition modes k and the penalty factor α, the GA automates parameter optimization, removing the reliance on empirical selection through extensive manual experimentation.
- (2) PSO-optimized DWT parameters: to address the difficulty of selecting the wavelet basis and decomposition level in DWT denoising, PSO selects both automatically, which helps prevent incomplete noise reduction.
- (3) VMD-DWT signal reconstruction: the proposed reconstruction method significantly reduces strong noise interference on useful vibration signals, providing reliable data support for bearing fault diagnosis and condition monitoring.
- (4) HADS-CNN-BiLSTM framework: the framework integrates depthwise separable convolution, a hybrid attention mechanism, and BiLSTM. It is optimized specifically for industrial fault diagnosis tasks and is validated on two public datasets to demonstrate its effectiveness.
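The correlation-based screening of IMFs described before the list of contributions (discard at or below 0.1, denoise between 0.1 and 0.8, retain above 0.8) can be sketched as follows. This is an illustration under assumptions: the three "IMFs" are synthetic sine components rather than real VMD output, the plain Pearson coefficient is used, and the function name `triage_imfs` is hypothetical.

```python
import numpy as np

def triage_imfs(imfs, signal, low=0.1, high=0.8):
    """Sort IMF indices into discard / denoise / retain by correlation with the signal."""
    groups = {"discard": [], "denoise": [], "retain": []}
    for index, imf in enumerate(imfs):
        r = np.corrcoef(imf, signal)[0, 1]     # Pearson correlation coefficient
        if r <= low:
            groups["discard"].append(index)    # treated as noise-dominated
        elif r <= high:
            groups["denoise"].append(index)    # passed on to wavelet denoising
        else:
            groups["retain"].append(index)     # kept unchanged
    return groups

# Synthetic stand-ins for IMFs (hypothetical amplitudes and frequencies).
t = np.arange(1000) / 1000.0
imfs = [
    3.00 * np.sin(2 * np.pi * 10 * t),         # dominant component
    1.00 * np.sin(2 * np.pi * 50 * t),         # moderate component
    0.05 * np.sin(2 * np.pi * 200 * t),        # weak component
]
signal = np.sum(imfs, axis=0)

groups = triage_imfs(imfs, signal)
print(groups)
```

Only the indices in the "denoise" group would go through the second, DWT-based stage before all surviving components are summed to reconstruct the signal.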
The remainder of this paper is organized as follows:
Section 2 “Signal processing methods”: introduces the noise reduction process of raw signals through GA-optimized VMD and PSO-optimized DWT, followed by signal reconstruction.
Section 3 “The proposed model”: details the complete workflow of the improved VMD-DWT and HADS-CNN-BiLSTM fault diagnosis framework.
Section 4 “Experimental validation”: validates the diagnostic accuracy of the method using the CWRU and XJTU-SY bearing datasets.
Section 5 “Conclusions”: summarizes the key research findings.