1. Introduction
With the rapid advancement of industrial automation and intelligent manufacturing, fault diagnosis technology has become an indispensable component in enhancing the stability of mechanical systems, optimizing production processes, and effectively reducing maintenance costs [
1,
2,
3]. Rolling bearings, as critical rotating components in mechanical equipment, are susceptible to wear, fatigue, cracking, and other forms of failure during prolonged operation in high-load, complex environments. These failures often result in increased vibration, performance degradation, and, in severe cases, equipment shutdown [
4,
5,
6]. Timely and accurate diagnosis of bearing faults is crucial for improving equipment reliability, minimizing economic losses, and ensuring production safety [
7,
8]. However, bearing fault signals are typically noisy and non-stationary, which makes fault features difficult to extract and classify. Consequently, researchers have proposed numerous advanced methods to address these challenges.
Traditional bearing fault diagnosis methods primarily rely on statistical analysis, expertise, and signal processing techniques such as the Fourier transform and wavelet transform, assessing the bearing’s operational status by extracting key features from vibration signals [
9,
10,
11]. Among these techniques, the wavelet transform was widely used in early signal processing and fault diagnosis work and achieved some success [
12,
13]. However, in modern complex industrial environments, the wavelet transform struggles with signals contaminated by multiple noise sources, and its ability to identify fault features is often limited [
14,
15]. The Fourier transform analyzes frequency components by decomposing a signal into a set of sinusoids, which suits spectral analysis of stationary signals but is ineffective for non-stationary signals [
16,
17]. In contrast, variational mode decomposition (VMD) decomposes a signal adaptively based on its local information, so it can effectively separate the different components and accurately capture how the frequency characteristics of non-stationary signals change over time [
18]. However, despite the partial success of these methods, their performance relies on a large amount of expert knowledge and still exhibits limitations in extracting effective features [
9].
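To make the Fourier-analysis point concrete, the sketch below computes the single-sided amplitude spectrum of a synthetic stationary signal with NumPy. The two tone frequencies (100 Hz and 1500 Hz), their amplitudes, and the helper name `amplitude_spectrum` are illustrative assumptions; only the 12 kHz sampling rate echoes the CWRU setting used in this paper.

```python
import numpy as np

def amplitude_spectrum(signal, fs):
    """Return (frequencies in Hz, single-sided amplitudes) of a real signal."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)              # FFT of a real-valued signal
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)      # matching frequency axis
    # 2/n scaling recovers sinusoid amplitudes (not exact for DC/Nyquist bins)
    amps = np.abs(spectrum) * 2.0 / n
    return freqs, amps

fs = 12_000                                     # sampling frequency in Hz
t = np.arange(fs) / fs                          # one second of samples
# Hypothetical stationary signal: two sinusoids with fixed frequencies
x = 1.0 * np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

freqs, amps = amplitude_spectrum(x, fs)
top_two = np.sort(freqs[np.argsort(amps)[-2:]]) # two dominant frequencies
print(top_two)
```

For a non-stationary signal, the same spectrum would show which frequencies occur but not when they occur, which is the limitation that motivates adaptive decompositions such as VMD.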
With the rapid development of artificial intelligence and big data technologies, data-driven fault diagnosis methods have garnered significant attention [
19,
20]. For instance, Hu et al. [
21] proposed a multi-scale, multi-frequency branching interactive spatio-temporal sequence prediction network for predicting the remaining service life of railroad electromechanical equipment. Li et al. [
22] introduced a joint-attention feature transfer network to address the category imbalance issue in industrial data. Kamil et al. [
23] proposed the BiGRU-CNN model for real-time monitoring and technical status diagnosis of small unmanned aerial vehicle units. Niu et al. [
24] leveraged CNN layers and a BiGRU to extract high-dimensional features and temporal dependencies from historical sequences, demonstrating strong performance compared to other models. Li et al. [
14] combined graph convolutional networks with a residual module to enhance the model’s ability to capture localized features in signals. Zhao et al. [
25] integrated the unsupervised feature learning capabilities of auto encoders with the powerful feature extraction abilities of CNN for fault detection and classification. Li et al. [
26] proposed an ACWOS fault diagnosis method based on clustering weighted oversampling to solve the problem of bearing fault diagnosis when the operating conditions change and the data are unbalanced.
In recent years, research by both domestic and international scholars in the field of fault diagnosis has focused on innovative applications such as generative adversarial networks (GANs), lightweight model design, digital twin technology, etc. Pham et al. [
27] proposed an improved GAN-based fault diagnosis method for rolling bearings, which addresses the misclassification problem of the traditional CNN model in data-scarce scenarios by generating two-dimensional time-frequency representations of the acoustic emission signals. The method performs well on low-speed and composite fault datasets; however, the study does not consider computational efficiency constraints in real-time diagnosis scenarios. Li et al. [
28] constructed a multiscale fault evolution digital twin model for the entire life cycle of rolling bearings and achieved accurate prediction of the fault expansion mechanism through dynamic excitation mapping and real-time data updating. Li et al. [
29] developed a frequency-time multimodal Transformer model, which fuses frequency-domain feature maps and time-domain feature vectors through multivariate decomposition and discrete wavelet transform. However, the high complexity of this model makes it difficult to deploy in resource-constrained edge devices. Zhong et al. [
30] designed a simplified fast GAN and triple transfer learning framework, which significantly reduces the time required for GAN data generation and improves the model’s generalization ability through joint training with open-source data, synthetic data, and real data. Niu et al. [
31] proposed a fault diagnosis method based on a rolling element separation signal processing technique combined with a lightweight convolutional network. This approach reduces computational cost through channel sharing and unidirectional spatial convolution, and demonstrates strong robustness in noisy environments. However, its mechanical structure feature extraction process relies on manual design and does not achieve end-to-end adaptive optimization.
Despite the progress made by these innovative approaches, GANs are prone to problems such as vanishing gradients, mode collapse, and oscillation during training, which makes the quality of the generated fault data unreliable. While lightweight models reduce resource consumption by cutting the number of parameters and the computational complexity, they often sacrifice expressive power, which in turn affects diagnostic accuracy. Digital twin technology imposes high performance requirements when processing time-series data; thus, accurately capturing early signs of faults in complex environments remains a key challenge in current research.
These methods improve the ability to identify fault signatures in complex environments through finer signal analysis and processing. However, accurately capturing early fault signs under high noise and low signal-to-noise ratio conditions remains one of the major challenges in current research [
32].
In this paper, a VMD/FFT-Quadratic-BiGRU (VFQB) model is proposed, which combines signal decomposition, feature enhancement, and temporal modeling to extract and fuse weak features effectively and to improve noise immunity. The main innovations of the model are as follows:
- (1)
Improved feature extraction method: A parallel processing strategy combining VMD and FFT is employed to process bearing vibration signals, enabling the extraction of both time-domain and frequency-domain feature sets of the bearing.
- (2)
Comparison with existing methods: We combine a quadratic network with a BiGRU to construct a diagnostic model with stronger noise robustness. The quadratic network enhances the feature signals through its nonlinearity, which effectively suppresses the influence of noise; the BiGRU further refines the time-series features through bidirectional temporal dependency modeling to ensure accurate and robust fault classification.
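As a rough illustration of the nonlinearity a quadratic network adds, the sketch below evaluates a single quadratic neuron in NumPy. The specific form (a product of two linear maps plus a weighted sum of squared inputs) is one formulation from the quadratic-neuron literature; whether the proposed model uses exactly this form is an assumption, and the function name, parameter names, and example values are hypothetical.

```python
import numpy as np

def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c):
    """Pre-activation output of one quadratic neuron for input vector x.

    Assumed form: (x.w_r + b_r) * (x.w_g + b_g) + (x*x).w_b + c
    """
    interaction = (x @ w_r + b_r) * (x @ w_g + b_g)  # product of two linear maps
    power_term = (x * x) @ w_b                       # weighted sum of squared inputs
    return interaction + power_term + c

x = np.array([1.0, -2.0, 0.5])
w = np.array([0.2, 0.1, -0.4])
zeros = np.zeros(3)
e0 = np.array([1.0, 0.0, 0.0])

# With w_g = 0, b_g = 1 and w_b = 0 the neuron reduces to a linear one: x.w + b
linear_case = quadratic_neuron(x, w, 0.3, zeros, 1.0, zeros, 0.0)

# With both linear maps selecting x[0], the output is the second-order term x[0]**2
square_case = quadratic_neuron(x, e0, 0.0, e0, 0.0, zeros, 0.0)
```

The first case shows that a conventional linear neuron is a special case of this form, while the second shows the second-order interaction that a linear neuron cannot express.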
The content of this paper is as follows:
- (1)
A model is proposed for bearing fault diagnosis, and a quadratic network is introduced to enhance the feature extraction capability.
- (2)
VMD and FFT are combined for signal preprocessing to effectively extract time–frequency domain information, while the BiGRU is used to capture time series features and improve the accuracy of fault classification.
- (3)
The effectiveness of the proposed model is verified by the publicly available CWRU dataset and several comparative experiments.
The remainder of the manuscript is organized as follows. In
Section 2, related work is introduced and the working principle of the proposed method is described.
Section 3 presents the experimental study, demonstrating the application and comparison of the proposed method across different datasets.
Section 4 provides the conclusion, which summarizes the key findings of the paper.
3. Experiments and Analysis of Results
3.1. Introduction to the Datasets
The data for Experiment 1 were obtained from the Case Western Reserve University (Cleveland, OH, USA) bearing fault dataset. The experimental setup used for data collection is depicted in
Figure 4, which includes a motor, torque transducer, fan end bearing, drive end bearing, and dynamometer. The dataset corresponds to a rolling bearing at the drive end, with a motor speed of 1797 rpm, a load of 0 hp, and a sampling frequency of 12 kHz. The specific details of the data are provided in
Table 1. This dataset includes the healthy operating state and three primary failure types: inner ring failure, outer ring failure, and ball failure. For each failure type, three damage levels were considered (0.1778 mm, 0.3556 mm, and 0.5334 mm), resulting in a total of 10 distinct bearing states for analysis: one healthy state and nine fault states.
The data for Experiment 2 were obtained from laboratory equipment. The test rig consists of a motor, coupling, drive end bearing, vibration transducer, etc., as shown in
Figure 5. This experiment covers six operating conditions: normal operation, inner ring failure, outer ring failure, ball failure, cage failure, and combined failure. The operating speed of the bearing is set to 1250 r/min and the load is set to 0 hp; the sampling frequency is 11 kHz, and 16,384 data points are collected for each operating condition. The dataset is shown in
Table 2.
3.2. Model Parameter Setting
The performance of the VMD depends on the selection of key parameters such as the penalty factor α and the modal number
K. Inappropriate parameters may lead to distorted signal decomposition or mode mixing. We therefore use the grey wolf optimizer (GWO) algorithm to optimize the VMD parameters and improve the quality of decomposition and the accuracy of signal analysis. The search range of VMD parameters is set
,
; the grey wolf population is set to 20 and the number of iterations is set to 15; the envelope entropy is chosen as the fitness function.
Figure 6 shows the optimization process of the GWO algorithm with the minimum fitness corresponding to the [
K, α] combination of [4, 563].
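Envelope entropy, used above as the GWO fitness, can be sketched as follows. This is one common formulation (the Shannon entropy of the normalized Hilbert envelope, with the minimum over modes taken as the fitness); the exact definition used here is not spelled out, so the formula, the helper names `envelope`, `envelope_entropy`, and `fitness_from_modes`, and the synthetic test signals are all assumptions. The VMD step and the GWO search themselves are not shown.

```python
import numpy as np

def envelope(x):
    """Amplitude envelope of a real signal via the FFT-based analytic signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                          # keep the DC bin
    if n % 2 == 0:
        gain[n // 2] = 1.0                 # keep the Nyquist bin
        gain[1:n // 2] = 2.0               # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

def envelope_entropy(x):
    """Shannon entropy of the envelope normalized to sum to one."""
    env = envelope(np.asarray(x, dtype=float))
    p = env / env.sum()
    p = p[p > 0]                           # skip zeros to avoid log(0)
    return float(-np.sum(p * np.log(p)))

def fitness_from_modes(modes):
    """Assumed fitness: the smallest envelope entropy among the decomposed modes."""
    return min(envelope_entropy(m) for m in modes)

n = 1024
t = np.arange(n)
steady = np.sin(2 * np.pi * 32 * t / n)                      # constant envelope
impulsive = np.exp(-t / 20.0) * np.sin(2 * np.pi * 0.2 * t)  # decaying burst
print(envelope_entropy(steady), envelope_entropy(impulsive))
```

A sparse, impulsive envelope yields a lower entropy than a flat one, which is why minimizing this quantity tends to favor decompositions that isolate periodic fault impacts.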
In the quadratic network, the convolution layer uses a 1 × 3 kernel with a stride of 1 and padding of 1, so the feature map size remains unchanged after convolution. The max-pooling layer uses a kernel size of 2 with a stride of 2 and no padding, which halves the feature map size. The activation function for both layers is the ReLU function. In the BiGRU network, the activation function is the tanh function. The specific parameters are provided in
Table 3.
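The size claims above follow from the standard output-length formula for one-dimensional convolution and pooling, floor((L + 2p − k) / s) + 1. The short sketch below checks them for an input of length 1024; the function name `output_length` and the choice of 1024 as the input length (taken from the sampling window) are illustrative assumptions.

```python
def output_length(length, kernel, stride, padding):
    """Output length of a 1-D convolution or pooling layer (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1

input_length = 1024
# Convolution: kernel 3, stride 1, padding 1 keeps the length unchanged
after_conv = output_length(input_length, kernel=3, stride=1, padding=1)
# Max pooling: kernel 2, stride 2, no padding halves the length
after_pool = output_length(after_conv, kernel=2, stride=2, padding=0)
print(after_conv, after_pool)
```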
Both datasets are segmented using overlapping sampling with a window of 1024 points and an overlap rate of 50%. The sampled data are divided into a training set, a validation set, and a test set.
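A minimal sketch of overlapping-window sampling is given below, assuming a plain sliding window whose step equals the window length times one minus the overlap rate. The function name `overlap_windows` and the stand-in signal are hypothetical; the paper's actual segmentation code is not shown here.

```python
import numpy as np

def overlap_windows(signal, window=1024, overlap=0.5):
    """Split a 1-D signal into overlapping segments of fixed length."""
    signal = np.asarray(signal)
    step = int(window * (1.0 - overlap))       # 512 for a 50% overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than 1")
    count = (len(signal) - window) // step + 1
    if count <= 0:
        return np.empty((0, window), dtype=signal.dtype)
    return np.stack([signal[i * step:i * step + window] for i in range(count)])

# Stand-in record with the per-condition length reported for Experiment 2
record = np.arange(16_384)
segments = overlap_windows(record, window=1024, overlap=0.5)
print(segments.shape)
```

Under these assumptions, a record of 16,384 points yields (16,384 − 1024) / 512 + 1 = 31 segments, and consecutive segments share half of their samples.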
3.3. Analysis of Experimental Results
The model’s accuracy curve and loss value curves for Experiments 1 and 2 are shown in
Figure 7 and
Figure 8, following 50 epochs of training. At the beginning of training, the accuracy is low and the loss is high. As the iterations increase, the accuracy gradually rises and the loss stabilizes, indicating that the model is gradually fitting the training data. Furthermore, the close agreement between the training and validation curves suggests that the model has not overfitted.
After 20 iterations, the model’s prediction accuracy and loss values stabilized, indicating successful convergence. In Experiment 1, the classification accuracy on the training set reached 100%, with a corresponding loss value of 9.80 × 10⁻⁵. In Experiment 2, the accuracy also reached 100%, with a loss value of 0.0015. For the validation sets, the classification accuracies were both 100%, with loss values of 0.0001 and 0.007, respectively. These results demonstrate that the combined VFQB model exhibits excellent stability and high accuracy.
To further visualize the recognition accuracy across different categories in the two experiments,
Figure 9 presents the confusion matrix for bearing fault state recognition results. In
Figure 9a, the horizontal and vertical axes, labeled C1 to C10, represent the 10 bearing states in Experiment 1, including one normal state and nine fault states. In
Figure 9b, the axes labeled C1 to C6 represent the six bearing states in Experiment 2, which include one normal state and five fault states. The diagonal values of each matrix indicate the number of samples in which the model correctly recognized each state.
The model never misidentifies the normal operating state as a fault state, which effectively avoids false shutdowns in actual production. At the same time, when the bearing fails, the VFQB model identifies the fault with high accuracy, which can effectively shorten maintenance downtime and yield substantial economic benefits.
3.4. Ablation Experiments
To verify the effectiveness of each component of the proposed model, three ablation models were formed by removing or replacing individual modules:
- (1)
Model 1: This model is constructed by removing the quadratic network module from the original model.
- (2)
Model 2: This model is constructed by replacing the BiGRU network with a BiLSTM network on the basis of the original model.
- (3)
Model 3: This model is constructed by removing the attention module from the original model.
The above models are subjected to ablation experiments and evaluated for precision, recall, and F1-score.
Table 4 and
Table 5 present a comparison of the performance of this paper’s model with the ablation models.
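The evaluation metrics named above can be computed from a confusion matrix as sketched below. The row/column convention (rows are true classes, columns are predicted classes), the function name `per_class_metrics`, and the 3-class example matrix are assumptions made for illustration; the numbers are not results from this paper.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, F1 and overall accuracy from a confusion matrix.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                      # correctly classified samples per class
    predicted = cm.sum(axis=0)            # samples predicted as each class
    actual = cm.sum(axis=1)               # samples truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy

# Hypothetical 3-class confusion matrix
cm = [[50, 0, 0],
      [0, 48, 2],
      [0, 1, 49]]
precision, recall, f1, accuracy = per_class_metrics(cm)
print(precision, recall, f1, accuracy)
```

Averaging the per-class values (macro averaging) gives a single precision, recall, and F1-score per model; whether the tables use macro or weighted averaging is not stated here.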
The results show that the diagnostic performance of the proposed model exceeds that of all ablation models. In Experiment 1, the diagnostic accuracy of VFQB is improved by 0.43%, 1.69%, and 0.86% compared to Model 1, Model 2, and Model 3, respectively. In Experiment 2, the accuracy is improved by 2.08% and 4.46% compared to Model 1 and Model 2, respectively, which verifies the contribution of each module to the model’s performance. Specifically, removing the quadratic network module (Model 1) slightly reduces the diagnostic accuracy, indicating that this module contributes to the model’s diagnostic capability to some extent. Replacing the BiGRU with a BiLSTM (Model 2) decreases the diagnostic accuracy compared to the original model, indicating that the BiGRU network is more effective at capturing the dynamic characteristics of the time series data for this task. Removing the attention module (Model 3) has a comparatively small impact on diagnostic performance, suggesting that the attention mechanism provides a relatively limited gain but still helps the model focus on and extract informative features to some extent. Overall, by combining these key modules, the original model integrates information more comprehensively, thereby improving both the accuracy and robustness of fault diagnosis.
3.5. Comparative Experiments
To evaluate the superiority of VFQB in noisy environments, a comparison is conducted with three existing deep learning algorithms: MCNN-LSTM [
43], BearingPGA-Net [
44], and Laplace_Inception [
45]. The experimental results are presented in
Table 6 and
Table 7.
As observed from
Table 6 and
Table 7, the diagnostic accuracy of all models decreases to some extent as the signal-to-noise ratio (SNR) decreases. This decline is primarily due to excessive noise, which obscures the characteristics of the original signal. Consequently, as the level of noise increases, the model’s immunity to noise becomes more critical. Among the models evaluated, the VFQB demonstrates superior fault diagnosis accuracy in the −12 to 12 dB SNR range. This can be attributed to the model’s robust feature extraction capabilities, which allow it to effectively identify fault features within the data, even in the presence of noise interference across both the temporal and spatial domains.
When the SNR exceeds 6 dB, all algorithms demonstrate improved fault diagnosis performance in both experiments, achieving accuracy rates above 90% due to the lower noise levels. However, as noise intensity increases, the diagnostic performance of the MCNN-LSTM and Laplace_Inception algorithms deteriorates significantly. Specifically, at an SNR of −12 dB, the accuracy of the MCNN-LSTM model drops to approximately 20%, whereas the VFQB model maintains an accuracy of nearly 85% at the same SNR. This performance can be attributed to the attention module’s ability to focus on important features despite the presence of noise, coupled with the advantages of the BiGRU and quadratic neural network in processing temporal signals. In highly noisy environments with SNRs below 0 dB, the diagnostic accuracy of VFQB decreases to a lesser extent and outperforms the other models. For instance, at −12 dB, the diagnostic accuracy of VFQB in Experiment 1 is 91.17%, which is 69.22% higher than that of MCNN-LSTM, 30.2% higher than BearingPGA-Net, and 9.29% higher than Laplace_Inception. In Experiment 2, the diagnostic accuracy of VFQB is 95.31%, which is 69.02% higher than MCNN-LSTM, 8.21% higher than BearingPGA-Net, and 8.21% higher than Laplace_Inception, highlighting the superior performance of VFQB in high-noise environments. In summary, VFQB is demonstrated through comparative experiments to possess exceptional noise suppression capabilities, effective fault information extraction from vibration signals, and significant potential for intermediary bearing fault diagnosis in noisy environments.
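The SNR levels in the comparison above can be reproduced on a clean signal by scaling additive noise to a target power ratio, using SNR(dB) = 10 log10(P_signal / P_noise). The sketch below assumes additive white Gaussian noise, which is a common choice but is not confirmed as the noise model used in these experiments; the function names and the 50 Hz test tone are hypothetical.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise at the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, given the clean reference and its noisy version."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)                       # fixed seed for repeatability
clean = np.sin(2 * np.pi * 50 * np.arange(120_000) / 12_000)
noisy = add_awgn(clean, snr_db=-12.0, rng=rng)
print(round(measured_snr_db(clean, noisy), 2))
```

At −12 dB the noise power is roughly 16 times the signal power, which gives a sense of how severely the fault features are masked at the lowest SNR in the comparison.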