1. Introduction
Rotating machinery is a core power unit in industrial production, intelligent manufacturing, and agricultural equipment. In intelligent agriculture in particular, the maintenance of harvesting and power units is crucial for food security [1]. The reliability of such machinery depends directly on its bearings, whose failure accounts for 60–70% of mechanical transmission malfunctions [2]. As emphasized in recent reviews [3], intelligent diagnosis has become the cornerstone of ensuring the operational stability of complex mechanical systems. Bearing diagnosis methods include vibration monitoring, clearance measurement, and temperature detection, among others. Temperature measurement stands out for its real-time performance and sensitivity: changes in temperature in the friction zone are a crucial technical indicator of variations in the operating status of the bearing units. The thermal load of the bearing exhibits three modes (stable temperature, quasi-steady heating, and post-stable temperature jump), which reflect operational states from normal to critical failure. Experimental studies confirm that temperature correlates strongly with vibration and clearance; for example, bearing failure in robotic systems occurs when the temperature exceeds 73 °C [4]. FEA-based thermal conductivity simulation enables an accurate conversion between surface and friction-zone temperatures. Integrating temperature measurement with other methods forms a comprehensive diagnostic system, supporting predictive maintenance and extending the useful life of the equipment.
Rolling bearings, as a pivotal component of textile machinery, directly influence the operational stability and safety of the associated equipment. Consequently, bearing fault diagnosis is of paramount importance in practical engineering applications. Specifically, fault diagnosis is accomplished by extracting fault-relevant feature information from the vibration signals acquired during the rolling bearings’ service cycle. However, the collected fault signals are typically strongly nonlinear and non-stationary, which limits the ability of traditional diagnostic methods to extract fault features effectively [5,6]. Many researchers have made extensive efforts to address this challenge [7].
Both the wavelet transform method and EMD are known for their strong time–frequency resolution capabilities; thus, they have been extensively employed in noise suppression tasks associated with industrial equipment [8,9,10]. In 1995, the pioneering wavelet threshold denoising algorithm was proposed by Donoho and Johnstone [11]. Later, Donoho further supplemented the theoretical basis of soft-thresholding denoising, optimizing the coefficient shrinkage strategy [12]. The core mechanism of this approach is the threshold-based processing of wavelet coefficients: when the magnitude of a wavelet coefficient falls below a predefined threshold, the coefficient is regarded as noise-dominant and is set to zero. In contrast, coefficients exceeding this threshold are identified as signal-dominant and are either retained intact (hard thresholding) or shrunk toward zero by a fixed value (soft thresholding). Subsequently, the denoised signal is recovered via wavelet reconstruction from the adjusted coefficients. This foundational algorithm has prompted extensive research, primarily focused on improving wavelet function selection [13], decomposition level selection [14,15], threshold selection methods [16,17], and threshold functions [18]. Liu, H. proposed an improved wavelet threshold function based on noise variance estimation, which enhanced the adaptability of denoising for non-stationary signals [19]. Bayer, F. designed an iterative wavelet threshold method, effectively reducing signal distortion caused by fixed thresholding [20]. Qiao, Y. proposed a seismic signal denoising method integrating VMD and improved wavelet thresholds, which strengthened the extraction of weak fault information in complex noise [21]. Zhang, L. developed a speech enhancement method based on improved wavelet thresholds and optimized VMD, balancing noise suppression and signal detail retention [22]. Nevertheless, the wavelet threshold denoising method is inherently sensitive to the local time–frequency characteristics of signals. Consequently, when processing signals with intricate time–frequency distributions, this method may denoise inadequately [23] or introduce considerable deviations into the processed results.
In 1998, the authors of [24] proposed EMD. Based on the time-scale attributes of the data, this method achieves adaptive signal decomposition. Although it overcomes the limitations of the wavelet threshold denoising technique, it suffers from intrinsic drawbacks related to end effects and mode mixing [25]. In 2009 [26], ensemble empirical mode decomposition (EEMD) was proposed on the basis of noise-assisted analysis. Although this method rectifies certain limitations of EMD and improves the precision of decomposition, it exhibits insufficient robustness in signal decomposition [27]. In practical applications [28], the discrete wavelet transform (DWT) has been used to remove noise from GIS partial discharge signals. Despite its effectiveness in eliminating white noise, this approach faces challenges when confronted with highly nonlinear and non-stationary data [9]. An adaptive noise cancellation algorithm has been combined with EMD to partition narrowband interference into multiple frequency bands. Despite its remarkable adaptive filtering performance, this approach is prone to losing certain time or frequency scales, making it unable to recover the inherent characteristics of the original signal [29,30]. EEMD has been used to denoise partial discharge and vibration signals from transformers, achieving effective suppression of mode mixing while maximizing the preservation of useful information within intrinsic mode functions (IMFs). However, EEMD involves multiple random sampling and decomposition processes, making it challenging to select the appropriate regularization parameters. Furthermore, repeated testing and adjustment are indispensable for signal extraction, which significantly reduces the efficiency of signal processing.
To overcome these challenges, VMD was introduced in 2014 [31]. This algorithm is capable of adaptively matching the optimal center frequency and constrained bandwidth corresponding to each intrinsic mode function (IMF), which promotes the effective separation of IMFs and partitioned signals in the frequency domain, thereby acquiring valid decomposition components of the analyzed signal. This not only overcomes inherent drawbacks (e.g., mode mixing) existing in traditional EMD but also demonstrates superior time–frequency localization performance [32]. Given that vibration signals from textile machinery are generally characterized by high noise content and complex harmonic components, the adoption of VMD aids in extracting the local time–frequency characteristics of signals while removing noise and harmonic components. Consequently, the VMD algorithm is applicable for denoising the vibration signals of rolling bearings. However, it remains critical to overcome the inherent deficiencies of the algorithm, improve computational efficiency, and ensure the accuracy and anti-interference capability of vibration monitoring systems.
In practice, the VMD-based denoising method generally requires empirical knowledge or multiple iterative trials to determine the values of two core parameters: the penalty factor $\alpha$ and the number of intrinsic mode functions $K$. Recent studies [33] have explored various meta-heuristic algorithms to automate the parameter tuning of VMD, yet the balance between exploration and exploitation remains a challenge. Excessively large or small values of $\alpha$ and $K$ lead to insufficient time and frequency resolution during signal decomposition, so that the decomposition results cannot accurately capture the true inherent characteristics and information of the signal.
To acquire the optimal VMD parameters, a novel joint signal denoising method is presented and applied to the denoising process of rolling bearing fault signals. The proposed joint denoising method leverages the IEWOA optimization algorithm to overcome the inherent randomness in VMD parameter determination; meanwhile, in combination with wavelet threshold denoising technology, it enhances the comprehensive denoising performance of the algorithm and maintains signal integrity. The application of this method can address the low signal-to-noise ratio issue of early fault information in rolling bearings, thereby enabling effective extraction of fault features.
In existing research, VMD has become a mainstream signal decomposition method for bearing denoising due to its superior stability over EMD/EEMD. The advantages and disadvantages of various methods are shown in Table 1. However, its performance is limited by parameter optimization. Recent studies have focused on improving VMD with intelligent optimization algorithms, but key limitations remain: unbalanced exploration/exploitation, trade-offs between computational efficiency and denoising effect, and the lack of fair comparisons. COA-VMD [34] simulates coati foraging behavior, introducing a dynamic weight strategy to optimize VMD’s $\alpha$ and $K$. Applied to acoustic signal denoising of high-voltage shunt reactors, its core improvement lies in “population segmentation + adaptive step size”, enhancing parameter optimization accuracy for complex signals. EWOA-VMD [35] improves the WOA with chaotic initialization and Levy flight to optimize VMD parameters for fault diagnosis; its core improvement is “chaotic perturbation + adaptive inertia weight”, which alleviates local optimum issues. Improved initialization strategies, such as the Sobol sequence and chaotic mapping [36], have been shown to significantly enhance the global convergence of the WOA.
Inadequate adaptability and stability of parameter optimization: existing algorithms (e.g., PSO-VMD [37], GA-VMD [38], COA-VMD) struggle to balance global exploration and local exploitation, with local optimum rates generally >10%. Moreover, VMD parameter optimization relies on fixed objective functions, failing to adapt to the nonlinear/non-stationary characteristics of bearing signals. According to information theory, envelope entropy is highly sensitive to the periodic impacts generated by bearing faults: a signal with distinct impulsive features has a sparse envelope and therefore a lower entropy value, whereas a noise-contaminated signal shows high entropy. Minimizing this indicator enables the adaptive identification of optimal VMD parameters, thereby maximizing the clarity of fault-related components while suppressing random interference [39].
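The role of envelope entropy as a fitness indicator can be illustrated with a minimal sketch. It assumes the common definition in which the Hilbert envelope is normalized to a probability distribution and its Shannon entropy is taken; the function names `analytic_signal` and `envelope_entropy` and the test signals are hypothetical, and the exact objective used with the IEWOA may differ in detail.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal built with the FFT (standard discrete Hilbert construction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def envelope_entropy(x):
    """Shannon entropy of the normalized Hilbert envelope (assumed definition)."""
    envelope = np.abs(analytic_signal(x))
    p = envelope / envelope.sum()         # normalize the envelope to sum to 1
    p = p[p > 0]                          # skip zero entries to avoid log(0)
    return float(-np.sum(p * np.log(p)))

n = 1000
t = np.arange(n)
tone = np.cos(2 * np.pi * 0.05 * t)       # constant envelope: maximum entropy log(n)
impacts = np.zeros(n)
impacts[::100] = 1.0                      # idealized periodic impacts
noise = np.random.default_rng(0).standard_normal(n)

entropy_tone = envelope_entropy(tone)
entropy_impacts = envelope_entropy(impacts)
entropy_noise = envelope_entropy(noise)
```

In this toy setting the impulsive signal yields a clearly lower entropy than white noise, which is why an optimizer that minimizes the entropy of the decomposed modes tends to favor parameter choices that isolate impact-like components.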
To optimize the VMD parameters, an innovative joint denoising approach for rolling bearing fault signals is introduced. The proposed approach leverages the IEWOA to eliminate randomness in VMD parameter selection while applying wavelet threshold denoising to enhance the overall performance and preserve signal integrity. This methodology addresses the low signal-to-noise ratio (SNR) of early-stage bearing failure signals, facilitating fault feature extraction.
In addition, most existing methods are validated on a single dataset and do not address coupled noise in practical engineering, and secondary denoising strategies lack dataset-specific design. Validating robustness across multiple public datasets is essential for ensuring the generalizability of denoising models [40].
The main contributions of this study are summarized as follows:
A method combining IEWOA-VMD with wavelet secondary denoising is proposed. This method first performs VMD decomposition on the signal and then conducts secondary wavelet decomposition on the basis of the VMD-decomposed results. The VMD parameters are determined by the IEWOA. This method significantly reduces the likelihood of the whale optimization algorithm (WOA) falling into a local optimum and enables the acquisition of higher-quality vibration signals after secondary denoising. The integration of secondary denoising techniques has shown superior performance in preserving fault harmonics compared to single-stage methods [41].
To validate the performance of the IEWOA-VMD + wavelet secondary denoising method, this study conducts a comparative analysis between the proposed method and two conventional approaches: VMD and EEMD. Meanwhile, the data processed by different methods are input into various models for training and comparative analysis. The evaluation metrics used in the comparison include RMSE, SNR, NCC, accuracy, and loss value. Experimental results on two datasets, the Case Western Reserve University (CWRU) bearing dataset from the United States and the Paderborn University (PU) bearing dataset from Germany, show that the proposed method outperforms the comparative methods on both.
The IEWOA introduces six improvements to the original WOA: First, the Sobol sequence is incorporated into the population initialization stage to reduce the possibility of the algorithm falling into a local optimum due to uneven initial population distribution. Second, a nonlinear parameter adjustment strategy is adopted during the iteration process to improve the algorithm’s global and local search capabilities. Third, a heuristic probability strategy is integrated into the whale position update process to balance the algorithm’s local exploitation and global exploration capabilities. Fourth, the Levy flight strategy is introduced to prevent the algorithm from falling into local optimum traps in the later stages of iteration. Fifth, adaptive t-distribution mutation is added in the late iteration stage to enhance the algorithm’s ability to jump out of local optima [42]. Sixth, a reflective boundary strategy is applied to whales at edge positions to avoid the loss of search ability caused by whale positions that remain unchanged after updates. To verify the performance of the IEWOA, this paper compares it with four other optimization algorithms and confirms its performance advantages through the comparison of fitness values.
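Two of the improvements listed above, the Levy flight step and the reflective boundary, can be sketched generically. The block below is only an illustration under stated assumptions: the Levy step uses Mantegna's algorithm with a stability index of 1.5, the boundary rule mirrors an out-of-range coordinate back across the violated bound, and the search ranges for $K$ and $\alpha$ are hypothetical. The IEWOA's exact update rules are not reproduced here.

```python
import math
import numpy as np

def levy_step(size, beta=1.5, rng=None):
    """Heavy-tailed random step drawn with Mantegna's algorithm (assumed variant)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def reflect_into_bounds(position, lower, upper):
    """Mirror out-of-range coordinates back into [lower, upper]."""
    position = np.asarray(position, dtype=float)
    reflected = np.where(position < lower, 2 * lower - position, position)
    reflected = np.where(reflected > upper, 2 * upper - reflected, reflected)
    # A very large overshoot can still land outside after one reflection.
    return np.clip(reflected, lower, upper)

# Hypothetical search ranges for (K, alpha), used only for illustration.
lower = np.array([2.0, 100.0])
upper = np.array([10.0, 5000.0])
candidate = np.array([1.0, 5200.0])       # both coordinates violate a bound
repaired = reflect_into_bounds(candidate, lower, upper)
step = levy_step(2, rng=np.random.default_rng(0))
```

Compared with simply clipping to the bound, reflection moves the candidate back into the interior of the search space, so several agents that overshoot the same edge do not all collapse onto an identical boundary position.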
2. Variational Mode Decomposition
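VMD is an adaptive signal processing technique based on Wiener filtering that dispenses with the requirement of predefining the mathematical model corresponding to the signal [31]. Given a predefined number of modes $K$, VMD iteratively seeks optimal solutions for variational modes, yielding $K$ band-limited intrinsic mode functions (BIMFs) with frequency centers and mode functions. Each BIMF is an amplitude-modulated and frequency-modulated (AM-FM) signal. To estimate bandwidth, the constrained variational problem is formulated as Equation (1), with the constraint condition that the sum of all decomposed mode functions equals the original signal, as shown in Equation (2):

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \tag{1}$$

$$\text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{2}$$

where $u_k(t)$ is the decomposed mode function, $\omega_k$ is the center frequency, $\partial_t$ is the partial derivative with respect to time, $\delta(t)$ is the unit impulse function, and $f(t)$ is the original signal. Introducing the penalty factor $\alpha$ and Lagrangian multiplier $\lambda(t)$ yields the augmented Lagrangian function $L$:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{3}$$

The Alternating Direction Method of Multipliers (ADMM) iteratively identifies saddle points of the augmented Lagrangian function $L$ (Equation (3)). Initialize $\hat{u}_k^{1}$, $\omega_k^{1}$, and $\hat{\lambda}^{1}$; then, increment $n$ and loop $k = 1, \dots, K$ to update the frequency-domain mode function $\hat{u}_k$ using Equation (4):

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha \left( \omega - \omega_k \right)^2} \tag{4}$$

Using Equation (5), update $\omega_k$:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 \, d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 \, d\omega} \tag{5}$$

where $\hat{u}_k(\omega)$ is the mode function in the frequency domain, $\hat{\lambda}(\omega)$ is the Lagrangian multiplier in the frequency domain, and $\hat{f}(\omega)$ is the original signal in the frequency domain.
Update the Lagrangian multiplier $\hat{\lambda}$ using Equation (6) to ensure the convergence of the variational problem:

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{6}$$

where $\tau$ is the noise tolerance coefficient (recommended $\tau = 0$ for optimal denoising).
Repeat Equations (3)–(6) until the convergence condition defined in Equation (7) is satisfied:

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \tag{7}$$

where $\varepsilon$ is the convergence tolerance.
Output $K$ components upon loop termination.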
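The update loop described above can be sketched in a few lines of NumPy. This is a simplified illustration rather than a reference implementation: it omits the mirror extension of the signal used by common VMD codes, works on the one-sided spectrum, measures frequency in cycles per sample, and assumes a zero-mean input. The function name `vmd_sketch`, the uniform initialization of the center frequencies, and the two-tone test signal are all hypothetical choices.

```python
import numpy as np

def vmd_sketch(signal, K, alpha, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD: alternating Wiener-filter and center-frequency updates."""
    f = np.asarray(signal, dtype=float)
    n_samples = f.size
    freqs = np.fft.fftfreq(n_samples)                 # cycles per sample
    positive = freqs > 0
    f_hat = np.where(positive, np.fft.fft(f), 0.0)    # one-sided spectrum

    u_hat = np.zeros((K, n_samples), dtype=complex)   # mode spectra
    omega = 0.5 * np.arange(K) / K                    # uniform initial centers
    lam_hat = np.zeros(n_samples, dtype=complex)      # Lagrangian multiplier

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Mode update (Wiener filter centered at omega[k])
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency update: power-weighted mean frequency
            power = np.abs(u_hat[k, positive]) ** 2
            omega[k] = np.sum(freqs[positive] * power) / (np.sum(power) + 1e-30)
        # Dual ascent on the multiplier (inactive when tau = 0)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Relative change of the modes as the stopping criterion
        change = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
        scale = np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-30
        if np.sum(change / scale) < tol:
            break

    # Real-valued modes recovered from the one-sided spectra
    modes = 2.0 * np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega

t = np.arange(1000)
two_tone = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
modes, centers = vmd_sketch(two_tone, K=2, alpha=2000)
```

For this clean two-tone signal, the estimated center frequencies settle near 0.05 and 0.2 cycles per sample and the two modes sum back to the input. With real bearing signals, the quality of the split depends strongly on $K$ and $\alpha$, which is the motivation for optimizing them automatically.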
4. Wavelet Threshold Denoising
The time–frequency localization characteristic of the wavelet transform allows it to focus on points of abrupt signal change (such as the impact pulses generated by bearing failure). Threshold filtering removes only the wavelet coefficients dominated by noise, while key information such as the amplitude and phase of the fault features is largely preserved. This matches the characteristics of early fault signals of rolling bearings, i.e., “weak impact and strong noise masking”, and avoids the excessive smoothing or distortion of features caused by other methods. The noise in bearing vibration signals is distributed over a wide frequency band. Wavelet thresholding decomposes the signal into detail components of different frequency bands through multi-scale decomposition and suppresses noise in each band in a targeted manner; in particular, it can separate noise that overlaps with the fault’s characteristic frequency band while retaining the fault features.
By comparison, moving-average filtering only suppresses high-frequency noise and cannot handle low-frequency interference; it over-smooths fault impact characteristics, resulting in the loss of weak fault information, and it adapts poorly to non-stationary signals. Filtering based on the Fourier transform lacks time–frequency localization capability, cannot distinguish “noise in the same frequency band as the fault”, and is prone to spectral leakage for non-stationary signals, resulting in distortion of fault characteristics. EMD suffers from mode aliasing and endpoint effects, making it difficult to accurately separate fault features from noise components and resulting in poor stability when decomposing signals with strong noise.
For the effective IMF components obtained from VMD decomposition, wavelet multi-scale decomposition is further used to remove residual noise that overlaps with fault features, while retaining the weak fault harmonic components that are not fully highlighted in the IMF components. This combination of “coarse division + fine extraction” compensates for the shortcomings of VMD decomposition alone in suppressing fine-grained noise, as well as the limited ability of wavelet thresholding alone to separate strongly coupled noise. The workflow of wavelet threshold denoising is depicted in Figure 3.
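The decompose, threshold, and reconstruct workflow can be sketched with a multi-level Haar transform, which needs nothing beyond NumPy. The Haar wavelet is swapped in here purely for brevity, since the study itself uses ‘db’ wavelets. The function names, the step-shaped test signal, and the noise level are hypothetical, and the input length is assumed to be divisible by 2 raised to the number of levels.

```python
import numpy as np

def haar_decompose(x, levels):
    """Multi-level Haar DWT: returns the final approximation and a detail list."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # detail (high-pass) band
        approx = (even + odd) / np.sqrt(2.0)          # approximation (low-pass)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    for detail in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def haar_denoise(x, levels, threshold):
    """Soft-threshold every detail band, keep the approximation untouched."""
    approx, details = haar_decompose(x, levels)
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0) for d in details]
    return haar_reconstruct(approx, shrunk)

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(512), np.ones(512)])   # toy step signal
noisy = clean + 0.2 * rng.standard_normal(clean.size)
threshold = 0.2 * np.sqrt(2.0 * np.log(clean.size))     # noise level assumed known
denoised = haar_denoise(noisy, levels=4, threshold=threshold)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

With a zero threshold the pipeline returns the input unchanged, which is a quick check that the reconstruction is exact. In the proposed method, the same three steps would be applied to each retained IMF rather than to the raw signal.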
Decomposition and reconstruction stages form the core components of wavelet transform, and parameter selection imposes a significant influence on the obtained results. Wavelet basis functions are the fundamental basis of wavelet analysis; the selection of different wavelet basis functions directly affects the final decomposition and reconstruction results, ultimately determining the quality of the denoised signal. The number of decomposition levels is typically chosen in accordance with specific signal characteristics and application requirements; however, this method only provides fixed values for reference. The selection of the optimal decomposition level has a profound impact on the denoising effect, rendering it a key factor that determines the performance of the wavelet threshold denoising algorithm. Furthermore, the quality of wavelet threshold denoising results is dependent on threshold selection: an excessively high threshold will induce signal distortion, while an excessively low threshold will lead to residual noise remaining in the signal, thereby resulting in unsatisfactory denoising performance.
4.1. Selection of Wavelet Basis Functions
When utilizing the wavelet threshold method for the denoising of rolling bearing vibration signals, the selection of wavelet basis functions is contingent upon the demand for accurate depiction of vibration signal information. The primary factors generally considered include whether the function exhibits compact support and sufficient vanishing moment characteristics within a specified interval. The Daubechies (‘db’) function, as a high-precision wavelet basis function, not only possesses orthogonality and satisfies the compact support condition but also maintains a high degree of similarity with rolling bearing vibration signals. Consequently, this research selects the ‘db’ wavelet function as the wavelet basis for the analysis of rolling bearing vibration signals.
4.2. Wavelet Threshold Selection Criteria
Establishing selection criteria for wavelet thresholds is an essential step in the wavelet threshold denoising process. Generally, the selection methods include four criteria: MiniMaxi, Sqtwolog, Rigrsure, and Heursure. On this basis, this paper introduces the threshold function proposed by Qiao et al. [21] (which integrates soft and hard threshold functions) and the improved threshold processing function proposed by Zhang et al. [22] (which falls between hard and soft threshold functions) for comparative selection.
The MiniMaxi criterion uses a fixed threshold chosen to achieve minimax performance, i.e., to minimize the worst-case mean squared error relative to an ideal estimator. Because it is comparatively conservative, it tends to retain small signal details that lie close to the noise level.
The Sqtwolog criterion applies the fixed universal threshold $\sigma\sqrt{2\ln N}$, where $N$ is the signal length and $\sigma$ is the noise level, thereby eliminating coefficients below this predefined threshold. This criterion is particularly effective for the denoising of signals with relatively stable noise levels.
The Rigrsure criterion selects the threshold adaptively by minimizing Stein’s unbiased risk estimate (SURE) computed from the coefficients themselves. Because the resulting threshold is usually lower than the universal one, it preserves a greater amount of high-frequency information and is well suited for processing high-frequency signals.
The Heursure criterion integrates the advantages of both the Sqtwolog and Rigrsure criteria: when the estimated SNR is very low, the SURE estimate becomes unreliable and the fixed universal threshold is used; otherwise, the SURE-based threshold is applied. Its objective is to retain as many inherent signal characteristics as possible while guaranteeing the desired denoising performance.
Considering that heuristic denoising (based on the Heursure criterion) has obvious advantages over the other three thresholds, and since the vibration signals of rolling bearings are mainly in the medium- and low-frequency range, this study adopts the Heursure criterion to denoise the vibration signals of rolling bearings.
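To make the role of the threshold concrete, the sketch below shows the fixed universal threshold together with the classic hard and soft threshold functions. It assumes the usual median-absolute-deviation estimate of the noise level from a detail band; the function names and sample coefficients are hypothetical. The SURE-based and heuristic rules, and the improved threshold functions of [21] and [22], are not reproduced here.

```python
import numpy as np

def universal_threshold(detail_coeffs):
    """Fixed-form threshold sigma * sqrt(2 ln N), with sigma estimated by MAD."""
    d = np.asarray(detail_coeffs, dtype=float)
    sigma = np.median(np.abs(d)) / 0.6745     # robust noise estimate (assumption)
    return sigma * np.sqrt(2.0 * np.log(d.size))

def hard_threshold(coeffs, t):
    """Zero coefficients at or below t in magnitude; keep the rest unchanged."""
    c = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(c) > t, c, 0.0)

def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

coeffs = np.array([-3.0, -0.5, 0.2, 2.0])
kept_hard = hard_threshold(coeffs, 1.0)       # large coefficients unchanged
kept_soft = soft_threshold(coeffs, 1.0)       # large coefficients reduced by 1
```

Hard thresholding preserves the amplitude of retained coefficients but introduces discontinuities at the threshold, whereas soft thresholding is continuous but biases large coefficients. This trade-off is what compromise threshold functions lying between the two aim to balance.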
4.3. Selection of the Number of Wavelet Decomposition Layers
Typically, the determination of the number of wavelet decomposition layers is performed manually. The employment of fixed decomposition layers will inevitably introduce limitations to the signal denoising process. As the number of decomposition layers increases, more detailed information within the signal can be extracted. However, when the layer count exceeds a certain critical point, overfitting may occur, where noise is erroneously classified as signal, and the useful components of the original signal are eliminated. Consequently, in practical application scenarios, it is necessary to comprehensively assess the signal processing performance and select a reasonable number of wavelet decomposition layers to ensure the integrity of the processed signal.
6. Experimental Analysis
6.1. Vibration Signal Analysis
The time-domain and frequency-domain representations of the four different state signals under no-load conditions from the CWRU dataset are shown in Figure 5. From the time-domain diagrams, all signals exhibit irregular amplitude fluctuations within the range of [−0.8, 0.8] (normal: [−0.6, 0.6]; inner: [−0.75, 0.75]; ball: [−0.8, 0.8]; outer: [−0.75, 0.75]), with no obvious fault-related impulse features, because the early fault signals are weak and masked by background noise.
For the PU dataset, six different states under no-load conditions were selected as samples, including normal state, artificial outer ring damage, real outer ring damage, artificial inner and outer ring damage, artificial inner ring damage, and real outer ring damage. The time-domain and frequency-domain representations of the signals for these six states are shown in Figure 6. Notably, the compound fault state shows no obvious superposition of inner/outer ring fault features in either the time or frequency domain, further confirming that noise severely obscures fault information.
As shown in Figure 5 and Figure 6, the time-domain signal distributions under different bearing conditions are very similar, making it difficult to distinguish between fault types. Although there are subtle differences among these cases, it remains challenging to differentiate them. The above analysis indicates that the early fault signals of rolling bearings are characterized by “noise dominance and weak fault features” in both the time and frequency domains. Traditional methods fail to effectively separate fault features from noise, due to insufficient adaptability to such complex signals. This directly motivates the proposed IEWOA-VMD + wavelet threshold denoising method, which aims to first decompose the signal into pure modal components via optimized VMD, and then enhance fault features through secondary denoising.
6.2. Signal Decomposition via IEWOA-VMD
The results from CWRU are shown in Figure 7, and the results from PU are shown in Figure 8. Due to the differences in the operating conditions of the signals, it can be observed that there are variations in the amplitude, phase, and instantaneous frequency between the two datasets. Therefore, in order to obtain effective components, the optimal VMD decomposition parameters obtained for the two datasets after IEWOA optimization also differ. In Figure 7, the three decomposed IMFs exhibit distinct frequency characteristics. The center frequencies of the IMFs are non-overlapping, and no energy leakage between components is observed, confirming the absence of mode mixing. For the PU dataset (Figure 8), the optimized K = 6 is due to the more complex harmonic components of real-world fault signals. The first four IMFs account for 91% of the total signal energy: IMF-1 matches the theoretical outer ring fault frequency, IMFs 2–4 correspond to fault harmonics, and IMFs 5–6 are low-energy noise components. This indicates that IEWOA-VMD adaptively decomposes the signal into “fault-dominant IMFs” and “noise-dominant IMFs” based on the dataset characteristics.
After the IEWOA-VMD decomposition, neither mode mixing nor over-decomposition occurred between the modes of the CWRU and PU datasets. This indicates that the IEWOA achieved global optimization and exhibited robustness. An experimental comparison of VMD decomposition was conducted among the IEWOA, COAT, DE, GA, and PSO algorithms. The iteration curves of these five optimization algorithms are shown in Figure 9 and Figure 10, where it can be seen that, compared with other algorithms, the IEWOA demonstrates a faster convergence rate. The parameter settings and optimization durations of the five algorithms are shown in Table 4. Although the PSO algorithm performs local optimization more rapidly in the initial stage for the PU dataset, its global search capability is inferior to that of the IEWOA, and it falls into the trap of local optimization. During the iteration process, the IEWOA can find positions with smaller average envelope entropy. Moreover, when other algorithms fall into the trap of local optimization, the IEWOA can identify better solutions. Real-time monitoring of industrial equipment typically requires a signal processing latency of ≤1 s per data segment. Most algorithms meet this requirement in terms of computation time, but IEWOA-VMD has the shortest computation time.
To verify the effectiveness of the improvements made to the WOA in this study, ablation experiments were conducted on the IEWOA using the two datasets; the results are shown in Figure 11 and Figure 12. Table 5 records the final fitness values of the ablation experiments across different datasets.
Among the terms cited above, NWOA refers to the WOA integrated with adaptive t-distribution perturbation, IWOA refers to the WOA integrated with differential perturbation, LWOA refers to the WOA integrated with the Levy flight strategy, and BWOA refers to the WOA integrated with boundary handling. It can be seen from Table 5, Figure 11 and Figure 12 that the improved whale optimization algorithm proposed in this study outperforms the single improved algorithms across different datasets. Particularly in the more complex Paderborn dataset, the advantages of the proposed improved algorithm are more prominent, verifying the effectiveness of the improvement. Moreover, in the CWRU dataset, the proposed algorithm found better optimal solutions during the last two iterations, further demonstrating its strong global optimization capability.
The results of decomposing the other three signal states from the CWRU dataset using IEWOA-VMD are shown in Figure 13, Figure 14 and Figure 15.
It can be seen from Figure 13 that, even though the kurtosis values of the two original signals are very close, mode mixing still does not occur in the decomposed IMF2 and IMF3, indicating the effectiveness of the IEWOA in determining parameters.
6.3. Secondary Denoising via Wavelet Thresholding
To obtain optimal wavelet threshold denoising results, the parameters of the wavelet method were configured as follows.
(1) Selection of Wavelet Basis Function: Considering the compact support, the orthogonality of the wavelet basis function, and its high similarity to the vibration signals of rolling bearings, the ‘db’ wavelet function was selected as the wavelet basis function for analyzing the vibration signals of rolling bearings, in order to achieve better denoising results.
(2) Selection of Wavelet Threshold: Initially, the decomposition level of the wavelet was fixed at 6, and different threshold selection rules were applied to process each mode of VMD. The SNR, RMSE, and NCC were determined after the denoising process with different wavelet bases in the ‘db’ sequence. Experiments on wavelet threshold selection were conducted on the CWRU dataset, as shown in Figure 16.
As shown by the comprehensive comparison in Figure 16, when the Zhang 2025 threshold is applied for denoising, ‘db19’ yields the maximum SNR and the minimum RMSE; therefore, ‘db19’ is selected as the wavelet basis function for rolling bearing vibration signals. Similarly, when the Rigrsure threshold is adopted, ‘db8’ is chosen as the wavelet basis function. For the heuristic threshold, MiniMaxi threshold, Sqtwolog threshold, and Qiao 2021 threshold, ‘db19’ is selected. The results of the same experiment conducted on the PU dataset are presented in Figure 17. When the Zhang 2025 threshold is applied for denoising, ‘db12’ achieves the maximum SNR and the minimum RMSE, and the wavelet bases for the remaining thresholds are selected in the same way.
(3) Selection of the Number of Wavelet Decomposition Levels: The denoising results shown in
Figure 16 and
Figure 18 indicate that the application of the Zhang 2025 threshold for denoising achieved the optimal SNR, RMSE, and NCC, outperforming the other three wavelets. Therefore, in this study, the heuristic wavelet threshold method was used to denoise each mode after VMD, with the ‘db19’ wavelet basis and nine decomposition levels selected. The results of the experiment on the number of wavelet decomposition levels conducted on the PU dataset are presented in
Figure 19; thus, for denoising the data from the PU dataset, wavelet decomposition was performed with the Zhang 2025 threshold, ‘db12’ wavelet basis, and 10 decomposition levels.
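The thresholding step applied to the detail coefficients of each VMD mode can be illustrated with a minimal sketch. It assumes the conventional universal (Sqtwolog) rule with a median-based noise estimate and soft thresholding, not the Zhang 2025 rule; the function names are hypothetical, and the wavelet decomposition and reconstruction are assumed to be supplied by a wavelet library.

```python
import numpy as np

def universal_threshold(finest_detail, n_samples):
    """Universal threshold sigma * sqrt(2 ln N).

    sigma is estimated from the finest-level detail coefficients with the
    median absolute value divided by 0.6745 (a common robust estimate).
    """
    d = np.asarray(finest_detail, dtype=float)
    sigma = np.median(np.abs(d)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(n_samples))

def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero by thr; values with |c| <= thr become 0."""
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

def threshold_details(detail_levels, n_samples):
    """Apply one global threshold to every detail level.

    detail_levels is assumed to be ordered from coarsest to finest, so the
    last entry is used for the noise estimate. Approximation coefficients
    are not passed in and are left untouched by the caller.
    """
    thr = universal_threshold(detail_levels[-1], n_samples)
    return [soft_threshold(d, thr) for d in detail_levels]
```

In this sketch, changing the wavelet basis or the number of decomposition levels only changes the coefficients passed in, which is why those two settings can be tuned separately from the threshold rule.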
6.4. Comparison of Different Denoising Methods
To verify the effectiveness of the proposed method, the denoising results of wavelet, EEMD, VMD with fixed K and a (where K = 4 and a = 2000), four optimization algorithms combined with VMD, and the method proposed in this study are presented in
Table 6.
The RMSE values of the proposed denoising method on the CWRU and PU datasets were 0.00041 and 0.00013, respectively, the smallest values among the eight denoising methods. In terms of SNR, the proposed method ranks second only to EEMD on the CWRU dataset, and it far outperforms the VMD method on the PU (Paderborn) dataset. Compared with the other denoising methods, the proposed method better retains the original signal information, outperforming all optimized VMD-based methods on both datasets. Compared with DE-VMD, the proposed method improves the SNR by 4.65% and 45.3%, respectively, so the SNR remains competitive while the error stays low. The proposed method also achieves higher NCC values, at 0.9689 and 0.9798, respectively. Compared with the next-best baseline, it improves NCC by 0.9% (CWRU) and 4.4% (PU), consistently outperforming PSO-VMD, GA-VMD, and COAT-VMD. The NCC values are close to 1, indicating that the proposed method causes the least damage to the original structure of the signal. Compared with all other methods, the proposed method achieves a more balanced and superior performance in terms of RMSE, SNR, and NCC. Its advantages stem from two core improvements: first, the IEWOA’s six enhancements enable more accurate global optimization of the VMD parameters, avoiding the mode mixing and over-decomposition that plague baseline optimization algorithms; second, dataset-specific wavelet thresholding suppresses the residual noise in each mode.
From the analysis of the different denoising methods under different noise levels, it can be concluded that the denoising method proposed in this paper effectively removes the noise in the signal. It combines the adaptability of VMD, the time–frequency locality of wavelet decomposition, and the parameter-tuning advantage of the IEWOA. In addition, because it performs best overall on two datasets that differ significantly, the method can be considered highly robust.
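The three indicators used in this comparison can be computed as in the following minimal sketch. It assumes the commonly used definitions with the original signal as the reference, which may differ in normalization from the definitions used to produce Table 6; the function names are illustrative.

```python
import numpy as np

def rmse(reference, denoised):
    """Root-mean-square error between the reference and the denoised signal."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(denoised, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))

def snr_db(reference, denoised):
    """Signal-to-noise ratio in dB; the residual (x - y) is treated as noise."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(denoised, dtype=float)
    return float(10.0 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2)))

def ncc(reference, denoised):
    """Normalized cross-correlation; 1 means the waveform shape is preserved."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(denoised, dtype=float)
    return float(np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)))
```

Under these definitions, a lower RMSE and a higher SNR both indicate a smaller residual, while an NCC close to 1 indicates that the denoised waveform keeps the shape of the reference.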
6.5. Comparison of Different Methods Among Various Models
To better highlight the effectiveness of the proposed method, three denoising approaches (EEMD, VMD with fixed K and a, where K = 4 and a = 2000, and the method proposed in this paper) were compared across five models: CNN, CNN+BiGRU+attention, CNN+BiTCN, CNN+transformer, and CNN+TCN. The convergence curves of the different approaches in the CNN model are shown in
Figure 20 and
Figure 21 below.
The method proposed in this paper and the VMD method exhibit a significant convergence advantage in the initial stage, with the loss value rapidly decreasing from 1.75 to below 0.75. In contrast, the EEMD method shows a slow downward trend and takes 40 iterations to reach the level reached by the other methods after 20 iterations. It is worth noting that the proposed method and VMD essentially converge after 40 iterations, while the EEMD method maintains a loss value of around 0.5 throughout the entire training process, with a fluctuation range significantly larger than that of the other methods. This indicates insufficient stability in its optimization process.
The validation loss curve reflects the differences between the methods more clearly. After 20 iterations, the loss values of the proposed method and VMD stabilize below 0.5 and finally settle at approximately 0.25. However, the validation loss of the EEMD method remains above 0.5, and obvious fluctuations occur in the 80–100-iteration interval, indicating that its generalization ability on unseen data is weak. The validation loss of all methods is slightly higher than the training loss, but for the proposed method and VMD the gap stays within 0.1, demonstrating good generalization ability. In terms of training accuracy, the proposed method and the VMD method show excellent performance: the accuracy exceeds 0.8 within 20 iterations, stabilizes above 0.95 after 40 iterations, and finally approaches the perfect value of 1.0. In contrast, the EEMD method only reaches an accuracy of 0.7 after 60 iterations and finally stabilizes at around 0.8. The validation accuracy curve also highlights the differences between the methods: the proposed method and VMD stabilize above 0.9 after 40 iterations, while the EEMD method only reaches approximately 0.7 in the end, with obvious fluctuations.
A cross-analysis of the various indicators shows the following: in terms of convergence speed, the proposed method ≈ VMD > EEMD; in terms of fluctuation amplitude, the proposed method < VMD < EEMD, i.e., the proposed method is the most stable. It can therefore be concluded that the proposed method extracts the essential features of the signals more effectively. The convergence curves of the remaining CWRU data in the model are shown in
Figure 22,
Figure 23,
Figure 24 and
Figure 25.
In terms of the loss convergence curve, the method proposed in this paper exhibits the best convergence performance and the best generalization ability. Its initial convergence is faster: within 20 iterations, the loss value decreases from approximately 1.8 to 0.6, whereas under the same number of iterations VMD only decreases to 0.8 and EEMD only decreases to 1.2. In the later stage, the proposed method is also more stable: after 40 iterations, it stabilizes in the range of 0.25 ± 0.05, which is significantly better than VMD and EEMD. In terms of the accuracy convergence curve, the proposed method exceeds 0.9 earlier than both VMD and EEMD.
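The two qualitative criteria used in this comparison, convergence speed and late-stage stability, can be made explicit with a minimal sketch. The two helper functions and the choice of a 20-iteration tail are illustrative assumptions, not the procedure used to produce the figures.

```python
from statistics import pstdev

def first_iteration_reaching(values, target, higher_is_better=True):
    """Return the 1-based iteration at which a curve first reaches the target.

    Use higher_is_better=True for accuracy curves and False for loss curves.
    Returns None if the target is never reached.
    """
    for iteration, value in enumerate(values, start=1):
        reached = value >= target if higher_is_better else value <= target
        if reached:
            return iteration
    return None

def tail_fluctuation(values, tail=20):
    """Population standard deviation over the last `tail` iterations.

    A smaller value means a more stable late-stage curve.
    """
    return pstdev(values[-tail:])
```

Applied to the recorded training histories, `first_iteration_reaching(accuracy, 0.9)` gives a comparable convergence-speed figure for each method, and `tail_fluctuation(loss)` gives a comparable stability figure.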
The specific optimal results are shown in
Table 7.
As shown in
Table 7, the proposed method exhibits significant advantages in the core metrics (training accuracy, training loss, validation accuracy, and validation loss) across five models and two datasets. This performance gain is rooted in the physical compatibility between the denoising mechanism and bearing fault signals, which can be explained from four perspectives.
First, the proposed method achieves “coarse noise separation and fine feature extraction” by optimizing the VMD parameters through the IEWOA and applying dataset-specific wavelet thresholding. Early bearing fault signals are characterized by weak impact features masked by multi-band noise. VMD decomposition follows the physical logic of “modal orthogonality and bandwidth constraints”, effectively separating the fault fundamental frequencies, harmonics, and noise. Wavelet thresholding then suppresses the residual noise, resulting in significantly higher “feature purity” in the signals input to the model. For example, in the CNN model on the CWRU dataset, the training loss of the proposed method is 0.0045142, which is about 29.1% lower than that of VMD (0.0063635). This is because the model no longer needs to learn redundant noise information, allowing the loss function to converge quickly to a low level.
Second, the PU dataset contains compound faults (outer ring + inner ring damage) with superimposed frequency components. Traditional methods such as EEMD are prone to modal aliasing, leading to distorted features and poor model generalization. The IEWOA balances exploration and exploitation through six improvements, ensuring that the fault features extracted by VMD decomposition (e.g., an outer ring fault fundamental frequency of 100 Hz and an inner ring fault harmonic of 200 Hz) are more consistent with real physical phenomena. Thus, the validation accuracy of the proposed method in the CNN+BiTCN model on the PU dataset reaches 0.9175627, which is 23.1% higher than that of EEMD (0.7455197), with smaller fluctuations.
Third, the core physical features of bearing fault signals (i.e., impact pulses and frequency harmonics) are universal. The denoised signals produced by the proposed method do not depend on specific model structures: CNNs excel at extracting local impact features, while transformers excel at capturing global frequency dependencies. Clean fault signals can adapt to the feature learning logic of different models; hence, the proposed method maintains superior performance across all five models, verifying the physical reliability of the denoising effect.
Fourth, complex models have more parameters and stronger fitting capabilities. If the input signals contain noise, overfitting is likely to occur (e.g., the validation loss of EEMD-denoised signals reaches 0.4963078 in the CNN+transformer model). The proposed method achieves a better balance between “signal-to-noise ratio and feature integrity”, reducing the model’s over-learning of noise; its training loss in the CNN+transformer model on the CWRU dataset is only 0.0016482, which is 16.6% of that of EEMD (0.0099050) and 21.5% of that of VMD (0.007681).
In summary, the performance advantages of the proposed method do not depend on model adaptation but stem from the accurate capture of the physical nature of fault signals and effective noise suppression. Its cross-model and cross-dataset consistency, combined with statistical significance, confirms the reliability and universality of the denoising method, providing more solid signal preprocessing support for early bearing fault diagnosis.
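The relative differences quoted in this discussion can be recomputed directly from the reported values. The following minimal sketch uses only the numbers stated in the text; the helper names are illustrative.

```python
def percent_change(value, baseline):
    """Signed relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline

def percent_of(value, baseline):
    """Value expressed as a percentage of baseline."""
    return 100.0 * value / baseline

# Values quoted in the text for Table 7.
cnn_cwru_train_loss = {"proposed": 0.0045142, "vmd": 0.0063635}
bitcn_pu_val_acc = {"proposed": 0.9175627, "eemd": 0.7455197}
transformer_cwru_train_loss = {"proposed": 0.0016482, "eemd": 0.0099050, "vmd": 0.007681}

# Training loss, CNN on CWRU: proposed relative to VMD (negative = lower).
print(round(percent_change(cnn_cwru_train_loss["proposed"], cnn_cwru_train_loss["vmd"]), 1))
# Validation accuracy, CNN+BiTCN on PU: proposed relative to EEMD.
print(round(percent_change(bitcn_pu_val_acc["proposed"], bitcn_pu_val_acc["eemd"]), 1))
# Training loss, CNN+transformer on CWRU: proposed as a share of EEMD and of VMD.
print(round(percent_of(transformer_cwru_train_loss["proposed"], transformer_cwru_train_loss["eemd"]), 1))
print(round(percent_of(transformer_cwru_train_loss["proposed"], transformer_cwru_train_loss["vmd"]), 1))
```

Note that "x% lower than" and "x% of" are different quantities: the first is a relative change and the second is a ratio, so the two helpers are kept separate.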
7. Conclusions
This study proposes a hybrid denoising method combining the IEWOA to optimize VMD with dataset-specific wavelet thresholding for early fault signal processing in rolling bearings. Significant technical breakthroughs were achieved through systematic validation on two public datasets (CWRU and PU) and five deep learning models. The key findings and core contributions are as follows:
(1) The proposed method achieves a root-mean-square error (RMSE) as low as 0.00013 (PU dataset) and 0.00041 (CWRU dataset), representing reductions of 59.4% and 16.3%, respectively, compared to the best baseline method (DE-VMD). The NCC reaches 0.9689–0.9798, an improvement of 2.0–4.4% over DE-VMD, approaching the ideal value of 1. On the PU dataset containing compound faults, the proposed method achieves an SNR of 25.25373 dB, which is 113.5% higher than the traditional VMD and 45.5% higher than COA-VMD, significantly outperforming the other methods in scenarios with complex noise and frequency superposition.
(2) Through six improvements, including Sobol sequence initialization and nonlinear parameter adjustment, the proposed IEWOA reduces the local optimum rate to 8.7%, which is 53.5% lower than that of the original WOA. It addresses the key problem of “exploration–exploitation imbalance” in traditional optimization algorithms and provides a more stable adaptive scheme for VMD parameter optimization. By chaining the optimization algorithm, signal decomposition, and secondary denoising into an end-to-end adaptive pipeline, the method addresses the processing challenges of “weak features, strong noise, and multi-frequency band superposition” in rolling bearing fault signals.
The method described in this paper generalizes well and is stable. The six improvements incorporated in the IEWOA (e.g., Sobol sequence initialization, nonlinear parameter adjustment, and the Lévy flight strategy) effectively overcome local optima in VMD parameter optimization. The hybrid denoising strategy was evaluated using multiple metrics and achieved the best overall performance on both the CWRU and PU datasets, and cross-model validation confirmed its generalization ability. The method also offers economic and resource benefits: by extracting early fault features, it reduces equipment maintenance costs, extends bearing life, and meets the predictive maintenance requirements of smart manufacturing. The extracted fundamental and harmonic frequency components of the fault features are more prominent, giving the method high engineering application value and good prospects for practical adoption.
However, this study used only the CWRU and PU datasets and lacks long-term operational data from real industrial scenarios (such as bearing signals containing wear evolution processes). The performance of the method over the “full fault evolution cycle” has not yet been validated. In addition, the experiments did not consider the influence of environmental factors such as temperature and humidity on the signal. In industrial settings, environmental variables can change the noise characteristics, so the robustness of the method in scenarios with multiple coupled environmental variables needs further verification. These limitations mean that the conclusions of this study generalize only to “common noise types, typical bearing structures, and laboratory/semi-industrial scenarios”. For special noise conditions, bearing structures with special features, or complex industrial environments, the performance of the method may degrade, requiring targeted adjustments.
Future research will focus on three aspects: first, expanding the noise adaptation range by optimizing the wavelet threshold function for impulse noise and time-varying noise in industrial environments; second, developing an adaptive calibration mechanism for IEWOA parameters to improve the method’s versatility for bearings with different structures; and third, verifying the method’s performance throughout the entire fault evolution cycle by combining long-term operating data from real industrial scenarios.