A Hybrid Denoising Model for Rolling Bearing Fault Diagnosis: Improved Edge Strategy Whale Optimization Algorithm-Based Variational Mode Decomposition and Dataset-Specific Wavelet Thresholding

Liu, Xinqi; Zhang, Ruimin; Fan, Jianyong; Li, Lianghong; Li, Zhigang; Zhou, Tao

doi:10.3390/sym18010168

by Xinqi Liu ¹, Ruimin Zhang ², Jianyong Fan ³, Lianghong Li ¹, Zhigang Li ¹ and Tao Zhou ¹,*

¹ College of Information Science and Technology, Shihezi University, Shihezi 832000, China

² College of Mechanical and Electrical Engineering, Shihezi University, Shihezi 832000, China

³ Xinjiang Institute of Electronics Co., Ltd., Wulumuqi 830000, China

* Author to whom correspondence should be addressed.

Symmetry 2026, 18(1), 168; https://doi.org/10.3390/sym18010168 (registering DOI)

Submission received: 10 December 2025 / Revised: 29 December 2025 / Accepted: 7 January 2026 / Published: 16 January 2026

(This article belongs to the Section Engineering and Materials)

Abstract

Early fault vibration signals of rolling bearings are non-stationary and nonlinear, with weak fault signatures easily masked by noise. Traditional denoising methods (e.g., wavelet thresholding, empirical mode decomposition (EMD)) struggle to accurately extract effective features. Although variational mode decomposition (VMD) overcomes mode mixing, its core parameters rely on empirical selection, making it prone to local optima and limiting its denoising performance. To address this critical issue, this study aims to propose a hybrid model with adaptive parameter optimization and efficient denoising capabilities, enhancing the signal-to-noise ratio (SNR) and feature discriminability of early fault signals in rolling bearings. The novelty of this work is reflected in three aspects: (1) An improved edge strategy whale optimization algorithm (IEWOA) is proposed, incorporating six enhancements to balance global exploration and local exploitation. Using the minimum average envelope entropy as the objective function, the IEWOA achieves adaptive global optimization of VMD parameters. (2) A hybrid framework of “IEWOA-VMD + dataset-specific wavelet thresholding for secondary denoising” is constructed. The optimized VMD first decomposes signals to separate noise and effective components, followed by secondary denoising, ensuring both adaptable signal decomposition and precise denoising. (3) Comprehensive validation is conducted across five models using two public datasets (Case Western Reserve University (CWRU) and Paderborn Universität (PU)). Key findings demonstrate that the proposed method achieves a root-mean-square error (RMSE) as low as 0.00013–0.00041 and a Normalized Cross-Correlation (NCC) of 0.9689–0.9798, significantly outperforming EEMD, traditional VMD, and VMD optimized by single algorithms. 
The model effectively suppresses noise interference, preserves the fundamental and harmonic components of fault features, and exhibits strong robustness under different loads and fault types. This work provides an efficient and reliable signal preprocessing solution for early fault diagnosis of rolling bearings.

Keywords:

rolling bearings; IEWOA; VMD; parameter optimization; signal denoising

1. Introduction

Rotating machinery is a core power unit in industrial production, intelligent manufacturing, and agricultural equipment, especially in the context of intelligent agriculture. The maintenance of harvesting and power units is crucial for food security [1]; its reliability is directly dependent on bearings—the failure of which accounts for 60–70% of mechanical transmission malfunctions [2]. As emphasized in recent reviews [3], intelligent diagnosis has become the cornerstone of ensuring the operational stability of complex mechanical systems. Bearing diagnosis methods include vibration monitoring, clearance measurement, and temperature detection, among others. Temperature measurement stands out for its real-time performance and sensitivity. To assess the current state of rotating machinery, changes in temperature in the friction zone are a crucial technical indicator of variations in the operating status of the bearing units. The thermal load of the bearing exhibits three modes—stable temperature, quasi-steady heating, and post-stable temperature jump—which effectively reflect operational states from normal to critical failure. Experimental studies confirm that temperature correlates strongly with vibration and clearance, e.g., bearing failure occurs when the temperature exceeds 73 °C for robotic systems [4]. FEA-based thermal conductivity simulation enables an accurate conversion between surface and friction-zone temperatures. Integrating temperature measurement with other methods forms a comprehensive diagnostic system, ensuring predictive maintenance and improving the useful life of the equipment. Rolling bearings, as a pivotal component integrated into textile machinery, exert a direct and profound influence on the operational stability and safety of the associated equipment. Consequently, the diagnosis of bearing failure is of paramount importance in practical engineering applications. 
Specifically, fault diagnosis is accomplished through the extraction of fault-relevant feature information that is inherent in the vibration signals acquired during the rolling bearings’ service cycle. Nevertheless, the collected fault signals typically demonstrate prominent nonlinear and non-stationary characteristics, which impose substantial constraints on the ability of traditional diagnostic methods to extract fault features with high efficacy [5,6]. To mitigate this critical challenge, a multitude of researchers in the field have made extensive investigative efforts [7].

Both the wavelet transform method and EMD are celebrated for their superior time–frequency resolution capabilities; thus, they have been extensively employed in noise suppression tasks associated with industrial equipment [8,9,10]. In 1995, the pioneering wavelet threshold denoising algorithm was first proposed by Donoho and Johnstone [11]. Later, Donoho further supplemented the theoretical basis of soft-thresholding denoising, optimizing the coefficient shrinkage strategy [12]. The core operational mechanism of this approach lies in the threshold-based processing of wavelet coefficients: when the magnitude of the component wavelet coefficients falls below a predefined threshold, the corresponding components are regarded as noise-dominant and are eliminated. In contrast, coefficients exceeding this threshold are identified as target-signal-dominant and are either retained intact or shrunk toward zero by a fixed value. Subsequently, the denoised signal is retrieved via wavelet reconstruction utilizing the adjusted wavelet coefficients. This foundational algorithm has sparked extensive scholarly investigations, with research efforts primarily focusing on the improvement and optimization of wavelet function selection [13], decomposition level selection [14,15], threshold selection methods [16,17], and threshold functions [18]. Liu, H. proposed an improved wavelet threshold function based on noise variance estimation, which enhanced the adaptability of denoising for non-stationary signals [19]. Bayer, F. designed an iterative wavelet threshold method, effectively reducing signal distortion caused by fixed thresholding [20]. Qiao, Y. proposed a seismic signal denoising method integrating VMD and improved wavelet thresholds, which strengthened the extraction of weak fault information in complex noise [21]. Zhang, L. developed a speech enhancement method based on improved wavelet thresholds and optimized VMD, balancing noise suppression and signal detail retention [22]. 
Nevertheless, the wavelet threshold denoising method exhibits inherent sensitivity to the local time–frequency characteristics of signals. Consequently, when confronted with processing complex signals with intricate time–frequency distributions, this method may suffer from inadequate denoising effects [23] or give rise to considerable deviations in the processed results.

In 1998, the authors of [24] proposed EMD. Based on the time-scale attributes of the data, this method achieves adaptive signal decomposition. Although it overcomes the limitations of the wavelet threshold denoising technique, it inevitably exhibits intrinsic drawbacks related to end effects and mode mixing [25]. In 2009 [26], ensemble empirical mode decomposition (EEMD) was proposed on the basis of noise-assisted analysis. Although this method rectifies certain limitations of EMD and improves the precision of decomposition, it exhibits insufficient robustness in signal decomposition [27]. In practical applications [28], the discrete wavelet transform (DWT) has been utilized to remove noise from partial GIS discharges. Despite its effectiveness in eliminating white noise, this approach faces challenges when confronted with highly nonlinear and non-stationary data [9]. An adaptive noise cancellation algorithm has been combined with EMD to partition narrowband interference into multiple frequency bands. Despite its remarkable adaptive filtering performance, this approach is prone to the loss of certain time or frequency scales, making it unable to retrieve the inherent characteristics of the original signal [29,30]. EEMD was utilized to denoise partial discharge and vibration signals from transformers, achieving effective suppression of mode mixing while maximizing the preservation of useful information within intrinsic mode functions (IMFs). However, EEMD involves multiple random sampling and decomposition processes, making it challenging to select the appropriate regularization parameters. Furthermore, repeated testing and adjustment are indispensable for signal extraction, significantly impairing the efficiency and operation rate of signal processing.

To overcome these challenges, VMD was introduced in 2014 [31]. This algorithm is capable of adaptively matching the optimal center frequency and constrained bandwidth corresponding to each Intrinsic Mode Function (IMF), which promotes the effective separation of IMFs and partitioned signals in the frequency domain, thereby acquiring valid decomposition components of the analyzed signal. This not only overcomes inherent drawbacks (e.g., mode mixing) existing in traditional EMD but also demonstrates superior time–frequency localization performance [32]. Given that vibration signals from textile machinery are generally characterized by high noise content and complex harmonic components, the adoption of VMD aids in extracting the local time–frequency characteristics of signals while removing noise and harmonic components. Consequently, the VMD algorithm is applicable for denoising the vibration signals of rolling bearings. However, it remains critical to overcome the inherent deficiencies of the algorithm, improve computational efficiency, and ensure the accuracy and anti-interference capability of vibration monitoring systems.

In particular, the VMD-based denoising method generally requires empirical knowledge or multiple iterative trials to determine the values of two core parameters: the penalty factor $\alpha$ and the number of intrinsic mode functions $K$. Recent studies [33] have explored various meta-heuristic algorithms to automate the parameter tuning of VMD, yet the balance between exploration and exploitation remains a challenge. Excessively large or small values of $\alpha$ and $K$ cause insufficient time and frequency resolution during signal decomposition, so that the decomposition results cannot accurately capture the true inherent characteristics and information of the signal.

To acquire the optimal VMD parameters, a novel joint signal denoising method is presented and applied to the denoising process of rolling bearing fault signals. The proposed joint denoising method leverages the IEWOA optimization algorithm to overcome the inherent randomness in VMD parameter determination; meanwhile, in combination with wavelet threshold denoising technology, it enhances the comprehensive denoising performance of the algorithm and maintains signal integrity. The application of this method can address the low signal-to-noise ratio issue of early fault information in rolling bearings, thereby enabling effective extraction of fault features.

In existing research, VMD has become a mainstream signal decomposition method for bearing denoising due to its superior stability over EMD/EEMD. The advantages and disadvantages of various methods are shown in Table 1. However, its performance is limited by parameter optimization. Recent studies have focused on improving VMD with intelligent optimization algorithms, but key limitations remain: unbalanced exploration/exploitation, trade-offs between computational efficiency and denoising effect, and the lack of fair comparisons. COA-VMD [34] simulates coati foraging behavior, introducing a dynamic weight strategy to optimize VMD's $\alpha$ and $K$. Applied to acoustic signal denoising of high-voltage shunt reactors, its core improvement lies in “population segmentation + adaptive step size”, which enhances parameter optimization accuracy for complex signals. EWOA-VMD [35] improves the WOA with chaotic initialization and Lévy flight to optimize VMD parameters for fault diagnosis; its core improvement, “chaotic perturbation + adaptive inertia weight”, alleviates local optimum issues. Improved initialization strategies, such as the Sobol sequence and chaotic mapping [36], have been shown to significantly enhance the global convergence of the WOA.

Inadequate Adaptability and Stability of Parameter Optimization: Existing algorithms (e.g., PSO-VMD [37], GA-VMD [38], COA-VMD) struggle to balance global exploration and local exploitation, with local optimum rates generally >10%. Moreover, VMD parameter optimization relies on fixed objective functions, failing to adapt to the nonlinear/non-stationary characteristics of bearing signals. According to information theory, envelope entropy is highly sensitive to the periodic impacts generated by bearing faults; a signal with distinct impulsive features exhibits a lower entropy value, whereas a noise-contaminated signal shows high entropy. Minimizing this indicator enables the adaptive identification of optimal VMD parameters, thereby maximizing the clarity of fault-related components while suppressing random interference [39].

To optimize the VMD parameters, an innovative joint denoising approach for rolling bearing fault signals is introduced. The proposed approach leverages the IEWOA to eliminate randomness in VMD parameter selection while applying wavelet threshold denoising to enhance the overall performance and preserve signal integrity. This methodology addresses the low signal-to-noise ratio (SNR) of early-stage bearing failure signals, facilitating fault feature extraction.

Existing algorithms struggle to balance global exploration and local exploitation, with local optimum rates > 10%; VMD parameter optimization relies on fixed objective functions, failing to adapt to the nonlinear/non-stationary characteristics of bearing signals. Most methods are validated on single datasets and fail to address coupled noise in practical engineering; secondary denoising strategies lack dataset-specific design. Validating the robustness across multiple public datasets is essential for ensuring the generalizability of denoising models [40].

The main contributions of this study are summarized as follows:

A method combining IEWOA-VMD with wavelet secondary denoising is proposed. This method first performs VMD decomposition on the signal and then conducts secondary wavelet decomposition on the basis of the VMD-decomposed results. The VMD parameters are determined by the IEWOA. This method significantly reduces the likelihood of the whale optimization algorithm (WOA) falling into a local optimum and enables the acquisition of higher-quality vibration signals after secondary denoising. The integration of secondary denoising techniques has shown superior performance in preserving fault harmonics compared to single-stage methods [41].
For the purpose of validating the superior performance of the IEWOA-VMD + wavelet secondary denoising method, this study conducts a comparative analysis between the proposed method and two conventional approaches: VMD and EEMD. Meanwhile, the data processed by different methods are input into various models for training and comparative analysis. The evaluation metrics used in the comparison include RMSE, SNR, NCC, accuracy, and loss value. Experimental results on two datasets—the Bearing Dataset of CWRU in the United States, and the Bearing Dataset of PU in Germany—show that the proposed method outperforms other comparative methods in terms of performance on both datasets.
The IEWOA introduces six improvements to the original WOA: First, the Sobol sequence is incorporated into the population initialization stage to reduce the possibility of the algorithm falling into a local optimum due to uneven initial population distribution. Second, a nonlinear parameter adjustment strategy is adopted during the iteration process to improve the algorithm’s global and local search capabilities. Third, a heuristic probability strategy is integrated into the whale position update process to balance the algorithm’s local exploitation and global exploration capabilities. Fourth, the Levy flight strategy is introduced to prevent the algorithm from falling into local optimum traps in the later stages of iteration. Fifth, adaptive t-distribution mutation is added in the late iteration stage to enhance the model’s ability to jump out of local optima [42]. Sixth, a reflective boundary strategy is applied to whales at edge positions to avoid the problem of reduced search ability of the algorithm caused by unchanged whale positions after updates. To verify the performance of the IEWOA, this paper compares it with four other optimization algorithms and confirms its performance advantages through the comparison of fitness values.

2. Variational Mode Decomposition

VMD is an adaptive signal processing technique based on Wiener filtering that dispenses with the requirement of predefining the mathematical model corresponding to the signal [31]. Given a predefined number of modes K, VMD iteratively seeks optimal solutions for variational modes, yielding K Band-limited intrinsic mode functions (BIMFs) with frequency centers and mode functions. Each BIMF is an amplitude-modulated and frequency-modulated (AM-FM) signal. To estimate bandwidth, the constrained variational problem is formulated as Equation (1), with the constraint condition that the sum of all decomposed mode functions equals the original signal, as shown in Equation (2):

$$\min_{\{u_{k}\}, \{w_{k}\}} \left\{ \sum_{k=1}^{K} \left\| \partial_{t} \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_{k}(t) \right] e^{-j w_{k} t} \right\|_{2}^{2} \right\}, \tag{1}$$

$$\sum_{k=1}^{K} u_{k}(t) = f(t), \tag{2}$$

where $u_{k}$ is the decomposed mode function, $w_{k}$ is the center frequency, $\partial_{t}$ is the partial derivative with respect to time, $\delta(t)$ is the unit impulse function, $*$ denotes convolution, and $f(t)$ is the original signal. Introducing the penalty factor $\alpha$ and Lagrangian multiplier $\lambda$ yields the augmented Lagrangian function $L$:

$$\begin{aligned} L = {} & \alpha \sum_{k=1}^{K} \left\| \partial_{t} \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_{k}(t) \right] e^{-j w_{k} t} \right\|_{2}^{2} \\ & + \left\| f(t) - \sum_{k=1}^{K} u_{k}(t) \right\|_{2}^{2} + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_{k}(t) \right\rangle . \end{aligned} \tag{3}$$

The Alternating Direction Method of Multipliers (ADMM) iteratively identifies saddle points of the augmented Lagrangian function $L$ (Equation (3)). Initialize $\hat{u}_{k}^{1}$, $w_{k}^{1}$, $\hat{\lambda}^{1}$, and $n \leftarrow 0$; then, increment $n \leftarrow n + 1$ and loop over $k = 1 : K$ to update the frequency-domain mode function $\hat{u}_{k}(w)$ using Equation (4):

$$\hat{u}_{k}^{n+1}(w) \leftarrow \frac{\hat{f}(w) - \sum_{i<k} \hat{u}_{i}^{n+1}(w) - \sum_{i>k} \hat{u}_{i}^{n}(w) + \frac{\hat{\lambda}^{n}(w)}{2}}{1 + 2 \alpha (w - w_{k}^{n})^{2}} . \tag{4}$$

Using Equation (5), update $w_{k}$:

$$w_{k}^{n+1} \leftarrow \frac{\int_{0}^{\infty} w \, |\hat{u}_{k}^{n+1}(w)|^{2} \, dw}{\int_{0}^{\infty} |\hat{u}_{k}^{n+1}(w)|^{2} \, dw}, \tag{5}$$

where $\hat{u}_{k}$ is the mode function in the frequency domain, $\hat{\lambda}$ is the Lagrangian multiplier in the frequency domain, and $\hat{f}$ is the original signal in the frequency domain.

Update the Lagrangian multiplier $\hat{\lambda}(w)$ using Equation (6) to ensure the convergence of the variational problem:

$$\hat{\lambda}^{n+1}(w) \leftarrow \hat{\lambda}^{n}(w) + \gamma \left( \hat{f}(w) - \sum_{k=1}^{K} \hat{u}_{k}^{n+1}(w) \right), \tag{6}$$

where $\gamma$ is the noise tolerance coefficient (recommended $\gamma = 0$ for optimal denoising).

Repeat the updates in Equations (4)–(6) until the convergence condition defined in Equation (7) is satisfied:

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_{k}^{n+1} - \hat{u}_{k}^{n} \right\|_{2}^{2}}{\left\| \hat{u}_{k}^{n} \right\|_{2}^{2}} < \varepsilon, \tag{7}$$

where $\varepsilon$ is the convergence tolerance.

Output the $K$ components upon loop termination.
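The iteration in Equations (4)–(7) can be sketched in a few lines of NumPy. This is a simplified illustration rather than the authors' implementation: it works on the one-sided spectrum without the boundary mirroring used in reference VMD code, and the function name `vmd_sketch`, the uniform initial center frequencies, and the default tolerance are assumptions made for the example.

```python
import numpy as np

def vmd_sketch(f, K, alpha, gamma=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD on the one-sided spectrum (no boundary mirroring)."""
    f = np.asarray(f, dtype=float)
    N = f.size
    freqs = np.fft.rfftfreq(N)               # normalized frequency in cycles/sample
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    w = (np.arange(K) + 0.5) * 0.5 / K       # assumed: uniformly spaced initial centers
    lam = np.zeros(freqs.size, dtype=complex)
    eps = np.finfo(float).eps

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Eq. (4): modes i < k were already updated in place in this sweep.
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - w[k]) ** 2)
            # Eq. (5): power-weighted mean frequency of the updated mode.
            power = np.abs(u_hat[k]) ** 2
            w[k] = np.sum(freqs * power) / (np.sum(power) + eps)
        # Eq. (6): multiplier update; gamma = 0 leaves it unchanged.
        lam = lam + gamma * (f_hat - u_hat.sum(axis=0))
        # Eq. (7): summed relative change of the modes.
        num = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
        den = np.sum(np.abs(u_prev) ** 2, axis=1) + eps
        if np.sum(num / den) < tol:
            break

    modes = np.fft.irfft(u_hat, n=N, axis=1)  # back to the time domain
    return modes, w

# Two tones at 0.1 and 0.3 cycles/sample should be separated into two modes.
n = np.arange(1000)
signal = np.cos(2 * np.pi * 0.1 * n) + 0.5 * np.cos(2 * np.pi * 0.3 * n)
modes, centers = vmd_sketch(signal, K=2, alpha=2000)
```

The penalty factor enters only through the denominator of Equation (4): a larger `alpha` narrows the Wiener-like filter around each center frequency, which is why its value strongly affects the bandwidth of the extracted modes.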

3. VMD Parameter Optimization Based on IEWOA

3.1. Whale Optimization Algorithm

The WOA, proposed in 2016 [43], is a swarm intelligence algorithm inspired by humpback whales’ hunting behavior (bubble-net feeding). This algorithm’s simplicity, ease of implementation, and strong optimization capability have garnered widespread attention. The WOA simulates prey encirclement via circular or “9”-shaped paths, comprising three phases: prey search, shrinking encirclement, and spiral position update. The key steps are as follows:

(1) Initialize the population size $N$, the number of iterations $M$, and the search boundaries; randomly assign the whale positions $\{X_{1}(0), \dots, X_{N}(0)\}$.

(2) Calculate the individual fitness of each whale using the fitness function defined in Equation (8), which maximizes the ratio of characteristic energy to noise energy, in order to screen the optimal solution $X_{best}(t)$:

$$f(K, A) = \max \left( L + \frac{E_{C}}{E_{S}} \right) . \tag{8}$$

(3) Compute the linear adjustment parameters $a_{1}$ and $a_{2}$ using Equation (9), which regulate the global and local search capabilities of the algorithm:

$$a_{1} = 2 \left( 1 - \frac{t}{M} \right), \quad a_{2} = - \left( 1 + \frac{t}{M} \right) . \tag{9}$$

(4) Update positions: Update the whale positions based on three behavioral phases. First, generate random numbers $r_{1}, r_{2}, r_{3} \in [0, 1]$ and calculate the core parameters $A$, $C$, and $l$ using Equation (10):

$$A = 2 a_{1} r_{1} - a_{1}, \quad C = 2 r_{2}, \quad l = (a_{2} - 1) r_{3} + 1 . \tag{10}$$

Then generate a random number $p \in [0, 1]$; the values of $p$ and $A$ determine the whales' behavior:

If $p < 0.5$ and $|A| \geq 1$ (exploration phase), the whales search for prey globally, and their position update follows Equation (11):

$$\begin{aligned} D_{rand} &= |C X_{rand}(t) - X(t)|, \\ X(t+1) &= X_{rand}(t) - A D_{rand} . \end{aligned} \tag{11}$$

If $p < 0.5$ and $|A| < 1$ (exploitation phase), the whales narrow the encirclement to capture prey, with their position update defined by Equation (12):

$$\begin{aligned} D_{best} &= |C X_{best}(t) - X(t)|, \\ X(t+1) &= X_{best}(t) - A D_{best} . \end{aligned} \tag{12}$$

If $p \geq 0.5$ (spiral update), the whales move in a spiral path to approach their prey, and the position update is calculated using Equation (13):

$$\begin{aligned} D_{sp} &= |X_{best}(t) - X(t)|, \\ X(t+1) &= D_{sp} e^{b l} \cos(2 \pi l) + X_{best}(t), \end{aligned} \tag{13}$$

where $b$ is a constant that defines the shape of the logarithmic spiral.

(5) Increment $t = t + 1$. If $t < M$, return to step (2); otherwise, output the optimum.
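The steps above can be sketched as a compact minimizer. This is a minimal illustration under stated assumptions, not the authors' code: the function name `woa_minimize`, the spiral constant `b = 1`, the clipping at the bounds, and the sphere test function are all chosen for the example, and the sketch minimizes a generic objective rather than evaluating Equation (8).

```python
import numpy as np

def woa_minimize(objective, lower, upper, n_whales=20, n_iter=100, b=1.0, seed=0):
    """Basic WOA following Equations (9)-(13); returns best point, value, history."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    X = rng.uniform(lower, upper, size=(n_whales, lower.size))   # step (1)
    fitness = np.array([objective(x) for x in X])                # step (2)
    best_x = X[fitness.argmin()].copy()
    best_f = float(fitness.min())
    history = [best_f]

    for t in range(n_iter):
        a1 = 2.0 * (1.0 - t / n_iter)            # Eq. (9)
        a2 = -(1.0 + t / n_iter)
        for i in range(n_whales):
            r1, r2, r3, p = rng.random(4)
            A = 2.0 * a1 * r1 - a1               # Eq. (10)
            C = 2.0 * r2
            l = (a2 - 1.0) * r3 + 1.0
            if p < 0.5 and abs(A) >= 1.0:        # exploration, Eq. (11)
                x_rand = X[rng.integers(n_whales)]
                X[i] = x_rand - A * np.abs(C * x_rand - X[i])
            elif p < 0.5:                        # shrinking encirclement, Eq. (12)
                X[i] = best_x - A * np.abs(C * best_x - X[i])
            else:                                # spiral update, Eq. (13)
                d_sp = np.abs(best_x - X[i])
                X[i] = d_sp * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best_x
            X[i] = np.clip(X[i], lower, upper)   # assumed: simple truncation at bounds
        fitness = np.array([objective(x) for x in X])
        if fitness.min() < best_f:               # keep the best solution found so far
            best_f = float(fitness.min())
            best_x = X[fitness.argmin()].copy()
        history.append(best_f)
    return best_x, best_f, history

# Hypothetical test problem: 2-D sphere function with its minimum at the origin.
best_x, best_f, history = woa_minimize(lambda x: float(np.sum(x ** 2)),
                                       lower=[-10.0, -10.0], upper=[10.0, 10.0])
```

Because the best solution is tracked separately from the population, `history` is non-increasing by construction, which makes it convenient for comparing convergence curves between WOA variants.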

When $|A| \geq 1$, the whales are searching for prey and need to expand their search range as much as possible; a larger $|A|$ corresponds to a wider search range and stronger global search capability. When $|A| < 1$, the whales are in the stage of narrowing the encirclement, and a smaller $|A|$ corresponds to stronger local search capability. The global and local search behavior at each stage is therefore governed by $A$.

If $a_{1} = 2$, then $A \in [-2, 2]$, and the probability that a random value $A$ satisfies $|A| < 1$ is 0.5. If $a_{1} \leq 1$, then $A \in [-1, 1]$, and the probability that a random value $A$ satisfies $|A| < 1$ is 1. Hence, at the start of the iteration process, the probability that the whale is in the prey-seeking stage and the probability that it is in the stage of narrowing the encirclement are both 0.5.

As the number of iterations increases, the probability that the whale is in the contraction–encirclement stage increases monotonically, and once the number of iterations exceeds half of the total number, this probability reaches 1. Because $a_{1}$ decreases linearly from 2 to 0, the interval of iterations in which $|A| \geq 1$ can occur is narrow. In this case, the global search ability of the WOA is relatively weak, and it easily falls into the trap of local optima; the schedule also cannot adapt to different optimization processes. To solve this problem, we propose modifying the schedule of $a_{1}$ so as to improve the solution accuracy and convergence speed.
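The argument above can be checked with a short calculation. Since $A = a_{1}(2 r_{1} - 1)$ is uniform on $[-a_{1}, a_{1}]$, the probability of $|A| < 1$ is $\min(1, 1/a_{1})$. The helper name `exploitation_probability` below is introduced only for this illustration.

```python
def exploitation_probability(t, M):
    """P(|A| < 1) for A = a1 * (2*r1 - 1), r1 ~ U[0, 1], a1 = 2 * (1 - t / M)."""
    a1 = 2.0 * (1.0 - t / M)
    return 1.0 if a1 <= 1.0 else 1.0 / a1

for t in (0, 25, 50, 75):
    print(t, round(exploitation_probability(t, 100), 3))
# 0 0.5
# 25 0.667
# 50 1.0
# 75 1.0
```

With the linear schedule, exploration is therefore possible only during the first half of the run, which motivates reshaping $a_{1}$.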

3.2. Improved Edge Strategy Whale Optimization Algorithm (IEWOA)

To avoid local optima, several enhancements are introduced to the WOA:

3.2.1. Sobol Sequence Initialization

The standard WOA initializes populations randomly, yielding uneven distributions. Sobol sequences—low-discrepancy sequences generated via base-2 radical inversion with unique matrices per dimension—are used for uniform coverage [44]. Figure 1 compares random vs. Sobol initialization (500 points).
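A minimal sketch of Sobol-based initialization for the two VMD parameters is shown below, using SciPy's quasi-Monte Carlo module. The search ranges for $\alpha$ and $K$ and the helper name `sobol_population` are hypothetical values chosen for illustration; the source does not state them here. The population size is taken as a power of two, which is the natural sample size for Sobol sequences.

```python
import numpy as np
from scipy.stats import qmc

def sobol_population(log2_size, alpha_range, k_range):
    """Return 2**log2_size whales as rows of (alpha, K) drawn from a Sobol sequence."""
    sampler = qmc.Sobol(d=2, scramble=False)        # unscrambled, hence deterministic
    unit_points = sampler.random_base2(m=log2_size)  # points in the unit square [0, 1)^2
    lower = [alpha_range[0], k_range[0]]
    upper = [alpha_range[1], k_range[1]]
    return qmc.scale(unit_points, lower, upper)     # map to the search box

# Hypothetical ranges: alpha in [500, 5000], K in [3, 10].
population = sobol_population(5, alpha_range=(500.0, 5000.0), k_range=(3.0, 10.0))
k_values = np.rint(population[:, 1]).astype(int)    # K must be an integer for VMD
```

Rounding $K$ only when VMD is called keeps the optimizer working in a continuous space, which is one common design choice for mixed integer/continuous parameters.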

3.2.2. Nonlinear Parameter Adjustment

Replace the linear $a_{1}$ with the nonlinear $a_{1}^{'}$ [45]:

$$a_{1}^{'} = 2 \left[ - (1 + k_{1}) \left( \frac{t}{M} \right)^{2} + k_{1} \frac{t}{M} + 1 \right]^{k_{2}}, \tag{14}$$

where $k_{1} \in [0, 2]$ and $k_{2} > 0$ regulate the curve. Setting $k_{1} = -1$, $k_{2} = 1$ recovers $a_{1}^{'} = a_{1}$. Adjusting $k_{1}$ and $k_{2}$ enhances the global/local search balance.
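To see how Equation (14) reshapes the schedule, the following sketch evaluates it next to the linear $a_{1}$ of Equation (9). The function names and the example values $k_{1} = 1$, $k_{2} = 1$ are assumptions for illustration, not the settings used in the paper.

```python
def a1_linear(t, M):
    """Linear schedule of Equation (9)."""
    return 2.0 * (1.0 - t / M)

def a1_nonlinear(t, M, k1=1.0, k2=1.0):
    """Nonlinear schedule of Equation (14)."""
    x = t / M
    base = -(1.0 + k1) * x ** 2 + k1 * x + 1.0
    return 2.0 * max(base, 0.0) ** k2   # guard against tiny negative rounding errors

M = 100
halfway_linear = a1_linear(50, M)        # 1.0: exploration is no longer possible
halfway_nonlinear = a1_nonlinear(50, M)  # 2.0 for k1 = 1, k2 = 1
```

Both schedules start at 2 and end at 0, but with these example values the nonlinear one is still at 2 halfway through the run, so the exploration phase is extended before the parameter drops.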

Add differential perturbation to exploitation:

$$X(t+1) = X_{best}(t) - A D_{best} + w_{1} (X_{best}(t) - X(t)) . \tag{15}$$

Add a disturbance term and weight factor $\frac{a_{1}}{2}$ to the spiral update:

$$X(t+1) = \frac{a_{1}}{2} D_{sp} e^{b l} \cos(2 \pi l) + w_{2} \sin(X(t)), \tag{16}$$

where $w_{1}, w_{2}$ are disturbance coefficients.

3.2.3. Heuristic Probability $p_{1}$

Balance global/local search using adaptive probability [46]:

$$p_{1} = \begin{cases} 0.7, & t < 0.5 \times M, \\ 0.4, & t \geq 0.5 \times M . \end{cases} \tag{17}$$

3.2.4. Lévy Flight Strategy

Enhance global search via Lévy flights [47]:

$$X_{new}^{i} = X^{i} + A \times 0.01 \times step \oplus (X^{i} - C \times X^{rand}), \tag{18}$$

where $\oplus$ denotes point multiplication, $X^{rand}$ is a randomly selected whale, and the step follows Mantegna's method.
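Mantegna's method draws a heavy-tailed step as the ratio of two Gaussian variables. The sketch below shows that construction together with the update of Equation (18); the stability index $\beta = 1.5$ is a commonly used value and is assumed here, and the function names are illustrative.

```python
import math
import numpy as np

def mantegna_sigma(beta=1.5):
    """Standard deviation of the numerator variable in Mantegna's method."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)

def levy_step(shape, rng, beta=1.5):
    """Heavy-tailed step: u / |v|^(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.normal(0.0, mantegna_sigma(beta), size=shape)
    v = rng.normal(0.0, 1.0, size=shape)
    return u / np.abs(v) ** (1.0 / beta)

def levy_update(x, x_rand, A, C, rng, beta=1.5):
    """Equation (18): element-wise Levy perturbation of a whale position."""
    step = levy_step(x.shape, rng, beta)
    return x + A * 0.01 * step * (x - C * x_rand)

rng = np.random.default_rng(0)
x_new = levy_update(np.array([2000.0, 5.0]), np.array([1500.0, 7.0]), A=0.8, C=1.2, rng=rng)
```

Most steps are small, but the heavy tail occasionally produces a long jump, which is what lets a whale leave the neighborhood of a local optimum.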

3.2.5. Adaptive T-Distribution Perturbation

Apply T-distribution mutation with degrees of freedom equal to the iteration count $t$ [48]:

$$X_{new}^{i} = X_{best} \times [s + (1 - s) \times trnd(t)], \tag{19}$$

where $trnd(t)$ is a random number following the T-distribution, and $s$ is the perturbation intensity coefficient calculated by Equation (20):

$$s = \sqrt{c_{1}} - \sqrt{c_{2}} \times \left( \frac{t}{M} \right)^{2} . \tag{20}$$

As the number of iterations increases, $s$ decreases; thus, $(1 - s)$ increases. This indicates that the degree of change increases in the later stage, which can better help the optimal whale position escape from local optima in the subsequent rounds.
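Equations (19) and (20) can be sketched as follows. The constants $c_{1}$ and $c_{2}$ are not specified at this point in the text, so the values $c_{1} = c_{2} = 1$ used below are placeholders chosen so that $s$ falls from 1 to 0 over the run; the function names are likewise illustrative.

```python
import numpy as np

def perturbation_intensity(t, M, c1=1.0, c2=1.0):
    """Equation (20); c1 = c2 = 1 are placeholder values, not taken from the paper."""
    return np.sqrt(c1) - np.sqrt(c2) * (t / M) ** 2

def t_mutation(x_best, t, M, rng, c1=1.0, c2=1.0):
    """Equation (19): perturb the best position with Student-t noise, df = t >= 1."""
    s = perturbation_intensity(t, M, c1, c2)
    noise = rng.standard_t(df=t, size=np.shape(x_best))
    return np.asarray(x_best, dtype=float) * (s + (1.0 - s) * noise)

rng = np.random.default_rng(0)
candidate = t_mutation([2000.0, 5.0], t=80, M=100, rng=rng)
```

In a full optimizer the mutated candidate would typically replace the current best only if it improves the fitness; that acceptance rule is an assumption and is not shown here.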

3.2.6. Boundary Handling

Replace simple truncation with rotational boundary handling, recursively mapping out-of-bound values to feasible regions; the equation is as follows:

$$X_{new}^{i} = \begin{cases} upper - X^{i}, & \text{if } X^{i} < lower, \\ X^{i} - lower, & \text{if } X^{i} > upper . \end{cases} \tag{21}$$

3.2.7. Trade-Off Between Exploration and Exploitation

The balance between global exploration and local exploitation is critical for the WOA's optimization performance. The IEWOA achieves this balance through six synergistic strategies, with clear parameter tuning rules and adaptive mechanisms:

(1) The linear $a_{1}$ of the original WOA is replaced with a nonlinear strategy (Equation (14)) to dynamically adjust the exploration/exploitation intensity. The nonlinear adjustment avoids the original WOA's abrupt shift from exploration to exploitation, ensuring a smooth transition.

(2) The probability of choosing “encirclement shrinkage” or “prey search” is adjusted dynamically via Equation (17). This adapts to the optimization progress, with early exploration for global coverage and late exploitation for solution refinement.

(3) Lévy flight (Equation (18)) is introduced to inject randomness into the search process, preventing premature convergence. Lévy flight enhances exploration in sparse search spaces while avoiding disruption of exploitation in promising regions.

(4) T-distribution mutation (Equation (19)) is applied to the optimal solution in late iterations, balancing exploitation with occasional exploration. T-distribution mutation provides heavy-tailed randomness: small perturbations for exploitation in the early–late stages, and larger perturbations for exploration in the mid–late stages.

(5) The original WOA's truncation boundary is replaced with a reflective strategy (Equation (21)) to retain search agents near boundary regions. This avoids the original WOA's loss of search agents at boundaries, maintaining exploration capability while exploiting boundary regions.

(6) The population is initialized using Sobol sequences instead of a random distribution. Uniform initialization reduces the risk of premature convergence to local optima, enhancing early exploration.

3.3. VMD Parameter Optimization via IEWOA

Figure 2 shows the workflow using minimum average envelope entropy [49] as the IEWOA's fitness function to optimize $\alpha$ and $K$.

Steps:

(1) Input the population size $N$, the number of iterations $M$, and the ranges of $\alpha$ and $K$; initialize the whales via Sobol sequences; set $t = 0$.

(2) Apply VMD to the raw signal; compute the minimum average envelope entropy over the resulting modes; record the best $K_{best}$, $fitness_{best}$, and $\alpha_{best}$.

(3) Calculate $a_{1}^{'}$ (Equation (14)) and $a_{2}$ (Equation (9)).

(4) Update the positions using the modified formulae (Equations (11)–(13)).

(5) Perturb $K_{best}$ and $\alpha_{best}$ via adaptive T-distribution (Equation (19)).

(6) Increment $t = t + 1$. If $t < M$, return to (2); otherwise, output the optima.
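The fitness evaluation in step (2) can be sketched as below. The envelope is taken as the magnitude of the analytic signal (computed here with an FFT-based Hilbert transform), normalized to a probability distribution, and scored by its Shannon entropy; averaging over the modes is one reading of “average envelope entropy”, and the function names and test signals are assumptions for this example.

```python
import numpy as np

def envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def envelope_entropy(x):
    """Shannon entropy of the normalized envelope; lower means more impulsive."""
    a = envelope(x)
    p = a / a.sum()
    p = p[p > 0]                      # skip exact zeros to avoid log(0)
    return float(-np.sum(p * np.log(p)))

def mean_envelope_entropy(modes):
    """Average entropy over the rows of a (K, N) array of modes."""
    return float(np.mean([envelope_entropy(m) for m in modes]))

# Illustrative signals: periodic decaying impacts versus white noise.
rng = np.random.default_rng(0)
time = np.arange(2048) / 12_000.0
impacts = np.exp(-800.0 * (time % 0.02)) * np.sin(2 * np.pi * 3000.0 * time)
noise = rng.standard_normal(time.size)
entropy_impacts = envelope_entropy(impacts)
entropy_noise = envelope_entropy(noise)
```

In the optimization loop, each whale's $(\alpha, K)$ would be passed to VMD and the resulting modes scored with a function such as `mean_envelope_entropy`, with the smallest value marking the best candidate.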

4. Wavelet Threshold Denoising

The time–frequency localization characteristic of wavelet transform can focus on points of abrupt signal changes (such as the impact pulse generated by bearing failure). By filtering through thresholds, only wavelet coefficients dominated by noise are removed, while key information such as the amplitude and phase of the fault features is fully preserved. This is highly consistent with the characteristics of early fault signals of rolling bearings, i.e., “weak impact and strong noise masking”, avoiding the excessive smoothing or distortion of features by other methods. The noise of bearing vibration signals is distributed over a wide frequency band. Wavelet thresholding can decompose the signal into detailed components of different frequency bands through multi-scale decomposition and suppress noise in each frequency band in a targeted manner; in particular, it can achieve precise separation of noise that overlaps with the fault’s characteristic frequency band, without losing features. Moving-average filtering only suppresses high-frequency noise and cannot handle low-frequency interference; it over-smooths fault impact characteristics, resulting in the loss of weak fault information; and it has poor adaptability to non-stationary signals. Lacking time–frequency localization capability, this approach cannot distinguish “noise in the same frequency band as the fault” and is prone to spectral leakage for non-stationary signals, resulting in distortion of fault characteristics. EMD suffers from mode aliasing and endpoint effects, making it difficult to accurately separate fault features from noise components, and resulting in poor stability in decomposing strong noise signals. For the effective IMF components after VMD decomposition, wavelet multi-scale decomposition is further used to remove residual noise that overlaps with fault features, while retaining the weak fault harmonic components that are not fully highlighted in the IMF components. 
This combination of “coarse division + fine extraction” makes up for the shortcomings of single VMD decomposition in suppressing fine-grained noise, as well as the limited ability of single wavelet thresholds to separate strongly coupled noise. The workflow of wavelet threshold denoising is depicted in Figure 3.

Decomposition and reconstruction stages form the core components of wavelet transform, and parameter selection imposes a significant influence on the obtained results. Wavelet basis functions are the fundamental basis of wavelet analysis; the selection of different wavelet basis functions directly affects the final decomposition and reconstruction results, ultimately determining the quality of the denoised signal. The number of decomposition levels is typically chosen in accordance with specific signal characteristics and application requirements; however, this method only provides fixed values for reference. The selection of the optimal decomposition level has a profound impact on the denoising effect, rendering it a key factor that determines the performance of the wavelet threshold denoising algorithm. Furthermore, the quality of wavelet threshold denoising results is dependent on threshold selection: an excessively high threshold will induce signal distortion, while an excessively low threshold will lead to residual noise remaining in the signal, thereby resulting in unsatisfactory denoising performance.
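The decompose, threshold, and reconstruct cycle described above can be sketched in a few lines. This is a minimal illustration only: it assumes the Haar wavelet (equivalent to 'db1') and a single soft threshold shared by all levels, and the function names are hypothetical rather than taken from the paper.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """One level of the orthonormal Haar transform: returns (approx, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def soft(c, t):
    """Soft shrinkage: pull the coefficient towards zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(x, levels, threshold):
    """Decompose, soft-threshold the detail coefficients, and reconstruct."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    details = [[soft(c, threshold) for c in d] for d in details]
    for d in reversed(details):  # rebuild from the coarsest level upwards
        approx = haar_idwt(approx, d)
    return approx
```

With a threshold of zero the signal is reconstructed exactly, and with a very large threshold only the mean survives, which illustrates the trade-off between residual noise and signal distortion noted above.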

4.1. Selection of Wavelet Basis Functions

When utilizing the wavelet threshold method for the denoising of rolling bearing vibration signals, the selection of wavelet basis functions is contingent upon the demand for accurate depiction of vibration signal information. The primary factors generally considered include whether the function exhibits compact support and sufficient vanishing moment characteristics within a specified interval. The Daubechies (‘db’) function, as a high-precision wavelet basis function, not only possesses orthogonality and satisfies the compact support condition but also maintains a high degree of similarity with rolling bearing vibration signals. Consequently, this research selects the ‘db’ wavelet function as the wavelet basis for the analysis of rolling bearing vibration signals.

4.2. Wavelet Threshold Selection Criteria

Establishing selection criteria for wavelet thresholds is an essential step in the wavelet threshold denoising process. Four selection criteria are in general use: MiniMaxi, SGToloG, Rigrsure, and Heursure. In addition to these, this paper introduces the threshold function proposed by Qiao et al. (which integrates soft and hard threshold functions) and the improved threshold function proposed in [22] (which falls between the hard and soft threshold functions) for comparative selection.

The MiniMaxi criterion determines the threshold range by virtue of the maximum and minimum values in the signal; it is capable of removing signal components that are lower than the minimum value or higher than the maximum value, rendering it suitable for the filtering and denoising of signals with a low SNR.

The SGToloG criterion implements a fixed threshold across the entire signal, thereby eliminating signal components below this predefined threshold. This criterion is particularly effective for the denoising of signals with relatively stable noise levels.

The Rigrsure criterion is mainly employed in cases where there are significant variations in noise levels, as it can dynamically adjust the threshold based on the local average of the signal. By preserving a greater amount of high-frequency information, the Rigrsure criterion is well suited for processing high-frequency signals.

The Heursure criterion integrates the advantages inherent to both the SGToloG and Rigrsure criteria, and it ascertains the threshold through multiple iterative computations; its objective is to retain as many inherent signal characteristics as possible while guaranteeing the desired denoising performance.

Considering that heuristic denoising (based on the Heursure criterion) has obvious advantages over the other three thresholds, and since the vibration signals of rolling bearings are mainly in the medium- and low-frequency range, this study adopts the Heursure criterion to denoise the vibration signals of rolling bearings.
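For reference, the fixed-threshold rule and the hard and soft threshold functions mentioned in this subsection have standard textbook forms due to Donoho and Johnstone. The sketch below assumes those standard forms, with the noise level estimated from the median absolute detail coefficient. It does not reproduce the Qiao et al. function or the function from [22], whose expressions are not given in this section.

```python
import math
import statistics


def universal_threshold(detail_coeffs):
    """Fixed-form threshold sigma * sqrt(2 ln N).

    sigma is estimated as median(|d|) / 0.6745, the usual robust estimate
    for Gaussian noise (an assumption of this sketch).
    """
    n = len(detail_coeffs)
    sigma = statistics.median([abs(c) for c in detail_coeffs]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def hard_threshold(c, t):
    """Keep the coefficient unchanged if it exceeds t, otherwise zero it."""
    return c if abs(c) > t else 0.0


def soft_threshold(c, t):
    """Zero small coefficients and shrink the remaining ones towards zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)
```

Hard thresholding preserves the amplitude of retained coefficients but is discontinuous at the threshold, whereas soft thresholding is continuous but biases large coefficients; compromise functions such as those compared in this paper sit between the two.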

4.3. Selection of the Number of Wavelet Decomposition Layers

Typically, the determination of the number of wavelet decomposition layers is performed manually. The employment of fixed decomposition layers will inevitably introduce limitations to the signal denoising process. As the number of decomposition layers increases, more detailed information within the signal can be extracted. However, when the layer count exceeds a certain critical point, overfitting may occur, where noise is erroneously classified as signal, and the useful components of the original signal are eliminated. Consequently, in practical application scenarios, it is necessary to comprehensively assess the signal processing performance and select a reasonable number of wavelet decomposition layers to ensure the integrity of the processed signal.
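The section does not state an upper limit for the number of layers. One commonly used bound, which is the convention documented for PyWavelets' `dwt_max_level`, is floor(log2(N / (L - 1))) for a signal of N samples and a wavelet filter of length L (a 'dbM' wavelet has L = 2M). The sketch below is a hypothetical helper that assumes this convention; it is not the selection procedure used in the paper.

```python
import math


def max_decomposition_level(signal_length, filter_length):
    """Largest level at which the filter is still shorter than the coarsest band."""
    if filter_length < 2:
        raise ValueError("filter_length must be at least 2")
    level = math.floor(math.log2(signal_length / (filter_length - 1)))
    return max(level, 0)  # very short signals cannot be decomposed at all


# Example: a 1024-sample segment and 'db4' (filter length 8).
print(max_decomposition_level(1024, 8))
```

Levels beyond such a bound mostly process boundary extension rather than signal content, which is one reason the level is usually searched within a limited range.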

5. Combined Denoising Method

5.1. Datasets

To demonstrate the effectiveness and robustness of the proposed method, public rolling bearing fault datasets from CWRU [50] and PU [51] were selected for experimental validation, with tests performed under no-load, light-load, and heavy-load operating conditions.

For the experimental signals from CWRU, the parameters are specified as follows: fault diameter of 0.007 inches, motor speed of approximately 1730 r/min, and sampling frequency of 48,000 Hz. The outer ring fault is located in the load zone of the outer ring of the fan-end bearing, relative to the 6 o’clock direction. Table 2 enumerates the labels utilized in the experiment, along with their respective explanations.

For the experimental signals from the PU dataset, the sampling frequency is 64,000 Hz, the motor speed is approximately 900 r/min, the number of balls is 8, and the ball diameter is 6.75 mm. Table 3 lists the labels used in the experiment, along with their explanations.

The proposed IEWOA–VMD–wavelet methodology comprises the following steps: (1) Input the raw vibration signal. (2) Optimize a and K via the IEWOA. (3) Decompose the signal using VMD with the optimized parameters. (4) Apply wavelet thresholding to the IMFs (‘db’ wavelet base, selected layers, threshold function). (5) Reconstruct the denoised signal via superposition. Figure 4 illustrates the workflow.

Before applying VMD to decompose vibration signals, it is crucial to determine the number of modal components (K) and the penalty factor (a), as these parameters directly affect the VMD decomposition results. The VMD decomposition parameters are globally optimized using the IEWOA to ensure that the fault information in the signals is fully decomposed. The maximum number of iterations (T) is set to 50, the number of modal components (K) is selected as an integer within the range [3, 8], and the penalty factor (a) is initialized as a random value within the range [500, 3500]. The minimum average envelope entropy is used as the fitness function. Taking the real outer ring fault as an example, a set of 1024 data points is selected as the decomposition sample, and decomposition is performed using the optimization algorithm.
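The fitness function can be illustrated as follows. The sketch assumes the usual definition of envelope entropy (the Shannon entropy of the normalized Hilbert envelope) and averages it over the IMFs returned by some VMD implementation, which is not shown. The function names are hypothetical, and a naive DFT is used only to keep the example dependency-free; in practice a routine such as `scipy.signal.hilbert` would normally be used.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a non-empty sequence via a naive O(N^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # One-sided spectrum: keep DC (and Nyquist when n is even), double the
    # positive frequencies, and zero the negative ones.
    gain = [0.0] * n
    gain[0] = 1.0
    half = n // 2 if n % 2 == 0 else (n + 1) // 2
    for k in range(1, half):
        gain[k] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    return [
        sum(gain[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def envelope_entropy(imf):
    """Shannon entropy of the normalized Hilbert envelope of one IMF."""
    envelope = [abs(z) for z in analytic_signal(imf)]
    total = sum(envelope)
    probs = [e / total for e in envelope]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_envelope_entropy(imfs):
    """Fitness to minimize: average envelope entropy over all IMFs."""
    return sum(envelope_entropy(imf) for imf in imfs) / len(imfs)
```

A periodic impulsive component concentrates its envelope in a few samples and therefore has low entropy, while a noise-like component has a flat envelope and high entropy, so minimizing the average favors (K, a) pairs that isolate fault impacts.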

5.2. Evaluation of Denoising Effect

In order to conduct a quantitative analysis of the denoising results, this study uses indicators such as the SNR [52], RMSE, and NCC [53,54] of the signal to evaluate the denoising quality. The definitions of the indicators are as follows:

SNR = 10 \lg \frac{\sum_{n=1}^{N} X^{2}(n)}{\sum_{n=1}^{N} (X(n) - Y(n))^{2}},

(22)

RMSE = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (X(n) - Y(n))^{2}},

(23)

NCC = \frac{\sum_{n=1}^{N} X(n) Y(n)}{\sqrt{\left(\sum_{n=1}^{N} X^{2}(n)\right) \left(\sum_{n=1}^{N} Y^{2}(n)\right)}},

(24)

where X(n) represents the original signal, Y(n) represents the denoised vibration signal, and N is the number of samples. Evaluation Indicators:

The SNR is positively correlated with the similarity between the denoised signal and the original signal; higher SNR values denote greater similarity.

From the perspective of curve fitting, the RMSE exhibits an inverse relationship with fitting quality: the closer the RMSE is to zero, the more favorable the fitting effect and the lower the degree of signal distortion.

From the perspective of signal graphs, the NCC value is positively associated with denoising effectiveness; values closer to 1 indicate optimal denoising performance and maximal retention of the original signal’s inherent characteristics.
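The three indicators in Equations (22)–(24) translate directly into code. The following is a minimal sketch for equal-length sequences; the function names are illustrative and not taken from the paper.

```python
import math


def snr_db(x, y):
    """Equation (22): signal energy over residual energy, in decibels."""
    signal = sum(v * v for v in x)
    residual = sum((a - b) ** 2 for a, b in zip(x, y))
    if residual == 0.0:
        return math.inf  # identical signals: no residual noise
    return 10.0 * math.log10(signal / residual)


def rmse(x, y):
    """Equation (23): root-mean-square error between x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def ncc(x, y):
    """Equation (24): normalized cross-correlation, in [-1, 1]."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


original = [1.0, 2.0, 3.0, 4.0]
denoised = [1.1, 1.9, 3.1, 3.9]
print(snr_db(original, denoised), rmse(original, denoised), ncc(original, denoised))
```

Note that all three indicators measure closeness to X(n), so their interpretation depends on which signal is taken as the reference.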

6. Experimental Analysis

6.1. Vibration Signal Analysis

The time-domain and frequency-domain representations of the four different state signals under no-load conditions from the CWRU dataset are shown in Figure 5. From the time-domain diagrams, all signals exhibit irregular amplitude fluctuations within the range of [−0.8, 0.8] (normal: [−0.6, 0.6]; inner: [−0.75, 0.75]; ball: [−0.8, 0.8]; outer: [−0.75, 0.75]), with no obvious fault-related impulse features, because the early fault signals are weak and masked by background noise.

Under no-load conditions, the PU dataset selects six different states as samples, including normal state, artificial outer ring damage, real outer ring damage, artificial inner and outer ring damage, artificial inner ring damage, and real outer ring damage. The time-domain and frequency-domain representations of the signals for these six states are shown in Figure 6. Notably, the compound fault state shows no obvious superposition of inner/outer ring fault features in either the time or frequency domain, further confirming that noise severely obscures fault information.

As shown in Figure 5 and Figure 6, the time-domain signal distributions under different bearing conditions are very similar, making it difficult to distinguish between fault types. Although there are subtle differences among these four cases, it remains challenging to differentiate them. The above analysis indicates that the early fault signals of rolling bearings are characterized by “noise dominance and weak fault features” in both the time and frequency domains. Traditional methods fail to effectively separate fault features from noise, due to insufficient adaptability to such complex signals. This directly motivates the proposed IEWOA-VMD + wavelet threshold denoising method, which aims to first decompose the signal into pure modal components via optimized VMD, and then enhance fault features through secondary denoising.

6.2. Signal Decomposition via IEWOA-VMD

The results from CWRU are shown in Figure 7, and the results from PU are shown in Figure 8. Due to the differences in the operating conditions of the signals, it can be observed that there are variations in the amplitude, phase, and instantaneous frequency between the two datasets. Therefore, in order to obtain effective components, the optimal VMD decomposition parameters obtained for the two datasets after IEWOA optimization also differ. In Figure 7, the three decomposed IMFs exhibit distinct frequency characteristics. The center frequencies of the IMFs are non-overlapping, and no energy leakage between components is observed, confirming the absence of mode mixing. For the PU dataset (Figure 8), the optimized K = 6 is due to the more complex harmonic components of real-world fault signals. The first four IMFs account for 91% of the total signal energy: IMF-1 matches the theoretical outer ring fault frequency, IMFs 2–4 correspond to fault harmonics, and IMFs 5–6 are low-energy noise components. This indicates that IEWOA-VMD adaptively decomposes the signal into “fault-dominant IMFs” and “noise-dominant IMFs” based on the dataset characteristics.
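An energy share such as the 91% quoted above can be computed as sketched below. The sketch assumes that each share is normalized by the summed energy of all IMFs; the paper may instead normalize by the energy of the original signal. The function names are hypothetical.

```python
def energy_shares(imfs):
    """Fraction of the summed IMF energy carried by each IMF."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    return [e / total for e in energies]


def cumulative_share(imfs, count):
    """Energy fraction carried by the first `count` IMFs."""
    return sum(energy_shares(imfs)[:count])


# Toy example with two short "IMFs" (values are illustrative only).
print(energy_shares([[3.0, 0.0], [0.0, 4.0]]))
```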

After the IEWOA-VMD decomposition, neither mode mixing nor over-decomposition occurred between the modes of the CWRU and PU datasets. This indicates that the IEWOA achieved global optimization and exhibited robustness. An experimental comparison of VMD decomposition was conducted among the IEWOA, COAT, DE, GA, and PSO algorithms. The iteration curves of these five optimization algorithms are shown in Figure 9 and Figure 10, where it can be seen that, compared with the other algorithms, the IEWOA demonstrates a faster convergence rate. The parameter settings and optimization durations of the five algorithms are shown in Table 4. Although the PSO algorithm converges more rapidly in the initial stage for the PU dataset, its global search capability is inferior to that of the IEWOA, and it becomes trapped in a local optimum. During the iteration process, the IEWOA finds positions with smaller average envelope entropy, and it continues to identify better solutions when the other algorithms are trapped in local optima. Real-time monitoring of industrial equipment typically requires a signal processing latency of ≤1 s per data segment. Most algorithms meet this requirement in terms of computation time, but IEWOA-VMD has the shortest computation time.

To verify the effectiveness of the improvements made to the WOA in this study, ablation experiments were conducted on the IEWOA using the two datasets; the results are shown in Figure 11 and Figure 12. Table 5 records the final fitness values of the ablation experiments across different datasets.

Among the terms cited above, NWOA refers to the WOA integrated with adaptive T-distribution perturbation, IWOA refers to the WOA integrated with differential perturbation, LWOA refers to the WOA integrated with the Lévy flight strategy, and BWOA refers to the WOA integrated with boundary handling. It can be seen from Table 5, Figure 11, and Figure 12 that the improved whale optimization algorithm proposed in this study outperforms the single-improvement algorithms across different datasets. Particularly in the more complex Paderborn dataset, the advantages of the proposed improved algorithm are more prominent, verifying the effectiveness of the improvement. Moreover, in the CWRU dataset, the proposed algorithm found better solutions during the last two iterations, further demonstrating its strong global optimization capability.

The results of decomposing the other three signal states from the CWRU dataset using IEWOA-VMD are shown in Figure 13, Figure 14 and Figure 15.

It can be seen from Figure 13 that, even though the kurtosis values of the two original signals are very close, mode mixing still does not occur in the decomposed IMF2 and IMF3, indicating the effectiveness of the IEWOA in determining parameters.

6.3. Secondary Denoising via Wavelet Thresholding

To obtain optimal wavelet threshold denoising results, the parameters of the wavelet method were configured as follows.

(1) Selection of Wavelet Basis Function: Considering the compact support, the orthogonality of the wavelet basis function, and its high similarity to the vibration signals of rolling bearings, the ‘db’ wavelet function was selected as the wavelet basis function for analyzing the vibration signals of rolling bearings, in order to achieve better denoising results.

(2) Selection of Wavelet Threshold: Initially, the decomposition level of the wavelet was fixed at 6, and different threshold selection rules were applied to process each mode of VMD. The SNR, RMSE, and NCC were determined after the denoising process with different wavelet bases in the ‘db’ sequence. Experiments on wavelet threshold selection were conducted on the CWRU dataset, as shown in Figure 16.

As shown in Figure 16, when the Zhang 2025 threshold is applied for denoising, ‘db19’ yields the maximum SNR and the minimum RMSE; therefore, ‘db19’ is selected as the wavelet basis function for rolling bearing vibration signals. Similarly, when the Rigrsure threshold is adopted, ‘db8’ is chosen as the wavelet basis function. For the heuristic threshold, MiniMaxi threshold, SGToloG threshold, and Qiao 2021 threshold, ‘db19’ is selected. The results of the same experiment conducted on the PU dataset are presented in Figure 17. When the Zhang 2025 threshold is applied for denoising, ‘db12’ achieves the maximum SNR and the minimum RMSE; the bases for the remaining thresholds are selected in the same way.

(3) Selection of the Number of Wavelet Decomposition Levels: The denoising results shown in Figure 16 and Figure 18 indicate that the application of the Zhang 2025 threshold for denoising achieved the optimal SNR, RMSE, and NCC, outperforming the other three wavelets. Therefore, in this study, the heuristic wavelet threshold method was used to denoise each mode after VMD, with the ‘db19’ wavelet basis and nine decomposition levels selected. The results of the experiment on the number of wavelet decomposition levels conducted on the PU dataset are presented in Figure 19; thus, for denoising the data from the PU dataset, wavelet decomposition was performed with the Zhang 2025 threshold, ‘db12’ wavelet basis, and 10 decomposition levels.

6.4. Comparison of Different Denoising Methods

To verify the effectiveness of the proposed method, the denoising results of wavelet, EEMD, VMD with fixed K and a (where K = 4 and a = 2000), four optimization algorithms combined with VMD, and the method proposed in this study are presented in Table 6.

The RMSE values of the proposed denoising method under the different datasets were 0.00041 and 0.00013—the smallest values obtained among the eight denoising methods. In terms of SNR, the proposed method ranks second only to EEMD on the CWRU dataset, and it far outperforms the VMD method on the Paderborn dataset. Compared with other denoising methods, the proposed method can better retain the original signal information, outperforming all VMD-based optimized methods on both datasets. Compared with DE-VMD, the proposed method improves the SNR by 4.65% and 45.3%. Moreover, on the premise of ensuring low error, the SNR index leads in a balanced manner. The denoising method proposed in this paper achieves higher NCC values, at 0.9689 and 0.9798, respectively. Compared with the next-best baseline, the proposed method improves NCC by 0.9% (CWRU) and 4.4% (PU), outperforming PSO-VMD, GA-VMD, and COAT-VMD consistently. The NCC values are close to 1, proving that the proposed method causes the least damage to the original structure of the signal. Compared with all other methods, the proposed method achieves a more balanced and superior performance in terms of RMSE, SNR, and NCC; its advantages stem from two core improvements: the IEWOA’s six enhancements enable more accurate global optimization of VMD parameters, avoiding the mode mixing and over-decomposition that plague baseline optimization algorithms.

The analysis of the different denoising methods shows that the denoising method proposed in this paper effectively removes the noise in the signal. It combines the adaptability of VMD, the time–frequency locality of wavelet decomposition, and the parameter tuning advantage of the IEWOA. In addition, as it performs best on two datasets with significant differences, the method can be considered robust.

6.5. Comparison of Different Methods Among Various Models

To better highlight the effectiveness of the proposed method, a comparison was conducted with five models—CNN, CNN+BiGRU+attention, CNN+BiTCN, CNN+transformer, and CNN+TCN—using three approaches: EEMD, VMD with fixed K and a (where K = 4 and a = 2000), and the method proposed in this paper. The convergence curves of different approaches in the CNN model are shown in Figure 20 and Figure 21 below.

The method proposed in this paper and the VMD method exhibit a significant convergence advantage in the initial stage, with the loss value rapidly decreasing from 1.75 to below 0.75. In contrast, the EEMD method shows a slow downward trend and takes 40 iterations to reach the level reached by the other methods after 20 iterations. It is worth noting that the proposed method and VMD essentially converge after 40 iterations, while the EEMD method maintains a loss value of around 0.5 throughout the entire training process, with a fluctuation range significantly larger than that of the other methods. This indicates insufficient stability in its optimization process.

The validation loss curve more significantly reflects the differences between the methods. After 20 iterations, the loss values of the proposed method and VMD have stabilized below 0.5, finally reaching a stable state of approximately 0.25. However, the validation loss of the EEMD method remains above 0.5, and obvious fluctuations occur in the 80–100-iteration interval, indicating that its generalization ability on unseen data is weak. It is noteworthy that the validation loss of all methods is slightly higher than the training loss, but the gap between the proposed method and VMD is controlled within 0.1, demonstrating good generalization ability. In terms of training accuracy, the proposed method and the VMD method show excellent performance: the accuracy exceeds 0.8 within 20 iterations, stabilizes above 0.95 after 40 iterations, and finally approaches the perfect value of 1.0. In contrast, the EEMD method only reaches an accuracy of 0.7 after 60 iterations and finally stabilizes at around 0.8. The validation accuracy curve highlights the differences between the various methods: the proposed method and VMD stabilize above 0.9 after 40 iterations, while the EEMD method only reaches approximately 0.7 in the end, with obvious fluctuations.

A cross-analysis of various indicators shows the following: in terms of convergence speed, the proposed method ≈ VMD > EEMD; in terms of stability, the proposed method > VMD > EEMD. It can therefore be concluded that the proposed method extracts the essential features of signals more effectively. The convergence curves of the remaining CWRU data in the model are shown in Figure 22, Figure 23, Figure 24 and Figure 25.

In terms of the loss convergence curve, the method proposed in this paper exhibits the optimal convergence performance and the best generalization ability, with a faster initial convergence speed: within 20 iterations, the loss value decreases from approximately 1.8 to 0.6. Under the same number of iterations, VMD only decreases to 0.8, and EEMD only decreases to 1.2. In the later stage, the proposed method shows stronger stability: after 40 iterations, it stabilizes in the range of 0.25 ± 0.05, which is significantly better than VMD and EEMD. In terms of the accuracy convergence curve, the proposed method exceeds 0.9, which is earlier than both VMD and EEMD.

The specific optimal results are shown in Table 7.

As shown in Table 7, the proposed method exhibits significant advantages in core metrics (training accuracy, training loss, validation accuracy, and validation loss) across five models and two datasets, with its performance enhancement rooted in the physical compatibility between the denoising mechanism and bearing fault signals.

From a physical perspective, first, the proposed method achieves “coarse noise separation and fine feature extraction” by optimizing VMD parameters through the IEWOA and implementing dataset-specific wavelet thresholding. Early bearing fault signals are characterized by weak impact features masked by multi-band noise. VMD decomposition strictly adheres to the physical logic of “modal orthogonality and bandwidth constraints”, effectively separating faults’ fundamental frequencies, harmonics, and noise. Wavelet thresholding further suppresses residual noise, resulting in significantly higher “feature purity” of the signals input to the model. For example, in the CNN model on the CWRU dataset, the training loss of the proposed method is 0.0045142, which is 29.1% lower than that of VMD (0.0063635). This is because the model no longer needs to learn redundant noise information, allowing the loss function to quickly converge to a low level.

Second, the PU dataset contains compound faults (outer ring + inner ring damage) with superimposed frequency components. Traditional methods such as EEMD are prone to modal aliasing, leading to distorted features and poor model generalization. The IEWOA balances exploration and exploitation through six improvements, ensuring that the fault features extracted by VMD decomposition (e.g., outer ring fault fundamental frequency of 100 Hz and inner ring fault harmonic of 200 Hz) are more consistent with real physical phenomena. Thus, the validation accuracy of the proposed method in the CNN+BiTCN model on the PU dataset reaches 0.9175627, which is 23.1% higher than that of EEMD (0.7455197), with smaller fluctuations.

Third, the core physical features of bearing fault signals (i.e., impact pulses, frequency harmonics) are universal. The denoised signals retained by the proposed method are not dependent on specific model structures: CNNs excel at extracting local impact features, while transformers excel at capturing global frequency dependencies. Pure fault signals can adapt to the feature learning logic of different models; hence, the proposed method maintains superior performance across all five models, verifying the physical reliability of the denoising effect.

Fourth, complex models have more parameters and stronger fitting capabilities. If the input signals contain noise, overfitting is likely to occur (e.g., the validation loss of EEMD-denoised signals reaches 0.4963078 in the CNN+transformer model). The proposed method achieves a better balance between “signal-to-noise ratio and feature integrity”, reducing the model’s over-learning of noise; its training loss in the CNN+transformer model on the CWRU dataset is only 0.0016482, which is 16.6% of that of EEMD (0.0099050) and 21.5% of that of VMD (0.007681).

In summary, the performance advantages of the proposed method are not dependent on model adaptation but stem from the accurate capture of the physical nature of fault signals and effective noise suppression; its cross-model and cross-dataset consistency, combined with statistical significance, confirm the reliability and universality of the denoising method, providing more solid signal preprocessing support for early bearing fault diagnosis.
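The relative figures quoted in this discussion are of two kinds: a relative change with respect to a baseline ("x% higher than") and a ratio ("x% of"). The short sketch below makes the distinction explicit using values quoted above; the helper names are illustrative.

```python
def percent_change(value, baseline):
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


def percent_of(value, baseline):
    """Value expressed as a percentage of baseline."""
    return 100.0 * value / baseline


# Validation accuracy, CNN+BiTCN on PU: proposed method vs. EEMD.
print(round(percent_change(0.9175627, 0.7455197), 1))  # 23.1
# Training loss, CNN+transformer on CWRU: proposed method vs. EEMD and VMD.
print(round(percent_of(0.0016482, 0.0099050), 1))  # 16.6
print(round(percent_of(0.0016482, 0.007681), 1))  # 21.5
```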

7. Conclusions

This study proposes a hybrid denoising method combining the IEWOA to optimize VMD with dataset-specific wavelet thresholding for early fault signal processing in rolling bearings. Significant technical breakthroughs were achieved through systematic validation on two public datasets (CWRU and PU) and five deep learning models. The key findings and core contributions are as follows:

(1) The proposed method achieves a root-mean-square error (RMSE) as low as 0.00013 (PU dataset) and 0.00041 (CWRU dataset), representing reductions of 59.4% and 16.3%, respectively, compared to the best baseline method (DE-VMD). The NCC reaches 0.9689–0.9798, an improvement of 2.0–4.4% over DE-VMD, approaching the ideal value of 1. On the PU dataset containing compound faults, the proposed method achieves an SNR of 25.25373 dB, which is 113.5% higher than the traditional VMD and 45.5% higher than COAT-VMD, significantly outperforming other methods in complex noise and frequency superposition scenarios.

(2) The proposed IEWOA reduces the local optimum rate to 8.7%, which is 53.5% lower than that of the original WOA, through six improvements, including Sobol sequence initialization and nonlinear parameter adjustment; it solves the key problem of “exploration–exploitation imbalance” in traditional optimization algorithms and provides a more stable adaptive scheme for VMD parameter optimization. By implementing end-to-end adaptation of “optimized algorithm—signal decomposition—secondary denoising”, the processing challenges of “weak features, strong noise, and multi-frequency band superposition” in rolling bearing fault signals have been solved.

The method described in this paper has high generalization ability and stability. The proposed IEWOA incorporates six improvements (e.g., Sobol sequence initialization, nonlinear parameter adjustment, Lévy flight strategy), effectively overcoming local optima in VMD parameter optimization. The hybrid denoising strategy was evaluated via multiple metrics, achieving optimal performance on both the CWRU and PU datasets; cross-model validation confirmed its generalization ability. The method can also yield economic benefits and resource savings: by extracting early fault features, it reduces equipment maintenance costs, extends bearing life, and meets the predictive maintenance requirements of smart manufacturing. The extracted fundamental frequency and harmonic frequency components of fault features are more prominent, giving our method high engineering application value and practical promotion prospects.

However, this study only used the CWRU and PU datasets, lacking long-term operational data from real industrial scenarios (such as bearing signals containing wear evolution processes). The performance of our method in the “full fault evolution cycle” has not yet been validated. In the experiment, the influence of environmental factors such as temperature and humidity on the signal was not considered. In industrial settings, environmental variables can cause changes in noise characteristics; the robustness of our method in scenarios with multiple coupled environmental variables needs further verification. These limitations mean that the generalizability of this study’s conclusions is limited to “common noise types, typical bearing structures, and laboratory/semi-industrial scenarios”. For special noise conditions, bearing structures with special features, or complex industrial environments, the performance of the method may degrade, requiring targeted adjustments.
Future research will focus on three aspects: first, expanding the noise adaptation range by optimizing the wavelet threshold function for impulse noise and time-varying noise in industrial environments; second, developing an adaptive calibration mechanism for IEWOA parameters to improve the method’s versatility for bearings with different structures; and third, verifying the method’s performance throughout the entire fault evolution cycle by combining long-term operating data from real industrial scenarios.

Author Contributions

X.L.: Conceptualization, methodology, software, writing—original draft, visualization, validation. R.Z.: Data curation, software, writing—original draft, visualization, validation. J.F.: Writing, investigation, project administration, validation. L.L.: Supervision, validation. Z.L.: Software, data curation, validation. T.Z.: Supervision, writing—review and editing, funding acquisition, validation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Xinjiang Uygur Autonomous Region (grant number: 2023B01027), The science and technology plan project of Wujiaqu City, the Sixth Division (grant number: 2512) and National Natural Science Foundation of China (grant number: 6226070321).

Data Availability Statement

The data presented in this study were derived from the following resources available in the public domain: https://engineering.case.edu/bearingdatacenter/welcome (accessed on 25 September 2025) and https://mb.uni-paderborn.de/kat/forschung/bearing-datacenter/data-sets-and-download#c374354 (accessed on 29 September 2025).

Conflicts of Interest

Author Jianyong Fan was employed by the company Xinjiang Institute of Electronics Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Shwetabh, K.; Ambhaikar, A. Smart health monitoring system of agricultural machines: Deep learning-based optimization with IoT and AI. Bio Web Conf. 2024, 82, 05007. [Google Scholar] [CrossRef]
Pastukhov, A.; Timashov, E. Procedure for simulation of stable thermal conductivity of bearing assemblies. Adv. Eng. Lett. 2023, 2, 58–63. [Google Scholar] [CrossRef]
Aldrini, J.; Chihi, I.; Sidhom, L. Fault diagnosis and self-healing for smart manufacturing: A review. J. Intell. Manuf. 2024, 35, 2441–2473. [Google Scholar] [CrossRef]
Desnica, E.; Mikić, D.; Glavaš, H.; Palinkaš, I. Influence of diagnostics on bearing reliability on robotic systems. Adv. Eng. Lett. 2022, 1, 40–45. [Google Scholar] [CrossRef]
Romero, C.; Ventura, S. Data mining in education. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2013, 3, 12–27. [Google Scholar] [CrossRef]
Asselman, A.; Khaldi, M.; Aammou, S. Enhancing the prediction of student performance based on the machine learning XGBoost algorithm. Interact. Learn. Environ. 2023, 31, 3360–3379. [Google Scholar] [CrossRef]
Li, Z.; Rao, Z.; Ding, L.; Ding, B.; Fang, J.; Ma, X. YOLOv5s-D: A railway catenary dropper state identification and small defect detection model. Appl. Sci. 2023, 13, 7881. [Google Scholar] [CrossRef]
Seo, J.; Ma, H.; Saha, T. Probabilistic wavelet transform for partial discharge measurement of transformer. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 1105–1117. [Google Scholar] [CrossRef]
Xiaoyu, L.; Jing, J.; Yi, S.; Liu, Y. Noise level estimation method with application to EMD-based signal denoising. J. Syst. Eng. Electron. 2016, 27, 763–771. [Google Scholar] [CrossRef]
Lei, W.; Wang, G.; Wan, B.; Min, Y.; Wu, J.; Li, B. High voltage shunt reactor acoustic signal denoising based on the combination of VMD parameters optimized by coati optimization algorithm and wavelet threshold. Measurement 2024, 224, 113854. [Google Scholar] [CrossRef]
Donoho, D.L.; Johnstone, I.M. Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 1995, 90, 1200–1224. [Google Scholar] [CrossRef]
Donoho, D.L. De-noising by soft-thresholding. IEEE Trans. Inf. Theory 2002, 41, 613–627. [Google Scholar] [CrossRef]
Jeronymo, D.C.; Borges, Y.C.C.; dos Santos Coelho, L. Image forgery detection by semi-automatic wavelet soft-thresholding with error level analysis. Expert Syst. Appl. 2017, 85, 348–356. [Google Scholar] [CrossRef]
Chen, J.; Wan, Z.; Pan, J.; Zi, Y.; Wang, Y.; Chen, B.; Sun, H.; Yuan, J.; He, Z. Customized maximal-overlap multiwavelet denoising with data-driven group threshold for condition monitoring of rolling mill drivetrain. Mech. Syst. Signal Process. 2016, 68, 44–67. [Google Scholar] [CrossRef]
Chen, Y.; Cheng, Y.; Liu, H. Application of improved wavelet adaptive threshold de-noising algorithm in FBG demodulation. Optik 2017, 132, 243–248. [Google Scholar] [CrossRef]
Bahoura, M.; Rouat, J. Wavelet speech enhancement based on time–scale adaptation. Speech Commun. 2006, 48, 1620–1637. [Google Scholar] [CrossRef]
Guo, J.; Si, Z.; Xiang, J. A compound fault diagnosis method of rolling bearing based on wavelet scattering transform and improved soft threshold denoising algorithm. Measurement 2022, 196, 111276. [Google Scholar] [CrossRef]
Liu, H.; Wang, W.; Xiang, C.; Han, L.; Nie, H. A de-noising method using the improved wavelet threshold function based on noise variance estimation. Mech. Syst. Signal Process. 2018, 99, 30–46. [Google Scholar] [CrossRef]
Bayer, F.M.; Kozakevicius, A.J.; Cintra, R.J. An iterative wavelet threshold for signal denoising. Signal Process. 2019, 162, 10–20. [Google Scholar] [CrossRef]
Liu, S.Y.; Ouyang, Z.L.; Chen, G.; Zhou, X.; Zou, Z.J. Black-box modeling of ship maneuvering motion based on Gaussian process regression with wavelet threshold denoising. Ocean Eng. 2023, 271, 113765. [Google Scholar] [CrossRef]
Qiao, Y.; Li, Q.; Qian, H.; Song, X. Research on Seismic Signal Denoising Method Based on VMD and Improved Wavelet Threshold. Geophys. Geochem. Explor. Comput. Technol. 2021, 43, 690–696. [Google Scholar]
Zhang, L.; Liu, Z.; Peng, Y. Speech Enhancement Method Based on Improved Wavelet Threshold and Optimized VMD Algorithm. J. Jilin Univ. (Sci. Ed.) 2025, 63, 608–621. [Google Scholar] [CrossRef]
Sun, W.; Ma, H.; Wang, S. A novel fault diagnosis of GIS partial discharge based on improved whale optimization algorithm. IEEE Access 2024, 12, 3315–3327. [Google Scholar] [CrossRef]
Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
Shen, C.; Cao, H.; Li, J.; Tang, J.; Zhang, X.; Shi, Y.; Yang, W.; Liu, J. Hybrid de-noising approach for fiber optic gyroscopes combining improved empirical mode decomposition and forward linear prediction algorithms. Rev. Sci. Instruments 2016, 87, 033305. [Google Scholar] [CrossRef] [PubMed]
Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41. [Google Scholar] [CrossRef]
Kaur, C.; Bisht, A.; Singh, P.; Joshi, G. EEG Signal denoising using hybrid approach of Variational Mode Decomposition and wavelets for depression. Biomed. Signal Process. Control 2021, 65, 102337. [Google Scholar] [CrossRef]
Zhong, J.; Bi, X.; Shu, Q.; Chen, M.; Zhou, D.; Zhang, D. Partial discharge signal denoising based on singular value decomposition and empirical wavelet transform. IEEE Trans. Instrum. Meas. 2020, 69, 8866–8873. [Google Scholar] [CrossRef]
Shang, H.; Li, Y.; Xu, J.; Qi, B.; Yin, J. A novel hybrid approach for partial discharge signal detection based on complete ensemble empirical mode decomposition with adaptive noise and approximate entropy. Entropy 2020, 22, 1039. [Google Scholar] [CrossRef]
Zhao, H.; Xu, F.; Xu, W.; Zhang, W. Feature extraction method of transformer vibration based on ensemble empirical mode decomposition subband. In Proceedings of the 2016 IEEE International Conference on Power System Technology (POWERCON), Wollongong, NSW, Australia, 28 September–1 October 2016; pp. 1–6. [Google Scholar]
Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544. [Google Scholar] [CrossRef]
Wu, Y.; Shen, C.; Cao, H.; Che, X. Improved morphological filter based on variational mode decomposition for MEMS gyroscope de-noising. Micromachines 2018, 9, 246. [Google Scholar] [CrossRef] [PubMed]
Technical Theme Topics. IEEE Electromagn. Compat. Mag. 2024, 13, 61. [CrossRef]
Tu, J.; Wang, H.; Song, Y.; Wu, Q.; Zhang, X.; Song, X. The characterisation of surface-breaking crack using ultrasonic total focusing method imaging based on COA-VMD. Nondestruct. Test. Eval. 2025, 1–23. [Google Scholar] [CrossRef]
Bingyi, J.; Shugang, L.; Dongdong, C.; Qun, Z. Engineering practice of advance gas control for crushed soft coal seams through directional fracturing using a long borehole in the coal seam roof. Coal Geol. Explor. 2025, 53, 4. [Google Scholar]
Razo-López, L.A.; Aubry, G.J.; Pinheiro, F.A.; Mortessagne, F. Strong localization of microwaves beyond 2D in aperiodic Vogel spirals. arXiv 2023, arXiv:2307.12638. [Google Scholar] [CrossRef]
Dai, H.; Yang, D.; Zhang, L.; Liu, G. Bearing Fault Diagnosis Using PSO-VMD and a Hybrid Transformer-CNN-BiGRU Model. Symmetry 2025, 17, 1780. [Google Scholar] [CrossRef]
Li, Y.; Tang, B.; Jiang, X.; Yi, Y. Bearing fault feature extraction method based on GA-VMD and center frequency. Math. Probl. Eng. 2022, 2022, 2058258. [Google Scholar] [CrossRef]
Chang, B.; Zhao, X.; Guo, D.; Zhao, S.; Fei, J. Rolling bearing fault diagnosis based on optimized VMD and SSAE. IEEE Access 2024, 12, 130746–130762. [Google Scholar] [CrossRef]
Yang, W.; Xiao, Y.; Shen, H.; Wang, Z. An effective data enhancement method of deep learning for small weld data defect identification. Measurement 2023, 206, 112245. [Google Scholar] [CrossRef]
Ye, Y.; Yang, Q.; Zhang, J.; Meng, S.; Wang, J. A dynamic data driven reliability prognosis method for structural digital twin and experimental validation. Reliab. Eng. Syst. Saf. 2023, 240, 109543. [Google Scholar] [CrossRef]
Li, X.; Guo, S.; Sun, D.; Cao, L.; Li, C.; Tian, S.; Liu, P.; Qi, Y. A Rolling Bearing Fault Diagnosis Method Based on Extreme Learning Machine Optimized by Improved Whale Optimization Algorithm. Facta Univ. Ser. Mech. Eng. 2025, 23, 881. [Google Scholar] [CrossRef]
Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm; Elsevier: Amsterdam, The Netherlands, 2016; Volume 95, pp. 51–67. [Google Scholar]
Ashraf, A.; Pervaiz, S.; Haider Bangyal, W.; Nisar, K.; Ag Ibrahim, A.A.; Rodrigues, J.j.P.; Rawat, D.B. Studying the impact of initialization for population-based algorithms with low-discrepancy sequences. Appl. Sci. 2021, 11, 8190. [Google Scholar] [CrossRef]
Wang, H.; Wu, F.; Zhang, L. Application of variational mode decomposition optimized with improved whale optimization algorithm in bearing failure diagnosis. Alex. Eng. J. 2021, 60, 4689–4699. [Google Scholar] [CrossRef]
Lin, X.; Yu, X.; Li, W. A heuristic whale optimization algorithm with niching strategy for global multi-dimensional engineering optimization. Comput. Ind. Eng. 2022, 171, 108361. [Google Scholar] [CrossRef]
Sun, Y.; Wang, X.; Chen, Y.; Liu, Z. A modified whale optimization algorithm for large-scale global optimization problems. Expert Syst. Appl. 2018, 114, 563–577. [Google Scholar] [CrossRef]
Lin, Z. Optimizing Kernel Extreme Learning Machine based on a Enhanced Adaptive Whale Optimization Algorithm for classification task. PLoS ONE 2025, 20, e0309741. [Google Scholar] [CrossRef]
Su, W.; Wang, F.; Zhu, H.; Zhang, Z.; Guo, Z. Rolling element bearing faults diagnosis based on optimal Morlet wavelet filter and autocorrelation enhancement. Mech. Syst. Signal Process. 2010, 24, 1458–1472. [Google Scholar] [CrossRef]
Case Western Reserve University. CWRU Bearing Fault Data. 2007. Available online: https://engineering.case.edu/bearingdatacenter/welcome (accessed on 25 September 2025).
Paderborn University. PU Bearing Fault Data. 2012. Available online: https://mb.uni-paderborn.de/kat/forschung/bearing-datacenter/data-sets-and-download#c374354 (accessed on 29 September 2025).
Ayat, M.; Shamsollahi, M.B.; Mozaffari, B.; Kharabian, S. ECG denoising using modulus maxima of wavelet transform. In Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA, 3–6 September 2009; pp. 416–419. [Google Scholar]
Singh, P.; Pradhan, G.; Shahnawazuddin, S. Denoising of ECG signal by non-local estimation of approximation coefficients in DWT. Biocybern. Biomed. Eng. 2017, 37, 599–610. [Google Scholar] [CrossRef]
Shi, H.; Liu, R.; Chen, C.; Shu, M.; Wang, Y. ECG baseline estimation and denoising with group sparse regularization. IEEE Access 2021, 9, 23595–23607. [Google Scholar] [CrossRef]

Figure 1. Individual distribution maps were generated using the random method and Sobol sequence, with a randomly generated scatter plot (left) and a scatter plot generated by the Sobol sequence (right).

Figure 2. Flowchart of VMD optimized by IEWOA.

Figure 3. Flowchart of wavelet threshold denoising.

Figure 4. The process of the denoising method.

Figure 5. Time-domain and frequency-domain diagrams of rolling bearing signals under four different states from the CWRU dataset.

Figure 6. Time-domain and frequency-domain diagrams of rolling bearing signals under six different states from the PU dataset.

Figure 7. Decomposition results of outer ring fault signals from the CWRU dataset.

Figure 8. Decomposition results of outer ring fault signals from the PU dataset.

Figure 9. Comparison of five algorithms for the CWRU dataset.

Figure 10. Comparison of five algorithms for the PU dataset.

Figure 11. IEWOA ablation experiments on the CWRU dataset.

Figure 12. IEWOA ablation experiments on the PU dataset.

Figure 13. Decomposition results of normal and fault signals from the CWRU dataset.

Figure 14. Decomposition results of inner ring fault signals from the CWRU dataset.

Figure 15. Decomposition results of ball fault signals from the CWRU dataset.

Figure 16. Results of denoising based on different ‘db’ wavelets for the CWRU dataset.

Figure 17. Results of denoising based on different ‘db’ wavelets for the PU dataset.

Figure 18. Denoising results based on different decomposition levels for the CWRU dataset.

Figure 19. Denoising results based on different decomposition levels for the PU dataset.

Figure 20. Convergence curves of different methods in CNN for the CWRU dataset.

Figure 21. Convergence curves of different methods in CNN for the PU dataset.

Figure 22. Convergence curves of different methods in CNN+BiGRU+attention for the CWRU dataset.

Figure 23. Convergence curves of different methods in CNN+BiTCN for the CWRU dataset.

Figure 24. Convergence curves of different methods in CNN+transformer for the CWRU dataset.

Figure 25. Convergence curves of different methods in CNN+TCN for the CWRU dataset.

Table 1. Summary table of rolling bearing fault signal denoising methods.

Method Name	Advantages	Limitations
Wavelet Threshold	Excellent time–frequency locality; simple calculation	Relies on empirical selection of threshold/basis function; prone to distortion or residual noise for complex signals
EMD	No predefined basis function; adapts to nonlinear signals	Severe mode mixing and end effects; poor decomposition stability
EEMD	Mitigates mode mixing	Low computational efficiency; difficult regularization parameter selection; relies on repeated sampling
VMD	No mode mixing; high time–frequency localization accuracy	Penalty factor $\alpha$ and mode number $K$ require empirical/trial-and-error determination; strong parameter randomness

Table 2. Data labeling and explanation from CWRU.

Label	Explanation
Normal	Normal operating conditions
Inner	Bearing inner ring damage
Ball	Damaged bearing ball
Outer	Bearing outer ring damage

Table 3. PU data labels and explanations.

Label	Explanation	Bearing Code	Extent of Damage
Normal	Normal working condition	K001	0
AD_OR	Artificial outer ring damage	KA01	1
RD_OR	Real outer ring damage	KA04	1
RD_OR+IR	Real outer ring and inner ring damage	KB27	1
AD_IR	Artificial inner ring damage	KI01	1
RD_IR	Real inner ring damage	KI04	1

Table 4. Efficiency and performance comparison (CWRU).

Algorithm	Time	Core Parameter Configuration
PSO-VMD	0.81	Inertia weight = 0.7, learning factors $c_{1}$ = 1.5, $c_{2}$ = 1.5
GA-VMD	1.14	Crossover probability = 0.8, mutation probability = 0.05, chromosome length = 10
COAT-VMD	0.76	Exploration factor = 1.2, utilization factor = 0.8, population split number = 3
DE-VMD	0.73	Crossover probability = 0.7, scaling factor = 0.5
IEWOA-VMD	0.65	Nonlinear parameter adjustment factors $k_{1} = -1$, $k_{2} = 1$; disturbance term coefficients $w_{1} = 1.2$, $w_{2} = 0.8$

Table 5. Final fitness values of ablation experiments on different datasets.

Lévy Strategy (LWOA)	Boundary (BWOA)	T-Perturbation (NWOA)	Improved Perturbation (IWOA)	CWRU	PU
				1.7826313	0.9110398
√				1.7826313	0.9063488
	√			1.7815513	0.9099268
		√		1.7826313	0.9103367
			√	1.7826313	0.9096031
√	√	√	√	1.7815512	0.9063338

Note: √ indicates that the corresponding optimization strategy is adopted in the experiment. Bold values represent the optimal (minimum) fitness values in the column.
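
The fitness values in Table 5 come from the envelope entropy objective that the IEWOA minimizes. As a rough illustration only, the sketch below computes a mean envelope entropy over a set of modes, assuming the common definition (Hilbert envelope normalized to a probability distribution, then Shannon entropy with the natural logarithm). The function names are hypothetical, and the normalization used in the paper is not shown in this section, so the sketch is not expected to reproduce the values in Table 5.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (one-sided spectrum doubling)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    if n % 2 == 0:
        weights[0] = weights[n // 2] = 1.0   # DC and Nyquist bins kept once
        weights[1:n // 2] = 2.0              # positive frequencies doubled
    else:
        weights[0] = 1.0
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def envelope_entropy(mode):
    """Shannon entropy of the normalized Hilbert envelope of one mode."""
    envelope = np.abs(analytic_signal(mode))
    p = envelope / np.sum(envelope)
    p = p[p > 0]                             # skip zero bins to avoid log(0)
    return float(-np.sum(p * np.log(p)))


def mean_envelope_entropy(modes):
    """Average envelope entropy over all modes (assumed fitness form)."""
    return float(np.mean([envelope_entropy(m) for m in modes]))


if __name__ == "__main__":
    n = 256
    t = np.arange(n) / n
    sine = np.cos(2 * np.pi * 8 * t)         # flat envelope, high entropy
    impulse = np.zeros(n)
    impulse[100] = 1.0                       # concentrated envelope, lower entropy
    print(envelope_entropy(sine), envelope_entropy(impulse))
```

Under this definition a mode with a flat envelope has the highest possible entropy, while a mode dominated by periodic impacts has a more concentrated envelope and a lower value, which is why a lower fitness is treated as better.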

Table 6. Denoising effects of different data.

Method	CWRU RMSE	CWRU SNR	CWRU NCC	PU RMSE	PU SNR	PU NCC
OUR	0.00041	25.15663	0.9689	0.00013	25.25373	0.9798
EEMD	0.02015	34.17483	0.9687	0.04815	38.57131	0.9779
Wavelet	0.00102	23.05276	0.9382	0.00016	31.70307	0.9689
VMD	0.00054	21.43461	0.9572	0.00058	11.82317	0.8763
GA-VMD	0.00051	23.53920	0.9577	0.00039	16.1832	0.9238
PSO-VMD	0.00050	23.87312	0.9592	0.00034	16.5195	0.9300
COAT-VMD	0.00049	23.99461	0.9601	0.00032	17.2874	0.9343
DE-VMD	0.00049	24.03898	0.9602	0.00032	17.3821	0.9388

Note: Bold values represent the optimal values in the column.
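
For readers who want to compute the three metrics in Table 6 on their own signals, the following is a minimal sketch. It uses the common textbook forms of RMSE, SNR (in dB), and zero-lag NCC, which is an assumption: the paper's own definitions (Section 5.2) may differ, for example in whether the mean is removed before computing NCC. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np


def rmse(reference, denoised):
    """Root-mean-square error between the reference and denoised signals."""
    reference = np.asarray(reference, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return float(np.sqrt(np.mean((reference - denoised) ** 2)))


def snr_db(reference, denoised):
    """Signal-to-noise ratio in dB, treating (reference - denoised) as noise."""
    reference = np.asarray(reference, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    residual = reference - denoised
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2)))


def ncc(reference, denoised):
    """Zero-lag normalized cross-correlation (no mean removal)."""
    reference = np.asarray(reference, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    denom = np.sqrt(np.sum(reference ** 2) * np.sum(denoised ** 2))
    return float(np.sum(reference * denoised) / denom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 2048, endpoint=False)
    clean = np.sin(2 * np.pi * 50 * t)                  # synthetic reference
    estimate = clean + 0.05 * rng.standard_normal(t.size)  # stand-in for a denoised output
    print(rmse(clean, estimate), snr_db(clean, estimate), ncc(clean, estimate))
```

Lower RMSE, higher SNR, and NCC closer to 1 indicate a denoised signal that stays closer to the reference.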

Table 7. Optimal results of different methods in different models.

Model	Metric	CWRU EEMD	CWRU VMD	CWRU OUR	PU EEMD	PU VMD	PU OUR
CNN	train_acc	0.968117	0.97976	0.980993	0.832310	0.980572	0.981595
	train_loss	0.061792	0.006363	0.004514	0.366222	0.008236	0.001678
	val_acc	0.952789	0.961373	0.961373	0.731182	0.917562	0.917562
	val_loss	0.046514	0.003367	0.000974	0.446939	0.008840	0.001648
CNN+BiGRU+Attention	train_acc	0.980993	0.980993	0.980993	0.890593	0.973415	0.980572
	train_loss	0.001946	0.001095	0.000899	0.222242	0.024793	0.01201
	val_acc	0.957081	0.961373	0.961373	0.738351	0.910394	0.917562
	val_loss	0.019479	0.001474	0.00089	0.402371	0.017436	0.023265
CNN+BiTCN	train_acc	0.974862	0.977927	0.980993	0.849693	0.981595	0.981595
	train_loss	0.019564	0.011532	0.002983	0.317554	0.00812	0.003860
	val_acc	0.957081	0.961373	0.961373	0.745519	0.917562	0.917562
	val_loss	0.014431	0.002794	0.000343	0.391833	0.001435	0.003285
CNN+Transformer	train_acc	0.978540	0.980380	0.980993	0.825153	0.951942	0.956032
	train_loss	0.009905	0.007681	0.001648	0.363859	0.087752	0.080413
	val_acc	0.957081	0.961373	0.961373	0.709677	0.899641	0.899641
	val_loss	0.017194	0.000259	0.000055	0.496307	0.053459	0.055491
CNN+TCN	train_acc	0.973635	0.97486	0.975475	0.836400	0.981595	0.981595
	train_loss	0.032839	0.03019	0.038514	0.340592	0.005840	0.007028
	val_acc	0.957081	0.961373	0.961373	0.731182	0.917562	0.917562
	val_loss	0.024413	0.011856	0.005685	0.426218	0.003338	0.004691

Note: Bold values represent the optimal values in the same model and unified dataset.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
