Enhancement of Bearing Fault Diagnosis Using Optimized Variational Decomposition, Entropy-Based Modal Reconstruction, and Evolutionary Bidirectional Fusion Network

Chen, Xupeng; Li, Huiyin; Zhang, Xu; Lai, Jianling; Hu, Xin; Peng, Tian

doi:10.3390/pr14121861


by Xupeng Chen ^1,2,*, Huiyin Li ^1,2, Xu Zhang ^1,2, Jianling Lai ^1,2, Xin Hu ^3 and Tian Peng ^3

^1 Changlongshan Pumped Storage Power Plant, China Yangtze Power Renewables Co., Ltd., Huzhou 310009, China

^2 Longxu Youth Innovation Studio, China Yangtze Power Renewables Co., Ltd., Huzhou 310009, China

^3 PowerChina HuaDong Engineering Co., Ltd., Hangzhou 311122, China

^* Author to whom correspondence should be addressed.

Processes 2026, 14(12), 1861; https://doi.org/10.3390/pr14121861 (registering DOI)

Submission received: 18 May 2026 / Revised: 4 June 2026 / Accepted: 5 June 2026 / Published: 9 June 2026

(This article belongs to the Section Process Control, Modeling and Optimization)


Abstract

Rolling bearing vibration signals often exhibit strong nonstationarity and are susceptible to noise interference, which makes fault feature extraction and accurate diagnosis challenging under complex operating conditions. To address these issues, this paper proposes a fault diagnosis pipeline that sequentially combines an improved snow ablation optimizer (ISAO), variational generalized nonlinear mode decomposition (VGNMD), and a bidirectional temporal sequence fusion network (BiTSF-Net). Firstly, ISAO is used to optimize the key parameters of VGNMD, including the bandwidth penalty parameter and smoothing constraint parameter, with minimum envelope entropy as the fitness function. Secondly, the optimized VGNMD decomposes raw vibration signals into modal components, and the modal component with the minimum envelope entropy is selected to highlight fault-related impulsive characteristics. Thirdly, 11-dimensional time-domain statistical features are extracted from the selected optimal modal component to characterize bearing health states. Finally, these extracted features are used as the input to BiTSF-Net, which combines bidirectional temporal convolutional networks and bidirectional long short-term memory networks in a parallel structure to learn local transient features and temporal dependencies for fault classification. Experimental validation is conducted on the Case Western Reserve University dataset. Comparative results with convolutional neural networks, gated recurrent units, and long short-term memory networks demonstrate that the proposed pipeline achieves superior diagnostic performance, with an average accuracy of 99.63% and a maximum accuracy of 100%. These results confirm the effectiveness and robustness of the proposed ISAO-VGNMD feature extraction and BiTSF-Net classification pipeline for bearing fault diagnosis under complex nonstationary conditions.

Keywords:

bearing fault diagnosis; VGNMD; ISAO; BiTSF-Net; feature extraction; intelligent fault diagnosis

1. Introduction

1.1. Research Background

As crucial supporting components in rotating machinery, rolling bearings are extensively employed in power equipment, rail transportation, aerospace, and industrial manufacturing. Their operational state directly influences the safety and reliability of mechanical systems [1]. A bearing failure may, at best, reduce equipment operating efficiency and, at worst, cause severe mechanical damage and major safety incidents. Therefore, efficient and accurate fault diagnosis of rolling bearings has significant engineering and research value [2]. In recent years, with the escalating complexity of industrial equipment, traditional fault diagnosis methods based on empirical rules have proven insufficient for intelligent diagnosis under complex operating conditions. Achieving high-precision fault identification through advanced signal processing techniques and intelligent algorithms has become a major research focus in the field of mechanical fault diagnosis.

Among diverse monitoring techniques, vibration signals are extensively employed in bearing fault diagnosis research, as they directly reflect the operating conditions of mechanical structures. Nevertheless, bearing vibration signals generally display prominent non-stationarity, nonlinearity, and multi-scale coupling characteristics. Under complex operating conditions and environmental noise, fault impact features are frequently concealed within background signals, which substantially increases the difficulty of feature extraction and fault identification [3]. Consequently, the effective extraction of discriminative feature information from complex vibration signals has become a crucial challenge in rolling bearing fault diagnosis. In recent years, with the progress of signal processing methods and artificial intelligence technologies, fault diagnosis approaches that combine signal decomposition with deep learning have emerged as a research focus. This integrated approach provides an effective framework for feature extraction and condition recognition in complex nonstationary signals [4].

1.2. Literature Review

To address the nonstationary characteristics of bearing vibration signals, numerous signal decomposition methods have been developed for fault feature extraction. Among them, Empirical Mode Decomposition (EMD) [5] is regarded as one of the most representative adaptive signal analysis approaches. Lei et al. [6] systematically reviewed the application of EMD in rotating machinery fault diagnosis and demonstrated its capability in separating signal components at different scales. However, EMD-based methods remain highly susceptible to mode mixing, end effects, and decomposition instability under noisy operating conditions, which may reduce the reliability of extracted fault features in practical applications.

To overcome these limitations, Dragomiretskiy and Zosso [7] proposed Variational Mode Decomposition (VMD), which formulates signal decomposition as a variational optimization problem and effectively suppresses modal aliasing. Compared with EMD, VMD exhibits better mathematical robustness and decomposition stability. Nevertheless, the decomposition performance of VMD remains highly dependent on manually selected parameters, particularly the modal number and bandwidth penalty factor. Improper parameter settings may lead to under-decomposition or redundant modal components, thereby affecting fault feature extraction accuracy.

Subsequently, Nazari et al. [8] proposed Successive Variational Mode Decomposition (SVMD), which further improves decomposition stability through successive extraction strategies. Although SVMD alleviates parameter sensitivity to some extent, its adaptability to highly nonlinear and strongly nonstationary vibration signals remains limited under complex operating conditions.

In recent years, Wang et al. [9] proposed Variational Generalized Nonlinear Mode Decomposition (VGNMD), which integrates adaptive time–frequency clustering with variational optimization to simultaneously process nonlinear frequency modulation modes and dispersive modes. Compared with conventional decomposition methods, VGNMD demonstrates stronger capability in analyzing complex signal structures and extracting physically meaningful modal information. However, the decomposition performance of VGNMD is still highly sensitive to key parameters, such as the bandwidth penalty parameter and frequency smoothing constraint coefficient [10]. Inappropriate parameter selection may result in insufficient modal separation, information leakage, or redundant modal components, thereby affecting subsequent fault diagnosis performance. Therefore, adaptive parameter optimization has become an important research direction for improving decomposition quality and feature representation capability.

With the development of swarm intelligence optimization techniques, various optimization algorithms have been introduced into mechanical fault diagnosis, including Particle Swarm Optimization (PSO) [11], Gray Wolf Optimizer (GWO) [12], and Differential Evolution (DE) [13]. These methods have demonstrated effectiveness in parameter optimization and feature selection tasks. Recently, Deng et al. [14] proposed the Snow Ablation Optimizer (SAO), which exhibits strong global optimization capability in engineering applications. However, similar to many swarm intelligence algorithms, the original SAO still suffers from insufficient population diversity and premature convergence when solving high-dimensional nonlinear optimization problems. Consequently, improving the exploration–exploitation balance and convergence stability of optimization algorithms remains a critical issue.

Meanwhile, deep learning techniques have been extensively applied in intelligent fault diagnosis and remaining useful life prediction [15,16]. In particular, Zhang et al. [17] proposed a cross-working-condition bearing remaining useful life prediction method based on SPW-SVDD health indicators and a temporal self-attention mechanism, demonstrating the effectiveness of deep learning architectures in capturing temporal degradation characteristics under varying operating conditions. Ji et al. [18] established a swarm intelligence-based deep learning model combining the improved whale optimization algorithm and bidirectional long short-term memory to realize fault diagnosis of chemical processes. However, CNN-based models mainly focus on local receptive fields and lack sufficient capability for capturing long-term temporal dependencies [19]. Long Short-Term Memory (LSTM) networks [20] can effectively model temporal sequence information and have demonstrated advantages in time-series prediction and fault diagnosis tasks. Nevertheless, LSTM architectures often suffer from high computational complexity, limited parallelization capability, and gradient vanishing problems when modeling long sequences. Temporal Convolutional Networks (TCNs) [21], benefiting from dilated causal convolution structures, can effectively expand receptive fields while maintaining parallel computation capability. However, TCN-based models still exhibit limitations in modeling complex nonlinear temporal dynamics.

To further improve feature learning capability, many studies have attempted to combine multiple network architectures. Existing serial hybrid models, such as TCN-LSTM frameworks, can partially integrate local feature extraction and temporal dependency modeling. However, serial structures may still introduce information loss during feature transmission and fail to fully preserve multi-scale temporal characteristics. In addition, the increasing network depth may further reduce training efficiency and model robustness.

Overall, existing studies have achieved significant progress in bearing fault diagnosis. Nevertheless, several challenges remain unresolved, including parameter-sensitive signal decomposition, insufficient robustness under complex operating conditions, limited adaptive feature extraction capability, and inadequate joint modeling of local transient characteristics and long-term temporal dependencies. Therefore, developing an intelligent fault diagnosis framework with adaptive decomposition capability, robust parameter optimization performance, and efficient multi-scale temporal feature learning remains an important research topic.

1.3. Research Gaps and Innovations

The challenges summarized above translate into three specific research gaps.

Traditional signal decomposition methods may still suffer from mode mixing and strong parameter sensitivity when processing complex nonlinear vibration signals, which undermines the accuracy of fault feature extraction. Although VGNMD can achieve high-precision decomposition of complex non-stationary signals, its decomposition results are highly sensitive to its key parameters, and inappropriate parameter settings may degrade decomposition quality. During fault diagnosis, single deep-learning models often struggle to simultaneously capture local impulsive features and long-term temporal dependencies, which restricts their capacity to represent complex vibration signals.

To tackle the aforementioned challenges, this paper presents a bearing fault diagnosis method that integrates an improved snow ablation optimizer (ISAO) with variational generalized nonlinear mode decomposition (VGNMD). The key innovations are as follows:

An ISAO algorithm combining Latin hypercube sampling (LHS) and Tent chaotic mapping, which performs adaptive optimization of key parameters in VGNMD to improve mode decomposition quality;
A modal selection strategy based on minimum envelope entropy that adaptively selects optimal modal components from decomposed signals while extracting 11-dimensional time-domain statistical features to enhance fault feature representation;
A bidirectional temporal sequence fusion network (BiTSF-Net) that integrates bidirectional temporal convolutional networks (BiTCN) and bidirectional long short-term memory networks (BiLSTM) in a parallel structure for multi-scale temporal feature learning, thereby improving fault recognition performance for complex vibration signals.

2. Theoretical Research

2.1. VGNMD

For completeness and clarity, the main principles and mathematical formulations of VGNMD [9] are briefly reviewed in this section before introducing the proposed optimization strategy.

For complex non-stationary signals that contain both nonlinear chirp modes [22] and dispersive modes, conventional signal decomposition methods frequently require pre-assumed signal types or manually configured decomposition parameters, which makes it difficult to balance adaptability and decomposition accuracy. Variational Generalized Nonlinear Mode Decomposition (VGNMD) adaptively extracts modes with distinct physical mechanisms by integrating adaptive time–frequency clustering, modal type discrimination, and variational optimization strategies. The overall framework is depicted in Figure 1.

Given the input signal

s (t), t \in [0, T]

, the VGNMD framework comprises three core steps, namely adaptive time–frequency clustering, modal type discrimination, and variational modal extraction [9], as detailed below:

The time–frequency representation of signals is obtained through the multiscale short-time Fourier transform [9]:

{STFT}_{i} (t, f) = \int s (τ) w_{i} (τ - t) e^{- j 2 π f τ} d τ, i = 1, 2, \dots, M

(1)

where

w_{i} (\cdot)

denotes analysis window functions at different scales. After performing normalization and threshold-based denoising on the time–frequency spectra at each scale, significant energy distribution regions are extracted using time–frequency connected domain clustering methods. Subsequently, the clustering results across different scales are fused to obtain a set of time–frequency subregions:

{C_{k} (t, f)}_{k = 1}^{K}

(2)

where

K

denotes the adaptively determined latent modal number, with each

C_{k}

corresponding to the time–frequency support region of a candidate signal modality.
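The multiscale analysis of Equation (1), followed by normalization and thresholding, can be sketched as below. This is a minimal illustration only: the Hann window, the window lengths, the hop size, and the threshold value are assumptions chosen for the example, and the connected-domain clustering and cross-scale fusion steps of VGNMD are not reproduced.

```python
import numpy as np


def stft_magnitude(signal, window_length, hop):
    """Magnitude STFT using a Hann window (window choice is an assumption)."""
    window = np.hanning(window_length)
    starts = range(0, signal.size - window_length + 1, hop)
    frames = np.stack([signal[s:s + window_length] * window for s in starts])
    # Rows are time frames, columns are non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))


def multiscale_masks(signal, window_lengths=(64, 128, 256), threshold=0.2):
    """Return one boolean time-frequency mask per analysis scale.

    Each spectrum is normalized by its own maximum and then thresholded,
    which keeps only the significant energy regions at that scale.
    """
    masks = {}
    for length in window_lengths:
        magnitude = stft_magnitude(signal, length, hop=length // 4)
        normalized = magnitude / magnitude.max()
        masks[length] = normalized >= threshold
    return masks
```

Short windows give finer time resolution and long windows give finer frequency resolution, which is why the masks from several scales are later fused rather than relying on a single one.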

Given the distinct physical characteristics of different modalities, VGNMD further incorporates a Mode-Type Discrimination Criterion (MTDC) to classify each candidate modality as either a nonlinear frequency modulation mode or a dispersive mode. For the k-th time–frequency clustering region

C_{k}

, we first extract its time–frequency ridge lines:

(f_{k} (t), t_{k} (f))

, representing the principal energy trajectories along the time and frequency axes. By analyzing the temporal and frequency variations in these ridge lines, we calculate their average change rate and define the ridge line slope ratio:

{CR}_{k} = |\frac{f_{k} (t_{2}) - f_{k} (t_{1})}{t_{2} - t_{1}}|

(3)

If the mode exhibits smooth temporal variation with

{CR}_{k} \leq 1

, it is classified as a nonlinear chirp mode:

s_{k} (t) = A_{k} (t) \cos (2 π \int_{0}^{t} f_{k} (τ) d τ + ϕ_{k})

(4)

where

A_{k} (t)

denotes the amplitude and

f_{k} (t)

represents the instantaneous frequency (IF).

Conversely, when the variation demonstrates higher stability along the frequency axis, the mode is categorized as a dispersive mode. The optimization model that matches the identified mode type is then used in the variational extraction phase. For a nonlinear chirp mode, the variational problem is formulated as follows:

\min_{A_{k}, f_{k}} \{\begin{matrix} {‖s_{k} (t) - A_{k} (t) \cos (2 π \int_{0}^{t} f_{k} (τ) d τ)‖}_{2}^{2} \\ + γ {‖ \nabla^{2} A_{k} (t) ‖}_{2}^{2} + δ {‖ \nabla^{2} f_{k} (t) ‖}_{2}^{2} \end{matrix}\}

(5)

where

γ

and

δ

represent the amplitude bandwidth and instantaneous frequency smoothing constraint parameters, respectively. By alternately updating

A_{k} (t)

and

f_{k} (t)

, nonlinear frequency modulation modes with physical significance can be obtained.
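To make the signal model of Equation (4) concrete, the sketch below synthesizes a nonlinear chirp mode from sampled amplitude and instantaneous-frequency curves. It only evaluates the model; it is not the variational solver of Equation (5). The function name, the uniform sampling step `dt`, and the trapezoidal rule for the phase integral are assumptions made for illustration.

```python
import math


def synthesize_chirp_mode(amplitude, inst_freq, dt, phase0=0.0):
    """Evaluate s_k(t) = A_k(t) * cos(2*pi * integral(f_k) + phi_k).

    amplitude and inst_freq are equally long sequences sampled every dt
    seconds; the phase integral is accumulated with the trapezoidal rule.
    """
    samples = []
    integral = 0.0
    for n, (a, f) in enumerate(zip(amplitude, inst_freq)):
        if n > 0:
            integral += 0.5 * (inst_freq[n - 1] + f) * dt
        samples.append(a * math.cos(2.0 * math.pi * integral + phase0))
    return samples
```

With a constant instantaneous frequency the function reduces to an ordinary cosine, which gives a simple sanity check before trying time-varying frequency curves.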

Dispersion modes are more easily characterized in the frequency domain. Let their frequency-domain representation be denoted as

S_{k} (f) = B_{k} (f) e^{- j 2 π \int_{0}^{f} τ_{k} (ξ) d ξ}

(6)

where

B_{k} (f)

denotes the frequency-domain amplitude, and

τ_{k} (f)

represents the group delay (GD). VGNMD employs Generalized Dispersive Mode Decomposition (GDMD) to construct the following variational model:

\min_{B_{k}, τ_{k}} \{\begin{matrix} {‖S_{k} (f) - B_{k} (f) e^{- j 2 π \int_{0}^{f} τ_{k} (ξ) d ξ}‖}_{2}^{2} \\ + γ {‖ \nabla^{2} B_{k} (f) ‖}_{2}^{2} + δ {‖ \nabla^{2} τ_{k} (f) ‖}_{2}^{2} \end{matrix}\}

(7)

This model facilitates the adaptive extraction of dispersion modes within the frequency domain and ultimately reconstructs their time-domain representation via inverse Fourier transform. Through the steps described above, VGNMD decomposes the original signal into several modal components with well-defined physical meanings:

s (t) = \sum_{k = 1}^{K} s_{k} (t) + r (t)

(8)

where

s_{k} (t)

denotes the extracted nonlinear frequency modulation or dispersion modes, while

r (t)

represents the residual signal. Compared with conventional decomposition methods, VGNMD does not require a predefined number of modes, can adaptively identify modal types based on time–frequency structure, and achieves high-precision decomposition through variational optimization.

2.2. Improved SAO

To enhance the global search ability and convergence stability of the SAO algorithm in continuous parameter spaces, this paper presents an ISAO that combines LHS [23] with Tent [24] chaotic mapping. Through the systematic reconstruction of the initial population distribution, this algorithm notably improves search diversity and the uniformity of solution space coverage without augmenting computational complexity. This effectively mitigates the common limitations of the original SAO algorithm, such as vulnerability to the randomness of the initial population and premature convergence.

2.2.1. Latin Hypercube Sampling Initialization

In the SAO, the initial population is generally generated uniformly and randomly within the search space. Nevertheless, in high-dimensional nonlinear optimization problems, this random initialization method frequently results in non-uniform sample distribution. Some regions may encounter excessive sample aggregation, while others lack effective search candidates. Consequently, the algorithm’s capacity to explore global optimal solutions is diminished, and the risk of being trapped in local optima is increased. To improve the uniform distribution of the initial population throughout the entire search space, this paper presents the LHS strategy for population initialization.

Let the search space dimension of the optimization problem be D, the population size be N, and the lower and upper bounds of the d-th dimension parameter be

[l_{d}, u_{d}]

. In each dimension, LHS divides the unit interval [0, 1] into N equal subintervals and randomly selects one sample point within each subinterval, which ensures a uniform marginal distribution across all dimensions. The normalized representation of the i-th individual in the d-th dimension can be expressed as

x_{i, d}^{(0)} = \frac{π_{d} (i) - r_{i, d}}{N}

(9)

i = 1, 2, \dots, N, d = 1, 2, \dots, D

(10)

where

π_{d} (\cdot)

denotes a random permutation of {1, 2, …, N}, and

r_{i, d} ~ U (0, 1)

represents uniformly distributed random numbers. Subsequently, samples are projected into the actual search space via linear mapping:

X_{i, d}^{(0)} = l_{d} + x_{i, d}^{(0)} (u_{d} - l_{d})

(11)

By means of LHS initialization, the coverage uniformity of the initial population across all dimensions can be substantially enhanced, offering more comprehensive global information for subsequent optimization searches. In comparison with traditional random initialization methods, LHS improves the spatial coverage of samples while preserving randomness, leading to a more uniformly distributed initial population across dimensions. This strategy not only enhances the algorithm’s initial search ability but also reduces the probability of premature convergence to a certain degree, thereby laying the groundwork for further augmenting population diversity through the introduction of Tent chaotic mapping.
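Equations (9) to (11) can be sketched in a few lines. The function name and the explicit random generator argument are assumptions made for the example, and any bounds passed in are hypothetical search ranges, not values taken from the paper.

```python
import random


def latin_hypercube(num_individuals, bounds, rng):
    """Latin hypercube initialization following Equations (9)-(11).

    bounds is a list of (lower, upper) pairs, one per dimension.
    Returns (unit_samples, scaled_samples) as lists of rows.
    """
    n = num_individuals
    unit = [[0.0] * len(bounds) for _ in range(n)]
    scaled = [[0.0] * len(bounds) for _ in range(n)]
    for d, (lower, upper) in enumerate(bounds):
        # pi_d: an independent random permutation of {1, ..., N} per dimension.
        permutation = list(range(1, n + 1))
        rng.shuffle(permutation)
        for i in range(n):
            x = (permutation[i] - rng.random()) / n      # Equation (9)
            unit[i][d] = x
            scaled[i][d] = lower + x * (upper - lower)   # Equation (11)
    return unit, scaled
```

Because each dimension uses its own permutation, every one of the N subintervals receives exactly one sample per dimension, which is the property that distinguishes LHS from plain uniform sampling.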

2.2.2. Tent Chaotic Mapping for Enhancing Population Diversity

Although the LHS method can enhance the spatial distribution uniformity of the initial population to a certain degree, it essentially depends on random sampling techniques that might lead to correlations among samples. In intricate high-dimensional optimization problems, LHS initialization alone is unable to ensure adequate population traversal ability. To augment population diversity and global search performance, this research presents Tent chaotic mapping perturbation initialization based on LHS. Chaotic mapping pertains to deterministic nonlinear dynamic systems that display stochastic behavior, featuring ergodicity, randomness, and sensitivity to initial values. This mechanism allows chaotic sequences to exhibit strong search capabilities in optimization algorithms and prevents the population from stagnating at local optima. In comparison with traditional chaotic systems such as Logistic mapping, Tent mapping has a simpler structure, a more uniform distribution, and superior ergodic characteristics, which makes it widely utilized in the initialization and perturbation stages of swarm intelligence optimization algorithms.

It can be defined as

T (x) = \{\begin{array}{ll} 2 x, & 0 \leq x < 0.5 \\ 2 (1 - x), & 0.5 \leq x \leq 1 \end{array}

(12)

where

x \in (0, 1)

denotes the current value of the chaotic sequence. When the system parameter is set to 2, as in Equation (12), the Tent mapping generates a chaotic sequence with excellent ergodic properties and a positive Lyapunov exponent, indicating that the system operates in a typical chaotic state. The chaotic sequence is uniformly distributed within the interval (0, 1), which improves coverage of the search space.

Applying the Tent mapping element-wise, in every dimension, to the normalized samples

x_{i, d}^{(0)}

generated by LHS yields the chaos-enhanced samples:

{\tilde{x}}_{i, d}^{(0)} = T (x_{i, d}^{(0)})

(13)

where

T (\cdot)

denotes the Tent mapping function. This step is intended to further reduce potential correlations between the original samples, giving a more uniform and stochastic distribution of the population within the search space. The population is then mapped back to the original search space:

{\tilde{X}}_{i, d}^{(0)} = l_{d} + {\tilde{x}}_{i, d}^{(0)} (u_{d} - l_{d})

(14)

By integrating Tent chaotic mapping with the LHS strategy, the randomness of the initial population is further increased while its uniformity is preserved, which substantially improves the global search ability of the optimization algorithm. This chaos-enhanced initialization strategy not only strengthens the algorithm’s exploration capacity in the early iterations but also reduces the likelihood of being trapped in local optima, thus providing a better initial population for the subsequent global optimization of the ISAO algorithm.
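The chaos-enhancement step of Equations (12) to (14) can be sketched as follows. The function names are illustrative, and the input is assumed to be a list of normalized samples in [0, 1] such as those produced by a Latin hypercube initialization.

```python
def tent_map(x):
    """Tent mapping of Equation (12) with system parameter 2."""
    return 2.0 * x if x < 0.5 else 2.0 * (1.0 - x)


def chaos_enhance(unit_samples, bounds):
    """Apply the Tent map to every coordinate, then rescale to the bounds.

    unit_samples: rows of normalized values in [0, 1].
    bounds: list of (lower, upper) pairs, one per dimension.
    """
    population = []
    for row in unit_samples:
        enhanced = [tent_map(x) for x in row]                 # Equation (13)
        population.append([
            lower + e * (upper - lower)                       # Equation (14)
            for e, (lower, upper) in zip(enhanced, bounds)
        ])
    return population
```

Since the Tent map sends [0, 1] onto [0, 1], the rescaled individuals always remain inside the original search bounds and no extra clipping is needed at this stage.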

2.2.3. ISAO Optimization of VGNMD Parameters

The ISAO algorithm first delineates the value ranges of the bandwidth penalty parameter α and the smoothing constraint parameter β of VGNMD. It then uses Latin hypercube sampling in conjunction with Tent chaotic mapping to generate the initial population, thereby increasing population diversity. In the exploration stage, Gaussian Brownian motion is employed to expand the search space, whereas in the exploitation stage, local searches are carried out around the current optimal solution based on the snowmelt model. The algorithm employs a dual-population mechanism that dynamically adjusts the sizes of the exploration and exploitation sub-populations during the iterations to balance the two stages adaptively. Finally, iterative updates under boundary constraints yield the parameter combination that minimizes the objective function, namely the envelope entropy. The flowchart of the ISAO algorithm is shown in Figure 2.

2.3. Envelope Entropy and Feature Extraction

Typically, non-stationary signal decomposition yields multiple modal components. However, not all decomposed modes contain discriminative features relevant to fault identification [25,26,27]. To adaptively select optimal modes containing key impact characteristics from multimodal decomposition results, this study introduces the minimum envelope entropy criterion as the modal selection basis.

Envelope entropy [28] effectively characterizes the degree of concentration in signal energy distribution. For a given modal component x(t), its envelope signal can be obtained through a Hilbert transform, thereby constructing a normalized envelope probability distribution. Envelope entropy is defined as

E_{env} = - \sum_{i = 1}^{N} p_{i} \ln p_{i}

(15)

where

p_{i}

denotes the normalized amplitude of the envelope signal at the i-th sampling point, that is, the envelope amplitude at that point divided by the sum of the envelope amplitudes over all N points. A smaller envelope entropy value indicates a more concentrated signal energy distribution and more pronounced impact characteristics, which are more conducive to characterizing mechanical fault states.

In accordance with the aforementioned criteria, this study computes the envelope entropy for each decomposed modal component separately and selects the one with the lowest envelope entropy as the optimal modal component. This method facilitates the adaptive extraction of crucial fault information and effectively reduces the subjectivity associated with manual modal selection.
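The envelope entropy of Equation (15) and the minimum-entropy selection rule can be sketched as below. The analytic signal is built here with a plain FFT-based Hilbert transform, which is one common way to obtain the envelope; the function names are illustrative, and the modes are assumed to be one-dimensional arrays of equal length.

```python
import numpy as np


def envelope(x):
    """Envelope |x + j*H{x}| from the FFT-based analytic signal."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC term once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def envelope_entropy(x):
    """Equation (15): Shannon entropy of the normalized envelope."""
    env = envelope(np.asarray(x, dtype=float))
    p = env / env.sum()
    p = p[p > 0]                          # 0 * ln(0) is treated as 0
    return float(-np.sum(p * np.log(p)))


def select_min_entropy_mode(modes):
    """Return the index of the mode with the lowest envelope entropy."""
    entropies = [envelope_entropy(m) for m in modes]
    return int(np.argmin(entropies)), entropies
```

A steady sinusoid has a flat envelope and therefore reaches the maximum entropy ln N, whereas a train of decaying impulses concentrates the envelope into short bursts and scores lower, which is why the criterion favours impulsive, fault-related modes.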

Upon obtaining the optimal mode, to comprehensively characterize its time-domain statistical properties, we further extract 11-dimensional time-domain feature parameters from this mode, such as mean value, variance, and peak-to-peak value. These features characterize the signal amplitude distribution, energy content, and impulsive characteristics from multiple perspectives, offering discriminative feature inputs for subsequent fault classification models.

The joint strategy of selecting the optimal mode by minimum envelope entropy and then extracting multidimensional time-domain features strengthens the fault-related characteristics of the data passed to the classifier.

3. BiTSF-Net Construction

In the domain of mechanical fault diagnosis, the commonly employed deep-learning models mainly encompass architectures like CNN, LSTM/GRU, and TCN. Nevertheless, all these models present certain limitations in feature extraction.

CNNs are effective in extracting local features from signals through convolutional kernels with local receptive fields, and they showcase remarkable performance in image recognition and one-dimensional signal processing. However, their restricted receptive fields encounter difficulties in modeling long-term dependencies and are unable to capture the global evolutionary patterns of vibration signals. LSTM and GRU architectures, although demonstrating strong capabilities in time-series modeling for representing long-term dependencies, have inherent limitations. These include challenges in parallelization during training, sub-optimal computational efficiency, weak localization of transient impact features, and potential gradient decay issues in extended sequences. TCN architectures partially overcome the shortcomings of CNNs by expanding convolutional structures, enabling effective long-term sequence modeling with improved parallel processing capabilities. Nevertheless, as convolutional architectures at their core, TCNs still exhibit limitations in modeling complex nonlinear dynamic patterns, making it arduous to fully characterize the evolutionary dynamics of vibration signals.

Current research on serial TCN-LSTM models generally adopts a tandem structure (TCN→LSTM) to integrate the advantages of both architectures. However, such architectures still exhibit the following problems: information loss during feature transfer; biased learning of features across different scales; and increased network depth caused by the serial structure, which undermines training efficiency.

To tackle the aforementioned challenges, this study proposes the BiTSF-Net model with a parallel dual-branch architecture. By simultaneously applying BiTCN and BiLSTM to input features, the model realizes collaborative learning of multi-scale features.

3.1. BiTCN

Each TCN branch of BiTCN utilizes one-dimensional causal convolution to conduct time-series modeling. The output of a branch at any specific moment depends solely on the current and preceding inputs of the sequence that the branch processes, which precludes information leakage within that branch. For the input sequence X, the one-dimensional convolution operation within TCN integrates dilated convolution to augment the modeling capacity for long-term dependencies, which can be formulated as

y (t) = \sum_{k = 0}^{K - 1} w (k) x (t - d \cdot k)

(16)

where K denotes the size of the convolution kernel,

w (k)

represents the kernel weight, and

d

is the dilation factor.

By increasing the dilation factor layer by layer, TCN can effectively extract features from long time series without significantly increasing network depth or parameter scale. Moreover, BiTCN applies a forward TCN to the input sequence and a backward TCN to the time-reversed input sequence. The output features of the two TCN pathways are represented as follows:

H_{f} = TCN (x)

(17)

H_{b} = TCN (reverse (x))

(18)
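A single dilated causal convolution, Equation (16), and its use in the two directions of Equations (17) and (18) can be sketched as follows. A full TCN stacks several such layers with nonlinearities and residual connections, which are omitted here; zero padding at the sequence start and the re-reversal of the backward output for time alignment are assumptions made for this illustration.

```python
def dilated_causal_conv(x, weights, dilation):
    """Equation (16): y(t) = sum_k w(k) * x(t - d*k).

    Samples before the start of the sequence are treated as zero.
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            index = t - dilation * k
            if index >= 0:
                total += w * x[index]
        y.append(total)
    return y


def bidirectional_features(x, weights, dilation):
    """One-layer stand-in for Equations (17) and (18)."""
    h_forward = dilated_causal_conv(x, weights, dilation)
    h_backward = dilated_causal_conv(x[::-1], weights, dilation)
    # Assumption: reverse the backward output so both share one time axis.
    return h_forward, h_backward[::-1]
```

With kernel size K and dilation d, one layer looks back (K - 1) * d samples, so doubling d at each layer grows the receptive field exponentially with depth.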

3.2. BiLSTM

BiLSTM combines forward and backward LSTM layers to encode sequences from two directions, allowing the model to concurrently capture bidirectional contextual information that includes historical and future contexts. This bidirectional architecture substantially enhances the model’s capacity to perceive global sequence features.

BiLSTM processes the time series in both the forward and backward directions, which yields a more complete sequence representation for the subsequent prediction stage. The hidden states of BiLSTM at time t are given by

\{\begin{array}{l} \vec{h_{t}} = \vec{LSTM} (h_{t - 1}, x_{t}, g_{t - 1}) \\ \overset{\leftarrow}{h_{t}} = \overset{\leftarrow}{LSTM} (h_{t + 1}, x_{t}, g_{t + 1}) \end{array}

(19)

H_{t} = [\begin{matrix} \vec{h_{t}}, \overset{\leftarrow}{h_{t}} \end{matrix}]

(20)

where

\vec{h_{t}}

denotes the forward-propagation hidden layer state, while

\overset{\leftarrow}{h_{t}}

represents the backward-propagation hidden layer state.
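Equations (19) and (20) can be illustrated with a minimal NumPy forward pass. This is a sketch under several assumptions: the standard LSTM cell with input, forget, and output gates is used, the term g in Equation (19) is read as the cell state (named `c` below), the gate ordering inside the stacked weight matrices is arbitrary, and the weights are supplied by the caller because no training is shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h_prev.size
    z = W @ x_t + U @ h_prev + b
    input_gate = sigmoid(z[:H])
    forget_gate = sigmoid(z[H:2 * H])
    candidate = np.tanh(z[2 * H:3 * H])
    output_gate = sigmoid(z[3 * H:])
    c = forget_gate * c_prev + input_gate * candidate
    h = output_gate * np.tanh(c)
    return h, c


def run_lstm(x_seq, params, hidden):
    """Scan one direction over x_seq of shape (T, D); returns (T, H)."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in x_seq:
        h, c = lstm_step(x_t, h, c, *params)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(x_seq, params_fwd, params_bwd, hidden):
    """Equation (20): concatenate forward and backward hidden states."""
    h_fwd = run_lstm(x_seq, params_fwd, hidden)
    # Backward pass runs on the reversed sequence, then is re-aligned.
    h_bwd = run_lstm(x_seq[::-1], params_bwd, hidden)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)
```

The output at each time step has width 2H: the first half summarizes the past up to t and the second half summarizes the future from t onward.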

3.3. Feature Extraction

To effectively characterize the dynamic variations in bearing vibration signals under different fault conditions, 11-dimensional time-domain statistical features are extracted from the selected modal component in this study. These features include mean value, standard deviation, variance, root mean square, peak value, peak-to-peak value, skewness, kurtosis, crest factor, impulse factor, and margin factor.

Specifically, the mean value reflects the average vibration level of the selected modal component, while the standard deviation and variance describe the dispersion degree and fluctuation intensity of the signal. The root mean square represents the effective energy level of the vibration signal and is sensitive to changes in bearing operating states. The peak value and peak-to-peak value characterize the maximum impact amplitude and overall vibration range, which are closely related to transient shock components caused by local bearing defects.

In addition, skewness reflects the asymmetry of the signal amplitude distribution, whereas kurtosis is highly sensitive to impulsive components and is commonly used to identify early bearing faults. The crest factor evaluates the ratio between the peak amplitude and the effective signal energy, which can indicate the existence of abnormal impact responses. The impulse factor further measures the intensity of sudden shocks relative to the average signal amplitude, while the margin factor emphasizes the sensitivity of extreme impact components and is useful for detecting localized fault-induced vibration impulses.

Compared with high-dimensional frequency-domain and time–frequency-domain features, these time-domain statistical indicators have lower computational cost and stronger physical interpretability. More importantly, bearing fault signals usually exhibit impulsive, nonstationary, and amplitude-fluctuation characteristics, which can be effectively described by the selected 11-dimensional feature set. Therefore, these features can balance diagnostic information representation and computational cost, providing reliable input information for subsequent BiTSF-Net fault classification.
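The 11 indicators can be computed as in the following sketch. The formulas are the conventional textbook definitions and are an assumption here, since this section names the features without giving expressions: population variance, non-excess kurtosis, peak taken as the maximum absolute amplitude, and margin factor taken as the peak divided by the squared mean of the square-rooted absolute amplitudes. The test signals are synthetic.

```python
import math


def time_domain_features(x):
    """Return 11 time-domain statistics of x (std must be nonzero)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n           # population variance
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    abs_mean = sum(abs(v) for v in x) / n
    sqrt_abs_mean = sum(math.sqrt(abs(v)) for v in x) / n
    return {
        "mean": mean,
        "std": std,
        "variance": var,
        "rms": rms,
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / sqrt_abs_mean ** 2,
    }


smooth = [math.sin(2 * math.pi * k / 8) for k in range(64)]   # synthetic sinusoid
impulsive = list(smooth)
impulsive[10] += 8.0                                          # one injected shock

f_smooth = time_domain_features(smooth)
f_imp = time_domain_features(impulsive)
print(f_smooth["kurtosis"], f_imp["kurtosis"])
```

A single injected shock raises kurtosis, crest factor, and impulse factor well above the values for the pure sinusoid, which is the sensitivity to impulsive components described above.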

3.4. Feature Fusion and Overall Network Architecture of BiTSF-Net

To fully exploit the multi-level temporal features intrinsic to mechanical fault signals, this research proposes the BiTSF-Net model. By integrating the complementary features extracted by BiTCN and BiLSTM, the network jointly models local impact characteristics and long-term dependency features. The comparative architecture between BiTSF-Net and serial TCN-LSTM is presented in Figure 3.

In BiTSF-Net, the input feature sequence is fed in parallel into two sequential modeling branches, namely the BiTCN and the BiLSTM. The BiTCN extracts local temporal patterns and multi-scale impact features from signals via dilated convolutions, which facilitates the efficient capture of short-term mutations and periodic information. Simultaneously, the BiLSTM models sequences with a bidirectional recurrent structure, highlighting the long-term dependency relationships and nonlinear dynamic evolution characteristics of signals.

To effectively utilize the temporal features extracted from different branches, this research adopts a feature-level fusion strategy to integrate the outputs of the BiTCN and the BiLSTM. Specifically, a joint feature representation is constructed by concatenating the two feature streams along the feature dimension.

H_{fusion} = [H_{BiTCN} \,\|\, H_{BiLSTM}]

(21)

Following feature fusion, a fully connected layer is incorporated to conduct nonlinear mapping and dimensionality reduction on the integrated features. This process not only strengthens feature coupling but also suppresses redundancy. Regularization methods, such as Dropout, are utilized to enhance the model’s generalization ability. Subsequently, the fused features are input into the classification layer for fault pattern recognition. In this layer, the Softmax function yields probability distributions for each fault category. The mathematical expression is as follows:

P(y = i \mid H_{fusion}) = \frac{\exp(z_{i})}{\sum_{j = 1}^{C} \exp(z_{j})}

(22)

where C denotes the number of fault categories, and z_{i} represents the network output corresponding to the i-th category. By minimizing the cross-entropy loss function, end-to-end training of network parameters is conducted to achieve accurate discrimination of different fault states.
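The fusion and classification path of Equations (21) and (22) can be sketched in a few lines. The branch outputs, layer sizes, and weights below are hypothetical, three classes are used instead of ten for brevity, and Dropout is omitted because it is inactive at inference time.

```python
import math


def concat(h_bitcn, h_bilstm):
    # Eq. (21): concatenate the two branch outputs along the feature dimension.
    return list(h_bitcn) + list(h_bilstm)


def dense(v, W, b):
    # Fully connected layer producing one logit z_i per class.
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(W, b)]


def softmax(z):
    # Eq. (22), with the maximum subtracted for numerical stability.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(p, label):
    # Loss for one sample whose true class index is `label`.
    return -math.log(p[label])


h_bitcn = [0.2, -0.1]                 # hypothetical BiTCN branch output
h_bilstm = [0.4, 0.3, 0.0]            # hypothetical BiLSTM branch output
W = [[0.5, -0.2, 0.1, 0.3, 0.0],      # hypothetical weights, C = 3 classes
     [-0.3, 0.4, 0.2, -0.1, 0.5],
     [0.1, 0.1, -0.4, 0.2, 0.3]]
b = [0.0, 0.1, -0.1]

h_fusion = concat(h_bitcn, h_bilstm)
p = softmax(dense(h_fusion, W, b))
predicted = max(range(len(p)), key=lambda i: p[i])
loss = cross_entropy(p, label=0)
print(p, predicted, loss)
```

The softmax outputs sum to one, and the loss decreases as the probability assigned to the true class approaches one.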

3.5. Bearing Fault Diagnosis Method for ISAO-VGNMD and BiTSF-Net

The bearing fault diagnosis model based on ISAO-VGNMD and BiTSF-Net follows the workflow illustrated in Figure 4. The specific diagnostic procedure is as follows:

Step 1: The original dataset is first divided into independent training and test sets. Subsequently, samples used for ISAO-based VGNMD parameter optimization are randomly selected only from the training set to avoid potential data leakage during the optimization process.

Step 2: Using envelope entropy as the fitness function and combining it with the ISAO algorithm, the parameters (α, β) of VGNMD are optimized to obtain the optimal IMF component index value.

Step 3: The optimal parameters are substituted back into VGNMD, the original data are decomposed, and the optimal IMF components are output.

Step 4: The corresponding optimal IMF components and their 11-dimensional time-domain features are extracted; then, a feature set is constructed after data augmentation.

Step 5: The extracted feature set is used to construct the BiTSF-Net fault diagnosis model based on the predefined training and test sets, where the test data remain completely independent throughout the optimization and training stages.
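The leakage-avoidance rule in Step 1 can be sketched as follows. The sketch assumes a balanced dataset of 1200 samples over 10 classes, a stratified 3:1 split, and two optimization samples per class. The class balance, the stratification, the per-class count, and the function name are illustrative assumptions and are not reported in the procedure above.

```python
import random


def split_and_pick(labels, test_fraction=0.25, n_opt_per_class=2, seed=0):
    """Stratified train/test split, then draw optimization samples from the training set only."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)

    train, test, opt = [], [], []
    for lab in sorted(by_class):
        idxs = by_class[lab][:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test += idxs[:n_test]
        cls_train = idxs[n_test:]
        train += cls_train
        # Samples for ISAO-based VGNMD parameter tuning never touch the test set.
        opt += rng.sample(cls_train, n_opt_per_class)
    return train, test, opt


labels = [c for c in range(1, 11) for _ in range(120)]   # 10 classes x 120 samples (assumed)
train, test, opt = split_and_pick(labels)
print(len(train), len(test), len(opt))
```

Because the optimization subset is drawn after the split and only from the training indices, the test samples cannot influence the VGNMD parameters.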

4. Experimental Verification and Analysis

4.1. Data Introduction

The experiments use bearing data from the Case Western Reserve University (CWRU) dataset. The test bench, shown in Figure 5 [29,30], consists of a drive motor, torque sensor, load motor, and test bearings. The SKF6205 bearings used in the experiment cover 10 health states, with the fault types and diameters listed in Table 1. During testing, the bearings operated at 1797 rpm with a sampling frequency of 12 kHz, and each sample contains 2048 data points. Parameter configurations are detailed in Table 2.

The collected signals are saved in MAT format for subsequent data processing and analysis. These data will be utilized in bearing fault diagnosis research, enabling accurate fault diagnosis through feature extraction and classification recognition of vibration signals.

In this experiment, the dataset consists of 1200 samples divided into training and testing sets at a 3:1 ratio. To ensure fair evaluation and avoid information leakage, the test set was not used in the parameter optimization, modal selection, or model training procedures. Standard data labels are assigned as follows: Label 1 for normal data, Label 2 for inner ring fault 1, Label 3 for inner ring fault 2, Label 4 for inner ring fault 3, Label 5 for outer ring fault 1, Label 6 for outer ring fault 2, Label 7 for outer ring fault 3, Label 8 for rolling element fault 1, Label 9 for rolling element fault 2, and Label 10 for rolling element fault 3. Table 3 details the composition of the experimental dataset.

4.2. Data Processing

First, the original data is decomposed using the ISAO-optimized VGNMD method. Taking one sample from Dataset Label 7 as an example, the optimal mode is identified as the one with the lowest entropy value after decomposition. The entropy selection criteria are illustrated in the envelope entropy line graph shown in Figure 6.

As shown in Figure 6, the optimized VGNMD method effectively decomposes the vibration signal into multiple modal components with relatively clear frequency structures. A total of 40 IMFs were obtained, and the 26th mode exhibited the minimum envelope entropy value, indicating the strongest fault-related impulsive characteristics and the highest information concentration. Therefore, this mode was selected as the optimal component. Compared with the original signal, the selected mode shows more distinct fault impact features and reduced background interference, demonstrating that the proposed ISAO-based parameter optimization strategy improves modal separation quality. To facilitate clearer observation, Figure 7 presents the comparison between the original signal and the selected optimal mode.
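The minimum-envelope-entropy selection can be illustrated with a small, dependency-free sketch. It assumes the commonly used definition in which the Hilbert envelope is normalized to a probability distribution and its Shannon entropy is taken. The naive DFT, the two synthetic "modes", and the function names are illustrative only and do not reproduce the 40 IMFs of the study.

```python
import cmath
import math


def dft(x, inverse=False):
    # Naive O(N^2) discrete Fourier transform, adequate for short examples.
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def envelope(x):
    # Magnitude of the analytic signal; len(x) is assumed to be even.
    n = len(x)
    X = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0
    gain[n // 2] = 1.0
    for k in range(1, n // 2):
        gain[k] = 2.0                    # keep and double the positive frequencies
    z = dft([X[k] * gain[k] for k in range(n)], inverse=True)
    return [abs(v) for v in z]


def envelope_entropy(x):
    a = envelope(x)
    total = sum(a)
    return -sum((v / total) * math.log(v / total) for v in a if v > 0)


N = 128
tone = [math.sin(2 * math.pi * 8 * n / N) for n in range(N)]            # steady component
bursts = [math.exp(-0.3 * (n % 32)) * math.sin(2 * math.pi * n / 4)     # repeated decaying shocks
          for n in range(N)]

modes = [tone, bursts]
entropies = [envelope_entropy(m) for m in modes]
best = min(range(len(modes)), key=lambda i: entropies[i])
print(entropies, best)
```

The steady tone has a flat envelope and therefore the largest possible entropy, ln N, whereas the burst train concentrates its envelope in a few samples and is selected as the minimum-entropy mode.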

Finally, 11-dimensional time-domain features were extracted from the selected optimal mode. The same feature extraction process was then applied to all samples in the dataset. The aggregated feature data are visualized as a line graph in Figure 8.

As shown in Figure 8, the extracted features of different fault categories exhibit relatively clear clustering patterns with reduced inter-class overlap, indicating that the proposed feature extraction strategy effectively enhances feature separability and improves the discriminative capability of the subsequent BiTSF-Net classifier. Detailed descriptions of these features are provided in Section 3.3.

4.3. Comparison of Diagnostic Results in the CWRU Dataset

To validate the diagnostic performance of the model described in Section 3.4, this study evaluates its adaptability and robustness on VGNMD-decomposed data from the CWRU dataset. Comparative experiments are conducted against CNN, GRU, and LSTM baselines, as well as BiTSF-Net variants with and without SAO and ISAO optimization. Each experiment was repeated independently 20 times. The compared models are listed in Table 4.

The population size NP of the SAO and ISAO algorithms is set to 20, and the iteration count T is set to 25. Then, fault diagnosis experiments are conducted for M1 through M6. The confusion matrix diagram obtained from one independent experiment is shown in Figure 9.

As shown in Figure 9, the overall diagnostic performance of models M1, M2, and M3 falls within the “usable but unstable” range. Specifically, M1 achieves an accuracy of 86.67%, M3 reaches 88.67%, and M2 performs slightly better at 91.33%. These results indicate that, under identical feature and input conditions, directly attaching a basic network such as CNN, GRU, or LSTM after VGNMD yields limited discrimination capability for similar fault categories. This leads to more pronounced off-diagonal misclassification in the confusion matrix, where certain fault categories are erroneously assigned to other similar categories, thereby reducing overall classification accuracy. Within the internal comparison of M1–M3, the GRU-based model (M2) outperforms LSTM (M3) and CNN (M1) in accuracy, reflecting GRU’s favorable balance between fitting performance and generalization ability, owing to its simplified gate structure and moderate parameter scale, on this experimental dataset. However, even with these advantages, the overall accuracy remains below 92%, indicating that confusion between different fault categories is not fully eliminated. This highlights the need to improve the adaptability and robustness of existing models when handling complex data environments.

Further analysis of performance metrics from M4 to M6 reveals a stepwise significant improvement. M4 achieves an accuracy rate of 98%, demonstrating that the BiTSF-Net architecture effectively integrates long-term temporal dependencies with discriminative features in time-series data. This optimization concentrates energy distribution along the principal diagonal in the confusion matrix, resulting in a substantial reduction in inter-class misclassifications. Building upon this foundation, the introduction of SAO and the improved ISAO algorithm for automatic parameter optimization further enhanced the diagnostic performance. Specifically, M5 achieved an accuracy of 98.33%, while M6 obtained a maximum diagnostic accuracy of 100% in one independent experimental run. In Figure 9g, 10 categories correspond to ten fault types, and the diagnostic accuracy for each fault reaches 100%. Detailed performance metrics and stability data are presented in Table 5.
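The relation between a confusion matrix and the reported accuracies can be made concrete with a short sketch. The 3-class matrix below is a hypothetical example, not data from Figure 9. The count conversion assumes a test set of 300 samples, which follows from the 1200 samples split 3:1 if the split is exact.

```python
def accuracy(cm):
    # Share of samples on the principal diagonal (rows: true class, columns: predicted).
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    # Fraction of each true class that was classified correctly.
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def correct_count(accuracy_percent, n_test=300):
    # Number of correctly classified test samples implied by a reported accuracy.
    return round(accuracy_percent / 100 * n_test)


cm = [[30, 0, 0],      # hypothetical confusion matrix with 30 samples per class
      [2, 27, 1],
      [0, 3, 27]]

print(accuracy(cm), per_class_recall(cm))
print(correct_count(86.67), correct_count(91.33), correct_count(98.33))
```

Under this assumption, 86.67%, 91.33%, and 98.33% correspond to 260, 274, and 295 correct samples out of 300, so the step from 98.33% to 100% amounts to five remaining misclassifications.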

It should be noted that the time reported in Table 5 refers to the testing time, namely, the runtime required to perform fault diagnosis on the test set. It does not include the model training time or the offline ISAO-based VGNMD parameter optimization time.

Table 5 reveals that traditional models (CNN, GRU, and LSTM) exhibit noticeable performance limitations in fault diagnosis tasks, with accuracy rates consistently below 92%. Regarding diagnostic efficiency, the reported time corresponds to the average testing time required for fault classification rather than model training time. Among the traditional models, GRU achieves a relatively favorable balance between diagnostic accuracy and testing efficiency, whereas LSTM requires slightly longer testing time without providing substantial improvements in diagnostic performance.

In contrast, the proposed BiTSF-Net model (M4) achieves a diagnostic accuracy of 98.00% while maintaining a relatively low testing time, demonstrating that the parallel fusion of BiTCN and BiLSTM effectively enhances feature representation capability without significantly increasing diagnostic latency. Building upon this architecture, the introduction of SAO optimization (M5) further improves the average diagnostic accuracy to 98.53%, indicating that parameter optimization enhances the robustness and stability of the fault diagnosis model.

By implementing the enhanced ISAO algorithm (M6), the proposed model achieved a maximum diagnostic accuracy of 100% and an average diagnostic accuracy of 99.63% across 20 independent experiments. The results indicate that the proposed framework maintains high diagnostic stability and strong generalization performance under repeated evaluations.

Overall, the proposed ISAO-BiTSF-Net model demonstrates significant advantages in accuracy, stability, and computational efficiency, validating the effectiveness and superiority of this approach for bearing fault diagnosis under complex operating conditions.

It should be noted that the CWRU dataset used in this study was collected under relatively controlled laboratory conditions with stable operating parameters and limited environmental interference. Therefore, the fault characteristics are more distinguishable than those in real industrial environments. In practical applications, variable operating conditions, strong background noise, and coupled interference may increase the difficulty of fault diagnosis. Although the proposed method achieved excellent performance on the benchmark dataset, further validation under real industrial scenarios is still required to evaluate its generalization capability and engineering applicability.

The results indicate that the design of BiTSF-Net and the optimization algorithms is key to the performance improvement, with ISAO exhibiting stronger global search and convergence capabilities than SAO. To further demonstrate ISAO’s convergence performance, it was tested against the SAO algorithm on four benchmark functions, as shown in Figure 10.

The convergence curves in Figure 10a–d show that ISAO outperforms SAO across all test functions, converging faster during the initial iterations and reaching lower final fitness values. Its convergence is also smoother, without significant oscillations or premature convergence, indicating that the algorithm strikes a better balance between global exploration and local optimization.

To further verify the contribution of the proposed ISAO optimization strategy and the BiTSF-Net architecture to bearing fault diagnosis performance, additional ablation experiments were conducted under identical experimental conditions. The comparative results are summarized in Table 6.

As shown in Table 6, the proposed BiTSF-Net achieves higher diagnostic accuracy than the conventional TCN-LSTM architecture, indicating that the parallel dual-branch structure can more effectively capture both local transient features and long-term temporal dependencies of bearing vibration signals. Compared with the original BiTSF-Net, the incorporation of SAO-based parameter optimization further improves the diagnostic performance, demonstrating the importance of adaptive VGNMD parameter optimization for modal decomposition quality and fault feature representation.

Furthermore, the proposed ISAO-BiTSF-Net framework achieves the highest optimal accuracy and average accuracy among all compared models. These results demonstrate that the proposed ISAO strategy can effectively enhance decomposition parameter optimization capability, thereby improving modal separation quality and providing more discriminative fault information for subsequent BiTSF-Net classification.

5. Conclusions and Further Research

5.1. Conclusions

To address the challenges of strong nonstationarity, noise interference, and insufficient adaptive feature extraction in rolling bearing vibration signals under complex operating conditions, this study proposes an intelligent fault diagnosis framework integrating ISAO-optimized VGNMD with BiTSF-Net. The proposed method combines adaptive signal decomposition, entropy-based modal selection, and multi-scale temporal feature learning to improve fault feature representation and diagnostic performance.

First, the proposed ISAO algorithm effectively enhances the parameter optimization capability of VGNMD by introducing Latin hypercube sampling and Tent chaotic mapping. Compared with conventional optimization strategies, ISAO improves population diversity, global exploration capability, and convergence stability, thereby enabling more accurate extraction of informative modal components from complex nonstationary signals.

Second, the minimum envelope entropy criterion enables adaptive selection of modal components containing significant fault-related impact information. Combined with multidimensional time-domain feature extraction, the proposed framework effectively suppresses redundant and noise-dominated information while improving feature representation capability.

Third, experimental results verified the effectiveness and superiority of the proposed framework. On the CWRU bearing dataset containing 10 fault categories, the proposed ISAO-VGNMD-BiTSF-Net method achieved a maximum diagnostic accuracy of 100% and an average diagnostic accuracy of 99.63%, outperforming comparative deep learning models such as CNN, GRU, LSTM, and conventional TCN-LSTM networks. These results confirm that the proposed method possesses excellent feature extraction capability, high diagnostic accuracy, and strong robustness for intelligent bearing fault diagnosis under nonstationary conditions.

From an industrial perspective, the proposed framework provides a promising solution for intelligent condition monitoring and predictive maintenance of rotating machinery. By effectively extracting fault-sensitive information from complex vibration signals, the method can potentially be applied to critical industrial equipment such as wind turbines, induction motors, gearboxes, railway traction systems, and power generation machinery. The high diagnostic accuracy and adaptive decomposition capability of the proposed framework may contribute to early fault detection, maintenance decision support, reduced unplanned downtime, and improved operational reliability in industrial environments.

It should also be noted that the present study was validated using the CWRU benchmark dataset collected under relatively controlled laboratory conditions. Although the proposed framework demonstrated excellent diagnostic performance, further validation using real industrial field data is still necessary to comprehensively evaluate its robustness and generalization capability under practical operating environments involving variable working conditions, strong background noise, sensor uncertainty, and coupled fault scenarios. Such investigations will further support the practical deployment of the proposed framework in industrial intelligent maintenance systems.

5.2. Further Research

Although the proposed method achieves excellent diagnostic performance on the benchmark dataset, several research directions deserve further investigation:

Future work will integrate vibration, acoustic, temperature, current, and other heterogeneous sensor signals to construct a multi-modal fault diagnosis framework, thereby improving diagnostic reliability and robustness under complex industrial conditions.
Future studies will explore domain adaptation and transfer learning techniques to enhance model adaptability under varying loads, rotational speeds, and environmental disturbances, thereby improving cross-condition diagnostic performance.
To facilitate practical industrial implementation, lightweight network architectures and edge-computing deployment strategies will be investigated to enable real-time fault monitoring and intelligent maintenance applications.
Although ISAO demonstrates strong optimization capability, hybrid optimization mechanisms combining multiple swarm intelligence algorithms and adaptive parameter control strategies can be further explored to improve convergence efficiency and solution accuracy in high-dimensional optimization problems.

Author Contributions

Conceptualization, J.L. and X.C.; methodology, H.L.; software, J.L.; validation, X.Z., X.H.; formal analysis, J.L.; investigation, X.C.; resources, H.L.; data curation, X.Z.; writing—original draft preparation, J.L.; writing—review and editing, X.C.; visualization, T.P.; supervision, X.C.; project administration, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Construction Project of Changdian Xinneng Pumped Storage Electrical Secondary Innovation Laboratory, China Yangtze Power Co., Ltd., grant number Z152401006.

Data Availability Statement

The data supporting the reported results are available from the corresponding author upon reasonable request.

Acknowledgments

Special thanks are given to the Longxu Youth Innovation Studio for its support.

Conflicts of Interest

Authors Xupeng Chen, Huiyin Li, Xu Zhang and Jianling Lai were employed by the company China Yangtze Power Renewables Co., Ltd. Authors Xin Hu and Tian Peng were employed by the company PowerChina HuaDong Engineering Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Luo, R.; Xie, H.; Wen, H.; He, H.; Li, Y.; Wang, K. Cross-Domain Bearing Fault Diagnosis Under Class Imbalance: A Dynamic Maximum Triple-View Classifier Discrepancy Network. Algorithms 2026, 19, 228.
2. Zou, Y.; Li, C.; Zhang, Y.; Si, Z.; Li, L. A Multimodal Three-Channel Bearing Fault Diagnosis Method Based on CNN Fusion Attention Mechanism Under Strong Noise Conditions. Algorithms 2026, 19, 144.
3. Zhang, Y.; Li, S.; Li, A.; Zhang, G.; Wu, M. Fault diagnosis method of belt conveyor idler based on sound signal. J. Mech. Sci. Technol. 2023, 37, 69–79.
4. Liang, P.; Deng, C.; Wu, J.; Yang, Z. Intelligent fault diagnosis of rotating machinery via wavelet transform, generative adversarial nets and convolutional neural network. Measurement 2020, 159, 107768.
5. Vijayakumar, K.; Kumar, B.D. An EEG-Driven Exoskeleton Rehabilitation Robot for Upper Limb Recovery Using Empirical Mode Decomposition: A Response Surface Methodology Approach. Russ. J. Phys. Chem. B 2026, 20, 304–313.
6. Lei, Y.; Lin, J.; He, Z.; Zuo, M.J. A review on empirical mode decomposition in fault diagnosis of rotating machinery. Mech. Syst. Signal Process. 2013, 35, 108–126.
7. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
8. Nazari, M.; Sakhaei, S.M. Successive variational mode decomposition. Signal Process. 2020, 174, 107610.
9. Wang, H.; Chen, S.; Zhai, W. Variational generalized nonlinear mode decomposition: Algorithm and applications. Mech. Syst. Signal Process. 2024, 206, 110913.
10. Ding, C.; Huang, X.; Wang, B.; Li, X.; Huang, W.; Zhu, Z. Low-rank informed adaptive chirp mode decomposition and its application in rotating machine fault diagnosis under varying speed conditions. IEEE Trans. Instrum. Meas. 2025, 74, 3524313.
11. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of ICNN’95-International Conference on Neural Networks; IEEE: New York, NY, USA, 1995; Volume 4, pp. 1942–1948.
12. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
13. Price, K.V. Differential evolution. In Handbook of Optimization: From Classical to Modern Approach; Springer: Berlin/Heidelberg, Germany, 2013; pp. 187–214.
14. Deng, L.; Liu, S. Snow ablation optimizer: A novel metaheuristic technique for numerical optimization and engineering design. Expert Syst. Appl. 2023, 225, 120069.
15. Lei, C.; Zhou, J.; Li, M.; Hao, D.; Li, X.; Li, C.; Feng, R. Rolling bearing fault diagnosis based on MDBO-SVMD. J. Mech. Sci. Technol. 2026, 40, 1541–1554.
16. Guo, Z.; Li, J.; Wang, T.; Xie, J.; Yang, J.; Niu, B. Dynamic-Constrained Digital Twin-Based Mechanical Diagnosis Framework under Undetermined States without Fault Data. IEEE Trans. Instrum. Meas. 2025, 74, 3547715.
17. Yongxing, Z.; Yuntian, T.; Ran, B.; Bo, T.; Zhengjie, L.; Yihong, Y.; Jingsong, X.; Zhibin, G. A cross-working-condition prediction method for bearing remaining useful life based on SPW-SVDD health indicators and temporal self-attention mechanism. Adv. Eng. Inform. 2026, 71, 104313.
18. Ji, C.; Zhang, C.; Suo, L.; Liu, Q.; Peng, T. Swarm intelligence based deep learning model via improved whale optimization algorithm and Bi-directional long short-term memory for fault diagnosis of chemical processes. ISA Trans. 2024, 147, 227–238.
19. Guo, L.; Li, N.; Jia, F.; Lei, Y.; Lin, J. A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 2017, 240, 98–109.
20. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
21. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2017; pp. 156–165.
22. Wang, J.; He, H.; Wang, Z.; Du, W.; Duan, N.; Zhang, Z. Application of optimized adaptive chirp mode decomposition method in chirp signal. Appl. Sci. 2020, 10, 3695.
23. Iordanis, I.; Koukouvinos, C.; Silou, I. On the efficacy of conditioned and progressive Latin hypercube sampling in supervised machine learning. Appl. Numer. Math. 2025, 208, 256–270.
24. Wang, Z.; Geng, Z.; Fang, X.; Tian, Q.; Lan, X.; Feng, J. The optimal and economic planning of a power system based on the microgrid concept with a modified seagull optimization algorithm integrating renewable resources. Appl. Sci. 2022, 12, 4743.
25. Wang, X.; Li, J.H.; Jing, Z.; Li, H.; Xing, Z.; Yang, Z.; Cao, L.; Zhou, X. Fault diagnosis method of rolling bearing based on SSA-VMD and RCMDE. Sci. Rep. 2024, 14, 30637.
26. Li, J.; Luo, W.; Bai, M.; Song, M. Fault diagnosis of high-speed rolling bearing in the whole life cycle based on improved grey wolf optimizer-least squares support vector machines. Digit. Signal Process. 2024, 145, 104345.
27. Yang, Y.; Liu, H.; Han, L.; Gao, P. A feature extraction method using VMD and improved envelope spectrum entropy for rolling bearing fault diagnosis. IEEE Sens. J. 2023, 23, 3848–3858.
28. Zhang, C.; Ma, H.; Hua, L.; Sun, W.; Nazir, M.S.; Peng, T. An evolutionary deep learning model based on TVFEMD, improved sine cosine algorithm, CNN and BiLSTM for wind speed prediction. Energy 2022, 254, 124250.
29. Smith, W.A.; Randall, R.B. Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mech. Syst. Signal Process. 2015, 64, 100–131.
30. Raj, K.K.; Kumar, S.; Kumar, R.R.; Andriollo, M. Enhanced fault detection in bearings using machine learning and raw accelerometer data: A case study using the Case Western Reserve University dataset. Information 2024, 15, 259.

Figure 1. Framework of VGNMD.

Figure 2. Flowchart of the ISAO algorithm.

Figure 3. The comparative architecture of BiTSF-Net and serial TCN-LSTM.

Figure 4. Flowchart of Diagnostic Process.

Figure 5. Experimental platform of Case Western Reserve University.

Figure 6. Line Chart of Envelope Entropy.

Figure 7. Plot of Optimal IMF.

Figure 8. Line Chart of Feature Data.

Figure 9. Plot of Diagnostic Results for M1–M6.

Figure 10. Comparison of Convergence Curves between ISAO and SAO.

Table 1. Model parameters of different fault states.

Status	Fault Diameter (Inches)
Rolling Element Fault	0.007/0.014/0.021
Inner Race Fault	0.007/0.014/0.021
Outer Race Fault	0.007/0.014/0.021
Normal Condition	None

Table 2. Parameter settings of bearing data.

Parameters	Settings
Load	0 HP
Bearing Model	SKF6205
Sampling Frequency	12 kHz
Rotational Speed	1797 rpm
Sampling Points	2048

Table 3. Composition of the experimental bearing fault states dataset.

Dataset Label	Bearing Fault Type	Fault Diameter (Inches)
1	Normal	0
2	Inner Race Fault	0.007
3	Rolling Element Fault	0.007
4	Outer Race Fault	0.007
5	Inner Race Fault	0.014
6	Rolling Element Fault	0.014
7	Outer Race Fault	0.014
8	Inner Race Fault	0.021
9	Rolling Element Fault	0.021
10	Outer Race Fault	0.021

Table 4. Comparison of Various Control Group Models.

Number	Model	Number	Model
M1	CNN	M4	BiTSF-Net
M2	GRU	M5	SAO-BiTSF-Net
M3	LSTM	M6	ISAO-BiTSF-Net

Table 5. Comparison of diagnostic results for the CWRU dataset.

Diagnostic Model	Testing Time	Optimal Accuracy Rate	Average Accuracy Rate
M1	2.04	86.67%	85.33%
M2	1.43	91.33%	90.79%
M3	2.44	88.67%	87.67%
M4	1.65	98.00%	97.33%
M5	2.57	98.53%	98.33%
M6	3.09	100.00%	99.63%

Table 6. Ablation analysis of different optimization strategies and network architectures.

Method	Testing Time	Optimal Accuracy Rate	Average Accuracy Rate
TCN-LSTM	1.51	97.12%	96.78%
BiTSF-Net	1.65	98.00%	97.33%
SAO-BiTSF-Net	2.57	98.33%	98.53%
ISAO-BiTSF-Net	3.09	100.00%	99.63%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
