Article

Enhancing Fault Diagnosis: A Hybrid Framework Integrating Improved SABO with VMD and Transformer–TELM

1 School of Big Data, Baoshan University, Baoshan 678000, China
2 College of Automobile and Traffic Engineering, Nanjing Forestry University, Nanjing 210037, China
3 Faculty of Information Engineering, Quzhou College of Technology, Quzhou 324000, China
* Author to whom correspondence should be addressed.
Lubricants 2025, 13(4), 155; https://doi.org/10.3390/lubricants13040155
Submission received: 6 March 2025 / Revised: 23 March 2025 / Accepted: 29 March 2025 / Published: 31 March 2025
(This article belongs to the Special Issue Tribological Characteristics of Bearing System, 3rd Edition)

Abstract

Rolling bearings, as core components in mechanical systems, directly influence the overall reliability of equipment. However, continuous operation under complex working conditions can easily lead to gradual performance degradation and sudden faults, which not only result in equipment failure but may also trigger a cascading failure effect, significantly amplifying downtime losses. To address this challenge, this study proposes an intelligent diagnostic method that integrates variational mode decomposition (VMD) optimized by the improved subtraction-average-based optimizer (ISABO) with transformer–twin extreme learning machine (Transformer–TELM) ensemble technology. Firstly, ISABO is employed to finely optimize the initialization parameters of VMD. With the improved initialization strategy and particle position update method, the optimal parameter combination can be precisely identified. Subsequently, the optimized parameters are used to model and decompose the signal through VMD, and the optimal signal components are selected through a constructed two-dimensional evaluation system. Furthermore, diversified time-domain features are extracted from these components to form an initial feature set. To deeply mine feature information, a multi-layer Transformer model is introduced to refine more discriminative feature representations. Finally, these features are input into the constructed TELM fault diagnosis model to achieve precise diagnosis of rolling bearing faults. The experimental results demonstrate that this method exhibits excellent performance in terms of noise resistance, accurate fault feature capture, and fault classification. Compared with traditional machine learning techniques such as kernel extreme learning machine (KELM), extreme learning machine (ELM), support vector machine (SVM), and Softmax, this method significantly outperforms other models in terms of accuracy, recall, and F1 score.

1. Introduction

In modern industrial systems, the stable operation of mechanical equipment is a crucial factor in ensuring production efficiency and product quality. Among these, rolling bearings, as important components for connecting and supporting various parts, play a vital role in the reliability and safety of the entire mechanical system. During long-term operation, rolling bearings are inevitably subjected to various types of faults due to factors such as complex and varying loads, deterioration of lubrication conditions, and material fatigue. If these faults are not detected and addressed promptly, they can not only lead to equipment shutdowns, causing significant economic losses, but they may also pose serious threats to the safety of operators. Therefore, the development of an efficient and accurate fault diagnosis method for rolling bearings is of great significance for improving equipment maintenance efficiency, reducing operating costs, and ensuring production safety.
Faults in rolling bearings are usually accompanied by changes in vibration signals, which contain rich fault information. By collecting and analyzing these vibration signals, fault features can be effectively extracted, thereby enabling the diagnosis of bearing faults. In the field of fault feature extraction, traditional time-frequency analysis methods such as Short-Time Fourier Transform (STFT) [1,2,3] and Wavelet Decomposition [4,5,6] are widely used but are limited by issues such as preset basis functions and poor adaptability to non-stationary signals. Adaptive decomposition methods such as Local Mean Decomposition (LMD) [7], Empirical Mode Decomposition (EMD) [8], and their derivative algorithms (e.g., Ensemble Empirical Mode Decomposition (EEMD) [9] and Complete Ensemble Empirical Mode Decomposition (CEEMD) [10]) effectively alleviate these problems through dynamic decomposition mechanisms. Zhao H et al. [11] introduced a methodology integrating dual interpolation with variational interval reconstruction for Local Mean Decomposition (LMD), aiming to augment the stability of the demodulation process and refine the precision of signal components. The outcomes indicate that the envelope spectrum derived from the proposed LMD method displays prominent fault frequencies with lower noise contamination. Sun Y et al. [12] proposed a bearing fault diagnosis method based on EMD and an improved Chebyshev distance. This method decomposes the signal using EMD and converts the retained IMFs into symmetric dot pattern (SDP) images. Subsequently, an improved Chebyshev distance is constructed to measure the difference between each IMF component and the average matrix, using the improved Chebyshev distance of IMF1 as a feature. Experimental tests verify that this method can effectively diagnose rolling bearing faults. Zhao Y et al. [13] proposed a fault diagnosis method based on multi-scale fuzzy entropy feature fusion for complex fault vibration signals in rolling bearings, which are difficult to extract features from. This method obtains IMF components through EEMD, calculates energy and kurtosis indicators, selects optimal IMF components, computes multi-scale fuzzy entropy for feature fusion, and then uses a Least Squares Support Vector Machine (LSSVM) for fault diagnosis. Experimental verification shows that this method can quantitatively represent fault signal data and improve anti-interference ability. Zhang L et al. [14] proposed a diagnosis method based on equipment operating principles and CEEMD for fault diagnosis of rotor components in large rotating machinery. Firstly, the rotor vibration displacement data are preprocessed, and then vibration data from effective and stable operation phases are selected for EMD and CEEMD analysis. Finally, dimensionless statistical indicators are extracted from IMFs for fault diagnosis. Experimental results show that the proposed method successfully achieves fault diagnosis of rotor components in large rotating machinery. However, modal mixing and endpoint effects in the above algorithms still limit their practical engineering applications.
In recent years, the variational mode decomposition (VMD) method proposed by Dragomiretskiy et al. [15] has demonstrated unique advantages in the field of signal processing as an emerging adaptive signal processing technique. By constructing and solving a variational problem, VMD decomposes the signal into a series of modal components with specific center frequencies and limited bandwidths, effectively avoiding issues such as modal mixing and endpoint effects and improving the stability and accuracy of signal decomposition. In the field of rolling bearing fault diagnosis, VMD has become a research hotspot due to its excellent performance. Ma Z et al. [16] proposed the RIME-VMD feature extraction method for rolling bearing fault diagnosis, optimizing VMD parameters through the RIME algorithm and reconstructing and denoising by selecting IMFs with the most significant fault features. Then, sample entropy is calculated as a fault feature and input into an SVM for operational diagnosis. Compared with VMD optimized by the Whale Optimization Algorithm, RIME-VMD has a higher search efficiency, enhances the robustness of fault detection, and achieves rapid identification. Chang B et al. [17] proposed a fault diagnosis method based on improved dung beetle optimization (DBO) algorithm-optimized VMD combined with Stacked Sparse Autoencoders (SSAE) for the non-stationary and nonlinear vibration signals of high-speed train axle box bearings. This method optimizes VMD parameters to solve issues such as mode mixing and constructs a feature set input into the SSAE model for training and testing, improving the accuracy of rolling bearing fault diagnosis. For early weak fault detection in rolling bearings affected by noise interference, Lv Q et al. [18] proposed the SCSSA-VMD-MCKD method. This method optimizes VMD and Maximum Correlated Kurtosis Deconvolution (MCKD) through the Sine–Cosine and Cauchy Mutation Sparrow Search Algorithm (SCSSA), aiming to leverage their advantages in noise reduction and highlighting fault frequencies. Experiments show that this method can effectively identify weak fault signals in bearing signals. However, the performance of the VMD algorithm largely depends on the selection of its parameters, including the number of modes and penalty factors, which directly affect the quality of decomposition results and the extraction effect of fault features. Therefore, how to optimize the parameters of the VMD algorithm to further enhance its performance in rolling bearing fault diagnosis has become a focus and difficulty of current research. Meanwhile, the core idea of signal processing methods based on non-stationary nonlinear time-frequency analysis is to decompose complex fault signals into multiple signal components, ranging from high to low frequencies, in order to more effectively extract fault features. In practical applications, the most representative key component is usually selected from these decomposed signal components according to specific rules. This key component often contains the most abundant fault information and can provide strong support for the modeling and classification of subsequent fault diagnosis models [19].
In the realm of fault state recognition, deep learning models have increasingly supplanted traditional machine learning methods due to their formidable nonlinear mapping capabilities. Convolutional neural networks (CNNs) have shown notable advantages in bearing fault classification tasks by utilizing local receptive fields and weight-sharing mechanisms [20,21,22,23,24]. However, traditional CNN architectures face two primary challenges when processing long temporal vibration signals. First, the local perception characteristic of convolutional operations limits the model’s ability to capture global temporal dependencies. Second, the concatenation of final fully connected layers with a Softmax classifier can reduce generalization performance for nonlinearly separable problems. Recently, the Transformer model [25,26], leveraging the dynamic weight allocation of its self-attention mechanism, has demonstrated exceptional performance in temporal signal modeling. By calculating correlation weights among sequence elements, it can adaptively focus on crucial fault feature regions, while its parallelized computational architecture significantly boosts training efficiency. Notably, the multi-head attention mechanism of the Transformer model can jointly mine time-frequency coupling features of signals from diverse subspaces, which offers a novel technical pathway for fault mode identification under complex operating conditions. Concurrently, the twin extreme learning machine (TELM) model [27], by amalgamating the structural risk minimization principle of twin support vector machines with the random feature mapping strengths of extreme learning machines, strikes a balance between computational efficiency and generalization performance at the classifier design level. How to construct a collaborative framework between Transformer and TELM and exploit their complementary advantages in feature extraction and classification decision-making is crucial for enhancing the performance of the diagnostic system.
In summary, this paper introduces a fault diagnosis approach that integrates an improved subtraction-average-based optimizer (ISABO) for optimizing the variational mode decomposition (VMD) algorithm with a Transformer-enhanced TELM model. The methodology begins with the establishment of a dual-index evaluation framework, which consists of the envelope entropy and envelope Gini coefficient. The ISABO algorithm is then used to precisely adjust the parameters of VMD, ensuring an optimal parameter combination. These optimized parameters are applied in the VMD algorithm to facilitate the adaptive decomposition of fault signals. Based on the dual-index evaluation framework, the optimal intrinsic mode functions (IMFs) are selected, and various time-domain features are extracted to form an initial feature set. To enhance diagnostic precision, a Transformer is employed for in-depth feature extraction and refinement of the initial feature set. The refined features are then input into the TELM fault diagnosis model. Finally, rigorous training and testing of the model are conducted to achieve accurate diagnosis and identification of rolling bearing faults.
The structure of this paper is outlined below: Section 2 delineates the theoretical foundations of VMD, ISABO, Transformer, and TELM; Section 3 exhaustively describes the overall architecture and critical technical implementation steps of the proposed method; Section 4 validates the effectiveness and advancement of the method through an array of comparative experiments; and Section 5 summarizes the research findings and anticipates future research directions.

2. Theoretical Background

2.1. VMD

VMD is an innovative non-recursive signal processing technique whose core concept is to use a specific variational framework to identify and separate the Amplitude-Modulated and Frequency-Modulated (AM-FM) components, referred to as intrinsic mode functions (IMFs), within a signal. During the iterative computation, VMD optimizes the center frequency and bandwidth of each IMF, so that the IMF components are adaptively separated according to their respective frequency-domain attributes. From a mathematical perspective, VMD can be formulated as the solution of an optimization problem.
$$\mu_k(t) = A_k(t)\cos\!\left[\phi_k(t)\right] \tag{1}$$
In the aforementioned formulation, the phase φ_k(t) is regarded as a non-decreasing function, while A_k(t) and ω_k(t) denote the envelope function and the instantaneous frequency, respectively.
$$\omega_k(t) = \phi_k'(t) = \frac{\mathrm{d}\phi_k(t)}{\mathrm{d}t} \tag{2}$$
where the signal μ_k(t) can be considered a harmonic signal with a clearly defined amplitude and frequency, denoted by A_k(t) and ω_k(t), respectively.
In the application of the VMD algorithm, signal X(t) is decomposed into multiple finite-bandwidth IMFs. This process entails applying the Hilbert transform to each IMF to obtain its analytic signal. Following this, these analytic signals undergo modulation to yield the modulated baseband signals. To assess the bandwidth of each IMF, the squared norm of the gradient of the shifted demodulated signal is computed. This step results in an expression that incorporates constraint conditions, which have a substantial influence on the decomposition performance of VMD.
$$\min_{\{\mu_k\},\{\omega_k\}}\ \left\{\sum_{k}\left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t)\right] e^{-j\omega_k t} \right\|_2^2\right\} \quad \text{s.t.}\quad \sum_{k}\mu_k(t) = x(t) \tag{3}$$
In this expression, the set of center frequencies and the unit impulse function are denoted by ω_k = {ω_1, ω_2, ω_3, …, ω_K} and δ(t), respectively, and ∂_t denotes the partial derivative with respect to time t. To construct the augmented Lagrangian, a penalty factor α and a Lagrangian multiplier λ(t) are introduced.
$$L\!\left(\{\mu_k\},\{\omega_k\},\lambda\right) = \alpha\sum_{k}\left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * \mu_k(t)\right] e^{-j\omega_k t} \right\|_2^2 + \left\| x(t) - \sum_{k}\mu_k(t) \right\|_2^2 + \left\langle \lambda(t),\, x(t) - \sum_{k}\mu_k(t) \right\rangle \tag{4}$$
In the solution procedure of VMD, the algorithm utilizes an iterative search method, whose primary goal is to identify the minimum point of the Lagrangian function L. Utilizing this strategy, the original input signal can be precisely decomposed into k distinct signal components.
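To make this procedure concrete, the following is a minimal NumPy sketch of the frequency-domain ADMM updates commonly used to solve Equation (4): each mode spectrum is refreshed by a Wiener-filter-like step, each center frequency by a power-weighted mean, and the multiplier by dual ascent. It is a simplified illustration (no signal mirroring or one-sided spectrum handling), and the function name, parameter defaults, and stopping rule are assumptions rather than the exact implementation of [15].

```python
import numpy as np

def vmd(x, K=4, alpha=2000, tau=0.1, tol=1e-7, max_iter=500):
    """Minimal VMD sketch: ADMM updates carried out in the frequency domain."""
    N = len(x)
    freqs = np.fft.fftfreq(N)              # normalized frequency axis
    X = np.fft.fft(x)                      # spectrum of the input signal
    U = np.zeros((K, N), dtype=complex)    # spectra of the K modes
    omega = np.linspace(0.05, 0.45, K)     # initial center frequencies
    lam = np.zeros(N, dtype=complex)       # Lagrangian multiplier (spectrum)

    for _ in range(max_iter):
        U_prev = U.copy()
        for k in range(K):
            residual = X - U.sum(axis=0) + U[k] + lam / 2
            # Wiener-filter-like mode update obtained by minimizing Eq. (4) w.r.t. u_k
            U[k] = residual / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # center frequency: power-weighted mean over the non-negative frequencies
            half = slice(0, N // 2)
            power = np.abs(U[k, half]) ** 2
            omega[k] = np.sum(freqs[half] * power) / (np.sum(power) + 1e-12)
        lam = lam + tau * (X - U.sum(axis=0))        # dual ascent on the equality constraint
        change = np.sum(np.abs(U - U_prev) ** 2) / N
        if change < tol:
            break

    imfs = np.real(np.fft.ifft(U, axis=1))           # modes back in the time domain
    return imfs, omega
```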

2.2. Improved Subtraction-Average-Based Optimizer

2.2.1. SABO Algorithm

The subtraction-average-based optimizer (SABO) algorithm [28,29], a newly introduced intelligent optimization algorithm in recent years, adjusts the positions of population members in the solution space by computing the subtraction averages among individuals, thereby enabling rapid convergence to the global optimal solution. This algorithm not only accelerates convergence speed but also augments its capability to strike a balance between global search and local exploitation, effectively averting the peril of becoming trapped in local optimal solutions. The optimization procedure is detailed as follows.
(1) In the initialization stage, a random initialization approach is adopted to create an initial population comprising N individuals within a d-dimensional solution space. The ith individual, Xi, within the population signifies the candidate solution in the optimization problem. The specific formula for the initialization stage is presented in Equation (5).
$$X_i = b_l + r_1 \cdot (b_u - b_l) \tag{5}$$
In the given equation, b l and b u denote the lower and upper boundaries of the solution space, respectively, whereas r 1 represents a sequence of random numbers within the range [0, 1].
Once the population initialization is concluded, SABO introduces a novel concept, termed the “−v” operation, when updating each individual within the population. This operation is designated as the v-subtraction of individual B from individual A. The precise formulation for this operation is presented in Equation (6).
$$A \mathrel{-_{v}} B = \operatorname{sign}\!\big(f(A) - f(B)\big)\times\big(A - v \odot B\big) \tag{6}$$
where v denotes a random number sequence of dimension d with values confined to the range [1, 2]; ⊙ denotes element-wise multiplication; f(·) signifies the objective function of the optimization problem at hand; and sign(·) is the signum function, which takes the value 1 when its argument is greater than 0, 0 when the argument equals 0, and −1 when the argument is less than 0.
Within SABO, the trajectory of optimization for each individual X i within the solution space is governed by the average outcome of its “−v” operation with individual X j . The iterative update formula for the ith individual X i during the (t + 1)th iteration is presented in Equation (7).
$$X_i^{t+1} = X_i^{t} + r_2 \cdot \frac{1}{N}\sum_{j=1}^{N}\left(X_i^{t} \mathrel{-_{v}} X_j^{t}\right) \tag{7}$$
where N signifies the aggregate count of individuals comprising SABO; and r 2 denotes a sequence of random variables adhering to a normal distribution characterized by a mean of 0 and a variance of 1.
After updating the individuals, similar to most optimization algorithms, we revise the ultimate position within the current iteration to ascertain the optimal solution. The revision is based on the objective function values corresponding to the (t + 1)th and tth iterations of the individuals. The explicit update formula is detailed in Equation (8).
$$X_i^{t+1} = \begin{cases} X_i^{t+1}, & f\!\left(X_i^{t+1}\right) \le f\!\left(X_i^{t}\right) \\ X_i^{t}, & f\!\left(X_i^{t+1}\right) > f\!\left(X_i^{t}\right) \end{cases} \tag{8}$$
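The following Python sketch illustrates Equations (5)-(8) directly: each individual takes a step given by a normally scaled average of its "−v" subtractions against the whole population, and the move is kept only if the objective improves. Function and variable names (sabo, n_pop, bounds) and the toy usage are illustrative, not the reference implementation of [28,29].

```python
import numpy as np

def sabo(objective, dim, bounds, n_pop=30, max_iter=200):
    """Minimal SABO sketch following Eqs. (5)-(8); minimization is assumed."""
    bl, bu = bounds
    rng = np.random.default_rng(0)
    X = bl + rng.random((n_pop, dim)) * (bu - bl)        # Eq. (5): random initialization
    fit = np.array([objective(x) for x in X])

    for _ in range(max_iter):
        for i in range(n_pop):
            # "-v" operation against every member, Eq. (6), averaged as in Eq. (7)
            v = 1 + rng.random((n_pop, dim))              # v drawn from [1, 2]
            sub = np.sign(fit[i] - fit)[:, None] * (X[i] - v * X)
            step = rng.normal(size=dim) * sub.mean(axis=0)
            X_new = np.clip(X[i] + step, bl, bu)
            f_new = objective(X_new)
            if f_new <= fit[i]:                           # greedy acceptance, Eq. (8)
                X[i], fit[i] = X_new, f_new

    best = int(np.argmin(fit))
    return X[best], fit[best]

# toy usage: minimize the sphere function
x_best, f_best = sabo(lambda x: np.sum(x ** 2), dim=5, bounds=(-10.0, 10.0))
```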

2.2.2. Improved Strategies for SABO

Chaotic maps, serving as innovative tools within optimization metaheuristic algorithms, markedly elevate the global search capabilities of these algorithms through their distinctive nonlinear dynamic attributes. Leveraging the intrinsic randomness of chaotic theory, these maps exhibit extreme sensitivity to initial conditions and unpredictability in their evolutionary trajectories, facilitating the generation of dynamic sequences characterized by high randomness and ergodicity. In contrast to traditional random parameters, pseudo-random numbers derived from chaotic maps present dual advantages in optimization algorithms; they effectively mitigate premature convergence by sustaining population diversity and guide the algorithm to conduct a comprehensive search across the solution space through their ergodicity. Addressing the enhancement requirements of the SABO algorithm concerning the uniformity of initial population distribution and diversity during iterations, this study incorporates chaotic sequences into the algorithm framework, supplanting conventional random parameters. Specifically, this study introduces the sinusoidal chaotic map. The generated chaotic sequences exhibit a quasi-uniform distribution within the [0, 1] interval, empowering the enhanced SABO algorithm to more adeptly circumvent the curse of dimensionality during the global exploration phase, thereby systematically augmenting the solution precision and robustness for intricate optimization challenges.
$$X_{n+1} = a \cdot X_n^{2}\,\sin(\pi X_n) \tag{9}$$
where X_n signifies the state at the n-th iteration, X_{n+1} indicates the state at the subsequent iteration, and X belongs to the interval [0, 1]. Within this paper, the value of a is set to 2.
Additionally, the SABO algorithm, which updates by utilizing the subtraction average of all particle positions rather than the global optimum value at each iteration, is particularly prone to becoming trapped in local optima when the initial particle positions are poorly initialized. To mitigate this issue, the present study introduces the following strategy: when there is no improvement in the particle’s fitness value at the current iteration, the Golden Sine Algorithm (Golden-SA) is adopted to update the particle’s position. This approach not only reduces the extra computational load of fitness evaluations but also utilizes the global search capabilities of the Golden Sine Algorithm to help the SABO algorithm escape from local optima. The position update equation for the Golden Sine Algorithm is provided below.
$$X_i^{d}(t+1) = X_i^{d}(t)\left|\sin(r_1)\right| + r_2\,\sin(r_1)\left|x_1 P^{d}(t) - x_2 X_i^{d}(t)\right| \tag{10}$$
X_i^d(t) represents the position of the i-th individual in dimension d during the t-th iteration; P^d(t) denotes the global optimal position during the t-th iteration; r1 and r2 are random numbers within the range [0, 2π]; and x1 and x2 are coefficients obtained through the golden section ratio, which narrow the search space and guide the current solution towards the optimum, thereby ensuring the convergence of the algorithm. The golden section ratio is τ = (√5 − 1)/2, with x1 = aτ + b(1 − τ) and x2 = a(1 − τ) + bτ. The initial values of a and b are set to −π and +π, respectively; a and b subsequently change with the variation in the objective value, and x1 and x2 are updated accordingly.
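A small sketch of the two improvements is given below: the sinusoidal map of Equation (9) replaces the uniform random numbers used at initialization, and the Golden-SA update of Equation (10) is applied to an individual whose fitness has stalled. The interval-shrinking schedule for a and b is omitted, and the helper names are assumptions.

```python
import numpy as np

def sinusoidal_map(n, a=2.0, x0=0.7):
    """Sinusoidal chaotic sequence, Eq. (9): x_{n+1} = a * x_n^2 * sin(pi * x_n)."""
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = a * x ** 2 * np.sin(np.pi * x)
        seq[i] = x
    return seq

def chaotic_init(n_pop, dim, bl, bu):
    """Replace the uniform r1 of Eq. (5) with a chaotic sequence for the initial population."""
    r = sinusoidal_map(n_pop * dim).reshape(n_pop, dim)
    return bl + r * (bu - bl)

def golden_sine_update(x_i, p_best, a=-np.pi, b=np.pi, rng=None):
    """Golden-SA position update, Eq. (10), used when a SABO step brings no improvement."""
    rng = rng or np.random.default_rng()
    tau = (np.sqrt(5) - 1) / 2                     # golden section ratio
    x1 = a * tau + b * (1 - tau)
    x2 = a * (1 - tau) + b * tau
    r1 = rng.uniform(0, 2 * np.pi, size=np.shape(x_i))
    r2 = rng.uniform(0, 2 * np.pi, size=np.shape(x_i))
    return x_i * np.abs(np.sin(r1)) + r2 * np.sin(r1) * np.abs(x1 * p_best - x2 * x_i)
```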

2.3. Transformer Model

Initially introduced by the Google team in 2017, the Transformer model boasts a core advantage in its distinctive self-attention mechanism, which effectively captures the interdependencies among diverse elements within a sequence. This mechanism obviates the need for the gradual iterative information propagation process required by traditional models, thereby significantly reducing the risks of gradient vanishing and gradient exploding. Furthermore, the Transformer model incorporates residual connections and layer normalization techniques, which further mitigate the gradient issues commonly encountered during the training of deep neural networks. By introducing a multi-head attention mechanism, the model is capable of capturing correlations among features from multiple dimensions, successfully overcoming the limitations of recurrent neural networks in terms of parallel computation. The Transformer model has demonstrated exceptional performance in various fields, including natural language processing, machine translation, and sequence prediction. The structural diagram of the Transformer is shown in Figure 1.
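As a minimal illustration of the mechanism described above, the NumPy sketch below computes multi-head scaled dot-product self-attention: correlation weights between all sequence positions are obtained from a softmax over query-key products and used to re-weight the value vectors. The weight matrices are assumed to be given; residual connections, layer normalization, and feed-forward sublayers are omitted.

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, n_heads):
    """Self-attention over a feature sequence X of shape (T, d_model).
    Wq, Wk, Wv are assumed to be (d_model, d_model); heads are formed by reshaping."""
    T, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    split = lambda M: M.reshape(T, n_heads, d_head).transpose(1, 0, 2)  # (heads, T, d_head)
    Q, K, V = split(Q), split(K), split(V)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # pairwise correlation weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over key positions
    out = weights @ V                                       # weighted sum of value vectors
    return out.transpose(1, 0, 2).reshape(T, d_model)       # concatenate the heads
```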

2.4. TELM Model

Extreme learning machine (ELM), an innovative network learning algorithm proposed by Huang et al. [30], has attracted widespread attention in the field of machine learning due to its unique design principles. The core mechanism of this algorithm lies in its ability to randomly set the network input weights and biases of hidden-layer nodes, thereby directly deriving the output weights without the iterative adjustment process required by traditional methods. This characteristic not only significantly enhances the learning speed of the algorithm but also effectively reduces the risk of falling into local optima, thus improving the generalization performance of the network. In terms of its theoretical framework, the ELM algorithm cleverly combines the advantages of randomness and single hidden layer feedforward neural networks (SLFNs). By randomly initializing weights and biases, the algorithm can quickly locate and explore the solution space of the problem. The determination of output weights is achieved by solving a system of linear equations, a process that is both concise and efficient. Additionally, due to the randomness of input weights and biases, ELM demonstrates remarkable ability in avoiding overfitting, further consolidating its position in the field of machine learning.
When constructing the neural network, we first set the input vectors {x_i | i = 1, 2, …, n} and the output vectors {y_j | j = 1, 2, …, m}, and determine the number of hidden-layer neurons l as well as the chosen activation function g(x). Referring to the schematic diagram in Figure 2, the output matrix Y of the network can be derived as shown in Equation (13).
$$Y = \left[y_1, y_2, \ldots, y_j\right]_{n\times m} = \left[\sum_{i=1,\,j=1}^{l,\,m}\beta_{ij}\, g\!\left(w_{ij} x_i + b_i\right)\right]_{n\times m} = H\beta \tag{13}$$
Within the framework of ELM, H plays a crucial role, as it represents the output matrix of the hidden layer. A particularly notable characteristic of ELM is that when the activation function g(x) satisfies the condition of being infinitely differentiable, its connection weights w and hidden-layer thresholds b can be randomly selected prior to the training phase and remain unchanged throughout the entire training process. This design significantly simplifies the computational steps and effectively reduces the computational scale, thereby accelerating the learning process. Additionally, the connection weights β between the hidden layer and the output layer are obtained by solving a specific system of linear equations. It is noteworthy that the random selection mechanism of ELM not only enhances computational efficiency but also grants the model excellent generalization ability.
$$\min_{\beta}\left\| H\beta - Y \right\|$$
$$\hat{\beta} = H^{+}Y$$
In the expression above, H+ denotes the Moore–Penrose pseudoinverse of the hidden-layer output matrix H. Next, based on the core theory of ELM, the process of constructing its model is elaborated in detail:
The first step is initialization. During this phase, the connection weights w and the thresholds b of the hidden-layer neurons are randomly set.
Subsequently, the number of hidden-layer neurons needs to be determined. Specifically, the number of hidden-layer neurons l in ELM is set to twice the number of input-layer neurons n plus one, mathematically expressed as l = 2 × n + 1.
Then, selecting an appropriate activation function g(x) is crucial, as it will affect the model’s nonlinear mapping capability.
The final step involves solving for the weights of the output layer. This typically involves solving a system of linear equations to derive the output-layer weights.
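The four steps above can be condensed into a few lines; the sketch below randomly draws the input weights and biases, forms the hidden-layer output matrix H, and obtains the output weights from the Moore-Penrose pseudoinverse, using the l = 2n + 1 sizing rule quoted in the text. Function names and the tanh activation are illustrative choices, not the authors' code.

```python
import numpy as np

def train_elm(X, Y, l=None, g=np.tanh, seed=0):
    """Minimal ELM sketch: random hidden layer, output weights from beta = H+ Y.
    X: (n_samples, n_features); Y: (n_samples, n_outputs) one-hot targets."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    l = l if l is not None else 2 * n_features + 1   # hidden size rule used in the text
    W = rng.standard_normal((n_features, l))          # random input weights, fixed afterwards
    b = rng.standard_normal(l)                        # random hidden biases, fixed afterwards
    H = g(X @ W + b)                                  # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ Y                      # Moore-Penrose solution of H beta = Y
    return W, b, beta

def predict_elm(X, W, b, beta, g=np.tanh):
    """Network output for new samples."""
    return g(X @ W + b) @ beta
```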
The TELM algorithm represents an innovative improvement on the ELM algorithm and is also designed as an efficient binary classifier. Unlike the traditional ELM algorithm, the TELM algorithm employs two non-parallel hyperplanes for classification, rather than a single hyperplane. This enhancement enables the TELM algorithm to determine the optimal positions of these two non-parallel hyperplanes by solving two smaller-scale Quadratic Programming Problems (QPPs). In the TELM algorithm, we denote U and V as the data matrices corresponding to hidden-layer node outputs labeled as +1 and −1, respectively.
$$f_1(x) = \beta_1 \cdot h(x) = 0, \qquad f_2(x) = \beta_2 \cdot h(x) = 0$$
The TELM algorithm searches for these two non-parallel hyperplanes in the real number field R. It positions them as close as possible to the data points of their respective classes while maximizing their distance from the data points of the other class. The classification of a new data point (i.e., assignment to the +1 class or −1 class) depends on its relative distance to these two non-parallel hyperplanes. The principle behind the implementation of the linear TELM algorithm lies in solving two Quadratic Programming Problems (QPPs). In these two QPPs, the objective function of one problem corresponds to one class (e.g., the +1 class), with the constraints corresponding to the other class (e.g., the −1 class). Conversely, the setup for the other QPP is the opposite.
$$\min_{\beta_1,\,\xi}\ \frac{1}{2}\left\| U\beta_1 \right\|_2^2 + c_1 e_2^{T}\xi \quad \text{s.t.}\quad -\left(V\beta_1\right) + \xi \ge e_2,\ \ \xi \ge 0$$
$$\min_{\beta_2,\,\eta}\ \frac{1}{2}\left\| V\beta_2 \right\|_2^2 + c_2 e_1^{T}\eta \quad \text{s.t.}\quad \left(U\beta_2\right) + \eta \ge e_1,\ \ \eta \ge 0$$
In these two Quadratic Programming Problems (QPPs), the error vectors ξ and η correspond to the training patterns of class −1 and class +1, respectively. The parameters c1 and c2 are positive weighting factors that balance the objective function against the constraints, and e1 and e2 are vectors of ones. The Wolfe dual problems of the primal problems can be obtained as follows:
$$\max_{\alpha}\ e_2^{T}\alpha - \frac{1}{2}\alpha^{T} V\left(U^{T}U + \varepsilon I\right)^{-1} V^{T}\alpha \quad \text{s.t.}\quad 0 \le \alpha_i \le c_1,\ \ i = 1, 2, \ldots, m_2$$
$$\max_{\gamma}\ e_1^{T}\gamma - \frac{1}{2}\gamma^{T} U\left(V^{T}V + \varepsilon I\right)^{-1} U^{T}\gamma \quad \text{s.t.}\quad 0 \le \gamma_i \le c_2,\ \ i = 1, 2, \ldots, m_1$$
Through the above two formulas, we can obtain the optimal values of the Lagrange multipliers α and γ , and the decision variables β 1 and β 2 can be calculated as follows:
$$\beta_1 = -\left(U^{T}U + \varepsilon I\right)^{-1} V^{T}\alpha, \qquad \beta_2 = \left(V^{T}V + \varepsilon I\right)^{-1} U^{T}\gamma$$
A new data point x ∈ R^n is assigned to class r (r = +1 or −1) according to the following decision rule:
$$f(x) = \arg\min_{r=1,2} d_r(x) = \arg\min_{r=1,2}\left| \beta_r^{T} h(x) \right|$$
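A compact sketch of the TELM training and decision steps is given below. It forms the two regularized Gram matrices, solves each box-constrained dual with a general-purpose bounded optimizer (a dedicated QP solver could equally be used), recovers β1 and β2 with the standard twin-SVM sign convention, and assigns new samples to the nearer hyperplane. All helper names are assumptions, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(M, c):
    """Box-constrained dual  max_a  e^T a - 0.5 a^T M a,  0 <= a_i <= c,
    solved here with L-BFGS-B on the negative objective."""
    m = M.shape[0]
    fun = lambda a: -(a.sum() - 0.5 * a @ M @ a)
    jac = lambda a: -(np.ones(m) - M @ a)
    res = minimize(fun, x0=np.full(m, 0.5 * c), jac=jac,
                   bounds=[(0.0, c)] * m, method="L-BFGS-B")
    return res.x

def train_telm(U, V, c1=1.0, c2=1.0, eps=1e-3):
    """U, V: hidden-layer outputs h(x) of the +1 and -1 class samples, respectively."""
    l = U.shape[1]
    GU = np.linalg.inv(U.T @ U + eps * np.eye(l))
    GV = np.linalg.inv(V.T @ V + eps * np.eye(l))
    alpha = solve_dual(V @ GU @ V.T, c1)      # dual of the first QPP
    gamma = solve_dual(U @ GV @ U.T, c2)      # dual of the second QPP
    beta1 = -GU @ V.T @ alpha                 # hyperplane fitted to the +1 class
    beta2 = GV @ U.T @ gamma                  # hyperplane fitted to the -1 class
    return beta1, beta2

def predict_telm(H, beta1, beta2):
    """Assign each row of H (hidden-layer output of a new sample) to the nearer hyperplane."""
    d1 = np.abs(H @ beta1)
    d2 = np.abs(H @ beta2)
    return np.where(d1 <= d2, +1, -1)
```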

3. Detailed Implementation Procedure of the Proposed Fault Diagnosis Method

3.1. Optimization of VMD Parameters Using the Improved SABO

In the vast field of signal processing techniques, the variational mode decomposition (VMD) algorithm occupies a pivotal position due to its unique advantages in the adaptive decomposition of signals. The core implementation elements of this algorithm focus on the intelligent optimization of modal parameters, which directly influences the effectiveness of subsequent feature extraction and the reliability of diagnostic decisions. Previous studies have confirmed that the decomposition number k and penalty factor a are the two key variables determining the demodulation accuracy, and an arbitrary configuration can lead to significantly degraded results. Specifically, the value of k profoundly affects the granularity of the VMD decomposition. When k is set too small, signals may not be adequately decomposed, and some critical signal components may not be effectively identified and separated. Conversely, when k is set too large, it may lead to over-decomposition, causing signal components to be unnecessarily split, which increases the complexity of subsequent processing. Additionally, the value of a has a significant regulatory effect on the bandwidth of the decomposed signal components. Setting a too small can result in excessively large bandwidths of the decomposed signal components, easily mixing in other interfering components and affecting the purity of the decomposition. Setting a too large may cause the bandwidths of the decomposed signal components to be too small, resulting in some useful signal components being omitted during the decomposition process, leading to incomplete information. Therefore, when using the VMD algorithm, it is necessary to carefully and thoughtfully select the decomposition parameters to ensure that the signal can be accurately and reasonably decomposed. To further optimize the parameter settings of VMD, this study introduces an improved SABO algorithm, namely the ISABO algorithm.
During the optimization process of the ISABO algorithm, the selection of the fitness function is particularly crucial. The fitness function serves as an important benchmark for measuring the search direction and convergence speed of the optimization algorithm. Given that bearing fault signals exhibit unique characteristics in both the time and frequency domains, this study proposes a new design approach for the fitness function to more accurately assess the effectiveness of the VMD decomposition algorithm and effectively guide the ISABO algorithm to find the optimal results. This approach combines the envelope entropy and envelope Gini coefficient, which can comprehensively and deeply consider the time-domain impulsiveness and frequency-domain cyclostationarity of signals, thereby more effectively extracting fault features. In the specific implementation, this study adopts corresponding calculation methods to compute the envelope entropy E and the envelope spectrum Gini coefficient, which can be expressed as follows:
$$E = -\sum_{j=1}^{N} e_j\,\lg e_j, \qquad e_j = a_j \Big/ \sum_{j=1}^{N} a_j$$
When calculating the envelope entropy, the first step is to perform Hilbert envelope demodulation on each signal sample x(j) to obtain the envelope signal a(j). Subsequently, a(j) is normalized, and based on the principles of information entropy calculation, the envelope entropy E is derived.
$$E_{Gini} = 1 - 2\sum_{n=1}^{N}\frac{E_{(n)}}{\left\| E \right\|_1}\left(\frac{N - n + 1/2}{N}\right), \qquad Gini = 1 - 2\sum_{n=1}^{N}\frac{x_{(n)}}{\left\| x \right\|_1}\left(\frac{N - n + 1/2}{N}\right)$$
In the above equation, ‖x‖_1 represents the L1 norm of the signal x, E denotes the envelope of the signal x, and x_(n) and E_(n) denote the signal and envelope samples sorted in ascending order of magnitude.
In the field of non-stationary signal representation analysis, entropy evaluation methods based on envelope analysis play a crucial role in condition monitoring of rotating machinery due to their ability to effectively reflect the sparsity characteristics of signals. When VMD is employed to process vibration signals, if the resulting intrinsic mode functions (IMFs) contain significant background noise interference, the sparsity of these components will degrade notably, leading to a corresponding increase in the calculated envelope entropy. Conversely, when the IMFs obtained from decomposition contain abundant fault characteristic information, such signal components tend to exhibit better sparsity distribution, resulting in a significant reduction in the entropy index. It is worth noting that the Gini index analysis method, originally from the field of economics, has been extended to the evaluation of time-frequency features. However, when the traditional Gini index is directly applied to untreated raw signals, its ability to identify periodic transient impulse waveforms caused by mechanical damage is limited. To address this, this study proposes a method for constructing an envelope Gini coefficient index by refining the Hilbert spectral envelope of the signal. This approach integrates time-domain characteristics and modulation information features while maintaining the original evaluation framework of the Gini coefficient. This method not only accurately captures the characteristics of instantaneous pulse energy distribution but also effectively quantifies the concentration of fault information in each decomposed component. In summary, this study establishes a comprehensive index evaluation system. By using the reciprocal of the ratio of the above two parameters as an optimized benchmark, the combination of decomposition parameters corresponding to the minimum value of the fitness function ensures the optimal signal decomposition result that best highlights the fault characteristics. The calculation method is as follows:
$$E_{EGini} = \frac{E}{E_{Gini}}$$
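The two indicators and their ratio can be computed directly from the Hilbert envelope; the sketch below does so and also applies the Section 3.2 selection rule of keeping the IMF with the smallest ratio. The small constant inside the logarithm and the use of SciPy's Hilbert transform are implementation assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def envelope(x):
    """Hilbert envelope of a 1-D signal."""
    return np.abs(hilbert(x))

def envelope_entropy(x):
    """Envelope entropy E: normalize the envelope to a distribution and take its entropy."""
    a = envelope(x)
    e = a / a.sum()
    return -np.sum(e * np.log10(e + 1e-12))

def envelope_gini(x):
    """Gini coefficient of the envelope (larger = sparser); samples sorted ascending."""
    E = np.sort(envelope(x))
    N = len(E)
    n = np.arange(1, N + 1)
    return 1 - 2 * np.sum((E / E.sum()) * ((N - n + 0.5) / N))

def fitness(imf):
    """Dual-index criterion used to score a candidate IMF: smaller is better."""
    return envelope_entropy(imf) / envelope_gini(imf)

def best_imf(imfs):
    """Select the IMF with the smallest entropy-to-Gini ratio (Section 3.2 criterion)."""
    scores = [fitness(u) for u in imfs]
    return int(np.argmin(scores)), scores
```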
The specific steps of optimizing the VMD using the ISABO algorithm are as follows (a minimal code sketch of this search loop is given after the list). Meanwhile, the framework of the ISABO-VMD algorithm is illustrated in Figure 3.
(1)
Algorithm initialization phase: During this phase, the basic operational parameters of the ISABO optimizer are meticulously set, including the population size and the maximum number of iterations. Meanwhile, based on the specific problem, the feasible solution space for the key VMD parameters (such as the number of decomposition layers and the penalty factor) is defined. Multiple sets of parameter combinations within the preset ranges are randomly generated to form the initial population. Each individual in the initial population represents a potential solution to the VMD parameter optimization problem.
(2)
Fitness evaluation construction: A dual-indicator fusion fitness evaluation system is established, which uses the ratio of the envelope entropy to the envelope Gini coefficient of the VMD decomposition results as the performance criterion for parameter optimization. This criterion quantifies the quality of signal decomposition under different parameter combinations. By calculating the ratio of these two indicators, the effectiveness of VMD decomposition under various parameter combinations can be comprehensively evaluated, with a smaller ratio indicating higher decomposition quality.
(3)
Population position iterative update: Based on the optimization mechanism of the ISABO algorithm, a dynamic adjustment strategy is constructed by calculating the subtraction average among population individuals to drive their positions towards advantageous regions in the search space. In each iteration, for each individual in the population, its subtraction average relative to other individuals is first calculated. This involves subtracting the parameter values of other individuals from the target individual’s parameter values and then taking the average of these differences. Subsequently, based on the subtraction average and the current individual’s fitness value, the update strategy for the individual’s position is determined. Individuals with poor fitness are more likely to be influenced by the subtraction average, resulting in larger positional adjustments to explore new parameter space regions. Conversely, individuals with good fitness may undergo smaller positional adjustments for refined searches within the current region. As iterations proceed, the population gradually converges towards regions of high-quality solutions.
(4)
Convergence condition discrimination: When the algorithm reaches the pre-set maximum number of iterations, the search is forcibly terminated regardless of the current population state. This ensures that the algorithm does not run indefinitely. Subsequently, the individual with the optimal fitness in the current population is locked in as the approximate global optimal VMD parameter combination.
(5)
Parameter application: The optimized decomposition parameters (i.e., the number of decomposition layers and the penalty factor) are imported into the VMD system to guide the decomposition of input signals. By reconstructing the signal based on the optimized parameters, more accurate and efficient signal decomposition results can be obtained, thereby constructing a more precise and efficient algorithm model.
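A skeleton of this search loop is sketched below. It wraps an optimizer over the two-dimensional (k, a) space, using the search ranges, population size of 30, and 15 iterations reported in Section 4.2, and scores each candidate by the smallest envelope-entropy-to-envelope-Gini ratio among the decomposed IMFs. The names vmd, fitness, and sabo refer to the illustrative sketches given earlier, and the aggregation over IMFs is an assumption, since the paper does not spell it out.

```python
import numpy as np

# Candidate space for (k, alpha), taken from the experimental settings in Section 4.2.
K_RANGE = (3, 10)
ALPHA_RANGE = (100, 2500)

def vmd_fitness(params, signal):
    """Fitness of one (k, alpha) pair: decompose, score every IMF, keep the best ratio."""
    k = int(round(params[0]))
    alpha = float(params[1])
    imfs, _ = vmd(signal, K=k, alpha=alpha)          # VMD sketch defined earlier
    return min(fitness(u) for u in imfs)             # dual-index criterion defined earlier

def optimize_vmd_params(signal, n_pop=30, max_iter=15):
    """Wrap the (I)SABO search around the fitness function to pick k and alpha."""
    bl = np.array([K_RANGE[0], ALPHA_RANGE[0]], dtype=float)
    bu = np.array([K_RANGE[1], ALPHA_RANGE[1]], dtype=float)
    best_x, best_f = sabo(lambda p: vmd_fitness(p, signal), dim=2,
                          bounds=(bl, bu), n_pop=n_pop, max_iter=max_iter)
    return int(round(best_x[0])), float(best_x[1]), best_f
```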

3.2. Selection Criteria for IMF Signal Components

In the analysis of bearing vibration signals processed by VMD, the generated signal components may contain redundant modes or false information unrelated to the core fault. Neglecting the selection of valid components may lead to a decrease in the accuracy of subsequent fault feature extraction. This paper proposes a dual-index joint decision strategy, utilizing envelope entropy to accurately quantify the sparsity characteristic of fault features within intrinsic mode functions (IMFs). When an IMF component contains dominant fault impact components, its corresponding envelope entropy value decreases significantly, reflecting a trend of concentration in dynamic features. Meanwhile, the envelope Gini coefficient is combined to measure the feature set, which can sensitively capture the differentiated characteristics of IMF components in the energy–time domain distribution, especially the periodic intensity variations caused by impact events. Based on this, a dual-dimensional evaluation system consisting of envelope entropy and envelope Gini coefficient is established. The specific screening mechanism is as follows: the parameter ratio of the two indices for each IMF component is calculated; a smaller ratio indicates that the component simultaneously meets the preferred conditions of “prominent feature sparsity” and “significant energy distribution difference.”

3.3. Transformer–TELM Model Structure

The fault detection system designed in this study aims to fully exploit the synergistic effects of Transformer and TELM, with the former dedicated to extracting high-dimensional features and the latter to achieving efficient classification. Specifically, the Transformer network employs a multi-layer architecture, comprising a sequence input layer, a positional embedding layer, residual connection layers, two levels of self-attention layers, and a feature compression layer. The sequence input layer accepts feature vectors of variable dimensions and maps them into an initial sequence representation. Subsequently, the positional embedding layer adds positional information with a maximum positional encoding length of 16 to the input sequence through a learnable embedding matrix, capturing the relative positional characteristics of the temporal signals. The positional embeddings are merged with the input features via an addition layer, forming residual connections that enhance information propagation. Next, the model includes two self-attention layers, utilizing causal self-attention and regular self-attention mechanisms, respectively. The causal attention ensures that the model only focuses on information from the current and previous time steps through a masking mechanism, while the regular attention is used for global feature interaction. Each attention layer contains four processing units, with a key vector dimension of 64 per unit, totaling 256 dimensions, to enhance the model’s parallel processing capabilities. Following this, the feature compression layer reduces the high-dimensional attention output to 20 dimensions through a fully connected layer. After inhibiting overfitting via a Dropout layer, the temporal index layer extracts the features at the end of the sequence as the global representation. Finally, this representation is mapped to a four-dimensional fault category space through another fully connected layer, completing end-to-end feature extraction. The output is then converted into a probability distribution via a Softmax layer, and the classification output layer generates category labels. After training, features are extracted from the second fully connected layer and input into TELM for fault classification. The framework of the Transformer-TELM algorithm is illustrated in Figure 4.
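A rough PyTorch rendering of this extractor is sketched below: a linear input projection, a learnable positional embedding of maximum length 16 added residually, a causal and a regular multi-head attention block (4 heads, 64-dimensional keys, 256 dimensions in total), compression to 20 features, dropout, a last-time-step readout, and a 4-class head. Layer ordering details such as normalization are simplified assumptions; the 20-dimensional features returned with return_features=True are what would be handed to TELM.

```python
import torch
import torch.nn as nn

class TransformerFeatureExtractor(nn.Module):
    """Approximation of the extractor described above; d_model = 4 heads x 64 = 256."""
    def __init__(self, in_dim, d_model=256, n_heads=4, max_len=16, n_classes=4, p_drop=0.2):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)            # sequence input -> initial representation
        self.pos = nn.Embedding(max_len, d_model)          # learnable positional embedding
        self.attn1 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # causal block
        self.attn2 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # regular block
        self.compress = nn.Linear(d_model, 20)              # feature compression to 20 dimensions
        self.drop = nn.Dropout(p_drop)
        self.head = nn.Linear(20, n_classes)                 # 4 fault categories (logits)

    def forward(self, x, return_features=False):
        # x: (batch, seq_len, in_dim) with seq_len <= max_len
        B, T, _ = x.shape
        h = self.proj(x) + self.pos(torch.arange(T, device=x.device))  # add positions residually
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        a1, _ = self.attn1(h, h, h, attn_mask=causal)        # causal self-attention
        h = h + a1
        a2, _ = self.attn2(h, h, h)                           # global self-attention
        h = h + a2
        feat = self.drop(self.compress(h[:, -1, :]))          # last time step as global representation
        logits = self.head(feat)
        return feat if return_features else logits            # 20-d features feed the TELM classifier
```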

3.4. Algorithm Steps and Flow

The detailed implementation steps for the fault diagnosis method combining VMD optimized by the ISABO algorithm with the Transformer–TELM model are as follows:
(1)
Fine-tuning of VMD Parameters: To explore the optimal configuration of parameters in the VMD algorithm, the ISABO algorithm is introduced to perform fine-tuning of the parameters. The core of this process lies in continuously iterating and optimizing to achieve the best performance of the VMD algorithm, thereby laying a solid foundation for subsequent signal decomposition tasks.
(2)
VMD Signal Decomposition and Key Component Identification: Taking the original signal of the rolling bearing as input, the VMD algorithm optimized by the ISABO algorithm is utilized to perform signal decomposition. This step generates multiple signal components. Subsequently, a screening strategy proposed in this paper is adopted, which involves calculating the ratio of the envelope entropy to the envelope Gini coefficient for each signal component and selecting the component with the smallest ratio as the key signal component.
(3)
Calculation and Extraction of Preliminary Features: Based on the identified key signal component, a series of statistical feature quantities, such as the mean, variance, peak value, and kurtosis, are further calculated. These feature quantities, as intuitive reflections of the basic statistical characteristics of the signal, provide the necessary basis and support for subsequent deeper feature mining and learning (a brief sketch of these computations is given after this list).
(4)
Mining and Learning of Deep Features: To deeply explore the complex features hidden in the signal, the Transformer model is introduced as a feature extractor. With its powerful sequence processing capability, the Transformer model can automatically capture complex patterns and regularities in the data, thereby extracting richer, more effective, and discriminative feature representations.
(5)
Identification of Rolling Bearing Faults: The deep features extracted by the Transformer model are input into the TELM model for fault diagnosis of rolling bearings. Through rigorous model training and validation processes, accurate identification and classification of rolling bearing fault types can be achieved.
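For step (3), the sketch below computes a nine-dimensional time-domain feature vector for one selected IMF segment. The paper names the mean, variance, peak value, kurtosis, root mean square, and peak (crest) factor; the remaining three used here (skewness, impulse factor, and shape factor) are common choices and should be read as assumptions.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def time_domain_features(x):
    """Nine time-domain statistics of one IMF segment (the last three names are assumed)."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mean_abs = np.mean(np.abs(x))
    return np.array([
        np.mean(x),                    # mean
        np.var(x),                     # variance
        peak,                          # peak value
        kurtosis(x, fisher=False),     # kurtosis (fourth standardized moment)
        rms,                           # root mean square
        peak / rms,                    # crest (peak) factor
        skew(x),                       # skewness (assumed)
        peak / mean_abs,               # impulse factor (assumed)
        rms / mean_abs,                # shape factor (assumed)
    ])
```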

4. Experimental Verification

4.1. Analysis of Optimization Performance of ISABO

To evaluate the effectiveness of the ISABO algorithm in the field of optimization, this study designed comparative experiments. In these experiments, the ISABO algorithm was compared with several well-known optimization algorithms, including the traditional SABO, as well as the Grey Wolf Optimizer (GWO), Golden Jackal Optimization (GJO), Multi-verse Optimization (MVO), and dung beetle optimizer (DBO) [31,32,33,34]. The benchmark set selected for the experiments was CEC2005, from which six challenging test functions were meticulously chosen for evaluating algorithm performance. The CEC2005 test suite, released by the IEEE (Institute of Electrical and Electronics Engineers) Congress on Evolutionary Computation (CEC) in 2005, is a standard collection of test functions that is widely used to assess and compare the performance of optimization algorithms, particularly evolutionary algorithms.
To quantify the performance of each algorithm in terms of convergence speed and accuracy, the experiments conducted an in-depth analysis of their convergence curves. As shown in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10, these figures visually present the trend of fitness changes for each algorithm on the six complex test functions: F2, F5, F8, F10, F13, and F15. By carefully observing these charts, we can clearly see the performance differences among the algorithms on different test functions. The experimental results indicate that the ISABO algorithm, leveraging its unique optimization strategies and mechanisms, demonstrates significant advantages in multiple aspects such as convergence speed, stability, and global search capability. In particular, its improved optimization mechanism not only effectively accelerates the convergence process of the algorithm but also significantly reduces the final optimal fitness value.
To thoroughly validate the reliability and robustness of the experimental results, this study conducted further repetitive experimental verifications. These verifications were carried out within a designed experimental framework. In this framework, the population size of the algorithms was maintained at 30 individuals, and the maximum number of iterations was set to 2500. Specifically, we adopted a strategy of independently running each algorithm 30 times to minimize the random effects that might arise from a single experiment. On this basis, key performance indicators for each experiment, including the minimum value (min), standard deviation (Std), average value (avg), and median, were meticulously recorded and used as important criteria for evaluating algorithm performance. The specific data can be found in Table 1. Through a detailed analysis and comprehensive comparison of the data in Table 1, we can clearly observe that the ISABO algorithm demonstrated significant performance advantages compared to the other optimization algorithms involved in the comparison. This was particularly evident when testing complex functions such as F2, F5, F8, F10, F13, and F15. In particular, the ISABO algorithm achieved remarkable results in multiple evaluation dimensions, including the minimum value (min), standard deviation (Std), average value (avg), and median. Specifically, when testing the F2 function, the ISABO algorithm achieved optimal performance indicators alongside the SABO and DBO algorithms. When testing the F5, F8, and F13 functions, the ISABO algorithm was comprehensively superior, achieving optimal values in all evaluation dimensions. When testing the F10 function, the ISABO algorithm led in terms of the standard deviation (Std), while its minimum value (min) was tied for best with the SABO, DBO, and GWO algorithms, and its average value (avg) and median were optimal alongside the DBO algorithm. When testing the F15 function, the ISABO algorithm achieved optimal values in terms of both the standard deviation (Std) and average value (avg). These experimental results fully validate the exceptional performance of the ISABO algorithm in solving complex function optimization problems.

4.2. Verification of Bearing Experimental Data

To comprehensively evaluate the effectiveness and reliability of the method proposed in this study in practical application scenarios, experiments were conducted using the widely acclaimed bearing dataset published by Case Western Reserve University (CWRU) for in-depth exploration. Figure 11 clearly illustrates the layout and detailed composition of the bearing fault data acquisition system [35,36]. This experimental framework integrates core components such as an electric motor, torque measurement devices, and power monitoring equipment, which work in concert to ensure the accuracy of experimental data and the robustness of experimental conclusions. For the experiments, SKF6205 bearings were selected as test samples. To simulate bearing faults under actual operating conditions, electrical discharge machining was used to introduce single-point diameter damage on the inner race, outer race, and rolling elements of the bearings. These damage patterns aimed to mimic the types of faults that bearings may encounter in real-world operating environments, thereby enhancing the precision and practicality of the experiments. In terms of experimental parameter configuration, a sampling frequency of 12 kHz was selected to ensure the accuracy and completeness of data acquisition were unaffected. Additionally, the motor speed was maintained at 1797 revolutions per minute to simulate the operating state of the bearings under normal conditions. During the data collection process, special attention was given to the vibration signals of the inner race, outer race, and rolling elements of the bearings with a fault diameter of 0.007 inches, as these signals contain rich fault characteristic information that is crucial for comprehensively evaluating the performance of the proposed solution.
To extract valuable feature information from the raw data, we employed the sliding window technique to perform overlapping sampling. Specifically, we set a sliding window with a size of 1000 data points, and each data sample encompassed 2048 data points. This sampling strategy ensured that data coherence and integrity were effectively maintained while significantly reducing the risk of data loss. After the feature extraction phase, we obtained 120 samples for each fault type, totaling 480 samples. These samples provided ample data resources for subsequent model training and testing. In terms of dataset partitioning, we followed the conventional 3:1 ratio to allocate the training set and test set. This allocation ensured that the model received sufficient learning during the training phase while accurately measuring its performance during the testing phase. The basic information of the data used in the experiment is shown in Table 2. The time-domain waveforms of a bearing under normal operation and with an inner ring fault are shown in Figure 12. The time-domain waveforms of a bearing with outer ring fault and rolling element fault are shown in Figure 13.
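A sketch of this sampling and split is given below. It reads the 1000-point "window size" as the sliding step between consecutive 2048-point samples, which is the interpretation consistent with overlapping sampling, takes 120 samples per condition, and splits them 3:1 into training and test sets; function names and the random shuffling are assumptions.

```python
import numpy as np

def segment(signal, sample_len=2048, step=1000):
    """Overlapping sliding-window sampling: consecutive samples share 2048 - 1000 points."""
    starts = range(0, len(signal) - sample_len + 1, step)
    return np.stack([signal[s:s + sample_len] for s in starts])

def build_dataset(signals_by_class, per_class=120, train_ratio=0.75, seed=0):
    """Take `per_class` samples per operating condition and split 3:1 into train/test."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, sig in signals_by_class.items():
        samples = segment(sig)[:per_class]
        X.append(samples)
        y.append(np.full(len(samples), label))
    X, y = np.concatenate(X), np.concatenate(y)
    idx = rng.permutation(len(X))
    n_train = int(train_ratio * len(X))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], y[train], X[test], y[test]
```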
At the initial stage, following the VMD parameter optimization procedure outlined in this paper, we employed the ISABO optimization algorithm to optimize and select the two key parameters in VMD: the number of decomposition levels (k) and the penalty factor (a). During the execution of ISABO optimization, we specifically chose the ratio of the envelope entropy to envelope Gini coefficient as the fitness function metric. For the search intervals of the VMD parameters, we set the number of decomposition levels within a range from 3 to 10 and the penalty factor within a range from 100 to 2500. When configuring the parameters of the ISABO algorithm, the experiment specified a population size of 30 and a maximum of 15 iterations to ensure comprehensive coverage and efficient operation of the optimization process. After completing the parameter optimization, this study obtained the variations in the fitness curve of the ISABO algorithm under different operating conditions (including normal operation, inner race fault, outer race fault, and rolling element fault), as depicted in Figure 14. Additionally, Table 3 lists the optimal parameter values obtained in detail. By closely observing Figure 14, we can clearly see that the algorithm converged rapidly. When the fitness reached its optimal state, the values of the penalty factor and the number of decomposition levels for rolling bearings under different operating conditions exhibited the following differences: under normal operating conditions, a was 2446 and k was 9; for inner race faults, a was 2273 and k was 3; for outer race faults, a was 100 and k was 4; and for rolling element faults, a was 2238 and k was 10. Based on these optimization results, the experiment determined the optimal parameter combinations for the VMD algorithm. Subsequently, these meticulously optimized parameters were accurately input into the VMD algorithm to perform VMD decomposition processing on signals from four different operating conditions of the bearing. The time-domain waveforms and frequency spectra of the decomposed signal components obtained after the decomposition process are shown in Figure 15, Figure 16, Figure 17 and Figure 18. From these figures, it is intuitive to observe that the VMD algorithm with optimized parameters exhibited excellent performance in signal decomposition. It not only effectively avoided issues such as modal aliasing but also accurately decomposed the signal into components ranging from high to low frequencies, providing solid and powerful data support for subsequent fault diagnosis work.
In this study, we fully leveraged the proposed indicator evaluation system. We conducted an in-depth assessment of the signal component characteristics of bearings under four different operating conditions, which were obtained through the ISABO-VMD method. Specifically, we selected the ratio of the envelope entropy to the envelope Gini coefficient as the core function indicator to measure fitness. During the experiment, this ratio indicator was calculated to quantitatively analyze the characteristic performance of each signal component. After a series of detailed calculation processes, we obtained the results for the signal components, which are presented in Figure 19. By comparing and analyzing the data displayed in the bar charts in Figure 19, we found that under normal bearing operation, the indicator value corresponding to the IMF4 signal component was the smallest, indicating that IMF4 was highly representative of the signal characteristics of normal bearing operation. Similarly, when a fault occurred in the inner race of the bearing, the indicator value of the IMF2 signal component dropped to the lowest level, demonstrating the superiority of IMF2 in reflecting the fault characteristics of the inner race. Analogously, in the case of faults in the rolling elements and the outer race of the bearing, the IMF3 and IMF2 signal components exhibited the smallest indicator values, respectively. Based on these observations and analyses, we selectively identified these signal components as the most sensitive modal components. Subsequently, we calculated nine statistical feature parameters, including the mean, variance, peak value, kurtosis, root mean square value, peak factor, etc., for the selected optimal sensitive modal components. To further explore the deep features embedded in the signals, we introduced a Transformer encoder to comprehensively and thoroughly mine the complex relationships among data features and the long- and short-term dependencies in the time series.
To further evaluate the performance of the proposed method under different noise environments, this study intentionally introduced noise interference at two intensities, −2 dB and −7 dB, into the signals of the four bearing operating conditions. This was intended to simulate the diverse noise conditions that may be encountered in practical applications and thereby enable a more accurate assessment of the method's robustness. Relying on the optimization capability of the ISABO algorithm, a systematic search for the optimal VMD parameters was then conducted, with the ratio of the envelope entropy to the envelope Gini coefficient again serving as the fitness function so that the algorithm could accurately and sensitively reflect the effect of parameter optimization. The search ranges for the VMD parameters remained consistent with the previous settings. Under −2 dB and −7 dB noise interference, the fitness curves of the ISABO algorithm were closely observed in the experiments, as shown in Figure 20 and Figure 21. These figures show that, regardless of the noise intensity, the fitness curves converged rapidly and reached the optimal fitness values under their respective noise conditions. After ISABO optimization, the optimal VMD parameter combinations for the different noise conditions were obtained, as detailed in Table 4. These optimal parameter combinations provided strong support for the subsequent processing and analysis of the bearing signals.
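The noise-injection step can be sketched as follows: white Gaussian noise is scaled so that the resulting signal-to-noise ratio matches the target level (−2 dB or −7 dB). This is an illustrative implementation rather than the exact routine used in the experiments.

```python
# Additive white Gaussian noise at a target SNR in dB: SNR(dB) = 10*log10(Ps/Pn).
import numpy as np

def add_noise(signal, snr_db, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))          # required noise power
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

clean = np.sin(2 * np.pi * 50 * np.linspace(0, 1, 2048))   # placeholder signal
noisy_2db = add_noise(clean, -2.0)
noisy_7db = add_noise(clean, -7.0)
```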
In the subsequent analysis, this study employed the ratio of the envelope entropy to the envelope Gini coefficient as the evaluation criterion to examine the signal components obtained through the ISABO-VMD method for the four bearing operating conditions (normal operation, inner race fault, rolling element fault, and outer race fault) under the two signal-to-noise ratio conditions (−2 dB and −7 dB). The evaluation index of each signal component was calculated, and the results are presented as bar charts in Figure 22 and Figure 23. Specifically, Figure 22 shows the evaluation values of the IMF components decomposed under the different bearing operating conditions in a −2 dB noise environment. A detailed examination of these bar charts revealed the following: under normal operating conditions, the IMF4 component exhibited the smallest index value; for an inner race fault, IMF4 also performed best; for a rolling element fault, IMF2 showed the highest sensitivity; and for an outer race fault, IMF2 again exhibited the most prominent characteristic. Figure 23 depicts the evaluation values of the IMF components for the four operating conditions under −7 dB noise interference. Analysis of these bar charts led to the following conclusions: under normal operating conditions, IMF5 performed best; for an inner race fault, IMF1 exhibited the highest sensitivity; for a rolling element fault, IMF8 stood out; and for an outer race fault, IMF2 was the optimal choice.
Based on the above analysis, we selectively identified these IMF signal components with the smallest index values as the most sensitive modal components. Subsequently, we systematically calculated nine statistical features, including the mean, variance, peak value, kurtosis, etc., for the optimal signal components screened under −2 dB and −7 dB noise interference conditions. To further explore feature information, this study also introduced a Transformer encoder to comprehensively and deeply analyze the complex correlations between data features and the long- and short-term dependencies in the time series.
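A minimal sketch of such a Transformer encoder acting on the statistical feature vectors is given below; the layer count, model width, and head count are illustrative assumptions rather than the exact configuration used in this study, and the layer from which the refined features are taken is likewise assumed.

```python
# Transformer encoder that refines nine-dimensional statistical feature vectors;
# the refined features (here, the output of the first fully connected layer) are
# what would be handed to TELM for classification.
import torch
import torch.nn as nn

class FeatureRefiner(nn.Module):
    def __init__(self, n_features=9, d_model=64, n_heads=4, n_layers=2, n_classes=4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.fc1 = nn.Linear(d_model, 32)
        self.fc2 = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, n_features)
        h = self.embed(x).unsqueeze(1)         # treat each sample as a length-1 token sequence
        h = self.encoder(h).squeeze(1)         # self-attention aggregates contextual information
        feat = torch.relu(self.fc1(h))         # refined features for the downstream classifier
        return self.fc2(feat), feat

model = FeatureRefiner()
logits, refined = model(torch.randn(8, 9))
print(refined.shape)   # torch.Size([8, 32])
```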
To validate the feature learning capability of the proposed model, this study employed the t-distributed stochastic neighbor embedding (t-SNE) algorithm to reduce the dimensionality of the high-dimensional features output by the final fully connected layer of the Transformer model. The characteristic parameter values of selected signals extracted under different noise levels are shown in Table 5. The reduced features were then visualized. Figure 24, Figure 25 and Figure 26 present comparative two-dimensional projections of the original data versus the data enhanced by the Transformer features under different noise intensities. In the absence of noise (Figure 24), the features of the original data samples did not overlap (Figure 24a), but the clustering of features under different operating conditions was not ideal. In contrast, the feature space processed by the Transformer network (Figure 24b) exhibited relatively clear class separability: the four bearing conditions (normal, inner race fault, outer race fault, and rolling element fault) essentially formed independent clusters. This is attributed to the Transformer structure effectively aggregating global contextual information through the self-attention mechanism. As the noise intensity increased to −2 dB (Figure 25) and −7 dB (Figure 26), the feature distribution of the original data degraded significantly. In particular, under strong −7 dB noise interference (Figure 26), the feature points of the various categories largely overlapped, posing considerable challenges for fault identification using traditional signal processing and pattern recognition methods. In contrast, the feature space enhanced by the Transformer (Figure 25b and Figure 26b) maintained relatively high intra-class compactness and inter-class separability. This demonstrates that the multi-head self-attention mechanism, through parallel feature interactions, enhances the model's robust representation of the key discriminative features under noise interference.
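The t-SNE projection step can be sketched as follows, assuming scikit-learn and matplotlib; the feature array and labels are placeholders, and the perplexity value is an illustrative choice.

```python
# Two-dimensional t-SNE projection of (placeholder) Transformer-refined features.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
features = rng.standard_normal((480, 32))       # placeholder refined features
labels = np.repeat(np.arange(4), 120)           # 4 bearing conditions x 120 samples

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
for c, name in enumerate(["normal", "inner race", "outer race", "rolling element"]):
    pts = emb[labels == c]
    plt.scatter(pts[:, 0], pts[:, 1], s=8, label=name)
plt.legend()
plt.title("t-SNE of Transformer-refined features")
plt.show()
```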
Next, a fault diagnosis model was constructed using the pivotal feature information extracted from the raw data via the Transformer architecture, and these features were classified with the TELM algorithm. To comprehensively compare the performance of different classification models in the fault diagnosis task, this study also conducted classification experiments based on KELM (kernel extreme learning machine), ELM, SVM (support vector machine), the Softmax model, and a CNN model. The experiments focus on the differences in classification performance of each model under three noise conditions (no added noise, −2 dB, and −7 dB). Confusion matrices (Figure 27, Figure 28 and Figure 29) provide an intuitive presentation of the multidimensional classification results, and the models were evaluated using the precision, recall, and F1 measure. The main experimental results are detailed in Table 6. Under ideal conditions (no added noise), all the compared models achieved 100% on all three evaluation metrics. This confirms the effectiveness of the feature engineering: without interference, the raw data fully retain their essential physical representation, providing an ideal training benchmark for the modeling process. When −2 dB noise interference was introduced, the performance of the traditional models diverged noticeably: the accuracy of KELM, ELM, SVM, the Softmax model, and the CNN dropped to 94.17%, 94.17%, 92.50%, 90%, and 95%, respectively, with corresponding declines in recall and F1 score. Notably, under −7 dB noise, the accuracy of the comparison models declined step-wise (KELM: 83.33%, ELM: 85.47%, SVM: 84.17%, Softmax: 82.50%, CNN: 82.50%), and their recall and F1 scores continued to fall.
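The evaluation metrics can be computed as in the following sketch, assuming scikit-learn; the labels and predictions are placeholders standing in for the test-set outputs of each classifier.

```python
# Confusion matrix and macro-averaged precision/recall/F1 for the four conditions.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

rng = np.random.default_rng(3)
y_true = np.repeat(np.arange(4), 30)                        # 4 classes x 30 test samples
y_pred = np.where(rng.random(y_true.size) < 0.9,            # placeholder predictions
                  y_true, rng.integers(0, 4, y_true.size))

print(confusion_matrix(y_true, y_pred))
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(f"precision={p:.4f}  recall={r:.4f}  F1={f1:.4f}")
```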

5. Conclusions

This study proposes an intelligent diagnosis method integrating VMD optimized by an improved SABO with Transformer–TELM. This systematic research revealed the following key findings:
(1)
In terms of feature extraction optimization, the ISABO algorithm is introduced to adaptively optimize the core parameters of VMD. This significantly improves particle diversity and accelerates convergence, effectively avoiding local optima. Compared with algorithms such as SABO, GWO, GJO, MVO, and DBO, the ISABO algorithm is simple to operate, converges quickly and accurately, and did not become trapped in local optima at any point during the optimization process.
(2)
In the dimension of modal decomposition evaluation, a dual-index evaluation system based on the envelope entropy and Gini coefficient is constructed. This composite fitness function effectively integrates the characteristics of time-domain sparsity and frequency-domain energy distribution. At the same time, with the help of the ISABO algorithm, the key parameters of VMD are meticulously optimized, achieving precise decomposition and efficient reconstruction of signals. Based on this dual-dimension index evaluation system, the optimal signal components can be more accurately selected.
(3)
In view of the limitations of traditional data-driven modeling methods in fault diagnosis, a fault diagnosis model based on Transformer–TELM is proposed. This model uses a multi-layer Transformer model to deeply analyze the initially extracted feature quantities. By automatically capturing the potential patterns in the data, it extracts more discriminative feature representations. Subsequently, features are extracted from the second fully connected layer and input into TELM for fault classification. To verify the effectiveness of feature extraction, the t-SNE algorithm is introduced to compare and analyze the data before and after feature extraction. The results show that the feature space enhanced by Transformer maintains a relatively high degree of intra-class compactness and inter-class separability. Compared with traditional methods based on models such as KELM, ELM, SVM, and Softmax, the method proposed in this study achieves a significant improvement in diagnostic accuracy.
In summary, the intelligent diagnostic method proposed in this study, which integrates ISABO-optimized VMD with Transformer–TELM, combines the advantages of signal processing and deep learning into an efficient hybrid approach. Specifically, ISABO-optimized VMD efficiently extracts the key modal components during signal preprocessing, providing high-quality input data for subsequent diagnosis, while the Transformer–TELM model demonstrates strong capabilities in feature extraction and fault classification, automatically learning complex patterns in the data. The combined method not only significantly improves diagnostic accuracy but also enhances robustness under different noise conditions, making it advantageous in practical applications. Compared with other similar hybrid methods, the proposed method exhibits advantages in several respects. Firstly, during signal processing, the ISABO algorithm enables more efficient and accurate parameter optimization for VMD, effectively avoiding the over-decomposition and under-decomposition issues that may arise in traditional methods. Secondly, during feature extraction and diagnosis, the multi-layer Transformer architecture automatically captures the potential patterns in the data and extracts more discriminative feature representations. Additionally, the TELM model further improves the accuracy of fault classification.
Looking ahead, as industrial systems continue to become more complex and intelligent, the requirements for the accuracy and real-time performance of fault diagnosis will become increasingly stringent. The combined method proposed in this study offers a new approach and perspective for fault diagnosis. In the future, its application scope can be further expanded, for example, by applying it to a wider range of industrial equipment and systems, such as power systems and aerospace equipment.

Author Contributions

Conceptualization, J.Y.; Data curation, J.Y.; Formal analysis, J.Y.; Funding acquisition, J.Y. and M.M.; Methodology, J.Y. and M.M.; Resources, J.Y. and X.L.; Writing—original draft, J.Y.; Writing—review and editing, J.Y., M.M. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Yunnan Fundamental Research Projects (No. 202301AT070256), the Training Program for Baoshan Xingbao Talents (No. 202303), the 10th Batch of the Baoshan Young and Middle-Aged Academic and Technical Leaders Training Project (No. 202109), the General Scientific Research Project of the Zhejiang Provincial Department of Education (No. Y202353293), and the Quzhou Science and Technology Plan Project (No. 2021K31).

Data Availability Statement

The data presented in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Encoder structure diagram of Transformer.
Figure 2. Topology diagram of ELM.
Figure 3. Framework of the ISABO-VMD algorithm.
Figure 4. Framework of the Transformer–TELM algorithm.
Figure 5. Optimization results of F2 function.
Figure 6. Optimization results of F5 function.
Figure 7. Optimization results of F8 function.
Figure 8. Optimization results of F10 function.
Figure 9. Optimization results of F13 function.
Figure 10. Optimization results of F15 function.
Figure 11. Case Western Reserve University bearing fault detection equipment. (a) Bearing evaluation platform. (b) High-resolution rolling contact bearing.
Figure 12. Waveforms of normal operation (a) and inner ring fault (b).
Figure 13. Waveforms of outer ring fault (a) and rolling element fault (b).
Figure 14. Fitness variation curves for bearings under normal operation (a), inner race fault (b), rolling element fault (c), and outer race fault (d) conditions (without noise addition).
Figure 15. Waveform (a) and spectrum (b) of VMD decomposition for normal operation.
Figure 16. Waveform (a) and spectrum (b) of VMD decomposition for inner race fault.
Figure 17. Waveform (a) and spectrum (b) of VMD decomposition for rolling element fault.
Figure 18. Waveform (a) and spectrum (b) of VMD decomposition for outer race fault.
Figure 19. Evaluation indicator values of individual IMF components obtained via VMD decomposition for normal operation (a), inner race fault (b), rolling element fault (c), and outer race fault (d) conditions.
Figure 20. Fitness variation curves for bearings under normal operation (a), inner race fault (b), rolling element fault (c), and outer race fault (d) conditions after adding −2 dB noise.
Figure 21. Fitness variation curves for bearings under normal operation (a), inner race fault (b), rolling element fault (c), and outer race fault (d) conditions after adding −7 dB noise.
Figure 22. Evaluation indicator values of individual IMF components obtained via VMD decomposition for normal operation (a), inner race fault (b), rolling element fault (c), and outer race fault (d) conditions (with −2 dB noise added).
Figure 23. Evaluation indicator values of individual IMF components obtained via VMD decomposition for normal operation (a), inner race fault (b), rolling element fault (c), and outer race fault (d) conditions (with −7 dB noise added).
Figure 24. Visualization of training data without added noise (a) and after processing with Transformer (b).
Figure 25. Visualization of training data with −2 dB noise added (a) and after processing with Transformer (b).
Figure 26. Visualization of training data with −7 dB noise added (a) and after processing with Transformer (b).
Figure 27. Fault diagnosis results based on recognition models using TELM (a), KELM (b), ELM (c), SVM (d), Softmax (e), and CNN (f) without noise addition.
Figure 28. Fault diagnosis results based on recognition models using TELM (a), KELM (b), ELM (c), SVM (d), Softmax (e), and CNN (f) under −2 dB noise addition.
Figure 29. Fault diagnosis results based on recognition models using TELM (a), KELM (b), ELM (c), SVM (d), Softmax (e), and CNN (f) under −7 dB noise addition.
Table 1. Comparison of the results of several test functions run by CEC 2005.
Function | Index | ISABO | SABO | DBO | GWO | GJO | MVO
F2 | min | 0.00 × 10^0 | 0.00 × 10^0 | 0.00 × 10^0 | 1.10 × 10^−173 | 8.41 × 10^−302 | 3.71 × 10^−3
F2 | std | 0.00 × 10^0 | 0.00 × 10^0 | 0.00 × 10^0 | 0.00 × 10^0 | 0.00 × 10^0 | 3.07 × 10^−3
F2 | avg | 0.00 × 10^0 | 0.00 × 10^0 | 0.00 × 10^0 | 3.79 × 10^−167 | 1.89 × 10^−294 | 8.07 × 10^−3
F2 | median | 0.00 × 10^0 | 0.00 × 10^0 | 0.00 × 10^0 | 2.61 × 10^−170 | 7.03 × 10^−297 | 7.67 × 10^−3
F5 | min | 5.38 × 10^−8 | 5.93 × 10^0 | 5.69 × 10^−1 | 5.19 × 10^0 | 5.26 × 10^0 | 4.13 × 10^0
F5 | std | 6.63 × 10^−5 | 7.14 × 10^−1 | 8.20 × 10^−1 | 7.11 × 10^−1 | 6.88 × 10^−1 | 1.25 × 10^2
F5 | avg | 3.44 × 10^−5 | 6.54 × 10^0 | 1.33 × 10^0 | 6.06 × 10^0 | 6.73 × 10^0 | 5.94 × 10^1
F5 | median | 3.35 × 10^−6 | 6.21 × 10^0 | 1.05 × 10^0 | 6.23 × 10^0 | 7.08 × 10^0 | 8.56 × 10^0
F8 | min | −4.19 × 10^3 | −2.80 × 10^3 | −4.19 × 10^3 | −3.48 × 10^3 | −3.38 × 10^3 | −3.71 × 10^3
F8 | std | 1.98 × 10^−4 | 1.31 × 10^2 | 3.72 × 10^2 | 2.82 × 10^2 | 3.75 × 10^2 | 2.18 × 10^2
F8 | avg | −4.19 × 10^3 | −2.50 × 10^3 | −3.89 × 10^3 | −2.89 × 10^3 | −2.61 × 10^3 | −3.21 × 10^3
F8 | median | −4.19 × 10^3 | −2.48 × 10^3 | −3.97 × 10^3 | −2.88 × 10^3 | −2.63 × 10^3 | −3.22 × 10^3
F10 | min | 4.44 × 10^−16 | 4.44 × 10^−16 | 4.44 × 10^−16 | 4.00 × 10^−15 | 4.44 × 10^−16 | 4.32 × 10^−3
F10 | std | 0.00 × 10^0 | 9.01 × 10^−16 | 0.00 × 10^0 | 0.00 × 10^0 | 1.60 × 10^−15 | 3.66 × 10^−1
F10 | avg | 4.44 × 10^−16 | 3.76 × 10^−15 | 4.44 × 10^−16 | 4.00 × 10^−15 | 3.05 × 10^−15 | 7.65 × 10^−2
F10 | median | 4.44 × 10^−16 | 4.00 × 10^−15 | 4.44 × 10^−16 | 4.00 × 10^−15 | 4.00 × 10^−15 | 9.38 × 10^−3
F13 | min | 1.03 × 10^−10 | 8.87 × 10^−4 | 1.35 × 10^−32 | 7.11 × 10^−8 | 8.28 × 10^−7 | 1.22 × 10^−5
F13 | std | 6.43 × 10^−7 | 7.67 × 10^−2 | 2.98 × 10^−2 | 3.01 × 10^−2 | 9.44 × 10^−2 | 2.85 × 10^−3
F13 | avg | 4.48 × 10^−7 | 4.44 × 10^−2 | 1.09 × 10^−2 | 9.85 × 10^−3 | 7.32 × 10^−2 | 8.68 × 10^−4
F13 | median | 1.45 × 10^−7 | 8.62 × 10^−3 | 2.82 × 10^−31 | 2.83 × 10^−7 | 6.01 × 10^−6 | 1.13 × 10^−4
F15 | min | 3.08 × 10^−4 | 3.08 × 10^−4 | 3.07 × 10^−4 | 3.07 × 10^−4 | 3.07 × 10^−4 | 3.08 × 10^−4
F15 | std | 1.55 × 10^−5 | 1.06 × 10^−4 | 3.44 × 10^−4 | 8.99 × 10^−3 | 2.32 × 10^−4 | 1.22 × 10^−2
F15 | avg | 3.22 × 10^−4 | 4.76 × 10^−4 | 5.55 × 10^−4 | 5.72 × 10^−3 | 3.69 × 10^−4 | 5.71 × 10^−3
F15 | median | 3.14 × 10^−4 | 4.68 × 10^−4 | 3.31 × 10^−4 | 3.07 × 10^−4 | 3.08 × 10^−4 | 5.55 × 10^−4
Table 2. Experimental data.
Status | Data Length | Sample Number | Label
Normal | 2048 | 120 | 1
Inner ring | 2048 | 120 | 2
Rolling element | 2048 | 120 | 3
Outer ring | 2048 | 120 | 4
Table 3. Parameters obtained from VMD optimized by ISABO.
Category | a | k
Normal | 2446 | 9
Inner ring | 2273 | 3
Rolling element | 2238 | 10
Outer ring | 100 | 4
Table 4. Parameters obtained from VMD.
Category | −2 dB | −7 dB
Normal | α = 2344, k = 8 | α = 2100, k = 10
Inner ring | α = 2219, k = 8 | α = 2447, k = 10
Rolling element | α = 2297, k = 10 | α = 1811, k = 9
Outer ring | α = 120, k = 3 | α = 287, k = 3
Table 5. Values of selected feature parameters for signals with noise at different decibel levels.
Noise Situation | Category | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Without noise addition | A | 0.00003 | 0.00024 | 0.06093 | 1.67803 | 0.01565 | 3.89365 | 4.39619 | 1.12906 | 4.85315
Without noise addition | B | 0.00011 | 0.01956 | 1.02767 | 4.73608 | 0.13987 | 7.34740 | 10.36760 | 1.41106 | 13.20921
Without noise addition | C | 0.00001 | 0.00036 | 0.09217 | 1.95135 | 0.01891 | 4.87420 | 5.62661 | 1.15436 | 6.29117
Without noise addition | D | 0.00292 | 0.09003 | 1.61593 | 2.53194 | 0.30007 | 5.38521 | 6.60537 | 1.22658 | 7.71407
−2 dB | A | 0.00009 | 0.01272 | 0.58018 | 2.36468 | 0.11277 | 5.14501 | 6.21697 | 1.20835 | 7.18660
−2 dB | B | 0.00022 | 0.01764 | 0.92745 | 3.04446 | 0.13281 | 6.98353 | 8.73434 | 1.25071 | 10.25235
−2 dB | C | 0.00060 | 0.01093 | 0.64122 | 2.77426 | 0.10454 | 6.13398 | 7.62936 | 1.24379 | 8.98314
−2 dB | D | 0.00227 | 0.15646 | 3.08134 | 4.27411 | 0.39555 | 7.79001 | 10.47850 | 1.34512 | 12.84789
−7 dB | A | 0.00011 | 0.03649 | 1.27916 | 3.31807 | 0.19101 | 6.69672 | 8.52395 | 1.27286 | 10.14771
−7 dB | B | 0.01149 | 0.03550 | 1.08048 | 3.09752 | 0.18877 | 5.72370 | 7.27148 | 1.27042 | 8.72239
−7 dB | C | 0.00001 | 0.04276 | 1.12691 | 2.70226 | 0.20679 | 5.44948 | 6.70285 | 1.23000 | 7.80779
−7 dB | D | 0.00154 | 0.18171 | 3.11818 | 3.47646 | 0.42628 | 7.31487 | 9.56494 | 1.30760 | 11.64287
Table 6. Recognition results of various models.
Noise Situation | Evaluating Indicator | Transformer–TELM | Transformer–KELM | Transformer–ELM | Transformer–SVM | Transformer–Softmax | CNN
Without noise addition | Accuracy/% | 100 | 100 | 100 | 100 | 100 | 100
Without noise addition | Recall/% | 100 | 100 | 100 | 100 | 100 | 100
Without noise addition | F1 measure/% | 100 | 100 | 100 | 100 | 100 | 100
−2 dB | Accuracy/% | 100 | 94.167 | 94.167 | 92.5 | 90 | 95
−2 dB | Recall/% | 100 | 94.711 | 94.711 | 92.5 | 90 | 95
−2 dB | F1 measure/% | 100 | 94.154 | 94.154 | 92.438 | 89.991 | 95
−7 dB | Accuracy/% | 100 | 83.333 | 85.47 | 84.167 | 82.5 | 82.5
−7 dB | Recall/% | 100 | 83.916 | 85.521 | 84.167 | 82.5 | 82.5
−7 dB | F1 measure/% | 100 | 82.601 | 84.948 | 83.824 | 82.165 | 82.319
