1. Introduction
Rolling bearings, as critical components of rotating machinery, directly influence the operational reliability and service life of equipment [1]. Traditional bearing life assessment methods primarily rely on vibration signal analysis, which monitors degradation through time-domain and frequency-domain feature extraction combined with fault frequency identification [2,3]. However, single-channel vibration signals are susceptible to noise interference under complex operating conditions and exhibit limited characterization capabilities for early-stage weak faults and multi-mode composite damage [4,5], leading to insufficient model generalizability. In recent years, multi-source signal fusion techniques have significantly enhanced degradation feature representation by integrating multi-physics data such as vibration, temperature, and slippage [6,7]. For instance, the research reported in [8] demonstrates that temperature signals effectively reflect the gradual transition process of bearing friction and lubrication states, complementing the transient impact characteristics of vibration signals. Reference [9] combined acoustic and vibration data to achieve over 99% diagnostic accuracy across all signal-to-noise ratio (SNR) conditions, exhibiting exceptional robustness and noise immunity in both noisy and high-SNR environments. Nevertheless, the high dimensionality and redundancy of multi-source features escalate modeling complexity, urgently demanding efficient feature fusion and screening mechanisms [10]. Addressing the inherent non-stationarity of vibration signals under evolving defect conditions remains particularly challenging. Recent approaches, such as the dynamic modeling framework proposed by Galli et al. for non-stationary bearing vibration signals [11], offer promising avenues for simulating complex degradation behaviors.
In the field of feature extraction and fusion, time–frequency analysis methods such as wavelet packet decomposition (WPD) and empirical mode decomposition (EMD) have been widely applied to capture the local characteristics of nonlinear signals [12,13]. For instance, reference [14] utilized wavelet packet energy entropy to construct multidimensional feature sets but failed to address the interference from inter-feature correlations; reference [15] proposed an adaptive feature fusion (AFF) strategy that compresses data dimensionality by dynamically adjusting feature weights, although its ability to capture long-term degradation trends remained insufficient. The auto-associative kernel regression (AAKR) method quantifies residuals through health space mapping, demonstrating high sensitivity in rolling bearing life prediction [16], yet its computational complexity has hindered real-time applicability [17]. To address these limitations, this study adopted a dual-stage AFF–AAKR fusion framework, integrating spatial compression with temporal residual analysis to balance computational efficiency and degradation characterization capability.
Feature sensitivity screening constitutes a critical step in optimizing model performance. Existing studies have predominantly employed single-indicator metrics (e.g., monotonicity, trendability) to evaluate feature–degradation correlations [18], yet they have overlooked the synergistic effects of multidimensional indicators. For instance, reference [19] introduced a mutual information-based feature-clustering method but failed to incorporate regularization constraints to suppress redundancy; reference [20] jointly utilized trendability and complexity metrics to screen sensitive features, yet its linear weighting strategy struggled to balance nonlinear relationships among the indicators. To address these limitations, this study designed a composite sensitivity index (CSI) that integrates the trend persistence index (TPI), the trend monotonicity index (TMI), and the signal complexity index (SCI). By implementing a product constraint mechanism, the CSI achieves multi-indicator synergy optimization. Furthermore, the framework combines K-means++ clustering with L1-regularized entropy weight allocation to effectively extract low-redundancy, high-sensitivity parameter sets from high-dimensional features.
Health state partitioning models must adapt to the nonlinear and phased characteristics of bearing degradation. Traditional methods, such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs), rely on prior distribution assumptions and are prone to mis-segmentation under complex degradation patterns [21,22,23,24,25,26]. To overcome these limitations, this study developed a dual-criteria adaptive bottom-up merging (DC-ABUM) algorithm, which integrates fitting error thresholds and segment quantity constraints to achieve adaptive three-stage partitioning of degradation curves. This approach significantly enhances turning point identification accuracy by dynamically optimizing the segmentation granularity based on residual analysis and statistical significance testing.
Furthermore, existing health indicator (HI) evaluation systems predominantly focus on singular performance metrics (such as monotonicity or correlation) and lack a comprehensive quantitative framework [27,28]. Although reference [29] proposed a robustness indicator based on trend–residual decomposition, its weight assignment relied on empirical settings. Reference [30] employed detrended fluctuation analysis (DFA) to assess the long-term correlation of the HI but did not integrate it with short-term dynamic characteristics. This paper constructs a comprehensive index (CI) encompassing monotonicity, correlation, and robustness. Through weighted fusion, the CI quantifies the overall performance of the HI, providing a unified benchmark for method comparison.
To address the aforementioned challenges, this paper proposes a multi-source signal fusion and adaptive optimization framework for rolling bearing health monitoring. The framework employs a composite sensitivity index (CSI) with a two-tier screening mechanism (combining mutual information clustering and L1-regularized entropy weighting) to eliminate redundant features while preserving degradation-sensitive parameters. A dynamic feature fusion model (AFF–AAKR) is further designed to integrate spatial compression (via adaptive feature fusion) and temporal residual analysis (via auto-associative kernel regression), constructing a high-performance health indicator (HI) with minimized redundancy and maximized degradation information. Additionally, the entropy weight method dynamically allocates weights to vibration and temperature signals based on their information entropy contributions, ensuring optimal multi-source fusion. Finally, the DC-ABUM-based health state partitioning model is developed to precisely delineate the degradation stages (e.g., normal, incipient fault, severe degradation) by jointly minimizing fitting errors and controlling stage transitions, thereby reflecting the actual degradation process with higher fidelity.
The remainder of this paper is organized as follows: Section 2 introduces foundational theories and related methodologies. Section 3 details the proposed two-tier feature selection methodology (CSI-based screening and entropy-weighted fusion) and the DC-ABUM-driven health state partitioning model. Section 4 validates the proposed framework through accelerated bearing degradation experiments and benchmarks its performance against state-of-the-art methods in terms of HI smoothness, stage identification accuracy, and computational efficiency. Finally, Section 5 concludes the paper and suggests future research directions.
2. Theoretical Methodology
2.1. Feature Extraction Based on Vibration Acceleration Signals
When surface damage occurs in bearing components (e.g., inner/outer races or rolling elements), transient shock pulses are generated at mechanical contact interfaces. These micro-impact events induce characteristic harmonic components in the vibration signal spectrum, whose frequency parameters exhibit a strict mathematical correspondence with specific fault types. By constructing a fault-sensitive feature set, this mechanism enables full-lifecycle monitoring of bearing operational states and precise fault mode identification. In this study, feature sets are systematically developed across three domains: time-domain, frequency-domain, and time–frequency-domain analyses.
- (1)
Time-Domain Features
Time-domain features reveal the operational states and fault patterns of bearings by directly analyzing the statistical characteristics of vibration signals along the time axis. These features are categorized into two groups: dimensional statistical parameters, such as the mean, variance, and root mean square (RMS), which evaluate the overall amplitude level and energy distribution of the signal, and dimensionless statistical parameters, including kurtosis, crest factor, and waveform factor. The latter demonstrate superior noise immunity due to their insensitivity to amplitude scaling and baseline shifts, offering significant advantages in detecting early-stage weak faults in bearings. The time-domain features and their definitions are systematically cataloged in Table 1.
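As an illustration, the dimensional and dimensionless statistics above can be computed directly from a raw signal segment. The sketch below (plain Python) implements a representative subset; the exact definitions in Table 1 may differ slightly (e.g., biased vs. unbiased moments), so treat this as an assumed convention rather than the paper's exact formulas.

```python
import math

def time_domain_features(x):
    """Compute a representative subset of the time-domain statistics in Table 1."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n          # biased (population) variance
    rms = math.sqrt(sum(v * v for v in x) / n)
    std = math.sqrt(var)
    # Kurtosis (dimensionless): fourth standardized moment, no excess correction.
    kurtosis = sum((v - mean) ** 4 for v in x) / n / (std ** 4) if std > 0 else 0.0
    peak = max(abs(v) for v in x)
    crest_factor = peak / rms                          # impulsiveness of the signal
    mean_abs = sum(abs(v) for v in x) / n
    waveform_factor = rms / mean_abs                   # shape of the waveform
    return {"mean": mean, "variance": var, "rms": rms, "kurtosis": kurtosis,
            "crest_factor": crest_factor, "waveform_factor": waveform_factor}
```

For a pure sinusoid these statistics take their textbook values (crest factor √2, kurtosis 1.5), which is a useful sanity check before applying them to vibration data.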
- (2)
Frequency-Domain Features
Frequency-domain features are derived by mapping vibration signals into the frequency dimension via the fast Fourier transform (FFT), enabling the precise localization of fault-related characteristic frequencies and their energy distribution patterns. Core parameters, such as the mean frequency and the frequency-domain root mean square (FRMS), are utilized to identify fault types and their corresponding spectral signatures. The frequency-domain feature extraction process and parameters are detailed in Table 2.
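A minimal sketch of the two spectral parameters named above, using NumPy's FFT. The power-weighted convention used here (spectral centroid) is one common definition of mean frequency; Table 2 may adopt a slightly different weighting, so this is an assumed formulation.

```python
import numpy as np

def frequency_domain_features(x, fs):
    """Mean frequency (spectral centroid) and frequency-domain RMS (FRMS)
    from the one-sided amplitude spectrum obtained via FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) / n            # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)       # corresponding frequency bins (Hz)
    power = spec ** 2
    mean_freq = np.sum(freqs * power) / np.sum(power)          # power-weighted centroid
    frms = np.sqrt(np.sum(freqs ** 2 * power) / np.sum(power)) # spectral RMS frequency
    return mean_freq, frms
```

For a pure tone with an integer number of cycles in the window, both quantities collapse onto the tone frequency, which makes the implementation easy to verify.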
- (3)
Time–Frequency Features
Wavelet packet analysis (WPA) enables the multiscale characterization of signal time–frequency properties. The two-scale relationships between the scaling function $\phi(t)$ and the wavelet function $\psi(t)$ are expressed as
$$\phi(t) = \sqrt{2}\sum_{k} h(k)\,\phi(2t - k), \qquad \psi(t) = \sqrt{2}\sum_{k} g(k)\,\phi(2t - k),$$
where $h(k)$ and $g(k)$ denote the coefficients of the low-pass and high-pass filters, respectively, $t$ represents the time variable, and $i$ is the decomposition level index used in the recursion below.
The wavelet packet function family $\{w_n(t)\}$ is generated through recursive decomposition of the scaling space $V_i$ and wavelet space $W_i$:
$$V_i = V_{i+1} \oplus W_{i+1}.$$
Extending this to wavelet packets yields
$$w_{2n}(t) = \sqrt{2}\sum_{k} h(k)\,w_n(2t - k), \qquad w_{2n+1}(t) = \sqrt{2}\sum_{k} g(k)\,w_n(2t - k).$$
Here, $n$ is an integer index for the wavelet packet nodes, with $w_0(t) = \phi(t)$ and $w_1(t) = \psi(t)$. The decomposition and reconstruction formulas are defined as
$$d_{i+1}^{2n}(k) = \sum_{l} h(l - 2k)\,d_i^{n}(l), \qquad d_{i+1}^{2n+1}(k) = \sum_{l} g(l - 2k)\,d_i^{n}(l),$$
$$d_i^{n}(k) = \sum_{l}\left[\tilde{h}(k - 2l)\,d_{i+1}^{2n}(l) + \tilde{g}(k - 2l)\,d_{i+1}^{2n+1}(l)\right],$$
where $d_i^{n}(k)$ represents the wavelet packet coefficients at the $i$-th decomposition level and $n$-th node, with $\tilde{h}$ and $\tilde{g}$ being the reconstruction filters.
Based on Parseval’s theorem, the signal energy can be represented as the sum of squared wavelet packet coefficients:
$$E_i^{n} = \sum_{k}\left|d_i^{n}(k)\right|^{2}.$$
After $i$-level WPA decomposition, the signal is divided into $2^{i}$ frequency bands. The energy of each band is calculated using the RMS formulation established in Equation (7). Figure 1 illustrates the decomposition process for $i = 3$, where the original signal is progressively split into eight frequency bands.
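The recursive band splitting and per-band energy calculation can be sketched compactly with the Haar filter pair, chosen here purely for brevity (in practice a library such as PyWavelets with a Daubechies wavelet would be used). Note that the terminal nodes below appear in the natural (Paley) order of the packet tree, not in ascending frequency order.

```python
import math

def haar_split(x):
    """One Haar analysis step: half-band low-pass and high-pass outputs."""
    low  = [(x[2 * k] + x[2 * k + 1]) / math.sqrt(2) for k in range(len(x) // 2)]
    high = [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2) for k in range(len(x) // 2)]
    return low, high

def wp_energies(x, levels=3):
    """Full wavelet packet tree: split every node at every level, then return
    the energy (sum of squared coefficients) of each of the 2**levels bands."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return [sum(c * c for c in node) for node in nodes]
```

Because the Haar filter bank is orthonormal, the band energies sum exactly to the total signal energy, directly illustrating the Parseval property used in Equation (7).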
2.2. Feature-Based AFF–AAKR Fusion
The adaptive feature fusion (AFF) method constructs a health indicator (HI) through weighted averaging. Its core principle involves dynamically adjusting weights based on the correlations among features to reflect their degradation trends over time [14]. Specifically, assuming a bearing has $n$ features, the feature set is denoted as $\{x_1, x_2, \ldots, x_n\}$, and the feature vector length is $m$. The observation matrix $X$, composed of the $n$ features, has a dimension of $m \times n$. For the feature value $x_j(t)$ at time $t$, its average distance $\bar{d}_j(t)$ to the other features at time $t$ is calculated as
$$\bar{d}_j(t) = \frac{1}{n - 1}\sum_{k \ne j}\left|x_j(t) - x_k(t)\right|.$$
A smaller $\bar{d}_j(t)$ indicates stronger correlations between $x_j(t)$ and the other features, resulting in a larger weight $w_j(t)$:
$$w_j(t) = \frac{1 / \left(\bar{d}_j(t) + \varepsilon\right)}{\sum_{k=1}^{n} 1 / \left(\bar{d}_k(t) + \varepsilon\right)},$$
where $\varepsilon$ is a smoothing parameter. The HI curve generated by the AFF algorithm is
$$z(t) = \sum_{j=1}^{n} w_j(t)\,x_j(t).$$
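A minimal sketch of the inverse-distance weighting just described. The exact normalization in [14] may differ; here the weights are assumed proportional to 1/(d̄_j(t) + ε) and normalized to sum to one at each time step, and the features are assumed to be pre-normalized to a common scale.

```python
import numpy as np

def aff_fuse(X, eps=1e-6):
    """Adaptive feature fusion sketch: at each time step, weight each feature
    inversely to its average absolute distance from the other features, then
    take the weighted average as the fused health indicator value."""
    X = np.asarray(X, dtype=float)     # shape (T, n): T time steps, n features
    T, n = X.shape
    hi = np.empty(T)
    for t in range(T):
        # average absolute distance of feature j to the others at time t
        d = np.abs(X[t][:, None] - X[t][None, :]).sum(axis=1) / (n - 1)
        w = 1.0 / (d + eps)            # eps avoids division by zero
        w /= w.sum()
        hi[t] = w @ X[t]
    return hi
```

Features that agree with the consensus receive larger weights, so isolated outlier features are automatically down-weighted at each time step.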
The AFF method offers computational efficiency and spatial compression but often inadequately reflects degradation trends. To enhance degradation characterization, auto-associative kernel regression (AAKR) is adopted over alternative regression methods for three key reasons:
- (1)
Nonlinear Mapping: Unlike linear methods such as principal component analysis (PCA), AAKR captures nonlinear relationships between observation vectors and health baselines through kernel functions, which is critical for complex degradation processes.
- (2)
Residual Sensitivity: Compared to Gaussian process regression (GPR), AAKR directly quantifies deviations via Euclidean residuals, providing clearer degradation quantifiers than probabilistic outputs.
- (3)
Computational Feasibility: While long short-term memory (LSTM) networks model temporal dependencies effectively, they require large labeled datasets and extensive training. AAKR avoids these constraints with minimal training overhead.
Furthermore, the bearing test data in this study exhibit complex non-stationary degradation patterns requiring nonlinear modeling, benefit from direct residual-based degradation quantifiers, and are derived from limited experimental runs, making AAKR’s nonlinear capability, residual sensitivity, and computational efficiency particularly well-suited.
AAKR maps observation vectors to a health space and constructs HI curves by calculating residuals between observation vectors and reconstructed signals [15]. For a rolling bearing with $p$ monitoring parameters, the observation vector $\mathbf{x}(t)$ at time $t$ is expressed as
$$\mathbf{x}(t) = \left[x_1(t), x_2(t), \ldots, x_p(t)\right].$$
The health space matrix $H$, composed of $m$ health vectors $\mathbf{h}_i \in \mathbb{R}^{p}$, is defined as
$$H = \left[\mathbf{h}_1; \mathbf{h}_2; \ldots; \mathbf{h}_m\right] \in \mathbb{R}^{m \times p}.$$
The mapping of $\mathbf{x}(t)$ in the health space is
$$\hat{\mathbf{x}}(t) = \frac{\sum_{i=1}^{m} w_i\,\mathbf{h}_i}{\sum_{i=1}^{m} w_i},$$
where the weights $w_i$ are calculated as
$$w_i = K\!\left(\left\|\mathbf{x}(t) - \mathbf{h}_i\right\|\right).$$
The kernel function $K$ typically employs a Gaussian form:
$$K(d) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right).$$
The HI value at time $t$ is the Euclidean distance between $\mathbf{x}(t)$ and its reconstructed vector $\hat{\mathbf{x}}(t)$:
$$\mathrm{HI}(t) = \left\|\mathbf{x}(t) - \hat{\mathbf{x}}(t)\right\|_2.$$
The AAKR method demonstrates superior performance in constructing health indicator (HI) curves that better align with bearing degradation trends, while quantitatively measuring the temporal discrepancies between observation matrices and health matrices. However, this method exhibits significant computational overhead, particularly when processing large initial datasets, due to the extensive vector operations involved.
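The AAKR reconstruction-and-residual loop can be sketched as follows. The Gaussian normalization constant is dropped because it cancels in the weighted average, and the bandwidth default σ = 0.1 follows the setting given later in Section 3.4; both the exemplar matrix `H` and the observations here are assumed to be on a common normalized scale.

```python
import numpy as np

def aakr_hi(X_obs, H, sigma=0.1):
    """Auto-associative kernel regression sketch: reconstruct each observation
    as a Gaussian-kernel-weighted average of healthy exemplars, and use the
    Euclidean residual as the health indicator."""
    X_obs = np.asarray(X_obs, dtype=float)   # (T, p) observations over time
    H = np.asarray(H, dtype=float)           # (m, p) healthy exemplar vectors
    hi = np.empty(len(X_obs))
    for t, x in enumerate(X_obs):
        d2 = np.sum((H - x) ** 2, axis=1)            # squared distances to exemplars
        w = np.exp(-d2 / (2.0 * sigma ** 2))         # unnormalized Gaussian weights
        w_sum = w.sum()
        x_hat = (w @ H) / w_sum if w_sum > 0 else H.mean(axis=0)
        hi[t] = np.linalg.norm(x - x_hat)            # residual = degradation measure
    return hi
```

Observations close to the health space reconstruct almost perfectly (residual near zero), while degraded observations cannot be expressed as a blend of healthy exemplars and produce a growing residual.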
To integrate the strengths of both AFF and AAKR methods while mitigating their limitations, this study proposes an AFF–AAKR feature fusion approach. The method first employs AFF-based spatial compression to preliminarily reduce the multi-dimensional feature matrix X into a one-dimensional feature vector Z, thereby effectively decreasing the data volume.
Subsequently, AAKR further fuses the compressed vector $Z$ in the temporal domain by quantifying residuals between observation vectors and health space reconstructions:
$$\mathrm{HI}(t) = \left|z(t) - \hat{z}(t)\right|,$$
where $\hat{z}(t)$ is the reconstructed value of $z(t)$ in the health space. This dual fusion strategy not only resolves AAKR’s computational inefficiency but also enhances HI curve performance.
2.3. Comprehensive Health State Evaluation of Bearings via Multi-Information Fusion Using Entropy Weight Method
The entropy weight method (EWM) is a multi-criteria weighting approach that integrates subjective and objective factors. It dynamically adjusts weights by quantifying indicator variability and is widely applied in engineering decision-making and condition assessment [31]. The core workflow comprises four key steps: data standardization, information entropy calculation, weight determination, and comprehensive evaluation.
Significant differences in data dimensions and ranges necessitate standardization to eliminate scale effects and enhance comparability. Common methods include the following:
Min–max normalization, which linearly maps data to the [0, 1] interval:
$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}.$$
This method is straightforward but sensitive to outliers, potentially distorting data distributions.
Z-score standardization, which transforms data using the mean and standard deviation:
$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},$$
where $\mu_j$ is the mean and $\sigma_j$ is the standard deviation of the $j$-th indicator. This approach preserves outlier information and offers superior stability for complex data distributions.
Information entropy reflects the degree of disorder of the indicator data: lower entropy indicates greater variability and higher information content. The steps are as follows:
Probability proportion: For the standardized data, compute the proportion of the $i$-th sample under the $j$-th indicator:
$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{m} x'_{ij}}.$$
Entropy value: Calculate the entropy $E_j$ for the $j$-th indicator:
$$E_j = -\frac{1}{\ln m}\sum_{i=1}^{m} p_{ij}\ln p_{ij}.$$
When $p_{ij} = 0$, define $p_{ij}\ln p_{ij} = 0$ to avoid computational invalidity.
EWM derives weights inversely from entropy values: lower entropy corresponds to higher weights.
Redundancy calculation: Define the information redundancy degree $d_j = 1 - E_j$.
Weight allocation: Normalize the redundancy to obtain the final weights:
$$w_j = \frac{d_j}{\sum_{j=1}^{n} d_j},$$
where $m$ is the total number of samples. The weight vector satisfies $\sum_{j=1}^{n} w_j = 1$, ensuring the rationality of the comprehensive evaluation.
The comprehensive health indicator (HI) for each sample is derived by the weighted summation of the normalized data:
$$\mathrm{HI}_i = \sum_{j=1}^{n} w_j\,x'_{ij}.$$
The HI score intuitively reflects the bearing health status: higher values indicate better conditions, providing quantitative support for maintenance decisions. The workflow of the entropy weight method is illustrated in Figure 2.
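The four-step EWM workflow above can be condensed into a short function. Min–max normalization is assumed for the standardization step; samples are rows and indicators are columns.

```python
import math

def entropy_weights(data):
    """Entropy weight method sketch: min-max normalize each indicator (column),
    compute its information entropy, and weight indicators by redundancy 1 - E."""
    m = len(data)                              # number of samples
    cols = list(zip(*data))                    # one tuple per indicator
    norm = [[(v - min(c)) / (max(c) - min(c)) if max(c) > min(c) else 0.0
             for v in c] for c in cols]        # min-max normalization per indicator
    weights = []
    for c in norm:
        s = sum(c)
        p = [v / s if s > 0 else 0.0 for v in c]
        # entropy; the p*ln(p) term is defined as 0 when p == 0
        e = -sum(v * math.log(v) for v in p if v > 0) / math.log(m)
        weights.append(1.0 - e)                # redundancy d_j = 1 - E_j
    total = sum(weights)
    return [w / total for w in weights]
```

An indicator whose normalized values are highly concentrated (low entropy) receives a larger weight, matching the inverse-entropy principle stated above.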
2.4. Comprehensive HI Evaluation Method Based on Monotonicity, Correlation, and Robustness
To systematically evaluate the performance characteristics of health indicators (HI), this study employs a three-dimensional evaluation system based on monotonicity, correlation, and robustness. A comprehensive quantitative metric is formed through weighted fusion.
First, the HI is decomposed into a trend component and a stochastic component:
$$\mathrm{HI}(t) = \mathrm{HI}_T(t) + \mathrm{HI}_R(t),$$
where $\mathrm{HI}(t)$ represents the HI value at time $t$, $\mathrm{HI}_T(t)$ denotes the trend component, and $\mathrm{HI}_R(t)$ represents the stochastic component.
Monotonicity measures the consistency of the HI trend evolution, quantified via a sign consistency test of the trend term differences. A higher value indicates a more significant degradation trend, which facilitates the identification of life stages:
$$\mathrm{Mon} = \frac{1}{N - 1}\left|\sum_{t=1}^{N-1}\mathrm{sgn}\!\left(\Delta \mathrm{HI}_T(t)\right)\right|,$$
where $\Delta \mathrm{HI}_T(t) = \mathrm{HI}_T(t+1) - \mathrm{HI}_T(t)$, and $N$ is the length of the HI sequence.
Correlation evaluates the association strength between the HI trend component and the physical degradation process of the equipment, measured using a correlation coefficient method to assess the temporal correlation. A higher value reflects superior degradation characterization accuracy.
Robustness characterizes the HI’s resistance to random disturbances, evaluated through residual error analysis between the trend component and the original HI. A higher value indicates stronger noise resistance and better fluctuation suppression.
The CI provides a unified quantitative assessment of health indicator performance by integrating monotonicity, correlation, and robustness through weighted fusion. The coefficient vector is set following the study by Wu et al. [31]. The CI formulation is given by Equation (28), where higher CI values (0 ≤ CI ≤ 1) indicate superior degradation characterization capability. This metric serves as the primary benchmark for the comparative method evaluation in Section 4.
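The three metrics and their weighted fusion can be sketched as below. The exact coefficient vector from Wu et al. [31] is not reproduced here, so equal weights of 1/3 are assumed for illustration; the robustness form (averaged exponential of the relative residual) and the Pearson time-correlation are common conventions and may differ from the paper's exact definitions.

```python
import math

def evaluate_hi(hi, trend, weights=(1/3, 1/3, 1/3)):
    """Monotonicity, time-correlation, and robustness of a health indicator,
    fused into a comprehensive index CI by weighted summation.
    `trend` is the smoothed trend component; hi - trend is the residual."""
    n = len(hi)
    diffs = [trend[t + 1] - trend[t] for t in range(n - 1)]
    mon = abs(sum(1 if d > 0 else -1 if d < 0 else 0 for d in diffs)) / (n - 1)
    # Pearson correlation between the trend component and time
    t_bar = (n - 1) / 2.0
    h_bar = sum(trend) / n
    num = sum((trend[t] - h_bar) * (t - t_bar) for t in range(n))
    den = math.sqrt(sum((v - h_bar) ** 2 for v in trend) *
                    sum((t - t_bar) ** 2 for t in range(n)))
    corr = abs(num / den) if den > 0 else 0.0
    # robustness: exponential of the relative residual, averaged over time
    rob = sum(math.exp(-abs((hi[t] - trend[t]) / hi[t]))
              for t in range(n) if hi[t] != 0) / n
    w1, w2, w3 = weights
    return w1 * mon + w2 * corr + w3 * rob
```

A noiseless, strictly increasing linear HI scores 1 on all three metrics and therefore CI = 1, the upper bound of the index.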
3. Proposed Methodology
3.1. Feature Selection Mechanism Based on Multi-Dimensional Sensitivity Evaluation and Composite Index Optimization
The core of the feature selection mechanism lies in constructing a feature space that characterizes the sensitivity of bearing performance degradation, achieving information concentration by eliminating low-sensitivity features. To establish a quantitative association analysis between feature parameters and degradation trends, a sensitivity evaluation system based on statistical properties is required. This study proposes an enhanced dual-index evaluation framework, constructing a multi-dimensional assessment space using the trend persistence index (TPI) and trend monotonicity index (TMI), and further optimizes the comprehensive quantitative evaluation of feature performance by incorporating the signal complexity index (SCI).
The trend persistence index (TPI) quantifies the consistency of consecutive trend directions in a feature sequence, reflecting the stability of the degradation process. Its mathematical formulation is
$$\mathrm{TPI} = \frac{1}{N - 2}\sum_{t=1}^{N-2} I\!\left(\mathrm{sgn}\!\left(\Delta x_{t+1}\right) = \mathrm{sgn}\!\left(\Delta x_{t}\right)\right),$$
where $\Delta x_t = x_{t+1} - x_t$, $I(\cdot)$ is the indicator function (outputs 1 for consecutive same-sign differences, otherwise 0), and $N$ is the sequence length. The TPI value range is [0, 1], with higher values indicating stronger persistence in degradation trends.
The trend monotonicity index (TMI) quantifies the consistency of evolutionary direction in a feature sequence, defined as
$$\mathrm{TMI} = \frac{1}{N - 1}\left|\sum_{t=1}^{N-1}\mathrm{sgn}\!\left(x_{t+1} - x_t\right)\right|,$$
where $\mathrm{sgn}(\cdot)$ is the sign function, and $N$ is the feature sequence length. This index reflects the stability of degradation trends through cumulative directional derivatives, with a value range of [0, 1]. Higher values indicate more significant monotonic interpretability of the degradation process.
The signal complexity index (SCI) evaluates the regularity of feature sequences based on sample entropy (SampEn), defined as
$$\mathrm{SCI} = 1 - \frac{\mathrm{SampEn}(m, r)}{\mathrm{SampEn}_{\max}},$$
where $\mathrm{SampEn}(m, r)$ is the sample entropy value, $m$ is the embedding dimension, and $r$ is the similarity tolerance. Normalization by the maximum sample entropy $\mathrm{SampEn}_{\max}$ ensures the SCI value range is [0, 1], with higher values indicating lower signal complexity and stronger correlation with degradation regularity.
These three complementary evaluation metrics are integrated into a composite sensitivity index (CSI) for comprehensive feature performance quantification:
$$\mathrm{CSI} = \mathrm{TPI} \times \mathrm{TMI} \times \mathrm{SCI}.$$
Since the TPI, TMI, and SCI are all normalized (range [0, 1]), the CSI value also lies in [0, 1]. The multiplicative operator in the CSI strengthens the synergistic constraints among the metrics, ensuring a strict positive correlation between CSI levels and the degradation characterization capability of features.
Based on descending CSI rankings, this study employs statistical significance testing to determine the feature selection threshold. By setting an empirical threshold $\tau$, features satisfying $\mathrm{CSI} \ge \tau$ are selected to form a sensitive feature subset. This mechanism effectively retains features with strong degradation characterization capabilities, providing a robust feature foundation for subsequent degradation modeling.
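The TPI, TMI, and their multiplicative fusion can be sketched directly; the SCI term is taken as a precomputed normalized value in [0, 1], since a full sample entropy implementation is beyond the scope of this sketch.

```python
def tpi(x):
    """Trend persistence: fraction of consecutive difference pairs
    that share the same sign."""
    d = [x[t + 1] - x[t] for t in range(len(x) - 1)]
    same = sum(1 for t in range(len(d) - 1) if d[t] * d[t + 1] > 0)
    return same / (len(d) - 1)

def tmi(x):
    """Trend monotonicity: absolute net direction of the differences."""
    d = [x[t + 1] - x[t] for t in range(len(x) - 1)]
    signs = sum(1 if v > 0 else -1 if v < 0 else 0 for v in d)
    return abs(signs) / len(d)

def csi(x, sci):
    """Composite sensitivity index: multiplicative fusion of the three
    normalized metrics. `sci` (normalized complexity score in [0, 1])
    is assumed to be precomputed from sample entropy."""
    return tpi(x) * tmi(x) * sci
```

A strictly increasing sequence scores 1 on both trend metrics, while an alternating sequence scores 0 on persistence, illustrating why the product form filters out oscillatory features.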
3.2. Feature Re-Screening Method Based on Fast Clustering and Regularized Information Entropy
After completing sensitivity-based feature selection, the obtained feature subset eliminates low-sensitivity parameters but still suffers from redundancy due to information-homologous features. The overlapping degradation–informative parameter sets significantly increase the risk of the ‘curse of dimensionality’ in feature fusion and lead to exponential growth in computational complexity. To address this, this study proposes a secondary screening mechanism based on fast clustering and regularized information entropy, achieving efficient feature space optimization through adaptive clustering and sparse constraints.
A symmetric similarity matrix is constructed using mutual information to quantify the statistical dependencies between features:
$$S_{ij} = \frac{I\!\left(f_i; f_j\right)}{H\!\left(f_i, f_j\right)},$$
where $I(f_i; f_j)$ is the mutual information between features $f_i$ and $f_j$, and $H(f_i, f_j)$ denotes their joint entropy. This design accurately captures nonlinear feature correlations, avoiding the distributional assumptions inherent in traditional correlation coefficients.
Features are partitioned using an enhanced K-means++ algorithm, where the optimal cluster number K is automatically determined via a silhouette–elbow joint criterion:
The elbow method computes the within-cluster sum of squared errors (SSE) for varying K, selecting K at the ‘elbow’ inflection point.
The silhouette coefficient evaluates intra-cluster compactness and inter-cluster separation simultaneously, choosing K that maximizes the silhouette score.
For each feature cluster $C_k$, an L1-regularized information entropy weight is assigned:
$$w_j = \frac{\max\!\left(H(f_j) - \lambda,\, 0\right)}{\sum_{f_i \in C_k}\max\!\left(H(f_i) - \lambda,\, 0\right)}, \quad f_j \in C_k,$$
where $H(f_j)$ is the feature’s information entropy, and $\lambda$ is a sparsity coefficient. The sparse constraint suppresses redundant feature weights, retaining parameters with high information density and low correlation.
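The normalized mutual-information similarity used for the clustering step can be estimated from data with simple equal-width histograms, as sketched below; the bin count is an assumption, and more sophisticated MI estimators (e.g., k-nearest-neighbor estimators) would be used in practice.

```python
import math
from collections import Counter

def mi_similarity(x, y, bins=8):
    """Mutual-information similarity normalized by joint entropy:
    S = I(X; Y) / H(X, Y), estimated with equal-width histograms."""
    def discretize(v):
        lo, hi = min(v), max(v)
        if hi == lo:
            return [0] * len(v)
        return [min(int((u - lo) / (hi - lo) * bins), bins - 1) for u in v]
    xd, yd = discretize(x), discretize(y)
    n = len(xd)
    pxy = Counter(zip(xd, yd))                 # joint bin counts
    px, py = Counter(xd), Counter(yd)          # marginal bin counts
    h_xy = -sum((c / n) * math.log(c / n) for c in pxy.values())
    mi = sum((c / n) * math.log(c * n / (px[a] * py[b]))
             for (a, b), c in pxy.items())
    return mi / h_xy if h_xy > 0 else 1.0
```

Two identical features give S = 1 and a feature paired with a constant gives S = 0, so the similarity matrix is bounded in [0, 1] as required by the clustering stage.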
3.3. Dual-Criteria Adaptive Health State Partitioning Model
To address the nonlinear degradation characteristics of rolling bearing health indicator (HI) curves, this study develops a DC-ABUM algorithm. The core idea is to explicitly decouple degradation stages via piecewise linear approximation (PLA).
Algorithm Framework:
Fitting error threshold: Terminate the iterations if the maximum segment fitting error exceeds a preset threshold $\varepsilon_{\max}$.
Segment number constraint: Terminate the optimization when the predefined segment count $M$ is reached. Mathematically, this solves
$$\min_{S}\sum_{s_i \in S}\left\|\mathbf{y}_i - \hat{\mathbf{y}}_i\right\|^{2} \quad \text{s.t.} \quad |S| = M \ \ \text{or} \ \ \max_i\left\|\mathbf{y}_i - \hat{\mathbf{y}}_i\right\|^{2} \le \varepsilon_{\max},$$
where $\hat{\mathbf{y}}_i$ is the linear fit for the $i$-th segment, and $\|\cdot\|$ denotes the Euclidean norm.
For any sub-segment $s = \{(t_k, y_k)\}$, its linear model $\hat{y} = a t + b$ is solved via least squares:
$$\min_{a, b}\sum_{k}\left(y_k - a t_k - b\right)^{2}.$$
The solved slope $a$ and intercept $b$ are
$$a = \frac{\sum_k (t_k - \bar{t})(y_k - \bar{y})}{\sum_k (t_k - \bar{t})^{2}}, \qquad b = \bar{y} - a\,\bar{t}.$$
The merge cost between adjacent segments $s_i$ and $s_{i+1}$ is defined as the incremental fitting error before and after merging:
$$C\!\left(s_i, s_{i+1}\right) = E\!\left(s_i \cup s_{i+1}\right) - E\!\left(s_i\right) - E\!\left(s_{i+1}\right),$$
where $E(\cdot)$ denotes the least-squares fitting error of a segment. Lower merge costs indicate higher consistency in linear trends, and such merges are prioritized to simplify the model.
Step 1: Initialization
Divide the HI sequence of length $N$ into $N$ atomic segments $S = \{s_1, s_2, \ldots, s_N\}$, each containing a single data point.
Step 2: Cost Calculation
Compute the merge costs $C(s_i, s_{i+1})$ for all adjacent segment pairs and store them in a priority queue $Q$.
Step 3: Iterative Optimization
Extract the pair $(s_i, s_{i+1})$ with the minimal cost from $Q$ and merge them into $s_{\mathrm{new}}$.
Update $Q$: Remove the costs associated with the original pairs, compute the new costs of $s_{\mathrm{new}}$ with its neighbors, and reinsert them into $Q$.
Terminate if $|S| \le M$ or $C_{\min} > \varepsilon_{\max}$.
Step 4: Output
Return the final segmentation $S = \{s_1, s_2, \ldots, s_M\}$, completing the health state partitioning.
The pseudocode is presented in Algorithm 1.
Algorithm 1 Health State Partitioning
1: Input: HI_sequence, M, ε_max
2: Output: S (final segmentation)
3: Initialize:
4: S = {s_1, s_2, …, s_N}, where each s_i contains a single data point from HI_sequence.
5: Q = priority queue of merge costs C(s_i, s_{i+1}) for all adjacent pairs.
6: C(·, ·) = merge cost function.
7: Iterative Merging:
8: while |S| > M and min(Q) ≤ ε_max do
9:   (s_i, s_{i+1}) ← extract-min(Q)
10:  s_new ← s_i ∪ s_{i+1} {Merge segments}
11:  Update Queue:
12:  Remove old pair costs: C(s_{i−1}, s_i), C(s_{i+1}, s_{i+2})
13:  Insert new costs: C(s_{i−1}, s_new), C(s_new, s_{i+2})
14: end while
15: Termination:
16: if |S| ≤ M or min(Q) > ε_max then
17:  Stop
18: end if
19: Output: S.
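The bottom-up merging loop can be sketched as follows. Two simplifications relative to the pseudocode are assumed for clarity: segments are initialized as 2-point atoms (so every segment admits a non-degenerate line fit), and the cheapest merge is found by a linear scan rather than a priority queue, which changes the complexity but not the result.

```python
def sse(y, start, end):
    """Sum of squared residuals of a least-squares line over y[start:end]."""
    n = end - start
    xs = range(start, end)
    xm = sum(xs) / n
    ym = sum(y[start:end]) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y[x] - ym) for x in xs)
    a = sxy / sxx if sxx > 0 else 0.0
    b = ym - a * xm
    return sum((y[x] - (a * x + b)) ** 2 for x in xs)

def bottom_up_segment(y, m_target, err_max):
    """Dual-criteria bottom-up merging: repeatedly merge the adjacent pair with
    the smallest incremental fitting error, stopping when the segment count
    reaches m_target or the cheapest merge would exceed err_max."""
    bounds = sorted(set(list(range(0, len(y), 2)) + [len(y)]))

    def cost(i):  # incremental error of merging segments i and i + 1
        return (sse(y, bounds[i], bounds[i + 2])
                - sse(y, bounds[i], bounds[i + 1])
                - sse(y, bounds[i + 1], bounds[i + 2]))

    while len(bounds) - 1 > m_target:
        i_best = min(range(len(bounds) - 2), key=cost)
        if cost(i_best) > err_max:
            break                     # cheapest merge exceeds the error budget
        del bounds[i_best + 1]        # merge segments i_best and i_best + 1
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```

On a curve made of a flat stage followed by a linear rise, zero-cost merges absorb everything on each side of the knee, so the surviving boundary lands exactly at the turning point.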
3.4. Algorithm Implementation Steps
The implementation workflow of the proposed algorithm is illustrated in
Figure 3. First, the vibration and temperature signals are synchronously acquired with temporal alignment, where the vibration signals are sampled at 25.6 kHz and the temperature signals at 1 Hz. Time-domain statistics, frequency-domain parameters, and wavelet packet energy features (using decomposition level i = 3) are extracted from the vibration signals, while temperature signal features, including the moving root mean square (RMS) (window size = 60 s) and the gradient magnitude, are derived to comprehensively characterize the bearing degradation. Subsequently, the multi-dimensional sensitivity evaluation and composite index optimization-based feature selection mechanism is applied for preliminary feature screening with the CSI threshold τ = 0.3. Then, the fast clustering and regularized information entropy-based re-screening method is employed to achieve efficient optimization and reconstruction of the feature space, where the mutual information clustering uses K-means++ (K = 3 clusters), and the entropy weighting applies an L1-regularization coefficient λ = 0.1. Next, the AFF method with a smoothing parameter ε = 1 × 10⁻⁶ dynamically weights and compresses the multi-dimensional features to generate a one-dimensional feature sequence. The AAKR method with a Gaussian kernel bandwidth σ = 0.1 is then used to quantify the residuals between the observation vectors and the health space, constructing a degradation-sensitive HI curve that balances computational efficiency and trend characterization. Afterward, the information entropy values of the vibration and temperature HIs are calculated separately, dynamically weighted, and fused to generate a comprehensive HI curve, enhancing the robustness of the degradation representation. Finally, the dual-criteria health state partitioning method with a maximum fitting error ε_max = 0.05 and a target segment count M = 3 divides the operational states of the bearing into distinct phases (normal, incipient fault, severe degradation).
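The temperature-channel features named in the workflow (moving RMS over a 60 s window at 1 Hz, and gradient magnitude) can be sketched as below. The trailing-window convention and the boundary handling of the gradient (one-sided differences at the ends) are assumptions, since the paper does not specify them.

```python
import math

def temperature_features(temp, window=60):
    """Moving RMS (trailing window, shorter at the start) and gradient
    magnitude (central difference, one-sided at the ends) of a 1 Hz
    temperature series."""
    n = len(temp)
    rms, grad = [], []
    for t in range(n):
        w = temp[max(0, t - window + 1):t + 1]
        rms.append(math.sqrt(sum(v * v for v in w) / len(w)))
        if t == 0:
            g = temp[1] - temp[0] if n > 1 else 0.0
        elif t == n - 1:
            g = temp[t] - temp[t - 1]
        else:
            g = (temp[t + 1] - temp[t - 1]) / 2.0   # central difference
        grad.append(abs(g))
    return rms, grad
```

A steady temperature yields a constant moving RMS and zero gradient, while a heating trend caused by friction degradation shows up directly in the gradient magnitude.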