1. Introduction
Rotating machinery, such as aircraft engines, high-speed turbines, and industrial pumps, is the core power source of modern industrial systems, and its stable and reliable operation is crucial to ensuring production safety, reducing economic losses, and improving operational efficiency [1,2]. Reliable prognostics of the remaining useful life (RUL) of rotating machinery are fundamental to predictive maintenance, which helps avoid unplanned downtime, reduce maintenance costs, and ensure the safe and continuous operation of industrial equipment [3,4]. However, the degradation process of rotating machinery is nonlinear, nonstationary, and affected by multiple complex factors, which makes RUL prediction highly challenging [5,6,7].
Driven by the increasingly stringent reliability and safety requirements in modern industrial systems, RUL forecasting for rotating machinery has emerged as a paramount focal point within reliability engineering. Existing prognostic approaches can be broadly categorized into two dominant paradigms: model-based methods [8] and data-driven methods [9]. Traditional model-based approaches fundamentally rely on the explicit physical degradation mechanisms of equipment to formulate mathematical representations. Prominent techniques in this category encompass stochastic process models, such as the Wiener process [10,11,12] and Gamma process [13,14,15], as well as Bayesian state estimation algorithms [16,17], notably particle filtering (PF) [18,19] and extended Kalman filtering (EKF) [20,21]. The primary advantage of these mechanistic models lies in their transparent physical significance and robust mathematical interpretability, making them highly effective for scenarios with well-defined degradation dynamics. For instance, Lim et al. [21] introduced a multi-modal prognostic framework integrating a switching Kalman filter ensemble, which effectively mitigated prediction uncertainties in complex degradation processes. Similarly, Cui et al. [19] formulated a comprehensive RUL estimation strategy for rolling bearings utilizing time-varying particle filtering, thus improving the model’s adaptability to time-dependent degradation characteristics. Despite their theoretical soundness, the successful implementation of physical models necessitates a profound prior understanding of the equipment’s internal structural dynamics and fatigue evolution mechanisms. In practice, modern rotating machinery operates under highly coupled and variable conditions, rendering the construction of precise physical models exceptionally challenging. Furthermore, the reliance on complex differential equations often leads to an arduous parameter calibration process, which severely restricts their generalizability and broader engineering applications.
To circumvent the inherent bottlenecks associated with explicit physical modeling, data-driven methodologies have emerged as a highly promising alternative. These methods bypass the need for prior mechanistic knowledge, inferring degradation trajectories by directly extracting latent feature representations from voluminous condition monitoring data. Representative algorithms encompass conventional machine learning models, deep learning architectures, and nonlinear adaptive filters, such as Support Vector Machines (SVM) [22,23], Deep Belief Networks (DBN) [24,25], Convolutional Neural Networks (CNN) [26,27] and Long Short-Term Memory (LSTM) networks [28,29]. Considerable progress has been achieved within this paradigm. For example, Shen et al. [22] designed a novel transfer learning model based on SVM, effectively mitigating domain shifts and improving the prognostic adaptability of rolling bearings across disparate working conditions. Furthermore, Islam et al. [30] developed a recursive support vector regression approach that captures temporal dependencies to sequentially evaluate the RUL of rolling bearings. More recently, state-of-the-art prognostic frameworks have progressively incorporated advanced deep learning architectures to further push the boundaries of RUL prediction. For instance, Transformer models and self-attention mechanisms have been increasingly utilized to effectively capture complex long-range temporal dependencies in degradation sequences [31,32,33]. Additionally, Physics-Informed Neural Networks (PINNs) [34] and Graph Neural Networks (GNNs) [35] have emerged as powerful tools, integrating explicit physical degradation laws and spatial-temporal topologies into data-driven models to improve generalization capabilities under varying working conditions. While deep learning approaches (e.g., DBNs) exhibit powerful feature extraction capabilities, they are often hindered by their “black-box” nature and heavy reliance on massive amounts of run-to-failure training data, which are rarely available in actual industrial settings [36]. In contrast, kernel adaptive learning (KAL) methods [37,38,39] have recently attracted significant attention as an elegant online nonlinear filtering technology. By employing the kernel trick, KAL efficiently projects low-dimensional nonlinear input signals into a high-dimensional reproducing kernel Hilbert space (RKHS) to map complex degradation patterns. Compared to highly parameterized neural networks, KAL uniquely retains a degree of mathematical transparency and structural interpretability. Additionally, it boasts low computational overhead and ease of deployment, demonstrating robust predictive performance in practical small-sample scenarios where large-scale data collection is prohibitive.
In recent years, fractional derivative technology has been gradually applied to RUL prediction, which can effectively capture the memory and hereditary properties of nonstationary signals and better describe the long-range dependence of rotating machinery degradation processes [40]. For example, researchers have proposed fractional derivative-based learning methods such as FrKLMS [41] and FrKRLS [42], which have achieved better prediction results than traditional kernel learning methods. Although fractional derivative technology has shown great promise in describing the nonstationary characteristics of degradation signals, a critical theoretical gap remains: existing fractional derivative-based prediction methods have not been fundamentally integrated with the robust nonlinear optimization of kernel adaptive learning. Conventional KAL algorithms predominantly utilize integer-order gradient descent, which strictly depends on instantaneous prediction errors and inherently lacks the capacity to retain historical degradation information. Conversely, existing fractional approaches (e.g., FrKLMS and FrKRLS) often rely on standard Mean Square Error (MSE) criteria and computationally heavy fractional definitions, making them highly vulnerable to non-Gaussian impulsive noises and difficult to implement efficiently. To the best of our knowledge, no existing research has successfully embedded the Hadamard fractional derivative directly into the reproducing kernel Hilbert space (RKHS) optimization framework in conjunction with a multi-kernel mixture measure. By doing so, our proposed method redesigns the weight-updating mechanism to mathematically encode both “memory capacity” and complex nonlinear feature mapping, bridging a crucial gap in current prognostic methodologies.
To address these deficiencies, namely insufficient labeled data, poor model interpretability, and the difficulty of capturing nonstationary degradation characteristics in RUL prediction of rotating machinery, this study combines the fractional derivative with kernel adaptive learning to propose a new RUL prediction method. The fractional derivative enhances the ability to capture the nonstationary and long-range dependent characteristics of degradation signals, while kernel adaptive learning improves the nonlinear fitting ability and interpretability of the model, providing a new technical route for practical engineering applications.
The pivotal contributions of this study are delineated as follows:
(1) The Hadamard fractional derivative is innovatively incorporated into the algorithm’s gradient descent and weight-updating mechanism. Unlike traditional integer-order models that solely rely on short-term instantaneous errors, this fractional operator mathematically encodes the “memory capacity” and “hereditary properties” of physical equipment, empowering the model to accurately extract complex long-range temporal dependencies native to structural degradation.
(2) A multi-kernel mixture (MKM) measure is uniquely integrated into the fractional-order optimization objective to replace the conventional mean square error. This structural enhancement significantly improves the algorithm’s operational robustness, effectively suppressing the adverse impacts of high-level noises and extreme measurement outliers prevalent in harsh industrial settings.
(3) By deeply integrating the aforementioned techniques into an adaptive learning paradigm, the proposed method yields a mathematically transparent “white-box” architecture. It successfully circumvents the opaque “black-box” limitations and heavy training data dependency of current deep learning models, making it highly advantageous for real-world small-sample scenarios.
3. Proposed Method
3.1. Overall Process Framework
Unlike contemporary end-to-end deep learning approaches that often function as opaque “black boxes” and heavily rely on massive labeled datasets, the proposed framework leverages a white-box analytical architecture driven by the fractional derivative multi-kernel adaptive learning algorithm.
The overall process framework is divided into six interconnected phases, as shown in Figure 2.
(1) Data Acquisition
The prognostic pipeline begins with the collection of physical operational data. On mechanical accelerated life testing platforms, horizontal and vertical high-frequency accelerometers continuously capture the raw, high-dimensional vibration signals of the rotating machinery. This monitoring spans the entire lifecycle, from a pristine health state to complete structural failure, under varying operational conditions, such as distinct rotational speeds and radial loads.
(2) Feature Extraction
Because raw vibration signals are highly susceptible to environmental noise and lack intuitive degradation signatures, time-domain feature extraction is performed to obtain reliable, low-dimensional health indicators. The primary degradation indicator is the Maximum Amplitude (MA) of the vibration signal, which quantitatively characterizes the macroscopic, progressive degradation trajectory of the bearing over its lifecycle. In parallel, the statistical kurtosis feature is extracted; owing to its high sensitivity to early-stage transient impulses, it is used to identify incipient structural anomalies and to demarcate health states.
(3) State Division
To conserve computational resources and avoid premature or erroneous prognostic estimates during normal operation, an adaptive anomaly detection mechanism is established. Using historical kurtosis data from the early, stable healthy stage, the system establishes a statistical baseline in the form of a confidence interval derived from the sample mean and standard deviation of the kurtosis. To mitigate false alarms induced by random ambient noise, a sequential triggering logic is applied: RUL prediction is activated only when a predefined number of consecutive kurtosis values fall outside this interval. The time at which this occurs is designated as the First Prediction Time (FPT), marking the transition into the degradation stage.
(4) Model Specification
Upon reaching the FPT, the proposed algorithm is deployed to conduct forecasting on the non-linear degradation trajectory. The core novelty of this model lies in the unprecedented mathematical integration of three advanced theoretical paradigms:
Kernel Adaptive Learning: Employs the “kernel trick” to project low-dimensional input vectors into a high-dimensional feature space, providing a mathematically transparent non-linear time-series regression mechanism that overcomes the “black-box” nature and data-hungry limitations of neural networks.
Multi-kernel mixture measure: Substitutes the traditional MSE cost function with the MKM measure. This structural adaptation fundamentally enhances the algorithm’s robustness, effectively shielding the prognostic model from non-Gaussian noise and extreme outliers prevalent in harsh industrial settings.
Fractional Derivative: Serving as the core methodological innovation, the incorporation of the Hadamard fractional-order calculus directly into the MKM-based gradient descent mechanism mathematically encodes “memory capacity” and “hereditary properties” into the model’s adaptive weights. This unique integration allows the algorithm to precisely capture long-term temporal dependencies without the excessive computational overhead of traditional fractional filters, significantly outperforming integer-order counterparts.
(5) Prognostic Execution
To estimate the RUL, a deterministic failure threshold is first established based on domain expertise, historical failure modes, and safety regulations. Starting from the prediction origin, the trained algorithm autoregressively forecasts the future trajectory of the MA degradation curve. The exact time at which this extrapolated trajectory intersects the predefined threshold is recorded as the predicted failure time. Consequently, the estimated RUL is calculated as the time difference between the projected failure time and the prediction starting point.
(6) Performance Analysis
To rigorously validate the algorithm’s engineering viability, robustness, and stability, the framework’s evaluation extends beyond a single static prediction at the FPT by adopting a multi-point sequential tracking approach. Specifically, the temporal span between the FPT and the actual failure is discretized into multiple uniform segments (e.g., 10 segments). At the onset of each segment, the prediction algorithm iteratively updates the RUL prediction by incorporating newly acquired operational data, thereby simulating a continuous, online prognostic monitoring environment.
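The multi-point tracking scheme in phase (6) can be sketched as follows. This is a minimal illustration, assuming the prediction origins sit at the start of each uniform segment between the FPT and the actual failure; the function name `segment_origins` and the example times are hypothetical and are not taken from the study.

```python
import numpy as np

def segment_origins(fpt, failure_time, n_segments=10):
    """Start time of each of n_segments uniform segments in [fpt, failure_time)."""
    if failure_time <= fpt:
        raise ValueError("failure_time must be later than fpt")
    # endpoint=False keeps the segment onsets and drops the failure instant itself
    return np.linspace(fpt, failure_time, n_segments, endpoint=False)

# Illustrative values only (e.g., minutes since the start of the test)
origins = segment_origins(fpt=80.0, failure_time=120.0)

# Ground-truth RUL at each origin, against which each re-run prediction is scored
true_rul = 120.0 - origins
```

At each origin the predictor would be re-run on all data observed so far, and its RUL estimate compared with the corresponding entry of `true_rul`.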
3.2. Fractional Derivative Multi-Kernel Adaptive Learning Algorithm
3.2.1. Problem Definition
The health condition of rotating machinery deteriorates over time, and its performance gradually declines. This process of state decline can be mathematically represented as a nonlinear time series indexed by the number of operating cycles. To characterize this degradation mechanism, the extracted system state feature x is selected as the primary state variable. Consequently, the one-step-ahead degradation-state forecasting model is constructed as follows:
Here,
represents an unknown mapping function that governs the underlying degradation dynamics,
indicates the system state feature at the
i-th cycle, and
accounts for the associated modeling noise. Additionally, the parameter
l signifies the time-embedding dimension. Given the highly complex decay mechanism, deriving an exact analytical formulation for
is practically unfeasible. To address this challenge and obtain a reliable approximation, the fractional derivative multi-kernel adaptive learning algorithm is utilized to construct a data-driven prediction model. Assuming a given training dataset denoted by
, where the historical input feature vector is defined as
and the target output is
, the predictive mapping can be estimated via
In this expression, specifies the dynamically adjusted weight assigned to the j-th support vector stored in the dictionary at iteration n, where n represents the total number of training steps completed.
3.2.2. Multi-Kernel Mixture Measure
As with many machine learning frameworks, the choice of an appropriate cost function is of paramount importance for effectively training kernel adaptive learning algorithms. To mitigate the adverse effects of measurement outliers, the multi-kernel mixture (MKM) measure is adopted as the objective function in this study. For any two continuous random variables,
u and
v, the mathematical formulation of the MKM measure is defined as
in which
means the mathematical expectation operator,
represents the mixture weighting coefficient, and
as well as
specify the kernel bandwidths of the standard Gaussian function
, which is given by
Originating from the information theoretic learning paradigm [
46], the MKM serves as a localized similarity metric, essentially representing a generalized correlation within the reproducing kernel space. Given a bounded collection of error observations
, the empirical estimation of the MKM-derived cost function is evaluated through sample averaging
During the adaptive learning process, the optimal filter weights are updated by minimizing this empirical cost function. For an individual instantaneous error e, the corresponding sample-wise loss is expressed as .
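The effect of a mixture-of-Gaussians similarity measure on outliers can be illustrated numerically. The sketch below rests on several assumptions that are not confirmed by the text: the kernel is the unnormalized Gaussian exp(-e^2 / (2 sigma^2)), the two kernels are combined convexly with weights alpha and 1 - alpha, and the sample-wise loss is taken as one minus the mixture kernel. All function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Unnormalized Gaussian kernel evaluated on an error value (assumed form)."""
    return np.exp(-np.square(e) / (2.0 * sigma ** 2))

def mixture_kernel(e, alpha, sigma1, sigma2):
    """Convex combination of two Gaussian kernels with different bandwidths."""
    return alpha * gaussian_kernel(e, sigma1) + (1.0 - alpha) * gaussian_kernel(e, sigma2)

def empirical_mkm(errors, alpha, sigma1, sigma2):
    """Sample-average estimate of the mixture measure over a set of errors."""
    e = np.asarray(errors, dtype=float)
    return float(np.mean(mixture_kernel(e, alpha, sigma1, sigma2)))

def sample_loss(e, alpha, sigma1, sigma2):
    """Assumed sample-wise loss: zero at e = 0 and saturating toward 1 for large |e|."""
    return 1.0 - mixture_kernel(e, alpha, sigma1, sigma2)

# Three small errors and one extreme outlier (illustrative values)
errors = np.array([0.1, -0.2, 0.05, 50.0])
losses = sample_loss(errors, alpha=0.5, sigma1=1.0, sigma2=5.0)
squared = errors ** 2  # squared-error loss for comparison
```

Under these assumptions the outlier contributes at most 1 to the mixture-based loss, whereas it contributes 2500 to the squared error, which is the bounded-influence behavior the section attributes to the MKM measure.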
3.2.3. Algorithm Derivation
The optimal weight vector
at training step
i can be derived by minimizing the regularized objective function:
In this optimization problem,
designates the adaptive weight vector, while
specifies the set of indices up to the current iteration. The terms
and
compute the unnormalized kernel evaluation derived from Equation (
7). Furthermore,
indicates the desired target response,
denotes the transformed input vector for
,
quantifies the instantaneous prediction error between the target and the model output, and
serves as a non-negative regularization penalty parameter.
By algebraically rearranging the cost function to incorporate the memory-preserving characteristics of fractional calculus, the optimization criterion is equivalently extended to a maximization problem in Equation (
11):
To determine the optimal weights, we apply the
-th order fractional derivative to Equation (
11) with respect to
and equate the gradient to zero. Applying the fractional chain rule establishes the critical condition detailed in Equation (
12):
To simplify this complex expression and make the mathematical derivation more transparent, we factorize the fractional error term as
. This allows us to extract the shared terms and define an intermediate error-weighting scalar variable
for each sample
j:
With this definition, the terms inside the summation of Equation (
12) can be elegantly condensed to
. By substituting the definition of the instantaneous estimation error
into this simplified summation and separating the target response from the model prediction, Equation (
12) is sequentially transformed into Equation (
14):
To express this relationship compactly in matrix form, we define the following matrices and vectors for the sequence up to the current iteration i: the mapped input data matrix ; the -th order fractional derivative input matrix ; the diagonal weighting matrix ; and the target response vector .
Using these definitions, we construct the error-weighted target response vector
. Consequently, the algebraic summations in Equation (14) can be directly translated into the matrix form depicted in Equation (15):
By factoring out the optimal weight vector
, its explicit analytic solution can be rearranged as Equation (
16):
In this operation, we encapsulate the intermediate expression within an auxiliary parameter vector
. Substituting the identity
back into this definition yields Equation (
17):
Isolating
provides its closed-form representation:
Substituting the expression for
from Equation (
18) back into Equation (
16) provides the finalized mathematical update rule for the weights:
To facilitate a computationally efficient recursive evaluation of
without requiring a full matrix inversion at every time step, we construct an inverse correlation matrix
, where
computes the kernelized inner product mapping between
and
. As new data arrives, the matrix product
can be structurally decomposed into a block-partitioned format, separating the previous
computations from the current
i-th update, as illustrated in Equation (
20):
In this block decomposition,
denotes a zero vector of appropriate dimensions. The cross-correlation vector is defined as
, and the scalar self-inner product evaluates to
, where
is the inner product in
. Consequently,
can be recursively structured via the block matrix in Equation (
21):
Applying the standard block matrix inversion lemma to Equation (21) enables the analytical inversion of the matrix block by block. This allows
to be elegantly updated via Equation (
22):
where the intermediate transitional vectors (
and
) and the scalar normalizer (
, acting as the Schur complement) are defined as follows to simplify the notation:
Furthermore, by iteratively partitioning the weighted target vector as
, the recursive update rule for the pivotal parameter vector
is obtained by multiplying the expanded block matrix
with the partitioned vector:
By recognizing that
and that the a priori system prediction is
, we can substitute the a priori estimation error
into the expression. Factoring out the terms allows for the recursive update rule for
to elegantly simplify to Equation (
26):
In this recursion,
characterizes the a priori estimation error at the
i-th iteration step, evaluated as
. Ultimately, the predicted adaptive system output
for the current instance is acquired through
. A comprehensive architectural overview of the proposed algorithm is systematically detailed in Algorithm 1.
Algorithm 1: Fractional derivative multi-kernel adaptive learning algorithm. [Algorithm listing: image in the original, not recoverable here.]
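The block-partitioned inverse update described above follows the same pattern as the standard kernel recursive least squares (KRLS) recursion. The sketch below implements only that plain KRLS recursion, to show how the Schur complement and the a priori error drive the update. It is not Algorithm 1: the Hadamard fractional-derivative terms and the MKM-based error weighting are omitted, and the class name `PlainKRLS`, the Gaussian kernel, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def gauss(x, y, width=1.0):
    """Gaussian kernel between two input vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * width ** 2)))

class PlainKRLS:
    """Standard KRLS: recursively maintains Q = (K + lam * I)^-1 and a = Q d."""

    def __init__(self, lam=1e-2, width=1.0):
        self.lam, self.width = lam, width
        self.centers = []   # stored inputs (the dictionary)
        self.Q = None       # inverse of the regularized kernel matrix
        self.a = None       # expansion coefficients

    def predict(self, x):
        if not self.centers:
            return 0.0
        h = np.array([gauss(c, x, self.width) for c in self.centers])
        return float(h @ self.a)

    def update(self, x, d):
        x = np.asarray(x, dtype=float)
        kxx = gauss(x, x, self.width)
        if not self.centers:
            self.Q = np.array([[1.0 / (self.lam + kxx)]])
            self.a = np.array([self.Q[0, 0] * d])
            self.centers.append(x)
            return d                      # a priori error of the empty model
        h = np.array([gauss(c, x, self.width) for c in self.centers])
        z = self.Q @ h                    # transitional vector
        r = self.lam + kxx - h @ z        # Schur complement (scalar normalizer)
        e = d - h @ self.a                # a priori estimation error
        n = len(self.centers)
        Q_new = np.empty((n + 1, n + 1))  # block matrix inversion lemma
        Q_new[:n, :n] = self.Q + np.outer(z, z) / r
        Q_new[:n, n] = -z / r
        Q_new[n, :n] = -z / r
        Q_new[n, n] = 1.0 / r
        self.Q = Q_new
        self.a = np.append(self.a - z * e / r, e / r)
        self.centers.append(x)
        return e

# Toy data: the recursion should match a direct solve of (K + lam * I) a = d
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 3))
d = np.sin(X.sum(axis=1))
model = PlainKRLS(lam=1e-2, width=1.0)
for x_i, d_i in zip(X, d):
    model.update(x_i, d_i)
```

Each update costs on the order of n^2 operations for a dictionary of size n, instead of the n^3 of a fresh matrix inversion, which is the motivation given for the recursive form.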
3.3. Interpretability and Explainable Mechanism
While conventional data-driven deep learning methodologies (e.g., Convolutional Neural Networks, Deep Belief Networks) function as uninterpretable “black boxes” with myriad opaque parameters and hidden layers, the proposed fractional-derivative multi-kernel adaptive learning framework inherently possesses a transparent “white-box” architecture. The interpretability and explainable mechanisms of the framework are primarily established upon three dimensions:
(1) In typical deep neural networks, the feature extraction and nonlinear mapping processes are largely unobservable. In contrast, the prediction mechanism of the proposed model is strictly governed by an explicit mathematical formulation (Equation (
6)). The predicted output
is calculated as a linear combination of kernel evaluations
weighted by
. This explicit structure allows the prediction to be directly interpreted as a similarity-weighted aggregation of historical representative states (support vectors). Practitioners can intuitively trace exactly which past degradation patterns most significantly drive the current RUL estimation. Furthermore, instead of relying on opaque gradient backpropagation, the learning process is mathematically transparent. The derivation from the objective function (Equation (
10)) to the recursive block-matrix update rules (Equations (18)–(26)) precisely demonstrates how the model parameters are incrementally adjusted step-by-step based on instantaneous estimation errors.
(2) In standard machine learning models, hyperparameters often lack physical meaning. In contrast, the Hadamard fractional derivative serves as a physically meaningful operator. The wear and fatigue of rotating machinery are history-dependent, cumulative processes rather than memoryless Markovian ones. The fractional order mathematically parameterizes this “memory capacity” and these “hereditary properties” over the entire degradation trajectory. This provides a transparent mathematical formulation of how cumulative structural damage continuously influences the current physical state, linking abstract algorithmic optimization to actual mechanical degradation principles.
(3) The adoption of the Multi-Kernel Mixture (MKM) measure (Equation (
7)) provides an interpretable noise-handling mechanism. The mixture coefficient
and distinct kernel bandwidths (
,
) explicitly govern the trade-off between standard prediction accuracy and outlier suppression. During the weight-updating process, the intermediate parameter
(defined in Algorithm 1) explicitly acts as a dynamic confidence penalty based on the error distribution. This provides a clear functional explanation for its superior robustness compared to traditional Mean Square Error (MSE)-based filters, as the model mathematically scales down the influence of extreme non-Gaussian sensor anomalies in real-time.
4. Experiments
This section evaluates the proposed method using two publicly available rolling bearing datasets: the XJTU-SY bearing lifetime dataset, released by Xi’an Jiaotong University and Changxing Sumyoung Technology [47], and the PRONOSTIA benchmark dataset [48]. Both datasets consist of run-to-failure vibration acceleration data acquired on accelerated mechanical degradation testbeds. Specifically, the data contain raw, high-frequency acceleration signals captured continuously by horizontal and vertical accelerometers from a pristine healthy state until complete structural failure.
4.1. Hyperparameter Optimization and Justification
The model is deployed for real-time predictive maintenance. Raw vibration signals are continuously captured, and Maximum Amplitude (MA) and Kurtosis features are extracted. During the healthy stage, an adaptive interval mechanism monitors the Kurtosis feature. Once anomalies are detected (the First Prediction Time, FPT), the prediction algorithm is officially activated.
Inputs: The mathematical input to the model at the i-th cycle is a time-embedded sliding window of the historical Maximum Amplitude (MA) feature extracted from the recent vibration data: , where l is the time-embedding dimension.
Outputs: The mathematical output is an autoregressive one-step-ahead prediction of the degradation state, .
RUL Calculation: By iteratively feeding its one-step predictions back into itself as new inputs, the algorithm extrapolates the continuous future degradation trajectory. The Remaining Useful Life (RUL) is calculated as the precise time difference between the current moment and the predicted timestamp when the extrapolated trajectory intersects the predefined mechanical acceleration cutoff (failure threshold).
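The input/output convention and the RUL calculation above can be sketched as follows. This is a minimal illustration, assuming a window of the l most recent MA values as input and a fixed time step between cycles; `predict_next` is a hypothetical stand-in for the trained one-step-ahead model, and the toy growth model and numbers are invented for the example.

```python
import numpy as np

def embed(series, l):
    """Build sliding windows of length l and their one-step-ahead targets."""
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + l] for i in range(len(s) - l)])
    y = s[l:]
    return X, y

def rollout_rul(history, predict_next, l, threshold, dt, max_steps=10_000):
    """Feed one-step predictions back as inputs until the threshold is crossed.

    Returns the time from the current moment to the predicted crossing,
    or None if the threshold is not reached within max_steps.
    """
    window = list(np.asarray(history, dtype=float)[-l:])
    for step in range(1, max_steps + 1):
        nxt = float(predict_next(np.array(window)))
        if nxt >= threshold:
            return step * dt
        window = window[1:] + [nxt]   # slide the window forward by one cycle
    return None

# Toy stand-in model: the indicator grows by 10% per cycle
toy_model = lambda w: 1.1 * w[-1]

X, y = embed([1.0, 2.0, 3.0, 4.0, 5.0], l=2)
rul = rollout_rul([1.0, 1.1, 1.21], toy_model, l=3, threshold=2.0, dt=1.0)
```

Because every predicted value is reused as an input, one-step errors accumulate along the roll-out, which is one reason the evaluation re-runs the prediction from several later origins.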
To ensure the robustness, predictive performance, and reproducibility of the proposed fractional-derivative multi-kernel adaptive learning framework, the crucial hyperparameters (the two kernel bandwidths, the mixture weighting coefficient, the fractional derivative order, and the regularization parameter) must be systematically justified and optimized prior to prognostic execution. Instead of relying on empirical assignments, a grid-search optimization strategy combined with time-series cross-validation is employed. The physical and mathematical justifications, along with the search spaces for these parameters, are as follows:
(1) Kernel bandwidths (, ) and mixture weighting coefficient (): These parameters govern the multi-kernel mixture measure. To effectively handle distinct types of error distributions, two contrasting bandwidths are utilized. The smaller bandwidth is tailored to capture fine-grained, localized modeling deviations, whereas the larger bandwidth is designated to suppress extreme non-Gaussian measurement outliers commonly encountered in harsh industrial environments. Consequently, the search spaces are respectively set as and . The parameter , which controls the trade-off between local sensitivity and global outlier robustness, is optimized with a step size of 0.1.
(2) Fractional derivative order (): As the pivotal parameter encoding the “memory capacity” and hereditary attributes of the mechanical degradation process, strictly governs the model’s ability to extract long-range temporal dependencies. Given the varying complexities of physical degradation across different operating conditions, is systematically tuned via grid search within the interval with a step size of 0.05.
(3) Regularization parameter (): To constrain the norm of the adaptive weights and prevent algorithmic overfitting during the adaptive learning process, the regularization penalty is optimized over a logarithmically spaced candidate set .
In the practical implementation for both the XJTU-SY and PRONOSTIA datasets, the optimal parameter combination is systematically determined using historical condition monitoring data acquired prior to the First Prediction Time (FPT). Specifically, the grid-search algorithm iteratively evaluates the candidate parameters, and the combination that minimizes the cross-validation error on the historical degradation sequence is selected. Once identified, these optimized hyperparameters are locked and deployed for the online remaining useful life (RUL) prediction.
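The selection procedure described above can be sketched as an exhaustive grid search scored by expanding-window (time-ordered) cross-validation. Everything specific in the block is an assumption: the split scheme, the mean-squared-error score, the helper names, and the toy forecaster with its `growth` and `offset` parameters, which merely stand in for the actual model and its five hyperparameters.

```python
import itertools
import numpy as np

def expanding_window_splits(n, n_folds, min_train):
    """Yield (train_end, test_end): train on [0, train_end), test on [train_end, test_end)."""
    fold = (n - min_train) // n_folds
    if fold < 1:
        raise ValueError("series too short for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold
        yield train_end, train_end + fold

def grid_search(series, grid, fit_predict, n_folds=3, min_train=20):
    """Return the parameter combination with the lowest mean validation error."""
    series = np.asarray(series, dtype=float)
    names = list(grid)
    best, best_err = None, np.inf
    for values in itertools.product(*(grid[k] for k in names)):
        params = dict(zip(names, values))
        errs = []
        for tr, te in expanding_window_splits(len(series), n_folds, min_train):
            pred = fit_predict(series[:tr], te - tr, **params)
            errs.append(np.mean((np.asarray(pred) - series[tr:te]) ** 2))
        err = float(np.mean(errs))
        if err < best_err:
            best, best_err = params, err
    return best, best_err

def toy_forecaster(train, horizon, growth, offset):
    """Hypothetical stand-in model: geometric extrapolation from the last value."""
    steps = np.arange(1, horizon + 1)
    return train[-1] * growth ** steps + offset

series = 1.05 ** np.arange(50)   # synthetic pre-FPT history
grid = {"growth": [1.0, 1.05, 1.1], "offset": [0.0, 0.5]}   # placeholder candidates
best, best_err = grid_search(series, grid, toy_forecaster)
```

Keeping each validation block strictly after its training block avoids leaking future degradation values into the fit, which ordinary shuffled cross-validation would not guarantee for a time series.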
4.2. Case I: RUL Prediction on XJTU-SY Datasets
4.2.1. Datasets
In this study, the XJTU-SY testbed (illustrated in Figure 3) was employed to capture the complete run-to-failure degradation trajectories of LDK UER204 rolling bearings. To cover a range of operational states, the experiments were executed across three representative speed-load profiles (2100 rpm/12 kN, 2250 rpm/11 kN, and 2400 rpm/10 kN). During testing, dual-axis (horizontal and vertical) vibrations and thermal variations were sampled at 25.6 kHz, capturing a 1.28-s snapshot (32,768 samples) every 60 s. This acquisition strategy yields a large dataset with a high signal-to-noise ratio. The induced bearing damages cover a broad spectrum of fault manifestations, mainly inner race deterioration, outer race fractures, and broken cages.
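As a worked example of the acquisition figures above, a 1.28-s snapshot at 25.6 kHz contains 25,600 × 1.28 = 32,768 samples, and each snapshot is reduced to one MA value and one kurtosis value. The sketch below assumes MA is the largest absolute sample and kurtosis is the non-excess (Pearson) form, which equals 3 for Gaussian data; the text does not state which kurtosis definition is used, and the synthetic signals are invented for illustration.

```python
import numpy as np

FS_HZ = 25_600        # sampling rate stated for the testbed
SNAPSHOT_S = 1.28     # snapshot duration stated for the testbed
N_SAMPLES = int(round(FS_HZ * SNAPSHOT_S))   # samples per snapshot

def max_amplitude(x):
    """Largest absolute vibration amplitude in one snapshot."""
    return float(np.max(np.abs(x)))

def kurtosis(x):
    """Pearson kurtosis: fourth central moment over squared variance."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    return float(np.mean(c ** 4) / np.mean(c ** 2) ** 2)

# Synthetic snapshots: Gaussian noise, then the same noise with sparse impulses
rng = np.random.default_rng(0)
healthy = rng.normal(size=N_SAMPLES)
faulty = healthy.copy()
faulty[::2048] += 8.0   # 16 periodic impacts mimicking an incipient defect

k_healthy = kurtosis(healthy)
k_faulty = kurtosis(faulty)
```

The impulsive snapshot yields a clearly higher kurtosis than the Gaussian one, which illustrates why kurtosis is used for early anomaly detection while MA tracks the overall degradation level.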
4.2.2. Experimental Results
Employing the vibration signal’s maximum amplitude (MA) as the health monitoring feature, the full life-cycle degradation trajectory of bearing 1-1 is presented as a representative example in Figure 4a. The bearing’s operational lifespan systematically unfolds across three phases: the healthy operation stage, the incipient degradation stage, and the severe degradation stage. During the healthy operation stage, the vibration amplitude maintains a consistently low and stable profile. As the bearing transitions into the incipient degradation stage, a gradual escalation in signal amplitude occurs, denoting the onset of structural damage. Subsequently, the degradation index deteriorates drastically with exponential growth, marking the entry into the severe degradation stage. This critical phase transition necessitates the initiation of RUL prediction, and the exact moment this prognostic mechanism is triggered is defined as the first prediction time (FPT).
To determine the FPT objectively, this study adopts an adaptive confidence-interval strategy to distinguish between normal and anomalous bearing states. First, historical monitoring data from the healthy operational phase are used to establish a confidence interval defined by m, the sample mean of the kurtosis feature, and its sample standard deviation. This statistical baseline is then used to detect anomalous conditions: each new kurtosis measurement is evaluated against the established bounds as it is acquired. A value falling outside the interval indicates a potential anomaly, which may originate from an actual physical defect or from stochastic noise. To prevent false alarms induced by random noise, a delayed-triggering criterion is applied: prediction is activated only when a prescribed number of consecutive kurtosis readings violate the interval bounds. The application of this interval technique and the resulting FPT identification for bearing 1-1 are shown in Figure 4b.
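The interval check and delayed-triggering rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `detect_fpt`, the half-width multiplier `k`, the run length `n_consecutive`, and the choice to report the index of the reading that completes the run are all assumptions made for the example, and the baseline interval is kept fixed instead of being updated adaptively.

```python
import numpy as np


def detect_fpt(kurtosis, n_healthy, k=3.0, n_consecutive=3):
    """Return the index at which prediction would be triggered, or None.

    kurtosis      : 1-D sequence of kurtosis values, one per monitoring sample
    n_healthy     : number of leading samples treated as the healthy baseline
    k             : interval half-width in standard deviations (assumed value)
    n_consecutive : consecutive violations required to trigger (assumed value)
    """
    x = np.asarray(kurtosis, dtype=float)
    baseline = x[:n_healthy]
    m = baseline.mean()            # sample mean of the healthy-phase kurtosis
    s = baseline.std(ddof=1)       # sample standard deviation
    lower, upper = m - k * s, m + k * s

    run = 0                        # length of the current violation streak
    for t in range(n_healthy, len(x)):
        if x[t] < lower or x[t] > upper:
            run += 1
            if run == n_consecutive:
                return t           # reading that completes the streak
        else:
            run = 0                # an in-bounds reading resets the streak
    return None
```

With this rule, a single out-of-bounds spike followed by a normal reading does not trigger prediction; only a sustained excursion does, which is the purpose of the delayed-triggering criterion.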
For experimental evaluation, vibration datasets from bearings 1-1, 1-2, 1-3, and 1-4 were selected. After the MA condition indicator was derived from the raw signals, the FPT of each bearing was identified via the aforementioned adaptive interval methodology. Before online prediction was started at the identified FPT, the five key hyperparameters of the proposed model were calibrated for each bearing using the grid-search strategy detailed in Section 4.1, so that the model matched the degradation dynamics of that bearing.
To benchmark the proposed algorithm, comparative experiments were conducted against six established methods: two data-driven models (RVM [23] and DBN [25]), two model-based filters (PF [19] and EKF [20]), as well as Transformer [31] and LSTM [28]. The parameters of each baseline were initialized from the recommendations in its original literature and then tuned through grid search and cross-validation so that each baseline performed as well as possible on the given datasets. Based on the observed degradation patterns, iterative empirical testing, and domain expertise, a uniform failure threshold of 20 g was prescribed for all test cases. This limit represents the maximum permissible vibration amplitude for safe mechanical operation; exceeding it signifies critical bearing failure and calls for immediate replacement. The degradation prediction trajectories starting at the FPT are shown in Figure 5.
The prediction error of each algorithm is the time difference between the actual failure and the predicted moment at which the trajectory first crosses the threshold. To assess how well the models track degradation over time, the interval between the FPT and the point at which the failure threshold is reached was divided evenly into 10 segments. At the start of each segment, a new RUL estimation was initialized, and the updated predictions and their estimation errors were recorded as degradation progressed. This multi-stage validation procedure was repeated for all selected bearings, and the aggregated prediction results from these starting points are summarized in Figure 6.
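The error metric and the ten-segment evaluation schedule can be illustrated with a short sketch. The helper names below are hypothetical, and the sign convention (actual minus predicted, so a positive value means the failure was predicted too early) is an assumption, since the text only specifies a temporal discrepancy.

```python
import numpy as np


def first_crossing(times, values, threshold):
    """Time at which the trajectory first reaches the threshold, or None."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    hits = np.flatnonzero(values >= threshold)   # indices at or above threshold
    return None if hits.size == 0 else float(times[hits[0]])


def prediction_error(actual_failure_time, predicted_failure_time):
    """Signed error (assumed convention): positive means predicted too early."""
    return actual_failure_time - predicted_failure_time


def segment_starts(fpt, failure_time, n_segments=10):
    """Start times of n_segments equal-length segments spanning [fpt, failure_time]."""
    edges = np.linspace(fpt, failure_time, n_segments + 1)
    return edges[:-1]                            # drop the final edge (failure time)
```

For example, with an FPT at time 100 and failure at time 200 (arbitrary units), `segment_starts(100, 200)` gives the ten prediction start times 100, 110, and so on up to 190; a predicted trajectory would be passed to `first_crossing` with the 20 g threshold to obtain the predicted failure time at each start.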
4.3. Case II: RUL Prediction on PRONOSTIA Datasets
4.3.1. Datasets
To further evaluate the proposed methodology, acceleration-based degradation datasets were obtained from the PRONOSTIA testbed, developed by the FEMTO-ST Institute. As depicted in Figure 7, this experimental rig performs accelerated run-to-failure tests on rolling element bearings under different operating conditions. The testbed comprises a rotating shaft driven by an AC electric motor and supported by a primary support bearing and the designated test bearing. To emulate different operating stresses, an adjustable radial force is exerted on the test bearing via a hydraulic actuator, while the motor sets the rotational speed. For vibration monitoring, two DYTRAN 3035B high-frequency (HF) accelerometers were attached orthogonally (in horizontal and vertical orientations) to the exterior of the bearing housing. To maximize signal fidelity and reduce interference from ambient vibrations, the sensors were positioned close to the test bearing.
4.3.2. Experimental Results
During the experiments, an accelerated degradation protocol was applied: the bearing specimens were run at the prescribed rotational speed under a static radial force until severe structural damage occurred. Vibration signals were recorded at 10-s intervals. In each sampling cycle, a 0.1-s snapshot containing 2560 data points was acquired, which corresponds to a sampling rate of 25.6 kHz. This high sampling rate is needed to capture the transient, high-frequency oscillatory features indicative of incipient bearing defects.
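The acquisition figures quoted above are mutually consistent, as the following sketch shows: 2560 points in a 0.1-s snapshot imply a sampling rate of 25.6 kHz. The sketch also reduces each snapshot to a single health-indicator value. Taking the maximum amplitude (MA) as the largest absolute acceleration in a snapshot is an assumed definition, and the function names are hypothetical.

```python
import numpy as np

SNAPSHOT_MS = 100            # snapshot duration: 0.1 s
POINTS_PER_SNAPSHOT = 2560   # samples recorded per snapshot

# Sampling rate implied by the two figures above, in Hz
sampling_rate_hz = POINTS_PER_SNAPSHOT * 1000 / SNAPSHOT_MS


def max_amplitude(snapshot):
    """Largest absolute acceleration in one snapshot (assumed MA definition)."""
    return float(np.max(np.abs(np.asarray(snapshot, dtype=float))))


def ma_series(snapshots):
    """One MA value per snapshot, i.e. one point every 10 s of test time."""
    return np.array([max_amplitude(s) for s in snapshots])
```

Applied to a full run-to-failure record, `ma_series` yields the degradation trajectory on which the failure threshold is evaluated.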
The overall dataset encompasses three experimental groups, each containing seven run-to-failure bearing tests. For the RUL estimation experiments, datasets from bearings 1-1, 1-4, 2-2, and 3-2 were randomly selected, and the procedure followed the technical framework implemented for the XJTU-SY datasets. First, the FPT was adaptively determined using a confidence interval criterion.
Next, the five model hyperparameters for bearings 1-1, 1-4, 2-2, and 3-2 were optimized via grid search on their respective pre-FPT historical data prior to RUL estimation.
Subsequently, informed by domain expertise and iterative heuristic evaluation of the specific degradation trajectories, a separate failure threshold was established for each of bearings 1-1, 1-4, 2-2, and 3-2. The initial RUL prediction results triggered at the calculated FPTs are illustrated in Figure 8.
Furthermore, to validate the robustness and temporal consistency of the proposed algorithm, the entire degradation phase, spanning from the FPT to the final failure threshold, was evenly partitioned into ten chronological segments. A new RUL prediction was initiated at the start of each segment. This progressive procedure tracked the evolving predictions and their estimation errors until the end of the degradation stage. The protocol was repeated for all selected bearings, and the aggregated predictive performance is depicted in Figure 9.