Article

A Fractional-Derivative Multi-Kernel Adaptive Learning Approach for Remaining Useful Life Prediction of Rotating Machinery

1
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2
School of Computer Science and Technology, Yibin University, Yibin 644001, China
*
Author to whom correspondence should be addressed.
Sensors 2026, 26(13), 4137; https://doi.org/10.3390/s26134137
Submission received: 2 May 2026 / Revised: 10 June 2026 / Accepted: 23 June 2026 / Published: 1 July 2026
(This article belongs to the Section Fault Diagnosis & Sensors)

Abstract

Robust Remaining Useful Life (RUL) forecasting is indispensable for condition-based maintenance in rotating machinery. Nevertheless, achieving high predictive accuracy is difficult, primarily because degradation processes are highly nonlinear and nonstationary. Existing prognostic approaches typically face critical bottlenecks: physical models require laborious parameter calibration, while data-driven deep learning methods suffer from “black-box” limitations and rely heavily on massive run-to-failure datasets. To overcome these challenges, this paper proposes a novel fractional-derivative multi-kernel adaptive learning approach for robust RUL prediction of rotating machinery. By integrating kernel adaptive learning with a multi-kernel mixture measure, the method provides a mathematically transparent “white-box” architecture that operates effectively in practical small-sample scenarios. The Hadamard fractional derivative is incorporated into the algorithm’s weight-updating mechanism, mathematically encoding the “memory capacity” and “hereditary properties” of physical degradation to capture complex long-range temporal dependencies. Additionally, an adaptive 3σ confidence interval scheme featuring sequential delayed-triggering logic is designed for First Prediction Time (FPT) identification, suppressing noise-induced false alarms. Extensive evaluations through multi-point sequential tracking on two practical datasets confirm that the proposed method surpasses established baselines, achieving lower estimation errors and the lowest asymmetric penalty scores.

1. Introduction

Rotating machinery, such as aircraft engines, high-speed turbines, and industrial pumps, is the core power source of modern industrial systems, and its stable and reliable operation is crucial to ensuring production safety, reducing economic losses, and improving operational efficiency [1,2]. Reliable prognostics regarding the remaining useful life (RUL) of rotating machinery are fundamental to effective predictive maintenance, which can effectively avoid unplanned downtime, reduce maintenance costs, and ensure the safe and continuous operation of industrial equipment [3,4]. However, the degradation process of rotating machinery is nonlinear, nonstationary, and affected by multiple complex factors, resulting in great challenges in RUL prediction [5,6,7].
Driven by the increasingly stringent reliability and safety requirements in modern industrial systems, RUL forecasting for rotating machinery has emerged as a paramount focal point within reliability engineering. Existing prognostic approaches can be broadly categorized into two dominant paradigms: model-based methods [8] and data-driven methods [9]. Traditional model-based approaches fundamentally rely on the explicit physical degradation mechanisms of equipment to formulate mathematical representations. Prominent techniques in this category encompass stochastic process models—such as the Wiener process [10,11,12] and Gamma process [13,14,15]—as well as Bayesian state estimation algorithms [16,17], notably particle filtering (PF) [18,19] and extended Kalman filtering (EKF) [20,21]. The primary advantage of these mechanistic models lies in their transparent physical significance and robust mathematical interpretability, making them highly effective for scenarios with well-defined degradation dynamics. For instance, Lim et al. [21] introduced a multi-modal prognostic framework integrating a switching Kalman filter ensemble, which effectively mitigated prediction uncertainties in complex degradation processes. Similarly, Cui et al. [19] formulated a comprehensive RUL estimation strategy for rolling bearings utilizing time-varying particle filtering, thus improving the model’s adaptability to time-dependent degradation characteristics. Despite their theoretical soundness, the successful implementation of physical models necessitates a profound prior understanding of the equipment’s internal structural dynamics and fatigue evolution mechanisms. In practice, modern rotating machinery operates under highly coupled and variable conditions, rendering the construction of precise physical models exceptionally challenging. 
Furthermore, the reliance on complex differential equations often leads to an arduous parameter calibration process, which severely restricts their generalizability and broader engineering applications.
To circumvent the inherent bottlenecks associated with explicit physical modeling, data-driven methodologies have emerged as a highly promising alternative. These methods bypass the need for prior mechanistic knowledge, inferring degradation trajectories by directly extracting latent feature representations from voluminous condition monitoring data. Representative algorithms encompass conventional machine learning models, deep learning architectures, and nonlinear adaptive filters, such as Support Vector Machines (SVM) [22,23], Deep Belief Networks (DBN) [24,25], Convolutional Neural Networks (CNN) [26,27] and Long Short-Term Memory (LSTM) networks [28,29]. Considerable progress has been achieved within this paradigm. For example, Shen et al. [22] designed a novel transfer learning model based on SVM, effectively mitigating domain shifts and improving the prognostic adaptability of rolling bearings across disparate working conditions. Furthermore, Islam et al. [30] developed a recursive support vector regression approach that captures temporal dependencies to sequentially evaluate the RUL of rolling bearings. More recently, state-of-the-art prognostic frameworks have progressively incorporated advanced deep learning architectures to further push the boundaries of RUL prediction. For instance, Transformer models and self-attention mechanisms have been increasingly utilized to effectively capture complex long-range temporal dependencies in degradation sequences [31,32,33]. Additionally, Physics-Informed Neural Networks (PINNs) [34] and Graph Neural Networks (GNNs) [35] have emerged as powerful tools, integrating explicit physical degradation laws and spatial-temporal topologies into data-driven models to improve generalization capabilities under varying working conditions. 
While deep learning approaches (e.g., DBNs) exhibit powerful feature extraction capabilities, they are often hindered by their “black-box” nature and heavy reliance on massive amounts of run-to-failure training data, which are rarely available in actual industrial settings [36]. In contrast, kernel adaptive learning (KAL) methods [37,38,39] have recently attracted significant attention as an elegant online nonlinear filtering technology. By employing the kernel trick, KAL efficiently projects low-dimensional nonlinear input signals into a high-dimensional reproducing kernel Hilbert space (RKHS) to map complex degradation patterns. Compared to highly parameterized neural networks, KAL uniquely retains a degree of mathematical transparency and structural interpretability. Additionally, it boasts low computational overhead and ease of deployment, demonstrating robust predictive performance in practical small-sample scenarios where large-scale data collection is prohibitive.
In recent years, fractional derivative technology has been gradually applied to RUL prediction, which can effectively capture the memory and hereditary properties of nonstationary signals and better describe the long-range dependence of rotating machinery degradation processes [40]. For example, researchers have proposed fractional derivative-based learning methods such as FrKLMS [41] and FrKRLS [42], which have achieved better prediction results than traditional kernel learning methods. Although fractional derivative technology has shown great promise in describing the nonstationary characteristics of degradation signals, a critical theoretical gap remains: existing fractional derivative-based prediction methods have not been fundamentally integrated with the robust nonlinear optimization of kernel adaptive learning. Conventional KAL algorithms predominantly utilize integer-order gradient descent, which strictly depends on instantaneous prediction errors and inherently lacks the capacity to retain historical degradation information. Conversely, existing fractional approaches (e.g., FrKLMS and FrKRLS) often rely on standard Mean Square Error (MSE) criteria and computationally heavy fractional definitions, making them highly vulnerable to non-Gaussian impulsive noises and difficult to implement efficiently. To the best of our knowledge, no existing research has successfully embedded the Hadamard fractional derivative directly into the reproducing kernel Hilbert space (RKHS) optimization framework in conjunction with a multi-kernel mixture measure. By doing so, our proposed method redesigns the weight-updating mechanism to mathematically encode both “memory capacity” and complex nonlinear feature mapping, bridging a crucial gap in current prognostic methodologies.
In view of the above deficiencies, aiming at the problems of insufficient labeled data, poor interpretability of models, and difficulty in capturing nonstationary degradation characteristics in RUL prediction of rotating machinery, this study combines fractional derivative and kernel adaptive learning to propose a new RUL prediction method. The fractional derivative is used to enhance the ability of capturing nonstationary and long-range dependent characteristics of degradation signals, and the kernel adaptive learning is used to improve the nonlinear fitting ability and interpretability of the model, so as to solve the key technical problems in RUL prediction of rotating machinery and provide a new technical route for practical engineering application.
The pivotal contributions of this study are delineated as follows:
(1) The Hadamard fractional derivative is innovatively incorporated into the algorithm’s gradient descent and weight-updating mechanism. Unlike traditional integer-order models that solely rely on short-term instantaneous errors, this fractional operator mathematically encodes the “memory capacity” and “hereditary properties” of physical equipment, empowering the model to accurately extract complex long-range temporal dependencies native to structural degradation.
(2) A multi-kernel mixture (MKM) measure is uniquely integrated into the fractional-order optimization objective to replace the conventional mean square error. This structural enhancement significantly improves the algorithm’s operational robustness, effectively suppressing the adverse impacts of high-level noises and extreme measurement outliers prevalent in harsh industrial settings.
(3) By deeply integrating the aforementioned techniques into an adaptive learning paradigm, the proposed method yields a mathematically transparent “white-box” architecture. It successfully circumvents the opaque “black-box” limitations and heavy training data dependency of current deep learning models, making it highly advantageous for real-world small-sample scenarios.

2. Preliminaries

2.1. Kernel Adaptive Learning

When kernel adaptive learning is employed for nonlinear time-series forecasting, the primary objective is to construct a functional relationship from a given set of training pairs $\{x_i, d_i\}_{i=1}^{n}$. In this context, $x_i \in \mathbb{U} \subseteq \mathbb{R}^{m \times 1}$ serves as the $m$-dimensional input vector of the learning algorithm at the $i$-th iteration, while $d_i \in \mathbb{R}$ represents the corresponding desired target output. According to kernel learning theory [43], the original input data $x_i$ is implicitly mapped into a high-dimensional feature space $\mathbb{F}$ by means of a reproducing kernel. Subsequently, the model’s prediction is derived from a linear combination of these weighted transformed features, which is mathematically formulated as
$$f(x_i) = W^T \varphi(x_i) \tag{1}$$
where $W \in \mathbb{F}$ indicates the adaptive weight of the model, and $\varphi(\cdot)$ acts as a nonlinear mapping function that satisfies the following inner product condition:
$$\varphi(u)^T \varphi(v) = \kappa(u, v) \tag{2}$$
with $\kappa(\cdot, \cdot)$ denoting a reproducing kernel characterized by its universal approximation capability. Analogous to conventional supervised learning paradigms, kernel adaptive learning algorithms minimize a specific objective function to determine the optimal weight vector $W$. Traditionally, the algorithm uses a cost function governed by the least mean square error criterion. This optimization problem can be expressed as follows:
$$\min_{W} \sum_{i=1}^{n} \left| d_i - W^T \varphi(x_i) \right|^2, \quad \text{s.t.} \quad d_i = W^T \varphi(x_i) + \varepsilon_i \tag{3}$$
where $\varepsilon_i$ signifies the estimation error. At every iteration, the optimal weight vector is recursively updated by solving the minimization problem defined in Equation (3). Figure 1 presents the complete structural architecture of the kernel adaptive learning model.
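The inner product condition in Equation (2) can be checked numerically for a kernel whose feature map is finite-dimensional. The minimal Python sketch below uses the homogeneous degree-2 polynomial kernel in two dimensions purely as an illustrative stand-in, since the feature map of a Gaussian kernel is infinite-dimensional and cannot be written out explicitly. The names `phi`, `poly2_kernel`, and `inner` are assumptions introduced for this illustration and are not part of the proposed method.

```python
import math

def phi(u):
    """Explicit feature map of the kernel (u . v)^2 for 2-D inputs."""
    u1, u2 = u
    return [u1 * u1, math.sqrt(2.0) * u1 * u2, u2 * u2]

def poly2_kernel(u, v):
    """kappa(u, v) = (u . v)^2, evaluated without ever forming phi."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot * dot

def inner(a, b):
    """Ordinary Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

u, v = (1.0, 2.0), (3.0, -1.0)
lhs = inner(phi(u), phi(v))   # phi(u)^T phi(v), computed in feature space
rhs = poly2_kernel(u, v)      # kappa(u, v), computed in input space
print(lhs, rhs)               # the two values agree up to rounding
```

The kernel evaluation on the right-hand side never constructs the feature vectors, which is what allows kernel adaptive learning to operate in a high-dimensional feature space at the cost of input-space computations.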

2.2. Fractional Calculus

Fractional calculus broadens the traditional scope of integer-order differentiation by generalizing it to non-integer domains. It has been extensively adopted across various scientific and engineering fields due to its unique capability to model the memory effects and hereditary characteristics inherent in complex dynamic systems. Among the diverse methodologies proposed for defining fractional operators, the Hadamard fractional derivative is particularly prominent. Distinct from alternative formulations, the Hadamard derivative is constructed upon the remainder term of a Taylor series expansion. Consequently, it evaluates the local asymptotic behavior of a function in the vicinity of a specific point, circumventing the need for intricate integral transformations or the Gamma function [44]. Mathematically expressed in Equation (4), this formulation proves exceptionally effective for characterizing ultra-slowly evolving physical processes. Typical applications include modeling geological rock creep, structural weathering, and the progressive mechanical degradation of rolling bearings:
$$D^{\beta}(u) = \lim_{u \to u_0} \frac{h(u) - T_{n-1}(u)}{(u - u_0)^{\beta}} \tag{4}$$
where $D^{\beta}(u)$ denotes the fractional differential operator of order $\beta$; $T_{n-1}(u)$, with $n = \lceil \beta \rceil$, represents the Taylor polynomial of degree $(\lceil \beta \rceil - 1)$ for the function $h(u)$ expanded at $u_0$; and $\lceil \beta \rceil$ stands for the ceiling function, which yields the smallest integer not less than $\beta$. Consequently, the Hadamard fractional derivative provides a structurally concise and mathematically intuitive framework, effectively bridging the gap between abstract theoretical analysis and practical engineering implementations. When contrasted with alternative fractional derivative definitions, the Hadamard approach exhibits several distinct advantages [45]:
(1) By circumventing convoluted Gamma function evaluations, its mathematical formulation remains relatively straightforward, which significantly enhances its interpretability and conceptual clarity.
(2) It eliminates the strict requirement for predefined initial boundary conditions, thereby broadening its adaptability and applicability across diverse operational scenarios.
(3) It significantly simplifies the derivation and establishment of fundamental analytical properties essential for theoretical research, including the fractional chain rule.
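The limit in Equation (4) can be illustrated numerically. The following Python sketch is an illustration only and not part of the proposed algorithm: it evaluates the quotient of Equation (4) for the hypothetical test function $h(u) = \sqrt{u} + 2u$ at $u_0 = 0$ with $\beta = 0.5$, approaching $u_0$ from the right only (a one-sided simplification assumed here). The function name `eq4_quotient` and the argument `derivs_at_u0` are likewise assumptions.

```python
import math

def eq4_quotient(h, u0, beta, step, derivs_at_u0):
    """Quotient of Eq. (4) evaluated at u = u0 + step, with step > 0.

    derivs_at_u0 holds [h(u0), h'(u0), ...]; only the first ceil(beta)
    entries are used to build the Taylor polynomial of degree ceil(beta) - 1.
    """
    n = math.ceil(beta)
    taylor = sum(derivs_at_u0[k] * step ** k / math.factorial(k) for k in range(n))
    return (h(u0 + step) - taylor) / step ** beta

def h(u):
    """Hypothetical test function with a square-root singularity at 0."""
    return math.sqrt(u) + 2.0 * u

# For beta = 0.5 the Taylor polynomial reduces to the constant h(u0).
quotients = [eq4_quotient(h, 0.0, 0.5, s, [h(0.0)]) for s in (1e-2, 1e-4, 1e-8)]
print(quotients)
```

For this test function the quotient equals $1 + 2\sqrt{\text{step}}$, so the printed values approach 1 as the step shrinks, and no integral transform or Gamma function evaluation is needed.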

3. Proposed Method

3.1. Overall Process Framework

Unlike contemporary end-to-end deep learning approaches that often function as opaque “black boxes” and rely heavily on massive labeled datasets, the proposed framework leverages a white-box analytical architecture driven by the fractional-derivative multi-kernel adaptive learning algorithm.
The overall process framework is systematically delineated into six interconnected phases as shown in Figure 2.
(1)
Data Acquisition
The prognostic pipeline initiates with the empirical collection of physical operational data. Utilizing mechanical accelerated life testing platforms, horizontal and vertical high-frequency accelerometers continuously capture the raw, high-dimensional vibration signals of the rotating machinery. This dynamic monitoring spans the entire lifecycle—from a pristine health state to complete structural failure—under varying operational conditions, such as distinct rotational speeds and radial loads.
(2)
Feature Extraction
Because raw vibration signals are highly susceptible to environmental noise and lack intuitive degradation signatures, time-domain feature extraction is executed to formulate reliable, low-dimensional health indicators. Primary Degradation Indicator: The Maximum Amplitude (MA) of the vibration signal is extracted to quantitatively characterize the macroscopic, progressive mechanical degradation trajectory of the bearing over its lifecycle. Simultaneously, the statistical Kurtosis feature is extracted. Due to its high sensitivity to early-stage transient impulses, it is utilized to identify incipient structural anomalies and facilitate health state demarcation.
(3)
State Division
To optimize computational resources and prevent premature or erroneous prognostic estimations during the normal operational phase, an adaptive anomaly detection mechanism is established. The 3σ confidence interval: using historical kurtosis data from the early, stable healthy stage, the system establishes a statistical baseline defined by the interval $[\mu - 3\sigma, \mu + 3\sigma]$, where $\mu$ and $\sigma$ are the mean and standard deviation of the baseline kurtosis. To mitigate false alarms induced by random ambient noise, a sequential triggering logic is applied. The RUL prediction mechanism is activated only when a predefined number $(l + 1)$ of consecutive kurtosis values fall outside this 3σ interval. This time instant is designated as the First Prediction Time (FPT), marking the transition into the degradation stage.
(4)
Model Specification
Upon reaching the FPT, the proposed algorithm is deployed to conduct forecasting on the non-linear degradation trajectory. The core novelty of this model lies in the unprecedented mathematical integration of three advanced theoretical paradigms:
Kernel Adaptive Learning: Employs the “kernel trick” to project low-dimensional input vectors into a high-dimensional feature space, providing a mathematically transparent non-linear time-series regression mechanism that overcomes the “black-box” nature and data-hungry limitations of neural networks.
Multi-kernel mixture measure: Substitutes the traditional MSE cost function with the MKM measure. This structural adaptation fundamentally enhances the algorithm’s robustness, effectively shielding the prognostic model from non-Gaussian noise and extreme outliers prevalent in harsh industrial settings.
Fractional Derivative: Serving as the core methodological innovation, the incorporation of the Hadamard fractional-order calculus directly into the MKM-based gradient descent mechanism mathematically encodes “memory capacity” and “hereditary properties” into the model’s adaptive weights. This unique integration allows the algorithm to precisely capture long-term temporal dependencies without the excessive computational overhead of traditional fractional filters, significantly outperforming integer-order counterparts.
(5)
Prognostic Execution
To estimate the RUL, a deterministic failure threshold is first established based on domain expertise, historical failure modes, and safety regulations. Starting from the prediction origin, the trained algorithm autoregressively forecasts the future trajectory of the MA degradation curve. The exact time at which this extrapolated trajectory intersects the predefined threshold is recorded as the predicted failure time. Consequently, the estimated RUL is calculated as the time difference between the projected failure time and the prediction starting point.
(6)
Performance Analysis
To rigorously validate the algorithm’s engineering viability, robustness, and stability, the framework’s evaluation extends beyond a single static prediction at the FPT by adopting a multi-point sequential tracking approach. Specifically, the temporal span between the FPT and the actual failure is discretized into multiple uniform segments (e.g., 10 segments). At the onset of each segment, the prediction algorithm iteratively updates the RUL prediction by incorporating newly acquired operational data, thereby simulating a continuous, online prognostic monitoring environment.
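Phases (3) and (5) above can be sketched in a few lines of Python. The sketch rests on several assumptions that the text does not fix: the baseline standard deviation is the population value (`statistics.pstdev`), the FPT is taken as the index of the last sample in the run of consecutive violations, and failure is declared when the forecast first reaches or exceeds the threshold. The function names `find_fpt` and `rul_from_forecast` and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def find_fpt(kurtosis, n_baseline, run_length):
    """Index at which `run_length` consecutive kurtosis values have left
    the 3-sigma interval of the healthy baseline, or None if never."""
    baseline = kurtosis[:n_baseline]
    mu, sigma = mean(baseline), pstdev(baseline)
    lower, upper = mu - 3.0 * sigma, mu + 3.0 * sigma
    streak = 0
    for idx in range(n_baseline, len(kurtosis)):
        if kurtosis[idx] < lower or kurtosis[idx] > upper:
            streak += 1
            if streak == run_length:
                return idx          # delayed trigger: the run is complete
        else:
            streak = 0              # an in-band sample resets the count
    return None

def rul_from_forecast(forecast, threshold, sample_period=1.0):
    """Time from the prediction origin to the first forecast value that
    reaches the failure threshold, or None if it is never reached."""
    for step, value in enumerate(forecast, start=1):
        if value >= threshold:
            return step * sample_period
    return None

# Toy kurtosis record: an isolated spike at index 7, sustained rise from index 9.
kurtosis = [3.0, 3.1, 2.9, 3.0, 3.1, 2.9, 3.0, 5.0, 3.0, 4.8, 5.1, 5.3, 6.0]
fpt = find_fpt(kurtosis, n_baseline=6, run_length=3)      # run_length = l + 1
rul = rul_from_forecast([1.2, 1.5, 1.9, 2.4, 3.1], threshold=2.0)
print(fpt, rul)
```

In the toy record the isolated spike does not trigger the alarm because the next sample falls back inside the interval; only the sustained excursion does. Multi-point sequential tracking as in phase (6) would amount to repeating the forecast and the threshold-crossing search from several prediction origins between the FPT and failure.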

3.2. Fractional Derivative Multi-Kernel Adaptive Learning Algorithm

3.2.1. Problem Definition

The health condition of rotating machinery deteriorates over time, and its maximum deliverable capacity gradually decreases. This process of state decline can be mathematically represented as a nonlinear time series related to the number of operating cycles. To accurately characterize this degradation mechanism, the extracted system state feature x is selected as the primary state variable. Consequently, the one-step-ahead capacity forecasting model is constructed as follows:
$$x(i) = f\big(x(i-1), x(i-2), \ldots, x(i-l)\big) + v(i) \tag{5}$$
Here, $f(\cdot)$ represents an unknown mapping function that governs the underlying degradation dynamics, $x(i)$ indicates the system state feature at the $i$-th cycle, and $v(i)$ accounts for the associated modeling noise. Additionally, the parameter $l$ signifies the time-embedding dimension. Given the highly complex decay mechanism, deriving an exact analytical formulation for $f(\cdot)$ is practically infeasible. To address this challenge and obtain a reliable approximation, the fractional-derivative multi-kernel adaptive learning algorithm is used to construct a data-driven prediction model. Given a training dataset $\{(x_i, y_i)\}_{i=1}^{n}$, where the historical input feature vector is defined as $x_i = [x(i-1), x(i-2), \ldots, x(i-l)]^T$ and the target output is $y_i = x(i)$, the predictive mapping can be estimated via
$$y_i \approx \omega^T \varphi(x_i) = \sum_{j=1}^{n} \omega_n(j)\, \kappa(x_j, x_i) \tag{6}$$
In this expression, $\omega_n(j)$ specifies the dynamically adjusted weight assigned to the $j$-th support vector $x_j$ stored in the dictionary at iteration $n$, where $n$ represents the total number of training steps completed.
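The construction of the training pairs in Equation (5) and the kernel expansion in Equation (6) can be sketched as follows. This is a minimal illustration assuming a Gaussian kernel on vectors. The helper names `embed`, `gaussian_kernel`, and `predict`, as well as the toy series, are hypothetical, and the weights passed to `predict` are placeholders rather than weights learned by the proposed algorithm.

```python
import math

def embed(series, l):
    """Return inputs x_i = [x(i-1), ..., x(i-l)] and targets y_i = x(i)."""
    if not 1 <= l < len(series):
        raise ValueError("embedding dimension l must satisfy 1 <= l < len(series)")
    inputs = [[series[i - k] for k in range(1, l + 1)] for i in range(l, len(series))]
    targets = [series[i] for i in range(l, len(series))]
    return inputs, targets

def gaussian_kernel(u, v, sigma):
    """Gaussian kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def predict(x, dictionary, weights, sigma):
    """Kernel expansion of Eq. (6): sum over j of w_j * kappa(x_j, x)."""
    return sum(w * gaussian_kernel(xj, x, sigma) for xj, w in zip(dictionary, weights))

series = [0.1, 0.2, 0.4, 0.7, 1.1]           # toy health-indicator values
inputs, targets = embed(series, l=2)
for x_i, y_i in zip(inputs, targets):
    print(x_i, "->", y_i)
```

Each input vector lists the most recent sample first, mirroring the ordering $x(i-1), \ldots, x(i-l)$ used in the text.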

3.2.2. Multi-Kernel Mixture Measure

As is the case with various machine learning frameworks, the choice of an appropriate cost function is of paramount importance for the effective training of a kernel adaptive learning algorithm. To mitigate the adverse effects introduced by measurement outliers, the multi-kernel mixture (MKM) measure is adopted as the objective function in this study. For any two continuous random variables $u$ and $v$, the MKM measure is defined as
$$M(u, v) = E\left[\alpha\, \kappa_{\sigma_1}(u - v) + (1 - \alpha)\, \kappa_{\sigma_2}(u - v)\right] \tag{7}$$
in which $E[\cdot]$ denotes the mathematical expectation operator, $\alpha \in [0, 1]$ represents the mixture weighting coefficient, and $\sigma_1$ and $\sigma_2$ specify the kernel bandwidths of the standard Gaussian function $\kappa_{\sigma}(u - v)$, which is given by
$$\kappa_{\sigma}(u - v) = \exp\left(-\frac{(u - v)^2}{2\sigma^2}\right) \tag{8}$$
Originating from the information theoretic learning paradigm [46], the MKM serves as a localized similarity metric, essentially representing a generalized correlation within the reproducing kernel space. Given a bounded collection of error observations $e = [e_1, e_2, \ldots, e_n]^T$, the empirical estimate of the MKM-derived cost function is obtained through sample averaging:
$$J(e) = 1 - \frac{1}{n} \sum_{i=1}^{n} \left[\alpha\, \kappa_{\sigma_1}(e_i) + (1 - \alpha)\, \kappa_{\sigma_2}(e_i)\right] \tag{9}$$
During the adaptive learning process, the optimal filter weights are updated by minimizing this empirical cost function. For an individual instantaneous error $e$, the corresponding sample-wise loss is expressed as $L(e) = 1 - [\alpha\, \kappa_{\sigma_1}(e) + (1 - \alpha)\, \kappa_{\sigma_2}(e)]$.
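The sample-wise loss and the empirical cost of Equation (9) translate directly into code. The sketch below is a plain transcription under illustrative parameter values ($\alpha = 0.5$, $\sigma_1 = 1$, $\sigma_2 = 2$) that are assumptions, not settings reported in this paper. The function names are hypothetical.

```python
import math

def gaussian_kernel(e, sigma):
    """Gaussian function of Eq. (8) evaluated at a scalar error e."""
    return math.exp(-(e * e) / (2.0 * sigma * sigma))

def mkm_loss(e, alpha, sigma1, sigma2):
    """Sample-wise MKM loss L(e) for a single error."""
    return 1.0 - (alpha * gaussian_kernel(e, sigma1)
                  + (1.0 - alpha) * gaussian_kernel(e, sigma2))

def mkm_cost(errors, alpha, sigma1, sigma2):
    """Empirical cost J(e) of Eq. (9): the average of the sample-wise losses."""
    return sum(mkm_loss(e, alpha, sigma1, sigma2) for e in errors) / len(errors)

# Compare the MKM loss with the squared error for growing error magnitudes.
for e in (0.0, 0.5, 2.0, 100.0):
    print(e, mkm_loss(e, alpha=0.5, sigma1=1.0, sigma2=2.0), e * e)
```

The loss is zero for a zero error and never exceeds 1, whereas the squared error grows without bound. A single extreme outlier therefore contributes at most 1 to the sum of losses, which is the mechanism behind the robustness described above.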

3.2.3. Algorithm Derivation

The optimal weight vector $\omega_i$ at training step $i$ can be derived by minimizing the regularized objective function:
$$\min_{\omega_i \in \mathbb{F}} \sum_{j \in I_d} \left[1 - \alpha\, \kappa_{\sigma_1}(e_j) - (1 - \alpha)\, \kappa_{\sigma_2}(e_j)\right] + \frac{\gamma}{2} \left\|\omega_i\right\|_{\mathbb{F}}^{2} \tag{10}$$
In this optimization problem, $\omega_i \in \mathbb{F}$ designates the adaptive weight vector, while $I_d = \{1, 2, \ldots, i\}$ specifies the set of indices up to the current iteration. The terms $\kappa_{\sigma_1}(e_j)$ and $\kappa_{\sigma_2}(e_j)$ are the unnormalized kernel evaluations derived from Equation (7). Furthermore, $d_j$ indicates the desired target response, $\varphi_j = \varphi(u_j)$ denotes the transformed input vector for $j \in I_d$, $e_j = d_j - \omega_i^T \varphi_j$ quantifies the instantaneous prediction error between the target and the model output, and $\gamma$ serves as a non-negative regularization parameter.
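By algebraically rearranging the cost function to incorporate the memory-preserving characteristics of fractional calculus, the optimization criterion is equivalently extended to a maximization problem in Equation (11):
$$\max_{\omega_i \in \mathbb{F}} \sum_{j \in I_d} \left[\alpha\, \kappa_{\sigma_1}(d_j - \omega_i^T \varphi_j) + (1 - \alpha)\, \kappa_{\sigma_2}(d_j - \omega_i^T \varphi_j)\right] - D^{\beta}(\gamma \omega_i) \tag{11}$$
To determine the optimal weights, we apply the $\beta$-th order fractional derivative to Equation (11) with respect to $\omega_i$ and equate the gradient to zero. Applying the fractional chain rule establishes the critical condition detailed in Equation (12):
$$\sum_{j \in I_d} \left[\alpha \left(\frac{\varphi_j e_j}{\sigma_1^{2}}\right)^{\beta} \exp\left(-\frac{e_j^2}{2\sigma_1^2}\right) + (1 - \alpha) \left(\frac{\varphi_j e_j}{\sigma_2^{2}}\right)^{\beta} \exp\left(-\frac{e_j^2}{2\sigma_2^2}\right)\right] - \gamma \omega_i = 0 \tag{12}$$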
To simplify this expression and make the derivation more transparent, we factorize the fractional error term as $e_j^{\beta} = e_j^{\beta - 1} \cdot e_j$. This allows us to extract the shared terms and define an intermediate error-weighting scalar $g_j$ for each sample $j$:
$$g_j = \left[\frac{\alpha}{\sigma_1^{2\beta}} \exp\left(-\frac{e_j^2}{2\sigma_1^2}\right) + \frac{1 - \alpha}{\sigma_2^{2\beta}} \exp\left(-\frac{e_j^2}{2\sigma_2^2}\right)\right] e_j^{\beta - 1} \tag{13}$$
With this definition, the terms inside the summation of Equation (12) condense to $\varphi_j^{\beta} g_j e_j$. By substituting the definition of the instantaneous estimation error $e_j = d_j - \omega_i^T \varphi_j$ into this simplified summation and separating the target response from the model prediction, Equation (12) is transformed into Equation (14):
$$\sum_{j \in I_d} \varphi_j^{\beta} g_j \left(d_j - \omega_i^T \varphi_j\right) - \gamma \omega_i = 0 \;\Longrightarrow\; \sum_{j \in I_d} \varphi_j^{\beta} g_j d_j - \sum_{j \in I_d} \varphi_j^{\beta} g_j\, \omega_i^T \varphi_j - \gamma \omega_i = 0 \tag{14}$$
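The behavior of the error-weighting scalar in Equation (13) can be examined with a short Python sketch. Two assumptions are made here that the derivation does not state: the magnitude $|e_j|$ replaces $e_j$ so that the fractional power stays real for negative errors, and a small constant `eps` keeps $|e_j|^{\beta - 1}$ finite at zero error when $\beta < 1$. The names `mkm_weight` and `influence` and the parameter values are hypothetical.

```python
import math

def mkm_weight(e, alpha, sigma1, sigma2, beta, eps=1e-12):
    """Error-weighting scalar g of Eq. (13) for a single error e.

    Assumptions of this sketch: |e| is used in place of e in the
    fractional power, and eps guards the case e == 0 with beta < 1.
    """
    mag = max(abs(e), eps)
    mix = ((alpha / sigma1 ** (2.0 * beta)) * math.exp(-e * e / (2.0 * sigma1 ** 2))
           + ((1.0 - alpha) / sigma2 ** (2.0 * beta)) * math.exp(-e * e / (2.0 * sigma2 ** 2)))
    return mix * mag ** (beta - 1.0)

def influence(e, **params):
    """Magnitude of g * e, the factor that multiplies phi_j^beta in Eq. (14)."""
    return mkm_weight(e, **params) * abs(e)

params = dict(alpha=0.5, sigma1=1.0, sigma2=2.0, beta=0.9)
for e in (0.1, 1.0, 10.0):
    print(e, influence(e, **params))
```

Under these settings the influence of a sample first grows with the error and then collapses for large errors, because the Gaussian factors decay faster than the power term grows. An extreme outlier therefore contributes almost nothing to the stationarity condition.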
To express this relationship in a compact matrix form, we define the following matrices and vectors for the sequence up to the current iteration $i$: the mapped input data matrix $\Psi_i = [\varphi_1, \varphi_2, \ldots, \varphi_i]$; the $\beta$-th order fractional derivative input matrix $\Phi_i = [\varphi_1^{\beta}, \varphi_2^{\beta}, \ldots, \varphi_i^{\beta}]$; the diagonal weighting matrix $G_i = \mathrm{diag}(g_1, g_2, \ldots, g_i)$; and the target response vector $D_i = [d_1, d_2, \ldots, d_i]^T$.
Using these definitions, we construct the error-weighted target response vector $\bar{D}_i = G_i D_i = [g_1 d_1, g_2 d_2, \ldots, g_i d_i]^T$. Consequently, the summations in Equation (14) can be directly translated into the matrix form given in Equation (15):
$$\Phi_i \bar{D}_i - \Phi_i G_i \Psi_i^T \omega_i = \gamma \omega_i \tag{15}$$
By factoring out the optimal weight vector $\omega_i$, its explicit analytic solution can be rearranged as Equation (16):
$$\omega_i = \gamma^{-1} \Phi_i \left(\bar{D}_i - G_i \Psi_i^T \omega_i\right) = \Phi_i \Theta_i \tag{16}$$
In this operation, we encapsulate the intermediate expression within an auxiliary parameter vector $\Theta_i = \gamma^{-1} (\bar{D}_i - G_i \Psi_i^T \omega_i)$. Substituting the identity $\omega_i = \Phi_i \Theta_i$ back into this definition yields Equation (17):
$$\gamma \Theta_i = \bar{D}_i - G_i \Psi_i^T \Phi_i \Theta_i \tag{17}$$
Isolating $\Theta_i$ provides its closed-form representation:
$$\Theta_i = \left(\gamma I + G_i \Psi_i^T \Phi_i\right)^{-1} \bar{D}_i \tag{18}$$
Substituting the expression for $\Theta_i$ from Equation (18) back into Equation (16) provides the final update rule for the weights:
$$\omega_i = \Phi_i \left(\gamma I + G_i \Psi_i^T \Phi_i\right)^{-1} \bar{D}_i \tag{19}$$
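The closed-form solution in Equation (18) reduces to a single linear solve once the matrix $\Psi_i^T \Phi_i$ and the weights $g_j$ are available. The NumPy sketch below makes two loud assumptions: a Gaussian Gram matrix is used as a stand-in for $\Psi_i^T \Phi_i$, whose practical evaluation is not spelled out in the text above, and the weights `g` are fixed numbers supplied by hand rather than computed from the errors. The function names and toy data are hypothetical.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix of the Gaussian kernel for the rows of X."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_theta(Xi, g, d, gamma):
    """Solve (gamma*I + G Xi) Theta = G d, i.e. Eq. (18) with G = diag(g)."""
    n = len(d)
    A = gamma * np.eye(n) + np.diag(g) @ Xi
    return np.linalg.solve(A, g * d)   # g * d is the weighted target vector

X = np.array([[0.0], [0.5], [1.0], [1.5]])   # toy inputs, one row per sample
d = np.array([0.0, 0.4, 0.9, 1.6])           # toy targets
g = np.array([1.0, 0.8, 0.6, 0.1])           # assumed weights; last sample down-weighted
Xi = gaussian_gram(X, sigma=1.0)             # stand-in for Psi^T Phi (assumption)
theta = solve_theta(Xi, g, d, gamma=0.1)
print(theta)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly. The cost still grows cubically with the number of stored samples, which motivates a recursive formulation.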
To facilitate a computationally efficient recursive evaluation of $\omega_i$ without requiring a full matrix inversion at every time step, we construct an inverse correlation matrix $C_i = (\gamma I + G_i \Xi_i)^{-1}$, where $\Xi_i = \Psi_i^T \Phi_i$ computes the kernelized inner product mapping between $\Psi_i$ and $\Phi_i$. As new data arrive, the matrix product $G_i \Xi_i$ can be decomposed into a block-partitioned format, separating the previous $i - 1$ computations from the current $i$-th update, as illustrated in Equation (20):
$$G_i \Xi_i = \begin{bmatrix} G_{i-1} & \mathbf{0} \\ \mathbf{0}^T & g_i \end{bmatrix} \begin{bmatrix} \Xi_{i-1} & \mu_i \\ \mu_i^T & \kappa_{ii} \end{bmatrix} = \begin{bmatrix} G_{i-1} \Xi_{i-1} & G_{i-1} \mu_i \\ g_i \mu_i^T & \kappa_{ii}\, g_i \end{bmatrix} \tag{20}$$
In this block decomposition, $\mathbf{0}$ denotes a zero vector of appropriate dimensions. The cross-correlation vector is defined as $\mu_i = \Psi_{i-1}^T \varphi_i^{\beta}$, and the scalar self-inner product evaluates to $\kappa_{ii} = \langle \varphi_i, \varphi_i^{\beta} \rangle_{\mathbb{F}}$, where $\langle \cdot, \cdot \rangle_{\mathbb{F}}$ is the inner product in $\mathbb{F}$. Consequently, $C_i$ can be recursively structured via the block matrix in Equation (21):
$$C_i = \left(\gamma I + G_i \Xi_i\right)^{-1} = \begin{bmatrix} C_{i-1}^{-1} & G_{i-1} \mu_i \\ g_i \mu_i^T & \kappa_{ii}\, g_i + \gamma \end{bmatrix}^{-1} \tag{21}$$
Applying the standard block matrix inversion lemma to Equation (21) enables the analytical inversion of the matrix block by block. This allows $C_i$ to be updated via Equation (22):
$$C_i = \begin{bmatrix} C_{i-1} + \theta_i^{-1} g_i\, z_{G_i} z_i^T & -\theta_i^{-1} z_{G_i} \\ -\theta_i^{-1} g_i\, z_i^T & \theta_i^{-1} \end{bmatrix} \tag{22}$$
where the intermediate vectors ($z_{G_i}$ and $z_i$) and the scalar normalizer ($\theta_i$, acting as the Schur complement) are defined as follows to simplify the notation:
$$z_{G_i} = C_{i-1} G_{i-1} \mu_i, \qquad z_i = C_{i-1}^T \mu_i \tag{23}$$
$$\theta_i = \gamma + \kappa_{ii}\, g_i - g_i\, \mu_i^T C_{i-1} G_{i-1} \mu_i \tag{24}$$
Furthermore, by partitioning the weighted target vector as $\bar{D}_i = [\bar{D}_{i-1}^T, g_i d_i]^T$, the recursive update rule for the pivotal parameter vector $\Theta_i = C_i \bar{D}_i$ is obtained by multiplying the expanded block matrix $C_i$ with the partitioned vector:
$$\Theta_i = \begin{bmatrix} \left(C_{i-1} + \theta_i^{-1} g_i\, z_{G_i} z_i^T\right) \bar{D}_{i-1} - \theta_i^{-1} z_{G_i}\, g_i d_i \\ -\theta_i^{-1} g_i\, z_i^T \bar{D}_{i-1} + \theta_i^{-1} g_i d_i \end{bmatrix} \tag{25}$$
By recognizing that $\Theta_{i-1} = C_{i-1} \bar{D}_{i-1}$ and that the a priori system prediction is $y_i = \mu_i^T \Theta_{i-1} = z_i^T \bar{D}_{i-1}$, we can substitute the a priori estimation error $e_i = d_i - y_i$ into the expression. Collecting terms simplifies the recursive update rule for $\Theta_i$ to Equation (26):
$$\Theta_i = \begin{bmatrix} \Theta_{i-1} - \theta_i^{-1} g_i\, z_{G_i} e_i \\ \theta_i^{-1} g_i\, e_i \end{bmatrix} \tag{26}$$
In this recursion, $e_i$ characterizes the a priori estimation error at the $i$-th iteration step, evaluated as $e_i = d_i - \mu_i^T \Theta_{i-1}$. Ultimately, the predicted adaptive system output $y_i$ for the current instance is acquired through $y_i = \mu_i^T \Theta_{i-1}$. A comprehensive overview of the proposed algorithm is given in Algorithm 1.
Algorithm 1: Fractional derivative multi-kernel adaptive learning algorithm.
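The recursion in Equations (22)–(26) can be sketched and checked against a direct matrix inversion. The NumPy code below is a hedged illustration, not a reproduction of Algorithm 1. It assumes a symmetric Gaussian Gram matrix as a stand-in for $\Xi_i$ (so that the bottom row of each block equals $\mu_i^T$, as in Equation (20)) and treats the weights `g` as fixed inputs, whereas an online run would presumably derive each $g_i$ from the a priori error. The function name `recursive_update` and the toy data are hypothetical.

```python
import numpy as np

def recursive_update(Xi, g, d, gamma):
    """Grow C_i and Theta_i sample by sample following Eqs. (22)-(26)."""
    C = np.array([[1.0 / (gamma + g[0] * Xi[0, 0])]])    # C_1
    theta = C @ np.array([g[0] * d[0]])                   # Theta_1 = C_1 * g_1 d_1
    for i in range(1, len(d)):
        mu = Xi[:i, i]                    # cross-correlation vector mu_i
        kappa_ii = Xi[i, i]               # scalar self-inner product
        z_G = C @ (g[:i] * mu)            # z_{G_i} = C_{i-1} G_{i-1} mu_i, Eq. (23)
        z = C.T @ mu                      # z_i = C_{i-1}^T mu_i, Eq. (23)
        schur = gamma + kappa_ii * g[i] - g[i] * (mu @ z_G)   # theta_i, Eq. (24)
        err = d[i] - mu @ theta           # a priori error e_i
        C = np.block([                    # block update of Eq. (22)
            [C + (g[i] / schur) * np.outer(z_G, z), -z_G[:, None] / schur],
            [-(g[i] / schur) * z[None, :], np.array([[1.0 / schur]])],
        ])
        theta = np.append(theta - (g[i] / schur) * z_G * err,   # Eq. (26)
                          g[i] * err / schur)
    return C, theta

X = np.linspace(0.0, 2.0, 5)[:, None]           # toy scalar inputs
Xi = np.exp(-((X - X.T) ** 2) / 2.0)            # Gaussian Gram matrix, sigma = 1
g = np.array([1.0, 0.9, 0.7, 0.4, 0.2])         # assumed fixed weights
d = np.array([0.0, 0.3, 0.7, 1.2, 2.0])         # toy targets
C, theta = recursive_update(Xi, g, d, gamma=0.1)
C_direct = np.linalg.inv(0.1 * np.eye(5) + np.diag(g) @ Xi)
print(np.allclose(C, C_direct), np.allclose(theta, C_direct @ (g * d)))
```

Each step costs on the order of $i^2$ operations instead of the $i^3$ needed for a fresh inversion, and with fixed weights the recursive result coincides with the batch solution of Equation (18) up to rounding.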

3.3. Interpretability and Explainable Mechanism

While conventional data-driven deep learning methodologies (e.g., Convolutional Neural Networks, Deep Belief Networks) function as uninterpretable “black boxes” with myriad opaque parameters and hidden layers, the proposed fractional-derivative multi-kernel adaptive learning framework inherently possesses a transparent “white-box” architecture. The interpretability and explainable mechanisms of the framework are primarily established upon three dimensions:
(1) In typical deep neural networks, the feature extraction and nonlinear mapping processes are largely unobservable. In contrast, the prediction mechanism of the proposed model is strictly governed by an explicit mathematical formulation (Equation (6)). The predicted output y i is calculated as a linear combination of kernel evaluations κ ( x j , x i ) weighted by ω n ( j ) . This explicit structure allows the prediction to be directly interpreted as a similarity-weighted aggregation of historical representative states (support vectors). Practitioners can intuitively trace exactly which past degradation patterns most significantly drive the current RUL estimation. Furthermore, instead of relying on opaque gradient backpropagation, the learning process is mathematically transparent. The derivation from the objective function (Equation (10)) to the recursive block-matrix update rules (Equations (18)–(26)) precisely demonstrates how the model parameters are incrementally adjusted step-by-step based on instantaneous estimation errors.
(2) In standard machine learning models, hyperparameters often lack physical meaning. The Hadamard fractional derivative, in contrast, is a physically meaningful operator. The wear and fatigue of rotating machinery are history-dependent, cumulative processes rather than memoryless Markovian ones. The fractional order $\beta$ mathematically parameterizes the “memory capacity” and “hereditary properties” of the process over the entire degradation trajectory. This provides a transparent mathematical formulation of how cumulative structural damage continuously influences the current physical state, bridging abstract algorithmic optimization with actual mechanical degradation principles.
(3) The adoption of the Multi-Kernel Mixture (MKM) measure (Equation (7)) provides an interpretable noise-handling mechanism. The mixture coefficient $\alpha$ and the distinct kernel bandwidths ($\sigma_1$, $\sigma_2$) explicitly govern the trade-off between standard prediction accuracy and outlier suppression. During the weight-updating process, the intermediate parameter $g_i$ (defined in Algorithm 1) explicitly acts as a dynamic confidence penalty based on the error distribution. This provides a clear functional explanation for the method's superior robustness compared to traditional Mean Square Error (MSE)-based filters, as the model mathematically scales down the influence of extreme non-Gaussian sensor anomalies in real time.
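The similarity-weighted reading of the predictor described in item (1) can be illustrated with a minimal sketch. It assumes a Gaussian kernel, and the names `gaussian_kernel`, `predict_with_contributions`, `centers`, and `weights` are hypothetical; the kernel and coefficients actually used by the method are those of Equation (6) and Algorithm 1, which are not reproduced here.

```python
import numpy as np

def gaussian_kernel(centers, x, bandwidth=1.0):
    """Gaussian similarity between each stored center and the query x (assumed kernel)."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def predict_with_contributions(centers, weights, x, bandwidth=1.0):
    """Return the kernel-expansion output and each center's additive contribution."""
    contributions = weights * gaussian_kernel(centers, x, bandwidth)
    return float(contributions.sum()), contributions

# Illustrative values only: three stored states and their expansion weights.
centers = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
weights = np.array([0.5, 1.0, 2.0])

y, contrib = predict_with_contributions(centers, weights, np.array([1.0, 1.0]))
top = int(np.argmax(np.abs(contrib)))  # index of the most influential past state
```

Because the output is a plain sum of per-center terms, ranking `contrib` by magnitude shows which historical states dominate a given prediction. In this toy case the query coincides with the second center, so that center contributes almost the entire output even though the distant third center carries the largest weight.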

4. Experiments

This section evaluates the proposed method using two publicly available rolling bearing datasets: the XJTU-SY bearing lifetime dataset, released by Xi’an Jiaotong University and Changxing Sumyoung Technology [47], and the PRONOSTIA benchmark dataset [48]. Both datasets consist of run-to-failure vibration acceleration data acquired from accelerated degradation testbeds. Specifically, the data contain raw, high-frequency acceleration signals recorded at regular intervals by horizontal and vertical accelerometers, from a pristine healthy state until complete structural failure.

4.1. Hyperparameter Optimization and Justification

In deployment, the model operates as a real-time predictive maintenance tool. Raw vibration signals are acquired throughout operation, and the Maximum Amplitude (MA) and kurtosis features are extracted from them. During the healthy stage, an adaptive $3\sigma$ interval mechanism monitors the kurtosis feature. The moment at which an anomaly is confirmed is defined as the First Prediction Time (FPT), and the prediction algorithm is activated at that point.
Inputs: The mathematical input to the model at the $i$-th cycle is a time-embedded sliding window of the historical Maximum Amplitude (MA) feature extracted from the recent vibration data: $\mathbf{x}_i = [x(i-l), x(i-l+1), \ldots, x(i-1)]^{T}$, where $l$ is the time-embedding dimension.
Outputs: The mathematical output is an autoregressive one-step-ahead prediction of the degradation state, $y_i \approx x(i)$.
RUL Calculation: By iteratively feeding its one-step predictions back into itself as new inputs, the algorithm extrapolates the continuous future degradation trajectory. The Remaining Useful Life (RUL) is calculated as the precise time difference between the current moment and the predicted timestamp when the extrapolated trajectory intersects the predefined mechanical acceleration cutoff (failure threshold).
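The input construction and the recursive RUL extrapolation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `time_embed`, `extrapolate_rul`, and `toy_predictor` are hypothetical names, the predictor is a stand-in for the trained model, and `dt` (the time between consecutive feature samples) is an assumed parameter.

```python
import numpy as np

def time_embed(series, l):
    """Build inputs x_i = [x(i-l), ..., x(i-1)] and targets x(i) from a 1-D series."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i - l:i] for i in range(l, len(series))])
    d = series[l:]
    return X, d

def extrapolate_rul(history, predict_one_step, l, threshold, dt=1.0, max_steps=10_000):
    """Feed one-step predictions back as inputs until the threshold is crossed.

    Returns the RUL in time units (steps * dt), or None if the threshold is
    not reached within max_steps.
    """
    window = list(np.asarray(history, dtype=float)[-l:])
    for step in range(1, max_steps + 1):
        y = float(predict_one_step(np.array(window)))
        if y >= threshold:
            return step * dt
        window = window[1:] + [y]  # slide the window over the new prediction
    return None

# Toy stand-in predictor (NOT the paper's model): 5% growth over the last value.
def toy_predictor(window):
    return 1.05 * window[-1]

X, d = time_embed([1.0, 2.0, 3.0, 4.0, 5.0], l=2)
rul = extrapolate_rul([8.0, 9.0, 10.0], toy_predictor, l=2, threshold=20.0, dt=60.0)
```

With the toy predictor, the extrapolated value first exceeds the threshold of 20 at the fifteenth step, so `rul` equals 15 steps times 60 s. Returning `None` when no crossing occurs avoids reporting a spurious lifetime for a trajectory that never reaches the threshold.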
To ensure the robustness, optimal predictive performance, and reproducibility of the proposed fractional-derivative multi-kernel adaptive learning framework, the crucial hyperparameters ($\alpha$, $\beta$, $\sigma_1$, $\sigma_2$, and $\gamma$) must be systematically justified and optimized prior to prognostic execution. Instead of relying on empirical assignments, a grid-search optimization strategy combined with time-series cross-validation is employed. The physical and mathematical justifications, along with the search spaces for these parameters, are delineated as follows:
(1) Kernel bandwidths ($\sigma_1$, $\sigma_2$) and mixture weighting coefficient ($\alpha$): These parameters govern the multi-kernel mixture measure. To effectively handle distinct types of error distributions, two contrasting bandwidths are utilized. The smaller bandwidth $\sigma_1$ is tailored to capture fine-grained, localized modeling deviations, whereas the larger bandwidth $\sigma_2$ is designated to suppress extreme non-Gaussian measurement outliers commonly encountered in harsh industrial environments. Consequently, the search spaces are respectively set as $\sigma_1 \in [0.1, 2.0]$ and $\sigma_2 \in [2.0, 10.0]$. The parameter $\alpha \in (0, 1)$, which controls the trade-off between local sensitivity and global outlier robustness, is optimized with a step size of 0.1.
(2) Fractional derivative order ($\beta$): As the pivotal parameter encoding the “memory capacity” and hereditary attributes of the mechanical degradation process, $\beta$ strictly governs the model’s ability to extract long-range temporal dependencies. Given the varying complexities of physical degradation across different operating conditions, $\beta$ is systematically tuned via grid search within the interval $(0, 2.0]$ with a step size of 0.05.
(3) Regularization parameter ($\gamma$): To constrain the norm of the adaptive weights and prevent algorithmic overfitting during the adaptive learning process, the regularization penalty $\gamma$ is optimized over a logarithmically spaced candidate set $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1\}$.
In the practical implementation for both the XJTU-SY and PRONOSTIA datasets, the optimal parameter combination is systematically determined using historical condition monitoring data acquired prior to the First Prediction Time (FPT). Specifically, the grid-search algorithm iteratively evaluates the candidate parameters, and the combination that minimizes the cross-validation error on the historical degradation sequence is selected. Once identified, these optimized hyperparameters are locked and deployed for the online remaining useful life (RUL) prediction.
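A grid search with time-ordered validation of the kind described above can be sketched as follows. The sketch makes several assumptions: expanding-window splits are used because the text does not specify the cross-validation scheme, the function names are hypothetical, and the learner `kernel_ridge_stand_in` is a plain Gaussian kernel ridge regressor rather than the proposed algorithm, so only two parameters (`sigma1`, `gamma`) are searched in the demonstration.

```python
import itertools
import numpy as np

def expanding_window_splits(n, n_folds=3, min_train=10):
    """Yield (train_idx, val_idx) pairs in which validation always follows training."""
    fold = (n - min_train) // n_folds
    for k in range(n_folds):
        end_train = min_train + k * fold
        yield np.arange(end_train), np.arange(end_train, end_train + fold)

def grid_search(X, d, grid, fit_predict, n_folds=3, min_train=10):
    """Return the parameter dict with the lowest mean validation RMSE."""
    best_params, best_err = None, np.inf
    names = list(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        errs = []
        for tr, va in expanding_window_splits(len(d), n_folds, min_train):
            pred = fit_predict(params, X[tr], d[tr], X[va])
            errs.append(np.sqrt(np.mean((pred - d[va]) ** 2)))
        if np.mean(errs) < best_err:
            best_params, best_err = params, float(np.mean(errs))
    return best_params, best_err

def kernel_ridge_stand_in(params, X_tr, d_tr, X_va):
    """Stand-in learner (NOT the paper's algorithm): Gaussian kernel ridge regression."""
    def gram(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * params["sigma1"] ** 2))
    coef = np.linalg.solve(gram(X_tr, X_tr) + params["gamma"] * np.eye(len(X_tr)), d_tr)
    return gram(X_va, X_tr) @ coef

# Synthetic pre-FPT series, embedded with l = 3 (illustrative data only).
s = np.sin(np.linspace(0.0, 6.0, 60))
X = np.array([s[i - 3:i] for i in range(3, len(s))])
d = s[3:]

grid = {"sigma1": [0.1, 0.5, 1.0, 2.0], "gamma": [1e-4, 1e-3, 1e-2, 1e-1, 1.0]}
best, err = grid_search(X, d, grid, kernel_ridge_stand_in)
```

For the full method, the `grid` dictionary would also carry `alpha` (step 0.1), `beta` (step 0.05), and `sigma2`; the step sizes for the two bandwidths are not stated in the text and would have to be chosen. Keeping every validation block strictly after its training block prevents future degradation data from leaking into the selection.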

4.2. Case I: RUL Prediction on XJTU-SY Datasets

4.2.1. Datasets

In this study, the XJTU-SY testbed (illustrated in Figure 3) was employed to capture the complete run-to-failure degradation trajectories of LDK UER204 rolling bearings. To ensure extensive coverage of operational states, the experiments were executed across three representative speed-load profiles (2100 rpm/12 kN, 2250 rpm/11 kN, and 2400 rpm/10 kN). During the testing phase, dual-axis (lateral and vertical) vibrations and thermal variations were sampled at 25.6 kHz, capturing 1.28-s snapshots every 60 s. This rigorous acquisition strategy yields a massive, high-quality dataset characterized by a superior signal-to-noise ratio. Notably, the induced bearing damages present a broad spectrum of fault manifestations, mainly consisting of inner race deterioration, outer race fractures, and broken cages.

4.2.2. Experimental Results

Employing the vibration signal’s maximum amplitude (MA) as the health monitoring feature, the full life-cycle degradation trajectory of bearing 1-1 is presented as a representative example in Figure 4a. The bearing’s operational lifespan systematically unfolds across three phases: the healthy operation stage, the incipient degradation stage, and the severe degradation stage. During the healthy operation stage, the vibration amplitude maintains a consistently low and stable profile. As the bearing transitions into the incipient degradation stage, a gradual escalation in signal amplitude occurs, denoting the onset of structural damage. Subsequently, the degradation index deteriorates drastically with exponential growth, marking the entry into the severe degradation stage. This critical phase transition necessitates the initiation of RUL prediction, and the exact moment this prognostic mechanism is triggered is defined as the first prediction time (FPT).
To determine the FPT objectively, this study adopts the adaptive $3\sigma$ interval strategy to distinguish between normal and anomalous bearing states. Initially, historical monitoring data from the healthy operational phase is utilized to establish the $3\sigma$ confidence interval $[m - 3\sigma, m + 3\sigma]$, where $m$ and $\sigma$ denote the sample mean and standard deviation of the kurtosis feature, respectively. This statistical baseline is then employed to detect anomalous conditions. Upon acquiring a new kurtosis measurement $m_f$ at timestamp $t_f$, it is evaluated against the established $3\sigma$ bounds. A value falling outside this interval indicates a potential anomaly, which may originate from actual physical defects or stochastic noise. To preclude false alarms induced by random noise, a robust delayed-triggering criterion is implemented: the prognostic forecasting is activated exclusively when $l + 1$ consecutive kurtosis readings violate the $3\sigma$ threshold. The execution of the $3\sigma$ interval technique and the subsequent FPT identification for bearing 1-1 are visualized in Figure 4b.
For experimental evaluation, vibration datasets from bearings 1-1, 1-2, 1-3 and 1-4 were selected. After deriving the MA condition indicator from the raw signals, the respective FPT for each bearing was identified via the aforementioned adaptive $3\sigma$ methodology. Prior to initiating the online prediction at the identified FPT, the crucial hyperparameters ($\alpha$, $\beta$, $\sigma_1$, $\sigma_2$, and $\gamma$) of the proposed model were systematically calibrated for each specific bearing using the grid-search strategy detailed in Section 4.1, ensuring adaptive alignment with the unique degradation dynamics.
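The delayed-triggering rule can be sketched in a few lines. This is a simplified illustration under stated assumptions: the band is estimated once from the first `n_healthy` samples rather than updated adaptively, the FPT is taken as the index of the last reading in the violating run, and `detect_fpt` is a hypothetical name.

```python
import numpy as np

def detect_fpt(kurtosis, n_healthy, l):
    """Return the index at which l + 1 consecutive readings lie outside the 3-sigma band.

    The band [m - 3*sigma, m + 3*sigma] is estimated from the first n_healthy
    samples. Returns None if the criterion is never met.
    """
    kurtosis = np.asarray(kurtosis, dtype=float)
    m = kurtosis[:n_healthy].mean()
    sigma = kurtosis[:n_healthy].std(ddof=1)
    lower, upper = m - 3.0 * sigma, m + 3.0 * sigma
    run = 0  # length of the current streak of out-of-band readings
    for i in range(n_healthy, len(kurtosis)):
        outside = kurtosis[i] < lower or kurtosis[i] > upper
        run = run + 1 if outside else 0
        if run == l + 1:
            return i
    return None

healthy = [3.0, 3.1, 2.9, 3.0, 3.1, 2.9, 3.0, 3.1, 2.9, 3.0]
# One isolated noise spike (9.0), then a sustained rise.
readings = healthy + [9.0, 3.0, 3.5, 3.6, 3.8, 4.0]
fpt = detect_fpt(readings, n_healthy=len(healthy), l=2)
```

In this toy series the isolated spike at index 10 resets without triggering, and the FPT is declared at index 14, the third consecutive out-of-band reading. Larger `l` trades detection delay for fewer noise-induced false alarms.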
To benchmark the prognostic efficacy of the proposed algorithm, comparative experiments were conducted against six baselines: two data-driven models (RVM [23] and DBN [25]), two model-based filters (PF [19] and EKF [20]), and two sequence-learning networks (Transformer [31] and LSTM [28]). The baseline parameters were calibrated based on the recommendations from their original literature and further optimized through grid search and cross-validation, so that each baseline model achieved its best possible predictive performance on the given datasets. Guided by extensive observations of specific degradation patterns, iterative empirical testing, and domain expertise, a uniform failure threshold of 20 g was prescribed for all test cases. This limit represents the maximum permissible vibration amplitude for safe mechanical operation; surpassing this value signifies critical bearing failure, mandating immediate replacement. The degradation prediction trajectories commencing at the FPT are graphically depicted in Figure 5.
The prediction error for each evaluated algorithm is quantified by computing the temporal discrepancy between the actual failure occurrence and the predicted moment when the trajectory first intersects the threshold. Furthermore, to rigorously ascertain the temporal robustness and dynamic tracking capability of the models, the chronological duration bounded by the FPT and the ultimate failure threshold was evenly partitioned into 10 discrete segments. At the onset of each segment, a sequential RUL estimation trial was re-initialized. The updated prognostic outputs and their corresponding estimation errors were systematically documented across the progressive degradation continuum. This iterative multi-stage validation procedure was replicated across all selected bearings, and the aggregated prediction outcomes from these diverse starting points are summarized in Figure 6.
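The multi-point protocol can be sketched as below. The function names are hypothetical, and `predict_rul` stands for whatever routine re-runs the prognostic model from a given start time; a trivial stand-in is used here purely to show the bookkeeping.

```python
import numpy as np

def segment_start_times(fpt, failure_time, n_segments=10):
    """Start times of n_segments equal-length segments covering [fpt, failure_time]."""
    return np.linspace(fpt, failure_time, n_segments + 1)[:-1]

def sequential_rul_errors(fpt, failure_time, predict_rul, n_segments=10):
    """Re-run the RUL predictor at each segment start.

    Returns (start times T_k, actual RUL, predicted RUL, error = predicted - actual).
    """
    starts = segment_start_times(fpt, failure_time, n_segments)
    actual = failure_time - starts
    predicted = np.array([predict_rul(t) for t in starts])
    return starts, actual, predicted, predicted - actual

# Stand-in predictor (illustrative only): always 10% early.
starts, actual, predicted, errors = sequential_rul_errors(
    fpt=100.0, failure_time=200.0, predict_rul=lambda t: 0.9 * (200.0 - t)
)
```

The four returned arrays are exactly the quantities needed for per-segment error plots and for aggregate metrics over the sequence of prediction points.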

4.3. Case II: RUL Prediction on PRONOSTIA Datasets

4.3.1. Datasets

To further evaluate the proposed methodology, acceleration-based degradation datasets were acquired from the PRONOSTIA testbed, developed by the FEMTO-ST Institute. As depicted in Figure 7, this experimental rig is engineered to execute accelerated run-to-failure testing on rolling element bearings across diverse operational scenarios. The mechanical architecture of the testbed comprises a rotating shaft driven by an AC electric motor, sustained by a primary support bearing and the designated test bearing. To emulate varying environmental stresses, an adjustable radial force is exerted on the test bearing via a hydraulic actuator, while the motor dictates the rotational velocity. For vibration monitoring, two DYTRAN 3035B high-frequency (HF) accelerometers, featuring a sensitivity of 100 mV/g and an operating bandwidth spanning from 0.5 Hz to 10 kHz, were orthogonally attached (in horizontal and vertical orientations) to the exterior of the bearing housing. Furthermore, to maximize signal fidelity and mitigate interference from ambient vibrations, these sensors were deliberately positioned in strict proximity to the test bearing.

4.3.2. Experimental Results

During the experimental phase, an accelerated degradation protocol was implemented, driving the bearing specimens at 1800 rpm alongside a static 4 kN radial force up to the point of severe structural damage. Vibration signals were intermittently recorded at 10-s intervals. In each sampling cycle, a 0.1-s snapshot was acquired utilizing a sampling rate of 25.6 kHz, yielding 2560 discrete data points per sample. Employing such a high-resolution sampling strategy is imperative for capturing the transient, high-frequency oscillatory features indicative of incipient bearing defects.
The overall dataset encompasses three distinct experimental groups, with each group containing seven run-to-failure bearing tests. For the RUL estimation experiments, datasets from bearings 1-1, 1-4, 2-2, and 3-2 were randomly selected to evaluate prognostic performance, following the same technical framework as for the XJTU-SY datasets. Initially, the FPT was adaptively determined by leveraging the $3\sigma$ confidence interval criterion.
The model’s hyperparameters ($\alpha$, $\beta$, $\sigma_1$, $\sigma_2$, and $\gamma$) for these four bearings were then systematically optimized via grid search on their respective pre-FPT historical data prior to RUL estimation.
Subsequently, informed by domain expertise and iterative heuristic evaluations of the specific degradation trajectories, the final failure thresholds for bearings 1-1, 1-4, 2-2, and 3-2 were established at 20 g, 15 g, 20 g, and 15 g, respectively. The initial RUL prediction results triggered at the calculated FPTs are illustrated in Figure 8.
Furthermore, to rigorously validate the robustness and temporal consistency of the proposed algorithm, the entire degradation phase—spanning from the FPT to the final failure threshold—was evenly partitioned into ten chronological segments. A renewed RUL prediction task was systematically initiated at the onset of each segmented interval. This progressive forecasting procedure continuously tracked the evolving prognostic trajectories and associated estimation errors until the degradation stage was fully traversed. This comprehensive evaluation protocol was replicated across all selected bearings, culminating in a synthesized statistical aggregation of the predictive performances, as depicted in Figure 9.

5. Discussion

5.1. Evaluation Metrics

The efficacy of the proposed model is assessed through a domain-specific prognostic score, in conjunction with three widely accepted error indicators: mean absolute percentage error (MAPE), root mean square error (RMSE), and cumulative relative accuracy (CRA). The CRA metric aggregates the relative estimation precision across all inspection timestamps, thereby providing a comprehensive evaluation of the overall predictive capability. Its mathematical formulation is defined as
$$\mathrm{CRA} = \sum_{k=1}^{K} \mathrm{RA}(T_k) \cdot w_k$$
where $w_k = k / \sum_{i=1}^{K} i$ serves as a normalized weight coefficient, and $\mathrm{RA}(T_k)$ denotes the relative prognostic accuracy at a specific time step $T_k$, calculated by
$$\mathrm{RA}(T_k) = 1 - \frac{\left| \mathrm{ActRUL}(T_k) - \mathrm{RUL}(T_k) \right|}{\left| \mathrm{ActRUL}(T_k) \right|}$$
Here, $\mathrm{ActRUL}(T_k)$ and $\mathrm{RUL}(T_k)$ refer to the ground-truth and the estimated remaining useful life (RUL) values at $T_k$, respectively. As a fundamental indicator of estimation fidelity, RMSE quantifies the global magnitude of deviations between the algorithm’s predictions and actual observations. It is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left( \mathrm{RUL}(T_k) - \mathrm{ActRUL}(T_k) \right)^2}$$
To intuitively evaluate the relative deviation, MAPE normalizes the absolute errors against the actual RUL values, yielding a percentage-based performance criterion:
$$\mathrm{MAPE} = \frac{100\%}{K} \sum_{k=1}^{K} \left| \frac{\mathrm{RUL}(T_k) - \mathrm{ActRUL}(T_k)}{\mathrm{ActRUL}(T_k)} \right|$$
Finally, the Score metric employs an asymmetric exponential penalty function to assess the overall prediction reliability, where a lower computed value signifies superior accuracy. In practical industrial operations, overestimating the RUL is inherently more hazardous than underestimating it, as delayed maintenance can trigger unexpected and catastrophic equipment failures. To account for this asymmetric risk, Jiang et al. [49] introduced a specialized scoring mechanism that imposes significantly harsher penalties on over-predictions compared to early predictions. This approach aligns the algorithmic assessment closely with real-world maintenance and safety requirements. The Score is formulated as
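$$\mathrm{Score} = \sum_{k=1}^{K} \begin{cases} \exp\left( -\dfrac{E_k}{13} \right) - 1, & E_k < 0 \\ \exp\left( \dfrac{E_k}{10} \right) - 1, & E_k \geq 0 \end{cases}$$
where $E_k = \mathrm{RUL}(T_k) - \mathrm{ActRUL}(T_k)$ denotes the estimation error at the $k$-th prediction step.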
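The four metrics can be computed directly from paired arrays of actual and predicted RUL values. The following is a minimal sketch with hypothetical function names; it assumes every actual RUL is nonzero, since RA and MAPE divide by it, and the numerical inputs are illustrative rather than taken from the experiments.

```python
import numpy as np

def cra(act, pred):
    """Cumulative relative accuracy with weights w_k = k / sum(1..K)."""
    act, pred = np.asarray(act, float), np.asarray(pred, float)
    k = np.arange(1, len(act) + 1)
    w = k / k.sum()
    ra = 1.0 - np.abs(act - pred) / np.abs(act)
    return float(np.sum(ra * w))

def rmse(act, pred):
    act, pred = np.asarray(act, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - act) ** 2)))

def mape(act, pred):
    act, pred = np.asarray(act, float), np.asarray(pred, float)
    return float(100.0 * np.mean(np.abs((pred - act) / act)))

def score(act, pred):
    """Asymmetric penalty: late predictions (E_k >= 0) are penalized more heavily."""
    e = np.asarray(pred, float) - np.asarray(act, float)
    early = np.exp(-e / 13.0) - 1.0  # used where E_k < 0
    late = np.exp(e / 10.0) - 1.0    # used where E_k >= 0
    return float(np.sum(np.where(e < 0, early, late)))

act = [100.0, 50.0, 10.0]
pred = [90.0, 55.0, 10.0]
results = {"CRA": cra(act, pred), "RMSE": rmse(act, pred),
           "MAPE": mape(act, pred), "Score": score(act, pred)}
```

Because the weights $w_k$ grow with $k$, CRA emphasizes predictions made later in the degradation process. The asymmetry of the Score is visible by comparing a prediction that is 10 units late with one that is 10 units early: the late one incurs $e^{1} - 1$, the early one only $e^{10/13} - 1$.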

5.2. Analysis and Discussion

This section evaluates the algorithmic performance quantitatively using the aforementioned metrics and provides a comprehensive discussion of the experimental results. The quantitative evaluations across the XJTU-SY and PRONOSTIA datasets are detailed in Table 1, Table 2, Table 3 and Table 4.
The CRA metric comprehensively assesses the relative estimation precision accumulated across all sequential prediction timestamps. As presented in Table 1, the proposed framework demonstrates superior global tracking capability, achieving the highest CRA in almost all tested bearing degradation trajectories. For instance, in the highly non-linear XJTU-SY $B_{13}$ scenario (bearing 1-3), the proposed method attains a CRA of 0.9386, outperforming both advanced deep learning models like the Transformer (0.9215) and traditional state-estimation filters like EKF (0.8915). This superior temporal tracking supports the methodological innovation of embedding the Hadamard fractional derivative into the algorithm’s weight-updating mechanism. Unlike traditional integer-order gradient descent methods that exhibit “amnesia” by relying solely on short-term instantaneous errors, the fractional-order operator mathematically encodes the “memory capacity” and “hereditary properties” of structural fatigue. This empowers the model to accurately capture and extrapolate complex long-range temporal dependencies inherent in the physical degradation process.
Table 2 and Table 3 quantify the absolute and relative prediction deviations through RMSE and MAPE, respectively. Across all eight test cases under varying operating conditions, the proposed method consistently yields the lowest RMSE and MAPE values. The degradation trajectories in the PRONOSTIA dataset, collected under accelerated failure testing, are particularly susceptible to severe degradation fluctuations, high-frequency transient impulses, and nonstationary environmental noise. Consequently, traditional algorithms experience significant performance drops. For example, on the PRONOSTIA $B_{14}$ dataset (bearing 1-4), RVM and EKF produce high RMSE values of 19.1356 and 19.2435, respectively. In contrast, the proposed method effectively mitigates these deviations, securing the minimum RMSE of 12.3267 and the lowest MAPE of 5.6330%. This robustness is attributed to the incorporation of the Multi-Kernel Mixture (MKM) measure. By replacing the conventional Mean Square Error (MSE) criterion, the MKM measure establishes a dynamic confidence penalty that isolates and suppresses the adverse impacts of extreme measurement outliers and non-Gaussian noise, ensuring a smooth and reliable RUL prediction trajectory.
In practical condition-based maintenance, overestimating the remaining useful life (i.e., late prediction) presents a substantially higher catastrophic risk than underestimating it, as delayed maintenance interventions can directly trigger unexpected mechanical breakdowns and severe system downtime. The asymmetric Score metric is explicitly formulated to heavily penalize such hazardous over-predictions. As demonstrated in Table 4, the proposed method secures the lowest (best) penalty scores across all experimental cases. For instance, on the PRONOSTIA $B_{22}$ dataset (bearing 2-2), the proposed framework achieves a score of 0.4951, presenting a safety advantage over advanced models like the Transformer (0.5659) and LSTM (0.6014), as well as conventional approaches like RVM (0.9092). The consistently minimized Score confirms that the RUL estimations generated by the proposed algorithm are not only numerically precise but also conservative, minimizing the risk of hazardous over-predicted lifespans and fulfilling the rigorous safety requirements of actual industrial applications.
Synthesizing the multi-metric assessments, the structural advantages of the proposed framework over existing prognostic paradigms become evident. While explicit mechanistic models (PF, EKF) require arduous mathematical derivations and physical parameter calibrations, advanced deep learning architectures (Transformer, LSTM) function as opaque “black boxes.” These deep learning models necessitate massive run-to-failure training datasets to optimize their countless hidden parameters and are prone to overfitting in real-world scenarios where only small sample sizes (data obtained solely after the First Prediction Time) are available. The proposed framework elegantly overcomes these bottlenecks. Operating as a computationally transparent “white-box” kernel adaptive filter, it bridges complex physical degradation dynamics with data-driven nonlinear optimization. It requires only limited historical data to accurately map nonstationary degradation patterns, providing an optimal balance of predictive fidelity, operational robustness, and algorithmic interpretability for the robust RUL prediction of rotating machinery.

6. Conclusions

In this paper, a novel fractional-derivative multi-kernel adaptive learning framework is proposed as an applied sensor signal processing tool for the RUL prediction of rotating machinery. Rather than focusing on pure machine learning theory, this study addresses the concrete engineering challenge of processing complex, non-stationary accelerometer signals. Addressing the limitations of “black-box” deep learning and complex physical models, this framework provides a mathematically transparent “white-box” architecture that is effective in practical small-sample industrial scenarios in sensor-based condition monitoring. The core innovation lies in integrating the Hadamard fractional derivative into the algorithm’s weight-updating mechanism, effectively encoding the “memory capacity” and hereditary properties of mechanical degradation to capture complex long-range temporal dependencies. Furthermore, a multi-kernel mixture measure is employed to enhance robustness against non-Gaussian noise and extreme outliers, while an adaptive $3\sigma$ delayed-triggering scheme ensures reliable FPT identification by suppressing noise-induced false alarms. Comprehensive multi-point sequential tracking validations on two real-world datasets (XJTU-SY and PRONOSTIA) demonstrate the framework’s superiority. It outperforms standard data-driven and model-based baselines, achieving the highest cumulative relative accuracy (CRA) in almost all cases alongside the lowest RMSE, MAPE, and asymmetric penalty Score.

Author Contributions

L.P. (Long Pan): writing—original draft preparation; J.X.: conceptualization; L.P. (Libiao Peng): methodology, funding acquisition; D.B.: validation; Y.X.: supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Natural Science Foundation of Sichuan Province under Grant 2026NSFSC1506 and the National Natural Science Foundation of China under Grant 62027803.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this article. The publicly available datasets mentioned in this article can be accessed through the following links: https://github.com/WangBiaoXJTU/xjtu-sy-bearing-datasets (accessed on 1 August 2018), https://github.com/topics/pronostia-dataset (accessed on 1 June 2012).

Conflicts of Interest

The authors confirm that there are no known financial conflicts of interest or personal relationships that might have impacted the research presented in this study.

References

  1. Perez, R.X. Overview of Rotating Machinery. In Process Machinery Handbook: For Field Personnel, Decision Makers, and Students; Wiley Online Library: Hoboken, NJ, USA, 2025; pp. 1–29.
  2. Kumar, N.; Satapathy, R. Bearings in aerospace, application, distress, and life: A review. J. Fail. Anal. Prev. 2023, 23, 915–947.
  3. Ferreira, C.; Gonçalves, G. Remaining Useful Life prediction and challenges: A literature review on the use of Machine Learning Methods. J. Manuf. Syst. 2022, 63, 550–562.
  4. Kumar, S.; Raj, K.K.; Cirrincione, M.; Cirrincione, G.; Franzitta, V.; Kumar, R.R. A comprehensive review of remaining useful life estimation approaches for rotating machinery. Energies 2024, 17, 5538.
  5. Duan, X.; Feng, Z. Time-varying filtering for nonstationary signal analysis of rotating machinery: Principle and applications. Mech. Syst. Signal Process. 2023, 192, 110204.
  6. Bagri, I.; Tahiry, K.; Hraiba, A.; Touil, A.; Mousrij, A. Vibration signal analysis for intelligent rotating machinery diagnosis and prognosis: A comprehensive systematic literature review. Vibration 2024, 7, 1013–1062.
  7. Zhou, J.; Yang, J.; Xiang, S.; Qin, Y. Remaining useful life prediction methodologies with health indicator dependence for rotating machinery: A comprehensive review. IEEE Trans. Instrum. Meas. 2025, 74, 3528519.
  8. Lei, Y.; Li, N.; Gontarz, S.; Lin, J.; Radkowski, S.; Dybala, J. A model-based method for remaining useful life prediction of machinery. IEEE Trans. Reliab. 2016, 65, 1314–1326.
  9. Li, W.; Zhang, L.C.; Wu, C.H.; Wang, Y.; Cui, Z.X.; Niu, C. A data-driven approach to RUL prediction of tools. Adv. Manuf. 2024, 12, 6–18.
  10. Liu, W.; Yang, W.A.; You, Y. Three-stage wiener-process-based model for remaining useful life prediction of a cutting tool in high-speed milling. Sensors 2022, 22, 4763.
  11. Zhang, X.; Shi, B.; Feng, B.; Liu, L.; Gao, Z. A hybrid method for cutting tool RUL prediction based on CNN and multistage Wiener process using small sample data. Measurement 2023, 213, 112739.
  12. Liu, K.; Zou, T.J.; Xin, M.C.; Lv, C.M. RUL prediction based on two-phase wiener process. Qual. Reliab. Eng. Int. 2022, 38, 3829–3843.
  13. Wang, H.; Liao, H.; Ma, X.; Bao, R. Remaining useful life prediction and optimal maintenance time determination for a single unit using isotonic regression and gamma process model. Reliab. Eng. Syst. Saf. 2021, 210, 107504.
  14. You, K.; Qiu, G.; Gu, Y. Remaining useful life prediction of lithium-ion batteries using EM-PF-SSA-SVR with gamma stochastic process. Meas. Sci. Technol. 2024, 35, 015015.
  15. Zhou, S.; Xu, A.; Tang, Y.; Shen, L. Fast Bayesian inference of reparameterized gamma process with random effects. IEEE Trans. Reliab. 2023, 73, 399–412.
  16. Liu, J.; Wang, D.; Kong, J.Z.; Li, N.; Peng, Z.; Tsui, K.L. New look at Bayesian prognostic methods. IEEE Trans. Autom. Sci. Eng. 2024, 23, 3225–3240.
  17. Zheng, R.; Yang, B.; Qian, Y.; Li, H.; Gao, D.; Jiang, L. Joint SOH and RUL estimation for lithium-ion batteries via optimal deep belief network with Bayesian algorithm. J. Energy Storage 2025, 114, 115891.
  18. Nguyen, B.V.; Jeon, J.W. Enhancing particle filter performance for high accuracy state estimation and rul prediction. IEEE Trans. Instrum. Meas. 2025, 74, 3537112.
  19. Cui, L.; Li, W.; Wang, X.; Zhao, D.; Wang, H. Comprehensive remaining useful life prediction for rolling element bearings based on time-varying particle filtering. IEEE Trans. Instrum. Meas. 2022, 71, 1–10.
  20. Singleton, R.K.; Strangas, E.G.; Aviyente, S. Extended Kalman filtering for remaining-useful-life estimation of bearings. IEEE Trans. Ind. Electron. 2014, 62, 1781–1790.
  21. Lim, P.; Goh, C.K.; Tan, K.C.; Dutta, P. Multimodal degradation prognostics based on switching Kalman filter ensemble. IEEE Trans. Neural Netw. Learn. Syst. 2015, 28, 136–148.
  22. Shen, F.; Yan, R. A new intermediate-domain SVM-based transfer model for rolling bearing RUL prediction. IEEE/ASME Trans. Mechatron. 2021, 27, 1357–1369.
  23. Guo, W.; He, M. An integrated method for bearing state change identification and prognostics based on improved relevance vector machine and degradation model. IEEE Trans. Instrum. Meas. 2022, 71, 1–14.
  24. Cao, M.; Zhang, T.; Wang, J.; Liu, Y. A deep belief network approach to remaining capacity estimation for lithium-ion batteries based on charging process features. J. Energy Storage 2022, 48, 103825.
  25. Pan, Y.; Cheng, D.; Wei, T.; Jia, Y. Rolling bearing performance degradation assessment based on deep belief network and improved support vector data description. Mech. Syst. Signal Process. 2022, 181, 109458.
  26. Wang, B.; Lei, Y.; Yan, T.; Li, N.; Guo, L. Recurrent convolutional neural network: A new framework for remaining useful life prediction of machinery. Neurocomputing 2020, 379, 117–129.
  27. Shang, Y.; Tang, X.; Zhao, G.; Jiang, P.; Lin, T.R. A remaining life prediction of rolling element bearings based on a bidirectional gate recurrent unit and convolution neural network. Measurement 2022, 202, 111893.
  28. Wang, F.; Liu, X.; Deng, G.; Yu, X.; Li, H.; Han, Q. Remaining Life Prediction Method for Rolling Bearing Based on the Long Short-Term Memory Network. Neural Process. Lett. 2019, 50, 2437–2454.
  29. Shi, P.; Ma, H.; Xu, X.; Han, D. A novel remaining useful life prediction method of rolling bearings based on multivariate prediction method and long short-term memory with residuals model. Measurement 2026, 279, 121726.
  30. Islam, M.M.; Prosvirin, A.E.; Kim, J.M. Data-driven prognostic scheme for rolling-element bearings using a new health index and variants of least-square support vector machines. Mech. Syst. Signal Process. 2021, 160, 107853.
  31. Zhou, Z.; Liu, L.; Song, X.; Chen, K. Remaining useful life prediction method of rolling bearing based on Transformer model. J. Beijing Univ. Aeronaut. Astronaut. 2023, 49, 430–443.
  32. Chen, J.; Huang, R.; Chen, Z.; Mao, W.; Li, W. Transfer learning algorithms for bearing remaining useful life prediction: A comprehensive review from an industrial application perspective. Mech. Syst. Signal Process. 2023, 193, 110239.
  33. Sun, W.; Wang, H.; Liu, Z.; Qu, R. Method for predicting RUL of rolling bearings under different operating conditions based on transfer learning and few labeled data. Sensors 2022, 23, 227.
  34. Fang, Y.; Shi, H.; Zhao, C.; Li, T.; Wang, Y.; Wang, Z.; Hou, Z.; Wei, J. A physics-informed temporal-enhancing neural network for remaining useful life prediction of rolling bearing. IEEE Trans. Instrum. Meas. 2026, 75, 3513413.
  35. Yang, S.; Liu, R. A review of graph neural networks for rolling bearing fault diagnosis. Meas. Sci. Technol. 2026, 37, 022002.
  36. Wu, F.; Wu, Q.; Tan, Y.; Xu, X. Remaining useful life prediction based on deep learning: A survey. Sensors 2024, 24, 3454.
  37. Singh, A.; Principe, J.C. Information theoretic learning with adaptive kernels. Signal Process. 2011, 91, 203–213.
  38. Li, W.; Wang, Z.; Hu, J.; Du, J.; Sheng, W. Kernel adaptive filtering over complex networks. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 4339–4346.
  39. Shi, L.; Lu, R.; Liu, Z.; Yin, J.; Chen, Y.; Wang, J.; Lu, L. An improved robust kernel adaptive filtering method for time-series prediction. IEEE Sens. J. 2023, 23, 21463–21473.
  40. Shah, K.; Arfan, M.; Ullah, A.; Al-Mdallal, Q.; Ansari, K.J.; Abdeljawad, T. Computational study on the dynamics of fractional order differential equations with applications. Chaos Solitons Fractals 2022, 157, 111955.
  41. Zhang, X.; Ding, F. Optimal adaptive filtering algorithm by using the fractional-order derivative. IEEE Signal Process. Lett. 2021, 29, 399–403.
  42. Li, X. Riemann-Liouville Derivative Kernel Recursive-Least-Square Filtering for Remaining Useful Life Prediction on Rolling Bearings. In Proceedings of the 2025 22nd International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 19–21 December 2025; pp. 1–4.
  43. Pinheiro, M., Jr.; Dral, P.O. Kernel methods. In Quantum Chemistry in the Age of Machine Learning; Elsevier: Amsterdam, The Netherlands, 2023; pp. 205–232.
  44. Zhou, Y. Basic Theory of Fractional Differential Equations; World Scientific: Singapore, 2023. [Google Scholar]
  45. Klimek, M. Sequential fractional differential equations with Hadamard derivative. Commun. Nonlinear Sci. Numer. Simul. 2011, 16, 4689–4697. [Google Scholar] [CrossRef]
  46. Jeon, H.J.; Van Roy, B. An information-theoretic framework for deep learning. Adv. Neural Inf. Process. Syst. 2022, 35, 3279–3291. [Google Scholar] [CrossRef]
  47. Wang, B.; Lei, Y.; Li, N.; Li, N. A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Trans. Reliab. 2018, 69, 401–412. [Google Scholar] [CrossRef]
  48. Nectoux, P.; Gouriveau, R.; Medjaher, K.; Ramasso, E.; Chebel-Morello, B.; Zerhouni, N.; Varnier, C. PRONOSTIA: An experimental platform for bearings accelerated degradation tests. In Proceedings of the IEEE International Conference on Prognostics and Health Management (PHM’12), Denver, CO USA, 18–21 June 2012; pp. 1–8. [Google Scholar]
  49. Jiang, L.; Zhang, T.; Lei, W.; Zhuang, K.; Li, Y. A new convolutional dual-channel transformer network with time window concatenation for remaining useful life prediction of rolling bearings. Adv. Eng. Inform. 2023, 56, 101966. [Google Scholar] [CrossRef]
Figure 1. Model of kernel adaptive learning.
Figure 2. Overall process framework.
Figure 3. Testbed of XJTU-SY.
Figure 4. Typical life-cycle degradation trajectory of bearing 1-1. (a) The full life-cycle degradation trajectory; (b) The execution of the 3σ interval technique and the subsequent FPT identification.
Figure 5. Degradation prediction trajectories commencing at the FPT of different bearings in XJTU-SY.
Figure 6. RUL prediction outcomes from diverse starting points for different bearings in XJTU-SY.
Figure 7. Testbed of PRONOSTIA.
Figure 8. Degradation prediction trajectories commencing at the FPT of different bearings in PRONOSTIA.
Figure 9. RUL prediction outcomes from diverse starting points for different bearings in PRONOSTIA.
Table 1. Comparison of Cumulative Relative Accuracy (CRA) results.

| Dataset | Bearing | RVM | DBN | PF | EKF | Transformer | LSTM | Proposed |
|---|---|---|---|---|---|---|---|---|
| XJTU-SY | B11 | 0.7651 | 0.7864 | 0.8210 | 0.7499 | 0.8451 | 0.8514 | 0.8712 |
| XJTU-SY | B12 | 0.7689 | 0.7713 | 0.7836 | 0.7446 | 0.8654 | 0.8460 | 0.8832 |
| XJTU-SY | B13 | 0.8376 | 0.8234 | 0.8452 | 0.8915 | 0.9215 | 0.9152 | 0.9386 |
| XJTU-SY | B14 | 0.8122 | 0.8247 | 0.8186 | 0.8818 | 0.9214 | 0.9112 | 0.9294 |
| PRONOSTIA | B11 | 0.8174 | 0.8324 | 0.8753 | 0.8379 | 0.9222 | 0.9185 | 0.9236 |
| PRONOSTIA | B14 | 0.8566 | 0.8456 | 0.8628 | 0.8349 | 0.9084 | 0.8975 | 0.9249 |
| PRONOSTIA | B22 | 0.8216 | 0.8282 | 0.8752 | 0.8435 | 0.9148 | 0.9208 | 0.9228 |
| PRONOSTIA | B32 | 0.8123 | 0.8237 | 0.8586 | 0.8199 | 0.9189 | 0.9085 | 0.9168 |
Table 2. Comparison of Root Mean Square Error (RMSE) results.

| Dataset | Bearing | RVM | DBN | PF | EKF | Transformer | LSTM | Proposed |
|---|---|---|---|---|---|---|---|---|
| XJTU-SY | B11 | 2.2653 | 2.1224 | 1.8323 | 2.1429 | 1.3323 | 1.4429 | 1.3122 |
| XJTU-SY | B12 | 1.7284 | 1.6733 | 1.4236 | 1.4556 | 0.9124 | 0.9085 | 0.8736 |
| XJTU-SY | B13 | 8.3343 | 8.8248 | 6.4362 | 9.8455 | 5.4120 | 5.4661 | 5.1562 |
| XJTU-SY | B14 | 2.5640 | 2.1224 | 1.8786 | 2.8728 | 1.4266 | 1.4659 | 1.3249 |
| PRONOSTIA | B11 | 10.8304 | 9.5364 | 8.7568 | 11.2394 | 7.8654 | 7.7853 | 7.5672 |
| PRONOSTIA | B14 | 19.1356 | 18.8567 | 15.345 | 19.2435 | 12.745 | 13.050 | 12.3267 |
| PRONOSTIA | B22 | 8.4567 | 7.8369 | 7.6788 | 8.8329 | 5.2708 | 5.3350 | 4.9588 |
| PRONOSTIA | B32 | 9.8368 | 8.3247 | 7.8385 | 8.5818 | 4.4978 | 4.5602 | 4.4680 |
Table 3. Comparison of Mean Absolute Percentage Error (MAPE) results.

| Dataset | Bearing | RVM | DBN | PF | EKF | Transformer | LSTM | Proposed |
|---|---|---|---|---|---|---|---|---|
| XJTU-SY | B11 | 8.3578 | 7.5675 | 7.2368 | 8.7476 | 5.5304 | 5.6750 | 5.2357 |
| XJTU-SY | B12 | 7.5654 | 6.5983 | 6.5877 | 7.6898 | 4.4356 | 4.5450 | 4.3596 |
| XJTU-SY | B13 | 8.6532 | 8.2558 | 7.4586 | 8.5482 | 5.0450 | 5.1205 | 4.9681 |
| XJTU-SY | B14 | 8.3785 | 7.8536 | 6.9255 | 8.5728 | 4.5731 | 4.5830 | 4.3547 |
| PRONOSTIA | B11 | 9.6534 | 9.5675 | 8.7853 | 10.1573 | 6.5875 | 6.7851 | 6.5786 |
| PRONOSTIA | B14 | 11.5547 | 9.7857 | 9.4210 | 10.3665 | 5.6752 | 5.8772 | 5.6330 |
| PRONOSTIA | B22 | 8.6370 | 7.9623 | 6.9856 | 8.6629 | 4.8857 | 4.7895 | 4.6543 |
| PRONOSTIA | B32 | 8.2361 | 7.5652 | 7.0125 | 8.2653 | 5.3985 | 5.5260 | 5.3362 |
Table 4. Comparison of Score results.

| Dataset | Bearing | RVM | DBN | PF | EKF | Transformer | LSTM | Proposed |
|---|---|---|---|---|---|---|---|---|
| XJTU-SY | B11 | 0.8797 | 0.7722 | 0.7699 | 0.9208 | 0.5785 | 0.5720 | 0.5570 |
| XJTU-SY | B12 | 0.7964 | 0.6733 | 0.7008 | 0.8085 | 0.4699 | 0.4920 | 0.4638 |
| XJTU-SY | B13 | 0.9108 | 0.8424 | 0.7935 | 0.8998 | 0.5487 | 0.5561 | 0.5285 |
| XJTU-SY | B14 | 0.8819 | 0.8014 | 0.7368 | 0.9024 | 0.4689 | 0.4978 | 0.4633 |
| PRONOSTIA | B11 | 1.0161 | 0.9763 | 0.9346 | 1.0691 | 0.7690 | 0.7208 | 0.6999 |
| PRONOSTIA | B14 | 1.2162 | 0.9985 | 1.0022 | 1.0912 | 0.6419 | 0.6722 | 0.5993 |
| PRONOSTIA | B22 | 0.9092 | 0.8125 | 0.7431 | 0.9119 | 0.5659 | 0.6014 | 0.4951 |
| PRONOSTIA | B32 | 0.8669 | 0.7720 | 0.7460 | 0.8701 | 0.6105 | 0.6211 | 0.5676 |

Share and Cite

MDPI and ACS Style

Pan, L.; Xu, J.; Peng, L.; Bi, D.; Xie, Y. A Fractional-Derivative Multi-Kernel Adaptive Learning Approach for Remaining Useful Life Prediction of Rotating Machinery. Sensors 2026, 26, 4137. https://doi.org/10.3390/s26134137