1. Introduction
With the rapid digital and intelligent transformation of manufacturing, key equipment components are evolving toward greater scale and integration [1]. When mechanical equipment components operate under complex conditions, their degradation rates and patterns exhibit significant individual variability. Traditional planned preventive maintenance, which is based on daily inspection and regular servicing, suffers from serious delays in fault detection and weak preventive capability [2]. Prognostics and Health Management (PHM) technology, which relies on the real-time health status of equipment, aims to avoid excessive maintenance, reduce false-alarm rates, and ensure safe equipment operation by dynamically scheduling maintenance. As an essential aspect of PHM, Remaining Useful Life (RUL) prediction focuses on continuously monitoring equipment conditions and predicting failure times to extend the operational duration of the equipment and improve economic efficiency. Typical RUL prediction approaches are generally categorized into physics-based approaches and data-driven approaches [3].
Physics-based approaches represent the degradation process by establishing physical models of a complex system. For instance, Zhang et al. [4] developed a capacity-cycling degradation model to estimate the RUL of online lithium-ion batteries. In their study, they estimated the core temperature, state of charge (SOC), and battery capacity by leveraging thermal and Coulomb SOC models. Similarly, Shutin et al. [5] proposed a degradation model integrating tribological theory with the physical wear mechanisms of rolling bearings to predict the RUL of hydrodynamic bearings. In another study, Protopapadakis et al. [6] implemented an explainable AI-assisted RUL estimation method for turbine engines by leveraging a degradation model derived from aerothermodynamics and analyzing measurement data. Although physics-based methods are easily interpretable, their reliance on detailed knowledge of equipment failure mechanisms and on domain expertise limits their generalization to unknown complex systems.
The widespread adoption of intelligent sensors and advances in big data technology are rapidly increasing the volume of monitoring data available to industries. Extracting value from these multi-source, heterogeneous datasets enhances RUL prediction. Thus, data-driven RUL prediction is becoming popular across industrial and academic domains. Currently, data-driven RUL prediction mainly relies on three approaches: statistical methods, machine learning (ML), and deep learning (DL). Statistical methods such as Hidden Markov Models (HMMs), Kalman filters, and Wiener processes analyze statistical distributions, trends, and patterns in historical data to model equipment degradation and predict the RUL. Zhang et al. [7] addressed challenges related to nonlinearity, state transitions, and stochasticity in predicting the RUL of lithium-ion batteries. Their approach combines a nonlinear drift-driven Wiener process, a Markov chain-switching model, and a fuzzy system. Furthermore, they improved the model's predictive precision in dynamic environments by introducing adaptive filtering techniques. Although statistical models are computationally efficient, they have limited ability to model nonlinear relationships. As a result, they are frequently combined with ML approaches, including Support Vector Machines (SVMs), Random Forests (RFs), and decision trees, to derive valuable information from the Probability Density Functions (PDFs) of datasets [8]. For instance, Alfarizi et al. [9] constructed a two-stage model to forecast the RUL of experimental bearings. In the first stage, the input signals were decomposed into different frequency bands using empirical mode decomposition, eliminating irrelevant frequencies and highlighting fault characteristics. In the second stage, they combined a random forest model with Bayesian hyperparameter tuning to enhance the accuracy of RUL prediction. Although ML methods show strong nonlinear modeling capabilities, the component-level RUL prediction approaches described above require an index that reflects the degradation level and rely heavily on feature engineering, rendering them unsuitable for RUL prediction using multidimensional time-series data (MTSD).
As a subfield of ML, DL provides substantial technological advantages for component RUL prediction, owing to its strengths in modeling intricate nonlinear relationships, processing high-dimensional data, and automating feature engineering. Recurrent Neural Networks (RNNs), which are capable of capturing temporal degradation patterns, are extensively used for estimating RUL [10]. Nevertheless, for extended time-series data, they are prone to issues such as gradient explosion and reduced computational efficiency, limiting their practical deployment in industrial settings [11]. As enhanced successors of RNNs, Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) strengthen the ability to model long-term dependencies and mitigate gradient vanishing and explosion via gating mechanisms and gradient optimization strategies, establishing them as dominant approaches in RUL estimation. Chui et al. [12] presented a Non-dominated Sorting Genetic Algorithm II (NSGA-II)-based RUL prediction model that combines the short-term prediction strengths of RNNs with the long-term prediction capabilities of LSTM, addressing the machine downtime and redundant maintenance associated with run-to-failure and preventive maintenance of turbofan engines. Beyond long-term temporal dependencies, it is essential to consider the local features of MTSD. To combine local and global temporal information for the RUL prediction of equipment, Cao et al. [13] proposed a parallel RUL prediction architecture combining a multi-scale CNN (MSCNN) and multi-scale LSTM (MSLSTM) to extract multi-dimensional health indicators, effectively reducing the local fluctuation caused by the CNN. Then, they used a dynamic time warping (DTW)-based similarity-matching algorithm to identify historical training samples with degradation trends similar to the test sequence. Finally, they achieved accurate prediction for railway freight car wheels. Duan et al. [14] treated mechanical monitoring data as “natural language sequences of machines” and input them into a Transformer. This method begins by utilizing its attention layers to highlight core time-step information and compute the output in parallel, followed by the integration of two attention mechanisms in the Transformer structure with LSTM in the encoder to extract both local and long-term temporal dependency information of the degradation process. Finally, it employs a nonlinear Wiener process (NWP) to calculate the PDF of the RUL. Song et al. [15] addressed the impact on prediction outcomes of the complex interactions among high-dimensional variables in MTSD by constructing multi-dimensional feature-correlated spatiotemporal (MFCST) graphs to extract features from data in different formats while employing a stacked long short-term memory (ST-LSTM) network to comprehensively explore the local and global temporal patterns of MTSD. Then, they strategically weighted spatial and temporal patterns to enhance the model’s generalization ability and spatial perception of high-dimensional variable feature structures. Currently, DL-based methods for RUL prediction frequently adopt averaging or predefined weights to address issues such as high-frequency noise, local fluctuations, and missing values in high-frequency sensor data. Additionally, models trained using randomly sampled training data cannot adequately leverage the intrinsic correlations among sensors and their different levels of contribution to the degradation process, which limits the model’s predictive capability on real-time data. To resolve these difficulties, regularization techniques are typically used, but they require manual parameter tuning, increasing the cost of human intervention.
In contrast, deep reinforcement learning (DRL) is an ML technique that learns optimal strategies through interaction with the environment. This gives it stronger exploration and generalization capabilities and makes it well suited to capturing temporal dependencies in equipment RUL prediction [16]. In particular, DRL is divided into two approaches: the value-based Deep Q-Network (DQN) algorithm and the Deterministic Policy Gradient (DPG) algorithm. DQN uses deep neural networks to approximate the Q-value function and improves stability through experience replay and target networks. On this basis, Yao et al. [17] proposed a Deep Transfer Reinforcement Learning (DTRL) network based on LSTM, which utilizes novel Q-function updates and transfer strategies to estimate the RUL of machinery operating under similar tool and cutting conditions. DQN demonstrates exceptional performance in discrete action spaces; however, it exhibits significant limitations when applied to continuous action spaces. In contrast, DPG directly optimizes policy parameters through gradient ascent to maximize expected rewards. Despite its theoretical elegance, DPG lacks scalability in deep learning frameworks [18]. To address this, the Actor–Critic algorithm was developed as an enhancement of DPG, combining value-function approximation with policy gradient methods. By leveraging the advantage function, it effectively reduces variance in gradient estimation [19]. Nevertheless, the Actor–Critic model’s training remains unstable due to high variance. Building upon this, the Deep Deterministic Policy Gradient (DDPG) algorithm extends the Actor–Critic strategy and is specifically designed for continuous action spaces. DDPG incorporates deterministic policies and target networks, successfully scaling DPG to high-dimensional, continuous action spaces and improving training stability. Zheng et al. [20] designed a DRL model trained with the Twin Delayed DDPG (TD3) algorithm. The model leverages the powerful feature representation of DL while maintaining temporal dependencies across samples through RL, enabling accurate RUL estimation for rolling bearings. Hu et al. [21] integrated DRL with a Markov Decision Process (MDP) framework to derive an optimal strategy for RUL prediction. Although current deep fusion techniques mitigate the shortcomings of conventional DL in RUL prediction and improve model robustness and adaptability, the use of DRL to dynamically adjust time-scale parameters and prioritize critical degradation phases remains an unresolved challenge.
To overcome the above limitations, this study presents ADAPT-RULNet, an adaptive RUL prediction framework that integrates attention mechanisms with DRL based on a hybrid network. The proposed framework enhances the quality of input signals by utilizing Functional Alignment Resampling (FAR) for MTSD preprocessing and using DTW to construct personalized datasets with similar degradation stages. The framework extracts both local and long-term degradation information from MTSD through an attention-enhanced multi-scale CNN and LSTM. These features are then integrated through an adaptive Bayesian fusion layer, thereby achieving better prediction performance. Furthermore, the framework introduces DDPG to adaptively adjust critical time-scale parameters to obtain a global balance between prediction accuracy and model complexity. To validate the effectiveness of ADAPT-RULNet, extensive experiments were carried out on two datasets in comparison with advanced methods. The experimental results suggest that ADAPT-RULNet outperforms all other approaches, on average.
The main contributions of this work are outlined below:
We propose a robust and precise data preprocessing framework for mechanical equipment RUL prediction. By employing the novel FAR method for data signal optimization, we effectively address the issues of noise, heterogeneity, and inconsistent time-series lengths in raw sensor data, thereby providing high-quality input signals for subsequent feature extraction. Furthermore, an attention-enhanced DTW model selects degradation stages that are highly similar across devices, which are then used to construct a personalized dataset with consistent, high-quality degradation samples.
We construct an attention-enhanced multi-scale parallel feature extraction model. The proposed method leverages a multi-scale CNN with spatial–temporal attention mechanisms to extract temporal features and local degradation patterns from multi-modal sensor data. Simultaneously, multi-scale LSTM with multi-head attention mechanisms is employed to capture multi-modal temporal features and global degradation patterns. The multi-dimensional features are adaptively fused using Bayesian probability to enhance the accuracy of RUL prediction.
We introduce a complexity–efficiency balancing strategy based on DDPG. This approach reformulates the parameter optimization process in RUL prediction as an MDP. This strategy leverages the experience replay mechanism and target-network soft update technique within the DDPG framework to adaptively optimize two key parameters, the time-window size and the number of similar samples, during the construction of the feature extraction dataset. This ensures a global balance between prediction performance and model complexity.
With respect to practical application prospects, the approach was validated on public and industrial datasets, confirming its effectiveness. The prediction results surpass those of existing CNN-LSTM models, demonstrating strong potential for intelligent maintenance in real industrial applications.
The structure of this study is outlined as follows:
Section 2 reviews the related research and technical methodologies.
Section 3 describes the proposed adaptive RUL prediction framework.
Section 4 analyzes the experimental results, while
Section 5 discusses the conclusions and outlines future research directions.
3. Proposed Method
To efficiently integrate heterogeneous sensor data, mine the degradation patterns of mechanical equipment components under complex operating conditions, and accurately predict the RUL, this paper proposes ADAPT-RULNet, an adaptive RUL prediction framework that integrates attention mechanisms and deep reinforcement learning, as illustrated in Figure 4.
The proposed framework consists of four tightly coupled modules:
Data preprocessing employs Functional Alignment Resampling (FAR) to optimize raw sensor signals by addressing noise, heterogeneity, and inconsistent time-series lengths.
Personalized dataset construction utilizes attention-enhanced Dynamic Time Warping (DTW) to build similarity-based degradation stages, ensuring that samples with highly similar degradation trajectories are grouped together.
Hybrid network-based RUL prediction constructs a hybrid architecture combining multi-scale CNN (MSCNN) and multi-scale LSTM (MSLSTM) with attention mechanisms and applies Bayesian fusion for adaptive feature integration.
Reinforcement learning-based adaptive parameter tuning introduces a Deep Deterministic Policy Gradient (DDPG) to adaptively adjust critical parameters such as time-window size and the number of selected similar samples, balancing prediction accuracy with model complexity.
Together, these modules are intended to enhance the model’s robustness and generalization capability under complex operating conditions.
3.1. Data Preprocessing
In real-world industrial scenarios, the operational conditions of mechanical equipment exhibit high complexity, and their degradation processes demonstrate significant individual variability. Data collected from multi-source sensors often suffer from missing values, nonlinear characteristics, high levels of noise interference, unequal-length time series, and multi-source heterogeneity. Traditional data preprocessing methods have difficulty capturing intricate degradation patterns. To tackle these issues, this paper introduces the Functional Alignment Resampling (FAR) approach, which reconstructs temporal continuity to better capture local and global degradation trends. By transforming time-series signals into functional signals, it facilitates the subsequent RUL prediction process.
Locally Weighted Scatterplot Smoothing (LOWESS) [28] and Cubic Natural Spline (CNS) smoothing serve as the core methodologies of FAR, enabling the transformation of raw time-series signals into functional signals. LOWESS applies localized smoothing to suppress high-frequency noise and local fluctuations in the MTSD while adapting to local variations in the time series. Let
, where
represents the duration of the time series,
represents the number of devices, and
signifies the number of sensors. The term
represents the average value of all
within a local window (
h), as illustrated in Equation (
15):
The smoothed value (
) of LOWESS is computed through locally weighted regression, as shown in Equation (
16):
where
and
are the regression coefficients obtained through weighted least squares and the weights (
) are defined as shown in Equation (
17):
where
is the observed value of time-series signal
at time step
j.
The CNS interpolation method is applied to the multi-channel functional signals (
) to perform global fitting, generating continuous and smooth functional signal data. The fitting function (
) is given by Equation (
18):
where coefficients
,
,
, and
are obtained by minimizing the following objective function:
where
is the smoothing parameter, controlling the smoothness of the fitted curve, and
is the second derivative of the spline function, used to measure the curvature of the curve. Finally, by performing global fitting on the smoothed data, a continuous functional signal (
is generated, as shown in Equation (
20):
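The LOWESS step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the tricube weight kernel, the window handling at the series boundaries, and the names `lowess_smooth` and `window` are choices made for this example, and the subsequent cubic natural spline fit is omitted.

```python
def lowess_smooth(values, window=5):
    """Smooth a 1-D series with local linear regression (LOWESS-style).

    Assumption: tricube weights over a symmetric window of `window` points;
    the paper's weight definition may differ.
    """
    n = len(values)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        # Bandwidth: farthest neighbour distance plus one, so that the
        # outermost points in the window keep a non-zero weight.
        h = max(i - lo, hi - 1 - i) + 1
        xs = list(range(lo, hi))
        ys = values[lo:hi]
        ws = [(1 - (abs(x - i) / h) ** 3) ** 3 for x in xs]
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Weighted least-squares line evaluated at the current time step.
        smoothed.append(my + slope * (i - mx))
    return smoothed


# A linear trend with alternating +/-1 noise; the smoother recovers the trend.
noisy = [t + (-1) ** t for t in range(20)]
denoised = lowess_smooth(noisy, window=5)
```

Because the estimator is a local linear fit, a purely linear series passes through unchanged, while high-frequency fluctuations are strongly attenuated.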
3.2. Construction of Personalized Datasets with Similarity Degradation Stages
To capture both abrupt local changes and long-term trends in the degradation behavior of mechanical equipment throughout its life cycle, this paper integrates a multi-head attention mechanism with DTW for personalized dataset construction (Attention-DTW). This approach overcomes a limitation of traditional DTW, whose fixed weights ignore the differences among sensors during the degradation process. It also addresses the limitation of the traditional Euclidean distance, which cannot effectively measure the waveform similarity between two time series. By selecting the most relevant historical degradation samples for the test sequence, this method better captures the time-varying characteristics in dynamic uncertain environments and identifies similarities among operational signals, thereby providing reliable dataset support for the accuracy and robustness of RUL prediction.
Specifically, let the input MTSD be
, where
represents the time-series data of the
u-th sensor. For the test sequence (
) and the historical sequence (
), the attention scores of each sensor channel are first calculated, as shown in Equation (
21):
where
is a multi-layer perceptron that maps a single-channel sequence into a feature vector, while
reflects the importance weight of the
u-th sensor in the similarity measurement. By incorporating the weight (
) into the calculation of the multi-channel DTW distance, we obtain Equation (
22):
where this distance metric adaptively focuses on sensors sensitive to degradation while effectively mitigating noise interference.
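The idea of an attention-weighted multi-channel DTW distance can be illustrated with a short sketch. It assumes that per-channel attention scores are already available (in the paper they come from a multi-layer perceptron, which is not reproduced here) and that they are normalized with a softmax; the function names `dtw_distance`, `softmax`, and `weighted_dtw` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # insertion
                                    cost[i][j - 1],      # deletion
                                    cost[i - 1][j - 1])  # match
    return cost[n][m]


def softmax(scores):
    """Numerically stable softmax over raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_dtw(test_channels, hist_channels, scores):
    """Sum of per-sensor DTW distances, weighted by softmax attention."""
    weights = softmax(scores)
    return sum(w * dtw_distance(x, y)
               for w, x, y in zip(weights, test_channels, hist_channels))


# Channel 0 matches exactly; channel 1 is an uninformative, offset sensor.
test_seq = [[0.0, 1.0, 2.0], [0.0, 0.0, 0.0]]
hist_seq = [[0.0, 1.0, 2.0], [5.0, 5.0, 5.0]]
uniform = weighted_dtw(test_seq, hist_seq, [0.0, 0.0])
focused = weighted_dtw(test_seq, hist_seq, [10.0, 0.0])
```

With uniform scores the offset channel dominates the distance; once attention concentrates on the degradation-sensitive channel, its contribution nearly vanishes.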
During the system degradation process, the tail data typically best reflects the current state and the latest trends of the system. In the implementation, a fixed window of length
is used to extract the last
time points from each unlabeled sample (
), resulting in
. Subsequently, a sliding window of length
is applied to the historical data (
) to extract all candidate segments (
). Based on the Attention-DTW similarity calculation, the distance (
) is computed according to Equation (
22), and the
most similar segments are selected as shown in Equation (
23):
The predicted label (
) is derived based on Equation (
24):
and the final dataset is formulated as Equation (
25):
This process builds the complete training dataset, which is used by the subsequent feature extraction network.
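The segment-matching procedure (tail window, sliding window over history, selection of the most similar segments) can be sketched as follows. Several points are assumptions made for this example only: the historical series is taken to run until failure, the label of a segment is the number of steps remaining after its last point, and the `distance` argument is a placeholder for the Attention-DTW distance. The helper names are hypothetical.

```python
def tail_window(sequence, length):
    """Return the last `length` points of a test sequence."""
    return sequence[-length:]


def candidate_segments(history, length):
    """All sliding windows over one historical series, with their end index."""
    return [(start + length, history[start:start + length])
            for start in range(len(history) - length + 1)]


def select_top_m(query, history, length, m, distance):
    """Rank candidate segments by distance to the query and keep the best m.

    Each result is (distance, rul_label, segment). The label rule is an
    assumption: steps left in `history` after the segment ends.
    """
    scored = []
    for end, segment in candidate_segments(history, length):
        rul_label = len(history) - end
        scored.append((distance(query, segment), rul_label, segment))
    scored.sort(key=lambda item: item[0])
    return scored[:m]


def mean_abs_diff(a, b):
    """Stand-in distance for the demo; the paper uses Attention-DTW."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# A run-to-failure health indicator and the tail of a test unit.
history = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
test_unit = [9.5, 8.1, 6.9, 5.2, 4.1, 3.0, 2.2]
query = tail_window(test_unit, 3)
top = select_top_m(query, history, length=3, m=2, distance=mean_abs_diff)
```

The window length and the number of retained segments are the two quantities that the reinforcement learning agent later tunes, so they are kept as explicit arguments here.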
3.3. Attention-Enhanced Multi-Scale Hybrid Network for Remaining Useful Life Prediction
The extraction of features from local and long-term degradation trends in the ADAPT-RULNet structure is mainly achieved through an attention-enhanced multi-scale hybrid RUL prediction network based on Bayesian fusion, as illustrated in
Figure 5.
The proposed method primarily consists of three components:
(a) Attention-Enhanced Multi-Scale Depthwise Separable Convolution (DSC). This module is designed to accurately identify short-term dependencies across multiple scales while reducing computational complexity. Four convolutional kernels of distinct sizes are employed to capture multi-scale features from the input data. Residual connections are introduced to preserve and enhance low-level feature information, mitigating the risk of gradient vanishing in deep networks. To further optimize feature representation, a Combined Spatial and Channel Attention Module (CSAM) is integrated, which fuses the Channel Attention Module (CAM) and the Spatial Attention Module (SAM) to dynamically weight key features and achieve efficient fusion of multi-level features.
(b) Attention-Enhanced Multi-Scale LSTM Network. This network is proposed to enhance the extraction of long-term degradation trends. It employs three hidden layers with dimensions of 64, 128, and 256 to capture multi-scale temporal patterns while avoiding overfitting. To improve the identification of degradation characteristics, a Multi-Head Self-Attention (MHSA) mechanism is introduced, which dynamically allocates weights to focus on the most critical time-scale features for RUL prediction. Specifically, the attention mechanism computes weights for each time-scale feature, performs weighted fusion, and integrates the feature maps extracted by the LSTM network across the three scales to generate a comprehensive global feature representation.
(c) Bayesian Feature Fusion. A Bayesian probability-based feature fusion approach is designed to optimally combine local and global features while mitigating the uncertainty inherent in feature extraction from different networks. The mathematical formulation is given in Equation (
26):
where
represents the fused feature distribution;
and
denote the conditional probability distributions of the global and local features, respectively; and
is the prior distribution.
The fused feature representation is then obtained as follows:
Finally, the features obtained through adaptive probabilistic fusion are fed into a multi-layer fully connected network to achieve precise prediction of the equipment’s RUL. The training procedure of the Attention-Enhanced Multi-Scale Hybrid Network is presented in Algorithm 1.
Algorithm 1 Attention-Enhanced Multi-Scale Hybrid Network Training
Require:
    train_data: Preprocessed training data
    train_labels: Corresponding RUL labels
    val_data: Validation data
    val_labels: Corresponding RUL labels
    device: Computational device (‘cpu’ or ‘cuda’)
    fusion_dim: Dimension for feature fusion
    learning_rate: Initial learning rate
    epochs: Number of training epochs
    batch_size: Batch size for training
Ensure:
    trained_model: Trained neural network model
procedure TrainNeuralNetwork(train_data, train_labels, val_data, val_labels, device, fusion_dim, learning_rate, epochs, batch_size)
    Initialize CNN-LSTM model with attention mechanisms
    Define loss function (e.g., MSE) and optimizer (e.g., AdamW)
    for epoch = 1 to epochs do
        for batch = 1 to len(train_data)/batch_size do
            Load batch data and labels
            Forward pass: compute model output
            Calculate loss
            Backward pass: compute gradients
            Update model parameters
        end for
        Validate model on validation set
        Compute validation loss and metrics
        Update learning rate scheduler if needed
    end for
    return trained_model
end procedure
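The Bayesian feature fusion in component (c) can be illustrated with a simple special case. The sketch below assumes that the local and global branches each provide a per-dimension Gaussian estimate (mean and variance) of the same feature and that the prior is flat, in which case the posterior is the precision-weighted combination of the two. This is one common instance of product-of-likelihoods fusion and is not claimed to match the paper's Equation (26); how the variances are obtained is also left open.

```python
def gaussian_fusion(local_mean, local_var, global_mean, global_var):
    """Fuse two independent Gaussian feature estimates per dimension.

    Assumptions: independent branches, flat prior. The branch with the
    smaller variance (higher confidence) receives the larger weight.
    """
    fused_mean, fused_var = [], []
    for ml, vl, mg, vg in zip(local_mean, local_var, global_mean, global_var):
        precision = 1.0 / vl + 1.0 / vg
        fused_var.append(1.0 / precision)
        fused_mean.append((ml / vl + mg / vg) / precision)
    return fused_mean, fused_var


# Dimension 0: equal confidence. Dimension 1: the local branch is more certain.
mean, var = gaussian_fusion(
    local_mean=[0.0, 0.0], local_var=[1.0, 1.0],
    global_mean=[2.0, 4.0], global_var=[1.0, 3.0],
)
```

The fused variance is never larger than either input variance, which reflects the uncertainty-reduction motivation given for the fusion layer.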
3.4. Strategy for Balancing Model Complexity and Efficiency
To achieve global balancing of model complexity and predictive performance and to overcome the lack of flexibility of traditional fixed-window approaches in capturing both long-term and short-term dependencies, this study innovatively introduces the DDPG algorithm into RUL prediction. By abstracting the adaptive parameter adjustment problem of the RUL prediction model into a DRL environment, we define the state space, action space, and reward function, thereby constructing a dynamic optimization framework.
State Space: The state vector of the DDPG agent is composed of the performance metrics of the current RUL prediction model and the parameters of the personalized dataset. Specifically, the state vector includes the Mean Squared Error (MSE), Akaike Information Criterion (AIC), and DTW similarity, as well as the current time-window length (L) and dataset size (M). These metrics comprehensively reflect the model’s prediction accuracy, complexity, and dataset quality. The AIC value is defined as follows:
where
denotes the total number of model parameters,
U represents the number of sensor channels,
L is the sliding-window size,
M is the number of similar segments, and
is the likelihood-function value of the maximum likelihood estimation.
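For reference, the standard form of the AIC, 2k minus twice the maximized log-likelihood, can be computed as in the sketch below. The Gaussian residual model used to obtain the log-likelihood is an assumption for this example, and the parameter count `num_params` is simply passed in; the paper's own expression for the parameter count in terms of U, L, and M is not reproduced.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximized log-likelihood under i.i.d. Gaussian errors (an assumption).

    The maximum-likelihood variance estimate equals the MSE of the residuals.
    """
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic(num_params, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln(L_hat)."""
    return 2.0 * num_params - 2.0 * log_likelihood


residuals = [1.0, -1.0, 1.0, -1.0]
small_model = aic(num_params=3, log_likelihood=gaussian_log_likelihood(residuals))
large_model = aic(num_params=30, log_likelihood=gaussian_log_likelihood(residuals))
```

At equal fit quality, the model with more parameters receives the larger (worse) AIC, which is how the criterion penalizes complexity in the state vector.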
Action Space: The agent outputs two continuous actions, which represent the adjustment magnitudes of L and M, respectively. These values are normalized and mapped to the valid ranges of L and M via linear scaling, ensuring that parameter updates remain feasible.
Reward Function: The reward drives the optimization process of the DDPG agent. A composite reward is designed to dynamically adjust L and M, enabling better adaptability across different degradation stages and achieving a global balance between complexity and accuracy.
(i) Base Reward: Negatively correlated with MSE and AIC and positively correlated with DTW similarity:
where
, and
are weighting coefficients.
(ii) Improvement Reward: Measures improvement compared to the previous step:
where
, and
are weighting coefficients.
(iii) Stability Reward: Penalizes unstable fluctuations in recent performance:
where
is a weighting coefficient.
(iv) Total Reward: The overall reward is defined as follows:
During DDPG training, the agent iteratively interacts with the RUL prediction environment. At each time step, the actor network selects an action, and the environment updates L and M and recalculates the MSE, AIC, and DTW similarity. The corresponding reward and new state are returned and stored in the replay buffer. The agent periodically samples from the buffer to update the actor and critic parameters, while the target networks are updated via a soft-update strategy. Through iterative training, the agent progressively learns a policy that adaptively adjusts L and M to balance model complexity and predictive performance. The process of DDPG for hyperparameter optimization is presented in Algorithm 2.
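The environment side of this loop, mapping a normalized action to a valid parameter value and scoring the outcome, can be sketched as follows. Everything specific here is an assumption for illustration: actions are taken to lie in [-1, 1], the reward terms are simple linear combinations with made-up default weights, and the stability term is the negative standard deviation of recent MSE values. The names `scale_action` and `composite_reward` are hypothetical.

```python
import statistics


def scale_action(action, low, high):
    """Linearly map an action in [-1, 1] onto the integer range [low, high]."""
    clipped = max(-1.0, min(1.0, action))
    return round(low + (clipped + 1.0) * 0.5 * (high - low))


def composite_reward(current, previous, recent_mse,
                     base_w=(1.0, 0.01, 1.0),
                     improve_w=(1.0, 0.01, 1.0),
                     stability_w=0.5):
    """Base + improvement + stability reward (illustrative weights only).

    `current` and `previous` are dicts with keys "mse", "aic", and "sim".
    """
    # Base: lower MSE and AIC are better, higher DTW similarity is better.
    base = (-base_w[0] * current["mse"]
            - base_w[1] * current["aic"]
            + base_w[2] * current["sim"])
    # Improvement: change relative to the previous step.
    improvement = (improve_w[0] * (previous["mse"] - current["mse"])
                   + improve_w[1] * (previous["aic"] - current["aic"])
                   + improve_w[2] * (current["sim"] - previous["sim"]))
    # Stability: penalize fluctuation of recent performance.
    stability = -stability_w * statistics.pstdev(recent_mse)
    return base + improvement + stability


window_length = scale_action(0.0, low=10, high=50)
previous = {"mse": 1.0, "aic": 100.0, "sim": 0.5}
better = {"mse": 0.8, "aic": 90.0, "sim": 0.6}
worse = {"mse": 1.2, "aic": 110.0, "sim": 0.4}
history = [1.0, 1.1, 0.9]
```

Clipping before scaling keeps the resulting window length and sample count inside their valid ranges even if exploration noise pushes the raw action outside [-1, 1].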
Algorithm 2 DDPG for Hyperparameter Optimization
Require:
    env: Environment for RUL prediction
    state_dim: Dimension of state space
    action_dim: Dimension of action space
    action_range: Range of actions
    memory_capacity: Capacity of replay memory
    batch_size: Batch size for training DDPG
    gamma: Discount factor
    tau: Soft update coefficient
    actor_lr: Learning rate for actor network
    critic_lr: Learning rate for critic network
Ensure:
    ddpg_agent: Trained DDPG agent
procedure DDPG(env, state_dim, action_dim, action_range, memory_capacity, batch_size, gamma, tau, actor_lr, critic_lr)
    Initialize the actor network and the critic network Q
    Initialize the target networks for the actor and the critic
    Initialize replay memory
    Initialize actor and critic optimizers
    for each training step do
        Obtain current state from environment
        Select action using actor network
        Execute action in environment, obtain reward and next state
        Store transition in replay memory
        Sample random batch from replay memory
        Update critic network using sampled batch
        Update actor network using sampled batch
        Soft update target networks
    end for
    return ddpg_agent
end procedure
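Two of the mechanisms named in Algorithm 2, the replay memory and the soft update of the target networks, are simple enough to show in isolation. The sketch below represents network parameters as flat lists of floats so that it stays free of any deep learning library; the class and function names are hypothetical, and the actor and critic updates themselves are not shown.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A bounded deque discards the oldest transition once full.
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample a batch without replacement."""
        size = min(batch_size, len(self.memory))
        return random.sample(list(self.memory), size)

    def __len__(self):
        return len(self.memory)


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.store(state=step, action=0.1 * step, reward=-1.0, next_state=step + 1)
batch = buffer.sample(batch_size=2)

target = soft_update(target_params=[0.0, 0.0], online_params=[1.0, 2.0], tau=0.1)
```

A small `tau` makes the target networks trail the online networks slowly, which is the stabilizing effect the soft-update strategy is meant to provide.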
5. Conclusions
Accurate prediction of RUL is of paramount importance in optimizing maintenance strategies, reducing operational expenses, and ensuring the operational safety of equipment. This study introduces an adaptive framework for RUL prediction that integrates attention mechanisms and RL. The primary objective is to address challenges related to the adaptability of the prediction process, the accuracy of prediction results, and the generalization ability of the prediction model. Additionally, the method employs the FAR approach for data preprocessing and utilizes attention mechanism-based DTW to construct personalized datasets, ensuring high-quality signal input while improving the efficiency of feature extraction. The proposed attention-enhanced CNN-LSTM hybrid network architecture achieves the fusion of local temporal and global dependency features, enhancing the accuracy of RUL prediction, particularly providing high-precision predictions during the later stages of equipment degradation. Finally, to balance model complexity and prediction performance, parameter tuning is transformed into an MDP model, and techniques including experience replay and target-network soft updates of the DDPG algorithm are adopted to adaptively adjust key parameters in personalized dataset construction. The method’s efficacy was validated on datasets covering two distinct components. Comparative experiments with various current DL methods were conducted. The findings indicate that the new approach achieves the highest accuracy and average metrics.
However, our proposed method also has limitations, especially in terms of model performance and interpretability. In future work, we plan to continuously optimize hyperparameters and the reinforcement learning reward function, as well as further improve the computational efficiency of the algorithm. In addition, we will explore physics-informed deep learning by integrating physical models with neural networks to gain deeper insights into the degradation mechanisms of mechanical components, thereby enhancing both interpretability and predictive accuracy. We also aim to incorporate Bayesian neural networks for uncertainty estimation, providing more reliable confidence intervals for RUL predictions and enabling decision-makers to assess risks more scientifically. Furthermore, the integration of the RUL prediction model with digital twin technology will be pursued to achieve real-time monitoring and online updating of equipment status, significantly improving prediction accuracy and timeliness. Through these forward-looking studies, we expect to advance the field of RUL prediction and provide stronger technical support for equipment maintenance and operational management.