1. Introduction
Against the backdrop of the accelerating global transition toward sustainable energy structures, lithium-ion batteries, as core energy storage components, are playing an increasingly crucial role [1]. From portable electronic devices to electric vehicles, and from distributed energy storage systems to smart grid peak regulation, lithium-ion batteries have become key technologies supporting the electrification of modern society due to their high energy density, long cycle life, and low self-discharge rate [1,2].
The performance degradation mechanism of lithium-ion batteries is extremely complex, involving interdisciplinary fields such as electrochemical reaction kinetics, material structural evolution, and interfacial chemistry [3]. In practical applications, battery capacity inevitably decays with increasing charge–discharge cycles and service time, while internal resistance rises simultaneously. This directly shortens device endurance and reduces the available capacity of energy storage systems, which significantly affects economic benefits. Performance deterioration may also trigger severe safety accidents such as thermal runaway, posing potential threats to life and property. Therefore, accurate prediction of the remaining useful life (RUL) of lithium-ion batteries is of great significance for optimizing battery management strategies, improving utilization efficiency, reducing operation and maintenance costs, and ensuring safe and stable system operation. It also promotes the efficient circulation of the battery’s full life-cycle value chain [4,5].
However, due to the complex coupling of electrochemical reactions inside batteries, diverse operating conditions, and performance variability caused by manufacturing differences among individual cells, traditional RUL prediction methods based on empirical models, physicochemical mechanism models, or shallow data-driven models struggle to accurately capture complex degradation patterns and long-term degradation trends. Both prediction accuracy and generalization ability face significant challenges. Therefore, developing advanced RUL prediction models capable of deeply mining degradation features and adapting to complex working conditions has become a key scientific problem and research hotspot in the field of battery management. It holds profound theoretical and engineering value for promoting the widespread application of lithium-ion battery technology and achieving efficient and sustainable energy systems [6].
Lithium-ion battery RUL prediction methods can be broadly divided into model-based methods and data-driven methods [7]. Model-based methods predict battery lifetime through various modeling strategies, ranging from simplified equivalent circuit models (ECM) to electrochemical–thermal–aging coupled models. By incorporating internal physical mechanisms (e.g., lithium-ion diffusion, electrode activity decay, and thermal effects), these methods improve RUL prediction accuracy and provide a foundation for interpretable and generalizable prediction models [8]. However, model-based approaches are constrained by simplified assumptions regarding complex degradation mechanisms, making it difficult to accurately capture nonlinear aging characteristics under multi-factor coupling [9,10,11,12,13,14]. They are prone to cumulative errors under dynamic conditions and extreme environments. Furthermore, parameter identification depends heavily on prior knowledge, and the identified parameters drift as the battery degrades, limiting universality across battery types and full life cycles [15,16].
In contrast, data-driven methods do not require modeling complex internal electrochemical processes. Instead, they analyze operational data to extract features reflecting battery health status and achieve RUL prediction, often requiring only partial external characteristics for efficient estimation [10,11,17].
In the RUL prediction domain, operational data exhibit significant temporal characteristics. Although Convolutional Neural Networks (CNNs) [18] possess strong spatial feature extraction capabilities, they are less effective for processing time-series degradation data. Recurrent Neural Networks (RNNs) [19,20], on the other hand, are naturally suited for modeling sequential data and can effectively capture dynamic features such as capacity decay during charge–discharge cycles, thereby significantly improving prediction accuracy. Long Short-Term Memory (LSTM) networks [21,22] introduce gating mechanisms and cell states to selectively remember and forget information, alleviating gradient vanishing and explosion problems in RNNs and enhancing long-sequence modeling capability. Gated Recurrent Units (GRUs) [23], as simplified variants of LSTMs, reduce structural complexity while maintaining performance and shortening training time. Although these methods capture temporal dependencies to some extent, they typically lack explicit modeling of the internal dynamic state evolution of batteries [24,25,26]. Moreover, they often adopt static weight allocation mechanisms when handling multi-scale degradation features, making it difficult to adapt to dynamic changes across different aging stages (e.g., linear decay phase and nonlinear accelerated phase).
In summary, although data-driven methods improve prediction accuracy, they still face the following limitations [27,28,29]:
- (1) Strong dependence on high-quality labeled data: Full life-cycle degradation data are costly and time-consuming to obtain in practical scenarios, and uneven sample distributions may lead to model overfitting.
- (2) Insufficient multi-scale feature extraction and state modeling capability: Existing networks often struggle to jointly model local capacity fluctuations and global degradation trends and are highly sensitive to noise.
- (3) Lack of adaptive feature fusion mechanisms: Most methods employ fixed fusion strategies when combining features from different time steps or scales, making it difficult to adaptively adjust fusion weights according to the current degradation state, thus limiting robustness under complex dynamic conditions.
To address these issues, this paper proposes a lithium-ion battery RUL prediction method based on a Reinforced Dynamic Degradation Evolution Modeling Network (RDDEMN). The proposed method first utilizes a Dynamic State Transition Network (DST-Net) to jointly model multi-scale degradation features in capacity sequences, then introduces a Sequence Pattern Attention (SPA) mechanism to highlight key temporal contributions, and finally employs a reinforcement learning-based adaptive gating mechanism to dynamically fuse multi-scale features with the current degradation state. The main contributions of this paper are as follows:
- (1) A novel RDDEMN architecture is proposed. By integrating a dedicated DST-Net module with input-conditioned state transitions and local temporal convolution, the model accurately captures multi-scale degradation features and dynamic state evolution processes.
- (2) A reinforcement learning-based adaptive gating fusion mechanism is designed. Combined with SPA, the mechanism intelligently adjusts fusion weights according to the current degradation state, significantly enhancing adaptability and robustness across different aging stages.
- (3) Extensive validation on full life-cycle datasets is conducted. The objective is to comprehensively evaluate the proposed RDDEMN’s prediction accuracy and robustness under various degradation scenarios, providing a reliable methodological foundation for practical battery health management and fault diagnosis.
The remainder of this paper is organized as follows:
Section 2 introduces the theoretical background.
Section 3 presents the proposed network model.
Section 4 describes experimental validation and result analysis.
Section 5 concludes the paper and discusses future work.
3. The Proposed Remaining Useful Life Prediction Method for Lithium-Ion Battery
To address the problems of insufficient mining of multi-scale degradation features and the inability of static fusion strategies to adapt to dynamic degradation processes in existing data-driven methods under complex working conditions, this paper proposes a battery RUL prediction method based on the RDDEMN. On the basis of extracting multi-scale degradation features, this method innovatively introduces the SPA and a reinforcement learning-based adaptive gating mechanism to achieve accurate feature extraction and dynamic fusion prediction.
3.1. The Proposed RDDEMN Algorithm
The architecture of the proposed RDDEMN is shown in Figure 1. It mainly consists of three core functional modules: the DST-Net temporal encoding module, the SPA temporal weight assignment module, and the reinforcement learning-based adaptive gating fusion module.
First, normalization processing is performed on the capacity data of the battery throughout its life cycle, and the sliding window technique is adopted to slice the data into sequence samples of fixed length for constructing the training set and test set. To strictly prevent data leakage and avoid temporal overlap, the 80/20 sequential division is performed chronologically based on the discharge cycles. Specifically, for each battery, the first 80% of the continuous life-cycle sequences are strictly allocated to the training set, while the remaining 20% of the sequences are used exclusively for testing. The sliding window moves chronologically without crossing the boundary between the training and testing sets, ensuring that the model’s predictive performance is evaluated solely on unseen, future degradation trajectories. Subsequently, the DST-Net is used to conduct joint modeling of multi-scale degradation features for the capacity time series to capture local fluctuations and global trends; then, the SPA mechanism adaptively assigns higher weights to key time steps; finally, a reinforcement learning agent is constructed to dynamically generate gating coefficients according to the current degradation state, adaptively fuse multi-scale features with state features, and output the final RUL prediction value through the fully connected layer.
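The data preparation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' code: the function names, the window length of 8, the synthetic capacity curve, the use of the next capacity value as the target, and the choice to take the min-max bounds from the training portion only are all assumptions made for the example.

```python
def min_max_normalize(values, lo, hi):
    """Scale values with fixed bounds (assumes hi > lo)."""
    span = hi - lo
    return [(v - lo) / span for v in values]


def make_windows(series, window):
    """Slice a series into (input window, next value) pairs."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


def chronological_split(capacity, window, train_ratio=0.8):
    """Split first, then window each part, so no window crosses the boundary."""
    cut = int(len(capacity) * train_ratio)
    train_raw, test_raw = capacity[:cut], capacity[cut:]
    # Assumption: normalization bounds come from the training portion only.
    lo, hi = min(train_raw), max(train_raw)
    train = make_windows(min_max_normalize(train_raw, lo, hi), window)
    test = make_windows(min_max_normalize(test_raw, lo, hi), window)
    return train, test


# Synthetic, linearly fading capacity curve in Ah (not taken from the dataset).
capacity = [2.0 - 0.005 * k for k in range(100)]
train, test = chronological_split(capacity, window=8)
print(len(train), len(test))
```

Because each portion is windowed separately, every test window consists only of cycles that come after the last training cycle, which is the leakage guarantee the paragraph describes.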
3.2. Dynamic State Transition Encoder
The DST-Net is designed to collaboratively extract dynamic state evolution features and local temporal features in time series. Let the input window sequence be $X = \{x_1, x_2, \dots, x_T\}$, where $x_t$ is the capacity value at the $t$-th time step and $T$ is the window length.
First, a linear mapping is applied to each input $x_t$ to obtain the feature vector $u_t$:
$$u_t = W_u x_t + b_u$$
where $W_u$ and $b_u$ are the weight matrix and bias of the input projection, respectively.
To simulate the dynamic process of battery degradation, input-conditional state transition parameters are constructed, including the state attenuation coefficient $\gamma_t$ and the state injection vector $\delta_t$:
$$\gamma_t = \sigma(W_\gamma u_t + b_\gamma), \qquad \delta_t = \tanh(W_\delta u_t + b_\delta)$$
where $\sigma$ is the Sigmoid activation function, $\tanh$ is the hyperbolic tangent activation function, $W_\gamma$ and $W_\delta$ are mapping weights, and $b_\gamma$ and $b_\delta$ are biases. Based on these parameters, a dynamic state recurrence equation updates the hidden state $h_t$ from the previous state $h_{t-1}$ using $\gamma_t$ and $\delta_t$, where $h_0$ is the initial state vector and the coefficients act on the state through element-wise multiplication (⊙). The updated state is then mapped back to the feature space to obtain the state feature $s_t$. Meanwhile, to capture local temporal dependencies, a depthwise separable 1D convolution is introduced to process the input features to obtain the convolution feature $c_t$, and the state features and convolution features are fused through a gating mechanism, where $f_t$ is the multi-scale degradation feature output by the DST-Net and $z_t$ is the feature fusion gating vector. By stacking multiple layers of DST-Net units, a multi-scale feature sequence containing rich historical degradation information, $F = \{f_1, f_2, \dots, f_T\}$, can be obtained.
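To make the encoder description more concrete, the following is a hedged, plain-Python sketch of one possible reading of it. The recurrence form (previous state scaled by a sigmoid coefficient, plus a tanh injection), the convex gate blend, and all function and parameter names are assumptions for illustration. Only the depthwise part of the separable convolution is shown; the pointwise part, the output mapping, and the learning of the gate are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def state_recurrence(features, Wa, ba, Wb, bb, h0):
    """Assumed form: h_t = atten_t * h_{t-1} + inject_t, element-wise."""
    h, states = list(h0), []
    for u in features:
        atten = [sigmoid(z) for z in affine(Wa, ba, u)]     # values in (0, 1)
        inject = [math.tanh(z) for z in affine(Wb, bb, u)]  # values in (-1, 1)
        h = [a * hi + b for a, hi, b in zip(atten, h, inject)]
        states.append(h)
    return states


def depthwise_causal_conv(features, kernels):
    """Per-channel causal 1D convolution; kernel[0] weights the current step."""
    out = []
    for t in range(len(features)):
        row = []
        for c, kernel in enumerate(kernels):
            acc = 0.0
            for j, w in enumerate(kernel):
                if t - j >= 0:  # steps before the window start count as zero
                    acc += w * features[t - j][c]
            row.append(acc)
        out.append(row)
    return out


def gated_fusion(state_feat, conv_feat, gate):
    """Assumed blend: gate * state + (1 - gate) * conv, element-wise."""
    return [g * s + (1.0 - g) * c for g, s, c in zip(gate, state_feat, conv_feat)]


# Toy two-channel feature sequence with hand-picked (not trained) parameters.
feats = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
Wa, ba = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
Wb, bb = [[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0]
states = state_recurrence(feats, Wa, ba, Wb, bb, h0=[0.0, 0.0])
local = depthwise_causal_conv(feats, kernels=[[0.5, 0.5], [0.5, 0.5]])
fused = [gated_fusion(s, c, [0.5, 0.5]) for s, c in zip(states, local)]
print(len(fused), len(fused[0]))
```

In this reading, the recurrence carries slowly varying trend information across the whole window, while the short convolution kernel reacts to local fluctuations; the gate decides, per channel, how much of each to keep.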
3.3. Sequence Pattern Attention
In the battery degradation process, the contribution of capacity fluctuations at different time steps to RUL prediction varies. To highlight the information of key time steps, this paper designs the SPA mechanism. First, each feature $f_t$ in the sequence output by the DST-Net is mapped to the attention space to calculate the intermediate representation $e_t$, where $W_e$ and $b_e$ are attention mapping parameters, and $v$ is the weight vector. Subsequently, the Softmax function is used to calculate the normalized weight $\alpha_t$ of each time step:
$$\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}$$
Finally, a weighted summation is performed on the feature sequence to obtain the multi-scale degradation feature $f_{att}$ focusing on key degradation patterns:
$$f_{att} = \sum_{t=1}^{T} \alpha_t f_t$$
3.4. The Designed Reinforcement Learning-Based Adaptive Gated Fusion Mechanism
To achieve dynamic fusion between multi-scale features and the current degradation state, this paper proposes an adaptive gating mechanism based on reinforcement learning. This mechanism intelligently adjusts the proportion of feature fusion by perceiving the current degradation state.
3.4.1. Definition of State and Action Space
The hidden state $h_T$ at the end of the sequence is extracted as the current degradation state feature. The reinforcement learning state vector $s$ is defined as the statistical characteristics of the battery capacity within the current window to characterize the degradation level and fluctuation degree:
$$s = [\mu_w, \sigma_w^2]$$
where $\mu_w$ and $\sigma_w^2$ are the mean and variance of the capacity within the window, respectively. A policy network $\pi_\theta(s)$ is constructed to generate actions. To ensure exploration and stability, actions $a$ are sampled from a truncated Gaussian distribution, that is, a Gaussian centered on the policy output and restricted to the interval $[-a_{max}, a_{max}]$, where $a_{max}$ denotes the upper bound of action adjustment amplitude, and $\sigma_a$ represents the standard deviation used for exploration.
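A minimal sketch of the state construction and action sampling follows. The population variance, the single-layer tanh policy used as a stand-in for the policy network, the symmetric truncation interval, and the rejection-sampling scheme are all assumptions for illustration; none of the names come from the paper.

```python
import math
import random
import statistics


def window_state(window):
    """State vector: (mean, population variance) of the capacities in the window."""
    return statistics.fmean(window), statistics.pvariance(window)


def policy_mean(state, weights, bias, bound):
    """Hypothetical one-layer policy: a bounded mean action from the state."""
    z = sum(w * s for w, s in zip(weights, state)) + bias
    return bound * math.tanh(z)


def sample_truncated_gaussian(mean, std, bound, rng):
    """Rejection-sample N(mean, std^2) restricted to [-bound, bound] (std > 0)."""
    mean = max(-bound, min(bound, mean))  # keep the acceptance rate reasonable
    while True:
        action = rng.gauss(mean, std)
        if -bound <= action <= bound:
            return action


rng = random.Random(0)
state = window_state([1.80, 1.79, 1.77, 1.76, 1.74])
mu = policy_mean(state, weights=[0.5, 10.0], bias=-0.8, bound=0.2)
actions = [sample_truncated_gaussian(mu, std=0.05, bound=0.2, rng=rng) for _ in range(5)]
print(all(abs(a) <= 0.2 for a in actions))
```

Bounding the action keeps the gate adjustment small, so exploration perturbs the fusion weights without letting a single noisy sample dominate the prediction.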
3.4.2. Adaptive Fusion and Loss Function
The final gating value $g$ is jointly determined by the learnable base gating parameter $g_0$ and the reinforcement learning action $a$. Using this adaptive gate, the multi-scale degradation features $f_{att}$ and the current degradation state features $h_T$ are fused, and the RUL prediction value $\hat{y}$ for the next cycle is output through a fully connected layer.
To jointly optimize prediction accuracy and the policy network, the model adopts a combined loss function $\mathcal{L}$ for training:
$$\mathcal{L} = \mathcal{L}_{MSE} + \lambda \mathcal{L}_{PG}$$
where $\mathcal{L}_{MSE}$ denotes the mean squared error prediction loss, $\mathcal{L}_{PG}$ represents the policy gradient loss, $R$ is the reward function defined based on prediction error and used in $\mathcal{L}_{PG}$, and $\lambda$ is the balancing coefficient. This design enables the model to adaptively learn the optimal feature fusion strategy according to prediction feedback.
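The gate and the combined objective can be sketched as follows, under explicitly assumed forms: the gate is a sigmoid of the base parameter plus the action, the fusion is a convex blend, the reward is the negative absolute prediction error, and the policy term is a REINFORCE-style loss using an untruncated Gaussian log-density. These choices are illustrative and are not confirmed by the text.

```python
import math


def adaptive_gate(base_param, action):
    """Assumed gate: sigmoid(base parameter + RL action), a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(base_param + action)))


def fuse(gate, multi_scale, state):
    """Assumed fusion: gate * multi-scale feature + (1 - gate) * state feature."""
    return [gate * m + (1.0 - gate) * s for m, s in zip(multi_scale, state)]


def gaussian_log_prob(action, mean, std):
    """Log-density of N(mean, std^2); the truncation constant is ignored here."""
    return (
        -0.5 * ((action - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2.0 * math.pi)
    )


def combined_loss(pred, target, action, policy_mean, std, lam):
    """MSE prediction loss plus lam times a REINFORCE-style policy loss."""
    mse = (pred - target) ** 2
    reward = -abs(pred - target)  # assumed reward: smaller error, larger reward
    policy_loss = -reward * gaussian_log_prob(action, policy_mean, std)
    return mse + lam * policy_loss


gate = adaptive_gate(base_param=0.0, action=0.1)
fused = fuse(gate, multi_scale=[0.6, 0.2], state=[0.4, 0.8])
loss = combined_loss(pred=0.82, target=0.80, action=0.1, policy_mean=0.05, std=0.05, lam=0.1)
print(round(gate, 3), [round(x, 3) for x in fused], round(loss, 5))
```

In a real training loop the reward would be treated as a constant (no gradient through it), so that the policy term only adjusts the action distribution while the MSE term trains the predictor.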
4. Experimental Validation and Analysis
To evaluate the performance of the proposed network model in the lithium-ion battery life prediction task, the publicly available full-life lithium-ion battery experimental dataset released by the National Aeronautics and Space Administration (NASA) was selected for validation.
The lifetime of lithium-ion batteries cannot be directly measured and is typically indirectly estimated through related parameters such as capacity, voltage, and current. Therefore, to rapidly determine battery health status and its remaining useful life, it is first necessary to identify key indicators that can represent battery performance. Battery performance degradation is the comprehensive manifestation of various internal physical and chemical changes during cycling. The variation in performance parameters is mainly reflected in capacity decrease and internal resistance increase. Therefore, battery capacity or internal resistance is commonly used to define RUL. In practical research, defining RUL based on capacity is more common. Accordingly, this study estimates battery lifetime based on the remaining capacity, and the life label is defined as follows:
$$RUL = \frac{C_t}{C_0} \times 100\%$$
where $C_0$ denotes the rated capacity of the battery; $C_t$ represents the current actual capacity of the battery, which is measured under standard charge–discharge conditions. When the battery is new, the current actual capacity equals the rated capacity, and the RUL is 100%.
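As a small worked example of the capacity-based life label, the sketch below expresses the current capacity as a percentage of the rated capacity. The function name is hypothetical, and the 2.0 Ah rated capacity is inferred from the 1.4 Ah value that the paper associates with 70% capacity retention.

```python
def rul_percent(current_capacity, rated_capacity):
    """Capacity-based life label: current capacity as a percentage of rated capacity."""
    if rated_capacity <= 0:
        raise ValueError("rated_capacity must be positive")
    return 100.0 * current_capacity / rated_capacity


RATED_AH = 2.0  # assumed rated capacity in Ah
# Illustrative capacities in Ah, not measured values.
labels = [rul_percent(c, RATED_AH) for c in (2.0, 1.8, 1.6, 1.4)]
print([round(x, 1) for x in labels])
```

Under this definition a new cell is labeled 100%, and the label decreases in proportion to the measured capacity fade.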
4.1. Experimental Description and Data Acquisition
This study adopts the publicly available lithium-ion battery degradation dataset provided by NASA to evaluate the predictive performance of the proposed model under real-world conditions. The dataset, released by NASA Ames Research Center, is widely used in battery health management and RUL prediction research and is highly authoritative and representative.
The dataset includes full-life discharge cycle data of multiple lithium-ion batteries under different operating conditions, covering various sensor signals such as discharge capacity, voltage, current, and temperature. In this paper, four battery cells (B0005, B0006, B0007, and B0018) are selected as experimental subjects. These batteries were charged in constant current–constant voltage (CC–CV) mode and discharged at constant current to a specified threshold. Throughout their lifetime, the batteries experienced hundreds of complete charge–discharge cycles until their capacity degraded below 70% of the rated capacity, at which point they were considered failed.
The discharge capacity recorded in each cycle reflects the gradual degradation trend of battery performance, exhibiting clear temporal characteristics and nonlinear degradation patterns. Due to its authenticity, reliability, and inclusion of multiple degradation trajectories, this dataset provides an excellent validation platform for lithium-ion battery lifetime modeling and prediction methods.
Table 1 presents the dataset description.
Charging process: The four lithium-ion batteries (models B0005, B0006, B0007, and B0018) were operated at room temperature and charged at a constant current (CC) of 1.5 A until reaching 4.2 V, followed by constant voltage (CV) charging until the charging current decreased to 20 mA.
Discharging process: Discharge was conducted at a constant current (CC) level of 2 A until the battery voltages of B0005, B0006, B0007, and B0018 dropped to 2.7 V, 2.5 V, 2.2 V, and 2.5 V, respectively.
Figure 2 illustrates the full-life capacity degradation curves of the four batteries.
4.2. Experimental Setup and Analysis
4.2.1. Model Parameter Settings
In this study, the hyperparameter configuration for model training is as follows: batch size is set to 32, total training epochs are 1000, the initial learning rate is 0.001, and the Adam optimizer is used for gradient descent and parameter updating. Unlike traditional regression tasks, to jointly optimize prediction accuracy and gating strategy, the network is trained using a combined loss function composed of supervised prediction loss and reinforcement learning policy loss.
Model construction and training are implemented in Python 3.9 based on the PyTorch 2.1.0 framework. The overall structural parameters are shown in Table 2. During data processing, the data are organized into tensors of shape (batch_size, sequence_length, input_size) and fed into the network model, where batch_size represents the batch size, sequence_length denotes the input sequence length, and input_size indicates the feature dimension of each sample. The true RUL value corresponding to each time step is used as the supervised label, and end-to-end optimization of the RDDEMN model parameters is achieved via backpropagation.
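The tensor layout can be illustrated without any framework by using nested lists. The sketch below groups fixed-length capacity windows into batches shaped (batch_size, sequence_length, input_size) with input_size = 1. The helper name, the window length of 8, and the count of 70 windows are hypothetical; only the batch size of 32 comes from the text.

```python
def to_batches(windows, batch_size):
    """Group windows into nested lists shaped (batch, sequence_length, 1)."""
    batches = []
    for start in range(0, len(windows), batch_size):
        chunk = windows[start:start + batch_size]
        # Wrap every scalar capacity in a list so that input_size == 1.
        batches.append([[[value] for value in window] for window in chunk])
    return batches


# 70 synthetic windows of length 8; the values are placeholders.
windows = [[0.01 * (i + j) for j in range(8)] for i in range(70)]
batches = to_batches(windows, batch_size=32)
shapes = [(len(b), len(b[0]), len(b[0][0])) for b in batches]
print(shapes)
```

The final batch is smaller than the others whenever the number of windows is not a multiple of the batch size, which a training loop has to tolerate or explicitly drop.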
4.2.2. Comparative Experiments
To assess the proposed algorithm, comparative experiments were conducted on the NASA dataset against four baseline models: CNN, CNN + LSTM, RNN, and GRU. To ensure a fair comparison, all models were configured with identical parameter settings. Three classical error metrics, namely Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), were selected to evaluate the predictive performance of each model, with all metrics expressed in percentage form to facilitate horizontal comparison. The visualization of the prediction results for the test samples from the NASA dataset is presented in Figure 3.
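For reference, the three error metrics can be computed as in the sketch below. The function name is hypothetical, and the conversion to percentage form (multiplying each metric by 100 for values on a 0 to 1 scale) is an assumption, since the text does not state how the percentages are obtained.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


# Illustrative values on a 0-1 scale; not results from the paper.
y_true = [0.80, 0.78, 0.75, 0.71]
y_pred = [0.81, 0.77, 0.76, 0.70]
metrics = regression_metrics(y_true, y_pred)
as_percent = {name: 100.0 * value for name, value in metrics.items()}
print({name: round(value, 4) for name, value in as_percent.items()})
```

RMSE is on the same scale as the target, whereas MSE is on the squared scale, so the two are not directly comparable in magnitude even when both are reported as percentages.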
From the overall trends illustrated in Figure 3, the proposed RDDEMN model (labeled OUR) achieves the best performance across all four battery datasets. In particular, RDDEMN demonstrates significant advantages in the RMSE and MAE metrics, maintaining error levels in the lowest range compared to the baseline models. This indicates that by introducing the reinforcement learning-based adaptive gating mechanism, the model possesses superior generalization capability and stability when handling complex degradation patterns and varying battery operating conditions. In contrast, both CNN and CNN + LSTM exhibit relatively poor performance across the three evaluation metrics; notably, the CNN model shows abnormally high prediction errors for battery B0018, indicating poor stability. This further confirms that feature extraction methods based solely on CNN are not well-suited for time-series tasks such as RUL prediction, whose data exhibit strong temporal dependencies. Meanwhile, although the RNN and GRU models can capture temporal features, they show large fluctuations in error across different samples. Overall, the proposed RDDEMN method not only maintains high prediction accuracy on individual battery samples but also demonstrates robust cross-sample and cross-condition generalization. The combination of high accuracy and strong stability enables the RDDEMN model to provide a reliable technical solution for fault diagnosis in practical battery health management tasks.
To intuitively demonstrate the prediction reliability across the battery life cycle, Figure 2 illustrates the degradation trajectories of the true capacity versus the RDDEMN predicted capacity over discharge cycles. A horizontal dashed line is plotted at the 1.4 Ah threshold, representing the 70% capacity retention End of Life (EOL) criterion. Furthermore, Table 3 details the exact EOL prediction cycle results for each battery cell. As observed, the predicted trajectories closely track the true capacity dynamics even in highly nonlinear degradation phases. The absolute cycle errors at the specific EOL points are remarkably small (ranging from 0 to 2 cycles), with the relative prediction errors controlled within 2.1%. These quantitative EOL results verify that the proposed RDDEMN model provides highly transparent and reliable early warnings for battery failure, fulfilling the practical requirements of real-world battery health management systems.
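The EOL comparison can be reproduced in outline with the sketch below. It takes the EOL cycle to be the first cycle whose capacity falls below the 1.4 Ah threshold and defines the relative error with respect to the true EOL cycle; both definitions and the function names are assumptions, and the capacity values are made up for illustration.

```python
def eol_cycle(capacities, threshold):
    """Return the 1-based index of the first cycle below threshold, or None."""
    for cycle, capacity in enumerate(capacities, start=1):
        if capacity < threshold:
            return cycle
    return None


def eol_errors(true_caps, pred_caps, threshold=1.4):
    """Absolute (cycles) and relative (%) EOL error; None if either never fails."""
    true_eol = eol_cycle(true_caps, threshold)
    pred_eol = eol_cycle(pred_caps, threshold)
    if true_eol is None or pred_eol is None:
        return None
    abs_err = abs(pred_eol - true_eol)
    return abs_err, 100.0 * abs_err / true_eol


# Made-up trajectories in Ah; not values from Table 3.
true_caps = [2.00, 1.80, 1.50, 1.39, 1.30]
pred_caps = [2.00, 1.80, 1.45, 1.41, 1.35]
print(eol_errors(true_caps, pred_caps))
```

Capacity regeneration can push a curve back above the threshold after the first crossing, so a stricter definition might require the capacity to remain below the threshold for several consecutive cycles.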
4.2.3. Ablation Experiments
To validate the modeling capability of the proposed attention mechanism along the temporal dimension, further analysis and visualization were conducted on the attention weight distribution learned by the model during the testing phase. As shown in Figure 4, the learned average attention weights exhibit a clearly non-uniform distribution along the time dimension. The most recent capacity states are assigned higher weights, indicating their dominant role in future capacity prediction. Meanwhile, early historical information retains a certain level of attention weight, whereas the intermediate time steps contribute relatively less. These results demonstrate that the proposed attention mechanism can adaptively model multi-scale temporal dependencies in the battery capacity degradation process, rather than relying solely on a fixed window or single time-step information.
From the overall trend, the attention weights of the four battery groups all exhibit a distinctly non-uniform distribution over time, generally assigning higher weights to capacity states closer to the prediction time. This indicates that the model relies more heavily on recent degradation information during capacity prediction, which is consistent with the inherent time-correlated physical characteristics of battery capacity evolution. Differences in attention distributions are observed among different batteries. Battery B0006 shows a more concentrated attention allocation in the later stages of its lifetime, whereas B0005, B0007, and B0018 display gradually increasing attention weights over time, reflecting their relatively smooth degradation patterns. It is noteworthy that although the model emphasizes recent information, early historical states still maintain a certain degree of contribution. This indicates that the proposed attention mechanism does not degenerate into a single-step prediction model relying only on the latest observation, but instead adaptively balances short-term variations and long-term degradation characteristics across the temporal dimension, thereby effectively modeling multi-scale temporal dependencies in battery capacity degradation.
To further verify the effectiveness of the proposed model components, ablation experiments were conducted on the RDDEMN model using the NASA dataset. The objective was to determine the impact of the SPA module and the RL module on prediction accuracy and stability. Table 4 presents the prediction results after removing different components of the model. A0 denotes the model without SPA, A1 denotes the model without RL, and A2 denotes the model with both components removed. The analysis shows that removing any single component leads to performance degradation, which supports the effectiveness of each module in the model.
Figure 5 illustrates the visualization results of the ablation experiments. Based on comprehensive analysis of the MSE, RMSE, and MAE metrics, the RDDEMN model demonstrates significant advantages in battery lifetime prediction tasks. Compared with A0, A1, and A2, the RDDEMN model consistently achieves the lowest prediction errors across all battery types, reflecting superior prediction accuracy and stability. The A2 model exhibits a sharp increase in prediction error for batteries such as B0006, indicating poor stability, thereby validating the effectiveness and superiority of the proposed architecture in battery lifetime prediction tasks.