1. Introduction
Lithium-ion batteries are essential energy storage devices in modern applications. They are valued for their high energy density, long lifespan, and low self-discharge rate, and they are increasingly used in electric vehicles and energy storage systems [1,2]. However, with long-term use, battery performance gradually degrades because of internal chemical reactions and external environmental factors, which shortens battery life and can lead to failure [3,4,5]. As batteries age, thermal runaway can occur: the temperature rises rapidly and may result in combustion, posing significant safety risks [6,7]. Accurate evaluation of the state of health (SOH) of lithium batteries has therefore become key to ensuring their safe and stable operation.
Battery SOH is a key indicator of a battery’s current health, reflecting the performance changes and degradation it undergoes during use [8,9]. SOH is usually expressed as the ratio of the battery’s current maximum available capacity to its initial rated capacity:
$$\mathrm{SOH}_n = \frac{C_n}{C_0}$$
where $C_n$ is the maximum available capacity of the battery at the $n$-th charge–discharge cycle and $C_0$ is its initial rated capacity. As the battery is used, its maximum available capacity gradually decreases, and SOH decreases accordingly.
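The following minimal sketch illustrates the SOH definition above. It converts a sequence of per-cycle maximum available capacities $C_n$ into SOH values. The function name, the capacity values, and the 1.1 Ah rated capacity are hypothetical inputs chosen for the example. They are not data from the paper.

```python
def soh_series(capacities, rated_capacity):
    """Return SOH_n = C_n / C_0 for each cycle's maximum available capacity C_n."""
    if rated_capacity <= 0:
        raise ValueError("rated capacity C_0 must be positive")
    return [c_n / rated_capacity for c_n in capacities]


# Hypothetical maximum available capacities (Ah) at four cycles, C_0 = 1.1 Ah
capacities = [1.10, 1.078, 1.045, 0.88]
print([round(s, 3) for s in soh_series(capacities, 1.1)])  # [1.0, 0.98, 0.95, 0.8]
```

Because SOH is a plain ratio here, it is a dimensionless fraction. This matches the scale of the RMSE and MAE values reported later in the paper, which are on the order of 0.001 to 0.01.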
The battery management system (BMS) is a crucial component of modern battery-powered devices such as electric vehicles, battery energy storage systems, and mobile devices. Its primary responsibilities are to ensure the safety, efficiency, and long-term performance of the battery, and SOH prediction plays a vital role in this [10,11]. By monitoring battery health in real time, the BMS can give early warning when a battery is nearing failure and detect abnormal signs of degradation in time. This reduces the risk of thermal runaway and keeps the battery within its safe operating range. However, implementing SOH prediction in a BMS is challenging. First, SOH is influenced by many factors: environmental conditions such as temperature and humidity, usage patterns such as charge and discharge rates and depth of discharge, material properties, and complex changes in the battery’s internal chemical reactions. Together these make the degradation process highly complex and nonlinear, and therefore difficult to predict. Second, SOH prediction depends on multidimensional data, such as a battery’s charge and discharge history, voltage, current, and temperature. In practical applications, the quality and uncertainty of these data often make accurate prediction difficult. Finally, an SOH prediction model needs robust generalization to remain effective across different battery types, brands, and working conditions.
The degradation of lithium batteries is affected by multiple factors, such as charge–discharge cycling, temperature fluctuations, and material aging. These factors cause a battery to exhibit different degradation characteristics at different stages of use and under different working conditions. It is therefore particularly important to identify the key features related to battery aging. Notably, battery degradation is usually strongly nonlinear. After extended use in particular, the degradation rate tends to accelerate, which makes accurate SOH prediction considerably harder [12,13,14].
Research on lithium battery SOH prediction falls into two categories: physical model-based methods and data-driven methods. Physical model-based methods examine the dynamics of the electrochemical reactions inside the battery by building equivalent circuit models and electrochemical models. Such models typically simulate the battery’s current and voltage responses and its internal chemical reactions to estimate SOH accurately [15,16,17,18,19,20]. For example, Rahman et al. [15] used the particle swarm optimization (PSO) algorithm to identify the key parameters of an electrochemical model of lithium batteries with LiCoO₂ cathodes, generating models for both healthy and degraded batteries. Sung et al. [16] simplified the battery model into a set of differential algebraic equations and solved them in terms of quantities such as potential, lithium-ion concentration, and lithium molar flux. Forman et al. [17] identified parameters such as the anode and cathode equilibrium potentials and the solution conductivity from the battery’s voltage and current cycling data, and used them to build an electrochemical model of a lithium-ion battery. These electrochemical models can describe the battery’s internal reactions with high accuracy and thus predict SOH accurately. However, they also have limitations. They usually rely on prior knowledge of battery electrochemistry and a large amount of empirical data. Building them is also computationally intensive, and the resulting equations are complex and difficult to solve, which makes these models hard to apply in practical scenarios.
With the continuing development of artificial intelligence, data-driven machine learning and deep-learning methods have become an active research direction for lithium battery SOH prediction. These methods analyze measurement data collected from the battery during operation and use machine learning and deep-learning models to extract features and make predictions. Unlike physical model-based methods, they do not rely on prior knowledge of battery electrochemistry. Instead, they extract SOH-related features from data recorded during charging and discharging and use them to predict the battery’s state [21,22,23,24,25,26].

For example, Gong et al. [24] extracted four health indicators from the charging and discharging process, built an LSTM model to map these indicators to battery SOH, and used a particle swarm optimization algorithm to tune the network’s key hyperparameters. Searching the hyperparameters with an optimization algorithm helps the model approach its best achievable performance. However, using only four direct health indicators as input may limit the model’s generalization and the applicability of its features. The approach may also not fully exploit the model architecture, which could hinder its ability to handle more complex or diverse data.

Yang et al. [25] used a convolutional neural network (CNN) to extract health indicators and the SOH change between two consecutive charge–discharge cycles, and combined the CNN with a random forest algorithm to produce the final SOH estimate. Although CNNs have strong feature extraction capabilities, the indicator extraction process in this approach is relatively complex and cumbersome. It is also highly susceptible to noise, which can degrade the model’s stability and robustness. Moreover, the model’s generalization performance has yet to be validated, which may limit its effectiveness and reliability in practice.

Wei et al. [26] extracted seven health indicators from the battery’s reference discharge data and combined a deep neural network (DNN) with a Markov chain model to predict SOH. A nonlinear DNN with multiple hidden layers can effectively extract local features from the data, but it is also prone to fitting random noise in the training set, which can lead to overfitting. To address this, a Markov chain was introduced for error correction, improving prediction accuracy. While this strategy significantly improves model performance, it also substantially increases the demand for computing resources.

These methods have shown excellent predictive capability in experiments. This indicates that extracting typical features from battery capacity fade data and establishing a mapping between these features and SOH are crucial for data-driven SOH prediction. Compared with single-feature extraction, using multiple feature sets and combining features of different scales has proven to be a key technique for improving model performance [27,28]. However, the generalization ability and scope of application of existing models still need further verification and optimization for batteries with different electrochemistries, charge and discharge curves, and ambient temperatures.
To this end, this paper proposes a method based on multidimensional features and a transformer–LSTM fusion model. The method first extracts multiple features from battery charge and discharge cycle data, namely time-domain, frequency-domain, and time-dimension features of the voltage, energy, and temperature curves. The local outlier factor (LOF) algorithm is then used to identify and remove outliers, and the data are smoothed by linear interpolation and a Savitzky–Golay filter to improve data quality. Next, the features most correlated with SOH are selected through correlation analysis as model inputs. To further improve the accuracy and stability of SOH prediction, we construct a transformer–LSTM fusion model. It combines the self-attention mechanism of the transformer with the ability of LSTM to capture long-term dependencies. To verify the effectiveness of the model, we trained and tested it on data from 124 batteries cycled under 72 different fast-charging conditions. The experimental results show that the proposed model achieves superior SOH prediction performance and strong generalization capability. The SOH prediction process is shown in Figure 1.
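The specific features used in the paper are defined in its method section, which is not part of this excerpt. The NumPy sketch below is a hedged illustration of what time-domain, frequency-domain, and time-dimension features of a single per-cycle curve might look like. The function name, the feature names, and the choice of statistics are assumptions, not the paper’s feature list.

```python
import numpy as np


def curve_features(t, y):
    """Illustrative features of one per-cycle curve y(t), e.g. voltage, energy or
    temperature. The feature set is an assumption, not the paper's exact list."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sigma = y.mean(), y.std()
    z = (y - mu) / sigma if sigma > 0 else np.zeros_like(y)
    feats = {
        # time-domain statistics of the curve values
        "mean": float(mu),
        "std": float(sigma),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),
        "peak_to_peak": float(y.max() - y.min()),
    }
    # frequency domain: resample onto a uniform time grid, remove the mean,
    # and take the one-sided FFT power spectrum
    grid = np.linspace(t[0], t[-1], len(t))
    y_uniform = np.interp(grid, t, y)
    power = np.abs(np.fft.rfft(y_uniform - y_uniform.mean())) ** 2
    freqs = np.fft.rfftfreq(len(y_uniform), d=grid[1] - grid[0])
    total = power.sum()
    feats["spectral_energy"] = float(total)
    feats["spectral_centroid"] = float((freqs * power).sum() / total) if total > 0 else 0.0
    feats["dominant_freq"] = float(freqs[np.argmax(power)])
    # time dimension: duration of the recorded segment
    feats["duration"] = float(t[-1] - t[0])
    return feats


# Synthetic example: 200 samples at 20 Hz with a 0.5 Hz ripple around 3.3 V
t = np.arange(200) * 0.05
v = 3.3 + 0.1 * np.sin(2 * np.pi * 0.5 * t)
f = curve_features(t, v)
print(round(f["dominant_freq"], 3), round(f["duration"], 2))  # 0.5 9.95
```

The curve is resampled onto a uniform grid before the FFT. Per-cycle measurements are not guaranteed to be evenly spaced in time, and the discrete Fourier transform assumes that they are.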
The main contributions of this paper are summarized as follows.
A feature extraction method based on battery charge and discharge cycle data is proposed to extract time-domain, frequency-domain, and time-dimension features of the voltage, energy, and temperature curves.
The LOF algorithm, a Savitzky–Golay filter, and Pearson correlation analysis are combined to further improve the quality of the feature dataset.
A transformer–LSTM fusion model is built that uses the self-attention mechanism of the transformer and the long-term dependency capture capability of LSTM to improve the accuracy and stability of SOH prediction.
Experimental results verify that the proposed method predicts SOH accurately, with strong reliability and good generalizability.
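The fusion idea in the third contribution can be sketched in PyTorch, the framework used in the experiments. This is a minimal sketch under stated assumptions, not the paper’s architecture. The following are all hypothetical choices:

- the layer order (linear embedding, transformer encoder, LSTM, linear head);
- the hidden sizes and the number of heads and layers;
- the use of a sliding window of consecutive cycles as the input sequence;
- the class name `TransformerLSTM`.

```python
import torch
from torch import nn


class TransformerLSTM(nn.Module):
    """Hypothetical transformer-LSTM SOH regressor (layout and sizes are assumptions):
    features -> linear embedding -> transformer encoder -> LSTM -> linear head."""

    def __init__(self, n_features, d_model=32, n_heads=4, n_layers=2,
                 lstm_hidden=64, dropout=0.1):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            dropout=dropout, batch_first=True)
        # Self-attention relates every cycle in the window to every other cycle
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # The LSTM then processes the attended sequence in cycle order
        self.lstm = nn.LSTM(d_model, lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, 1)

    def forward(self, x):
        # x: (batch, window of consecutive cycles, n_features)
        h = self.encoder(self.embed(x))
        out, _ = self.lstm(h)
        # Regress the SOH of the last cycle in each window from the final LSTM output
        return self.head(out[:, -1, :]).squeeze(-1)


torch.manual_seed(0)
model = TransformerLSTM(n_features=10)
x = torch.randn(8, 20, 10)  # 8 windows of 20 consecutive cycles, 10 features each
print(model(x).shape)  # torch.Size([8])
```

This sketch adds no explicit positional encoding. It assumes that the LSTM stage after self-attention supplies the order information. Adding a positional encoding before the encoder is a reasonable variant.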
4. Results and Discussion
4.1. Experimental Setup
The MIT battery dataset contains three batches of batteries: the “2017-05-12”, “2017-06-30”, and “2018-04-12” batches. Following the experimental scheme proposed by Severson et al. [29], the dataset is divided into a training set and two test sets. The training set is used for model training and parameter selection and comprises 20 batteries from the first batch and 21 from the second. The test sets are used to evaluate model performance. The primary test set contains 21 batteries from the first batch and 22 from the second, and the secondary test set contains 40 batteries from the third batch. The first two batches date from the same year and make up the training set and the primary test set, whereas the third batch was tested the following year and serves as the secondary test set. In the experiments, the results on the primary test set are used to verify the model’s accuracy, while the results on the secondary test set better demonstrate its generalization ability.
Based on the transformer–LSTM fusion model, we conducted three SOH prediction experiments. In the first experiment, the degradation feature data of a single battery were divided into training, validation, and test sets in a ratio of 4:1:5, so that the model was trained on the first half of the data to predict the SOH of the second half. In the second experiment, the model was trained on data from the 41 batteries in the training set, with data from one battery used as the validation set. It then predicted SOH on the two test sets, containing 43 and 40 batteries, respectively. The third experiment compared the SOH prediction performance of several models (LSTM, CNN–LSTM, and the transformer–LSTM fusion model). Together, these experiments assess the predictive performance of the proposed model under diverse conditions. The experiments were implemented in Python 3.8 using toolkits such as PyTorch 2.3.1, Scikit-learn 1.3.2, and Pandas 2.0.3.
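The chronological 4:1:5 split used in the same-battery experiment can be sketched as follows. The sketch assumes that the validation block directly follows the training block along the cycle axis. It also assumes that samples are not shuffled, so the test cycles lie strictly after those used for training. The function name is hypothetical.

```python
def chronological_split(samples, ratios=(4, 1, 5)):
    """Split one battery's cycle-ordered samples into train/val/test blocks
    in the given ratio, preserving cycle order (no shuffling)."""
    total = sum(ratios)
    n = len(samples)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # remainder goes to the test block
    return train, val, test


train, val, test = chronological_split(list(range(463)))  # e.g. a 463-cycle battery
print(len(train), len(val), len(test))  # 185 46 232
```

Keeping the split chronological matters here. A shuffled split would leak late-life cycles into training and hide exactly the accelerated-decline phase that the experiment is meant to test.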
4.2. Evaluation Indicators
We used the root mean square error (RMSE) and the mean absolute error (MAE) as evaluation indicators to quantify the error between the model’s predictions and the actual observations. They are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
where $N$ is the number of samples, $y_i$ is the actual SOH, and $\hat{y}_i$ is the predicted SOH. The smaller the RMSE and MAE, the higher the model’s prediction accuracy.
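The two metrics defined above translate directly into code. This plain-Python sketch computes RMSE and MAE for paired lists of actual and predicted SOH values. The function names are illustrative, and the numbers in the usage lines are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted SOH."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted SOH."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


actual = [1.00, 0.90, 0.80]
predicted = [0.99, 0.91, 0.82]
print(round(rmse(actual, predicted), 5), round(mae(actual, predicted), 5))  # 0.01414 0.01333
```

RMSE squares the errors, so it weights large deviations more heavily than MAE and is always at least as large. A gap between the two indicates a few large errors rather than uniformly small ones.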
4.3. Prediction on the Same Battery
Predicting the SOH of the same battery poses two main challenges. First, the selected features must be closely related to SOH. Second, battery capacity usually declines slowly in the early stage, and as usage accumulates, SOH enters a phase of accelerated decline along with the capacity. The selected features must therefore accurately reflect the long-term trend of the battery’s health. In addition, the prediction model, trained only on the early, slow-degradation data, must still adapt to the accelerated-decline phase to predict SOH accurately.
We selected two batteries from each of the “2017-05-12”, “2017-06-30”, and “2018-04-12” batches for same-battery SOH prediction: channels 18 and 20 from the “2017-05-12” batch, channels 8 and 48 from the “2017-06-30” batch, and channels 17 and 42 from the “2018-04-12” batch. For each battery, the first half of the data was used for training and the second half for testing, a 1:1 ratio. In same-battery SOH prediction, a small number of representative features often yields better predictions than a larger feature set. However, with too few features the prediction becomes vulnerable to noise, which reduces accuracy. In the end, three key features were selected for the “2017-05-12” and “2018-04-12” batches, and five for the “2017-06-30” batch.
The same-battery SOH prediction results are shown in Figure 10. The true SOH (blue curve) decays slowly over the first half of each battery’s data, and the decline accelerates markedly in the second half. When the first-half data are used to predict the second half, the predicted values (red curve) largely follow the trend of the true values, although a noticeable deviation appears toward the end of the prediction horizon. As the evaluation metrics in Table 1 show, the RMSE values of the six batteries range from 0.00205 to 0.00984 and the MAE values from 0.00163 to 0.00621. Although the prediction errors differ across batches, the model achieves high accuracy overall. In particular, the battery on channel 42 of the “2018-04-12” batch has a very small prediction error, consistent with Figure 10. These results indicate that the selected features, combined with the transformer–LSTM model, accurately capture the pattern of SOH change, confirming the effectiveness of both the features and the model for battery SOH prediction.
4.4. Prediction on Different Batteries
Predicting the SOH of different batteries poses two main challenges: the effectiveness of feature selection and the generalization capability of the model. First, whether the selected features effectively reflect changes in SOH directly affects prediction accuracy. The features must therefore correlate strongly with the battery’s degradation process and be representative enough to capture its SOH dynamics. Second, generalization is critical. Factors such as the operating environment and cycling conditions can make batteries degrade differently, so the prediction model must adapt to the characteristics of different batteries and accurately infer their SOH trends.
For different batteries, we used the training set and the two test sets described in Section 4.1. The training set contains the complete feature data of 41 batteries from the first two batches. The test data are divided into a primary test set (43 batteries) and a secondary test set (40 batteries). To keep the features representative without adding unnecessary computational burden, we predicted battery SOH from 10 features. Six batteries from the two test sets were randomly selected for performance evaluation: channels 27 and 36 from the “2017-05-12” batch, channels 11 and 17 from the “2017-06-30” batch, and channels 20 and 30 from the “2018-04-12” batch.
The SOH prediction results for different batteries are shown in Figure 11. Although the cycle lives of these batteries range from 463 to 874 cycles, their SOH curves follow similar decay patterns, and the model predicts them well on the test sets. The predicted values (red curve) closely match the true values (blue curve), and the model accurately captures SOH changes even in the critical decay stage. For the battery on channel 36 of the “2017-05-12” batch and the battery on channel 20 of the “2018-04-12” batch, both RMSE and MAE are below 0.002, as shown in Table 2. This indicates very low prediction errors and high-precision prediction on these batteries. For the battery on channel 17 of the “2017-06-30” batch, the figure shows that the SOH curve is affected by noise, but the model still accurately reflects the battery’s overall decay trend. The battery on channel 30 of the “2018-04-12” batch differs from the others in that its SOH follows a consistent downward trend throughout its life. The predictions deviate somewhat from the true values at the beginning and end of the data, but the overall agreement remains good. These results show that the model performs stably across batteries and can handle the complexity and variability of battery SOH prediction.
4.5. Comparative Experiments
To verify the effectiveness of the proposed method, we compared the SOH prediction performance of three models: LSTM, CNN–LSTM, and transformer–LSTM. The battery on channel 44 of the “2017-05-12” batch and the battery on channel 48 of the “2018-04-12” batch were used for testing. The prediction results are shown in Figure 12, and the accuracy metrics (RMSE and MAE) of each model are listed in Table 3. The proposed method achieves markedly lower prediction errors than the other models. The LSTM model is adept at capturing long-term dependencies, but its prediction accuracy falls short. The CNN–LSTM model has an advantage in extracting local spatial features from the input data, yet the transformer–LSTM model outperforms both in prediction accuracy and generalization capability.
4.6. Generalization Performance
In Section 4.3, each battery reaches end of life after a different number of cycles, so the sizes of the training and test sets differ from battery to battery. This provides a partial test of the model’s generalization capability, and despite these differences the model predicts each battery’s SOH changes with reasonable accuracy. In Section 4.4, the training set is built from batteries of different batches, which further improves the generalization ability of the model. In particular, the “2018-04-12” batch, tested about a year after the other two batches and consisting of different batteries, is used as the secondary test set. Table 4 lists the average RMSE and MAE of the model’s predictions on the two test sets. These results verify the adaptability and robustness of the model across different batteries. They also show that the transformer–LSTM model predicts stably across different batteries and usage cycles.
4.7. Ablation Experiment
We investigated the noise level in the dataset and its effect on SOH prediction accuracy and model robustness. As Figure 4 shows, the dataset contains some clearly anomalous outliers, some of which lie far from the overall data curve. If curves containing these outliers are normalized and correlated directly, the resulting features cannot effectively characterize SOH. Experiments showed notable differences between the features selected after noise removal and those selected from the non-denoised data. The features selected from the non-denoised data retained some ability to characterize SOH, but their prediction performance was considerably worse, as shown in Figure 13. Compared with the denoised case, the prediction error is larger, the prediction curve is less smooth, and abnormal predicted values appear. These findings underscore the substantial negative impact of noise on prediction performance and show that denoising significantly improves the model’s prediction accuracy.
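The denoising step evaluated in this ablation can be sketched with scikit-learn’s `LocalOutlierFactor`, NumPy’s `interp`, and SciPy’s `savgol_filter`. The step consists of LOF outlier removal, linear interpolation of the removed points, and Savitzky–Golay smoothing, as outlined in the introduction. The following are assumptions for illustration, not the paper’s settings:

- applying LOF to standardized (cycle, value) pairs;
- the neighbour count of 20;
- the filter window of 11 and the polynomial order of 2;
- the function name `denoise_feature`.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.neighbors import LocalOutlierFactor


def _standardize(a):
    std = a.std()
    return (a - a.mean()) / std if std > 0 else a - a.mean()


def denoise_feature(cycles, values, n_neighbors=20, window=11, polyorder=2):
    """Remove LOF outliers from one feature-vs-cycle curve, refill them by linear
    interpolation, then apply Savitzky-Golay smoothing.
    Returns (smoothed values, boolean mask of the points LOF removed)."""
    cycles = np.asarray(cycles, dtype=float)
    values = np.asarray(values, dtype=float)
    # Standardize both axes so the cycle index does not dominate LOF distances
    points = np.column_stack([_standardize(cycles), _standardize(values)])
    # fit_predict labels outliers -1 and inliers 1
    inlier = LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(points) == 1
    # Refill removed points by linear interpolation between the remaining inliers
    filled = np.interp(cycles, cycles[inlier], values[inlier])
    return savgol_filter(filled, window_length=window, polyorder=polyorder), ~inlier


# Synthetic feature: linear decay with two injected spikes
cycles = np.arange(200)
clean = 1.0 - 0.001 * cycles
noisy = clean.copy()
noisy[50] += 0.2
noisy[120] -= 0.2
smoothed, removed = denoise_feature(cycles, noisy)
print(np.flatnonzero(removed))  # [ 50 120]
```

Interpolating before smoothing keeps the curve on its original cycle grid. Without that step, the Savitzky–Golay filter, which assumes evenly spaced samples, would be applied across gaps.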
5. Conclusions
This paper proposes a method for predicting battery SOH based on multidimensional features and a transformer–LSTM fusion model, aiming to enhance the accuracy and generalizability of SOH prediction. The key findings reveal that by extracting features from the time domain, frequency domain, and time dimension of the voltage, energy and temperature curves, the trend of battery degradation can be effectively captured. Furthermore, denoising the feature data significantly improves prediction accuracy and reduces the negative impact of noise on model performance. By integrating these optimized features with the transformer–LSTM fusion model for prediction, the experimental results demonstrate that the proposed model excels in both prediction accuracy and generalization ability, thus verifying its efficacy and practicality in SOH prediction.
Looking ahead, future work will focus on implementing and optimizing SOH prediction in real application scenarios. Compared with laboratory conditions, the operating conditions of batteries in the field are considerably more complex and variable. They are influenced by unstable external factors such as fluctuations in ambient temperature, differences in user habits, and the use of diverse charging equipment, all of which make accurate SOH prediction harder. To implement SOH prediction effectively in real-world applications, it will therefore be crucial to adapt and optimize the model using actual battery operating data.