1. Introduction
Compared to traditional batteries such as lead-acid or nickel-metal hydride (NiMH) batteries, lithium-ion batteries offer high energy density [1], high voltage [2], long lifespan [3], and a low self-discharge rate [4], and they are widely used in high-power energy storage applications such as communications and aerospace. In addition, lithium-ion batteries have become a key energy storage technology for electric vehicle power supply [5], and battery management systems (BMSs) can effectively monitor the power supply of the equipment, maintain the normal operation of power devices, and prevent accidents. One of the main functions of a BMS is to provide accurate information about the internal state of the battery, such as its State of Health (SOH) [6], State of Charge (SOC) [7], and State of Power (SOP) [8]. The SOH characterizes the health state of the battery, typically represented by capacity or power loss. As the battery is continuously used, the SOH gradually decreases, and the overall performance of the battery deteriorates, characterized by capacity fading and increased internal resistance [9]. Since the SOH cannot be directly measured, accurately estimating the SOH is crucial [10,11,12]. The SOH represents the battery’s ability to store electrical energy and is generally expressed as a percentage. If the SOH of a battery falls below 80%, the battery is usually considered to have reached the end of its useful life.
Currently, SOH estimation methods can be broadly categorized into three groups: direct measurement methods, model-based methods, and data-driven methods.
(1) Direct Measurement Methods: These methods include various techniques used to estimate the SOH of lithium-ion batteries [13]. Coulomb counting, as a basic estimation technique, is simple to implement, but its accuracy is largely dependent on the battery’s initial charge state, and it often fails to fully account for dynamic factors such as temperature and internal resistance changes during charging and discharging, which may lead to cumulative errors [14]. In contrast, the Open Circuit Voltage (OCV) method performs well in terms of accuracy and is relatively simple to operate. However, it requires the battery to rest for a long time before an accurate measurement can be taken, which significantly limits its real-time applicability in practical scenarios [15]. The internal resistance method estimates the SOH by measuring the internal resistance of the battery. Although this method is theoretically feasible, its accuracy is often compromised in practical applications by measurement noise and the limited precision of the measurement technique [16,17].
(2) Model-Based Methods: In the model-based estimation of the SOH for lithium-ion batteries, a variety of models are used. First, empirical models combine mathematical theory with practical experience to construct empirical or semi-empirical models based on the collection and analysis of experimental data [18]. However, the accuracy of this method is directly influenced by the completeness and accuracy of the available data [19]. Second, impedance models estimate the SOH by measuring the impedance spectrum and its correlation with the SOH. While this method requires a deep understanding of electrochemical reactions and is costly, it has been proven effective [20]. Finally, equivalent circuit models represent the electrochemical processes inside the battery using electrical components such as resistors and capacitors. While this method can reflect the actual operating conditions of the battery, it requires the construction of a complex circuit model and extensive computation to determine the model parameters [21].
(3) Data-Driven Methods: Data-driven methods have gained significant importance in the estimation of the SOH for lithium-ion batteries. These methods involve techniques such as Gaussian process regression [22], genetic algorithms [23], Kalman filtering [24], support vector machines (SVMs) [25], and artificial neural networks (ANNs) [26,27]. Gaussian process regression relies on a Gaussian process prior to analyze the collected data and estimate the SOH, with its estimation accuracy being directly influenced by the distribution characteristics of the actual data. Genetic algorithms simulate the natural evolution process to find the optimal solution, and while they can estimate the SOH, their results are significantly affected by the initial population selection, introducing some uncertainty. Kalman filtering estimates the optimal value of the current state based on historical data and current observations, but this method demands an accurate model and substantial data processing capability. SVMs estimate the SOH by using a small number of data points as support vectors and minimizing structural risk, though they are sensitive to noise, and their computational process is relatively complex. Finally, ANNs simulate biological neural network operations and adjust internal weights to estimate the SOH. However, this method requires a large amount of experimental data, making data acquisition costly. In summary, each of the above approaches has its own strengths and weaknesses for estimating the SOH of lithium-ion batteries.
In addition, combining models to predict the SOH can further improve accuracy. Sun et al. [28] proposed an SOH prediction method for lead-acid batteries based on the CNN-BiLSTM-Attention model. This model uses a convolutional neural network (CNN) to extract features and reduce the dimensionality of the input factors, which are then used as inputs to the bidirectional long short-term memory network (BiLSTM). By introducing the attention mechanism, the model places more focus on the key features in the input sequence that significantly impact the output results, ultimately achieving multi-step prediction of the battery’s SOH. Jia et al. [29] proposed a multi-scale remaining useful life (RUL) and SOH prediction method that combines the wavelet neural network (WNN) with the unscented particle filter (UPF) model. Through discrete wavelet transform (DWT), the capacity degradation data of lithium-ion batteries are decomposed into low-frequency degradation trends and high-frequency fluctuation components. Based on the WNN-UPF model, the long-term RUL of lithium-ion batteries is predicted using the low-frequency degradation trend data. The high-frequency fluctuation data and RUL prediction results are then integrated to estimate the short-term SOH of lithium-ion batteries.
The combination of XGBoost and ARIMA models offers advantages in predicting the SOH of lithium-ion batteries, particularly in terms of complementarity, computational efficiency, and resistance to overfitting. XGBoost, an ensemble learning method based on gradient-boosted trees, excels at handling complex nonlinear relationships. By iteratively adding trees that correct the errors of the preceding ones, it captures the evolving patterns of battery performance over time. However, XGBoost has limitations in handling the long-term trends and short-term fluctuations inherent in time series data. ARIMA, a classic time series analysis method, compensates for these shortcomings: it specializes in capturing time dependencies and modeling long-term trends and periodic fluctuations, which improves the stability and accuracy of SOH prediction. The XGBoost–ARIMA hybrid model not only outperforms individual models in terms of accuracy but also demonstrates strong anti-overfitting abilities, reducing both bias and variance. XGBoost itself has high computational efficiency, enabling fast processing of large datasets, while ARIMA processes time series characteristics with relatively low computational cost, so the combination keeps the overall time cost low. Therefore, in terms of accuracy, stability, generalization, and computational efficiency, the XGBoost–ARIMA model is well suited to SOH prediction in real-world battery management systems.
To address the challenges and limitations of traditional methods for estimating the SOH of lithium-ion batteries, we propose a novel intelligent estimation approach. First, by utilizing matplotlib for data visualization, this method allows for an intuitive tracking of the battery’s performance over time, enabling the identification of critical features essential for SOH estimation. These features are crucial in precisely capturing signs of battery aging. Subsequently, an XGBoost-based model is constructed, which leverages these key features to accurately estimate the SOH of the battery. To further optimize the estimation results, the ARIMA model is introduced to correct the XGBoost predictions. Given its excellent ability to process time series data, the ARIMA model is adept at capturing long-term trends and periodic fluctuations in the data, thereby refining the estimation and ensuring it aligns more closely with actual battery conditions. The primary contributions of this paper are summarized as follows:
(1) Innovative Integration of Data Visualization and Feature Extraction: By using matplotlib for data visualization, our method provides an intuitive way to track the battery’s performance over time. This approach not only allows for better monitoring of battery health but also effectively identifies key features crucial for accurate SOH estimation, improving both feature extraction and estimation relevance.
(2) XGBoost and ARIMA Joint Optimization Approach: We introduce a unique method that combines the XGBoost algorithm with the ARIMA model. XGBoost ensures robust SOH estimation by leveraging key features, while ARIMA optimizes the predictions by adjusting for time series trends and periodic fluctuations. This combined approach enhances the final estimate’s precision, effectively overcoming the limitations of single-model methods.
(3) Automated Feature Extraction and Anti-Overfitting: Our method features automated feature extraction, minimizing manual input and improving both efficiency and reliability in the estimation process. Furthermore, the models employed exhibit strong anti-overfitting capabilities and excellent generalization performance, ensuring high prediction accuracy across different battery types and operational environments. This makes the method highly applicable and dependable in real-world scenarios.
The remainder of this paper is organized as follows:
Section 2 introduces the XGBoost and ARIMA algorithms;
Section 3 presents the lithium-ion battery SOH prediction model based on the XGBoost–ARIMA algorithm;
Section 4 provides experimental simulations and result analysis; and
Section 5 concludes the paper and outlines directions for future work.
4. Experimental Results and Analysis
The data used in this experiment were obtained from the NASA Ames Research Center [36]. Under steady-state discharge conditions, datasets for four lithium-ion batteries (B0005, B0006, B0007, and B0018) were collected at room temperature through three different operations: charging, discharging, and impedance measurement.
For charging, the batteries were charged in a constant current (CC) mode at 1.5 A until the battery voltage reached 4.2 V, after which the charging continued in a constant voltage (CV) mode until the charging current dropped to 20 mA. For discharging, a constant current of 2 A was applied until the battery voltages for B0005, B0006, B0007, and B0018 dropped to 2.7 V, 2.5 V, 2.2 V, and 2.5 V, respectively. Impedance measurements were taken through electrochemical impedance spectroscopy (EIS), with frequency scans from 0.1 Hz to 5 kHz. Repeated charging and discharging cycles lead to accelerated aging of the batteries, while impedance measurements provide deeper insights into the internal parameter changes during battery aging. The experiment ended when the battery reached its End of Life (EOL) criterion, defined as a 30% reduction in rated capacity (from 2.0 Ah to 1.4 Ah). This dataset is valuable for predicting the remaining capacity (for a given discharge cycle) and the RUL of the batteries.
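The end-of-life criterion can be checked with a short calculation. The sketch below assumes the common capacity-based definition of SOH (current capacity divided by rated capacity); the function and constant names are illustrative and are not taken from the dataset.

```python
RATED_CAPACITY_AH = 2.0  # rated capacity stated for these cells
EOL_CAPACITY_AH = 1.4    # end-of-life capacity (30% fade from rated)


def soh_percent(capacity_ah, rated_ah=RATED_CAPACITY_AH):
    """Capacity-based SOH in percent (assumed definition)."""
    if rated_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return 100.0 * capacity_ah / rated_ah


def reached_eol(capacity_ah, eol_ah=EOL_CAPACITY_AH):
    """True once the measured capacity has faded to the EOL threshold."""
    return capacity_ah <= eol_ah
```

Under this definition, the 1.4 Ah threshold corresponds to an SOH of about 70%.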
Figure 5 shows the discharge data (voltage, current, temperature, and SOH) for batteries B0005, B0006, and B0007. To verify the generalizability of the XGBoost algorithm for SOH estimation, the learning rate was set to 0.2, the minimum leaf weight was set to 1, and the tree depth was set to 3 (experimental results indicate that this model converges). Two sets of experiments were conducted: In the first set, the discharge data of B0006 and B0007 were used as the training dataset for the model, while the discharge data of B0005 were used as the test dataset to evaluate the model’s performance. In the second set, the discharge data of B0005 and B0007 were used as the training dataset, and the discharge data of B0006 were used as the test dataset to evaluate the model’s performance.
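The two experiments amount to holding out one battery at a time. The following is a minimal sketch of that setup; the helper `make_split` is hypothetical, and mapping "minimum leaf weight" to XGBoost's `min_child_weight` parameter is an assumption.

```python
# Hyperparameters reported in the text, keyed by XGBoost parameter names.
# Assumption: "minimum leaf weight" corresponds to `min_child_weight`.
XGB_PARAMS = {
    "learning_rate": 0.2,
    "min_child_weight": 1,
    "max_depth": 3,
}

BATTERIES = ("B0005", "B0006", "B0007")


def make_split(test_battery, batteries=BATTERIES):
    """Hold out one battery for testing and train on the others."""
    if test_battery not in batteries:
        raise ValueError(f"unknown battery: {test_battery}")
    train = [b for b in batteries if b != test_battery]
    return {"train": train, "test": [test_battery]}


# Experiment 1 tests on B0005; experiment 2 tests on B0006.
EXPERIMENTS = [make_split("B0005"), make_split("B0006")]
```

With the `xgboost` package installed, these settings could be passed as `xgboost.XGBRegressor(**XGB_PARAMS)`; the block itself does not import the library.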
4.1. Technical Indicators
In this study, three key performance metrics were used to evaluate the model’s ability to predict the SOH of lithium-ion batteries: Mean Absolute Error (MAE), Root Mean Squared Percentage Error (RMSPE), and Maximum Error. These metrics were selected due to their complementary ability to provide a comprehensive assessment of model accuracy, robustness, and performance across various conditions. Each of these metrics offers unique insights into the prediction capabilities of the model, helping to assess different aspects of prediction performance, from average error to worst-case error.
The MAE measures the average magnitude of errors in the predictions, disregarding the direction of the error. It is calculated by averaging the absolute differences between the true values and the predicted values:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $y_i$ represents the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples. The MAE provides a straightforward, interpretable measure of the average prediction error, making it particularly useful for evaluating the accuracy of SOH predictions. Smaller MAE values indicate better prediction accuracy on average.
The RMSPE quantifies the error as a percentage of the true value, making it sensitive to large errors due to its squared nature. It is defined as
$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{y_i}\right)^{2}} \times 100\%$$
where $y_i$ and $\hat{y}_i$ are the true and predicted values, respectively. The RMSPE is useful in SOH prediction because it reflects both the magnitude and the relative error, offering a better understanding of model performance across different scales of data. A lower RMSPE indicates that the model is consistently accurate, even when the magnitude of the data varies.
Lastly, the Maximum Error measures the largest deviation between the true and predicted values in the dataset:
$$\mathrm{Maximum\ Error} = \max_{1 \le i \le n}\left|y_i - \hat{y}_i\right|$$
This metric highlights the worst-case scenario, providing insight into the largest single error that the model produces. A smaller Maximum Error indicates that the model is unlikely to produce large, problematic deviations, which is crucial in applications such as battery management systems, where even a small error can have significant implications. Together, these three metrics—MAE, RMSPE, and Maximum Error—offer a comprehensive evaluation of the model’s performance. The lower the values of these metrics, the better the model’s performance in predicting the SOH. A small MAE suggests that the model’s predictions are close to the true values on average, a low RMSPE indicates that the model performs well across varying data scales, and a small Maximum Error ensures that the model does not make large, unmanageable errors. These metrics are essential for ensuring the accuracy and reliability of SOH predictions in practical applications.
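The three metrics described above can be computed as in the following sketch. The function names are illustrative, and reporting the RMSPE in percent is an assumption based on its description as a percentage of the true value.

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmspe(y_true, y_pred):
    """Root mean squared percentage error, in percent (true values nonzero)."""
    _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("RMSPE is undefined when a true value is zero")
    mean_sq = sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return 100.0 * math.sqrt(mean_sq)


def max_error(y_true, y_pred):
    """Largest absolute deviation between true and predicted values."""
    _check(y_true, y_pred)
    return max(abs(t - p) for t, p in zip(y_true, y_pred))
```

Because the RMSPE divides by the true value, it is only meaningful when the true SOH values are nonzero, which holds for the capacity range considered here.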
4.2. Analysis of Experimental Results
To validate the accuracy of the lithium-ion battery SOH estimation method based on the XGBoost algorithm, the predicted results are compared with those obtained from other widely used predictive models, including RF, LR, KNN, and SVM. As shown in Figure 6, this comparison evaluates the performance of the XGBoost-based model relative to these alternative methods. Examining the predictive accuracy of multiple models allows the effectiveness and robustness of the XGBoost algorithm in estimating the SOH to be assessed.
Table 1 provides a comparative analysis of the error results for XGBoost and four other regression algorithms applied to the B0005 and B0006 datasets, evaluated using three performance metrics: MAE, RMSPE, and Maximum Error. As shown in Table 1, XGBoost consistently achieved lower error values than the other algorithms on all three metrics. This indicates that XGBoost offers superior estimation accuracy and delivers more precise predictions of the SOH than the other four regression algorithms, which is crucial for battery management systems.
As shown in Figure 7, the visual comparison of the MAE, RMSPE, and Maximum Error from Table 1 reveals that XGBoost outperformed the other four regression algorithms on both the B0005 and B0006 datasets. XGBoost achieved an error range of approximately ±0.4%, showing higher estimation accuracy on all three performance metrics. This result highlights XGBoost’s ability to maintain high precision while controlling errors, and it indicates that XGBoost is a reliable and effective algorithm for SOH estimation of lithium-ion batteries on both datasets.
The discharge data of lithium-ion batteries are recorded sequentially, with the SOH continuously changing over time. As a result, the residuals between the predicted and actual values from the XGBoost model form a time series. To assess the stationarity of this residual time series, a unit root test (the augmented Dickey–Fuller, or ADF, test) was conducted [37], as shown in Table 2. The null hypothesis of the test is that the residual time series has a unit root; if the test statistic is smaller than the critical value at the 1% significance level, the null hypothesis is rejected, indicating that the time series is stationary. In both the B0005 and B0006 datasets, the test statistics are well below the critical value at the 1% level, and the p-values are well below 0.05. This confirms that the residual time series of XGBoost is stationary in both datasets. Stationary residuals imply that the XGBoost prediction errors do not exhibit systematic trends and that no differencing (d = 0) is needed when the residuals are modeled with ARIMA.
Figure 8 and Figure 9 illustrate the confidence interval, with the shaded area representing boundaries set by twice the standard deviation of the correlation coefficient. As seen in these figures, there exists a clear correlation between the residual at the current moment and historical residuals, which suggests a temporal dependency within the residual time series. This correlation indicates that by identifying patterns in past residual data, it is possible to forecast future residuals. Such findings underscore the predictive nature of residuals, offering a promising avenue to improve the accuracy of the model. The ability to predict future residuals based on historical data can significantly enhance the reliability of the SOH predictions for lithium-ion batteries, contributing to more accurate battery health management. Through analysis, it can be concluded that ARIMA(3, 0, 1) and ARIMA(2, 0, 1) are the optimal parameter settings for the XGBoost residual time series of the B0005 and B0006 datasets, respectively.
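The kind of correlation check described here can be sketched as follows, assuming the figures are autocorrelation plots of the residual series. The band uses the standard white-noise approximation, in which the standard error of each sample autocorrelation is roughly 1/sqrt(n); the function names are illustrative.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    if n < 2 or max_lag >= n:
        raise ValueError("series too short for the requested lag")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant")
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Twice the approximate standard error of the autocorrelation."""
    return 2.0 / math.sqrt(n)


def significant_lags(series, max_lag):
    """Lags (>= 1) whose autocorrelation falls outside the band."""
    bound = significance_bound(len(series))
    acf = autocorrelation(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]
```

Lags returned by `significant_lags` are the ones that would fall outside the shaded area in such a plot and are therefore candidates for the autoregressive and moving-average orders.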
In order to further improve the reliability and accuracy of SOH prediction, this study employed a lithium-ion battery SOH estimation method based on the XGBoost–ARIMA combined model applied to two different datasets. First, the XGBoost model was used to compute the predicted values of SOH. Then, the residuals between the predicted SOH values and the actual values were treated as a new time series, which was input into the ARIMA model to correct the residuals, resulting in a more accurate SOH estimate. As shown in Figure 10, the results demonstrate that this method significantly improves the accuracy and reliability of SOH predictions. This combined approach leverages the strength of XGBoost in handling large datasets and the time series correction capability of ARIMA, offering a robust solution for predicting the health state of lithium-ion batteries.
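The residual-correction idea can be illustrated with a deliberately simplified sketch. Instead of a full ARIMA model, it fits a first-order autoregressive model, AR(1), to the residuals by least squares; this is a stand-in chosen to keep the example dependency-free and is not the configuration used in the experiments. Residuals are assumed to be defined as the actual value minus the base prediction, and all names are hypothetical.

```python
def fit_ar1(residuals):
    """Least-squares AR(1) coefficient without intercept: e[t] ~ phi * e[t-1]."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals[:-1])
    if den == 0:
        raise ValueError("residuals carry no information")
    return num / den


def corrected_prediction(base_pred, last_residual, phi):
    """Add the one-step-ahead residual forecast to the base prediction."""
    return base_pred + phi * last_residual


# Example: past residuals (actual - predicted) of a base model.
past_residuals = [0.04, 0.02, 0.01, 0.005]
phi = fit_ar1(past_residuals)
next_soh = corrected_prediction(0.80, past_residuals[-1], phi)
```

In the combined model, the base prediction would come from XGBoost and the residual forecast from the fitted ARIMA model; the structure of the correction step is the same.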
Figure 11 shows a comparison of the errors between the XGBoost–ARIMA and XGBoost predictions across different datasets. From the comparison of error results, it is evident that the errors produced by the XGBoost–ARIMA predictions fluctuate more closely around zero, with a smaller range compared to XGBoost in both datasets. This suggests that the SOH predictions made by the XGBoost–ARIMA model are closer to the actual values, and the XGBoost–ARIMA model demonstrates better stability in SOH estimation over long time series compared to XGBoost. The improved stability of the XGBoost–ARIMA model highlights its suitability for long-term SOH predictions, where stability and accurate error reduction are crucial for ensuring reliable battery health management over time.
5. Conclusions
This paper presents an intelligent estimation method for the SOH of lithium-ion batteries based on the XGBoost–ARIMA combined model. By analyzing the key features in the battery discharge process, it was found that voltage difference, temperature difference, and average voltage can effectively reflect the aging characteristics of the battery. Therefore, these three features were selected as input variables for SOH estimation. Then, the XGBoost model was used to predict the SOH, learning the underlying patterns of battery health from the input features. To further improve the prediction accuracy, the ARIMA model was introduced to correct the residuals of the XGBoost predictions, optimizing the final SOH estimate. To validate the effectiveness of this method, the proposed model was compared with four other mainstream regression algorithms: RF, KNN, LR, and SVM. The experimental results show that the proposed model outperformed the other four regression algorithms in terms of estimation accuracy on both the B0005 and B0006 datasets. Specifically, the XGBoost–ARIMA model is able to control the SOH prediction error within a smaller range and exhibits higher stability in long-term sequence predictions. In conclusion, the XGBoost–ARIMA combined model provides an efficient and reliable solution for lithium-ion battery health management.
Although our approach has shown promising results in a controlled experimental environment, there are some limitations to be addressed. Firstly, the model currently relies on specific datasets (such as the NASA dataset), which may limit its generalizability to other types of batteries or different application scenarios. Therefore, future research could validate the method on more diverse battery datasets to assess its applicability across various battery types and operating conditions. Secondly, although the XGBoost–ARIMA model performs well in terms of accuracy, its adaptability to complex battery characteristics (e.g., internal resistance, cycle count, etc.) and different aging conditions may present challenges. As such, we plan to further optimize feature selection by incorporating additional battery features, which could improve the model’s robustness and generalization ability across diverse use cases. Furthermore, the accuracy of the model may be affected by data quality issues in practical applications. Future work could address this by integrating data enhancement and online learning methods to improve data quality and enhance model stability.
Although this paper demonstrates good accuracy in a controlled experimental setting, applying this method in real-world battery management systems presents a number of challenges. First, battery management systems typically require real-time monitoring and quick decision making. Hence, improving the model’s computational efficiency and real-time responsiveness, while ensuring high accuracy, is a critical challenge. Additionally, real-world applications often face data quality issues, such as sensor failures or missing data, which can impact the prediction accuracy. Future work could explore techniques like data imputation and augmentation to improve data quality. Furthermore, efficiently deploying the model in embedded devices or industrial-scale battery management systems remains a technical challenge, including limitations in hardware resources and energy consumption. As a result, we plan to focus on improving the computational efficiency and deployability of the model in future research, combining deep learning techniques with traditional statistical methods to enhance prediction accuracy and address real-time concerns.