Stage-Wise SOH Prediction Using an Improved Random Forest Regression Algorithm

Xiao, Wei; Jia, Jun; Gao, Wensheng; Li, Haibo; Xu, Hong; Zhong, Weidong; He, Ke

doi:10.3390/electronics15020287



by Wei Xiao ¹, Jun Jia ²,*, Wensheng Gao ¹, Haibo Li ², Hong Xu ³, Weidong Zhong ² and Ke He ²

¹ Department of Electrical Engineering, Tsinghua University, Beijing 100084, China

² Tsinghua Sichuan Energy Internet Research Institute, Chengdu 610042, China

³ School of Control Science and Engineering, Shandong University, Jinan 250061, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(2), 287; https://doi.org/10.3390/electronics15020287

Submission received: 28 November 2025 / Revised: 22 December 2025 / Accepted: 6 January 2026 / Published: 8 January 2026

(This article belongs to the Special Issue Advanced Control and AI Methods for Future Battery Diagnostics and Prognostics)


Abstract

In complex energy storage operating scenarios, batteries seldom undergo complete charge–discharge cycles required for periodic capacity calibration. Methods based on accelerated aging experiments can indicate possible aging paths; however, due to uncertainties like changing operating conditions, environmental variations, and manufacturing inconsistencies, the degradation information obtained from such experiments may not be applicable to the entire lifecycle. To address this, we developed a stage-wise state-of-health (SOH) prediction approach that combined offline training with online updating. During the offline training phase, multiple single-cell experiments were conducted under various combinations of depth of discharge (DOD) and C-rate. Multi-dimensional health features (HFs) were extracted, and an accelerated aging probability $p_{AA}$ was defined. Based on the correlation statistics between HFs, $k_{HF}$, the SOH, and $p_{AA}$, all cells in the dataset were divided into general early, middle, and late aging stages. For each stage, cells were further classified by their longevity (long, medium, and short), and multiple models were trained offline for each category. The results show that models trained on cells following similar aging paths achieve significantly better performance than a model trained on all data combined. Meanwhile, HF optimization was performed via a three-step process: an initial screening based on expert knowledge, a second screening using Spearman correlation coefficients, and an automatic feature importance ranking using a random forest regression (RFR) model. The proposed method is innovative in the following ways: (1) The stage-wise multi-model strategy significantly improves the SOH prediction accuracy across the entire lifecycle, maintaining the mean absolute percentage error (MAPE) within 1%. (2) The improved model provides uncertainty quantification, issuing a warning signal at least 50 cycles before the onset of accelerated aging. (3) The analysis of feature importance from the model outputs allows the indirect identification of the primary aging mechanisms at different stages. (4) The model is robust against missing or low-quality HFs. If certain features cannot be obtained or are of poor quality, the prediction process does not fail.

Keywords: SOH prediction; lithium-ion battery energy storage system; stage-wise aging path; random forest regression

1. Introduction

Lithium-ion battery energy storage systems are now widely deployed in photovoltaic, wind power integration, and grid peak-shaving applications, and the SOH is directly tied to operational safety and economic efficiency. An accurate and reliable SOH prediction is a key function of the battery management system (BMS), aiding in evaluating remaining capacity and lifetime, optimizing charge–discharge strategies to prolong service life, and preventing premature battery failures. Moreover, as lithium-ion batteries reach end-of-life, accurate health assessments become crucial for evaluating retired batteries intended for second-life applications. S. Tao et al. proposed fast diagnostic methods to determine the SOH of retired batteries and facilitate their sustainable redeployment [1]. In recent years, numerous methods for the lithium-ion battery SOH estimation have been proposed, which can be broadly classified into the following categories.

(1): Direct methods based on capacity: These methods typically rely on periodic standard charge–discharge tests under constant conditions to measure the battery’s actual capacity or internal resistance for the SOH evaluation. Their advantages are a simple principle and intuitive results, which can yield accurate health indicators under laboratory conditions. However, this approach requires the battery to undergo complete charge–discharge cycles or long rest tests, which are difficult to achieve in real operation. For example, Wei et al. showed that, in practical applications, batteries rarely experience full constant-current charge–discharge processes, making it hard to continuously perform the SOH estimation based on full-cycle capacity measurements [2]. Moreover, internal resistance measurements are sensitive to state of charge and ambient temperature; thus, methods solely relying on capacity calibration data (e.g., full charge–discharge capacity or steady-state resistance) have limited applicability under field conditions.
(2): Data-driven methods based on health feature extraction: These methods utilize easily accessible signals during battery operation (voltage, current, temperature, etc.) to extract health features and estimate the SOH through the mapping between these features and the SOH [3]. Common HFs include the following: incremental capacity (IC) curve characteristics, where the positions and heights of peaks in the incremental capacity (dQ/dV) curve during charging are analyzed to characterize changes in usable capacity [4]; voltage relaxation behavior, i.e., features of the voltage relaxation curve after charging or discharging, which reflect battery polarization and reversible capacity loss [5]; direct current resistance (DCR) evolution, where an equivalent circuit model or voltage–current transient response under operating conditions is used to obtain internal resistance growth for SOH evaluation [6]; Coulombic efficiency and capacity variance statistics; and features of the probability density function (PDF) of voltage or capacity. Studies have shown that these features can reflect internal aging mechanisms and degradation trends to varying degrees. For example, incremental capacity analysis has been widely used to capture internal reaction changes: Li et al. applied Gaussian smoothing to IC curves and extracted peak positions/heights to predict the SOH of high-energy NMC cells [7]. Meanwhile, many works combine the above physical features with machine learning algorithms to improve the HF-to-SOH mapping [8]. Typical algorithms include support vector machines (SVMs), neural networks, and ensemble learning models. For instance, Zhang et al. combined IC curve features with support vector regression to achieve online SOH monitoring for vehicle batteries [9], while Peng et al. used IC curve features and a back-propagation (BP) neural network to accurately predict lithium-ion battery capacity [10]. Ensemble methods such as random forest regression (RFR) have also demonstrated strong performance in battery SOH prediction, offering high accuracy and interpretable feature importance [11].
(3): Hybrid methods combining mechanism models and data-driven models: These methods integrate battery physical models (e.g., equivalent circuit models or electrochemical models) with data-driven algorithms to leverage both physical interpretability and predictive accuracy. On the one hand, physical models provide a physical basis for battery aging through equivalent circuit parameters or empirical degradation formulas describing capacity fade and resistance growth; on the other hand, data-driven approaches are introduced to correct or estimate the nonlinear aspects that mechanistic models struggle to capture, thereby improving robustness and generalization [12]. Some researchers have constructed detailed electrochemical–mechanical coupled models to simulate SEI film growth and lithium plating effects on capacity fade. Dong et al. proposed a physics-based model considering both chemical and mechanical degradation mechanisms, which can simulate the SEI formation/growth process to predict capacity decay [13]; similarly, Zhuo et al. built models for active material loss and cyclable lithium loss in the electrodes, achieving the accurate characterization of capacity evolution [14]. Overall, hybrid methods use physical models to provide constraints and priors, supplemented by data-driven models to correct the parts that are hard to capture, thereby enhancing the model’s adaptability to different operating conditions and battery types [15].

Most of the above methods have demonstrated effectiveness on simulation or laboratory datasets. However, when these algorithms are applied to complex and variable real-world scenarios, several challenges remain:

(1): Uncertainty and trend identification: In actual operation, data noise, model error, and other uncertainties exist, and error distributions are not necessarily Gaussian. Most existing studies provide only a single-point SOH estimate; the few studies that consider the error distribution often assume it to be Gaussian. Such deterministic outputs cannot reveal whether the health state is trending better or worse than expected. In other words, traditional methods struggle to promptly determine whether a battery is “degrading faster than expected” or “performing better than expected.” Recently, some researchers have introduced probabilistic approaches such as Bayesian model averaging for SOH estimation to incorporate model and parameter uncertainty, outputting a probability distribution for the SOH [16]. This approach highlights that quantifying prediction uncertainty is important for the early warning of abnormal aging.
(2): Lack of standard data under dynamic conditions: Field batteries operate under complex, fluctuating conditions—load power and charge–discharge rates vary frequently—making it difficult to regularly obtain complete full charge–discharge curves as the SOH references. In practice, batteries often undergo only partial charge–discharge and do not reach terminal voltages, which renders many full-cycle-based methods unusable. Thus, methods are needed that can estimate the SOH from fragmented, incomplete operating data. For example, some studies have attempted to extract features from ~10-min voltage relaxation curves or partial charge data to estimate capacity fade [17]. How to reliably extract health features and maintain model accuracy under non-standard operating profiles remains a major engineering challenge.
(3): Insufficient generalization over long-term aging and cell-to-cell differences: Different batteries, due to manufacturing variance, usage environments, and operational history, may exhibit significantly different aging trajectories. A single model is hard-pressed to cover an entire fleet of batteries over their full lifespans. On the one hand, battery aging typically occurs in stages (e.g., initial, plateau, accelerated, end-of-life), with each stage governed by different degradation mechanisms and feature evolution patterns; a single model cannot easily accommodate all stages with high accuracy. On the other hand, as a battery ages, if a model trained on the initial data is never updated, its error may accumulate, failing to reflect new changes in health.

This paper addresses the above challenges by introducing a probabilistic characterization of SOH prediction results and a stage-wise adaptive modeling approach. In Section 2, we detail the overall framework and implementation of the proposed method, including the offline multi-model training and online update strategy. Section 3 describes the experimental dataset and the extraction of health features. Section 4 evaluates the training results on the offline dataset, analyzes the aging paths captured by the models at different stages, and verifies the online application of the model under actual operating conditions through case studies. Finally, Section 5 concludes the paper.

2. Offline–Online SOH Prediction Method

Figure 1 shows the overall algorithm framework and application process. In the following subsections, we describe the offline training and online application processes in detail. Notably, the first two feature-screening steps are relatively independent, while the final model update step relies on results from the offline training. For clarity, the feature selection procedure is discussed in a separate subsection.

2.1. Offline Multi-Model Ensemble Training

2.1.1. Automatic SOH Staging

As shown in Figure 2, battery degradation is not a linear process; generally, it can be divided into an initial stage, a mild aging stage (plateau), an accelerated aging stage, and a final failure stage. The end-of-life stage usually corresponds to reaching the retirement threshold (e.g., SOH 80%). Different stages involve different aging mechanisms, HF evolution trends, and SOH decay rates, which makes it challenging for a single full-lifecycle model to maintain accuracy and robustness across all stages. Therefore, stage-specific models can better capture the aging characteristics of each phase. However, due to cell-to-cell variability, the turning points between stages are not identical for every battery, making it difficult to determine universal breakpoint values.

To quantitatively determine whether a battery has entered the accelerated aging phase, we propose a $p_{AA}$ calculation method based on the aging slope, which uses global statistics and a nonlinear mapping to represent the aging trend probabilistically.

First, for each battery’s SOH-Ah curve, various smoothing and interpolation strategies are applied to reconstruct a continuous trajectory. As aging curves differ under different conditions, we select the best-fitting scheme among polynomial fitting, Savitzky–Golay filtering, and LOWESS fitting. For any interpolated sequence $(Ah_i, SOH_i)$, we compute the discrete slope:

$k_{SOH}(Ah_i) = \frac{SOH_{i+1} - SOH_i}{Ah_{i+1} - Ah_i}, \quad i = 1, 2, \dots, n - 1,$

(1)

and take the absolute value of the slope, denoted as $|k_{SOH}|$, as a uniform measure.
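As a minimal sketch of Eq. (1), the absolute discrete slope can be computed directly from an interpolated trajectory. The function name `discrete_slopes` and the sample values are hypothetical and serve only as illustration; the smoothing step (polynomial, Savitzky–Golay, or LOWESS) is assumed to have been applied beforehand.

```python
def discrete_slopes(ah, soh):
    """Absolute discrete slopes |k_SOH| between consecutive points (Eq. 1)."""
    if len(ah) != len(soh):
        raise ValueError("ah and soh must have the same length")
    slopes = []
    for i in range(len(ah) - 1):
        d_ah = ah[i + 1] - ah[i]
        if d_ah <= 0:
            raise ValueError("Ah throughput must be strictly increasing")
        slopes.append(abs((soh[i + 1] - soh[i]) / d_ah))
    return slopes


# Hypothetical interpolated trajectory: throughput in Ah, SOH as a fraction.
ah = [0.0, 100.0, 200.0, 300.0]
soh = [1.00, 0.99, 0.98, 0.95]
k_abs = discrete_slopes(ah, soh)  # one slope per interval, n - 1 values
```

The last interval in this toy trajectory loses SOH three times faster than the first two, which is the kind of change the probabilistic mapping is meant to flag.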

Next, we collect all $|k_{SOH}|$ values from every cell over its entire life, remove abnormally high outliers, and calculate the cumulative distribution function $P(k)$:

$P(k) = \int_{0}^{k} p(x)\,dx,$

(2)

where $p(x)$ denotes the probability density of the pooled $|k_{SOH}|$ values.

Then, we define a nonlinear mapping function to convert $P(k)$ into $p_{AA}$:

$p_{AA}(k) = \begin{cases} 0.5\,\frac{P(k)}{P(\bar{k})}, & k \leq \bar{k} \\ 0.5 + 0.5\,\frac{P(k) - P(\bar{k})}{1 - P(\bar{k})}, & k > \bar{k}, \end{cases}$

(3)

where $\bar{k}$ denotes the global mean of $|k_{SOH}|$. This mapping function ensures that, in the steady-aging region ($k \leq \bar{k}$), $p_{AA} \leq 0.5$, whereas in the accelerated-aging region ($k > \bar{k}$), $p_{AA} \to 1$ (see Figure 3).
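The mapping in Eqs. (2) and (3) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the continuous CDF $P(k)$ is replaced by an empirical CDF over the pooled slopes, outlier removal is assumed to have been done already, and the name `make_p_aa` and the sample values are hypothetical.

```python
from bisect import bisect_right


def make_p_aa(all_slopes):
    """Build the Eq. (3) mapping from a pooled sample of |k_SOH| values."""
    data = sorted(all_slopes)
    n = len(data)
    k_bar = sum(data) / n  # global mean slope

    def cdf(k):
        # Empirical stand-in for P(k): fraction of pooled slopes <= k.
        return bisect_right(data, k) / n

    p_bar = cdf(k_bar)
    if not 0.0 < p_bar < 1.0:
        raise ValueError("pooled slopes must not all be equal")

    def p_aa(k):
        if k <= k_bar:
            return 0.5 * cdf(k) / p_bar
        return 0.5 + 0.5 * (cdf(k) - p_bar) / (1.0 - p_bar)

    return p_aa, k_bar


# Hypothetical pooled slopes (arbitrary units) with one fast-aging sample.
pooled = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 13.0]
p_aa, k_bar = make_p_aa(pooled)
```

With these values the mean slope is 4.0, so `p_aa(4.0)` sits at the 0.5 boundary and only slopes above the mean are mapped into the upper half of the probability range.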

To account for individual differences, we recalculate a baseline slope $k_{norm}$ for each battery locally. We take the average slope in the SOH range of 88–95% as a steady-state reference:

$k_{norm} = \mathrm{mean}\left(|k_{SOH}|_{0.88 \leq SOH \leq 0.95}\right)$

(4)

We define the slope ratio $r_i = |k_{SOH,i}| / k_{norm}$ and, using the set of samples $\{r_i\}_{i=1}^{n}$ within a local window, construct an empirical cumulative distribution function (ECDF):

$\hat{F}_R(r) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(r_i \leq r)$

(5)

Then, we define a local accelerated aging probability as follows:

$p_{AA}^{*}(r) = \begin{cases} 0, & r \leq 1 \\ \min\left(1, \hat{F}_R(r)\right), & r > 1 \end{cases}$

(6)
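Eqs. (4) to (6) can be sketched in a few lines. The helper names `baseline_slope` and `local_p_aa` are hypothetical, and how the local window of ratios is chosen is left to the caller, since the text does not fix its length.

```python
def baseline_slope(soh, k_abs):
    """Eq. (4): mean |k_SOH| over samples whose SOH lies in [0.88, 0.95]."""
    ref = [k for s, k in zip(soh, k_abs) if 0.88 <= s <= 0.95]
    if not ref:
        raise ValueError("no samples in the 88-95% SOH reference range")
    return sum(ref) / len(ref)


def local_p_aa(r, window_ratios):
    """Eqs. (5)-(6): ECDF of the slope ratios in the window, gated at r > 1."""
    if r <= 1.0:
        return 0.0
    ecdf = sum(1 for x in window_ratios if x <= r) / len(window_ratios)
    return min(1.0, ecdf)


# Hypothetical cell: SOH at each slope sample and the matching |k_SOH|.
k_norm = baseline_slope([0.97, 0.94, 0.90, 0.86], [3.0, 1.0, 2.0, 5.0])

# Hypothetical slope ratios r_i = |k_SOH,i| / k_norm inside one local window.
window = [0.8, 0.9, 1.0, 1.1, 1.2, 1.5, 2.0, 3.0]
p_mid = local_p_aa(1.2, window)
```

The gate at $r \leq 1$ means a cell aging no faster than its own steady-state reference always reports zero local probability, regardless of how the rest of the window is distributed.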

As shown in Figure 4, for most cells, $p_{AA}^{*}$ increases gradually with cycling and remains at a low value, characterizing a normal slow-degradation stage; if for a particular cell $p_{AA}^{*}$ rises significantly above the baseline, it means that the cell has deviated from the group’s usual degradation pattern and is considered to have entered an accelerated aging stage. This method, based on a physically interpretable SOH decay rate and combining global statistics with individual normalization, achieves a transformation from a deterministic aging rate to a probabilistic aging trend. Compared to using $k_{SOH}$ alone, the accelerated aging probability provides a more intuitive indication of the risk of abrupt degradation increase and maintains consistency across different cells.

In early life, various HFs change markedly and have an approximately linear relationship with the SOH; thus, HFs correlate strongly with the SOH and reliably indicate overall capacity fade. However, as cycle count increases and active material loss and polarization effects accumulate, HF changes gradually saturate and some features become less sensitive to the SOH. At this stage, even small fluctuations in the capacity fade can directly drive significant changes in $p_{AA}$. Utilizing this behavior, we propose a correlation-accumulation method with two thresholds to automatically segment the SOH trajectory, the core of which determines the two breakpoints $A$ and $B$ (with $1 \geq A > B \geq 0.7$). We define $C_{SOH}(S) = \sum_{j} |\rho(HF_j, SOH \mid S)|$ and $C_{PAA}(S) = \sum_{j} |\rho(k_{HF,j}, p_{AA} \mid S)|$ for any SOH subset $S$, where $\rho(\cdot, \cdot)$ denotes the correlation coefficient. First, we perform a discrete search over the candidate threshold set $\mathcal{A}$ on $S_1(A)$:

$A^{*} = \arg\max_{A \in \mathcal{A}} \frac{C_{SOH}(S_1(A))}{C_{PAA}(S_1(A)) + \varepsilon}$

(7)

where $S_1(A) = \{SOH > A\}$ and $\varepsilon > 0$ is a small stabilizing term to avoid the denominator approaching zero. Next, we perform a discrete search over the candidate set $\mathcal{B}$ on $S_3(B) = \{0.7 \leq SOH \leq B\}$:

$B^{*} = \arg\max_{B \in \mathcal{B}} \frac{C_{PAA}(S_3(B))}{C_{SOH}(S_3(B)) + \varepsilon}$

(8)
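The two-threshold search of Eqs. (7) and (8) can be sketched as a pair of grid searches. Several choices here are assumptions made for illustration only: the text does not say which correlation coefficient $\rho$ is used, so a plain Pearson coefficient is hand-rolled (a rank-based coefficient could be swapped in), subsets with fewer than `min_n` samples are skipped, and the names `pearson`, `corr_sum`, and `search_breakpoints` are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def corr_sum(columns, target, idx):
    """Sum over features of |rho(feature, target)| restricted to rows idx."""
    t = [target[i] for i in idx]
    return sum(abs(pearson([col[i] for i in idx], t)) for col in columns)


def search_breakpoints(soh, p_aa, hfs, k_hfs, cand_a, cand_b,
                       eps=1e-6, min_n=3):
    """Return (A*, B*) per Eqs. (7)-(8); hfs and k_hfs are lists of columns."""
    rows = range(len(soh))

    def score(idx, early):
        if len(idx) < min_n:  # too few samples for a meaningful correlation
            return float("-inf")
        c_soh = corr_sum(hfs, soh, idx)
        c_paa = corr_sum(k_hfs, p_aa, idx)
        return c_soh / (c_paa + eps) if early else c_paa / (c_soh + eps)

    a_star = max(cand_a,
                 key=lambda a: score([i for i in rows if soh[i] > a], True))
    b_star = max(cand_b,
                 key=lambda b: score([i for i in rows if 0.7 <= soh[i] <= b],
                                     False))
    return a_star, b_star
```

The early-stage score rewards subsets where HFs track the SOH and penalizes subsets where HF slopes already track $p_{AA}$; the late-stage score inverts the ratio.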

This approach eliminates the need to manually set thresholds or predefine inflection points; instead, it leverages the data’s own characteristics to achieve automatic SOH stage division. Notably, the obtained breakpoints reflect the statistical characteristics of the experimental dataset, and not every cell adheres exactly to this pattern. If the dataset used for stage determination changes or the criteria for “capacity drop” are altered, the identified breakpoints also shift.

2.1.2. Stage-Wise Multi-Rate Model Set Training

Figure 5 illustrates 17 days of operational data from an energy storage power station providing peak-shaving services. As shown, the operating profile varies day by day, leading to different aging trajectories even for the same total throughput or elapsed time. This means that using a fixed model for long-term SOH prediction can result in error accumulation, trend deviation, or the misidentification of accelerated aging, severely limiting the practical value of the prediction method [18].

Additionally, due to engineering economic constraints, most energy storage systems are not equipped with high-precision sensors (for example, the voltage resolution in this dataset is 0.01 V). Such limited precision also hampers parameter identification for detailed electrochemical models and the extraction of certain features. Some common HFs may be unattainable or significantly skewed due to data loss or noise.

Therefore, a single prediction model is hard to transplant across highly variable operating conditions; even under identical conditions, differences in battery consistency make it difficult for one model to remain valid from start to finish. Thus, it is necessary to establish multiple targeted models segmented by stage and grouped by the aging rate.

The specific construction is as follows: First, we divide the cells into three categories (long-, medium-, short-life) based on total lifetime. Then, for each cell, we compute its $k_{SOH}$ (capacity fade rate) within each SOH stage identified in Section 2.1.1, and evenly divide the range of aging rates in that stage into three levels labeled “slow,” “moderate,” and “fast.” For each combination of aging stage and rate category, we train one improved RFR model, thereby constructing a multi-stage, multi-rate SOH prediction model system. When a battery is in a certain life stage and exhibits a certain decay speed, there is a corresponding model (or a closely matching model) available for use.
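The rate grouping can be illustrated with a small sketch that splits the observed range of stage-wise fade rates into three equal-width levels and uses the (stage, level) pair as the key of the model library. The names `rate_level` and `model_key`, the stage label, and the numeric range are hypothetical.

```python
def rate_level(k, k_min, k_max):
    """Label a stage-wise fade rate by splitting [k_min, k_max] into thirds."""
    if k_max <= k_min:
        raise ValueError("k_max must exceed k_min")
    width = (k_max - k_min) / 3.0
    if k < k_min + width:
        return "slow"
    if k < k_min + 2.0 * width:
        return "moderate"
    return "fast"


def model_key(stage, k, k_min, k_max):
    """Key used to look up one trained model per (stage, rate level) pair."""
    return (stage, rate_level(k, k_min, k_max))


# Hypothetical fade-rate range observed in the middle stage (arbitrary units).
key = model_key("middle", 2.5, 1.0, 4.0)
```

With three stages and three rate levels, a dictionary keyed this way would hold up to nine stage-and-rate models.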

2.1.3. Improved RFR Algorithm Incorporating Prediction Result Probability Distribution

Generally, to address the prediction errors caused by noise and cell inconsistency, a common approach is to take the average of multiple prediction results. This method effectively captures the dominant aging trend and is easy to understand, but it overlooks the aging information contained in the distribution of the results [19]. Therefore, we optimize the RFR algorithm to retain the output distribution and extract additional insights from it. A typical RFR training procedure can be found in reference [20]. Existing RFR models generally use the mean of the tree outputs as the final prediction. To enhance the practical reliability of SOH prediction, we introduce probability distribution modeling and data augmentation into the conventional RFR framework, proposing an improved RFR algorithm that retains the distribution information. In this improved approach, we preserve the complete set of leaf-node outputs from all decision trees and compile the distribution of predicted SOH values for each sample. From this distribution, we extract typical statistical features including the mean, variance, and skewness. These features reveal the shape of the SOH prediction distribution, reflecting the fluctuation trend of the health state and potential risks. For instance, if the output distribution is negatively skewed, it indicates a downward tendency of the SOH (possibly forewarning accelerated aging); if the distribution is positively skewed, it suggests that the decay rate is slowing or even that a capacity recovery is possible.

Through these improvements, the proposed RFR model offers the following advantages:

(1): The prediction output is expanded from a single point value to a probabilistic interval, preserving the output distribution information. This endows the model with the ability to recognize abnormal aging behaviors such as capacity recovery or accelerated degradation, enhancing its practical utility.
(2): By integrating data augmentation and error distribution reconstruction, the model overcomes limitations of limited and non-ideal datasets [21], improving its robustness and adaptability in complex scenarios.
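The distribution statistics described above can be sketched on a list of per-tree SOH predictions for one sample. This is an illustrative assumption about the post-processing, not the authors' code: the function name `distribution_summary` is hypothetical, population moments are used, and the 5% bound is a simple order statistic without interpolation. With scikit-learn, such per-tree predictions could be collected from the fitted trees in `RandomForestRegressor.estimators_`.

```python
def distribution_summary(tree_preds):
    """Mean, variance, skewness, and a 5% lower bound of per-tree outputs."""
    n = len(tree_preds)
    if n < 2:
        raise ValueError("need predictions from at least two trees")
    mean = sum(tree_preds) / n
    var = sum((p - mean) ** 2 for p in tree_preds) / n  # population variance
    if var == 0.0:
        skew = 0.0
    else:
        skew = (sum((p - mean) ** 3 for p in tree_preds) / n) / var ** 1.5
    ordered = sorted(tree_preds)
    q05 = ordered[int(0.05 * (n - 1))]  # lower order statistic, no interpolation
    return {"mean": mean, "variance": var, "skewness": skew, "q05": q05}


# Hypothetical outputs of eight trees; one tree predicts a much lower SOH.
summary = distribution_summary([0.90, 0.91, 0.91, 0.92, 0.92, 0.92, 0.93, 0.85])
```

In this toy case the single low prediction pulls the left tail out, so the skewness is negative, which under the interpretation above would hint at a downward SOH tendency.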

2.2. Feature Selection and Importance Ranking

Many studies have shown that using multiple HFs as inputs can effectively improve the accuracy and generalization of SOH estimation, reducing the uncertainty associated with any single feature being susceptible to noise, loss, or changing conditions [22]. However, increasing the number of features also significantly increases the computational burden; especially for ensemble learning algorithms like RFR, too many redundant features not only affect training and inference efficiency but also may trigger the “curse of dimensionality,” undermining model stability. Therefore, based on constructing a multi-dimensional HF set, we designed a three-step optimization process as follows:

2.2.1. Experience-Based Manual Screening

First, drawing on extensive research findings and preliminary experimental analysis, we initially select a set of health features that are physically interpretable and closely related to aging mechanisms. Typical features span capacity metrics, IC curve parameters, relaxation performance indices, ohmic and polarization resistances, Coulombic efficiency, temperature trends, etc. At the same time, considering the characteristics of different aging stages, we differentiate feature subsets intended for gradual aging, accelerated aging, and thermal safety warning, forming a multi-stage feature library (Table 1).

Notably, many fine-grained electrochemical features, while capable of reflecting internal battery states in depth, often require strict testing conditions (e.g., constant temperature, low C-rate, extended rest) and complex computations. These are difficult to obtain frequently or compute in real time in the field. Therefore, in engineering applications, one should prioritize features that do not require special conditions, impose low computational burden, and still provide clear indications of aging.

2.2.2. Correlation Coefficient-Based Screening

Considering that the actual collected data may exhibit non-normal distributions, outliers, and multicollinearity, the traditional Pearson linear correlation has limited applicability. We employ the Spearman correlation coefficient to eliminate features with weak correlation or poor stability, thereby reducing the interference of redundant information on model performance. The Spearman coefficient is defined as follows [26]:

$\rho_s(HF, SOH) = 1 - \frac{6 \sum_{i} \left(r_{HF_i} - r_{SOH_i}\right)^{2}}{n(n^{2} - 1)}$

(9)

where $r_{HF_i}$ and $r_{SOH_i}$ denote the ranks of the $i$th sample in the HF and SOH sequences, respectively, and $n$ denotes the total number of samples. $\rho_s$ ranges from −1 to 1, with a larger absolute value indicating a stronger correlation. Based on the results, we set a threshold; features whose absolute correlation falls below that threshold are treated as invalid and removed. This step filters out features that have a weak or inconsistent relationship with the SOH, thereby reducing the noise and redundancy in the model inputs.
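A minimal sketch of the screening step, assuming no tied values in either sequence (the condition under which the rank-difference form of Eq. (9) is exact). The function names, the feature names, and the threshold of 0.6 are hypothetical.

```python
def spearman_eq9(hf, soh):
    """Spearman rho via Eq. (9); exact only when neither sequence has ties."""
    n = len(hf)
    if n != len(soh) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    r_hf, r_soh = ranks(hf), ranks(soh)
    d2 = sum((a - b) ** 2 for a, b in zip(r_hf, r_soh))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def screen_features(features, soh, threshold):
    """Keep feature names whose |rho_s| with the SOH reaches the threshold."""
    return [name for name, col in features.items()
            if abs(spearman_eq9(col, soh)) >= threshold]


# Hypothetical data: one monotone feature and one noise-like feature.
soh = [1.00, 0.98, 0.95, 0.90, 0.85]
features = {
    "ic_peak": [5.0, 4.8, 4.5, 4.1, 3.6],
    "noise": [0.3, 0.9, 0.1, 0.8, 0.5],
}
kept = screen_features(features, soh, threshold=0.6)
```

Here the monotone feature reaches $\rho_s = 1$ and is kept, while the noise-like feature has $\rho_s = -0.1$ and is dropped.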

2.2.3. Automated HF Importance Ranking Using the RFR Model

Random forests can measure each feature’s influence on the overall prediction by the contribution of that feature to the splits in all trees. For example, in regression trees, one can define the importance of a feature by summing the total reduction in mean squared error (MSE) it contributes across all trees [27]. Specifically, if $\Delta MSE_{t,j}$ is the reduction in MSE due to feature $j$ in tree $t$, the importance score $I_j$ can be defined as the sum of these reductions over all $T$ trees:

$I_j = \sum_{t=1}^{T} \Delta MSE_{t,j}$

(10)

A larger $I_j$ indicates that feature $j$ contributes more to the model’s decisions. By sorting features in descending order of $I_j$, we identify the most critical features and the relatively less important ones. Accordingly, lower-importance features can be further removed to simplify the model and avoid unnecessary noise.
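Eq. (10) and the subsequent sorting can be sketched as follows, assuming the per-tree MSE reductions have already been extracted as one dictionary per tree. The function name `rank_features`, the feature names, and the numbers are hypothetical. Libraries such as scikit-learn expose a related, normalized impurity-based score through `feature_importances_`.

```python
def rank_features(delta_mse_per_tree):
    """Sum per-tree MSE reductions per feature (Eq. 10) and sort descending."""
    totals = {}
    for tree in delta_mse_per_tree:
        for name, delta in tree.items():
            totals[name] = totals.get(name, 0.0) + delta
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical MSE reductions attributed to each feature in three trees.
trees = [
    {"VAR": 0.40, "DCR": 0.10},
    {"VAR": 0.30, "IC_peak": 0.20},
    {"DCR": 0.05, "IC_peak": 0.10},
]
ranking = rank_features(trees)
top_names = [name for name, _ in ranking]
```

A feature that never appears in a split simply does not show up in the totals, which is one way a missing or uninformative HF drops to the bottom of the ranking.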

2.3. Online Model Update and Application

2.3.1. Short-Term HF Prediction Modeling and Application

In online SOH prediction, since the interval between two capacity calibration tests is typically long (possibly months or even longer), a key challenge is how to utilize routine operating data to evaluate the SOH during these intervals. To address this, we construct a feature evolution mapping model to achieve short-term HF prediction as illustrated in Figure 6.

As an example, Figure 7 shows the cluster of VAR curves for all cells, along with polynomial interpolation fits for those curves.

The same operating condition was repeated on two sets of cells; the mean VAR of these cells was calculated and found to follow a normal distribution, as shown in Figure 8.

For clarity, Figure 9 highlights a few example trajectories (paths) when VAR = 0.5. Different combinations of DOD and C-rate lead to different subsequent aging paths, and these paths are related to the current VAR value. By calculating the change slope of VAR (i.e., how VAR changes per unit Ah) for each path at VAR = 0.5, we obtain 16 slope data points (for 16 combinations of DOD and C). Plotting these as a 3D scatter and applying a two-dimensional surface interpolation yields the map shown in Figure 10.

It is apparent that larger C-rates and deeper DODs result in a higher VAR change slope. Different HFs have their own characteristic slopes; thus, this mapping needs to be computed for the current value of whichever HF we use. Figure 11 shows map surfaces for three different VAR values (0.3, 0.5, 0.7); as VAR increases, the same charge throughput produces a larger change in VAR (i.e., a steeper decline in that feature).

When using the map model for prediction, we adopt an iterative rolling prediction. Using the current HF value as input, we search for all (DOD, C) pairs that correspond to the current HF and build a map for that specific HF value. From this map, we obtain the short-term change rate $k_{HF}$. We then update the HF as follows:

$HF_{end} = HF_{start} + k_{HF} \cdot \Delta Ah$

(11)

where $HF_{start}$ denotes the initial HF at the beginning of the interval, $HF_{end}$ denotes the predicted HF, and $\Delta Ah$ denotes the forecasted charge throughput over the prediction horizon. As operation continues, the newly predicted HF becomes the starting point for the next iteration, and the prediction rolls forward. As each iteration uses the latest HF to rebuild the map, the value of $k_{HF}$ can change each step. If in practice the operating condition (DOD, C) shifts, one can directly calculate the corresponding $k_{HF}$ from the map; if an untested (DOD, C) condition occurs, its effect can be obtained via interpolation on the map.
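The rolling update of Eq. (11) can be sketched with a small interpolated map. Bilinear interpolation on a rectangular (DOD, C-rate) grid is an assumption made here for simplicity; the text only states that a two-dimensional surface interpolation is used. The names `interp_slope`, `rolling_hf`, and `toy_map`, and all numeric values, are hypothetical.

```python
from bisect import bisect_right


def interp_slope(dod_grid, c_grid, k_map, dod, c):
    """Bilinear interpolation of k_HF; k_map[i][j] is at (dod_grid[i], c_grid[j])."""
    def bracket(grid, x):
        if not grid[0] <= x <= grid[-1]:
            raise ValueError("operating point outside the tested range")
        hi = min(max(bisect_right(grid, x), 1), len(grid) - 1)
        lo = hi - 1
        return lo, hi, (x - grid[lo]) / (grid[hi] - grid[lo])

    i0, i1, u = bracket(dod_grid, dod)
    j0, j1, v = bracket(c_grid, c)
    return ((1 - u) * (1 - v) * k_map[i0][j0] + u * (1 - v) * k_map[i1][j0]
            + (1 - u) * v * k_map[i0][j1] + u * v * k_map[i1][j1])


def rolling_hf(hf_start, steps, build_map):
    """Apply Eq. (11) step by step; steps holds (DOD, C-rate, delta_Ah) tuples."""
    hf = hf_start
    path = [hf]
    for dod, c, d_ah in steps:
        dod_grid, c_grid, k_map = build_map(hf)  # map rebuilt at the current HF
        hf = hf + interp_slope(dod_grid, c_grid, k_map, dod, c) * d_ah
        path.append(hf)
    return path


def toy_map(hf):
    """Hypothetical 2x2 map whose slopes grow with the current HF value."""
    scale = 1.0 + hf
    return [0.5, 1.0], [0.5, 1.0], [[1.0 * scale, 2.0 * scale],
                                    [3.0 * scale, 4.0 * scale]]


path = rolling_hf(0.5, [(0.75, 0.75, 0.1), (1.0, 1.0, 0.1)], toy_map)
```

Because `build_map` is called inside the loop, the slope used at each step depends on the HF reached so far, mirroring the statement that $k_{HF}$ can change at every iteration.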

2.3.2. Offline Model Set Selection and Update

In large-scale energy storage systems, capacity test data are usually regarded as the starting point for subsequent health prediction. However, due to measurement errors, operational fluctuations, or capacity recovery effects, the SOH obtained from a capacity test often has biases, appearing as unstable local aging slopes or stage-wise “bounce-back” behavior. To improve the model’s fidelity to the true aging trajectory, we propose a multi-candidate fitting approach to expand the SOH–$k$ dataset, and with it develop a more robust model selection and update mechanism.

(1): SOH-k set expansion under uncertainty: For the most recent capacity test point, let the measured capacity correspond to a health state $\hat{s}_t$. Using historical capacity test data $\{(Ah_i, s_i)\}_{i=1}^{n}$, we construct multiple fitting schemes (e.g., polynomial fitting, piecewise fitting, weighted smoothing) to obtain several approximate functions $f_j(Ah)$. For each fitting form $f_j(\cdot)$, we calculate the predicted health state at the current charge throughput $Ah_t$:

$s_t^{(j)} = f_j(Ah_t), \quad j = 1, 2, \dots$

(12)

and we compute the local aging rate (slope) using the previous capacity test point:

$k_t^{(j)} = -\frac{s_t^{(j)} - s_{t-1}^{(j)}}{Ah_t - Ah_{t-1}}$

(13)

yielding multiple candidate point pairs $(s_t^{(j)}, k_t^{(j)})$.
(2): Model set selection: For each candidate pair $(s_t^{(j)}, k_t^{(j)})$, we find the model $M_{opt}$ from the offline-trained model library whose training stage and aging rate are closest to these values. We collect the set of such matching models as $M_t$ for use in the next prediction period.
(3): Secondary training of models: Generally, different battery clusters or packs in a large energy storage station age at different paces. Incorporating information from other batteries with similar environments and aging stages into the model is an effective way to improve its accuracy. For each model in $M_{t}$ , we select a nearby subset of data from its original training set as follows:

$D^{*} = \{(x_{i}, s_{i}) \mid |s_{i} - {\tilde{s}}_{t}| < \Delta_{s},\ |k_{i} - {\tilde{k}}_{t}| < \Delta_{k}\}$

(14)

where ${\tilde{s}}_{t}$ and ${\tilde{k}}_{t}$ represent the central SOH and aging rate of that model’s training group, and $\Delta_{s}$ and $\Delta_{k}$ represent small tolerance margins. We retrain (or perform an incremental update on) the model using $D^{*}$ to obtain an updated model, thereby incorporating the latest aging state information of the current battery group or station cluster.
(4): Model set application: We take the multi-dimensional HFs obtained from the capacity test as the starting point, and we use the map model to predict their short-term evolution. In parallel, we continuously extract other HFs during normal (non-capacity-test) operation. The input feature vector thus includes both real-time measured HFs and short-term predicted HFs:

$x_{i} = [\mathrm{HF}_{1}, \mathrm{HF}_{2}, \dots, \mathrm{HF}_{p}, \mathrm{HF}_{p + 1}^{\mathrm{pred}}, \dots]$

(15)

where $\mathrm{HF}_{p + 1}^{\mathrm{pred}}$, etc., denote the predicted features. Using each model, we obtain a set of SOH predictions for the current time. We analyze the SOH trend and its probability distribution to assist O&M personnel in assessing the battery health state. For the SOH distribution at each prediction time $t$, we calculate the mean $\mu_{t}$ and the 5% lower confidence bound $q_{t}^{(5\%)}$. On the one hand, we calculate $p_{AA}$ based on $\mu_{t}$; if $p_{AA}$ exceeds the accelerated-aging threshold $\tau_{AA}$, we issue an accelerated-aging warning. On the other hand, we define the difference $\Delta_{t} = \mu_{t} - q_{t}^{(5\%)}$ as an indicator of lower-tail deviation. When $\Delta_{t}$ exceeds the threshold $\tau_{DD}$ more than $N_{A}$ consecutive times, the lower confidence bound is considered to have shifted significantly downward, indicating an increased left-tail risk of the predictive distribution, and an accelerated-aging warning is triggered. When $\Delta_{t}$ stays below $\tau_{DD}$ more than $N_{D}$ consecutive times, the warning is lifted. The overall accelerated-aging warning is active whenever either the $p_{AA}$ criterion or the $\Delta_{t}$ criterion is triggered. In the case studies below, the values are $\tau_{AA} = 0.8$, $\tau_{DD} = 0.03$, $N_{A} = 3$, and $N_{D} = 5$.
(5): Dynamic adjustment and expansion: Based on accumulated data and observed aging trends, we periodically augment or prune the model library to ensure prediction reliability and system stability over long durations and evolving operating conditions. In situations where operating conditions are complex or the battery has entered the late stage of aging, the prediction window can be appropriately shortened. In the following cases, the prediction window used is 1000 Ah, which corresponds to 10 cycles.
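Steps (1) and (2) can be illustrated with a minimal sketch in plain Python. It is not the authors’ implementation: the two fitting schemes (a global and a recent-window least-squares line), the use of the raw measurements as a third candidate, the library layout (`soh_lo`, `soh_hi`, `k_center`), and the matching rule (same SOH stage, nearest aging rate) are all assumptions made for illustration, as are the numerical values.

```python
def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns a callable f(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def candidate_pairs(ah, soh, window=3):
    """Return {scheme: (s_t, k_t)} following Eqs. (12) and (13)."""
    fits = {
        "global_linear": linear_fit(ah, soh),
        "recent_linear": linear_fit(ah[-window:], soh[-window:]),
    }
    ah_prev, ah_t = ah[-2], ah[-1]
    pairs = {}
    for name, f in fits.items():
        s_t = f(ah_t)                                 # Eq. (12)
        k_t = -(s_t - f(ah_prev)) / (ah_t - ah_prev)  # Eq. (13)
        pairs[name] = (s_t, k_t)
    # The raw measurements themselves serve as one more candidate.
    pairs["raw"] = (soh[-1], -(soh[-1] - soh[-2]) / (ah_t - ah_prev))
    return pairs


def select_models(pairs, library):
    """For each candidate, pick the library model in the matching SOH stage
    whose central aging rate is nearest to k_t (assumed matching rule)."""
    chosen = set()
    for s_t, k_t in pairs.values():
        in_stage = [m for m in library if m["soh_lo"] < s_t <= m["soh_hi"]]
        if in_stage:
            best = min(in_stage, key=lambda m: abs(m["k_center"] - k_t))
            chosen.add(best["name"])
    return sorted(chosen)


# Hypothetical capacity-test history: throughput in Ah, SOH as a fraction.
ah = [0, 1000, 2000, 3000, 4000]
soh = [1.000, 0.985, 0.972, 0.962, 0.940]
# Hypothetical two-entry library covering the (0.90, 1.0] stage.
library = [
    {"name": "early_slow", "soh_lo": 0.90, "soh_hi": 1.0, "k_center": 1.0e-5},
    {"name": "early_fast", "soh_lo": 0.90, "soh_hi": 1.0, "k_center": 2.0e-5},
]
pairs = candidate_pairs(ah, soh)
models = select_models(pairs, library)
```

In this toy history the global fit suggests a slower aging rate than the recent-window fit and the raw measurements, so both hypothetical models end up in the selected set. This is the intended effect of expanding the SOH–$k$ set: the model set hedges against a biased capacity test.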

Notably, battery service life in practice is long, often exceeding 10 years, with the vast majority of that time spent in regular operation (rather than controlled tests). The proposed workflow only requires performing a model update and prediction after a new capacity test result is obtained, which greatly reduces computational demands. By selecting an appropriate model from the stage-wise ensemble, long-term accumulated error is reduced, and the risk of error buildup or model failure inherent in using a single fixed model is mitigated.
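The warning rule of step (4) can be sketched as a small state machine. This is an illustration under stated assumptions rather than the authors’ code: the class name `AgingWarning`, the linear-interpolation quantile, and the reading of “more than $N_{A}$ times” as consecutive exceedances are hypothetical choices. The value of $p_{AA}$ is taken as an input because its mapping from $\mu_{t}$ is defined elsewhere in the paper.

```python
from statistics import fmean


def lower_quantile(values, q=0.05):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


class AgingWarning:
    """Tracks the lower-tail indicator Delta_t = mu_t - q_t^(5%) over time."""

    def __init__(self, tau_aa=0.8, tau_dd=0.03, n_a=3, n_d=5):
        self.tau_aa, self.tau_dd = tau_aa, tau_dd
        self.n_a, self.n_d = n_a, n_d
        self.above = 0            # consecutive times Delta_t exceeded tau_dd
        self.below = 0            # consecutive times Delta_t stayed at or below it
        self.tail_active = False  # state of the Delta_t-based warning

    def update(self, predictions, p_aa):
        """predictions: SOH values from the model set; p_aa: externally computed."""
        mu = fmean(predictions)
        delta = mu - lower_quantile(predictions)
        if delta > self.tau_dd:
            self.above += 1
            self.below = 0
        else:
            self.below += 1
            self.above = 0
        if self.above > self.n_a:
            self.tail_active = True
        elif self.below > self.n_d:
            self.tail_active = False
        # Either criterion is sufficient to raise the warning.
        return p_aa > self.tau_aa or self.tail_active


# Hypothetical ensemble outputs: a heavy lower tail, then a tight cluster.
wide = [0.90] * 18 + [0.80, 0.80]
tight = [0.90] * 20
monitor = AgingWarning()
raised = [monitor.update(wide, p_aa=0.0) for _ in range(4)]
cleared = [monitor.update(tight, p_aa=0.0) for _ in range(6)]
```

With the default thresholds, the warning is raised on the fourth consecutive exceedance and lifted on the sixth consecutive quiet step. The asymmetry ($N_{D} > N_{A}$) makes the alarm quicker to raise than to clear.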

3. Experimental Implementation and Data Processing

3.1. Experimental Method

The batteries tested were 50 Ah NCM622 pouch cells, manufactured by China Aviation Lithium Battery Technology Co., Ltd. (CALB), Changzhou, China. We carried out aging experiments under various combinations of depth of discharge (DOD) and C-rate, as summarized in Table 2. All tests were performed at an ambient temperature of ~25 °C, and each charging phase proceeded to the full-charge voltage of 4.2 V (100% SOC).

Cells were cycled under the specified conditions, and every calendar month, a performance assessment was conducted including a capacity test (full charge–discharge at a low rate). Taking Cell #1 as an example, several cycles of operation are shown in Figure 12.

The operation of one full cycle for Cell #1 is illustrated in Figure 13. Regions a–e denote performance tests: capacity test, low-rate discharge, DST (Dynamic Stress Test), FUDS (Federal Urban Driving Schedule), and HPPC (Hybrid Pulse Power Characterization), respectively. Region f is the repetitive cycling under the designated aging condition. The cyclic aging phase is designed to simulate real usage where DOD is not 100%; under such conditions, many HFs that depend on nearly complete charge–discharge curves cannot be obtained.

When a cell’s capacity decayed to 80% of its rated capacity, we ended the test for that cell and marked it as reaching end-of-life. If a cell exhibited severe swelling or leakage, the experiment was immediately halted for safety. (For more detailed information, please refer to our previous work [28]).

3.2. Results and Analysis

All cells’ aging curves are plotted in Figure 14, and the polynomial fits of those curves are shown in Figure 15.

Among the cells, Cell #3 had a manufacturing defect, and its life was more than 40% shorter than that of Cell #4 under the same conditions. Cell #30 experienced swelling and leakage at an SOH of 0.844, terminating its test early. These two cells were excluded from subsequent analysis. As the cell SOH approaches 0.8, testing frequency is usually increased for safety, to avoid swelling or leakage accidents caused by excessive cycling. We observed that low-C-rate capacity tests temporarily slowed the capacity decay rate, mainly because continuous high-rate cycling causes non-uniform lithium distribution in the cell, whereas low-rate cycling with rest allows lithium to redistribute, temporarily increasing usable capacity. Notably, 2.0C was the maximum charge–discharge rate in our tests. Higher current rates (beyond 2C) were outside our experimental scope; such extreme conditions may need to be examined in future work to further validate the model under more aggressive cycling.

Most cells did not exhibit a distinct knee point throughout their life; the accelerated aging stage was not prominent; instead, they showed an almost linear, steady decay. Such cells (for example, sample C2-DOD30-2) can be modeled quite accurately even without the stage-wise prediction strategy proposed in this work. In other words, for batteries whose aging path remains smooth and linear, our method can still achieve good accuracy, though the advantage of the stage-wise approach becomes most evident for cells that do exhibit nonlinear aging behavior.

From the experimental data, we extracted multiple HFs at each capacity test and performance test and conducted correlation analysis. Due to space constraints, we present only representative features from each category and report each feature’s Spearman rank correlation with the SOH, as well as the correlation between its degradation rate $k_{HF}$ and the capacity fade rate $k_{SOH}$.

1. Capacity features: Efficiency-related indicators did not show obvious changes by the end of the experiment. Self-discharge-related metrics could not be obtained because, for the sake of accelerated testing, no prolonged rest periods were included. Therefore, the capacity features considered are primarily VAR and the capacities in specific voltage segments, $Q_{sc}$ and $Q_{ec}$.

Figure 16 and Figure 17 show the evolution of all cells’ capacity features over accumulated Ah and over the SOH, respectively, and Figure 18 shows the relationship between the slopes of these features and the capacity decay rate $k_{SOH}$. We observe that the VAR feature is quite sensitive to SOH changes, exhibiting a well-behaved monotonic decrease; $Q_{sc}$ also generally tracks the capacity fade trend; however, $Q_{ec}$, having a small absolute value and showing significant changes only in later life, is strongly affected by measurement noise. Furthermore, the slopes of all three capacity features (Figure 18) do not correlate well with the capacity fade slope $k_{SOH}$. This indicates that relying solely on capacity feature slopes is insufficient to accurately characterize the aging rate, and other types of features are needed to complement them.

2. IC features: Figure 19, Figure 20 and Figure 21 present the evolution of IC curve features (focusing on the areas, heights, and voltage positions of the main IC peaks) for all cells. Overall, the two primary peaks of the IC curve (denoted as Peak 1 and Peak 2) exhibit different aging patterns across cells. In general, one would expect all IC peaks to diminish with aging; however, in our data, the first peak’s amplitude increases contrary to expectations, for which no satisfactory explanation has yet been found. Nevertheless, Peak 1’s area and height have low correlation with the SOH, whereas the area and peak value of Peak 2 correlate much more strongly with the SOH, making Peak 2 a more reliable indicator of capacity loss. This is likely because the electrode phase-change reaction corresponding to Peak 2 accounts for a larger fraction of the capacity; thus, its attenuation is more pronounced as the cell ages. Additionally, the voltage position of Peak 1 shifts significantly lower as cycles progress, while the position of Peak 2 remains relatively stable. (A downward shift in peak voltage is another signature of aging, reflecting changes in internal resistance and reaction kinetics.)

3. Relaxation features: The relaxation process (voltage recovery after charge or discharge) reflects the extent of battery polarization (ohmic and concentration polarization) and the hysteresis of ion diffusion. When the battery is fresh, relaxation is fast and has a small amplitude; as internal resistance grows and polarization worsens, the relaxation takes longer and the steady-state voltage offset increases.

Figure 22 and Figure 23 show how relaxation metrics (e.g., relaxation time and voltage drop at various SOC levels) evolve with Ah and the SOH, and Figure 24 shows how the slopes of these metrics relate to $k_{SOH}$. Interestingly, the relaxation time initially increases with cycling and later decreases, exhibiting a non-monotonic “rise-then-fall” behavior. A possible explanation is that, from early to mid-life, SEI growth and side reactions cause polarization to increase, lengthening relaxation time and increasing its magnitude. In the later severe aging stage, a large loss of cyclable lithium and active material sharply reduces capacity, which in turn lowers the absolute current stress (for a given C-rate), so changes in the relaxation curve slow down or are even partly reversed. Consequently, after peaking around mid-life, the relaxation metrics show some decline in the final stage. Differences in relaxation behavior at various SOCs provide further insights; for instance, relaxation at high SOC is typically slower and larger in magnitude because polarization is greater when the cell is near full charge, while at low SOC (near end-of-discharge), the relaxation amplitude is also relatively large due to the cell being close to depletion. Overall, relaxation features capture the accumulated effects of internal resistance and diffusion impedance (peaking in mid-life), and they carry useful indications of the aging status.

4. Impedance/resistance features: These include resistances obtained from HPPC tests at various SOC levels (ohmic resistance $R_{ohm}$ and polarization resistance $R_{pol}$), as shown in Figure 25, Figure 26 and Figure 27. The various internal resistance features generally exhibit a trend of initially decreasing slightly and then increasing, with a significant rise observed near the end-of-life.

The results of all correlation calculations are shown in Table 3. In the subsequent model training process, features with a correlation below 0.5 were removed.
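The screening step can be sketched as follows, using a plain-Python Spearman rank correlation (average ranks for ties). The feature names and values are hypothetical, and comparing the absolute correlation against 0.5 is an assumption: the text states the threshold but not how negatively correlated features (such as resistances that rise as the SOH falls) are handled.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def screen_features(features, soh, threshold=0.5):
    """Keep features whose |Spearman rho| with the SOH reaches the threshold."""
    kept = {}
    for name, values in features.items():
        rho = spearman(values, soh)
        if abs(rho) >= threshold:
            kept[name] = rho
    return kept


# Hypothetical feature histories sampled at five capacity tests.
soh = [1.00, 0.95, 0.90, 0.85, 0.80]
features = {
    "IC_Area2": [5.0, 4.6, 4.1, 3.3, 2.9],  # falls with the SOH
    "R_pol_80": [1.0, 1.1, 1.3, 1.6, 2.2],  # rises as the SOH falls
    "noisy_hf": [3.0, 1.0, 4.0, 1.0, 5.0],  # weakly related
}
kept = screen_features(features, soh)
```

Here the two monotonic features are kept with correlations of +1 and −1, while the noisy one (about −0.41) is dropped.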

4. Results of Model Training and Application

4.1. Offline Model Performance Evaluation

Following the methodology of Section 2.1, we conducted stage-wise, multi-model offline training. Using the stage partitioning criteria from Section 2.1.1, the training dataset was divided into three aging stages. The cumulative correlation ratios are plotted in Figure 28, which indicates global inflection points at A = 0.90 and B = 0.82.

For each stage, we further grouped the data into three subsets by the aging rate according to the method of Section 2.1.2. We randomly selected one cell from each rate group as a test sample (C2-DOD50-2, C1-DOD70-1, and C1.2-DOD30-1, representing short-, medium-, and long-life cases, respectively) to evaluate the method’s online application performance; these cells’ data were not used in training. Because the number of samples within each stage is limited, the “fast/medium/slow” rate grouping was performed by dividing the $k_{SOH}$ values for that stage into three ranges of equal frequency (using that stage’s 33rd and 67th percentiles as thresholds). In total, nine model subsets were obtained, each containing data from a certain set of cells/time points. The model numbering is shown in Table 4.
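The three-stage by three-rate partition can be sketched as below. The stage boundaries A = 0.90 and B = 0.82 and the 33rd/67th percentile thresholds come from the text; the boundary inclusiveness, the interpolation rule for the percentiles, and the `(stage, rate)` keys are assumptions, and the mapping of these keys to the model numbers of Table 4 is deliberately left out.

```python
def percentile(values, p):
    """p-th percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = p / 100 * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def stage_of(soh, a=0.90, b=0.82):
    """Assign an aging stage from the two inflection points A and B."""
    if soh > a:
        return "early"
    if soh > b:
        return "middle"
    return "late"


def rate_groups(k_values):
    """Equal-frequency slow/medium/fast labels from |k_SOH| within one stage."""
    mags = [abs(k) for k in k_values]
    p33, p67 = percentile(mags, 33), percentile(mags, 67)
    return ["slow" if m <= p33 else "medium" if m <= p67 else "fast" for m in mags]


def partition(samples):
    """samples: list of (soh, k_soh). Returns {(stage, rate): [sample indices]}."""
    by_stage = {}
    for idx, (soh, _) in enumerate(samples):
        by_stage.setdefault(stage_of(soh), []).append(idx)
    subsets = {}
    for stage, idxs in by_stage.items():
        labels = rate_groups([samples[i][1] for i in idxs])
        for i, label in zip(idxs, labels):
            subsets.setdefault((stage, label), []).append(i)
    return subsets


# Hypothetical samples: six per stage with increasing aging rates.
samples = [(soh, k * 1e-5)
           for soh in (0.95, 0.86, 0.81)
           for k in (1, 2, 3, 4, 5, 6)]
subsets = partition(samples)
```

Because the percentile thresholds are computed per stage, each of the nine subsets receives the same number of samples in this example, which is the purpose of equal-frequency binning when the data within a stage are scarce.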

Each model was trained only on its subset’s data, with a portion of each subset held out for evaluation. The models used 100 decision trees; the maximum tree depth, minimum leaf samples, and other hyperparameters were determined via the RandomizedSearchCV method to balance accuracy and computational speed.
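A minimal sketch of this training step with scikit-learn is given below. The synthetic data, the search ranges, the number of search iterations, and the scoring metric are assumptions chosen only to make the example run; they are not the settings used in the study. Reading the spread of per-tree predictions as an empirical SOH distribution is likewise an assumed, common way to obtain uncertainty from a random forest and does not reproduce the details of the improved RFR.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
# Synthetic stand-in for one model subset: 4 health features, SOH-like target.
X = rng.uniform(size=(120, 4))
y = 1.0 - 0.2 * X[:, 0] + 0.01 * rng.normal(size=120)

search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_distributions={
        "max_depth": [3, 5, 8, None],
        "min_samples_leaf": [1, 2, 4, 8],
        "max_features": [0.5, 0.75, 1.0],
    },
    n_iter=6,                          # number of sampled configurations
    cv=3,                              # 3-fold cross-validation
    scoring="neg_mean_absolute_error",
    random_state=0,
)
search.fit(X, y)
forest = search.best_estimator_        # refit on all data with the best settings

# Per-tree predictions for one input give an empirical predictive distribution.
x_new = X[:1]
tree_preds = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
mu = tree_preds.mean()
q05 = np.quantile(tree_preds, 0.05)
```

The forest’s own prediction equals the mean of the per-tree predictions, so `mu` matches `forest.predict(x_new)[0]`, while `q05` supplies a lower bound of the kind used for the tail indicator.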

Figure 29, Figure 30 and Figure 31 show the SOH estimation results and feature importance rankings for the representative models from the early stage (Model 3), middle stage (Model 5), and late stage (Model 8) on their respective test cells.

1. High-SOH (early) stage:

At the beginning of life, the model relies most on the area of the second IC peak (IC_Area2), followed by the height of the second peak (IC_Peak2). This indicates that, in the initial stage, the features of the second peak of the incremental capacity curve during charging are the most sensitive to the SOH. Early capacity fade is predominantly controlled by the loss of cyclable lithium (e.g., the initial SEI formation consuming Li+). Internal-resistance-related features contribute negligibly to this stage, consistent with the fact that internal resistance growth is very slow early on and does not limit capacity.

2. Middle stage:

IC_Area2 still ranks first, but VAR has risen to second place; in addition, the Peak 2 height, Peak 1 area, and several resistance features (e.g., ohmic resistance at 60% SOC, polarization resistance at 80% SOC) enter the top five. This suggests that, by mid-life, capacity fade transitions to being co-driven by active material loss and increasing internal resistance. On the one hand, the VAR feature reflects changes in the overall shape of the charge curve, indicating that the distortion of the charge–discharge voltage profile is worsening (implying changes in electrode porosity structure and reversible capacity); on the other hand, resistance features begin to assume significant importance, indicating that, by the middle stage, the polarization impedance has risen enough to noticeably affect capacity. Mid-life degradation is thus a process jointly dominated by active material loss and resistance growth [29].

3. Low-SOH (late) stage:

In the final life stage, the derivative of IC_Area2 (k_IC_Area2) appears among the top five features, indicating that the model is now leveraging not only the absolute values of peaks but also their rates of decline to judge the aging trend. Relaxation time features also gain some weight, suggesting that towards end-of-life, failure mechanisms like lithium plating and current collector corrosion may emerge, manifested by a sudden capacity drop accompanied by sharply rising polarization. Among resistance features, polarization resistance metrics rank higher than ohmic resistance, implying that, in the low-SOH region, the aging is mainly governed by diffusion limitations and double-layer polarization processes. In sum, the late-stage model relies on a more diverse array of features, including direct capacity-loss indicators as well as indirect indicators of accelerated degradation kinetics.

To further understand each model’s decision basis, we extracted the feature importance rankings for the representative model of each stage. In summary, the top features for the stage-specific models (in the order of cumulative importance for that stage) are as follows:

(1.0, 0.9): IC_Area2, IC_Peak2; (0.9, 0.82): IC_Area2, VARs, IC_Peak2, IC_Area1, R_ohms_60, R_ps_80, R_ps_40, R_ps_20, Qsc, R_ohms_20, R_ps_60, k_IC_Peak2, R_ps_50; (0.82, 0): IC_Area2, VARs, IC_Area1, k_IC_Area2, IC_Peak2, R_ps_60, R_ps_20, R_ps_40, Qsc, R_ps_80, relaxation_time_50, R_ohms_20, relaxation_time_60, Qec.

Clearly, the models in different stages depend on features very differently. Early-life degradation is mainly driven by the loss of cyclable lithium (SEI layer formation), mid-life shifts towards a combination of active material loss and impedance growth, and late life may see sudden mechanisms like lithium plating becoming dominant. Notably, IC_Area2 shows extremely high predictive value in all stages, but extracting IC curve features like IC_Area2 requires low-C-rate standard charge–discharge tests, which may not always be available in field applications. If a full IC curve cannot be obtained, the VAR feature can serve as a substitute: VAR essentially captures how much the charging voltage curve shape deviates from that of a fresh battery, thereby aggregating multiple aging effects to some extent.

4.2. Online Application Case Studies

We selected three cells (one from each life category) to demonstrate the online prediction performance. Figure 32, Figure 33 and Figure 34 show the SOH prediction results and the evolution of their predicted probability distributions after each capacity test over the entire life of each cell.

The results indicate that the stage-wise models achieve high prediction accuracy for cells with different aging characteristics: at each capacity test point, the predicted SOH mean is very close to the measured value, with MAPE within 1%. For example, for the longest-lived cell (C1.2-DOD30-1), the average error between the predicted SOH and measured SOH at each stage is under 0.5%, with a maximum error of only about 0.8%; for the medium-life cell (C1-DOD70-1), the errors are slightly above 1%; for the shortest-lived cell (C2-DOD50-2), which exhibited an abrupt change in aging rate, the prediction error is around 0.7%. Compared to using a single model without stage partitioning, our method significantly improves long-term SOH prediction accuracy and stability. In particular, in late life, a single global model often underestimates the degree of degradation due to limited training data for that region, whereas the stage-wise models effectively avoid this issue, keeping the cumulative error across the lifespan low.
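For reference, the MAPE quoted above can be computed as in the following sketch. The measured and predicted values are hypothetical and serve only to show the arithmetic.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent, relative to the measured SOH."""
    errors = [abs((m - p) / m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical SOH values at three capacity tests.
measured = [1.00, 0.90, 0.80]
predicted = [0.99, 0.909, 0.80]
score = mape(measured, predicted)  # relative errors of 1%, 1%, and 0%
```

Because each error is normalized by the measured SOH, the same absolute deviation weighs slightly more in late life, where the SOH is lower.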

It should be emphasized that, because capacity measurements themselves have some error, pointwise prediction errors are not the sole basis for evaluating model performance. The evolution of SOH with cycling and its predicted probability bands align more closely with the actual aging trajectory, demonstrating the model’s practical value.

In the case of cell C2-DOD50-2, the third and fourth capacity tests showed anomalous results (a slight capacity recovery), which may have misled the experimenters: by the next test, the SOH was already near 0.8. To avoid a safety incident, a follow-up test was conducted shortly after, confirming that the SOH had fallen below 0.8, and the experiment was terminated. In fact, accelerated aging should have been detected by the fourth test. With our method, a warning is issued before the third capacity test, which would prompt maintenance personnel to increase the test frequency and thus avert the risk. Additionally, looking back at the later data, the SOH predicted before the third test likely reflected the true aging path more accurately than the temporarily higher measured value. For cell C1-DOD70-1, accelerated aging could only be confirmed (by a notable change in average decay slope) after the 7th capacity test, whereas our method raised an alert right after the 5th test. For cell C1.2-DOD30-1, the overall aging rate was so slow that, even by the 6th capacity test, it was unclear whether aging had accelerated, yet our method issued warnings continuously during that phase of operation.

To test the model’s resilience under data deficiencies, we performed a simulation on a sample cell (C2-DOD50-2) in which we introduced feature dropouts and noise: 5% of the input data points were randomly replaced with NaN values. The results, shown in Figure 35, indicate that the predicted SOH distribution interval widens notably, but the overall trend assessment remains correct, and the early-warning lead time still exceeds 50 cycles.
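The corruption step of such a robustness test can be sketched as follows. Dropping individual feature values (rather than whole rows or whole features) and the fixed random seed are assumptions; how the models treat the resulting NaN entries is not specified here and is left outside the sketch.

```python
import math
import random


def inject_dropout(rows, fraction=0.05, seed=0):
    """Return a copy of `rows` with a random fraction of entries set to NaN."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_drop = round(fraction * len(cells))
    corrupted = [list(row) for row in rows]   # leave the input untouched
    for i, j in rng.sample(cells, n_drop):    # sample without replacement
        corrupted[i][j] = math.nan
    return corrupted


# Hypothetical feature matrix: 20 time points x 10 health features.
clean = [[1.0] * 10 for _ in range(20)]
noisy = inject_dropout(clean)
n_missing = sum(math.isnan(v) for row in noisy for v in row)
```

Sampling cell positions without replacement guarantees that exactly the requested share of entries is missing, which keeps repeated trials comparable.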

We further compared the proposed approach with a Gaussian Process Regression (GPR) [30] model on the cell C2-DOD50-2 using the same input features and training data (Figure 36). Under the assumption of Gaussian noise, the GPR provides a predictive mean along with the 95% predictive interval. Although the mean SOH estimate generally follows the overall aging trend, the prediction intervals remain relatively narrow and symmetric during the early cycles, offering limited indication of the impending accelerated degradation. Only after a pronounced SOH decline do the uncertainty bounds expand substantially, resulting in a noticeably delayed warning compared with the proposed RFR-based method.

Table 5 presents a quantitative comparison between the improved RFR and the GPR baseline. GPR’s performance in mean prediction is also acceptable. However, within the standard GPR framework adopted in this study, the uncertainty characterization is primarily governed by smoothness priors and observation noise assumptions, making it less effective at capturing the asymmetric risk associated with abrupt transitions in aging trajectories at an early stage.
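The difference between a symmetric Gaussian interval and an empirical, possibly skewed interval can be illustrated with a toy sample. The values below are hypothetical ensemble outputs, not results from the study. The Gaussian summary uses the mean plus or minus 1.96 standard deviations, which is symmetric by construction, as is the 95% predictive interval of a GPR with Gaussian noise.

```python
from statistics import fmean, pstdev


def empirical_quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def skewness(values):
    """Population skewness; negative values indicate a heavier lower tail."""
    mu, sd = fmean(values), pstdev(values)
    return fmean([((v - mu) / sd) ** 3 for v in values])


# Hypothetical SOH predictions: most members agree, a few predict a sharp drop.
preds = [0.90] * 16 + [0.88, 0.86, 0.82, 0.80]
mu, sd = fmean(preds), pstdev(preds)

gauss_lo, gauss_hi = mu - 1.96 * sd, mu + 1.96 * sd   # symmetric about mu
emp_lo = empirical_quantile(preds, 0.025)
emp_hi = empirical_quantile(preds, 0.975)

lower_gap = mu - emp_lo   # distance from the mean to the lower bound
upper_gap = emp_hi - mu   # distance from the mean to the upper bound
```

In this sample the empirical lower gap is several times the upper gap and the skewness is negative. A Gaussian summary of the same sample spreads that downside risk evenly on both sides of the mean, which is one way to see why a symmetric interval can be slower to flag an emerging lower tail.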

5. Conclusions

Focusing on lithium-ion battery SOH estimation, we proposed a stage-wise prediction method. The main conclusions drawn from this study are summarized as follows:

Stage-wise multi-model strategy: Segmenting the battery life into stages and training dedicated models for different aging rates can make prediction models more targeted. With each stage’s model handling its respective segment, long-term prediction accuracy and robustness are significantly improved.
Improved RFR with uncertainty quantification: The improved RFR algorithm provides an uncertainty evaluation of the results, enabling trend forecasting. For incipient accelerated decay, the model’s output distribution becomes significantly negatively skewed or even bimodal, giving a clear warning signal well before the SOH actually plunges.
Adaptability to missing data and condition changes: By selecting multi-dimensional HFs and performing importance screening, the model does not fail if some HFs cannot be obtained or deviate abnormally. Meanwhile, the importance ranking of the remaining features provides insights into the primary aging modes.

The areas for further research include the following:

The short-term HF prediction model is based on aging data. For different manufacturers, cell chemistries, or markedly different operating conditions (e.g., much higher C-rates or different temperature regimes), additional aging experiments are needed to enrich the spectrum of aging paths and improve the map model’s coverage. In the future, techniques like transfer learning could be explored to quickly transfer the existing model to new types of batteries.
In practical deployment, the frequency of capacity tests should be optimized. Our current strategy performs a capacity test approximately once per month to update the model, but an adaptive schedule could be considered. Further field validation and study of such adaptive diagnostic scheduling would help balance maintenance efforts and early warning effectiveness, ensuring the model’s benefits are fully utilized in real-world operations.

Author Contributions

Conceptualization, W.X., W.G. and J.J.; methodology, J.J., H.X. and W.X.; software, J.J. and H.X.; validation, K.H., W.X. and W.Z.; formal analysis W.X.; investigation, J.J. and K.H.; resources, H.L., K.H., W.X. and W.Z.; writing—original draft preparation, J.J. and W.X.; writing—review and editing, J.J., W.G., H.L., W.Z. and W.X.; visualization, H.X. and J.J.; supervision, W.G., W.X. and K.H.; funding acquisition, K.H., H.L. and W.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China Joint Fund (Project No.: U23B20111, Project Title: Theory of Uncertainty for High-Proportion Wind and Solar Power Sources and Grid Regulation Technology).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We thank the students and teachers of Chongqing University who participated in the experiment for more than three years. We also thank the researchers cited in the references for their results on health feature extraction.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SOH	State of health
DOD	Depth of discharge
HF	Health feature
RFR	Random forest regression
Ah	Ampere-hour
VAR	Capacity variance (variance of throughput over voltage range)
IC	Incremental capacity
IC_Area1	Area under the first peak of the IC curve
IC_Area2	Area under the second peak of the IC curve
IC_Peak1	Peak value of the first peak in the IC curve
IC_Peak2	Peak value of the second peak in the IC curve
k_IC_Area2	Degradation rate of IC_Area2
Q_sc	Capacity in semi-charged segment (voltage-based)
Q_ec	Capacity in end-charged segment (voltage-based)
DCR	Direct current resistance
R_ohm	Ohmic resistance
R_pol	Polarization resistance
HPPC	Hybrid Pulse Power Characterization
FUDS	Federal Urban Driving Schedule
$p_{AA}$	Probability of Accelerated Aging
$k_{SOH}$	SOH degradation rate (slope)
$k_{HF}$	Degradation rate of health feature
ECDF	Empirical cumulative distribution function
MAPE	Mean absolute percentage error
BMS	Battery management system
MAP	Feature Evolution Mapping Model
DST	Dynamic Stress Test
SEI	Solid Electrolyte Interphase

References

Tao, S.; Guo, R.; Lee, J.; Moura, S.; Casals, L.C.; Jiang, S.; Shi, J.; Harris, S.; Zhang, T.; Chung, C.Y.; et al. Immediate remaining capacity estimation of heterogeneous second-life lithium-ion batteries via deep generative transfer learning. Energy Environ. Sci. 2025, 18, 7413–7426.
Wei, Z.; Ruan, H.; Li, Y.; Li, J.; Zhang, C.; He, H. Multistage state of health estimation of lithium-ion battery with high tolerance to heavily partial charging. IEEE Trans. Power Electron. 2022, 37, 7432–7442.
Huang, X.; Tao, S.; Liang, C.; Ma, R.; Wang, X.; Xia, B.; Zhang, X. Robust and generalizable lithium-ion battery health estimation using multi-scale field data decomposition and fusion. J. Power Sources 2025, 642, 236939.
Guo, J.; Li, Y.; Meng, J.; Pedersen, K.; Gurevich, L.; Stroe, D.-I. Understanding the mechanism of capacity increase during early cycling of commercial NMC/graphite lithium-ion batteries. J. Energy Chem. 2022, 74, 34–44.
An adaptive and interpretable SOH estimation method for lithium-ion batteries based-on relaxation voltage cross-scale features and multi-LSTM-RFR2. Energy 2024, 304, 132167.
Ibraheem, R.; Strange, C.; dos Reis, G. Capacity and internal resistance of lithium-ion batteries: Full degradation curve prediction from voltage response at constant current at discharge. J. Power Sources 2023, 556, 232477.
Li, K.; Xie, N. Battery health prognostics based on improved incremental capacity using a hybrid grey modelling and gaussian process regression. Energy 2024, 303, 131888.
Ospina Agudelo, B.; Zamboni, W.; Postiglione, F.; Monmasson, E. Battery state-of-health estimation based on multiple charge and discharge features. Energy 2023, 263, 125637.
Zhang, Y.; Liu, Y.; Wang, J.; Zhang, T. State-of-health estimation for lithium-ion batteries by combining model-based incremental capacity analysis with support vector regression. Energy 2022, 239, 121986.
Peng, J.; Zhao, X.; Ma, J.; Meng, D.; Jia, S.; Zhang, K.; Gu, C.; Ding, W. State of health estimation of Li-ion battery via incremental capacity analysis and internal resistance identification based on kolmogorov–arnold networks. Batteries 2024, 10, 315.
Tao, S.; Ma, R.; Zhao, Z.; Ma, G.; Su, L.; Chang, H.; Chen, Y.; Liu, H.; Liang, Z.; Cao, T.; et al. Generative learning assisted state-of-health estimation for sustainable battery recycling with random retirement conditions. Nat. Commun. 2024, 15, 10154.
Xu, B.; Shi, J.; Li, S.; Li, H.; Wang, Z. Energy consumption and battery aging minimization using a Q-learning strategy for a battery/ultracapacitor electric vehicle. Energy 2021, 229, 120705.
Dong, G.; Wei, J. A physics-based aging model for lithium-ion battery with coupled chemical/mechanical degradation mechanisms. Electrochim. Acta 2021, 395, 139133.
Zhuo, M.; Offer, G.; Marinescu, M. Degradation model of high-nickel positive electrodes: Effects of loss of active material and cyclable lithium on capacity fade. J. Power Sources 2023, 556, 232461.
Tao, S.; Zhang, M.; Zhao, Z.; Li, H.; Ma, R.; Che, Y.; Sun, X.; Su, L.; Sun, C.; Chen, X.; et al. Non-destructive degradation pattern decoupling for early battery trajectory prediction via physics-informed learning. Energy Environ. Sci. 2025, 18, 1544–1559.
Zou, Q.; Wen, J. Battery state-of-health estimation incorporating model uncertainty based on bayesian model averaging. Energy 2024, 308, 132884.
Feng, X.; Zhang, Y.; Xiong, R.; Tang, A. Estimating battery state of health with 10-min relaxation voltage across various charging states of charge. iEnergy 2023, 2, 308–313.
Li, X.; Yu, D.; Vilsen, S.B.; Subramanian, V.R.; Stroe, D.-I. Robust remaining useful lifetime prediction for lithium-ion batteries with dual gaussian process regression-based ensemble strategies on limited sample data. IEEE Trans. Transp. Electrif. 2025, 11, 6279–6290.
Liu, H.; Li, C.; Hu, X.; Li, J.; Zhang, K.; Xie, Y.; Wu, R.; Song, Z. Multi-modal framework for battery state of health evaluation using open-source electric vehicle data. Nat. Commun. 2025, 16, 1137.
Wang, G.; Lyu, Z.; Li, X. An optimized random forest regression model for Li-ion battery prognostics and health management. Batteries 2023, 9, 332.
Tao, S.; Ma, R.; Chen, Y.; Liang, Z.; Ji, H.; Han, Z.; Wei, G.; Zhang, X.; Zhou, G. Rapid and sustainable battery health diagnosis for recycling pretreatment using fast pulse test and random forest machine learning. J. Power Sources 2024, 597, 234156.
Shi, Z.; Zhu, C.; Liang, H.; Wang, S.; Yu, C. Multiple measurement health factors extraction and transfer learning with convolutional-BiLSTM algorithm for state-of-health evaluation of energy storage batteries. Ionics 2025, 31, 1699–1717.
Movahedi, H.; Pannala, S.; Siegel, J.B.; Stefanopoulou, A.G. Physics-informed optimal experiment design of calendar aging tests and sensitivity analysis for SEI parameters estimation in lithium-ion batteries. IFAC-PapersOnLine 2023, 56, 433–438.
Xiao, W.; Miao, S.; Jia, J.; Zhu, Q.; Huang, Y. Lithium-ion batteries fault diagnosis based on multi-dimensional indicator. IET Conf. Proc. 2022, 2021, 96–101.
Xu, H.; Jia, J.; Xiao, W.; Hou, L.; Shang, Y. A high-precision state of health estimation method based on data augmentation for large-capacity lithium-ion batteries. J. Energy Storage 2024, 102, 114028.
He, X.; Wu, Z.; Bai, J.; Zhu, J.; Lv, L.; Wang, L. A novel SOH estimation method for lithium-ion batteries based on the PSO–GWO–LSSVM prediction model with multi-dimensional health features extraction. Appl. Sci. 2025, 15, 3592.
Mawonou, K.S.R.; Eddahech, A.; Dumur, D.; Beauvois, D.; Godoy, E. State-of-health estimators coupled to a random forest approach for lithium-ion battery aging factor ranking. J. Power Sources 2021, 484, 229154.
Xiao, W.; Jia, J.; Zhong, W.; Liu, W.; Wu, Z.; Jiang, C.; Li, B. A novel differentiated control strategy for an energy storage system that minimizes battery aging cost based on multiple health features. Batteries 2024, 10, 143.
Fan, C.; Tian, X.; Gu, C. Perturbation-based battery impedance characterization methods: From the laboratory to practical implementation. Batteries 2024, 10, 414.
Luo, X.; Song, Y.; Bu, W.; Liang, H.; Zheng, M. A Joint Prediction of the State of Health and Remaining Useful Life of Lithium-Ion Batteries Based on Gaussian Process Regression and Long Short-Term Memory. Processes 2025, 13, 239.

Figure 1. Algorithmic framework and workflow of the proposed SOH prediction method.

Figure 2. A typical segmented aging curve.

Figure 3. Probability density distribution of $|k_{SOH}|$ and the mapping relationship of $p_{AA}$.

Figure 4. Probability density distribution of $r_{i}$ and the mapping relationship of $p_{AA}^{*}$.

Figure 5. Operational data from a representative energy storage power station.

Figure 6. Illustration of the HF prediction procedure.

Figure 7. Original and fitted VAR curves.

Figure 8. Statistical distribution of calculated VARs for two batteries.

Figure 9. Schematic diagram of multiple paths with VAR = 0.5.

Figure 10. Map of C, DOD, and k with VAR = 0.5.

Figure 11. Map of C, DOD, and k with VAR = 0.3, 0.5, and 0.7.

Figure 12. Partial test profile of Cell #1: (a) current and (b) voltage.

Figure 13. One test cycle of Cell #1 (performance tests a–e, aging cycles f).

Figure 14. Aging curves of all tested cells.

Figure 15. Aging curves of all cells with polynomial fitting.

Figure 16. Capacity-related features vs. accumulated Ah for all cells.

Figure 17. Capacity-related features vs. SOH for all cells.

Figure 18. Slopes of capacity features vs. $k_{SOH}$ for all cells.

Figure 19. IC curve features (peak areas, heights) vs. Ah for all cells.

Figure 20. IC curve features vs. SOH for all cells.

Figure 21. Slopes of IC features vs. $k_{SOH}$ for all cells.

Figure 22. Relaxation features (voltage recovery) vs. Ah for all cells.

Figure 23. Relaxation features vs. SOH for all cells.

Figure 24. Slopes of relaxation features vs. $k_{SOH}$ for all cells.

Figure 25. Ohmic and polarization resistance vs. Ah for all cells.

Figure 26. Ohmic and polarization resistance vs. SOH for all cells.

Figure 27. Ohmic and polarization resistance slopes vs. $k_{SOH}$ for all cells.

Figure 28. Cumulative correlation ratio trends for all data (global breakpoints A = 0.90 and B = 0.82).

Figure 29. SOH estimation and feature importance for test cell using Model 3 (early stage).

Figure 30. SOH estimation and feature importance for test cell using Model 5 (middle stage).

Figure 31. SOH estimation and feature importance for test cell using Model 8 (late stage).

Figure 32. Online SOH prediction and accelerated-aging warning for cell C2-DOD50-2.

Figure 33. Online SOH prediction and accelerated-aging warning for cell C1-DOD70-1.

Figure 34. Online SOH prediction and accelerated-aging warning for cell C1.2-DOD30-1.

Figure 35. Online SOH prediction and accelerated-aging warning for cell C2-DOD50-2 with injected data errors (5% missing data).

Figure 36. Online SOH prediction and accelerated-aging warning for cell C2-DOD50-2 by GPR.

Table 1. Typical health features corresponding to different aging stages.

Feature	Req. Cond.?	Computational Cost	Slow Aging	Accelerated Aging	Thermal
Maximum available Li-ion concentration	Yes	High	✓	✓
SEI film resistance [23]	Yes	High	✓	✓
Overpotential (η)	Yes	High		✓	✓
Electrolyte loss	Yes	High		✓	✓
Active material loss	No	Medium	✓	✓	✓
IC curve features	Yes	Medium	✓	✓
Relaxation performance features	No	Medium	✓	✓	✓
Capacity variance (VAR) [24]	No	Low	✓	✓
Voltage segment capacities (Qsc, Qec) [25]	No	Low	✓	✓
HPPC-derived ohmic and polarization R	Yes	Low	✓	✓	✓
Pulse-derived equivalent resistance	No	Low	✓	✓	✓
Statistical metrics from CCCV curves	Yes	Low	✓	✓

Table 2. Aging test matrix.

DOD (%) \ C-rate (C)	1	1.2	1.5	2
30	1, 2	9, 10	17, 18	25, 26
50	3, 4	11, 12	19, 20	27, 28
70	5, 6	13, 14	21, 22	29, 30
100	7, 8	15, 16	23, 24	31, 32
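
The layout of Table 2 follows a regular pattern: each combination of C-rate and DOD is covered by two cells, and the numbering runs down the DOD rows before moving to the next C-rate column. A minimal Python sketch of that pattern is given here. The function names and the label format (modelled on identifiers such as C2-DOD50-2 in the figure captions) are illustrative assumptions, as is the idea that the trailing index of a label simply counts the two replicate cells.

```python
# Sketch of the Table 2 numbering pattern. Helper names are hypothetical.
C_RATES = [1, 1.2, 1.5, 2]   # column headers of Table 2
DODS = [30, 50, 70, 100]     # row headers of Table 2 (DOD in %)


def cell_numbers(c_rate, dod):
    """Return the two cell numbers listed in Table 2 for one test condition."""
    i = C_RATES.index(c_rate)  # column index, 0..3
    j = DODS.index(dod)        # row index, 0..3
    first = 8 * i + 2 * j + 1  # 8 cells per C-rate column, 2 per DOD row
    return first, first + 1


def cell_label(c_rate, dod, replicate):
    """Build a label in the style of the figure captions (assumed format)."""
    return f"C{c_rate:g}-DOD{dod}-{replicate}"


if __name__ == "__main__":
    # Print every test condition with its pair of cell numbers.
    for c in C_RATES:
        for d in DODS:
            print(cell_label(c, d, 1), cell_numbers(c, d))
```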

Table 3. Summary of correlations between health features and SOH metrics.

k_HF	k_VARs	k_Qec	k_Qsc	k_IC_Area1	k_IC_Area2	k_IC_Peak1
	0.688	−0.532	−0.572	0.390	−0.810	0.375
	k_IC_Peak2	k_IC_Voltage1	k_IC_Voltage2	k_relaxation_time_20	k_relaxation_time_40	k_relaxation_time_50
	−0.723	0.446	0.299	−0.012	−0.030	−0.038
	k_relaxation_time_60	k_relaxation_time_80	k_R_ohms_20	k_R_ohms_40	k_R_ohms_50	k_R_ohms_60
	−0.026	0.014	0.214	0.193	0.207	0.228
	k_R_ohms_80	k_R_ps_20	k_R_ps_40	k_R_ps_50	k_R_ps_60	k_R_ps_80
	0.190	0.209	0.208	0.200	0.205	0.211
HF	VARs	Qec	Qsc	IC_Area1	IC_Area2	IC_Peak1
	−0.987	0.662	0.968	−0.934	0.990	−0.910
	IC_Peak2	IC_Voltage1	IC_Voltage2	relaxation_time_20	relaxation_time_40	relaxation_time_50
	0.981	−0.742	−0.344	0.684	0.792	0.813
	relaxation_time_60	relaxation_time_80	R_ohms_20	R_ohms_40	R_ohms_50	R_ohms_60
	0.802	0.654	−0.638	−0.759	−0.750	−0.725
	R_ohms_80	R_ps_20	R_ps_40	R_ps_50	R_ps_60	R_ps_80
	−0.607	−0.793	−0.833	−0.842	−0.829	−0.759
	DCRs_20	DCRs_40	DCRs_50	DCRs_60	DCRs_80	-
	−0.700	−0.778	−0.792	−0.761	−0.660	-
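
One way to read Table 3 is to rank features by the absolute value of their correlation with SOH, a common first step in correlation-based screening. The sketch below does this for a subset of the HF rows transcribed from the table. The 0.9 cut-off and the function name are illustrative assumptions and are not values taken from the paper.

```python
# Subset of the HF-vs-SOH correlations transcribed from Table 3.
HF_SOH_CORR = {
    "VARs": -0.987,
    "Qec": 0.662,
    "Qsc": 0.968,
    "IC_Area1": -0.934,
    "IC_Area2": 0.990,
    "IC_Peak1": -0.910,
    "IC_Peak2": 0.981,
    "IC_Voltage1": -0.742,
    "IC_Voltage2": -0.344,
    "R_ps_50": -0.842,
}


def rank_by_abs_corr(corr, threshold=0.9):
    """Return (name, r) pairs with |r| >= threshold, strongest first.

    The threshold is a hypothetical screening value, not one from the paper.
    """
    kept = [(name, r) for name, r in corr.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for name, r in rank_by_abs_corr(HF_SOH_CORR):
        print(f"{name:10s} {r:+.3f}")
```

With the assumed cut-off of 0.9, six of the listed features are retained, led by IC_Area2 and VARs.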

Table 4. Indexing of models corresponding to different aging stages.

SOH Range	k Range	Number of Batteries Available for Training	Model Number
0.9–1.0	0.0042–0.0071	13	1
	0.0071–0.0100	11	2
	0.0100–0.0129	6	3
0.82–0.9	0.0049–0.0144	27	4
	0.0144–0.0238	2	5
	0.0238–0.0333	1	6
0–0.82	0.0047–0.0225	27	7
	0.0225–0.0403	2	8
	0.0403–0.0581	1	9
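
Table 4 can be read as a two-level lookup: the SOH value selects the stage, and the aging rate k selects one of three models within that stage. A minimal sketch of such a lookup follows. The function name, the treatment of interval boundaries (lower edge inclusive), and the clamping of out-of-range k values to the nearest bin are assumptions made for illustration; the table does not specify them.

```python
from bisect import bisect_right

# (SOH lower bound, SOH upper bound, k bin edges, first model number),
# transcribed from Table 4.
STAGES = [
    (0.90, 1.00, (0.0042, 0.0071, 0.0100, 0.0129), 1),
    (0.82, 0.90, (0.0049, 0.0144, 0.0238, 0.0333), 4),
    (0.00, 0.82, (0.0047, 0.0225, 0.0403, 0.0581), 7),
]


def select_model(soh, k):
    """Return the Table 4 model number for a given SOH and aging rate k.

    Assumptions (not stated in the table): SOH intervals include their
    lower edge, SOH = 1.0 belongs to the first stage, the magnitude of k
    is used, and k outside the tabulated range falls into the nearest bin.
    """
    k = abs(k)
    for low, high, edges, first_model in STAGES:
        in_stage = low <= soh < high or (soh == high == 1.00)
        if in_stage:
            # Only the two interior edges matter once out-of-range k is clamped.
            interior = edges[1:-1]
            return first_model + bisect_right(interior, k)
    raise ValueError(f"SOH {soh} is outside the range covered by Table 4")


if __name__ == "__main__":
    print(select_model(0.95, 0.008))  # early stage, middle k bin
    print(select_model(0.85, 0.020))  # middle stage, middle k bin
    print(select_model(0.70, 0.050))  # late stage, highest k bin
```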

Table 5. Comparison between improved RFR and GPR baseline.

Model	RMSE	MAPE (%)	Early Warning Cycles
Proposed RFR	0.0093	0.78	218
GPR baseline	0.0115	1.03	79
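
The gap between the two rows of Table 5 is easier to judge in relative terms. The sketch below assumes the standard definitions of RMSE and MAPE (the table itself does not restate them) and computes the relative reduction of each error metric for the proposed RFR against the GPR baseline. The function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error (standard definition, assumed here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (standard definition, assumed)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline


# Values transcribed from Table 5.
TABLE_5 = {
    "Proposed RFR": {"rmse": 0.0093, "mape": 0.78, "early_warning_cycles": 218},
    "GPR baseline": {"rmse": 0.0115, "mape": 1.03, "early_warning_cycles": 79},
}

if __name__ == "__main__":
    rfr, gpr = TABLE_5["Proposed RFR"], TABLE_5["GPR baseline"]
    print(f"RMSE reduction: {relative_reduction(gpr['rmse'], rfr['rmse']):.1%}")
    print(f"MAPE reduction: {relative_reduction(gpr['mape'], rfr['mape']):.1%}")
    print("Early-warning difference (cycles):",
          rfr["early_warning_cycles"] - gpr["early_warning_cycles"])
```

On the Table 5 values this gives an RMSE reduction of about 19%, a MAPE reduction of about 24%, and a difference of 139 early-warning cycles.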
