1. Introduction
With the rapid development of electric vehicles (EVs), battery safety has become a critical issue. Battery-related failures account for a substantial proportion of performance degradation and safety incidents in EVs [1,2]. Among various battery chemistries, lithium iron phosphate (LFP) batteries are widely used in cost-effective EVs due to their high thermal stability, long cycle life, and intrinsic safety [3,4]. In addition to LFP, nickel-rich layered oxides such as lithium nickel manganese cobalt oxide (NMC) and lithium nickel cobalt aluminum oxide (NCA) are extensively adopted in high-energy-density EVs, owing to their superior specific capacity and energy density. However, these chemistries are generally more susceptible to thermal runaway and require stricter safety management. Ensuring safe operation over the entire battery life cycle therefore necessitates continuous monitoring of the battery health state [5]. In this context, accurate estimation and prediction of the state of health (SOH) [6] are essential for early fault diagnosis and second-life applications [7].
SOH prediction typically involves four stages: data acquisition, health indicator extraction, mapping of features to SOH, and generalization validation on new data [8]. Common data sources include voltage [9], current [10], temperature, and electrochemical impedance spectroscopy (EIS) [11,12]. However, voltage, current, and temperature signals are highly dynamic under real-world conditions, making it difficult to extract stable and reliable features. EIS, on the other hand, provides richer electrochemical information but is usually collected offline, which limits its real-time applicability due to time-consuming measurements.
Although voltage, current, and temperature signals [13] are highly dynamic under real-world conditions, they can still yield informative health indicators through appropriate signal processing techniques. For instance, incremental capacity analysis (ICA) has been extensively applied to voltage–capacity curves, where peak positions, amplitudes, and areas are extracted as robust indicators of degradation [14]. A random forest (RF)-based approach in [9] averaged IC and IIC peak coordinates to assess battery consistency, achieving 97.2% classification accuracy. However, such methods are often sensitive to noise, baseline drift, and data preprocessing. More recent advances have sought to enhance robustness, such as adaptive ICA schemes that adjust smoothing parameters and sampling intervals according to operating conditions, thereby enabling feature extraction from partial charging data [15]. These developments demonstrate that even under dynamic load profiles, voltage and current signals can provide reliable features if properly processed.
Other approaches employ hybrid pulse power characterization (HPPC) combined with equivalent circuit models (ECMs) [16] to extract resistance and capacitance values. For example, the elite opposition-based learning snake optimizer (EOLSO) was applied in [17] to parameterize a second-order RC model, outperforming conventional algorithms. While ECM-based methods offer physical interpretability, they may struggle to represent complex diffusion dynamics within the battery.
Features can be derived from ECM parameterization or from the distribution of relaxation times (DRT) [18]. DRT approaches, in particular, enable the extraction of relaxation times, peak positions, and integrated areas, though they are highly sensitive to regularization parameters and demand high data integrity to avoid false peaks [19]. In parallel, data-driven methods, especially deep neural networks, have shown the ability to learn features directly from raw signals [20,21]. These approaches can achieve impressive accuracy but rely on large, high-quality datasets and often lack interpretability, which limits their deployment in safety-critical applications.
Overall, SOH modeling approaches can be categorized as physics-based or data-driven. Physics-based models are grounded in the electrochemical mechanisms and dynamics of batteries and offer good interpretability. However, they are difficult to generalize across different chemistries and operational conditions. In contrast, data-driven methods, including machine learning models [22,23,24], offer flexibility and strong predictive capability but are susceptible to overfitting and require substantial training data.
To enable accurate SOH prediction using EIS data within a limited frequency range, this study makes the following contributions: (1) an interpretable empirical prediction model is developed; (2) effective feature parameters are identified from partial EIS data; (3) the influence of state of charge (SOC) is considered in optimizing feature combinations; and (4) the proposed method is validated using EIS measurements under sub-zero temperature conditions, highlighting the effectiveness of inductive-phase impedance features in SOH prediction.
2. Experimental Materials and Methods
To prevent temperature fluctuations caused by environmental influences, the test chamber was precisely maintained at −5 °C throughout the experiment. The experimental subjects were three prismatic LiFePO4 (LFP) batteries (Model: LF50F) produced by EVE Power Co., Ltd. (Jingmen, China), each measuring 148.3 mm (L) × 26.7 mm (W) × 127 mm (H) with a nominal capacity of 50 Ah. In order to investigate the electrochemical impedance responses under sub-zero temperature conditions (−5 °C) at various states of charge (SOC), the following experimental procedures were designed:
To obtain impedance data at different stages of battery health, the cell underwent aging cycles under a controlled ambient temperature of 25 °C using a 1C constant current-constant voltage (CC-CV) charging and 1C constant current discharging protocol. Each aging cycle consisted of 40 full charge–discharge operations. The discharge capacity recorded at the final cycle of each aging phase was used as the reference capacity to indicate the state of health (SOH) at that stage.
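As a minimal sketch of how the reference capacity could be turned into an SOH label, the snippet below assumes that SOH is the ratio of the reference discharge capacity to the 50 Ah nominal capacity. This normalization is an assumption (normalizing by the initial measured capacity is an equally common choice), and the function name and the capacity values are hypothetical.

```python
NOMINAL_CAPACITY_AH = 50.0  # nominal capacity stated for the LF50F cell


def soh_from_capacity(reference_capacity_ah, nominal_capacity_ah=NOMINAL_CAPACITY_AH):
    """Return SOH as a fraction of nominal capacity (assumed definition)."""
    if nominal_capacity_ah <= 0:
        raise ValueError("nominal capacity must be positive")
    return reference_capacity_ah / nominal_capacity_ah


# Hypothetical reference capacities (Ah) after three successive aging phases.
reference_capacities = [50.0, 48.5, 47.0]
soh_labels = [soh_from_capacity(c) for c in reference_capacities]
print(soh_labels)
```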
To capture the impedance response at different SOC levels, the cell was conditioned in a thermal chamber at −5 °C for 2 h. Based on the findings from reference [25], which investigated the rest period effects on LiFePO4 battery impedance measurements, a 2 h stabilization time was adopted in this experiment. The SOC was adjusted using controlled charging and discharging steps. After each aging cycle, electrochemical impedance spectroscopy (EIS) measurements were conducted at SOC levels of 0%, 20%, 40%, 60%, 80%, and 100% using an electrochemical workstation. The EIS tests were performed with an AC excitation current of 3 A over a frequency range from 10 kHz to 1 kHz. As illustrated in Figure 1, the voltage amplitude responses obtained from the EIS measurements under different SOC conditions remain below 10 mV across the selected frequency range (1–10 kHz). This indicates that the applied perturbation (3 A excitation current) is sufficiently small to avoid nonlinear polarization effects. Therefore, the experimental data satisfy the small-signal linearity condition, ensuring that the extracted impedance parameters accurately reflect the intrinsic electrochemical characteristics of the battery system.
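The linearity argument can be checked with simple arithmetic: for a sinusoidal excitation, the voltage response amplitude is the product of the current amplitude and the impedance magnitude. The sketch below uses the 3 A and 10 mV figures from the text; whether 3 A denotes the amplitude or the RMS value is not stated, so the amplitude interpretation and the function names are assumptions.

```python
def voltage_amplitude_mv(current_a: float, impedance_mohm: float) -> float:
    """Voltage response amplitude (mV) for a current amplitude (A) and |Z| (mOhm)."""
    return current_a * impedance_mohm  # A x mOhm = mV


def max_linear_impedance_mohm(current_a: float, voltage_limit_mv: float) -> float:
    """Largest |Z| (mOhm) that keeps the voltage response below the given limit."""
    return voltage_limit_mv / current_a


# Values taken from the text: 3 A excitation and a 10 mV response ceiling.
z_limit = max_linear_impedance_mohm(3.0, 10.0)
print(f"|Z| must stay below {z_limit:.2f} mOhm")
```

Under these assumptions, a response below 10 mV corresponds to an impedance magnitude below roughly 3.3 mΩ over the measured band.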
In this experiment, temperature sensors were deployed at both the geometric center of the battery and the positive tab to monitor the temperature of the cell body and the current connection point, respectively. Temperature data were synchronously recorded using the integrated data acquisition system of the Neware battery testing equipment, which had been properly calibrated and adjusted for measurement accuracy.
3. Methodology
3.1. Feature Extraction
Figure 2 presents the Nyquist plots of impedance spectra obtained at various states of charge (0–100%) under a low-temperature condition of −5 °C, during a specific aging stage. All measurements were performed in the ultra-high frequency range from 10 kHz to 1 kHz, and a clear SOC-dependent variation in impedance behavior can be observed. Notably, the impedance curves exhibit a distinct bending toward the inductive region at higher frequencies, where the imaginary component becomes positive.
This phenomenon is attributed to multiple low-temperature effects, including increased charge transfer resistance, limited ionic mobility, and reduced electrode reaction kinetics, which together cause a rise in the inductive part of the impedance. Although inductive responses in this frequency region are often disregarded in conventional battery EIS analysis—primarily due to their weak correlation with dominant electrochemical storage mechanisms—they may still encode meaningful information. For instance, they can reflect interfacial dynamics, current collector inductance, or transient electrode behavior, especially under extreme conditions. Therefore, this study aims to extract and analyze meaningful features from the EIS curves in the high-frequency inductive region to deepen the understanding of battery characteristics at low temperatures.
To characterize the inductive arc observed in the ultra-high-frequency region of the impedance spectra, an equivalent circuit model (ECM) was constructed, as shown in Figure 1. This model consists of a series connection of an ohmic resistor $R_0$ and a parallel branch composed of a resistor $R_1$ and an inductor $L$. Such a structure is designed to capture the dynamic high-frequency behavior that becomes prominent under low-temperature conditions due to slow interfacial kinetics and limited ion mobility.
The total impedance of this circuit, denoted $Z(\omega)$, is expressed as:
$$Z(\omega) = R_0 + \frac{j\omega L R_1}{R_1 + j\omega L}$$
The real and imaginary components of this impedance are given by:
$$Z'(\omega) = R_0 + \frac{\omega^2 L^2 R_1}{R_1^2 + \omega^2 L^2}, \qquad Z''(\omega) = \frac{\omega L R_1^2}{R_1^2 + \omega^2 L^2}$$
By eliminating the frequency-dependent terms, the impedance locus satisfies the following circular relationship:
$$\left(Z' - R_0 - \frac{R_1}{2}\right)^2 + Z''^2 = \left(\frac{R_1}{2}\right)^2$$
This represents a semicircle in the complex plane, centered at $(R_0 + R_1/2,\ 0)$ with radius $R_1/2$, corresponding to the inductive arc located in the fourth quadrant of the Nyquist plot (with $-Z''$ plotted on the vertical axis).
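The circular locus can be verified numerically. The following sketch evaluates the impedance of a resistor in series with a parallel resistor-inductor branch over 1 to 10 kHz and checks that every point lies at the same distance from a single point on the real axis. The argument names `r0`, `r1`, and `ind` and all component values are hypothetical choices for illustration, not fitted results.

```python
import numpy as np


def series_r_parallel_rl(freq_hz, r0, r1, ind):
    """Impedance of r0 in series with (r1 in parallel with an inductor ind)."""
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    z_ind = 1j * omega * ind
    return r0 + (r1 * z_ind) / (r1 + z_ind)


# Hypothetical component values (ohm, ohm, henry), chosen only for illustration.
R0, R1, IND = 0.5e-3, 1.0e-3, 2.0e-8
freqs = np.logspace(3, 4, 21)  # 1 kHz to 10 kHz
z = series_r_parallel_rl(freqs, R0, R1, IND)

# Every point should sit at distance r1/2 from the point (r0 + r1/2, 0).
distance = np.abs(z - (R0 + R1 / 2.0))
print(distance.min(), distance.max(), R1 / 2.0)
```

The imaginary part is positive at every frequency, which is the inductive behavior described above.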
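To extract model parameters from experimental data, a least-squares fitting method was adopted. The impedance data points $(x_i, y_i) = (Z'_i, Z''_i)$, $i = 1, \dots, N$, measured within the ultra-high-frequency range (10 kHz to 1 kHz), are assumed to satisfy the implicit form of the circular equation:
$$x_i^2 + y_i^2 + a x_i + b = 0$$
This is reformulated as a linear least-squares problem by defining:
$$\mathbf{A} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix}, \qquad \boldsymbol{\theta} = \begin{bmatrix} a \\ b \end{bmatrix}, \qquad \mathbf{d} = -\begin{bmatrix} x_1^2 + y_1^2 \\ \vdots \\ x_N^2 + y_N^2 \end{bmatrix}$$
The objective is to minimize the total squared error:
$$E(a, b) = \sum_{i=1}^{N} \left(x_i^2 + y_i^2 + a x_i + b\right)^2 = \left\lVert \mathbf{A}\boldsymbol{\theta} - \mathbf{d} \right\rVert^2$$
This leads to a solvable linear system, $\mathbf{A}^{\top}\mathbf{A}\,\boldsymbol{\theta} = \mathbf{A}^{\top}\mathbf{d}$, from which the optimal coefficients $a$ and $b$ are obtained. These coefficients are then used to compute the center $x_c = -a/2$ and radius $r = \sqrt{x_c^2 - b}$ of the fitted circle. Based on the identified circle parameters, the corresponding equivalent circuit parameters, namely the parallel resistance $R_1$ and the ohmic resistance $R_0$, are derived as:
$$R_1 = 2r, \qquad R_0 = x_c - r$$
The inductance $L$ is further calculated by substituting the real part of the measured impedance at a fixed high frequency (e.g., 10 kHz) into the analytical expression:
$$Z'(\omega) = R_0 + \frac{\omega^2 L^2 R_1}{R_1^2 + \omega^2 L^2}$$
Solving this equation for $L$ yields the final inductive parameter. This methodology minimizes the total fitting error and effectively transforms a nonlinear geometric identification problem into a linear least-squares formulation, making it well-suited for robust parameter extraction from high-frequency impedance data under extreme low-temperature conditions.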
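The fitting procedure can be sketched as follows, under the assumption that the circle is centered on the real axis, so that its implicit form has two unknown coefficients (called `a` and `b` here). The function names, the synthetic component values, and the choice of the highest measured frequency as the reference point for the inductance are illustrative assumptions; the data are generated from the circuit model rather than measured.

```python
import numpy as np


def fit_real_axis_circle(z):
    """Least-squares fit of x^2 + y^2 + a*x + b = 0 to complex impedance points."""
    x, y = z.real, z.imag
    design = np.column_stack([x, np.ones_like(x)])
    target = -(x**2 + y**2)
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)
    center = -a / 2.0
    radius = np.sqrt(center**2 - b)
    return center, radius


def extract_parameters(freq_hz, z):
    """Recover (r0, r1, inductance) from the fitted circle and one reference point."""
    center, radius = fit_real_axis_circle(z)
    r1 = 2.0 * radius
    r0 = center - radius
    k = int(np.argmax(freq_hz))            # use the highest frequency as reference
    omega = 2.0 * np.pi * freq_hz[k]
    u = z.real[k] - r0                     # real part contributed by the parallel branch
    ind = (r1 / omega) * np.sqrt(u / (r1 - u))
    return r0, r1, ind


# Synthetic, noise-free data from hypothetical component values (ohm, ohm, henry).
R0, R1, IND = 0.5e-3, 1.0e-3, 2.0e-8
freqs = np.logspace(3, 4, 21)
z_ind = 1j * 2.0 * np.pi * freqs * IND
z_data = R0 + R1 * z_ind / (R1 + z_ind)

r0_hat, r1_hat, ind_hat = extract_parameters(freqs, z_data)
print(r0_hat, r1_hat, ind_hat)
```

With measured data the recovered values would also depend on noise and on how much of the arc the 1 to 10 kHz band covers, which this noise-free example does not explore.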
3.2. Feature Weight Optimization
To achieve reliable and accurate estimation of battery State of Health (SOH) under varying environmental and operational conditions, such as temperature and State of Charge (SOC), we propose a hybrid machine learning framework that integrates Bayesian optimization (BO) for adaptive feature weighting with two ensemble regression algorithms: Random Forest (RF) and XGBoost.
Each input sample is represented by a three-dimensional feature vector extracted from Electrochemical Impedance Spectroscopy (EIS) measurements in the ultra-high-frequency range. Specifically, the three circuit parameters identified in the previous section, namely the ohmic resistance $R_0$, the parallel resistance $R_1$, and the inductance $L$, are selected as physically interpretable features that reflect the battery’s internal electrochemical dynamics:
$$\mathbf{x} = [R_0,\ R_1,\ L]^{\top}$$
These features are highly sensitive to changes in battery aging and low-temperature behavior and therefore serve as informative inputs for SOH modeling.
To further enhance prediction accuracy, a feature weighting strategy is employed, where each feature is assigned an optimal weight $w_i$ ($i = 1, 2, 3$) learned via Bayesian optimization. The weighted feature vector is given by:
$$\tilde{\mathbf{x}} = [w_1 R_0,\ w_2 R_1,\ w_3 L]^{\top}$$
The weighted features are then fed into both RF and XGBoost models, which are trained to regress the SOH value. This combination of physics-informed features and data-driven optimization enables the model to generalize well across different SOC levels and temperature conditions.
These features correspond to specific frequency domain characteristics of the battery, capturing impedance-related information at fixed measurement points. However, under varying operational scenarios, the relative importance of these features may change. To accommodate this, a set of trainable feature weights is introduced:
$$\mathbf{w} = [w_1,\ w_2,\ w_3]^{\top}$$
The original feature vector $\mathbf{x}$ is then transformed via element-wise multiplication, resulting in a weighted feature input:
$$\tilde{\mathbf{x}} = \mathbf{w} \odot \mathbf{x}$$
Rather than using fixed weights, the vector $\mathbf{w}$ is optimized through Bayesian Optimization. BO constructs a probabilistic surrogate model over the space of weight vectors and seeks the optimal weight configuration that minimizes the model prediction error. Specifically, for a given regression model $f$, the following loss function is defined as the BO objective:
$$\mathrm{RMSE}(\mathbf{w}) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f(\mathbf{w} \odot \mathbf{x}_i)\right)^2}$$
Here, $y_i$ denotes the measured SOH label of the $i$-th sample, and $f(\mathbf{w} \odot \mathbf{x}_i)$ is the model prediction based on the weighted feature input. At each iteration, BO proposes a candidate weight vector $\mathbf{w}$, retrains the model with weighted features, and evaluates the resulting RMSE. This process is repeated for a predefined number of steps, yielding the optimal feature weight vector $\mathbf{w}^{*}$ that is subsequently used in final model training.
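The weight search can be illustrated with a small, dependency-light loop. The sketch below is not the configuration used in this study: it assumes a Gaussian-process surrogate with an RBF kernel, a lower-confidence-bound acquisition evaluated on random candidates, weights restricted to [0, 1], and a k-nearest-neighbour regressor standing in for RF/XGBoost so that only NumPy is required. The data are synthetic and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (not measurements): three features, SOH-like target.
X = rng.normal(size=(120, 3))
y = 0.90 - 0.05 * X[:, 0] + 0.03 * X[:, 1] + 0.005 * rng.normal(size=120)
X_train, y_train, X_val, y_val = X[:80], y[:80], X[80:], y[80:]


def weighted_rmse(w, k=5):
    """Validation RMSE of a k-NN regressor built on w * x (element-wise)."""
    a, b = X_train * w, X_val * w
    dist = ((b[:, None, :] - a[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    pred = y_train[nearest].mean(axis=1)
    return float(np.sqrt(np.mean((y_val - pred) ** 2)))


def rbf_kernel(a, b, length=0.3):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq / length**2)


def gp_posterior(w_obs, f_obs, w_new, jitter=1e-4):
    """Posterior mean and std of a zero-mean GP on standardized objective values."""
    f_std = (f_obs - f_obs.mean()) / (f_obs.std() + 1e-12)
    k_obs = rbf_kernel(w_obs, w_obs) + jitter * np.eye(len(w_obs))
    k_new = rbf_kernel(w_new, w_obs)
    mean = k_new @ np.linalg.solve(k_obs, f_std)
    var = 1.0 - np.sum(k_new * np.linalg.solve(k_obs, k_new.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def optimize_weights(objective, n_init=5, n_iter=20, n_candidates=500, kappa=2.0):
    """Propose weights by minimizing the lower confidence bound of the surrogate."""
    w_hist = rng.uniform(0.0, 1.0, size=(n_init, 3))
    f_hist = np.array([objective(w) for w in w_hist])
    for _ in range(n_iter):
        candidates = rng.uniform(0.0, 1.0, size=(n_candidates, 3))
        mean, std = gp_posterior(w_hist, f_hist, candidates)
        proposal = candidates[np.argmin(mean - kappa * std)]
        w_hist = np.vstack([w_hist, proposal])
        f_hist = np.append(f_hist, objective(proposal))
    best = int(np.argmin(f_hist))
    return w_hist[best], f_hist[best], f_hist


best_w, best_rmse, history = optimize_weights(weighted_rmse)
print(best_w, best_rmse)
```

The loop mirrors the description above: each iteration refits the surrogate to all evaluated weight vectors, proposes one new candidate, rebuilds the regressor on the reweighted features, and records the resulting RMSE.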
3.3. SOH Evaluation Model
3.3.1. Bayesian-Optimized Random Forest (BO-RF)
Random Forest is a bagging-based ensemble learning model that constructs multiple decision trees using bootstrapped data and random feature splits. In the proposed framework, the input to each tree is the optimized weighted feature vector $\tilde{\mathbf{x}}$, and the final SOH prediction is obtained by averaging the outputs of all trees:
$$\hat{y} = \frac{1}{M}\sum_{m=1}^{M} h_m(\tilde{\mathbf{x}})$$
where $h_m(\tilde{\mathbf{x}})$ is the prediction of the $m$-th tree, and $M$ is the total number of trees in the forest. The model is trained to minimize the mean squared error:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
This structure allows RF to capture nonlinear interactions among the weighted features while maintaining resistance to overfitting through ensemble averaging.
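The averaging step can be illustrated with a deliberately simplified forest. The sketch below assumes depth-1 trees (stumps) instead of full decision trees, draws one bootstrap sample and one random feature subset per tree, and uses synthetic data; it is meant to show the bagging-and-averaging mechanism only, and all function names are hypothetical.

```python
import numpy as np


def fit_stump(X, y, feature_ids):
    """Least-squares regression stump restricted to the given feature columns."""
    best = (np.inf, 0, np.inf, y.mean(), y.mean())  # fallback: constant prediction
    for j in feature_ids:
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            mean_l, mean_r = y[left].mean(), y[~left].mean()
            sse = ((y[left] - mean_l) ** 2).sum() + ((y[~left] - mean_r) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr, mean_l, mean_r)
    return best[1:]


def predict_stump(stump, X):
    j, thr, mean_l, mean_r = stump
    return np.where(X[:, j] <= thr, mean_l, mean_r)


def fit_forest(X, y, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample with a random subset of features."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                        # bootstrap rows
        feats = rng.choice(d, size=max(1, d - 1), replace=False)  # feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    """Average the predictions of all trees."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)


# Synthetic stand-in data (not measurements).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(60, 3))
y = 0.90 - 0.05 * X[:, 0] + 0.02 * X[:, 1] + 0.005 * data_rng.normal(size=60)

forest = fit_forest(X, y)
pred = predict_forest(forest, X)
print(np.mean((y - pred) ** 2))
```

Because every leaf value is a mean of observed targets, the averaged prediction always stays within the range of the training labels, which is one reason ensemble averaging tends to produce smooth, conservative estimates.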
3.3.2. Bayesian-Optimized XGBoost (BO-XGBoost)
Reference [26] has demonstrated that the XGBoost-based approach can achieve prediction accuracy of up to 90% using only a limited number of features, thereby highlighting its potential for embedded BMS applications. Building upon this, the present study integrates Bayesian weight optimization with machine learning models to enable highly accurate SOH prediction. XGBoost is a gradient boosting framework that builds decision trees in a stage-wise manner, each learning to correct the residuals of the previous model. After applying the optimized feature weights $\mathbf{w}^{*}$, the prediction is expressed as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(\tilde{\mathbf{x}}_i), \qquad f_k \in \mathcal{F}$$
where $f_k$ is the $k$-th regression tree and $\mathcal{F}$ denotes the space of all possible trees. The model is trained to minimize the following regularized loss function:
$$\mathcal{L} = \sum_{i=1}^{N} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f_k) = \gamma T_k + \frac{1}{2}\lambda \sum_{j=1}^{T_k} v_{k,j}^2$$
Here, $T_k$ is the number of leaves in the $k$-th tree, $v_{k,j}$ represents the output value of leaf $j$, and $\gamma$, $\lambda$ are regularization hyperparameters. The inclusion of $\Omega$ penalizes overly complex trees, improving the model’s generalization ability. The use of first- and second-order gradient information in tree construction also accelerates convergence and enhances robustness.
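The stage-wise residual fitting and the role of the regularization terms can be sketched with a stripped-down booster. The code below assumes a squared-error loss, depth-1 trees, and the standard second-order leaf-value and split-gain expressions with an L2 penalty `lam` and a per-split penalty `gamma`; it illustrates the mechanism and is not the XGBoost library itself. The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def best_split(X, g, h, lam, gamma):
    """Return (gain, feature, threshold) of the best split under the regularized objective."""
    g_sum, h_sum = g.sum(), h.sum()
    best = (0.0, None, None)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            gl, hl = g[left].sum(), h[left].sum()
            gr, hr = g_sum - gl, h_sum - hl
            gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam)
                          - g_sum**2 / (h_sum + lam)) - gamma
            if gain > best[0]:
                best = (gain, j, thr)
    return best


def boost(X, y, n_rounds=30, eta=0.3, lam=1.0, gamma=0.0):
    """Fit stumps sequentially to the gradients of 0.5 * (y - pred)^2."""
    pred = np.full(len(y), y.mean())
    trees, losses = [], [np.mean((y - pred) ** 2)]
    for _ in range(n_rounds):
        g, h = pred - y, np.ones_like(y)          # first- and second-order gradients
        gain, j, thr = best_split(X, g, h, lam, gamma)
        if j is None:                             # no split beats the gamma penalty
            break
        left = X[:, j] <= thr
        v_left = -g[left].sum() / (h[left].sum() + lam)
        v_right = -g[~left].sum() / (h[~left].sum() + lam)
        pred = pred + eta * np.where(left, v_left, v_right)
        trees.append((j, thr, v_left, v_right))
        losses.append(np.mean((y - pred) ** 2))
    return trees, losses


def complexity_penalty(leaf_values, lam, gamma):
    """Regularization term: gamma * (number of leaves) + 0.5 * lam * sum of squared leaf values."""
    return gamma * len(leaf_values) + 0.5 * lam * sum(v**2 for v in leaf_values)


# Synthetic stand-in data (not measurements).
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = 0.90 - 0.05 * X[:, 0] + 0.005 * rng.normal(size=60)

trees, losses = boost(X, y)
print(losses[0], losses[-1])
```

With a shrinkage factor `eta` between 0 and 1, each round moves the leaf predictions only part of the way toward the mean residual, so the training error cannot increase from one round to the next.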
The complete prediction process consists of three stages: (1) feature extraction from EIS data, (2) feature reweighting via Bayesian optimization to learn the optimal weight vector, and (3) final SOH regression using either the BO-RF or BO-XGBoost model. This framework not only allows flexible adaptation to diverse operating conditions but also emphasizes interpretable feature importance and optimized model accuracy across battery life cycles.
To provide a clear visualization of the proposed SOH prediction framework, Figure 3 illustrates the overall architecture of both Bayesian-optimized Random Forest (BO-RF) and Bayesian-optimized GBDT models (BO-GBoost/XGBoost). The input feature vector is first reweighted using Bayesian optimization, after which the reweighted features are passed into the ensemble models for regression. For the RF-based model, predictions from each tree are averaged, whereas in the GBDT-based model, each tree sequentially fits the residuals from the previous stage to refine the final prediction.
This pipeline is repeated across different SOC levels to assess model robustness. The resulting optimal weights and RMSE values provide insight into both feature importance and model suitability.
4. Results and Discussions
4.1. High-Frequency Feature Extraction
As shown in
Figure 4,
consistently exhibits positive correlations with SOH across all SOCs, with only minor variations, indicating that its relationship with SOH is largely independent of SOC and thus highly generalizable. Similarly,
maintains strong negative correlations with SOH across different SOC levels, further confirming its robustness as a health-related feature.
In contrast, displays substantial fluctuations across both SOCs and cells, suggesting that it is influenced by factors beyond the battery itself, such as measurement equipment and testing conditions. Consequently, is not considered as a feature for SOH prediction in this study.
These findings demonstrate that and are SOC-generalizable indicators suitable for constructing SOH prediction models that are robust across a wide range of SOC conditions, whereas the variability of highlights the need for further investigation to mitigate external influences in future studies.
4.2. Model Comparison Across SOC Conditions
Figure 5 illustrates the comparison between predicted SOH values and measured SOH values for three ensemble learning models—Random Forest, GBoost, and XGBoost—across the training set, validation set, and test set. The input features encompass
,
,
and
. The training set incorporates feature parameters at 0%, 20%, 40%, and 60% SOC, while the validation and test sets utilize parameters at 80% and 100% SOC, respectively.
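The SOC-based partition described above can be sketched as a simple masking step. The SOC levels assigned to each subset follow the text; the array layout (five aging stages, six SOC levels, two placeholder feature columns), the values, and the function name are hypothetical.

```python
import numpy as np

TRAIN_SOC, VAL_SOC, TEST_SOC = (0, 20, 40, 60), (80,), (100,)


def split_by_soc(soc, features, soh):
    """Partition samples into training, validation, and test subsets by SOC level."""
    def subset(levels):
        mask = np.isin(soc, levels)
        return features[mask], soh[mask]
    return subset(TRAIN_SOC), subset(VAL_SOC), subset(TEST_SOC)


# Hypothetical layout: 5 aging stages x 6 SOC levels, 2 placeholder features per sample.
rng = np.random.default_rng(0)
soc = np.tile([0, 20, 40, 60, 80, 100], 5)
features = rng.normal(size=(soc.size, 2))
soh = np.repeat(np.linspace(1.0, 0.9, 5), 6)

(train_x, train_y), (val_x, val_y), (test_x, test_y) = split_by_soc(soc, features, soh)
print(train_x.shape, val_x.shape, test_x.shape)
```

Splitting by SOC rather than at random means the validation and test sets probe SOC levels the model never saw during training, which is a stricter test of SOC generalization.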
In each figure, a direct visual comparison of the predicted and measured SOH values is presented. Across all datasets, the Random Forest model exhibits a high degree of accuracy, with its predicted values closely following the measured values, as indicated by the points densely clustered around the red diagonal line.
Table 1 summarizes the RMSE values for SOH predictions using the three machine learning models. The Random Forest model achieves RMSEs of 0.066, 0.184, and 0.170 for the training, validation, and test sets, respectively, which are the lowest validation and test errors among the three models.
In comparison, Gradient Boosting shows a very low training RMSE (0.011) but higher validation (0.191) and test (0.210) errors, suggesting overfitting. Its predicted points are more dispersed around the diagonal, reflecting lower prediction consistency. XGBoost achieves training, validation, and test RMSEs of 0.001, 0.203, and 0.194, respectively, and therefore shows a similar gap between training and held-out errors. Although it outperforms Gradient Boosting on the test set (0.194 versus 0.210), its predictions still deviate more from the measured values than those of Random Forest.
Overall, considering both RMSE metrics and the visual distribution of predictions, Random Forest demonstrates superior accuracy and consistency, making it the most suitable model for battery SOH estimation.
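The comparison can be made explicit by computing, for each model, the difference between test and training RMSE as a rough indicator of overfitting. The RMSE values below are those reported in the text for Table 1; the gap metric and the variable names are illustrative choices rather than part of the original analysis.

```python
# (training, validation, test) RMSE values as reported in the text for Table 1.
reported_rmse = {
    "Random Forest": (0.066, 0.184, 0.170),
    "Gradient Boosting": (0.011, 0.191, 0.210),
    "XGBoost": (0.001, 0.203, 0.194),
}


def generalization_gap(train_rmse, test_rmse):
    """Difference between test and training RMSE; larger values suggest overfitting."""
    return test_rmse - train_rmse


gaps = {name: generalization_gap(tr, te) for name, (tr, _, te) in reported_rmse.items()}
best_on_test = min(reported_rmse, key=lambda name: reported_rmse[name][2])
smallest_gap = min(gaps, key=gaps.get)

for name, gap in gaps.items():
    print(f"{name}: test - train RMSE = {gap:.3f}")
print("Lowest test RMSE:", best_on_test)
print("Smallest gap:", smallest_gap)
```

On these figures, Random Forest has both the lowest test RMSE and the smallest train-to-test gap, which is consistent with the conclusion drawn above.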