Battery State-of-Health Estimation with Embedded Impedance Spectrum Features Under Multiple Battery Chemistry and Temperature Conditions

Yue Xiang; Dikshit Chauhan; Dipti Srinivasan

doi:10.3390/batteries12020077

Abstract

The transition to clean energy and electrification of transportation requires accurate, real-time monitoring of the state of health (SoH) of lithium-ion batteries, which serve as critical components for energy storage. Conventional SoH estimation methods typically rely on fixed statistical feature extraction, generalize poorly, and are unsuitable for multiple battery chemistries and temperature conditions. In this work, we propose a deep learning framework based on a transformer encoder and XGBoost to extract ageing-related electrochemical impedance spectroscopy (EIS) features, capturing low-, mid-, and high-frequency ageing characteristics, directly from daily operation profiles for capacity estimation. The approach requires only current, voltage, and temperature time-series data, making it suitable for edge deployment without the need for explicit EIS measurements. Validation on a dataset with two battery chemistries and three temperature conditions yields a root-mean-square error of 0.16% to 0.20% in capacity estimation. These results establish the feasibility of accurate SoH estimation across the varied operating conditions of battery energy storage systems and electric vehicles.

Keywords:

state-of-health estimation; electrochemical impedance spectroscopy; transformer encoder; XGBoost regression; edge-deployable online monitoring

1. Introduction

The accelerating global transition toward electrified transportation, renewable energy integration, and large-scale stationary storage has placed lithium-ion batteries at the centre of modern energy infrastructure. As deployment expands from consumer electronics to electric vehicles (EVs), microgrids, and utility-scale storage plants, ensuring long-term reliability and safety becomes increasingly critical [1]. In these diverse applications, batteries are exposed to heterogeneous thermal, electrical, and usage environments, all of which accelerate degradation in complex and often unpredictable ways. Consequently, accurate and robust estimation of battery state of health (SoH) is now a key enabling technology for digital battery twins, second-life utilization, warranty evaluation, and predictive maintenance [2].

As transportation electrifies and clean-energy systems and consumer electronics expand rapidly, batteries, central components of energy storage, face increasingly stringent performance demands [1,2,3]. Lithium-ion batteries have attracted widespread industrial interest due to their substantially higher energy and power densities, as well as their longer cycle life [4,5]. However, due to coupled mechanical, electrical, and chemical processes, these cells inevitably undergo performance degradation during service [6,7]. Accurate, real-time estimation of battery SoH therefore supports both the safe and economically optimal operation of energy storage systems [8]. SoH is most directly and effectively quantified by the ratio of current maximum available capacity to nominal capacity [9]; this metric directly informs safety alerts, high-precision state-of-charge (SoC) estimation, and battery depreciation and recycling strategies [9,10,11]. We therefore pursue the development of high-precision, online SoH estimation methods to further unlock the performance and long-term reliability of lithium-ion batteries.

1.1. Literature Review

Methods used for SoH estimation and prognosis are commonly grouped into three classes: electrochemical models, equivalent circuit models (ECMs), and data-driven models. Electrochemical models adopt a first-principles perspective, and, under appropriate assumptions, explicitly represent the internal electrochemical reactions and ion-electron transport processes of lithium-ion cells to infer the current maximum available capacity [12,13]. When battery chemistry and manufacturing parameters are known accurately and comprehensively, these models can achieve high fidelity [14,15,16,17]; however, their complexity and strong parameter sensitivity limit their direct use for system-level SoH estimation, and they are more often applied to materials design [18,19].

Schuster et al. [20] report on NMC-graphite cells and show that higher charge C-rates precipitate an earlier inflexion (“diving”) point in the capacity fade curve, providing direct evidence of accelerated, nonlinear loss; EIS and DC-resistance analyses in their study further reveal that the mid-frequency semicircle (charge-transfer kinetics) expands substantially with ageing (≈2× the initial diameter), with the DC-resistance rise largely attributable to cathode-interface resistance. Ding et al. [21] analyze Si-containing LCO cells and demonstrate, via EIS and equivalent-circuit fitting, a pronounced, monotonic increase in charge-transfer resistance as ageing proceeds; they attribute this impedance growth primarily to silicon-particle swelling and interfacial fracture that progressively degrade the electrode/electrolyte interface. Zhu et al. [22] use equivalent-circuit modelling to quantify cycling-driven trends and show that key impedance parameters, specifically charge-transfer resistance and the Warburg diffusion coefficient, increase monotonically during cycling, thereby corroborating the observation that EIS spectra shift toward higher impedance regions. You et al. [23] report multi-cell ageing tests under aggressive (4.5C) charging and find accelerated nonlinear capacity decay that enters a rapid-degradation regime after ≈200 cycles, while cells cycled under milder/low-temperature conditions degrade more linearly over hundreds of cycles; although the work does not present full EIS series, it proposes a porosity-reduction mechanism (SEI growth and lithium plating) that is consistent with a progressive increase in charge-transfer impedance. Wang et al. [24] identify a transition from near-linear to accelerated nonlinear capacity loss at approximately 85% SoH under both normal- and high-temperature cycling and directly show via EIS that aged spectra shift rightward with marked growth in mid-high-frequency arcs, effects they attribute to increased SEI and charge-transfer resistances.

ECMs are phenomenological: classical control theory is used to construct circuit analogues of the cell’s external behaviour, and health is inferred from the ageing of equivalent-circuit parameters [25]. Electrochemical impedance spectroscopy (EIS) serves as the principal tool for parameter identification within ECM frameworks [26]. ECMs benefit from conceptual simplicity and ease of deployment; however, they degrade in performance during strongly nonlinear ageing stages [27]. The consistency of EIS-based parameter identification depends heavily on the current rate and temperature, which impedes real-world, real-time applications [28].

Data-driven models avoid the explicit modelling of complex internal physicochemical processes by extracting ageing-relevant features from large operational datasets and fitting nonlinear mappings between those features and SoH using machine learning [29]. Fueled by rapid advances in deep learning and the practical difficulty of constructing accurate first-principles models for complex systems, data-driven approaches are therefore attracting growing attention [4,30,31].

Lin et al. [32] propose a hybrid data-driven/ECM approach that uses identified internal resistance together with differential/incremental-derived thermoelectric features as inputs to an explainable boosting machine, achieving mean absolute error (MAE) < 1% on the Oxford dataset. However, the method relies on constant-current charging and specialized differential feature extraction, which may constrain its applicability for real-time, in-service SoH monitoring across diverse operating conditions. Lin et al. [33] propose an EIS-based feature pipeline in which inflexion points on Nyquist plots are converted into distance metrics, important features are selected via random forest, and a long short-term memory (LSTM) network is trained on the selected features for SoH prediction. The results show the lowest error when using the EIS+RF-selected features. Although the feature extraction is claimed to be automated and robust across operating conditions, it still depends on explicit EIS acquisition and is validated only at room and elevated temperatures, which may limit practicality and generalizability for continuous, in-service monitoring. Li et al. [34] present an ageing-feature framework based on an electrochemical model (EM): internal health features (IHFs) such as charge-transfer resistance, solid-phase diffusion coefficient, and electrode volume fraction are defined, while multi-stage external health features (EHFs) are extracted from segmented voltage and temperature traces; these IHFs and EHFs are then used with standard machine-learning regressors for offline and online SoH estimation. Experiments demonstrate improved estimation accuracy across various operating scenarios and charge–discharge modes; however, the dependence on EM-derived IHFs and staged feature extraction may complicate real-time deployment and scalability. Lin et al. [26] introduce a physics-informed deep-learning framework that fuses EIS measurements with three model-derived, physically informative parameters, employing two data-fusion schemes, physical regularization, multi-task learning, and deep-ensemble uncertainty quantification to produce interpretable capacity estimates. The approach achieves strong performance on eight commercial batteries from the Cambridge EIS dataset; however, its reliance on explicit EIS acquisition and model-based parameter extraction limits its applicability for continuous, in-service monitoring. Liu et al. [35] introduce the cumulative uninterrupted cycling duration (CUCD) concept and model the cycling-ageing drift rate as a monotonic spline function of CUCD, thereby capturing the coupling between calendar and cycling ageing; numerical and real-case validations demonstrate the model’s effectiveness for degradation assessment, though the work focuses on lifecycle drift modelling rather than online feature extraction from routine BMS signals. Liu et al. [36] propose a sequential variational Gaussian mixture regression (SVGMR) that jointly models partial charge curves and capacity, enabling the use of unlabeled data and an efficient sequential updating algorithm for online assimilation; this generative approach yields SoH estimates with uncertainty from partial voltage segments.

Existing feature-engineering paradigms typically select a small set of statistical descriptors that correlate with SoH and use these descriptors as regressors. Among such features, impedance descriptors derived from EIS show particularly strong correlation with SoH and can yield very high estimation accuracy. Two fundamental limitations restrict the practical use of EIS-based feature engineering in operational systems. First, performing EIS measurements during normal operation incurs substantial time cost and is practically infeasible in many contexts, such as electric vehicles or portable devices. Second, reliable EIS identification requires careful alignment of the test AC frequency, amplitude, and temperature, which further raises operational cost and complexity. These constraints motivate the development of methods that recover EIS-like information without relying on explicit, real-world EIS testing.

We propose a novel deep-learning framework that combines a transformer encoder with XGBoost, architected as an upstream feature-extraction network followed by a downstream regression network. The framework is designed to extract ageing-related EIS-like features directly from routine current, voltage, and temperature time series, thereby capturing impedance behaviour in low, mid, and high frequency bands for use in capacity regression. Only partial time series recorded during charging are required; no explicit EIS measurements are needed to achieve high-precision, online SoH estimation. Complexity analyses of both runtime and memory show that the method is compatible with edge computing platforms, enabling deployment on battery management system (BMS) hardware. By embedding impedance-spectrum information into a data-driven estimator, our approach represents a significant step toward continuous, lifecycle-scale monitoring of lithium-ion battery ageing.

1.2. Contributions

The key contributions of this work are as follows:

(1): A hybrid transformer-XGBoost framework that extracts ageing-relevant, EIS-like impedance features directly from routine current-voltage-temperature time-series data, enabling accurate SoH estimation without explicit EIS measurements or specialized test protocols.
(2): A unified feature-extraction approach that captures low-, mid-, and high-frequency impedance behaviour during normal charging, generalizing across multiple battery chemistries and temperature conditions and achieving 0.16–0.20% root-mean-square error (RMSE) in capacity estimation.
(3): An edge-efficient model architecture, supported by runtime and memory analyses, demonstrating feasibility for real-time deployment on battery management systems for continuous, in-service SoH monitoring.
(4): A practical pathway toward embedded, lifecycle-scale health diagnostics, showing that impedance-spectrum information can be implicitly learned and leveraged from operational data alone.

The remainder of this paper is organized as follows. Section 2 describes the materials, datasets, and models used in this study. Section 3 and Section 4 present the experimental results and their corresponding discussions. Finally, Section 5 concludes the paper.

2. Materials and Methods

This section describes the datasets, preprocessing pipeline, and modelling framework used to develop and evaluate the proposed SoH estimation method. We first introduce the ageing dataset, including operating conditions, measurement protocols, and impedance spectrum extraction. We then detail the construction of input features, intermediate impedance labels, and cycle-level capacity targets used for supervised learning. Finally, we present the overall architecture of the hybrid transformer-XGBoost model, including the upstream impedance-feature extraction module, the downstream capacity-regression module, and the associated loss functions and training strategies.

2.1. Dataset

The ageing dataset used in this study is obtained from the publicly available Karlsruhe Institute of Technology (KIT) collection, titled “Data-driven capacity estimation of commercial lithium-ion batteries from voltage relaxation.” [37]. As shown in Table 1, the dataset comprises cycle-by-cycle ageing records for two chemistry systems, NCA (LiNi_0.86Co_0.11Al_0.03O₂) and NCM (LiNi_0.83Co_0.11Mn_0.07O₂), under nine operating conditions formed by three charging rates (0.25C, 0.5C, and 1C) and three ambient temperatures (25 °C, 35 °C, and 45 °C). The NCA group contains 66 cylindrical cells, and the NCM group contains 55 cylindrical cells. A single ageing cycle is executed as follows: cells are charged with a standard constant-current-constant-voltage (CC-CV) protocol where the CC stage uses the specified 0.25C/0.5C/1C rate until the upper voltage limit of 4.2 V is reached, and the CV stage terminates at a cutoff current of 0.05C; a 30 min rest follows charging. Discharge is performed at current rates ranging from 0.25C to 4C until the chemistry-dependent cutoff voltages of 2.65 V (NCA) and 2.5 V (NCM) are reached, and another 30 min rest is applied after discharge. Nominal capacity for both chemistries is 3500 mAh, and the capacity label for each cycle is computed by time-discretized integration of the measured discharge current. Charge and discharge data are sampled every 2 s, while rest periods are sampled every 60 s. The recorded ageing trajectories cover the most commonly used operational life window of lithium-ion cells (SoH ≈ 100% → 80%). For each operating condition, one cell is selected for full-lifecycle electrochemical impedance spectroscopy (EIS) testing across the frequency range of 0.047 Hz to 10,000 Hz.

Table 1. Overview of datasets.

Across the dataset, both capacity and impedance exhibit pronounced, non-linear degradation with cycle number. The charging rate and ambient temperature exert the largest influence on lifetime decay. Higher charging rates produce a more pronounced non-linear capacity loss, whereas lower rates result in an approximately linear decay, consistent with the steady loss of active material at the electrodes. Concurrently, the measured impedance spectra progressively shift toward higher impedance regions, indicating sustained thickening of the solid–electrolyte interphase (SEI) and an increase in charge-transfer resistance. To capture the global impedance-decay signature while maintaining low computational complexity suitable for edge deployment, we extract three representative impedance features from the EIS spectrum for each chemistry: a low-frequency impedance (NCA: 10 Hz; NCM: 0.25 Hz), a mid-frequency impedance (NCA: 100 Hz; NCM: 3 Hz), and a high-frequency impedance (NCA: 1000 Hz; NCM: 35 Hz). These impedance bands are used as ageing-sensitive feature variables in the subsequent modelling.

Figure 1B–G present the correlation analysis between the complex impedance components at the three selected frequencies and cycle capacity. Strong, near-linear relationships are observed: the real part at the lowest frequency (Re1) correlates negatively with capacity (r = −0.96) while its imaginary counterpart (Im1) correlates positively (r = 0.93). The mid-frequency real and imaginary parts show similarly strong relationships (Re2: r = −0.95; Im2: r = 0.97). At the highest frequency, the real part again correlates negatively (Re3: r = −0.94) and the imaginary part shows a strong positive correlation (Im3: r = 0.94). These results confirm that impedance components at the selected bands capture ageing-sensitive information that is tightly coupled to capacity loss. Given the strong positive correlation between internal resistance growth and health state degradation, the selection of impedance values at specific frequencies as ageing features is justified.
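The correlation analysis can be illustrated with a short calculation. The sketch below assumes that the reported r values are Pearson coefficients (the text does not name the coefficient explicitly) and uses synthetic values rather than KIT measurements; the helper name `pearson_r` and both arrays are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic, illustrative ageing trend (NOT KIT data): capacity falls
# while the real part of the impedance rises.
capacity_mah = [3500, 3450, 3390, 3320, 3260, 3180, 3100]
re1_mohm = [30.0, 30.6, 31.1, 32.0, 32.4, 33.5, 34.1]

r = pearson_r(capacity_mah, re1_mohm)
print(f"r(capacity, Re1) = {r:.3f}")
```

A strongly negative coefficient for a real component, as in this synthetic case, corresponds to the pattern reported for Re1, Re2, and Re3.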

Figure 1. Overview of constant charging section (A) and correlation analysis between capacity and six impedance indices: Re1 (B), Im1 (C), Re2 (D), Im2 (E), Re3 (F), Im3 (G).

For each cycle, the measured current, voltage, and temperature traces are first processed to form the model inputs and labels. The constant-current (CC) charging segment is discretely integrated to obtain the CC-stage charging capacity; this capacity trace is then divided by the measured voltage to produce an incremental-capacity (IC) time series that serves as a dQ/dV proxy. From the CC stage, we extract the first 300 sampling points of the [voltage, IC, temperature] triple and apply a Savitzky–Golay filter to each component (window_length = 9, polyorder = 3) to suppress measurement noise. The smoothed three-channel sequence is concatenated to form the model input tensor (x). From the EIS spectrum, the real and imaginary parts are taken at the chosen low-mid-high frequency bands and assembled into a six-dimensional intermediate feature vector y = [Re1, Im1, Re2, Im2, Re3, Im3], which is used as an internal constraint during training. The discharge current is likewise discretely integrated and normalized by the nominal capacity (3500 mAh) to produce the cycle-level SoH label (z).
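The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `build_input` and `soh_label` are hypothetical, the IC proxy follows the literal description in the text (capacity trace divided by voltage), and the traces are synthetic. The 2 s sampling period and 3500 mAh nominal capacity come from the dataset description.

```python
import numpy as np
from scipy.signal import savgol_filter

NOMINAL_CAPACITY_AH = 3.5  # 3500 mAh nominal capacity
SAMPLE_DT_S = 2.0          # charge/discharge sampling period

def build_input(current_a, voltage_v, temp_c, n_points=300):
    """Return the smoothed (n_points, 3) tensor [voltage, IC, temperature]."""
    current_a = np.asarray(current_a, dtype=float)
    voltage_v = np.asarray(voltage_v, dtype=float)
    temp_c = np.asarray(temp_c, dtype=float)
    # Discrete integration of the CC current gives charged capacity in Ah.
    q_ah = np.cumsum(current_a) * SAMPLE_DT_S / 3600.0
    # IC proxy as described in the text: capacity trace divided by voltage.
    ic = q_ah / voltage_v
    x = np.stack([voltage_v, ic, temp_c], axis=1)[:n_points]
    # Savitzky-Golay smoothing applied independently to each channel.
    return savgol_filter(x, window_length=9, polyorder=3, axis=0)

def soh_label(discharge_current_a):
    """Discharged capacity (Ah) normalized by the nominal capacity."""
    q_ah = np.sum(np.abs(discharge_current_a)) * SAMPLE_DT_S / 3600.0
    return q_ah / NOMINAL_CAPACITY_AH

# Synthetic CC charging segment (illustrative only, not KIT data).
rng = np.random.default_rng(0)
n = 400
current = np.full(n, 0.875)  # 0.25C for a 3.5 Ah cell
voltage = np.linspace(3.4, 4.2, n) + rng.normal(scale=1e-3, size=n)
temperature = 25.0 + rng.normal(scale=0.05, size=n)

x = build_input(current, voltage, temperature)
z = soh_label(np.full(1800, 3.5))  # 3.5 A for one hour
```

A finite-difference estimate of dQ/dV would be an alternative way to form the IC channel; the text describes the division-based proxy, so the sketch follows that.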

To enhance robustness under diverse operational conditions after deployment, two dataset-splitting strategies are adopted: random splitting and stratified splitting. For random splitting, all input tensors (x), impedance features (y), and capacity labels (z) from every operating condition are pooled and shuffled; 80% of the samples are drawn for training, and the remaining 20% are held out for testing. To mitigate potential gradient instability and accelerate convergence in deep learning, all inputs, intermediate constraints, and output labels are z-score normalized prior to training. Model performance is evaluated on the unseen test partition using MAE and RMSE. For stratified splitting, one representative cell from each operating condition is reserved to constitute a combined, shuffled test set, while the remaining cells are mixed to form the training set; this scheme more clearly quantifies the model’s ability to generalize across the full degradation trajectory. The same MAE and RMSE metrics are reported for the stratified experiments.
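A minimal sketch of the stratified split, the z-score normalization, and the two error metrics is given below. The helper names, the choice of the first listed cell as the held-out "representative" cell, and the condition labels are illustrative assumptions; fitting the normalization statistics on the training partition only is also an assumption, since the text does not specify it.

```python
import math
from collections import defaultdict

def stratified_split(cells):
    """cells: list of (cell_id, condition). Hold out one cell per condition."""
    by_condition = defaultdict(list)
    for cell_id, condition in cells:
        by_condition[condition].append(cell_id)
    # Assumption: the first listed cell of each condition is held out.
    test_ids = {ids[0] for ids in by_condition.values()}
    train_ids = {cell_id for cell_id, _ in cells} - test_ids
    return train_ids, test_ids

def zscore_fit(values):
    """Mean and (population) standard deviation of a sequence."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std

def zscore_apply(values, mean, std):
    return [(v - mean) / std for v in values]

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

# Hypothetical cell identifiers and operating-condition labels.
cells = [("c1", "25C_0.5C"), ("c2", "25C_0.5C"),
         ("c3", "45C_1C"), ("c4", "45C_1C")]
train_ids, test_ids = stratified_split(cells)

mean, std = zscore_fit([1.0, 2.0, 3.0])
normalized = zscore_apply([1.0, 2.0, 3.0], mean, std)
```

Splitting by cell rather than by sample keeps every cycle of a held-out cell out of training, which is what allows the stratified scheme to probe generalization over a full degradation trajectory.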

2.2. Model

Figure 2A illustrates the overall architecture of the proposed model, which is composed of an upstream feature-extraction module and a downstream feature-regression module.

Figure 2. Overview of Model structure (A) and upstream structure (B).

Upstream: A transformer encoder [38] is employed to map a truncated portion of the charging trajectory ([V, IC, Temperature]) to the impedance values at three designated spectral frequencies. The encoder stack consists of four transformer layers, each with eight attention heads and a hidden dimension of 64. The input is treated as a three-channel time series, and the encoder’s final pooled representation is forwarded to a multi-task head. The multi-task head is implemented as a three-layer multilayer perceptron with layer widths of 64 → 32 → 6 and GELU activations, outputting the six impedance components (real and imaginary parts at three frequencies). The upstream prediction y (the 6-D complex impedance feature) is compared with the EIS labels during training and thus serves as an intermediate constraint. The detailed upstream schematic is shown in Figure 2B. By relying on attention rather than recurrence, the transformer overcomes the long-term memory degradation that limits RNN families (e.g., LSTM, GRU) and provides strong capacity to learn highly nonlinear mappings.
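One possible realization of the upstream module in PyTorch is sketched below. The layer count, head count, hidden dimension, and head widths follow the description above; the input projection, the learned positional embedding, the feed-forward width (`d_ff = 256`), mean pooling, and the exact reading of the 64 → 32 → 6 head are assumptions made only for illustration, and the class name `ImpedanceEncoder` is hypothetical.

```python
import torch
from torch import nn

class ImpedanceEncoder(nn.Module):
    """Maps a (batch, 300, 3) charging segment to six impedance components."""

    def __init__(self, seq_len=300, in_channels=3, d_model=64,
                 n_heads=8, n_layers=4, d_ff=256):
        super().__init__()
        # Assumption: a linear projection lifts the 3 channels to d_model.
        self.input_proj = nn.Linear(in_channels, d_model)
        # Assumption: learned positional embedding (not stated in the text).
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=d_ff, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Multi-task head: one reading of "three layers, 64 -> 32 -> 6".
        self.head = nn.Sequential(
            nn.Linear(d_model, 64), nn.GELU(),
            nn.Linear(64, 32), nn.GELU(),
            nn.Linear(32, 6),
        )

    def forward(self, x):
        h = self.input_proj(x) + self.pos_embed
        h = self.encoder(h)
        pooled = h.mean(dim=1)  # assumption: mean pooling over time
        return self.head(pooled)

torch.manual_seed(0)
model = ImpedanceEncoder()
model.eval()
with torch.no_grad():
    y_hat = model(torch.randn(2, 300, 3))  # two synthetic input sequences
```

Each output row holds the six predicted components in the order [Re1, Im1, Re2, Im2, Re3, Im3].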

Downstream: XGBoost [39] is used to regress the cycle-level SoH from the six-dimensional complex-impedance feature vector ([Re1, Im1, Re2, Im2, Re3, Im3]). A tree ensemble is adopted with 100 estimators, a maximum tree depth of 5, a subsample ratio of 0.8, a column subsample per tree of 0.8, and a learning rate of 0.1. The downstream SoH prediction for the current cycle is compared with the ground-truth SoH label to provide the terminal training constraint.

Training minimizes a composite mean-squared error:

\begin{array}{l} L_{\mathrm{Total}} = \mathrm{MSE}(\bar{Y}, Y) + \mathrm{MSE}(\bar{Z}, Z), \\ \mathrm{MSE}(\bar{Y}, Y) = \frac{1}{n} \sum_{i=1}^{n} (\bar{y}_{i} - y_{i})^{2}, \quad \bar{y}_{i} \in \bar{Y},\ y_{i} \in Y, \\ \mathrm{MSE}(\bar{Z}, Z) = \frac{1}{n} \sum_{i=1}^{n} (\bar{z}_{i} - z_{i})^{2}, \quad \bar{z}_{i} \in \bar{Z},\ z_{i} \in Z, \end{array}

where L_{\mathrm{Total}} is the total loss function of the model, \mathrm{MSE} is the mean-squared error function, \bar{Y} denotes the impedance estimates of the upstream module, Y denotes the ground-truth impedance labels, \bar{Z} denotes the SoH estimates of the downstream module, and Z denotes the true SoH labels. All inputs, intermediate impedance labels and SoH outputs are z-score normalized prior to training, so the two MSE terms are placed on comparable numerical scales; under these conditions an unweighted sum provides balanced gradient signals without introducing scale bias. Because the downstream estimator is a decision-tree ensemble, a co-training scheme is adopted: upstream and downstream modules are trained collaboratively by alternately freezing one module while updating the other. This alternation further reduces interference between the two objectives and promotes stable convergence. Model parameters are optimized using AdamW with an initial learning rate of 1 × 10⁻³ and cosine annealing to a final learning rate of 1 × 10⁻⁵ over 50 epochs. All training and evaluation are performed on an NVIDIA A40 GPU.
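The composite objective and the learning-rate schedule can be illustrated numerically. The sketch below assumes that the impedance vectors are flattened before averaging and that the schedule follows the standard closed-form cosine annealing rule stepped once per epoch; neither detail is stated in the text, and the function names are hypothetical.

```python
import math

def mse(pred, target):
    """Mean-squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def total_loss(y_pred, y_true, z_pred, z_true):
    """Unweighted sum of the impedance and SoH MSE terms (z-scored values)."""
    return mse(y_pred, y_true) + mse(z_pred, z_true)

def cosine_lr(epoch, total_epochs=50, lr_max=1e-3, lr_min=1e-5):
    """Closed-form cosine annealing from lr_max down to lr_min."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_min + (lr_max - lr_min) * cos_term

# Toy values only: two impedance entries and one SoH entry.
loss = total_loss([1.0, 2.0], [1.0, 4.0], [0.0], [1.0])
schedule = [cosine_lr(epoch) for epoch in range(51)]
```

The schedule starts at 1 × 10⁻³, passes through the midpoint of the two rates at epoch 25, and ends at 1 × 10⁻⁵.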

3. Results

Figure 3 summarizes the cycle-level SoH estimates produced by the proposed framework for both the NCA and NCM cohorts across the nine operating conditions. For the NCA group (Figure 3A), the model attains high accuracy with MAE = 0.12% and RMSE = 0.20%. The largest errors occur in the early ageing stage, particularly when the measured maximum capacity falls within the 3200–3300 mAh range. We attribute this behaviour to an initial SEI-formation period in NCA cells, during which interfacial electrochemical properties are unstable, and the model struggles to track the incipient ageing trajectory. Accuracy improves markedly in the middle stage of ageing. For the NCM group (Figure 3B), accuracy is even higher (MAE = 0.09%, RMSE = 0.16%), a result that we ascribe to the more homogeneous, SEI-dominated ageing mode of the NCM negative electrode. The relatively concentrated degradation pathways in NCM produce more consistent impedance-capacity mappings that are easier for a data-driven model to capture. By contrast, surface reconstruction and oxygen release phenomena on NCA positive electrodes introduce competing ageing mechanisms that broaden the degradation manifold and complicate feature learning. Nevertheless, the proposed model’s SoH estimates substantially outperform previously reported results on the same dataset [37,40,41]. For fair benchmarking on the KIT dataset, we compare our results with recent state-of-the-art results (Table 2). Zhu et al. report RMSE = 1.10% using XGBoost/ElasticNet/SVR; Xiang et al. obtain MAE = 1.47% and RMSE = 2.06% with a bi-GRU + GPR hybrid; another RNN-based Cerberus model (Xiang et al.) reports MAE in the range 0.29–5.14% across regimes. In contrast, our transformer encoder + XGBoost pipeline with impedance-based intermediate supervision yields MAE = 0.12% and RMSE = 0.20% on the same benchmark, demonstrating a clear empirical advantage and supporting the efficacy of embedding multidimensional EIS constraints.

Figure 3. Capacity estimation results on Dataset 1 (NCA) and Dataset 2 (NCM). The estimation results and errors of capacity on Dataset 1 (NCA) (A) or Dataset 2 (NCM) (B).

Table 2. Results comparison on Dataset 1 (NCA) with existing SoH estimation techniques.

Figure 4 and Figure 5 present stratified-test estimates for the complex impedance components, which serve as intermediate outputs and constraints in the upstream model. Results indicate that the feature extraction network reliably recovers the six impedance indicators from the truncated charging sequences, as shown in Table 3. For the NCA cohort the estimated errors are: Re1 MAE = 0.09%, RMSE = 0.18%; Im1 MAE = 0.32%, RMSE = 1.17%; Re2 MAE = 0.07%, RMSE = 0.15%; Im2 MAE = 0.29%, RMSE = 0.63%; Re3 MAE = 0.06%, RMSE = 0.13%; Im3 MAE = 0.23%, RMSE = 0.76%. For the NCM cohort the estimates are: Re1 MAE = 0.16%, RMSE = 0.26%; Im1 MAE = 0.65%, RMSE = 1.10%; Re2 MAE = 0.04%, RMSE = 0.16%; Im2 MAE = 0.21%, RMSE = 0.44%; Re3 MAE = 0.04%, RMSE = 0.10%; Im3 MAE = 0.09%, RMSE = 0.21%.

Figure 4. Feature impedances estimation results on Dataset 1 (NCA). The estimation results and errors of Re1 (A), Im1 (B), Re2 (C), Im2 (D), Re3 (E), Im3 (F) on three typical batteries under different temperatures.

Figure 5. Feature impedances estimation results on Dataset 2 (NCM). The estimation results and errors of Re1 (A), Im1 (B), Re2 (C), Im2 (D), Re3 (E), Im3 (F) on three typical batteries under different temperatures.

Table 3. Estimation results of feature impedance.

Two general observations emerge. First, upstream estimation of resistive (real) components is consistently more accurate than estimation of reactive (imaginary) components; this pattern is expected because the imaginary terms are more sensitive to subtle changes in electrode-electrolyte interfacial coupling. Second, the largest deviations in impedance estimates occur in the early-to-mid ageing window, which mirrors the distribution of errors seen in capacity estimation. This coupling arises because the downstream SoH regressor depends on accurately tracked impedance indicators. When certain health-sensitive metrics are biased during early to mid-ageing, the downstream model’s ability to learn the correct mapping to current SoH is impaired. Importantly, this observation motivates our use of EIS-derived intermediate supervision: by constraining the upstream encoder with multidimensional impedance bands, the model is guided to learn chemistry- and temperature-sensitive ageing signatures that generalize across operating conditions. Indeed, even under fully mixed random splitting of all conditions, the proposed model obtains MAE = 0.12% and RMSE = 0.20% on the NCA cohort, demonstrating that impedance-informed supervision materially improves cross-condition generalization. Notably, the SoH prediction accuracy is, overall, superior to the raw impedance reconstruction accuracy, which indicates that the downstream regression stage successfully identifies and filters out noise in the upstream features that is irrelevant to the true health state.

We conduct targeted ablation experiments to quantify the contribution of each intermediate impedance band to SoH estimation (Table 4). Each ablation is performed by masking the corresponding upstream impedance output while keeping all other model settings fixed; experiments are repeated and averaged to ensure stability. Masking the first impedance point (Z1) increases MAE and RMSE by 8.50% and 7.03%, respectively; masking Z2 increases MAE and RMSE by 9.32% and 7.88%; masking Z3 increases MAE and RMSE by 4.76% and 3.25%. Pairwise masking further elevates errors (for Z1+Z2, MAE and RMSE increase by 9.76% and 8.01%; for Z2+Z3, MAE and RMSE increase by 8.35% and 6.63%; for Z1+Z3, MAE and RMSE increase by 7.11% and 5.94%, respectively). Finally, removing all impedance supervision and training the model to regress SoH directly from [V, IC, T] (direct estimation) leads to the largest degradation (MAE and RMSE increase by 12.92% and 11.27%). These results show that (i) the full-feature model is robust, (ii) Z2 exerts the greatest influence, consistent with its stronger correlation to capacity reported in Section 2.1, and (iii) the intermediate impedance constraint substantially improves SoH accuracy relative to direct sequence-to-SoH regression.

Table 4. Results of ablation and direct estimation experiments.

4. Discussion

In deployment, the estimator requires only the first 300 sampling points of the charging-stage measurements, as well as standard BMS-collected current, voltage, and temperature traces, to produce an immediate SoH prediction. When applied in a streaming-data regime, the memory footprint of these inputs amounts to approximately 3.6 kB, and a single forward inference completes in roughly 7.3 ms. From a complexity perspective, the upstream transformer’s self-attention has time complexity O(L² · d) and working-memory cost that grows as O(L²), but with our fixed input length (L = 300) and modest hidden dimension (d = 64) the trained encoder occupies on the order of 1 MB of weights and requires only a few megabytes of transient working memory during a forward pass. The downstream XGBoost (100 trees, max depth five) requires only ≈0.1–0.2 MB of model storage and ≈500 node evaluations per sample, so its runtime and memory costs are negligible compared with the encoder; together these properties make the end-to-end pipeline practical for edge deployment and amenable to further latency/memory reduction via quantization or attention approximations. The resulting cycle-level SoH error remains within the range of 0.16% to 0.20%. These operational figures demonstrate that accurate, near-real-time health monitoring can be delivered with minimal data buffering and latency, meeting the tight resource envelopes of onboard BMS or edge-computing platforms.
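The resource figures quoted above can be cross-checked with back-of-envelope arithmetic. The sketch below assumes 32-bit floating-point storage and a feed-forward width of 256 in each encoder layer; neither value is given in the text, and the count covers only the four encoder layers (input projection and output head excluded). Under these assumptions the results are consistent with the reported orders of magnitude.

```python
SEQ_LEN, CHANNELS, D_MODEL = 300, 3, 64
N_LAYERS, N_HEADS = 4, 8
N_TREES, MAX_DEPTH = 100, 5
BYTES_FP32 = 4  # assumption: float32 storage
D_FF = 256      # assumption: feed-forward width (not given in the text)

# Input buffer: 300 samples x 3 channels.
input_bytes = SEQ_LEN * CHANNELS * BYTES_FP32

# Attention score matrices for one layer: heads x L x L, i.e. O(L^2) memory.
attn_scores_bytes = N_HEADS * SEQ_LEN * SEQ_LEN * BYTES_FP32

# Per-layer weights: Q, K, V, and output projections (with biases),
# a two-layer feed-forward block, and two layer norms.
attn_params = 4 * (D_MODEL * D_MODEL + D_MODEL)
ffn_params = (D_MODEL * D_FF + D_FF) + (D_FF * D_MODEL + D_MODEL)
norm_params = 2 * 2 * D_MODEL
encoder_params = N_LAYERS * (attn_params + ffn_params + norm_params)
encoder_weight_bytes = encoder_params * BYTES_FP32

# Each tree is traversed along one root-to-leaf path of at most MAX_DEPTH splits.
tree_node_visits = N_TREES * MAX_DEPTH

print(f"input buffer:       {input_bytes / 1e3:.1f} kB")
print(f"attention scores:   {attn_scores_bytes / 1e6:.2f} MB per layer")
print(f"encoder weights:    {encoder_weight_bytes / 1e6:.2f} MB")
print(f"tree node visits:   {tree_node_visits}")
```

The input buffer evaluates to 3600 bytes, matching the 3.6 kB figure, and the tree traversal bound reproduces the ≈500 node evaluations per sample.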

A key practical advantage of the proposed pipeline is its deliberate simplicity. The upstream transformer and the downstream XGBoost regressor together form a compact, modular architecture that attains state-of-the-art accuracy without resorting to heavy ensembles or highly parameterized models. Because the runtime, memory, and interpretability constraints are satisfied, there is little justification, given the present dataset and objectives, for adopting substantially more complex architectures; in practice, added model complexity often yields marginal gains at the cost of deployability and maintainability. We emphasize that the apparent simplicity is purposefully combined with EIS-informed intermediate supervision: embedding low/mid/high-frequency impedance bands during training both reduces estimation error and smooths the underlying learning manifold, which improves convergence. Moreover, the recovered impedance diagnostics confer physical interpretability and enhance robustness across chemistries and temperatures, obviating the need for multiple specialized models while retaining deployability at the edge.

Although EIS measurements are used offline to provide impedance labels as an intermediate training constraint, no EIS testing is required after deployment. The upstream module’s impedance predictions serve primarily as an internal supervisory signal during training and remain an ancillary by-product at inference time; the downstream XGBoost regressor consumes only the recovered impedance features to estimate SoH. This intermediate-constraint paradigm departs from classical feature-engineering approaches by embedding multidimensional impedance ageing structure directly into the learning objective. The consequence is twofold. First, the model is guided to learn physically meaningful, nonlinear mappings that reflect electrochemical degradation. Second, a degree of interpretability is conferred: predicted impedance trends can be inspected to explain model decisions, rather than treating the network as an opaque black box. As shown in Section 3, this training strategy produces superior performance on the KIT dataset, underscoring its effectiveness for continuous, lifecycle-scale battery monitoring and straightforward integration into real-world BMS deployments.
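The inference-time data flow described above can be sketched as follows. This is a schematic under stated assumptions rather than the authors' implementation: `upstream` and `downstream` are hypothetical callables standing in for the trained transformer encoder and the trained XGBoost regressor, and the six-element impedance vector is assumed to be ordered [Re1, Im1, Re2, Im2, Re3, Im3]. The point of the sketch is that no EIS measurement enters the function; only the operating window does.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (V, IC, T) reading
Impedance = List[float]              # assumed order: [Re1, Im1, Re2, Im2, Re3, Im3]

WINDOW_LENGTH = 300  # first 300 charging samples, as stated in the text


def estimate_soh(
    window: Sequence[Sample],
    upstream: Callable[[List[Sample]], Impedance],  # stand-in for the encoder
    downstream: Callable[[Impedance], float],       # stand-in for the regressor
) -> Tuple[float, Impedance]:
    """Two-stage inference: operating data -> impedance features -> SoH."""
    if len(window) < WINDOW_LENGTH:
        raise ValueError("need at least 300 charging samples")
    head = list(window[:WINDOW_LENGTH])

    # Stage 1: recover the impedance features from operating data alone.
    impedance = upstream(head)
    if len(impedance) != 6:
        raise ValueError("expected six impedance features")

    # Stage 2: the regressor sees only the recovered impedance features.
    soh = downstream(impedance)

    # The impedance vector is returned as well so it can be inspected
    # alongside the SoH estimate.
    return soh, impedance
```

Returning the impedance vector together with the SoH value mirrors the interpretability argument in the text: the intermediate quantities are available for inspection at no extra measurement cost.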

5. Conclusions

We present a compact, deployable framework that achieves high-precision, real-time SoH estimation from routine BMS signals. By coupling a transformer encoder for upstream feature extraction with an XGBoost regressor for downstream mapping, impedance-like features in the low-, mid-, and high-frequency bands are embedded directly from partial charging traces ([V, IC, T]) and used to regress cycle-level capacity. On the KIT dataset, which spans two chemistries (NCA, NCM), three charge rates, and three temperatures, the proposed pipeline achieves cycle-level errors of MAE = 0.12% and RMSE = 0.20% for NCA and MAE = 0.09% and RMSE = 0.16% for NCM. The upstream module reliably recovers resistive impedance components with low reconstruction error and captures the principal ageing signatures that the downstream regressor exploits to produce accurate capacity estimates. The method combines operational simplicity with physical grounding: after offline training (where EIS labels are used as an intermediate supervisory signal), no further EIS measurements are required in deployment. Only the first 300 charging samples (approximately 3.6 kB) are streamed; single-cycle inference completes in approximately 7.3 ms. The intermediate impedance constraint both regularizes the learned mapping, improving robustness across chemistries, charge rates, and temperatures, and provides a succinct, physically interpretable diagnostic (predicted impedances) that can be inspected alongside the SoH estimate. By embedding multidimensional impedance structure into a compact, edge-capable estimator, the approach removes the operational burdens and high cost of continual laboratory-grade EIS while retaining its diagnostic power.
This balance of accuracy, efficiency, and interpretability supports a practical shift in battery health management: lifecycle-scale, in situ ageing surveillance is made feasible for electric vehicles and grid-scale storage, enabling safer operation, targeted maintenance, and improved asset utilization.

Author Contributions

Conceptualization: Y.X., D.C. and D.S.; methodology: Y.X. and D.C.; resources: D.S.; supervision: D.S.; writing—original draft: Y.X. and D.C.; writing—review and editing: Y.X., D.C. and D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by the Energy Market Authority (EMA) of Singapore under the EDGE Program LA/Contract under Grant EDGE2-GC2022-008.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We would like to acknowledge the partial support from the NUS Central High-Performance Computing Facility for the computational work in this research.

Conflicts of Interest

The authors declare no competing interests.

References

1. Xie, L.; Singh, C.; Mitter, S.K.; Dahleh, M.A.; Oren, S.S. Toward carbon-neutral electricity and mobility: Is the grid infrastructure ready? Joule 2021, 5, 1908–1913.
2. Chu, S.; Majumdar, A. Opportunities and challenges for a sustainable energy future. Nature 2012, 488, 294–303.
3. Davis, S.J.; Lewis, N.S.; Shaner, M.; Aggarwal, S.; Arent, D.; Azevedo, I.L.; Benson, S.M.; Bradley, T.; Brouwer, J.; Chiang, Y.-M.; et al. Net-zero emissions energy systems. Science 2018, 360, eaas9793.
4. Meng, Q.; Huang, Y.; Li, L.; Wu, F.; Chen, R. Smart batteries for powering the future. Joule 2024, 8, 344–373.
5. Levin, T.; Bistline, J.; Sioshansi, R.; Cole, W.J.; Kwon, J.; Burger, S.P.; Crabtree, G.W.; Jenkins, J.D.; O’Neil, R.; Korpås, M. Energy storage solutions to decarbonize electricity through enhanced capacity expansion modelling. Nat. Energy 2023, 8, 1199–1208.
6. Hu, X.; Xu, L.; Lin, X.; Pecht, M. Battery lifetime prognostics. Joule 2020, 4, 310–346.
7. Palacín, M.R.; de Guibert, A. Why do batteries fail? Science 2016, 351, 1253292.
8. Che, Y.; Hu, X.; Teodorescu, R. Opportunities for battery aging mode diagnosis of renewable energy storage. Joule 2023, 7, 1405–1407.
9. Guo, R.; Tian, J. Battery health management in the era of big field data. Joule 2024, 8, 2951–2953.
10. Melin, H.E.; Rajaeifar, M.A.; Ku, A.Y.; Kendall, A.; Harper, G.; Heidrich, O. Global implications of the EU battery regulation. Science 2021, 373, 384–387.
11. Tao, Y.; Rahn, C.D.; Archer, L.A.; You, F. Second life and recycling: Energy and environmental sustainability perspectives for high-performance lithium-ion batteries. Sci. Adv. 2021, 7, eabi7633.
12. Newman, J.S.; Tobias, C.W. Theoretical Analysis of Current Distribution in Porous Electrodes. J. Electrochem. Soc. 1962, 109, 1183.
13. Gu, X.; Wang, X.; Ren, Y.; Zhou, W.; Huan, X.; Siegel, J.; Jiang, W.; Song, Z. Mechanical information enhanced battery state-of-health estimation. eTransportation 2025, 25, 100440.
14. Doyle, M.; Fuller, T.F.; Newman, J. Modeling of Galvanostatic Charge and Discharge of the Lithium/Polymer/Insertion Cell. J. Electrochem. Soc. 1993, 140, 1526.
15. Haran, B.S.; Popov, B.N.; White, R.E. Determination of the hydrogen diffusion coefficient in metal hydrides by impedance spectroscopy. J. Power Sources 1998, 75, 56–63.
16. Ding, S.; Li, Y.; Dai, H.; Wang, L.; He, X. Accurate Model Parameter Identification to Boost Precise Aging Prediction of Lithium-Ion Batteries: A Review. Adv. Energy Mater. 2023, 13, 2301452.
17. Liu, W.; Hu, X.; Zhang, K.; Xie, Y.; He, J.; Song, Z. Enabling high-fidelity electrothermal modeling of electric flying car batteries: A physics-data hybrid approach. Appl. Energy 2025, 388, 125633.
18. Xiao, J.; Adelstein, N.; Bi, Y.; Bian, W.; Cabana, J.; Cobb, C.L.; Cui, Y.; Dillon, S.J.; Doeff, M.M.; Islam, S.M. Assessing cathode–electrolyte interphases in batteries. Nat. Energy 2024, 9, 1463–1473.
19. Fan, C.; Liu, K.; Zhu, T.; Peng, Q. Understanding of Lithium-ion battery degradation using multisine-based nonlinear characterization method. Energy 2024, 290, 130230.
20. Schuster, S.F.; Bach, T.; Fleder, E.; Müller, J.; Brand, M.; Sextl, G.; Jossen, A. Nonlinear aging characteristics of lithium-ion cells under different operational conditions. J. Energy Storage 2015, 1, 44–53.
21. Ding, S.; Jiang, B.; Liang, Y.; Fu, L.; Qu, C.; Wei, X.; Xie, H.; Dai, H. Decoding Silicon-Driven Degradation: An Adaptive Fusion Framework for Robust Battery Electrode-level Diagnostics. Energy Storage Mater. 2026, 84, 104874.
22. Zhu, J.; Darma, M.S.D.; Knapp, M.; Sørensen, D.R.; Heere, M.; Fang, Q.; Wang, X.; Dai, H.; Mereacre, L.; Senyshyn, A. Investigation of lithium-ion battery degradation mechanisms by combining differential voltage analysis and alternating current impedance. J. Power Sources 2020, 448, 227575.
23. You, H.; Wang, X.; Zhu, J.; Jiang, B.; Han, G.; Wei, X.; Dai, H. Investigation of lithium-ion battery nonlinear degradation by experiments and model-based simulation. Energy Storage Mater. 2024, 65, 103083.
24. Wang, M.; Wu, S.; Chen, Y.; Luan, W. The snowball effect in electrochemical degradation and safety evolution of lithium-ion batteries during long-term cycling. Appl. Energy 2025, 378, 124909.
25. Nejad, S.; Gladwin, D.T.; Stone, D.A. A systematic review of lumped-parameter equivalent circuit models for real-time estimation of lithium-ion battery states. J. Power Sources 2016, 316, 183–196.
26. Lin, Y.-H.; Ruan, S.-J.; Chen, Y.-X.; Li, Y.-F. Physics-informed deep learning for lithium-ion battery diagnostics using electrochemical impedance spectroscopy. Renew. Sustain. Energy Rev. 2023, 188, 113807.
27. Diao, W.; Kim, J.; Azarian, M.H.; Pecht, M. Degradation modes and mechanisms analysis of lithium-ion batteries with knee points. Electrochim. Acta 2022, 431, 141143.
28. Hu, X.; Deng, Z.; Lin, X.; Xie, Y.; Teodorescu, R. Research directions for next-generation battery management solutions in automotive applications. Renew. Sustain. Energy Rev. 2021, 152, 111695.
29. Wang, Y.; Guo, S.; Cui, Y.; Deng, L.; Zhao, L.; Li, J.; Wang, Z. A comprehensive review of machine learning-based state of health estimation for lithium-ion batteries: Data, features, algorithms, and future challenges. Renew. Sustain. Energy Rev. 2025, 224, 116125.
30. Das, K.; Kumar, R.; Krishna, A. Analyzing electric vehicle battery health performance using supervised machine learning. Renew. Sustain. Energy Rev. 2024, 189, 113967.
31. You, G.-w.; Park, S.; Oh, D. Real-time state-of-health estimation for electric vehicle batteries: A data-driven approach. Appl. Energy 2016, 176, 92–103.
32. Lin, M.; Yan, C.; Wang, W.; Dong, G.; Meng, J.; Wu, J. A data-driven approach for estimating state-of-health of lithium-ion batteries considering internal resistance. Energy 2023, 277, 127675.
33. Sin, S.; Cho, S.; Lee, P.; Abbas, M.; Lee, S.; Kim, J. Data-driven prediction of battery degradation using EIS-based robust features. In Proceedings of the 2022 IEEE Energy Conversion Congress and Exposition (ECCE), Detroit, MI, USA, 9–13 October 2022; pp. 1–5.
34. Li, X.; Ju, L.; Geng, G.; Jiang, Q. Data-driven state-of-health estimation for lithium-ion battery based on aging features. Energy 2023, 274, 127378.
35. Liu, X.; Hu, Z.; Wang, X.; Xie, M. Capacity degradation assessment of lithium-ion battery considering coupling effects of calendar and cycling aging. IEEE Trans. Autom. Sci. Eng. 2023, 21, 3052–3064.
36. Liu, X.; Hu, Z.; Mao, L.; Xie, M. Adaptive State of Health Estimation for Lithium-ion Battery with Partially Unlabeled and Incomplete Charge Curves. IEEE Trans. Transp. Electrif. 2024, 11, 6165–6176.
37. Zhu, J.; Wang, Y.; Huang, Y.; Bhushan Gopaluni, R.; Cao, Y.; Heere, M.; Mühlbauer, M.J.; Mereacre, L.; Dai, H.; Liu, X. Data-driven capacity estimation of commercial lithium-ion batteries from voltage relaxation. Nat. Commun. 2022, 13, 2261.
38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; NIPS: Grenada, Spain, 2017; Volume 30.
39. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
40. Xiang, Y.; Jiang, B.; Dai, H. Cerberus: A Deep Learning Hybrid Model for Lithium-Ion Battery Aging Estimation and Prediction Based on Relaxation Voltage Curves. arXiv 2023, arXiv:2308.07824.
41. Xiang, Y.; Fan, W.; Zhu, J.; Wei, X.; Dai, H. Semi-supervised deep learning for lithium-ion battery state-of-health estimation using dynamic discharge profiles. Cell Rep. Phys. Sci. 2024, 5, 101763.

Figure 1. Overview of constant charging section (A) and correlation analysis between capacity and six impedance indices: Re1 (B), Im1 (C), Re2 (D), Im2 (E), Re3 (F), Im3 (G).

Figure 2. Overview of the model structure (A) and the upstream structure (B).

Figure 3. Capacity estimation results on Dataset 1 (NCA) and Dataset 2 (NCM). The estimation results and errors of capacity on Dataset 1 (NCA) (A) and Dataset 2 (NCM) (B).

Figure 4. Feature impedance estimation results on Dataset 1 (NCA). The estimation results and errors of Re1 (A), Im1 (B), Re2 (C), Im2 (D), Re3 (E), Im3 (F) on three typical batteries under different temperatures.

Figure 5. Feature impedance estimation results on Dataset 2 (NCM). The estimation results and errors of Re1 (A), Im1 (B), Re2 (C), Im2 (D), Re3 (E), Im3 (F) on three typical batteries under different temperatures.

Table 1. Overview of datasets.

Dataset	Source	Cathode	Number of Cells	Form Factor	Nominal Capacity	Charge Condition	Temperature	Sampling
1	KIT	NCA (LiNi_0.86Co_0.11Al_0.03O₂)	66	18650 cylindrical	3500 mAh	0.25C/0.5C/1C	25 °C/35 °C/45 °C	Charge/Discharge: 2 s, Relax: 60 s
2	KIT	NCM (LiNi_0.83Co_0.11Mn_0.07O₂)	55	18650 cylindrical	3500 mAh	0.25C/0.5C/1C	25 °C/35 °C/45 °C	Charge/Discharge: 2 s, Relax: 60 s

Table 2. Results comparison on Dataset 1 (NCA) with existing SoH estimation techniques.

Research	Zhu et al. [37]	Xiang et al. [41]	Xiang et al. [40]	This Paper
Techniques	XGBoost/ElasticNet/SVR	bi-GRU + GPR	RNN-based	Transformer + XGBoost
MAE	Not Available	1.47%	0.29%~5.14%	0.12%
RMSE	1.10%	2.06%	Not Available	0.20%

Table 3. Estimation results of feature impedance.

Dataset	Metric	Re1	Im1	Re2	Im2	Re3	Im3
Dataset 1 (NCA)	MAE	0.09%	0.32%	0.07%	0.29%	0.06%	0.23%
Dataset 1 (NCA)	RMSE	0.18%	1.17%	0.15%	0.63%	0.13%	0.76%
Dataset 2 (NCM)	MAE	0.16%	0.65%	0.04%	0.21%	0.04%	0.09%
Dataset 2 (NCM)	RMSE	0.26%	1.10%	0.16%	0.44%	0.10%	0.21%

Table 4. Results of ablation and direct estimation experiments.

Retained Impedance Features	[Z2, Z3]	[Z1, Z3]	[Z1, Z2]	[Z1]	[Z2]	[Z3]	Direct
Delta MAE	8.50%	9.32%	4.76%	8.35%	7.11%	9.76%	12.92%
Delta RMSE	7.03%	7.88%	3.25%	6.63%	5.94%	8.01%	11.27%

