Article

Data-Driven State-of-Health Estimation by Reconstructing Virtual Full-Charge Segments

School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
*
Author to whom correspondence should be addressed.
Batteries 2026, 12(1), 10; https://doi.org/10.3390/batteries12010010 (registering DOI)
Submission received: 8 December 2025 / Revised: 20 December 2025 / Accepted: 24 December 2025 / Published: 26 December 2025

Abstract

The rapid growth of new energy vehicles necessitates accurate battery state of health (SOH) assessment to ensure safety and reliability. However, real-world SOH estimation is challenging because users rarely perform full charge–discharge cycles, leaving only fragmented charging segments that obscure true battery capacity. To address this, we propose a data-driven method that reconstructs a virtual full-charge cycle. By clustering charging segments based on temperature and current, the approach creatively splices multiple incomplete curves from similar mileages and conditions into a complete charging profile. This enables robust full-capacity estimation on a large-scale real-world vehicle dataset, achieving estimation errors below 2% when compared with offline validation tests. The method offers a practical and scalable solution for SOH monitoring and fleet management using field data.

1. Introduction

Lithium-ion batteries serve as the primary energy source for propulsion in new energy vehicles (NEVs) [1]. The state of health (SOH) is a pivotal metric that reflects the extent of electrochemical degradation within the battery system and directly governs the NEV’s performance, safety, and operational reliability [2,3,4]. Over prolonged cycling, side reactions such as solid electrolyte interphase (SEI) growth, lithium plating, and transition metal dissolution induce capacity fading, internal resistance increase, and heat generation, collectively deteriorating the available power and energy output [5,6]. These degradation phenomena not only constrain the driving range and acceleration capability but also elevate the risk of thermal runaway and mechanical failure [7,8]. Consequently, accurate and real-time SOH estimation is indispensable for energy optimization and for the safe, long-term management of NEVs [9,10].
Beyond its technical significance, precise SOH estimation also has profound economic and environmental implications [11,12]. Reliable SOH assessment provides a scientific basis for operational decision making, including usage optimization, maintenance scheduling, and end-of-life management, thereby reducing life-cycle costs and enhancing system sustainability [13,14]. Moreover, SOH serves as a key criterion for second-life applications, determining whether a retired battery retains sufficient functionality for echelon utilization in distributed energy storage or other low-demand scenarios [15,16]. In this context, SOH estimation constitutes the scientific foundation for extending the life-cycle value chain of retired lithium-ion batteries [17].
Presently, estimation methodologies for battery SOH are broadly categorized into model-driven and data-driven approaches [18]. Model-driven methods rely on physics-based electrochemical models or equivalent circuit models (ECMs) to interpret degradation behaviors from a mechanistic perspective [19,20]. Conversely, data-driven methods leverage statistical analysis and machine learning to capture degradation trends by mining large-scale historical operational data [21]. With the rapid development of big data analytics and artificial intelligence, data-driven SOH estimation has become a prominent research frontier, offering enhanced adaptability and scalability for real-world NEV applications [22,23].
Nonetheless, conventional model-driven methods exhibit significant limitations when applied to real-world vehicle operations [24]. These approaches are typically developed under controlled laboratory conditions and depend on specific features or parameters extracted from experiments conducted at constant temperature or under constant-current/constant-voltage protocols [25]. During real-world driving and charging, however, the operating conditions are highly dynamic and nonlinear, making it impossible to satisfy these stringent laboratory prerequisites. This mismatch between theoretical model assumptions and practical conditions leads to substantial discrepancies between estimated and actual SOH values [26].
While data-driven methods are inherently better suited to practical operating conditions, they face their own challenges [27]. Most existing techniques require pre-labeled SOH data as the ground truth for model training [28,29]. These labels are typically derived from costly, time-consuming laboratory tests or inferred from the vehicle’s battery management system (BMS) [30]. However, the SOH values reported by onboard BMSs are prone to estimation errors, causing error propagation and limiting the achievable estimation precision of data-driven models [31].
A fundamental bottleneck common to both categories of methods lies in the incompleteness of real-world operational data [32]. Due to the stochastic and user-dependent nature of charging behavior, NEV batteries are seldom charged from a low state of charge to full capacity [33]. Consequently, historical datasets are dominated by incomplete charging records, which hinder the construction of accurate degradation features and impose an upper limit on achievable estimation accuracy [34].
To address these challenges, this study proposes a novel SOH estimation framework that integrates data-driven techniques with a clustering-based feature reconstruction strategy. The proposed method operates exclusively on real-world data and eliminates the dependence on pre-labeled SOH values. A clustering algorithm leveraging statistical indicators of temperature and current is introduced to partition complex operating conditions and mitigate the influence of variable thermal and loading profiles. Most critically, a charging segment splicing strategy is developed, which identifies and concatenates multiple incomplete charging segments from the same vehicle that share similar operating conditions and close mileage proximity. This reconstruction yields a virtually complete charging profile, enabling robust capacity estimation. By averaging the charge capacity across spliced segments, the proposed framework significantly reduces estimation bias caused by anomaly segments and effectively overcomes the accuracy bottleneck imposed by the lack of fully charged data.

2. Theoretical Framework for Virtual Capacity Reconstruction

The SOH of a battery is fundamentally defined as the ratio of its current maximum available capacity $C_{curr}(t)$ at a specific time $t$ (or mileage $m$) to its initial, beginning-of-life (BOL) capacity $C_{initial}$ [35]. This relationship is mathematically formulated as
$$SOH(t) = \frac{C_{curr}(t)}{C_{initial}} \tag{1}$$
Therefore, obtaining an accurate measurement of the current full capacity, $C_{curr}(t)$, is the physical basis for any SOH estimation. Yet, in real-world operation, charging processes are frequently interrupted by stochastic user behavior, resulting in field data dominated by incomplete charging segments.
A common method to estimate the current full capacity $C_{seg,i}$ from a single incomplete segment $i$ is to divide the measured charge accumulation $\Delta Q_i$ by the corresponding change in state of charge (SOC) $\Delta SOC_i$ [36]. This relationship is formulated as
$$C_{seg,i} = \frac{\Delta Q_i}{\Delta SOC_i} = \frac{\int_{t_{start,i}}^{t_{end,i}} I(t)\,dt}{SOC(t_{end,i}) - SOC(t_{start,i})} \tag{2}$$
However, the $SOC(t)$ value used in the denominator is itself an onboard estimation and is subject to errors from model inaccuracies, sensor drift, and temperature variations [37]. The estimation error in the denominator propagates directly into the capacity calculation, introducing significant uncertainty and bias in the estimated segment capacity $C_{seg,i}$.
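The sensitivity of the single-segment estimate to SOC error can be illustrated with a minimal sketch. The function name, the trapezoidal integration, and the sample values below are illustrative assumptions, not the authors' implementation; only the 145 Ah scale is chosen to match the pack described later in the paper.

```python
def segment_capacity_ah(times_s, currents_a, soc_start, soc_end):
    """Estimate full capacity (Ah) from one partial charge as delta_Q / delta_SOC.

    SOC values are fractions in [0, 1]; current is positive while charging.
    """
    if len(times_s) != len(currents_a) or len(times_s) < 2:
        raise ValueError("need at least two aligned samples")
    delta_soc = soc_end - soc_start
    if delta_soc <= 0:
        raise ValueError("SOC must increase over a charging segment")
    # Trapezoidal Ampere-hour integration of the sampled current.
    charge_as = 0.0
    for k in range(len(times_s) - 1):
        dt = times_s[k + 1] - times_s[k]
        charge_as += 0.5 * (currents_a[k] + currents_a[k + 1]) * dt
    return (charge_as / 3600.0) / delta_soc


# Hypothetical segment: 1 h at a constant 29 A, reported SOC 40% -> 60%.
times = [0.0, 1800.0, 3600.0]
currents = [29.0, 29.0, 29.0]
nominal = segment_capacity_ah(times, currents, 0.40, 0.60)
# Same charge, but the reported end-of-segment SOC reads 2 points high (62%).
biased = segment_capacity_ah(times, currents, 0.40, 0.62)
relative_error = (biased - nominal) / nominal
```

In this hypothetical case a 2-point SOC error over a 20-point window shifts the capacity estimate by roughly 9%, which is why short segments are especially fragile under Equation (2).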
To address these limitations, we propose a novel approach for reconstructing a virtual full-charge capacity, denoted as $C_{virtual}$. This method synthesizes a complete and physically consistent charging profile by splicing multiple incomplete charging segments extracted from real-world operation. The theoretical soundness of this reconstruction rests on two key assumptions:
  • Quasi-static degradation: Within a sufficiently small mileage window $\Delta M$, the lithium-ion battery’s capacity degradation is considered negligible. For any two mileages $m_i$ and $m_j$ within this window,
    $$|m_i - m_j| \le \Delta M \;\Rightarrow\; C_{curr}(m_i) \approx C_{curr}(m_j) \tag{3}$$
  • Consistency of polarization via clustering: Ideally, the capacity integration intervals should be defined based on the open-circuit voltage (OCV) to reflect thermodynamic equilibrium. However, real-world BMS data typically provides only terminal voltage ($V_{terminal}$), which differs from OCV due to ohmic potential drop and polarization effects:
    $$V_{terminal} = V_{OCV} + I \cdot R_{int} + V_{pol} \tag{4}$$
    Direct usage of $V_{terminal}$ for data splicing across heterogeneous operating conditions would introduce significant systematic errors, as the voltage drop would vary drastically. To justify the use of $V_{terminal}$, we employ a strict clustering strategy based on the dominant operating factors, namely current and temperature. By ensuring that all segments within a specific cluster share statistically similar current and temperature profiles, we impose a constraint where the polarization terms remain consistent across spliced segments:
    $$(I \cdot R_{int} + V_{pol})_i \approx (I \cdot R_{int} + V_{pol})_j \approx C_{pol} \tag{5}$$
    Under this condition, aligning segments based on $V_{terminal}$ becomes mathematically equivalent to aligning them based on $V_{OCV}$ with a constant offset $C_{pol}$. This effective alignment preserves the physical validity of the capacity integration $\Delta Q$ within the standardized voltage intervals, despite the lack of OCV measurements. While internal resistance increases with long-term aging, the proposed method reconstructs capacity profiles locally within a narrow mileage window. Consequently, the impedance and polarization characteristics are considered distinct for each reconstruction instance but consistent within the splicing set, thereby isolating the aging effect from the reconstruction process.
In practical automotive applications, lithium-ion batteries predominantly exhibit a linear degradation trajectory during their primary lifespan. The end-of-life (EOL) threshold is conventionally defined as 80% of the nominal capacity, which is typically reached after a cumulative mileage of 80,000 to 150,000 km. Taking a conservative estimate in which an electric vehicle reaches 80% SOH at 80,000 km, the average degradation rate is approximately 0.025% per 100 km. Consequently, within the proposed sliding window of $\Delta M = 2000$ km, the theoretical capacity loss is limited to approximately 0.5%. This magnitude is significantly lower than the inherent sensor noise and the algorithmic tolerance margin of 2%. Furthermore, although lithium-ion batteries may exhibit nonlinear aging or capacity diving phenomena in their late life-cycle stages, the degradation curve remains locally linear within such a narrow mileage interval. Therefore, the capacity variation within closely spaced driving intervals is statistically insignificant and can be reasonably considered quasi-static.
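The arithmetic behind the quasi-static assumption can be checked with a short sketch. The helper name is hypothetical, and the linear-fade model from 100% SOH at 0 km is the simplification stated in the text.

```python
def window_capacity_loss_pct(eol_soh_pct, eol_mileage_km, window_km):
    """SOH loss (percentage points) across a mileage window, assuming linear
    fade from 100% SOH at 0 km down to eol_soh_pct at eol_mileage_km."""
    fade_per_km = (100.0 - eol_soh_pct) / eol_mileage_km
    return fade_per_km * window_km


# Conservative case from the text: 80% SOH reached at 80,000 km.
rate_per_100km = window_capacity_loss_pct(80.0, 80_000.0, 100.0)
loss_in_window = window_capacity_loss_pct(80.0, 80_000.0, 2_000.0)
# Upper end of the quoted mileage range: 80% SOH reached at 150,000 km.
loss_long_life = window_capacity_loss_pct(80.0, 150_000.0, 2_000.0)
```

Under these assumptions the conservative case reproduces the 0.025% per 100 km rate and the 0.5% loss per 2000 km window, while a pack that reaches EOL at 150,000 km would lose under 0.3% over the same window.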
Our methodology reconstructs the capacity by discretizing the standardized, full voltage range $[V_{min}, V_{max}]$ into $N$ small, non-overlapping intervals $\Delta V_j = [V_j, V_{j+1}]$ (e.g., $\Delta V = 0.01$ V). The current capacity $C_{curr}(t)$ can thus be represented as the sum of the incremental capacities $q_j$ required to charge through each interval $\Delta V_j$,
$$C_{curr}(t) = \sum_{j=1}^{N} q_j(t), \quad \text{where} \quad q_j(t) = \int_{V_j}^{V_{j+1}} I(V)\,\frac{dt}{dV}\,dV \tag{6}$$
To ensure operating condition consistency, all charging segments are first partitioned into $K$ distinct clusters. For a target charging segment $i$ at mileage $m_i$ belonging to cluster $k$, we first collect a set $S_k(m_i)$ of $n$ adjacent charging segments that satisfy the two assumptions:
$$S_k(m_i) = \{\, r \mid \mathrm{vehicle}(r) = \mathrm{vehicle}(i),\ \mathrm{cluster}(r) = k,\ |m_r - m_i| \le \Delta M \,\} \tag{7}$$
For each voltage interval $\Delta V_j$, we then gather the set of all available incremental capacity measurements $\{q_{r,j}\}$ from all segments $r \in S_k(m_i)$ that cover this specific voltage interval. A robust average, $\bar{q}_j(m_i, k)$, is then calculated for this interval by applying a statistical filter to remove outliers from the set $\{q_{r,j}\}$,
$$\bar{q}_j(m_i, k) = \mathrm{RobustAverage}\left(\{\, q_{r,j} \mid r \in S_k(m_i) \text{ and } \Delta V_j \subseteq [V_{start,r}, V_{end,r}] \,\}\right) \tag{8}$$
The virtual full-charge capacity $C_{virtual}(m_i, k)$ for the target segment $i$ is defined as the sum of these robustly averaged incremental capacities $\bar{q}_j$ across the entire standardized voltage range $[V_{min}, V_{max}]$:
$$C_{virtual}(m_i, k) = \sum_{j=1}^{N} \bar{q}_j(m_i, k) \tag{9}$$
The reconstructed $C_{virtual}(m_i, k)$ serves as a robust and physically consistent approximation of the current full capacity $C_{curr}(m_i)$ under operating condition $k$. By substituting the reconstructed capacity into the fundamental definition of SOH, the SOH corresponding to segment $i$ at mileage $m_i$ and condition $k$ can be expressed as
$$SOH(m_i, k) = \frac{C_{virtual}(m_i, k)}{C_{initial}} \tag{10}$$
This formulation enables accurate SOH estimation by reconstructing a complete and physically meaningful capacity profile from fragmented real-world data, thereby effectively overcoming the inherent limitations imposed by incomplete charging records.

3. Algorithmic Implementation for SOH Calculation

The implementation of the proposed SOH estimation framework comprises four major stages: (1) data acquisition and feature engineering, (2) operating condition classification through clustering analysis, (3) virtual capacity reconstruction via charging segment splicing, and (4) SOH calibration coupled with anomaly detection. The overall architecture of this process is depicted in Figure 1.

3.1. Data Acquisition and Feature Engineering

The foundational dataset utilized in this study comprises historical time-series data uploaded from the BMS of NEVs to the cloud platform during charging events. For each charging segment $i$, the raw dataset $D_i$ contains timestamps $t$, current $I(t)$, individual cell voltages $V_{cell,j}(t)$, and temperatures $T(t)$.
For each cell, the incremental capacity $q_{i,j}$ charged within each predefined voltage interval $\Delta V_j$ (e.g., $\Delta V = 0.01$ V) is calculated using the Ampere-hour integration method. The charge $\Delta Q_k$ accumulated between two consecutive timestamps $t_k$ and $t_{k+1}$ is given by
$$\Delta Q_k = \int_{t_k}^{t_{k+1}} I(t)\,dt \tag{11}$$
By mapping the calculated $\Delta Q_k$ values to their corresponding voltage intervals $\Delta V_j$, the set of incremental capacities $\{q_{i,j}\}$ is derived for each charging segment $i$. Concurrently, to facilitate the subsequent clustering analysis, a comprehensive series of statistical features is extracted, including the cumulative mileage $m_i$ as well as the maximum, minimum, and mean values of both temperature ($T_{max,i}$, $T_{min,i}$, $T_{mean,i}$) and current ($I_{max,i}$, $I_{min,i}$, $I_{mean,i}$). These indicators comprehensively characterize the electrical and thermal loading conditions of each segment, thereby establishing a robust foundation for the classification of operating conditions.
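The feature-engineering step can be sketched as follows. This is a simplified illustration under stated assumptions: each sample-to-sample charge increment is credited to the voltage bin containing the mean voltage of that step, the function names are hypothetical, and the sample values are invented for demonstration.

```python
import math


def incremental_capacity_by_bin(times_s, currents_a, voltages_v, v_min, dv):
    """Return {bin index j: Ah charged while the cell voltage was in bin j}.

    Bin j covers [v_min + j*dv, v_min + (j+1)*dv). Crediting each step to the
    bin of its mean voltage is an assumption of this sketch.
    """
    q = {}
    for k in range(len(times_s) - 1):
        dt_h = (times_s[k + 1] - times_s[k]) / 3600.0
        dq = 0.5 * (currents_a[k] + currents_a[k + 1]) * dt_h  # trapezoid, Ah
        v_mid = 0.5 * (voltages_v[k] + voltages_v[k + 1])
        # Small epsilon guards against float round-off at bin boundaries.
        j = math.floor((v_mid - v_min) / dv + 1e-9)
        q[j] = q.get(j, 0.0) + dq
    return q


def segment_features(temps_c, currents_a):
    """Six clustering features: max, min, mean of temperature and of current."""
    t_mean = sum(temps_c) / len(temps_c)
    i_mean = sum(currents_a) / len(currents_a)
    return [max(temps_c), min(temps_c), t_mean,
            max(currents_a), min(currents_a), i_mean]


# Hypothetical 3-minute excerpt sampled every 60 s at a constant 36 A.
times = [0.0, 60.0, 120.0, 180.0]
currents = [36.0, 36.0, 36.0, 36.0]
voltages = [3.700, 3.704, 3.712, 3.718]
q_bins = incremental_capacity_by_bin(times, currents, voltages, v_min=3.0, dv=0.01)
features = segment_features([25.0, 26.0, 27.0], currents)
```

Bins at the edges of a segment are usually only partially traversed, so a practical pipeline would likely keep only the intervals a segment fully covers before splicing.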

3.2. Operating Condition Classification via K-Means Clustering

To account for the substantial influence of operating conditions on charging behavior, all charging segments are partitioned into $K$ distinct clusters representing different thermal–electrical modes. The k-means algorithm is adopted for this purpose, owing to its efficiency and stability in handling high-dimensional continuous features and its suitability for unsupervised partitioning of large-scale battery datasets [38]. It should be noted that mileage is not included in the clustering features, as it serves as a prerequisite criterion for sample selection; only charging segments with a mileage difference smaller than a predefined threshold $\Delta M$ are grouped together prior to clustering. For each segment $i$, a feature vector $X_i$ is constructed from its most representative indicators, namely the maximum, minimum, and mean values of temperature and current:
$$X_i = [T_{max,i}, T_{min,i}, T_{mean,i}, I_{max,i}, I_{min,i}, I_{mean,i}] \tag{12}$$
To ensure that each feature contributes equally to the distance calculation, the features are first normalized to the range $[0, 1]$ using min-max scaling:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{13}$$
The k-means algorithm aims to minimize the within-cluster sum of squared errors (SSE) [39], which is defined as
$$J = \sum_{k=1}^{K} \sum_{X_i \in S_k} \| X_i - \mu_k \|^2 \tag{14}$$
where $S_k$ denotes the set of charging segments belonging to cluster $k$, as defined in Equation (7), and $\mu_k$ represents the centroid of that cluster. The optimal number of clusters $K$ is determined through a comprehensive evaluation that combines the Elbow method and the Silhouette coefficient across a range of candidate $K$ values [40,41]. Each resulting cluster $S_k$ represents a distinct charging operating condition.
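The scaling and clustering steps can be sketched in plain Python as below. This is a minimal illustration, not the authors' code: it uses basic Lloyd iterations with a deterministic farthest-point seeding (an assumption made here for reproducibility), only two features per segment instead of six, invented feature values, and it computes the SSE for an elbow-style comparison while omitting the Silhouette coefficient.

```python
def min_max_scale(rows):
    """Scale each column of a list of feature rows to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iters=100):
    """Lloyd iterations; returns (labels, centroids, SSE)."""
    # Deterministic seeding: first point, then repeatedly the point farthest
    # from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Hypothetical [mean temperature (deg C), mean current (A)] per segment.
raw = [[10.0, 20.0], [12.0, 22.0], [11.0, 21.0],
       [35.0, 120.0], [36.0, 118.0], [34.0, 122.0]]
points = min_max_scale(raw)
sse_by_k = {k: k_means(points, k)[2] for k in (1, 2, 3)}
labels, _, _ = k_means(points, 2)
```

Plotting `sse_by_k` against the candidate values gives the elbow curve; on this toy data the drop from one cluster to two dominates, separating slow, cool charging from fast, warm charging.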

3.3. Virtual Capacity Reconstruction via Splicing

The algorithm for computing the virtual full-charge capacity $C_{virtual}$ is implemented for each target charging segment $i$ based on the mathematical framework established in Section 2. The procedure proceeds sequentially through adjacent segment filtering, incremental capacity aggregation, robust statistical averaging, and final capacity summation.
In the adjacent segment filtering stage, a sequence-based filtering mechanism is applied using a sliding window of length $\Delta M$. Starting from the mileage corresponding to each target segment $i$ (with mileage $m_i$ and cluster index $k$), the algorithm iteratively slides the window forward along the mileage axis to identify all charging segments belonging to the same vehicle and operating condition cluster, where the mileage difference satisfies $|m_r - m_i| \le \Delta M$ (typically $\Delta M = 2000$ km). Consistent with the theoretical derivation in Section 2, this interval corresponds to a conservative capacity loss of approximately 0.5%, which is statistically negligible compared to the algorithmic tolerance margin, thereby strictly satisfying the quasi-static assumption. The resulting set of filtered segments, denoted as $S_{adj}$, forms the local neighborhood used for virtual capacity reconstruction.
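The neighborhood selection amounts to a three-condition filter, sketched below. The `Segment` record, its field names, and the sample vehicle identifiers are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    vehicle_id: str
    mileage_km: float
    cluster: int


def adjacent_segments(target, segments, window_km=2000.0):
    """Segments of the same vehicle and cluster with |m_r - m_i| <= window_km."""
    return [s for s in segments
            if s.vehicle_id == target.vehicle_id
            and s.cluster == target.cluster
            and abs(s.mileage_km - target.mileage_km) <= window_km]


segments = [
    Segment("A", 50_000.0, 2),   # target
    Segment("A", 51_500.0, 2),   # same vehicle and cluster, within window
    Segment("A", 52_500.0, 2),   # too far in mileage
    Segment("A", 50_800.0, 3),   # different operating condition
    Segment("B", 50_100.0, 2),   # different vehicle
]
neighbors = adjacent_segments(segments[0], segments)
neighbor_mileages = [s.mileage_km for s in neighbors]
```

Note that the target segment satisfies all three conditions itself, so it is part of its own neighborhood in this sketch.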
Once the set $S_{adj}$ is defined, the algorithm aggregates all available incremental capacity data. A temporary data structure $Q_{collector}$ is created. $Q_{collector}$ is an array of lists, where each index $j$ corresponds to a specific voltage interval $\Delta V_j$ in the standardized range $[V_{min}, V_{max}]$. The algorithm iterates through every segment $r \in S_{adj}$. For each segment, it reads its pre-calculated incremental capacity data $\{q_{r,j}\}$ and appends each $q_{r,j}$ to the corresponding list in the collector:
$$Q_{collector}[j] = \{\, q_{r,j} \mid r \in S_{adj} \text{ and } \Delta V_j \text{ is covered by } r \,\} \tag{15}$$
After this step, $Q_{collector}[j]$ holds all measured incremental capacity values for the $j$-th voltage interval from all relevant neighboring segments.
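The aggregation step can be sketched as an array of lists indexed by voltage interval. The representation of each segment as a dictionary from bin index to incremental capacity is an assumption of this sketch, and the values are invented.

```python
def build_collector(neighbor_bins, n_bins):
    """Gather incremental capacities per voltage interval.

    neighbor_bins: one dict per neighboring segment, {bin index j: q_rj in Ah},
    holding only the voltage intervals that the segment covers.
    """
    collector = [[] for _ in range(n_bins)]
    for bins in neighbor_bins:
        for j, q in bins.items():
            if 0 <= j < n_bins:  # ignore data outside the standardized range
                collector[j].append(q)
    return collector


# Three hypothetical partial segments that overlap and jointly span 4 bins.
neighbor_bins = [
    {0: 1.0, 1: 1.2},
    {1: 1.1, 2: 0.9},
    {2: 1.0, 3: 0.8},
]
collector = build_collector(neighbor_bins, n_bins=4)
```

No single segment here covers all four intervals, yet every interval ends up with at least one measurement, which is the situation the splicing strategy is designed to exploit.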
A simple mean of $Q_{collector}[j]$ is highly susceptible to outliers and may be distorted by anomalous values from individual segments. To enhance robustness, a statistical filtering procedure is applied to each list $Q_{collector}[j]$ prior to averaging. Specifically, a standard boxplot-based interquartile range (IQR) filter is employed to remove outliers and retain only statistically consistent values, as detailed in Algorithm 1.
Algorithm 1 Robust Statistical Filtering for Virtual Capacity Calculation
Require: List of incremental capacities $Q_{collector}[j] = \{q_1, q_2, \ldots, q_n\}$ for voltage interval $j$
Ensure: Robust average $\bar{q}_j$
 1: Compute the first and third quartiles: $Q1_j = \mathrm{Quantile}(Q_{collector}[j], 0.25)$, $Q3_j = \mathrm{Quantile}(Q_{collector}[j], 0.75)$
 2: Calculate the interquartile range: $IQR_j = Q3_j - Q1_j$
 3: Define acceptance bounds: $Lower\_Bound_j = Q1_j - 1.5 \times IQR_j$, $Upper\_Bound_j = Q3_j + 1.5 \times IQR_j$
 4: Filter outliers to form a new list: $Q'_{collector}[j] = \{\, q \in Q_{collector}[j] \mid Lower\_Bound_j \le q \le Upper\_Bound_j \,\}$
 5: Compute the robust mean: $\bar{q}_j = \frac{1}{|Q'_{collector}[j]|} \sum_{q \in Q'_{collector}[j]} q$
 6: return $\bar{q}_j$
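Algorithm 1 translates almost line for line into Python. The sketch below assumes linearly interpolated quantiles, since the quantile convention is not specified in the text; the function names and sample values are illustrative.

```python
def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an ascending list (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def robust_average(values):
    """Mean of the values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    if not values:
        raise ValueError("no measurements for this voltage interval")
    vals = sorted(values)
    q1, q3 = quantile(vals, 0.25), quantile(vals, 0.75)   # step 1
    iqr = q3 - q1                                         # step 2
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr         # step 3
    kept = [v for v in vals if lower <= v <= upper]       # step 4
    return sum(kept) / len(kept)                          # step 5


# Hypothetical increments (Ah) for one voltage interval; 1.60 is anomalous.
values = [1.00, 1.02, 0.98, 1.01, 0.99, 1.60]
plain_mean = sum(values) / len(values)
filtered_mean = robust_average(values)
```

For these values the plain mean is pulled up to about 1.10 Ah by the single anomalous reading, while the filtered mean stays at about 1.00 Ah.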
Finally, the total virtual capacity $C_{virtual}(m_i, k)$ is computed by summing the robust averages $\bar{q}_j$ for all $N$ intervals across the standardized voltage range $[V_{min}, V_{max}]$ as described in Equation (9). This single scalar value is the output of the algorithm for segment $i$. It represents the robustly estimated full capacity at mileage $m_i$ for operating condition $k$.
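The final summation and the conversion to SOH can be sketched as follows. Two choices here are assumptions of the sketch rather than details from the paper: the median is used as a stand-in for the IQR-filtered mean so that the block is self-contained, and the function returns `None` when any interval lacks data. The four coarse bins and their values are invented; only the 145 Ah nominal capacity comes from the dataset description.

```python
from statistics import median


def virtual_capacity_ah(collector, average=median):
    """Sum one averaged increment per voltage interval.

    Returns None when any interval has no measurement, because the spliced
    profile would then not span the full standardized voltage range.
    """
    if any(len(bin_values) == 0 for bin_values in collector):
        return None
    return sum(average(bin_values) for bin_values in collector)


def soh(virtual_capacity, initial_capacity_ah):
    """SOH as the ratio of reconstructed to beginning-of-life capacity."""
    return virtual_capacity / initial_capacity_ah


# Hypothetical collector with four coarse voltage bins (Ah per bin).
collector = [[36.0, 36.4], [35.0], [34.6, 35.0, 34.8], [33.0, 33.4]]
c_virtual = virtual_capacity_ah(collector)
soh_value = soh(c_virtual, initial_capacity_ah=145.0)
incomplete = virtual_capacity_ah([[1.0], []])
```

Passing a different `average` callable, such as an IQR-filtered mean, swaps in the robust statistic of Algorithm 1 without changing the summation logic.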

4. Results and Discussion

This section presents the experimental validation of the proposed SOH estimation methodology. The algorithm was applied to an extensive dataset of real-world operational data collected from a fleet of electric vehicles. The following subsections detail the dataset characteristics, the intermediate clustering results, the final SOH estimation accuracy, and specific validation cases.

4.1. Experimental Dataset Overview

The dataset utilized in this study consists of historical charging data collected from a fleet of 108 real-world electric vehicles of the same model. All vehicles are equipped with the same battery pack specifications, featuring a configuration of Nickel–Cobalt–Manganese (NCM) cells with a nominal capacity of 145 Ah. Comprising a total of 65,250 charging segments, the dataset captures detailed time-series records of current, cell voltages, temperature, and cumulative mileage for each charging event. The aggregated data spans a broad operational spectrum, with vehicle mileages ranging from 0 km to approximately 250,000 km, covering the entire life-cycle from fresh to highly degraded states. Figure 2 illustrates the mileage distribution of the fleet, demonstrating a comprehensive and data-rich sample across various degradation stages.

4.2. Operating Condition Clustering Results

As outlined in the methodology, partitioning the data by operating condition is a prerequisite for accurate splicing. The full estimation model uses six statistical features as clustering inputs, namely the maximum, minimum, and mean values of both temperature and current. For clarity of visualization, the k-means clustering results are shown here using two representative features, average charging current and average charging temperature. Figure 3 depicts the results for $K = 7$. The clusters effectively segregate distinct charging behaviors, clearly distinguishing between low-power/low-temperature scenarios and high-power/high-temperature scenarios. This separation is essential to ensure that the virtual capacity reconstruction strategy strictly splices segments that share comparable electrochemical conditions.

4.3. Comparison with Industrial Baseline

Figure 4 presents a comparative analysis of the SOH degradation trajectory for a representative vehicle, estimated by the proposed splicing methodology versus the conventional Ampere-hour integration method calculating capacity via $\Delta Q / \Delta SOC$. The separation of the high-temperature cluster (red) into two groups along the mileage axis in Figure 4 is attributed to the multi-year duration of the dataset, which covers more than one summer season.
It is important to acknowledge that the Ampere-hour integration method serves as a baseline with known limitations, primarily due to its heavy reliance on the onboard BMS-estimated SOC, which inherently suffers from measurement drift and estimation errors. To faithfully reproduce the real-world performance constraints of legacy systems, the baseline capacity is calculated directly utilizing the raw SOC values reported by the BMS, without the application of additional filtering. However, as this method remains the prevailing standard in many legacy industrial systems, this comparison is essential. Its purpose is not to benchmark against ideal laboratory models, but to highlight a critical advantage of the proposed approach, which is the ability to decouple SOH estimation from unreliable SOC inputs.
The results illustrated in Figure 4 clearly demonstrate this benefit. The Ampere-hour integration method exhibits significant volatility and non-physical fluctuations, directly attributable to the SOC estimation noise discussed in Section 2. In contrast, the proposed virtual charge-splicing method yields a stable, monotonic, and physically plausible degradation trend, effectively filtering out the noise inherent in single-segment estimations and providing a robust metric for real-world applications.

4.4. SOH Estimation Performance on Randomly Selected Vehicles

To evaluate the method’s adaptability across individual instances, four vehicles were randomly selected from the fleet: X143, X228, X251, and X287. Their SOH estimation results are plotted against mileage in Figure 5a–d. A linear degradation model was fitted to the data for each vehicle to represent its specific aging trajectory. The shaded regions in the plots indicate the estimation error bands. As observed, the estimated SOH points for all four vehicles cluster tightly around their respective fitted degradation lines. Quantitative analysis reveals that the average error band is less than 1%, with the maximum outlier deviation remaining within 4%. These results demonstrate that the proposed virtual capacity reconstruction method maintains high consistency and precision across different individual vehicles, effectively capturing the linear aging trend despite the randomness of real-world charging behaviors.
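A per-vehicle linear fit and its deviation statistics can be sketched with ordinary least squares. The SOH values below are invented for illustration, and treating the error band as the mean absolute deviation from the fitted line is an assumption, since the exact definition is not given in the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_stats(xs, ys, slope, intercept):
    """Mean and maximum absolute deviation from the fitted line."""
    res = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return sum(res) / len(res), max(res)


# Hypothetical SOH estimates (%) for one vehicle at increasing mileage (km).
mileage_km = [0.0, 20_000.0, 40_000.0, 60_000.0, 80_000.0]
soh_pct = [100.0, 95.2, 89.8, 85.2, 79.8]
slope, intercept = fit_line(mileage_km, soh_pct)
mean_dev, max_dev = residual_stats(mileage_km, soh_pct, slope, intercept)
```

The fitted slope, expressed per 100 km, can be compared directly with the 0.025% per 100 km rate used to justify the quasi-static window.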

4.5. Field Validation via Controlled Full-Charge Experiments

To rigorously validate the absolute accuracy of the algorithm against a reliable physical benchmark, controlled field experiments were conducted on randomly selected operational vehicles (X761, X655, and X117). Distinct from the statistical analysis based on historical data mining, this validation involved active intervention coordinated with fleet operators. The procedure was strictly controlled, beginning with the scheduling of target vehicles during their operational downtime. To reach the lower SOC limit required for a full cycle, the vehicles were actively discharged through a combination of test driving and continuous operation of the air conditioning system. Subsequently, the vehicles were fully charged from this lower limit to 100% utilizing calibrated high-precision charging equipment. The equipment employed for this validation features a measurement precision of 0.1% F.S. (full scale) for both voltage and current, thereby ensuring that the measured charged capacity served as a precise ground truth for the current SOH.
The algorithmic SOH estimates, derived exclusively from historical incomplete segments, were benchmarked against the physically measured ground truth. The quantitative results, summarized in Table 1, reveal exceptional precision across the vehicle lifespan. Notably, the validation covers a broad mileage spectrum, ranging from early-stage operation (approx. 40,000 km) to an advanced aging stage (approx. 250,000 km). In all cases, the relative estimation error remains strictly below 2%, with a maximum deviation of only 1.9% observed at the highest mileage. This empirical evidence confirms that the proposed virtual capacity reconstruction strategy effectively aligns with the true physical capacity measured under controlled conditions.

5. Conclusions

This paper proposes a novel data-driven methodology for estimating the SOH of power batteries based on a virtual full-charge reconstruction strategy. Designed to overcome the fundamental challenges of real-world vehicle data, the method successfully eliminates the dependence on pre-labeled SOH values and unreliable onboard SOC estimations. By employing a k-means clustering algorithm based on statistical features of temperature and current, the framework effectively partitions complex charging data into distinct operating modes to ensure electrochemical consistency. A key innovation lies in the charging segment splicing strategy which synthesizes a comprehensive capacity profile from multiple incomplete charging segments. This approach effectively resolves the primary bottleneck of fragmented data and mitigates errors associated with small voltage integration intervals. Coupled with robust statistical averaging, the method ensures high resilience against anomalies in individual charging events. Validation through both large-scale fleet analysis and controlled field experiments demonstrates high precision with estimation errors consistently maintained below 2%. Consequently, the proposed method offers a scalable and reliable solution for the health management of new energy vehicle fleets.
Future investigations could explore advanced clustering algorithms such as density-based spatial clustering or Gaussian mixture models to identify more nuanced operating conditions. Additionally, the incorporation of nonlinear degradation models could capture long-term aging behaviors with greater accuracy. A significant objective is to optimize the algorithm for computational efficiency to enable deployment for real-time estimation on cloud-based battery management platforms. Furthermore, the applicability of the splicing concept should be evaluated on different battery chemistries and diverse electric platforms. Finally, integrating this high-precision SOH estimation into a holistic BMS could significantly improve the accuracy of related metrics including SOC and SOE.

Author Contributions

Conceptualization, D.G. and Y.Z.; methodology, D.G.; software, D.G.; validation, D.G., Z.Z., and X.L.; formal analysis, D.G.; investigation, D.G.; resources, D.G.; data curation, Z.Z.; writing—original draft preparation, D.G.; writing—review and editing, X.L.; visualization, Z.Z.; supervision, Z.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China (NSFC) under grant numbers 52507260 and 52277222, and by the Shanghai Science and Technology Development Fund under grant number 22ZR14445000.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request, subject to the permission of the vehicle manufacturer.

Acknowledgments

During the preparation of this study, the authors used Gemini 2.5 in order to improve the readability and language of the manuscript. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BMS: Battery management system
BOL: Beginning of life
ECM: Equivalent circuit model
EOL: End of life
IQR: Interquartile range
NCM: Nickel–cobalt–manganese
NEV: New energy vehicle
OCV: Open-circuit voltage
SEI: Solid electrolyte interphase
SOC: State of charge
SOH: State of health
SSE: Sum of squared errors

References

  1. Khan, F.M.N.U.; Rasul, M.G.; Sayem, A.; Mandal, N.K. Design and optimization of lithium-ion battery as an efficient energy storage device for electric vehicles: A comprehensive review. J. Energy Storage 2023, 71, 108033.
  2. Sun, Y.; Xiong, R.; Meng, X.; Deng, X.; Li, H.; Sun, F. Battery degradation evaluation based on impedance spectra using a limited number of voltage-capacity curves. eTransportation 2024, 22, 100347.
  3. Tao, J.; Wang, S.; Cao, W.; Fernandez, C.; Blaabjerg, F. A comprehensive review of multiple physical and data-driven model fusion methods for accurate lithium-ion battery inner state factor estimation. Batteries 2024, 10, 442.
  4. Zhang, G.; Wei, X.; Wang, X.; Zhu, J.; Chen, S.; Wei, G.; Tang, X.; Lai, X.; Dai, H. Lithium-ion battery sudden death: Safety degradation and failure mechanism. eTransportation 2024, 20, 100333.
  5. Meng, D.; Ma, Z.; Li, L. Multi-scale heterogeneity of electrode reaction for 18650-type lithium-ion batteries during initial charging process. Batteries 2024, 10, 109.
  6. Qu, Y.; Xing, B.; Xia, Y.; Zhou, Q. How does room temperature cycling ageing affect lithium-ion battery behaviors under extreme indentation? eTransportation 2024, 20, 100331.
  7. Li, J.; Gao, P.; Tong, B.; Cheng, Z.; Cao, M.; Mei, W.; Wang, Q.; Sun, J.; Qin, P. Revealing the mechanism of pack ceiling failure induced by thermal runaway in NCM batteries: A coupled multiphase fluid-structure interaction model for electric vehicles. eTransportation 2024, 20, 100335.
  8. Wang, G.; Ping, P.; Kong, D.; Peng, R.; He, X.; Zhang, Y.; Dai, X.; Wen, J. Advances and challenges in thermal runaway modeling of lithium-ion batteries. Innovation 2024, 5, 100624.
  9. Madani, S.S.; Ziebert, C.; Vahdatkhah, P.; Sadrnezhaad, S.K. Recent progress of deep learning methods for health monitoring of lithium-ion batteries. Batteries 2024, 10, 204.
  10. Qiao, D.; Wei, X.; Zhu, J.; Zhang, G.; Yang, S.; Wang, X.; Jiang, B.; Lai, X.; Zheng, Y.; Dai, H. Mechanism of battery expansion failure due to excess solid electrolyte interphase growth in lithium-ion batteries. eTransportation 2025, 25, 100450.
  11. Terkes, M.; Demirci, A.; Gokalp, E. An evaluation of optimal sized second-life electric vehicle batteries improving technical, economic, and environmental effects of hybrid power systems. Energy Convers. Manag. 2023, 291, 117272.
  12. Keske, C.; Srinivasan, A.; Sansavini, G.; Gabrielli, P. Optimal economic and environmental arbitrage of grid-scale batteries with a degradation-aware model. Energy Convers. Manag. X 2024, 22, 100554.
  13. Teichert, O.; Link, S.; Schneider, J.; Wolff, S.; Lienkamp, M. Techno-economic cell selection for battery-electric long-haul trucks. eTransportation 2023, 16, 100225.
  14. Safarzadeh, H.; Di Maria, F. Progress, Challenges and Opportunities in Recycling Electric Vehicle Batteries: A Systematic Review Article. Batteries 2025, 11, 230.
  15. Salek, F.; Resalati, S.; Babaie, M.; Henshall, P.; Morrey, D.; Yao, L. A review of the technical challenges and solutions in maximising the potential use of second life batteries from electric vehicles. Batteries 2024, 10, 79.
  16. Fan, W.; Jiang, B.; Wang, X.; Yuan, Y.; Zhu, J.; Wei, X.; Dai, H. Enhancing capacity estimation of retired electric vehicle lithium-ion batteries through transfer learning from electrochemical impedance spectroscopy. eTransportation 2024, 22, 100362.
  17. Machala, M.L.; Chen, X.; Bunke, S.P.; Forbes, G.; Yegizbay, A.; de Chalendar, J.A.; Azevedo, I.L.; Benson, S.; Tarpeh, W.A. Life cycle comparison of industrial-scale lithium-ion battery recycling and mining supply chains. Nat. Commun. 2025, 16, 988.
  18. Xiong, R.; Li, L.; Tian, J. Towards a smarter battery management system: A critical review on battery state of health monitoring methods. J. Power Sources 2018, 405, 18–29.
  19. Kim, J.; Han, D.; Lee, P.Y.; Kim, J. Transfer learning applying electrochemical degradation indicator combined with long short-term memory network for flexible battery state-of-health estimation. eTransportation 2023, 18, 100293.
  20. Yu, J.; Yao, F. Multi-Timescale Estimation of SOE and SOH for Lithium-Ion Batteries with a Fractional-Order Model and Multi-Innovation Filter Framework. Batteries 2025, 11, 372.
  21. Madani, S.S.; Hébert, M.; Boulon, L.; Lupien-Bédard, A.; Allard, F. Comparative Analysis of ML and DL Models for Data-Driven SOH Estimation of LIBs Under Diverse Temperature and Load Conditions. Batteries 2025, 11, 393.
  22. Lu, Y.; Lin, J.; Guo, D.; Zhang, J.; Wang, C.; He, G.; Ouyang, M. Towards real-world state of health estimation, Part 1: Cell-level method using lithium-ion battery laboratory data. eTransportation 2024, 21, 100338.
  23. Lu, Y.; Guo, D.; Xiong, G.; Wei, Y.; Zhang, J.; Wang, Y.; Ouyang, M. Towards real-world state of health estimation: Part 2, system level method using electric vehicle field data. eTransportation 2024, 22, 100361.
  24. Dini, P.; Colicelli, A.; Saponara, S. Review on modeling and soc/soh estimation of batteries for automotive applications. Batteries 2024, 10, 34.
  25. Kucinskis, G.; Bozorgchenani, M.; Feinauer, M.; Kasper, M.; Wohlfahrt-Mehrens, M.; Waldmann, T. Arrhenius plots for Li-ion battery ageing as a function of temperature, C-rate, and ageing state–An experimental study. J. Power Sources 2022, 549, 232129.
  26. Wang, Q.; Wang, Z.; Liu, P.; Zhang, L.; Sauer, D.U.; Li, W. Large-scale field data-based battery aging prediction driven by statistical features and machine learning. Cell Rep. Phys. Sci. 2023, 4, 101720.
  27. Liu, H.; Li, C.; Hu, X.; Li, J.; Zhang, K.; Xie, Y.; Wu, R.; Song, Z. Multi-modal framework for battery state of health evaluation using open-source electric vehicle data. Nat. Commun. 2025, 16, 1137.
  28. Chen, L.; Xie, S.; Lopes, A.M.; Li, H.; Bao, X.; Zhang, C.; Li, P. A new SOH estimation method for Lithium-ion batteries based on model-data-fusion. Energy 2024, 286, 129597.
  29. Li, C.; Yang, L.; Li, Q.; Zhang, Q.; Zhou, Z.; Meng, Y.; Zhao, X.; Wang, L.; Zhang, S.; Li, Y.; et al. SOH estimation method for lithium-ion batteries based on an improved equivalent circuit model via electrochemical impedance spectroscopy. J. Energy Storage 2024, 86, 111167.
  30. Zhang, C.; Luo, L.; Yang, Z.; Zhao, S.; He, Y.; Wang, X.; Wang, H. Battery SOH estimation method based on gradual decreasing current, double correlation analysis and GRU. Green Energy Intell. Transp. 2023, 2, 100108.
  31. Wang, F.; Zhai, Z.; Zhao, Z.; Di, Y.; Chen, X. Physics-informed neural network for lithium-ion battery degradation stable modeling and prognosis. Nat. Commun. 2024, 15, 4332.
  32. Schmitt, J.; Horstkötter, I.; Bäker, B. State-of-health estimation by virtual experiments using recurrent decoder–encoder based lithium-ion digital battery twins trained on unstructured battery data. J. Energy Storage 2023, 58, 106335.
  33. Xiang, Y.; Fan, W.; Zhu, J.; Wei, X.; Dai, H. Semi-supervised deep learning for lithium-ion battery state-of-health estimation using dynamic discharge profiles. Cell Rep. Phys. Sci. 2024, 5, 101763.
  34. Yao, L.; Xu, S.; Tang, A.; Zhou, F.; Hou, J.; Xiao, Y.; Fu, Z. A review of lithium-ion battery state of health estimation and prediction methods. World Electr. Veh. J. 2021, 12, 113.
  35. Jiang, M.; Li, D.; Li, Z.; Chen, Z.; Yan, Q.; Lin, F.; Yu, C.; Jiang, B.; Wei, X.; Yan, W.; et al. Advances in battery state estimation of battery management system in electric vehicles. J. Power Sources 2024, 612, 234781.
  36. Tran, M.K.; Panchal, S.; Khang, T.D.; Panchal, K.; Fraser, R.; Fowler, M. Concept review of a cloud-based smart battery management system for lithium-ion batteries: Feasibility, logistics, and functionality. Batteries 2022, 8, 19.
  37. Shen, Y.; Guo, D.; Wang, Y.; Chen, J.; Liu, X.; Han, X.; Zheng, Y.; Ouyang, M. Enhanced few-shot state-of-health estimation for lithium-ion batteries via Masked Autoencoder. Energy 2025, 335, 138263.
  38. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210.
  39. Nainggolan, R.; Perangin-angin, R.; Simarmata, E.; Tarigan, A.F. Improved the performance of the K-means cluster using the sum of squared error (SSE) optimized by using the Elbow method. J. Phys. Conf. Ser. 2019, 1361, 012015.
  40. Sutomo, F.; Muaafii, D.A.; Al Rasyid, D.N.; Kurniawan, Y.I.; Afuan, L.; Cahyono, T.; Maryanto, E.; Iskandar, D. Optimization of the k-nearest neighbors algorithm using the elbow method on stroke prediction. J. Tek. Inform. (Jutif) 2023, 4, 125–130.
  41. Lai, H.; Huang, T.; Lu, B.; Zhang, S.; Xiaog, R. Silhouette coefficient-based weighting k-means algorithm. Neural Comput. Appl. 2025, 37, 3061–3075.
Figure 1. Overall framework of the proposed SOH estimation method.
Figure 2. Mileage distribution of the 108 vehicles.
Figure 3. Clustering of charging segments based on average current and average temperature.
Figure 4. Comparative analysis of SOH degradation trajectories. (a) SOH estimated using the conventional Ampere-hour integration method. (b) SOH estimated using the proposed virtual full-charge reconstruction method.
Figure 5. SOH estimation results for four randomly selected vehicles: (a) X143, (b) X228, (c) X251, and (d) X287. The blue dots represent the estimated SOH values obtained via the splicing method, and the gray shaded regions represent the estimation error bands, where the dark gray area indicates the average absolute error and the light gray area indicates the maximum error.
Table 1. Comparison of field validation results between the splicing algorithm and calibrated measurements.
Vehicle No. | Mileage (km) | Measured SOH | Estimated SOH | Error
X761 | 39,565 | 0.962 | 0.949 | 1.3%
X655 | 89,841 | 0.910 | 0.928 | 1.8%
X117 | 248,778 | 0.815 | 0.796 | 1.9%
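As a quick consistency check on Table 1, the following minimal Python sketch recomputes the error column from the measured and estimated SOH values. It assumes the error is the absolute difference between the two SOH values, expressed in percentage points; this definition is inferred from the tabulated numbers rather than stated with the table, and the function name `abs_error_pct` is hypothetical.

```python
# Measured and estimated SOH pairs copied from Table 1, in row order.
rows = [
    (0.962, 0.949),
    (0.910, 0.928),
    (0.815, 0.796),
]


def abs_error_pct(measured, estimated):
    """Absolute SOH difference in percentage points (assumed definition)."""
    return abs(measured - estimated) * 100.0


# Round to one decimal place to match the precision used in the table.
errors = [round(abs_error_pct(m, e), 1) for m, e in rows]
print(errors)
```

Under this assumption the recomputed values agree with the tabulated errors, and all three stay below the 2% bound quoted in the abstract. A relative error (difference divided by measured SOH) would give slightly larger figures, so the absolute definition appears to be the one used.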

Share and Cite

MDPI and ACS Style

Guo, D.; Zou, Z.; Lai, X.; Zheng, Y. Data-Driven State-of-Health Estimation by Reconstructing Virtual Full-Charge Segments. Batteries 2026, 12, 10. https://doi.org/10.3390/batteries12010010

