Research on Predicting the Lifespan of Lithium-Ion Batteries Using the Micro XGBoost Model Cluster

Jiao, Yinbo; Zeng, Linjun; Li, Xun; Wang, Shen; Huang, Lei; Cai, Yimei; Huang, Can

doi:10.3390/pr14111829

Open Access Article


by Yinbo Jiao ¹, Linjun Zeng ¹, Xun Li ²,*, Shen Wang ¹, Lei Huang ¹, Yimei Cai ¹ and Can Huang ¹

¹ School of Energy and Power Engineering, Changsha University of Science and Technology, Changsha 410205, China

² School of Intelligent Manufacturing and Equipment, Shenzhen University of Information Technology, Shenzhen 518172, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(11), 1829; https://doi.org/10.3390/pr14111829 (registering DOI)

Submission received: 23 March 2026 / Revised: 17 April 2026 / Accepted: 22 April 2026 / Published: 5 June 2026

(This article belongs to the Special Issue Analysis and Control of New Power System with Multiple Types of Flexible Resources)


Abstract

Accurately predicting the capacity degradation of lithium-ion batteries is crucial for ensuring the reliability and safety of electric vehicles and energy storage systems. However, existing methods struggle to balance accuracy, computational efficiency, and adaptability to non-linear aging dynamics. This holds whether they are based on physical principles, deep learning, or traditional machine learning. This study proposes a new framework that combines multi-scale data preprocessing with a divide-and-conquer strategy to address these limitations. First, a hybrid Wavelet–SG filter is applied to suppress noise. Then a set of specialized XGBoost micro models is trained, each predicting capacity at a specific cycle, which enables precise trajectory prediction across different aging stages. The method was evaluated on the Toyota-MIT-Stanford dataset (118 batteries under different operating protocols). On the test set of unseen protocols, it achieves an average MAPE of 1.16%, and its maximum MAPE does not exceed 2.5%. Its accuracy is comparable to that of CNN, LSTM, and CNN-LSTM benchmarks. Importantly, its parallel architecture enables fast inference (400 ms on CPU), making it suitable for edge deployment in battery management systems. The model's behavior is also interpretable in terms of known degradation physics, and it captures stage-dependent degradation mechanisms without explicit physical modeling. This work provides a reliable, efficient, and interpretable solution for real-world battery health monitoring.

Keywords:

capacity degradation prediction; divide-and-conquer protocol; lithium-ion batteries; micro XGBoost model cluster; wavelet–SG filter

1. Introduction

Driven by the global pursuit of energy sustainability, lithium-ion batteries have become indispensable for electric vehicles and energy storage infrastructure [1]. The growing demands placed on these batteries call for stringent operational safety, precise health monitoring, and strategic life cycle management [2]. Central to all three is the accurate prediction of capacity degradation and remaining useful life (RUL), which underpins system reliability and economic viability [3]. Developing robust, efficient, and versatile capacity estimation methods therefore remains a significant technical challenge.

Extensive research efforts have addressed this challenge, leading to three predominant modeling paradigms: physics-based, deep learning-based, and machine learning-based approaches. Physics-based models, grounded in electrochemical principles and equivalent circuit theories, provide invaluable mechanistic insights into aging phenomena [4]. For instance, models that integrate electrochemical characteristics with feature engineering have demonstrated commendable accuracy [5]. However, the practical application of these methods is often hindered by the need for deep domain expertise in parameter setting, high computational costs that preclude real-time deployment, and difficulty in capturing the random variability inherent to individual cells [6].

The rise of data-driven techniques has offered a compelling alternative. Deep learning architectures, particularly those designed for sequential data, excel at modeling complex, non-linear degradation patterns without explicit physical assumptions. For example, frameworks based on convolutional and recurrent neural networks have shown promise for trajectory forecasting [7,8], and Transformers enhanced with denoising autoencoders have been proposed for early RUL prediction [9]. Despite their predictive power, these models typically have large parameter counts and need large, high-fidelity training datasets to mitigate overfitting [10]. They are also computationally intensive in both training and inference, which is a barrier to embedded battery management system applications [11]. Furthermore, their performance depends critically on the representativeness of the training data, and generalization often deteriorates under unseen operating or charging protocols [12].

In parallel, traditional machine learning methods coupled with feature engineering remain widely explored. Techniques range from multi-source signal fusion to transfer component analysis, combined with regression models [13,14]. While these approaches can be effective and relatively simple, their performance is heavily contingent on the manual design and quality of input features. Features extracted under specific protocols may lose discriminative power when applied to different protocols, limiting model adaptability [15]. Moreover, sophisticated feature engineering pipelines themselves can introduce prohibitive preprocessing overhead, undermining real-time feasibility [16].

Despite these advancements, a critical examination reveals persistent and interconnected challenges that hinder the deployment of robust prediction models in real-world battery management systems (BMSs): (i) The Accuracy–Efficiency Trade-off: A stark compromise often exists between model interpretability/computational efficiency and high predictive accuracy. (ii) The Multi-Scale Noise Challenge: Capacity data is inherently contaminated by noise from various sources. Inadequate attention to multi-scale noise filtering can cause models to learn spurious artifacts rather than the underlying degradation trends. (iii) The Dynamic Mechanism Problem: Battery degradation is a non-stationary process where dominant mechanisms (e.g., solid electrolyte interphase growth, lithium plating) evolve over time. Monolithic models struggle to adaptively capture these stage-dependent dynamics [17,18].

To bridge these gaps, this study introduces a divide-and-conquer ensemble framework for lithium-ion battery capacity prediction, which decomposes the full degradation trajectory into segment-specific forecasting tasks. At its core is a parallelized micro XGBoost model cluster (developed in PyCharm Community Edition 2024.3.5), in which each model is tailored to the degradation dynamics of a designated capacity segment or aging stage. This architecture adapts to the non-stationary evolution of degradation mechanisms without relying on explicit physical assumptions; examples of such mechanisms are SEI growth, active material loss, and knee point transitions. To further improve signal fidelity, a hybrid Wavelet–Savitzky–Golay filter is employed for multi-scale noise suppression and local trend preservation, providing robust inputs for model training. Together, adaptive denoising and segmented ensemble modeling achieve a favorable trade-off among prediction accuracy, computational efficiency, and generalization capability. The framework outperforms conventional monolithic and deep learning-based approaches in both edge-deployment feasibility and physical interpretability.

The rest of this paper is organized as follows. Section 2 introduces a lithium battery data processing method based on a Wavelet–SG filter. Section 3 describes feature extraction from lithium-ion battery capacity degradation data. Section 4 describes the divide-and-conquer strategy for XGBoost targeting battery degradation mechanisms. Section 5 presents the case study analysis. Finally, Section 6 outlines the conclusions and contributions.

2. A Lithium Battery Data Processing Method Based on Wavelet–SG Filter

Accurate capacity prediction requires input data that faithfully reflect true degradation dynamics, free from measurement noise. However, real-world capacity data are inevitably contaminated by multi-scale, non-stationary noise from sensor errors, electromagnetic interference, and intrinsic phenomena such as capacity regeneration. Conventional single-filter methods often fail to balance the dual requirements of global noise suppression and local trend preservation [19]. Advanced battery state-of-charge estimation currently relies mostly on the Kalman filter family, which can be divided into linearization-based and deterministic sampling-based methods. The Extended Kalman Filter (EKF) is widely deployed because of its moderate complexity, but it remains sensitive to linearization errors and to the initialization of the noise covariances. Variants such as the Unscented Kalman Filter (UKF) and Cubature Kalman Filter (CKF) achieve higher accuracy through sigma-point or cubature-point approximations. Yet they incur significant computational overhead that may strain low-cost BMS platforms. Recent adaptive variants, such as the Threefold Modified Adaptive EKF, further improve robustness against sensor bias and aging-induced parameter drift. However, all of these model-based observers depend heavily on the fidelity of the parameterized equivalent circuit model [20].

To overcome these limitations, we propose a hierarchical two-stage preprocessing framework that combines the Wavelet Transform (WT) with Savitzky–Golay (SG) filtering. The data processing workflow is shown in Figure 1. The design exploits their complementary strengths. WT provides multi-resolution analysis for effective global noise separation, while SG smoothing preserves local polynomial trends and the derivative information needed to identify degradation inflection points. A comparison of common filtering methods with the proposed method is summarized in Table 1.

2.1. Multi-Resolution Analysis Matching of Battery Degradation Mechanisms Using Wavelet Transform

The wavelet transform provides a powerful mathematical framework for analyzing non-stationary signals through simultaneous time–frequency localization. The Fourier transform assumes signal stationarity and provides only frequency-domain information. Wavelet analysis instead decomposes a signal into scaled and translated versions of a mother wavelet function, enabling adaptive resolution across different frequency bands. For a capacity signal C(t), the continuous wavelet transform is defined as:

W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} C(t)\, \psi^{*}\left(\frac{t - b}{a}\right) dt

(1)

The symbols are defined as follows:
- W(a, b) is the wavelet coefficient at scale a and position b.
- ψ(t) is the mother wavelet, and ψ*(t) is its complex conjugate.
- a > 0 is the scale parameter, which is inversely related to frequency.
- b is the translation parameter, which determines temporal localization.

The factor 1/√a gives every scaled wavelet the same energy as the mother wavelet, so coefficients are comparable across scales. In practice, the measured capacity is a discrete sequence, so the denoising described below uses the discrete wavelet transform.
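Equation (1) can be evaluated numerically by replacing the integral with a Riemann sum over uniformly sampled data. The NumPy sketch below does this. The Ricker ("Mexican hat") mother wavelet is an illustrative assumption, not the wavelet used in this study, and the function names are hypothetical. The sketch also checks the role of the 1/√a factor: every scaled and shifted wavelet keeps the same energy.

```python
import numpy as np

def ricker(t):
    """Ricker ("Mexican hat") wavelet: real-valued and zero-mean, so conj(psi) == psi."""
    return (1.0 - t**2) * np.exp(-(t**2) / 2.0)

def cwt_coefficient(signal, dt, a, b, psi=ricker):
    """Riemann-sum approximation of Eq. (1) for one scale a > 0 and shift b."""
    if a <= 0:
        raise ValueError("scale a must be positive")
    t = np.arange(len(signal)) * dt
    return np.sum(signal * np.conj(psi((t - b) / a))) * dt / np.sqrt(a)

def atom_energy(a, b, dt, n, psi=ricker):
    """Energy of the scaled, shifted wavelet psi((t - b) / a) / sqrt(a) on the grid."""
    t = np.arange(n) * dt
    return np.sum(np.abs(psi((t - b) / a) / np.sqrt(a)) ** 2) * dt
```

For example, `cwt_coefficient(np.ones(2001), 0.01, 1.0, 10.0)` is numerically zero because the Ricker wavelet has zero mean. `atom_energy` returns about 1.329 (that is, 3√π/4) for any scale whose wavelet fits inside the grid.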

While wavelet denoising successfully suppresses high-frequency measurement noise, the reconstructed signal may still exhibit residual micro-fluctuations that obscure the underlying capacity fade trajectory. These fluctuations can arise from incomplete noise removal, electrochemical regeneration phenomena, or temperature-induced variations. To address this, we apply the SG filter as a second-stage refinement. The SG filter is a digital smoothing technique based on local polynomial least-squares regression within a sliding window [21].

Consider a sliding window centered at position i that contains n = 2m + 1 points of the wavelet-denoised sequence C_WT. The SG filter minimizes the weighted least-squares error:

E = \sum_{j=-m}^{m} w_{j} \left[ C_{WT}(i + j) - p(j) \right]^{2}

(2)

where w_j are optional weighting factors (typically set to unity), and p(j) is a polynomial of degree k:

p(j) = \sum_{l=0}^{k} a_{l} j^{l} = a_{0} + a_{1} j + a_{2} j^{2} + \dots + a_{k} j^{k}

(3)

The coefficients {a_0, a_1, …, a_k} are determined by solving the normal equations obtained by minimizing Equation (2). The smoothed value at position i is the constant term of the fitted polynomial:

C_{SG}(i) = p(0) = a_{0} = \sum_{j=-m}^{m} c_{j}\, C_{WT}(i + j)

(4)

where c_j are the convolution coefficients. For fixed weights, they depend only on the window size 2m + 1 and the polynomial degree k, and can be pre-computed using Gram polynomials [22].
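Equations (2) to (4) translate directly into a few lines of linear algebra. With unit weights, row 0 of the least-squares pseudo-inverse of the Vandermonde matrix [j^0, j^1, …, j^k] gives the convolution coefficients c_j. The NumPy sketch below computes them that way rather than through Gram polynomials. The function names are hypothetical, and only interior points (those with a full window) are smoothed.

```python
import numpy as np

def sg_coefficients(m, k):
    """Convolution weights c_j, j = -m..m, of Eq. (4) for unit weights w_j."""
    if 2 * m + 1 <= k:
        raise ValueError("window 2m + 1 must exceed the polynomial degree k")
    j = np.arange(-m, m + 1)
    vander = np.vander(j, k + 1, increasing=True)  # columns j**0, j**1, ..., j**k
    # Least-squares solution a = pinv(V) @ window, so row 0 maps a window to a_0 = p(0).
    return np.linalg.pinv(vander)[0]

def sg_smooth_interior(c_wt, m, k):
    """Apply Eq. (4) at every position that has m neighbours on each side."""
    c = sg_coefficients(m, k)
    # correlate computes sum_j c_j * c_wt[i + j]; 'valid' drops m points at each end
    return np.correlate(np.asarray(c_wt, dtype=float), c, mode="valid")
```

For the classic 5-point case (m = 2), both k = 2 and k = 3 give the weights (−3, 12, 17, 12, −3)/35. A cubic sequence also passes through the k = 3 filter unchanged, which is why curvature near the knee point is preserved.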

To quantitatively assess preprocessing effectiveness, we use three performance metrics: the signal-to-noise ratio (SNR), the feature preservation rate (FPR), and a smoothness index. The SNR measures the relative strength of the true signal versus noise:

SNR = 10 \log_{10} \left( \frac{\sum_{i=1}^{N} C_{true}^{2}(i)}{\sum_{i=1}^{N} \left[ C_{raw}(i) - C_{true}(i) \right]^{2}} \right)

(5)

where C_true is the true underlying capacity (approximated by a heavily smoothed signal), C_raw is the raw measured capacity, and N is the number of cycles in the sequence. Higher SNR values indicate better noise suppression.
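Equation (5) is straightforward to compute. In the sketch below, passing a filtered sequence instead of the raw one gives the SNR of that filter's output. This is our assumption about how the per-method values in Table 2 are obtained. The choice of smoother that produces C_true is left to the caller, and the function name is hypothetical.

```python
import numpy as np

def snr_db(c_true, c_est):
    """Eq. (5): 10 * log10(signal energy / residual energy), in decibels."""
    c_true = np.asarray(c_true, dtype=float)
    residual = np.asarray(c_est, dtype=float) - c_true
    noise_energy = np.sum(residual**2)
    if noise_energy == 0.0:
        return float("inf")  # identical sequences: no residual noise
    return float(10.0 * np.log10(np.sum(c_true**2) / noise_energy))
```

For example, a unit sequence perturbed by ±0.1 has a signal-to-residual energy ratio of 100, that is, 20 dB.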

Table 2 compares different preprocessing methods applied to representative battery samples from our dataset:
- Raw data: an SNR of 15.2 dB, indicating substantial noise contamination.
- Moving-average filtering (window size n = 5): the SNR rises to 18.6 dB, but the FPR falls to 89.4%, indicating loss of critical knee point information.
- Kalman filtering: 19.8 dB SNR and 91.2% FPR, but it requires model specification and incurs a higher computational cost (12.3 ms per battery).
- Wavelet-only preprocessing (db4, J = 5): 22.4 dB SNR with 95.6% FPR, removing noise effectively while retaining most features.
- SG filter alone (window 2m + 1 = 11, degree k = 3): 21.1 dB SNR and 93.8% FPR.
- Proposed hierarchical WT-SG combination: the best overall performance, with 24.7 dB SNR, 97.3% FPR, and the lowest smoothness index (1.67 × 10⁻³).

The WT-SG SNR is a 9.5 dB gain over the raw data, which under Equation (5) corresponds to roughly an 89% reduction in noise power. The total processing time for all batteries in the dataset is approximately 902 ms. At 6.9 ms per battery, the overhead is small enough for real-time deployment.

Based on the spectral characteristics of lithium iron phosphate (LFP) battery degradation, the Symlet-8 (sym8) wavelet is selected as the mother wavelet. Its near-symmetry and regularity minimize phase distortion during reconstruction. The decomposition level is determined adaptively, up to a maximum depth of L = 5. To suppress high-frequency sensor noise while preserving the underlying aging trend, soft thresholding is applied to the detail coefficients. The threshold λ is computed from a robust noise estimate based on the median absolute deviation (MAD):

λ = κ \cdot \frac{\mathrm{median}(|d_{1}|)}{0.6745} \cdot \sqrt{2 \ln N}

(6)

where d_1 denotes the first-level detail coefficients and N is the sequence length. The threshold scaling factor κ = 1.2 was tuned by grid search on the training set to prevent over-smoothing.
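The thresholding step can be sketched as follows. To keep the example dependency-free, it substitutes a hand-written Haar transform for sym8, an assumption made only for illustration. The MAD threshold of Equation (6) with κ = 1.2 and the soft-thresholding rule are the same for any orthogonal wavelet. The rule for choosing the decomposition level (int(log2 N) − 1, capped at 5) is also an assumption. With PyWavelets, decomposition and reconstruction would instead use `pywt.wavedec(x, "sym8", level=L)` and `pywt.waverec`, and the shrinkage would use `pywt.threshold(d, lam, mode="soft")`. All function names below are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_wavedec(x, level):
    """Multi-level Haar DWT. Returns (approximation, details, lengths);
    details[0] holds the finest (first-level) coefficients d_1."""
    a = np.asarray(x, dtype=float)
    details, lengths = [], []
    for _ in range(level):
        lengths.append(len(a))
        if len(a) % 2:                       # pad odd lengths by repeating the last sample
            a = np.append(a, a[-1])
        details.append((a[0::2] - a[1::2]) / SQRT2)
        a = (a[0::2] + a[1::2]) / SQRT2
    return a, details, lengths

def haar_waverec(a, details, lengths):
    """Inverse of haar_wavedec, trimming the padding added at each level."""
    for d, n in zip(reversed(details), reversed(lengths)):
        x = np.empty(2 * len(a))
        x[0::2] = (a + d) / SQRT2
        x[1::2] = (a - d) / SQRT2
        a = x[:n]
    return a

def wavelet_denoise(x, max_level=5, kappa=1.2):
    """Soft-threshold every detail level with the MAD-based threshold of Eq. (6)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    level = max(1, min(max_level, int(np.log2(n)) - 1))
    a, details, lengths = haar_wavedec(x, level)
    sigma = np.median(np.abs(details[0])) / 0.6745   # robust noise estimate from d_1
    lam = kappa * sigma * np.sqrt(2.0 * np.log(n))
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - lam, 0.0) for d in details]
    return haar_waverec(a, shrunk, lengths)
```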

Following wavelet reconstruction, a Savitzky–Golay filter refines the remaining local fluctuations. A third-order polynomial (k = 3) is used to capture the non-linear degradation curvature. To keep the smoothing consistent across batteries with different lifespans, an adaptive window size W is introduced:

W = \mathrm{odd}\left( \min\left( 21, \left\lfloor \frac{2}{3} L_{seq} + 1 \right\rfloor \right) \right)

(7)

where L_seq is the length of the capacity sequence (number of cycles) of a given battery. The function odd(·) maps its argument to an odd integer, as required for a window centered on each point. This adaptive logic ensures that the filter retains high fidelity for short-life batteries while providing sufficient smoothing for long-life samples. Table 3 provides detailed parameters.
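A minimal NumPy sketch of this second stage follows. Two details are assumptions, because Equation (7) does not pin them down. First, odd(·) rounds down to the nearest odd integer. Second, the window is never allowed to drop below the smallest odd integer ≥ k + 2, so that the cubic fit stays well-posed. Edge points are handled the way SciPy's `savgol_filter` does in its default `mode="interp"`: a polynomial is fitted to the first or last full window. With SciPy available, `scipy.signal.savgol_filter(x, W, 3)` would be the natural drop-in. Function names are hypothetical.

```python
import numpy as np

def adaptive_window(l_seq, w_max=21, k=3):
    """Eq. (7). ASSUMPTIONS: odd() rounds down to an odd integer, and the
    window is clipped below to the smallest odd integer >= k + 2."""
    w = min(w_max, int(np.floor(2.0 * l_seq / 3.0 + 1.0)))
    if w % 2 == 0:
        w -= 1
    w_min = k + 2 if (k + 2) % 2 == 1 else k + 3
    return max(w, w_min)

def sg_filter(x, k=3):
    """Savitzky-Golay smoothing with the adaptive window; edges use a degree-k
    polynomial fitted to the first/last full window (like mode="interp")."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = adaptive_window(n, k=k)
    if n < w:
        raise ValueError("sequence is shorter than the SG window")
    m = w // 2
    j = np.arange(-m, m + 1)
    c = np.linalg.pinv(np.vander(j, k + 1, increasing=True))[0]  # Eq. (4) weights
    out = np.empty(n)
    out[m:n - m] = np.correlate(x, c, mode="valid")              # interior points
    local = np.arange(w)
    out[:m] = np.polyval(np.polyfit(local, x[:w], k), local[:m])
    out[n - m:] = np.polyval(np.polyfit(local, x[n - w:], k), local[w - m:])
    return out
```

Under these assumptions, any sequence longer than about 30 cycles uses the full 21-point window, and a cubic trend is reproduced exactly, including at the edges.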

Capacity degradation curves of lithium-ion batteries are usually strongly non-linear. A cubic polynomial (k = 3) is chosen because it fits locally non-linear curves with inflection points well. If k is too low, the SG filter degenerates into a simple moving average or quadratic fit; it becomes insensitive to local curvature and tends to underfit. If k is too high, the fit tracks the data more closely but becomes very sensitive to noise. It is then prone to spurious oscillations (Runge’s phenomenon) at the data edges or around fluctuation points. As for window size, small windows are highly sensitive to local fluctuations and precisely retain short-term capacity drops, but they cannot smooth out low-frequency noise. Large windows produce very smooth curves but over-smooth the data. This delays the identification of the battery end-of-life point, which is detrimental to life prediction.

2.2. Normalization

Because of variations in the manufacturing process, actual batteries are not identical, which inevitably leads to deviations in initial capacity. Such deviations are not the primary driver of capacity degradation, but they introduce unnecessary bias into model learning. Scale effects are therefore removed by normalization, a method also used in [23]:

x^{*} = \frac{x}{x_{max}}

(8)

where x is the measured capacity, x_max is the maximum capacity over the entire lifetime, and x^* is the normalized capacity.

3. Feature Extraction of Lithium-Ion Battery Capacity Degradation Data

Feature extraction maps variable-length capacity sequences onto a fixed-dimensional feature space [24]. A hybrid feature extraction scheme is developed so that the micro models accurately perceive both the operating conditions and the electrochemical state of the battery. The final fused input vector X_fused is constructed by combining external excitation factors with internal degradation fingerprints. The first component of the feature set represents the external “loading” protocol that drives battery degradation. From the different fast-charging protocols in the MIT-Stanford dataset, we extract a three-dimensional stress vector S:

S = [CC1, SOC, CC2]

(9)

where CC1 and CC2 are the first and second charging rates, and SOC is the state-of-charge threshold at which the rate switches (see Section 5.1 for details).

To capture the battery’s unique “electrochemical fingerprint” and its initial health status, we extract a 100-cycle seed sequence H from the early-life data:

H = [SOH_{1}, SOH_{2}, SOH_{3}, \dots, SOH_{100}]

(10)

where SOH is the state of health. The stress vector S has only 3 dimensions, while the seed sequence H has 100. Without proper calibration, the high-dimensional sequence data could dominate the critical charging-protocol information when the model selects splits. To counter this, we implement a weighted hybrid fusion mechanism that scales each stress factor:

S^{*} = [α_{1} \cdot CC1, α_{2} \cdot SOC, α_{3} \cdot CC2]

(11)

where α_1, α_2, and α_3 are hyperparameters tuned with the Optuna framework. These weights are important because the charging parameters they scale directly govern the lithium-plating risk and thermal stress during charging.

The final input vector X_fused concatenates S^* and H:

X_{fused} = [S^{*}; H]

(12)

The weighting is intended to balance the resulting 103-dimensional input space, so that the micro models account for the fast-charging regime alongside the early-life degradation trajectory.
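The construction of X_fused in Equations (8) to (12) can be sketched for a single battery as follows. The sketch assumes that the SOH values in H are the normalized capacities of Equation (8). The α values are placeholders that would come from the Optuna search, and the function name is hypothetical. One property is worth keeping in mind when tuning these weights. Multiplying a single feature by a positive constant leaves the partitions a decision tree can make on it, and hence its split gains, unchanged. Any effect of the α weights on an XGBoost model should therefore be confirmed empirically.

```python
import numpy as np

def fused_features(cc1, soc, cc2, capacity, alphas=(1.0, 1.0, 1.0), n_seed=100):
    """Return X_fused = [S*; H] (Eqs. 9-12) for one battery as a 1-D array."""
    cap = np.asarray(capacity, dtype=float)
    if cap.size < n_seed:
        raise ValueError(f"need at least {n_seed} cycles, got {cap.size}")
    soh = cap / cap.max()                          # Eq. (8), max over the supplied sequence
    h = soh[:n_seed]                               # Eq. (10): early-life seed sequence
    s = np.array([cc1, soc, cc2], dtype=float)     # Eq. (9): stress vector
    s_star = np.asarray(alphas, dtype=float) * s   # Eq. (11): weighted stress factors
    return np.concatenate([s_star, h])             # Eq. (12): 3 + 100 = 103 values
```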

4. Divide-and-Conquer Strategy for XGBoost Targeting of Battery Degradation Mechanisms

The degradation of lithium-ion battery capacity is a heterogeneous, multi-stage process driven by distinct electrochemical mechanisms [6,25]. It begins with an initial rapid loss (0–10% of life) dominated by solid electrolyte interphase (SEI) formation. This gives way to a mid-life phase (10–80% of life) of relatively linear fade caused by gradual SEI thickening and active material loss [26,27,28]. Finally, an accelerated non-linear decline (80–100% of life) sets in. Synergistic failure modes produce a characteristic “knee point” where the degradation rate surges [29,30].

This mechanistic evolution poses a core challenge for prediction: a single global model cannot optimally capture the qualitatively different behaviors across all stages, leading to compromised accuracy. Traditional machine learning models and even sequential deep learning models struggle with this heterogeneity. They either lack adaptive complexity or face issues like gradient vanishing, high computational cost, and poor capture of abrupt transitions like knee points, while also offering limited interpretability [17,18].

To overcome this, we propose a divide-and-conquer strategy that employs an ensemble of specialized micro models and decomposes full-trajectory prediction into discrete subtasks. We adopt a “one-cycle-one-model” paradigm. The ensemble consists of L_max independent XGBoost regressors, M = {M_1, M_2, …, M_{L_max}}, and each micro model is dedicated to predicting the normalized capacity at a specific future cycle. The research flow is shown in Figure 2. This architecture, illustrated in Figure 3, enables fine-grained specialization. Each model learns the local degradation dynamics pertinent to its assigned cycle, allowing the collective ensemble to accurately reconstruct the entire heterogeneous trajectory. XGBoost is chosen for its efficiency, robust performance, and ability to model complex non-linear relationships through an optimized gradient boosting framework. Moreover, with a suitable block size, XGBoost makes efficient use of the CPU cache to accelerate data access. When feature values are missing, it also learns a default split direction at each node [31].

Unlike conventional regression models that produce a single point estimate, each model M_i maps the high-dimensional input X_fused to a three-dimensional output vector Y_i:

Y_{i} = [SOH_{i,1}, SOH_{i,2}, SOH_{i,3}]

(13)

whose components are the model’s estimates of the SOH at cycles i − 1, i, and i + 1, respectively.

The whole SOH trajectory is then reconstructed by a temporal overlapping consensus mechanism. Each cycle in the forecast horizon is predicted redundantly by three adjacent models: M_{i−1}, M_i, and M_{i+1}. The final prediction of SOH_i is the equal-weight average of these overlapping outputs:

\hat{SOH}_{i} = \frac{1}{3} \left( \hat{Y}_{i-1,3} + \hat{Y}_{i,2} + \hat{Y}_{i+1,1} \right)

(14)
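Equation (14) amounts to adding three shifted columns of the stacked model outputs. In the sketch below, row i (0-based) of `preds` holds model M_i's three outputs, ordered as its estimates for cycles i − 1, i, and i + 1. The paper does not say how the first and last cycles, which have fewer than three contributing models, are treated. Averaging whatever estimates exist there is our assumption. The function name is hypothetical.

```python
import numpy as np

def overlap_consensus(preds):
    """Eq. (14): average the overlapping estimates of each cycle.

    preds has shape (L, 3); preds[i] holds model i's estimates for cycles
    i - 1, i, i + 1. Edge cycles average only the estimates that exist."""
    preds = np.asarray(preds, dtype=float)
    total = preds[:, 1].copy()        # M_i's own estimate of cycle i
    count = np.ones(len(preds))
    total[1:] += preds[:-1, 2]        # M_{i-1}'s look-ahead estimate of cycle i
    count[1:] += 1
    total[:-1] += preds[1:, 0]        # M_{i+1}'s look-back estimate of cycle i
    count[:-1] += 1
    return total / count
```

With `preds = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]`, the result is `[2.0, 4.0, 6.0]`. The middle value is (2 + 4 + 6) / 3, which is exactly Equation (14).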

The input and output principles of the model are shown in Figure 4. Because each micro model is tied to a fixed cycle index, the cluster must cover the longest trajectory in the data. The longest-lived batteries in the dataset reach 1200 cycles, so for training to be feasible the number of models must exceed this maximum lifespan. Consequently, we set L_max = 1300.
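The one-cycle-one-model cluster can be sketched as below. Each M_i is trained on the fused inputs to predict the SOH at cycles i − 1, i, and i + 1. At inference, the L_max models are independent, so they can be evaluated concurrently (here with a thread pool). Several pieces are assumptions made for illustration:
- Trajectories shorter than L_max are padded by holding their last value, since the paper does not state how cycles beyond end of life are filled.
- Targets at the sequence ends are clipped to the nearest valid cycle.
- `LinearStub` is a hypothetical least-squares stand-in so that the example runs without XGBoost.

In practice, `make_model` would return an XGBoost regressor such as `xgboost.XGBRegressor(tree_method="hist")`, which accepts two-dimensional targets for multi-output regression in recent releases.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

class LinearStub:
    """Hypothetical stand-in regressor: least squares with an intercept."""
    def fit(self, X, Y):
        A = np.hstack([X, np.ones((len(X), 1))])
        self.coef_, *_ = np.linalg.lstsq(A, Y, rcond=None)
        return self

    def predict(self, X):
        return np.hstack([X, np.ones((len(X), 1))]) @ self.coef_

def pad_trajectory(soh, length):
    """ASSUMPTION: extend a trajectory to `length` cycles by holding its last value."""
    soh = np.asarray(soh, dtype=float)
    if soh.size >= length:
        return soh[:length]
    return np.concatenate([soh, np.full(length - soh.size, soh[-1])])

def window_targets(traj, i):
    """Targets of model M_i (0-based): SOH at cycles i-1, i, i+1, clipped at the ends."""
    idx = np.clip([i - 1, i, i + 1], 0, traj.shape[1] - 1)
    return traj[:, idx]

def train_cluster(X, trajectories, l_max, make_model=LinearStub):
    """Fit one independent model per cycle index 0 .. l_max - 1."""
    traj = np.vstack([pad_trajectory(t, l_max) for t in trajectories])
    return [make_model().fit(X, window_targets(traj, i)) for i in range(l_max)]

def predict_cluster(models, X, workers=4):
    """Evaluate all models concurrently; returns shape (n_batteries, l_max, 3)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda model: model.predict(X), models))
    return np.stack(outputs, axis=1)
```

Each battery's slice of the result, of shape (l_max, 3), is the input that Equation (14) turns into the final trajectory.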

5. Case Study

5.1. Data Configuration

The battery aging data used in this study come from a collaborative project between Toyota Research Institute, the Massachusetts Institute of Technology, and Stanford University [32]. The dataset consists of 124 commercial lithium-ion batteries cycled to failure under fast-charging protocols, with approximately 96,700 cycles in total (around 8 GB of data). It is one of the largest publicly available battery life cycle datasets covering diverse operating protocols. These lithium iron phosphate (LFP)/graphite cells, manufactured by A123 Systems (APR18650M1A), were cycled in horizontal cylindrical fixtures on a 48-channel Arbin LBT potentiostat. The potentiostat was housed in a forced-convection temperature chamber set to 30 °C. The cells have a nominal capacity of 1.1 Ah and a nominal voltage of 3.3 V. To ensure the rigor and representativeness of the study, two groups of batteries were excluded: those with excessively long or short lifespans, and those whose data noise was severe enough to distort the capacity curves. The final dataset comprises 118 batteries tested under more than sixty distinct operating protocols. The charging mode diagrams and life distribution of these batteries are shown in Figure 5.

All cells in this dataset are charged with a one-step or two-step fast-charging policy of the form “C1-Q1-C2”. Here C1 and C2 are the first and second constant-current steps, and Q1 is the state of charge (%) at which the current switches. The second current step ends at 80% SOC, after which the cells charge at 1C CC-CV. The upper and lower cutoff potentials are 3.6 V and 2.0 V, respectively, consistent with the manufacturer’s specifications. These cutoff potentials apply to all current steps, including fast charging. After some cycling, the cells may reach the upper cutoff potential during fast charging, leading to significant constant-voltage charging. All cells discharge at 4C. Different charging protocols are distinguished by their values of C1, Q1, and C2. In this paper, a protocol is represented as [CC1, SOC, CC2], corresponding to [C1, Q1, C2].

This study begins by traversing the charging protocols of the entire dataset and categorizing batteries by how often their protocol recurs:
(1) SP-cells were tested with a specific charging protocol that no other battery in the dataset shares (approximately 30% of the batteries).
(2) GP-cells were tested with a general charging protocol that at least one other battery in the dataset shares.

Two experiments were then conducted: the SP-cells test and the GP-cells test.
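The SP/GP categorization amounts to counting how many cells share each [CC1, SOC, CC2] triple. The sketch below uses a hypothetical function name. It assumes that protocol values are recorded consistently enough to compare exactly; in practice, one might round them first.

```python
from collections import Counter

def categorize_cells(protocols):
    """Map each cell to "SP" (protocol used by no other cell) or "GP" (shared protocol).

    protocols: dict of cell_id -> (CC1, SOC, CC2)."""
    counts = Counter(protocols.values())
    return {cell: ("SP" if counts[p] == 1 else "GP") for cell, p in protocols.items()}
```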

The GP-cells test was constructed first, to evaluate baseline performance on known protocols. A subset of representative batteries with shared protocols was selected as the test set. The remaining batteries were assigned to the training set, with no overlap between the two. The battery life distribution in the test set is shown in Table 4.

To rigorously assess the model’s ability to generalize to unseen complex protocols, the GP-cells test was supplemented with an asymmetric partitioning scheme based on protocol rarity. Approximately 20% of SP-cells were selected at random as an independent test set, simulating novel protocols encountered in real-world use. The remaining batteries were allocated to a cross-validation pool and partitioned into five mutually exclusive subsets by a fully randomized process. The types and proportions of batteries in each subset are shown in Table 5. During training, five folds were constructed. In each fold, one subset served as the validation (hyperparameter-tuning) set, and the remaining four were combined as the training set. This rotation ensured comprehensive learning of the data distribution and objective hyperparameter optimization. Throughout this experiment, the test set remained disjoint from the validation sets.
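The SP-cells partition can be sketched with the standard library. Two choices are assumptions. First, the cross-validation pool is taken to be every cell not held out for testing. Second, the five subsets are formed by shuffling the pool and dealing cells round-robin. The function names are hypothetical.

```python
import random

def sp_test_split(labels, test_frac=0.2, n_folds=5, seed=0):
    """Hold out ~test_frac of SP cells as the test set; split the rest into folds.

    labels: dict of cell_id -> "SP" or "GP"."""
    rng = random.Random(seed)
    sp_cells = sorted(c for c, kind in labels.items() if kind == "SP")
    if not sp_cells:
        raise ValueError("no SP cells to hold out")
    n_test = max(1, round(test_frac * len(sp_cells)))
    test = set(rng.sample(sp_cells, n_test))
    pool = sorted(c for c in labels if c not in test)
    rng.shuffle(pool)
    folds = [pool[k::n_folds] for k in range(n_folds)]   # mutually exclusive subsets
    return sorted(test), folds

def cv_rounds(folds):
    """Yield (train, validation) pairs; each fold serves once as the validation set."""
    for k, val in enumerate(folds):
        train = [c for j, fold in enumerate(folds) if j != k for c in fold]
        yield train, val
```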

5.2. Analysis of Data Preprocessing Results

Figure 6 compares the original and denoised capacity fade sequences of three representative batteries cycled to end of life. Applying the wavelet transform and SG filtering in sequence produces a synergistic effect. The wavelet transform first removes coarse-grained, global high-frequency noise, providing a clean basis for subsequent processing. SG filtering then performs fine-grained local smoothing, avoiding the distortions that could arise from smoothing noisy data directly. The combination suits the different stages of battery degradation:
- In the initial phase, where cycling noise is pronounced, wavelet denoising excels.
- In the intermediate phase, where smoothing must maintain the linear trend, the SG filter contributes most.
- In the late accelerated decay phase, the wavelet preserves the abrupt changes while the SG filter keeps the transitions smooth.

To isolate the effect of preprocessing on model learning, we ran a mirror (ablation) experiment in the SP-cells test. It used the unprocessed capacity sequences as input, with all other steps and settings unchanged. The performance on the SP-cells test set is shown in Table 6.

5.3. Industrial Applicability and Edge-Feasibility Evaluation

GP-cells test: To verify the model’s robustness under different aging protocols, we tested eight batteries from the test set. The results are shown in Figure 7. The model maintained low MAE, RMSE, and MAPE on all test cells. This shows that it generalizes well across individual differences among batteries and meets the industrial requirement for algorithm stability. We further compared the proposed method with three representative deep learning benchmark models, presented in Figure 7 and Figure 8; detailed experimental results for each model can be found in [33,34,35]. Table 7 summarizes the quantitative results. The hybrid CNN-LSTM model achieves a MAPE of approximately 1.47% in this operating range, whereas our model achieves a MAPE of 0.39%, a reduction of over 73% in prediction error. Similarly, the RMSE is approximately 50% lower than that of the standard LSTM benchmark.
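For reference, the error metrics reported above can be computed as follows. These are the standard definitions, with MAPE in percent; that the paper uses exactly these forms is our assumption, and the function names are hypothetical. The last helper reproduces the relative error reduction quoted for MAPE: 1 − 0.39/1.47 ≈ 0.735.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float))))

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    diff = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return float(np.sqrt(np.mean(diff**2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y_true = np.asarray(y_true, float)
    return float(100.0 * np.mean(np.abs((np.asarray(y_pred, float) - y_true) / y_true)))

def relative_reduction(baseline, ours):
    """Fractional error reduction of `ours` relative to `baseline`."""
    return 1.0 - ours / baseline
```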

SP-cells test: To further test the model on unfamiliar protocols, we conducted the SP-cells experiment, whose dataset partitioning strategy was introduced in Section 5.1. As Figure 7 shows, the maximum MAPE over the entire SP-cells test set does not exceed 2.5%. This indicates that the model maintains engineering-grade accuracy even on completely unseen protocols.

This accuracy improvement is attributed to the model treating each time step as an independent regression task, which avoids the error propagation chain that limits traditional time-series approaches. Traditional deep learning methods, especially those based on recurrent neural networks, process time-series data through a serial recursive structure. The computation at the current step must wait for the hidden state output of the previous step. This prevents parallel acceleration and makes inference very slow on edge-side CPUs without GPUs. To remove this computational bottleneck, this paper proposes a parallel decoupled architecture. Figure 9 shows the differences between the traditional serial model and the parallel mode proposed in this study. The architecture breaks the dependency between time steps, so the predictions for different cycles can be computed simultaneously in independent computing units. This lays the groundwork for high-frequency, real-time SOH prediction on low-computing-power hardware. The stability of the resulting predictions is also crucial for avoiding false alarms in the BMS. The ideal model for a BMS should require only modest parameter identification, impose a low computational burden, and remain accurate under different operating protocols [36].
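The error-propagation argument can be illustrated with a deliberately simple toy model; it is an illustration only and not part of the paper's experiments. Suppose the true SOH decays geometrically, s_{t+1} = r·s_t, and a one-step model has a slightly wrong rate r̂. Rolling that model forward recursively feeds each prediction back in, so the small one-step error compounds with the horizon. A direct per-cycle model, as in the micro model cluster, does not feed its outputs back. This particular compounding mechanism is therefore absent, although each model still has its own fitting error. All names and numbers are hypothetical.

```python
def true_trajectory(s0, r, horizon):
    """Geometric decay s_t = s0 * r**t for t = 1..horizon."""
    return [s0 * r**t for t in range(1, horizon + 1)]

def recursive_rollout(s0, r_hat, horizon):
    """Serial forecast: step t consumes the prediction made at step t - 1."""
    preds, s = [], s0
    for _ in range(horizon):
        s = r_hat * s
        preds.append(s)
    return preds

truth = true_trajectory(1.0, 0.999, 1000)
rollout = recursive_rollout(1.0, 0.9995, 1000)
errors = [abs(p - q) for p, q in zip(rollout, truth)]
print(f"error after 1 step: {errors[0]:.4f}, after 1000 steps: {errors[-1]:.3f}")
```

Here a one-step error of 0.0005 grows to roughly 0.24 after 1000 recursive steps.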

Computational latency is a decisive factor for real-time BMS applications. BMS units are typically embedded systems with limited processing power and memory. Complex deep learning models therefore either require expensive hardware accelerators or suffer unacceptable delays, which makes inference speed a key metric of the engineering feasibility of any prognostic algorithm. To evaluate computational efficiency, we benchmarked the inference speed of the proposed micro XGBoost cluster against the deep learning baselines, testing all models on the same hardware. The comparison was set up to favor the baselines. The deep learning benchmarks were told the battery’s cycle life in advance and only had to predict a sequence of that length, whereas the XGBoost model cluster had to predict a full sequence of length 1300. Figure 10 compares the inference speeds of the models in standard CPU and GPU environments and reveals a large performance gap. With GPU acceleration, the speed differences among the four models all fall within an acceptable range for engineering use. Without GPU acceleration, however, the inference time of LSTM and CNN-LSTM on the CPU reaches 80 to 120 s. This is unacceptable for real-time BMSs, which require sub-second or even millisecond-level responses. Thanks to the parallel decoupled structure described above, the proposed method completes inference in only 400 ms on the same CPU. This matters for two reasons. Industrial BMS units typically have processors with limited computing power that cannot run deep learning models reliant on GPU acceleration, and computational headroom must be reserved for other complex BMS functions [37].
This 200- to 300-fold speed-up (80–120 s versus 0.4 s) means that the proposed method can deliver real-time, high-precision SOH monitoring on low-cost edge hardware.
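The speed-up range follows directly from the reported timings. A latency benchmark of this kind would typically report a median over repeated calls after a few warm-up runs. The sketch below is a hypothetical harness of that shape, not the authors' benchmarking code. The timed lambda is only a stand-in workload, to be replaced by a real predict call on the target CPU.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], repeats: int = 20, warmup: int = 3) -> float:
    """Median wall-clock latency of fn() in milliseconds; warm-up calls are discarded
    so one-off costs (lazy loading, caches) do not skew the result."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)


def speedup(baseline_s: float, proposed_s: float) -> float:
    """How many times faster the proposed method is than the baseline."""
    return baseline_s / proposed_s


# Reported CPU figures: 80-120 s for LSTM / CNN-LSTM versus about 0.4 s for the cluster.
print(f"speed-up range: {speedup(80, 0.4):.0f}x to {speedup(120, 0.4):.0f}x")
# Timing a stand-in workload (replace with a real predict call on the target CPU).
print(f"stand-in latency: {median_latency_ms(lambda: sum(range(100_000))):.2f} ms")
```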

To visualize the trade-offs among accuracy, stability, efficiency, and cost, Figure 11 presents a five-dimensional radar chart for comprehensive evaluation. The chart uses a positive scoring mechanism, so a larger enclosed area indicates better overall performance. Some deep learning benchmarks score reasonably well on the accuracy indicators. However, they degrade sharply on the “Edge Efficiency (CPU)” axis, which represents CPU inference speed, indicating a strong dependence on expensive computing resources. In contrast, the proposed method has the fullest and most balanced pentagon. It achieves full marks on the MAE and RMSE accuracy dimensions and also leads clearly in edge computing efficiency and deployment cost. The method therefore reconciles the industry’s competing demands for high accuracy and low power consumption. This makes it a strong engineering candidate for the next generation of low-cost, high-performance intelligent BMSs.

In the GP-cells experiment, training on the training set (approximately 70 batteries) took approximately 36 min on average. The cluster contains a large number of models, but each shallow model file is very small, ranging from 2 kB to several tens of kB. Moreover, because of the cluster’s structure, the entire model is never invoked during operation. Only the micro models responsible for the relevant cycle interval are activated. It is therefore feasible to deploy this model in modern BMSs.
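The text does not spell out the positive scoring rule behind the radar chart. The sketch below therefore shows one plausible choice, stated as an assumption. Each metric is min–max normalized across models and inverted where lower is better, with a small floor so that the weakest model does not collapse to the center. The enclosed area of an n-axis radar polygon is A = ½·sin(2π/n)·Σ rᵢ·rᵢ₊₁. The helper names and the floor value are hypothetical.

```python
import math
from typing import Dict, List


def positive_scores(values: Dict[str, float], lower_is_better: bool = True,
                    floor: float = 0.1) -> Dict[str, float]:
    """Min-max normalise one metric across models onto [floor, 1], where 1 is best.
    The floor (an assumption) keeps the weakest model from collapsing to the centre."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0
    scores = {}
    for name, v in values.items():
        s = (hi - v) / span if lower_is_better else (v - lo) / span
        scores[name] = floor + (1.0 - floor) * s
    return scores


def radar_area(radii: List[float]) -> float:
    """Area of a radar polygon whose n axes are spaced 2*pi/n apart:
    A = 0.5 * sin(2*pi/n) * sum(r_i * r_{i+1})."""
    n = len(radii)
    return 0.5 * math.sin(2 * math.pi / n) * sum(radii[i] * radii[(i + 1) % n] for i in range(n))


# MAE axis from Table 7 (Ah, GP-cells value for the proposed model); lower is better.
mae_scores = positive_scores({"CNN": 0.012, "LSTM": 0.0094, "CNN-LSTM": 0.0110, "Proposed": 0.0035})
print({k: round(v, 3) for k, v in mae_scores.items()})
print(round(radar_area([1.0] * 5), 3))  # area of a perfect five-axis score
```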

To further verify the model’s generalization to batteries with other electrochemical systems, we validated it on the publicly available NASA and CALCE battery aging datasets. The NASA dataset contains four commercial 18650 cells: B0005, B0006, B0007, and B0018. The CALCE data used here are lithium cobalt oxide (LCO) prismatic cells from the CS2 series. Table 8 lists the cathode material, experimental temperature, and dataset assignment of each cell. Table 9 reports the accuracy obtained on the NASA and CALCE test cells.

To check that the model has not overfitted, Figure 12 plots the average best five-fold cross-validation RMSE during the parameter-tuning stage of the SP-cells experiment. The curve shows no evident sign of overfitting.
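A curve of the kind shown in Figure 12 can be produced in three steps. First, split the cells into five folds. Second, average the validation RMSE of each tuning trial over those folds. Third, track the running best. The sketch below assumes cell-level folds so that no battery contributes cycles to both sides of a fold. The `fit_and_score` callback is hypothetical. In practice a tuner such as Optuna (`optuna.create_study(direction="minimize")` with `study.optimize`) would propose the trial parameters.

```python
from typing import Callable, List, Sequence, Tuple


def kfold_by_cell(cell_ids: Sequence[str], k: int = 5) -> List[Tuple[List[str], List[str]]]:
    """Split at the cell level so no battery contributes cycles to both the
    fitting and the validation part of a fold (avoids leakage between cycles)."""
    folds = [list(cell_ids[i::k]) for i in range(k)]
    return [([c for j, f in enumerate(folds) if j != i for c in f], folds[i]) for i in range(k)]


def mean_cv_rmse(params: dict, cell_ids: Sequence[str],
                 fit_and_score: Callable[[dict, List[str], List[str]], float], k: int = 5) -> float:
    """Average validation RMSE of one hyper-parameter trial over k folds.
    `fit_and_score` is a hypothetical callback that trains on the first list
    of cells and returns the RMSE on the second."""
    splits = kfold_by_cell(cell_ids, k)
    return sum(fit_and_score(params, tr, va) for tr, va in splits) / len(splits)


def best_so_far(trial_scores: Sequence[float]) -> List[float]:
    """Running minimum of the mean CV RMSE across successive tuning trials."""
    curve, best = [], float("inf")
    for s in trial_scores:
        best = min(best, s)
        curve.append(best)
    return curve


print(best_so_far([0.05, 0.03, 0.04, 0.02]))
```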

5.4. Multidimensional Feature Correlation and Robustness Evaluation Across Life Cycle

To further reveal the decision-making logic of the micro model cluster over the full life cycle, we analyzed it along two dimensions: feature-space correlation and model-parameter similarity. First, the Pearson correlation coefficient was used to quantify the (linear) association between the input features and the degradation target. Then, a cosine similarity matrix was used to examine the structural independence and stage-wise evolution of the 1300 micro models in parameter space. Together, these two analyses provide evidence for the model’s ‘physical mechanism perception ability’.
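For reference, the Pearson coefficient is the covariance of two series divided by the product of their standard deviations. It therefore measures linear dependence, and a monotone but curved relation scores below 1. The sketch below is a plain-Python illustration with toy data, not the paper's feature set.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r in [-1, 1] between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# A perfectly linear relation gives r = 1; a monotone but curved one gives r < 1.
print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 3))
print(round(pearson([1, 2, 3, 4], [1, 4, 9, 16]), 3))
```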

A key finding of this study is the apparent physical awareness that emerges from the parallel model architecture. The protocol sensitivity heatmap in Figure 13 visualizes how the model’s internal decision logic reconfigures over the life cycle. In the early aging phase (Cycles 0–400), the model is most sensitive to CC1. This is consistent with electrochemical degradation kinetics, since high currents primarily drive lithium plating and SEI layer thickening in fresh cells [28]. In the later stages (Cycles > 800), by contrast, the sensitivity shifts markedly toward CC2 and SOC. This shift suggests that the model autonomously captures the strong influence of sustained high-rate charging on the non-linear capacity plunge of aged cells. This interpretability emerges purely from data trajectories, without explicit physical priors, and so helps narrow the gap between black-box deep learning and battery degradation physics.
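The text does not state how the per-stage sensitivities in Figure 13 are computed. The sketch below is one hypothetical way to build such a heatmap. It takes a per-model importance score for each protocol parameter, sums those scores within each aging stage, and normalizes them into shares. With XGBoost, per-model scores could, for example, come from `Booster.get_score(importance_type="gain")`. The stage boundaries follow the text, with the 400–800 band taken as the implied middle stage, and the importances below are synthetic.

```python
from typing import Dict, Sequence

FEATURES = ("CC1", "CC2", "SOC")
# Stage boundaries follow the text: early 0-400, late > 800; the middle band is implied.
STAGES = (("early", 0, 400), ("mid", 400, 800), ("late", 800, None))


def stage_sensitivity(importance: Sequence[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """importance[k] maps feature name -> importance of that feature in the
    micro model for cycle k. Returns, per stage, each feature's share of the
    summed importance (shares in a stage add up to 1)."""
    result = {}
    for name, start, stop in STAGES:
        window = importance[start:stop]
        totals = {f: sum(m.get(f, 0.0) for m in window) for f in FEATURES}
        z = sum(totals.values()) or 1.0
        result[name] = {f: totals[f] / z for f in FEATURES}
    return result


# Synthetic importances (hypothetical) for a 1300-model cluster.
imp = ([{"CC1": 0.7, "CC2": 0.2, "SOC": 0.1}] * 400
       + [{"CC1": 0.4, "CC2": 0.4, "SOC": 0.2}] * 400
       + [{"CC1": 0.1, "CC2": 0.6, "SOC": 0.3}] * 500)
for stage, shares in stage_sensitivity(imp).items():
    print(stage, {f: round(s, 2) for f, s in shares.items()})
```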

To validate the need for 1300 parallel models, we analyzed the cosine similarity matrix of the model parameters in Figure 14. The matrix shows a prominent block-diagonal structure, indicating that the models spontaneously cluster into three distinct aging regimes: early-stable, mid-transition, and late-collapse. A sharp drop in similarity occurs at approximately Cycle 850, coinciding with the electrochemical knee point of the battery life. Accurately locating the non-linear degradation knee point benefits battery evaluation and guides battery management in electric vehicles and energy storage systems [38]. This structural discontinuity indicates that the proposed architecture achieves temporal decoupling. Each model specializes in a localized aging context, so the cluster avoids the feature conflicts inherent in monolithic models. This supports its precision and robustness across the entire life cycle.
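A block-diagonal similarity matrix and its sharpest regime change can be computed as follows. XGBoost models are tree ensembles without a fixed-length parameter vector. This sketch therefore assumes that each micro model is summarized by a fixed-length descriptor, such as a normalized feature-importance vector; the paper's own vectorization is not restated here. The transition is taken as the lowest similarity between neighboring models, and the descriptors below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; a block-diagonal pattern signals distinct regimes."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def sharpest_transition(vectors: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Index k where the similarity between neighbouring models k-1 and k is lowest."""
    sims = [cosine(vectors[k - 1], vectors[k]) for k in range(1, len(vectors))]
    k = min(range(len(sims)), key=sims.__getitem__)
    return k + 1, sims[k]


# Hypothetical descriptors: models 0-4 rely on feature 0, models 5-9 on feature 2.
vecs = [[1.0, 0.1, 0.0]] * 5 + [[0.0, 0.1, 1.0]] * 5
print(sharpest_transition(vecs))
```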

To evaluate the model’s robustness across diverse degradation patterns, Figure 15 shows three representative batteries covering distinct lifespan ranges. In the accelerated-collapse segment after the knee point, the prediction residuals did not grow noticeably as the non-linearity increased. The fluctuation of the mean absolute error stayed within 2%. As the top row shows, the predicted trajectories align closely with the ground truth. In the later stage of aging, the marked increase in internal resistance and the loss of active material make the capacity curve bend sharply downward. The model reproduces this process, and its predicted knee position closely matches the actual one, showing that the model is sensitive to the degradation inflection point.
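Comparing predicted and actual knee positions requires some knee-locating rule. The text does not name one, so the sketch below uses a common geometric heuristic instead of the authors' method. Both axes are scaled to [0, 1], and the knee is taken as the point farthest from the straight line joining the first and last points of the trajectory. The trajectory is synthetic.

```python
import math
from typing import Sequence


def knee_index(capacity: Sequence[float]) -> int:
    """Return the cycle index farthest from the chord joining the first and last
    points, after scaling both axes to [0, 1] so cycles and Ah are comparable."""
    n = len(capacity)
    if n < 3:
        raise ValueError("need at least three points")
    lo, hi = min(capacity), max(capacity)
    span = (hi - lo) or 1.0
    xs = [k / (n - 1) for k in range(n)]
    ys = [(c - lo) / span for c in capacity]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    dist = [abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm for x, y in zip(xs, ys)]
    return max(range(n), key=dist.__getitem__)


# Synthetic trajectory (hypothetical): slow linear fade, then a steep drop after cycle 800.
traj = [1.1 - 1e-4 * k if k <= 800 else 1.02 - 1e-3 * (k - 800) for k in range(1001)]
print(knee_index(traj))
```

Applying the same rule to the predicted and measured curves gives two indices whose difference quantifies knee-point error.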

The proposed architecture achieves excellent predictive accuracy in blind testing, which warrants a deeper analysis. This high precision is primarily attributed to the following synergistic factors:

De-complexification via specialization: By deploying 1300 micro models, the high-dimensional degradation modeling is decomposed into localized tasks. Each model only approximates the localized manifold at a specific temporal coordinate, significantly reducing the empirical risk compared to monolithic global models.

Temporal error suppression: The overlapping consensus mechanism serves as a collective intelligence filter. The trilateral voting from adjacent models effectively neutralizes the variance of individual base learners, ensuring that the reconstructed trajectory remains physically consistent and free of ‘step-jump’ artifacts.

Informed initialization: The integration of a 100-cycle seed sequence allows the model to leverage early-life electrochemical signatures. When combined with the Optuna-optimized weight factors for stress parameters, the ensemble can accurately identify the degradation regime of each battery sample from the onset.
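The overlapping consensus mechanism can be pictured as follows. This is a minimal sketch, assuming each micro model votes for its own cycle and its two neighbors and the reconstructed value is the mean of the available votes; the exact weighting used in the paper is not restated here. The vote data are hypothetical. If the three votes carried independent errors of equal variance σ², averaging would cut the variance to σ²/3. Errors of neighboring models are likely correlated, so the real benefit is smaller.

```python
import math
from typing import Dict, List, Sequence


def consensus(votes: Sequence[Dict[int, float]], n_cycles: int) -> List[float]:
    """votes[k] maps cycle index -> capacity predicted by micro model k.
    With overlapping windows, cycle c receives votes from models c-1, c and c+1;
    the reconstructed value is the mean of whatever votes exist (NaN if none)."""
    sums = [0.0] * n_cycles
    counts = [0] * n_cycles
    for model_votes in votes:
        for c, v in model_votes.items():
            if 0 <= c < n_cycles:
                sums[c] += v
                counts[c] += 1
    return [s / m if m else math.nan for s, m in zip(sums, counts)]


# Three hypothetical micro models, each predicting its own cycle and its neighbours.
votes = [{0: 1.00, 1: 0.90}, {0: 1.20, 1: 1.00, 2: 0.80}, {1: 1.10, 2: 0.90}]
print([round(v, 3) for v in consensus(votes, 3)])
```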

6. Conclusions

This study proposes a comprehensive data-driven framework for high-precision prediction of lithium-ion battery capacity degradation. The framework addresses key challenges of computational efficiency and interpretability in real-world BMSs. It introduces a hybrid Wavelet–SG filtering technique for robust multi-scale noise suppression with feature retention, combined with a novel divide-and-conquer ensemble of specialized XGBoost micro models. This architecture captures the heterogeneity and multi-stage character of battery aging, from initial SEI formation to the non-linear knee point transition, without relying on expensive hardware accelerators.

On the GP-cells of the Toyota-MIT-Stanford dataset, the proposed method achieved an average mean absolute percentage error (MAPE) of 0.39%, with a maximum MAPE of 1.04% across the entire GP-cells test set. On the SP-cells, the average MAPE was 1.16% and the maximum was 2.3%. More importantly, the parallel structure of the micro models enables inference in about 400 milliseconds on a standard CPU, making the method suitable for edge deployment in resource-constrained BMS environments. Protocol sensitivity heatmaps and cosine similarity analysis further demonstrated the model’s intrinsic physical interpretability, which is consistent with known electrochemical degradation stages.

These results highlight the strong potential of this framework in industrial applications, including battery health monitoring for electric vehicles, lifetime prediction for energy storage systems, and life assessment. Future work will focus on extending this method to other battery chemistries, adapting to real-time operating protocols, and verifying performance under dynamic load curves on embedded BMS platforms.

Author Contributions

Conceptualization, Y.J.; Methodology, Y.J.; Software, Y.J., L.Z., X.L., S.W., Y.C. and C.H.; Validation, L.H.; Resources, X.L.; Data curation, Y.J., L.Z., X.L., S.W., L.H., Y.C. and C.H.; Writing—original draft, Y.J.; Writing—review and editing, Y.J., L.Z. and X.L.; Visualization, Y.J.; Funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Hunan Province under Grant 2026JJ80510.

Data Availability Statement

Data available in a publicly accessible repository. The data presented in this study are openly available in the following publicly accessible repositories: the Toyota-MIT-Stanford Battery Dataset at [https://data.matr.io/1/, accessed on 17 April 2026], the NASA Prognostics Data Repository at [https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/, accessed on 17 April 2026], and the CALCE Battery Research Group Dataset at [https://calce.umd.edu/battery-data, accessed on 17 April 2026]. These datasets were used for the training, validation, and benchmarking of the proposed battery health estimation models.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

RUL	remaining useful life
BMS	battery management system
CKF	Cubature Kalman Filter
EKF	Extended Kalman Filter
SG	Savitzky–Golay
UKF	Unscented Kalman Filter
WT	wavelet transform
FPR	feature preservation rate
SNR	Signal-to-Noise Ratio
CC1, CC2	first and second constant-current charging rates
Optuna	Optuna framework
SOC	state of charge
SOH	state of health
SEI	solid electrolyte interphase
XGBoost	eXtreme Gradient Boosting
GP-cells	general-protocol cells
LFP	lithium iron phosphate
SP-cells	specific-protocol cells
CNN	Convolutional Neural Network
LSTM	Long Short-Term Memory
MAE	mean absolute error
MAPE	mean absolute percentage error
RMSE	root mean square error

References

1. Zhang, J.; Huang, H.; Zhang, G.; Dai, Z.; Wen, Y.; Jiang, L. Cycle Life Studies of Lithium-Ion Power Batteries for Electric Vehicles: A Review. J. Energy Storage 2024, 93, 112231.
2. Li, J.; Peng, Y.; Wang, Q.; Liu, H. Status and Prospects of Research on Lithium-Ion Battery Parameter Identification. Batteries 2024, 10, 194.
3. Ansari, S.; Zainuri, M.A.A.M.; Ayob, A.; Lipu, M.S.H.; Rahman, M.S.; Ibrahim, M.; Hannan, M.A. Expert Deep Learning Techniques for Remaining Useful Life Prediction of Diverse Energy Storage Systems: Recent Advances, Execution Features, Issues and Future Outlooks. Expert Syst. Appl. 2024, 258, 125163.
4. Yuan, J.; Qin, Z.; Huang, H.; Gan, X.; Wang, Z.; Yang, Y.; Liu, S.; Wen, A.; Bi, C.; Li, B.; et al. Progress in the Prognosis of Battery Degradation and Estimation of Battery States. Sci. China-Mater. 2024, 67, 1014–1041.
5. Li, F.; Feng, H.; Min, Y.; Zhang, Y.; Zuo, H.; Bai, F.; Zhang, Y. Prediction of Lithium-Ion Battery Degradation Trajectory in Electric Vehicles under Real-World Scenarios. Energy 2025, 317, 134663.
6. Tang, K.; Luo, B.; Chen, D.; Wang, C.; Chen, L.; Li, F.; Cao, Y.; Wang, C. The State of Health Estimation of Lithium-Ion Batteries: A Review of Health Indicators, Estimation Methods, Development Trends and Challenges. World Electr. Veh. J. 2025, 16, 429.
7. Wang, S.; Ren, P.; Takyi-Aninakwa, P.; Jin, S.; Fernandez, C. A Critical Review of Improved Deep Convolutional Neural Network for Multi-Timescale State Prediction of Lithium-Ion Batteries. Energies 2022, 15, 5053.
8. Zhang, Y.; Tan, X.; Wang, Z. State-of-Charge Estimation for Lithium-Ion Batteries Based on Recurrent Neural Network: Current Status and Perspectives. J. Energy Storage 2025, 112, 115575.
9. Cai, L.; Yan, J.; Jin, H.; Meng, J.; Peng, J.; Wang, B.; Liang, W.; Teodorescu, R. A Two-Stage Method with Twin Autoencoders for the Degradation Trajectories Prediction of Lithium-Ion Batteries. J. Energy Chem. 2025, 103, 759–772.
10. Zhou, Y.; Wang, C.; Chen, Y. Early Prediction of Lithium-Ion Battery Capacity and Remaining Useful Life Based on Cycle-Consistency Learning and an Improved Transformer. J. Energy Storage 2025, 134, 118147.
11. Liu, Y.; Ahmed, M.; Feng, J.; Mao, Z.; Chen, Z. Deep Learning-Powered Lifetime Prediction for Lithium-Ion Batteries Based on Small Amounts of Charging Cycles. IEEE Trans. Transp. Electrif. 2025, 11, 3078–3090.
12. Lin, Y.; Wan, F.; Yang, D.; Li, S.; Liu, R.; Yin, W.; Mu, J.; Chen, W. Battery Degradation Trajectory Early Prediction with Degradation Recognition and Physics-Guided under Different Charging Strategies. Energy 2025, 336, 138485.
13. Yao, X.-Y.; Chen, G.; Hu, L.; Pecht, M. A Multi-Model Feature Fusion Model for Lithium-Ion Battery State of Health Prediction. J. Energy Storage 2022, 56, 106051.
14. Luo, J.; Liu, Z.; Wu, L.; Luo, C.; Yi, G. A Health Indicator-Based Multidimensional Spatiotemporal Feature Extraction Network for Remaining Useful Life Prediction. Measurement 2026, 263, 120153.
15. Zhu, Y.; Shang, Y.; Gu, X.; Wang, Y.; Zhang, C. Rapid Assessing Cycle Life and Degradation Trajectory Based on Transfer Learning for Lithium-Ion Battery. IEEE Trans. Transp. Electrif. 2025, 11, 5509–5520.
16. Li, X.; Ju, L.; Geng, G.; Jiang, Q. Data-Driven State-of-Health Estimation for Lithium-Ion Battery Based on Aging Features. Energy 2023, 274, 127378.
17. Kurucan, M.; Ozbaltan, M.; Yetgin, Z.; Alkaya, A. Applications of Artificial Neural Network Based Battery Management Systems: A Literature Review. Renew. Sustain. Energy Rev. 2024, 192, 114262.
18. Jiang, M.; Li, D.; Li, Z.; Chen, Z.; Yan, Q.; Lin, F.; Yu, C.; Jiang, B.; Wei, X.; Yan, W.; et al. Advances in Battery State Estimation of Battery Management System in Electric Vehicles. J. Power Sources 2024, 612, 234781.
19. Chen, H.; Yue, W.; Bin, G.; Jiang, Q.; Shao, W.; She, C. Filter Methods Comparation for Incremental Capacity Analysis in Lithium-Ion Batteries Health Prediction. J. Energy Storage 2024, 101, 113878.
20. Rout, S.; Das, S.; Kumar, K.M.S.; C., D.; Muyeen, S.M. Advanced Battery Modeling and State-of-Charge Estimation of Lithium-Ion Batteries: A Comprehensive Review of Modeling Approaches, Parameterization, and Operational Challenges. Energy Strategy Rev. 2026, 64, 102192.
21. Luo, J.W.; Ying, K.; Bai, J. Savitzky-Golay Smoothing and Differentiation Filter for Even Number Data. Signal Process. 2005, 85, 1429–1434.
22. Suescun-Diaz, D.; Rasero Causil, D.A.; Figueroa-Jimenez, J.H. Adams-Bashforth-Moulton Method with Savitzky-Golay Filter to Reduce Reactivity Fluctuations. Kerntechnik 2017, 82, 674–677.
23. Wang, F.; Zhai, Z.; Liu, B.; Zheng, S.; Zhibin, Z.; Chen, X. Open Access Dataset, Code Library and Benchmarking Deep Learning Approaches for State-of-Health Estimation of Lithium-Ion Batteries. J. Energy Storage 2024, 77, 109884.
24. Cui, Z.; Gao, X.; Mao, J.; Wang, C. Remaining Capacity Prediction of Lithium-Ion Battery Based on the Feature Transformation Process Neural Network. Expert Syst. Appl. 2022, 190, 116075.
25. Kanevskii, L.S.; Dubasova, V.S. Degradation of Lithium-Ion Batteries and How to Fight It: A Review. Russ. J. Electrochem. 2005, 41, 1–16.
26. Dachraoui, W.; Erni, R. Anode-Electrolyte Interface in Lithium-Ion Batteries Investigated by Liquid Phase Transmission Electron Microscopy: Achievements, Challenges, and Future Directions. Nano Energy 2025, 143, 111321.
27. Zhao, D.; Ding, M.; Tao, M.; Shan, P.; Lin, H.; Chen, Y.; Chen, J.; Zhou, Y.; Yang, Y. Advanced Interfacial Engineering of Graphite Anodes for Next-Generation Lithium-Ion Batteries. Small 2026, 22, e12150.
28. Rahman, M.M.; Nisar, U.; Abouimrane, A.; Belharouak, I.; Amin, R. Valuation of Anode Materials for High-Performance Lithium Batteries: From Graphite to Lithium Metal and Beyond. Electrochem. Energy Rev. 2025, 8, 14.
29. Ruiz, P.L.; Damianakis, N.; Mouli, G.R.C. Physics-Based and Data-Driven Modeling of Degradation Mechanisms for Lithium-Ion Batteries-A Review. IEEE Access 2025, 13, 21164–21189.
30. Attia, P.M.; Bills, A.; Brosa Planella, F.; Dechent, P.; dos Reis, G.; Dubarry, M.; Gasper, P.; Gilchrist, R.; Greenbank, S.; Howey, D.; et al. Review-"Knees" in Lithium-Ion Battery Aging Trajectories. J. Electrochem. Soc. 2022, 169, 060517.
31. Song, S.; Fei, C.; Xia, H. Lithium-Ion Battery SOH Estimation Based on XGBoost Algorithm with Accuracy Correction. Energies 2020, 13, 812.
32. Severson, K.A.; Attia, P.M.; Jin, N.; Perkins, N.; Jiang, B.; Yang, Z.; Chen, M.H.; Aykol, M.; Herring, P.K.; Fraggedakis, D.; et al. Data-Driven Prediction of Battery Cycle Life before Capacity Degradation. Nat. Energy 2019, 4, 383–391.
33. Lin, M.; You, Y.; Meng, J.; Wang, W.; Wu, J.; Stroe, D.-I. Lithium-Ion Battery Degradation Trajectory Early Prediction with Synthetic Dataset and Deep Learning. J. Energy Chem. 2023, 85, 534–546.
34. Deng, Z.; Lin, X.; Cai, J.; Hu, X. Battery Health Estimation with Degradation Pattern Recognition and Transfer Learning. J. Power Sources 2022, 525, 231027.
35. Chen, D.; Zhang, W.; Zhang, C.; Sun, B.; Cong, X.; Wei, S.; Jiang, J. A Novel Deep Learning-Based Life Prediction Method for Lithium-Ion Batteries with Strong Generalization Capability under Multiple Cycle Profiles. Appl. Energy 2022, 327, 120114.
36. Bhushan, N.; Mekhilef, S.; Tey, K.S.; Shaaban, M.; Seyedmahmoudian, M.; Stojcevski, A. Overview of Model- and Non-Model-Based Online Battery Management Systems for Electric Vehicle Applications: A Comprehensive Review of Experimental and Simulation Studies. Sustainability 2022, 14, 15912.
37. Hong, S.; Kang, M.; Park, H.; Kim, J.; Baek, J. Real-Time State-of-Charge Estimation Using an Embedded Board for Li-Ion Batteries. Electronics 2022, 11, 2010.
38. Zhu, J.; Weng, W.; You, H.; Zhang, J.; Wang, Y.; Jiang, B.; Ji, C.; Wei, X.; Dai, H. Lithium-Ion Battery End of Life Prediction Based on the Decelerating Aging Point. Appl. Energy 2025, 401, 126692.

Figure 1. The specific process of using the combined filter of Wavelet and Savitzky–Golay when processing data: (a) Wavelet denoising; (b) wavelet signal reconstruction; (c) SG filter smoothing.

Figure 2. Flowchart of the research process: data processing, feature extraction, model construction and training, and model testing.

Figure 3. A model cluster constructed based on the parallel architecture following the divide-and-conquer strategy.

Figure 4. Schematic diagram of the input and output principle of the model cluster based on the divide-and-conquer strategy.

Figure 5. (a) Diagram illustrating the charging mode of the battery. (b) Histogram of the lifespan distribution of the battery dataset. (c) The capacity decline trajectories of all batteries.

Figure 6. Comparison of input signals of some batteries before and after preprocessing.

Figure 7. (a) The MAPE, MAE and RMSE values of all the cells obtained after testing on the test set in the GP-cells experiment; (b) The MAPE, MAE and RMSE values of all the cells obtained after testing on the test set in the SP-cells experiment.

Figure 8. The average MAPE, MAE and RMSE obtained on the SP-cells test set and the GP-cells test set, which were compared with three advanced deep learning methods, namely CNN, LSTM and CNN-LSTM.

Figure 9. Differences between the parallel structure proposed in this study and the traditional serial mode, which account for the model’s high speed.

Figure 10. (a) Comparison of the running speed of the proposed method with the benchmark models of CNN, LSTM, and CNN-LSTM on the CPU. (b) Comparison of the running speed of the proposed method with the benchmark models of CNN, LSTM, and CNN-LSTM on the GPU.

Figure 11. A comprehensive comparison of this method with the other three deep learning methods.

Figure 12. The trend graph of the average best RMSE of the five-fold cross-validation for the SP-cells experiment, showing how it changes with the number of parameter adjustments.

Figure 13. Heatmaps of the model’s sensitivity to the protocol parameters CC1, CC2 and SOC.

Figure 14. The residual similarity graph of the model cluster.

Figure 15. The capacity decline trajectories, error graphs and box plots of the three typical life types of batteries (short life, medium life and long life) at the knee point.

Table 1. Performance of common filters in three dimensions: feature retention capability, adaptability to non-stationary signals, and computational cost.

Method	Feature Retention Capability	Adaptability to Non-stationary Signals	Computational Cost
Wavelet	Strong	Strong	Moderate
SG Filter	Strong	Poor	Low
MA Filter	Poor	Poor	Low
EEMD	Strong	Strong	High
Proposed	Strong	Strong	Moderate

Table 2. Comparison of different preprocessing methods.

Method	SNR (dB)	FPR (%)	SI (10⁻³)	Computation Time (ms)
Raw Data	15.2	100.0	8.73	-
Moving Average (n = 5)	18.6	89.4	3.21	0.8
Kalman Filter	19.8	91.2	2.95	12.3
Wavelet Only (db4, J = 5)	22.4	95.6	2.14	5.6
SG Filter Only (m = 5, k = 3)	21.1	93.8	2.48	1.2
Wavelet + SG (Proposed)	24.7	97.3	1.67	6.9

Table 3. The specific parameter settings for the Wavelet Transform and SG Filter in the preprocessing steps of this study.

Component	Parameter	Value/Setting
Wavelet Transform	Wavelet Basis	Symlet-8
	Decomposition Level	Adaptive (Max 5)
	Threshold Mode	Soft-thresholding
	Scaling Factor ($k$)	1.2
SG Filter	Polynomial Order	3
	Window Size ($W$)	5
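The SG settings in Table 3 (window 5, cubic polynomial) reduce to a fixed 5-point convolution. The sketch below hard-codes the standard 5-point weights (−3, 12, 17, 12, −3)/35, which are the same for quadratic and cubic fits. It also shows the soft-thresholding rule that would be applied to wavelet detail coefficients. Leaving the first and last two samples unchanged is a simplification of this sketch. A full pipeline might instead use `scipy.signal.savgol_filter(x, 5, 3)` together with PyWavelets (`pywt.wavedec` with `'sym8'` and `pywt.threshold(..., mode='soft')`). How the threshold is derived from the scaling factor k = 1.2 is not restated here, so the threshold is treated as a given input.

```python
from typing import List, Sequence

# Standard 5-point Savitzky-Golay smoothing weights (identical for orders 2 and 3).
SG5_WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def sg_smooth_5(x: Sequence[float]) -> List[float]:
    """Window-5, cubic SG smoothing; the two samples at each end are copied unchanged."""
    y = list(x)
    for i in range(2, len(x) - 2):
        y[i] = sum(w * x[i + j - 2] for j, w in enumerate(SG5_WEIGHTS)) / SG5_NORM
    return y


def soft_threshold(coeffs: Sequence[float], t: float) -> List[float]:
    """Soft thresholding: shrink each coefficient toward zero by t, zeroing |c| <= t."""
    return [0.0 if abs(c) <= t else (abs(c) - t) * (1.0 if c > 0 else -1.0) for c in coeffs]


# A cubic passes through the SG filter unchanged at interior points.
cubic = [float(i ** 3 - 2 * i ** 2 + i) for i in range(10)]
print(sg_smooth_5(cubic) == cubic)
print(soft_threshold([3.0, -0.5, -2.0, 1.0], 1.0))
```

Reproducing cubics exactly is what lets the SG stage smooth noise without flattening genuine curvature in the capacity trajectory.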

Table 4. Based on the battery life distribution in the dataset, a representative portion of batteries was selected, and their life lengths included short life, medium life and long life.

ID	Types	Life
Cell 1	GP-cells	Short
Cell 2	GP-cells	Short
Cell 3	GP-cells	Short
Cell 4	GP-cells	Short
Cell 5	GP-cells	Mid
Cell 6	GP-cells	Mid
Cell 7	GP-cells	Long
Cell 8	GP-cells	Long

Table 5. The dataset partitioning strategy in the SP-cells experiment, the proportion of the subsets and the category of the cells.

Dataset Split	Subset	Proportion	Cell Types
Test set	-	20%	SP-cells
Cross-validation set	Training set	16%	SP-cells, GP-cells
Cross-validation set	Parameter adjustment set	64%	SP-cells, GP-cells

Table 6. The comparison of the accuracy obtained from the mirror experiment using unprocessed data with that of the original experiment.

	MAPE (%)	RMSE (Ah)	MAE (Ah)
Unprocessed	4.593	0.0464	0.0437
Processed	1.16	0.0138	0.0103

Table 7. Average MAE, RMSE and MAPE of the three deep learning benchmarks (CNN, LSTM and CNN-LSTM) and of the proposed method, calculated on the test sets of the SP-cells and GP-cells experiments.

Model	MAE (Ah)	RMSE (Ah)	MAPE (%)
CNN	0.012	0.015	1.65
LSTM	0.0094	0.0113	1.28
CNN-LSTM	0.0110	0.0135	1.47
Proposed (GP-cells)	0.0035	0.0057	0.39
Proposed (SP-cells)	0.0103	0.0138	1.16

Table 8. Cathode materials, experimental temperatures, and dataset assignment of each cell in the NASA and CALCE datasets.

Cell	Temperature	Cathode Material	Dataset
B0005	24 °C	LiNiCoAlO₂	Training set
B0006	24 °C	LiNiCoAlO₂	Training set
B0007	24 °C	LiNiCoAlO₂	Validation set
B0018	24 °C	LiNiCoAlO₂	Test set
CS2-35	25 °C	LCO	Training set
CS2-36	25 °C	LCO	Validation set
CS2-37	25 °C	LCO	Test set
CS2-38	25 °C	LCO	Training set

Table 9. MAPE, MAE and RMSE obtained on the B0018 cell of the NASA dataset and the CS2-37 cell of the CALCE dataset.

ID	MAPE (%)	MAE (Ah)	RMSE (Ah)
B0018	2.117	0.018	0.0221
CS2-37	1.964	0.01373	0.01833

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jiao, Y.; Zeng, L.; Li, X.; Wang, S.; Huang, L.; Cai, Y.; Huang, C. Research on Predicting the Lifespan of Lithium-Ion Batteries Using the Micro XGBoost Model Cluster. Processes 2026, 14, 1829. https://doi.org/10.3390/pr14111829

