Improving State-of-Health Estimation for Lithium-Ion Batteries Based on a Generative Adversarial Network and Partial Discharge Profiles

Zhang, Hangyu; Lai, Yi-Horng

doi:10.3390/wevj16050277


by Hangyu Zhang and Yi-Horng Lai *

School of Mechanical and Electrical Engineering & Automation, Xiamen University Tan Kah Kee College, Zhangzhou 363105, China

* Author to whom correspondence should be addressed.

World Electr. Veh. J. 2025, 16(5), 277; https://doi.org/10.3390/wevj16050277

Submission received: 21 February 2025 / Revised: 29 April 2025 / Accepted: 14 May 2025 / Published: 16 May 2025

(This article belongs to the Special Issue Lithium-Ion Battery Diagnosis: Health and Safety)


Abstract

Aging reduces the capacity of lithium batteries and seriously affects the performance of electric vehicles. State-of-health (SOH) estimation for lithium batteries can help to optimize the charging and discharging strategies of electric vehicles. This study investigates the use of partial discharge data for SOH estimation, addressing the unstable output of traditional estimation models when they are given partial discharge data under low-voltage conditions. We first used the DoppelGANger network to generate synthetic data. After this data augmentation step, we trained a temporal convolutional network to construct a data-driven SOH model. Finally, the output of the SOH model was evaluated using three indicators: RMSE, MAPE, and delta. Compared with traditional SOH estimation models, the proposed method improved the estimation accuracy in five of the seven low-voltage test conditions. The experimental results provide a practical solution for data-driven SOH estimation.

Keywords:

lithium battery; state of health; DoppelGANger network; temporal convolutional network

1. Introduction

With the rapid development of electric vehicles (EVs), the demand for high-performance battery systems has increased significantly. As a core component of EVs, the battery management system (BMS) plays a vital role in ensuring the safety, reliability, and longevity of battery packs [1]. However, due to the inevitable aging process, battery degradation gradually occurs, leading to reduced performance and shorter lifespans. Among the various functions of the BMS, accurate estimation of the state of health (SOH) and remaining useful life (RUL) is essential for effective battery management [2,3,4].

Lithium battery aging is typically accompanied by increased internal resistance, which reduces the available capacity and negatively impacts EV performance [5]. Therefore, developing accurate SOH estimation techniques is critical for optimizing charging/discharging strategies and extending battery life. A comprehensive review of lithium battery modeling and state estimation approaches is provided in [6], where the existing SOH estimation methods are generally categorized into experimental-based and model-based techniques.

Sun et al. [7] proposed an SOH estimation method based on electrochemical impedance spectroscopy (EIS), which characterizes internal battery conditions by analyzing impedance responses at multiple frequencies. EIS can effectively capture key parameters, such as electrode kinetics, electrolyte diffusion, and interface stability, making it suitable for online and non-destructive SOH monitoring [8,9].

For model-based SOH estimation, Demirci et al. [10] developed a method that monitors real-time physical signals, such as voltage, current, and temperature, and uses data-driven models for SOH prediction. Incremental capacity analysis (ICA), another popular approach, identifies degradation patterns and predicts SOH by analyzing incremental capacity curves. The combination of ICA with support vector regression has demonstrated high reliability in practical applications [11].

To address the dependency on precise load profiles, Shu et al. [12] proposed an online SOH estimation method based on short-term charging curves. Their method extracts features from early-stage voltage curves and applies machine learning techniques to achieve fast and reliable SOH prediction. The experimental results indicated that the initial charging curve shape correlates strongly with the battery aging status, making this a feasible approach for real-time EV applications.

Electrochemical models, when combined with data-driven algorithms, can mitigate model uncertainty and measurement noise, enabling dynamic RUL estimation. For example, Lyu et al. [13] presented a lead-acid battery RUL model based on particle filtering and integrated with an electrochemical model, providing a foundation for hybrid modeling strategies in lithium batteries.

Gou et al. [14] proposed a hybrid data-driven method for SOH and RUL estimation using deep learning models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks [15], incorporating features such as the voltage, current, and temperature. Further enhancements include the CNN–BiLSTM–Attention model [16], which introduces attention mechanisms to focus on informative features, improving the estimation accuracy.

Traditionally, SOH estimation relies on full charge/discharge profiles. However, acquiring complete data is often time-consuming and impractical, especially under real-world EV operating conditions. Recent research has shifted toward SOH estimation using partial charge/discharge profiles, which reduces the data collection burden [17,18,19]. For instance, the authors of [20] proposed a multi-feature extraction method combining a temporal convolutional network (TCN) and particle swarm optimization, while the authors of [21] improved TCN-based SOH estimation using Bayesian optimization. Although TCNs have shown promise for SOH modeling on partial data, their performance degrades significantly under low-SOC conditions [22]. More work is required to enhance their robustness in such challenging scenarios.

Another major challenge in data-driven SOH modeling is the limited size of high-quality training datasets. Many recent efforts have focused on data augmentation to improve model generalization. For example, the authors of [23] applied synthetic data generation to address imbalance in bearing failure classification tasks, while the authors of [24] used generative adversarial networks (GANs) to augment fault signals. Similarly, the authors of [25,26] generated synthetic battery data using time series GANs to improve SOH estimation accuracy, although these methods primarily target full discharge data. Applying data augmentation to partial discharge profiles remains a promising research direction.

Despite these advances, the following two critical research gaps remain:

Lack of task-specific datasets for SOH estimation under partial and low-SOC conditions. The existing studies largely depend on full-cycle data, which is not feasible for real-time EV monitoring. Public datasets rarely capture low-SOC partial discharge profiles in a volume sufficient for effective model training.
Insufficient robustness and accuracy of existing SOH estimation models under low-SOC conditions. While models such as TCNs are effective in some scenarios, their accuracy degrades in sparse or low-voltage regions. Moreover, few studies integrate data augmentation techniques, such as GANs, to address these limitations.

Based on this analysis, we propose an SOH estimation framework that integrates GAN-based data augmentation with a TCN model. First, discharge data from seven aging conditions were segmented into four partial SOC ranges. To mitigate the issue of limited data, high-fidelity synthetic data were generated using a GAN model. Both the control and experimental groups were modeled using TCNs. Finally, the model performance was evaluated using RMSE, MAPE, and delta metrics. The experimental results demonstrated that the proposed method significantly improved SOH estimation accuracy under low-SOC conditions, while also showing strong robustness across the full SOC range.

The main contributions of this work are summarized as follows:

A data augmentation strategy based on the GAN is proposed to address the limitation of insufficient training data. The GAN model was tailored to match the distributional characteristics of battery discharge data, enabling the generation of high-quality synthetic datasets.
A TCN-based SOH estimation framework was developed for partial discharge profiles. The proposed method significantly improved the prediction accuracy in both constrained low-SOC scenarios and broader applications involving full-range SOC profiles.
An integrated framework is presented for SOH estimation from partial discharge profiles under data-scarce conditions, combining synthetic data generation, segmentation, and temporal modeling to achieve robust and accurate predictions.

2. The Analysis of the Dataset

2.1. Dataset Introduction

This study employed the randomized battery usage dataset provided by NASA’s Prognostics Center of Excellence (PCoE) [27,28], which is widely used to evaluate data-driven lithium-ion battery (LIB) aging models and provides a reliable benchmark for algorithm performance validation. The dataset includes aging data for 28 lithium cobalt oxide (LCO) 18650 cells, each with a nominal capacity of approximately 2.1 to 2.2 Ah, as detailed in Table 1. Based on the testing conditions, the dataset was divided into seven groups, each containing data from four cells. Groups 1 to 5 were cycled at room temperature, while Groups 6 and 7 were cycled at an ambient temperature of 40 °C.

To replicate real-world applications, the batteries were subjected to randomized load profiles during charging and discharging. The profiles are referred to here as random walk (RW) discharging. The specific groupings and test conditions are summarized in Table 2. The aging tests were performed periodically, during which the voltage, current, and temperature of the cells were recorded. After a predetermined number of cycles, a constant current discharge test was conducted to collect reference discharge data, which were used to characterize the SOH of the batteries. Figure 1 illustrates a sample of the voltage, current, and temperature data recorded during the first 10 cycles for a single cell. The dataset was processed and provided in .mat format. The experiment continued until the SOH of the cells dropped to a range between 50% and 80%.

2.2. Definition of SOH

Since the original dataset does not directly provide SOH data, the SOH was evaluated using the reference full discharge profiles (from 100% to 0% discharge) to calculate the battery capacity. Coulomb counting (CC) was employed for current integration, resulting in $SOH_{Q}$, which represents the battery’s health status based on capacity degradation. Its definition is as follows:

$$SOH_{Q}(t_{i}) = \frac{Q(t_{i})}{Q(t_{0})} \times 100\%, \tag{1}$$

where $Q(t_{0})$ represents the capacity of the battery at the beginning of its life, and $Q(t_{i})$ denotes the capacity at time $t_{i}$. The maximum SOH value is defined as 100%, serving as a metric to quantify the battery’s aging process.
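As a minimal sketch of Equation (1), the capacity of each reference discharge can be obtained by integrating the current over time (coulomb counting) and dividing by the initial capacity. The function names, the trapezoidal integration rule, and the sample values below are illustrative assumptions, not the authors' implementation.

```python
def capacity_ah(time_s, current_a):
    """Integrate discharge current over time (trapezoidal rule); returns Ah."""
    if len(time_s) != len(current_a):
        raise ValueError("time and current must have the same length")
    charge_as = 0.0  # accumulated charge in ampere-seconds
    for k in range(1, len(time_s)):
        dt = time_s[k] - time_s[k - 1]
        charge_as += 0.5 * (current_a[k] + current_a[k - 1]) * dt
    return charge_as / 3600.0  # convert A*s to A*h


def soh_percent(q_now_ah, q_initial_ah):
    """Capacity-based SOH as in Equation (1), in percent."""
    return q_now_ah / q_initial_ah * 100.0


# Hypothetical constant-current reference discharges at 1 A.
t_new = [0.0, 3600.0, 7200.0]   # fresh cell: 2.0 h until empty
t_aged = [0.0, 3600.0, 5760.0]  # aged cell: 1.6 h until empty
q0 = capacity_ah(t_new, [1.0, 1.0, 1.0])
qi = capacity_ah(t_aged, [1.0, 1.0, 1.0])
soh = soh_percent(qi, q0)
```

With these made-up values, the aged cell delivers 1.6 Ah against an initial 2.0 Ah, which corresponds to an SOH of about 80%.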

2.3. Partial Discharge Profiles

In real-world applications, the randomness and dynamic nature of battery usage often make it challenging to obtain complete discharge profiles. Partial discharge profiles are more practical and valuable for real-world scenarios. To simulate SOH estimation using partial profiles, the complete discharge profiles were segmented based on different SOC ranges. The SOC and corresponding voltage ranges for various use cases are presented in Table 3. To visually demonstrate the segmentation of the partial discharge profiles for different use cases, all reference discharge profiles of Cell 1 from Group 1 throughout the testing process were plotted, as shown in Figure 2.
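The segmentation step can be sketched as a simple filter on the voltage window of a use case. The helper name and the voltage bounds below are hypothetical placeholders; the actual ranges are the ones listed in Table 3.

```python
def segment_by_voltage(voltage_v, current_a, v_high, v_low):
    """Keep the (voltage, current) samples whose voltage lies in [v_low, v_high]."""
    return [
        (v, i)
        for v, i in zip(voltage_v, current_a)
        if v_low <= v <= v_high
    ]


# Hypothetical reference discharge with monotonically falling voltage.
voltage = [4.2, 4.0, 3.8, 3.6, 3.4, 3.2]
current = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Hypothetical low-voltage window standing in for one use case of Table 3.
low_segment = segment_by_voltage(voltage, current, v_high=3.6, v_low=3.2)
```

Each use case then yields a shorter sequence that covers only part of the full discharge.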

3. Data Augmentation Using GAN

To augment the limited battery sample data and generate a high-quality synthetic dataset, we employed a GAN model [29]. The GAN is a powerful generative model that leverages the adversarial interaction between a generator (G) and a discriminator (D). The structure of the GAN model is shown in Figure 3.

The generator (G) is responsible for generating synthetic data from random noise that resembles the real data distribution. The discriminator (D), on the other hand, determines whether the input data are real or generated.

Their objectives are defined by a loss function, as shown in Equation (2):

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)} [\log (1 - D(G(z)))]. \tag{2}$$

Here, $p_{data}(x)$ represents the distribution of real data, $p_{z}(z)$ denotes the distribution of random noise, and $G(z)$ is the output of the generator, i.e., the generated data.
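To make Equation (2) concrete, the sketch below estimates V(D, G) from a batch of discriminator outputs. The function name and the sample probabilities are illustrative assumptions; in practice the values would come from the discriminator network.

```python
import math


def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) in Equation (2).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from generated samples well.
v_strong = value_function([0.9, 0.8], [0.1, 0.2])
# A discriminator that cannot tell them apart and outputs 0.5 everywhere.
v_fooled = value_function([0.5, 0.5], [0.5, 0.5])
```

The discriminator tries to push this value up, while the generator tries to push it down. When the discriminator outputs 0.5 for every sample, the value equals -log 4.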

Traditional GANs are primarily designed for static images or non-sequential data, and their direct application to time series data may result in generated outputs lacking temporal dependencies. To better generate time series data with dynamic characteristics, researchers have proposed various GAN models tailored for time series data, such as RCGAN [30] and TimeGAN [31]. Considering that the current data used for SOH estimation exhibit long-term temporal dependencies, this study adopted DoppelGANger (DG) [32], one of the most effective methods for handling long-term time series data.

The overall architecture of the DG, as shown in Figure 4, comprises a metadata generator, a time series generator, an auxiliary discriminator, and a standard discriminator. The specific structure is described as follows:

3.1. Metadata Generator

The metadata generator employs a multi-layer perceptron (MLP) model to generate high-dimensional metadata associated with the time series. The generated metadata not only satisfy the statistical characteristics of the real distribution but also serve as conditional input for the time series generator, guiding the generation process.

3.2. Time Series Generator

The time series generator is built on a recurrent neural network (RNN) and is responsible for sequentially generating time series data. Its input includes random noise and metadata generated by the metadata generator. To enhance the ability to capture long-term dependencies, the generator adopts a batch generation method, producing multiple consecutive time steps of data at once, which significantly reduces the computational complexity required for generating long sequences.

3.3. Discriminator

The standard discriminator distinguishes between generated and real time series data, guiding the optimization of the time series generator through an adversarial loss function. To further improve the quality of metadata generation, DG incorporates an auxiliary discriminator dedicated to verifying whether the metadata match the real distribution. The joint optimization of the two discriminators ensures the fidelity of the joint distribution of time series data and metadata.

3.4. Normalization Mechanism and Mode Collapse Prevention

The DG introduces an adaptive normalization mechanism that normalizes each time series individually, with normalization parameters (e.g., maximum and minimum values) included as part of the metadata. This design effectively mitigates the mode collapse problem commonly encountered in traditional GANs when dealing with data with varying ranges, ensuring the diversity and authenticity of the generated data.
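The idea of per-series normalization can be sketched as follows: each series is scaled by its own minimum and maximum, and these two values are kept as metadata so that the original range can be restored. This is a simplified illustration with hypothetical helper names, not the DoppelGANger code itself.

```python
def normalize_series(series):
    """Scale one series to [0, 1]; return it with its own (min, max) metadata."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        scaled = [0.0 for _ in series]  # constant series: nothing to scale
    else:
        scaled = [(x - lo) / span for x in series]
    return scaled, {"min": lo, "max": hi}


def denormalize_series(scaled, meta):
    """Map a [0, 1] series back to its original range using the metadata."""
    span = meta["max"] - meta["min"]
    return [meta["min"] + s * span for s in scaled]


# Two hypothetical series with very different ranges.
voltage = [4.2, 3.9, 3.5, 3.2]
current = [0.5, 2.0, 1.0, 4.0]
v_scaled, v_meta = normalize_series(voltage)
i_scaled, i_meta = normalize_series(current)
v_restored = denormalize_series(v_scaled, v_meta)
```

Because every series occupies the same range after scaling, the generator does not have to model the wide spread of amplitudes directly; that information is carried by the metadata instead.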

4. Estimation of SOH Through TCN

In recent years, convolutional architectures have achieved high accuracy in processing sequential data, such as audio and translation. Considering the small sample characteristics of the partial discharge sequence data used for SOH estimation, this study adopted the TCN to perform SOH estimation [31], leveraging best practices from convolutional architectures in other fields.

The TCN is a time series modeling approach based on CNNs, specifically designed to handle time series data or tasks with sequential dependencies. By combining causal convolution and dilated convolution, a TCN efficiently captures long-term dependencies in sequences, while benefiting from parallel computation.

Compared to traditional RNN architectures, the TCN model features parameter sharing, lower memory requirements for training, and faster training speeds, making it more suitable for implementation in a BMS. Figure 5 illustrates the TCN architecture used in this study. The specific structure is described as follows:

4.1. Dilated Convolutional Networks

Dilated convolution significantly enlarges the receptive field by introducing gaps (i.e., dilation rates) between the convolutional kernels, allowing for efficient capture of long-term dependencies. The dilation rate typically grows exponentially, which enables the model to cover a broader temporal range, even with fewer layers.

Compared to traditional convolutions, dilated convolutions maintain a low computational cost while capturing global dependencies. Figure 6 illustrates a dilated convolution with dilation rates of [1, 2, 4, 8]. As shown, for a four-layer network, setting the dilation rates to [1, 2, 4, 8] allows the output at a single time step to be related to data from 16 time steps at the input, significantly expanding the receptive field.
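The receptive field quoted above can be checked with a short calculation. The sketch assumes one convolution per layer and a kernel size of 2, which is not stated in the text but reproduces the figure of 16 time steps.

```python
def receptive_field(kernel_size, dilations):
    """Input time steps visible to one output of stacked dilated causal convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Assumed kernel size of 2 with the dilation rates shown in Figure 6.
rf = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
```

Each layer extends the receptive field by (kernel size - 1) times its dilation rate, so exponentially growing rates give exponential coverage with a linear number of layers.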

4.2. Causal Convolutional Networks

As shown in Figure 5, causal convolutions ensure that the output at time t in a given layer depends only on the values at time t and earlier in the previous layer. Unlike traditional convolutional neural networks, causal convolutions do not allow access to future data. They are designed with a unidirectional structure, where the “cause” must precede the “effect”, thus introducing a strict temporal constraint. This mechanism prevents information leakage, ensuring the model’s validity for forecasting tasks.
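The causality constraint can be illustrated with a single-channel dilated convolution written in plain Python. The function name and the weights are illustrative assumptions; a real TCN layer would use a framework convolution with left padding.

```python
def causal_conv1d(x, weights, dilation=1):
    """Causal dilated 1-D convolution on a list of floats.

    weights[0] multiplies the current sample x[t]; weights[j] multiplies
    x[t - j * dilation]. Samples before the start of the sequence count as 0.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
x_future_changed = [1.0, 2.0, 3.0, 40.0, 50.0]  # differs only from t = 3 onward
y = causal_conv1d(x, weights=[0.5, 0.5], dilation=2)
y_changed = causal_conv1d(x_future_changed, weights=[0.5, 0.5], dilation=2)
```

Changing the input from t = 3 onward leaves the outputs at t = 0, 1, and 2 untouched, which is exactly the absence of information leakage from the future.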

4.3. Residual Blocks

The deeper the network, the stronger its expressive power and the better its performance, to a certain extent. However, experiments have shown that, when the network depth becomes too large, the performance may degrade. In addition to the dilated convolution method, the use of residual connections is also an effective solution to this issue. Residual connections originate from residual networks (ResNet) [33], which aim to address the problems of network degradation and vanishing gradients caused by increasing the network depth.

Let $x$ be the input to the model and $F(x)$ be the output after a linear transformation and activation. The formula for this process is as follows:

$$o = \mathrm{Activation}(x + F(x)), \tag{3}$$

where $o$ represents the output, and “Activation” denotes the activation function. This connection process is known as a residual connection, with each connection forming a residual module. Multiple residual modules are combined to form a ResNet.

In this work, a residual block was designed based on the characteristics of the discharge current sequence, as shown in Figure 5. The residual block consisted of two layers of causal dilated convolutions, with ReLU as the activation function. Dropout was employed to mitigate overfitting, and an optional 1 × 1 convolution was introduced to ensure that the input and output of the residual block had matching dimensions.
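A single-channel sketch of such a residual block, following Equation (3), is given below. It is a simplification under stated assumptions: dropout is omitted (as at inference time), the 1 × 1 convolution is not needed because input and output both have one channel, and the helper names and weights are hypothetical.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def causal_conv(x, weights, dilation):
    """Single-channel causal dilated convolution with implicit left zero-padding."""
    return [
        sum(w * x[t - j * dilation]
            for j, w in enumerate(weights) if t - j * dilation >= 0)
        for t in range(len(x))
    ]


def residual_block(x, w1, w2, dilation):
    """o = ReLU(x + F(x)), where F is two causal dilated convolutions with ReLU."""
    h = relu(causal_conv(x, w1, dilation))  # first convolution + ReLU
    f = relu(causal_conv(h, w2, dilation))  # second convolution + ReLU
    return relu([xi + fi for xi, fi in zip(x, f)])  # skip connection


x = [1.0, -2.0, 3.0, 0.5]
# With all-zero weights F(x) is zero, so the block reduces to ReLU(x).
passthrough = residual_block(x, w1=[0.0, 0.0], w2=[0.0, 0.0], dilation=1)
```

The zero-weight case shows why residual connections ease the training of deep stacks: a block that has learned nothing still passes its input through instead of destroying it.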

5. Method and Procedure

The methodology and procedure of the proposed method are demonstrated in Figure 7. The procedure consisted of the following five main components: data preprocessing, GAN enhancement, data splitting, TCN training, and performance evaluation.

5.1. Data Preprocessing

To ensure the model’s generalization ability and the reliability of the performance evaluation, the limited dataset needed to be partitioned. Typically, the dataset is divided into training, validation, and testing sets. The training set, which constitutes the largest portion of the dataset, is used for the model training process. Data from the training set are utilized to compute the loss function and update the model parameters, allowing the model to learn the mapping between input data and target output.

The validation set is used to monitor the model’s performance during training to prevent overfitting. During training, the model’s performance on the validation set (such as validation loss or accuracy) is used to select the optimal hyperparameters (e.g., learning rate and model structure). The testing set is used for the final evaluation of the model’s generalization performance. The test data remain completely unseen during the model training and hyperparameter tuning phases, so the evaluation results from the testing set reflect the model’s expected performance in real-world applications.

As introduced in Section 2.1, this study used the NASA PCoE random battery usage dataset. The dataset was divided into seven groups, based on different test conditions, with each group containing data from four batteries. In this study, the first battery in each group was designated as the test data (testing set), while the remaining three batteries were treated as raw data. The raw data were further divided into training and validation sets, with details on this division provided in Section 5.3.
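The cell-level split described above can be sketched as follows. The cell identifiers are hypothetical placeholders and do not correspond to the file names of the NASA dataset.

```python
def split_group(cell_ids):
    """First cell of a group goes to the test set; the remaining cells are raw data."""
    return {"test": cell_ids[:1], "raw": cell_ids[1:]}


# Hypothetical identifiers: seven groups with four cells each.
groups = {g: [f"G{g}_C{c}" for c in range(1, 5)] for g in range(1, 8)}
splits = {g: split_group(cells) for g, cells in groups.items()}

n_test = sum(len(s["test"]) for s in splits.values())
n_raw = sum(len(s["raw"]) for s in splits.values())
```

Splitting by cell rather than by cycle keeps every test profile from a battery the model has never seen, which avoids leakage between the raw and test data.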

5.2. GAN Enhancement

Data-driven algorithms depend heavily on high-quality datasets. To enhance the performance of such algorithms, this study used a GAN to augment the limited dataset. The raw data, as defined in Section 5.1, were input into the GAN, which generated synthetic data with characteristics and distributions similar to those of the original data. The synthetic data were then combined with the raw data to create the enhanced data.

5.3. Data Split

After data preprocessing and GAN enhancement, we obtained the three following datasets: test data, raw data, and enhanced data. All three are complete discharge profiles. However, since this study focuses on evaluating SOH using partial discharge profiles, the data needed further segmentation, as described in Section 2.3.

Considering that hyperparameter tuning is required during the training process of the data-driven model, the segmented raw data and enhanced data were further divided into the following subsets: raw data training, raw data validation, enhanced data training, and enhanced data validation. These subsets facilitated the process of hyperparameter optimization.

5.4. TCN Training

This study employed a TCN to establish the relationship between the partial discharge profiles and the SOH. The TCN takes the partial discharge profiles as inputs and outputs the estimated SOH.

To achieve the best SOH estimation performance, it is essential to determine the optimal model architecture and corresponding weights. This involves training and selecting the approximate function $\hat{f}$, associated with the model structure, along with its corresponding model weights $\theta$, as follows:

$$\underset{\theta}{\arg\min} \, |y - \hat{f}(x, \theta)|, \tag{4}$$

where $x$ represents the input vector, and $y$ denotes the corresponding output values. The model was trained using the Adam optimizer. The loss function selected for this study was the mean squared error (MSE), which is defined as follows:

$$L(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}, \tag{5}$$

where $\hat{y}$ represents the model’s output, and $n$ corresponds to the number of samples input into the model.
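The interplay of Equations (4) and (5) with the Adam optimizer can be illustrated on a toy problem. The sketch below fits a two-parameter linear model as a stand-in for the TCN; the model, data, learning rate, and step count are all illustrative assumptions, and the update rule is the standard Adam formulation with bias correction.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error, Equation (5)."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def fit_linear_adam(xs, ys, steps=2000, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Fit y ~ w * x + b by minimizing the MSE with Adam updates."""
    params = [0.0, 0.0]  # [w, b]
    m = [0.0, 0.0]       # first-moment estimates
    v = [0.0, 0.0]       # second-moment estimates
    n = len(xs)
    for t in range(1, steps + 1):
        preds = [params[0] * x + params[1] for x in xs]
        # Gradients of the MSE with respect to w and b.
        grads = [
            sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n,
            sum(2.0 * (p - y) for p, y in zip(preds, ys)) / n,
        ]
        for k in range(2):
            m[k] = b1 * m[k] + (1.0 - b1) * grads[k]
            v[k] = b2 * v[k] + (1.0 - b2) * grads[k] ** 2
            m_hat = m[k] / (1.0 - b1 ** t)  # bias-corrected first moment
            v_hat = v[k] / (1.0 - b2 ** t)  # bias-corrected second moment
            params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 0.9, 0.8, 0.7]  # toy, linearly decreasing SOH-like trend
loss_before = mse(ys, [0.0 for _ in xs])  # loss at the initial parameters
w, b = fit_linear_adam(xs, ys)
loss_after = mse(ys, [w * x + b for x in xs])
```

In the actual framework the parameters are the TCN weights and the gradients come from backpropagation, but the loss and the update rule play the same roles.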

5.5. Performance Evaluation

To evaluate the SOH estimation method based on the GAN-enhanced partial discharge profiles proposed in this study, partial discharge profiles derived from raw data and enhanced data were used to train and fine-tune the TCN model. The performance of the model was then assessed using profiles from the testing data, which were not involved in the training or tuning processes.

The similarity between the GAN-generated data and the real data was evaluated using the Wasserstein Distance and the Kullback–Leibler (KL) Divergence, which quantify how closely the distribution of the generated data matches that of the real data. Their definitions are provided as follows:

The Wasserstein Distance measures the minimum effort required to transform one probability distribution into another. For one-dimensional distributions $P$ and $Q$, the first-order Wasserstein Distance is defined as follows:

$$W(P, Q) = \inf_{\gamma \in \Gamma(P, Q)} \mathbb{E}_{(x, y) \sim \gamma} [\| x - y \|], \tag{6}$$

where $\Gamma(P, Q)$ denotes the set of all joint distributions $\gamma(x, y)$ with marginals $P$ and $Q$, while $\| x - y \|$ is typically the Euclidean distance.
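For two one-dimensional samples of equal size, the infimum in Equation (6) is attained by pairing the sorted values, which gives a very short empirical estimator. The function name and the sample values are illustrative assumptions.

```python
def wasserstein_1d(real, generated):
    """Empirical first-order Wasserstein distance for equal-size 1-D samples."""
    if len(real) != len(generated):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(real), sorted(generated))
    return sum(abs(a - b) for a, b in pairs) / len(real)


# Every generated value is shifted by 1 relative to the real sample.
w_shift = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
# The same values in a different order give a distance of zero.
w_same = wasserstein_1d([0.5, 0.1], [0.1, 0.5])
```

A smaller value means that less "mass" has to be moved to turn the generated distribution into the real one.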

The Kullback–Leibler (KL) Divergence, in its discrete form, measures the divergence between two probability distributions over a shared finite domain. It is defined as follows:

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}, \tag{7}$$

where $P(i)$ and $Q(i)$ represent the probabilities of the i-th bin or discrete event in the real and generated data distributions, respectively.
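One way to apply Equation (7) to sampled signals is to bin both samples on a shared grid and compare the resulting histograms. The bin edges, the small constant that guards against empty bins, and the sample values below are illustrative assumptions.

```python
import math


def histogram(samples, lo, hi, bins):
    """Relative frequencies of `samples` over equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in samples:
        k = int((s - lo) / width)
        k = max(0, min(k, bins - 1))  # clamp out-of-range values to the edge bins
        counts[k] += 1
    return [c / len(samples) for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """Discrete D_KL(P || Q) as in Equation (7), using the natural logarithm."""
    return sum(
        pi * math.log(pi / max(qi, eps))  # eps guards against empty bins in Q
        for pi, qi in zip(p, q)
        if pi > 0.0                       # terms with P(i) = 0 contribute 0
    )


# Hypothetical voltage samples from real and generated profiles.
real = [3.3, 3.5, 3.7, 3.9]
generated = [3.3, 3.4, 3.5, 3.9]
p = histogram(real, lo=3.2, hi=4.0, bins=2)
q = histogram(generated, lo=3.2, hi=4.0, bins=2)
kl_real_vs_generated = kl_divergence(p, q)
```

The divergence is zero only when the two histograms coincide, and it is not symmetric in its arguments, so the order (real first, generated second) matters.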

The performance of the neural network model was assessed using the delta curves between the true and predicted values, the mean absolute percentage error (MAPE), and the root mean squared error (RMSE). These quantities are defined as follows:

$$\Delta_{i} = \left| \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right| \times 100\%; \tag{8}$$

$$MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right| \times 100\%; \tag{9}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2}}, \tag{10}$$

where $y_{i}$ is the true SOH, $\hat{y}_{i}$ is the estimated SOH, and $N$ is the number of evaluated samples.
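Equations (8) to (10) translate directly into a few lines of code. The function names and the SOH values below are illustrative assumptions.

```python
import math


def delta_percent(y_true, y_pred):
    """Per-sample absolute relative error in percent, Equation (8)."""
    return [abs((yp - yt) / yt) * 100.0 for yt, yp in zip(y_true, y_pred)]


def mape_percent(y_true, y_pred):
    """Mean absolute percentage error, Equation (9)."""
    deltas = delta_percent(y_true, y_pred)
    return sum(deltas) / len(deltas)


def rmse(y_true, y_pred):
    """Root mean squared error, Equation (10)."""
    squared = [(yp - yt) ** 2 for yt, yp in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical true and estimated SOH values (as fractions of initial capacity).
soh_true = [1.00, 0.80, 0.50]
soh_pred = [0.90, 0.80, 0.55]
deltas = delta_percent(soh_true, soh_pred)
mape = mape_percent(soh_true, soh_pred)
error = rmse(soh_true, soh_pred)
```

MAPE is the mean of the delta values, so the delta curve shows where along the aging trajectory the errors that make up the MAPE occur.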

6. Experimentation and Analysis

6.1. GAN-Based Synthetic Data Generation and Evaluation

This study employed a GAN model to augment the original dataset, specifically adopting the DoppelGANger (DG) model. The implementation was based on Python 3.9.2 and PyTorch 2.5.0. The model training was performed on a computer equipped with an i5-13490F CPU (Intel, Santa Clara, CA, USA) and an Nvidia GeForce GTX 1660 SUPER GPU (Nvidia Corporation, Santa Clara, CA, USA). The hyperparameters of the DG model are shown in Table 4.

To assess the effectiveness of the synthetic data generated using the DG model, a comparison was made between a set of original data and its synthetic counterpart through visualization. The selected data included one set of voltage records and one set of current records, as shown in Figure 8 and Figure 9. The red curve on the left represents the original data, while the blue curve on the right corresponds to the synthetic data.

Figures 8 and 9 show that the synthetic data generated by the DG model closely resembled the original data. To further evaluate the similarity between the generated data and the original data, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were employed for visualization.

PCA is a linear dimensionality reduction technique that extracts the principal components of the data, mapping high-dimensional data to a lower-dimensional space while retaining as much of the variance as possible. As shown in Figure 10, the visualization results indicate that the distribution of the generated data closely aligned with the principal component distribution of the original data, demonstrating that the GAN-generated data captured the overall features of the original data effectively.

t-SNE, on the other hand, is a nonlinear dimensionality reduction method well-suited for handling high-dimensional data and revealing its local structures in a lower-dimensional space. After applying t-SNE, the visualizations revealed that the generated data exhibited a high degree of similarity to the original data in terms of local cluster structures. As illustrated in Figure 10, the t-SNE distribution of the synthetic data closely resembled that of the original data.

The visualization results confirmed that the GAN-generated data not only approximated the global distribution of the original data, but also achieved a high level of consistency in local structures.

To evaluate the quality of the synthetic data generated by DoppelGANger, we performed a quantitative analysis using the Wasserstein Distance and Kullback–Leibler (KL) Divergence, with their respective formulations detailed in Section 5.5. As shown in Table 5, both metrics yielded relatively small values for the voltage and current data, lower than typical values for sequential data, indicating that the distributions of the generated samples closely resembled those of the real data. This demonstrates that DoppelGANger was capable of producing high-fidelity synthetic data that preserved the essential statistical characteristics of the original dataset.

6.2. TCN-Based SOH Estimation Using Partial Discharge Profiles

In this study, the TCN model was employed to estimate the SOH using partial discharge profiles. The TCN implementation was based on Python 3.11.9 and PyTorch 2.2.2. Model training was conducted on a computer equipped with an i5-13490F CPU and an Nvidia GeForce GTX 1660 SUPER GPU. The hyperparameters used in the TCN model are listed in Table 6.

To better evaluate the proposed data augmentation method, only the partial discharge profiles from the lower-voltage section were selected for the comparative experiments. The control group used only the original dataset to train the TCN model, while the experimental group used both the original dataset and the synthetic data generated by the GAN. The two sets of experiments were labeled “Raw” and “Raw + Synthetic”. The delta, MAPE, and RMSE indicators defined in Section 5.5 were used for evaluation.

Figure 11 compares the MAPE values of two experimental groups. Among the seven test conditions, the MAPE values of the experimental group decreased in five conditions. The specific MAPE and RMSE values are listed in Table 7. The experimental results demonstrate that the data augmentation method proposed in this paper can effectively improve the accuracy of SOH estimation using low-SOC profiles.

This study evaluated the robustness of the TCN model output using the delta value. Figure 12 compares the delta values of the model outputs for the seven groups. The variation in the delta value of the experimental group was smaller than that of the control group, which indicates that the proposed model is more robust for partial SOC profiles.

Notably, in Groups 1 and 2, the model trained solely on raw data achieved better performance than the model augmented with synthetic data. This phenomenon can be attributed to two potential factors. First, the raw data in these groups exhibited relatively high quality and sufficient diversity, as the operating conditions were simpler and more consistent. Under such circumstances, the original dataset already provided effective coverage of the degradation patterns, and the inclusion of synthetic data may have introduced distributional variance without meaningful improvement. Second, although the DoppelGANger model is effective in most cases, it may not perfectly replicate certain fine-grained temporal dynamics inherent in the real data, especially when the raw data are already well-structured. Consequently, in data-rich scenarios, the performance gain from augmentation is limited, and augmentation may even lead to slight overfitting or learning drift.

Furthermore, to facilitate a clearer interpretation of the presented results, a comparative analysis between the present work and the existing studies is provided in Table 8. All studies included in the comparison employed the NASA battery dataset as the experimental benchmark, with the overall root mean square error (RMSE) used as the primary evaluation metric. However, most of the existing studies relied on full discharge profiles, while research on partial discharge profiles in low-SOC ranges remains limited. Accordingly, the comparison was structured in two parts.

For application scenarios constrained to low-SOC ranges, we compared the results of Use Case 3 with those reported in [22], which adopted a similar data partitioning strategy. Reference [22] achieved an RMSE of 0.131 using a TCN-based approach. In contrast, our method, which integrates GAN-based data augmentation with a TCN model, achieved an RMSE of 0.078, indicating better performance under data-scarce conditions.

For more general application scenarios, where training was not limited by the SOC range, we compared the results of Use Case 4, which utilized full-range SOC discharge profiles, with several representative approaches, including machine learning-based, neural network-based, and physics-informed models. Our proposed method achieved an RMSE of 0.0061, lower than that of every baseline model listed in Table 8. This result suggests that the model generalizes well across varying SOC conditions, including challenging and underexplored scenarios.
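
The size of the two gaps relative to the TCN baseline of [22] can be expressed as a relative RMSE reduction. The sketch below uses only the RMSE values reported in Table 8; the helper name `relative_reduction` is ours, and the calculation is simple arithmetic rather than a statistical test.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# RMSE values taken from Table 8.
use_case_3 = relative_reduction(baseline=0.131, proposed=0.078)   # low-SOC profiles
use_case_4 = relative_reduction(baseline=0.01, proposed=0.0061)   # full discharge profiles

print(f"Use Case 3: {use_case_3:.1f}% lower RMSE than [22]")
print(f"Use Case 4: {use_case_4:.1f}% lower RMSE than [22]")
```

Both reductions come out at roughly 40%, although the absolute errors differ by an order of magnitude between the low-SOC and full-range cases.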

7. Conclusions

In this study, a novel data-driven framework was proposed to estimate the SOH of lithium-ion batteries using partial discharge profiles. To address the challenge of limited data availability, particularly under low-SOC conditions, a data augmentation strategy based on the DoppelGANger network was employed. The quality of the generated synthetic data was validated using statistical metrics (Wasserstein Distance and Kullback–Leibler Divergence) and dimensionality reduction techniques (PCA and t-SNE), confirming its high similarity to real battery data.

To capture the temporal dependencies in discharge sequences, a TCN model was adopted. The proposed framework was evaluated using three performance metrics (RMSE, MAPE, and delta value) across seven battery aging scenarios. The experimental results showed that the GAN-augmented data improved the accuracy and robustness of the SOH estimation in most scenarios. Specifically, for low-SOC range profiles, the model achieved an overall RMSE of 0.078, compared with 0.131 for the TCN baseline in [22]. Under full SOC range conditions, the framework achieved an overall RMSE of 0.0061, lower than all of the compared approaches in Table 8. In addition, the MAPE values decreased in five of the seven test cases, and the delta curves exhibited reduced variance, indicating enhanced stability in the estimation output.

Despite these promising results, several limitations of the current work remain, including the following:

The framework only incorporated voltage and current, excluding other relevant indicators, such as temperature, impedance, or internal resistance.
The method was validated solely on the NASA dataset, and its generalizability to other battery types or operational environments has yet to be tested.

Future work will focus on addressing these limitations in the following ways:

Integrating multi-modal sensor data to further enhance the reliability of SOH estimation;
Verifying the transferability of the approach to different datasets and real-world usage scenarios;
Investigating more advanced temporal architectures, such as attention-based models and graph neural networks, to improve the modeling of battery degradation patterns.

Overall, the proposed method offers a practical and extensible solution for data-driven SOH estimation from partial discharge profiles, contributing to the development of intelligent battery management systems for electric vehicles.

Author Contributions

Conceptualization, Y.-H.L.; methodology, Y.-H.L.; software, H.Z.; validation, H.Z. and Y.-H.L.; formal analysis, H.Z.; investigation, H.Z.; resources, H.Z.; data curation, H.Z.; writing—original draft preparation, H.Z.; writing—review and editing, Y.-H.L.; visualization, H.Z.; supervision, Y.-H.L.; project administration, Y.-H.L.; funding acquisition, Y.-H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Xiamen University Tan Kah Kee College (JGH2024015), and the APC was funded by JGH2024015.

Data Availability Statement

The data presented in this study are available at the following link: Prognostics Center of Excellence Data Set Repository—NASA (https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/, accessed on 10 July 2024). Details on the processing and utilization of the raw data are elaborated in Section 5.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Gabbar, H.A.; Othman, A.M.; Abdussami, M.R. Review of Battery Management Systems (BMS) Development and Industrial Standards. Technologies 2021, 9, 28.
Chen, S.; Dai, F.; Cai, M. Opportunities and Challenges of High-Energy Lithium Metal Batteries for Electric Vehicle Applications. ACS Energy Lett. 2020, 5, 3140–3151.
Niu, H.; Zhang, N.; Lu, Y.; Zhang, Z.; Li, M.; Liu, J.; Zhang, N.; Song, W.; Zhao, Y.; Miao, Z. Strategies toward the Development of High-Energy-Density Lithium Batteries. J. Energy Storage 2024, 88, 111666.
Li, J.; Adewuyi, K.; Lotfi, N.; Landers, R.G.; Park, J. A Single Particle Model with Chemical/Mechanical Degradation Physics for Lithium Ion Battery State of Health (SOH) Estimation. Appl. Energy 2018, 212, 1178–1190.
Lin, M.; Yan, C.; Wang, W.; Dong, G.; Meng, J.; Wu, J. A Data-Driven Approach for Estimating State-of-Health of Lithium-Ion Batteries Considering Internal Resistance. Energy 2023, 277, 127675.
Wang, Y.; Tian, J.; Sun, Z.; Wang, L.; Xu, R.; Li, M.; Chen, Z. A Comprehensive Review of Battery Modeling and State Estimation Approaches for Advanced Battery Management Systems. Renew. Sustain. Energy Rev. 2020, 131, 110015.
Sun, X.; Zhang, Y.; Zhang, Y.; Wang, L.; Wang, K. Summary of Health-State Estimation of Lithium-Ion Batteries Based on Electrochemical Impedance Spectroscopy. Energies 2023, 16, 5682.
Andre, D.; Meiler, M.; Steiner, K.; Wimmer, C.; Soczka-Guth, T.; Sauer, D.U. Characterization of High-Power Lithium-Ion Batteries by Electrochemical Impedance Spectroscopy. I. Experimental Investigation. J. Power Sources 2011, 196, 5334–5341.
Li, C.; Yang, L.; Li, Q.; Zhang, Q.; Zhou, Z.; Meng, Y.; Zhao, X.; Wang, L.; Zhang, S.; Li, Y.; et al. SOH Estimation Method for Lithium-Ion Batteries Based on an Improved Equivalent Circuit Model via Electrochemical Impedance Spectroscopy. J. Energy Storage 2024, 86, 111167.
Demirci, O.; Taskin, S.; Schaltz, E.; Acar Demirci, B. Review of Battery State Estimation Methods for Electric Vehicles-Part II: SOH Estimation. J. Energy Storage 2024, 96, 112703.
Weng, C.; Cui, Y.; Sun, J.; Peng, H. On-Board State of Health Monitoring of Lithium-Ion Batteries Using Incremental Capacity Analysis with Support Vector Regression. J. Power Sources 2013, 235, 36–44.
Shu, X.; Li, G.; Zhang, Y.; Shen, J.; Chen, Z.; Liu, Y. Online Diagnosis of State of Health for Lithium-Ion Batteries Based on Short-Term Charging Profiles. J. Power Sources 2020, 471, 228478.
Lyu, C.; Lai, Q.; Ge, T.; Yu, H.; Wang, L.; Ma, N. A Lead-Acid Battery’s Remaining Useful Life Prediction by Using Electrochemical Model in the Particle Filtering Framework. Energy 2017, 120, 975–984.
Gou, B.; Xu, Y.; Feng, X. State-of-Health Estimation and Remaining-Useful-Life Prediction for Lithium-Ion Battery Using a Hybrid Data-Driven Method. IEEE Trans. Veh. Technol. 2020, 69, 10854–10867.
Zhang, A.; Lipton, Z.C.; Li, M.; Smola, A.J. Dive into Deep Learning; Cambridge University Press: Cambridge, UK, 2023.
Sun, S.; Sun, J.; Wang, Z.; Zhou, Z.; Cai, W. Prediction of Battery SOH by CNN-BiLSTM Network Fused with Attention Mechanism. Energies 2022, 15, 4428.
Petkovski, E.; Marri, I.; Cristaldi, L.; Faifer, M. State of Health Estimation Procedure for Lithium-Ion Batteries Using Partial Discharge Data and Support Vector Regression. Energies 2024, 17, 206.
Saxena, S.; Hendricks, C.; Pecht, M. Cycle Life Testing and Modeling of Graphite/LiCoO2 Cells under Different State of Charge Ranges. J. Power Sources 2016, 327, 394–400.
Zhao, C.; Andersen, P.B.; Træholt, C.; Hashemi, S. Data-Driven Battery Health Prognosis with Partial-Discharge Information. J. Energy Storage 2023, 65, 107151.
Yu, P.; Zhou, C.; Yu, Y.; Chang, Z.; Li, X.; Huang, K.; Yu, J.; Yan, K.; Jiang, X.; Su, Y. Improved PSO-TCN Model for SOH Estimation Based on Accelerated Aging Test for Large Capacity Energy Storage Batteries. J. Energy Storage 2025, 108, 115031.
Fan, C.; Sun, J.; Wang, H. SOH Estimation of Lithium-Ion Batteries Based on Raw Charging Data and TCN. In Proceedings of the 2024 4th International Conference on Energy, Power and Electrical Engineering (EPEE), Wuhan, China, 20–22 September 2024; pp. 699–702.
Bockrath, S.; Lorentz, V.; Pruckner, M. State of Health Estimation of Lithium-Ion Batteries with a Temporal Convolutional Neural Network Using Partial Load Profiles. Appl. Energy 2023, 329, 120307.
Li, J.; Liu, Y.; Li, Q. Generative Adversarial Network and Transfer-Learning-Based Fault Detection for Rotating Machinery with Imbalanced Data Condition. Meas. Sci. Technol. 2022, 33, 045103.
Bui, V.; Pham, T.L.; Nguyen, H.; Jang, Y.M. Data Augmentation Using Generative Adversarial Network for Automatic Machine Fault Detection Based on Vibration Signals. Appl. Sci. 2021, 11, 2166.
Shangguan, A.; Xie, G.; Fei, R.; Mu, L.; Hei, X. Train Wheel Degradation Generation and Prediction Based on the Time Series Generation Adversarial Network. Reliab. Eng. Syst. Saf. 2023, 229, 108816.
Seol, S.; Lee, J.; Yoon, J.; Kim, B. Improving SOH Estimation for Lithium-Ion Batteries Using TimeGAN. Mach. Learn. Sci. Technol. 2023, 4, 045007.
Bole, B.; Kulkarni, C.S.; Daigle, M. Adaptation of an Electrochemistry-Based Li-Ion Battery Model to Account for Deterioration Observed Under Randomized Use. Annu. Conf. PHM Soc. 2014, 6, 1–9.
Bole, B.; Kulkarni, C.S.; Daigle, M. “Randomized Battery Usage Data Set”. NASA Prognostics Data Repository. NASA Ames Research Center, Moffett Field, CA, USA, 2014. Available online: https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/ (accessed on 10 July 2024).
Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Curran Associates, Inc.: New York, NY, USA, 2014; Volume 27.
Esteban, C.; Hyland, S.L.; Rätsch, G. Real-Valued (Medical) Time Series Generation with Recurrent Conditional GANs. Available online: https://arxiv.org/abs/1706.02633v2 (accessed on 23 April 2025).
Yoon, J.; Jarrett, D.; van der Schaar, M. Time-Series Generative Adversarial Networks. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
Lin, Z.; Jain, A.; Wang, C.; Fanti, G.; Sekar, V. Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions. In Proceedings of the ACM Internet Measurement Conference, New York, NY, USA, 27 October 2020; Virtual Event USA; ACM: New York, NY, USA, 2020; pp. 464–483.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Shen, S.; Sadoughi, M.; Li, M.; Wang, Z.; Hu, C. Deep Convolutional Neural Networks with Ensemble Learning and Transfer Learning for Capacity Estimation of Lithium-Ion Batteries. Appl. Energy 2020, 260, 114296.
State-of-Health Estimation of Li-Ion Batteries in Electric Vehicle Using IndRNN under Variable Load Condition. Available online: https://www.mdpi.com/1996-1073/12/22/4338 (accessed on 23 April 2025).
A Computationally Efficient Approach for the State-of-Health Estimation of Lithium-Ion Batteries. Available online: https://www.mdpi.com/1996-1073/16/14/5414 (accessed on 23 April 2025).
Ye, J.; Xie, Q.; Lin, M.; Wu, J. A Method for Estimating the State of Health of Lithium-Ion Batteries Based on Physics-Informed Neural Network. Energy 2024, 294, 130828.

Figure 1. Voltage, current, and temperature data of Cell 1 in the first 10 cycles in Group 1.

Figure 2. Reference discharge profiles are segmented into different use cases based on the ranges defined in Table 3. Background colors represent different use cases, and the color gradients of the curves illustrate the evolution of profiles with increasing RW cycles.

Figure 3. The structure of the basic GAN model.

Figure 4. The overall architecture of the DG.

Figure 5. The overall architecture of a TCN.

Figure 6. Causal convolutional networks.

Figure 7. The overview of the methodology and procedure.

Figure 8. Original and synthetic voltage data.

Figure 9. Original and synthetic current data.

Figure 10. Visualization of PCA and t-SNE.

Figure 11. The experimental results of MAPE.

Figure 12. The experimental results of the delta values.

Table 1. Battery specifications for the NASA PCoE randomized battery usage dataset.

Battery Key Characteristics	Specifications
Manufacturer	LG Chem
Battery chemistry	Lithium cobalt oxide vs. graphite
Nominal capacity	2.1 Ah
Lower cut-off voltage	3.2 V
Upper threshold voltage	4.2 V

Table 2. NASA PCoE randomized battery usage dataset test conditions and groups.

Group (Cell IDs)	Test Conditions
Group 1 (RW1, RW2, RW7, RW8)	Randomized charging (0.5–3 h) to 4.2 V and discharging to 3.2 V, with currents between −0.5 A and −4 A. Reference tests every 50 cycles.
Group 2 (RW3–RW6)	Non-randomized charging to 4.2 V and discharging to 3.2 V, with randomized currents (−0.5 A to −4 A). Reference tests every 50 cycles.
Group 3 (RW9–RW12)	Charging and discharging with randomized current pulses (30 min–3 h). Discharging currents between −0.5 A and −4 A. Reference tests every 1500 cycles.
Group 4 (RW13–RW16)	Charging to 4.2 V and discharging to 3.2 V, with customized probability distribution (peak at 4 A). Load points are updated every minute. Tests at ~40 °C. Reference tests every 50 cycles.
Group 5 (RW17–RW20)	Same as Group 4, but the ambient temperature was not strictly controlled (lower than 40 °C). Reference tests every 50 cycles.
Group 6 (RW21–RW24)	Same as Group 4, but the probability distribution skewed toward lower currents (peak at 2 A). Tests at ~40 °C. Reference tests every 50 cycles.
Group 7 (RW25–RW28)	Same as Group 6, but the ambient temperature was not strictly controlled (lower than 40 °C). Reference tests every 50 cycles.

Table 3. SOC ranges and voltage ranges to fragment the partial discharge profiles.

Use Case	SOC Ranges	Voltage Ranges
1	100% to 66.7%	4.2 V to 3.7 V
2	66.7% to 33.3%	3.7 V to 3.5 V
3	33.3% to 0%	3.5 V to 3.2 V
4	100% to 0%	4.2 V to 3.2 V
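
The voltage ranges in Table 3 can be turned into a simple segmentation rule. The following is a minimal sketch, assuming each discharge profile is available as a list of voltage samples; the function name `segment_indices`, the inclusive treatment of boundary samples, and the sample profile are our assumptions and do not necessarily match the preprocessing used in this study.

```python
# Voltage windows (upper, lower) in volts for each use case, from Table 3.
USE_CASES = {
    1: (4.2, 3.7),
    2: (3.7, 3.5),
    3: (3.5, 3.2),
    4: (4.2, 3.2),
}

def segment_indices(voltages, use_case):
    """Return indices of samples whose voltage lies inside the use-case window.

    Boundaries are treated as inclusive (an assumption), so a sample at exactly
    3.7 V belongs to both Use Case 1 and Use Case 2.
    """
    upper, lower = USE_CASES[use_case]
    return [i for i, v in enumerate(voltages) if lower <= v <= upper]

# Hypothetical, coarsely sampled discharge profile for illustration only.
profile = [4.2, 4.0, 3.8, 3.7, 3.6, 3.5, 3.4, 3.3, 3.2]
low_soc = segment_indices(profile, use_case=3)
print([profile[i] for i in low_soc])
```

Returning indices rather than values allows the same selection to be applied to the current and time channels of the profile.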

Table 4. The hyperparameters of the DG model.

Hyperparameters	Values
max_sequence_len	700
sample_len	500
batch_size	1000
generator_learning_rate	1 × 10⁻⁴
discriminator_learning_rate	1 × 10⁻⁴
Epochs	5000

Table 5. Quantitative indicators of the GAN model.

Data Source	Wasserstein Distance (<0.05)	KL Divergence (<2)
voltage	0.0094	0.6817
current	0.0267	1.2137
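
As an illustration of the two indicators in Table 5, the sketch below estimates the 1-Wasserstein distance and the KL divergence between two one-dimensional samples using only the Python standard library. The bin count, the smoothing constant `eps`, and the sample data are assumptions; the exact normalization and binning behind the values in Table 5 are not reproduced here. The values in parentheses in the column headers of Table 5 are read here as acceptance thresholds.

```python
import math
import random

def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size empirical samples."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)

def kl_divergence(x, y, bins=20, eps=1e-9):
    """KL(P_x || P_y) estimated from histograms over a shared value range."""
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def hist(sample):
        counts = [0] * bins
        for v in sample:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [c / len(sample) + eps for c in counts]

    p, q = hist(x), hist(y)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical normalized samples standing in for real and synthetic voltage data.
random.seed(0)
real = [random.gauss(0.50, 0.10) for _ in range(1000)]
synthetic = [random.gauss(0.51, 0.10) for _ in range(1000)]

print(f"Wasserstein distance: {wasserstein_1d(real, synthetic):.4f}")
print(f"KL divergence: {kl_divergence(real, synthetic):.4f}")
```

For samples of unequal size, a library routine such as `scipy.stats.wasserstein_distance` would be a more general choice than the sorted-pairs shortcut used here.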

Table 6. The hyperparameters of the TCN model.

Hyperparameters	Values
input_size	1
output_size	1
kernel_size	3
dropout	0.33
dilation	[1, 2, 4, 8, 16, 32, 64]
learning_rate	0.001
epochs	5000
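
The kernel size and dilation list in Table 6 determine how many past samples the TCN output can see. The sketch below computes this receptive field for stacked causal dilated convolutions without striding. The number of convolutions per residual block is an assumption (two is the layout of the generic TCN residual block), so both cases are shown.

```python
def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Time steps (including the current one) visible to the last layer.

    Each causal convolution with dilation d extends the receptive field
    by (kernel_size - 1) * d steps.
    """
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)

kernel_size = 3                          # Table 6
dilations = [1, 2, 4, 8, 16, 32, 64]     # Table 6

print(receptive_field(kernel_size, dilations, convs_per_block=1))  # one conv per level
print(receptive_field(kernel_size, dilations, convs_per_block=2))  # two convs per block
```

With these settings the receptive field is 255 samples with one convolution per level and 509 samples with two, which can be compared against the length of the input discharge segments.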

Table 7. The experimental results of MAPE and RMSE.

Group	MAPE (Raw Data, TCN Only)	RMSE (Raw Data, TCN Only)	MAPE (Raw + Synthetic Data, DoppelGANger and TCN)	RMSE (Raw + Synthetic Data, DoppelGANger and TCN)
1	8.2710%	0.0749	11.9831%	0.1121
2	6.9889%	0.0624	8.8346%	0.0772
3	11.0467%	0.0745	10.0822%	0.0684
4	12.2142%	0.0936	7.1604%	0.0571
5	15.2889%	0.1065	9.8016%	0.0741
6	8.8731%	0.0825	3.2651%	0.0359
7	9.1879%	0.0819	3.6310%	0.0409
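
The statement that augmentation helped in five of the seven groups can be checked directly from Table 7. The sketch below uses the tabulated MAPE values; the variable names are ours.

```python
# MAPE values in percent, copied from Table 7 (Groups 1 to 7).
mape_raw = [8.2710, 6.9889, 11.0467, 12.2142, 15.2889, 8.8731, 9.1879]
mape_aug = [11.9831, 8.8346, 10.0822, 7.1604, 9.8016, 3.2651, 3.6310]

improved = []
for group, (raw, aug) in enumerate(zip(mape_raw, mape_aug), start=1):
    change = 100.0 * (aug - raw) / raw  # negative means lower error after augmentation
    if aug < raw:
        improved.append(group)
    print(f"Group {group}: MAPE {raw:.4f}% -> {aug:.4f}% ({change:+.1f}%)")

print("Groups improved by augmentation:", improved)
```

The groups with a lower MAPE after augmentation are Groups 3 to 7, while Groups 1 and 2 show a higher MAPE, consistent with the discussion in Section 6.2.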

Table 8. The comparison of the performance index.

References	Methods	Conditions	RMSE
[22]	TCN	Partial discharge profiles (Use Case 3)	0.131
Present work	DoppelGANger and TCN	Partial discharge profiles (Use Case 3)	0.078
[34]	DCNN	Full discharge profiles	0.015–0.037
[35]	IndRNN	Full discharge profiles	0.017–0.03
[36]	LightGBM-WQR	Full discharge profiles	0.0136–0.0286
[37]	Physics-informed Model	Full discharge profiles	0.008–0.015
[22]	TCN	Full discharge profiles (Use Case 4)	0.01
Present work	DoppelGANger and TCN	Full discharge profiles (Use Case 4)	0.0061

© 2025 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
