We combined the methods described above, covering the data, the session-based datasets, the train–validation–test splits, and the NN architectures, to perform SoC estimation for the deployed EV battery. There were a total of three experiments:
4.1. E1: SoC Estimation Using Different NN Architectures
This experiment benchmarks the architectures mentioned in Section 3.5. Comprehensive details regarding their functionality and parameters, as well as any required adjustments, are documented below:
As discussed above, the decoder part is not relevant to our problem, nor is the third limitation. To address these limitations, the Informer architecture introduces the ProbSparse self-attention mechanism, which only needs to calculate O(ln L_Q) dot products for each query–key lookup. Moreover, instead of fully connected layers, the Informer architecture contains 1D convolutional layers followed by max-pooling, reducing overall memory usage. The embeddings in the original implementation are a combination of value embedding, positional embedding, and temporal embedding. However, the temporal embedding may introduce redundant information in our use case, since different cars may have the same SoC at different points in time. Therefore, the model is tested with and without temporal embeddings.
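To make the compared embedding variants concrete, the following is a minimal PyTorch sketch of an Informer-style input embedding in which the temporal embedding can be switched off. The module, its parameters, and the sinusoidal positional encoding are illustrative assumptions on our part and only approximate the original Informer implementation.

```python
import math
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Sketch of an Informer-style embedding: value + positional (+ optional temporal)."""

    def __init__(self, n_signals, d_model, n_time_feats=4, use_temporal=True, max_len=2048):
        super().__init__()
        self.use_temporal = use_temporal
        # Value embedding: project the raw signals to the model dimension.
        self.value_emb = nn.Linear(n_signals, d_model)
        # Fixed sinusoidal positional embedding (d_model assumed even).
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        # Optional temporal embedding built from time features (e.g. hour of day, weekday).
        self.temporal_emb = nn.Linear(n_time_feats, d_model) if use_temporal else None

    def forward(self, x, x_time=None):
        # x: (batch, window_len, n_signals); x_time: (batch, window_len, n_time_feats)
        out = self.value_emb(x) + self.pe[: x.size(1)]
        if self.use_temporal and x_time is not None:
            out = out + self.temporal_emb(x_time)
        return out
```

Setting use_temporal=False corresponds to the variant without temporal embeddings that is tested above.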
The following hyperparameters were analyzed and tuned; the specifics can be found in Appendix B, and an illustrative sketch of a corresponding Optuna search space is given after the list:
- Dimension of the model—the size of the inputs after the embedding
- Number of attention heads
- Number of encoder layers
- Number of filters in the convolutional layers
- Type of attention used in the encoder; options: “prob” or “full”
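As an illustration of how this tuning can be set up, the sketch below defines a hypothetical Optuna search space over the listed hyperparameters; the value ranges are placeholders, and the ranges actually used are those documented in Appendix B.

```python
import optuna

def suggest_informer_hparams(trial: optuna.Trial) -> dict:
    """Hypothetical search space mirroring the tuned hyperparameters (ranges are placeholders)."""
    return {
        "d_model": trial.suggest_categorical("d_model", [64, 128, 256, 512]),       # dimension of the model
        "n_heads": trial.suggest_categorical("n_heads", [2, 4, 8]),                 # number of attention heads
        "e_layers": trial.suggest_int("e_layers", 1, 4),                            # number of encoder layers
        "conv_filters": trial.suggest_categorical("conv_filters", [64, 128, 256]),  # filters in the conv layers
        "attn": trial.suggest_categorical("attn", ["prob", "full"]),                # encoder attention type
    }
```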
The architectures listed above were chosen for the processing of the EV’s battery data and SoC estimation because they have been shown in existing works to be capable of processing time series data whilst yielding low prediction errors. As the signals from the BMS are also time series data, it follows that the architectures listed above should be able to utilize the signals and perform estimation of SoC for EV batteries. They are also architectures that have been widely established with many existing implementations; thus, their availability and reproducibility make them suitable for experimentation.
The aforementioned NNs were deployed with the following settings:
Dataset: The dataset consists of 269 sessions (265 single-labeled and 4 multi-labeled) recorded for the car model e-Golf. Window segments of 20 min were used as input with a time step interpolation of 1 s and a 30 s stride (a minimal windowing sketch is given after the list of signals below). The signals used for the NNs were as follows:
CURRENT
VOLTAGE
VOLTAGE_DIFF
T_CELL_AVG
T_DIFF
CUMULATIVE_DE
CUM_CE_DIFF
These were selected based on expert opinions and existing knowledge regarding SoC.
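A minimal sketch of the window segmentation described above (20 min windows at a 1 s time step, i.e., 1200 samples, advanced with a 30 s stride); the array layout and the convention of labelling each window with its final SoC value are assumptions.

```python
import numpy as np

def make_windows(session: np.ndarray, soc: np.ndarray,
                 window_len: int = 1200, stride: int = 30):
    """Slice one interpolated session (1 s time step) into overlapping window segments.

    session: (n_steps, n_signals) array of the selected signals,
    soc:     (n_steps,) SoC labels; each window is labelled with its final SoC value (assumed).
    """
    X, y = [], []
    for start in range(0, len(session) - window_len + 1, stride):
        end = start + window_len
        X.append(session[start:end])
        y.append(soc[end - 1])
    if not X:  # session shorter than one window
        return np.empty((0, window_len, session.shape[1])), np.empty(0)
    return np.stack(X), np.array(y)
```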
Preprocessing: Outlier removal and interpolation were applied to the dataset. Using MinMaxScaler, all data, including the labels, were normalized to the range of 0 to 1.
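A minimal sketch of this normalization step, assuming scikit-learn's MinMaxScaler and window arrays shaped (n_windows, window_len, n_signals); fitting the scalers on the training split only is our assumption rather than a detail stated above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def normalise(X_train, y_train, X_eval, y_eval):
    """Scale signals and SoC labels to [0, 1]; scalers are fitted on the training split only (assumed)."""
    n_sig = X_train.shape[-1]
    x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
    # Fit per signal column, then restore the original window shape.
    X_train_s = x_scaler.fit_transform(X_train.reshape(-1, n_sig)).reshape(X_train.shape)
    X_eval_s = x_scaler.transform(X_eval.reshape(-1, n_sig)).reshape(X_eval.shape)
    # Labels are scaled with their own scaler so estimates can be mapped back to SoC.
    y_train_s = y_scaler.fit_transform(np.asarray(y_train).reshape(-1, 1)).ravel()
    y_eval_s = y_scaler.transform(np.asarray(y_eval).reshape(-1, 1)).ravel()
    return X_train_s, y_train_s, X_eval_s, y_eval_s, x_scaler, y_scaler
```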
Train, validation, test split: Stratified session splitting was described in Section 3.4. There were a total of 269 sessions; the train set had 188 sessions, the validation set had 40 sessions, and the test set had 41 sessions.
Training procedures: Training involved an initial phase of hyperparameter tuning using Optuna [51] with the Tree-structured Parzen Estimator (TPE) sampler, while the experiments were recorded with Neptune.ai [52] as the experiment tracking tool. No oversampling or undersampling techniques were employed for training.
The models were trained for 1500 epochs with a batch size of 32 and an initial learning rate of 0.0028408, which was adjusted by a cosine scheduler. The Adam optimizer was used, and early stopping terminated training if the loss had not improved for 300 epochs. This was applied to prevent overfitting.
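The training procedure can be sketched roughly as follows (PyTorch), assuming an MSE objective; the patience-based early stopping mirrors the 300-epoch criterion above, while data loading and device handling are simplified.

```python
import copy
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, train_loader, val_loader, epochs=1500, lr=0.0028408, patience=300):
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = CosineAnnealingLR(optimiser, T_max=epochs)   # cosine learning-rate schedule
    loss_fn = nn.MSELoss()
    best_val, best_state, stale = float("inf"), None, 0

    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimiser.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), y)
            loss.backward()
            optimiser.step()
        scheduler.step()

        # Validation loss drives early stopping (no improvement for `patience` epochs).
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x).squeeze(-1), y).item() for x, y in val_loader) / len(val_loader)
        if val < best_val:
            best_val, best_state, stale = val, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break

    model.load_state_dict(best_state)
    return model
```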
Once the set of hyperparameters and the ideal number of epochs were determined by Optuna, the models were retrained, and their SoC estimations were evaluated based on the metrics in Section 3.6. As previously mentioned, an extensive list of the hyperparameters is available in Appendix B.
Results: Utilizing the stratified session split method, we trained the different architectures listed for SoC estimation. The results of each architecture with the metrics presented in Section 3.6 are available in Table 4. The results of the metrics inform us that most of the model architectures performed well across the train, validation, and test sets. In particular, LSTM, GRU, and CNN-LSTM achieved MSE values lower than 5 and MAE values lower than 20 in terms of SoC estimation. The poorer performances (MSE values higher than 1 and MAE values higher than 30 in terms of SoC estimation) of ResNet, ResNet-LSTM, and InceptionTime suggest that architectures saturated with residual blocks and convolution mechanisms are a poor fit for the signals of EV batteries for SoC estimation. The SoC estimations from CNN-LSTM and GRU can also be confirmed visually in Figure 4a and Figure 5a. Extensive details are given in Figure A1a, Figure A2a, Figure A3a, Figure A4a and Figure A5a, which are available in Appendix A. Additionally, an extended version of Table 4 can be found in Appendix A’s Table A1.
During the experiment, it was found that the signal CUMULATIVE_DE is prone to misleading the models, preventing them from learning to estimate SoC accurately. Because of this, extra care was taken with CUMULATIVE_DE to minimize this effect. This is further expanded upon in Section 4.3.
4.2. E2: SoC Estimation Using Fusion Hybrid Model
In the domain of multivariate time series regression, the fusion of static and time series features has emerged as a critical avenue for enhancing predictive models [42]. As mentioned previously in Section 3.2, the signals within an EV battery are features of a time series. More specifically, the signals can be referred to as the dynamic features, whilst the static features would be further information extracted from the signals. An example of a static feature would be the average of a signal. Such features can be obtained during the preprocessing of time series signals [53].
The concept behind the fusion [54] of features is to make maximum use of the information available in sequences. This requires a hybrid model that can combine temporal patterns with sequential dependencies and the contextual information of other external features. In the work of Li et al. [47], the success of fusing static and time features via NN attention mechanisms suggests that the same can be transferred to SoC estimation of EV batteries.
The Fusion Hybrid Model approach seeks to connect the unchanging characteristics of the data with their temporal evolution, aiming to provide a comprehensive understanding of the underlying relationships with contextual information. In the case of EV battery modeling, dynamic features will be signals such as those in Table 3, and static features will be information extracted from signals, such as the following (an illustrative extraction sketch is given after the list):
Average current in each window (CURRENT_AVG_WINDOW)
Average voltage in each window (VOLTAGE_AVG_WINDOW)
Difference in voltage in each window (VOLTAGE_DIFF_WINDOW)
Average cumulative discharge energy in each window (CUMULATIVE_DE_AVG)
Average cumulative charge energy in each window (CUMULATIVE_CE_AVG)
Average temperature for each window (T_AVG_WINDOW)
Difference in SoC in each window (SOC_REAL_DIFF)
Average SoC in each window (SOC_REAL_AVG)
Difference between the max voltage and min voltage in each window (VOLTAGE_MAXMIN_DIFF)
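A sketch of how such per-window static features might be computed with pandas; the column names follow the list above, but the exact formulas (e.g., last-minus-first differences, using T_CELL_AVG for the window temperature) are our assumptions.

```python
import pandas as pd

def static_features(window: pd.DataFrame) -> dict:
    """Compute per-window summary (static) features from one 20 min window of signals.

    `window` is assumed to hold one column per signal (CURRENT, VOLTAGE, ...).
    """
    return {
        "CURRENT_AVG_WINDOW": window["CURRENT"].mean(),
        "VOLTAGE_AVG_WINDOW": window["VOLTAGE"].mean(),
        "VOLTAGE_DIFF_WINDOW": window["VOLTAGE"].iloc[-1] - window["VOLTAGE"].iloc[0],
        "CUMULATIVE_DE_AVG": window["CUMULATIVE_DE"].mean(),
        "CUMULATIVE_CE_AVG": window["CUMULATIVE_CE"].mean(),
        "T_AVG_WINDOW": window["T_CELL_AVG"].mean(),
        "SOC_REAL_DIFF": window["SOC_REAL"].iloc[-1] - window["SOC_REAL"].iloc[0],
        "SOC_REAL_AVG": window["SOC_REAL"].mean(),
        "VOLTAGE_MAXMIN_DIFF": window["VOLTAGE"].max() - window["VOLTAGE"].min(),
    }
```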
The features were fused with a concatenation-based approach, characterized by the straightforward merging of static and dynamic features into an extended feature space [48]. By directly combining these attributes, the model gained immediate access to both the stable attributes and the evolving temporal patterns. However, this requires managing potentially high-dimensional feature spaces and careful consideration of normalization techniques to maintain balanced contributions from each feature.
The following settings were used in the experiment.
Dataset: The dataset consists of 269 sessions (265 single-labeled and 4 multi-labeled) recorded for the car model e-Golf. Window segments of 20 min were used as input, with a time step interpolation of 1 s and a 20 min stride.
Features: As this is a hybrid model, there are two sets of features: the dynamic and static features. The dynamic features for SoC include the following:
CURRENT
VOLTAGE
VOLTAGE_DIFF
T_CELL_AVG
T_DIFF
DIFF_CAP
DELTA_RESISTANCE
The static features consisted of those listed above. The features were picked based on expert opinions regarding SoC and the outcome from Section 4.1.
Preprocessing: The dataset was subjected to outlier removal and interpolation. Using MinMaxScaler, all data, including the labels, were normalized to the range of 0 to 1. The process was the same as that described in Section 4.1.
Train, validation, test split: Stratified session splitting was performed with a total of 269 sessions; the train set had 188 sessions, the validation set had 40 sessions, and the test set had 41 sessions. This is the same setting as in Section 4.1.
Model: The Fusion Hybrid Model has two CNN layers with 256 neurons each and LSTM layers with 66 neurons per layer. It has a feed-forward NN with three layers to capture the static features. The output of the model is produced by a combined linear layer, which takes the individual outputs of the dynamic and static parts of the model. Dropout was used to prevent overfitting.
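A minimal PyTorch sketch of this concatenation-based Fusion Hybrid Model; the layer sizes follow the description above (two CNN layers with 256 units, LSTM layers with 66 units, a three-layer feed-forward branch for the static features), while kernel sizes, the hidden widths of the static branch, and the dropout rate are assumptions.

```python
import torch
from torch import nn

class FusionHybridModel(nn.Module):
    def __init__(self, n_dynamic, n_static, dropout=0.2):
        super().__init__()
        # Dynamic branch: CNN feature extraction followed by LSTM layers.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_dynamic, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(256, 66, num_layers=2, batch_first=True)
        # Static branch: three-layer feed-forward network for the per-window features.
        self.ffn = nn.Sequential(
            nn.Linear(n_static, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
        )
        self.dropout = nn.Dropout(dropout)
        # Combined linear layer over the concatenated outputs of both branches.
        self.head = nn.Linear(66 + 16, 1)

    def forward(self, x_dyn, x_stat):
        # x_dyn: (batch, window_len, n_dynamic); x_stat: (batch, n_static)
        h = self.cnn(x_dyn.transpose(1, 2)).transpose(1, 2)   # back to (batch, time, 256)
        _, (h_n, _) = self.lstm(h)
        dyn_out = h_n[-1]                                      # final hidden state of the last LSTM layer
        stat_out = self.ffn(x_stat)
        fused = self.dropout(torch.cat([dyn_out, stat_out], dim=1))
        return self.head(fused).squeeze(-1)
```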
Training procedure: Training involved an initial phase of hyperparameter tuning using Optuna [51] with the Tree-structured Parzen Estimator (TPE) sampler, while the experiments were recorded with Neptune.ai [52] as the experiment tracking tool.
The Fusion Hybrid Model was trained for 1500 epochs with a batch size of 32 and an initial learning rate of 0.0028408, which was adjusted by a cosine scheduler. The Adam optimizer was used, and early stopping terminated training if the loss had not improved for 300 epochs. This was done to prevent overfitting.
The resulting SoC estimation from the Fusion Hybrid Model was evaluated based on the metrics in Section 3.6.
Results: Utilizing the stratified session split method with the Fusion Hybrid Model, the SoC estimation results remained excellent across the train, validation, and test sets when evaluated using the evaluation metrics. In terms of SoC estimation, the Fusion Hybrid Model was able to maintain an MSE lower than 5 and an MAE lower than 20. This is similar to the results of the LSTM and the GRU, with only a difference of up to 0.5 in MSE and a difference of up to 0.6 in MAE. The results are available in Table 4, and the SoC estimations can also be visually confirmed in Figure 6. Additionally, an extended version of Table 4 can be found in Appendix A’s Table A1.
4.3. E3: SoC Estimation Guided by xAI and Importance Estimates
As mentioned in Section 2, there are methods that can inform the user about the importance estimation and significance of features present within the data. This is useful, as it provides information regarding which features are necessary and which are redundant when estimating SoC. Within this work, we used xAI and importance estimates to guide our modeling of NNs for EV SoC estimation. We utilized a local xAI method named InputXGradient [27] and a global importance estimate method named Pairwise Importance Estimate Extension (PIEE) [26]. We compared and verified the results from both methods to further understand the relative significance of the signals as well as the significance of their window size as input. As mentioned before, this is not commonly done in existing work on SoC estimation, as evidenced in Table 1. Additionally, studies that compare the explanations of xAI methods and importance estimates are even scarcer.
xAI—InputXGradient: This is an xAI method used to understand the influence of input features on a model’s prediction. It takes the gradient of the output with respect to the input features, indicating how sensitive the prediction is to changes in each input feature. The gradient of each input feature is then multiplied by its respective input value to highlight which features have the highest influence on the model’s prediction. InputXGradient was chosen because it is an intuitive local explainability approach, and its implementation can be readily found in Captum [55]. As the aim of this work is not a thorough walkthrough of the workings of xAI methods, further information on InputXGradient can be found in Shrikumar et al.’s work [27].
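A minimal sketch of obtaining InputXGradient attributions with Captum for a trained SoC regressor; aggregating by the mean absolute attribution over windows and time steps is our choice and may differ from the exact aggregation used in this work.

```python
import torch
from captum.attr import InputXGradient

def signal_importance(model, windows):
    """Return the mean |input x gradient| attribution per signal.

    windows: tensor of shape (n_windows, window_len, n_signals);
    the model is assumed to return one SoC value per window.
    """
    model.eval()
    explainer = InputXGradient(model)
    windows = windows.clone().requires_grad_(True)
    attributions = explainer.attribute(windows)      # same shape as `windows`
    # Average the magnitude over windows and time steps -> one score per signal.
    return attributions.abs().mean(dim=(0, 1))
```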
Importance estimate—PIEE: This method utilizes an embedded pairwise layer to extract information for each feature from the input. This information is captured in the form of a profile, which can be pairwise-weight focused or pairwise-gradient focused. These profiles can be combined with statistical analysis to reach a relative estimate of feature importance. PIEE was chosen because of its stability and ease of implementation. It is a global importance estimate approach, as opposed to InputXGradient, which is a local explainability approach. Again, as the aim of this work is not a thorough walkthrough of the workings of xAI methods, further information on PIEE can be found in Chan and Veas’s work [26], and details of its implementation can be found on GitHub [56].
The experiment followed the setup below:
Dataset: The dataset consisted of 269 sessions (265 single-labeled and 4 multi-labeled) recorded for the e-Golf. Window segments of 20 min were used as input, with a time step interpolation of 1 s and a 30 s stride. This was the same as in Section 4.1. The signals used for SoC estimation varied depending on the investigation of feature importance.
Preprocessing: Outlier removal and interpolation were applied to the dataset. Using MinMaxScaler, all the data, including the labels, were normalized to the range of 0 to 1. This was also the same as in Section 4.1.
Train, validation, test split: Stratified session splitting was applied. There were a total of 269 sessions; the train set had 188 sessions, the validation set had 40 sessions, and the test set had 41 sessions. The same setting was used as in Section 4.1.
Models: The same models from Section 4.1 were used for feature importance analysis.
Training procedures: Training followed the procedure from Section 4.1. The settings of this experiment were kept as close as possible to those of Section 4.1, as we needed a baseline setting in order to evaluate the effects of the xAI and feature importance estimate methods.
Results—Over-reliance: A thorough evaluation was conducted for the following features, which were taken from Table 3:
CURRENT
VOLTAGE
VOLTAGE_CELL_MIN
VOLTAGE_CELL_MAX
VOLTAGE_DIFF
T_CELL_AVG
T_CELL_MIN
T_DIFF
SOC_REAL
MILEAGE
CUMULATIVE_DE
CUMULATIVE_CE
CUM_CE_DIFF
The list of features was determined via expert opinions and established existing works [7,10,18] regarding SoC. It served as a comprehensive list to measure the relative importance between the signals.
InputXGradient’s averaged results of relative importance between the signals from the different NNs are presented in Figure 7. The figure indicates that the models over-rely on CUMULATIVE_DE for SoC estimation. Further investigation revealed that the signal was moderately correlated with SoC, where a Pearson correlation coefficient of
was measured. This suggests that the models were learning the correlation instead of utilizing the signals for their intended purpose of estimating SoC.
This information was not obvious from the dataset and would not have been discovered without the use of xAI. Afterwards, we experimented with different subsets of features as inputs and repeated the procedure to learn the contributions of the features. It was found that the models tended to over-rely on MILEAGE and CUMULATIVE_DE. Therefore, necessary adjustments were made, such as discarding some sessions.
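The correlation check that exposed this over-reliance amounts to computing a Pearson coefficient between the signal and the SoC label, for instance as in the following sketch (NumPy; the exact arrays compared are assumptions):

```python
import numpy as np

def pearson(signal: np.ndarray, soc: np.ndarray) -> float:
    """Pearson correlation between a signal (e.g. CUMULATIVE_DE) and the aligned SoC label."""
    return float(np.corrcoef(signal, soc)[0, 1])
```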
Results—Signals: After the investigation of correlation reliance, we made necessary adjustments to the signals and established a condensed set of features for measuring relative importance based on expert opinions. These were as follows:
CURRENT
VOLTAGE
VOLTAGE_DIFF
T_CELL_AVG
T_DIFF
CUMULATIVE_DE
CUM_CE_DIFF
This condensed set of signals was used to retrain the different NNs, and xAI’s InputXGradient was applied again. The results, demonstrated in Figure 8, showed that CURRENT and VOLTAGE were both considered to have a higher relative importance within this condensed set of signals. This outcome concurs with expert opinions and the basis of SoC.
We additionally verified the relative importance of the features by retraining the NNs with only CURRENT and VOLTAGE to compare the results of SoC estimation. This is shown in Table 5. We focused on CNN-LSTM and GRU since the xAI results were obtained from these two corresponding models. The visuals of CNN-LSTM and GRU are available in Figure 4b and Figure 5b, respectively. We only focus on the results from the test set in order to be representative of applying the changes to a real application, where it would be impossible to perform any further tuning. Based on the differences in the respective metrics between the architectures, a difference of up to 3 in MSE and approximately 3 in MAE, we can conclude that there were minimal changes in SoC estimation performance. This shows that the NNs trained with only these two features exhibit comparable performance to NNs trained with the condensed set of features. Interestingly, the SoC estimations of ResNet and InceptionTime improved when using only CURRENT and VOLTAGE; this is supported by Figure A3 and Figure A5, respectively, in Appendix A. This further reinforces the importance of the two features within the signals. An extended version of Table 5 can be found in Appendix A’s Table A2.
Results—Time Steps of Input Window: The previous outcome regarding the signals’ relative importance revealed that VOLTAGE and CURRENT are particularly important for SoC estimation. It can also be observed in Figure 8 that not all time steps within the input window were considered equally important.
Therefore, we also investigated the importance of time steps within the multivariate time series context. Specifically, we wanted to know if we could make an informed decision regarding the window size for our input. Here, we used the importance estimate method PIEE [26], which is more suitable when considering the time steps of a multivariate time series context because it can examine each time point from a global standpoint. We applied PIEE to the NNs trained with CURRENT and VOLTAGE. The results of the importance estimates are shown in Figure 9, which reinforce the previous observation regarding time steps within the input window.
Utilizing the information about the time steps, we retrained CNN-LSTM and GRU with reduced window sizes, which were informed by their corresponding results of PIEE. The results of the retrained NNs with reduced window sizes are available in Table 6, and the visuals of CNN-LSTM and GRU are available in Figure 4c and Figure 5c, respectively. Regarding Table 6, again, we are only concerned with the test sets in order to be representative of the challenges of a real application. Here, we note that there are SoC estimation results with a minimal difference compared to their counterparts from Table 5 (up to 0.1 difference in MSE, up to 0.3 difference in MAE); however, there are also results with a noticeable difference compared to their counterparts (5 difference in MSE and 5 difference in MAE). This is due to the open interpretation of PIEE’s estimate. From Figure 9, we can observe that the method does not produce a definitive value for the smallest effective window size; instead, it produces an estimate of importance for each time step, which can be combined to form a heatmap of importance. Using the heatmap of importance, it becomes possible to make informed choices regarding the approximate effective window size. This is demonstrated by the results of CNN-LSTM with window size 53, shown in Table 6, which is based on Figure 9a. This can also be visually compared in Figure 4c. However, the effective window size is not necessarily always clear, as is evident in the example of Figure 9b, which can lead to a result such as GRU with a window size of 45 from Table 6 and Figure 5c. In such cases, a systematic reduction evaluation approach can be used to reach a conclusion. This is also demonstrated in Table 6. An extended version of the table can be found in Appendix A’s Table A3. The key findings from the results suggest that the window size can be reduced by 9–25% while still retaining performance.
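One way to turn a per-time-step importance profile, such as those visualized in Figure 9, into a candidate reduced window size is sketched below; the cumulative-share threshold is an illustrative assumption on our part and not a rule prescribed by PIEE.

```python
import numpy as np

def effective_window(importance: np.ndarray, keep: float = 0.95) -> int:
    """Smallest suffix of the window (most recent steps) that retains `keep` of the total importance.

    importance: (window_len,) non-negative per-time-step importance estimates,
    ordered from the oldest to the most recent time step (assumed).
    """
    scores = np.asarray(importance, dtype=float)
    tail = np.cumsum(scores[::-1]) / scores.sum()   # cumulative share, newest step first
    return int(np.searchsorted(tail, keep) + 1)
```

A candidate size obtained this way would still need to be validated by retraining, as was done for the window sizes 53 and 45 above.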