We combined the methods described above, covering the data, the session-based datasets, the train–validation–test splits, and the NN architectures, to perform SoC estimation for the deployed EV battery. There were a total of three experiments:
4.1. E1: SoC Estimation Using Different NN Architectures
This experiment benchmarks the architectures mentioned in Section 3.5. Comprehensive details regarding their functionality and parameters, as well as any required adjustments, are documented below:
As discussed above, the decoder part is not relevant to our problem, nor is the third limitation. To address these limitations, the Informer architecture introduces the ProbSparse self-attention mechanism, which only needs to calculate O(ln L_Q) dot products for each query–key lookup. Moreover, instead of fully connected layers, the Informer architecture contains 1D convolutional layers followed by max-pooling, reducing overall memory usage. The embeddings in the original implementation are a combination of value embedding, positional embedding, and temporal embedding. However, the temporal embedding may introduce redundant information in our use case, since different cars may have the same SoC at different points in time. Therefore, the model is tested with and without temporal embeddings.
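To make the compared embedding variants concrete, the following is a minimal PyTorch sketch of an Informer-style input embedding in which the temporal embedding can be switched off. The module, its parameters, and the sinusoidal positional encoding are illustrative assumptions on our part and only approximate the original Informer implementation.

```python
import math
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Sketch of an Informer-style embedding: value + positional (+ optional temporal)."""

    def __init__(self, n_signals, d_model, n_time_feats=4, use_temporal=True, max_len=2048):
        super().__init__()
        self.use_temporal = use_temporal
        # Value embedding: project the raw signals to the model dimension.
        self.value_emb = nn.Linear(n_signals, d_model)
        # Fixed sinusoidal positional embedding (d_model assumed even).
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        # Optional temporal embedding built from time features (e.g. hour of day, weekday).
        self.temporal_emb = nn.Linear(n_time_feats, d_model) if use_temporal else None

    def forward(self, x, x_time=None):
        # x: (batch, window_len, n_signals); x_time: (batch, window_len, n_time_feats)
        out = self.value_emb(x) + self.pe[: x.size(1)]
        if self.use_temporal and x_time is not None:
            out = out + self.temporal_emb(x_time)
        return out
```

Setting use_temporal=False corresponds to the variant without temporal embeddings that is tested above.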
The following hyperparameters were analyzed and tuned; the specifics can be found in Appendix B, and an illustrative sketch of a corresponding Optuna search space is given after the list:
- Dimension of the model—the size of the inputs after the embedding
- Number of attention heads
- Number of encoder layers
- Number of filters in the convolutional layers
- Type of attention used in the encoder; options: “prob” or “full”
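As an illustration of how this tuning can be set up, the sketch below defines a hypothetical Optuna search space over the listed hyperparameters; the value ranges are placeholders, and the ranges actually used are those documented in Appendix B.

```python
import optuna

def suggest_informer_hparams(trial: optuna.Trial) -> dict:
    """Hypothetical search space mirroring the tuned hyperparameters (ranges are placeholders)."""
    return {
        "d_model": trial.suggest_categorical("d_model", [64, 128, 256, 512]),       # dimension of the model
        "n_heads": trial.suggest_categorical("n_heads", [2, 4, 8]),                 # number of attention heads
        "e_layers": trial.suggest_int("e_layers", 1, 4),                            # number of encoder layers
        "conv_filters": trial.suggest_categorical("conv_filters", [64, 128, 256]),  # filters in the conv layers
        "attn": trial.suggest_categorical("attn", ["prob", "full"]),                # encoder attention type
    }
```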
The architectures listed above were chosen for the processing of the EV’s battery data and SoC estimation because they have been shown in existing works to be capable of processing time series data whilst yielding low prediction errors. As the signals from the BMS are also time series data, it follows that the architectures listed above should be able to utilize the signals and perform estimation of SoC for EV batteries. They are also architectures that have been widely established with many existing implementations; thus, their availability and reproducibility make them suitable for experimentation.
The aforementioned NNs were deployed with the following settings:
Dataset: The dataset consists of 269 sessions (265 single-labeled and 4 multi-labeled) recorded for the car model e-Golf. Window segments of 20 min were used as input with a time step interpolation of 1 s and a 30 s stride (a minimal windowing sketch is given after the list of signals below). The signals used for the NNs were as follows:
CURRENT
VOLTAGE
VOLTAGE_DIFF
T_CELL_AVG
T_DIFF
CUMULATIVE_DE
CUM_CE_DIFF
These were selected based on expert opinions and existing knowledge regarding SoC.
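A minimal sketch of the window segmentation described above (20 min windows at a 1 s time step, i.e., 1200 samples, advanced with a 30 s stride); the array layout and the convention of labelling each window with its final SoC value are assumptions.

```python
import numpy as np

def make_windows(session: np.ndarray, soc: np.ndarray,
                 window_len: int = 1200, stride: int = 30):
    """Slice one interpolated session (1 s time step) into overlapping window segments.

    session: (n_steps, n_signals) array of the selected signals,
    soc:     (n_steps,) SoC labels; each window is labelled with its final SoC value (assumed).
    """
    X, y = [], []
    for start in range(0, len(session) - window_len + 1, stride):
        end = start + window_len
        X.append(session[start:end])
        y.append(soc[end - 1])
    if not X:  # session shorter than one window
        return np.empty((0, window_len, session.shape[1])), np.empty(0)
    return np.stack(X), np.array(y)
```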
Preprocessing: Outlier removal and interpolation were applied to the dataset. Using MinMaxScaler, all data, including the labels, were normalized to the range of 0 to 1.
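A minimal sketch of this normalization step, assuming scikit-learn's MinMaxScaler and window arrays shaped (n_windows, window_len, n_signals); fitting the scalers on the training split only is our assumption rather than a detail stated above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def normalise(X_train, y_train, X_eval, y_eval):
    """Scale signals and SoC labels to [0, 1]; scalers are fitted on the training split only (assumed)."""
    n_sig = X_train.shape[-1]
    x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
    # Fit per signal column, then restore the original window shape.
    X_train_s = x_scaler.fit_transform(X_train.reshape(-1, n_sig)).reshape(X_train.shape)
    X_eval_s = x_scaler.transform(X_eval.reshape(-1, n_sig)).reshape(X_eval.shape)
    # Labels are scaled with their own scaler so estimates can be mapped back to SoC.
    y_train_s = y_scaler.fit_transform(np.asarray(y_train).reshape(-1, 1)).ravel()
    y_eval_s = y_scaler.transform(np.asarray(y_eval).reshape(-1, 1)).ravel()
    return X_train_s, y_train_s, X_eval_s, y_eval_s, x_scaler, y_scaler
```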
Train, validation, test split: Stratified session splitting was described in Section 3.4. There were a total of 269 sessions; the train set had 188 sessions, the validation set had 40 sessions, and the test set had 41 sessions.
Training procedures: Training involved an initial phase of hyperparameter tuning using Optuna [51] with the Tree-structured Parzen Estimator (TPE) sampler, while the experiments were recorded with Neptune.ai [52] as the experiment tracking tool. No oversampling or undersampling techniques were employed for training.
The models were trained for 1500 epochs with a batch size of 32 and an initial learning rate of 0.0028408, which was adjusted by a cosine scheduler. The Adam optimizer was used, and early stopping terminated training if the loss had not improved for 300 epochs. This was applied to prevent overfitting.
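The training procedure can be sketched roughly as follows (PyTorch), assuming an MSE objective; the patience-based early stopping mirrors the 300-epoch criterion above, while data loading and device handling are simplified.

```python
import copy
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, train_loader, val_loader, epochs=1500, lr=0.0028408, patience=300):
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = CosineAnnealingLR(optimiser, T_max=epochs)   # cosine learning-rate schedule
    loss_fn = nn.MSELoss()
    best_val, best_state, stale = float("inf"), None, 0

    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimiser.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), y)
            loss.backward()
            optimiser.step()
        scheduler.step()

        # Validation loss drives early stopping (no improvement for `patience` epochs).
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x).squeeze(-1), y).item() for x, y in val_loader) / len(val_loader)
        if val < best_val:
            best_val, best_state, stale = val, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break

    model.load_state_dict(best_state)
    return model
```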
Once the set of hyperparameters and the ideal number of epochs were determined by Optuna, the models were retrained, and their SoC estimations were evaluated based on the metrics in Section 3.6. As previously mentioned, an extensive list of the hyperparameters is available in Appendix B.
Results: Utilizing the stratified session split method, we trained the different architectures listed for SoC estimation. The results of each architecture with the metrics presented in Section 3.6 are available in Table 4. The results of the metrics inform us that most of the model architectures performed well across the train, validation, and test sets. In particular, LSTM, GRU, and CNN-LSTM achieved MSE values lower than 5 and MAE values lower than 20 in terms of SoC estimation. The poorer performances (MSE values higher than 1 and MAE values higher than 30 in terms of SoC estimation) of ResNet, ResNet-LSTM, and InceptionTime suggest that architectures saturated with residual blocks and convolution mechanisms are a poor fit for the signals of EV batteries for SoC estimation. The SoC estimations from CNN-LSTM and GRU can also be confirmed visually in Figure 4a and Figure 5a. Extensive details are given in Figure A1a, Figure A2a, Figure A3a, Figure A4a and Figure A5a, which are available in Appendix A. Additionally, an extended version of Table 4 can be found in Appendix A’s Table A1.
During the experiment, it was found that the signal CUMULATIVE_DE is prone to misleading the models, preventing them from learning to estimate SoC accurately. Because of this, extra care was taken with CUMULATIVE_DE to minimize this effect. This is further expanded upon in Section 4.3.
4.2. E2: SoC Estimation Using Fusion Hybrid Model
In the domain of multivariate time series regression, the fusion of static and time series features has emerged as a critical avenue for enhancing predictive models [42]. As mentioned previously in Section 3.2, the signals within an EV battery are features of a time series. More specifically, the signals can be referred to as the dynamic features, whilst the static features would be further information extracted from the signals. An example of a static feature would be the average of a signal. Such features can be obtained during the preprocessing of time series signals [53].
The concept behind the fusion [54] of features is to make maximum use of the information available in sequences. This requires a hybrid model that can combine temporal patterns with sequential dependencies and the contextual information of other external features. In the work of Li et al. [47], the success of fusing static and time features via NN attention mechanisms suggests that the same can be transferred to SoC estimation of EV batteries.
The Fusion Hybrid Model approach seeks to connect the unchanging characteristics of the data with their temporal evolution, aiming to provide a comprehensive understanding of the underlying relationships with contextual information. In the case of EV battery modeling, dynamic features will be signals such as those in Table 3, and static features will be information extracted from signals, such as the following (an illustrative extraction sketch is given after the list):
Average current in each window (CURRENT_AVG_WINDOW)
Average voltage in each window (VOLTAGE_AVG_WINDOW)
Difference in voltage in each window (VOLTAGE_DIFF_WINDOW)
Average cumulative discharge energy in each window (CUMULATIVE_DE_AVG)
Average cumulative charge energy in each window (CUMULATIVE_CE_AVG)
Average temperature for each window (T_AVG_WINDOW)
Difference in SoC in each window (SOC_REAL_DIFF)
Average SoC in each window (SOC_REAL_AVG)
Difference between the max voltage and min voltage in each window (VOLTAGE_MAXMIN_DIFF)
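A sketch of how such per-window static features might be computed with pandas; the column names follow the list above, but the exact formulas (e.g., last-minus-first differences, using T_CELL_AVG for the window temperature) are our assumptions.

```python
import pandas as pd

def static_features(window: pd.DataFrame) -> dict:
    """Compute per-window summary (static) features from one 20 min window of signals.

    `window` is assumed to hold one column per signal (CURRENT, VOLTAGE, ...).
    """
    return {
        "CURRENT_AVG_WINDOW": window["CURRENT"].mean(),
        "VOLTAGE_AVG_WINDOW": window["VOLTAGE"].mean(),
        "VOLTAGE_DIFF_WINDOW": window["VOLTAGE"].iloc[-1] - window["VOLTAGE"].iloc[0],
        "CUMULATIVE_DE_AVG": window["CUMULATIVE_DE"].mean(),
        "CUMULATIVE_CE_AVG": window["CUMULATIVE_CE"].mean(),
        "T_AVG_WINDOW": window["T_CELL_AVG"].mean(),
        "SOC_REAL_DIFF": window["SOC_REAL"].iloc[-1] - window["SOC_REAL"].iloc[0],
        "SOC_REAL_AVG": window["SOC_REAL"].mean(),
        "VOLTAGE_MAXMIN_DIFF": window["VOLTAGE"].max() - window["VOLTAGE"].min(),
    }
```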
The features were fused with a concatenation-based approach, characterized by the straightforward merging of static and dynamic features into an extended feature space [48]. By directly combining these attributes, the model gained immediate access to both the stable attributes and the evolving temporal patterns. However, this requires managing potentially high-dimensional feature spaces and careful consideration of normalization techniques to maintain balanced contributions from each feature.
The following settings were used in the experiment.
Dataset: The dataset consists of 269 sessions (265 single-labeled and 4 multi-labeled) recorded for the car model e-Golf. Window segments of 20 min were used as input, with a time step interpolation of 1 s and a 20 min stride.
Features: As this is a hybrid model, there are two sets of features: the dynamic and static features. The dynamic features for SoC include the following:
CURRENT
VOLTAGE
VOLTAGE_DIFF
T_CELL_AVG
T_DIFF
DIFF_CAP
DELTA_RESISTANCE
The static features consisted of those listed above. The features were picked based on expert opinions regarding SoC and the outcome from Section 4.1.
Preprocessing: The dataset was subjected to outlier removal and interpolation. Using MinMaxScaler, all data, including the labels, were normalized to the range of 0 to 1. The process was the same as that described in Section 4.1.
Train, validation, test split: Stratified session splitting was performed with a total of 269 sessions; the train set had 188 sessions, the validation set had 40 sessions, and the test set had 41 sessions. This is the same setting as in Section 4.1.
Model: The Fusion Hybrid Model has two CNN layers with 256 neurons each and LSTM layers with 66 neurons per layer. It has a feed-forward NN with three layers to capture the static features. The output of the model is produced by a combined linear layer, which takes the individual outputs of the dynamic and static parts of the model. Dropout was used to prevent overfitting.
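A minimal PyTorch sketch of this concatenation-based Fusion Hybrid Model; the layer sizes follow the description above (two CNN layers with 256 units, LSTM layers with 66 units, a three-layer feed-forward branch for the static features), while kernel sizes, the hidden widths of the static branch, and the dropout rate are assumptions.

```python
import torch
from torch import nn

class FusionHybridModel(nn.Module):
    def __init__(self, n_dynamic, n_static, dropout=0.2):
        super().__init__()
        # Dynamic branch: CNN feature extraction followed by LSTM layers.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_dynamic, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(256, 66, num_layers=2, batch_first=True)
        # Static branch: three-layer feed-forward network for the per-window features.
        self.ffn = nn.Sequential(
            nn.Linear(n_static, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
        )
        self.dropout = nn.Dropout(dropout)
        # Combined linear layer over the concatenated outputs of both branches.
        self.head = nn.Linear(66 + 16, 1)

    def forward(self, x_dyn, x_stat):
        # x_dyn: (batch, window_len, n_dynamic); x_stat: (batch, n_static)
        h = self.cnn(x_dyn.transpose(1, 2)).transpose(1, 2)   # back to (batch, time, 256)
        _, (h_n, _) = self.lstm(h)
        dyn_out = h_n[-1]                                      # final hidden state of the last LSTM layer
        stat_out = self.ffn(x_stat)
        fused = self.dropout(torch.cat([dyn_out, stat_out], dim=1))
        return self.head(fused).squeeze(-1)
```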
Training procedure: Training involved an initial phase of hyperparameter tuning using Optuna [51] with the Tree-structured Parzen Estimator (TPE) sampler, while the experiments were recorded with Neptune.ai [52] as the experiment tracking tool.
The Fusion Hybrid Model was trained for 1500 epochs with a batch size of 32 and an initial learning rate of 0.0028408, which was adjusted by a cosine scheduler. The Adam optimizer was used, and early stopping terminated training if the loss had not improved for 300 epochs. This was done to prevent overfitting.
The resulting SoC estimation from the Fusion Hybrid Model was evaluated based on the metrics in Section 3.6.
Results: Utilizing the stratified session split method with the Fusion Hybrid Model, the SoC estimation results remained excellent across the train, validation, and test sets when evaluated using the evaluation metrics. In terms of SoC estimation, the Fusion Hybrid Model was able to maintain an MSE lower than 5 and an MAE lower than 20. This is similar to the results of the LSTM and the GRU, with only a difference of up to 0.5 in MSE and a difference of up to 0.6 in MAE. The results are available in Table 4, and the SoC estimations can also be visually confirmed in Figure 6. Additionally, an extended version of Table 4 can be found in Appendix A’s Table A1.
4.3. E3: SoC Estimation Guided by xAI and Importance Estimates
As mentioned in Section 2, there are methods that can inform the user about the importance estimation and significance of features present within the data. This is useful, as it provides information regarding which features are necessary and which are redundant when estimating SoC. Within this work, we used xAI and importance estimates to guide our modeling of NNs for EV SoC estimation. We utilized a local xAI method named InputXGradient [27] and a global importance estimate method named Pairwise Importance Estimate Extension (PIEE) [26]. We compared and verified the results from both methods to further understand the relative significance of the signals as well as the significance of their window size as input. As mentioned before, this is not commonly done in existing work on SoC estimation, as evidenced in Table 1. Additionally, studies that compare the explanations of xAI methods and importance estimates are even scarcer.
xAI—InputXGradient: This is an xAI method used to understand the influence of input features on a model’s prediction. It takes the gradient of the output with respect to the input features, indicating how sensitive the prediction is to changes in each input feature. The gradient of each input feature is then multiplied by its respective input value to highlight which features have the highest influence on the model’s prediction. InputXGradient was chosen because it is an intuitive local explainability approach, and its implementation can be readily found in Captum [55]. As the aim of this work is not a thorough walkthrough of the workings of xAI methods, further information on InputXGradient can be found in Shrikumar et al.’s work [27].
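A minimal sketch of obtaining InputXGradient attributions with Captum for a trained SoC regressor; aggregating by the mean absolute attribution over windows and time steps is our choice and may differ from the exact aggregation used in this work.

```python
import torch
from captum.attr import InputXGradient

def signal_importance(model, windows):
    """Return the mean |input x gradient| attribution per signal.

    windows: tensor of shape (n_windows, window_len, n_signals);
    the model is assumed to return one SoC value per window.
    """
    model.eval()
    explainer = InputXGradient(model)
    windows = windows.clone().requires_grad_(True)
    attributions = explainer.attribute(windows)      # same shape as `windows`
    # Average the magnitude over windows and time steps -> one score per signal.
    return attributions.abs().mean(dim=(0, 1))
```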
Importance estimate—PIEE: This method utilizes an embedded pairwise layer to extract information for each feature from the input. This information is captured in the form of a profile, which can be pairwise-weight focused or pairwise-gradient focused. These profiles can be combined with statistical analysis to reach a relative estimate of feature importance. PIEE was chosen because of its stability and ease of implementation. It is a global importance estimate approach, as opposed to InputXGradient, which is a local explainability approach. Again, as the aim of this work is not a thorough walkthrough of the workings of xAI methods, further information on PIEE can be found in Chan and Veas’s work [26], and details of its implementation can be found on GitHub [56].
The experiment followed the setup below:
Dataset: The dataset consisted of 269 sessions (265 single-labeled and 4 multi-labeled) recorded for the e-Golf. Window segments of 20 min were used as input, with a time step interpolation of 1 s and a 30 s stride. This was the same as in Section 4.1. The signals used for SoC estimation varied depending on the investigation of feature importance.
Preprocessing: Outlier removal and interpolation were applied to the dataset. Using MinMaxScaler, all the data, including the labels, were normalized to the range of 0 to 1. This was also the same as in Section 4.1.
Train, validation, test split: Stratified session splitting was applied. There were a total of 269 sessions; the train set had 188 sessions, the validation set had 40 sessions, and the test set had 41 sessions. The same setting was used as in Section 4.1.
Models: The same models from Section 4.1 were used for feature importance analysis.
Training procedures: Training followed the procedure from Section 4.1. The settings of this experiment were kept as close as possible to those of Section 4.1, as we needed a baseline setting in order to evaluate the effects of the xAI and feature importance estimate methods.
Results—Over-reliance: A thorough evaluation was conducted for the following features, which were taken from Table 3:
CURRENT
VOLTAGE
VOLTAGE_CELL_MIN
VOLTAGE_CELL_MAX
VOLTAGE_DIFF
T_CELL_AVG
T_CELL_MIN
T_DIFF
SOC_REAL
MILEAGE
CUMULATIVE_DE
CUMULATIVE_CE
CUM_CE_DIFF
The list of features was determined via expert opinions and established existing works [7,10,18] regarding SoC. It served as a comprehensive list to measure the relative importance between the signals.
InputXGradient’s averaged results of relative importance between the signals from the different NNs are presented in Figure 7. The figure indicates that the models over-rely on CUMULATIVE_DE for SoC estimation. Further investigation revealed that the signal was moderately correlated with SoC, where a Pearson correlation coefficient of
was measured. This suggests that the models were learning the correlation instead of utilizing the signals for their intended purpose of estimating SoC.
This information was not obvious from the dataset and would not have been discovered without the use of xAI. Afterwards, we experimented with different subsets of features as inputs and repeated the procedure to learn the contributions of the features. It was found that the models tended to over-rely on MILEAGE and CUMULATIVE_DE. Therefore, necessary adjustments were made, such as discarding some sessions.
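The correlation check that exposed this over-reliance amounts to computing a Pearson coefficient between the signal and the SoC label, for instance as in the following sketch (NumPy; the exact arrays compared are assumptions):

```python
import numpy as np

def pearson(signal: np.ndarray, soc: np.ndarray) -> float:
    """Pearson correlation between a signal (e.g. CUMULATIVE_DE) and the aligned SoC label."""
    return float(np.corrcoef(signal, soc)[0, 1])
```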
Results—Signals: After the investigation of correlation reliance, we made necessary adjustments to the signals and established a condensed set of features for measuring relative importance based on expert opinions. These were as follows:
CURRENT
VOLTAGE
VOLTAGE_DIFF
T_CELL_AVG
T_DIFF
CUMULATIVE_DE
CUM_CE_DIFF
This condensed set of signals was used to retrain the different NNs, and xAI’s InputXGradient was applied again. The results, demonstrated in Figure 8, showed that CURRENT and VOLTAGE were both considered to have a higher relative importance within this condensed set of signals. This outcome concurs with expert opinions and the basis of SoC.
We additionally verified the relative importance of the features by retraining the NNs with only CURRENT and VOLTAGE to compare the results of SoC estimation. This is shown in Table 5. We focused on CNN-LSTM and GRU since the xAI results were obtained from these two corresponding models. The visuals of CNN-LSTM and GRU are available in Figure 4b and Figure 5b, respectively. We only focus on the results from the test set in order to be representative of applying the changes to a real application, where it would be impossible to perform any further tuning. Based on the differences in the respective metrics between the architectures, a difference of up to 3 in MSE and approximately 3 in MAE, we can conclude that there were minimal changes in SoC estimation performance. This shows that the NNs trained with only these two features exhibit comparable performance to NNs trained with the condensed set of features. Interestingly, the SoC estimations of ResNet and InceptionTime improved when using only CURRENT and VOLTAGE; this is supported by Figure A3 and Figure A5, respectively, in Appendix A. This further reinforces the importance of the two features within the signals. An extended version of Table 5 can be found in Appendix A’s Table A2.
Results—Time Steps of Input Window: The previous outcome regarding the signals’ relative importance revealed that VOLTAGE and CURRENT are particularly important for SoC estimation. It can also be observed in Figure 8 that not all time steps within the input window were considered equally important.
Therefore, we also investigated the importance of time steps within the multivariate time series context. Specifically, we wanted to know if we could make an informed decision regarding the window size for our input. Here, we used the importance estimate method PIEE [26], which is more suitable when considering the time steps of a multivariate time series context because it can examine each time point from a global standpoint. We applied PIEE to the NNs trained with CURRENT and VOLTAGE. The results of the importance estimates are shown in Figure 9, which reinforce the previous observation regarding time steps within the input window.
Utilizing the information about the time steps, we retrained CNN-LSTM and GRU with reduced window sizes, which were informed by their corresponding results of PIEE. The results of the retrained NNs with reduced window sizes are available in Table 6, and the visuals of CNN-LSTM and GRU are available in Figure 4c and Figure 5c, respectively. Regarding Table 6, again, we are only concerned with the test sets in order to be representative of the challenges of a real application. Here, we note that there are SoC estimation results with a minimal difference compared to their counterparts from Table 5 (up to 0.1 difference in MSE, up to 0.3 difference in MAE); however, there are also results with a noticeable difference compared to their counterparts (5 difference in MSE and 5 difference in MAE). This is due to the open interpretation of PIEE’s estimate. From Figure 9, we can observe that the method does not produce a definitive value for the smallest effective window size; instead, it produces an estimate of importance for each time step, which can be combined to form a heatmap of importance. Using the heatmap of importance, it becomes possible to make informed choices regarding the approximate effective window size. This is demonstrated by the results of CNN-LSTM with window size 53, shown in Table 6, which is based on Figure 9a. This can also be visually compared in Figure 4c. However, the effective window size is not necessarily always clear, as is evident in the example of Figure 9b, which can lead to a result such as GRU with a window size of 45 from Table 6 and Figure 5c. In such cases, a systematic reduction evaluation approach can be used to reach a conclusion. This is also demonstrated in Table 6. An extended version of the table can be found in Appendix A’s Table A3. The key findings from the results suggest that the window size can be reduced by 9–25% while still retaining performance.
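One way to turn a per-time-step importance profile, such as those visualized in Figure 9, into a candidate reduced window size is sketched below; the cumulative-share threshold is an illustrative assumption on our part and not a rule prescribed by PIEE.

```python
import numpy as np

def effective_window(importance: np.ndarray, keep: float = 0.95) -> int:
    """Smallest suffix of the window (most recent steps) that retains `keep` of the total importance.

    importance: (window_len,) non-negative per-time-step importance estimates,
    ordered from the oldest to the most recent time step (assumed).
    """
    scores = np.asarray(importance, dtype=float)
    tail = np.cumsum(scores[::-1]) / scores.sum()   # cumulative share, newest step first
    return int(np.searchsorted(tail, keep) + 1)
```

A candidate size obtained this way would still need to be validated by retraining, as was done for the window sizes 53 and 45 above.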