1. Introduction
Li-ion batteries (LIBs) are widely used in various applications, and deep neural networks (DNNs) have been increasingly adopted to estimate their states, especially as battery management systems become connected to cloud systems. However, the effect of hyperparameters on the accuracy of LIB state estimation using DNNs has not been adequately studied. Since many DNNs can be generated by varying the hyperparameters external to a model, domain experts can optimize them for specific applications [1]. Nevertheless, the performance of a model depends on the specific problem and data characteristics. Black-box methods [2], such as Bayesian optimization, genetic algorithms, and particle swarm optimization, have been used to find optimal hyperparameters. However, these methods have limitations in practice: a single hyperparameter can influence multiple other hyperparameters, making it challenging to find a globally optimal solution using black-box methods alone. Therefore, it is essential to understand how hyperparameters affect a model's accuracy and to develop effective hyperparameter optimization strategies.
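As a minimal illustration of such black-box tuning, a random search over a hypothetical search space can be sketched as follows (the hyperparameter names, ranges, and scoring function are illustrative assumptions, not those used in this study):

```python
import random

# Hypothetical search space; names and ranges are illustrative only.
SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "hidden_layers": [2, 3, 4],
    "dropout": [0.0, 0.2, 0.5],
}

def sample(space, rng):
    """Draw one hyperparameter configuration at random."""
    return {k: rng.choice(v) for k, v in space.items()}

def random_search(score, space, n_trials=20, seed=0):
    """Return the configuration with the lowest score (e.g., validation RMSE)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = sample(space, rng)
        s = score(params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score
```

Such a search treats the model as a black box and therefore cannot exploit the interactions between hyperparameters that the text describes, which is exactly why an understanding of those interactions remains necessary.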
LIBs are essential components in electric vehicles, but accurately estimating and verifying their internal states, including the state of charge (SOC), state of health (SOH), state of energy (SOE), state of power (SOP), state of temperature (SOT), and remaining useful life (RUL), remains a challenge in real-world scenarios [3]. Despite the potential of DNNs to address this challenge, their adoption has been hindered by several factors, including (1) limitations in the processing power, physical memory, and communication speed of battery management systems (BMSs); (2) difficulty in labeling the internal states of LIBs, as they are not directly measurable; (3) insufficient data to train DNNs due to the lack of logging or transmission of the necessary data by BMSs to internal memory or cloud servers; and (4) the safety-critical nature of BMSs, which are typically mixed-criticality systems, while DNNs are nondeterministic algorithms. However, recent advances in architectural design and cloud technologies have enabled automotive systems to send large amounts of data and adopt DNNs for the internal state estimation of LIBs [4].
Figure 1 presents a generalized high-level architecture that utilizes data-driven algorithms in the cloud and in a vehicle for various applications in the energy, body, and chassis domains. The architecture consists of three layers: electronic control units (ECUs) for real-time data processing and safety-critical functions, edge computers for temporary data storage and personalized machine learning algorithms, and the cloud for data storage, analysis, and generalized machine learning algorithms. The ECUs, edge computers, and cloud are designed to accommodate different levels of computational power, data storage, and safety criticality. The offline process supports the overall components of three layers by providing initial machine learning algorithms from laboratory data and continually improving these algorithms with historical data from the cloud. The data-driven methods are represented by green boxes, while blue boxes indicate the existing components that are not specific to data-driven techniques. The proposed architecture facilitates the integration of data-driven algorithms into safety-critical automotive systems for improved estimation accuracy and safety.
The proposed architecture for the energy domain, as shown in Figure 2, was derived from the generalized high-level architecture. The energy domain architecture was designed to address two key challenges that arise in real-world scenarios: catastrophic failures and the limitations of the training dataset. First, catastrophic failures may occur when system components such as LIBs and BMSs exceed the managed tolerance levels. To mitigate these risks, anomaly detection techniques can be employed, although some techniques, such as distance-based methods, may perform poorly on high-dimensional data [5]. However, our diversified DNNs, which estimate the SOH and reduce dimensionality, allow for outlier detection using their outputs. Second, the training dataset may not encompass all possible variations in the battery state, leading to suboptimal solutions and overfitting [6]. In contrast to the majority of model-based and data-driven methods, which account only for observed variance [7,8,9], our approach incorporates the impact of unobserved variance into the battery state estimation.
The architectural design enhances the estimation accuracy and safety in automotive systems by combining deterministic and nondeterministic algorithms. The cloud manages the energy storage systems of vehicles, utilizing a generalized DNN for the battery state estimation and personalized DNNs for online training, providing scalable computing power and data storage but with potential risks of data loss from vehicles. Personalized DNNs, trained using data from a single vehicle, consider the variations in driving and storage conditions, while a generalized DNN, trained using data from multiple vehicles, accounts for the variations from vehicle to vehicle, cell to cell, module to module, and pack to pack. Continual online training of personalized DNNs keeps the local DNNs on the energy domain control units (e-DCU) in each vehicle up to date, mitigating the risk of data loss. Comparing the results of the generalized DNN and personalized DNNs enables the detection of outliers, considering variations that were not observed in the training data. The effectiveness of detecting the abnormal states of LIBs or systems relies on both the generalizability of the generalized DNN and the accuracy of the personalized DNNs. A Luenberger observer, a deterministic algorithm on the BMS with Automotive Safety Integrity Level (ASIL) D, provides the ASIL D signals. The ASIL rates the potential severity of an automotive system malfunction or failure, with level D being the highest level of risk and A being the lowest level of risk. ASIL D is typically required for BMS in electric vehicles due to the criticality of the battery operation.
Nondeterministic algorithms, such as the extended Kalman filter (EKF), recursive least squares, and the DNN on the e-DCU, are passed through a plausibility check or a selection function. The ASIL B signals from these algorithms are bounded by the ASIL D signals to ensure that the estimation results remain within the acceptable range determined by the deterministic algorithm, thereby improving accuracy while maintaining safety [10]. The selection function can incorporate ensemble strategies such as stacking and voting to optimize performance while considering resource constraints. This is particularly useful when training data are limited and the cloud does not yet hold a large dataset from vehicles [11].
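A plausibility check of this kind can be sketched as a simple bounding rule (the signal names and tolerance band are illustrative assumptions, not values from the architecture):

```python
def plausibility_check(asil_b_estimate, asil_d_estimate, tolerance):
    """Bound a nondeterministic (ASIL B) estimate by a deterministic (ASIL D) reference.

    If the ASIL B value lies within +/- tolerance of the ASIL D value, it is
    accepted (higher accuracy); otherwise the safe ASIL D value is used instead.
    """
    lower = asil_d_estimate - tolerance
    upper = asil_d_estimate + tolerance
    if lower <= asil_b_estimate <= upper:
        return asil_b_estimate
    return asil_d_estimate

# Example: a DNN SOC estimate of 0.82 is accepted against an observer value of
# 0.80 with a +/-0.05 band, while an implausible estimate of 0.60 is rejected.
```

The design choice here is that the deterministic path always wins on disagreement, so the nondeterministic estimator can only refine, never override, the safety-qualified signal.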
A significant challenge in using DNNs for SOH estimation is the limited availability of complete datasets for LIB degradation. Obtaining this data is a time-consuming process that involves both laboratory testing and data collection from operational vehicles over an extended period. Due to resource constraints and unrevealed variations in development and manufacturing processes, it is impractical to train a DNN using large datasets that account for all variations at the initial stage. To address this challenge, it is essential to improve the generalizability of models trained using small datasets with good cross-validation performance. Generalizability refers to the ability of a model to accurately estimate the SOH when applied to data and conditions that differ from those it was trained on. Enhancing the generalizability of the model ensures that it can perform effectively in real-world scenarios before additional online training.
This paper proposes a novel system software architecture for estimating the state of health (
SOH) in lithium-ion batteries intended for use in electric vehicles, incorporating functional safety considerations. The architecture includes a personalized DNN, a generalized DNN, and an outlier detector. Firstly, an overview of the challenges of
SOH estimation in batteries is provided, along with a discussion of related work in this area. The approach to data cleansing and feature extraction from the NASA Prognostic Center of Excellence battery data [
12] is then described. Subsequently, experiments are presented to evaluate the feasibility of the proposed architecture and to examine the existence of generalized and personalized DNNs through various hyperparameter settings. An outlier detector is also utilized to highlight variations and detect abnormal states in the
SOH estimation between the generalized DNN and personalized DNN. Finally, the findings, limitations, and future research directions are discussed. This study makes a significant contribution to the ongoing research in this field, while also providing a roadmap for future studies.
4. Experiments and Results
4.1. Training and Testing Method
To evaluate the feasibility of the proposed architecture and demonstrate the necessity for both the generalized and personalized DNN models, we trained and evaluated the DNN models for predicting the
SOH. We utilized the Mahalanobis distance [
32] as an outlier detector to detect abnormal states by considering variations between cells or a generalized DNN and personalized DNNs, as shown in
Figure 6. By evaluating both the generalized and personalized DNN models, we demonstrated their respective strengths and limitations in predicting the
SOH.
For the generalized DNNs, each model was trained on one LIB and evaluated by the average prediction accuracy across all the LIBs. In contrast, for the personalized DNNs, all the LIBs were used to train the model, and the prediction accuracy was evaluated for each individual LIB. All the models were trained and validated over 300,000 training epochs, with the best internal parameters recorded automatically to avoid overfitting from prolonged training. To evaluate the accuracy of the models, we used the RMSE as the performance metric during both training and validation. The RMSE was chosen because of its widespread use in similar studies to assess the accuracy of prediction models. It measures the difference between the predicted and actual values, which allows comparison with previous research and facilitates a more comprehensive understanding of model performance.
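The RMSE metric used here follows the standard definition and can be sketched as:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual SOH sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have equal length")
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )
```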
TensorFlow, Python, and Jupyter Notebook were utilized to develop a model generation and validation program. The model generation section enabled the creation of multiple models by inputting hyperparameters such as the loss function; the learning rate, β_1, β_2, and AMSGrad setting of the ADAM optimizer; the number of hidden layers and nodes; batch normalization; L_2 regularization; dropout regularization; and Gaussian noise. The best model, as determined by the accuracy during training, was automatically recorded along with its internal parameters, hyperparameters, and training and validation histories. This prevented the overfitting issues that may arise from excessive training and allowed for complete automation, without human intervention, by instantiating the model generation, training, and validation classes with various hyperparameters.
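To make the roles of the layer and node hyperparameters concrete, a minimal NumPy sketch of such a feed-forward network follows (the study's actual program used TensorFlow/Keras; the function names, initialization scale, and sizes here are illustrative assumptions):

```python
import numpy as np

def build_ffnn(n_inputs, hidden_layers, nodes, n_outputs=1, seed=0):
    """Initialize weights for an FFNN with `hidden_layers` layers of `nodes` units."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [nodes] * hidden_layers + [n_outputs]
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x, activation=np.tanh):
    """Forward pass: activation on hidden layers, linear output (SOH regression)."""
    for w, b in params[:-1]:
        x = activation(x @ w + b)
    w, b = params[-1]
    return x @ w + b
```

Varying `hidden_layers`, `nodes`, and `activation` here corresponds directly to the hyperparameter sweep described above; the regularization and optimizer settings would be additional arguments in the full program.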
Table 1 summarizes the hyperparameters used in the generalized model (M57) and personalized model (M65) among the 65 models generated and evaluated in
Table A1 and
Table A2.
4.2. The Generalized Models
The generalized models were trained on LIB B7 and validated using LIB B6. To assess the generalizability of the models, cross-validation was performed using LIBs B5 and B18, which helped to identify issues with overfitting and poor generalization. Cross-validating the models ensured that they could accurately predict the SOH of LIBs beyond those used for training and validation. The best generalized models were selected based on the highest average accuracy among the models. The mean SOH prediction values were used to evaluate the accuracy, since the SOH does not change much within a cycle in real-world applications. The real-time SOH provides insight into the difficulty of training at each cycle: a significant difference between the minimum and maximum indicates a challenging training process.
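The selection rule described above, taking the model with the best average accuracy (i.e., the lowest mean RMSE) across cells, can be sketched as follows (the data layout and identifiers are illustrative):

```python
def select_best_model(results):
    """Select the generalized model with the lowest average RMSE across cells.

    results: {model_id: {cell_id: rmse, ...}, ...}
    """
    averages = {m: sum(r.values()) / len(r) for m, r in results.items()}
    return min(averages, key=averages.get)

# Hypothetical per-cell cross-validation RMSEs (%) for two candidate models.
example = {
    "M57": {"B5": 5.0, "B6": 0.5, "B7": 0.4, "B18": 4.0},
    "M40": {"B5": 6.0, "B6": 0.6, "B7": 0.5, "B18": 5.0},
}
```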
Among the DNN models evaluated in this study, models 57, 40, 59, and 41 demonstrated higher average accuracy than the other models. These models were generated using tanh activation and Huber loss functions and did not include batch normalization, dropout regularization, or AMSGrad.
Table 2 provides a comparison of the differences between these models. Notably, model 57, which included four hidden layers and did not use L2 regularization, exhibited the highest average accuracy.
The training and validation accuracy, number of training epochs required to optimize the model parameters, as well as the real-time
SOH and mean
SOH predictions are presented in
Figure 7. Model 57 was trained for approximately 6000 epochs. The predictions for LIB B6 were more accurate than those for the other cells. However, the SOH prediction for LIB B7 deviated at approximately the 125th cycle, while the overall SOH prediction for LIB B5 was biased. The SOH prediction for LIB B18 converged only at approximately the 50th cycle. The accuracy comparison of the four best models in Table 3 reveals similar accuracies, suggesting that L2 regularization was not a critical factor. However, the lower accuracies for LIBs B5 and B18 compared to LIBs B6 and B7 may be attributed to variations between the different LIBs or to insufficient training of the models.
4.3. The Personalized Models
In contrast to the generalized models, the personalized models were trained and validated using data from all the LIBs without cross-validation, with the aim of achieving the highest possible accuracy without regard for generalizability. The best personalized models were selected based on the highest training and validation accuracy among all the models. Among the DNN models evaluated, models 65, 7, 8, and 46 demonstrated higher training and validation accuracy than the other models and were generated using batch normalization, AMSGrad, and Huber loss functions without dropout regularization. The differences between these models are outlined in Table 4. Notably, the ReLU activation function used in models 65, 7, and 8 was found to be more accurate than the tanh activation function used in model 46, suggesting that ReLU was better suited for predicting the SOH.
The experiments demonstrated that model 65, an FFNN with AMSGrad and batch normalization, achieved the highest training accuracy among all the models. Figure 8a–d displays the loss during training and the corresponding SOH prediction curves for all the LIBs. The results for LIB B18, however, were obtained using a cleaned dataset rather than the original one, as the SOH predictions for LIB B18 improved significantly with cleaning: 11.3544% RMSE on the original dataset versus 0.2513% RMSE on the cleaned dataset. Comparing model 65 with the improved radial basis function NN [28] on the same dataset, Table 5 demonstrates that model 65 achieved higher accuracy in terms of SOH prediction for all the LIBs; the LIB B18 results from [28] were not available. These results suggest that the personalized models, particularly model 65, could be effective for predicting the SOH of LIBs in real-world applications.
4.4. Outlier Detector
To compare the performance of the two DNNs, we calculated the absolute error between their SOH estimations using the mean absolute error (MAE) metric. A high MAE could indicate the divergence of the two algorithms, which may be caused by either poor training of the neural networks or large cell-to-cell variations in the data. The MAE provides a measure of the overall difference between the two algorithms, but it may not be able to identify the exact reason behind it.
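The MAE between the two estimators follows the standard definition and can be sketched as:

```python
def mae(g_soh, p_soh):
    """Mean absolute error between generalized and personalized SOH estimates."""
    if len(g_soh) != len(p_soh):
        raise ValueError("sequences must have equal length")
    return sum(abs(g - p) for g, p in zip(g_soh, p_soh)) / len(g_soh)
```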
To complement the MAE, the Mahalanobis distance was employed as an additional measure to detect abnormal degradation. The Mahalanobis distance considers the covariance of the data and captures the correlations between different variables. The gradient of the Mahalanobis distance was used to monitor the rate of change of the distance with respect to the SOHs, which could indicate the occurrence of abnormal degradation. By monitoring the rate of change in the Mahalanobis distance, abnormal degradation can be detected in a more nuanced way than using the MAE alone. The use of both the MAE and Mahalanobis distance provides a comprehensive approach to detecting abnormal degradation and ensures the accuracy and safety of the system.
The Mahalanobis distance, shown in Equation (4), considers the covariance between variables and is therefore a more accurate measure of dissimilarity than the Euclidean distance:

D_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})} \quad (4)

A vector \mathbf{x} consists of two variables, G_SOH and P_SOH, representing the SOH estimates of the generalized DNN and a personalized DNN, respectively, as shown in Equation (5):

\mathbf{x} = [G_{SOH}, P_{SOH}]^{\top} \quad (5)

Each vector is first reduced by the mean \boldsymbol{\mu}; the deviation vector is then multiplied by the inverse of the covariance matrix \Sigma^{-1}, which accounts for the correlation between G_SOH and P_SOH, as shown in Equation (6):

(\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \quad (6)

Finally, the gradient of the Mahalanobis distance with respect to the SOHs follows from Equation (4) as \nabla D_M(\mathbf{x}) = \Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}) / D_M(\mathbf{x}).
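A numerical sketch of Equations (4)–(6) follows (variable names are illustrative; the covariance is estimated from the paired SOH series, and the per-cycle rate of change of the distance is approximated here by finite differences):

```python
import numpy as np

def mahalanobis_series(g_soh, p_soh):
    """Per-cycle Mahalanobis distance between paired generalized (G_SOH) and
    personalized (P_SOH) SOH estimates, Equation (4) with x = [G_SOH, P_SOH]^T."""
    x = np.column_stack([g_soh, p_soh])            # one row per cycle
    mu = x.mean(axis=0)                            # mean vector
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
    dev = x - mu                                   # deviation vectors
    q = np.einsum("ij,jk,ik->i", dev, cov_inv, dev)  # quadratic form, Eq. (6)
    return np.sqrt(np.maximum(q, 0.0))             # guard tiny negative round-off

def distance_gradient(d):
    """Cycle-to-cycle rate of change of the distance (finite differences)."""
    return np.gradient(d)
```

A rising `distance_gradient` flags cycles where the two DNNs diverge faster than the joint covariance of their outputs would explain, which is the abnormal-degradation signal discussed above.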
Figure 9 presents the comparison of the Mahalanobis distance and the absolute error between the SOH estimations from the generalized DNN and the personalized DNN for all the LIBs. The MAE and the covariance between model 57 and model 65 were computed for each LIB.
The results of our analysis revealed that the Mahalanobis distances for LIB B6 were relatively high despite the low absolute errors. This is attributed to the high covariance between the output variables, which may lead to an overestimation of the Mahalanobis distance or an underestimation of the absolute errors. On the other hand, the Mahalanobis distances for LIB B18 were relatively low despite the high absolute errors, which can be explained by the low covariance between the output variables. Additionally, we found that the gradient of the Mahalanobis distance for LIB B5 and B7 at the 38th cycle was approximately 0.005, indicating an abnormal degradation of these batteries. This suggests that the Mahalanobis distance can be a useful tool for detecting abnormal behavior in LIBs.
6. Discussion
The findings emphasize the necessity of both the generalized DNN and the personalized DNN, as they exhibited different hyperparameters and achieved generalizability and high accuracy, respectively. In addition, the Mahalanobis distance was utilized as an outlier detector to evaluate the feasibility of the proposed architecture and detect abnormal degradation at specific cycles.
However, there were several limitations to the study. Firstly, due to the limited availability of datasets, the generalized DNNs were not trained and evaluated on various cell types and operation conditions. Secondly, the analysis did not consider variations from module to module, pack to pack, and vehicle to vehicle, since the datasets came from cell tests. Thirdly, it was challenging to distinguish between the reasons for the differences, whether due to variations between cells or training problems of the generalized DNN, as no experiments were conducted on variations between cells.
The limitations of the present study can be addressed in future research by utilizing real-world large datasets from vehicles obtained through the cloud. This would enable the training and evaluation of the generalized DNN on various cell types and degradation conditions, as well as the consideration of variations from module to module, pack to pack, and vehicle to vehicle. Furthermore, conducting experiments to evaluate the variations between cells and packs can help distinguish the reasons for the observed differences highlighted by the Mahalanobis distance, whether they are due to variations in cells and packs or training issues of the generalized DNN.
7. Conclusions
The proposed architecture utilizing machine learning algorithms demonstrated the potential to enhance SOH prediction accuracy, identify unmanaged variations in manufacturing and development processes, and detect abnormal degradation. Through the integration of a generalized DNN trained on small datasets for generalizability, a personalized DNN trained on all datasets for accuracy, and an outlier detector to compare the outputs of the two DNNs, the necessity for both models in achieving high accuracy for all LIBs was demonstrated. The experiments identified two FFNN models with the highest accuracy among 65 models generated, achieving an average cross-validation accuracy of 4.6% RMSE for the generalized model and an average accuracy of 0.33% RMSE for the personalized model. The Mahalanobis distance was utilized to detect abnormal degradation by considering the differences between the outputs of the generalized DNN and the personalized DNN.
However, the study was limited by the available datasets and by the lack of consideration for variations from module to module, pack to pack, and vehicle to vehicle. As a result, future research will be conducted in four parts based on the feasibility analysis of the proposed architecture in this study. Firstly, data will be acquired from cells and packs used in real vehicles under various charging and discharging conditions in laboratory tests to further train and validate the DNN models and to assess their performance in real-world scenarios. Secondly, an embedded controller running the personalized DNN and outlier detectors will be prototyped, with a focus on optimizing its performance for use in electric vehicles. Thirdly, SOH labeling will be performed on the cloud data from vehicles, since the real SOH value is not easily measurable from real vehicle data. This process will involve developing new labeling techniques and validating them against existing methods. Finally, we will train a generalized DNN and personalized DNNs using labeled data from multiple vehicles and from individual vehicles, respectively. We anticipate that the generalized DNN will improve with larger datasets and that the outlier detector will become more robust. These additional research efforts will allow for a comprehensive evaluation of the potential benefits of the proposed architecture for battery state estimation and anomaly detection in real-world scenarios.