A Machine Learning-Based Robust State of Health (SOH) Prediction Model for Electric Vehicle Batteries

Abstract: The car industry is entering a new age as electric energy replaces conventional fuel, and electric batteries are increasingly used in the automobile sector. As a result, the inner workings of these battery systems must be fully understood. There is currently no accurate model for predicting an electric car battery's state of health (SOH). This study aims to use machine learning to develop a reliable SOH prediction model for batteries. An optimization method was also constructed to steer the modeling process, and extensive simulations were performed to verify the accuracy of the suggested methodology. An SOH method for data processing was developed. The method involves a complex data-driven model combining Big Data, Artificial Intelligence (AI), and Internet of Things (IoT) technologies. To establish the most effective technique for certifying the actual condition of real-life battery health, the accuracy and performance of several SOH models were compared. For understanding and predicting SOH behavior, data-driven modeling has significant advantages over older methodologies. The methods used in this study can be seen as a revolutionary low-cost, high-accuracy, and dependable approach to understanding and analyzing the state of health of batteries. First, an intelligent model was created using a data-driven modeling strategy. Second, the concurrent battery data were qualified using the data-driven model; the machine learning (ML) method creates a very accurate and dependable model for forecasting battery health in real-world scenarios. Third, the previously established ML model was used to develop a knowledge-based online service for battery health. This web service can be used to test battery health, monitor battery behavior, and perform a variety of other tasks.
A variety of similar solutions for diverse systems can be derived using the same technique. The default accuracy score of the ML algorithmic module, R-squared (R2), and mean squared error (MSE) were utilized as performance measures. R2 measures the goodness of a fit; its value lies between 0 and 1, with values closer to 1 indicating a better fit. A lower MSE implies better model performance, since it reflects how close the predicted values are to the actual values. The battery model scored 0.9999 on the training set and 0.9995 on the testing set. The R2 score was 1, with an MSE of 0.03. Together, these three indicators show that the data-driven ML model used in this study is accurate.


Introduction
The environment is facing major challenges, such as global warming, exacerbated by the widespread use of conventional gasoline in automobiles, which releases tons of CO2 each year [1]. Furthermore, the rising price of crude oil is a serious hindrance to the automobile sector, necessitating the development of alternative-fuel vehicles. Electric vehicles (EVs) have received a lot of attention from academic researchers and automotive experts and are quickly becoming a practical solution, because they have the potential to lower greenhouse gas (GHG) emissions [2]. The mode of transportation is changing as the number of EVs on the road grows, and EVs have gained popularity in recent years owing to their success and efficiency. As many EV models have only been on the road for a few years, gaining driver trust will take time. As EVs have risen in popularity, research into the underlying technology of battery management systems (BMSs) has garnered considerable attention. A BMS is designed to prevent malfunctions and, eventually, catastrophic failures in batteries [3]. The complicated physical-chemical processes inside the battery, however, are preventing BMSs from being widely used. To improve battery efficiency, the state of charge (SOC), the state of function (SOF), and the state of health (SOH) should all be continuously monitored in the BMS [4]. Estimating battery capacity is one of the most crucial duties of a BMS [5]. Range anxiety, which is induced by the battery's variable SOH and remaining useful life (RUL), is one of the key factors limiting the adoption of EVs. These two quantities are therefore crucial in the BMS.
To begin with, the BMS should be able to estimate the battery SOC, the instantaneous available power, and SOH metrics such as power fade and energy decline, and it should adapt to changing cell characteristics as the cells in the battery pack change. A battery management system requires an accurate assessment of battery states. SOH has been defined in various ways, including calendar life, cycle life, power fading, and so on [6]. The SOH of the battery is affected by several factors: the magnitude of the charge/discharge current, ambient temperature, depth of discharge, charge control system, over-charge and over-discharge, and storage type and duration. A few experiments to determine the battery SOH have been reported in the literature. In some research, the impedance calculation approach was used [7], whereas in others, the battery power was calculated to evaluate the battery SOH [8]. The usage of rechargeable batteries in EV applications has gained popularity in recent years [9]. Lithium-ion batteries are believed to be the most promising energy source due to their high capacity and energy density [10]. In addition to their high energy density (23-70 Wh/kg), Li-ion batteries have very high efficiency (about 90%) and a reasonable cycle life (3000 cycles at 80 percent depth of discharge). As with most battery systems, capacity fades and resistance increases during both cycling and storage. This technology, however, is still fragile and is impeded by a host of issues, including safety [11], cost [12], recycling, and charging infrastructure. Understanding battery aging mechanisms, as well as safety concerns, is critical for more accurate lifetime forecasts and increased battery efficiency [13]. The most difficult task is to identify the aging mechanisms. Many environmental influences, as well as the charge and discharge modes, interact to produce different aging results. This makes understanding aging a great challenge, and several studies have attempted to investigate the evidence over the years [14].
The most common method is "coulomb counting," which involves a simple integration of current over time to estimate SOH [15]. It necessitates periodic calibration, which cannot be accomplished in real time [16]. The aging estimation problem was modeled with Equation (1), with an input 'u' (state vector) and an output 'y' (voltage), both dependent on variables 'x'.
This research focuses on applying an optimal machine learning (ML) technique to create and develop a robust model based on experimental battery data. The method used in this study is sufficient to demonstrate the accuracy and robustness of the suggested battery model relative to existing models. Contributions include the following:
I. The electric battery's SOH model produced by this research is robust.
II. The model behavior is accurate.
III. Real-world battery systems can be analyzed and monitored using the model.
The rest of the study is organized as follows. In Section 2, we present the proposed method in detail. Discussions of the comparative results are provided in Section 3. Section 4 concludes the study and suggests options for future research.

An Overview of the Proposed Method
This study presents a novel, data-driven, ML-based approach to rapid data modeling. Whether fed experimental, real-time, or simulated data, the resulting models reliably predict battery state of health. Resources including a high-end computer, battery testing data, and ML systems were used to construct the SOH model for the battery system. The schematic of the ML modeling procedure is shown in Figure 1. An algorithm guided the overall process, and the following is a complete description of the implementation approach.

Hardware
A computer with an i5 7th generation processor with 8 CPUs, an NVIDIA GeForce GTX 850Ti GPU, and 8 GB of RAM was used to conduct the ML process. This computer acted as the data accumulator server and data processor.

Battery Arrays
The battery data used in this study were retrieved from the data source [University of Michigan]. Data from batteries 25, 26, and 27 were used to build the model. A separate model was created for each battery to understand its behavior in the battery system, and a combined model was developed to understand their joint behavior in the system.

Machine Learning (ML) Domain
The ML domain represented the whole ML system. The ML system consisted of a built-in data preprocessing unit and an ML algorithmic module. This study used the best-performing algorithm from the set of algorithms available in the ML module. The algorithm optimization system identified the algorithm used to create the model from the battery data.

Data Preprocessing
A customized data preprocessing system was also developed in this study. The SOH data were not experimentally generated; instead, a script was written to derive SOH from the charging capacity in the experimental data. The SOH was calculated using Equation (2):
$$SOH_j = \frac{Q_j^{max}}{Q_{rated}} \times 100\% \qquad (2)$$
where $SOH_j$, $Q_j^{max}$, and $Q_{rated}$ are the state of health, the maximum capacity in a specific cycle $j$, and the rated capacity, respectively.
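The derivation of SOH from charging capacity can be sketched as follows. This is a minimal illustration rather than the authors' script: it assumes SOH is expressed as a percentage of rated capacity, and the function names and sample capacities are hypothetical.

```python
def state_of_health(max_capacity_ah, rated_capacity_ah):
    """Return SOH in percent for one cycle: 100 * Q_max_j / Q_rated (assumed form)."""
    if rated_capacity_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return 100.0 * max_capacity_ah / rated_capacity_ah


def soh_per_cycle(cycle_capacities_ah, rated_capacity_ah):
    """Take the maximum capacity reached in each cycle and convert it to SOH."""
    return [state_of_health(max(samples), rated_capacity_ah)
            for samples in cycle_capacities_ah]


# Hypothetical charging-capacity samples (Ah) for three consecutive cycles
cycles = [[0.5, 1.2, 2.0], [0.4, 1.1, 1.9], [0.3, 1.0, 1.8]]
soh = soh_per_cycle(cycles, rated_capacity_ah=2.0)
print(soh)
```

Taking the per-cycle maximum mirrors the definition of the maximum capacity in cycle j; a real dataset would supply the capacity samples from the charging records.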

ML Algorithm Optimization
Before an algorithm from the ML algorithmic domain is adopted, it is wise to test its performance against the available data rather than rely on trial and error. This study used algorithmic optimization techniques to obtain the best algorithm from the ML module. The optimization process is an online process for model development. A script was written by the author to find the optimum algorithm. The optimization process yielded Classification and Regression Trees (CART) [17] as the optimum algorithm for the battery data used in this study. Thus, the CART algorithm was used to build the model and predict battery data. The formal definition of the CART algorithm can be given as follows: given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, and a label vector $y \in \mathbb{R}^l$, a decision tree recursively partitions the feature space such that samples with the same labels or similar target values are grouped together.
Let the data at node $m$ be represented by $Q_m$ with $N_m$ samples. For each candidate split $\theta = (j, t_m)$ consisting of a feature $j$ and threshold $t_m$, we partition the data into the subsets $Q_m^{left}(\theta)$ and $Q_m^{right}(\theta)$:
$$Q_m^{left}(\theta) = \{(x, y) \mid x_j \le t_m\}, \qquad Q_m^{right}(\theta) = Q_m \setminus Q_m^{left}(\theta)$$
The quality of a candidate split of node $m$ was then computed using an impurity function or loss function $H(\cdot)$, the choice of which depended on the task being solved (classification or regression):
$$G(Q_m, \theta) = \frac{N_m^{left}}{N_m} H\left(Q_m^{left}(\theta)\right) + \frac{N_m^{right}}{N_m} H\left(Q_m^{right}(\theta)\right)$$
We selected the parameters that minimized the impurity:
$$\theta^* = \operatorname{argmin}_{\theta} G(Q_m, \theta)$$
We recursed for the subsets $Q_m^{left}(\theta^*)$ and $Q_m^{right}(\theta^*)$ until the maximum allowable depth was reached, $N_m < \min_{samples}$, or $N_m = 1$.
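The split search at a single node can be illustrated with a short sketch. It assumes a regression task with mean squared error as the impurity function H, which the text does not state explicitly; the function names and the toy data are hypothetical, and the code is not the built-in ML module used in the study.

```python
def mse_impurity(targets):
    """H(Q): mean squared deviation of the targets from their mean."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets) / len(targets)


def best_split(rows, targets):
    """Try every (feature j, threshold t) pair and return the split that
    minimises the weighted impurity of the two children."""
    n = len(rows)
    best = None  # (weighted impurity, feature index, threshold)
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, targets) if row[j] <= t]
            right = [y for row, y in zip(rows, targets) if row[j] > t]
            if not left or not right:
                continue  # a valid split sends samples to both children
            g = (len(left) / n) * mse_impurity(left) \
                + (len(right) / n) * mse_impurity(right)
            if best is None or g < best[0]:
                best = (g, j, t)
    return best


# Hypothetical data: one feature (cycle number) and an SOH-like target
rows = [[1], [2], [3], [4], [5], [6]]
targets = [100.0, 99.0, 98.0, 90.0, 89.0, 88.0]
impurity, feature, threshold = best_split(rows, targets)
print(feature, threshold, round(impurity, 4))
```

A full tree would call `best_split` recursively on the left and right subsets until a depth or minimum-sample limit is reached.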

ML Battery Model
After successful execution of the algorithm written for the ML battery data, a respective ML model was produced for each battery. These ML models were the logical counterparts of the real batteries: each model was able to reproduce the behavior of the corresponding battery used in the experiment. The prediction capability was tested and assessed, as described in the following sections.

Prediction
This module tested the prediction capability for the newly built ML battery models. Several prediction tests were performed to obtain the visualization of different criteria. Both SOH and charging capacity were visualized to show their relationship.

Comparison
This module compared the experimental results with the results predicted by the ML model. A thorough comparison was conducted to assess the error and accuracy of the predictions. The following Equations (2) and (3) were used to calculate the accuracy and error involved in each cycle.
Electronics 2022, 11, 1216
where $CA$ is the comparative accuracy of the ML battery model, $E_{MLBat}$ is the error involved in the ML battery model, $D_{Exp}$ are the battery data taken from the source experiment, and $Data_{Pred}$ are the data predicted by the ML battery model.
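A per-cycle comparison of this kind might look like the sketch below. The exact error and accuracy formulas are not recoverable from the text, so the code assumes a relative error in percent and an accuracy of 100 minus that error; both forms, the function names, and the sample values are assumptions made for illustration only.

```python
def relative_error_percent(experimental, predicted):
    """Assumed error form: |D_Exp - D_Pred| / |D_Exp| * 100."""
    if experimental == 0:
        raise ValueError("experimental value must be non-zero")
    return abs(experimental - predicted) / abs(experimental) * 100.0


def comparative_accuracy_percent(experimental, predicted):
    """Assumed accuracy form: 100 minus the relative error in percent."""
    return 100.0 - relative_error_percent(experimental, predicted)


# Hypothetical per-cycle SOH values in percent
d_exp = [100.0, 98.0, 96.0]
d_pred = [99.9, 98.1, 96.0]

errors = [relative_error_percent(e, p) for e, p in zip(d_exp, d_pred)]
accuracies = [comparative_accuracy_percent(e, p) for e, p in zip(d_exp, d_pred)]
print(errors, accuracies)
```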

The Operational Algorithm
The proposed data-driven alternative model was based on experimental data from the batteries CH25, CH26, and CH27, and it learned from the data using the machine learning approach. The ML procedures in this work were carried out using Algorithm 1. The machine learning procedure began with loading new datasets into the machine learning application and finished with the export of the ML model and results. Several machine learning algorithms were available, but no single algorithm is suitable for every dataset: an algorithm's suitability and applicability depend on the dataset's intrinsic properties. As a result, before using an algorithm, it was necessary to choose one that would be efficient. A variety of linear and nonlinear techniques, ranging from Linear Regression (LR) to Extra Trees (ET), were investigated in this study. The Boosting Method (BM) was shown to have the best performance on the experimental datasets, which is why it was chosen for further processing of the experimental data. The optimum approach for the battery datasets was the Decision Tree Regressor (DTR), which was obtained using the BM method.
To conduct machine learning, the dataset had to be divided into two parts: the training set and the test set. The computer learned from the training set, and its learning performance was assessed using the test set. In this study, 80% of the data was used for training and the remaining 20% for testing. After that, the data were scaled to make them more regular, and the model was generated using the specified technique. The model was then applied to the test set to determine accuracy and error rates. Finally, the developed model was used to forecast new data points for CH25, CH26, and CH27. The constructed model predicted data points that closely matched the CH25, CH26, and CH27 experimental data: the prediction error was on the order of $10^{-2}$. This small error validated the output data.
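The 80/20 split followed by scaling can be sketched in plain Python. The text does not name the scaling method, so min-max scaling is an assumption here, as are the function names, the fixed random seed, and the synthetic dataset. The sketch learns the scaling bounds from the training set only and reuses them on the test set, which is one common way to keep test data out of the fitting step.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it at the training fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(values):
    """Learn the scaling bounds from the training values only."""
    return min(values), max(values)


def apply_min_max(values, bounds):
    """Scale values with previously learned bounds (training data maps to [0, 1])."""
    low, high = bounds
    span = (high - low) or 1.0  # avoid division by zero for constant data
    return [(v - low) / span for v in values]


# Hypothetical dataset: (cycle number, SOH in percent) pairs
data = [(cycle, 100.0 - 0.05 * cycle) for cycle in range(1, 101)]
train, test = train_test_split(data)

bounds = fit_min_max([cycle for cycle, _ in train])
train_x = apply_min_max([cycle for cycle, _ in train], bounds)
test_x = apply_min_max([cycle for cycle, _ in test], bounds)
print(len(train), len(test))
```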
The whole operation of the ML modeling process was driven by Algorithm 1, which tied the custom scripts written for this study together with the built-in ML algorithmic module. The details of the algorithm are given below.

In-Field Vehicle Testing
To obtain battery performance data from a vehicle, a field test was conducted. We used an electric A0-class car with 32,000 km on the odometer and more than 300 charge cycles. The traction batteries were lithium nickel manganese cobalt oxide batteries with a graphite anode.
According to the datasheet, the nominal capacity of the battery cell was 52 Ah and the nominal voltage was 3.7 V. The lower and upper cutoff voltages were 3.0 V and 4.1 V, respectively. The battery pack, which was made up of 97 cells, could store an estimated 18.5 kWh of energy.
For battery cells, the BMS voltage measurement precision was 1 mV, while the current sensing precision was 0.1 percent. Signals such as cell voltage, current, and temperature were monitored and sent over CAN (Controller Area Network) to a CAN data recorder with a 2 Hz sampling rate. The car was driven by three different test drivers; the only requirement was that the car stop intermittently to deliver a charging pulse to the batteries.
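Current samples logged at 2 Hz can be integrated over time to obtain the charged or discharged capacity (ampere-hour counting). The sketch below assumes trapezoidal integration with a fixed 0.5 s sample period; the function name and the constant-current log are hypothetical and do not come from the field-test data.

```python
def ampere_hours(current_samples_a, sample_period_s=0.5):
    """Trapezoidal integration of current (A) over time, returned in Ah."""
    total_coulombs = 0.0
    for previous, current in zip(current_samples_a, current_samples_a[1:]):
        total_coulombs += 0.5 * (previous + current) * sample_period_s
    return total_coulombs / 3600.0


# Hypothetical log: a constant 52 A discharge sampled at 2 Hz for one hour
samples = [52.0] * (2 * 3600 + 1)
discharged_ah = ampere_hours(samples)
print(discharged_ah)
```

With a constant 52 A held for one hour, the integral should come out at the 52 Ah nominal cell capacity, which makes the example easy to check by hand.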
This work used three driving cycles of three different batteries with identical battery temperatures to assess battery properties identified from distinct driving cycles. Some cycle statistics are shown in Table 1. The measured terminal voltage of cell 01 was the lowest of the 97 cells during operation, because its state of charge was slightly lower than that of the other cells after 32,000 km of service without equalization. Because cell 01 was always the first to reach the lower cutoff voltage of 3.0 V, the other cells were not allowed to discharge completely; hence, cell 01 was chosen for analysis and for SoH estimation in the next section. Table 1 indicates that different driving styles and random factors affected the energy recovery rate and mileage. Despite this, the difference between the capacity charged from the power grid after the driving cycle and the capacity discharged during the driving cycle was relatively constant.
That means that around 99 percent of the charged capacity could be discharged during driving, resulting in a charging and discharging efficiency of about 99.5 percent. It also indicated a vital fact: ampere-hour counting within a single driving cycle was exceptionally precise and reliable, in spite of the 0.5 s time steps and the highly dynamic current. Table 2 presents the battery driving cycle information.
The impedance test equipment, which included a data recording device for signal generation and data acquisition, a power amplifier for signal amplification, and two series resistors for current measurement, was used to record the battery's impedance at various excitation frequencies. Impedance was measured using sinusoidal voltage excitation: sinusoidal voltages with a DC bias were generated, the resulting voltages and currents were recorded and stored, and their complex quotient was computed to determine the battery's impedance. To decrease measurement noise, a convolution-based approach was used to determine the impedance phase delay, paired with Parseval's theorem for amplitude computation [27]. The DC bias of the sinusoidal input was accurately controlled to avoid charging or discharging the battery during the impedance test. The sampling frequency was tied to the input signal frequency and varied from 0.025 to 5 kHz. The key specifications of the impedance test system were as follows: sinusoidal voltage excitation frequency range, 0.025 to 5 kHz; DC compensation range, −5 V to 5 V; and AC amplitude, 100 mV. Figure 2 depicts the physical layout of the battery testing system.
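The complex-quotient step can be illustrated with a small sketch. It does not reproduce the convolution-based phase estimate or the Parseval-based amplitude computation cited above; instead it swaps in a plain single-frequency correlation (one bin of a discrete Fourier transform) to extract the voltage and current phasors. The signal parameters, the 2 milliohm impedance, and the function names are hypothetical.

```python
import cmath
import math


def phasor(samples, frequency_hz, sample_rate_hz):
    """Complex amplitude of the component at frequency_hz, obtained by
    correlating the samples with a complex exponential (single-bin DFT).
    Assumes the record spans a whole number of periods."""
    n = len(samples)
    acc = 0j
    for k, value in enumerate(samples):
        angle = -2.0 * math.pi * frequency_hz * k / sample_rate_hz
        acc += value * cmath.exp(1j * angle)
    return 2.0 * acc / n


def impedance(voltage, current, frequency_hz, sample_rate_hz):
    """Complex quotient of the voltage and current phasors, in ohms."""
    return phasor(voltage, frequency_hz, sample_rate_hz) / phasor(
        current, frequency_hz, sample_rate_hz)


# Hypothetical test signal: 10 Hz excitation, 1 kHz sampling, 1 s record.
# The voltage has a 3.7 V DC bias and a 100 mV AC amplitude; the current
# has a 50 A amplitude and lags the voltage by 30 degrees.
fs, f0, n = 1000.0, 10.0, 1000
t = [k / fs for k in range(n)]
v = [3.7 + 0.1 * math.cos(2 * math.pi * f0 * tk) for tk in t]
i = [50.0 * math.cos(2 * math.pi * f0 * tk - math.radians(30)) for tk in t]

z = impedance(v, i, f0, fs)
print(abs(z), math.degrees(cmath.phase(z)))
```

Because the record covers a whole number of periods, the DC bias averages out of the correlation, so only the AC component contributes to the phasors.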

Results and Discussion
This section provides the complete visualizations of the ML battery model outcomes. For clarity, the results are categorized as ML model predictions, error estimation, and accuracy results.

Findings from the ML Method
The results showed close agreement between the experimental data and the data produced by the ML battery model. Figures 3-5 show the predicted and experimentally measured capacity plots of batteries 25, 26, and 27, respectively. The experimental capacity is shown with the red dashed line and the predicted capacity with the blue dashed line. Similarly, the SoH plots of the predicted and experimental cases for the respective batteries are shown in Figures 6-8. The experimental SOH is shown with the red dashed line and the predicted SOH with the blue dashed line. The results were subdivided and are described below based on their types.

Results Related to Derived Capacity
As mentioned above, the charging capacity was derived from the aging data. The respective capacity data and model-predicted data are visualized in Figures 3-5.
As shown in Figures 3-5, for each case of capacity prediction, the ML battery model was able to reproduce and trace the experimental behavior of the battery. There was only a tiny deviation from the capacity trend, so the result can be regarded as reasonably accurate.

Results Related to SoH
The comparative plots of calculated SOH and battery-model-predicted SOH are presented in Figures 6-8. The ML model predictions agreed with the experimental results, which means that the battery model is reasonably accurate. Vehicle batteries can be studied experimentally or in real time using these results. As seen in Figures 6-8, the battery model agreed closely with the SOH calculated from experimental data at every point of SOH prediction.
As seen from Figures 9-11, for every scenario of SOH error calculation, the minimum error ranged from 0 to 0.10, and the highest error ranged from 0.10 to 0.24. These error calculation results mean that the model was fairly accurate. This small error in the SOH prediction is the key to the close agreement with the experimental data. The battery ML model achieved better agreement because it learned from the data and adjusted to new data. This showed the power of ML-based modeling.

Accuracy Results
Two different methods were used to define accuracy. The first was the accuracy formula defined in this study in Equation (2), and the second was the default accuracy metrics of the ML algorithmic domain. All of the accuracy metrics showed that the ML-based battery model was accurate.
As seen from Figures 12-14, for every scenario of SOH prediction, the ML battery model had a lowest accuracy of 99.8% and a highest accuracy of 99.99%. In every case, the accuracy was above 99%.
Secondly, the ML algorithmic module's default accuracy score, R-squared (R2), and mean squared error (MSE) were used as further accuracy metrics. The R2 metric measures the goodness of a fit; its value typically ranges from 0 to 1, with values closer to 1 indicating a better model fit. MSE is the mean of the squared errors and indicates how close the predicted values are to the actual values; hence, a lower MSE signifies better model performance. The accuracy score of the battery model was 0.9999 on the training set and 0.9995 on the testing set. The overall R2 score was 1, and the MSE was 0.03. Hence, these three measures implied that the data-driven ML model used in this study was accurate. Furthermore, the built-in accuracy results were consistent with the calculated accuracy results.
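The two built-in metrics can be computed directly from their standard definitions, as in the following sketch. The function names and the sample values are hypothetical and are not the study's data; the formulas are the usual ones, with R2 defined as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical SOH values in percent
actual = [100.0, 98.0, 96.0, 94.0]
predicted = [99.9, 98.1, 96.0, 94.0]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mse, r2)
```

Note that R2 computed this way can fall below 0 when a model fits worse than simply predicting the mean, which is why the 0 to 1 range is a typical rather than a strict bound.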