Lithium-Ion Battery Health Prediction on Hybrid Vehicles Using Machine Learning Approach

Abstract: Efforts to decarbonize the world have led to a rapid increase in electric vehicles (EVs), which helps limit rising pollution. In this electric transportation revolution, lithium-ion batteries (LIBs) play a vital role in storing energy. To determine the range of an EV, the state of charge (SOC) and the state of health (SOH) of the battery pack are essential. Access to high-quality data on battery parameters is a crucial challenge for researchers working in the energy storage domain, primarily because of confidentiality constraints on manufacturers of batteries and EVs. This paper proposes a hybrid framework for predicting the state of a lithium-ion battery for EVs. EVs are growing worldwide because of their environmental and sustainability advantages, with batteries replacing fossil fuels. To prevent failure, Li-ion batteries in EVs should be operated in a controlled and progressive manner to ensure increased efficiency and safety. An extreme gradient boosting (XGBoost) algorithm is used in this paper to estimate the SOH of lithium-ion batteries used in EVs, and the model undergoes an error analysis to optimize its performance parameters. Furthermore, an SOH estimation method based on XGBoost with accuracy correction is proposed to improve the accuracy of SOH estimation for lithium-ion batteries. To describe the aging process of batteries, we extract several features such as average voltages, voltage differences, current differences, and temperature differences. The XGBoost model for SOH estimation builds on the higher prediction accuracy and generalization ability of ensemble learning algorithms. Experimental results suggest that the XGBoost model is capable of more accurate prediction.


Introduction
For more than two decades, changing global conditions and the greenhouse effect have contributed heavily to environmental degradation, and energy sources and storage methods have been studied and designed in response to this growing demand. The Li-ion battery is the most widely used energy storage device and is often utilized as the primary power supply for electric vehicles. Lithium-ion batteries offer several benefits over conventional batteries, such as lead-acid batteries and Ni-H batteries: a lower self-discharge rate, longer cycle life, and higher power density [1]. Although their performance has been significantly enhanced, energy storage systems remain central to the safety and user acceptance of electric vehicles [2]. The transport sector, and vehicles in particular, consumes significant amounts of fossil fuels. Electric vehicles and plug-in hybrid electric vehicles can replace conventional fuels in cars and reduce fossil fuel use in energy production and distribution [3]. Batteries are the primary energy source for battery electric vehicles or hybrid/plug-in hybrid electric vehicles [4]. Battery technology used in automobiles mainly utilizes lithium due to its remarkable properties, including energy density, fast charging, low maintenance, and long lifespan; furthermore, lithium-based solutions can provide highly powerful, light, and compact configurations [5][6][7][8]. Moreover, the management of the charge and discharge phases is a critical factor in determining battery performance and dependability. These processes must be handled suitably to avoid overcharging or deep discharging, which can cause damage that is irreversible or at least hard to reverse. Extending the battery lifespan, planning trips and charging stops, and optimizing the energy discharge of hybrid electric vehicles require constant and accurate monitoring of the battery's state [9]. Reliably detecting the battery status is crucial to maximizing the battery's performance.
As a result, a method must be developed for accurately detecting the battery's state of health (SOH) [10]. To properly diagnose a battery, two primary parameters should be considered: the residual energy in the pack, known as the state of charge (SOC), and the degradation undergone by the battery, known as the state of health (SOH) [11]. During preliminary experimental characterizations in a laboratory environment, carmakers sometimes use look-up tables (LUTs) to map the SOH behavior. Testing is usually performed using so-called direct techniques, which measure the battery's voltage and impedance and use ampere-hour counting [12]. In a given use scenario, a battery's state of health describes the degradation of its performance due to a combination of internal and external factors that characterize its usage profile. Numerous reliability problems have occurred because the fundamentals have not been correlated with the dynamically changing factors in battery use cases. The state of charge gives no insight into which events affect the battery [13]. In electric vehicles, Li-ion batteries play a significant role as storage components. To make sure that these batteries meet EPA requirements, the state of health of their electrodes must be checked [14]. The state of health is one of the most critical features for battery management systems (BMS) [15][16][17][18]. Although no specific equation or process can be used to calculate the aging of batteries, the state of health can provide great insight. Since the state of health is not directly measurable, an efficient choice of measurable features is critical for improving accuracy and reducing computational costs. Choosing the most decisive features for predicting battery health improves the accuracy of the outcome and speeds up the method. A variety of methods can determine the health of a battery [19][20][21].
Various battery health parameters have been used to predict the state of health (SOH) with data-driven techniques such as machine learning and statistics. Statistical and machine learning models need substantially lower computational costs than deep learning methods, as is evident from the existing SOH prediction models in the literature [22]. XGBoost, a scalable tree boosting machine learning system, has attracted wide attention for its impressive learning performance and high learning efficiency [23][24][25][26][27]. In response to the limitations above, we suggest estimating the state of health of lithium-ion batteries in an electric vehicle by using the XGBoost algorithm. This can largely avoid the overfitting issue.
In the current paper, we suggest a framework for day-ahead estimation for hybrid electric vehicles (HEV), which is more efficient in predicting the state of health (SOH). The contributions of the paper to the area include the following:
• An approach to predicting the battery's state of health.
• A prediction model for hybrid cars based on battery parameters.
• A comparative experiment against existing hybrid models, showing that the suggested model is accurate and performs better.
This paper is structured as follows: Section 2 offers a literature survey that details related work. Section 3 introduces the details of the dataset, and the feature selection of the suggested framework is presented in Section 4. The validation of the proposed method is demonstrated in Section 5. The conclusions and future work are given in Section 6.

Literature Survey
This section presents a literature study related to state of health (SOH) prediction and a summary of methods used to design batteries in electric vehicles. Two kinds of power sources are incorporated in hybrid electric vehicles: thermal and electrical. Energy management strategies at the highest level control their synergetic operation. The configurations, modeling, and optimization of these strategies have been summarized in [28]. In this context, a framework has been presented for developing energy management techniques for hybrid electric vehicles that optimize battery lifetime. By improving the sizing of the electrified powertrain during the vehicle development phases, it is possible to correctly estimate the impact of the hybrid electric vehicle's powertrain operation on the life of the high-voltage battery [29]. A battery management system is built into all lithium-ion battery packs installed in hybrid and pure electric vehicles. It includes the hardware and software used for management and uses algorithms to determine battery states. The technical literature has been studied, and the individual processes have been classified into different classes [30]. In new energy electric vehicles, lithium-ion batteries provide the energy; a lithium-ion battery offers unique advantages over nickel-metal hydride batteries, nickel-cadmium batteries, and lead-acid batteries. It has high energy efficiency, a wide temperature range, a low self-discharge rate, and a long cycle life. These features make it an ideal automotive battery for many car manufacturers. As a rule, the battery in an electric vehicle needs to be replaced if its performance has degraded to the point where it may compromise the driving and safety of the vehicle [31].

Hybrid Electric Vehicles
Hybrid electric vehicles were developed to meet environmental policies and cope with rising fuel prices by providing better fuel economy and lower tailpipe emissions. A detailed analysis of the concept and of the way forward has been performed, comparing hybrid electric vehicles to conventional vehicles; in summary, hybrid electric vehicles and their energy management strategies have been reviewed [32]. Battery electric vehicles can be a solution to an environmental problem. Technologies related to batteries, electronic control, and so on have been considered vital for improving the safety and reliability of the electric vehicle, as described on the basis of the technical advancements of electric cars and the emerging technologies for their future application [33]. A predictive energy management strategy for hybrid electric vehicles that accounts for battery aging and temperature was proposed with the main aim of determining degradation and battery life costs, since lithium-ion battery degradation is currently a problem in hybrid electric vehicles. The paper used model predictive control to evaluate urban transportation in real conditions and to optimize energy management based on the battery's electrical-thermal-aging dynamics [34]. Mechanical components, electric motors, and power electronics form a hybrid electric vehicle. One article provided a general description of hybrid electric vehicles, with the objective of describing their benefits and drawbacks, classifying the different types of cars, and discussing energy management strategies [35]. Another paper compared the common thermal management systems for lithium-ion batteries and the benefits and disadvantages of each strategy, focusing on energy and environmental issues [36].

State of Health Estimation and Prediction of Batteries
Evaluating the state of health of lithium-ion batteries with adequate accuracy is essential for improving state of charge estimation, guaranteeing safety, and extending service life. A new technique for estimating the state of health of lithium-ion batteries for electric vehicles was presented in [37], founded on datasets of the partial charging voltage under constant current and the current value under constant voltage. Based on partial constant-voltage (CV) charging data, a novel state of health (SOH) estimator was suggested. Different CV health indicators were analyzed thoroughly with partial CV charging in mind. Obtaining a precise estimate of a battery's state of health poses practical challenges; in [38], the estimate was made using the extracted CV capacity. The SOH of lithium-ion batteries in electric vehicles was estimated by determining the bulk capacitance of the matching RC circuit model to get the attenuation factor for the bulk capacitance in different cycles [39]. A data-driven prediction technique, support vector regression, was applied to partial incremental capacity curves to establish a battery degradation model; decomposed incremental capacity curves were used as training datasets to extract battery health features [19]. An incorrect state of health estimate or prediction can result in an incorrect estimate of the battery's utilization state, compromising the ability to catch the battery's fault states and raising the risk of hidden safety hazards in the battery system [40].

Other Methods
Using an ensemble learning algorithm, Li et al. offered an assessment model for batteries, which focused on predicting the state of health to verify and validate a battery's performance [41]. Another paper developed a method to predict lithium-ion batteries' remaining life using an effective state of health indicator and a moving-window method based on a real battery management system. By using the partial-charge voltage curves of the cells, a state of health indicator was extracted to calculate the battery's remaining useful life [42]. It is crucial to implement online life-cycle state-of-health assessments in battery management systems to ensure battery performance and reliability. An online measurable battery-parameter-based method for self-adaptive life-cycle assessment of lithium-ion batteries was proposed for less computational complexity, which was achieved by training the model with least squares support vector machines [43]. In a battery management system, estimations of the battery's state of charge and state of health are essential since they represent the power demand for electric vehicles. Based on a dual sliding mode observer, Chen et al. proposed a method for estimating the state of charge (SOC) and state of health (SOH) together, taking into account the capacity-fading factor [44]. The state of health of lithium-ion batteries was predicted by a backpropagation neural network optimized through genetic algorithms, which was studied for analysis based on appropriate aging mechanisms and the state of health prediction of lithium-ion batteries over their whole lifespan [45][46][47][48][49][50]. Based on deep reinforcement learning, a hybrid battery system comprising a high-power and high-energy battery pack was described. In order to minimize energy loss and raise both the electrical and thermal safety of the system, an energy management strategy was generated based on the electrical and thermal characterization of the battery cells [51]. 
For electric vehicles, this article offered a novel resource allocation scheme based on deep reinforcement learning that did not work at the level of complex vehicle dynamics. Because different energy storage devices behave differently under varying operating conditions, using multiple energy storage devices in parallel increases their maintenance. Therefore, the suggested strategy goals were used to evaluate optimal approaches [52]. A summary of the experimental techniques studied, and their benefits and disadvantages can be found in Table 1.

| Method | Ref. No. | Benefits | Drawbacks |
|---|---|---|---|
| Energy level | [53] | Accurate | Not possible when the battery is active |
| Internal resistance measurements | [54] | Direct and simple method | Not suitable for online estimation |
| Equivalent model | [55] | Easily embedded in the BMS | Demands prior knowledge |
| Data-driven method | [56] | Good accuracy | Large computational cost |
| Internal impedance measurements | [57] | Accurate | Based on battery chemistry information |
| Incremental capacity analysis | [58] | Less computational cost | At least one health factor must exist |

Methodology
In this section, the proposed method for predicting the state of health of batteries in hybrid and electric vehicles is presented. The suggested framework is shown in Figure 1, with attention to its input and output characteristics. To predict the state of health (SOH) of electric vehicles' (EV) Li-ion batteries, the selected features were employed in a hybrid prediction model, which was created with the XGBoost model. As shown in Figure 1, the framework uses XGBoost to map the input features to the predicted output. The suggested methodology can thus predict the SOH of an electric vehicle's Li-ion batteries.

Extreme Gradient Boosting
XGBoost stands for extreme gradient boosting, a supervised learning approach that builds regression models from sequentially constructed decision trees. Weights are essential in the XGBoost algorithm. Each independent variable is given a weight, which is then used to predict results through a decision tree. The weights of variables that were predicted incorrectly are increased, and these variables are fed to the second decision tree. XGBoost offers significant benefits as a scalable tree boosting system: it reduces computing time through parallelized tree construction, and its sparsity-aware algorithm lets it handle missing data satisfactorily. XGBoost has been used in classification, regression, and a wide field of other prediction applications. XGBoost is an improved, higher-performance version of the gradient boosting decision tree (GBDT). Gradient boosting builds a strong learner by combining multiple decision trees, applying the algorithm repeatedly for a set number of trees. Training a GBDT model on a large, complex dataset can take thousands of iterations, which creates computational bottlenecks; XGBoost speeds up gradient boosting. These advantages enable XGBoost to perform exceptionally well in machine learning. Figure 2 shows a diagram of how the XGBoost model is structured.

In the following, we list the necessary extreme gradient boosting equations. A tree ensemble model to predict an output is defined on the dataset D = (x, y). S includes the predicted values $SOH_{i,E}$ (obtained from battery parameters such as voltage, charge, and discharge) and the real or ground truth values $SOH_{i,R}$ in Equation (1). Here, we define the tree complexity Ω to formalize the improved goal function in Equation (2), where T is the number of leaf nodes in the tree, W is the leaf weight, and Ω indicates the regularization term. Thus, with attention to Equation (2), we have the definition of the objective function in Equation (3), where l means the model's loss function, and the penalty for the number of leaves T is controlled by the parameters η and γ.
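For reference, the standard XGBoost formulation can be written with the symbols defined above. The following is a sketch under assumptions rather than a transcription of Equations (1)-(3): the sample count n, the tree index k with K trees, and the use of η as the coefficient on the squared leaf weights (usually written λ in the XGBoost literature) are assumed here.

```latex
\begin{align}
S &= \sum_{i=1}^{n} l\left(SOH_{i,E},\, SOH_{i,R}\right) \\
\Omega(f) &= \gamma T + \frac{1}{2}\,\eta \sum_{j=1}^{T} W_j^{2} \\
Obj &= \sum_{i=1}^{n} l\left(SOH_{i,E},\, SOH_{i,R}\right) + \sum_{k=1}^{K} \Omega\left(f_k\right)
\end{align}
```

In this form, γ penalizes each additional leaf and η shrinks the leaf weights, so both terms discourage overly complex trees.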

Bagging Boosting Algorithm
Bagging is an ensemble meta-algorithm in machine learning that aims to improve the accuracy and precision of machine learning algorithms in statistical classification and regression. Furthermore, it provides an overfitting avoidance mechanism that reduces variance, with the weights changed as part of an iterative strategy. Figure 3a demonstrates the structure of a bagging tree algorithm, and Table 2 gives an example of its operation. The bagging technique predicts a result by training weak learners in parallel, each on its own dataset, and averaging their predictions.

Table 2. Explanation of steps in the bagging trees algorithm.
Step No. Description
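The bagging procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study: the weak learner is a hypothetical one-split regression stump (`fit_stump`), the function names are invented for the example, and the cycle and SOH values are made-up toy data.

```python
import random
from statistics import mean


def fit_stump(xs, targets):
    """Fit a one-split regression stump on 1-D inputs and return a predict function."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: fall back to a constant prediction
        constant = mean(targets)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagging_fit(xs, ys, n_learners=25, seed=0):
    """Train each weak learner on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        learners.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return learners


def bagging_predict(learners, x):
    """Average the predictions of all weak learners."""
    return mean(f(x) for f in learners)


# Toy data (hypothetical): SOH in percent falling linearly with cycle index.
cycles = list(range(10))
soh = [100 - 2 * c for c in cycles]
learners = bagging_fit(cycles, soh)
print(bagging_predict(learners, 0), bagging_predict(learners, 9))
```

Because the learners are independent of one another, they could be trained in parallel; averaging their outputs is what reduces the variance of the ensemble.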

Ensemble Boosting Algorithm
The ensemble boosting algorithm can create good prediction models and can, in general, remove the bias error during the process. Weights are used at each stage of training. Figure 3b shows the operation of an ensemble boosting algorithm, and the process of a boosting trees algorithm is shown in Table 3.

[Table 3: steps of the boosting trees algorithm, including the new tree F(x) and step 7, update the residuals.]

Extreme gradient boosting performs well at error minimization, and the algorithm can make predictions even from a small quantity of data. Moreover, it is not limited to a single machine and can learn from scalable, distributed samples.
In this algorithm for error minimization, low-accuracy models are boosted. In bagging, by contrast (Figure 3), random trees are built and collected by combining several models.
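The boosting loop, in which each new tree is fitted to the current residuals and the residuals are then updated, can be sketched as follows. This is a simplified gradient boosting illustration with squared loss and one-split stumps, not XGBoost itself and not the implementation used in this study; the function names, the learning rate of 0.3, and the toy data are assumptions made for the example.

```python
from statistics import mean


def fit_stump(xs, targets):
    """Fit a one-split regression stump on 1-D inputs and return a predict function."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: fall back to a constant prediction
        constant = mean(targets)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost_fit(xs, ys, n_trees=50, learning_rate=0.3):
    """Sequentially add stumps, each fitted to the residuals of the current model."""
    base = mean(ys)                      # initial prediction: the mean target
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)  # the new tree F(x)
        trees.append(tree)
        # Update the predictions, and therefore the next round's residuals.
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return base, trees, learning_rate


def boost_predict(model, x):
    base, trees, learning_rate = model
    return base + learning_rate * sum(tree(x) for tree in trees)


def training_mse(model, xs, ys):
    return mean((boost_predict(model, x) - y) ** 2 for x, y in zip(xs, ys))


# Toy data (hypothetical): SOH in percent falling linearly with cycle index.
cycles = list(range(10))
soh = [100 - 2 * c for c in cycles]
for n in (0, 1, 50):
    print(n, training_mse(boost_fit(cycles, soh, n_trees=n), cycles, soh))
```

Unlike bagging, the trees here depend on one another: each one corrects the errors left by the trees before it, which is how a sequence of low-accuracy models is boosted into an accurate one.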
MAE and RMSE were utilized to calculate error levels in the prediction; the functions were used to evaluate the suggested framework, as shown in Figure 4. While the state of health (SOH) prediction is extremely important for keeping the EV running without interruption or hazard, it may also be crucial for improved battery control and EV car maintenance. As part of the suggested framework, a rigorous model review method ensures prediction accuracy by comparing error and accuracy values. This set of evaluation parameters was utilized to evaluate the accuracy of our proposed hybrid model in the performance assessment and improvement techniques given later in Section 5.

State of Health Evaluation and Validation
The definition of a prediction, considering D as a dataset, is given in Equation (4). Considering x as the input and y as the output in Equation (4), the prediction model uses x and y (train and test) from the dataset, as described in Equation (5). The provided input values and the output values of the trained prediction model, $x_{t_{n+1}}$ and $y_{t_{n+1}}$, are given in Equation (6). Moreover, by considering the time state, the model can also predict the output results of the past.

Data
This section presents the approach for preparing the dataset [59]. Figure 3 illustrates the data preparation flow and explains it. Meanwhile, we describe the process used to select the inputs and outputs.

Data Description
The dataset and experimental conditions used in this study are described below. These data cannot be directly mapped to the battery's state of health, so we identified the health features in the battery data that could indicate the battery's state of health. Charge and discharge processes of this kind are called random walk operations. After 1500 loading times, a series of reference charging and discharging cycles was conducted to establish reference measures of the battery's health [59]. Three lithium-ion batteries (5, 6, 7) were selected for a stable discharge experiment covering charging, discharging, and impedance. The 18650 lithium-ion batteries were run continuously using a series of charging and discharging currents. The first reference profile captured the battery's capacity by observing a reference charge and discharge cycle. The second one benchmarked the dynamics of the battery, with a pulsed current used for charging and discharging. The open-circuit voltage of the batteries was observed as a function of the state of charge, employing a low-current discharge of 0.04 A. The Li-ion batteries were examined under different operating profiles of charge, discharge, and capacity at room temperature. Initially, each battery was charged in constant-current mode at 1.5 A until its voltage reached 4.2 V, and then in constant-voltage mode until the charging current dropped to 20 mA. The batteries were discharged at a constant current of 2 A until the battery voltage fell to a cutoff between 2.7 V and 2.2 V. Continuous charging and discharging accelerated the aging of the batteries. The aging trends of the three datasets, shown in Figure 5, could be used to predict the battery's state of health. Furthermore, Table 4 shows an overview of the dataset.

Feature Selection
The state of health indicates the capacity of the battery to store electric energy, expressed as the ratio of the present capacity to the initial capacity. Firstly, the aim was to examine the existing prediction models and the inputs and outputs leading to Equation (7) [60]:

$$SOH = \frac{C_{aged}}{C_{rated}} \times 100\% \quad (7)$$

The state of health of the battery is generally expressed as a percentage. In Equation (7), $C_{aged}$ represents the current capacity of the battery and $C_{rated}$ represents the rated capacity of the battery. In addition, the charge and discharge operational profile characteristics are less complicated and less expensive to process, which reduces the cost and delay of predicting and forecasting the state of health [19].
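The capacity ratio described above translates directly into code. The snippet below is a minimal sketch; the function name and the example capacities of 1.6 Ah and 2.0 Ah are hypothetical values chosen for illustration, not figures from the dataset.

```python
def state_of_health(aged_capacity_ah: float, rated_capacity_ah: float) -> float:
    """Return the SOH in percent as the ratio of current to rated capacity."""
    if rated_capacity_ah <= 0:
        raise ValueError("rated capacity must be positive")
    return 100.0 * aged_capacity_ah / rated_capacity_ah


# Hypothetical example: a cell rated at 2.0 Ah that now delivers 1.6 Ah.
example_soh = state_of_health(1.6, 2.0)
print(example_soh)
```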
The features used by the framework for prediction are the time of reference charge, the time of reference discharge, the ratio of constant-current charge time, and the voltage after charging. The existing datasets were analyzed before modeling the state of health (SOH) of a battery. Figure 6 visualizes the charging performance in the dataset, showing the charging curves (a), the measured current curves (b), and the temperature curves (c) for the battery. Here, it appears that the discharge voltage decreases faster as the cycle number increases. Because the discharge current remained constant in the cycling experiment, the duration of the discharging process could represent the discharge capacity, from which the SOH was then calculated. The discharging curves (a), measured current curves (b), and temperature curves (c) for the battery are shown in Figure 7.
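Since the discharge current is constant, the discharge capacity is the current multiplied by the discharge duration. The sketch below illustrates this relationship; the 2 A current matches the discharge current given in the data description, while the function name, the discharge durations, and the choice of the first cycle as the reference capacity are assumptions made for the example.

```python
def discharge_capacity_ah(current_a: float, duration_s: float) -> float:
    """Capacity in ampere-hours delivered at a constant current over a duration in seconds."""
    return current_a * duration_s / 3600.0


# Hypothetical discharge durations (seconds) for three cycles at a constant 2 A.
durations_s = [3600, 3300, 2900]
capacities_ah = [discharge_capacity_ah(2.0, d) for d in durations_s]

# Assumption for this example: the first cycle serves as the reference capacity.
soh_percent = [100.0 * c / capacities_ah[0] for c in capacities_ah]
print(capacities_ah)
print(soh_percent)
```

A shorter discharge time at the same current therefore maps directly to a lower capacity and a lower SOH.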

Results and Discussion
The XGBoost model was trained with 80% of the input datasets and tested with 20% of the battery datasets. As part of the model validation process, the performance metrics were used to evaluate the simulation results. The real and predicted state of health (SOH) were compared to measure the XGBoost algorithm's performance; the offered model achieved a higher prediction accuracy rate. Table 5 presents a summary of the implementation environment. The system had 32 GB of operating memory and an Intel(R) Core(TM) i5-9600 CPU @ 3.20 GHz. The machine learning algorithm, extreme gradient boosting, was implemented in Python 3.8.3.

Performance Evaluation
In this paper, two evaluation parameters were used as indicators of the performance of the proposed state of health estimation model and of the integrity of the suggested approach, namely, the root-mean-square error (RMSE) and the mean absolute error (MAE), defined as follows:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(SOH_{Estimate,i} - SOH_{Real,i}\right)^{2}} \quad (8)$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|SOH_{Estimate,i} - SOH_{Real,i}\right| \quad (9)$$

where N is the number of data points in Equations (8) and (9), $SOH_{Estimate}$ is the estimated SOH value, and $SOH_{Real}$ is the SOH value measured during the test.
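The two error measures named above can be computed as in the following sketch. The function names and the sample SOH values are hypothetical and serve only to illustrate the calculation.

```python
import math


def mae(estimates, reals):
    """Mean absolute error between estimated and measured SOH values."""
    return sum(abs(e - r) for e, r in zip(estimates, reals)) / len(reals)


def rmse(estimates, reals):
    """Root-mean-square error between estimated and measured SOH values."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reals)) / len(reals))


# Hypothetical SOH values in percent.
soh_estimate = [98.0, 95.0, 90.0]
soh_real = [100.0, 95.0, 87.0]
print(mae(soh_estimate, soh_real), rmse(soh_estimate, soh_real))
```

Because squaring weights large deviations more heavily, the RMSE is never smaller than the MAE on the same data, so a large gap between the two points to a few large errors.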

SOH Estimation
As batteries age, their capacity to store energy and supply power diminishes, and the battery's state of health serves as an indicator of this deterioration. The state of health (SOH) is typically 100% when the battery is new and 0% when the battery's storage capacity or output power has declined to a certain level. The capacity and impedance of the battery are regarded as indicators of its energy and power, respectively, so measurements of the current capacity and impedance can be used to define the state of health. Here, the SOH was considered to be 0% when the battery capacity fell below 80% of its nominal capacity or when its impedance reached double the nominal impedance. The SOH denotes the battery's dependability and safety as an electrical source. A battery's health tends to deteriorate with aging, and rapid aging may reduce it drastically. Based on the capacity measured over the cycle life of a cell, the state of health was computed as a health state index. A range of continuous values was established for the initial dataset using the min-max tuples for the current (I), voltage (V), and temperature (T). After the model was trained with the training dataset, the validation dataset (test data) was used to validate it. The root-mean-square error (RMSE) was used to measure the performance of the trained model's output, and iterations were performed until the RMSE reached 0.72% or lower. We used the resulting model to predict the SOH over the prediction data and cross-checked the predictions against known values to assess the model's accuracy.
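One common way to use min-max tuples for the current, voltage, and temperature is min-max scaling, which maps each feature onto the range from 0 to 1. The sketch below assumes this reading; the text does not spell out the exact preprocessing, and the function name and sample measurements are hypothetical.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1] using its own minimum and maximum."""
    low, high = min(values), max(values)
    if high == low:  # constant feature: no spread to scale
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical measurements for the three features.
features = {
    "I": [0.5, 1.0, 2.0],      # current in A
    "V": [2.7, 3.5, 4.2],      # voltage in V
    "T": [24.0, 30.0, 38.0],   # temperature in degrees Celsius
}
scaled = {name: min_max_scale(values) for name, values in features.items()}
print(scaled)
```

Scaling each feature with its own (min, max) pair keeps features with different units on a comparable range.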
The XGBoost algorithm was verified on the prediction problem by using the data from battery 6 for testing, while the data from batteries 5 and 7 were used for training the model. To confirm the accuracy of the XGBoost state of health (SOH) estimation for lithium-ion batteries, the ensemble boosting prediction was compared with the ensemble bagging prediction on dataset 6. The SOH prediction results on dataset 6 are shown in Figure 8. Both in the estimates and in Figure 8, ensemble boosting gave a better estimate of the state of health than ensemble bagging. Ensemble boosting performed better with multiple input features directly affecting the training data, whereas ensemble bagging did not produce better results because of overfitting in the training data. We further evaluated the performance of XGBoost applied to SOH estimation based on cell 6. The prediction errors are shown in Table 6 in terms of mean absolute error (MAE) and root-mean-square error (RMSE), compared to other methods.

Table 6. Performance analysis of the estimation results using the proposed method.

| Method | MAE (%) | RMSE (%) |
|---|---|---|
| Encoder-decoder [61] | 0.79 | 1.03 |
| PSO-LSTM [62] | 1.9566 | 0.6349 |
| XGBoost | 0.001315 | 0.001929 |

Battery discharge is a time-varying process, and a battery's health evolves over an extended period, from the time of charge to the time of replacement. The state of health gradually declines with time. The results in Figure 9 show that XGBoost and the raw data agree well at the later stage, with predicted values near the real ones.

Conclusions
The XGBoost algorithm was verified on the prediction problem by using the data from battery 6 for testing, while the data from batteries 5 and 7 were used for training the model. To confirm the accuracy of the XGBoost state of health (SOH) estimation for lithium-ion batteries, the ensemble boosting prediction was compared with the ensemble bagging prediction on dataset 6, with the prediction results shown in Figure 8. In the estimation and performance figures, ensemble boosting gave a better estimate of the state of health than ensemble bagging. Ensemble boosting performed better with multiple input features directly affecting the training data, whereas the ensemble bagging method did not produce better results because of overfitting in the training data.

Future Direction
The proposed method requires a specific temperature for validation, whereas the batteries of electric vehicles operate at different temperatures in use. The algorithm needs to be tested further, and adaptive parameter techniques must be adopted to ensure that it works well across a wide range of temperatures and aging conditions. Research into the related fundamental theory is also expected to advance the methods for estimating the battery's state. Future work may also consider a machine-learning-based battery model with an intelligent monitoring system for the vehicle battery's state of health.

Funding: This research was financially supported by the Ministry of Small and Medium-sized Enterprises (SMEs) and Startups (MSS), Korea, under the "Regional Specialized Industry Development Plus Program (R&D, S3246057)" supervised by the Korea Institute for Advancement of Technology (KIAT).

Conflicts of Interest:
The authors declare no conflict of interest.