State of Charge Estimation of Lithium-Ion Battery for Electric Vehicles Using Machine Learning Algorithms

Abstract: Forecasting the state of charge (SoC) is a demanding task for the durability and reliability of battery management systems in electric vehicles. As battery degradation is usually non-linear, it is extremely cumbersome to predict the SoC with substantially less degradation. This paper presents the SoC estimation of lithium-ion battery systems using six machine learning algorithms for electric vehicle applications. The employed algorithms are artificial neural network (ANN), support vector machine (SVM), linear regression (LR), Gaussian process regression (GPR), ensemble bagging (EBa) and ensemble boosting (EBo). Error analysis of the models is carried out to optimize the battery's performance parameters. Finally, all six algorithms are compared using performance indices. ANN and GPR are found to be the best methods, with MSE and RMSE of (0.0004, 0.00170) and (0.023, 0.04118), respectively.


Introduction
The transport industry accounts for the bulk of greenhouse gas emissions and pollution to the environment [1]. The transport sector can be improved by the introduction of e-mobility applications such as electric vehicles (EVs) [2], hybrid locomotives and other battery energy storage systems [3]. The energy storage system is one of the most significant parts of EVs and smart grid technologies [4][5][6][7]. Smart grid technology is an emerging technology in electricity transmission and distribution. Numerous batteries are available on the market for various energy storage applications. Specifically, lithium-ion batteries are selected as the energy storage technology for EVs due to their high gravimetric and volumetric energy density, high efficiency and long life [8,9]. However, thermal management of batteries for EV applications is important [10]. EV charging stations are widely used internationally, and ports have been expanded at public and private charging points [11]. In Belgium, two EVs with different battery capacities were investigated; it is reported that grid utility for EVs leads to volatility in power supply, electricity quality and grid control issues [12,13]. Currently, research is ongoing to transform buildings from energy consumers into energy producers by integrating renewable energy systems into building heating, ventilation and air conditioning (HVAC). Analyses of electric vehicles, home appliances, distributed power generation and electrical storage are being carried out based on artificial intelligence to estimate the SoC [45][46][47]. Recent studies on battery charging and discharging characteristics and state estimation are listed in Table 1. The SVM algorithm has also been used to process application programming interface (API) weather data [48]. Results demonstrate a good forecast for photovoltaic panels to optimize energy production and load balance.
The distribution grid defines optimization algorithms for the assignment and operation of stationary and EV batteries. The results suggest that a significant drop in battery size and a minimal energy loss can be accomplished by EV simulation and prediction of load [49] and photovoltaic (PV) interactions [13,[50][51][52]. Besides, data-driven methods and battery performance evaluations rely not just on the choice of health metrics but also on the battery model range. The state of charge (SoC) estimation of the battery is one of the important parameters for the battery management system (BMS) in electric vehicles.

Table 1. Earlier studies on the battery charging and discharging characteristics in machine learning.

Feature Parameter                                   Battery                     Performance Index and Precision   Reference
The energy of the signal (current, voltage)         NASA 18650                  MAE < 1.29%                       [53]
Temperature (min, max, average, area)               NASA 18650                  RMSE < 3.58%                      [54]
The slope of the charging voltage curve             NASA 18650                  RMSE < 3.45%                      [55]
The slope of the discharging voltage curve          NASA 18650                  RMSE < 3.84%                      [56]
Equal voltage drops in the charging curve           NCM/graphite                RMSE 2%                           [57]
Equal voltage drops in the discharging curve        NASA 18650                  MAE < 1.29%                       [58]
The characteristic of IC curves (peak, valley)      Prismatic Li-ion battery    RMSE 2.99%                        [59]

In the present work, the SoC of lithium-ion batteries is predicted based on six machine learning algorithms using data derived from the electric vehicle BMS. The algorithms used in the studies are artificial neural network (ANN), support vector machine (SVM), linear regression (LR), Gaussian process regression (GPR), ensemble bagging (EBa) and ensemble boosting (EBo). Finally, all six algorithms are compared using performance indices.

Materials and Method
The Panasonic 18650PF battery cell (Panasonic, Zellik, Belgium) is used to generate the experimental datasets for an electric vehicle. The research equipment used to compile the experimental datasets includes the tested batteries, a host computer, battery charging and discharging equipment, a thermal chamber, and instruments for voltage, current, temperature and electrical quantities. Panasonic's 18650PF dataset [38] was compiled at McMaster University, Ontario, Canada, by the Department of Mechanical Engineering. The acquired battery datasets are trained, validated and tested using MATLAB version 2020b (MathWorks, Natick, MA 01760, USA) on a workstation computer with a 24 GB NVIDIA Quadro RTX 6000 GPU and an Intel i9 processor. MATLAB 2020b's Neural Network Toolbox, Regression Toolbox, and Statistics and Fitting Toolbox are the toolboxes used in this experiment.
First, the suggested machine learning (ML) algorithms are used, according to known partial data, to construct a predictive model of the state of charge (SoC) for forecasting the complete charging curve. The flow chart of the proposed SoC method is illustrated in Figure 1. The overall SoC diagnostics framework is based on short-term charging results. The suggested method consists of three modules: input parameters, feature extraction with machine learning (ML) algorithms, and SoC estimation. Six different algorithms are adopted in this study, namely ANN, SVM, LR, GPR, ensemble bagging and ensemble boosting.

Batteries State of Charge Estimation
The machine-learning (ML) algorithms are specifically used to create an accurate SoC estimate. All of the main sections are shown in the flowchart. The proposed SoC estimation method is validated under a wide range of battery operating conditions. The raw data are then cleaned, and the SoC features are extracted. Finally, the non-linear mapping between the input features and the SoC provides the SoC diagnosis with the aid of a serviceable model for the different algorithms: ANN, SVM, LR, GPR, ensemble bagging and ensemble boosting. The training module of each algorithm comprises the critical parameter-optimization process. The ML-based SoC estimate can be predicted from four essential parameters of a lithium-ion battery, namely battery current, battery voltage, battery capacity and battery temperature, based on the available dataset. The ML algorithms used in this study are elaborated in the following sections.

Artificial Neural Network (ANN)
Artificial neural networks (ANNs) are parallel processing approaches that can specifically describe non-linear and complex interactions using input-output training patterns. ANNs provide non-linear mapping between inputs and outputs through their intrinsic capacity. The ANNs' ability to learn system behavior from representative data enables them to solve numerous complex, large-scale problems. The ANN algorithm, shown in Figure 2, is as follows.
Step 1: Randomly initialize the weights and biases of the model.
Step 2: Apply the log-sigmoid activation function in the hidden layer. For input variable m, the j-th input-layer node holds x_m,j. The total input to the k-th node in the hidden layer is net_k = Σ_j (w_k,j · x_m,j + θ_k,j), where w_k,j is the weight and θ_k,j the bias from the input layer to the hidden layer. The hidden-layer output at the l-th node, and the total input to the l-th node in the output layer, follow analogously with the weights w_l,i and biases θ_l,j from the hidden layer to the output layer. The final output layer gives SoC_l,m = S_o(net_l), where SoC_l,m is the estimated SoC and S_o is the activation function.
Step 3: The estimated error is backpropagated from the output layer to the hidden layer, where the hidden-layer error is calculated.
Step 4: The weights and biases are updated using the weight-update equations.
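The forward pass in Steps 1-2 can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical layer sizes (4 inputs, 10 hidden neurons, 1 output, matching the setup described later in the paper) and random weights; it is not the authors' trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def logsig(z):
    # Log-sigmoid activation used in the hidden layer (Step 2)
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly initialize weights and biases
W1, b1 = rng.normal(size=(10, 4)), np.zeros(10)   # input -> hidden
W2, b2 = rng.normal(size=(1, 10)), np.zeros(1)    # hidden -> output

def predict_soc(x):
    h = logsig(W1 @ x + b1)       # hidden-layer output
    return logsig(W2 @ h + b2)    # estimated SoC in (0, 1)

# Hypothetical feature vector: voltage, current, capacity, temperature
x = np.array([3.6, -1.2, 2.7, 25.0])
soc = predict_soc(x)
print(float(soc[0]))
```

Training (Steps 3-4) would backpropagate the output error and update W1, b1, W2 and b2; only the forward mapping is shown here.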

Support Vector Machine (SVM)
The support vector machine (SVM) is a popular and commonly used soft-computing technique in many areas. The basic principle of SVMs is to map the data non-linearly into a feature space and apply a linear algorithm in that space. One form of SVM is the support vector regressor, which has been developed for regression problems. The SVM algorithm, shown in Figure 3, is as follows. An empirical equation of the proposed algorithm is presented in Table 2.
Step 1. Import the input features.
Step 2. Analyze the correlation and directivity of the data.
Step 3. Split the dataset into training and validation/test sets.
Step 4. Choose the kernel function (linear, polynomial, sigmoid, radial basis).
Step 5. Train the model with the training data.
Step 6. Evaluate the model performance.
Step 7. Test the model with the testing data.
Step 8. Calculate the performance metrics for the tested data.
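The steps above can be sketched with scikit-learn's support vector regressor. The data here are a synthetic stand-in (a toy SoC proxy derived from voltage), not the Panasonic dataset, and the RBF kernel and C, epsilon values are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Step 1: hypothetical features [voltage, current, capacity, temperature]
rng = np.random.default_rng(1)
X = rng.uniform([2.5, -5.0, 0.0, 0.0], [4.2, 5.0, 2.9, 45.0], size=(500, 4))
y = (X[:, 0] - 2.5) / (4.2 - 2.5)          # toy SoC proxy from voltage

# Step 3: split; Step 4: RBF kernel; Step 5: train
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
svr.fit(X_tr, y_tr)

# Steps 6-8: evaluate on held-out data with RMSE
rmse = mean_squared_error(y_te, svr.predict(X_te)) ** 0.5
print(round(rmse, 4))
```

Scaling the features before the RBF kernel matters here, since current and temperature span very different ranges.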

Table 2. Empirical equations of the proposed algorithm models (ANN, SVM, LR, GPR, ensemble bagging and ensemble boosting).

Linear Regression (LR)
Linear regression algorithms are applied when the output is a continuous variable, whereas classification algorithms are applied when the output is divided into classes such as pass/fail or good/average/bad. Among the different regression algorithms, LR is the most fundamental. The linear model algorithm, shown in Figure 4, is as follows. An empirical equation of the proposed algorithm is presented in Table 2.

Step 1. Get the input features.
Step 2. Analyze the correlation and directivity of the data.
Step 3. Estimate the model.
Step 4. Fit the best-fitting line.
Step 5. Evaluate the model.
Step 6. Test the model with the testing data.
Step 7. Calculate the performance metrics for the tested data.
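Step 4, fitting the best line, can be illustrated with an ordinary least-squares fit. The data are a hypothetical toy set in which SoC is assumed roughly linear in terminal voltage, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
voltage = rng.uniform(3.0, 4.2, size=200)
soc = (voltage - 3.0) / 1.2 + rng.normal(scale=0.02, size=200)  # noisy toy SoC

# Step 4: fit the best-fitting line soc = a * voltage + b via least squares
A = np.column_stack([voltage, np.ones_like(voltage)])
(a, b), *_ = np.linalg.lstsq(A, soc, rcond=None)

# Step 7: performance metric (RMSE) of the fitted line
rmse = float(np.sqrt(np.mean((A @ np.array([a, b]) - soc) ** 2)))
print(round(a, 3), round(rmse, 3))
```

The recovered slope should be close to the generating slope of 1/1.2 ≈ 0.833, and the RMSE close to the injected noise level.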

Gaussian Process Regression (GPR)
Gaussian process regression (GPR) is a non-parametric, Bayesian approach to regression that is making waves in machine learning. GPR has various advantages: it works well on small datasets and can provide predictive uncertainty measurements. The algorithm of the Gaussian process regression (GPR) model is shown in Figure 4. An empirical equation of the proposed algorithm is presented in Table 2.
Step 1. Import the input features.
Step 2. Analyze the correlation and directivity of the data.
Step 3. Split the dataset into training and validation/test sets.
Step 4. Build the Gaussian process regression model.
Step 5. Train the model with the training data.
Step 6. Evaluate the model performance.
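The predictive-uncertainty property highlighted above can be demonstrated with scikit-learn's `GaussianProcessRegressor`, which returns a standard deviation alongside each mean prediction. The voltage-to-SoC data and kernel hyperparameters here are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Small toy dataset: voltage -> SoC (hypothetical, near-linear relation)
rng = np.random.default_rng(3)
V = rng.uniform(3.0, 4.2, size=(40, 1))
soc = (V[:, 0] - 3.0) / 1.2 + rng.normal(scale=0.01, size=40)

# Step 4: build the GPR model (RBF kernel plus a noise term)
kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=1e-4)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(V, soc)

# Step 6: predict mean SoC and its standard deviation (uncertainty)
mean, std = gpr.predict(np.array([[3.6], [4.1]]), return_std=True)
print(mean.round(2), std.round(3))
```

The standard deviation is what makes confidence intervals on the SoC estimate possible, as discussed in the Results section.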

Ensemble Bagging (EBa)
Bagging is a machine-learning ensemble meta-algorithm designed to strengthen the accuracy and precision of machine learning algorithms in statistical classification and regression. It also reduces variance and helps avoid over-fitting. Bagging reduces estimation uncertainty by producing additional training data through resampling with replacement, generating different sets from the initial data. Boosting, by contrast, is an iterative strategy that adjusts the weight of each observation based on the previous classification. The bagging-trees algorithm, shown in Figure 5a, is as follows.
Step 1. for i = 1 to K, do
Step 2. Generate a bootstrap sample of the original data
Step 3. Train an unpruned tree model on this sample
Step 4. End
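These bagging steps map directly onto scikit-learn's `BaggingRegressor`: K bootstrap samples, one unpruned decision tree each. The battery-like features and SoC target below are a synthetic stand-in.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: [voltage, current, capacity, temperature] -> SoC
rng = np.random.default_rng(4)
X = rng.uniform([3.0, -5.0, 0.0, 0.0], [4.2, 5.0, 2.9, 45.0], size=(300, 4))
y = (X[:, 0] - 3.0) / 1.2

bag = BaggingRegressor(
    DecisionTreeRegressor(),  # Step 3: an unpruned tree per sample
    n_estimators=50,          # Step 1: K = 50 rounds
    bootstrap=True,           # Step 2: sample with replacement
    random_state=0,
).fit(X, y)

pred = float(bag.predict(X[:1])[0])
print(round(pred, 2))
```

Averaging the 50 trees is what reduces the variance of any single unpruned tree.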



Ensemble Boosting (EBo)
Boosting is a sequential ensemble process that reduces the bias error and generally produces good predictive models. The algorithm assigns a weight to each resulting model during training, as shown in Figure 5b. The boosting-trees algorithm and the empirical equation of the proposed algorithm are presented in Table 2.
Step 1. Set f̂(x) = 0 and r_i = y_i for all i in the training set
Step 2. Compute the average response, ȳ, and use this as the initial predicted value
Step 3. for i = 1 to K, do
Step 4. Fit a tree f̂_i(x) with d splits (d + 1 terminal nodes) to the training data
Step 5. Update f̂(x) by adding in a shrunken version of the new tree:
Step 6. f̂(x) ← f̂(x) + λ f̂_i(x)
Step 7. Update the residuals: r_i ← r_i − λ f̂_i(x)
Step 8. End
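The boosting loop above can be written out explicitly: fit a small tree to the current residuals, shrink it by λ, add it to the running model, and update the residuals. The data, K and λ below are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: voltage -> SoC
rng = np.random.default_rng(5)
X = rng.uniform(3.0, 4.2, size=(300, 1))
y = (X[:, 0] - 3.0) / 1.2

K, lam = 100, 0.1
pred = np.full_like(y, y.mean())   # Steps 1-2: start from the average response
r = y - pred                       # initial residuals

for _ in range(K):                                        # Step 3
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)   # Step 4: small tree
    update = lam * tree.predict(X)
    pred += update                                        # Steps 5-6: shrunken update
    r -= update                                           # Step 7: new residuals

rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(round(rmse, 3))
```

Each round removes a λ-sized fraction of the structure still left in the residuals, which is why boosting attacks bias where bagging attacks variance.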

Training and Testing Datasets
The dataset is split into training, validation and testing sets. The training set consists of 43,355 data values, which are split into training and validation subsets in the ratio of 80% to 20%, respectively. The training set is used to train the model, and finally the testing set is used to test the performance. The data splitting is shown in Table 3. The experiments are carried out at a constant chamber temperature of 25 °C; an increase or decrease in temperature affects the performance of the battery.
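The 80/20 split described above can be reproduced with a standard utility; the feature and label arrays here are placeholders with the stated sample count of 43,355.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the real feature matrix and SoC labels
X = np.zeros((43_355, 4))
y = np.zeros(43_355)

# 80% training / 20% validation, as in the paper
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_tr), len(X_val))
```

Shuffling before the split (the default) avoids training and validation subsets drawn from different parts of a drive cycle.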


All the obtained data are normalized to reduce fluctuation in training and to speed up the training time [38]. The Bayesian optimization algorithm optimizes the models through hyperparameter tuning of the proposed machine learning methods. Typically, this algorithm requires more time, but it can result in good generalization for difficult, small or noisy datasets; training stops according to adaptive weight minimization (regularization). The Levenberg-Marquardt backpropagation algorithm takes more memory but less time, and training stops automatically when generalization stops improving, as shown by a rise in the mean square error of the validation samples.
The Bayesian optimization in the neural network provides the optimal neurons and number of layers best fitted for the obtained dataset by considering its RMSE value. In ensemble boosting and bagging machine learning, the Bayesian optimization algorithm plays a major role in selecting the tree's depth to fine-tune the model. As defined in Section 2, the artificial neural network is one of the main algorithms in machine learning.


Performance Metrics
To evaluate the predicted SoC of the adopted models, we need to compare the predicted SoC with the actual experimental SoC results. The performance is therefore assessed using the following metrics [60].

Root Mean Square Error (RMSE)
The root mean square error is simply the square root of the mean of the squared errors. RMSE is a good measure of accuracy, but it is only applicable for comparing model predictions with data, not between variables.

R-Squared (R2)
This is a statistical indicator that describes the proportion of the variance in the dependent variable that is explained by the independent variables.
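Both metrics are a few lines of arithmetic; the SoC values below are toy numbers chosen only to exercise the formulas.

```python
import numpy as np

y_true = np.array([0.10, 0.35, 0.60, 0.80, 0.95])   # actual SoC (toy values)
y_pred = np.array([0.12, 0.33, 0.58, 0.83, 0.94])   # predicted SoC

# RMSE: square root of the mean of the squared errors
rmse = float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# R^2: 1 minus residual sum of squares over total sum of squares
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = float(1.0 - ss_res / ss_tot)

print(round(rmse, 4), round(r2, 4))
```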

Results and Discussion
The neural network model is a learning-prediction method in battery management systems for EVs: it learns the SoC relationship from the charging and discharging process data and then uses it to predict the real-time SoC relationship under realistic operating conditions. As seen in Figure 8a, each charging and discharging count time is fed into the ANN (input size = 4, hidden size = 10), and the data function then becomes 4. According to the predicted model, the ANN hidden size can be modified: the larger the hidden size, the more accurate the model, at greater computational effort. After creating the artificial neural network model, the three features flow into the linear layer and are transformed into one feature. This section analyses the efficiency of the ML SoC forecast model.

Figure 8a shows the predicted SoC estimation of the support vector machine and artificial neural network against the expected state of charge. The neural network has a better prediction rate than the support vector machine due to its capability to handle non-linear data. Figure 8b shows the error plot of the neural network and support vector machine for predicted and actual SoC. The overall performance assessment results of the SoC measurement methods under four conditions in various modes of electric vehicle operation are shown in Figure 8a. Figure 9a compares the predicted SoC values of Gaussian process regression and linear regression against the actual SoC estimation value. The Gaussian process regression has a clear advantage over the linear regression.
The Gaussian process can give the most reliable prediction of its uncertainty. However, the GPR requires more training time than the linear regression, as it uses the entire training dataset for training. The error analysis of the SoC is shown in Figure 9b. The overall performance assessment results of the SoC measurement methods under four conditions in various electric vehicle operation modes are shown in Figure 9a.

The ensemble boosting prediction gives a better SoC estimation than ensemble bagging. Ensemble boosting has the characteristic of working better with multiple input features directly affecting the training data. Ensemble bagging does not show better results due to overfitting of the training data. The error plot of the proposed method is shown in Figure 10b. Another notable detail in Figure 10 is that the ensemble bagging and boosting systems achieved low efficiency with great fluctuations and error. This is presumably because the temporal dependence between historical measurements and the SoC is not considered in the ensemble process. Finally, the learned model is obtained by repeating the learning process until the error is within a reasonable range. The blue lines and red lines are the projected SoC values and the true SoC values, and the light-blue regions are the confidence intervals of the estimated SoC values.

Table 4 outlines the performance analysis of the different proposed machine learning algorithms. Table 4 and Figures 8, 9 and 10a,b show that the proposed GPR approach achieves good efficiency with 85% mean absolute error (MAE), outperforming all other methods. The suggested GPR-linear approach reduces the MAE by 51% and 50%, respectively. This may be because the GPR kernel is capable of capturing the dynamic time structures of sequential data. In comparison, the GPR-linear approach produces a better prediction outcome than the SVM-ANN, with a 10% reduction in the MAE.

The experimental findings demonstrate that the proposed approach can accomplish SoC prediction under varying ambient temperature conditions with several network parameters. The standard SVM-ANN approach and the suggested ensemble-trees method obtained the lower MAE and variance accounted for (VAF), as seen in Table 4 and Figure 11. One particular benefit of the proposed approach is that it can provide confidence intervals for the SoC calculations and infer the volatility of the SoC estimation values. This is critical for evaluating the volatility of the forecasts and thus offers more insightful performance.

For model training, different numbers of neurons in the ANN model are used. The RMSE of the SoC estimation and the time taken to evaluate the ML model's performance are calculated. The calculation time increases considerably with the number of neurons in the neural network. As the number of neurons reaches 10, RMSE and MSE eventually converge to a stable value. If the number of neurons is higher than 15, an increase in neurons does not greatly improve the performance of the calculation but costs measuring time. These findings demonstrate that the ML model does not overfit, because the RMSE converges to a small value.
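The neuron-count sweep described above can be sketched by training small MLPs with increasing hidden sizes and recording the RMSE. The toy voltage-to-SoC data and solver settings are illustrative, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical toy data: centered voltage feature -> SoC
rng = np.random.default_rng(6)
v = rng.uniform(3.0, 4.2, size=(400, 1))
X = v - 3.6                      # center the input for stable training
y = (v[:, 0] - 3.0) / 1.2

rmses = {}
for n in (2, 5, 10, 15, 20):     # sweep the hidden-layer size
    mlp = MLPRegressor(hidden_layer_sizes=(n,), solver="lbfgs",
                       max_iter=2000, random_state=0).fit(X, y)
    rmses[n] = mean_squared_error(y, mlp.predict(X)) ** 0.5

print({n: round(r, 4) for n, r in rmses.items()})
```

On data like this the RMSE stabilizes well before 15 neurons, mirroring the observation that adding neurons beyond that point mostly costs computation time.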


Conclusions
Prediction of the lithium-ion battery SoC plays a vital role in the battery management system and the electric vehicle's performance. In this work, the battery SoC is predicted based on six machine learning algorithms: artificial neural network (ANN), support vector machine (SVM), linear regression (LR), Gaussian process regression (GPR), ensemble bagging and ensemble boosting. With the proposed machine learning models, the non-linear mapping of input features such as voltage and current to the SoC estimate is analyzed. Machine learning algorithms are selected for estimating the battery SoC because they handle non-linear data well. Moreover, the proposed method can be used for real-time SoC estimation after optimizing the GPR-linear model hyperparameters. With 85% MAE, the proposed ANN and GPR approaches achieve strong performance and outperform the other methods. This could be because the GPR kernel can extract the complex time structures of sequential data. In contrast, the GPR-linear approach performs better than the SVM-ANN, with a 10% decrease in the MAE. We conclude that the proposed ANN- and GPR-based method further encourages improvement in the SoC estimate because it provides a probability distribution rather than a point estimate. The optimized features input into the machine learning model predict the battery state of charge, which will help stakeholders and researchers identify the best battery for specific applications. ANN and GPR will help design the optimum battery management system for electric vehicles based on SoC predictions.