Optimizing Electric Vehicle Battery Life: A Machine Learning Approach for Sustainable Transportation

Karthick, K.; Ravivarman, S.; Priyanka, R.

doi:10.3390/wevj15020060

Open Access Article


by K. Karthick ¹,*, S. Ravivarman ² and R. Priyanka ³

¹ Department of Electrical and Electronics Engineering, GMR Institute of Technology, Rajam 532127, Andhra Pradesh, India

² Department of Electrical and Electronics Engineering, Vardhaman College of Engineering, Hyderabad 501218, Telangana, India

³ Department of Electrical and Electronics Engineering, S.A. Engineering College, Chennai 600077, Tamil Nadu, India

* Author to whom correspondence should be addressed.

World Electr. Veh. J. 2024, 15(2), 60; https://doi.org/10.3390/wevj15020060

Submission received: 29 December 2023 / Revised: 5 February 2024 / Accepted: 6 February 2024 / Published: 9 February 2024

(This article belongs to the Special Issue Propulsion Systems of EVs 2.0)


Abstract

Electric vehicles (EVs) are becoming increasingly popular, due to their beneficial environmental effects and low operating costs. However, one of the main challenges with EVs is their short battery life. This study presents a comprehensive approach for predicting the Remaining Useful Life (RUL) of Nickel Manganese Cobalt-Lithium Cobalt Oxide (NMC-LCO) batteries. This research utilizes a dataset derived from the Hawaii Natural Energy Institute, encompassing 14 individual batteries subjected to over 1000 cycles under controlled conditions. A multi-step methodology is adopted, starting with data collection and preprocessing, followed by feature selection and outlier elimination. Machine learning models, including XGBoost, BaggingRegressor, LightGBM, CatBoost, and ExtraTreesRegressor, are employed to develop the RUL prediction model. Feature importance analysis aids in identifying critical parameters influencing battery health and lifespan. Statistical evaluations reveal no missing or duplicate data, and outlier removal enhances model accuracy. Notably, XGBoost emerged as the most effective algorithm, providing near-perfect predictions. This research underscores the significance of RUL prediction for enhancing battery lifecycle management, particularly in applications like electric vehicles, ensuring optimal resource utilization, cost efficiency, and environmental sustainability.

Keywords:

electric vehicle; battery; remaining useful life; machine learning; regression; prediction model


1. Introduction

Electric vehicles (EVs) predominantly utilize lithium-ion (Li-ion) batteries, chosen for their exceptional characteristics, including high energy density, absence of memory effect, extended lifespan, and versatility in charging and discharging [1]. Despite these advantages, the automotive industry faces challenges from dynamic weather conditions, increased air pollution due to vehicle emissions, and uncertainties in renewable energy supply chains [2].

The energy stored in EV batteries offers a promising solution to environmental concerns and uncertainties. Decarbonizing the transportation sector depends on advancements in, and the widespread adoption of, EVs with enhanced range, safety, and reliability. Yet, the use of Li-ion batteries presents obstacles, including capacity degradation, environmental implications, and challenges in end-of-life management [3].

The concept of Remaining Useful Life (RUL) is pivotal in predictive maintenance and reliability engineering, representing the estimated time or usage before a component, device, or system is anticipated to either fail or no longer meet its operational criteria [4]. In the EV battery context, RUL is commonly expressed as the number of charge/discharge cycles remaining before that point, and machine learning algorithms can predict it from multiple measured factors.

After approximately 6.5 years of consistent operation, an EV battery’s capacity typically decreases by about 10%, which presents a significant challenge [5]. Predicting RUL and monitoring capacity degradation are complex tasks, especially considering Li-ion batteries’ gradual capacity decline over charge and discharge cycles [6]. These tasks fall under the domain of battery management systems (BMSs).

Accurately forecasting the intricate and non-linear trajectory of battery capacity degradation is essential. Machine learning (ML) offers substantial advantages in predicting EV battery life, thus facilitating efficient trip planning for owners and aiding manufacturers in designing longer-lasting batteries and optimal charging methods [7,8,9,10].

Given the non-linear and multifaceted factors affecting battery performance, ML methodologies are invaluable for addressing the engineering challenges related to battery degradation. ML techniques can overcome scalability and time limitations, providing precise, non-invasive solutions.

The electrification of transportation infrastructure is pivotal, as it addresses the dual imperatives of sustainable energy and cost-effective mobility. This study seeks to establish a robust and accurate methodology for anticipating EV battery life, benefiting both manufacturers and owners, and fostering global sustainable development.

Problem Definition

The primary objective of this study is to forecast the RUL of lithium-ion (Li-ion) batteries, an essential task with significant real-world applications. Predicting RUL is pivotal for industries that heavily rely on Li-ion batteries, as it facilitates proactive maintenance strategies and efficient resource allocation. To achieve this, our dataset encompasses critical features, such as the cycle index, discharge time, and maximum voltage discharge. The target variable, RUL, signifies the battery's remaining operational lifespan; learning its relationship with these features enables the creation of a robust and precise prediction model.

2. Related Work

J.-H. Chou et al. [11] tackle the challenging task of predicting the RUL of lithium-ion batteries. They propose a hybrid method based on transfer learning, integrating empirical mode decomposition, support vector regression, and bidirectional long short-term memory with attention mechanism models. This approach significantly enhanced RUL prediction accuracy, particularly for batteries with higher cycle numbers, achieving relative errors of 6.96%, 0.6%, and 6.25% for target batteries under different charging policies.

J. Zhao et al. [12] address the challenges of predicting battery capacity for electric vehicles (EVs). Utilizing feature-based machine learning on a dataset comprising 420 cells and 9 battery packs, they design a two-step noise reduction method and employ a stacking ensemble learning approach. Their models achieve a Mean Absolute Percentage Error (MAPE) of 0.28% and a Root Mean Square Percentage Error (RMSPE) of 0.55% for capacity estimation, with an average error of 1.22% in predicting RUL. This study contributes to accurate and physically consistent predictions within the intricate context of EV battery systems.

C. Zoerr et al. [13] focus on lithium plating in lithium-ion batteries during fast charging in embedded systems. They introduce a novel charging procedure, based on the correlation between negative electrode polarization and anode potential, that effectively mitigates lithium plating risks. Validated under various conditions, the approach relies on anode potential regulation derived from a Newman-type pseudo-two-dimensional (P2D) modeling framework and shows a significant reduction in the risk of lithium plating.

D. A. Najera-Flores et al. [14] present an end-to-end deep learning framework for rapid lithium-ion battery RUL prediction. By emphasizing temporal patterns and cross-data correlations in raw data such as terminal voltage, current, and cell temperature, the approach produces predictions 25 times faster, together with a 10.6% improvement in mean absolute error.

D. Zapata Dominguez et al. [15] investigate the influence of manufacturing processes on graphite blend electrodes with silicon nanoparticles for lithium-ion batteries (LiBs). By correlating input and output parameters to discern how the properties of silicon/graphite blends interrelate, the study sheds light on viscosity, slurry rheology, and porosity thresholds, as well as their effects on electrode stability, ionic resistance, and cycling life.

G. Zhao et al. [16] highlight the escalating importance of lithium-ion battery health in transportation electrification. They propose an innovative approach that integrates Gaussian process regression, transfer learning, and gated recurrent neural network techniques to predict RUL. This method optimizes health indicators, implements online model correction, and introduces a self-correction strategy, thus enhancing accuracy beyond traditional methods, which is crucial for predictive maintenance in battery management.

M. Soltani et al. [17] investigate the degradation behavior and end-of-life prediction of lithium titanate oxide (LTO) batteries. The study explores temperature, current rate, and cycle depth impacts on capacity degradation and cycle life, employing a feed-forward neural network model for accurate health state and end-of-life predictions. Their research underscores factors affecting LTO battery performance and lifespan, revealing accelerated degradation under high temperatures and current rates, with cycle depth significantly influencing cycle life.

A. B. Çolak [18] examines the impacts of road gradient and coolant flow on electric vehicle battery-powered electronic components using a machine learning approach. The study emphasizes the pivotal role of data quantity in enhancing predictive accuracy for artificial neural networks (ANNs), suggesting that adequate data are paramount for optimal performance, while acknowledging the computational resources required for training on larger datasets.

X. Guo et al. [19] introduce a novel CEEMDAN-CNN BiLSTM approach for predicting the RUL of lithium-ion batteries. By merging Complete Ensemble Empirical Mode Decomposition with Adaptive Noise, 1D CNN, and BiLSTM, this model surpasses baseline models on the NASA battery dataset. Its robustness to noisy, non-stationary data positions it as a promising tool for bolstering battery management systems, ensuring safe operations, and prolonging battery lifespan.

D. Li et al. [20] propose a novel approach for predicting battery thermal runaway faults in electric vehicles (EVs) using abnormal heat generation (AHG) and deep learning algorithms. Their model, trained on diverse AHG profiles, accurately forecasts the time to thermal runaway, paving the way for preventive measures that enhance EV safety.

In our proposed research, we evaluated and compared the performance of various machine learning algorithms, including XGBoost, BaggingRegressor, LightGBM, CatBoost, and ExtraTreesRegressor. This comprehensive analysis provides insights into the suitability of different algorithms for predicting the RUL of Li-ion batteries. Our analysis delves deeper into scalability concerns across various battery types and conditions. This research considers multiple battery health indicators, such as discharge time, the time taken for the voltage to decrease from 3.6 V to 3.4 V, maximum discharge voltage, minimum charging voltage, time at 4.15 V, constant current time, and charging time. Analyzing the relationships between these indicators and RUL provides a holistic understanding of the factors influencing battery longevity. This research also incorporates hyperparameter tuning, using the GridSearchCV method, to optimize the performance of the machine learning models, since these settings are fixed before training rather than learned from the data and can substantially affect the accuracy of RUL predictions.

3. Materials and Methods

Figure 1 presents the block diagram of the proposed RUL prediction model. The process commences with data collection and preprocessing, which is succeeded by feature selection and outlier removal. The GridSearchCV method is employed for hyperparameter optimization using five-fold cross validation. The XGBoost, BaggingRegressor, LightGBM, CatBoost, and ExtraTreesRegressor machine learning algorithms are utilized to develop the RUL prediction model for predicting a battery’s RUL. Subsequently, regression performance metrics are employed to evaluate the model’s effectiveness. Finally, a feature importance analysis is conducted to identify the most influential features within the dataset.

3.1. Data

The dataset originates from a study conducted by the Hawaii Natural Energy Institute, focusing on 14 individual Nickel Manganese Cobalt-Lithium Cobalt Oxide (NMC-LCO) 18650 cells, each with a nominal capacity of 2.8 Ah. The designation “18650” denotes a standard cylindrical cell size, approximately 18 mm in diameter and 65 mm in length.

These batteries underwent an intensive cycling regimen of over 1000 cycles at a controlled temperature of 25 °C. The charging and discharging protocols were standardized: the cells were charged using a constant current–constant voltage (CC-CV) protocol with a C/2 constant-current phase, i.e., a current numerically equal to half the rated capacity (1.4 A for a 2.8 Ah cell), and discharged at 1.5 C, i.e., a current of 1.5 times the rated capacity (4.2 A).
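The C-rate arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the 2.8 Ah capacity and the C/2 and 1.5 C rates come from the text, while the helper names are hypothetical, and the "ideal duration" deliberately ignores the current taper of the constant-voltage phase in a real CC-CV charge.

```python
# C-rate arithmetic for the 2.8 Ah cells described in the text.
# Helper names are illustrative only.

NOMINAL_CAPACITY_AH = 2.8  # rated capacity of each 18650 cell


def c_rate_to_current(c_rate: float, capacity_ah: float = NOMINAL_CAPACITY_AH) -> float:
    """Return the constant current (A) corresponding to a C-rate."""
    return c_rate * capacity_ah


def ideal_duration_h(c_rate: float) -> float:
    """Ideal time (h) to move the full rated capacity at a constant C-rate.

    A real CC-CV charge takes longer, because the current tapers off
    during the constant-voltage phase.
    """
    return 1.0 / c_rate


charge_current = c_rate_to_current(0.5)     # C/2 constant-current phase
discharge_current = c_rate_to_current(1.5)  # 1.5 C discharge

print(f"Charge:    {charge_current:.2f} A, ideal duration {ideal_duration_h(0.5):.2f} h")
print(f"Discharge: {discharge_current:.2f} A, ideal duration {ideal_duration_h(1.5):.2f} h")
```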

To extract meaningful insights and facilitate predictive modeling, specific features were derived from the original dataset. These features highlight the voltage and current behaviors observed throughout each battery cycle [21]. The meticulously chosen features provide essential information, aiming to effectively forecast the RUL of these batteries, a pivotal metric for assessing battery health. NMC-LCO batteries are commonly used in various applications, including electric vehicles. The chemistry and behavior of these batteries in the dataset could provide insights into the degradation patterns and lifespans of similar batteries used in EVs. The batteries in the dataset underwent over 1000 cycles, which is analogous to the kind of cycle life testing EV batteries would undergo. This extensive cycling provides a rich dataset for understanding how these batteries degrade over time and cycles.

Table 1 describes the features of the RUL dataset, which was used to develop the prediction model. Table 2 provides statistical descriptions for various features in the dataset. It can be observed that some time entries show negative values, indicating that there could have been errors or anomalies during the data collection process.
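A summary table like Table 2, and a check for the negative time entries mentioned above, can be produced with pandas. The sketch below uses a hypothetical four-row frame with round, made-up values and placeholder column names (not necessarily those of the original data file); the only assumption about the real data is that every column ending in `_s` is a duration in seconds and should therefore be non-negative.

```python
import pandas as pd

# Hypothetical miniature of the RUL table with made-up values;
# column names are placeholders.
df = pd.DataFrame({
    "discharge_time_s": [2500.0, 7400.0, -100.0, 6900.0],
    "time_at_4_15v_s": [5400.0, 5500.0, 5500.0, -1.0],
    "charging_time_s": [6700.0, 10700.0, 10500.0, 10400.0],
    "rul": [1112, 1111, 1110, 1107],
})

# Per-feature summary statistics in the spirit of Table 2.
summary = df.describe().T[["mean", "std", "min", "max"]]
print(summary)

# Durations cannot be negative, so count implausible entries per time column.
time_cols = [c for c in df.columns if c.endswith("_s")]
negative_counts = (df[time_cols] < 0).sum()
print(negative_counts)
```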

3.2. Data Preprocessing

The data were examined for missing and duplicate values, and none were found. Ensuring that there are no missing or duplicate records helps to maintain the integrity of the dataset: missing values can lead to biased analyses or inaccurate predictions, especially in machine learning models [22], while removing duplicates ensures that each record in the dataset is unique [23]. This is particularly important when conducting statistical analyses or training machine learning models, as redundant data can affect the efficiency and accuracy of these processes. After the preprocessing stage, the dataset contained 15,064 instances and 8 features, excluding the target feature.
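The missing-value and duplicate checks can be expressed with standard pandas calls. This is a minimal sketch on a synthetic two-column frame that deliberately contains one missing value and one duplicated row, so that the checks have something to find; the column names are placeholders, and dropping the offending rows is shown only as one common remedy, since the paper's data needed none.

```python
import pandas as pd


def check_integrity(df: pd.DataFrame) -> dict:
    """Count missing cells and exact duplicate rows."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "shape": df.shape,
    }


# Synthetic frame: row 2 duplicates row 1, and row 3 has a missing value.
toy = pd.DataFrame({
    "max_voltage_discharge_v": [3.67, 4.01, 4.01, None],
    "min_voltage_charging_v": [3.21, 3.51, 3.51, 3.49],
})
report = check_integrity(toy)
print(report)

# One common remedy if problems are found: drop incomplete and repeated rows.
clean = toy.dropna().drop_duplicates().reset_index(drop=True)
print(clean.shape)
```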

3.3. Feature Selection

Figure 2 displays the correlation heatmap of the RUL dataset. The heatmap reveals that the RUL and cycle index have a correlation coefficient of −1. Additionally, the maximum voltage at discharge correlates with RUL at 0.78, while the minimum voltage during charging has a correlation of −0.76 with RUL. Other features exhibit low correlations with the target variable. The cycle index and RUL are inversely related: for instance, when the cycle index is 1, the RUL is 1112; conversely, when the RUL is 1, the cycle index is 1112; and when the cycle index is 0, the RUL is 1113. In these examples the two values always sum to 1113, so the RUL is a linear function of the cycle index.

This inverse correlation suggests that the cycle index alone might not provide actionable insights about the actual health or remaining life of the battery, as its value could be misleading. If the model is trained with the cycle index as a feature, it might inadvertently learn this inverse relationship too strongly, leading to overfitting. Overfitting occurs when a model learns the noise or random fluctuations in the training data, reducing its ability to generalize to new, unseen data [24]. By relying heavily on the cycle index, the model might not perform well on real-world data, for which this inverse relationship may not hold or other factors may play significant roles. To mitigate this, the decision was made to exclude the cycle index feature from the model. This decision also considers potential external factors or material degradation that could influence RUL. If the model becomes overly dependent on the cycle index, it may not account for these other important variables or external influences, leading to inaccurate predictions. After the feature selection process, the dataset contained 15,064 instances and 7 features, excluding the target feature.
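The effect described above can be reproduced on synthetic data. In this sketch (not the HNEI data), a single battery's RUL is defined as 1113 minus the cycle index, matching the examples quoted in the text, and a made-up `health_indicator` column drifts with ageing plus noise. The Pearson correlation of the cycle index with RUL comes out as exactly −1, and the column is then dropped, as in the feature selection step.

```python
import numpy as np
import pandas as pd

# Synthetic single-battery illustration: RUL = 1113 - cycle_index.
rng = np.random.default_rng(0)
cycle_index = np.arange(0, 1113)
rul = 1113 - cycle_index
# Made-up health indicator that drifts with ageing and carries noise.
health_indicator = 5500 - 3.0 * cycle_index + rng.normal(0, 150, cycle_index.size)

df = pd.DataFrame({"cycle_index": cycle_index,
                   "health_indicator": health_indicator,
                   "rul": rul})

# Pearson correlation of every column with the target (one heatmap column).
corr_with_rul = df.corr()["rul"]
print(corr_with_rul.round(3))

# RUL is an exact linear function of cycle_index, so their correlation is -1.
# Dropping cycle_index forces a model to learn from measured indicators.
X = df.drop(columns=["cycle_index", "rul"])
y = df["rul"]
print(list(X.columns))  # ['health_indicator']
```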

3.4. Outlier Removal

Figure 3 presents data that contain outliers, focusing on various features related to battery performance and health. Outliers can significantly skew statistical measures, such as the mean and standard deviation, making them less representative of the typical behavior of the dataset. Outliers can distort predictive models, leading to less accurate predictions. Models trained on datasets with outliers might generalize poorly to new, unseen data [25]. By removing outliers, the performance of predictive models can be enhanced, ensuring they provide more precise estimations of RUL. Outliers can introduce noise, making it challenging to interpret patterns, trends, or anomalies in the data [26]. Outliers might not represent the typical behavior of the battery lifecycle or performance. Figure 4 showcases the same metrics as Figure 3, on the same dataset, in which outliers have been removed. The following is the process involved in outlier removal.

Step 1: Calculate the Interquartile Range (IQR):

The IQR is a measure of statistical dispersion and is computed as the difference between the third quartile (Q3) and the first quartile (Q1). Mathematically,

IQR = Q3 − Q1.

Step 2: Define the lower and upper bounds:

Using the IQR, lower and upper bounds are defined to identify outliers. The lower bound is calculated as Q1 − 1.5 × IQR, and the upper bound as Q3 + 1.5 × IQR. Any data point below the lower bound or above the upper bound is considered an outlier.

Step 3: Filter outliers:

After determining the lower and upper bounds for each column, the dataframe (df) is filtered to retain only the values in each column that lie within the calculated bounds. The dropna() method is then used to remove any rows containing NaN (missing) values arising from this filtering, so that every row with an outlier in any column is discarded. After the outlier removal process, the number of instances became 14,445.
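The three steps can be sketched in pandas as follows. This is one plausible implementation, not the authors' code: which columns the filter is applied to in the original study is not stated, so here it is applied to every column of the frame passed in. Masking a DataFrame with a Boolean DataFrame replaces out-of-range cells with NaN, which is why a final `dropna()` is needed; note that it would also drop rows that were already incomplete.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose values lie in [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    q1 = df.quantile(0.25)       # Step 1: quartiles and IQR per column
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr         # Step 2: per-column bounds
    upper = q3 + k * iqr
    # Step 3: Boolean masking turns out-of-range cells into NaN ...
    masked = df[(df >= lower) & (df <= upper)]
    # ... so dropping rows with any NaN removes every row holding an outlier.
    return masked.dropna()


toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0],
                    "b": [10.0, 11.0, 12.0, 13.0, 14.0]})
filtered = remove_iqr_outliers(toy)
print(filtered)
```

For column `a`, Q1 = 2 and Q3 = 4, so the bounds are −1 and 7 and the row containing 100 is removed, while all values of `b` fall inside its bounds.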

Table 3 provides a comparative analysis of skewness and kurtosis values for various battery performance features, both before and after outlier removal. Prior to outlier removal, features such as Discharge Time, Decrement 3.6–3.4 V, Time at 4.15 V, Time Constant Current, and Charging Time had high skewness and kurtosis values, indicating highly right-skewed distributions with heavy tails and pronounced peaks. After outlier removal, the skewness values generally moved closer to zero, or became negative. A skewness value near zero signifies a more symmetrical distribution [27]; consequently, the reduced or negative skewness after removal implies that the feature distributions became more balanced and symmetrical. Similarly, the kurtosis values were predominantly reduced in the ‘After Removal of the Outliers’ column, relative to their initial counterparts, suggesting a decrease in the peakedness or tail heaviness of the data. A negative excess kurtosis value indicates a distribution with lighter tails than the normal distribution, whereas a positive value suggests heavier tails [28]. The decreased kurtosis values therefore show that, after outlier removal, the feature distributions are less peaked and have lighter tails, aligning more closely with a normal distribution or displaying fewer extreme values. After removing the outliers, we used 14,445 instances with 7 features, excluding the target variable ‘RUL’, for ML model development.
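The before/after comparison in Table 3 can be computed with pandas' `skew()` and `kurt()`; the latter returns excess kurtosis, which is 0 for a normal distribution. The feature below is synthetic (mostly moderate values plus a few extreme ones) and only illustrates the direction of the effect, not the paper's numbers.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed feature: a normal bulk plus 1% extreme values.
rng = np.random.default_rng(42)
values = np.concatenate([rng.normal(5000, 300, 990), rng.normal(50000, 5000, 10)])
before = pd.Series(values, name="feature")

# Same 1.5 x IQR rule as in the outlier removal steps.
q1, q3 = before.quantile([0.25, 0.75])
iqr = q3 - q1
after = before[(before >= q1 - 1.5 * iqr) & (before <= q3 + 1.5 * iqr)]

# pandas reports sample skewness and *excess* kurtosis.
summary = pd.DataFrame({
    "skewness": [before.skew(), after.skew()],
    "excess_kurtosis": [before.kurt(), after.kurt()],
}, index=["before removal", "after removal"])
print(summary.round(2))
```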

3.5. Machine Learning Model Development

3.5.1. Data Splitting

The RUL dataset was divided into a training set and a test set, using an 80:20 ratio. Specifically, 11,556 instances were allocated for training, while 2889 instances were set aside for testing. The features used to develop the RUL prediction model included the following: Discharge Time (s), Decrement 3.6–3.4 V (s), Maximum Voltage Discharge (V), Minimum Voltage Charging (V), Time at 4.15 V (s), Time Constant Current (s), and Charging Time (s). This data splitting is crucial for evaluating the model’s performance across various scenarios, enhancing its reliability and applicability in real-world settings [29].
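With scikit-learn, an 80:20 split of 14,445 rows yields exactly the 11,556/2,889 partition reported above. The sketch below uses random stand-in data of the same shape; the snake_case column names are placeholders for the seven features listed, and `random_state=42` is an assumption, as the paper does not report a seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

FEATURES = [  # placeholders for the seven predictors listed above
    "discharge_time_s", "decrement_3_6_3_4v_s", "max_voltage_discharge_v",
    "min_voltage_charging_v", "time_at_4_15v_s", "time_constant_current_s",
    "charging_time_s",
]

# Random stand-in data with the shape of the cleaned dataset (14,445 x 7).
rng = np.random.default_rng(0)
X = rng.normal(size=(14_445, len(FEATURES)))
y = rng.integers(1, 1113, size=14_445)

# 80:20 split; the seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (11556, 7) (2889, 7)
```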

3.5.2. ML Model Selection

Model selection is a critical phase that involves choosing the most suitable machine learning algorithm or model architecture tailored for a specific task. Experimenting with multiple models and assessing their performance on a test set is essential for making an informed choice [30]. Model training, a crucial step in the development process, involves the model learning from training data and adjusting parameters for accurate predictions.

Advanced ML algorithms, namely XGBoost, BaggingRegressor, LightGBM, CatBoost, and ExtraTreesRegressor were evaluated to determine the most appropriate one for the RUL prediction model.

XGBoost

XGBoost, or Extreme Gradient Boosting [31], is a powerful machine learning algorithm known for its efficiency in regression tasks. Its primary strength lies in its ability to handle complex datasets while limiting overfitting. Because its loss function includes a regularization term that penalizes model complexity, each boosting step balances goodness of fit against complexity, which supports accurate RUL prediction, a task for which precision is crucial.

XGBoost builds its estimate additively: the prediction after stage t is the sum of the outputs of the first t learners, as given in Equation (1).

\hat{y}_{i}^{(t)} = \sum_{k=1}^{t} f_{k}(x_{i}) = \hat{y}_{i}^{(t-1)} + f_{t}(x_{i})

(1)

where,

$\hat{y}_{i}^{(t)}$ = forecast for instance i at stage t
$f_{t}(x_{i})$ = output of the learner added at stage t
$x_{i}$ = the input variables of instance i
$\hat{y}_{i}^{(t-1)}$ = forecast for instance i at stage t − 1
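The additive update in Equation (1) can be illustrated with plain gradient boosting for squared loss, using scikit-learn decision trees as the learners. This is a simplified sketch, not XGBoost itself: XGBoost additionally uses second-order gradient information and regularization. Each new tree is scaled by a learning rate (XGBoost's `eta`), and Equation (1) corresponds to a learning rate of 1. The `predict` function evaluates the sum form of Equation (1), which matches the stage-by-stage recursion.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost(X, y, n_stages=50, learning_rate=0.1, max_depth=3):
    """Plain gradient boosting for squared loss (a simplified stand-in for XGBoost)."""
    base = float(y.mean())                     # stage-0 prediction
    y_hat = np.full(y.shape, base)
    learners = []
    for _ in range(n_stages):
        residuals = y - y_hat                  # negative gradient of 0.5 * (y - y_hat)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        y_hat = y_hat + learning_rate * tree.predict(X)  # y^(t) = y^(t-1) + eta * f_t(x)
        learners.append(tree)
    return base, learners, y_hat


def predict(base, learners, X, learning_rate=0.1):
    """Sum form of Equation (1): base + eta * sum_k f_k(x)."""
    return base + learning_rate * sum(tree.predict(X) for tree in learners)


rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) * 100 + 500
base, learners, fitted = boost(X, y)
print(f"training MSE after {len(learners)} stages: {np.mean((y - fitted) ** 2):.2f}")
```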

BaggingRegressor

BaggingRegressor, short for Bootstrap Aggregating Regressor [32], is ideal for reducing variance and preventing overfitting by training multiple instances of a predictor and averaging their outputs. For RUL battery prediction, for which the dataset might have various sources of noise and inconsistency, BaggingRegressor can enhance prediction stability by combining multiple models, ensuring more reliable and robust estimations.
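The bootstrap-and-average mechanism can be written out in a few lines; scikit-learn's `BaggingRegressor` wraps the same procedure, with a decision tree as its default base estimator. This sketch uses a noisy synthetic sine signal, chosen only to show that averaging trees fitted on bootstrap resamples lowers the error of a single fully grown tree; the helper name `bagging_predict` is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagging_predict(X_train, y_train, X_new, n_estimators=25, seed=0):
    """Fit each tree on a bootstrap resample, then average their predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap: sample n rows with replacement
        tree = DecisionTreeRegressor(random_state=0).fit(X_train[idx], y_train[idx])
        preds.append(tree.predict(X_new))
    return np.mean(preds, axis=0)


# Noisy synthetic signal: one fully grown tree memorizes the noise,
# while the bagged average is smoother.
rng = np.random.default_rng(1)
X_train = rng.uniform(0, 6, size=(400, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 1.0, 400)
X_test = np.linspace(0, 6, 400).reshape(-1, 1)
y_true = np.sin(X_test[:, 0])

single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train).predict(X_test)
bagged = bagging_predict(X_train, y_train, X_test)
print(f"single tree MSE: {np.mean((single - y_true) ** 2):.3f}")
print(f"bagged MSE:      {np.mean((bagged - y_true) ** 2):.3f}")
```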

LightGBM

LightGBM is a gradient boosting framework that excels in handling large datasets efficiently and quickly [33]. Its ability to handle categorical features and its high training speed make it suitable for RUL prediction tasks, especially when dealing with a vast number of battery-related variables. The algorithm’s efficient memory usage and parallel training capabilities further enhance its suitability for complex regression problems like RUL prediction.

LightGBM takes as input a supervised training set X and a loss function L(y, f(x)), and seeks the function $\hat{f}(x)$ that minimizes the expected value of the loss, as given in Equation (2).

\hat{f} = \arg\min_{f} E_{y, X}\, L(y, f(x))

(2)
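To make Equation (2) concrete, the expectation can be replaced by an average over observed samples (the empirical risk), and the search can be restricted to a deliberately tiny function class: constant predictors f(x) = c. This sketch with made-up targets shows that the choice of loss L determines the minimizer: squared loss gives the mean and absolute loss gives the median, which is far less sensitive to the single extreme value. Gradient boosting frameworks such as LightGBM minimize the same kind of objective over a much richer class of tree ensembles.

```python
import numpy as np

# Made-up RUL targets containing one extreme value.
y = np.array([120.0, 135.0, 150.0, 160.0, 900.0])

# Constant predictors f(x) = c, searched on a grid with step 0.01.
candidates = np.linspace(0.0, 1000.0, 100_001)
residuals = y[:, None] - candidates[None, :]  # shape (n_samples, n_candidates)

# Empirical risk: the expectation in Equation (2) replaced by a sample mean.
risk_squared = np.mean(residuals ** 2, axis=0)
risk_absolute = np.mean(np.abs(residuals), axis=0)

best_squared = candidates[np.argmin(risk_squared)]
best_absolute = candidates[np.argmin(risk_absolute)]
print(best_squared, y.mean())       # squared loss is minimized by the mean
print(best_absolute, np.median(y))  # absolute loss is minimized by the median
```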

CatBoost

CatBoost is designed to tackle categorical variables effectively, making it apt for datasets in which feature types are diverse [34]. For RUL battery prediction, for which understanding the nuances of each variable type is critical, CatBoost’s inherent ability to handle categorical data without extensive preprocessing can be advantageous. Additionally, its robust handling of overfitting and out-of-the-box compatibility with various data formats make it a favorable choice.

ExtraTreesRegressor

ExtraTreesRegressor, an ensemble learning method, combines multiple decision tree predictors to provide accurate regression outcomes [35]. Its randomized decision-making process, coupled with feature randomness, ensures reduced variance and overfitting, making it suitable for noisy datasets. Given the inherent variability and unpredictability in RUL battery data, ExtraTreesRegressor can provide stable and accurate predictions by leveraging ensemble techniques.
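The "randomized decision-making process" can be made concrete by sketching a single node split in the Extremely Randomized Trees style. Instead of searching every possible threshold, the method draws one random threshold per candidate feature, uniformly between that feature's minimum and maximum at the node, and keeps the candidate with the largest variance reduction. This is a simplified illustration on synthetic data with a hypothetical helper name, not scikit-learn's internal implementation.

```python
import numpy as np


def extra_trees_split(X, y, n_candidate_features, rng):
    """Return (feature, threshold, variance_reduction) for one randomized split."""
    chosen = rng.choice(X.shape[1], size=n_candidate_features, replace=False)
    parent_sse = y.var() * len(y)  # sum of squared deviations at the node
    best = (None, None, -np.inf)
    for j in chosen:
        lo, hi = X[:, j].min(), X[:, j].max()
        if lo == hi:
            continue  # a constant feature cannot be split
        t = rng.uniform(lo, hi)  # one random threshold, no exhaustive search
        left, right = y[X[:, j] < t], y[X[:, j] >= t]
        if len(left) == 0 or len(right) == 0:
            continue
        child_sse = left.var() * len(left) + right.var() * len(right)
        gain = parent_sse - child_sse
        if gain > best[2]:
            best = (int(j), float(t), float(gain))
    return best


rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 10 * X[:, 0] + rng.normal(0, 0.1, 200)
feature, threshold, gain = extra_trees_split(X, y, n_candidate_features=3, rng=rng)
print(feature, round(threshold, 3), round(gain, 2))
```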

3.5.3. Hyperparameter Optimization with k-Fold Cross Validation

Optimizing hyperparameters is vital for refining a machine learning model tailored for RUL prediction, ensuring the most effective configurations for the given battery dataset. While parameters adapt based on the data provided during training, hyperparameters are values set prior to this phase. To fine-tune our model for RUL forecasting, we employed GridSearchCV, a renowned method that systematically evaluates a range of specified hyperparameter values [36], incorporating a 5-fold cross validation strategy. Table 4 shows the optimal hyperparameters that have been obtained using GridSearchCV with five-fold cross validation.
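A GridSearchCV run with five-fold cross validation can be set up as sketched below. The grid is hypothetical (the paper's optimal values are listed in Table 4), the data are synthetic, and scikit-learn's `GradientBoostingRegressor` is used only so the sketch needs no extra packages. If the `xgboost` package is installed, its scikit-learn-compatible `XGBRegressor` could be passed as the estimator in the same way, with a grid over its own parameter names.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data standing in for the seven battery features.
rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 7))
y = 1000 * X[:, 4] + 200 * X[:, 5] + rng.normal(0, 20, 400)

# Hypothetical grid: 2 x 2 x 2 = 8 combinations.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid=param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # five-fold cross validation
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
print(f"best CV RMSE: {(-search.best_score_) ** 0.5:.2f}")
```

Each of the eight combinations is trained five times, once per fold, and the combination with the best mean validation score is then refitted on all of the data passed to `fit`.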

3.5.4. Performance Evaluation Metrics for RUL Regression Models

The evaluation metrics for assessing the performance of the regression model include the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R-squared value. The formulas for calculating these metrics are provided, with y_i representing actual values, and y_p representing predicted values for a set of ‘n’ instances. The R-squared value, derived from the coefficient of determination, measures the proportion of the variance in the dependent variable that is predictable from the independent variables [37].

The MAE is determined using Equation (3).

\mathrm{MAE} = \frac{\sum_{i=1}^{n} |y_{i} - y_{p}|}{n}

(3)

The MSE is determined using Equation (4).

\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_{i} - y_{p})^{2}}{n}

(4)

The RMSE is determined using Equation (5).

\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_{i} - y_{p})^{2}}{n}}

(5)

The R-squared value, or coefficient of determination, is calculated using Equation (6).

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - y_{p})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

(6)

Here, $\bar{y}$ is the mean of all of the actual values.
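Equations (3) to (6) translate directly into NumPy. The sketch below evaluates them on five made-up actual/predicted RUL pairs and cross-checks the results against scikit-learn's metric functions. For these values, the errors are 10, −10, 10, −20, and 0, giving MAE = 10, MSE = 140, RMSE ≈ 11.83, and R² = 1 − 700/400,000 = 0.99825.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 as defined in Equations (3)-(6)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                                           # Eq. (3)
    mse = np.mean(err ** 2)                                              # Eq. (4)
    rmse = np.sqrt(mse)                                                  # Eq. (5)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # Eq. (6)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


y_true = [1000, 800, 600, 400, 200]  # made-up actual RUL values
y_pred = [990, 810, 590, 420, 200]   # made-up predictions
m = regression_metrics(y_true, y_pred)
print(m)
```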

4. Results and Discussion

Table 5 provides a comparative analysis of various ML algorithms’ performance metrics when predicting the RUL values of batteries. The metrics evaluated include the MAE, MSE, RMSE, and R-Squared for both the training and test datasets. The XGBoost algorithm exhibits the lowest MAE and RMSE values across both the training and test sets, indicating its superior accuracy in predicting RUL. Additionally, the R-Squared values are exceptionally high, suggesting a near-perfect fit of the model to the data. While the BaggingRegressor algorithm performs well, especially in the training set, it demonstrates slightly higher MAE and RMSE values compared to XGBoost. However, its R-Squared values remain consistently high, indicating strong predictive capability. The LightGBM algorithm shows a noticeable increase in the MAE, MSE, and RMSE values for both the training and test sets, compared to the previous models. Still, the R-Squared values are above 0.995, suggesting a reliable predictive model. The CatBoost algorithm presents higher error metrics, with the MAE, MSE, and RMSE values surpassing those of the previous models. The R-Squared values slightly decrease, indicating a relatively less accurate prediction, compared to other models. Among the evaluated models, ExtraTreesRegressor exhibits the highest error metrics for both the training and test sets. The R-Squared values are slightly lower than those of the other models, implying a less optimal fit to the data.

XGBoost appears to be the most effective algorithm for RUL prediction, followed by BaggingRegressor, LightGBM, CatBoost, and, finally, ExtraTreesRegressor. Figure 5 presents an in-depth visualization and analysis of RUL prediction for batteries using the XGBoost machine learning algorithm. In Figure 5a, the actual vs. predicted plot compares the actual RUL values of the batteries against the RUL values predicted by the XGBoost algorithm. For a perfectly accurate model, all data points would lie along the diagonal line, representing a perfect match between the actual and predicted values; deviations from this line indicate prediction errors. In Figure 5b, the residual plot depicts the differences between the actual RUL values and the corresponding predicted values. The vertical distances between the data points and the horizontal reference line at zero show the magnitudes and directions of the errors. A well-performing model would show residuals randomly scattered around this line, without forming any discernible patterns.

Figure 5c displays the residual histogram of the XGBoost algorithm-driven RUL prediction model. This histogram offers a distribution of the prediction errors, providing insights into their spread and frequency. A bell-shaped curve centered at zero would indicate that the model’s errors are both randomly distributed and unbiased.

Figure 6 illustrates the Feature Importance Plot. It shows the relevance or importance of various features in predicting the RUL values of batteries. A score closer to 1 suggests high importance, while a score closer to 0 indicates lower importance. Features with higher importance scores can be crucial for ensuring that the model generalizes well to unseen data. The “Time at 4.15 V (s)” feature stands out, with the highest score of 0.8219, making it the most significant feature for predicting RUL among all of the listed features. The “Time Constant Current (s)” feature has a score of 0.0979; this feature holds significant importance. The “Decrement 3.6–3.4 V (s)” feature has a score of 0.0013; this feature has the least importance among all the listed features, and its contribution to predicting RUL is minimal.
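Importance scores on a 0 to 1 scale, like those in Figure 6, are what tree ensembles in scikit-learn expose as `feature_importances_`; these impurity-based scores sum to 1. The sketch below uses synthetic data with hypothetical feature names and `ExtraTreesRegressor` (one of the compared models) rather than the authors' XGBoost model. It also computes permutation importance on held-out data, a complementary check that is relevant to the concern about over-reliance on a single feature.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

names = ["f_dominant", "f_secondary", "f_noise"]  # hypothetical feature names
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))
y = 1000 * X[:, 0] + 100 * X[:, 1] + rng.normal(0, 10, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Impurity-based importances: non-negative and summing to 1.
ranked = sorted(zip(names, model.feature_importances_), key=lambda t: -t[1])
for name, score in ranked:
    print(f"{name:12s} {score:.4f}")

# Permutation importance: drop in held-out score when one feature is shuffled.
perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print(np.round(perm.importances_mean, 4))
```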

While high-importance features are valuable, it is essential to ensure that they do not introduce biases into the model or lead to overfitting. Over-reliance on a single feature (even if it is highly predictive) might make the model less robust or less generalizable. Therefore, a balanced consideration of all features, including those with lower scores, is crucial.

The outcomes of this research can support sustainable advancements in electric vehicle infrastructure. Li-ion batteries are integral components in various industries, particularly electric vehicles, and predicting their RUL is crucial for implementing proactive maintenance strategies. Accurate RUL prediction enables optimal resource utilization by allowing industries and other applications to schedule maintenance precisely when needed, which can yield cost savings and increased operational efficiency. Understanding and predicting RUL also contributes to extending the lifespans of Li-ion batteries; this is critical for sustainable development, as longer-lasting batteries reduce the environmental impact associated with frequent replacement and disposal. For electric vehicle owners, accurate RUL predictions facilitate efficient trip planning. This research aligns with the ongoing transformation at the intersection of technology and industry. The comparative analysis of machine learning algorithms provides valuable insights into their performance in predicting RUL, and identifying the best-performing algorithm, XGBoost in this case, establishes a benchmark for future research and applications in similar domains. Finally, this study acknowledges the environmental impact of Li-ion batteries and proposes that accurate RUL predictions may reduce concerns related to battery disposal, since sustainable practices driven by technological advancements are crucial to minimizing environmental footprints.

5. Conclusions

This study systematically addresses the challenges of predicting the RUL of Li-ion batteries and emphasizes the value of ML approaches. Several ML algorithms were evaluated, and XGBoost demonstrated the best performance in RUL prediction, with the lowest errors on the test set (MAE of 8.191 cycles, RMSE of 15.684 cycles, and R-squared of 0.997; Table 5). These findings are useful for both manufacturers and owners, supporting efficient trip planning and informing the development of longer-lasting batteries.

This research contributes to the transformative intersection of technology and industry and paves the way for sustainable advancements in electric vehicle infrastructure. Given the focus on sustainable development and the environmental impact of electric vehicles, two hypotheses can be proposed. First, accurate RUL prediction may contribute to extending battery lifespan, thereby reducing the environmental concerns associated with battery disposal. Second, accurate RUL prediction may enable proactive maintenance and resource optimization, resulting in cost savings and increased operational efficiency for industries relying on Li-ion batteries. This research has limitations. The quality and availability of data can significantly affect the performance of machine learning models. In addition, because the dataset was collected under controlled conditions, the models may not account for variations in environmental conditions that affect battery performance: factors such as temperature, humidity, and usage patterns may influence RUL but were not explicitly addressed.

Author Contributions

Conceptualization, K.K.; Methodology, K.K.; Validation, S.R.; Formal analysis, R.P.; Data curation, S.R.; Writing – original draft, K.K.; Writing – review & editing, R.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the Kaggle digital repository.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Khan, F.N.U.; Rasul, M.G.; Sayem, A.S.M.; Mandal, N. Maximizing energy density of lithium-ion batteries for electric vehicles: A critical review. Energy Rep. 2023, 9 (Suppl. 11), 11–21.
Alanazi, F. Electric Vehicles: Benefits, Challenges, and Potential Solutions for Widespread Adaptation. Appl. Sci. 2023, 13, 6016.
Kumar, M.; Panda, K.P.; Naayagi, R.T.; Thakur, R.; Panda, G. Comprehensive Review of Electric Vehicle Technology and Its Impacts: Detailed Investigation of Charging Infrastructure, Power Management, and Control Techniques. Appl. Sci. 2023, 13, 8919.
Kang, Z.; Catal, C.; Tekinerdogan, B. Remaining Useful Life (RUL) Prediction of Equipment in Production Lines Using Artificial Neural Networks. Sensors 2021, 21, 932.
Uzair, M.; Abbas, G.; Hosain, S. Characteristics of Battery Management Systems of Electric Vehicles with Consideration of the Active and Passive Cell Balancing Process. World Electr. Veh. J. 2021, 12, 120.
Pang, X.; Huang, R.; Wen, J.; Shi, Y.; Jia, J.; Zeng, J. A Lithium-Ion Battery RUL Prediction Method Considering the Capacity Regeneration Phenomenon. Energies 2019, 12, 2247.
Jiang, Y.; Song, W. Predicting the Cycle Life of Lithium-Ion Batteries Using Data-Driven Machine Learning Based on Discharge Voltage Curves. Batteries 2023, 9, 413.
Wang, Y.; Zhao, Y.; Addepalli, S. Remaining Useful Life Prediction using Deep Learning Approaches: A Review. Procedia Manuf. 2020, 49, 81–88.
Wu, J.; Kong, L.; Cheng, Z.; Yang, Y.; Zuo, H. RUL Prediction for Lithium Batteries Using a Novel Ensemble Learning Method. Energy Rep. 2022, 8 (Suppl. 12), 313–326.
Li, X.; Yu, D.; Byg, V.S.; Ioan, S.D. The development of machine learning-based remaining useful life prediction for lithium-ion batteries. J. Energy Chem. 2023, 82, 103–121.
Chou, J.-H.; Wang, F.-K.; Lo, S.-C. Predicting future capacity of lithium-ion batteries using transfer learning method. J. Energy Storage 2023, 71, 108120.
Zhao, J.; Ling, H.; Liu, J.; Wang, J.; Burke, A.F.; Lian, Y. Machine learning for predicting battery capacity for electric vehicles. eTransportation 2023, 15, 100214.
Zoerr, C.; Sturm, J.J.; Solchenbach, S.; Erhard, S.V.; Latz, A. Electrochemical polarization-based fast charging of lithium-ion batteries in embedded systems. J. Energy Storage 2023, 72, 108234.
Najera-Flores, D.A.; Hu, Z.; Chadha, M.; Todd, M.D. A Physics-Constrained Bayesian neural network for battery remaining useful life prediction. Appl. Math. Model. 2023, 122, 42–59.
Dominguez, D.Z.; Mondal, B.; Gaberscek, M.; Morcrette, M.; Franco, A.A. Impact of the manufacturing process on graphite blend electrodes with silicon nanoparticles for lithium-ion batteries. J. Power Sources 2023, 580, 233367.
Zhao, G.; Kang, Y.; Huang, P.; Duan, B.; Zhang, C. Battery health prognostic using efficient and robust aging trajectory matching with ensemble deep transfer learning. Energy 2023, 282, 128228.
Soltani, M.; Vilsen, S.B.; Stroe, A.I.; Knap, V.; Stroe, D.I. Degradation behaviour analysis and end-of-life prediction of lithium titanate oxide batteries. J. Energy Storage 2023, 68, 107745.
Çolak, B. A new study on the prediction of the effects of road gradient and coolant flow on electric vehicle battery power electronics components using machine learning approach. J. Energy Storage 2023, 70, 108101.
Guo, X.; Wang, K.; Yao, S.; Fu, G.; Ning, Y. RUL prediction of lithium ion battery based on CEEMDAN-CNN BiLSTM model. Energy Rep. 2023, 9, 1299–1306.
Li, D.; Liu, P.; Zhang, Z.; Zhang, L.; Deng, J.; Wang, Z.; Dorrell, D.G.; Li, W.; Sauer, D.U. Battery Thermal Runaway Fault Prognosis in Electric Vehicles Based on Abnormal Heat Generation and Deep Learning Algorithms. IEEE Trans. Power Electron. 2022, 37, 8513–8525.
Dataset Link. Available online: https://www.kaggle.com/datasets/ignaciovinuales/battery-remaining-useful-life-rul (accessed on 30 November 2023).
Carpenter, J.R.; Smuk, M. Missing data: A statistical framework for practice. Biom. J. 2021, 63, 915–947.
Ali, A.; Emran, N.A.; Asmai, S.A. Missing values compensation in duplicates detection using hot deck method. J. Big Data 2021, 8, 112.
Montesinos López, O.A.; Montesinos López, A.; Crossa, J. Overfitting, Model Tuning, and Evaluation of Prediction Performance. In Multivariate Statistical Machine Learning Methods for Genomic Prediction; Springer: Cham, Switzerland, 2022.
Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy 2021, 23, 18.
Yu, Y.; Zhu, Y.; Li, S.; Wan, D. Time Series Outlier Detection Based on Sliding Window Prediction. Math. Probl. Eng. 2014, 2014, 879736.
Gandhi, S.M.; Sarkar, B.C. Chapter 11 – Conventional and Statistical Resource/Reserve Estimation. In Essentials of Mineral Exploration and Evaluation; Gandhi, S.M., Sarkar, B.C., Eds.; Elsevier: Amsterdam, The Netherlands, 2016; pp. 271–288. ISBN 9780128053294.
Zhang, Y.; Umair, M. Examining the interconnectedness of green finance: An analysis of dynamic spillover effects among green bonds, renewable energy, and carbon markets. Environ. Sci. Pollut. Res. Int. 2023, 30, 77605–77621.
Shoaib, M.; Shah, B.; El-Sappagh, S.; Ali, A.; Ullah, A.; Alenezi, F.; Gechev, T.; Hussain, T.; Ali, F. Corrigendum: An advanced deep learning models-based plant disease detection: A review of recent research. Front. Plant Sci. 2023, 14, 1282443.
Westphal, M.; Brannath, W. Evaluation of multiple prediction models: A novel view on model selection and performance assessment. Stat. Methods Med. Res. 2020, 29, 1728–1745.
Shahani, N.M.; Zheng, X.; Liu, C.; Hassan, F.U.; Li, P. Developing an XGBoost Regression Model for Predicting Young’s Modulus of Intact Sedimentary Rocks for the Stability of Surface and Subsurface Structures. Front. Earth Sci. 2021, 9, 761990.
Sharafati, A.; Asadollah, S.B.H.S.; Al-Ansari, N. Application of bagging ensemble model for predicting compressive strength of hollow concrete masonry prism. Ain Shams Eng. J. 2021, 12, 3521–3530.
Paudel, P.; Karna, S.K.; Saud, R.; Regmi, L.; Thapa, T.B.; Bhandari, M. Unveiling Key Predictors for Early Heart Attack Detection using Machine Learning and Explainable AI Technique with LIME. In Proceedings of the 10th International Conference on Networking, Systems and Security (NsysS’23), Khulna, Bangladesh, 21–23 December 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 69–78.
Krishnan, S.; Aruna, S.K.; Kanagarathinam, K.; Venugopal, E. Identification of Dry Bean Varieties Based on Multiple Attributes Using CatBoost Machine Learning Algorithm. Sci. Program. 2023, 2023, 2556066.
Barrera-Animas, A.Y.; Oyedele, L.O.; Bilal, M.; Akinosho, T.D.; Delgado, J.M.D.; Akanbi, L.A. Rainfall prediction: A comparative analysis of modern machine learning algorithms for time-series forecasting. Mach. Learn. Appl. 2022, 7, 100204.
Anyanwu, G.O.; Nwakanma, C.I.; Lee, J.-M.; Kim, D.-S. Optimization of RBF-SVM Kernel Using Grid Search Algorithm for DDoS Attack Detection in SDN-Based VANET. IEEE Internet Things J. 2023, 10, 8477–8490.
Jierula, A.; Wang, S.; Oh, T.-M.; Wang, P. Study on Accuracy Metrics for Evaluating the Predictions of Damage Locations in Deep Piles Using Artificial Neural Networks with Acoustic Emission Data. Appl. Sci. 2021, 11, 2314.

Figure 1. Proposed RUL prediction model.

Figure 2. Heatmap of RUL dataset.

Figure 3. Data with outliers: (a) duration of discharge, in seconds; (b) time taken for a specific voltage decrement within a range from 3.6 V to 3.4 V, in seconds; (c) highest voltage level during the discharge process, in volts; (d) lowest voltage level observed during the charging phase, in volts; (e) time at 4.15 V, in seconds; (f) time constant current, in seconds; (g) charging time, in seconds; and (h) RUL of battery.

Figure 4. Data without outliers: (a) duration of discharge, in seconds; (b) time taken for a specific voltage decrement within a range from 3.6 V to 3.4 V, in seconds; (c) highest voltage level during the discharge process, in volts; (d) lowest voltage level observed during the charging phase, in volts; (e) time at 4.15 V, in seconds; (f) time constant current, in seconds; (g) charging time, in seconds; and (h) RUL of battery.

Figure 5. RUL prediction using XGBoost ML algorithm: (a) actual vs. predicted plot, (b) residual plot, and (c) residual histogram.

Figure 6. Feature importance plot.

Table 1. Dataset description.

Features	Description
Cycle Index	Denotes the sequential number of the battery cycle, providing a chronological order of observations.
Discharge Time (s)	Represents the duration (in seconds) of the discharge phase for each cycle.
Decrement 3.6–3.4 V (s)	Denotes the time (in seconds) for the battery voltage to decrement from 3.6 V to 3.4 V during discharge.
Max. Voltage Discharge (V)	Represents the maximum voltage (in volts) observed during the discharge process of the battery.
Min. Voltage Charging (V)	Indicates the minimum voltage (in volts) observed during the charging process of the battery.
Time at 4.15 V (s)	Represents the duration (in seconds) the battery remains at a voltage level of 4.15 V.
Time Constant Current (s)	Denotes the time (in seconds) spent in the constant-current phase of charging during the cycle.
Charging Time (s)	Indicates the time taken (in seconds) for the battery to be charged fully.
RUL	This is the target variable: the remaining operational lifespan of the battery, expressed as the number of cycles remaining.
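
The Decrement 3.6–3.4 V feature in Table 1 is a time difference between two voltage crossings during discharge. The sketch below shows one way such a feature could be extracted from a sampled discharge curve, assuming a falling voltage trace and linear interpolation between samples. The function names `crossing_time` and `decrement_time` and the synthetic trace are hypothetical; this is not necessarily how the source dataset computed the feature. For a falling trace this difference cannot be negative, so the negative minimum reported for this feature in Table 2 (−397,646 s) suggests a recording artifact rather than a physical measurement.

```python
def crossing_time(times, volts, level):
    """First time at which a falling voltage trace reaches `level`.

    Linearly interpolates between the two samples that bracket the level;
    returns None if the trace never falls through it.
    """
    for i in range(1, len(volts)):
        v_prev, v_curr = volts[i - 1], volts[i]
        if v_curr <= level <= v_prev:
            if v_prev == v_curr:
                return times[i - 1]
            frac = (v_prev - level) / (v_prev - v_curr)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def decrement_time(times, volts, v_high=3.6, v_low=3.4):
    """Seconds taken for the voltage to fall from v_high to v_low, or None."""
    t_high = crossing_time(times, volts, v_high)
    t_low = crossing_time(times, volts, v_low)
    if t_high is None or t_low is None:
        return None
    return t_low - t_high


# Hypothetical discharge trace: 10 s sampling, linear fall from 4.2 V at 1 mV/s.
times = [10.0 * k for k in range(121)]
volts = [4.2 - 0.001 * t for t in times]
print(decrement_time(times, volts))  # approximately 200 s
```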

Table 2. Dataset statistics.

Feature	Count	Mean	Std	Min	25%	50%	75%	Max
Cycle Index	15,064	556.155	322.378	1	271	560	833	1134
Discharge Time (s)	15,064	4581.27	33,144	8.69	1169.31	1557.25	1908	958,320
Decrement 3.6–3.4 V (s)	15,064	1239.78	15,039.6	−397,646	319.6	439.239	600	406,704
Max. Voltage Discharge (V)	15,064	3.90818	0.091	3.043	3.846	3.906	3.972	4.363
Min. Voltage Charging (V)	15,064	3.5779	0.1237	3.022	3.488	3.574	3.663	4.379
Time at 4.15 V (s)	15,064	3768.34	9129.55	−113.58	1828.88	2930.2	4088.33	245,101
Time Constant Current (s)	15,064	5461.27	25,155.8	5.98	2564.31	3824.26	5012.35	880,728
Charging Time (s)	15,064	10,066.5	26,415.4	5.98	7841.92	8320.42	8763.28	880,728
RUL (Cycles)	15,064	554.194	322.435	0	277	551	839	1133
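
Table 2 shows that, for several features, the mean lies far above the median and even above the 75th percentile; for example, the mean discharge time of 4581.27 s exceeds the 75th percentile of 1908 s. This indicates that a small number of extreme values dominate the mean and standard deviation. The sketch below reproduces the layout of Table 2 for a single column using Python's standard `statistics` module; the `describe` helper and the discharge times are made up for illustration and show how one extreme entry inflates the mean while leaving the quartiles almost unchanged.

```python
import statistics


def describe(values):
    """Count, mean, sample std, min, quartiles, and max, in the layout of Table 2.

    Quartiles use linear interpolation between order statistics
    (method="inclusive"), the same convention as numpy's default percentile.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical discharge times (s): 20 typical cycles plus one extreme log entry.
discharge_times = [1500 + 10 * i for i in range(20)] + [958_320]
for key, value in describe(discharge_times).items():
    print(f"{key:>5}: {value:,.2f}")
```

Here the median stays at 1600 s while the mean is pulled above 47,000 s by the single extreme value, the same pattern Table 2 shows before outlier removal.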

Table 3. Comparison of skewness and excess kurtosis values for battery performance features before and after outlier removal.

	Before Outlier Removal (15,064 Instances)		After Removal of the Outliers (14,445 Instances)
Feature	Skew	Kurtosis	Skew	Kurtosis
Discharge Time (s)	16.300	339.993	−0.154	−1.170
Decrement 3.6–3.4 V (s)	9.986	253.344	0.241	−0.899
Max. Voltage Discharge (V)	−0.530	11.564	−0.079	−0.966
Min. Voltage Charging (V)	0.329	1.145	0.213	−0.235
Time at 4.15 V (s)	16.238	340.628	−0.106	−1.206
Time Constant Current (s)	24.723	696.544	−0.138	−1.171
Charging Time (s)	22.770	587.790	−0.125	−0.654
RUL (Cycles)	0.006	−1.208	−0.012	−1.202
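
In Table 3, skewness and kurtosis move close to zero or below it after outlier removal. The negative values show that the kurtosis reported is excess kurtosis, which is 0 for a normal distribution and −1.2 for a uniform one; the RUL value of about −1.2 is what one would expect if RUL values are spread evenly across cycles. The study's own outlier-removal rule is described in its methodology and is not reproduced here. The sketch below assumes Tukey's 1.5 × IQR fences purely for illustration, and computes population skewness and excess kurtosis with the standard library on hypothetical data. Libraries such as pandas apply small-sample bias corrections in `skew()` and `kurt()`, which change the values only marginally at the sample sizes in Table 3.

```python
import statistics


def skewness(values):
    """Population (biased) skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences; an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


# Hypothetical feature: evenly spread values plus three extreme entries.
raw = [float(i) for i in range(1000)] + [50_000.0, 60_000.0, 80_000.0]
clean = iqr_filter(raw)

print(len(raw), len(clean))
print(f"before: skew={skewness(raw):.3f}, kurtosis={excess_kurtosis(raw):.3f}")
print(f"after:  skew={skewness(clean):.3f}, kurtosis={excess_kurtosis(clean):.3f}")
```

The three extreme entries produce a large positive skew and kurtosis; once they are filtered out, the evenly spread values give a skewness of 0 and an excess kurtosis close to −1.2.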

Table 4. Best hyperparameters of ML algorithms on RUL prediction.

ML Algorithm	Best Hyperparameters
XGBoost	‘learning_rate’: 0.1, ‘max_depth’: 10, ‘n_estimators’: 200
BaggingRegressor	‘max_features’: 1.0, ‘max_samples’: 1.0, ‘n_estimators’: 100
LightGBM	‘learning_rate’: 0.1, ‘max_depth’: 7, ‘n_estimators’: 200, ‘num_leaves’: 121
CatBoost	‘learning_rate’: 0.1, ‘max_depth’: 10, ‘n_estimators’: 200
ExtraTreesRegressor	‘max_depth’: 10, ‘max_features’: ‘auto’, ‘min_samples_leaf’: 1, ‘min_samples_split’: 2, ‘n_estimators’: 200
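
Table 4 lists the best hyperparameters found for each algorithm. The sketch below shows how such a search can be run with scikit-learn's `GridSearchCV` and k-fold cross-validation. It assumes a synthetic dataset from `make_regression` and a small illustrative grid for `ExtraTreesRegressor`; the grid values are not those used in the study. Note that `max_features='auto'` was deprecated in scikit-learn 1.1 and removed in 1.3; for regressors it was equivalent to `max_features=1.0` (all features), which is what the sketch uses. XGBoost and LightGBM also provide scikit-learn-compatible estimators (`xgboost.XGBRegressor`, `lightgbm.LGBMRegressor`) that can be passed to `GridSearchCV` in the same way.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the preprocessed feature matrix and RUL target.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Illustrative grid; max_features=1.0 replaces the removed "auto" option for regressors.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10],
    "max_features": [1.0],
    "min_samples_split": [2],
    "min_samples_leaf": [1],
}

# 5-fold cross-validation with shuffling; each candidate is scored on held-out folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print(search.best_params_)
print("CV RMSE:", (-search.best_score_) ** 0.5)
```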

Table 5. Comparison of performance of ML algorithms on RUL prediction.

	Training Set				Test Set
ML Algorithm	MAE	MSE	RMSE	R-Squared	MAE	MSE	RMSE	R-Squared
XGBoost	2.243	10.628	3.260	0.999	8.191	245.993	15.684	0.997
BaggingRegressor	3.268	42.456	6.515	0.999	8.517	272.162	16.497	0.997
LightGBM	9.121	187.071	13.677	0.998	12.597	416.982	20.419	0.995
CatBoost	15.15	431.900	20.782	0.995	17.122	574.958	23.978	0.994
ExtraTreesRegressor	19.375	720.336	26.839	0.992	21.336	862.952	29.376	0.991
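
Assuming the standard definitions, the four metrics in Table 5 are the mean absolute error, the mean squared error, its square root (RMSE), and R-squared = 1 − SS_res/SS_tot. The sketch below implements them in plain Python and checks the reported MSE and RMSE pairs against each other. The `regression_metrics` helper is illustrative, and the small `y_true`/`y_pred` arrays in the usage line are made-up numbers.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R-squared for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # mse * n is the residual sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# (algorithm, split, reported MSE, reported RMSE) taken from Table 5.
reported = [
    ("XGBoost", "train", 10.628, 3.260), ("XGBoost", "test", 245.993, 15.684),
    ("BaggingRegressor", "train", 42.456, 6.515), ("BaggingRegressor", "test", 272.162, 16.497),
    ("LightGBM", "train", 187.071, 13.677), ("LightGBM", "test", 416.982, 20.419),
    ("CatBoost", "train", 431.900, 20.782), ("CatBoost", "test", 574.958, 23.978),
    ("ExtraTreesRegressor", "train", 720.336, 26.839), ("ExtraTreesRegressor", "test", 862.952, 29.376),
]
for name, split, mse, rmse in reported:
    print(f"{name:20s} {split:5s} sqrt(MSE)={math.sqrt(mse):.3f} reported RMSE={rmse:.3f}")

print(regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
```

All ten reported RMSE values agree with the square root of the corresponding MSE to within rounding (about 0.001 cycles).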

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

