Lithium-Ion Battery Prognostics through Reinforcement Learning Based on Entropy Measures

Lithium-ion is a progressive battery technology that has been used in vastly different electrical systems. Failure of the battery can lead to failure in the entire system where the battery is embedded and cause irreversible damage. To avoid probable damages, research is actively conducted, and data-driven methods are proposed, based on prognostics and health management (PHM) systems. PHM can use multiple time-scale data and stored information from battery capacities over several cycles to determine the battery state of health (SOH) and its remaining useful life (RUL). This results in battery safety, stability, reliability, and longer lifetime. In this paper, we propose different data-driven approaches to battery prognostics that rely on: Long Short-Term Memory (LSTM), Autoregressive Integrated Moving Average (ARIMA), and Reinforcement Learning (RL) based on the permutation entropy of battery voltage sequences at each cycle, since they take into account vital information from past data and result in high accuracy.


Lithium-Ion Batteries
Lithium-ion batteries, as the primary power source in electric vehicles, have attracted significant attention recently and have become a focus of research. It is assumed that lithium-ion batteries have the inherent potential for building future power sources for environmentally friendly vehicles [1].
Lithium-ion batteries are the best option for electric vehicles due to their high-quality performance, capacity, small volume, light weight, low pollution, and rechargeability with no memory effect [2]. However, battery performance degrades under poor pavement conditions, temperature variations, and load changes. This leads to leakage, insulation damage, and partial short-circuits. Serious consequences can arise if these failures are not detected in time [3,4]. As an example, several Boeing 787 aircraft caught fire because of lithium-ion battery failure in 2013, causing the airliners to be grounded [5]. Hence, it is necessary to detect performance degradation in time and to estimate future battery performance. This is where battery prognostics and health management (PHM) plays a vital role. PHM predicts the battery state of health (SOH) and remaining useful life (RUL) using possible failure information in the system, thus yielding improved system reliability and stability over the actual life-cycle of the battery.
Battery PHM and a battery management system (BMS) are important to ensure the reliable and safe functionality of energy storage units [6]. Battery RUL prediction, battery SOH prediction, and battery capacity fade prediction are among the topics which have drawn more attention from researchers in the recent decade [7]. However, these tasks are very difficult, as battery degradation has a complex nature and numerous factors must be taken into consideration [8,9].

Entropy Measures
Entropy is a measurement metric for irregularities in time series data, and is used to quantify the stochastic process in data analyses [10]. It was first introduced in classical thermodynamics, and has applications in diverse fields such as chemistry and physics, biological systems, cosmology, economics, sociology, weather science, climate change research, and information systems. Entropy has expanded to far-ranging fields and systems. Shannon, Permutation, Renyi, Tsallis, Approximate, and Sample entropy measures are some of the conceptions of entropy regularly in use [11].
Among the aforementioned entropy measures, permutation entropy (PE) is a simple and robust approach to calculating the complexity of a non-linear system using the order relations between values of a time series and assigning a probability to the ordinal patterns. The permutation entropy measure technique works flexibly; it is computationally efficient, and has a range of several thousand parameter values similar to Lyapunov exponents. PE is discussed in more detail in Reference [12]. In this study, the PE of the discharge battery voltage sequences is calculated and used as an input to the proposed models.

ML and DL Techniques
Recently, Machine Learning (ML) and Deep Learning (DL) algorithms have found very significant and useful applications in research and practice. These concepts have been used to develop various models for predicting different characteristics in diverse fields. In general, ML and DL algorithms aim to capture information from past data, learn from that data, and apply what they have learned to make informed decisions. Therefore, the associated systems are not required to be broadly programmed in all aspects.
ML is used to synthesize the fundamental relationships between large amounts of data to solve real-time problems such as big-data analytics and evolution of information [13]. DL, in turn, is able to process a large number of features and, hence, is preferred when computing huge datasets and unstructured data. DL facilitates analysis and extraction of important information from raw data by using computer systems [14]. Different types of parameters with various quantities can be applied to the developed models as the input to obtain expected predictive variables as the output.
Deep Learning techniques, including Long Short-Term Memory (LSTM) [15] and Reinforcement Learning (RL) [16], can fit numerical dependent variables and have great generalization ability, and therefore, are applicable to battery data. The LSTM algorithm, a Deep Learning algorithm with multiple gates, performs on the basis of updating and storing key information in the time series data [15], and is applicable to battery prognostics. The RL algorithm, on the other hand, as one of the latest Deep Learning methods and tools, has the capability of creating a simulation of the whole system and making intelligent decisions (i.e., charge, replace, repair, etc.) after it is utilized to predict the battery RUL and SOH for the purpose of battery PHM and BMS [16].

Research Objective
In this study, the objective is to progress the study of lithium-ion battery performance based on battery SOH and RUL prognostics. To do so, we propose an entropy-based Reinforcement Learning model, predict the next-cycle battery capacity, and compare the numerical results from the proposed entropy-based RL models to those from two other data-driven methods, namely ARIMA and LSTM, which are both constructed based on the same input variable (i.e., permutation entropy of voltage sequences at each cycle). Permutation entropy of the battery discharge voltage, as well as the previous battery capacities, are given to these models as input variables. Finally, evaluation metrics such as MSE, MAE, and RMSE are applied to the proposed methods to compare the observed and predicted battery capacities.
Based on Figure 1, the remainder of this work consists of the following sections. First, battery data is prepared and provided for the study. The data is then analyzed from different points of view. Based on the data analysis, various models are proposed for lithium-ion battery performance using ML and DL techniques. We evaluate and compare the models in detail in the next sections. Finally, conclusions are presented in the last section.

Related Work
In the current literature, entropy-based predictive models for battery prognostics, as well as other predictive models, have been researched and tested. Table 1 illustrates a brief overview of some of the most relevant and recently published papers that use data-driven methods for lithium-ion battery prognostics.

Table 1 (reference; dataset; method; findings)
[19] Dataset: NASA Ames. Method: Long short-term memory (LSTM). Findings: The proposed model has a better performance for the time series problem of li-ion battery prognostics and a stronger learning ability of the degradation process when compared to other ANN algorithms.
[20] Dataset: NASA lithium-ion battery dataset. Method: Long short-term memory (LSTM). Findings: The method produces exceptional performances for RUL prediction under different loading and operating conditions.
[21] Dataset: Data repository of the NASA Ames Prognostics Center of Excellence (PCoE). Method: Autoregressive integrated moving average (ARIMA). Findings: The RMSE of the model for the RUL prognostics varies in the range of 0.0026 to 0.1065.
[22] Dataset: Lithium-ion battery packs from forklifts in commercial operations. Method: Autoregressive integrated moving average (ARIMA). Findings: The ARIMA method can be used for SOH prognostics, but the loss function indicates further enhancement is needed for the environmental conditions.
[23] Dataset: NASA prognostic model library. Method: Reinforcement Learning (RL). Findings: The RL model enables accurate calibration of the battery prognostics but has only been tested on simulated data; sim-to-real transfer is needed to test the proposed algorithm on real data.
[24] Dataset: SPMeT. Method: Reinforcement Learning (RL). Findings: The proposed method can extend the battery life effectively and ensure end-user convenience. However, experimental validation needs to be implemented for the optimal charging strategy.
[25] Dataset: Simulated datasets. Method: Ensemble Learning. Findings: A data-driven method known as Ensemble Learning is presented for predicting degradation in a time-varying environment.
[26] Dataset: Experimental data from multiple lithium-ion battery cells at three different temperatures. Method: Sparse Bayesian. Findings: The authors present a Sparse Bayesian model based on sample entropy of voltages for estimating SOH and RUL. It is shown that the Sparse Bayesian model outperforms the Polynomial model with the same input and target data.
[27] Dataset: Collected data through an experimental study. Method: Unscented Particle Filter and Support Vector Regression. Findings: A hybrid model based on a combination of a data-driven method and a model-based approach is presented, which results in higher accuracy compared to each model individually.

The literature review reveals a research gap, which can be summarized as follows. Most of the research undertaken so far has relied on traditional Machine Learning and Deep Learning methods. However, the RL method is recognized as an area with room for exploration. Based on these findings, this paper is devoted to filling this gap in the research. LSTM and ARIMA methods are also studied as state-of-the-art models, which can be developed based on the entropy measures and compared with the RL method.
The main contribution of our study is the proposal of a Reinforcement Learning model based on the permutation entropy of the voltage sequences for predicting the next-cycle battery capacity. To the best of our knowledge, an RL model for lithium-ion battery prognostics, using entropy measures as the input, has not been previously tested in the literature. Additionally, we compare the numerical results from our proposed entropy-based RL model with the results from the state-of-the-art models (i.e., ARIMA and LSTM), which are built based on entropy measures for a fair and reliable comparison.

Data and Battery Specifications
The datasets used in this study were retrieved from the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland [28]. The studied batteries are graphite/LiCoO2 pouch cells with a capacity rating of 1500 mAh, weight of 30.3 g, and dimensions of 3.4 × 84.8 × 50.1 mm, labeled as PL19, PL11, and PL09. Table 2 shows the number of cycles in each dataset. Figure 2 illustrates the battery capacities over the number of cycles and indicates the decrease in capacities as the number of cycles increases. It can also be observed that in PL09 and PL19 the capacities take discrete values, while in PL11 they vary continuously. Since the battery capacity and entropy were not observed in all cycles, we have estimated each unrecorded capacity value and its related entropy using the average of its previous and next known capacity and entropy value. By doing so, we have increased the number of data points, and hence, the proposed models can be trained and tested more accurately.
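The gap-filling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fill_gaps` and the use of `None` to mark unrecorded cycles are hypothetical, and gaps at the very start or end of a series (which have no neighbor on one side) are simply left unfilled.

```python
def fill_gaps(values):
    """Fill unrecorded entries (None) with the average of the previous
    and next known values; hypothetical helper, not the authors' code."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is not None:
            continue
        prev_idx = max((k for k in known if k < i), default=None)
        next_idx = min((k for k in known if k > i), default=None)
        if prev_idx is None or next_idx is None:
            continue  # no neighbor on one side: leave the gap as-is
        # Average of the nearest known values on either side.
        filled[i] = (values[prev_idx] + values[next_idx]) / 2
    return filled


# Capacities in Ah with two unrecorded cycles in the middle.
print(fill_gaps([1.0, None, None, 2.0]))
```

The same helper could be applied to the per-cycle entropy series, since the text uses the same averaging rule for both quantities.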

Methodology
The mathematical notations used throughout this paper are summarized in Table 3. In the following subsections, permutation entropy calculation and the proposed models will be discussed.

Permutation Entropy
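To compute the permutation entropy of order $D$ for a one-dimensional time series $\{x_1, x_2, \ldots, x_N\}$ with $N$ data points, the following steps are taken [29]. First, the data is partitioned into a matrix with $D$ rows and $N - (D-1)\tau$ columns, where $\tau$ is the delay time:

$$X = \begin{bmatrix} x_1 & x_2 & \cdots & x_{N-(D-1)\tau} \\ x_{1+\tau} & x_{2+\tau} & \cdots & x_{N-(D-2)\tau} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1+(D-1)\tau} & x_{2+(D-1)\tau} & \cdots & x_N \end{bmatrix} \tag{1}$$

After rebuilding the data, $\pi$ is defined as the permutation pattern for the columns of $X$, i.e., the ordering of row indices that sorts the entries of column $j$:

$$\pi = (r_0, r_1, \ldots, r_{D-1}), \qquad x_{j+r_0\tau} \le x_{j+r_1\tau} \le \cdots \le x_{j+r_{D-1}\tau} \tag{2}$$

The relative probability of each permutation $\pi$ in $X$ is calculated as below:

$$p(\pi) = \frac{f(\pi)}{N - (D-1)\tau} \tag{3}$$

where $f(\pi)$ is the number of times the permutation $\pi$ is found in the time series. Finally, the relative probabilities are used to compute the permutation entropy:

$$PE_D = -\sum_{\pi} p(\pi) \log p(\pi) \tag{4}$$

An algorithm for the permutation entropy computation is presented below.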

Algorithm 1: Permutation Entropy
Step 1: Reshape the data series into a matrix as in Equation (1).
Step 2: Find the permutation patterns.
Step 3: Calculate the relative probability of each permutation pattern in the matrix.
Step 4: Compute the permutation entropy as in Equation (4).

Permutation entropy of the coarse-grained battery voltage is extracted, as in Figure 6. Despite the noise affecting the entropies, in PL11 the differences in the entropies are relatively small in the earlier cycles, while the deviations increase as the number of cycles increases. In PL19, the range of entropy is approximately constant across cycles; in PL09, however, the entropies appear completely random. After data analysis, we split the data into train and test subsets. The proposed models utilize approximately 90% of the data for training purposes and take the rest for evaluation, as in Figure 7. The mechanism through which the training/test ratio is selected is explained in the following sections.
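The four steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the default `order` and `delay` values, and the choice of a base-2 logarithm are assumptions made here (the text does not state a log base), and ties between equal values are broken by position.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Permutation entropy of a 1-D sequence (hypothetical helper).
    Assumes log base 2; the source does not specify the base."""
    n_cols = len(series) - (order - 1) * delay
    if n_cols <= 0:
        raise ValueError("series too short for the given order and delay")
    counts = Counter()
    for j in range(n_cols):
        # Step 1: column j of the delay-embedded matrix.
        window = [series[j + r * delay] for r in range(order)]
        # Step 2: ordinal pattern = row indices sorted by value.
        pattern = tuple(sorted(range(order), key=lambda r: window[r]))
        counts[pattern] += 1
    # Step 3: relative probabilities; Step 4: entropy of the patterns.
    probs = [c / n_cols for c in counts.values()]
    return -sum(p * math.log2(p) for p in probs)


# An alternating sequence has two equally likely patterns of order 2.
print(permutation_entropy([1, 2, 1, 2, 1], order=2))
```

A strictly monotone sequence produces a single pattern and therefore zero entropy, while a sequence whose patterns are evenly spread approaches the maximum of log2(order!).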

Predictive Models
The predictive models are presented in this section as follows.

LSTM
Long Short-Term Memory, known simply as LSTM, is a framework for a recurrent neural network (RNN) which avoids the problem of long-term dependency. Unlike standard feedforward neural networks, LSTM has feedback connections, and hence, it can update and store necessary information. It has been widely utilized in time series forecasting in different fields of science in recent years [30].
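A unit LSTM cell consists of an input gate $i_t$, forget gate $f_t$, and an output gate $o_t$. Each gate receives the current input $x_t$, the previous state $h_{t-1}$, and the state $c_{t-1}$ of the cell's internal memory. $x_t$, $h_{t-1}$, and $c_{t-1}$ are passed through non-linear functions, which yield the updated $c_t$ and $h_t$ [31]. Considering $W_f$, $W_i$, $W_c$, $W_o$ and $U_f$, $U_i$, $U_c$, $U_o$ as the corresponding weight matrices and $b_f$, $b_i$, $b_c$, $b_o$ as the bias vectors, each LSTM cell operates based on the following equations.

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{5}$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{6}$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{7}$$
$$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t \tag{8}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{9}$$
$$h_t = o_t * \tanh(c_t) \tag{10}$$

In this study, all three gates take the permutation entropy of the battery voltage at cycle $k$ and the battery capacity at cycle $k-1$ as their input variables, $PE_k$ and $C_{k-1}$, and output the estimated battery capacity, $\hat{C}_k$, for the given inputs as shown in Figure 8. Furthermore, an algorithm is presented for the proposed LSTM model.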
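To make the gate mechanics concrete, the following sketch performs one forward step of a single LSTM cell in plain Python. It assumes the standard (non-peephole) LSTM formulation with a scalar hidden and cell state; the function names, the `params` layout, and the example weights are all hypothetical and are not taken from the paper. The two input features stand in for the permutation entropy at the current cycle and the capacity at the previous cycle.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x, h_prev, c_prev, params):
    """One LSTM step with scalar hidden/cell state (illustrative only).
    params maps each of "f", "i", "c", "o" to a tuple (W, U, b), where
    W is a list of input weights, U a recurrent weight, b a bias."""
    def pre_activation(name):
        W, U, b = params[name]
        return sum(w * xi for w, xi in zip(W, x)) + U * h_prev + b

    f = sigmoid(pre_activation("f"))          # forget gate
    i = sigmoid(pre_activation("i"))          # input gate
    c_tilde = math.tanh(pre_activation("c"))  # candidate memory
    c = f * c_prev + i * c_tilde              # cell update, Equation (8)
    o = sigmoid(pre_activation("o"))          # output gate
    h = o * math.tanh(c)                      # new hidden state
    return h, c


# Hypothetical weights; x = [entropy at cycle k, capacity at cycle k-1].
params = {name: ([0.1, 0.2], 0.3, 0.0) for name in ("f", "i", "c", "o")}
h, c = lstm_cell_step([0.8, 1.4], h_prev=0.0, c_prev=0.0, params=params)
print(h, c)
```

In a full model, the hidden state would typically be passed through a final linear layer to produce the capacity estimate, and the weights would be learned by backpropagation rather than set by hand.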
MA($q$) can be described as follows:

$$\hat{y}_t = \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{12}$$

ARMA($p$, $q$) is a combination of AR($p$) and MA($q$), and is described as below:

$$\hat{y}_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{13}$$

where $y_t$ and $\hat{y}_t$, respectively, are the observed and estimated values; $\phi_i$ and $\theta_j$, respectively, are coefficients; and $\varepsilon_t$ is a normal white noise process with zero mean. ARIMA is an advanced version of ARMA, which also works well for non-stationary time series data. To convert the non-stationary data to stationary data, a data transformation is needed using a $d$-order difference equation [32]. Consequently, ARIMA($p$, $d$, $q$) can be described as Equation (14).

$$\hat{w}_t = \sum_{i=1}^{p} \phi_i w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{14}$$

where $w_t = \nabla^d y_t$ and $\nabla$ is the difference operator, $\nabla y_t = y_t - y_{t-1}$. When $d = 0$, Equation (14) is the same as Equation (13) and, thus, ARIMA acts the same as ARMA. $p$ and $q$ are initialized using the autocorrelation function (ACF) and partial autocorrelation function (PACF).
ACF measures the average correlation between data points in a time series and previous values of the series measured for different lag lengths. PACF is the same as ACF, except that each correlation controls for any correlation between observations of a shorter lag length [32]. Figure 9 demonstrates the ARIMA framework from the input data stage through the prediction stage. In this study, an ARIMA model is proposed to predict future battery capacities. Since we are working with a non-stationary time series, we have made a data transformation with $d = 1$. $p$ and $q$, respectively, are set to 5 and 0, and thus, predictions were made with ARIMA(5, 1, 0). The rationale behind choosing the order of the ARIMA model is as follows. We compare the results from a range of positive integers, $p \in [1, 10]$ (extracted from the existing literature), and select the optimal number of time lags for the autoregressive model, which results in minimal errors compared to other orders in that range. The results from the optimal model are displayed and reported here.
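As a small worked illustration of how an ARIMA(p, 1, 0) model produces a one-step forecast, the sketch below differences the series once, applies autoregressive coefficients to the most recent differences, and then undoes the differencing. The coefficient values are hypothetical placeholders; in practice they would be estimated from the training data, and the noise term is omitted because its expected value is zero.

```python
def difference(series):
    """First-order differencing (d = 1): w_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def forecast_next(series, phi):
    """One-step forecast of an ARIMA(p, 1, 0) model with given AR
    coefficients phi (phi[0] multiplies the most recent difference)."""
    w = difference(series)
    p = len(phi)
    if len(w) < p:
        raise ValueError("not enough observations for the AR order")
    # AR part on the differenced series.
    w_hat = sum(phi[i] * w[-1 - i] for i in range(p))
    # Undo the differencing: add the predicted change to the last value.
    return series[-1] + w_hat


# Hypothetical coefficients for an AR order of 2.
print(forecast_next([10, 8, 7, 5], phi=[0.5, 0.5]))  # differences: -2, -1, -2
```

With all coefficients set to zero, the forecast reduces to the last observed value, which is the behavior of a pure random-walk model.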
There is a battery voltage sequence at each cycle (i.e., a time series of voltages at each cycle). We first compute the permutation entropy of each voltage sequence according to the corresponding algorithm; then, we use the time series of the permutation entropy measures (i.e., one entropy measure at each cycle) as an input in the ARIMA model, compare them with the deviations in the battery capacities, and predict the next-cycle battery capacity as an output of the model.
An algorithm for the ARIMA model is presented as follows.

Reinforcement Learning
Reinforcement Learning (RL) is a machine learning paradigm, commonly implemented with multi-layered neural networks, and has become a focus of research in modern artificial intelligence. The concept is based on rewarding or punishing an agent's performance in a specific environment. A state is a description of the environment made to provide the necessary information for the agent to decide at each time step. For each state $s$, the agent has a number of actions to select from. A policy is required, based on a cost function, to map each state to the optimal action so as to maximize the reward accumulated during the episode [33].
Reinforcement Learning has real-life applications in various fields such as driving cars, landing rockets, trading and finance, diagnosing patients, and so on. This Deep Learning technique differs from supervised learning, as it does not require correct sets of actions and labeled input/output pairs [34]. Instead, the goal is to find a balance between exploration and exploitation. Figure 10 illustrates the schematic of a general Reinforcement Learning structure and its Equations are described as follows.
In this study, we have considered the permutation entropy of the battery voltage as the states and the capacities as the actions, which should be taken at each state based on the given entropy. An algorithm for the RL model is presented in the following.
... using Equation (19)
Evaluate the estimation using the loss function in Equations (20)-(22)
end
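The state/action framing described above (entropy as state, capacity as action) can be illustrated with a deliberately simplified tabular agent. This sketch is not the authors' model and does not implement the paper's equations: it swaps in a tabular, epsilon-greedy value update over discretized entropy bins and candidate capacity levels, with the negative absolute capacity error as a hypothetical reward. All names, bin edges, and hyperparameter values are assumptions made for illustration.

```python
import random


def train_tabular_agent(entropies, capacities, entropy_edges,
                        capacity_levels, epochs=50, alpha=0.5,
                        epsilon=0.1, seed=0):
    """Tabular epsilon-greedy value learning (illustrative stand-in).
    State: entropy bin. Action: index of a candidate capacity level.
    Reward: negative absolute error against the observed capacity."""
    rng = random.Random(seed)
    n_actions = len(capacity_levels)
    q = [[0.0] * n_actions for _ in range(len(entropy_edges) + 1)]

    def state_of(pe):
        # Number of bin edges at or below the entropy value.
        return sum(pe >= edge for edge in entropy_edges)

    for _ in range(epochs):
        for pe, cap in zip(entropies, capacities):
            s = state_of(pe)
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)                     # explore
            else:
                a = max(range(n_actions), key=lambda k: q[s][k])  # exploit
            reward = -abs(capacity_levels[a] - cap)
            q[s][a] += alpha * (reward - q[s][a])  # move value toward reward
    return q, state_of


def predict(q, state_of, capacity_levels, pe):
    """Greedy capacity estimate for a given entropy value."""
    s = state_of(pe)
    best = max(range(len(capacity_levels)), key=lambda k: q[s][k])
    return capacity_levels[best]


levels = [1.1, 1.3, 1.5]
q, state_of = train_tabular_agent([0.2, 0.5, 0.9], [1.5, 1.3, 1.1],
                                  entropy_edges=[0.4, 0.7],
                                  capacity_levels=levels)
print(predict(q, state_of, levels, 0.5))
```

The epsilon parameter controls the balance between exploration and exploitation; a neural-network agent would replace the table with a function approximator but follow the same reward-driven update idea.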
The hyperparameters of the proposed models define how they are structured. Optimal hyperparameters are approximated so that the loss is reduced. In other words, we explore various model architectures and search for the optimal values in the hyperparameter space to minimize the resulting performance metrics; for instance, Mean Squared Error. For this purpose, in the three models, grid search is used for tuning the hyperparameters and achieving reliable comparisons between the numerical results from the models. A model is built for each possible combination of all of the hyperparameter values; next, the models are evaluated based on the performance metrics, and then the architecture which produces the best results is selected. The results and findings are reported in the following section.
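The grid search procedure described above can be sketched generically as follows. The function name, the parameter names in the example grid, and the toy `evaluate` function are hypothetical; in a real experiment `evaluate` would train a model with the given hyperparameters and return a validation metric such as the Mean Squared Error.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid and return the one
    with the lowest score (e.g., validation MSE)."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)  # build and score one candidate model
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy objective standing in for "train a model and compute its MSE".
def toy_evaluate(params):
    return abs(params["learning_rate"] - 0.01) + abs(params["units"] - 16)


grid = {"learning_rate": [0.1, 0.01], "units": [8, 16]}
print(grid_search(grid, toy_evaluate))
```

Because every combination is evaluated, the cost grows multiplicatively with the number of values per hyperparameter, which is why the grids are usually kept small.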

Results and Findings
The numerical results and findings are presented in this section as follows.

Performance Measures
To evaluate the performance of the proposed models, we present the observed and predicted battery capacities for ARIMA and LSTM models and the reward and loss functions obtained from the RL model. Furthermore, we compare the observed and predicted battery capacities gained from each of these models using three performance metrics [35] as shown below:

Mean Squared Error (MSE):
$$MSE = \frac{1}{n} \sum_{k=1}^{n} (C_k - \hat{C}_k)^2$$
Mean Absolute Error (MAE):
$$MAE = \frac{1}{n} \sum_{k=1}^{n} |C_k - \hat{C}_k|$$
Root Mean Squared Error (RMSE):
$$RMSE = \sqrt{\frac{1}{n} \sum_{k=1}^{n} (C_k - \hat{C}_k)^2}$$

where $C_k$ and $\hat{C}_k$, respectively, are the observed and predicted capacity at cycle $k$, and $n$ is the number of test data points.
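The three performance metrics can be computed directly from paired lists of observed and predicted capacities. The following is a minimal sketch with hypothetical function names and made-up example values.

```python
import math


def _check(observed, predicted):
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mse(observed, predicted):
    """Mean of the squared differences."""
    _check(observed, predicted)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mae(observed, predicted):
    """Mean of the absolute differences."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Square root of the MSE, in the same units as the capacity."""
    return math.sqrt(mse(observed, predicted))


# Made-up capacities for three test cycles.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mse(observed, predicted), mae(observed, predicted), rmse(observed, predicted))
```

MSE and RMSE penalize large individual errors more heavily than MAE, so reporting all three gives a fuller picture of how the prediction errors are distributed.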

Numerical Results
The observed and predicted battery capacities from the ARIMA and LSTM models are shown in Figures 11-13. Based on the graphs obtained, it can be seen that in all three datasets the ARIMA model predictions follow the trends in the test data, and so the ARIMA model yields better results than the LSTM model for predicting the time series of battery capacities.
Figure 11. Train, test, and predicted data results from ARIMA and LSTM models for PL19.
Figure 12. Train, test, and predicted data results from ARIMA and LSTM models for PL11.
Figure 13. Train, test, and predicted data results from ARIMA and LSTM models for PL09.
The early battery-life prediction, which includes a prediction of the battery capacities at earlier cycles, is performed, and the results are displayed in Figures 14-16. It is observed that the deviations between the predicted capacities and the actual capacities are not significant, indicating that the proposed ARIMA and LSTM models are capable of predicting battery capacities at earlier cycles.
Figure 14. Train, test, and predicted data results from ARIMA and LSTM models for PL19.
Figure 15. Train, test, and predicted data results from ARIMA and LSTM models for PL11.
Figure 16. Train, test, and predicted data results from ARIMA and LSTM models for PL09.
In the RL model, as demonstrated in Figure 17, the reward values increase sharply and then quickly stabilize with some noise. The loss values increase at first; however, after approximately 250 epochs, they decline to 0, which verifies the procedure of Reinforcement Learning. To find the best data split ratio, our proposed RL approach is initially trained using shuffled datasets with five different training ratios (70%, 75%, 80%, 85%, and 90%). Afterwards, Mean Squared Error (MSE) is utilized as a loss function to evaluate the obtained results. Based on Table 4, the best accuracy is gained by using 90% of each dataset for training purposes and using the rest for the testing process (Figure 18). Finally, this ratio is applied to training the other two models (LSTM and ARIMA). To save space, the split-ratio results from the LSTM and ARIMA models are not reported here; they are consistent with those from RL (i.e., the best training ratio is 90%, leaving 10% for testing).
Figure 18. Finding the best Train-Test Split.

Comparisons
Tables 5-7 present a snapshot comparison of the aforesaid models for the PL19, PL11, and PL09 datasets, respectively. As the results show, in all datasets, ARIMA slightly surpasses the LSTM and RL models since it results in the smallest MSE, MAE, and RMSE values. However, the differences are not significant, and for PL19 and PL11, ARIMA and RL yield approximately the same values of performance measures. It is concluded that LSTM and RL also result in minor errors. From Tables 5-7, it is observed that the ARIMA model yields smaller errors compared to the LSTM model. ARIMA, which is a mean-reverting process, has the ability to predict battery capacities with smaller deviations. However, the LSTM model, which is a recurrent network, attempts to avoid the long-term dependency by storing only necessary information, and thus, it is unable to probabilistically exclude the input (i.e., previous permutation entropy of battery voltage sequences) and the recurrent connections to the units of the network from the activation and weight updates while the model is being trained. Consequently, the deviations between the actual battery capacities and the predicted capacities resulting from the LSTM model are greater than those resulting from the ARIMA model. The results displayed in Figures 11-13 are consistent with the Tables.

Conclusions
In lithium-ion battery applications, failures in the system can be minimized by performing prognostics and health management. Data-driven methods are one way of doing so, and can identify the optimal replacement intervals or the optimal time for changing the battery. This paper presents three different models (LSTM, ARIMA, and RL), all of which are built based on the permutation entropies of the battery voltage sequences, for next-cycle battery capacity prediction using the status of the previous states. In various data conditions, different models may be required; having a collection of models, even for the same purpose, can be useful. In addition to accurate prediction of battery capacities based on the ARIMA model, it is shown that the LSTM and the proposed entropy-based RL models have similar performance and both result in small errors.