Attention-Based Long Short-Term Memory Recurrent Neural Network for Capacity Degradation of Lithium-Ion Batteries

Abstract: Monitoring cycle life can provide a prediction of the remaining battery life. To improve the prediction accuracy of lithium-ion battery capacity degradation, we propose a hybrid long short-term memory recurrent neural network model with an attention mechanism. The hyper-parameters of the proposed model are also optimized by a differential evolution algorithm. Using public battery datasets, the proposed model is compared with several published models, and it gives better prediction performance in terms of mean absolute percentage error and root mean square error. In addition, the proposed model achieves higher prediction accuracy of the battery end of life.


Introduction
Lithium-ion (Li-ion) batteries are widely used in backup power supplies, portable communication equipment, consumer electronics, electric vehicles, and energy storage systems [1]. As the number of charge/discharge cycles increases, battery capacity gradually decreases, so the main challenges are to diagnose the state-of-health (SOH) and predict the remaining useful life (RUL) [1][2][3][4]. The RUL of a Li-ion battery is the length of time from the current time to the end of life (EOL), where EOL is reached when the capacity drops to approximately 70-80% of the nominal capacity [3]. In other words, once the capacity falls to a specific threshold value, the battery has reached a pre-defined aging limit. RUL prediction is therefore one of the main tasks of a battery management system.
There are three types of RUL prediction methods: model-based, data-driven, and hybrid [1][2][3][4]. Model-based methods, including physical and mathematical models, build a model of the battery's circuit behavior or electrochemical degradation from observable values and other key indicators. Data-driven methods build regression models from large amounts of data; they include artificial intelligence (AI) or machine learning, filtering processes, statistical methods, and stochastic processes. Hybrid methods combine two or more model-based or data-driven methods to improve the accuracy of RUL prediction. Kim et al. [5] proposed an SOH classification method based on a multilayer perceptron (MLP). Zhang et al. [6] proposed a long short-term memory (LSTM) recurrent neural network for the RUL prediction of lithium-ion batteries. Li et al. [7] combined the LSTM model with the empirical mode decomposition algorithm and an Elman neural network for battery RUL prediction. Ren et al. [8] implemented another deep learning approach for lithium-ion battery RUL prediction. Similarly, Yang et al. [9] implemented an improved extreme learning machine algorithm for RUL prediction. Support vector machines have also been applied to battery SOH and RUL prediction using different approaches. For instance, Wei et al. [10] used a hybrid model based on a particle filter and support vector regression (SVR) for the prediction of battery SOH and RUL. Wang et al. [11] applied SVR with a differential evolution algorithm to predict battery RUL using cycle number, current, and voltage as inputs. Yang et al. [12] proposed SVR for battery SOH prediction. Zhao et al. [13] presented a data-driven method that combines feature vector selection (FVS) with an SVR model for battery SOH and RUL prediction, using the time interval of an equal charging voltage difference as the health indicator. Wang et al. [14] used the constant-voltage charging profile as a health indicator for lithium-ion battery RUL prediction. Moreover, other machine learning models, such as relevance vector machines (RVM) [15][16][17], Gaussian process regression (GPR) [18][19][20][21], Gaussian process models [22], random forest regression (RFR) [23], and gradient boosted regression (GBR) [24,25], have been applied to battery SOH estimation using different approaches. This study focuses on a hybrid model of LSTM with an attention mechanism, as it gives better prediction accuracy.
Deep learning models such as LSTM and the gated recurrent unit (GRU) have received considerable attention in several research fields because they overcome the gradient vanishing problem of the traditional recurrent neural network (RNN). The long-term memory units of these models store long-term information in the state of other units. The outputs of the traditional RNN model were limited until the development of these deep learning models; LSTM and GRU models have largely replaced traditional RNNs and use gates to control the flow of input and output information, which solves the gradient vanishing problem of the traditional RNN. LSTM is thus one of the widely used prediction techniques for time-series problems, and it has three gates: an input gate, an output gate, and a forget gate. The LSTM network can therefore memorize longer sequences and manage longer dependencies in order to converge on specific problems. However, the network cannot fully memorize long-term information or state and transfer it to the next LSTM unit, which makes it difficult to avoid long-term forgetting in the LSTM model. Therefore, the LSTM model alone cannot give adequate accuracy in continuous prediction. Recently, some researchers have combined the LSTM model with an attention mechanism to improve the information-processing capability of the model, obtaining better prediction accuracy with the combined models. The attention mechanism allows the network to selectively focus on more valuable information. As a result, attention mechanisms have spread to various fields, including time-series prediction. Therefore, we develop an LSTM recurrent neural network model with an attention mechanism to analyze the capacity degradation of lithium-ion batteries. The proposed model has two parts: the LSTM model and the attention model.
The attention mechanism is placed on the output layer of the LSTM and is used to model long-term dependencies. At the same time, a differential evolution (DE) algorithm is used to obtain the optimal hyper-parameters of the model. The performance of the proposed method for capacity degradation and battery end-of-life prediction was studied using four public battery datasets. The rest of this article is organized as follows. Section 2 introduces the feature extraction of Li-ion batteries. The proposed model is discussed in Section 3. Section 4 describes the analysis of capacity degradation estimates and RUL predictions. Finally, conclusions and further research directions are given.

Lithium-Ion Battery Datasets
A total of sixteen batteries from four different types of Li-ion batteries are used to compare the predictive capabilities of the different models. Four 18650-size rechargeable batteries are from the NASA battery data [26], and three LiCoO2 cathode cells are from the CALCE battery data [7,27]. In addition, four pouch-shaped cells are from the Oxford battery degradation dataset [28], and five commercial lithium iron phosphate/graphite cells are from the Toyota data [29]. Table 1 shows some battery specifications; detailed experimental settings can be found in the literature [25][26][27][28][29]. When the capacity drops to approximately 70~80% of the rated capacity, the experiment is stopped. The cycle life of the battery capacity is normalized as

C_normalized = C_current / C_nominal, (1)

where C_normalized represents the SOH, C_current is the current capacity of the battery, and C_nominal is the rated capacity of the battery (see the last column of Table 1). The capacity of the battery is affected by the loss of available lithium ions and the loss of anode and cathode materials. Two variables, cycle number and temperature, are used as model inputs, except for the CALCE data; since temperature data are not available for the CALCE data, the cycle number alone is used as the input variable.
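As a minimal sketch, the capacity normalization above can be computed as follows; the capacity values here are illustrative only and are not taken from any of the datasets:

```python
import numpy as np

def normalize_capacity(c_current, c_nominal):
    """Normalize measured capacity to state-of-health: SOH = C_current / C_nominal."""
    return np.asarray(c_current, dtype=float) / c_nominal

# Illustrative fade curve for a 2.0 Ah rated cell
capacity = [2.0, 1.95, 1.88, 1.62, 1.40]   # Ah, over increasing cycle count
soh = normalize_capacity(capacity, c_nominal=2.0)
# soh[-1] == 0.70, i.e., this cell has reached a 70% EOL threshold
```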

Long Short-Term Memory with Attention Mechanism
The attention-based LSTM model has two parts: an LSTM layer and an attention layer. The LSTM network can learn and manage longer sequences and dependencies in order to converge on specific solutions. The three nonlinear gating units of the long short-term memory recurrent neural network are the forget gate, input gate, and output gate [30]. The purpose of the storage unit in the LSTM network is to identify when to acquire new information and when to forget old information. In order to allow the network to focus on valuable selected information, an attention layer is combined with the LSTM model in this study. The idea of an attention-based LSTM model is to add an attention layer to the output layer of the LSTM unit to model long-term dependencies in the network; it can also control importance-based sampling. Therefore, this study considers an LSTM model with an attention mechanism for the prediction of capacity degradation trends. The gate and state updates of the LSTM are

i_i = σ(W_i [h_{i−1}, x_i] + b_i), (2)
f_i = σ(W_f [h_{i−1}, x_i] + b_f), (3)
c̃_i = tanh(W_c [h_{i−1}, x_i] + b_c), (4)
c_i = f_i ⊗ c_{i−1} + i_i ⊗ c̃_i, (5)
o_i = σ(W_o [h_{i−1}, x_i] + b_o), (6)
h_i = o_i ⊗ tanh(c_i), (7)

where Equation (2) provides the input gate of the network, which controls the level of new memory added; Equation (3) regulates the amount of forgotten memory in the forget gate; Equation (6) moderates the level of the output memory in the output gate; and the LSTM finally calculates the control state h_i and the cell state c_i.
Here, σ and tanh are the sigmoid and hyperbolic tangent activation functions, respectively; ⊗ indicates element-wise multiplication; [h_{i−1}, x_i] is the concatenation of h_{i−1} and x_i; W_i, W_f, W_c, and W_o are the learned weight parameters; and b_i, b_f, b_c, and b_o are the learned bias parameters.

Moreover, an attention mechanism is used to improve the accuracy of the LSTM model [31]. The attention layer aids the selection of the critical outputs of the earlier layers for each subsequent phase in the model; it helps the network focus on specific important information. The functions of the attention model are

e = tanh(H),
α = softmax(u_a^T e),
r = H α^T,

where H is the matrix of extracted features [l_t1, l_t2, ..., l_tn], e ∈ R^n is a vector, u_a is the embedded attention vector, α is the vector of attention weights over the extracted features H, and r is the final output of the attention model, i.e., the weighted sum of the extracted features H. The embedding u_a is learned during model training. Figure 1 shows a summary of the proposed model, an attention-based LSTM model for capacity degradation trend prediction of lithium-ion batteries.

The support vector machine is a common machine learning model for prediction, classification, clustering, and other learning tasks. In this study, support vector regression (SVR) is used for the prediction of the capacity degradation trend. Let the training set be {(x_1, y_1), ..., (x_n, y_n)}, where x_i ∈ R^n is a feature vector and y_i ∈ R is the target output. The SVR function is given by

f(x) = w^T φ(x) + b,

where φ(x) is a nonlinear mapping function, w ∈ R^n and b ∈ R are adjustable coefficients, and K(x_i, x_j) = exp(−γ‖x_i − x_j‖²) is the Gaussian radial basis kernel function. The SVR hyper-parameters, gamma (γ), cost (C), and epsilon (ε), are optimized by the DE algorithm in this study.

The MLP model is a feed-forward artificial neural network model for classification or regression problems.
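The attention pooling step described above can be sketched in NumPy as follows. This is an illustrative implementation of soft attention over a matrix of hidden states, not the authors' exact code; `u_a` would be learned during training, but here it is drawn at random purely to exercise the computation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(H, u_a):
    """Soft attention over LSTM outputs.

    H   : (d, n) matrix of hidden states [l_t1, ..., l_tn]
    u_a : (d,) attention embedding (learned in practice; random here)
    Returns the attention-weighted context r and the weights alpha.
    """
    e = np.tanh(H)             # nonlinearity over extracted features
    alpha = softmax(u_a @ e)   # (n,) attention weights; they sum to 1
    r = H @ alpha              # weighted sum of hidden states
    return r, alpha

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 5))   # 8-dim hidden states over 5 time steps
u_a = rng.standard_normal(8)
r, alpha = attention_pool(H, u_a)
```

Because the weights pass through a softmax, every time step contributes a positive share, and the network can learn to concentrate that share on the most informative cycles.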
A seven-layer MLP network, consisting of an input layer, five hidden layers, and an output layer, is considered in this study. The number of neurons, dropout rate, number of epochs, and batch size of the MLP model are optimized by the DE algorithm.
The parameters of the proposed model are optimized by the DE algorithm, a search heuristic introduced by Storn and Price in 1996 [33]. In this study, the DEoptim R library is used, which has parameters such as NP, F, and CR. NP is the number of parameter vectors in the population; the algorithm starts from random guesses between the lower and upper bounds at generation zero and searches for the optimal parameter values. The scale factor F is a positive number between zero and one. Crossover of mutant components continues until either the full parameter vector length is reached or a random draw exceeds the crossover probability (CR), which is between zero and one. The choices of NP, F, and CR depend on the specific problem. The following procedure is carried out to obtain the optimal parameters of the proposed model, the SVR model, and the MLP model using the DE algorithm.
Step 1. Normalize all features, such as capacity, cycle, and temperature.
Step 2. Choose the fitness function, which is the mean absolute error (MAE) in this study:

MAE = (1/n) Σ_{i=1}^{n} |C_i − Ĉ_i|,

where C_i is the actual SOH at cycle i, Ĉ_i is the predicted SOH at cycle i, and n is the number of cycles used in the calculation.
Step 3. Select the ranges for the model parameters.
Step 4. Decide the values of DE parameters. NP = 40, CR = 0.9, and F = 0.8 are used in this study.
Step 5. Obtain the optimal values for each parameter.
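The steps above can be sketched with a minimal DE/rand/1/bin loop. This is an illustrative stand-in for the DEoptim R library used in the study: it uses the same NP = 40, F = 0.8, and CR = 0.9 settings, but a simple quadratic toy fitness replaces the MAE of a trained model:

```python
import numpy as np

def de_optimize(fitness, lower, upper, NP=40, F=0.8, CR=0.9, gens=60, seed=0):
    """Minimal DE/rand/1/bin sketch (NP, F, CR as in the study)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    D = lower.size
    # Generation zero: random guesses between the bounds
    pop = lower + rng.random((NP, D)) * (upper - lower)
    fit = np.array([fitness(x) for x in pop])
    for _ in range(gens):
        for i in range(NP):
            # Mutation: rand/1 difference vector, clipped to the bounds
            a, b, c = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), lower, upper)
            # Binomial crossover with probability CR (at least one gene crosses)
            cross = rng.random(D) < CR
            cross[rng.integers(D)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection
            f_trial = fitness(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Toy fitness standing in for the MAE of a trained model
best_x, best_f = de_optimize(lambda x: np.sum((x - 0.5) ** 2),
                             lower=[0, 0], upper=[1, 1])
```

In the study itself, each fitness evaluation would train the LSTM-with-attention (or SVR/MLP) model with the candidate hyper-parameters and return the MAE from Step 2.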

Model Performance for Prediction of Capacity Degradation Trend
Four Li-ion batteries, #5, #6, #7, and #18, are used to illustrate the effectiveness of the proposed model for SOH estimation. We compare the prediction accuracy of the proposed model with that of the other two models, as indicated in Table 2. The first 80 cycles of each battery are used as training data, and the remaining cycles are used as test data. The mean absolute percentage error (MAPE) and root mean square error (RMSE) are chosen for evaluating the prediction accuracy of the models:

MAPE = (100%/n) Σ_{i=1}^{n} |(C_i − Ĉ_i)/C_i|,
RMSE = √((1/n) Σ_{i=1}^{n} (C_i − Ĉ_i)²).

In Table 2, the results show that the proposed model performs better than the SVR and MLP models in terms of MAPE and RMSE on the test data. For battery #5, the RMSEs of SVR, MLP, and LSTM with attention were 0.0123, 0.0174, and 0.0078, respectively. The results show that the proposed model is the best, followed by the SVR model. Moreover, the performance of the models trained with cycles #1-#80 for batteries #5 and #6 is shown in Figure 2a,b. It shows that the proposed model has better prediction accuracy than the SVR and MLP models.
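The two evaluation metrics can be sketched directly from their definitions; the SOH values below are illustrative and are not taken from Table 2:

```python
import numpy as np

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return 100.0 * np.mean(np.abs((actual - pred) / actual))

def rmse(actual, pred):
    """Root mean square error."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.sqrt(np.mean((actual - pred) ** 2))

# Illustrative actual vs. predicted SOH values
actual = [0.95, 0.90, 0.85, 0.80]
pred   = [0.94, 0.91, 0.84, 0.81]
```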
We investigate the effects of capacity regeneration by using three different ranges of training data: cycles #1-#80, #1-#100, and #1-#120. Table 3 shows the RMSE values of batteries #5, #6, #7, and #18 under the three corresponding test datasets. For instance, the RMSE values of the proposed model on battery #5 under the three test datasets are 0.0078, 0.0047, and 0.0058, respectively. The results show that the proposed model is not significantly affected by the choice of training dataset; therefore, we can conclude that the proposed model is a robust approach.

Pouch-shaped batteries were used to study the performance of a fusion method based on wavelet de-noising (WD) and a hybrid Gaussian process function regression (HGPFR) model under different training data [20]. Here, three different training ranges for Cell-1 and Cell-7, namely cycles 100-3000, 100-3500, and 100-4000, are considered. Table 4 shows the predicted RMSE values for the different test datasets in order to compare the proposed model with the published methods. For example, the RMSEs of GPR, HGPFR, WD-HGPFR, SVR, MLP, and LSTM with attention for Cell-1 under training data of cycles 100-3000 are 0.0600, 0.0408, 0.0108, 0.0085, 0.0101, and 0.0030, respectively. The results show that the LSTM with attention model provides higher prediction accuracy than the three models in [20] and the SVR and MLP models.

In addition, the performance of the proposed model on unseen datasets is validated with other cells from the same experiments. For example, batteries #5 and #6 are used as training data to predict the SOH of batteries #7 and #18. Table 5 shows the prediction accuracy of the models on the test data. The proposed model also outperforms the individual models on unseen data; therefore, we can conclude that the proposed model provides better performance for unseen data.

Battery EOL Prediction
When the capacity drops to 70% or 80% of the rated capacity, the battery is regarded as having reached EOL. In this study, 70% of the rated capacity is taken as the EOL for batteries #18, CS2_37, and CS2_38. For battery #7, 75% of the rated capacity is used as the EOL. Note that the capacity of battery #6 is greater than 2.0 Ah in cycles 1-7, so the training data start after cycle #8. For Cell-7 from the Oxford data, 80% of the rated capacity is taken as the EOL. Moreover, 85% of the rated capacity is taken as the EOL for the batteries from the Toyota data, because if 80% were used, the predicted EOL of all models would be similar (i.e., the end of the test). Table 6 shows the performance of the models in battery EOL prediction for unseen datasets. The first cycle number is used as the starting point for all test data, and EOL is expressed as a cycle number. To evaluate the prediction performance of the models, we use the relative error (RE) as a performance measure:

RE = |R − R̂| / R,

where R represents the actual EOL value and R̂ represents the predicted EOL value. The results indicate that the LSTM with attention model is better than the SVR and MLP models in most cases for the unseen datasets. Therefore, we can conclude that the LSTM with attention model provides better prediction performance for battery EOL prediction.

Conclusions
Cycle life prediction plays a vital role in a battery management system. In this study, we propose an LSTM model with an attention mechanism to analyze the capacity degradation of Li-ion batteries. In addition, the DE algorithm is used to obtain the optimal hyper-parameters of the SVR, MLP, and LSTM with attention models. Using four batteries from the NASA data and two cells from the Oxford data, the proposed model performs better than the SVR and MLP models for the prediction of the capacity degradation trend in terms of the MAPE and RMSE criteria. Moreover, we found that the proposed model is not significantly affected by different training datasets, and it can accurately predict the SOH and EOL of the battery on unseen datasets. Therefore, we can conclude that the LSTM model with an attention mechanism produces more accurate and reliable results.
In future work, we will study other hybrid artificial intelligence models for battery state-of-health estimation and RUL prediction.