Ensemble Model Based on Stacked Long Short-Term Memory Model for Cycle Life Prediction of Lithium-Ion Batteries

Abstract: To meet the target value of cycle life, it is necessary to accurately assess the capacity degradation of lithium-ion batteries in the battery management system. We present an ensemble model based on stacked long short-term memory (SLSTM), which is used to predict the capacity cycle life of lithium-ion batteries. The ensemble model combines LSTM with attention and gradient boosted regression (GBR) models to improve prediction accuracy, where the individual prediction values are used as input to the SLSTM model. Among 13 cells, single and multiple cells were used as the training set to verify the performance of the proposed model. In seven single-cell experiments, 70% of the data were used for model training, and the rest of the data were used for model validation. In the second experiment, one or two cells were used for model training, and the other cells were used as test data. The results show that the proposed method is superior to individual and traditional ensemble learning models. We used the Monte Carlo dropout technique to estimate variance and obtain prediction intervals. In the second experiment, the average absolute percentage errors of GBR, LSTM with attention, and the proposed model are 28.6580, 1.7813, and 1.5789, respectively.


Introduction
Lithium-ion (Li-ion) batteries have the advantages of low cost, high energy density, and long service life, so they are widely used in mobile electronics and the automotive industry [1,2]. As with all battery chemistries, Li-ion batteries degrade with each charge and discharge cycle. Therefore, accurately describing and estimating the degradation process of lithium-ion batteries has become an important issue, in which the state of health (SOH) and remaining useful life (RUL) are two important indicators [3,4]. Accurate SOH estimates allow users to maintain the battery more effectively and improve its safe use [5,6]. SOH indicates the present performance of the battery in terms of either capacity or resistance/power, while RUL indicates the remaining combined cycle and calendar life until the predefined end-of-life (EOL) level is reached.
Capacity degradation and cycle life prediction are typically performed with model-based, data-driven, or hybrid methods. Model-based methods rely on empirical or physical degradation models; constructing such a mathematical model is not an easy task, but the method does not require large amounts of data. The data-driven approach can capture strong nonlinearities without prior system knowledge. Data-driven methods have been widely used to predict cycle life and can represent the inherent behavior of the battery without requiring expert knowledge of degradation mechanisms [7]; however, they require large amounts of data and computational cost.
Hybrid methods show great potential for better prediction accuracy than model-based and data-driven prediction methods [8]. Ensemble learning is one of the most popular hybrid approaches; it improves prediction accuracy by aggregating the predictions generated by multiple learning algorithms [9].
Previous research has shown that ensemble learning algorithms generally outperform any individual learning algorithm [9][10][11][12][13]. Most ensemble methods use the average value of multiple models as the final prediction result; however, the prediction of ensemble learning in regression problems does not necessarily guarantee that its prediction results will be better than a single model. Therefore, a new method needs to be developed to test whether an ensemble model combining multiple learning algorithms can provide better prediction performance than a single model. The long short-term memory (LSTM) recurrent neural network (RNN) was used to analyze the capacity degradation of lithium-ion batteries [14]. The LSTM model can be extended and added to other mechanisms such as bidirectional, stacked, and attention mechanisms.
In this study, we propose an ensemble model based on a stacked long short-term memory (SLSTM) model for the cycle life estimation of lithium-ion batteries. The final prediction output is obtained by stacking LSTM models rather than by averaging the prediction values of the individual models. In addition, the selected features and optimal hyperparameter values may affect the accuracy of cycle life prediction, and few studies consider multiple features as input to the model. For example, Chen et al. [15] and Wang and Mamo [16] used support vector machines with multiple features, such as cycle number and temperature, to predict the SOH of lithium-ion batteries. In this study, the best model hyperparameters are obtained with the differential evolution (DE) algorithm, and several features are also considered. LSTM with attention and gradient boosted regression (GBR) models are used as the two single models. Sections 2 and 3 describe the experimental data and the proposed model, respectively. Section 4 presents the model verification. Finally, we summarize the conclusions and future research directions.

Experimental Data
Thirteen commercial lithium iron phosphate (LFP)/graphite cells (A123 Systems, model APR18650M1A, 1.1 Ah rated capacity) [2] were used for model verification. These 13 cells were cycled in a temperature-controlled environmental chamber (30 °C) under various fast-charging policies and discharged with a constant-current constant-voltage (CC-CV) discharge at 4 C to 2.0 V with a current cutoff of C/50. The rated capacity of each cell is 1.1 Ah, and the rated voltage is 3.3 V.
The entire data set consisted of three batches of cells run in parallel. Each cell was tested using a two-step fast-charging policy. For example, a two-step policy might include a 6 C charging step from 0% to 50% state-of-charge (SOC), followed by a 4 C charging step from 50% to 80% SOC. The 72 charging policies represent different combinations of current steps in the 0% to 80% SOC range [2]. The test conditions differed slightly between batches. For the "2017-05-12" batch, the rest periods after reaching 80% SOC during charging and after discharging were 1 min and 1 s, respectively. For the "2017-06-30" batch, both rest periods after reaching 80% SOC were 5 min. For the "2018-04-12" batch, the rest time after reaching 80% SOC during charging, after the internal resistance test, and before and after discharging was 5 s.
For model verification, we selected thirteen cells from the "2017-05-12" and "2017-06-30" batches. Table 1 lists the 13 cells with their charging policies. In the first experiment, seven cells were used, where the first 70% of the data of each cell was used for model training to predict the remaining discharge capacity. In the second experiment, two groups, (cell 1, cell 2, cell 3) and (cell 2_25, cell 2_26, cell 2_27), were used, where the cells within each group were cycled under the same charging policy. Features such as discharge capacity, cycle number, temperature, and internal resistance of each cell were used for model training to evaluate the performance of the proposed model.

Prediction Models
In this study, we propose an ensemble model based on the SLSTM model for cycle life prediction. The ensemble model is built on two base models, LSTM with attention and GBR, which are introduced in the following subsections.

LSTM with an Attention Mechanism
Although the standard recurrent neural network (RNN) is an extension of the conventional feedforward neural network, it suffers from vanishing or exploding gradients. LSTM was developed to solve these problems and achieves excellent performance. It has dedicated memory and forgetting mechanisms and can flexibly adapt to the temporal characteristics of the learning task. The units of the LSTM model include a forget gate, an input gate, and an output gate [17,18]. The forget gate determines which information is discarded from the cell state. The input gate determines whether new information should be stored in the cell state. The output gate determines what information is transferred from the cell state to the current hidden layer. These gating units are given by

f_t = \sigma(W_f X_t + U_f h_{t-1} + b_f),
i_t = \sigma(W_i X_t + U_i h_{t-1} + b_i),
o_t = \sigma(W_o X_t + U_o h_{t-1} + b_o),

where \sigma is the sigmoid function, which keeps each output value between 0 and 1; h_{t-1} and X_t are the previous hidden state and the current input; and (W_f, U_f, b_f), (W_i, U_i, b_i), and (W_o, U_o, b_o) are the input weight, recurrent weight, and bias of the forget, input, and output gates, respectively. A tanh layer forms the new memory as \tilde{c}_t = \tanh(W_c X_t + U_c h_{t-1} + b_c), where (W_c, U_c, b_c) are the input weight, recurrent weight, and bias of the new memory. The cell state is then updated by c_t = f_t \times c_{t-1} + i_t \times \tilde{c}_t. Finally, the hidden state is obtained by multiplying the output gate with the transformed cell state, h_t = o_t \times \tanh(c_t). Figure 1 shows the architecture of the LSTM cell [19,20].
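As an illustration of the gate equations above, the following pure-Python sketch runs one forward step of a single-unit (scalar) LSTM cell. The weight names and the all-zero initialization are illustrative only, not values from the model used in this study:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of a single-unit LSTM cell.

    p holds scalar weights (W* input, U* recurrent, b* bias) for the
    forget (f), input (i), candidate (c), and output (o) gates.
    """
    f = sigmoid(p["Wf"] * x_t + p["Uf"] * h_prev + p["bf"])          # forget gate
    i = sigmoid(p["Wi"] * x_t + p["Ui"] * h_prev + p["bi"])          # input gate
    c_tilde = math.tanh(p["Wc"] * x_t + p["Uc"] * h_prev + p["bc"])  # new memory
    c = f * c_prev + i * c_tilde                                     # cell-state update
    o = sigmoid(p["Wo"] * x_t + p["Uo"] * h_prev + p["bo"])          # output gate
    h = o * math.tanh(c)                                             # hidden state
    return h, c

# With all-zero weights, every gate outputs sigmoid(0) = 0.5 and the
# candidate memory is tanh(0) = 0, so the cell state simply halves.
params = {k: 0.0 for k in
          ("Wf", "Uf", "bf", "Wi", "Ui", "bi", "Wc", "Uc", "bc", "Wo", "Uo", "bo")}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, p=params)  # c = 1.0
```

A real LSTM layer applies the same six equations with vector states and weight matrices; the scalar form is used here only to make each gate visible.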
Recently, attention mechanisms have been widely used to analyze images and time-series data. Compared with ordinary deep learning models, combining attention with LSTM can obtain better results. The attention layer helps select the outputs of earlier layers that are critical to each subsequent stage of the model, allowing the network to focus selectively on specific information. Detailed information on attention-based LSTM models can be found in [21][22][23][24]. In this study, the attention-based LSTM model is used as a single model, and its best parameters are obtained with the DE algorithm.

Gradient Boosted Regression
Gradient boosting is a useful machine learning method that obtains accurate results on a variety of practical tasks. It iteratively focuses on the errors made at each step, combining weak learners into a strong learner expressed as the sum of consecutive weak learners [25][26][27]. The boosting iteration can be viewed as functional gradient descent. Let S = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} be the training samples. A function F(x) is used to predict values under a loss function L(y, F(x)). We minimize the expected loss to obtain an approximation \hat{F}(x) of the target function f(x). GBR follows a regularization method based on shrinkage and updates the model over the corresponding regions as

F_m(x) = F_{m-1}(x) + \nu \sum_{\ell=1}^{L} \gamma_{\ell m} \mathbf{1}(x \in R_{\ell m}),

where \nu is the shrinkage, which controls the learning rate of the procedure, and L is the number of leaves defined by the rectangular regions R_{\ell m}. The coefficients \gamma_{\ell m} of a new tree are fitted while retaining its leaf rectangles. Parameters such as shrinkage (\nu), number of trees (t), number of leaves (L), bag fraction, and interaction depth are determined by the DE algorithm. The bag fraction is the fraction of the training observations randomly selected to grow the next tree.

Figure 2 illustrates the framework for cycle life prediction using the ensemble model. In the first level, two machine learning models, LSTM with attention and GBR, are used to generate predictions. These predicted values, together with the actual features, are used as input features for the final prediction. In the second level, the SLSTM model with a sliding window method produces the final predicted value. All hyperparameters of each model are obtained with the DE algorithm; details of the DE algorithm can be found in [28][29][30]. For example, five parameters need to be determined in advance for the LSTM model: lookback, batch size, number of neurons, steps per epoch, and epochs. The best hyperparameters can be obtained by the following steps.
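The boosting update above can be sketched in a few lines of pure Python, using depth-one stumps as weak learners under squared loss; the toy data and the shrinkage value are illustrative only, not the GBR configuration used in this study:

```python
import statistics

def fit_stump(x, residuals):
    """Find the split point that minimizes squared error of a depth-1 tree."""
    best = None
    for s in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= s]
        right = [r for xi, r in zip(x, residuals) if xi > s]
        if not left or not right:
            continue
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi, s=s, lm=lm, rm=rm: lm if xi <= s else rm

def gbr_fit(x, y, n_trees=100, shrinkage=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    f0 = statistics.mean(y)                      # initial constant model
    trees, pred = [], [f0] * len(x)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        # shrinkage nu scales each tree's contribution (the update formula above)
        pred = [pi + shrinkage * tree(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + shrinkage * sum(t(xi) for t in trees)

# Toy degradation-like data: "capacity" drops after "cycle" 5.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.1, 1.1, 1.1, 1.1, 1.1, 0.9, 0.9, 0.9, 0.9, 0.9]
model = gbr_fit(x, y)
```

After 100 boosting rounds the residuals have decayed geometrically by the factor (1 − shrinkage) per round, so the model reproduces the step in the data almost exactly.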

Proposed Model
Step 1. Extract features such as cycle number, capacity, internal resistance, and average temperature.
Step 2. Use the mean absolute percentage error (MAPE) as the fitness function, which is obtained by

\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,

where y_t is the actual capacity at cycle t, \hat{y}_t is the predicted capacity at cycle t, and n is the prediction length.
Step 3. Select the range of LSTM and GBR hyperparameters, and use the specified model to calculate the MAPE value.
Step 4. Set the DE control parameters NP, CR, and F, which are 50, 0.9, and 0.8, respectively, in this study.
Step 5. Run the DE algorithm until the stopping criterion is met to obtain the best hyperparameter values.

The LSTM model should include dropout to reduce overfitting and enhance model performance. Dropout is a regularization method in which inputs and recurrent connections of the LSTM cell may be randomly excluded from activation and weight updates during network training; that is, two parameters (dropout and recurrent dropout) are applied to the linear transformations of the input and recurrent states. Accordingly, the Monte Carlo (MC) dropout technique is used to obtain the variance and bias of the proposed model, and the sliding window method is used to construct its prediction interval. Converting a conventional network into a Bayesian network through MC dropout is as simple as keeping dropout active in each layer during both training and testing. It is equivalent to sampling from a Bernoulli distribution and provides predictive stability across samples [31]. The idea is to run the model several times with random dropout, which produces different output values; we then calculate the empirical mean and variance of the outputs to obtain the prediction interval for each time step. The sliding window provides a piecewise approximation of the actual time-series values. The window and segment sizes are increased until a sufficiently small approximation error is reached. After selecting the first segment, we select the next segment starting from the end of the first, and repeat the process until all time-series data are segmented.
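The DE search in the steps above can be sketched as follows. The control parameters NP = 50, CR = 0.9, and F = 0.8 are taken from Step 4; everything else (the bounds, the number of generations, and the stand-in fitness function, a smooth bowl over two "hyperparameters" that replaces the MAPE of a trained model) is an assumption for illustration:

```python
import random

def differential_evolution(fitness, bounds, NP=50, F=0.8, CR=0.9,
                           generations=100, seed=42):
    """Minimal DE/rand/1/bin: minimizes `fitness` over box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    fit = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(NP):
            a, b, c = rng.sample([j for j in range(NP) if j != i], 3)
            j_rand = rng.randrange(dim)          # at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)      # clip to the search range
                else:
                    v = pop[i][j]
                trial.append(v)
            f_trial = fitness(trial)
            if f_trial <= fit[i]:                # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(NP), key=fit.__getitem__)
    return pop[best], fit[best]

# Stand-in fitness with a known minimum at (3, 0.5), mimicking a MAPE
# surface over two hyperparameters such as lookback and dropout rate.
best_x, best_f = differential_evolution(
    lambda v: (v[0] - 3.0) ** 2 + (v[1] - 0.5) ** 2,
    bounds=[(0.0, 10.0), (0.0, 1.0)])
```

In the actual procedure, evaluating `fitness` means training the LSTM or GBR model with the candidate hyperparameters and returning the validation MAPE of Step 2, which is why the search takes hours rather than seconds.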

Analysis Results
This section discusses the results of the proposed model for Li-ion battery capacity degradation and cycle life prediction using single and multiple cells as training data. We compare the performance of the proposed model with two individual models, GBR and LSTM with attention, and with the conventional ensemble learning model, for which the average of the two predicted values from the individual models is used. The confidence intervals of the proposed model are also reported.

Capacity Degradation Trend Prediction
Two experiments were carried out in this study. In the first experiment, seven cells were tested. For each cell, 70% of the data was used for training, and the remaining 30% was used to test the model. Table 2 shows the best parameters of the LSTM with attention and proposed models, such as lookback, batch size, number of neurons, steps per epoch, and epochs. The MAPE and root mean square error (RMSE) were selected as measures for the test data, where RMSE is obtained by

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}.

Table 3 shows the capacity degradation predictions of the different models. The results indicate that the proposed model outperforms the two single models and the conventional ensemble learning model. For example, on cell #2_5, the RMSE values of the GBR, LSTM with attention, conventional ensemble learning, and proposed models are 0.0290, 0.0121, 0.0198, and 0.0047, respectively. The LSTM with attention model provides the second-best prediction of lithium-ion capacity trends. Because the GBR model gives the worst predictions, the conventional ensemble learning model performs worse than the LSTM with attention model. Consequently, the conventional ensemble learning model does not guarantee predictions better than those of a single model. For further clarification, Figures 3 and 4 show the prediction performance of the different models for cells #2_11 and #2_12, respectively. Although the proposed model provides better prediction performance, it requires a longer computation time than the other models, as shown in Table 4; this is a limitation of our proposed model. The computation time of the LSTM with attention model is the lowest; however, finding its optimal parameters takes several hours, depending on the number of iterations in the DE algorithm. The MC dropout method is used for variance estimation to construct a prediction interval.
The proposed model was run 100 times with random dropout, producing different output values. We calculate the mean and variance of the predictions from these runs and then use the conventional formula to construct the prediction interval. Figure 5 shows the prediction interval of the proposed model for cells #2_11 and #2_12 at a 95% confidence level, where LB is the lower boundary and UB is the upper boundary. A narrow interval indicates that the prediction is more reliable and credible. The prediction uncertainty increases as the prediction point moves further from the starting cycle of the test data.

In the second experiment, one or two cells were used as training data, and other cells were used as test data to verify the proposed model. Table 5 lists the performance of the different models. The average MAPE values of GBR, LSTM with attention, and the proposed model are 1.2734, 0.9029, and 0.7294, respectively. The results show that the proposed model performs better than the GBR and LSTM with attention models in all cases. For test cells such as #3 and #2_27, two training cells provide better predictive performance than one training cell. Figure 6 shows the prediction performance of the proposed model using one or two cells as training data for cell #2_27: (a) predicted capacity and (b) absolute prediction error. This shows that the proposed model can accurately predict the degradation trend of lithium-ion battery capacity.
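The interval construction described above can be sketched as follows. The simulated runs stand in for 100 stochastic forward passes of the real model with dropout left active; the fade slope and noise level are illustrative assumptions:

```python
import math
import random

def mc_dropout_interval(predictions, z=1.96):
    """Empirical mean and 95% interval from repeated stochastic forward passes.

    `predictions` is a list of runs; each run is a list of per-cycle outputs
    produced with dropout kept active at test time.
    """
    n_runs, n_steps = len(predictions), len(predictions[0])
    means, lowers, uppers = [], [], []
    for t in range(n_steps):
        vals = [predictions[r][t] for r in range(n_runs)]
        mu = sum(vals) / n_runs
        var = sum((v - mu) ** 2 for v in vals) / n_runs
        sd = math.sqrt(var)
        means.append(mu)
        lowers.append(mu - z * sd)   # LB at ~95% confidence
        uppers.append(mu + z * sd)   # UB at ~95% confidence
    return means, lowers, uppers

# Simulated example: 100 stochastic runs over 50 cycles of a fading capacity.
rng = random.Random(0)
runs = [[1.1 - 0.002 * t + rng.gauss(0.0, 0.005) for t in range(50)]
        for _ in range(100)]
mean, lb, ub = mc_dropout_interval(runs)
```

Each per-cycle interval is mean ± 1.96 standard deviations of the 100 outputs, which is the conventional normal-approximation formula referred to in the text.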

Cycle Life Prediction
The cycle life prediction of the proposed model is evaluated with the absolute percentage error (APE), which is given by

\mathrm{APE} = 100 \times \left| \frac{CL - \widehat{CL}}{CL} \right|,

where CL represents the actual cycle life and \widehat{CL} represents the predicted cycle life. A lower APE value indicates better model performance. Table 6 provides the cycle life performance of the different models in the single-cell experiment; the actual cycle life values for the seven cells are given in its second column. In addition, Table 7 shows the estimated cycle life of six cells using one or two cells as training data, together with the actual cycle life of the four test cells. For cells #2_26 and #2_27, averaging the experimental results shows that the proposed model performs better than the single models.
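The APE above and the MAPE of Section 3 are straightforward to compute; the cycle-life numbers in the example are illustrative, not results from the tables:

```python
def ape(actual, predicted):
    """Absolute percentage error for a single cycle-life prediction."""
    return 100.0 * abs(actual - predicted) / actual

def mape(actual, predicted):
    """Mean absolute percentage error over a capacity prediction sequence."""
    return sum(100.0 * abs(a - p) / a
               for a, p in zip(actual, predicted)) / len(actual)

# Example: a cell with an actual life of 800 cycles predicted as 812 cycles.
print(ape(800, 812))                     # → 1.5
print(mape([1.0, 0.8], [0.99, 0.82]))    # averages the per-cycle errors
```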

Conclusions
Our research uses an ensemble model based on stacked long short-term memory, which combines LSTM with attention and gradient boosted regression models to predict the cycle life of lithium-ion batteries. The model hyperparameters are obtained with the DE algorithm. The performance of the proposed model is compared with the single models using single and multiple cells for training. The first experiment used data from seven cells to verify the performance of the proposed model, where 70% of the data was used for model training and the remaining data for model verification. The second experiment used one or two cells as the training set and other cells to verify the model's predictive ability. In most cases, the comparison results verify that the proposed model is superior to the single models in predicting the capacity degradation trend. In the first experiment, the maximum APE for predicting cycle life is 0.4504. In the second experiment, the average APE values of GBR, LSTM with attention, and the proposed model are 28.6580, 1.7813, and 1.5789, respectively. These results show that the proposed model has better cycle life prediction performance than the other models. In addition, the prediction variance of the model can be obtained using the MC dropout technique, which provides a measure of prediction uncertainty. From the analysis results, we conclude that the proposed model can provide more accurate and reliable predictions; however, the computation time required by the ensemble model is longer than that of a single model.
In the future, the capacity degradation and cycle life prediction of other ensemble learning models on different types of lithium-ion batteries are worth investigating. In addition, the number of single models used in the ensemble can be increased to more than five. In Table 5, we found that two cells as training data provide better prediction accuracy than one cell; however, this result comes from a small experiment, and larger-scale experimental analysis, including the use of transfer learning, is worth further study.