A LSTM-STW and GS-LM Fusion Method for Lithium-Ion Battery RUL Prediction Based on EEMD

Abstract: To address inaccurate remaining useful life (RUL) prediction for lithium-ion batteries, this paper develops a battery RUL prediction method that fuses a Long Short-Term Memory network with a Sliding Time Window (LSTM-STW) and a Gaussian or Sine function fitted by the Levenberg-Marquardt algorithm (GS-LM), based on ensemble empirical mode decomposition (EEMD). First, EEMD is used to decompose the original data into high-frequency and low-frequency components. Second, LSTM-STW and GS-LM are used to predict the high-frequency and low-frequency components, respectively. Finally, the LSTM-STW and GS-LM prediction results are integrated to obtain the final lithium-ion battery RUL prediction. This article takes the lithium-ion battery data published by NASA as input. The experimental results show that the method achieves higher accuracy, captures the phenomenon of sudden capacity increase, and is less affected by the prediction starting point. The performance of the proposed method is better than that of other typical battery RUL prediction methods.


Introduction
Electric energy, as a clean secondary energy, is used in all aspects of our life. Lithium-ion batteries, as devices for storing electrical energy, have been widely used in transportation, aerospace and military defense applications due to their high energy density, low self-discharge rate, recyclability, high safety, and high voltage [1][2][3]. Since materials are key in the development of advanced devices for efficient electrical energy storage, considerable effort has been devoted to the development of lithium-ion battery materials [4][5][6]. To further improve the performance of lithium-ion batteries, other technologies that can ensure high efficiency with respect to capacity and energy use are in demand. In recent years, battery management technology has attracted much attention, as it can enhance the safety and reliability of battery systems during operation, reducing failure rate and operating cost.
As the number of charge and discharge cycles of a lithium-ion battery increases, the internal electrochemical reaction, coupled with some complicated side reactions, leads to the continuous loss of lithium ions and an increase in the internal resistance of the battery. These irreversible electrochemical reactions are the main causes of lithium-ion battery deterioration [7]. When the lithium-ion battery degrades to a certain extent, it will not be able to withstand the load added between the positive and negative electrodes. In some large electrical systems, once an accident is caused by the result error. In short, the single neural network model often has a complex model structure and low accuracy. Therefore, some fusion methods have been receiving increasing attention. Pang et al. [18] proposed a novel method fusing the wavelet decomposition technology (WDT) and the Nonlinear Auto Regressive neural network (NARNN) model to predict the lithium-ion battery RUL. The global degradation and local regeneration of the battery capacity time series were separated by WDT, and the global degradation trend and local regeneration were predicted by NARNN. Li et al. [19] proposed a novel hybrid battery RUL prediction method based on the empirical mode decomposition (EMD) algorithm, LSTM and Elman neural networks. The EMD algorithm is used to decompose the original battery capacity data into several sub-layers. Then Elman and LSTM neural networks are established to predict the high- and low-frequency sub-layers, respectively. Experimental results show that the hybrid Elman-LSTM model performs better than other models. Wang et al. [20] designed a multi-scale fusion prediction method based on a nonlinear autoregressive neural network and ensemble empirical mode decomposition (EEMD). The original battery capacity data is decomposed into several different frequency components through EEMD. Then, a nonlinear autoregressive neural network is used to predict each component, and finally, the prediction results are added to obtain the final RUL prediction. The fusion methods effectively improve the accuracy of RUL prediction, but increase the complexity of the model.
At present, most fusion-based prediction methods involve only a single model, which makes it difficult to predict different objects while retaining good performance. This paper develops a multi-scale fusion prediction method based on EEMD and the LSTM-STW-GS-LM mixture model. EEMD is used to decompose the original battery capacity data into multiple different frequency components. Based on the characteristics of the decomposed data, the GS-LM model is used to fit and predict some of the low-frequency components, and the LSTM-STW model is used to predict the high-frequency components. The prediction outcomes of GS-LM and LSTM-STW are then integrated to obtain the battery RUL results.
The main innovations of this paper are as follows: (1) The raw battery data are decomposed by the EEMD method, and the LSTM-STW and GS-LM models are trained and used for prediction separately, according to the characteristics of each decomposed sequence. This avoids the complexity of a hybrid neural network while maintaining prediction accuracy and improving prediction efficiency. (2) For different prediction starting points, the proposed method has smaller prediction errors and is less affected by the starting point than other methods, and its predictions capture the phenomenon of sudden charge capacity increase.
The structure of the article is as follows: Section 1 is the introduction. Section 2 presents the LSTM-STW and GS-LM fusion prediction method for battery RUL prediction. Section 3 presents the results and discussion. Section 4 presents the conclusion.

LSTM-STW and GS-LM (LSTM-STW-GS-LM) Fusion Prediction Method
The major procedures of the proposed fusion prediction method LSTM-STW-GS-LM for battery RUL prediction are shown in Figure 1 and are mainly divided into the following steps:
(1) The original battery capacity is preprocessed.
(2) The preprocessed data is decomposed into low-frequency and high-frequency data by EEMD.
(3) The low-frequency prediction model is constructed by GS-LM, and the high-frequency prediction model is constructed by LSTM-STW.
All the prediction results are then integrated to obtain the final combined prediction result.

Experimental Data
Two 18650-sized batteries (#5 and #6) from the NASA Ames center of excellence database are used in this paper [21]. Through repeated charge and discharge cycle experiments, the rechargeable capacity of each battery is reduced to less than 70% of the initial capacity (i.e., the point at which there is a potential safety hazard in the use of the battery). The test conditions of each charge and discharge cycle are as follows: charge in constant current (CC) mode at 1.5 A until the battery voltage reaches 4.2 V, and then continue in constant voltage (CV) mode until the charging current drops to 20 mA. Discharge at a constant current (CC) of 2 A until the voltages of cell 5 and cell 6 drop to 2.7 V and 2.5 V, respectively. Figure 2 shows the full charge capacity degradation curve for both batteries. Capacity data is normalized (that is, the ratio of capacity data to the initial capacity value); the unit of the capacity data is Ah.
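The normalization and the 70% end-of-life threshold described above can be illustrated with a minimal Python sketch. The capacity values below are hypothetical placeholders, not NASA measurements, and the helper names `normalize_capacity` and `end_of_life_cycle` are introduced here only for illustration.

```python
def normalize_capacity(capacity_ah):
    """Divide every capacity value by the initial capacity."""
    initial = capacity_ah[0]
    return [c / initial for c in capacity_ah]


def end_of_life_cycle(normalized, threshold=0.7):
    """Return the 1-based cycle at which the normalized capacity first
    drops below the threshold, or None if it never does."""
    for cycle, value in enumerate(normalized, start=1):
        if value < threshold:
            return cycle
    return None


# Hypothetical capacity readings in Ah (assumption, for illustration only).
capacity = [2.0, 1.9, 1.8, 1.85, 1.6, 1.45, 1.38, 1.3]
norm = normalize_capacity(capacity)
eol = end_of_life_cycle(norm)
print(eol)
```

The small rise from 1.8 Ah to 1.85 Ah in the toy data mimics the capacity regeneration effect; the threshold test simply reports the first crossing.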

Ensemble Empirical Mode Decomposition (EEMD)
Because mode aliasing can occur during EMD, the correctness and accuracy of the analysis results are affected. To address this, Wu and Huang [22] proposed a method called EEMD. EEMD is an adaptive signal processing method that is especially suitable for non-stationary signals. Its essence is the smoothing of time series signals: according to the waves of different scales that are actually present in the signal, the signal is gradually decomposed and filtered, resulting in a series of data sequences with different scale characteristics called intrinsic mode functions (IMFs). Each IMF must meet two conditions: (1) over the entire data set, the number of zero crossings and the number of extreme points are equal or differ by at most one; (2) the average value of the upper and lower envelopes at any location must be zero [23].
The decomposition process is as follows:
Step 1: Set the ensemble number NE and the Gaussian white noise, and take the original signal C(n), where C(n) is the original capacity degradation sequence of the battery over n cycles.
Step 2: Add the Gaussian white noise to the original signal to form S(n), and then perform EMD on S(n) to obtain each IMF component. The EMD process is described in Appendix A.
Step 3: Repeat steps 1 and 2, each time adding a new Gaussian white noise sequence.
Step 4: Take the ensemble average of the IMFs obtained in each run as the final result.
Each IMF component is the average of the corresponding IMFs obtained from the multiple EMD runs. The averaging reduces the effect of the added Gaussian white noise on the signal decomposition. Taking the #5 battery as an example, the decomposition yields multiple groups of components with different frequencies, as shown in Figure 3.
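The noise-and-average loop of Steps 1 to 4 can be sketched in Python as follows. This is only an illustration of the ensemble averaging: `toy_decompose` is a moving-average split used as a stand-in for EMD (it is not the sifting procedure of Appendix A), and the noise level, ensemble size and synthetic capacity series are assumptions.

```python
import random


def moving_average(x, width):
    """Centered moving average; the window is truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def toy_decompose(x, width=5):
    """Stand-in for EMD (NOT real sifting): split x into a high-frequency
    part and a low-frequency trend such that high + low == x."""
    low = moving_average(x, width)
    high = [xi - li for xi, li in zip(x, low)]
    return [high, low]


def ensemble_decompose(signal, decompose, n_ensemble=50, noise_std=0.01, seed=0):
    """Add fresh white noise, decompose, repeat, then average per component."""
    rng = random.Random(seed)
    sums = None
    for _ in range(n_ensemble):
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        comps = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(signal) for _ in comps]
        for acc, comp in zip(sums, comps):
            for j, v in enumerate(comp):
                acc[j] += v
    return [[v / n_ensemble for v in acc] for acc in sums]


# Synthetic declining capacity with occasional small jumps (assumption).
capacity = [1.0 - 0.002 * n + (0.01 if n % 20 == 0 else 0.0) for n in range(100)]
components = ensemble_decompose(capacity, toy_decompose)
print(len(components), len(components[0]))
```

With a real EMD routine the number of IMFs can differ between noisy runs, so a practical implementation has to align or fix the number of components before averaging; the stand-in above always returns two.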

GS-LM Model
The GS-LM model is used to fit and predict some of the decomposed low-frequency sequences. This algorithm framework is shown in Figure 4. The local characteristics of the battery capacity data extracted by EEMD can greatly reduce the influence of the data fluctuation caused by sudden increase of capacity on the prediction performance of the algorithm.

Based on the EEMD, the sequence characteristics of some components are obtained, and the Gaussian function or sine function is used to fit and predict:

$$f(x) = a_1 \exp\left(-\left(\frac{x - b_1}{c_1}\right)^2\right)$$

$$f(x) = a_1 \sin(b_1 x + c_1)$$

where $a_1$, $b_1$, $c_1$ are parameters to be obtained, x is the number of battery cycles, and f is the battery capacity. The Levenberg-Marquardt (LM) algorithm is used to fit the first n battery capacity degradation data points in order to predict the capacity degradation trend from cycle n+1 onward.
The LM algorithm is an iterative algorithm that can be used to solve least squares problems. The algorithm steps are as follows:
Step 1: Objective function:

$$\hat{x} = f(P), \qquad \min_{P} \|\eta\|^2, \quad \eta = x - \hat{x}$$

P is the vector formed by $a_1$, $b_1$, $c_1$; x and $\hat{x}$ are the measured capacity and the estimated capacity of the battery.
Step 2: Using a first-order Taylor expansion in the neighborhood of f(P), the higher-order terms are removed, and the following equation is obtained:

$$f(P + \delta_P) \approx f(P) + J\delta_P$$

where $\delta_P$ is the iterative step and J is the Jacobian matrix.
Step 3: Therefore, we can get the following equation:

$$\|x - f(P + \delta_P)\| \approx \|x - f(P) - J\delta_P\| = \|\eta - J\delta_P\|$$

Step 4: The optimal solution of $\min\|\eta - J\delta_P\|$ exists if $\eta - J\delta_P$ is orthogonal to J:

$$J^T(\eta - J\delta_P) = 0 \quad\Longrightarrow\quad J^T J\,\delta_P = J^T \eta$$

The damping term µ is introduced to construct an incremental normal equation, where I is the identity matrix:

$$(J^T J + \mu I)\,\delta_P = J^T \eta$$

If the updated $\delta_P$ reduces the error, the update is accepted and the damping term µ is reduced; conversely, if the current increment makes the function increase, the damping term is increased and the incremental normal equation is re-solved until the value of the function is reduced.
At each step of the LM algorithm, the damping term µ is adjusted to ensure that the error decreases. When µ is large, the algorithm approaches the steepest descent method and the step size becomes smaller; otherwise, it approaches the Gauss-Newton method. In summary, LM is an adaptive algorithm that descends slowly when the current solution is far from the optimal solution and converges quickly in the neighborhood of the optimal solution.
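As an illustration of the damped update described above, the following pure-Python sketch fits a three-parameter Gaussian to synthetic data with a basic LM loop. It is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian form $a_1 \exp(-((x - b_1)/c_1)^2)$, the halving and doubling of µ, the floor on µ, the starting guess and the synthetic data are all choices made here for demonstration.

```python
import math


def gaussian(x, p):
    a1, b1, c1 = p
    return a1 * math.exp(-((x - b1) / c1) ** 2)


def jacobian_row(x, p):
    """Partial derivatives of the Gaussian with respect to a1, b1, c1."""
    a1, b1, c1 = p
    z = (x - b1) / c1
    e = math.exp(-z * z)
    return [e, 2.0 * a1 * e * z / c1, 2.0 * a1 * e * z * z / c1]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def sse(xs, ys, p):
    return sum((y - gaussian(x, p)) ** 2 for x, y in zip(xs, ys))


def lm_fit(xs, ys, p0, mu=1e-2, iters=100):
    p, err = list(p0), sse(xs, ys, p0)
    n = len(p)
    for _ in range(iters):
        rows = [jacobian_row(x, p) for x in xs]
        eta = [y - gaussian(x, p) for x, y in zip(xs, ys)]
        # Incremental normal equation: (J^T J + mu I) delta = J^T eta
        A = [[sum(r[i] * r[j] for r in rows) + (mu if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        g = [sum(r[i] * e for r, e in zip(rows, eta)) for i in range(n)]
        trial = [pi + di for pi, di in zip(p, solve(A, g))]
        trial_err = sse(xs, ys, trial)
        if trial_err < err:      # accept the step and reduce the damping
            p, err = trial, trial_err
            mu = max(mu * 0.5, 1e-9)
        else:                    # reject the step and increase the damping
            mu *= 2.0
    return p, err


# Synthetic, noise-free data from assumed "true" parameters (illustration only).
xs = list(range(90))
ys = [gaussian(x, (1.0, 0.0, 200.0)) for x in xs]
p0 = [0.9, 5.0, 150.0]
start_err = sse(xs, ys, p0)
fit_p, fit_err = lm_fit(xs, ys, p0)
print(fit_err < start_err)
```

The fitted function can then be evaluated at cycle numbers beyond the training range to extrapolate the low-frequency trend; the sine variant only requires swapping `gaussian` and `jacobian_row`.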

LSTM-RNN
The Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is a recurrent neural network with deep learning ability proposed by Hochreiter and Schmidhuber in 1997. It is designed to handle long-term dependencies and can mitigate the vanishing gradient problem [19]. The LSTM network structure is shown in Figure 5. There are 10 hidden layers (LSTM1-LSTM10) in the figure.
The capacity data X(t) at time t is predicted by inputting the capacity degradation data of the previous k cycles, from X(t−k) to X(t−1). The single LSTM-RNN memory unit is shown in Figure 6.
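The output of the LSTM-RNN depends on three gates: the forget gate, the input gate, and the output gate. The initial C and h should be given before the program starts.
1. The forget gate is the first step and determines which information is discarded from the cell state:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

2. The input gate determines how much new information is added to the cell state:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

The updated cell state is then obtained:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

3. The output gate determines the current output information:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t * \tanh(C_t)$$

where $W_f$, $W_i$, $W_C$, $W_o$ are the parameter matrices, $[h_{t-1}, x_t]$ is the concatenation of the two vectors $h_{t-1}$ and $x_t$, and $b_f$, $b_i$, $b_C$, $b_o$ are the biases corresponding to each gate. σ and tanh are activation functions:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$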
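The three gates can be traced in a few lines of Python. The sketch below performs one forward step of a single memory unit using the standard LSTM gate formulation; the dictionary keys mirror the parameter names in the text, while the tiny dimensions and the all-zero weights are assumptions chosen so that the result is easy to check by hand.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, v, b):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM memory unit."""
    v = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(params["Wf"], v, params["bf"])]      # forget gate
    i = [sigmoid(z) for z in affine(params["Wi"], v, params["bi"])]      # input gate
    c_hat = [math.tanh(z) for z in affine(params["WC"], v, params["bC"])]  # candidate state
    c = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]
    o = [sigmoid(z) for z in affine(params["Wo"], v, params["bo"])]      # output gate
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# Toy setup (assumption): 2 hidden units, 1 input, all weights and biases zero,
# so every gate equals 0.5 and the candidate state is 0.
hidden, inputs = 2, 1
zeros_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zeros_b = [0.0] * hidden
params = {"Wf": zeros_W, "bf": zeros_b, "Wi": zeros_W, "bi": zeros_b,
          "WC": zeros_W, "bC": zeros_b, "Wo": zeros_W, "bo": zeros_b}
h, c = lstm_step([0.3], [0.0, 0.0], [1.0, -1.0], params)
print(h, c)
```

In this degenerate case the cell state is simply halved by the forget gate, which makes the role of each gate visible; trained weights would make the gates depend on the input window.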

Sliding Time Window (STW)
Because the IMFs obtained by decomposing the initial data are clearly nonlinear, a sliding time window is designed to extract new features, and the extracted sequence data is used as the input of the LSTM-RNN.
As shown in Figure 7, the IMF data sequence is traversed by a fixed sliding window of length k+1. The previous k capacity values in the window, from X(t−k) to X(t−1), are used as the input data of the LSTM-RNN, and the last value in the window is used as the corresponding output data. By sliding the window, multiple sets of corresponding input and output data are obtained.
Figure 7. New sequence data based on Sliding Time Window (window size = k+1).
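The window construction can be sketched in Python as follows. The function name `sliding_windows` and the sample IMF values are hypothetical and serve only to show how the input and output pairs are formed.

```python
def sliding_windows(series, k):
    """Slide a window of length k+1 over the series: the first k values
    form the input and the last value is the corresponding output."""
    inputs, targets = [], []
    for t in range(k, len(series)):
        inputs.append(series[t - k:t])   # X(t-k) ... X(t-1)
        targets.append(series[t])        # X(t)
    return inputs, targets


# Hypothetical IMF values (assumption, for illustration only).
imf = [0.10, 0.12, 0.09, 0.11, 0.08, 0.07]
X, y = sliding_windows(imf, k=3)
print(X)
print(y)
```

A series of length N yields N − k training pairs, so a larger k gives the network more context per sample but fewer samples.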

Evaluation Criteria
MAPE, MAE, RMSE and Error are used as the evaluation criteria of battery RUL prediction results. The smaller they are, the better the prediction results are.

(1) The mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{s}\sum_{i}\left|\frac{y_{i+s} - \hat{y}_{i+s}}{y_{i+s}}\right|$$

(2) The mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{s}\sum_{i}\left|y_{i+s} - \hat{y}_{i+s}\right|$$

(3) The root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{s}\sum_{i}\left(y_{i+s} - \hat{y}_{i+s}\right)^2}$$

where $y_{i+s}$ and $\hat{y}_{i+s}$ stand for the raw and the predicted battery charge capacity data, the sums run over the predicted points, and s is the number of predicted data points. EOL is the prediction starting point value, and $\widehat{EOL}$ is the number of cycles used to predict the end of battery life.
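The three capacity-based criteria can be computed with a short Python sketch. The sample values are made up for illustration, and MAPE is returned as a fraction rather than a percentage, which is an assumption made here to match the scale of the values reported later (below 0.01).

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up normalized capacities (assumption, for illustration only).
actual = [1.0, 0.8, 0.5]
predicted = [0.9, 0.8, 0.6]
print(mape(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

All three functions assume equal-length sequences and nonzero actual values; MAPE is undefined when an actual capacity is zero.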

Experimental Results
To verify the reliability and effectiveness of the proposed battery RUL prediction method (M1), we designed M2, M3 and M4 (as shown in Table 1) for comparison. M2 and M3 use a double exponential model and a cubic polynomial model, respectively. Both determine the parameters of a fixed model by fitting the training data and then predict the degradation trend of battery capacity, so they are simple and easy to implement, following traditional data-fitting ideas. M4 uses a single LSTM, which is more complex because of its multi-layer neural network. We designate the 90th cycle as the starting point of capacity prediction for both batteries, and take 70% of the initial battery capacity as the battery's end of life (EOL).
Figure 8 compares the results obtained by the four methods. For both batteries, the M1 prediction is closest to the original capacity degradation curve, and the capacity regeneration phenomenon is captured more accurately as battery usage increases. M2 and M3 are easy to realize, as they only apply a single mathematical formula for fitting and prediction. However, because the formula is fixed, their prediction results are often unsatisfactory, as can be seen in Figure 8: the errors between the prediction results and the original data are much larger. M4 follows the trend well for a period after the prediction starting point at the 90th cycle, but its error gradually increases after the 140th cycle. Therefore, these three methods cannot accurately and effectively predict the battery RUL.
Table 2 gives the evaluation criteria for the prediction results of the four models. Runtime represents the total time of program training and prediction. Although M1 is the most time-consuming, its evaluation results are basically below 0.01, while those of the other models (M2-M4) are far greater than 0.01. Therefore, M1 has higher prediction accuracy than the other models.

Different Prediction Starting Points
Pang et al. [18] tested the prediction results for four kinds of battery data under four prediction starting points. The prediction curve and the original curve had a similar declining trend. However, the smaller the starting point value was, the more the prediction curve deviated from the original data. To verify that the proposed M1 still predicts well under different prediction starting points, we designed three prediction starting points (cycles 70, 80 and 90) in the experiments. As shown in Figure 9, for both batteries and all three prediction starting points, the prediction curve maintains a degradation trend similar to that of the original data curve and fluctuates around the original data. Whichever prediction starting point is chosen, there is no significant deviation from the original data.
Table 3 shows the RUL prediction Error of the two batteries under the three prediction starting points. The Error is within 10 cycles for both batteries and decreases as the prediction starting point value increases. When the prediction starting point is the 90th cycle, the Error of both batteries reaches its minimum value, which indicates that the predicted RUL is very close to the real RUL. Therefore, the RUL prediction error of M1 does not change much when the prediction starting point changes.

Conclusions
To improve lithium-ion battery RUL prediction accuracy, this paper adopts a data-driven method. By combining a neural network with traditional empirical model fitting, the proposed model is applied to two different lithium-ion batteries, and the prediction starting point is varied in the experiments. First, the battery data is decomposed into multiple time series with different frequencies by EEMD. Second, the different sequences are trained and predicted by the LSTM-STW model and the GS-LM model, and finally all results are added to obtain the battery RUL prediction. The experimental results show that the proposed method achieves higher prediction accuracy than other typical prediction methods. The prediction result is less influenced by changes in the prediction starting point. In addition, the method can accurately predict the trend of battery capacity degradation, including the phenomenon of a sudden increase of charge capacity during the degradation process.