Lithium-Ion Battery Remaining Useful Life Prediction Based on Hybrid Model

Abstract: Accurate prediction of the remaining useful life (RUL) is a key function for ensuring the safety and stability of lithium-ion batteries. To address capacity regeneration and improve model adaptability under different working conditions, a hybrid RUL prediction model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and a bi-directional gated recurrent unit (BiGRU) is proposed. CEEMDAN is used to divide the capacity into intrinsic mode functions (IMFs) to reduce the impact of capacity regeneration. In addition, an improved grey wolf optimizer (IGWO) is proposed to maintain the reliability of the BiGRU network. The diversity of the initial population in the GWO algorithm was improved using chaotic tent mapping. An improved control factor and dynamic population weights are adopted to accelerate the convergence of the algorithm. Finally, capacity and RUL prediction experiments are conducted to verify the battery prediction performance under different training data and working conditions. The results indicate that the proposed method can achieve an MAE of less than 4% with only 30% of the training set, which is verified using the CALCE and NASA battery data.


Introduction
Lithium-ion batteries are widely used in new-energy vehicles, communication equipment, and aerospace electronics owing to their fast charging rate and long life [1]. However, with the accumulation of cycles, the performance of lithium-ion batteries inevitably deteriorates, leading to system failure and possibly even safety accidents [2,3]. Therefore, it is important to efficiently and accurately predict the remaining useful life (RUL) of Li-ion batteries. During the long-term use of lithium-ion batteries, along with the increasing number of charge and discharge cycles, some irreversible chemical reactions occur inside the battery, resulting in increased internal resistance and performance degradation [4-6]. Reliable life-prediction techniques not only allow more efficient use of the battery but also reduce the incidence of failure [7]. RUL is the number of charge and discharge cycles from new to end of life (EOL) under certain operating conditions, with 20% capacity degradation typically marking EOL [8-10].
Currently, life-prediction methods for lithium-ion batteries are mainly divided into model-based and data-driven methods [11,12]. Although the prediction accuracy of model-based methods is high, it depends significantly on modeling the internal physical and chemical properties of the battery [13]. Data-driven methods can explore the relationship between the external parameters and the internal state of the battery without building a complex battery model. Babaeiyazdi et al. [14] designed a model with electrochemical impedance spectroscopy (EIS) and Gaussian process regression (GPR) to predict the state of charge. Ren et al. proposed the use of a convolutional neural network (CNN) and long short-term memory (LSTM) to improve the prediction accuracy of data-driven methods with insufficient degradation data [15]. Zhao et al. combined a broad learning system algorithm and LSTM to increase the effective size of the training data; the results demonstrated that the training data can be reduced to only 25% using a data-driven method [16]. Yao et al. [17] used particle swarm optimization (PSO) to optimize the parameters of an extreme learning machine (ELM) and effectively predict the RUL of lithium-ion batteries. However, it is difficult to accurately predict battery capacity using a data-driven method owing to the capacity-regeneration phenomenon [18,19].
With the aim of tackling the problem of battery capacity regeneration, this study aims to eliminate the influence of regenerated capacity on global prediction. Cheng et al. [20] used empirical mode decomposition (EMD) to decompose the original capacity into a series of intrinsic mode functions from high to low frequencies, thus reducing the impact of capacity regeneration. However, mode aliasing occurs during EMD decomposition, which interferes with the signal decomposition. Chen et al. [21] adopted ensemble empirical mode decomposition (EEMD), adding Gaussian white noise to the original signal to reduce modal mixing. However, EEMD may produce some false components in the process of adding noise to the signal, which affects subsequent signal analysis. CEEMDAN [22] improves on EEMD by adding pairs of Gaussian white noise in each decomposition. Averaging is then used to solve the problem of white-noise transmission from high to low frequencies and to suppress the generation of modal aliasing and false components, which makes it more suitable for time-frequency analysis of nonlinear and non-stationary signals.
The nonlinear and non-stationary characteristics of the battery capacity curve match the application range of the CEEMDAN algorithm [23-25]. Therefore, CEEMDAN was selected to extract the signal characteristics of the battery capacity.
The prediction accuracy of a neural network depends heavily on the selection of weights and thresholds, which requires a large amount of training data; the operation is complex, its stability is limited, and the optimal threshold can easily shift [26,27]. Ding et al. [28] proposed cuckoo search (CS) optimization to optimize the decomposition level of variational mode decomposition as the input of a gated recurrent unit (GRU). However, the IMF components decomposed by CEEMDAN have different signal frequencies, which places universality requirements on the neural network [29]. Therefore, the grey wolf optimization (GWO) algorithm was used to optimize the network parameters and improve the adaptability of the network. The GWO algorithm was proposed by Mirjalili [30] based on the observation of predator hunting in nature. The grey wolf optimization model is simple and realizable, and its optimization performance in many fields is no worse than that of other meta-heuristic swarm-intelligence algorithms [31-33]. Owing to the shortcomings of the GWO algorithm in large-scale optimization problems, such as premature convergence, easy entrapment in local optima, and low convergence accuracy, many researchers have proposed various solutions from different aspects. Nadimi-Shahraki et al. [34] proposed an improved GWO based on dimension-learning-based hunting (DLH) to maintain the diversity of the wolf population. Long et al. [35] proposed a hybrid algorithm using GWO and CS to balance exploration and exploitation. Zhao et al. [36] used chaos-enhanced GWO for the overall search, so that the initial wolf population could be restricted to a certain range. However, this method only considers the diversity of the initial population and ignores the weight and position relationships of the different wolf groups.
This study proposes a comprehensive RUL prediction method. First, the original battery capacity is decomposed using CEEMDAN to eliminate interference noise with low correlation. IGWO is then used to optimize the parameters of the neural network. Finally, the CALCE and NASA data are used to verify the effectiveness and stability of the method. The contributions are as follows: (1) The regeneration of battery capacity is eliminated through CEEMDAN, and the accuracy of the prediction model is improved. (2) The initial population, control factor, and wolf-group weights of the traditional GWO are improved to enhance the diversity and iteration speed of the population.
(3) IGWO is used to optimize neural-network parameters such as the number of neurons, the dropout rate, and the batch size, improving the universality of RUL prediction for IMF components at different frequencies.

Battery Remaining Useful Life
The remaining life of a battery represents the number of battery cycles from its rated capacity to the end of its life [37]. The formula is as follows:
RUL = N_eol − N
where N_eol is the maximum number of battery cycles, and N represents the current number of cycles of the battery.
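As a minimal illustration, the relation above can be computed directly from a recorded capacity sequence; the threshold scan, function name, and `eol_fraction` default below are our own illustrative assumptions, not the paper's code:

```python
import numpy as np

def remaining_useful_life(capacities, rated_capacity, eol_fraction=0.8, current_cycle=0):
    """RUL = N_eol - N: cycles left until the capacity first falls below the
    EOL threshold. The linear scan and the eol_fraction default are
    illustrative assumptions; the paper uses 70-80% depending on the dataset."""
    threshold = eol_fraction * rated_capacity
    below = np.nonzero(np.asarray(capacities, dtype=float) < threshold)[0]
    if below.size == 0:
        return None  # EOL not reached within the recorded cycles
    n_eol = int(below[0])         # N_eol: first cycle at which capacity < threshold
    return n_eol - current_cycle  # RUL = N_eol - N
```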

Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CEEMDAN is an improved algorithm based on EEMD and EMD. It has the advantages of good modal-spectrum separation, fewer sifting iterations, and low computational cost, and it is often used to process non-stationary and nonlinear signals [38,39]. The specific steps of CEEMDAN are as follows:
Step 1: Add Gaussian white noise w_i(t), i = 1, ..., M, to the original capacity signal r_0(t). EMD is used to decompose each noisy signal to obtain the first IMF component, and the first residual is then calculated according to the following equations:
IMF_1(t) = (1/M) Σ_{i=1}^{M} E_1(r_0(t) + ε_0·w_i(t)) (1)
r_1(t) = r_0(t) − IMF_1(t) (2)
where E_k(·) denotes the kth modal component extracted by EMD and ε_k is the noise amplitude at stage k.
Step 2: Add adaptive white noise to the residual of Equation (2) and decompose the result with EMD to obtain the second modal component and residual:
IMF_2(t) = (1/M) Σ_{i=1}^{M} E_1(r_1(t) + ε_1·E_1(w_i(t))) (3)
r_2(t) = r_1(t) − IMF_2(t) (4)
Step 3: Repeat the previous step stage by stage. The kth residual and the (k+1)th modal component are, respectively:
r_k(t) = r_{k−1}(t) − IMF_k(t) (5)
IMF_{k+1}(t) = (1/M) Σ_{i=1}^{M} E_1(r_k(t) + ε_k·E_k(w_i(t))) (6)
where n denotes the total number of modal components. The final residual is:
R(t) = r_0(t) − Σ_{k=1}^{n} IMF_k(t) (7)
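The ensemble-averaging idea behind these steps can be sketched in a few lines. The `first_mode` helper below is a crude moving-average stand-in for EMD's first-IMF operator E1 (a real implementation would use sifting, e.g. via the PyEMD package), so this is only a structural sketch of the CEEMDAN loop, not a faithful decomposition:

```python
import numpy as np

def first_mode(x, win=5):
    """Stand-in for EMD's first-IMF operator E1: the deviation of the signal
    from a moving average. Illustrative only; real CEEMDAN uses EMD sifting."""
    pad = win // 2
    padded = np.pad(x, pad, mode="edge")
    trend = np.convolve(padded, np.ones(win) / win, mode="valid")
    return x - trend

def ceemdan_like(signal, n_modes=3, n_ensemble=20, eps=0.05, seed=0):
    """CEEMDAN-style loop: at each stage, average the first mode of many
    noise-perturbed copies of the current residual, then subtract it."""
    rng = np.random.default_rng(seed)
    residual = np.asarray(signal, dtype=float).copy()
    sigma = eps * residual.std()
    imfs = []
    for _ in range(n_modes):
        trials = [first_mode(residual + sigma * rng.standard_normal(residual.size))
                  for _ in range(n_ensemble)]
        imf = np.mean(trials, axis=0)   # ensemble average, as in Eqs. (1), (3), (6)
        imfs.append(imf)
        residual = residual - imf       # residual update, as in Eqs. (2), (4), (5)
    return np.array(imfs), residual
```

By construction the components and final residual sum back to the original signal, mirroring Equation (7).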

Bidirectional Gate Recurrent Unit
The gated recurrent unit (GRU) regulates the flow of information through a reset gate and an update gate. The equations are as follows:
r_t = σ(W_r·[h_{t−1}, x_t]) (8)
z_t = σ(W_z·[h_{t−1}, x_t]) (9)
h̃_t = tanh(W·[r_t * h_{t−1}, x_t]) (10)
h_t = (1 − z_t) * h_{t−1} + z_t * h̃_t (11)
where r_t is the reset gate, z_t is the update gate, x_t denotes the current input, h_{t−1} is the last-moment output, h̃_t is the candidate state, h_t is the current output, σ and tanh are the activation functions, W_r, W_z, and W are weight matrices, "·" is the matrix product, and "*" is the element-wise product. The classical GRU structure uses one-way propagation along the sequence transfer direction, and each time step is related only to past time steps [40]. However, in some cases, feedback from future sequence values at a given time should be considered when building the model, and this information can be used to refine it [41]. Therefore, a BiGRU model was constructed, as shown in Figure 1. The basic concept is as follows. For each training sequence, two GRU models are established in the forward and reverse directions, and the hidden-layer nodes of the two models are connected to the same output layer. This data-processing method provides complete historical and future information for each time point in the input sequence at the output layer [42]. Therefore, the BiGRU network can learn the relationship between past and future load-influencing factors and the current load, which helps extract the characteristics of the capacity data. The output is given in Equations (12)-(14).
h⃗_t = GRU(x_t, h⃗_{t−1}) (12)
h⃖_t = GRU(x_t, h⃖_{t−1}) (13)
h_t = w_t·h⃗_t + v_t·h⃖_t + b_t (14)
where h⃗_t is the output of the forward hidden layer at time t; h⃖_t is the output of the reverse hidden layer at time t; w_t and v_t represent the weights of the forward hidden state h⃗_t and the reverse hidden state h⃖_t of the bidirectional GRU at time t, respectively; and b_t represents the offset corresponding to the hidden-layer state at time t. The structure of the BiGRU is shown in Figure 1.
In the forward layer, the forward calculation is performed from time step 1 to time step t, and the output of each forward hidden layer is obtained and saved in h⃗_t. In the reverse layer, a reverse calculation is performed from time step t back to time step 1, and the output of each reverse hidden layer is obtained and saved in h⃖_t. Finally, the final output is obtained by combining the outputs of the forward and reverse layers.
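Under these equations, a single GRU step and the bidirectional pass can be written out in NumPy. The weight shapes and the concatenation of the two directions are our own illustrative choices (the paper instead combines the directions with the learned weights w_t and v_t of Equation (14)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU cell update following the gate equations in the text."""
    v = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    z = sigmoid(Wz @ v)                        # update gate z_t
    r = sigmoid(Wr @ v)                        # reset gate r_t
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde    # h_t

def bigru(seq, params_fwd, params_bwd, hidden):
    """BiGRU: run a forward and a backward GRU over the sequence and
    concatenate their hidden states at each time step."""
    h_f, h_b = np.zeros(hidden), np.zeros(hidden)
    fwd, bwd = [], []
    for x in seq:                  # forward pass, t = 1..T
        h_f = gru_step(x, h_f, *params_fwd)
        fwd.append(h_f)
    for x in reversed(seq):        # backward pass, t = T..1
        h_b = gru_step(x, h_b, *params_bwd)
        bwd.append(h_b)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```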

The Grey Wolf Optimizer
Swarm intelligence (SI) is a powerful branch of computational intelligence for solving optimization problems. SI algorithms simulate and imitate natural social behaviors, such as those of fish schools, bird flocks, and other animal groups [43]. The GWO algorithm is inspired by the hunting behavior of grey wolves and is considered one of the fastest swarm-intelligence algorithms.
In each group of grey wolves, there is a common social hierarchy that represents the power and dominance of each wolf, as shown in Figure 2. In the implementation of the GWO algorithm, grey wolves attack prey based on fitness values and social levels. The main process of grey wolf hunting is to round up and attack the prey [44]. The hunting technique and social hierarchy of grey wolves are used to design the mathematical model as follows:

Step 1: Surround the prey. The social class of the grey wolf is modeled with α adopted as the best solution; the second and third best solutions are β and δ, respectively, and the remaining candidate solutions are considered ω. The encircling behavior is described by:
D = |C·X_p(n) − X(n)| (15)
X(n + 1) = X_p(n) − A·D (16)
where D is the distance between the prey and the wolf, A and C are coefficient vectors, X_p is the position vector of the prey, X is the position vector of the grey wolf, and n is the current iteration number. Vectors A and C are calculated separately as:
A = 2a·r_1 − a (17)
C = 2·r_2 (18)
where r_1 and r_2 are random vectors in [0, 1]. The parameter a is linear, as shown in Equation (19), and decreases from 2 to 0 over the iterations:
a = 2 − 2n/I (19)
where I is the maximum number of iterations.
Step 2: Attack the prey. The grey wolves identify and surround the prey's position, typically under the guidance of α, with β and δ occasionally participating in the hunt. However, in the virtual search space, the location of the optimal prey is unknown. To simulate the hunting behavior of grey wolves mathematically, it is assumed that α is the best candidate solution, and β and δ indicate the potential locations of the prey. Therefore, each grey wolf updates its position according to the best locations, as shown in Equations (20)-(22):
D_α = |C_1·X_α − X|, D_β = |C_2·X_β − X|, D_δ = |C_3·X_δ − X| (20)
X_1 = X_α − A_1·D_α, X_2 = X_β − A_2·D_β, X_3 = X_δ − A_3·D_δ (21)
X(n + 1) = (X_1 + X_2 + X_3)/3 (22)
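Steps 1 and 2 together give the standard GWO loop, which can be sketched as follows; the population size, iteration count, and search bounds are illustrative choices:

```python
import numpy as np

def gwo(fitness, dim, n_wolves=20, iters=200, lb=-5.0, ub=5.0, seed=0):
    """Basic GWO: wolves move toward the mean of the positions suggested by
    the three best solutions (alpha, beta, delta), cf. Eqs. (15)-(22)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_wolves, dim))   # random initial pack
    for n in range(iters):
        a = 2.0 - 2.0 * n / iters                   # Eq. (19): a decays 2 -> 0
        order = np.argsort([fitness(x) for x in X])
        leaders = X[order[:3]]                      # alpha, beta, delta
        for i in range(n_wolves):
            cand = []
            for Xl in leaders:
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2.0 * a * r1 - a                # Eq. (17)
                C = 2.0 * r2                        # Eq. (18)
                D = np.abs(C * Xl - X[i])           # Eq. (20): distance to a leader
                cand.append(Xl - A * D)             # Eq. (21)
            X[i] = np.clip(np.mean(cand, axis=0), lb, ub)  # Eq. (22)
    best = min(X, key=fitness)
    return best, fitness(best)
```

On a simple sphere function the pack collapses onto the minimum as a shrinks, which is the behavior the improvements in the next section aim to make more reliable.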

Improved Grey Wolf Optimizer
In the GWO, α, β, and δ guide the ω wolves to move in the best direction. This behavior may lead to a locally optimal solution, and the distribution of the initial wolves can also drive the algorithm into a local optimum. To overcome these problems, an improved grey wolf optimizer (IGWO) is proposed in this section. The improvements include a new search strategy associated with the selection and update steps. IGWO includes four stages: initialization, movement, selection, and update.

Population Initialization
Because the initial grey wolf population determines whether the optimal path can be found, as well as the convergence speed, the diversity of the initial population helps improve the performance of the algorithm in finding the optimal path. The traditional GWO randomly initializes the locations of the wolf group, which directly affects the search efficiency of the algorithm. The initialized population should be distributed as evenly as possible in the initial space to improve the diversity of the initial wolf group and to prevent the algorithm from falling into a locally optimal solution.
GWO usually generates the initial population randomly. The more uniformly the initial population is distributed in the search space, the better the optimization efficiency and accuracy of the algorithm. To improve population diversity, tent mapping was used to initialize the population. The tent map has a simple structure, a uniform density distribution, and good ergodicity. The tent mapping is expressed as:
x_{t+1} = 2x_t, 0 ≤ x_t < 0.5; x_{t+1} = 2(1 − x_t), 0.5 ≤ x_t ≤ 1 (23)
The initial wolf pack is then generated within the specified search range, as shown in Equation (24):
X_ij = L_b + x_t·(U_b − L_b) (24)
where x_t is a chaotic value in the range [0, 1]; U_b and L_b are the upper and lower bounds of the search space, respectively; X_ij is the position of the ith wolf in the jth dimension; N is the number of wolves; and D is the search dimension. Initializing the wolf positions with the chaotic mapping ensures a uniform distribution of the initial population.
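A minimal sketch of this chaotic initialization follows. Note one numerical assumption of ours: the tent-map slope is kept slightly below 2, because the exact slope of 2 degenerates to zero under binary floating point after a few dozen iterations.

```python
import numpy as np

def tent_sequence(n, x0=0.37, mu=1.99):
    """Chaotic tent-map sequence, cf. Eq. (23). mu is held slightly below 2
    (an implementation detail of ours) to avoid floating-point collapse."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = mu * x if x < 0.5 else mu * (1.0 - x)
        xs[i] = x
    return xs

def init_wolves(n_wolves, dim, lb, ub, x0=0.37):
    """Eq. (24): scale the chaotic values into the search box [lb, ub]."""
    xs = tent_sequence(n_wolves * dim, x0=x0).reshape(n_wolves, dim)
    return lb + xs * (ub - lb)
```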

Optimized Control Parameter a

In Equation (17), the value of A in the traditional GWO algorithm is determined by the control parameter a, whose influence decreases linearly from 2 to 0. For a complex nonlinear problem such as battery capacity degradation, this linear decrease cannot effectively balance local search and global search. Therefore, this paper proposes a nonlinear control strategy for the factor a, which in turn modulates A.
where i is the current iteration number of the α wolf and I denotes the maximum number of iterations. The improved iteration curve for A is shown in Figure 3.

Dynamic Weight
The traditional grey wolf algorithm performs a one-way local search, and the direction of the position update cannot change. In this study, the search was improved to be bidirectional, and a random number was introduced to control the search direction. The search formula is as follows: where w is the weight, w_max is the maximum weight, and w_min is the minimum weight. Generally, when the maximum weight is 0.9 and the minimum weight is 0.4, the algorithm maintains the best performance [45]. On this basis, the inertia weight can be introduced into the original formula for updating the grey wolf position vector.
where Wα, Wβ, and Wδ represent the weights of the wolf population class.


Data Sets
To verify the effectiveness and generalization of the proposed method, the NASA and CALCE datasets were used [46,47]. From the CALCE battery dataset, CS2_35, CS2_36, CS2_37, and CS2_38 were selected, and the charge-discharge experimental data of these four batteries were used as the model inputs. All batteries underwent the same charging and discharging mode. At a room temperature of 25 °C, the cells were charged at a constant current of 0.5 C; when the voltage reached 4.2 V, charging continued at constant voltage and stopped when the charging current dropped to 0.05 A. The discharge process used a 1 C constant current and stopped when the voltage dropped to 2.7 V.
NASA battery data were selected from B0005, B0006, B0007, and B0018. At a room temperature of 24 °C, the cells were charged at a constant current of 1.5 A until the battery voltage reached 4.2 V, and then in constant-voltage (CV) mode until the charging current dropped to 20 mA. The discharge process was conducted in constant-current (CC) mode at 2 A until the voltages of batteries 5, 6, 7, and 18 dropped to 2.7 V, 2.5 V, 2.2 V, and 2.5 V, respectively.
Considering 70% of the rated capacity as the end of life (EOL), NASA's battery capacity threshold was reduced from 2 Ah to 1.4 Ah. The experimental capacity of the CALCE batteries was reduced from 1.1 Ah to 0.77 Ah as EOL.
The experimental hardware was configured with an Intel Core i5-11320H processor, 16 GB of RAM, an NVIDIA 2070 GPU, Windows 11, Python 3.10, and TensorFlow 2.10. First, 50% of the data was used as the training set and the remaining 50% as the test set. Then, 30% of the data was used as the training set and 70% as the test set to inspect the performance of the model.

Evaluation Criteria
To evaluate the performance of the prediction model, this paper introduces three evaluation indicators: mean absolute error (MAE), the coefficient of determination (R²), and root mean square error (RMSE). MAE effectively measures the prediction error of the model, and R² reflects the fitting effect of the model; the closer R² is to 1, the better the fit [48]. RMSE measures the prediction accuracy of the model. The calculation formulas are as follows:
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
RMSE = sqrt((1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²)
R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²
where n is the total number of cycles and y and ŷ are the actual and predicted values of the battery capacity during the cycle. In addition, error indices for the battery RUL are introduced: e_1 and e_2 measure the life-prediction accuracy as follows:
e_1 = R_rul − P_rul
e_2 = |R_rul − P_rul| / R_rul
where e_1 and e_2 represent the error and relative error between the actual and predicted battery RUL, respectively, R_rul represents the actual RUL value, and P_rul represents the predicted RUL value.
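These indicators can be computed directly; the function below follows the standard MAE, RMSE, and R² definitions together with the RUL errors e1 and e2:

```python
import numpy as np

def evaluate(y_true, y_pred, rul_true, rul_pred):
    """MAE, RMSE, and R^2 on the capacity series, plus e1/e2 on RUL cycles."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    e1 = rul_true - rul_pred                    # RUL error in cycles
    e2 = abs(rul_true - rul_pred) / rul_true    # relative RUL error
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "e1": e1, "e2": e2}
```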

Structure of the RUL Prediction Model
Combining the advantages and disadvantages of CEEMDAN and BiGRU, the improved grey wolf algorithm is used to optimize the parameters of the BiGRU network, and an RUL prediction method based on the CEEMDAN-IGWO-BiGRU model is proposed. The model reduces the probability of local optimization and over-fitting, and the design has good robustness. The framework of the RUL estimation is presented in Figure 4. The specific implementation steps are as follows:

Step 1: Collect the battery capacity data and use CEEMDAN to decompose the capacity data into multiple components according to Equations (2)-(7). Divide the data into training and test sets, and then normalize them.
Step 2: Determine the number of inputs, output, and hidden layers of the BiGRU, and set parameters such as the number of IGWO populations and the number of iterations.
Step 3: The number of neurons in the input layer of the BiGRU model, the number of training samples, and the forgetting rate are taken as the individual searchers in the IGWO, and the root mean square error between the expected output of the training samples of the BiGRU model and the actual output is taken as the fitness value. Combined with the improved grey wolf optimization algorithm, the positions are continuously updated and the fitness value is recalculated until the stopping conditions are met.
Step 4: Input the obtained optimal weight threshold into the BiGRU model and obtain the IMF prediction results for capacity after training.
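The four steps can be imitated in miniature. The sketch below substitutes deliberately simple stand-ins (a moving-average split in place of CEEMDAN, and a least-squares line in place of the IGWO-tuned BiGRU) just to show how per-component forecasts are produced and summed; it is not the paper's pipeline:

```python
import numpy as np

def decompose(capacity, win=9):
    """Stand-in decomposition: a smooth trend (cf. the CEEMDAN residual Res)
    plus a high-frequency remainder (cf. the retained IMFs)."""
    pad = win // 2
    trend = np.convolve(np.pad(capacity, pad, mode="edge"),
                        np.ones(win) / win, mode="valid")
    return trend, capacity - trend

def fit_predict(series, split, horizon):
    """Fit a least-squares line on the training part of one component and
    extrapolate it - a simple stand-in for the optimized network."""
    t = np.arange(split)
    slope, intercept = np.polyfit(t, series[:split], 1)
    return intercept + slope * np.arange(split, split + horizon)

# Steps 1-4 in miniature: decompose, predict each component, sum the forecasts.
capacity = 2.0 - 0.004 * np.arange(300) + 0.01 * np.sin(np.arange(300) / 5.0)
split = 150
trend, detail = decompose(capacity)
forecast = fit_predict(trend, split, 150) + fit_predict(detail, split, 150)
```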

Results of CEEMDAN
As shown in Figure 5, CEEMDAN can be used to analyze the capacity change trend and its course. Except for CS2_36, the data are decomposed into six intrinsic mode function (IMF) components and one residual (Res) component using the CEEMDAN method; CS2_36 is decomposed into five IMF components, as shown in Figure 5b. The components are arranged from high to low frequency and compared with the degradation sequence of the original capacity. Modal components with low correlation can be discarded as noise.

The correlation between the decomposed variables and battery capacity was analyzed. As shown in Figure 6, the correlation between the original battery capacity and Res is higher than 90%, and Res can effectively reflect the long-term trend of the capacity decline. In addition, the correlations of IMF1, IMF2, and IMF3 in the CALCE dataset were less than 1%, so these components can be removed as noise, and the remaining IMFs and Res were used together as prediction objects. In the NASA dataset, IMF components with a correlation greater than 12% were selected as prediction objects.
It is difficult to accurately predict the trend of IMF components with low correlation: their signal fluctuation range is within 0.05, and they show no periodicity. After removing the low-correlation components according to the Pearson correlation coefficient, the short-term regeneration trend of the capacity is eliminated, while the overall prediction accuracy of the model is significantly improved.
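The Pearson-based screening described here can be sketched as follows; the threshold argument mirrors the 1% and 12% cutoffs mentioned above:

```python
import numpy as np

def select_components(components, capacity, threshold=0.12):
    """Keep only components whose |Pearson correlation| with the capacity
    series exceeds the threshold; the rest are discarded as noise."""
    kept = []
    for c in components:
        r = np.corrcoef(c, capacity)[0, 1]   # Pearson correlation coefficient
        if abs(r) > threshold:
            kept.append(c)
    return kept
```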

Results of IGWO-BiGRU
In this study, the IGWO algorithm was used to optimize the parameters of the BiLSTM network model.The BiLSTM in this model selects a sigmoid function as the activation function, including two bidirectional hidden layers.The default number of neurons is 128, the dropout rate is 0.15, the batch size is 128, and the number of iterations is 500.The optimizer is Adam.In the IGWO algorithm, the number of the grey wolf population is defined as 30, and the number of iterations is 5.
During training, 50% was used as the training set.The final prediction results are shown in Figure 7.Under the default parameters, the LSTM model predicts the shortterm capacity data accurately.However, with an increase in the number of cycles, the prediction curve gradually deviates.Taking Figure 7b (CS2_38) as an example, the capacity decline trend of 38 batteries increased after 800 cycles.Although the fitting degree of the BiGRU is better than that of LSTM because the default parameters are consistent, there are also deviations for long-term capacity prediction.After using IGWO to optimize the initial parameters, the declining trend of capacity prediction can be consistent with the original value, and the CEEMDAN decomposed and combined curve can better fit the original capacity.
Take Figure 7c,d for example.Owing to the small number of cycles of the NASA dataset, the prediction results of LSTM and the BiGRU deviate laterally from the original capacity, indicating that the prediction method has a certain lag.After using CEEMDAN decomposition to eliminate noise, the reconstructed forecast data were consistent with the actual capacity trend.The combined algorithm can effectively fit the declining trend of the original capacity and avoid premature convergence of the network.NASA has fewer datasets, and it was necessary to set the EOL to 80% of the capacity.The volume of CALCE data volume was sufficient and the EOL was set as 70% of the capacity for comparison between the models.Table 1 lists the evaluation indicators of the prediction results for the eight lithium-ion batteries under different algorithms.The smaller the RMSE and MAPE, the higher the prediction accuracy of the prediction model.Table 1 shows that the performance indicators predicted by the combined prediction model CEEMDAN-IGWO-BIGRU are superior to other methods, with high accuracy.In the CALCE dataset, the maximum RMSE was less than 2.6% and the maximum MAE was controlled within 1.6% in the NASA dataset.Owing to the lack of data, the model training was insufficient.Taking B0018 as an example, R 2 was lower than 73% in all the models.It can be observed from the observation of the original data that the sampling time of this battery was relatively discrete, and the number of cycles is 132, which was nearly a quarter less than the sampling data of other batteries, resulting in poor prediction accuracy of the model.In the prediction results of all batteries, the error e2 of the combined model was NASA has fewer datasets, and it was necessary to set the EOL to 80% of the capacity.The volume of CALCE data volume was sufficient and the EOL was set as 70% of the capacity for comparison between the models.Table 1 lists the evaluation indicators of the prediction results for the eight 
lithium-ion batteries under different algorithms.The smaller the RMSE and MAPE, the higher the prediction accuracy of the prediction model.Table 1 shows that the performance indicators predicted by the combined prediction model CEEMDAN-IGWO-BIGRU are superior to other methods, with high accuracy.In the CALCE dataset, the maximum RMSE was less than 2.6% and the maximum MAE was controlled within 1.6% in the NASA dataset.Owing to the lack of data, the model training was insufficient.Taking B0018 as an example, R 2 was lower than 73% in all the models.It can be observed from the observation of the original data that the sampling time of this battery was relatively discrete, and the number of cycles is 132, which was nearly a quarter less than the sampling data of other batteries, resulting in poor prediction accuracy of the model.In the prediction results of all batteries, the error e 2 of the combined model was controlled under 6%.It is proved that CEEMDAN decomposition and the IGWO algorithm can effectively improve the universality of the model in different application scenarios.To predict the capacity decline trend in early life, the remaining service life of the lithium-ion battery was measured, and the prediction model was trained using a 30% dataset.The prediction results for the four models are shown in Figure 8.
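The evaluation indicators referred to in Table 1 (RMSE, MAE, MAPE, and R2) follow their standard definitions; a minimal sketch:

```python
import math

def rmse(actual, pred):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual) * 100

def r2(actual, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

Lower RMSE, MAE, and MAPE mean a more accurate model, while R2 closer to 1 means the prediction explains more of the capacity variance.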
Because the model training was insufficient, the CALCE dataset showed a large deviation in the long-term capacity prediction trend. As shown in Figure 8a, when the number of forecast cycles reached 700, the forecast models deviated from the original capacity curve. However, the CEEMDAN-IGWO-BiGRU model maintained the same declining trend as the original capacity. As shown in Figure 8c, the model proposed in this study still follows the declining capacity trend with only 30% training data. The prediction model can thus fit the degradation trend of the battery well with less training data, and its stability is better than that of the other algorithms.
Table 2 summarizes the predicted performance indicators of all lithium-ion batteries under the 30% training set. It can be observed from Table 2 that the prediction effect of LSTM and the BiGRU is poor owing to the limited training data. The CALCE dataset exhibits capacity regeneration when reaching EOL, and the amplitude range is large. In CS2_37, the RUL error was significantly greater than those of the other three batteries, although the RMSE of the hybrid method was 4.53%. In addition, CEEMDAN-IGWO-BiGRU maintained high accuracy, with a maximum MAE of less than 4%.
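The RUL errors in Table 2 follow from the EOL definitions given above (capacity falling to 70% of rated capacity for CALCE, 80% for NASA). A sketch of the cycle counting is shown below; the normalization of the error by the total cycle count is an illustrative assumption, since the paper does not spell out its error formula.

```python
def eol_cycle(capacities, rated_capacity, eol_ratio):
    """Return the first cycle index at which capacity falls to the EOL threshold,
    or None if the battery never reaches it within the record."""
    threshold = rated_capacity * eol_ratio
    for cycle, cap in enumerate(capacities):
        if cap <= threshold:
            return cycle
    return None

def rul_relative_error(true_caps, pred_caps, rated, eol_ratio):
    """Relative RUL error: |predicted EOL cycle - true EOL cycle| / total cycles."""
    t = eol_cycle(true_caps, rated, eol_ratio)
    p = eol_cycle(pred_caps, rated, eol_ratio)
    return abs(p - t) / len(true_caps)

# Illustration: a predicted curve that fades slightly faster than the true one
# reaches EOL a few cycles early, giving a small relative RUL error.
true_caps = [1.0 - 0.002 * i for i in range(200)]
pred_caps = [1.0 - 0.0021 * i for i in range(200)]
```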
The MAE of B0005 and B0007 in the NASA battery set was up to 20%, whereas the MAE of CEEMDAN-IGWO-BiGRU could still be maintained below 3.5%. Despite the small size of the datasets, the relative error of the RUL could be controlled within 3.5%, showing strong robustness. The IGWO significantly improves the stability of the network prediction. The combined prediction algorithm fully trained the data, improved the prediction accuracy, and reduced the strong dependence on the data. In addition, for CS2_38 and B0018, CEEMDAN offered limited improvement in model accuracy. The main reason is that some IMF components with weak correlations were removed; although the predicted capacity curve maintains the global declining trend after noise elimination, it cannot capture short-term, high-frequency capacity regeneration.

Conclusions
In this study, a CEEMDAN-IGWO-BiGRU model was proposed to predict the capacity and RUL of lithium batteries. The following conclusions were drawn. The CEEMDAN algorithm effectively solves the mode aliasing problem after decomposition of the original data and reduces the error caused by data instability in the prediction process. The CEEMDAN-BiGRU model, which uses the IMF components as the model input and capacity values as the output, is suitable for RUL prediction under various working conditions. The GWO algorithm was improved by nonlinear control of the convergence factor A; additionally, tent mapping improved the global search ability and convergence speed of the GWO algorithm. The improved GWO algorithm was used to optimize the weights and thresholds of the BiGRU prediction model to obtain the best parameters. Through simulation and comparison with LSTM and the BiGRU, the improved GWO algorithm can further explore the relationship between the IMF components of capacity and the RUL. Both the MAE and RMSE were less than 6.3% when only 30% of the dataset was used. Moreover, the accuracy and stability of the RUL prediction were improved using the hybrid model, and the relative error of the RUL was less than 7.13%.
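The tent-map population initialization mentioned above can be sketched as follows. The seeding scheme and bounds are illustrative assumptions rather than the exact formulation used in this study; the per-wolf reseeding is a practical guard, since repeatedly doubling a single floating-point seed eventually collapses the tent map to zero.

```python
def tent_map(x):
    """One step of the chaotic tent map on (0, 1)."""
    return 2 * x if x < 0.5 else 2 * (1 - x)

def init_population(pop_size, dim, lower, upper):
    """Generate a diverse GWO population by iterating the tent map and
    scaling each chaotic value into the search-space bounds."""
    population = []
    for i in range(pop_size):
        # Fresh, well-spread seed per wolf (golden-ratio stride is an assumption).
        x = (0.37 + 0.6180339887 * i) % 1.0
        wolf = []
        for d in range(dim):
            x = tent_map(x)
            wolf.append(lower[d] + x * (upper[d] - lower[d]))
        population.append(wolf)
    return population
```

Compared with uniform random seeding, chaotic initialization spreads the initial wolves more evenly over the search space, which is what improves the diversity of the initial GWO population.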

Figure 4.
The framework diagram of the RUL estimation.

where →h_t and ←h_t denote the forward and reverse hidden states of the two-way GRU at time t; W is the corresponding weight, and b_t represents the offset corresponding to the hidden layer state at time t. The structure of the BiGRU is shown in Figure 1.

Table 1.
Lithium battery life prediction error under 50% training set.

Table 2.
Lithium battery life prediction error under 30% training set.