1. Introduction
Under the burden of the energy crisis, the hunt for safe, clean, and efficient energy conversion and storage technologies has risen to the forefront of the scientific research community. As a storage and conversion carrier of electrical energy [
1], lithium-ion batteries are extensively employed in electronic devices and systems because of their high safety, long cycle life, low pollution, and low self-discharge rate. However, as charge and discharge cycles accumulate, the battery’s internal structure changes, which increases the battery’s internal resistance [
2] and causes capacity decay. Aging lithium-ion batteries can degrade the functionality of electronic devices and introduce safety hazards. The battery state of health (SOH) is an important indicator of the degree of battery deterioration and aging, and it can serve as a reference for replacing and recycling deteriorated batteries so that the system continues to work properly and incidents are avoided [
3].
Current lithium-ion battery health state prediction research has made significant strides, and several novel methodologies and techniques have been developed that can effectively improve the accuracy of lithium-ion battery health state prediction. Battery health state prediction methods are divided into two major categories [
4], namely, model-based prediction methods [
5,
6,
7] and data-driven prediction methods [
8,
9,
10]. Data-driven prediction methods for lithium-ion battery SOH are the most widely used prediction methods at home and abroad; the data-driven methods include Support Vector Machine (SVM) [
11,
12], Gaussian Process Regression (GPR) [
13,
14,
15], Relevance Vector Machine (RVM) [
16,
17,
18], and neural network methods [
19,
20,
21], etc. Data-driven prediction methods do not need to consider the complicated chemical processes occurring within the battery and are therefore more generic.
The degradation of lithium-ion batteries is often expressed by a drop in battery capacity as the number of charge/discharge cycles increases, and the remaining battery capacity can serve as an indicator of the battery’s health. However, battery capacity data is typically impossible to collect directly online; hence, indirect health indicators are derived to predict the battery’s remaining capacity and reflect the battery’s health status. Wei et al. [
22] selected battery capacity as the variable indicating the battery’s health status and used a support vector regression-based method to estimate the battery’s SOH. However, battery capacity is difficult to measure directly, and the metric lacks real-time capability. Hu et al. [
23] selected the sample entropy of the discharge voltage as the health indicator, and Huang et al. [
24] selected the equal voltage rise interval as the health indicator. However, a health indicator extracted at only a single scale does not reflect the different battery health states well and is insufficient to predict complex capacity changes.
The data-driven prediction method does not need to consider complex electrochemical characteristics but instead pulls health parameters that indicate the deterioration trend from raw data, learns through intelligent algorithms, and predicts SOH. Li et al. [
25] extracted characteristic parameters from the capacity increment curve and established a model for estimation and prediction using the Gaussian process regression technique. However, tuning the hyperparameters of the GPR algorithm is complicated and involves many matrix inversion operations. The algorithm’s complexity and computational load are therefore high, which limits the quantity of training samples and makes modeling challenging. Zhang et al. [
26] proposed using the accelerated particle swarm optimization algorithm to extract six new features from the charge and discharge cycles and to develop an adaptive multi-kernel RVM SOH prediction model. However, due to the high sparsity of RVM, RVM-based prediction is often not stable enough; limited by its structure, the model performs well only in short-term prediction. Tian et al. [
27] proposed a method for extracting the temperature difference curve in the constant charging process as a health factor and using SVM to establish an SOH model. However, the SVM algorithm has limitations: it is difficult to select a kernel function, and large data sets are hard to handle.
With the rapid advancement of machine learning and deep learning, neural networks are increasingly employed to predict battery SOH. Guo et al. [
28] proposed a deep-learning-based lithium-ion battery capacity estimation method that uses a recurrent neural network (RNN) to learn to predict the discharge voltage sequence of the battery. However, RNN suffers from the long-term dependence problem and restricted data input. Yin et al. [
29] proposed a technique for combined online estimation of battery SOC and SOH based on long short-term memory (LSTM) neural networks, in which LSTM models are constructed and stage SOH averages are incorporated into the calculation of SOC to accomplish joint estimation. However, the LSTM has many parameters, and selecting them empirically makes it difficult to properly leverage the LSTM’s prediction capabilities.
In the existing field of battery health state prediction, the selection strategy of health factors greatly impacts the prediction model’s accuracy. Current research, however, suffers from issues such as reliance on a single health factor and the long-term dependence problem of the algorithms. The battery’s intricate internal construction also makes it difficult to measure SOH directly. Therefore, this paper proposes a method that analyzes constant current discharge characteristics to extract multi-dimensional health indicators. Health factors are extracted through the equivalent model of lithium-ion batteries and constant current discharge curves, and the e-GA algorithm is applied to optimize the voltage drop segment of the TAB. Finally, four health factors are obtained as model input, and the hyperparameters of the LSTM model are optimized to predict the health status of the lithium-ion battery.
3. SOH Prediction Feature Health Factor Extraction
The internal chemical reactions of lithium battery aging are accompanied by changes in capacitance and resistance, but these changes are often not directly quantifiable. Therefore, in practice, the battery capacity is reflected by direct or indirect health indicators. A health indicator can be an inherent parameter of the battery, such as its capacity or resistance. Alternatively, it can be the time variation of the iso-voltage difference, the capacity increment curve, or the time at which the battery temperature reaches its peak. In this paper, the selected health indicators are used as the input of the battery SOH prediction model.
3.1. First-Order RC Equivalent Model for Lithium Batteries
The lithium battery equivalent circuit consists of an ideal voltage source, a resistor and n RC loops. The first-order and second-order RC equivalent circuit models are more accurate and reliable than other models. When the HPPC experimental environment is not needed, the first-order RC equivalent circuit is advantageous in speed and in the outcome of identification. The MAE and RMSE of terminal voltage and open circuit voltage as estimated by different models are shown in
Table 1 [
32].
The first-order RC equivalent circuit model of a lithium battery is shown in
Figure 2 [
33]. This model consists of a resistor, an ideal voltage source, and an RC loop.
According to the equivalent model shown in
Figure 2, the circuit equations can be obtained from Kirchhoff’s voltage and current laws:
where the quantities involved are the battery ohmic internal resistance, the battery polarization internal resistance, the battery polarization capacitance, the battery terminal voltage, the battery open circuit voltage, the battery polarization voltage, and the battery charge and discharge current.
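For reference, a standard form of the first-order RC (Thevenin) equations consistent with these definitions is sketched below. The symbols are assumptions introduced here for illustration: $R_0$ for the ohmic internal resistance, $R_p$ and $C_p$ for the polarization internal resistance and capacitance, $U_t$, $U_{oc}$, and $U_p$ for the terminal, open circuit, and polarization voltages, and $I$ for the current, taken as positive during discharge. The notation and sign convention of the original equations may differ.

```latex
\begin{aligned}
U_t &= U_{oc} - I R_0 - U_p, \\
\frac{\mathrm{d}U_p}{\mathrm{d}t} &= \frac{I}{C_p} - \frac{U_p}{R_p C_p}.
\end{aligned}
```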
By analyzing the change of terminal voltage in the three stages of discharge, it can be established that the discharge of the lithium battery is the zero-state response stage, in which the terminal voltage changes over time as shown in Equation (11).
The static stage after the discharge is the zero-input response stage. At this time, the terminal voltage is calculated as shown in Equations (12) and (13), from which it can be seen that the polarization voltage gradually decays to 0 after a period of time as the polarization reaction diminishes [
34].
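The two response stages can be illustrated numerically. The following is a minimal sketch rather than the paper's implementation: it assumes the standard first-order RC relations with the discharge current taken as positive, holds the open circuit voltage constant for simplicity, and uses made-up parameter values. All function and variable names are hypothetical.

```python
import math

# Illustrative (assumed) parameters, not identified from the NASA data.
U_OC = 4.0    # open circuit voltage in V, held constant for simplicity
R0 = 0.05     # ohmic internal resistance in ohm
RP = 0.03     # polarization internal resistance in ohm
CP = 2000.0   # polarization capacitance in F
I = 2.0       # constant discharge current in A (discharge taken as positive)
TAU = RP * CP # time constant of the RC loop in s


def polarization_voltage_discharge(t):
    """Zero-state response: polarization voltage t seconds after discharge starts."""
    return I * RP * (1.0 - math.exp(-t / TAU))


def terminal_voltage_discharge(t):
    """Terminal voltage during constant current discharge."""
    return U_OC - I * R0 - polarization_voltage_discharge(t)


def terminal_voltage_rest(t, up_at_cutoff):
    """Zero-input response: terminal voltage t seconds after the current is removed.

    The ohmic drop vanishes at once; the polarization voltage decays exponentially.
    """
    return U_OC - up_at_cutoff * math.exp(-t / TAU)


# Polarization voltage reached after 10 minutes of discharge.
up_end = polarization_voltage_discharge(600.0)
for t in (0.0, 60.0, 300.0):
    print(t, round(terminal_voltage_rest(t, up_end), 4))
```

In this sketch the terminal voltage rebounds toward the open circuit voltage during rest, which is the recovery behavior that the later identification step relies on.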
3.2. Selection of Health Indicators
Based on the equivalent circuit model of the lithium battery, the external characteristics of the lithium battery, as indicated by the discharge curve of the terminal voltage, are analyzed and fitted. The polarization reaction of the lithium battery is closely related to the SOH of the battery, and the polarization resistance and capacitance change regularly as the number of cycles increases. The initial discharge internal resistance is also an important manifestation of battery degradation. Therefore, the initial discharge internal resistance, polarization internal resistance, and polarization capacitance of the battery are taken as health indicators that map the SOH of the lithium battery, so that the actual capacity of the battery can be predicted from its partial discharge characteristics.
3.2.1. Using Parameter Identification to Extract the Polarization Internal Resistance and Polarization Capacitance
According to the constant current discharge characteristics of the battery, a single discharge process is divided into three parts: before discharge (before point a), during discharge (from point b to point c), and after discharge (after point d). After the completion of discharge, the voltage rises for a period while the battery is left standing. The reason for this phenomenon is that the battery’s discharge is accompanied by a polarization reaction, which manifests as a change in the battery voltage. Under high-current discharging, the voltage drop is large. When the discharging stops, the polarization subsides and the battery’s voltage rises significantly. According to Equations (12) and (13), the rebound curve after discharge at point d is the function curve shown in
Figure 3, which provides a basis for the subsequent parameter identification. In this paper, the curve parameters are identified using the least squares method.
The B0005 battery from the NASA lithium battery data set is selected as the research object. This is the most used public data set in current research in this field. A constant temperature box is used to eliminate the influence of external temperature. However, the data set contains only a few battery samples, each sample corresponds to a single battery, and each sample has relatively few data points. Also, the battery capacity degradation curve shows short-term capacity regeneration. The B0005 battery data provided by NASA has a total of 168 cycles. In order to identify health factors through the discharge recovery phase, the terminal voltage recovery curves of the 168 discharges are collected as shown in
Figure 4. The starting point in the figure is point c in
Figure 3. Since the terminal voltage is sampled at time intervals, the actual voltage curve shows a certain distortion. Therefore, curve fitting is performed to infer the position of point d on the theoretical curve, in preparation for the subsequent identification.
In order to find the position of point d, the polyfit function in MATLAB is used to fit a fifth-degree polynomial, by the least squares method, to all points of each cycle’s recovery voltage curve except the first. The polyfit function is MATLAB’s least squares parameter estimation function for polynomial curve fitting; it can be called from a command statement, and its output is a row vector of coefficients. The fitting results are shown in
Figure 5. For each cycle, point d is taken as the origin of the coordinate axis, and the recovery curve of the polarization reaction is drawn, as shown in
Figure 6.
According to Equations (12) and (13), which describe the polarization reaction after the end of discharge, it can be deduced that the polarization reaction formula for the curves in
Figure 4 is as follows:
where the values of the parameters are unknown. The process of identifying curve parameters by using the least squares method is shown in
Figure 7. Specifically,
Figure 7a shows the identification result of the entire cycle, where the blue points represent the actual voltage values and the red curve represents the identification curve of the polarization formula, while
Figure 7b presents the identification results of several cycles.
Finally, the polarization internal resistance and polarization capacitance are identified as shown in
Figure 8. It can be seen from the figure that the polarization resistance and capacitance change regularly with the increase in the number of cycles [
35], so the polarization internal resistance and polarization capacitance are feasible as health factors of the battery for mapping the battery SOH.
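The identification step can be illustrated with a small least squares example. This is a sketch under stated assumptions, not the authors' procedure: it assumes the recovery curve has the form U(t) = U_steady - A·exp(-t/τ), that the steady-state voltage U_steady is known, and that the polarization had fully developed by the end of discharge so that A = I·R_p. It swaps in a log-linear least squares fit in place of whatever solver was used in the paper. The function name and all numbers are hypothetical.

```python
import math


def identify_polarization(times, voltages, u_steady, current):
    """Fit U(t) = u_steady - A * exp(-t / tau) by linear least squares on
    ln(u_steady - U(t)) = ln(A) - t / tau, then map (A, tau) to (Rp, Cp)."""
    ys = [math.log(u_steady - u) for u in voltages]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    tau = -1.0 / slope               # time constant Rp * Cp
    amplitude = math.exp(intercept)  # polarization voltage at the start of rest
    rp = amplitude / current         # assumes polarization had fully developed
    cp = tau / rp
    return rp, cp


# Synthetic recovery curve generated from known (made-up) parameters.
true_rp, true_cp, current, u_steady = 0.03, 2000.0, 2.0, 3.6
times = [float(t) for t in range(0, 200, 10)]
voltages = [u_steady - current * true_rp * math.exp(-t / (true_rp * true_cp))
            for t in times]

rp, cp = identify_polarization(times, voltages, u_steady, current)
print(rp, cp)
```

On noise-free synthetic data the fit recovers the generating parameters; with measured data, the logarithm amplifies noise near the end of the recovery, so a nonlinear least squares fit may be preferable.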
3.2.2. Modified Discharge Curve Extraction of the Initial Discharge Internal Resistance
The initial stage of discharge is the ohmic polarization stage: the voltage drop is caused by the internal resistance of each part of the battery connection, and its value follows Ohm’s law. Since the terminal voltage is not recorded continuously, the position of point b in
Figure 9 on the actual voltage curve has to be estimated by calculation. With the parameter values identified above, Equation (11) can be used to estimate the position of point b. The discharge initial internal resistance is then obtained as the ratio of the voltage difference between point a and point b to the discharge current, as shown in
Figure 9.
Figure 10 shows the curve of discharge initial resistance changing with cycle.
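As a small worked example of the ratio described above, the sketch below computes the initial discharge internal resistance from the voltages at points a and b. The function name and the numerical values are hypothetical and are not taken from the B0005 data.

```python
def initial_discharge_resistance(u_a, u_b, current):
    """Ohmic drop between point a (before discharge) and point b (just after
    the current is applied), divided by the discharge current."""
    if current <= 0:
        raise ValueError("discharge current must be positive")
    return (u_a - u_b) / current


# Made-up example: a 0.14 V drop at a 2 A discharge gives about 0.07 ohm.
r0 = initial_discharge_resistance(u_a=4.19, u_b=4.05, current=2.0)
print(r0)
```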
3.2.3. Extracting the Equal Voltage Drop Discharge Time under Constant Current Working Condition
Under constant current, the discharge capacity over a fixed voltage window is obtained by multiplying the equal voltage drop discharge time by the discharge current. Selecting a fixed voltage drop range of 3.8–3.5 V gives the comparison curve between this capacity and the actual capacity of the B0005 lithium battery. As can be seen clearly from
Figure 11, it tracks the actual capacity of the battery very closely, which verifies the feasibility of the equal voltage drop discharge time as an indirect health indicator. The actual capacity of the battery is measured by the ampere-hour method from the current curve recorded while the battery discharges from the fully charged state to the cut-off voltage.
According to Equation (11), the effect of the polarization reaction on the terminal voltage gradually declines, or even disappears rapidly, as the discharge process proceeds. Therefore, for most of the discharge process, the terminal voltage can be described by a simple internal resistance model, as shown in Equation (15) [
36].
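The extraction of the equal voltage drop discharge time can be sketched as follows. This is an illustration under assumptions, not the paper's code: the discharge curve is assumed to fall monotonically, the crossing times are found by linear interpolation between samples, and the curve itself is synthetic. The helper names are hypothetical.

```python
def crossing_time(times, voltages, level):
    """Time at which a monotonically falling voltage curve first reaches `level`,
    using linear interpolation between neighbouring samples."""
    for k in range(1, len(times)):
        v_prev, v_next = voltages[k - 1], voltages[k]
        if v_prev >= level >= v_next and v_prev != v_next:
            frac = (v_prev - level) / (v_prev - v_next)
            return times[k - 1] + frac * (times[k] - times[k - 1])
    raise ValueError("voltage level not crossed")


def equal_drop_time(times, voltages, v_high, v_low):
    """Discharge time spent between the voltage levels v_high and v_low."""
    return crossing_time(times, voltages, v_low) - crossing_time(times, voltages, v_high)


# Synthetic discharge curve: 4.0 V falling linearly by 0.5 mV per second.
times = [10.0 * k for k in range(300)]
voltages = [4.0 - 0.0005 * t for t in times]

dt = equal_drop_time(times, voltages, 3.8, 3.5)  # seconds spent in the window
q_window = 2.0 * dt / 3600.0                     # Ah at a constant 2 A discharge
print(dt, q_window)
```

Because the current is constant, the window capacity is simply proportional to the window time, which is why either quantity can serve as the indicator.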
3.3. Optimization of Health Indicator Based on e-GA
Although the equal voltage drop discharge time is widely used as a health indicator for battery health state prediction, the indicator obtained by selecting different discharge voltage drop segments varies, and the accuracy of the resulting predictions often varies significantly. To select a suitable health indicator, three discharge voltage drop segments (3.8–3.5 V, 3.5–3.2 V, and 3.0–2.7 V) are selected for comparison by examining the correlation between the equal voltage drop discharge time and the actual capacity for different intervals. Existing studies have used grey relational analysis to analyze the relationship between the extracted equal voltage drop discharge time and the actual discharge capacity [
37], but the algorithm is computationally intensive. Therefore, the Pearson product-moment correlation coefficient is used to verify the correlation between the different equal voltage drop discharge times and the actual capacity. The calculation is shown in Equation (16).
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (16)$$
where x and y are the comparison sequences with sequence length n, $\bar{x}$ and $\bar{y}$ are their means, and r is the Pearson correlation coefficient between the two sequences. The sequence of each of the four health indicators over the number of cycles is taken as x, and the battery capacity sequence is y. When r > 0, the x and y sequences show positive correlation. When r < 0, the x and y sequences show negative correlation. The correlation between the four health indicators and the actual capacity is obtained as follows.
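The Pearson coefficient described above can be computed directly from its definition. The sketch below is illustrative only: the function name is hypothetical and the two short sequences are made-up values standing in for a resistance-type indicator and the capacity, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up example: an indicator that rises while the capacity falls.
indicator = [0.070, 0.072, 0.075, 0.079]
capacity = [1.85, 1.80, 1.74, 1.66]
r = pearson(indicator, capacity)
print(r)  # negative, since the two sequences move in opposite directions
```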
As shown in
Table 2, the correlations of the polarization internal resistance, polarization capacitance, and initial discharge internal resistance with the actual capacity are moderate or weak, whereas the correlation coefficient of the equal voltage drop discharge time depends on the selection of the voltage drop interval. Therefore, the commonly used voltage interval for battery discharge is delineated, and the span of the voltage drop interval is constrained. A genetic algorithm based on the elite selection model (e-GA) is used to optimize the voltage drop segment. With the Pearson correlation as the fitness function of the e-GA algorithm, the voltage drop segment for which the equal voltage drop discharge time has the best correlation is selected.
The elite retention strategy is adopted to optimize GA. The individual with the highest fitness in the current generation is retained as an elite individual, and the remaining individuals are subjected to genetic operations such as random selection, crossover, and mutation. The steps of the algorithm include population initialization, fitness function setting, selection, crossover and mutation, and the elite retention strategy. The span of the voltage drop interval selected by the model ranges between 0.1 V and 0.2 V. Considering that the commonly used SOC range is from 80% to 30%, the search range for the optimal voltage drop interval of the B0005 battery is 3.85–3.10 V. The e-GA algorithm has step sizes of 0.01 V and 0.05 V. The model parameters are set as follows: the number of individuals in the initial population n = 20, the number of generations k = 100, the crossover rate is 0.4, and the mutation rate is 0.1.
The steps of the model algorithm are as follows:
Step 1: Population initialization. The initial population P(t) of n individuals is randomly generated through binary coding of the selected voltage drop sequence.
Step 2: Fitness function. The Pearson product-moment correlation coefficient is used to calculate individual fitness.
Step 3: Selection operator. After the individual fitness is obtained from the fitness function, n individuals are randomly selected from the population, with each individual having the same probability of being selected.
Step 4: Crossover operator. Single-point crossover is used, and population individuals are randomly paired. After successful pairing, a crossover bit is randomly selected according to the crossover probability, the binary codes at the crossover bit are exchanged, and two new individuals are obtained. Each new individual is then tested. If the test is successful, it proceeds to the mutation operator; otherwise, the crossover position is re-selected until the test is successful.
Step 5: Mutation operator. Each individual is a binary coded string, and the basic mutation method is used. According to the mutation rate, the bit to be mutated is randomly selected, the binary character at that bit is inverted, and the mutated individual is verified. If the verification is successful, the individual enters the new generation; otherwise, the bit is re-selected and mutated until the verification is successful.
Step 6: Elite retention strategy. The individual with the best fitness value found so far in the evolution of the population is saved as an elite individual, and it replaces the individual with the lowest fitness in the next generation, so as to avoid the loss and destruction of the best genes in the current population.
Step 7: Termination check. If the number of generations is reached, the voltage drop segment with the optimal fitness is output; otherwise, the generation count is increased by one and the procedure returns to Step 2.
The optimized fitness curve is obtained using the e-GA model, as shown in
Figure 12. Over the 168 cycles of the B0005 cell, the best correlation corresponds to the equal voltage drop interval from 3.65 V to 3.45 V, and the optimal individual fitness is 0.9989, which improves the correlation coefficient by 0.0027 compared to the interval of 3.8 V–3.5 V. Also, the measurement time is reduced by about 200 s according to the comparative test.
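The elitist genetic search described in Steps 1 to 7 can be sketched compactly. The code below is a simplified illustration under several assumptions and is not the authors' implementation: the chromosome layout (7 bits for the window start on a 0.01 V grid, 2 bits for one of three assumed window widths), the handling of invalid offspring (replaced by a fresh random individual instead of re-drawing the crossover or mutation position), and above all the fitness, which is a hypothetical stand-in with an arbitrary peak. In the paper's setting the fitness would be the Pearson correlation between the window discharge time and the capacity.

```python
import random

V_TOP, V_BOTTOM, STEP = 3.85, 3.10, 0.01  # search range and grid step in V
SPANS = (0.10, 0.15, 0.20)                # allowed window widths in V (assumed)
START_BITS, SPAN_BITS = 7, 2


def decode(bits):
    """Map a bit list to (v_high, v_low), or None if the window is invalid."""
    start = int("".join(map(str, bits[:START_BITS])), 2)
    span_idx = int("".join(map(str, bits[START_BITS:])), 2)
    if span_idx >= len(SPANS):
        return None
    v_high = V_TOP - start * STEP
    v_low = v_high - SPANS[span_idx]
    if v_low < V_BOTTOM - 1e-9:
        return None
    return round(v_high, 2), round(v_low, 2)


def toy_fitness(window):
    """Hypothetical stand-in for the Pearson fitness; peaks at an arbitrary window."""
    v_high, v_low = window
    return 1.0 - (v_high - 3.60) ** 2 - (v_low - 3.45) ** 2


def random_individual(rng):
    """Draw random bit strings until one decodes to a valid window."""
    while True:
        bits = [rng.randint(0, 1) for _ in range(START_BITS + SPAN_BITS)]
        if decode(bits) is not None:
            return bits


def crossover(a, b, rng):
    """Single-point crossover of two bit lists."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def elitist_ga(fitness, pop_size=20, generations=100, pc=0.4, pm=0.1, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]

    def score(ind):
        return fitness(decode(ind))

    elite = max(pop, key=score)
    for _ in range(generations):
        # Selection: parents drawn uniformly at random (equal probability).
        parents = [rng.choice(pop) for _ in range(pop_size)]
        children = []
        for i in range(0, pop_size - 1, 2):
            a, b = parents[i], parents[i + 1]
            if rng.random() < pc:
                a, b = crossover(a, b, rng)
            children += [a, b]
        # Mutation; offspring that decode to an invalid window are replaced.
        new_pop = []
        for child in children:
            cand = mutate(child, pm, rng)
            new_pop.append(cand if decode(cand) is not None else random_individual(rng))
        # Elitism: the best individual so far overwrites the worst newcomer.
        worst = min(range(len(new_pop)), key=lambda k: score(new_pop[k]))
        new_pop[worst] = elite
        pop = new_pop
        elite = max(pop, key=score)
    return decode(elite), score(elite)


window, fit = elitist_ga(toy_fitness)
print(window, fit)
```

Because the elite is always carried into the next generation, the best fitness never decreases from one generation to the next, which is the property the elite retention strategy is meant to guarantee.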
4. PSO-LSTM Network Model
Although the LSTM network can be used to solve the long-term dependence problem that RNN cannot handle, the introduction of the gate structure increases the number of hyperparameters. If the hyperparameters are selected unreasonably, the prediction accuracy will be insufficient. Therefore, PSO is used to optimize the hyperparameters of the LSTM network and to build a PSO-LSTM lithium-ion capacity prediction model. Considering the training time of the neural network and the need for the PSO algorithm to repeatedly obtain the error of the validation set as the fitness value, an LSTM network with a single-layer structure is selected. Thus, there are three hyperparameters to be configured: the learning rate, the number of training iterations, and the number of hidden layer neurons. The structure of the PSO-LSTM algorithm model is shown in
Figure 13. The three hyperparameters of LSTM are used as the parameters to be optimized by PSO, and the fitness function is the mean absolute percentage error (MAPE) output by the model, as shown in Equation (17):
where MAPE is calculated as shown in Equation (18), in which the actual value of the remaining capacity is compared with the predicted value of the remaining capacity, and n denotes the number of predicted points.
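For illustration, the MAPE can be computed as in the sketch below. It assumes the usual definition, the mean of the absolute relative errors expressed in percent; the function name and the sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, of predictions against actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up remaining-capacity values in Ah.
error = mape([1.80, 1.75, 1.70], [1.78, 1.76, 1.70])
print(error)
```

Since each error is divided by the actual value, the measure is only defined when no actual capacity is zero, which holds for battery capacity data.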
The combined model algorithm process is as follows:
Step 1: Data processing. The battery degradation data is obtained from the NASA Li-ion battery data set, and the B0005 battery data is divided into training, validation, and test sets at a ratio of approximately 6:1:3. Specifically, the first 100 cycles form the training set, cycles 101 to 118 form the validation set, and cycles 119 to 168 form the test set. The four health indicators described in the previous section are extracted and normalized.
Step 2: Optimization of the LSTM hyperparameters through PSO.
(1) The learning rate, the number of training iterations, and the number of neurons in the hidden layer of the LSTM model are taken as the optimization objects, and their search ranges are determined. Based on the actual training volume and experience, the search range is [0, 0.2] for the learning rate, [1, 50] for the number of neurons in the hidden layer, and [100, 500] for the number of training iterations.
(2) The PSO parameters are initialized, with the particle dimension set as 20, the population set as 100, the minimum value of particle velocity = [0.001, 1, 10], the maximum value = [0.05, 10, 50], the number of iterations M = 100, the learning factor is , and the inertia weight = 0.8. Taking into account the variation of the neural network hyperparameters, the velocity and position of each search point are randomly generated within the search range, where the randomly generated particle position is a multiple of .
(3) The fitness value of PSO is calculated. For each particle, an LSTM model is constructed with the particle’s parameters and trained on the training set, the validation set data are predicted, and the error of the validation set prediction is taken as the particle’s fitness value. The fitness formula is shown in Equation (17). The individual extremum of each particle and the global extremum are then updated.
(4) Position and speed are updated. The search speed and position of the hyperparameters are updated according to Equations (7) and (8). To prevent particles from exceeding the search range and jumping out of the solution space during the search, a speed that exceeds its boundary is replaced with the boundary value.
(5) The termination criterion is checked. If the criterion is met, the search ends and the best solution in the population history is output; otherwise, the procedure returns to sub-step (3).
Step 3: Model training. The global optimal parameters found in Step 2 are configured into the LSTM network model. Then, the four health indicators obtained from the first 118 cycles are used as input for model training, so as to obtain the PSO-LSTM battery capacity prediction model.
Step 4: Model testing. The normalized health indicator data of the 50 test-set cycles are input into the prediction model to obtain the capacity prediction results.
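The hyperparameter search in Step 2 can be sketched with a plain particle swarm loop. This is an illustration under assumptions rather than the paper's implementation: training an LSTM is replaced by a hypothetical toy objective standing in for the validation MAPE, the learning factors c1 = c2 = 1.5 and the swarm size are illustrative choices, and only a symmetric speed limit is applied (the minimum velocity values quoted above are not used). The search ranges and maximum speeds are taken from the text.

```python
import random

# Order of dimensions: learning rate, hidden neurons, training iterations.
LOWER = [0.0, 1.0, 100.0]
UPPER = [0.2, 50.0, 500.0]
V_MAX = [0.05, 10.0, 50.0]  # per-dimension speed limit


def clamp(value, low, high):
    """Replace a value that leaves [low, high] with the nearest boundary."""
    return max(low, min(high, value))


def toy_validation_error(params):
    """Hypothetical stand-in for the validation MAPE of a trained LSTM."""
    lr, hidden, iters = params
    return (((lr - 0.01) / 0.2) ** 2
            + ((hidden - 30.0) / 49.0) ** 2
            + ((iters - 300.0) / 400.0) ** 2)


def pso(objective, n_particles=20, n_iter=100, w=0.8, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(LOWER)
    pos = [[rng.uniform(LOWER[d], UPPER[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-V_MAX[d], V_MAX[d]) for d in range(dim)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[k][d]
                     + c1 * r1 * (pbest[k][d] - pos[k][d])
                     + c2 * r2 * (gbest[d] - pos[k][d]))
                # Boundary handling for speed and position.
                vel[k][d] = clamp(v, -V_MAX[d], V_MAX[d])
                pos[k][d] = clamp(pos[k][d] + vel[k][d], LOWER[d], UPPER[d])
            val = objective(pos[k])
            if val < pbest_val[k]:          # update the individual extremum
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:         # update the global extremum
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val


best, err = pso(toy_validation_error)
print(best, err)
```

In an actual PSO-LSTM pipeline, the objective would train a model on the training set and return its validation error, and the neuron and iteration counts would be rounded to integers before the network is built.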