Health Factor Extraction of Lithium-Ion Batteries Based on Discrete Wavelet Transform and SOH Prediction Based on CatBoost

Zhang, Mei; Chen, Wanli; Yin, Jun; Feng, Tao

doi:10.3390/en15155331


College of Electrical and Information Engineering, Anhui University of Science and Technology (AUST), Huainan 232001, China

^*

Author to whom correspondence should be addressed.

Energies 2022, 15(15), 5331; https://doi.org/10.3390/en15155331

Submission received: 20 June 2022 / Revised: 15 July 2022 / Accepted: 20 July 2022 / Published: 22 July 2022


Abstract:

Aiming to accurately identify the state of health (SOH) and the remaining useful life (RUL) of lithium-ion batteries, in this paper, we propose an algorithm for the health factor extraction and SOH prediction of the batteries based on discrete wavelet transform and the Cauchy–Gaussian variation tent sparrow search algorithm (DWT-CGTSSA). Firstly, concerning the inconsistent data length, discrete wavelet transform (DWT) was adopted to decompose the battery’s signals and extract features. Then, the Cauchy–Gaussian variation tent sparrow search algorithm (CGTSSA) was utilized to extract features and obtain the optimal feature subset after encoding. Finally, the optimal feature subset was used to establish a prediction model based on CatBoost for predicting the SOH of lithium-ion batteries. Experiments were conducted for verification. The experimental results showed that the model established in this research is capable of realizing the prediction between different battery packs. The B0005 battery from dataset A was taken as the training set to predict the complete SOH of B0006 and B0007 batteries. For the prediction model of CGTSSA-CatBoost, the goodness of fit (R²) exceeded 0.99, and the value of mean square error (MSE) was less than 1‰. A comparison with other state-of-the-art prediction models verified the superior performance of the CGTSSA-CatBoost model. Under different working conditions, the R² of all models in dataset B exceeded 0.98.

Keywords:

lithium-ion battery; SOH prediction; DWT; CGTSSA; CatBoost

1. Introduction

Due to the merits of high energy density, high power density, and long cycle life, lithium-ion batteries have been widely used to store energy [1,2,3,4,5]. The service life of a battery gradually declines with the prolonged service time, resulting from the battery’s internal chemical reaction and the influence of the external environment. The decline and degradation of lithium-ion batteries will increase the maintenance cost of many electronic devices. The sudden failure of batteries easily causes the crash of large equipment, including new-energy vehicles, which may lead to major accidents. Therefore, the accurate and timely prediction of battery SOH and failure time helps replace the failed batteries in a timely manner, so as to reduce the risks of accidents. As the core energy storage component of new-energy vehicles, lithium-ion batteries are of great significance in energy storage equipment. Therefore, the research on lithium-ion batteries is decisive and meaningful for the development of new-energy vehicles [6,7,8]. Influenced by the composition and structure of lithium-ion batteries, we cannot directly obtain the real-time SOH of batteries through sensing technology. Therefore, it is necessary to obtain the SOH and RUL of batteries by developing estimation and prediction models [9,10].

In machine learning, the quality of the features determines the upper bound of model performance, so feature engineering is particularly critical for a prediction model. The extracted health factors directly determine the prediction effect of the model. However, the length of the data measured during the operation of a lithium-ion battery varies, and the extraction of health factors is difficult, which leads to inaccurate prediction of the SOH and remaining useful life (RUL) of the battery. Establishing effective feature engineering for the extraction and screening of battery health factors has therefore become a research focus and remains challenging for lithium-ion batteries.

At present, there are two main methods for the RUL prediction of lithium-ion batteries: model-based and data-driven methods. In a model-based method, it is required to establish the simulation model of batteries. Considering the complex chemical reactions in lithium-ion batteries, various partial differential equations are established, involving the calculation of relevant parameters and matrices of a large equivalent circuit model. Therefore, there are many challenges in ensuring the accuracy of the model in practical applications [11]. In a data-driven method, the typical features are usually extracted from degraded data. Then, a machine learning method is used to map the relationship between degraded data and SOH, so as to predict the SOH and RUL. In this research, the data-driven method was adopted.

Battery RUL prediction based on data-driven methods has been extensively investigated. Shi Yongsheng et al. proposed an SOH estimation method for lithium-ion batteries based on an attention-improved bidirectional gated recurrent unit (BiGRU). This method achieves high-precision SOH estimation for different types of batteries. They used internal resistance, voltage change rate, test time, discharge energy, discharge depth, and terminal voltage in the discharge cycle as health factors; thus, few health factors were selected, and the feature engineering is simple. Because the health factors were selected by subjective judgment, the established feature engineering needs to be improved [12]. Zhang et al. proposed a battery SOH prediction method combining a one-dimensional convolutional neural network (1DCNN) and a long short-term memory (LSTM) network. The voltage, current, and temperature intervals in the charging and discharging data were averaged and then serially connected. This method is relatively simple, but it requires high data integrity, and it is difficult to establish feature engineering on incomplete data [13]. Feng HL et al. proposed a new SOH and RUL prediction method based on Gaussian process regression (GPR) and established a new SOH prediction model for lithium-ion batteries by improving the basic GPR model [14]. Cai L et al. established a more effective SOH estimator through support vector regression (SVR) and the short-term characteristics of the current pulse test and optimized the whole process of the SOH estimator through NSGA-II, taking into account the measurement cost and estimation accuracy of the characteristics. They selected four inflection points in the voltage response curve as the characteristics of the SOH estimator and used feature extraction to eliminate ineffective interference, but few features were extracted, and the generalization ability of a model with so few features in actual prediction remains to be proved [15].

Song et al. proposed an SOH estimation method for lithium-ion batteries based on the XGBoost algorithm. The SOH of lithium-ion batteries was estimated using the XGBoost model and then corrected through a Markov chain [16]. Taking the differential voltage curve as well as the charging and discharging curves as the degradation features of battery capacity, Li et al. adopted the Elman neural network to predict the RUL of batteries [17]. Park et al. used the wavelet transform to preprocess nonlinear feature data from lithium-ion batteries (LIBs) and facilitate feature extraction, and then used a convolutional neural network (CNN) and long short-term memory (LSTM) for SOH estimation [18]. Kaur K et al. proposed three models from different network architecture families. By considering the influence of parameters such as the complexity of the model, the sampling rate of the measurable battery signals, and the type of the measurable battery signals, they evaluated the battery capacity estimation of the different models, relying on their ability to process time series data effectively by remembering long-term dependence [19]. Khumprom et al. applied a deep neural network (DNN) to predict the SOH and RUL of lithium-ion batteries, showing equivalent or better performance compared with other machine learning algorithms [20]. Rossi et al. proposed a method to adjust the extended Kalman filter (EKF) covariance matrix by applying an optimization process based on the genetic algorithm (GA), which realized the SOH prediction of batteries [21]. Jos et al. proposed a novel preprocessing method, aiming to improve the efficiency of machine-learning-based SOH estimation, which included the relative state of charge and data processing, and the conversion of time-domain data into state-of-charge (SOC)-domain data. The results showed that their feature extraction method achieved better accuracy [22].

The above studies have made remarkable achievements in the SOH prediction of lithium-ion batteries. However, research on feature extraction is scarce, even though the extracted health factors of lithium-ion batteries strongly affect the accuracy of the prediction model. Thus, in this paper, we adopted DWT-CGTSSA to establish feature engineering and extract health factors, and predicted the SOH through the CatBoost prediction model, so as to accurately predict the SOH and RUL of lithium-ion batteries. Meanwhile, by predicting the SOH for different datasets, we verified the strong generalization capability of the feature extraction method and the prediction model proposed here.

2. Algorithm Principle

Regarding the prediction of battery life, we adopted the DWT-CGTSSA to extract the health factor features and constructed a CGTSSA-CatBoost model to predict the SOH. The detailed principle is shown in Figure 1.

The health factor extraction and the CGTSSA-CatBoost prediction model based on the DWT-CGTSSA are mainly composed of three parts: feature engineering construction, the CGTSSA-optimized CatBoost model, and SOH prediction. In feature engineering construction, DWT is used to decompose the voltage and current signals of charging and discharging into approximate signals and detailed signals. Through analyzing the signal features of signal amplitude factor and pulse factor, a 320-dimensional feature vector set is obtained. By eliminating the redundant features using the CGTSSA optimization algorithm, an 11-dimensional optimal feature subset is obtained. The CGTSSA-optimized CatBoost model adopts the CGTSSA to optimize the parameters of the CatBoost model and obtains the optimal CatBoost model through training. The obtained optimal CatBoost model predicts the SOH of lithium-ion batteries, outputs the prediction results, and conducts an evaluation of the model.

2.1. Discrete Wavelet Transform (DWT)

Wavelet transform performs better than Fourier transform in processing nonstationary sequence signals. By replacing the sine and cosine waves used in the Fourier decomposition with a set of decaying orthogonal bases, the abrupt and nonstationary parts of the signal can be better represented [23].

The data x[n] are passed through a half-band low-pass filter with impulse response h[n] using

x[n] * h[n] = \sum_{k = -\infty}^{\infty} x[k] \cdot h[n - k]

(1)

Following the Nyquist theorem, we downsampled the data by two: every other sample point is removed, so one-half of the sample points of the signal are retained and the scale is doubled. The high-pass filtering is performed for this half by

y [n] = \sum_{k = - \infty}^{\infty} h [k] \cdot x [2 n - k]

(2)

The signal x[n] is decomposed into α layers by wavelet transform. The low-frequency information and high-frequency information of layer α are expressed by

\begin{cases} x_{α, L}[n] = \sum_{k = 0}^{K - 1} x_{α - 1, L}[2n - k] \cdot g[k] \\ x_{α, H}[n] = \sum_{k = 0}^{K - 1} x_{α - 1, L}[2n - k] \cdot h[k] \end{cases}

(3)

where h[k] is a high-pass filter, g[k] is a low-pass filter, x_{α, L}[n] is the low-frequency information in layer α, and x_{α, H}[n] is the high-frequency information in layer α.
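The layer-by-layer decomposition in Equation (3) can be illustrated with a minimal sketch. It assumes the two-tap Haar wavelet (the text does not name the wavelet used) and shifts the index from 2n - k to 2n + 1 - k so that no index falls before the start of the signal; the function names and the sample signal are illustrative only.

```python
import math

# Haar analysis filters (an assumed wavelet; the text does not name one).
G = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass g[k]
H = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass h[k]


def dwt_level(x, g=G, h=H):
    """One layer: filter with g and h, keeping every second output sample.

    Uses index 2n + 1 - k instead of 2n - k (a one-sample shift, assumed
    here for boundary handling). A trailing odd sample is dropped.
    """
    K = len(g)
    n_out = len(x) // 2
    low = [sum(x[2 * n + 1 - k] * g[k] for k in range(K)) for n in range(n_out)]
    high = [sum(x[2 * n + 1 - k] * h[k] for k in range(K)) for n in range(n_out)]
    return low, high


def dwt(x, levels):
    """Repeatedly split the low-frequency branch, as in Eq. (3)."""
    details = []
    low = list(x)
    for _ in range(levels):
        low, high = dwt_level(low)
        details.append(high)
    return low, details


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # illustrative values
approx, details = dwt(signal, levels=2)
print(approx)   # approximate (low-frequency) signal of layer 2
print(details)  # detail (high-frequency) signals of layers 1 and 2
```

Because the Haar filters are orthonormal, the energy of the approximate and detail signals together equals the energy of the input, which is a convenient sanity check for this kind of sketch.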

2.2. CGTSSA

When initializing the population, the sparrow search algorithm uses the random generation method. The disadvantage of this method is that it will make the sparrow population unevenly distributed and affect the subsequent iterative optimization. By taking advantage of the randomness, ergodicity, and regularity of chaotic mapping, the tent sparrow search algorithm (TSSA) optimizes the position of individual sparrows by using tent maps in chaotic mapping to avoid falling into local optimization, thus improving the global search ability and optimization accuracy [24,25].

The expression of tent mapping is given by

Z_{i + 1} = \begin{cases} 2 Z_{i}, & 0 \leq Z_{i} \leq \frac{1}{2} \\ 2 (1 - Z_{i}), & \frac{1}{2} < Z_{i} \leq 1 \end{cases}

(4)

where Z_{i} is the initial value, and Z_{i + 1} is the value after tent mapping.

The specific steps of TSSA are as follows:

Step 1. Apply Equation (4) to generate a chaotic variable Z^{d} according to the initial particle X^{d};

Step 2. Transfer the chaotic variable carrier to the solution space of the problem to be solved as follows:

X_{new}^{d} = X_{\min}^{d} + (X_{\max}^{d} - X_{\min}^{d}) \times Z^{d}

(5)

where X_{\max}^{d} and X_{\min}^{d} are the maximum and minimum values of the d-dimensional variable X_{new}^{d}, respectively;

Step 3. Perform the chaotic disturbance to the individual according to Equation (6):

X_{new}^{'} = \frac{X + X_{new}}{2}

(6)

where X is the individual requiring chaotic disturbance, X_{new} is the generated chaotic disturbance, and X_{new}^{'} is the individual after chaotic disturbance.
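The three steps above can be sketched as follows. This is a minimal illustration of the tent map, the mapping into the solution space, and the averaging disturbance; the function names, the starting value 0.37, and the bounds are hypothetical and not taken from the paper.

```python
def tent_map(z):
    """One iteration of the tent map on [0, 1]."""
    return 2 * z if z <= 0.5 else 2 * (1 - z)


def chaotic_sequence(z0, length):
    """Step 1: iterate the tent map `length` times starting from z0."""
    seq = []
    z = z0
    for _ in range(length):
        z = tent_map(z)
        seq.append(z)
    return seq


def to_solution_space(z, x_min, x_max):
    """Step 2, Eq. (5): carry chaotic values into the bounds of each dimension."""
    return [lo + (hi - lo) * zd for zd, lo, hi in zip(z, x_min, x_max)]


def chaotic_disturbance(x, x_new):
    """Step 3, Eq. (6): average the individual with the chaotic candidate."""
    return [(a + b) / 2 for a, b in zip(x, x_new)]


# Hypothetical three-dimensional search space and individual.
x_min, x_max = [0.0, -5.0, 10.0], [1.0, 5.0, 20.0]
individual = [0.2, 1.0, 12.0]

z = chaotic_sequence(0.37, len(individual))
candidate = to_solution_space(z, x_min, x_max)
disturbed = chaotic_disturbance(individual, candidate)
print(disturbed)
```

One practical caveat: in double-precision arithmetic the factor-2 tent map tends to collapse to 0 after a few dozen iterations, so implementations often use short sequences or add a small random perturbation; whether the authors did so is not stated.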

During the late iterations of TSSA, the assimilation and mutation strategy among sparrows leads to a shorter search step, so the algorithm tends to become trapped in local optima. To avoid local optima and ensure the mutated sparrow has an ample step size in the later stage, we adopted the Cauchy–Gaussian mutation strategy to mutate the optimal individual position [26,27]. The specific iteration formula is expressed as

U_{best}^{t} = X_{best}^{t} [1 + λ_{1} \mathrm{Gauss}(0, σ^{2}) + λ_{2} \mathrm{Cauchy}(0, σ^{2})]

(7)

where U_{best}^{t} represents the position of the optimal individual after mutation, σ represents the standard deviation of the Cauchy–Gaussian mutation strategy, \mathrm{Gauss}(0, σ^{2}) is a random variable satisfying the Gaussian distribution, \mathrm{Cauchy}(0, σ^{2}) is a random variable satisfying the Cauchy distribution, and λ_{1} and λ_{2} are dynamic parameters adaptively adjusted with the number of iterations. As the number of iterations increases, λ_{1} gradually increases and λ_{2} gradually decreases, which helps the algorithm escape local optima and coordinates its local exploitation and global exploration capabilities during iteration.

The expression of σ is

σ = \begin{cases} 1, & f(X_{best}) < f(X_{i}) \\ \exp\left(\frac{f(X_{best}) - f(X_{i})}{|f(X_{best})|}\right), & \text{otherwise} \end{cases}

(8)

where f(X_{best}) is the fitness value of the best individual sparrow at present, and f(X_{i}) is the fitness value after variation.
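A minimal sketch of Equations (7) and (8) is given below. Several details are assumptions rather than the authors' settings: the schedule for λ_1 and λ_2 (a quadratic ramp chosen only because it makes λ_1 grow and λ_2 shrink, as the text describes), the use of σ as the scale of the Cauchy draw, and the use of a single random factor for all dimensions.

```python
import math
import random


def sigma(f_best, f_i):
    """Eq. (8). The second branch assumes f_best is not zero."""
    if f_best < f_i:
        return 1.0
    return math.exp((f_best - f_i) / abs(f_best))


def weights(t, t_max):
    """Assumed schedule: lambda1 grows and lambda2 shrinks with iteration t."""
    lam1 = (t / t_max) ** 2
    return lam1, 1.0 - lam1


def cauchy_gauss_mutation(x_best, f_best, f_i, t, t_max, rng=random):
    """Eq. (7): scale the best position by 1 + mixed Gaussian/Cauchy noise."""
    s = sigma(f_best, f_i)
    lam1, lam2 = weights(t, t_max)
    gauss = rng.gauss(0.0, s)
    # Cauchy sample via the inverse CDF; s is used as the scale (assumption).
    cauchy = s * math.tan(math.pi * (rng.random() - 0.5))
    factor = 1.0 + lam1 * gauss + lam2 * cauchy
    return [xd * factor for xd in x_best]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mutated = cauchy_gauss_mutation([1.0, 2.0], f_best=0.5, f_i=0.8, t=3, t_max=20, rng=rng)
print(mutated)
```

Early in the run the heavy-tailed Cauchy term dominates and produces occasional long jumps; later the Gaussian term dominates and the search becomes local, which matches the qualitative behaviour described above.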

2.3. CatBoost Model

CatBoost, whose name combines "categorical" and "boosting", is a gradient boosting decision tree (GBDT) algorithm that uses oblivious trees as base learners, with the advantages of fewer parameters, support for categorical variables, and high accuracy [28,29].

2.3.1. GBDT Algorithm

Ensemble learning mainly includes the bagging algorithm and the boosting algorithm. As the mainstream approach, the boosting algorithm can promote a weak learner to a strong learner. It relies on the framework of an ensemble algorithm, uses the errors of the previous round of weak learners to strengthen the next round of weak learners through iteration, and fuses multiple learners into a strong learner through a combination strategy.

Gradient boosting is a machine learning technique used for regression, classification, and ranking tasks. It belongs to the boosting algorithm family. The GBDT algorithm is an ensemble algorithm that combines CART trees with the gradient boosting algorithm [30].

The GBDT framework mainly has the following steps:

Step 1. Initialize the weak learner:

f_{0}(x) = \arg\min_{c} \sum_{i = 1}^{N} L(y_{i}, c)

(9)

where L(y_{i}, c) is the loss function, y_{i} is the ith label, and c is the constant that minimizes the squared loss function.

Step 2. Calculate the residual error:

r_{im} = - \left[\frac{\partial L(y_{i}, f(x_{i}))}{\partial f(x_{i})}\right]_{f(x) = f_{m - 1}(x)}

(10)

where M is the number of iterations (m = 1, 2, \dots, M), and N is the total number of samples (i = 1, 2, \dots, N).

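Step 3. Strengthen the next learner according to the residual of the last weak learner:

f_{m}(x) = f_{m - 1}(x) + \sum_{j = 1}^{J} γ_{jm} I(x \in R_{jm})

(11)

where R_{jm} is the jth leaf region of the mth tree, J is the number of leaves, γ_{jm} is the value fitted in that leaf, and I is the indicator function.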
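The steps above can be sketched for the squared loss L = (y - f)^2 / 2, for which the constant in Equation (9) is the mean of the labels and the negative gradient in Equation (10) is the plain residual y - f. The sketch uses two-leaf trees (stumps) on a single feature instead of full CART trees, and the capacity values are made up for illustration; it is not the authors' implementation.

```python
def fit_stump(x, r):
    """Fit a two-leaf regression tree to residuals r on one feature x.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= threshold]
        right = [ri for xi, ri in zip(x, r) if xi > threshold]
        g_left = sum(left) / len(left)      # optimal leaf value for squared loss
        g_right = sum(right) / len(right)
        sse = sum((ri - g_left) ** 2 for ri in left)
        sse += sum((ri - g_right) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, g_left, g_right)
    return best[1:]


def gbdt_fit(x, y, n_rounds):
    f0 = sum(y) / len(y)                    # Step 1: constant minimizing the loss
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]   # Step 2: negative gradient
        thr, g_left, g_right = fit_stump(x, r)
        stumps.append((thr, g_left, g_right))
        # Step 3: add the value of the leaf region each sample falls in
        pred = [pi + (g_left if xi <= thr else g_right) for xi, pi in zip(x, pred)]
    return f0, stumps


def gbdt_predict(model, xi):
    f0, stumps = model
    return f0 + sum(gl if xi <= thr else gr for thr, gl, gr in stumps)


# Illustrative data only: cycle index and a slowly fading capacity (Ah).
cycles = [1, 2, 3, 4, 5, 6, 7, 8]
capacity = [1.86, 1.85, 1.83, 1.80, 1.76, 1.71, 1.65, 1.58]
model = gbdt_fit(cycles, capacity, n_rounds=20)
print([round(gbdt_predict(model, c), 3) for c in cycles])
```

Each round fits only what the previous rounds left unexplained, which is the sense in which the next learner is strengthened by the residual of the last one.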

2.3.2. Principle of CatBoost Model

Similar to all standard gradient boosting algorithms, CatBoost builds a new tree to fit the gradient of the current model. However, all classical boosting algorithms suffer from overfitting caused by biased pointwise gradient estimation. To solve this problem, CatBoost makes several improvements to the classical gradient boosting algorithm [31].

Set D as the training set:

D = \{(X_{i}, Y_{i})\}, \quad i = 1, 2, \dots, n

(12)

where n is the number of samples, each sample is X_{i} = (x_{i}^{1}, x_{i}^{2}, \dots, x_{i}^{m}), x_{i}^{m} is the mth feature of sample i, and Y_{i} is the ith target value.

The feature conversion value of the CatBoost model is

\hat{x}_{i}^{k} = \frac{\sum_{j = 1}^{n} φ(x_{j}^{k} = x_{i}^{k}) Y_{j} + α p}{\sum_{j = 1}^{n} φ(x_{j}^{k} = x_{i}^{k}) + α}

(13)

where φ is the indicator function, p is the prior value, and α is the weight of the prior.
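As a worked illustration of Equation (13), the sketch below replaces each category value with a smoothed mean of the targets that share it. The choice of the global target mean as the prior p, the weight α = 1, and the toy data are assumptions made for the example.

```python
def target_statistic(feature, target, prior, weight):
    """Eq. (13): smoothed mean target for the category of each sample."""
    encoded = []
    for xi in feature:
        matches = [yj for xj, yj in zip(feature, target) if xj == xi]
        encoded.append((sum(matches) + weight * prior) / (len(matches) + weight))
    return encoded


feature = ["a", "b", "a", "c"]          # one categorical feature k
target = [1.0, 0.0, 3.0, 2.0]           # Y_j
prior = sum(target) / len(target)       # assumed choice of p: the global mean
encoded = target_statistic(feature, target, prior, weight=1.0)
print(encoded)
```

The sums here run over all samples, as Equation (13) is written. CatBoost's ordered variant restricts them to the samples that precede sample i in a random permutation, so that a sample's own target does not leak into its encoding.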

Using ordered boosting to replace the gradient estimation method in the traditional algorithm, CatBoost is capable of reducing the deviation of gradient estimation and improving the generalization capability of the model.

3. Experimental

3.1. Instrument and Equipment

The instruments used in the experiment include a battery tester, a thermostatic chamber, a host computer, and crocodile clips.

Using the battery tester, we conducted a cyclic charge–discharge experiment on the battery and recorded the relevant parameters, including voltage, current, power, and capacity, during the process. The experiment was conducted in a thermostatic chamber at a temperature of 15 °C.

An 18650 lithium-ion battery was taken as the experimental object, the detailed information of which is shown in Table 1.

3.2. Experimental Steps

The experimental process is shown in Figure 2.

The experimental equipment is shown in Figure 3.

As shown in Figure 3, the computer was used as the host computer; then, the host computer was connected to a Xinwei tester by using the LAN, and the battery was connected with the crocodile clip. The tester detected the voltage, current, power, and other data of the battery during charging and discharging.

3.3. Experimental Data

In this research, two datasets were used, i.e., dataset A and dataset B, to verify the generalization capability of the proposed feature extraction method and the prediction model. Taking dataset A as the research object, we established the prediction model and used B0005 as the training set to predict the SOH and RUL of B0006 and B0007. Taking dataset B as the verification object, we reestablished feature engineering for dataset B, developed the prediction model, and observed the prediction effect, so as to verify that the feature engineering proposed in the paper has a strong generalization capability.

Dataset A is from the first batch dataset of NASA's Prognostics Center of Excellence (PCoE). For the B0005, B0006, and B0007 batteries, the voltage, current, and temperature were recorded during cyclic charging and discharging. The experimental process is as follows: First, the battery was charged at a constant current of 1.5 A until the voltage reached 4.2 V, followed by constant-voltage charging until the charging current was less than 20 mA; then, the battery was discharged at a constant current of 2.0 A until the voltage was less than 2.5 V. According to the above process, the battery was charged and discharged for 168 cycles. When conducting the prediction experiment for the battery SOH, the detection parameters (voltage, current, and power) of dataset A were taken as independent variables in the prediction model, and the total discharge capacity (Ah) of each cycle was taken as the prediction object of the prediction model.

Taking the 18650 battery as the research object, dataset B comprises the data measured in the experiment according to the experimental process in Section 3.2. A total of 148 cycles of battery parameters were obtained. The total discharge capacity (mAh) of each cycle was taken as the prediction object of the prediction model.

4. Extraction of Health Factor Based on DWT-CGTSSA

The charging and discharging time of lithium-ion batteries changes with the aging of the battery during the process of cycle charging, resulting in inconsistent time steps of the parameters measured in each cycle. Taking the charging current of the B0005 battery in dataset A as an example, the charging current cycle curves of different cycles are shown in Figure 4.

It can be seen from Figure 4 that the charging time differed from cycle to cycle, and so did the length of the sequence. As the curve changed with the number of cycles, some characteristics of the curve followed a trend similar to that of the capacity. The sampling frequency of the first 31 cycles was obviously inconsistent with that of the following cycles, and the time length of the curve presented a cliff-like change at cycle 31. Dataset A contained 10 groups of curves of this kind, and dataset B contained 6 groups. Therefore, these data should be subjected to feature extraction before being input into the prediction model.

4.1. Extraction of Health Factor Based on DWT

Through decomposing the signal into approximate signal and detail signal using DWT, we extracted the curve features, captured the waveform information, calculated the amplitude factor, waveform factor, pulse factor, and margin factor as the health factors for SOH prediction, and finally obtained the feature vector set.

The extracted information is shown in Table 2.

According to Table 2, the approximate signal and detail signal of each charging and discharging parameter had 16 kinds of feature information, with a total of 32 kinds of features. Dataset A generated 10 sets of signal curves for each charging and discharging cycle. Therefore, a 320-dimensional feature vector set was obtained for dataset A in the experiment.
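The feature count follows from 16 features × 2 sub-signals (approximate and detail) × 10 curves = 320. As a minimal sketch of four of these features, the code below computes the amplitude, waveform, pulse, and margin factors using the definitions common in signal analysis (crest, shape, impulse, and clearance factors); since Table 2 is not reproduced here, the exact definitions used by the authors are an assumption.

```python
import math


def shape_factors(signal):
    """Dimensionless waveform descriptors of one sub-signal.

    Assumes the signal is not identically zero.
    """
    n = len(signal)
    abs_vals = [abs(v) for v in signal]
    peak = max(abs_vals)
    rms = math.sqrt(sum(v * v for v in signal) / n)
    mean_abs = sum(abs_vals) / n
    root_amp = (sum(math.sqrt(a) for a in abs_vals) / n) ** 2
    return {
        "amplitude_factor": peak / rms,      # crest factor
        "waveform_factor": rms / mean_abs,   # shape factor
        "pulse_factor": peak / mean_abs,     # impulse factor
        "margin_factor": peak / root_amp,    # clearance factor
    }


square_wave = shape_factors([1.0, -1.0, 1.0, -1.0])
spike = shape_factors([0.0, 2.0])
print(square_wave)
print(spike)
```

All four factors equal 1 for a square wave and grow as the signal becomes more spiky, so they describe the shape of a curve independently of its length, which is what makes them usable on cycles of different duration.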

4.2. Selection of Health Factors Based on CGTSSA

As mentioned in Section 4.1, we finally obtained a 320-dimensional feature vector of dataset A. Excess features increase the complexity of the model and cause overfitting. Therefore, it is necessary to remove irrelevant and redundant features.

The CGTSSA was adopted to encode and search for the optimal feature subset, thereby selecting the health factors. First, the 320-dimensional feature vector set was represented by a variable with a length of 320, each element of which takes the value 0 or 1, where 0 represents an unselected feature, and 1 represents a selected feature. Next, the data of the B0005 battery in dataset A were divided into a training set and a verification set: the first 100 cycles were set as training samples, and cycles 101 to 168 were set as verification samples. The MSE value on the verification set was taken as the fitness value, which defines the fitness function. Finally, taking the 320-dimensional binary vector as the independent variable, the CatBoost model as the prediction model, the MSE value as the fitness function value, and the CGTSSA as the optimization algorithm, we constructed a binary CGTSSA optimization algorithm to find the optimal feature subset. The fitness function curve of the health factors is shown in Figure 5.
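The 0/1 encoding and the fitness evaluation can be sketched as below. This is only an illustration of the wrapper idea: a nearest-neighbour regressor stands in for CatBoost, the 0.5 threshold used to binarize a continuous search position is an assumption, and the two-feature toy data replace the 320-dimensional feature set.

```python
def binarize(position, threshold=0.5):
    """Turn a continuous search position into a 0/1 feature mask (assumed rule)."""
    return [1 if p > threshold else 0 for p in position]


def select(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, bit in zip(row, mask) if bit] for row in rows]


def nearest_neighbour_predict(train_x, train_y, query):
    """Stand-in regressor (NOT CatBoost): copy the target of the closest row."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in train_x]
    return train_y[dists.index(min(dists))]


def fitness(mask, x, y, n_train):
    """Verification-set MSE of a model trained on the masked features."""
    if not any(mask):
        return float("inf")          # an empty subset cannot be evaluated
    xs = select(x, mask)
    train_x, train_y = xs[:n_train], y[:n_train]
    errors = [
        (nearest_neighbour_predict(train_x, train_y, q) - t) ** 2
        for q, t in zip(xs[n_train:], y[n_train:])
    ]
    return sum(errors) / len(errors)


# Toy data: column 0 is informative (y equals it), column 1 is noise.
x = [[0.0, 0.9], [1.0, 0.1], [2.0, 0.5], [3.0, 0.3],
     [0.1, 0.3], [2.1, 0.9], [1.1, 0.5], [2.9, 0.1]]
y = [0.0, 1.0, 2.0, 3.0, 0.1, 2.1, 1.1, 2.9]

for mask in ([1, 0], [0, 1], [1, 1]):
    print(mask, fitness(mask, x, y, n_train=4))
```

An optimizer such as CGTSSA would then search over masks for the one with the lowest fitness; here the mask that keeps only the informative column scores far better than the one that keeps only the noise column.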

As can be seen from Figure 5, the 11-dimensional optimal feature subset was obtained after CGTSSA coding.

When the number of CGTSSA iterations reached 20, the fitness value no longer changed, and the optimal solution of CGTSSA was obtained. The MSE value reached 0.0298‰ in the verification set.

The Pearson correlation coefficient between the optimal feature subset and the battery capacity to be predicted is shown in Figure 6.

As shown in Figure 6, there was a strong correlation between some screened features and capacity.

5. Model Establishment and Verification

5.1. Comparison of CatBoost Model and Its Parameter Optimization Algorithm

For dataset A, we used B0005 as the training set, extracted the feature vector by adopting the proposed feature extraction method, trained the CatBoost model, and obtained the prediction model. Then, the model was used to predict the SOH and RUL of B0006 and B0007, so as to realize the prediction between different batteries of the same model under the same charging and discharging strategy. In this way, the complete cycle capacity of other batteries was predicted by a model trained on the data of an available battery.

Hyperparameter settings greatly affect the performance of the CatBoost model. If the parameter learning_rate is too small, the learning step of the model will be small, resulting in underfitting. If learning_rate is too large, the learning step will be large, and the model's convergence will deteriorate, resulting in overfitting. The regularization parameter L2_leaf_reg has a direct influence on the fitting capability of the model; a well-matched L2_leaf_reg value prevents the model from overfitting and enhances its generalization capability. The parameter bagging_temperature controls the intensity of Bayesian bagging, thereby affecting the performance of the model. The depth of the tree affects the prediction effect of the model: a deeper tree increases the expressive capability of the model but tends to cause overfitting [32]. An optimization algorithm was adopted to optimize the above four hyperparameters of the CatBoost model, thereby improving its performance and generalization capability.
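To connect these four hyperparameters with a population-based optimizer, each candidate position can be decoded into a parameter dictionary. The sketch below shows one way to do that; the search bounds are hypothetical (the ranges used in the experiments are not stated), and the regularization parameter is spelled l2_leaf_reg, as in the CatBoost Python package.

```python
# Hypothetical search bounds; the source does not list the ranges it used.
BOUNDS = {
    "learning_rate": (0.01, 0.3),
    "l2_leaf_reg": (1.0, 10.0),
    "bagging_temperature": (0.0, 1.0),
    "depth": (2, 10),
}


def decode(position):
    """Map a point in [0, 1]^4 to a CatBoost-style parameter dictionary."""
    params = {}
    for u, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        u = min(max(u, 0.0), 1.0)        # clip to the unit interval
        value = lo + (hi - lo) * u
        # Tree depth must be an integer; the other three are continuous.
        params[name] = int(round(value)) if name == "depth" else value
    return params


params = decode([0.5, 0.0, 1.0, 0.5])
print(params)
```

If the catboost package is installed, such a dictionary could be passed as keyword arguments to `CatBoostRegressor(**params)`, and the verification-set MSE of the fitted model would serve as the fitness of that position.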

The commonly used parameter optimization algorithms include particle swarm optimization (PSO) and the sparrow search algorithm (SSA). In this research, CGTSSA, SSA, and PSO were adopted to optimize the four hyperparameters of the CatBoost model, and a comparison between the algorithms was made. In the experiment, the three optimized models and the original CatBoost model were trained on the data of the B0005 battery, and then the trained models were used to predict the battery life of B0006 and B0007. The prediction results are shown in Figure 7.

As shown in Figure 7, the battery capacity dropping to 80% of rated capacity was considered a failure, and the failure thresholds were set to 1.38 Ah and 1.5 Ah for the B0006 and B0007 batteries, respectively.

It can be seen from Figure 7 that the CatBoost model with default parameters achieved the poorest prediction effect.

To comprehensively compare the performance of the above models, the sum of squared errors (SSE), root mean square error (RMSE), goodness of fit (R²), and mean square error (MSE) of the predicted capacity, together with the absolute error (AE) of the predicted remaining life of the batteries, were taken as the indicators of model performance.
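These indicators can be computed as in the following sketch. The SSE, MSE, RMSE, and R² formulas are the standard ones; the AE is interpreted here as the difference, in cycles, between the true and predicted cycle at which the capacity first falls below the failure threshold, which is an assumption because the text does not define it explicitly. The capacity values and the threshold are illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    n = len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    mse = sse / n
    return {"SSE": sse, "MSE": mse, "RMSE": math.sqrt(mse), "R2": 1.0 - sse / sst}


def failure_cycle(capacity, threshold):
    """1-based index of the first cycle whose capacity is below the threshold."""
    for cycle, c in enumerate(capacity, start=1):
        if c < threshold:
            return cycle
    return None   # the threshold is never crossed


def rul_absolute_error(cap_true, cap_pred, threshold):
    """Assumed AE definition; requires both curves to cross the threshold."""
    return abs(failure_cycle(cap_true, threshold) - failure_cycle(cap_pred, threshold))


cap_true = [2.0, 1.8, 1.6, 1.4, 1.2]   # illustrative capacities in Ah
cap_pred = [1.9, 1.8, 1.7, 1.5, 1.3]
metrics = regression_metrics(cap_true, cap_pred)
ae = rul_absolute_error(cap_true, cap_pred, threshold=1.45)
print(metrics, ae)
```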

The detailed comparison results of the models are shown in Table 3.

Combining the results in Table 3 and Figure 7, for the CGTSSA-CatBoost model, the AE was 0, R² was higher than 0.99, SSE was lower than 0.2, and MSE was lower than 1‰ in B0006 and B0007 batteries. Thus, the CGTSSA-CatBoost model showed the highest prediction accuracy and the best prediction effect among the above models.

5.2. Comparison with Other Models

With the development of machine learning, many prediction models are now available. The extreme learning machine (ELM) and support vector machine (SVM) are among the most classical and popular ones and play important roles in the SOH prediction of lithium-ion batteries; many studies in the literature use the ELM or SVM as the prediction model. The SVM model performs well on small-sample data [15,33]. Because the kernel function method overcomes the problems of the curse of dimensionality and nonlinear separability, the computational complexity does not increase when mapping to a high-dimensional space. The ELM is a simple and effective learning algorithm for single-hidden-layer feedforward neural networks (SLFNs), with the advantages of few training parameters and very fast speed.

To obtain the optimal prediction model, we selected the ELM, support vector machine (SVM), and CatBoost models to predict the preprocessed data and adopted CGTSSA to optimize the ELM, SVM, and CatBoost models. The performance of the models is compared, as shown in Figure 8.

The detailed indicators of each model are shown in Table 4.

Combined with the data in Table 4 and Figure 8, it was revealed that the CGTSSA-CatBoost model had the best performance in the B0006 and B0007 batteries. The prediction effect of the ELM model was slightly worse, and the prediction effect of the SVM model was better. When predicting the same battery, the performance of the SVM model was close to the CatBoost model, and the R² of the SVM model was higher than 0.99, indicating that the model has a good fitting effect. In terms of R², RMSE, MSE, and SSE indicators, the CatBoost model performed best, i.e., the CatBoost model had the best SOH prediction effect, followed by the SVM and ELM models. As regards the AE index, the AE value of the CatBoost model was 0, which was the lowest, indicating that the RUL prediction effect of the CatBoost model was the best. The RUL of the ELM model was the same as that of the SVM model on the B0006 battery, but the prediction effect of the SVM model was higher than that of the ELM model on the B0007 battery. Compared with the ELM and SVM models, the CatBoost model had better model performance and stronger generalization ability.

5.3. Verification of the Generalization Capability

To verify the generalization capability of the feature engineering and prediction model proposed in this paper, we re-extracted features for dataset B and performed SOH prediction based on the CGTSSA-CatBoost model. There were 148 cycle data in dataset B. The first 68 cycles were used in the training set, and the last 80 cycles were the test set. The above four models were used to predict the batteries’ capacity, with the prediction results shown in Figure 9.

Figure 9 shows that the CatBoost model with optimized hyperparameters had a better prediction performance than that with default parameters. The detailed evaluation indicators are shown in Table 5.

Figure 9 and Table 5 suggest that the CGTSSA-CatBoost model had the lowest MSE value and the best model performance in dataset B. With the feature engineering established in this paper, it also achieved good prediction results, with a value of RMSE < 5, and was capable of accurately predicting the SOH of the same battery.

In summary, the DWT-CGTSSA health factor extraction method and the CGTSSA-CatBoost prediction model proposed in this paper achieved better prediction accuracy on both datasets than the other state-of-the-art models and methods compared. The proposed model showed satisfactory generalization capability. The results in Figures 7 and 9 and Tables 3 and 5 show that the proposed method and model can predict the SOH of different batteries under the same charging and discharging strategy as well as under different working conditions. Satisfactory prediction accuracy was achieved, with a goodness of fit (R²) above 0.99 on both datasets and an MSE below 1‰ on dataset A.

6. Discussion

Most published studies use the first I cycles of a battery to predict its next II cycles (I + II = N, where N is the total number of cycles). In practical applications, however, it is preferable to train a model on one or more batteries and use it to predict the complete SOH curve of another battery of the same model under the same working conditions. This study found that predicting the SOH of several batteries with a model trained on one battery is feasible, and that some features are significantly correlated with the SOH of batteries of the same model under the same working conditions. Each charge and discharge of a battery produces a large volume of data, so mining information from these data is very important. In this paper, the feature engineering established by the DWT-CGTSSA effectively extracted the characteristics of the battery. The model built on these features predicted the SOH across different batteries, as demonstrated on dataset A, provided that the model, working conditions, and working environment (temperature, humidity, etc.) of these batteries were the same. Dataset B was used to test the generalization ability of the feature engineering, and it showed that prediction within a single battery also achieved good results.
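
The correlation between candidate features and the SOH mentioned above can be checked with the Pearson coefficient (as in Figure 6). The following Python sketch computes it for each feature column. The function name and the synthetic feature matrix are assumptions for illustration and do not reproduce the features selected in this paper.

```python
import numpy as np

def pearson_by_feature(features, capacity):
    """Pearson correlation between each feature column and the capacity."""
    features = np.asarray(features, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    fc = features - features.mean(axis=0)         # centered features
    cc = capacity - capacity.mean()               # centered capacity
    denom = np.sqrt((fc ** 2).sum(axis=0) * (cc ** 2).sum())
    return (fc * cc[:, None]).sum(axis=0) / denom

# Synthetic example: 50 cycles of linearly fading capacity and 3 features.
capacity = np.linspace(2.0, 1.4, 50)
features = np.column_stack([
    3.0 * capacity + 1.0,          # perfectly positively correlated
    -capacity,                     # perfectly negatively correlated
    (-1.0) ** np.arange(50),       # alternating signal, nearly uncorrelated
])

r = pearson_by_feature(features, capacity)
print(np.round(r, 3))
```

Features whose coefficient is close to +1 or −1 track the capacity fade closely, whereas a coefficient near 0 suggests that the feature carries little linear information about the SOH.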

In this paper, NASA data and NCM batteries tested in our laboratory were used. Their capacities differ, which suggests that their chemical materials are also slightly different. The proposed algorithm had a good prediction effect on both kinds of batteries, which indicates a strong generalization ability. In our subsequent work, we plan to apply the algorithm to lithium-ion batteries of various materials to observe whether it can be applied to the SOH prediction of lithium-ion batteries with different chemistries. We also plan to combine the algorithm with hardware by embedding it into a microcontroller to build a real-time detection system. It should be noted that the prediction model proposed in this paper requires batteries of the same model, under the same working conditions, and in the same environment. For different batteries, our algorithm needs to be applied again to re-establish the feature engineering.

Among other relatively recent results, the authors of [34] realized prediction within the same battery, whereas in this study, we completed prediction between different batteries, which makes our model more practical. In another study [16], the complete SOH curve of a battery was also predicted. Both this paper and [16] used NASA data, but the training set in [16] used three groups of batteries, whereas the training set in this paper used only one. Nevertheless, the MSE in this paper was lower when predicting the same battery, and the prediction effect was better. Therefore, the proposed model achieved better results with less training data.

7. Conclusions

In this paper, we adopted the DWT-CGTSSA to extract features, established the CGTSSA-CatBoost prediction model, and conducted experiments on different datasets to accurately predict the SOH and RUL of lithium-ion batteries. The conclusions are summarized as follows:

(1): By extracting features such as the amplitude factor and impulse factor of the signal after DWT decomposition and then screening these features with the CGTSSA, features better suited to the battery SOH prediction model could be obtained.
(2): With the established feature engineering, the CatBoost model and its optimized variants were used to predict the SOH of different batteries in dataset A, and good prediction results were obtained. Among them, the CGTSSA-CatBoost model had the best prediction effect, with an AE of 0, an MSE lower than 1‰, and an SSE lower than 0.2.
(3): Compared with the ELM and SVM models commonly used in battery SOH prediction, the CatBoost model performed better on multiple indicators such as MSE and RMSE. Its AE index was 0, so its RUL prediction error was also the lowest.
(4): The SOH prediction of the same battery in dataset B using the feature engineering and prediction model in this paper also achieved good results, with an RMSE of less than 5, showing that the proposed method has a strong generalization ability.
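
The feature extraction summarized in conclusion (1) can be sketched in Python as follows. This is a simplified illustration and not the implementation used in this paper: it assumes a Haar wavelet, three decomposition levels, and padding of odd-length bands by repeating the last sample, and it computes only the amplitude (crest) factor and impulse factor of each band using their standard definitions. All function names and the synthetic signals are hypothetical.

```python
import numpy as np

def haar_dwt(signal, levels):
    """Multi-level Haar DWT. Returns [approx_L, detail_L, ..., detail_1]."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size < 2:
            break
        if approx.size % 2:                       # odd length: repeat last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return [approx] + details[::-1]

def crest_factor(x):
    """Amplitude (crest) factor: peak absolute value divided by the RMS."""
    x = np.asarray(x, dtype=float)
    return float(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

def impulse_factor(x):
    """Impulse factor: peak absolute value divided by the mean absolute value."""
    x = np.asarray(x, dtype=float)
    return float(np.max(np.abs(x)) / np.mean(np.abs(x)))

def dwt_features(signal, levels=3):
    """Fixed-length feature vector: two factors per wavelet band."""
    feats = []
    for band in haar_dwt(signal, levels):
        feats.extend([crest_factor(band), impulse_factor(band)])
    return np.array(feats)

# Two synthetic "cycles" of different length (illustrative only).
short_cycle = 1.5 * np.exp(-np.linspace(0.0, 3.0, 300))
long_cycle = 1.5 * np.exp(-np.linspace(0.0, 3.0, 517))

f_short = dwt_features(short_cycle)
f_long = dwt_features(long_cycle)
print(f_short.shape, f_long.shape)
```

Both cycles yield a feature vector of the same length even though the raw signals differ in length, which is the property that makes band-wise statistics convenient when the recorded data length is inconsistent between cycles.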

In summary, the proposed feature engineering and prediction model can accurately predict not only the SOH of the same battery but also the SOH of different batteries of the same model under the same charging and discharging strategy. The latter is more meaningful in practice.

Author Contributions

Conceptualization, W.C.; data curation, M.Z.; methodology, W.C.; project administration, M.Z.; resources, M.Z. and J.Y.; validation, J.Y.; visualization, T.F.; writing—original draft preparation, W.C.; writing—review and editing, M.Z. and W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of the Higher Education Institute of Anhui Province (KJ2020A0309) and the National Natural Science Foundation of China (51874010).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The access URL for dataset A in the manuscript is: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/, accessed on 16 November 2021.

Acknowledgments

The authors would like to thank the Natural Science Foundation of the Higher Education Institute of Anhui Province and the National Natural Science Foundation of China for helpful discussions on topics related to this work.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Ali, M.U.; Zafar, A.; Nengroo, S.H.; Hussain, S.; Alvi, M.J.; Kim, H.J. Towards a Smarter Battery Management System for Electric Vehicle Applications: A Critical Review of Lithium-Ion Battery State of Charge Estimation. Energies 2019, 12, 446.
2. Zhang, R.F.; Xia, B.Z.; Li, B.H.; Cao, L.B.; Lai, Y.Z.; Zheng, W.W.; Wang, H.W.; Wang, W. State of the Art of Lithium-Ion Battery SOC Estimation for Electrical Vehicles. Energies 2018, 11, 1820.
3. Chen, Z.; Gu, Q.; Shen, S.; Shen, J.; Shu, X. Prediction of lithium ion battery health status based on health feature extraction and PSO-RBF neural network. J. Kunming Univ. Technol. Nat. Sci. Ed. 2020, 45, 92–103.
4. Zeng, X.Q.; Li, M.; Abd El-Hady, D.; Alshitari, W.; Al-Bogami, A.S.; Lu, J.; Amine, K. Commercialization of Lithium Battery Technologies for Electric Vehicles. Adv. Energy Mater. 2019, 9, 1900161.
5. Yao, L.; Fang, Z.P.; Xiao, Y.Q.; Hou, J.J.; Fu, Z.J. An Intelligent Fault Diagnosis Method for Lithium Battery Systems Based on Grid Search Support Vector Machine. Energy 2021, 214, 118866.
6. Liang, X.; Zhang, M.; Huang, G. Review on lithium-ion battery modeling methods based on BMS. Energy Storage Sci. Technol. 2020, 9, 1933–1939.
7. He, L.; Yang, Z.; Gu, Y.; Liu, C.; He, T.; Shin, K.G. SoH-Aware Reconfiguration in Battery Packs. IEEE Trans. Smart Grid 2018, 9, 3727–3735.
8. Ge, M.F.; Liu, Y.B.; Jiang, X.X.; Liu, J. A review on state of health estimations and remaining useful life prognostics of lithium-ion batteries. Measurement 2021, 174, 109057.
9. Shen, S.Q.; Liu, B.C.; Zhang, K.; Ci, S. Toward Fast and Accurate SOH Prediction for Lithium-Ion Batteries. IEEE Trans. Energy Convers. 2021, 36, 2036–2046.
10. Lin, C.P.; Cabrera, J.; Yu, D.Y.W.; Yang, F.; Tsui, K.L. SOH Estimation and SOC Recalibration of Lithium-Ion Battery with Incremental Capacity Analysis & Cubic Smoothing Spline. J. Electrochem. Soc. 2020, 167, 090537.
11. Jian, X.; Wei, J.; Wang, R. Remaining life prediction of lithium-ion batteries based on RPMDE-MKSVM. Control. Eng. 2021, 28, 665–671.
12. Wang, F.; Shi, Y.; Liu, B.; Zuo, Y.; Fu, Z.; Jamsher, A. Health state estimation of lithium-ion batteries based on attention augmented BiGRU. Energy Storage Sci. Technol. 2021, 10, 2326–2333.
13. Wang, Y.; Zhang, H.; Wang, X. Hybrid 1DCNN-LSTM model for predicting lithium ion battery state of health. Energy Storage Sci. Technol. 2022, 11, 240–245.
14. Feng, H.L.; Shi, G.L. SOH and RUL prediction of Li-ion batteries based on improved Gaussian process regression. J. Power Electron. 2021, 21, 1845–1854.
15. Cai, L.; Meng, J.H.; Stroe, D.I.; Peng, J.C.; Luo, G.Z.; Teodorescu, R. Multiobjective Optimization of Data-Driven Model for Lithium-Ion Battery SOH Estimation With Short-Term Feature. IEEE Trans. Power Electron. 2020, 35, 11855–11864.
16. Song, S.X.; Fei, C.; Xia, H.Y. Lithium-Ion Battery SOH Estimation Based on XGBoost Algorithm with Accuracy Correction. Energies 2020, 13, 812.
17. Li, L.; Li, S.; Li, J.; Sun, K.; Wang, Z.; Yang, H.; Gao, B.; Yang, S. RUL prediction of lithium-ion battery based on differential voltage and Elman neural network. Energy Storage Sci. Technol. 2021, 10, 2373–2384.
18. Park, M.S.; Lee, J.K.; Kim, B.W. SOH Estimation of Li-Ion Battery Using Discrete Wavelet Transform and Long Short-Term Memory Neural Network. Appl. Sci. 2022, 12, 3996.
19. Kaur, K.; Garg, A.; Cui, X.J.; Singh, S.; Panigrahi, B.K. Deep learning networks for capacity estimation for monitoring SOH of Li-ion batteries for electric vehicles. Int. J. Energy Res. 2021, 45, 3113–3128.
20. Khumprom, P.; Yodo, N. A Data-Driven Predictive Prognostic Model for Lithium-ion Batteries based on a Deep Learning Algorithm. Energies 2019, 12, 660.
21. Rossi, C.; Falcomer, C.; Biondani, L.; Pontara, D. Genetically Optimized Extended Kalman Filter for State of Health Estimation Based on Li-Ion Batteries Parameters. Energies 2022, 15, 3404.
22. Jo, S.; Jung, S.; Roh, T. Battery State-of-Health Estimation Using Machine Learning and Preprocessing with Relative State-of-Charge. Energies 2021, 14, 7206.
23. Bhavsar, K.; Vakharia, V.; Chaudhari, R.; Vora, J.; Pimenov, D.Y.; Giasin, K. A Comparative Study to Predict Bearing Degradation Using Discrete Wavelet Transform (DWT), Tabular Generative Adversarial Networks (TGAN) and Machine Learning Models. Machines 2022, 10, 176.
24. Thenmozhi, R.; Nasir, A.W.; Sonthi, V.K.; Avudaiappan, T.; Kadry, S.; Pin, K.; Nam, Y. An Improved Sparrow Search Algorithm for Node Localization in WSN. CMC-Comput. Mat. Contin. 2022, 71, 2037–2051.
25. Ouyang, C.T.; Qiu, Y.X.; Zhu, D.L. Adaptive Spiral Flying Sparrow Search Algorithm. Sci. Program. 2021, 2021, 6505253.
26. Wang, W.C.; Xu, L.; Chau, K.W.; Xu, D.M. Yin-Yang firefly algorithm based on dimensionally Cauchy mutation. Expert Syst. Appl. 2020, 150, 113216.
27. Yuan, J.H.; Zhao, Z.W.; Liu, Y.P.; He, B.L.; Wang, L.; Xie, B.B.; Gao, Y.L. DMPPT Control of Photovoltaic Microgrid Based on Improved Sparrow Search Algorithm. IEEE Access 2021, 9, 16623–16629.
28. Huang, G.M.; Wu, L.F.; Ma, X.; Zhang, W.Q.; Fan, J.L.; Yu, X.; Zeng, W.Z.; Zhou, H.M. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J. Hydrol. 2019, 574, 1029–1041.
29. Dhananjay, B.; Sivaraman, J. Analysis and classification of heart rate using CatBoost feature ranking model. Biomed. Signal Process. Control 2021, 68, 102610.
30. Fu, F.C.; Jiang, J.W.; Shao, Y.X.; Cui, B. An Experimental Evaluation of Large Scale GBDT Systems. Proc. VLDB Endow. 2019, 12, 1357–1370.
31. Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for big data: An interdisciplinary review. J. Big Data 2020, 7, 94.
32. Aggarwal, A.; Chakradar, M.; Bhatia, M.S.; Kumar, M.; Stephan, T.; Gupta, S.K.; Alsamhi, S.H.; Al-Dois, H. COVID-19 Risk Prediction for Diabetic Patients Using Fuzzy Inference System and Machine Learning Approaches. J. Healthc. Eng. 2022, 2022, 4096950.
33. Ge, D.D.; Zhang, Z.D.; Kong, X.D.; Wan, Z.P. Extreme Learning Machine Using Bat Optimization Algorithm for Estimating State of Health of Lithium-Ion Batteries. Appl. Sci. 2022, 12, 1398.
34. Li, Q.L.; Li, D.Z.; Zhao, K.; Wang, L.C.; Wang, K. State of health estimation of lithium-ion battery based on improved ant lion optimization and support vector regression. J. Energy Storage 2022, 50, 104215.

Figure 1. Multiscale health factor extraction and CGTSSA-CatBoost prediction model based on DWT.

Figure 2. Experimental process.

Figure 3. Experimental equipment. (a) Experimental equipment. (b) Equipment connection diagram.

Figure 4. Charging current curve of different cycles.

Figure 5. Fitness curve of CGTSSA screening health factors.

Figure 6. Pearson correlation coefficient between the optimal features and prediction of battery capacity.

Figure 7. Prediction results of CatBoost model and its optimization model.

Figure 8. Comparison results of multiple models.

Figure 9. Prediction results of dataset B.

Table 1. Detailed battery parameters.

Item	Specifications
Shell material	Nickel-plated steel
Nominal capacity	1300 mAh
Rated voltage	3.7 V
Charging voltage (Max)	4.2 V
Discharge cutoff voltage	2.7 V
Charging current (Max)	1 C₅A
Discharge current (Max)	3 C₅A
Internal resistance (Max at 1000 Hz)	≤25 mΩ
Charging method (CC/CV), standard	0.5 C₅A × 7.5 h
Charging method (CC/CV), fast	1 C₅A × 2.5 h
Working temperature, charging	0 °C~45 °C (32 °F~113 °F)
Working temperature, discharging	−15 °C~60 °C (5 °F~140 °F)
Working temperature, storage	−20 °C~60 °C (−4 °F~113 °F)

Table 2. Feature information.

Feature Value	Symbolic Representation	Feature Value	Symbolic Representation
Mean value	Mean	Mean absolute value	Mae
Standard deviation	Std	Kurtosis	Kur
Root mean square	Rms	Energy	En
Maximum value	Max	Amplitude factor	Cre
Minimum value	Min	Waveform factor	Sha
Peak value	Peak	Impulse factor	Imp
Skewness	Ske	Clearance factor	Cle
Variance	Var	Root amplitude	Root

Table 3. Detailed prediction results of various algorithms.

Data	Method	Actual Life	Predicted Life	AE	R²	RMSE	SSE	MSE (‰)
B0006	CGTSSA-CatBoost	113	113	0	0.9938	0.0268	0.1210	0.7202
	SSA-CatBoost		112	1	0.9918	0.0317	0.1689	1.0056
	PSO-CatBoost		110	3	0.9807	0.0531	0.4728	2.8144
	CatBoost		102	11	0.9561	0.0708	0.8418	5.0109
B0007	CGTSSA-CatBoost	126	126	0	0.9947	0.0118	0.0233	0.1386
	SSA-CatBoost		127	1	0.9926	0.0147	0.0362	0.2155
	PSO-CatBoost		128	2	0.9828	0.0263	0.1166	0.6941
	CatBoost		117	9	0.9412	0.0465	0.3631	2.1613

Table 4. Detailed prediction results of various models.

Data	Method	Actual Life	Predicted Life	AE	R²	RMSE	SSE	MSE (‰)
B0006	CGTSSA-CatBoost	113	113	0	0.9938	0.0268	0.1210	0.7202
	CGTSSA-SVM		110	3	0.9901	0.0362	0.2206	1.3130
	CGTSSA-ELM		110	3	0.9750	0.0579	0.5632	3.3523
B0007	CGTSSA-CatBoost	126	126	0	0.9947	0.0118	0.0233	0.1386
	CGTSSA-SVM		129	3	0.9937	0.0178	0.0532	0.3169
	CGTSSA-ELM		104	22	0.9798	0.0447	0.3356	1.9977

Table 5. Prediction results of various algorithms.

Method	R²	RMSE	SSE	MSE
CGTSSA-CatBoost	0.9962	1.1905	113.3773	1.4172
SSA-CatBoost	0.9961	1.8156	263.7207	3.2965
PSO-CatBoost	0.9958	2.1051	354.504	4.4313
CatBoost	0.9864	4.7566	1809.9964	22.625

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
