The Effect of Voltage Dataset Selection on the Accuracy of Entropy-Based Capacity Estimation Methods for Lithium-Ion Batteries



Introduction
Lithium-ion batteries have started to be considered for electric vehicles [1] and for various energy storage applications such as grid support, grid integration of renewables, and micro-grids [2–5]. One of the biggest concerns in these applications is the limited lifetime of the batteries, because their performance deteriorates with usage. A battery management system can efficiently handle the dynamics of the energy storage process in the battery and maintain optimal battery operation [6]. In order to improve the performance and prolong the battery's life, the battery management system needs accurate knowledge of the state of health (SOH), which is often expressed as the ratio between the present capacity and the nominal capacity [7]. As batteries are complex electrochemical systems, the capacity loss is difficult to measure in a timely manner, and accurate estimation of the battery capacity is still an open issue [7,8].
The most straightforward way to estimate the capacity is by measuring the charge transferred through the battery during charging and/or discharging [9]. However, the long time needed to measure the battery current and the requirement of high-precision current sensors decrease the practicality of this method in real applications. Based on electrochemical models or electrical models, state observers such as the multi-scale extended Kalman filter [10] and the multi-scale nonlinear predictive filter [11] have been designed for the joint estimation of battery state of charge (SOC) and capacity. Due to the complex internal principles and uncertain working conditions, it is difficult to establish a battery model that can accurately exhibit the dynamic characteristics of the battery. Data-driven methods do not need knowledge of the battery model and depend only on the training data. Moreover, with the development of big data technology, real-time monitored parameters such as voltage, current, and temperature can be stored and processed, which also decreases the requirements of the microcontroller and improves the estimation accuracy and robustness. Various data-driven methods have been used to predict the battery SOH, such as the support vector machine [8], relevance vector machine [12], neural network [13], and Gaussian process regression [14].
Since the information inside the features determines the effectiveness of a data-driven estimator, the features should contain plentiful information about the battery degradation [15]. The incremental capacity peak and valley and their corresponding voltage values in the incremental capacity curve were used as features in [16]. Differential geometric analysis [17] and importance sampling [18] were used to extract features from the terminal voltage of the constant current charging test. In [19], the cycling data (voltage, temperature, and time) were clustered into several groups, and the regression estimator was trained by a support vector machine based on the classification results.
Entropy is also used as a feature for capacity estimation because it can capture the variation of voltage, current, and temperature during the battery aging process. Li et al. predicted the battery capacity by combining a particle filter with the sample entropy (SE) feature of the voltage in the full discharging process [20]. In their later work [21], the SE of the charging temperature, the charging capacity, and the rest time were considered simultaneously, and the relationship between these multi-variable factors and the discharge capacity was established. In [22], a support vector machine and a relevance vector machine were used to estimate the SOH based on the SE of the voltage in the full discharging process. Compared with the full charging or full discharging test, the hybrid pulse power characterization test is more efficient for the entropy-based estimator because it takes only a few seconds [23,24]. A polynomial fitting method [23] and sparse Bayesian predictive modeling [24] were proposed, respectively, to establish the relationship between the SE and the capacity.
Although several entropy-based methods have been proposed for capacity estimation, the performance of these estimators has not been compared when different voltage datasets are used as input. Considering that the voltage response varies under different test conditions, the influence of the voltage datasets needs to be evaluated. In this paper, the hybrid pulse power characterization profile was divided into two single pulses (i.e., one for charging and another for discharging) that were performed on a LiFePO4/C battery cell at three SOC levels. Therefore, six test conditions (TCs) considering the current direction and the SOC were studied. Approximate entropy (AE), SE, and multiscale sample entropy (MSE) were chosen as features, and the relationship between them and the battery capacity was established based on a first-order polynomial fitting. The accuracy of these capacity estimation methods under different TCs was compared. The rest of this paper is organized as follows. Section 2 presents the experimental battery aging test and the particularities of the current pulse test applied to the battery. The theory of the AE, SE, and MSE algorithms is introduced in Section 3. Section 4 compares the accuracy of the capacity estimation considering the voltage datasets and the type of entropy. Finally, conclusions are drawn in Section 5.

Experimental Test
The parameters of the tested LiFePO4/C battery are listed in Table 1. Figure 1 shows the overall test procedure. The battery was aged using a primary frequency regulation mission profile for a period of 38 weeks. After each week of aging, the capacity of the battery was measured. A set of current pulse tests under different TCs was then performed to generate the voltage datasets. The detailed process of the experiment is presented in the following subsections. During the battery aging test and the performance check-up, the battery was placed in a climatic chamber with a temperature set point of 25 °C [25], and a sample period of one second was adopted.

Aging Test and Capacity Fade Curve Acquisition
A LiFePO4/C battery cell was aged with a mission profile from an energy storage system providing the primary frequency regulation service for the grid [25]. As seen in Figure 2, the mission profile had a length of one week, and the SOC varied from 10% to 90%. One full equivalent cycle (FEC) was defined as one full charge plus one full discharge. For the tested 2.5 Ah battery, 5 Ah went through the battery during each FEC. There was a total of 4560 FECs during the overall aging process. In order to determine the capacity of the LiFePO4/C battery during its lifespan, the capacity test was performed following each one-week aging test. The capacity test consisted of a full charging process using a constant current (1C-rate)-constant voltage (3.6 V) method and a full discharging process with a 1C-rate current. The cut-off voltages for charging and discharging were 3.6 V and 2 V, respectively. The capacity obtained during the discharging test was considered as the capacity value. As shown in Figure 3, the capacity decreased from 2.57 Ah to 2.14 Ah, and the aging test was stopped because the battery reached its end-of-life when the capacity faded by 20% [26].
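The FEC bookkeeping described above amounts to simple charge counting. The following is a minimal sketch, assuming a current record sampled at a fixed period; the function names `throughput_ah` and `full_equivalent_cycles` are hypothetical illustrations and are not taken from the original study.

```python
def throughput_ah(current_a, sample_period_s=1.0):
    """Charge throughput in Ah: sum of |I| * dt, converted from A*s to Ah."""
    return sum(abs(i) for i in current_a) * sample_period_s / 3600.0


def full_equivalent_cycles(charge_ah, nominal_capacity_ah=2.5):
    """One FEC is one full charge plus one full discharge (2 x nominal capacity)."""
    return charge_ah / (2.0 * nominal_capacity_ah)


# Illustrative profile (an assumption, not measured data):
# one hour at +2.5 A followed by one hour at -2.5 A, sampled every second.
profile = [2.5] * 3600 + [-2.5] * 3600
fec = full_equivalent_cycles(throughput_ah(profile))

# Total charge throughput implied by 4560 FECs on a 2.5 Ah cell.
total_throughput_ah = 4560 * 2 * 2.5
```

With the figures quoted in the text, 4560 FECs at 5 Ah each correspond to a total throughput of 22,800 Ah, or 120 FECs per week over the 38 weeks.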

Current Pulse Test
Two impact factors (i.e., the SOC and the direction of the pulse current) were considered in the current pulse test. Thus, a test matrix composed of six TCs was developed, as listed in Table 2. After each round of the capacity test, the LiFePO4/C battery was charged with a 1C-rate constant current to each of three pre-set SOC levels (i.e., 20% SOC, 50% SOC, and 80% SOC). At each SOC level, two single current pulses (a charging pulse and a discharging pulse) were applied to the battery cell. Both pulses were applied with a 4C-rate current (i.e., 10 A), and each of them lasted for 20 s. Between the pulses, a relaxation time of 15 min was allowed so that the cell could reach electrochemical stability. The current excitation and voltage response in the pulse test are shown in Figure 4. The voltage response was collected over the 20 s of a single pulse and the first 10 s of relaxation.
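The test matrix and the voltage window described above can be sketched as follows. This is only an illustration under stated assumptions: the names `test_conditions` and `pulse_window` are hypothetical, the index at which a pulse starts is assumed to be known, and the TC numbering of Table 2 is not reproduced here.

```python
from itertools import product

SOC_LEVELS = (0.2, 0.5, 0.8)          # pre-set SOC levels from the text
DIRECTIONS = ("charge", "discharge")  # direction of the 10 A pulse
PULSE_S = 20                          # pulse duration in seconds
RELAX_S = 10                          # part of the relaxation that is kept
SAMPLE_PERIOD_S = 1                   # sample period used in the experiment


def test_conditions():
    """All (SOC, direction) combinations; three levels x two directions."""
    return list(product(SOC_LEVELS, DIRECTIONS))


def pulse_window(voltage, pulse_start):
    """Return the samples covering one pulse plus the first 10 s of relaxation."""
    n = (PULSE_S + RELAX_S) // SAMPLE_PERIOD_S
    window = voltage[pulse_start:pulse_start + n]
    if len(window) != n:
        raise ValueError("record too short for a full pulse-plus-relaxation window")
    return window
```

At a one-second sample period, each window holds 30 voltage samples, which matches the series length N = 30 used later for the entropy calculation.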

Approximate Entropy Algorithm
AE was proposed to measure the probability of generating a new pattern in a signal [27]. The AE can be expressed as AE(m, r, N), where N is the total number of data points; r (a positive real number) is the tolerance for accepting matches; and m (a positive integer) is the dimension of the vectors. The AE algorithm is summarized as follows:

Step 1: For a given series {v(1), v(2), . . ., v(N)}, the N − m + 1 vectors $V_m(i)$ are formed as

$$V_m(i) = [v(i), v(i+1), \ldots, v(i+m-1)], \quad i = 1, 2, \ldots, N-m+1$$

Step 2: The distance between the vectors $V_m(i)$ and $V_m(j)$ is defined as the maximum absolute difference of their scalar elements:

$$d[V_m(i), V_m(j)] = \max_{0 \le k \le m-1} |v(i+k) - v(j+k)|$$

Step 3: The conditional probability $C_i^m(r)$ can be calculated as:

$$C_i^m(r) = \frac{W_m}{N-m+1}$$

where r is the given tolerance and, for the given i, $W_m$ is the number of vectors $V_m(j)$ for which $d[V_m(i), V_m(j)] \le r$.

Step 4: The expression of the probability of matching points is defined as:

$$\Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r)$$

Step 5: Similarly, m is replaced with m + 1, and step 1 through step 4 are repeated, so that the functions $C_i^{m+1}(r)$ and $\Phi^{m+1}(r)$ can be expressed as:

$$C_i^{m+1}(r) = \frac{W_{m+1}}{N-m}, \qquad \Phi^{m+1}(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} \ln C_i^{m+1}(r)$$

where $W_{m+1}$ is the number of vectors for which $d[V_{m+1}(i), V_{m+1}(j)] \le r$, and $\Phi^m(r)$ and $\Phi^{m+1}(r)$ characterize the probabilities that two sequences match for the m and m + 1 points, respectively.

Step 6: By fixing m and r, AE can be obtained as

$$AE(m, r) = \lim_{N \to \infty} \left[ \Phi^m(r) - \Phi^{m+1}(r) \right]$$

As the length of the data is always limited, the AE is then estimated by the statistic

$$AE(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r)$$
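The six steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study; the function name `approximate_entropy` is hypothetical, and the tolerance `r` is assumed to be given in the same units as the data.

```python
import numpy as np


def approximate_entropy(v, m, r):
    """Estimate AE(m, r, N) of a 1-D series as Phi^m(r) - Phi^(m+1)(r)."""
    v = np.asarray(v, dtype=float)
    n = v.size

    def phi(dim):
        n_vec = n - dim + 1
        # Step 1: rows are the template vectors V_dim(i).
        vectors = np.array([v[i:i + dim] for i in range(n_vec)])
        # Step 2: maximum absolute element-wise difference between all pairs.
        dist = np.max(np.abs(vectors[:, None, :] - vectors[None, :, :]), axis=2)
        # Step 3: fraction of vectors within tolerance r (self-match included,
        # so every fraction is positive and the logarithm is defined).
        c = np.sum(dist <= r, axis=1) / n_vec
        # Step 4: average of the natural logarithms.
        return np.mean(np.log(c))

    # Steps 5 and 6: repeat with dimension m + 1 and take the difference.
    return phi(m) - phi(m + 1)
```

A perfectly regular series yields a value close to zero, whereas an irregular series of the same length yields a larger value. Because each vector is compared with itself, the estimate is biased toward regularity for short records.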

Sample Entropy Algorithm
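SE is also a useful statistic for quantitatively analyzing the complexity and the degree of predictability of a time series [28]. It is the negative natural logarithm of the conditional probability (CP), and, unlike in the AE, vectors are not compared with themselves. The first two steps of the SE algorithm are the same as for the AE, and the remaining steps are summarized below. $B_i^m(r)$ and $A_i^m(r)$ are defined as:

$$B_i^m(r) = \frac{1}{N-m-1} W_m, \qquad A_i^m(r) = \frac{1}{N-m-1} W_{m+1}, \quad i = 1, 2, \ldots, N-m$$

where, for the given i, $W_m$ and $W_{m+1}$ now count the vectors with $j \ne i$ (j = 1, 2, . . ., N − m) for which $d[V_m(i), V_m(j)] \le r$ and $d[V_{m+1}(i), V_{m+1}(j)] \le r$, respectively. The expression of the probability of matching points is defined as:

$$B^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} B_i^m(r), \qquad A^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} A_i^m(r)$$

where $B^m(r)$ and $A^m(r)$ are the probabilities that two sequences match for the m and m + 1 points, respectively. The expression of SE is given as

$$SE(m, r, N) = -\ln \frac{A^m(r)}{B^m(r)}$$

The flow chart of the AE and SE algorithms is shown in Figure 5. Typically, the parameter m is suggested to be set at 2 or 3, and r is suggested to be set between 0.1 and 0.25 times the standard deviation of the data [27]. As these suggestions do not always give the best results for all kinds of datasets, the AE and SE algorithms can be tested using a range of parameter combinations (m = 2 and 3, r ranging from 0.1 to 0.3 times the standard deviation of the data) [29]. Then, the parameters can be chosen based on the minimization of the maximum sample entropy relative error [30]. In the strategy for the optimal selection of r, the standard approximation is used and its expression is

$$\sigma_{g(CP)} \approx |g'(CP)| \, \sigma_{CP} \qquad (14)$$

where $g(CP) = -\log(CP)$ and $\sigma_{CP}$ is the standard deviation of the CP estimate. Since $g'(CP) = -1/CP$, it follows that

$$\sigma_{g(CP)} = \frac{\sigma_{CP}}{CP}, \qquad \sigma_{g(SE)} = \frac{\sigma_{CP}}{-\log(CP) \, CP}$$

where $\sigma_{g(SE)}$ and $\sigma_{g(CP)}$ are the relative errors of the SE and CP estimates, respectively. The parameter r can be selected by minimizing the quantity

$$\max \left( \frac{\sigma_{CP}}{CP}, \; \frac{\sigma_{CP}}{-\log(CP) \, CP} \right)$$

Figure 5. Flowchart of the approximate entropy (blue) and the sample entropy (red) algorithm.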
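For comparison with the AE, the SE computation can be sketched as below. This is a minimal illustration following the common formulation in which N − m templates are used for both dimensions and self-matches are excluded; the function name `sample_entropy` is hypothetical, `r` is assumed to be an absolute tolerance in the units of the data, and returning infinity when no matches are found is a convention chosen here, not something stated in the text.

```python
import math

import numpy as np


def sample_entropy(v, m, r):
    """Estimate SE(m, r, N) = -ln(A / B) with self-matches excluded."""
    v = np.asarray(v, dtype=float)
    n_templates = v.size - m
    if n_templates < 2:
        raise ValueError("series too short for the chosen m")

    def count_matches(dim):
        # The same number of templates is used for dim = m and dim = m + 1.
        templates = np.array([v[i:i + dim] for i in range(n_templates)])
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Subtract the diagonal so that no vector is compared with itself.
        return int(np.sum(dist <= r)) - n_templates

    b = count_matches(m)       # matches of length m
    a = count_matches(m + 1)   # matches of length m + 1
    if a == 0 or b == 0:
        return math.inf        # undefined ratio, reported here as infinity
    return -math.log(a / b)
```

The normalizing factors of the probabilities cancel in the ratio, so only the match counts are needed. With N = 30 samples, as in the pulse windows of this study, the number of matches can be small, which is why the choice of r matters.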

Multiscale Sample Entropy Algorithm
The idea of coarse graining is introduced to calculate the multiscale entropy, which can describe the degree of irregularity of a complex time series [31,32]. A given time series {v(1), v(2), . . ., v(N)} is first divided into non-overlapping windows of length τ. Then, the average value of the data points inside each window is calculated. Thus, the consecutive coarse-grained time series {y(1), y(2), . . ., y(N/τ)} can be constructed according to Equation (18):

$$y(j) = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} v(i), \quad 1 \le j \le N/\tau \qquad (18)$$

where τ is the scale factor. The diagram of the coarse-graining procedure is shown in Figure 6.
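The coarse-graining step of Equation (18) can be sketched as follows. This is a minimal illustration; the function name `coarse_grain` is hypothetical, and trailing samples that do not fill a complete window are assumed to be dropped.

```python
import numpy as np


def coarse_grain(v, tau):
    """Average non-overlapping windows of length tau (the scale factor)."""
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    v = np.asarray(v, dtype=float)
    n_windows = v.size // tau
    # Drop any incomplete trailing window, then average each window.
    return v[:n_windows * tau].reshape(n_windows, tau).mean(axis=1)
```

The MSE at scale τ is then the SE of the coarse-grained series, and scale 1 leaves the series unchanged. For a 30-sample voltage window, scales 2 and 3 leave only 15 and 10 points, respectively, for the entropy calculation.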

Datasets Analysis
The six voltage datasets obtained during the current pulse tests are shown in Figure 7, and each dataset contained 32 voltage time series. Although the whole test lasted for 38 weeks, some voltage profiles were lost during the experiment, so six voltage series were excluded. Figure 8 shows the terminal voltage curve and its slope under the 1C-rate constant current full charging test. The slope of the voltage was evaluated over three interval lengths, where $V_{slope1}$, $V_{slope10}$, and $V_{slope20}$ are the slopes for the 1 s, 10 s, and 20 s intervals, respectively. The trend of the slope curve shows that the polarization of the battery cell is pronounced at the depleted state (polarization zone), whereas the battery voltage varies little in the mid-SOC range (flat zone) [33]. This is why the voltage datasets under TC2, TC5, and TC6 varied more rapidly than those under the other TCs.

Accuracy Comparison
An overview of the proposed analysis method is illustrated in Figure 9. The AE and SE were computed from the voltage datasets, and first-order polynomial fitting was chosen to explore the entropy-capacity mapping [23]. By comparing the accuracy of both estimators, the most suitable TC can be selected. In order to select the most suitable scale, three MSE-based estimators (scales 1, 2, and 3) were constructed based on the selected TC. The datasets were divided into a validating group and a training group, as shown in Figure 10; the training group data were used to establish the relationship between the battery capacity and the entropy, and the accuracy of the estimators was verified with the validating group data. Based on the optimal selection strategy of minimizing the maximum sample entropy relative error, the parameters m, r, and N of the entropy algorithms were taken as 3, 0.04, and 30, respectively. The accuracy of the capacity estimators was evaluated using the mean percentage error (MPE) and the root-mean-squared percentage error (RMSPE), and their expressions are

$$PE_i = \frac{|C_i - \hat{C}_i|}{C_i} \times 100\%, \qquad MPE = \frac{1}{N_T} \sum_{i=1}^{N_T} PE_i, \qquad RMSPE = \sqrt{\frac{1}{N_T} \sum_{i=1}^{N_T} PE_i^2} \qquad (21)$$

where $C_i$ is the reference capacity measured after each round of aging; $\hat{C}_i$ is the estimated capacity; $N_T$ is the total number of test points; and $PE_i$ is the percentage error of the ith test point.

It can be seen from Figure 11 that the AE and SE features under TC5 changed more markedly than those under the other TCs. Because the SOC under TC5 moves into the polarization zone during the pulse, the rapid change of the voltage response at a high SOC level leads to large differences in entropy. In contrast, the SOC becomes smaller under TC6 and moves away from the depleted state. Hence, TC6 cannot be used to estimate the capacity, and the results agreed with the previous analysis of the polarization characteristics in Section 4.1. The relationships between the AE/SE feature and the capacity were acquired based on the first-order polynomial fitting. As shown in Figure 12, the variation of entropy over the capacity was flat under all TCs except TC5, which leads to a large estimation error. Consequently, as shown in Figure 13, the percentage error under TC5 was lower than that under the other TCs over most of the cycling time.

The three-dimensional diagram of the MPE and RMSPE with respect to the two impact factors is shown in Figure 14. When the SOC was 0.2 or 0.5, the estimation errors were relatively high regardless of the current direction because the battery state was in the flat zone. The best accuracy for the AE-based and SE-based estimators was achieved under TC5 as a result of polarization. Similarly, a low SOC level (i.e., smaller than 0.1) can also lead to high accuracy as a result of the polarization; however, this case was not considered in the experiment. In addition, the AE-based and SE-based methods showed very similar estimation errors. Based on these results, it can be concluded that the combined action of the SOC level and the current direction has the main effect on the capacity estimation accuracy. High accuracy is achieved only if the battery state enters the polarization zone.

As the SE-based method under TC5 showed the best performance in terms of MPE and RMSPE, the MSE-based method was selected as a case study to analyze the effect of the scale on the capacity estimation accuracy. The voltage dataset measured under TC5 was used for the MSE feature calculation, with the scale taking the values 1, 2, and 3. It can be seen from Figure 15a that no linear relationship between the MSE and the cycling time existed when the scale was larger than one. As a result, as seen in Figure 15b, the slope of the first-order polynomial function was negative, which leads to a large estimation error. The estimation results in Figure 16 indicate that the percentage error of the MSE-based method decreased as the scale was reduced. Table 3 shows more clearly that the MSE-based method with scale 1 had the highest accuracy. In this case, the MPE and RMSPE were 1.67% and 1.95%, respectively. It can be concluded that the MSE feature is not suitable for battery capacity estimation, because enlarging the scale factor reduces the number of sample points and eventually causes a drop in the estimation accuracy.
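The estimation and evaluation chain described in this section (first-order fit on the training group, capacity estimation on the validating group, then MPE and RMSPE) can be sketched as below. This is a minimal illustration under stated assumptions: all function names are hypothetical, the percentage error is assumed to be taken as an absolute value, and the numbers in the usage lines are synthetic, not results from the study.

```python
import numpy as np


def fit_entropy_capacity(entropy_train, capacity_train):
    """First-order polynomial fit: capacity = slope * entropy + intercept."""
    slope, intercept = np.polyfit(entropy_train, capacity_train, deg=1)
    return slope, intercept


def estimate_capacity(entropy, slope, intercept):
    """Apply the fitted line to new entropy values."""
    return slope * np.asarray(entropy, dtype=float) + intercept


def percentage_errors(c_ref, c_est):
    """Absolute percentage error of each test point (assumed definition)."""
    c_ref = np.asarray(c_ref, dtype=float)
    c_est = np.asarray(c_est, dtype=float)
    return 100.0 * np.abs(c_ref - c_est) / c_ref


def mpe(pe):
    """Mean of the percentage errors."""
    return float(np.mean(pe))


def rmspe(pe):
    """Root of the mean squared percentage error."""
    return float(np.sqrt(np.mean(np.square(pe))))


# Synthetic example values, chosen only to exercise the functions.
slope, intercept = fit_entropy_capacity([0.1, 0.2, 0.3, 0.4], [2.5, 2.4, 2.3, 2.2])
c_hat = estimate_capacity([0.25, 0.35], slope, intercept)
pe = percentage_errors([2.30, 2.30], c_hat)
```

A nearly flat entropy-capacity relationship gives a poorly conditioned line in this scheme, because small entropy fluctuations then map to large capacity differences.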

Conclusions
The effect of voltage dataset selection on the accuracy of three entropy-based capacity estimation methods for a LiFePO4/C battery was studied in this work. Six types of voltage datasets were collected considering the current direction and the SOC level. A first-order polynomial fitting was utilized to establish the relationship between the entropy feature (i.e., AE, SE, and MSE) and the capacity.
The experimental results showed that the combined action of the SOC level and the current direction plays the main role in the capacity estimation accuracy. A high accuracy of the capacity estimation is achieved only if the battery SOC enters the polarization zone (SOC larger than 0.8 or smaller than 0.1). In addition, the AE-based and SE-based methods showed very similar estimation errors. In this work, MSE was not suitable as a feature because large scales lead to information loss and eventually cause a drop in the estimation accuracy; nevertheless, a higher voltage sampling rate (e.g., a sampling period of 100 ms) could improve the accuracy of this method.

Figure 2. State of charge variation in the aging mission profile and the current and voltage response in the zoomed part.

Figure 3. Capacity test of the tested battery cell.

Figure 4. Pulse current and voltage response in the current pulse test. (a) Charging; (b) Discharging.

Figure 6. The diagram of the coarse-graining procedure.

Figure 8. Terminal voltage and its slope curve under the 1C-rate constant current full charging test.

Figure 9. The overview of the proposed analysis method.

Figure 10. Schematic diagram of the validating dataset and training dataset.

Figure 11. Curve of the calculated entropy during the cycling of the tested battery cell. (a) Approximate entropy-based method; (b) Sample entropy-based method.

Figure 13. Estimation results under six test conditions. (a) Approximate entropy-based method; (b) Sample entropy-based method.

Figure 14. The diagram of the mean percentage error and root-mean-squared percentage error considering the state of charge level and current direction (minus one indicates the charging direction and plus one indicates the discharging direction). (a) Approximate entropy-based method; (b) Sample entropy-based method.

Figure 15. Curve of the multiscale sample entropy and their polynomial fitting results. (a) Curve of the calculated multiscale sample entropy during the cycling of the tested battery cell; (b) First-order polynomial fitting results between the calculated multiscale sample entropy and measured capacity.

Table 1. The datasheet of the LiFePO4/C battery.

Figure 1. The flowchart of the test schedules.

Table 2. Test matrix for obtaining the voltage dataset for training the entropy-based capacity estimator.