Capacity State-of-Health Estimation of Electric Vehicle Batteries Using Machine Learning and Impedance Measurements

Abstract: With the increasing adoption of electric vehicles (EVs) by the general public, extensive research is being conducted on Li-ion batteries, in which state-of-health (SoH) estimation plays a prominent role. Accurate knowledge of this parameter is essential for efficient and safe EV operation. In this work, machine-learning techniques are applied to estimate this parameter in EV applications and in diverse scenarios. After thoroughly analysing cell ageing in different storage conditions, a novel approach based on impedance data is developed for SoH estimation. A fully-connected feed-forward neural network (FC-FNN) is employed to estimate the battery's maximum available capacity from a small set of impedance measurements. The method was tested for estimation in long-term scenarios and for diverse degradation procedures with data from real EV batteries. High accuracy was obtained in all situations, with a mean absolute error as low as 0.9%. Thus, the proposed algorithm constitutes a powerful and viable solution for fast and accurate SoH estimation in real-world battery management systems.


Introduction
The world's mobility systems are experiencing a profound change as the market share of electric vehicles (EVs) increases for both personal and professional use. This has pushed forward the research on energy storage systems, where lithium-ion (Li-ion) batteries are the most widespread technology. Many different types of Li-ion chemistries exist depending on the choice of the positive and negative electrodes' materials, each of them showing different characteristics. In EV applications, the most common ones are lithium-nickel-manganese-cobalt (NMC) and lithium-nickel-cobalt-aluminium (NCA) [1].
In order to perform efficient energy and power control of an EV, as well as safe and reliable operation, an effective battery management system (BMS) must be implemented, which has state-of-health (SoH) estimation as one of its essential functions. This parameter is defined in this paper as the ratio of maximum available capacity at the current state to rated capacity, as in Equation (1).
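Equation (1) did not survive in this copy of the text. Based only on the verbal definition above, it can be sketched as follows; the symbol Q_rated for the rated capacity and the percentage scaling are assumptions made here (the latter because SoH is reported in percent later in the paper), while Q_max is the maximum available capacity at the current state.

```latex
\mathrm{SoH} = \frac{Q_{\max}}{Q_{\mathrm{rated}}} \times 100\,\%
```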
In recent years, many SoH estimation algorithms have been proposed based on different considerations. However, despite their great potential for solving complex modelling problems, a lot of possibilities for applying machine learning (ML) algorithms are yet to be explored.

State-of-the-Art Review
Over the last few years, many SoH estimation algorithms have been proposed based on a wide variety of factors and procedures. A comprehensive review of SoH estimation methods may be found in [2], considering both experimental and adaptive approaches, as well as some based on degradation-mechanism analysis. Another state-of-the-art review, focused solely on machine learning algorithms, can be found in [3]. A detailed analysis of the procedures and results presented in these works shows that there is still room for improvement in terms of both applicability and performance, and that the correlation between impedance and maximum available capacity has not been exploited yet. Using impedance as a health indicator has some interesting advantages over other methods: while approaches based on measurements taken while charging usually require a long time to complete, impedance can be measured in a very short time and without the need to charge the EV. This has been demonstrated by some recent works. In [4], the authors implemented two types of pseudo-random sequences, namely binary and ternary, to significantly reduce the execution time of the impedance measurements. A similar approach was applied in [5], but using the continuous Morse wavelet transform instead of the fast Fourier transform. Pseudo-random sequences were also employed in [6] with a focus on various implementation aspects in EVs. The authors of [7] proposed using machine learning to reconstruct the entire impedance spectrum based on a few multi-sine signal injection measurements, which also reduced the execution time significantly. In [8], a very comprehensive analysis of battery ageing in different storage conditions was performed on long-term data. Then, a model was derived to predict the variation in some of the equivalent-circuit model parameters based on storage data.
In [9], an SoH estimator was developed in the form of a support-vector regression algorithm which took as inputs the equivalent-resistance measurements at some specific state-of-charge levels.

Paper Contributions and Structure
In this paper, a new method for estimating the SoH of EV batteries is proposed by making use of machine learning and electrochemical impedance spectroscopy (EIS) data. The algorithm requires only a small number of impedance measurements at some fixed frequencies. This information is fed to a fully-connected feed-forward neural network (FC-FNN) to predict the cell's maximum available capacity. Thus, the contributions of this paper are:

1. Analysis of impedance measurements' potential as health indicator to predict a battery's maximum available capacity;
2. Improved flexibility, since the execution constraints are relaxed, e.g., the algorithm does not require the battery to be charging;
3. Highly-accurate estimations obtained with a limited number of measurements.
The remainder of the paper is organised as follows: the data set, the employed neural network and the estimation algorithm are described in Section 2. The experimental validation is outlined and the obtained results are presented in Section 3. Lastly, conclusions are drawn in Section 4.
The content of this paper is based on the work carried out by the corresponding author in relation to his master's thesis [10] and builds upon previous research by the same team [8,11].

Data Set Description
The data used throughout this work is composed of impedance and maximum capacity measurements taken from a set of BMW i3 battery modules, whose characteristics are presented in Table 1. The former is obtained by means of electrochemical impedance spectroscopy, while the latter is found with constant-current, constant-voltage (CCCV) charge cycles. Since in real-world EV applications batteries spend most of the time in an idle state, the ageing procedure employed for this research is focused on storage at different conditions in terms of state of charge (SoC) and temperature, as shown in Table 2. This arrangement allows for a good study of the impact of both parameters on capacity loss while covering a wide range of realistic operation conditions of EV batteries. The cells were taken out of storage once a month to conduct the reference performance tests. All experiments were performed at a constant temperature of 25°C. Impedance was measured in galvanostatic mode at 10%, 50% and 90% SoC at 50 points from 10 mHz to 10 kHz, while the charge and discharge C-rate was 0.2 C. Data spanning around 13 months were recorded. A more comprehensive description of the data set and the experiments can be found in [8,11].
For simplicity, only measurements corresponding to one of the SoC levels are employed for the degradation analysis and the development of the estimation algorithm. However, the impact of this factor on impedance must be assessed first. To that end, the measurements at all three charge levels are shown in Figure 1 for cell C1 at both pristine and aged stages. The plot shows that the cell's SoC during the measurements does not have a significant impact on EIS results, especially when compared to ageing. Thus, this parameter can be neglected when developing the estimation algorithm, although it will be considered in the testing stage.

Degradation Analysis
In order to analyse the relationship between maximum available capacity and cell impedance, the evolution of cell C1's impedance as the experiment progressed is depicted in the Nyquist plot in Figure 2 and the Bode plot in Figure 3. The connection between these two diagrams is that points in the upper-right zone of the Nyquist plot correspond to low-frequency measurements, while those in the lower-left area are from high-frequency ones.
In the Nyquist plot, a clear tendency of increasing real part with degradation can be observed, which occurs at all frequencies. The same behaviour can be noted from the Bode plot, which shows how the impedance's magnitude becomes larger with degradation, while its phase experiences a similar variation. The main consequence of this impedance increment would be higher losses when charging and discharging the battery and, thus, lower energy efficiency. This correlation between degradation and cell impedance is the fundamental argument for using these measurements as health indicators.
Based on the collected data, it is possible to analyse how cells stored at different temperatures and SoC levels deteriorate. Figure 4 shows the evolution of the cells' maximum available capacity over time; in its legend, the leftmost column represents the storage SoC and the rightmost one the storage temperature. When looking at data from cells aged at the same SoC level (50%), capacity loss is largest at the hottest temperature (45°C) and smallest at the coldest one (7°C). At the same time, when comparing cells stored at the same temperature (45°C), degradation was more significant for cells which were halfway charged (50%) than for those closer to the maximum (90%) and minimum (10%) levels, the latter showing the lowest capacity decay. There is also a considerable difference between the best and worst cases, which retained 59.8 Ah and 38.9 Ah after 13 months in storage. In terms of SoH, these correspond to 98% and 63.8%, respectively.
Figure 5 shows the Nyquist plot of cells which were stored at the same temperature (45°C) and three different SoC levels (10%, 50% and 90%). Data correspond to measurements at 50% SoC after 13 months in storage. It can be observed that the resistance increment is much more significant for cells stored at intermediate charge levels than for the ones which were almost depleted or full. Between the extremes, capacity loss was lower when the battery was stored at lower SoC. Similarly, Figure 6 shows the Nyquist plot of cells which were aged at the same SoC (50%) and multiple temperatures (7°C, 35°C, 40°C and 45°C). In this case, impedance evolution is more severe for those cells which were kept at hotter temperatures than for those stored at colder temperatures. These plots show that storage conditions play a major role in a battery's capacity and impedance degradation, with cold temperatures and extreme charge levels being the most beneficial ones.
Furthermore, they also show that, in general, capacity loss is linked to an increase in the impedance's real part at all frequencies or, in other words, a rightwards displacement of the Nyquist plots. This evolution will be exploited when designing the SoH estimation algorithm.

Fully-Connected Feed-Forward Neural Networks
For this work, a very simple fully-connected feed-forward neural network is chosen as an estimator. Although not the most powerful ML structure, FC-FNNs provide a good trade-off between accuracy, simplicity and low computational cost. This is due to the fact that, as will be shown throughout this section, they consist of simple linear operations (additions and multiplications) which can be carried out swiftly even in low-power processors. Furthermore, these can be parallelized to make the algorithm even more efficient. They are made up of several layers stacked one after another, each of them containing several individual neurons. Every neuron takes as inputs the outputs of all the neurons in the previous layer, which are multiplied by the corresponding weights and added to the bias term. Lastly, a non-linear activation function is applied to the result, such as the hyperbolic tangent or the rectified linear unit (ReLU). The operation carried out in each neuron is given by Equation (2), where f(α) is the non-linear activation function. A general diagram of the FC-FNN is shown in Figure 7, with the symbols as defined in Table 3.
The weights in the network are updated following the back-propagation algorithm with stochastic gradient descent. An Adam optimiser is used with mean-squared-error (MSE) as loss metric, along with early stopping and batch training to avoid over-fitting while training. The data split between training and validation is 70% and 30% of the samples, respectively.
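The per-neuron operation described above (weighted sum of the previous layer's outputs, plus a bias, followed by an activation function) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy weights and the choice of a linear output neuron are assumptions made here for illustration only.

```python
import math


def relu(alpha):
    """Rectified linear unit."""
    return max(0.0, alpha)


def identity(alpha):
    """Linear output (assumed here for the regression output neuron)."""
    return alpha


# The hyperbolic tangent mentioned in the text is available as math.tanh.
tanh = math.tanh


def dense_layer(inputs, weights, biases, activation):
    """Apply one fully-connected layer: f(sum_j w_j * x_j + b) per neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Weighted sum of all previous-layer outputs plus the bias term.
        alpha = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(alpha))
    return outputs


def forward(x, layers):
    """Propagate x through a list of (weights, biases, activation) layers."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


# Toy network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 linear output.
# All weights and biases are made-up illustration values.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]
y = forward([3.0, 1.0], toy_layers)  # hidden = [2.0, 1.0], output = [5.5]
```

Training (back-propagation with Adam, early stopping and the 70%/30% split) is deliberately left out of this sketch; only the inference path, which is what would run on a BMS processor, is shown.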
For the remainder of the paper, the nomenclature used for describing the FC-FNN structure will be FNN(n_1 − … − n_i, AF), where n_1 is the number of neurons in the first hidden layer, n_i is the number of neurons in the last one and AF denotes the activation function employed in all neurons.

Impedance-Based SoH Estimation Algorithm
The method proposed in this paper aims at predicting a cell's maximum available capacity based on a very limited set of impedance measurements. The choice of which frequencies to use is critical in order to guarantee good performance and applicability. Two criteria are fundamentally used for this purpose. First, large differences should exist between a pristine and an aged cell in terms of complex impedance value. Secondly, intermediate frequencies are preferred over extreme ones, since low-frequency measurements would greatly increase the duration of the process, and high-frequency ones could lead to larger errors caused by the inductive behaviour of the wiring. It is also desired to keep the number of points to a minimum so that the entire estimation process can be performed faster.
The variation of impedance throughout ageing can be easily quantified by computing the standard deviation of the impedance at each frequency over the entire duration of the experiment. High values of standard deviation indicate that differences are larger at that given frequency and, therefore, that the point is more relevant as a health indicator. Figure 8 shows this metric for cells aged at the same temperature (45°C), while Figure 9 does so for cells stored at the same SoC level (50%).
It is clear that relevant differences exist only in the real part of the impedance, since the imaginary part's standard deviation is quite low, no matter the ageing procedure. The largest variations correspond to the cell stored at 50% SoC when considering same-temperature experiments, while they are highest for the cell kept at 45°C in same-SoC ones. This is consistent with the previous degradation analysis, since these were the cells that experienced the largest capacity loss (see Figure 4) and the largest impedance change (see Figures 5 and 6) by the end of the experiment. Figures 8 and 9 also show that, as expected from Figures 5 and 6, impedance variation is not the same at all frequencies and, thus, some points are more relevant than others to exploit the phenomenon described throughout this paper.
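The relevance metric described above can be sketched in a few lines of Python. The sketch assumes that all spectra of one cell share the same frequency grid and uses the population standard deviation; the text does not state whether the population or the sample formula was used, and the function name and toy values are hypothetical.

```python
import statistics


def impedance_spread(spectra):
    """Per-frequency standard deviation of Re(Z) and Im(Z) over ageing.

    spectra: list of impedance spectra, one per reference test; each
    spectrum is a list of complex values on the same frequency grid.
    Returns (real_std, imag_std), two lists with one value per frequency.
    """
    n_freqs = len(spectra[0])
    real_std, imag_std = [], []
    for k in range(n_freqs):
        # Collect the k-th frequency point from every reference test.
        values = [spectrum[k] for spectrum in spectra]
        real_std.append(statistics.pstdev([z.real for z in values]))
        imag_std.append(statistics.pstdev([z.imag for z in values]))
    return real_std, imag_std


# Made-up example: three reference tests, two frequency points each.
toy_spectra = [
    [complex(1.0, -0.2), complex(0.8, -0.1)],
    [complex(2.0, -0.2), complex(1.0, -0.1)],
    [complex(3.0, -0.2), complex(1.2, -0.1)],
]
real_std, imag_std = impedance_spread(toy_spectra)
```

In this toy data the real part drifts with ageing while the imaginary part stays constant, so only `real_std` is non-zero, which mirrors the qualitative behaviour reported for Figures 8 and 9.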
Based on these considerations, the following frequencies were chosen as feature points: 10 Hz, 20 Hz, 50 Hz, 100 Hz, 200 Hz, 500 Hz and 1 kHz. The input vector to the FC-FNN is then built following Equation (3), while the output is the cell's estimated maximum available capacity Q_max.
$$\mathbf{x} = \left[\mathrm{Re}(Z_{10}),\ \mathrm{Re}(Z_{20}),\ \mathrm{Re}(Z_{50}),\ \mathrm{Re}(Z_{100}),\ \mathrm{Re}(Z_{200}),\ \mathrm{Re}(Z_{500}),\ \mathrm{Re}(Z_{1000}),\ \mathrm{Im}(Z_{10}),\ \mathrm{Im}(Z_{20}),\ \mathrm{Im}(Z_{50}),\ \mathrm{Im}(Z_{100}),\ \mathrm{Im}(Z_{200}),\ \mathrm{Im}(Z_{500}),\ \mathrm{Im}(Z_{1000})\right]^{T} \qquad (3)$$
It must be noted that these are approximated frequency values, since they do not perfectly match the ones present in the data set. Exactness is not a critical aspect in the proposed approach, as long as the frequencies used in the real application are the same ones as during training.
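As an illustration of Equation (3), the following sketch assembles the 14-element input vector from a measured spectrum. Picking the measured point nearest to each nominal frequency on a logarithmic scale is an assumption made here, since the text only says that the nominal values are approximate; the function name and the toy spectrum are hypothetical.

```python
import math

# Nominal feature frequencies named in the text, in Hz.
TARGET_FREQS_HZ = [10, 20, 50, 100, 200, 500, 1000]


def build_input_vector(freqs_hz, impedances, targets=TARGET_FREQS_HZ):
    """Return [Re(Z) at each target..., Im(Z) at each target...]."""
    selected = []
    for target in targets:
        # Nearest measured frequency on a log scale (assumed selection rule).
        idx = min(
            range(len(freqs_hz)),
            key=lambda i: abs(math.log10(freqs_hz[i]) - math.log10(target)),
        )
        selected.append(impedances[idx])
    # Real parts first, then imaginary parts, as in Equation (3).
    return [z.real for z in selected] + [z.imag for z in selected]


# Made-up spectrum whose grid does not exactly match the nominal values.
toy_freqs = [9.5, 21.0, 48.0, 105.0, 190.0, 520.0, 990.0, 5000.0]
toy_z = [complex(1.0 + 0.1 * k, -0.01 * k) for k in range(len(toy_freqs))]
x = build_input_vector(toy_freqs, toy_z)
```

Whatever selection rule is used, the same rule and the same frequencies must be applied at training time and in the real application, as noted above.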

Results
In order to evaluate the performance of the proposed algorithm, two different scenarios are designed. In the first one, the FC-FNN is trained with all the data from a set of cells (C1, C2, C3, C4) and then tested on data from units aged in different conditions (C5, C6). In the second one, the algorithm is trained with only a part of the entire time series (0-10 months) and then tested on the remaining part (11-13 months). The former aims at verifying whether recording data in a limited set of conditions is enough to make predictions no matter the ageing procedure, while the latter assesses the accuracy of the algorithm when facing deeper-degradation data. Additionally, to further investigate the applicability of the method, all three available SoC levels are tested.
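The two evaluation scenarios amount to two different ways of partitioning the same records. A minimal sketch is given below, assuming each record is a dictionary with hypothetical "cell" and "month" keys and that reference tests are numbered 0 to 13; the actual data layout is not described in the text.

```python
def split_by_cell(samples, train_cells, test_cells):
    """Scenario 1: train on some cells, test on cells aged differently."""
    train = [s for s in samples if s["cell"] in train_cells]
    test = [s for s in samples if s["cell"] in test_cells]
    return train, test


def split_by_month(samples, last_train_month):
    """Scenario 2: train on early months, test on the later ones."""
    train = [s for s in samples if s["month"] <= last_train_month]
    test = [s for s in samples if s["month"] > last_train_month]
    return train, test


# Placeholder records: six cells, one reference test per month (0..13).
toy_samples = [
    {"cell": f"C{c}", "month": m} for c in range(1, 7) for m in range(14)
]

train_1, test_1 = split_by_cell(
    toy_samples, {"C1", "C2", "C3", "C4"}, {"C5", "C6"}
)
train_2, test_2 = split_by_month(toy_samples, last_train_month=10)
```

In a real pipeline each record would also carry the input vector of Equation (3) and the measured capacity; they are omitted here to keep the partitioning logic in focus.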
The performance metrics used are the root mean squared error (RMSE), the mean absolute error (MAE) and the maximum absolute error (ME), as well as the residuals' mean (µ_res) and standard deviation (σ_res), all of them normalized. The formulas for these metrics are as given by Equations (4) and (5), where ŷ_i and y_i are the estimated and real output values of the i-th sample, respectively, and N is the total number of samples.
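Equations (4) and (5) are not reproduced in this copy, so the sketch below uses the standard textbook definitions of these metrics. Two points are assumptions: the residual is taken as estimate minus real value, and normalisation is done by dividing by a reference capacity passed as the hypothetical `scale` argument, since the text does not say what the metrics are normalized by.

```python
import math
import statistics


def error_metrics(y_true, y_pred, scale=1.0):
    """Normalised RMSE, MAE, ME and residual mean / standard deviation."""
    # Residual = estimate minus real value, divided by the reference scale.
    residuals = [(p - t) / scale for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    return {
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "me": max(abs(r) for r in residuals),
        "mu_res": sum(residuals) / n,
        "sigma_res": statistics.pstdev(residuals),
    }


# Made-up capacities in Ah, normalised by an arbitrary 50 Ah reference.
metrics = error_metrics(
    y_true=[60.0, 55.0, 50.0, 45.0],
    y_pred=[61.0, 54.0, 50.0, 47.0],
    scale=50.0,
)
```

With these definitions the identity RMSE² = µ_res² + σ_res² holds, which is a convenient sanity check when reading the result tables.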
Besides the error rates, the number of weights in each of the networks is given to provide some insight into the computational requirements of each of the models. Together with the total number of neurons in each network, these define the memory usage.
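For a fully-connected network, the number of weights follows directly from the layer sizes. The sketch below counts them for a structure written in the FNN(n_1 − … − n_i, AF) nomenclature; the 8-4 hidden structure in the example is hypothetical and is not one of the networks reported in the result tables, and a single output neuron with one bias per neuron is assumed.

```python
def count_parameters(n_inputs, hidden_sizes, n_outputs=1):
    """Return (weights, biases) of a fully-connected feed-forward network."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    # Every neuron is connected to all neurons of the previous layer.
    weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
    # One bias per hidden and output neuron (assumed).
    biases = sum(sizes[1:])
    return weights, biases


# 14 inputs as in Equation (3); hypothetical hidden structure 8-4.
n_weights, n_biases = count_parameters(14, [8, 4])  # -> (148, 13)
```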

SoC-Dependent Capacity Estimation
In this case, an independent estimator, i.e., a different FC-FNN, is trained and tested for each SoC level. The obtained results are presented in Tables 4-6 for 10%, 50% and 90% SoC, respectively. It can be observed that the algorithm performs very well in all cases, with similar error rates in both scenarios. The algorithm could be made even more accurate if more data were available or by adding more neurons and layers. This was not done in order to keep the number of parameters as low as possible.

SoC-Independent Capacity Estimation
In order to further test the proposed approach's capabilities, a final test is conducted in which data from all three available SoC levels are merged into a single data set, while keeping the input vector as defined in Equation (3). The same two scenarios as before are used. The goal is to verify whether the algorithm can work properly no matter the current SoC of the battery, since this parameter was shown to have little effect on impedance (see Section 2.1). Results are presented in Table 7. This shows that, even though the algorithm also works well when the influence of the SoC on impedance is neglected, accuracy is noticeably lower than in the SoC-dependent test cases.

Conclusions
The proposed algorithm showed great performance in challenging scenarios, combining very low estimation errors with low computational cost, since only seven complex impedance measurements and a very compact model were required. Despite these very good results, the limited amount of recorded data definitely had a negative impact on performance.
It has been shown that a relationship between impedance spectrum and maximum available capacity exists and can thus be exploited for SoH estimation. The use of ML algorithms simplified the modelling of this relationship, as these do not require comprehensive electrochemical knowledge of the inner operation principles of the battery. The method developed in this work showed very good accuracy with an MAE below 2% when facing new data from different ageing procedures, as well as when making long-term predictions. The proposed approach greatly improves applicability, since impedance measurements may be conducted at any time as long as the battery is at some specific SoC level and temperature, which reduces to only temperature if the combined method is used. For future work, more degradation data need to be collected from more batteries and conditions in order to enhance and evaluate performance even further.
Funding: This work has been part of the Adaptive Battery Diagnostic Tools for Lifetime Assessment of EV Batteries (BATNOSTIC) research and development project, project no. 64015-0611, and the Workshop Automated BAttery Tester (WABAT) project, project no. 64019-0056. The authors gratefully acknowledge EUDP Denmark for providing the financial support necessary for carrying out this work.

Data Availability Statement:
The data used in this paper was first described in [8], although the data used in the original work included only 11 months of ageing and measurements.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: BMS