State-of-Health Identification of Lithium-Ion Batteries Based on Nonlinear Frequency Response Analysis: First Steps with Machine Learning

Abstract: In this study, we show an effective data-driven identification of the State-of-Health of Lithium-ion batteries by Nonlinear Frequency Response Analysis. A degradation model based on support vector regression is derived from highly informative Nonlinear Frequency Response Analysis data sets. First, an ageing test of a Lithium-ion battery at 25 °C is presented and the impact of relevant ageing mechanisms on the nonlinear dynamics of the cells is analysed. A correlation measure is used to identify the most sensitive frequency range for ageing tests. Here, the mid-frequency range from 1 Hz to 100 Hz shows the strongest correlation to Lithium-ion battery degradation. The focus on the mid-frequency range leads to a dramatic reduction in measurement time of up to 92% compared to standard measurement protocols. Next, informative features are extracted and used to parametrise the support vector regression model for the State-of-Health degradation. The performance of the degradation model is validated with additional cells and validation data sets, respectively. We show that the degradation model accurately predicts the State-of-Health values. Validation data demonstrate the usefulness of the Nonlinear Frequency Response Analysis as an effective and fast State-of-Health identification method and as a versatile tool in the diagnosis of ageing of Lithium-ion batteries in general.


Introduction
Lithium-ion batteries (LIBs) are currently the most widely used type of battery for electromotive applications and are seen as the most promising candidate for the realisation of a comprehensive electro-mobility. Current research focuses on the optimisation of safety as well as ageing diagnostics and lifetime predictions. The cell design, its operation and the environmental conditions all affect the ageing of a cell and therefore the lifespan of LIBs [1,2]. Various LIB degradation processes, which lead to a distinct decrease of the maximal usable capacity C as well as an increase of the internal ohmic resistance R of the LIB, can be distinguished [1][2][3]. In the field of e-mobility, for instance, the LIB degradation results in a loss of driving range per charge in electric vehicles (EVs) [2] and therefore in the deterioration of the coulombic efficiency η of LIBs. Ageing processes occurring at the negative and the positive electrodes differ significantly and are usually differentiated in the literature [1]. Ageing processes in the electrolyte and the separator mostly result, respectively, from reactions with the electrodes and reactions at the electrode-electrolyte interface [1]. The most prominent ageing process at the negative electrode, the anode, is the formation of the solid electrolyte interface (SEI) [4] as well as Lithium plating [5,6]. At the positive electrode, the cathode, structural changes during cycling in combination with chemical decomposition reactions of the electrode material result in capacity decreases and therefore decreasing charging efficiency [1]. Furthermore, calendar and cyclic ageing of LIBs can be distinguished where both depend on environmental conditions [2,7]. Additionally, the State-of-Charge at which the battery is stored impacts calendar ageing [8]. Regarding cyclic ageing, the depth of discharge (DoD) also impacts battery degradation processes [9]. 
Moreover, manufacturing aspects, i.e., variations in raw materials and assembly accuracy, complete the picture of relevant ageing factors. Following a cause-and-effect principle, we summarised the main factors influencing battery degradation and their interactions in an Ishikawa diagram; see Figure 1. In this work, we do not aim to identify and analyse dedicated ageing mechanisms but focus on higher-level LIB diagnostics tailored for battery management systems (BMSs). Reliable and precise state diagnosis of LIBs, e.g., of the State-of-Health (SoH), the State-of-Charge (SoC), and the remaining useful life (RUL), is therefore essential for reducing safety risks and for estimating and extending the cycle life of battery systems used in EVs [8]. The accurate diagnosis of the SoH is an indispensable part of effective battery health management for every BMS [10]. The SoH is mostly characterised by the maximum usable residual capacity $C$ or by the relative change in the internal ohmic resistance $R$ [11]. Typically, a BMS quantifies the SoH from values such as the capacity decrease and the power fade caused by the increase of $R$, relative to the nominal values $C_0$ and $R_0$ [12].
For a non-destructive determination of the SoH and the RUL under operating conditions or at rest, electrochemical in situ measurement methods are mandatory. Besides monitoring open-circuit-voltage (OCV), the internal resistance R, determined via constant and pulsed current-voltage measurements, can be used to monitor the SoH [13,14]. Other approaches use the internal resistance R for the estimation of the SoH [15][16][17]. For the measurement of cell capacity, various in situ techniques are proposed in the literature, such as incremental capacity analysis (ICA) [18], differential voltage analysis (DVA) [19], combined IC-DV analysis techniques [20] as well as electrochemical impedance spectroscopy (EIS) [21]. Therefore, the use of ICA, DVA and EIS is suitable for SoH diagnosis as well as for the identification and differentiation of various ageing mechanisms [20,22]. These methods have specific advantages in SoH estimation. EIS has a shorter measurement time and can be used at particular frequencies for SoC. In contrast, ICA and DVA have the drawback of an extended test duration of approximately 10 h. Furthermore, these methods are not suitable for specific cell states [21]. However, ICA and DVA are superior to EIS in terms of their universal, model-independent applications and low-cost hardware implementations as well as their easy calculation [20]. Nevertheless, LIBs have to be in a steady-state operating condition to apply the previously mentioned diagnostic techniques. Another disadvantage of EIS is that cell processes with a nonlinear current-voltage relation are only slightly excited in EIS due to the low excitation amplitude. Hence, only the linear system behaviour is observed, and relevant nonlinear cell information is not accessible for LIB diagnostics and SoH identification. In general, an efficient BMS benefits from a fast, cheap and accurate on-board SoH diagnosis of the state and the cycle life of LIBs [10,23].
Recently, a novel dynamic analysis method, the Nonlinear Frequency Response Analysis (NFRA), was established and applied to LIBs to investigate cell behaviour when it is excited with a nonlinear signal [24][25][26]. Thus, the effect of ageing mechanisms on the measured cell response is enhanced, and NFRA-based data might be more suitable for reliable SoH identification [24]. The application of NFRA for SoH estimation and RUL prediction is summarised in Figure 2 using a SWOT matrix to assess the NFRA concept as well as its application in LIB diagnostics and SoH identification. The strengths and opportunities of NFRA in LIB diagnostics look promising and motivate our experimental study. The weaknesses and threats seem to be manageable, in particular, as NFRA is a mature tool used in the state diagnosis of non-electrochemical systems. In 1985, NFRA was used for the first time to determine the SoC of lead-acid batteries [27]. In the field of fuel cells, at the stack and cell level, NFRA was successfully applied for system monitoring and analysis [28][29][30], but it was also used in more general kinetic studies [31,32]. Over the last decade, data mining approaches have become popular for diagnosing and predicting the cell state of LIBs in BMSs. There are different approaches to unambiguously identify the complex and multi-scale ageing of LIBs and to predict cell state indicators. Common data-driven diagnostic algorithms use artificial neural networks as well as support vector machines (SVMs). Both methods have already been implemented for state identification and for determining the RUL of LIBs for simple test cases [33][34][35]. Since the accuracy of the data-driven processes depends critically on the data basis, one also attempts to increase the information content of the raw data with the help of first-principles models. Hybrid methods ideally combine the benefits of data mining with those of classical modelling principles [36,37]. 
Furthermore, they serve as the basis for data mining of electrochemical manipulated variables and measurements, such as current, voltage, internal resistance, maximum usable capacity and impedances [21,38]. To the authors' knowledge, diagnostic algorithms, which also take NFRA data for LIB diagnosis into account, are currently not available in the literature.
In this work, we analyse the applicability of NFRA as a diagnostic tool for LIBs and their SoH. To this end, an ageing test of LIBs at 25 °C is presented, and the impact of relevant ageing mechanisms on the nonlinear dynamics of the cells is explained in detail. Following a data mining strategy, an empirical degradation model is derived from informative NFRA data sets. First, highly diagnostic features are extracted and used to calibrate the degradation model for the NFR-driven SoH identification. Finally, the performance of the degradation model is validated with additional cells and validation data, respectively.

Methods
In this section, methods for SoH identification of LIBs are presented. First, NFRA as a dynamic ageing analysis method is introduced to gather informative data. Next, the Spearman rank correlation is used to determine the relevant frequencies related to LIB ageing and NFRA data, respectively. NFRA data limited to meaningful frequency ranges are processed to build the degradation model, which is implemented as a support vector regression (SVR) approach.

Nonlinear Frequency Response Analysis (NFRA)
With NFRA, a sinusoidal input signal with a high current amplitude $I_{AC}$ is applied to the system in a defined frequency range from mHz to kHz, and the resulting changes in the output voltage $U_{AC}$ are measured in the time domain. By applying a Fast Fourier Transformation (FFT), the voltage output signal $U_{AC}$ is transferred from the time domain to the frequency domain. However, not only the voltage response $Y_1$ at the input frequency but also higher harmonic responses $Y_n$ with $n \geq 2$ are detected. The higher harmonic amplitudes $Y_2$ to $Y_n$ are observed at multiples of the fundamental frequency $f_1$ of the sinusoidal input signal:

$$f_n = n \cdot f_1$$

Even harmonics $Y_{2n}$ and odd harmonics $Y_{2n+1}$ have different characteristic responses. We examined these LIB characteristics in our previous work by investigating individual higher harmonic responses $Y_n$ as well as the corresponding sum $\sum_{i=2}^{n} Y_i$ over the frequency [24]. In general, the values for Nonlinear Frequency Responses (NFRs) correspond to the sum of nonlinearities:

$$\mathrm{NFR} = \sum_{i=2}^{n} Y_i$$

The working principle of NFRA as well as the processing of the voltage output from the time domain to the frequency domain using an FFT is illustrated in Figure 3.
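The harmonic extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the measurement software used in the study: the signal is synthetic, the amplitudes, sampling rate and fundamental frequency are hypothetical, and a single-bin discrete Fourier transform is used in place of a full FFT (both give the same harmonic amplitudes when the record covers an integer number of periods).

```python
import cmath
import math


def harmonic_amplitudes(signal, sample_rate, f1, n_max):
    """Return the amplitudes Y_1..Y_n_max of a real signal at multiples of f1.

    One discrete Fourier sum is evaluated per harmonic. This matches the
    corresponding FFT bin when the record spans an integer number of periods.
    """
    n_samples = len(signal)
    amplitudes = []
    for n in range(1, n_max + 1):
        acc = 0j
        for k, u in enumerate(signal):
            t = k / sample_rate
            acc += u * cmath.exp(-2j * math.pi * n * f1 * t)
        # Factor 2/N converts the one-sided bin value into a sine amplitude.
        amplitudes.append(2.0 * abs(acc) / n_samples)
    return amplitudes


def nfr(amplitudes):
    """Sum of the higher harmonics Y_2..Y_n; the fundamental Y_1 is excluded."""
    return sum(amplitudes[1:])


# Hypothetical voltage response: fundamental plus two higher harmonics.
f1 = 10.0             # fundamental frequency in Hz (assumed)
sample_rate = 1000.0  # samples per second (assumed)
n_samples = 1000      # exactly 10 periods of f1
true_amps = [1.0, 0.2, 0.05]  # assumed amplitudes of Y_1, Y_2, Y_3

signal = [
    sum(a * math.sin(2 * math.pi * (i + 1) * f1 * k / sample_rate)
        for i, a in enumerate(true_amps))
    for k in range(n_samples)
]

Y = harmonic_amplitudes(signal, sample_rate, f1, n_max=3)
nfr_value = nfr(Y)
print(Y, nfr_value)
```

Repeating this evaluation for each excitation frequency $f_1$ yields one NFR value per frequency, i.e., an NFR spectrum.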

SoH Degradation Model Based on Machine Learning
In the literature, we can find various concepts for SoH degradation modelling, which can be classified as first-principles, empirical, and hybrid models. First-principles models translate electrochemical knowledge into equation systems but are generally difficult to derive. First, relevant processes have to be determined, associated model equations defined, and model parameters precisely identified. Thus, experimental data must be available to calibrate first-principles models. Alternatively, empirical models can be directly extracted from exhaustive data sets without any detailed knowledge of the underlying electrochemical processes. Please note that the quality of the empirical models depends critically on the available data and may fall short in extrapolation in the case of limited experimental data. Hybrid models, in turn, combine first-principles and data-driven concepts to use the benefits of both; i.e., no perfect system understanding is necessary and missing data are compensated for with mechanistic rules. In this study, research is focused on empirical models and data mining techniques. Besides clustering and classification, regression concepts are frequently used for correlated data that are represented by governing equations for their input-output relationships. Complex neural networks and deep learning principles might be good candidates for data-driven SoH identification. The motivation for the present study, however, was to demonstrate the practical relevance of NFRA in SoH identification as an interesting concept for the next generation of BMSs. Thus, we have consciously chosen a simple data-processing workflow: (1) frequencies for NFRA which are highly representative for cell ageing are identified via a correlation analysis; (2) informative features of the NFR data are calculated; (3) a sensitivity analysis reveals the most informative feature; and (4) based on this feature, an SVR model is derived for SoH identification. The overall workflow is summarised in Figure 4 and described in detail below.

[Figure 4: workflow diagram with the steps Correlation Analysis, Feature Extraction and Sensitivity Analysis]

Correlation Analysis
Correlation measures the strength of association between two variables, and its value varies between +1 and −1 for positive and negative relationships. A value of ± 1 indicates a perfect degree of association between the two variables. As the correlation coefficient decreases towards 0, the correlation between the two variables also declines.
In this study, the Spearman rank correlation [39] is used for testing the relationship between NFR and cycle number. The Spearman rank correlation is a non-parametric test that measures the degree of association between two variables. It does not need assumptions about the distribution of the data and is the appropriate correlation analysis when the values of one variable are monotonically related to the other variable. The Spearman rank correlation coefficient $\rho_S$ is calculated as:

$$\rho_S = 1 - \frac{6 \sum_{j=1}^{k} d_j^2}{k\,(k^2 - 1)}$$

where $d_j$ represents the difference between the ranks of the two variables for observation $j$, and $k$ represents the number of observations.
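As an illustration of the rank correlation, the following Python sketch evaluates the Spearman coefficient from rank differences $d_j$ and $k$ observations. The NFR values are hypothetical and chosen without ties, since the rank-difference formula in this simple form assumes untied data.

```python
def ranks(values):
    """Assign ranks 1..k to the values (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    result = [0] * len(values)
    for rank, j in enumerate(order, start=1):
        result[j] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation from squared rank differences d_j."""
    k = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (k * (k ** 2 - 1))


# Cycle numbers at which NFRA was measured (every 50th cycle, as in the text).
cycles = [0, 50, 100, 150, 200, 250, 300, 350, 400]

# Hypothetical NFR values at one mid frequency: strictly increasing with ageing.
nfr_mid = [1.00, 1.04, 1.09, 1.15, 1.22, 1.30, 1.39, 1.49, 1.60]

# Hypothetical NFR values at one high frequency: no clear trend.
nfr_high = [0.30, 0.33, 0.29, 0.35, 0.31, 0.28, 0.34, 0.32, 0.36]

rho_mid = spearman_rho(cycles, nfr_mid)
rho_high = spearman_rho(cycles, nfr_high)
print(rho_mid, rho_high)
```

Because only ranks enter the calculation, any strictly monotonic increase of NFR with cycle number gives a coefficient of exactly 1, regardless of whether the increase is linear.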

Feature Extraction and Sensitivity Analysis
To derive a data-driven SoH model, informative features are calculated from the NFRA data that cover the identified relevant frequency range. Various characteristics can be determined addressing geometrical, frequency, time-frequency, and statistical measures [40,41]. In this study, the geometrical features, y-axis intercept of the extrapolated NFR and the slope of the NFR spectra, are extracted for each ageing cycle. In detail, a simple linear regression analysis of the extracted NFR data in the sensitive frequency range is executed. Please note that the linear regression analysis is a low-cost and easy to implement approach, that is, an ideal method for the extraction of the relevant ageing features. In principle, additional features can be added, processed and translated to fewer but informative measures using principal component analysis and similar techniques [42,43]. Here, we directly select the most informative feature via a sensitivity study, that is, determining which feature changes the most with the cycle numbers. Only for this feature, an empirical SoH model is derived based on SVR.
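The feature extraction and the sensitivity study can be sketched as follows. This is a minimal illustration under stated assumptions: the spectra are hypothetical, the regression is carried out over the base-10 logarithm of the frequency (the text does not specify the axis scaling), and "changes the most" is interpreted here as the relative change between the first and the last cycle.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def relative_change(values):
    """Relative change of a feature between the first and the last cycle."""
    return abs(values[-1] - values[0]) / abs(values[0])


# Hypothetical NFR spectra in the sensitive frequency range, keyed by cycle.
freqs = [1.0, 10.0, 100.0]
log_f = [math.log10(f) for f in freqs]  # assumed regression axis
spectra = {
    0: [2.0, 1.5, 1.0],
    200: [2.6, 2.08, 1.56],
    400: [3.2, 2.67, 2.14],
}

cycles = sorted(spectra)
slopes, intercepts = [], []
for cycle in cycles:
    slope, intercept = linear_fit(log_f, spectra[cycle])
    slopes.append(slope)
    intercepts.append(intercept)

# Sensitivity study: pick the feature with the largest relative change.
sensitivity = {
    "intercept": relative_change(intercepts),
    "slope": relative_change(slopes),
}
most_sensitive = max(sensitivity, key=sensitivity.get)
print(sensitivity, most_sensitive)
```

With these assumed spectra, the intercept changes far more than the slope, so it would be selected as the single input feature of the degradation model.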

Support Vector Regression
SVMs can be used for classification and regression [44,45]. In the case of regression, the method is termed support vector regression (SVR) and is closely related to statistical learning. To build a regression model, a training data set $T$ is required:

$$T = \{(x_k, y_k)\}_{k=1}^{K}$$

where $X = \{x_1, \ldots, x_K\}$ are the input and $Y = \{y_1, \ldots, y_K\}$ the output sets. The goal of the SVR is to approximate the data set via a regression model $F$ whose coefficients $b$ and $a_l$, $\forall l = 1, \ldots, M$, are unknown and have to be identified using the training data set $T$. Please note that the SVR was originally developed for linear regression problems but can be easily extended to nonlinear regression problems by means of the so-called kernel trick [46]. Moreover, empirical models based on SVR are robust to outliers, that is, data that are corrupted by large random errors or offsets. In addition, the SVR only considers those coefficients $a_l$ that are relevant and sets non-relevant coefficients to zero, which simplifies the model building and ensures well-posed identification problems in the case of limited data, that is, $K < M$. Thus, outliers and limited data have less impact on SVR models compared to ordinary regression techniques. For the technical details of SVR, the interested reader is referred to [46][47][48] and references therein. In the present study, frequency-dependent NFR data sets recorded during cycle ageing are included in the training data sets. Technically, we used the following SVR setting within the R statistical computing environment including the e1071 library: a radial basis kernel, an epsilon value of 0.1 in the insensitive-loss function, and a C-constant of 8 for the regularisation term in the Lagrange formulation.
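To make the roles of the reported settings more concrete, the following Python sketch spells out the radial basis kernel, the epsilon-insensitive loss with epsilon = 0.1, and the way the constant C = 8 weights the data term. It is not the e1071 implementation and does not train a model; the kernel width `gamma`, the support points, the coefficients and the offset are hypothetical values chosen only for illustration.

```python
import math

EPSILON = 0.1  # width of the insensitive zone, as reported in the text
C = 8.0        # regularisation constant, as reported in the text


def rbf_kernel(u, v, gamma):
    """Radial basis kernel exp(-gamma * (u - v)^2) for scalar inputs."""
    return math.exp(-gamma * (u - v) ** 2)


def eps_insensitive_loss(y_true, y_pred, epsilon=EPSILON):
    """Deviations smaller than epsilon are ignored; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def data_term(y_true, y_pred, c=C, epsilon=EPSILON):
    """C-weighted sum of epsilon-insensitive losses over all samples."""
    return c * sum(eps_insensitive_loss(t, p, epsilon)
                   for t, p in zip(y_true, y_pred))


def svr_predict(x, support_x, coeffs, b, gamma):
    """Kernel expansion: offset b plus weighted kernels around support points."""
    return b + sum(a * rbf_kernel(x, xs, gamma)
                   for a, xs in zip(coeffs, support_x))


# Hypothetical model: feature value (input) mapped to SoH in percent (output).
support_x = [2.0, 2.6, 3.2]  # assumed support points
coeffs = [5.0, 0.0, -5.0]    # assumed coefficients; zero entries drop out
b = 90.0                     # assumed offset
gamma = 1.0                  # assumed kernel width (not given in the text)

prediction = svr_predict(2.0, support_x, coeffs, b, gamma)
penalty = data_term([1.0, 1.0], [1.05, 1.3])
print(prediction, penalty)
```

A larger C penalises deviations outside the epsilon zone more strongly, while a larger epsilon tolerates more noise and typically leaves fewer non-zero coefficients.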

Measurements and Cells
LIBs in the pouch format with nickel-manganese-cobalt (NMC) as the cathode and graphite as the anode material were analysed in this ageing study. Electrode manufacturing (Figure 5a

Cycle ageing experiments at 25 °C were conducted by charging with 1 C constant current (CC)/constant voltage (CV) and discharging with 1 C CC in a potential window between 2.9 V and 4.2 V. NFRA was measured with a Zahner Electrochemical Workstation (Zennium) in the galvanostatic mode in a temperature chamber, also at a constant environmental temperature of 25 °C, prior to ageing and after each 50th cycle. Detailed measurement settings are given in [24], and cell specifications are summarised in Table 1. Typically, the SoH of an LIB is estimated by dividing the actual maximum residual capacity $C_i$ of the $i$-th cycle by the nominal capacity $C_0$:

$$\mathrm{SoH}_i = \frac{C_i}{C_0} \cdot 100\%$$

In the proposed case study, the SoH is calculated using highly informative NFR training data sets and is compared to the state-of-the-art capacity-based SoH estimation. This is motivated by the fact that we observed a monotonic correlation of NFR with capacity loss and increased resistance $R$ [24], which suggests that the SoH can be evaluated directly via NFRA.
The extrapolated NFR at the y-axis intercept is evaluated by linear regression analysis for all cycles $i$. The resulting value $\mathrm{NFR}_{\min,i}$ is normalised with the extrapolated NFR at the y-axis intercept prior to ageing, $\mathrm{NFR}_{\min,0}$, and the $\mathrm{SoH}_{\mathrm{NFR}}$ measure is defined on the basis of this normalised value. Please note that with decreasing capacity $C$ and increasing internal resistance $R$, the sum of the higher harmonic amplitudes of the NFR increases for a constant excitation amplitude $I_{AC}$. It can therefore be deduced that an increase of NFR correlates with a decrease of the SoH. The initial NFR value $\mathrm{NFR}_{\min,0}$ corresponds to a new cell (SoH = 100%), and $\mathrm{NFR}_{\min,i}$ correlates with a decreased SoH after $i$ cycles. Finally, the accuracy of the degradation model is calculated by comparing $\mathrm{SoH}_{\mathrm{SVM}}$ to $\mathrm{SoH}_i$, which is determined via standard capacity measurements during cycling.
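The comparison between the model output and the capacity-based reference can be illustrated with a short Python sketch. The capacities and the model estimate are hypothetical, the SoH is expressed in percent so that a new cell gives 100%, and the absolute difference in percentage points is only one possible accuracy measure, since the text does not state which metric was used.

```python
def soh_capacity(c_i, c_0):
    """Capacity-based SoH in percent: residual capacity over nominal capacity."""
    return 100.0 * c_i / c_0


def abs_deviation(soh_model, soh_reference):
    """Absolute deviation in percentage points (assumed accuracy measure)."""
    return abs(soh_model - soh_reference)


c_0 = 2.0  # hypothetical nominal capacity in Ah
c_i = 1.8  # hypothetical residual capacity after i cycles in Ah

soh_reference = soh_capacity(c_i, c_0)
soh_model = 92.5  # hypothetical SoH estimate from the degradation model
deviation = abs_deviation(soh_model, soh_reference)
print(soh_reference, deviation)
```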

Results and Discussion
Prior to the NFR data analysis and the development of the degradation model, appropriate frequency ranges for the relevant processes had to be identified in the NFR spectrum of the ageing training data sets [24,26]. In Figure 6, the NFR is shown over frequency at every 50th cycle from 0 to 400 cycles. Processes in the low-frequency range I from 0.02 Hz to 1 Hz can be attributed to diffusion processes in active material particles, and processes in the mid-frequency range II from 1 Hz to approximately 300 Hz to electrochemical reactions. Processes in frequency range III, which show significantly lower NFR than the processes in ranges I and II, can most probably be attributed to ionic transport processes between and in the SEI and electrolyte. According to the literature, capacitive processes show only minor or constant NFR in comparison to Faradaic processes with a Butler-Volmer kinetic. Furthermore, the separation of ranges I and II at approximately 1 Hz is distinct in the training data sets and is not affected by cell ageing. The identified process ranges with the corresponding time constants τ and frequencies ω are listed in Table 2. The correlation of processes to the frequency range in the NFR spectrum is supported by the corresponding impedance spectrum of the cell, which is shown in the inset of Figure 6 and discussed in detail in [24].

The NFR spectra in Figure 6 further illustrate that nonlinearities in the training data sets increase continuously with cycle ageing, particularly in frequency ranges I and II. In frequency range III, the NFR spectra seem to be ageing-independent. However, without using data analysis methods, it is not possible to state an explicit ageing-to-NFR correlation with sufficient certainty.

Figure 6. NFR spectra during cycle ageing at each 50th cycle, measured with $I_{AC}$ = 1.6 C; the impedance spectrum prior to cycle ageing, shown in the inset, is measured with C/15 C.

Table 2.
Typical time constants and frequency ranges of the processes identified in the NFR spectra of the analysed LIB.

τ/s | ω-Range/Hz | Process
50 to 1 | 0.02 to 1 | Solid diffusion
1 to 0.003 | 1 to 300 | Electrochemical reactions
0.003 to 0.0001 | 300 to 10,000 | Ionic transport processes at interfaces

Therefore, in the next step, a correlation analysis of the NFR and the cycle number as a function of the analysed frequencies is calculated using the Spearman rank correlation measure. In Figure 7, the Spearman correlation coefficient $\rho_S$ is shown over frequency to identify the frequencies with relevant ageing information in the training data sets. Three characteristic frequency ranges with different ageing correlations are identified and can be distinctly separated at approximately 300 Hz. For frequencies higher than 300 Hz, range C, $\rho_S$ varies strongly and therefore NFR data extracted at those frequencies have no valid information about the ageing of the analysed LIB. For frequencies between 0.2 Hz and 300 Hz, range B, $\rho_S$ is exactly 1, which indicates a perfect positive degree of association between NFR and cycle number. In the low-frequency range A, $\rho_S$ differs from 1; it extends from 0.93 to 0.98, thereby indicating that NFR and the cycle number have a weaker positive relationship at these frequencies.

In the next step, the identified correlation ranges are interpreted by comparing correlation ranges A to C with the identified processes in frequency ranges I to III, as shown in Table 3. The most sensitive processes in the LIB are in the mid-frequency range from 0.2 to 300 Hz with a correlation coefficient of 1 for both types of correlation analysis methods. Thus, NFR data measured at these frequencies are highly suitable for LIB ageing quantification and the degradation model development. NFR data measured at frequencies lower than 0.2 Hz show a high correlation to ageing, as both correlation coefficients have values of >0.95. However, the data are not perfectly suitable in comparison with the higher frequency NFR data from 0.2 Hz to 150 Hz.
For frequencies higher than 150 Hz, the NFR data have lower and strongly varying correlation coefficients for both correlation methods. Thus, NFR data at high frequencies have minor ageing sensitivities and are therefore neither applicable for SoH quantification nor considered as training data for the degradation model. The ageing-sensitive data identified for frequencies less than 150 Hz are extracted and shown over frequency in Figure 8.

For calculating the degradation model using SVR, the geometrical features, i.e., the y-axis intercept of the extrapolated NFR and the slope of the NFR spectra, are extracted for each ageing cycle. In Figure 9, the cycle-specific feature values of the extrapolated NFR at the y-axis intercept and the slope are plotted over the frequency of each cycle step. The extrapolated NFR at the y-axis intercept is evidently more sensitive to ageing than the slope of the NFR spectra: it increases monotonically with the cycle number, whereas the slope deviates at cycles 100 and 150. Therefore, the extrapolated NFR at the y-axis intercept is used for the parametrisation and training of the SVR degradation model. In Figure 10, the correlation of the SoH with the extracted degradation feature is shown. In the inset of Figure 10, the values of the degradation feature extracted from the data (blue) as well as the values calculated by the SVR model (red) are shown over the cycle number; the measured and calculated values match, which illustrates the high accuracy of the degradation model.

In the next step, the performance of the degradation model is tested with validation data sets and additional cells, which are non-identical to the initial test cells. NFR data sets from cells identical and not identical to the training cells were applied to the degradation model, and the SoH was identified. The results of this validation are shown in Table 4.
The algorithm predicts the SoH of an identical cell to within 3% and the SoH of a non-identical cell to within 4%. Thereby, it is shown that the NFR data recorded during cycle ageing are highly suitable for the quantification of the SoH for both identical and non-identical cells. Please note that the battery cells used in the case study are manufactured by the Battery LabFactory Braunschweig and do not fulfil industrial standards, i.e., they may show substantial variations in their configurations that make a precise SoH identification difficult in general. The primary goal of this study was to provide the overall concepts and to demonstrate the proposed workflow with first experimental results. Future work will focus on optimised and more sophisticated machine learning concepts with considerably higher accuracy, e.g., more informative features of NFRA data, empirical degradation models with memory (Bayesian concepts), and rigorous uncertainty analysis in SoH identification and SoH predictions.

With NFRA, the measurement time can be reduced by up to 92% compared to standard EIS protocols, as the ageing sensitivity of NFR data is sufficiently high in the mid-frequency range from 1 Hz to 100 Hz for the cells that are analysed in the presented study. Therefore, it is not necessary to analyse the overall frequency range from mHz to kHz, as is typically done with dynamic measurement methods. Please also note that measuring the overall frequency range might be useful for SoH diagnosis of different cells and at different ageing conditions.

Conclusions
In this study, we demonstrated the effective data-driven identification of the SoH by using the novel dynamic analysis method NFRA. First, an accelerated ageing test of an LIB at 25 °C was executed, and a dynamic analysis with NFRA was performed in defined cycle steps at constant environmental and measurement conditions. Under the applied ageing protocol, the nonlinear dynamics of the cell changed with increasing cycle number. The NFR increased monotonically with cycling, but the qualitative shape of the NFR spectra remained unaffected. The most likely ageing effects at the applied cycling current and temperature are electrolyte degradation and the growth of the SEI, respectively. Such ageing effects entail capacity loss and increased cell impedance, which lead to higher NFR values, as observed in this ageing study.
For identifying the most sensitive frequency range for ageing tests, a correlation analysis was performed. Here, the mid-frequency range from 0.2 Hz to 150 Hz shows the strongest correlation with LIB degradation. After extraction of extrapolated NFR at the y-axis intercept and the slope as informative features for LIB ageing, the extrapolated NFR values at the y-axis intercept were used to parametrise the SVR model for SoH degradation, as it shows a higher correlation to NFR ageing data. By using additional cells and data sets, the degradation model was validated and tested. It was shown that the degradation model can predict the SoH values with high accuracy.
Taken together, these results demonstrate the usefulness of NFRA as an effective and fast SoH identification method as well as a versatile tool in the ageing diagnosis of LIBs in general. Furthermore, as the correlation coefficients indicate a perfect correlation of NFR to ageing for all frequencies between 0.2 Hz and 150 Hz, we suggest reducing the measurement range to 1 Hz to 100 Hz for the analysed cells, which leads to a dramatic reduction in measurement time of up to 92% compared to standard measurement protocols. Future work will take into account the evaluation of NFRA data sets of commercial cells as well as the impact of changes in the measurement scenarios, such as temperature variations, and in the ageing conditions on the SoH identification algorithm. Furthermore, we are presently working on the effect of the temperature distribution within cells on NFRA; this is a first step to account for the more complex behaviour of battery packs, which show temperature and property distributions.