Unsupervised Neural Networks for Identification of Aging Conditions in Li-Ion Batteries

This paper explores a new methodology based on data-driven approaches to identify and track degradation processes in Li-ion batteries. Our goal is to study whether it is possible to differentiate the state of degradation of cells that present similar aging in terms of overall parameters (similar remaining capacity, state of health or internal resistance) but that have had different applications or conditions of use (different discharge currents, depths of discharge, temperatures, etc.). For this purpose, this study proposes to analyze voltage waveforms of cells obtained in cycling tests by using an unsupervised neural network, the Self-Organizing Map (SOM). In this work, a laboratory dataset of real Li-ion cells was used, and the SOM algorithm processed battery cell features, thus carrying out smart sensing of the battery. It was shown that our methodology differentiates the previous conditions of use (history) of a cell, complementing conventional metrics such as the state of health. This could be useful for the growing second-life market, because it allows the state of degradation of a battery to be determined more precisely and its suitability for a specific application to be assessed.


Introduction
The use of lithium batteries for energy storage has grown considerably in the last decade, and they have become the standard for both electric vehicles (EV) and hybrid vehicles (HEV), as well as for stationary storage systems. For this reason, the demand for critical materials such as cobalt, graphite or lithium is increasing [1].
Lithium demand for battery production already accounted for 46% of global lithium production in 2017 [2]. Global lithium production has grown at a rate of 20% per year since 2000 [3], and the market for second-life batteries is expected to exceed 95 GWh in 2025 from EVs alone [4]. This growth rate makes it necessary to design the full lifecycle, from manufacturing to reuse and recycling.
In the second-life market, there are three levels of reuse: pack, module and cell [5]. Battery cells can be used in stationary energy storage systems (SESSs), where reuse extends their lifespan by 40% [6]. In addition, they can be ideal for frequency containment and restoration reserves (FCR and FRR) in on-grid regulation [7], or in the residential sector to support solar energy production [8].
After their main application, Li-ion batteries can follow different value-chain scenarios, such as direct reuse, second use or recycling. However, the current second-life market only considers the reuse of the complete pack to be profitable, due to economic criteria and the difficulty of determining the quality of the extracted cells [9]. Nowadays, second-life batteries come from automotive applications, in which cells are assembled into battery packs. The mechanical difficulty of disassembling these packs raises material and labor costs significantly. Some authors are already studying alternative sources of second-life cells, for example, Li-ion cells rescued from notebook batteries, whose capacities still exceed 70% of the nominal value [10].
As a contribution to this matter, this article presents a new technique aimed at differentiating the aging of cells coming from different, unknown applications. Many variables determine battery aging. All cells undergo calendar-life degradation [11], which is not related to their use and has a long-term impact that is the same for all the batteries produced in the same batch. However, many other factors depend on the application in which the batteries have previously been used: temperature, charging and discharging current profiles, voltage range and power peaks, among others. These factors are described further in [12,13].
Aging entails a change in the internal composition of the cell that affects the electrochemical reactions taking place, reducing its capacity and, from an electrical point of view, increasing its internal resistance. These effects make modeling a battery throughout its life cycle a highly nonlinear problem that has not yet been satisfactorily solved.
The most common metric used to define the state of a battery is the State of Health (SoH), a percentage (0-100%) defined as a quotient of one of several characteristics of the cell (capacity, impedance, etc.), which can be calculated with a wide variety of well-known techniques, reviewed in [14,15]. Some authors have recently studied physical correlations to define the actual SoH [16,17]. Others have proposed the study of incremental capacity curves obtained from laboratory analysis [18,19].
However, this classic view of aging takes into account neither the nonlinearity of the health status of cells nor the influence of SoH on other functional states, such as the expected capacity fade, the maximum obtainable power, self-heating and other known issues of a degraded cell. Indeed, it has been experimentally observed that two cells can have the same SoH at a certain point in their life cycle and yet a totally different aging curve (Figure 1), due to the different applications or uses they have previously had (discharge currents, temperatures or depths of discharge).
More modern approaches use data-driven techniques to estimate aging, which are reviewed in [20,21]. There are similar techniques based on Support Vector Machines (SVM) trained with data from previous uses, such as [22].
Newer techniques, such as LSTM neural networks and other deep learning models, are used for online prediction with inputs such as capacity and temperature [23,24].
Other authors have proposed capacity estimation methods based on complete charging curves using Gaussian Process Regression (GPR), such as [25,26].
Particle filtering is another widely applied method for degradation estimation, as proposed in [27]. Capacity analysis has also been studied, estimating degradation from the charge-discharge efficiency [28]. Finally, [29] proposed the use of Principal Component Analysis (PCA), perhaps the proposal most similar to ours, but with the important limitation that PCA is a linear technique.
The aim of our study is to develop a methodology for cell aging estimation that complements the classical definition of SoH, in which only battery capacity is considered. Thus, a new methodology based on data-driven techniques is presented to help determine the state of degradation of cells that have had different conditions of use but show similar degradation (i.e., similar remaining capacity, SoH or internal resistance). This new methodology analyzes voltage waveforms obtained in cycling tests by using an unsupervised neural network, the Self-Organizing Map (SOM). It will be shown that this methodology allows the previous conditions of use (history) of a cell to be differentiated in an unsupervised and visual way, complementing conventional metrics such as the SoH.
A first version of this methodology was previously explored in [30]. However, the results were not as successful as expected due to the limited dataset used and the lack of a feature extraction stage. In this new study, a different dataset is used, feature extraction is incorporated, and new quantitative metrics are proposed.
In summary, the original contributions of this article are the following:
- A new methodology for cell aging analysis based on the SOM is proposed.
- A feature extraction methodology based on polynomials is studied.
- An original database crafted for aging studies, provided by an external OEM, is used.
- Three new metrics to evaluate trajectories on the SOM are proposed.
- A new 3D D-matrix map representation, including trajectories with their quantization errors, is used.
- Some application examples of the proposed technique are given.
This paper is organized as follows. In Section 2, the methodology is presented. First, the self-organizing map neural model is introduced, then the dataset used in our work is described, the feature extraction process is explained, and new metrics are defined. In Section 3, the results achieved by our methodology are presented, including two types of visualizations for the SOM. In Section 4, two applications of our methodology are described. Finally, in Section 5 the conclusions of our work are provided.

Methodology
Our methodology consisted of three steps: (1) voltage-time curves were obtained and normalized, (2) some features were extracted, (3) features were processed by means of self-organizing maps, which were analyzed by visual inspection and by using the proposed metrics. A schematic for the technique is shown in Figure 2.

Self-Organizing Maps
A Self-Organizing Map (SOM) [31,32] is an unsupervised neural model used for pattern recognition, cluster search and database visualization. Unlike other unsupervised techniques (clustering, PCA, etc.), the SOM is nonlinear and allows for monitoring the evolution of a system.
A SOM consists of an input layer and an output layer (map), which is a rectangular array of neurons (Figure 3). It has two modes of operation: training and inference. Given a map of nx × ny neurons and an input layer of n variables $x_k$ (1 ≤ k ≤ n), each neuron (i, j) stores a weight vector $\mathbf{w}_{ij}$ of n components $w_{ijk}$ (Figure 3). In inference, each neuron (i, j) computes the similarity between the input vector $\mathbf{x}$ and its weight vector $\mathbf{w}_{ij}$; the Euclidean distance is normally used:
$$ d(\mathbf{x}, \mathbf{w}_{ij}) = \sqrt{\sum_{k=1}^{n} \left( x_k - w_{ijk} \right)^2} \tag{1} $$
The neuron that minimizes this distance to the input vector is considered to have recognized the input pattern; it is called the winning neuron or best-matching unit (BMU).
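As an illustration of the inference step, the following minimal NumPy sketch stores the map as an array of shape (nx, ny, n) and returns the grid coordinates of the BMU by minimizing the Euclidean distance of Equation (1). It is an assumption-based example, not the SOM Toolbox code used in this work, and the function name `find_bmu` is hypothetical.

```python
import numpy as np


def find_bmu(weights: np.ndarray, x: np.ndarray) -> tuple[int, int]:
    """Return the (i, j) grid coordinates of the best-matching unit.

    weights: array of shape (nx, ny, n), one weight vector w_ij per neuron.
    x: input vector of shape (n,).
    """
    # Euclidean distance between x and every weight vector (Equation (1))
    dist = np.sqrt(((weights - x) ** 2).sum(axis=-1))
    # The winning neuron is the one with the smallest distance
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j)


# Usage: a 2 x 3 map with 2 inputs; only neuron (1, 2) is close to x
w = np.zeros((2, 3, 2))
w[1, 2] = [1.0, 1.0]
print(find_bmu(w, np.array([0.9, 1.1])))  # (1, 2)
```

Computing all distances at once with broadcasting keeps the search to a single vectorized pass over the map, which is cheap for maps of the size used here.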
In the training phase, the synaptic weights are adjusted. The weights are initialized with random values; in each iteration t, an input vector x(t) is presented and the winning neuron g is found as described for the inference phase. The weights of the winning neuron and of the neurons around it (its neighborhood) are then adjusted by:
$$ w_{ijk}(t+1) = w_{ijk}(t) + \varepsilon(t)\, h\left(\lVert (i,j) - g \rVert, t\right) \left( x_k(t) - w_{ijk}(t) \right) \tag{2} $$
where h(·) is the neighborhood function, which is maximal at the winning neuron g and decreases with the distance to it on the map, and ε(t) is the learning rate, which decreases over time:
$$ \varepsilon(t) = \varepsilon_0 \left( \frac{\varepsilon_f}{\varepsilon_0} \right)^{t/t_f} \tag{3} $$
where ε0 represents the initial learning rate (ε0 < 1.0), εf is the final rate (εf ≈ 0.01), and tf is the estimated number of iterations to reach εf. The neuron weights can be updated online (example by example) or after processing a batch of examples; in our work, the batch algorithm was used. In addition, because the input variables often have very different ranges of variation, they are usually normalized; in our work, the input variables were normalized to mean 0 and variance 1.
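A minimal NumPy sketch of example-by-example (online) training, assuming the exponential schedule of Equation (3), a Gaussian neighborhood function h whose radius shrinks over time, and hypothetical names (`learning_rate`, `standardize`, `train_online`). It only illustrates Equations (2) and (3) and the mean-0/variance-1 scaling of the inputs; it is not the SOM Toolbox batch algorithm actually used in this work.

```python
import numpy as np


def learning_rate(t: float, eps0: float = 0.5, epsf: float = 0.01, tf: int = 1000) -> float:
    """Equation (3): decays from eps0 at t = 0 to epsf at t = tf."""
    return eps0 * (epsf / eps0) ** (t / tf)


def standardize(X: np.ndarray) -> np.ndarray:
    """Scale every input variable (column) to mean 0 and variance 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def train_online(X: np.ndarray, nx: int, ny: int, tf: int = 1000, seed: int = 0) -> np.ndarray:
    """Online SOM training with the update rule of Equation (2).

    The Gaussian neighborhood and its shrinking radius are assumptions.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((nx, ny, X.shape[1]))  # random initial weights
    # Grid coordinates (i, j) of every neuron, shape (nx, ny, 2)
    grid = np.stack(np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij"), axis=-1)
    sigma0 = max(nx, ny) / 2.0
    for t in range(tf):
        x = X[rng.integers(len(X))]                             # present one input vector
        d = ((W - x) ** 2).sum(axis=-1)
        g = np.array(np.unravel_index(np.argmin(d), d.shape))  # winning neuron
        sigma = sigma0 * (0.5 / sigma0) ** (t / tf)             # neighborhood radius
        h = np.exp(-((grid - g) ** 2).sum(axis=-1) / (2.0 * sigma ** 2))
        W += learning_rate(t, tf=tf) * h[..., None] * (x - W)   # Equation (2)
    return W
```

In the batch variant used in this work, each weight vector is instead replaced, once per epoch, by the neighborhood-weighted mean of all the input vectors, with the weights given by h evaluated at their BMUs.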
To quantify the quality of the trained map, two metrics are available [33]. The quantization error ($q_e$) measures the resolution of the trained map and is calculated as the mean of the Euclidean distances between every input vector $\mathbf{x}_i$ and the neuron vector that minimizes this distance (the BMU), $\mathbf{m}_{\mathbf{x}_i}$, over all N input samples:
$$ q_e = \frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{m}_{\mathbf{x}_i} \rVert \tag{4} $$
In addition, the SOM generates a nonlinear projection of the multidimensional input space onto the two-dimensional map, allowing the visualization of the structure of the dataset and of possible clusters. The topographic error ($t_e$) quantifies topology preservation by computing the proportion of input vectors for which the first BMU and the second BMU (the neuron with the second smallest distance between the input vector and its weights) are not adjacent:
$$ t_e = \frac{1}{N} \sum_{i=1}^{N} u(\mathbf{x}_i) \tag{5} $$
where $u(\mathbf{x}_i)$ takes the value 1 when the first two BMUs of the input vector $\mathbf{x}_i$ are not adjacent and 0 otherwise.
A visualization tool called the U-matrix [31,32] is frequently used to draw borders between the different clusters that may be present on the SOM. The U-matrix procedure generates a new map that visualizes the Euclidean distance between neighboring neurons in gray levels, but the dimensions of this new map are almost double those of the initial SOM, since it is built from Euclidean distances computed for pairs of adjacent neurons: an nx × ny SOM yields a (2nx − 1) × (2ny − 1) U-matrix. In this work, the so-called D-matrix is used as a visualization tool instead, because it averages the distances between each neuron and its neighbors to generate a map of mean distances that has the same dimensions as the original SOM (nx × ny).
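The two quality metrics and the D-matrix can be sketched directly from Equations (4) and (5). This NumPy example assumes a rectangular lattice on which "adjacent" means the four horizontal and vertical neighbors (a hexagonal lattice or an 8-neighborhood would change the adjacency test); the function names are hypothetical, and it is not the SOM Toolbox implementation.

```python
import numpy as np


def quantization_and_topographic_error(W: np.ndarray, X: np.ndarray) -> tuple[float, float]:
    """Equations (4) and (5) for a map W of shape (nx, ny, n) and data X of shape (N, n)."""
    ny = W.shape[1]
    flat = W.reshape(-1, W.shape[2])  # neuron (i, j) -> row i * ny + j
    qe, errors = 0.0, 0
    for x in X:
        dist = np.linalg.norm(flat - x, axis=1)
        first, second = np.argsort(dist)[:2]  # first and second BMUs
        qe += dist[first]
        (i1, j1), (i2, j2) = divmod(int(first), ny), divmod(int(second), ny)
        # u(x) = 1 if the two BMUs are not adjacent (4-neighborhood assumed)
        errors += int(abs(i1 - i2) + abs(j1 - j2) != 1)
    return qe / len(X), errors / len(X)


def d_matrix(W: np.ndarray) -> np.ndarray:
    """Mean distance from each neuron to its 4-connected neighbors, shape (nx, ny)."""
    nx, ny, _ = W.shape
    D = np.zeros((nx, ny))
    for i in range(nx):
        for j in range(ny):
            neigh = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= i + di < nx and 0 <= j + dj < ny]
            D[i, j] = np.mean([np.linalg.norm(W[i, j] - W[a, b]) for a, b in neigh])
    return D
```

Because the D-matrix keeps one value per neuron, it can be drawn on the same nx × ny grid as the trajectories, which is what makes it convenient for overlaying BMU paths.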
Finally, once a SOM has been trained, it is used in inference mode. In our case (Sections 3 and 4), when the data of a cell are presented, the BMU, i.e., the neuron that minimizes Equation (1), indicates its current state (cell diagnosis). When a data series from a cell is provided, the consecutive BMUs describe a trajectory on the map that shows the evolution of the state of that battery cell.
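A trajectory is simply the ordered list of BMU coordinates obtained for the successive control tests of one cell. The sketch below assumes the cycles are already ordered in time and stored as rows of a NumPy array; the name `trajectory` is hypothetical.

```python
import numpy as np


def trajectory(W: np.ndarray, cycles: np.ndarray) -> list[tuple[int, int]]:
    """Map each cycle of one cell (row of `cycles`) to the grid coordinates of its BMU."""
    path = []
    for x in cycles:
        d = ((W - x) ** 2).sum(axis=-1)  # squared distance has the same minimum as Equation (1)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        path.append((int(i), int(j)))
    return path
```

Using the squared distance avoids a square root per neuron without changing which neuron wins.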

Database
An automotive OEM provided the database used in this work; for confidentiality reasons, neither the cell references nor the manufacturer can be revealed. The dataset consists of thirteen NCA (LiNiCoAlO2) 18650 Li-ion cells with a nominal voltage of 3.6 V and a nominal capacity of 3.2 Ah.
Cells were continuously cycled in environmental chambers simulating real conditions of usage, with a different profile for each cell (different Vmax, charge/discharge rates, temperatures, etc.); thus, each cell suffered different degradation processes depending on its specific application (described in Table 1). Homogeneous control tests were carried out periodically on all the cells. The dataset consists of the control tests carried out over almost four years, sampled at 1 Hz. The database does not include data from the continuous application-simulation cycles; it only includes the homogeneous tests (charge and discharge) that were performed periodically on all cells to check their status.
This database is useful for degradation studies, since it includes data from homogeneous tests on cells used under diverse conditions: different applications (in operation or stored), temperatures, current rates and maximum charging voltages. Note that several cells did not reach their End of Life (EoL) point. Since this database was provided by an external party, no information about the design of experiments (DoE) was available on this point.
Although it has similarities with other databases widely used for degradation studies, such as the one provided by NASA [34], those databases do not include information on the cycles themselves, nor do they include similar tests for all cells.
The cycling conditions during normal operation for each cell are listed in Table 1. The first four cells were simply stored (unused) between tests at high voltage and constant temperature; the rest of the cells were cycled under different conditions with a constant-current profile.
Control tests were performed on the 13 cells; however, some cells reached their EoL during the course of the experiment. In total, normal cycling was stopped 42 times to carry out control tests, approximately every 6 weeks during the 4 years that the experiment lasted. Cells that reached their EoL early underwent fewer tests (Table 1 shows the number of tests carried out for every cell). Thus, 391 control tests were made in total.
The control tests consisted of a full constant-current, constant-voltage (CC/CV) charge/discharge profile with a current rate of C/3, from 4.1 V to 3 V, and a cut-off current of 65 mA, at a constant ambient temperature of 25 °C.
The normalized control tests (dataset) analyzed in this article can be consulted in [35].

Preprocessing and Feature Extraction
The aim of our work is to develop a methodology for cell aging estimation (Figure 2) that complements the classical definition of SoH, in which only battery capacity is considered. For this reason, in our work the discharge curve is converted to a normalized form. This removes all information on the cell capacity, i.e., on the maximum charged or discharged capacity at each moment of its life cycle.
The State of Health (SoH) values were calculated with the capacity fade approach [14]:
$$ \mathrm{SoH}(t) = \frac{Q_{max}(t)}{Q_{nominal}} \cdot 100\% \tag{6} $$
where $Q_{max}(t)$ represents the maximum capacity discharged in each control test (calculated with Coulomb counting), and $Q_{nominal}$ represents the initial capacity of each cell when manufactured. The EoL points differ widely among the discharge curves of the dataset (Figure 4), so it was decided to study the available aging data up to the 85% SoH level (Figure 5). Our goal was to check whether a SOM can differentiate causes of aging from discharge data recorded under similar degradation conditions. The duration of the discharge (Figure 6), i.e., the length of the curve on the x-axis, is determined by the ratio between the discharged capacity and the discharge current. The capacity information must be removed to validate the study hypothesis. Therefore, to ensure that the voltage discharge curve does not depend on the cell capacity, it was normalized using the state of charge (SoC), defined in Equation (7), since it is a relative magnitude:
$$ \mathrm{SoC}(\tau) = \left( 1 - \frac{1}{Q_{max}(t)} \int_{0}^{\tau} I(s)\, ds \right) \cdot 100\% \tag{7} $$
where τ is the time elapsed since the start of the discharge and I is the discharge current.
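Equations (6) and (7) can both be obtained from a single pass of Coulomb counting over the discharge part of a control test. This sketch assumes NumPy arrays of time stamps in seconds (1 Hz in this dataset), a discharge current that is positive while discharging, trapezoidal integration, and a default $Q_{nominal}$ equal to the 3.2 Ah rating; `soh_and_soc` is a hypothetical name.

```python
import numpy as np


def soh_and_soc(t_s: np.ndarray, current_a: np.ndarray, q_nominal_ah: float = 3.2):
    """Coulomb counting over one discharge of a control test.

    t_s: time stamps in seconds; current_a: discharge current in A (positive while discharging).
    Returns the SoH of Equation (6) and the SoC of Equation (7) at every sample, both in %.
    """
    # Charge discharged up to each sample, in Ah (trapezoidal integration of I dt)
    steps = np.diff(t_s) * (current_a[1:] + current_a[:-1]) / 2.0
    q_discharged = np.concatenate(([0.0], np.cumsum(steps))) / 3600.0
    q_max = q_discharged[-1]                    # maximum discharged capacity Q_max(t)
    soh = 100.0 * q_max / q_nominal_ah          # Equation (6)
    soc = 100.0 * (1.0 - q_discharged / q_max)  # Equation (7): 100% at start, 0% at the end
    return soh, soc
```

Because the SoC axis always spans 100% to 0%, two cells with different remaining capacities produce voltage-SoC curves of the same length, which is exactly what removes the capacity information.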
Electronics 2021, 10, 2294 7 of 20
Figure: SoH evolution for each cell.
Voltage-SoC curves were obtained from the discharge part of the control tests, using Coulomb counting. For each cycle, the maximum discharged capacity was taken as the 100% SoC reference, as can be observed in Figure 7.
Some studies, such as [36], propose differential features between several cycles, but these techniques require several laboratory cycles, which reduces the useful life of the cells in real applications. Others study degradation using only the capacity feature [37], but, as explained in the Introduction, two cells can have the same capacity at a certain point in their life cycle and yet a totally different aging curve. Other approaches extract statistical features, such as the kurtosis or the skewness of the curve [38].
In our work, three ways of presenting the input data (input vectors) for training the SOM were tried. First, the voltage-SoC curves were interpolated to obtain curves of 10, 20, 50 and 100 points, which were used directly as input vectors.
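Interpolating onto a fixed SoC grid gives every control test an input vector of the same length. The sketch below assumes linear interpolation with `numpy.interp` and an equally spaced grid from 0% to 100% SoC; the text does not state which interpolation scheme was used, and `resample_curve` is a hypothetical name.

```python
import numpy as np


def resample_curve(soc: np.ndarray, voltage: np.ndarray, n_points: int = 100) -> np.ndarray:
    """Interpolate a voltage-SoC curve onto n_points equally spaced SoC values (0-100%)."""
    order = np.argsort(soc)  # np.interp expects increasing x values
    grid = np.linspace(0.0, 100.0, n_points)
    return np.interp(grid, np.asarray(soc)[order], np.asarray(voltage)[order])
```

Sorting first matters because SoC decreases during a discharge, whereas `numpy.interp` requires increasing sample points; the returned vector is ordered from 0% to 100% SoC.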
Second, as is usual in machine learning, several feature extraction techniques were tried (Stage 2 in Figure 2). Thus, some statistics of the curves (mean, median, average absolute deviation, kurtosis and skewness) were used as input features.
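The five statistics can be computed with plain NumPy moments. This sketch assumes that the average absolute deviation is taken about the mean and that kurtosis means excess (Fisher) kurtosis, population versions of both moments, which matches the defaults of `scipy.stats.kurtosis` and `scipy.stats.skew`; the text does not specify these conventions, and `curve_statistics` is a hypothetical name.

```python
import numpy as np


def curve_statistics(v) -> np.ndarray:
    """Mean, median, mean absolute deviation, excess kurtosis and skewness of a curve."""
    v = np.asarray(v, dtype=float)
    mean = v.mean()
    dev = v - mean
    m2, m3, m4 = (dev ** 2).mean(), (dev ** 3).mean(), (dev ** 4).mean()
    mad = np.abs(dev).mean()  # average absolute deviation around the mean
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # excess (Fisher) kurtosis: 0 for a normal distribution
    return np.array([mean, np.median(v), mad, kurt, skew])
```

Applied to a normalized voltage-SoC curve, this yields a five-component input vector per control test.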
Third, various mathematical expressions were used to fit the normalized discharge curves (mean 0, standard deviation 1), and their fitting parameters (coefficients) were used as the SOM input features. The mathematical expressions tried (Table 2) were Fourier series (5 and 8 terms), a Gaussian model (4 terms), 5th- and 9th-degree polynomials and a rational fraction. The goodness of fit was checked by using the average root-mean-square error (RMSE) and the sum of squared errors (SSE) as metrics. The results achieved with all these techniques are shown in Section 3.
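For the polynomial models, the fit and its goodness-of-fit metrics can be sketched as follows. This example assumes NumPy's `numpy.polynomial.polynomial.polyfit` as a stand-in for whatever fitting tool was actually used, and rescales SoC to [0, 1] for numerical conditioning (an assumption not stated in the text). The Fourier, Gaussian and rational models are not covered; `polynomial_fit` is a hypothetical name.

```python
import numpy as np


def polynomial_fit(soc: np.ndarray, voltage: np.ndarray, degree: int = 5):
    """Fit a standardized voltage-SoC curve and report its SSE and RMSE.

    Returns the degree + 1 coefficients (lowest power first), SSE and RMSE.
    """
    v = np.asarray(voltage, dtype=float)
    v = (v - v.mean()) / v.std()              # normalized curve: mean 0, std 1
    x = np.asarray(soc, dtype=float) / 100.0  # SoC scaled to [0, 1] (assumption)
    coeffs = np.polynomial.polynomial.polyfit(x, v, degree)
    residuals = v - np.polynomial.polynomial.polyval(x, coeffs)
    sse = float(np.sum(residuals ** 2))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return coeffs, sse, rmse
```

Comparing, for example, `polynomial_fit(soc, v, 5)` with `polynomial_fit(soc, v, 9)` shows the usual trade-off: the 9th-degree fit cannot have a larger SSE on the same data, but it produces almost twice as many features for the SOM.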

Metrics
Usually, the results provided by the SOM are visual and qualitative [31,32]; metrics such as the quantization error and the topographic error (Section 2.1) only inform about the quality of the trained map. Once trained, the map is used to represent the evolution of the aging of a battery cell in the form of a trajectory or path of BMUs (Figures 8 and 9). Here, three new metrics are introduced to evaluate and compare cell trajectories on the map, because no SOM quality criteria for the evaluation of trajectories were found in the literature.
Figure 9. Representation of distance calculation in the Separability Index (SI).
The first proposed metric, the relative deployment index (DI), assesses the quality of the trajectory deployment. It is desirable for a trajectory on the map to have a length as close as possible to the minimum distance between its initial and final points (Figure 8). Thus, the length of a specific trajectory is first calculated and then divided by the minimum possible length. Given a trajectory of N points, its length is evaluated by adding the Manhattan distances [39] between all its consecutive points $P_i$ and $P_{i-1}$, and it is then divided by the Euclidean distance ($d_E$) between the start ($P_1$) and end ($P_N$) points:
$$ \mathrm{DI} = \frac{\sum_{i=2}^{N} \left( |P_{i,x} - P_{i-1,x}| + |P_{i,y} - P_{i-1,y}| \right)}{d_E(P_1, P_N)}, \qquad d_E(P_1, P_N) = \sqrt{(P_{N,x} - P_{1,x})^2 + (P_{N,y} - P_{1,y})^2} $$
where, given a point P, $P_x$ and $P_y$ represent its x and y coordinates on the map. Since the Manhattan length of a path is never shorter than the Euclidean distance between its endpoints, DI ≥ 1, and a trajectory with DI close to one is a good (close to optimal) trajectory (Figure 8).
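The DI definition translates almost literally into code. In this sketch, trajectories are lists of integer (x, y) grid coordinates, and a trajectory that ends on its starting neuron is rejected because $d_E$ would be zero; `deployment_index` is a hypothetical name.

```python
import math


def deployment_index(path: list[tuple[int, int]]) -> float:
    """Relative deployment index (DI) of a BMU trajectory on the map grid."""
    # Trajectory length: sum of Manhattan distances between consecutive points
    length = sum(abs(x1 - x0) + abs(y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:]))
    # Euclidean distance between the start point P_1 and the end point P_N
    (xs, ys), (xe, ye) = path[0], path[-1]
    d_e = math.hypot(xe - xs, ye - ys)
    if d_e == 0:
        raise ValueError("DI is undefined when the trajectory starts and ends at the same neuron")
    return length / d_e
```

For instance, a straight path along one row gives DI = 1, whereas an L-shaped path from (0, 0) to (1, 1) gives 2/√2 ≈ 1.41, reflecting that a diagonal move costs two grid steps.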
The second proposed metric, the separability index (SI), compares several trajectories on the same map against each other to assess how separable (dissimilar) they are. SI measures the geometric distance between two trajectories on the map, relative to their size; similar trajectories have a low SI.
This metric computes the deployment distances between two trajectories. For each point $Q_j$ of one trajectory (marked with a red X in Figure 9), the minimum distance to the other trajectory, i.e., to all its points $P_i$, is calculated on a radial basis; these minimum distances are represented with green dots in Figure 9.
As distances are calculated on a square grid, the Manhattan distance was selected so that all adjacent neurons are considered to be at the same distance. The numerator of SI adds the minimum Manhattan distances calculated between each point $Q_j$ of one trajectory and all the points $P_i$ of the other trajectory. To normalize this measure, the sum is divided by the length of the trajectory, so that the distance is relative, regardless of the dimension of the map or of the trajectories studied. Thus, cases with equally separated trajectories but different lengths will produce different SI values. The separability indices for all pairs of trajectories are presented in matrix form (as in a confusion matrix) to quantify the relative unfolding between all possible combinations of trajectories on a map trained with several battery cells.
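A sketch of the SI and of its confusion-matrix-like arrangement, assuming that "the length of the trajectory" is the Manhattan path length of trajectory Q, computed as in the DI numerator. That normalization choice, and the names `separability_index` and `si_matrix`, are assumptions for illustration.

```python
def _manhattan(a: tuple[int, int], b: tuple[int, int]) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def separability_index(q: list[tuple[int, int]], p: list[tuple[int, int]]) -> float:
    """SI of trajectory q with respect to trajectory p (both lists of BMU coordinates).

    Normalizing by the Manhattan path length of q is an assumption.
    """
    # Numerator: for each point Q_j, the minimum Manhattan distance to any point P_i
    numerator = sum(min(_manhattan(qj, pi) for pi in p) for qj in q)
    length = sum(_manhattan(a, b) for a, b in zip(q, q[1:]))
    return numerator / length


def si_matrix(paths: list[list[tuple[int, int]]]) -> list[list[float]]:
    """Pairwise SI values, arranged like a confusion matrix."""
    return [[separability_index(q, p) for p in paths] for q in paths]
```

Because the minimum is taken from each point of Q towards P, SI(Q, P) need not equal SI(P, Q), so the matrix is not necessarily symmetric; the diagonal is always zero. A trajectory whose BMUs never move would have zero length and raise a `ZeroDivisionError`.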
To measure the deployment quality of the trajectories, the number of coincident BMUs (NoCB), based on the Hamming distance [40], was proposed. The Hamming distance measures the minimum number of changes required to turn one sequence into another. Accordingly, the NoCB represents the maximum number of BMUs of one trajectory that coincide with those of another trajectory. A high NoCB means that the SOM interprets the cycles of the two corresponding cells as equal.
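A sketch of the NoCB, assuming that two trajectories are compared position by position (control test k of one cell against control test k of the other) and, since the cells underwent different numbers of tests, that both are truncated to the shorter length. Under those assumptions, the NoCB is the common length minus the Hamming distance. The function names are hypothetical.

```python
def hamming_distance(p: list[tuple[int, int]], q: list[tuple[int, int]]) -> int:
    """Number of positions at which two equal-length BMU sequences differ."""
    if len(p) != len(q):
        raise ValueError("the Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(p, q))


def nocb(p: list[tuple[int, int]], q: list[tuple[int, int]]) -> int:
    """Number of coincident BMUs, compared position by position over the common length."""
    n = min(len(p), len(q))
    return n - hamming_distance(p[:n], q[:n])
```

A set-based variant, which counts neurons visited by both cells regardless of order, would be a reasonable alternative if the comparison is not meant to be aligned in time.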

Discharge Maps
This section shows the results achieved by our three-step methodology (Figure 2) by incorporating the tools shown in Section 2. In the first step, the voltage-time curve is normalized to the voltage-SoC curve to remove the information on cell capacity (as explained in Section 2.3). The second step, feature extraction, depends on the specific feature extraction technique selected of those presented in Section 2.3. In the third step, the data are introduced as inputs to the SOM, the map is trained and, finally, metrics are calculated, and the results are analyzed.
The SOM allows for displaying information in different ways [31,32]; for this work, labeled maps, feature maps, trajectory maps and 3D visualizations are shown and discussed.
Of the initial 391 control tests, 165 corresponded to cells with an SoH above 85% (eight cells in total, selected as explained in Section 2.3). SOMs were trained with 165 input vectors (examples) from those 165 control tests. An input vector consists of the data of one battery cycle: either the features extracted from its discharge curve or the raw discharge data (Section 2.3).
The SOM Toolbox library [41] was used for developing the maps. This library sets the hyperparameters automatically; in any case, the SOM is usually very tolerant to the specific values of many of its hyperparameters. The main hyperparameter is the dimension of the map (nx × ny neurons, Section 2.1). In most SOM applications, the number of neurons on the map should be similar to the number of training examples (165 control tests in our case), so that the input data are well separated when projected onto the map. We used 10 × 18 maps (Figure 10), i.e., 180 neurons, a number close to the 165 available training samples. In operation, the SOM distributed the 165 training samples among the 180 neurons, and the projected training samples were well separated (approximately one sample per neuron).
Electronics 2021, 10, x FOR PEER REVIEW 12 of 22
Table 3 shows the results achieved for 10 × 18 SOMs trained with the different feature extraction techniques explained in Section 2.3. The first two metrics, quantization error (q_e) and topographic error (t_e), measure the quality of the trained map, as explained in Section 2.1. The remaining values are the maximum and mean deployment index (DI), separability index (SI) and number of coincident BMUs (NoCB), as defined in Section 2.4. For q_e, t_e, DI and NoCB, lower values are better; for SI, a higher value indicates a greater distance between trajectories (better separability). As Table 3 shows, satisfactory quantization and topographic errors are achieved in all cases. The best SI is obtained with sample statistics and with the coefficients of 5th degree polynomials as inputs. The best deployment index is obtained with sample statistics, interpolated voltages (100 points) and the 5th degree polynomial. Finally, the lowest NoCB values, both mean and maximum, are found with 5th degree polynomials.
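The two map-quality metrics of Table 3 can be sketched with their standard definitions: q_e as the mean distance between each input vector and its BMU, and t_e as the fraction of input vectors whose first and second BMUs are not adjacent on the map. This sketch assumes a rectangular grid with 8-neighbor adjacency; SOM Toolbox maps are often hexagonal, so its values may differ.

```python
import numpy as np


def _distances(weights, data):
    """Distances from every input vector to every neuron, shape (n_inputs, n_neurons)."""
    w = weights.reshape(-1, weights.shape[-1])
    return np.linalg.norm(np.asarray(data, float)[:, None, :] - w[None, :, :], axis=-1)


def quantization_error(weights, data):
    """Mean Euclidean distance between each input vector and its BMU."""
    return _distances(weights, data).min(axis=1).mean()


def topographic_error(weights, data):
    """Fraction of inputs whose first and second BMUs are not grid neighbours."""
    cols = weights.shape[1]
    best_two = np.argsort(_distances(weights, data), axis=1)[:, :2]
    r, c = np.divmod(best_two, cols)  # flat neuron index -> (row, col)
    # 8-neighbourhood on a rectangular grid (assumption)
    adjacent = (np.abs(r[:, 0] - r[:, 1]) <= 1) & (np.abs(c[:, 0] - c[:, 1]) <= 1)
    return float(np.mean(~adjacent))
```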
Figure 10 includes maps trained with several feature extraction techniques to show the deployment quality of the eight cell trajectories. Notice that, once trained, the SOM is used in inference: an input vector (cell cycle) is presented, every neuron computes Equation (1), and the winning neuron (BMU) is marked. When a series of input vectors corresponding to the cycles of a specific cell is presented, the evolution of that cell is displayed as an aging trajectory linking all its BMUs. The best results are given by raw voltages (100 points) and the 5th order polynomial.
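The inference step described above can be sketched as follows: each cycle of a cell, in chronological order, is mapped to the grid position of its BMU, and linking these positions gives the aging trajectory. As before, this assumes that Equation (1) is the Euclidean distance; `bmu_trajectory` is a hypothetical name.

```python
import numpy as np


def bmu_trajectory(weights, cycles):
    """Map a cell's cycles (chronological order) to the (row, col) of their BMUs."""
    rows, cols, dim = weights.shape
    w = weights.reshape(-1, dim)
    path = []
    for x in np.asarray(cycles, dtype=float):
        flat = int(np.argmin(np.linalg.norm(w - x, axis=1)))  # winning neuron
        path.append(divmod(flat, cols))                       # flat index -> grid position
    return path
```

Drawing a line through the returned positions on top of the map gives a trajectory plot like those in Figure 10.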
Given the quantitative results (Table 3) and the visual analysis of the deployed trajectories (Figure 10), feature extraction based on modeling the voltage-SoC curve with a 5th degree polynomial is a good choice:

$$V(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5$$

where x represents the SoC and V(x) the normalized voltage. This is the feature extraction technique shown in Figure 2. Thus, in our methodology, normalized discharge curves (mean 0, standard deviation 1) are fitted to this 5th degree polynomial, and their six fitting parameters (a_i, i = 0, ..., 5) are the SOM input features. From now on, only maps trained with this feature extraction technique are considered.
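A least-squares version of this feature extraction can be sketched with `numpy.polyfit`. It assumes that the normalization (mean 0, standard deviation 1) is applied to the voltage values of the voltage-SoC curve and that SoC is expressed in [0, 1]; `polynomial_features` is a hypothetical name.

```python
import numpy as np


def polynomial_features(soc, voltage, degree=5):
    """Six SOM input features: coefficients a_0..a_5 of a 5th degree fit of V(SoC)."""
    v = np.asarray(voltage, dtype=float)
    v_norm = (v - v.mean()) / v.std()   # mean 0, standard deviation 1
    # np.polyfit returns the highest power first
    coeffs = np.polyfit(np.asarray(soc, dtype=float), v_norm, deg=degree)
    return coeffs[::-1]                 # reorder as a_0, a_1, ..., a_5
```

Stacking the returned vectors of all training cycles row by row yields the input matrix for the SOM.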

Labeled Maps
Once the SOM has been trained, in our case with 165 input vectors of six features each (the six coefficients of a 5th degree polynomial), several visualizations can be generated (Figure 11). Figure 11a shows the D-matrix of the trained SOM, a map of the mean distances between neighboring neurons (as explained in Section 2.1).
The bigger the distance (yellow and red colors), the bigger the difference between the information learned by neighboring neurons, which allows natural borders (possible clusters) to be delimited on the map. Blue valleys (similar neurons) are separated by yellow and red boundaries (frontiers) that separate input vectors with very different features. Thus, the D-matrix in Figure 11a helps to interpret the clusters shown in the following analysis of labels, features and trajectories.
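A D-matrix of this kind can be sketched by averaging, for every neuron, the distances between its weight vector and those of its grid neighbors. This sketch assumes a rectangular grid with 4-connected neighbors, whereas the paper's maps may use a hexagonal lattice; `d_matrix` is a hypothetical name.

```python
import numpy as np


def d_matrix(weights):
    """Mean distance between each neuron's weights and those of its grid neighbours."""
    rows, cols, _ = weights.shape
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # 4-connected neighbours that fall inside the map
            dists = [np.linalg.norm(weights[r, c] - weights[rr, cc])
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < rows and 0 <= cc < cols]
            out[r, c] = np.mean(dists)
    return out
```

High values mark the frontiers between clusters, and low values mark the valleys.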
For the other map visualizations, all the input vectors used for training (the cell cycles under analysis) are presented to the SOM, and the BMU of every cell cycle is colored according to the labels shown in Figure 11. White neurons are those that have never been a BMU (interpolating neurons); small colored squares indicate that a neuron has been the BMU once or a few times, and large colored squares that it has been the BMU many times.
In the case of the maximum voltage (Figure 11c), lower voltages are found in the upper right zone of the map, whereas higher voltages are at the lower left. Figure 11d,e show the labels corresponding to the charge and discharge rates, respectively. Both maps have a zone at the bottom corresponding to the cell that was stored without being charged. In Figure 11e, a central cluster can be distinguished for cells discharged at a current rate of C/10, but there are no separate clusters for cells discharged at C/2 and 3C/10.
Figure 11f shows the SoH corresponding to each of the cycles used for training. A clear progression can be observed from the lower right area (red), where all the new cell cycles are found, towards the rest of the map, where aging increases. This arrangement seems coherent, since the different conditions of use have less impact on the voltage response of new cells. As the cells degrade, differences appear, just as they would in their equivalent electrical models.
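The labeled maps described above rely on two quantities per neuron: how many times it was the BMU (which sets the square size) and which label it represents. The following sketch assumes that a neuron takes the most frequent label among the cycles it wins. The paper does not state its coloring rule, so `hits_and_labels` and this majority rule are illustrative assumptions.

```python
from collections import Counter

import numpy as np


def hits_and_labels(weights, data, labels):
    """Count how often each neuron is the BMU and assign it its most frequent label.

    Neurons that are never a BMU (interpolating neurons, white in the labeled
    maps) keep a hit count of 0 and the label None.
    """
    rows, cols, dim = weights.shape
    w = weights.reshape(-1, dim)
    per_neuron = [[] for _ in range(rows * cols)]
    for x, lab in zip(np.asarray(data, dtype=float), labels):
        flat = int(np.argmin(np.linalg.norm(w - x, axis=1)))  # BMU of this cycle
        per_neuron[flat].append(lab)
    hits = np.array([len(labs) for labs in per_neuron]).reshape(rows, cols)
    majority = [Counter(labs).most_common(1)[0][0] if labs else None
                for labs in per_neuron]
    label_grid = [majority[r * cols:(r + 1) * cols] for r in range(rows)]
    return hits, label_grid
```

The same function works for any label, such as temperature, maximum voltage, charge or discharge rate, or SoH bins.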
Finally, as an illustration of the use of labeled maps, Figure 12 merges the second map of Figure 10 (trajectories of eight cells) and Figure 11f (neurons colored according to their SoH). Figure 12 thus shows the history (aging) of eight battery cells: all the cell trajectories start at the lower right corner of the map (high SoH) and end in the upper areas (low SoH). Depending on the use conditions of each battery cell, its aging "disease" differs and is represented in a different zone of the map.

Trajectory Maps
As already seen, when a trained SOM is presented with a series of input vectors corresponding to several cycles of a specific battery cell, it provides a BMU for every input vector (cell cycle), and the evolution of that cell can be seen as an aging trajectory linking all the BMUs. The BMU is the neuron for which Equation (1) gives the minimum value; in addition, that minimum value is the quantization error of the input vector with respect to the winning neuron. This error indicates the confidence of the estimated position of the input vector on the map: a low error provides confidence, while a high error suggests that the current input vector has no good match on the map.
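The following sketch returns, for each cycle of a cell, the BMU position together with its quantization error, and optionally flags cycles whose error exceeds a threshold. It again assumes a Euclidean Equation (1). The threshold `max_error` is a hypothetical user choice (for example, a high percentile of the training errors) and is not a value from the paper.

```python
import numpy as np


def trajectory_with_confidence(weights, cycles, max_error=None):
    """Return (row, col, qe, reliable) for each cycle of one cell.

    qe is the distance between the cycle and its BMU; a large value means that
    no neuron matches the cycle well. max_error is a hypothetical threshold.
    """
    rows, cols, dim = weights.shape
    w = weights.reshape(-1, dim)
    out = []
    for x in np.asarray(cycles, dtype=float):
        d = np.linalg.norm(w - x, axis=1)
        flat = int(np.argmin(d))       # BMU index
        qe = float(d[flat])            # quantization error of this cycle
        reliable = True if max_error is None else qe <= max_error
        out.append((*divmod(flat, cols), qe, reliable))
    return out
```

The `qe` values are the heights that the 3-D visualization draws above the map surface for each point of the trajectory.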
The new visualization of Figure 13 (3-D visualization or 'geographic metaphor') was developed from the D-matrix and represents the change in the weights of neighboring neurons in grey levels on the z-axis. Areas of similar weights (valleys) have low values (black color), while the areas with the greatest change in weights (white) suggest cluster borders (mountains). The cell paths and their quantization errors (represented by the height of a point over the surface) were included in the 'geographic metaphor' for observing the cell evolutions. It can be seen that all the cells have their starting point (green points) in the same area of the map, with high SoH (see Figure 12). From there, each one evolves through different zones, as already seen in the labeled maps. Cells 6 and 7, with a cycling temperature of 15 °C, are located in the same valley. Cells 5 and 9, with a charging current of C/10, are located in the central valley, and cells 8, 11 and 12, with cycling temperatures of 25 °C and 35 °C, are grouped in the right zone.
In addition to visual inspection, the correct deployment of the paths can be assessed with the relative deployment index (Table 4) and the separability index (Table 5). The worst separability values were obtained for cells 11 and 12, and for cells 9 and 12, as they lie in the same area of the map. In the case of the deployment index, the worst score was for cell 3, as it travels a shorter distance on the map with a more erratic deployment.

Application Examples
In this section, some applications of the proposed methodology are discussed. A neural network trained with a dataset can be used in inference mode, providing a response to a new input vector. For example, a used battery destined for the second-life market can be analyzed (cycled) in the laboratory, the acquired data (six features in our case) can be presented to the SOM, and the resulting BMU can be used to predict its evolution.
The first application of our methodology considers a new cell that is periodically removed from its use conditions and cycled under specific laboratory conditions, so that its aging can be tracked by following its evolution on the trained SOM. Cell 13, which was not included in the training dataset, was selected as an illustration. All its available input vectors (cycles) were preprocessed and applied to the trained SOM. As shown in Table 1, this cell was cycled at a room temperature of 35 °C, with a charge rate of C/5 and a discharge rate of C/10. Figure 14 shows its trajectory on the SOM, which unfolds over the middle cluster. The distance to the map along the z-axis represents the quantization error, a measure of the certainty of the estimations provided by the SOM. The slightly larger quantization error (mean of 0.809, still a low value) compared with the cells in Figure 13 (mean of 0.650) is due to the lack of similar data in the training dataset (the error grows as the cell ages).
The evolution of cell 13 in relation to the label maps developed in Section 3.2 can be seen in Figure 15. The SOM processes every new input vector and provides new BMUs; notice that some of them are colored BMUs, while others are white BMUs (interpolating neurons). As this cell was not included in the training dataset, some of its input vectors were more similar to interpolating neurons than to the data used for training. This means that the SOM has generalized from the training set and provides a reasonable response to new input data.
Its trajectory lies on the 25 °C-35 °C valley, and Figure 15b shows that the cell had a response similar to that of the cells cycled with a maximum voltage of 4.1 V. Figure 15c,d locate the trajectory in the zone of cells cycled with a charge rate between C/10 and 3C/10 and a discharge rate between C/10 and C/2.
The second application shows how the proposed technique could predict the past conditions of use of a used battery. To evaluate a used cell, laboratory tests similar to those carried out on the cells in the dataset must be performed. These data must then be preprocessed, the resulting six features (polynomial parameters) are used as the SOM inputs, and finally the BMU is marked on the map.
To simulate this process, the last cycle of cell 2 (not included in the training set) was selected. This cell was stored at 35 °C and its End-of-Life was 90%. For the selected cycle, the result is shown in Figure 16 as a point on the output BMU, located in the lower left corner of the map. As can be seen in Figure 16b, the cycle was clustered in the 45 °C zone due to the lack of cells stored at 35 °C in the training set. The results match the conditions for charge and discharge rates (Figure 16c), as the BMU is located in the zone of stored cells. The BMU also represents a similar SoH condition for the presented cycle, as it lies between 90% and 91% SoH. From this result, a life evolution similar to that of the cells located in this part of the map could be predicted.
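The evaluation procedure for a used cell can be sketched end to end:

1. Express the laboratory discharge against SoC, assuming a constant current.
2. Normalize the voltage and fit the 5th degree polynomial to obtain the six features.
3. Find the BMU on a trained map and read the label stored for that neuron, together with the quantization error as a confidence measure.

The function `diagnose_used_cell` and the per-neuron `label_grid` (for example, the dominant storage temperature or SoH of the training cycles won by each neuron, with None for interpolating neurons) are hypothetical names used only for illustration.

```python
import numpy as np


def diagnose_used_cell(t, v, weights, label_grid, n_points=100):
    """Hypothetical inference for one laboratory discharge of a used cell."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    # 1. voltage-time -> voltage-SoC (constant-current assumption)
    soc = 1.0 - (t - t[0]) / (t[-1] - t[0])
    grid = np.linspace(0.0, 1.0, n_points)
    v_soc = np.interp(grid, soc[::-1], v[::-1])
    # 2. normalise (mean 0, std 1) and fit V(x) = a_0 + ... + a_5 x^5
    v_norm = (v_soc - v_soc.mean()) / v_soc.std()
    features = np.polyfit(grid, v_norm, deg=5)[::-1]   # a_0 .. a_5
    # 3. BMU on the trained map, its label and its quantization error
    rows, cols, dim = weights.shape
    d = np.linalg.norm(weights.reshape(-1, dim) - features, axis=1)
    flat = int(np.argmin(d))
    r, c = divmod(flat, cols)
    return (r, c), label_grid[r][c], float(d[flat])
```

A large returned error would suggest that the training set holds no similar cells, as happened with the 35 °C storage condition of cell 2.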

Conclusions
This work proposes a new methodology that combines self-organizing maps with feature extraction based on polynomial coefficients to represent the aging of Lithium-ion cells with different usage conditions. This methodology for battery smart sensing, based on two-dimensional maps that process voltage curves, is intended to complement conventional unidimensional metrics, such as SoH, where only battery capacity is considered.
The SOM is a well-known tool for unsupervised data analysis, but in this work some new metrics were proposed to assess the unfolding of the cell trajectories, and 3D visualizations were introduced that allowed a clear view of the quantization error to determine the uncertainty of the map diagnosis.
On the one hand, these new tools have shown that 5th degree polynomial fitting is an appropriate feature extraction technique for this problem. On the other hand, they validate our methodology, since the cell trajectories on the map are separated and well defined, and the different aging phases can be located. The eight example cells used in the study are well separated according to temperature, so the map can differentiate between high- and low-temperature applications.
In addition, as an illustration of the usefulness of our methodology, two use cases of the trained SOMs were presented. The first considered a new cell with periodic control tests under specific laboratory conditions, which allows its aging to be tracked by following its evolution on the trained SOM. The second applied the methodology to a cell with an unknown previous use, in order to estimate its past conditions of use and current state, and to predict its life expectancy depending on its state of disease, or aging. This was done by comparison with the cell trajectories included in the training dataset.
Although the results achieved are promising, the dataset used in our study includes only 13 cells tested under very specific conditions. A thorough validation of the proposed methodology, and more accurate results, would require a larger dataset with broader usage conditions, but currently only limited databases are available.
Finally, with maps trained on more complete databases, this technique could be used to visualize aging in batteries destined for the second-life market whose past uses are unknown. For instance, it would allow used cells to be assigned to the second-life applications that best fit their capabilities.