Ensemble Surrogate Models for Fast LIB Performance Predictions

Abstract: Battery cell design and control have been widely explored through modeling and simulation. On the one hand, Doyle's pseudo-two-dimensional (P2D) model and Single Particle Models are among the most popular electrochemical models capable of predicting battery performance and thereby guiding cell characterization. On the other hand, empirical models obtained, for example, by Machine Learning (ML) methods represent a simpler and computationally more efficient complement to electrochemical models and have been widely used for Battery Management System (BMS) control purposes. This article proposes ML-based ensemble models for estimating the performance of a Lithium-Ion Battery (LIB) cell across a wide range of input material characteristics and parameters, and evaluates (1) Deep Learning ensembles for simulation convergence classification and (2) structured regressors for battery energy and power predictions. The results improve on state-of-the-art LIB surrogate models and indicate that deep ensembles represent a promising direction for battery modeling and design.


Introduction
This paper considers the task of estimating the energy and power density of LIB designs across a range of characteristics and parameters, and focuses on the problem of obtaining these results without recurrently incurring the considerable computational costs of detailed physical simulations. While state-of-the-art contributions such as [1] exploit Deep Learning neural networks trained on simulated data as surrogate models for this objective, to the best of our knowledge, no prior contribution has evaluated the performance of so-called ensemble models for this task. We address this gap in the literature by comparing state-of-the-art models with a number of ensemble surrogates.
Even though Lithium-Ion Batteries (LIB) have progressively been improved since their market introduction by Sony in 1991, their massive deployment requires them to be further optimized in terms of performance, durability, and safety. From the physical point of view, obtaining these results requires, in the battery design phase, reducing limitations to ion transport, thus avoiding undesirable cell polarization effects. Consequently, an extensive literature describes the effect each design parameter has on transport mechanisms, and therefore on cell performance. On the one hand, the effects of several parameters (including electrode thickness, particle size, tortuosity, and discharge rate) have been investigated by experimental methods. On the other hand, cell design has also been explored through modeling approaches, demonstrating that model-based design can be used to reduce the number of experiments, while accurately describing battery performance and providing guidance for battery characterization.
In general, the battery models presented in the literature mainly fall into two categories: physics-based electrochemical models and empirical ones. Empirical models learned from simulated data can, in addition, act as fast surrogates of the physical simulators. Contributing to the domain of such surrogates, Wu et al. [1] propose to address cell design characterization by combining Machine Learning classifiers and regressors based on deep feed-forward neural network models. While the first neural network is a classifier that predicts whether a set of input design variables would result in a physically realizable cell, the second estimates the specific energy and specific power of the design. Both neural networks are trained and validated using data from finite-element, thermo-electrochemical simulations.
Although ensemble models are well represented in the literature, to the best of our knowledge, no contribution has yet evaluated their performance as surrogates of physical LIB simulators. This paper addresses this gap by comparing a state-of-the-art LIB surrogate model based on deep feed-forward networks with a set of ensemble models integrating deep classifiers and regressors. In this sense, we demonstrate the accuracy of composite surrogate models for the estimation of the power and energy density of a given LIB parameter set. The input and output parameters taken into consideration for the electrochemical simulations are listed in Table 1. Their values are measured by proprietary electrochemical and physicochemical protocols for a set of proprietary electrodes at CIDETEC, and vary between the minimum and maximum values shown in Table 1. In terms of application, the present paper does not directly focus on optimization. Instead, as previously done in [1], it considers surrogates that can be used to estimate the performance of battery designs across a wide range of material characteristics and parameters. This applicative objective reduces the need for surrogate models that can output estimated values together with their uncertainties, for instance by exploiting techniques such as Monte Carlo Dropout [30] and Variational Inference [31]. Consequently, in the rest of our treatment, we limit ourselves to non-probabilistic surrogates.
The methodology we introduce and detail in the sections below is based on the general framework put forward by [32,33], integrating and extending the Pseudo-2D surrogate proposed by [1] to evaluate the performance of composite surrogate models integrating both classification and regression. Through classification, a Machine Learning model evaluates the convergence of the simulator, while, through regression, the algorithms predict the output parameters of the Ragone plot [34] (energy and power, as per Figure 1). We extend the state of the art in [1] by showing the advantage of adopting ensemble methods for convergence classification and structured regression for the estimation of the Ragone parameters, rather than simpler feed-forward networks for both tasks. Furthermore, from the methodological point of view, we apply a quantitative performance evaluation procedure based on K-fold validation [35] rather than on the simple train/test/validation splitting used in [1], which also allows us to evaluate how well the models generalize. The accuracy of some of the ensemble surrogate models we introduce compares favorably to that of the state-of-the-art method introduced in [1].

Methods
The construction and validation of the P2D-based surrogate model have involved several steps:
• Selection of the appropriate input and output parameters. Measurable electrode and electrolyte transport properties are selected as input parameters, while the parameters of the well-known Ragone plot (energy and power) are selected as outputs for battery characterization purposes, as shown in Table 1, where the minimum and maximum values for each parameter are also reported.
• Selection of the electrochemical model and Design of Experiments. A proprietary P2D model is used to generate the data set.
• Development of surrogate models. Each surrogate is composed of three models: a binary simulation convergence classifier and two separate regression models that estimate integrated power and energy density values from the input parameters (see Figure 2).
• Model validation, carried out similarly as for typical data-driven models.

Electrochemical Model and Design of Experiments
The electrochemical model used for building the data set is expressed in terms of four conservation relations described in Smith et al. [36] by partial differential equations and their corresponding boundary conditions. The numerical approach considered for solving the system of equations is based on the Finite Element Method (FEM) for space discretization, as implemented in the open-source FEniCS toolkit (https://fenicsproject.org/, accessed on 8 May 2021), and on implicit Euler methods for time discretization.
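To illustrate the time-discretization scheme, the following minimal sketch applies backward (implicit) Euler to a scalar linear test ODE, du/dt = -λu, for which the implicit update can be solved in closed form; the actual P2D system couples this kind of scheme with a FEM spatial discretization and requires a nonlinear solve per step.

```python
# Minimal sketch of implicit (backward) Euler time-stepping on the scalar
# test ODE du/dt = -lam * u. The implicit update u_{n+1} = u_n + dt * f(u_{n+1})
# reduces here to u_{n+1} = u_n / (1 + lam * dt); the real P2D model solves a
# nonlinear FEM system at each step instead.

def implicit_euler_linear(u0: float, lam: float, dt: float, n_steps: int) -> list:
    """Propagate du/dt = -lam*u with backward Euler and return the trajectory."""
    u = [u0]
    for _ in range(n_steps):
        u.append(u[-1] / (1.0 + lam * dt))
    return u

trajectory = implicit_euler_linear(u0=1.0, lam=2.0, dt=0.1, n_steps=10)
```

Unlike explicit Euler, this update is unconditionally stable for λ > 0, which is why implicit schemes are commonly preferred for stiff electrochemical systems.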
The data set is generated by running the model with the selected input variables and their ranges presented in Table 1.
As in Wu et al. [1], in this paper, we focus on the variables that are controllable during battery manufacturing. Among all the design variables, the electrode thickness, porosity, and particle radius are chosen because they can be controlled within the material or electrode manufacturing steps. For example, electrode thickness and porosity can be experimentally adjusted during the calendering step of the cell manufacturing process. Separator thickness and porosity ranges are defined considering commercially available components. When it comes to the electrolyte, two further tunable parameters are selected for the design of experiments, as they play an important role in determining thick-electrode performance limitations due to the dry-out of the electrolyte: the ionic conductivity (k_e), which can be modulated by doping salts, acids, metals, alkali, etc. into the solvent matrix, and, for a similar reason, the initial salt concentration (c_0). For the sake of simplicity, we have assumed that the ionic conductivity has no dependence on the electrolyte concentration; in further developments, we envision that this dependency will be taken into account. Finally, the applied C-rate is also selected as an input variable, as this parameter may vary from very low to very high values depending on the battery application. Constant-current discharge protocols are used as a baseline, with the aim of characterizing the cell in terms of energy and power during discharge; even so, the testing protocol could be adapted to other battery applications. The rest of the parameters needed to simulate battery performance using a P2D model are summarized in Table 2 and Figure 3.
A simulator produces values for energy and power density integrals, together with a convergence flag indicating the successful execution of a run, up to a fixed end time. Run-stopping criteria corresponding to non-convergence involve, for instance, physically meaningless negative ion density values. A classifier and two regressors learned from the simulated data can be exploited to generate estimates: the classifier estimates whether a simulation will produce physically unrealizable conditions, and therefore stop without completing a run, while the two regressors estimate integrated energy and power for converging runs.

Considering the previous inputs, we generate a simulated data set by running a P2D model implemented in Python. As previously mentioned, simulations consider a given set of input parameter values and propagate a set of variables describing the electrochemical status of the battery up to a well-defined and fixed time limit. Upon both convergence and premature termination of the simulation, Ragone energy and power estimates are integrated as per

E = ∫_0^{t_d} I V(t) dt,    P = E / t_d,

where t_d is the discharge time. The Ragone plot [34] is commonly used to illustrate the performance of energy storage devices, and is widely used to compare technologies catering to specific demands. Accordingly, in this work, we generate a Ragone plot for a wide range of electrode/cell design parameters, aimed at a variety of future designs of electrochemical energy storage devices for different applications. Constant discharge currents are used as a baseline, since the Ragone plot is usually generated with a constant C-rate; accordingly, we do not consider variable currents. Reasons for failing to simulate up to the specified end time include reaching physically meaningless or non-realizable situations, such as negative values for ion concentrations.
In these cases, the run will stop prematurely, producing only partial estimates for the above Ragone integrals, and appropriately setting a negative convergence status flag.
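As a concrete illustration, the Ragone quantities can be computed from a discharge trace by numerical quadrature. The sketch below assumes the common constant-current form E = ∫ I·V(t) dt with mean power P = E/t_d; the exact integrands used by the proprietary P2D code are not reproduced here.

```python
# Hedged sketch: integrating discharge energy and mean power from a simulated
# constant-current discharge trace by the trapezoidal rule. Assumes
# E = integral of I * V(t) dt over the discharge window and P = E / t_d;
# the actual P2D integrands may differ (e.g., per-mass normalization).

def ragone_integrals(times, voltages, current):
    """Return (energy, mean_power) over the discharge window [times[0], times[-1]]."""
    energy = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        energy += 0.5 * (voltages[k] + voltages[k - 1]) * current * dt
    t_d = times[-1] - times[0]
    return energy, energy / t_d

# toy trace: 2 A discharge, voltage sagging from 4.0 V to 3.0 V over 2 s
E, P = ragone_integrals([0.0, 1.0, 2.0], [4.0, 3.5, 3.0], current=2.0)
```

For a prematurely stopped run, the same quadrature applied to the partial trace yields the partial Ragone estimates mentioned above.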

Surrogate Modeling
We generate a surrogate model to reproduce the results of the P2D model simulator. Once trained, this empirical model can approximate simulation results in a fraction of the running time of the physical simulator, keeping the approximation errors under control. The input parameters taken into account correspond to the inputs of simulations. As previously mentioned, the output variables include the convergence status of the simulation, and the Ragone plot variables corresponding to energy and power integrals. The proposed surrogate model structure is a composite one: the convergence status is modeled by a binary classifier (Section 2.2.1) and the energy and power output integrals are approximated by regressors (Section 2.2.4).
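The composite structure can be summarized in a few lines of code. In this sketch, the three model objects are hypothetical stand-ins (simple callables on a feature vector); the point is only the gating logic, in which the regressors are evaluated only for designs the classifier deems physically realizable.

```python
# Sketch of the composite surrogate: a convergence classifier gates two
# regressors. The model arguments are hypothetical stand-ins, each a callable
# taking a feature vector; real models would expose a predict() method.

def composite_predict(x, clf, energy_reg, power_reg):
    """Return (converges, energy, power); regressors run only for
    designs the classifier predicts to be physically realizable."""
    if not clf(x):
        return False, None, None
    return True, energy_reg(x), power_reg(x)

# toy stand-ins: a run 'converges' iff the first parameter is positive
result = composite_predict(
    [0.5, 1.2],
    clf=lambda x: x[0] > 0,
    energy_reg=lambda x: 100.0 * x[0],
    power_reg=lambda x: 50.0 * x[1])
```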

Simulation Convergence Classification
We consider that, in a Pseudo-2D electrochemical model, the completion status of a simulation can either be full (corresponding to propagation up to the intended complete temporal extension) or partial (corresponding to early termination). This status can depend on several phenomena with varying degrees of complexity. Those phenomena range from the consumption of chemical species to variations in the separator/electrode interface resistance due to parasitic ion deposition and layering reactions. Modeling the range of phenomena involved requires adding flexibility to the surrogate model. To that end, we combine the prediction capabilities of committees of ML classifiers by using ensemble methods [37].
The key hypothesis is that, by exploiting extended ensembles of classifiers, we can implicitly learn about the different mechanisms that can stop the execution of the simulator. The resulting composite models can then better reproduce the varying degrees of complexity of the physical phenomena that lead a simulation to an early stop, thereby improving on state-of-the-art approaches such as the one in [1].
To evaluate this hypothesis, we compare a single feed-forward network (indicated by the 'single' label in subsequent tables and diagrams), corresponding to the state of the art in [1], to a set of ensemble binary classifiers based on different approaches: a voting ensemble composing a small number of tree-based classifiers ('voting'), an efficient Gradient Boosting ensemble implementation ('xgboost'), and a stacking ensemble of models including feed-forward networks ('stacking'). The list of classifiers is in Table 3. We briefly introduce each model in the present section and detail implementation parameters in Section 3, dedicated to experiments.

Feed-Forward Deep Network
We start by establishing a baseline for the performance of convergence estimators, by considering a deep learning classifier as per the state of the art in [1], as a reference against which the performance of the rest of the models can be quantitatively evaluated.
To that end, we define a fully connected network with a single hidden layer, considering a log-entropy loss function suited for classification [38], and an ADAMW optimizer [39] to address possible weight decay issues [40]. The limited structural complexity of the network corresponds to the results published in the literature.
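A minimal version of such a baseline can be sketched with scikit-learn's MLPClassifier. The hidden-layer width, the synthetic data, and the use of plain Adam (scikit-learn does not expose ADAMW) are illustrative assumptions, not the configuration of [1], whose hyper-parameters are given in Table 6.

```python
# Hedged sketch of a single-hidden-layer, fully connected baseline classifier.
# scikit-learn's MLPClassifier uses a log-loss criterion for classification;
# hidden width, solver, and the toy data below are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 11))   # 11 design inputs, as in Table 1
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy convergence label

clf = MLPClassifier(hidden_layer_sizes=(32,), solver="adam",
                    max_iter=300, random_state=0)
clf.fit(X, y)
accuracy = clf.score(X, y)
```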

Voting Ensemble of Classifiers
We build a first ensemble classifier by combining three simple models. The combination operates by basic voting among the level-1 members listed in Table 4, which we describe briefly. A pruned decision tree is one in which sections that are not critical to reducing an empirical loss function are iteratively eliminated; the pruning considers the Minimum Description Length principle to reduce the risk of over-fitting [41]. The second type of level-1 model is a decision stump, a basic single-level decision tree [42]. The third is a random forest, itself an ensemble of decision trees [43] that outputs the mode of the outputs of the trees that compose it.

Table 4. Voting ensemble instance types.

Voting Ensemble Member Type             Instances
Pruned decision tree                    1
Decision stump optimized by AdaBoost    1
Random forest                           1

We then consider a Gradient Boosting implementation in XGBoost [44]. This specific model is selected on account of its flexibility and of the robustness of available implementations, which draw on efficient linear model solvers and effective tree learning algorithms.

We complete the set of classifiers by taking into account a further model ensemble that includes instances of the feed-forward network. In this last case, the stacking is performed by a second-level classifier that takes as input the outputs of the first-level classification models that compose the committee. In this sense, the predictions of the first-level models are combined by an ensemble bagging/boosting mechanism based on a 'stacking' second-level classifier.
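The voting committee can be sketched with scikit-learn equivalents of the three member types: a depth-limited decision tree (standing in for MDL-based pruning), an AdaBoost ensemble whose default base learner is a depth-1 stump, and a random forest. Depths, ensemble sizes, and the toy data are illustrative assumptions.

```python
# Hedged sketch of the Table 4 voting committee using scikit-learn stand-ins.
# The max_depth=4 tree approximates a pruned tree; AdaBoostClassifier's
# default base learner is a decision stump; hard voting takes the majority.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(300, 11))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # toy non-linear convergence label

voting = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=1)),
    ("stump_boost", AdaBoostClassifier(n_estimators=25, random_state=1)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=1)),
], voting="hard")
voting.fit(X, y)
accuracy = voting.score(X, y)
```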
The models considered in the set are listed in Table 5. The three level-1 instances of the feed-forward network in the ensemble are similar in structure yet independent of one another, in the sense that they are independently trained on non-overlapping subsets of the data set. Detailed descriptions of the models, together with the results obtained on a test data set, are reported and compared in Section 3.

Ragone Variable Regression
Regression is carried out separately on the integrated energy and power values generated by the simulation. The input data set is filtered considering only the samples that correspond to complete propagation of the model through time. The integrated energy and power values are approximated by regressors whose functional structure has a basis in the physical model. In particular, a polynomial structure is extended to a more general one by considering an exponential transformation of the ionic conductivity of the electrolyte, so that an estimate V̂ for one of the integrated Ragone quantities (energy or power) is expressed in terms of a polynomial of order N of the Θ input parameters θ_i of the simulation, as well as of their exponentials exp(θ_i). The inclusion of the exponentials of the simulation parameters is motivated by the structural form of the Butler-Volmer equation, as well as by the solution implied by the conservation and diffusion equations in the model. Note that, for simplicity, neither temperature nor concentration dependence is assumed for parameters such as the ionic conductivity in the present contribution. We plan to address this relation in future extensions of this work.
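A minimal numerical sketch of such a structured regressor follows, with an order N = 1 polynomial: the design matrix augments the raw parameters θ_i with their exponentials exp(θ_i), and a least-squares fit produces the linear combination. The synthetic target and coefficient values are illustrative, not those identified in the paper.

```python
# Sketch of the structured regression features: the design matrix contains a
# constant column, the raw parameters theta_i, and their exponentials
# exp(theta_i). An order-1 polynomial is an illustrative simplification.
import numpy as np

def structured_design_matrix(theta: np.ndarray) -> np.ndarray:
    """Columns: [1, theta_1..theta_T, exp(theta_1)..exp(theta_T)]."""
    ones = np.ones((theta.shape[0], 1))
    return np.hstack([ones, theta, np.exp(theta)])

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 1.0, size=(200, 3))
# synthetic ground truth lying in the span of the structured features
target = 2.0 * theta[:, 0] + 1.5 * np.exp(theta[:, 1]) + 0.5

A = structured_design_matrix(theta)
coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
pred = A @ coeffs
mse = float(np.mean((pred - target) ** 2))
```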

Results
The simulated data consist of 11 input and 3 output attributes, as detailed in Table 1. The outputs are not temporally located, since they represent the integration of the state of a simulation that is successfully extended until a given time limit. Concerning the available data and parameter measurements, we observe that uncertainties from sensors and experimental measurements are not considered at this stage. We believe that sensor noise is more critical in BMS development, as it could have a large impact on the performance of the numerical algorithms used to estimate the state variables. Such uncertainties can be incorporated into treatments such as the present one by, e.g., Variational Inference techniques, modifying the output layers of the network architectures considered and the loss functions specified.

Simulation Convergence Prediction by Classification
We partition the data set into separate training, testing, and validation sets by 60/20/20% stratified sampling without replacement, so that the distribution of the simulation convergence status is preserved in the generated sets. The data points from each set are shown in Figure 4, where asterisks mark partial, non-converging runs, samples in the 20% validation set are shown in orange, and samples from an example 60%/20% training/testing split are depicted in green and blue, respectively. We use K-fold validation [35] as a strategy for measuring the performance of the classification methods, with the K parameter set to 8; while the validation subset is fixed, the train/test split varies by k ∈ {1 . . . K} in the K-fold validation procedure. The input variables are normalized by a Z-score transformation in the preprocessing step, and the randomized data are organized in batches of 32 samples to speed up the learning [45,46]. The values of the input simulation parameters tend to be uniformly distributed in the input data set, with a maximum absolute value of the Pearson correlation between variables of around 0.19.
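The partitioning and preprocessing can be sketched as follows: a 60/20/20 stratified split that preserves the convergence-label distribution, followed by a Z-score normalization fitted on the training portion. The splitting ratios follow the text; the synthetic data and class proportion are illustrative assumptions.

```python
# Sketch of the preprocessing pipeline: 60/20/20 stratified sampling without
# replacement (preserving the convergence-label distribution), then Z-score
# normalization with parameters estimated from the training set only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(500, 11))
y = (rng.uniform(size=500) < 0.3).astype(int)   # ~30% non-converging runs (assumed)

# first carve off 60% for training, then split the rest 50/50 into test/validation
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=3)
X_test, X_val, y_test, y_val = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=3)

scaler = StandardScaler().fit(X_train)          # Z-score parameters from training data
X_train_z = scaler.transform(X_train)
```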
To establish baseline results, we first consider a state-of-the-art single fully connected feed-forward network as in [1]. The hyper-parameters for the surrogate are summarized in Table 6. The number of epochs is experimentally limited to 15 by observing the convergence plots produced during the learning phase. As a consequence of the limited number of training epochs, training time is limited to five seconds on a single Intel i7-8550U CPU running at 1.80 GHz. Though a GPU can be used as a learning device, the fast convergence of the learning procedure means that learning directly on the CPU is practically usable. The Confusion Matrix for the single feed-forward network for simulation convergence estimation, as trained with the whole training/testing set and measured on the validation set, is reported in Table 7. The results for this fully connected deep classifier are included and compared in Figure 5.

Table 7. Confusion Matrices for a single run of the simulator convergence classification models: feed-forward network (a), voting ensemble (b), gradient boosting ensemble (c), deep stacking ensemble (d). The results for the single run may not be representative of the full results obtained via the K-fold validation in Figure 5.

A first ensemble model implements a voting committee with simple tree-based models, as described in Section 2.2.1 above. The confusion matrix is reported in Table 7, while K-fold and validation results are summarized and compared in Figure 5.
After that, a gradient boosting ensemble classifier is configured with the hyper-parameter values in Table 8. At each training cycle, an ensemble of decision trees with a predefined maximum depth is improved by adding estimators for the residuals, so as to globally optimize a given objective function [48]. In our case, the maximum depth and the number of training cycles for the classifier are both set to two, while the objective taken into account for the loss function minimization is a binary logistic function. The step size shrinkage hyper-parameter used in the updating steps to prevent over-fitting is set to one, indicating that no shrinking should take place. The Confusion Matrix for the gradient boosting simulation convergence estimator, trained with the whole training/testing data and validated on the validation set, is reported in Table 7. The resulting performance indicators for the gradient boosting ensemble classification model are included in Figure 5.

Finally, we consider an ensemble convergence model operating by a second-level classifier stacking the results obtained by first-level classifiers as per Section 2.2.1. The Confusion Matrix for the stacking ensemble simulation convergence model is reported in Table 7. The performance for the simulator convergence prediction of the stacking ensemble model is again reported in Figure 5.
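The gradient boosting configuration described above (maximum depth 2, two boosting rounds, logistic objective, learning rate 1.0 so that no shrinkage is applied) can be sketched as follows. We use scikit-learn's GradientBoostingClassifier as a stand-in where the XGBoost implementation cited in the text is unavailable; the toy data are illustrative.

```python
# Hedged sketch of the gradient-boosting convergence classifier with the
# Table 8 settings: max tree depth 2, 2 boosting rounds, logistic loss,
# learning rate 1.0 (no step-size shrinkage). GradientBoostingClassifier
# is used here as a stand-in for the XGBoost implementation in the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(300, 11))
y = (X[:, 0] > 0.2).astype(int)   # toy threshold-style convergence label

gb = GradientBoostingClassifier(max_depth=2, n_estimators=2,
                                learning_rate=1.0, random_state=4)
gb.fit(X, y)
accuracy = gb.score(X, y)
```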

Regression
Regression is carried out separately on the integrated energy and power values generated by the simulation. The input data set is filtered considering only the samples that correspond to the successful propagation of the model through time. For both models, we consider 70% of the samples for training and the remaining 30% of the samples for testing.

Energy
The distribution of the energy integral is shown in Figure 6. Values lower than 80 J are considered outliers and filtered out. By considering the correlations between the energy and the input variables, a first simple model is proposed to predict the values of the energy integral, Ê, in the form of Equation (2):

Ê = a · L_p + b · e^(c·k_e) + d,    (2)

with L_p the thickness of the positive electrode and k_e the ionic conductivity of the electrolyte. The resulting mean squared error and R² score are MSE = 32.83 and R² = 0.848, respectively. In order to improve these results, a more complete model is introduced, as shown in Equation (3):

Ê = a · L_p + b · e^(c·k_e) + d · L_n + e · C_rate + f,    (3)

where L_n represents the thickness of the negative electrode and C_rate is the applied current as a C-rate. Doing so results in a mean squared error of MSE = 19.30 and an R² = 0.911. Both models can be compared as in Figure 7a.
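Fitting a model of the Equation (3) form can be sketched with scipy's curve_fit, which handles the nonlinearity in the exponent c. The coefficient values and synthetic data below are illustrative, not the ones identified in the paper.

```python
# Sketch of fitting the 'complete' energy model
# E_hat = a*L_p + b*exp(c*k_e) + d*L_n + e*C_rate + f
# with nonlinear least squares on synthetic, noiseless data.
import numpy as np
from scipy.optimize import curve_fit

def energy_model(X, a, b, c, d, e, f):
    L_p, k_e, L_n, C_rate = X
    return a * L_p + b * np.exp(c * k_e) + d * L_n + e * C_rate + f

rng = np.random.default_rng(5)
L_p, k_e, L_n, C_rate = (rng.uniform(0.0, 2.0, 300) for _ in range(4))

true_params = (2.0, 1.5, 0.3, -1.0, 0.5, 10.0)   # illustrative ground truth
y = energy_model((L_p, k_e, L_n, C_rate), *true_params)

params, _ = curve_fit(energy_model, (L_p, k_e, L_n, C_rate), y, p0=np.ones(6))
pred = energy_model((L_p, k_e, L_n, C_rate), *params)
mse = float(np.mean((pred - y) ** 2))
```

On real simulation outputs, the fitted coefficients and the resulting MSE/R² would be reported as in the text above.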
To ensure that the models have the desired accuracy and variance, some cross-validation is needed. To avoid the possibility of high bias in cases of limited data, a K-fold cross-validation technique is used with K = 30, meaning that the data set is split into 30 different groups. The value of K is chosen so that each train/test group of data samples is large enough to be statistically representative of the broader data set. The resulting R² and MSE values are compared to those obtained when creating the model in Figure 7b,c.
From Figure 7b,c, it can be concluded that, for the Simple Model, in around 20 out of 30 folds, the R² values are higher than 0.8 and the MSE is below 25. For the Complete Model, in around 20 out of 30 folds, the R² is higher than 0.9 and the MSE lower than 20. The models behave as expected and are validated by comparing these values with the ones obtained when first fitting the models.

Power
The distribution of the power integral of the simulation outputs is shown in Figure 8. Values higher than 285 W are considered outliers, and the data set is again filtered accordingly. The power integral presents a significantly higher correlation with the applied current and the thickness of the positive electrode than with the other input variables. Thus, a first simple linear regression is proposed to predict the values of the power integral, P̂, based on the following equation:

P̂ = a · C_rate + b · L_p + c,    (4)

with C_rate the C-rate and L_p the thickness of the positive electrode. The results obtained by optimizing the constants in Equation (4) can be observed in Figure 9a. The resulting mean squared error and R² score are MSE = 183.49 and R² = 0.988, respectively. As in the case of the energy estimates, K-fold cross-validation is performed on the developed model, with K = 30, meaning that the data set is again split into 30 different groups.

Discussion
The results above indicate that an effective way to speed up the characterization of LIB designs involves complementing simulations with empirical ML surrogate models learned from simulation results [33]. In terms of computational performance, P2D simulations using CIDETEC's proprietary code take, on average, 20 seconds for a 1C discharge; once the surrogate models are trained, predictions can instead be obtained in a few hundredths of a second. Deep ensembles and structured regression models represent an interesting direction for the development of efficient surrogate models capable of rapidly and accurately predicting Ragone plot variables such as energy and power integrals. Those results can be integrated into design workflows and extended to other fields such as management and control. In the specific case of predicting the early termination of the simulator by classification, the results indicate that ensemble models can indeed outperform state-of-the-art surrogates based on deep fully connected networks, exploiting their flexibility in implicitly learning the multiple mechanisms that can stop the execution of the simulator.
On the other hand, it is perhaps worth mentioning that an important point that is sometimes overlooked [49] is that physical simulations are complemented rather than substituted by such data-based modeling: some of the computational costs are moved in time rather than eliminated, since some simulated data need to be generated first for the learning to take place.

Conclusions and Future Work
The present contribution has introduced and evaluated composite surrogate models for the prediction of the performance of Lithium-Ion Batteries, combining classifiers based on deep ensembles with structured regressors for the prediction of energy and power densities. We have compared a single state-of-the-art feed-forward network to three types of ensemble binary classifiers for the prediction of simulation convergence. The quantitative results obtained indicate that ensemble models can indeed outperform state-of-the-art models based on a single deep network. Furthermore, we have quantitatively validated the applicability of structured regressors to the estimation of LIB energy and power. Future work includes analyzing feature and parameter sensitivities [50] and applying ML explainability/interpretability methods to ensemble models from an electrochemical perspective. Furthermore, model and data uncertainty management [51,52] should also be analyzed: the quantification of the risk from different sources is valuable for the real-world applicability of the developed methodology.