Review Reports - Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

Round 1

Reviewer 1 Report

See the attached file

Comments for author File: Comments.pdf

Author Response

Response letter

Manuscript ID: applsci-1168244

Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

Danpeng Cheng 1‡, Wuxin Sha 1‡, Linna Wang 2, Shun Tang 1, Aijun Ma 3, Yongwei Chen 3, Huawei Wang 3, Ping Lou 4, Songfeng Lu 5, Yuan-Cheng Cao 1,*

Response to Reviewer #1

We thank Reviewer #1 for the thoughtful and encouraging comments about our manuscript, and welcome the opportunity to address and clarify the issues raised in the report. Our responses to the points raised in the report are described below.

General comments:

This work should be revised in order to enhance its quality and impact. Indeed, the standard sections of a scientific contribution are not present in this work or some sections are mixed. As a consequence, this work should be rewritten and the standard sections should be correctly inserted within the main text. The currently lacking sections are “Materials and Methods”, “Results”, “Discussion”. In addition, the presentation of the main results should be improved and discussed with reference to the current state of the art. In other words, the value of this work is not clearly presented to the interested readers.

Response to general comments:

We sincerely thank Reviewer #1 for supporting our work. We have further revised the manuscript based on Reviewer #1's suggestions: we reorganized the article into “Materials and Methods”, “Results” and “Discussion” sections, simplified redundant paragraphs, and supplemented the missing parts. We believe that these revisions make the manuscript more coherent and of higher quality, and we hope that it will now be accepted by Applied Sciences.

Comment Section1: “2. Battery cycle data generation: data set description”

*) It is not clear whether this section should be placed into “Introduction” or into “Materials and Methods”

Response to Comment 1-1:

We thank Reviewer #1 for the kind and valuable suggestions. We have divided this section and placed its parts into “Introduction” and “Materials and Methods” according to their content.

Lines: “Figure 1. Cycle performance of LFP/NCM solid-state lithium polymer batteries: discharge capacity decreases with cycle numbers. Fig. 1 shows the discharge capacity as a function of cycle number for the whole cycle life, where the color denotes different batteries. The capacity fade speeds are different in the first few cycles and there are even some noisy fluctuations in the capacity fading curves. Considering the high dimensional test data forms and intricate degradation mechanism of solid lithium batteries, we employ a data-driven approach using a series of descriptors extracted from the full voltage curves of each cycle. The specific feature selection and algorithm design are discussed in the next section.”

*) It is not clear where this figure should be placed. Perhaps in “Materials and Methods”

Response to Comment 1-2:

We thank Reviewer #1 for the kind suggestions. We moved this paragraph, together with Figure 1, into “Materials and Methods”.

Comment Section. “3. Feature Selection”:

Lines: “Many physical and chemical battery models have been proposed for diverse degradation mechanisms such as the growth of the solid electrolyte interphase, the fracture of lithium dendrites, active material loss, and interfacial polarization, all of them are listed in Fig. 2 [22-26]. In general, the causes of battery degradation can be divided into two categories, capacity fading and impedance increase. Both of them have some related measurable physical quantities. For commercial lithium ion batteries, impedance doesn’t change too much because the liquid electrolyte wets the gaps inside electrode particles as well as the gaps between electrodes and electrolytes. 19 In contrast, Impedance increasing takes up a major degradation factor for solid state batteries. 6 Delicate point contact is expected for solid-solid interfaces in solid state batteries. These solid electrolyte/electrode interfaces significantly govern the electrochemical properties and cycle lifetime, as unwanted reaction products at the interface cannot dissolve and diffuse in the solid electrolyte [27, 28]. Hence, it is necessary to employ several data features to capture the electrochemical impedance evolution of battery during cycling.”

*) These lines seem to belong to the “Introduction” sections

Response to Comment 1-3:

We thank Reviewer #1 for the thoughtful suggestions. We moved these lines into the “Introduction” section.

Lines: “Nevertheless, considering the lack of cycle-by-cycle impedance spectroscopy in our dataset and the impracticality and expensive cost of real-time impedance testing, we decide to capture the impedance evolution from voltage-time curve. Luckily, voltage-time curve does reflect battery aging information caused by impedance increasing and similar approaches have been proved efficient by some researchers' previous work [29]. Due to the existence of internal impedance, the electrode potential deviates from the equilibrium potential when the current passes through the electrode. This phenomenon is called polarization, which produces overpotential as shown in Fig. 3. The result of polarization is that the terminal voltage of a battery is lower than the equilibrium potential when the battery is discharging and higher than the equilibrium potential when the battery is charging. Therefore, the working voltage curve of the battery becomes the superposition of polarization information inside the battery. For example, Chie Koga et al. have proved the efficiency of quasi polarization, defined as a half of the difference in the averaged value between charge and discharge voltage curve. Quasi polarization monotonically increases with cycle numbers and roughly estimate the evolution of polarization.”

*) These lines seem to belong to the “Material and Methods” sections.

Response to Comment 1-4:

We thank Reviewer #1 for the valuable advice. We moved these lines into the “Materials and Methods” section.

Lines: “We also established a measure of polarization degree, which is the kurtosis of charge/discharge voltage difference sequence (gray area), as shown in Fig. 4. Kurtosis is a common mathematical feature, which can be used to measure the steepness of probability distribution of random variables. Its calculation formula is as follows. The larger the kurtosis value is, the more concentrated the probability distribution is. The smaller the kurtosis value is, the more uniform the probability distribution is. Because the overpotential caused by polarization concentrate at the beginning and end of charge/discharge process in the early battery cycles, and then gradually fills the whole charge/discharge process, we proposed that the kurtosis value would decrease with the cycle numbers. As shown in Fig. 4, taking 6# battery as an example, its kurtosis value does monotonously decrease with cycle numbers, which proves the correctness of our assumption.

Based on the above discussion of degradation mechanisms and analysis of charge/discharge curves, 30 features are selected for the RUL prediction (Supplementary Table. 1). Some features represent the fading of capacity. The other features represent the increasing of internal resistance. Furthermore, as shown in Fig. 5, by sorting their coefficients of association between feature value and RUL, 11 important features are retained and described in detail as below:

Feature 1-2: Charge/Discharge time. Time of constant current charge/discharge progress. The duration indicates the capacity which the battery can be charged. Hence, this phenomenon can reflect the battery degradation process

Feature 3-4: Charge/Discharge capacity. The total capacity of constant current charge/discharge progress

Feature 5-6: the integral value of charge/discharge capacity-voltage curves: capacity-voltage curves are drawn with charge/discharge capacity as the x-coordinate and the output voltage as the y-coordinate. The integral value can be used as a measure of the energy, containing the heating energy consumed by the polarization internal resistance.

Feature 7-8: Turning point coordinates of capacity-voltage curve. As shown in Fig. 3, turning points are defined as the positions where the slope of capacity-voltage curves dramatic change.

Feature 9-10: The Kurtosis and skewness of difference between charge and discharge voltage curve. Skewness, also known as third-order central moment, can be used to measure the asymmetry of probability distribution of random variables. The formulation is

$$S = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^{3} \qquad (1)$$

where S represents the skewness, n is the number of samples, X_i is the value of the ith sample (here is the voltage difference), μ and σ are the mean and standard deviation of the voltage difference sequence.

Kurtosis, also called fourth-order central moment, can be used to measure the steepness of probability distribution of random variables. The formulation is

$$K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^{4} - 3 \qquad (2)$$

where K represents the kurtosis, n, X_i, μ and σ are the same as Equation (1). Here we have subtracted the value by 3 so that the kurtosis of the normal distribution is zero, which also explains the negative kurtosis values in Fig. 4.

Feature 11: Energy dissipation. The difference value of feature 5 and feature 6.”

*) On the contrary, these lines seem to belong to the “Materials and Methods” sections. Please rework accordingly.

Response to Comment 1-5:

We thank Reviewer #1 for the valuable suggestions. We reworded these lines and moved them into the “Materials and Methods” section.

Lines: “4. Algorithm selection and experimental results

Cycle life and RUL prediction is a regression task. Since there is no clear formula to describe the relationship between the proposed features and cycle numbers quantitatively, we choose symbolic regression (SR), a genetic programming-based analysis model, to solve this problem because it could generate an expression close to the theoretical formula by progressive optimization. The model structure is shown in Fig. 6. Firstly, a series of initial functions are randomly generated to roughly fit the mathematical relationship between the proposed features and RUL. Then, this algorithm allows these initial functions to breed, mutate and evolve on account of the survival of the fittest. We use a Python library called gplearn to implement symbolic regression programming. Hyper-parameters (p_crossover = 0.5, p_subtree_mutation = 0.1, p_hoist_mutation = 0.2, p_point_mutation = 0.1) and function sets ('add', 'sub', 'mul', 'div', 'sqrt', 'log', 'abs', 'neg', 'inv', 'max', 'min') adopted herein is to obtain a brief but accurate description. After several populations of training, an optimized expression generated by our SR model outputs predicted values approaching the ground-truth RUL values in our training battery data.

Among the produced features, only those with high accuracy and low complexity are explainable for researchers to better understand the relationship of solid-state batteries to their characteristics. Therefore, the fifth formula, containing features X20(Kurtosis) and X1(Charge specific capacity), in Table 1. met trade-off between accuracy and complexity. Our model achieves excellent prediction accuracy from the perspective of coefficient of determination (R2, 0.88) and root-mean-square error (RMSE, 20.97).

We also conduct the comparison of predicted results and errors with the other three machine learning methods, support vector regression (SVR), Gaussian process regression (GPR) and elastic net (EN), as shown in Table 2. SVR is a non-parametric algorithm based on Support Vector Machine, GPR a kernel-based probabilistic model according to the similarity between samples, while the elastic net is a regularized linear regression model. Take cycle 78 as an example, the true RUL value is 127. The predicted RUL values of GPR, SVR, EN and SR are 98, 107, 122 and 128, respectively. The higher optimization efficiency of our SR model is attributed to the diversity of applied symbols, which breaks the restriction of fixed expressions.30”

*) This section is a mix between “Materials and Methods” and “Results”, and furthermore “Comments”. Please, split the selection of the algorithm and the main results of this manuscript. In addition, all comments related to the effectiveness of the provided framework should be inserted in the relevant section “Discussion”.

Response to Comment 1-6:

We thank Reviewer #1 for the thoughtful comments. We separated the algorithm selection from the main results of the manuscript and placed them into “Materials and Methods”, “Results” and “Discussion” accordingly.

*) The Discussion section is currently lacking. Nevertheless, this is an important section where all the main results of this work should be discussed with respect to the current state of the art. Here the authors should better present the value of their work.

Response to Comment 1-7:

We thank Reviewer #1 for the kind suggestions. We added a detailed discussion of the main results of this work and revised all sections accordingly. The revised “Discussion” section is as follows:

In this work, we have successfully assembled 12 solid-state polymer batteries with cycle lives ranging from 70 to 213 cycles to build the training datasets. We then perform battery RUL prediction using the symbolic regression method with 11 highly correlated features as model inputs. These 11 physical quantities, such as capacity, energy and charge/discharge voltage differences, are designed and screened by analyzing the degradation mechanisms and charging/discharging curves of solid-state batteries in comparison with liquid lithium batteries. After several generations of breeding, mutation and evolution, the optimized expression generated by our SR model shows a higher accuracy (R2, 0.88). Through appropriate pretraining on LFP solid battery data, the model also exhibits a passable transfer prediction ability for NMC solid battery data. These experimental results show that the method developed in this paper is universal in the field of solid-state lithium batteries with different material systems.

Machine learning has recently emerged as a popular approach that is widely applied in battery degradation prediction [11-15]. Hence, we also compare the predicted results and errors with three other machine learning methods taken from recent related literature, support vector regression (SVR), Gaussian process regression (GPR) and elastic net (EN), as shown in Table 2. SVR is a non-parametric algorithm based on the Support Vector Machine, GPR is a kernel-based probabilistic model which makes predictions according to the similarity between samples, while the elastic net is a regularized linear regression model. These three methods have been employed frequently in recent literature and have achieved high accuracy. However, because each of them relies on an internal fixed formula, they still lack satisfactory adaptability and applicability across different battery systems. To compare the accuracy of these machine learning models, we randomly selected cycle 78 as an example. The true RUL value is 127. The predicted RUL values of GPR, SVR, EN and SR are 98, 107, 122 and 128, respectively. The higher prediction accuracy of our SR model is attributed to the screened features, which accurately track the battery degradation and create highly correlated inputs for the SR model. The diversity of applied symbols breaks the restriction of fixed expressions and is also helpful for better optimization results [30]. However, the computation cost might be the current limitation of this model. Compared with methods that use a fixed mapping formula, symbolic regression requires more calculation time and storage. This can be improved in future work by using domain knowledge to reduce the number of screened features and formula symbols. Besides, the training process is strongly dependent on the quality and quantity of battery testing data. We therefore make our experimental data publicly available at Zenodo [37], as a call to encourage other battery researchers to share their experimental data and promote the development of battery cycle life prediction research.

Table 2. Comparison of RUL prediction results of GPR, SVR, EN and SR

Method	R²	RMSE	RUL_Error	RUL_prediction (RUL_True=127)
SVR	0.56	38.95	29	98
GPR	0.70	32.66	20	107
EN	0.79	27.04	5	122
SR	0.88	20.97	1	128
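
The RUL_Error column of Table 2 is consistent with the absolute difference between the predicted RUL and the true RUL at the selected cycle. The short Python sketch below recomputes that column from the table values under this assumed definition; the function and variable names are illustrative and do not come from the manuscript.

```python
# Values copied from Table 2; the true RUL at the selected cycle is 127.
RUL_TRUE = 127
predictions = {"SVR": 98, "GPR": 107, "EN": 122, "SR": 128}


def rul_error(predicted: int, true: int = RUL_TRUE) -> int:
    """Absolute RUL error in cycles (assumed definition)."""
    return abs(predicted - true)


# Recompute the RUL_Error column for every method in the table.
errors = {method: rul_error(value) for method, value in predictions.items()}
for method, error in errors.items():
    print(f"{method}: predicted {predictions[method]}, error {error} cycles")
```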

The promise of symbolic regression for fast and accurate battery RUL prediction is demonstrated in this work. RUL prediction would promote the development of cell pack balanced design and battery echelon utilization [7]. Battery cycle life estimation could also greatly shorten the electrochemical test time and accelerate optimization of battery material systems and manufacturing processes. Overall, RUL prediction could provide great prospects for battery application, management and optimization in the future.

Lines: “5. Conclusion

This paper focuses on identifying degradation patterns and RUL prediction of solid-state lithium polymer batteries by using domain knowledge features. Our symbolic regression method takes 11 physical quantities such as capacity, energy and charge/discharge voltage differences as input features by analyzing of degradation mechanisms and charging/discharging curves. After populations of breed, mutation and evolution training, the optimized expression generated by our SR model shows a higher accuracy (R2, 0.88) compared with other three machine learning methods (SVR, GPR, elastic net with R2 as 0.69, 0.49, 0.79, respectively). Through appropriate pretraining on LFP solid battery data, the model also exhibits a passable transfer prediction ability for NMC solid battery data. These experimental results show that the method developed in this paper is universal in the field of solid-state lithium batteries with different material systems. Solid state lithium batteries are close to the gate of commercialization and their RUL prediction becomes of vital importance. Improving the battery model with higher prediction accuracy, earlier estimation and better interpretability is promising to further promote the large-scale applications of solid-state lithium batteries.”

*) This section should succinctly report the main achievements of this work and underline the value of this work.

Response to Comment 1-8:

We thank Reviewer #1 for these important remarks. We streamlined the statement of our main achievements and underlined the value of our work in the “Conclusion”. The revised “Conclusion” is as follows:

This paper focuses on identifying degradation patterns and RUL prediction of solid-state lithium polymer batteries by using domain knowledge features. First, 11 physical quantities such as capacity, energy and charge/discharge voltage differences are chosen as input features by analyzing degradation mechanisms, charging/discharging curves and correlations between features and RUL. Then the RUL is estimated by a symbolic regression model, which shows that Kurtosis and Charge specific capacity have strong correlations with RUL. Finally, the SR model shows a higher accuracy (R2, 0.88) compared with the other three machine learning methods (SVR, GPR, EN with R2 as 0.69, 0.49, 0.79, respectively). Solid-state lithium batteries are close to the gate of commercialization and their RUL prediction becomes of vital importance. Improving the RUL prediction model with higher accuracy, earlier estimation and better interpretability is promising to further promote the large-scale applications of solid-state lithium batteries.

Reviewer 2 Report

Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

This paper employed a genetic algorithm to predict the remaining useful life of lithium polymer batteries. This paper is very straightforward. However, it misses many important key points for a good paper. The reviewer suggests a major overhaul of this paper and to be resubmitted in the future. Here are some comments:

The title does not represent the work. The title said the prediction was carried out using machine learning. However, in the abstract, the prediction was performed with a genetic algorithm, and it was mentioned that this paper shows a great prospect of machine learning algorithms.
Omit or explain the abbreviated term such as R2. The abstract does not represent the title in terms of the method employed.
There are many key MDPI published papers on this topic that are not included, such as:
1. Sidorov, Denis, Fang Liu, and Yonghui Sun. "Machine Learning for Energy Systems." (2020): 4708.
2. Khumprom, Phattara, and Nita Yodo. "A data-driven predictive prognostic model for lithium-ion batteries based on a deep learning algorithm." Energies 12.4 (2019): 660.
3. Chandran, Venkatesan, et al. "State of Charge Estimation of Lithium-Ion Battery for Electric Vehicles Using Machine Learning Algorithms." World Electric Vehicle Journal 12.1 (2021): 38.
4. Sheikh, Shehzar Shahzad, et al. "A Battery Health Monitoring Method Using Machine Learning: A Data-Driven Approach." Energies 13.14 (2020): 3658.
Line 69-72 highlights the paper objective. Additional information on the difference between liquid lithium and solid-state lithium would be great. Why is the solid-state battery more challenging? And why the Symbolic regression (SR) can help?
Line 93. LiFePO4 explanation is not given.
Line 123. What does “ 19 In contrast…”refer to? similarly, Line 124, there is “6 Delicate”? Please check
Line 152. Please provide the equation of a polarization degree.
Figure 4, right. The meaning of ρ value was not explained.
Line 170. Supplementary Table 1 was not provided with the manuscript. Please check.
Line 198. Why Eq. 2 needs to be subtracted by 3? What does value 3 represent?
Line 216-218. Please explain what each hyperparameter and each function sets mean. What is the purpose of applying them?
Line 223. What does the formula in Table 1 mean? Please elaborate. Additionally, provide the R2 formulation and explain the R2 scale in terms of accuracy. As well as, what does complexity mean here, and how to determine the complexity?
Figure 6 can be updated with experimental data set up and feature selection.
Table 2. How were the RUL error and RUL prediction determined? Please provide equations or explanations.
Line 247. Please discuss why the SR outperformed other methods. Which property of the SR is better than other methods? In addition, are there any anticipated drawbacks?

Author Response

Response letter

Manuscript ID: applsci-1168244

Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

Danpeng Cheng 1‡, Wuxin Sha 1‡, Linna Wang 2, Shun Tang 1, Aijun Ma 3, Yongwei Chen 3, Huawei Wang 3, Ping Lou 4, Songfeng Lu 5, Yuan-Cheng Cao 1,*

Response to Reviewer #2:

We thank Reviewer #2 for the thoughtful comments and for reading our manuscript with great patience, and welcome the opportunity to address and clarify the issues raised in the report. Our responses to the points raised in the report are described below.

Response to general comments:

We thank Reviewer #2 very much for the assessment and important support of our manuscript. We have further revised the manuscript based on Reviewer #2's suggestions. We believe that these revisions make the manuscript more coherent and insightful, and we hope that it will now be accepted by Applied Sciences.

Comment 2-1:

The title does not represent the work. The title said the prediction was carried out using machine learning. However, in the abstract, the prediction was performed with a genetic algorithm, and it was mentioned that this paper shows a great prospect of machine learning algorithms.

Response to Comment 2-1:

We thank Reviewer #2 for the valuable suggestion. We agree that the original statement was somewhat confusing. Symbolic regression is indeed a machine learning algorithm, which uses data-driven methods to find the optimal parameters, while its optimization method is a genetic algorithm (see reference [30] for details). Therefore, we deleted the words "genetic algorithm" and replaced them with “machine learning”.

Comment 2-2:

Omit or explain the abbreviated term such as R2. The abstract does not represent the title in terms of the method employed.

Response to Comment 2-2:

We thank Reviewer #2 for the valuable comments. We have removed the term “R2” from the abstract and now explain it, including its formulation, in the main text. As noted in our response to Comment 2-1, the method named in both the abstract and the title is now machine learning.

Comment 2-3:

There are many key MDPI published paper in this topic that is not included, such as:

Sidorov, Denis, Fang Liu, and Yonghui Sun. "Machine Learning for Energy Systems." (2020): 4708.
Khumprom, Phattara, and Nita Yodo. "A data-driven predictive prognostic model for lithium-ion batteries based on a deep learning algorithm." Energies 12.4 (2019): 660.
Chandran, Venkatesan, et al. "State of Charge Estimation of Lithium-Ion Battery for Electric Vehicles Using Machine Learning Algorithms." World Electric Vehicle Journal 12.1 (2021): 38.
Sheikh, Shehzar Shahzad, et al. "A Battery Health Monitoring Method Using Machine Learning: A Data-Driven Approach." Energies 13.14 (2020): 3658.

Response to Comment 2-3:

We thank Reviewer #2 for the valuable suggestion. The corresponding articles have been added to the reference list as references [16-19].

Comment 2-4:

Line 69-72 highlights the paper objective. Additional information on the difference between liquid lithium and solid-state lithium would be great. Why is the solid-state battery more challenging? And why the Symbolic regression (SR) can help?

Response to Comment 2-4:

We thank Reviewer #2 for the valuable question. The differences between the degradation mechanisms of liquid lithium and solid-state lithium batteries have been added in the paragraph following the selected lines and are summarized in Figure 1. Complicated impedance evolution is the main reason why RUL prediction is more challenging for solid-state batteries. We chose symbolic regression (SR) as the prediction method because there is no mature quantitative mapping formula between battery features and cycle life, and because symbolic regression has a strong nonlinear fitting ability. More details about the selection of SR are discussed in the part “2.3 Algorithm selection”. The corresponding text has been updated as follows:

Introduction

Lithium ion batteries are gradually favored by the market because of their excellent performance and decreasing cost [1-4]. And the demand for higher energy density and concerns for safety issues of energy storage devices lead to the development of solid-state lithium batteries. First, solid-state batteries mitigate safety risks by replacing flammable organic liquid electrolytes with thermostable solid electrolytes. Second, solid electrolytes with a wider electrochemical window could enable the applications of lithium metal anodes and certain cathodes with higher voltage capability and larger specific capacity [5,6].

Many physical and chemical battery models have been proposed for diverse degradation mechanisms such as the growth of the solid electrolyte interphase, the fracture of lithium dendrites, active material loss, and interfacial polarization, all of which are listed in Fig. 1 [22-26]. In general, the causes of battery degradation can be divided into two categories, capacity fading and impedance increase. Both of them have related measurable physical quantities, although impedance information is more expensive to test and more difficult to analyze. For commercial lithium ion batteries, impedance does not change much because the liquid electrolyte wets the gaps inside electrode particles as well as the gaps between electrodes and electrolytes [35]. In contrast, impedance increase is a major degradation factor for solid-state batteries [6]. Delicate point contact is expected for solid-solid interfaces in solid-state batteries. These solid electrolyte/electrode interfaces significantly govern the electrochemical properties and cycle lifetime, as unwanted reaction products at the interface cannot dissolve and diffuse in the solid electrolyte [27, 28]. Hence, it is necessary to employ both features that capture the electrochemical impedance evolution during degradation and features that capture capacity fading during cycling.

Figure 1. Summary of causes of battery degradation

2.3 Algorithm selection

Cycle life and RUL prediction is a regression task. Since there is no clear formula to describe the relationship between the proposed features and cycle numbers quantitatively, we choose symbolic regression (SR), a machine learning model complemented by genetic programming optimization method, to solve this problem because it could generate an expression close to the theoretical formula by progressive optimization.
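
A minimal sketch of how the symbolic regression setup quoted in this letter could be expressed with gplearn. The four genetic-operation probabilities and the function set are the ones stated in the manuscript; `population_size`, `generations`, `parsimony_coefficient` and `random_state` are placeholder assumptions, since the letter does not report them. The import is guarded so that the configuration can still be inspected where gplearn is not installed.

```python
# Hyperparameters and function set as quoted from the manuscript.
sr_params = {
    "p_crossover": 0.5,
    "p_subtree_mutation": 0.1,
    "p_hoist_mutation": 0.2,
    "p_point_mutation": 0.1,
    "function_set": ("add", "sub", "mul", "div", "sqrt", "log",
                     "abs", "neg", "inv", "max", "min"),
}

# Placeholder values (assumptions, not reported in the letter).
assumed_params = {
    "population_size": 1000,
    "generations": 20,
    "parsimony_coefficient": 0.001,
    "random_state": 0,
}

# gplearn requires the four probabilities to sum to at most 1;
# the remainder is the share of programs copied unchanged (reproduction).
p_total = sum(v for k, v in sr_params.items() if k.startswith("p_"))
p_reproduction = 1.0 - p_total

try:
    from gplearn.genetic import SymbolicRegressor
    model = SymbolicRegressor(**sr_params, **assumed_params)
except ImportError:
    model = None  # gplearn not installed; configuration only

print(f"total genetic-operation probability: {p_total:.1f}")
print(f"implied reproduction probability: {p_reproduction:.1f}")
```

With real data, `model.fit(X, y)` would presumably take a feature matrix with one row per cycle and one column per selected feature, and the corresponding RUL values as the target; that data layout is an assumption and is not described in the letter.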

Comment 2-5:

Line 93. LiFePO4 explanation is not given.

Response to Comment 2-5:

We thank Reviewer #2 for the valuable question. LiFePO4 stands for lithium iron phosphate. Because we had already abbreviated it as LFP earlier in the same paragraph, we simply changed it to LFP.

Comment 2-6:

Line 123. What does “ 19 In contrast…”refer to? similarly, Line 124, there is “6 Delicate”? Please check

Response to Comment 2-6:

We thank Reviewer #2 for the valuable question. We have corrected these. The numbers are reference citations, and we apologize for omitting the square brackets. The corresponding reference entries are shown below:

[31] R. Koerver, I. Dursun, T. Leichtweiß, C. Dietrich, W. Zhang, J. Binder, et al., Chem. Mater., 2017, 29, 5574-5582.

[6] L. Xu, S. Tang, Y. Cheng, K. Wang, J. Liang, C. Liu, Y.-C. Cao, F. Wei, and L. Mai, Joule, 2018, 2, 1991-2015.

Comment 2-7:

Line 152. Please provide the equation of a polarization degree.

Response to Comment 2-7:

We thank Reviewer #2 for the valuable question. The polarization degree of a battery is measured by the kurtosis of the charge/discharge voltage difference sequence (gray area in Figure 4a). The corresponding equation is Equation 4 under “Feature 9-10”.

Figure 4a. The charge/discharge voltage difference sequence

$$K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^{4} - 3 \qquad (4)$$

where K represents the kurtosis, n is the number of voltage sequence samples, X_i is the value of the ith sample (here the voltage difference), and μ and σ are the mean and standard deviation of the voltage difference sequence.
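
As an illustration of Features 9-10, the sketch below computes the skewness and the excess kurtosis of a charge/discharge voltage difference sequence in plain Python. It assumes population (1/n) normalization and that the two curves are sampled at matching points; the voltage values are made up for illustration and are not taken from the dataset.

```python
from math import sqrt


def moments(diff):
    """Return (skewness, excess kurtosis) of a voltage-difference sequence."""
    n = len(diff)
    mu = sum(diff) / n
    # Population standard deviation (1/n normalization, an assumption).
    sigma = sqrt(sum((x - mu) ** 2 for x in diff) / n)
    z = [(x - mu) / sigma for x in diff]
    skewness = sum(v ** 3 for v in z) / n
    kurtosis = sum(v ** 4 for v in z) / n - 3.0  # zero for a normal distribution
    return skewness, kurtosis


# Hypothetical charge and discharge voltages sampled at the same capacity points.
charge_v = [3.50, 3.45, 3.44, 3.45, 3.60]
discharge_v = [3.30, 3.38, 3.40, 3.37, 3.10]
diff = [c - d for c, d in zip(charge_v, discharge_v)]

s, k = moments(diff)
print(f"skewness = {s:.3f}, excess kurtosis = {k:.3f}")
```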

Comment 2-8:

Figure 4, right. The meaning of ρ value was not explained.

Response to Comment 2-8:

We thank Reviewer #2 for the valuable question. ρ is the correlation coefficient, which is introduced to examine the correlation between each feature and the RUL. The formulation is:

$$\rho = \frac{\mathrm{cov}(X, \mathrm{RUL})}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(\mathrm{RUL})}}$$

where X denotes one of the features extracted from our dataset, ‘cov’ stands for the covariance between variables, ‘var’ represents the variance of the data, and RUL represents the number of cycles of remaining useful life.
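
The correlation coefficient described here can be sketched in a few lines of Python. The example assumes the standard Pearson definition built from covariance and variance; the feature values and RUL values below are hypothetical and only show how a feature that falls together with the RUL yields a positive ρ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient: cov(x, y) / sqrt(var(x) * var(y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / sqrt(var_x * var_y)


# Hypothetical per-cycle feature values (e.g. kurtosis) and matching RUL values.
feature = [-0.8, -0.9, -1.0, -1.1, -1.3]
rul = [100, 80, 60, 40, 20]

rho = pearson(feature, rul)
print(f"rho = {rho:.3f}")
```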

Comment 2-9:

Line 170. Supplementary Table 1 was not provided with the manuscript. Please check.

Response to Comment 2-9:

We thank Reviewer #2 for pointing this out. We have now confirmed that Supplementary Table 1 appears at the end of the PDF version of the manuscript and that the Supplementary Material file has been successfully uploaded to the submission website.

Comment 2-10:

Line 198. Why does 3 need to be subtracted in Eq. 2? What does the value 3 represent?

Response to Comment 2-10:

We thank Reviewer 2 for the valuable question. The value 3 is subtracted in Eq. 2 because the kurtosis of a normal distribution is 3. The corresponding calculation and derivation are shown below:

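Assume x obeys the normal distribution N(μ, σ²) with μ = 0 and σ = 1. Its fourth central moment is

$$E[x^4] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^4 e^{-x^2/2}\,dx = \frac{4}{\sqrt{\pi}} \int_{0}^{\infty} t^{3/2} e^{-t}\,dt \quad (t = x^2/2)$$

The gamma function is

$$\Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t}\,dt$$

Therefore,

$$E[x^4] = \frac{4}{\sqrt{\pi}}\,\Gamma\!\left(\tfrac{5}{2}\right) = \frac{4}{\sqrt{\pi}} \cdot \frac{3\sqrt{\pi}}{4} = 3$$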

We subtract 3 so that the value of Eq. 4 for a normal distribution is zero. A value greater than zero then indicates a distribution that is more sharply peaked and heavier-tailed than the normal distribution, and a value less than zero indicates a flatter, lighter-tailed one. This facilitates later data processing and property induction.
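The value 3 can be checked numerically. The sketch below relies on the standard identity E[x⁴] = (4/√π)·Γ(5/2) for a standard normal variable, which is assumed here to be the result the derivation arrives at; the variable names are illustrative.

```python
import math

# Fourth moment of N(0, 1) via the gamma function: 4 / sqrt(pi) * Gamma(5/2).
fourth_moment = 4.0 / math.sqrt(math.pi) * math.gamma(2.5)

# Subtracting 3 gives the kurtosis offset used so that a normal distribution scores zero.
excess = fourth_moment - 3.0

print(f"E[x^4] = {fourth_moment:.6f}")
print(f"|E[x^4] - 3| = {abs(excess):.2e}")
```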

Comment 2-11:

Line 216-218. Please explain what each hyperparameter and each function sets mean. What is the purpose of applying them?

Response to Comment 2-11:

We thank Reviewer 2 for the valuable question. The hyperparameters are explained below.

p_crossover controls the probability of mixing components between individual trees. p_subtree_mutation controls the probability that a subtree is replaced by a naive random element during mutation. A larger p_hoist_mutation helps avoid overly complicated formulas, and p_point_mutation gives eliminated functions and operators a chance to be reintroduced. These probabilities were chosen to trade off formula complexity against accuracy. The function set consists of basic operations from the function library of gplearn [Stephens, T. gplearn. https://gplearn.readthedocs.io/en/latest/intro.html]. The operation symbols have the following meanings:

min() : Take the minimum value of elements.

max() : Take the maximum value of elements.

sqrt() : Take the square root of elements.

sub() : Take subtraction between elements.

mul() : Take multiplication between elements.

div(): Take division between elements.

We chose this function set because the quantitative mapping between battery features and cycle life is unknown. We therefore include as many basic operations as possible to cover the possible rules of solid-state battery degradation.
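As a rough illustration of the operators listed above, the sketch below writes them as plain Python functions. gplearn documents its division, square root and logarithm as “protected” variants that avoid invalid arithmetic; the near-zero threshold of 0.001 and the fallback values used here are assumptions based on that documented behaviour, not a copy of the library's code.

```python
import math

EPS = 0.001  # assumed near-zero threshold for the protected operations


def protected_div(a, b):
    """Divide a by b, returning 1.0 when the denominator is near zero."""
    return a / b if abs(b) > EPS else 1.0


def protected_sqrt(a):
    """Square root of the absolute value, so negative inputs stay valid."""
    return math.sqrt(abs(a))


def protected_log(a):
    """Natural log of the absolute value, returning 0.0 near zero."""
    return math.log(abs(a)) if abs(a) > EPS else 0.0


# Lookup table from operation symbol to implementation.
FUNCTIONS = {
    "min": min,
    "max": max,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": protected_div,
    "sqrt": protected_sqrt,
    "log": protected_log,
}

print(FUNCTIONS["div"](1.0, 0.0))   # protected: no ZeroDivisionError
print(FUNCTIONS["sqrt"](-4.0))      # protected: uses abs(-4.0)
```

Protected variants matter in symbolic regression because randomly generated formulas will routinely divide by zero or take roots of negative numbers during evolution.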

Comment 2-12:

Line 223. What does the formula in Table 1 mean? Please elaborate. Additionally, provide the R2 formulation and explain the R2 scale in terms of accuracy. Also, what does complexity mean here, and how is it determined?

Response to Comment 2-12:

We thank Reviewer 2 for the valuable question.

The six formulas in Table 1 were generated by our SR model. The formulas as a whole do not have a direct physical meaning, although their components are important physical features of battery degradation. The operation symbols have the following meanings:

min() : Take the minimum value of elements.

max() : Take the maximum value of elements.

sqrt() : Take the square root of elements.

sub() : Take subtraction between elements.

mul() : Take multiplication between elements.

div(): Take division between elements.

X is one of the 30 features listed in Supplementary Table S1.

We used sklearn.metrics.r2_score to calculate R² (the coefficient of determination), which reflects how well the regression equation fits the target variable y. The formula is as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $\hat{y}_i$ represents the predicted value of the ith sample, $y_i$ is the corresponding true value, $\bar{y}$ is the mean of the true values, and n denotes the number of samples. The best possible score is 1.0, and the score can be negative because a model can be arbitrarily worse than predicting the mean.
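A dependency-free sketch of the coefficient of determination is given below. It follows the usual definition, one minus the residual sum of squares divided by the total sum of squares, which is also what `sklearn.metrics.r2_score` computes for a single output; the function name `r2` and the sample values are illustrative assumptions.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical RUL values: a perfect fit, a mean-only fit, and a reversed fit.
true_rul = [1, 2, 3]
print(r2(true_rul, [1, 2, 3]))  # perfect prediction
print(r2(true_rul, [2, 2, 2]))  # always predicting the mean
print(r2(true_rul, [3, 2, 1]))  # worse than the mean, hence negative
```

The three calls show the scale: 1.0 for a perfect fit, 0.0 for a model no better than the mean, and a negative score for a model that is worse than the mean.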

The complexity is the number of operation symbols in a formula. It measures how complicated the formula is and is used to guard against overfitting.

Comment 2-13:

Figure 6 can be updated with experimental data set up and feature selection.

Response to Comment 2-13:

We thank Reviewer 2 for the valuable suggestion. We have updated Figure 6 with Table 1, “The experimental setup of hyper-parameters in SR model”. The feature selection is listed in Supplementary Table S1.

Table 1. The experimental setup of hyper-parameters in SR model.
Parameter	Value
population size	5000
Generations	5
stopping criteria	0.01
p_crossover	0.7
p_subtree_mutation	0.1
p_hoist_mutation	0.05
p_point_mutation	0.1
function set	'add', 'sub', 'mul', 'div', 'sqrt', 'log', 'abs', 'neg', 'inv', 'max', 'min'
parsimony coefficient	0.001
p_crossover controls the probability of mixing components between individual trees. p_subtree_mutation controls the probability that a subtree is replaced by a naive random element during mutation. A larger p_hoist_mutation helps avoid overly complicated formulas, and p_point_mutation gives eliminated functions and operators a chance to be reintroduced. These probabilities were chosen to trade off formula complexity against accuracy.
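The settings in Table 1 can be collected in a plain dictionary, as sketched below. The key names are chosen to mirror the keyword arguments of gplearn's `SymbolicRegressor`, but the dictionary itself and the consistency check are illustrative assumptions rather than the authors' script. In gplearn, whatever probability the four genetic operations leave over is assigned to reproduction, that is, copying an individual unchanged.

```python
import math

# Hyper-parameters from Table 1, keyed by the assumed gplearn keyword names.
sr_params = {
    "population_size": 5000,
    "generations": 5,
    "stopping_criteria": 0.01,
    "p_crossover": 0.7,
    "p_subtree_mutation": 0.1,
    "p_hoist_mutation": 0.05,
    "p_point_mutation": 0.1,
    "function_set": ("add", "sub", "mul", "div", "sqrt", "log",
                     "abs", "neg", "inv", "max", "min"),
    "parsimony_coefficient": 0.001,
}

# The four genetic-operation probabilities must not exceed 1 in total.
p_total = sum(v for k, v in sr_params.items() if k.startswith("p_"))
if p_total > 1.0 and not math.isclose(p_total, 1.0):
    raise ValueError("genetic-operation probabilities exceed 1")

# The remainder is the share of individuals carried over unchanged.
p_reproduction = 1.0 - p_total
print(f"operations: {p_total:.2f}, reproduction: {p_reproduction:.2f}")
```

With gplearn installed, a dictionary of this shape could be passed as `SymbolicRegressor(**sr_params)`.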

Comment 2-14:

Table 2. How were the RUL error and RUL prediction determined? Please provide equations or explanations.

Response to Comment 2-14:

We thank Reviewer 2 for the valuable question. We have provided the RUL error equation in Table 3 (RUL_error = |RUL_True - RUL_prediction|, where ‘| |’ denotes the absolute value). The RUL prediction is the output value of our model, which is calculated by the fifth formula in Table 1:

min(X20, 0.978*X1)

where ‘min’ takes the smaller of two values, X20 is the kurtosis of the charge/discharge voltage difference sequence calculated by Eq. 2, and X1 is the charge specific capacity of the battery. The RUL prediction formulas used for EN, SVR and GPR in the comparison can be found in references [20], [22] and [24].
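The prediction formula and the error definition can be sketched as follows. The function names and the feature values passed to `predict_rul` are hypothetical; the cycle-78 numbers (true RUL 127, predictions 98, 107, 122 and 128) are the ones quoted from the manuscript elsewhere in these reports.

```python
def predict_rul(x20, x1, factor=0.978):
    """Evaluate min(X20, factor * X1) for one cycle."""
    return min(x20, factor * x1)


def rul_error(rul_true, rul_prediction):
    """Absolute RUL error: |RUL_True - RUL_prediction|."""
    return abs(rul_true - rul_prediction)


# Hypothetical feature values, used only to show the call.
example_prediction = predict_rul(x20=130.0, x1=100.0)

# Cycle-78 example from the manuscript: true RUL and the four reported predictions.
rul_true = 127
reported = {"GPR": 98, "SVR": 107, "EN": 122, "SR": 128}
errors = {name: rul_error(rul_true, pred) for name, pred in reported.items()}
print(errors)  # {'GPR': 29, 'SVR': 20, 'EN': 5, 'SR': 1}
```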

Comment 2-15:

Line 247. Please discuss why the SR outperformed other methods. Which property of the SR is better than other methods? In addition, are there any anticipated drawbacks?

Response to Comment 2-15:

We thank Reviewer 2 for the valuable question. We have added a description of the strengths and two potential drawbacks of the SR model to the discussion section, as follows:

“The higher prediction accuracy of our SR model is attributed to the screened features, which accurately track battery degradation and provide highly correlated inputs for the SR model. The diversity of applied symbols removes the restriction of fixed expressions and also helps to achieve better optimization results [37]. However, computational cost may be the current limitation of this model. Compared with methods that use a fixed mapping formula, symbolic regression requires more computation time and storage. This can be improved in future work by using domain knowledge to reduce the number of screened features and formula symbols. In addition, the training process depends strongly on the quality and quantity of battery testing data. We therefore make our experimental data publicly available at Zenodo [38], as a call to encourage other battery researchers to share their experimental data and promote the development of battery cycle life prediction research.”

Author Response File: Author Response.docx

Reviewer 3 Report

I can see the authors’ motivation and justification of this research, but it is a little ambiguous in some points and incomplete in others. Besides, the proposed algorithm seems adapted from an existing Python library, so that it is not new. Thus, the merit of this work resides in the application of existing technologies and algorithms to the problem of predicting solid-state lithium batteries RUL. Due to this, a more detailed justification is needed. Some concrete corrections are suggested below.

Lines 29 – 31: “And the demand for higher energy density and fire safety of energy storage devices leads to the development of solid-state lithium batteries due to the use of inflammable solid electrolyte with wider redox window” – It is a bit confusing statement. I suppose that it describes the risks of previous lithium battery designs compared with solid-state lithium batteries, but it should be clarified.

Lines 74 – 76: “As far as we know, the largest open source solid-state lithium polymer battery charging and discharging dataset (LFP/Li or NCM/Li) is generated and disclosed” - I don’t understand this, something is missing, please explain.

Lines 121 - 125: “For commercial lithium ion batteries, impedance doesn’t change too much because the liquid electrolyte wets the gaps inside electrode particles as well as the gaps between electrodes and electrolytes. 19 In contrast, Impedance increasing takes up a major degradation factor for solid state batteries. 6 Delicate point contact is expected for solid-solid interfaces in solid state batteries” - The numbers appear without square brackets, so it is not clear if they are references.

Lines 219 – 22: “After several populations of training, an optimized expression generated by our SR model outputs predicted values approaching the ground-truth RUL values in our training battery data” - Please, could you be more specific about the number of populations and other computing measures (execution time, used memory, computer characteristics, etc)?

Lines 236 – 245: “We also conduct the comparison of predicted results and errors with the other three machine learning methods, support vector regression (SVR), Gaussian process regression(GPR) and elastic net (EN), as shown in Table 2. SVR is a non-parametric algorithm based on Support Vector Machine, GPR a kernel-based probabilistic model according to the similarity between samples, while the elastic net is a regularized linear regression model. Take cycle 78 as an example, the true RUL value is 127. The predicted RUL values of GPR, SVR, EN and SR are 98, 107, 122 and 128, respectively. The higher optimization efficiency of our SR model is attributed to the diversity of applied symbols, which breaks the restriction of fixed expressions.30” - The final number is confusing, is it a reference number? Why do you select these algorithms to compare with the proposed one? Are there any criteria based on recent literature or other considerations? Please explain.

Author Response

Response letter

Manuscript ID: applsci-1168244

Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

Danpeng Cheng 1‡, Wuxin Sha 1‡, Linna Wang 2, Shun Tang 1, Aijun Ma 3, Yongwei Chen 3, Huawei Wang 3,Ping Lou 4, Songfeng Lu 5, Yuan-Cheng Cao 1,*

Response to Reviewer #3:

We thank Reviewer #3 for the thoughtful comments and reading our manuscript with great patience, and welcome the opportunity to address and clarify the issues raised in the report. Our responses to the points raised in the report are described below.

General of comments：

Response to General of comments：

We thank Reviewer #3 very much for the positive assessment and important support of our manuscript. We made further revisions on our manuscript based on Reviewer #3's suggestions. We believe that these revisions and improvements will make the revised manuscript more reasonable and insightful, and our work would be accepted by Applied Sciences.

Comment 3-1:

Response to Comment 3-1:

We thank Reviewer-3’s valuable suggestion. The corresponding section has been updated as follows:

“And the demand for higher energy density and concerns for safety issues of energy storage devices lead to the development of solid-state lithium batteries. First, solid-state batteries mitigate safety risks by replacing flammable organic liquid electrolytes with thermostable solid electrolytes. Second, solid electrolytes with wider electrochemical window could enable the applications of lithium metal anodes and certain cathodes with higher voltage capability and larger specific capacity.”

Comment 3-2:

Response to Comment 3-2:

We thank Reviewer 3 for the valuable suggestion. We have already made our experimental dataset openly available on Zenodo (https://zenodo.org/record/4697238). It contains charge/discharge testing data from 12 solid-state lithium polymer batteries. We hope to be the first research team to publish solid-state battery testing data and to encourage other researchers to share theirs. It is a call to make experimental data and code publicly available in order to promote the development of battery cycle life prediction research.

Comment 3-3:

Response to Comment 3-3:

We thank Reviewer 3 for the valuable suggestion. The numbers are reference citations, and we apologize for omitting the square brackets. The corresponding references are listed below:

[31] R. Koerver, I. Dursun, T. Leichtweiß, C. Dietrich, W. Zhang, J. Binder, et al., Chem. Mater., 2017, 29, 5574-5582.

[6] L. Xu, S. Tang, Y. Cheng, K. Wang, J. Liang, C. Liu, Y.-C. Cao, F. Wei, and L. Mai, Joule, 2018, 2, 1991-2015.

Comment 3-4:

Response to Comment 3-4:

We thank Reviewer 3 for the valuable suggestion. The number of populations is 5, which means that our formulas evolved over five generations. We have updated Table 3 to describe the computational characteristics and Table 1 to give the experimental setup of the hyper-parameters.

Table 3. The computational characteristics for the SR model.
Computer characteristics	Value
Training time	20.6223 s
Inference time	0.0009 s
Total used memory	0.6736 GB
The characteristics are explained as follows. Training time is the total time for training our SR model, including the 5 generations of breeding, mutation and evolution. Inference time is the time taken by the model from receiving the input battery feature parameters to calculating the predicted RUL values. Total used memory is the memory holding the model parameters and the formulas of every generation.
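One way such characteristics could be measured is sketched below with the Python standard library. This is an assumption about a possible measurement approach, not the procedure the authors used: `tracemalloc` only tracks memory allocated by Python itself, and `dummy_inference` is a hypothetical stand-in for evaluating a fitted formula.

```python
import time
import tracemalloc


def measure(func, *args):
    """Return (result, elapsed_seconds, peak_bytes) for one call of func."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def dummy_inference(x20, x1):
    # Placeholder standing in for a fitted SR formula such as min(X20, 0.978*X1).
    return min(x20, 0.978 * x1)


prediction, seconds, peak_bytes = measure(dummy_inference, 130.0, 100.0)
print(f"inference time: {seconds:.6f} s")
print(f"peak traced memory: {peak_bytes / 1e9:.6f} GB")
```

The same wrapper could be applied to a training call to obtain a figure comparable to the training time in Table 3.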

Table 1. The experimental setup of hyper-parameters in SR model.
Parameter	Value
population size	5000
Generations	5
stopping criteria	0.01
p_crossover	0.7
p_subtree_mutation	0.1
p_hoist_mutation	0.05
p_point_mutation	0.1
function set	'add', 'sub', 'mul', 'div', 'sqrt', 'log', 'abs', 'neg', 'inv', 'max', 'min'
parsimony coefficient	0.001
p_crossover controls the probability of mixing components between individual trees. p_subtree_mutation controls the probability that a subtree is replaced by a naive random element during mutation. A larger p_hoist_mutation helps avoid overly complicated formulas, and p_point_mutation gives eliminated functions and operators a chance to be reintroduced. These probabilities were chosen to trade off formula complexity against accuracy.

Comment 3-5:

Lines 236 – 245: “We also conduct the comparison of predicted results and errors with the other three machine learning methods, support vector regression (SVR), Gaussian process regression (GPR) and elastic net (EN), as shown in Table 2. SVR is a non-parametric algorithm based on Support Vector Machine, GPR a kernel-based probabilistic model according to the similarity between samples, while the elastic net is a regularized linear regression model. Take cycle 78 as an example, the true RUL value is 127. The predicted RUL values of GPR, SVR, EN and SR are 98, 107, 122 and 128, respectively. The higher optimization efficiency of our SR model is attributed to the diversity of applied symbols, which breaks the restriction of fixed expressions.30” - The final number is confusing, is it a reference number? Why do you select these algorithms to compare with the proposed one? Are there any criteria based on recent literature or other considerations? Please explain.

Response to Comment 3-5:

We thank Reviewer 3 for the valuable suggestion. The final number “30” is indeed a reference number, and the corresponding reference is listed at the end of the manuscript. We selected the GPR, SVR and EN algorithms as benchmark methods for comparison with the proposed model because they have been used in recent high-quality literature.

EN (elastic net):

Peter M. Attia et al. established a closed-loop optimization system using elastic net regression and Bayesian optimization to predict battery life and improve fast-charging policies, compressing the original experimental period from over 500 days to 16 days [20].

SVR (support vector regression):

Liu et al. proposed a self-adaptive battery state-of-health assessment method based on different degradation features extracted from the voltage, current and critical time during operation. These degradation features are fused and used to train a support vector regression model that maps the feature space to the state-of-health space. The correlation between the degradation features and the battery testing capacity is higher than 0.7, and the mean error of the health estimation is less than 0.05. The study illustrates the adaptability and applicability of the proposed state-of-health assessment approach in battery applications [22].

GPR (Gaussian process regression):

Zhang et al. used Gaussian process regression to analyze the electrochemical impedance spectra of commercial lithium-ion batteries and successfully predicted the capacity and remaining cycle life of these batteries. By calculating the importance weights of the different frequencies in the impedance spectrum, they identified the interfacial impedance evolution, represented by the low-frequency region, as the key factor affecting battery capacity and degradation patterns [24].

The corresponding references are listed below:

[20] P. M. Attia, A. Grover, N. Jin, K. A. Severson, T. M. Markov, Y. H. Liao, M. H. Chen, B. Cheong, N. Perkins, Z. Yang, P. K. Herring, M. Aykol, S. J. Harris, R. D. Braatz, S. Ermon, and W. C. Chueh, Nature, 2020, 578, 397-402.

[22] D. Liu, Y. Song, L. Li, et al., On-line life cycle health assessment for lithium-ion battery in electric vehicles, Journal of Cleaner Production, 2018, 199, 1050-1065.

[24] Y. Zhang, Q. Tang, Y. Zhang, J. Wang, U. Stimming, and A. A. Lee, Nat. Commun., 2020, 11, 1706.

[37] B. Weng, Z. Song, R. Zhu, et al., Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts, Nat. Commun., 2020, 11, 3513.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Title: “Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning”

In this work, the authors provided charge/discharge data of 12 solid-state lithium polymer batteries, which were collected with cycle life ranged from 71 to 213 cycles. The authors predicted the remaining useful life of these batteries by using a genetic algorithm, called Symbolic regression. The authors claim that after populations of breed, mutation and evolution training, the R² accuracy of quantitative prediction of cycle life reaches 87.9%, in addition they claim that this work shows the great prospect of data-driven machine learning algorithm in the life prediction of solid-state batteries, providing a new approach for the batch classification, echelon utilization and recycling of batteries.

General comment: This work was partially revised by the authors. Nevertheless, some further revisions are needed to improve the quality and the effectiveness of the work. In particular, the main results achieved in this work are not clearly explained to the interested readers nor compared to the current state of the art. Perhaps, some important information is provided in supplementary files and it is not clear for the readers. As a consequence, the value of this work is still not clear.

Some detailed comment:

Lines: “Table 2. The six mathematical formulas generated by the SR model

We build a symbolic regression model using a series of descriptors extracted from the full voltage curves of each cycle. The specific feature selection and hyperparameter setup are discussed in the above section. After several populations of training, an optimized expression generated by our SR model outputs predicted values approaching the ground-truth RUL values in our training battery data. The formulas generated by our SR model can be found in Table 2.

The complexity in Table 2 is the number of operational symbols which could measure the complexity of formulas in case of overfitting. Among the produced formulas, only those with high accuracy and low complexity are explainable for researchers to better understand the relationship of solid-state batteries to their characteristics. Therefore, the fifth formula in Table 2, containing features X20 (Kurtosis) and X1 (Charge specific capacity) is the best compromise between accuracy and complexity. The pair plot for the actual RUL of the test batteries and the predicted RUL inferred by the fifth formula in Table 2 is shown in Fig. 7. All the data points are concentrated on the diagonal of the pair plot, which shows that the predicted value is basically consistent with the actual value. Our model achieves excellent prediction accuracy from the perspective of coefficient of determination (R², 0.88) and root-mean-square error (RMSE, 20.97). R² and RMSE are frequently chosen to evaluate the machine learning model performance [36]. R² could reflects the fitting degree of regression equation and the formula is as follows”

*) Perhaps in this table are resumed the main results of this work. Nevertheless, the table is not clearly explained in the main text nor in the caption. Thus, the authors are requested to deeply rework this part of the text (i.e., the Results section) to provide a clear explanation of their main results. In particular, all the symbols (apart from X1 and X20 already cited) X5,X6,X24,X29, etc in this table should be described in detail.

Lines: “Figure 7. The pair plot for Predicted/Actual RUL”

*) Apparently also in this figure an important result is shown. Nevertheless, no description of these results is provided in the main text nor in the caption. Please rework accordingly.

*) Formulas 5,6,7 are not results but already known formulas so they should be cited (or eventually inserted) within the “Materials and Methods” section.

Lines: “4. Discussion

In this work, we have successfully assemble 12 solid-state polymer batteries with the cycle lives ranging from 70 to 213 cycles to build the training datasets. Then we perform battery RUL prediction by using symbolic regression method with 11 highly correlated features as model inputs. These 11 physical quantities such as capacity, energy and charge/discharge voltage differences are designed and screened by analyzing their different degradation mechanisms and charging/discharging curves compared with liquid lithium batteries. After populations of breed, mutation and evolution training, the optimized expression generated by our SR model shows a higher accuracy (R², 0.88). Through appropriate pretraining on LFP solid battery data, the model also exhibits a passable transfer prediction ability for NMC solid battery data. These experimental results show that the method developed in this paper is universal in the field of solid-state lithium batteries with different material systems.

Machine learning has recently emerged as a popular approach largely applied in battery degradation prediction [11-15]. Hence, we also conduct the comparison of predicted results and errors with the other three machine learning methods coming from recent excellent related literature, support vector regression (SVR), Gaussian process regression (GPR) and elastic net (EN), as shown in Table 3. SVR is a non-parametric algorithm based on Support Vector Machine, GPR is a kernel-based probabilistic model which makes prediction according to the similarity between samples, while the elastic net is a regularized linear regression model. These three methods have been employed frequently in recent literature, and have achieved high accuracy. However, due to the existence of internal fixed formula, they still have no satisfactory adaptability and applicability in different battery systems. For accuracy comparison during these machine learning models, we randomly selected the cycle 78 as an example. The true RUL value is 127. The predicted RUL values of GPR, SVR, EN and SR are 98, 107, 122 and 128, respectively.

Table 3. Comparison of RUL prediction results of GPR, SVR, EN and S”

*) The authors should deeply rework this section. They should enlarge and improve this part of their work. In particular, in the “Results section” they should provide quantitative comparisons between experimental data and “predictive” models for *all * the experimental curves. Indeed, the comparison to a randomly selected curve (cycle78) is not enough to show the value of this approach. In addition, within the discussion section they should discuss (not present) their results. As a consequence, all the numeric tables should be provided within the “Results” section and only discussed in this section.

Table 4. The computational characteristics for the SR model

*) See the previous comment. Also in this case all the numeric results should be presented in the “Results” section, while their value should be discussed in the “Discussion” sections

Lines: “5. Conclusion

This paper focuses on identifying degradation patterns and RUL prediction of solid-state lithium polymer batteries by using domain knowledge features. First, 11 physical quantities such as capacity, energy and charge/discharge voltage differences are chosen as input features by analyzing degradation mechanisms, charging/discharging curves and correlations between features and RUL. Then the RUL is estimated by a symbolic regression model, which shows that Kurtosis and Charge specific capacity have strong correlations with RUL. Finally, the SR model shows a higher accuracy (R², 0.88) compared with other three machine learning methods (SVR, GPR, EN with R² as 0.69, 0.49, 0.79, respectively). Solid-state lithium batteries are close to the gate of commercialization and their RUL prediction becomes of vital importance. Improving the RUL prediction model with higher accuracy, earlier estimation and better interpretability is promising to further promote the large-scale applications of solid-state lithium batteries.”

*) Similarly, this section should be improved to underline how the achieved results could improve the current state of the art .

Author Response

Response letter (Round 2)

Manuscript ID: applsci-1168244

Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

Danpeng Cheng 1‡, Wuxin Sha 1‡, Linna Wang 2, Shun Tang 1, Aijun Ma 3, Yongwei Chen 3, Huawei Wang 3,Ping Lou 4, Songfeng Lu 5, Yuan-Cheng Cao 1,*

Response to Reviewer #1

We thank Reviewer #1 for the thoughtful and encouraging comments about our manuscript and welcome the opportunity to address and clarify the issues raised in the report. Our responses to the points raised in the report are described below.

General of comments:

This work was partially revised by the authors. Nevertheless, some further revisions are needed to improve the quality and the effectiveness of the work. In particular, the main results achieved in this work are not clearly explained to the interested readers nor compared to the current state of the art. Perhaps, some important information is provided in supplementary files and it is not clear for the readers. As a consequence, the value of this work is still not clear.

Response to General of comments：

Comment 1-1

Lines: “Table 2. The six mathematical formulas generated by the SR model. We build a symbolic regression model using a series of descriptors extracted from the full voltage curves of each cycle. The specific feature selection and hyperparameter setup are discussed in the above section. After several populations of training, an optimized expression generated by our SR model outputs predicted values approaching the ground-truth RUL values in our training battery data. The formulas generated by our SR model can be found in Table 2. The complexity in Table 2 is the number of operational symbols which could measure the complexity of formulas in case of overfitting. Among the produced formulas, only those with high accuracy and low complexity are explainable for researchers to better understand the relationship of solid-state batteries to their characteristics. Therefore, the fifth formula in Table 2, containing features X20 (Kurtosis) and X1 (Charge specific capacity) is the best compromise between accuracy and complexity. The pair plot for the actual RUL of the test batteries and the predicted RUL inferred by the fifth formula in Table 2 is shown in Fig. 7. All the data points are concentrated on the diagonal of the pair plot, which shows that the predicted value is basically consistent with the actual value. Our model achieves excellent prediction accuracy from the perspective of coefficient of determination (R², 0.88) and root-mean-square error (RMSE, 20.97). R² and RMSE are frequently chosen to evaluate the machine learning model performance [36]. R² could reflects the fitting degree of regression equation and the formula is as follows”

*) Perhaps in this table are resumed the main results of this work. Nevertheless, the table is not clearly explained in the main text nor in the caption. Thus, the authors are requested to deeply rework this part of the text (i.e., the Results section) to provide a clear explanation of their main results. In particular, all the symbols (apart from X1 and X20 already cited) X5,X6,X24,X29, etc in this table should be described in detail.

Response to Comment 1-1

We thank Reviewer #1 for the kind and valuable suggestions. We have explained our main results and added details, including descriptions of all the symbols, to Table 2. The revised version is as follows:

Table 2. The six mathematical formulas generated by the SR model

Formulas	R²	Complexity
X5	0.7482	0
min(X29, X5)	0.7730	1
0.820 X1	0.8768	1
min(X20, 0.798 X1)	0.8792	2
max(X1, ) 24	0.7884	3
X5 max( , X6-(X24/X21))	0.9307	5
The symbols are explained as follows. X1 denotes the charge specific capacity; X5 and X6 represent the integrals of the capacity-voltage charging and discharging curves, respectively; X20 is the kurtosis of the difference between the charging and discharging curves; X21 is the X coordinate of the earlier local convex/concave point of the voltage-capacity charging curve; X24 is the Y coordinate of the later local convex/concave point of the voltage-capacity charging curve; X29 is the charging time. “min()” takes the minimum value of its elements and “max()” takes the maximum value. R² can be calculated using equations (5) and (6). The complexity is the number of operation symbols in a formula; it measures how complicated the formula is and is used to guard against overfitting.
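The complexity measure can be illustrated by counting operation symbols programmatically. The sketch below assumes the formulas are written in a prefix style, with multiplication spelled `mul(...)`; that notation and the function name `complexity` are assumptions for illustration. Only the first four formulas of Table 2 are used.

```python
import ast


def complexity(formula):
    """Count the function-call nodes (operation symbols) in a prefix-style formula."""
    tree = ast.parse(formula, mode="eval")
    return sum(isinstance(node, ast.Call) for node in ast.walk(tree))


# First four formulas of Table 2 with their reported R2 values.
formulas = {
    "X5": 0.7482,
    "min(X29, X5)": 0.7730,
    "mul(0.820, X1)": 0.8768,
    "min(X20, mul(0.798, X1))": 0.8792,
}

for formula, r2_value in formulas.items():
    print(f"{formula:28s} complexity={complexity(formula)} R2={r2_value}")
```

Counted this way, the four formulas have complexities 0, 1, 1 and 2, matching the values in the table.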

Among the produced formulas, only those with high accuracy and low complexity are interpretable enough to help researchers understand the relationship between solid-state batteries and their characteristics. The six mathematical formulas that met the criteria of simplicity and accuracy are shown in Table 2. The first formula, X5, is the integral of the capacity-voltage charging curve of each battery (measured in mWh), which represents the amount of energy charged into the battery. This value carries more information than capacity alone because it also takes internal resistance into account: an increase in impedance changes the voltage at each capacity point on the charge/discharge curve and thus the integrated energy. It can roughly predict the RUL in the simplest form. The second formula, min(X29, X5), is the smaller of the charging time and the integrated energy. The charging time is proportional to the capacity because of the constant-current charging operation. This formula shows that the battery capacity and energy decrease as the RUL decreases. The third formula is the battery discharge capacity (measured in mAh) multiplied by a factor of 0.820, and can basically predict the value of RUL. It shows that capacity degradation is also a major factor in solid-state battery aging. The fourth formula, min(X20, 0.798 X1), takes the smaller of our proposed kurtosis feature and the discharge capacity multiplied by a factor. X20 is a measure of the degree of polarization, namely the kurtosis of the charge/discharge voltage difference sequence. X20 decreases monotonically with cycle number, as shown in Fig. 3, because the overpotential caused by polarization is concentrated at the beginning and the end of the charge/discharge process in the early battery cycles and then gradually spreads over the whole charge/discharge process. X1 is a measure of capacity degradation, as discussed for the third formula.
Taken together, these results indicate that capacity degradation and impedance evolution are both highly correlated factors in solid-state battery aging, which is consistent with our previous conjecture. The fifth and sixth formulas are more complicated and lose their original physical meaning after calculation. However, it can still be seen that the values of charging/discharging energy and capacity are directly proportional to RUL and the values of resistance and polarization degree are inversely proportional to RUL. These two formulas are nevertheless abandoned because they do not meet the trade-off between accuracy and complexity. Therefore, the fourth formula, containing features X20 (kurtosis) and X1 (charge specific capacity), is the best compromise between accuracy and complexity.
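
As an illustration of the X20 feature described above, the following minimal sketch computes the kurtosis of a charge/discharge voltage-difference sequence. It assumes that the two voltage curves have already been sampled at the same capacity points and that the non-excess (Pearson) definition of kurtosis is used; the response letter does not state which definition or resampling scheme the authors adopted, and the function names are hypothetical.

```python
def kurtosis(values):
    """Pearson (non-excess) kurtosis: fourth central moment / variance squared."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant sequence")
    return m4 / m2 ** 2


def voltage_difference_kurtosis(charge_v, discharge_v):
    """Kurtosis of the pointwise charge/discharge voltage difference.

    Assumption: both curves are sampled at the same capacity points
    (a hypothetical preprocessing step, not described in the source).
    """
    if len(charge_v) != len(discharge_v):
        raise ValueError("curves must have the same number of samples")
    diffs = [c - d for c, d in zip(charge_v, discharge_v)]
    return kurtosis(diffs)
```

A sharply peaked difference sequence (overpotential concentrated at the ends of the curve) gives a large value, while a flatter sequence gives a smaller one, which matches the qualitative trend the authors describe for X20.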

Comment 1-2

Lines: “Figure 7. The pair plot for Predicted/Actual RUL”

*) Apparently also in this figure an important result is shown. Nevertheless, no description of these results is provided in the main text nor in the caption. Please rework accordingly.

*) Formulas 5,6,7 are not results but already known formulas so they should be cited (or eventually inserted) within the “Materials and Methods” section.

Response to Comment 1-2

We thank Reviewer #1 for the kind suggestions. We reworked this part by providing more description of the results, and we have cited the corresponding references for Formulas 5, 6 and 7. The corresponding revised manuscript text is as follows:

The pair plot for the actual RUL of the test batteries and the predicted RUL inferred by the fourth formula in Table 2 is shown in Figure 7. The upper-right corner of the plot shows the initial stage of the battery cycling. At this stage, the predicted value is less than the actual value. This may be due to the absence of significant aging characteristics, so that the model cannot accurately predict the battery life from the selected features. The lower-left corner of the plot shows the end stage of the battery cycling, and the predicted value is slightly higher than the actual value at this time, which may be caused by the rapid decline of various performance indicators. In the middle of the chart, the accuracy of the model is the most satisfactory. The obvious aging characteristics shown by the batteries are convenient for the model to capture and predict. Basically, all the data points are concentrated on the diagonal of the pair plot, which shows that the predicted value is in good agreement with the actual value.

[40]. D. Luo, S. Zeng, J. Chen, Mathematics, 2020, 8, (3).

[4]. K. A. Severson, P. M. Attia, N. Jin, N. Perkins, B. Jiang, Z. Yang, M. H. Chen, M. Aykol, P. K. Herring, D. Fraggedakis, M. Z. Bazant, S. J. Harris, W. C. Chueh, R. D. Braatz, Nature Energy, 2019, 4, (5), 383-391.

Comment 1-3

Lines: “4. Discussion

In this work, we have successfully assemble 12 solid-state polymer batteries with the cycle lives ranging from 70 to 213 cycles to build the training datasets. Then we perform battery RUL prediction by using symbolic regression method with 11 highly correlated features as model inputs. These 11 physical quantities such as capacity, energy and charge/discharge voltage differences are designed and screened by analyzing their different degradation mechanisms and charging/discharging curves compared with liquid lithium batteries. After populations of breed, mutation and evolution training, the optimized expression generated by our SR model shows a higher accuracy (R2, 0.88). Through appropriate pretraining on LFP solid battery data, the model also exhibits a passable transfer prediction ability for NMC solid battery data. These experimental results show that the method developed in this paper is universal in the field of solid-state lithium batteries with different material systems.

Machine learning has recently emerged as a popular approach largely applied in battery degradation prediction [11-15]. Hence, we also conduct the comparison of predicted results and errors with the other three machine learning methods coming from recent excellent related literature, support vector regression (SVR), Gaussian process regression (GPR) and elastic net (EN), as shown in Table 3. SVR is a non-parametric algorithm based on Support Vector Machine, GPR is a kernel-based probabilistic model which makes prediction according to the similarity between samples, while the elastic net is a regularized linear regression model. These three methods have been employed frequently in recent literature, and have achieved high accuracy. However, due to the existence of internal fixed formula, they still have no satisfactory adaptability and applicability in different battery systems. For accuracy comparison during these machine learning models, we randomly selected the cycle 78 as an example. The true RUL value is 127. The predicted RUL values of GPR, SVR, EN and SR are 98, 107, 122 and 128, respectively.

Table 3. Comparison of RUL prediction results of GPR, SVR, EN and SR”

Response to Comment 1-3

We thank Reviewer #1 for the kind suggestions. We have deeply reworked this part by enlarging and improving the “Results” and “Discussion” sections. All the numeric tables are now provided within the “Results” section and only discussed in the “Discussion” section. The pair plot in Figure 7, the R² accuracy and the RMSE results are the quantitative comparisons between the actual experimental data and the predictive models for all the experimental curves. We agree that the comparison on a single randomly selected cycle (cycle 78) is not enough, so we calculated the R² accuracy over all cycles for the three other methods (GPR, SVR, EN) and compared the results.
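
For readers who want to reproduce this kind of all-cycle comparison, the sketch below computes R² and RMSE between actual and predicted RUL values. It assumes the standard definitions of the coefficient of determination and the root-mean-square error; the manuscript's own Equations (5) to (7) are not reproduced in this letter, so agreement with them is an assumption.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Evaluating both metrics over every cycle of every test battery, rather than on one selected cycle, is what makes the comparison between SR, GPR, SVR and EN meaningful.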

Comment 1-4

Table 4. The computational characteristics for the SR model

*) See the previous comment. Also in this case all the numeric results should be presented in the “Results” section, while their value should be discussed in the “Discussion” sections.

Response to Comment 1-4

We thank Reviewer #1 for the kind suggestions. We have moved Table 4 to the “Results” section, and the corresponding discussion remains in the “Discussion” section. We have also checked that all numeric results are presented in the “Results” section, while their values are discussed in the “Discussion” section.

Comment 1-5

Lines: “5. Conclusion

This paper focuses on identifying degradation patterns and RUL prediction of solid-state lithium polymer batteries by using domain knowledge features. First, 11 physical quantities such as capacity, energy and charge/discharge voltage differences are chosen as input features by analyzing degradation mechanisms, charging/discharging curves and correlations between features and RUL. Then the RUL is estimated by a symbolic regression model, which shows that Kurtosis and Charge specific capacity have strong correlations with RUL. Finally, the SR model shows a higher accuracy (R2, 0.88) compared with other three machine learning methods (SVR, GPR, EN with R2 as 0.69, 0.49, 0.79, respectively). Solid-state lithium batteries are close to the gate of commercialization and their RUL prediction becomes of vital importance. Improving the RUL prediction model with higher accuracy, earlier estimation and better interpretability is promising to further promote the large-scale applications of solid-state lithium batteries.”

*) Similarly, this section should be improved to underline how the achieved results could improve the current state of the art.

Response to Comment 1-5

We thank Reviewer #1 for the kind suggestions. We have revised the “Conclusion” section to underline how the achieved results could improve the current state of the art. In short, the proposed model can estimate the aging of solid-state batteries more accurately by selecting impedance evolution characteristics and by formula mining. The corresponding manuscript text has been updated as follows:

This paper focuses on identifying degradation patterns and RUL prediction of solid-state lithium polymer batteries by using domain knowledge features. First, 11 physical quantities such as capacity, energy and charge/discharge voltage differences are chosen as input features by analyzing degradation mechanisms, charging/discharging curves and correlations between features and RUL. Then the RUL is estimated by a symbolic regression model, which shows that Kurtosis and Charge specific capacity have strong correlations with RUL. Finally, the SR model shows a higher accuracy (R², 0.91) compared with the other three machine learning methods (SVR, GPR, EN with R² as 0.81, 0.83, 0.69, respectively). This improvement is attributed to the accurate tracking of battery aging by the proposed impedance evolution features and the successful optimization by the unique formula mining function of SR models. Solid-state lithium batteries are close to the gate of commercialization and their RUL prediction becomes of vital importance. Improving the RUL prediction model with higher accuracy, earlier estimation and better interpretability is promising to further promote the large-scale applications of solid-state lithium batteries.

Author Response File: Author Response.docx

Reviewer 2 Report

Most of the comments had been addressed well by the authors. Additional comments for minor revision are:

Please include the response to Comment 2-12 in the manuscript as well around Table 2, and direct the readers to find additional information in the supplementary document for all the 30 variables of X presented.
Encourage the authors to explain the 6 formulas listed in Table 2.
Check all the "R2" or "R²" notations through the paper. Many of those are not superscripted.

Author Response

Response letter (Round 2)

Manuscript ID: applsci-1168244

Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

Danpeng Cheng 1‡, Wuxin Sha 1‡, Linna Wang 2, Shun Tang 1, Aijun Ma 3, Yongwei Chen 3, Huawei Wang 3,Ping Lou 4, Songfeng Lu 5, Yuan-Cheng Cao 1,*

Response to Reviewer #2:

General of comments：

Most of the comments had been addressed well by the authors. Additional comments for minor revision are:

Please include the response to Comment 2-12 in the manuscript as well around Table 2, and direct the readers to find additional information in the supplementary document for all the 30 variables of X presented.

Encourage the authors to explain the 6 formulas listed in Table 2.

Check all the "R2" or "R²" notations through the paper. Many of those are not superscripted.

Response to General of comments：

We thank Reviewer #2 very much for the positive assessment and important support of our manuscript. We have added the response to Comment 2-12 to the manuscript around Table 2, including the explanations of the 6 formulas and the calculation method and meaning of complexity. We also direct readers to Table S1 in the supplementary document by marking “Supplementary Materials, Table S1” in brackets in Section 2.2 (Feature Selection) and by mentioning the Supplementary Materials at the end of the manuscript. Finally, we have checked that the “2” in all "R2" or "R²" notations is superscripted. We believe that these revisions and improvements will make the revised manuscript more reasonable and insightful, and that our work will be accepted by Applied Sciences.

The corresponding manuscript text has been updated as follows:

Table 2. The six mathematical formulas generated by the SR model

Formulas	R²	Complexity
X5	0.7482	0
min(X29, X5)	0.7730	1
0.820 X1	0.8768	1
min(X20, 0.798 X1)	0.8792	2
max(X1, ) 24	0.7884	3
X5 max( , X6-(X24/X21))	0.9307	5
The symbols are explained as follows: X1 denotes the charge specific capacity; X5 and X6 represent the integrals of the capacity-voltage charging/discharging curves; X20 is the kurtosis of the difference between the charging and discharging curves; X21 describes the former locally convex/concave X coordinate of the voltage-capacity charging curve; X24 denotes the latter locally convex/concave Y coordinate of the voltage-capacity charging curve; X29 is the charging time. “min()” takes the minimum value of its elements and “max()” takes the maximum value. R² can be calculated using Equations (5) and (6). The complexity is the number of operational symbols in a formula; it measures formula complexity and is used to guard against overfitting.

Among the produced formulas, only those with high accuracy and low complexity are interpretable enough to help researchers understand the relationship between solid-state batteries and their characteristics. The six mathematical formulas that met the criteria of simplicity and accuracy are shown in Table 2. The first formula, X5, is the integral of the capacity-voltage charging curve of each battery (measured in mWh), which represents the amount of energy charged into the battery. This value carries more information than capacity alone because it also takes internal resistance into account: an increase in impedance changes the voltage at each capacity point on the charge/discharge curve and thus the integrated energy. It can roughly predict the RUL in the simplest form. The second formula, min(X29, X5), is the smaller of the charging time and the integrated energy. The charging time is proportional to the capacity because of the constant-current charging operation. This formula shows that the battery capacity and energy decrease as the RUL decreases. The third formula is the battery discharge capacity (measured in mAh) multiplied by a factor of 0.820, and can basically predict the value of RUL. It shows that capacity degradation is also a major factor in solid-state battery aging. The fourth formula, min(X20, 0.798 X1), takes the smaller of our proposed kurtosis feature and the discharge capacity multiplied by a factor. X20 is a measure of the degree of polarization, namely the kurtosis of the charge/discharge voltage difference sequence. X20 decreases monotonically with cycle number, as shown in Fig. 3, because the overpotential caused by polarization is concentrated at the beginning and the end of the charge/discharge process in the early battery cycles and then gradually spreads over the whole charge/discharge process. X1 is a measure of capacity degradation, as discussed for the third formula.
Taken together, these results indicate that capacity degradation and impedance evolution are both highly correlated factors in solid-state battery aging, which is consistent with our previous conjecture. The fifth and sixth formulas are more complicated and lose their original physical meaning after calculation. However, it can still be seen that the values of charging/discharging energy and capacity are directly proportional to RUL and the values of resistance and polarization degree are inversely proportional to RUL. These two formulas are nevertheless abandoned because they do not meet the trade-off between accuracy and complexity. Therefore, the fourth formula, containing features X20 (kurtosis) and X1 (charge specific capacity), is the best compromise between accuracy and complexity.
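
The four simplest formulas in Table 2 can be written directly as code, which may make the accuracy/complexity trade-off easier to see. The sketch below is only an illustration: the function names are hypothetical, the feature values passed in are assumed to be in whatever units and scaling the authors used, and the fifth and sixth formulas are omitted because parts of them are missing from the table as reproduced here.

```python
def formula_1(x5):
    """X5: integrated charging energy used directly (complexity 0)."""
    return x5


def formula_2(x29, x5):
    """min(X29, X5): smaller of charging time and integrated energy (complexity 1)."""
    return min(x29, x5)


def formula_3(x1):
    """0.820 X1: scaled capacity feature (complexity 1)."""
    return 0.820 * x1


def formula_4(x20, x1):
    """min(X20, 0.798 X1): kurtosis feature capped by scaled capacity (complexity 2)."""
    return min(x20, 0.798 * x1)
```

In the fourth formula, whichever of the two terms is smaller determines the prediction, so the kurtosis feature only takes over when it falls below the scaled capacity term.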

Author Response File: Author Response.docx

Reviewer 3 Report

The authors have made a great effort to improve their manuscript. The reviewers' suggestions have been addressed and the overall quality is much better. In fact, some new information could be the most valuable, for example, the project of a free and open solid-state lithium battery database. The new references clarify some points. My only concern is that no new real improvement is made with regards the used algorithms, so that the merit of the paper must be assessed mainly by its proposed database and applications.

Author Response

Response letter (Round 2)

Manuscript ID: applsci-1168244

Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

Danpeng Cheng 1‡, Wuxin Sha 1‡, Linna Wang 2, Shun Tang 1, Aijun Ma 3, Yongwei Chen 3, Huawei Wang 3,Ping Lou 4, Songfeng Lu 5, Yuan-Cheng Cao 1,*

Response to Reviewer #3:

General of comments：

Response to General of comments：

We thank Reviewer #3 very much for the positive assessment and important support of our manuscript. We have uploaded and augmented our solid-state lithium battery database, which is available at Zenodo (https://zenodo.org/record/4697238). It contains the 30 useful features we extracted from our solid-state battery data, and an explanation of these 30 features can be found in the manuscript or the Supplementary Materials. We believe that these revisions and improvements will make the revised manuscript more reasonable and insightful, and that our work will be accepted by Applied Sciences.

Author Response File: Author Response.docx

Round 3

Reviewer 1 Report

Title: “Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning”

General comment: This work was partially revised by the authors. Nevertheless, some further revisions are still needed to improve the quality and the effectiveness of the work. In particular, the main results achieved in this work should be presented as formulas within the “Results” section not only in a Table. In particular, the important explanations of the meaning of all symbols should be inserted within the main text and not in a figure caption. In addition, it is not clear how the quantitative comparison between predictions and *all* experimental data have been performed. Indeed, this point is crucial to judge the overall quality of the work and should be inserted within the main text and not in a figure or in a figure caption. The main concern is related to the reliability of the proposed procedure, which seems to be based only on the comparison between predictions and experimental data: Nevertheless, no physical reasons seems to be provided for the proposed formulas, so it is quite difficult to judge about the general validity of the proposed framework (i.e., for whatever set of experimental data)

Some detailed comments:

*) Equation (2) is not clear. Please explain better the definition of RUL and the relationship with n and i.

*) In the “Materials and Method” section almost all Equations are definitions of well-known indexes. Are they really needed in this work? Perhaps the authors could only cite R^2 and RMSE...

*) Table 5 should be transformed in text and inserted in the “Materials and Methods” section.

*) “Results” section: this is the most important part of a research manuscript. It should be deeply reworked, enlarged and improved. In particular, Tables 3 and 4 should be transformed in the text of the “Results” section, while their captions in explanations.

*) Figure 7 is not clear. Please provide a better explanation.

*) Discussion and Conclusion

It is not clear whether the proposed formulas to predict RUL are physically based. Are there relationships between the real system and the proposed formulas ? Please explain this important point. Furthermore it is not clear, why the proposed procedure should be better than physically based methods, exploiting the physics of the batteries. Indeed, for these systems the physics is known and likely predictions could be provided. In addition, it is not clear whether the goodness of the proposed formulas is based only on the value of R^2 and RMSE, without any other physical reason. Please explain.

Author Response

Response letter (Round 3)

Manuscript ID: applsci-1168244

Solid-State Lithium Battery Cycle Life Prediction Using Machine Learning

Danpeng Cheng 1‡, Wuxin Sha 1‡, Linna Wang 2, Shun Tang 1, Aijun Ma 3, Yongwei Chen 3, Huawei Wang 3,Ping Lou 4, Songfeng Lu 5, Yuan-Cheng Cao 1,*

Response to Reviewer

General of comments:

This work was partially revised by the authors. Nevertheless, some further revisions are still needed to improve the quality and the effectiveness of the work. In particular, the main results achieved in this work should be presented as formulas within the “Results” section not only in a Table. In particular, the important explanations of the meaning of all symbols should be inserted within the main text and not in a figure caption. In addition, it is not clear how the quantitative comparison between predictions and *all* experimental data have been performed. Indeed, this point is crucial to judge the overall quality of the work and should be inserted within the main text and not in a figure or in a figure caption. The main concern is related to the reliability of the proposed procedure, which seems to be based only on the comparison between predictions and experimental data: Nevertheless, no physical reasons seems to be provided for the proposed formulas, so it is quite difficult to judge about the general validity of the proposed framework (i.e., for whatever set of experimental data)

Response to General of comments：

We sincerely thank the Reviewer for the support of our work. We have made further revisions to our manuscript based on the Reviewer's thoughtful suggestions. The main results achieved in this work have been presented as formulas within the “Results” section. And the important explanations of the meaning of all symbols have been inserted within the main text. Indeed, we provided physical explanations for our formulas in the “Discussion” section.

The quantitative comparisons between predictions and *all* experimental cycle data have been evaluated by R², RMSE, and a pair plot in Figure 7. The pair plot compares the RUL value of each cycle with the predicted value of the proposed formula. We also added a 205-word explanation for Figure 7 inserted within the main text. We believe that these revisions and improvements will make the revised manuscript more reasonable and of higher quality, and our work would be finally accepted by Applied Sciences.

Comment 1

Equation (2) is not clear. Please explain better the definition of RUL and the relationship with n and i.

Response to Comment 1

We thank the Reviewer for the kind and valuable suggestions. We have added more explanation of the definition of RUL and its relationship with n and i. The revised version is as follows:

“RUL for the i-th cycle is calculated by

RUL_i = n + 1 - i

where n is the total number of battery charge and discharge cycles and i is the index of the charge and discharge cycle under study. “Plus 1” means the present cycle is also included in the remaining useful cycles.”

For instance, the battery in our test dataset has 209 cycles, so n is 209. For the 50th cycle, the RUL is calculated as 209 + 1 - 50 = 160.
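
The definition above can be checked with a few lines of code. This is a minimal sketch of the stated relation RUL_i = n + 1 - i; the function name and the range check are illustrative additions, not part of the manuscript.

```python
def remaining_useful_life(total_cycles, cycle_index):
    """RUL_i = n + 1 - i, counting the present cycle as still remaining."""
    if not 1 <= cycle_index <= total_cycles:
        raise ValueError("cycle_index must lie between 1 and total_cycles")
    return total_cycles + 1 - cycle_index


# Worked example from the response: n = 209, i = 50
print(remaining_useful_life(209, 50))  # 160
```

With this convention the first cycle has RUL equal to n and the last cycle has RUL equal to 1, never 0.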

Comment 2

In the “Materials and Method” section almost all Equations are definitions of well-known indexes. Are they really needed in this work? Perhaps the authors could only cite R^2 and RMSE...

Response to Comment 2

We thank the Reviewer for the kind suggestions. Although the Equations in the “Materials and Method” section are definitions of well-known indexes, they were requested by another reviewer in Round 2. We apologize, but we need to retain those equations and citations.

Comment 3

Table 5 should be transformed in text and inserted in the “Materials and Methods” section.

Response to Comment 3

We thank the Reviewer for the thoughtful comments. We need to confirm whether Table 5 means Figure 5 or Table 4, as we only have 4 tables in the whole manuscript. Therefore we have updated both Figure 5 and Table 4. For Figure 5, we have explanation text in the “Materials and Methods” just above the figure as follows:

“The model structure is shown in Figure 5. Firstly, a series of initial functions are randomly generated to roughly fit the mathematical relationship between the proposed features and RUL. Then, this algorithm allows these initial functions to breed, mutate and evolve according to the survival of the fittest. We use a Python library called gplearn to implement symbolic regression programming. Hyper-parameters (p_crossover = 0.5, p_subtree_mutation = 0.1, p_hoist_mutation = 0.2, p_point_mutation = 0.1) and function sets ('add', 'sub', 'mul', 'div', 'sqrt', 'log', 'abs', 'neg', 'inv', 'max', 'min') are adopted herein to obtain a brief but accurate descriptive formula.”

As for Table 4, we think it does not belong in the “Materials and Methods” section, because its entries are all calculation results, which belong in the “Results” section. However, we have transformed Table 4 into text and inserted it right above the table in the “Results” section as required.
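
The gplearn settings quoted in the response to Comment 3 can be collected into a configuration such as the one sketched below. The four mutation/crossover probabilities and the function set come from the quoted text, and the 5 generations come from the Table 4 explanation in this letter; `random_state` and the helper name `build_symbolic_regressor` are assumptions added for illustration, and every parameter not listed is left at the library default, which may differ from what the authors used.

```python
# Settings quoted in the response letter; everything else is an assumption.
SR_PARAMS = {
    "generations": 5,            # "5 generations to breed, mutate and evolve"
    "function_set": ("add", "sub", "mul", "div", "sqrt", "log",
                     "abs", "neg", "inv", "max", "min"),
    "p_crossover": 0.5,
    "p_subtree_mutation": 0.1,
    "p_hoist_mutation": 0.2,
    "p_point_mutation": 0.1,
    "random_state": 0,           # assumption: fixed seed for repeatability
}


def build_symbolic_regressor():
    """Create a gplearn SymbolicRegressor with the quoted settings.

    gplearn is imported lazily so that this module can be read and checked
    without the library installed.
    """
    from gplearn.genetic import SymbolicRegressor
    return SymbolicRegressor(**SR_PARAMS)
```

The four probabilities sum to 0.9, so under these settings the remaining share of each generation is carried over unchanged. The returned estimator follows the scikit-learn convention of `fit(X, y)` followed by `predict(X)`, where X would hold the selected battery features and y the RUL values.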

Comment 4

“Results” section: this is the most important part of a research manuscript. It should be deeply reworked, enlarged and improved. In particular, Tables 3 and 4 should be transformed in the text of the “Results” section, while their captions in explanations.

Response to Comment 4

We thank the Reviewer very much for the positive assessment. The explanations of the captions are already given in the text above Tables 3 and 4; we reproduce them here for convenience.

“We also conduct the comparison of predicted results and errors with the other three machine learning methods coming from recent excellent related literature, support vector regression (SVR), Gaussian process regression (GPR) and elastic net (EN), as shown in Table 3. The RUL prediction results for the SR model are calculated by the fourth formula in Table 2 and the other three are calculated by the methods in references [22], [24] and [4], respectively. The computational characteristics for the four models are shown in Table 4. The explanation of each characteristic is as follows. Training time is the whole time for training of our SR model including 5 generations to breed, mutate and evolve. Inference time is the time taken by the model from receiving input battery feature parameters to calculating predicted RUL values. Total used memory is the memory containing model parameters and formulas of every generation.”

Comment 5

Figure 7 is not clear. Please provide a better explanation.

Response to Comment 5

We thank the Reviewer for the kind comments. We have added a better explanation for Figure 7 as follows:

“And the pair plot for the actual RUL of the test batteries and the predicted RUL inferred by the fourth formula in Table 2 is shown in Figure 7.”

“We further investigate the fourth formula in Table 2 by comparing the predicted RUL and true value of all experimental cycles. As shown in Figure 7, the upper-right corner of the plot demonstrates the initial stage of battery cycling. At this stage, the predicted value is less than the actual value. This may be due to the absence of significant aging characteristics, and the model cannot accurately predict the battery life from the selected features. The lower-left corner of the plot shows the end stage of the battery cycling, and the predicted value is slightly higher than the actual value at this time, which may be caused by the rapid decline of various performance indicators [31]. In the middle of the chart, the accuracy of the model is the most satisfactory. The obvious aging characteristics shown by the batteries are convenient for the model to capture and predict. Basically, all the data points are concentrated on the diagonal of the pair plot, which shows that the predicted value is in good agreement with the actual value.”

Comment 6

Discussion and Conclusion

Response to Comment 6

We sincerely thank the Reviewer for the thoughtful suggestions. The proposed formulas are physically based because each term in a formula is a physical quantity or a statistic of a physical quantity, which can be used to measure the capacity degradation or impedance evolution of solid-state batteries. The corresponding explanation has been given in the manuscript as follows:

“The third formula (X1) is the battery discharge capacity (measured in mAh) multiplied by a factor of 0.820, and can basically predict the value of RUL. It shows that capacity degradation is also a major factor in solid-state battery aging. The fourth formula, min(X20, 0.798*X1), takes the smaller of our proposed kurtosis feature and the discharge capacity multiplied by a factor. X20 is a measure of the degree of polarization, namely the kurtosis of the charge/discharge voltage difference sequence. X20 decreases monotonically with cycle number, as shown in Figure 3b, because the overpotential caused by polarization is concentrated at the beginning and the end of the charge/discharge process in the early battery cycles and then gradually spreads over the whole charge/discharge process. X1 is a measure of capacity degradation, as discussed for the third formula. Taken together, these results indicate that capacity degradation and impedance evolution are both highly correlated factors in solid-state battery aging, which is consistent with our previous conjecture.”

The only disadvantage is that the physical quantities in the formula lose their unit dimensions after the symbolic operations, so the result has no specific physical unit. In fact, to our knowledge, no physical formula so far can accurately predict the battery life while retaining the physical units. Some physically based methods do provide predictions, but they incur a large computational cost and cannot be trained or used for prediction online. Moreover, owing to the nonlinearity and chaos of battery system aging, there are no accurate and clear physical models for these systems. The same view can be found in the following references [Ng, MF., Zhao, J., Yan, Q. et al. Predicting the state of charge and health of batteries using data-driven machine learning. Nat Mach Intell 2, 161–170 (2020). https://doi.org/10.1038/s42256-020-0156-7] [Severson, K.A., Attia, P.M., Jin, N. et al. Data-driven prediction of battery cycle life before capacity degradation. Nat Energy 4, 383–391 (2019). https://doi.org/10.1038/s41560-019-0356-8]. The goodness of the proposed formulas is not based only on R² and RMSE. We also present a pair plot in Figure 7, which compares the actual RUL value of each cycle with the value predicted by the proposed formula. The closer the points of the pair plot are to the diagonal, the more accurate the prediction is. The physical reasons for the proposed formula are presented in the second and third paragraphs of the “Discussion” section.

Author Response File: Author Response.docx