Predicting Perovskite Performance with Multiple Machine-Learning Algorithms

Perovskites have attracted increasing attention in various fields because of their excellent physical and chemical properties; they share the universal formula ABO3, with compatible sizes of the A-site and B-site cations. In this work, four prediction models based on different machine learning algorithms, namely support vector regression with a radial basis kernel function (SVM-RBF), ridge regression (RR), random forest (RF), and back propagation neural network (BPNN), are established to predict the formation energy, thermodynamic stability, crystal volume, and oxygen vacancy formation energy of perovskite materials. Combined with the fitting diagrams of the predicted values against the DFT-calculated values, the results show that SVM-RBF has a smaller bias in predicting the crystal volume; RR, in predicting the thermodynamic stability; RF, in predicting the formation energy, crystal volume, and thermodynamic stability; and BPNN, in predicting the formation energy, thermodynamic stability, crystal volume, and oxygen vacancy formation energy. Evidently, different machine learning algorithms exhibit different sensitivities to the data sample distribution, indicating that different algorithms should be selected to predict different performance parameters of perovskite materials.


Introduction
With the progress of science and technology as well as the development of the social economy, research on the development and utilization of various energy resources is very active [1][2][3][4][5][6]. In recent years, ABO3 perovskite composite oxides have attracted great interest [3,4,[7][8][9][10][11][12][13][14][15]. Research has focused on the development of new perovskite materials to improve activity, selectivity, and stability, as well as on the development of advanced manufacturing techniques to reduce their cost while ensuring their reliability, safety, and reproducibility [14][15][16]. In ABO3 perovskite oxides, the A site holds rare earth or alkaline earth metal ions, which usually stabilize the structure, while the B site is occupied by the smaller transition metal ions [17]. Through partial substitution of the A and B sites, multi-component perovskite compounds can be obtained [16]. When the A or B sites are partially replaced by other metal ions, anion defects or B-site ions in different valence states can be formed. This improves the properties of the compounds but does not fundamentally change the crystal structure [17]. This kind of composite oxide has gas sensitivity, oxidation catalytic activity, conductivity, oxygen permeability, and other properties. In addition, its structure and performance are closely related to the composition of the system [17]. Perovskite-type oxides can thus form compounds through partial doping of metal ions at the A and B sites while maintaining a stable crystal structure, and controlling the constituent elements and their valence states gives the performance of perovskite materials a rich diversity [18][19][20].
Research in the field of materials is generally based on the successful preparation of experimental samples. The various properties of the samples are measured to understand their physical behavior, and the materials are analyzed and classified through different performance parameters [21]. The traditional material experimental process depends strongly on the samples, and the large amount of repetitive work during experiments leads to long development times. With the continuous development of computer science, many methods, such as first-principles calculation, phase-field simulation, and finite element analysis, have emerged to investigate the structure and performance of materials, but these calculations are often large-scale and costly. These are the major factors limiting the development and transformation of materials [22][23][24].
With the fast development of artificial intelligence, many researchers have applied machine learning methods to accelerate materials science [25,26]. Due to its strong data processing capacity and relatively low research threshold, machine learning can effectively reduce the cost of human and material resources in industrial development and shorten the research and development cycle [27]. By replacing or cooperating with traditional experiments and computational simulation, it can analyze material structure and predict material properties more quickly and accurately, so as to develop new functional materials more effectively [28,29]. Selecting suitable machine learning methods to predict material performance parameters from existing large data sets can effectively improve the prediction accuracy, so that materials with promising performance can be selected for experimental research [21]. Using existing data to predict performance parameters can not only expand the space of material data but also provide guidance for material experiments and applications. Different machine learning algorithms have different sensitivities to material data in different ranges of the data sets, so it is necessary to perform feature selection on the specific material data samples and to assess each algorithm through performance evaluation [30][31][32].
The perovskite data set calculated by Antoine et al. based on first principles and density functional theory was selected as the training sample [33]. Weike et al. showed that deep neural networks utilizing just two descriptors (the Pauling electronegativity and ionic radii) can predict the DFT formation energies of C3A2D3O12 garnets and ABO3 perovskites with low mean absolute errors (MAEs) [34]. Wei et al. developed machine learning models to predict the thermodynamic phase stability of perovskite oxides using a dataset of more than 1900 DFT-calculated perovskites. The results showed that the error is within the range of errors in DFT formation energies relative to elemental reference states when compared to experiments and, therefore, may be considered sufficiently accurate to use in place of full DFT calculations [35]. Using different machine learning algorithms, the formation energy, thermodynamic stability, crystal volume, and oxygen vacancy formation energy of perovskite materials were predicted [36,37].
Step 2: Different algorithms (3) were trained on the training dataset (2) to obtain different models (4).
Step 4: Performance evaluations (7) were obtained by comparing the predicted values with the true values.
Four kinds of machine learning algorithms were used to establish the multi-algorithm prediction model for the multi-performance parameters of perovskite materials, and the prediction accuracy of the model was compared and evaluated. The experimental results have important reference value and practical significance for the further study of machine learning methods in the prediction of perovskite material properties and the discovery of new perovskite materials.

Regression Prediction of Support Vector Machines
Based on statistical learning theory and the limited sample information, SVM seeks the best compromise between model complexity and learning ability so as to obtain the best generalization ability [38][39][40]. SVM has many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems. Its basic idea is to map the data x to a high-dimensional feature space F through a nonlinear mapping φ and perform linear regression in this space.
Assume a sample set $\{(x_i, y_i)\}_{i=1}^{N}$, in which the input data are $x_i \in R^n$ and $y_i \in R$. The optimal linear model function constructed in the high-dimensional space is

$$f(x) = \omega \cdot \varphi(x) + b,$$

where ω and b are the weight and bias terms, respectively. Thus, linear regression in the high-dimensional feature space corresponds to nonlinear regression in the low-dimensional input space. When using SVM to solve regression problems, an appropriate kernel function replaces the inner product according to the characteristics of the problem, so that the inner product operation in the high-dimensional feature space is implicitly transformed into a kernel function operation in the low-dimensional original space. This skillfully avoids the "dimension disaster" caused by calculation in the high-dimensional feature space [36]. Commonly used kernel functions include the RBF and linear kernels [39]. In addition, SVM introduces an ε-insensitive loss function and uses it to carry out the linear regression in the high-dimensional feature space, while the complexity of the model is reduced by minimizing $\|\omega\|^2$. Finally, the objective function of SVM is

$$\min_{\omega, b, \xi, \xi'} \ \frac{1}{2}\|\omega\|^{2} + C \sum_{i=1}^{N} (\xi_i + \xi_i'), \quad \text{s.t.}\ \ y_i - f(x_i) \le \varepsilon + \xi_i,\ \ f(x_i) - y_i \le \varepsilon + \xi_i',\ \ \xi_i, \xi_i' \ge 0,$$

where ξ and ξ' are nonnegative slack variables and C is a regularization parameter controlling the penalty for samples that exceed the error.
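As an illustration of this setup, an SVR model with an RBF kernel can be sketched with scikit-learn. This is a minimal example on synthetic data; the data and hyperparameter values are hypothetical stand-ins, not those used in this work:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic regression data standing in for the perovskite descriptors.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# C is the regularization parameter penalizing samples outside the
# epsilon-insensitive tube; gamma controls the RBF kernel radius.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale")
model.fit(X, y)
train_mae = float(np.mean(np.abs(model.predict(X) - y)))
```

Because the kernel replaces the inner product in feature space, the nonlinear fit is obtained without ever computing φ(x) explicitly.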

Random Forest
The random forest (RF) regression algorithm is an ensemble algorithm based on decision trees [36,[41][42][43][44]. It uses the bootstrap re-sampling method to extract multiple samples from the original sample set, constructs a decision tree for each bootstrap sample, and then aggregates the outputs of all the decision trees, by majority vote for classification or by averaging for regression, to obtain the final prediction [36].
The decision tree corresponding to a random parameter vector θ is T(θ), and its leaf nodes are represented as I(x, θ). The steps of the RF algorithm are as follows: Step 1: Repeat the bootstrap method to randomly generate k training sets $\theta_1, \theta_2, \dots, \theta_k$, and use each training set to generate the corresponding decision trees $T(x, \theta_1), T(x, \theta_2), \dots, T(x, \theta_k)$.
Step 2: Assuming that the feature has M dimensions, m features are randomly selected from the feature of M dimension as the splitting feature set of the current node, and the node is split in the best way among the m features.
Step 3: The maximum growth of each decision tree is achieved, and the pruning is not carried out in this process.
Step 4: For new data, the prediction of a single decision tree T(θ) can be obtained by averaging the observed values of the leaf node I(x, θ), where the weight vector is $w_i(x, \theta)$.
Step 5: The prediction of a single decision tree is obtained through the weighted average of the observed values $Y_i$ (i = 1, 2, . . . , n) of the dependent variable, and the predicted value $\hat{y}$ of a single tree is given by Equation (3):

$$\hat{y}(x) = \sum_{i=1}^{n} w_i(x, \theta)\, Y_i. \qquad (3)$$
From the original training sample set, n samples are randomly selected with replacement to generate a new training sample set, on which a decision tree is trained; M decision trees are then generated according to the above steps to form a random forest. For new data, the classification result is decided by the votes of the classification trees, while the regression result is the average of the tree predictions.
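The bagging-and-averaging procedure described in the steps above can be sketched as follows (a minimal illustration on synthetic data; all names and values are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + 0.05 * rng.normal(size=300)

# n_estimators trees are grown on bootstrap samples; max_features=2 means
# m = 2 candidate features are drawn at each split (Step 2 above).
rf = RandomForestRegressor(n_estimators=100, max_features=2, random_state=0)
rf.fit(X, y)

# For regression, the forest prediction is the average of the tree predictions.
tree_mean = np.mean([tree.predict(X[:1])[0] for tree in rf.estimators_])
forest_pred = rf.predict(X[:1])[0]
```

The last two lines verify the averaging step directly: querying each fitted tree and averaging reproduces the forest's own prediction.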

Ridge Regression
Ridge regression (RR) is a biased estimation regression method for collinear data analysis and an improvement on the least-squares estimation method [45][46][47][48][49][50]. It gives up the unbiasedness of least squares and gains stability of the regression coefficients at the cost of losing part of the information and reducing the fitting accuracy [48]. The multiple regression model can be expressed as

$$Y = X\beta + \varepsilon,$$

where Y is the dependent variable, X is the matrix of independent variables, β is the vector of regression coefficients, and ε is the error. The regression coefficients are estimated by the least-squares method as

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y.$$

If the independent variables exhibit multicollinearity, the matrix $X^{T}X$ is nearly singular and some of its eigenvalues are very small; this causes the diagonal elements of $(X^{T}X)^{-1}$ to be very large and makes the parameter estimation extremely unstable [48].
Small changes in the data may then lead to great changes in the parameter estimates; the coefficients cannot objectively reflect the influence of the independent variables on the dependent variable, which also strongly affects the prediction results. Ridge regression adds a diagonal matrix to $X^{T}X$ so that its eigenvalues become larger and the near-singular matrix becomes nonsingular, thereby improving the stability of the parameter estimation so that the obtained parameters reflect the data more faithfully. Ridge regression solves for the regression coefficients as

$$\hat{\beta}(K) = (X^{T}X + KI)^{-1}X^{T}Y,$$

where K is the ridge parameter, K ∈ [0, 1]. The larger the value of K, the smaller the influence of collinearity on the stability of the regression parameters. When K = 0, the estimator reduces to the least-squares estimate, which is unbiased; when K ≠ 0, the estimator is biased, and the bias grows as K increases. Therefore, K should be large enough to eliminate the influence of collinearity on the parameter estimation but otherwise as small as possible; in practice, once the ridge trace becomes stable, the smallest such value of K should be selected [50].
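The contrast between the least-squares and ridge estimators can be reproduced directly from the formulas above. Below is a small numpy sketch on synthetic near-collinear data (all values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
X[:, 2] = X[:, 0] + 1e-6 * rng.normal(size=n)  # near-collinear columns
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=n)

def ridge_beta(X, y, K):
    # beta(K) = (X^T X + K I)^(-1) X^T y; K = 0 recovers least squares.
    return np.linalg.solve(X.T @ X + K * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge_beta(X, y, 0.0)    # ill-conditioned: coefficients inflate
beta_ridge = ridge_beta(X, y, 0.1)  # biased but numerically stable
```

Adding K to the diagonal lifts the tiny eigenvalue produced by the collinear columns, so the ridge coefficients stay at sensible magnitudes while the least-squares coefficients blow up along the collinear direction.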

BP Neural Network
The BP neural network is a multilayer feedforward neural network trained by the error back propagation algorithm, also known as the error back propagation neural network. It is one of the most widely used neural network models at present [51][52][53][54][55]. The BP neural network has the characteristics of self-organization, self-learning, and knowledge reasoning for information processing and adapts to systems with uncertain regularities [51]. Through training on samples, it can realize the mapping of any nonlinear functional relationship from input to output and reveal the internal laws and characteristics of the data from these mapping relationships [52].
In the process of forward propagation, the input signal is processed layer by layer from the input layer through the hidden layer to the output layer, and the output signal is generated. The neural network element state of each layer only affects the neuron state of the next layer; if the output signal cannot meet the expected output requirements, it is transferred to the error backward propagation process. According to the prediction error, from the output layer to the input layer, the weights and thresholds of the BP neural network are constantly modified so that the prediction output of the BP neural network is close to the expected output [56].
As shown in Figure 2, the BP neural network is composed of three parts: the input layer, the hidden layer, and the output layer, where the hidden layer can itself consist of multiple layers. X1, X2, . . . , Xn represent the input values of the BP neural network, and Y1, Y2, . . . , Yn represent the output values of the BP neural network [56].

At present, there is no exact method to determine the number of neurons in the hidden layer; it can only be determined through an empirical formula and repeated experiments. A commonly used form of this empirical formula is

$$l = \sqrt{n + m} + a,$$
where l represents the number of hidden layer neurons, n represents the number of input layer nodes, m represents the number of output layer nodes, and a represents an arbitrary integer from 0 to 10.
In the learning process of a neural network, over-fitting may occur. An over-fitted model does not reflect the true underlying relationship, so it is necessary to introduce regularization techniques. The regularization techniques commonly used in neural networks include L2 regularization and dropout regularization [55].
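A BP-style network of the kind described above can be sketched with scikit-learn's multilayer perceptron, where the `alpha` parameter supplies the L2 regularization mentioned in the text. This is a minimal example on synthetic data; the architecture and values are hypothetical:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 5))
y = np.tanh(X[:, 0]) + X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=400)

# One hidden layer of 16 neurons; the weights are fitted from the
# back-propagated gradients, with alpha as the L2 penalty on the weights.
net = MLPRegressor(hidden_layer_sizes=(16,), alpha=1e-3, solver="lbfgs",
                   max_iter=2000, random_state=0)
net.fit(X, y)
r2_train = net.score(X, y)  # coefficient of determination on training data
```

Increasing `alpha` trades training fit for smoother weights, which is the L2 route to controlling over-fitting; dropout is not provided by this particular estimator and would require a deep learning framework.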

Performance Evaluation
In this study, the mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (R²) were used to measure the prediction accuracy of the models and to compare their performance. The smaller the MAE and MSE, and the closer R² is to 1, the better the prediction effect of the model [42,49,57]. Their formulas are as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|y_j - \hat{y}_j\right|,$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2,$$

$$R^2 = 1 - \frac{\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}{\sum_{j=1}^{n}\left(y_j - \overline{y}\right)^2},$$

where n is the number of samples, $y_j$ is the true value, $\hat{y}_j$ is the predicted value, and $\overline{y}$ is the average of the true values.
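The three metrics can be computed directly from their definitions; a small self-contained sketch with made-up values:

```python
import numpy as np

def mae(y, y_hat):
    # Mean absolute error: average magnitude of the residuals.
    return float(np.mean(np.abs(y - y_hat)))

def mse(y, y_hat):
    # Mean squared error: average squared residual.
    return float(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
# For these values: mae = 0.15, mse = 0.025, r2 = 0.98.
```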

Model Construction
The properties of materials are potentially related to and interact with each other, which makes it possible to predict some unknown properties from existing ones. In this work, the data set of perovskite materials was obtained from Antoine et al. [33], who computed it based on first principles and density functional theory. In the data preprocessing stage, data cleaning was the main task, including the deletion of duplicated information, a data legitimacy check, and the correction of existing errors, so as to ensure the validity of the data. After preprocessing, 5276 ABO3 perovskite high-throughput data records were obtained, and four characteristic performance parameters in the original material data set, namely formation energy, thermodynamic stability, crystal volume, and oxygen vacancy formation energy, were to be predicted [58].
The table of characteristic energy parameters of the dataset is shown in Table 1. For the prediction of formation energy, stability, and volume, there are 5276 complete data sets.
For the prediction of formation energy, stability, and volume, 12 properties were used as characteristic variables, including 11 independent variables and 1 predictive variable, containing 5276 pieces of effective data.
For the prediction of oxygen vacancy formation energy, 13 properties were used as characteristic variables, including 12 independent variables and 1 predictive variable, containing 4914 pieces of effective data. The perovskite property prediction model based on machine learning is constructed as follows:
1. Data preparation: divide the effective data into a training set (80%) and a test set (20%), and normalize the data.
2. Model training: models of SVM-RBF, RF, RR, and BPNN algorithms were established, respectively. Based on these algorithms, four characteristic performance parameters including formation energy, thermodynamic stability, crystal volume, and oxygen vacancy formation energy of perovskite materials, were independently trained.
3. Model effect evaluation: for the results of the test set, MAE, MSE, and R 2 were used to evaluate the model effect.
4. Model application: after training, multiple algorithm models could be used to independently predict the four performance parameters of perovskite materials.
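Steps 1-3 above can be sketched end to end as follows. The synthetic table mimics the 11-descriptor case, but the data, model settings, and resulting scores are hypothetical stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic table with 11 descriptor columns and one target property.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)

# Step 1: 80/20 split, then normalize (fit the scaler on training data only).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Step 2: train one model per target property (one model/target shown here).
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Step 3: evaluate on the held-out test set with MAE, MSE, and R^2.
pred = model.predict(X_te)
scores = (mean_absolute_error(y_te, pred), mean_squared_error(y_te, pred),
          r2_score(y_te, pred))
```

Fitting the scaler on the training split alone keeps test-set information out of the normalization, which matters when the test metrics are used to compare algorithms.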
In order to improve the accuracy of model prediction, the algorithms should be optimized before model training. For the support vector regression algorithm, a grid search is used to find the optimal penalty coefficient and the optimal kernel function radius.

Table 2 lists the evaluation results of the training performance of the different algorithm models. MAE, MSE, and R² were used to evaluate the models, and the results are shown in Figure 3. It can be seen that the R² value of RF is the highest for the formation energy, at 0.7231, and its MAE and MSE are the lowest, at 0.3731 and 0.2449, respectively; RF thus has the best prediction effect on the formation energy. For the stability prediction, the R² value of SVM-RBF is 0.8081, with MAE and MSE of 0.2074 and 0.0898, respectively, which are the best results for the stability prediction. For the crystal volume prediction, the R² value of BPNN is the largest, at 0.9372, and its MAE and MSE are the smallest, at 0.4134 and 0.4679, respectively. For the prediction of the oxygen vacancy formation energy, the evaluation indexes of SVM-RBF and RF are similar, and their prediction effect is better than that of RR and BPNN.

According to the above conclusions, fitting diagrams combining the predicted values of the multiple algorithms and the DFT-calculated values are shown in Figures 4-7. The horizontal axis is the DFT-calculated data, while the vertical axis is the predicted data. In Figure 4c, the points (DFT, Predicted) are close to the reference line, so SVM-RBF has a better prediction effect on the volume. In the same way, RF has a better prediction effect on the volume, stability, and formation energy; RR has a better prediction effect on the stability; and BPNN has a better prediction performance on all four characteristic parameters, effectively predicting the formation energy, stability, volume, and oxygen vacancy formation energy of perovskite materials.
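The grid search mentioned above for the SVM-RBF hyperparameters (penalty coefficient C and kernel radius gamma) can be sketched with scikit-learn; the grid values and data below are illustrative, not those used in this work:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(150, 3))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=150)

# 5-fold cross-validated search over the penalty coefficient C and the
# RBF kernel radius gamma.
grid = GridSearchCV(SVR(kernel="rbf"),
                    param_grid={"C": [0.1, 1.0, 10.0, 100.0],
                                "gamma": [0.01, 0.1, 1.0]},
                    cv=5)
grid.fit(X, y)
best = grid.best_params_  # dictionary with the selected C and gamma
```

After the search, `grid` itself is refitted on the full training data with the best parameters, so it can be used directly for prediction.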

Results and Discussion
The above results show that the prediction effects of the different algorithms differ across the different material properties. SVM-RBF can effectively predict the volume; RF can effectively predict the crystal volume, thermodynamic stability, and formation energy; RR can effectively predict the thermodynamic stability; and BPNN can effectively predict the formation energy, thermodynamic stability, crystal volume, and oxygen vacancy formation energy. Therefore, performance parameters of the perovskite system that are difficult to obtain by traditional experimental methods can be predicted by machine learning.

Conclusions
Four different machine learning algorithms, including support vector regression based on the radial basis function (SVM-RBF), random forest (RF), ridge regression (RR), and the BP neural network (BPNN), were used to predict the formation energy, stability, volume, and oxygen vacancy formation energy of perovskite materials. SVM-RBF has a better prediction effect on the crystal volume; RF has a better prediction effect on the crystal volume, thermodynamic stability, and formation energy; RR has a better prediction effect on the stability; and BPNN has a better prediction effect on all four characteristic parameters. This further proves that different machine learning algorithms have different sensitivities to the data and that different methods need to be selected to predict different performance parameters of perovskite materials. Applying machine learning to the performance prediction of perovskite materials improves the prediction efficiency and the subsequent performance prediction effect. The results have practical reference value for the study of machine learning methods in the performance prediction of perovskite materials and even in the research and development of new perovskite materials.

Data Availability Statement: The data of this work are available upon request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.