Machine Learning Prediction of Surface Segregation Energies on Low Index Bimetallic Surfaces

Abstract: The surface chemical composition of bimetallic catalysts can differ from the bulk composition because of the segregation of the alloy components. Thus, it is very useful to know how the different components are arranged on the surface of catalysts to gain a fundamental understanding of the catalysis occurring on bimetallic surfaces. First-principles density functional theory (DFT) calculations can provide deeper insight into the surface segregation behavior and help understand the surface composition of bimetallic surfaces. However, DFT calculations are computationally demanding and require large computing platforms. In this regard, statistical/machine learning methods provide a quick alternative approach to study materials properties. Here, we trained various linear and non-linear statistical models on previously reported surface segregation energies on low index surfaces of bimetallic catalysts to find a correlation between surface segregation energies and elemental properties. The results revealed that the surface segregation energies on low index bimetallic surfaces can be predicted using fundamental elemental properties.

The surface composition of catalysts is of paramount importance in catalysis because the reaction centers/active sites, where the bond-forming and bond-breaking processes take place, are located on the surface. For a bimetallic catalyst, the surface chemical composition can differ from the bulk composition because of the segregation of the alloy components [19]. As a result, the surface of an alloy catalyst can become enriched in one component. For instance, the surface of PtTM alloy (TM = 3d transition metal) catalysts is enriched with Pt, forming a core-shell type of structure, due to the preferred segregation of Pt to the surface [3][20][21][22][23][24][25]. Therefore, understanding the surface composition of TM-alloy catalysts provides important insight for the rational design of alloy catalysts. Experimental techniques, such as photoemission spectroscopy of surface core-level shifts (SCLS), and quantum computational methods, such as first-principles density functional theory (DFT) calculations, have been routinely used to study the surface composition and segregation trends on alloy catalysts [26][27][28][29][30]. In many alloy surfaces, the difference in surface energy between the metals that make up the alloy and the heat of solution of the guest metal in the bulk of the host metal are found to be the driving forces of surface segregation [31]. The metal with the lower surface energy and/or a positive heat of solution has been found to move to the surface.
Ruban et al. [32] performed calculations based on the Green's-function linear-muffin-tin-orbitals method to estimate the surface segregation energy on the thermodynamically most stable low index surfaces [(111) surface of the face centered cubic phase, (110) surface of the body centered cubic phase and (001) surface of the hexagonal close-packed phase] of bimetallic alloys. Similarly, DFT calculations were carried out to understand the segregation behavior on low-index open surfaces [(100) surface of fcc and bcc and (101̄0) surface of hcp] of bimetallic transition metal alloys [33]. DFT calculations provide an accurate prediction of surface segregation trends and are often consistent with experimental observations [30][33][34][35][36]. However, DFT calculations are computationally demanding and require large computing platforms. In this regard, statistical/machine learning methods provide a quick alternative approach to study materials properties. In recent years, such models have been successfully used to study the structures of catalysts as well as reaction mechanisms in heterogeneous catalysis [37][38][39][40][41][42][43][44][45]. Here, we used a previously published database of surface segregation energies calculated using the Green's-function method [32] to fit various machine learning models. The surface segregation energies were correlated with elemental properties of the host and guest metals. The results obtained from the various statistical models used in this study demonstrated that the surface segregation energies of bimetallic alloys can be predicted using simple non-linear regression methods.

Methods
The dataset used in the present study has one response variable, the surface segregation energy of transition metal alloys, and a total of 20 features (10 features each for the host and guest metals), as shown in Table 1 and Table S7 (in the supporting information (SI)). Surface segregation energies were taken from the study of Ruban et al. [32]. The features were the elemental properties of the host and guest metals (Table 1) [46]. These properties are atomic radius, atomic number, atomic mass, density, work function, period, electron affinity, ionization energy, d-shell and Pauling electronegativity. Each of the host metals was combined with all the guest metals to form a bimetallic system. With each of the 24 metals (Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Hf, Ta, W, Re, Os, Ir, Pt, Au) combined with all 24 metals, we had a total of 576 systems. The host transition metals were classified according to the three most common crystal structures, face centered cubic (FCC), body centered cubic (BCC) and hexagonal close-packed (HCP) [47], as shown in Figure 1A-C, respectively. The surface segregation energy was defined as the energy required to move an impurity atom from the inside of a host crystal to the surface (Figure 1D,F) [33]. The boxplots (Figure S25) showed that two features (density and atomic mass) had relatively large inhomogeneity compared to the other features included in this study when compared between host and guest atoms.
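The construction of the 576 host/guest systems and the 20 feature columns described above can be sketched as follows. This is only an illustrative sketch: the column names, the `build_rows` helper and the placeholder property table are assumptions introduced here, and the actual elemental property values would have to be taken from Table 1.

```python
from itertools import product

# The 24 transition metals listed in the text.
METALS = [
    "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu",
    "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au",
]

# The 10 elemental properties named in the text (column names are hypothetical).
PROPERTIES = [
    "atomic_radius", "atomic_number", "atomic_mass", "density",
    "work_function", "period", "electron_affinity",
    "ionization_energy", "d_shell", "pauling_electronegativity",
]


def feature_names():
    """Return the 20 feature names: 10 for the host and 10 for the guest."""
    return [f"{role}_{prop}" for role in ("host", "guest") for prop in PROPERTIES]


def build_rows(element_table):
    """Combine every host with every guest (24 x 24 = 576 rows).

    element_table maps metal symbol -> {property name -> value}.
    """
    rows = []
    for host, guest in product(METALS, repeat=2):
        row = {"host": host, "guest": guest}
        for role, metal in (("host", host), ("guest", guest)):
            for prop in PROPERTIES:
                row[f"{role}_{prop}"] = element_table[metal][prop]
        rows.append(row)
    return rows


# Placeholder values only; real values must come from Table 1.
placeholder_table = {m: {p: 0.0 for p in PROPERTIES} for m in METALS}
rows = build_rows(placeholder_table)
print(len(rows), len(feature_names()))
```

Each row would then be paired with its surface segregation energy from Ruban et al. [32] as the response variable.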
Various linear and non-linear machine learning models were employed to correlate the elemental properties of host and guest metals with the surface segregation energies of bimetallic alloys. Linear regression is a method that finds the linear relationship between descriptors and one or more target variables. The linear models used in the present study were Ordinary Least Squares (OLS) and Partial Least Squares (PLS) [48]. Non-linear regression is a method that models the non-linear relationship between the descriptors and the target variable. The non-linear models used in this study were gradient boosting regression (GBR), support vector regression (SVR), Gaussian process regression (GPR) and kernel ridge regression (KRR). All the models were implemented using the open-source scikit-learn library for Python [53].
Non-linear models were tested for a range of values of various parameters. The parameters for GBR were the number of estimators, learning rate and max depth; the parameters for GPR were the kernel and the number of optimizer restarts; the parameters for KRR were alpha, coefficient, degree, gamma, kernel and kernel parameters. The parameters that produced the best model predictions for the training set were used for the prediction of the surface segregation energies of the test set. The accuracies of our models were primarily tested using the root mean square error (RMSE) and mean absolute error (MAE), commonly used metrics in machine learning/statistical models.
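For reference, the error metrics named above (together with the R² value reported later in Table 2) can be written out in a few lines. This is a minimal sketch using only the Python standard library; the function names and the example energies are hypothetical and are not taken from the dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical segregation energies in eV, for illustration only.
actual = [-0.50, 0.20, 0.75, -0.10]
predicted = [-0.40, 0.25, 0.60, -0.20]
print(rmse(actual, predicted), mae(actual, predicted), r2(actual, predicted))
```

Because RMSE weights large deviations more heavily, the MAE of a given set of predictions can never exceed its RMSE, which is a useful consistency check when reading the reported values.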
For the GBR method, we tested different values of parameters, including the number of estimators, the learning rate and the maximum depth. We tested the learning rate values of 0.011, 0.1 and 0.2. For the number of estimators, we tested values ranging from 100 to 300. Also, we tested maximum depth values ranging between 3 and 5. Based on the results, we found that for bimetallic FCC (111) surfaces, the number of estimators, learning rate and the maximum depth values that produced the best accuracy were 200, 0.1 and 3, respectively. For BCC (110) surfaces, the number of estimators, learning rate and the maximum depth values that produced the best accuracy were 100, 0.1 and 3, respectively. Similarly, for HCP (001) surfaces, the number of estimators, learning rate and the maximum depth values that produced the best accuracy were 100, 0.1 and 3, respectively. For combined FCC (111), BCC (110) and HCP (001) dataset, the number of estimators, learning rate and the maximum depth values that produced the best accuracy were 200, 0.1 and 3, respectively.
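A parameter scan of this kind could be sketched with scikit-learn as follows. Several points here are assumptions rather than the procedure used in this study: `GridSearchCV` with 3-fold cross-validation is used as one convenient way to rank parameter combinations, the intermediate grid values (200 estimators, depth 4) are filled in for illustration, and the data are synthetic stand-ins for the 20 elemental features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 120 samples with 20 features (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=120)

# Grid spanning the ranges quoted in the text; intermediate values are assumed.
param_grid = {
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.011, 0.1, 0.2],
    "max_depth": [3, 4, 5],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # higher (less negative) is better
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, -search.best_score_)
```

The selected combination depends on the data, so the values found on this synthetic example should not be expected to match those reported for the FCC, BCC or HCP surfaces.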
We employed random permutation cross-validation, also known as shuffle split cross-validation, to offset the bias of our model. This method is similar to k-fold cross-validation, in which the data are split into k folds, one of the folds serves as the test set and the other (k − 1) folds form the training set. The difference between the two techniques is that in k-fold cross-validation the data are not shuffled after each iteration, whereas in shuffle split cross-validation the data are shuffled and then split into training and test sets in every iteration. The data set was shuffled and split 7 times and randomly sampled during each iteration. The predictions for the test and training sets in each iteration were not used in the development of the model. Rather, all the predicted values for the test and training sets in each iteration were appended to a list and the error was calculated over the entire dataset. Splitting our data into a test set (20%) and a training set (80%) resulted in the best accuracy (see the SI for the details of our results using shuffle split cross-validation).
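The shuffle split procedure (7 random 80/20 splits, with the errors pooled over all iterations) can be illustrated with a standard-library sketch. The function names are hypothetical, and a trivial baseline that predicts the training-set mean stands in for the regression model; only the splitting and pooling logic is the point of the example.

```python
import math
import random


def shuffle_split(n_samples, n_splits=7, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_test = round(test_fraction * n_samples)
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # reshuffle the full data set in every iteration
        yield indices[n_test:], indices[:n_test]


def pooled_test_rmse(y, n_splits=7, test_fraction=0.2, seed=0):
    """Pool test-set squared errors over all splits, then take the RMSE.

    The 'model' is a placeholder that predicts the mean of the training targets.
    """
    squared_errors = []
    for train_idx, test_idx in shuffle_split(len(y), n_splits, test_fraction, seed):
        baseline = sum(y[i] for i in train_idx) / len(train_idx)
        squared_errors.extend((y[i] - baseline) ** 2 for i in test_idx)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


splits = list(shuffle_split(100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Unlike k-fold cross-validation, the test sets of different iterations may overlap, because each split is drawn independently after reshuffling. scikit-learn provides the same splitting scheme as `sklearn.model_selection.ShuffleSplit`.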

Model Predictions of the Surface Segregation Energy on Bimetallic FCC (111) Surfaces
The surface segregation energies on bimetallic FCC (111) surfaces were modeled using OLS, PLS, SVR, GPR, GBR and KRR methods. It was found that the GBR model showed the best result (Figure 2A) with an RMSE of 0.15 eV for the test set and 0.03 eV for the training set. The calculated MAE values of the training and test sets were 0.02 eV and 0.11 eV, respectively. As shown in Table 2, the R² values for the training and test sets were 0.99 and 0.89, respectively. In contrast, the RMSE and MAE values obtained using the other machine learning models (Figures S1-S5; Tables S1-S6) were higher than those obtained using the GBR method. The GBR method typically promotes weak learners to strong learners and learns complex non-linear decision boundaries via boosting to increase its overall predictive performance. As expected, the predictive performance of the GBR method, as well as of the other methods, was found to be much better for the training set than for the test set. Among all the data points, the predicted surface segregation energy of RhAg (−0.52 eV) had the largest deviation from the actual value (−0.92 eV). Importantly, our GBR model was able to capture the sign of the surface segregation energies on the (111) surface of all bimetallic alloys, and the RMSE of the GBR model was comparable to the typical error (~0.1 eV) in density functional theory (DFT) calculations. For example, consistent with the experimental observations, our model predicted Pt surface segregation on (111) surfaces of PtFe, PtCo and PtNi bimetallic systems [11,15]. Thus, the results based on our small data set showed that the GBR model can be used to reliably predict surface segregation energies on the thermodynamically most stable low index bimetallic (111) surfaces using the elemental properties (20 features) of the components in bimetallic alloys.
Energies 2020, 13, 2182
Next, we studied the relative importance of the features (out of a total of 20 features) used in all statistical models including the GBR model. Figure 2B shows the feature-importance scores of all 20 descriptors. We found that the host metal's d-shell was the least important feature and its contribution towards the prediction of the surface segregation energy was negligible, while all other features contributed to the model predictions. The 11 most significant features identified in this work were the guest metal's work function, guest metal's electron affinity, host metal's density, host metal's atomic mass, guest metal's density, guest metal's atomic number, host metal's atomic number, guest metal's ionization energy, guest metal's radius, host metal's electron affinity and guest metal's Pauling electronegativity. Subsequently, we developed the GBR model using the 11 most significant features (Figure 3A) and obtained RMSE values of 0.04 eV and 0.14 eV for the training and test sets (Table 3), respectively. These values were comparable to the values obtained using 20 features.
We also investigated the predictive accuracy of the GBR model using the 5 most important features (guest metal's work function, host metal's atomic number, guest metal's electron affinity, host metal's atomic mass and guest metal's density; Figure 3B) and obtained RMSE values of 0.05 eV and 0.14 eV for the training and test sets (Table 4), respectively. These RMSE values were very similar to the values obtained using 20 and 11 features. Thus, we concluded that 5 features could be a reasonable choice for the prediction of surface segregation energies on the (111) surface of bimetallic alloys using the GBR model.
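The feature-ranking step described above (fit on all 20 features, rank by importance, refit on the top few) can be sketched with scikit-learn's impurity-based `feature_importances_` attribute. The data below are synthetic and the feature names are placeholders; whether the scores in Figure 2B were computed exactly this way is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data: only features 0 and 3 carry signal.
rng = np.random.default_rng(0)
names = [f"feature_{i}" for i in range(20)]
X = rng.normal(size=(150, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.05 * rng.normal(size=150)

# GBR settings reported in the text for the FCC (111) data set.
gbr_settings = dict(n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0)

full_model = GradientBoostingRegressor(**gbr_settings).fit(X, y)
importances = full_model.feature_importances_

# Rank features from most to least important and keep the top 5.
order = np.argsort(importances)[::-1]
top5 = [names[i] for i in order[:5]]

# Refit using only the 5 highest-ranked feature columns.
reduced_model = GradientBoostingRegressor(**gbr_settings).fit(X[:, order[:5]], y)
print(top5)
```

The reduced model can then be evaluated with the same RMSE and MAE metrics to check how much accuracy is lost relative to the full 20-feature model.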

Model Predictions of the Segregation Energy on Bimetallic BCC (110) Surfaces
The same machine learning models used for FCC (111) surfaces (OLS, PLS, SVR, GPR, GBR and KRR) were used to model the surface segregation energies on bimetallic BCC (110) surfaces. The results demonstrated that GBR outperformed the other models (Figure 4A), having an RMSE of 0.22 eV for the test set and 0.10 eV for the training set. The estimated MAE values of our training and test sets were 0.07 eV and 0.17 eV, respectively. Moreover, the results presented in Table 2 show that the R² values for the training and test sets were 0.98 and 0.85, respectively. Our current GBR model showed that the predicted surface segregation energy (−0.96 eV) of the FePt bimetallic system had the largest deviation from the actual value (−0.66 eV). In contrast, the RMSE and MAE values obtained from the other machine learning models (Figures S6-S10; Tables S1-S6) were higher than those obtained using the GBR method. Also, the predictive ability of the GBR method and of the other methods considered was significantly better for the training set than for the test set. It was noteworthy that our GBR model essentially captured the sign of the surface segregation energies on (110) surfaces of all bimetallic alloys. Therefore, the results suggested that the GBR model could potentially be used to predict surface segregation energies of BCC bimetallic (110) surfaces using the fundamental properties of the transition metal components in alloys.
The feature importance analysis on BCC (110) surfaces based on the GBR method was performed in the same way as for FCC (111) surfaces and the results are presented in Figure 4B. The results showed that the host d-shell and host period were the least important features and their influence on the prediction of the surface segregation energy was insignificant, while all other features contributed to the model predictions. In line with the predictions on FCC (111) surfaces, we found that the model accuracy using the 11 topmost features (Figure 5A, Table 3: RMSE = 0.10 eV and 0.22 eV; MAE = 0.07 eV and 0.18 eV for the training and test sets, respectively) was similar to that obtained using all 20 features. In contrast, the model performance using the 5 topmost features (Figure 5B, Table 4: RMSE = 0.12 eV and 0.27 eV; MAE = 0.10 eV and 0.20 eV for the training and test sets, respectively) was slightly weaker compared to the models using 20 and 11 features.
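The statement that a model captures the sign of the segregation energies can be quantified as the fraction of systems for which the predicted and actual energies have the same sign. The helper names below are hypothetical. The two example points are the largest-deviation cases quoted in the text (RhAg on FCC (111) and FePt on BCC (110)), used here only to show the calculation.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_agreement(y_true, y_pred):
    """Fraction of systems whose predicted energy has the same sign as the actual one."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if sign(t) == sign(p))
    return matches / len(y_true)


def largest_deviation(labels, y_true, y_pred):
    """Return (label, absolute deviation) of the worst-predicted system."""
    deviations = [abs(t - p) for t, p in zip(y_true, y_pred)]
    worst = max(range(len(labels)), key=deviations.__getitem__)
    return labels[worst], deviations[worst]


# Values in eV quoted in the text; the two points come from different surfaces.
labels = ["RhAg", "FePt"]
actual = [-0.92, -0.66]
predicted = [-0.52, -0.96]

print(sign_agreement(actual, predicted))
print(largest_deviation(labels, actual, predicted))
```

Even for these two outliers, which deviate by 0.40 eV and 0.30 eV, the predicted sign agrees with the reference value.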

Model Predictions of Segregation Energy for Bimetallic HCP (001) Surfaces
Similar linear and non-linear machine learning models as described in the preceding sections were employed to predict surface segregation energies on bimetallic HCP (001) surfaces. Consistent with our results on FCC (111) and BCC (110) surfaces, the GBR model was superior and had (RMSE; MAE) values of (0.23 eV; 0.18 eV) for the test set and (0.09 eV; 0.07 eV) for the training set. The other models performed weakly and had higher RMSE and MAE values (Figures S11-S15; Tables S1-S6). Also, as presented in Table 2, the R² values for the training and test sets were 0.97 and 0.76, respectively. Once again, our GBR model efficiently captured the sign of the surface segregation energies on (001) surfaces of all bimetallic alloys. RuCr and CoCr were found to be the outliers with the largest deviations from the actual values. Our model was qualitatively able to capture the surface segregation behavior even though the RMSE and MAE values of the GBR model, the best performing model, were relatively higher than those estimated on FCC (111) and BCC (110) surfaces (Table 2).
The GBR model using the most important 11 features (Figure 6A), out of 20 features (Figure 7B), had (RMSE; MAE) values of (0.10 eV; 0.08 eV) and (0.24 eV; 0.19 eV) for the training and test sets, respectively. The RMSE value for the test set (0.24 eV) from the GBR model with the 11 most significant features was slightly higher than the RMSE value (0.23 eV) obtained when all 20 features were used in the model. Additional analysis was performed to examine the predictive accuracy of our model using the 5 most important features (Figure 6B), and we obtained (RMSE; MAE) values of (0.12 eV; 0.09 eV) for the training set and (0.25 eV; 0.20 eV) for the test set. The small differences in the RMSE and MAE values between the models using different numbers of features suggested that 5 features could be a reasonable choice to qualitatively predict the surface segregation energies on HCP (001) surfaces.

Model Predictions of the Segregation Energy on the Combined Bimetallic [FCC (111), BCC (110) and HCP (001) Surfaces] Dataset
Finally, OLS, PLS, SVR, GPR, GBR and KRR models were developed for a combined data set (containing surface segregation energies on FCC (111), BCC (110) and HCP (001) surfaces). It was found that GBR performed better than the other models (Figure 8A) with an RMSE of 0.11 eV for the training set and 0.20 eV for the test set. The MAE values of our training and test sets were estimated to be 0.08 eV and 0.16 eV, respectively. As presented in Table 2, the R² values for the training and test sets were 0.97 and 0.91, respectively. We found that the RMSE and MAE values from the other machine learning models (Figures S16-S20; Tables S1-S6) were higher than those obtained from our GBR model. We found similar feature importance in the combined data set (Figure 8B) compared to the individual FCC (111), BCC (110) and HCP (001) datasets. Thus, the 5 topmost features, for which the RMSE and MAE values were slightly higher than those obtained using 20 and 11 features, are a reasonable choice to predict qualitative trends of surface segregation energies on low index bimetallic surfaces (Figure 9A,B).
Overall, our calculations showed that the surface segregation energies on low index bimetallic surfaces can be correlated with elemental properties and predicted using simple statistical models. The predictive performance of simple non-linear statistical methods such as GBR was remarkable given the small size of the data set. The predictive ability of such models is expected to improve for a larger data set and thus, future data-fitting strategies should focus on increasing the size of the input dataset. Some of the input features are related to each other. We therefore performed additional analysis removing (d-shell), (d-shell + atomic radius + volume) and (d-shell + atomic radius + volume) and adding a new feature, volume = 4πr³/3, and the results are presented in Tables S7-S9 and Figures S26-S37 in the SI. The new results showed similar RMSE and MAE values for the GBR model. These observations suggested that some of the features can be combined to reduce the number of input features.
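The derived volume feature, volume = 4πr³/3, follows directly from the atomic radius. A minimal sketch of one of the feature-reduction variants is given below; the column names and the `with_volume_feature` helper are hypothetical, and the radius values in the example are placeholders rather than entries from Table 1.

```python
import math


def atomic_volume(radius):
    """Volume of a sphere with the given atomic radius: 4*pi*r^3/3."""
    return 4.0 * math.pi * radius ** 3 / 3.0


def with_volume_feature(row, roles=("host", "guest"), drop=("d_shell", "atomic_radius")):
    """Return a copy of a feature row with volume added and selected columns removed.

    Column names such as 'host_atomic_radius' are assumptions for illustration.
    """
    new_row = dict(row)
    for role in roles:
        new_row[f"{role}_volume"] = atomic_volume(row[f"{role}_atomic_radius"])
        for name in drop:
            new_row.pop(f"{role}_{name}", None)
    return new_row


# Placeholder row with illustrative numbers only.
example = {
    "host_atomic_radius": 1.0, "host_d_shell": 8,
    "guest_atomic_radius": 2.0, "guest_d_shell": 9,
}
print(with_volume_feature(example))
```

Since the volume is a monotonic function of the radius, a tree-based model such as GBR gains no new information from it, which is consistent with the similar RMSE and MAE values reported for these variants.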

Conclusions
In summary, we used various linear and non-linear regression methods to fit the surface segregation energies calculated using the Green's-function linear-muffin-tin-orbitals method and to correlate them with the elemental properties of transition metals. The results showed that the surface segregation energy of bimetallic alloys can be predicted with moderate accuracy using simple machine learning models. The non-linear gradient boosting regression (GBR) outperformed all other statistical methods employed in the present study. Using 20 features, the RMSE and MAE values for the training/test sets in the GBR model were 0.03/0.15 eV and 0.02/0.11 eV for FCC (111), 0.10/0.22 eV and 0.07/0.17 eV for BCC (110) and 0.09/0.23 eV and 0.07/0.18 eV for HCP (001) surfaces, respectively. Our feature importance analysis indicated that the 11 most significant features (common features include: guest work function, guest electron affinity, host density, host atomic mass, guest density, guest atomic number, host atomic number, guest ionization energy, guest radius, host electron affinity and guest Pauling electronegativity) produced results similar to those obtained using all 20 features. The overall performance of the GBR model using the topmost 5 features was slightly weaker compared to the models using 20 and the topmost 11 features. This study, therefore, showed that the surface segregation energies on low index bimetallic surfaces can be predicted using elemental properties, provided that a sufficiently large DFT database is available.