Predicting Interfacial Thermal Resistance by Ensemble Learning



Introduction
Interfacial thermal resistance (ITR) is a property that measures an interface's resistance to thermal flow [1][2][3]. When thermal flux is applied across an interface, ITR causes a finite temperature discontinuity. Low ITR is technologically important for heat dissipation in integrated circuits [4], whereas high ITR is critical for engine turbine protection [5,6]. The growing interest in space exploration necessitates developing special material systems that can withstand high temperatures and have high ITR. A reliable and accurate prediction of ITR is thus critical for the design of materials with desired properties. However, ITR is affected by a wide range of factors, including melting point, film thickness, material density, heat capacity, electronegativity, binding energy, and temperature [7,8]. Furthermore, for nanodevices, quantum effects must be considered [9]. As a result, ITR prediction is a high-dimensional problem that cannot be solved ideally using traditional methods.
The traditional representative models for predicting ITR are the acoustic mismatch model (AMM), the diffuse mismatch model (DMM) [10], the scattering-mediated acoustic mismatch model (SMAMM) [11], and molecular dynamics (MD) simulation [12]. To some extent, these models all rely on simplifying assumptions, which limit their predictions to specific scenarios. Briefly, the AMM is based on the idea that ITR is caused by an acoustic impedance mismatch between two materials. Other factors, such as phonon scattering, geometry, and chemical bonds, are ignored, so the AMM works well only at low temperatures and underestimates ITR at high temperatures [13]. The DMM was proposed as a supplement to the AMM. It assumes complete phonon scattering at the interface, resulting in an overestimation of ITR, particularly at low temperatures [10]. The SMAMM improves the AMM by incorporating phonon scattering and achieves reasonable ITR predictions over a wider temperature range. However, its accuracy is still limited by the omission of other factors influencing ITR. As an atomic-level simulation, MD considers chemical bonds, defects, and atomic species at interfaces and can outperform the AMM, DMM, and SMAMM models [12]. However, because inelastic quantum scattering is difficult to capture in MD simulation, it is challenging to integrate quantum effects [14]. In addition, MD simulation is computationally expensive, preventing its application to complex material systems.

The rise of machine learning in the last decade has enabled the solution of high-dimensional problems. The goal of machine learning is to automatically discover the underlying pattern by training models on given datasets. The Xu group predicted ITR using classical machine learning methods (LSBoost, support vector machines, and Gaussian process regression) and achieved higher prediction accuracy than the traditional AMM and DMM models [9,10]. Table 1 summarizes the advantages and disadvantages of machine learning models in ITR prediction compared to the traditional AMM, DMM, and SMAMM models and MD simulations. In general, machine learning models are more reliable when trained on large datasets [7,8]. However, the available ITR dataset from the literature is relatively small (692 instances) [9,10]. In this case, the corresponding machine learning models are prone to overfitting, thereby reducing prediction accuracy and robustness.

In this work, we used ensemble learning to reduce the overfitting of machine learning models and achieve a robust and precise ITR prediction, as illustrated in Figure 1. First, exploratory data analysis (EDA) and data visualization were performed on the raw data to obtain a comprehensive view of the dataset. The correlations between training descriptors and target ITR values were used to select descriptors. Second, the XGBoost (XGB) algorithm was chosen to create an interpretable XGB model and demonstrate the significance of various descriptors in ITR prediction. Following that, the top 20 descriptors with the highest importance scores were chosen, except for fdensity, fmass, and smass, to build concise models using XGBoost, Kernel Ridge Regression, and deep neural networks. An ensemble model was created by combining these three models. Finally, over 80,000 material systems were constructed and used as test data for ITR predictions. The top material systems with melting points higher than 600 K predicted by our ensemble model were reported.
Material systems were constructed using the "descriptor dataset." The corresponding Python code for material-system construction and ITR prediction can be downloaded directly from: https://github.com/pacificknight/Ensemble-learning-for-ITR-prediction.

XGBoost
XGBoost (XGB) is a gradient boosting library designed for high speed and accuracy in solving many data science problems [20]. Another advantage of XGB is that importance scores for each descriptor can be obtained straightforwardly after the boosted trees are constructed. Importance scores indicate a descriptor's contribution to the final prediction and serve as guidelines for descriptor selection. The greater a descriptor's importance score, the more it contributes to the final prediction. The training time for the XGB model with the selected descriptors is 0.16 s.
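The importance-score workflow described above can be sketched as follows. This is a minimal illustration using scikit-learn's GradientBoostingRegressor as a stand-in for XGBoost (both are gradient-boosted tree ensembles exposing a `feature_importances_` attribute); the data are synthetic, not the paper's descriptor dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in data: 200 samples, 5 hypothetical descriptors;
# only the first two actually drive the target.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)

model = GradientBoostingRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Importance scores sum to 1; a higher score means a larger
# contribution to the final prediction.
scores = model.feature_importances_
ranked = np.argsort(scores)[::-1]
print(ranked[:2])  # the two informative descriptors rank first
```

Ranking descriptors by these scores gives the candidate list that the later descriptor-selection step prunes.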

Kernel Ridge Regression
KRR was chosen because it is a well-known machine learning method that applies the powerful idea of support vector machines to regression. KRR improves computational efficiency by combining ridge regression with the "kernel trick," extending ridge regression to the nonlinear case [21][22][23][24]. In this work, the radial basis function was used as the kernel. The training time for the KRR model with the selected descriptors is 0.045 s.
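A minimal sketch of KRR with an RBF kernel, using scikit-learn's `KernelRidge`; the toy data (a noiseless sine curve) and the `alpha`/`gamma` values are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)
# Hypothetical toy data: a smooth nonlinear target that plain ridge
# regression could not fit, but the RBF kernel handles easily.
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel()

# alpha is the ridge penalty; gamma sets the RBF kernel width.
krr = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.5)
krr.fit(X, y)

pred = krr.predict(np.array([[0.0], [np.pi / 2]]))
```

The fitted model recovers the nonlinear relationship (`pred` is close to sin(0) = 0 and sin(π/2) = 1), which is the behavior the "kernel trick" buys over linear ridge regression.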

Deep Neural Network
A deep neural network (DNN) is an artificial neural network with multiple hidden layers between the input and output layers. DNNs are feedforward networks suitable for modeling complex nonlinear relationships [25][26][27]. There are several reasons to use a DNN. First, a DNN differs from traditional machine learning methods and represents cutting-edge deep learning technology. Second, we expected the DNN to capture potential patterns in ITR prediction that may be missed by traditional machine learning models. Lastly, a DNN can eliminate manual feature engineering and generate layered representations that reduce representational redundancy. The parameter optimization of a DNN is based on backpropagation. DNNs are prone to overfitting; thus, regularization is widely adopted to penalize overfitting and make the model more robust and reliable [27,29]. A DNN with three hidden layers was constructed. From the first hidden layer to the last, the neuron numbers were 64, 128, and 64, respectively. Root mean squared error (RMSE) was used as the loss function. The Adam optimizer was employed with a weight decay of 5E-5 and a learning rate of 3E-4. The number of training epochs was 15,000. R² and RMSE were used to evaluate the model performance. First, batch normalization was performed on the input data using the equation below:

$$y = \frac{x - E[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}},$$
where $E[x]$ and $\sqrt{\mathrm{Var}[x] + \epsilon}$ are the mean and standard deviation, respectively. The purpose of batch normalization is to accelerate DNN training by reducing internal covariate shift [30]. The data was then lifted from 17 to 64 dimensions using a linear transformation, and the ReLU activation function introduced nonlinear relationships into the model. Following that, the data was passed through the hidden layers. As a regularization method to reduce overfitting, dropout layers with a drop rate of 0.25 were added to each hidden layer, after batch normalization and before the linear transformation. Dropout refers to randomly removing a percentage of neurons from the DNN during training [28]. As a result, the final ITR prediction no longer depends on specific neurons, making the DNN model more robust and reliable. Furthermore, our DNN was trained with the Adam optimizer at a weight decay of 5E-5; weight decay is another regularization method [31], based on the idea that neural networks with smaller weights are less likely to overfit. Given that large prediction errors in ITR are misleading for materials design, we chose MSE over mean absolute error as the criterion in our DNN, as it penalizes large errors more heavily. The DNN model with the selected descriptors takes 434 s to train.
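The architecture and training setup described above can be sketched in PyTorch. The hyperparameters (17 inputs, hidden widths 64-128-64, Dropout(0.25), Adam with lr = 3e-4 and weight decay = 5e-5, MSE loss) follow the text; the exact ordering of layers inside each hidden block and the single-output head are our assumptions.

```python
import torch
import torch.nn as nn

class ITRNet(nn.Module):
    """Sketch of the described DNN: input batch normalization, a
    17 -> 64 linear lift with ReLU, hidden widths 64 -> 128 -> 64 with
    Dropout(0.25) per hidden layer, and a single ITR output."""
    def __init__(self, n_features: int = 17):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(n_features),          # normalize input batch
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Dropout(0.25), nn.Linear(64, 128), nn.ReLU(),
            nn.Dropout(0.25), nn.Linear(128, 64), nn.ReLU(),
            nn.Dropout(0.25), nn.Linear(64, 1),  # ITR regression head
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = ITRNet()
# Adam with the stated learning rate and weight-decay regularization.
opt = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=5e-5)
loss_fn = nn.MSELoss()  # MSE penalizes large ITR errors heavily

# One illustrative training step on a dummy batch of 8 samples.
x = torch.randn(8, 17)
y = torch.randn(8)
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```

In the paper's setup this step would be repeated for 15,000 epochs over the training descriptors.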

Ensemble Model
The XGB, KRR, and DNN models were used as base estimators. The ensemble model was built by combining all of the models used in the ITR prediction process to achieve lower variance, higher accuracy, and reduced noise and bias. The aggregating method adopted is averaging: the final prediction is the average of the predictions of the XGB, KRR, and DNN models.
Even though ITR shows insignificant temperature dependence above room temperature in the training data, the temperature was set to 600 K when constructing the test samples for predicting the ITR of high melting point systems.
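The averaging ensemble is straightforward to sketch. Here we use three scikit-learn regressors as lightweight stand-ins for the fitted XGB, KRR, and DNN base estimators; the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=150)

# Stand-ins for the three fitted base estimators (XGB, KRR, DNN).
models = [
    GradientBoostingRegressor(random_state=0).fit(X, y),
    KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.1).fit(X, y),
    Ridge(alpha=1.0).fit(X, y),
]

def ensemble_predict(models, X_new):
    """Averaging ensemble: the final ITR prediction is the mean of the
    base estimators' predictions."""
    return np.mean([m.predict(X_new) for m in models], axis=0)

pred = ensemble_predict(models, X[:5])
```

Because the base estimators make partially independent errors, their average typically has lower variance than any single model, which is the overfitting-reduction effect exploited in this work.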

Algorithm Evaluation
The R² score and RMSE were used to evaluate the model performances. The R² score, also known as the coefficient of determination in statistics, measures how well observed outcomes are replicated by the model [32]. The R² score has a range of 0-1; the closer it is to 1, the better the model performance. The RMSE is often used to measure the difference between the real values and those predicted by a model [33]. For the same dataset, the smaller the RMSE, the better the model's performance.
In statistics, $R^2$ is calculated by
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}.$$
The RMSE is given by
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$
where $n$, $y_i$, $\hat{y}_i$, and $\bar{y}$ are the number of data instances, the real ITR, the predicted ITR, and the averaged real ITR values, respectively.
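The two metrics above can be computed directly; a minimal NumPy sketch (the data values are illustrative):

```python
import numpy as np

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    # Root of the mean squared residual
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 19.0, 29.0, 42.0])
print(r2_score(y_true, y_pred))  # ~0.986
print(rmse(y_true, y_pred))      # ~1.323
```

Scikit-learn's `sklearn.metrics.r2_score` and `mean_squared_error` implement the same definitions.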

Descriptor Selection
EDA is a statistical approach for analyzing and summarizing datasets and their characteristics [34]. First, the Pearson heatmap of correlations between all 35 descriptors was calculated and plotted in Figure S1. The heatmap is diagonally symmetric because the correlation between descriptor A and descriptor B equals the correlation between descriptor B and descriptor A. The correlations between all 35 descriptors range from −1 to 1, with some descriptors having a high correlation with one another (yellow or dark areas). Figure 2 shows a heatmap of descriptors with absolute correlation values greater than 0.9. These highly correlated descriptors suggest that the descriptor dimensions can be significantly reduced without negatively impacting model performance. For example, fmass has a correlation value of 0.9 with fdensity, consistent with the two descriptors' physical relationship. As a result, descriptor selection is required to reduce descriptor dimensions for a quick and reliable prediction. Second, the Pearson correlation coefficient between each descriptor and the target ITR was investigated, as shown in Figure 3.
Descriptors such as funit, fmass, fAC2y, and fAC1y have a strong positive correlation with ITR, whereas fheatcap, sheatcap, fmelt, and T show a strong negative correlation with ITR. It is important to note that descriptor selection should not be based solely on the absolute value of descriptor correlations with ITR. For example, if we build models using only high-correlation descriptors like fmass, fAC2y, and fAC1y, the prediction results will be inaccurate: because fAC2y and fAC1y are highly correlated with fmass, as shown in Figure 2, models containing all three descriptors are not expected to outperform models containing only one of them. In light of this, we chose descriptors based not only on the importance scores provided by the XGB model but also on the Pearson correlation coefficients. First, an XGB model with all 35 descriptors was built, and the corresponding parameters were optimized. On the test dataset it achieved an R² score of 0.88 and an RMSE of 9.44. To better understand the contributions of each descriptor, the ranked importance of all 35 descriptors is given in Figure S2. It indicates that the binding energy, volume per formula unit, melting point, density, heat capacity, and temperature, among other factors, play a significant role in predicting ITR. For example, binding energy and melting point can influence phonon transport; volume per formula unit and density can influence the Debye cutoff frequency [10]; and temperature can influence ITR directly through heat capacity and phonon distribution [35]. However, some of the descriptors are highly correlated with one another and thus redundant for ITR prediction. In addition, it is difficult to collect all of the descriptor data in practice, which further dictates descriptor selection. Figure 4 shows the descriptor selection procedure. First, the top 20 descriptors with the highest importance scores were selected. Then, the duplicated descriptors within the top 20 (fdensity, fmass, and smass) were removed, yielding 17 relatively independent dominating descriptors.

Model Performance
In this section, we trained and evaluated a new XGB model with the 17 descriptors obtained in the preceding section. The R² score and RMSE obtained for the test dataset were 0.87 and 10.00, respectively, comparable to the XGB model trained with all 35 descriptors. This indicates that we have extracted all the descriptors necessary to build reliable and concise machine learning models. The KRR and DNN models were also trained with the training data and evaluated with the test data. Table 2 summarizes the predictive performance of all three models as evaluated by R² and RMSE. After descriptor selection, all three models maintained high predictive performance, resulting in concise and accurate models.

According to the no free lunch theorem, there is no universally best machine learning algorithm. Almost all machine learning algorithms are based on a few assumptions (learning biases) about the relationship between descriptors and targets. Some algorithms perform better on certain datasets than others, while some datasets will not be modeled effectively by a given algorithm [36]. Ensemble learning combines the benefits of various algorithms to achieve higher predictive accuracy than individual algorithms [37]. Common types of ensembles include bootstrap aggregating [38], boosting [39], Bayesian model averaging [40], Bayesian model combination [41], bucket of models [42], and stacking and averaging [43,44]. As the raw dataset for ITR prediction is small, our individual models are more or less overfitted. Ensemble averaging has been demonstrated to reduce overfitting to some extent and make predictions more robust. As a result, we averaged the prediction results of all our models (XGB, KRR, and DNN) to make the final ITR prediction, leading to a higher R² and a lower RMSE than any individual model shown in Table 2, indicating better predictive performance.

Figure 5 shows the experimental ITR versus the predicted ITR values of the ensemble model. The blue dots represent values predicted with all descriptors, and the orange dots represent results predicted with the selected descriptors. The majority of the blue and orange dots overlap and lie near the black diagonal dashed line. Similarly, the correlations between experimental ITR and the ITR predicted by the individual models (XGB, KRR, and DNN) before and after descriptor selection are given in Figures S3-S5. The high overlap between the orange and blue dots indicates that we have selected all the necessary descriptors and built a concise ensemble model with improved predictive performance.

The concise ensemble model was then used to search for high melting point, high ITR material systems. To benchmark the prediction performance of our ensemble model, the predicted top 20 material systems with high ITR are listed in Table S2. The Bi/Diamond, Bi/graphite, Bi/P, Bi/B, Bi/BN, and Bi/BeO systems were also predicted by other groups in the literature, indicating the effectiveness of our ensemble model. To predict high melting point, high ITR material systems, material systems with melting points higher than 600 K were filtered and ranked based on their ITR values. The top 20 high melting point, high ITR material systems predicted by the ensemble model are listed in Table 3. The top predicted material systems are mainly composed of carbon-based substrates, such as diamond, graphite, and graphene. Carbon materials have long been the focus of the scientific and industrial communities due to their exceptional electrical, thermal, optical, and mechanical properties. Diamond, graphite, graphene, and carbon nanotubes, for example, have high melting points due to their C-C covalent bonds. Furthermore, as synthesis technologies evolve, graphite and graphene can be economically produced on a large scale. As a result, we encourage experimentalists to follow our predictions and explore high melting point, high ITR material systems for practical applications such as spacecraft, automobiles, and building insulation.

Some of the predicted material systems have been experimentally validated, according to reports in the literature. For example, a Pb/Diamond material system has been reported to have a very low interface thermal conductance of ca. 25 MW/m²K [45], corresponding to a very high ITR of 40 × 10⁻⁹ m²K/W. This indicates that our model has high prediction accuracy. Although some material systems, such as CdTe/Diamond, have been explored for solar cells [46], their ITRs are yet to be measured and reported.

Conclusions
In this study, we predicted ITR in a robust and reliable manner using ensemble learning. EDA and data visualization were conducted to analyze the raw dataset, summarize its characteristics, and provide guidance for descriptor engineering. The XGB model's importance scores and the Pearson coefficients of the descriptors were employed for descriptor selection and dimension reduction. To create concise models, 17 out of 35 descriptors were chosen. For ITR prediction, an ensemble model based on the XGB, KRR, and DNN algorithms was developed. The predicted ITR values were used to identify and rank material systems with high melting points and high ITR. The ITR of the Pb/Diamond system predicted by our ensemble model was highly consistent with the experimental value reported in the literature, indicating the high prediction performance of our ensemble model. The predicted material systems provide effective guidelines and significantly reduce the effort required by experimentalists and engineers to search for high melting point, high ITR material systems.
The current data used for training the models have several limitations. First, the dataset is small and cannot cover all of the material systems of interest. Second, the data contain only a small fraction (2.9%) of two-dimensional material systems, which must be improved by incorporating more data from experiments.
Our future work will focus on developing a database for ITR data from various material systems and collecting additional ITR data from recent literature and experiments. Furthermore, we intend to investigate multicomponent (>2) high melting point, high ITR material systems by synthesizing the corresponding nanomaterial compounds. For example, PtTe, PdTe, and graphite nanomaterials can be synthesized separately and uniformly combined to produce PtTe/PdTe/graphite nanocompounds for practical applications.
Author Contributions: Conceptualization, M. Chen, X. Tian, and X. Zhang; Formal analysis, J. Li and B. Tian; Investigation, Y. M. Al-Hadeethi and B. Arkook; Writing-original draft, M. Chen and X. Tian; Writing-review & editing, X. Zhang. All the authors have read and agreed to the published version of the manuscript.

Figure 1.A schematic of predicting high melting point, high ITR material systems by ensemble learning.

Figure 3. Pearson correlation coefficient map between all 35 descriptors and the target ITR in a training dataset.

Figure 4. Venn diagram showing the descriptor selection process. First, the top 20 descriptors were selected by XGBoost based on importance scores. Then, smass, fdensity, and fmass were removed as they are highly correlated with other descriptors. The remaining 17 relatively independent descriptors were selected to build concise machine learning models.

Figure 5. Correlation between the experimental values and the values predicted by the ensemble model with all descriptors (blue dots) and with the selected descriptors (orange dots).

Table 1. Pros and cons of different models for ITR prediction.

Table 2. The predictive performance of various models evaluated by R² and RMSE.

Table 3. High melting point, high ITR material systems predicted by the ensemble model.