Predicting Elastic Constants of Refractory Complex Concentrated Alloys Using Machine Learning Approach

Refractory complex concentrated alloys (RCCAs) have drawn increasing attention recently owing to their balanced mechanical properties, including excellent creep resistance, ductility, and oxidation resistance. The mechanical and thermal properties of RCCAs are directly linked with the elastic constants. However, it is time consuming and expensive to obtain the elastic constants of RCCAs with conventional trial-and-error experiments. The elastic constants of RCCAs are predicted using a combination of density functional theory simulation data and machine learning (ML) algorithms in this study. The elastic constants of several RCCAs are predicted using the random forest regressor, gradient boosting regressor (GBR), and XGBoost regression models. Based on performance metrics R-squared, mean average error and root mean square error, the GBR model was found to be most promising in predicting the elastic constant of RCCAs among the three ML models. Additionally, GBR model accuracy was verified using the other four RHEAs dataset which was never seen by the GBR model, and reasonable agreements between ML prediction and available results were found. The present findings show that the GBR model can be used to predict the elastic constant of new RHEAs more accurately without performing any expensive computational and experimental work.


Introduction
RCCAs are refractory complex concentrated alloys with complicated compositions [1]. Unlike the more restricted refractory high-entropy alloys (RHEAs), RCCAs may include less than four main elements in the composition, have multiple phases, and have small entropy configurations. Compared to typical superalloys, RCCAs have recently attracted increasing interest as a potential choice for high-temperature structural application materials. This is because the appealing qualities of RCCAs which include outstanding high-temperature strength, ductility, high melting temperature, and oxidation resistance [2][3][4][5][6]. The mechanical and thermodynamic properties of alloys are directly linked to their elastic constants. For example, an elastic strain of materials under different stresses depends on the elastic constant [7]. The elastic constants are also linked to essential characteristics, including bulk modulus, shear modulus, Young's modulus, Poisson's ratio, and Debye temperatures. The Voigt-Reuss-Hill (VRH) averaging approximations can be used to estimate them [8].
Several theoretical density functional theory (DFT) studies [9][10][11][12][13][14] and experimental work [15][16][17] are performed to obtain the elastic characteristics of RHEAs. However, understanding of elastic constants of body-centered cubic (BCC) of RCCAs is still restricted because of a lack of high-efficiency data generation methods. It is very time consuming, expensive, and complicated to synthesize and measure the elastic constants of alloys experimentally. Additionally, computational methods need many CPU hours, making them

Calculation of CALPHAD and Data Generation via DFT
The database contains binary, ternary, quaternary, and quinary RCCAs having BCC structures with refractory elements of Cr, Hf, Ta, Ti, Mo, Nb, W, Re, Zr, and V. The thermocalc-2019 package with ThermoCalc's high-entropy alloy database [35] is employed to identify the RCCAs with stable BCC phases. The predicted phases by the Thermo-Calc package are under the experimental reports [36][37][38][39]. The database for RCCAs with stable BCC phases was created by computing the elastic constants of alloys using special quasirandom structure (SQS) model [40] and the MedeA 2.22 software, Materials Design, Inc., San Diego, CA, USA [41]. The possible alloy supercell structures include AB, A 3 B, ABC, A 2 BC, ABCD, and ABCDE, where each A, B, C, D, and E represents one of the above refractory elements. The diagrams of different SQS structures of RCCAs used in the dataset are shown in Figure 1. The number of atoms of AB, A 3 B, ABC, A 2 BC, ABCD, and ABCDE supercell structures is 8,16,36,32,64, and 125, respectively. To calculate the elastic constants, i.e., C 11 , C 12, and C 44 of RCCAs, the Vienna Ab initio simulation package (VASP) [42] was used, and the first-principles DFT [43,44] simulations were carried out. The generalized gradient approximation (GGA) was utilized as an exchange-correlation function to simulate manybody electronic interactions, as implemented in the Perdew-Burke-Ernzerhof (PBE) [45]. The electron-ion interactions were detailed using the projector augmented wave (PAW) model [46]. In all calculations, a 500 eV energy cutoff was employed for the plane-wave basis. The requirements for energy and force convergences in relaxation processes are set to be 10 −5 eV/atom and 10 −5 eV/Å, correspondingly. A smearing parameter of 0.2 eV was used in the Methfessel-Paxton method [47]. The stability of RCCAs was checked by the formation energy, E f = E(RCCA) − ∑ i n i µ i , where E(RCCA) is the energy of RCCA, n i is the number of atoms of species i, and µ i is the chemical potential of species i. The elastic constants of 370 RCCAs are computed from the DFT calculations. By utilizing strain matrix e, altering the Bravais lattice vectors R = (a, b, c) to R = (a , b , c ), a deformation of the unit cell is generated, and the elastic constants are obtained, where , , represent the total energy of unstrained lattice, volume of distorted cell, and elements of the elastic constants matrix, respectively. The un symmetry can lessen the number of independent elastic constants, and it is only 3 cubic system, i.e., C11, C12, and C44.

Descriptor Selection
Selecting the meaningful physical descriptors is a critical component of an M rithm, and the optimal choice usually depends on the target variable under invest It is essential to select the descriptors whose value can be predicted with any DFT tations to save computational costs. Initially, ten numeric descriptors of RCCAs categorical data (name of alloys) are chosen for ML models, and numeric descrip calculated using the Python program. The mechanical and thermal characteristic Due to the deformation, total crystal energy will change. The Le Page and Saxe [48] symmetry-general approach is used for calculating elastic constants from total energy calculations, where E 0 , V 0 , and C ij represent the total energy of unstrained lattice, volume of the undistorted cell, and elements of the elastic constants matrix, respectively. The unit cell's symmetry can lessen the number of independent elastic constants, and it is only 3 for the cubic system, i.e., C 11 , C 12 , and C 44 .

Descriptor Selection
Selecting the meaningful physical descriptors is a critical component of an ML algorithm, and the optimal choice usually depends on the target variable under investigation. It is essential to select the descriptors whose value can be predicted with any DFT computations to save computational costs. Initially, ten numeric descriptors of RCCAs and one categorical data (name of alloys) are chosen for ML models, and numeric descriptors are calculated using the Python program. The mechanical and thermal characteristics of the alloy's elemental constituents are used to calculate these descriptors. The RCCAs descriptors are entropy of mixing (∆S mix ) [49], enthalpy of mixing (∆H mix ) [50], the temperature of melting, average atomic radius (r), average electronegativity (χ) and a dimensionless parameter omega (Ω). The Ω is defined as ∆S mix divided by the ∆H mix and multiplied by alloys' average melting temperature (MT) [51]. Material properties cannot be characterized by averaging the constituent properties. Thus, it is imperative to consider the differences in properties [52]. As a result, the characteristic of these differences, i.e., atomic size difference (δ), electronegativity difference (∆χ), and valence electron concentration (VEC) of RCCAs [53] are also taken into consideration. The number of descriptors used in these calculations is successfully applied in predicting the stability and properties of alloys in past studies [19][20][21][22][23][24][25][26]. For example, the Islam group [20] used 118 instances of data set with five descriptors, i.e., VEC, ∆S mix , ∆H mix , δ, and χ, to determine the phase in multi-principle element alloys. Khakurel et al. [27] used some ML models to estimate Young's modulus of CCAs. They found that VEC, geometrical parameter, δ, and melting temperature play essential roles in determining the modulus of CCAs. The geometrical parameter (λ), which is defined by ∆S mix divided by the square of δ, has also been added as a descriptor in this work. The above-selected ten numeric descriptors are calculated using the following Equations (3)-(12), shown in Table 1. A total of 370 alloys used in the ML models are included in the form of categorical data. Table 1. List of the mathematical formulation of descriptors used for different ML models.

Descriptors Abbreviation Descriptors Calculation Formula
Entropy of mixing Entropy Enthalpy of mixing Enthalpy Melting temperature MT Average atomic radius AAR Average electronegativity AE Unitless parameter Omega Omega Atomic size difference ASD Valence electron concentration VEC Electronegativity difference ED Geometrical parameter GP λ = ∆S mix δ 2 (12) In equations of Table 1, a i and a j are the atomic percentages of the i th and j th element. r i , χ i and (VEC) i are the radius, Pauling electronegativity, and valence electron concentration of the i th element. r is averaged atomic radius, χ is average Pauling electronegativity, T m is the melting temperature of the element, and R is the ideal gas constant. The required data of refractory elements for calculating the descriptors are shown in Tables 2 and 3.

Machine Learning Models
There are 370 RCCAs data points from the DFT calculation. A total of 90% of data were utilized for training, and 10% were used to test and validate the results. The three ML models used in this work are random forest (RF) regressor, gradient boosting regressor (GBR), and XGBoost (XGB) through the Scikit-learn library [55], which is also used for data preparation and model assessment. The RF model is an ensemble approach consisting of many decision trees to improve prediction accuracy and reduce overfitting by averaging the trees. The samples of the training dataset may reappear in the single tree due to sampling replacement, also known as bootstrapping. GBR and XGB are multi-purpose machine learning methods that can handle both classification and regression problems. The GBR is a type of ensemble model having an iterative collection of tree models, and it is capable of learning from the previous model's mistakes. The XGBoost [56] is a powerful machine learning method that makes a quick decision by implementing boosted decision trees efficiently and effectively. After finalizing the descriptors, the 3 ML models were trained using 370 datasets of elastic constants with all calculated descriptors. The GridSearchCV function in the sklearn library was used to improve the machine learning model with a cross-validation score of 5. The adjustment of hyperparameters was essential since it impacts the overall performance of machine learning algorithms as it cross-validates the training and testing datasets. Following the searching algorithm of GridSearchCV, hyper-tuned parameters for all models were found. It was used in the model to make ML models to predict elastic constants. The root-mean-squared error (RMSE), the average coefficient of determination (R 2 ), and mean absolute error (MAE) are calculated for each ML model to evaluate its performance. The most satisfactory model with high performance during the training and testing phases is further used to predict the elastic constants of four experimental and simulation data. These data are entirely new to the model to predict elastic constants. The workflow of this work, as explained above, is illustrated in Figure 2.

Feature Selection Criteria
Before applying the ML models, the Pearson correlation coefficients (p) between each pair of descriptors are calculated, and the results are shown using the triangular heatmap in Figure 3. The numeric values of p equal to 0, 1, and −1 denote no correlation, high positive correlation, and high negative correlation, respectively. The highly correlated features are likely to overfit the ML model, which is undesirable. The dark purple and lightpink colors represent highly positive and negative correlations, respectively. Two of the descriptors, i.e., average electronegativity and average radius, are strongly correlated with VEC, and it has been removed from the descriptor list. Since there is no substantial dependency between any two of the remaining descriptors, all metrics should be incorporated into the ML model. Therefore, the number of descriptors reduce from 11 to 9.

Feature Selection Criteria
Before applying the ML models, the Pearson correlation coefficients (p) between each pair of descriptors are calculated, and the results are shown using the triangular heatmap in Figure 3. The numeric values of p equal to 0, 1, and −1 denote no correlation, high positive correlation, and high negative correlation, respectively. The highly correlated features are likely to overfit the ML model, which is undesirable. The dark purple and light-pink colors represent highly positive and negative correlations, respectively. Two of the descriptors, i.e., average electronegativity and average radius, are strongly correlated with VEC, and it has been removed from the descriptor list. Since there is no substantial dependency between any two of the remaining descriptors, all metrics should be incorporated into the ML model. Therefore, the number of descriptors reduce from 11 to 9.

ML Model Performance
The elastic constant for RCCAs was predicted from the dataset by employing three ML methods: RF, GBR, and XGBoost. The dataset is randomly divided into a training and a testing set with a 9 to 1 ratio. For each approach, hyper tunning is done with a crossvalidation score of 5. The different tuned hyperparameter of three ML models obtained from the Grid search method is shown in Table 4. MSA, RMSE, and R 2 are calculated for measuring the performances of the ML models using the equations below.
where y i ,ŷ i , y and n represent the DFT elastic constants, ML-predicted elastic constants, mean of all elastic constants, and sample size, respectively.

ML Model Performance
The elastic constant for RCCAs was predicted from the dataset by employing three ML methods: RF, GBR, and XGBoost. The dataset is randomly divided into a training and a testing set with a 9 to 1 ratio. For each approach, hyper tunning is done with a crossvalidation score of 5. The different tuned hyperparameter of three ML models obtained from the Grid search method is shown in Table 4. MSA, RMSE, and R are calculated for measuring the performances of the ML models using the equations below.
where , , , and n represent the DFT elastic constants, ML-predicted elastic constants, mean of all elastic constants, and sample size, respectively. The performance metrics MAE, R 2 , and RMSE of all three models are shown in Figure  4. To evaluate which model performs very well while predicting the elastic constant, MSA, R , and RMSE needs to be analyzed. ML model with a higher value of R 2 close to 1.0 with low MAE and RMSE will result in better performance. It is hard to distinguish which model performs well on the dataset if only R 2 is analyzed. Analyzing results from Figure  4a-c, GBR has the least MAE and RMSE with higher R 2 value than the other two ML models indicating its higher prediction accuracy of elastic constant. The R 2 value for all models ranges from 0.76-0.97 for the training and testing datasets. Similar value of R 2 was found by Trostianchyn et al. [57] while predicting the magnetic remanence of Sm-Co magnets using machine learning methods. They found the R 2 value of their ensemble ML model greater than 0.7. In this work, considering the different performance metrics of the GBR among other models, it is considered to be the optimal model. The performance metrics MAE, R 2 , and RMSE of all three models are shown in Figure 4. To evaluate which model performs very well while predicting the elastic constant, MSA, R 2 , and RMSE needs to be analyzed. ML model with a higher value of R 2 close to 1.0 with low MAE and RMSE will result in better performance. It is hard to distinguish which model performs well on the dataset if only R 2 is analyzed. Analyzing results from Figure 4a-c, GBR has the least MAE and RMSE with higher R 2 value than the other two ML models indicating its higher prediction accuracy of elastic constant. The R 2 value for all models ranges from 0.76-0.97 for the training and testing datasets. Similar value of R 2 was found by Trostianchyn et al. [57] while predicting the magnetic remanence of Sm-Co magnets using machine learning methods. They found the R 2 value of their ensemble ML model greater than 0.7. In this work, considering the different performance metrics of the GBR among other models, it is considered to be the optimal model.   The GBR model optimizes the bias while training the dataset and repeatedly trains the trees in the entire process to avoid the early paralysis of trees during the training data which results in better performance in prediction. The performance metrics of all 3 ML models are listed in Table 5. The MAE of training data of the GBR model is below 5, which means it has predicted the elastic constants of RHEAs with an average error of less than 5 GPa. Similarly, the MAE of the test data of the GBR model is less than 11 for the test dataset, which means there is not much difference in the prediction of the training and testing datasets. This manifests the testing data is not highly overfitted. The overall performances of all three models were good, which confirms that the selection of 9 descriptors fitted well for the ML models. The elastic constants predicted by the optimal GBR models versus DFT results on the training and testing datasets are illustrated in Figure 5. Most training data points (blue dots), and testing data points (dark pink diamonds) are clustered around the light blue solid diagonal line, indicating that predicted elastic constants of RCCAs predicted by the GBR ML model are consistent with the DFT calculations. Some testing data points are slightly far away from the diagonal line. It is due to the absence of sufficient data points around the relevant area. In the future, the addition of more data points will further improve the model, and data points will likely distribute evenly around the diagonal line.  The GBR model optimizes the bias while training the dataset and repeatedly trains the trees in the entire process to avoid the early paralysis of trees during the training data which results in better performance in prediction. The performance metrics of all 3 ML models are listed in Table 5. The MAE of training data of the GBR model is below 5, which means it has predicted the elastic constants of RHEAs with an average error of less than 5 GPa. Similarly, the MAE of the test data of the GBR model is less than 11 for the test dataset, which means there is not much difference in the prediction of the training and testing datasets. This manifests the testing data is not highly overfitted. The overall performances of all three models were good, which confirms that the selection of 9 descriptors fitted well for the ML models. The elastic constants predicted by the optimal GBR models versus DFT results on the training and testing datasets are illustrated in Figure 5. Most training data points (blue dots), and testing data points (dark pink diamonds) are clustered around the light blue solid diagonal line, indicating that predicted elastic constants of RCCAs predicted by the GBR ML model are consistent with the DFT calculations. Some testing data points are slightly far away from the diagonal line. It is due to the absence of sufficient data points around the relevant area. In the future, the addition of more data points will further improve the model, and data points will likely distribute evenly around the diagonal line.

Additional Testing on The Final Model
Furthermore, four experimental and simulation data of RCCAs available in the previous publications [10,17] were passed in the optimal model, i.e., GBR, to predict the elastic constants for additional testing. All the elastic constants data of four alloys is provided in Table 6 and the results are shown in Figure 6, a histogram with an error bar. The error bar denotes the deviation from the original prediction. The elastic constants predicted by the GBR ML models for TiZrVNbMo, TiZrNbMo, and TiZrVNb RHEAs are compared with the simulation conducted by Tian's group. The elastic constants predicted by the ML models are in good agreement with the SQS model calculation since the average error bar length is smaller. However, there is a slight difference between the SQS model and the ML model, which may have been caused by the number of atoms used in the model construction and the applied DFT method. The SQS structures for RHEAs of MoNbTiZr, MoNbTiVZr, and NbTiVZr consist of 128, 250, and 128 atoms, respectively. In comparison, depending on the types of RCCAs, the DFT calculation utilized in this work to build the database comprises supercells of 8, 16, 32, 36, 64, and 125 atoms. Due to inconsistency in the size of the atomic model and applied potential, there is a difference between chemical interactions of atoms in the DFT method, which results in a slight difference in elastic constants [58]. The elastic constant C 11 predicted by the ML model for NbTaTiV matches the experimental value reported in Lee's study very well [17]. The slight difference between the ML and the experimental values of elastic constants may result from the temperature difference. The DFT models were performed at 0K, whereas the experiment was conducted at 300 K. Similar results were found while predicting the elastic constants of HEA Al0.3CoCrFeNi by Kim et al. [28]. The predicted elastic constants of HEA Al0.3CoCrFeNi deviate from the measured values from the in situ neutron-diffraction. As mentioned before, measuring elastic constants by experiment is time consuming and has its limitation. It is expected Materials 2022, 15, 4997 9 of 16 more high throughput experimental methods will be designed to confirm the findings of the elastic constants for RHEAs MoNbTiVZr, MoNbTiZr, and NbTiVZr.

Additional Testing on The Final Model
Furthermore, four experimental and simulation data of RCCAs available in the previous publications [10,17] were passed in the optimal model, i.e., GBR, to predict the elastic constants for additional testing. All the elastic constants data of four alloys is provided in Table 6 and the results are shown in Figure 6, a histogram with an error bar. The error bar denotes the deviation from the original prediction. The elastic constants predicted by the GBR ML models for TiZrVNbMo, TiZrNbMo, and TiZrVNb RHEAs are compared with the simulation conducted by Tian's group. The elastic constants predicted by the ML models are in good agreement with the SQS model calculation since the average error bar length is smaller. However, there is a slight difference between the SQS model and the ML model, which may have been caused by the number of atoms used in the model construction and the applied DFT method. The SQS structures for RHEAs of MoNbTiZr, MoNbTiVZr, and NbTiVZr consist of 128, 250, and 128 atoms, respectively. In comparison, depending on the types of RCCAs, the DFT calculation utilized in this work to build the database comprises supercells of 8, 16, 32, 36, 64, and 125 atoms. Due to inconsistency in the size of the atomic model and applied potential, there is a difference between chemical interactions of atoms in the DFT method, which results in a slight difference in elastic constants [58]. The elastic constant C11 predicted by the ML model for NbTaTiV matches the experimental value reported in Lee's study very well [17]. The slight difference between the ML and the experimental values of elastic constants may result from the temperature difference. The DFT models were performed at 0K, whereas the experiment was conducted at 300 K. Similar results were found while predicting the elastic constants of HEA Al0.3CoCrFeNi by Kim et al. [28]. The predicted elastic constants

Descriptor Importance and Visualization
The Shapley Additive exPlanations (SHAP) method explains how each descriptor influences the ML by calculating the Shapley values [59]. All the dataset instances are used in the SHAP bar graph by taking the mean absolute value of every descriptor. The relevance ranking of 9 descriptors and their role in contributing to the elastic constants are displayed in Figure 7. VEC and MT hold the top two positions in predicting the elastic constants. From the previous findings, it is interesting to note the role of the VEC descriptor, which has the top position in descriptor importance in determining the crystal structure [20][21][22][23][24][25], Young's modulus [27,60,61], and hardness [62] of HEAs. The previous study has shown that VEC is the essential feature in determining Young's modulus of complex concentrated alloys in their ML models [27]. Moreover, Roy et al. [63] also found that MT is an essential parameter in predicting Young's modulus of HEAs using the gradient boosting method.

Descriptor Importance and Visualization
The Shapley Additive exPlanations (SHAP) method explains how each descriptor influences the ML by calculating the Shapley values [59]. All the dataset instances are used in the SHAP bar graph by taking the mean absolute value of every descriptor. The relevance ranking of 9 descriptors and their role in contributing to the elastic constants are displayed in Figure 7. VEC and MT hold the top two positions in predicting the elastic constants. From the previous findings, it is interesting to note the role of the VEC descriptor, which has the top position in descriptor importance in determining the crystal structure [20][21][22][23][24][25], Young's modulus [27,60,61], and hardness [62] of HEAs. The previous study has shown that VEC is the essential feature in determining Young's modulus of complex concentrated alloys in their ML models [27]. Moreover, Roy et al. [63] also found that MT is an essential parameter in predicting Young's modulus of HEAs using the gradient boosting method. Using the SHAP value, the relevance ranking of various descriptors provides a numerical correlation between the input descriptors and target descriptors. Further details between the descriptors and elastic constant can be explored using the data visualization techniques and the concept of material. The parallel coordinate plot (PCP) of the dataset is plotted to display the correlations of the descriptor with the elastic constants, and it is shown in Figure 8. The highest value of each descriptor is shown on the top and bottom of the vertical axes, respectively. The values of alloy elastic constants are demonstrated by the three-colored bars on the right-hand side of Figure 8. The purple color represents the highest elastic constants, and yellow represents the lower elastic constant. The PCP indicates that higher VEC, MT, and GP values with lower omega values are favorable for developing a higher elastic constant in refractory alloys. The VEC value positively influences the elastic constant of RHEAs; the greater the VEC, the higher the elastic constant, and vice versa. This observation is quite similar to Guo et al. findings [53], which show when VEC is less than 6.87, the solid solution phase of the BCC is more stable due to solid strengthening caused by lattice distortion. Similar phenomena is observed from the PCP, when VEC lies between 5 to 6.3, the elastic constant becomes higher and it could be due to the lattice distortion. The lattice distortion is also formed when different size elements are mixed, and this mixing will force the atoms to move from their original position and result in lattice distortion. Previous work has also described the significance and importance of VEC in physical characteristics and phase stability in alloys [53,54,64,65]. Similar results were found by Huang et al. [66] about the VEC, which holds the top position in features importance while predicting the hardness of RHEAs using the RF model. They also highlight that VEC is important in charge transfer, which impacts RHEA strength because of lattice distortion. So far, the lattice distortion study is still in progress, and it is done by simulation only. Experimental measurement of lattice distortion in alloys is needed to get further knowledge about the contribution of each element to the elastic properties of RHEAs. The second top descriptor is melting temperature. According to the PCA plot, RHEAs with higher melting temperatures favor the elastic constant of materials. The RHEAs consisting of elements Re, W, Ta, and Mo which have higher VEC and MT, have increased the alloy's elastic constant. Additional visualization using the contour plot is plotted to see the relationship between top important features VEC and MT with elastic constants. Figure 9a-c show the contour plot of the elastic constants of RHEAs as a function of VEC and MT. The original value of VEC and MT is normalized to get the smooth graph. The right-hand side color bar denotes the values of elastic constant in GPa. According to the contour plot, elastic constants of RHEAs increased with higher MT and VEC values. If MT is higher in RHEAs, with low VEC, then the alloy will have low elastic constants. A similar color pattern graph is obtained for all elastic constants as a function of VEC and MT. As a result, it is necessary to consider MT and VEC to get high elastic constants. The change in elastic constants with change in materials stability will help to improve the mechanical properties. Using the SHAP value, the relevance ranking of various descriptors provides a numerical correlation between the input descriptors and target descriptors. Further details between the descriptors and elastic constant can be explored using the data visualization techniques and the concept of material. The parallel coordinate plot (PCP) of the dataset is plotted to display the correlations of the descriptor with the elastic constants, and it is shown in Figure 8. The highest value of each descriptor is shown on the top and bottom of the vertical axes, respectively. The values of alloy elastic constants are demonstrated by the three-colored bars on the right-hand side of Figure 8. The purple color represents the highest elastic constants, and yellow represents the lower elastic constant. The PCP indicates that higher VEC, MT, and GP values with lower omega values are favorable for developing a higher elastic constant in refractory alloys. The VEC value positively influences the elastic constant of RHEAs; the greater the VEC, the higher the elastic constant, and vice versa. This observation is quite similar to Guo et al. findings [53], which show when VEC is less than 6.87, the solid solution phase of the BCC is more stable due to solid

Comparison of Current Work with Previous Work
In terms of elastic constant, Vasquez et al. [67] reported the analytical model to predict the elastic constants of RHEAs consisting of five elements Nb, Mo, Ta, W, and V alloys. In comparison, ten elements were included in the dataset in this study to give a boarder range of compositional space for alloys. Furthermore, hyper tunning of all 3 ML models is also performed, and metrics such as MSE and RMSE are calculated to find the optimal ML model capable of predicting elastic constant with higher accuracy. The dataset and model could be more beneficial to optimizing the various alloys and help in alloy design.
According to the contour plot, elastic constants of RHEAs increased with higher MT and VEC values. If MT is higher in RHEAs, with low VEC, then the alloy will have low elastic constants. A similar color pattern graph is obtained for all elastic constants as a function of VEC and MT. As a result, it is necessary to consider MT and VEC to get high elastic constants. The change in elastic constants with change in materials stability will help to improve the mechanical properties.   function of VEC and MT. The original value of VEC and MT is normalized to get the smooth graph. The right-hand side color bar denotes the values of elastic constant in GPa. According to the contour plot, elastic constants of RHEAs increased with higher MT and VEC values. If MT is higher in RHEAs, with low VEC, then the alloy will have low elastic constants. A similar color pattern graph is obtained for all elastic constants as a function of VEC and MT. As a result, it is necessary to consider MT and VEC to get high elastic constants. The change in elastic constants with change in materials stability will help to improve the mechanical properties.  . (a-c) represents the contour plot of elastic constants C11, C12, and C44, respectively. Figure 9. (a-c) Represents the contour plot of elastic constants C 11, C 12 , and C 44 , respectively.
As the availability of elastic constants for RCCAs is very limited, this study demonstrates the methodology of estimating the elastic constants for RCCAs based on the ML models using a database constructed from DFT calculations. A total of 370 instances of data were created with the help of DFT computational simulations. The used database is relatively small, but as previous studies show, ML can successfully be applied to a small dataset. As an example, Islam et al. [20] also used 118 instances and five features of the multi-principal element alloys dataset, which were used to forecast the alloy's structure. Wen et al. [32] applied ML methods to the alloy system of Al-Co-Cr-Cu-Fe-Ni to find the maximum hardness with just 155 samples. Another group [68] identified the phase formation in HEAs with a small dataset by selecting low dimension descriptors. This study aimed to predict the elastic constants of RCCAs without performing any expensive DFT simulations and experimental methods, which is supposed to save time, effort, and money in designing RCCAs. It is indispensable to state that elastic constants are essential properties affecting materials' stability, strengths, and anisotropy. There is still plenty of room to optimize the compositional space of alloys. It is necessary to determine the shear strains using an atomic displacement field to evaluate the lattice distortion. The elastic constant is also one parameter for obtaining the overall influence of such local strain changes [69]. Therefore, faster methods that are effective in exploring the compositional space of alloys for creating high-temperature and high-pressure performance materials are highly demanded. The current ML methods can acquire information from the descriptor of datasets and link the relationship between the HEAs composition and target properties, i.e., elastic constants. One achievement of this study is that if a user comes up with a new composition of an alloy, the provided Python program can predict all of the nine descriptors for the considered alloys. If the obtained information is passed into the trained ML models, the elastic constant of the given alloy will be predicted. These models may be further applied to study the compositional space of novel RCCAs, and they can be used to develop future RCCAs with desirable mechanical properties. These ML models do not consider complex phenomena such as lattice defects, dislocations, grain boundaries, slip planes, and plastic deformation in RCCAs. It is expected that more powerful models will be developed that can consider the complex phenomenon of RCCAs to make more accurate predictions of mechanical properties.

Conclusions
In this study, three popular ML methods, namely random forest regressor, gradient boosting regressor, and decision tree regression models, were used for predicting the elastic properties of RCCAs. The ML model consists of eight different descriptors of RCCAs. GBR performs very well as compared with the other two ML models. While predicting elastic constants C 11 , C 12 , and C 44 , the R 2 value of testing data of the GBR model were 0.97, 0.803, and 0.787, respectively. The GBR model has a high R 2 value and low MAE and RMSE error than RF and XGB ML models and is promising for predicting the elastic constant of RHEAs. The elastic constants of alloys MoNbTiVZr, MoNbTiZr, NbTiVZr, and NbTaTiV predicted by the GBR model are consistent with the current simulation and experimental findings, which validate its prediction accuracy and applicability. The feature importance from SHAP and PCP plot shows that the VEC and MT are the most influential features affecting the elastic constants of RHEAs. The dataset created in this study using CALPHAD and DFT modeling could help scientists to construct RHEAs that can endure extreme temperatures and pressures. Compared to the computationally expensive DFT calculation and experimentally tedious trial-and-error methods for calculating the elastic constants, ML methods can be applied to predict the elastic constant of any novel refractory high-entropy alloys with very low costs, and hence helps on designing RHEAs to meet the industry demands. The GBR model is not capable of predicting the elastic constants of RHEAs at different temperatures and therefore future work should be focused on creating an ML model that can predict the elastic constants at varying temperatures.

Data Availability Statement:
The dataset utilized to produce the results in this work is accessible at https://github.com/uttambhandari91/Elastic-constant-DFT-data.

Conflicts of Interest:
The authors declare no conflict of interest.