Phase Transformation Temperature Prediction in Steels via Machine Learning

The phase transformation temperature plays an important role in the design, production and heat treatment of steels. In the present work, LightGBM, an improved gradient-boosting method, was used to study the factors influencing four phase transformation temperatures, namely Ac1, Ac3, the martensite transformation start (MS) temperature and the bainitic transformation start (BS) temperature. The effects of the alloying elements were discussed in detail by comparing their influencing mechanisms on the different phase transformation temperatures. The training accuracy was significantly improved by further introducing appropriate features related to atomic parameters. The melting temperature and coefficient of linear thermal expansion of the pure metals corresponding to the alloying elements, the atomic Waber–Cromer pseudopotential radii and the valence electron number were the top four among the eighteen atomic parameters used to improve the trained model performance. The training and prediction processes were analyzed using partial dependence plot (PDP) and Shapley additive explanation (SHAP) methods to reveal the relationships between the features and the phase transformation temperatures.


Introduction
The microstructure and mechanical properties of steels depend on chemical composition, plastic deformation and heat treatment process [1][2][3][4][5][6]. Phase transformation start and finish temperatures in steels, marking the initial formation of the diverse microstructure, are crucial in designing steels with different targeted microstructures [7,8], such as martensite and bainite, as well as other advanced high-strength steels (AHSS) with complex microstructures, for example, medium manganese steels requiring inter-critical annealing [9][10][11][12]. The martensite transformation start (MS) temperature has attracted significant interest over the years because lath martensite is a base microstructure constituent for most high-strength steels [13]. Numerous methodologies, such as thermodynamics-based methods, linear regression, artificial neural network (ANN) modeling and machine learning, have been applied to predict the MS temperature [14][15][16]. Meanwhile, owing to the virtues of bainitic steels, such as high strength, ductility, toughness and creep resistance at reasonable cost, the austenite-to-bainite transformation has also attracted considerable interest [17]. The Ac1 and Ac3 temperatures are the main parameters used to design the heat treatment process of steels. For example, medium Mn steel (3-12 wt.% Mn), one of the candidate steels for third-generation AHSS, typically consists of an ultrafine-grained dual-phase (austenite-ferrite) microstructure obtained through the inter-critical annealing (annealing between the Ac1 and Ac3 temperatures) of the quenched martensite matrix [18]. However, studies on the prediction of the bainitic transformation start (BS) temperature [19] and the Ac1 and Ac3 temperatures [20] by means of machine learning are limited compared with those on the prediction of the MS temperature. Systematic and comparative studies on the influencing factors of the main phase transformation temperatures in steels have not been reported. Meanwhile, most research has focused on the accuracy of trained models, whereas the training process and the explanation of the trained models have received limited attention. This work aims to conduct in-depth research on these two concerns.
The main drawback of the empirical formulation is that it utilizes a linear equation to describe the relationship between the alloying element content and the phase transformation start temperature [21][22][23]. The general structure of the equation is as follows [24]:
$$T = k_0 + w_1 k_1 + w_2 k_2 + w_3 k_3 + \cdots + w_N k_N \quad (1)$$
where $T$ is the predicted phase transformation temperature and $k_1, k_2, k_3, \ldots, k_N$ represent the contents of the alloying elements (wt%). Similarly, $w_1, w_2, w_3, \ldots, w_N$ denote the weight coefficients, and $k_0$ is the bias coefficient. Normally, empirical formulas are only applicable to a limited range of steels. In early machine learning studies, the chemical composition was chosen as the input feature to train models [25]. Recently, new features related to a simplified but still complicated Gibbs energy change description have been included in the feature space to improve the performance of trained models, and significant improvements have been achieved. Moreover, to predict the martensite transformation start temperature in Fe-C-X alloys, researchers recently constructed a complicated formula to represent the main part of the non-driving force of martensite transformation [26]. However, in our previous work [27], it was found that the addition of new features related to six atomic parameters significantly improved the performance of the trained model in predicting the martensite transformation temperature. Therefore, in this work, instead of constructing complicated formulas to describe the driving force and/or resistance force of the specific phase transformation, more atomic parameters were considered to construct new features.
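As a minimal sketch of how a linear empirical formula of this kind is evaluated, the Python snippet below computes a bias plus a weighted sum of alloying contents. The function name, the coefficient values and the example composition are hypothetical placeholders chosen for illustration; they are not taken from any of the cited formulas.

```python
from typing import Mapping


def linear_empirical_temperature(
    composition_wt: Mapping[str, float],
    weights: Mapping[str, float],
    bias: float,
) -> float:
    """Evaluate T = k0 + sum_i(w_i * k_i) for one steel composition (wt%)."""
    # Elements missing from the composition contribute nothing to the sum.
    return bias + sum(w * composition_wt.get(el, 0.0) for el, w in weights.items())


# Hypothetical coefficients for illustration only (not a published formula).
EXAMPLE_BIAS = 500.0
EXAMPLE_WEIGHTS = {"C": -400.0, "Mn": -30.0, "Ni": -20.0, "Cr": -10.0}

steel = {"C": 0.2, "Mn": 1.5, "Cr": 0.5}  # hypothetical composition in wt%
predicted = linear_empirical_temperature(steel, EXAMPLE_WEIGHTS, EXAMPLE_BIAS)
print(f"Predicted temperature: {predicted:.1f}")
```

Because every element enters through a single fixed coefficient, a formula of this shape cannot represent interactions between elements or content-dependent effects, which is the limitation discussed above.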
To improve the performance of trained prediction models of the phase transformation temperature, many efforts have been made, such as parameter tuning, principal component analysis, careful training dataset preparation, diverse dataset cleaning methodologies and comparisons of different machine learning methods [28]. Wang et al. [29] integrated deep data mining of thermodynamic calculations with a deep learning framework to develop a versatile and scalable model for the prediction of the martensite transformation start temperature. Thermodynamic calculations enhance the information in a feature set but necessitate specialized computational software and databases. Lu et al. [30] utilized thermodynamic knowledge in combination with a multi-layer feedforward neural network to reduce the dimension of the feature space through kernel principal component analysis. Furthermore, a genetic algorithm was employed to find the appropriate hyperparameters to predict the martensite transformation start temperature of steels. Peet et al. [31] utilized a combination of a thermodynamic model and a Bayesian neural network to predict the martensite transformation start temperature. Tian et al. [32] assessed four machine learning models, namely random forest regression, support vector regression, linear regression and XGB regression. Both random forest and XGB, which are based on tree models, demonstrated excellent performance, suggesting that ensemble algorithms coupled with tree models are effective in addressing nonlinear problems. Most existing machine learning models struggle with interpretability. Tree-based models, including random forest and gradient-boosting tree models, exhibit natural and strong interpretability. These ensemble models not only retain the excellent interpretability of tree models but also offer superior performance, addressing the limitations associated with interpretability in traditional models. LightGBM is an improved and high-performance gradient-boosting framework with higher efficiency and accuracy [33]. For example, LightGBM recently outperformed other classic machine learning methods, such as XGBoost, random forest, SVR and Lasso, in the prediction of the corrosion rate of 3C steel [34], and other boosting methods, such as adaptive boosting (AdaBoost), gradient-boosting machine (GBM), extreme gradient boosting (XGBoost) and categorical gradient boosting (CatBoost), in the prediction of the sequence of plastic hinge formation in steel frame structures [35].
Therefore, in this work, to deepen the understanding of the influence mechanisms of alloying elements on phase transformation temperatures, a systematic study of the four transformation temperatures was conducted. LightGBM was chosen as the machine learning algorithm, integrated with atomic parameter descriptors. Furthermore, to improve the understanding of the prediction models, the partial dependence plot (PDP) and Shapley additive explanations (SHAP) analysis methods were utilized to explain the trained models and the prediction results.

LightGBM Algorithm
In the present work, LightGBM, an improved version of the gradient-boosting method, was chosen as the machine learning algorithm [36]. LightGBM achieves faster training and better prediction accuracy through measures such as histogram-based algorithms, which bucket the continuous values of features into discrete bins, and a leaf-wise tree growth strategy, in which the leaf with the maximum loss reduction is selected to grow, so the number of leaves at each level is not always the same [37,38].
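The histogram idea can be illustrated with a short, simplified Python sketch: continuous feature values are mapped to a small number of discrete bins, and per-bin statistics are accumulated so that candidate splits only need to be evaluated at bin boundaries. Equal-width bins, the function names and the example values are assumptions made here for illustration; LightGBM's own bin construction is more elaborate, and it accumulates gradient statistics rather than the generic per-sample values used below.

```python
from typing import List, Sequence, Tuple


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each continuous value to a bin index in the range [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def build_histogram(
    bin_ids: Sequence[int], stats: Sequence[float], n_bins: int
) -> Tuple[List[int], List[float]]:
    """Accumulate the sample count and the sum of a per-sample statistic per bin."""
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for b, s in zip(bin_ids, stats):
        counts[b] += 1
        sums[b] += s
    return counts, sums


carbon_wt = [0.0, 0.1, 0.25, 0.5, 1.0]   # hypothetical feature values
residuals = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical per-sample statistics
bins = equal_width_bins(carbon_wt, n_bins=4)
counts, sums = build_histogram(bins, residuals, n_bins=4)
print(bins, counts, sums)
```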

Evaluation Metrics
Mean absolute error (MAE) and the coefficient of determination ($R^2$) were utilized to evaluate the accuracy of the trained model. MAE, the mean absolute difference between the predicted and real values, can be expressed as follows:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - Y_i\right| \quad (2)$$
where $N$ represents the number of samples, and $y_i$ and $Y_i$ represent the real and predicted values of the $i$th sample, respectively. The smaller the MAE is, the better the performance [39]. The coefficient of determination ($R^2$) indicates the proportion of the variance of the dependent variable $Y$ that can be accounted for by the independent variable $x$ in the regression model, expressed as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - Y_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \quad (3)$$
where $N$ refers to the number of samples, and $y_i$, $Y_i$ and $\bar{y}$ represent the real value of the $i$th sample, the predicted value of the $i$th sample and the average of the real values, respectively. The larger $R^2$ is, the better the performance [40].
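The two metrics can be computed directly from their definitions. The sketch below is a plain-Python illustration; the function names and the example temperatures are hypothetical and do not come from the datasets used in this work.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE: average absolute difference between real and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2: one minus the residual sum of squares over the total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical real and predicted temperatures in degrees Celsius.
real = [700.0, 720.0, 740.0, 760.0]
pred = [705.0, 715.0, 745.0, 750.0]
mae = mean_absolute_error(real, pred)
r2 = r2_score(real, pred)
print(f"MAE = {mae:.2f}, R2 = {r2:.4f}")
```

Note that MAE keeps the unit of the target (here degrees Celsius), whereas $R^2$ is dimensionless, which is why the two are reported together.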

Model Interpretability Metrics
Interpretability is crucial in understanding the models trained by means of machine learning methods [41]. In this work, PDP and SHAP were used to explore the relationships between the features and the phase transformation temperatures. Partial dependence plots (PDP) describe the marginal influence of one or two features on the prediction outcome of a trained machine learning model:
$$\hat{f}_{x_s}(x_s) = E_{x_c}\left[\hat{f}(x_s, x_c)\right] = \int \hat{f}(x_s, x_c)\, dP(x_c) \quad (4)$$
where $\hat{f}$ is the trained model, $x_s$ is the set of features for which the partial dependence function should be plotted and $x_c$ represents the other remaining features utilized in the machine learning model. The features $x_s$ and the set $x_c$ together make up the total feature space. By marginalizing over the features in set $x_c$, a function that depends only on the features in $x_s$ is obtained [42].
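In practice the expectation over the remaining features is estimated by averaging over the dataset. The following sketch shows this for a single feature; the toy model, its coefficients and the sample rows are hypothetical stand-ins for a trained model and real data.

```python
from typing import Callable, List, Sequence


def partial_dependence(
    model: Callable[[Sequence[float]], float],
    rows: Sequence[Sequence[float]],
    feature_index: int,
    grid: Sequence[float],
) -> List[float]:
    """Estimate the partial dependence of one feature by averaging over the data."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # force the feature of interest
            total += model(modified)         # other features keep observed values
        curve.append(total / len(rows))
    return curve


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: x[0] = C, x[1] = Mn (wt%)."""
    return 500.0 - 300.0 * x[0] - 20.0 * x[1]


data = [[0.1, 1.0], [0.3, 2.0], [0.5, 3.0]]  # hypothetical samples
pdp_carbon = partial_dependence(toy_model, data, feature_index=0, grid=[0.0, 0.5])
print(pdp_carbon)
```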
Shapley additive explanations (SHAP) comprise an interpretative method of machine learning based on game theory. The formula is as follows:
$$g(Z') = \varphi_0 + \sum_{i=1}^{M} \varphi_i Z'_i \quad (5)$$
where $g$ is the explanation model, $Z' \in \{0, 1\}^M$, $M$ is the number of input features, and $\varphi_i \in \mathbb{R}$. The $Z'_i$ variables typically represent a feature being observed ($Z'_i = 1$) or unknown ($Z'_i = 0$), and the $\varphi_i$ are the feature attribution values [43][44][45].
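For a model with only a few features, the attribution values can be computed exactly by enumerating all feature subsets. The sketch below does this under a simplifying assumption: an "unknown" feature is replaced by a fixed baseline value rather than averaged over a background dataset, as SHAP implementations typically do. The toy model, instance and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values; unknown features take their baseline value."""
    m = len(instance)

    def value(subset: set) -> float:
        x = [instance[j] if j in subset else baseline[j] for j in range(m)]
        return model(x)

    phis = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        phi = 0.0
        for size in range(m):
            # Shapley weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical model with an interaction term: x[0] = C, x[1] = Mn (wt%)."""
    return 500.0 - 300.0 * x[0] - 20.0 * x[1] - 50.0 * x[0] * x[1]


instance = [0.4, 2.0]
baseline = [0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
phi_0 = toy_model(baseline)  # prediction with every feature unknown
print(phi, phi_0 + sum(phi), toy_model(instance))
```

The printed check illustrates the additive property of Equation (5): the baseline prediction plus the sum of the attributions reproduces the model output for the instance.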

Machine Learning Strategy
K-fold cross-validation was utilized in the present work, as presented in Figure 1. The dataset was subdivided into K subsets which were independent of each other. Each subset was selected as a test set in turn, and the remaining K − 1 subsets were selected as a training set. The performance of the selected machine learning method was evaluated by averaging the prediction accuracies of the K tests [46]. The workflow of the present work is shown in Figure 2.
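The splitting procedure can be sketched in a few lines of Python. The value K = 5, the random seed and the function name are assumptions for illustration, since the value of K is not stated in this section.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k disjoint folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=10, k=5, seed=0))
for fold_id, (train_idx, test_idx) in enumerate(splits):
    print(fold_id, sorted(test_idx), len(train_idx))
```

Each sample is used for testing exactly once, so the averaged score is based on predictions for the whole dataset while no model is ever evaluated on a sample it was trained on.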
Materials 2024, 17, x FOR PEER REVIEW

Data Collection and Screening
The original dataset of the present work was downloaded from a subset of the phase transformation database of the Materials Algorithms Project (MAP), provided by the University of Cambridge. MAP is an open scientific research project that originated from a high-quality joint project of the University of Cambridge and the National Physical Laboratory and was sponsored for four years by the Engineering and Physical Sciences Research Council (EPSRC) of the United Kingdom [47][48][49].
To ensure data quality, entries in the MS dataset with the same chemical composition but different MS values were removed. In both the Ac1 and Ac3 datasets, entries with the same chemical composition and heating rate but different Ac1 or Ac3 values were removed. In the BS dataset, entries with the same chemical composition and cooling rate but different BS values were removed. The sizes of the four datasets after data cleaning are shown in Table 1, and the numbers of deleted samples in each dataset are given in Table 2. The overall information of the MS, Ac1 and Ac3, and BS datasets is shown in Tables 3-5, respectively.
Atomic parameters can be divided into two categories. One is associated with the properties of the free atoms, such as the radius, electronegativity and ionization energy. The other is related to the pure metals corresponding to the alloying elements [50]. The atomic parameters utilized in the present work are shown in Table 6.
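The screening rule for conflicting measurements described in this section can be sketched as follows. The record layout, the field names and the choice to drop every member of a conflicting group (rather than keep one representative) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def drop_conflicting_entries(
    records: Sequence[Record], key_fields: Sequence[str], target_field: str
) -> Tuple[List[Record], int]:
    """Remove groups that share the key fields but disagree on the target value."""
    groups: Dict[tuple, List[Record]] = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)

    cleaned: List[Record] = []
    removed = 0
    for group in groups.values():
        targets = {rec[target_field] for rec in group}
        if len(targets) > 1:
            removed += len(group)   # conflicting measurements: drop the group
        else:
            cleaned.extend(group)   # consistent entries are kept
    return cleaned, removed


# Hypothetical MS records: composition in wt%, temperature in degrees Celsius.
ms_records = [
    {"C": 0.2, "Mn": 1.0, "MS": 400.0},
    {"C": 0.2, "Mn": 1.0, "MS": 410.0},  # same composition, different MS
    {"C": 0.4, "Mn": 0.5, "MS": 350.0},
]
cleaned, n_removed = drop_conflicting_entries(ms_records, ["C", "Mn"], "MS")
print(len(cleaned), n_removed)
```

For the Ac1, Ac3 and BS datasets, the same function would be called with the heating or cooling rate added to `key_fields`.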

Performance of Empirical Formula
The empirical formulas collected to predict the Ac1 and Ac3 temperatures, the bainite transformation start (BS) temperature and the martensite transformation start (MS) temperature are tabulated together with their sources [24,25][50][51][52][53][54][55][56][57][58]. The performance of the empirical formulas was evaluated on the dataset provided by the Materials Algorithms Project (MAP) and compared with four preliminary machine learning models (called base models) trained using the same datasets. The four base models were trained on the MAP dataset using only the chemical compositions, the cooling or heating rates, and the corresponding phase transformation temperatures. During training, n_estimators was set to 600, random_state was set to 8 and all other LightGBM hyperparameters were left at their default values. The performances of the four trained models and the empirical formulas were compared on the remaining part of the MAP dataset, which was not utilized in the training of the models. The feature sets of the machine learning models are shown in Table 11. The empirical formulas clearly exhibit larger errors than the models trained by means of machine learning (as shown in Figure 3), which indicates that the empirical formulas have inherent limitations that restrict their applicability [59,60].

Table 10. Empirical formulas for BS calculation (columns: No., Ref., Equations).

Performance of the Machine Learning Models Trained with Atomic Parameters
In our previous work [27], it was found that the introduction of new features related to atomic parameters could significantly improve the performance of trained models in predicting the martensite transformation start temperature. In this work, a more complete set of atomic parameters (18 types) was introduced to construct new features. Each atomic parameter was separately introduced into the feature space to train a new model, and the resulting 19 models were then compared under different evaluation indexes, as shown in Figures 4 and 5. Only the atomic parameters were attached to each sample, and the size of the dataset was not changed. Meanwhile, the models with atomic parameters were trained with the same hyperparameters as the base models. Each newly trained model was named by the abbreviation of the introduced atomic parameter, as shown in Table 6. The model without any atomic parameter was named the base model. It was found that all the models with atomic parameters used to predict the MS temperature outperformed the base model. For the other three phase transformation temperatures, most of the models with atomic parameters outperformed the base model. Specifically, the melting temperature and linear thermal expansion coefficient of the pure metal related to the alloying element, the valence electron number and the pseudopotential radius ranked first among the most effective atomic parameters in training the Ac1, MS, Ac3 and BS prediction models, respectively. Figure 6 shows that, except for the BS prediction model, the newly introduced feature related to the atomic parameters ranks first in feature importance in the other three models.
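This section does not state how a single atomic-parameter feature is computed from the composition of a steel. One common descriptor construction, assumed here purely for illustration and not confirmed as the method of this work, is a weight-fraction-weighted mean of the pure-metal property with iron as the balance element. The function name and the example composition are hypothetical; the melting points are standard values for the pure metals.

```python
from typing import Dict, Mapping

# Melting points of the pure metals in degrees Celsius.
MELTING_POINT_C: Dict[str, float] = {
    "Fe": 1538.0,
    "Mn": 1246.0,
    "Ni": 1455.0,
    "Cr": 1907.0,
}


def weighted_atomic_feature(
    composition_wt: Mapping[str, float],
    property_table: Mapping[str, float],
    balance: str = "Fe",
) -> float:
    """Composition-weighted mean of a pure-metal property (assumed construction)."""
    fractions = {el: wt / 100.0 for el, wt in composition_wt.items()}
    # The balance element makes up whatever the listed alloying elements do not.
    fractions[balance] = 1.0 - sum(composition_wt.values()) / 100.0
    return sum(frac * property_table[el] for el, frac in fractions.items())


steel = {"Mn": 2.0, "Ni": 1.0, "Cr": 1.0}  # hypothetical composition in wt%
melting_feature = weighted_atomic_feature(steel, MELTING_POINT_C)
print(f"Melting-temperature feature: {melting_feature:.2f}")
```

Under this assumption, one extra column per atomic parameter is appended to each sample, which matches the statement that the dataset size is unchanged.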

Figure 7 shows that Pearson's linear correlation coefficient between any two features in the new feature space with the addition of the best atomic parameter was below 0.7 for all four models, suggesting limited correlation between the features, so no extra feature screening was needed. Figure 8 exhibits the final trained models with the best atomic parameters. All the points are close to the diagonal lines, which indicates that these four models were trained with high accuracy. Tables 12 and 13 show the evaluation results of the trained models with and without the features related to atomic parameters, respectively. The fitting error was significantly decreased: the MAE decreased by 1.604 °C, 0.932 °C, 4.785 °C and 2.659 °C for Ac1, Ac3, MS and BS, respectively, and R2 increased by 0.010, 0.019, 0.042 and 0.015 for Ac1, Ac3, MS and BS, respectively. The evaluation indexes were systematically improved; in particular, the performance of the trained MS prediction model was significantly increased.
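The correlation screen mentioned above can be sketched as a pairwise check of Pearson's coefficient against the 0.7 threshold. Applying the threshold to the absolute value of the coefficient, the function names and the example feature columns are assumptions for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient between two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # a constant column carries no linear correlation
    return cov / (sd_x * sd_y)


def highly_correlated_pairs(
    columns: Dict[str, Sequence[float]], threshold: float = 0.7
) -> List[Tuple[str, str, float]]:
    """Return the feature pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical feature columns (wt%); Ni is deliberately proportional to C.
features = {
    "C": [0.1, 0.2, 0.3, 0.4],
    "Mn": [1.0, 2.0, 1.0, 2.0],
    "Ni": [0.2, 0.4, 0.6, 0.8],
}
flagged = highly_correlated_pairs(features, threshold=0.7)
print(flagged)
```

An empty result corresponds to the situation reported for Figure 7, where no pair of features reached the threshold and no further screening was required.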

The Influencing Factors of MS Temperature
The nucleation and growth rates of the new phase are closely tied to the chemical composition of the steel during solid phase transformation [61][62][63]. Table 14 shows the classification of common alloying elements in steels. Generally, the austenite-forming elements should stabilize the austenite during the cooling stage and promote the transformation to austenite during the heating stage. Vice versa, the ferrite-forming elements should stabilize the ferrite during the heating stage and promote the transformation to ferrite during the cooling stage. Meanwhile, the effects of carbon on the phase transformation could be influenced differently by carbide-forming elements and non-carbide-forming elements. It is generally accepted that all alloying elements except Al and Co lower the MS temperature. The influencing mechanisms of the alloying elements on the MS temperature were explained in detail in our previous publication [27]. In summary, the higher the C, Ni, Cr, Mn and Si contents, the lower the SHAP values (negative), i.e., the lower the MS temperature; the higher the Al, Co and V contents, the higher the SHAP values (positive), i.e., the higher the MS temperature; N, Nb and Cu show a similar effect on the MS temperature as Al, Co and V do but with some exceptions. W and Mo demonstrate more complicated effects on the MS temperature.
Among all the alloying elements, C exhibited the most pronounced effect on the austenite decomposition temperature, with a diminishing effect at higher contents. Mn and Ni were negatively related to the MS temperature, consistent with the thermodynamic mechanism. The influence of alloying elements on the MS temperature was primarily governed by their impact on the T0 temperature, at which ferrite and austenite with the same chemical composition have equal free energy, and by their ability to strengthen the prior austenite phase. C demonstrated a significant effect in strengthening the austenite phase and substantially reduced the MS temperature. Likewise, the presence of Mn, Ni, Cu and some other austenite-forming elements was associated with a decrease in T0 and a marginal strengthening of the austenite, leading to a significant decrease in the MS temperature. On the other hand, ferrite-forming elements such as Al, Co, Si, Mo, W, V and Ti were found to elevate T0 but still enhanced the strength of prior austenite to different extents, as indicated by various studies [64][65][66][67][68][69]. Therefore, in most cases, the addition of alloying elements inhibits the formation of martensite and, correspondingly, decreases the MS temperature.
Figure 9 shows the importance of the features in the model with the best performance after adding a new feature, the coefficient of linear thermal expansion (COLTE) of the pure metals corresponding to specific alloying elements. Figure 6 shows that this feature ranked first in importance in fitting the machine learning model, and Figure 9 further demonstrates that this feature contributed significantly to the prediction outcome, second only to C, which highlights the importance of new feature construction. Figure 7 shows that COLTE had the largest positive correlation coefficient with Ni and the second and third largest positive correlation coefficients with Co and Al, respectively. Meanwhile, both C and Mn exhibited negative and similar correlation coefficients with COLTE. Further, as shown in Figure 9, the higher the COLTE values, the lower the MS temperature. These results reveal that the alloying elements have complicated effects on the MS temperature because they contribute to both sides, the driving force and the resistance force, i.e., the non-chemical driving forces. The non-chemical driving forces ($\Delta G_{a \to M}$) in the martensitic phase transformation include the dilatation strain energy ($\Delta G_{dil}$), dislocation stored energy ($\Delta G_{stor}$), the shearing energy of austenite ($\Delta G_{sh}$) and interfacial energy ($\Delta G_{inter}$), which can be specified as in Equation (6) [68]:
$$\Delta G_{a \to M} = \Delta G_{dil} + \Delta G_{stor} + \Delta G_{sh} + \Delta G_{inter} \quad (6)$$
These energy terms can be simplified; in particular, $\Delta G_{sh} = 0.53\sigma_s$, $\Delta G_{stor} = G b^2 \rho V_m$ and $\Delta G_{inter} = 70.9$. Here, $\sigma_s$ is the yield strength and $E$ is Young's modulus of austenite, treated as a constant. $\Delta L / L$ is related to the difference between the thermal expansion coefficients of ferrite and austenite. $V_m$ is the molar volume of iron atoms. $G$ is the shear modulus, $b$ is the Burgers vector and $\rho$ is the dislocation density in the formed martensite. It was found that the dilatation strain energy ($\Delta G_{dil}$) induced by the difference in thermal expansion coefficient between the formed martensite and austenite plays a key role in the non-chemical driving forces. A complicated equation (Equation (7)) depending on the chemical composition was given in the literature [26], and it was found that with this improved term, the accuracy of the martensite transformation start temperature prediction model was significantly improved [70]. In this work, it was found that the coefficient of linear thermal expansion of the pure metals corresponding to specific alloying elements was the most important feature influencing the MS temperature, as shown in Figure 6c. This agreement indicates the rationality and superiority of constructing new features based on atomic parameters in the training of the phase transformation temperature prediction model.

The Influencing Factors of BS Temperature
An incubation time is needed for bainite transformation, as characterized by its time-temperature-transformation curves. Therefore, the cooling rate outperformed the other features in the feature importance ranking during training, as shown in Figure 6d. Figure 10a clearly shows that the higher the C, Mn, Cr, Ni, Si, Mo, V, P and S contents and cooling rates, the lower the SHAP values (negative), i.e., the lower the BS temperature; the higher the Al, Ti and N contents, the higher the SHAP values (positive), i.e., the higher the BS temperature; Mn, Cu and B demonstrated more complicated effects on the BS temperature. The difference in the effects of alloying elements on the MS and BS temperatures indicates the different phase transformation mechanisms of the martensite and bainite transformations. Meanwhile, the atomic Waber–Cromer pseudopotential radius outperformed the atomic radius and the coefficient of linear thermal expansion in improving the performance of the BS prediction model, which indicates that the motion of the alloying element ion and the interaction between the alloying element ion and the surrounding electrons played an important role in bainite transformation, so the BS transformation could be better classified as a reconstructive phase transformation.
Figure 11 demonstrates that the BS temperature first increased with the Mn and Si contents, then decreased with increasing Mn and Si contents within the medium content range, and finally increased again with further increasing Si content. There was also a clear plateau on most of the PDP curves, which differs from the behavior observed for martensite transformation. The higher the carbon content, the lower the bainite transformation temperature.
It was reported that the bainite transformation kinetics of Fe-C-Si-Mn alloys were much slower than those of ternary Fe-C-Mn and Fe-C-Si alloys, which suggests that the interaction of Si and Mn has an important influence on the kinetics of bainite transformation in Fe-C-Si-Mn alloys. It was experimentally shown that Si could enhance Mn segregation to austenite grain boundaries and inhibit Fe3C precipitation, thereby inhibiting the nucleation of bainitic ferrite [71]. For low-carbon bainitic steels, it was found that Nb addition retarded bainitic transformation, whereas Mo addition was effective in promoting bainitic transformation [72]; the addition of B slightly decreased the bainite transformation temperature at low cooling rates, whereas the combined addition of B + Nb greatly decreased the transformation temperature [73]. Meanwhile, carbide-forming elements, such as Mo, Nb, V and Cr, raise the activation energy required for carbon diffusion in austenite, which then retards bainite formation [74].
Figure 12 illustrates the effects of the interaction between C and other alloying elements on the SHAP values. Generally, the SHAP values decreased with increasing C content. However, due to the interaction with other alloying elements, scattering in the SHAP values occurred at each carbon content. Mn, Mo, V, Al, Cr and B exhibited an obvious influence on the effect of carbon on the BS temperature. At lower carbon contents, the SHAP values of carbon increased with the Mn, Al and B contents, and at higher carbon contents, the SHAP values of carbon decreased with increasing Mo and V contents. Meanwhile, increasing S and P somewhat decreased the SHAP values of carbon. These results indicate that Mn, Al and B weaken the lowering effect of C on the BS temperature, whereas Mo and V as well as S and P enhance it. The C-Cr interaction demonstrated a complicated influence on the BS temperature. At lower carbon content, the C-Cr interaction decreased the BS temperature; at higher carbon content, it increased the BS temperature. At high carbon content, the C-Mn interaction tended to decrease the BS temperature. The C-Si interaction slightly increased the BS temperature.

The Influencing Factors of Ac1 Temperature
Figure 13a demonstrates the importance of the features on the final prediction output in the trained models. The importance of the alloying elements was arranged in the following order: Ni, Cr, Mn, Si, C, Cu, Mo, V, P, Nb, Al, N, B, W, Co, Ti and Zr. Generally, Ac1 temperature decreased with increasing contents of austenite-forming elements, especially Ni and Mn, and increased with increasing contents of ferrite-forming elements, especially Cr and Si. Figure 14 further exhibits the dependence of Ac1 temperature on alloying element contents based on the PDP analysis. Unusually, it was found that at lower contents, the austenite-forming element Ni increased the Ac1 temperature, i.e., retarded austenite formation, and the ferrite-forming element Si decreased the Ac1 temperature, i.e., promoted austenite formation. For example, there were two stages regarding the change in Ac1 temperatures of the investigated steels with varying Si content. In the range of 0-1.0 wt% Si, the influence of Si content on the Ac1 temperatures was weak. In some alloys, the influence of Ni content on the Ac1 temperatures was also weak. In the scope of this study, the effects of Ni and Si on Ac1 are summarized above. In another early work [75], 80 entries were used for training and 40 randomly selected entries were used for testing a trained network, covering structural steels, stainless steels, rail steels, spring steels, high-temperature creep-resisting steels and tool steels. It was observed that Ac1 increased with Si content and decreased with increasing Ni content.
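The importance order reported for the Ac1 model corresponds to ranking the features by their average impact on the model output magnitude, i.e., the mean absolute SHAP value over all samples. A minimal sketch of this ranking is given below; the function name `rank_by_mean_abs_shap` and the small SHAP matrix are made up for illustration and are not values from the trained model.

```python
def rank_by_mean_abs_shap(feature_names, shap_rows):
    """Rank features by mean |SHAP| over samples (rows: samples, columns: features)."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

features = ["Ni", "Cr", "C"]
shap_rows = [  # made-up SHAP values in deg C, one row per steel
    [-12.0, 6.0, -1.0],
    [8.0, -5.0, 2.0],
    [-10.0, 7.0, -3.0],
]
ranking = rank_by_mean_abs_shap(features, shap_rows)
print(ranking)
```

Because the absolute value is taken before averaging, an element whose SHAP values change sign across the composition range (as reported for Ni at low contents) can still rank first.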
Mn and C always promoted austenite formation, and Cr inhibited austenite formation. However, the effect of C on Ac1 temperature remained constant after its content exceeded a certain amount. Generally, Ac1 temperature rises with increasing MP temperature (i.e., with a decrease in C content in the steel). Therefore, ferrite transformation in austenite should be inhibited. Figure 15 shows the influence of other alloying elements on the effects of Ni on Ac1 temperature. It was also found that, in the lower concentration range, the interaction of Si, Cu and Cr with Ni could increase the SHAP values, i.e., the Ac1 temperature. In other words, the ferrite-forming elements could weaken the effects of austenite-forming elements on Ac1 temperature.
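A partial dependence curve such as those in Figure 14 is obtained by forcing one feature to each value of a grid for every sample and averaging the model predictions. The sketch below shows this procedure in its simplest form; `toy_ac1` is a hypothetical stand-in for a trained model, with invented coefficients and an assumed saturation of the carbon effect at 0.6 wt%, chosen only to reproduce a plateau of the kind described above.

```python
def partial_dependence(predict, samples, feature, grid):
    """Average prediction with `feature` forced to each grid value in turn."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in samples]
        curve.append(sum(preds) / len(preds))
    return curve

def toy_ac1(row):
    # Hypothetical stand-in for a trained model (deg C); the C effect saturates at 0.6 wt%.
    return 720.0 - 20.0 * min(row["C"], 0.6) - 15.0 * row["Ni"] + 20.0 * row["Cr"]

samples = [  # made-up compositions in wt%
    {"C": 0.1, "Ni": 0.0, "Cr": 1.0},
    {"C": 0.4, "Ni": 2.0, "Cr": 0.0},
    {"C": 0.8, "Ni": 1.0, "Cr": 0.5},
]
grid = [0.0, 0.3, 0.6, 0.9]
pdp_c = partial_dependence(toy_ac1, samples, "C", grid)
print(pdp_c)
```

The curve decreases up to the saturation point and is flat beyond it. Since the other elements are averaged out, a PDP shows the mean trend only; the SHAP dependence plots are needed to see how that trend varies with the other alloying elements.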

The Influencing Factors of Ac3 Temperature
Figure 16 shows the importance of the features influencing Ac3 temperature. Austenite-forming and ferrite-forming alloying elements were clearly separated by the characteristics of their SHAP values. However, the span of SHAP values was smaller than in the other three models, which indicated that the total effects of alloying elements were weaker. For all austenite-forming alloying elements, the Ac3 temperature decreased with increasing alloying content because these elements expanded the austenite phase region and decreased the equilibrium Ac3 temperature. Meanwhile, Ac3 temperature increased with increasing ferrite-forming alloying element contents because these elements expanded the ferrite phase region and increased the equilibrium Ac3 temperature. Therefore, the effects of alloying elements on Ac3 temperature mainly depended on their influence on the equilibrium phase boundary. However, Figure 17 shows that the effect of Mn on Ac3 temperature only increased with its content within a narrow range (about 0.6-0.8 wt%), which was consistent with the literature [20], where the authors found that an increase in Mn content had little effect on the Ac3 temperatures of their investigated steels. The C, Si and Mo elements demonstrated enhanced effects with increases in their contents. The effect of Mo on Ac3 temperatures was similar to that of Si but with lower magnitude. Si demonstrated the same influence on Ac1 and Ac3 temperatures as Al did.
Figure 18 shows that the interaction of C with other alloying elements on Ac3 temperature was limited compared with their effects on other phase transformation temperatures. However, the C-Si and C-B interactions decreased the effect of C in lowering Ac3 temperature, and the C-Ni and C-Cr interactions enhanced it. To understand the effects of Mn on Ac3 temperature, the interaction of Mn with other alloying elements is further presented in Figure 19. The scattering degree of the SHAP values of Mn was significantly larger than that of the C SHAP values, which indicated that the effect of Mn on Ac3 temperature was obviously influenced by other alloying elements. C decreased the effect of Mn in lowering the Ac3 temperature at lower and higher Mn contents, whereas both Ni and C enhanced this effect at medium Mn contents. Meanwhile, Mo, V and Cr decreased the effect of Mn in lowering the Ac3 temperature, and Si and Al slightly enhanced it.
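The "scattering degree" compared above can be quantified, for instance, as the spread of the SHAP values among samples with a similar content of the element. The sketch below bins samples by content and reports the population standard deviation of the SHAP values in each bin; the bin width, the helper name `shap_scatter_by_bin` and all numbers are assumptions for illustration, and this is only one possible measure, not necessarily the one used to read Figures 18 and 19.

```python
from collections import defaultdict
from statistics import pstdev

def shap_scatter_by_bin(contents, shap_values, bin_width):
    """Population standard deviation of SHAP values within each content bin."""
    bins = defaultdict(list)
    for content, value in zip(contents, shap_values):
        bins[int(content // bin_width)].append(value)
    return {
        (idx * bin_width, (idx + 1) * bin_width): pstdev(vals)
        for idx, vals in sorted(bins.items())
        if len(vals) > 1  # a spread needs at least two samples
    }

# Made-up (content in wt%, SHAP value in deg C) pairs, for illustration only.
mn_content = [0.2, 0.3, 0.4, 1.1, 1.2, 1.3]
mn_shap = [4.0, -2.0, 7.0, -12.0, -3.0, -20.0]
c_content = [0.10, 0.15, 0.20, 0.60, 0.65, 0.70]
c_shap = [10.0, 9.0, 11.0, -5.0, -6.0, -4.0]

mn_scatter = shap_scatter_by_bin(mn_content, mn_shap, bin_width=0.5)
c_scatter = shap_scatter_by_bin(c_content, c_shap, bin_width=0.5)
print(mn_scatter)
print(c_scatter)
```

A larger within-bin spread for one element than for another indicates that its contribution depends more strongly on the rest of the composition.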

The Generalization Ability of the Trained Models
The MAP dataset utilized in the present work had a large concentration range for all the main alloying elements. Therefore, the trained models were expected to show good generalization ability on unseen data. The best prediction model for each transformation temperature was chosen to evaluate its generalization ability on four groups of experimental phase transformation temperatures [6,56,61,76] not included in the MAP project. Tables 15-18 show comparisons between the experimental phase transformation temperatures and the predicted ones. It was generally found that the trained models with the best atomic parameter features gave better predictions than the models without atomic parameter features. In Figure 20, the prediction process of the trained models is presented, where various features had distinct contributions to the final prediction. Table 15 demonstrates that the predicted MS temperatures were very close to the experimental ones. In the literature, Bohemen et al. [51] achieved the best results in the training of the MS temperature, with MAE = 5.60, RMSE = 7.11, R2 = 0.98 and EV = 0.98, using a thermodynamics-based model including the effect of the prior austenite grain size. In our previous work [27], we achieved a close prediction by considering the effects of alloying elements on the lattice constant of the prior austenite. In the present work, it was found that the coefficient of linear thermal expansion of the pure metals corresponding to specific alloying elements was the most important feature among the 18 types of atomic features. Both features were related to the dilatation strain energy induced by austenite-martensite transformation, which contributed most of the non-chemical driving force of the austenite-martensite phase transformation. For bainite transformation start temperature prediction, it was found that the best model was created with MAE = 17.34, RMSE = 24.67 and R2 = 0.913, considering element characteristics in the form shown in Table 19 [77]. In this work, it was found that the atomic Waber–Cromer pseudopotential radius came first in the ranking of feature importance influencing the BS temperature. Considering atomic parameter-based features, the difference between the experimental and predicted BS temperatures was significantly narrowed, with an averaged absolute error of 8.33, as shown in Table 16. However, the accuracy of BS temperature prediction was lower than that of the MS prediction model.
Note: a_i is the mole fraction of the alloying element, r_i is the atomic radius of the alloying element or ion, VEC_i is the number of valence electrons of the alloying element or ion, and χ_i is the electronegativity of the alloying element or ion.
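To make the notation in the note concrete, the sketch below converts a composition in wt% to mole fractions a_i and forms a mole-fraction-weighted mean of one atomic parameter, here the valence electron number VEC_i. The weighted-mean form, the inclusion of Fe as the balance element, the helper names and the example composition are all assumptions for illustration; the actual feature forms are those listed in Table 19 and may differ.

```python
ATOMIC_MASS = {"Fe": 55.845, "C": 12.011, "Mn": 54.938, "Si": 28.085}  # g/mol
VALENCE_ELECTRONS = {"Fe": 8, "C": 4, "Mn": 7, "Si": 4}

def mole_fractions(wt_percent):
    """Convert wt% of alloying elements to mole fractions, with Fe as balance."""
    wt = dict(wt_percent)
    wt["Fe"] = 100.0 - sum(wt_percent.values())
    moles = {el: w / ATOMIC_MASS[el] for el, w in wt.items()}
    total = sum(moles.values())
    return {el: m / total for el, m in moles.items()}

def weighted_feature(fractions, parameter):
    """Mole-fraction-weighted mean of one atomic parameter (assumed feature form)."""
    return sum(a * parameter[el] for el, a in fractions.items())

steel = {"C": 0.2, "Mn": 1.5, "Si": 0.3}  # wt%, hypothetical composition
a = mole_fractions(steel)
vec_feature = weighted_feature(a, VALENCE_ELECTRONS)
print(round(sum(a.values()), 6), vec_feature)
```

The same two helpers could be reused with a table of atomic radii or electronegativities in place of `VALENCE_ELECTRONS`, which is how a single composition vector yields one additional feature per atomic parameter.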
By using a neural network to train Ac1/Ac3 temperature prediction models, it was found that the absolute error of the predicted Ac1 temperature did not exceed 22 °C, with a relative error of less than 3.01%; the absolute error of the predicted Ac3 temperature did not exceed 28 °C, with a relative error of less than 3.02% [75]. The results presented in Tables 17 and 18 show that the prediction performance of the Ac1/Ac3 temperature prediction models trained in the present work by means of LightGBM was better, partially due to the quality of the MAP dataset and partially due to the improved LightGBM algorithm. For example, in earlier work, the averaged prediction errors of BS and MS temperatures were both larger than 20 °C using a trained artificial neural network model, even though the dataset had a narrowed chemical composition range, i.e., the total mass fraction of manganese, chromium, nickel and molybdenum did not exceed 5% [51]. This indicated the importance of the data cleaning and feature engineering strategy and the advantage of the newly developed machine learning algorithm. In the present work, considering the features based on atomic parameters, the prediction accuracy was significantly improved. Meanwhile, the performance of the Ac1 temperature prediction model was better than that of the Ac3 temperature prediction model, which was consistent with the literature [57].
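The error measures quoted in this section (MAE, RMSE, R2, EV, and the maximum absolute and relative errors) can be computed from paired experimental and predicted temperatures as sketched below. The temperature values are made up, and taking the relative error with respect to the measured temperature in °C is an assumption, since the basis used in [75] is not stated here.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Common error measures for measured vs. predicted temperatures."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e ** 2 for e in errors)
    var_err = sum((e - mean_err) ** 2 for e in errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": sqrt(ss_res / n),
        "R2": 1.0 - ss_res / ss_tot,
        # Explained variance ignores a constant offset between the two series.
        "EV": 1.0 - var_err / ss_tot,
        "max_abs_error": max(abs(e) for e in errors),
        "max_rel_error_percent": max(
            abs(e) / abs(t) * 100.0 for e, t in zip(errors, y_true)
        ),
    }

measured = [700.0, 720.0, 740.0, 760.0]   # deg C, made-up values
predicted = [705.0, 715.0, 745.0, 755.0]  # deg C, made-up values
metrics = regression_metrics(measured, predicted)
print(metrics)
```

R2 and EV coincide when the mean prediction error is zero, as in this example; a systematic offset lowers R2 but leaves EV unchanged, which is why both values are usually reported together.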

Conclusions
Prediction models for MS, BS, Ac1 and Ac3 temperatures were trained using the popular machine learning algorithm LightGBM, considering new features constructed from 18 atomic parameters. Most of the new features enhanced the performance of the trained models, and the underlying mechanisms were discussed from the perspective of phase transformation theories through PDP and SHAP analysis. The main conclusions can be drawn as follows:
(1) The prediction models for MS, BS, Ac1 and Ac3 temperatures were trained with high accuracy, achieved satisfactory predictions on the unseen experimental data, and exhibited higher accuracy and better generalization than the empirical formulas. The prediction model for MS temperature showed the highest accuracy, followed by the Ac1 temperature prediction model.
(2) C, Ni and Cr were the top three elements influencing MS temperature, followed by Mn and Mo. MS temperature increased with increasing Al and Co contents. Other alloying elements exhibited positive or negative influences on MS temperature in different composition ranges.
(3) Except for Al, Ti and N, the BS temperature generally decreased with increasing alloying element contents. Mn, Si and B elevated the BS temperature in certain content ranges.
(4) The averaged magnitude of the effects of alloying elements on phase transformation temperatures was highest for martensite transformation. Cooling rate and heating rate played important roles in bainite transformation during cooling and austenite transformation during heating, respectively.
(5) The interaction between alloying elements exhibited complicated effects on phase transformation temperatures. A linear relationship between the alloying element concentration and the phase transformation temperature was hardly observed, because each element contributes to both chemical and non-chemical driving forces and also interacts with other alloying elements.

Figure 2. The workflow of the present work.

Figure 3. Comparison between the performance of the empirical formulations and trained machine learning models without atomic parameters. (a) Ac1, (b) Ac3, (c) MS and (d) BS.

Figure 8. The fitting results in the models with the best atomic parameters. (a) Ac1, (b) Ac3, (c) MS and (d) BS.

Figure 9. SHAP analysis in the trained MS prediction model: (a) impact on model output and (b) average impact on model output magnitude.

Figure 10. SHAP analysis in the trained BS prediction model: (a) impact on model output and (b) average impact on model output magnitude.

Figure 11. PDP analysis on the effects of alloying element content on BS temperature.

Figure 12. SHAP analysis on the effects of alloy element C on BS temperature.

Figure 13. SHAP analysis in the trained Ac1 prediction model: (a) impact on model output and (b) average impact on model output magnitude.

Figure 14. PDP analysis on the effects of alloying element content on Ac1 temperature.

Figure 15. SHAP analysis on the effects of alloy element Ni on Ac1 temperature.

Figure 16. SHAP analysis in the trained Ac3 prediction model: (a) impact on model output and (b) average impact on model output magnitude.

Figure 17. PDP analysis on the effects of alloying element content on Ac3 temperature.

Figure 18. SHAP analysis on the effects of alloy element C on Ac3 temperature.

Figure 19. SHAP analysis on the effects of alloy element Mn on Ac3 temperature.


Table 1. Dataset size after data cleaning.

Table 2. Number of deleted samples.

Table 3. Overall information of MS dataset.

Table 4. Overall information of the Ac1 and Ac3 datasets.

Table 5. Overall information of BS dataset.

Table 6. Atomic parameter candidates utilized for constructing new features.

Table 7. Empirical formulas for MS calculation.

Table 11. Feature sets of machine learning models.

Table 12. Model performance before adding atomic parameters.

Table 13. Model performance after adding atomic parameters.

Table 14. Clarification of the normal alloying elements in the steels.

Table 15. Verification of MS temperature prediction model.

Table 16. Verification of BS temperature prediction model.

Table 17. Verification of Ac1 temperature prediction model.

Table 18. Verification of Ac3 temperature prediction model.

Table 19. Features related to the atomic parameters.