Application of Machine Learning Algorithms and SHAP for Prediction and Feature Analysis of Tempered Martensite Hardness in Low-Alloy Steels

Abstract: The tempering of low-alloy steels is important for controlling the mechanical properties required in industrial fields. Several studies have investigated the relationships between the input and target values of materials using machine learning algorithms. The limitation of machine learning algorithms is that the mechanism by which the input values affect the output has yet to be confirmed despite numerous case studies. To address this issue, we trained four machine learning algorithms to control the hardness of low-alloy steels under various tempering conditions. The models were trained using the tempering temperature, holding time, and composition of the alloy as the inputs. The input data were drawn from a database of more than 1900 experimental datasets for low-alloy steels created from the relevant literature. We selected the random forest regression (RFR) model to analyze its mechanism and the importance of the input values using Shapley additive explanations (SHAP). The prediction accuracy of the RFR for the tempered martensite hardness was better than that of the empirical equation. The tempering temperature is the most important feature for controlling the hardness, followed by the C content, the holding time, and the Cr, Si, Mn, Mo, and Ni contents.


Introduction
The tempering of low-alloy steels involves the control of their mechanical properties through ε-carbide and carbide formation, dislocation recovery, carbide spheroidization, and recrystallization [1][2][3][4][5][6]. It is important to use appropriate tempering to meet the property requirements of industrial use. Empirical equations for adjusting the hardness of tempered low-alloy steels have been proposed. Hollomon and Jaffe proposed a tempering parameter comprising the tempering temperature and holding time to predict the hardness of tempered martensite. These equations are given by [1]:

TP = T(C + log t) (1)

TMH = f(TP, X_c) (2)

where TP is the tempering parameter, T is the tempering temperature in Kelvin, t is the holding time in hours for isothermal conditions, C is a constant, TMH is the tempered martensite hardness, and X_c is the amount of carbon in weight percent. The equations effectively explain the effects of the tempering temperature and holding time. However, the hardness of tempered martensite also varies with the alloying elements, and this variation cannot be accurately estimated by Equation (2). Kang and Lee therefore proposed a composition-dependent tempering parameter given as [2]

TP = T(log t + k_0 + ∑_i k_i X_i) (3)

where k_0 is a constant, k_i is the coefficient of element i, and X_i is the amount of alloying element i in weight percent. The values of k_0, k_C, k_Mn, k_Si, k_Ni, k_Cr, and k_Mo are 17.396, −6.661, −1.604, −3.412, −0.248, −1.112, and −4.355, respectively [2]. These optimal k_i values were determined from over 1900 experimental data points and revealed that the change in activation energy related to tempering is associated with the chemical composition. This equation makes it possible to evaluate the hardness variations of tempered low-alloy steel with the tempering temperature, holding time, and alloy composition. These equations explain the hardness of tempered martensite reasonably well.
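As a concrete illustration, the two tempering parameters can be computed as follows. This is a sketch: the functional forms of Equations (1) and (3) are reconstructed from the text, the k_i coefficients are those quoted above, and the Hollomon-Jaffe constant C ≈ 20 is a commonly cited literature value rather than one taken from this paper.

```python
# Sketch of the Hollomon-Jaffe (Eq. (1)) and Kang-Lee (Eq. (3))
# tempering parameters. Functional forms are reconstructed from the
# text; verify against the original references [1,2].
import math

K0 = 17.396  # k_0 from Kang and Lee [2]
K = {"C": -6.661, "Mn": -1.604, "Si": -3.412,
     "Ni": -0.248, "Cr": -1.112, "Mo": -4.355}  # k_i coefficients [2]

def hollomon_jaffe(T, t, C=20.0):
    """Eq. (1): TP = T * (C + log10 t); T in K, t in h.
    C ~ 20 is a commonly used value for steels (assumption)."""
    return T * (C + math.log10(t))

def kang_lee(T, t, composition):
    """Eq. (3): TP = T * (log10 t + k0 + sum_i k_i X_i); X_i in wt.%."""
    comp = sum(K[el] * wt for el, wt in composition.items())
    return T * (math.log10(t) + K0 + comp)

# Example: a 0.4C-0.8Mn-0.3Si (wt.%) steel tempered at 873 K for 2 h
tp_hj = hollomon_jaffe(873.0, 2.0)
tp_kl = kang_lee(873.0, 2.0, {"C": 0.40, "Mn": 0.80, "Si": 0.30})
```

Note how a longer holding time or a higher temperature raises TP, which is what lets a single parameter trade time against temperature.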
However, it is still important to confirm the importance of each parameter to the hardness of tempered martensite and to improve the accuracy of the equation to reduce the prediction error. Machine learning algorithms are powerful methods for improving prediction accuracy. They can predict target values more accurately than classical alloy modeling methods, without a trial-and-error process that requires considerable time, cost, and effort. Such algorithms include random forest regression (RFR), support vector regression (SVR), k-nearest neighbors (kNN), and artificial neural networks (ANN). These algorithms are widely used to predict target parameters such as the yield strength, hardness, and thermal properties in materials science [7][8][9][10][11][12][13][14]. For instance, Zhang et al. predicted the mechanical properties of a nonequiatomic high-entropy alloy using an ANN and an SVR [7]. Narayana et al. predicted the mechanical properties of 20Cr-20Ni-0.4C steels over a wide range of temperatures using an ANN [8]. However, analyzing how trained machine learning models work remains a challenge. Shapley additive explanations (SHAP) have been used to study the mechanisms of trained machine learning algorithms [15][16][17]. Yan et al. used SHAP to evaluate the importance of input features and how they affect the corrosion rate of low-alloy steels [15]. This analysis opened the way to evaluating the mechanisms of machine learning algorithms, which were previously regarded as black boxes.
In this study, we propose the use of machine learning algorithms to improve prediction accuracy and investigate how the tempering temperature, holding time, and alloy composition affect the tempered martensite hardness. We collected a dataset containing the composition, tempering temperature, and holding time of low-alloy steels from the literature. SHAP was then used to assess the mechanisms of the trained machine learning models.

Data Collection
A total of 1926 usable datasets for low-alloy steels were collected from the literature [2]. Each dataset consists of the composition in terms of six elements (C, Mn, Si, Ni, Cr, and Mo), the tempering features, namely the tempering temperature and holding time, and the Vickers hardness (HV). The ranges of the compositions, tempering features, and HV are listed in Table 1.
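The ranges in Table 1 are simple column-wise minima and maxima over the collected records. A minimal sketch of how such ranges are derived, using three illustrative records (not the actual data of [2]):

```python
# Derive Table-1-style (min, max) ranges per column from a set of
# records. The three records below are illustrative, not real data.
import csv, io

raw = """C,Mn,Si,Ni,Cr,Mo,temp_C,time_h,HV
0.20,0.75,0.25,0.00,0.50,0.00,400,1,350
0.40,0.80,0.30,1.00,1.00,0.20,600,2,280
0.60,0.70,0.20,0.50,0.80,0.10,200,1,600
"""

rows = list(csv.DictReader(io.StringIO(raw)))
ranges = {col: (min(float(r[col]) for r in rows),
                max(float(r[col]) for r in rows))
          for col in rows[0]}
# e.g. ranges["C"] gives the (min, max) carbon content in wt.%
```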

Machine Learning Training
To predict the HV, an ANN, SVR, kNN, and RFR were employed. The ANN learning algorithm searches for the optimal weight and bias features to explain the relationship between the input and target features [18]. The main hyperparameters in ANN learning are the optimizer, activation function, number of layers, and number of nodes. The ANN was trained with the number of layers varied from 1 to 2, the number of neurons from 1 to 100, and the learning rate set to 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, or 0.1. Deep learning, which uses three or more hidden layers, is typically applied to tens of thousands of data points. In this study, we trained the ANN with 1 or 2 layers, which was sufficient to deal with the total of 1926 datasets efficiently and accurately. In addition, we used root mean squared propagation (RMSprop) as the optimizer, the rectified linear unit (ReLU) as the activation function for the hidden layers, and the linear transfer function (purelin) as the activation function for the output layer. The datasets for the ANN were divided into proportions of 70%, 15%, and 15% for training, validation, and testing, respectively. The SVR algorithm was trained to minimize the weight while satisfying the constraints given by [19]:

minimize (1/2)‖W‖² + C ∑_i (ξ_i + ξ_i*) (4)

subject to

y_i − WX_i − b ≤ ε + ξ_i, WX_i + b − y_i ≤ ε + ξ_i*, ξ_i, ξ_i* ≥ 0 (5)

where W is the weight matrix, C is a constant, ξ_i and ξ_i* are the soft margins, X is the matrix of input values, y_i is the target value, b is a constant, and ε is the error tolerance. The values of C and ε affect the SVR performance. The SVR was trained with ε set to 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, or 0.1. We set C to 100 and used a radial basis kernel with a constant value of 0.1. The kNN algorithm predicts the target value using the k nearest values [20]; its performance varies with the value of k, which was varied from 1 to 20 during training. The RFR consists of a multitude of decision trees [21,22]: it trains k decision trees and takes the average of their predictions. The performance of the RFR is affected by the number of decision trees.
The RFR was trained using 10 to 100 decision trees. The datasets for the SVR, kNN, and RFR algorithms were split into 80% for training and 20% for validation. R² was used to assess the predictive accuracy and is given by:

R² = [∑_i (x_i − x̄)(y_i − ȳ)]² / [∑_i (x_i − x̄)² ∑_i (y_i − ȳ)²] (6)

where x and y are the measured and predicted target values, respectively, and x̄ and ȳ are their arithmetic means. The training processes and algorithm assessment were performed using TensorFlow Version 2.0 (Google Brain Team, Mountain View, CA, USA) and the scikit-learn Version 0.23.1 module in Python Version 3.7 (Python Software Foundation, Fredericksburg, VA, USA).
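The hyperparameter sweeps and the correlation-based R² check described above can be sketched with scikit-learn. The synthetic inputs and the toy hardness-like target below are placeholders for the 1926 experimental records; the sweep ranges (10 to 100 trees, k from 1 to 20, 80/20 split) follow the text.

```python
# Sketch of the RFR / kNN hyperparameter sweeps with a correlation-
# based R^2 (Eq. (6)) as the selection criterion. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

def r_squared(x, y):
    """Squared Pearson correlation between measured x and predicted y,
    as in Eq. (6). (sklearn's .score() uses a different R^2 definition.)"""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.sum((x - x.mean()) * (y - y.mean()))
    return cov**2 / (np.sum((x - x.mean())**2) * np.sum((y - y.mean())**2))

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 8))   # temp, time, C, Mn, Si, Ni, Cr, Mo (toy)
y = 800 - 500 * X[:, 0] + 100 * X[:, 2] + rng.normal(0, 5, 300)

# 80% / 20% train-validation split, as in the text
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
for n in range(10, 101, 10):     # RFR: 10 to 100 trees
    m = RandomForestRegressor(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    results[("RFR", n)] = r_squared(y_va, m.predict(X_va))
for k in range(1, 21):           # kNN: k from 1 to 20
    m = KNeighborsRegressor(n_neighbors=k).fit(X_tr, y_tr)
    results[("kNN", k)] = r_squared(y_va, m.predict(X_va))

best = max(results, key=results.get)   # best (model, hyperparameter) pair
```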

SHAP Method
For a simple model such as linear regression, the best explanation is often the model itself. However, it is difficult to explain how complex models such as machine learning models work and how the input values affect the target values. SHAP, which is based on game theory and local explanations, can explain the relationship between the inputs and targets using the Shapley value. The Shapley value can be expressed as [23,24]:

φ_i = ∑_{S ⊆ F\{i}} [|S|!(|F| − |S| − 1)!/|F|!][v(S ∪ {i}) − v(S)] (7)

where φ_i is the Shapley value of the ith input value, S is a set of input values with the ith input value excluded, |S| is the magnitude of S (for example, if S = {x_1, x_2, . . . , x_{i−1}, x_{i+1}, . . . , x_{n−1}, x_n}, then |S| = n − 1), F is the set of all input values, and v is calculated based on the marginal contribution of the input values. The explanation model, a linear function of the Shapley values, is given by:

g(z′) = φ_0 + ∑_{i=1}^{M} φ_i z′_i (8)

where z′_i ∈ {0, 1} and M is the number of input values; z′_i is 1 when the ith input value is observed and 0 otherwise. The value of the explanation model demonstrates how the model works. For example, to obtain the Shapley value of C, we evaluate v with the whole data and with the C data excluded, and then compute φ_C from the difference. The Shapley value makes it possible to evaluate the average influence of an input value on the target value.

Results and Discussion
The machine learning algorithms with the best R 2 values were selected. Figure 1 shows the accuracy of the selected machine learning algorithms with the training dataset.
The RFR exhibited the best R² value of 0.9966. The second-best model was the kNN, which had an R² value of 0.9648. The ANN and SVR had R² values of less than 0.9. The average learning time for the ANN was 103.8 s, whereas that for the other algorithms was just a few seconds. The selected RFR algorithm consisted of 30 trees. The selected kNN algorithm used k = 4. The structure of the selected ANN was 9-71-1 with a learning rate of 0.01, and the selected SVR used ε = 0.01. Based on the R² values, we chose the RFR for HV prediction. Figure 2a shows the performance of the RFR for the test datasets that were not used for training. The accuracy was 0.9795. This demonstrates that the RFR predictions were reliable and that the RFR was successfully trained. The R² values for the test datasets were 0.9492, 0.8474, and 0.5344 for the kNN, the ANN, and the SVR, respectively. Figure 2b compares the performance of Equation (3) and the RFR. The RFR showed a higher accuracy (R² = 0.9961) than Equation (3) (R² = 0.9407), indicating that the RFR predicted the HV more accurately.
Figure 3 shows the influence of each feature. We calculated the absolute Shapley value of each input value and then averaged it; Mean(|SHAP|) therefore indicates the average absolute magnitude of the influence of each input value on the HV. The tempering temperature had the greatest influence on the HV, followed by the C content, the holding time, and the contents of Cr, Si, Mn, Mo, and Ni. Figure 4 shows the average influence of each variation on the hardness. Figure 4a shows the effect of the tempering temperature on the HV, with carbon chosen as the color legend.
To make the plot, we calculated the Shapley values at each tempering temperature with the RFR and confirmed the effect of each variation, which is mostly related to the tempering temperature. Hence, on average, the HV decreases as the tempering temperature increases. For the same tempering temperature, the C content is the most effective value for controlling the HV. Speich et al. proposed that the tempering process can be divided into four stages according to the tempering temperature. The first stage occurs at temperatures between room temperature and 250 °C. In this stage, carbon segregation (<0.2 wt.% C) and ε-carbide precipitation (>0.2 wt.% C) occur. The tempering progresses to the second stage at temperatures between 230 and 300 °C. At this stage, the retained austenite decomposes. The third stage takes place between 200 and 300 °C. At this stage, rod-shaped carbide is precipitated at the austenite grain boundaries, the martensite lath boundaries, the boundaries between ε-carbide and the matrix, and the twin boundaries. The fourth stage occurs between 300 and 600 °C. At this stage, dislocation recovery and Fe3C spheroidizing occur. Above 600 °C, the crystal structure recrystallizes and Fe3C coarsening occurs [3][4][5][6]. The HV decreases through the four stages of tempering. The tempering temperature effect in the RFR reflects this process. Figure 4b shows the effect of C on the HV. The HV is enhanced with the increase in the C content. The HV is influenced by several factors, including (1) the restriction of dislocation movement by C atoms located at interstitial sites, (2) the increase in ε-carbide transformation at tempering temperatures between room temperature and 250 °C, and (3) the decrease in the frequency of auto-tempering due to the decline in the martensite start temperature. Figure 4a,b show the relationship between the C content and the tempering temperature.
At tempering temperatures between 100 and 300 °C, the HV is enhanced when the C content increases. In this range of temperatures, ε-carbide and rod-shaped carbide are precipitated. The amount of carbide increases with the C content, leading to the enhancement of the HV. However, between 400 and 700 °C, the HV is reduced by an increase in the C content. This reduction may be due to the decrease in the recrystallization temperature of the low-carbon alloy. A larger C content increases the density of Fe3C, which fixes the ferrite boundaries. The earlier occurrence of recrystallization in the low-carbon alloy compared to the high-carbon alloy results in a finer grain in the low-carbon alloy. In contrast, in the high-carbon alloy, spheroidal Fe3C is formed, and dislocations are recovered. At a large C content, recrystallization is inhibited by the pinning action of carbides on the boundaries. The recrystallization temperature of the low-C alloy is lower than that of the high-C alloy [6].
Figure 4c shows the effect of the holding time on the HV. A longer tempering holding time provides sufficient time for the alloying elements to diffuse in the steel; hence, the tempering process proceeds sufficiently. Figure 4d-h shows the effects of the alloying elements. All the elements improve the HV. Cr and Mo enhance the HV through the formation of carbides, which are harder than Fe3C. Si improves the stability of ε-carbide by retarding the nucleation and growth of Fe3C [6]. Mn has been reported to delay martensite softening to a higher temperature and to make the carbides smaller and more numerous. The HV is hardly affected by Ni [3].
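The Mean(|SHAP|) ranking shown in Figure 3 is mechanically simple: average the absolute Shapley value of each feature over all samples and sort. A sketch with an illustrative Shapley-value matrix (toy numbers chosen to reproduce the paper's ordering, not the actual SHAP output):

```python
# Compute a Mean(|SHAP|) feature ranking as in Figure 3: mean of the
# absolute per-sample Shapley values, per feature. Values are toy data.
import numpy as np

features = ["temp", "C", "time", "Cr", "Si", "Mn", "Mo", "Ni"]
# rows = samples, columns = per-feature Shapley values (illustrative)
shap_values = np.array([
    [-120.0,  60.0, -20.0, 10.0,  8.0,  5.0,  4.0, 1.0],
    [  90.0, -40.0,  15.0, -8.0, -6.0, -4.0, -3.0, 0.5],
])

mean_abs = np.abs(shap_values).mean(axis=0)            # Mean(|SHAP|)
ranking = [features[i] for i in np.argsort(mean_abs)[::-1]]
```

Taking the absolute value before averaging matters: positive and negative contributions would otherwise cancel, hiding features that strongly influence the HV in both directions.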

Conclusions
In this study, the use of the RFR to predict the HV of low-alloy steels was proposed. The RFR algorithm was trained using the tempering temperature, holding time, and the amounts of the added alloying elements C, Mn, Si, Ni, Cr, and Mo. The RFR achieved a larger improvement in the accuracy of the predicted hardness than the composition-dependent empirical equation. The effect of each feature on the RFR was evaluated using SHAP values. The HV decreased as the tempering temperature and holding time increased and increased as the amount of added alloying elements increased. The most important feature affecting the HV was the tempering temperature, followed by the C content, the holding time, and the Cr, Si, Mn, Mo, and Ni contents.
