Modeling Yield Strength of Austenitic Stainless Steel Welds Using Multiple Regression Analysis and Machine Learning

Abstract: Designing welding filler metals with low cracking susceptibility and high strength is essential in welding low-temperature base metals, such as austenitic stainless steel, which is widely utilized for various applications. A strength model for weld metals using austenitic stainless steel consumables has not yet been developed. In this study, such a model was successfully developed. Two types of models were developed and analyzed: conventional multiple regression and machine-learning-based models. The input variables for these models were the chemical composition and heat input per unit length. Multiple regression analysis utilized five statistically significant input variables at a significance level of 0.05. Among the prediction models using machine learning, the stepwise linear regression model showed the highest coefficient of determination (R²) value and demonstrated practical advantages despite having a slightly higher mean absolute percentage error (MAPE) than the Gaussian process regression models. The conventional multiple regression model exhibited a higher R² (0.8642) and lower MAPE (3.75%) than the machine-learning-based predictive models. Consequently, the models developed in this study effectively predicted the variation in the yield strength resulting from dilution during the welding of high-manganese steel with stainless-steel-based welding consumables. Furthermore, these models can be instrumental in developing new welding consumables, thereby ensuring the desired yield strength levels.

The chemical composition of the weld metal for 9% Ni steel and high-Mn steel with austenitic stainless-steel-based welding consumables varies based on the degree of dilution between the base metal and filler metal [15,16]. This dilution affects the mechanical properties of the weld, such as the impact toughness, yield strength, and tensile strength [17,18]. Notably, the mechanical properties are also influenced by process parameters such as the heat input, welding position, and arc inclination because these factors influence the degree of dilution in dissimilar combinations of base metal and filler metal.
Several studies have focused on developing models to predict the mechanical properties of the base metal in stainless steels using multiple regression analyses [19–21]. In addition, there is ongoing research on the development of machine-learning-based models to predict the mechanical properties of various steels, including carbon and stainless steels [22–25]. However, studies on prediction models for the mechanical properties of weld metals are limited, and there is a need for exploration using artificial intelligence technology in addition to conventional multiple regression analysis. The properties of the weld metal are influenced by both its chemical composition and its thermal history, making the prediction model for the weld metal more sophisticated than that for the base metal. Verma et al. [26] proposed a prediction model for the mechanical properties of welded joints; however, their model was specifically developed for friction-stir-welded aluminum alloys, which limits its applicability.
Given the increasing demand for cryogenic liquid tanks, low cracking susceptibility, high strength, and cost-effectiveness are crucial factors in the design of welding filler metals to accommodate low-temperature base metals. For austenitic stainless steels, the Schaeffler phase diagram is used to estimate the phase of the weld metal based on its chemical composition, as a measure of cracking susceptibility. However, a strength model for austenitic stainless steel welds has not yet been developed.
Therefore, the objective of this study was to develop a machine-learning-based prediction model for the mechanical properties of weld metals using austenitic stainless steel filler metals. Specifically, the focus was on predicting the yield strength based on the chemical composition of the weld metal and the welding heat input. A dataset comprising 100 data points with various chemical compositions and heat inputs was utilized to train the prediction model, while an additional 27 data points were reserved for model verification. To establish yield strength models, various algorithms, including linear regression, support vector machine (SVM), Gaussian process regression (GPR), decision tree, and tree ensemble, were employed.

Data Preparation
A dataset consisting of 100 data points was used as the training dataset to establish a machine-learning-based prediction model for the weld metal strength. These data points were obtained from evaluation reports prepared by welding consumable manufacturers (Hyundai Welding Co., Ltd., Seoul, South Korea, and ESAB SeAH Corp., Seoul, South Korea) and shipyards (HD Korea Shipbuilding & Offshore Engineering Co., Ltd., Seoul, South Korea, and HD Hyundai Heavy Industries Co., Ltd., Seoul, South Korea) to assess the mechanical properties of the welds. The flux-cored arc welding (FCAW) process was applied using the thyristor-controlled welding power sources commonly used in shipyards, and CO2 was used alone as the shielding gas, so the globular transfer mode was applied. The input variables considered in the model were the composition of the weld metal, expressed as the contents of nine alloying elements (C, Si, Mn, P, S, Ni, Cr, Mo, and Cu), and the welding heat input per unit length. The output variable of the model was the yield strength of the weld metal. Table 1 presents the statistical characteristics of the data. Various base metals, such as high-Mn steel, stainless steel, and 9% Ni steel, were welded using austenitic stainless steel filler metals. To validate the predictive models developed in this study, additional yield strength data points for austenitic stainless steel weld metals were obtained from previous studies [27–31] and mill test certificates for shipyards. In total, 27 data points were acquired, exceeding 25% of the original 100 data points, and were used for model verification. The ranges of each variable in the collected datasets are listed in Table 2.

Multiple Regression Analysis
Pearson's correlation analysis was performed to assess the strength and direction of the relationship between each input variable and the output variable.
A multiple linear regression model was constructed using all the input variables in relation to the output variable. Additionally, a simplified model was developed considering only statistically significant input variables.
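As an illustration of the multiple linear regression step, the following minimal Python sketch fits an ordinary least-squares model by solving the normal equations. It is not the code used in this study: the function name `fit_ols`, the choice of three inputs (C, Mo, and heat input), and all numerical values are hypothetical and serve only to show the mechanics of the fit.

```python
def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(col + 1, p):
            f = M[k][col] / M[col][col]
            for c in range(col, p + 1):
                M[k][c] -= f * M[col][c]
    # Back substitution.
    coef = [0.0] * p
    for i in reversed(range(p)):
        s = M[i][p] - sum(M[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = s / M[i][i]
    return coef


# Hypothetical inputs: (C in wt%, Mo in wt%, heat input in kJ/cm).
X = [(0.02, 0.1, 8.0), (0.05, 2.0, 10.0), (0.10, 1.0, 14.0),
     (0.30, 3.0, 12.0), (0.20, 0.5, 20.0), (0.15, 2.5, 16.0)]
# Hypothetical generating coefficients, NOT those of the paper's equations.
TRUE = [300.0, 400.0, 30.0, -2.0]
y = [TRUE[0] + TRUE[1] * c + TRUE[2] * mo + TRUE[3] * hi for c, mo, hi in X]

coef = fit_ols(X, y)  # [intercept, b_C, b_Mo, b_HI]
print([round(b, 3) for b in coef])
```

With noise-free synthetic data, the fit recovers the generating coefficients. For real data, the p-value of each coefficient would additionally be needed to decide which variables are retained in the simplified model.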

Machine Learning Prediction Models
Prediction models were developed using the MATLAB (version R2022a) Regression Learner App. Several machine learning algorithms, namely linear regression, decision tree, SVM, regression tree ensemble, and GPR [32,33], were employed to establish prediction models for yield strength. The inputs for the model included the chemical composition of the weld metal and heat input.
The linear regression model predicts the value of a dependent variable by assuming that the independent and dependent variables have a linear relationship. Robust linear regression reduces the effect of outliers that distort the fit in linear regression by adjusting the weights assigned to the data points. Stepwise linear regression optimizes the model by adding terms step by step until the performance no longer improves, rather than inserting all terms at once.
A decision tree is a multistage decision-making algorithm that predicts the final dependent variable by composing a complex decision-making process from a combination of several simple decisions. This method matches the results by creating branching points using specific criteria in the dataset. It is called a decision tree because the shape of the overall model divided by the branching points is similar to that of a tree. A regression tree ensemble is a technique that combines multiple decision trees to create a robust model. Boosting and bagging were used in this study.
SVM is an algorithm originally designed to solve the binary classification problem; it finds a hyperplane that can classify data with the maximum margin. SVM has excellent generalization properties and can also be applied to regression problems. Six types of SVM models (linear, quadratic, cubic, and fine, medium, and coarse Gaussian) were used for the training.
GPR is a regression modeling technique that uses a Gaussian process. In the Gaussian process, the mean and covariance functions are used to define the distribution of the regression function; the covariance function is also called a kernel function. Four kernels were used: squared exponential, Matern 5/2, exponential, and rational quadratic.
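To make the four GPR kernel functions named above concrete, the following Python sketch writes each one as a function of the distance `r` between two input points. The forms are the standard textbook definitions of these kernels; the hyperparameter names (`length`, `sigma_f`, `alpha`) and their default values are assumptions for illustration and are not the fitted values from this study.

```python
import math


def squared_exponential(r, length=1.0, sigma_f=1.0):
    """k(r) = sigma_f^2 * exp(-r^2 / (2 * length^2))."""
    return sigma_f ** 2 * math.exp(-0.5 * (r / length) ** 2)


def matern52(r, length=1.0, sigma_f=1.0):
    """k(r) = sigma_f^2 * (1 + s + s^2/3) * exp(-s), s = sqrt(5)*r/length."""
    s = math.sqrt(5.0) * r / length
    return sigma_f ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)


def exponential(r, length=1.0, sigma_f=1.0):
    """k(r) = sigma_f^2 * exp(-r / length)."""
    return sigma_f ** 2 * math.exp(-r / length)


def rational_quadratic(r, length=1.0, sigma_f=1.0, alpha=1.0):
    """k(r) = sigma_f^2 * (1 + r^2 / (2*alpha*length^2))^(-alpha)."""
    return sigma_f ** 2 * (1.0 + r * r / (2.0 * alpha * length ** 2)) ** (-alpha)


KERNELS = {
    "squared exponential": squared_exponential,
    "Matern 5/2": matern52,
    "exponential": exponential,
    "rational quadratic": rational_quadratic,
}

# Covariance between two points as their separation grows.
for name, kernel in KERNELS.items():
    print(name, [round(kernel(r), 4) for r in (0.0, 0.5, 1.0, 2.0)])
```

All four kernels equal `sigma_f**2` at zero distance and decay as the points move apart; they differ in how quickly the covariance falls off, which controls the smoothness of the fitted regression function.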
To assess the performance of the developed model, a comparative evaluation was conducted using various predictive models, enabling a comprehensive analysis of their effectiveness.

Correlation Analysis
Table 3 presents the Pearson correlation coefficients and corresponding p-values for all the input variables in relation to the yield strength, which served as the output variable. Among the chemical compositions, Mo demonstrated the highest correlation coefficient, i.e., 0.7032, with the yield strength. A correlation is generally regarded as meaningful when the absolute value of the correlation coefficient exceeds 0.4 [34]. Based on this criterion, the input variables C, Si, S, and Mo were found to be correlated with the yield strength. The statistical significance of the relationships was assessed using a significance level of 0.05 [35]. When the p-value was below this threshold, the relationship was considered statistically significant. Consequently, the input variables C, Si, S, Ni, and Mo were identified as statistically significant, based on their respective p-values.
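The two screening criteria described above (|r| > 0.4 and p < 0.05) can be sketched in Python as follows. This is an illustrative sketch only: the p-value uses a normal approximation to the t-distribution, which is an assumption that is reasonable for about 100 data points but is not necessarily how the values in Table 3 were computed. The value r = 0.7032 for Mo is taken from the text, whereas r = 0.25 and r = 0.10 are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n data points (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def approx_p_value(r, n):
    """Two-sided p-value via a normal approximation (assumption, large n)."""
    return math.erfc(abs(t_statistic(r, n)) / math.sqrt(2.0))


def screen(r, n, r_min=0.4, alpha=0.05):
    """Return (meets |r| criterion, meets p-value criterion)."""
    return abs(r) > r_min, approx_p_value(r, n) < alpha


N_TRAIN = 100  # number of training data points stated in the text
print("Mo, r = 0.7032:", screen(0.7032, N_TRAIN))
print("hypothetical r = 0.25:", screen(0.25, N_TRAIN))
print("hypothetical r = 0.10:", screen(0.10, N_TRAIN))
```

The hypothetical r = 0.25 case shows how a variable can be statistically significant at the 0.05 level while failing the |r| > 0.4 criterion, which is the situation reported for Ni.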

Multiple Regression Analysis
A multiple regression model utilizing all the input variables was developed as Equation (1), and its coefficients and p-values are presented in Table 4.
In Equation (1), the chemical composition is given in weight percent and the heat input (HI) is in kJ/cm. The p-values for C, Cr, Mo, Cu, and HI were less than 0.05; these variables can therefore be considered statistically significant at a significance level of 0.05 and have a meaningful effect on the model's output. Owing to the limited number of training data points, the signs of some coefficients may be inconsistent with the physical behavior caused by the respective elements. However, the coefficients of the significant variables aligned with the actual role of each element. For example, an increase in C content leads to increased yield strength owing to interstitial solid solution strengthening [36–38], whereas an increase in heat input results in decreased yield strength owing to an increase in primary dendrite arm spacing (PDAS) [39].
At significance levels of 0.1 and 0.05, multiple regression models using only the significant variables were derived, as shown in Equations (2) and (3), respectively, where the chemical composition is given in weight percent and the heat input is in kJ/cm. Table 5 summarizes the accuracy of the models with the mean absolute percentage errors (MAPEs) and adjusted coefficient of determination (R²) values of Equations (1)–(3). The MAPEs for Equations (1) and (3) are 2.1% and 2.2%, respectively, and the R² values are 0.9294 and 0.9278, respectively. Equation (3) utilizes only five input variables but demonstrates performance comparable to that of Equation (1).
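The accuracy measures used in Table 5 can be written out as a short Python sketch. The definitions below are the standard ones for MAPE, R², and adjusted R²; the function names and the example yield strengths are hypothetical and are not data from this study.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(measured)
    return 100.0 * sum(abs((m - p) / m) for m, p in zip(measured, predicted)) / n


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for k predictors fitted on n data points."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical measured and predicted yield strengths in MPa.
measured = [400.0, 500.0]
predicted = [380.0, 525.0]
print(mape(measured, predicted))
print(r_squared(measured, predicted))
# Penalty for five predictors fitted on 100 points (illustrative R^2 of 0.93).
print(adjusted_r_squared(0.93, n=100, k=5))
```

The adjustment penalizes each added predictor, which is why a five-variable model can score close to a ten-variable model on adjusted R² even if its unadjusted fit is slightly worse.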

Machine Learning Models
Figure 1 presents the R² values for the prediction models developed using linear regression, decision trees, SVM, and GPR. For the decision tree models, the R² values for the fine, medium, and coarse tree models were less than 0.50; therefore, they were not plotted. Among the prediction models developed using the Regression Learner App, the stepwise linear regression model exhibited the highest R² value, i.e., 0.93. Among the SVM models, only the linear SVM model surpassed 0.90, with an R² value of 0.91. Table 6 lists the MAPEs and R² values for the prediction models with R² values exceeding 0.90 and shows the excellent prediction performance of both the linear regression and GPR models. The stepwise linear regression model demonstrated a marginally higher MAPE than the GPR model. However, the stepwise linear regression model also exhibited a higher R² value, surpassing the GPR model by 0.01. To assess the residual tendency of the models, residual analysis was conducted for both linear regression and GPR. Figure 2 illustrates the residuals from the prediction models, which are randomly distributed without any discernible trend with respect to the measured yield strength. Because linear regression models require fewer computing resources, they are generally preferred over GPR models. Despite the slightly higher MAPE of the stepwise linear regression model, its higher R² value and practical advantages make it a favorable choice.
Figure 3 summarizes the MAPE and R² of the models with R² exceeding 0.90 among the prediction models developed through the Regression Learner App, together with Equations (1)–(3) obtained through multiple regression analysis. Among the models shown in Figure 3, the most suitable models are the model of Equation (3) with the highest R², stepwise linear regression with the lowest MAPE and highest R², and Matern 5/2 GPR with the lowest MAPE among the GPR models.

Model Verification Results Using Additional Data Points
The machine learning models developed in this study were verified using additional data points that were not used for training in the previous step. The yield strength was predicted using Equation (3), the stepwise linear regression model, and the Matern 5/2 GPR model, which were confirmed to have excellent prediction performance. Figure 4 compares the measured yield strengths in the additional data with the yield strengths predicted by the developed models. The linear SVM model, which had a higher MAPE and a lower R² value than the stepwise linear regression and Matern 5/2 GPR models but was the only SVM model with an R² value exceeding 0.90, was also included in the model validation.
According to Figure 4, the highest R² for the correlation between the measured and predicted yield strengths was 0.8642, obtained with Equation (3), the prediction model developed using multiple regression analysis. In contrast, the stepwise linear regression model had the lowest R² at 0.6383; however, its R² was 0.7537 when one outlier with a measured yield strength of 348 MPa, shown in Figure 4, was excluded. The linear SVM model, which had lower accuracy than stepwise linear regression and Matern 5/2 GPR during prediction model development, showed high prediction accuracy in model validation. Table 7 summarizes the MAPE and R² of each model during validation.
In model validation, the accuracy of the model developed through multiple regression analysis was the highest; however, the accuracy (R² and MAPE) of machine-learning-based prediction models may deteriorate depending on the attributes and quantity of data points.

Dilution and Yield Strength of Stainless-Steel-Based Weld Metal in High-Mn Steel
The yield strength prediction models for the weld metal developed in this study can predict the variation in the yield strength of the weld metal by considering the dilution when welding high-Mn steel with stainless-steel-based welding consumables.
Figure 5 shows the yield strength predicted by the models used for model validation as the chemical composition varies with dilution between the high-Mn steel and the three welding consumables listed in Table 8. Because the dilution rate varies depending on the welding position, current, speed, and heat input [40–42], the heat input per unit length was fixed at 10 kJ/cm. The stepwise linear regression and linear SVM models showed similar trends for various material combinations and dilutions; however, they exhibited slight differences compared with the multiple regression model, which had the highest prediction accuracy.
For the Matern 5/2 GPR model, the predicted yield strength increased linearly with the dilution rate up to a certain point. However, in certain dilution ranges (40–80%), the increase became nonlinear, leading to a relatively high deviation compared to the multiple regression model.
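The composition inputs for such a dilution study can be sketched in Python as follows, assuming the common linear mixing rule in which each element of the weld metal is a dilution-weighted average of the base metal and the undiluted welding consumable. The mixing rule is an assumption here, since the text does not state how the diluted compositions were computed, and the compositions below are hypothetical placeholders rather than the values in Table 8.

```python
def diluted_composition(base, filler, dilution_pct):
    """Weld metal composition (wt%) at a given dilution rate in percent.

    Assumed linear mixing: weld = D * base + (1 - D) * filler.
    """
    if not 0.0 <= dilution_pct <= 100.0:
        raise ValueError("dilution rate must be between 0 and 100%")
    d = dilution_pct / 100.0
    elements = sorted(set(base) | set(filler))
    return {el: d * base.get(el, 0.0) + (1.0 - d) * filler.get(el, 0.0)
            for el in elements}


# Hypothetical compositions in wt% (NOT the values in Table 8).
high_mn_base = {"C": 0.40, "Mn": 24.0, "Cr": 3.0, "Ni": 0.0}
filler = {"C": 0.03, "Mn": 1.5, "Cr": 19.0, "Ni": 10.0}

for rate in (0, 20, 40):
    weld = diluted_composition(high_mn_base, filler, rate)
    print(rate, {el: round(v, 3) for el, v in weld.items()})
```

Each diluted composition, together with the fixed heat input, would then be passed to a fitted yield strength model to trace a curve of predicted strength against dilution rate.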
According to ASTM A1106/A1106M-17 [43], a yield strength of at least 400 MPa is required for high-Mn steel used for cryogenic applications. The type 308L welding consumable was predicted not to reach a yield strength of 400 MPa when the dilution rate was less than 20%, whereas the predicted strength of the other welding consumables exceeded 400 MPa at all dilution rates. Considering the requirement for a maximum yield ratio (yield strength/tensile strength) of 0.9 in engineering specifications [44] and the minimum tensile strength of 600 MPa for welding consumables according to KS D 7143 (flux-cored arc welding wires for high-Mn steel) [45], the maximum yield strength of austenitic stainless-steel-based welding consumables for high-Mn steel is considered to be 540 MPa (0.9 × 600 MPa). Because excessively high strength can lead to cracking in steel [46], it is desirable to set an appropriate upper limit for the yield strength.
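The resulting acceptance window for the predicted yield strength can be expressed as a small Python check. The three limits are the values quoted in the text; the constant and function names are hypothetical, and treating both limits as inclusive is an assumption.

```python
MIN_YIELD_MPA = 400.0    # minimum yield strength quoted from ASTM A1106/A1106M-17
MIN_TENSILE_MPA = 600.0  # minimum tensile strength quoted from KS D 7143
MAX_YIELD_RATIO = 0.9    # maximum yield strength / tensile strength

# Upper limit on yield strength implied by the yield ratio: 0.9 x 600 MPa.
MAX_YIELD_MPA = MAX_YIELD_RATIO * MIN_TENSILE_MPA


def within_yield_window(predicted_mpa):
    """True if the predicted yield strength lies inside the window."""
    return MIN_YIELD_MPA <= predicted_mpa <= MAX_YIELD_MPA


print(MAX_YIELD_MPA)
for ys in (380.0, 461.0, 560.0):
    print(ys, within_yield_window(ys))
```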

Establishment of Main Chemical Composition Content of Austenitic Stainless-Steel-Based Welding Consumables for High-Mn Steel
Models based on multiple regression, such as Equation (3), have relatively high prediction accuracy and can be compared with other machine-learning-based prediction models for accuracy validation. C, Cu, and Mo had relatively high coefficients in Equations (2) and (3); however, the Cu content in the welding consumables was minimal. Therefore, C and Mo were selected as the variables, and contour plots of the yield strength were drawn, as shown in Figure 6. For C and Mo, the minimum and maximum contents in the dataset used for model training were used as the domain, whereas the contents of the other elements were taken from one experimental data point for which the heat input was 14 kJ/cm, the base metal was high-Mn steel, and the weld metal was produced with a fully austenitic welding consumable. These chemical compositions are listed in Table 9. According to Figure 6, when the C content is 0.10 wt%, the Mo content must be at least 1.25 wt% to obtain a yield strength of at least 400 MPa. Therefore, it is possible to predict the minimum Mo content at a specific C content, or the minimum C content at a specific Mo content, that satisfies the minimum yield strength requirement. Likewise, when the C content is 0.35 wt% and the Mo content exceeds 3.2 wt%, an excessive yield strength exceeding 540 MPa is predicted.
Figure 7 shows the Cr and Ni equivalents in a Schaeffler diagram based on the variation in the chemical composition according to the dilution rate. Stainless steel type 308L and 316L welding consumables were predicted to have a ferrite content of less than 10% when undiluted, and to have a fully austenitic structure when the dilution rate exceeded 5%. Welding consumables with a fully austenitic structure maintained the structure because the variation in the Ni equivalent was small, even when the Cr equivalent decreased as the dilution rate increased. In particular, the fully austenitic welding consumable had a relatively high Ni equivalent of 24.3–24.5% over the range of dilution with high-Mn steel, which was advantageous in maintaining the fully austenitic microstructure. However, there is a need to reduce the cost of welding consumables through the optimization of chemical compositions, while ensuring that the required mechanical properties are satisfied. For the fully austenitic welding consumable, the content of Ni, which is expensive and present in a relatively large amount, was adjusted to 5% and to 10% in the modified compositions; this did not significantly change the yield strength, as shown in Table 10. Figure 8 shows the Cr and Ni equivalents according to the dilution rate, with the chemical compositions of the welding consumables listed in Table 10, on the Schaeffler diagram. The original fully austenitic welding consumable is expected to have a fully austenitic microstructure even if the Ni content is lowered to 10% and the Cr content is further reduced to 10%. However, when the Ni content was further reduced to 5%, martensite, which is a brittle microstructure, was generated. Therefore, it was confirmed that a fully austenitic microstructure can be maintained even when the Ni content is lowered to 10% in the original fully austenitic welding consumable, and this finding can be utilized when designing a new welding consumable.
Table 11 lists the yield strength predicted by the models as the chemical composition changes with increasing dilution rate for the welding consumable in which the fully austenitic microstructure was maintained even when the Ni content was lowered to 10% (Case 2 in Table 10). According to Table 11, a yield strength of at least 461 MPa is expected with the chemical composition of the undiluted welding consumable. When the dilution rate is 50% or more, the predicted yield strength exceeds 540 MPa, which is considered the upper limit of the yield strength. However, because a dilution rate above 50% rarely occurs in multilayer welds, the predictions at such excessive dilution rates should be treated as reference information only.
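The position of a weld metal on the Schaeffler diagram discussed above is given by its Cr and Ni equivalents. The Python sketch below uses the classical Schaeffler expressions; it is assumed, not confirmed by the text, that the same expressions were used in this study, and the composition is a hypothetical placeholder rather than a value from Table 10.

```python
def cr_equivalent(comp):
    """Classical Schaeffler Cr equivalent: Cr + Mo + 1.5*Si + 0.5*Nb (wt%)."""
    return (comp.get("Cr", 0.0) + comp.get("Mo", 0.0)
            + 1.5 * comp.get("Si", 0.0) + 0.5 * comp.get("Nb", 0.0))


def ni_equivalent(comp):
    """Classical Schaeffler Ni equivalent: Ni + 30*C + 0.5*Mn (wt%)."""
    return (comp.get("Ni", 0.0) + 30.0 * comp.get("C", 0.0)
            + 0.5 * comp.get("Mn", 0.0))


# Hypothetical weld metal composition in wt% (NOT a value from Table 10).
weld = {"C": 0.03, "Si": 0.5, "Mn": 1.5, "Cr": 19.0, "Ni": 10.0, "Mo": 0.1}

print("Cr_eq =", round(cr_equivalent(weld), 2))
print("Ni_eq =", round(ni_equivalent(weld), 2))
```

Because C and Mn both enter the Ni equivalent, dilution with a high-Mn, high-C base metal can offset a reduced Ni content in the consumable, which is consistent with the small variation in the Ni equivalent noted above.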

Conclusions
This study developed multiple regression and machine-learning-based models to predict the yield strength of weld metals using austenitic stainless-steel-based welding consumables, and the following conclusions were drawn.
1. Through multiple regression analysis, a model with high accuracy was developed with five input variables satisfying a significance level of 0.05, and conventional multiple regression models showed excellent prediction performance with an MAPE of 2.2%.
2. Among the prediction models developed using machine learning, the stepwise linear regression model was identified as the best, with the highest R² and a practical advantage.
3. Comparing the prediction models developed based on the multiple regression analysis and machine learning, the multiple regression model showed a higher R² than the machine learning models used in this study. In the model validation, the multiple regression model showed an R² of 0.8642 and an MAPE of 3.75%.
Consequently, the models developed in this study can effectively predict the variation in yield strength resulting from thermal history and dilution during the welding of high-Mn steel with stainless-steel-based welding consumables. Furthermore, these models can play an important role in developing new welding consumables, thereby ensuring the desired yield strength levels.

Figure 3. R² and MAPE for prediction models.

Figure 7 shows the Cr and Ni equivalents in a Schaeffler diagram based on the variation in the chemical composition according to the dilution rate. Stainless steel type 308L and 316L welding consumables were predicted to have a ferrite content of less than 10% when undiluted, and to have a fully austenitic structure when the dilution rate exceeded 5%. Welding consumables with a fully austenitic structure maintained the structure because the variation in the Ni equivalent was small, even when the Cr equivalent decreased as the dilution rate increased.

Figure 7. Cr and Ni equivalents according to chemical composition variation-induced dilution in Schaeffler diagram for austenitic stainless-steel-based welding consumables.

Figure 8. Cr and Ni equivalents according to chemical composition variation-induced dilution in Schaeffler diagram for austenitic stainless-steel-based welding consumables.

Author Contributions:
Conceptualization, S.P. and M.C.; data preparation, S.P., M.C., and D.K.; software, S.P. and D.K.; investigation, S.P.; resources, C.K. and N.K.; writing (original draft preparation), S.P.; writing (review and editing), C.K.; supervision, N.K. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the Technology Innovation Program-Materials and Components Development Program (Grant No. 20022454) funded by the Ministry of Trade, Industry, and Energy (MOTIE, Korea) and the Korea Institute of Industrial Technology as "The dynamic parameter control based smart welding system module development for the complete joint penetration weld" (KITECH EH-23-0007).

Table 1. Descriptive statistics of the dataset for model training.

Table 2. Descriptive statistics of the dataset for the verification test.

Table 3. Pearson correlation coefficients and p-values between input variables and yield strength.

Table 4. Regression coefficients and p-values for yield strength regression using all input variables.

Table 5. Accuracy of multiple regression models.

Table 6. Accuracy of regression models.

Table 7. Accuracy of regression models for model validation.

Table 8. Chemical composition of base metal and welding consumables (wt%).

Table 9. Chemical composition for contour line of yield strength (wt%).

Table 10. Chemical composition of base metal and modified welding consumables (wt%).

Table 11. Results of predicting the yield strength according to the increase in dilution rate with the optimized welding consumable.