On the Use of Machine Learning Algorithms to Predict the Corrosion Behavior of Stainless Steels in Lactic Acid

Abstract: Predicting the corrosion behavior of materials in specific environmental conditions is important for establishing a sustainable manufacturing system while reducing the need for time-consuming experimental investigations. Recent studies have started to explore the application of supervised Machine Learning (ML) techniques to forecast corrosion behavior in various conditions. However, there is currently a research gap in utilizing classification ML techniques specifically for predicting the corrosion behavior of stainless steel (SS) material in lactic acid-based environments, which are extensively used in the pharmaceutical and food industry. This study presents an ML-based prediction model for the corrosion behavior of SSs in different lactic acid environmental conditions, using a database that describes the corrosion behavior by qualitative labels. Decision tree (DT), random forest (RF) and support vector machine (SVM) algorithms were applied for classification. Training and testing accuracies of 97.5% and 92.5%, respectively, were achieved using the DT classifier. Four SS alloy composition elements (C, Cr, Ni, Mo), acid concentration, and temperature were found sufficient to consider as input data for corrosion prediction. The developed models are reliable for predicting corrosion degradation and, as such, contribute to avoiding failures and catastrophes in industry.


Introduction
A material's chemical composition and various environmental or operational factors, including temperature, pH, chloride content [1,2], humidity [3,4], stray currents [5], oxygen levels [6], and impurity levels [7], have a significant impact on corrosion processes. Therefore, it is difficult to link individual environmental factors to corrosion processes or to predict corrosion lifetimes from physics-based corrosion laws. Predicting the corrosion behavior of materials in any given environmental condition is important, because testing materials in each possible environmental condition is time-consuming and expensive [8]. In addition, analyzing corrosion data and predicting corrosion behavior require advanced data mining methods [9,10]. The ability to predict the corrosion behavior of stainless steel (SS), which is widely employed in various industries, plays a crucial role in maintaining sustainable manufacturing processes and mitigating the detrimental effects of corrosion [11,12]. The present study focuses on predicting the corrosion behavior of different SSs in a lactic acid environment. This type of corrosive environment is typical for applications in health-related industries such as food, pharmaceuticals, and cosmetics [12][13][14].
In the field of materials research, machine learning (ML) techniques have gained significant popularity due to their effective data mining abilities [9]. Instead of relying on pre-established equations, ML algorithms learn from input data and previous experiences. ML techniques like Random Forest (RF), Support Vector Regression (SVR), and Artificial Neural Networks (ANN) have recently found application in corrosion research to explore corrosion behavior [8,10,15,16]. ML algorithms can be divided into two categories: unsupervised learning and supervised learning. Typically, in unsupervised learning the data [...]
Metals 2023, 13, x FOR PEER REVIEW 3 of 15
[...] approach to capture and comprehend the decision-making process, enabling the extraction of meaningful insights from the model [34]. In addition, RF is utilized as an ensemble learning method to address overfitting and to check for a possible enhancement in the overall accuracy. Generally, by combining multiple decision trees and aggregating their predictions, RF tends to generalize well on unseen data, making it a suitable choice for improving the model's robustness and performance [35]. Furthermore, SVM is a powerful classifier renowned for its effectiveness in high-dimensional spaces and its ability to handle complex decision boundaries [31]. Considering the presence of multiple features and potentially intricate relationships between them in the dataset of this study, SVM was selected to explore non-linear decision boundaries and establish a competitive baseline for comparison with other algorithms.
This study proposes the use of classification methods to categorize and predict the SS alloys' corrosion behavior based on the literature data published in the ASM Handbook of Corrosion Data (2nd edition) [38]. The dataset's features comprise the chemical composition of the SS alloys, which includes 13 elements, the electrolyte temperature, and the concentration of lactic acid in the electrolytic solution. The corrosion behaviors within this dataset were qualitatively labeled as Resistant, Good, Poor, and Questionable, with each label defined as follows:

• Resistant: a mass loss rate of less than 0.1 g/h/m² or a decrease in thickness of less than 0.11 mm/year.

• Good: a mass loss rate ranging from 0.1 to 1.0 g/h/m² or a decrease in thickness of 0.11 to 1.10 mm/year.

• Poor: a mass loss rate ranging from 1.0 to 10.0 g/h/m² or a decrease in thickness of 1.1 to 11.0 mm/year.

• Questionable: a mass loss rate exceeding 10.0 g/h/m² or a decrease in thickness exceeding 11.0 mm/year, or susceptibility to local corrosion, pitting, crevice, or stress corrosion.

Methodology
On this dataset, it was proposed to develop and apply ML classification techniques that map the chemical composition of the steel, the acid concentration, and the temperature to a label, which was defined based on the corrosion rate. Different ML techniques were analyzed by comparing their accuracies. The effect of feature selection and reduction on the accuracy of the models was studied. Finally, the prediction accuracy was improved using hyperparameter tuning techniques. Figure 2 illustrates a schematic representation of the proposed methodology for assessing the corrosion condition of SSs in lactic acid. In the first step, the corrosion data are preprocessed before being transferred to the modeling step. Three ML algorithms, namely DT, RF, and SVM, are considered as classification methods. After fitting each model, its performance is assessed by computing training and testing accuracies. Additionally, the models are evaluated using the confusion matrix and Receiver Operating Characteristic (ROC) curves to provide a comprehensive assessment. Hyperparameter tuning is conducted to optimize the model's hyperparameters, leading to subsequent enhancements in the model's performance [17]. In addition, feature selection is performed based on the importance of the input features, with the aim of maintaining high accuracies while using fewer inputs. The following subsections (2.1-2.3) provide the details of the database, the preprocessing methods, the fitted models, and the feature reduction methods utilized in the developed classification ML approach.

Corrosion Data Preprocessing
It is necessary to prepare the dataset for ML approaches with the help of preprocessing techniques. At first, the database underwent a uniformity check. The majority of the corrosion behavior labels are qualitative; however, as the experimental data provided in the handbook originated from different references, some data were not qualitative (see the red box in Figure 3) and the corrosion rate was reported numerically in these cases. For a few data series, some features were missing: the red box in Figure 4 shows cases where the lactic acid concentration was not reported. These non-uniform and incomplete data were removed in the first step of preprocessing to obtain a uniform dataset. After these manipulations, 198 rows of data were obtained in total.
The chemical compositions of SSs were obtained from the literature [39]. This resulted in the definition of 15 features: 13 representing the elemental compositions in the SS alloy (Table 1), and two features indicating the environmental conditions (electrolyte temperature and acid concentration) as shown in Table 2. The corrosion behavior is also listed in Table 2 as the output feature.
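The uniformity check described above can be illustrated with a minimal pandas sketch. The column names, alloy names, and row values below are hypothetical placeholders, not data from the handbook; the sketch only assumes that rows with a numerically reported corrosion rate or a missing input feature are dropped.

```python
import pandas as pd

# Hypothetical raw rows; names and values are illustrative only.
raw = pd.DataFrame({
    "alloy": ["304", "316", "304", "430"],
    "temperature_C": [20, 100, 50, 20],
    "lactic_acid_conc_pct": [10.0, 50.0, None, 5.0],
    "corrosion_behavior": ["Resistant", "Good", "Poor", "0.05"],
})

# The four qualitative labels used in the study.
VALID_LABELS = {"Resistant", "Good", "Poor", "Questionable"}


def make_uniform(df):
    """Keep only rows with a qualitative label and no missing input feature."""
    has_label = df["corrosion_behavior"].isin(VALID_LABELS)
    complete = df.drop(columns="corrosion_behavior").notna().all(axis=1)
    return df[has_label & complete].reset_index(drop=True)


clean = make_uniform(raw)
print(clean)
```

In this toy table, the third row is dropped because the acid concentration is missing and the fourth because its label is a numeric corrosion rate rather than one of the four qualitative classes.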

ML Approach
This study compares the performance of different ML algorithms, namely DT, SVM, and RF, in predicting the corrosion behavior. For all three methods, 80% of the data are used as the training set and 20% as the testing set. As shown in Table 3, four classes of corrosion behavior are present in this database, and the number of available data for each class is: Resistant, 95 labels; Good, 58 labels; Poor, 22 labels; and Questionable, 23 labels. The symbolic labels Questionable, Poor, Good, and Resistant are converted to numeric values automatically by defining a function label_encoder. The performance of each model was assessed by calculating its training and testing accuracy using the sklearn.metrics.accuracy_score function from the scikit-learn library [40]. For each model, a confusion matrix was generated to examine misclassified labels.

The training set was created using two variables, x_train and y_train, and a DT classifier was trained on these variables with random_state = 0. The maximum DT depth was first set to the default value max_depth = None, which expands nodes until all leaves are pure (i.e., there is only one class on that node). Based on the plotted DT, the lowest class impurity was achieved with max_depth = 10. Hyperparameter tuning led to a tuned set of parameters with max_depth = 5 and criterion = 'gini'.

In the next step, K-fold cross validation [40] was applied, where the data were divided into k = (2, 3, ..., 11) equally sized chunks and each chunk had its turn "pretending" to be a test set. Comparing these validation results with the obtained test accuracies, it can be concluded that the K-fold cross validation is a reasonable and conservative estimate of the testing accuracies for the used ML approaches.

Initially, the SVM model was fitted on the training data using the default hyperparameters (C = 1 and gamma = 'scale') with a kernel of 'rbf'. Subsequently, hyperparameter tuning was conducted to enhance the model's accuracy.
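The data split, label encoding, DT fit, and K-fold cross validation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the corrosion database is replaced by synthetic data from make_classification (198 rows, 15 features, 4 classes), the split uses random_state = 0, and cross validation is run on the training set; none of these choices is confirmed by the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the corrosion database (assumption, not the real data).
X, y_int = make_classification(
    n_samples=198, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
names = np.array(["Good", "Poor", "Questionable", "Resistant"])
y_text = names[y_int]  # symbolic labels, as in the database

# Convert the symbolic labels to integers.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y_text)

# 80% training data, 20% testing data.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# DT with the tuned parameters reported in the text.
dt = DecisionTreeClassifier(max_depth=5, criterion="gini", random_state=0)
dt.fit(x_train, y_train)
train_acc = accuracy_score(y_train, dt.predict(x_train))
test_acc = accuracy_score(y_test, dt.predict(x_test))

# K-fold cross validation for k = 2, 3, ..., 11 (mean accuracy per k).
cv_means = {
    k: cross_val_score(dt, x_train, y_train, cv=k).mean()
    for k in range(2, 12)
}
print(f"train={train_acc:.3f} test={test_acc:.3f}")
```

With 198 rows, a 20% test fraction gives 40 testing rows and 158 training rows, which matches the test-set size quoted in the results.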
The SVM tuning involved sampling the regularization parameter C and the kernel coefficient gamma from a reciprocal random distribution. The resulting optimized hyperparameters were C = 1.38 and gamma = 0.019.

After applying the SVM algorithm, the RF algorithm was also employed for comparison. To do so, the RF classifier with random_state = 0 was fitted to the training data. A hyperparameter tuning process was performed to determine the best number of trees in the forest (n_estimators) for this algorithm. The optimal number of estimators was identified as 8. Increasing n_estimators further did not improve the accuracy of the model.
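One possible way to set up such a tuning step is sketched below. It is an illustration only: the data are synthetic, and the search ranges, the number of sampled candidates, the 5-fold validation, and the use of RandomizedSearchCV and GridSearchCV are assumptions, since the text does not state them. scipy.stats.loguniform is the reciprocal distribution under its alternative name.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (
    GridSearchCV, RandomizedSearchCV, train_test_split,
)
from sklearn.svm import SVC

# Synthetic stand-in for the corrosion database (assumption).
X, y = make_classification(
    n_samples=198, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# SVM: sample C and gamma from reciprocal (log-uniform) distributions.
# The bounds and n_iter are hypothetical choices.
svm_search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions={
        "C": loguniform(1e-2, 1e2),
        "gamma": loguniform(1e-4, 1e0),
    },
    n_iter=20, cv=5, random_state=0,
)
svm_search.fit(x_train, y_train)

# RF: scan a small, hypothetical grid for the number of trees.
rf_grid = [2, 4, 8, 16, 32]
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": rf_grid},
    cv=5,
)
rf_search.fit(x_train, y_train)

print(svm_search.best_params_, rf_search.best_params_)
```

The best parameters found on synthetic data will differ from the values reported in the text (C = 1.38, gamma = 0.019, n_estimators = 8), which depend on the actual corrosion dataset.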
The ROC curve is a common and effective method to visualize the performance of a classifier [41,42]. To draw the ROC curve, the false positive rate (FPR) and true positive rate (TPR) values, as defined in Equations (1) and (2), were evaluated.

$$\mathrm{FPR} = \frac{F_p}{F_p + T_n} \quad (1)$$

$$\mathrm{TPR} = \frac{T_p}{T_p + F_n} \quad (2)$$

Here, T_p represents the correctly classified known positives ("True positives"), T_n represents the correctly classified known negatives ("True negatives"), F_p represents the known negatives that were incorrectly classified as positive ("False positives"), and F_n represents the known positives that were incorrectly classified as negative ("False negatives"). In order to draw a ROC curve for a multiclass problem, binarization is applied to the data series. Therefore, the function label_binarize from the sklearn.preprocessing module [40] was deployed.
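As a small worked illustration of the FPR and TPR definitions in a multiclass setting, the sketch below binarizes true and predicted labels one class at a time with label_binarize and counts the four outcomes per class. The six label pairs are invented for the example and are not taken from the study.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

classes = ["Good", "Poor", "Questionable", "Resistant"]

# Invented true and predicted labels, for illustration only.
y_true = np.array(["Good", "Good", "Poor", "Questionable", "Resistant", "Resistant"])
y_pred = np.array(["Good", "Poor", "Poor", "Good", "Resistant", "Resistant"])

# One column per class: 1 if the sample belongs to that class, else 0.
true_bin = label_binarize(y_true, classes=classes)
pred_bin = label_binarize(y_pred, classes=classes)


def rates(true_col, pred_col):
    """Return (FPR, TPR) for one class treated as 'positive' versus the rest."""
    tp = int(np.sum((true_col == 1) & (pred_col == 1)))
    tn = int(np.sum((true_col == 0) & (pred_col == 0)))
    fp = int(np.sum((true_col == 0) & (pred_col == 1)))
    fn = int(np.sum((true_col == 1) & (pred_col == 0)))
    return fp / (fp + tn), tp / (tp + fn)


per_class = {
    name: rates(true_bin[:, i], pred_bin[:, i])
    for i, name in enumerate(classes)
}
for name, (fpr, tpr) in per_class.items():
    print(f"{name}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

For the Good class in this toy example, one of the two Good samples is found (TPR = 0.5) and one of the four non-Good samples is wrongly labeled Good (FPR = 0.25).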

Feature Reduction
Model development and training for predictive modelling problems can be slow due to the large number of involved variables. To reduce the number of input variables, relevant features can be selected according to their effectiveness in predicting the target variable (here: corrosion behavior). Some models perform worse when the target variable has a low correlation with the input variables [43]. Therefore, the main reasons for feature selection and reduction are to increase the classification accuracy and to decrease the training time. For numerical input and categorical output, the most common feature selection technique is the analysis of variance (ANOVA) [44][45][46]. In this study, the most relevant features for training the models were selected using the ANOVA feature selection method.
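ANOVA-based feature ranking of the kind described above is available in scikit-learn through SelectKBest with the f_classif score function. The sketch below is a minimal example on synthetic data with placeholder feature names; the choice k = 9 mirrors the number of features retained later in the paper, and whether the authors used exactly this API is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the 15-feature corrosion dataset (assumption).
X, y = make_classification(
    n_samples=198, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholders

# Score every feature with the ANOVA F-statistic and keep the best nine.
selector = SelectKBest(score_func=f_classif, k=9).fit(X, y)

# Features sorted from highest to lowest F-score.
ranking = sorted(
    zip(feature_names, selector.scores_),
    key=lambda pair: pair[1],
    reverse=True,
)
X_reduced = selector.transform(X)

for name, score in ranking:
    print(f"{name}: F={score:.2f}")
```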

Results and Discussion
In this section, the training and testing accuracies for each model are reported separately. To visualize the erroneous predictions, the confusion matrices and the ROC curves are constructed. In addition, the influence of feature reduction on the accuracy of the models is presented.

DT Model
The training and testing accuracies of the developed DT model, calculated by sklearn.metrics.accuracy_score [40], are 98.73% and 90.00%, respectively. After hyperparameter tuning and using the optimized parameters, the accuracies changed to 97.47% and 92.50%. The confusion matrix (Figure 5) visualizes the number of correct and wrong predictions for each category. In each row, the sum of the values represents the total number of data series associated with that corrosion label. In other words, in the testing data set, there are 11 datapoints with a Good label, 7 datapoints with a Poor label, 5 datapoints with a Questionable label, and 17 datapoints with a Resistant label. For the data series corresponding to the Good label, 10 are classified correctly (T_p) and one is mislabeled as Poor (F_n). For the data series with Questionable and Resistant labels, one F_n classification with a Good label was made in each case. All data series with Poor labels are T_p and are therefore classified correctly.

The DT classifier achieved an average accuracy of 83.92% across all 11 folds with a standard deviation of 5.5%. These results demonstrate the stability and consistency of the model's performance during cross-validation. It is worth noting that the model exhibits consistent performance across different folds, indicating a good level of generalization. No overfitting is observed, as the performance on the validation sets remained relatively close to the training set accuracy.
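The counts quoted above are enough to rebuild the DT confusion matrix and check the reported testing accuracy. In the sketch below, the row and column order (Good, Poor, Questionable, Resistant) is an assumption about how Figure 5 is laid out, and the reading that Questionable and Resistant each have exactly one sample mislabeled as Good is inferred from the text.

```python
import numpy as np

labels = ["Good", "Poor", "Questionable", "Resistant"]

# Rows: true label, columns: predicted label (order assumed).
cm = np.array([
    [10, 1, 0, 0],   # Good: 10 correct, 1 predicted as Poor
    [0, 7, 0, 0],    # Poor: all 7 correct
    [1, 0, 4, 0],    # Questionable: 1 predicted as Good
    [1, 0, 0, 16],   # Resistant: 1 predicted as Good
])

support = cm.sum(axis=1)              # datapoints per true label
accuracy = np.trace(cm) / cm.sum()    # correct predictions / all predictions
recall = np.diag(cm) / support        # per-class TPR

print(dict(zip(labels, support.tolist())))
print(f"accuracy = {accuracy:.4f}")
```

The diagonal sums to 37 out of 40 testing datapoints, i.e., 92.50%, which agrees with the tuned DT testing accuracy.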

RF Model
The training and testing accuracies of the applied RF model are 98.10% and 87.50%, respectively. In Figure 6, the confusion matrix shows that all Poor and Questionable labels, which correspond to an undesired corrosion behavior, were correctly classified. However, some mislabeling of Good and Resistant labels did occur: the F_n counts for the Good and Resistant labels are equal to 2 and 3, respectively.

SVM Model
With the default hyperparameter values for the SVM model, the obtained training and testing accuracies are 60.13% and 55.00%, respectively, which indicates a low performance of this model on the dataset. Figure 7a indicates that none of the datapoints in the Good and Poor classes were classified correctly by the model; all of them were mistakenly predicted (F_n) as Questionable or Resistant. After applying the hyperparameter tuning, the accuracies improved to 93.04% and 87.50%, respectively. Figure 7b also shows an improvement in detecting the right label for each class. All datapoints in the Resistant class are predicted correctly; however, some errors in predicting the right label for the Good, Questionable, and Poor classes still remain.
The area under each ROC curve represents the degree of separability; the closer this area is to 1, the higher the capability of distinguishing between classes. For a class with a TPR equal to 1, all data belonging to that class were classified correctly. For a class with an FPR equal to 0, no data from other classes were wrongly assigned to that class.

The ROC curve of Figure 8, representing the corrosion behavior of SS in lactic acid, demonstrates that all the Poor labels were correctly classified, and the area under the Questionable curve is very close to 1. This is of great importance for practical use, because wrongly classified Poor or Questionable classes (i.e., when wrongly classified as Good or Resistant) may result in a disaster such as failure of systems, accidents, and leakage of hazardous substances in industrial applications exposed to corrosive environments.
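Per-class ROC curves and their areas can be computed one class versus the rest, for example as sketched below. The data are synthetic, a DT is used as the scoring model, and the split is stratified so that every class appears in the test set; these are assumptions for illustration, not the setup behind Figure 8.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the corrosion database (assumption).
X, y = make_classification(
    n_samples=198, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(x_train, y_train)
scores = clf.predict_proba(x_test)  # one probability column per class

# Binarize the true labels so each class can be treated as "positive" in turn.
y_test_bin = label_binarize(y_test, classes=clf.classes_)

roc_auc = {}
for i, cls in enumerate(clf.classes_):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], scores[:, i])
    roc_auc[int(cls)] = auc(fpr, tpr)  # area under the one-vs-rest curve

print(roc_auc)
```

An area of 1 for a class means its one-vs-rest curve passes through the point FPR = 0, TPR = 1, i.e., that class is separated from all others without error.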

Feature Reduction
Due to the large number of variables involved in predictive modelling problems, model development and training can be slow. However, the number of input variables can be reduced by selecting the features that are most useful for predicting the target variable. Moreover, in the database considered for this study, it is not possible to measure the atomic percentage of all elements, so it is preferred to have a simplified model based on a smaller number of elements. Therefore, feature selection and reduction aim to increase classification accuracy, decrease training time, and save time and costs for preparing the experimental data that are used to train the models. Table 4 shows the list of input features ranked by deploying the ANOVA technique. A comparison of training and testing accuracies based on different input numbers for each model (DT, RF and SVM) is shown in Figure 9 to determine the required number of features to be considered as input.

Figure 9 presents the obtained graphs of accuracy versus number of features for each model (Figure 9 (a) DT, (b) RF and (c) SVM). The highest training accuracy was achieved when considering the first nine features, and further increasing the number of input features has no influence on training accuracy. Testing accuracy varies, particularly for RF, but it can still be considered stable. Thus, we can conclude from Figure 9 that considering nine input features, namely Fe, Ni, Cr, temperature, Mn, C, N, Mo, and lactic acid concentration, provides the same training accuracy as considering all 15 input features. The testing accuracy decreased by only 2.5%, 7.5%, and 0% for the DT, RF and SVM models, respectively. Therefore, feature reduction has a minor impact on the training and testing accuracies.

The ROC curves of the SVM models after feature reduction are shown in Figure 10. They give a good indication of how the ML performance is affected. It can be observed that the values remain almost unchanged, indicating no significant influence on the model's performance.

According to the materials science literature, Ni, Cr, Mo, and C are essential elements in the chemical composition of SS alloys for forming corrosion-resistant passive layers [47]. These elements are considered to be the most important influencing factors. Thus, feature reduction is also justified from a materials science perspective.
It should be highlighted that the feature reduction step, which excludes the Si, Al, S, P, Nb, Ta, and Ti elemental compositions as input features, does not impact prediction accuracy; however, this does not necessarily imply that these elements have no influence on the corrosion behavior of SS in lactic acid. It is important to note that the prediction developed in this study is qualitative, where each label describes a range of corrosion rates, as explained in the introduction (Section 1). To comprehensively discuss the effect of the elements Si, Al, S, P, Nb, Ta, and Ti on the corrosion behavior of SSs in lactic acid, experiments and numerical evaluations are necessary next steps in future studies.
ROC curves and accuracy measurements show no major negative impacts after feature reduction. It was shown that the number of inputs can be reduced to make modeling and prediction faster, and to prepare the experimental data in a more time- and cost-effective manner.
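An accuracy-versus-number-of-features comparison of the kind summarized above can be produced with a simple loop over k. The sketch below is a hedged illustration: it uses synthetic data, fits only a DT, and ranks features on the training set, whereas the study's actual procedure (for instance, whether ranking was done on the full dataset) is not stated in the text.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 15-feature corrosion dataset (assumption).
X, y = make_classification(
    n_samples=198, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = []  # (number of features, training accuracy, testing accuracy)
for k in range(1, X.shape[1] + 1):
    # Keep the k features with the highest ANOVA F-score.
    selector = SelectKBest(f_classif, k=k).fit(x_train, y_train)
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    clf.fit(selector.transform(x_train), y_train)
    train_acc = clf.score(selector.transform(x_train), y_train)
    test_acc = clf.score(selector.transform(x_test), y_test)
    results.append((k, train_acc, test_acc))

for k, train_acc, test_acc in results:
    print(f"k={k:2d}  train={train_acc:.3f}  test={test_acc:.3f}")
```

Plotting the two accuracy columns against k gives a curve comparable in form to one panel of Figure 9; the point where training accuracy stops improving suggests how many features are needed.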

Conclusions
In the present study, different ML algorithms were used to model and predict the corrosion behavior of SS alloys in an aqueous environment containing different lactic acid concentrations at varying temperatures. It was demonstrated that, using the DT classifier, training and testing accuracies of 97.47% and 92.50%, respectively, could be achieved. The training and testing accuracies of the RF algorithm were 98.1% and 87.5%, respectively, which are close to the DT classifier results. Using the SVM algorithm with default hyperparameters, a training accuracy of 60.1% and a testing accuracy of 55.0% were obtained. To improve the model, hyperparameter tuning was applied, raising these values to 93.0% and 87.5%, respectively. To evaluate the FPR and TPR parameters, an ROC curve for multiclass classification was constructed. Based on its graph, it was shown that the area under the curve of the Poor class is equal to 1, which is highly important from a material selection perspective. ANOVA feature selection was applied to the pre-processed data set, and nine input features, namely Fe, Ni, Cr, temperature, Mn, C, N, Mo, and lactic acid concentration, were found to be the most influential factors for the corrosion behavior of SSs in a lactic acid environment. Finally, it can be concluded that SVM, RF, and DT techniques are useful for predicting the corrosion behavior from data sets that consist of quantitative input and qualitative (labeled) output data. The models can predict the corrosion behavior of SS in 'unseen' situations (lactic acid concentration and temperature), and then label or classify the corrosion behavior of the material as Poor, Good, Questionable, or Resistant.
Although corrosion engineers prefer to work with quantitative corrosion data, such as corrosion rate or corrosion current density, in some cases a categorization of materials based on their corrosion behavior in different environments is useful and, often, the only information available in practical use cases. This study successfully demonstrates that the labeled corrosion behavior of relatively small datasets can be correctly predicted using ML algorithms. In addition, it was observed that, according to the obtained accuracies, DT is the best model to train and test, whereas, according to the confusion matrices, RF performs best in terms of identifying all Poor and Questionable labels.

Data Availability Statement: The raw data that was used and the codes developed to replicate these results can be accessed for download from the provided GitHub repository: https://github.com/Soroosh-HKN/ML-lactic-acid-corrosion. Accessed on 7 July 2023.