Article

Predicting Corrosion Behaviour of Magnesium Alloy Using Machine Learning Approaches

by
Tülay Yıldırım
1,* and
Hüseyin Zengin
2,*
1
Eskipazar Vocational School, Karabük University, 78050 Karabük, Turkey
2
Institute of Chemical Technology of Inorganic Materials (TIM), Johannes Kepler University Linz, Altenberger Str. 69, 4040 Linz, Austria
*
Authors to whom correspondence should be addressed.
Metals 2025, 15(11), 1183; https://doi.org/10.3390/met15111183
Submission received: 30 September 2025 / Revised: 21 October 2025 / Accepted: 23 October 2025 / Published: 24 October 2025
(This article belongs to the Section Computation and Simulation on Metals)

Abstract

The primary objective of this study is to develop a machine learning-based predictive model using corrosion rate data for magnesium alloys compiled from the literature. Corrosion rates measured under different deformation rates and heat treatment parameters were analyzed using artificial intelligence algorithms. Variables such as chemical composition, heat treatment temperature and time, deformation state, pH, test method, and test duration were used as inputs in the dataset. Various regression algorithms were compared with the PyCaret AutoML library, and the models with the highest accuracy scores were analyzed further using the Gradient Boosting, Extra Trees, and AdaBoost regression methods. The findings of this study demonstrate that modelling corrosion behaviour by integrating chemical composition with experimental conditions and processing parameters substantially enhances predictive accuracy. The regression models, developed using the PyCaret library, achieved high accuracy scores, producing corrosion rate predictions that are remarkably consistent with experimental values reported in the literature. Detailed tables and figures confirm that the most influential factors governing corrosion were successfully identified, providing valuable insights into the underlying mechanisms. These results highlight the potential of AI-assisted decision systems as powerful tools for material selection and experimental design, and, when supported by larger databases, for predicting the corrosion life of magnesium alloys and guiding the development of new alloys.

1. Introduction

Magnesium and its alloys have always been attractive materials in the aerospace, automotive, and electronics industries due to their low density, exceptionally high specific strength and environmentally friendly properties [1,2,3,4]. Their high specific strength values effectively reduce vehicle weight and enhance fuel efficiency, thereby alleviating negative environmental effects. However, one of the most significant concerns they pose is their rapid degradation in aqueous conditions. This also leads to decreased performance and premature failure due to the reduction in the mechanical strength of the material [5,6]. Thus, a thorough understanding and control of the corrosion behaviour of magnesium alloys is crucial to sustain their durability and to enhance their reliability in widespread applications.
Corrosion rate can be defined as a quantitative measure of the degree of material loss over time in a particular corrosive medium and is a key factor in determining the service life of engineering alloys. However, it is significantly affected by many variables such as chemical composition, microstructural features, deformation conditions, heat treatment conditions, pH and temperature of the environment, and the applied test conditions [7,8]. The effects of these parameters on corrosion rates are measured in laboratory settings using traditional experimental methods such as immersion corrosion tests, hydrogen evolution tests and potentiodynamic polarization tests. However, such approaches are generally time-consuming and costly, and they offer limited insight owing to their specific experimental constraints, making it challenging to fully understand the multivariate effects of alloying elements and production parameters.
In magnesium alloys, microstructural alterations significantly affect corrosion behaviour, depending on the degree of deformation and heat treatment conditions. In deformed material, grain refinement occurs, grain orientation changes and internal stresses are generated, which can either increase or decrease the corrosion rate [9,10,11]. For example, dislocations and residual stresses formed in significant amounts during cold deformation can accelerate the anodic dissolution of the material, whereas proper homogenization and ageing processes can reduce these stresses and increase corrosion resistance. Through heat treatment, grain growth can be controlled, mechanical properties improved, and the stability of the protective oxide layer formed on the surface enhanced, resulting in an improvement in the overall corrosion resistance. Conversely, incorrect heat treatment parameters may cause oxidation of magnesium and excessive grain growth, which worsen corrosion resistance considerably [9,12,13]. Therefore, it is essential to comprehensively understand how heat treatment and deformation process parameters influence corrosion.
Recently emerged machine learning methods offer powerful opportunities for modelling the combined effects of such multiple factors on corrosion resistance of magnesium alloys [14,15]. Complex and multidimensional relationships can be revealed effectively through these methods by learning from experimental data, stepping in where classical statistical analyses fall short [16]. Advanced machine learning libraries such as PyCaret can be used to predict corrosion rates using inputs such as deformation rate, heat treatment temperature and time, alloy composition, and pH. This allows engineers to predetermine the most appropriate parameters for material design and manufacturing processes. Machine learning-supported approaches strengthen data-driven decision-making and provide valuable information for the development of new alloys [17]. Compared to experimental tests, these methods provide both faster and more economical solutions.
Several studies in the literature illustrate these effects. Samiei et al. [18] applied the Accumulative Roll Bonding (ARB) process at 350 °C to annealed AZ31 (~3 wt% Al, ~1 wt% Zn, balance Mg) alloy. Although this process caused grain refinement, it had a negative effect on the corrosion characteristics: the corrosion rate increased after the ARB treatment, and the sample subjected to the highest number of ARB passes showed the lowest corrosion resistance among the annealed samples. Sadeghi et al. [19] studied the effects of 0.4% and 0.8% Sr (by mass) additions to commercial AZ31 magnesium alloy. They reported that the protective passive surface layer became more unstable with increasing Sr concentration, indicating that the corrosion rate increased because the Sr-containing alloys had lower charge transfer resistance and impedance than AZ31. Chaudry et al. [20] studied the corrosion characteristics of a 0.5% Ca addition to AZ31 magnesium alloy. Thanks to the calcium addition, the corrosion rate of the alloy was reduced from 84.21 mpy (≈2.14 mm/year) to 18.21 mpy (≈0.46 mm/year) at pH 6. This improvement was attributed to the combination of Ca with Al to form the (Mg,Al)2Ca phase and the increased protectiveness of the surface film. The lowest corrosion rate recorded for this alloy was 9.40 mpy at pH 11. Zohdy et al. [21] investigated the corrosion behaviour of moulded AZ91D (~9 wt% Al, ~1 wt% Zn, balance Mg) alloy in SBF solution under the influence of pH and temperature changes; the general results showed that artificial pH changes caused an increase in the corrosion rate. Rogachev et al. [22] applied a hot rolling process to a 96 wt% Mg–2.3 wt% Zn–0.7 wt% Ca–1 wt% Mn alloy. After rolling, the corrosion rate of the alloy in Hank’s solution was measured as 0.54 ± 0.31 mm/year, and annealing for 15 min at 400 °C after rolling reduced residual stresses, thus increasing the corrosion resistance and reducing the corrosion rate to 0.19 ± 0.06 mm/year. Pourhasan et al. [23] applied hot extrusion to cast Mg–0.4Zr alloy to refine the grain size. As a result of this process, the corrosion resistance of the alloy increased: the corrosion rate was measured as 7.3 mm/year in the as-cast state and 5.4 mm/year in the extruded sample. Abdelfattah et al. [24] applied the Multi-Channel Spiral Twist Extrusion (MCSTE) process to AZ31 alloy; the corrosion rate of the annealed sample was 2.1766 mm/year, while a significant improvement in corrosion resistance was observed in the sample processed with a 4-pass 30° die. Ma et al. [25] applied T4 (solution heat treatment followed by natural ageing) and T6 (solution heat treatment followed by artificial ageing) heat treatments to Mg-5Al-1Zn-1Sn (AZT511) alloy. While the T4-treated alloy initially showed low corrosion sensitivity, corrosion acceleration was observed once the surface oxide film was destroyed. Liang et al. [26] investigated the effects of neodymium (Nd) addition and heat treatments (solution, ageing) on the corrosion resistance of AZ80 (~8 wt% Al, ~0.5 wt% Zn, balance Mg) alloy and reported that Nd reduced micro-galvanic corrosion by preventing the formation of the Mg17Al12 phase. Dargusch et al. [27] investigated the effect of Sr addition to AE42 (~4 wt% Al, ~2 wt% rare earth elements (mainly Ce), balance Mg) alloy and reported that the oxide films formed on the surface under the influence of Sr improved the corrosion performance. Bazhenov et al. [28] applied a hot extrusion process to Mg–Zn–Ca–(Mn) alloys and found that the addition of Mn significantly reduced the corrosion rate in Hanks’ solution; the corrosion rate of the Mg–2 wt% Zn–0.7 wt% Ca–1 wt% Mn alloy extruded at 300 °C, which is considered suitable for bone implants, was measured as 0.3 mm/year. Zhang et al. [29] applied casting and solution treatments to Mg–2Gd–xZn alloys (x = 0, 3, 4 and 5 wt%) and investigated the effects of Zn content on corrosion; the solution-treated Mg–2Gd–4Zn alloy was reported to have the best corrosion resistance. Yavuzyegit et al. [30] examined the corrosion behaviour of the AZ31 Mg alloy in various NaCl solutions; AZ31 samples coated by electrochemical oxidation (ECO) showed a corrosion rate more than 30 times lower than that of the uncoated alloy (7.8 mm/year). Feliu et al. [31] investigated the corrosion performance of AZ31 and AZ61 Mg alloys by immersing them in 0.6 M NaCl solution and determined that the cathodic Mg17Al12 phase in AZ61 accelerated the corrosion. Rogachev et al. [32] applied a hot rolling process to Mg-2Zn-2Ga alloy; after rolling, the corrosion rate of the alloy in Hanks’ solution was measured as 0.41 mm/year, and an annealing process at 200 °C for 15 min after rolling increased the corrosion resistance by providing stress relief and grain growth. Corrosion rate values obtained from all referenced studies were incorporated into the dataset used in this research.
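The studies above report corrosion rates in two units, mpy (mils per year) and mm/year. Since 1 mil = 0.001 inch = 0.0254 mm, the conversion used for the parenthetical mm/year equivalents can be sketched as follows (the function name is illustrative, not from any cited work):

```python
def mpy_to_mm_per_year(rate_mpy: float) -> float:
    """Convert a corrosion rate from mils per year (mpy) to mm/year.

    1 mil = 0.001 inch = 0.0254 mm, hence 1 mpy = 0.0254 mm/year.
    """
    return rate_mpy * 0.0254

# Values reported by Chaudry et al. [20] for Ca-modified AZ31 at pH 6:
print(round(mpy_to_mm_per_year(84.21), 2))  # 2.14
print(round(mpy_to_mm_per_year(18.21), 2))  # 0.46
```

Converting all entries to a single unit in this way is what makes corrosion rates from different studies directly comparable in a shared dataset.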
The most innovative aspect of the present study is the inclusion of deformation rates and heat treatment parameters in the prediction of corrosion rates in magnesium alloys; these factors have not been extensively investigated or modelled using machine learning in the existing literature. Equally notable is the establishment of the first corrosion rate prediction models in this field, using various advanced regressor models from the PyCaret AutoML library, such as Gradient Boosting, Random Forest, Extra Trees, Decision Tree, AdaBoost, K Neighbours, and Support Vector. The dataset was enriched with variations in alloy composition, environmental conditions, and test parameters, and extensive feature engineering was applied to improve the modelling quality. This approach allowed the complex multivariate relationships affecting corrosion rates to be captured more accurately and reliably. The aim of this study was to develop an innovative, robust, and practical model applicable in engineering practice for predicting the corrosion life of magnesium alloys through an integrated artificial intelligence analysis (machine learning regression models) of experimental parameters scattered throughout the literature. This framework aims to reduce the experimental workload, accelerate design processes, optimize material performance and contribute to sustainable applications.

2. Construction of a Literature-Based Dataset and Machine Learning Modelling Using the PyCaret AutoML

In this study, experimental results reported in the literature were systematically compiled into a comprehensive dataset on the corrosion behaviour of magnesium alloys. Machine learning modelling was then performed using the PyCaret AutoML library (version 3.3.2; the software was accessed from https://pycaret.org on 15 October 2025) to ensure accurate and reliable analyses. With this approach, the key factors affecting the corrosion performance of the magnesium alloys were determined and the corrosion rate was successfully predicted by employing the best performing regression models.

2.1. Data Collection Methodology

The dataset for this research was compiled by systematically reviewing published studies on the corrosion behaviour of magnesium alloys. The review focused on publications from the last ten years with well-documented experimental parameters and quantitative corrosion data, in order to capture the latest developments in magnesium alloys and test methodologies. The studies were screened using the Web of Science, Scopus and ScienceDirect databases.
The dataset in this study consists of 103 rows and 14 columns, with data points representing various alloy compositions, deformation rates, heat treatment temperatures and times, test environments, test methods, and test durations, as well as pH values and corrosion rates. In this research, the five features with the greatest effect on the corrosion rate were identified, and representative entries are shown in Table 1.

2.2. Dataset Variables and Their Definitions

The dataset comprises a range of independent variables, including alloy type, Mg, Al, Zn, Mn, RE, test temperature, heat treatment time, heat treatment temperature, deformation rate, test environment, pH, test method, and test duration, while the target variable is the corrosion rate. To enhance interpretability, Table 1 presents a subset of the dataset focusing on the variables test duration, pH, Al, test method and deformation rate together with the corresponding corrosion rate values. These features were selected based on the feature importance analysis performed with the PyCaret AutoML framework, which identified them as the most influential factors governing corrosion behaviour. Accordingly, the tabulated entries illustrate representative cases across different chemical compositions, thermal treatments, and testing environments, providing a concise overview of the experimental space most relevant to predictive modelling.

2.2.1. Alloy Composition

Chemical composition variables represent the weight percentage of each alloying element. Magnesium content typically ranged from 84 to 99.95%, with aluminium (0–10%), zinc (0–6%), manganese (0–2%), and rare earth elements (0–10%) as the primary additions.

2.2.2. Processing Conditions

Heat treatment categories include as-cast, T4 (solution treated and naturally aged), and T6 (solution treated and artificially aged) conditions. Heat treatment temperature and time describe heat-treated conditions. Deformation variables indicate whether mechanical working was applied and the extent of plastic deformation.

2.2.3. Testing Parameters

Environmental pH represents the acidity, ranging from 6 to 11 in the dataset. Test methods include potentiodynamic polarization, electrochemical impedance spectroscopy, immersion testing, and hydrogen evolution measurement. Test duration varies from short-term electrochemical measurements to extended immersion studies.

2.2.4. Target Variable

Corrosion rate serves as the dependent variable, expressed in mm/year and derived from various measurement techniques according to standard calculation procedures.

2.3. Data Preprocessing for Machine Learning Analysis

In this study, the PyCaret library was employed to develop models for accurate prediction of the target variable, corrosion rate. This was achieved by applying regression analysis to the data. PyCaret is an automated machine learning system that can automatically select the most suitable data preprocessing techniques, machine learning algorithms, and hyperparameters, with the goal of building high-precision prediction models [34]. As a rapidly developing framework, Automated Machine Learning (AutoML) aims to streamline the entire machine learning pipeline, including feature extraction and model construction. AutoML functions as a link between different expertise levels, democratizing access to advanced machine learning methods [35]. In recent years, the field has experienced substantial progress, with diverse methodologies and frameworks emerging to meet varying requirements [36]. Core elements of the AutoML framework consist of the search space, the search strategy, and performance evaluation [36]. Although it provides numerous benefits, such as minimizing the need for external assistance and enhancing the efficiency of model development, challenges persist in harmonizing different approaches and meeting diverse user requirements. PyCaret, a Python-based low-code AutoML library, leverages multiple frameworks such as Scikit-Learn, XGBoost, and LightGBM to determine the most appropriate learning model and related hyperparameters. Comparable to other AutoML systems, it provides an effective solution designed to maximize performance with minimal user intervention [37]. Figure 1 illustrates the configuration parameters required to perform regression analysis with the PyCaret library.
PyCaret was configured for a supervised regression task with KFold (k = 10) cross-validation for the dataset used in this study. The dataset comprised 103 samples and 15 features, of which 10 were numeric and 4 were categorical. Automatic preprocessing was enabled, employing simple imputation (mean for numeric and mode for categorical features). To stabilize variance and improve normality, the target variable was transformed using the Yeo–Johnson method. A 70/30 train–test split was used for hold-out evaluation; this split offers a balanced trade-off between bias and variance, providing both sufficient information for learning and a dependable evaluation of the model’s generalization capability [38,39].
Reproducibility was ensured by fixing the session ID to 42. Computations were executed on CPU (all available cores; n_jobs = −1), and experiment logging was disabled. Also, the data preprocessing and modelling setup phase performed with PyCaret is shown in detail in Table 2.
The PyCaret framework automatically constructs a pipeline that sequentially integrates all preprocessing steps (e.g., missing value imputation, categorical encoding, normalization, and feature scaling) with model training. This approach ensures that identical transformations are applied to both training and testing datasets, prevents data leakage, and enhances model reproducibility and consistency.
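The exact pipeline PyCaret assembles is not listed step by step in the text; as a minimal sketch, an equivalent preprocessing-plus-model chain under the stated settings (mean/mode imputation, one-hot encoding, Yeo–Johnson target transform, 70/30 split, seed 42) can be built with scikit-learn. The feature names below are hypothetical stand-ins for the dataset's columns, and the data is synthetic:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler

# Hypothetical column names standing in for the dataset's inputs.
numeric_cols = ["Al", "Zn", "pH", "test_duration_h", "deformation_rate"]
categorical_cols = ["test_method"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),       # mean for numeric
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),  # mode for categorical
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

# Yeo-Johnson transform on the target, mirroring PyCaret's target transformation.
model = TransformedTargetRegressor(
    regressor=Pipeline([("prep", preprocess),
                        ("gbr", GradientBoostingRegressor(random_state=42))]),
    transformer=PowerTransformer(method="yeo-johnson"),
)

# Synthetic 103-sample stand-in for the literature dataset.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "Al": rng.uniform(0, 10, 103), "Zn": rng.uniform(0, 6, 103),
    "pH": rng.uniform(6, 11, 103), "test_duration_h": rng.uniform(1, 240, 103),
    "deformation_rate": rng.uniform(0, 1, 103),
    "test_method": rng.choice(["immersion", "PDP", "EIS"], 103),
})
y = pd.Series(rng.uniform(0.1, 8.0, 103), name="corrosion_rate")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=42)
model.fit(X_tr, y_tr)
preds = model.predict(X_te)
print(preds.shape)  # 31 hold-out predictions from the 30% test split
```

Wrapping the transformations and the estimator in one pipeline object is what guarantees that identical preprocessing is applied at training and prediction time, which is the leakage-prevention property described above.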

3. Literature-Based Dataset and Machine Learning Results and Discussions

This section presents the outcomes of the machine learning analysis conducted on a literature-based dataset of magnesium alloy corrosion behaviour. The results are discussed with respect to both the predictive performance of the applied models and the insights gained into the key factors influencing corrosion rates.

3.1. Data Analysis

In this study, the PyCaret AutoML library was employed to systematically analyze the literature-based dataset on magnesium alloy corrosion behaviour. The library was configured to perform a 10-fold cross-validation procedure, ensuring reliable and generalizable performance estimates for the models. During preprocessing, PyCaret automatically addressed missing values through imputation, applied one-hot encoding to categorical features such as test method, and normalized numerical features to prepare them for regression analysis. This automated pipeline allowed the development of predictive models with minimal manual intervention while ensuring consistency and reproducibility.
Although PyCaret can train and evaluate 17 different regression algorithms, this study focused on eight widely used and representative models: Gradient Boosting Regressor (GBR), Extra Trees Regressor (ET), AdaBoost Regressor (ADA), Random Forest Regressor (RF), Decision Tree Regressor (DT), K Neighbours Regressor (KNN), Linear Regression (LR), and Support Vector Regression (SVR). These models were selected because they incorporate both ensemble-based techniques and classical regression methods, thus providing a balanced perspective on prediction performance. As shown in Table 3, the PyCaret library produced optimal learning models that could accurately predict the target variable (Corrosion Rate) using ensemble methods via the compare_models() function.
Based on the 10-fold cross-validation results, the Gradient Boosting Regressor (GBR) emerged as the best-performing model, achieving the lowest error metrics (MAE = 0.1996, RMSE = 0.4115) and the highest R2 value (0.9901), which indicates its superior ability to capture the nonlinear relationships underlying the dataset. Following GBR, the Extra Trees Regressor (ET) demonstrated strong predictive capability with an R2 of 0.9785, albeit with slightly higher error values. The AdaBoost Regressor (ADA) and Random Forest Regressor (RF) also produced satisfactory performance, with R2 scores of 0.9601 and 0.9550, respectively, confirming their effectiveness as ensemble methods despite moderate error levels. Importantly, the Decision Tree Regressor (DT), while less accurate compared to the ensemble techniques (R2 = 0.9375), still generated reliable predictions, demonstrating its ability to model the main trends in the data. Overall, these five models—GBR, ET, ADA, RF, and DT—proved effective in predicting corrosion rates, with GBR showing the highest precision, followed by ET, ADA, RF, and DT in descending order of accuracy.
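PyCaret's compare_models() ranking can be approximated manually with scikit-learn by scoring each candidate under the same 10-fold cross-validation. The sketch below uses a synthetic nonlinear dataset (make_friedman1) as a stand-in for the corrosion data, so the absolute scores and the exact ranking are illustrative only:

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic nonlinear stand-in for the 103-sample corrosion dataset.
X, y = make_friedman1(n_samples=103, n_features=10, noise=0.5, random_state=42)

models = {
    "GBR": GradientBoostingRegressor(random_state=42),
    "ET": ExtraTreesRegressor(random_state=42),
    "ADA": AdaBoostRegressor(random_state=42),
    "RF": RandomForestRegressor(random_state=42),
    "LR": LinearRegression(),
}

# Score every candidate with the same 10-fold CV split, as compare_models() does.
cv = KFold(n_splits=10, shuffle=True, random_state=42)
scores = {name: cross_val_score(est, X, y, cv=cv, scoring="r2").mean()
          for name, est in models.items()}

for name, r2 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean R2 = {r2:.4f}")
```

Because every model is evaluated on identical folds, the mean R2 values are directly comparable, which is the basis of the ranking reported in Table 3.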
To evaluate whether the high R2 value of the GBR model indicates overfitting, a learning curve was plotted in Figure 2. The close alignment between the training and cross-validation scores demonstrates that the model generalizes well across the data, confirming the reliability of its predictive performance.
The learning curve demonstrates excellent model performance with minimal overfitting, as evidenced by the convergence of training scores (consistently at ~1.00) and cross-validation scores (rising from ~0.93 to ~0.98–0.99) as training instances increase from 20 to 75 samples. The narrow confidence intervals for training scores and progressively diminishing variance in validation scores indicate stable learning behaviour and robust generalization capability. The Gradient Boosting Regressor exhibits strong predictive performance on unseen data, with the upward trajectory of validation scores suggesting that additional training data could further enhance model accuracy. The negligible gap between training and validation metrics, combined with high R2 values exceeding 0.98, confirms that the model has successfully captured the underlying data patterns without memorizing noise, thereby demonstrating readiness for deployment in predictive applications.
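A learning-curve diagnostic of this kind can be generated with scikit-learn's learning_curve utility; the sketch below reproduces the procedure on a synthetic nonlinear dataset (an assumed stand-in, since the real dataset is not reproduced here), so the score values will differ from Figure 2:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, learning_curve

# Synthetic stand-in with the same sample count as the paper's dataset.
X, y = make_friedman1(n_samples=103, n_features=10, noise=0.5, random_state=42)

# Score the model at 5 increasing training-set sizes under 10-fold CV.
sizes, train_scores, val_scores = learning_curve(
    GradientBoostingRegressor(random_state=42), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=KFold(n_splits=10, shuffle=True, random_state=42),
    scoring="r2",
)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:3d}  train R2={tr:.3f}  val R2={va:.3f}")
```

A shrinking gap between the training and validation curves as n grows is the signature of good generalization described above; a persistent large gap would instead indicate overfitting.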
On the other hand, the poor performance of Linear Regression (R2 = −0.6181), K-Nearest Neighbours (R2 = −0.3468), and Support Vector Regression (R2 = −1.5108), characterized by negative R-squared values, clearly demonstrates the complex nonlinear nature of corrosion rate prediction problems. These algorithms performed worse than random guessing, which provides strong evidence that electrochemical corrosion processes cannot be effectively modelled using simple linear relationships or distance-based methods.
In contrast, this comparative study clearly shows that tree-based ensemble methods are significantly superior for corrosion rate prediction tasks. The results confirm that when working with materials science problems, careful algorithm selection is essential, as the underlying physical and chemical processes often involve complex interactions that require sophisticated modelling approaches to capture accurately.

3.2. Hyperparameter Optimization

Hyperparameter optimization is performed to enhance the predictive performance, generalization capability, and fairness of machine learning models. Since default parameter settings may not be optimal for a given dataset, tuning helps identify the most effective configuration of model parameters such as learning rate, tree depth, or the number of estimators. This process minimizes overfitting or underfitting by balancing model complexity and bias–variance trade-offs. In addition, optimizing hyperparameters ensures that performance comparisons across different algorithms are reliable, reproducible, and not biased by arbitrary default configurations.
The comparative analysis in Table 3 was performed using the default hyperparameters provided by the PyCaret library for initial model screening. While these results successfully identified the top-performing algorithms, the use of default settings may not fully exploit each model’s potential. To address this limitation and ensure a fair and rigorous comparison, systematic hyperparameter optimization was performed for the three best-performing models identified in Table 3, and the results are presented in Table 4.
Following the initial screening phase, hyperparameter tuning was performed to maximize the predictive accuracy of the selected models and validate their superiority over other algorithms. Randomized search with 10-fold cross-validation was employed, with each model undergoing 50 iterations of hyperparameter combinations optimized using R2 as the primary evaluation metric. All experiments were executed with a fixed random seed (seed = 42) to ensure reproducibility.
The hyperparameter search space for GBR included learning rate (0.01–0.15), tree depth (3–9), number of estimators (100–500), subsample ratio (0.8–1.0), and minimum samples for splitting (2–10). For Extra Trees, the optimization covered tree depth (10-unlimited), number of estimators (100–500), minimum samples per split (2–10), minimum samples per leaf (1–4), and feature selection strategies. AdaBoost optimization included learning rate (0.01–1.0), number of estimators (50–300), and loss function types (linear, square, exponential). Table 4 presents the optimized hyperparameter configurations and their corresponding test set performance metrics, demonstrating the improvement achieved through systematic tuning.
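The randomized search described above (50 iterations, 10-fold CV, R2 scoring, fixed seed 42) can be sketched with scikit-learn's RandomizedSearchCV for the GBR search space. The parameter bounds follow the text; the data is again a synthetic stand-in, and scipy is assumed to be available for the sampling distributions:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV

# Synthetic stand-in for the corrosion dataset.
X, y = make_friedman1(n_samples=103, n_features=10, noise=0.5, random_state=42)

# GBR search space as stated in the text.
param_distributions = {
    "learning_rate": uniform(0.01, 0.14),   # 0.01-0.15
    "max_depth": randint(3, 10),            # 3-9
    "n_estimators": randint(100, 501),      # 100-500
    "subsample": uniform(0.8, 0.2),         # 0.8-1.0
    "min_samples_split": randint(2, 11),    # 2-10
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=50,                                           # 50 sampled combinations
    cv=KFold(n_splits=10, shuffle=True, random_state=42),  # 10-fold CV
    scoring="r2",
    random_state=42,                                     # reproducibility
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
print(f"best CV R2 = {search.best_score_:.4f}")
```

Randomized search samples a fixed budget of configurations from the continuous ranges rather than enumerating a grid, which is why it scales well to the five-dimensional space listed above.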
Comparison between Table 3 (default hyperparameters) and Table 4 (optimized hyperparameters) reveals how performance translates from cross-validation to hold-out evaluation. For the Gradient Boosting Regressor, the cross-validation average R2 of 0.9901 in Table 3 corresponds to a test set R2 of 0.9577, with MAE rising from 0.1996 to 1.0712 on the held-out test data. This apparent discrepancy between cross-validation and test set metrics is expected and reflects the difference between averaged training performance and final evaluation on completely unseen data. Importantly, the optimized hyperparameters (learning_rate = 0.15, max_depth = 9, n_estimators = 300) demonstrate that the model benefits from more aggressive learning settings than the default values. The Extra Trees Regressor maintained strong performance (test R2 = 0.9290) with an expanded ensemble (500 estimators) and deeper trees (max_depth = 20). AdaBoost showed the most modest performance (test R2 = 0.8915), with optimization converging to conservative settings (50 estimators, learning_rate = 0.01), suggesting fundamental limitations of the sequential boosting approach for this dataset. These optimized configurations ensure a fair model comparison, as each algorithm was given the opportunity to achieve its best possible performance through systematic hyperparameter search.
Following the hyperparameter optimization process, it is essential to define and explain the performance evaluation metrics used in this study before interpreting the results.
Several metrics were computed to evaluate and clarify the performance of the models derived from regression analysis on the dataset. These metrics include:
Mean Absolute Error (MAE) calculates the average of the absolute differences between the predicted values ($\hat{y}_i$) and the actual values ($y_i$): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $n$ is the number of observations. MAE represents the average magnitude of errors in predictions, excluding their direction, and provides a clear, intuitive measure of prediction accuracy for regression models; a lower MAE indicates a more accurate model [40].
Mean Squared Error (MSE) measures the average of the squared differences between the predicted values ($\hat{y}_i$) and the actual values ($y_i$): $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $n$ is the number of observations. MSE is a commonly used metric for evaluating regression model accuracy. A lower MSE indicates a better-performing regression model, especially when large errors are more critical [41].
Root Mean Squared Error (RMSE), being closely related to MSE, is the square root of the average of the squared differences between predicted ($\hat{y}_i$) and actual ($y_i$) values: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$, where $n$ is the number of observations. It represents the average magnitude of prediction error in the same unit as the target variable and is a common metric for regression model accuracy, providing a measure of the typical size of prediction errors. This metric is particularly advantageous in situations where it is important for errors to be represented on the same scale as the target variable and where greater deviations need to be penalized more severely [42].
R-Squared (R2) Score indicates the goodness of fit of a regression model. R2 measures the proportion of variance in the dependent variable (y) that is explained by the independent variables (X) in the model: $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, where $\bar{y}$ is the mean of the actual values, $\hat{y}_i$ the predicted value, and $n$ the number of observations. R2 complements metrics like MAE, MSE, and RMSE by showing the proportion of variance captured rather than the average error magnitude [42].
Root Mean Squared Logarithmic Error (RMSLE) is a metric used to evaluate the performance of regression models, particularly when the target variable spans several orders of magnitude or when relative differences matter more than absolute differences: $\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(1+\hat{y}_i) - \log(1+y_i)\right)^2}$, where $y_i$, $\hat{y}_i$ and $n$ are the actual value, predicted value and number of observations, respectively. RMSLE is particularly useful when the ratio of predicted to actual values is more important than the absolute differences. It complements MAE, MSE and RMSE by providing a log-transformed view of errors, making it ideal for datasets with exponential growth patterns or highly skewed distributions [42].
Mean Absolute Percentage Error (MAPE) is used to evaluate the accuracy of regression models as a percentage error relative to the actual values: $\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$, where $y_i$, $\hat{y}_i$ and $n$ are the actual value, predicted value and number of observations, respectively. MAPE is particularly helpful for evaluating model accuracy in relative terms, giving an intuitive sense of how close predictions are to actual values [43].
To evaluate the effectiveness of the prediction process, the MAE, MSE, RMSE, RMSLE, and MedAE metrics should be minimized, ideally approaching zero; this indicates that the model produces accurate predictions. Furthermore, the ideal value for the R2 metric is 1: the closer the obtained R2 value is to 1, the more accurately the model can predict the target variable.
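All of the metrics defined above are available in scikit-learn. The sketch below (illustrative numbers only, not the paper's dataset) shows how each definition maps to a library call:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_squared_log_error,
    mean_absolute_percentage_error,
    median_absolute_error,
    r2_score,
)

# Illustrative actual and predicted corrosion rates (mm/year); not real data.
y_true = np.array([0.28, 0.30, 1.70, 8.41, 18.00])
y_pred = np.array([0.28, 0.30, 1.70, 9.75, 19.00])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                                      # same unit as the target
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))  # log-scale (relative) error
mape = mean_absolute_percentage_error(y_true, y_pred)    # fraction; x100 gives percent
medae = median_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

Note that scikit-learn's MAPE is returned as a fraction rather than a percentage, so the factor of 100 in the formula above must be applied separately if percent values are desired.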

3.3. Evaluation of Selected Regression Models

Using the PyCaret library, the best-performing regression models were first identified through a comprehensive cross-validation process. In the subsequent stage of the study, three of these top models—Gradient Boosting Regressor (GBR), Extra Trees Regressor (ET), and AdaBoost Regressor (ADA)—were selected for an in-depth evaluation. Accordingly, Table 5, Table 6 and Table 7 report the real corrosion rate values together with the corresponding predictions generated by these models. The close alignment between real and predicted values provides further evidence of the models’ robustness and accuracy. More importantly, the prediction performance was found to be significantly improved by this new approach, which, to the best of the authors’ knowledge, has not previously been applied to this problem in the literature. This not only highlights the potential of advanced machine learning techniques in corrosion rate estimation but also establishes a benchmark for future research in this field.
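PyCaret's model comparison automates a cross-validated ranking of candidate regressors. A minimal scikit-learn-only sketch of the same idea is shown below; the synthetic dataset is only a stand-in whose shape loosely mirrors the paper's 103-sample, multi-feature data, not the actual corrosion data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    ExtraTreesRegressor,
    AdaBoostRegressor,
)
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the literature-based corrosion dataset.
X, y = make_regression(n_samples=103, n_features=14, noise=5.0, random_state=42)

models = {
    "GBR": GradientBoostingRegressor(random_state=42),
    "ET": ExtraTreesRegressor(random_state=42),
    "ADA": AdaBoostRegressor(random_state=42),
}

# 10-fold cross-validation, as used in the paper's model comparison.
cv = KFold(n_splits=10, shuffle=True, random_state=42)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)  # best model first
```

In PyCaret itself, the equivalent of this loop is a single `compare_models()` call after `setup()`; the sketch merely makes the underlying cross-validation explicit.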

3.3.1. Comparison Tables

Comparison tables are presented to illustrate the actual and predicted corrosion rate values obtained from the three best-performing regression models identified by the PyCaret AutoML framework (Table 4). Fifteen randomly selected samples from the dataset were used for this purpose, allowing a direct assessment of how well the predicted results match the real corrosion rate values. These comparisons provide clear evidence of the models’ predictive ability and highlight their effectiveness in capturing the underlying patterns of corrosion rate behaviour.
Table 5 compares the real and predicted corrosion rate values obtained from the Gradient Boosting Regressor (GBR) model for a subset of samples. Overall, the results demonstrate strong agreement between the real and predicted values, as evidenced by the relatively small residuals. For instance, in samples with low corrosion rates (e.g., Model Id 47 and 42, with real values of 0.280 and 0.300, respectively), the model reproduced the experimental outcomes exactly, yielding residuals of 0.000. Similarly, in medium-range corrosion rates (e.g., Model Id 45 and 10), the prediction errors were negligible (residuals of 0.001 and 0.050, respectively).
Even in cases with higher corrosion rates, the model maintained satisfactory accuracy. For example, Model Id 30 with a real corrosion rate of 18.00 was predicted as 19.00, corresponding to a residual of −1.000, and Model Id 62 with a real value of 8.410 was predicted as 9.753, with a residual of −1.343. Although these instances show slightly larger deviations compared to lower corrosion rates, the errors remain within an acceptable margin, highlighting the model’s robustness across different ranges of the dataset.
These findings confirm that the Gradient Boosting Regressor is capable of accurately modelling both low and high corrosion rate scenarios. The consistently low residuals provide further evidence that GBR successfully captures the nonlinear relationships in the dataset and delivers reliable predictive performance.
Table 6 reports the comparison between real and predicted corrosion rate values obtained from the Extra Trees Regressor (ET) model for a representative subset of samples. The results indicate a generally high level of accuracy, with residuals remaining small in most cases. For instance, in samples with low corrosion rates such as Model Id 47, 42, and 40 (real values 0.280, 0.300, and 0.275, respectively), the predictions closely matched the actual values, yielding residuals between −0.026 and −0.006. Similarly, in moderate corrosion cases (e.g., Model Id 45 and 10), the deviations were minimal, with residuals of −0.001 and −0.009, demonstrating the ability of the model to generalize well across this range.
For higher corrosion rate instances, the Extra Trees model maintained good predictive performance, albeit with slightly larger deviations. For example, Model Id 30 with a real corrosion rate of 18.00 was estimated as 18.84, corresponding to a residual of −0.840, while Model Id 62 with a real value of 8.410 was predicted as 8.133, with a residual of 0.277. Even in extreme values such as Model Id 31 (real = 26.200, predicted = 24.981), the residual was reasonably contained (1.219), indicating that the model is effective even under more challenging conditions.
When compared with the Gradient Boosting Regressor (GBR), the Extra Trees model shows similarly strong performance, particularly in low- and mid-range corrosion rates, and even slightly smaller residuals for some high-value samples (e.g., Model Id 62 and 30). Nevertheless, the aggregate test-set metrics in Table 4 (lower MAE and RMSE, higher R2) indicate that GBR captures the nonlinear patterns somewhat more effectively overall. While both models perform well, GBR therefore remains marginally superior in terms of minimizing errors across the full spectrum of corrosion rate values.
Table 7 summarizes the prediction performance of the AdaBoost Regressor (ADA) for selected samples. In general, the model provided reasonable estimates in the lower corrosion rate range. For example, Model Id 47, 42, and 40, with real corrosion rates of 0.280, 0.300, and 0.275, respectively, were predicted with minimal residuals ranging from −0.005 to 0.019. Similarly, for moderate values such as Model Id 45 (real = 1.700, predicted = 1.736) and 67 (real = 0.930, predicted = 0.973), the deviations remained small and within an acceptable margin.
However, the model’s predictive accuracy decreases noticeably when dealing with higher corrosion rates. For instance, Model Id 30 (real = 18.00) was underestimated at 15.54, producing a residual of +2.460, while Model Id 62 (real = 8.410) was significantly overestimated at 11.67, with a residual of −3.260. The largest deviation occurred at Model Id 31, where the real corrosion rate of 26.200 was predicted as 15.792, corresponding to a residual exceeding 10 units. These outcomes suggest that while AdaBoost is effective at capturing trends in low to mid corrosion rate regions, it struggles to generalize in cases of extreme values, leading to substantial errors.
Compared to GBR and ET, the AdaBoost model demonstrates less stability at the higher end of the corrosion rate spectrum. Both GBR and ET maintained residuals generally below 2 units even for extreme values, while AdaBoost exhibited residuals as high as 10 units. Nevertheless, in low and moderate ranges, AdaBoost’s performance is comparable to GBR and ET, showing near-zero residuals in several cases. Overall, while AdaBoost can be considered a competent ensemble method, GBR remains the most accurate across the full dataset, with ET occupying an intermediate position between GBR and ADA in terms of predictive robustness.
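The residual column in Tables 5–7 is simply the real value minus the predicted value, so positive residuals indicate underestimation and negative residuals overestimation. A small pandas sketch, using three rows taken from Table 7, reproduces the reported residuals:

```python
import pandas as pd

# Three rows from Table 7 (AdaBoost predictions, mm/year).
rows = {
    "Model Id": [30, 62, 31],
    "Real": [18.00, 8.410, 26.200],
    "Predicted": [15.54, 11.67, 15.792],
}
df = pd.DataFrame(rows)

# Residual = real - predicted: positive means the model underestimated.
df["Residual"] = df["Real"] - df["Predicted"]
```

Computed this way, Model Id 30 gives 2.460 (underestimation), Model Id 62 gives −3.260 (overestimation), and Model Id 31 gives the extreme 10.408 discussed above.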

3.3.2. Distribution of Real and Predicted Values

In addition to the comparisons of real and predicted corrosion rate values in Table 5, Table 6 and Table 7, it is also instructive to examine their overall distribution. Scatter plots provide a broader perspective on how well the predicted outputs of regression models match the real data, beyond individual sample level comparisons. The figures below present the distribution of real and predicted values for the three best-performing models (GBR, ET, ADA), providing further insight into their prediction accuracy and generalizability. Each blue dot represents a test sample, while the red dashed line denotes the ideal fit where the predicted values perfectly match the actual ones.
The alignment observed in Figure 3 indicates that the GBR model achieved highly accurate predictions across the entire range of values, from low to high corrosion rates. Particularly at the lower end of the spectrum, the predicted values almost perfectly coincide with the real measurements, yielding residuals that are close to zero. Even for higher corrosion rate values, where predictive models often tend to deviate, GBR maintained a close correspondence, with only minimal dispersion around the ideal line. The absence of systematic bias (e.g., consistent overestimation or underestimation) further supports the robustness of the model. Overall, the distribution plot confirms the quantitative metrics reported earlier (R2, MAE and RMSE), demonstrating that the Gradient Boosting Regressor effectively captures the nonlinear dependencies in the dataset and provides stable, generalizable predictions.
Figure 4 indicates that the ET model achieved high predictive accuracy, successfully approximating the true corrosion rate values across both low and high ranges. For samples with low and moderate corrosion rates, the predicted values almost perfectly coincide with the actual data, producing residuals close to zero. Even in higher-value cases, the model maintained close alignment with the ideal line, with only minor deviations observed. This consistency highlights the model’s robustness in handling nonlinear relationships within the dataset. When compared to the Gradient Boosting Regressor, the Extra Trees Regressor model demonstrates a similarly strong predictive capability; however, GBR showed slightly superior precision in earlier quantitative evaluations, suggesting that it remains the more reliable model overall.
In Figure 5, for real values above 10, the predicted values tend to be systematically underestimated, as reflected in the flattening trend of the blue points below the diagonal line. This observation is consistent with the quantitative performance measures of the model (R2, MAE and RMSE). These values confirm that the model can capture much of the variance in the dataset but remains less accurate than the Gradient Boosting and Extra Trees regressors. The relatively higher MAE and RMSE indicate greater average prediction errors, particularly in the upper range of corrosion rates, where residuals are more pronounced.
Comparison with GBR and ET: When contrasted with the Gradient Boosting and Extra Trees models, AdaBoost demonstrates weaker predictive capacity. Both GBR and ET produced more precise predictions with minimal residuals across all value ranges. In particular, GBR showed the best overall alignment with real values, followed closely by ET, whereas ADA struggled with high-value predictions. Thus, although AdaBoost provides reasonable performance for lower corrosion rates, its limitations in handling extreme values make it less suitable than GBR and ET for this dataset.
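Scatter plots of this kind can be produced with matplotlib. The sketch below uses illustrative values (not the paper's test set) and draws the red dashed ideal-fit diagonal described above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display required
import matplotlib.pyplot as plt
import numpy as np

# Illustrative real values and hypothetical model predictions (mm/year).
real = np.array([0.28, 0.30, 1.10, 1.70, 8.41, 18.00, 26.20])
pred = np.array([0.28, 0.32, 1.05, 1.70, 9.75, 19.00, 24.51])

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(real, pred, color="blue", label="Test samples")

# Red dashed diagonal: ideal fit where predicted == real.
lims = [0, max(real.max(), pred.max()) * 1.05]
ax.plot(lims, lims, "r--", label="Ideal fit")
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_xlabel("Real corrosion rate (mm/year)")
ax.set_ylabel("Predicted corrosion rate (mm/year)")
ax.legend()
fig.savefig("real_vs_predicted.png", dpi=150)
```

Points below the diagonal correspond to underestimation, which is how the flattening trend described for the AdaBoost model above would appear in such a plot.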

3.4. Sensitivity Analysis in Machine Learning Models

Sensitivity analysis is employed to examine how variations in a model’s input values influence its output, with the aim of identifying which inputs exert the greatest impact on the results. This process is essential for assessing the reliability and robustness of model predictions on the dataset. Moreover, sensitivity analysis is widely used by researchers to determine which independent variables are most critical to predictive performance and which variables have only a limited effect when altered.
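A minimal one-at-a-time (OAT) sensitivity sketch illustrates this idea: perturb one input while holding the others fixed and record the change in the model's output. The data and model below are hypothetical stand-ins, not the paper's dataset or fitted model:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical fitted regressor on synthetic data.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Baseline input: the mean of each feature, kept as a single row.
baseline = X.mean(axis=0, keepdims=True)
base_pred = model.predict(baseline)[0]

# One-at-a-time: shift each feature by +1 standard deviation
# and record how much the prediction moves.
sensitivity = {}
for j in range(X.shape[1]):
    perturbed = baseline.copy()
    perturbed[0, j] += X[:, j].std()
    sensitivity[j] = model.predict(perturbed)[0] - base_pred
```

Features whose perturbation moves the output the most are the ones the model is most sensitive to; impurity-based feature importances, used in the next subsection, offer a complementary global view of the same question.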
To understand the relative contribution of different factors in corrosion rate prediction, a comprehensive feature importance analysis was conducted using the optimized Gradient Boosting Regressor model, which demonstrated the best predictive performance among the evaluated ensemble methods. This analysis systematically identified and quantified the hierarchical contribution of material properties (chemical composition), processing conditions (heat treatment temperature and time, deformation rate), and experimental parameters (test duration, pH, test temperature, test method) to corrosion rate prediction accuracy. The resulting feature importance rankings, derived from the literature-based dataset, revealed the relative significance of each parameter class in determining corrosion behaviour, with detailed results presented in Figure 6 and Table 8.
The visual representation of feature rankings in Figure 6 demonstrates the clear dominance of certain parameters, while the detailed quantitative analysis presented in Table 8 provides the precise importance scores that enable a more thorough interpretation of each factor’s contribution to the corrosion prediction model.
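With a fitted scikit-learn GradientBoostingRegressor, impurity-based importances of the kind reported in Table 8 can be read directly from the `feature_importances_` attribute. The feature names and data below are hypothetical stand-ins, not the paper's dataset:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical feature names standing in for the paper's input variables.
names = ["Test_Duration", "pH", "Al", "Zn", "Heat_Treatment_Temperature"]
X, y = make_regression(n_samples=150, n_features=len(names), random_state=1)

model = GradientBoostingRegressor(random_state=1).fit(X, y)

# Impurity-based importances are normalized to sum to 1,
# so they can be ranked and tabulated directly.
importance = (
    pd.Series(model.feature_importances_, index=names)
    .sort_values(ascending=False)
)
```

Sorting the resulting series in descending order yields a ranking analogous to Table 8, where each score reflects the fraction of the model's split-quality improvement attributable to that feature.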
This feature importance analysis provides valuable insights into materials science and corrosion engineering by revealing the hierarchical order of factors influencing corrosion rate prediction. Test duration (0.483) stands out as the most dominant factor, and this result is in excellent agreement with the time-progressive kinetic nature of corrosion processes. Because corrosion is essentially a process driven by the cumulative effects of electrochemical reactions over time, the model’s identification of exposure duration as the most important parameter is in full agreement with the scientific literature.
The second-highest ranking of the pH factor (0.131) highlights the critical role of the chemical characteristics of the corrosion environment. pH directly influences the thermodynamic and kinetic behaviour of electrochemical reactions at the metal surface, thereby exponentially influencing the corrosion rate. This high importance value demonstrates that the model successfully captures the strong influence of environmental conditions on corrosion processes.
The third and fourth rankings of alloying elements such as aluminum (0.093) and zinc (0.076) demonstrate the decisive influence of metallurgical composition on corrosion resistance. Aluminum’s high prominence can be explained by its role in supporting passive film formation and its modifier effect on galvanic corrosion behaviour. Zinc’s prominence reflects its role in cathodic protection mechanisms.
The fifth-place position of the heat treatment temperature (0.071) demonstrates the impact of thermal treatments on microstructural changes. The temperature parameter indirectly affects the corrosion behaviour of the material by controlling grain size distribution, phase transformations, and precipitate formation. The comparatively lower importance of the test temperature (0.045) could be explained by the fact that the corrosion tests in the dataset were conducted under similar conditions, or by the masking effect of other factors.
The moderate importance values for other alloying elements, such as manganese (0.028) and magnesium (0.027), indicate that their contributions to corrosion mechanisms depend on more specific conditions. The relatively low importance of the test method parameter (0.023) is noteworthy, as different test procedures (immersion, electrochemical impedance, salt spray) generally activate different corrosion mechanisms. This result suggests that the test methods used in the dataset may have simulated similar conditions.
The low importance values for deformation ratio (0.011), rare earth elements (0.006), and other elements (0.002) indicate that these factors have limited impact in this specific dataset. These results demonstrate that the model reflects the characteristics of the dataset and performs domain-specific optimization.

4. Conclusions

In this study, the PyCaret AutoML library was employed, providing an integrated workflow that automates data preprocessing, model selection and comparison, hyperparameter tuning, model evaluation, and deployment. This framework, with model accuracy values serving as a central criterion in the decision-making process, has facilitated the determination of the most suitable regression models for the dataset.
Gradient Boosting Regressor (GBR), Extra Trees Regressor (ET), and AdaBoost (ADA) were chosen for further analysis among the evaluated regression models. They showed superior predictive performance, as reflected in relatively low MAE and RMSE values and high R2 scores compared to alternative models. Moreover, these three methods represent ensemble-based approaches widely recognized in the literature for their robustness and generalization ability: boosting in the case of GBR and ADA, and an extremely randomized, bagging-style tree ensemble in the case of ET. By focusing on these models, this study aimed to provide a balanced and comprehensive evaluation of ensemble techniques, supported by feature importance analysis and visual inspection of the distributions of real and predicted values. The feature importance analysis revealed test duration and pH as the most influential parameters, which aligns well with established corrosion science and supports the reliability of the models. These results highlight not only the validity of the dataset but also the ability of AutoML frameworks such as PyCaret to accelerate and optimize model development in materials science applications.
Overall, PyCaret AutoML integrated with advanced ensemble models provides a powerful and efficient workflow for predictive modelling. The success of this approach showcases its broader potential in experimental materials research and demonstrates how automated machine learning can contribute to data-driven decision-making in scientific studies.

Author Contributions

Conceptualization, T.Y., H.Z.; methodology, T.Y.; software, machine learning regression and modelling, T.Y.; validation, T.Y., H.Z.; formal analysis, T.Y.; investigation, T.Y.; resources, T.Y., H.Z.; data curation, T.Y.; writing—original draft preparation, T.Y.; writing—review and editing, T.Y., H.Z.; visualization, T.Y.; funding acquisition, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

Supported by Johannes Kepler Open Access Publishing Fund and the federal state of Upper Austria.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bai, J.; Yang, Y.; Wen, C.; Chen, J.; Zhou, G.; Jiang, B.; Peng, X.; Pan, F. Applications of Magnesium Alloys for Aerospace: A Review. J. Magnes. Alloys 2023, 11, 3609–3619.
2. Turan, M.E.; Sun, Y.; Akgul, Y. Mechanical, Tribological and Corrosion Properties of Fullerene Reinforced Magnesium Matrix Composites Fabricated by Semi Powder Metallurgy. J. Alloys Compd. 2018, 740, 1149–1158.
3. Abdulgadir, M.M.; Demir, B.; Turan, M.E. Hybrid Reinforced Magnesium Matrix Composites (Mg/SiC/GNPs): Drilling Investigation. Metals 2018, 8, 215.
4. Zengin, H.; Turen, Y.; Turan, M.E. Tensile and Wear Properties of As-Cast and As-Extruded ZK60 Magnesium Alloys Containing Minor Nd Additions. Mater. Res. Express 2019, 6, 086528.
5. Atrens, A.; Song, G.-L.; Liu, M.; Shi, Z.; Cao, F.; Dargusch, M.S. Review of Recent Developments in the Field of Magnesium Corrosion. Adv. Eng. Mater. 2015, 17, 400–453.
6. Zengin, H.; Ari, S.; Turan, M.E.; Hassel, A.W. Evolution of Microstructure, Mechanical Properties, and Corrosion Resistance of Mg–2.2Gd–2.2Zn–0.2Ca (wt%) Alloy by Extrusion at Various Temperatures. Materials 2023, 16, 3075.
7. Praveen, T.S.; Padmanaban, R.; Vignesh, R.V.; Baghad, A. Investigations on the Corrosion Characteristics of Additive Manufactured Mg–Ag Alloy in Simulated Body Fluids for Biodegradable Implants. Prog. Addit. Manuf. 2024, 10, 1281–1292.
8. Gong, C.; Yan, X.; He, X.; Su, Q.; Liu, B.; Chen, F.; Fang, D. Influence of Homogenization Treatment on Corrosion Behavior and Discharge Performance of the Mg–2Zn–1Ca Anodes for Primary Mg-Air Batteries. Mater. Chem. Phys. 2022, 280, 125802.
9. Li, B.; Duan, Y.; Zheng, S.; Li, M.; Peng, M.; Qi, H. Microstructural Homogeneity, Texture and Corrosion Properties of RE-Doped 55Mg-35Pb-9.2Al-0.8B Alloy Fabricated via Equal Channel Angular Pressing (ECAP). J. Alloys Compd. 2023, 966, 171607.
10. Bai, R.; Feng, Y.; Li, J. Effect of Extrusion Temperature on the Microstructure, Mechanical Properties, and In Vitro Corrosion Behavior of a Biodegradable Mg-Zn-Ca-Mn-Sn Alloy. J. Mater. Eng. Perform. 2024, 34, 10448–10459.
11. Huang, S.-J.; Wang, C.-F.; Subramani, M.; Fan, F.-Y. Effect of ECAP on Microstructure, Mechanical Properties, Corrosion Behavior, and Biocompatibility of Mg-Ca Alloy Composite. J. Compos. Sci. 2023, 7, 292.
12. Zerankeshi, M.M.; Alizadeh, R.; Gerashi, E.; Asadollahi, M.; Langdon, T.G. Effects of Heat Treatment on the Corrosion Behavior and Mechanical Properties of Biodegradable Mg Alloys. J. Magnes. Alloys 2022, 10, 1737–1785.
13. Li, D.-W.; Wang, H.-Y.; Wei, D.-S.; Zhao, Z.-X.; Liu, Y. Effects of Deformation Texture and Grain Size on Corrosion Behavior of Mg–3Al–1Zn Alloy Sheets. ACS Omega 2020, 5, 1448–1456.
14. Guo, Y.; Sun, M.; Zhang, W.; Wang, L. Machine Learning in Enhancing Corrosion Resistance of Magnesium Alloys: A Comprehensive Review. Metals 2023, 13, 1790.
15. Ghorbani, M.; Boley, M.; Nakashima, P.N.H.; Birbilis, N. A Machine Learning Approach for Accelerated Design of Magnesium Alloys. Part B: Regression and Property Prediction. J. Magnes. Alloys 2023, 11, 4197–4205.
16. Moses, A.; Chen, D.; Wan, P.; Wang, S. Prediction of Electrochemical Corrosion Behavior of Magnesium Alloy Using Machine Learning Methods. Mater. Today Commun. 2023, 37, 107285.
17. GitHub—Pycaret/Pycaret: An Open-Source, Low-Code Machine Learning Library in Python. Available online: https://github.com/pycaret/pycaret (accessed on 20 September 2025).
18. Samiei, S.; Dini, G.; Ebrahimian-Hosseinabadi, M. Correlation Between Microstructure, Mechanical Properties, and Corrosion Characteristics of AZ31 Mg Alloy Processed by Accumulative Roll Bonding Process. Met. Mater. Int. 2023, 29, 192–203.
19. Sadeghi, A.; Hasanpur, E.; Bahmani, A.; Shin, K.S. Corrosion Behaviour of AZ31 Magnesium Alloy Containing Various Levels of Strontium. Corros. Sci. 2018, 141, 117–126.
20. Chaudry, U.M.; Farooq, A.; bin Tayyab, K.; Malik, A.; Kamran, M.; Kim, J.-G.; Li, C.; Hamad, K.; Jun, T.-S. Corrosion Behavior of AZ31 Magnesium Alloy with Calcium Addition. Corros. Sci. 2022, 199, 110205.
21. Zohdy, K.M.; El-Sherif, R.M.; El-Shamy, A.M. Effect of pH Fluctuations on the Biodegradability of Nanocomposite Mg-Alloy in Simulated Bodily Fluids. Chem. Pap. 2023, 77, 1317–1337.
22. Rogachev, S.O.; Bazhenov, V.E.; Bautin, V.A.; Li, A.V.; Plegunova, S.V.; Ten, D.V.; Yushchuk, V.V.; Komissarov, A.A.; Shin, K.S. Influence of Hot Rolling on Microstructure, Corrosion and Mechanical Properties of Mg–Zn–Mn–Ca Alloy. Metals 2024, 14, 1249.
23. Pourhasan, N.; Sabbaghian, M.; Pishbin, F.; Mahmudi, R. Enhanced Corrosion Behavior and SCC Resistance of a Cast Mg–0.4Zr Alloy after Hot Extrusion. J. Mater. Res. Technol. 2025, 39, 585–596.
24. Abdelfattah, K.B.; Abbas, M.A.; El-Garaihy, W.H.; Mohamed, A.M.; Salem, H.G. Corrosion and Degradation Behavior of MCSTE-Processed AZ31 Magnesium Alloy. Sci. Rep. 2025, 15, 4072.
25. Ma, Y.; Xiong, H.; Chen, B. Effect of Heat Treatment on Microstructure and Corrosion Behavior of Mg-5Al-1Zn-1Sn Magnesium Alloy. Corros. Sci. 2021, 191, 109759.
26. Liang, M.; Liu, H.; Wu, C.; Li, Y.; Guo, Z.; Murugadoss, V. Effects of Rare Earth Neodymium (Nd) and Heat Treatment on Anti-Corrosion Behaviors of the AZ80 Magnesium Alloy. Adv. Compos. Hybrid Mater. 2022, 5, 1460–1476.
27. Dargusch, M.S.; Shi, Z.; Zhu, H.; Atrens, A.; Song, G.-L. Microstructure Modification and Corrosion Resistance Enhancement of Die-Cast Mg-Al-RE Alloy by Sr Alloying. J. Magnes. Alloys 2021, 9, 950–963.
28. Bazhenov, V.E.; Li, A.V.; Komissarov, A.A.; Koltygin, A.V.; Tavolzhanskii, S.A.; Bautin, V.A.; Voropaeva, O.O.; Mukhametshina, A.M.; Tokar, A.A. Microstructure and Mechanical and Corrosion Properties of Hot-Extruded Mg–Zn–Ca–(Mn) Biodegradable Alloys. J. Magnes. Alloys 2021, 9, 1428–1442.
29. Zhang, M.; Deng, W.-L.; Yang, X.-N.; Wang, Y.-K.; Zhang, X.-Y.; Hang, R.-Q.; Deng, K.-K.; Huang, X.-B. In Vitro Biodegradability of Mg–2Gd–xZn Alloys with Different Zn Contents and Solution Treatments. Rare Met. 2019, 38, 620–628.
30. Yavuzyegit, B.; Karali, A.; De Mori, A.; Smith, N.; Usov, S.; Shashkov, P.; Bonithon, R.; Blunn, G. Evaluation of Corrosion Performance of AZ31 Mg Alloy in Physiological and Highly Corrosive Solutions. ACS Appl. Bio Mater. 2024, 7, 1735–1747.
31. Feliu, S., Jr.; Llorente, I. Corrosion Product Layers on Magnesium Alloys AZ31 and AZ61: Surface Chemistry and Protective Ability. Appl. Surf. Sci. 2015, 347, 736–746.
32. Rogachev, S.O.; Bazhenov, V.E.; Komissarov, A.A.; Li, A.V.; Munzaferova, N.E.; Plegunova, S.V.; Ten, D.V. Microstructure, Mechanical, and Corrosion Properties of Mg-Zn-Ga Alloy after Hot Rolling. J. Mater. Eng. Perform. 2025, 34, 3970–3978.
33. Liu, J.; Han, E.-H.; Song, Y.; Shan, D. Effect of Twins on the Corrosion Behavior of Mg–5Y–7Gd–1Nd–0.5Zr Mg Alloy. J. Alloys Compd. 2018, 757, 356–363.
34. Whig, P.; Gupta, K.; Jiwani, N.; Jupalle, H.; Kouser, S.; Alam, N. A Novel Method for Diabetes Classification and Prediction with Pycaret. Microsyst. Technol. 2023, 29, 1479–1487.
35. Pillay, N.; Qu, R. (Eds.) Automated Design of Machine Learning and Search Algorithms; Natural Computing Series; Springer International Publishing: Cham, Switzerland, 2021; ISBN 978-3-030-72068-1.
36. Baratchi, M.; Wang, C.; Limmer, S.; Van Rijn, J.N.; Hoos, H.; Bäck, T.; Olhofer, M. Automated Machine Learning: Past, Present and Future. Artif. Intell. Rev. 2024, 57, 122.
37. Tekin Ünver, R.; Bayraktar, C.; Demir, B. The Regression Analysis of Dry–Wet Wear Outcomes and Materials Properties of Biodegradable MgCu and MgZn, Made by P/M, Using Machine Learning Models. Appl. Phys. A 2025, 131, 311.
38. Gholamy, A.; Kreinovich, V.; Kosheleva, O. Why 70/30 or 80/20 Relation Between Training and Test Sets: A Pedagogical Explanation; University of Texas at El Paso, Departmental Technical Reports: El Paso, TX, USA, 2018.
39. Joseph, V.R. Optimal Ratio for Data Splitting. Stat. Anal. Data Min. 2022, 15, 531–538.
40. Qi, J.; Du, J.; Siniscalchi, S.M.; Ma, X.; Lee, C.-H. On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression. IEEE Signal Process. Lett. 2020, 27, 1485–1489.
41. Rutkowski, L.; Jaworski, M.; Duda, P. Stream Data Mining: Algorithms and Their Probabilistic Properties; Studies in Big Data; Springer International Publishing: Cham, Switzerland, 2020; Volume 56, ISBN 978-3-030-13961-2.
42. Deperlioğlu, Ö.; Köse, U. Python ile Makine Öğrenmesi (Machine Learning with Python); Seçkin Yayıncılık: Ankara, Turkey, 2024; ISBN 978-975-02-9027-5.
43. Poldrack, R.A.; Huckins, G.; Varoquaux, G. Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry 2020, 77, 534–540.
Figure 1. Hyperparameters used in the PyCaret library for regression analysis.
Figure 2. Learning curve for GBR regression.
Figure 3. Distribution of real and predicted values obtained by the Gradient Boosting Regression model.
Figure 4. Distribution of real and predicted values obtained by the Extra Trees Regression model.
Figure 5. Distribution of real and predicted values obtained by the AdaBoost Regression model.
Figure 6. Top 10 features ranked by feature importance in corrosion rate prediction.
Table 1. Representative dataset samples showing the impact of the five most significant variables on corrosion rate.

Test Duration (h) | pH  | Al (%) | Test Method | Deformation Rate (%) | Corrosion Rate (mm/Year) | Reference
336               | 7   | 3      | Immersion   | -                    | 0.28                     | [30]
2.5               | 7.5 | 3      | PD          | 50                   | 2.387                    | [18]
0.25              | 7   | 0      | PD          | 5                    | 1.321                    | [33]
216               | 7.4 | 0      | Immersion   | -                    | 19                       | [29]
24                | 7   | 4      | HE          | -                    | 0.28                     | [27]
0.25              | 7.4 | 0      | PD          | 500                  | 2.520                    | [28]
0.75              | 8   | 0      | PD          | 97                   | 0.23                     | [22]
1                 | 7   | 5      | PD          | -                    | 1.740                    | [25]
192               | 7   | 0      | HE          | 94                   | 0.28                     | [32]
72                | 7   | 8.52   | Immersion   | -                    | 11.500                   | [26]
72                | 7   | 8.52   | Immersion   | -                    | 12.800                   | [26]
0.5               | 7.3 | 0      | PD          | 90                   | 5.400                    | [23]
12                | 7   | 3      | Immersion   | -                    | 0.82                     | [19]
2.5               | 7.5 | 3      | PD          | -                    | 2.123                    | [18]
0.8               | 6   | 3      | PD          | 72                   | 1                        | [24]
72                | 7   | 3      | Immersion   | -                    | 1.32                     | [19]
336               | 7.2 | 6      | EIS         | -                    | 3.500                    | [31]
72                | 7   | 3      | Immersion   | -                    | 1.35                     | [19]
24                | 7   | 5      | PD          | -                    | 3.700                    | [25]
72                | 7   | 8.52   | Immersion   | -                    | 5.500                    | [26]

PD: Potentiodynamic polarization; HE: Hydrogen evolution; EIS: Electrochemical Impedance Spectroscopy.
Table 2. Regression experiment configuration in PyCaret.

Description              | Value          | Description             | Value
Session id               | 42             | Categorical Imputation  | Mode
Target                   | Corrosion Rate | Max. one-hot encoding   | 25
Target type              | Regression     | Transform Target        | True
Original data shape      | (103, 15)      | Transform target method | Yeo
Transformed data shape   | (103, 48)      | Fold Generator          | K-Fold
Transformed train set    | (72, 48)       | Fold Number             | 10
Transformed test set     | (31, 48)       | CPU Jobs                | −1
Numeric features         | 10             | Use GPU                 | False
Categorical features     | 4              | USI                     | Fb37
Rows with missing values | 1.9%           | Numeric Imputation      | Mean
Preprocess               | True           | Log Experiment          | False
Imputation Type          | Simple         | Experiment Name         | Reg
Table 3. Comparison of regression models based on 10-fold cross-validation.

Model | Regressor         | MAE    | MSE     | RMSE   | R2      | RMSLE  | MAPE   | TT (s)
GBR   | Gradient Boosting | 0.1996 | 0.4489  | 0.4115 | 0.9901  | 0.0347 | 0.0572 | 0.0980
ET    | Extra Trees       | 0.3578 | 0.8375  | 0.6459 | 0.9785  | 0.0725 | 0.1451 | 0.1650
ADA   | AdaBoost          | 0.4664 | 1.9206  | 0.9438 | 0.9601  | 0.0704 | 0.2609 | 0.1110
RF    | Random Forest     | 0.3537 | 1.2113  | 0.7420 | 0.9550  | 0.0616 | 0.2644 | 0.5870
DT    | Decision Tree     | 0.4280 | 1.2917  | 0.8578 | 0.9375  | 0.0791 | 0.1145 | 0.0810
KNN   | K Neighbours      | 2.4782 | 21.6102 | 3.8918 | −0.3468 | 0.6214 | 4.0365 | 0.0850
LR    | Linear            | 1.4848 | 19.0384 | 2.9840 | −0.6181 | 0.3480 | 1.0001 | 0.0850
SVM   | Support Vector    | 2.9848 | 32.0513 | 4.8763 | −1.5108 | 0.7888 | 5.5706 | 0.0990
Table 4. Optimized hyperparameters and test set performance metrics for the three best regression models after randomized search optimization.

Model | R2     | MAE    | RMSE   | Subsample | n_Estimators | Learning_Rate
GBR   | 0.9577 | 1.0712 | 1.5565 | 1.0       | 300          | 0.15
ET    | 0.9290 | 1.2630 | 2.0173 | NaN       | 500          | NaN
ADA   | 0.8915 | 1.6852 | 2.4934 | NaN       | 50           | 0.01
Table 5. Gradient Boosting Regression results: real and predicted values.

| Model Id | Corrosion Rate Real (mm/year) | Corrosion Rate Predicted (mm/year) | Residual |
|---|---|---|---|
| 30 | 18.00 | 19.00 | −1.000 |
| 67 | 0.930 | 0.976 | −0.046 |
| 62 | 8.410 | 9.753 | −1.343 |
| 47 | 0.280 | 0.280 | 0.000 |
| 42 | 0.300 | 0.300 | 0.000 |
| 40 | 0.275 | 0.275 | 0.000 |
| 90 | 2.030 | 1.960 | 0.070 |
| 45 | 1.700 | 1.699 | 0.001 |
| 10 | 1.100 | 1.050 | 0.050 |
| 0 | 0.280 | 0.280 | 0.000 |
| 18 | 1.000 | 0.976 | 0.024 |
| 31 | 26.200 | 24.514 | 1.686 |
| 97 | 7.800 | 7.298 | 0.502 |
| 85 | 0.410 | 0.409 | 0.001 |
| 76 | 0.480 | 0.479 | 0.001 |
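The residual column in Tables 5–7 is simply the real value minus the predicted value, so a negative residual means the model over-predicted the corrosion rate. A minimal check using three samples from Table 5:

```python
# Real and predicted corrosion rates (mm/year) for samples 30, 62, and 31
# from Table 5; residual = real - predicted.
pairs = {30: (18.00, 19.00), 62: (8.410, 9.753), 31: (26.200, 24.514)}

residuals = {mid: round(real - pred, 3) for mid, (real, pred) in pairs.items()}
print(residuals)
```

Sample 30 yields −1.000 (over-prediction) and sample 31 yields 1.686 (under-prediction), matching the table.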
Table 6. Extra Trees regression results: real and predicted values.

| Model Id | Corrosion Rate Real (mm/year) | Corrosion Rate Predicted (mm/year) | Residual |
|---|---|---|---|
| 30 | 18.00 | 18.84 | −0.840 |
| 67 | 0.930 | 0.736 | 0.194 |
| 62 | 8.410 | 8.133 | 0.277 |
| 47 | 0.280 | 0.286 | −0.006 |
| 42 | 0.300 | 0.319 | −0.019 |
| 40 | 0.275 | 0.301 | −0.026 |
| 90 | 2.030 | 2.023 | 0.007 |
| 45 | 1.700 | 1.701 | −0.001 |
| 10 | 1.100 | 1.109 | −0.009 |
| 0 | 0.280 | 0.926 | −0.646 |
| 18 | 1.000 | 0.989 | 0.011 |
| 31 | 26.200 | 24.981 | 1.219 |
| 97 | 7.800 | 7.818 | −0.018 |
| 85 | 0.410 | 0.382 | 0.028 |
| 76 | 0.480 | 0.461 | 0.019 |
Table 7. AdaBoost regression results: real and predicted values.

| Model Id | Corrosion Rate Real (mm/year) | Corrosion Rate Predicted (mm/year) | Residual |
|---|---|---|---|
| 30 | 18.00 | 15.54 | 2.460 |
| 67 | 0.930 | 0.973 | −0.043 |
| 62 | 8.410 | 11.67 | −3.260 |
| 47 | 0.280 | 0.260 | 0.002 |
| 42 | 0.300 | 0.305 | −0.005 |
| 40 | 0.275 | 0.256 | 0.019 |
| 90 | 2.030 | 1.946 | 0.084 |
| 45 | 1.700 | 1.736 | −0.036 |
| 10 | 1.100 | 1.157 | −0.057 |
| 0 | 0.280 | 0.260 | 0.020 |
| 18 | 1.000 | 0.985 | 0.015 |
| 31 | 26.200 | 15.792 | 10.408 |
| 97 | 7.800 | 7.191 | 0.609 |
| 85 | 0.410 | 0.389 | 0.021 |
| 76 | 0.480 | 0.390 | 0.009 |
Table 8. Feature importance results for corrosion rate prediction.

| Feature | Importance |
|---|---|
| Test_Duration | 0.483318 |
| pH | 0.131010 |
| Al | 0.093299 |
| Zn | 0.076419 |
| Heat_Treatment_Temperature | 0.070875 |
| Test_Temperature | 0.045259 |
| Mn | 0.028079 |
| Mg | 0.027290 |
| Test_Method | 0.022538 |
| Deformation_Rate | 0.011346 |
| RE | 0.006233 |
| Heat_Treatment_Time | 0.002653 |
| Other_Elements | 0.001681 |
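Importance scores of the kind shown in Table 8 come directly from tree-ensemble models: scikit-learn exposes impurity-based importances that sum to 1 via the `feature_importances_` attribute. The sketch below fits an Extra Trees model on synthetic data using the feature names from Table 8; the resulting values are illustrative only and will not match the table.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Feature names from Table 8; the data itself is synthetic.
feature_names = ["Test_Duration", "pH", "Al", "Zn",
                 "Heat_Treatment_Temperature", "Test_Temperature", "Mn", "Mg",
                 "Test_Method", "Deformation_Rate", "RE",
                 "Heat_Treatment_Time", "Other_Elements"]

X, y = make_regression(n_samples=103, n_features=len(feature_names),
                       random_state=42)
model = ExtraTreesRegressor(random_state=42).fit(X, y)

# Impurity-based importances; they are normalized to sum to 1, as in Table 8.
importances = (pd.Series(model.feature_importances_, index=feature_names)
               .sort_values(ascending=False))
print(importances.round(3))
```

Note that impurity-based importances can overstate high-cardinality features; permutation importance is a common cross-check when that is a concern.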