Interpretable Machine-Learning Prediction of Atmospheric Zinc Corrosion Depth Under Diverse Environmental Conditions

Jain, Sandeep; Mourya, Rahul Singh; Jain, Reliance; Dewangan, Sheetal Kumar; Tiwari, Saurabh

doi:10.3390/pr14081214

by Sandeep Jain ¹,*, Rahul Singh Mourya ², Reliance Jain ³, Sheetal Kumar Dewangan ³ and Saurabh Tiwari ¹,*

¹ School of Materials Science and Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
² Department of Mechanical Engineering, Shri Govindram Seksaria Institute of Technology and Science, Indore 452003, India
³ Department of Materials Science and Engineering, Ajou University, Suwon 16419, Republic of Korea
* Authors to whom correspondence should be addressed.

Processes 2026, 14(8), 1214; https://doi.org/10.3390/pr14081214

Submission received: 17 March 2026 / Revised: 7 April 2026 / Accepted: 8 April 2026 / Published: 10 April 2026

(This article belongs to the Special Issue Advances in Research on the Corrosion Properties of Metal Compounds in Alloys)

Abstract

Understanding the depth and severity of corrosion is vital for evaluating the long-term durability and economic performance of Zn-based structures. In this study, a machine learning (ML) framework was applied to forecast the corrosion depth of zinc under varying environmental conditions. A dataset of 300 samples, compiled from previously published atmospheric corrosion studies covering a range of environmental conditions, was used to develop and evaluate the models. Seven ML algorithms were developed using temperature, time of wetness (TOW), SO₂ concentration, Cl⁻ concentration, and exposure time as input parameters. The models were trained using cross-validation and hyperparameter optimization to ensure robust predictive performance and minimize overfitting. Among all models, Random Forest (RF) showed the best predictive performance, with an R² of 96.4% and an RMSE of 0.642 µm. The predictive ability of the optimized RF model was further confirmed on five new environmental systems, where the predicted values agreed closely with the reported corrosion depths (R² = 97.9%, RMSE = 0.87 µm). Model interpretability analysis using SHAP (SHapley Additive exPlanations) revealed that exposure time and SO₂ concentration are the most significant parameters governing zinc corrosion behaviour. The developed ML framework provides interpretable insights into the influence of environmental parameters on atmospheric zinc corrosion behaviour and a reliable tool for forecasting corrosion depth. These findings highlight the potential of ML approaches to support corrosion mitigation strategies and accelerate materials design by reducing reliance on conventional trial-and-error experimentation.

Keywords:

machine learning; corrosion behaviour; Zn alloys; SHAP analysis

1. Introduction

Zinc (Zn) and its alloys are crucial engineering materials employed in several industries. Their wide application stems from a unique combination of properties, including excellent corrosion resistance, considerable dimensional stability during casting, recyclability, and biocompatibility [1,2,3,4,5]. In the building sector, Zn-based roofing systems and galvanized steel components are recognized for their capability to last over a century under typical urban environmental conditions [6]. Zinc primarily protects ferrous substrates by acting as both a physical barrier to corrosive species and a sacrificial anode that provides cathodic protection [7]. In addition to its conventional engineering applications, zinc has recently gained prominence in biomedical applications, primarily biodegradable coronary stents [8,9,10]. Despite these advantages, atmospheric corrosion poses a notable challenge to the durability and financial viability of Zn-based infrastructure in outdoor environments [10,11,12,13]. The corrosion process is influenced by several environmental factors, including temperature, sulfur dioxide (SO₂) concentration, chloride ion deposition, and time of wetness (TOW) [14,15]. The effectiveness of the corrosion products that form on the zinc surface as protective barriers is highly dependent on their adhesion, structural morphology, and ionic permeability [16]. More thorough statistical procedures have been established by the UN/ECE ICP Materials programme, but they still rely on linear correlations. As a result, these approaches may fail to capture the nonlinear and synergistic interactions characteristic of specific exposure conditions, such as the sharp increase in corrosion depth when the time of wetness (TOW) exceeds 80% or the bell-shaped response observed at elevated concentrations of SO₂ and chloride ions [14,17].

The experimental approach is often time-consuming and resource-intensive, requiring significant energy, labour, and material costs [18]. It is also challenging to identify, among the many possible combinations, the environmental factors that control corrosion behaviour in a given application. To overcome these limitations and accelerate the design of high-performance materials, researchers have turned to computational approaches, among which machine learning (ML) has emerged as a leading one [19,20]. Over the past few decades, the application of ML to corrosion prediction has been motivated by the growing accessibility of atmospheric corrosion datasets and continued advances in computational capability [21,22,23]. ML algorithms can implicitly learn nonlinear, high-dimensional relationships from experimental data without a prescribed functional form, making them well suited to the coupled, multivariate nature of atmospheric corrosion [14,21]. In previous work, Cai et al. [14] demonstrated that a backpropagation ANN trained on multi-site exposure data substantially outperformed conventional regression for zinc corrosion prediction. Kenny et al. [20] developed an ANN for metal corrosion in equatorial climates, while Zulkifli et al. [22] applied multilayer perceptron networks to predict the atmospheric corrosion rate of aluminum alloy, reporting improved accuracy using a Gradient Boosting model with an R² of 0.835. A study by Zhi et al. [24] showed that a Random Forest model outperformed ANN, support vector regression (SVR), and logistic regression for forecasting outdoor atmospheric corrosion rates of low-alloy steels. Although ML studies are becoming increasingly common, there is still a lack of systematic comparative studies of different algorithms specifically for forecasting atmospheric zinc corrosion.

Most existing studies have benchmarked a single algorithm against a classical baseline, leaving unresolved the question of which model class is best suited to the nonlinear, multivariate characteristics of zinc corrosion data. Despite growing interest in this area, several significant gaps therefore remain in the use of ML for corrosion prediction. Many current studies concentrate on a small number of ML models and mostly stress predictive accuracy, while thorough comparisons between algorithms and model interpretability are frequently disregarded. For corrosion engineers, understanding how environmental factors affect corrosion behaviour is just as crucial as making precise predictions. Interpretable ML algorithms are therefore required, both to enhance forecast performance and to offer insights into the relative significance of the environmental factors governing atmospheric zinc corrosion. Furthermore, the interpretability of ML predictions, particularly the extraction of physically meaningful, condition-dependent insights that distinguish the controlling mechanisms at high-severity versus low-severity corrosion extremes, has received limited attention, although it is directly relevant to the design of targeted corrosion prevention strategies for Zn-coated infrastructure in industrial, marine, and urban environments [14,24].

Therefore, the primary aim of the present study is to address this research gap by comparing different ML algorithms. To forecast corrosion behaviour accurately, we propose a machine learning framework that takes into account the environmental factors affecting corrosion. This approach provides an effective and efficient route to predicting corrosion behaviour and opens up new possibilities for materials development, enabling applications at lower cost and with less time and energy consumption.

2. Methodology

2.1. Data Collection

Any ML process depends fundamentally on data to establish the relationship between input and output parameters, so the first step is the collection of the dataset. The size of the dataset strongly influences the efficiency and accuracy of the prediction models, which makes the collection of data from simulation studies, experimental results, and the literature critical. In the present study, a total of 300 data points were collected from different sources, each comprising several descriptors and one output, the corrosion depth [25]. The input parameters are temperature (°C), time of wetness (TOW) (annual fraction), exposure time (years), SO₂ concentration (μg/m³), and chloride deposition rate (mg/m²/day). These parameters were chosen based on their established impact on the corrosion behaviour of zinc. Due to data restrictions, other factors (such as humidity and pH) were not included.

Larger datasets are generally better for machine learning applications, but the availability of large, high-quality datasets is limited because atmospheric corrosion investigations often require long-term exposure experiments and site-specific environmental monitoring. Similar dataset sizes have often been used in corrosion prediction studies in which experimental measurements require extensive exposure times. To make efficient use of the available data and to ensure dependable model development, robust validation techniques such as cross-validation and independent testing were used to assess model generalization performance. Because the data originate from multiple sources, the dataset represents a heterogeneous collection of exposure environments and experimental observations. To ensure consistency for machine learning analysis, the collected data were carefully reviewed and standardized, and all parameters were converted into uniform units before being used for model training and validation. The complete details of the dataset are given in Figure 1.

2.2. Development of ML Algorithms

After preparing the dataset, selecting the ML algorithms is the most important step. In the present study, seven ML algorithms, namely Random Forest (RF), CatBoost, XGBoost, Extra Trees (ET), support vector regressor (SVR), Decision Tree (DT), and k-nearest neighbours (KNN), were chosen based on their strengths and weaknesses in handling complex datasets. These models were chosen to represent both ensemble-based approaches and traditional machine learning techniques, providing a thorough comparison of prediction performance.

To optimize these algorithms, hyperparameter tuning was performed with five-fold cross-validation, which allows the hyperparameters to be adjusted iteratively through repeated training and testing, thereby improving predictive skill. To mitigate the overfitting risk associated with a moderately sized dataset, multiple machine learning algorithms were evaluated and their performance was assessed using cross-validation and independent test datasets. This approach enables a reliable comparison of model robustness and ensures that the learned relationships between environmental variables and zinc corrosion depth are not model-specific.
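The tuning procedure can be illustrated with a minimal sketch. It assumes scikit-learn's GridSearchCV as the search method and uses a synthetic dataset and a hypothetical parameter grid; the study does not report which search strategy or hyperparameter ranges were used, so none of the values below should be read as the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the corrosion dataset (NOT the study's data):
# five columns mimic temperature, TOW, SO2, Cl- and exposure time.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 5))
y = 5.0 * X[:, 4] + 3.0 * X[:, 2] + rng.normal(0.0, 0.1, size=120)

# Hypothetical search grid; the actual ranges are not given in the paper.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # five-fold CV
    scoring="r2",
)
search.fit(X, y)  # trains every grid point on each of the five folds

print("best parameters:", search.best_params_)
print("mean cross-validated R2:", round(search.best_score_, 3))
```

Each candidate setting is scored on held-out folds only, which is what lets the cross-validated score act as a guard against overfitting on a dataset of this size.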

The data were normalized using a min-max scaler before being fed into the models. After the models had been developed, the predicted values were denormalized to return them to the original scale. The collected dataset was divided into three parts in the ratio 80:10:10 for training, testing, and validation, respectively, to ensure robust model evaluation and minimize the risk of overfitting. The training set was used to build and tune the ML models. The testing set was used as an intermediate checkpoint during optimization. After optimization, the validation set was used to evaluate the performance of the finalized models, because this subset had been kept isolated from the optimization process. The whole procedure was conducted on a workstation with an Intel Core i7 processor, 128 GB RAM, and an NVIDIA GeForce RTX 3070 GPU. The analysis was carried out in Python using the “scikit-learn 1.3.1” library. The complete methodology with its individual steps is shown in Figure 2.
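As a rough illustration of the 80:10:10 split and the min-max normalization described above, the following sketch uses scikit-learn with randomly generated placeholder data. Fitting the scalers on the training subset only is an assumption made here to avoid information leakage; the text does not state how the scaler was fitted, and the random seeds are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder data with the same shape as the study's dataset (values are random).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 100.0, size=(300, 5))   # 300 samples, 5 descriptors
y = rng.uniform(0.5, 20.0, size=(300, 1))    # corrosion depth, arbitrary values

# 80 % for training, then the remaining 20 % split in half:
# 10 % testing and 10 % validation (naming follows the paper).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)

# Assumption: scalers are fitted on the training subset only.
x_scaler = MinMaxScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(y_train)
X_train_s = x_scaler.transform(X_train)
X_test_s = x_scaler.transform(X_test)   # may fall slightly outside [0, 1]
y_train_s = y_scaler.transform(y_train)

# Denormalization: map scaled targets (or predictions) back to original units.
y_back = y_scaler.inverse_transform(y_train_s)

print(len(X_train), len(X_test), len(X_val))
```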

2.3. Feature Engineering

Feature engineering is also an important step, in which the most relevant features are selected on the basis of their correlation with the output parameter, so that only significant parameters are included when developing the ML models. Pearson and Spearman correlation analyses were used to examine the relationships between the environmental factors and the target variable (corrosion depth). Pearson correlation evaluates linear relationships, whereas Spearman correlation is rank-based, measures monotonic relationships, and is therefore robust to nonlinear dependencies.
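The difference between the two coefficients can be seen in a small sketch using SciPy. The exposure-time values and the exponential depth relationship are purely hypothetical; they are chosen only because the relationship is strictly monotonic but nonlinear.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: depth grows monotonically but nonlinearly with time.
rng = np.random.default_rng(1)
exposure_time = rng.uniform(0.5, 20.0, size=200)   # years (made-up values)
depth = np.exp(0.3 * exposure_time)                # arbitrary monotonic curve

r_pearson, _ = pearsonr(exposure_time, depth)      # linear association
r_spearman, _ = spearmanr(exposure_time, depth)    # rank-based association

print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Because the ranks of the two variables agree perfectly, the Spearman coefficient reaches 1, while the Pearson coefficient stays below 1 since the points do not lie on a straight line.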

Using the best-performing machine learning model, SHAP (SHapley Additive exPlanations) analysis was carried out to understand the impact of the environmental factors on corrosion behaviour. SHAP dependence plots were generated to quantify the individual and combined effects of the descriptors on the predicted corrosion response. The features considered in the model are temperature, time of wetness, sulfur dioxide concentration, chloride concentration, and exposure time. The SHAP dependence plots show the magnitude and direction of each descriptor's contribution to the predicted corrosion depth.
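For reference, the quantity that SHAP assigns to each descriptor is the Shapley value from cooperative game theory. The notation below is introduced here for illustration and is not taken from the paper: N denotes the set of the five descriptors, f_x(S) the expected model output when only the descriptors in the subset S are known for sample x, and φ_0 the mean prediction over the data.

```latex
\phi_i(x) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
  \,\Bigl[ f_x\bigl(S \cup \{i\}\bigr) - f_x(S) \Bigr],
\qquad
f(x) \;=\; \phi_0 + \sum_{i \in N} \phi_i(x)
```

The second identity (additivity) is why a positive SHAP value can be read directly as an increase in the predicted corrosion depth relative to the average prediction, and a negative value as a decrease.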

2.4. Model Performance Evaluation

Three widely used regression measures were employed to assess the prediction performance of the ML models. RMSE and MAE quantify the size of the prediction errors, while R² quantifies the proportion of variance in the target variable explained by the model. Higher R² values and lower RMSE and MAE values indicate better predictive performance. Model fitting and generalization ability were evaluated across the training, validation, and testing datasets. For reliability, the validation performance, rather than the training or testing performance, was used for model comparison. Computational efficiency was also assessed using two indicators: the time required to train the model and its memory requirements. Together, these two metrics characterize the scalability and computational cost of the ML models. A paired t-test was performed on the prediction errors to examine statistically the differences in predictive performance. The t-test establishes whether a difference in mean prediction errors between two models is statistically significant or the result of random variation. Significance was evaluated using a p-value threshold of 0.05: the performance difference between two models is deemed statistically significant if the p-value is less than 0.05. This test offers an additional level of validation beyond conventional performance indicators, supporting the significance and reliability of the observed differences. The final model selection was based on a comprehensive evaluation that took into account the statistical significance of the performance differences, computational efficiency, and predictive accuracy on the test and validation datasets.
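The three regression measures can be written out explicitly. The sketch below uses their standard definitions with NumPy; the measured and predicted values are made-up numbers chosen so that the result is easy to verify by hand, and the function name is illustrative.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE using their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))          # root mean square error
    mae = float(np.mean(np.abs(residuals)))                 # mean absolute error
    ss_res = float(np.sum(residuals ** 2))                  # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "RMSE": rmse, "MAE": mae}

# Made-up corrosion depths (µm), not taken from the study.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

Here every prediction is off by 0.5 µm, so RMSE and MAE are both 0.5 µm, and R² = 1 − 1/20 = 0.95.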

3. Results and Discussion

3.1. Correlation Analysis

The Pearson and Spearman correlation matrices relating the environmental factors to corrosion depth are shown in Figure 3. They indicate that SO₂ concentration and exposure time have the strongest positive correlations with corrosion depth. In the Pearson analysis (Figure 3a), corrosion depth has correlation coefficients of 0.65 with SO₂ and 0.64 with exposure time, whereas the Spearman analysis (Figure 3b) gives values of 0.56 and 0.70, respectively. This suggests that corrosion depth increases with higher pollutant concentrations and longer exposure times. Chloride concentration (Cl⁻) shows a slightly positive relationship, indicating its role in corrosion processes. In contrast, temperature and TOW show relatively weak correlations, signifying a limited direct influence within the studied dataset.

3.2. SHAP Analysis

3.2.1. Models Interpretation

SHAP analysis was used to determine how the environmental features affect corrosion behaviour. By measuring the influence of each feature, it offers a reliable framework for understanding ML models. The mean absolute SHAP values, which indicate the overall significance of each feature, are shown in Figure 4a. The results show that exposure time is the most significant factor influencing corrosion depth. Prolonged environmental exposure sustains the electrochemical reactions and the accumulation of material degradation, which highlights the cumulative nature of corrosion. SO₂ also contributes significantly, as an environmental contaminant that promotes metal degradation. Chloride ions can contribute to corrosion by weakening protective oxide layers and promoting localized attack such as pitting. A summary of the feature contributions to the predicted corrosion behaviour is shown in Figure 4b. SHAP values are shown on the horizontal axis, with positive values denoting an increase in predicted corrosion depth and negative values a decrease. Figure 4b confirms that exposure time is the most important factor affecting corrosion behaviour. Its high importance reflects the cumulative nature of atmospheric corrosion, in which corrosion damage progressively increases with prolonged environmental exposure. The significant influence of SO₂ concentration is consistent with its well-known role in accelerating atmospheric corrosion through the formation of acidic surface films, which enhance electrochemical reactions on zinc surfaces. Chloride content makes a moderate contribution to the corrosion predictions.

3.2.2. Effect of Each Descriptor on Corrosion Behaviour

SHAP dependence plots were produced to better understand the impact of each descriptor on the predicted corrosion response, as shown in Figure 5a–e. Figure 5a shows that the SHAP values stay around zero or slightly negative at lower temperatures, indicating a minimal impact on corrosion. The higher SHAP values at higher temperatures suggest that faster electrochemical reactions can intensify corrosion. Figure 5b shows that a longer time of wetness generally results in larger SHAP values, indicating that the predicted corrosion depth increases. This finding is in line with the way moisture forms the electrolyte required for corrosion processes. The SHAP contribution becomes noticeably positive as SO₂ levels rise, as shown in Figure 5c, suggesting that corrosion is greatly accelerated by higher pollutant concentrations. A similar pattern can be seen in Figure 5d, where larger SHAP values are associated with higher Cl⁻ concentrations. This reveals that Cl⁻ plays a noteworthy role in corrosion by promoting localized corrosion processes. The SHAP contribution increases sharply with exposure duration, highlighting the cumulative nature of corrosion damage over time, as shown in Figure 5e.

These analyses consistently show that exposure time, SO₂ concentration, and chloride concentration are the main environmental parameters governing the predicted corrosion behaviour. These results are consistent with well-established mechanisms, in which aggressive contaminants and extended exposure to the environment greatly speed up material degradation. The agreement between the machine learning interpretation and established corrosion processes further supports the reliability and physical relevance of the proposed predictive model.

3.3. Model Performance

3.3.1. Training and Testing Performance

Using the training and testing datasets, the predictive ability of the developed ML models was first assessed in terms of both model fitting and generalization capacity. The majority of ensemble-based models demonstrated extremely high prediction accuracy during the training phase. For example, high training R² values were attained by Extra Trees, Decision Tree, and XGBoost, demonstrating their excellent capacity to capture the correlations between the input descriptors and the target attribute. With R² values above 0.98, Random Forest and CatBoost likewise showed excellent training performance. However, because of the possibility of overfitting, assessing model performance only on training data could give misleading results. Consequently, the independent test dataset was used to further investigate the prediction capability.

The testing performance of the models is shown in Figure 6. With an R² of 0.921 and the lowest RMSE (1.726) on the test dataset, Extra Trees had the best prediction performance. CatBoost and XGBoost also showed excellent predictive abilities. Simpler ML algorithms, on the other hand, showed relatively poorer prediction performance: SVR performed comparatively better, while Decision Tree and KNN showed noticeably lower predictive accuracy. The significant discrepancy between training and test performance for Decision Tree indicates clear overfitting, in which the model fits the training data exceptionally well but is unable to generalize to unseen data. In comparison to conventional algorithms, ensemble-based ML models offer improved prediction performance and better generalization capabilities.

3.3.2. Validation Performance

The validation step was carried out using the dataset that had been reserved and not included in the training and testing process. Using this validation dataset allows the performance of the models to be evaluated more accurately and reliably. The RF model produced the best predictive performance, with an R² value of 0.964, as shown in Figure 7. Along with RF, CatBoost (CB) and XGBoost (XGB) also performed strongly, demonstrating the capacity of boosting algorithms to capture nonlinear relationships. Conventional techniques, on the other hand, gave somewhat poorer results, indicating that they had difficulty capturing the intricate patterns in the dataset. These results highlight the benefit of using ensemble learning techniques to estimate corrosion behaviour.

3.3.3. Computational Efficiency of ML Models

The computational efficiency of all the ML models is shown in Figure 8 in terms of training time and model size. Figure 8 shows that XGB had the shortest training time (0.16 s) of all models while retaining strong predictive performance, indicating its computational efficiency for large-scale applications. Although SVR, Decision Tree, and KNN also had relatively short training times, their predictive performance was lower.

On the other hand, ensemble models such as RF and ET produced larger model sizes and longer training times. Despite their excellent predictive accuracy, these models may be less suitable for large datasets or real-time applications because of their higher processing cost. With a reasonable training time (0.89 s) and a relatively small model size (~70 KB), CatBoost showed a balanced trade-off between computational cost and predictive performance. These results show that, while ensemble models in general give superior predictive performance, XGBoost and CatBoost offer better computational efficiency than the other ensemble techniques.
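One possible way to obtain such training-time and model-size figures is sketched below. It assumes wall-clock timing with time.perf_counter and uses the size of the pickled estimator as a proxy for memory requirements; the paper does not describe how either quantity was measured, and the data and the helper name profile_model are illustrative.

```python
import pickle
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor


def profile_model(model, X, y):
    """Return (training time in seconds, pickled model size in KB)."""
    start = time.perf_counter()
    model.fit(X, y)
    train_seconds = time.perf_counter() - start
    size_kb = len(pickle.dumps(model)) / 1024.0   # serialized size as a proxy
    return train_seconds, size_kb


# Synthetic data sized like an 80 % training split of 300 samples.
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(240, 5))
y = 4.0 * X[:, 4] + 2.0 * X[:, 2] + rng.normal(0.0, 0.1, size=240)

results = {
    "Decision Tree": profile_model(DecisionTreeRegressor(random_state=0), X, y),
    "Random Forest": profile_model(
        RandomForestRegressor(n_estimators=100, random_state=0), X, y),
}
for name, (seconds, size_kb) in results.items():
    print(f"{name}: {seconds:.3f} s, {size_kb:.0f} KB")
```

Absolute numbers depend on hardware and library versions, so only the relative ordering between models is meaningful in a comparison of this kind.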

3.3.4. Statistical Significance Analysis

The prediction errors of the assessed algorithms were subjected to a paired t-test in order to statistically validate the differences in predictive performance among the models. The test was performed using the prediction errors obtained on the testing dataset, that is, the differences between the predicted and experimentally measured corrosion depth values. The test assumes approximately normally distributed errors and was used as an additional statistical measure to assess model reliability. The results of the t-test are shown in Table 1. Table 1 shows that there are typically no statistically significant differences (p > 0.05) among the ensemble models (Random Forest, CatBoost, XGBoost, and Extra Trees). This implies that these models offer similar predictive performance for the dataset used in the present study. However, comparing ensemble models with simpler techniques revealed a number of statistically significant differences. For instance, Extra Trees had significantly lower prediction errors than Decision Tree (p = 0.025) and KNN (p = 0.047). Similarly, XGBoost significantly outperformed Decision Tree (p = 0.047), and SVR performed significantly better than KNN (p = 0.040). These results indicate that the better-performing models, mostly ensemble-based, produce more accurate predictions than the simpler Decision Tree and KNN algorithms.
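A paired comparison of this kind can be sketched with SciPy's ttest_rel. The example assumes that absolute prediction errors on the same test samples are compared; whether the study used absolute or signed errors is not stated. The two sets of predictions are simulated and do not represent any of the seven models.

```python
import numpy as np
from scipy.stats import ttest_rel

# Simulated test set: 30 samples, as in a 10 % split of 300 data points.
rng = np.random.default_rng(7)
y_true = rng.uniform(1.0, 20.0, size=30)

# Two hypothetical models: A has small errors, B has larger errors.
pred_a = y_true + rng.normal(0.0, 0.5, size=30)
pred_b = y_true + rng.normal(0.0, 3.0, size=30)

# Assumption: the test is run on absolute errors of the same samples.
err_a = np.abs(y_true - pred_a)
err_b = np.abs(y_true - pred_b)

t_stat, p_value = ttest_rel(err_a, err_b)   # paired (dependent-samples) t-test
significant = p_value < 0.05                # threshold used in the study

print(f"t = {t_stat:.2f}, p = {p_value:.4g}, significant: {significant}")
```

The pairing matters: both models are evaluated on identical samples, so the test works on the per-sample error differences rather than treating the two error sets as independent.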

3.4. Model Selection and Implications

Based on the evaluation of validation performance, computational efficiency, and statistical significance, the ensemble models proved to be the best methods for predicting corrosion behaviour. The RF model showed the highest validation performance, demonstrating a strong capacity for generalization. However, when both prediction performance and computational efficiency are taken into account, CB and XGB offer a good trade-off between accuracy and efficiency. Their competitive prediction performance and very short training times make them especially attractive for real-world applications. Overall, the findings show that, in comparison to conventional ML methods, ensemble learning approaches are very successful at modelling complex materials datasets, offering strong predictive capability and increased reliability.

3.5. Prediction of Corrosion Depth for New Systems

According to the validation data, the RF model showed the best predictive performance and was therefore chosen to estimate the corrosion depth and to evaluate the impact of the different descriptors on Zn-based structures. To further evaluate its predictive capability, five additional environmental conditions from previously published studies were selected as independent validation cases. These conditions were not used during model training and represent new combinations of environmental parameters. The corrosion depth was predicted for each of the five combinations listed in Table 2, and the experimentally reported corrosion depths were compared with the predicted values to examine the generalization performance of the model. The comparison of predicted and actual corrosion depths is shown in Figure 9. The RF model shows excellent predictive performance, achieving an R² of 97.9% and a root mean square error (RMSE) of 0.87 µm, which corresponds to a normalized RMSE (NRMSE) of 11.9%. These results show that this approach can predict corrosion behaviour while saving the time, cost, and energy that experiments require.
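The relationship between RMSE and NRMSE can be made concrete with a short sketch. The paper does not state which quantity was used for normalization, so the code shows the two common conventions (range and mean of the measured values) on made-up numbers; the reported pair of 0.87 µm and 11.9% would imply a normalizing quantity of roughly 0.87/0.119 ≈ 7.3 µm under either convention.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error in the units of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def nrmse_percent(y_true, y_pred, normalizer="range"):
    """RMSE divided by the range or the mean of the measured values, in %."""
    y_true = np.asarray(y_true, dtype=float)
    if normalizer == "range":
        scale = float(y_true.max() - y_true.min())
    elif normalizer == "mean":
        scale = float(y_true.mean())
    else:
        raise ValueError("normalizer must be 'range' or 'mean'")
    return 100.0 * rmse(y_true, y_pred) / scale

# Made-up corrosion depths (µm) for five conditions, not the values in Table 2.
measured = [2.0, 6.0, 8.0, 10.0, 12.0]
predicted = [2.5, 5.5, 8.5, 9.5, 12.5]

print(f"RMSE = {rmse(measured, predicted):.2f} µm")
print(f"NRMSE (range) = {nrmse_percent(measured, predicted, 'range'):.2f} %")
print(f"NRMSE (mean)  = {nrmse_percent(measured, predicted, 'mean'):.2f} %")
```

Because the two conventions give different percentages for the same RMSE, an NRMSE value is only comparable across studies when the normalizer is reported alongside it.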

4. Conclusions

This study highlights the effectiveness of ML, a branch of artificial intelligence, in accelerating the prediction of corrosion behaviour from environmental factors. Seven ML algorithms were employed to predict corrosion depth from these factors. Among these models, Random Forest exhibited the highest prediction performance (R² = 96.4%, RMSE = 0.642 µm). The prediction of corrosion depth for five new environmental conditions using the top-performing RF model also showed very good performance (R² = 97.9%, RMSE = 0.87 µm, NRMSE = 11.9%). The RF model provides insights into Zn corrosion mechanisms by accurately capturing the impact of important input factors, such as exposure time and SO₂ concentration. According to the SHAP analysis, exposure time is the most important factor influencing corrosion depth. These results improve our knowledge of the dynamics of zinc corrosion and provide useful direction for corrosion prevention. The proposed interpretable machine learning framework offers a reliable tool for predicting atmospheric zinc corrosion depth and provides valuable insights into the influence of environmental factors, which can support corrosion engineers, material designers, and infrastructure planners in assessing corrosion risks and optimizing protective strategies for metallic infrastructure. It should be mentioned that the availability of experimentally recorded atmospheric corrosion data limits the size of the dataset. The applicability of the ML model can be further improved by future studies, which might examine the interactions between the environmental parameters affecting zinc corrosion behaviour under different atmospheric conditions. To further improve the predictive power and generalization of the proposed framework, future research may also include more long-term exposure datasets and environmental monitoring records.

Author Contributions

Conceptualization, S.J. and S.T.; Methodology, S.J. and S.T.; Software, S.J.; Validation, S.J.; Formal analysis and investigation, S.J., S.T., R.S.M., R.J. and S.K.D.; Resources, S.J. and S.T.; Data curation, S.J. and S.T.; Writing – original draft preparation, S.J. and S.T.; Writing – review and editing, S.J., S.T., R.S.M., R.J. and S.K.D.; Visualization, S.J.; Supervision, S.J.; Project administration, S.J., R.J. and S.K.D.; Funding acquisition, S.J., R.J. and S.K.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors declare that this research received no external funding and was conducted without support from any funding agencies or external organizations.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Liu, Z.; Wang, F.; Qiu, D.; Taylor, J.A.; Zhang, M. The Effect of Solute Elements on the Grain Refinement of Cast Zn. Metall. Mater. Trans. A 2013, 44, 4025–4030.
Liu, Z.; Qiu, D.; Wang, F.; Taylor, J.A.; Zhang, M. The Grain Refining Mechanism of Cast Zinc through Silver Inoculation. Acta Mater. 2014, 79, 315–326.
Yuan, W.; Xia, D.; Wu, S.; Zheng, Y.; Guan, Z.; Rau, J.V. A Review on Current Research Status of the Surface Modification of Zn-Based Biodegradable Metals. Bioact. Mater. 2022, 7, 192–216.
Rajaei, M.; Elahi, S.H.; Asefi, A. Modal Properties of Closed-Cell Zinc Foam. Structures 2020, 27, 1380–1383.
Jain, S. Phase Equilibria Study and Mechanical Properties of Multicomponent Alloys. Ph.D. Thesis, Indian Institute of Technology Indore, Indore, India, 2023. Available online: http://hdl.handle.net/10603/544595 (accessed on 6 April 2026).
de la Fuente, D.; Castaño, J.G.; Morcillo, M. Long-Term Atmospheric Corrosion of Zinc. Corros. Sci. 2007, 49, 1420–1436.
Marder, A.R. The Metallurgy of Zinc-Coated Steel. Prog. Mater. Sci. 2000, 45, 191–271.
Kong, L.; Heydari, Z.; Lami, G.H.; Saberi, A.; Baltatu, M.S.; Vizureanu, P. A Comprehensive Review of the Current Research Status of Biodegradable Zinc Alloys and Composites for Biomedical Applications. Materials 2023, 16, 4797.
Liu, L.; Meng, Y.; Volinsky, A.A.; Zhang, H.-J.; Wang, L.-N. Influences of Albumin on in Vitro Corrosion of Pure Zn in Artificial Plasma. Corros. Sci. 2019, 153, 341–356.
Salgueiro Azevedo, M.; Allély, C.; Ogle, K.; Volovitch, P. Corrosion Mechanisms of Zn(Mg, Al) Coated Steel in Accelerated Tests and Natural Exposure: 1. The Role of Electrolyte Composition in the Nature of Corrosion Products and Relative Corrosion Rate. Corros. Sci. 2015, 90, 472–481.
Wang, X.; Liu, B.; Li, M.; Liu, Y.; Yu, X.; Sun, X.; Shi, L. Comparative Study of Zinc Base Alloy Coatings: Composition, Microstructure, and Corrosion Resistance. Ferroelectrics 2024, 618, 1655–1665.
Thierry, D.; Persson, D.; LeBozec, N. Long-Term Atmospheric Corrosion Rates of Zn55Al-Coated Steel. Mater. Corros. 2024, 75, 694–704.
Maniam, K.K.; Paul, S. Corrosion Performance of Electrodeposited Zinc and Zinc-Alloy Coatings in Marine Environment. Corros. Mater. Degrad. 2021, 2, 163–189.
Cai, J.; Cottis, R.A.; Lyon, S.B. Phenomenological Modelling of Atmospheric Corrosion Using an Artificial Neural Network. Corros. Sci. 1999, 41, 2001–2030.
Feliu, S.; Morcillo, M.; Feliu, S. The Prediction of Atmospheric Corrosion from Meteorological and Pollution Parameters—II. Long-Term Forecasts. Corros. Sci. 1993, 34, 415–422.
Weibel, D.; Jovanovic, Z.R.; Gálvez, E.; Steinfeld, A. Mechanism of Zn Particle Oxidation by H₂O and CO₂ in the Presence of ZnO. Chem. Mater. 2014, 26, 6486–6495.
Mikhailov, A.A.; Tidblad, J.; Kucera, V. The Classification System of ISO 9223 Standard and the Dose–Response Functions Assessing the Corrosivity of Outdoor Atmospheres. Prot. Met. 2004, 40, 541–550.
Jain, S.; Bhowmik, A.; Lee, J. Machine Learning Approaches for Predicting and Validating Mechanical Properties of Mg Rare Earth Alloys for Light Weight Applications. Sci. Technol. Adv. Mater. 2025, 26, 2449811.
Moses, A.; Chen, D.; Wan, P.; Wang, S. Prediction of Electrochemical Corrosion Behavior of Magnesium Alloy Using Machine Learning Methods. Mater. Today Commun. 2023, 37, 107285.
Jain, S.; Wagri, N.K.; Bhowmik, A.; Park, N. Machine Learning Approaches for Predicting Mechanical Performance and Reducing Experimentation in Refractory High-Entropy Alloys. Adv. Eng. Mater. 2025, 27, 2403052.
Kenny, E.D.; Paredes, R.S.C.; de Lacerda, L.A.; Sica, Y.C.; de Souza, G.P.; Lázaris, J. Artificial Neural Network Corrosion Modeling for Metals in an Equatorial Climate. Corros. Sci. 2009, 51, 2266–2278.
Zulkifli, F.; Abdullah, S.; Suriani, M.J.; Kamaludin, M.I.A.; Wan Nik, W.B. Multilayer Perceptron Model for the Prediction of Corrosion Rate of Aluminium Alloy 5083 in Seawater via Different Training Algorithms. IOP Conf. Ser. Earth Environ. Sci. 2021, 646, 012058.
Jain, S.; Wagri, N.K.; Arya, M.; Bhowmik, A.; Park, N. Predicting the Magnetic Behaviour of Homogenized CoCrFeNiAlx High Entropy Alloys at Different Aluminium Content and Temperatures: Reducing Experimental Dependency through Machine Learning Approaches. Mater. Chem. Phys. 2025, 346, 131386.
Zhi, Y.; Fu, D.; Zhang, D.; Yang, T.; Li, X. Prediction and Knowledge Mining of Outdoor Atmospheric Corrosion Rates of Low Alloy Steels Based on the Random Forests Approach. Metals 2019, 9, 383.
Maurya, A.K.; Tiwari, S.; Bhavani, A.G.; Park, N.; Reddy, N.S. ANN-Based Modeling of Atmospheric Zinc Corrosion Rates Using Meteorological and Pollutant Data. Coatings 2025, 15, 538.

Figure 1. Input and output parameters for the dataset used in the present study.

Figure 2. Complete methodology with the different steps used in the present study.

Figure 3. Correlation matrices using (a) Pearson and (b) Spearman analysis.

Figure 4. SHAP analysis. (a) Mean SHAP value. (b) SHAP summary plot.

Figure 5. SHAP dependence plot. (a) Temperature. (b) TOW. (c) SO₂. (d) Cl⁻. (e) Exposure time.

Figure 6. Testing performance of (a) RF, (b) CB, (c) XGB, (d) ET, (e) SVR, (f) DT, and (g) KNN ML models.

Figure 7. Validation performance of (a) RF, (b) CB, (c) XGB, (d) ET, (e) SVR, (f) DT, and (g) KNN ML models.

Figure 8. Performance of ML models based on computational efficiency.

Figure 9. New prediction and validation performance of the RF model for five different combinations of environmental factors.

Table 1. Pairwise statistical significance analysis of the ML models used.

Model 1	Model 2	T Statistic Value	p-Value
RF	CB	0.994191	0.328082
RF	XGB	0.808388	0.425232
RF	ET	2.162704	0.038662
RF	SVR	0.835294	0.410156
RF	DT	−1.74094	0.091942
RF	KNN	−1.65331	0.108696
CB	XGB	−0.01283	0.989848
CB	ET	1.346071	0.188364
CB	SVR	0.316949	0.753478
CB	DT	−1.8009	0.08178
CB	KNN	−1.59368	0.12149
XGB	ET	0.776232	0.443689
XGB	SVR	0.330035	0.743667
XGB	DT	−2.06355	0.047805
XGB	KNN	−1.64388	0.110642
ET	SVR	−0.13663	0.892232
ET	DT	−2.34239	0.025988
ET	KNN	−2.06551	0.047607
SVR	DT	−1.58453	0.123558
SVR	KNN	−2.14332	0.040316
DT	KNN	−0.27146	0.787897
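
As a reading aid for Table 1, the short sketch below lists the model pairs whose p-value falls under a significance threshold. The rows are transcribed from the table; the threshold of 0.05 and the names `TABLE_1`, `ALPHA`, and `significant_pairs` are assumptions made for illustration and are not taken from the study.

```python
# Pairwise comparisons transcribed from Table 1:
# (model 1, model 2, t statistic, p-value).
TABLE_1 = [
    ("RF", "CB", 0.994191, 0.328082),
    ("RF", "XGB", 0.808388, 0.425232),
    ("RF", "ET", 2.162704, 0.038662),
    ("RF", "SVR", 0.835294, 0.410156),
    ("RF", "DT", -1.74094, 0.091942),
    ("RF", "KNN", -1.65331, 0.108696),
    ("CB", "XGB", -0.01283, 0.989848),
    ("CB", "ET", 1.346071, 0.188364),
    ("CB", "SVR", 0.316949, 0.753478),
    ("CB", "DT", -1.8009, 0.08178),
    ("CB", "KNN", -1.59368, 0.12149),
    ("XGB", "ET", 0.776232, 0.443689),
    ("XGB", "SVR", 0.330035, 0.743667),
    ("XGB", "DT", -2.06355, 0.047805),
    ("XGB", "KNN", -1.64388, 0.110642),
    ("ET", "SVR", -0.13663, 0.892232),
    ("ET", "DT", -2.34239, 0.025988),
    ("ET", "KNN", -2.06551, 0.047607),
    ("SVR", "DT", -1.58453, 0.123558),
    ("SVR", "KNN", -2.14332, 0.040316),
    ("DT", "KNN", -0.27146, 0.787897),
]

ALPHA = 0.05  # assumed significance threshold; the table does not state one


def significant_pairs(rows, alpha=ALPHA):
    """Return (model 1, model 2, p-value) for rows with p-value below alpha."""
    return [(m1, m2, p) for m1, m2, _t, p in rows if p < alpha]


for m1, m2, p in significant_pairs(TABLE_1):
    print(f"{m1} vs {m2}: p = {p:.6f}")
```

Under this assumed threshold, five of the 21 pairs are flagged (RF vs ET, XGB vs DT, ET vs DT, ET vs KNN, and SVR vs KNN). No correction for multiple comparisons is applied in this sketch.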

Table 2. New prediction and validation of corrosion depth for five different combinations of environmental factors.

S.No.	Temperature (°C)	TOW (Annual Fraction)	SO₂	Cl⁻	Exposure Time (Years)	Exp. Corrosion Depth (µm)	Predicted Corrosion Depth (µm)
1	13.34	0.24	55	0	10	10	9.12
2	9.4	0.69	24	171	1	5.1	6.6
3	7.28	0.46	4	17	1	1.1	1.467
4	4.85	0.37	3	2	4	2.6	2.13
5	12.43	0.58	125	125	3	17.9	17.25
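
The agreement between the experimental and predicted depths in Table 2 can be checked with a few lines of arithmetic. The sketch below recomputes RMSE and R² from the five rows, assuming the usual definitions (root of the mean squared error, and one minus the ratio of residual to total sum of squares); the function and variable names are illustrative and are not the authors' code.

```python
import math

# Experimental and predicted corrosion depths (µm) transcribed from Table 2.
experimental = [10.0, 5.1, 1.1, 2.6, 17.9]
predicted = [9.12, 6.6, 1.467, 2.13, 17.25]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(f"RMSE = {rmse(experimental, predicted):.2f} µm")  # RMSE = 0.87 µm
print(f"R² = {r_squared(experimental, predicted):.3f}")  # R² = 0.979
```

These values agree with the R² of 97.9% and RMSE of 0.87 µm reported in the abstract for the five new environmental systems.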

