Machine Learning-Based Prediction of Atmospheric Corrosion Rates Using Environmental and Material Parameters

Tiwari, Saurabh; Dash, Khushbu; Park, Nokeun; Reddy, Nagireddy Gari Subba

doi:10.3390/coatings15080888

¹ School of Materials Science and Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea

² Department of Chemistry, School of Engineering, Amrita Vishwa Vidyapeetham, Chennai Campus, Chennai 600113, India

³ Institute of Materials Technology, Yeungnam University, Gyeongsan 38541, Republic of Korea

⁴ Virtual Materials Laboratory, School of Materials Science and Engineering, Engineering Research Institute, Gyeongsang National University, Jinju 52828, Republic of Korea

* Authors to whom correspondence should be addressed.

Coatings 2025, 15(8), 888; https://doi.org/10.3390/coatings15080888

Submission received: 4 July 2025 / Revised: 28 July 2025 / Accepted: 28 July 2025 / Published: 31 July 2025

(This article belongs to the Special Issue Advanced Anticorrosion Coatings and Coating Testing)

Abstract

Atmospheric corrosion significantly impacts infrastructure worldwide, and traditional assessment methods are time-intensive and costly. This study developed a comprehensive machine learning framework for predicting atmospheric corrosion rates using environmental and material parameters. Three regression models (Linear Regression, Random Forest, and Gradient Boosting) were trained on a scientifically informed synthetic dataset incorporating established corrosion principles from ISO 9223 standards and peer-reviewed literature. The Gradient Boosting model achieved the best performance with cross-validated R² = 0.835 ± 0.024 and RMSE = 98.99 ± 16.62 μm/year, significantly outperforming Random Forest (p < 0.001) and also exceeding Linear Regression, although the latter difference did not reach significance (p = 0.068). Feature importance analysis revealed the copper content (30%), exposure time (20%), and chloride deposition (15%) as primary predictors, consistent with the established principles of corrosion science. Model diagnostics demonstrated excellent predictive accuracy (test R² = 0.863) with normally distributed residuals and homoscedastic variance patterns. This methodology provides a systematic framework for ML-based corrosion prediction, with significant implications for protective coating design, material selection, and infrastructure risk assessment, pending comprehensive experimental validation.

Keywords:

atmospheric corrosion; machine learning; gradient boosting; corrosion prediction; environmental degradation; data-driven modeling; protective coatings

1. Introduction

Atmospheric corrosion is one of the most economically significant forms of material degradation worldwide, with annual losses estimated at 3%–4% of global GDP, equivalent to approximately 2.5 trillion USD [1,2]. This phenomenon occurs when metallic materials are exposed to atmospheric environments containing moisture, oxygen, and various pollutants, leading to electrochemical reactions that deteriorate structural integrity over time. The complexity of atmospheric corrosion arises from the interplay between environmental factors (relative humidity, temperature, time of wetness, and atmospheric pollutants such as SO₂, chlorides, and NO_x) and material properties, including chemical composition, microstructure, and surface conditions [3,4,5,6]. Traditional corrosion assessment methods are scientifically rigorous but constrained by several limitations, including extended exposure periods (often requiring years to decades for meaningful data), high costs associated with long-term field testing, limited ability to replicate diverse environmental conditions, and challenges in scaling results across different geographical locations [3,7,8].

Recent advances in computational materials science have opened new avenues for addressing these challenges using machine-learning approaches. The application of ML techniques to corrosion prediction has gained significant momentum owing to their ability to capture complex nonlinear relationships among multiple variables, process large datasets efficiently, and provide rapid predictions for material screening applications [8,9,10,11]. Several pioneering studies have demonstrated the potential of ML for corrosion science. Zhi et al. [9] successfully applied Random Forest algorithms to predict outdoor atmospheric corrosion rates of low-alloy steels, achieving R² values of 0.89 for carbon steel predictions. Their work highlighted the importance of environmental factors, such as temperature and humidity, in governing corrosion behavior. Similarly, Wen et al. [12] utilized Support Vector Regression to predict the corrosion rates of 3C steel in seawater environments, demonstrating the effectiveness of kernel-based methods for complex corrosion systems. The work by Pei et al. [12] represents a significant advancement in ML-based corrosion prediction, in which they developed neural network models to predict the atmospheric corrosion of Fe/Cu sensors. Their study achieved impressive predictive accuracy (R² = 0.92) and provided valuable insights into the relative importance of the environmental parameters. These findings align with the broader trend in materials informatics, where data-driven approaches increasingly complement traditional experimental methods [13,14].

Despite these promising developments, the field faces a critical challenge: the scarcity of comprehensive, standardized datasets that encompass diverse environmental conditions and material compositions. Real-world corrosion datasets typically require extensive resources for data collection, standardized measurement protocols across different laboratories, long-term monitoring infrastructure, and systematic variation in both environmental and material parameters [15]. This scarcity of data has led to the exploration of synthetic data generation as a viable approach for ML methodology development. Synthetic data offer several advantages, including controlled parameter variation, reduced experimental costs, accelerated algorithm development, and the ability to generate datasets with desired statistical properties [16,17]. However, the success of synthetic data approaches depends critically on their foundation in established scientific principles and their validation against experimental observations.

This study addressed these challenges by developing a comprehensive ML framework for atmospheric corrosion prediction using a scientifically informed synthetic dataset. Our approach builds on the established corrosion science principles documented in international standards (ISO 9223) [18] and peer-reviewed literature to create a realistic dataset for algorithm development and validation. The specific objectives of this research were to (1) develop a robust synthetic dataset generation methodology based on established atmospheric corrosion relationships, (2) systematically compare the performance of different ML algorithms for corrosion prediction, (3) identify key environmental and material parameters governing corrosion behavior through feature importance analysis, and (4) establish a comprehensive validation framework for ML-based corrosion prediction models.

2. Materials and Methods

2.1. Synthetic Dataset Generation

The synthetic dataset generation approach implemented in this study is grounded in the well-established atmospheric corrosion relationships documented in international standards and peer-reviewed literature. The methodology follows the dose–response relationship framework established by Kucera and Mattsson [19,20] and refined through decades of atmospheric corrosion research under the ISOCORRAG program (Tidblad et al.) [20]. The core mathematical framework includes several key corrosion mechanisms. The temperature effects follow Arrhenius-type relationships, which are consistent with the thermodynamic basis of electrochemical corrosion processes (Melchers) [21]. The temperature correction factor is implemented as temp_effect = exp((T − 20) × 0.05), where T is the temperature in °C and the reference temperature is 20 °C. This formulation aligns with the temperature dependencies observed in long-term atmospheric exposure studies [22].
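The temperature correction factor quoted above can be evaluated directly. The following is a minimal sketch, not the authors' generator code; the function name and default arguments are illustrative choices, and only the formula exp((T − 20) × 0.05) comes from the text.

```python
import math


def temp_effect(temp_c: float, ref_c: float = 20.0, coeff: float = 0.05) -> float:
    """Temperature correction factor exp((T - 20) * 0.05) as quoted in the text.

    The function name and keyword defaults are illustrative, not from the paper.
    """
    return math.exp((temp_c - ref_c) * coeff)


# The factor equals 1 at the 20 °C reference and grows by exp(0.5) per +10 °C.
for temp in (0.0, 20.0, 30.0):
    print(f"T = {temp:5.1f} °C -> factor {temp_effect(temp):.3f}")
```

With the stated coefficient, each 10 °C increase multiplies the factor by exp(0.5), roughly 1.65.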

Humidity effects incorporate the critical threshold behavior that is well-documented in the atmospheric corrosion literature. Below the critical relative humidity (typically 60%–80% depending on pollutant levels), the corrosion rates remain minimal owing to insufficient surface moisture for sustained electrochemical activity [23]. Above this threshold, the corrosion rates increase exponentially with humidity, reflecting the formation of continuous electrolyte films on the metal surfaces. Time-of-wetness calculations follow the ISO 9223 guidelines, incorporating the synergistic effects of temperature and humidity in determining the duration of corrosive conditions. This parameter is particularly crucial for atmospheric corrosion prediction because it is directly related to the time available for electrochemical reactions [4].

The synthetic dataset incorporates well-established dose–response relationships for key atmospheric pollutants. Sulfur dioxide effects follow the power-law relationships documented in multi-site exposure programs: SO₂_effect = (SO₂_concentration)^0.3. This formulation reflects the nonlinear response of corrosion rates to SO₂ concentrations, where initial increases in pollution levels have pronounced effects that gradually diminish at higher concentrations [19].

The chloride deposition effects are implemented using exponential relationships that capture the aggressive nature of marine and coastal environments: chloride_effect = exp(chloride_deposition × 0.01). This approach is consistent with the findings of Alcántara et al. [3], who demonstrated the profound impact of airborne chloride deposits on atmospheric corrosion rates.
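The contrast between the two pollutant responses (a saturating power law for SO₂ and an accelerating exponential for chlorides) can be made concrete with a short sketch. The function names are illustrative, and the units of both inputs are assumed to be whatever the dataset uses, since the formulas in the text do not state them.

```python
import math


def so2_effect(so2_concentration: float) -> float:
    """Power-law SO2 factor: (SO2 concentration) ** 0.3, as quoted in the text."""
    return so2_concentration ** 0.3


def chloride_effect(chloride_deposition: float) -> float:
    """Exponential chloride factor: exp(chloride deposition * 0.01), as quoted."""
    return math.exp(chloride_deposition * 0.01)


# Doubling SO2 always multiplies the factor by 2 ** 0.3 (diminishing returns),
# whereas each additional 100 units of chloride multiplies the factor by e.
print(f"SO2 10 -> 20:      x{so2_effect(20.0) / so2_effect(10.0):.3f}")
print(f"chloride 0 -> 100: x{chloride_effect(100.0) / chloride_effect(0.0):.3f}")
```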

The synthetic dataset incorporates the well-documented effects of alloying elements on atmospheric corrosion resistance. The copper content effects are implemented through exponential protection factors: Cu_protection = exp(−Cu_content × 2). This formulation reflects the established protective effect of copper in atmospheric environments, where copper-bearing steels exhibit significantly reduced corrosion rates compared with plain carbon steels (Cano et al., 2013 [4]; de la Fuente et al., 2011 [5]). The effects of Cr and Ni are incorporated through synergistic protection mechanisms that account for the formation of protective oxide layers. These relationships are based on the extensive literature on weathering steels and low-alloy atmospheric corrosion-resistant steels [4].
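As a quick check of what the copper protection factor implies, the sketch below evaluates it at a few compositions. It assumes Cu_content is expressed in wt%, which the formula itself does not state; the function name is illustrative.

```python
import math


def cu_protection(cu_content: float) -> float:
    """Copper protection factor exp(-Cu_content * 2), as quoted in the text.

    Assumption: cu_content is in wt%; the source formula does not give units.
    """
    return math.exp(-cu_content * 2)


# Fractional reduction relative to a copper-free steel (factor = 1).
for cu in (0.0, 0.2, 0.5):
    reduction = 1.0 - cu_protection(cu)
    print(f"Cu = {cu:.1f} -> factor {cu_protection(cu):.3f}, reduction {reduction:.1%}")
```

Under the wt% assumption, 0.2 and 0.5 correspond to reductions of about 33% and 63% in this factor alone, before any other effect is applied.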

2.2. Dataset Characteristics and Statistical Analysis

The synthetic dataset comprised 500 samples representing diverse atmospheric corrosion scenarios across different climatic zones and material compositions (Supplementary Data). The dataset structure incorporates 15 input features categorized into environmental parameters (temperature, humidity, time of wetness, precipitation, SO₂, and chlorides), material composition variables (C, Si, Mn, P, S, Cu, Ni, and Cr), and exposure conditions (time). The statistical characteristics of the dataset (Table 1) demonstrated the appropriate variability for ML model development. Environmental parameters span ranges typical of diverse climatic conditions, from temperate inland environments to aggressive marine coastal zones. The material composition ranges from plain carbon steels to low-alloy atmospheric corrosion-resistant steels, providing a comprehensive coverage of commercially relevant materials. The scope of this study was limited to carbon and low-alloy steels, which are widely used in infrastructure and weathering applications. Although the current dataset reflects this focus, the methodology is designed to be generalizable and will be extended in future work to include other material systems, such as stainless steels and aluminum alloys.

The target variable (corrosion rate) exhibits a wide range from 29.81 to 1906.13 μm/year, reflecting the broad spectrum of atmospheric corrosion severity from benign rural environments to highly aggressive industrial/marine locations. This range is consistent with long-term atmospheric exposure data reported in international programs such as ISOCORRAG and MICAT [20,24,25,26]. The relatively high standard deviations observed for some environmental parameters (e.g., precipitation, SO₂, and chloride) and the corrosion rate itself are intentional and stem from the synthetic dataset design. The goal was to simulate real-world environmental diversity by incorporating the variability across climate zones, pollution loads, and exposure times. Such diversity is essential for developing ML models that generalize across conditions, rather than overfitting to narrow data bands.

2.3. Machine Learning Pipeline and Model Development

2.3.1. Data Preprocessing and Feature Engineering

Input features were standardized using z-score normalization to ensure equal contributions to model training, addressing the different scales of the environmental and compositional parameters. This preprocessing step is crucial for algorithms sensitive to feature scaling, particularly gradient-based optimization methods. The dataset was randomly partitioned into the training (70%, 350 samples), validation (15%, 75 samples), and testing (15%, 75 samples) subsets. This allocation provides sufficient data for model training while preserving adequate samples for an unbiased performance evaluation.
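The preprocessing described above can be sketched with NumPy alone. The data here are random placeholders with the stated shape (500 samples, 15 features). Computing the z-score statistics from the training subset only is an assumption of this sketch, since the text does not say which samples were used to fit the scaler.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder data with the shape described in the text: 500 samples, 15 features.
X = rng.normal(loc=5.0, scale=2.0, size=(500, 15))
y = rng.normal(size=500)

# Random 70/15/15 partition: 350 training, 75 validation, 75 test samples.
indices = rng.permutation(len(X))
train_idx = indices[:350]
val_idx = indices[350:425]
test_idx = indices[425:]

# Assumption: z-score statistics are computed from the training subset only.
mean = X[train_idx].mean(axis=0)
std = X[train_idx].std(axis=0)


def standardize(features: np.ndarray) -> np.ndarray:
    """Apply z-score normalization using the training-set mean and std."""
    return (features - mean) / std


X_train = standardize(X[train_idx])
X_val = standardize(X[val_idx])
X_test = standardize(X[test_idx])
y_train, y_val, y_test = y[train_idx], y[val_idx], y[test_idx]

print(X_train.shape, X_val.shape, X_test.shape)
```

Fitting the scaler on the training subset keeps validation and test statistics out of the model inputs, which is the usual way to avoid leakage.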

2.3.2. Model Selection and Implementation

Three distinct ML algorithms were selected to represent the different modeling approaches.

Linear Regression served as a baseline model, providing insights into the linear relationships between the features and corrosion rates. Although atmospheric corrosion involves complex nonlinear processes, linear models can capture first-order effects and provide interpretable coefficients for feature importance assessments.

Random Forest regressors represent ensemble tree-based methods that can capture nonlinear relationships and feature interactions. This algorithm constructs multiple decision trees using bootstrap sampling and feature randomization and provides robust predictions by averaging the outputs of the individual trees [25]. Random Forest has shown particular success in materials science applications, owing to its ability to handle mixed data types and provide feature importance rankings.

The Gradient Boosting Regressor implements sequential ensemble learning in which the models are trained iteratively to correct errors from previous iterations. This approach often achieves superior performance in regression tasks by focusing on samples that are difficult to predict [26]. Gradient boosting has demonstrated excellent performance in various scientific prediction tasks, including material property prediction.
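A minimal sketch of how the three model families could be set up with scikit-learn follows. The text does not report the library or the hyperparameters used, so the estimators below rely on library defaults with a fixed seed, and the data are a small nonlinear toy problem, not the corrosion dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy nonlinear regression problem standing in for the corrosion data.
X = rng.normal(size=(200, 5))
y = np.exp(0.5 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Assumption: default hyperparameters; the paper does not list its settings.
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# score() returns the coefficient of determination R² on the held-out data.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, r2 in scores.items():
    print(f"{name:18s} test R² = {r2:.3f}")
```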

2.3.3. Model Validation and Statistical Analysis

A five-fold cross-validation was employed to assess the stability and generalization capability of the model. This approach provides robust performance estimates by training and evaluating models on different data subsets, thereby reducing the impact of random data partitioning on performance metrics. The statistical significance of the performance differences was evaluated using paired t-tests on cross-validated R² values to provide a quantitative assessment of model superiority. This statistical approach is essential for establishing confidence in model selection decisions.
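The validation procedure can be sketched as follows, assuming scikit-learn and SciPy. Both models are scored on the same five folds so that the per-fold R² values are paired. The data are a toy problem, so the printed values are not those of the paper.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.exp(0.5 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# A fixed random_state makes both models see identical fold assignments.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

gb_r2 = cross_val_score(
    GradientBoostingRegressor(random_state=0), X, y, cv=cv, scoring="r2"
)
rf_r2 = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=0), X, y, cv=cv, scoring="r2"
)

# Paired t-test on the five per-fold R² values.
t_stat, p_value = ttest_rel(gb_r2, rf_r2)

print(f"GB R² = {gb_r2.mean():.3f} ± {gb_r2.std():.3f}")
print(f"RF R² = {rf_r2.mean():.3f} ± {rf_r2.std():.3f}")
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```

Because cross-validation folds share training data, the per-fold scores are not fully independent, so a p-value from this test is best read as approximate.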

2.3.4. Performance Metrics and Evaluation Framework

The model performance was assessed using multiple complementary metrics.

R² (Coefficient of Determination): Measures the proportion of variance in corrosion rates explained by the model.
RMSE (Root Mean Square Error): Provides error magnitude in original units (μm/year).
MAE (Mean Absolute Error): Offers a robust error assessment that is less sensitive to outliers.

Additionally, comprehensive diagnostic analyses, including residual distribution analysis, heteroscedasticity assessment, and feature correlation analysis, were performed to ensure model validity and identify potential issues.
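The three metrics listed above follow directly from their definitions. The sketch below computes them with NumPy on a small made-up example; the four values are illustrative and not taken from the dataset.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Return R², RMSE, and MAE for paired true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                   # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
    }


# Illustrative corrosion rates in μm/year (made-up numbers).
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 320, 380])
print(metrics)  # R² = 0.98, RMSE ≈ 15.81, MAE = 15.0
```

RMSE exceeds MAE here because squaring weights the two larger errors more heavily, which is why the two metrics are reported together.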

3. Results and Discussion

3.1. Model Performance and Diagnostics

A comparative analysis of the three ML algorithms revealed significant differences in their ability to predict atmospheric corrosion rates (Table 2). Gradient Boosting achieved the highest predictive accuracy with a test R² of 0.863 and RMSE of 106.60 μm/year, substantially outperforming both Random Forest (R² = 0.777, RMSE = 135.85 μm/year) and Linear Regression (R² = 0.777, RMSE = 136.03 μm/year).

The superior performance of Gradient Boosting aligns with recent findings in materials science applications, where sequential ensemble methods have demonstrated exceptional capabilities in capturing complex nonlinear relationships. This result is consistent with the work of Pei et al. (2020) [12], who reported that boosting algorithms achieved superior performance compared to single-model approaches in atmospheric corrosion prediction. The sequential learning nature of gradient boosting allows the algorithm to focus on difficult-to-predict samples, which is particularly valuable for corrosion prediction where extreme conditions often govern long-term performance. Cross-validation results confirmed the robustness of these findings, with Gradient Boosting maintaining the highest cross-validated R² (0.835 ± 0.024) and lowest RMSE (99 ± 17 μm/year). The relatively small standard deviations indicate good model stability across different data subsets, suggesting that the performance advantage of Gradient Boosting is consistent rather than dependent on specific training/validation splits.

The comparable performances of Random Forest and linear regression (both achieving R² ≈ 0.777) are somewhat surprising given the expected nonlinear nature of atmospheric corrosion processes. This finding may reflect the dominant influence of a few key variables (particularly copper content and exposure time) that exhibit relatively linear relationships with corrosion rates in the synthetic dataset. However, the superior performance of Gradient Boosting suggests that capturing subtle nonlinear interactions and complex feature relationships provides a significant predictive advantage. Statistical analysis using paired t-tests revealed that Gradient Boosting significantly outperformed Random Forest (p < 0.001), establishing statistical confidence in the performance difference. The comparison with Linear Regression showed a trend toward significance (p = 0.068), which may reach statistical significance with larger sample sizes or additional cross-validation folds.

The diagnostic analysis of the Gradient Boosting model (Figure 1, Figure 2 and Figure 3) demonstrates excellent model performance and appropriate assumptions. The predicted and actual corrosion rates for the gradient-boosting model are shown in Figure 1. A strong linear correlation (R² = 0.863) and tight clustering of data points around the diagonal line indicate a high predictive accuracy and minimal systematic bias across the corrosion range. The slight scatter at higher corrosion rates may reflect the inherent challenges in predicting extreme corrosion conditions, which are often governed by the complex interactions between multiple factors. The residual distribution (Figure 2) exhibits an approximately normal distribution centered around zero, confirming the appropriate model assumptions for statistical inference. The absence of significant skewness or multimodality indicates that the model captures the underlying data structure without systematic bias. The residuals vs. predicted values plot (Figure 3) shows random scatter around the horizontal reference line, indicating homoscedasticity (constant variance) across the prediction range. This finding is important for establishing the reliability of uncertainty estimates and confidence intervals for model predictions. These diagnostic results provide confidence in the validity of the model and suggest that the Gradient Boosting approach successfully captures the underlying relationships in the synthetic dataset without significant violations of the statistical assumptions.
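The visual diagnostics described above can be complemented with simple numerical checks. The text does not name any formal tests, so the two used here (Shapiro-Wilk for normality, and a Spearman correlation between predictions and absolute residuals as a rough heteroscedasticity check) are assumptions of this sketch. The residuals are simulated placeholders.

```python
import numpy as np
from scipy.stats import shapiro, spearmanr

rng = np.random.default_rng(0)

# Placeholder predictions and residuals for a 75-sample test set (simulated).
y_pred = rng.uniform(50.0, 1500.0, size=75)
residuals = rng.normal(loc=0.0, scale=100.0, size=75)

# Normality: a large p-value gives no evidence against normal residuals.
shapiro_stat, shapiro_p = shapiro(residuals)

# Rough heteroscedasticity check: does the residual magnitude trend with
# the predicted value? A correlation near zero suggests constant variance.
rho, rho_p = spearmanr(y_pred, np.abs(residuals))

print(f"mean residual = {residuals.mean():.2f}")
print(f"Shapiro-Wilk W = {shapiro_stat:.3f}, p = {shapiro_p:.3f}")
print(f"Spearman rho(|residual|, prediction) = {rho:.3f}, p = {rho_p:.3f}")
```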

3.2. Feature Importance Analysis

The feature importance analysis from the Gradient Boosting model provided valuable insights into the relative significance of the different factors governing atmospheric corrosion (Figure 4). The rankings revealed four primary predictors that dominate the corrosion process.

The copper content (Importance ≈ 0.30) emerged as the most influential variable, confirming the well-established protective effect of copper in atmospheric environments. This finding is consistent with the extensive literature documenting the beneficial effects of copper additions to steel. The protective mechanism involves the formation of adherent Cu-rich corrosion products that provide a barrier against further oxidation [27,28]. The magnitude of the importance of copper in our model aligns with field exposure studies showing that copper contents as low as 0.2%–0.5% can reduce atmospheric corrosion rates by 40%–60% compared to plain carbon steels [4].

Exposure time (Importance ≈ 0.20) is the second most important factor, reflecting the time-dependent nature of atmospheric corrosion. This result validates the power-law time dependence (∝ t^0.7) implemented in the synthetic dataset, which is consistent with long-term exposure studies reported by Panchenko and Marshakov [22]. The sublinear time dependence reflects the gradual development of protective corrosion products that slow the corrosion rate over time, a phenomenon well-documented in the atmospheric corrosion literature.
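The sublinear character of a t^0.7 dependence is easy to see numerically. In the sketch below, the function name is illustrative and the exponent is the one quoted in the text; how this factor enters the full dataset generator is not specified, so only the factor itself is shown.

```python
def time_effect(exposure_years: float, exponent: float = 0.7) -> float:
    """Power-law time factor t ** 0.7 (exponent as quoted in the text)."""
    return exposure_years ** exponent


# Doubling the exposure time multiplies the factor by 2 ** 0.7, i.e. by less than 2.
for years in (1.0, 2.0, 5.0, 10.0, 20.0):
    print(f"t = {years:4.1f} years -> factor {time_effect(years):.2f}")
```

Each doubling of exposure time raises the factor by about 62% instead of 100%, which is the slowdown attributed to protective corrosion products.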

Chloride Deposition (Importance ≈ 0.15) ranked third, confirming the aggressive nature of the marine and coastal environments. This finding aligns with extensive research on marine atmospheric corrosion, where chloride deposition rates as low as 10–20 mg/m²/day can significantly accelerate corrosion rates [3]. The importance of chlorides reflects their role in promoting electrolyte film formation, increasing conductivity, and disrupting the protective oxide layers.

The time of wetness (Importance ≈ 0.13) represents the fourth most important factor, highlighting the critical role of moisture availability in governing the electrochemical corrosion processes. This parameter, which combines the effects of temperature and humidity, is directly related to the duration of the conditions favorable for corrosion reactions. The importance of wetness time is consistent with the ISO 9223 corrosivity classification systems, where this parameter is used as a primary indicator of atmospheric corrosion severity.

The relatively low importance of other compositional elements (Mn, P, S, Si) reflects their limited influence on atmospheric corrosion resistance within the concentration ranges typical of structural steels. This finding is consistent with metallurgical knowledge, in which these elements primarily affect mechanical properties rather than corrosion resistance in atmospheric environments.
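A ranking like the one discussed above can be read from a fitted tree ensemble. The sketch below assumes scikit-learn, whose impurity-based `feature_importances_` are normalized to sum to 1, so a value of 0.30 is a 30% share. The feature names and toy response are hypothetical stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical feature names; the last one is pure noise by construction.
feature_names = ["cu_content", "exposure_years", "chloride_deposition", "noise"]

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))

# Toy response loosely shaped like the factors in Section 2.1 (illustrative only).
y = (
    np.exp(-2.0 * X[:, 0])
    * (1.0 + X[:, 1]) ** 0.7
    * np.exp(X[:, 2])
    + rng.normal(scale=0.01, size=300)
)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Sort features by importance, largest first.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name:20s} {importance:.3f}")
```

Impurity-based importances describe how the fitted model uses each feature. They can shift with hyperparameters and correlated inputs, so they are best read alongside the correlation analysis.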

3.3. Correlation Analysis and Feature Relationships

Figure 5 shows the comprehensive feature correlation matrix. Inter-feature correlations were generally low (most values < 0.1), indicating minimal multicollinearity that could compromise model performance or interpretation. This finding is important for establishing the reliability of the feature importance rankings and ensuring that model predictions are not unduly influenced by redundant information.

Figure 6 shows the individual feature correlations with the corrosion rate, which provide additional validation of the scientific foundation of the synthetic dataset. The strongest positive correlations were observed for Time of Wetness (0.37) and Exposure Years (0.37), both of which promote corrosion through extended exposure to corrosive conditions. Chloride Deposition also showed a positive correlation (0.34), confirming its role as a corrosion accelerator.

The strongest negative correlation was observed for copper content (−0.53), providing clear evidence of its protective effect. This correlation strength is consistent with experimental studies showing that copper addition provides logarithmic improvements in the corrosion resistance (de la Fuente et al., 2011 [5]). These correlation patterns align well with established corrosion science principles, providing confidence in the synthetic dataset representation of real-world atmospheric corrosion behavior. The correlation magnitudes are also consistent with field exposure studies, where environmental factors typically show moderate correlations with corrosion rates owing to the complex, multifactorial nature of atmospheric corrosion.
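The correlation analysis can be reproduced in outline with NumPy. The sketch below builds a toy rate from two of the Section 2.1 factors plus multiplicative noise, then computes the Pearson correlation matrix. The sampling ranges and the noise level are assumptions, so the coefficients will not match the reported values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500

# Assumed sampling ranges (illustrative; the paper's ranges are in its Table 1).
cu = rng.uniform(0.0, 0.5, size=n_samples)
chloride = rng.uniform(0.0, 100.0, size=n_samples)
noise = rng.lognormal(mean=0.0, sigma=0.1, size=n_samples)

# Toy corrosion rate from the copper and chloride factors quoted in the text.
rate = np.exp(-2.0 * cu) * np.exp(0.01 * chloride) * noise

# Columns are variables; rowvar=False gives the 3 x 3 Pearson matrix.
data = np.column_stack([cu, chloride, rate])
corr = np.corrcoef(data, rowvar=False)

print(f"corr(Cu, chloride)   = {corr[0, 1]:+.2f}")  # near zero by construction
print(f"corr(Cu, rate)       = {corr[0, 2]:+.2f}")  # negative: protective
print(f"corr(chloride, rate) = {corr[1, 2]:+.2f}")  # positive: accelerating
```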

3.4. Implications for Corrosion Science and Engineering Applications

3.4.1. Materials Design and Alloy Development

Feature importance rankings provide quantitative guidance for optimizing alloy designs. The dominant influence of copper content (30% importance) suggests that copper-bearing alloys should be prioritized for atmospheric applications, particularly in aggressive environments. This finding supports the continued development of copper-bearing weathering steels and low-alloy atmospheric corrosion-resistant steels. The relatively low importance of traditional alloying elements (Mn, Si, P, and S) within typical structural steel ranges indicates that optimization efforts should focus on the Cu content rather than the fine-tuning of these elements. This insight can guide alloy development programs by prioritizing Cu optimization over other compositional variables.

3.4.2. Protective Coating System Design

The ML model results have significant implications for the design of protective coating strategies. The importance of environmental factors (chlorides and time of wetness) suggests that coating systems should be specifically tailored to the local environmental conditions. Enhanced barrier properties and cathodic protection are necessary in coastal environments with high chloride deposition. The time-dependent nature of corrosion (20% importance for exposure time) indicates that coating systems should be designed for long-term performance, with particular attention paid to the degradation mechanisms that develop over extended periods. This finding supports the development of self-healing and adaptive coating systems that can maintain their protective properties for decades.

3.4.3. Infrastructure Risk Assessment and Maintenance Planning

The predictive framework enables a quantitative risk assessment of infrastructure exposed to atmospheric corrosion. By incorporating site-specific environmental data (temperature, humidity, and pollutant levels), the model can provide estimates of expected corrosion rates for different material systems. The sensitivity of the model to chloride deposition and wetness makes it particularly valuable for coastal infrastructure planning, where these factors vary significantly with distance from the sea and local microclimatic conditions. This capability supports evidence-based decision making for material selection and maintenance scheduling.

3.5. Comparison with Existing Literature and Validation

The performance metrics achieved in this study are broadly comparable to those reported in prior machine learning-based corrosion prediction studies. For instance, Zhi et al. [9] reported an R² value of 0.89 using a Random Forest model on experimentally obtained corrosion data, while our Gradient Boosting model achieved an R² of 0.863 on a scientifically informed synthetic dataset. Within this study, the differences in predictive accuracy between models were relatively modest, but paired t-tests on the cross-validated R² values confirmed that the improvement of Gradient Boosting over Random Forest was significant (p < 0.001). More importantly, our results demonstrate that a well-constructed synthetic dataset grounded in corrosion science principles can reproduce key trends in atmospheric corrosion behavior with high fidelity. The feature importance rankings obtained from our model align well with established corrosion science. The prominent role of copper supports extensive findings in the literature regarding the protective effect of Cu-bearing steels in atmospheric environments [4]. Similarly, the high influence of environmental parameters, such as chloride deposition and time of wetness, is consistent with ISO 9223-based corrosivity classifications and field exposure studies. The model treatment of time-dependent corrosion behavior reflects the power-law degradation trend observed in long-term atmospheric exposure programs. For example, Panchenko and Marshakov [22] reported time exponents ranging from 0.6 to 0.8 for the atmospheric corrosion of steel, consistent with the t^0.7 dependence implemented in our dataset generation framework.

3.6. Limitations and Future Research Directions

Although the synthetic dataset provided valuable insights for methodology development, several limitations must be acknowledged. The simplified mathematical relationships used in this study do not fully capture the complexity of real-world atmospheric corrosion processes. In particular, the current model does not include microstructural characteristics, such as grain boundary density, phase distribution, or inclusion content, which are known to influence corrosion behavior. Additionally, galvanic effects arising from multimetallic assemblies and the role of surface treatments or coatings were not considered. Future work will focus on incorporating these factors as more detailed experimental datasets become available. Thus, this study serves as a foundational step toward more comprehensive and physically representative models.

The current synthetic dataset focuses on carbon and low-alloy steels, which are among the most commonly used materials in atmospherically exposed infrastructure. Although this provides a relevant and practical basis for model development, it limits the generalizability of the results to other material classes. In future work, we aim to expand the dataset to include stainless steels, aluminum alloys, and other structurally important metals, as well as more complex corrosion mechanisms, to capture a wider range of corrosion behaviors and enhance the applicability of the model across different industries.

The ultimate validation of this ML framework requires a comprehensive comparison with experimental data from controlled-exposure studies. Priority should be given to validation studies that include diverse environmental conditions (urban, rural, marine, and industrial), systematic variations in material composition, and long-term exposure data (5–20 years). Collaborative programs with international atmospheric corrosion research networks (ISOCORRAG and MICAT) can provide access to standardized datasets for model validation and refinement. Such collaborations would enable the development of regionally specific models and improve the understanding of the climatic effects on corrosion behavior. In particular, electrochemical validation techniques such as potentiodynamic polarization and electrochemical impedance spectroscopy (EIS) are essential to gain mechanistic insight into the roles of copper content and chloride deposition, which were identified as key predictive features in our model. These methods can provide a quantitative understanding of the passive film stability, breakdown potentials, and charge transfer resistances that govern the atmospheric corrosion rates. Future work will focus on integrating such electrochemical data to verify and calibrate the ML predictions and to link model outputs more directly to established corrosion mechanisms.

Future research should explore advanced ML techniques, including deep learning approaches for capturing complex nonlinear relationships, ensemble methods combining multiple algorithm types, and physics-informed ML models that incorporate the fundamental principles of corrosion science as constraints. The integration of real-time environmental monitoring data with ML models can enable dynamic corrosion prediction and early warning systems for critical infrastructures. This capability represents a significant advancement in predictive maintenance strategies for atmospheric corrosion management.

4. Conclusions

This study demonstrates a methodological framework for ML-based atmospheric corrosion prediction using scientifically informed synthetic data. Its key contributions to computational corrosion science are as follows.

Methodological Innovation: A systematic approach to synthetic dataset generation based on established corrosion science principles provides a valuable framework for ML algorithm development in corrosion science. This methodology addresses the critical challenge of data scarcity in corrosion research, while maintaining scientific rigor through the incorporation of well-documented mechanistic relationships.
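As an illustration of the kind of rule-based generation described here, the following minimal sketch draws a few features from the ranges listed in Table 1 and combines them with a multiplicative rule plus noise. The uniform sampling, the function name, the baseline rate, every coefficient, and the noise level are hypothetical placeholders and are not the relationships used in this study. Only the feature ranges and the signs of the effects follow the text: higher rates with time of wetness, chloride deposition, and exposure time, and lower rates with copper content.

```python
import random

# Feature ranges (min, max) taken from Table 1.
RANGES = {
    "time_of_wetness": (0.10, 0.90),   # fraction/year
    "chloride": (10.30, 399.24),       # mg/m²/day
    "copper": (0.01, 0.50),            # wt%
    "exposure_years": (1.03, 9.97),    # years
}


def generate_samples(n, seed=0):
    """Return n synthetic records as dicts.

    All coefficients below are illustrative assumptions, not values
    from the study.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        # Assumption: each feature is sampled uniformly within its range.
        x = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
        base = 100.0  # hypothetical baseline rate, μm/year
        rate = (
            base
            * (1.0 + 2.0 * x["time_of_wetness"])   # wetter -> faster
            * (1.0 + x["chloride"] / 400.0)        # more chloride -> faster
            * (1.0 + 0.1 * x["exposure_years"])    # positive time trend
            * (1.0 - 1.2 * x["copper"])            # copper lowers the rate
        )
        # Multiplicative log-normal noise keeps the rate strictly positive.
        rate *= rng.lognormvariate(0.0, 0.1)
        x["corrosion_rate"] = rate
        records.append(x)
    return records
```

Fixing the seed makes the generated dataset reproducible, which matters when several models are compared on the same synthetic records.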

Algorithm Performance Validation: Gradient Boosting demonstrated superior predictive performance on the held-out test set (R² = 0.863, RMSE = 106.60 μm/year; cross-validated R² = 0.835 ± 0.024) compared to the Random Forest and Linear Regression approaches. The statistical significance of this performance advantage (p < 0.001 vs. Random Forest) provides confidence in the selection of sequential ensemble methods for complex corrosion prediction tasks.
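For reference, the metrics quoted here can be computed as in the following minimal sketch. It uses the standard definitions of R², RMSE, and MAE; the function and variable names are illustrative and are not taken from the study's code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R², RMSE, MAE) for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae
```

RMSE and MAE carry the units of the target (here μm/year), whereas R² is dimensionless, so the error metrics should be read against the spread of corrosion rates in Table 1.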

Scientific Insights: Feature importance analysis identified copper content (0.30), exposure time (0.20), and chloride deposition (0.15) as the dominant predictors of atmospheric corrosion rates, followed by time of wetness (0.13). These findings align with decades of experimental research and provide quantitative guidance for material design and environmental risk assessment.

Practical Applications: This predictive framework offers systematic support for protective coating design, alloy development, and infrastructure risk assessment. The sensitivity of the model to key environmental parameters makes it particularly valuable for site-specific corrosion-management strategies.

Quality Assurance: Comprehensive diagnostic analysis confirms the appropriate model assumptions and reliable predictive performance across the full range of corrosion conditions. The absence of systematic bias and homoscedastic residuals provides confidence in the model’s statistical validity.
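The two checks named here can be approximated with simple summary statistics, as in the sketch below. It reports the mean residual (a value near zero suggests no systematic bias) and the Pearson correlation between absolute residuals and predicted values (a value near zero is consistent with homoscedasticity). This is a rough screening heuristic chosen for illustration, not the diagnostic procedure used in this study, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 if either sequence is constant, i.e. there is no
    trend to measure.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def residual_diagnostics(y_true, y_pred):
    """Return (mean residual, correlation of |residual| with prediction)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_residual = sum(residuals) / len(residuals)
    # A strong positive value means errors grow with the predicted rate.
    spread_trend = pearson([abs(r) for r in residuals], y_pred)
    return mean_residual, spread_trend
```

A residual plot such as Figure 3 remains the more informative check; these two numbers only summarize what that plot shows.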

The synthetic data methodology represents a significant advancement in computational approaches to corrosion science, providing a cost-effective and scientifically rigorous foundation for the development of ML algorithms. Although experimental validation remains essential for practical implementation, this study establishes a robust framework for advancing the predictive capabilities of atmospheric corrosion science.

Future research should focus on comprehensive experimental validation across diverse environmental conditions, extension to broader material classes, and integration with real-time monitoring systems for dynamic corrosion prediction. The methodology developed in this study provides a solid foundation for these advanced applications and represents a significant step forward in the digitalization of corrosion science and engineering.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/coatings15080888/s1, Supplementary data.

Author Contributions

Conceptualization, S.T., N.P., and N.G.S.R.; methodology, S.T. and K.D.; software, N.G.S.R. and S.T.; validation, K.D. and S.T.; formal analysis and investigation, S.T. and K.D.; resources, N.G.S.R. and N.P.; data curation, K.D. and S.T.; writing—original draft preparation, S.T. and N.G.S.R.; writing—review and editing, S.T., N.G.S.R., and N.P.; visualization, S.T. and K.D.; supervision, N.G.S.R. and N.P.; project administration, N.G.S.R.; funding acquisition, N.P. and N.G.S.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded and conducted under the Industrial Innovation Talent Growth Support Project of the Korean Ministry of Trade, Industry, and Energy (MOTIE), operated by the Korea Institute for Advancement of Technology (KIAT) (No. P0023676, Expert Training Project for eco-friendly metal materials industry). This work was also supported by the Learning & Academic Research Institution for Master’s and PhD Students, and Postdocs (LAMP) Program of the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. RS-2023-00301974).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article or Supplementary Materials.

Conflicts of Interest

The authors declare no conflict of interest.

References

Bowman, E.; Thompson, N.; Gl, D.; Moghissi, O.; Gould, M.; Payer, J. International Measures of Prevention, Application, and Economics of Corrosion Technologies Study; NACE International: Houston, TX, USA, 2016.
Hou, B.; Li, X.; Ma, X.; Du, C.; Zhang, D.; Zheng, M.; Xu, W.; Lu, D.; Ma, F. The cost of corrosion in China. npj Mater. Degrad. 2017, 1, 4.
Alcántara, J.; Chico, B.; Díaz, I.; de la Fuente, D.; Morcillo, M. Airborne chloride deposit and its effect on marine atmospheric corrosion of mild steel. Corros. Sci. 2015, 97, 74–88.
Morcillo, M.; Chico, B.; Díaz, I.; Cano, H.; de la Fuente, D. Atmospheric corrosion data of weathering steels. A review. Corros. Sci. 2013, 77, 6–24.
de la Fuente, D.; Díaz, I.; Simancas, J.; Chico, B.; Morcillo, M. Long-term atmospheric corrosion of mild steel. Corros. Sci. 2011, 53, 604–617.
Narayana, P.L.; Tiwari, S.; Maurya, A.K.; Ishtiaq, M.; Park, N.; Reddy, N.G.S. Quantitative and Qualitative Analysis of Atmospheric Effects on Carbon Steel Corrosion Using an ANN Model. Metals 2025, 15, 607.
Cai, Y.; Zhao, Y.; Ma, X.; Zhou, K.; Chen, Y. Influence of environmental factors on atmospheric corrosion in dynamic environment. Corros. Sci. 2018, 137, 163–175.
Maurya, A.K.; Tiwari, S.; Bhavani, A.G.; Park, N.; Reddy, N.G.S. Artificial Neural Network-Based Modeling of Atmospheric Zinc Corrosion Rates Using Meteorological and Pollutant Data. Coatings 2025, 15, 538.
Zhi, Y.; Fu, D.; Zhang, D.; Yang, T.; Li, X. Prediction and Knowledge Mining of Outdoor Atmospheric Corrosion Rates of Low Alloy Steels Based on the Random Forests Approach. Metals 2019, 9, 383.
Wen, Y.F.; Cai, C.Z.; Liu, X.H.; Pei, J.F.; Zhu, X.J.; Xiao, T.T. Corrosion rate prediction of 3C steel under different seawater environment by using support vector regression. Corros. Sci. 2009, 51, 349–355.
Kamrunnahar, M.; Urquidi-Macdonald, M. Prediction of corrosion behavior using neural network as a data mining tool. Corros. Sci. 2010, 52, 669–677.
Pei, Z.; Zhang, D.; Zhi, Y.; Yang, T.; Jin, L.; Fu, D.; Cheng, X.; Terryn, H.A.; Mol, J.M.C.; Li, X. Towards understanding and prediction of atmospheric corrosion of an Fe/Cu corrosion sensor via machine learning. Corros. Sci. 2020, 170, 108697.
Butler, K.T.; Davies, D.W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine learning for molecular and materials science. Nature 2018, 559, 547–555.
Schmidt, J.; Marques, M.R.G.; Botti, S.; Marques, M.A.L. Recent advances and applications of machine learning in solid-state materials science. npj Comput. Mater. 2019, 5, 83.
Leygraf, C.; Wallinder, I.O.; Tidblad, J.; Graedel, T. THE ELECTROCHEMICAL SOCIETY SERIES. In Atmospheric Corrosion; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2016; pp. a–c.
Raccuglia, P.; Elbert, K.C.; Adler, P.D.F.; Falk, C.; Wenny, M.B.; Mollo, A.; Zeller, M.; Friedler, S.A.; Schrier, J.; Norquist, A.J. Machine-learning-assisted materials discovery using failed experiments. Nature 2016, 533, 73–76.
Jha, D.; Ward, L.; Paul, A.; Liao, W.; Choudhary, A.; Wolverton, C.; Agrawal, A. ElemNet: Deep Learning the Chemistry of Materials From Only Elemental Composition. Sci. Rep. 2018, 8, 17593.
ISO 9223:2012; Corrosion of metals and alloys—Corrosivity of atmospheres—Classification, determination and estimation. International Organization for Standardization: Geneva, Switzerland, 2012.
Mansfeld, F.B. Corrosion Mechanisms; CRC Press: Boca Raton, FL, USA, 2020.
Tidblad, J.; Kucera, V.; Ferm, M.; Kreislova, K.; Brüggerhoff, S.; Doytchinov, S.; Screpanti, A.; Grøntoft, T.; Yates, T.; de la Fuente, D.; et al. Effects of Air Pollution on Materials and Cultural Heritage: ICP Materials Celebrates 25 Years of Research. Int. J. Corros. 2012, 2012, 496321.
Melchers, R.E. Modeling of Marine Immersion Corrosion for Mild and Low-Alloy Steels—Part 1: Phenomenological Model. Corrosion 2003, 59, 319–334.
Panchenko, Y.M.; Marshakov, A.I. Long-term prediction of metal corrosion losses in atmosphere using a power-linear function. Corros. Sci. 2016, 109, 217–229.
Leygraf, C.; Wallinder, I.O.; Tidblad, J.; Graedel, T. APPLIED ATMOSPHERIC CORROSION. In Atmospheric Corrosion; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2016; pp. 166–180.
Morcillo, M.; Chico, B.; de la Fuente, D.; Simancas, J. Looking back on contributions in the field of atmospheric corrosion offered by the MICAT ibero-american testing network. Int. J. Corros. 2012, 2012, 824365.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Friedman, J. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
Cano, H.; Neff, D.; Morcillo, M.; Dillmann, P.; Díaz, I.; de la Fuente, D. Characterization of corrosion products formed on Ni 2.4 wt%–Cu 0.5 wt%–Cr 0.5 wt% weathering steel exposed in marine atmospheres. Corros. Sci. 2014, 87, 438–451.
Diaz, I.; Cano, H.; de la Fuente, D.; Chico, B.; Vega, J.M.; Morcillo, M. Atmospheric corrosion of Ni-advanced weathering steels in marine atmospheres of moderate salinity. Corros. Sci. 2013, 76, 348–360.

Figure 1. Predicted vs. actual corrosion rates for the Gradient Boosting model.

Figure 2. Residual distribution for the Gradient Boosting model.

Figure 3. Residuals plotted against predicted values for the Gradient Boosting model.

Figure 4. Feature importance ranking from the Gradient Boosting model. The copper content dominated with an importance of 0.30, followed by exposure time (0.20), chloride deposition (0.15), and time of wetness (0.13). Material composition features with low variance had minimal importance.

Figure 5. Pearson correlation matrix heatmap for all features and the target variable. The color scale ranges from −0.6 (blue) to +1.0 (red). The matrix confirms the low inter-feature correlations (most < 0.1) and highlights the key relationships with the corrosion rate in the bottom row.

Figure 6. Individual feature correlations with corrosion rate ranked by absolute correlation strength. The time of wetness (0.37) and exposure years (0.37) showed the strongest positive correlations, whereas the copper content showed the strongest negative correlation (−0.53), consistent with the relationships programmed into the synthetic data.

Table 1. Descriptive statistics of synthetic dataset variables.

Feature	Unit	Min	Max	Mean	Std Dev
Temperature	°C	15.03	27.98	21.32	3.74
Relative Humidity	%	70.02	94.96	82.06	7.24
Time of Wetness	fraction/year	0.10	0.90	0.50	0.23
Precipitation	mm	104.40	3999.90	2070.72	1141.13
SO₂	μg/m³	5.11	49.84	27.42	13.30
Chloride	mg/m²/day	10.30	399.24	206.80	112.02
Carbon	wt%	0.05	0.30	0.17	0.07
Silicon	wt%	0.15	0.60	0.39	0.13
Manganese	wt%	0.30	1.50	0.90	0.34
Phosphorus	wt%	0.01	0.08	0.05	0.02
Sulfur	wt%	0.01	0.04	0.02	0.01
Copper	wt%	0.01	0.50	0.26	0.14
Nickel	wt%	0.01	0.50	0.25	0.15
Chromium	wt%	0.01	0.80	0.42	0.23
Exposure Years	years	1.03	9.97	5.35	2.57
Corrosion Rate	μm/year	29.81	1906.13	320.45	255.30

Table 2. Model performance comparison.

Model	Test R²	Test RMSE (μm/year)	Test MAE (μm/year)	CV R² (±SD)	CV RMSE (±SD, μm/year)
Linear Regression	0.777	136.03	94.50	0.778 ± 0.031	115 ± 16
Random Forest	0.777	135.85	85.13	0.751 ± 0.027	122 ± 18
Gradient Boosting	0.863	106.60	70.24	0.835 ± 0.024	99 ± 17

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
