Article

Achieving High Hardness and Uniformity in Fe-Based Amorphous Coatings for Enhanced Wear Resistance via Explainable Machine Learning

1
Defense Innovation Institute, Academy of Military Sciences, Beijing 100071, China
2
State Key Laboratory of Advanced Marine Materials, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Coatings 2026, 16(2), 199; https://doi.org/10.3390/coatings16020199 (registering DOI)
Submission received: 14 January 2026 / Revised: 29 January 2026 / Accepted: 2 February 2026 / Published: 5 February 2026
(This article belongs to the Special Issue Advanced Corrosion- and Wear-Resistant Coatings)

Highlights

What are the main findings?
A unified HVAF process optimization framework is proposed by integrating DDPM-based data augmentation with explainable machine learning.
DDPM generates synthetic samples with the highest statistical fidelity and distributional consistency, effectively mitigating data scarcity.
What are the implications of the main findings?
The optimized GBR model, enhanced with 10% DDPM-generated data, achieves superior prediction accuracy and generalization for coating hardness and uniformity.
SHAP analysis quantitatively reveals the dominant effect of spraying distance and uncovers coupled mechanisms governing hardness uniformity.

Abstract

High-Velocity Air-Fuel (HVAF) spraying of Fe-based amorphous coatings involves strong nonlinear coupling among multiple process parameters, while practical optimization is severely constrained by limited experimental data and poor model interpretability. To address these challenges, a systematic data-driven optimization framework integrating the Denoising Diffusion Probabilistic Model (DDPM)-based data augmentation with explainable machine learning is proposed. Coating microhardness and hardness uniformity were jointly selected as target properties to capture both performance level and spatial reliability. Three generative models—Generative Adversarial Network (GAN), Variational Autoencoder (VAE), and DDPM—were comparatively evaluated using statistical matching and distribution-consistency metrics, revealing that DDPM most faithfully reproduces the intrinsic statistical characteristics of real HVAF process data. We benchmarked ten representative regression algorithms covering classical statistical learning, ensemble methods, and deep learning paradigms, with GBR demonstrating the highest predictive accuracy and stability. The inclusion of 10% DDPM-generated samples further improved the predictive precision of the GBR model. SHapley Additive exPlanations (SHAP) quantitatively identified spraying distance as the dominant parameter governing coating hardness, while elucidating the coupled effects of multiple parameters on hardness uniformity. By interpolatively expanding the process parameter space, a two-stage screening strategy identified 98 high-performance parameter combinations. Experimental validation confirmed that the optimal parameter set simultaneously achieved higher hardness and improved uniformity compared with the original best condition, resulting in a 13.6% reduction in wear rate.

1. Introduction

High-velocity air–fuel (HVAF) spraying, characterized by its low-temperature and high-velocity particle dynamics, enables high-energy impact deposition while effectively suppressing in-flight oxidation and thermal degradation. This unique combination makes HVAF particularly suitable for the fabrication of metallic glass coatings with high density and high amorphous phase retention. For Fe-based amorphous alloy systems, HVAF-deposited coatings have been demonstrated to exhibit superior wear resistance, enhanced corrosion protection, and improved load-bearing capability compared with conventional flame spraying and high-velocity oxygen–fuel (HVOF) processes, thereby meeting the stringent requirements of demanding service environments [1,2]. Despite these advantages, the HVAF process is inherently governed by complex and strongly nonlinear interactions among multiple process variables, including gas dynamics, particle heating and acceleration, torch motion, and deposition behavior [3]. Key parameters such as gas pressure, powder feeding rate, spraying distance, and torch travel conditions are tightly coupled, giving rise to a highly multidimensional and nonlinear process space. As a consequence, conventional trial-and-error optimization strategies become inefficient and resource-intensive, requiring extensive experimental iterations to approach acceptable process windows [4,5,6]. This intrinsic complexity necessitates the development of advanced and systematic optimization strategies tailored to Fe-based amorphous thermal spray coatings.
To address the aforementioned complexity, a variety of optimization methodologies have been explored, including statistical design approaches, numerical simulations, and machine learning (ML) techniques [3]. Among these, ML has attracted increasing attention owing to its capability to capture complex and strongly nonlinear relationships in high-dimensional parameter spaces without requiring explicit physical or mathematical formulations. By autonomously learning the intrinsic correlations between process parameters and coating properties from experimental data, ML-based models have demonstrated superior predictive performance in multiparameter process optimization and have shown particular advantages in handling multifactorial and strongly coupled thermal spray systems [7]. In contrast, statistical optimization methods are typically constrained by predefined regression structures, while numerical simulations often rely on simplified physical assumptions and are limited by computational cost and modeling accuracy. Despite these advantages, the practical deployment of ML in thermal spray optimization remains challenged by the scarcity of high-quality experimental data under realistic industrial conditions [8]. Moreover, most ML models function as black boxes, providing limited insight into the underlying decision-making processes. This lack of interpretability restricts the mechanistic understanding of parameter–property relationships and poses a critical barrier to the reliable and broader application of ML-driven optimization in engineering practice [9].
Most existing studies focus on improving model prediction accuracy in specific equipment setups. However, they often fail to examine the fundamental physical mechanisms that explain how process parameters affect coating performance [10,11,12,13]. For example, Lv et al. [14] employed factor influence rankings derived from orthogonal experimental designs to provide a certain level of order-based interpretability; however, such analysis remains essentially statistical in nature and does not reveal the intrinsic nonlinear interactions embedded within the back-propagation neural network (BPNN) model. Similarly, Gao et al. [15] mitigated data imbalance through nonlinear transformations and oversampling strategies, yet these operations inevitably compromised the transparency of parameter–property relationships. In addition, the introduction of coupled features to compress the process space further obscured the individual contributions of key process parameters. Outside thermal spraying, frameworks that combine interpretable machine learning with data augmentation are becoming more prevalent in alloy design and microstructural optimization. Qin et al. [16] integrated sigmoid fitting with a conditional generative adversarial network (CGAN) to establish a property–composition–structure (P–C–S) modeling chain for Mg–Nd alloys, thereby enabling performance-driven microstructure generation. Wang et al. [17] combined interpretable gradient boosting models with data augmentation and reconstruction strategies to identify low-alloyed magnesium systems exhibiting an excellent strength–ductility synergy. Collectively, these studies highlight the considerable potential of coupling generative modeling with explainable learning algorithms to enhance both efficiency and transparency in data-driven materials design. Nevertheless, existing efforts predominantly focus on correlations among composition, structure, and properties. In contrast, the process–structure–property relationships that are critical for understanding complex manufacturing systems, such as HVAF spraying, remain rarely explored from a data-driven and interpretable perspective.
This study proposes and validates a process parameter optimization framework that integrates data augmentation techniques with explainable machine learning. The effects of six key HVAF parameters on coating microhardness and hardness uniformity were systematically investigated. The study compared three data augmentation methods: Generative Adversarial Network (GAN), Variational Autoencoder (VAE), and Denoising Diffusion Probabilistic Model (DDPM), and found that the DDPM produced the highest-quality synthetic data. A comparative evaluation of ten regression algorithms identified Gradient Boosting Regression (GBR) as the optimal predictive model. Incorporating DDPM-augmented data further enhanced GBR performance. To overcome the black-box limitation of conventional ML models, SHapley Additive exPlanations (SHAP) were employed to quantitatively reveal both the individual and synergistic effects of HVAF process parameters on coating hardness, identifying spraying distance as the dominant governing factor. By integrating mechanistic interpretability with enhanced predictive capability, the proposed framework delivers actionable insights for efficient process-window optimization and offers a transferable strategy for data-driven optimization of complex thermal spray systems.

2. Materials and Methods

2.1. Spraying Materials

The raw material used in this study was Fe52Ni5Nb5Cr16Mo5B12Si5 (at.%) amorphous powder synthesized via vacuum arc melting and gas atomization under vacuum conditions. The microstructural characterization was performed using a ZEISS Gemini SEM 300 (Carl Zeiss, Jena, Germany) equipped with an Oxford Xplore 30 EDS detector (Oxford Instruments, Oxford, United Kingdom). The resulting powders exhibited a spherical morphology with a controlled particle size distribution ranging from 20 to 45 µm (Figure 1). X-ray diffraction (XRD) analysis of the powder revealed a broad hump-like diffraction pattern characteristic of amorphous materials, confirming the predominantly amorphous structure. Quantitative fitting of the XRD pattern using the Verdon method [18] estimated the amorphous phase content to be approximately 90% by volume.
Q235 carbon steel plates, commonly used in industrial applications, were selected as the substrate material. Before coating deposition, the substrates were thoroughly degreased using acetone to eliminate surface contaminants such as oils, oxide layers, and rust that could compromise coating adhesion. Surface roughening was subsequently performed by grit blasting with white corundum abrasives. The blasting parameters were set as follows: pressure of 0.7 MPa, standoff distance of 100 mm, and impact angle of 70–80°, ensuring optimal mechanical interlocking between the coating and the substrate surface.

2.2. Coating Fabrication and Characterization

The coatings were deposited using an AcuKote HVAF thermal spray system (AMETEK, Pittsburgh, PA, USA) with a propane-fueled AK-C7 torch, as illustrated in Figure 2. An experimental parameter matrix (Table 1) was created based on combustion kinetics and thermal spray principles, capturing key variables in gas dynamics, motion control, and material feed. Six process parameters—air pressure, propane pressure, torch traversing velocity, torch shifting distance, powder feeding rate (rotary powder feeder), and spraying distance—were selected for study. Using a full-factorial design yielded 288 parameter combinations. Air–propane pressure pairs were grouped into three categories: 60–62 psi, 70–73 psi, and 80–84 psi. This approach aimed to capture both individual effects and nonlinear interactions of each parameter.
After deposition, microhardness tests were conducted on the cross-sections of the coatings using an FM700 Vickers microhardness tester (Future-Tech, Kawasaki, Japan), equipped with a standard diamond pyramidal indenter, following ASTM E384 [19]. A load of 100 g was applied for 15 s. For each sample, ten indentation points were arranged at 50 µm intervals along the direction parallel to the coating surface. The highest and lowest values were discarded, and the average of the remaining eight measurements was used as the representative hardness value for that sample. The standard deviation of the eight values was also calculated to quantify hardness uniformity, using the following equation:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
where $N$ is the number of data points, $x_i$ is the $i$th data value, and $\mu$ is the mean.
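The trimming and averaging procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `hardness_summary` and the ten readings are hypothetical, and the standard deviation uses the 1/N form of the equation above.

```python
import math

def hardness_summary(readings):
    """Return (mean, standard deviation) after dropping the highest and lowest reading."""
    if len(readings) < 3:
        raise ValueError("need at least three readings to trim both extremes")
    trimmed = sorted(readings)[1:-1]      # discard one minimum and one maximum
    n = len(trimmed)
    mu = sum(trimmed) / n                 # representative hardness of the sample
    # 1/N (population) form, matching the equation for sigma
    sigma = math.sqrt(sum((x - mu) ** 2 for x in trimmed) / n)
    return mu, sigma

# Hypothetical HV0.1 readings for one sample (illustrative values, not measured data)
readings = [905, 932, 918, 947, 890, 925, 961, 912, 938, 920]
mean_hv, sigma_hv = hardness_summary(readings)
print(f"mean = {mean_hv:.1f} HV0.1, sigma = {sigma_hv:.1f} HV0.1")
```

A smaller sigma indicates a more uniform hardness profile along the indentation line.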
Figure S1 shows the distribution of measured microhardness values. Over 70% of the data points fall within the 800–1000 HV0.1 range, with the 900–1000 HV0.1 interval showing the highest frequency (over 41%). This indicates that the coatings predominantly exhibit mid-to-high microhardness characteristics.

2.3. Data Preprocessing and Machine Learning Framework

Due to the significant differences in the dimensional scales of various process parameters, directly using raw data for model training may lead to dominance of features with larger numerical values during gradient descent, ultimately impairing training efficiency and model stability. To address this, all input features were standardized using Z-score normalization:
$$z = \frac{x - \mu}{\sigma}$$
where x denotes the original value, μ is the mean, and σ is the standard deviation of the dataset. This normalization process eliminates dimensional disparities and centers the feature distribution, thereby improving model convergence and training stability.
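As a minimal sketch of this standardization, the snippet below fits the mean and standard deviation of one feature column, applies the transform, and inverts it to recover the original scale. The helper names and the spraying-distance values are hypothetical, and the population (1/N) standard deviation is an assumption, since the estimator is not specified here.

```python
import math

def zscore_fit(values):
    """Return the mean and population standard deviation of one feature column."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    if sigma == 0:
        raise ValueError("a constant feature cannot be standardized")
    return mu, sigma

def zscore_transform(values, mu, sigma):
    """Map raw values to zero mean and unit variance."""
    return [(v - mu) / sigma for v in values]

def zscore_inverse(z_values, mu, sigma):
    """Map standardized values back to the original physical scale."""
    return [z * sigma + mu for z in z_values]

# Hypothetical spraying distances in mm (illustrative only)
spray_distance_mm = [150.0, 180.0, 210.0, 240.0]
mu, sigma = zscore_fit(spray_distance_mm)
z = zscore_transform(spray_distance_mm, mu, sigma)
restored = zscore_inverse(z, mu, sigma)
```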
The optimization framework proposed in this study follows a two-stage strategy. In the first stage, the optimal machine learning model is identified using the original experimental dataset. In the second stage, the effect of data augmentation on the best candidate models is systematically evaluated to determine the most effective model–augmentation combination. Based on this strategy, ten representative regression models were selected to comprehensively evaluate the applicability and performance of machine learning algorithms for predicting coating microhardness from HVAF process parameters. These models span classical statistical learning, ensemble methods, and deep learning paradigms, and include: Bayesian Regression (BR), CatBoost, Convolutional Neural Network (CNN), Gaussian Process Regression (GPR), GBR, K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), Random Forest (RF), Support Vector Machine (SVM), and XGBoost. Model performance was quantitatively assessed using two widely adopted evaluation metrics: the coefficient of determination (R²) and the root mean squared error (RMSE).
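For reference, the two evaluation metrics can be computed as in the sketch below. The function names and the toy values are illustrative assumptions; the definitions themselves are the standard ones for the coefficient of determination and the root mean squared error.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot

# Toy values in arbitrary (standardized) units, not results from this study
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(f"R2 = {r2_score(y_true, y_pred):.3f}, RMSE = {rmse(y_true, y_pred):.3f}")
```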
Systematic hyperparameter tuning was performed to maximize the predictive capabilities of each algorithm and avoid suboptimal performance due to poorly chosen hyperparameters. A combination of randomized search and cross-validation strategies was employed during model training and evaluation [20]. Specifically, each model underwent 60 iterations of randomized hyperparameter search combined with five-fold cross-validation to ensure robust generalization and high predictive accuracy.
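The tuning loop can be illustrated with a deliberately small, dependency-free sketch: 60 random draws from a search space, each scored by five-fold cross-validation. Everything specific here is an assumption made for illustration: a toy k-nearest-neighbour regressor stands in for the ten models, the search space is a single hyperparameter `k`, and the data are synthetic. In practice a library routine such as scikit-learn's `RandomizedSearchCV` would normally play this role.

```python
import math
import random

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_rmse(xs, ys, k, n_folds=5, seed=0):
    """Mean RMSE of the k-NN regressor over n_folds cross-validation folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)               # same folds for every candidate
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        sq = [(knn_predict(tx, ty, xs[i], k) - ys[i]) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return sum(scores) / len(scores)

def randomized_search(xs, ys, n_iter=60, seed=0):
    """Sample n_iter candidates at random and keep the one with the lowest CV RMSE."""
    rng = random.Random(seed)
    best_k, best_score = None, float("inf")
    for _ in range(n_iter):
        k = rng.randint(1, 15)                     # hypothetical search space
        score = cv_rmse(xs, ys, k)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Synthetic toy data (not HVAF measurements)
data_rng = random.Random(1)
xs = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(60)]
ys = [a ** 2 + 0.5 * b + data_rng.gauss(0, 0.05) for a, b in xs]
best_k, best_rmse = randomized_search(xs, ys)
```

Reusing the same fold assignment for every candidate keeps the comparison between hyperparameter settings fair.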

2.4. Deep Generative Models and Data Augmentation Strategy

In thermal spraying applications, the acquisition of high-quality experimental data is costly and time-consuming, resulting in severe data scarcity that hampers the training of machine learning models and limits both predictive accuracy and generalization capability. To address this challenge, this study systematically investigates and compares three state-of-the-art deep generative models—GAN [21], VAE [22], and DDPM [23]—to evaluate their ability to generate high-quality, diverse synthetic data that conforms to the underlying physical constraints of the HVAF process. The fundamental working principles of these generative models are briefly summarized below:
GAN is a deep learning framework based on adversarial training between two neural networks—a generator and a discriminator. The generator learns to produce realistic synthetic samples from random noise, while the discriminator attempts to distinguish between real and generated samples. The generator is optimized to minimize the discriminator’s classification accuracy, while the discriminator seeks to maximize its ability to differentiate between genuine and synthetic data. Through an iterative game-theoretic training process (Figure S2), both networks continuously improve, ideally converging to a Nash equilibrium where the generated data distribution closely matches the real data.
As shown in Figure S3, VAE is a probabilistic generative model that encodes input data into a latent space distribution, samples latent variables via the reparameterization trick, and reconstructs data through a decoder. The training objective is to maximize the Evidence Lower Bound (ELBO), or equivalently to minimize its negative, enabling the model to learn structured and interpolatable latent representations. Unlike traditional autoencoders, VAEs incorporate variational inference (VI) to approximate the true posterior of the latent variables, allowing for both stochastic generation and conditional sample control.
The DDPM is a generative model based on a diffusion process, as illustrated in Figure S4. Its core concept involves generating high-quality data samples by simulating and reversing the noise addition process. DDPM consists of two key stages: the forward diffusion process and the reverse denoising process. In the forward process, Gaussian noise is gradually added to the original data in multiple steps, ultimately transforming the data into pure noise. In the reverse process, a neural network is trained to learn how to iteratively remove the noise, progressively restoring the original data or generating new data samples.
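The forward diffusion stage has a well-known closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε drawn from a standard normal, where ᾱ_t is the running product of (1 − β_s). The sketch below illustrates it for a single standardized parameter vector. The linear noise schedule, the number of steps, and the example vector are assumptions borrowed from common DDPM practice; the settings used in this study are not stated here. The reverse (denoising) network is omitted.

```python
import math
import random

def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly increasing noise variances (an assumed, commonly used schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (n_steps - 1) for t in range(n_steps)]

def cumulative_alpha(betas):
    """alpha_bar[t] = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t directly from x_0 using the closed-form forward process."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule(1000)
alpha_bar = cumulative_alpha(betas)

# Hypothetical standardized process-parameter vector (six features)
x0 = [0.3, -1.2, 0.8, 0.0, 1.5, -0.4]
rng = random.Random(0)
x_t = q_sample(x0, 500, alpha_bar, rng)      # partially noised sample
x_T = q_sample(x0, 999, alpha_bar, rng)      # almost pure Gaussian noise
```

Because ᾱ_t shrinks toward zero as t grows, the signal term vanishes and the final sample is dominated by noise, which is the starting point of the reverse process.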
Although data augmentation has shown promising potential in materials science, there remains a lack of quantitative criteria for systematically evaluating the similarity between synthetic and original data distributions. To fill this gap for structured data, this study proposes a dual-dimensional evaluation framework that integrates statistical matching and distributional consistency metrics. Statistical matching refers to the degree of alignment between the key statistical features of the synthetic and real datasets, while distributional consistency reflects their overall similarity in probability distribution. In addition, an iterative selection strategy was employed to ensure that the generated data not only meet statistical criteria but also conform to the physical constraints of the coating process. The Kolmogorov–Smirnov (KS) two-sample test, a non-parametric statistical method, was used to assess the similarity between synthetic and real data. This test measures the maximum difference between the cumulative distribution functions (CDFs) of two datasets [24]. The resulting p-value is the probability of observing a difference at least this large if both datasets were drawn from the same distribution; higher p-values therefore indicate weaker evidence of a distributional mismatch and are used here as a measure of similarity.
The overall data augmentation framework is illustrated in Figure 3. Initially, the raw process dataset undergoes an integrity check, followed by the independent standardization of input and output features. This step eliminates dimensional discrepancies, thereby enhancing both training convergence and model generalization. Prior to augmentation, the complete experimental dataset is randomly partitioned into a training set (70%) and a hold-out test set (30%). An iterative selection mechanism is introduced: a preset number of synthetic samples is produced during each user-defined generation round. For each batch, the KS statistic and its corresponding p-value are computed between the synthetic and the original test dataset. According to hypothesis testing principles, the subset with the highest p-value is retained as the optimal batch. To ensure physical consistency and prevent the generation of physically implausible samples, a constraint mechanism utilizing a physics-informed proxy model was introduced. As detailed in the Supplementary Information (Figure S5), this pre-trained proxy guides the augmented data toward physically valid regions of the parameter space. Finally, the synthetic data are validated against the hold-out test set, with all samples undergoing inverse standardization to restore their original scales for direct comparability with experimental values. This iterative selection strategy ensures that only statistically faithful and physically consistent synthetic data are incorporated into downstream model training.
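The batch-selection step can be sketched as follows for a single feature. This is an illustration under stated assumptions rather than the implementation used in this study: the KS statistic is computed directly from the two empirical CDFs, the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction (an approximation), and the reference data and candidate batches are random toy samples. The physics-informed proxy constraint is not modelled.

```python
import math
import random

def ks_statistic(a, b):
    """Maximum absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sample p-value (approximation, not an exact test)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:                      # series is numerically 1 in this range
        return 1.0
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(max(s, 0.0), 1.0)

def select_best_batch(batches, reference):
    """Return (index, p_values): the batch with the highest KS p-value is kept."""
    p_values = [ks_pvalue(ks_statistic(batch, reference), len(batch), len(reference))
                for batch in batches]
    best = max(range(len(batches)), key=lambda idx: p_values[idx])
    return best, p_values

# Toy one-dimensional data standing in for one standardized process parameter
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(80)]
batches = [
    [rng.gauss(0.0, 1.0) for _ in range(80)],    # plausible synthetic batch
    [rng.gauss(10.0, 1.0) for _ in range(80)],   # grossly shifted batch
    [rng.gauss(0.0, 3.0) for _ in range(80)],    # over-dispersed batch
]
best_index, p_values = select_best_batch(batches, reference)
```

For batches of equal size, choosing the highest p-value is equivalent to choosing the smallest KS statistic, because the p-value decreases monotonically with the statistic.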

2.5. Explainable Machine Learning Approach

To gain deeper insight into the mechanisms by which process parameters influence coating performance, and to address the common “black-box” limitation of machine learning models in complex process optimization, this study adopts the SHAP framework for feature interpretation. SHAP is a model-agnostic interpretability method grounded in cooperative game theory. It attributes the contribution of each input feature to the model’s output by fairly distributing the difference between the prediction and a baseline value across all features, thereby enabling a consistent and transparent explanation of model decisions [25]. The SHAP value for a given feature $i$, denoted $\phi_i(f, x)$, is mathematically defined as
$$\phi_i(f, x) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_x(S \cup \{i\}) - f_x(S) \right]$$
where $F$ is the set of all features, $S$ is a subset of features excluding $i$, and $f_x(S)$ represents the model output when only the features in the subset $S$ are present.
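To make the definition concrete, the sketch below evaluates the sum exactly for a three-feature toy model by enumerating every subset. The toy model, the input, and the convention of replacing absent features with a baseline value are illustrative assumptions; practical SHAP implementations rely on faster model-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values; value_fn maps a frozenset of present features to an output."""
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # |S|! (|F| - |S| - 1)! / |F|!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi.append(total)
    return phi

# Hypothetical model with one additive term and one pairwise interaction
def model(v):
    return 4.0 * v[0] + v[1] * v[2]

x = [2.0, 3.0, 1.0]            # instance to explain
baseline = [0.0, 0.0, 0.0]     # assumed reference values for "absent" features

def value_fn(present):
    masked = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return model(masked)

phi = shapley_values(value_fn, len(x))
print(phi)   # contributions sum to model(x) - model(baseline)
```

The additive term is credited entirely to the first feature, while the product term is split equally between the two features that form it.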
In this study, feature importance was first quantified by computing each feature’s mean absolute SHAP value across all predictions. Features were then ranked according to their contributions to coating performance, allowing for the identification of parameters with the most significant individual influence. This ranking reflects the predictive importance of each feature and reveals the directionality of its relationships with the output. Furthermore, to explore potential synergistic or antagonistic effects between features, SHAP interaction values were computed. In HVAF spraying and other complex manufacturing processes, the influence of a single parameter on performance can often be modulated by interactions with other variables. Identifying and quantifying these interactions is therefore crucial to understanding the whole process mechanism. The SHAP interaction value between two features $i$ and $j$, denoted $\phi_{i,j}$, is calculated as follows [26]:
$$\phi_{i,j} = \sum_{S \subseteq F \setminus \{i,j\}} \frac{|S|!\,(|F| - |S| - 2)!}{2\,(|F| - 1)!} \left[ f_x(S \cup \{i,j\}) - f_x(S \cup \{i\}) - f_x(S \cup \{j\}) + f_x(S) \right]$$
Here, $\phi_{i,j}$ captures the extent to which the joint contribution of features $i$ and $j$ deviates from the sum of their individual effects, thereby quantifying the strength of their interaction.
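The interaction formula can likewise be checked by brute force on a small example. The sketch below is an illustration only: the three-feature toy model, the input, and the baseline-masking convention are assumptions, and the function name `shap_interaction` is hypothetical.

```python
from itertools import combinations
from math import factorial

def shap_interaction(value_fn, n_features, i, j):
    """Exact SHAP interaction value between features i and j (i != j)."""
    others = [k for k in range(n_features) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        # |S|! (|F| - |S| - 2)! / (2 (|F| - 1)!)
        weight = (factorial(size) * factorial(n_features - size - 2)
                  / (2 * factorial(n_features - 1)))
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += weight * (value_fn(s | {i, j}) - value_fn(s | {i})
                               - value_fn(s | {j}) + value_fn(s))
    return total

# Hypothetical model: feature 0 acts alone, features 1 and 2 act jointly
def model(v):
    return 4.0 * v[0] + v[1] * v[2]

x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]     # assumed reference values for "absent" features

def value_fn(present):
    masked = [x[k] if k in present else baseline[k] for k in range(len(x))]
    return model(masked)

phi_12 = shap_interaction(value_fn, 3, 1, 2)   # coupled pair
phi_01 = shap_interaction(value_fn, 3, 0, 1)   # no coupling in this model
print(phi_12, phi_01)
```

The value is symmetric in the two features, so the full product term of the toy model is recovered as the sum of the two symmetric entries, while the uncoupled pair yields zero.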

3. Results and Discussion

3.1. Comparison of Machine Learning Model Performance

The performance of ten machine learning algorithms was systematically evaluated for optimizing the HVAF process. A complete comparison of these ten models is provided in Table S1. As anticipated, the fitting accuracy on the training set generally exceeded that on the test set, as model parameters are optimized directly on the training data, inherently resulting in lower errors and higher precision within that dataset. Based on a preliminary screening of test-set performance, Figure 4(a1,a2) shows the evaluation results for the four top-performing models. Among all evaluated algorithms, GBR demonstrated the best overall performance, achieving an R² of 0.95 and an RMSE of only 0.175 on the test set. This underscores GBR’s strong predictive accuracy and generalization capability. Other ensemble learning models, such as CatBoost and XGBoost, also achieved high performance, with test-set R² values exceeding 0.91. In contrast, models such as RF and MLP showed strong training performance but significant declines in test-set accuracy, indicating overfitting. The KNN model, owing to its non-parametric nature, maintained consistent performance and robust generalization across datasets. From a comprehensive perspective that considers test accuracy, generalization, and overfitting risk, GBR, KNN, CatBoost, and XGBoost emerged as the top-performing models. A randomized search strategy was employed within the relevant hyperparameter spaces to identify optimal configurations that fully leverage each model’s learning potential. The resulting optimal hyperparameter sets are listed in Table S2.
Figure 4(b1–b4) compares the predicted and experimentally measured microhardness values for the four best-performing models on both the training and testing datasets. The horizontal axis represents the experimentally measured values, while the vertical axis denotes the predicted values. The diagonal line y = x indicates perfect agreement between prediction and measurement. The closer the scatter points lie to this line, the more accurate the model predictions; greater deviations reflect larger prediction errors. The predictions from all four models are generally distributed near the diagonal, indicating that they effectively capture the relationships between HVAF process parameters and coating performance. Both the GBR and CatBoost models exhibit the tightest clustering of predicted points along the diagonal in the test set. However, the GBR model has fewer points far from the diagonal than CatBoost, indicating that it is better at handling outliers. This results in the highest level of predictive accuracy and minimal deviation from experimental results.

3.2. Evaluation of Data Augmentation Models

3.2.1. Selection of the Augmentation Model

To evaluate the statistical matching of synthetic data generated by different models, the relative deviations in key statistical metrics, namely the mean ($\mu$) and standard deviation ($\sigma$), between synthetic and original process parameters were computed. These metrics provide a quantitative measure of how closely the generated data resemble the real data in terms of statistical consistency. The relative deviations are defined as follows:
$$\Delta\mu = \frac{\mu_{syn} - \mu_{real}}{\mu_{real}} \times 100\%, \qquad \Delta\sigma = \frac{\sigma_{syn} - \sigma_{real}}{\sigma_{real}} \times 100\%$$
where $\mu_{syn}$ and $\sigma_{syn}$ represent the mean and standard deviation of the synthetic data, and $\mu_{real}$ and $\sigma_{real}$ are those of the original data.
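A short sketch of these two deviations for a single process parameter is given below. The function name and the sample values are hypothetical, and the population standard deviation is an assumption, since the estimator is not specified here.

```python
import statistics

def relative_deviation(real, synthetic):
    """Return (delta_mu, delta_sigma) in percent for one process parameter."""
    mu_real, mu_syn = statistics.fmean(real), statistics.fmean(synthetic)
    sd_real, sd_syn = statistics.pstdev(real), statistics.pstdev(synthetic)
    delta_mu = (mu_syn - mu_real) / mu_real * 100.0
    delta_sigma = (sd_syn - sd_real) / sd_real * 100.0
    return delta_mu, delta_sigma

# Hypothetical values of one parameter (illustrative only)
real = [280.0, 300.0, 320.0, 340.0]
synthetic = [285.0, 305.0, 325.0, 345.0]   # shifted by 5, same spread
delta_mu, delta_sigma = relative_deviation(real, synthetic)
print(f"delta_mu = {delta_mu:.2f}%, delta_sigma = {delta_sigma:.2f}%")
```

The sign is retained, so a positive value means the synthetic data overestimate the corresponding statistic.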
As shown in Figure 5(a1), the DDPM demonstrated the highest accuracy in reconstructing the means of key process parameters, with most relative mean deviations remaining within ±5%. This indicates a strong ability to preserve the central tendency of the original data. Similarly, Figure 5(a2) shows that the DDPM Δ σ values are generally lower than those of other models, indicating that DDPM is well-suited to capturing data variability. In contrast, the VAE showed slightly inferior performance in terms of mean reconstruction—though still within acceptable limits—but exhibited substantially larger deviations in standard deviation, suggesting limitations in modeling data variability. GAN performed the worst overall, with relatively high deviations in both mean and standard deviation, indicating insufficient fidelity and poor robustness in simulating data diversity. DDPM outperformed VAE and GAN in two aspects of statistical matching—mean and standard deviation—highlighting its superior ability to generate synthetic data that is highly consistent with the statistical properties of real HVAF process data.
To visually assess distributional consistency, the Uniform Manifold Approximation and Projection (UMAP) technique was employed to perform dimensionality reduction on the generated data and 86 randomly selected samples from the test set. The resulting low-dimensional embeddings were then used to visualize and compare the distributions of synthetic and real samples in the feature space. UMAP optimizes the layout of low-dimensional embeddings by minimizing cross-entropy between high- and low-dimensional neighborhood graphs [27]. Compared with Principal Component Analysis (PCA) [28] and t-distributed Stochastic Neighbor Embedding (t-SNE) [29], UMAP provides superior performance in preserving both local and global data structures. As shown in Figure 5(b1–b3) (2D plots; corresponding 3D plots in Figure S6), the three generative models exhibited clearly different behaviors in feature space.
The GAN demonstrated relatively good alignment with the original data distribution in certain subspaces, with high-density regions showing substantial overlap. However, noticeable gaps appeared in other feature areas, particularly in the tail regions of the distribution. These discontinuities are likely attributable to mode collapse, a common issue in GAN where a few clusters are oversampled while others are underrepresented or entirely missing [30]. In contrast, the VAE generated samples that partially overlapped with the original data in several regions but exhibited a pronounced clustering effect. Specifically, the synthetic data tended to concentrate around cluster centers, while the peripheries of the data space remained sparsely populated. This artifact likely stems from the prior assumptions in the latent space imposed by the VAE framework, which limit its ability to capture data diversity fully [31]. By comparison, the DDPM exhibited the highest fidelity to the original data distribution. Its generated samples not only matched the global distributional shape but also fully covered both the major feature subspaces and boundary regions. The core advantage of the DDPM lies in its progressive, Markov chain-based generative process. By reconstructing data through a series of incremental denoising steps, DDPM closely mirrors the intrinsic nature of many physical phenomena, such as diffusion and solidification [32]. This structural analogy enables DDPM to theoretically approximate arbitrarily complex data distributions while maintaining a simple and stable training objective that fundamentally avoids mode collapse [33].
To further validate model performance, five quantitative metrics were used to systematically compare the three generative models (Table S3). DDPM consistently achieved the best scores across all indicators, obtaining the lowest Fréchet Inception Distance (FID) of 0.562 (compared to 1.614 for VAE and 0.977 for GAN), the lowest Kernel Inception Distance (KID) of 0.031, and the lowest Maximum Mean Discrepancy (MMD) of 0.087. In addition, the Kolmogorov–Smirnov (KS) test yielded a p-value of 0.608 for DDPM, which is well above the significance threshold (α = 0.05), so the null hypothesis that the synthetic and original datasets share the same distribution cannot be rejected. These metrics were selected to evaluate data fidelity from multiple perspectives: the KS statistic verifies the marginal consistency of individual parameters, whereas distance-based metrics (FID, KID, and MMD) evaluate high-dimensional joint distributions. This approach helps verify that the complex nonlinear couplings and implicit physical correlations inherent to the HVAF process are preserved. Based on this comprehensive analysis, DDPM was selected as the core generative model for data augmentation in this study.
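Of the distance-based metrics, MMD is the simplest to illustrate. The sketch below computes the biased estimate of the squared MMD with a Gaussian (RBF) kernel. The kernel choice, the bandwidth parameter `gamma`, and the toy points are assumptions; the kernel and estimator behind the reported values are not specified here, and the reported figure may be MMD rather than its square.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy

# Toy two-dimensional samples (illustrative only)
real_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
close_pts = [(0.1, 0.0), (1.0, 0.1), (0.0, 0.9)]
far_pts = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

mmd_close = mmd_squared(real_pts, close_pts)
mmd_far = mmd_squared(real_pts, far_pts)
```

A value near zero indicates that the two samples are hard to distinguish under the chosen kernel, while well-separated samples give a clearly larger value.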

3.2.2. Validation of Augmentation Model Performance

To systematically assess the impact of synthetic data volume on model performance and determine the optimal augmentation ratio, a sensitivity analysis was designed. Additional synthetic samples were generated using DDPM and separately incorporated into the training dataset to expand the effective training volume. A progressive sample augmentation strategy was employed to systematically investigate the effect of DDPM-generated data on the predictive performance of four representative models: GBR, KNN, CatBoost, and XGBoost.
As shown in Figure 6, test set performance was evaluated under five augmentation scenarios: 0% (no augmentation), 5%, 10%, 15%, and 20% additional synthetic data. The results demonstrate that data augmentation had a significant and model-dependent effect on prediction accuracy. Among the four models, GBR exhibited the most substantial improvement. Its test RMSE decreased noticeably while its R² remained consistently high, confirming it as the optimal model for this task. CatBoost also benefited from data augmentation, particularly at low augmentation ratios; however, excessive synthetic data led to a degradation in performance. KNN showed stable but limited gains, while XGBoost experienced minimal benefit from the augmented data. These findings support the effectiveness of data augmentation in the context of materials process optimization. Notably, using high-fidelity synthetic samples generated by DDPM significantly enhanced model accuracy and generalization. Nevertheless, the study also highlights a diminishing-returns effect: even high-quality synthetic data may degrade model performance if introduced excessively, possibly due to redundancy or distributional bias. This effect primarily arises from the balance between the limited dataset size and the risk of overfitting. Given the small size of the original training set, introducing 5%–10% high-quality synthetic data effectively increases sample diversity, thereby enhancing model generalization. However, when the augmentation ratio exceeds 10%, the proportion of synthetic data becomes too large. Despite its high fidelity, the synthetic data introduces subtle distributional shifts, akin to injecting noise into the training labels [34,35]. This causes the model to overfit to spurious features in the synthetic data, rather than learning the true physical relationships, ultimately compromising its predictive accuracy on the original test set [36].
In summary, DDPM proved to be a highly effective augmentation tool in this study, substantially boosting model generalization and predictive performance.
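The sensitivity analysis described above can be sketched as a simple sweep over augmentation ratios. The sketch below is illustrative only. It assumes that each ratio is defined relative to the size of the real training set and that evaluation always uses real test samples. The `fit` callback stands in for any regressor (GBR in the study), and all function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination (requires non-constant y_true)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def augmented_train_set(real_rows, synthetic_rows, ratio):
    """Append round(ratio * len(real_rows)) synthetic rows to the real rows."""
    n_extra = round(ratio * len(real_rows))
    if n_extra > len(synthetic_rows):
        raise ValueError("not enough synthetic rows for this ratio")
    return list(real_rows) + list(synthetic_rows[:n_extra])


def sweep(real_rows, synthetic_rows, test_rows, fit,
          ratios=(0.0, 0.05, 0.10, 0.15, 0.20)):
    """Train once per ratio and score on the real test set.

    Rows are (features, target) pairs; fit(rows) must return a predict(features)
    callable. Returns {ratio: (rmse, r2)}.
    """
    y_true = [y for _, y in test_rows]
    results = {}
    for ratio in ratios:
        predict = fit(augmented_train_set(real_rows, synthetic_rows, ratio))
        y_pred = [predict(x) for x, _ in test_rows]
        results[ratio] = (rmse(y_true, y_pred), r2(y_true, y_pred))
    return results
```

Keeping the test set free of synthetic samples is what makes the comparison across ratios meaningful: any degradation at high ratios then reflects a mismatch between the synthetic and real distributions rather than a change in the evaluation data.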

3.3. Model Interpretability Analysis

3.3.1. SHAP Analysis of Coating Microhardness

Figure 7(a1,a2) displays the SHAP summary plots for coating microhardness. Each row corresponds to a process feature, and the x-axis shows the SHAP value, indicating the magnitude and direction of that feature’s contribution to the predicted hardness. Features are ranked by their mean absolute SHAP values. A wider spread of dots along the x-axis denotes higher variability in influence, while dense regions suggest sample clustering. Each dot represents a single sample, where red indicates higher feature values, and blue indicates lower values. The summary plot ranks feature importance and reveals how individual feature values affect the model’s predictions. In Figure 7(a1), for the most critical feature—spraying distance—red dots (high values) are concentrated on the left side (negative SHAP values), indicating a negative correlation with hardness. In contrast, blue dots (representing shorter distances) are associated with positive SHAP values, indicating a beneficial effect on hardness. This trend aligns well with experimental observations in this study. Although powder feeding rate exhibits a narrower SHAP spread than spraying distance, it still demonstrates a clear directional impact. Red dots (higher feed rates) predominantly appear on the right side (positive SHAP values), while blue dots (lower feed rates) cluster on the left, indicating that increasing powder feeding rate tends to enhance microhardness. As shown in Figure 7(a2), the influence of process parameters on microhardness follows the order: spraying distance > torch traversing velocity > air pressure > propane pressure > powder feeding rate > torch shifting distance. Among these, spraying distance exhibits the broadest distribution of SHAP values. Its impact on coating hardness is five to six times greater than that of other process parameters, identifying it as the most critical determinant of coating hardness.
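To make the ranking by mean absolute SHAP value concrete, the following sketch computes exact Shapley values by enumerating feature coalitions for a toy model and then ranks the features. It is a brute-force illustration, not the algorithm used by the SHAP library. The linear `toy_hardness` model, its coefficients, and the single reference `baseline` are hypothetical choices made only so that the result is easy to check by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample; absent features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[k] if k in subset else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_by_mean_abs(model, samples, baseline, names):
    """Rank features by mean |Shapley value| over all samples."""
    totals = [0.0] * len(names)
    for x in samples:
        for k, v in enumerate(shapley_values(model, x, baseline)):
            totals[k] += abs(v)
    means = [t / len(samples) for t in totals]
    return sorted(zip(names, means), key=lambda item: item[1], reverse=True)


def toy_hardness(z):
    """Hypothetical model: hardness falls with distance, rises with feed rate."""
    distance_mm, feed_rate = z
    return 1000 - 2 * (distance_mm - 250) + 10 * feed_rate


if __name__ == "__main__":
    names = ["spraying distance", "powder feeding rate"]
    samples = [[300, 4], [200, 3]]
    print(rank_by_mean_abs(toy_hardness, samples, [250, 2], names))
```

For a sample with a long spraying distance, the distance attribution is negative and the feed-rate attribution is positive, which mirrors the red-left and red-right patterns described for the summary plot.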
To further elucidate how individual process parameters influence coating hardness predictions, SHAP dependence plots were generated for each key feature. As shown in Figure 7(b1–b6), each plot uses a single process variable on the x-axis and its corresponding SHAP value on the y-axis. The color of each data point reflects the feature’s actual value—red for higher values and blue for lower values. Among all parameters, spraying distance has the greatest impact on hardness, as indicated by its SHAP values, which show a distinct negative correlation. An inflection point is observed near 270 mm: below this threshold, SHAP values are predominantly positive, indicating that shorter spraying distances contribute favorably to hardness. Conversely, when the distance exceeds 270 mm, SHAP values shift rapidly to negative values, suggesting that excessive distance compromises coating performance. The powder feeding rate also shows a significant positive trend, with higher rates corresponding to larger SHAP values; specifically, increasing the rate to 4 r/min yields a marked improvement in hardness. In contrast, torch traversing velocity shows a gradual dependence, maintaining a weak negative correlation with hardness.
These findings can be explained by the thermal and kinetic behavior of molten particles in the HVAF spraying process. Figure S7 illustrates the microstructural evolution of representative coatings deposited at varying spraying distances. As the spraying distance increases, a distinct accumulation of unmelted particles and a degradation in lamellar continuity are observed, resulting in a nearly 17-fold surge in overall porosity. At shorter spraying distances, particles retain more of their initial high velocity and thermal energy. When these high-energy particles impact the substrate, they undergo intense deformation and extensive spreading, creating irregular microstructural features that increase surface roughness. This complete spreading promotes tight stacking and strong bonding among particles and between the coating and substrate, reducing internal porosity and defects. As a result, the higher coating density enhances hardness by improving resistance to indentation and deformation. Conversely, at longer spraying distances, particles travel farther through the air, increasing convective heat loss to the surrounding environment. This leads to a drop in particle temperature, while air drag reduces their velocity, diminishing kinetic energy during impact [37,38]. As a result, particles may cool prematurely, which increases their viscosity and limits their ability to spread upon impact. The lower spreading efficiency leads to a smoother but less compact microstructure. Additionally, partially unmelted particles may be embedded in the coating or fail to bond adequately, introducing porosity and structural defects that weaken the mechanical properties [39,40]. As shown in Figure S8, surface irregularities decrease significantly with increasing spraying distance, providing visual evidence of insufficient particle deformation and bonding. This correlates with increased internal porosity, unmelted regions, and other structural discontinuities, which together reduce the overall coating density and hardness [41]. This trend aligns well with established findings in the field of thermal spraying. Liu et al. [10,13] developed a two-stage ANN model, showing that the spraying distance has a significant negative impact on hardness, primarily by reducing the velocity and temperature of in-flight particles. Similarly, using an improved WOA-ANN model for the HVAF process, Ye et al. [12] observed a distinct negative correlation between spraying distance and coating hardness, with optimization results confirming that shorter spraying distances are required to achieve high hardness.
To investigate the nonlinear interactions among key process parameters, a parameter interaction effect matrix was constructed based on the SHAP interaction analysis framework. Each element in the matrix quantifies the pairwise interaction strength between two features, capturing how the combined influence of two parameters deviates from the sum of their individual effects. The diagonal elements of the matrix represent the independent contribution of each parameter to the prediction of coating hardness. Larger diagonal values indicate stronger individual effects. To improve visualization clarity, all diagonal values were set to zero in Figure 8, and the complete matrix, including diagonals, is provided in Figure S9. From the matrix, the most striking observation is the dominant role of spraying distance. Its diagonal value reaches 0.768, which significantly exceeds the independent or interaction contributions of all other parameters. This highlights spraying distance as the most critical factor governing coating hardness, both in isolation and in the context of parameter interactions.
Among all parameter pairs, the interactions between torch traversing velocity and spraying distance (0.053) and between torch shifting distance and spraying distance (0.048) exhibited the highest interaction strengths. These values suggest the existence of non-negligible synergistic effects between these parameters. The two strongest interaction pairs are visualized in Figure S10, which presents three-dimensional SHAP interaction surfaces illustrating their joint influence on coating hardness. However, it is important to note that despite these interactions, spraying distance maintains a dominant main effect. Its overwhelming contribution implies that its influence on coating performance arises primarily from its independent mechanism, rather than through interaction with other parameters. In fact, when a feature exhibits such a strong main effect, its interaction effects may become statistically insignificant or even masked by the magnitude of its individual impact. This observation further underscores the importance of precisely controlling spraying distance in HVAF processes. It remains the most influential and decisive parameter for optimizing coating hardness, exerting a direct and dominant influence on performance outcomes, regardless of its interactions with other variables.
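For reference, the per-sample SHAP interaction values that underlie such a matrix are commonly defined as follows. This is the standard formulation from the SHAP literature rather than a statement of the exact aggregation used here. The notation is an assumption: N is the set of M features, f_x(S) is the model output for sample x when only the features in S are known, and φ_i is the ordinary SHAP value of feature i.

```latex
\begin{align}
\Phi_{i,j} &= \sum_{S \subseteq N \setminus \{i,j\}}
  \frac{|S|!\,(M-|S|-2)!}{2\,(M-1)!}\,\nabla_{ij}(S), \qquad i \neq j, \\
\nabla_{ij}(S) &= f_x(S \cup \{i,j\}) - f_x(S \cup \{i\}) - f_x(S \cup \{j\}) + f_x(S), \\
\Phi_{i,i} &= \phi_i - \sum_{j \neq i} \Phi_{i,j}.
\end{align}
```

Under this definition, the total interaction between two features is split equally between Φ_{i,j} and Φ_{j,i}, and each diagonal term Φ_{i,i} is the part of φ_i that remains after all pairwise interactions are removed. A global matrix such as the one discussed above would typically be obtained by averaging the absolute values over samples, which is an assumption about the procedure rather than a detail given in the text.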

3.3.2. Analysis of Coating Microhardness Uniformity

In evaluating coating performance, in addition to average microhardness, hardness uniformity is also critical. Variations in microhardness reflect not only the distribution of internal defects and the degree of deposition density but also local differences in bonding quality—factors that may significantly influence the overall service reliability of the coating. Therefore, building on the analysis of average hardness, this study introduces the microhardness standard deviation as a key statistical indicator to quantify uniformity, enabling a more comprehensive understanding of how HVAF process parameters affect structural consistency and service stability.
As shown in Figure 9, the ranked influence of spraying features on hardness uniformity is as follows: spraying distance, torch traversing velocity, torch shifting distance, powder feeding rate, propane pressure, and air pressure. Unlike the case of average hardness, no single parameter overwhelmingly dominates the variation in microhardness standard deviation. This suggests that hardness uniformity is governed by a complex, multivariate mechanism involving the collective effects of multiple parameters. To further investigate this relationship, SHAP dependence plots and a global SHAP interaction strength matrix were constructed (Figures S11 and S12). While spraying distance remains the most influential factor, its relationship with uniformity is non-linear. Specifically, a sharp reduction in microhardness standard deviation is observed when the spraying distance exceeds 210 mm, followed by a continued, albeit slight, decline as the distance increases further. Torch traversing velocity also exerts a significant impact, with the standard deviation decreasing markedly at velocities above 1 m/s. Furthermore, gas pressure makes a strong positive contribution: increasing it significantly reduces the microhardness standard deviation, thereby enhancing overall uniformity. In the interaction matrix, spraying distance still exhibited the highest independent contribution (score = 7.939), but its dominance was not overwhelming: torch traversing velocity (4.467) and torch shifting distance (3.092) also demonstrated significant independent contributions. In addition, several off-diagonal elements indicated non-negligible synergistic effects between parameters, for example, torch shifting distance and powder feeding rate (1.471), spraying distance and powder feeding rate (1.194), and torch traversing velocity and spraying distance (1.182). These results confirm that variations in hardness uniformity are driven by coupled mechanisms, rather than attributable to any single parameter alone.
The physical mechanisms underlying these effects can be attributed to the joint regulation of particle temperature and velocity by the spraying parameters, which determine the degree of particle melting and the uniformity of their deposition. Specifically, lower powder feeding rates and longer spraying distances help particles maintain a more stable velocity and thermal profile during flight. This promotes uniform melting and consistent deposition, thereby preventing localized overaccumulation or thickness variation, which can introduce non-uniform hardness and microstructural gradients [41,42,43]. Higher torch traversing velocity and larger step distances reduce the dwell time of the torch at a specific location. This prevents localized overheating and excessive coating buildup, allowing particles to deposit more uniformly. As a result, the formation of thermal stress gradients and heterogeneous phase transitions is minimized, thereby reducing localized hardness fluctuations [44,45]. Higher air and propane pressures increase the kinetic energy and temperature of the combustion flow, leading to higher in-flight particle velocities and more efficient heat transfer. This allows particles to be heated and melted evenly, thereby reducing thermal variance. Moreover, increased flow pressure and temperature help ensure more uniform spreading upon impact with the substrate, promoting the formation of dense and homogeneous coating structures [46,47,48].

3.4. Parameter Space Expansion and Screening

To further explore the optimization potential of the HVAF process and identify a broader set of high-performance parameter combinations, the parameter space was systematically expanded via interpolation based on the six original key parameters. The expanded parameter design is detailed in Table 2. Machine learning models are essentially pattern recognizers rather than physical simulators: they learn statistical correlations embedded in the training data [49], so their predictions are reliable only within the region covered by that data. Therefore, this study restricts the expansion to interpolation within the original parameter ranges, without extrapolating beyond the data boundaries. Specifically, air pressure, propane pressure, torch traversing velocity, torch shifting distance, and powder feeding rate were each divided into five evenly spaced levels, while spraying distance was refined into 15 discrete values with a step size of 15 mm. This expansion yielded a theoretical design space of 9375 combinations, approximately 32 times larger than the original parameter matrix. After removing the 288 combinations already covered by experiments, 9087 new parameter sets were generated for prediction. The GBR model trained with DDPM-augmented data was directly used to predict the mean microhardness and standard deviation for each new parameter set. No additional retraining was required, as the model had already incorporated high-fidelity synthetic samples that enhanced its generalization capability.
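The interpolative expansion can be sketched as a Cartesian product over per-parameter levels with the already-tested combinations removed. The snippet below is a minimal illustration. The parameter names, numeric ranges, and level counts are placeholders, not the values in Table 2, and `linspace` and `expand_grid` are hypothetical helpers.

```python
from itertools import product


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [round(lo + k * step, 6) for k in range(n)]


def expand_grid(levels, tested):
    """Cartesian product of per-parameter levels, minus tested combinations.

    levels: dict mapping parameter name -> list of candidate values.
    tested: iterable of value tuples, ordered like the keys of `levels`.
    """
    names = list(levels)
    tested = set(tested)
    return [dict(zip(names, combo))
            for combo in product(*(levels[name] for name in names))
            if combo not in tested]


if __name__ == "__main__":
    # Placeholder bounds: interpolation stays inside each tested range.
    levels = {
        "air_pressure": linspace(80.0, 100.0, 5),
        "spraying_distance": linspace(180.0, 390.0, 15),  # 15 mm step
    }
    already_tested = [(80.0, 180.0), (100.0, 390.0)]
    new_sets = expand_grid(levels, already_tested)
    print(len(new_sets), "untested combinations")
```

Because every level is generated between the minimum and maximum of the tested range, the resulting grid never asks the model to extrapolate beyond the training data.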
A two-stage screening strategy was developed to identify high-performance process combinations effectively. In the first stage, screening thresholds were defined based on the distribution characteristics of the original dataset, in which coating microhardness ranged from 600 HV0.1 to 1200 HV0.1, with a maximum value of 1165.13 HV0.1 and a minimum standard deviation of 62.02 HV0.1. The threshold was set at the 80th percentile level for both indicators, with coating hardness used as the primary criterion and the microhardness standard deviation as a secondary constraint. Parameter combinations predicted to exceed the 80th percentile of hardness and fall below the 80th percentile of standard deviation were selected as initial candidates. In the second stage, SHAP value analysis was used to evaluate the feature-level contributions of each process parameter to microhardness in the candidate sets. Preference was given to combinations in which the spraying distance was less than 270 mm and all other parameters had neutral or positive SHAP contributions. Through a comprehensive examination of SHAP dependence plots and interaction matrices, 98 parameter combinations were ultimately identified as optimal, exhibiting both high predictive confidence and strong interpretability.
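The two-stage screening logic can be expressed compactly in code. The sketch below follows a literal reading of the criteria above. The dictionary keys (`pred_hardness`, `pred_std`, `spraying_distance`, `shap`), the linear-interpolation percentile, and the example numbers are all illustrative assumptions rather than details taken from the study.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def screen(candidates, ref_hardness, ref_std, q=80, max_distance=270.0):
    """Return (stage1, stage2) candidate lists.

    Stage 1: predicted hardness above, and predicted std below, the q-th
    percentile of the reference (original) dataset.
    Stage 2: spraying distance under max_distance and no negative SHAP
    contribution from any other parameter.
    """
    h_min = percentile(ref_hardness, q)
    s_max = percentile(ref_std, q)
    stage1 = [c for c in candidates
              if c["pred_hardness"] > h_min and c["pred_std"] < s_max]
    stage2 = [c for c in stage1
              if c["spraying_distance"] < max_distance
              and all(v >= 0 for k, v in c["shap"].items()
                      if k != "spraying_distance")]
    return stage1, stage2


if __name__ == "__main__":
    # Made-up reference values and candidates, for illustration only.
    ref_h = [600, 800, 1000, 1100, 1200]
    ref_s = [60, 80, 100, 120, 140]
    cands = [
        {"pred_hardness": 1150, "pred_std": 70, "spraying_distance": 240,
         "shap": {"spraying_distance": 50.0, "powder_feeding_rate": 5.0}},
        {"pred_hardness": 1150, "pred_std": 70, "spraying_distance": 300,
         "shap": {"spraying_distance": -40.0, "powder_feeding_rate": 5.0}},
    ]
    s1, s2 = screen(cands, ref_h, ref_s)
    print(len(s1), "pass stage 1;", len(s2), "pass stage 2")
```

Separating the statistical filter from the SHAP-based filter keeps the second stage interpretable: a candidate is rejected there only for a reason that can be read directly from its feature attributions.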
To verify the predictive accuracy of the GBR model, one optimal combination (Sample A), two medium-performance combinations (Samples B and C), and one lower-performance combination (Sample D) were selected from the screened dataset. In addition, the best-performing condition from the original experimental dataset (Sample E) was included as a baseline reference. The detailed process parameters of these combinations are provided in Table 3.
All coatings (Samples A–D) were prepared and characterized following the same HVAF spraying and testing procedures used for the original dataset. As shown in Figure 10, the experimentally measured average microhardness values of Samples A, B, C, and D were 1254.13 HV0.1, 1113.62 HV0.1, 1007.17 HV0.1, and 987.35 HV0.1, with corresponding standard deviations of 68.9 HV0.1, 120.13 HV0.1, 123.59 HV0.1, and 134.25 HV0.1, respectively. The relative deviation between experimental and predicted values remains below 2%, indicating that the error is within permissible limits. These experimental results show excellent agreement with the GBR model predictions, demonstrating that the proposed framework can not only accurately identify peak-performance conditions but also provide reliable performance estimations across the high-performance parameter space. Notably, the hardness of Sample A exceeded the maximum value observed in the original dataset, while exhibiting the lowest variability. This indicates that the coating produced under these conditions possessed exceptional density and structural uniformity, achieving optimal overall performance. These findings confirm the effectiveness of the proposed optimization framework in identifying high-performance parameter combinations that extend beyond the originally explored experimental range.

3.5. Comparative Evaluation of Friction and Wear Performance

Building upon the previous parameter space expansion and two-stage screening, this study not only optimized the average microhardness and hardness uniformity of the HVAF-sprayed coatings but also investigated the implications for service reliability, with a particular focus on friction and wear behavior. As a critical indicator of long-term performance in thermal spray applications, wear resistance is closely related to a coating’s hardness, density, and microstructural integrity.
To assess this, a friction and wear test was conducted using Sample A—the optimal predicted parameter combination—and Sample E, the best-performing combination from the original dataset in terms of hardness. Figure 11 presents the cross-sectional wear track profiles, 3D surface morphologies, and quantitative wear metrics of the two coatings under a 15 N applied load. As shown in Figure 11(a1), the wear track depth of Sample A is 3.94 µm, while that of Sample E is 2.93 µm. The corresponding wear track widths are 0.665 mm for Sample A and 0.709 mm for Sample E. Although Sample E exhibits a shallower wear depth, its wear track is slightly wider than that of Sample A. The wear rate was calculated and presented in Figure 11(b1). These results indicate that Sample A achieved approximately a 13.6% improvement in wear resistance compared to Sample E. Despite a slightly deeper wear track, the wear rate was lower for Sample A, reflecting its enhanced resistance to material loss. The superior performance of Sample A can be attributed to its denser microstructure and higher microhardness, which contribute to improved load-bearing capacity and reduced wear-induced damage. Furthermore, its high hardness uniformity supports a more uniform mechanical response during sliding contact.
The friction coefficient curves for Samples A and E under dry sliding conditions are shown in Figure 11(b2). The wear process can be divided into two distinct stages: the running-in stage and the steady-state wear stage. During the initial running-in stage, the friction coefficient exhibits pronounced dynamic fluctuations. This behavior is primarily attributed to the residual surface roughness of the polished coatings. As the counterface slides against the coating, surface asperities are gradually worn down, resulting in intermittent increases and decreases in friction. Once the surfaces stabilize, the system transitions into the steady-state wear stage, where both coatings display relatively stable friction behavior with minor fluctuations. These residual fluctuations are mainly due to the accumulation and redistribution of wear debris particles within the contact interface during sliding. Under an applied load of 15 N, the average friction coefficients for Sample A and Sample E were measured as 0.579 and 0.568, respectively.
Figure 12(a1–a3,b1–b3) shows the worn surface morphologies of Samples A and E, respectively. Both samples exhibit parallel plowing grooves aligned with the sliding direction, a hallmark of abrasive wear, indicating that hard debris particles at the contact interface contributed to material removal through a cutting mechanism. During sliding, adhesive contact between the coating surface and the counterface led to localized metal bonding. Subsequent relative motion caused these junctions to rupture, generating debris particles that further acted as abrasive agents, deepening the grooves. Additionally, the accumulation of wear debris in localized regions likely induced contact stress concentration, exacerbating surface material spallation. Energy-dispersive spectroscopy (EDS) analysis revealed the presence of oxidation during the wear process. In the EDS maps (Figure 12(a3,b3)), dark-contrast regions correspond to oxidized zones. The elemental composition at representative point A in these zones was measured as follows: Sample A: Fe18.82Ni1.85Nb1.39Cr6.04Mo1.73Si5.18O65.00; Sample E: Fe17.13Ni1.59Nb1.27Cr5.69Mo1.71Si9.11O63.50. Because EDS has limited accuracy in detecting light elements such as B under vacuum conditions, this element was not considered in the analysis. Magnified surface images also revealed microcracks and delamination, confirming the occurrence of brittle spallation during wear. Using ImageJ software (version 2.35), the oxidized area fractions (black contrast regions) were quantified via grayscale thresholding. The results showed that Sample A had a value of 3.54% and Sample E had a value of 3.56%. This negligible difference indicates that oxidation played a comparable role in the wear processes of both coatings. In summary, under a 15 N load, both coatings experienced synergistic wear mechanisms, including abrasive wear, oxidation wear, and brittle delamination. However, differences in the severity of spallation and groove depth were observed between the two samples, highlighting the influence of processing parameters on wear resistance.
In conclusion, the enhanced microhardness and reduced hardness fluctuation of Sample A led to the formation of a denser, more homogeneous microstructure, which in turn significantly improved its resistance to abrasive and oxidative wear. These findings further verify that microhardness and structural uniformity are key contributors to the wear resistance of thermally sprayed coatings. The tribological tests conducted in this study constitute short-term comparative evaluations under fixed loading and sliding conditions. Consequently, the results primarily serve to validate the effectiveness of the optimized spraying parameters rather than to provide a comprehensive assessment of long-term coating durability under complex service environments.

4. Conclusions

To overcome the challenges of strong multi-parameter coupling, limited experimental data, and insufficient model interpretability in the optimization of HVAF-sprayed Fe-based amorphous coatings, a systematic optimization framework integrating data augmentation with explainable machine learning was developed and validated. By coupling a DDPM-based data augmentation strategy with SHAP, the proposed framework simultaneously enhances predictive accuracy and provides quantitative mechanistic insight into process–property relationships. The principal conclusions can be summarized as follows:
(1) A comprehensive comparison of three generative models (GAN, VAE, and DDPM) demonstrated that DDPM exhibits superior statistical fidelity and distribution consistency with respect to real HVAF process data, thereby effectively mitigating the limitations imposed by data scarcity. Among ten representative regression algorithms, GBR delivered the highest predictive accuracy and robustness. Furthermore, augmenting the training dataset with 10% DDPM-generated samples led to a further improvement in both prediction accuracy and model generalization.
(2) SHAP-based feature importance and interaction analyses quantitatively revealed that spraying distance plays a dominant governing role in determining coating microhardness, substantially outweighing the contributions of other process parameters. In contrast, hardness uniformity is jointly regulated by the coupled effects of torch traversing velocity, torch shifting distance, and powder feeding rate. These findings not only enhance the transparency and interpretability of the predictive models but also provide mechanism-informed guidance directly applicable to engineering-oriented process optimization.
(3) By interpolatively expanding the original process parameter space and implementing a two-stage screening strategy, 98 high-potential process parameter combinations were identified. Experimental validation confirmed that the optimal parameter set simultaneously achieved a hardness level exceeding the maximum value observed in the original experimental dataset and improved hardness uniformity, resulting in a 13.6% reduction in wear rate compared with the reference condition. These results verify the effectiveness, robustness, and practical feasibility of the proposed framework in advancing process-window optimization and enhancing the performance of HVAF-sprayed Fe-based amorphous coatings.
Although this study focuses on the microhardness and hardness uniformity of HVAF-sprayed Fe-based amorphous coatings, the proposed optimization framework is inherently extendable to other thermal spray processes and material systems. The workflow learns directly from experimental input–output data and is therefore not restricted to the specific physical equations of the HVAF process. With DDPM addressing the small-data challenge and SHAP enabling interpretation of nonlinear process effects, the framework can be readily adapted to other spraying techniques (e.g., HVOF and APS) or material compositions, provided that representative datasets are available. This flexibility underscores the framework’s potential as a generic tool for accelerating process optimization across the broader field of surface engineering.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/coatings16020199/s1: Figure S1. Frequency distribution histogram of the measured coating microhardness values from the original experimental dataset, indicating a predominant concentration in the medium-to-high-hardness range. Figure S2. Schematic illustration of the working principle of a Generative Adversarial Network (GAN). Figure S3. Working principle of the Variational Autoencoder (VAE). Figure S4. Schematic illustration of the working principle of DDPM. The diagram shows the forward diffusion process, where noise is progressively added to the data, and the reverse generation process, where the model learns to denoise and recover the original data distribution. Figure S5. Schematic workflow of the generative adversarial network framework integrated with physics-informed surrogate model constraints. Table S1. Performance metrics for the preliminary screening of ten regression algorithms on training and test datasets. Table S2. Optimal hyperparameters for each machine learning model. Table S3. Multi-dimensional evaluation metrics for generative models. Figure S6. 3D UMAP visualizations of synthetic data distributions generated by (a) GAN, (b) VAE, and (c) DDPM. Figure S7. Cross-sectional morphologies and porosity of Fe-based amorphous coatings: (a–h) cross-sectional morphologies of coatings deposited at increasing spraying distances (different magnifications were selected for clarity); (i) corresponding coating porosity. Figure S8. Macroscopic surface morphologies of typical coatings with increasing spraying distance. Figure S9. SHAP interaction matrix for coating hardness, including diagonal elements. Figure S10. Three-dimensional distribution map of strong interaction parameters versus coating microhardness: (a) torch traversing velocity vs. spraying distance; (b) torch shifting distance vs. spraying distance. Figure S11. SHAP dependence plots of standard deviation of coating hardness with respect to (a) spraying distance; (b) torch traversing speed; (c) torch shifting distance; (d) powder feeding rate; (e) propane pressure; and (f) air pressure. Figure S12. SHAP interaction strength matrix for the standard deviation of coating hardness.

Author Contributions

Conceptualization, Z.Z., Z.J. and B.Z.; methodology, E.Z., C.M. and J.Y.; investigation, E.Z. and C.M.; data curation, J.Y. and S.Y.; writing—original draft preparation, E.Z. and C.M.; writing—review and editing, E.Z. and C.M.; supervision, Z.Z., Z.J. and B.Z.; funding acquisition, Z.Z., Z.J. and B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (52275225).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Acknowledgments

The authors would like to express their sincere gratitude to Peisong Song from Hohai University and Xili Liu from Northeastern University for their professional assistance in language polishing and proofreading of this paper. Their meticulous revisions and valuable suggestions have significantly improved the quality of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Guo, R.; Zhang, C.; Chen, Q.; Yang, Y.; Li, N.; Liu, L. Study of structure and corrosion resistance of Fe-based amorphous coatings prepared by HVAF and HVOF. Corros. Sci. 2011, 53, 2351–2356. [Google Scholar] [CrossRef]
  2. Silveira, L.; Pukasiewicz, A.; de Aguiar, D.; Zara, A.; Björklund, S. Study of the corrosion and cavitation resistance of HVOF and HVAF FeCrMnSiNi and FeCrMnSiB coatings. Surf. Coat. Technol. 2019, 374, 910–922. [Google Scholar] [CrossRef]
  3. Zhang, E.; Zhang, Z.; Jing, Z.; Yuan, J.; Ma, C.; Yan, S.; Zhang, S.; Liang, X. Research Progress on Process Optimization of Thermal-Sprayed Iron-Based Amorphous Coatings. Integr. Mater. Manuf. Innov. 2025, 14, 247–275. [Google Scholar] [CrossRef]
  4. Li, M.; Christofides, P.D. Modeling and analysis of HVOF thermal spray process accounting for powder size distribution. Chem. Eng. Sci. 2003, 58, 849–857. [Google Scholar] [CrossRef]
  5. Oksa, M.; Turunen, E.; Suhonen, T.; Varis, T.; Hannula, S.-P. Optimization and Characterization of High Velocity Oxy-fuel Sprayed Coatings: Techniques, Materials, and Applications. Coatings 2011, 1, 17–52. [Google Scholar] [CrossRef]
  6. Yan, S.; Yuan, J.; Ma, C.; Zhang, Z.; Jing, Z.; Chu, Z.; Liang, X. Innovative preparation and corrosion resistance of Fe-based amorphous coatings fabricated by ultra short pulse laser manufacturing process. J. Alloys Compd. 2025, 1049, 185410. [Google Scholar] [CrossRef]
  7. Gurgenc, T.; Altay, O.; Ulas, M.; Ozel, C. Extreme learning machine and support vector regression wear loss predictions for magnesium alloys coated using various spray coating methods. J. Appl. Phys. 2020, 127, 185103. [Google Scholar] [CrossRef]
  8. Zhang, Y.; Ling, C. A strategy to apply machine learning to small datasets in materials science. npj Comput. Mater. 2018, 4, 25. [Google Scholar] [CrossRef]
  9. McGovern, A.; Lagerquist, R.; Gagne, D.J.; Jergensen, G.E.; Elmore, K.L.; Homeyer, C.R.; Smith, T. Making the Black Box More Transparent: Understanding the Physical Implications of Machine Learning. Bull. Am. Meteorol. Soc. 2019, 100, 2175–2199. [Google Scholar] [CrossRef]
  10. Liu, M.; Yu, Z.; Zhang, Y.; Wu, H.; Liao, H.; Deng, S. Prediction and analysis of high velocity oxy fuel (HVOF) sprayed coating using artificial neural network. Surf. Coat. Technol. 2019, 378, 124988. [Google Scholar] [CrossRef]
  11. Paturi, U.M.R.; Cheruku, S.; Geereddy, S.R. Process modeling and parameter optimization of surface coatings using artificial neural networks (ANNs): State-of-the-art review. Mater. Today Proc. 2021, 38, 2764–2774. [Google Scholar] [CrossRef]
  12. Ye, W.; Wang, W.; Su, Y.; Qi, W.; Feng, L.; Xie, L. Prediction of HVAF thermal spraying parameters and coating properties based on improved WOA-ANN method. Mater. Today Commun. 2024, 39, 109265. [Google Scholar] [CrossRef]
  13. Liu, M.; Yu, Z.; Wu, H.; Liao, H.; Zhu, Q.; Deng, S. Implementation of Artificial Neural Networks for Forecasting the HVOF Spray Process and HVOF Sprayed Coatings. J. Therm. Spray Technol. 2021, 30, 1329–1343. [Google Scholar] [CrossRef]
  14. Lv, Y.; Wang, Z.; Cheng, S.; Di, J.; Zhao, T.; Fan, H.; Ning, Z.; Sun, J.; Huang, Y. Multi-Objective Optimization of HVAF-Sprayed Fe-Based Amorphous Alloy Coatings via Machine Learning for Superior Corrosion Resistance. Corros. Sci. 2025, 256, 113225. [Google Scholar] [CrossRef]
  15. Gao, T.; Gao, J.; Zhou, H.; Zhang, S.; Wang, D.; Yang, B.; Sun, W.; Wang, J. Achieving superior corrosion resistance in HVAF-sprayed Fe-based amorphous alloy coatings through data-driven machine learning. J. Mater. Sci. Technol. 2025, 247, 171–187. [Google Scholar] [CrossRef]
  16. Qin, X.; Wang, Q.; Zhao, X.; Xia, S.; Wang, L.; Zhang, Y.; He, C.; Chen, D.; Jiang, B. PCS: Property-composition-structure chain in Mg-Nd alloys through integrating sigmoid fitting and conditional generative adversarial network modeling. Scr. Mater. 2025, 265, 116762. [Google Scholar] [CrossRef]
  17. Wang, Q.; Qin, X.; Xia, S.; Wang, L.; Wang, W.; Huang, W.; Song, Y.; Tang, W.; Chen, D. Interpretable machine learning excavates a low-alloyed magnesium alloy with strength-ductility synergy based on data augmentation and reconstruction. J. Magnes. Alloys 2025, 13, 2866–2883. [Google Scholar] [CrossRef]
  18. Verdon, C.; Karimi, A.; Martin, J.-L. A study of high velocity oxy-fuel thermally sprayed tungsten carbide based coatings. Part 1: Microstructures. Mater. Sci. Eng. A 1998, 246, 11–24. [Google Scholar] [CrossRef]
  19. ASTM E384-22; Standard Test Method for Microindentation Hardness of Materials. ASTM: West Conshohocken, PA, USA, 2022.
  20. Sharma, N.; Malviya, L.; Jadhav, A.; Lalwani, P. A hybrid deep neural net learning model for predicting Coronary Heart Disease using Randomized Search Cross-Validation Optimization. Decis. Anal. J. 2023, 9, 100331. [Google Scholar] [CrossRef]
  21. Shao, S.; Wang, P.; Yan, R. Generative adversarial networks for data augmentation in machine fault diagnosis. Comput. Ind. 2019, 106, 85–93. [Google Scholar] [CrossRef]
  22. Chadebec, C.; Thibeau-Sutre, E.; Burgos, N.; Allassonnière, S. Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2879–2896. [Google Scholar] [CrossRef]
  23. Shu, K.; Wu, L.; Zhao, Y.; Liu, A.; Qian, R.; Chen, X. Data Augmentation for Seizure Prediction with Generative Diffusion Model. IEEE Trans. Cogn. Dev. Syst. 2024, 17, 577–591. [Google Scholar] [CrossRef]
  24. Lanzante, J.R. Testing for differences between two distributions in the presence of serial correlation using the Kolmogorov–Smirnovand Kuiper’s tests. Int. J. Clim. 2021, 41, 6314–6323. [Google Scholar] [CrossRef]
  25. Wang, H.; Liang, Q.; Hancock, J.T.; Khoshgoftaar, T.M. Feature selection strategies: A comparative analysis of SHAP-value and importance-based methods. J. Big Data 2024, 11, 44. [Google Scholar] [CrossRef]
  26. Ji, S.; Wang, X.; Lyu, T.; Liu, X.; Wang, Y.; Heinen, E.; Sun, Z. Understanding cycling distance according to the prediction of the XGBoost and the interpretation of SHAP: A non-linear and interaction effect analysis. J. Transp. Geogr. 2022, 103, 103414. [Google Scholar] [CrossRef]
  27. Armstrong, G.; Martino, C.; Rahman, G.; Gonzalez, A.; Vázquez-Baeza, Y.; Mishne, G.; Knight, R. Uniform Manifold Approximation and Projection (UMAP) Reveals Composite Patterns and Resolves Visualization Artifacts in Microbiome Data. mSystems 2021, 6, e0069121. [Google Scholar] [CrossRef]
  28. Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef]
  29. Belkina, A.C.; Ciccolella, C.O.; Anno, R.; Halpert, R.; Spidlen, J.; Snyder-Cappione, J.E. Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nat. Commun. 2019, 10, 5415. [Google Scholar] [CrossRef]
  30. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  31. Girin, L.; Leglaive, S.; Bie, X.; Diard, J.; Hueber, T.; Alameda-Pineda, X. Dynamical Variational Autoencoders: A Comprehensive Review. Found. Trends Mach. Learn. 2022, 15, 1–175. [Google Scholar] [CrossRef]
  32. Alblwi, A.; Makkawy, S.; Barner, K.E. D-DDPM: Deep Denoising Diffusion Probabilistic Models for Lesion Segmentation and Data Generation in Ultrasound Imaging. IEEE Access 2025, 13, 41194–41209. [Google Scholar] [CrossRef]
  33. Yu, T.; Li, C.; Huang, J.; Xiao, X.; Zhang, X.; Li, Y.; Fu, B. ReF-DDPM: A novel DDPM-based data augmentation method for imbalanced rolling bearing fault diagnosis. Reliab. Eng. Syst. Saf. 2024, 251, 110343. [Google Scholar] [CrossRef]
  34. Mumuni, A.; Mumuni, F. Data augmentation: A comprehensive survey of modern approaches. Array 2022, 16, 100258. [Google Scholar] [CrossRef]
  35. Tran, N.-T.; Tran, V.-H.; Nguyen, N.-B.; Nguyen, T.-K.; Cheung, N.-M. On Data Augmentation for GAN Training. IEEE Trans. Image Process. 2021, 30, 1882–1897. [Google Scholar] [CrossRef]
  36. Li, X.; Sun, J.; Chen, X. Machine Learning-Based Prediction of High-Entropy Alloy Hardness: Design and Experimental Validation of Superior Hardness. Trans. Indian Inst. Met. 2024, 77, 3973–3981. [Google Scholar] [CrossRef]
  37. Mauer, G.; Rauwald, K.-H.; Sohn, Y.J.; Vaßen, R. The Potential of High-Velocity Air-Fuel Spraying (HVAF) to Manufacture Bond Coats for Thermal Barrier Coating Systems. J. Therm. Spray Technol. 2023, 33, 746–755. [Google Scholar] [CrossRef]
  38. Bobzin, K.; Zhao, L.; Heinemann, H.; Burbaum, E. Influence of the atmospheric plasma spraying parameters on the coating structure and the deposition efficiency of silicon powder. Int. J. Adv. Manuf. Technol. 2022, 123, 35–47. [Google Scholar] [CrossRef]
  39. Palanisamy, K.; Gangolu, S.; Antony, J.M. Effects of HVOF spray parameters on porosity and hardness of 316L SS coated Mg AZ80 alloy. Surf. Coat. Technol. 2022, 448, 128898. [Google Scholar] [CrossRef]
  40. Katranidis, V.; Kamnis, S.; Allcock, B.; Gu, S. Effects and Interplays of Spray Angle and Stand-off Distance on the Sliding Wear Behavior of HVOF WC-17Co Coatings. J. Therm. Spray Technol. 2019, 28, 514–534. [Google Scholar] [CrossRef]
  41. Masoumeh, G.; Shahrooz, S.; Mahmood, G.; Ahmad, S.E. Investigation of stand-off distance effect on structure, adhesion and hardness of copper coatings obtained by the APS technique. J. Theor. Appl. Phys. 2018, 12, 85–91. [Google Scholar] [CrossRef]
  42. El-Awadi, G.A. Review of effective techniques for surface engineering material modification for a variety of applications. AIMS Mater. Sci. 2023, 10, 652–692. [Google Scholar] [CrossRef]
  43. Yue, K.; Lian, G.; Zeng, J.; Chen, C.; Lan, R.; Kong, L. An investigation on graphite behavior and coating properties in the molten pool based on different powder particle sizes. Heliyon 2023, 9, e14222. [Google Scholar] [CrossRef]
  44. Mauer, G.; Rauwald, K.-H.; Sohn, Y.J.; Weirich, T.E. Cold Gas Spraying of Nickel-Titanium Coatings for Protection Against Cavitation. J. Therm. Spray Technol. 2020, 30, 131–144. [Google Scholar] [CrossRef]
  45. Tillmann, W.; Hagen, L.; Luo, W. Process Parameter Settings and Their Effect on Residual Stresses in WC/W2C Reinforced Iron-Based Arc Sprayed Coatings. Coatings 2017, 7, 125. [Google Scholar] [CrossRef]
  46. Xie, Y.; Chen, C.; Planche, M.-P.; Deng, S.; Huang, R.; Ren, Z.; Liao, H. Strengthened Peening Effect on Metallurgical Bonding Formation in Cold Spray Additive Manufacturing. J. Therm. Spray Technol. 2019, 28, 769–779. [Google Scholar] [CrossRef]
  47. Gao, X.; Li, C.; Zhang, D.; Gao, H.; Han, X. Numerical analysis of the activated combustion high-velocity air-fuel (AC-HVAF) thermal spray process: A survey on the parameters of operation and nozzle geometry. Surf. Coat. Technol. 2021, 405, 126588. [Google Scholar] [CrossRef]
  48. Thoutam, A.K.; Lamana, M.S.; de Castilho, B.C.N.M.; Ben Ettouil, F.; Chandrakar, R.; Bessette, S.; Brodusch, N.; Gauvin, R.; Dolatabadi, A.; Moreau, C. The Role of HVAF Nozzle Design and Process Parameters on In-Flight Particle Oxidation and Microstructure of NiCoCrAlY Coatings. Coatings 2025, 15, 355. [Google Scholar] [CrossRef]
  49. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Fundamental characteristics of Fe-based amorphous powders: (a) SEM morphology; (b) XRD pattern.
Figure 2. Schematic diagram of the HVAF coating preparation process.
Figure 3. The workflow diagram of the data augmentation strategy and dual-dimensional evaluation framework.
Figure 4. (a1,a2) Performance comparison of different models; (b1–b4) correlation between predicted and actual values for the optimal model in the training and test sets.
Figure 5. (a) Comparison of statistical deviations between synthetic and real data: (a1) mean deviation; (a2) standard deviation. (b1–b3) Two-dimensional UMAP visualization showing the distribution consistency between synthetic data generated by different models and the real dataset.
Figure 6. Impact of varying proportions of DDPM-generated synthetic data on the predictive performance of four representative machine learning models on the independent test set: (a) R²; (b) RMSE.
Figure 7. SHAP summary visualization for coating microhardness: (a1) distribution of SHAP values across all features; (a2) ranking of feature importance; (b1–b6) SHAP dependence plots illustrating the effects of individual spraying parameters.
Figure 8. Matrix of pairwise interaction strengths among the six HVAF process parameters for coating hardness, derived from SHAP interaction analysis. Diagonal elements representing independent effects are set to zero for clarity.
Figure 9. SHAP summary plots for the standard deviation of coating hardness: (a) distribution of SHAP values across all features; (b) ranking of feature importance.
Figure 10. (a) Comparison between predicted and experimentally measured coating microhardness; (b) comparison between predicted and measured standard deviation of microhardness.
Figure 11. (a1) Cross-sectional wear track profiles of Samples A and E; (a2) 3D morphology of Sample A; (a3) 3D morphology of Sample E; (b1) wear rate comparison; (b2) comparison of friction coefficient curves.
Figure 12. Worn surface morphologies of HVAF-sprayed coatings under dry sliding conditions at 15 N load: (a1–a3) Sample A; (b1–b3) Sample E.
Table 1. Design of HVAF spraying parameters.

| Parameter | Levels | Values |
|---|---|---|
| Air pressure (psi) | 3 | 60, 70, 80 |
| Propane pressure (psi) | 3 | 62, 73, 84 |
| Torch traversing velocity (m/s) | 3 | 0.5, 1, 1.5 |
| Torch shifting distance (mm) | 2 | 2, 4 |
| Powder feeding rate (r/min) | 2 | 2, 4 |
| Spraying distance (mm) | 8 | 180, 210, 240, 270, 300, 330, 360, 390 |
Table 2. Expanded parameter space design.

| Parameter | Levels | Values |
|---|---|---|
| Air pressure (psi) | 5 | 60, 65, 70, 75, 80 |
| Propane pressure (psi) | 5 | 62, 67, 73, 78, 84 |
| Torch traversing velocity (m/s) | 5 | 0.5, 0.8, 1, 1.3, 1.5 |
| Torch shifting distance (mm) | 5 | 2, 2.5, 3, 3.5, 4 |
| Powder feeding rate (r/min) | 5 | 2, 2.5, 3, 3.5, 4 |
| Spraying distance (mm) | 15 | 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345, 360, 375, 390 |
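To give a sense of the size of the expanded parameter space in Table 2, the following minimal sketch enumerates every combination of the listed levels. It assumes the space is treated as a full Cartesian grid, which this section does not confirm; the dictionary keys and the `build_grid` helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Levels copied from Table 2; the key names are illustrative, not from the paper.
levels = {
    "air_pressure_psi": [60, 65, 70, 75, 80],
    "propane_pressure_psi": [62, 67, 73, 78, 84],
    "traverse_velocity_m_s": [0.5, 0.8, 1.0, 1.3, 1.5],
    "shift_distance_mm": [2, 2.5, 3, 3.5, 4],
    "powder_feed_rate_r_min": [2, 2.5, 3, 3.5, 4],
    "spray_distance_mm": list(range(180, 391, 15)),  # 180 to 390 in 15 mm steps
}


def build_grid(levels):
    """Return every parameter combination as a dict (full Cartesian product)."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


grid = build_grid(levels)
n_candidates = len(grid)  # 5 * 5 * 5 * 5 * 5 * 15 combinations
print(n_candidates)
```

Each dictionary in `grid` could then be passed to a trained regressor to rank candidate settings; the same helper applied to the Table 1 levels would give the size of the original design space.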
Table 3. Selected predicted parameter combinations and comparison with the best configuration in the original dataset.

| Sample | A | B | C | D | E |
|---|---|---|---|---|---|
| Air pressure (psi) | 80 | 65 | 70 | 70 | 80 |
| Propane pressure (psi) | 84 | 67 | 73 | 73 | 84 |
| Torch traversing velocity (m/s) | 1.5 | 1.3 | 1 | 0.8 | 1.5 |
| Torch shifting distance (mm) | 2.5 | 2.5 | 2 | 2 | 2 |
| Powder feeding rate (r/min) | 4 | 2 | 3.5 | 3.5 | 4 |
| Spraying distance (mm) | 195 | 225 | 240 | 255 | 210 |
| Predicted microhardness (HV) | 1268.92 | 1124.54 | 1010.12 | 994.15 | 1165.13 |
| Predicted microhardness standard deviation (HV) | 79.49 | 115.4 | 125.07 | 110.79 | 89.69 |

Sample categories, in column order: predicted optimal parameters (A); predicted moderate parameters; predicted poor parameters; optimal parameters of the dataset (E).
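The comparison between Sample A (predicted optimum) and Sample E (best configuration in the original dataset) in Table 3 can be made concrete with a little arithmetic. The sketch below computes the relative change in predicted hardness and in predicted standard deviation, plus the coefficient of variation (standard deviation divided by mean) as one possible uniformity measure. The coefficient of variation is an illustrative choice here, not a metric taken from the paper, and all variable and function names are hypothetical.

```python
# Predicted values copied from Table 3 (model predictions, not measurements).
table3 = {
    "A": {"hardness_hv": 1268.92, "std_hv": 79.49},
    "E": {"hardness_hv": 1165.13, "std_hv": 89.69},
}


def relative_change(new, ref):
    """Percentage change of `new` relative to `ref`."""
    return (new - ref) / ref * 100.0


def coefficient_of_variation(sample):
    """Standard deviation as a percentage of the mean hardness."""
    return sample["std_hv"] / sample["hardness_hv"] * 100.0


hardness_gain = relative_change(table3["A"]["hardness_hv"], table3["E"]["hardness_hv"])
std_change = relative_change(table3["A"]["std_hv"], table3["E"]["std_hv"])
cv_a = coefficient_of_variation(table3["A"])
cv_e = coefficient_of_variation(table3["E"])

print(f"Hardness change A vs. E: {hardness_gain:+.2f} %")
print(f"Std. dev. change A vs. E: {std_change:+.2f} %")
print(f"Coefficient of variation: A = {cv_a:.2f} %, E = {cv_e:.2f} %")
```

A positive hardness change together with a negative standard-deviation change indicates that the predicted optimum is both harder and more uniform than the dataset optimum.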
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, E.; Ma, C.; Yuan, J.; Yan, S.; Zhang, Z.; Jing, Z.; Zhang, B. Achieving High Hardness and Uniformity in Fe-Based Amorphous Coatings for Enhanced Wear Resistance via Explainable Machine Learning. Coatings 2026, 16, 199. https://doi.org/10.3390/coatings16020199