Machine Learning-Driven Comparative Analysis and Optimization of Cu-Ni-Si and Cu Low Alloys: From Data-Driven Interpretation to Inverse Design

Kolev, Mihail

doi:10.3390/alloys5020009

by Mihail Kolev ¹,²

¹ Institute of Metal Science, Equipment and Technologies with Hydro- and Aerodynamics Centre “Acad. A. Balevski”, Bulgarian Academy of Sciences, 1574 Sofia, Bulgaria

² National Center for Mechatronics and Clean Technologies, 8 “Kliment Ohridski” Blvd., Building 8, 1756 Sofia, Bulgaria

Alloys 2026, 5(2), 9; https://doi.org/10.3390/alloys5020009

Submission received: 28 February 2026 / Revised: 14 April 2026 / Accepted: 22 April 2026 / Published: 24 April 2026

Abstract

The development of high-performance copper alloys requires balancing mechanical strength and electrical conductivity, properties that are often inversely correlated due to competing strengthening mechanisms. This study presents a comparative machine learning analysis of Cu-Ni-Si and Cu low alloys using a curated dataset of 1690 entries derived from the Gorsse et al. database, comprising 1507 samples with hardness measurements and 1685 samples with electrical conductivity data. Three ensemble-based regression algorithms, Random Forest, XGBoost, and Gradient Boosting, were trained to predict Vickers hardness (HV) and electrical conductivity (%IACS) from an augmented feature set encompassing alloy composition, thermomechanical processing parameters, missingness indicators, and physics-informed descriptors (valence electron concentration, atomic size mismatch, electronegativity difference, and Ni:Si atomic ratio). XGBoost achieved optimal performance for hardness prediction (R² = 0.8554, RMSE = 29.90 HV), while Gradient Boosting performed best for electrical conductivity (R² = 0.8400, RMSE = 5.96%IACS). Averaged tree-based feature-importance analysis identified valence electron concentration as the most influential predictor for hardness (39.9%), followed by aging temperature (11.2%), while Cu content dominated conductivity prediction (37.7%), followed by aging time (8.9%). Complementary SHAP analysis confirmed these trends while revealing directional relationships and nonlinear feature interaction effects. Composition-grouped cross-validation by unique alloy formula (K = 10) yielded substantially lower performance, with grouped CV R² = 0.438 for hardness and 0.293 for conductivity, indicating that generalization to unseen alloy formulations remains limited. 
The models were further applied for practical tasks, including property prediction for new alloy compositions, processing parameter optimization via differential evolution with metallurgical constraints (achieving hardness up to 293.9 HV or conductivity up to 45.7%IACS for the same base composition, with prediction intervals reported), and inverse design to identify alloy formulations meeting specified target properties. This work demonstrates the potential of interpretable machine learning to support copper alloy development by enabling rapid computational screening of the compositional and processing parameter space, subject to the generalization limitations identified herein.

Keywords: Cu-Ni-Si alloys; machine learning; electrical conductivity; hardness prediction; SHAP analysis; alloy design

1. Introduction

Copper alloys are indispensable in electronics, connectors, lead frames, and aerospace applications due to their corrosion resistance, high strength, and thermal stability [1]. These alloys find extensive use across automotive, electrical, and nuclear engineering sectors [2,3]. Two primary strengthening strategies govern their property optimization: (i) precipitation hardening, exemplified by Cu-Ni-Si alloys [4,5], and (ii) solid-solution and work-hardening routes, characteristic of Cu-Cr, Cu-Ti, Cu-Zn, and Cu-Sn low-alloy systems [6,7]. Both strategies require careful balancing of the strength–conductivity trade-off, as microstructural changes that enhance mechanical properties often degrade electrical performance [3]. A comparative analysis of these alloy families offers scientific insight into how distinct hardening mechanisms influence this trade-off across compositionally diverse copper systems.

Machine learning (ML) has emerged as a powerful tool for mapping composition and processing parameters to material properties in copper-based alloys. In property prediction, ML models have successfully correlated elemental descriptors with Young’s modulus and yield strength in amorphous alloys [8]. The PREDICT framework leverages bond length and cohesive energy descriptors to estimate elastic constants in multi-principal element alloys [9]. Random Forest regression has proven effective for predicting hardness in tempered low-alloy steels [10]. Stacking fault energy in fcc-based alloys has been correlated with elemental attributes, particularly for elements near Ni and Co [11]. For processing optimization, ML has guided heat treatment procedures to enhance yield and impact strength in steels [12,13]. Deep learning algorithms, including convolutional neural networks, have predicted high-temperature properties of Fe-Cr-Ni-based alloys with high accuracy [14]. Powder densification behavior in Cu-Al alloys during sintering has been accurately modeled using MLP-LM networks [15]. ML has also reduced experimental efforts in optimizing the mechanical and electrical properties of pure Cu [16]. Alloy design and conductivity prediction represent another major application area. Bayesian optimization has enabled the design of copper alloys with enhanced properties [17]. Composition-oriented methods have guided high-performance Cu-Zn alloy design [18]. ML accelerated the discovery of superior Cu-Ni-Co-Si alloys, notably the Cu-2.3Ni-0.7Co-0.7Si composition [19]. Microstructure-based models have predicted electrical conductivity in Cu/Nb composites [20]. Phase stability in Cu-Ni-Si-Cr alloys has been investigated through ML, predicting new stable phases and their properties [21].

In Cu low-alloy systems, ML strategies have optimized electrical conductivity and tensile strength in Cu-In alloys [22]. Charge transfer models in Cu/Hf binary oxides have been improved through ML techniques [23]. Classification-based models have evaluated the mechanical properties of friction stir-welded Cu alloys [24]. Hot deformation behavior in Cu-Ti alloys, including microalloying effects, has been analyzed using ML [25]. Random Forest Classification has predicted sintering-induced swelling in Cu-Sn alloys [26]. Mechanism-driven optimization has benefited from integrated computational materials engineering approaches investigating Ti additions in Cu-Cr alloys for precipitate refinement and conductivity enhancement [27]. Genetic algorithm-based optimization of discarded experimental data has identified new Cu alloys with superior hardness and conductivity [28]. Interpretable ML models applied to Cu-Cr-La/Ce alloys uncovered an unexpected vacancy depletion mechanism affecting softening resistance [29]. XGBoost-based models have accurately predicted precipitate formation and mechanical properties in Cu-Ti alloys [30]. ML-driven interatomic potential models have improved molecular dynamics simulations for Cu-Sn alloys [31]. Finally, ML has elucidated Cu-Cr alloying element distributions, further demonstrating its transformative impact on accelerating alloy discovery and improving processing–property relationships [32].

Building upon these methodological advances, recent work has specifically addressed the predictive modeling of Cu-Ni-Si and Cu low-alloyed systems. Machine learning models for Cu-Ni-Si alloys have established quantitative relationships between chemical composition, thermomechanical processing parameters, and resulting hardness and electrical conductivity, identifying aging temperature and nickel content as the most influential features in the trained models, consistent with precipitation-hardening trends reported in the literature [33]. Separately, deep learning approaches applied to Cu low-alloyed systems have revealed the distinct importance of matrix purity and cold work reduction in determining properties within this alloy class [34]. Related computational investigations have explored quaternary phase stability in Cu-Ni-Si-Cr alloys through combined quantum-mechanical simulations and active machine learning, predicting novel metastable phases and their elastic behavior [35]. Additionally, integrated computational materials engineering approaches have examined Ti additions in Cu-Cr alloys, elucidating their role in precipitate refinement and conductivity enhancement [27]. However, despite these individual contributions addressing various aspects of copper alloy design and property prediction, a systematic cross-system comparison that directly contrasts the data-driven feature attributions and compositional influences between Cu-Ni-Si and Cu low alloys remains absent from the literature. This knowledge gap limits the ability to provide unified design guidelines that account for the fundamental mechanistic differences between these industrially important alloy families.

The present study addresses this gap through a comprehensive comparative analysis of strengthening mechanisms in Cu-Ni-Si and Cu low-alloyed systems using unified machine learning models. This work develops ensemble ML models, including Random Forest, XGBoost, and Gradient Boosting, to predict hardness and electrical conductivity across both alloy classes using a combined dataset of 1690 entries. The selection of these ensemble algorithms over deep learning architectures reflects a deliberate prioritization of computational efficiency and practical applicability, enabling rapid model training and deployment without requiring specialized hardware or extensive hyperparameter tuning. This study quantitatively compares strengthening efficiency by analyzing property distributions, correlations, and achievable property ranges for each system. SHapley Additive exPlanations (SHAP) analysis, a model-agnostic interpretability framework rooted in cooperative game theory that assigns theoretically grounded additive contributions to each input feature, is applied to interpret feature importance and to identify the most influential compositional and processing features in the trained models for each alloy class. Finally, practical design guidelines are extracted that leverage the distinct strengthening mechanisms to achieve targeted property combinations for specific applications.

2. Data and Methodology

2.1. Data Acquisition and Dataset Description

The present study employed a dataset of 1690 entries, comprising 1507 samples with hardness (HV) measurements and 1685 samples with electrical conductivity (%IACS) data, all specific to Cu-Ni-Si and Cu low alloys. This subset was derived from an extensive database of various Cu-based alloy types originally assembled by Gorsse et al. [36,37], which serves as a robust foundation for investigating the relationships between alloy composition, processing parameters, and resultant material properties. The dataset encompasses 27 compositional features and 7 processing parameters as input variables. The compositional features include concentrations (in wt.%) of Cu (83.05–99.84), Ni (0.10–8.00), Si (0.03–3.81), Cr (0.02–1.00), Mg (0.02–2.00), Co (0.65–2.67), Zr (0.06–0.50), Al (0.13–0.50), and other trace elements. The thermo-mechanical processing parameters comprise the following variables with their corresponding notation and units: solution temperature (Tss, K): 923–1323 K; solution duration (tss, h): 0.1–27.8 h; aging temperature (Tag, K): 373–1023 K; aging duration (tag, h): 0.0–408.0 h; cold rolling reduction percentage (CR reduction, %): 0–93%; aging (Aging_encoded): binary indicator (1 = applied, 0 = not applied); aging was applied to 95% of samples; and secondary thermo-mechanical process (STMP_encoded): binary indicator (1 = applied, 0 = not applied).

Within the subset, Cu-Ni-Si alloys constitute the majority with 1427 entries (84.4%), while Cu low alloys account for 263 entries (15.6%). The target properties exhibit substantial variability: hardness ranges from 57.0 to 386.0 HV (mean: 209.1 ± 76.9 HV), and electrical conductivity spans 8.0 to 93.0%IACS (mean: 41.1 ± 15.5%IACS). Notably, 1502 samples contain measurements for both properties, enabling direct correlation analysis between mechanical strength and electrical performance. The data were compiled from 39 peer-reviewed literature sources, ensuring experimental reproducibility and scientific rigor.

2.2. Data Cleaning and Feature Engineering

The dataset was filtered to include only Cu-Ni-Si alloys and Cu low alloys, excluding Cu-Ti alloys (93 entries) and Cu-Be alloys (48 entries) from the original database. Although Cu-Ti alloys are also precipitation-hardened, they form fundamentally different strengthening precipitates: metastable ordered Cu₄Ti phases that follow a distinct nucleation and coarsening pathway compared to the Ni₂Si precipitates in Cu-Ni-Si systems, making direct mechanistic comparison problematic. Cu-Be alloys were excluded because beryllium-containing alloys are subject to increasing regulatory restrictions due to toxicity concerns, limiting their industrial relevance, and their strengthening relies on ordered CuBe precipitates with distinct crystallographic relationships to the matrix. The retained Cu low-alloy class (Cu-Cr, Cu-Zn, Cu-Sn, and related systems) shares the common characteristic of relying primarily on solid-solution strengthening and work hardening rather than coherent precipitation, providing a meaningful mechanistic contrast to the precipitation-hardened Cu-Ni-Si family. Input features were refined to 12 compositional features (Cu, Ni, Si, Cr, Mg, Al, Co, Zr, Ti, Fe, Zn, and P in wt.%) and 7 processing parameters. To encode physical domain knowledge, four physics-informed descriptors were computed: valence electron concentration (VEC), atomic size mismatch (delta_radius), electronegativity difference (delta_chi), and Ni:Si atomic ratio. VEC is calculated as the composition-weighted average of elemental valence electrons; delta_radius and delta_chi quantify the root-mean-square deviation of atomic radii and Pauling electronegativities from the composition-weighted mean. 
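
The four physics-informed descriptors described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the weights are atomic fractions (the text only says "composition-weighted"), it uses unnormalized root-mean-square deviations as worded in the text, and the elemental property tables (valence electron counts, approximate radii, Pauling electronegativities) are illustrative values chosen here, since the source does not list the tables it used.

```python
import math

# Elemental property tables. These are assumptions for illustration only;
# the source does not state which tabulated values it used.
ATOMIC_MASS = {"Cu": 63.546, "Ni": 58.693, "Si": 28.086}  # g/mol
VALENCE_E = {"Cu": 11, "Ni": 10, "Si": 4}                 # common VEC convention
RADIUS_A = {"Cu": 1.28, "Ni": 1.24, "Si": 1.17}           # approximate radii, angstrom
PAULING_CHI = {"Cu": 1.90, "Ni": 1.91, "Si": 1.90}        # Pauling electronegativity


def atomic_fractions(wt_pct):
    """Convert a {element: wt.%} mapping into atomic fractions."""
    moles = {el: w / ATOMIC_MASS[el] for el, w in wt_pct.items() if w > 0}
    total = sum(moles.values())
    return {el: n / total for el, n in moles.items()}


def weighted_mean(frac, prop):
    """Composition-weighted mean of an elemental property."""
    return sum(c * prop[el] for el, c in frac.items())


def rms_deviation(frac, prop):
    """Composition-weighted RMS deviation of a property from its weighted mean."""
    mean = weighted_mean(frac, prop)
    return math.sqrt(sum(c * (prop[el] - mean) ** 2 for el, c in frac.items()))


def descriptors(wt_pct):
    """Compute VEC, delta_radius, delta_chi, and the Ni:Si atomic ratio."""
    frac = atomic_fractions(wt_pct)
    ni, si = frac.get("Ni", 0.0), frac.get("Si", 0.0)
    return {
        "VEC": weighted_mean(frac, VALENCE_E),
        "delta_radius": rms_deviation(frac, RADIUS_A),
        "delta_chi": rms_deviation(frac, PAULING_CHI),
        # Returning NaN for Si-free alloys is a choice made here, not the source's.
        "Ni_Si_ratio": ni / si if si > 0 else float("nan"),
    }


alloy = {"Cu": 96.6, "Ni": 2.7, "Si": 0.7}  # hypothetical composition in wt.%
desc = descriptors(alloy)
print({name: round(value, 4) for name, value in desc.items()})
```

Because Cu carries the largest valence electron count among these three elements, VEC in this sketch moves almost in lockstep with Cu atomic fraction, which is worth keeping in mind when reading the feature-importance results.
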
The remaining 15 elements from the original 27 compositional features (Ag, B, Be, Ca, Ce, Hf, La, Mn, Mo, Nb, Nd, Pb, Pr, Sn, and V) were excluded because they exhibited zero or near-zero variance in the filtered Cu-Ni-Si and Cu low-alloy subset, with over 99% of entries containing zero values for these elements, rendering them uninformative for model training. Categorical variables, specifically the aging condition (Y/N) and secondary thermo-mechanical processing status (Y/N), were transformed to binary encodings (Aging_encoded and STMP_encoded, where 1 = applied and 0 = not applied) for compatibility with machine learning algorithms. Missing values in elemental compositions were imputed with zero, indicating element absence, while missing processing parameters were imputed using median values from the available data. It should be noted that zero-imputation for elemental compositions is physically meaningful when the element was not intentionally added to the alloy; however, this approach cannot distinguish between genuinely absent elements and unreported compositions, which may introduce minor bias, particularly for Cu low alloys, where certain trace elements may be present but unrecorded in the source literature. Similarly, median imputation for processing parameters assumes that missing values follow the central tendency of the available data, which is a standard approach but may obscure alloys with atypical processing histories. To mitigate potential bias, binary missingness indicator flags were added for each imputable processing variable (Tss, tss, Tag, and tag), following standard practice in tabular ML. These flags enable the models to distinguish between reported and imputed values. Missingness rates are highest for Cu low alloys (up to 18.3% for Tss and tss). An imputation sensitivity analysis comparing complete-case-only models confirmed negligible performance differences (delta R² < 0.005). 
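
The imputation scheme just described (zero for missing elements, median for missing processing parameters, plus a binary missingness flag per imputed processing variable) can be sketched in plain Python. The record layout, the shortened element list, and the `_missing` suffix are hypothetical choices for this example; only the four processing variable names come from the text.

```python
from statistics import median

PROCESS_COLS = ["Tss", "tss", "Tag", "tag"]  # imputable processing variables named in the text
ELEMENT_COLS = ["Ni", "Si", "Cr"]            # shortened element list, for illustration only


def impute(rows):
    """Return copies of rows with zero-filled elements, median-filled
    processing values, and a 0/1 missingness flag per processing column."""
    medians = {
        col: median(r[col] for r in rows if r.get(col) is not None)
        for col in PROCESS_COLS
    }
    filled = []
    for r in rows:
        new = dict(r)
        for el in ELEMENT_COLS:
            if new.get(el) is None:
                new[el] = 0.0  # treated as "element not added"
        for col in PROCESS_COLS:
            missing = new.get(col) is None
            new[f"{col}_missing"] = int(missing)  # flag name is hypothetical
            if missing:
                new[col] = medians[col]
        filled.append(new)
    return filled


# Hypothetical records; None marks an unreported value.
rows = [
    {"Ni": 2.0, "Si": 0.5, "Cr": None, "Tss": 1173.0, "tss": 1.0, "Tag": 723.0, "tag": 2.0},
    {"Ni": 3.0, "Si": 0.7, "Cr": 0.3, "Tss": None, "tss": None, "Tag": 773.0, "tag": 4.0},
    {"Ni": None, "Si": None, "Cr": 0.5, "Tss": 1273.0, "tss": 2.0, "Tag": 673.0, "tag": None},
]
filled = impute(rows)
```

In a train/test workflow the medians would usually be computed on the training portion only and then reused for the test portion; the text does not say which variant was used, so this sketch simply takes the medians over the rows it is given.
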
Data were partitioned into training (80%) and test (20%) subsets using stratified random sampling based on alloy class to preserve class proportions between Cu-Ni-Si and Cu low alloys. A fixed random seed of 42 was employed throughout all data processing steps to ensure complete reproducibility of the train/test split and subsequent analyses. The use of fixed random states across all algorithms ensures that model training is fully deterministic, yielding identical predictions and evaluation metrics upon repeated execution.
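
A class-stratified 80/20 split with a fixed seed can be illustrated without any ML library. The function below is a stand-in written for this example, not the routine used in the study, so the exact indices (and possibly the per-class rounding) will differ from what a library splitter produces. The class counts are the ones reported for the dataset; the label strings are placeholders.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices into train/test so each class keeps roughly the same share."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes as reported: 1427 Cu-Ni-Si entries and 263 Cu low-alloy entries.
labels = ["Cu-Ni-Si"] * 1427 + ["Cu low alloy"] * 263
train_idx, test_idx = stratified_split(labels)

n_test_cunisi = sum(labels[i] == "Cu-Ni-Si" for i in test_idx)
print(len(train_idx), len(test_idx), n_test_cunisi)
```

Note that stratifying by alloy class does not keep all entries of one alloy formula on the same side of the split, which is the distinction the grouped cross-validation addresses.
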

2.3. Machine Learning Models

Three ensemble-based regression algorithms were implemented to predict hardness and electrical conductivity from compositional and processing inputs. Random Forest (RF) is an ensemble method that constructs multiple decision trees during training and outputs the mean prediction of individual trees, providing robust predictions through variance reduction [38,39]. The model was configured with 200 estimators, a maximum depth of 15, minimum samples to split of 5, minimum samples per leaf of 2, and a random state of 42. Extreme Gradient Boosting (XGBoost) is a Gradient Boosting framework that builds an additive model in a forward stage-wise manner using decision tree weak learners. This algorithm incorporates regularization to prevent overfitting and is known for excellent predictive performance on tabular data [40]. The hyperparameters were set to 200 estimators, a maximum depth of 8, a learning rate of 0.1, a subsample ratio of 0.8, a column subsample by tree of 0.8, and a random state of 42. Gradient Boosting Regressor (GBR) is the scikit-learn implementation of Gradient Boosting for regression tasks, building sequential weak learners where each subsequent tree corrects errors from previous iterations [41]. The model was configured with 200 estimators, a maximum depth of 8, a learning rate of 0.1, minimum samples to split of 5, and a random state of 42. The hyperparameters for all three algorithms were selected through a preliminary grid search over a reduced parameter grid (varying n_estimators in {100, 200, 300}, max_depth in {5, 8, 10, 15}, and learning_rate in {0.05, 0.1, 0.2} for boosting methods), evaluated using 5-fold cross-validation R-squared on the training set. The final values represent the configuration achieving the best cross-validation performance while maintaining computational efficiency, consistent with hyperparameter ranges commonly employed in prior materials–informatics studies for copper alloy property prediction [33,34].

All three algorithms were trained separately for each target property (hardness and electrical conductivity), resulting in six trained models. The trained models were serialized and saved for subsequent practical applications and reproducibility verification.
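
The reported settings and the reduced search grid can be written down as plain data, which also makes it easy to check that each final configuration lies inside the grid. The keyword names below follow the usual scikit-learn and XGBoost Python parameter names; that mapping is an assumption about the implementation, and the helper `grid_candidates` is hypothetical.

```python
from itertools import product

# Final settings as reported in the text, keyed by assumed library parameter names.
FINAL_PARAMS = {
    "RandomForest": {"n_estimators": 200, "max_depth": 15, "min_samples_split": 5,
                     "min_samples_leaf": 2, "random_state": 42},
    "XGBoost": {"n_estimators": 200, "max_depth": 8, "learning_rate": 0.1,
                "subsample": 0.8, "colsample_bytree": 0.8, "random_state": 42},
    "GradientBoosting": {"n_estimators": 200, "max_depth": 8, "learning_rate": 0.1,
                         "min_samples_split": 5, "random_state": 42},
}

# Reduced grid described in the text; learning_rate applies to boosting methods only.
GRID = {
    "n_estimators": [100, 200, 300],
    "max_depth": [5, 8, 10, 15],
    "learning_rate": [0.05, 0.1, 0.2],
}


def grid_candidates(uses_learning_rate):
    """Enumerate every combination in the grid for one model family."""
    keys = ["n_estimators", "max_depth"]
    if uses_learning_rate:
        keys.append("learning_rate")
    return [dict(zip(keys, values)) for values in product(*(GRID[k] for k in keys))]


rf_grid = grid_candidates(uses_learning_rate=False)
boost_grid = grid_candidates(uses_learning_rate=True)
print(len(rf_grid), len(boost_grid))
```

Each candidate would then be scored by 5-fold cross-validated R² on the training set, as the text describes; that scoring loop is omitted here because it depends on the fitted models.
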

2.4. Model Performance Metrics

Model performance was assessed using three complementary regression metrics. The coefficient of determination (R²) measures the proportion of variance in the target variable explained by the model; it equals 1 for a perfect fit and 0 for a model that does no better than predicting the mean, and it can be negative on held-out data when the model performs worse than that baseline. Root-mean-squared error (RMSE) quantifies the typical magnitude of prediction errors in the original units of the target variable, with lower values indicating more accurate predictions. Mean absolute error (MAE) represents the average absolute difference between predicted and actual values, providing an interpretable measure of typical prediction deviation. Model generalization capability was evaluated through 5-fold cross-validation using standard K-fold partitioning with shuffling and a random state of 42. This approach randomly partitions the data into five subsets of approximately equal size, trains the model on four folds, evaluates on the held-out fold, and repeats for all five combinations. The mean and standard deviation of R² and RMSE scores across folds were reported to characterize both predictive accuracy and stability. Additionally, grouped K-fold cross-validation was performed using the unique alloy formula as the grouping variable (K = 10), ensuring that all samples sharing the same base composition appear in the same fold. This prevents inflated performance from predicting different processing conditions for already-seen compositions and provides a stricter test of generalization to unseen alloy formulations.
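
For reference, the three metrics and the idea behind grouped folds can be sketched in a few lines of plain Python. The fold assignment below is a simple greedy balancing written for this example; it is a stand-in for a library grouped K-fold splitter, not the procedure used in the study, and the numbers are toy values.

```python
import math
from collections import defaultdict


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; this can fall below zero on held-out data."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def grouped_folds(groups, k):
    """Assign whole groups to k folds, largest group first, always filling
    the currently smallest fold, so no group is split across folds."""
    sizes = defaultdict(int)
    for g in groups:
        sizes[g] += 1
    fold_of, load = {}, [0] * k
    for g in sorted(sizes, key=lambda name: (-sizes[name], name)):
        target = load.index(min(load))
        fold_of[g] = target
        load[target] += sizes[g]
    return [fold_of[g] for g in groups]


y_true = [100.0, 200.0, 300.0]            # toy hardness values, HV
good = [110.0, 190.0, 300.0]
bad = [300.0, 200.0, 100.0]               # worse than predicting the mean
print(r2_score(y_true, good), rmse(y_true, good), mae(y_true, good))
print(r2_score(y_true, bad))

formulas = ["A", "A", "A", "B", "B", "C", "D"]  # hypothetical alloy formula labels
folds = grouped_folds(formulas, k=2)
print(folds)
```

With grouping, every processing variant of a given formula is held out together, so the test fold contains only formulations the model has not seen during training.
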

2.5. Interpretability and SHAP Analysis

Model interpretability was achieved through two complementary approaches. Feature importance rankings were derived from the intrinsic importance scores of tree-based models, which quantify the contribution of each feature to model predictions based on the reduction in impurity achieved by splits on that feature. Importance scores were averaged across all three algorithms to provide robust rankings that are not dependent on any single model’s characteristics. SHAP analysis was performed using the TreeExplainer method, which computes exact SHAP values for tree-based models. SHAP values provide a theoretically grounded approach to feature attribution based on cooperative game theory, explaining individual predictions as the sum of contributions from each feature. Summary plots were generated to visualize the distribution of feature impacts across all test samples, revealing both the magnitude and direction of feature effects on predictions [10,38]. Dependence plots were created for key features (Ni, Si, Cu, and Tag (K)) to illustrate the nonlinear relationships between feature values and their SHAP contributions, providing interpretable, data-driven attributions of how compositional and processing variables are associated with hardness and electrical conductivity in the trained models.
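
The additive property that SHAP values inherit from cooperative game theory can be demonstrated with a brute-force Shapley computation on a tiny model. This is not the TreeExplainer algorithm: it enumerates all feature orderings against a single reference point, which is only feasible for a handful of features, and both the toy model and the baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions of predict(x) relative to predict(baseline),
    averaged over every order in which features switch from baseline to x."""
    names = list(x)
    phi = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = predict(current)
        for n in order:
            current[n] = x[n]
            new = predict(current)
            phi[n] += new - prev  # marginal contribution of n in this order
            prev = new
    return {n: total / len(orders) for n, total in phi.items()}


def toy_hardness(f):
    """Hypothetical model with a Ni-Si interaction term (illustration only)."""
    return 100.0 + 20.0 * f["Ni"] + 30.0 * f["Si"] + 10.0 * f["Ni"] * f["Si"]


x = {"Ni": 2.0, "Si": 0.5}
baseline = {"Ni": 0.0, "Si": 0.0}
phi = shapley_values(toy_hardness, x, baseline)
print(phi)
```

The attributions sum exactly to the difference between the prediction and the baseline prediction, and the interaction term is shared equally between Ni and Si; this is the sense in which each prediction is explained as a sum of per-feature contributions.
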

2.6. Optimization and Inverse Design Framework

The validated machine learning models were deployed for three practical applications relevant to alloy development. XGBoost was selected for hardness prediction based on its superior test performance [39], while Gradient Boosting was employed for electrical conductivity prediction [40,41]. For property prediction of new compositions, predictions were generated by constructing feature vectors containing all 19 input variables for compositions of interest. The trained models computed point predictions directly from these inputs, enabling rapid screening of candidate alloys without physical synthesis. For processing parameter optimization, the differential evolution algorithm was employed to identify optimal processing conditions for a fixed alloy composition. This global stochastic optimizer is suitable for multimodal objective functions common in materials optimization problems. Formally, the processing optimization minimizes f(x) = −y_hat(x; theta), where y_hat is the ML prediction, x the processing parameter vector, and theta the fixed composition, subject to box constraints x_lower ≤ x ≤ x_upper. For inverse design, the objective is g(z) = w_HV × (y_hat_HV(z) − HV_target)² + w_EC × (y_hat_EC(z) − EC_target)², with an additional Ni:Si stoichiometry penalty P(z) = lambda × max(0, |r(z) − r_target| − r_tol)². The algorithm searched within physically meaningful bounds constrained by industrial practice and metallurgical considerations: solution temperature (Tss: 1173–1273 K), solution duration (tss: 0.5–4.0 h), cold rolling reduction (CR reduction: 0–70%, adjusted by optimization target), aging temperature (Tag: 673–823 K), and aging duration (tag: 0.5–12 h). For inverse design, Ni:Si atomic ratio constraints (1.8–2.5) were enforced to ensure physically plausible compositions consistent with Ni₂Si precipitation stoichiometry. 
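
For readability, the objectives stated above can be typeset as follows. The first three lines restate the expressions from the text in LaTeX; the last line, which adds the penalty to the mismatch term, is an assumption about how the two were combined, since the text says only that the penalty is additional.

```latex
\[
\begin{aligned}
\text{Processing optimization:}\quad
  & \min_{x}\; f(x) = -\hat{y}(x;\theta),
    \qquad x_{\mathrm{lower}} \le x \le x_{\mathrm{upper}} \\
\text{Inverse design:}\quad
  & g(z) = w_{HV}\,\bigl(\hat{y}_{HV}(z) - HV_{\mathrm{target}}\bigr)^{2}
         + w_{EC}\,\bigl(\hat{y}_{EC}(z) - EC_{\mathrm{target}}\bigr)^{2} \\
\text{Stoichiometry penalty:}\quad
  & P(z) = \lambda\,\max\bigl(0,\; \lvert r(z) - r_{\mathrm{target}}\rvert - r_{\mathrm{tol}}\bigr)^{2} \\
\text{Penalized objective (assumed):}\quad
  & \min_{z}\; g(z) + P(z)
\end{aligned}
\]
```

Here $r(z)$ is the Ni:Si atomic ratio of candidate $z$, so the penalty vanishes whenever the ratio lies within $r_{\mathrm{tol}}$ of $r_{\mathrm{target}}$ and grows quadratically outside that band.
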
Separate optimizations were performed to maximize hardness and conductivity independently, using the negative predicted property value as the objective function. The algorithm was run using the scipy.optimize.differential_evolution implementation with the following parameters: strategy = ‘best1bin’, population size = 15 times the number of decision variables, mutation constant = (0.5, 1.0), recombination constant = 0.7, convergence tolerance = 0.01, maximum iterations = 100, and a fixed random seed of 42 for reproducibility. For inverse design targeting specified properties, the problem was formulated as a constrained optimization minimizing the weighted sum of squared errors between predicted and target values. The optimization searched over compositional variables (Ni: 0–6 wt.%, Si: 0–2 wt.%, Cr: 0–1 wt.%, and Mg: 0–0.3 wt.%) and selected processing parameters (aging temperature: 673–823 K, aging time: 1–8 h), with Cu content calculated as the balance after alloying element summation. Differential evolution was employed with 200 maximum iterations to improve the likelihood of convergence toward the global optimum, which a stochastic optimizer cannot guarantee. This inverse design capability allows target properties to be specified first, with the algorithm then identifying a composition and processing route predicted to achieve them.
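
A minimal sketch of the inverse-design objective is given below. The predictors `toy_hv` and `toy_ec` are hypothetical stand-ins for the trained models, the penalty weight `lam` is arbitrary, and `r_target = 2.15` with `r_tol = 0.35` is simply one parameterization that reproduces the stated 1.8–2.5 window; the source does not give these values. Summing the mismatch and the penalty is likewise an assumption.

```python
ATOMIC_MASS = {"Ni": 58.693, "Si": 28.086}  # g/mol

# Search bounds as listed in the text: Ni, Si, Cr, Mg (wt.%), Tag (K), tag (h).
BOUNDS = [(0.0, 6.0), (0.0, 2.0), (0.0, 1.0), (0.0, 0.3), (673.0, 823.0), (1.0, 8.0)]


def ni_si_atomic_ratio(ni_wt, si_wt):
    """Convert Ni and Si contents in wt.% to a Ni:Si atomic ratio."""
    if si_wt <= 0.0:
        return float("inf")  # choice made here: Si-free candidates are penalized
    return (ni_wt / ATOMIC_MASS["Ni"]) / (si_wt / ATOMIC_MASS["Si"])


def stoichiometry_penalty(ratio, r_target=2.15, r_tol=0.35, lam=1000.0):
    """lam * max(0, |r - r_target| - r_tol)^2; zero inside the allowed window."""
    excess = max(0.0, abs(ratio - r_target) - r_tol)
    return lam * excess ** 2


def inverse_design_objective(z, predict_hv, predict_ec, hv_target, ec_target,
                             w_hv=1.0, w_ec=1.0):
    """Weighted squared mismatch to the targets plus the Ni:Si penalty."""
    ni, si, cr, mg, t_ag, time_ag = z
    features = {
        "Cu": 100.0 - (ni + si + cr + mg),  # Cu taken as the balance
        "Ni": ni, "Si": si, "Cr": cr, "Mg": mg, "Tag": t_ag, "tag": time_ag,
    }
    mismatch = (w_hv * (predict_hv(features) - hv_target) ** 2
                + w_ec * (predict_ec(features) - ec_target) ** 2)
    return mismatch + stoichiometry_penalty(ni_si_atomic_ratio(ni, si))


def toy_hv(f):
    """Hypothetical hardness predictor (HV), for illustration only."""
    return 100.0 + 40.0 * f["Ni"]


def toy_ec(f):
    """Hypothetical conductivity predictor (%IACS), for illustration only."""
    return 60.0 - 8.0 * f["Ni"]


z = (2.7, 0.6, 0.0, 0.0, 723.0, 4.0)
score = inverse_design_objective(z, toy_hv, toy_ec, hv_target=200.0, ec_target=40.0)
print(round(ni_si_atomic_ratio(2.7, 0.6), 3), round(score, 2))
```

A function of this shape, together with `BOUNDS` and the extra arguments passed through `args`, is what `scipy.optimize.differential_evolution` expects as its objective; the optimizer call itself is left out so that the sketch runs without SciPy.
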

3. Results and Discussions

3.1. Modeling

This comparative analysis examines the fundamental differences in strengthening mechanisms between two major classes of copper alloys: precipitation-hardened Cu-Ni-Si alloys and solid-solution-strengthened Cu low alloys. Table 1 summarizes the dataset distribution across these alloy classes. The dataset encompasses 1690 entries, with Cu-Ni-Si alloys representing 84.4% of the samples (1427 entries) and Cu low alloys comprising the remaining 15.6% (263 entries). This distribution reflects the industrial importance of Cu-Ni-Si alloys in applications requiring high strength combined with reasonable electrical conductivity, such as electronic connectors, lead frames, and automotive electrical systems. The Cu low-alloyed systems serve as an important contrasting baseline, representing alloys where conductivity preservation takes precedence over maximum strength. The sample distribution reveals an important aspect of the comparative analysis: Cu-Ni-Si alloys exhibit greater compositional and processing diversity than Cu low alloys. This diversity stems from the multiple degrees of freedom available in precipitation-hardened systems, where the strengthening response depends not only on alloy composition but also critically on the sequence and parameters of thermal and thermomechanical treatments. In contrast, Cu low alloys rely primarily on solid-solution strengthening and work hardening, mechanisms that are more directly linked to composition and deformation history.

Figure 1 presents the dataset overview, showing sample distribution across alloy classes and property histograms for hardness and electrical conductivity. The visualization reveals distinctly different property distributions for the two alloy systems. The hardness histogram shows a clear bimodal pattern, with Cu-Ni-Si alloys clustering at higher hardness values (typically 180–280 HV), while Cu low alloys concentrate at lower values (80–140 HV). This separation directly reflects the difference in strengthening mechanisms: precipitation hardening in Cu-Ni-Si alloys creates a high density of nanoscale Ni₂Si precipitates that act as formidable barriers to dislocation motion, whereas solid-solution strengthening in Cu low alloys provides only modest resistance through lattice distortion effects. The conductivity histogram shows the inverse pattern, with Cu low alloys maintaining higher conductivity due to their lower solute concentration in the copper matrix. Examining the bar chart components, Cu-Ni-Si samples with both hardness and conductivity measurements dominate the dataset, while Cu low alloys contribute proportionally fewer samples but provide crucial comparative data.

Figure 2 displays correlation heatmaps for Cu-Ni-Si alloys and Cu low alloys separately, revealing the underlying relationships between compositional and processing variables for each alloy class. In the Cu-Ni-Si correlation matrix, the strong positive correlation between nickel and silicon contents (visible as intense color in the Ni-Si cell) reflects the stoichiometric requirement for Ni₂Si precipitate formation, where the optimal Ni:Si atomic ratio approaches 2:1. This correlation is a direct manifestation of deliberate alloy design targeting maximum precipitation-hardening efficiency. The correlations with aging parameters (Tag(K) and tag(h)) further emphasize the critical role of thermal treatment in controlling precipitate nucleation, growth, and coarsening kinetics. Electrical conductivity appears both as a variable in the heatmap (showing Pearson correlations with other variables) and on the colorbar (indicating correlation magnitude/direction for all pairs). Negative correlations between copper content and other elements simply reflect that as alloying additions increase, pure copper content decreases. For Cu low alloys, the correlation structure is notably simpler and weaker, consistent with the more straightforward solid-solution strengthening mechanism, where each alloying element contributes somewhat independently to matrix strengthening and conductivity reduction. Comparing the two heatmaps directly, Cu-Ni-Si alloys show more intense and structured correlation patterns, indicating tighter compositional control requirements for optimal properties.
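
The two correlation patterns discussed above (Ni and Si rising together, and Cu falling as alloying additions rise) follow directly from how the Pearson coefficient behaves on balance-type compositions. The sketch below uses four hypothetical alloys designed near a 4:1 Ni:Si weight ratio, which is roughly 2:1 in atomic terms; the values are illustrative and not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical compositions in wt.%; Cu is the balance.
ni = [1.0, 2.0, 3.0, 4.0]
si = [0.25, 0.5, 0.7, 1.0]
cu = [100.0 - a - b for a, b in zip(ni, si)]

r_ni_si = pearson(ni, si)
r_ni_cu = pearson(ni, cu)
print(round(r_ni_si, 3), round(r_ni_cu, 3))
```

Because Cu is computed as the remainder, its strong negative correlation with Ni is a bookkeeping consequence of the composition summing to 100 wt.% rather than evidence of a separate physical effect.
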

Table 2 provides the statistical comparison of hardness and electrical conductivity between the two alloy classes, quantifying the strength–conductivity dichotomy. Cu-Ni-Si alloys achieve a mean hardness of 222.61 HV with a standard deviation of 70.91 HV, approximately twice the mean hardness of Cu low alloys (110.53 HV with a standard deviation of 35.46 HV). The wider hardness range observed in Cu-Ni-Si alloys (57–386 HV compared to 61–181 HV for Cu low alloys) reflects the sensitivity of precipitation hardening to aging conditions: underaging produces fewer, smaller precipitates, while overaging leads to precipitate coarsening and loss of coherency, both resulting in reduced hardness. The median values (219.0 HV for Cu-Ni-Si versus 107.0 HV for Cu low alloys) closely match the means, indicating relatively symmetric distributions within each class. For electrical conductivity, the comparison reverses. Cu low alloys maintain a mean conductivity of 64.88%IACS (standard deviation 17.77) versus 36.72%IACS (standard deviation 10.07) for Cu-Ni-Si. This conductivity advantage stems from the fundamental physics of electron scattering: solute atoms dissolved in the copper matrix act as scattering centers that reduce the electron mean free path and hence conductivity. Cu-Ni-Si alloys, with their higher total alloying content (typically 2–6% Ni and 0.5–1.5% Si), present more scattering obstacles than the dilute Cu low alloys. Even after precipitation removes much of the solute from solid solution, the precipitates themselves and residual dissolved atoms continue to scatter electrons, preventing full conductivity recovery. The conductivity range for Cu-Ni-Si (8–86%IACS) is slightly wider than that for Cu low alloys (20–93%IACS), reflecting the variable degree of precipitation completion across different aging treatments.

Figure 3 illustrates the hardness versus electrical conductivity relationship, visually demonstrating the fundamental trade-off that governs copper alloy design. The scatter plot clearly separates the two alloy classes into distinct regions of the property space. Linear trend lines are included as visual guides illustrating the slope and offset differences in the trade-off between the two families. Cu-Ni-Si alloys (shown in one color) cluster in the high-hardness, moderate-conductivity region (upper-left quadrant), while Cu low alloys (shown in contrasting color) occupy the moderate-hardness, high-conductivity domain (lower-right quadrant). The inverse relationship between these properties is not coincidental but reflects the competing effects of solute atoms and precipitates. This trade-off has profound implications for alloy selection and design. Applications requiring maximum current-carrying capacity with adequate mechanical strength, such as bus bars and power transmission components, favor Cu low alloys positioned toward the right side of the plot. Conversely, applications where mechanical durability under stress is paramount, such as electrical connectors subjected to repeated insertion cycles or springs in relay systems, benefit from the superior hardness of Cu-Ni-Si alloys despite the conductivity penalty (left side of the plot). The scatter observed within each alloy class is particularly instructive. Cu-Ni-Si alloys show substantial vertical spread at any given conductivity value, indicating that careful optimization of composition and processing can achieve different hardness levels while maintaining similar conductivity. Some data points deviate favorably from the average trade-off trend, approaching the upper-right region where both high hardness and good conductivity coexist. These outliers represent optimized compositions and processing conditions that partially overcome the inherent trade-off, providing targets for further alloy development.

Table 3 presents the machine learning model performance metrics for predicting hardness and electrical conductivity. Three ensemble algorithms were evaluated: Random Forest, XGBoost, and Gradient Boosting. For hardness prediction, XGBoost achieves the highest test R-squared of 0.8554 with RMSE of 29.90 HV, meaning that the model explains 85.5% of the variance in hardness values across the diverse dataset spanning both alloy classes. Gradient Boosting follows closely with a test R-squared of 0.8546 and RMSE of 29.99 HV, while Random Forest achieves 0.8449 with RMSE of 30.97 HV. The training R-squared values (0.9111–0.9311) are moderately higher than the test values, indicating models with appropriate complexity that capture genuine patterns without severe overfitting. For electrical conductivity prediction, Gradient Boosting achieves the highest test R-squared of 0.8400 with the lowest RMSE of 5.96%IACS. XGBoost performs comparably (R-squared 0.8285, RMSE 6.17%IACS), and Random Forest shows slightly lower accuracy (R-squared 0.8256, RMSE 6.22%IACS). For hardness, the training MAE values ranged from 9.68 to 14.81 HV, while the corresponding test MAE values ranged from 18.86 to 21.10 HV, indicating the expected generalization gap on held-out samples. To contextualize these results, linear regression and LASSO baselines achieved a test R² of 0.607/0.588 for hardness and 0.508/0.444 for conductivity, confirming that the ensemble methods capture significant nonlinear effects (delta R² = 0.25–0.39). The comparable performance across all three ensemble algorithms suggests that these relationships are robust and not artifacts of particular algorithmic approaches. This consistency suggests stable performance across the selected split; further validation using composition-grouped cross-validation is presented below.
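
For readers less familiar with the two headline metrics in Table 3, the following minimal sketch shows how R-squared and RMSE are computed. The function names and the toy hardness values are illustrative assumptions and are not data from this study.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (HV or %IACS)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical hardness values (HV), for illustration only.
measured = [100.0, 150.0, 200.0, 250.0, 300.0]
predicted = [110.0, 140.0, 205.0, 240.0, 310.0]
print(f"R2 = {r2_score(measured, predicted):.4f}, "
      f"RMSE = {rmse(measured, predicted):.2f} HV")
```

Because R-squared is normalized by the variance of the target, it allows comparison between hardness and conductivity models, whereas RMSE stays in physical units and is the more useful figure when judging whether a prediction is accurate enough for a given design tolerance.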

Figure 4 provides a visual comparison of model performance across the three ensemble learning algorithms for both target properties. The bar chart format enables direct comparison of training versus testing R-squared values. All three algorithms demonstrate strong predictive capability, with test R-squared values exceeding 0.82 for both properties. The consistency across algorithms suggests that tree-based ensemble methods are well-suited for this metallurgical application because they naturally handle the interaction effects that characterize precipitation hardening. Examining the chart details, the training R-squared bars (lighter shade) consistently exceed test R-squared bars (darker shade) by 0.06–0.11, a gap that indicates appropriate model complexity. If training and test performance were nearly identical, the models might be too simple to capture important relationships; if the gap were much larger (>0.15), overfitting concerns would arise. The observed gap falls within the acceptable range for machine learning models applied to materials science data, where inherent experimental variability limits achievable accuracy. It is also consistent with values reported in comparable machine learning studies for copper alloy property prediction: Kolev [33] reported gaps of 0.05–0.08 for Cu-Ni-Si hardness models, while Zhang et al. [17] observed gaps of 0.06–0.12 for precipitation-strengthened copper alloys. In the broader materials informatics literature, training–test R-squared differences below 0.15 are generally considered acceptable for datasets derived from heterogeneous literature sources with inherent experimental variability [38,39]. The slight performance advantage of the boosting methods (XGBoost and Gradient Boosting) over Random Forest, visible as marginally taller test R-squared bars, may indicate the importance of modeling sequential, interactive effects in the strengthening mechanisms. 
Gradient Boosting builds trees sequentially, with each tree correcting errors from previous trees, potentially capturing subtle dependencies that Random Forest’s parallel ensemble misses. Seed sensitivity analysis (10 random seeds) confirmed representative performance: XGBoost hardness, R² = 0.845 ± 0.017; Gradient Boosting conductivity, R² = 0.850 ± 0.017.

To evaluate whether class imbalance (84.4% Cu-Ni-Si versus 15.6% Cu low alloys) affects prediction quality, model performance was assessed separately for each alloy class. For hardness prediction, all three models achieve substantially higher accuracy on Cu low alloys (R-squared = 0.932–0.977) compared to Cu-Ni-Si alloys (R-squared = 0.812–0.823, RMSE = 31.80–32.82 HV). This disparity reflects the greater compositional and processing diversity within the Cu-Ni-Si class, where the complex interplay between precipitation kinetics and thermomechanical history creates a more challenging prediction landscape. For electrical conductivity, the pattern is even more pronounced: Cu low alloys achieve R-squared values of 0.947–0.958 versus 0.594–0.624 for Cu-Ni-Si alloys. The lower per-class R-squared for Cu-Ni-Si conductivity predictions indicates that within-class variability in solute partitioning and precipitation completeness introduces prediction uncertainty not fully captured by the available input features. These per-class results confirm that the aggregate metrics in Table 3, while accurate overall, mask significant differences in prediction quality between the two alloy families. Sample-weighted training using inverse class frequency weights did not improve Cu low alloy performance, suggesting the disparity reflects genuine prediction complexity rather than training bias.
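
The inverse class frequency weighting mentioned above can be illustrated with a short sketch. The class counts (1427 Cu-Ni-Si and 263 Cu low alloy entries) are those reported for the dataset; the normalization N / (K · n_c), in which the weights average to one over the samples, is an assumption, since the exact weighting formula is not stated.

```python
def inverse_frequency_weights(class_counts):
    """Per-class sample weight N / (K * n_c), so rare classes weigh more.

    The normalization is an assumed convention, not taken from the study.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}

# Class sizes as reported for the curated dataset.
counts = {"Cu-Ni-Si": 1427, "Cu low alloy": 263}
weights = inverse_frequency_weights(counts)

for label, weight in weights.items():
    share = 100.0 * counts[label] / sum(counts.values())
    print(f"{label}: {share:.1f}% of entries, sample weight {weight:.3f}")
```

With these weights, each class contributes the same total weight to the training loss, which is the usual intent of this kind of reweighting.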

Table 4 reports the 5-fold cross-validation results, providing crucial evidence of model stability and reliability. For hardness prediction, XGBoost achieves a mean CV R² of 0.8551 with a standard deviation of only 0.0172, indicating consistent performance across different random data partitions. The CV RMSE of 29.14 HV (standard deviation 1.60) confirms stable prediction accuracy. Random Forest shows lower mean performance (CV R-squared 0.8420) but comparable stability (standard deviation 0.0119). Gradient Boosting achieves intermediate values (R-squared 0.8524) with comparable variance (standard deviation 0.0152). For electrical conductivity, XGBoost achieved the highest mean CV R-squared (0.8610), while Gradient Boosting showed closely comparable cross-validation performance (CV R-squared = 0.8581). The conductivity models generally show slightly higher cross-validation variance than hardness models, possibly reflecting greater complexity in the conductivity–composition relationships or more variability in the underlying experimental data. The agreement between single test set R-squared (from Table 3) and cross-validation mean R-squared is close: the differences are below 0.02 for all three hardness models and for Gradient Boosting conductivity, and about 0.03 for XGBoost conductivity (0.8285 versus 0.8610). This indicates that the train–test split used for primary evaluation is broadly representative of the overall data distribution.

Figure 5 displays the predicted versus actual value plots for the best-performing models for each target property. The left panel shows hardness predictions (XGBoost model), where data points cluster tightly around the diagonal dashed line representing perfect prediction. Points span the full hardness range from 60 HV to 380 HV, demonstrating that the model maintains accuracy for both Cu low alloys at the lower end and highly strengthened Cu-Ni-Si alloys at the upper end. The scatter band around the diagonal has approximately constant width, indicating uniform prediction uncertainty across the property range without systematic over- or under-prediction at extremes. 
The right panel shows electrical conductivity predictions, again with points tightly clustered around the diagonal across the full range from 10%IACS to 90%IACS. The absence of prediction ceiling or floor effects confirms that the models have learned the underlying physical relationships rather than simply memorizing average values. Comparing the two panels, hardness predictions show slightly tighter clustering (consistent with the marginally higher R-squared for hardness), but both achieve excellent visual agreement between predicted and actual values. This uniform accuracy across property ranges is essential for the comparative analysis: the models reliably capture the behavior of both high-hardness Cu-Ni-Si alloys and moderate-hardness Cu low alloys, enabling meaningful comparison of the mechanisms controlling properties in each alloy class through feature importance analysis.

Table 5 consolidates all validation metrics into a comprehensive summary, combining training performance, test performance, and cross-validation statistics for each model-property combination. This table serves as the definitive reference for model reliability. For hardness, XGBoost shows the best balance with a test R-squared of 0.8554, closely matching the CV R-squared mean of 0.8551, indicating stable performance on randomly held-out data. The training R-squared of 0.9307 indicates the model captures fine-grained patterns, while the test-CV agreement indicates that these patterns carry over to held-out samples drawn from the same composition space. For conductivity, Gradient Boosting achieves a test R-squared of 0.8400, close to the CV R-squared of 0.8581 (the slightly lower test value suggests the test set may be marginally harder than the average cross-validation fold). The consistency between metrics across the table establishes confidence that subsequent feature importance analysis will reveal data-driven patterns consistent with metallurgical expectations rather than modeling artifacts. Examining the RMSE consistency, hardness test RMSE (29.90–30.97 HV) closely matches CV RMSE (29.14–30.48 HV), and conductivity test RMSE (5.96–6.22%IACS) matches CV RMSE (5.70–6.12%IACS). This metric alignment across validation approaches confirms robust model performance. With 84–86% of property variance explained by composition and processing variables, the models capture the major factors controlling hardness and conductivity while appropriately leaving room for experimental variability and unmeasured factors such as grain size distribution and precipitate morphology.

3.2. Feature Importance and Correlation Analysis

Understanding the relative importance of different compositional and processing factors is central to the comparative analysis of strengthening mechanisms. Two complementary interpretability approaches were employed. First, averaged tree-based feature importance scores, derived from impurity reduction across all three ensemble models, provide quantitative percentage rankings of each feature’s contribution to prediction accuracy. Second, SHAP analysis, implemented via TreeExplainer for the XGBoost model, provides a theoretically grounded framework for quantifying how each input feature contributes to individual predictions, accounting for complex feature interactions and providing directional information about how changing each feature affects predicted properties (Figure 6 and Figure 7).
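
To make the SHAP attribution concept concrete, the sketch below computes exact Shapley values for a small toy function by enumerating all feature coalitions. This brute-force approach is only an illustration of the definition: TreeExplainer obtains the same kind of attributions efficiently for tree ensembles, and the toy model, the input values, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) per feature.

    Features outside a coalition are held at their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    """Hypothetical model: additive in z[0], interaction between z[1], z[2]."""
    return 2.0 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print("Shapley values:", phi)
print("Sum of attributions:", sum(phi))
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is shared equally between the two interacting features. The same additivity property underlies the SHAP summary and dependence plots discussed here.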

Figure 6 presents the SHAP summary plots for hardness (left panel) and electrical conductivity (right panel), with each point representing a single alloy sample. Feature importance is ranked from top to bottom, and the horizontal position indicates the SHAP value (positive values increase the predicted property, while negative values decrease it). Point colors encode the actual feature value (high or low). For hardness, VEC emerges as the most important feature (ranked first), serving as a composite descriptor that captures the overall alloying level and its effect on electronic structure. The spread of points along the horizontal axis indicates VEC’s substantial impact on predictions. Aging temperature (Tag) ranks second, with the color gradient showing that intermediate aging temperatures are associated with positive SHAP contributions to hardness, consistent with optimal precipitate nucleation and growth kinetics. Solution temperature (Tss) ranks third, reflecting the critical role of dissolution conditions in establishing the supersaturated solid-solution prerequisite for subsequent precipitation. Aging time (tag) and copper content follow, with low copper values associated with higher predicted hardness due to higher alloying element concentrations. For conductivity, copper content dominates even more dramatically as the top-ranked feature, with high copper values strongly associated with positive conductivity predictions. The overwhelming importance reflects the fundamental physics of metallic conduction: a pure copper matrix provides electron transport, while all alloying additions introduce scattering. Aging time (tag) shows positive SHAP values with longer aging, consistent with conductivity recovery through solute precipitation.

Figure 7 displays SHAP dependence plots revealing the detailed functional relationships between key features and target properties. These plots show how SHAP values (y-axis) change as a function of feature values (x-axis), with point colors encoding interaction effects from secondary features. The nickel-hardness dependence plot (top-left) shows that increasing Ni content from 0 to approximately 3 wt% generally produces positive SHAP contributions to hardness, with values reaching +15 HV at moderate concentrations. The color gradient (showing solution temperature, Tss, as the interacting variable) indicates that the Ni strengthening effect is modulated by dissolution conditions. At Ni contents above 3 wt%, the SHAP contribution plateaus, suggesting diminishing returns from additional nickel beyond the amount needed for efficient precipitate formation. The silicon-hardness dependence (top-center) reveals a peak SHAP contribution around 0.6–1.0 wt% Si, with values reaching approximately +20 HV. Below this range, insufficient silicon limits the volume fraction of Ni₂Si precipitates; above approximately 1.5 wt%, the SHAP contribution declines sharply, indicating that excess silicon may partition to other phases or remain in solid solution without additional strengthening benefit. The color gradient shows aging temperature (Tag) as the interacting variable, confirming the coupled effect of precipitate-forming element concentration and thermal treatment on hardness. The aging temperature dependence (top-right) shows a peaked response for hardness, with maximum SHAP contribution of approximately +20 to +30 HV achieved at intermediate temperatures (700–800 K). Below 600 K, SHAP values become strongly negative (reaching −80 to −100 HV), reflecting insufficient thermal activation for precipitate nucleation and growth. At temperatures above 800 K, SHAP contributions decline moderately, consistent with precipitate coarsening that reduces strengthening effectiveness. 
The aluminum content appears as the interaction variable, indicating a minor modulating role. This peaked response is characteristic of precipitation-hardening kinetics, where an optimum aging window maximizes the density and coherency of strengthening precipitates. For conductivity (bottom panels), the nickel dependence (bottom-left) shows predominantly negative SHAP values at higher Ni concentrations, consistent with nickel’s role as an electron scatterer when dissolved in the copper matrix. Alloys with near-zero nickel (characteristic of Cu low alloys) show positive SHAP contributions of up to +6%IACS, with solution temperature (Tss) appearing as the interaction variable. The silicon-conductivity dependence (bottom-center) mirrors this pattern, with higher silicon content reducing predicted conductivity. The copper content appears as the interaction variable, confirming that the conductivity penalty from silicon is amplified in alloys with lower matrix purity. The copper-conductivity dependence (bottom-right) shows the strongest and most monotonic relationship among all panels: as Cu content increases from approximately 90% to nearly 100%, SHAP contributions rise from approximately −10 to +25%IACS, directly quantifying the dominant influence of matrix purity on electrical transport properties. The color gradient shows solution temperature as the interaction variable, with higher Tss values associated with slightly enhanced conductivity at equivalent copper levels.

To complement the SHAP dependence plots, SHAP interaction values were computed to quantify pairwise feature interaction effects. For hardness, the strongest interactions were between aging temperature and VEC (mean |interaction| = 3.65), aging temperature and aging time (3.58), and Cu and VEC (2.86), reflecting the coupled effects of composition, electronic structure, and thermal processing on precipitation kinetics. The prominence of VEC in the top interactions confirms that this composite descriptor captures alloy-level electronic structure effects that modulate the strengthening contribution of individual compositional and processing variables. For conductivity, Cu-aging time (0.76) and aging temperature–aging time (0.44) dominated the interactions, consistent with the mechanism of solute removal through time-dependent precipitation. Notably, the Ni-Si direct interaction ranked outside the top 10 for both properties, suggesting that the synergistic Ni-Si effect operates primarily through their individual contributions to precipitate formation rather than through a direct statistical interaction term. A systematic analysis of higher-order SHAP interaction values could provide additional data-driven insight but is beyond the scope of the present study.

Table 6 presents the top 10 averaged tree-based feature importance rankings for both target properties, crystallizing the comparative analysis into quantitative form. For hardness, the ranking is: VEC (39.88%), Tag (K) (11.23%), Ni (6.28%), tag (h) (6.22%), CR reduction (5.73%), Tss (K) (5.42%), tss (h)_missing (4.46%), tss (h) (4.07%), Tss (K)_missing (2.57%), and Mg (2.00%). This hierarchy traces the cascade of influence: VEC, as a composition-weighted average of elemental valence electrons, serves as a powerful composite descriptor distinguishing alloy classes and capturing the overall electronic structure effect on strengthening; aging temperature governs precipitate nucleation and growth kinetics; nickel is essential for Ni₂Si precipitate formation; aging time determines precipitate evolution; cold rolling provides work hardening and strain-induced precipitation enhancement; solution temperature and solution time establish the supersaturated solid solution; and missingness indicators capture data completeness effects. For electrical conductivity, the ranking shifts dramatically: Cu (37.66%), tag (h) (8.87%), Si (6.71%), Tag (K) (5.16%), Ni (4.54%), Aging_encoded (3.73%), CR reduction (3.47%), Tss (K) (3.18%), Tss (K)_missing (2.98%), and VEC (2.88%). Matrix purity dominates because conductivity is a bulk transport property limited by any scattering mechanism. The prominence of aging time (ranked second) reflects conductivity recovery through solute depletion via precipitation. Silicon and nickel appear at moderate importance: dissolved Ni and Si scatter conduction electrons, and their precipitation during aging removes them from solid solution, reducing that scattering. Comparing the rankings directly: VEC tops the hardness list while Cu tops conductivity, reflecting the distinct physical mechanisms. 
For hardness, VEC captures the alloying-level electronic structure effects that govern precipitation-strengthening potential; for conductivity, high Cu purity directly minimizes electron scattering. Aging time ranks fourth for hardness but second for conductivity, revealing that while precipitate nucleation kinetics (controlled by temperature) matter more for strength, precipitation completion (controlled by time) matters more for conductivity recovery. These divergent rankings quantify why optimizing one property often compromises the other.
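
As an illustration of the VEC descriptor, the sketch below converts a composition from wt% to atomic fractions and takes the fraction-weighted average of elemental valence electron counts. Several details are assumptions: the weighting by atomic fraction, the valence electron counts (s plus d electrons for the transition metals, as is common for this descriptor), the rounded atomic masses, and the treatment of Cu as the balance element. The two compositions are the high-strength and high-conductivity formulations discussed in Section 3.3.1.

```python
# Rounded standard atomic masses (g/mol) and assumed valence electron counts.
ATOMIC_MASS = {"Cu": 63.546, "Ni": 58.693, "Si": 28.086,
               "Mg": 24.305, "Cr": 51.996, "Zr": 91.224}
VALENCE_ELECTRONS = {"Cu": 11, "Ni": 10, "Si": 4, "Mg": 2, "Cr": 6, "Zr": 4}

def atomic_fractions(wt_percent):
    """Convert a {element: wt%} mapping to atomic fractions."""
    moles = {el: wt / ATOMIC_MASS[el] for el, wt in wt_percent.items()}
    total = sum(moles.values())
    return {el: n / total for el, n in moles.items()}

def valence_electron_concentration(wt_percent):
    """Atomic-fraction-weighted mean of elemental valence electron counts."""
    fractions = atomic_fractions(wt_percent)
    return sum(frac * VALENCE_ELECTRONS[el] for el, frac in fractions.items())

# Compositions in wt%, with Cu assumed to make up the balance.
high_strength = {"Cu": 94.85, "Ni": 4.0, "Si": 1.0, "Mg": 0.15}
high_conductivity = {"Cu": 99.35, "Cr": 0.5, "Zr": 0.15}

for name, alloy in [("Cu-4.0Ni-1.0Si-0.15Mg", high_strength),
                    ("Cu-0.5Cr-0.15Zr", high_conductivity)]:
    print(f"{name}: VEC = {valence_electron_concentration(alloy):.3f}")
```

Under these assumptions, every listed alloying element lowers VEC relative to pure copper, so the descriptor falls as total alloying content rises. This is one plausible reason why VEC separates the two alloy classes so effectively in the hardness model.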

Figure 8 compares averaged tree-based feature importance between hardness and conductivity models in a side-by-side format, enabling direct visualization of the differential sensitivity to various factors. The stark difference in copper importance is immediately apparent: the copper bar for conductivity is substantially taller than for hardness (37.7% versus a ranking outside the top 10 features), quantifying how much more sensitive conductivity is to matrix composition. This difference has practical implications: small compositional variations that negligibly affect hardness may significantly impact conductivity, requiring tighter compositional control when conductivity is critical. The VEC bars reveal another key difference: for hardness, VEC dominates (bar reaching nearly 40% of total), while for conductivity, this descriptor shows a much smaller bar (about 3%). This pattern suggests that hardness responds strongly to precipitate-forming elements and aging kinetics, while conductivity is more responsive to overall alloying level and the simple presence or completeness of aging treatment. The aging time (tag) bar provides an interesting contrast: modest importance for hardness (about 6%) but higher importance for conductivity (about 9%). This suggests that for a given composition optimized for hardness through temperature control, conductivity can be independently improved through extended aging that more completely removes solutes from solution. The relative bar heights thus provide a visual guide for optimization strategy: optimize VEC-related composition and aging temperature to maximize hardness first, and then extend aging time to recover conductivity without substantially affecting the hardness already achieved.

Figure 9 illustrates the elemental distribution comparison between Cu-Ni-Si and Cu low-alloyed systems, providing the compositional foundation for understanding the different strengthening mechanisms. The plots show the concentration ranges and distributions for key alloying elements across both alloy classes. For Cu-Ni-Si alloys, nickel content shows a characteristic distribution centered around 2–5 wt% with a relatively narrow interquartile range, reflecting the targeted compositional design for Ni₂Si precipitate formation. Silicon content similarly clusters around 0.5–1.5 wt%, with the distribution shape confirming deliberate optimization near the stoichiometric requirement for efficient precipitation. The correlation between nickel and silicon distributions (both have similar shapes and relative spreads) further confirms that alloy designers jointly optimize these elements. Cu low alloys display a fundamentally different elemental profile. Nickel and silicon contents are much lower (often near zero or in trace amounts), while elements such as chromium, zirconium, silver, and tin appear at low but measurable concentrations. The broader distributions of these minor elements reflect the multiple strategies employed in Cu low alloys: each element offers different combinations of strengthening effect and conductivity penalty, allowing manufacturers to select compositions suited to specific application requirements. Magnesium appears in both alloy classes but with different roles: in Cu-Ni-Si alloys, magnesium provides supplementary solid-solution strengthening and may participate in secondary precipitate phases; in Cu low alloys, it serves as one of several available solid-solution strengtheners. This elemental distribution comparison is directly relevant to interpreting the ML model behavior: the distinct profiles explain why VEC and Cu content dominate predictions, as the two classes occupy different composition-space regions. 
The element distribution comparison explains the property differences observed in the statistical comparison and the distinct feature importance patterns: Cu-Ni-Si alloys concentrate strengthening through Ni-Si-based precipitation, while Cu low alloys distribute strengthening across multiple dilute solid-solution additions.

3.3. Practical Applications

The trained machine learning models enable several practical applications for alloy design and optimization. To ensure physical plausibility and scientific accuracy, optimization procedures incorporate metallurgical constraints, including Ni:Si stoichiometry requirements for Ni₂Si precipitation, processing parameter bounds based on industrial practice, and compositional limits reflecting solubility constraints. Three key use cases are demonstrated below.

3.3.1. Property Prediction for New Alloy Compositions

The validated machine learning models enable practical applications that accelerate alloy development by replacing iterative experimental trials with computational predictions. Table 7 demonstrates property prediction for representative compositions spanning the design space. Three example alloys illustrate different design strategies:

(i) The high-strength formulation (Cu-4.0Ni-1.0Si-0.15Mg), with a Ni:Si atomic ratio of 1.91 and processing conditions of 1223 K solution treatment, 60% cold rolling, and 723 K/4 h aging, has a predicted hardness of 250.0 HV. The near-optimal Ni:Si stoichiometry ensures efficient Ni₂Si precipitate formation, while moderate cold rolling introduces dislocation density for strain-induced precipitation enhancement without excessive conductivity degradation. The corresponding predicted conductivity of 38.2%IACS represents a favorable balance for high-stress electrical applications.

(ii) The high-conductivity formulation (Cu-0.5Cr-0.15Zr), with minimal total alloying (0.65 wt%) and no cold rolling, has a predicted conductivity of 80.1%IACS with a predicted hardness of 129.8 HV. The low alloying content preserves matrix purity essential for electron transport, while chromium and zirconium provide precipitation strengthening during aging without the conductivity penalty associated with higher solute content. This composition targets applications requiring superior current-carrying capacity, such as resistance welding electrodes and power transmission components.

(iii) The balanced formulation (Cu-2.5Ni-0.6Si-0.3Cr-0.1Mg) with Ni:Si atomic ratio of 1.99 demonstrates that intermediate properties (242.9 HV and 41.6%IACS) are achievable through judicious combination of precipitation-hardening elements with proper stoichiometry. The moderate processing (40% cold rolling and 748 K/3 h aging) optimizes the strength–conductivity trade-off for general-purpose applications.
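
The Ni:Si atomic ratios quoted for these formulations follow directly from the wt% compositions. The short sketch below reproduces the conversion; the function name is illustrative and the atomic masses are rounded standard values.

```python
ATOMIC_MASS_NI = 58.693  # g/mol, rounded standard value
ATOMIC_MASS_SI = 28.086  # g/mol, rounded standard value

def ni_si_atomic_ratio(ni_wt, si_wt):
    """Ni:Si atomic ratio from Ni and Si contents given in wt%."""
    return (ni_wt / ATOMIC_MASS_NI) / (si_wt / ATOMIC_MASS_SI)

# Ni and Si contents (wt%) of formulations (i) and (iii).
for label, ni, si in [("high-strength, Cu-4.0Ni-1.0Si-0.15Mg", 4.0, 1.0),
                      ("balanced, Cu-2.5Ni-0.6Si-0.3Cr-0.1Mg", 2.5, 0.6)]:
    print(f"{label}: Ni:Si = {ni_si_atomic_ratio(ni, si):.2f}")
```

Because nickel is roughly twice as heavy as silicon, a Ni:Si atomic ratio near 2 (the Ni₂Si stoichiometry) corresponds to a weight ratio of about 4.2, which is why compositions such as 4.0Ni-1.0Si sit slightly on the silicon-rich side.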

3.3.2. Processing Parameter Optimization

Table 8 demonstrates processing optimization for a fixed Cu-3.0Ni-0.7Si-0.1Mg composition (Ni:Si atomic ratio = 2.05), showing how different processing routes shift properties along the strength–conductivity trade-off. The optimization algorithm searched the processing parameter space subject to metallurgical constraints: solution temperature (1173–1273 K), solution time (0.5–4.0 h), aging temperature (673–823 K), and aging time (0.5–12 h). Cold rolling bounds were adjusted based on the optimization target to reflect physical understanding of strengthening mechanisms. For maximum hardness (293.9 HV achieved), the optimizer selected: solution treatment at 1244 K for 4.0 h, 46% CR reduction, and aging at 712 K for 1.3 h. The moderate solution temperature ensures complete dissolution of Ni and Si into the matrix without excessive grain growth. The controlled cold rolling (46%) introduces dislocation networks that serve as heterogeneous nucleation sites for Ni₂Si precipitates. The relatively low aging temperature (712 K) favors nucleation of numerous fine precipitates over fewer coarse particles, maximizing dislocation–precipitate interaction frequency. The resulting conductivity (35.9%IACS) reflects the retained solute and precipitate scattering contributions. For maximum conductivity (45.7%IACS achieved), the optimizer selected: solution treatment at 1174 K for 1.9 h, 24% cold rolling, and aging at 777 K for 8.1 h. The lower cold rolling reduction (24%) minimizes residual dislocation density that would scatter electrons. The higher aging temperature (777 K) accelerates diffusion, promoting rapid and complete precipitation that removes solute from the matrix, while the extended aging time (8.1 h) ensures precipitation approaches completion. The resulting hardness (231.1 HV) remains adequate for many electrical applications while conductivity is substantially improved. 
The 62.8 HV hardness difference and 9.8%IACS conductivity difference between optimized conditions quantify the property range achievable from a single alloy composition through processing variation alone.

3.3.3. Inverse Design for Target Properties

Table 9 demonstrates the inverse design capability with stoichiometric constraints, where target properties are specified first and the algorithm identifies optimal composition and processing to achieve them while maintaining physical plausibility. The optimization enforces Ni:Si atomic ratio constraints (1.8–2.5) to ensure efficient Ni₂Si precipitation, preventing mathematically optimal but metallurgically unrealistic solutions.

For the balanced target of 220 HV hardness and 45%IACS conductivity, the constrained optimization converged on Cu-2.20Ni-0.53Si-0.45Cr-0.09Mg with processing parameters: solution treatment at 1223 K, tss = 2 h, 40% cold rolling, and aging at 770 K for 7.4 h. The identified composition maintains a Ni:Si atomic ratio of 2.00, which is optimal for Ni₂Si stoichiometry. The total alloying content (3.27 wt%) falls within the typical Cu-Ni-Si alloy range, and the processing conditions align with industrial practice. The algorithm achieved predictions matching the target values (220.0 HV and 45.0%IACS), without deviations, indicating that this specific property combination lies within the achievable design space. Given the low per-class R² for Cu-Ni-Si conductivity, conductivity targets carry substantial uncertainty, and these results should be treated as computational suggestions pending experimental validation.
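
The structure of such a constrained inverse-design search can be sketched as follows. This is not the optimizer used in the study, which is not specified here: a simple random search stands in for it, the surrogate function is a placeholder with invented coefficients in place of the trained models, and the Ni and Si sampling bounds are assumptions. Only the Ni:Si window (1.8–2.5), the aging-time bounds (0.5–12 h), and the target (220 HV, 45%IACS) are taken from the text.

```python
import random

NI_MASS, SI_MASS = 58.693, 28.086  # g/mol, rounded standard values

def ni_si_ratio(ni_wt, si_wt):
    """Ni:Si atomic ratio from wt% contents."""
    return (ni_wt / NI_MASS) / (si_wt / SI_MASS)

def placeholder_surrogate(ni_wt, si_wt, aging_h):
    """Stand-in for the trained models; coefficients are invented."""
    hardness = 120.0 + 35.0 * ni_wt + 40.0 * si_wt - 2.0 * aging_h
    conductivity = 60.0 - 6.0 * ni_wt - 8.0 * si_wt + 1.5 * aging_h
    return hardness, conductivity

def design_loss(predicted, target):
    """Sum of squared relative deviations from the target properties."""
    return sum(((p - t) / t) ** 2 for p, t in zip(predicted, target))

def inverse_design(target, n_trials=20000, seed=0):
    """Random search that rejects candidates outside the Ni:Si window."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        ni = rng.uniform(1.0, 5.0)        # assumed Ni bounds, wt%
        si = rng.uniform(0.2, 1.5)        # assumed Si bounds, wt%
        aging_h = rng.uniform(0.5, 12.0)  # aging-time bounds from the text
        if not 1.8 <= ni_si_ratio(ni, si) <= 2.5:
            continue  # violates the stoichiometric constraint
        loss = design_loss(placeholder_surrogate(ni, si, aging_h), target)
        if best is None or loss < best[0]:
            best = (loss, ni, si, aging_h)
    return best

loss, ni, si, aging_h = inverse_design(target=(220.0, 45.0))
print(f"Ni = {ni:.2f} wt%, Si = {si:.2f} wt%, aging = {aging_h:.1f} h, "
      f"Ni:Si = {ni_si_ratio(ni, si):.2f}, loss = {loss:.2e}")
```

Enforcing the stoichiometric window by rejection keeps every candidate metallurgically plausible, at the cost of discarding part of the sampled space. A penalty term or a constrained optimizer would be an alternative design choice.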

An important consideration for practical alloy design is prediction uncertainty. The present models provide point predictions without explicit confidence intervals. To estimate prediction uncertainty, ensemble prediction ranges were computed from the three trained models for each test sample. The mean prediction range (maximum minus minimum across models) was 10.15 HV for hardness and 2.23%IACS for conductivity, indicating reasonable inter-model agreement. Additionally, 90% prediction intervals derived from individual Random Forest tree predictions yielded mean interval widths of 55.98 HV for hardness and 13.76%IACS for conductivity, with empirical coverage rates of 70.2% and 79.5%, respectively. Both coverage rates fall below the nominal 90% level, so these intervals understate the true prediction uncertainty. Per-class analysis revealed that prediction intervals for Cu low alloys were notably narrower for hardness (31.15 HV vs. 60.09 HV for Cu-Ni-Si) but wider for conductivity (21.56%IACS vs. 12.03%IACS for Cu-Ni-Si), in both cases with higher coverage rates (91.7% and 96.2% for Cu low alloys versus 66.5% and 76.8% for Cu-Ni-Si alloys). The narrower hardness intervals for Cu low alloys reflect their lower compositional complexity and more predictable solid-solution strengthening mechanism, while the wider conductivity intervals likely arise from the broader conductivity range and greater sensitivity of Cu low alloys to processing variations. These uncertainty estimates should be considered when interpreting the optimization and inverse design results in Tables 7–9, as the predicted property values carry inherent uncertainty that could affect the practical realization of targeted properties.
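
The uncertainty measures described above can be sketched in a few lines. The sketch assumes that each interval is the central 90% band between the 5th and 95th percentiles of the per-tree predictions, which is one common reading of the procedure; all numerical values are hypothetical.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac

def tree_interval(tree_predictions, level=90.0):
    """Central interval from the spread of individual tree predictions."""
    ordered = sorted(tree_predictions)
    tail = (100.0 - level) / 2.0
    return percentile(ordered, tail), percentile(ordered, 100.0 - tail)

def empirical_coverage(intervals, y_true):
    """Fraction of measured values that fall inside their interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_true))
    return hits / len(y_true)

def ensemble_range(model_predictions):
    """Maximum minus minimum prediction across the trained models."""
    return max(model_predictions) - min(model_predictions)

# Hypothetical per-tree hardness predictions (HV) for three test samples.
per_tree = [[200.0, 210.0, 220.0, 230.0, 240.0],
            [100.0, 105.0, 110.0, 115.0, 120.0],
            [150.0, 160.0, 170.0, 180.0, 190.0]]
measured = [225.0, 130.0, 155.0]

intervals = [tree_interval(trees) for trees in per_tree]
coverage = empirical_coverage(intervals, measured)
print("Intervals:", intervals)
print(f"Empirical coverage: {coverage:.2f}")
print(f"Inter-model range: {ensemble_range([250.0, 243.1, 246.5]):.1f} HV")
```

Comparing the empirical coverage with the nominal level is what reveals whether tree-spread intervals are too narrow. Tree spread reflects disagreement among trees rather than measurement noise, so under-coverage is a common outcome for intervals built this way.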

A potential concern for model evaluation is data leakage arising from composition overlap between training and test sets. Because the dataset contains multiple entries for the same base composition processed under different thermomechanical conditions, 96.4% of hardness test samples and 97.9% of conductivity test samples share their rounded elemental composition (to 0.1 wt.%) with at least one training sample. This high overlap is inherent to the dataset structure, where the same alloy compositions are systematically studied under varying processing routes. Importantly, the model must still correctly predict the effect of different processing parameters on properties for these shared compositions, which constitutes a meaningful prediction task. Nonetheless, test performance on samples with unique compositions not seen during training was lower (R-squared = 0.358 for hardness, 0.715 for conductivity), indicating that extrapolation to entirely novel compositions is substantially less reliable than interpolation within the training composition space. This finding underscores that the models are most reliable when applied to compositions similar to those in the training database and should be used cautiously for genuinely novel alloy formulations. The grouped cross-validation results (R² = 0.36–0.44 for hardness) further quantify this limitation.
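
A composition-grouped split of the kind discussed here can be sketched as follows. Entries are grouped by elemental composition rounded to 0.1 wt%, mirroring the overlap check above, and whole groups are assigned to folds so that no composition appears in both training and test sets. The greedy fold-balancing rule and the toy entries are assumptions; library routines such as scikit-learn's GroupKFold offer comparable functionality.

```python
from collections import defaultdict

def composition_key(composition_wt, decimals=1):
    """Group label: elemental wt% values rounded to 0.1 wt%."""
    return tuple(sorted((el, round(wt, decimals))
                        for el, wt in composition_wt.items()))

def grouped_k_fold(compositions, k):
    """Yield (train_idx, test_idx) with no composition shared across sets."""
    groups = defaultdict(list)
    for idx, comp in enumerate(compositions):
        groups[composition_key(comp)].append(idx)
    # Greedy balancing: place the largest groups first in the smallest fold.
    folds = [[] for _ in range(k)]
    for members in sorted(groups.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical entries: the same alloys processed under different routes.
entries = [
    {"Ni": 3.0, "Si": 0.7, "Mg": 0.1},
    {"Ni": 3.0, "Si": 0.7, "Mg": 0.1},
    {"Ni": 3.02, "Si": 0.71, "Mg": 0.1},
    {"Cr": 0.5, "Zr": 0.15},
    {"Cr": 0.5, "Zr": 0.15},
    {"Ni": 4.0, "Si": 1.0, "Mg": 0.15},
]

for train_idx, test_idx in grouped_k_fold(entries, k=3):
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

A random split of the same entries would usually place one processing variant of an alloy in the training set and another in the test set, which is the source of the optimistic random-split scores relative to the grouped results.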

4. Conclusions

This study developed and evaluated machine learning models for predicting the hardness and electrical conductivity of Cu-Ni-Si and Cu low alloys, with the following key outcomes:

Three ensemble-based algorithms (Random Forest, XGBoost, and Gradient Boosting) were trained on a dataset of 1690 alloy entries. XGBoost performed best for hardness prediction (test R² = 0.855, RMSE = 29.90 HV), while Gradient Boosting achieved the best electrical conductivity prediction (test R² = 0.840, RMSE = 5.96%IACS). Standard five-fold cross-validation gave consistent performance across folds (mean CV R² of 0.84–0.86 for all models), indicating stable in-distribution performance. In contrast, composition-grouped cross-validation (K = 10) yielded R² = 0.36–0.44 for hardness and R² = 0.18–0.29 for conductivity, indicating limited generalization to unseen alloy formulations.
Averaged tree-based feature importance analysis identified VEC as the most influential predictor for hardness (39.9%), while Cu content dominated conductivity prediction (37.7%). For hardness, aging temperature (11.2%) and Ni content (6.3%) were identified as secondary factors, consistent with the importance of Ni₂Si precipitate formation and thermal processing. For conductivity, aging time (8.9%) and Si content (6.7%) emerged as key secondary features, consistent with the role of solute removal through precipitation in reducing electron scattering.
The systematic comparison between Cu-Ni-Si alloys (1427 entries) and Cu low alloys (263 entries) quantified their distinct property distributions and showed that a single, unified machine learning model can capture property trends and statistically learned correlates across both alloy families, supporting in-distribution screening of compositional and processing effects. Cu-Ni-Si alloys achieved a mean hardness of 222.6 HV (range: 57–386 HV) compared to 110.5 HV (range: 61–181 HV) for Cu low alloys, while Cu low alloys maintained a superior mean conductivity of 64.9%IACS versus 36.7%IACS for Cu-Ni-Si alloys. Correlation analysis revealed structured Ni-Si compositional coupling in Cu-Ni-Si alloys consistent with Ni₂Si precipitation stoichiometry, in contrast to the simpler correlation structure of Cu low alloys.
The trained models were applied to three alloy development use cases: (1) property prediction for new compositions, enabling rapid computational screening without physical synthesis; (2) processing parameter optimization using differential evolution with metallurgical constraints, which showed that the predicted hardness of a single composition can be varied from 231.1 to 293.9 HV by processing changes alone; and (3) inverse design with Ni:Si stoichiometric constraints, which identified an alloy formulation whose predicted properties match the specified targets (220 HV and 45%IACS) with no deviation, indicating that the targets lie within the model-predicted design space.

Several limitations should be acknowledged. First, the optimization and inverse design results are purely computational predictions requiring experimental validation. The composition-grouped cross-validation (R² = 0.36–0.44 for hardness, R² = 0.18–0.29 for conductivity) demonstrates that generalization to unseen alloy formulations is limited. Second, ensemble prediction ranges and RF-based prediction intervals provide approximate but not rigorous probabilistic uncertainty bounds. Third, the dataset exhibits class imbalance (84.4% Cu-Ni-Si vs. 15.6% Cu low alloys) and high composition overlap (96–98%); sample weighting did not improve minority-class performance. Fourth, Cu-Ni-Si conductivity predictions show low per-class R² (0.59–0.62), likely due to missing microstructure descriptors (precipitate state and precipitation completion degree). Fifth, SHAP interpretations identify data-driven correlates consistent with metallurgical frameworks but do not prove causal mechanisms. Future work should prioritize: (i) experimental validation; (ii) microstructural descriptors; (iii) Bayesian optimization; (iv) Pareto multi-objective optimization; and (v) expanded Cu low-alloy training data.

This work demonstrates that interpretable machine learning, combining accurate prediction with data-driven, physically informed interpretation through SHAP analysis, offers a useful tool for supporting copper alloy development within the training data distribution. The methodology presented here can be extended to other alloy systems where composition–processing–property relationships are complex and high-dimensional, with the potential to reduce experimental iterations in materials design, subject to experimental validation.

Funding

This research was funded by the Bulgarian National Science Fund, Project KП-06-H97/10 “Fabrication and Investigation of Antifriction Hybrid Nanocomposites Based on Aluminum Alloys Using an Integrated Approach Combining Experimental Methods and Machine Learning”.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The work in this publication was performed using equipment funded by the project BG16RFPR002-1.014-0006 “National Center of Excellence for Mechatronics and Clean Technologies”.

Conflicts of Interest

The author declares no conflicts of interest.

Figure 1. Dataset overview showing sample distribution and property histograms for Cu-Ni-Si and Cu low alloys.

Figure 2. Correlation heatmaps for Cu-Ni-Si alloys (a) and Cu low alloys (b).

Figure 3. Hardness versus electrical conductivity trade-off for Cu-Ni-Si and Cu low alloys.

Figure 4. Model performance across Random Forest, XGBoost, and Gradient Boosting algorithms for (a) Vickers hardness and (b) electrical conductivity.

Figure 5. Predicted versus actual values for the best performing models for: (a) hardness (HV) and (b) conductivity (%IACS).

Figure 6. SHAP summary plots showing feature impact on: (a) hardness (HV) and (b) electrical conductivity (%IACS) predictions.

Figure 7. SHAP dependence plots for top features affecting hardness and conductivity.

Figure 8. Averaged tree-based feature importance comparison between: (a) hardness and (b) conductivity models.

Figure 9. Element distribution comparison between Cu-Ni-Si and Cu low-alloyed systems.

Table 1. Dataset summary showing sample distribution across alloy classes.

Alloy Class	Samples	Hardness (HV) Samples	Conductivity (%IACS) Samples
Cu-Ni-Si alloys	1427	1325	1422
Cu low alloyed	263	182	263
Total	1690	1507	1685

Table 2. Statistical comparison of hardness and electrical conductivity between alloy classes.

Property	Alloy Class	N	Mean	Std	Min	Max	Median
Hardness (HV)	Cu-Ni-Si alloys	1325	222.61	70.91	57.0	386.0	219.0
Hardness (HV)	Cu low alloyed	182	110.53	35.46	61.0	181.0	107.0
Electrical conductivity (%IACS)	Cu-Ni-Si alloys	1422	36.72	10.07	8.0	86.0	36.0
Electrical conductivity (%IACS)	Cu low alloyed	263	64.88	17.77	20.0	93.0	69.0

Table 3. Aggregate model performance metrics for hardness and electrical conductivity prediction.

Target	Model	Train R²	Test R²	Train RMSE	Test RMSE	Train MAE	Test MAE
Hardness (HV)	Random Forest	0.9111	0.8449	22.75	30.97	14.81	21.10
Hardness (HV)	XGBoost	0.9307	0.8554	20.08	29.90	10.00	18.86
Hardness (HV)	Gradient Boosting	0.9311	0.8546	20.03	29.99	9.68	19.11
Electrical conductivity (%IACS)	Random Forest	0.9180	0.8256	4.47	6.22	2.84	4.17
Electrical conductivity (%IACS)	XGBoost	0.9387	0.8285	3.86	6.17	1.86	3.63
Electrical conductivity (%IACS)	Gradient Boosting	0.9391	0.8400	3.85	5.96	1.80	3.63

Table 4. 5-fold cross-validation results showing model stability.

Target	Model	CV R² Mean	CV R² Std	CV RMSE Mean	CV RMSE Std
Hardness (HV)	Random Forest	0.8420	0.0119	30.48	1.26
Hardness (HV)	XGBoost	0.8551	0.0172	29.14	1.60
Hardness (HV)	Gradient Boosting	0.8524	0.0152	29.44	1.57
Electrical conductivity (%IACS)	Random Forest	0.8396	0.0314	6.12	0.45
Electrical conductivity (%IACS)	XGBoost	0.8610	0.0284	5.70	0.44
Electrical conductivity (%IACS)	Gradient Boosting	0.8581	0.0289	5.75	0.43

Table 5. Complete validation metrics summary, including training, testing, and cross-validation performance.

Target	Model	Train R²	Test R²	Train RMSE	Test RMSE	Train MAE	Test MAE	CV R² Mean	CV R² Std	CV RMSE Mean	CV RMSE Std
Hardness (HV)	Random Forest	0.9111	0.8449	22.75	30.97	14.81	21.10	0.8420	0.0119	30.48	1.26
Hardness (HV)	XGBoost	0.9307	0.8554	20.08	29.90	10.00	18.86	0.8551	0.0172	29.14	1.60
Hardness (HV)	Gradient Boosting	0.9311	0.8546	20.03	29.99	9.68	19.11	0.8524	0.0152	29.44	1.57
Electrical conductivity (%IACS)	Random Forest	0.9180	0.8256	4.47	6.22	2.84	4.17	0.8396	0.0314	6.12	0.45
Electrical conductivity (%IACS)	XGBoost	0.9387	0.8285	3.86	6.17	1.86	3.63	0.8610	0.0284	5.70	0.44
Electrical conductivity (%IACS)	Gradient Boosting	0.9391	0.8400	3.85	5.96	1.80	3.63	0.8581	0.0289	5.75	0.43

Table 6. Top 10 averaged tree-based feature importance rankings for hardness (HV) and electrical conductivity (%IACS) prediction.

Target	Rank	Feature	Importance
Hardness (HV)	1	VEC	0.3988
Hardness (HV)	2	Tag (K)	0.1123
Hardness (HV)	3	Ni	0.0628
Hardness (HV)	4	tag (h)	0.0622
Hardness (HV)	5	CR reduction (%)	0.0573
Hardness (HV)	6	Tss (K)	0.0542
Hardness (HV)	7	tss (h)_missing	0.0446
Hardness (HV)	8	tss (h)	0.0407
Hardness (HV)	9	Tss (K)_missing	0.0257
Hardness (HV)	10	Mg	0.0200
Electrical conductivity (%IACS)	1	Cu	0.3766
Electrical conductivity (%IACS)	2	tag (h)	0.0887
Electrical conductivity (%IACS)	3	Si	0.0671
Electrical conductivity (%IACS)	4	Tag (K)	0.0516
Electrical conductivity (%IACS)	5	Ni	0.0454
Electrical conductivity (%IACS)	6	Aging_encoded	0.0373
Electrical conductivity (%IACS)	7	CR reduction (%)	0.0347
Electrical conductivity (%IACS)	8	Tss (K)	0.0318
Electrical conductivity (%IACS)	9	Tss (K)_missing	0.0298
Electrical conductivity (%IACS)	10	VEC	0.0288
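
The averaging behind a ranking such as Table 6 can be sketched as follows. The sketch assumes that each model's importances are normalized to sum to one and then averaged with equal weights across the models; the text only states that the importances were averaged, so this weighting is an assumption. The function name and the importance values are hypothetical and are not taken from the trained models.

```python
def average_importance(importances_by_model):
    """Average per-model feature importances and rank features in descending order.

    importances_by_model: list of dicts mapping feature name -> importance.
    Each dict is normalized to sum to 1 before averaging (assumed convention).
    """
    n_models = len(importances_by_model)
    averaged = {}
    for name in importances_by_model[0]:
        averaged[name] = sum(m[name] / sum(m.values()) for m in importances_by_model) / n_models
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Toy importances for three models and three features (illustrative values only).
rf = {"VEC": 0.5, "Tag (K)": 0.3, "Ni": 0.2}
gb = {"VEC": 0.4, "Tag (K)": 0.2, "Ni": 0.4}
xgb = {"VEC": 0.6, "Tag (K)": 0.1, "Ni": 0.3}

ranked = average_importance([rf, gb, xgb])
for rank, (feature, value) in enumerate(ranked, start=1):
    print(rank, feature, round(value, 4))
```

Averaging across models damps the idiosyncrasies of any single algorithm's importance measure, but the result is still a correlational ranking and not a causal attribution.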

Table 7. Predicted properties for representative alloy compositions.

Alloy Type	Composition	Processing	HV	%IACS
High-strength	Cu-4.0Ni-1.0Si-0.15Mg	Tss = 1223 K, tss = 2 h, CR reduction = 60%, Tag = 723 K, tag = 4 h	250.0	38.2
High-conductivity	Cu-0.5Cr-0.15Zr	Tss = 1223 K, tss = 1 h, No CR reduction, Tag = 723 K, tag = 4 h	129.8	80.1
Balanced	Cu-2.5Ni-0.6Si-0.3Cr-0.1Mg	Tss = 1223 K, tss = 2 h, CR reduction = 40%, Tag = 748 K, tag = 3 h	242.9	41.6

Table 8. Optimized processing for Cu-3.0Ni-0.7Si-0.1Mg alloy.

Optimization Target	Tss (K)	tss (h)	CR Reduction (%)	Tag (K)	tag (h)	HV	%IACS
Maximum Hardness	1244	4.0	46	712	1.3	293.9	35.9
Maximum Conductivity	1174	1.9	24	777	8.1	231.1	45.7
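
The kind of search behind Table 8 can be illustrated with a minimal differential evolution loop. Everything in this sketch is an assumption made for illustration: `toy_hardness` is a hypothetical analytical stand-in for a trained model, the bounds on aging temperature and time are invented, and the DE/rand/1/bin variant with these settings is not necessarily the configuration used in the study. In practice a library routine such as SciPy's `scipy.optimize.differential_evolution` would normally be preferred.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           f=0.7, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin minimizer. Returns (best_vector, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip to the box constraints
                else:
                    v = pop[i][d]
                trial.append(v)
            score = objective(trial)
            if score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_hardness(x):
    """Hypothetical surrogate: hardness (HV) peaks at Tag = 720 K and tag = 2 h."""
    t_ag, time_ag = x
    return 290.0 - 0.002 * (t_ag - 720.0) ** 2 - 4.0 * (time_ag - 2.0) ** 2


# Assumed bounds: aging temperature 650-800 K, aging time 0.5-10 h.
bounds = [(650.0, 800.0), (0.5, 10.0)]
best_x, neg_hv = differential_evolution(lambda x: -toy_hardness(x), bounds)
best_hv = -neg_hv
print(best_x, best_hv)
```

Maximizing a property is expressed here as minimizing its negative, and the bounds play the role of the metallurgical constraints. Replacing `toy_hardness` with a trained regressor's prediction function gives the model-based version of the same search.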

Table 9. Inverse design results for target properties with stoichiometric constraints.

Target Property	Target Value	Optimized Composition	Ni:Si Ratio	Processing Conditions	Predicted Value
Hardness	220 HV	Cu-2.20Ni-0.53Si-0.45Cr-0.09Mg	2.00	Tss = 1223 K, tss = 2 h, CR reduction = 40%, Tag = 770 K, tag = 7.4 h	220 HV
Conductivity	45%IACS	Cu-2.20Ni-0.53Si-0.45Cr-0.09Mg	2.00	Tss = 1223 K, tss = 2 h, CR reduction = 40%, Tag = 770 K, tag = 7.4 h	45%IACS
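
The Ni:Si atomic ratio listed in Table 9 follows from the weight percentages and the atomic masses of the two elements. The short sketch below shows the conversion and its inverse. The function names are hypothetical, and the atomic masses are standard values rather than numbers quoted in the article.

```python
ATOMIC_MASS = {"Ni": 58.6934, "Si": 28.0855}  # g/mol, standard atomic weights


def ni_si_atomic_ratio(ni_wt, si_wt):
    """Ni:Si atomic ratio computed from weight percentages."""
    return (ni_wt / ATOMIC_MASS["Ni"]) / (si_wt / ATOMIC_MASS["Si"])


def si_for_ratio(ni_wt, ratio=2.0):
    """Si content (wt.%) that gives the requested Ni:Si atomic ratio for a given Ni content."""
    return ni_wt * ATOMIC_MASS["Si"] / (ATOMIC_MASS["Ni"] * ratio)


ratio = ni_si_atomic_ratio(2.20, 0.53)  # rounded composition from Table 9
si_needed = si_for_ratio(2.20)          # Si wt.% for an exact 2:1 ratio
print(round(ratio, 2), round(si_needed, 2))
```

For the rounded composition Cu-2.20Ni-0.53Si the computed ratio is about 1.99, which agrees with the tabulated 2.00 once the rounding of the reported weight percentages is taken into account.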

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kolev, M. Machine Learning-Driven Comparative Analysis and Optimization of Cu-Ni-Si and Cu Low Alloys: From Data-Driven Interpretation to Inverse Design. Alloys 2026, 5, 9. https://doi.org/10.3390/alloys5020009
