Inverse Design of Aluminium Alloys Using Genetic Algorithm: A Class-Based Workflow

Abstract: The design of aluminium alloys often encounters a trade-off between strength and ductility, making it challenging to achieve desired properties. Adding to this challenge is the broad range of alloying elements, their varying concentrations, and the different processing conditions (features) available for alloy production. Traditionally, the inverse design of alloys using machine learning involves combining a trained regression model for the prediction of properties with a multi-objective genetic algorithm to search for optimal features. This paper presents an enhancement to this approach by integrating data-driven classes to train class-specific regressors. These models are then used individually with genetic algorithms to search for alloys with high strength and elongation. The results demonstrate that this improved workflow can surpass traditional class-agnostic optimisation in predicting alloys with higher tensile strength and elongation.


Introduction
Aluminium (Al) alloys remain integral to numerous industries, including aerospace, construction, electronics, transportation, and marine [1][2][3]. Pure Al is not used as a structural material owing to its low strength, with alloying being critical to enhancing strength and physical properties [4]. The use of Al alloys is essential in many applications for minimising weight due to their high strength-to-weight ratio; however, high strength often comes at the cost of lower ductility. Structure-property relationships in Al alloys permit the development of alloys with diverse properties that can be tailored for specific applications. For instance, 5xxx series Al alloys (based on the Al-Mg system) are chosen for marine applications owing to their corrosion resistance [4], while 7xxx series alloys (based on the Al-Zn-Mg system) are preferred in applications that require high strength and damage tolerance [5].
The experimental design of Al alloys has traditionally relied on a trial-and-error approach, which faces inherent challenges due to the number of alloying elements and different processing conditions (features) available for the production and fabrication of alloys [5]. The trial-and-error approach primarily focuses on the isolated examination of a single feature [6][7][8][9], which, in the case of Al alloys, may include the effect of the concentration of an alloying element or the impact of varying processing parameters. While readily interpretable, such approaches are ineffective for efficiently investigating simultaneous changes in multiple features in alloys, particularly in Al alloys that may include more than 15 alloying elements at different concentrations [5]. The relationship between the features (alloying elements and processing conditions) and the mechanical properties (targets) is non-linear, posing significant challenges in the design of Al alloys.
Machine learning has emerged as a powerful tool for identifying non-linear relationships in metallic alloys (including Al alloys), successfully predicting mechanical properties based on alloy compositions and processing conditions [10][11][12][13][14][15][16]. Random forest models have predicted tensile strength and elongation in wrought Al alloys with 11% and 14% error rates, respectively [15]. In Al-Mg-Si alloys, random forest models have also been reported to perform better than neural network and support vector regression in tensile strength prediction, achieving an error rate of 2.87% on a test set [12]. Despite these advancements in forward-predictive models, inverse design [17], which involves creating alloys based on target properties, remains a complex task. The exhaustive exploration of all possible alloy combinations for inverse design is infeasible due to the vast combinatorial space, which includes multiple alloying elements and various processing conditions [18,19].
Multi-objective optimisation algorithms, particularly genetic algorithms (a type of evolutionary computing), have been extensively used for inverse design, thus addressing the complexities of exploring vast combinatorial spaces [12,20-24]. For instance, Feng et al. [12] combined a random forest model with the Non-dominated Sorting Genetic Algorithm (NSGA-II) to optimise strength and ductility in Al-Mg-Si alloy. This led to the successful prediction of an Al alloy with alloying concentrations of 0.74% Mg, 0.78% Si, and 0.37% Cu, with 410 MPa tensile strength and 15.2% elongation. Experimentally, this alloy composition demonstrated superior performance to the commonly used AA6013 alloy, with a tensile strength of 410 MPa and an elongation of 15.2%. Genetic programming, in combination with NSGA-II, has been utilised to enhance the strength and ductility of age-hardenable Al alloys. This approach led to the formulation of an alloy with a high Zn concentration of 5 wt% and moderate levels of Cu and Mg, each at 2 wt%. The resultant alloy showed a tensile strength of 356 MPa and elongation of 13% when peak-aged. Similarly, rough fuzzy models have been integrated with genetic algorithms to design Al alloys that exhibit high yield strength and elongation at cryogenic temperatures [21,25]. Cu and Mg emerged as critical alloying elements, with the optimal composition identified as Al-Cu-Mg-Si alloys, with Cu concentrations varying from 0.82 to 2.03 wt% and Mg concentrations between 0.72 and 1.48 wt%.
The design of Al alloys using machine learning has predominantly focused on specific subsets or classes of aluminium alloys, such as the Al-Mg-Si series [12] or age-hardenable alloys [20,21,24]. However, it remains unclear whether optimising properties within these specific classes yields more beneficial properties compared to a broader optimisation strategy that uses the entire dataset. For mechanical property prediction, data-driven class-based models have been reported to have higher accuracy [13]. It is unclear whether class-based optimisation offers any advantages in the context of alloy design using genetic algorithms.
This study presents a workflow to improve the performance of traditional multi-objective optimisation-based design. The proposed workflow utilises data-driven classes by training class-specific regressors. These regressors are used to optimise the two objectives, tensile strength and elongation, for each class. Additionally, this study uses recursive feature elimination to refine and reduce each class's feature space. The performance of class-based optimisation is compared with class-agnostic models by comparing the optimal objectives predicted on the Pareto front. Finally, to assess the utility of the class-based genetic design, the alloys predicted are compared with those already reported in the literature.

Dataset
In this study, we use a publicly accessible dataset of Al alloys, which was curated by the authors [26]. This dataset encompasses cast and wrought alloys and includes age-hardened and strain-hardened alloys. The dataset includes three mechanical properties (targets): tensile strength (44.81-820 MPa), yield strength (151.26-790 MPa), and elongation (0.5-50%). It also includes information on the concentration of 25 different alloying elements and the manufacturing processes, which are grouped into ten distinct processing conditions (features). The dataset contains 1154 instances, with 933 alloys having complete data on all three mechanical properties. Figure 1 illustrates the distribution of tensile strength and elongation within the dataset, with each alloy class denoted by different colours. This visual representation also highlights the inherent trade-off between strength and ductility in these alloys.
Metals 2024, 14, 239
In a previous study, iterative label spreading was used [27] to identify eight distinct clusters in the data determined by feature similarity [28]. Further, a decision tree classifier showed that these clusters are separable classes, and information on these classes is also included in the dataset [28]. Class 1 is characterised by "as cast" or "solution heat-treated" alloys. Class 2 includes alloys with a high Cu content, while Class 3 consists of cold-worked and artificially aged alloys. Class 4 includes over-aged alloys, followed by Class 5, which features strain-hardened alloys with Mg additions. Class 6 consists of naturally aged alloys, and Classes 7 and 8 are differentiated by their high Mg and Fe concentrations, respectively.

Multi-Target Random Forest Models
This study uses random forest regressors for forward prediction due to their higher accuracy in predicting the mechanical properties of aluminium alloys [12,15]. Tree-based methods partition the feature space into distinct regions through successive splits, beginning at the root and continuing until a stop criterion is reached [29]. Each split is determined by a greedy algorithm aiming to reduce a loss, with mean squared error [30] commonly used for regression. Each node in the tree represents one of these splits. Random forest is an ensemble machine learning method that uses a decision tree as the base model [31]. A random forest is constructed by generating multiple decision trees, each trained using bootstrap aggregation and random feature selection. This approach ensures that each tree in the forest uses a different subset of features, minimising the influence of any single feature on the overall partitioning process and reducing correlation amongst individual trees. For regression tasks, the final output is the average predicted value from all the individual trees in the forest. Random forest regressors also expose feature importance profiles using variance reduction. These profiles are valuable for identifying how the features are used in the model architecture for property prediction. The random forest regressor was implemented using the scikit-learn library [32].
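As an illustration of the multi-target averaging described above, the following minimal sketch fits a scikit-learn random forest on synthetic stand-in data. The data, feature count, and hyperparameter values are assumptions for demonstration only and are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (hypothetical): 200 "alloys", 6 features, 3 targets.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 6))
Y = np.column_stack([
    3.0 * X[:, 0] + X[:, 1],         # stand-in for tensile strength
    2.5 * X[:, 0] + 0.5 * X[:, 2],   # stand-in for yield strength
    1.0 - X[:, 0] + 0.2 * X[:, 3],   # stand-in for elongation
]) + rng.normal(0.0, 0.01, size=(200, 3))

# Bootstrap sampling plus a random feature subset at each split
# (hyperparameter values are illustrative assumptions).
forest = RandomForestRegressor(
    n_estimators=100, max_features="sqrt", bootstrap=True, random_state=0
)
forest.fit(X, Y)

# One forward pass predicts all three targets at once.
pred = forest.predict(X[:5])  # shape (5, 3)

# The forest output is the average of the individual tree outputs.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
mean_of_trees = per_tree.mean(axis=0)

# Impurity-based (variance reduction) feature importances, normalised to sum to 1.
importances = forest.feature_importances_
```

Comparing `pred` with `mean_of_trees` shows the averaging step directly, and `importances` is the kind of profile referred to as a feature importance profile in the text.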

Multi-Objective Optimisation Method
This study utilised the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to identify the concentration of alloying elements leading to optimal tensile strength and ductility [33]. Here, "optimal" refers to the best possible trade-off between these two mechanical properties. The NSGA-II identifies a set of optimal solutions known as Pareto-optimal solutions, representing the best trade-offs between conflicting objectives such as strength and ductility. An initial population of potential solutions is randomly generated. Each solution in the population is encoded as a continuous-valued chromosome, representing a possible alloying concentration. The random forest regressor is then used to predict the tensile strength and elongation of the potential solutions in the fitness function. The NSGA-II algorithm uses a non-dominated sorting method, which ranks individuals based on dominance criteria. In this study, the dominance criterion determines whether one solution is superior in at least one objective (either tensile strength or elongation) without being inferior in the other, thus identifying the most effective trade-offs between these two properties. Solutions are classified into different fronts, the first front being entirely non-dominated, with subsequent fronts representing solutions dominated by those in the preceding fronts. Within each front, solutions are assigned a crowding distance value, measuring the density of solutions surrounding a particular alloy.
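The ranking and crowding steps described above can be sketched in a few lines of NumPy. This is a simplified illustration with both objectives maximised, not the pymoo implementation used in the study; the function names and the five example (tensile strength, elongation) pairs are hypothetical.

```python
import numpy as np

def dominates(a, b):
    """True if a dominates b when every objective is maximised."""
    return bool(np.all(a >= b) and np.any(a > b))

def non_dominated_fronts(F):
    """Split the rows of F (solutions x objectives) into successive fronts."""
    remaining = list(range(len(F)))
    fronts = []
    while remaining:
        # A solution belongs to the current front if nothing left dominates it.
        front = [i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def crowding_distance(F):
    """Crowding distance within one front; boundary solutions get infinity."""
    n, m = F.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        span = F[order[-1], k] - F[order[0], k]
        dist[order[0]] = dist[order[-1]] = np.inf
        if span == 0 or n < 3:
            continue
        # Normalised gap between the two neighbours of each interior solution.
        dist[order[1:-1]] += (F[order[2:], k] - F[order[:-2], k]) / span
    return dist

# Hypothetical (tensile strength in MPa, elongation in %) pairs.
F = np.array([[300.0, 25.0], [450.0, 12.0], [600.0, 6.0],
              [400.0, 10.0], [280.0, 20.0]])
fronts = non_dominated_fronts(F)
first_front_cd = crowding_distance(F[fronts[0]])
```

In this example the first three alloys form the first front, while the last two are each dominated by a member of that front and fall into the second front.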
In this study, the selection process was carried out using a binary tournament selection based on rank and crowding distance [34,35]. Crossover and mutation were applied to create a new offspring population. The algorithm iterates through generations until a termination condition is met, which could be a predefined number of generations or a convergence criterion based on the diversity of the Pareto front. A hypervolume measure is often used to measure the convergence of a multi-objective optimisation algorithm [36]. The NSGA-II algorithm implementation discussed in this paper used the pymoo package, the details of which are available in [37]. In the implementation of the genetic algorithm, the following parameters were manually determined to ensure that the algorithm converged: a population size of 500 individuals, 125 offspring per generation, a mutation probability of 0.2, and a crossover probability of 0.9. The optimisation was carried out for 500 generations. The hypervolume plot demonstrating convergence can be found in the supplementary information.
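To make the hypervolume measure concrete, the sketch below computes it for a two-objective front in which both objectives are maximised. The function name, the reference point at the origin, and the example fronts are assumptions for illustration; the study's own convergence plot was produced with pymoo and may use a different reference point and scaling.

```python
import numpy as np

def hypervolume_2d(front, ref=(0.0, 0.0)):
    """Area dominated by a two-objective front (both maximised) above ref."""
    pts = np.asarray(front, dtype=float)
    pts = pts[np.all(pts > np.asarray(ref), axis=1)]  # drop points not better than ref
    if len(pts) == 0:
        return 0.0
    pts = pts[np.argsort(-pts[:, 0])]  # sort by first objective, descending
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:  # dominated points add no new area
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area

# Hypothetical fronts from an early and a late generation.
early_front = [(500.0, 5.0), (300.0, 15.0)]
late_front = [(600.0, 6.0), (450.0, 12.0), (300.0, 25.0), (400.0, 10.0)]

hv_early = hypervolume_2d(early_front)
hv_late = hypervolume_2d(late_front)
```

A hypervolume that stops increasing over successive generations is the usual sign that the front has converged.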

An Alloy Design Workflow
As illustrated in Figure 2, the workflow starts with partitioning the dataset into data-driven classes, as identified in a previous study. For each class, distinct random forest models (R_i) are trained. Prior to the training of regressors, feature engineering is conducted. This involves the removal of features exhibiting a linear correlation greater than 95% to reduce bias. The processing condition feature is one-hot encoded [38]. Subsequently, the remaining features and targets undergo normalisation using MinMaxScaler. Each dataset is then partitioned into 80% train and 20% test sets. The hyperparameters of all regressors are optimised using a random grid search with 1000 iterations and 5-fold cross-validation at each iteration.
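The preprocessing and tuning steps described above can be sketched with pandas and scikit-learn as follows. The column names, the synthetic data, the hyperparameter grid, and the helper `drop_correlated` are all hypothetical, and the search uses only 5 iterations so that the example runs quickly, whereas the study reports 1000.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler

def drop_correlated(df, threshold=0.95):
    """Drop one feature from each pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Synthetic stand-in for one class of the dataset (hypothetical columns).
rng = np.random.default_rng(0)
n = 120
data = pd.DataFrame({
    "Mg": rng.uniform(0, 5, n),
    "Si": rng.uniform(0, 2, n),
    "processing": rng.choice(["as_cast", "naturally_aged", "artificially_aged"], n),
})
data["Mg_copy"] = data["Mg"] * 1.001  # deliberately redundant feature
targets = pd.DataFrame({
    "tensile_strength": 100 + 60 * data["Mg"] + rng.normal(0, 5, n),
    "elongation": 30 - 4 * data["Mg"] + rng.normal(0, 1, n),
})

# 1. Remove highly correlated features; 2. one-hot encode the processing condition.
numeric = drop_correlated(data[["Mg", "Si", "Mg_copy"]])
encoded = pd.get_dummies(data["processing"], prefix="proc", dtype=float)
features = pd.concat([numeric, encoded], axis=1)

# 3. Min-max normalise features and targets; 4. split 80/20.
X_scaled = MinMaxScaler().fit_transform(features)
y_scaled = MinMaxScaler().fit_transform(targets)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_scaled, test_size=0.2, random_state=0
)

# 5. Random search over hyperparameters with 5-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=5, cv=5, scoring="neg_mean_squared_error", random_state=0,
)
search.fit(X_train, y_train)
```

The sketch follows the order given in the text (normalise, then split). Fitting the scalers on the training split only is a common variant that avoids leaking test-set ranges into training.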
Recursive feature elimination is used to determine the optimal number of features required to maintain similar accuracy. Subsequently, optimised random forest models (R′_i) are trained for each class using this curated set of features. The optimised models are used in the fitness function of the NSGA-II algorithm to predict tensile strength and elongation. Then, the NSGA-II algorithm is used to identify the Pareto front (P_i), which includes optimal alloying features that lead to dominating mechanical properties. The size of the chromosome is equal to the number of features selected for each class. The objective functions for the NSGA-II algorithm are the predicted tensile strength and elongation using R′_i. Further, the concentration of each alloying element is constrained between zero and the maximum concentration of that element in the class.
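A minimal sketch of how a class-specific regressor and the concentration bounds might be wrapped into a fitness function is given here. The three-element feature space, the synthetic data, and the `fitness` helper are hypothetical. The objectives are negated because pymoo treats every objective as a quantity to minimise.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for one class: columns are hypothetical Mg, Mn, Sc contents (wt%).
rng = np.random.default_rng(1)
X_class = rng.uniform(0.0, [6.0, 1.5, 0.9], size=(150, 3))
y_class = np.column_stack([
    120 + 30 * X_class[:, 0] + 80 * X_class[:, 2],  # stand-in tensile strength
    35 - 3 * X_class[:, 0] - 5 * X_class[:, 2],     # stand-in elongation
])
regressor = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_class, y_class)

# Each gene is bounded between zero and the class maximum of that element.
lower = np.zeros(X_class.shape[1])
upper = X_class.max(axis=0)

def fitness(population):
    """Negated predictions for a population of chromosomes (one row per alloy)."""
    clipped = np.clip(population, lower, upper)  # enforce the concentration bounds
    return -regressor.predict(clipped)           # maximise by minimising the negative

# A random initial population with one gene per selected feature.
population = rng.uniform(lower, upper, size=(10, lower.size))
objectives = fitness(population)
```

With a minimising optimiser, the returned columns would be negated again after the run to recover the predicted tensile strength and elongation on the Pareto front.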
Parallel to the class-based approach, a random forest regressor (R_all) is trained using the entire dataset. Similar to the class-based workflow, an optimised model (R′_all) is developed using features selected through recursive feature elimination and feature importance profiles. This optimised model is then utilised in the NSGA-II algorithm's fitness function to compute the Pareto front for the entire dataset.
The class-based Pareto fronts are then compared to P_all to evaluate the effectiveness of the class-based approach against a more generalised model, identifying each method's benefits and potential drawbacks. The comparison is conducted manually with the use of a scatter plot for tensile strength against elongation for each respective Pareto front.

Model Training and Feature Selection
Multi-targeted random forest regressors were trained to predict the three mechanical properties. Learning curves were used to assess model overfitting and underfitting. The test set accuracies of the trained models R_i, where i = {1, 2, 3, 4, 5, 6, 7, 8, all}, are provided in Table 1 using the mean squared error (MSE) and mean absolute error (MAE) [39]. The relative importance of each feature to the property prediction is presented using feature importance (FI) profiles in Figure 3, which shows that some features have negligible importance in the prediction of mechanical properties. Eliminating such features before training might not affect the accuracy of the model. Further, the significance of processing conditions is negligible in some class-based regressors, particularly in comparison with R_all. This is attributed to alloys within such classes having the same processing condition. In classes with multiple processing conditions, for example, Class 8, the importance of processing conditions can be seen. This can be further validated using the recursive feature elimination (RFE) profiles in Figure 4. Recursive feature elimination is carried out by iteratively identifying and removing the least significant features. The optimal number of features is determined using recursive feature elimination, as noted in Table 1. The number of features required for optimal performance is lower for all class-based regressors than for R_all. This means that class-based regressors need fewer features to predict the mechanical properties.
Notably, there is a disparity between the FI and RFE profiles for Class 4. The FI profile highlights several important features, whereas the RFE profile indicates that optimal accuracy is attainable using merely two features. The RFE profile for Class 4 suggests that as the number of features increases, the regressor begins to overfit the data. Consequently, variance reduction from additional features, which might not be needed for the optimal model, is incorporated when computing the FI profiles. This observation indicates that some features, though appearing significant in an overfitted model, may not be needed to achieve the best model performance. Furthermore, Class 7's RFE profile reveals an initial minimum error with a single feature, which subsequently rises upon integrating additional features, indicative of overfitting. The low error with a single feature suggests that Mg alone is a strong predictor in the model.
The random forest models were retrained using the selected features. The test set accuracy is also reported in Table 1, which shows that there is no loss of accuracy compared to the respective regressors trained on all the features. The MAE of individual targets, along with the standard deviation error, is provided in the supplementary information. For the majority of the optimised class-based regressors (R′_1, R′_2, R′_4, R′_5 and R′_6), the optimised model exhibited superior performance, as indicated by the lower MSE and MAE values compared with the R′_all model accuracy. The MAE values for R′_4 (0.0315) and R′_6 (0.0290) were significantly lower than those of the other class-based regressors. However, the MAE for R′_7 and R′_8 was higher than for R′_all. The learning curves of the models are presented in the supplementary information. For R′_4 and R′_6, the learning curves displayed a consistent decrease in MSE with increasing training sizes, showing low bias and variance. In contrast, the learning curve for R′_7 demonstrated a high MSE even as the training size increased. R′_8 also exhibited higher MSE values than R′_4 and R′_6, although the increase in MSE was less than for R′_7. The learning curves of both classes show higher underfitting compared with the other classes. R′_all showed an overall higher MSE compared with the other classes. The learning curve for R′_all suggests that the model underfits the data, suggesting that the model is not complex enough to generalise across the dataset. This underperformance could also be attributed to the current feature set not fully capturing the relationships with the targets.
This figure demonstrates that a subset of features can achieve the same level of accuracy as using the entire feature set.
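The elimination loop described above can be sketched as follows. This is a hand-written illustration on synthetic data with hypothetical feature names, using a single held-out split for brevity; it is not the exact procedure or data behind the RFE profiles in Figure 4, and cross-validated errors would give a more robust profile.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three informative features and two noise features.
rng = np.random.default_rng(0)
names = ["Mg", "Si", "Cu", "noise_1", "noise_2"]  # hypothetical feature names
X = rng.uniform(0.0, 1.0, size=(300, 5))
Y = np.column_stack([
    2.0 * X[:, 0] + X[:, 1],        # stand-in tensile strength
    1.0 - X[:, 0] + 0.5 * X[:, 2],  # stand-in elongation
]) + rng.normal(0.0, 0.02, size=(300, 2))
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

active = list(range(X.shape[1]))  # indices of the features still in the model
history = []                      # (feature names used, held-out MAE) per step
while active:
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr[:, active], Y_tr)
    mae = mean_absolute_error(Y_te, model.predict(X_te[:, active]))
    history.append(([names[i] for i in active], mae))
    # Remove the least important remaining feature and repeat.
    weakest = int(np.argmin(model.feature_importances_))
    active.pop(weakest)

# The subset with the lowest error defines the optimal number of features.
best_features, best_mae = min(history, key=lambda item: item[1])
```

Plotting the MAE values in `history` against the number of features gives a profile of the same kind as the RFE profiles discussed in the text.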

Pareto Front
The trained regressors were used to identify the Pareto front maximising tensile strength and elongation using the NSGA-II algorithm. P_all, which is calculated using R_all, serves as a baseline for comparison to other class-based Pareto fronts. The optimal tensile strength and elongation predicted using selected regressors are presented in Figure 5. P_1, P_2, and P_6 were selected because they collectively dominate P_all. Other Pareto fronts can be found in the supplementary information provided. P_all spans a more extensive range of tensile strength and elongation than the class-based regressors. However, the class-based Pareto fronts (P_1, P_2 and P_6) dominate P_all within their respective regions.
Figure 5 (caption partially recovered): [...] [40] and experimentally tested alloys similar to the predicted alloys [41][42][43]. A detailed presentation of all other Pareto fronts can be found in this paper's supplementary information.
The baseline Pareto front P_all exhibits comparable performance to the reported literature. Specifically, in the tensile strength range of 350-450 MPa and elongation between 15% and 20%, P_all resembles the Pareto front reported by Feng et al. [12], which focused on optimising the properties of Al-Mg-Si alloys. Further, the Pareto front presented by Sekhar et al. [40] for the AA6063 alloy outperforms P_all in the 20-30% elongation region and has similar characteristics in regions below 20% elongation. However, the performance in the 20-30% elongation region is comparable to the Pareto front P_1.
The enhanced performance of class-based Pareto fronts over P_all can be attributed to their focused optimisation within smaller regions because they are trained in specific regions of the feature space. This facilitates a more efficient optimisation process, enhancing the predicted tensile strength and elongation in the Pareto front.
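The comparison in this study is made visually on a scatter plot. As a complementary illustration only, the sketch below quantifies how much of a baseline front is dominated by one or more other fronts. The function name and all numerical values are hypothetical and are not results from the paper.

```python
import numpy as np

def fraction_dominated(baseline, challenger):
    """Share of baseline points dominated by at least one challenger point.

    Rows are (tensile strength, elongation) pairs; both objectives are maximised.
    """
    baseline = np.asarray(baseline, dtype=float)
    challenger = np.asarray(challenger, dtype=float)
    # Pairwise comparison: challenger index on axis 0, baseline index on axis 1.
    not_worse = np.all(challenger[:, None, :] >= baseline[None, :, :], axis=2)
    better = np.any(challenger[:, None, :] > baseline[None, :, :], axis=2)
    dominated = np.any(not_worse & better, axis=0)
    return float(dominated.mean())

# Hypothetical fronts: one class-agnostic baseline and three class-based fronts.
p_baseline = [(200.0, 30.0), (400.0, 18.0), (650.0, 8.0)]
p_class_a = [(210.0, 32.0), (290.0, 28.0)]
p_class_b = [(700.0, 9.0)]
p_class_c = [(420.0, 22.0)]

share_single = fraction_dominated(p_baseline, p_class_a)
share_union = fraction_dominated(p_baseline, p_class_a + p_class_b + p_class_c)
```

In this toy example a single class-based front dominates only part of the baseline, whereas the union of the three fronts dominates all of it, which mirrors the idea of fronts that collectively dominate the baseline within their respective regions.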

Predicted Compositions
To inspect the alloys predicted to be on the Pareto fronts, the concentrations of the predicted alloys were assessed against tensile strength. Only a selected number of predicted alloy concentrations are reported below, with additional predictions detailed in the supplementary information. The optimal alloy concentrations predicted for the Pareto front P_1 are presented in Figure 6. The predictions show two distinct ranges of tensile strength for the alloys: one for strengths between 75 and 150 MPa and another for strengths between 275 and 300 MPa. An example alloy on this Pareto front, exhibiting a tensile strength of 150 MPa and an elongation of 29%, is Al-0.9%Sc, which has undergone solutionising as its processing condition.

A pronounced discontinuity is apparent in the Pareto front P_1, particularly noticeable at elongations exceeding 30%. In this region, the model predicts alloys with strengths under 200 MPa, whereas for elongations under 30%, the predicted strength jumps to around 300 MPa. This abrupt transition is likely due to the imposed constraint that limits the alloying elements' concentration to their maximum concentration in the dataset for the given class. Specifically, when the scandium concentration reaches a maximum of 0.9%, the model shifts to predicting the characteristics of Al-Mg-Mn alloys. The difference in optimal properties is consistent with the predictions of two distinct types of alloys within the respective regions. Further, discontinuities were also seen in the tensile strength and elongation Pareto front reported by Dey et al. [20] during optimisation for age-hardenable Al alloys. Another possible explanation for the discontinuity in the Pareto front might be the presence of two distinct clusters within the Class 1 dataset. However, the elbow plot [44] for KMeans clustering included in the supplementary information indicates a consistent reduction in distortion as the number of clusters increases. This suggests that there are no additional subclusters within Class 1.
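The elbow check mentioned above can be sketched with scikit-learn's KMeans. The data here are a synthetic single blob standing in for the Class 1 features, and the range of cluster counts is an assumption. The distortion is taken as the KMeans inertia, i.e., the within-cluster sum of squared distances.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the Class 1 feature matrix: one blob, no sub-clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Distortion (inertia) for an increasing number of clusters.
cluster_counts = list(range(1, 9))
distortions = []
for k in cluster_counts:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    distortions.append(km.inertia_)

# Reduction in distortion gained by each additional cluster.
drops = -np.diff(distortions)
```

A sharp drop followed by a plateau in `distortions` would point to a natural number of sub-clusters, whereas a smooth, steady decline, as reported for Class 1, gives no such indication.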
The alloy with a tensile strength of 275 MPa is characterised by a higher Mg concentration, aligning with the composition of AA5xxx series alloys. Mg enhances strength via solid solution hardening mechanisms [45,46]. However, a rise in Mg content can result in the formation of Mg5Al8 precipitates, which are known to reduce ductility, explaining the observed trend in P_1 where higher strength alloys exhibit lower elongation [5]. The precipitation of Mg5Al8 can be inhibited by the minor addition of elements such as Mn and Cr, which is also predicted, as seen in Figure 6b,c.
Conversely, alloys with tensile strengths below 150 MPa show low levels of alloying elements. These predicted alloys contain minor additions of Sc, which is known to increase alloy strength through the formation of Al3Sc precipitates [47]. Besides Sc, these aluminium alloys contain very low quantities of other alloying elements.

Alloys Predicted within Class 2
The alloys predicted on Pareto front P 2 show high strength and low elongation.The solutions within this front surpass P all for strengths exceeding 600 MPa; hence, the discussions below are limited to predictions with strengths greater than 600 MPa.The concentration of alloying elements in P 2 is reported in Figure 7.An example alloy on this Pareto front, with a tensile strength of 750 MPa and an elongation of 10%, is Al-12Zn-4Mg-1.5Cu, which is artificially peak-aged.
In the tensile strength range of 650-750 MPa, the alloys are predicted to have high concentrations of Zn, accompanied by some Mg and Cu. These compositions mirror the overaged Al-Zn-Mg-Cu alloys documented in the literature, which are known for their high strength, primarily due to MgZn2 precipitates [42].
For the even higher tensile strength ranges (greater than 750 MPa), the predicted alloys contain higher concentrations of Zn and Mg than those reported in the existing literature, representing unexplored compositions. A 7xxx alloy reported in the literature with a Zn concentration of 8.67 wt% and a Mg concentration of 2.50 wt% exhibited a tensile strength of 641 MPa [48]. The enhanced strength of the alloy was attributed to precipitate strengthening due to the formation of MgZn2 precipitates, a mechanism that may also apply to the newly predicted alloys. The predicted alloys present promising candidates that require experimental validation to confirm their tensile properties and any specific performance criteria relevant to target applications.

Alloys Predicted within Class 6
Alloys on the Pareto front P_6 are defined by their moderate strength and ductility, including naturally aged alloys, and their alloying concentrations are reported in Figure 8. The alloys predicted in this class include moderate-strength 7xxx series alloys and the higher-strength 6xxx series alloys. Further, at higher strengths, this also includes 2xxx Al-Cu-Li alloys. Similar to Class 1, the Pareto front exhibits a discontinuous region between the two distinct predicted alloys (Al-Cu-Li and Al-Zn-Mg-Cu alloys). An example alloy on the Pareto front, with a tensile strength of 375 MPa and an elongation of 24%, is Al-6Zn-3Cu-0.5Li, which has been naturally aged.
Alloys with tensile strengths surpassing 500 MPa are Al-Cu-Li-based alloys, where strength is primarily attributed to Al2CuLi precipitates, enhancing the mechanical properties, as reported in [49,50]. Further, these alloys also have a minor addition of Mg (0.5 wt%), which accelerates precipitation of the Al2CuLi phase and thus increases strengthening during natural ageing [51].
In the tensile strength range of 350 to 500 MPa, the predicted alloys are Al-Zn-Mg-Cu alloys, which belong to the 7xxx series. Within this range, two alloy types can be observed based on Mg content: high Mg (2 wt%) and low Mg (0.5 wt%). At high Mg concentrations, natural ageing results in the formation of Guinier-Preston (GP) zones [52], with alloy strengthening attributed to coherency and modulus mismatch strain [43]. Conversely, at lower Mg concentrations, the strength is primarily due to the formation of η′ phases during natural ageing [53,54].

For alloys with tensile strengths below 350 MPa, the model predicts Al-Mg-Si alloys with trace amounts of Cu [55]. The introduction of copper modifies the precipitation sequence in Al-Mg-Si alloys, leading to metastable precipitates that influence the alloy's microstructure and hardness. This results in the growth of phases like semi-coherent β″ and Q′ along specific crystallographic directions [56]. Notably, these alloys exhibit excellent formability, leading to their use in the automotive industry [57].

Discussion
The study presented here introduces a methodology for enhancing the design of alloys using class-based optimisation. By using a class-based genetic algorithm, this study demonstrated the prediction of a Pareto front that may surpass those generated by class-agnostic optimisation techniques. The combination of Pareto fronts P_1, P_2, and P_6 emerges as a dominant front, outperforming the Pareto front P_all. Further, the forward-prediction multi-target regressors achieved low error in line with the literature, and some class-based regressors outperformed the multi-target regressor trained on the entire dataset, which aligns with previous findings.
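The comparison between the combined class-based front and P_all can be sketched as a non-dominated filter followed by a dominance check. This is an illustrative sketch rather than the code used in this study: the function names are hypothetical, both objectives are assumed to be maximised, and the point values are placeholders (only the 750 MPa / 10% and 375 MPa / 24% examples come from the text).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (tensile strength in MPa, elongation in %)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in both objectives and better in at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the non-dominated points (both objectives maximised)."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def front_dominates(front_a: List[Point], front_b: List[Point]) -> bool:
    """True if every point of front_b is dominated by some point of front_a."""
    return all(any(dominates(a, b) for a in front_a) for b in front_b)


# Placeholder fronts for illustration only.
p_1 = [(140.0, 38.0), (300.0, 28.0)]
p_2 = [(650.0, 14.0), (750.0, 10.0)]
p_6 = [(375.0, 24.0), (520.0, 17.0)]
p_all = [(130.0, 36.0), (360.0, 22.0), (600.0, 12.0)]

# Merge the class-specific fronts and discard any dominated solutions.
combined = pareto_front(p_1 + p_2 + p_6)
combined_is_dominant = front_dominates(combined, p_all)
```

With real optimisation output, the same check indicates whether the merged class-based front covers the class-agnostic front across the whole strength range or only in parts of it.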
Multi-target random forest regressors expose feature importance profiles, which were used to identify the impact of features on the simultaneous prediction of the three mechanical properties. This revealed that only a specific subset of elements substantially affects these properties, directing the alloy design strategy towards optimising the concentrations of these elements while disregarding the less impactful ones. In the prediction of mechanical properties for R_1, Mg and Cu emerged as the most significant, likely due to their role in solid solution hardening mechanisms that enhance strength while reducing ductility [45,46]. For R_2, the elements Zn, Cu, Mg, and Si were identified as crucial, with the presence of Zn and Mg linked to the formation of MgZn2 precipitates, contributing to precipitate strengthening [46]. The combination of Mg and Si in this context suggests strengthening due to the formation of the β″ phase in the 2xxx and 6xxx alloy series [58]. Cu, Si, and Mg are most important for prediction in R_6, which might indicate strengthening by mechanisms similar to those in Class 2. Notably, the high importance of Cu in Class 6 could be attributed to the addition of Cu in 6xxx alloys, leading to the formation of a finer, needle-shaped β″ phase, which further increases the strength of these alloys [59,60].
The combination of Pareto fronts that dominated P_all included alloys with the following processing conditions: solutionised and peak-aged, naturally aged, as-cast, and solutionised. This implies that alloys with the most favourable properties can be fabricated by using only these processes. Notably, these processes are commonly used in manufacturing Al alloys [5,46], indicating that the alloys predicted may be readily manufactured using existing processes.
Comparative analysis with alloys documented in the literature validated the efficacy of the class-based genetic design predictions. The predicted concentrations and processing conditions aligned with those reported in the literature. The predictions indicated that as-cast Al-Sc and Al-Mg alloys would exhibit low tensile strengths due to their low alloying element concentrations, which are also responsible for their high ductility [5]. The model predicted naturally aged Al-Mg-Si, Al-Zn-Mg-Cu, and Al-Cu-Li alloys for moderate strength. The strength of these alloys is commonly attributed to precipitation hardening, as noted in the existing studies [51,52,56]. At the high-strength end, the model anticipated peak-aged Al-Zn-Mg-Cu alloys, aligning with similar high-strength alloy reports in the literature [61].
Certain alloys that had not been previously documented in the literature were identified among these predictions. These include 7xxx series alloys with high Zn (12 wt%) and Mg (4 wt%) concentrations predicted to have tensile strengths greater than 700 MPa. Additional factors must be considered for high-Zn aluminium alloys that also contain Mg. These factors, which are not captured by the current model or its training data, may include the risk of hot cracking during solidification in production and casting, and susceptibility to stress corrosion cracking [62]. The former has been studied in the context of both wrought Al-Zn-Mg alloys [63] and additively manufactured Al-Zn-Mg alloys [64,65]. While further experimental validation through the fabrication and testing of these alloys is necessary to confirm the predictions, such investigations fall beyond the scope of this paper and represent a promising direction for subsequent research efforts.
The composition and processing parameters of an alloy significantly influence its resultant microstructure. This microstructure plays a vital role in determining the mechanical properties of the alloy. In this study, the regression models are trained to predict the mechanical properties of alloys by directly using the concentrations of alloying elements and processing conditions. This design methodology, however, does not eliminate the need for experimentation. Key processing parameters, notably ageing time and temperature, remain to be optimised through experimental methods as they are not included within the current dataset. Incorporating data on these processing parameters could significantly enhance the model's applicability. A notable limitation inherent in optimisation-based design is that the forward model only provides an estimate of the error in predicted properties. Experimental validation is essential to estimate the error in predicting alloy concentrations.
Despite the limitations in machine learning-based computational approaches for alloy design, the model provides guidance for alloy development with predictive capabilities and Pareto front calculations. When additional properties like conductivity or hardness are part of the alloy design requirements, the workflow provides initial predictions of potential alloys that meet the strength and ductility requirements. These alloys require subsequent experimental validation to confirm their suitability in meeting additional criteria. Furthermore, the model's utility could be improved by including additional properties in the dataset and, where relevant, augmenting the dataset with calculated parameters, such as phase concentration. Such an approach was recently presented in the context of multi-principal element alloys by Li and co-workers [66].

Conclusions
This study presented a design methodology using class-based optimisation to predict optimal Al alloy compositions. This method surpassed traditional class-agnostic optimisation techniques in predicting alloys with enhanced tensile strength and elongation, identifying key alloying elements for targeted optimisation. Recursive feature elimination was also used to reduce the feature space, particularly benefiting class-based regressors, which require fewer features for optimal accuracy. Using data-driven approaches, this study has shown that class-based optimisation further improves the prediction of Al alloys. The predicted alloys are consistent with the current literature, supporting the utility of the data-driven approach and model framework. Moreover, this study identified previously unreported 7xxx series alloys with high Zn and Mg concentrations predicted to have tensile strengths over 700 MPa, suggesting a promising area for future experimental validation. The method cannot substitute for experimental processes, particularly in optimising critical parameters like ageing time (3 h to 48 h) and temperature (100 °C to 200 °C), which vary with processing conditions and are typically determined through domain knowledge. The results herein, however, provide an interpretable framework for guiding future Al alloy design.

Figure 1. Distribution of tensile strength and elongation in the dataset utilised for the present study.

Figure 2. Workflow for the design of aluminium alloys as explored herein. The workflow commences with data-driven partitions using unsupervised machine learning and ends with class-based optimisation.

Figure 3. Feature importance profiles for (a) R_1, (b) R_2, (c) R_3, (d) R_4, (e) R_5, (f) R_6, (g) R_7, (h) R_8, and (i) R_all. The figures illustrate the relative significance of each feature in predicting mechanical properties. They highlight that certain features have negligible importance, suggesting that the exclusion of some features prior to model training may not adversely impact the model's accuracy. The ideal number of features was calculated using recursive feature elimination.

Figure 5. Comparison of Pareto fronts P_1, P_2, and P_6, each exhibiting superior tensile strength and elongation compared with the Pareto front P_all. The Pareto front is compared with the Pareto front reported by Sekhar et al. [40] and with experimentally tested alloys similar to the predicted alloys [41][42][43]. A detailed presentation of all other Pareto fronts can be found in this paper's supplementary information.

Figure 6. Optimal alloy concentrations on Pareto front P_1. This figure illustrates two distinct regions of tensile strength: one below 150 MPa and another above 275 MPa. (a) Mg concentration, (b) Mn concentration, (c) Cr concentration, and (d) Sc concentration.

Table 1. The optimal number of features for each regressor and test set accuracy of the model trained with all features and selected features. R_i denotes a regressor trained on all the features. R′_i denotes a regressor trained on select features.