Article

Predictive Modeling of Tensile Strength in Aluminum Alloys via Machine Learning

1 School of Electrical & Information Engineering, Beihang University, No. 37, Xueyuan Road, Beijing 100191, China
2 Beijing Advanced Innovation Center for Materials Genome Engineering, Innovation Research Institute for Carbon Neutrality, University of Science and Technology Beijing, No. 30, Xueyuan Road, Beijing 100083, China
3 State Key Laboratory for Advanced Metals and Materials, University of Science and Technology Beijing, No. 30, Xueyuan Road, Beijing 100083, China
4 National Joint Engineering Research Center for Abrasion Control and Molding of Metal Materials, Henan University of Science and Technology, Luoyang 471003, China
5 Longmen Laboratory, Luoyang 471003, China
6 School of Materials Science and Engineering, Henan University of Science and Technology, Luoyang 471003, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Materials 2023, 16(22), 7236; https://doi.org/10.3390/ma16227236
Submission received: 11 October 2023 / Revised: 1 November 2023 / Accepted: 2 November 2023 / Published: 20 November 2023

Abstract

Aluminum alloys are widely used due to their exceptional properties, but the systematic relationship between their grain size and their tensile strength has not been thoroughly explored in the literature. This study aims to fill this gap by compiling a comprehensive dataset and utilizing machine learning models that consider both the alloy composition and the grain size. A pivotal enhancement to this study was the integration of hardness as a feature variable, providing a more robust predictor of the tensile strength. The refined models demonstrated a marked improvement in predictive performance, with XGBoost exhibiting an R2 value of 0.914. Polynomial regression was also applied to derive a mathematical relationship between the tensile strength, alloy composition, and grain size, contributing to a more profound comprehension of these interdependencies. The improved methodology and analytical techniques, validated by the models’ enhanced accuracy, are not only relevant to aluminum alloys, but also hold promise for application to other material systems, potentially revolutionizing the prediction of material properties.

1. Introduction

Aluminum alloys possess commendable properties, such as a high strength-to-weight ratio, excellent corrosion resistance, an appealing appearance, good recyclability, ease of manufacture, and non-magnetism. In recent years, aluminum alloys have found extensive applications in large-span structures, marine and offshore structures, movable lightweight structures, and prefabricated systems [1,2]. Over the past few decades, numerous studies have been conducted on the material properties of aluminum alloys and their structural performance at ambient temperatures [3,4,5,6,7,8,9]. These studies improved the properties of aluminum alloys through alloying additions, biphasic nucleation, and new preparation processes, all of which are closely related to the grain size. About 70 years ago, it was discovered that the flow stress varies linearly with the inverse square root of the grain size, which is the well-known Hall-Petch grain refinement strengthening effect [10]. Since then, researchers have also focused on the grain size and properties of aluminum alloys. For example, the relationships between the grain size and the flow stress [11,12], mechanical properties [13], thermal cracking [14], and deformation temperature [15] have been examined. Although grain refinement can enhance the performance of aluminum alloys, abnormal grain growth can lead to structural defects that severely degrade the mechanical properties of the alloys [16,17,18]. With the development of computers and information technology, mathematical models describing the structure and properties have become the basis for accurately predicting and controlling the structure and properties of aluminum alloys during production, and they have therefore become a focus of attention [19,20,21,22].
However, these predictive models are mainly obtained by regressing experimental data for specific cases or by physical/mathematical reasoning under certain assumptions, which limits their generality.
In recent years, machine learning has transcended its theoretical boundaries, demonstrating practical applications across diverse fields, particularly in materials science [23,24,25,26,27]. These techniques, distinguished by their data-intensive approach, enable nuanced understanding and prediction by identifying complex patterns and relationships within extensive datasets. For instance, researchers have applied machine learning algorithms to predict hardness, yield/tensile strength, and elastic constants with a considerable accuracy [28,29,30]. These studies exemplify the potential of machine learning to facilitate a deeper understanding of material behavior, optimize material performance, and even guide the discovery of new materials. Moreover, machine learning’s predictive capabilities are not confined to homogeneous or isotropic materials, but extend to anisotropic materials, composites, and intricate material systems. Such versatility underscores the adaptability and broad applicability of machine learning in contemporary materials science research. However, despite these advancements, there is a noticeable gap in the application of machine learning to the realm of aluminum alloys, particularly concerning the prediction of tensile strength based on compositional variables and the grain size. This research endeavors to bridge this gap, contributing to the burgeoning field of materials informatics and offering new insights for the production and application of aluminum alloys.
This study explored the intricate relationships between the tensile strength of aluminum alloys, the alloy composition, and the grain size. It utilized a diverse range of machine learning algorithms to model and predict the tensile strength of these aluminum alloys. Through different feature selection methods and model evaluations, the most influential features were determined, thereby more effectively revealing the correlation between elemental feature variables and the grain size with the tensile strength. Furthermore, a mathematical expression was derived to quantitatively assess the correlation between the tensile strength and the aluminum alloy composition and grain size, employing a polynomial regression analysis.

2. Data Preparation and Analysis

2.1. Data Collection and Preprocessing

Data collection constitutes the foundation of materials informatics research. To investigate the correlation between the tensile strength of aluminum alloys and their composition and grain size, 84 experimental data points were gathered from the literature [31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55]. The vast majority of the data (90%) were for cast aluminum alloys, with the remainder being wrought aluminum alloys. The collected data were then preprocessed to ensure their quality and consistency, through data cleaning and outlier handling. During data cleaning, data points with incomplete or inaccurate information, in particular those with incomplete alloy composition, were eliminated. During outlier handling, alloy element combinations that occurred only once in the dataset were removed, which made the remaining data more representative and credible.
Consequently, after data preprocessing, a dataset comprising 67 data points of aluminum alloy tensile strength was obtained. The dataset encompassed information on the alloy composition, the grain size, and the corresponding tensile strength of the aluminum alloy samples. It is worth mentioning that this dataset encompassed 19 chemical elements, with each sample’s data including 19 chemical element variables, 1 grain size variable, and 1 target performance variable of tensile strength. These features provided a pool of information for a subsequent analysis and modeling. In Table 1, the minimum, maximum, and mean values of each feature variable are listed to provide a comprehensive understanding of the dataset’s distribution features.
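The two preprocessing rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout, the column names (`grain_size`, `tensile_strength`), the three-element subset, and the reading of an "element combination" as the set of elements with non-zero content are all assumptions made for the example, and the rows are invented.

```python
from collections import Counter

# Hypothetical subset of the 19 element columns, for illustration only.
ELEMENTS = ["Si", "Cu", "Mg"]


def clean_records(records, element_keys=ELEMENTS):
    """Drop incomplete rows, then rows whose element combination occurs only once."""
    # Step 1 (data cleaning): keep rows in which every composition value,
    # the grain size and the tensile strength are present.
    required = list(element_keys) + ["grain_size", "tensile_strength"]
    complete = [r for r in records if all(r.get(k) is not None for k in required)]

    # Step 2 (outlier handling): assumed definition of an "element combination"
    # as the set of elements with a non-zero content.
    def combo(row):
        return frozenset(k for k in element_keys if row[k] > 0)

    counts = Counter(combo(r) for r in complete)
    return [r for r in complete if counts[combo(r)] > 1]


# Invented example rows (not taken from the dataset).
records = [
    {"Si": 7.0, "Cu": 0.0, "Mg": 0.3, "grain_size": 50.0, "tensile_strength": 200.0},
    {"Si": 7.0, "Cu": 0.0, "Mg": 0.4, "grain_size": 80.0, "tensile_strength": 180.0},
    {"Si": 0.0, "Cu": 4.0, "Mg": 1.5, "grain_size": 30.0, "tensile_strength": 400.0},
    {"Si": 7.0, "Cu": None, "Mg": 0.3, "grain_size": 60.0, "tensile_strength": 190.0},
]
cleaned = clean_records(records)
print(len(records), "->", len(cleaned))  # 4 -> 2
```

In this toy input, the fourth row is dropped because its Cu content is missing, and the third row is dropped because its Cu-Mg combination appears only once.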

2.2. Data Analysis

Data analysis is a crucial step in understanding the relationship between the tensile strength of aluminum alloys and the feature variables. Two approaches were adopted: data visualization and regression analysis. Data visualization gives an intuitive, graphical view of the data features and their interrelations, while regression analysis, carried out here with machine learning models, quantifies the connections between the tensile strength and each feature variable. Figure 1 shows the frequency distribution histogram of the tensile strength, which summarizes the distribution and statistical properties of the target variable.
Figure 2 comprises two scatter plots showing how the tensile strength varies with the aluminum content and with the grain size. The data reveal a negative correlation between the aluminum content and the tensile strength: samples with a higher aluminum content, and therefore a lower total content of alloying elements, tend to have a lower tensile strength. The grain size shows a similar trend, with larger grains associated with a lower tensile strength. These observations highlight the importance of controlling both factors to ensure optimal performance in practical applications.

3. Machine Learning Algorithms and Evaluation Methods

3.1. Machine Learning Algorithm and Parameter Settings

This study employed several machine learning regression algorithms, implemented with the open-source scikit-learn package [56]; XGBoost and LightGBM are distributed as separate open-source packages that provide scikit-learn-compatible interfaces. A brief introduction to each algorithm follows. Linear regression is a classic linear model, suitable for modeling simple linear relationships. Random forest is applicable to complex non-linear problems, with the number and depth of the trees being significant parameters. GBDT (gradient-boosting decision trees) is an ensemble learning algorithm that improves the prediction performance by iteratively training decision trees, each fitted to the gradient of the loss function with respect to the current predictions, so that the prediction error is reduced step by step. The performance of the K-nearest neighbors algorithm is greatly influenced by the choice of K. Neural networks are loosely modeled on the connections between neurons in the human brain, with the key parameters including the number of network layers, the number of neurons, and the learning rate. XGBoost is a gradient-boosting algorithm that performs well in regression and classification problems, with the important parameters including the learning rate and the depth of the trees. LightGBM efficiently handles large-scale datasets and is suitable for high-performance machine learning tasks, with the key parameters including the learning rate, the number of trees, and their depth. Unless specifically stated, the default parameter settings of the respective implementations were used; these defaults are chosen by the library developers to be reasonable for a wide range of problems. The algorithms were employed to capture the relationship between the feature variables (alloy composition and grain size) and the target variable (tensile strength).
The diversity and flexibility of the algorithms contributed to a comprehensive exploration of the correlation between the tensile strength of aluminum alloys and the feature variables.

3.2. Model Evaluation Methods and Evaluation Indicators

Ten-fold cross-validation is a commonly used method for assessing the performance of machine learning models and is particularly suitable for small datasets. It divides the original dataset into ten subsets of approximately equal size, using each subset in turn as the test set and the remaining nine combined as the training set. The model is trained and evaluated once per fold, yielding ten performance estimates whose average is the final evaluation metric. Because every sample serves as test data exactly once, the estimate makes use of the whole dataset and is less sensitive to any particular train/test split than a single hold-out evaluation. Ten-fold cross-validation does not by itself prevent overfitting, but it exposes it: a model that merely memorizes the training folds scores poorly on the held-out fold, so the averaged score gives a more reliable estimate of the model's generalization ability on unseen data.
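The procedure can be made concrete with a short sketch. The code below is an illustration under stated assumptions rather than the study's implementation: it uses NumPy only, an ordinary least-squares model as a stand-in for the regressors, and synthetic data with the same number of samples (67) as the cleaned dataset; the coefficients and the function name `k_fold_r2` are invented for the example. scikit-learn offers equivalent functionality through `KFold` and `cross_val_score`.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(X, y, k=10, seed=0):
    """Shuffle the samples, split them into k folds, and average the per-fold R2."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Stand-in model: least squares with an intercept column.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        scores.append(r2_score(y[test_idx], A_test @ coef))
    return float(np.mean(scores)), folds


# Synthetic data: 67 samples, 4 features, linear target with small noise.
rng = np.random.default_rng(42)
X = rng.random((67, 4))
y = 300.0 + X @ np.array([50.0, -20.0, 10.0, -0.5]) + rng.normal(0.0, 1.0, 67)

mean_r2, folds = k_fold_r2(X, y)
print("fold sizes:", sorted(len(f) for f in folds))
print("mean R2 over 10 folds:", round(mean_r2, 3))
```

With 67 samples, the ten folds cannot be exactly equal: seven folds hold seven samples and three hold six, so each test fold is very small and the per-fold scores can vary considerably on real data.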
In this paper, R2 (the coefficient of determination), the RMSE (root mean squared error), the MSE (mean squared error), and the MAE (mean absolute error) were used to measure the effectiveness of each algorithm in modeling the relationship between the tensile strength of aluminum alloys and the feature variables. R2 measures the degree of fit of a regression model to the observed data. It represents the proportion of the variance in the target variable that is predictable from the feature variables. The formula for calculating R2 is as follows:
$$
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{1}
$$
In the formula, $y_i$ and $\hat{y}_i$ are the observed and corresponding predicted values, respectively, $\bar{y}$ is the mean of the observed values, and $n$ is the number of samples. When R2 equals 1, the model fits the data perfectly; when R2 equals 0, the model explains none of the variance in the target variable and performs no better than always predicting the mean. On held-out data, R2 can even become negative if the model performs worse than that.
The RMSE, MSE, and MAE measure the prediction error of the regression model, reflecting the average discrepancy between the model's predictions and the actual observations. A smaller value indicates a higher prediction accuracy, i.e., a smaller difference between the predicted and actual values. The formulas for calculating the RMSE, MSE, and MAE are as follows:
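$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2} \tag{2}
$$
$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2 \tag{3}
$$
$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{4}
$$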
In these formulas, $y_i$ represents the observed values, $\hat{y}_i$ represents the model's predicted values, and $n$ denotes the number of samples. The RMSE and MAE carry the unit of the target variable (MPa for the tensile strength), whereas the MSE is expressed in squared units. A smaller RMSE, MSE, or MAE value means a smaller prediction error, i.e., the model is closer to the actual data. Together, R2 and the RMSE/MSE/MAE provided the basis for objectively assessing the model performance in the regression analysis.
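As a worked example of these four metrics, the following plain-Python sketch evaluates them on three invented observation/prediction pairs (the numbers are hypothetical and chosen so that the arithmetic is easy to check by hand). In practice, `sklearn.metrics` provides `r2_score`, `mean_squared_error`, and `mean_absolute_error`.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error (squared units of the target)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error (same unit as the target)."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error (same unit as the target)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical tensile strengths in MPa: observed vs. predicted.
observed = [200.0, 300.0, 400.0]
predicted = [210.0, 290.0, 400.0]

print("R2  :", r2(observed, predicted))               # 1 - 200/20000 = 0.99
print("MSE :", round(mse(observed, predicted), 3))    # 200/3 = 66.667 MPa^2
print("RMSE:", round(rmse(observed, predicted), 3))   # sqrt(200/3) = 8.165 MPa
print("MAE :", round(mae(observed, predicted), 3))    # 20/3 = 6.667 MPa
```

The example also shows why the MSE is not directly comparable with the RMSE and MAE: it is reported in MPa squared rather than MPa.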

4. Results and Discussion

4.1. Modeling Based on Aluminum Alloy Composition and Grain Size

In this study, we initially embarked on machine learning modeling based on the composition of aluminum alloys (comprising 18 elements), the grain size, and the target property of tensile strength. Distinct machine learning algorithms were employed, namely linear regression (LR), random forest (RF), k-nearest neighbors (KNN), extreme gradient boosting (XGBoost), the light gradient-boosting machine (LightGBM), and an artificial neural network (ANN). These models were constructed to predict the tensile strength of aluminum alloys based on their composition and grain size. The modeling process underwent a rigorous evaluation, and the ten-fold cross-validation method was adopted, thereby ensuring the reliability of the model’s performance.
Table 2 presents the performance of each model across the evaluation metrics. The linear regression model yielded an R2 value of 0.55 and an RMSE of 171.51 MPa. Although its accuracy was relatively low, it provided a benchmark for subsequent model comparisons. The random forest model performed well, with an R2 value of 0.86 and an RMSE of 62.42 MPa, indicating its efficacy in capturing the relationship between the tensile strength of the aluminum alloys and the input features. The KNN model, however, achieved an R2 value of only 0.39 and an RMSE of 129.04 MPa, suggesting that it is less suitable for problems with many features. KNN predicts the target value of a new data point from the values of its nearest neighbors in the feature space. While it is simple and effective in low-dimensional settings, its efficacy diminishes for high-dimensional data, a phenomenon often termed the “curse of dimensionality.” Three factors contributed here. First, the dataset included features of varying relevance to the tensile strength, and the irrelevant ones added noise to the distance calculations on which KNN depends, so that the wrong neighbors were selected. Second, the volume of the feature space grows exponentially with the number of dimensions, so the data become sparse. Third, the dataset was small, with only 67 data points, which made this sparsity more severe. The XGBoost model achieved the best result, with an R2 value of 0.88 and an RMSE of 57.72 MPa. The LightGBM model trailed XGBoost and random forest, registering an R2 value of 0.77 and an RMSE of 80.06 MPa.
The ANN model, although inferior to random forest, XGBoost, and LightGBM, still produced an acceptable R2 value of 0.73 and an RMSE of 85.69 MPa. These results show that tree-based ensemble models such as random forest, XGBoost, and LightGBM model the relationship between the aluminum alloy tensile strength and the input features more accurately than linear regression and KNN, possibly due to their capacity to capture more complex non-linear relationships.
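The effect of dimensionality on nearest-neighbor distances can be illustrated numerically. The sketch below is an illustration on uniformly random points, not an analysis of the alloy dataset: it places 67 points in a unit hypercube and measures the relative contrast between the farthest and nearest point from a random query, first in 2 dimensions and then in 20 dimensions (the number of input features when 19 element contents and the grain size are used). The function names and the choice of 50 repetitions are arbitrary.

```python
import numpy as np


def relative_contrast(n_points, n_dims, seed):
    """(d_max - d_min) / d_min for distances from one random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


def mean_contrast(n_points, n_dims, n_repeats=50):
    """Average the relative contrast over several random draws."""
    return float(np.mean([relative_contrast(n_points, n_dims, s) for s in range(n_repeats)]))


low_dim = mean_contrast(67, 2)
high_dim = mean_contrast(67, 20)
print("relative contrast in  2 dimensions:", round(low_dim, 2))
print("relative contrast in 20 dimensions:", round(high_dim, 2))
```

A smaller contrast means that the nearest and farthest neighbors are at nearly the same distance, so the "nearest" neighbors carry little information. Real alloy data are not uniformly distributed, so this only indicates the direction of the effect.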

4.2. Feature Selection and Modeling

Feature selection is a pivotal step in machine learning and data analysis: it reduces the dimensionality of the data without discarding essential information. Aluminum alloys contain many interacting components, so it is crucial to pinpoint the elements that genuinely affect the tensile strength. Traditional statistical methods, such as correlation analysis and analysis of variance, are not sufficient on their own for this purpose, so machine learning techniques were used as well. The random forest model, which constructs multiple decision trees and combines their predictions, assigns an importance score to each feature, aiding in the identification of the features with the most significant impact on the target variable. However, these scores describe only the aggregate impact of a feature. To achieve a more comprehensive and equitable feature evaluation, we also used SHAP (Shapley additive explanation) values [57,58], a prominent method in machine learning interpretability that gives the contribution of each feature to each individual prediction. The concept is rooted in cooperative game theory, in which Shapley values allocate fair “payouts” to the players in a coalition according to their contributions.
In a machine learning model, the features play the role of the players: each feature receives a value indicating its contribution to a specific prediction, which gives a granular view of the model’s decision-making process. We applied SHAP values to our random forest model to examine the relationship between the predictors (such as aluminum content and grain size) and the target outcome (tensile strength). Because SHAP values are computed per prediction and can also be aggregated over all samples, they describe both local and global feature importance. By combining the random forest importance scores with the SHAP values, we discerned the influence of the different feature variables on the tensile strength.
The magnitude of the SHAP value reflects the importance of a specific feature. Figure 3a summarizes the key factors influencing the tensile strength, with Cu, Si, Mg, Fe, and Zn identified as the top five elemental features. Figure 3b, the SHAP summary plot, shows the SHAP value of each feature for every sample: a positive SHAP value means that the feature raised the predicted tensile strength for that sample, while a negative value means that it lowered it. Based on Figure 3b, it can be inferred that both the Al content and the grain size are inversely related to the tensile strength.
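To make the game-theoretic definition concrete, the sketch below computes exact Shapley values for a single prediction of a small toy model by enumerating all feature coalitions. Everything in it is hypothetical: the model `toy_model`, its coefficients, the instance, and the single baseline point are invented for illustration, and the SHAP library applied to a random forest uses a background dataset and tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of one prediction relative to a single baseline point."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the instance's values; the rest stay at baseline.
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(f):
    """Hypothetical strength model with one interaction term (not Equation (5))."""
    return 100.0 + 50.0 * f["Cu"] + 20.0 * f["Mg"] + 10.0 * f["Cu"] * f["Mg"] - 0.5 * f["Gs"]


instance = {"Cu": 2.0, "Mg": 1.0, "Gs": 40.0}
baseline = {"Cu": 0.0, "Mg": 0.0, "Gs": 100.0}

phi = shapley_values(toy_model, instance, baseline)
print(phi)  # approximately Cu: 110, Mg: 30, Gs: 30
print(sum(phi.values()), toy_model(instance) - toy_model(baseline))  # both about 170
```

The contributions sum to the difference between the prediction and the baseline prediction, and the Cu-Mg interaction term is split equally between the two features, which is the "fair payout" property referred to above.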
Subsequently, by focusing on these five elemental feature variables (Cu, Si, Mg, Fe, and Zn), the optimal subset method was employed for further feature selection, aiming to ascertain which elemental combinations play a pivotal role in establishing the optimal tensile strength prediction model. The optimal subset method is a feature selection technique that determines the best predictive model combination by considering various feature variable combinations. In this instance, the top five elemental feature variables (Cu, Si, Mg, Fe, and Zn) were selected as the candidate feature set.
The optimal subset method [59,60] was applied with four machine learning algorithms, namely random forest, k-nearest neighbors, LightGBM, and XGBoost. For each algorithm, the best feature combination was identified by comparing the performance of the candidate feature sets. Figure 4a–d shows that, across the different ML models, the R2 is generally highest when three elemental features are used. After a thorough comparison, the optimal combination was determined to be the elements Fe, Cu, and Mg. Restricting the inputs to this combination reduces the feature dimensions, model complexity, and computational cost, and makes the models easier to interpret, while retaining a high predictive capability. It also directs attention to the elements that have a pivotal impact on the tensile strength prediction, and to their interplay, which is useful for material design and refinement.
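An exhaustive subset search of this kind can be sketched as follows. The example is an assumption-laden illustration rather than the study's procedure: it scores every subset of five candidate features with a cross-validated R2 computed from pooled out-of-fold predictions, uses ordinary least squares as a stand-in for the four algorithms, and runs on synthetic data constructed so that only the columns labelled Cu, Mg, and Fe influence the target. The function names and coefficients are invented.

```python
from itertools import combinations

import numpy as np


def cv_r2(X, y, k=5):
    """R2 of pooled out-of-fold predictions from a least-squares model."""
    n = len(y)
    fold_id = np.arange(n) % k  # simple deterministic fold assignment
    y_pred = np.empty(n)
    for f in range(k):
        train, test = fold_id != f, fold_id == f
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        y_pred[test] = np.column_stack([np.ones(test.sum()), X[test]]) @ coef
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


def best_subsets(X, y, names):
    """For each subset size, return (best CV R2, feature names) over all subsets."""
    results = {}
    for size in range(1, len(names) + 1):
        scored = [
            (cv_r2(X[:, list(cols)], y), tuple(names[c] for c in cols))
            for cols in combinations(range(len(names)), size)
        ]
        results[size] = max(scored)
    return results


# Synthetic data: by construction, only Cu, Mg and Fe affect the target.
names = ["Cu", "Si", "Mg", "Fe", "Zn"]
rng = np.random.default_rng(0)
X = rng.random((67, len(names)))
y = 200.0 + 150.0 * X[:, 0] + 25.0 * X[:, 2] - 180.0 * X[:, 3] + rng.normal(0.0, 1.0, 67)

results = best_subsets(X, y, names)
for size, (score, subset) in results.items():
    print(size, subset, round(score, 4))
```

With five candidates there are only 31 non-empty subsets, so exhaustive enumeration is cheap; the cost doubles with every additional candidate, which is one reason to shortlist features (here via SHAP) before the subset search.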
The three elemental feature variables derived from the feature selection, along with the grain size, were used as the inputs for different machine learning models for further modeling. Automated machine learning, commonly known as AutoML, streamlined the process of selecting the appropriate algorithms and tuning the hyperparameters to achieve an optimal model performance. Particularly, the flaml package is a Python library that automates this process and offers a fast and efficient framework for hyperparameter optimization. After our initial modeling with machine learning algorithms, we further employed AutoML, specifically leveraging the flaml package, for hyperparameter tuning. This additional step of optimization enhanced the precision of our models, making them more robust and accurate in predicting the tensile strength of aluminum alloys. Table 3 provides the detailed results of the top four ML models using different evaluation metrics.
To comprehensively understand the performance of the model, a scatter plot was created for the top four models based on R2. The plot included the predicted values and experimental values, along with a 45° diagonal line and confidence intervals. This was implemented to observe the relationship and deviations between the predicted results and the actual observations. Ideally, the points on the scatter plot should cluster around the 45° diagonal line, indicating a perfect match between the model predictions and the experimental values. Therefore, a denser distribution of points near the diagonal line signifies a high congruence between the model predictions and the actual observations. A further performance analysis clearly revealed that sophisticated models such as RF, GBDT, and LightGBM excel in modeling the relationship between the aluminum alloy tensile strength and the grain size, as shown in Figure 5b–d. Their predicted points were more concentrated around the 45° diagonal line, showcasing a superior predictive accuracy. Conversely, models such as linear regression exhibited an underperformance in Figure 5a, as their predicted points were more dispersed and displayed less alignment with the diagonal line. The results from Figure 5 indicate that, compared to the initial nineteen elemental variables, the performance of the three selected elemental variables was comparable, underscoring the efficacy of the proposed feature selection method. This selection not only filtered out features with lower associations to the tensile strength, but also further reduced the risk of overfitting and training costs, enhancing the model interpretability.
Despite diligent feature selection and hyperparameter optimization, the outcomes of our models did not meet the desired level of satisfaction. In an endeavor to enhance the predictive capability of our models, we revisited the dataset and extracted hardness values for the corresponding aluminum alloys from the original literature. Hardness, as a mechanical property, was anticipated to provide a substantial correlation with the tensile strength, thereby potentially elevating the predictive accuracy of the models. It is noteworthy, however, that not all sources provided measurements for hardness; thus, we were able to augment our dataset with hardness values for only 38 data points. Nevertheless, the incorporation of this mechanical property into our analysis yielded gratifying results. The inclusion of hardness as an additional feature variable in the predictive modeling of aluminum alloy tensile strength led to noteworthy improvements in the performance of our machine learning models, as shown in Table 4.
The LR model, in particular, demonstrated a significant leap in predictive accuracy, with its R2 value ascending from 0.501 to 0.904 and the RMSE sharply declining from 117.008 MPa to 44.217 MPa. This substantial enhancement in the LR indicated that the relationship between the predictive features and the tensile strength became more linear with the integration of hardness, thereby refining the model’s interpretative and predictive capacity. XGBoost also showed positive gains, with its R2 increasing marginally from 0.884 to 0.914 and its RMSE decreasing to 41.740 MPa, suggesting that the inclusion of hardness fine-tuned the model’s ability to capture the complex interplay between the features and the tensile strength. Conversely, the RF model exhibited a slight decrease in its R2 value from 0.891 to 0.854, yet sustained a steady RMSE of 54.447 MPa, indicating a robustness in its predictive precision, despite the integration of the additional feature. The slight dip in R2 may be attributed to the RF model’s complexity and the potential redundancy introduced by the correlated features. The subsequent analysis introduced GBDT in lieu of LightGBM, with GBDT continuing the trend of a high performance, as evidenced by an R2 of 0.907 and an RMSE of 43.470 MPa. This performance was consistent with the high predictive accuracy previously established by LightGBM, reaffirming the strength of gradient-boosting methodologies in this research domain. The scatter plot in Figure 6 shows the comparison between the predicted values and the experimental values for each of the four models. In summary, these results underscore the pivotal role of feature engineering in materials informatics. By selecting features underpinned by both statistical significance and physical relevance to the property of interest, the precision and reliability of predictive models can be improved. 
The strategic inclusion of hardness provided an additional layer of insight, empowering our models to unravel the complex dependencies and, thus, predict the tensile strength of aluminum alloys with an enhanced accuracy.

4.3. Polynomial Regression and Analysis

In this section, we examine the role of polynomial regression analysis in predicting the tensile strength of aluminum alloys [61]. The objective of this analysis was to express the mathematical relationships between the tensile strength, the key elemental variables, and the grain size in closed form, thereby offering a tool for understanding and optimizing material properties. Polynomial regression was chosen because the properties of aluminum alloys depend on composition in a complex way, and the method can capture non-linear relationships through higher-order terms, thus providing a more accurate depiction of the influence of the elemental composition on the tensile strength. When constructing the mathematical model, higher-order polynomial terms of the chosen elemental variables were considered, and their relationships with the tensile strength were quantified through regression coefficients. Because the aim was a single, global equation rather than an estimate of the out-of-sample error, the polynomial regression model was trained on the entire dataset without cross-validation; its goodness of fit is therefore an in-sample measure. The mathematical relationship derived from the polynomial regression is represented below:
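$$
\begin{aligned}
T_s ={}& 236.03 - 180.52\,\mathrm{Fe} + 151.99\,\mathrm{Cu} + 24.63\,\mathrm{Mg} - 0.58\,G_s \\
& - 140.19\,\mathrm{Fe}^2 + 1.24\,\mathrm{Fe}\cdot G_s - 41.94\,\mathrm{Cu}^2 + 53.41\,\mathrm{Cu}\cdot\mathrm{Mg}
\end{aligned}
\tag{5}
$$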
In Equation (5), Ts represents the tensile strength of the aluminum alloy; Fe, Cu, and Mg denote the contents of iron, copper, and magnesium in the alloy, respectively; and Gs is the grain size. The coefficient preceding each term is a regression coefficient that quantifies the effect of that element or of the grain size on the tensile strength. Equation (5) shows the following key trends: the tensile strength decreases with increasing Fe content and grain size, and increases with increasing Cu and Mg content. These results are consistent with previous research findings. Previous work has reported that, with an increase in the Fe/Si ratio, there is a notable decline in the tensile strength in the rolling direction, transverse direction, and 45° direction, and elongation significantly increases; when the Fe/Si ratio increases to between 3 and 3.4, the decrease in β slows down and stabilizes, and δ experiences a significant reduction. Copper is a crucial alloying element with a solid-solution-strengthening effect, and the CuAl2 precipitated during aging also produces a remarkable age-strengthening effect. The copper content in aluminum sheets is typically between 2.5% and 5%, and the strengthening effect is optimal when the copper content is between 4% and 6.8%; hence, most high-strength aluminum alloys have a copper content within this range. Magnesium significantly strengthens aluminum; for every 1% increase in magnesium, the tensile strength increases by approximately 34 MPa. Given the Fe, Cu, and Mg contents of an aluminum alloy and its grain size Gs, the tensile strength can be calculated using Equation (5).
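As a worked illustration, Equation (5) can be evaluated directly. The sketch below assumes the units of Table 1 (element contents in mass %, grain size in μm, tensile strength in MPa). The negative signs on the Fe and Gs terms follow from the trends stated above; the negative signs on the Fe² and Cu² terms are an assumption made when reading the equation and should be checked against the published version. The function names and the example composition are hypothetical.

```python
def tensile_strength(fe, cu, mg, gs):
    """Evaluate Equation (5).

    fe, cu, mg: contents in mass %; gs: grain size in micrometres.
    Returns the estimated tensile strength in MPa.
    """
    return (
        236.03
        - 180.52 * fe
        + 151.99 * cu
        + 24.63 * mg
        - 0.58 * gs
        - 140.19 * fe ** 2   # sign assumed negative (see lead-in)
        + 1.24 * fe * gs
        - 41.94 * cu ** 2    # sign assumed negative (see lead-in)
        + 53.41 * cu * mg
    )


def grain_size_slope(fe):
    """Partial derivative dTs/dGs = -0.58 + 1.24 * Fe, in MPa per micrometre."""
    return -0.58 + 1.24 * fe


# Hypothetical composition for illustration only; not a data point from the paper.
example = tensile_strength(fe=0.2, cu=1.3, mg=1.6, gs=50.0)
print(f"Ts = {example:.1f} MPa")
print(f"dTs/dGs at Fe = 0.2 %: {grain_size_slope(0.2):.3f} MPa/um")
```

Under these assumed signs, the Fe·Gs cross term makes the grain-size slope equal to −0.58 + 1.24·Fe, so the predicted strength falls with increasing grain size only while the Fe content stays below roughly 0.47 mass %. As with any fitted polynomial, the equation should be applied only within the composition and grain-size ranges listed in Table 1.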
To validate the polynomial regression model, the values predicted by Equation (5) were compared with the experimental data; the equation reproduces the data well, with R² = 0.91, as shown in Figure 7. Because the equation was fitted to these same data, this R² is an in-sample value. The result nevertheless indicates that polynomial regression is an effective and applicable tool for the design and performance prediction of aluminum alloy materials. Several limitations should be noted. Although our dataset was comprehensive, it does not cover every variable that could influence the tensile strength, which limits how far the findings can be extrapolated. Each machine learning model used here also carries its own assumptions, ranging from linearity in the regression models to the effects of hyperparameter choices in more complex models such as random forest and XGBoost. Furthermore, SHAP values, although robust for feature interpretation, remain subject to uncertainty when interpreting intricate, high-dimensional feature interactions. Finally, the findings may apply only to the types of aluminum alloys and manufacturing conditions examined in this study, so caution is needed when extrapolating these conclusions to broader contexts.
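The fitting step behind an equation of this form can be sketched as an ordinary least-squares problem over the nine terms of Equation (5). The paper does not describe the fitting routine, so the code below is only one possible approach (normal equations solved by Gaussian elimination). The data are synthetic: 67 points, matching the count in the Figure 7 caption, are generated from an assumed coefficient vector so that the recovered coefficients can be checked. All names (`features`, `solve`, `fit_least_squares`, `TRUE_W`) and the sampling ranges are hypothetical.

```python
import random

NAMES = ["1", "Fe", "Cu", "Mg", "Gs", "Fe^2", "Fe*Gs", "Cu^2", "Cu*Mg"]


def features(fe, cu, mg, gs):
    """Design-matrix row containing the nine terms that appear in Equation (5)."""
    return [1.0, fe, cu, mg, gs, fe * fe, fe * gs, cu * cu, cu * mg]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


# Assumed coefficients (signs of the squared terms are an assumption).
TRUE_W = [236.03, -180.52, 151.99, 24.63, -0.58, -140.19, 1.24, -41.94, 53.41]

# Synthetic, noise-free data; the ranges are illustrative, not the paper's data.
random.seed(0)
rows, targets = [], []
for _ in range(67):
    row = features(
        random.uniform(0.0, 0.6),    # Fe, mass %
        random.uniform(0.0, 4.4),    # Cu, mass %
        random.uniform(0.0, 6.4),    # Mg, mass %
        random.uniform(0.4, 100.0),  # Gs, micrometres
    )
    rows.append(row)
    targets.append(sum(w * v for w, v in zip(TRUE_W, row)))

fitted = fit_least_squares(rows, targets)
for name, w in zip(NAMES, fitted):
    print(f"{name:>6}: {w:10.3f}")
```

On real, noisy measurements the normal equations can be poorly conditioned because the terms differ greatly in scale, so a QR- or SVD-based solver from an established numerical library would usually be the more robust choice.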

5. Conclusions

In this study, we developed machine learning models that use both the chemical composition and the grain size to predict the tensile strength of aluminum alloys. The models were refined through feature selection and systematic model evaluation. The grain size, together with the key alloying elements Fe, Cu, and Mg, was found to have a pronounced influence on the tensile strength. Including hardness as an additional feature notably improved the models' predictive ability; in particular, the XGBoost model achieved an R² of 0.914. This study confirms the role of feature selection in reducing data dimensionality and simplifying the models: identifying and using the most salient features makes it easier to resolve the relationships between the elemental composition, grain size, and mechanical properties such as the tensile strength. The gain obtained with hardness, reflected in the improved accuracy metrics, also underscores the value of incorporating related mechanical properties into the predictive framework.
The implications of this study are substantial, offering meaningful contributions to the predictive modeling and design of aluminum alloy tensile strength. This research enriches our understanding of the intricate relationship between the alloy composition, grain size, hardness, and tensile strength, thus paving the way for refined improvement and optimization strategies for aluminum alloys. Furthermore, the methodologies and techniques employed here bear the potential for application across diverse material systems, heralding a new frontier in material performance prediction and material informatics.

Author Contributions

Data curation, K.F., D.Z. and Y.Z.; formal analysis, K.F., D.Z. and Y.Z.; funding acquisition, C.Z. (Cheng Zhang 1) and H.Y.; investigation, K.F., D.Z., Y.Z., X.M. and C.Z. (Cheng Zhang 1); methodology, K.F., D.Z. and Y.Z.; project administration, C.Z. (Cheng Zhang 1) and H.Y.; resources, X.W., C.W. and T.J.; software, D.Z., Y.Z., X.M. and C.Z. (Cheng Zhang 1); validation, F.M. and C.Z. (Cheng Zhang 2); writing—original draft, D.Z., Y.Z. and C.Z. (Cheng Zhang 1); writing—review and editing, K.F., D.Z., Y.Z. and C.Z. (Cheng Zhang 1). All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported by the National Key R&D Program of China (2020YFB2008400), the Special Project of Industrial Foundation Reconstruction and High-Quality Development of Manufacturing Industry in 2021, the State Key Lab of Advanced Metals and Materials (2022-Z17), the Frontier Exploration Projects of Longmen Laboratory (Nos. LMQYTSKT011 and LMQYTSKT006), the Key Scientific and Technological Project of Henan Province (No. 232102231018), and the Provincial and Ministerial Co-Construction of Collaborative Innovation Center for Non-Ferrous Metal New Materials and Advanced Processing Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data are available within the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, Z.; Li, M.; Han, Q.; Yun, X.; Zhou, K.; Gardner, L.; Mazzolani, F.M. Structural fire behaviour of aluminium alloy structures: Review and outlook. Eng. Struct. 2022, 268, 114746.
  2. Bayoumy, D.; Kan, W.; Wu, X.; Zhu, Y.; Huang, A. The latest development of Sc-strengthened aluminum alloys by laser powder bed fusion. J. Mater. Sci. Technol. 2023, 149, 1–17.
  3. Yun, X.; Wang, Z.; Gardner, L. Full-Range Stress–Strain Curves for Aluminum Alloys. J. Struct. Eng. 2021, 147, 04021060.
  4. Xiong, Z.; Guo, X.; Luo, Y.; Zhu, S.; Liu, Y. Experimental and numerical studies on single-layer reticulated shells with aluminium alloy gusset joints. Thin-Walled Struct. 2017, 118, 124–136.
  5. Yang, L.; Chen, C.; Yu, F.; Chen, C.; Liu, J.; Wang, Z.; Wang, X.; Cui, J. Coordinated deformation and high formability mechanisms of 7A36 aluminum alloy by Sc micro-alloying and low-frequency electromagnetic casting. J. Mater. Res. Technol. 2023, 24, 5186–5201.
  6. Leng, J.F.; Ren, B.H.; Dong, Y.F.; Wu, H. Grain Refinement and Strengthening Mechanism Analysis of an Ultrahigh Strength Sc(Er)-Zr-7075 Aluminum Alloy. Phys. Met. Metallogr. 2021, 122, 1597–1604.
  7. Guo, Y.; Wei, W.; Shi, W.; Zhang, B.; Zhou, X.; Wen, S.; Wu, X.; Gao, K.; Rong, L.; Huang, H.; et al. Effect of Er and Zr additions and aging treatment on grain refinement of aluminum alloy fabricated by laser powder bed fusion. J. Alloys Compd. 2022, 912, 165237.
  8. Zhang, J.; Gao, J.; Yang, S.; Song, B.; Zhang, L.; Lu, J.; Shi, Y. Breaking the strength-ductility trade-off in additively manufactured aluminum alloys through grain structure control by duplex nucleation. J. Mater. Sci. Technol. 2023, 152, 201–211.
  9. Yang, X.; Chen, H.; Li, M.V.; Bu, H.; Zhu, Z.; Cai, C. Porosity suppressing and grain refining of narrow-gap rotating laser-MIG hybrid welding of 5A06 aluminum alloy. J. Manuf. Process. 2021, 68, 1100–1113.
  10. Figueiredo, R.B.; Kawasaki, M.; Langdon, T.G. Seventy years of Hall-Petch, ninety years of superplasticity and a generalized approach to the effect of grain size on flow stress. Prog. Mater. Sci. 2023, 137, 101131.
  11. Chen, G.Q.; Fu, G.S.; Wei, T.Y.; Cheng, C.Z.; Wang, H.S.; Song, L.L. Establishment of Prediction Model of Microstructure and Properties of 3003 Aluminum Alloy during Hot Deformation. Mater. Sci.-Medzg. 2019, 25, 369–375.
  12. Ma, S.; Zhang, Z.; Huang, Z.; Song, D.; Jia, Y.; Zhou, N.; Wang, K.; Zheng, K.; Du, H. Prediction of Grain Size in Cast Aluminum Alloys. Crystals 2022, 12, 474.
  13. Easton, M.A.; StJohn, D.H. Improved prediction of the grain size of aluminum alloys that includes the effect of cooling rate. Mater. Sci. Eng. A-Struct. Mater. Prop. Microstruct. Process. 2008, 486, 8–13.
  14. Nadella, R.; Eskin, D.; Katgerman, L. Effect of Grain Refining on Defect Formation in DC Cast Al-Zn-Mg-Cu Alloy Billet. In Grandfield, Eskin. Essential Readings in Light Metals: Volume 3 Cast Shop for Aluminum Production; Springer International Publishing: Cham, Switzerland, 2016; pp. 842–847.
  15. Lu, Z.Y.; Huang, X.Q.; Huang, J.Z. Role of Grain Size and Shape in Superplasticity of Metals. Front. Mater. 2021, 8, 641928.
  16. Zhao, H.; Sun, L.; Zhao, G.; Yu, J.; Liu, F.; Sun, X.; Lv, Z.; Cao, S. Abnormal grain growth behavior and mechanism of 6005A aluminum alloy extrusion profile. J. Mater. Sci. Technol. 2023, 157, 42–59.
  17. Liu, R.; Li, K.; Zhou, G.W.; Tang, W.Q.; Shen, Y.; Tang, D.; Li, D.Y. Simulation of strain induced abnormal grain growth in aluminum alloy by coupling crystal plasticity and phase field methods. Trans. Nonferrous Met. Soc. China 2022, 32, 3873–3886.
  18. Kalinenko, A.; Mishin, V.; Shishov, I.; Malopheyev, S.; Zuiko, I.; Novikov, V.; Mironov, S.; Kaibyshev, R.; Semiatin, S.L. Mechanisms of abnormal grain growth in friction-stir-welded aluminum alloy 6061-T6. Mater. Charact. 2022, 194, 112473.
  19. Bouaziz, O.; Allain, S.; Scott, C.P.; Cugy, P.; Barbier, D. High manganese austenitic twinning induced plasticity steels: A review of the microstructure properties relationships. Curr. Opin. Solid State Mater. Sci. 2011, 15, 141–168.
  20. Busby, J.T.; Hash, M.C.; Was, G.S. The relationship between hardness and yield stress in irradiated austenitic and ferritic steels. J. Nucl. Mater. 2005, 336, 267–278.
  21. Chen, G.; Fu, G.; Lin, S.; Cheng, C.; Yan, W.; Chen, H. Simulation of flow of aluminum alloy 3003 under hot compressive deformation. Met. Sci. Heat Treat. 2013, 54, 623–627.
  22. Chen, G.Q.; Clelland, J.; Slemrod, M.; Wang, D.H.; Yang, D. Isometric embedding via strongly symmetric positive systems. Asian J. Math. 2018, 22, 1–40.
  23. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
  24. Wang, F.Y.; Wu, H.H.; Dong, L.S.; Pan, G.F.; Zhou, X.Y.; Wang, S.Z.; Guo, R.Q.; Wu, G.L.; Gao, J.H.; Dai, F.Z. Atomic-scale simulations in multi-component alloys and compounds: A review on advances in interatomic potential. J. Mater. Sci. Technol. 2023, 165, 49–65.
  25. Pan, G.; Wang, F.; Shang, C.; Wu, H.; Wu, G.; Gao, J.; Mao, X. Advances in machine learning-and artificial intelligence-assisted material design of steels. Int. J. Miner. Metall. Mater. 2023, 30, 1003–1024.
  26. Zhu, D.; Pan, K.; Wu, H.H.; Wu, Y.; Xiong, J.; Yang, X.S.; Lookman, T. Identifying intrinsic factors for ductile-to-brittle transition temperatures in Fe-Al intermetallics via machine learning. J. Mater. Res. Technol. 2023, 26, 8836–8845.
  27. Chen, Y.; Wang, S.; Xiong, J.; Wu, G.; Gao, J.; Wu, Y.; Mao, X. Identifying facile material descriptors for Charpy impact toughness in low-alloy steel via machine learning. J. Mater. Sci. Technol. 2023, 132, 213–222.
  28. Lookman, T.; Balachandran, P.V.; Xue, D.; Yuan, R. Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design. NPJ Comput. Mater. 2019, 5, 21.
  29. Wei, Q.; Xiong, J.; Sun, S.; Zhang, T.Y. Multi-objective machine learning of four mechanical properties of steels. Sci. China Technol. Sci. 2021, 51.
  30. Zhu, D.X.; Pan, K.M.; Wu, Y.; Zhou, X.Y.; Li, X.Y.; Ren, Y.P.; Yang, X.S. Improved material descriptors for bulk modulus in intermetallic compounds via machine learning. Rare Met. 2023, 42, 2396–2405.
  31. Sui, X.; Zhang, X.; Wei, J. Effects of cold rolling deformation rate and annealing temperature on microstructure and properties of automobile 5182 Aluminum alloy. Nonferrous Met. Process. 2023, 52, 40–43.
  32. Huang, K.; Guo, L.; Yang, F.; Ma, K.; Liu, P.; Liu, G. Effects of extrusion ratio on microstructure and tensile properties of an extruded Al-Fe-Cu alloy. Mater. Mech. Eng. 2015, 39, 6.
  33. Qin, J.; Zhi, L.; Yi, D.; Bin, W. Diversity of intergranular corrosion and stress corrosion cracking for 5083 Al alloy with different grain sizes. Trans. Nonferrous Met. Soc. China 2022, 32, 765–777.
  34. Tsai, T.C.; Chuang, T.H. Role of grain size on the stress corrosion cracking of 7475 aluminum alloys. Mater. Sci. Eng. A 1997, 225, 135–144.
  35. Ludtka, G.M.; Laughlin, D.E. The influence of microstructure and strength on the fracture mode and toughness of 7XXX series aluminum alloys. Metall. Trans. A 1982, 13, 411–425.
  36. Ma, K.; Wen, H.; Hu, T.; Topping, T.D.; Isheim, D.; Seidman, D.N.; Schoenung, J.M. Mechanical behavior and strengthening mechanisms in ultrafine grain precipitation-strengthened aluminum alloy. Acta Mater. 2014, 62, 141–155.
  37. Suresh, S.; Vasudévan, A.K.; Bretz, P.E. Mechanisms of Slow Fatigue Crack Growth in High Strength Aluminum Alloys: Role of Microstructure and Environment. Metall. Trans. A 1984, 15, 369–379.
  38. Curle, U.A.; Govender, G. Semi-solid rheocasting of grain refined aluminum alloy 7075. Trans. Nonferrous Met. Soc. China 2010, 20, s832–s836.
  39. Shou, W.B.; Yi, D.Q.; Liu, H.Q.; Tang, C.; Shen, F.H.; Wang, B. Effect of grain size on the fatigue crack growth behavior of 2524-T3 aluminum alloy. Arch. Civ. Mech. Eng. 2016, 16, 304–312.
  40. Mobasherpour, I.; Tofigh, A.A.; Ebrahimi, M. Effect of nano-size Al2O3 reinforcement on the mechanical behavior of synthesis 7075 aluminum alloy composites by mechanical alloying. Mater. Chem. Phys. 2013, 138, 535–541.
  41. Woo, W.; Balogh, L.; Ungar, T.; Choo, H.; Feng, Z. Grain structure and dislocation density measurements in a friction-stir welded aluminum alloy using X-ray peak profile analysis. Mater. Sci. Eng. A 2008, 498, 308–313.
  42. Ram, G.D.J.; Murugesan, R.; Sundaresan, S. Fusion zone grain refinement in aluminum alloy welds through magnetic arc oscillation and its effect on tensile behavior. J. Mater. Eng. Perform. 1999, 8, 513–520.
  43. Zhang, H.; Wang, M.; Zhang, X.; Yang, G. Microstructural characteristics and mechanical properties of bobbin tool friction stir welded 2A14-T6 aluminum alloy. Mater. Des. 2015, 65, 559–566.
  44. Li, Z.; Yi, D.; Tan, C.; Wang, B. Investigation of the stress corrosion cracking behavior in annealed 5083 aluminum alloy sheets with different texture types. J. Alloys Compd. Interdiscip. J. Mater. Sci. Solid-State Chem. Phys. 2020, 817, 152690.
  45. Zhang, Z.; Wang, J.; Zhang, Q.; Zhang, S.; Shi, Q.; Qi, H. Research on Grain Refinement Mechanism of 6061 Aluminum Alloy Processed by Combined SPD Methods of ECAP and MAC. Materials 2018, 11, 1246.
  46. Tan, Y.B.; Wang, X.M.; Ma, M.; Zhang, J.X.; Liu, W.C.; Fu, R.D.; Xiang, S. A study on microstructure and mechanical properties of AA 3003 aluminum alloy joints by underwater friction stir welding. Mater. Charact. 2017, 127, 41–52.
  47. Gupta, M.; Srivatsan, T.S. Interrelationship between matrix microhardness and ultimate tensile strength of discontinuous particulate-reinforced aluminum alloy composites. Mater. Lett. 2001, 51, 255–261.
  48. Hosseinifar, M.; Malakhov, D.V. Effect of Ce and La on microstructure and properties of a 6xxx series type aluminum alloy. J. Mater. Sci. 2008, 43, 7157–7164.
  49. Howeyze, M.; Eivani, A.R.; Arabi, H.; Jafarian, H.R. Effects of deformation routes on the evolution of microstructure, texture and tensile properties of AA5052 aluminum alloy. Mater. Sci. Eng. A 2018, 732, 120–128.
  50. Pattnaik, A.B.; Das, S.; Jha, B.B.; Prasanth, N. Effect of Al–5Ti–1B grain refiner on the microstructure, mechanical properties and acoustic emission characteristics of Al5052 aluminium alloy. J. Mater. Res. Technol. 2015, 4, 171–179.
  51. Zhao, K.; Gao, T.; Yang, H.; Hu, K.; Liu, G.; Sun, Q.; Liu, X. Enhanced grain refinement and mechanical properties of a high–strength Al–Zn–Mg–Cu–Zr alloy induced by TiC nano–particles. Mater. Sci. Eng. A 2021, 806, 140852.
  52. Zhang, Y.; Ma, N.; Le, Y. Mechanical properties and damping capacity after grain refinement in A356 alloy. Mater. Lett. 2005, 59, 2174–2177.
  53. Mehmood, A.; Shah, M.; Sheikh, N.A.; Qayyum, J.A.; Khushnood, S. Grain refinement of ASTM A356 aluminum alloy using sloping plate process through gravity die casting. Alex. Eng. J. 2016, 55, 2431–2438.
  54. Camicia, G.; Timelli, G. Grain refinement of gravity die cast secondary AlSi7Cu3Mg alloys for automotive cylinder heads. Trans. Nonferrous Met. Soc. China 2016, 26, 1211–1221.
  55. Ding, W.; Zhao, X.; Chen, T.; Zhang, H.; Liu, X.; Cheng, Y.; Lei, D. Effect of rare earth Y and Al–Ti–B master alloy on the microstructure and mechanical properties of 6063 aluminum alloy. J. Alloys Compd. 2020, 830, 154685.
  56. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Duchesnay, É. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  57. Jablonka, K.M.; Jothiappan, G.M.; Wang, S.; Smit, B.; Yoo, B. Bias free multiobjective active learning for materials design and discovery. Nat. Commun. 2021, 12, 2312.
  58. Kumar, R.; Singh, A.K. Chemical hardness-driven interpretable machine learning approach for rapid search of photocatalysts. npj Comput. Mater. 2021, 7, 197.
  59. Sheikhpour, R.; Sarram, M.A.; Gharaghani, S.; Chahooki, M.A.Z. A survey on semi-supervised feature selection methods. Pattern Recognit. 2017, 64, 141–158.
  60. Shang, C.; Wang, C.; Wu, H.; Liu, W.; Chen, Y.; Pan, G.; Mao, X. Improved data-driven performance of Charpy impact toughness via literature-assisted production data in pipeline steel. Sci. China Technol. Sci. 2023, 66, 2069–2079.
  61. Eva, O. Modelling using polynomial regression. Procedia Eng. 2012, 48, 500–506.
Figure 1. Histogram of frequency distribution of tensile strength of aluminum alloys.
Figure 2. Scatter plots depicting the relationship between aluminum content, grain size, and tensile strength of aluminum alloys: (a) shows the correlation between aluminum content and tensile strength, and (b) illustrates the relationship between grain size and tensile strength.
Figure 3. Analysis results of SHAP values; (a) importance ranking of element features for tensile strength and (b) summary plot of SHAP values.
Figure 4. Importance ranking of element features for tensile strength of Al alloys: (a) RF, (b) KNN, (c) XGBoost, and (d) LightGBM, respectively.
Figure 5. Scatter plot of comparison of experimental values with predicted values of different models after feature selection: (a) LR, (b) RF, (c) GBDT, and (d) LightGBM, respectively. (all models were evaluated with 10-fold cross-validation and the grey line represents the fit curve with an 80% confidence interval).
Figure 6. Scatter plot of comparison of experimental values with predicted values of different models using screened composition, grain size, and hardness: (a) LR, (b) XGBoost, (c) RF, and (d) GBDT, respectively. (all models were evaluated with 10-fold cross-validation and the grey line represents the fit curve with an 80% confidence interval).
Figure 7. Experimental values versus values calculated from the proposed equation (equation obtained by fitting 67 data points from the entire dataset).
Table 1. Twenty-two features characterizing aluminum alloys used in analysis.

| Input/Output | Abb. | Description | Min | Max | Mean |
|---|---|---|---|---|---|
| Inputs | Al | Mass fraction of Al/% | 84.450 | 100.00 | 93.626 |
| | Si | Mass fraction of Si/% | 0.000 | 7.270 | 1.114 |
| | Fe | Mass fraction of Fe/% | 0.000 | 0.634 | 0.209 |
| | Cu | Mass fraction of Cu/% | 0.000 | 4.420 | 1.297 |
| | Mn | Mass fraction of Mn/% | 0.000 | 5.420 | 0.480 |
| | Mg | Mass fraction of Mg/% | 0.000 | 6.380 | 1.628 |
| | Cr | Mass fraction of Cr/% | 0.000 | 5.600 | 0.135 |
| | Zn | Mass fraction of Zn/% | 0.000 | 7.840 | 1.460 |
| | Ti | Mass fraction of Ti/% | 0.000 | 1.010 | 0.073 |
| | Sc | Mass fraction of Sc/% | 0.000 | 0.100 | 0.001 |
| | Er | Mass fraction of Er/% | 0.000 | 0.100 | 0.001 |
| | Zr | Mass fraction of Zr/% | 0.000 | 0.200 | 0.011 |
| | Ni | Mass fraction of Ni/% | 0.000 | 0.035 | 0.003 |
| | B | Mass fraction of B/% | 0.000 | 0.042 | 0.002 |
| | Be | Mass fraction of Be/% | 0.000 | 0.001 | 0.000 |
| | Li | Mass fraction of Li/% | 0.000 | 2.200 | 0.033 |
| | Ce | Mass fraction of Ce/% | 0.000 | 0.100 | 0.001 |
| | La | Mass fraction of La/% | 0.000 | 0.210 | 0.003 |
| | Sr | Mass fraction of Sr/% | 0.000 | 0.011 | 0.001 |
| | Gs | Grain size/μm | 0.360 | 482.000 | 76.713 |
| | Hardness | Hardness/HV | 48.000 | 302.030 | 108.838 |
| Output | Ts | Tensile strength/MPa | 115.000 | 696.000 | 349.528 |
Table 2. Performance metrics of ML models for predicting tensile strength in aluminum alloys (alloy composition and grain size).

| ML Models | R² | RMSE (MPa) | MSE (MPa²) | MAE (MPa) |
|---|---|---|---|---|
| KNN | 0.393 | 129.044 | 16,652.275 | 99.490 |
| LR | 0.550 | 171.507 | 29,414.598 | 85.920 |
| ANN | 0.732 | 85.691 | 33,282.024 | 145.833 |
| LightGBM | 0.766 | 80.062 | 6409.911 | 63.933 |
| RF | 0.858 | 62.424 | 3688.174 | 43.259 |
| XGBoost | 0.878 | 57.721 | 3331.768 | 39.267 |
Table 3. Top four high-accuracy machine learning models with screened composition and grain size as input.

| ML Models | R² | RMSE (MPa) | MAE (MPa) | MSE (MPa²) |
|---|---|---|---|---|
| LR | 0.501 | 117.008 | 93.539 | 13,690.907 |
| XGBoost | 0.884 | 56.452 | 40.034 | 3186.835 |
| RF | 0.891 | 54.689 | 37.897 | 2990.903 |
| LightGBM | 0.912 | 49.227 | 37.675 | 2423.319 |
Table 4. Four best-performing machine learning models based on accuracy (using screened composition, grain size, and hardness).

| ML Models | R² | RMSE (MPa) | MAE (MPa) | MSE (MPa²) |
|---|---|---|---|---|
| RF | 0.854 | 54.447 | 40.573 | 2964.502 |
| LR | 0.904 | 44.217 | 35.324 | 1955.187 |
| GBDT | 0.907 | 43.470 | 31.785 | 1889.660 |
| XGBoost | 0.914 | 41.740 | 29.910 | 1742.203 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fu, K.; Zhu, D.; Zhang, Y.; Zhang, C.; Wang, X.; Wang, C.; Jiang, T.; Mao, F.; Zhang, C.; Meng, X.; et al. Predictive Modeling of Tensile Strength in Aluminum Alloys via Machine Learning. Materials 2023, 16, 7236. https://doi.org/10.3390/ma16227236
