Assessment of Artificial Intelligence Strategies to Estimate the Strength of Geopolymer Composites and Influence of Input Parameters

Geopolymers might be the best alternative to conventional cement because they are produced from aluminosilicate-rich waste sources, which eliminates the issues associated with cement manufacture and use. Geopolymer composites (GPCs) are gaining popularity, and research on them is expanding. However, casting, curing, and testing specimens requires significant effort, cost, and time. For research to be efficient, novel approaches must be applied to this objective. In this study, the compressive strength (CS) of GPCs was predicted using machine learning (ML) approaches, i.e., one single method (support vector machine (SVM)) and two ensemble algorithms (gradient boosting (GB) and extreme gradient boosting (XGB)). The validity and comparability of all models were tested using the coefficient of determination (R2), statistical tests, and k-fold analysis. In addition, a model-independent post hoc approach known as SHapley Additive exPlanations (SHAP) was employed to investigate the impact of input factors on the CS of GPCs. In predicting the CS of GPCs, the ensemble ML strategies performed better than the single ML technique. The R2 values for the XGB, GB, and SVM models were 0.98, 0.97, and 0.93, respectively. The lower error values of the models, including the mean absolute and root mean square errors, further verified the enhanced precision of the ensemble ML approaches. The SHAP analysis revealed a strong positive correlation between GGBS and the CS of GPC. The effects of NaOH molarity, NaOH, and Na2SiO3 were also mostly positive. Fly ash and gravel size 10/20 mm have both beneficial and negative impacts on the CS of GPC: increasing their content up to an optimal level enhances the CS, whereas further increases reduce it. Gravel size 4/10 mm has less favorable and more negative effects. ML techniques will benefit the construction sector by offering rapid and cost-efficient solutions for assessing material characteristics.


Introduction
Ordinary Portland cement (OPC), the most widely used cementitious ingredient in concrete globally, is associated with high energy demand and significant CO2 emissions resulting from its manufacturing processes [1][2][3][4][5][6]. OPC manufacturing emits around 4 billion tons of CO2 annually, nearly 5-7% of global CO2 emissions [7,8]. Numerous measures have been attempted to mitigate the effects of OPC production and would improve the accuracy of ML approaches. The purpose of this work is to identify the most suitable ML technique for predicting the CS of GPC and to determine the effect of several parameters on GPC strength.
This study is also novel in that it compares the employed ML algorithms in order to recommend the most precise approach for predicting the CS of GPC in future studies. These approaches can also help researchers and the construction industry reduce the experimental effort, cost, and time of a project.

Data Retrieval and Analysis
SML methods require a wide variety of input variables to produce the desired output [73]. The CS of GPC was predicted using data from the scientific literature (see Table S1 in the Supplementary Materials). To prevent bias, experimental data were selected at random from the published literature. This study gathered CS data points to run the algorithms, whereas most of the source articles also studied other aspects of GPC. Fine aggregate, GGBS, fly ash, NaOH molarity, NaOH, water/solids ratio, Na2SiO3, and gravel sizes 10/20 mm and 4/10 mm were included as input variables, with CS serving as the output parameter. The number of inputs and data points has a major effect on the model output [41]. In the present study, 371 data points were used to run the ML algorithms (see Supplementary Materials). The data were retrieved with the mix proportions and the required outcome in mind, because the models need the same set of input parameters for every mix to produce the required output. Because the data were retrieved from the literature, many of the tests were performed in different regions and with different testing setups and specimen geometries. However, this variation does not affect the study's main findings, as the models require only the input variables and outcomes, irrespective of the testing setup and arrangement. The descriptive statistics of each input variable are summarized in Table 1. The selected data were also normalized. Normalization is the process of structuring data in a database; it entails constructing tables and establishing relationships between them according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
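In ML preprocessing, "normalization" usually refers to rescaling each input variable to a common range, so that inputs measured in large units (e.g., kg/m3) do not dominate inputs measured in small ones (e.g., molarity). The study does not state which scheme was applied, so the sketch below assumes min-max scaling to [0, 1]; the function name and the example mix values are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Rescale every column of X to [0, 1] (assumed scheme; not stated in the study)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for a constant column (it maps to 0)
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range, col_min, col_range

# Hypothetical mixes: [GGBS (kg/m3), fly ash (kg/m3), NaOH molarity (M)]
mixes = np.array([[0.0, 400.0, 8.0],
                  [200.0, 200.0, 10.0],
                  [400.0, 0.0, 12.0]])
scaled, col_min, col_range = min_max_normalize(mixes)
print(scaled)  # each column now spans 0 to 1
```

New mixes would be scaled with the stored col_min and col_range from the training data, i.e., (x - col_min) / col_range, so that training and prediction share the same transformation.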
The term "descriptive statistics" refers to a set of brief measures that summarize a dataset, which may represent an entire population or a sample of it. The mean, median, and mode describe central tendency, whereas the maximum, minimum, and standard deviation describe variability. Table 1 lists these statistical measures for the input variables of the model. The distribution of each input factor with CS is depicted in Figure 1. The frequency distribution of each parameter is shown along the diagonal, and the remaining panels show the correlation between each pair of input and output parameters. A rising or falling trend line indicates a positive or negative correlation, respectively, between the x-axis parameter and the y-axis parameter under consideration, whereas a flat line indicates no correlation between them. The correlation pattern of the input parameters with the CS is depicted in Figure 2.
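Table 1 and Figure 2 rest on standard summary measures and on correlation coefficients. A minimal sketch of both follows, assuming Pearson's coefficient is the correlation measure (the text does not name one); the ingredient values are made up for illustration and are not taken from the study's dataset.

```python
import numpy as np

def describe(column):
    """Table 1-style summary of one input variable."""
    x = np.asarray(column, dtype=float)
    values, counts = np.unique(x, return_counts=True)
    return {
        "mean": x.mean(),
        "median": np.median(x),
        "mode": values[np.argmax(counts)],  # smallest of the most frequent values
        "std": x.std(ddof=1),               # sample standard deviation
        "min": x.min(),
        "max": x.max(),
    }

def correlation_with_output(X, y):
    """Pearson correlation coefficient of each input column with the output (CS)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Hypothetical values (not from the study's dataset)
ggbs = np.array([0.0, 100.0, 100.0, 200.0, 400.0])
fly_ash = 400.0 - ggbs      # a binder replaced as GGBS increases
cs = 20.0 + 0.1 * ggbs      # CS rising linearly with GGBS
print(describe(ggbs))
print(correlation_with_output(np.column_stack([ggbs, fly_ash]), cs))  # approx. [1, -1]
```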

Machine Learning Algorithms Employed
To meet the study's aims, an individual ML method (SVM) and ensemble ML approaches (GB and XGB) were implemented in Python using the Anaconda Navigator package. Spyder (version 4.3.5) was used to run the SVM, GB, and XGB models. These techniques are frequently used to forecast a desired outcome from a set of input parameters; they can predict temperature effects, strength characteristics, and material durability, among other things [74,75]. Nine input factors and one output (CS) were used throughout the modeling phase. The R2 value of the predicted results reflects the performance of the models. R2 indicates the degree of divergence: a value near zero indicates greater divergence, while a value near one indicates a nearly perfect fit between the model and the experimental data [40]. The following subsections describe the ML approaches used in this study. Moreover, k-fold, statistical, and error evaluations, including root mean square error (RMSE) and mean absolute error (MAE), were performed on all models. Additionally, a model-independent post hoc procedure called SHapley Additive exPlanations (SHAP) was used to examine the impact of input factors on the CS of GPCs. The research plan is depicted in Figure 3.
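The R2 used throughout to score the models is conventionally computed as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of the actual values. The sketch below assumes that standard definition (the text does not give the formula); the strength values are hypothetical.

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination, R2 = 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)       # variation left unexplained by the model
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation about the mean
    return 1.0 - ss_res / ss_tot

measured = [30.0, 40.0, 50.0, 60.0]     # hypothetical CS values (MPa)
predicted = [32.0, 38.0, 51.0, 59.0]
print(r_squared(measured, predicted))   # close to 1: good fit
print(r_squared(measured, [45.0] * 4))  # 0: no better than predicting the mean
```

A model that does worse than simply predicting the mean gives a negative R2, so "near zero" marks roughly the no-skill level.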

Support Vector Machine
SVM refers to a family of supervised learning algorithms used to analyze data for classification and regression. An SVM represents the samples as points in space, mapped so that samples of different classes are separated by a boundary (a line or plane) with as wide a margin as possible. New samples are then mapped into the same space and classified according to the side of the boundary on which they fall, as seen in Figure 4. For regression, as used here to predict CS, the same idea is applied as support vector regression, which fits a function that keeps most training points within a distance ε of it and penalizes points lying farther away. Figure 5 depicts the implementation procedure of the SVM model. The material strength was estimated using this model, which accounts for the combined influence of several factors. An optimization technique was used to determine the SVM model's parameters.
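The text states only that an optimization technique was used to set the SVM parameters. One common way to do this for regression is a cross-validated grid search over the penalty C and the tube half-width epsilon of a support vector regressor, sketched below with scikit-learn. The RBF kernel, the parameter grid, the scaling step, and the synthetic data are all assumptions for illustration, not the study's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def tune_svr(X, y, cv=5):
    """Grid-search C and epsilon of an RBF support vector regressor by cross-validated R2."""
    pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    param_grid = {
        "svr__C": [1.0, 10.0, 100.0],     # penalty on points lying outside the epsilon tube
        "svr__epsilon": [0.1, 0.5, 1.0],  # half-width of the tube in output units (MPa)
    }
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="r2")
    search.fit(X, y)
    return search.best_estimator_, search.best_params_

# Synthetic stand-in data: 9 inputs and one output, mirroring the study's layout
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 9))
y = 20.0 + 30.0 * X[:, 0] + 10.0 * X[:, 3] - 5.0 * X[:, 8] + rng.normal(0.0, 1.0, 120)
model, params = tune_svr(X, y)
print(params)
```

Placing the scaler inside the pipeline means each cross-validation fold is standardized using only its own training portion.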

Gradient Boosting
In 1999, Friedman [77] proposed GB as an ensemble approach for classification and regression; in this study, GB is used exclusively for regression. As Figure 6 shows, at each iteration the GB method fits the base model to a randomly selected subset of the training dataset. Random subsampling of the training data helps prevent overfitting, speeds up execution, and can raise accuracy. The smaller the subsample, the faster the regression, since each iteration has to process less data. The GB technique involves tuning parameters, namely n-trees, the number of trees to be built, which must not be set too small, and the shrinkage factor, also known as the learning rate, applied to every tree, which must not be set too high [78].
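To make the roles of the subsample, n-trees, and shrinkage parameters concrete, below is a from-scratch sketch of stochastic gradient boosting for squared error, using depth-1 regression trees (stumps) as the base model. The stump base learner, the parameter values, and all names are illustrative assumptions; the study's own GB configuration is not reproduced here.

```python
import numpy as np

def fit_stump(X, r):
    """Depth-1 regression tree: the single threshold split that best fits residuals r."""
    best = (np.inf, 0, np.inf, r.mean(), r.mean())  # fallback: constant prediction
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # thresholds that leave both sides non-empty
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    j, t, left_value, right_value = stump
    return np.where(X[:, j] <= t, left_value, right_value)

def fit_gb(X, y, n_trees=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared-error regression."""
    rng = np.random.default_rng(seed)
    f0 = y.mean()                                   # initial model: mean of the targets
    pred = np.full(len(y), f0)
    size = max(2, int(subsample * len(y)))
    stumps = []
    for _ in range(n_trees):
        residuals = y - pred                        # negative gradient of 0.5 * (y - F)^2
        idx = rng.choice(len(y), size=size, replace=False)   # random subsample per iteration
        stump = fit_stump(X[idx], residuals[idx])
        pred += learning_rate * predict_stump(stump, X)      # shrink each tree's contribution
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict_gb(model, X):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, X) for s in stumps)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(80, 3))
y = 10.0 + 20.0 * (X[:, 0] > 0.5) + 5.0 * X[:, 1]   # synthetic target
model = fit_gb(X, y, n_trees=100, learning_rate=0.1, subsample=0.5)
pred = predict_gb(model, X)
```

A smaller learning rate needs more trees to reach the same fit, which is why n-trees should not be too small when the shrinkage factor is kept low.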

Extreme Gradient Boosting
The fundamental concept underlying the proposed XGB model is to construct an optimization task using a genetic algorithm on top of the classifier in order to improve the classification precision of smaller groups without substantially compromising the classification precision of other groups. The genetic algorithm generates random estimates for the XGB in order to establish a new decision threshold with the highest genetic fitness [80]. Specifically, the XGB model consists of four phases: generating the population of parameter values, selecting the population of parameter values, training the decision function, and evaluating the fitness function. Figure 7 depicts the XGB flowchart.
Figure 8 depicts the findings of the SVM method for the CS of GPC. Figure 8a illustrates the relationship between actual (experimental) and predicted results. The SVM method yielded outcomes with a fair degree of precision and a minor discrepancy between actual and predicted findings.
The R2 value of 0.93 demonstrates the good accuracy of the SVM technique in predicting the CS of GPC. Figure 8b depicts the scatter of experimental, predicted, and error values for the SVM model. Analysis of the errors showed minimum, mean, and maximum values of 0.20 MPa, 4.04 MPa, and 8.39 MPa, respectively. In addition, the error distribution showed that 37% of the errors were below 3 MPa, 37% were between 3 and 6 MPa, and 26% were greater than 6 MPa. This distribution of errors suggests that the SVM technique performed adequately in predicting the CS of GPC.
Figure 9a,b present an evaluation of the actual and predicted findings of the GB model. Figure 9a illustrates the correlation between actual and predicted results; the R2 of 0.97 indicates that the GB technique is more accurate than the SVM in forecasting the CS of GPC. The scatter of experimental, predicted, and error values for the GB technique is depicted in Figure 9b. The minimum, average, and maximum errors were 0.23 MPa, 2.27 MPa, and 6.30 MPa, respectively. The error distribution was 16.4% below 1 MPa, 60.3% between 1 and 3 MPa, and 23.3% above 3 MPa. These lower error levels further suggest that the GB model is more accurate than the SVM model.
The improved accuracy of the GB model is attributed to building twenty sub-models and selecting the one with the highest R2 value.
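The text attributes the accuracy of GB to building twenty sub-models and keeping the one with the highest R2. A minimal scikit-learn sketch of that selection step follows; varying only n_estimators across the sub-models, the 80/20 split, and the synthetic data are assumptions, since the study does not describe how its sub-models differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def best_of_submodels(X, y, n_submodels=20, seed=0):
    """Fit n_submodels GB regressors with different tree counts; keep the highest test R2."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    scored = []
    for k in range(1, n_submodels + 1):
        model = GradientBoostingRegressor(n_estimators=10 * k, learning_rate=0.1,
                                          random_state=seed)
        model.fit(X_tr, y_tr)
        scored.append((r2_score(y_te, model.predict(X_te)), model))
    r2_values = np.array([score for score, _ in scored])
    best_r2, best_model = max(scored, key=lambda pair: pair[0])
    return best_model, best_r2, r2_values

# Synthetic stand-in data with 9 inputs
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 9))
y = 20.0 + 30.0 * X[:, 0] + 10.0 * X[:, 3] ** 2 + rng.normal(0.0, 1.0, 200)
model, best_r2, r2_values = best_of_submodels(X, y)
print(r2_values.min(), r2_values.mean(), r2_values.max())  # cf. the sub-model spread in Figure 18
```

Choosing the sub-model by its score on the same held-out split makes that score somewhat optimistic; a separate validation split, or the k-fold procedure, gives an unbiased estimate.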

Extreme Gradient Boosting Model
Polymers 2022, 14, x 11 of 25
Figure 10 displays the outcomes of the XGB method for predicting the CS of GPC. Figure 10a depicts the relationship between actual and predicted findings. The XGB method yielded output with the best precision and the smallest deviation between experimental and predicted results; with an R2 of 0.98, the XGB model is highly accurate in predicting the CS of GPC. The scatter of experimental, predicted, and error values for the XGB method is depicted in Figure 10b. The minimum, average, and maximum error values were 0.44 MPa, 2.08 MPa, and 4.98 MPa, respectively. The error distribution was 10.0% below 1 MPa, 75.5% between 1 and 3 MPa, and 14.5% above 3 MPa, which shows that the XGB model has the best prediction precision.
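The error statistics reported for the three models (minimum, mean, and maximum absolute error, plus the share of errors in each band) can be computed as sketched below. The band edges are passed in, e.g., 3 and 6 MPa as used for SVM, or 1 and 3 MPa as used for GB and XGB; the function name and the sample values are hypothetical.

```python
import numpy as np

def error_summary(actual, predicted, edges):
    """Min, mean, max absolute error and the percentage of errors in each band."""
    err = np.abs(np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float))
    # Band 0: err < edges[0]; band i: edges[i-1] <= err < edges[i]; last band: err >= edges[-1]
    band = np.digitize(err, edges)
    shares = 100.0 * np.bincount(band, minlength=len(edges) + 1) / err.size
    return err.min(), err.mean(), err.max(), shares

actual = [40.0, 50.0, 60.0, 70.0]      # hypothetical measured CS (MPa)
predicted = [41.0, 46.0, 53.0, 70.5]   # hypothetical model output (MPa)
print(error_summary(actual, predicted, edges=[3.0, 6.0]))
```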

Models' Validation
The k-fold method and statistical tests were used to validate the employed ML methods. The k-fold approach [81], which is frequently used to assess the validity of a model, randomly shuffles the relevant data and divides them into 10 groups. As displayed in Figure 11, nine groups were used to train the ML models, whereas the remaining group was used to validate them. Smaller errors (MAE and RMSE) and a higher R2 indicate a more precise ML method. The procedure is repeated 10 times so that each group serves once as the validation set, which provides a more reliable estimate of the model's accuracy. In addition, as indicated in Table 2, each model was statistically tested for error (MAE and RMSE). The MAE values for SVM, GB, and XGB were 4.03, 2.26, and 2.01 MPa, respectively. Similarly, the RMSE values for SVM, GB, and XGB were 4.62, 2.59, and 2.18 MPa, respectively. These evaluations also suggest that the XGB model is more accurate than the other techniques because of its lower errors. The prediction performance of the approaches was measured statistically using Equations (1) and (2), which were taken from earlier studies [82,83].
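$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|P_i-T_i\right| \tag{1}$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i-T_i\right)^2} \tag{2}$$
where n = total number of data samples, P_i = predicted values, and T_i = actual values from the data sample.
Figure 11. K-fold cross-validation procedure [84].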
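Assuming Equations (1) and (2) are the standard definitions of MAE and RMSE in terms of n, P_i, and T_i, they translate directly into code. The function names and the sample values below are illustrative.

```python
import numpy as np

def mae(P, T):
    """Mean absolute error, Equation (1): (1/n) * sum |P_i - T_i|."""
    P, T = np.asarray(P, dtype=float), np.asarray(T, dtype=float)
    return np.mean(np.abs(P - T))

def rmse(P, T):
    """Root mean square error, Equation (2): sqrt((1/n) * sum (P_i - T_i)^2)."""
    P, T = np.asarray(P, dtype=float), np.asarray(T, dtype=float)
    return np.sqrt(np.mean((P - T) ** 2))

predicted = [32.0, 38.0, 51.0, 59.0]   # P_i, hypothetical (MPa)
actual = [30.0, 40.0, 50.0, 60.0]      # T_i, hypothetical (MPa)
print(mae(predicted, actual), rmse(predicted, actual))  # 1.5 and sqrt(2.5)
```

Because squaring weights large errors more heavily, RMSE is always at least as large as MAE, which is consistent with the values in Table 2 (e.g., 4.62 versus 4.03 MPa for SVM).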
To evaluate the k-fold assessment, R2, RMSE, and MAE were computed for each fold, and their values are listed in Table 3. Figures 12-14 compare the k-fold results of each ML method. As shown in Figure 12 Figure 13). However, the average R2 values for SVM, GB, and XGB were 0.75, 0.82, and 0.85, respectively (Figure 14). Compared with the other models, the XGB model, with the lowest error values and the highest R2, is the most accurate in forecasting the CS of GPC.
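The 10-fold procedure described above (shuffle, split into ten groups, train on nine, validate on the tenth, and repeat until each group has been held out once) can be sketched as follows. Any fit/predict pair can be plugged in; ordinary least squares is used here only as a stand-in for the SVM, GB, and XGB models, and the data are synthetic.

```python
import numpy as np

def k_fold_scores(X, y, fit, predict, k=10, seed=0):
    """Per-fold R2, MAE, and RMSE from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)   # shuffle, then k groups
    rows = []
    for i in range(k):
        val = folds[i]                                    # held-out group
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        p, t = predict(model, X[val]), y[val]
        r2 = 1.0 - np.sum((t - p) ** 2) / np.sum((t - t.mean()) ** 2)
        mae = np.mean(np.abs(p - t))
        rmse = np.sqrt(np.mean((p - t) ** 2))
        rows.append((r2, mae, rmse))
    return np.array(rows)   # shape (k, 3)

def fit_ols(X, y):
    """Least-squares stand-in model with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_ols(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - X[:, 1]        # noise-free synthetic target
scores = k_fold_scores(X, y, fit_ols, predict_ols)
print(scores.mean(axis=0))   # average R2, MAE, RMSE over the 10 folds
```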

Influence of Input Parameters
In this study, the impact of the input factors on the predictions of the ML technique was determined. The SHAP tree explainer was first applied over the whole database to give a more accurate description of global feature impacts by combining local SHAP explanations. The "TreeExplainer" technique, a tree-specific SHAP approximation, was implemented [85]. This approach exploits the internal structure of tree-based models, i.e., it sums a series of computations associated with the leaf nodes of the tree model, which keeps the computational complexity low. Figure 15 displays the SHAP violin plot for all the features used to predict the CS of GPC. In this plot, the color of each point represents the feature value, and the corresponding SHAP value on the x-axis indicates its contribution to the output. For example, GGBS is an input feature with a greater impact, showing a strong positive link between this feature and the CS of GPC: an increase in GGBS mostly leads to an increase in CS. A comparable impact of NaOH molarity, NaOH, and Na2SiO3 on the CS prediction of GPC was also indicated. The water/solids ratio, gravel size 10/20 mm, and fly ash have both beneficial and negative effects on the CS of GPC. This signifies that using these ingredients up to an optimal content improves the CS, while at higher contents, the CS of GPC decreases. On the other hand, the fine aggregate and gravel size 4/10 mm have a less positive and more negative influence (more red dots on the negative side). This evaluation is based on the database compiled in the present study, and more data points may yield more accurate results.
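SHAP values are Shapley values from cooperative game theory: each feature's contribution is its average marginal effect on the prediction over all coalitions of the other features. TreeExplainer computes them efficiently for tree ensembles; the brute-force sketch below enumerates every coalition instead, which is only feasible for a few features. Replacing "absent" features with a single baseline vector is an assumption of this sketch (TreeExplainer's default handling of absent features differs), and the toy model stands in for a trained GPC model.

```python
import itertools
import math
import numpy as np

def exact_shap(f, x, baseline):
    """Exact Shapley values of model f at x; features outside a coalition take baseline values."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = np.array(coalition, dtype=int)
        z[idx] = x[idx]                  # coalition members take their actual values
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

def toy_model(z):
    """Hypothetical model: a main effect for feature 0, an interaction of features 1 and 2."""
    return 2.0 * z[0] + 3.0 * z[1] * z[2]

phi = exact_shap(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # approx. [2, 9, 9]: the interaction is split equally between features 1 and 2
```

The values always add up to the difference between the prediction at x and at the baseline, which is what allows a SHAP plot to read as a decomposition of each prediction.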
Figure 16 depicts the relationship between the input parameters and the CS of GPC. Figure 16a depicts the interactions of GGBS. The scatter plot reveals that, among the parameters, GGBS has the highest impact on the CS of GPC, which increases with increasing GGBS content; GGBS interacts mainly with the water/solids ratio. Increasing the GGBS content from 0 to 300 kg/m3 steadily improves the CS of GPC, whereas above that value its impact levels off.
Therefore, a GGBS content in the range of 300-400 kg/m3 is favorable for achieving high CS in GPCs made with the ingredients considered in the present study. Similarly, NaOH molarity, NaOH, and Na2SiO3 have a favorable impact on the CS of GPC with increasing amounts up to an optimal content (Figure 16b-d). For example, increasing the NaOH molarity up to 8 M has a positive influence, while a further increase has a negative impact (Figure 16b). However, the impact of fly ash on the CS of GPC is different: fly ash contents below 400 kg/m3 have a detrimental influence, whereas higher contents have a favorable impact on the CS of GPC. It is important to mention that these observations are based on the types of ingredients and the number of data samples considered in this study.
Using different ingredients as input parameters and different datasets might yield different outputs.

Discussion
The purpose of this research was to contribute to the current body of knowledge on the application of modern methods for predicting the CS of GPC. This study will benefit the construction industry by providing quick and cost-effective methods for predicting material properties. In addition, using these methods to encourage eco-friendly construction will hasten the acceptance and use of GPC in the building sector. As GPC can be produced from aluminosilicate-containing wastes, its use in the building sector will offer many benefits, as shown in Figure 17. This research illustrates how ML techniques can be used to predict the CS of GPC. The study employed three ML methods: one individual (SVM) and two ensemble (GB and XGB). The accuracy of each approach was evaluated to determine the most effective predictor. The XGB technique produced the most accurate result, with an R2 of 0.98, compared with the GB and SVM techniques, which gave R2 values of 0.97 and 0.93, respectively. For comparison, Wang et al. [84] predicted the CS of geopolymer concrete using the AdaBoost, random forest, and decision tree algorithms and reported R2 values of 0.90, 0.90, and 0.83, respectively. Cao et al. [86] employed SVM and MLP approaches for the CS of geopolymer concrete and reported R2 values of 0.91 and 0.88, respectively. This indicates that the algorithms selected in the present study performed better than the approaches used in these previous studies. In addition, the correctness of each model was tested using the k-fold and statistical methods; lower errors indicate a more precise model. Nevertheless, establishing and recommending the ideal ML method for predicting outcomes across a range of domains is challenging, since the performance of an ML method depends strongly on the input variables and the size of the dataset used to run the algorithms.
In contrast to individual methods, ensemble ML techniques typically build sub-models from weak learners that are trained on the dataset and optimized to maximize R2. Figure 18 depicts the distribution of R2 for the GB and XGB sub-models. Figure 18a shows the R2 values for the GB sub-models, with minimum, mean, and maximum values of 0.902, 0.919, and 0.973, respectively. Likewise, the minimum, mean, and maximum R2 values for the XGB sub-models were 0.901, 0.927, and 0.980, respectively (Figure 18b). These results indicate that the GB and XGB sub-models have comparable R2 values and a high degree of accuracy when predicting the CS of GPC. Additionally, SHAP was used to investigate the effect of input factors on the CS of GPCs. GGBS was found to be the input feature with the greatest influence, indicating a strong positive relationship between this feature and the CS of GPC. A comparable effect of NaOH molarity, NaOH, and Na2SiO3 on the CS prediction of GPC was also revealed. Gravel size 10/20 mm, the water/solids ratio, and fly ash have both positive and negative influences on the CS of GPC. This indicates that using these materials up to the ideal content enhances the CS, but the CS of GPC decreases at greater contents. In contrast, the fine aggregate and gravel size 4/10 mm have a smaller beneficial impact and a greater negative impact.

Conclusions
This research aimed to utilize both individual and ensemble machine learning (ML) methods to predict the compressive strength (CS) of geopolymer composites (GPCs). One individual approach, support vector machine (SVM), and two ensemble strategies, gradient boosting (GB) and extreme gradient boosting (XGB), were employed to predict outcomes. As a consequence of this investigation, the following findings were drawn:

• Ensemble ML methods performed better than the individual ML technique at predicting the CS of GPCs, with the XGB model performing best. The coefficients of determination (R2) for the XGB, GB, and SVM models were 0.98, 0.97, and 0.93, respectively. All the employed techniques yielded results within a satisfactory limit and with little deviation from the experimental findings;
• The error values (MAE and RMSE) also confirmed the superior performance of the XGB method in forecasting the CS of GPC;
• Statistical tests and k-fold analysis validated the performance of the employed models. The smaller errors and higher R2 obtained from the k-fold analysis indicated the higher precision of the ML models, and these analyses showed that the XGB model outperformed the other investigated models;
• Based on the results of the SHAP analysis, GGBS was the most influential input feature, showing a strong positive association with the CS of GPC;
• Research of this nature will help the construction industry by facilitating the development of fast and cost-efficient methods for forecasting material characteristics. Moreover, by supporting eco-friendly construction, these initiatives will hasten the acceptance and application of GPC in the building sector.
Future studies could include other experimental factors, such as specimen geometry, strain rate, and temperature effects. Other ensemble ML approaches, such as random forest, bagging, and boosting, could also be employed to check the precision of the results.