Prediction of Ecofriendly Concrete Compressive Strength Using Gradient Boosting Regression Tree Combined with GridSearchCV Hyperparameter-Optimization Techniques

A crucial factor in the efficient design of sustainable concrete buildings is the compressive strength (Cs) of eco-friendly concrete. In this work, a hybrid model combining a Gradient Boosting Regression Tree (GBRT) with the grid search cross-validation (GridSearchCV) optimization technique was used to predict the compressive strength, which increased the precision of the prediction models. To build the proposed models, 164 experiments on eco-friendly concrete compressive strength were gathered from previous research. The dataset included the water/binder ratio (W/B), curing time (age), the recycled aggregate percentage of the total aggregate in the mixture (RA%), the ground granulated blast-furnace slag (GGBFS) percentage of the total binder used in the mixture (GGBFS%), and superplasticizer (kg). The root mean square error (RMSE) and coefficient of determination ($R^2$) between the observed and forecast strengths were used to evaluate the accuracy of the predictive models. The results indicated that, compared to the default GBRT model, the GridSearchCV approach identifies more suitable hyperparameters for the GBRT prediction model. Furthermore, the GSC-GBRT model showed notable robustness and generalization, with RMSE and $R^2$ values (for the testing phase) of 2.3214 and 0.9612, respectively. These outcomes indicate that the suggested GSC-GBRT model is advantageous. Additionally, the significance and contribution of the input factors that affect the compressive strength were explained using the Shapley additive explanation (SHAP) approach.


Introduction
In recent years, the design and construction of sustainable buildings has become a major goal. One of the most important factors influencing this goal is the use of Portland cement substitutes. Cement production consumes vast amounts of energy and contributes significantly to greenhouse gas emissions and environmental degradation. Therefore, reducing the cement incorporated in concrete mixes is necessary to protect the environment. In addition, using recycled concrete aggregate (RCA) and ground granulated blast-furnace slag (GGBFS) as waste materials in concrete mixes avoids the negative impacts of burying hazardous industrial wastes and also reduces greenhouse gas emissions, since less cement and natural aggregate (NA) need to be produced [1].

Related Works
In civil engineering, supervised machine learning models have gained considerable momentum, especially for predicting eco-friendly concrete properties, because they can forecast results with high accuracy. Researchers have presented many machine learning (ML)-based prediction models for forecasting the mechanical and physical properties of RCA. The suggested approaches have been validated as a reliable alternative to the expensive and time-consuming laboratory testing usually used to assess concrete qualities. For example, Han et al. [16] presented an ensemble ML model to estimate the modulus of elasticity of concrete derived from RCA. They demonstrated that, compared to stand-alone models, the ensemble ML model consistently generates more precise predictions. Additionally, prediction models for the Cs of concrete incorporating recycled materials were proposed using random forest, linear regression, and nonlinear regression techniques [17]. The models based on random forest and nonlinear regression were found to be more accurate than linear regression. Furthermore, Liu et al. [18] used ML algorithms to predict the carbonation depth in RCA. According to the results, the random forest model outperforms the stand-alone artificial neural network (ANN) model and the Gaussian process regression model. Finally, 10 ML algorithms were tested using a dataset containing 962 experimental Cs values of NCA and RCA [19]. Because they outperform existing methods, the ML models established in that work can be suggested as a valuable tool for predicting the Cs.
In terms of GGBFS, numerous studies have established ML models to forecast the mechanical characteristics of eco-friendly concrete that includes GGBFS as a substitute for regular cement. The NN and ANFIS models for predicting chloride permeability in concrete were introduced by Boga et al. [20]. Their study used concrete specimens containing only calcium nitrite-based corrosion inhibitor (CNI), only GGBFS, or a mixture of these ingredients in various ratios. The evaluation of the results demonstrated that both models estimate chloride permeability with good precision. In addition, random forest (RF) was used by Mai et al. [21] to forecast the Cs of GGBFS-containing concrete. The RF performance in terms of $R^2$, RMSE, and MAE was 0.9729, 4.9585, and 3.9423, respectively. Han et al. [22] created a hybrid model for calculating the Cs of GGBFS concrete and validated the synergistic benefits of the hybrid algorithm over a single algorithm. The new PSO-BP hybrid neural network model outperformed basic ANNs trained by a single method and was shown to be suited for estimating the Cs of GGBFS concrete. Table 1 provides a comprehensive summary of relevant prior work on predicting the Cs of eco-friendly concrete.

Research Significance
The gradient boosting regression tree (GBRT) is a powerful ensemble learning approach built on the gradient boosting framework [31]. More precisely, GBRT is a robust data-mining technique that has been extensively tested and shown to be successful in various classification and regression problems [32][33][34][35][36][37][38][39][40][41][42][43]. Accordingly, the prediction of the Cs of eco-friendly concrete was chosen in this study to demonstrate the potential of the GBRT technique. Adjusting the hyperparameters of GBRT models for eco-friendly concrete datasets is also beneficial. Hyperparameter tuning (optimization) is an essential step in the ML process: a wise selection of hyperparameters may help a model achieve the intended metric value, whereas a poor one can trap it in an endless loop of iterative training and optimization. Thus, the hyperparameters of GBRT are optimized here using the GridSearchCV technique. Because predicting the Cs of eco-friendly concrete that contains recycled aggregate (RA) as a replacement for NA and GGBFS as a replacement for OPC has not previously been studied in this way, this is a novel piece of research.
The remainder of this article is organized as follows. The research methodology is outlined in Section 2, and Section 3 describes the dataset. The outcomes of the model prediction and the comparative analysis are described in Section 4. Section 5 provides a thorough analysis of the significance and contribution of each input variable to the final Cs. Finally, Section 6 summarizes the main conclusions.

Research Methodology
The methodology of the study is displayed in Figure 1. The data were first gathered and then divided into training and testing datasets comprising 80% and 20% of the total data, respectively. The GBRT technique for generating models and the GridSearchCV approach for determining suitable model hyperparameters were then introduced. The evaluation and interpretation of the GSC-GBRT model made up the last phase. The performance assessment was based on statistical performance metrics, including RMSE and $R^2$. Finally, SHAP was introduced in the interpretation process, and global and local assessments were carried out. The scikit-learn library [44] under Python 3.7 was used to model and tune the GBRT in order to produce GSC-GBRT.

Gradient Boosting Algorithm
Let $x = \{x_1, x_2, \dots, x_n\}$ be a collection of random input variables and $y$ the response variable. Given training data $\{(x_i, y_i)\}$ for $i = 1, 2, \dots, N$, with $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, the goal is to find an approximation $\tilde{F}(x)$ of the function $F(x)$ mapping $x$ to $y$ that minimizes the loss function $L(y, F(x))$; see Equation (1). Errors are inevitable when seeking the function $\tilde{F}(x)$. The gradient boosting approach fits weak learners to the loss function sequentially, so that each weak learner seeks to correct the errors produced by the earlier ones. In this way, the performance of the prediction model can be improved and the prediction error decreased.
Estimating the approximation function requires a loss function; here the squared error $L(y, F(x)) = (y - F(x))^2$ is used. In step 1, an initial base learner $F_0(x)$, typically a constant function, is established; the gradient boosting technique then uses steepest-descent steps to minimize the loss function.
Step 2 defines the iteration space $m = 1, \dots, M$. Finding a local minimum by steepest descent means taking steps proportional to the negative gradient of the loss function.
Specifically, in step 3 the gradient of the loss function $L(y, F(x))$ is evaluated at the current model. Regression trees $h(x_i; a)$ are employed as weak learners; the parameters $a$ define each tree as a parameterized function of the input variables $x$ [45], which generalizes the gradient computed at the training points to the whole input range. In step 4, the tree is produced by fitting it to the current negative gradient, where $\beta$ is the weight value, commonly known as the expansion coefficient of each weak learner, and $a_m$ denotes the parameters found at iteration $m$. Step 5 then determines the optimal step length $\rho_m$, and step 6 updates the model $F_m(x)$ at each iteration $m = 1, \dots, M$. Algorithm 1, due to Friedman [29], formalizes the gradient boosting procedure.
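For reference, the steps of this section take the following standard form in Friedman's gradient boosting formulation. This is a sketch in the notation defined above, not a reproduction of the numbered equations of the original layout; the pseudo-residual symbol $\tilde{y}_i$ is an assumed name for the negative gradient evaluated at sample $i$.

```latex
\begin{align*}
\tilde{F} &= \arg\min_{F} \; \mathbb{E}_{y,x}\, L\bigl(y, F(x)\bigr) \\
F_0(x) &= \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, \rho) && \text{(step 1)} \\
\tilde{y}_i &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}, \quad i = 1, \dots, N && \text{(step 3)} \\
a_m &= \arg\min_{a,\,\beta} \sum_{i=1}^{N} \bigl[\tilde{y}_i - \beta\, h(x_i; a)\bigr]^2 && \text{(step 4)} \\
\rho_m &= \arg\min_{\rho} \sum_{i=1}^{N} L\bigl(y_i, F_{m-1}(x_i) + \rho\, h(x_i; a_m)\bigr) && \text{(step 5)} \\
F_m(x) &= F_{m-1}(x) + \rho_m\, h(x; a_m) && \text{(step 6)}
\end{align*}
```

With the squared-error loss $L(y, F) = (y - F)^2$, the pseudo-residual is $\tilde{y}_i = 2\,(y_i - F_{m-1}(x_i))$, so each weak learner is fitted to a quantity proportional to the current residuals.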

Gradient Boosting Regression Tree Algorithm (GBRT)
Lie et al. [46] introduced classification and regression trees (CARTs) in 1984. CARTs can be used for both regression and classification models [31][32][33][34][35]. The trees used in these two kinds of models are known as decision trees, and they are developed by recursively partitioning the data into binary trees. Since the aim here is to predict concrete Cs, the discussion focuses on the technique that creates regression trees using the squared-error minimization criterion. He et al. [47] introduced the GBRT method, which combines the CART algorithm with the GB algorithm. Because it can represent nonlinear interactions without requiring prior knowledge of the probability distribution of the variables, CART is reported to achieve better prediction performance than most artificial intelligence models [47]. As mentioned before, the gradient boosting algorithm combines weak learners into a robust one. Regression trees produced by the CART method serve as the weak learners in this investigation. To further reduce the prediction error and raise the model accuracy, weak learners are added to the model one at a time to correct the prediction errors of the previous models.
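The residual-correcting loop described above can be illustrated with a minimal sketch. It assumes squared-error loss, one-dimensional inputs with distinct values, and one-split regression trees (stumps) as weak learners; the function names and the toy data are hypothetical, and this is not the scikit-learn implementation used in the study.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error on 1-D inputs."""
    best = None
    candidates = sorted(set(xs))  # assumes at least two distinct input values
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def fit_gbrt(xs, ys, n_estimators=50, learning_rate=0.1):
    """Fit an additive model of stumps by repeatedly fitting the current residuals."""
    f0 = sum(ys) / len(ys)  # constant initial model: the mean minimizes squared error
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # For L = (y - F)^2 the negative gradient is 2 (y - F); the constant
        # factor is absorbed into the learning rate, so we fit the residuals.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return f0, stumps, learning_rate


def predict_gbrt(model, x):
    f0, stumps, lr = model
    return f0 + lr * sum(predict_stump(s, x) for s in stumps)


def rmse(model, xs, ys):
    return (sum((y - predict_gbrt(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)) ** 0.5


# Toy data (hypothetical, not from the paper's dataset): strength rising with age
ages = [1, 3, 7, 14, 28, 56, 90]
strength = [8.0, 15.0, 24.0, 30.0, 38.0, 42.0, 44.0]

baseline = fit_gbrt(ages, strength, n_estimators=0)  # constant model only
boosted = fit_gbrt(ages, strength, n_estimators=100, learning_rate=0.1)
print(round(rmse(baseline, ages, strength), 3), round(rmse(boosted, ages, strength), 3))
```

Each added stump lowers the training error of the ensemble, which is the mechanism the paragraph describes; controlling overfitting is what the hyperparameters discussed later are for.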
The GBRT procedure is formalized in Algorithm 2.
In step 1, the GBRT algorithm sets the initial value $F_0(x)$.
Each regression tree divides the input space into $J_m$ disjoint regions $R_{m,1}, \dots, R_{m,J_m}$ and predicts a value $c_{m,j}$ for region $R_{m,j}$. In step 4, the value of $c_{m,j}$ is obtained by minimizing the loss over the training samples that fall in that region.
In step 5, the updated model, or the $m$-th regression tree $F_m(x)$, whose corresponding leaf regions are $R_{m,j}$, $j = 1, 2, \dots, J_m$, is calculated from these leaf values, where $J_m$ is the number of leaf nodes of the $m$-th regression tree and the indicator $I = 1$ if $x \in R_{m,j}$ and $I = 0$ otherwise. Finally, the model is updated in step 6.
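As a hedged reference, the GBRT steps named above are commonly written as follows. The symbol $h_m$ is an assumed name for the tree fitted at iteration $m$, introduced here only to keep it distinct from the accumulated model; the equation numbering of the original layout is not reproduced.

```latex
\begin{align*}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{N} L(y_i, c) && \text{(step 1)} \\
c_{m,j} &= \arg\min_{c} \sum_{x_i \in R_{m,j}} L\bigl(y_i, F_{m-1}(x_i) + c\bigr) && \text{(step 4)} \\
h_m(x) &= \sum_{j=1}^{J_m} c_{m,j}\, I\bigl(x \in R_{m,j}\bigr) && \text{(step 5)} \\
F_m(x) &= F_{m-1}(x) + h_m(x) && \text{(step 6)}
\end{align*}
```

Implementations such as scikit-learn additionally scale the tree contribution in the last step by the learning rate.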

Hyperparameter Tuning with GridSearchCV
In almost every ML project, many models are trained on the dataset before the one that performs best is chosen. However, there is still potential for improvement because there is no certainty that this specific model is the best for the issue at hand, so the aim is to improve the model further. The hyperparameters of these models play a crucial role in how well they function: if the correct values are chosen, the model performance can improve considerably. Grid search cross-validation (GridSearchCV) was used to choose the best model for each ML approach. With this method, a new model is automatically fitted to the whole training dataset using the parameters that produced the best cross-validation performance. This method aids in obtaining a more accurate estimate of the generalization performance. Here, the K-fold cross-validation procedure was used with $k = 5$. In K-fold cross-validation, one portion of the data is used to test the model and the remaining portion to fit it. The prediction error is then estimated using cross-validation according to Equation (8), where $k$ is the number of subsets, $n$ is the size of the dataset, $T$ is the loss function, and $f^{-k(i)}$ is the function fitted with the subset containing sample $i$ removed. The GridSearchCV methodology used in this study for model training and hyperparameter selection is shown in Figure 2, as demonstrated in [48].
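The procedure can be sketched in a few lines of plain Python. The sketch assumes the cross-validation estimate has the usual form $\mathrm{CV} = \frac{1}{n}\sum_{i=1}^{n} T\bigl(y_i, f^{-k(i)}(x_i)\bigr)$ with squared error as the loss $T$. A tiny k-nearest-neighbour regressor stands in for GBRT so that the example stays self-contained; all names and the toy data are hypothetical.

```python
import itertools
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


def cv_error(data, k, n_neighbors):
    """Mean squared loss, each sample predicted by a model fitted without its own fold."""
    total = 0.0
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        for i in fold:
            x, y = data[i]
            total += (y - knn_predict(train, x, n_neighbors)) ** 2
    return total / len(data)


def grid_search(data, grid, k=5):
    """Evaluate every combination in the grid; return (best_params, best_cv_error)."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_error(data, k, **params), params))
    best_error, best_params = min(results, key=lambda r: r[0])
    return best_params, best_error


# Hypothetical toy data: y = 2x plus a small deterministic perturbation
data = [(x, 2.0 * x + (0.3 if x % 2 else -0.3)) for x in range(30)]
best_params, best_error = grid_search(data, {"n_neighbors": [1, 2, 4, 8, 16]}, k=5)
print(best_params, round(best_error, 4))
```

scikit-learn's GridSearchCV automates the same loop for any estimator and, by default, refits the best combination on the whole training set, which matches the behaviour described above.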

Model Interpretation with the SHAP Method
In this research, SHAP was applied to the predicted values output by the GSC-GBRT model in order to explain them. The technique decouples the effect of each input parameter on the Cs of a particular mixture sample. SHAP builds an additive explanation model in which $\phi_0$ is the constant output when all inputs are missing, $\phi_j$ stands for the contribution value of the $j$-th feature, $z' \in \{0, 1\}^K$ indicates which features are present, and $K$ represents the number of input features. In this way, a mechanism is created to determine how much each input adds to the value that the model generates.
The Shapley value is the attribution that most closely matches human intuition and satisfies the three criteria that an additive feature attribution approach should meet (local accuracy, missingness, and consistency) [49]. Shapley values indicate the extent to which each predictor (feature) contributes to a machine learning model. To investigate the impact of a particular feature $i$, two models are trained: $f_{S \cup \{i\}}$, in which the feature is included, and $f_S$, in which it is not. The difference between the outputs of these two models for a given input $x_S$ reveals how feature $i$ affected the model. This idea is the foundation for the final calculation of the Shapley value, which represents the contribution of each feature as the weighted average of all such differences, where $S$ is a subset of the features that excludes $i$, $F$ represents the set of all features, and $f$ is the prediction/estimation model.
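For reference, the two expressions described in this section have the following standard form in the additive feature attribution framework of SHAP. This is a sketch in the notation defined above; the numbering of the original equations is not reproduced.

```latex
\begin{align*}
g(z') &= \phi_0 + \sum_{j=1}^{K} \phi_j\, z'_j, \qquad z' \in \{0, 1\}^K \\
\phi_i &= \sum_{S \subseteq F \setminus \{i\}}
          \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
          \Bigl[ f_{S \cup \{i\}}\bigl(x_{S \cup \{i\}}\bigr) - f_S(x_S) \Bigr]
\end{align*}
```

The factorial weight is the fraction of feature orderings in which exactly the features in $S$ precede feature $i$, which is why the result is a weighted average of the marginal differences.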

Performance Metrics
The coefficient of determination ($R^2$), as indicated by Equation (11), was used to evaluate the accuracy of each model on the training and testing datasets.
In Equation (11), $y_i^{pre}$ and $y_i^{obs}$ represent the predicted output and the actual outcome (Cs), respectively; $n$ denotes the number of data points used in the Cs modeling; and $\bar{y}^{obs}$ is the mean value of the actual outcome.
The root mean square error (RMSE) of the algorithm prediction was computed from Equation (12). Comparing RMSE values can be used to assess the optimization process, since a more accurate model has a comparatively lower RMSE value.
Another performance metric used in this study is the variance accounted for (VAF), expressed as a percentage.
The accuracy of the model prediction increases as the RMSE, $R^2$, and VAF approach 0, 1, and 100, respectively.
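A minimal sketch of the three metrics follows. It assumes the commonly used definitions $R^2 = 1 - \sum_i (y_i^{obs} - y_i^{pre})^2 / \sum_i (y_i^{obs} - \bar{y}^{obs})^2$, $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (y_i^{obs} - y_i^{pre})^2}$, and $\mathrm{VAF} = \bigl(1 - \mathrm{var}(y^{obs} - y^{pre}) / \mathrm{var}(y^{obs})\bigr) \times 100$. The function names and sample values are hypothetical.

```python
import math


def r2_score(y_obs, y_pre):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pre))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_pre):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pre)) / len(y_obs))


def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def vaf(y_obs, y_pre):
    """Variance accounted for, in percent."""
    residuals = [o - p for o, p in zip(y_obs, y_pre)]
    return (1.0 - variance(residuals) / variance(y_obs)) * 100.0


# Hypothetical observed and predicted strengths in MPa
y_obs = [30.0, 40.0, 50.0, 60.0]
y_pre = [32.0, 38.0, 51.0, 59.0]
print(round(r2_score(y_obs, y_pre), 4), round(rmse(y_obs, y_pre), 4), round(vaf(y_obs, y_pre), 2))
```

For these sample values the residual sum of squares is 10 and the total sum of squares is 500, so $R^2$ is 0.98, RMSE is about 1.58 MPa, and VAF is 98.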

Dataset Used
Various studies have investigated the Cs of environmentally friendly concrete. As a result, an extensive dataset with 164 experiments on the Cs of eco-friendly concrete incorporating both RA and GGBFS was recently assembled in reference [50]. The ML model for predicting the concrete Cs was trained and tested using this dataset; details can be found in [50]. According to [39], the data gathered from the experimental studies consider the effect of both GGBFS and RCA on the Cs of concrete. To unify the results, only cube specimens with a side length of either 100 mm or 150 mm were considered when collecting the data. The authors used Rashid and Mansur's equation to convert the Cs of a 100 mm cube specimen to the equivalent Cs of a 150 mm cube specimen. In addition, five of the data records had no RA and/or GGBFS in their mix proportions, indicating standard concrete blends.
After being randomly shuffled for the purpose of constructing models, the gathered data records were divided into two parts: training and testing data comprising 80% and 20% of the records, respectively. Some key parameters of the dataset used in this work are shown in Table 2. The primary input factors that have a significant impact on the concrete Cs were the following: water/binder ratio (W/B), curing time (Age), the recycled aggregate percentage of the total aggregate in the mixture (RA%), the GGBFS percentage of the total binder used in the mixture (GGBFS%), and superplasticizer content (kg). Table 2 presents the minimum, average, median, standard deviation, and maximum values of the input variables for the training and testing sets. Figure 3 displays the relationships and statistical distributions of the Cs and the concrete components. Notably, none of the components are correlated with one another, meaning that all the variables considered to forecast the concrete Cs are independent. Additionally, it can be seen that there is only a weak correlation between the W/B ratio and the Cs.
The heatmap in Figure 4 summarizes the pairwise correlations across the entire dataset (all of the ratios, weights, percentages, age, and Cs of concrete). Such a heatmap can be plotted in Python using the Seaborn library. A correlation index near 1 indicates that two features are strongly positively correlated, whereas a value near −1 indicates an equally strong relationship in which the two features move in opposite directions. In contrast, two uncorrelated features have a correlation index near 0. As can be seen in the heatmap, W/B, RA%, and GGBFS% are negatively correlated with the output, while Age and Sp are positively correlated.
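The correlation index behind such a heatmap can be sketched as follows, assuming it is the Pearson coefficient. The records below are hypothetical and serve only to show the sign convention; they are not taken from the dataset in [50].

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical records for illustration only
columns = {
    "W/B": [0.30, 0.35, 0.40, 0.45, 0.50, 0.55],
    "Age": [90, 56, 28, 28, 7, 3],
    "Cs": [62.0, 55.0, 44.0, 40.0, 27.0, 18.0],
}
names = list(columns)
matrix = {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}

# Print the correlation matrix row by row
for r in names:
    print(r.ljust(4), "  ".join(f"{matrix[r][c]:+.2f}" for c in names))
```

In this toy sample W/B is negatively correlated with Cs and Age positively, the same signs the heatmap reports. A plotting library such as Seaborn would render the matrix as a colored grid.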
The parameters in Table 2 are utilized to produce environmentally friendly concrete. For completeness, a brief description of each variable is provided below to justify why they were considered concrete components in this study.
The W/B ratio has a significant influence on concrete strength. Researchers noted that reducing the W/B ratio enhanced the Cs, splitting tensile strength, net flexural strength, and elastic modulus of self-compacting concretes fabricated using RA [51]. Moreover, by using fly ash at replacement rates of 15 and 25% by weight of binder, the 28-day Cs of RCA concretes with a W/B ratio of 0.40 could be raised above the 28-day Cs of NA concrete with a W/B ratio of 0.50 and brought closer to that of NA concrete with a W/B ratio of 0.45 [52]. In addition, the test findings demonstrated that the fly ash contributed more to the Cs at lower W/B ratios than in mixes made at higher W/B ratios.
The use of demolition waste as RA to create RAC has been investigated by several researchers [53,54]. However, RAC characteristics are inferior to NCA when porous mortar binds the RA [15]. Furthermore, comparing RCA to NCA, a 30-40% reduction in Cs was noticed [16]. Regardless, Aliabdo et al. [55] found that while replacing coarse aggregate with RA results in a negligible drop in concrete strength, substituting fine recycled aggregate results in a considerable reduction. Therefore, RA% was added as a necessary concrete element in the dataset used in this study.
With the incorporation of RA, Cs tends to drop; however, adding superplasticizers (Sp) can improve the mix compactness, recovering most of the strength loss [56]. Cs increased when the Sp dose was raised, and the increase was even more significant for Sp with higher water reduction capacities. With the proportion of added Sp, the mix density followed a similar pattern to Cs and slightly decreased towards a higher dosage [56].
Some researchers discovered that, for self-compacting concrete (SCC) incorporating GGBFS and RCA, the Cs of all mixes dropped with an increase in RCA [57]. GGBFS as a cement replacement reduced the Cs at an early age compared to reference concrete, but at later ages (56 and 90 days), the Cs was equal or higher. The maximum Cs was achieved for SCC mixtures containing 15% GGBFS [57].
The development of durable concrete is determined by its age. When both RCA and GGBFS content are increased, the short-term and long-term Cs of concrete fall [10]. However, the strength growth rate with age increases when both RCA and GGBFS rise. This increase in strength growth rate results from the latent hydraulic activity of GGBFS and the increased hydration of the unhydrated cement particles in RCA [10].

Hyperparameter Optimization: GridSearchCV
By applying the exhaustive parameter search method GridSearchCV, it was possible to find the parameters that matched the expected model characteristics. Finding the parameters with the best model estimation accuracy is easier using the Python GridSearchCV function [58]. The ideal GBRT hyperparameter combination was obtained by defining the values and ranges of the GBRT hyperparameters and passing them to the GridSearchCV function. The most crucial hyperparameters for the GBRT model are the learning rate (learning_rate) and the number of estimators (n_estimators), which stand for the weight given to each estimator and the number of weak learners in the model, respectively. Additionally, the subsample parameter, which denotes the fraction of the data used for fitting the individual base learners, and the max_depth parameter, which defines the complexity of each tree, can both significantly affect the ability of the GBRT model to predict outcomes. With the parameters set to learning_rate = 0.05, max_depth = 4, n_estimators = 1000, and subsample = 0.5, the GBRT tuned with the GridSearchCV approach yields high estimation accuracy. The tuning parameters considered by GridSearchCV are listed in Table 3.
Table 3. Hyperparameters for the GBRT model.
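To give a sense of what an exhaustive search over these four hyperparameters involves, the sketch below enumerates a grid and counts the model fits it requires. The candidate values are hypothetical, since the contents of Table 3 are not reproduced here; only the reported optimum comes from the text.

```python
import itertools

# Hypothetical search grid; only the reported optimum below is taken from the text
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 4, 5],
    "n_estimators": [100, 500, 1000],
    "subsample": [0.5, 0.75, 1.0],
}
reported_best = {"learning_rate": 0.05, "max_depth": 4, "n_estimators": 1000, "subsample": 0.5}

# Cartesian product of all candidate values, one dict per combination
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in itertools.product(*param_grid.values())]

n_folds = 5
n_fits = len(combinations) * n_folds + 1  # +1 for the final refit on the whole training set

print(len(combinations), n_fits, reported_best in combinations)
```

With three candidates per hyperparameter the grid holds 81 combinations, so 5-fold cross-validation needs 405 fits plus one final refit. A dictionary of this shape is what scikit-learn's GridSearchCV accepts as its `param_grid` argument, and the four keys are parameters of `GradientBoostingRegressor`.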

Comparison of the Prediction Results of the Two Models
The observed and predicted Cs values of the testing and training datasets are shown in Figure 5 to better illustrate the performance of the GSC-GBRT. Figure 5 demonstrates that the Cs values predicted by the GSC-GBRT model were more in line with the observed values than those of the default GBRT predictive model. In addition, according to the results obtained with the optimized hyperparameters (see Table 4), the prediction accuracy of GSC-GBRT on the test set was higher than that of the default GBRT model. For example, the $R^2$ and RMSE values for the GSC-GBRT model were 0.9612 and 2.3214, respectively, whereas they were 0.9216 and 3.4390 for the GBRT model. This demonstrates that, for the eco-friendly concrete dataset used to predict the Cs, GSC-GBRT could better capture the complicated relationship between the component factors influencing the Cs and had superior generalization capacity.
Figure 6 displays the distribution of the relative percentage error between the predictions of the GBRT and GSC-GBRT models and the measured Cs of environmentally friendly concrete. Based on this graph, the relative errors of both models during testing are acceptably distributed around the zero-error line. However, as shown in Figure 6, the GSC-GBRT model appears more capable of making predictions than the GBRT model.
The residuals produced by the GSC-GBRT and default GBRT predictions of the Cs of environmentally friendly concrete mixtures at various ages are shown in Figure 7. Each figure shows red dots for the predicted values and black squares for the measured values. The predicted and measured values of the training set were relatively close for both methods. However, some of the GBRT predictions had errors exceeding 5 MPa. For the testing set, the analysis showed that while GSC-GBRT maintained relatively consistent predictions, several GBRT-predicted points differed significantly from the actual Cs values.
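The size of the improvement can be made explicit with a short calculation on the test-set values quoted above. The helper name is hypothetical; the four numbers are those reported for the default GBRT and the GSC-GBRT models.

```python
def relative_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Test-set metrics reported for the default and the tuned model
gbrt = {"R2": 0.9216, "RMSE": 3.4390}
gsc_gbrt = {"R2": 0.9612, "RMSE": 2.3214}

r2_gain = relative_change(gbrt["R2"], gsc_gbrt["R2"])
rmse_drop = -relative_change(gbrt["RMSE"], gsc_gbrt["RMSE"])
print(round(r2_gain, 2), round(rmse_drop, 2))
```

Tuning therefore raises the test-set $R^2$ by about 4.3% and lowers the RMSE by about 32.5% relative to the default GBRT.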
The multiple linear regression and M5P models [59,60] were compared with the GSC-GBRT model to further assess its accuracy. Multiple linear regression and M5P were used to obtain the predictions, with an 8:2 dataset division ratio. Figure 8 depicts the correlation between the estimated and actual Cs, and Table 5 displays the test-set evaluation results for multiple linear regression and M5P. The GSC-GBRT ensemble learning model outperforms both of the other models. Ensemble learning combines multiple weak learners; to encourage more accurate predictions, weak learners that perform well are given larger weights, while those that perform poorly are given lower weights. On the test set, the GSC-GBRT model performed best: compared with linear regression, R2 increased by 31% and RMSE decreased by 62%; compared with the M5P model, R2 increased by 28% and RMSE decreased by 61%. Overall, the ensemble learning model appears to perform significantly better for eco-friendly concrete Cs prediction than the traditional machine learning methods.
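The percentage comparisons above are relative changes in RMSE and R2 between two models. The following is a minimal sketch of how such figures can be computed. The function names are illustrative, and the short `actual` and `predicted` lists are invented for demonstration. The relative-change example uses the test-set values reported in this paper for the default GBRT and GSC-GBRT models; the Table 5 values for linear regression and M5P are not reproduced here.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def percent_change(baseline, candidate):
    """Relative change of `candidate` with respect to `baseline`, in percent."""
    return 100.0 * (candidate - baseline) / baseline


# Hypothetical strengths in MPa, invented for illustration only.
actual = [20.0, 30.0, 40.0, 50.0]
predicted = [21.0, 29.0, 41.0, 49.0]
print(f"RMSE = {rmse(actual, predicted):.3f}, R2 = {r2_score(actual, predicted):.3f}")

# Test-set metrics reported in this paper: default GBRT vs. GSC-GBRT.
rmse_change = percent_change(3.4390, 2.3214)
r2_change = percent_change(0.9216, 0.9612)
print(f"RMSE change: {rmse_change:.1f}%  R2 change: {r2_change:.1f}%")
```

A negative RMSE change together with a positive R2 change indicates that the candidate model improves on the baseline for both metrics.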

Interpretation of the GBRT Model
In the absence of supporting theories, mathematical computations, or operational processes, the outputs or predictions of ML models can be challenging to explain [61]. However, the contributions of the input variables can be analyzed using feature importance, sensitivity analysis, or partial dependency analysis in order to evaluate those outputs and understand the trained models. Here, the SHAP approach was used to handle the multicollinearity problem and the potential synergistic effects of the variables. In this section, the "SHapley Additive exPlanations" (SHAP) approach [62], combined with the GSC-GBRT model, explains and clarifies the contribution of each input variable to the concrete Cs prediction. According to Figure 9a, which displays the mean absolute SHAP value of each feature in the Cs modeling, the age of the concrete has the highest mean SHAP value among the five input features. The descending order of input variable impact on the GSC-GBRT model predictions is: AGE > W/B > GGBFS% > SP > RA%.
The SHAP values of the input features are shown in Figure 9b, with red denoting a high feature value and blue denoting a low feature value. A positive SHAP value means that the feature increases the predicted Cs, and the larger the absolute SHAP value, the greater the feature's influence. For example, the high age values (red points) in Figure 9b exhibited noticeably high SHAP values, which shows that a higher curing age had a beneficial effect on the predicted Cs. It is evident from Figure 9b that W/B, RA%, and GGBFS% have a negative association with Cs. This is in line with earlier research [8,63].
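To make the quantities behind Figure 9 concrete, the sketch below computes exact Shapley values by brute force for a small toy function and then averages their absolute values per feature, as in a SHAP bar plot. It is only an illustration under stated assumptions: `toy_strength` is a hypothetical stand-in with invented coefficients, not the trained GSC-GBRT model; the background mixes are invented; and the enumeration over feature subsets is not the tree-specific algorithm that SHAP implementations typically use for boosted trees.

```python
import math
from itertools import combinations


def toy_strength(age, wb, sp):
    """Hypothetical stand-in for a trained model (coefficients are invented)."""
    return 20.0 + 8.0 * math.log(age) - 30.0 * wb + 0.5 * sp


def shapley_values(model, x, background):
    """Exact Shapley values of one sample `x` against a background dataset."""
    n = len(x)

    def value(subset):
        # Mean model output when the features in `subset` are fixed to x and
        # the remaining features are taken from each background row in turn.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(*mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                base = set(subset)
                contribution += weight * (value(base | {i}) - value(base))
        phi.append(contribution)
    return phi


# Hypothetical mixes: (age in days, W/B, superplasticizer in kg).
background = [(7, 0.35, 2.0), (28, 0.45, 4.0), (90, 0.55, 6.0), (28, 0.50, 0.0)]
names = ["age", "W/B", "SP"]

all_phi = [shapley_values(toy_strength, x, background) for x in background]
# Global importance: mean absolute Shapley value per feature.
mean_abs = [sum(abs(p[i]) for p in all_phi) / len(all_phi) for i in range(len(names))]
for name, importance in sorted(zip(names, mean_abs), key=lambda t: -t[1]):
    print(f"{name}: {importance:.2f}")
```

For any sample, the Shapley values sum to the difference between the model output for that sample and the mean output over the background data, which is why a positive value can be read as pushing the predicted strength above the average and a negative value as pulling it below.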
A web application was developed with the Streamlit library, using the suggested GSC-GBRT model, to forecast the Cs of eco-friendly concrete. With the ML model trained on the compiled dataset, the user can obtain a theoretical forecast of the Cs. The user first enters W/B, RA%, GGBFS%, superplasticizer (kg), and age (days) using the application's sliders and radio buttons. In the next step, the concrete Cs in MPa is computed. The parameter ranges provided in the application match the feature ranges in the dataset used for ML training. The Streamlit web application can be accessed at the link given in [64].

Conclusions
This research proposed a hybrid GSC-GBRT model for predicting the Cs of sustainable concrete. The GridSearchCV approach was first used to find the optimum hyperparameters, and the optimized model was then used to forecast the Cs. On the test set, the hybrid GSC-GBRT model achieved higher prediction accuracy and lower error (R2 = 0.9612, RMSE = 2.3214) than the original GBRT model (R2 = 0.9216, RMSE = 3.4390). Because the GSC-GBRT model surpasses the initial GBRT model in the assessment metrics, it is suggested as a tool for pre-estimating the Cs of concrete from the mix ratio prior to design and mixing.
According to the SHAP-based analysis, age and W/B are the two input factors that most significantly impact the concrete Cs among the five considered. Age and superplasticizer positively impact the output, so the Cs rises as they increase. On the other hand, a rise in W/B, GGBFS%, or RA% causes the Cs to fall. Designers and engineers can therefore use the significance and contribution of these factors as a guide. Finally, a web application for predicting the Cs of eco-friendly concrete was developed in Streamlit. A light version of the application has been deployed to the cloud and can be accessed from any web browser, including mobile ones.
Although the GSC-GBRT model is competent and acceptable for estimating the Cs of eco-friendly concrete, it has several limitations. Firstly, the eco-friendly concrete dataset was constructed from 164 experiments reported in the literature. The accuracy of the prediction models is significantly influenced by the completeness, quantity, quality, and distribution of the input data. As new experimental data become available, the dataset may benefit from being updated. Secondly, as with any ML method, the SHAP explanations and GSC-GBRT findings may only apply to the tested input data ranges.