Bayesian-Optimized Ensemble Models for Geopolymer Concrete Compressive Strength Prediction with Interpretability Analysis

Cihan, Mehmet Timur; Cihan, Pınar

doi:10.3390/buildings15203667

by Mehmet Timur Cihan ¹ and Pınar Cihan ²,*

¹ Department of Civil Engineering, Çorlu Engineering Faculty, Tekirdağ Namık Kemal University, 59860 Tekirdağ, Turkey

² Department of Computer Engineering, Çorlu Engineering Faculty, Tekirdağ Namık Kemal University, 59860 Tekirdağ, Turkey

* Author to whom correspondence should be addressed.

Buildings 2025, 15(20), 3667; https://doi.org/10.3390/buildings15203667

Submission received: 16 September 2025 / Revised: 9 October 2025 / Accepted: 10 October 2025 / Published: 11 October 2025

(This article belongs to the Section Building Materials, and Repair & Renovation)

Abstract

Accurate prediction of geopolymer concrete compressive strength is vital for sustainable construction. Traditional experiments are time-consuming and costly; therefore, computer-aided systems enable rapid and accurate estimation. This study evaluates three ensemble learning algorithms (Extreme Gradient Boosting (XGB), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM)), as well as two baseline models (Support Vector Regression (SVR) and Artificial Neural Network (ANN)), for this task. To improve performance, hyperparameter tuning was conducted using Bayesian Optimization (BO). Model accuracy was measured using R², RMSE, MAE, and MAPE. The results demonstrate that the XGB model outperforms others under both default and optimized settings. In particular, the XGB-BO model achieved high accuracy, with RMSE of 0.3100 ± 0.0616 and R² of 0.9997 ± 0.0001. Furthermore, Shapley Additive Explanations (SHAP) analysis was used to interpret the decision-making of the XGB model. SHAP results revealed the most influential features for compressive strength of geopolymer concrete were, in order, coarse aggregate, curing time, and NaOH molar concentration. The graphical user interface (GUI) developed for compressive strength prediction demonstrates the practical potential of this research. It contributes to integrating the approach into construction practices. This study highlights the effectiveness of explainable machine learning in understanding complex material behaviors and emphasizes the importance of model optimization for making sustainable and accurate engineering predictions.

Keywords:

geopolymer concrete; ensemble machine learning; explainable artificial intelligence; interpretability analysis; graphical user interface

1. Introduction

The reuse of waste materials in the construction industry is crucial for sustainable production. In the ready-mixed concrete industry, where high-carbon-footprint materials such as Portland cement are predominantly used, the partial replacement of cement and natural aggregates with industrial by-products (e.g., fly ash, slag, or silica fume) enhances resource efficiency, reduces greenhouse gas emissions, and supports the transition toward more sustainable construction practices. Moreover, geopolymers [1] and alkali-activated binder systems, which exhibit a lower carbon footprint compared to conventional Portland cement, aim to reduce cement consumption and are instrumental in shaping the sustainability vision of the concrete industry.

Geopolymers are inorganic aluminosilicate-based binders formed through the alkali activation of amorphous or glassy aluminosilicate materials [2]. In geopolymer production, natural raw materials such as metakaolin, natural pozzolans, ground granulated blast furnace slag (GGBFS), phosphorus slag, fly ash, and waste glass, as well as industrial by-products and recycled aluminosilicate components, are utilized as precursor materials. Among these, fly ash and GGBFS are currently considered the most reliable and widely used precursor materials for geopolymer concrete production [2]. These materials undergo solidification and gain high mechanical strength when reacted with suitably formulated alkali activators. Furthermore, properly designed alkali-activated systems have been reported to exhibit superior performance characteristics compared to conventional Portland cement-based concretes [2].

Numerous variables influence the physical and mechanical properties of geopolymer concretes. Factors such as raw material characteristics, alkaline activation conditions, water-to-solid ratio, and curing regimes directly affect the performance of these materials. The large number of influencing parameters complicates the determination of the mechanical properties of geopolymer concretes and therefore requires extensive laboratory testing. However, such comprehensive experiments are both costly and time-consuming [3,4]. Therefore, high-performance computer-aided prediction of the mechanical properties of geopolymer concretes is of great importance, offering significant advantages in terms of cost and labor reduction. Furthermore, the absence of standardized design procedures for geopolymer concretes restricts their widespread application in practical engineering contexts [5].

In recent years, the use of machine learning-based approaches to predict the properties of construction materials has increased significantly. Among these, ensemble-learning techniques have shown significant potential in accurately estimating the physical and mechanical properties of complex materials, like concrete. Thus, implementing ensemble-learning methods for predicting the strength of geopolymer concretes offers a reliable and robust alternative.

Ensemble machine learning methods enable the construction of more robust and stable predictive models by combining multiple weak learners. These techniques are generally more resistant to overfitting and have the potential to improve generalization performance. Consequently, ensemble learning has been widely adopted in data science and predictive modeling tasks, particularly for estimating the properties of complex systems with high accuracy. Among these approaches, bagging (Bootstrap Aggregating) and boosting stand out as the most popular strategies. While bagging-based algorithms, such as Random Forest (RF), primarily focus on reducing model variance, boosting-based algorithms, including eXtreme Gradient Boosting (XGB) and Light Gradient Boosting Machine (LightGBM), systematically aim to minimize model errors. Notably, in predicting the mechanical properties of heterogeneous materials, such as concrete and geopolymer concrete, these learning strategies offer significant time and cost advantages over traditional experimental methods. In addition to the ensemble methods, two widely used baseline models were also included in this study for comparative evaluation: Support Vector Regression (SVR) and Artificial Neural Network (ANN). SVR, a kernel-based learning algorithm, is effective at capturing nonlinear relationships by mapping input features into high-dimensional spaces. ANN, on the other hand, is a data-driven approach inspired by biological neural networks, capable of approximating complex, nonlinear mappings between input variables and target properties. Including these models provides a valuable benchmark to highlight the advantages of ensemble learning techniques for compressive strength prediction.

Despite the growing interest in applying machine learning to predict the properties of geopolymer concretes, most existing studies have either relied on a single learning algorithm, lacked systematic hyperparameter optimization, or provided limited insights into the underlying decision-making of the models. In particular, the integration of ensemble methods with rigorous Bayesian Optimization and explainable AI techniques such as SHAP for understanding feature contributions has rarely been explored in this context. Moreover, few studies have translated their models into practical tools, such as user-friendly interfaces, that can support engineers and practitioners in real-time mixture-design decision-making. This study addresses these gaps by delivering a comprehensive framework that combines ensemble and baseline models, optimized via BO, interpretable through SHAP, and implemented in a GUI for practical use—offering a novel and application-oriented contribution to sustainable geopolymer concrete design.

In this study, three ensemble machine learning techniques (XGB, RF, and LightGBM) and two baseline models (SVR and ANN) were employed to predict the compressive strength of geopolymer concretes. Hyperparameter Optimization (HO) was conducted using the Bayesian Optimization (BO) method, and model performance with default and optimized parameters was compared. Additionally, to enhance the interpretability of the developed models, SHapley Additive exPlanations (SHAP) analysis was applied, facilitating the identification of the most influential variables affecting compressive strength. Furthermore, a user-friendly graphical user interface (GUI) was developed to enable rapid and accessible prediction of compressive strength based on mixture design parameters. Overall, this study aimed to deliver reliable, rapid, and explainable predictive models for geopolymer concretes, along with practical tools to support their application, thereby contributing to the field of sustainable construction materials.

The contributions of this study to the field of sustainable construction materials, with a specific focus on geopolymer concretes, are summarized below:

To systematically assess the effectiveness of ensemble machine learning models in predicting the compressive strength of geopolymer concretes.
To investigate the impact of hyperparameter tuning through BO, providing insights into how model calibration affects prediction accuracy.
To enhance model interpretability by applying SHAP, an explainable artificial intelligence (XAI) technique, for identifying the most influential variables affecting concrete strength.
To develop reliable, accurate, fast, and interpretable predictive models supported by a user-friendly GUI, facilitating the practical implementation of AI-driven solutions in designing sustainable and eco-friendly construction materials.

2. Related Studies

In recent years, the use of machine learning (ML) techniques to predict the physical and mechanical properties of concrete has gained significant attention, particularly due to their ability to model complex, nonlinear relationships among material parameters [6,7,8,9,10,11,12]. Given the growing demand for sustainable construction materials, geopolymer concrete has emerged as a promising alternative to conventional Portland cement concrete. Accordingly, researchers have increasingly applied ML-based models to predict key performance indicators of geopolymer concretes, particularly compressive strength, with encouraging levels of accuracy and generalizability. These studies employ various ML algorithms and optimization strategies to improve model robustness and interpretability. Table 1 provides an overview of the latest studies that focus specifically on predicting the compressive strength of geopolymer concretes, highlighting the methodologies used, input parameters considered, and key findings.

It is observed that approximately 100 different inputs were used in the studies in Table 1. Among these, NH, NS, M, FiAg, CT, FA, and CoAg are the most frequently used inputs. It has been reported that the key parameters influencing the compressive strength of geopolymer concrete are curing time, curing temperature, and water content, as these factors govern the geopolymerization reaction process [5].

Approximately 40 different machine learning techniques were employed across the investigated studies (Table 1). The most commonly used techniques include RF, XGB, DT, GB, and SVM. Generally, the best-performing model(s) were identified by comparing the prediction accuracy of various machine learning models.

It is noted that in nearly half of the studies listed in Table 1, hyperparameter optimization was not conducted. However, the predictive performance of models is highly dependent on the correct tuning of hyperparameters, and failing to set them appropriately can significantly limit a model’s potential. Furthermore, in the vast majority of studies, explainable artificial intelligence (XAI) methods were not utilized. Machine learning models typically function as black boxes, making it difficult to understand how they arrive at their predictions. XAI methods help clarify the model’s decision-making process by illustrating the influence of each input variable on the prediction. This transparency enables users to understand the model’s rationale better and interpret its outputs with greater confidence.

The number of samples used in the modeling ranges from 24 to 795, with the majority falling within the 100–350 range. However, a relatively small sample size limits the generalizability of a model, increases the risk of overfitting, and makes it difficult to obtain statistically significant results. This may adversely affect the model’s capacity to produce consistent and reliable predictions on different datasets. Moreover, models may perform differently even in studies using the same dataset. This is due to differences in data preprocessing methods, model hyperparameter settings, train/test data-splitting strategies, and algorithmic approaches used in the training process [17,22].

3. Predictive Modeling and Explainable AI

In this study, Random Forest, XGBoost, LightGBM, SVR, and ANN models were used to predict the compressive strength of geopolymer concrete. The three ensemble models were chosen because they can effectively capture the nonlinear nature of material properties and are robust to overfitting [40,41]. SVR and ANN were included as widely adopted baseline models, enabling a meaningful comparison with the ensemble approaches [42]. Each model was trained separately with both default hyperparameters and hyperparameters determined by Bayesian optimization, and their performances were analyzed comparatively. During this optimization process, a 10-fold cross-validation scheme was applied to the training set to ensure that the selected hyperparameters provided robust generalization while maintaining computational efficiency [43]. R², RMSE, MAE, and MAPE statistical metrics were used to evaluate the prediction performance of the models. Finally, to enhance the interpretability of the models, SHAP analysis was conducted within the framework of XAI, and the key variables influencing the strength of geopolymer concrete were identified.

3.1. Predictive Models

3.1.1. Extreme Gradient Boosting (XGB)

XGB is a machine learning model based on the ensemble learning approach, which leverages the concept of boosting by combining the predictions of multiple weak learners through additive training strategies to construct a strong and accurate learner [40]. It is an optimized implementation of the gradient boosting algorithm, designed to achieve high accuracy and computational efficiency. This algorithm builds sequential decision trees to minimize the error rate, with each new tree trained to correct the errors made by the previous ones [44]. Unlike traditional gradient boosting methods, XGB accelerates the optimization process by employing a second-order Taylor expansion and enables more precise model updates. To prevent overfitting, it incorporates L1 (Lasso) and L2 (Ridge) regularization techniques. Additionally, the contribution of each new tree is scaled by a shrinkage factor (learning rate) during training, which helps ensure more stable and balanced learning [45].
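
For reference, the second-order approximation and the regularization terms mentioned above are commonly written as follows. The notation is the standard one from the gradient boosting literature and is an assumption here, not taken from this article: $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to the previous prediction, $f_t$ is the tree added at iteration $t$, $T$ is its number of leaves, $w_j$ are its leaf weights, $\gamma$, $\lambda$, and $\alpha$ are regularization coefficients, and $\eta$ is the shrinkage factor (learning rate).

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^2}

\Omega(f_t) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|

\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(x_i)
```

The $\lambda$ term corresponds to the L2 (Ridge) penalty and the $\alpha$ term to the L1 (Lasso) penalty on the leaf weights, while $\eta$ scales the contribution of each new tree.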

3.1.2. Random Forest (RF)

RF is an ensemble learning model that makes high-accuracy predictions by combining a large number of independent decision trees [46]. RF uses the bagging (bootstrap aggregation) method to train each tree on different bootstrap samples and aggregates the predictions of all trees. In this model, each decision tree learns independently on a distinct subset of the training dataset, and the trees are mutually independent. It combines the most common results in classification problems using majority voting and the predictions in regression problems with averaging. Random Forest increases model diversity by selecting features for each tree randomly at each level, which enhances generalization and mitigates overfitting [41].
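
The bagging step described above can be illustrated with a minimal, pure-Python sketch. It is not the implementation used in the study: the base learner is a one-split regression stump on a single feature, so the random feature selection of a full Random Forest is not shown, and the function names and the toy data (`curing_days`, `strength`) are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on one feature by minimizing squared error."""
    best = None
    # The largest x is excluded as a threshold so the right side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # only one distinct x in the sample: constant predictor
        return (xs[0], mean(ys), mean(ys))
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    ensemble = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([xs[i] for i in rows], [ys[i] for i in rows]))
    return ensemble


def predict_bagged(ensemble, x):
    """Regression aggregation: average the predictions of all stumps."""
    return mean(predict_stump(stump, x) for stump in ensemble)


# Hypothetical toy data, not values from the study's dataset.
curing_days = [1, 2, 3, 4, 5, 6, 7, 8]
strength = [10, 10, 10, 10, 30, 30, 30, 30]
ensemble = fit_bagged_stumps(curing_days, strength)
low = predict_bagged(ensemble, 1)
high = predict_bagged(ensemble, 8)
```

Because each stump sees a different bootstrap sample, individual stumps may disagree; averaging them is what reduces the variance of the final prediction.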

3.1.3. Light Gradient Boosting Machine (LightGBM)

Light Gradient Boosting Machine (LightGBM) is a gradient boosting model developed by Microsoft that performs rapidly and efficiently on large datasets. Unlike traditional gradient boosting algorithms, LightGBM uses a leaf-wise tree growth strategy, in which the leaf offering the largest loss reduction is split at each step. This strategy enables the construction of deeper and more powerful trees; however, various regularization techniques must be applied to reduce the risk of overfitting. With Gradient-based One-Side Sampling (GOSS), samples with large gradients are retained while samples with small gradients are randomly down-sampled, thereby reducing computational costs while maintaining model accuracy [47].
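
The GOSS idea can be sketched in a few lines of pure Python. This follows the sampling rule as it is commonly described for GOSS and is not LightGBM's internal code; the function name `goss_sample`, the default rates, and the toy gradients are assumptions made for illustration.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {row_index: weight} for the rows kept by a GOSS-style sampler."""
    n = len(gradients)
    # Rank rows by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top_rows = order[:n_top]          # large-gradient rows are always kept
    rest = order[n_top:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))  # random subset of the rest
    # Sampled small-gradient rows are up-weighted to compensate for the rows dropped.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top_rows}
    weights.update({i: amplify for i in sampled})
    return weights


# Hypothetical gradients: row i has gradient i, so rows 16..19 are the largest.
weights = goss_sample(list(range(20)))
```

With 20 rows and these rates, the four largest-gradient rows are kept with weight 1 and two of the remaining rows are kept with an amplified weight, so only 6 of 20 rows enter the split search.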

3.1.4. Support Vector Regression (SVR)

SVR is a kernel-based algorithm derived from Support Vector Machines that fits a regression function within a specified margin of tolerance (ε), focusing on data points outside this margin (Support Vectors) [48]. Kernel functions, such as the radial basis function (RBF), allow SVR to capture nonlinear relationships between input variables and the target property. The performance of SVR mainly depends on three hyperparameters: C, which controls the trade-off between model complexity and training error; ε, which defines the margin of tolerance around the regression function; and γ (gamma), which in nonlinear kernels like RBF, determines the influence range of each training point [49]. Proper tuning of these parameters is crucial for achieving a good balance between accuracy and generalization.
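
The roles of ε and γ can be made concrete with a small pure-Python sketch of the ε-insensitive loss and the RBF kernel. These are the standard textbook definitions; the function names and the numeric inputs are illustrative assumptions, not values from the study.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors cost the excess."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def rbf_kernel(x, z, gamma):
    """Similarity that decays with squared distance; larger gamma decays faster."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Hypothetical strengths in MPa with a tolerance of 1 MPa.
losses = epsilon_insensitive_loss([30.0, 30.0, 30.0], [30.5, 32.0, 27.0], epsilon=1.0)

# Two hypothetical scaled feature vectors: one close to the reference, one far from it.
near = rbf_kernel([0.2, 0.4], [0.2, 0.5], gamma=1.0)
far = rbf_kernel([0.2, 0.4], [0.9, 0.9], gamma=1.0)
```

The first prediction lies inside the tolerance band and incurs no loss, which is why only points on or outside the band end up shaping the fitted function.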

3.1.5. Artificial Neural Network (ANN)

ANNs are data-driven models inspired by the structure and functioning of biological neurons. An ANN typically consists of input, hidden, and output layers, where each layer is composed of interconnected nodes (neurons). These neurons apply nonlinear activation functions (e.g., sigmoid, ReLU, tanh) to weighted inputs, allowing the network to capture complex, nonlinear relationships between input features and the target variable [50]. The performance of ANN largely depends on several key hyperparameters: the number of neurons in the first and second hidden layers (hidden1, hidden2), which controls the model’s capacity to capture nonlinear patterns; the regularization parameter α (L2 penalty), which helps prevent overfitting by constraining large weight values; and the initial learning rate (learning_rate_init), which determines the step size during weight updates and affects both convergence speed and stability. Proper selection of these parameters is critical to achieving accurate and generalizable predictions.

3.2. Bayesian Hyperparameter Optimization

Hyperparameter optimization is a crucial step for improving model accuracy, preventing overfitting, and balancing learning speed with model complexity. Each algorithm has key hyperparameters—such as the number of trees, tree depth, and learning rate—that directly influence the learning process. If not properly tuned, models may suffer from reduced accuracy, overfitting, or inefficient training [51]. In boosting-based models such as XGB and LightGBM, appropriate tuning of the learning rate and regularization parameters enhances both convergence speed and generalization, while in bagging-based models such as Random Forest, optimizing parameters like the number of trees and feature selection improves robustness.

Bayesian Optimization (BO) is a powerful technique for this task, especially for complex and computationally expensive models. By leveraging the results of previous trials, BO efficiently searches for optimal parameter values and has been shown to outperform alternative global optimization methods in many benchmark problems [52,53].

In this study, BO was implemented using the open-source bayes_opt Python package. The optimizer employed the library’s default Gaussian Process surrogate model and the Upper Confidence Bound (UCB) acquisition function (κ = 2.576, ξ = 0.0). The search was initialized with 15 random evaluations followed by 25 BO iterations, which served as the stopping criterion. At each BO step, the candidate hyperparameters were evaluated using 10-fold cross-validated negative RMSE, which served as the optimization objective.

Typically, BO utilizes a Gaussian Process to construct a probabilistic model of the objective function. This surrogate model is then used to predict which hyperparameters are most likely to yield improved performance. This process significantly reduces model training and validation time while enabling a more efficient search for optimal hyperparameter values.
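
As a rough illustration of how the UCB acquisition function selects the next trial, the sketch below scores a few candidates from their surrogate mean and standard deviation. It does not call bayes_opt; the candidate list, the posterior values, and the function names are hypothetical, and only κ = 2.576 and the evaluation counts come from the text above.

```python
def ucb(posterior_mean, posterior_std, kappa=2.576):
    """Upper Confidence Bound: reward high predicted value and high uncertainty."""
    return posterior_mean + kappa * posterior_std


def pick_next(candidates, kappa=2.576):
    """candidates: list of (params, posterior_mean, posterior_std) tuples."""
    return max(candidates, key=lambda c: ucb(c[1], c[2], kappa))[0]


# Hypothetical surrogate predictions of the objective (negative RMSE, to be maximized).
candidates = [
    ({"learning_rate": 0.05}, -0.40, 0.01),  # best mean, well explored
    ({"learning_rate": 0.20}, -0.45, 0.05),  # slightly worse mean, more uncertain
    ({"learning_rate": 0.50}, -0.80, 0.10),  # poor mean, most uncertain
]
next_params = pick_next(candidates)

# Evaluation budget implied by the settings reported in the text.
n_trials = 15 + 25            # random initial points + BO iterations
n_model_fits = n_trials * 10  # each trial is scored with 10-fold cross-validation
```

Here the second candidate is chosen even though the first has the better predicted mean, because its larger uncertainty raises its upper bound; this is the exploration behavior that κ controls.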

3.3. Model Performance Evaluation

In this study, the performance of the models was evaluated using the coefficient of determination (R²), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) metrics [6,7,54,55,56]. These metrics were calculated based on the results of 10-fold cross-validation applied to each model. In this method, the data are shuffled (shuffle=True) and divided into 10 approximately equal parts; in each iteration, one part is used as the test set, while the remaining nine parts serve as the training set. This process is repeated 10 times to ensure that each fold functions as the test set exactly once. The application of this approach reduces the likelihood of overfitting and provides a robust evaluation of the model’s ability to generalize to new data [57]. The definitions and formulas of the statistical metrics used in the study are presented in Table 2.
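
The four metrics can be computed as in the following pure-Python sketch. It uses the standard definitions, which are assumed to match those in Table 2; the function name and the example values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, and MAPE (in percent) for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical compressive strengths in MPa, not values from the study.
metrics = regression_metrics([10.0, 20.0, 40.0, 50.0], [11.0, 19.0, 42.0, 48.0])
```

RMSE and MAE are expressed in the unit of the target (MPa here), whereas R² and MAPE are unitless, which is why the four are usually reported together.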

3.4. Explainable Artificial Intelligence (XAI)

Machine learning (ML) models are often considered black boxes because users have limited insight into how and why these models make specific predictions, which can lead to significant issues regarding trust and transparency, especially in decision-making processes [59]. Therefore, XAI has been developed to make the predictions of ML models more understandable and transparent. XAI interprets the model’s decisions by explaining to users how and why predictions are made.

There are two primary explanation approaches in XAI: data-driven interpretation and model-driven interpretation. Data-driven interpretation methods commonly utilize techniques such as Partial Dependence Plots or Feature Importance. Model-driven interpretation is typically performed using methods like SHAP (Shapley Additive Explanations).

SHAP explains a prediction by attributing it to the individual input features. It uses Shapley values to express the model’s prediction as the sum of a baseline value and the contribution of each feature. SHAP provides reliable and accurate explanations by satisfying three fundamental properties: local accuracy, missingness, and consistency. This enhances the interpretability and transparency of machine learning models. When evaluating feature contributions, SHAP considers the effect of each feature in combination with the other features: the Shapley value of a feature is its marginal contribution to the prediction, averaged over the possible combinations of the remaining features.
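
The additive decomposition behind Shapley values can be checked on a toy model with a brute-force computation. This is a simplified sketch, not the algorithm used by the SHAP library: absent features are replaced by a single reference point (`baseline`) rather than averaged over background data, all feature orderings are enumerated (feasible only for a handful of features), and the model and names are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)       # start with every feature "absent"
        previous = model(current)
        for i in order:
            current[i] = x[i]          # switch feature i to its actual value
            value = model(current)
            phi[i] += value - previous  # marginal contribution in this ordering
            previous = value
    return [total / len(orderings) for total in phi]


def toy_model(z):
    """One additive term plus an interaction between features 1 and 2."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction and the baseline output (local accuracy), and the interaction term is shared equally between the two features that produce it.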

4. Methods

Figure 1 illustrates the methodological framework for predicting the compressive strength of geopolymer concrete. The research begins with the data analysis phase, which focuses on identifying patterns within the dataset and determining any potential preprocessing requirements. Following the data analysis, a comprehensive data preprocessing stage is performed, which involves correlation analysis, applying feature scaling, and preparing suitable data for model training.

The data-preprocessing phase is a crucial step in constructing a clean, complete, and appropriately scaled dataset for training machine learning models. During this phase, errors, missing values, and scale discrepancies in the data are addressed through necessary adjustments to enhance the model’s accuracy and performance.

In this study, three ensemble learning algorithms (XGB, RF, and LightGBM) and two baseline models (SVR and ANN) were implemented for the prediction of compressive strength (CS), using both default parameters and hyperparameter optimization through the Bayesian Optimization (BO) technique to achieve the best possible model performance. The parameters, their respective ranges, and the resulting optimal values obtained through this optimization process are presented in Table 3.

Using both the default and optimized hyperparameters, the compressive strength of the geopolymers was predicted. To rigorously assess the generalization capabilities of the models, a cross-validation approach was employed, as it is widely regarded for providing reliable performance evaluation and reducing the risk of overfitting. In this method, the available dataset is divided into k equally sized subsets, where the model is iteratively trained on k − 1 folds and tested on the remaining fold. This process is repeated k times, ensuring that each subset serves as the test set exactly once, and the overall performance is obtained by averaging the results across all folds. In this study, the number of folds (k) was set to 10, which represents a commonly used balance between computational efficiency, mitigation of overfitting [60], and reliable estimation of model generalization, especially for moderately sized datasets.
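
The fold construction can be sketched in pure Python as follows. The sample count of 672 and k = 10 come from the text; the function names, the seed, and the rule for distributing the remainder across folds are assumptions for illustration.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle row indices once and cut them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more row
        folds.append(indices[start:start + size])
        start += size
    return folds


def train_test_splits(folds):
    """Yield (train_rows, test_rows) so that each fold is the test set exactly once."""
    for i, test_rows in enumerate(folds):
        train_rows = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train_rows, test_rows


folds = kfold_indices(672, k=10)
splits = list(train_test_splits(folds))
```

Since 672 is not divisible by 10, the folds cannot be exactly equal: two folds hold 68 rows and eight hold 67, and every row appears in exactly one test set.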

The performance of each model was assessed using key evaluation metrics, including R², RMSE, MAE, and MAPE. Based on these metrics, the model demonstrating the highest prediction accuracy was identified as the optimal model.

In the final stage of the study, SHAP, a widely recognized XAI technique, was employed to enhance the interpretability of the model’s decision-making process and to analyze the relative importance of input features.

The dataset used in this study consists of 672 samples, 12 input variables, and 1 output variable [61]. A statistical summary of the dataset is presented in Table 4. As shown in Table 4, the input variables NaOH amount (kg/m³), Na₂SiO₃ amount (kg/m³), Extra water (kg/m³), and Fine Aggregate (kg/m³) have constant values across all observations. Since such variables do not contribute to the modeling process due to the absence of variance, they have been removed from the dataset. Constant-valued variables do not provide any informative signal to the model’s learning process, as they do not influence the decision-making mechanism. Therefore, removing such variables from the dataset enables the model to be trained more efficiently and accurately.
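
Removing zero-variance inputs can be sketched as below. The table is a hypothetical miniature with made-up values; only the idea of dropping columns that hold a single distinct value comes from the text.

```python
def drop_constant_columns(table):
    """Keep only the columns that take more than one distinct value."""
    return {name: values for name, values in table.items() if len(set(values)) > 1}


# Hypothetical rows; the numbers are illustrative, not taken from Table 4.
table = {
    "NaOH amount": [50.0, 50.0, 50.0, 50.0],   # constant, will be removed
    "Extra water": [20.0, 20.0, 20.0, 20.0],   # constant, will be removed
    "M": [8.0, 10.0, 12.0, 16.0],
    "CuTi": [24.0, 24.0, 48.0, 72.0],
}
filtered = drop_constant_columns(table)
```

A constant column cannot separate any two samples, so no split or weight based on it can change a prediction.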

5. Results and Discussion

The distribution patterns and densities of the input variables in the dataset are presented in Figure 2a using a violin plot. Upon analyzing the plot, it is observed that some variables have relatively narrow value ranges (e.g., NH, CT, CuTi, TA), while others exhibit wider and higher value ranges (e.g., CoAg, R-Ag). This discrepancy may cause the models to be sensitive to variables on different scales, potentially impairing learning performance. Therefore, data scaling is necessary to ensure more stable and balanced learning by the models. For this purpose, Min-Max normalization was applied in the study, transforming the range of every variable to the [0, 1] interval. The distribution of the normalized variables is illustrated in Figure 2b using violin plots. This process not only enhances model performance but also ensures that variables with different scales are considered with equal weight during training.
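
Min-Max normalization maps each value v of a variable to (v − min) / (max − min). A minimal pure-Python sketch, with a hypothetical function name and made-up values, is:

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical molarity values, not taken from the dataset.
scaled = min_max_scale([8.0, 10.0, 12.0, 16.0])
```

The transformation preserves the ordering and relative spacing of the values, so only the scale of each variable changes.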

To examine the relationships among the variables in the dataset used in this study, a correlation analysis was conducted. The resulting correlation matrix indicates the strength and direction of the relationships between variables. These results are presented in Figure 3 as a correlation heatmap, with a color scale ranging from −1 to 1. In the heatmap, red tones represent negative correlations, while blue tones indicate positive correlations. Correlation analysis facilitates a better understanding of inter-variable relationships, which is valuable during the model development process.

Figure 3 presents the Pearson correlation matrix illustrating inter-variable relationships. The highest positive correlation with compressive strength (CS) is observed for coarse aggregate (CoAg, 0.68) and curing time (CuTi, 0.55), emphasizing their significant roles in the strength development of geopolymer concrete. Weak positive correlations are also noted for NaOH molarity (M, 0.17) and ground granulated blast furnace slag (GGBFS, 0.19). In contrast, recycled aggregate (R-Ag) exhibits a strong negative correlation with CS (−0.68), implying that higher recycled aggregate content reduces mechanical performance. Fly ash (FA) shows a weak negative correlation (−0.19), indicating a limited detrimental effect. The correlations for curing temperature (CT, 0.25) and testing age (TA, 0.07) with CS are also weak, likely influenced by the narrow and uneven distributions of these parameters, as further visualized in Figure 4. Such distributions can mask underlying effects when evaluated using variance-sensitive metrics such as Pearson correlation. Additionally, two strong inverse relationships emerge: a perfect negative correlation between GGBFS and FA (−1.00), reflecting their reciprocal use as binder components, and an almost perfect negative correlation between CoAg and R-Ag (−0.98), indicating a replacement strategy where recycled aggregate substitutes coarse aggregate in the mix design.
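
The perfect negative correlation between GGBFS and FA follows directly from the definition of the Pearson coefficient when two components always sum to a fixed total. The sketch below illustrates this with hypothetical contents and an assumed total binder of 400; the function name is also illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    return covariance / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical mixes in which the two binders always add up to the same total.
ggbfs = [0.0, 100.0, 200.0, 300.0, 400.0]
fa = [400.0 - g for g in ggbfs]
r = pearson(ggbfs, fa)
```

Because one variable is an exact decreasing linear function of the other, the coefficient is −1 regardless of the actual amounts, so the pair carries the same information for the models.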

In this study, XGBoost (XGB), RF, LightGBM, SVR, and ANN algorithms were employed to predict the compressive strength with high accuracy. Within this scope, each model was trained using both default parameters and optimized hyperparameter configurations, and the resulting prediction performances were compared (Table 5). During the modeling process, the prediction performance of each model was evaluated using 10-fold cross-validation, providing a more reliable assessment of its generalization capabilities.

The results presented in Table 5 indicate that the models achieve strong overall performance, with hyperparameter optimization leading to further improvements in predictive accuracy.

The XGB model demonstrated remarkably high accuracy even with default parameters (RMSE: 0.4956, R²: 0.9990, MAPE: 0.64%). However, following Bayesian optimization, this already strong performance was further enhanced, with the RMSE reduced to 0.3100, MAPE to 0.50%, and R² reaching an almost perfect value of 0.9997. These results show that even a well-performing model can be significantly improved through proper hyperparameter tuning.

The RF model, on the other hand, showed comparatively lower performance with default parameters (RMSE: 0.9523, MAPE: 1.39%). After hyperparameter optimization, only minor improvements were observed, with the MAPE remaining unchanged. This suggests that optimization had a limited effect on the RF model’s performance.

The LightGBM model achieved a good level of accuracy with default settings (RMSE: 0.7959, MAPE: 1.15%). Following Bayesian optimization, considerable improvements were also recorded for this model, with the RMSE decreasing to 0.3567 and MAPE to 0.57%, bringing its performance close to that of the XGB model.

The SVR model performed competitively even in its default setting (RMSE: 0.4367 MPa, MAPE: 0.59%), outperforming the default LightGBM and RF models. Bayesian optimization further improved its performance, reducing the RMSE to 0.3705 MPa and the MAPE to 0.51%, achieving accuracy comparable to the optimized XGB and LightGBM models.

In contrast, the ANN model exhibited the weakest baseline performance (RMSE: 2.1965 MPa, MAPE: 3.66%), reflecting its sensitivity to hyperparameters and the relatively small size of the dataset. Nevertheless, Bayesian optimization substantially enhanced its predictive capability, lowering the RMSE to 1.0493 MPa and nearly halving the MAPE to 1.87%, even though its performance remained below that of the tree-based ensemble models and the SVR model.

Overall, all models demonstrated remarkably high predictive performance even with their default configurations, with R² values approaching 1.0. The high accuracy achieved by machine learning models is also evident in their predictive capability across various fields [62,63,64], and similar outcomes have been reported in another study using the same dataset [65]. However, unlike previous studies, the present work employed Bayesian optimization, resulting in both an increase in R² values and a reduction in RMSE, MAE, and MAPE metrics. Moreover, hyperparameter optimization yielded particularly notable improvements for the LightGBM and ANN models, whose RMSE and MAPE values were roughly halved relative to their default configurations, whereas the gain for SVR was more modest. These findings indicate that although the XGB model provided the highest absolute performance, optimization can deliver substantial gains for both moderately and highly performing models, thereby enhancing the robustness of the predictive framework.
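The size of the gain from tuning can be checked directly from the RMSE values quoted above. The short calculation below uses only those reported figures; RF is omitted because its optimized RMSE is not quoted in the text, and the helper name `relative_reduction` is introduced here for illustration.

```python
# RMSE values (MPa) quoted in the text: (default, Bayesian-optimized).
reported_rmse = {
    "XGB": (0.4956, 0.3100),
    "LightGBM": (0.7959, 0.3567),
    "SVR": (0.4367, 0.3705),
    "ANN": (2.1965, 1.0493),
}

def relative_reduction(before, after):
    """Fractional decrease of an error metric after tuning."""
    return (before - after) / before

reductions = {name: relative_reduction(*pair) for name, pair in reported_rmse.items()}

# List the models from largest to smallest relative improvement.
for name, frac in sorted(reductions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:9s} RMSE reduced by {frac:.1%}")
```

Expressing the improvement as a fraction of the default error makes models with very different baseline errors comparable.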

In predicting compressive strength, the XGB-BO model achieved the most accurate results among all models evaluated. The relationship between the predicted values obtained through 10-fold cross-validation and the actual compressive strength values (MPa) is presented in Figure 4. In the plot, actual values are shown on the horizontal axis, while the predicted values by the model are displayed on the vertical axis. The red dashed diagonal line represents the ideal case (y = y_pred), where the predicted values perfectly match the actual ones. Data points that lie closer to this line indicate higher prediction accuracy, while deviations from the line reflect prediction errors. The close clustering of points around this ideal line in Figure 4 visually confirms the high performance and precision of the XGB-BO model.

Figure 4 shows that the predicted values are mostly located near the ideal prediction line. This observation indicates that the XGB-BO model demonstrates high predictive accuracy and generally maintains a low error rate. The dense and balanced distribution of data points around the ideal line suggests that the model does not exhibit systematic bias and has no apparent tendency toward deviation. Notably, there is no observed dispersion or skewness in either the lower or the higher ranges of compressive strength values. This further implies that the model has effectively learned both low and high-strength characteristics, showcasing a strong generalization capability. In conclusion, the XGB-BO model effectively predicts the compressive strength of geopolymer concrete, and the cross-validation results support the model’s reliability. This accuracy implies that the model can provide robust predictions across various datasets and is suitable for practical applications.
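The visual impression from a predicted-versus-actual plot can be complemented by numbers. The sketch below computes the four accuracy metrics used in this study, together with the mean residual as a simple indicator of systematic bias. The strength values are hypothetical and are not data from the study.

```python
import math

def regression_metrics(actual, predicted):
    """RMSE, MAE, MAPE (%), R² and mean residual for paired values."""
    n = len(actual)
    residuals = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MAPE_%": 100.0 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n,
        "R2": 1.0 - ss_res / ss_tot,
        # A mean residual far from zero would indicate over- or under-prediction.
        "mean_residual": sum(residuals) / n,
    }

# Hypothetical actual and predicted strengths in MPa (illustrative only).
actual = [20.0, 30.0, 40.0, 50.0, 60.0]
predicted = [20.5, 29.5, 40.5, 49.5, 60.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Computing the mean residual separately for low-strength and high-strength subsets would be one way to quantify the absence of skewness at either end of the range.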

To enhance the interpretability of the XGB-BO model, a SHAP analysis was performed (Figure 5). By quantifying the contribution of each feature to the predictions of the XGB-BO model, SHAP enhances the transparency of the model’s decision-making process.
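The idea behind a Shapley contribution can be illustrated with a brute-force calculation on a small model. The following is a sketch under stated assumptions: `toy_model` is a hypothetical three-input function and not the fitted XGB-BO model, absent features are replaced by a single baseline value, and all coalitions are enumerated exactly. Practical SHAP implementations use much faster algorithms for tree ensembles.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Toy strength model (assumption): inputs are
# [coarse aggregate fraction, curing days, NaOH molarity].
def toy_model(f):
    coag, days, molarity = f
    return 10 + 30 * coag + 0.5 * days + 0.2 * coag * days - 0.3 * (molarity - 12) ** 2

x = [0.8, 28.0, 14.0]
baseline = [0.5, 7.0, 12.0]
phi = shapley_values(toy_model, x, baseline)

# Efficiency property: the contributions sum to the prediction difference.
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The interaction term `coag * days` is shared between the two features involved, which is how a Shapley-based attribution accounts for interactions that a pairwise correlation cannot see.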

Figure 5a displays the SHAP summary plot for the XGBoost model, showing how each input variable affects the model’s predictions across all observations. The plot captures both the direction and strength of each feature’s effect. The SHAP value on the x-axis indicates how much and in which direction a specific variable influences the model’s output. Meanwhile, the color scale represents the actual value of the variable, from low (blue) to high (red). Notably, the variables CoAg and CuTi demonstrate high SHAP values, meaning that higher values of these features positively influence the predicted outcome. Conversely, M shows a nonlinear interaction pattern, producing both positive and negative effects depending on its range. Variables such as FA and R-Ag tend to have a negative impact at higher levels, while at lower levels, they contribute more positively. This visualization is essential because it not only emphasizes the relative importance of each variable but also shows the direction and distribution of their individual effects on the model output.

Figure 5b shows a feature importance plot based on the mean absolute SHAP values calculated across all observations. This plot ranks the overall influence of each feature on the model’s predictions, clearly indicating the order of importance. In this context, CoAg and CuTi are the most influential variables, showing the highest average SHAP values by a large margin. M and FA follow these. Conversely, features like CT and TA contribute less to the model’s predictive ability. This plot is a helpful tool for guiding feature selection, simplifying models, and improving interpretability.
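A mean absolute SHAP ranking of this kind can be computed as sketched below. The per-sample values are hypothetical and serve only to show why the absolute value is taken before averaging: positive and negative contributions would otherwise cancel.

```python
def mean_abs_importance(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value over all observations."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

features = ["CoAg", "CuTi", "M"]
# Hypothetical per-sample SHAP values in MPa (illustrative only).
shap_rows = [
    [9.0, 6.0, -2.0],
    [-10.0, 8.0, 3.0],
    [8.0, -9.0, 2.5],
]

ranking = mean_abs_importance(shap_rows, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy matrix the plain mean of the CoAg column is only about 2.3, whereas its mean absolute value is 9.0, so a signed average would badly understate the feature’s influence.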

When these two plots are considered together, it becomes evident that the SHAP-based explainability analysis provides both a quantitative and a qualitative understanding of the variables influencing the model’s decision-making process. A key point is that SHAP analysis is not limited to examining direct linear relationships (correlations) between input parameters and the target variable; it also captures the actual contribution of each feature to the model output, including complex interactions and nonlinear effects. In this study, for instance, R-Ag exhibited one of the strongest linear correlations with compressive strength (CS) in absolute terms (−0.68); however, it appeared as a low-impact feature in the SHAP importance ranking. This indicates that, despite its strong linear correlation, R-Ag plays a limited role in the actual decision-making mechanism of the predictive model. SHAP analysis therefore provides insights beyond conventional correlation analysis by revealing not only which features the model is most sensitive to, but also how and within which value ranges these features affect predictions. This offers a degree of transparency and interpretability that black-box models otherwise lack.

When the SHAP-based results are interpreted in the context of geopolymerization mechanisms, CoAg (coarse aggregate) and CuTi (curing time) emerge as the most influential variables, as they strongly affect the microstructural development and load-bearing capacity of geopolymer concretes. An adequate amount of coarse aggregate enhances the packing density of the mixture, contributing to the formation of a stable skeleton that supports the development of strength. A longer curing time, in turn, allows the geopolymerization reactions to proceed more completely, leading to the formation of a denser gel phase and, consequently, higher compressive strength.

The M (alkali activator concentration) variable shows a moderate yet dual effect: at optimal levels, it facilitates the dissolution of aluminosilicate precursors and accelerates geopolymer gel formation; however, at excessively high concentrations, it can induce rapid setting and microcracking, thereby exerting a detrimental influence. The nonlinear behavior observed in the SHAP summary plot reflects this dual role. In contrast, R-Ag (recycled aggregate), despite exhibiting a strong linear correlation with compressive strength, appears as a lower-impact feature in the SHAP importance ranking. This is primarily due to the weaker interfacial bonding and higher porosity typically associated with recycled aggregates, which reduce their effective contribution to strength. Similarly, the amounts of FA (fly ash) and GGBFS (ground granulated blast-furnace slag) exhibit different effects depending on their proportions in the mix. The proportions of these precursors govern the Si/Al ratio and the availability of calcium in the mixture, thereby influencing the balance between the geopolymeric gel and the C–A–S–H phases. These findings are consistent with the established understanding that the Si/Al and Ca/Al ratios govern geopolymerization kinetics and phase assemblage. Similar effects of precursor composition on strength development and microstructural properties have been reported in a study on slag/fly ash-based geopolymers [66].

The differences observed between the correlation analysis and the SHAP-based feature importance rankings for certain variables arise from the distinct methodological foundations of the two approaches. For example, although the correlation matrix shows a clear negative relationship between recycled aggregate (R-Ag) and compressive strength (CS) (−0.68), the SHAP analysis assigns a lower relative importance to R-Ag. This discrepancy results from the fact that correlation coefficients capture only pairwise linear relationships, whereas SHAP values explain the model’s multivariate decision-making process and account for interactions among variables. Since SHAP calculates the marginal contribution of each variable to the prediction for every observation while accounting for all other variables, it reflects nonlinear interactions arising from factors such as binder type, aggregate proportions, and curing conditions. Therefore, while the detrimental effect of recycled aggregate on compressive strength remains, its magnitude depends on its interactions with other mixture components, and the SHAP analysis provides a more accurate representation of the model’s multivariate structure.
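One mechanism that can produce such a gap between correlation and model-based importance is collinearity between inputs. The sketch below is a simplified illustration, not an analysis of the study’s model: the data are synthetic, recycled aggregate is assumed to replace coarse aggregate almost one-for-one, and the stand-in predictor is assumed to have learned strength from CoAg alone.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

rng = random.Random(0)
coag = [rng.uniform(0.0, 1.0) for _ in range(200)]
# Assumption: R-Ag is nearly the complement of CoAg (strong negative correlation).
r_ag = [1.0 - c + rng.gauss(0, 0.02) for c in coag]
strength = [20 + 40 * c + rng.gauss(0, 2.0) for c in coag]

def model(c, r):
    """Stand-in predictor that uses CoAg only and ignores R-Ag."""
    return 20 + 40 * c

corr_r_ag = pearson(r_ag, strength)
# Change in prediction when R-Ag is altered while CoAg is held fixed.
max_effect_r_ag = max(abs(model(c, r) - model(c, 0.0)) for c, r in zip(coag, r_ag))

print(f"corr(R-Ag, CS) = {corr_r_ag:.2f}")
print(f"largest prediction change from R-Ag alone = {max_effect_r_ag:.2f} MPa")
```

Here R-Ag correlates strongly with strength, yet the model’s output never changes when R-Ag is varied on its own, so any attribution method that explains the model rather than the data assigns it no importance.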

In addition to evaluating the model’s performance and interpretability, a graphical user interface (GUI) was developed to facilitate the practical implementation of the proposed XGB-BO model for compressive strength prediction (Figure 6). This interface, named the ‘Geopolymer Compressive Strength Predictor’, enables users to estimate the compressive strength of geopolymer concrete by entering key mixture design parameters, including fly ash amount, GGBFS amount, NaOH molarity, coarse and recycled aggregate contents, curing temperature, curing time, and age of testing. Once the inputs are provided, the model instantly predicts the corresponding compressive strength value in megapascals (MPa). This tool offers a rapid and accessible solution for engineers and researchers to estimate mechanical performance without the need for complex calculations or additional experiments. Furthermore, the designed GUI serves as a bridge between the complexity of the developed ML model and its practical application, enabling broader access to the study’s insights and fostering real-world adoption.

As illustrated in Figure 6, the developed interface provides a user-friendly environment where all required inputs can be entered effortlessly. In this example, the user-defined mixture proportions and curing conditions result in a predicted compressive strength of 27.32 MPa. Such an application demonstrates the potential of the proposed tool to support decision-making processes in material design, optimize mixture compositions, and reduce the reliance on time-consuming laboratory trials. Consequently, this GUI not only simplifies access to advanced machine learning predictions but also contributes to enhancing the efficiency of sustainable construction practices.
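The prediction step behind such an interface can be sketched as a thin wrapper that collects the eight inputs, checks them, and passes them to the trained model. Everything named below is hypothetical: the `MixInputs` class, the feature order, the input values, and `placeholder_model`, which merely stands in for the trained XGB-BO model.

```python
from dataclasses import asdict, astuple, dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class MixInputs:
    """The eight GUI inputs; units are assumed to match the training data."""
    fly_ash: float
    ggbfs: float
    naoh_molarity: float
    coarse_aggregate: float
    recycled_aggregate: float
    curing_temperature: float
    curing_time: float
    testing_age: float

def predict_strength(inputs: MixInputs, model: Callable[[Sequence[float]], float]) -> float:
    """Validate the inputs and return the model's prediction in MPa."""
    negative = [name for name, value in asdict(inputs).items() if value < 0]
    if negative:
        raise ValueError(f"negative values are not allowed: {negative}")
    # Feature order is an assumption; it must match the order used in training.
    return float(model(astuple(inputs)))

def placeholder_model(features: Sequence[float]) -> float:
    """Hypothetical linear rule used only to make the example runnable."""
    return 10.0 + 0.01 * features[3] + 0.5 * features[6]

example = MixInputs(300.0, 100.0, 12.0, 1000.0, 0.0, 60.0, 24.0, 28.0)
estimate = predict_strength(example, placeholder_model)
print(f"Predicted compressive strength: {estimate:.2f} MPa")
```

Keeping validation separate from the model call means the same wrapper could serve a desktop GUI, a web form, or a batch script.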

6. Conclusions

This study presents a high-accuracy AI-based model for predicting the compressive strength of geopolymer concretes by employing ensemble learning techniques, including XGB, RF, LightGBM, as well as baseline models SVR and ANN. The XGB model demonstrated superior performance, particularly when optimized through Bayesian Optimization, achieving excellent predictive accuracy. SHAP analysis further revealed that coarse aggregate content, curing time, and NaOH molarity are the most influential parameters affecting strength. In addition, a user-friendly GUI was developed to provide fast and accessible strength estimation, supporting the practical application of sustainable concrete technologies. The primary findings of this study are summarized as follows:

The XGB-BO model outperformed other ensemble machine learning models in terms of predictive accuracy, as demonstrated by its higher R² and lower RMSE, MAE, and MAPE values. This model effectively captured the complex, nonlinear relationships among mixture design parameters influencing the compressive strength of geopolymer concretes, providing a robust, reliable, and explainable prediction framework.
The integration of the XGB model with Bayesian Optimization (BO) has significantly enhanced the model’s predictive performance. The BO approach enabled a more efficient search for optimal hyperparameters, reducing the risk of overfitting and maintaining high prediction accuracy. This optimization strategy provided a notable reduction in error rates and played a key role in improving the model’s robustness and generalization capability.
Model interpretability analysis using SHAP contribution values revealed that the most influential parameters affecting the compressive strength of geopolymer concretes are coarse aggregate, curing time, and NaOH molar concentration, with corresponding mean absolute SHAP values of 9.109, 7.743, and 2.584, respectively. The SHAP feature importance plot provides an explicit quantification of how each variable contributes to the model’s predictions, highlighting their relative impact. In this way, ensemble learning models, which are often considered black-box approaches, were made explainable, and the model’s decision-making process was rendered more transparent.
A user-friendly GUI was developed to enable fast and accessible prediction of the compressive strength of geopolymer concretes based on mixture design parameters. This tool facilitates the practical implementation of the proposed AI-based framework and supports its adoption in real-world engineering applications.
In this study, R-Ag showed one of the strongest linear correlations with compressive strength (CS) in absolute value; however, CoAg was the most influential feature in the SHAP importance ranking. This indicates that, despite its strong linear correlation, R-Ag has a limited impact on the prediction model’s actual decision-making. Therefore, using not only correlation coefficients but also model-based explainability methods when selecting parameters for experimental designs offers a more comprehensive and reliable basis for parameter choice and process optimization in complex concrete systems.

The success of machine learning models largely depends on the structural characteristics of the dataset used. Even if the same dataset is used in different studies, differences in data preprocessing methods, the way training and testing sets are determined, and modeling strategies can directly affect the accuracy and reliability of the results obtained. In particular, imbalances in the distribution of variables or insufficient data density in certain intervals limit the model’s prediction performance and reduce the statistical reliability of the regression coefficients produced. In this context, to develop models that predict the compressive strength of geopolymer concrete with higher accuracy and generalizability, more comprehensive and systematic experimental studies are needed to obtain balanced, homogeneous, and extensive datasets. Such studies will contribute to the more effective use of AI-based prediction models in the field of sustainable building materials, while also increasing model reliability.

Author Contributions

Conceptualization, P.C. and M.T.C.; methodology, P.C.; software, P.C.; investigation, P.C. and M.T.C.; writing—original draft preparation, P.C. and M.T.C.; writing—review and editing, P.C. and M.T.C.; visualization, P.C.; supervision, P.C. and M.T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data used in this manuscript are referenced in the Methods section of the paper. The dataset is available at https://data.mendeley.com/datasets/ksvb32cvw6/1 (accessed on 9 May 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ABR	Adaptive boosting regressor
AD	Alkali dosage
ADA	AdaBoost
AF	Alccofine
Ag	Aggregate
ANFIS	Adaptive neuro-fuzzy inference system
ANN	Artificial neural networks
AS	Alkaline solution
ASC	Alkaline solution concentration
ASM	Aluminosilicate material
B	Binder
BC	Basicity coefficient
BDT	Boosted decision tree
BNN	Bayesian neural network
BO	Bayesian Optimization
BPNN	Back-propagation neural network
BR	Bagging regressor
CatBoost	Categorical boosting regressor
CCA	Corncob ash
CD	Curing duration
CFV	Compaction factor value
CG	Coal gangue
CN2	CN2 Rule induction
CNN	1D convolutional neural network
CoAg	Coarse aggregate
CoS	Copper slag
CS	Compressive strength
CSG	Concrete strength grade
CSO	Cat swarm optimization
CT	Curing temperature
CuMe	Curing method
CuTi	Curing time
Dmax	Maximum size of coarse aggregate
DNN	Deep neural network
DT	Decision tree
ECSO	Enhanced cat swarm optimization
EL	Ensemble learning
ELM	Extreme learning machine
EN	Elastic net
ESA	Eggshell ash
ET	Elevated temperature
ETR	Extra trees regressor
EW	Extra water
FA	Fly ash
FAR	Fibre aspect ratio
FD	Fibre diameter
FE	Elastic modulus of fibre
FiAg	Fine aggregate
FL	Fibre length
Fs	Fineness modulus of fine aggregate
FTS	Fibre tensile strength
FV	Fibre volume
GA	Genetic algorithm
GB	Gradient boosting
GEP	Gene expression programming
GGBFS	Ground granulated blast furnace slag
GMDH	Group method of data handling
GP	Glass powder
GPR	Gaussian process regression
GUI	Graphical user interface
H	Humidity
HCD	High-temperature curing duration
HD	Heating duration
HM	Hydration modulus
HO	Hyperparameter optimization
HR	Heating rate
HT	Heating temperature
HTT	Heat treatment time
KNN	K-nearest neighbour
L	Liquid
LightGBM	Light gradient boosting machine
LI	Loss on ignition
LoR	Logistic regression
LR	Linear regression
LSTM	Long short-term memory
M	NaOH molar concentration
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MARS	Multivariate adaptive regression splines
MEP	Multi-expression programming
MK	Metakaolin
ML	Machine learning
MLP	Multilayer perceptron regressor
MLR	Multiple linear regression
MP	Mixing procedure
MR	Mole ratio
Ms	Silica modulus (SiO2/Na2O)
NB	Naive Bayes
NH	NaOH
NS	Na2SiO3
PM	Pozzolanic material
PSO	Particle swarm optimization
PT	Pretreatment temperature
R	Recycled
R²	Coefficient of determination
RA	Rubber aggregate
RAWA	Recycled aggregate water absorption
ResNet	Deep residual network
RF	Random forest
RHA	Rice husk ash
RM	Red Mud
RMSE	Root Mean Square Error
Rub	Rubber
S	Solid
SC	Slag cement
SF	Silica fume
SGD	Stochastic gradient descent
SHAP	SHapley Additive exPlanations
SHO	Spotted hyena optimization
SP	Superplasticizer
SS	Silica sand
SSA	Specific surface area
StF	Steel fiber
SVM	Support vector machine
TA	Test age
UPV	Ultrasonic pulse velocity
W	Water
WR	Water reducer
XAI	Explainable artificial intelligence
XGB	Extreme gradient boosting

References

Abdellatief, M.; Abd Elrahman, M.; Abadel, A.A.; Wasim, M.; Tahwia, A. Ultra-high performance concrete versus ultra-high performance geopolymer concrete: Mechanical performance, microstructure, and ecological assessment. J. Build. Eng. 2023, 79, 107835.
Pacheco-Torgal, F. Introduction to handbook of alkali-activated cements, mortars and concretes. In Handbook of Alkali-Activated Cements, Mortars and Concretes; Elsevier: Amsterdam, The Netherlands, 2015; pp. 1–16.
Azimi-Pour, M.; Eskandari-Naddaf, H.; Pakzad, A. Linear and non-linear SVM prediction for fresh properties and compressive strength of high volume fly ash self-compacting concrete. Constr. Build. Mater. 2020, 230, 117021.
Nguyen, H.; Vu, T.; Vo, T.P.; Thai, H.-T. Efficient machine learning models for prediction of concrete strengths. Constr. Build. Mater. 2021, 266, 120950.
Rathnayaka, M.; Karunasinghe, D.; Gunasekara, C.; Wijesundara, K.; Lokuge, W.; Law, D.W. Machine learning approaches to predict compressive strength of fly ash-based geopolymer concrete: A comprehensive review. Constr. Build. Mater. 2024, 419, 135519.
Cihan, M.T. Comparison of artificial intelligence methods for predicting compressive strength of concrete. Građevinar 2021, 73, 617–632.
Cihan, M.T. Prediction of concrete compressive strength and slump by machine learning methods. Adv. Civ. Eng. 2019, 2019, 3069046.
Shahrokhishahraki, M.; Malekpour, M.; Mirvalad, S.; Faraone, G. Machine learning predictions for optimal cement content in sustainable concrete constructions. J. Build. Eng. 2024, 82, 108160.
Chen, B.; Wang, L.; Feng, Z.; Liu, Y.; Wu, X.; Qin, Y.; Xia, L. Optimization of high-performance concrete mix ratio design using machine learning. Eng. Appl. Artif. Intell. 2023, 122, 106047.
Kumar, P.; Pratap, B. Feature engineering for predicting compressive strength of high-strength concrete with machine learning models. Asian J. Civ. Eng. 2024, 25, 723–736.
Zhao, N.; Zhang, H.; Xie, P.; Chen, X.; Wang, X. Prediction of compressive strength of multiple types of fiber-reinforced concrete based on optimized machine learning models. Eng. Appl. Artif. Intell. 2025, 152, 110714.
Dong, Y.; Tang, J.; Xu, X.; Li, W.; Feng, X.; Lu, C.; Hu, Z.; Liu, J. A new method to evaluate features importance in machine-learning based prediction of concrete compressive strength. J. Build. Eng. 2025, 102, 111874.
Ghosh, A.; Ransinchung, G. Application of machine learning algorithm to assess the efficacy of varying industrial wastes and curing methods on strength development of geopolymer concrete. Constr. Build. Mater. 2022, 341, 127828.
Wang, Y.; Iqtidar, A.; Amin, M.N.; Nazar, S.; Hassan, A.M.; Ali, M. Predictive modelling of compressive strength of fly ash and ground granulated blast furnace slag based geopolymer concrete using machine learning techniques. Case Stud. Constr. Mater. 2024, 20, e03130.
Revathi, B.; Gobinath, R.; Bala, G.S.; Nagaraju, T.V.; Bonthu, S. Harnessing explainable Artificial Intelligence (XAI) for enhanced geopolymer concrete mix optimization. Results Eng. 2024, 24, 103036.
Nguyen, K.T.; Nguyen, Q.D.; Le, T.A.; Shin, J.; Lee, K. Analyzing the compressive strength of green fly ash based geopolymer concrete using experiment and machine learning approaches. Constr. Build. Mater. 2020, 247, 118581.
Peng, Y.; Unluer, C. Analyzing the mechanical performance of fly ash-based geopolymer concrete with different machine learning techniques. Constr. Build. Mater. 2022, 316, 125785.
Shen, J.; Li, Y.; Lin, H.; Li, H.; Lv, J.; Feng, S.; Ci, J. Prediction of compressive strength of alkali-activated construction demolition waste geopolymers using ensemble machine learning. Constr. Build. Mater. 2022, 360, 129600.
Zhang, M.; Zhang, C.; Zhang, J.; Wang, L.; Wang, F. Effect of composition and curing on alkali activated fly ash-slag binders: Machine learning prediction with a random forest-genetic algorithm hybrid model. Constr. Build. Mater. 2023, 366, 129940.
Dash, P.K.; Parhi, S.K.; Patro, S.K.; Panigrahi, R. Efficient machine learning algorithm with enhanced cat swarm optimization for prediction of compressive strength of GGBS-based geopolymer concrete at elevated temperature. Constr. Build. Mater. 2023, 400, 132814.
Gad, M.A.; Nikbakht, E.; Ragab, M.G. Predicting the compressive strength of engineered geopolymer composites using automated machine learning. Constr. Build. Mater. 2024, 442, 137509.
Gomaa, E.; Han, T.; ElGawady, M.; Huang, J.; Kumar, A. Machine learning to predict properties of fresh and hardened alkali-activated concrete. Cem. Concr. Compos. 2021, 115, 103863.
Afzali, S.A.E.; Shayanfar, M.A.; Ghanooni-Bagha, M.; Golafshani, E.; Ngo, T. The use of machine learning techniques to investigate the properties of metakaolin-based geopolymer concrete. J. Clean. Prod. 2024, 446, 141305.
Golafshani, E.; Khodadadi, N.; Ngo, T.; Nanni, A.; Behnood, A. Modelling the compressive strength of geopolymer recycled aggregate concrete using ensemble machine learning. Adv. Eng. Softw. 2024, 191, 103611.
Liu, L.; Du, Y.T.; Amin, M.N.; Nazar, S.; Khan, K.; Qadir, M.T. Explicable AI-based modeling for the compressive strength of metakaolin-derived geopolymers. Case Stud. Constr. Mater. 2024, 21, e03849.
Zeng, Y.; Chen, Y.; Liu, Y.; Wu, T.; Zhao, Y.; Jin, D.; Xu, F. Prediction of compressive and flexural strength of coal gangue-based geopolymer using machine learning method. Mater. Today Commun. 2025, 44, 112076.
Parhi, S.K.; Dwibedy, S.; Panigrahi, S.K. AI-driven critical parameter optimization of sustainable self-compacting geopolymer concrete. J. Build. Eng. 2024, 86, 108923.
Ji, H.; Lyu, Y.; Ying, W.; Liu, J.-C.; Ye, H. Machine learning guided iterative mix design of geopolymer concrete. J. Build. Eng. 2024, 91, 109710.
Ranasinghe, R.; Kulasooriya, W.; Perera, U.S.; Ekanayake, I.; Meddage, D.; Mohotti, D.; Rathanayake, U. Eco-friendly mix design of slag-ash-based geopolymer concrete using explainable deep learning. Results Eng. 2024, 23, 102503.
Bypour, M.; Yekrangnia, M.; Kioumarsi, M. Machine Learning-Driven Optimization for Predicting Compressive Strength in Fly Ash Geopolymer Concrete. Clean. Eng. Technol. 2025, 25, 100899.
Sathiparan, N.; Jeyananthan, P. Predicting compressive strength of quarry waste-based geopolymer mortar using machine learning algorithms incorporating mix design and ultrasonic pulse velocity. Nondestruct. Test. Eval. 2024, 39, 2486–2509.
Le, Q.-H.; Nguyen, D.-H.; Sang-To, T.; Khatir, S.; Le-Minh, H.; Gandomi, A.H.; Cuong-Le, T. Machine learning based models for predicting compressive strength of geopolymer concrete. Front. Struct. Civ. Eng. 2024, 18, 1028–1049.
Khan, A.Q.; Naveed, M.H.; Rasheed, M.D.; Miao, P. Prediction of compressive strength of fly ash-based geopolymer concrete using supervised machine learning methods. Arab. J. Sci. Eng. 2024, 49, 4889–4904.
Yeluri, S.C.; Singh, K.; Kumar, A.; Aggarwal, Y.; Sihag, P. Estimation of compressive strength of rubberised slag based geopolymer concrete using various machine learning techniques based models. Iran. J. Sci. Technol. Trans. Civ. Eng. 2025, 49, 1157–1172.
Philip, S.; Nakkeeran, G. Soft computing techniques for predicting the compressive strength properties of fly ash geopolymer concrete using regression-based machine learning approaches. J. Build. Pathol. Rehabil. 2024, 9, 108.
Onyelowe, K.C.; Ebid, A.M.; Awoyera, P.; Kamchoom, V.; Rosero, E.; Albuja, M.; Mancheno, C. Prediction and validation of mechanical properties of self-compacting geopolymer concrete using combined machine learning methods a comparative and suitability assessment of the best analysis. Sci. Rep. 2025, 15, 6361.
Philip, S.; Nidhi, M.; Ahmed, H.U. A comparative analysis of tree-based machine learning algorithms for predicting the mechanical properties of fibre-reinforced GGBS geopolymer concrete. Multiscale Multidiscip. Model. Exp. Des. 2024, 7, 2555–2583.
Yang, H.; Li, H.; Jiang, J. Predictive modeling of compressive strength of geopolymer concrete before and after high temperature applying machine learning algorithms. Struct. Concr. 2025, 26, 1699–1732.
Diksha; Dev, N.; Goyal, P.K. Utilizing an enhanced machine learning approach for geopolymer concrete analysis. Nondestruct. Test. Eval. 2025, 40, 904–931.
Mustapha, I.B.; Abdulkareem, Z.; Abdulkareem, M.; Ganiyu, A. Predictive modeling of physical and mechanical properties of pervious concrete using XGBoost. Neural Comput. Appl. 2024, 36, 9245–9261.
Cutler, A.; Cutler, D.R.; Stevens, J.R. Random forests. In Ensemble Machine Learning: Methods and Applications; Springer: New York, NY, USA, 2012; pp. 157–175.
Akbarzadeh, M.R.; Jahangiri, V.; Naeim, B.; Asgari, A. Advanced computational framework for fragility analysis of elevated steel tanks using hybrid and ensemble machine learning techniques. Structures 2025, 81, 110205.
Jahangiri, V.; Akbarzadeh, M.R.; Shahamat, S.A.; Asgari, A.; Naeim, B.; Ranjbar, F. Machine learning-based prediction of seismic response of steel diagrid systems. Structures 2025, 80, 109791.
Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T. Xgboost: Extreme Gradient Boosting. R Package Version 0.4-2. 2015, Volume 1, pp. 1–4. Available online: https://cran.ms.unimelb.edu.au/web/packages/xgboost/vignettes/xgboost.pdf (accessed on 10 May 2025).
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149.
Cihan, P.; Ozel, H.; Ozcan, H.K. Modeling of atmospheric particulate matters via artificial intelligence methods. Environ. Monit. Assess. 2021, 193, 287.
Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: New York, NY, USA, 2013.
Zou, J.; Han, Y.; So, S.-S. Artificial Neural Networks: Methods and Applications; Livingstone, D.J., Ed.; Humana Press: Totowa, NJ, USA, 2008; pp. 14–22.
Cihan, P. Bayesian Hyperparameter Optimization of Machine Learning Models for Predicting Biomass Gasification Gases. Appl. Sci. 2025, 15, 1018.
Qiu, Y.; Zhou, J.; Khandelwal, M.; Yang, H.; Yang, P.; Li, C. Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration. Eng. Comput. 2022, 38, 4145–4162.
Jones, D.R. A taxonomy of global optimization methods based on response surfaces. J. Glob. Optim. 2001, 21, 345–383.
Cihan, M.T.; Aral, I.F. Application of AI models for predicting properties of mortars incorporating waste powders under Freeze-Thaw condition. Comput. Concr. 2022, 29, 187–199.
Cihan, P. Comparative performance analysis of deep learning, classical, and hybrid time series models in ecological footprint forecasting. Appl. Sci. 2024, 14, 1479.
Cihan, P.; Özcan, H.K.; Öngen, A. Prediction of tropospheric ozone concentration with Bagging-MLP method. Gazi Mühendislik Bilim. Derg. 2023, 9, 557–573.
Teodorescu, V.; Obreja Brașoveanu, L. Assessing the Validity of k-Fold Cross-Validation for Model Selection: Evidence from Bankruptcy Prediction Using Random Forest and XGBoost. Comput. 2025, 13, 127.
Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623.
Ekanayake, I.; Meddage, D.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059.
Hassanat, A.B.; Ali, H.N.; Tarawneh, A.S.; Alrashidi, M.; Alghamdi, M.; Altarawneh, G.A.; Abbadi, M.A. Magnetic force classifier: A novel method for big data classification. IEEE Access 2022, 10, 12592–12606.
Rahmati, S.; Mahdikhani, M. Geopolymer Concrete Compressive Strength Data Set for ML. In Mendeley Data; Elsevier, Inc.: Philadelphia, PA, USA, 2023.
Fu, K.; Xue, Y.; Qiu, D.; Wang, P.; Lu, H. Multi-channel fusion prediction of TBM tunneling thrust based on multimodal decomposition and reconstruction. Tunn. Undergr. Space Technol. 2026, 167, 107061.
Yao, S.; Chen, F.; Wang, Y.; Zhou, H.; Liu, K. Manufacturing defect-induced multiscale weakening mechanisms in carbon fiber reinforced polymers captured by 3D CT-based machine learning and high-fidelity modeling. Compos. Part A Appl. Sci. Manuf. 2025, 197, 109052.
Niu, Y.; Wang, W.; Su, Y.; Jia, F.; Long, X. Plastic damage prediction of concrete under compression based on deep learning. Acta Mech. 2024, 235, 255–266.
Gautam, R.; Jaiswal, R.; Yadav, U.S. AI-Enhanced Data-Driven Approach to Model the Mechanical Behavior of Sustainable Geopolymer Concrete. Research Square. 2024. Available online: https://www.researchsquare.com/article/rs-5307352/v1 (accessed on 25 May 2025).
Luo, B.; Su, Y.; Ding, X.; Chen, Y.; Liu, C. Modulation of initial CaO/Al₂O₃ and SiO₂/Al₂O₃ ratios on the properties of slag/fly ash-based geopolymer stabilized clay: Synergistic effects and stabilization mechanism. Mater. Today Commun. 2025, 47, 113295. [Google Scholar] [CrossRef]

Figure 1. Flowchart for estimating geopolymer concrete compressive strength.

Figure 2. Violin plots of (a) the original dataset and (b) the normalized dataset.

Figure 3. Correlation heatmap for the geopolymer concrete dataset.

Figure 4. Actual and predicted values for compressive strength.

Figure 5. SHAP-based interpretability analysis for the XGB-BO model: (a) SHAP summary plot showing individual feature impacts; (b) Bar chart of mean absolute SHAP values indicating overall feature importance.

Figure 6. The graphical user interface for predicting compressive strength.

Table 1. Overview of recent studies employing ML techniques for the prediction of compressive strength in geopolymer concretes.

References	Sample Size	Inputs	ML Method	HO	Best Model	R/R²	XAI
Ghosh, A. et al. [13]	-	FA, CuTi, CuMe	LR, DT, RF, SVM	×	DT, RF	-/0.9879	×
Wang, Y. et al. [14]	156	FA, GGBFS, FiAg, CoAg, NH, NS, SP, CT	ANN, ANFIS, GEP	×	GEP	0.99/-	×
Revathi, B. et al. [15]	37	FA, GGBFS, AF, CoAg, FiAg, NH, NS, W	RF	√	RF	-/0.79	√
	24	PM, FA, M, Na/Si, Si/Al, H₂O/Na₂O, Na/Al				-/0.81
	28	NH, NS, NS/NH, M, CT, ET				-/0.88
Nguyen, K. T. et al. [16]	335	FA, NH, NS, CaAg, FiAg, W, Molarity, CuTi, CT	DNN, ResNet	×	ResNet	0.9927/-	×
Peng, Y. et al. [17]	110	FA, SiO₂, Al₂O₃, CoAg, FiAg, NH, Molarity, NS, NS/NH, AA/FA, W, SP	BPNN, SVM, ELM	×	BPNN	-/0.8221	×
Shen, J. et al. [18]	328	Ms, Na₂O, SiO₂/Al₂O₃, Na₂O/SiO₂, L/S, PT, HTT, TA	RF, GB, XGB	√	XGB	-/0.939	√
Zhang, M. et al. [19]	616	Si/Al, Na/Al, Ca/Si, CT, CuTi, H, W	BPNN, LoR, MLR, SVM, RF	√	RF	0.9322/-	×
Dash, P. K. et al. [20]	192	GGBFS, CoAg, FiAg, AS/GGBFS, NH, NS, NS/NH, CuTi, CT	ELM, ELM-CSO, ELM-ECSO	√	ELM-ECSO	-/0.957	×
Gad, M. A. et al. [21]	132	FA, GGBFS, SF, NH, NS, SS, EW, WR, CuMe	CatBoost, XGB, ETR, DT, RF, GB	√	GB	-/0.9651	√
Gomaa, E. et al. [22]	180	CoAg, FiAg, SiO₂, Al₂O₃, Fe₂O₃, CaO, MgO, Na₂O, K₂O, TiO₂, P₂O₅, MnO, LI, W, SSA-FA, MP, CuMe, CT, CuTi, TA	RF	√	RF	0.972/0.944	×
Afzali, S. A. E. et al. [23]	235	MK, NH, Molarity, NS, EW, W/S, SiO₂/Al₂O₃, H₂O/Na₂O, Na₂O/ Al₂O₃, CoAg/FiAg, SP, TA, CT	GB, RF, DT, ANN, SVM	√	GB	-/0.983	√
Golafshani, E. et al. [24]	314	FA, SC, CoAg, R-CoAg, FiAg, NH, NS, SP, RAWA, M, CT, HCD, TA	RF, BR, ETR, ABR, GB, XGB, CatBoost, LightGBM	√	XGB	-/0.955	√
Liu, L. et al. [25]	235	SiO₂/Al₂O₃, Na₂O/Al₂O₃, H₂O/Na₂O, CoAg/FiAg, SP, W/S, EW, NH, NS, Molarity, MK, TA, CT	GEP, MEP	√	MEP	0.98/0.96	√
Zeng, Y. et al. [26]	206	AD, SiO₂/Na₂O, W/B, SiO₂/Al₂O₃, CaO/SiO₂, Al₂O₃/Na₂O, CaO/Al₂O₃, CaO/(SiO₂+Al₂O₃), CG	RF, XGB, MLP, DT	√	XGB	-/0.882	×
Parhi, S. K. et al. [27]	240	FA, GGBFS, SiO₂, Al₂O₃, Fe₂O₃, CaO, CoAg, FiAg, NH, Molarity, NS, NS/NH, AS/B, EW, SP, CuTi, CT	ABR, RF, XGB, Hybrid	√	Hybrid	-/0.97	×
Ji, H. et al. [28]	795	FA, Na₂O, Ms, W/FA, CoAg/FA, FiAg/FA, Fs, Dmax, BC, HM, CT, CuTi, TA	XGB	√	XGB	-/0.95	×
Ranasinghe, R. S. S. et al. [29]	260	GGBFS, CCA, FiAg, CoAg, W, NH, NS, CuTi, Molarity, CSG	ANN, DNN, CNN	×	DNN	-/0.972	√
Bypour, M. et al. [30]	161	SiO₂, Al₂O₃, Fe₂O₃, CaO, P₂O₅, SO₃, K₂O, TiO₂, MgO, Molarity, Na₂O, MnO, CoAg, FiAg, StF, Rub	DT, ETR, RF, GB, XGB, ADA	√	ADA	-/0.86	√
Sathiparan, N. et al. [31]	189	ESA, RHA, NH, UPV	LR, ANN, BDT, RF, KNN, SVM, XGB	×	KNN	-/0.958	√
Le, Q. H. et al. [32]	375	CoAg, FiAg, NH, NS, ASM, CT, CuTi, TA	DNN, KNN, SVM	√	DNN	0.8903/-	×
Khan, A. Q. et al. [33]	149	FA, SiO₂, Al₂O₃, CoAg, FiAg, NH, Molarity, NS, NS/NH, (NH+NS)/FA, W, CT, CuTi	BPNN, RF, KNN	×	BPNN	-/0.948	×
Yeluri, S. C. et al. [34]	186	FA, Molarity, NH, NS, FiAg, CoS, CoAg, RA, CuTi	MARS, GMDH, M5P, LR	×	MARS	0.9634/-	√
Philip, S. et al. [35]	309	FA, CoAg, FiAg, NH, NS, Molarity, NS/NH, (NH+NS)/FA, TA, CT, EW, SP	LR, ADA, RF, SVM, ANN	×	RF	-/0.96	×
Onyelowe, K. C. et al. [36]	132	GGBFS, FA, NH, NS, TA	GB, CN2, NB, SVM, SGD, KNN, DT, RF	×	KNN	-/0.99	×
Philip, S. et al. [37]	110	GGBFS, CoAg, FiAg, NH, Molarity, NS, NS/NH, (NH+NS)/GGBFS, TA, CT, FV, FL, FD, FAR, FTS	LR, DT, RF, XGB, GB, ADA	×	XGB	-/0.938	×
Yang, H. et al. [38]	206	Size, W, NS, Molarity, NH, GGBFS, FA, FiAg, CoAg, CT, HR, HD, HT, CuTi	SVM, EN, GB, XGB, GA-RF, PSO-DNN, BNN, ELM	√	GA-RF	-/0.937	√
Diksha, Dev, N. and Goyal, P. K. [39]	144	AF/FA, Molarity, L/B, Ag/B, L/Ag, EW, EW/L, CFV, CT, TA	LR, GPR, EL, SVM, ANN	√	GPR	0.9951/-	×

Table 2. Definitions and mathematical formulas of model evaluation metrics.

Metric	Formula	Description
R²	$1 - \frac{\sum_{i=1}^{n} (\mathrm{Actual}_i - \mathrm{Predicted}_i)^{2}}{\sum_{i=1}^{n} (\mathrm{Actual}_i - \mathrm{Mean\ of\ actual\ values})^{2}}$	It measures how well the model explains the variance of the dependent variable and typically ranges between 0 and 1. A value of 1 indicates that the model fits the data perfectly; a value of 0 indicates that the model explains none of the variance. R² can take negative values when the model predicts worse than the mean of the target variable [54].
RMSE	$\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\mathrm{Actual}_i - \mathrm{Predicted}_i)^{2}}$	RMSE is the square root of the mean squared difference between the predicted and actual values. It evaluates the overall accuracy of the model and penalizes large errors more heavily. Lower RMSE values indicate better model performance [6,58].
MAE	$\frac{1}{n} \sum_{i=1}^{n} \left| \mathrm{Actual}_i - \mathrm{Predicted}_i \right|$	It measures the average of the absolute differences between actual and predicted values. Lower MAE values indicate more accurate predictions. Unlike squared-error metrics, MAE does not disproportionately penalize large errors, making it less sensitive to outliers [58].
MAPE	$\frac{1}{n} \sum_{i=1}^{n} \left| \frac{\mathrm{Actual}_i - \mathrm{Predicted}_i}{\mathrm{Actual}_i} \right| \times 100$	It measures the average absolute percentage difference between the predicted and actual values, giving a scale-independent view of prediction accuracy. This makes MAPE useful for comparing performance across targets of very different magnitudes, although it becomes unstable when actual values are close to zero. Lower MAPE values indicate more accurate predictions [58].
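
The four metrics in Table 2 can be written out directly from their definitions. The following is a minimal sketch in plain Python rather than the code used in the study; the function names and the three sample values are hypothetical and serve only to make the formulas concrete.

```python
import math


def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean of the absolute errors."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n * 100.0


# Hypothetical strengths in MPa, not taken from the dataset.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

for name, fn in [("R2", r_squared), ("RMSE", rmse), ("MAE", mae), ("MAPE", mape)]:
    print(f"{name}: {fn(actual, predicted):.4f}")
```

Because the squared errors are averaged before the root is taken, RMSE is never smaller than MAE on the same predictions, which is the sense in which it weights large errors more heavily.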

Table 3. Hyperparameters, parameter intervals, and optimally tuned values obtained through Bayesian Optimization for each ensemble learning algorithm.

Model	Parameter	Parameter Intervals	Best Value
XGB	learning_rate	0.01–0.3	0.1495
	max_depth	3–10	7
	subsample	0.5–1.0	0.9591
	colsample_bytree	0.5–1.0	0.7033
	n_estimators	50–500	319
RF	max_depth	3–20	10
	min_samples_split	2–20	2
	min_samples_leaf	1–20	1
	n_estimators	50–500	479
	max_features	0.1–1.0	0.8485
LightGBM	learning_rate	0.01–0.2	0.1926
	max_depth	3–20	8
	num_leaves	20–100	54
	n_estimators	50–500	485
	min_child_samples	5–50	13
	subsample	0.6–1.0	0.6604
	colsample_bytree	0.6–1.0	0.8689
SVR	C	1–500	299
	epsilon	0.001–0.5	0.0789
	gamma	0.0001–1	0.1561
ANN	hidden1 (neurons)	16–128	109
	hidden2 (neurons)	8–64	19
	alpha (L2 penalty)	1 × 10⁻⁶–1 × 10⁻⁴	0.0018
	learning_rate_init	1 × 10⁻⁴–1 × 10⁻²	0.0019
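
As an illustration of how the XGB rows of Table 3 might be recorded in code, the sketch below stores each interval together with the tuned value and checks that the tuned value lies inside its interval. The dictionary layout, the `within_bounds` helper, and the integer/float typing of each parameter are assumptions made for this example; they are not described in the study.

```python
# Search intervals for XGB copied from Table 3: (low, high, assumed type).
XGB_SPACE = {
    "learning_rate": (0.01, 0.3, float),
    "max_depth": (3, 10, int),
    "subsample": (0.5, 1.0, float),
    "colsample_bytree": (0.5, 1.0, float),
    "n_estimators": (50, 500, int),
}

# Best values reported for XGB in Table 3.
XGB_BEST = {
    "learning_rate": 0.1495,
    "max_depth": 7,
    "subsample": 0.9591,
    "colsample_bytree": 0.7033,
    "n_estimators": 319,
}


def within_bounds(space, params):
    """Return names of parameters with the wrong type or outside their interval."""
    violations = []
    for name, (low, high, kind) in space.items():
        value = params[name]
        if not isinstance(value, kind) or not (low <= value <= high):
            violations.append(name)
    return violations


violations = within_bounds(XGB_SPACE, XGB_BEST)
print("out-of-range parameters:", violations)
```

A check of this kind is a cheap safeguard when tuned values are copied by hand between an optimizer's output and a results table.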

Table 4. Statistical summary of the variables in the dataset.

Parameter	Abbr.	Unit	Mean	Std	Min	25%	50%	75%	Max
Fly ash	FA	kg/m³	350	50	300	300	350	400	400
Ground granulated blast furnace slag	GGBFS	kg/m³	50	50	0	0	50	100	100
NaOH molar concentration	M	-	11	2.2	8	9.5	11	12.5	14
NaOH	NH	kg/m³	70	0	70	70	70	70	70
Na₂SiO₃	NS	kg/m³	120	0	120	120	120	120	120
Extra water	EW	kg/m³	0	0	0	0	0	0	0
Coarse aggregate	CoAg	kg/m³	873.3	217.2	620	620	850	1150	1150
Fine aggregate	FiAg	kg/m³	650	0	650	650	650	650	650
Recycled aggregate	R-Ag	kg/m³	366.7	330.2	0	0	300	800	800
Curing temperature	CT	°C	49.7	11.9	24	48	48	60	60
Curing time	CuTi	hrs	44.6	20	24	24	48	72	72
Testing age	TA	Days	13	9.5	3	6	10.5	17.5	28
Compressive Strength	CS	MPa	44.7	17.2	23.4	32.7	39.9	52.1	102
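
Table 4 lists several inputs whose minimum and maximum coincide, so they take a single value across the whole dataset. A short sketch of how such columns could be identified from the summary statistics follows; the `FEATURE_RANGES` dictionary and the `constant_features` helper are hypothetical names, and the sketch makes no claim about how the study handled these variables.

```python
# (min, max) for each input variable, copied from Table 4.
FEATURE_RANGES = {
    "FA": (300, 400),
    "GGBFS": (0, 100),
    "M": (8, 14),
    "NH": (70, 70),
    "NS": (120, 120),
    "EW": (0, 0),
    "CoAg": (620, 1150),
    "FiAg": (650, 650),
    "R-Ag": (0, 800),
    "CT": (24, 60),
    "CuTi": (24, 72),
    "TA": (3, 28),
}


def constant_features(ranges):
    """Return the inputs whose minimum equals their maximum."""
    return [name for name, (low, high) in ranges.items() if low == high]


constant = constant_features(FEATURE_RANGES)
varying = [name for name in FEATURE_RANGES if name not in constant]
print("constant inputs:", constant)
print("varying inputs:", varying)
```

An input that never varies cannot distinguish one sample from another within this dataset, so any model fitted to it draws its information from the remaining inputs.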

Table 5. Comparison of model prediction performance with default parameters and Bayesian Optimization.

Model	R²	RMSE (MPa)	MAE (MPa)	MAPE (%)
XGB-Default	0.9990 ± 0.0006	0.4956 ± 0.1477	0.2845 ± 0.0601	0.64
XGB-BO	0.9997 ± 0.0001	0.3100 ± 0.0616	0.2191 ± 0.0368	0.50
RF-Default	0.9965 ± 0.0020	0.9523 ± 0.2365	0.6113 ± 0.1314	1.39
RF-BO	0.9966 ± 0.0020	0.9462 ± 0.2449	0.6222 ± 0.1522	1.39
LightGBM-Default	0.9977 ± 0.0009	0.7959 ± 0.1305	0.5287 ± 0.0561	1.15
LightGBM-BO	0.9996 ± 0.0001	0.3567 ± 0.0599	0.2480 ± 0.0426	0.57
SVR-Default	0.9993 ± 0.0003	0.4367 ± 0.1075	0.2746 ± 0.0468	0.59
SVR-BO	0.9995 ± 0.0002	0.3705 ± 0.0863	0.2274 ± 0.0287	0.51
ANN-Default	0.9824 ± 0.0073	2.1965 ± 0.3982	1.6265 ± 0.3104	3.66
ANN-BO	0.9961 ± 0.0012	1.0493 ± 0.1222	0.8054 ± 0.0908	1.87
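
The effect of tuning in Table 5 can be summarized as the relative reduction in mean RMSE from the default to the BO configuration. The sketch below does this arithmetic on the tabulated means only; it ignores the reported ± spread, and the `relative_reduction` helper is a hypothetical name introduced for the example.

```python
# Mean RMSE in MPa from Table 5: (default, Bayesian-optimized).
RMSE = {
    "XGB": (0.4956, 0.3100),
    "RF": (0.9523, 0.9462),
    "LightGBM": (0.7959, 0.3567),
    "SVR": (0.4367, 0.3705),
    "ANN": (2.1965, 1.0493),
}


def relative_reduction(default, tuned):
    """Percentage decrease in RMSE when moving from default to tuned settings."""
    return (default - tuned) / default * 100.0


reductions = {model: relative_reduction(d, t) for model, (d, t) in RMSE.items()}

# Models ordered from largest to smallest relative improvement.
ranked = sorted(reductions, key=reductions.get, reverse=True)
for model in ranked:
    print(f"{model}: {reductions[model]:.2f}% lower RMSE after BO")
```

Read this way, the table separates the model that benefits most from tuning from the model that ends with the lowest error, which need not be the same one.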

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
