Article

Interpretable Machine Learning for Compressive Strength Prediction of Fly Ash-Based Geopolymer Concrete

1
Department of Civil Engineering, Istanbul University-Cerrahpasa, 34320 Istanbul, Türkiye
2
Department of Architecture, Mimar Sinan Fine Arts University, 34427 İstanbul, Türkiye
3
GameAbove College of Engineering and Technology, Eastern Michigan University, Ypsilanti, MI 48197, USA
4
College of IT Convergence, Gachon University, Seongnam 13120, Republic of Korea
*
Authors to whom correspondence should be addressed.
Sustainability 2026, 18(5), 2227; https://doi.org/10.3390/su18052227
Submission received: 18 January 2026 / Revised: 20 February 2026 / Accepted: 22 February 2026 / Published: 25 February 2026

Abstract

Fly ash-based geopolymer concrete (GPC) is a sustainable alternative to conventional cementitious materials; however, its compressive strength is governed by complex and highly correlated mixture parameters, making experimental optimization expensive and data-driven modeling challenging. While machine learning (ML) techniques have been widely applied to predict GPC strength, most studies prioritize predictive accuracy without explicitly addressing multicollinearity among input variables, which can distort feature importance, reduce model stability, and limit engineering interpretability. This study proposes a multicollinearity-integrated and interpretable ML framework that systematically embeds correlation diagnostics and structured feature screening within the modeling pipeline rather than treating interpretability as a post-processing step. Multiple conventional and ensemble learning algorithms were comparatively evaluated using cross-validation to ensure generalization robustness. The proposed framework achieved a maximum coefficient of determination (R2) of 0.96 with low prediction error, outperforming baseline regression models while demonstrating improved stability under correlated input conditions. Unlike existing studies that rely solely on black-box optimization, the integrated interpretability analysis revealed physically consistent dominance of curing temperature, alkali content, and water-related parameters in governing strength development. By explicitly coupling predictive performance with multicollinearity mitigation and engineering-oriented interpretability, this work advances beyond accuracy-driven ML applications and provides a robust and transparent decision-support tool for sustainable geopolymer mix design.

1. Introduction

Concrete is the most widely used construction material worldwide, owing to its versatility, durability, and relatively low cost. However, the dominant use of ordinary Portland cement (OPC) as the primary binder in concrete has raised serious environmental concerns. Cement production is an energy-intensive process responsible for approximately 5–8% of global anthropogenic CO2 emissions, mainly due to clinker production and fossil fuel consumption. As global infrastructure demand continues to increase, reducing the environmental footprint of cement-based materials has become a critical challenge for sustainable development in the construction sector [1].
In response to these concerns, geopolymer concrete has emerged as a promising alternative to OPC-based systems. Geopolymers are inorganic binders formed through the alkaline activation of aluminosilicate-rich materials, such as fly ash, ground granulated blast furnace slag (GGBFS), and other industrial by-products. Unlike OPC, geopolymer binders do not rely on high-temperature clinkerization, enabling substantial reductions in CO2 emissions while simultaneously valorizing industrial waste streams [2]. From a sustainability perspective, geopolymer technology aligns with circular economy principles by transforming by-products into value-added construction materials and reducing reliance on virgin raw resources.
Geopolymer concrete has gained considerable attention due to its environmental benefits and mechanical performance. In addition to its behavior under static loading, geopolymer concrete also exhibits notable performance under dynamic loading conditions. Studies have shown that it provides adequate resistance to impact and vibration, effectively absorbing applied energy. Therefore, investigating its strength and behavior under dynamic conditions is particularly important for both structural and industrial applications.
Among various precursor materials, fly ash has received particular attention due to its wide availability and favorable chemical composition. Large quantities of fly ash are generated annually by coal-fired power plants, yet only a limited fraction is effectively utilized in construction applications. When activated under alkaline conditions, fly ash can form a dense aluminosilicate network, resulting in geopolymer concretes with mechanical properties comparable to, and in some cases exceeding, those of conventional OPC concrete [3,4,5]. The incorporation of GGBFS has also been shown to enhance early-age strength development, especially under ambient curing conditions, further improving the practical applicability of geopolymer systems [6].
Despite these advantages, the mechanical performance of geopolymer concrete is governed by a complex interaction of multiple parameters. These include precursor composition, fineness of fly ash, alkaline activator type and concentration, activator-to-binder ratio, curing temperature and duration, and the presence of chemical admixtures [7]. The nonlinear and interdependent nature of these variables makes it challenging to develop generalized mix design rules using traditional empirical approaches. As a result, reported findings in the literature often exhibit significant variability, limiting the transferability of experimental results to broader engineering practice.
To address this complexity, machine learning (ML) techniques have increasingly been adopted to model and predict the compressive strength of geopolymer concrete. ML models are capable of capturing nonlinear relationships and interactions among multiple variables and have demonstrated high predictive accuracy. However, many ML-based approaches suffer from limited interpretability, particularly when input variables exhibit strong linear interdependencies. In such cases, model predictions may be accurate, yet the extracted feature importance measures become unreliable, effectively turning the models into black-box predictors with limited physical insight.
From a sustainability-oriented engineering perspective, interpretability is as important as predictive accuracy. Understanding which material parameters most strongly influence compressive strength supports more efficient mix design, reduces unnecessary material usage and facilitates the rational optimization of geopolymer systems with lower environmental impact. Therefore, there is a clear need for modeling frameworks that balance predictive performance with transparency and robustness, especially when dealing with high-dimensional datasets characterized by multicollinearity.
Recent studies have extensively explored machine learning (ML) techniques for predicting the compressive strength (CS) of fly ash-based geopolymer concrete (GPC), highlighting the nonlinear dependency between mixture constituents, curing conditions, and strength development. For instance, Ansari et al. compared conventional ML models (Linear Regression, Artificial Neural Networks) with ensemble techniques such as AdaBoost and reported a substantial improvement in predictive performance using ensemble learning (R2 = 0.944), demonstrating the superiority of boosting-based approaches over traditional regression models [8]. Similarly, Gupta et al. showed that Random Forest Regression outperformed KNN and Linear Regression in predicting GPC compressive strength, further confirming the robustness of ensemble-based frameworks for highly nonlinear concrete datasets [9].
More advanced ensemble and boosting strategies have also been proposed. Ahmad et al. employed ANN, AdaBoost, and boosting techniques, concluding that boosting algorithms achieved the highest accuracy (R2 ≈ 0.96), and incorporated k-fold validation to improve generalization reliability [10]. Dash et al. introduced a firefly-optimized hybrid ensemble model that integrates multiple base learners through stacking. Their study combined metaheuristic hyperparameter tuning with Sobol and FAST global sensitivity analyses, demonstrating that curing temperature and water content dominate strength variability. These studies collectively indicate a strong trend toward hybrid and optimized ensemble frameworks to maximize prediction accuracy [11].
Beyond predictive accuracy, recent research has begun addressing model interpretability. Hu et al. applied SHapley Additive exPlanations (SHAP) to interpret eXtreme Gradient Boosting predictions for FRP-confined concrete columns, identifying parameter importance and nonlinear functional relationships [12]. Likewise, Nguyen et al. proposed a structured three-stage framework that transitions from ML prediction (LR, DNN, ResNet) to global sensitivity analysis and finally to empirical formula derivation, thereby attempting to bridge the gap between black-box ML models and practical engineering applicability [13].
Despite these significant advances, two critical issues remain insufficiently addressed in the current literature. First, while ensemble and hybrid models consistently improve predictive accuracy, most studies prioritize performance metrics (R2, RMSE, MAE) without systematically addressing multicollinearity among input variables—a common characteristic of geopolymer concrete datasets due to correlated binder compositions, activator ratios, and curing parameters. Second, interpretability is often treated as a post-processing step (e.g., SHAP or sensitivity analysis) rather than being structurally integrated into the modeling framework to enhance both robustness and engineering transparency.
Therefore, the existing body of research reveals a methodological gap between high-accuracy ensemble modeling and structured interpretability with explicit treatment of correlated input features. The present study aims to address this gap by integrating multicollinearity mitigation strategies within the ML pipeline and systematically evaluating their impact on both predictive performance and interpretability. In doing so, this work moves beyond incremental accuracy improvement and contributes to a more reliable and engineering-oriented modeling framework for geopolymer concrete strength prediction.
The novelty of this study lies in the development of an interpretable and dependency-aware machine learning framework for estimating the compressive strength of sustainable geopolymer concrete. Unlike conventional ML approaches that prioritize prediction accuracy alone, this work introduces a multi-objective NSGA-II–based feature selection strategy that explicitly accounts for linear interdependencies among input variables. By simultaneously optimizing predictive accuracy and feature independence, the proposed framework enhances both model reliability and interpretability.
Using an extensive dataset comprising multiple mix designs and curing parameters, the study systematically evaluates twelve hyperparameter-tuned machine learning models trained on optimized feature subsets. The results not only confirm the dominant influence of industrial by-products such as GGBFS and key activator-related parameters but also demonstrate that high predictive accuracy can be maintained while significantly reducing input redundancy. From a sustainability standpoint, the proposed methodology supports more informed and resource-efficient geopolymer mix design, reinforcing the role of geopolymer concrete as a viable low-carbon alternative to conventional cement-based materials.

2. Materials and Methods

This study aims to enhance the interpretability and robustness of machine-learning models in the domain of compressive strength prediction of geopolymer concrete by reducing linear interdependence among input features while preserving predictive accuracy.
To ensure a fair and reproducible evaluation framework, the dataset was first divided into training and testing subsets using a single 80/20 holdout split. The same partition was consistently preserved across both the baseline models and the feature-selected models to eliminate performance variability arising from data partitioning differences.
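As an illustration, such a fixed holdout partition can be reproduced with a single seeded split. The sketch below assumes scikit-learn; the random seed and the placeholder data are illustrative, not taken from the study.

```python
# Minimal sketch of a fixed 80/20 holdout split. Reusing the same seeded
# partition for baseline and feature-selected models removes performance
# variability caused by different data partitions.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((274, 19))   # 274 samples, 19 input features (placeholder)
y = rng.random(274)         # compressive strength target (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)   # (219, 19) (55, 19)
```

Passing the same `random_state` (or, equivalently, storing and reusing the split indices) guarantees that every model sees identical training and testing rows.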
Given the high degree of linear interdependence among geopolymer constituents, a multi-objective feature selection strategy based on the NSGA-II algorithm was implemented. Unlike conventional single-objective selection approaches, the proposed framework simultaneously considers four criteria: predictive performance, multicollinearity reduction, correlation control, and feature subset compactness. These objectives were combined using weighted aggregation to balance model accuracy with statistical stability.
The weight coefficients (0.15, 0.30, 0.40, 0.15) were determined empirically through preliminary exploratory experiments to achieve a stable trade-off between predictive performance and multicollinearity mitigation. The selected configuration provided consistent convergence behavior across independent runs. The aim was not to impose theoretical optimality, but to establish a practically balanced configuration suited to the characteristics of the dataset.
Feature selection was performed exclusively on the training set to prevent information leakage. After subset identification, twelve machine learning algorithms were trained using identical training–testing partitions to ensure a valid comparison between baseline and optimized feature configurations.
Hyperparameter optimization was conducted using Tree-structured Parzen Estimator (TPE) search with 50 evaluation iterations per model. This budget was selected as a compromise between computational feasibility and convergence stability, and preliminary runs indicated diminishing performance gains beyond this threshold.
Although the dataset consists of 274 experimental samples, it represents a consolidated compilation of geopolymer concrete mixtures with varying binder compositions, activator ratios, and curing conditions. Therefore, the results should be interpreted within the scope of this dataset, and broader generalization requires further cross-dataset validation.

2.1. Data Collection

To achieve robust predictions and reliable feature interpretation, it is essential to curate and pre-process datasets carefully and ensure their credibility. A key dataset in this field is provided with a manuscript titled “Effect of Alkaline Activators and Other Factors on the Properties of Geopolymer Concrete Using Industrial Waste”, developed with a gene expression programming (GEP) approach by Pham and Nguyen [14]. It is publicly available through the Mendeley Data Repository [15]. The dataset focuses on mixtures incorporating industrial by-products such as fly ash and ground granulated blast furnace slag, offering a solid basis for exploring geopolymer concrete properties through machine learning techniques.

2.2. Exploratory Data Analysis

The dataset contains 274 data points (rows) and 20 columns: 19 independent variables and 1 dependent variable. To provide a comprehensive overview of the dataset employed in this study, the descriptive statistics of the input variables are presented in Table 1.
Fly ash is a byproduct of coal combustion in thermal power plants. It is a fine powder mainly composed of silica, alumina, and iron oxides. With its pozzolanic properties, it reacts with calcium hydroxide in the presence of water to form cementitious compounds, thereby improving workability, reducing heat of hydration, enhancing long-term strength, and supporting sustainability by recycling industrial waste. Ground granulated blast furnace slag (GGBFS) is generated as a by-product of the iron and steel industry. It also exhibits cementitious and pozzolanic characteristics that enhance durability and mechanical performance over time. The incorporation of these materials in concrete is strongly aligned with sustainable construction practices and the objectives of green development [16].
In the present dataset, Concentration (M) NaOH denotes the molar concentration of the sodium hydroxide solution used as the alkaline activator. This parameter describes the amount of NaOH dissolved per liter of solution and is used to characterize the alkalinity level of the activating medium. The concentration of NaOH is known to play an important role in controlling the dissolution of aluminosilicate precursors and the overall reaction environment during geopolymer formation. Defining concentration in terms of molarity is a common and well-established practice in geopolymer and alkali-activated material studies, and the same convention has been adopted in this dataset.
Sodium silicate solution (Na2SiO3) is obtained by combining alkaline compounds with silica sources and water. Owing to its soluble silica content, it plays a key role in the geopolymerization process, stabilizing the structure and supporting the formation of three-dimensional silicate–aluminate networks [17]. Water within this solution reduces viscosity and improves particle dispersion, thereby enhancing the workability of fresh mixtures. The SiO2/Na2O molar ratio is particularly critical for both strength and durability [18]. Sodium hydroxide (NaOH), which is prepared by dissolving solid NaOH in water, acts as a strong alkaline activator that breaks down aluminosilicate structures, releasing reactive ions to accelerate polymerization and strength development [19]. However, its concentration must be carefully controlled to avoid negative effects such as thermal cracking or reduced workability. Sodium oxide (Na2O) further increases sodium ion availability, strengthening silicate–aluminate gels and promoting the dissolution of precursors, though excessive contents may harm durability [20].
Silicon dioxide (SiO2) provides the main source of reactive silica, enabling the formation of sodium aluminosilicate hydrate (N-A-S-H) gels that contribute to strength, durability, and low permeability [21]. Geopolymer systems also reduce CO2 emissions by replacing Portland cement with recycled pozzolanic materials and alkaline activators [21], while dense gel networks improve resistance against carbonation and aggressive ions, contributing to sustainability. Water primarily acts as a reaction medium [17], while its total amount, including any additional water, must be optimized to balance workability and strength. The molar concentration and dry mass of NaOH [22] provide precise control of activator-to-binder interactions. Superplasticizers enhance flow without excess water [23], though their efficiency depends on compatibility with alkaline conditions.
Curing conditions are equally critical. Initial curing time and temperature strongly affect early geopolymer gel formation and strength gain, with moderate heat curing between 60 and 80 °C being particularly beneficial. A short resting period before heat application can reduce thermal stresses. Final curing temperature [3] governs matrix densification, durability, and efflorescence prevention. Additionally, sodium silicate concentration [5] influences network formation, setting behavior, and compressive strength through its control of the Si/Na ratio.
The variables included in the dataset were chosen mainly for their known effects on the mechanical strength of geopolymer concrete. At the same time, materials that contribute to sustainability, such as industrial by-products like fly ash and GGBFS, were also considered when selecting the features.

2.3. The Experimental Process

Following the collection of the data, a 4-stage experiment was conducted to fulfill the research aim and objectives (Figure 1). Appendix A provides a more comprehensive step-by-step tree diagram of the entire experimental process (pipeline). The stages of the process were Initialization, Feature Selection with Genetic Algorithm, Performance Assessment of ML Models, and Visualization and Verification. This section provides the details of each stage.

2.3.1. Stage 1: Initialization

The process starts with loading the dataset from a CSV file. The dataset is then checked for missing data; any rows containing null values would be removed. In our dataset, no rows contained missing data.
In the following phase, the Standard Scaler (z-score normalization), X_scaled = (X − μ)/σ, where μ is the mean and σ is the standard deviation, is applied to the dataset. In many machine learning tasks, features often have different scales, which can negatively affect model performance. The Standard Scaler addresses this by standardizing each feature to have a mean of zero and a standard deviation of one. This ensures that all variables contribute equally to the learning process, without being dominated by those with larger magnitudes. By preserving the structure of the data while balancing feature scales, the Standard Scaler facilitates more stable and efficient model training. This stage ensures data quality and normalizes features for consistent algorithm performance.
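The standardization step can be sketched as follows; the two-feature toy matrix is illustrative, chosen only to show columns on very different scales.

```python
# z-score standardization: each column is rescaled to mean 0, std 1.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[10.0, 200.0],
              [12.0, 400.0],
              [14.0, 600.0]])   # two features on very different scales

X_scaled = StandardScaler().fit_transform(X)   # (X - mean) / std per column
print(X_scaled.mean(axis=0))   # ~[0, 0]
print(X_scaled.std(axis=0))    # ~[1, 1]
```

In a real pipeline the scaler should be fitted on the training split only and then applied to the test split, to avoid information leakage.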
The following phase involves parameter configuration. A set of parameters for the feature selection process is configured in this phase. First, the minimum number of variables to retain (default: 5) is provided by the user. Following this, the Genetic Algorithm parameters are set: population size (default: 60), number of generations (default: 40), crossover probability (default: 0.7), and mutation probability (default: 0.2). Finally, the weights for each objective, VIF, RMSE, R2, and Total Multiple R (MultiR), are provided by the user. These weights are used for transforming the 4-objective optimization problem into a single-objective optimization problem. Weights also determine the focus of the feature selection, either by prioritizing one of the objectives (i.e., minimizing VIF) or by giving equal importance to each objective. In our study, the objective weights were provided as twelve weight combinations prioritizing both maximizing R2 and minimizing RMSE, while controlling the effects of VIF and Total Multiple R. The weight combinations are provided in Table 2.
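This configuration phase can be summarized as a single settings object. The sketch below uses the default values stated above; the balanced (equal) objective weights shown are one of the possible weight sets, not one of the twelve combinations in Table 2.

```python
# Illustrative parameter configuration for the feature-selection run.
# GA defaults are those stated in the text; the equal weights correspond
# to the "balanced" mode rather than a specific Table 2 combination.
ga_config = {
    "min_features": 5,        # minimum number of variables to retain
    "population_size": 60,
    "n_generations": 40,
    "crossover_prob": 0.7,
    "mutation_prob": 0.2,
    "weights": {"VIF": 0.25, "RMSE": 0.25, "R2": 0.25, "MultiR": 0.25},
}
print(ga_config["weights"])
```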
The following phase was correlation analysis and the generation of a correlation plot for the entire dataset based on Pearson Correlation Coefficient values (Figure 2). Figure 2 shows the correlation matrix of 19 key variables related to geopolymer concrete. The relationships between the variables are color-coded based on the correlation coefficient, with red indicating a positive relationship and blue indicating a negative relationship. A very strong negative correlation (r = −0.92) is observed between FA and GGBFS, indicating a strong inverse relationship between the two constituents. Na2SiO3 also shows a high positive correlation with NaOH and total water, since they are often used together in alkaline solutions. The values of water (1), added water, and total water also show relatively high correlations. Most variables related to curing time and temperature (e.g., final curing temperature) are less correlated with other variables, indicating that these parameters are determined independently of the composition of the raw materials. Overall, this matrix accurately shows the internal relationships of geopolymer concrete variables.
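The correlation analysis can be reproduced with a standard Pearson matrix. The toy columns below are illustrative: GGBFS is constructed as a near-inverse of FA to mimic the strong negative correlation (r = −0.92) observed in the real dataset.

```python
# Pearson correlation matrix, as used for Figure 2 (toy data; column names
# illustrative). A heatmap of `corr` reproduces the color-coded plot.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
fa = rng.random(50)
df = pd.DataFrame({
    "FA": fa,
    "GGBFS": 1.0 - fa + 0.05 * rng.random(50),  # nearly inverse of FA
    "NaOH": rng.random(50),
})
corr = df.corr(method="pearson")
print(corr.loc["FA", "GGBFS"])   # strongly negative, analogous to r = -0.92
```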
Following this phase, the dataset with all features is evaluated with a Random Forest Algorithm using 10-fold cross-validation. In this stage, metrics related to linear dependence and model accuracy are provided with several graphs.
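The baseline evaluation can be sketched as below. Synthetic regression data stands in for the 274-sample geopolymer dataset; the Random Forest settings are illustrative, not the study's tuned values.

```python
# Baseline: Random Forest evaluated with 10-fold cross-validation on all
# features, before any feature selection is applied.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=274, n_features=19, noise=10.0, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=10, scoring="r2")
print(scores.mean())   # mean R^2 across the 10 folds
```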

2.3.2. Stage 2a: Feature Selection with Genetic Algorithm (Multi-Objective Optimization)

In this study, we have developed a Genetic Algorithm (GA)-based feature selection method to identify the most relevant input variables. The method optimizes a multi-objective fitness function that balances prediction accuracy, linear dependence, and the number of selected features. Each solution is represented as a binary chromosome, and standard GA operators such as selection, crossover, and mutation are used. The best-performing feature subset is selected based on repeated runs and average model performance. This approach is adaptable to any regression model and improves both efficiency and interpretability compared to traditional methods. It can also be used to tackle high VIF scores caused by potentially redundant features. The GA-based algorithm focuses on the following 4 objectives:
(1) Minimizing VIF
(2) Maximizing R2
(3) Minimizing RMSE
(4) Minimizing Total Multiple R
The following elaborates on each objective:
  • Minimizing VIF (Variance Inflation Factor):
High multicollinearity arises when independent variables are highly correlated, making it difficult to isolate the individual effect of each predictor. This condition increases the standard errors of the regression coefficients and undermines the model’s reliability. The Variance Inflation Factor (VIF) is commonly used to detect such issues; VIF values greater than 10 typically indicate problematic multicollinearity, suggesting the need to remove or combine variables. Rather than merely treating input features as isolated variables, this objective seeks to examine the underlying dependency structure between them. A high VIF implies that a variable carries redundant information already encoded by other predictors, which not only distorts coefficient estimates but also undermines the clarity of model interpretation. By minimizing VIF, we aim to ensure that each selected variable contributes unique and non-overlapping information, thereby promoting a more stable and trustworthy model structure.
  • Maximizing R2 (Coefficient of Determination):
This objective is rooted in the notion that a strong predictive model should be capable of capturing the majority of the variability present in the response variable(s). A high R2 does not merely signal good performance—it indicates alignment between the model’s internal representation and the natural structure of the data. Optimizing for R2 thus reflects an effort to build models that are not just mathematically accurate, but also conceptually coherent with the phenomenon being modeled.
  • Minimizing RMSE (Root Mean Square Error):
While R2 reflects the proportion of variance explained, RMSE directly captures the magnitude of prediction error in the same unit as the response. It penalizes large deviations more severely than smaller ones, making it particularly sensitive to outliers and noise. Minimizing RMSE ensures that the model’s predictions remain not only close to observed values but also consistent and reliable across diverse instances, without major fluctuations in performance.
  • Minimizing Total Multiple R (Custom Metric):
Beyond standard metrics, we introduce Total Multiple R as a tailored indicator of cumulative linear dependence between input features and the target(s). Total Multiple R measures the overall interdependence among selected variables by summing the Multiple Correlation R values for each variable. The calculation method for the Total Multiple R metric is provided in Table 3.
For each independent variable xi, a regression model is constructed using the remaining V-1 independent variables as given in Equation (1):
y_i = β_0 + Σ_{j=1, j≠i}^{V} β_j x_j + ε
where y_i denotes the i-th original variable treated as the dependent variable in the corresponding regression model, the remaining x_j are used as independent variables (IVs) of the original dataset, V is the total number of independent variables, β_0 is the intercept, β_j are the regression coefficients, and ε is the residual error term. The coefficient of determination (R_i^2) is calculated according to Equations (2)–(4).
R_i^2 = 1 − SS_res,i / SS_tot,i
SS_res,i = Σ_{k=1}^{n} (y_{i,k} − ŷ_{i,k})^2
SS_tot,i = Σ_{k=1}^{n} (y_{i,k} − ȳ_i)^2
where y_{i,k} is the observed value of variable i for sample k, ŷ_{i,k} is the predicted value from the regression, ȳ_i is the mean of variable i, and n is the number of samples. The multiple correlation coefficient R_i is calculated as in Equation (5).
R_i = √(R_i^2)
Total multiple R (TotalR) is calculated as an overall metric as given in Equation (6).
TotalR = Σ_{i=1}^{V} R_i
As the range of Total Multiple R is [0, Number of Independent Variables], high total correlation (Total Multiple R) values may reflect a scenario where certain features dominate the output space, leading to overfitting and sensitivity to minor changes in those dominant predictors. By minimizing this metric, we aim to balance the influence of individual features and reduce the risk of overly deterministic or fragile model behavior.
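The per-variable regressions in Equations (1)–(6) also underlie the VIF objective, since VIF_i = 1/(1 − R_i^2). A minimal sketch computing both metrics is given below, assuming scikit-learn and using synthetic data in which one column nearly duplicates another.

```python
# For each variable x_i, regress it on the remaining variables; the resulting
# R_i^2 yields both VIF_i = 1 / (1 - R_i^2) and R_i = sqrt(R_i^2), whose sum
# over all V variables is the Total Multiple R metric.
import numpy as np
from sklearn.linear_model import LinearRegression

def collinearity_metrics(X):
    vifs, total_r = [], 0.0
    for i in range(X.shape[1]):
        others = np.delete(X, i, axis=1)
        r2 = LinearRegression().fit(others, X[:, i]).score(others, X[:, i])
        r2 = min(max(r2, 0.0), 1.0 - 1e-12)   # numerical guard
        vifs.append(1.0 / (1.0 - r2))
        total_r += np.sqrt(r2)
    return np.array(vifs), total_r

rng = np.random.default_rng(0)
a, b = rng.random(100), rng.random(100)
X = np.column_stack([a, b, a + 0.01 * rng.random(100)])  # 3rd ≈ 1st column
vifs, total_r = collinearity_metrics(X)
print(vifs)      # the 1st and 3rd columns show very large VIFs (> 10)
print(total_r)   # bounded by V = 3
```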
The objective function, which aims to fulfill the 4 objectives mentioned above, takes the independent variables of the ML problem and the objective weights as inputs, trains Random Forest models with different combinations of independent variables, then calculates the values of VIF, R2, RMSE, and Total Multiple R, and returns their weighted sum as the objective function output.
The optimization algorithm with 4 objectives has been transformed into a single objective optimization problem using the Weighted Sum Method, as explained in [24], by forming a convex combination of the objectives with the weights provided by the user. In runtime, the user is given an option to run the feature selection process either focusing on one of the competing objectives, i.e., minimization of VIF, maximization of R2, minimization of RMSE, and minimization of Total Multiple Correlation R or choosing a balanced approach. When the user selects one of these 4 approaches, a predefined set of weights is provided for that specific focus. When the user selects the balanced approach, equal weights of 0.25 are assigned for each objective. In this specific study, twelve weight combinations, provided in Table 2, were used, prioritizing both maximizing R2 and minimizing RMSE, while controlling the effects of VIF and Total Multiple R.
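The Weighted Sum Method can be sketched as below, using the balanced weights of 0.25 stated above. Note that the sign convention is an assumption of this sketch: the maximized objective (R2) enters with a negative weight so that a lower scalar score is always better; in practice the four raw objectives would typically also be normalized to comparable scales before aggregation.

```python
# Weighted Sum Method: collapse the four objectives into one scalar score.
# Minimized objectives (VIF, RMSE, TotalR) enter positively, maximized R^2
# enters negatively, so lower scores indicate better solutions.
def scalarize(vif, r2, rmse, total_r, w=(0.25, 0.25, 0.25, 0.25)):
    w_vif, w_r2, w_rmse, w_tr = w
    return w_vif * vif - w_r2 * r2 + w_rmse * rmse + w_tr * total_r

# A subset with lower VIF/RMSE/TotalR and higher R^2 scores lower (better):
good = scalarize(vif=2.0, r2=0.95, rmse=3.0, total_r=1.5)
bad = scalarize(vif=12.0, r2=0.80, rmse=6.0, total_r=4.0)
print(good < bad)   # True
```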
NSGA-II is implemented as the Genetic Algorithm (GA) variant for this multi-objective problem. Random Forest is used inside the genetic algorithm for training as the ML model that is evaluated with different combinations of independent variables. A fixed set of hyperparameters for the Random Forest model is used when evaluating the model with different variable combinations. The minimum number of independent variables to keep was a constraint of the implemented GA (default: 5) and is input by the user before the algorithm initiates. The user is also asked to input hyperparameters of the GA, such as the number of generations [10–100] and population size [20–200].
Following the completion of the multi-objective genetic algorithm optimization process, the algorithm produces a set of non-dominated solutions known as the Pareto front. These solutions represent optimal trade-offs among the four competing objectives: minimization of VIF, maximization of R2, minimization of RMSE, and minimization of Total Multiple Correlation R. The Pareto front analysis phase involves a comprehensive evaluation of all solutions, and the selection of a single optimal solution based on the user-defined focus and preferred weights set. The set of all Pareto optimal solutions forms the Pareto front, which represents the best possible trade-offs achievable among the competing objectives. Solutions in the Pareto front have the property that any improvement in one objective necessarily requires degradation in at least one other objective.
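The non-domination property described above can be made concrete with a small filter. In this sketch all four objectives are expressed as minimization, so R^2 is negated; the three candidate tuples are illustrative values, not results from the study.

```python
# Non-dominated filtering underlying the Pareto front.
def dominates(a, b):
    # a dominates b if it is no worse in every objective
    # and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# (VIF, -R^2, RMSE, TotalR) tuples for three candidate feature subsets;
# the third is dominated by the first in every objective.
sols = [(3.0, -0.95, 4.0, 2.0),
        (2.0, -0.90, 5.0, 1.5),
        (4.0, -0.90, 6.0, 3.0)]
front = pareto_front(sols)
print(front)   # only the first two survive
```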

2.3.3. Stage 2b: Pareto Front Analysis for Multi-Objective Variable Selection

Each individual in the Pareto front is represented as a binary vector b ∈ {0,1}^V, where V is the total number of available variables, bi = 1 indicates variable i is selected, and bi = 0 indicates exclusion. The pseudocode for Pareto front analysis is provided in Table 4.
In the Pareto front analysis, following the initial 3 steps, Step 4 identifies the best solutions for each objective, and Step 5 calculates the weighted (balanced) scores for all possible solutions (i.e., feature subsets). Finally, the optimal solution is selected based on these scores in Step 6. The best solutions for each of the 12 scenarios tested were calculated using this analysis.

2.3.4. Stage 3: Performance Assessment of ML Models with Selected Features

Once the feature selection is complete and the optimal subset of features is selected based on commonly selected features in 12 scenarios, a series of ML models is trained with hyperparameter optimization using this optimal subset. In this stage, the hyperparameter optimization is accomplished through the HyperOpt (version 0.2.7) Python library. This section first presents a brief overview of each of the models evaluated in this stage, and following this, an overview of hyperparameter optimization is provided.
XGBoost: XGBoost is a Gradient Boosting framework that builds decision trees sequentially, with each new tree attempting to reduce the residual errors of the previous ones. Its learning mechanism incorporates both first- and second-order gradients, allowing for precise optimization during training. What sets XGBoost apart is its built-in regularization, which helps prevent overfitting, especially in high-dimensional datasets often encountered in structural and materials engineering. This algorithm has shown superior predictive accuracy in complex regression problems, such as modeling the rheological properties of self-compacting concrete.
CatBoost: CatBoost is another Gradient Boosting algorithm specifically designed to handle categorical features more effectively than traditional models. It introduces a technique known as ordered boosting, which reduces the prediction shift and target leakage that can occur during training. Furthermore, CatBoost internally transforms categorical variables into numerical representations without the need for external encoding, making it highly suitable for construction datasets that include multiple qualitative inputs. Although its training time is typically longer than that of other boosting methods, it often achieves comparable or better accuracy.
LightGBM: LightGBM is a Gradient Boosting algorithm developed for efficiency and scalability. It uses a leaf-wise growth strategy, which typically results in deeper trees and better accuracy compared to level-wise methods. Additionally, it employs techniques such as Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) to reduce computational cost and memory usage. In engineering applications involving large and heterogeneous datasets, LightGBM offers an excellent balance between training speed and prediction performance, making it an attractive option for real-time modeling.
K-Nearest Neighbors: KNN is a non-parametric, instance-based learning algorithm commonly used for regression and classification tasks. It predicts outcomes by identifying the closest data points in the feature space and aggregating their responses. Although not frequently employed in this domain, KNN often serves as a benchmark due to its simplicity and interpretability, particularly when transparency in decision-making is required.
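A minimal sketch of the KNN regression idea, for illustration only (plain Python, Euclidean distance, unweighted average):

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k nearest training points
    (Euclidean distance) -- KNN regression in its simplest form."""
    neighbours = sorted((math.dist(x, query), y)
                        for x, y in zip(train_X, train_y))
    return sum(y for _, y in neighbours[:k]) / k
```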
Random Forest: Random Forest is an ensemble learning algorithm that constructs multiple decision trees using random subsets of data and features. The final prediction is obtained by aggregating the outputs of the individual trees through majority voting (for classification) or averaging (for regression).
Gradient Boosting: In practical engineering problems where data complexity and non-linearity are prevalent, Gradient Boosting has emerged as a reliable modeling strategy. It functions by building a predictive structure layer by layer, correcting prior errors iteratively, which allows for capturing subtle patterns hidden in the data. Rather than relying on a single deep tree, the model constructs a sequence of shallow ones, each fine-tuning the shortcomings of its predecessor. This stepwise refinement not only improves accuracy but also allows engineers to regulate overfitting by adjusting parameters like learning rate and maximum depth. It is particularly valuable in applications involving material strength prediction or load estimation, where precision is essential and training data may be limited.
Decision Tree: For tasks requiring clarity and explainability, Decision Trees serve as an intuitive analytical framework. Their logic mimics human decision-making: sequential, rule-based, and visually traceable. While they may not offer the highest predictive power on their own, they are instrumental in uncovering key variable interactions and segmenting data in a transparent way. This interpretability is especially advantageous in engineering domains, where understanding cause–effect relationships between inputs and outputs can inform both model development and domain-specific decisions.
Extra Trees: In scenarios where model training speed and generalization are crucial, the Extra Trees algorithm offers an efficient yet robust solution. Randomly selecting both features and splitting thresholds, it eliminates the exhaustive search typical of standard decision trees, which drastically reduces training time. Although the model sacrifices fine-tuned decisions at each node, it gains stability and noise tolerance through sheer ensemble diversity. This makes Extra Trees highly practical in settings where data may include measurement uncertainties or where rapid iterations are needed across many design alternatives.
MLP Neural Network: The Multilayer Perceptron (MLP) is a class of artificial neural networks particularly effective for learning nonlinear relationships in engineering systems. With its layered architecture consisting of interconnected neurons distributed across input, hidden, and output layers, MLP can model complex mappings between input features and output responses. Its strength lies in the backpropagation algorithm, which fine-tunes the internal weights based on the discrepancy between predicted and actual values. This adaptability allows MLPs to capture intricate behavioral patterns in domains such as material science, structural health monitoring, or energy consumption forecasting, even when analytical formulations are not feasible.
Ridge Regression: When dealing with multicollinearity among predictors or limited data in high-dimensional spaces, Ridge Regression serves as a disciplined approach to prevent model overfitting. By introducing a penalty term proportional to the square of the coefficients, it constrains the magnitude of parameter estimates, stabilizing the solution. This regularization not only improves predictive generalization but also allows for a more robust interpretation in data-driven engineering studies. Especially in optimization tasks involving multiple correlated inputs, Ridge Regression provides a mathematically sound alternative to standard linear models by managing trade-offs between bias and variance.
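The ridge estimate has a simple closed form, sketched below with numpy; the intercept is omitted for brevity, so the data are assumed centered:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge solution beta = (X'X + alpha*I)^-1 X'y.
    The L2 penalty shrinks the coefficients, stabilizing the fit when
    predictors are strongly correlated (high VIF).  Intercept omitted:
    X and y are assumed centered."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
```

With alpha = 0 this reduces to ordinary least squares; increasing alpha trades a little bias for a large reduction in variance when X'X is ill-conditioned.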
Linear Regression: Linear Regression remains a foundational method in predictive modeling, offering clarity and interpretability in understanding relationships between variables. Its principle assumes that the target variable can be approximated as a weighted sum of the input features, with residuals following a normal distribution. Despite its simplicity, linear regression is valuable in preliminary assessments, sensitivity analyses, or when establishing baseline models for comparison. In engineering contexts, it enables practitioners to quickly identify dominant factors affecting system behavior, while also setting the stage for more advanced modeling if required.
Lasso Regression: Lasso Regression, like Ridge, is a type of penalized linear regression used for feature selection and preventing overfitting. The main difference from Ridge is that Lasso’s penalty can drive some coefficients exactly to zero. This feature allows Lasso to remove unimportant features from the model, creating a simpler and more interpretable model.
Elastic Net: Elastic Net is a combination of Ridge and Lasso regression. This algorithm simultaneously uses both L1 penalty (like Lasso) and L2 penalty (like Ridge). This combination allows Elastic Net to perform feature selection (like Lasso) and effectively manage multicollinearity issues (like Ridge), making it suitable for many datasets.
SVR (Linear): Linear SVR is a version of Support Vector Machine designed for regression problems. Instead of trying to find a line that minimizes error, Linear SVR tries to find a line that keeps most of the data within a specific “margin area”. This makes the model less sensitive to outliers.
SVR (RBF): RBF SVR is also used for regression, but it uses a kernel function called the Radial Basis Function (RBF). This function allows SVR to model complex nonlinear relationships between data. In other words, RBF SVR can find a “line” (actually a curve or hyperplane) that best approximates non-linear data, offering more flexibility than Linear SVR.
In machine learning, hyperparameters are external settings defined by the user before the training process begins. Unlike model weights, which are learned from data, hyperparameters control the overall behavior of the algorithm, influencing aspects such as model complexity, training speed, and generalization ability. Examples include the learning rate, the depth of a decision tree, or the number of estimators in ensemble methods. Selecting appropriate hyperparameters is a critical step, as it directly impacts the model’s accuracy and stability. Hyperparameter optimization aims to systematically determine these parameters to maximize the model’s performance, helping the model produce more stable and accurate results by preventing problems such as over- or under-fitting. Once the feature selection is finalized and the set of optimal features is determined, training with hyperparameter optimization is performed for each ML model (Random Forest, XGBoost, LightGBM, etc.) using the Tree-structured Parzen Estimator (TPE) of the HyperOpt (version 0.2.7) Python library [25]. Following the training stage, the models are evaluated on a separate holdout set.
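The search loop that hyperparameter optimization automates can be sketched as follows; uniform random sampling stands in here for Hyperopt's TPE, which replaces it with model-guided proposals, and the search space below is hypothetical:

```python
import random

def search(objective, space, n_trials=100, seed=0):
    """Minimal hyperparameter search loop: sample a configuration,
    score it, keep the best.  Hyperopt's TPE (fmin with tpe.suggest
    in the paper's pipeline) replaces the uniform sampling below with
    model-guided proposals, but the contract is the same."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Hypothetical grid for a boosting model (parameter names illustrative):
space = {"max_depth": [3, 5, 7, 9], "learning_rate": [0.01, 0.05, 0.1]}
```

The objective passed in practice would be a cross-validated loss (e.g., mean CV RMSE), so that each sampled configuration is scored on out-of-fold data.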

2.3.5. Stage 4: Visualization and Verification

This stage involves the generation of convergence plots, Pareto front plots, and a feature importance plot for the most accurate ML model, in addition to a series of verification checks confirming that the final VIF values are not extreme and that the linear dependency problem has been resolved.

3. Results

The experiment started by providing the Genetic Algorithm with the initial parameter values indicated in Table 5. The same values were used for all runs of the algorithmic pipeline, which was executed 12 times, once for each weight scenario given in Table 2.

3.1. Feature Selection via Genetic Algorithm

Table 6 below summarizes the outputs of the 12 genetic algorithm pipeline runs, one for each weight combination scenario that balances performance optimization against variable interdependency. N vars is the number of features retained at scenario completion; Max VIF is the maximum VIF value exhibited among the retained variables; and R2, RMSE, and Total Multi-R are the values obtained by training the baseline (non-fine-tuned) Random Forest model for that scenario. As expected, decreasing the number of selected features reduced both the predictive power of the model and the total correlation among the independent variables: the Max VIF, R2, and Total Multi-R values decreased, while the RMSE values increased.
Table 7 provides the selected features (variables) for each scenario. Concentration(M) NaOH, NaOH (Dry), GGBFS, Initial curing temp (C), and Coarse Aggregate formed a solid foundation by being selected in all scenarios. Na2O (Dry) and Initial curing rest time (day) were selected in three-quarters of the scenarios, indicating that these variables can also play a significant role in the predictions. Additional water, Initial curing time (day), FA, and Na2SiO3 were selected in only one-quarter of the scenarios, indicating that their contribution to the predictive ability of the model is limited. In total, 11 features were selected through this process; the remaining eight features appear to have either zero or very low impact on the predictive ability of the model.
Each variable subset is selected through the Pareto front analysis process. The weighting scheme transforms the problem from a multi-objective into a single-objective optimization. The balanced score is calculated by first applying min-max normalization to all objective values and then computing the weighted sum (ws) of the normalized values according to the formula below.
ws = (1 − normVIF) ∗ w1 + (normR2) ∗ w2 + (1 − normRMSE) ∗ w3 + (1 − normMultiR) ∗ w4
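A direct reading of this scoring scheme (illustrative, plain Python; objective order and tuple layout assumed):

```python
def balanced_scores(front, weights):
    """front: list of (vif, r2, rmse, multi_r) tuples for the Pareto
    solutions; weights: (w1, w2, w3, w4).  Min-max normalize each
    objective across the front, then apply the weighted sum above
    (minimized objectives enter as 1 - normalized value)."""
    def norm(col):
        lo, hi = min(col), max(col)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]
    n_vif, n_r2, n_rmse, n_mr = (norm(col) for col in zip(*front))
    w1, w2, w3, w4 = weights
    return [(1 - v) * w1 + r * w2 + (1 - e) * w3 + (1 - m) * w4
            for v, r, e, m in zip(n_vif, n_r2, n_rmse, n_mr)]
```

The solution with the highest score is then reported as the balanced optimum for that weight scenario.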
Table 8 provides the Pareto front optimal solutions, the calculated weighted sum scores (Balanced Score), and the selected optimal solution for scenario S4 (chosen as an example because it is the first scenario yielding the lowest VIF score). In the table, the column N Vars gives the number of variables (features) in each optimal solution, along with the maximum VIF value observed for the variables in that subset and the R2, RMSE, and Total Multiple R values obtained in training with the Random Forest model (the only model used in all generations of the GA).
The algorithm also provides the best solution per objective for each scenario. The best solutions per objective for scenario S4 are shown in Table 9.
The solution providing the best Max VIF score was Solution 1, the solutions providing the best R2 and the best RMSE were both Solution 15, and the solution providing the best Total Multiple R was again Solution 1. According to the weights provided for scenario S4 (0.10, 0.40, 0.40, 0.10), the best (balanced) solution was Solution 9, with a Balanced Score of 0.8673. The feature subset in the selected solution included Concentration(M)NaOH, NaOH (Dry), GGBFS, Initial curing temp(C), Coarse Aggregate, Na2O (Dry), and Initial curing rest time (day). The Pareto front plot of the selected solution from the perspective of the different objectives is provided in Figure 3.
The Pareto front visualization in Figure 3 illustrates the trade-off relationships among multicollinearity (VIF), predictive accuracy (R2), prediction error (RMSE), and feature redundancy (Total MultiR). Each red diamond represents a non-dominated solution in which no objective can be improved without degrading at least one other. The optimal solution, indicated by the red star, was selected using the predefined weighted aggregation function. The results show that although slightly higher R2 values can be achieved at the expense of increased multicollinearity and redundancy, the selected solution provides a balanced compromise, maintaining high predictive performance while controlling feature interdependence. This confirms the effectiveness of the proposed multi-objective optimization framework in identifying robust and parsimonious feature subsets without relying solely on predictive accuracy.
As explained previously, the set consisting of Concentration(M)NaOH, NaOH (Dry), GGBFS, Initial curing temp(C), and Coarse Aggregate formed a solid foundation by being selected in all scenarios. Thus, it was decided to use these five features in the further ML training and evaluation stages. Figure 4 presents the correlation plot of the selected variables and the target variable (28 d cubic compressive strength (MPa)).
The Pearson correlation analysis indicated that the 28-day compressive strength exhibits weak-to-moderate linear relationships with the input variables. Most importantly, the absence of strong individual correlations suggests that compressive strength is governed by complex multivariate interactions rather than a single dominant factor, supporting the use of advanced machine learning approaches capable of capturing nonlinear relationships. Among the predictors, NaOH concentration showed the highest positive correlation (r = 0.33), followed by NaOH dry mass (r = 0.25), indicating that alkaline activator properties play a relatively more important role in strength development. We initially had concerns about retaining GGBFS in the selected set: GGBFS content and initial curing temperature exhibited very weak correlations with strength (r = 0.07), suggesting that their individual linear contributions are limited within the studied range, and a strong negative correlation was observed between GGBFS and coarse aggregate (r = −0.92), indicating severe multicollinearity that would in turn increase VIF. Nevertheless, we decided to keep GGBFS because it was selected in all 12 scenarios of the GA optimization process and might therefore contribute substantially to the predictive power of the models.

3.2. Baseline Model Evaluation

The ML pipeline involved 30 runs of baseline model evaluation and multi-model fine-tuning. In the first stage, with Random Forest as the baseline algorithm, the model was evaluated in each run of the ML pipeline using the full input variable set. The dataset was divided into training (80%) and holdout (20%) subsets using a fixed seed in each of the 30 runs to ensure that the baseline and fine-tuned models use the same data splits. Model training and hyperparameter evaluation were performed with 5-fold cross-validation exclusively on the training data. Appendix B provides the performance metrics of the baseline model: the cross-validated R2 represents the average out-of-fold predictive performance, while the holdout R2 reflects the final generalization performance on an independent, unseen dataset. The best-performing run for the fine-tuned models used seed 29; thus, this section focuses on the baseline evaluation metrics for the run with the same seed. The holdout R2 was 0.9306 with RMSE = 3.1516 and MAE = 2.2696, and the R2 for the test folds of the 5-fold cross-validation was 0.8933 with an RMSE of 3.6735 (Figure 5).
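The split protocol described above can be sketched as follows (numpy, illustrative; 274 samples matching the dataset size reported in the Discussion):

```python
import numpy as np

def split_and_folds(n_samples, seed, test_frac=0.2, n_folds=5):
    """Seeded 80/20 train/holdout split, then 5 CV folds carved from
    the training indices only, so the holdout never leaks into model
    selection.  Reusing the same seed reproduces the same split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    holdout, train = order[:n_test], order[n_test:]
    return train, holdout, np.array_split(train, n_folds)
```

Fixing the seed per run is what allows the baseline and fine-tuned models to be compared on identical data partitions.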
The VIF values of each variable were calculated to assess the interdependence between the features used in the baseline Random Forest model. Figure 6 presents the VIF bar graph in descending order: 10 features (Superplasticizer, Additional water, Water (2), NaOH, Total water, Water (1), NaOH (Dry), Na2SiO3, SiO2 (Dry), and Na2O (Dry)) exhibit extremely high VIF values (>1000).
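The VIF of each feature follows from regressing that feature on the remaining ones; a numpy sketch (illustrative):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: regress column j
    on the remaining columns and return 1 / (1 - R2_j).  Values above
    ~10 are the usual red flag; the dataset here shows several > 1000."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2 = 1.0 - float(resid @ resid) / float((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / max(1e-12, 1.0 - r2))
    return np.array(out)
```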
The R2 value for the test set was also high for the Random Forest model, which is known to be robust to high VIF values among predictor variables. However, ML models become much weaker in terms of interpretability when multiple independent variables are strongly correlated, since several variables then explain the same signal. In such cases, Random Forest will often pick one of them for splits, ignore the others, or alternate randomly across trees. This leads to three subtle effects: (i) feature importance becomes unreliable, as Gini or impurity-based importance is diluted across correlated features, so none of them looks important even though the group is; (ii) interpretability decreases, as it is not possible to say “X1 matters more than X2” when both carry the same information; and (iii) model size and noise increase slightly, as redundant features mean more candidate splits and hence more randomness, leading to marginal inefficiency. To ensure robustness in model evaluation, improve accuracy, and achieve explainability in the next stage, the pipeline involved 30 runs of fine-tuning a set of ML models with the exact seeds and data splits used for baseline model evaluation.

3.3. Evaluation of Hyperparameter Optimized Models

In the following stage, a set of ML models was evaluated with 30 runs of the fine-tuning pipeline using the selected features. The dataset was split into train and holdout sets with an 80/20 ratio using the pre-determined seeds, and only the train set was used in the training process with 5-fold cross-validation. The 5-fold cross-validation was conducted on the training dataset to estimate out-of-sample predictive performance, and the trained model was subsequently evaluated on an independent holdout dataset that was not involved in either model fitting or validation. The evaluation metrics for all 30 runs are presented in Appendix C. The cross-validated (CV) R2 represents the mean performance across the out-of-fold test partitions, whereas the holdout R2 reflects performance on a single, unseen realization of the data distribution.
Table 10, Figure 7 and Figure 8 summarize the average performance of the hyperparameter-optimized models after 30 runs of the pipeline, each run with 1000 TPE iterations of the Hyperopt library. The models were trained with the set of five optimal input variables determined through the GA-based feature selection process explained in the previous section.
When the average performance across the out-of-fold test partitions is compared, CatBoost emerged as the best-performing model, followed by XGBoost and Gradient Boosting, and then by ExtraTrees, LightGBM, and Random Forest. These results indicate the dominance of ensemble models, with boosting-based models performing slightly better than bagging-based ones. Among the conventional (non-ensemble) models, SVR performed best, followed by MLP, Decision Tree, and KNN. Linear models, whether penalized or simple linear regression, exhibited the lowest performance of all models. Comparison of the average CV R2 and holdout R2 values revealed no indication of overfitting and no evidence of bias arising from an unbalanced holdout split, as both metrics exhibited consistent and comparable predictive performance.
The best overall holdout performance was exhibited in the 30th run (seed = 29) (Table 11) by the fine-tuned CatBoost model. In this run, the model achieved a holdout R2 score of 0.9627 and an RMSE of 2.3113 with only five input variables (Concentration(M)NaOH, NaOH (Dry), GGBFS, Initial curing temp(C), and Coarse Aggregate). The feature importance plot for CatBoost is given in Figure 9, and Figure 10 provides the predicted vs. actual plot on the holdout set for all algorithms in this run of the pipeline.

3.4. Feature Importance Analysis

The feature importance analysis evaluates how each input variable contributes to the predictive performance of the best-performing model for estimating the output feature. CatBoost was chosen for this examination as it outperformed the other models. We note that the reported importance rankings are model-dependent interpretations rather than universal causal claims. The importance of each variable was determined by assessing the change in the model’s error when that specific variable was used in the prediction process, using each model’s built-in feature_importances_ attribute. The results show that GGBFS has the highest contribution to predictive ability by a significant margin, indicating the important role of blast furnace slag in the mechanical properties of geopolymer concrete. NaOH (Dry) also contributes significantly to the predictive ability of the model. In addition, NaOH concentration, initial curing temperature, and the amount of coarse aggregate were identified as factors affecting the concrete strength.
Although GGBFS exhibited a very weak linear correlation with compressive strength (r = 0.07), it was identified as the most important feature by the CatBoost model. This apparent discrepancy arises because Pearson correlation measures only the individual linear relationship between a predictor and the target variable, whereas CatBoost feature importance reflects the overall predictive contribution of a variable within a multivariate and nonlinear modeling framework. In particular, GGBFS showed a strong negative correlation with coarse aggregate (r = −0.92), indicating substantial multicollinearity, which can suppress its apparent marginal correlation with strength. However, tree-based models such as CatBoost can effectively capture nonlinear and interaction effects between variables. Therefore, the high importance of GGBFS suggests that it plays a critical role in combination with other parameters, even though its individual linear association with strength appears weak. This finding highlights the limitation of relying solely on linear correlation analysis and demonstrates the ability of the proposed machine learning framework to uncover complex multivariate relationships governing compressive strength.
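The distinction between marginal linear correlation and multivariate predictive contribution can be seen in a toy example: a variable can fully determine the target through a nonlinear relationship while showing near-zero Pearson correlation (illustrative, numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 2000)
y = x ** 2                      # y is fully determined by x...
r = np.corrcoef(x, y)[0, 1]     # ...yet Pearson r is near zero,
                                # because the dependence is nonlinear
```

A tree-based model can split on x and recover y exactly, which is the same mechanism by which CatBoost can rank GGBFS highly despite its weak linear correlation with strength.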

4. Discussion

In this study, various machine learning models were used to predict the compressive strength of geopolymer concrete. Boosting-based ensemble models, particularly CatBoost, XGBoost, and Gradient Boosting, achieved the best performance, followed by bagging methods, while SVR was the strongest among base models and linear models performed the worst; moreover, the consistency between CV R2 and holdout R2 indicated reliable generalization without overfitting or holdout bias.
For the 30th run (seed = 29), the Random Forest model trained with the reduced feature set (five variables) achieved a holdout R2 of 0.8775 on the unseen data, while the baseline Random Forest model trained with the full feature set (19 variables) achieved an R2 of 0.9306 under the same conditions. Although the baseline model exhibited slightly higher performance (ΔR2 ≈ 0.05), the overall predictive performances remained comparable. This result, together with the average Random Forest R2 of 0.8716 relative to the baseline model, indicates that hyperparameter optimization largely compensated for the reduction in input dimensionality: decreasing the number of features from 19 to 5 did not cause a substantial or practically significant deterioration in predictive performance, while greatly enhancing the robustness of model interpretability.
A comparison of our findings with the existing literature reveals both alignment with established trends and notable methodological distinctions. Previous studies have consistently reported that parameters such as NaOH molarity, GGBFS proportion, and curing conditions exert a marked influence on geopolymer compressive strength; however, these investigations were often restricted to specific experimental setups or relatively small datasets. In this study, a larger and more diverse dataset comprising 274 mix designs was analyzed, enabling the confirmation of these known effects within a broader parameter space. By combining machine learning models with a multi-objective genetic algorithm, we were able to optimize predictive accuracy while reducing redundancy among variables, ensuring interpretability and thereby offering a practical tool that complements and extends the scope of earlier experimental research.
A comparison of the existing literature and the present study on geopolymer compressive strength prediction is given in Table 12.
This study introduced a combined approach of machine learning and multi-objective genetic algorithms to achieve both high predictive accuracy and stable variable selection. By modeling the effects of various input parameters across a broad dataset, multicollinearity is reduced through GA-based feature selection, enhancing interpretability. The analysis also considered ambient curing conditions, capturing their interactions with other mix components through correlation and network-based approaches. The models reproduce known trends, quantify optimal ranges, and examine interactions among mix design factors. Feature importance results from CatBoost highlight GGBFS and dry NaOH as key predictors, consistent with experimental observations. Overall, this work reinforces established patterns while providing a data-driven predictive framework that combines GA-based feature selection with optimized machine learning models, confirming known trends at a larger scale and contributing methodologically by simultaneously optimizing feature stability and prediction accuracy within a single workflow.

5. Conclusions

The production of ordinary Portland cement is a major source of CO2 emissions, whereas geopolymer concrete significantly reduces environmental impact by utilizing industrial by-products such as fly ash and slag. At the same time, geopolymer concrete can achieve mechanical properties comparable to conventional concrete when key parameters, including precursor composition, activator type, and curing conditions, are properly optimized. These characteristics demonstrate that geopolymer concrete offers an effective balance between environmental sustainability and structural performance.
This research was focused on reinforcing the reliability and transparency of machine learning frameworks used to estimate geopolymer concrete compressive strength. Through implementing a dependency-aware feature selection protocol, we mitigate the negative effects of multicollinearity in a dataset (with many linearly dependent features) while maintaining high predictive precision of the model. The methodology followed a three-fold objective: first, engineering a specialized multi-objective NSGA-II genetic algorithm that filters features based on their linear independence; second, restructuring a high-dimensional geopolymer dataset into an optimized variable subset; and finally, benchmarking the predictive efficacy of twelve hyperparameter-tuned models trained on this refined input dataset. The results have shown that the number of linearly dependent features can be efficiently reduced without sacrificing the prediction accuracy for this dataset. It should be emphasized that the absence of a substantial loss in predictive accuracy between the baseline model trained on the full feature set and the models trained on the reduced feature subset can be largely attributed to the hyperparameter optimization applied to the latter, which enabled the models to maintain comparable performance despite the lower input dimensionality.
The features observed to influence compressive strength prediction were GGBFS, NaOH (Dry), NaOH concentration, initial curing temperature, and coarse aggregate. This highlights the potential of industrial by-products like ground granulated blast furnace slag (GGBFS) as sustainable alternatives to conventional binder constituents. Higher NaOH concentrations, particularly in the 8–16 M range, generally improve strength due to enhanced dissolution of aluminosilicate species.
Geopolymer concrete is particularly attractive for industries requiring high durability and resistance to aggressive environments, such as infrastructure, marine constructions, and precast concrete elements.
To emphasize the practical relevance of the findings, the best-performing ensemble model achieved an R2 of 0.96 on the holdout dataset, with consistently low prediction error across cross-validation folds, confirming its robustness. The integration of correlation diagnostics enabled the selection of a reduced and physically consistent feature subset, mitigating the instability commonly caused by multicollinearity in geopolymer datasets. This structured integration of interpretability within the modeling pipeline improves confidence in feature importance rankings and enhances engineering transparency compared to conventional black-box ML applications.
Nevertheless, the proposed framework is constrained by the moderate dataset size and reliance on a limited number of experimental sources, which may affect external validity. Model performance may also exhibit sensitivity to data partitioning strategies. Future work should focus on expanding the dataset with independent experimental campaigns, testing transferability across different precursor systems, and validating the framework under varying curing and environmental conditions. From a practical standpoint, the developed approach provides a transparent decision-support tool for geopolymer mix design, enabling more efficient parameter prioritization and reducing experimental trial-and-error efforts.
Future research should extend the proposed framework by incorporating larger and multi-source experimental datasets to enhance external validity and improve generalization capability. The integration of additional geopolymer systems (e.g., blended precursors or alternative activator types) would allow evaluation of model transferability across different material compositions. Moreover, combining the current multicollinearity-aware pipeline with advanced explainable AI techniques and uncertainty quantification methods could further strengthen interpretability and reliability. The development of hybrid physics-informed or mechanistic-ML models may also improve extrapolation performance beyond the training domain. Finally, implementing the framework within an optimization environment for automated mix proportioning would enable direct decision-support applications for sustainable geopolymer concrete design in practical engineering settings.

Author Contributions

Conceptualization, G.B. and F.A.; methodology, F.A. and Ü.I.; software, Ü.I.; validation, Ü.I. and F.A.; formal analysis, Ü.I., F.A. and G.B.; investigation, Ü.I. and F.A.; writing—original draft preparation, F.A., Ü.I., G.B. and C.C.; writing—review and editing, Ü.I., C.C., S.M.N., Z.W.G. and G.B.; visualization, Ü.I., C.C. and F.A.; supervision, G.B. and Z.W.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in the Mendeley Data Repository at reference [15].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Step-by-Step Tree Diagram of Experimental Process

1. Load Data
   ├── Read CSV file
   ├── Handle missing values (dropna)
   ├── Apply StandardScaler normalization
   └── Extract dataset info (19 independent variables + 1 target)
2. Initial Correlation Analysis
   ├── Calculate Pearson correlation matrix
   ├── Generate correlation plot (heatmap)
   ├── Create correlation network plot (threshold = 0.7)
   └── Report correlation statistics
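The correlation screening in Step 2 (Pearson matrix plus a network of pairs above the 0.7 threshold) can be sketched without external libraries; the column names and values below are illustrative, not the actual dataset:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def strong_pairs(columns, threshold=0.7):
    """Name pairs whose absolute Pearson correlation exceeds the threshold."""
    names = list(columns)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if abs(r) > threshold:
                pairs.append((names[i], names[j], round(r, 3)))
    return pairs

# Toy columns: total water nearly tracks water, so only that pair is flagged
data = {
    "Water": [42.6, 60.0, 80.0, 100.0, 112.8],
    "Total water": [56.0, 75.0, 98.0, 121.0, 135.0],
    "Concentration": [8.0, 16.0, 10.0, 12.0, 14.0],
}
pairs = strong_pairs(data)
```

Pairs exceeding the threshold are exactly the edges drawn in the correlation network plot.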
3. Genetic Algorithm Configuration
   ├── Set minimum variables (default: 5)
   ├── Configure GA parameters:
   │    ├── Population size (default: 60)
   │    ├── Generations (default: 40)
   │    ├── Crossover probability (default: 0.7)
   │    └── Mutation probability (default: 0.2)
   └── Set objective weights (VIF, R2, RMSE, Total R)

4. Genetic Algorithm (NSGA-II) Optimization (Run 12 Weight Combinations)
   ├── Initialize population (60 individuals, binary encoding)
   ├── For 40 generations:
   │    ├── Evaluate fitness (VIF, R2, RMSE, Total Multiple R)
   │    ├── Select with NSGA-II (Pareto ranking + crowding distance)
   │    ├── Crossover (prob = 0.7, two-point crossover)
   │    ├── Mutate (prob = 0.2, bit flip mutation)
   │    └── Update Pareto Front (Hall of Fame)
   └── Return Pareto optimal solutions
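The variation operators in Step 4 (two-point crossover at probability 0.7, bit-flip mutation at probability 0.2) can be sketched as follows; the parent masks are illustrative, and this is a minimal stand-in rather than the study's actual implementation:

```python
import random

def two_point_crossover(p1, p2, rng):
    """Swap the segment between two cut points of binary feature masks."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def bit_flip(mask, prob, rng):
    """Flip each bit independently with the given probability."""
    return [1 - g if rng.random() < prob else g for g in mask]

rng = random.Random(42)  # seeded for reproducibility
parent1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
parent2 = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
child1, child2 = two_point_crossover(parent1, parent2, rng)
child1 = bit_flip(child1, prob=0.2, rng=rng)
```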

5. Pareto Front Analysis
   ├── Analyze all Pareto solutions
   ├── Find best solution for each objective
   ├── Calculate balanced scores using weighted normalization
   ├── Select optimal solution (maximizes balanced score)
   └── Variable count: Determined by GA results (typically 5–12)
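The balanced score in Step 5 combines min–max normalized objectives under the scenario weights. The sketch below assumes VIF and RMSE are minimized while R2 and Total Multiple R are maximized, with illustrative Pareto-front values; the exact normalization used in the study may differ:

```python
def balanced_scores(objectives, weights, minimize=(True, False, True, False)):
    """Weighted min-max normalized score per solution; higher is better.
    objectives: list of (VIF, R2, RMSE, TotalR) tuples from the Pareto front."""
    n_obj = len(weights)
    lo = [min(o[k] for o in objectives) for k in range(n_obj)]
    hi = [max(o[k] for o in objectives) for k in range(n_obj)]
    scores = []
    for o in objectives:
        s = 0.0
        for k in range(n_obj):
            span = (hi[k] - lo[k]) or 1.0  # guard against a constant objective
            norm = (o[k] - lo[k]) / span
            if minimize[k]:
                norm = 1.0 - norm  # invert so that lower raw values score higher
            s += weights[k] * norm
        scores.append(s)
    return scores

pareto = [(2.1, 0.95, 3.1, 0.90), (8.4, 0.96, 3.0, 0.92), (2.5, 0.90, 3.9, 0.85)]
weights = (0.10, 0.40, 0.40, 0.10)  # scenario S4 from Table 2
scores = balanced_scores(pareto, weights)
best = scores.index(max(scores))
```

Under S4's accuracy-heavy weighting, the second solution wins despite its higher VIF.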

6. Selected Variables Correlation Analysis
   ├── Calculate correlation for selected subset
   ├── Generate heatmap (selected variables only)
   ├── Create network plot (threshold = 0.7)
   └── Compare to baseline correlations
7. Baseline Evaluation (30 runs)
   ├── Calculate VIF for all 19 variables
   ├── Split data 80/20 train/test
   ├── Train Random Forest with k-fold CV
   ├── Evaluate on test set
   ├── Generate outputs:
   │    ├── VIF analysis plot
   │    ├── Predicted vs. true plots
   │    ├── Feature importance plot
   │    └── Baseline report (.txt)
   └── Purpose: Establish reference metrics before optimization
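Step 7 computes the variance inflation factor VIF_j = 1/(1 − R²_j), where R²_j comes from regressing feature j on all remaining features. A dependency-free sketch is given below; the water-related columns are illustrative stand-ins, not the actual dataset values:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def vif(columns):
    """VIF per column: 1 / (1 - R^2 of regressing it on the other columns)."""
    names = list(columns)
    out = {}
    for name in names:
        y = columns[name]
        X = [[1.0] + [columns[o][i] for o in names if o != name]
             for i in range(len(y))]
        p = len(X[0])
        # Normal equations: (X^T X) beta = X^T y
        XtX = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
        Xty = [sum(X[i][a] * y[i] for i in range(len(y))) for a in range(p)]
        beta = solve(XtX, Xty)
        yhat = [sum(r[a] * beta[a] for a in range(p)) for r in X]
        ybar = sum(y) / len(y)
        ss_res = sum((yi - f) ** 2 for yi, f in zip(y, yhat))
        ss_tot = sum((yi - ybar) ** 2 for yi in y)
        r2 = 1.0 - ss_res / ss_tot
        out[name] = 1.0 / max(1.0 - r2, 1e-12)  # guard near-perfect collinearity
    return out

# Total water is nearly the sum of its two components, so its VIF explodes
data = {
    "Water (2)": [13.4, 39.9, 55.0, 73.6, 25.0, 48.0],
    "Additional water": [0.0, 20.6, 50.0, 170.0, 10.0, 80.0],
    "Total water": [13.6, 60.4, 105.3, 243.5, 35.2, 128.2],
    "Initial curing temp": [24.0, 100.0, 60.0, 24.0, 80.0, 24.0],
}
v = vif(data)
```

Values above 10 are the conventional red flag for severe multicollinearity, which is exactly what the additive water columns trigger here.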

8. Training, Tuning and Evaluation of ML models (30 runs)
   ├── Extract selected features (N variables from GA)
   ├── Scale selected features (StandardScaler)
   ├── New Split: 80/20 train/test (Different from baseline split)
   ├── For each of 12–15 models:
   │    ├── Optimize hyperparameters (TPE, 50 evaluations)
   │    ├── k-fold CV on training set ONLY
   │    ├── Train final model on training set
   │    └── Evaluate on holdout test set
   ├── Create predicted vs. true grid for all models
   ├── Intelligent feature importance plotting:
   │    ├── Try best model first
   │    ├── If unsupported, try 2nd best, 3rd best, etc.
   │    └── Plot from first model that supports it
   ├── Rank models by test set R2
   └── Print performance summary table
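The evaluation protocol of Step 8 (cross-validation folds drawn from the training split only, with the holdout set untouched until the end) can be sketched with a stand-in model; the synthetic data and plain k-NN predictor below are purely illustrative:

```python
import random

def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbours regression (a stand-in for the tuned models)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return sum(train_y[i] for i in order[:k]) / k

def r2_score(y_true, y_pred):
    ybar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - ybar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(100)]
y = [3 * a + 2 * b + rng.gauss(0, 0.05) for a, b in X]

# 80/20 split: the holdout indices are never seen during CV
idx = list(range(len(X)))
rng.shuffle(idx)
cut = int(0.8 * len(X))
train, test = idx[:cut], idx[cut:]

# 5-fold CV on the training portion ONLY
fold_r2 = []
for f in range(5):
    val = train[f::5]
    fit = [i for i in train if i not in val]
    preds = [knn_predict([X[i] for i in fit], [y[i] for i in fit], X[j]) for j in val]
    fold_r2.append(r2_score([y[j] for j in val], preds))
cv_r2 = sum(fold_r2) / len(fold_r2)

# Final evaluation on the untouched holdout set
hold_preds = [knn_predict([X[i] for i in train], [y[i] for i in train], X[j]) for j in test]
holdout_r2 = r2_score([y[j] for j in test], hold_preds)
```

Keeping the holdout set out of both tuning and CV is what makes the reported R2_Test an unbiased estimate of generalization.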

9. Final Analysis and Visualization
   ├── Print Final Model Performance and Ranking:
   │    ├── Display All models
   │    ├── Show holdout test metrics: R2_Test, RMSE_Test, MAE_Test
   │    ├── Include CV metrics for comparison: R2_CV, RMSE_CV
   │    ├── Best model selected by R2_Test (holdout performance)
   │    └── Generalization assessment: ΔR2 = |R2_Test − R2_CV|
   ├── Plot GA convergence (4 objectives over 40 generations)
   ├── Plot Pareto front (4 views of trade-offs)
   ├── Plot feature importance for BEST model (by R2_Test)
   └── Compare baseline vs. final performance
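The generalization assessment of Step 9 is a single absolute difference. Using the best CatBoost run from Appendix C (seed 29: holdout R2 0.9627, CV R2 0.9182) as a worked example:

```python
def generalization_gap(r2_test, r2_cv):
    """DeltaR2 = |R2_Test - R2_CV|; small values indicate stable generalization."""
    return abs(r2_test - r2_cv)

# Best CatBoost run in Appendix C (seed 29)
gap = generalization_gap(0.9627, 0.9182)
```

A gap of about 0.045 between holdout and CV scores is modest, consistent with the robustness claim in the conclusions.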

10. Final Verification (embedded in Step 8)
   ├── Calculate final VIF values
   ├── Assess multicollinearity reduction
   ├── Report variable reduction percentage
   ├── Compare baseline vs. final metrics (R2, RMSE)
   └── Summary statistics and conclusions

Appendix B

Baseline Model Metrics (30 Runs)

Model | Holdout R2 | Holdout RMSE | Holdout MAE | CV R2 | CV RMSE | Seed
RF | 0.9226 | 3.2784 | 2.6429 | 0.8963 | 3.5745 | 0
RF | 0.9319 | 3.0384 | 2.291 | 0.8947 | 3.5524 | 1
RF | 0.9322 | 3.3769 | 2.548 | 0.8942 | 3.5051 | 2
RF | 0.8825 | 4.1381 | 2.9361 | 0.9048 | 3.3808 | 3
RF | 0.9102 | 3.0746 | 2.2946 | 0.9084 | 3.3667 | 4
RF | 0.8986 | 3.0843 | 2.1676 | 0.9071 | 3.4802 | 5
RF | 0.9227 | 3.0016 | 2.2783 | 0.9079 | 3.426 | 6
RF | 0.8201 | 3.56 | 2.7877 | 0.9222 | 3.2965 | 7
RF | 0.9356 | 3.0688 | 2.2093 | 0.8958 | 3.5138 | 8
RF | 0.9072 | 3.4149 | 2.5008 | 0.9173 | 3.2699 | 9
RF | 0.9342 | 2.7686 | 2.0312 | 0.878 | 3.9645 | 10
RF | 0.8792 | 3.8861 | 2.969 | 0.9076 | 3.442 | 11
RF | 0.9006 | 3.6712 | 2.7049 | 0.9078 | 3.3133 | 12
RF | 0.9338 | 3.596 | 2.8676 | 0.8974 | 3.3979 | 13
RF | 0.949 | 2.5461 | 1.896 | 0.886 | 3.6622 | 14
RF | 0.8674 | 3.6007 | 2.8647 | 0.9222 | 3.1948 | 15
RF | 0.9383 | 2.9217 | 2.1654 | 0.9177 | 3.2575 | 16
RF | 0.9399 | 3.4124 | 2.5526 | 0.8908 | 3.5125 | 17
RF | 0.8314 | 3.8618 | 2.9215 | 0.9167 | 3.382 | 18
RF | 0.938 | 2.6377 | 1.941 | 0.914 | 3.3207 | 19
RF | 0.9543 | 2.5599 | 1.9562 | 0.8977 | 3.5292 | 20
RF | 0.9019 | 3.1983 | 2.3596 | 0.9056 | 3.5737 | 21
RF | 0.856 | 4.1259 | 2.7964 | 0.9129 | 3.3745 | 22
RF | 0.9411 | 2.9445 | 2.056 | 0.9095 | 3.3722 | 23
RF | 0.9377 | 3.277 | 2.5692 | 0.8943 | 3.2581 | 24
RF | 0.9226 | 2.9211 | 2.2204 | 0.8899 | 3.6107 | 25
RF | 0.925 | 2.8658 | 2.1304 | 0.903 | 3.5624 | 26
RF | 0.9114 | 3.3076 | 2.4315 | 0.9147 | 3.2288 | 27
RF | 0.9138 | 2.8161 | 2.0989 | 0.9102 | 3.5027 | 28
RF | 0.9306 | 3.1516 | 2.2696 | 0.8933 | 3.6735 | 29

Appendix C

Fine-Tuned Model Metrics (30 Runs)

Model | Holdout R2 | Holdout RMSE | Holdout MAE | CV R2 | CV RMSE | Seed
CatBoost0.95012.63282.04130.91663.2080
XGBoost0.91973.33892.35490.89153.65010
LightGBM0.90823.57072.4920.82634.64470
GradientBoosting0.90593.61392.5640.89043.6710
ExtraTrees0.89583.80383.00690.88613.76580
RandomForest0.88264.03733.1820.86464.11540
SVR0.8514.54763.61130.85664.23690
DecisionTree0.83554.77963.90730.75855.44980
MLP0.80135.25284.45360.84694.34040
KNeighbors0.70076.44654.96250.6736.51760
ElasticNet0.64057.06516.04850.61166.98370
Ridge0.64047.0666.04990.61166.98340
Lasso0.64047.0666.04970.61166.98350
LinearRegression0.64037.06656.05050.61166.98330
CatBoost0.93942.86572.05930.91333.21021
XGBoost0.93243.02562.11320.9053.38241
GradientBoosting0.90463.59572.45520.90553.36121
ExtraTrees0.90083.66723.15070.86434.08791
RandomForest0.88873.88323.19770.84874.321
SVR0.86454.28443.24730.84044.38471
LightGBM0.86164.33052.94420.8574.28671
MLP0.85664.40873.67540.85824.12251
DecisionTree0.80135.18863.80990.79095.06121
KNeighbors0.77245.55324.47870.62966.94881
LinearRegression0.67656.62115.3820.5997.0411
Lasso0.67656.62125.3820.5997.04111
Ridge0.67656.62135.38230.5997.0411
ElasticNet0.67646.62185.38280.5997.04121
CatBoost0.95022.89262.17380.90913.24422
XGBoost0.94223.11662.45110.89333.50892
GradientBoosting0.93163.39152.49270.88473.64912
ExtraTrees0.90933.90363.23430.87393.81042
SVR0.89324.23663.3650.81184.682
LightGBM0.89024.29633.07130.83964.38432
RandomForest0.88654.36713.17540.84854.222
DecisionTree0.87864.51643.77980.80084.81982
MLP0.84275.14224.28870.78635.03062
KNeighbors0.72016.85835.05390.60626.81832
LinearRegression0.6887.24135.89340.57866.94412
Ridge0.68797.24235.89360.57866.94412
Lasso0.68797.24255.89360.57866.94422
ElasticNet0.68777.24455.8940.57866.94432
CatBoost0.9193.4362.60970.91413.13673
MLP0.88964.01093.26530.86034.00443
XGBoost0.88254.13792.82130.92143.04953
SVR0.8784.2163.22670.80934.69473
ExtraTrees0.87354.29363.49080.87663.81983
GradientBoosting0.85964.52332.96820.91393.21133
RandomForest0.84254.79133.63860.86224.12163
DecisionTree0.80035.39444.2380.75715.22963
LightGBM0.78415.60923.80230.84224.26033
KNeighbors0.71446.45065.05330.63476.75223
LinearRegression0.676.9345.59340.57777.04273
Lasso0.676.93435.59270.57777.04273
Ridge0.676.93495.58780.57797.0433
ElasticNet0.66996.93575.58430.5787.04323
XGBoost0.89923.25732.51460.90893.34984
GradientBoosting0.89693.29312.5820.90753.37744
CatBoost0.8893.4182.55350.91863.17554
LightGBM0.88523.4762.64560.85344.23144
RandomForest0.86943.70652.95410.86344.18314
ExtraTrees0.86373.78742.94890.88213.89054
SVR0.8464.02523.27930.82154.72774
MLP0.7445.19034.31460.86634.1414
DecisionTree0.7365.27024.46250.76075.46334
KNeighbors0.69965.62214.47310.65966.69974
LinearRegression0.58786.58575.59280.62747.00124
Ridge0.58686.59415.59630.62777.00044
Lasso0.58636.59785.5970.62767.00184
ElasticNet0.58626.59875.5980.62777.00044
CatBoost0.92482.65591.88680.91753.21955
XGBoost0.92152.71331.72320.90463.49685
GradientBoosting0.89913.07622.00020.90333.57345
ExtraTrees0.89243.17672.54680.88134.00735
LightGBM0.88053.34812.55560.86744.20615
RandomForest0.86423.572.77630.86924.21825
SVR0.80844.23973.20750.85344.41865
MLP0.73874.95094.02280.8993.66195
DecisionTree0.7384.95834.0450.83134.73485
KNeighbors0.70025.30343.91120.68546.63245
ElasticNet0.5546.46855.1890.63337.02765
Lasso0.55336.47395.18550.63327.02835
Ridge0.55096.49095.19680.63347.0265
LinearRegression0.5446.54045.21630.63367.02145
XGBoost0.94392.5571.83630.9083.39546
CatBoost0.9432.57731.9860.91463.26326
GradientBoosting0.92792.8992.03940.9053.42226
SVR0.89233.5442.85360.83224.58626
ExtraTrees0.88493.66382.91520.87763.98886
LightGBM0.8773.7873.02020.82844.75316
RandomForest0.87233.85913.10220.85754.33896
MLP0.86553.96013.19530.86744.09416
DecisionTree0.7395.51714.64550.73865.73876
KNeighbors0.71485.76664.61380.6526.89586
LinearRegression0.63766.50065.43150.6247.00166
Lasso0.6376.50595.43450.6247.00276
Ridge0.63626.51395.44180.6247.00276
ElasticNet0.63536.52125.44690.6247.00326
CatBoost0.83673.39212.59450.92913.04997
XGBoost0.80223.73322.92580.93562.9777
ExtraTrees0.80083.74573.04520.89593.76867
LightGBM0.79743.77753.1160.8664.22777
GradientBoosting0.79673.7842.82420.92483.19417
RandomForest0.7883.86433.03170.87784.11617
SVR0.77453.9863.21610.84894.54287
MLP0.73344.33333.63470.84264.6457
KNeighbors0.65324.9433.88960.71036.31767
DecisionTree0.58615.39994.54920.76425.72647
Lasso0.41696.40915.28240.63237.05647
ElasticNet0.41566.41635.28760.63227.05687
Ridge0.41456.42215.29110.63227.0567
LinearRegression0.41226.4355.30040.63227.05437
CatBoost0.94692.78562.10580.90883.26818
XGBoost0.94312.88432.06080.91123.298
GradientBoosting0.92683.27162.31970.90233.41448
LightGBM0.92223.37362.47650.83574.55438
RandomForest0.89163.98123.07470.84814.26758
ExtraTrees0.88644.07553.22770.87623.86088
SVR0.87554.26653.32990.81754.6318
MLP0.85664.57853.59560.85314.16668
DecisionTree0.72866.29854.71670.76295.31998
KNeighbors0.69276.70234.87520.65596.58898
LinearRegression0.67816.86035.69530.60396.95268
Ridge0.67796.86225.69750.60396.95258
Lasso0.67796.8625.69740.60396.95268
ElasticNet0.67766.86565.70130.60396.95268
CatBoost0.92493.07062.09180.92623.11219
XGBoost0.91993.17312.36220.91713.2529
GradientBoosting0.9193.19042.34590.91013.40899
LightGBM0.90493.45562.27410.84114.5219
RandomForest0.90223.50442.640.86584.1519
ExtraTrees0.89223.67962.94560.88473.86319
DecisionTree0.84264.44663.64270.84254.4459
SVR0.84234.45063.39240.83184.61839
KNeighbors0.77275.34334.24270.67046.52299
MLP0.71825.94984.80590.89073.69399
ElasticNet0.51287.82346.42920.65356.5729
Lasso0.51267.82486.42940.65356.57229
LinearRegression0.51267.82536.42950.65356.57229
Ridge0.51267.82466.42940.65356.57219
XGBoost0.91043.23192.23390.90553.488510
ExtraTrees0.90813.27332.71390.85544.379110
CatBoost0.90793.27642.2150.90383.570510
RandomForest0.90473.3332.63340.83774.60810
GradientBoosting0.89783.45172.32680.87763.987110
MLP0.8833.6932.88040.85554.375210
LightGBM0.87543.81082.73630.85744.313810
SVR0.85854.06142.93230.81344.960110
DecisionTree0.8454.25073.42630.78365.297610
KNeighbors0.81254.67453.69890.65216.816710
Lasso0.65166.37295.27260.61367.120410
Ridge0.6516.3785.28660.61397.121610
ElasticNet0.6516.37835.27710.61337.128810
LinearRegression0.64756.41035.32690.61377.111610
XGBoost0.90953.3632.52790.90453.538811
CatBoost0.89823.56712.55020.92982.961811
GradientBoosting0.89043.70112.62520.88383.753311
ExtraTrees0.88243.83413.29040.87793.981711
RandomForest0.88213.83793.08850.86464.285111
LightGBM0.87543.94562.8870.83624.665111
SVR0.83794.50173.48770.81494.681411
MLP0.77145.3454.46740.79984.832411
DecisionTree0.76825.38274.33760.73255.717711
KNeighbors0.76625.40584.08470.60397.352311
LinearRegression0.55777.43526.03540.6147.075211
Ridge0.55767.43576.03580.6147.075411
Lasso0.55767.43576.03570.6147.075511
ElasticNet0.55757.43676.03660.6147.07611
XGBoost0.93922.87112.17120.86773.861912
GradientBoosting0.92923.09762.16580.87973.728112
CatBoost0.92333.22492.39170.89833.402312
ExtraTrees0.86824.22723.43360.86433.973712
RandomForest0.86334.30583.41280.84834.254712
SVR0.86324.30723.43250.82184.490912
LightGBM0.84164.63483.23470.83664.49112
MLP0.83274.76284.0330.84194.294812
LinearRegression0.72236.13635.10910.56587.246112
Lasso0.72226.1375.10940.56587.246212
Ridge0.72076.15425.12010.56657.246412
ElasticNet0.71996.16285.12740.56657.246612
KNeighbors0.69236.45914.80270.59797.179212
DecisionTree0.67116.67785.20140.745.46412
XGBoost0.94953.14012.31510.88643.577113
CatBoost0.94483.28452.53570.90313.298713
GradientBoosting0.94133.3862.55050.88253.629513
LightGBM0.9223.90242.88140.8424.196413
ExtraTrees0.90594.28633.65850.84944.087313
SVR0.87764.88933.910.83634.270813
MLP0.86485.13984.38150.85993.94913
RandomForest0.84515.50094.26290.85024.105713
DecisionTree0.84285.54224.62080.78654.905913
KNeighbors0.64958.27476.28720.59916.663913
LinearRegression0.64278.35466.66510.62266.536113
Ridge0.64148.36986.66480.62286.534913
ElasticNet0.64078.37776.66470.62286.534913
Lasso0.63998.38666.66890.62276.535613
XGBoost0.94862.55661.87710.88713.623314
CatBoost0.94422.66241.9880.8953.39514
GradientBoosting0.93652.84091.95530.88973.548814
ExtraTrees0.92473.09342.56610.84194.300614
LightGBM0.9063.4562.64510.81034.658314
RandomForest0.90473.48052.80580.83114.496714
SVR0.87433.99743.0070.8124.562514
MLP0.85974.22233.56950.84284.167114
DecisionTree0.77325.36934.47920.75255.425214
KNeighbors0.68036.37414.84640.64646.629614
LinearRegression0.67676.41035.32250.57867.090414
Ridge0.67346.44285.32970.58047.0914
Lasso0.67316.44625.32760.57977.093114
ElasticNet0.67166.46065.33640.58067.090914
CatBoost0.88463.362.69980.92633.049215
XGBoost0.87193.53972.6020.90753.412115
GradientBoosting0.86183.67692.77950.88833.728615
ExtraTrees0.85513.76433.02830.89253.747515
RandomForest0.84743.86283.20670.86934.171815
LightGBM0.80214.39953.50070.85294.337215
SVR0.79564.47093.49070.84424.472215
DecisionTree0.76494.79553.58610.80225.063715
MLP0.74435.00074.07990.86224.203715
KNeighbors0.63825.9494.53330.72346.198615
LinearRegression0.59646.28334.7420.62467.005415
Lasso0.59396.30234.73710.62597.00415
Ridge0.59316.30834.74740.62637.001515
ElasticNet0.59156.32124.75630.62657.00215
CatBoost0.94422.77821.79310.9183.222716
XGBoost0.94012.87881.9920.91723.266116
GradientBoosting0.91653.39812.22450.91473.308716
ExtraTrees0.90793.56762.96410.88953.745516
RandomForest0.893.89993.06640.87254.037416
LightGBM0.88863.92552.82820.88643.86316
SVR0.85274.51363.44660.86544.138116
MLP0.80145.23984.1540.89343.646616
DecisionTree0.79075.37974.19560.76455.426816
KNeighbors0.77275.60644.32840.65466.636616
ElasticNet0.59747.46096.00420.63566.775116
Ridge0.59717.46356.00550.63566.775516
Lasso0.59677.46756.00730.63546.776916
LinearRegression0.59667.46816.00770.63546.777116
CatBoost0.94853.15852.41460.89673.370617
GradientBoosting0.93273.6112.53440.89113.482317
XGBoost0.92993.68562.69120.88873.53117
ExtraTrees0.92593.78923.09820.86613.891717
LightGBM0.91823.98242.99910.84634.181317
MLP0.91094.15563.00830.87063.733617
DecisionTree0.90464.29933.32450.7964.726717
RandomForest0.89454.52133.52570.8374.29717
SVR0.88224.77873.65410.81754.434717
KNeighbors0.74357.05015.36030.57916.874317
LinearRegression0.70617.54676.08460.60586.712717
Ridge0.70127.60996.16210.60746.70817
Lasso0.70047.61986.16640.60666.712517
ElasticNet0.69867.64296.19860.60756.708717
ExtraTrees0.84783.66952.9690.90473.642518
XGBoost0.84143.74642.75870.91663.38218
CatBoost0.82813.89972.92940.93562.976418
GradientBoosting0.82223.96612.83060.92053.299818
RandomForest0.81274.07113.31840.88763.96418
LightGBM0.80034.20383.00640.84934.589618
SVR0.74624.73923.63770.86874.300218
KNeighbors0.70285.12814.09290.68996.57418
MLP0.6675.4284.39370.89753.702418
DecisionTree0.57836.10834.87240.78015.512418
Lasso0.49686.67255.30740.64826.971318
ElasticNet0.49656.67435.30960.64826.970818
Ridge0.49546.68185.31270.64826.970618
LinearRegression0.49346.69545.31710.6486.970918
CatBoost0.92222.95472.32380.92723.033719
XGBoost0.91973.00162.31550.93262.961419
GradientBoosting0.91753.04212.30790.91343.341419
RandomForest0.88683.56352.96780.85884.266119
ExtraTrees0.88633.57142.81250.87753.96519
LightGBM0.8783.69982.74970.8544.298419
SVR0.83874.25473.07510.8544.331119
DecisionTree0.80874.63293.62580.79165.148119
MLP0.70555.74934.57970.86884.135319
KNeighbors0.59316.75795.11320.68396.398719
ElasticNet0.54617.1375.80420.65316.75419
Lasso0.54597.1395.80490.65316.753719
Ridge0.54587.13925.80520.65316.753819
LinearRegression0.54577.14015.80550.65316.753619
CatBoost0.95122.64571.97850.90533.385720
XGBoost0.94412.83112.03720.89443.609420
GradientBoosting0.92063.37312.42170.90023.528320
ExtraTrees0.90073.77172.92120.89043.670520
LightGBM0.89413.89682.57870.81574.770520
DecisionTree0.89033.96573.26830.80844.833120
RandomForest0.88794.00763.0380.86214.172920
SVR0.87154.29233.28190.8464.339320
MLP0.78375.56794.52530.84894.240720
KNeighbors0.68056.76694.79240.66526.586320
Ridge0.63137.26995.96440.60396.92920
ElasticNet0.63137.26985.95710.6046.929520
Lasso0.63127.27095.95210.60416.929920
LinearRegression0.63117.27135.9740.60366.928320
CatBoost0.90843.092.15860.92853.115921
GradientBoosting0.90433.15842.23390.89013.785421
XGBoost0.88483.46592.36190.9023.584621
LightGBM0.88453.46972.45310.84554.446921
RandomForest0.86533.74743.03380.8654.282121
ExtraTrees0.83374.16373.27270.88294.001921
SVR0.78214.76543.59450.84874.566721
DecisionTree0.69025.68264.67670.74795.839921
MLP0.68625.71974.72320.87454.180521
KNeighbors0.64946.04524.37150.69566.444621
ElasticNet0.50927.15255.83890.65826.738321
Ridge0.50877.1565.83880.65826.737921
Lasso0.50877.15655.83740.65826.737621
LinearRegression0.50787.1635.83830.65816.736621
CatBoost0.94622.52251.9980.91513.358322
XGBoost0.91933.08892.31590.91513.33822
ExtraTrees0.9193.09562.35660.88023.961622
LightGBM0.90733.31122.22620.85654.343922
GradientBoosting0.88863.62942.59890.90353.555722
RandomForest0.86763.95612.82710.8614.264322
SVR0.85184.18633.35040.83834.603122
MLP0.83844.37113.66080.84074.578722
DecisionTree0.74725.46744.53190.77695.405122
KNeighbors0.69585.9984.32740.69126.376322
ElasticNet0.65556.38235.38630.6466.852822
Lasso0.65526.38555.38230.6466.851922
Ridge0.65496.38855.38650.6466.852322
LinearRegression0.65346.40215.3870.6466.85222
CatBoost0.95072.69541.95220.92023.133523
SVR0.91923.45042.61180.864.188123
ExtraTrees0.91873.46032.73640.87733.916223
GradientBoosting0.91073.62632.42950.91553.279123
XGBoost0.90983.64522.48560.92823.036523
RandomForest0.90873.66822.88440.85494.251123
LightGBM0.86964.38313.0720.83494.532823
MLP0.86214.50773.74320.86354.121123
DecisionTree0.82855.02653.97140.75675.498923
KNeighbors0.78075.68444.21730.62316.914923
LinearRegression0.6896.76865.54310.59787.043323
Lasso0.6896.76925.54380.59787.043423
Ridge0.68886.77055.54530.59787.043423
ElasticNet0.68876.77185.54680.59787.043623
CatBoost0.95512.7832.16120.90233.101724
GradientBoosting0.94723.01792.30910.87863.43324
XGBoost0.94063.19992.50220.88983.307824
LightGBM0.92323.63942.66430.83754.046324
ExtraTrees0.91163.9043.06490.86643.689224
SVR0.87494.64463.64860.81824.248824
RandomForest0.86374.84783.93540.85163.979924
DecisionTree0.85554.99093.86070.8094.377924
MLP0.8465.15284.30060.84043.986124
KNeighbors0.71037.06685.84240.5996.750424
LinearRegression0.64527.82076.60260.5656.817424
Ridge0.64517.82236.60320.5656.817524
Lasso0.64517.82256.60310.5656.817624
ElasticNet0.64487.82586.60440.56516.817724
CatBoost0.93342.71042.16130.91043.36525
GradientBoosting0.89833.34882.22350.90123.53725
XGBoost0.89673.37522.40450.90963.340625
ExtraTrees0.89113.46462.63230.88023.928725
SVR0.87313.74072.84030.81334.890725
RandomForest0.85384.01563.11820.8574.260225
MLP0.83734.23513.56080.85344.315225
LightGBM0.82894.34312.90350.85314.327925
DecisionTree0.72445.51294.73060.74615.645125
KNeighbors0.69885.76314.48830.64426.911325
LinearRegression0.69245.82414.94440.59587.136325
Ridge0.69235.82474.94540.59587.136425
Lasso0.69235.82474.94550.59587.136425
ElasticNet0.69225.82614.94770.59587.136625
XGBoost0.94452.46571.87010.90293.565726
CatBoost0.94432.46871.84110.90423.553726
LightGBM0.92992.77071.93480.83854.553126
GradientBoosting0.92452.87572.1450.89683.679326
ExtraTrees0.90193.27732.62050.8933.771526
SVR0.8853.54762.92760.82374.846226
RandomForest0.85973.91882.95510.87594.067426
KNeighbors0.83334.27233.50090.65966.709426
MLP0.80814.58363.87460.84084.581126
DecisionTree0.7774.94133.93670.74245.779526
ElasticNet0.65666.13175.23440.63597.006126
Lasso0.65636.13435.23190.63587.006726
Ridge0.65596.13765.24440.63597.00626
LinearRegression0.65426.15235.26480.63537.008926
XGBoost0.92832.97491.97570.91783.200127
GradientBoosting0.91353.26772.25090.91413.28227
CatBoost0.89093.67022.46530.92143.096927
ExtraTrees0.88743.72783.05510.89013.686527
RandomForest0.86894.02323.21470.8674.073127
LightGBM0.84944.31143.08140.85724.292327
SVR0.8284.60723.62640.84644.390127
MLP0.82484.65083.72960.9033.47827
DecisionTree0.75915.45354.32570.75465.584727
KNeighbors0.74065.65834.29530.66236.81827
Ridge0.61586.88685.61010.63726.852527
ElasticNet0.61586.88685.61320.63726.852627
Lasso0.61576.88715.6090.63716.852527
LinearRegression0.61576.88695.60740.63716.852527
GradientBoosting0.91882.73431.96170.91123.437628
XGBoost0.9152.79712.08870.91323.348328
LightGBM0.90992.87982.1790.83564.763728
CatBoost0.89833.05952.31840.92813.123928
ExtraTrees0.89283.14162.54030.88563.936828
RandomForest0.87463.39682.5420.87724.101228
SVR0.83983.842.99410.8444.564628
MLP0.79054.39043.67760.86674.218328
DecisionTree0.74524.84233.91690.75335.743428
KNeighbors0.65755.61414.54890.68156.587128
ElasticNet0.5666.31975.39630.64886.897328
Ridge0.56486.32855.40840.64896.896728
Lasso0.5636.34175.42640.6496.895128
LinearRegression0.56266.34425.42960.64916.894928
CatBoost0.96272.31131.74250.91823.222729
XGBoost0.95032.66691.91020.89623.607929
GradientBoosting0.92283.32332.24580.8893.714529
LightGBM0.91373.51472.71740.84394.43929
ExtraTrees0.91363.51632.8570.88143.874829
RandomForest0.87754.18633.44660.85574.277529
SVR0.86184.44673.31730.85134.361229
MLP0.82624.9874.13730.84314.483929
DecisionTree0.80615.26723.96920.81754.792529
KNeighbors0.78765.5134.46980.66586.523229
Lasso0.57467.80246.63560.65676.627329
Ridge0.57457.80336.63620.65696.626229
ElasticNet0.57457.80296.6380.65696.625929
LinearRegression0.57447.80416.63650.65676.627729

References

  1. Meyer, C. The greening of the concrete industry. Cem. Concr. Compos. 2009, 31, 601–605. [Google Scholar] [CrossRef]
  2. Ahmaruzzaman, M. A review on the utilization of fly ash. Prog. Energy Combust. Sci. 2010, 36, 327–363. [Google Scholar] [CrossRef]
  3. Hardjito, D.; Rangan, B.V. Development and Properties of Low-Calcium Fly Ash-Based Geopolymer Concrete; Research Report GC 1; Faculty of Engineering; Curtin University of Technology: Perth, Australia, 2005. [Google Scholar]
  4. Hardjito, D.; Wallah, S.E.; Sumajouw, D.M.J.; Rangan, B.V. On the development of fly ash-based geopolymer concrete. ACI Mater. J. 2004, 101, 467–472. [Google Scholar] [CrossRef] [PubMed]
  5. Khale, D.; Chaudhary, R. Mechanism of geopolymerization and factors influencing its development: A review. J. Mater. Sci. 2007, 42, 729–746. [Google Scholar] [CrossRef]
  6. Nath, P.; Sarker, P.K. Effect of GGBFS on setting, workability and early strength properties of fly ash geopolymer concrete cured in ambient condition. Constr. Build. Mater. 2014, 66, 163–171. [Google Scholar] [CrossRef]
  7. Patankar, S.V.; Ghugal, Y.M.; Jamkar, S.S. Mix design of fly ash-based geopolymer concrete. In Advances in Structural Engineering; Springer: New Delhi, India, 2015. [Google Scholar] [CrossRef]
  8. Ansari, S.S.; Ibrahim, S.M.; Hasan, S.D. Conventional and ensemble machine learning models to predict the compressive strength of fly ash based geopolymer concrete. Mater. Today Proc. 2023, in press. [Google Scholar] [CrossRef]
  9. Gupta, P.; Gupta, N.; Saxena, K.K. Predicting compressive strength of geopolymer concrete using machine learning. Innov. Emerg. Technol. 2023, 10, 2350003. [Google Scholar] [CrossRef]
  10. Ahmad, A.; Ahmad, W.; Chaiyasarn, K.; Ostrowski, K.A.; Aslam, F.; Zajdel, P.; Joyklad, P. Prediction of geopolymer concrete compressive strength using novel machine learning algorithms. Polymers 2021, 13, 3389. [Google Scholar] [CrossRef] [PubMed]
  11. Dash, P.K.; Parhi, S.K.; Patro, S.K.; Panigrahi, R. Influence of chemical constituents of binder and activator in predicting compressive strength of fly ash-based geopolymer concrete using firefly-optimized hybrid ensemble machine learning model. Mater. Today Commun. 2023, 37, 107485. [Google Scholar] [CrossRef]
  12. Hu, T.; Zhang, H.; Cheng, C.; Li, H.; Zhou, J. Explainable machine learning: Compressive strength prediction of FRP-confined concrete column. Mater. Today Commun. 2024, 39, 108883. [Google Scholar] [CrossRef]
  13. Nguyen, T.K.; Huynh, T.A.; Dang, V.H.; Ahmed, A.; Thai, D.-K. From Machine Learning to Empirical Modelling: A Structured Framework for Predicting Compressive Strength of Fly Ash-Based Geopolymer Concrete. Buildings 2025, 16, 123. [Google Scholar] [CrossRef]
  14. Pham, V.N.; Nguyen, V.Q. Effects of alkaline activators and other factors on the properties of geopolymer concrete using industrial wastes based on GEP-based models. Eur. J. Environ. Civ. Eng. 2024, 28, 3770–3792. [Google Scholar] [CrossRef]
  15. Pham, V.N. Compressive strength of geopolymer concrete. Mendeley Data 2023, V1. [Google Scholar] [CrossRef]
  16. Alsalman, A.; Assi, L.N.; Kareem, R.S.; Carter, K.; Ziehl, P. Energy and CO2 emission assessments of alkali-activated concrete and ordinary Portland cement concrete: A comparative analysis of different grades of concrete. Clean. Environ. Syst. 2021, 3, 100047. [Google Scholar] [CrossRef]
  17. Davidovits, J. Geopolymer Chemistry and Applications; Institut Géopolymère: Saint-Quentin, France, 2008. [Google Scholar]
  18. Leonelli, C.; MacKenzie, K.J.; Seo, D.K.; Kriven, W.M. Geopolymer and alkali-activated materials: Chemistry, structure, and properties. Front. Chem. 2022, 10, 929163. [Google Scholar] [CrossRef] [PubMed]
  19. Shi, C.; Krivenko, P.V.; Roy, D. Alkali-Activated Cements and Concretes; Taylor and Francis: London, UK, 2003. [Google Scholar]
  20. Provis, J.L.; Yong, C.Z.; Duxson, P.; van Deventer, J.S.J. Correlating mechanical and thermal properties of sodium silicate–fly ash geopolymers. Colloids Surf. A Physicochem. Eng. Asp. 2009, 336, 57–63. [Google Scholar] [CrossRef]
  21. Habert, G.; d’Espinose de Lacaillerie, J.B.; Roussel, N. An environmental evaluation of geopolymer-based concrete production: Reviewing current research trends. J. Clean. Prod. 2011, 19, 1229–1238. [Google Scholar] [CrossRef]
  22. Temuujin, J.; van Riessen, A.; MacKenzie, K.J.D. Preparation and characterization of fly ash-based geopolymer mortars. Constr. Build. Mater. 2010, 24, 1906–1910. [Google Scholar] [CrossRef]
  23. Pacheco-Torgal, F.; Labrincha, J.; Leonelli, C.; Palomo, A.; Chindaprasit, P. Handbook of Alkali-Activated Cements, Mortars and Concretes; Elsevier: Oxford, UK, 2014. [Google Scholar]
  24. Miettinen, K. Nonlinear Multiobjective Optimization; Springer: New York, NY, USA, 1999. [Google Scholar] [CrossRef]
  25. Archetti, F.; Candelieri, A. Software Resources. In Bayesian Optimization and Data Science; Springer International Publishing: Cham, Switzerland, 2019; pp. 97–109. [Google Scholar]
  26. Wattanasiriwech, S.; Nurgesang, F.A.; Wattanasiriwech, D.; Timakul, P. Characterisation and properties of geopolymer composite part 1: Role of mullite reinforcement. Ceram. Int. 2017, 43, 16055–16062. [Google Scholar] [CrossRef]
  27. Assi, L.N.; Deaver, E.E.; Ziehl, P. Effect of source and particle size distribution on the mechanical and microstructural properties of fly ash-based geopolymer concrete. Constr. Build. Mater. 2018, 167, 372–380. [Google Scholar] [CrossRef]
Figure 1. Experiment Flowchart.
Figure 2. Correlation of the input variables.
Figure 3. Pareto front and objectives.
Figure 4. Correlation plot of selected variables and target variable.
Figure 5. Predicted vs. actual plot for baseline random forest model.
Figure 6. Variance Inflation Factor (VIF) values for input variables.
Figure 7. Mean R2 scores of the models (30 runs).
Figure 8. Mean RMSE scores of the models (30 runs).
Figure 9. Feature importances plot.
Figure 10. Comparison of predicted and true fc [MPa] values for the Holdout Set.
Table 1. Descriptive statistics of input features.
| Variable | Mean | SD | Min | Max |
| --- | --- | --- | --- | --- |
| Fly Ash (FA) | 255.2143 | 91.8794 | 120.000 | 530.000 |
| GGBFS | 164.6996 | 99.8852 | 0.000 | 360.000 |
| Coarse Aggregate | 991.2118 | 199.9069 | 525.4000 | 1276.8000 |
| Fine Aggregate (Sand) | 744.3048 | 84.6714 | 514.000 | 196.4000 |
| Na2SiO3 Solution | 149.2435 | 26.7160 | 76.2500 | 102.8571 |
| NaOH | 59.8875 | 11.7165 | 30.500 | 102.8571 |
| Na2O | 18.4245 | 3.9115 | 11.2100 | 28.8700 |
| SiO2 | 44.5342 | 7.9886 | 22.4200 | 57.8060 |
| Water (1) | 86.2871 | 5.5516 | 42.6238 | 112.8214 |
| Concentration | 8.4689 | 2.2129 | 2.0000 | 16.000 |
| Water (2) | 39.9111 | 10.0020 | 13.4200 | 73.6000 |
| NaOH (Dry) | 19.9764 | 5.2394 | 3.4960 | 38.4000 |
| Additional water | 20.6293 | 53.1391 | 0.0000 | 170.000 |
| Superplasticizer | 11.7201 | 7.3765 | 0.0000 | 18.000 |
| Total water | 153.8571 | 62.0876 | 56.0438 | 303.5336 |
| Initial curing time | 0.2894 | 0.4543 | 0.0000 | 1.000 |
| Initial curing temp | 37.3040 | 21.2549 | 24.000 | 100.000 |
| Initial curing rest time | 0.9048 | 0.3181 | 0.0000 | 2.0000 |
| Final curing temp | 28.3297 | 2.6942 | 24.000 | 30.000 |
Table 2. Weight combinations for objectives.
| Scenario | Weights (VIF, R2, RMSE, MultiR) |
| --- | --- |
| S1 | (0.05, 0.45, 0.45, 0.05) |
| S2 | (0.05, 0.50, 0.40, 0.05) |
| S3 | (0.05, 0.40, 0.50, 0.05) |
| S4 | (0.10, 0.40, 0.40, 0.10) |
| S5 | (0.10, 0.35, 0.45, 0.10) |
| S6 | (0.10, 0.45, 0.35, 0.10) |
| S7 | (0.15, 0.35, 0.35, 0.15) |
| S8 | (0.15, 0.40, 0.30, 0.15) |
| S9 | (0.15, 0.30, 0.40, 0.15) |
| S10 | (0.20, 0.30, 0.30, 0.20) |
| S11 | (0.20, 0.35, 0.25, 0.20) |
| S12 | (0.20, 0.25, 0.35, 0.20) |
Table 3. Calculation method for Total Multiple R.
Single Variable Multiple R: Calculated for each Independent Variable (IV) separately
Step 1: Define regression problem as Equation (1)

Step 2: Calculate R2 (Coefficient of Determination) according to Equations (2)–(4)

Step 3: Calculate Multiple Correlation R according to Equation (5)
  where R_i ∈ [0, 1]
Total Multiple R: Calculated as an overall metric as given in Equation (6)
Properties:
  - Range: [0, V]
  - Minimum: 0 (perfect independence of IVs)
  - Maximum: V (complete redundancy of IVs)
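The quantities defined in Table 3 can be sketched in Python (a minimal illustration of the stated definitions, not the authors' implementation): each Single Variable Multiple R, R_i, is the square root of the R² obtained by regressing independent variable i on the remaining IVs, and Total Multiple R is the sum of the R_i over all V selected variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def total_multiple_r(X):
    """Total Multiple R = sum over columns of R_i, where each
    R_i = sqrt(R^2) from regressing column i on the other columns.
    Ranges from 0 (fully independent IVs) to V (fully redundant IVs)."""
    r_scores = []
    for i in range(X.shape[1]):
        others = np.delete(X, i, axis=1)
        r2 = LinearRegression().fit(others, X[:, i]).score(others, X[:, i])
        r_scores.append(float(np.sqrt(max(r2, 0.0))))  # clamp rounding noise
    return sum(r_scores), r_scores
```

For two independent columns the total is near 0; for two perfectly collinear columns it reaches 2, consistent with the [0, V] range stated above.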
Table 4. Pseudocode for Pareto front analysis.
ALGORITHM: Pareto Front Analysis and Optimal Selection

INPUT:
  pareto_front: Set of non-dominated solutions from NSGA-II
  X_scaled: Standardized feature matrix
  y: Target variable
  weights: [w1, w2, w3, w4] objective weights

OUTPUT:
  optimal_solution: Selected best compromise solution
  pareto_df: DataFrame with all solutions and metrics
  best_by_objective: Dictionary of best solutions per objective

PROCEDURE:

1. Initialize result storage
  pareto_solutions ← []

2. FOR EACH individual IN pareto_front DO

  (a) Decode binary representation
     variable_mask ← individual.binary_vector
     selected_vars ← DECODE(variable_mask)
     n_vars ← COUNT(selected_vars)

  (b) Extract subset data
     X_subset ← X_scaled[:, selected_vars]

  (c) Calculate VIF
     max_vif ← CALCULATE_MAX_VIF(X_subset, selected_vars)

  (d) Evaluate model performance (5-fold CV)
     r2, rmse ← EVALUATE_RANDOM_FOREST(X_subset, y)

  (e) Calculate Total Multiple R
     total_r ← 0
     individual_r_scores ← []

     FOR EACH var_idx IN selected_vars DO
     r_value ← CALCULATE_SINGLE_R(var_idx, X_subset)
     individual_r_scores.APPEND(r_value)
     total_r ← total_r + r_value
     END FOR

     avg_r ← total_r/n_vars

  (f) Store comprehensive metrics
     solution ← {
        'solution_id': i,
        'n_variables': n_vars,
        'max_vif': max_vif,
        'r2': r2,
        'rmse': rmse,
        'total_multiple_r': total_r,
        'avg_multiple_r': avg_r,
        'variables': selected_vars,
        'individual_r_scores': individual_r_scores,
        'fitness': individual.fitness.values
     }
     pareto_solutions.APPEND(solution)

3. Create DataFrame
  pareto_df ← DATAFRAME(pareto_solutions)

4. Identify best solutions by individual objectives
  best_by_objective ← {
     'vif': pareto_df.LOC[pareto_df['max_vif'].IDXMIN()],
     'r2': pareto_df.LOC[pareto_df['r2'].IDXMAX()],
     'rmse': pareto_df.LOC[pareto_df['rmse'].IDXMIN()],
     'total_r': pareto_df.LOC[pareto_df['total_multiple_r'].IDXMIN()]
  }

5. Calculate balanced (i.e., weighted) scores for all solutions
  FOR EACH solution IN pareto_df DO
     // Normalize objectives (smaller VIF, RMSE and Multiple R map to higher scores)
     norm_vif ← 1/(1 + solution['max_vif']/100)
     norm_r2 ← solution['r2']
     norm_rmse ← 1/(1 + solution['rmse']/10)
     norm_total_r ← 1/(1 + solution['total_multiple_r']/10)

     // Weighted aggregation
     balanced_score ← w1 × norm_vif + w2 × norm_r2 + w3 × norm_rmse + w4 × norm_total_r
     solution['balanced_score'] ← balanced_score
  END FOR

6. Select optimal solution
  optimal_solution ← pareto_df.LOC[pareto_df['balanced_score'].IDXMAX()]

7. RETURN optimal_solution, pareto_df, best_by_objective

END ALGORITHM
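The scoring and selection steps of the pseudocode in Table 4 can be transcribed directly into Python (a sketch using the normalizations shown in the pseudocode; the solution dictionaries and weight tuple below are illustrative stand-ins, not the paper's data):

```python
def balanced_score(sol, weights):
    """Weighted compromise score (Table 4, step 5): VIF, RMSE and Total
    Multiple R are mapped into (0, 1] so that smaller raw values score
    higher, while R^2 is already in [0, 1] and is used as-is."""
    w1, w2, w3, w4 = weights
    norm_vif = 1.0 / (1.0 + sol["max_vif"] / 100.0)
    norm_rmse = 1.0 / (1.0 + sol["rmse"] / 10.0)
    norm_total_r = 1.0 / (1.0 + sol["total_multiple_r"] / 10.0)
    return w1 * norm_vif + w2 * sol["r2"] + w3 * norm_rmse + w4 * norm_total_r


def select_optimal(solutions, weights):
    """Step 6: the non-dominated solution with the highest score wins."""
    return max(solutions, key=lambda s: balanced_score(s, weights))
```

With a weight tuple such as S4's (0.10, 0.40, 0.40, 0.10), `select_optimal` returns the best compromise among the Pareto-front candidates.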
Table 5. Initial Parameter Values for the Experiment.
| Parameter | Value |
| --- | --- |
| GA Population size | 60 |
| GA Generations | 40 |
| Objective weights | 12 scenarios |
| Dataset variables | 19 |
| Minimum variables to keep | 5 |
| Cross-validation folds | 5 |
| Random state | 42 |
Table 6. Optimal solutions for genetic algorithm runs with different weight scenarios.
| Scenario | Weights (VIF, R2, RMSE, MultiR) | N Vars | Max VIF of Solution | R2 | RMSE | Total Multi-R |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | (0.05, 0.45, 0.45, 0.05) | 9 | 36.01 | 0.9156 | 3.3044 | 8.2536 |
| S2 | (0.05, 0.50, 0.40, 0.05) | 9 | 36.01 | 0.9156 | 3.3044 | 8.2536 |
| S3 | (0.05, 0.40, 0.50, 0.05) | 9 | 36.01 | 0.9156 | 3.3044 | 8.2536 |
| S4 | (0.10, 0.40, 0.40, 0.10) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
| S5 | (0.10, 0.35, 0.45, 0.10) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
| S6 | (0.10, 0.45, 0.35, 0.10) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
| S7 | (0.15, 0.35, 0.35, 0.15) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
| S8 | (0.15, 0.40, 0.30, 0.15) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
| S9 | (0.15, 0.30, 0.40, 0.15) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
| S10 | (0.20, 0.30, 0.30, 0.20) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
| S11 | (0.20, 0.35, 0.25, 0.20) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
| S12 | (0.20, 0.25, 0.35, 0.20) | 7 | 11.76 | 0.9142 | 3.3294 | 5.7507 |
Table 7. Variable selection stability across weight scenarios.
| Variable | Times Selected | Scenarios |
| --- | --- | --- |
| Concentration (M) NaOH | 12/12 | S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12 |
| NaOH (Dry) | 12/12 | S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12 |
| GGBFS | 12/12 | S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12 |
| Initial curing temp (C) | 12/12 | S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12 |
| Coarse Aggregate | 12/12 | S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12 |
| Na2O (Dry) | 9/12 | S4, S5, S6, S7, S8, S9, S10, S11, S12 |
| Initial curing rest time (day) | 9/12 | S4, S5, S6, S7, S8, S9, S10, S11, S12 |
| Additional water | 3/12 | S1, S2, S3 |
| Initial curing time (day) | 3/12 | S1, S2, S3 |
| FA | 3/12 | S1, S2, S3 |
| Na2SiO3 | 3/12 | S1, S2, S3 |
Table 8. S4 Pareto front optimal solutions.
| Solution ID | N Vars | Max VIF | R2 | RMSE | Total Multiple R | Balanced Score (ws) | Optimal |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 5 | 1.7866 | 0.8902 | 3.7692 | 2.9706 | 0.2000 | False |
| 2 | 5 | 2.5093 | 0.8974 | 3.6401 | 3.1931 | 0.4082 | False |
| 3 | 5 | 2.6997 | 0.8927 | 3.729 | 3.1785 | 0.2662 | False |
| 4 | 5 | 2.8235 | 0.9032 | 3.5359 | 3.4096 | 0.576 | False |
| 5 | 5 | 2.9639 | 0.8905 | 3.7688 | 3.1313 | 0.2005 | False |
| 6 | 6 | 3.1136 | 0.9109 | 3.3943 | 4.3889 | 0.7919 | False |
| 7 | 7 | 3.7254 | 0.9122 | 3.3698 | 5.4745 | 0.8129 | False |
| 8 | 7 | 11.6182 | 0.9124 | 3.3646 | 5.6581 | 0.8129 | False |
| 9 | 7 | 11.7611 | 0.9142 | 3.3294 | 5.7507 | 0.8673 | True |
| 10 | 10 | 26.2126 | 0.9158 | 3.304 | 9.2227 | 0.8487 | False |
| 11 | 6 | 26.8715 | 0.9116 | 3.3814 | 4.6817 | 0.7932 | False |
| 12 | 9 | 36.0141 | 0.9156 | 3.3044 | 8.2536 | 0.8548 | False |
| 13 | 9 | 37.8616 | 0.9142 | 3.3308 | 7.8009 | 0.8189 | False |
| 14 | 10 | 162.9658 | 0.9164 | 3.2878 | 9.4148 | 0.7855 | False |
| 15 | 10 | 167.6364 | 0.917 | 3.2781 | 9.3437 | 0.8011 | False |
Table 9. Best solution per objective for S4.
| Criterion | Solution ID | N Vars | VIF | R2 | RMSE | Multi-R |
| --- | --- | --- | --- | --- | --- | --- |
| Min VIF | 1 | 5 | 1.79 | 0.8902 | 3.7692 | 2.9706 |
| Max R2 | 15 | 10 | 167.64 | 0.917 | 3.2781 | 9.3437 |
| Min RMSE | 15 | 10 | 167.64 | 0.917 | 3.2781 | 9.3437 |
| Min Multi-R | 1 | 5 | 1.79 | 0.8902 | 3.7692 | 2.9706 |
Table 10. Average metrics for fine-tuned models (30 runs).
| Model | Holdout R2 | Holdout RMSE | Holdout MAE | CV R2 | CV RMSE |
| --- | --- | --- | --- | --- | --- |
| CatBoost | 0.9237 | 2.995 | 2.224 | 0.9152 | 3.2209 |
| XGBoost | 0.9167 | 3.1488 | 2.2867 | 0.906 | 3.3978 |
| Gradient Boosting | 0.9054 | 3.3722 | 2.3904 | 0.8996 | 3.5108 |
| ExtraTrees | 0.8894 | 3.6799 | 2.9701 | 0.8782 | 3.9005 |
| LightGBM | 0.8776 | 3.8502 | 2.7892 | 0.8449 | 4.406 |
| Random Forest | 0.8716 | 3.9903 | 3.1352 | 0.8597 | 4.2089 |
| SVR | 0.8513 | 4.2611 | 3.2999 | 0.8354 | 4.5054 |
| MLP | 0.8064 | 4.8227 | 3.9577 | 0.8579 | 4.1708 |
| Decision Tree | 0.7752 | 5.1786 | 4.1551 | 0.775 | 5.2726 |
| KNeighbors | 0.7142 | 5.9684 | 4.5852 | 0.6532 | 6.688 |
| Lasso | 0.6156 | 6.8971 | 5.6716 | 0.6184 | 6.9271 |
| ElasticNet | 0.6154 | 6.8994 | 5.6747 | 0.6185 | 6.9271 |
| Ridge | 0.6154 | 6.8981 | 5.674 | 0.6185 | 6.9266 |
| Linear Regression | 0.6153 | 6.8976 | 5.6743 | 0.6183 | 6.9264 |
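Tables 10 and 11 report both holdout and 5-fold cross-validation metrics per model. A minimal sketch of such an evaluation loop is given below; the synthetic dataset, 80/20 split and RandomForest stand-in are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

def evaluate_model(model, X, y, seed=42):
    """Return holdout R2/RMSE/MAE plus mean 5-fold CV R2 for one seed."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    # CV runs on the training portion only, so the holdout stays unseen
    cv_r2 = cross_val_score(model, X_tr, y_tr, cv=5, scoring="r2").mean()
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return {
        "holdout_r2": r2_score(y_te, pred),
        "holdout_rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
        "holdout_mae": mean_absolute_error(y_te, pred),
        "cv_r2": float(cv_r2),
    }

# Synthetic regression data as a stand-in for the 274-record GPC dataset
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=5.0,
                                 random_state=0)
metrics = evaluate_model(
    RandomForestRegressor(n_estimators=100, random_state=0), X_demo, y_demo)
```

Averaging over 30 seeds, as in Table 10, amounts to repeating `evaluate_model` with seed = 0..29 and taking the mean of each metric.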
Table 11. Performance metrics for fine-tuned models (30th run).
| Model | Holdout R2 | Holdout RMSE | Holdout MAE | CV R2 | CV RMSE | Seed |
| --- | --- | --- | --- | --- | --- | --- |
| CatBoost | 0.9627 | 2.3113 | 1.7425 | 0.9182 | 3.2227 | 29 |
| GradientBoosting | 0.9228 | 3.3233 | 2.2458 | 0.889 | 3.7145 | 29 |
| LightGBM | 0.9137 | 3.5147 | 2.7174 | 0.8439 | 4.439 | 29 |
| ExtraTrees | 0.9136 | 3.5163 | 2.857 | 0.8814 | 3.8748 | 29 |
| RandomForest | 0.8775 | 4.1863 | 3.4466 | 0.8557 | 4.2775 | 29 |
| SVR | 0.8618 | 4.4467 | 3.3173 | 0.8513 | 4.3612 | 29 |
| MLP | 0.8262 | 4.987 | 4.1373 | 0.8431 | 4.4839 | 29 |
| DecisionTree | 0.8061 | 5.2672 | 3.9692 | 0.8175 | 4.7925 | 29 |
| KNeighbors | 0.7876 | 5.513 | 4.4698 | 0.6658 | 6.5232 | 29 |
| Lasso | 0.5746 | 7.8024 | 6.6356 | 0.6567 | 6.6273 | 29 |
| Ridge | 0.5745 | 7.8033 | 6.6362 | 0.6569 | 6.6262 | 29 |
| ElasticNet | 0.5745 | 7.8029 | 6.638 | 0.6569 | 6.6259 | 29 |
| LinearRegression | 0.5744 | 7.8041 | 6.6365 | 0.6567 | 6.6277 | 29 |
Table 12. Comparative summary of existing literature and the present study on geopolymer compressive strength prediction.
| Literature | Data/Experimental Conditions | Methods/Analysis | Key Findings |
| --- | --- | --- | --- |
| Pham and Nguyen (2024) [14] | Broadly compiled industrial geopolymer mixes (various FA/GGBFS blends). | Gene Expression Programming (GEP)/experimental data compilation. | Highlighted the strong effect of alkaline activators and raw material composition. |
| Hardjito and Rangan (2005) [3] | Low-calcium FA, controlled curing (high temperature). | Experimental mechanical tests; microstructural analysis. | Demonstrated that heat curing accelerates early strength gain, with responses varying by mix composition. |
| Wattanasiriwech et al. (2017) [26] | Sequential curing at 90 °C, followed by humid and then dry curing at 40 °C. | Experimental strength testing. | Showed that multi-stage curing can produce consistently high strengths. |
| Assi (2016) [27] | — | Experimental tests and microstructural evaluation. | Reported that silica fume-based systems are less sensitive to curing conditions than sodium silicate systems. |
| Present Study | 274 experimental records covering 19 variables (FA, GGBFS, Na2SiO3, NaOH, curing parameters, aggregates, water, SP, etc.). | Data preprocessing, correlation and network analysis, VIF calculation, multi-objective GA for feature selection, hyperparameter tuning; models: RF, XGBoost, LightGBM, CatBoost, SVR, MLP, etc. | Best performance achieved with CatBoost (R2 = 0.9583). GGBFS identified as most influential feature; NaOH (Dry) is also critical for estimation. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ahadian, F.; Işıkdağ, Ü.; Bekdaş, G.; Nigdeli, S.M.; Cakiroglu, C.; Geem, Z.W. Interpretable Machine Learning for Compressive Strength Prediction of Fly Ash-Based Geopolymer Concrete. Sustainability 2026, 18, 2227. https://doi.org/10.3390/su18052227

