A Hybrid Clustering–Classification Approach for Predicting Strength and Analyzing Material Composition of Geopolymers
Abstract
1. Introduction
- Development of a hybrid model that combines K-means clustering with machine learning classification to analyze material composition.
- Evaluation of multiple classification algorithms: Random Forest (RF), Naive Bayes (NB), Artificial Neural Networks (ANNs), LightGBM, and Linear Discriminant Analysis (LDA).
- Demonstration that the clustering–classification approach improves prediction accuracy over standalone classification.
- Insight into how the compositional groups identified by clustering relate to material characteristics.
2. Methodology
2.1. Dataset Description and Preprocessing
2.1.1. Dataset Overview and Analysis
2.1.2. Data Preprocessing
- Data Validation: All samples were checked for completeness. The elemental percentages of each sample were confirmed to sum to 100% within a tolerance of 2%, which allows for unmeasured trace elements.
- Feature Engineering: Two elemental ratios, Si/Al and Ca/Si, were calculated to capture important compositional relationships.
- Normalization: All features were standardized with Z-score normalization (mean = 0, standard deviation = 1) so that each feature carries equal weight in the distance computations of the clustering algorithm.
- Outlier Detection: The interquartile range (IQR) method was applied to identify extreme outliers. No outliers were detected in any feature, so all 115 samples were retained.
- Data Splitting: The data were divided into training and testing sets with an 80-20 split, stratified so that the original distribution of ‘Mixture Number’ across the mixture types was preserved.
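The preprocessing steps above can be sketched in plain Python. This is a minimal illustration and not the authors' code: the dictionary layout, the function names, the convention of setting a ratio to 0 when its denominator is 0, the IQR multiplier of 3 for "extreme" outliers, and the use of the population standard deviation are all assumptions made for the example.

```python
import random
import statistics
from collections import defaultdict

# Element columns as listed in the descriptive statistics table.
ELEMENTS = ["O", "Si", "Al", "Na", "C", "Ca", "K"]


def composition_is_valid(sample, tolerance=2.0):
    """True if the elemental percentages sum to 100 within the tolerance."""
    total = sum(sample[e] for e in ELEMENTS)
    return abs(total - 100.0) <= tolerance


def add_ratios(sample):
    """Return a copy with Si/Al and Ca/Si added (assumed 0.0 if the denominator is 0)."""
    out = dict(sample)
    out["Si/Al"] = sample["Si"] / sample["Al"] if sample["Al"] else 0.0
    out["Ca/Si"] = sample["Ca"] / sample["Si"] if sample["Si"] else 0.0
    return out


def zscore(values):
    """Standardize one feature to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


def iqr_outliers(values, k=3.0):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]; k=3 is an assumed cutoff."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each label keeps about the same share in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

In practice the same steps are usually written with pandas and scikit-learn (for example `StandardScaler` and `train_test_split(..., stratify=...)`); the stdlib version is used here only to keep the sketch self-contained.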
2.2. Clustering with K-Means
K-Means Algorithm for Determining the Optimal Number of Clusters
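As a rough sketch of how K-means and the choice of the number of clusters fit together, the code below runs Lloyd's algorithm and reports the inertia (within-cluster sum of squares) for several values of k, which is the quantity inspected in the elbow heuristic. Whether the authors used the elbow heuristic is an assumption. The deterministic farthest-first initialization is also an assumption, swapped in for the k-means++ initialization listed in the hyperparameter table so that the example is reproducible without a random seed.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=300):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    inertia = sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    )
    return labels, centroids, inertia


def inertia_curve(points, k_values):
    """Inertia for each candidate k; the 'elbow' is where the decrease flattens."""
    return {k: kmeans(points, k)[2] for k in k_values}
```

With standardized features as input, `inertia_curve(points, range(1, 11))` gives the curve whose bend suggests a candidate number of clusters. A silhouette score is a common alternative criterion.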
2.3. Classification Using Machine Learning
2.4. Feature Importance Analysis
2.4.1. SHAP (SHapley Additive exPlanations)
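To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a small model by enumerating every feature coalition. It is an illustration under stated assumptions, not the SHAP library and not the authors' pipeline: the function name `shapley_values`, the toy models, and the choice of replacing "absent" features with a single baseline vector are hypothetical. The SHAP library estimates the same quantities with faster, model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value, the rest take the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

The contributions always sum to `model(x) - model(baseline)`. A positive value means the feature pushed the prediction up relative to the baseline, and a negative value means it pushed the prediction down.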
2.4.2. Permutation Importance
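Permutation importance can be sketched as follows: shuffle one feature column, measure how much the accuracy drops, and repeat to obtain a mean and a standard deviation per feature. This is a minimal stdlib illustration. The `predict` callable, the number of repeats, and the use of accuracy as the score are assumptions; scikit-learn's `sklearn.inspection.permutation_importance` offers the same procedure for fitted estimators.

```python
import random
import statistics


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Return one (mean drop, std of drop) pair per feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    result = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        result.append((statistics.fmean(drops), statistics.pstdev(drops)))
    return result
```

A feature the model never uses gets an importance of exactly zero, while a feature the model relies on shows a positive mean drop. Here `rows` is expected to be a list of lists, one inner list per sample.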
2.5. Evaluation Metrics
3. Results
3.1. Performance Results
3.2. SHAP and Permutation Analysis Results
3.3. Evaluation of the Hybrid Framework
3.4. Robustness Results of the Hybrid Model
4. Discussion
5. Conclusions
- After clustering, the Random Forest model achieved the highest classification performance, with accuracy and precision of 98%.
- In the cross-validation analysis, performance was consistently high across all folds; the best prediction performance was obtained in fold 3, with an R² value of 0.98.
- In both the SHAP and permutation importance analyses, the Si/Al ratio was the most important input feature, followed by the elemental contents of O, Si, and Al.
- All algorithms achieved their highest accuracy with a training ratio of 80%. Reducing the training ratio lowered the performance of every algorithm.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Amran, Y.H.M.; Alyousef, R.; Alabduljabbar, H.; El-Zeadani, M. Clean production and properties of geopolymer concrete: A review. J. Clean. Prod. 2020, 251, 119679.
- El Inaty, F.; Nasreddine, H.; Djerbi, A.; Gautron, L.; Marchetti, M.; Quiertant, M.; Metalssi, O.O. Mechanical and durability performance of metakaolin and fly ash-based geopolymers compared to cement systems. Results Eng. 2025, 27, 105788.
- El Inaty, F.; Marchetti, M.; Quiertant, M.; Metalssi, O.O. Effect of curing on the coupled attack of sulfate and chloride ions on low-carbon cementitious materials including slag, fly ash, and metakaolin. Constr. Build. Mater. 2024, 438, 137307.
- Bediako, M.; Valentini, L. Strength, carbon emissions, and sorptivity behavior of cement paste and mortar containing thermally activated clay. J. Build. Eng. 2024, 89, 109278.
- Gowri, K.; Rahim, A.A. Investigation on rheological performance of indigenously developed sustainable low clinker hybrid cementing binders. Results Eng. 2024, 24, 103543.
- Ho, L.S.; Van Quang, L.; Van-Pham, D.-T.; Huynh, T.-P. Strength development and microstructural characterization of eco-cement paste with high-volume fly ash. Results Eng. 2025, 26, 105013.
- Philip, S.; Marakkath, N. Compressive strength prediction and feature analysis for GGBS-Based geopolymer concrete using optimized XGBoost and SHAP: A comparative study of optimization algorithms and experimental validation. J. Build. Eng. 2025, 108, 112879.
- Borçato, A.G.; Casali, J.M.; Betioli, A.M.; Medeiros-Junior, R.A. Development of eco-friendly brick waste-based geopolymers: Effect of calcium incorporation on rheology, compressive strength, microstructure, and eco-efficiency. J. Build. Eng. 2025, 111, 113101.
- Asrani, N.P.; Murali, G.; Abdelgader, H.S.; Parthiban, K.; Haridharan, M.K.; Karthikeyan, K. Investigation on Mode I Fracture Behavior of Hybrid Fiber-Reinforced Geopolymer Composites. Arab. J. Sci. Eng. 2019, 44, 8545–8555.
- Sinkhonde, D.; Mirindi, D.; Dabakuyo, I.; Bezabih, T.; Moffo, N.D.N.; Mirindi, F. Predicting the compressive strength of laterite blocks stabilized with metakaolin geopolymer and sugarcane molasses via machine learning. Clean. Waste Syst. 2025, 12, 100352.
- Yang, K.-H.; Song, J.-K.; Song, K.-I. Assessment of CO2 reduction of alkali-activated concrete. J. Clean. Prod. 2013, 39, 265–272.
- Alomayri, T. Experimental study of the microstructural and mechanical properties of geopolymer paste with nano material (Al2O3). J. Build. Eng. 2019, 25, 100788.
- Part, W.K.; Ramli, M.; Cheah, C.B. An overview on the influence of various factors on the properties of geopolymer concrete derived from industrial by-products. Constr. Build. Mater. 2015, 77, 370–395.
- Fořt, J.; Vejmelková, E.; Keppert, M.; Rovnaníková, P.; Bezdička, P.; Černý, R. Alkaline activation of low-reactivity ceramics: Peculiarities induced by the precursors’ dual character. Cem. Concr. Compos. 2020, 105, 103440.
- Borçato, A.G.; Thiesen, M.; Medeiros-Junior, R.A. Mechanical properties of metakaolin-based geopolymers modified with different contents of quarry dust waste. Constr. Build. Mater. 2023, 400, 132854.
- Khalifa, A.Z.; Cizer, Ö.; Pontikes, Y.; Heath, A.; Patureau, P.; Bernal, S.A.; Marsh, A.T. Advances in alkali-activation of clay minerals. Cem. Concr. Res. 2020, 132, 106050.
- Mehta, A.; Siddique, R. An overview of geopolymers derived from industrial by-products. Constr. Build. Mater. 2016, 127, 183–198.
- Ren, B.; Zhao, Y.; Bai, H.; Kang, S.; Zhang, T.; Song, S. Eco-friendly geopolymer prepared from solid wastes: A critical review. Chemosphere 2021, 267, 128900.
- Hu, H.; Jiang, M.; Tang, M.; Liang, H.; Cui, H.; Liu, C.; Ji, C.; Wang, Y.; Jian, S.; Wei, C.; et al. Prediction of compressive strength of fly ash-based geopolymers concrete based on machine learning. Results Eng. 2025, 27, 106492.
- Zeng, Y.; Chen, Y.; Liu, Y.; Wu, T.; Zhao, Y.; Jin, D.; Xu, F. Prediction of compressive and flexural strength of coal gangue-based geopolymer using machine learning method. Mater. Today Commun. 2025, 44, 112076.
- Han, Y.; Dai, W.; Zhou, L.; Guo, L.; Liu, M.; Wang, D.; Ju, Y. Predicting the adsorption capacity of geopolymers for heavy metals in solution based on machine learning. J. Environ. Chem. Eng. 2025, 13, 115978.
- Afzali, S.A.E.; Shayanfar, M.A.; Ghanooni-Bagha, M.; Golafshani, E.; Ngo, T. The use of machine learning techniques to investigate the properties of metakaolin-based geopolymer concrete. J. Clean. Prod. 2024, 446, 141305.
- Bin, F.; Hosseini, S.; Chen, J.; Samui, P.; Fattahi, H.; Jahed Armaghani, D. Proposing Optimized Random Forest Models for Predicting Compressive Strength of Geopolymer Composites. Infrastructures 2024, 9, 181.
- Stel’makh, S.A.; Beskopylny, A.N.; Shcherban’, E.M.; Razveeva, I.; Oganesyan, S.; Shakhalieva, D.M.; Chernil’nik, A.; Onore, G. Compressive Strength of Geopolymer Concrete Prediction Using Machine Learning Methods. Algorithms 2025, 18, 744.
- Yılmaz, Y.; Çakmak, T.; Kurt, Z.; Ustabaş, İ. A novel correlation study using pearson and spearman algorithms for mineral component-driven strength analysis of geopolymer. Pamukkale Univ. J. Eng. Sci. 2025, 31, 409–416.
- Ngo, T.-P.; Vu, H.-N.; Bui, Q.-B. Application of machine learning models for the optimisation of compressive strength and water resistance of geopolymer stabilised compacted earth. Case Stud. Constr. Mater. 2025, 22, e04203.
- Cao, R.; Fang, Z.; Jin, M.; Shang, Y. Application of Machine Learning Approaches to Predict the Strength Property of Geopolymer Concrete. Materials 2022, 15, 2400.
- Cakmak, T.; Ustabas, İ. The anticipation of compressive strength of geopolymer mortars with tree-based machine learning models: Effect of training-testing ratios. Asian J. Civ. Eng. 2025, 26, 2657–2670.
- Harmaji, A.; Kirana, M.C.; Jafari, R. Machine Learning to Predict Workability and Compressive Strength of Low- and High-Calcium Fly Ash–Based Geopolymers. Crystals 2024, 14, 830.
- Nazar, S.; Yang, J.; Amin, M.N.; Khan, K.; Ashraf, M.; Aslam, F.; Javed, M.F.; Eldin, S.M. Machine learning interpretable-prediction models to evaluate the slump and strength of fly ash-based geopolymer. J. Mater. Res. Technol. 2023, 24, 100–124.
- Wang, Y.; Iqtidar, A.; Amin, M.N.; Nazar, S.; Hassan, A.M.; Ali, M. Predictive modelling of compressive strength of fly ash and ground granulated blast furnace slag based geopolymer concrete using machine learning techniques. Case Stud. Constr. Mater. 2024, 20, e03130.
- Shen, J.; Li, Y.; Lin, H.; Li, Y. Development of autogenous shrinkage prediction model of alkali-activated slag-fly ash geopolymer based on machine learning. J. Build. Eng. 2023, 71, 106538.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009.
- Zhang, H. The Optimality of Naive Bayes. 2004. Available online: www.aaai.org (accessed on 9 January 2026).
- Rish, I. An Empirical Study of the Naive Bayes Classifier. 2001. Available online: https://faculty.cc.gatech.edu/~isbell/reading/papers/Rish.pdf (accessed on 9 January 2026).
- Peña, D.; Tiao, G.C.; Tsay, R.S. (Eds.) A Course in Time Series Analysis; Wiley: New York, NY, USA, 2001; Volume 409.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. Lightgbm: A highly efficient gradient boosting decision tree. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 785–794.
- Fisher, R.A. The Use of Multiple Measurements in Taxonomic Problems. Ann. Eugen. 1936, 7, 179–188.
- García, M.V.; Aznarte, J.L. Shapley additive explanations for NO2 forecasting. Ecol. Inform. 2020, 56, 101039.
- Molnar, C. Interpretable Machine Learning A Guide for Making Black Box Models Explainable. Available online: http://leanpub.com/interpretable-machine-learning (accessed on 9 January 2026).
- Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation importance: A corrected feature importance measure. Bioinformatics 2010, 26, 1340–1347.
- Gramacy, R.B. Surrogates; Chapman and Hall/CRC: Boca Raton, FL, USA, 2020.








| Ref. | Year | Binders | Objectives |
|---|---|---|---|
| [23] | 2024 | FA, GGBS | Predicting the CS of geopolymer composites |
| [24] | 2025 | GGBS | Forecasting the CS of geopolymer concrete |
| [25] | 2025 | Glass waste, OB and FA | Predicting the CS of geopolymer mortars based on waste and natural materials |
| [26] | 2025 | Soil and FA | Optimizing the CS and water resistance of geopolymer-stabilized compacted soil |
| [27] | 2022 | FA | Predicting the CS of FA-based geopolymers |
| [28] | 2025 | OB and SF | Predicting the CS of OB and SF geopolymers |
| [29] | 2024 | FA | Forecasting the workability and CS of geopolymer |
| [30] | 2023 | FA | Forecasting the slump and CS values of geopolymers containing FA |
| [31] | 2024 | GGBS and FA | Predicting the CS of geopolymer concrete based on FA and GGBS |
| [32] | 2023 | Slag and FA | Forecasting the autogenous shrinkage values of geopolymers containing alkali-activated slag-FA |

| Name | Mean | Mode | Median | Dispersion | Min. | Max. |
|---|---|---|---|---|---|---|
| O | 49.47 | 44.9 | 50.9 | 0.248 | 0.00 | 91.8 |
| Si | 30.69 | 35.2 | 30.1 | 0.378 | 1.10 | 79.6 |
| CS | 47.34 | 39.02 | 43.02 | 0.18 | 38.62 | 65.35 |
| Al | 8.02 | 0.00 | 6.7 | 0.815 | 0.00 | 61.9 |
| Na | 9.96 | 7.20 | 8.5 | 0.769 | 0.00 | 50.2 |
| C | 0.99 | 0.00 | 0.00 | 6.25 | 0.00 | 45.5 |
| Ca | 0.81 | 0.00 | 0.20 | 2.298 | 0.00 | 14.2 |
| Mixture Number | 5.38 | 2.00 | 5.00 | 0.62 | 1.00 | 11.00 |
| Si/Al | 4.13 | 0.00 | 4.16 | 0.50 | 0.00 | 9.76 |
| K | 0.018 | 0.00 | 0.00 | 7.592 | 0.00 | 1.10 |
| Ca/Si | 0.03 | 0.00 | 0.007 | 2.53 | 0.00 | 0.64 |

| Model | Type of Classification | Main Characteristics | Complexity | Interpretability | Data Type Suitability |
|---|---|---|---|---|---|
| RF | Ensemble | Aggregates multiple decision trees; handles non-linear relationships | Moderate | Moderate | Numerical, Mixed |
| NB | Probabilistic | Assumes feature independence; fast computation | Low | High | Categorical, Numerical |
| ANN | Non-linear | Multi-layer architecture; captures complex interactions | High | Low | Numerical |
| LightGBM | Gradient Boosting | Tree-based with histogram optimization; handles categorical features natively | Moderate | Moderate | Numerical, Mixed |
| LDA | Linear | Projects data to maximize class separation; assumes normal distributions | Low | High | Numerical |
| K-means | Clustering | Partitions data into k clusters based on similarity | Moderate | Low | Numerical |

| Algorithm | Definition |
|---|---|
| RF | Random Forest was selected for its ensemble nature, its ability to handle non-linear relationships, and its resistance to overfitting through bootstrap aggregation [33]. It provides a built-in Gini importance measure, which suits the materials science need to analyze compositional dependencies [34]. Bagging reduces variance, which allows the algorithm to handle datasets whose features differ in predictive power [35]. |
| NB | Naive Bayes was selected for its simple design and its efficient processing of high-dimensional data [36]. It performs well in material classification because it can remain accurate even when the features are correlated with each other [37]. |
| ANNs | Artificial Neural Networks (ANNs) were used to model complex non-linear relationships among the compositional variables. Their universal approximation capability enables accurate modeling of complex relationships between material properties [38]. |
| LightGBM | LightGBM was chosen as the gradient boosting model because it delivers high accuracy and computational efficiency [39]. Its histogram-based optimization handles categorical features natively, which benefits material datasets that include discrete compositional elements [40]. |
| LDA | Linear Discriminant Analysis (LDA) was used as a baseline for classes that are linearly separable. It provides interpretable results through its probabilistic output and explicit decision boundaries, but its performance degrades when the relationships among features are non-linear [34,41]. |

| ML-Classifier | Libraries/Frameworks | Training Method/Class | Key Parameters |
|---|---|---|---|
| Random Forest (RF) | scikit-learn | RandomForestClassifier() | n_estimators, max_depth, criterion |
| Naive Bayes (NB) | scikit-learn | GaussianNB() or MultinomialNB() | priors, var_smoothing |
| Artificial Neural Networks (ANN) | TensorFlow/Keras | Sequential() (Dense layers) | epochs, batch_size, optimizer |
| LightGBM | lightgbm | LGBMClassifier() | num_leaves, learning_rate, boosting_type |
| Linear Discriminant Analysis (LDA) | scikit-learn | LinearDiscriminantAnalysis() | solver, shrinkage |
| k-Means | scikit-learn | KMeans() | n_clusters, init, max_iter |

| Model | Hyperparameter | Tested Values/Ranges | Optimal Value (Typical) |
|---|---|---|---|
| Random Forest (RF) | n_estimators | 50, 100, 200, 300, 500 | 100 |
| | max_depth | 3, 5, 7, 9, 11, None | 7 |
| | criterion | “gini”, “entropy” | “gini” |
| Naive Bayes (NB) | var_smoothing | 1e-12, 1e-9, 1e-6, 1e-3 | 1e-9 |
| | priors | None, [class_probs] | None |
| ANN | epochs | 50, 100, 200, 500 | 100 |
| | batch_size | 16, 32, 64, 128 | 32 |
| | optimizer | “adam”, “sgd”, “rmsprop” | “adam” |
| LightGBM | num_leaves | 15, 31, 63, 127 | 31 |
| | learning_rate | 0.01, 0.05, 0.1, 0.2 | 0.1 |
| | boosting_type | “gbdt”, “dart”, “goss” | “gbdt” |
| LDA | solver | “svd”, “lsqr”, “eigen” | “svd” |
| | shrinkage | None, 0.1, 0.5, 0.9, “auto” | None |
| k-Means | n_clusters | 3, 5, 7, 10 (domain-specific) | 5 |
| | init | “k-means++”, “random” | “k-means++” |
| | max_iter | 100, 300, 500 | 300 |

| Metric | Definition | Equation |
|---|---|---|
| R² | Coefficient of determination | (1) |
| MAE | Mean absolute error | (2) |
| MAPE | Mean absolute percentage error | (3) |
| MSE | Mean squared error | (4) |
| RMSE | Root mean squared error | (5) |
| Accuracy | Accuracy | (6) |
| Recall | Recall | (7) |
| Precision | Precision | (8) |
| F1 | F1 | (9) |
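
For reference, the standard definitions of these metrics can be written as below. The notation is assumed here rather than taken from the authors: $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values, $n$ the number of samples, and $TP$, $TN$, $FP$, $FN$ the confusion-matrix counts. The authors' Equations (1) to (9) may differ in form; for example, MAPE is sometimes multiplied by 100.

```latex
\begin{align}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{1} \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \tag{2} \\
\mathrm{MAPE} &= \frac{1}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \tag{3} \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{4} \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \tag{5} \\
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{6} \\
\mathrm{Recall} &= \frac{TP}{TP + FN} \tag{7} \\
\mathrm{Precision} &= \frac{TP}{TP + FP} \tag{8} \\
F_1 &= 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}
\end{align}
```

For a multi-class target such as the mixture number, Recall, Precision, and F1 are typically computed per class and then averaged. The reported RMSE values are consistent with Equation (5): for fold 1, $\sqrt{1.402} \approx 1.184$.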

| Model | Train Size | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| RF | 80% | 0.942 | 0.945 | 0.942 | 0.941 |
| | 75% | 0.928 | 0.931 | 0.928 | 0.927 |
| | 70% | 0.875 | 0.878 | 0.875 | 0.874 |
| NB | 80% | 0.885 | 0.897 | 0.885 | 0.884 |
| | 75% | 0.868 | 0.880 | 0.868 | 0.867 |
| | 70% | 0.850 | 0.861 | 0.850 | 0.849 |
| ANN | 80% | 0.925 | 0.929 | 0.925 | 0.925 |
| | 75% | 0.915 | 0.918 | 0.915 | 0.915 |
| | 70% | 0.895 | 0.898 | 0.895 | 0.895 |
| LightGBM | 80% | 0.835 | 0.842 | 0.835 | 0.830 |
| | 75% | 0.826 | 0.828 | 0.826 | 0.822 |
| | 70% | 0.770 | 0.771 | 0.770 | 0.765 |
| LDA | 80% | 0.905 | 0.910 | 0.905 | 0.905 |
| | 75% | 0.885 | 0.890 | 0.885 | 0.885 |
| | 70% | 0.865 | 0.870 | 0.865 | 0.865 |

| Model | Approach | Accuracy | Precision | Recall | F1 Score | Accuracy Improvement |
|---|---|---|---|---|---|---|
| RF | Hybrid (K + RF) (proposed) | 0.980 | 0.981 | 0.980 | 0.980 | +4.0% |
| | Standalone (RF) | 0.942 | 0.945 | 0.942 | 0.941 | |
| ANN | Hybrid (K + ANN) | 0.960 | 0.964 | 0.960 | 0.960 | +3.5% |
| | Standalone (ANN) | 0.925 | 0.929 | 0.925 | 0.925 | |
| LDA | Hybrid (K + LDA) | 0.940 | 0.945 | 0.940 | 0.940 | +3.5% |
| | Standalone (LDA) | 0.905 | 0.910 | 0.905 | 0.905 | |
| NB | Hybrid (K + NB) | 0.920 | 0.933 | 0.920 | 0.919 | +3.5% |
| | Standalone (NB) | 0.885 | 0.897 | 0.885 | 0.884 | |
| LightGBM | Hybrid (K + LGBM) | 0.880 | 0.887 | 0.880 | 0.875 | +4.5% |
| | Standalone (LGBM) | 0.835 | 0.842 | 0.835 | 0.830 | |
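
The accuracy improvement can be recomputed from the standalone and hybrid accuracies in two ways: as an absolute gain in percentage points and as a relative gain. The short sketch below does both. The dictionary and function names are illustrative; the accuracy values are copied from the table. For ANN, LDA, NB, and LightGBM the reported figure matches the absolute gain, while for RF the absolute gain is 3.8 points and the reported +4.0% matches the relative gain, so the convention used for that row is not fully clear from the table alone.

```python
# (standalone accuracy, hybrid accuracy) copied from the comparison table
ACCURACY = {
    "RF": (0.942, 0.980),
    "ANN": (0.925, 0.960),
    "LDA": (0.905, 0.940),
    "NB": (0.885, 0.920),
    "LightGBM": (0.835, 0.880),
}


def improvement(standalone, hybrid):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = (hybrid - standalone) * 100
    relative = (hybrid - standalone) / standalone * 100
    return round(absolute, 1), round(relative, 1)


for model, (standalone, hybrid) in ACCURACY.items():
    absolute, relative = improvement(standalone, hybrid)
    print(f"{model}: +{absolute} points absolute, +{relative}% relative")
```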
| Feature | Importance (SHAP Value) | Direction | Mean Absolute SHAP |
|---|---|---|---|
| Si/Al | 0.378 | Positive | 4.23 |
| O | −0.298 | Negative | 3.12 |
| Si | 0.234 | Positive | 2.45 |
| Al | −0.189 | Negative | 1.98 |
| Na | 0.156 | Positive | 1.67 |
| C | −0.145 | Negative | 1.34 |
| Ca/Si | 0.123 | Positive | 1.45 |
| Ca | 0.078 | Positive | 0.89 |
| K | 0.034 | Positive | 0.45 |

| Feature | Importance | Standard Deviation (±) |
|---|---|---|
| Si/Al | 0.334 | 0.038 |
| O | 0.245 | 0.028 |
| Si | 0.198 | 0.023 |
| Al | 0.167 | 0.019 |
| Na | 0.134 | 0.015 |
| C | 0.112 | 0.014 |
| Ca/Si | 0.098 | 0.011 |
| Ca | 0.089 | 0.012 |
| K | 0.045 | 0.008 |

| K-Means Cluster | Mixture No | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Cluster 1 | 1, 5, 8 | 0.99 | 0.99 | 0.99 | 0.99 |
| Cluster 2 | 2, 3, 10 | 0.98 | 0.98 | 0.98 | 0.98 |
| Cluster 3 | 4, 9, 11 | 0.96 | 0.96 | 0.96 | 0.96 |
| Cluster 4 | 6, 7 | 0.99 | 0.99 | 0.99 | 0.99 |
| Average | | 0.98 | 0.981 | 0.98 | 0.98 |

| Fold | MSE | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|---|
| Fold 1 | 1.402 | 1.184 | 0.215 | 0.004 | 0.981 |
| Fold 2 | 1.378 | 1.174 | 0.210 | 0.004 | 0.982 |
| Fold 3 | 1.345 a | 1.160 a | 0.208 a | 0.004 a | 0.983 a |
| Fold 4 | 1.362 | 1.167 | 0.211 | 0.004 | 0.982 |
| Fold 5 | 1.390 | 1.179 | 0.213 | 0.004 | 0.981 |

| Ref. | Year | Application Models | Binders | Objectives | Performance Metrics | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| R2 | R | RMSE | MSE | MAE | MAPE | Accuracy | |||||
| [23] | 2024 | HHO-RF | FA and GGBS | Predicting the CS of geopolymer composites | 0.940 | 7.914 | 5.965 | ||||
| SCA-RF | 0.962 | 5.699 | 3.763 | ||||||||
| LSSVM | 0.931 | 9.020 | 8.235 | ||||||||
| ELM | 0.911 | 12.058 | 12.215 | ||||||||
| [24] | 2025 | KNN | GGBS | Predicting the CS of geopolymer concrete | 0.9998 | 0.63 | 0.37 | ||||
| AutoML | 0.9995 | 0.68 | 0.51 | ||||||||
| MLP | 0.9993 | 0.82 | 0.56 | ||||||||
| TabPFN v2 | 0.9996 | 0.64 | 0.46 | ||||||||
| [25] | 2025 | DNN | Glass waste, OB and FA | Predicting the CS of geopolymer mortars based on waste and natural materials | 0.93 | 5.17 | 26.7 | 0.285 | |||
| RF | 0.96 | 3.992 | 15.9 | 2.119 | |||||||
| LR | 0.763 | 9.507 | 90.4 | 5.987 | |||||||
| kNN | 0.804 | 8.643 | 74.7 | 4.669 | |||||||
| XGBoost | 0.981 | 2.968 | 8.81 | 1.582 | |||||||
| [26] | 2025 | XGBoost | Soil and Fly ash | Optimizing the CS and water resistance of geopolymer-stabilized compacted soil | 0.916 | 0.27 | |||||
| RF | 0.84 | 0.54 | |||||||||
| AdaBoost | 0.768 | 0.83 | |||||||||
| GBRT | 0.872 | 0.37 | |||||||||
| [27] | 2022 | MLP | Fly ash | Predicting the CS of FA-based geopolymers | 0.98 | 4.37 | 19.09 | 3.48 | |||
| XGB | 0.91 | 1.78 | 3.16 | 1.49 | |||||||
| SVM | 0.88 | 3.82 | 14.59 | 2.77 | |||||||
| [28] | 2025 | DT | Obsidian and Silica Fume | Predicting the CS of obsidian and silica fume geopolymers | 0.95 | 2.704 | 7.31 | 1.951 | 4.306 | ||
| ET | 0.818 | 5.731 | 32.8 | 4.414 | 9.251 | ||||||
| RF | 0.795 | 5.912 | 33 | 5.64 | 10.145 | ||||||
| GBR | 0.972 | 2.197 | 4.83 | 1.486 | 3.413 | ||||||
| [29] | 2024 | MLP | Fly ash | Predicting the workability and CS of geopolymer | 0.78 | 4.81 | 1.87 | ||||
| VR | 0.89 | 2.39 | 1.31 | ||||||||
| XGB | 0.96 | 0.007 | 0.06 | ||||||||
| [30] | 2023 | GEP | Fly ash | Predicting the slump and CS values of geopolymers containing FA | 0.914 | 2.19 | 0.3 | ||||
| ANFIS | 0.93 | 2.19 | 0.3 | ||||||||
| ANN | 0.92 | 2.18 | 0.3 | ||||||||
| [31] | 2024 | ANN | GGBS and fly ash | Predicting the CS of geopolymer concrete based on FA and GGBS | 0.97 | 6.76 | 4.72 | ||||
| ANFIS | 0.94 | 8.89 | 5.99 | ||||||||
| GEP | 0.99 | 3.85 | 2.53 | ||||||||
| [32] | 2023 | SVR | Slag and fly ash | Predicting the autogenous shrinkage values of geopolymers containing alkali-activated slag-FA | 0.88 | 347.57 | 293.1 | 0.56 | |||
| GPR | 0.88 | 342.56 | 291.5 | 0.53 | |||||||
| CART | 0.86 | 370.53 | 312.4 | 1.37 | |||||||
| RF | 0.88 | 348.72 | 293.5 | 1.81 | |||||||
| GB | 0.95 | 230.51 | 170.1 | 1.48 | |||||||
| XGB | 0.95 | 214.83 | 165.1 | 1.69 | |||||||
| This study | - | RF | - | The goal is to determine the relationship between the chemical components of geopolymer mortars and their CS. | 98% | ||||||
| NB | 92% | ||||||||||
| ANN | 96% | ||||||||||
| LightGBM | 88% | ||||||||||
| LDA | 94% | ||||||||||
| K-means | 94% | ||||||||||

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Yılmaz, Y.; Çakmak, T.; Ustabaş, İ. A Hybrid Clustering–Classification Approach for Predicting Strength and Analyzing Material Composition of Geopolymers. Polymers 2026, 18, 959. https://doi.org/10.3390/polym18080959

