Influence of Data Structure on Prediction Error in Machine Learning-Based Concrete Compressive Strength Models
Abstract
1. Introduction
2. Data and Feature Structure Analysis
2.1. Dataset and Structural Characteristic Analysis
2.2. Methods for Feature Space Structure Analysis
2.2.1. Correlation
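A minimal sketch of a correlation-based ranking follows. It assumes that the correlation criterion is the absolute Pearson coefficient between each candidate feature and the measured compressive strength; the exact definition is not restated in this section. The function name `rank_by_correlation` is hypothetical, and features with zero variance are given a score of 0.

```python
import numpy as np


def rank_by_correlation(X, y):
    """Rank features by |Pearson r| with the target (assumed definition)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centred features
    yc = y - y.mean()                # centred target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; score them 0 instead of NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.where(denom > 0, (Xc * yc[:, None]).sum(axis=0) / denom, 0.0)
    scores = np.abs(r)
    order = np.argsort(-scores, kind="stable")   # highest score first
    return order, scores
```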
2.2.2. Partial Correlation
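The following sketches a partial-correlation score. It assumes that each feature's association with strength is measured while controlling for all remaining features. The sketch relies on the standard identity ρ(i, y | rest) = −P_iy / √(P_ii · P_yy), where P is the inverse covariance (precision) matrix of the features together with the target. Using the Moore-Penrose pseudo-inverse to tolerate near-singular mix-design matrices is an illustrative choice, and `rank_by_partial_correlation` is a hypothetical name.

```python
import numpy as np


def rank_by_partial_correlation(X, y):
    """Rank features by |partial correlation| with y, controlling for all other features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    data = np.hstack([X, y])                 # target is the last column
    cov = np.cov(data, rowvar=False)
    prec = np.linalg.pinv(cov)               # precision matrix (pseudo-inverse)
    d = np.sqrt(np.diag(prec))
    partial = -prec / np.outer(d, d)         # partial correlation matrix (off-diagonal)
    t = data.shape[1] - 1                    # index of the target column
    scores = np.abs(partial[:t, t])
    order = np.argsort(-scores, kind="stable")
    return order, scores
```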
2.2.3. Information Entropy
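The information-entropy criterion is sketched below as information gain, IG = H(Y) − H(Y | X), computed after equal-width binning of each feature and of the strength. The binning scheme (10 bins) and the base-2 logarithm are assumptions, not settings reported in this study. scikit-learn's `mutual_info_regression` is a nearest-neighbour-based alternative. All function names are hypothetical.

```python
import numpy as np


def _entropy(labels):
    """Shannon entropy (bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def _discretize(values, bins):
    """Equal-width binning into integer labels 0 .. bins-1."""
    edges = np.histogram_bin_edges(values, bins=bins)
    return np.digitize(values, edges[1:-1])


def information_gain(x, y, bins=10):
    """IG = H(Y) - H(Y | X) after discretizing x and y."""
    xd = _discretize(np.asarray(x, dtype=float), bins)
    yd = _discretize(np.asarray(y, dtype=float), bins)
    h_y_given_x = 0.0
    for value in np.unique(xd):
        mask = xd == value
        h_y_given_x += mask.mean() * _entropy(yd[mask])   # weighted conditional entropy
    return _entropy(yd) - h_y_given_x


def rank_by_information_entropy(X, y, bins=10):
    X = np.asarray(X, dtype=float)
    scores = np.array([information_gain(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.argsort(-scores, kind="stable"), scores
```

A constant column, such as an admixture that is absent from every mix, receives an information gain of exactly 0.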
2.2.4. Relief
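For a continuous target such as compressive strength, Relief is usually applied in its regression form, RReliefF. The sketch below is a simplified RReliefF. It assumes Manhattan distance on min-max-scaled features and equal weights for the k nearest neighbours; the original RReliefF uses rank-based exponential weights. The neighbour count, the sampling scheme, and the names `rrelieff` and `rank_by_relief` are illustrative. Absolute weights from this sketch are not expected to reproduce the Relief scores reported for data0; only the resulting order is meant to be used as a ranking list.

```python
import numpy as np


def rrelieff(X, y, n_neighbors=10, n_samples=None, seed=0):
    """Simplified RReliefF weights for a continuous target (higher = more relevant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Min-max scale features and target so every difference lies in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span
    y_span = y.max() - y.min()
    yn = (y - y.min()) / (y_span if y_span > 0 else 1.0)

    rng = np.random.default_rng(seed)
    m = n if n_samples is None else min(n_samples, n)
    k = min(n_neighbors, n - 1)
    weight = 1.0 / k                      # equal neighbour weights (simplification)

    n_dc = 0.0                            # weighted sum of target differences
    n_da = np.zeros(p)                    # weighted sum of feature differences
    n_dcda = np.zeros(p)                  # weighted sum of target x feature differences
    for i in rng.choice(n, size=m, replace=False):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)    # Manhattan distance to all rows
        dist[i] = np.inf                         # never pick the instance itself
        neigh = np.argsort(dist, kind="stable")[:k]
        d_target = np.abs(yn[neigh] - yn[i])     # shape (k,)
        d_feat = np.abs(Xn[neigh] - Xn[i])       # shape (k, p)
        n_dc += weight * d_target.sum()
        n_da += weight * d_feat.sum(axis=0)
        n_dcda += weight * (d_target[:, None] * d_feat).sum(axis=0)

    # RReliefF update: W = N_dC&dA / N_dC - (N_dA - N_dC&dA) / (m - N_dC)
    return n_dcda / n_dc - (n_da - n_dcda) / (m - n_dc)


def rank_by_relief(X, y, **kwargs):
    scores = rrelieff(X, y, **kwargs)
    return np.argsort(-scores, kind="stable"), scores
```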
2.3. Analysis of Feature Structure Differences Among Different Datasets
3. Modeling Methods
3.1. Artificial Neural Network Model
3.2. Support Vector Machine Model
3.3. Random Forest Model
3.4. Evaluation Indicators and Experimental Procedure
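The indicators are not restated in this section. Assuming the commonly used root-mean-square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²), they can be computed from measured and predicted strengths as below. The function name is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 between measured and predicted strengths."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    ss_res = float(np.sum(resid ** 2))                    # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {"RMSE": rmse, "MAE": mae, "R2": 1.0 - ss_res / ss_tot}
```

For example, `regression_metrics([30, 40, 50], [32, 38, 50])` gives RMSE ≈ 1.633, MAE ≈ 1.333, and R² = 0.96.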
4. Influence of Data Feature Structure on Prediction Error
4.1. Relationship Between Feature Size and Prediction Error
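| Algorithm 1 Feature filtering model |
|---|
| function FeatureFiltering(number of datasets, size of feature subset) |
| for each dataset do |
| Rank the features by correlation, partial correlation, information entropy, and Relief |
| Generate the ranking lists of correlation, partial correlation, information entropy, and Relief |
| for each ranking list do |
| for each feature subset size do |
| Get the top-ranked features of that size from the ranking list |
| end for |
| end for |
| end for |
| return the generated feature subsets |
| end function |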
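The loop structure of Algorithm 1 can be sketched in Python as follows. The sketch assumes that each ranking method is available as a callable returning a feature order. It also assumes that subsets start from two features, the smallest size listed in Appendix A. All function and argument names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np

# A ranking method maps (X, y) to feature indices, most relevant first.
Ranker = Callable[[np.ndarray, np.ndarray], Sequence[int]]


def top_k_subsets(order: Sequence[int], min_size: int = 2) -> Dict[int, List[int]]:
    """Top-k feature indices of one ranking list, for k = min_size .. number of features."""
    order = list(order)
    return {k: order[:k] for k in range(min_size, len(order) + 1)}


def feature_filtering(
    datasets: Dict[str, Tuple[np.ndarray, np.ndarray]],
    rankers: Dict[str, Ranker],
    min_size: int = 2,
) -> Dict[str, Dict[str, Dict[int, List[int]]]]:
    """Generate ranked feature subsets for every dataset and every ranking method."""
    subsets = {}
    for name, (X, y) in datasets.items():                          # loop over datasets
        rankings = {method: rank(X, y) for method, rank in rankers.items()}
        subsets[name] = {method: top_k_subsets(order, min_size)    # loop over ranking lists
                         for method, order in rankings.items()}     # and subset sizes
    return subsets
```

With a single dataset and a ranker that always returns `[2, 0, 1]`, the result maps feature size 2 to `[2, 0]` and feature size 3 to `[2, 0, 1]`. Each subset can then be passed to ANN, SVR, and RF in turn.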
4.2. Influence of Data Size and Strength Range on Prediction Error
4.3. Three-Factor Coupling Mechanism Analysis
5. Discussion
5.1. Relationship Between Feature Structure Differences and Concrete Material System
5.2. Influence of Data Structure on Model Error
5.3. Interpretation of the Empirical Error Model
6. Conclusions
- (1) A reasonable range of feature sizes exists for concrete strength prediction, but this range is dataset-dependent rather than universally fixed. For most normal concrete and high-performance concrete datasets, a relatively small, optimized feature subset can achieve stable prediction accuracy. For ultra-high-performance concrete and structurally more complex datasets, appropriately increasing the number of features can improve predictive ability, although a larger feature size does not continuously improve performance.
- (2) The distribution of feature importance is closely related to the material system. In normal concrete datasets, cement content, water–binder ratio, and binder-related variables usually play dominant roles. In high-strength and ultra-high-performance concrete datasets, mineral and chemical admixture variables become more important. Feature engineering should therefore be designed for the specific concrete system rather than transferred mechanically across datasets.
- (3) All four feature-filtering methods (correlation, partial correlation, information entropy, and Relief) can characterize the structure of the feature space, but their effectiveness depends on the feature distributions and inter-variable correlations of each dataset. No single filtering method performs best on all concrete datasets. For datasets with mixed information or unbalanced variable contributions, information entropy and Relief may show relative advantages, whereas in structurally simpler datasets, correlation-based methods may reach low-error regions earlier.
- (4) ANN, SVR, and RF show consistent global trends in error variation, which indicates that prediction error is constrained primarily by data structure rather than determined by model type alone. ANN is generally more sensitive to changes in feature configuration, whereas RF and SVR are relatively more stable. However, model type does not overturn the overall trend: prediction error usually first decreases and then stabilizes as the retained feature subset becomes structurally more adequate.
- (5) Dataset size, strength range, and feature size jointly determine the attainable performance of the prediction system. The empirical error model established in this study quantitatively describes the coupled relationship between these three structural variables and prediction error within the investigated dataset collection. Its role is to summarize the structural trend observed under the unified framework of this study, not to serve as a universal predictive law for all future concrete datasets.
- (6) From a practical perspective, the findings suggest that improving concrete compressive strength prediction does not depend only on selecting a more complex algorithm. Researchers and engineers should first examine the structural characteristics of a dataset, especially sample support, target strength range, and feature-space organization, before seeking additional gains through model complexity. In this sense, data organization and feature configuration should be regarded as the first layer of prediction error control.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
| Data Set | Feature Size | ANN: Correlation | ANN: Partial Correlation | ANN: Relief | ANN: Information Entropy | RF: Correlation | RF: Partial Correlation | RF: Relief | RF: Information Entropy | SVR: Correlation | SVR: Partial Correlation | SVR: Relief | SVR: Information Entropy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| data0 | 2 | 4.05 | 3.801 | 7.268 | 5.794 | 3.832 | 3.824 | 7.226 | 6.182 | 4.118 | 3.698 | 7.299 | 5.701 |
| | 3 | 3.955 | 3.773 | 5.799 | 5.608 | 4.036 | 3.822 | 6.1 | 6.24 | 4.134 | 3.73 | 5.747 | 5.362 |
| | 4 | 3.934 | 3.653 | 5.509 | 4.1 | 4.086 | 3.844 | 5.877 | 4.155 | 4.062 | 3.497 | 5.326 | 4.03 |
| | 5 | 3.939 | 3.648 | 4.142 | 4.048 | 4.088 | 3.879 | 4.178 | 4.02 | 4.059 | 3.616 | 3.989 | 3.924 |
| | 6 | 3.968 | 3.762 | 4.113 | 4.026 | 4.106 | 4.027 | 4.176 | 3.926 | 4.166 | 3.68 | 3.99 | 3.963 |
| | 7 | 3.854 | 3.728 | 4.158 | 4.095 | 4.011 | 3.928 | 4.176 | 3.931 | 3.98 | 3.828 | 4.008 | 4.016 |
| | 8 | 3.765 | 3.712 | 4.138 | 3.629 | 3.989 | 3.929 | 4.17 | 3.702 | 4.027 | 3.838 | 4.031 | 3.711 |
| | 9 | 3.84 | 3.706 | 4.084 | 3.696 | 3.867 | 3.918 | 4.058 | 3.694 | 4.069 | 3.832 | 4.03 | 3.74 |
| | 10 | 3.732 | 3.757 | 4.176 | 3.758 | 3.739 | 3.921 | 3.981 | 3.692 | 4.015 | 3.944 | 3.96 | 3.897 |
| | 11 | 3.725 | 3.593 | 4.262 | 3.674 | 3.726 | 3.78 | 3.981 | 3.68 | 4.053 | 3.959 | 3.978 | 3.993 |
| | 12 | 3.722 | 3.685 | 3.888 | 3.718 | 3.731 | 3.777 | 3.763 | 3.677 | 4.067 | 4.005 | 3.738 | 4.011 |
| | 13 | 3.705 | 3.711 | 3.782 | 3.696 | 3.73 | 3.774 | 3.77 | 3.672 | 4.106 | 4.021 | 3.795 | 3.932 |
| | 14 | 3.76 | 3.656 | 3.779 | 3.705 | 3.731 | 3.691 | 3.754 | 3.679 | 4.104 | 3.984 | 3.891 | 3.968 |
| | 15 | 3.845 | 3.667 | 3.73 | 3.7 | 3.733 | 3.673 | 3.734 | 3.667 | 4.132 | 3.958 | 3.95 | 3.967 |
| | 16 | 3.813 | 3.752 | 3.828 | 3.825 | 3.673 | 3.691 | 3.739 | 3.673 | 4.096 | 3.999 | 4.056 | 3.999 |
| | 17 | 3.771 | 3.771 | 3.771 | 3.771 | 3.671 | 3.671 | 3.671 | 3.671 | 4.018 | 4.018 | 4.018 | 4.018 |
| data1 | 2 | 17.45 | 16.6 | 21.23 | 17.37 | 10.15 | 18.21 | 19.17 | 16.11 | 16.33 | 16.48 | 23.08 | 14.15 |
| | 3 | 17.88 | 26.56 | 22.21 | 16.51 | 9.852 | 11.82 | 19.49 | 15.9 | 15.92 | 11.61 | 21.4 | 13.27 |
| | 4 | 18.53 | 11.87 | 42.87 | 24.96 | 9.16 | 9.419 | 18.85 | 10.02 | 14.79 | 9.95 | 20.48 | 11.58 |
| | 5 | 28.12 | 12.59 | 25.43 | 24.16 | 9.165 | 9.496 | 18.73 | 9.688 | 12.78 | 10.1 | 20.4 | 12.64 |
| | 6 | 20.69 | 13.93 | 23.79 | 21.55 | 8.19 | 8.539 | 18.81 | 9.174 | 9.009 | 8.881 | 19.16 | 13.5 |
| | 7 | 19.39 | 21.47 | 22.96 | 15.4 | 8.103 | 7.998 | 18.55 | 9.332 | 9.036 | 9.245 | 18.83 | 13.65 |
| | 8 | 16.17 | 14.57 | 31.57 | 28.62 | 8.241 | 8.154 | 18.25 | 8.945 | 7.948 | 8.342 | 18.37 | 13.1 |
| | 9 | 16.59 | 15.71 | 16.52 | 14.45 | 8.22 | 8.059 | 8.037 | 8.023 | 8.448 | 8.448 | 8.448 | 8.448 |
| data2 | 2 | 10.24 | 10.24 | 12.92 | 10.87 | 8.336 | 8.336 | 11.19 | 7.221 | 10.26 | 10.26 | 12.89 | 10.2 |
| | 3 | 11.98 | 10.29 | 12.64 | 9.209 | 7.351 | 7.277 | 9.747 | 6.769 | 8.998 | 9.233 | 12.79 | 8.912 |
| | 4 | 11.57 | 10.75 | 12.94 | 10.46 | 6.752 | 7.495 | 8.919 | 6.75 | 7.291 | 7.362 | 10.32 | 7.291 |
| | 5 | 9.389 | 8.96 | 10.24 | 8.405 | 6.825 | 7.118 | 8.192 | 6.445 | 7.195 | 6.87 | 10.37 | 6.657 |
| | 6 | 7.443 | 7.195 | 7.18 | 8.468 | 6.64 | 6.621 | 6.627 | 6.581 | 6.476 | 6.476 | 6.476 | 6.476 |
| data3 | 2 | 10.81 | - | 32.91 | 10.81 | 10.04 | - | 34.75 | 10.04 | 10.46 | - | 34.14 | 10.46 |
| | 3 | 18.03 | - | 30.27 | 18.03 | 8.803 | - | 29.65 | 8.803 | 7.027 | - | 31.23 | 7.027 |
| | 4 | 24.29 | - | 36.34 | 26.51 | 8.938 | - | 29.65 | 8.429 | 9.167 | - | 31.65 | 8.431 |
| | 5 | 19.8 | - | 24.75 | 22.44 | 8.462 | - | 18.08 | 8.511 | 9.489 | - | 17.71 | 9.489 |
| | 6 | 25.54 | - | 34.93 | 22.09 | 8.694 | - | 18.1 | 8.697 | 10.21 | - | 19.5 | 10.21 |
| | 7 | 28.97 | - | 19.99 | 28.82 | 8.689 | - | 8.509 | 8.509 | 10.49 | - | 10.49 | 10.49 |
| data4 | 2 | 16.56 | 13.2 | 19.61 | 16.33 | 15.76 | 15.71 | 16.7 | 16.19 | 14.42 | 11.95 | 16.5 | 13.95 |
| | 3 | 16.66 | 15.67 | 23.41 | 22.21 | 15.3 | 15.34 | 17.15 | 15.19 | 12.89 | 12.89 | 17.43 | 18.26 |
| | 4 | 19.93 | 18.11 | 21.6 | 30.08 | 14.22 | 15.08 | 16.95 | 15.14 | 10.61 | 12.21 | 18.67 | 17.99 |
| | 5 | 34.17 | 35.88 | 25 | 29.39 | 14.41 | 15.18 | 15.02 | 15.37 | 11.44 | 11.7 | 17.61 | 16.49 |
| | 6 | 36.72 | 30.21 | 36.99 | 19.94 | 15.19 | 16.42 | 15.19 | 15.35 | 13.09 | 12.61 | 16.65 | 15.12 |
| | 7 | 44.59 | 31.94 | 41.15 | 27.27 | 16.62 | 16.86 | 15.17 | 15.99 | 15.12 | 12.88 | 15.34 | 15.9 |
| | 8 | 38.54 | 35.72 | 50.87 | 31.42 | 16.86 | 15.82 | 18.32 | 15.85 | 15 | 14.79 | 16.28 | 14.68 |
| | 9 | 40.44 | 37.43 | 54.37 | 30.57 | 17.22 | 16.02 | 17.02 | 15.51 | 15.35 | 14.46 | 13.98 | 14 |
| | 10 | 27.37 | 37.36 | 33.5 | 41.29 | 17.07 | 17.4 | 16.16 | 16.94 | 14.52 | 14.52 | 14.05 | 14.48 |
| | 11 | 32.12 | 24.67 | 43.52 | 31 | 16.61 | 16.49 | 16.48 | 16.58 | 14.23 | 14.23 | 14.23 | 14.23 |
| data5 | 2 | 4.614 | 3.985 | 7.478 | 5.499 | 2.796 | 3.399 | 4.403 | 2.773 | 3.008 | 3.426 | 4.883 | 3.008 |
| | 3 | 5.387 | 5.73 | 9.231 | 9.56 | 3.443 | 3.708 | 4.517 | 3.678 | 3.785 | 4.02 | 4.419 | 3.726 |
| | 4 | 9.296 | 6.969 | 8.223 | 9.81 | 3.447 | 3.617 | 3.824 | 3.859 | 3.361 | 3.047 | 4.095 | 3.683 |
| | 5 | 13.51 | 6.432 | 12.18 | 12.16 | 3.731 | 3.725 | 3.837 | 3.953 | 3.309 | 3.004 | 3.929 | 3.929 |
| | 6 | 9.014 | 12.27 | 14.1 | 10.86 | 3.88 | 3.852 | 3.857 | 3.824 | 3.726 | 3.726 | 3.726 | 3.726 |
| data6 | 2 | 12.44 | 12.44 | 9.926 | 14.37 | 7.991 | 7.991 | 9.728 | 7.962 | 9.197 | 9.197 | 9.659 | 10.44 |
| | 3 | 10.27 | 10.27 | 11.43 | 12.71 | 8.072 | 8.072 | 9.952 | 7.887 | 9.478 | 9.478 | 9.737 | 9.771 |
| | 4 | 11.78 | 11.78 | 11.61 | 9.286 | 8.128 | 8.128 | 10.08 | 7.707 | 8.873 | 8.873 | 9.659 | 9.935 |
| | 5 | 10.97 | 10.97 | 12.24 | 9.282 | 8.075 | 8.075 | 10.11 | 7.453 | 9.277 | 9.277 | 9.767 | 9.223 |
| | 6 | 11.36 | 11.36 | 12.39 | 9.78 | 7.983 | 7.983 | 10.18 | 7.653 | 9.782 | 9.782 | 9.902 | 8.896 |
| | 7 | 12.51 | 12.51 | 16.08 | 10.52 | 7.721 | 7.721 | 11.46 | 7.564 | 9.278 | 9.278 | 10.97 | 9.381 |
| | 8 | 15.32 | 15.32 | 12.56 | 12.04 | 7.592 | 7.592 | 7.62 | 7.671 | 9.057 | 9.057 | 9.057 | 9.057 |
| data7 | 2 | 17.3 | 13.72 | 16.87 | 22 | 17.04 | 16.84 | 16.8 | 16.67 | 15.31 | 13.53 | 17.15 | 17.1 |
| | 3 | 17.22 | 17.41 | 18.38 | 17.57 | 16.93 | 16.33 | 16.72 | 14.19 | 14.81 | 13.43 | 17.92 | 14.17 |
| | 4 | 21.56 | 30.37 | 23.22 | 24.15 | 16.9 | 15.35 | 17.08 | 16.79 | 14.29 | 13.1 | 18.83 | 15.21 |
| | 5 | 33.58 | 39.28 | 23.73 | 30.83 | 16.53 | 16.3 | 16.92 | 16.81 | 14.66 | 11.7 | 19.11 | 16.01 |
| | 6 | 33.32 | 34.62 | 63.4 | 23.56 | 16.24 | 16.29 | 17.72 | 16.77 | 13.92 | 13.92 | 17.82 | 15.57 |
| | 7 | 30.29 | 22.24 | 51.4 | 27.79 | 16.97 | 17.14 | 19.8 | 17.05 | 13.48 | 13.48 | 19.2 | 16.04 |
| | 8 | 30.23 | 37.98 | 48.82 | 29.29 | 17.15 | 17.08 | 18.22 | 16.83 | 12.5 | 13.53 | 16.84 | 14.9 |
| | 9 | 27.6 | 31.73 | 27.64 | 31.12 | 16.52 | 17.23 | 15.26 | 16.94 | 13.63 | 12.13 | 13.6 | 14.49 |
| | 10 | 25.67 | 23.82 | 29.44 | 32.29 | 16.65 | 18.04 | 15.13 | 16.47 | 12.99 | 14.08 | 13.94 | 12.99 |
| | 11 | 31.77 | 26.83 | 38.02 | 24.64 | 17.43 | 17.43 | 17.83 | 17.56 | 13.41 | 13.41 | 13.41 | 13.41 |
| data8 | 2 | 10.26 | 10.26 | 8.277 | 7.553 | 9.598 | 8.569 | 6.278 | 7.939 | 9.419 | 7.912 | 7.526 | 7.616 |
| | 3 | 9.331 | 9.288 | 7.588 | 8.476 | 8.535 | 6.629 | 6.594 | 6.571 | 8.211 | 5.592 | 7.089 | 5.592 |
| | 4 | 9.099 | 10.05 | 10.7 | 8.355 | 8.73 | 7.066 | 7.205 | 7.461 | 8.403 | 6.361 | 7.49 | 6.756 |
| | 5 | 8.362 | 8.885 | 6.981 | 9.035 | 7.394 | 7.397 | 7.529 | 7.879 | 6.49 | 6.49 | 6.49 | 7.851 |
| | 6 | 8.951 | 7.187 | 6.995 | 8.006 | 7.941 | 7.931 | 8.001 | 7.863 | 7.836 | 7.836 | 7.836 | 7.836 |
| data9 | 2 | 16.17 | 13.58 | 10.66 | 17.19 | 11.82 | 12.21 | 11.24 | 12.03 | 13.86 | 14.93 | 10.86 | 13.06 |
| | 3 | 25.35 | 15.65 | 20.49 | 28.96 | 12.12 | 12.1 | 13.03 | 11.92 | 14.8 | 6.703 | 11.51 | 13.78 |
| | 4 | 12.65 | 11.33 | 21.61 | 28.7 | 12.18 | 11.2 | 12.46 | 12.06 | 7.937 | 8.199 | 12.83 | 12.36 |
| | 5 | 11.84 | 10.61 | 21.9 | 22.88 | 12.61 | 11.84 | 12.74 | 11.57 | 7.723 | 7.647 | 13.86 | 11.16 |
| | 6 | 11.92 | 13.22 | 9.841 | 11.65 | 11.76 | 11.63 | 11.78 | 11.51 | 8.499 | 8.499 | 8.499 | 8.499 |
| data10 | 2 | 32.14 | 30.13 | 25.96 | 30.47 | 27.84 | 25.34 | 30.2 | 28.6 | 30.14 | 27.71 | 29.78 | 31.83 |
| | 3 | 31.27 | 37.74 | 37.7 | 43.2 | 28.3 | 27.61 | 29.41 | 27.72 | 29.44 | 30.91 | 31.86 | 33.74 |
| | 4 | 36.91 | 41.74 | 110.5 | 57.46 | 26 | 28.35 | 29.69 | 27.73 | 31.55 | 32.74 | 32.86 | 29.7 |
| | 5 | 87.53 | 70.8 | 64 | 65.66 | 25.83 | 28.42 | 28.5 | 27.86 | 26.62 | 31.92 | 32.08 | 30.97 |
| | 6 | 58.12 | 109.3 | 73.69 | 70.89 | 27.07 | 27.94 | 27.36 | 28.7 | 28.34 | 29.66 | 28.68 | 32.72 |
| | 7 | 51.65 | 73.03 | 79.89 | 131.1 | 27.84 | 27.79 | 27.85 | 28.28 | 28.26 | 28.26 | 30.09 | 28.26 |
| | 8 | 119.2 | 59.68 | 164 | 152.5 | 27.69 | 27.69 | 27.66 | 27.71 | 29.86 | 29.86 | 29.86 | 29.86 |
| data11 | 2 | 8.346 | 7.481 | 11 | 8.346 | 6.88 | 6.17 | 10.39 | 6.88 | 8.277 | 7.54 | 11.42 | 8.277 |
| | 3 | 8.396 | 6.597 | 11.13 | 6.097 | 6.545 | 5.277 | 9.212 | 4.861 | 8.344 | 6.37 | 10.56 | 6.321 |
| | 4 | 7.488 | 5.862 | 9.489 | 5.973 | 5.985 | 5.099 | 8.338 | 4.833 | 8.257 | 6.095 | 10.14 | 5.835 |
| | 5 | 5.313 | 5.595 | 11.6 | 5.604 | 4.4 | 5.371 | 7.698 | 4.708 | 5.507 | 5.477 | 10.08 | 5.857 |
| | 6 | 5.811 | 5.117 | 6.838 | 6.057 | 4.648 | 4.766 | 6.491 | 4.819 | 5.689 | 5.716 | 7.097 | 5.717 |
| | 7 | 5.154 | 5.85 | 4.885 | 5.613 | 4.699 | 4.736 | 4.679 | 4.73 | 5.534 | 5.534 | 5.534 | 5.534 |
| data12 | 2 | 3.331 | 3.331 | 3.331 | 3.395 | 3.019 | 3.019 | 3.019 | 2.986 | 3.197 | 3.197 | 3.197 | 3.197 |
| | 3 | 3.235 | 2.893 | 2.893 | 2.953 | 2.79 | 2.395 | 2.395 | 2.386 | 3.221 | 2.802 | 2.802 | 2.802 |
| | 4 | 3.22 | 2.696 | 2.696 | 2.763 | 2.64 | 2.083 | 2.083 | 2.097 | 3.341 | 2.668 | 2.668 | 2.668 |
| | 5 | 2.16 | 2.136 | 2.136 | 2.202 | 2.031 | 2.035 | 2.035 | 2.026 | 2.124 | 2.124 | 2.124 | 2.124 |
| data13 | 2 | 7.721 | - | 7.379 | 6.556 | 2.869 | - | 6.862 | 3.945 | 3.568 | - | 5.544 | 5.214 |
| | 3 | 5 | - | 8.685 | 6.277 | 3.947 | - | 6.879 | 3.947 | 5.13 | - | 5.513 | 5.13 |
| | 4 | 7.615 | - | 7.089 | 6.629 | 3.991 | - | 2.859 | 2.929 | 5.283 | - | 3.159 | 3.147 |
| | 5 | 12.07 | - | 7.48 | 6.057 | 3.999 | - | 2.837 | 2.838 | 5.204 | - | 3.112 | 3.112 |
| | 6 | 13.17 | - | 5.768 | 8.832 | 2.983 | - | 2.897 | 2.917 | 3.225 | - | 3.245 | 3.245 |
| | 7 | 13.22 | - | 13.63 | 15.82 | 2.882 | - | 2.893 | 2.912 | 3.141 | - | 3.141 | 3.141 |
| data14 | 2 | 7.771 | 7.446 | 9.941 | 7.771 | 6.112 | 6.176 | 8.58 | 6.112 | 7.552 | 7.342 | 9.889 | 7.552 |
| | 3 | 7.141 | 5.993 | 10.04 | 6.621 | 5.626 | 4.746 | 7.94 | 4.918 | 7.041 | 5.641 | 9.745 | 6.317 |
| | 4 | 6.741 | 5.776 | 9.562 | 6.05 | 5.272 | 4.645 | 7.563 | 4.701 | 6.424 | 5.579 | 9.45 | 5.803 |
| | 5 | 6.283 | 5.097 | 8.557 | 5.867 | 4.892 | 4.731 | 6.401 | 4.58 | 5.938 | 4.815 | 7.866 | 5.62 |
| | 6 | 5.538 | 4.548 | 6.322 | 5.508 | 4.633 | 4.499 | 5.445 | 4.47 | 4.994 | 4.487 | 5.431 | 4.934 |
| | 7 | 4.636 | 4.662 | 4.71 | 4.472 | 4.105 | 4.111 | 4.108 | 4.11 | 4.081 | 4.081 | 4.081 | 4.081 |
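The appendix errors can be read programmatically by transcribing them and locating, for each filtering method, the feature size with the lowest error. The sketch below uses the RF columns of data11, copied from the table above. The helper `best_feature_size` is hypothetical; ties are resolved in favour of the smaller feature size.

```python
from typing import Dict, List, Sequence, Tuple

# RF errors for data11 transcribed from Appendix A (feature sizes 2 to 7).
FEATURE_SIZES: List[int] = [2, 3, 4, 5, 6, 7]
RF_DATA11: Dict[str, List[float]] = {
    "Correlation": [6.88, 6.545, 5.985, 4.4, 4.648, 4.699],
    "Partial Correlation": [6.17, 5.277, 5.099, 5.371, 4.766, 4.736],
    "Relief": [10.39, 9.212, 8.338, 7.698, 6.491, 4.679],
    "Information Entropy": [6.88, 4.861, 4.833, 4.708, 4.819, 4.73],
}


def best_feature_size(errors: Sequence[float],
                      sizes: Sequence[int] = FEATURE_SIZES) -> Tuple[int, float]:
    """Return (feature size, error) at the lowest error; min() keeps the first tie."""
    i = min(range(len(errors)), key=lambda j: errors[j])
    return sizes[i], errors[i]


if __name__ == "__main__":
    for method, errors in RF_DATA11.items():
        size, err = best_feature_size(errors)
        print(f"{method}: lowest RF error {err} at feature size {size}")
```

For these RF results, correlation and information entropy reach their lowest error at five features (4.4 and 4.708). Partial correlation and Relief reach theirs only when all seven features are retained (4.736 and 4.679).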

| Data Set | Type | Data Size | Feature Set Size | Min of CS (MPa) | Max of CS (MPa) |
|---|---|---|---|---|---|
| data0 | normal concrete | 7083 | 17 | 12.9 | 79.9 |
| data1 | high-performance concrete | 207 | 9 | 3 | 133.6 |
| data2 | self-compacting concrete | 127 | 6 | 10.2 | 73.5 |
| data3 | ultra-high-performance concrete | 58 | 9 | 77 | 211 |
| data4 | self-compacting concrete | 169 | 11 | 10.2 | 117.03 |
| data5 | slump-free concrete | 32 | 6 | 52.2 | 76.7 |
| data6 | masonry aggregate concrete | 147 | 8 | 8.7 | 80.5 |
| data7 | self-compacting concrete | 205 | 11 | 10.2 | 122 |
| data8 | rice husk ash concrete | 60 | 6 | 42.47 | 92.21 |
| data9 | self-compacting concrete | 80 | 6 | 10.2 | 73.5 |
| data10 | ultra-high-performance concrete | 110 | 8 | 95 | 240 |
| data11 | high-performance concrete | 425 | 7 | 8.536 | 81.751 |
| data12 | high-performance concrete | 357 | 5 | 4.096 | 91.3 |
| data13 | recycled concrete | 74 | 7 | 25.8 | 52.4 |
| data14 | high-performance concrete | 528 | 7 | 8.54 | 81.75 |

| Feature | Mean | Std. Dev. | Variance | Skewness | Kurtosis | Mode | Min | Median | Max | Range |
|---|---|---|---|---|---|---|---|---|---|---|
| ShengWei_P·O42.5 | 0.96 | 15.10 | 227.96 | 16.84 | 305.82 | 0.00 | 0.00 | 0.00 | 420.00 | 420.00 |
| ShengTai_P·O42.5 | 232.20 | 58.26 | 3394.48 | 0.08 | 0.90 | 220.00 | 0.00 | 230.00 | 430.00 | 430.00 |
| Fly_Ash | 81.40 | 14.69 | 215.86 | 0.78 | 9.68 | 80.00 | 0.00 | 80.00 | 200.00 | 200.00 |
| Fiber_expansion_agent | 0.65 | 4.67 | 21.77 | 7.01 | 47.13 | 0.00 | 0.00 | 0.00 | 34.00 | 34.00 |
| Siliceous_compacting_agent | 0.20 | 2.58 | 6.63 | 13.03 | 167.91 | 0.00 | 0.00 | 0.00 | 34.00 | 34.00 |
| Expansion_agent | 0.04 | 1.21 | 1.47 | 28.01 | 782.55 | 0.00 | 0.00 | 0.00 | 34.00 | 34.00 |
| Datang_S95_mineral_powder | 60.64 | 23.55 | 554.56 | −1.61 | 2.11 | 60.00 | 0.00 | 70.00 | 110.00 | 110.00 |
| Shangluo_medium_sand | 383.24 | 164.72 | 27,133.86 | 0.51 | 0.87 | 300.00 | 0.00 | 350.00 | 970.00 | 970.00 |
| Tongchuan_medium_sand | 380.68 | 164.72 | 27,133.64 | 0.55 | 0.90 | 300.00 | 0.00 | 350.00 | 970.00 | 970.00 |
| Coarse_sand | 167.36 | 145.53 | 21,180.04 | 0.17 | −0.96 | 0.00 | 0.00 | 200.00 | 780.00 | 780.00 |
| Commercial_coarse_sand | 347.41 | 154.69 | 23,929.07 | 0.72 | 1.44 | 300.00 | 0.00 | 300.00 | 950.00 | 950.00 |
| 5–25 mm_crushed_stone | 752.23 | 217.99 | 47,520.97 | −2.55 | 6.51 | 750.00 | 0.00 | 770.00 | 1050.00 | 1050.00 |
| Brick_slag | 0.67 | 21.55 | 464.44 | 32.17 | 1041.71 | 0.00 | 0.00 | 0.00 | 750.00 | 750.00 |
| 5–10 mm_fine_stone | 188.97 | 172.61 | 29,793.39 | 2.38 | 7.19 | 200.00 | 0.00 | 200.00 | 950.00 | 950.00 |
| Water_reducing_agent | 8.78 | 1.64 | 2.68 | 0.13 | 0.65 | 8.00 | 0.00 | 8.80 | 16.20 | 16.20 |
| water | 77.12 | 19.79 | 391.63 | 1.57 | 2.20 | 70.00 | 0.00 | 70.00 | 200.00 | 200.00 |
| Sewage | 59.01 | 24.58 | 604.37 | −1.83 | 1.72 | 70.00 | 0.00 | 70.00 | 100.00 | 100.00 |
| 28d_compressive_strength | 44.84 | 10.38 | 107.82 | −0.38 | 0.76 | 45.47 | 12.93 | 45.47 | 79.93 | 67.00 |

| Correlation: Feature | Score | Partial Correlation: Feature | Score | Information Entropy: Feature | Score | Relief: Feature | Score |
|---|---|---|---|---|---|---|---|
| ShengTai_P·O42.5 | 0.799 | ShengTai_P·O42.5 | 0.722 | ShengTai_P·O42.5 | 0.525 | Shangluo_medium_sand | −3.07 × 10⁻³ |
| Fly_Ash | 0.536 | Datang_S95_mineral_powder | 0.517 | Water_reducing_agent | 0.084 | ShengWei_P·O42.5 | −8.09 × 10⁻³ |
| Water_reducing_agent | 0.513 | ShengWei_P·O42.5 | 0.386 | Datang_S95_mineral_powder | 0.063 | Water_reducing_agent | −1.55 × 10⁻² |
| Sewage | 0.487 | water | 0.141 | Commercial_coarse_sand | 0.054 | water | −3.08 × 10⁻² |
| 5–10 mm_fine_stone | 0.425 | Sewage | 0.137 | water | 0.052 | ShengTai_P·O42.5 | −3.75 × 10⁻² |
| 5–25 mm_crushed_stone | 0.403 | Water_reducing_agent | 0.131 | 5–25 mm_crushed_stone | 0.051 | Expansion_agent | −1.44 × 10⁻¹ |
| Datang_S95_mineral_powder | 0.388 | Coarse_sand | 0.097 | Shangluo_medium_sand | 0.035 | Brick_slag | −2.85 × 10⁻¹ |
| water | 0.199 | Expansion_agent | 0.065 | Tongchuan_medium_sand | 0.035 | Fiber_expansion_agent | −6.56 × 10⁻¹ |
| Coarse_sand | 0.136 | 5–10 mm_fine_stone | 0.064 | Coarse_sand | 0.027 | 5–25 mm_crushed_stone | −8.35 × 10⁻¹ |
| Shangluo_medium_sand | 0.100 | Fly_Ash | 0.045 | Sewage | 0.026 | Coarse_sand | −9.32 × 10⁻¹ |
| Tongchuan_medium_sand | 0.096 | Tongchuan_medium_sand | 0.036 | Fly_Ash | 0.022 | Siliceous_compacting_agent | −9.50 × 10⁻¹ |
| Brick_slag | 0.063 | Shangluo_medium_sand | 0.024 | 5–10 mm_fine_stone | 0.019 | Datang_S95_mineral_powder | −1.04 × 10⁰ |
| Fiber_expansion_agent | 0.056 | Siliceous_compacting_agent | 0.019 | ShengWei_P·O42.5 | 0.006 | Tongchuan_medium_sand | −1.05 × 10⁰ |
| Expansion_agent | 0.044 | Commercial_coarse_sand | 0.018 | Fiber_expansion_agent | 0.000 | Fly_Ash | −1.12 × 10⁰ |
| Siliceous_compacting_agent | 0.034 | 5–25 mm_crushed_stone | 0.016 | Expansion_agent | 0.000 | 5–10 mm_fine_stone | −1.40 × 10⁰ |
| Commercial_coarse_sand | 0.031 | Fiber_expansion_agent | 0.014 | Siliceous_compacting_agent | 0.000312 | Sewage | −2.40 × 10⁰ |
| ShengWei_P·O42.5 | 0.01 | Brick_slag | 0.001 | Brick_slag | 0.00011 | Commercial_coarse_sand | −3.08 × 100 |
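
Each column pair of the ranking table orders the 17 mix variables by one filter criterion, and Algorithm 1 then takes the top-k features from each ordered list. The sketch below illustrates that filtering step for the correlation criterion only: features are ranked by the absolute Pearson coefficient and nested top-k subsets are returned. `pearson`, `rank_features` and `top_k_subsets` are hypothetical helpers, the toy data are invented, and ranking by |r| is an assumption about how signed correlations were ordered. Scores from partial correlation, information entropy or Relief could be passed to `rank_features` in the same way; for Relief, the table orders features by descending signed score, so `key` would be the identity function rather than `abs`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uninformative
    return sxy / math.sqrt(sxx * syy)


def rank_features(scores, key=abs):
    """Return feature names ordered from most to least important."""
    return sorted(scores, key=lambda name: key(scores[name]), reverse=True)


def top_k_subsets(ranking, k_max=None):
    """Nested candidate subsets: the top-1, top-2, ..., top-k_max features."""
    k_max = len(ranking) if k_max is None else k_max
    return {k: ranking[:k] for k in range(1, k_max + 1)}


# Toy data (hypothetical): target y and four candidate features
y = [2, 4, 6, 8, 10]
features = {
    "a": [1, 2, 3, 4, 5],  # r = 1.0
    "c": [1, 3, 2, 5, 4],  # r = 0.8
    "d": [2, 1, 2, 1, 2],  # r is approximately 0
    "e": [5, 4, 3, 1, 2],  # r = -0.9
}
scores = {name: pearson(col, y) for name, col in features.items()}
ranking = rank_features(scores)
subsets = top_k_subsets(ranking, k_max=3)
```

Here `ranking` is `["a", "e", "c", "d"]`: the strongly negative feature `e` is ranked second because the sort uses |r|. Each entry of `subsets` would then be passed to a model as one candidate feature subset.
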
| ANN parameter | ANN value | SVM parameter | SVM value | RF parameter | RF value |
|---|---|---|---|---|---|
| hidden_layer_sizes | 30 | kernel | rbf | n_estimators | 100 |
| random_state | 0 | C | 100 | min_samples_split | 2 |
| max_iter | 2000 | gamma | 0.1 | min_samples_leaf | 1 |
| activation | relu | epsilon | 0.1 | random_state | 0 |
| solver | adam | degree | 3 | min_impurity_decrease | 0 |
| alpha | 0.0001 | coef0 | 0 | min_weight_fraction_leaf | 0 |
| learning_rate_init | 0.001 | tol | 1 × 10⁻³ | ccp_alpha | 0 |
| power_t | 0.5 | max_iter | −1 | max_features | auto |
| tol | 0.0001 | - | - | max_samples | None |
| validation_fraction | 0.1 | - | - | - | - |
| beta_1 | 0.9 | - | - | - | - |
| beta_2 | 0.999 | - | - | - | - |
| epsilon | 1 × 10⁻⁸ | - | - | - | - |
| n_iter_no_change | 10 | - | - | - | - |
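
The parameter names in the hyperparameter table match the constructor arguments of scikit-learn's `MLPRegressor`, `SVR` and `RandomForestRegressor`, so one plausible way to instantiate the three configurations is sketched below. This is an assumption, not the authors' published code, and `build_models` is a hypothetical helper. Two translation choices are also assumptions: `hidden_layer_sizes = 30` is read as a single hidden layer of 30 neurons, written `(30,)`; and `max_features = auto`, which recent scikit-learn releases no longer accept for `RandomForestRegressor`, is replaced by `1.0` (all features), which is what `"auto"` meant for regressors.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR


def build_models():
    """Instantiate the three regressors with the tabulated hyperparameters."""
    ann = MLPRegressor(
        hidden_layer_sizes=(30,),  # assumed: one hidden layer of 30 neurons
        activation="relu",
        solver="adam",
        alpha=1e-4,
        learning_rate_init=1e-3,
        power_t=0.5,  # only used by the "sgd" solver; kept to mirror the table
        max_iter=2000,
        tol=1e-4,
        validation_fraction=0.1,  # only used when early_stopping=True
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-8,
        n_iter_no_change=10,
        random_state=0,
    )
    svm = SVR(
        kernel="rbf",
        C=100,
        gamma=0.1,
        epsilon=0.1,
        degree=3,    # ignored by the RBF kernel
        coef0=0.0,   # ignored by the RBF kernel
        tol=1e-3,
        max_iter=-1,  # -1 means no iteration limit
    )
    rf = RandomForestRegressor(
        n_estimators=100,
        min_samples_split=2,
        min_samples_leaf=1,
        min_weight_fraction_leaf=0.0,
        min_impurity_decrease=0.0,
        ccp_alpha=0.0,
        max_features=1.0,  # stands in for the removed "auto" (all features)
        max_samples=None,
        random_state=0,
    )
    return {"ANN": ann, "SVM": svm, "RF": rf}


models = build_models()
```

Each model could then be trained with `models[name].fit(X_train, y_train)`. The table does not state whether inputs were standardized before fitting the ANN and SVM; wrapping them in `make_pipeline(StandardScaler(), model)` is a common choice but would be a further assumption.
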
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Mo, Y.; Li, B.; Yan, C.; Hu, X. Influence of Data Structure on Prediction Error in Machine Learning-Based Concrete Compressive Strength Models. Buildings 2026, 16, 1537. https://doi.org/10.3390/buildings16081537
Mo Y, Li B, Yan C, Hu X. Influence of Data Structure on Prediction Error in Machine Learning-Based Concrete Compressive Strength Models. Buildings. 2026; 16(8):1537. https://doi.org/10.3390/buildings16081537
Chicago/Turabian Style: Mo, Yelan, Bixiong Li, Chengcheng Yan, and Xiangxin Hu. 2026. "Influence of Data Structure on Prediction Error in Machine Learning-Based Concrete Compressive Strength Models" Buildings 16, no. 8: 1537. https://doi.org/10.3390/buildings16081537
APA Style: Mo, Y., Li, B., Yan, C., & Hu, X. (2026). Influence of Data Structure on Prediction Error in Machine Learning-Based Concrete Compressive Strength Models. Buildings, 16(8), 1537. https://doi.org/10.3390/buildings16081537

