A Comparative Study for the Prediction of the Compressive Strength of Self-Compacting Concrete Modified with Fly Ash

Artificial intelligence and machine learning are employed to create functions that predict the strength of self-compacting concrete (SCC) based on the proportions of the input variables, including the cement replacement. SCC incorporating waste material has been used in the learning approaches. An artificial neural network (ANN), a support vector machine (SVM), and gene expression programming (GEP) were applied to a database of 300 data points to predict the mechanical property of SCC. The data used in modeling comprise several input parameters: cement, water–binder ratio, coarse aggregate, fine aggregate, and fly ash (FA), in combination with the superplasticizer. The best predictive models were selected based on the coefficient of determination ($R^2$) results and model validation. Empirical relations with mathematical expressions have been proposed using ANN, SVM, and GEP. The efficiency of the models is assessed by permutation feature importance, statistical analysis, and comparison between regression models. The results reveal that the proposed machine learning models achieved high accuracy and performed well in prediction.


Introduction
In recent years, concrete technology has advanced steadily, since concrete is the most commonly used building material in the world. Knowledge of advanced concrete design techniques has also improved, as different types of concrete containing different admixtures are being designed [1]. One result of this development is self-compacting concrete (SCC) [2]. Self-compacting concrete is defined as a cementitious material that can flow under its own weight; it was first developed in the late 1980s in Japan. SCC deforms efficiently and shows maximum resistance to segregation and bleeding, as per the American Concrete Institute Committee 237 report 237R-07 [3]. Moreover, owing to its workability, SCC is often used where elements of different shapes must be formed or where some parts of the elements are hard to reach [4].

Research Significance
The novelty of this research lies in the comparative use of recent machine learning algorithms to evaluate the compressive strength of fly ash-based self-compacting concrete. For this purpose, the artificial neural network, support vector machine, and gene expression programming were used; the use of gene expression programming for this purpose is particularly novel. The best model among those investigated was selected after optimization. Permutation feature importance and statistical analysis with in-depth error measures are used to compare the accuracy of the aforementioned models with one another and with other models reported in the literature.

Artificial Neural Networks (ANNs)
Artificial neural networks are algorithms that simulate the microstructure (neurons) of a biological nervous system [27][28][29]. Their structure is similar to the biological connections between neurons in the human brain. ANNs consist of layers: an input layer (the variables used to forecast the investigated property), a hidden layer (nodes connected to the other layers through functions and weights), and an output layer (the predicted variables). ANNs analyze data by means of learning algorithms such as the quasi-Newton, Levenberg-Marquardt, and conjugate gradient methods [8]. ANNs are widely used and can therefore be a useful tool in engineering applications [30].
In this study, a feed-forward multi-layer perceptron (MLP) ANN trained with the backpropagation algorithm was selected. One hidden layer with a varying number of neurons was used to find the optimum performance of the multilayer perceptron neural network (MLPNN) [31]. The learning algorithms used in ANN modeling of SCC compressive strength were the Broyden-Fletcher-Goldfarb-Shanno algorithm and the Levenberg-Marquardt algorithm. The data set division was fixed: 70% of the data were used for training and 30% for testing and validation [32]. Moreover, the training, validation, and testing sets were optimized by changing the number of neurons in the hidden layer with iteration, and vice versa. The most accurate results were obtained for a topology of six inputs, 13 hidden neurons, and one output. The topology of this network is presented in Figure 1 and described in Table 2.
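The reported 6-13-1 topology and the 70/30 division can be sketched as follows. This is only an illustration under stated assumptions: the tanh hidden activation, the linear output, the random initial weights, and all function names are hypothetical and not taken from the study, and no BFGS or Levenberg-Marquardt training is performed.

```python
import math
import random

# Topology reported in the text: six inputs, 13 hidden neurons, one output.
N_INPUTS, N_HIDDEN, N_OUTPUTS = 6, 13, 1


def init_network(rng):
    """Random initial weights; each row holds one neuron's weights plus a bias."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS + 1)]
              for _ in range(N_HIDDEN)]
    output = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN + 1)]
              for _ in range(N_OUTPUTS)]
    return hidden, output


def forward(network, x):
    """Feed-forward pass: tanh hidden layer (assumed), linear output layer (assumed)."""
    hidden, output = network
    h = [math.tanh(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
         for row in hidden]
    return [sum(w * v for w, v in zip(row[:-1], h)) + row[-1] for row in output]


def count_parameters(network):
    """Total number of weights and biases in the network."""
    return sum(len(row) for layer in network for row in layer)


def split_70_30(records, rng):
    """Shuffle the records and keep 70% for training, 30% for testing/validation."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)
network = init_network(rng)
records = list(range(300))              # placeholder indices for 300 mixes
train, holdout = split_70_30(records, rng)
prediction = forward(network, [0.5] * N_INPUTS)

print(count_parameters(network))        # 6*13 + 13 + 13*1 + 1 = 105
print(len(train), len(holdout))         # 210 90
```

With 300 records the split yields 210 training and 90 hold-out records, and the network has 105 adjustable parameters, which is small relative to the training set.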

Support Vector Machine (SVM)
The support vector machine is a supervised learning model for classification and regression analysis, invented by Vapnik [33]. The data are represented as points in space, and the solution is the hyperplane (a line in 2D, a plane in 3D, etc.) with the widest possible gap between two classes. The points closest to this hyperplane, which define it, are called support vectors; however, in some situations the data set can be separated only after applying kernel functions, as presented in Figure 2. The support vector machine has been successfully used to solve engineering problems, e.g., analyzing the durability of lightweight cement composites with hydrophobic coatings modified by nanocellulose [33]. In this work, the ν-SVM was used, with the linear kernel function proving the most accurate. The other kernel functions tested (polynomial, RBF, and sigmoid) were less accurate.
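The four kernel families named above can be written down in their commonly used textbook forms. The sketch below is illustrative only: the hyperparameter names `gamma`, `coef0`, and `degree` and their default values are assumptions, not settings reported in the study.

```python
import math


def dot(x, z):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))                  # 11.0
print(polynomial_kernel(x, z, degree=2))    # (11 + 1)^2 = 144.0
print(rbf_kernel(x, x))                     # 1.0 for identical points
```

The linear kernel is simply the inner product, so a model that performs best with it suggests that the relationship is captured well without mapping the inputs to a higher-dimensional space.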

Gene Expression Programming (GEP)
Gene expression programming is a versatile approach, as it incorporates both genetic algorithms (GAs) and genetic programming (GP) [34]. The algorithm is built on trees called expression trees (ETs), and its benefit is that genetic operations are greatly simplified at the chromosome level [35]. Another difference between GEP and GAs is that the individual chromosomes contain numerous genes, which are additionally divided into a head and a tail [36]. Each gene of GEP, presented as a node of the ET, stores a fixed-length set of variables, a function set, and a terminal set. The function set, terminal set, and variables are connected with each other via a linear genetic code. It is worth mentioning that these sets must have the closure property. A GEP gene can also be represented by an expression tree (ET) diagram; an example is shown in Figure 3.

Every gene (chromosome) is expected to contain a head, and the algorithm begins by creating chromosomes. The individuals (genes) in GEP are then selected and expressed as expression trees, which are executed in the analysis. After the analysis, the fitness is estimated, and on this basis the decision to dismiss or reiterate is made. Dismissing ends the algorithm, whereas reiterating means that the fitness is calculated and estimated once again to evaluate the suitability of another expression of the chromosomes as expression trees. The schematic diagram of the GEP algorithm is shown in Figure 4.
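How a linear gene is expressed as a tree can be illustrated with a small decoder. The sketch assumes the usual breadth-first (Karva) reading of a gene and a function set of the four arithmetic operators; the protected division that returns 1.0 for a zero denominator is one common way to satisfy the closure property and is an assumption here, as are the gene strings and variable names.

```python
import operator


def protected_div(a, b):
    """Division that never fails, so every function accepts every input (closure)."""
    return a / b if b != 0 else 1.0


FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": protected_div}
ARITY = {symbol: 2 for symbol in FUNCTIONS}


def decode(gene):
    """Build an expression tree from a gene read breadth-first.

    Each node is a pair [symbol, children]. Symbols that lie beyond the
    coding region are never attached to the tree.
    """
    nodes = [[symbol, []] for symbol in gene]
    i, next_child = 0, 1
    while i < next_child:
        node = nodes[i]
        for _ in range(ARITY.get(node[0], 0)):
            node[1].append(nodes[next_child])
            next_child += 1
        i += 1
    return nodes[0]


def evaluate(node, values):
    """Evaluate the tree recursively for a mapping of terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        return FUNCTIONS[symbol](*(evaluate(child, values) for child in children))
    return values[symbol]


values = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 3.0}
tree = decode("*+-abcd")          # head "*+-", tail "abcd": (a + b) * (c - d)
print(evaluate(tree, values))     # 6.0
```

Because the tail contains terminals only, any gene of this layout decodes to a valid tree, which is why genetic operators can act on the linear string without producing invalid expressions.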

Python-Based Correlation Graph
The collected SCC database, taken from published literature [17, includes information on the water-binder ratio, fly ash, fine and coarse aggregate, superplasticizer, and cement content (see Appendix A). The performance of each model is governed by the distribution of its parameters [61]. Machine learning and artificial intelligence are handy tools for predicting the mechanical properties of SCCs. The distribution and relationship (optimal quantities) of the input parameters to the output can be seen in contour form in Figure 5. Compressive strength increases with increasing cement content; the opposite holds for the water-binder ratio, where an increase in the ratio results in a decrease in compressive strength. Moreover, using these variable concentrations in SCCs yields the maximum compressive strength, thus eliminating the need for trial-and-error methods to obtain the target strength. Furthermore, the range and description of the data are shown in Tables 3 and 4. It may be concluded that machine learning and deep learning approaches greatly benefit the prediction of the mechanical properties of SCCs.
Table 4. The dataset from the latest works in the subject of concrete compressive strength prediction.
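The direction of the trends described for cement content and water-binder ratio can be checked with a Pearson correlation coefficient. The sketch below is a minimal illustration; the four mixes are hypothetical values invented for the example and are not taken from the SCC database.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mixes (illustration only, not data from the study).
cement = [300.0, 350.0, 400.0, 450.0]           # kg/m^3
water_binder = [0.50, 0.45, 0.40, 0.35]         # dimensionless
strength = [30.0, 36.0, 41.0, 47.0]             # MPa

print(pearson(cement, strength) > 0)            # True: strength rises with cement
print(pearson(water_binder, strength) < 0)      # True: strength falls with w/b
```

A positive coefficient for cement and a negative one for the water-binder ratio would correspond to the trends reported for Figure 5.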

Sensitivity Analysis or Permutation Feature Importance
The influence of the parameters on the compressive strength of SCC was calculated using a machine learning program written in Python. Figure 6 shows that cement and fly ash play a vital role in SCC compressive strength prediction, with a net contribution of 53%, whereas the coarse aggregate and water-binder ratio have an influence of 27.27% on the compressive strength of SCC.
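Permutation feature importance can be sketched as follows: one input column is shuffled at a time, and the resulting increase in prediction error is taken as the importance of that input. This is a minimal illustration, not the program used in the study; the toy model, the data, and the choice of mean squared error as the score are all assumptions.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a model over a set of rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model that depends on feature 0 only."""
    return 2.0 * row[0]


rows = [[float(i), float(10 - i)] for i in range(1, 9)]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets)
shares = [100.0 * s / sum(scores) for s in scores]   # percentage contributions
print(shares)   # feature 1 is unused by the model, so its share is 0.0
```

Normalizing the scores to percentages gives contributions of the kind reported in Figure 6; a feature the model ignores receives a share of zero.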
The development of the SCC model incorporating waste material is based on the selection of input parameters. These variables have a strong impact on the mechanical properties of SCCs. All parameters in the dataset were carefully studied, and only the influential parameters for a generalized relationship were selected. The compressive strength response ($f_c$) of SCC depends upon the following factors, as illustrated in Equation (1).
$$f_c = f(\text{cement},\ \text{water-binder ratio},\ \text{coarse aggregate},\ \text{fine aggregate},\ \text{fly ash},\ \text{superplasticizer}) \qquad (1)$$
It must be noted that properly fitted parameters play a major part in the effectiveness and simplification of the established model. The parameters of the GEP algorithm were chosen on the basis of research recommendations and numerous preliminary runs [62]. The number of chromosomes (population size) and the head size are the key factors controlling program run time: a larger population and head size result in a longer run. Owing to the number of possible results and the difficulty of assessing the model, three population sizes (50, 100, and 150) and one head size were taken into consideration. The parameters of the model used in the GEP algorithm are listed in Table 5.
The coefficient of determination ($R^2$) is a common measure of the performance of any machine learning model. Nevertheless, the correlation coefficient $R$ is insensitive to multiplying or dividing the output values by a constant, which implies that $R$ cannot be used as the sole measure of the predictive precision of a model. Therefore, errors such as the root mean square error (RMSE), mean absolute error (MAE), and relative squared error (RSE) were also calculated. A performance index ($\rho$) is proposed to measure model efficiency as a function of both $R$ and the relative root mean square error (RRMSE) [63]. The expressions for these error functions are given as equations in which $ex_i$ and $mo_i$ are the experimental and model values, and $\overline{ex}_i$ and $\overline{mo}_i$ are their mean values.
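The error measures named above can be computed as in the following sketch. It uses the commonly cited definitions of these measures, which are assumed here rather than copied from the study; in particular, the form $\rho = \mathrm{RRMSE}/(1 + R)$ for the performance index and all function names are assumptions.

```python
import math


def rmse(ex, mo):
    """Root mean square error between experimental and model values."""
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(ex, mo)) / len(ex))


def mae(ex, mo):
    """Mean absolute error."""
    return sum(abs(e - m) for e, m in zip(ex, mo)) / len(ex)


def rse(ex, mo):
    """Relative squared error: model error relative to the spread of the data."""
    mean_ex = sum(ex) / len(ex)
    return (sum((m - e) ** 2 for e, m in zip(ex, mo))
            / sum((mean_ex - e) ** 2 for e in ex))


def rrmse(ex, mo):
    """RMSE divided by the absolute mean of the experimental values."""
    return rmse(ex, mo) / abs(sum(ex) / len(ex))


def correlation(ex, mo):
    """Linear correlation coefficient R."""
    mean_ex, mean_mo = sum(ex) / len(ex), sum(mo) / len(mo)
    num = sum((e - mean_ex) * (m - mean_mo) for e, m in zip(ex, mo))
    den = math.sqrt(sum((e - mean_ex) ** 2 for e in ex)
                    * sum((m - mean_mo) ** 2 for m in mo))
    return num / den


def performance_index(ex, mo):
    """Assumed form: rho = RRMSE / (1 + R); lower values indicate a better model."""
    return rrmse(ex, mo) / (1.0 + correlation(ex, mo))


ex = [1.0, 2.0, 3.0]      # toy experimental values
mo = [2.0, 3.0, 4.0]      # toy model values, each shifted by +1
print(correlation(ex, mo), rmse(ex, mo))
```

In this toy case the model values are perfectly correlated with the experimental ones even though every prediction is off by one unit, which illustrates why $R$ alone is not a sufficient measure of accuracy.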

Artificial Neural Network
The regression coefficient $R^2$ and the statistical characteristics of the errors between actual targets and modeled outputs are used to evaluate the performance of the MLP-ANN model. The network output is assessed independently for the training, validation, and testing sets. The correlation between experimental and predicted values for the training, validation, and testing sets is shown in Figure 7. It shows a strong relation between the experimental data and the modeled output. The training, validation, and testing sets give coefficients of correlation close to 1, as illustrated in Figure 7a,c,e. Moreover, the prediction accuracy of the ANN can also be evaluated from its error distribution. Figure 7b,d,f presents the error distributions of the training, validation, and testing sets, showing satisfactory performance of the model. The error between the experimentally measured and predicted compressive strength for the training set lies mostly below 10

Support Vector Machine
The regression coefficient $R^2$ and the statistical characteristics of the errors between actual targets and modeled outputs are used to evaluate the performance of the SVM model. The correlation between experimental and predicted values for the training, validation, and testing sets is shown in Figure 8. It shows that a relation between the experimental data and the modeled output exists, but it is weaker than for the ANN. The coefficients of correlation for the training, validation, and testing sets are lower than for the ANN but still very high, as illustrated in Figure 8a,c,e. Moreover, the prediction accuracy of the SVM is also illustrated by its error distribution, presented in Figure 8b,d,f. The error values range between −17.75 MPa and 17.00 MPa for the training set, as depicted in Figure 8b. The validation set shows the same trend with a narrower error range, between −11.33 MPa and 14.35 MPa, as illustrated in Figure 8d, while for the testing set the range is somewhat wider, between −15.78 MPa and 21.82 MPa, as depicted in Figure 8f. Thus, the SVM model is less accurate than the ANN.

Gene Expression Programming
The output of the GEP algorithm for the SCC model is given as expression trees, as illustrated in Figure 9. The GEP algorithm solves nonlinear as well as linear expressions by forming a tree-like structure, which can then be used to form an equation that predicts the model outcome. These ETs were then decoded to give empirical relationships. The ETs for the compressive strength of SCC contain four basic mathematical functions: addition, multiplication, subtraction, and division. Moreover, these expression trees contain parameters and constants used to prepare the empirical equations, as shown in Table 6.
Materials 2021, 14, x FOR PEER REVIEW 13 of 28

The relationships defined between the ETs and genes help in predicting the compressive strength of self-compacting concrete ($f_c$). The compressive strength is then predicted from the expression trees by using Equation (5). The model predictions are compared with the actual SCC strength results in Figure 10. It shows that all input variables used to predict $f_c$ of SCC are accurately taken into account by the model. The presented results are highly correlated, as can be seen in Figure 10a,c,e; this is also confirmed by the linear correlation coefficients, equal to 0.941, 0.935, and 0.947 for the training, validation, and testing sets. The efficiency of the proposed model is significantly affected by the number of data points [63]. This research uses 300 data points for the prediction of SCC strength; hence, high accuracy of the model is expected. The error distribution of the predicted values is presented in Figure 10b,d,f. All sets for the GEP model show small errors, with a maximum that lies below 10 MPa. This confirms the accuracy of the model with respect to the regression models, and it is on the same level of accuracy as the ANN.

Comparison between the Proposed Models
The machine learning algorithms used in this article are accurate in predicting the compressive strength of self-compacting concrete modified with fly ash. This can be observed from the values of the parameters describing their accuracy: the linear coefficient of correlation R, the root mean square error RMSE, and the mean absolute error MAE. Among the artificial neural network, the support vector machine, and gene expression programming, it is difficult to identify the most accurate algorithm. The least accurate was the support vector machine, which had the lowest values of the linear coefficient of correlation and the highest error values in all processes. However, even though the neural network was the most accurate during the training process, the gene expression programming algorithm was more accurate in the testing and validation processes. Thus, for construction practice, it might be beneficial to use the algorithm that performs better in testing and validation rather than in training, because of the threat of overfitting. In Figure 11, the aforementioned algorithms are compared with other models presented in the literature.
All of the investigated models predict the SCC compressive strength well in comparison with the literature. However, because none of the models was perfectly accurate (i.e., none reached a linear correlation coefficient of 1.0), it is still possible to improve the algorithms by building other databases or using different algorithms.

Figure 11. The comparison of models for SCC compressive strength prediction.

Conclusions
This research discusses the application of machine learning, in particular the artificial neural network, support vector machine, and gene expression programming, to the prediction of self-compacting concrete compressive strength. Based on an extensive literature survey of experimental SCC compressive strength values and on numerical analysis using ANN, SVM, and GEP, the following conclusions can be drawn:
1. ANN-, SVM-, and GEP-based models predict the compressive strength of SCC; however, ANN and GEP are the most accurate for this purpose.

Statistical analysis and external checks give robust responses for all models.
These models were used for prediction rather than conducting experimental work; thus, their utilization in the civil engineering field will lower the carbon footprint. Below, a few recommendations for continuing similar research in the future are presented:
1. Hybrid models or advanced evolutionary algorithms can be developed, and the results can be compared to the present study.
2. The techniques used in this study can be used to model other engineering properties of concrete and structures.
As every study and technique has some limitations, some of the limitations of GEP are as follows:
1. Sometimes, the GEP is trapped in a local region that does not contain the global optimum. This phenomenon is called premature convergence and is one of the serious problems in genetic algorithms.
2. The "best" fitness is in comparison to other fitness; i.e., the stop criterion is not clear in every problem.
3. For specific optimization problems and problem instances, other optimization algorithms may be more efficient than genetic algorithms in terms of speed of convergence.

Funding: The APC was funded by Cracow University of Technology.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: All the data is available within the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.