Compressive Strength Prediction via Gene Expression Programming (GEP) and Artificial Neural Network (ANN) for Concrete Containing RCA

To minimize environmental risks and promote sustainable development, the use of recycled aggregate (RA) is gaining popularity worldwide. Using recycled coarse aggregate (RCA) in concrete is an effective way to reduce environmental pollution. RCA has not gained wider acceptance because of the mortar adhered to its surface, which harms the properties of concrete. A suitable mix design, however, enables RCA concrete to reach the targeted strength and makes it applicable to a wide range of construction projects. Verifying in the laboratory that a proposed mix design achieves the targeted strength is time-consuming and may delay construction work. To overcome this drawback, two supervised machine learning (ML) algorithms, gene expression programming (GEP) and an artificial neural network (ANN), were applied in this study to predict the compressive strength of RCA-based concrete. The coefficient of determination (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) were evaluated to assess model performance, and the k-fold cross-validation method was adopted to confirm it. The GEP model gave more accurate predictions, with an R² of 0.95 compared with 0.92 for the ANN. In addition, a sensitivity analysis was conducted to determine the contribution of each input parameter. Increasing the number of data points and applying other supervised ML approaches, such as boosting, gradient boosting, and bagging, may further improve the predictions.


Introduction
The consumption of natural aggregate is rising sharply with the increasing production and use of concrete in the construction sector [1,2], and construction industries are the largest consumers of natural aggregate [3]. About 15 billion tons of concrete are produced worldwide each year, roughly two tons per person per year [4]. Meanwhile, sources of good-quality natural aggregate are declining significantly worldwide [5]. In the European Union alone, aggregate use has reached approximately two billion tons per year. Construction activities demand large quantities of natural materials to produce cement and aggregate, and the construction sector, an enormous consumer of natural resources, also generates huge amounts of waste [6]. The use of raw materials in the construction industry is a key cause of environmental risk and pollution [7] and has led to the depletion of minerals and other natural resources [8]. Resources including cement, fine aggregate, and coarse aggregate will become scarce because their supply cannot keep up with the growing demand of the construction industry [9]. Furthermore, sustainable waste management is one of the most pressing problems facing the world. Therefore, to minimize the environmental impact and energy consumption of concrete used in construction, incorporating construction and demolition waste into the concrete mix design is a favorable, sustainable engineering approach. The use of recycled coarse aggregate (RCA) can also contribute significantly to sustainable construction and reduce environmental risks [10].
The main difference between natural aggregate and recycled coarse aggregate (RCA) is the mortar adhered to the surface of RCA [11]. The properties of RCA differ to some extent from those of natural aggregate. RCA is generally porous, with a low saturated surface-dry density and bulk density of 2310–2620 kg/m³ and 1290–1470 kg/m³, respectively [12]. This porosity results from the high content of adhered mortar, which also reduces the resistance of RCA to chemical and mechanical effects. RCA also shows much higher water absorption (4% to 9%) than natural aggregate (1% to 2%) [13]. Both porosity and water absorption increase with the amount of adhered mortar [14,15], which also governs density and absorption capacity. These parameters affect the fresh properties of concrete and reduce its strength. A proper mix design for RCA can nevertheless ensure acceptable concrete properties for use in a range of construction projects. The properties of concrete can also be improved by using other materials such as silica fume, fly ash, and natural and artificial fibers [16][17][18][19].
Several studies have examined the use of recycled aggregate (RA) in concrete at various replacement percentages [20,21], investigating both the fresh and mechanical properties of RA-based concrete [22][23][24]. RA of different qualities has been used in concrete to maintain or increase its strength [25][26][27][28], and these studies showed that the targeted strength could be achieved even when 80% of the coarse aggregate was replaced with RCA. Khaldoun et al. [23] investigated the mechanical properties of concrete containing RCA, measuring the compressive strength of specimens at different ages to analyze the behavior of the concrete. Muzaffer et al. [29] described the mechanical and physical properties of RCA concrete containing ground granulated blast-furnace slag (GGBFS) and concluded that the splitting tensile strength improved when specimens were tested at various ages. Etxeberria et al. [30] studied the influence of RCA and its production process on the properties of recycled aggregate concrete, preparing mixes with 0%, 25%, 50%, and 100% recycled aggregate. Sumayia et al. reported the mechanical properties of concrete made with three successive generations of 100% RCA and found that concrete with repeatedly recycled aggregate had marginally lower compressive strength than normal concrete.
Supervised machine learning (ML) techniques are used extensively in artificial intelligence (AI) and computer science and have also been applied successfully in engineering. They have been adopted rapidly in civil engineering, especially for predicting the strength properties of concrete, where supervised ML approaches can predict outcomes with high accuracy. Ayaz et al. [31] predicted the compressive strength of fly ash-based concrete with individual and ensemble ML approaches. Miao et al. [32] used multiple linear regression (MLR), support vector machine (SVM), and ANN models to predict the bond strength between fiber-reinforced polymers (FRPs) and concrete and compared the accuracy of the predictions. Khoa et al. [33] used ML algorithms to forecast the compressive strength of green fly ash-based geopolymer concrete. Marjana et al. applied different ML techniques to predict the compressive strength of concrete and analyzed the prediction accuracy and error distribution. Ayaz et al. [34] used artificial neural network (ANN), gene expression programming (GEP), and decision tree (DT) techniques to forecast the surface chloride concentration in concrete containing waste material and found GEP to be more effective than the other algorithms. The present research likewise applies supervised ML approaches, ANN and GEP, to predict the compressive strength of concrete containing recycled coarse aggregate. Statistical checks, the k-fold cross-validation method, and the error distribution are used to confirm model performance.
This study applies two supervised machine learning algorithms, gene expression programming (GEP) and an artificial neural network (ANN), to predict the compressive strength of concrete containing recycled coarse aggregate (RCA) using a dataset of 344 data points. It also compares the performance of the GEP and ANN models in terms of the coefficient of determination (R²). Statistical checks, error evaluation (MAE, MSE, and RMSE), k-fold cross-validation, and sensitivity analysis were used to evaluate the performance of both models. This study can help civil engineering researchers predict strength properties without spending extensive time on laboratory work.

Data Description
Supervised machine learning algorithms require a set of input variables to predict an output variable. The data used in this study to forecast the compressive strength of recycled coarse aggregate-based concrete were taken from previously published literature and are listed in Appendix A. Nine parameters were used as model inputs: water, cement, sand, natural coarse aggregate, recycled coarse aggregate (RCA), superplasticizer, size of RCA, density of RCA, and water absorption of RCA; compressive strength was the single output. Both the choice of input parameters and the total number of data points strongly influence a model's outcome. A total of 344 data points (mixes) of RCA-based concrete were used. The ANN model was run in Python using the Anaconda distribution, while the GEP model was run in GEP software. The relative frequency distribution of each parameter is shown in Figure 1, the descriptive statistics of all parameters are listed in Table 1, and the flowchart of the research approach is shown in Figure 2.
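The descriptive statistics in Table 1 and the relative frequency distributions in Figure 1 can be produced with a few lines of code. The sketch below uses plain Python with only the standard library and is an assumption about how such a summary might be computed, not the authors' script. The column identifiers in `INPUTS` are hypothetical names for the nine inputs listed above, and the five water-absorption values are made-up placeholders rather than data from Appendix A.

```python
import statistics

# Hypothetical identifiers for the nine input variables named in the text.
INPUTS = [
    "water", "cement", "sand", "natural_coarse_aggregate",
    "recycled_coarse_aggregate", "superplasticizer",
    "rca_size", "rca_density", "rca_water_absorption",
]


def describe(values):
    """Summary statistics of the kind usually reported in a table like Table 1."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
    }


def relative_frequency(values, n_bins):
    """Share of values falling into each of n_bins equal-width bins (a histogram)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


# Hypothetical water-absorption values (%) for five mixes, for illustration only.
absorption = [4.0, 5.5, 6.0, 7.5, 9.0]
print(describe(absorption))
print(relative_frequency(absorption, n_bins=5))
```

Applied column by column to the real dataset, `describe` would give one row of a descriptive-statistics table and `relative_frequency` one panel of a frequency plot.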

Methodology
Two algorithms, GEP and ANN, were used in this study to predict the compressive strength of recycled aggregate concrete (RAC). The ANN model was coded in Python in Spyder 4.1.1, launched from Anaconda Navigator, whereas the GEP model was developed in dedicated GEP software to predict the compressive strength of concrete containing recycled coarse aggregate. Both models used the nine parameters as inputs and compressive strength as the single output. The accuracy of each model was expressed by the coefficient of determination (R²) between actual and predicted values; R² normally ranges from 0 to 1, and a higher value indicates closer agreement between the actual and predicted results. Gene expression programming belongs to the family of evolutionary algorithms and is closely related to genetic programming (GP). As an evolutionary algorithm, GEP can design computer programs and models. These programs are represented as tree-like structures (expression trees) that learn and adapt by changing their shapes, compositions, and sizes, much like living organisms, and they are encoded in simple linear chromosomes of fixed length. GEP has five components: the terminal set, function set, control parameters, fitness function, and termination condition. As presented by Ferreira (2006), GEP is a modified form of GP based on the theory of population evolution. A distinctive feature of GEP is that genetic variation takes place on the simple linear chromosome, so individual genes can be passed to the next generation without reproducing and mutating the complete tree structure. Each gene has a fixed length and consists of symbols from the terminal set (the input variables) and the function set (e.g., arithmetic operations).
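In Ferreira's formulation, a gene is written as a K-expression (Karva notation) and read level by level to build its expression tree. For a head of length h and a maximum function arity n, a tail of length t = h(n − 1) + 1 (containing only terminals) guarantees that every gene decodes into a valid tree. The sketch below illustrates only this genotype-to-phenotype step under assumed settings: the four-operator function set (with protected division), the terminal symbols `w` and `c`, and the example gene are hypothetical. The evolutionary operators (mutation, transposition, recombination) and the fitness-based selection that a GEP package applies are not shown.

```python
import operator

# Hypothetical function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    # Protected division (returns 1.0 for a zero divisor), a common GP convention.
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),
}


def tail_length(head_length, max_arity=2):
    """Tail length t = h(n - 1) + 1, so any gene of length h + t decodes validly."""
    return head_length * (max_arity - 1) + 1


def decode(gene):
    """Read a K-expression level by level and link it into an expression tree.

    Each node is [symbol, children]. Symbols after the last one needed by the
    tree (the non-coding region) are ignored. Assumes a valid head/tail gene,
    with functions allowed only in the head.
    """
    nodes = [[symbol, []] for symbol in gene]
    next_free = 1  # position of the next symbol not yet attached to the tree
    for i, (symbol, children) in enumerate(nodes):
        if i >= next_free:  # every remaining symbol is non-coding
            break
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        for _ in range(arity):
            children.append(nodes[next_free])
            next_free += 1
    return nodes[0]


def evaluate(node, variables):
    """Recursively evaluate a decoded tree for one set of input values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, variables) for child in children))
    return variables[symbol]  # terminal: look up the input variable


# Hypothetical gene with head "*+w" (h = 3) and tail "cwcc" (t = tail_length(3) = 4).
# Terminals w and c stand for two illustrative inputs (e.g., water and cement).
tree = decode("*+wcwcc")  # encodes (c + w) * w
print(evaluate(tree, {"w": 2.0, "c": 3.0}))  # → 10.0
```

A gene such as `"w*+cwcc"` starts with a terminal, so only its first symbol is expressed and the rest forms a non-coding region. This is how GEP keeps chromosomes at a fixed length while still allowing expression trees of different sizes.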
GEP can learn complex relationships from the input data and express the resulting output as a simple, explicit expression. An artificial neural network (ANN) is a computing system designed to simulate the way the human brain analyzes and processes information. ANNs are a foundation of artificial intelligence (AI) and can solve problems that would be difficult or impossible for humans to solve. They also have a self-learning capability that allows them to improve their results through training. An ANN is structured like a brain, with neuron nodes interconnected like a web: each artificial neuron computes a weighted sum of its inputs, passes it through an activation function, and sends the result to the next layer, and the connection weights are adjusted during training. The human brain consists of billions of cells known as neurons, each with a cell body responsible for processing the information carried toward and away from the brain. ANNs are now applied in almost every industry and field to predict required outcomes.
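How an ANN learns from data can be illustrated with a minimal feed-forward network trained by back-propagation. This is a generic sketch, not the network used in the study, whose architecture and hyperparameters are not given in this section. The single hidden layer of four tanh neurons, the linear output neuron, the learning rate, and the toy data (two normalized inputs and a linear target) are all assumptions.

```python
import math
import random


class TinyMLP:
    """One-hidden-layer feed-forward network for scalar regression (illustrative)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; biases start at zero.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        # Each hidden neuron: weighted sum of inputs plus bias, then tanh.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Linear output neuron for a continuous target such as strength.
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, out

    def predict(self, x):
        return self._forward(x)[1]

    def train_step(self, x, y, lr):
        """One stochastic gradient step on the loss 0.5 * (prediction - y) ** 2."""
        hidden, out = self._forward(x)
        err = out - y  # derivative of the loss with respect to the output
        for j, h in enumerate(hidden):
            # Back-propagate through w2[j] (before it is updated) and tanh' = 1 - h^2.
            grad_z = err * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_z * xi
            self.b1[j] -= lr * grad_z
        self.b2 -= lr * err


def mean_squared_error(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical normalized toy data with target y = 0.5*x0 - 0.3*x1 (not concrete data).
rng = random.Random(1)
data = []
for _ in range(50):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    data.append((x, 0.5 * x[0] - 0.3 * x[1]))

model = TinyMLP(n_inputs=2, n_hidden=4, seed=0)
before = mean_squared_error(model, data)
for _ in range(200):  # epochs
    for x, y in data:
        model.train_step(x, y, lr=0.05)
after = mean_squared_error(model, data)
print(f"training MSE before: {before:.4f}, after: {after:.5f}")
```

For the nine mix parameters, the inputs would normally be scaled to a common range before training. In practice, a dedicated library would replace this hand-written loop.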

Statistical Analysis
Figure 3 compares the actual and predicted compressive strengths of RCA-based concrete from the GEP and ANN models, together with their error distributions. The GEP model gives high accuracy and low variance between the actual and predicted outputs; its coefficient of determination (R²) of 0.95 indicates strong predictive performance, as shown in Figure 3a. The distribution of errors for the GEP model is illustrated in Figure 3b: the maximum, minimum, and average errors of the training set were 22.37 MPa, 0.00 MPa, and 1.84 MPa, respectively. Of these errors, 21.73% lie below 1 MPa, 22.96% lie between 2 MPa and 5 MPa, and only 6.97% exceed 5 MPa. In addition, 21.73% of the error data lie below 1 MPa and 36.23% between 2 MPa and 5 MPa, while only 7.24% exceed 5 MPa.
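The error bands quoted above (below 1 MPa, between 2 MPa and 5 MPa, above 5 MPa) can be computed directly from the absolute differences between measured and predicted strengths. The function below is a sketch with assumed band boundaries, since the text does not state whether 2 MPa and 5 MPa are inclusive. The four strength pairs are hypothetical values used only to show the output.

```python
def error_distribution(actual, predicted):
    """Max/min/mean absolute error (MPa) and the percentage of errors in each band."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    n = len(errors)

    def share(condition):
        # Percentage of errors satisfying the given band condition.
        return 100.0 * sum(1 for e in errors if condition(e)) / n

    return {
        "max": max(errors),
        "min": min(errors),
        "mean": sum(errors) / n,
        "below_1": share(lambda e: e < 1.0),
        "1_to_2": share(lambda e: 1.0 <= e < 2.0),
        "2_to_5": share(lambda e: 2.0 <= e <= 5.0),
        "above_5": share(lambda e: e > 5.0),
    }


# Hypothetical measured vs. predicted strengths (MPa), for illustration only.
actual = [30.0, 42.5, 25.0, 51.0]
predicted = [30.5, 40.0, 31.0, 49.5]
print(error_distribution(actual, predicted))
```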

K-Fold Cross-Validation
The reliability of the models was analyzed with the k-fold cross-validation method. In this procedure, the data are shuffled randomly and divided into ten groups (folds); nine groups are used for training and the remaining one for validation. The procedure is repeated ten times, so that each group serves once as the validation set, and the results are averaged. Compared with a single train/test split, this gives a more reliable estimate of model accuracy. In addition, statistical checks in the form of error measures (MSE, MAE, and RMSE) were carried out, as listed in Table 2, and the predictive response of the models was evaluated with the statistical measures given by Equations (1)–(5), where $ex_i$ is the experimental value, $mo_i$ is the predicted value, $\overline{ex}$ is the mean experimental value, $\overline{mo}$ is the mean predicted value obtained by the model, and $n$ is the number of samples.
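For reference, the conventional definitions of MAE, MSE, RMSE, and the correlation coefficient R can be written with the symbols defined above. These are the standard forms and may differ in detail from the paper's Equations (1)–(5).

```latex
\begin{aligned}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|ex_i - mo_i\right| \\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(ex_i - mo_i\right)^2 \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(ex_i - mo_i\right)^2} \\
R &= \frac{\sum_{i=1}^{n}\left(ex_i - \overline{ex}\right)\left(mo_i - \overline{mo}\right)}
         {\sqrt{\sum_{i=1}^{n}\left(ex_i - \overline{ex}\right)^2 \sum_{i=1}^{n}\left(mo_i - \overline{mo}\right)^2}}
\end{aligned}
```

For a least-squares straight-line fit between experimental and predicted values, the square of R equals the coefficient of determination (R²) of that fit.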
The k-fold cross-validation results were evaluated with four metrics, the coefficient of determination (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE), whose distributions are shown in Figure 4. The GEP model's lower errors and higher R² indicate that it is the better predictor. The maximum, minimum, and average values for the GEP model were 0.77, 0.00, and 0.49, respectively, as shown in Figure 4a, and the corresponding values for the ANN model were 2.05, 0.00, and 0.68, as depicted in Figure 4b.
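The ten-fold procedure described above can be sketched as follows. The indices are shuffled and dealt into ten folds, the model is trained on nine folds and scored on the tenth, and the per-fold MAE, MSE, RMSE, and R² are averaged. Here R² is computed as 1 − SS_res/SS_tot, which is one common definition. The one-variable least-squares line `fit_line` is a hypothetical stand-in for the GEP and ANN models, and the 40 synthetic points are not the study's data.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def metrics(actual, predicted):
    """MAE, MSE, RMSE, and R^2 (computed as 1 - SS_res / SS_tot) for one fold."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


def cross_validate(xs, ys, fit, k=10, seed=0):
    """Train on k-1 folds, validate on the held-out fold, repeat k times, average."""
    per_fold = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        per_fold.append(metrics([ys[i] for i in held_out],
                                [predict(xs[i]) for i in held_out]))
    averages = {name: sum(f[name] for f in per_fold) / len(per_fold)
                for name in per_fold[0]}
    return averages, per_fold


def fit_line(xs, ys):
    """Hypothetical stand-in model: ordinary least-squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x


# Hypothetical data: 40 noisy points on a line (not the study's 344 mixes).
rng = random.Random(42)
xs = [rng.uniform(0.0, 100.0) for _ in range(40)]
ys = [20.0 + 0.3 * x + rng.gauss(0.0, 1.0) for x in xs]
averages, per_fold = cross_validate(xs, ys, fit_line, k=10)
print(averages)
```

With the 344 mixes used in the study, `k_fold_indices(344, 10)` would yield four folds of 35 samples and six of 34. The per-fold scores in `per_fold` are what a distribution plot such as Figure 4 would display.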

Sensitivity Analysis
The sensitivity analysis evaluates the effect of each input parameter on the predicted compressive strength of concrete containing recycled coarse aggregate, as depicted in Figure 5. The input parameters have a significant effect on the forecast outcome. The figure shows that recycled coarse aggregate (RCA) was the highest contributor at 41.1%, followed by natural coarse aggregate (NCA) at 25% and water at 20%. The remaining variables contributed less: cement 3.8%, fine aggregate 2.3%, superplasticizer 2.6%, size of coarse aggregate 1.9%, density of RCA 2%, and water absorption 1.3%. The contribution of each variable to the model output was calculated as follows:
$$N_i = f_{\max}(x_i) - f_{\min}(x_i), \qquad S_i = \frac{N_i}{\sum_{j=1}^{n} N_j} \times 100$$
where $f_{\max}(x_i)$ and $f_{\min}(x_i)$ are the maximum and minimum of the predicted output over the range of the $i$th input, and $S_i$ is the relative contribution (%) of the $i$th input.
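The relative-contribution measure N_i = f_max(x_i) − f_min(x_i), S_i = N_i / ΣN_j × 100 can be sketched in code. Each input is swept across its range while the others are held fixed, the spread of the model output is recorded as N_i, and the spreads are normalized to percentages. Holding the other inputs at a baseline (here the mid-range value) is an assumption, since the text does not state how the remaining inputs were fixed. The function `toy_model`, with its three inputs and ranges, is hypothetical and is not the fitted GEP or ANN model.

```python
def sensitivity(model, ranges, baseline, steps=20):
    """Relative contribution S_i (%) of each input, from output spreads N_i.

    Each input is swept over [lo, hi] in `steps` intervals while all other
    inputs stay at their baseline values (an assumed convention).
    """
    spans = {}
    for name, (lo, hi) in ranges.items():
        outputs = []
        for s in range(steps + 1):
            x = dict(baseline)
            x[name] = lo + (hi - lo) * s / steps
            outputs.append(model(x))
        spans[name] = max(outputs) - min(outputs)  # N_i = f_max - f_min
    total = sum(spans.values())
    return {name: 100.0 * span / total for name, span in spans.items()}


def toy_model(x):
    # Hypothetical linear surrogate, for illustration only.
    return 0.1 * x["rca"] - 0.05 * x["water"] + 0.01 * x["cement"]


ranges = {"rca": (0.0, 100.0), "water": (150.0, 250.0), "cement": (300.0, 500.0)}
baseline = {name: (lo + hi) / 2 for name, (lo, hi) in ranges.items()}
print(sensitivity(toy_model, ranges, baseline))
```

For a non-linear model such as a GEP expression or an ANN, the spreads depend on the chosen baseline, which is why the fixing convention matters when contributions like those in Figure 5 are compared.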

Discussion
This research applied supervised machine learning (ML) techniques to predict the compressive strength of recycled coarse aggregate-based concrete. Recycled aggregates are used in concrete to produce an effective material and support sustainable construction. The ML approaches used were gene expression programming (GEP) and an artificial neural network (ANN), and their predictive performance was compared to identify the better predictor. The GEP model was more accurate, with a coefficient of determination (R²) of 0.95, compared with 0.92 for the ANN model. The performance of both models was also confirmed by statistical checks and the k-fold cross-validation method, in which lower error values indicate better model performance. A sensitivity analysis was also carried out to determine the contribution of each parameter to the predicted compressive strength of concrete containing recycled coarse aggregate. Model performance can be affected by the input parameters used and by the number of data points. The sensitivity analysis of all nine input parameters identified the parameter that contributes most to the forecast result.

Conclusions and Future Recommendations
This study applied supervised machine learning approaches, gene expression programming (GEP) and artificial neural network (ANN) algorithms, to predict the compressive strength of concrete containing recycled coarse aggregate (RCA). The GEP model predicted more effectively than the ANN model, as confirmed by its higher coefficient of determination (R²) and lower errors. The following conclusions can be drawn.
The GEP model predicted the compressive strength of concrete containing recycled coarse aggregate (RCA) more accurately than the ANN model.
The results from the ANN model are also in the acceptable range and can be used for predicting the outcomes.
The high performance of the GEP model has also been confirmed from statistical checks and the k-fold cross-validation process.
This study applied GEP and ANN to predict the strength properties of concrete. ML approaches can predict strength properties without casting samples in the laboratory. However, applying other supervised machine learning algorithms would give a better indication of the relative accuracy of the employed ML techniques.
RCA had the largest effect (41.1%) on the predicted compressive strength of concrete compared with the other input variables.
Comparing more than two algorithms would make it easier to understand how the models differ in predicting the outcomes.
Future research should expand the dataset with results from experimental work, field tests, and other numerical analyses using different approaches (e.g., Monte Carlo simulation).
The input parameters could also be extended to include environmental effects (e.g., high temperature and humidity) to improve the models' response.
Other ensemble ML algorithms (e.g., AdaBoost, bagging, and boosting) may be more effective in predicting the compressive strength of concrete.
Author Contributions: A.A.: conceptualization, methodology, investigation, formal analysis, modeling, visualization, and writing-original draft preparation. K.C.: funding acquisition, methodology, investigation, formal analysis, writing-reviewing and editing, and supervision. F.F.: resources, methodology, and writing-reviewing and editing. W.A.: methodology, and writing-reviewing and editing. S.S.: conceptualization, methodology, and writing-reviewing and editing. F.A.: conceptualization, methodology, and writing-reviewing and editing. All authors have read and agreed to the published version of the manuscript.