Application of Gene Expression Programming (GEP) for the Prediction of Compressive Strength of Geopolymer Concrete

Fly ash (FA), a waste material, has been effectively utilized by various researchers in the production of geopolymer concrete (GPC). In this paper, the soft computing technique known as gene expression programming (GEP) is used to derive an empirical equation for estimating the compressive strength fc′ of GPC made with FA. To build the model, a consistent, extensive, and reliable database was compiled through a detailed review of the published research. The compiled data set comprises 298 experimental fc′ results. The most influential parameters are taken as explanatory variables, namely the extra water added as percent FA (%EW), the percentage of plasticizer (%P), the initial curing temperature (T), the age of the specimen (A), the curing duration (t), the fine aggregate to total aggregate ratio (F/AG), the percentage of total aggregate by volume (%AG), the percent SiO2 solids to water ratio (%S/W) in sodium silicate (Na2SiO3) solution, the NaOH solution molarity (M), the activator or alkali to FA ratio (AL/FA), the sodium oxide (Na2O) to water ratio (N/W) for preparing Na2SiO3 solution, and the Na2SiO3 to NaOH ratio (Ns/No). A GEP empirical equation is proposed to estimate the fc′ of GPC made with FA. The accuracy, generalization, and prediction capability of the proposed model were evaluated by performing parametric analysis and applying statistical checks, and the model was then compared with non-linear and linear regression equations.


Introduction
Fly ash (FA) is the unburned residue from coal-fired thermal power plants [1], which is transported by the gases emitted from the burning zone of the boiler. FA is collected through mechanical or electrostatic separators [2]. Around 375 million tons of FA are produced annually throughout the globe, with a disposal cost as high as $20-$40 per ton [3]. It is dumped into landfills in suburban areas [4]. However, dumping tons of FA without treatment has a harmful impact on the environment [5]. The hazardous materials in FA, such as silica, alumina, and oxides like ferric oxide (Fe2O3), are contributing factors in water, soil, and air pollution. This ultimately leads to health issues and various geo-environmental problems [6,7]. Sound waste management practice is desirable for sustaining a safe environment [8]. FA, if not properly disposed of, will affect the whole ecological cycle. Ultra-fine particles of FA act in the same way as poison when

Supervised Machine Learning Algorithms
Artificial neural networks (ANN), fuzzy logic, genetic algorithms (GA), and genetic programming (GP) are AI techniques inspired by natural processes. These methods have been used to solve the pre-mix design problems of rubberized concrete and waste foundry sand concrete by training on data collected from the literature [41,42]. The pattern recognition capabilities of AI methods (support vector regression or ANN) allow them to generalize complicated patterns. Therefore, they can be applied across the vast field of engineering [43]. In such approaches, however, the large number of hidden neurons often makes it impossible to establish accurate relations between the inputs and outcomes. ANN can be used for estimating the mechanical properties of concrete. Recently, Getahun et al. used ANN on 66 experimental data sets to estimate the compressive strength of rice husk ash-based concrete [44], while Mashhadban et al. predicted the workability of self-compacting concrete using ANN [45]. These models give a strong correlation but no empirical expression that can be used in practice. This is because of the complexity of the ANN model structure, which is considered the main obstacle to the wide-scale implementation of the ANN approach [46]. Multicollinearity is a further hindrance in such methods [47]. An updated ANN technique was likewise used to assess the compressive strength (fcc) of silica fume concrete and the elastic modulus (Ec) of concrete incorporating recycled aggregate. Because of the complexity of the proposed relationship, a dedicated graphical interface was created for practical use of the model [48].
A powerful soft computing technique, namely genetic programming (GP), is valuable because it does not rely on previously established forms of the relationship when developing the model [49,50]. An extension of GP, namely gene expression programming (GEP), which encodes a small program in fixed-length linear chromosomes, was introduced more recently [51]. GEP has the advantage that the outcome can be represented by a simple mathematical expression that is suitable for practical use and offers better predictive accuracy. It is currently used as a substitute for the common prediction techniques [52–58].
Compressive strength (fc′) is considered the primary factor in designing and analyzing concrete [59]. Researchers have focused on the experimental route to estimate the fc′ of FA-based GPC [60,61]. To save time and cost, and to sustain fly ash and cement for future use, accurate and reliable expressions are needed that relate the mix design variables to the fc′ of GPC made with FA. A thorough review of the literature reveals that there are few empirical models for estimating the fc′ of FA-based GPC [41,55,58]. However, the predictions of such empirical equations are confined to a specific data set, for example, to the results of the corresponding experimental study. The predictions from such models are not valid or accurate outside the corresponding database. Alkaroosh et al. [62] developed an empirical equation to estimate the fc′ of FA-based GPC, based on 56 data points collected from previous research [63]. In the proposed equation, no factor was included for the preparation of the sodium silicate solution. Their equation shows a purely linear relationship between the NaOH solution molarity and fc′ for FA-based GPC, whereas other researchers reported a decrease in compressive strength with increasing molarity of the NaOH solution [64]. To fill this research gap, the GEP approach is employed to establish a generalized and more effective empirical equation for estimating the fc′ of FA-based GPC with a tolerable error. A detailed database has been developed from published research that includes cylindrical specimens of size 200 × 100 mm (height × diameter) and cubic specimens of size 150 mm and 100 mm. The comprehensive database ensures that the model is consistent and applicable to data that were not used in its development. The model's performance is also verified through statistical errors, parametric analysis, sensitivity checks, and comparison with linear and non-linear regression methods.

Research Methodology
This section describes the methodology used to establish an empirical model for the compressive strength (fc′) of GPC made with FA.

Brief Review of Genetic Programming and Gene Expression Programming
Koza proposed the GP method as an alternative to the fixed-length binary strings used in GAs [65]. This method is illustrated in Figure 1, which is adapted from [37]. Five main parameters must be defined in the GP methodology: the set of terminals (the constants and the input variables), the set of primitive functions (domain-specific functions), the fitness evaluation, the control variables (crossover, population size, etc.), and the termination criteria followed by a result designation method [65]. The use of non-linear, parse tree-like structures makes GP an adaptable programming technique. It assumes any initial non-linearity depending on the data. A similar kind of non-linearity has been used previously [62,65]. A limitation of GP is that it lacks an independent genome. GP uses non-linear structures that act as both the genotype and the phenotype, which makes it unlikely to produce simple expressions. The GEP method was proposed by Ferreira as a modified version of GP to overcome these shortcomings [65]. A significant change in GEP is that only the genome is transmitted to the subsequent generation. Another noteworthy characteristic is that each individual is formed by a single chromosome composed of several genes [66]. Every gene in GEP consists of fixed-length parameters and terminal sets of constants, and the functions used are arithmetic operations. Furthermore, in the genetic code operators there is a stable relationship between the chromosome symbols and the associated functions. The information necessary for developing an empirical model is recorded in the chromosomes, and to decode this information a new language, Karva, was established. The phases covered in the GEP process are illustrated in Figure 2, which is adapted from [37].
The method starts with the random generation of fixed-length chromosomes for all individuals. These are subsequently converted into expression trees (ETs), and the fitness of each individual is evaluated. Reproduction continues with new individuals over several generations until satisfactory results are achieved. Genetic operators such as crossover, reproduction, and mutation are applied to modify the population.
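The conversion of a linear gene into an expression tree can be illustrated with a minimal Python sketch. It assumes the usual breadth-first reading of Karva notation; the symbol "Q" for the cube root, the terminal names d0 to d2, and the function name evaluate_kexpression are hypothetical choices made for illustration and are not taken from the software used in this study.

```python
import math

# Number of arguments taken by each function symbol.
# "Q" (cube root) is a hypothetical symbol chosen for this sketch.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}


def cube_root(x):
    """Real cube root that also works for negative inputs."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def evaluate_kexpression(gene, inputs):
    """Decode a gene written in Karva notation and evaluate it.

    gene   : list of symbols (functions and terminals), read breadth-first
    inputs : dict mapping terminal names to numeric values
    """
    # Pass 1: assign children level by level. Symbols beyond the last
    # assigned child form the non-coding region and are ignored.
    children = []
    next_free = 1
    i = 0
    while i < next_free:
        n_args = ARITY.get(gene[i], 0)
        children.append(list(range(next_free, next_free + n_args)))
        next_free += n_args
        i += 1

    # Pass 2: children always have larger indices than their parent,
    # so evaluating from the last node back to the root is safe.
    values = [0.0] * next_free
    for i in range(next_free - 1, -1, -1):
        symbol = gene[i]
        args = [values[c] for c in children[i]]
        if symbol == "+":
            values[i] = args[0] + args[1]
        elif symbol == "-":
            values[i] = args[0] - args[1]
        elif symbol == "*":
            values[i] = args[0] * args[1]
        elif symbol == "/":
            values[i] = args[0] / args[1]  # unprotected division
        elif symbol == "Q":
            values[i] = cube_root(args[0])
        else:
            values[i] = float(inputs[symbol])
    return values[0]


# The gene below encodes d0 * d1 + cube_root(d2); the last two symbols
# are never reached and belong to the non-coding region.
example_gene = ["+", "*", "Q", "d0", "d1", "d2", "d1", "d0"]
example_value = evaluate_kexpression(example_gene, {"d0": 2.0, "d1": 3.0, "d2": 8.0})
```

Because the unused trailing symbols stay in the chromosome, later genetic modifications can bring them into the coding region without ever producing an invalid expression.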

Data Collection
Compressive strength (fc′) is the main factor in the analysis and design of concrete structures. To save time and cost, and to sustain the use of FA in the construction industry, an accurate and reliable expression is needed that relates the mix proportions to the fc′ of GPC made with FA.
A detailed database for the fc′ of FA-based GPC was compiled from previously published experimental research [60,61,63,. The database comprises a total of 298 samples, which include 101 cylindrical specimens of size 200 mm × 100 mm (height × diameter), and 166 and 31 cube specimens of size 150 mm and 100 mm, respectively. The fc′ of cube and cylindrical specimens depends on the length to diameter (L/D) ratio [100,101]. The fc′ of 100 mm cubes is 5% greater than that of 150 mm cubes, while the fc′ of 150 mm cubes is 20% greater than that of cylindrical specimens of size 100 mm × 200 mm. As the volume of the specimen increases, the number of voids also increases, so a specimen with larger dimensions will have a lower fc′ than a specimen with smaller dimensions. Furthermore, the stress is inversely related to the cross-sectional area of the specimen. The specimen with the smaller cross-sectional area will have higher stresses, which means higher internal resistance to failure. Table 1 displays the normalization of the compressive strength of the various types of specimens considered in this study. The comprehensive database ensures the reliability of the model and its applicability to data that were not used in its development. The compiled database covers the explanatory parameters, namely the extra water added as percent FA (%EW), the percentage of plasticizer (%P), the age of the specimen (A), the curing duration (t), the fine aggregate to total aggregate ratio (F/AG), the percentage of total aggregate by volume (%AG), the percent SiO2 solids to water ratio (%S/W) in sodium silicate (Na2SiO3) solution, the NaOH solution molarity (M), the activator or alkali to FA ratio (AL/FA), and the Na2SiO3 to NaOH ratio (Ns/No), with compressive strength as the response. All the samples collected for the mentioned parameters were initially heat cured for 24 h at different temperatures.
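The size relations quoted above (100 mm cubes 5% stronger than 150 mm cubes, and 150 mm cubes 20% stronger than 100 mm × 200 mm cylinders) can be turned into a small conversion routine. The Python sketch below is only an illustration: the choice of the cylinder as the reference specimen, the specimen labels, and the function name are assumptions, since the normalization actually applied in Table 1 is not reproduced in the text.

```python
# Strength ratios stated in the text.
CUBE100_OVER_CUBE150 = 1.05   # 100 mm cube is 5% stronger than 150 mm cube
CUBE150_OVER_CYLINDER = 1.20  # 150 mm cube is 20% stronger than the cylinder


def to_cylinder_equivalent(strength_mpa, specimen):
    """Convert a measured strength to a 100 mm x 200 mm cylinder equivalent.

    specimen is one of "cylinder", "cube150", or "cube100"
    (hypothetical labels used only in this sketch).
    """
    if specimen == "cylinder":
        return strength_mpa
    if specimen == "cube150":
        return strength_mpa / CUBE150_OVER_CYLINDER
    if specimen == "cube100":
        return strength_mpa / CUBE100_OVER_CUBE150 / CUBE150_OVER_CYLINDER
    raise ValueError(f"unknown specimen type: {specimen}")


# A 100 mm cube result of 63 MPa corresponds to 60 MPa on a 150 mm cube
# and to 50 MPa on the reference cylinder.
normalized = to_cylinder_equivalent(63.0, "cube100")
```

Any of the three specimen types could serve as the reference; the point is only that all 298 results are expressed on one common basis before modeling.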
The fc′ increases with curing time, but researchers have reported that the rate of increase in the fc′ of FA-based GPC is rapid only up to 24 h [63]. The early strength of GPC is higher due to the geopolymerization process, and limited literature is available for longer curing durations. Van Jaarsveld et al. [102] reported that fc′ does not increase for curing times longer than 24 h. The performance of every model depends on the distribution of the explanatory parameters [103]. The marginal histograms of all ten input parameters used in this study are shown in Figure 3, which indicates that all ten explanatory parameters are distributed across their ranges with respect to the compressive strength. The bar charts added above and to the right of the main plot provide additional information: along with the distribution of the input variables, they also show the distribution of the compressive strength. Every explanatory variable has a strong impact on the variation of the compressive strength of FA-based GPC.
To make the study general, both cube and cylindrical specimens are included in the database. The ranges of the output and input variables, along with their mean values, are presented in Table 2. To obtain reliable and consistent predictions of the compressive strength, it is recommended to use the proposed model within the ranges provided.
It should be noted that multiple trials were conducted to evaluate the validity, reliability, and consistency of the database. Data sets that diverged considerably from the global norm (by about 20%) were not included in the model's creation and performance evaluation. To establish the empirical model, 298 data sets were used for the prediction of compressive strength. In this research, the data points were randomly divided into two statistically consistent sets, known as the training and validation sets [37]. Of the total data, 70% (208 data points) were assigned to the training set and 30% (90 data points) to the validation set [37]. The training set was used for training the gene expression model, whereas the validation data points were used to verify and calibrate the generalization capability of the established model, as suggested in the literature [57].
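A random 70/30 partition of the kind described above can be sketched in a few lines of Python. The function name, the seed, and the use of the standard random module are assumptions made for illustration; the sketch does not check that the two subsets are statistically consistent, which would have to be verified separately (for example, by comparing the ranges and means of each variable).

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Randomly partition sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# For the 298 records of this study the split gives 208 and 90 points.
train_idx, valid_idx = split_indices(298)
```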

Model Development and Evaluation Criteria
The first step in developing the model is the selection of the input parameters that can influence the properties of FA-based GPC. The parameters that most strongly affect the compressive strength (fc′) of GPC made with FA were selected for the development of the generalized model. A detailed study was carried out, and the performance of several initial runs was computed. On this basis, the compressive strength of FA-based GPC is treated as the function given in Equation (1).
Chromosomes, genes, and expression trees (ETs) play a central role in the development of the GEP model. The running time of the program is governed by the size of the population (the number of chromosomes). Each chromosome is composed of genes that encode the sub-expression trees (sub-ETs). Considering the complexity of the predictive model, the population size was set to 150. The architecture of the model depends on the head size and the number of genes, with the former dictating the complexity of each term and the latter determining the number of sub-ETs in the model. Thus, a population size of 150, 3 genes, and a head size of 10 were adopted for the development of the model. The chromosomes are subjected to genetic variation through genetic operators. In mutation, a component of the gene's head or tail is randomly selected and replaced with a randomly selected component of the function or terminal set. Transposition moves sequences within the chromosome, namely the root insertion sequence (RIS) and the insertion sequence (IS). Finally, recombination pairs two chromosomes and exchanges their elements. To create a fair empirical model, the settings recommended in earlier literature were used [41]. GeneXproTools was used to execute the GEP algorithm. Table 3 lists the hyperparameter settings used to derive the GEP empirical equation. The correlation coefficient (R) is most commonly used to measure model performance. However, it cannot be taken alone as an indicator of predictive accuracy, because it is insensitive to the multiplication or division of the outcomes by a constant [104]. For that reason, the mean absolute error (MAE), the root mean square error (RMSE), the relative root mean squared error (RRMSE), and the relative squared error (RSE) are also considered in this research.
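The head and tail structure of a gene and the mutation operator described above can be sketched in Python as follows. The tail length follows the standard GEP rule t = h(n_max − 1) + 1, where h is the head size and n_max the largest function arity; the symbol names, the mutation rate, and the helper functions are hypothetical and do not reproduce the settings of Table 3 or the internals of the software used in this study.

```python
import random

# Hypothetical symbol sets: "Q" stands for the cube root, d0..d9 for inputs.
FUNCTIONS = ["+", "-", "*", "/", "Q"]
TERMINALS = [f"d{i}" for i in range(10)]
MAX_ARITY = 2  # largest number of arguments among the functions


def tail_length(head_size, max_arity=MAX_ARITY):
    """Tail length that guarantees every gene decodes to a valid tree."""
    return head_size * (max_arity - 1) + 1


def random_gene(head_size, rng):
    """Head may hold functions or terminals; tail holds terminals only."""
    head = [rng.choice(FUNCTIONS + TERMINALS) for _ in range(head_size)]
    tail = [rng.choice(TERMINALS) for _ in range(tail_length(head_size))]
    return head + tail


def mutate(gene, head_size, rate, rng):
    """Point mutation that respects the head/tail domains of the gene."""
    mutated = list(gene)
    for position in range(len(mutated)):
        if rng.random() < rate:
            in_head = position < head_size
            pool = FUNCTIONS + TERMINALS if in_head else TERMINALS
            mutated[position] = rng.choice(pool)
    return mutated


rng = random.Random(1)
gene = random_gene(10, rng)                      # head 10 -> tail 11 -> 21 symbols
child = mutate(gene, head_size=10, rate=0.05, rng=rng)
```

Restricting tail positions to terminals is what keeps every mutated gene syntactically valid, so no repair step is needed after the operator is applied.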
Moreover, a performance index (ρ) is recommended for evaluating the model, as it accounts for both R and RRMSE [103]. The error functions used in this study are given as Equations (2)-(7), where m_i and e_i are the i-th model outcome and experimental value, respectively, m̄ and ē are the average model outcome and the average experimental value, respectively, and n denotes the total number of data points. A high R value and low RMSE, MAE, RSE, and RRMSE values indicate a well-calibrated model. It is suggested that, for a strong correlation between measured and predicted values, the R value should be greater than 0.8 (R = 1 for an ideal model) [105]. A ρ value close to zero indicates better model performance.
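Since Equations (2)-(7) are not reproduced here, the following Python sketch uses the definitions of these measures that are commonly adopted in the GEP literature, with ρ taken as RRMSE/(1 + R). These forms are an assumption and should be checked against the original equations; the function and variable names are illustrative.

```python
import math


def error_metrics(measured, predicted):
    """Return R, MAE, RMSE, RSE, RRMSE, and rho for paired values.

    measured  : experimental values e_i
    predicted : model outcomes m_i
    """
    n = len(measured)
    e_mean = sum(measured) / n
    m_mean = sum(predicted) / n

    squared_error = sum((e - m) ** 2 for e, m in zip(measured, predicted))
    e_spread = sum((e - e_mean) ** 2 for e in measured)
    m_spread = sum((m - m_mean) ** 2 for m in predicted)
    covariance = sum((e - e_mean) * (m - m_mean)
                     for e, m in zip(measured, predicted))

    rmse = math.sqrt(squared_error / n)
    mae = sum(abs(e - m) for e, m in zip(measured, predicted)) / n
    rse = squared_error / e_spread
    rrmse = rmse / abs(e_mean)
    r = covariance / math.sqrt(e_spread * m_spread)
    rho = rrmse / (1.0 + r)  # assumed form combining RRMSE and R

    return {"R": r, "MAE": mae, "RMSE": rmse,
            "RSE": rse, "RRMSE": rrmse, "rho": rho}


# Hypothetical strengths in MPa, used only to exercise the function.
metrics = error_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
```

Under these definitions a perfect model gives R = 1 and ρ = 0, which matches the interpretation of the indicators given above.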

Results and Discussion
The GEP algorithm's output for the compressive strength (fc′) model is shown as an expression tree in Figure 4. The empirical relationship was obtained by decoding these expression trees (ETs), which use five arithmetic operations, namely +, −, ×, /, and ∛.
GEP ETs use indicators to express the explanatory variables. The corresponding symbol and description of each indicator are provided in Table 4.

Table 4. Indicators of GEP expression tree.
Indicator in Expression Tree | Description | Symbol
d0 | The temperature for curing in degrees Celsius | T
d1 | The age of the sample | A
d2 | Ratio of alkali or activator to the fly-ash | AL/FA
d3 | Ratio of Na2SiO3 to NaOH | Ns/No
d4 | NaOH solution molarity | M
d5 | Percentage of total aggregate by volume | %AG
d6 | Ratio of fine aggregate to total aggregate | F/AG
d7 | Plasticizer as percent fly-ash | %P
d8 | Percentage of SiO2 solids to water ratio | %S/W
d9 | Extra water addition as percent fly ash | %EW

Compressive Strength Formulation for FA Based Geopolymer Concrete
Equation (8) is the simplified equation proposed to estimate the compressive strength, fc′, of GPC made with FA in MPa. It comprises four variables, namely A, B, C, and D, which are given by Equations (9)-(12) and have been translated from sub-ETs 1, 2, 3, and 4, respectively, as illustrated in Figure 4.
Figure 5 compares the regression lines between the experimental and model outcomes for both the training and validation samples. A strong correlation can clearly be seen, as reflected by the slopes of the regression lines, namely 1.000 and 0.9892 for the training and validation samples, respectively.

Sensitivity and Parametric Analysis
Sensitivity analysis (SA) is performed to investigate the relative contribution of the input variables used to estimate the compressive strength fc′ of GPC made with FA, using Equations (13) and (14). SA defines the dependency of the outcome on each input variable. In these equations, x_i represents the i-th input variable, and f_max(x_i) and f_min(x_i) represent the maximum and minimum values of the outcome, respectively, over the domain of the i-th input, with the other input variables held constant at their average values. The difference between f_max(x_i) and f_min(x_i) gives the range N_i of the i-th input variable. The sensitivity and parametric studies were both conducted on the training data set, as the training and validation data sets are consistent [41,105]. The results of the sensitivity analysis are presented in Figure 6. The figure shows that, from a materials engineering perspective, the contributions of the explanatory parameters to the fc′ of GPC made with FA are similar.
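The procedure described above can be sketched in Python. The range N_i = f_max(x_i) − f_min(x_i) follows the text; normalizing each range by the sum of all ranges to obtain the relative contribution is the usual form of this analysis and is assumed here, since Equations (13) and (14) are not reproduced. The model passed in is a stand-in callable, not the GEP equation of this study.

```python
def sensitivity_analysis(model, samples):
    """Relative contribution of each input from a one-at-a-time sweep.

    model   : callable taking a list of input values and returning a number
    samples : list of input rows (one list of values per data point)
    """
    n_inputs = len(samples[0])
    means = [sum(row[j] for row in samples) / len(samples)
             for j in range(n_inputs)]

    ranges = []
    for j in range(n_inputs):
        outputs = []
        # Vary input j over its observed values; hold the rest at their means.
        for value in sorted({row[j] for row in samples}):
            point = list(means)
            point[j] = value
            outputs.append(model(point))
        ranges.append(max(outputs) - min(outputs))  # N_j

    total = sum(ranges)  # assumed non-zero, i.e. the model is not constant
    return [n_j / total for n_j in ranges]


# Stand-in linear model used only to exercise the function.
def toy_model(x):
    return 3.0 * x[0] + 1.0 * x[1]


contributions = sensitivity_analysis(toy_model, [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
```

Multiplying the returned fractions by 100 gives percentage contributions of the kind reported in Figure 6.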
In addition, the effect of the most influential input variables on the predicted compressive strength of FA-based GPC is examined through parametric analysis. Changes in compressive strength were recorded by varying one variable at a time from its minimum to its maximum value while the other inputs were kept at their average values. Figure 7 illustrates the results of the parametric analysis of the GEP model.
It is known that the curing temperature of the samples is the most influential parameter controlling the compressive strength (fc′) of GPC made with FA. Its relative contribution is 25.3%, as depicted in Figure 6. Figure 7 shows that fc′ increases at various rates with increasing T, A, %AG, F/AG, Ns/No, and %P, but decreases with increasing %EW, AL/FA, M, and %S/W.
Hydrates and silicates are released by the alkali-activating solution, which helps in the formation of the polymeric structure of alumina silicates. Extra heat is needed to speed up the reaction and to improve the mechanical characteristics of GPC. Figure 7 shows that fc′ increases with the curing temperature up to 100 °C. At higher curing temperatures, moisture is lost from the concrete even if it is properly sealed. Analogous results have been reported in earlier literature [64]. The decrease in the rate of strength gain of GPC after 240 days is due to the decrease in the number of unreacted particles. Wardhono et al. [73] presented scanning electron microscopy (SEM) images showing that gel fills the voids after 240 days, leading to the formation of a compacted and semi-homogeneous microstructure. Furthermore, it can be seen from Figure 7 that fc′ increases with the amount of total aggregate; however, the total aggregate is related to the ratio of fine aggregate to total aggregate content.
The alkali to FA ratio is linked to the ratio of sodium silicate to sodium hydroxide and to the molarity of NaOH. The increase in fc′ is strongly affected by the amount of sodium silicate, which transforms the microstructure of GPC. In preparing the sodium silicate solution, the ratio of percentage silica to water needs to be higher. The higher the sodium silicate content, the greater the compressive strength will be. A lower ratio of alkali to fly ash, in combination with a higher sodium silicate to sodium hydroxide ratio and a lower molarity of NaOH solution, results in higher compressive strength. However, the amount of NaOH solution must remain sufficient to complete the dissolution process of the geopolymer. Similar findings have been observed in a previous study [74].
In GPC, the total water content is the sum of the water required to prepare the sodium silicate and sodium hydroxide solutions and any extra water added. To prevent cracking and to achieve a practical GPC, it is necessary to consider the addition of extra water and plasticizer [90]. The addition of extra water and plasticizer as a percent of FA contributes 18.85% and 6.71%, respectively, to the fc′ in comparison with the other input factors. The fc′ of GPC increases with increasing plasticizer and decreases with the addition of extra water, as is evident from Figure 7, because extra water beyond certain limits leads to bleeding and segregation in the fresh concrete mix. Figure 7 is in line with the previous studies of other researchers [74,90]. The results of the parametric analysis show that the proposed GEP model correctly captures the influence of the input variables on the fc′ of FA-based GPC.

Performance Evaluation of GEP Models
According to a previous study, to achieve a reliable GEP equation, the ratio of the number of data points in the database to the number of input parameters should be at least three [103]. In this study, a much higher value of about 30 is used (298 data points for 10 input parameters). Table 5 presents the statistical analysis for the training and validation sets of the GEP model. These results illustrate the effectiveness of the trained model and the strong correlation between experimental and predicted outcomes with minimal error. The RMSE, MAE, and RSE for the training set of the GEP model are 5.971, 5.832, and 0.325, respectively, and are 2.643, 2.057, and 0.0675 for the validation samples. The statistical measures of the training and validation sets are similar, which indicates the high generalization capability of the model. Thus, the developed model can predict accurate and reliable outcomes for new data. Table 5 shows that ρ approaches zero (it equals zero in the ideal case). Figure 8 illustrates the absolute error between the experimental and predicted outcomes, which gives an overall idea of the maximum percentage of error. The average percent error and maximum percent error were calculated as 6.47% and 8.32%, respectively, which confirms that the experimental and model outcomes are similar. Furthermore, the maximum error occurs comparatively rarely. Almost 90% of the model predictions for the validation set have an error below 10%, while the average percent error is less than 5.56%. This verifies the accuracy and generalization of the developed GEP equation.
The results of parametric analytics for the proposed GEP model correctly encompasses the influence of input variables to estimate the of FA-based GPC.

Performance Evaluation of GEP Models
According to a previous study, a reliable GEP equation requires the ratio of the number of data points in the database to the number of input parameters to be at least three [103]. In this study, a considerably higher ratio of 30 has been used. Table 5 presents the statistical analysis for the training and validation sets of the GEP model. These results illustrate the effectiveness of the trained model and the strong correlation between experimental and predicted outcomes, with minimal error. The RMSE, MAE, and RSE of the GEP model are 5.971, 5.832, and 0.325, respectively, for the training set and 2.643, 2.057, and 0.0675, respectively, for the validation set. The statistical measures of the training and validation sets are similar, which indicates the high generalization capability of the model. Thus, the developed model can be expected to give accurate and reliable outcomes for new data. The error indicators in Table 5 approach zero (zero being the ideal case). Figure 8 illustrates the absolute error between the experimental and predicted outcomes, which gives an overall idea of the maximum percentage error. The average and maximum percent errors were calculated as 6.47% and 8.32%, respectively, which confirms that the experimental and model outcomes are similar. Furthermore, the maximum error occurs comparatively rarely. Almost 90% of the model predictions for the validation set have an error below 10%, while the average percent error is less than 5.56%. This verifies the accuracy and generalization of the developed GEP equation.
For external validation and testing of the proposed GEP model, various statistical error tests were also employed. The literature suggests the criterion that the slope of either regression line through the origin (k or k′) should be approximately equal to 1 [106]. The slopes of the regression lines are 1.001 and 0.995, as shown in Table 6, which indicates high accuracy and correlation. Moreover, researchers have proposed that the squared correlation coefficient (through the origin) between the predicted and experimental results (Ro²) or between the experimental and predicted results (Ro′²) should approach 1 [107]. Table 6 summarizes these checks as applied to the developed GEP equation. The results of this external validation indicate that the proposed GEP model is valid. Thus, the proposed model is not merely a correlation but also has predictive capacity.
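The checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the formulas are the forms commonly used for these indicators (in particular ρ = RRMSE / (1 + R) and least-squares slopes through the origin) and may differ in detail from the paper's own equations; the function names error_metrics, share_below, and external_validation are hypothetical, and the demonstration values are made up rather than taken from the compiled database.

```python
import numpy as np


def error_metrics(e, m):
    """Error indicators for experimental values e and model predictions m."""
    e = np.asarray(e, dtype=float)
    m = np.asarray(m, dtype=float)
    rmse = np.sqrt(np.mean((e - m) ** 2))                      # root mean squared error
    mae = np.mean(np.abs(e - m))                               # mean absolute error
    rse = np.sum((m - e) ** 2) / np.sum((e.mean() - e) ** 2)   # relative squared error
    rrmse = rmse / abs(e.mean())                               # relative RMSE (as a fraction)
    r = np.corrcoef(e, m)[0, 1]                                # Pearson correlation coefficient
    rho = rrmse / (1.0 + r)                                    # performance index (assumed form)
    return {"RMSE": rmse, "MAE": mae, "RSE": rse,
            "RRMSE%": 100.0 * rrmse, "R": r, "rho": rho}


def share_below(e, m, limit_pct):
    """Fraction of samples whose absolute percent error is below limit_pct."""
    e = np.asarray(e, dtype=float)
    m = np.asarray(m, dtype=float)
    pct_err = 100.0 * np.abs(e - m) / np.abs(e)
    return float(np.mean(pct_err < limit_pct))


def external_validation(e, m):
    """Through-origin slopes and squared correlation coefficients (assumed definitions)."""
    e = np.asarray(e, dtype=float)
    m = np.asarray(m, dtype=float)
    k = np.sum(e * m) / np.sum(e ** 2)        # least-squares slope of m = k * e
    k_prime = np.sum(e * m) / np.sum(m ** 2)  # least-squares slope of e = k' * m
    ro2 = 1.0 - np.sum((m - k * e) ** 2) / np.sum((m - m.mean()) ** 2)
    ro2_prime = 1.0 - np.sum((e - k_prime * m) ** 2) / np.sum((e - e.mean()) ** 2)
    return {"k": k, "k_prime": k_prime, "Ro2": ro2, "Ro2_prime": ro2_prime}


if __name__ == "__main__":
    # Made-up strengths (MPa) for illustration only; not data from the paper.
    e_demo = [20.0, 30.0, 40.0, 50.0]
    m_demo = [22.0, 29.0, 41.0, 48.0]
    print(error_metrics(e_demo, m_demo))
    print(share_below(e_demo, m_demo, 10.0))
    print(external_validation(e_demo, m_demo))
```

Applying the same functions separately to the training and validation subsets gives the kind of side-by-side comparison reported in Tables 5 and 6; slopes and squared correlation coefficients close to 1 correspond to the external validation criteria cited above.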

Comparison of GEP and Regression Models
No GEP model has been identified in the literature that estimates the compressive strength (fc′) of GPC made with FA using the input variables considered in this research. It is therefore necessary to establish linear and non-linear regression models on the same data points for the prediction of the fc′ of FA-based GPC. The results are then judged against GEP Equation (8).
Equations (15) and (16) give the empirical expressions for the prediction of fc′ based on linear and non-linear regression analysis, respectively.
The absolute errors of the results predicted by all three equations are compared in Figure 9. Statistical indicators such as RMSE, MAE, RSE, RRMSE%, R, and ρ for the GEP model and the linear and non-linear regression models are listed in Table 5. The ρ and RMSE of the established GEP equation are the lowest of all three models, for both the training and validation data points. The training RMSE and training ρ of the GEP model are 14.5% and 14% lower, respectively, than those of the linear regression model. In the testing stage, the proposed GEP model also performs better than the non-linear regression model: the testing ρ of the two models differs by 44%. Furthermore, Figure 9 shows that the linear and non-linear regression equations fail to capture high fc′ values efficiently, which limits the application of the regression models.
These observations show that the GEP model performed better than the linear and non-linear regression equations for the same input variables. Regression methods have certain disadvantages: they rely on predefined expressions and presume the normality of the residuals [105]. In contrast, GEP-based modeling efficiently picks up the non-linear relationship between the dependent and independent parameters, with a higher generalization capacity, and considerably decreases the error values in comparison with the regression models.
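To illustrate why a predefined linear form can leave a large residual when the underlying relationship is non-linear, the following minimal Python sketch fits an ordinary least-squares baseline to a synthetic response and compares its RMSE with that of a fit on extended features. Everything here is an assumption made for illustration: the data are synthetic, the helper names (fit_ols, predict_ols, rmse, percent_lower, demo) are hypothetical, the extended-feature fit is only a stand-in for a non-linear model, and neither model corresponds to Equations (15) and (16) or to the GEP equation.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predictions of the linear model fitted by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def rmse(e, m):
    """Root mean squared error between targets e and predictions m."""
    e = np.asarray(e, dtype=float)
    m = np.asarray(m, dtype=float)
    return float(np.sqrt(np.mean((e - m) ** 2)))


def percent_lower(value, reference):
    """By how many percent value is lower than reference."""
    return 100.0 * (reference - value) / reference


def demo(seed=0, n=200):
    """Compare a linear fit with an extended-feature fit on synthetic data."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 2))  # two synthetic, dimensionless inputs
    # Synthetic non-linear response (interaction and squared term), without noise.
    y = 20.0 + 30.0 * X[:, 0] * X[:, 1] + 15.0 * X[:, 0] ** 2
    rmse_lin = rmse(y, predict_ols(fit_ols(X, y), X))
    # Stand-in for a non-linear model: the same OLS on features that include
    # the interaction and the squared term.
    X_ext = np.column_stack([X, X[:, 0] * X[:, 1], X[:, 0] ** 2])
    rmse_ext = rmse(y, predict_ols(fit_ols(X_ext, y), X_ext))
    return rmse_lin, rmse_ext


if __name__ == "__main__":
    lin, ext = demo()
    print(f"linear RMSE: {lin:.3f}, extended RMSE: {ext:.3e}")
    print(f"reduction relative to linear: {percent_lower(ext, lin):.1f}%")
```

The percent_lower helper expresses the kind of relative comparison quoted in the text, for example a training RMSE that is 14.5% lower than that of the linear regression model.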

Recommendations for Future Study
Fly-ash (FA)-based geopolymer concrete (GPC) has great potential for use in the construction industry as a replacement for ordinary Portland cement (OPC) concrete. The data set used in this paper is limited to 298 samples. Although this paper considers a wide-ranging, comprehensive database of ten explanatory parameters for modelling the compressive strength of geopolymer concrete made with waste fly-ash, further testing that varies as many explanatory variables as possible is needed for a more efficient predictive model.
Moreover, the study of other mechanical characteristics of fly-ash-based GPC, such as tensile strength, elastic modulus, Poisson's ratio, and flexural strength, is highly necessary, at normal as well as elevated temperatures. A new database is also needed for the durability study of fly-ash-based GPC. Furthermore, it is recommended to predict the stated mechanical properties of fly-ash-based GPC via different artificial intelligence (AI) techniques, such as fuzzy logic, adaptive neuro-fuzzy inference system (ANFIS), response surface methodology (RSM), support vector machine (SVM) analysis, random forest regression (RFR), decision tree (DT), artificial neural network (ANN), recurrent neural network (RNN), convolutional neural network (CNN), M5P tree, and restricted Boltzmann machine (RBM). An extensive study of the interaction between geopolymer concrete and reinforcing steel is also needed. It would also be worthwhile to formalize the different mechanical properties of fiber-reinforced geopolymer concrete.

The production cost of GPC is normally considered to be greater than that of OPC concrete. It can be reduced by using waste materials rich in aluminosilicates as sand replacement, such as locally available waste foundry sand, glass waste, and marble waste. In [108], fine aggregates in GPC were replaced with waste foundry sand, and the initial production cost of M50 grade GPC was reported to be 11% lower than that of OPC concrete. However, the M30 grades of GPC and OPC concrete have almost the same production cost [108]. The environmental benefit of producing GPC from waste materials is worthwhile, as it reduces the carbon dioxide emissions from cement manufacture and adds carbon credits to the economy of the country as well. When the overall cost, including maintenance and durability, is compared, the cost of GPC is similar to that of OPC concrete, as geopolymer concrete is much more durable and resistant to chemical attack than OPC concrete [109]. In [109], GPC and OPC concrete were immersed in a magnesium sulfate solution for 45 days, and the reduction in compressive strength of GPC was reported to be 13% lower than that of OPC concrete. Additionally, immersion for the same duration in a sulfuric acid solution resulted in an 8% lower reduction in the compressive strength of GPC compared with OPC concrete [109].

Conclusions
This research utilizes the gene expression programming (GEP) technique to establish an expression for estimating the compressive strength, fc′, of geopolymer concrete (GPC) made with fly-ash. The proposed GEP model is empirical and is built on a broadly distributed database, covering different parameters, compiled from the published literature. For the prediction of the fc′ of fly-ash-based GPC, the most prominent and influential parameters are considered as explanatory variables. The model predictions agree with the experimental results. The parametric analysis shows that the proposed model successfully captures the impact of the input parameters on the behavior of fly-ash-based GPC. The accuracy of the proposed model is verified by statistical checks (MAE, RSE, R, and RMSE) and the fitness function (ρ) for the training and validation samples. Furthermore, the model meets the requirements considered for external validation. The comparison of the proposed model with the simple linear and non-linear regression equations shows that the GEP model possesses higher generality and predictive capability and is suitable for use in the preliminary design of fly-ash-based GPC. Before fly-ash is used as a geopolymer binder, it is suggested that a leachate analysis be performed. The proposed model can provide a detailed and practical foundation for increasing the use of toxic fly-ash in construction, instead of its disposal in landfill sites. This would lead to effective and sustainable construction, as green concrete made by incorporating waste fly-ash reduces energy consumption, greenhouse gas emissions, and disposal and construction costs.