Multigene Expression Programming Based Forecasting the Hardened Properties of Sustainable Bagasse Ash Concrete

The application of multiphysics models and soft computing techniques is gaining considerable attention in the construction sector due to the development of various types of concrete. In this research, an improved form of supervised machine learning, i.e., multigene expression programming (MEP), has been used to propose models for the compressive strength (fc′), splitting tensile strength (fSTS), and flexural strength (fFS) of sustainable bagasse ash concrete (BAC). The proposed models were trained and tested on a reliable and comprehensive database developed from published literature. Concrete specimens with varying proportions of sugarcane bagasse ash (BA), as a partial replacement of cement, were prepared, and the developed models were validated using the results obtained from the tested BAC. The accuracy of the models was evaluated with several statistical tests, and the results were cross-validated with a k-fold algorithm. The models achieve a correlation coefficient (R) and Nash-Sutcliffe efficiency (NSE) above 0.8 each, with relative root mean squared error (RRMSE) and objective function (OF) below 10 and 0.2, respectively. The MEP models provide reliable mathematical expressions for estimating fc′, fSTS and fFS of BA concrete, which can reduce the experimental workload in assessing the strength properties. The findings indicate that MEP-based modeling, integrated with experimental testing of BA concrete and further cross-validation, is effective in predicting the strength parameters of BA concrete.


Introduction
The damage caused by the construction industry to the environment is well known. The construction sector uses a third of the total energy production and emits a large amount of greenhouse gases into the atmosphere [1]. Concrete is a widely used material that emits 0.13 tons of CO2 per ton of concrete produced [2][3][4]. The idea of green concrete is gaining popularity as a way to reduce the harmful impacts of concrete. Green concrete is made by substituting industrial waste for traditional cementitious materials. Wastes commonly used as cement replacement are electric arc furnace slag, rubber ash, fly ash, volcanic ash, rice husk ash, metakaolin, and sugarcane bagasse ash [5,6]. The use of these materials is [...] properties of high strength concrete (HSC), waste foundry sand concrete (WFSC), bagasse ash concrete (BAC), and modeling the bearing capacity of concrete filled steel tubes and RC frame structures [5,34,35,36]. Gene expression programming (GEP) was considered advantageous because it provides empirical equations and has high prediction capability, and comparative assessment showed that GEP achieved better accuracy. However, GEP has limitations: it cannot accommodate a large number of divergent entries when the model is established, which narrows its range of application [5]. Such outlying entries must be removed from the GEP model domain to improve the performance of the developed model. Moreover, GEP encodes just one chromosome, so it is suited only to basic relationships between the dependent (response) and independent (explanatory) variables [37].
Considering the aforementioned difficulties and limitations, an advanced and improved algorithm, i.e., multigene expression programming (MEP), has been utilized to formulate the mechanical properties of BAC. To the best of the authors' knowledge, no detailed study has been performed to date that uses MEP to develop a relationship for the strength of BAC and to identify the factors responsible for its development. MEP can encode several chromosomes into a single program (code), and the best chromosome is selected by evaluating fitness [37,38]. MEP is considered an improved form of GEP that can produce accurate results even when the complexity of the target expression is not known in advance [39]. Its decoding process is simple compared to other machine learning (ML) algorithms. Despite these attributes, MEP has scarcely been used in civil engineering. In the present study, the mechanical properties of BAC, namely compressive strength (fc′), splitting tensile strength (fSTS), and flexural strength (fFS), were modeled using the optimum MEP parameters to resolve a complex relationship. A large and comprehensive database was extracted from previously published literature to train the proposed models. Concrete specimens with different dosages of BA were then prepared in the laboratory, and the results of the tested specimens were used to validate and test the established MEP models. The output of the developed models was further cross-validated by the k-fold method. The performance of the final models was assessed with several statistical indicators. The MEP technique, supplemented with experimental tests and statistical checks, could effectively solve complex problems.

Multigene Expression Programming
An improved form of machine learning (ML) known as multigene expression programming (MEP) was recently proposed, in which individuals are represented by variable-length entities [37,40]. The distinguishing feature of MEP is that it encodes several simple, linear solutions in a single chromosome [41]. This feature allows the search to cover a broader range when looking for the best viable solution. Compared to gene expression programming (GEP), MEP follows simpler processes [32]. MEP can handle exceptions such as incorrect expressions, infinity, and statistical error type values. When a gene generates an exception, it is replaced by an arbitrary terminal symbol. Therefore, no infertile individuals enter the next generation, which provides a margin in the chromosome structure during the assessment and evaluation process. GEP, however, cannot remove such exceptions, and they may become part of the final solution [37]. MEP chromosomes are decoded in much the same way that Pascal and C compilers translate expressions into machine code. The result of MEP is a linear string of instructions [42]. The number of genes per chromosome governs the chromosome length, whereas each gene encodes elements of the function and terminal sets. These advantages of MEP over other methods can lead to accurate and reliable models in many fields. MEP has been applied in a few research studies to estimate the elastic modulus of normal and high strength concrete [41], to formulate the compressive strength of Portland cement [43], to develop models for soil deformation modulus [44], to formulate models for the consolidating depth of the soil layer [42], and to develop models for polymer confined concrete columns [39].
The development of an MEP model depends on several parameters that affect its overall performance, so these parameters must be selected carefully. The optimized MEP parameter values selected in the present study are presented in Table 1. A trial and error approach was used to obtain the optimum values of these parameters, as suggested in the literature [45].
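The decoding and exception handling described in this section can be illustrated with a minimal Python sketch. It is not the software used in the study: the gene layout, the four-operator function set, the absolute-error fitness, and the names `evaluate_chromosome` and `best_gene` are all assumptions made for illustration.

```python
import math
import random

# Assumed gene layout: ("x", i) is a terminal reading input i; (op, a, b) applies
# a binary operator to the values of two strictly earlier genes a and b.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,  # may raise ZeroDivisionError
}


def evaluate_chromosome(chromosome, row, rng):
    """Return one value per gene for a single input row.

    A gene that raises an arithmetic error or yields a non-finite value is
    replaced in place by a randomly chosen terminal (illustrative version of
    the exception handling described in the text).
    """
    values = []
    for k, gene in enumerate(chromosome):
        if gene[0] == "x":
            values.append(row[gene[1]])
            continue
        op, a, b = gene
        try:
            v = OPS[op](values[a], values[b])
            if not math.isfinite(v):
                raise ArithmeticError("non-finite gene value")
        except ArithmeticError:  # also covers ZeroDivisionError and OverflowError
            gene = ("x", rng.randrange(len(row)))
            chromosome[k] = gene
            v = row[gene[1]]
        values.append(v)
    return values


def best_gene(chromosome, X, y, rng):
    """Select the gene whose expression has the lowest sum of absolute errors."""
    errors = [0.0] * len(chromosome)
    for row, target in zip(X, y):
        for k, v in enumerate(evaluate_chromosome(chromosome, row, rng)):
            errors[k] += abs(v - target)
    k_best = min(range(len(chromosome)), key=errors.__getitem__)
    return k_best, errors[k_best]
```

Because every gene is itself a candidate expression, a single chromosome yields several solutions, and the chromosome's fitness is taken from its best gene. A full MEP run would add selection, crossover, and mutation over a population of such chromosomes.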

Modeling Database
A comprehensive and reliable experimental dataset on the 28-day mechanical properties of bagasse ash concrete (BAC) was acquired from the published literature to train the MEP models [10][11][12][13][14][15]. The final datasets included a total of 132, 125, and 128 records of compressive strength (fc′), splitting tensile strength (fSTS) and flexural strength (fFS), respectively, for concrete incorporating bagasse ash (BA). As some researchers follow the British standard when testing the compressive strength (fc′) of concrete, the cube strength data was converted to cylinder strength to make the data uniform [4,68]. Once all the data was collected and arranged, statistical analysis was applied to identify the parameters that most strongly influence the performance of BAC. The results of this statistical analysis are shown in Table 2. The parameters selected in the present research are water-to-binder ratio (w/c), amount of cement (CC), quantity of coarse aggregate (CA), quantity of fine aggregate (FA), and percentage of BA (BA%). The frequency histograms of these modeling inputs are illustrated in Figure 1 to visualize the distribution of the input variables. The fc′, fSTS and fFS of BAC are considered to be functions of these parameters, as given in Equation (1).

$$f_c',\ f_{STS},\ f_{FS} = f\left(w/c,\ CC,\ CA,\ FA,\ BA\%\right) \quad (1)$$

Cross-Validation with k-Fold Algorithm
Machine learning (ML) models frequently fail to generalize when applied to data that was not used for model training. Consequently, it becomes difficult to assess the accuracy of the models [69]. As a usual practice, the dataset is partitioned into train and test sets for training and testing of models, respectively, and the performance is then assessed using statistical error metrics. However, this approach only works well when a large and broad dataset is available. Moreover, it is not considered a reliable method, as the accuracy obtained on one dataset can be very different from the accuracy obtained on another. A resampling technique called k-fold cross-validation is used to ensure that the model performs well on unseen data. This technique divides the available dataset into k subsets [70]. The superior results and efficacy of the 10-fold approach are presented in previously published literature [71]. In the current study, 10-fold cross-validation is adopted by randomly dividing the dataset into ten subsets. In each round, one of the ten subsets is held out for validation while the model is trained on the remaining nine, and the process is repeated until every subset has served as the validation set. The accuracy and predictability of the final model are then expressed as the mean accuracy over the ten individual rounds.
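The 10-fold procedure can be sketched in a few lines of Python. This is a minimal illustration only: a one-variable least-squares line stands in for the MEP model, and the function names (`k_fold_indices`, `fit_line`, `cross_validate`) and the fixed seed are assumptions, not details from the study.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least-squares line; a stand-in for the MEP model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def cross_validate(x, y, k=10, seed=0):
    """Return the RMSE of each held-out fold and the mean RMSE over all folds."""
    rmse_per_fold = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_err = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in held_out]
        rmse_per_fold.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return rmse_per_fold, sum(rmse_per_fold) / len(rmse_per_fold)
```

Each record is used for validation exactly once and for training in the other nine rounds, which is why the mean over the folds is less sensitive to a single lucky or unlucky split.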

Models Evaluation by Statistical Measures
Different researchers suggest different parameters to check the accuracy of developed models. Some of those parameters are used in this study, and their mathematical expressions are presented in Equations (2)-(9). A parameter recently introduced to guard against overfitting in artificial intelligence and ML models, known as the objective function (OF), is also used in this study [41,72]. If the values of Equations (2) and (5) are low, the model is said to be good [30]. Similarly, if the values obtained from Equations (3) and (4) are close to 1, the model is termed good [73]. However, the correlation coefficient alone cannot establish the validity of a model because it is insensitive to multiplication or division of the outputs by a constant. Likewise, according to Despotovic et al. (2016) [74], a model is deemed excellent if the result of Equation (7) lies between 0 and 0.10, and good if it lies between 0.11 and 0.20. The values of Equations (8) and (9) range from 0 to positive infinity, with a value nearer to zero signifying a good model. A lower value of OF indicates superior model performance.

Root mean squared error

$$\mathrm{OF} = \left(\frac{n_T - n_{TE}}{n}\right)\rho_T + 2\left(\frac{n_{TE}}{n}\right)\rho_{TE} \quad (9)$$

where $n$ is the total number of data points; $M_i$ and $P_i$ are the measured and predicted values of the ith data point; and $\bar{M}$ and $\bar{P}$ are the means of the measured and predicted values, respectively. The subscripts T and TE correspond to the train and test datasets, respectively.
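As a hedged illustration of how such indicators are computed, the Python sketch below uses the forms commonly found in the modeling literature. Only Equation (9) can be read from the text here, so the definitions of RMSE, MAE, NSE, R, RRMSE (as a fraction, not a percentage) and the performance index ρ = RRMSE/(1 + R) are assumptions, as is taking n = n_T + n_TE.

```python
import math


def mean(values):
    return sum(values) / len(values)


def rmse(measured, predicted):
    """Root mean squared error."""
    return math.sqrt(mean([(p - m) ** 2 for m, p in zip(measured, predicted)]))


def mae(measured, predicted):
    """Mean absolute error."""
    return mean([abs(p - m) for m, p in zip(measured, predicted)])


def nse(measured, predicted):
    """Nash-Sutcliffe efficiency: 1 for a perfect model."""
    m_bar = mean(measured)
    num = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    den = sum((m - m_bar) ** 2 for m in measured)
    return 1.0 - num / den


def pearson_r(measured, predicted):
    """Correlation coefficient R between measured and predicted values."""
    m_bar, p_bar = mean(measured), mean(predicted)
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted))
    var_m = sum((m - m_bar) ** 2 for m in measured)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def rrmse(measured, predicted):
    """Relative RMSE, expressed as a fraction of the mean measured value."""
    return rmse(measured, predicted) / abs(mean(measured))


def performance_index(measured, predicted):
    """Assumed form of rho: RRMSE / (1 + R)."""
    return rrmse(measured, predicted) / (1.0 + pearson_r(measured, predicted))


def objective_function(train, test):
    """OF as in Equation (9); train and test are (measured, predicted) pairs."""
    (m_t, p_t), (m_te, p_te) = train, test
    n_t, n_te = len(m_t), len(m_te)
    n = n_t + n_te  # assumption: n counts training plus testing points
    rho_t = performance_index(m_t, p_t)
    rho_te = performance_index(m_te, p_te)
    return (n_t - n_te) / n * rho_t + 2.0 * n_te / n * rho_te
```

The weighting in OF counts the testing-set index twice relative to its share of the data, so a model that fits the training data well but degrades on the testing data is penalized.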

Mix Proportions for Bagasse Ash Concrete (BAC)
A series of experimental tests on bagasse ash concrete (BAC) was completed to validate the MEP models. The modified bagasse ash concrete (BAC) mixes and normal concrete (NC) samples were cast at 25 °C and cured for 28 days to compare their mechanical properties. Various doses of bagasse ash (BA), ranging from 0% to 40%, were used as a cement replacement. The water-to-cement ratio was kept constant for all specimens so that BAC could be compared with NC. Table 3 presents the complete mix design proportions. Standard concrete cylinders (300 mm × 150 mm) and beams (100 mm × 100 mm × 500 mm) were produced with varying dosages of BA. The fc′, fSTS and fFS were tested at 28 days of curing age according to ASTM C39, ASTM C496, and ASTM C293 standards, respectively. The final results of the tested specimens were used to verify the behavior of the MEP models.

Mechanical Properties of BAC
The fundamental mechanical properties of BAC, namely fc′, fSTS and fFS, were evaluated in the laboratory by testing beams and concrete cylinders with BA contents from 0 to 40% as a partial cement replacement. It can be seen from Figure 2 that the strength of concrete increases up to 10BA (10% of cement replaced with BA) and consistently decreases for 20BA, 30BA, and 40BA. The maximum strength is gained at 10% cement replacement, which may be due to the fine BA particles dispersed throughout the mix. The silica reacts with lime (resulting from cement hydration) and produces more calcium silicate hydrate (CSH) [49,75]. Additionally, the finer particles fill the voids and increase the packing density. The strength reduction for the higher replacement levels, i.e., 20BA, 30BA, and 40BA, is 6.5%, 17.3%, and 30.3%, respectively. This reduction might be attributable to a lack of sufficient Ca(OH)2.
As shown in Figure 2, the maximum fSTS is achieved by 10BA, followed by 20BA. The increase in fSTS relative to the NC samples is 25.3% and 15.8% for 10% and 20% substitution of BA, respectively. The fSTS decreases by 7.9% and 23.8% for 30% and 40% BA replacement, respectively. For fFS, the maximum strength is also attained at 10% BA. The increased fSTS and fFS at 10% BA might be attributed to the micro-fibrous character of BA, associated with CSH production and the generation of aluminates, which develop in a needle-shaped structure [76,77]. The interlocking and bonding of such needles occur between hydrated pastes, which enhances the fSTS and fFS of BAC.

Formulation of BAC Mechanical Properties
The MEP results for fc′, fSTS and fFS were evaluated to obtain empirical formulations for predicting these properties from the five input variables (w/c, BA%, CC, FA, and CA). The resulting MEP formulae for fc′, fSTS and fFS are presented as Equations (10)-(12), respectively. First, the essential input parameters were selected based on significant correlation and a literature study. The MEP model was then trained on the data acquired from published literature. After the model's predictions were assessed through the RMSE and NSE values, the model was considered successfully trained on the given data. At the end of this process, the model provides empirical equations in terms of the input parameters. Finally, the derived Equations (10)-(12) were tested on the testing dataset. In these equations, x0 = w/c; x1 = BA%; x2 = CC; x3 = FA; x4 = CA.
Figure 3a,b shows a comparison of experimental and predicted fc′, along with the expression for the regression line, for the training and testing sets. In the ideal case, the slope of this line equals one. Figure 3 shows that the established MEP model captured the influence of all five inputs and delivered a high correlation between experimental and predicted results, as evidenced by the training and testing slopes of 0.8951 and 0.9315, respectively. The graph also suggests that the model generalizes well and should therefore perform well on unseen data.
Figure 4a,b shows a similar comparative analysis for the fSTS results. A good correlation exists between experimental and predicted fSTS. The slopes of the regression lines for the training and testing datasets are close to the ideal scenario, i.e., 0.9351 and 0.8903, respectively. The model developed for fSTS also performs well on the training set, and the comparable testing slope suggests that model over-fitting has been largely mitigated.
The graphical results of the MEP model for fFS are shown in Figure 5a,b, where the regression line slopes for the training and testing sets equal 0.9494 and 0.9026, respectively. A better correlation between experimental and predicted results was achieved for fFS, which highlights the excellent performance of MEP on both the training and testing sets.
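The regression-line slopes quoted for Figures 3-5 come from fitting predicted against experimental strengths. A minimal sketch of that fit is given below; the function name `regression_line` and the three-point dataset in the usage note are hypothetical and are not values from the study.

```python
def regression_line(experimental, predicted):
    """Least-squares line predicted = slope * experimental + intercept."""
    n = len(experimental)
    x_bar = sum(experimental) / n
    y_bar = sum(predicted) / n
    sxx = sum((x - x_bar) ** 2 for x in experimental)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(experimental, predicted))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept
```

For example, `regression_line([20, 30, 40], [19, 28, 37])` gives a slope of about 0.9, meaning the hypothetical model under-predicts increasingly as strength rises; a slope of one with zero intercept would correspond to the ideal case described above.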

Models Validation by Experimental Data
A literature survey revealed that BA concrete behaves differently at high and low replacement levels. The results of the model validation by experimental data are shown in Figures 6-8 for fc′, fSTS and fFS, respectively. The slopes of the regression lines are 0.9014, 0.9273, and 0.9332 for the fc′, fSTS and fFS models, respectively, which are close to the ideal value of 1. During model validation, the R value was 0.93, 0.92, and 0.92 for the fc′, fSTS and fFS data, respectively. The results show that the modeling outcome is in line with the experimental results and that the MEP models account for the effect of the parameters essential for concrete. The results therefore support the use of ML techniques to model the complicated processes and interactions among concrete ingredients when predicting concrete properties from the significant input variables.

Statistical Analysis and Generalizability of the Models
The number of data points used for model development affects its reliability. Therefore, the ratio between the number of data points and the number of inputs must be higher than five for both training and testing [78]. For the fc′, fSTS and fFS datasets, this ratio is 18.2, 17.5 and 17.1, respectively, for the training set, and 6.4, 6.6 and 5.7, respectively, for the testing set. Moreover, Table 4 [...]
The conditions for checking the external predictability of the MEP models are given in Table 5. It has been proposed that at least one of the slopes (k or k′) of the regression lines passing through the origin should be close to 1 [79]. Additionally, the literature states that if the indicator Rm is higher than 0.5, the requirements for external validation of models are satisfied [80]. Table 5 shows that the external validation requirements are met for all three proposed MEP models for fc′, fSTS and fFS.

Table 5. Statistical indicators for verifying the external predictability of proposed MEP models.

S.No. | Mathematical Expression | Requirement | fc′ | fSTS | fFS | Reference
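The slope condition cited for external validation can be sketched as follows. The expressions in Table 5 are not reproduced here, so this block assumes the usual through-origin regression slopes, with k taken as predicted regressed on measured and k′ as the reverse; the tolerance `tol` is a hypothetical choice for "close to 1", and Rm is not computed.

```python
def slopes_through_origin(measured, predicted):
    """Slopes of the two regression lines forced through the origin.

    Assumed convention: k regresses predicted on measured, k_prime the reverse.
    """
    s_mp = sum(m * p for m, p in zip(measured, predicted))
    k = s_mp / sum(m * m for m in measured)
    k_prime = s_mp / sum(p * p for p in predicted)
    return k, k_prime


def satisfies_slope_condition(k, k_prime, tol=0.15):
    """True if at least one slope lies within tol of 1 (tol is hypothetical)."""
    return abs(k - 1.0) <= tol or abs(k_prime - 1.0) <= tol
```

A model whose predictions are uniformly scaled, for instance doubled, yields k = 2 and k′ = 0.5 and fails the check even though its correlation coefficient is 1, which is why these slopes are reported alongside R.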

10-Fold Cross-Validation Results
The 10-fold cross-validation can verify the robustness and generalization capability of ML models. This method has a parameter (k) which denotes the number of subsets into which a dataset is split; 10-fold means that the dataset is segmented into 10 subsets, or folds. The method is generally used to evaluate the ability of a model to handle unseen data, and it also reduces the chance of error arising from random sampling.
All three MEP models established for fc′, fSTS and fFS were evaluated with 10-fold cross-validation using R and RMSE, which are presented graphically in Figure 10a,b, respectively. R and RMSE vary from subset to subset, but the mean accuracy is good. For fc′, fSTS and fFS, the mean value of R is 0.85, 0.89, and 0.85, respectively. Across the 10 folds, fSTS obtained a maximum and minimum R of 0.91 and 0.72, respectively. Similarly, fc′, fSTS and fFS achieved a mean RMSE of 4.54, 3.89, and 4.78, respectively. The fSTS model also has the smallest RMSE for an individual subset, equal to 1.86. Overall, the findings of the 10-fold cross-validation demonstrate that the MEP models are accurate and robust.

Conclusions
The present research pursued a twofold objective. First, the mechanical properties, i.e., fc′, fSTS and fFS, of bagasse ash concrete (BAC) were formulated by applying a supervised machine learning model, i.e., MEP. The models were trained and tested on extensive data collected from previous technical literature. Thereafter, sugarcane bagasse ash (BA) was used as a partial substitute for cement in various amounts (10%, 20%, 30% and 40%) to evaluate the mechanical properties. The developed MEP models were further validated with data obtained from experimental testing of BAC. The efficacy and performance of the models were reviewed via statistical metrics, i.e., RMSE, RSE, NSE, MAE, RRMSE, ρ, OF and R. The final datasets were also cross-validated with the k-fold algorithm, confirming the generalizability of the models. The developed models showed a good relationship with the experimental results, with R higher than 0.9, RMSE and MAE values less than 5, and OF values near 0 for all three MEP models for fc′, fSTS and fFS. The proposed models also met the external validation requirements found in the previous technical literature. The current research shows that using waste materials such as bagasse ash is essential for the production of green concrete and from the sustainability viewpoint. Moreover, MEP modeling, supplemented with validation on a practical laboratory dataset and further cross-validation, can provide models that directly benefit the civil engineering industry.
The work presented in the current research has certain shortcomings. The main focus of this research was to examine the effect of concrete constituents on the mechanical properties of BAC. Other factors that are important from a mechanical viewpoint also need to be investigated, such as curing conditions, type of cement, reactivity and type of ash, and testing conditions. It is strongly recommended that further research be carried out with a more extensive dataset for model training and testing. Moreover, other techniques, e.g., convolutional neural networks, neuro-fuzzy inference systems, and ensemble modeling, should be considered for comparative analysis and accurate assessment of concrete properties.