A Comparative Study of Random Forest and Genetic Engineering Programming for the Prediction of Compressive Strength of High Strength Concrete (HSC)

Supervised machine learning and its algorithms are an emerging trend for the prediction of the mechanical properties of concrete. This study uses an ensemble random forest (RF) and the gene expression programming (GEP) algorithm for the compressive strength prediction of high strength concrete. The parameters include cement content, coarse aggregate to fine aggregate ratio, water, and superplasticizer. Statistical measures such as MAE, RSE, and RRMSE are used to evaluate the performance of the models. The RF ensemble model stands out in performance, as it combines weak base learners (decision trees) and yields a strong coefficient of determination of R² = 0.96 with few errors. The GEP algorithm shows good agreement between actual and predicted values and yields an empirical relation. An external statistical check is also applied to the RF and GEP models to validate the variables against the data points. Artificial neural networks (ANNs) and decision trees (DT) are also applied to the same data sample and compared with the aforementioned models. Permutation feature importance, computed in Python, is used to identify the most influential parameters. The machine learning algorithms reveal a strong correlation between targets and predictions with low statistical errors, demonstrating the accuracy of the models.


Introduction
High strength concrete (HSC) has spread in popularity wide and far for its superior performance. HSC has been deemed superior for its substantially high strength and durability [1][2][3][4]. Its strength has been witnessed to be higher than that of conventional concrete, a quality that has drastically increased its use in the modern-day construction industry [5]. A new technology that results in homogeneous and dense concrete, and also bolsters the strength parameters, is the reason for its permeation within the construction industry [5,6]. It has been commonly used in concrete-filled steel tubes, bridges, and columns. As per the American Concrete Institute (ACI), "HSC is the one that possesses a specific requirement for its working which cannot be achieved by conventional concrete" [7]. Numerous researchers have suggested different methods for the mix design of HSC. All of these mix design methods require a specific set of experimental trials to achieve the target strength. It is an ineluctable truth that the experimental work is time consuming and requires a substantial amount of money. In addition, amateur technicians and errors in testing machines raise questions about the veracity of experimental work conducted across the globe. Various researchers have used different statistical methods to predict the properties of HSC; some of these studies are summarized in Table 1. However, this field still requires further exploration. In recent years, concepts of machine learning have been used successfully in various fields for the prediction of different properties. Likewise, the civil engineering construction industry has adopted such techniques to overcome cumbersome experimental procedures.
For instance, some of these approaches include multivariate adaptive regression splines (MARS) [15,16], gene expression programming (GEP) [17][18][19][20], support vector machines (SVM) [21,22], artificial neural networks (ANN) [23][24][25], decision trees (DT) [26][27][28], the adaptive boost algorithm (ABA), and adaptive neuro-fuzzy inference systems (ANFIS) [29][30][31][32]. Javed et al. [18] predicted the axial behavior of concrete-filled steel tubes (CFST) with 227 data points by using gene expression programming and achieved a strong correlation between predicted and experimental axial capacity [18]. Farjad et al. [33] used gene expression programming to predict the mechanical properties of concrete containing waste foundry sand. Gregor et al. [34] adopted the ANN approach to evaluate the compressive strength of concrete; ANN reproduced the experimental values accurately and thus proved to be an exceptional prediction tool. Amir et al. [35] predicted the compressive strength of geopolymer concrete incorporating natural zeolite and silica fume by using ANN, which established a good relationship and gave high accuracy. Zahra et al. [32] predicted the compressive strength of concrete with ANN and ANFIS models and revealed that ANFIS gives a stronger correlation than the ANN model. Javed et al. [36] predicted the compressive strength of sugar cane bagasse ash concrete by conducting an experimental and literature-based study: the experimental work was used to validate the model, and the remaining data were gathered from the published literature. The authors used the GEP algorithm and obtained a good model between target values. Nour et al. [37] used the GEP algorithm to predict the compressive strength of concrete-filled steel tube columns incorporating recycled aggregate (RACFSTC). The authors used 97 data points in modeling the RACFSTC column and observed a strong correlation.
Junfei et al. [38] modeled the compressive strength of self-compacting concrete by using a beetle antennae search-based random forest algorithm and obtained a strong correlation of R² = 0.97 with the experimental results. Qinghua et al. [26] employed the random forest approach to predict the compressive strength of high-performance concrete. Similarly, Sun et al. [39] used an evolved random forest algorithm on 138 data samples, collected from the published literature, to predict the compressive strength of rubberized concrete. This advanced approach gave good performance, with a strong correlation coefficient of R² = 0.96. ANN and other models have also been adopted for predicting the mechanical strength parameters of high-performance concrete and recycled aggregate concrete [40][41][42][43][44]. Pala et al. [45] studied the influence of silica and fly ash on the compressive strength of concrete. A comprehensive experimental program was carried out to analyze the impact of varying w/c ratios and varying percentages of silica and fly ash on the performance of concrete; in addition, ANN was adopted to depict the effect on the strength parameters of concrete [45]. Azim et al. [44] used a GEP-based machine learning algorithm to predict the compressive arch action of reinforced concrete structures and found GEP to be an effective prediction tool.
This paper aims at evaluating the prediction of the compressive strength of high strength concrete (HSC) using ensemble random forest (RF) and gene expression programming (GEP). The data points used in the models were obtained from published articles and are listed in Table S1. Anaconda (Spyder) Python-based programming [46] and the GeneXproTools software [47] are used for the prediction of the compressive strength of HSC. The model parameters comprise cement, water, coarse aggregate to fine aggregate ratio, and superplasticizer as inputs, and compressive strength as the output. Hex contour graphs are drawn to show the relationships between the input and output parameters. Sensitivity analysis (SA) and permutation feature importance (PFI), which address the relative importance of each variable on the desired output, are conducted. Moreover, model evaluation is carried out using statistical measures.

Random Forest Regression
Random forest regression was proposed by Breiman in 2001 [48] and is considered an improved classification and regression method. The main features of RF include speed and flexibility in creating the relationship between input and output functions. In addition, RF handles large datasets more efficiently than many other machine learning techniques. RF has been used in various fields; for instance, it has been used in banking for predicting customer response [49], for predicting the direction of stock market prices [50], in the medicine/pharmaceutical industry [51], and in e-commerce [52].
The RF method consists of the following main steps:
1. Collection of trained regression trees using the training set.
2. Calculating the average of the individual regression tree outputs.
3. Cross-validation of the predicted data using the validation set.
A new training set consisting of bootstrap samples is created by sampling with replacement from the original training set. During this step, some of the sample points are left out and replaced with repeated existing sample points. The left-out sample points are collected in a separate set, known as the out-of-bag samples. About two-thirds of the sample points are utilized for estimating the regression function, while the out-of-bag samples are used for the validation of the model. The process is repeated several times until the required accuracy is achieved. This built-in process of setting points aside as out-of-bag samples and utilizing them for validation is a unique capability of RF regression. The total error is calculated for each regression tree at the end and reflects the efficiency of each tree.
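The bootstrap and out-of-bag procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the dataset below is a synthetic stand-in for the HSC mix data, and the tree depth and tree count are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for the HSC dataset: 4 mix parameters -> strength.
X = rng.uniform(0, 1, size=(357, 4))
y = 40 + 30 * X[:, 0] - 10 * X[:, 2] + rng.normal(0, 2, 357)

n_trees, n = 20, len(X)
trees, oob_pred, oob_count = [], np.zeros(n), np.zeros(n)
for _ in range(n_trees):
    # Bootstrap: draw n points with replacement; the rest are out-of-bag.
    idx = rng.integers(0, n, n)
    oob = np.setdiff1d(np.arange(n), idx)
    tree = DecisionTreeRegressor(max_depth=6).fit(X[idx], y[idx])
    trees.append(tree)
    # Out-of-bag points validate the tree they did not help train.
    oob_pred[oob] += tree.predict(X[oob])
    oob_count[oob] += 1

mask = oob_count > 0
oob_rmse = np.sqrt(np.mean((oob_pred[mask] / oob_count[mask] - y[mask]) ** 2))
# Final ensemble prediction = average of the individual tree outputs.
y_hat = np.mean([t.predict(X) for t in trees], axis=0)
```

In practice scikit-learn's `RandomForestRegressor(oob_score=True)` performs this bookkeeping internally; the explicit loop is shown only to mirror the steps in the text.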

Gene Expression Programming
GEP was proposed by Ferreira [53] as an improved form of genetic programming (GP). It uses linear strings and parse trees of varying lengths. The GEP model includes a function set, a terminal set, terminal conditions, control parameters, and an objective function. GEP creates an initial set of selected individuals and converts them into expression trees of different sizes and shapes. This step is necessary to represent the solutions of GEP in mathematical form. The predicted value is then compared with the experimental one to calculate the fitness of each data point. The model stops when the overall fitness of the complete dataset stops improving. The chromosome giving the best result is selected and passed to the next generation, and the process repeats itself until satisfactory fitness is obtained.
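The evaluate-and-select step of this cycle can be sketched schematically. This is not the GeneXproTools implementation: the "population" below is a hand-written set of candidate expressions standing in for decoded GEP chromosomes, and the toy target is chosen so that one candidate is exact.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: the target is a known expression of two terminal variables.
A, B = rng.uniform(1, 5, 100), rng.uniform(1, 5, 100)
target = 2 * A + B

# A tiny "population" of candidate expressions (stand-ins for GEP
# chromosomes decoded into expression trees).
population = {
    "A + B": lambda a, b: a + b,
    "2*A + B": lambda a, b: 2 * a + b,
    "A * B": lambda a, b: a * b,
}

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Fitness evaluation: compare each candidate's prediction to the data;
# the fittest chromosome survives to the next generation.
fitness = {expr: rmse(f(A, B), target) for expr, f in population.items()}
best = min(fitness, key=fitness.get)
```

A full GEP run would additionally mutate and recombine the surviving chromosomes each generation; only the fitness-comparison core is shown here.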
where A, B, C, D are variables (terminal set) and 2, 3 are constants.

Dataset Used in Modeling Aspect
Model evaluation is based on the data sample and the number of parameters used. A total of 357 data points were obtained from the published literature (see Table S1). These points were split into training, validation, and testing sets during modeling to build a numerical, empirically based relation for HSC; this is done to minimize overfitting in machine learning approaches. The samples were divided into 70/15/15 sets to give a strong correlation coefficient. Behnood et al. [54] predicted the mechanical properties of concrete with data taken from the published literature, with the samples randomly distributed into training (70%), validation (15%), and testing (15%) sets. Similarly, Getahun et al. [55] forecasted the mechanical properties of concrete by distributing the data in the same way. Training is done to fit the model to the given values; the model then predicts the strength of unseen values, namely the test set.
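The 70/15/15 random split described above can be reproduced as follows; the random seed is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 357  # number of data points gathered from the literature
idx = rng.permutation(n)  # random shuffle before splitting

# 70% training, 15% validation, remainder testing, as used in the study.
n_train = int(0.70 * n)           # 249 points
n_val = int(0.15 * n)             # 53 points
train, val, test = np.split(idx, [n_train, n_train + n_val])
```

Note that with 357 points the integer splits leave 55 points in the test set; the three index arrays are disjoint and together cover the whole dataset.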

Programming-Based Presentation of Datasets
Anaconda-based Python (version 3.7) programming [46] has been adopted to depict the influence of various input parameters on the mechanical strength of HSC. The compressive strength of concrete is influenced by the parameters used in the experimental work. Thus, cement content (Type 1), water, superplasticizer (polycarboxylate), and the fine and coarse aggregate (20 mm) were used in modeling the compressive strength of HSC. The impact of these input parameters was visualized with Python in a Jupyter Notebook [56], as shown in Figure 1.
Figure 1 represents the quantities that most strongly influence the mechanical properties of HSC. The darker regions show the optimal/maximum concentration of each variable. Python enables users to develop a deep understanding of the parameters that alter the behavior of the model; the seaborn library is used to plot the correlations among the desired parameters. The descriptions of the data variables (see Table 2) used in the model, comprising the training, validation, and testing sets, are presented in Tables 3-5. These parameters define and ensure that optimum results are achieved for all techniques.
Identifying these parameters is of core importance.
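A hex contour plot of the kind shown in Figure 1 can be drawn with matplotlib's hexbin (seaborn's jointplot(kind="hex") wraps the same primitive). The data below are synthetic placeholders, not the study's dataset, and the units are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
cement = rng.normal(450, 60, 357)                 # kg/m^3, synthetic
strength = 0.12 * cement + rng.normal(0, 6, 357)  # MPa, synthetic

fig, ax = plt.subplots()
# Darker hexagons mark where observations concentrate, as in Figure 1.
hb = ax.hexbin(cement, strength, gridsize=20, cmap="Blues")
ax.set_xlabel("Cement content (kg/m$^3$)")
ax.set_ylabel("Compressive strength (MPa)")
fig.colorbar(hb, label="count")
fig.savefig("hex_cement_strength.png")
```

Each hexagon's color encodes the number of mixes falling in that cell, so the darkest region marks the dominant cement-strength combination in the data.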

GEP Model Development
The secondary objective of this research work was to derive a generalized equation for the compressive strength of HSC. For this purpose, a terminal set, a function set, and four parameters (d0: cement content, d1: fine to coarse aggregate ratio, d2: water, d3: superplasticizer) were used in modeling. These input parameters were utilized for the development of the model based on gene expression programming. Simple mathematical operations (+, −, /, ×), forming the function set, were used to build an empirically based relation as a function of these parameters. The GEP-based model, like all genetic algorithm models, is significantly influenced by the input parameters (variables) upon which it is built; these variables have a substantial impact on the generalization fitness of the model. The variables used in this study are tabulated in Table 6. The model run time is an important parameter for analyzing the effectiveness of the model; thus, care should be taken when selecting the sets that control the run time, to ensure that the generalized model develops within due time. The selection of these parameters is based on a trial-and-error method to obtain the maximum correlation. Root mean squared error (RMSE) was adopted as the fitness measure in modeling. Moreover, the GEP model is expressed by tree-like architecture structures, characterized by the head size and the number of genes [57].

Model Performance Analysis
To assess the viability of any model and to evaluate its performance, various indicators have been used, each with its own way of inferring model performance. The indicators commonly used include the root mean squared error (RMSE), mean absolute error (MAE), relative squared error (RSE), relative root mean squared error (RRMSE), and coefficient of determination (R²). The mathematical expressions for these indicators are given below.
where ex_i is the experimental (actual) strength, mo_i is the model (predicted) strength, and the barred terms denote the average values of the experimental and predicted outcomes, respectively. In this paper, the performance of the model is also evaluated by using the coefficient of determination (R²). The model is deemed effective when the value of R² is greater than 0.8 and close to 1 [58]. The value obtained through the model reflects the correlation between the experimental and predicted outcomes. Lower values of the error indicators (MAE, RRMSE, RMSE, and RSE) indicate higher performance. Machine learning is a good approach for property prediction; however, overfitting has a detrimental effect on the validation and forecasting of the mechanical behavior of HSC. Thus, overcoming overfitting has become a dire need in supervised machine learning algorithms. Researchers have used an objective function (OBF) for assessing the accuracy of models. The OBF combines the overall data samples with the error and regression coefficient, providing a more accurate generalized model, and is represented in Equation (8) [59].
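The indicators above can be computed as follows. The exact normalizations vary between papers: here RSE is normalized by the variance of the experimental values and RRMSE by their mean, common conventions that may differ in detail from Equations (3)-(7).

```python
import numpy as np

def metrics(ex, mo):
    """RMSE, MAE, RSE, RRMSE and R^2 between experimental (ex)
    and model (mo) values."""
    ex, mo = np.asarray(ex, float), np.asarray(mo, float)
    err = ex - mo
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # Squared error relative to the spread of the experimental data.
    rse = float(np.sum(err ** 2) / np.sum((ex - ex.mean()) ** 2))
    rrmse = rmse / float(np.mean(ex))  # RMSE relative to the mean outcome
    r = float(np.corrcoef(ex, mo)[0, 1])
    return {"RMSE": rmse, "MAE": mae, "RSE": rse, "RRMSE": rrmse, "R2": r ** 2}
```

A perfect model gives zero for every error measure and R² = 1, which makes the function easy to sanity-check.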

Random Forest Model Analysis
Random forest is an ensemble modeling algorithm that combines weak learners to give the best performance, as depicted in Figure 2. These algorithms are supervised learners that give high accuracy in terms of correlation. The model is run with up to twenty sub-models to find the maximum coefficient of determination, as illustrated in Figure 2a. It can be seen that the sub-model with 10 trees stands out and gives a strong relationship; this is due to the incorporation of weak learners (decision trees) in the ensembling algorithm. Moreover, the model gives a strong correlation of R² = 0.96 between experimental and predicted values and gives good validation results, as illustrated in Figure 2b,c. In addition, the model shows small errors, as illustrated in Figure 2d: all the predicted data points lie in the same range as the experimental values, with errors of less than 10 MPa. This shows that the random forest ensemble algorithm gives very good results.
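A sweep over the number of trees, analogous to the twenty sub-models of Figure 2a, can be sketched with scikit-learn. The data here are synthetic placeholders, not the HSC dataset, so the resulting scores only illustrate the procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (357, 4))   # synthetic stand-in for the mix parameters
y = 40 + 25 * X[:, 0] - 8 * X[:, 2] + rng.normal(0, 2, 357)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

# Vary the ensemble size and record the test-set R^2 for each sub-model.
scores = {}
for n_trees in range(1, 21):
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    scores[n_trees] = rf.fit(X_tr, y_tr).score(X_te, y_te)  # R^2

best_n = max(scores, key=scores.get)
```

Plotting `scores` against the tree count reproduces the kind of curve shown in Figure 2a, with diminishing returns once enough trees are averaged.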

Statistical checks are applied to assess the performance of the random forest model. This is an indirect method of showing model performance: the checks quantify the errors in the model, and thus RMSE, MAE, RSE, and RRMSE are reported in Table 7. As an ensemble method, the RF model shows lower errors in the prediction aspect.

Empirical Relation of HSC Using the GEP Model
Gene expression programming is an individual supervised machine learning approach that predicts the compressive strength using a tree-based expression. GEP gives an empirical relation in terms of the input parameters, as shown in Equation (9). This simplified equation can then be used to predict the compressive strength of HSC. The equation comes from the expression tree, which uses a function set and a terminal set with mathematical operators, as shown in Figure 3, and shows the relationship between the input parameters and the output strength. GEP utilizes linear as well as non-linear algorithms in forecasting mechanical properties.
where A = 19.97 * cement * (water + superplasticizer) + 15.31 (11)
Before running the GEP algorithm, the procedure starts with the selection of the number of chromosomes and the basic operators provided by the GEP software. The model uses a trial-and-error technique in which chromosomes of varying sizes and gene numbers are combined with the operators, ensuring the selection of the best model. The selected model contains the best/fittest gene available within the population, which gives strong performance. The most feasible and desirable outcome of the GEP model is f_c, which is expressed in the form of an expression tree as shown in Figure 3. The expression tree uses a linkage function with basic mathematical operators and some constants. It is worth mentioning that the GEP algorithm uses the RMSE function as its fitness measure.


GEP Model Evaluation
Model evaluation, in terms of the agreement between observed and predicted values, is illustrated in Figure 4. The GEP-based machine learning algorithm is an effective approach for assessing the strength parameters of HSC. Model assessment in machine learning is usually done with regression analysis, in which a value close to one indicates an accurate model. Figure 4a,b present the regression analysis of the validation and testing sets with the coefficient of determination R². The values exceed 0.8, at 0.91 and 0.90 for the testing (see Figure 4a) and validation (see Figure 4b) sets, respectively, which reflects the accuracy of the model. The data gathered from the published literature were also normalized within the range of zero to one to show their consistency, as illustrated in Figure 4c.
Statistical measures (MAE, RRMSE, RSE, and RMSE) are used to evaluate the performance of the model, as was done for the random forest model, and are shown in Table 8. Lower errors and a higher coefficient indicate better model performance. Most of the errors lie below 5 MPa, with an R² value greater than 0.8, confirming the accuracy of the finalized model. Further analysis is performed by determining the standard deviation (SD) and covariance (COV), found to be 0.16 and 0.059, respectively.
The accuracy and performance of the machine learning-based model is further evaluated through the error distribution between actual targets and predicted values of the testing set, as shown in Figure 5. The model predicts outcomes close or equal to the experimental values. The error distribution of the testing set shows that 86% of the data samples lie below 5 MPa and 13.88% lie in the range of 5 MPa to 8 MPa, with a maximum error of 7.47 MPa. Thus, the GEP-based model not only gives strong accuracy in terms of correlation but also yields the empirical equation shown in Equation (9), which allows users to predict the compressive strength of concrete by hand calculation.
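The error-distribution figures quoted for the testing set can be computed from the actual and predicted arrays. The function and the sample values below are illustrative, not the study's data.

```python
import numpy as np

def error_distribution(actual, predicted, edges=(5.0, 8.0)):
    """Percentage of absolute errors below 5 MPa, between 5 and 8 MPa,
    and the maximum error, as reported for Figure 5."""
    err = np.abs(np.asarray(actual, float) - np.asarray(predicted, float))
    below = float(np.mean(err < edges[0]) * 100)              # < 5 MPa
    mid = float(np.mean((err >= edges[0]) & (err < edges[1])) * 100)
    return below, mid, float(err.max())
```

For example, errors of 2, 6, 0, and 7 MPa give 50% below 5 MPa, 50% in the 5-8 MPa band, and a maximum error of 7 MPa.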

Statistical Analysis Checks on RF and GEP Models
The accuracy of any model depends on the number of data points: the more points, the greater the accuracy of the entire model [60]. Frank et al. [60] present an ideal criterion based on the ratio of the number of input data samples to the number of parameters involved; this ratio should be equal to or greater than three for good model performance. This study uses 357 data samples with the 4 variables mentioned earlier, giving a ratio of 89.25. This value is exceptionally high, indicating the reliability of the model. Farjad et al. [33] used a similar approach to validate their model and obtained good results with a ratio greater than 3. Researchers have suggested different approaches for the validation of a model using external statistical measures [61,62]. Golbraikh et al. [62] validate their model using the slopes of the regression lines (k or k') through the experimental and predicted values; any value greater than 0.8 and close to 1 indicates good model performance [61]. All these external checks are tabulated in Table 9.
Table 9. Statistical analysis of RF and GEP models from external validation.
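The regression-through-origin slopes used in these external checks can be computed as follows. This is a sketch of one common formulation of the Golbraikh-type slope checks; the exact conditions applied in Table 9 may differ in detail.

```python
import numpy as np

def slope_checks(ex, mo):
    """Slopes k and k' of the regression lines through the origin
    between experimental (ex) and model (mo) values; values near 1
    indicate a well-behaved model."""
    ex, mo = np.asarray(ex, float), np.asarray(mo, float)
    k = float(np.sum(ex * mo) / np.sum(ex ** 2))        # mo regressed on ex
    k_prime = float(np.sum(ex * mo) / np.sum(mo ** 2))  # ex regressed on mo
    return k, k_prime
```

A model whose predictions exactly match the experiments gives k = k' = 1, and deviations from 1 grow as the predictions drift away from the 1:1 line.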



Comparison of Models with ANN and Decision Tree
The ensemble RF and GEP approaches are compared with other supervised machine learning algorithms, namely ANN and DT, as depicted in Figure 6. These techniques, along with GEP, are individual algorithms, whereas RF is an ensemble method that incorporates a base learner (decision tree) and models it with a bagging technique to give a strong correlation. It should be kept in mind that all models are implemented in Python (Anaconda). The comparison of the models is presented in Figure 6. The superior performance of RF can be seen, with R² = 0.96; its error distribution is shown in Figure 6a,b. The individual models ANN, DT, and GEP show good responses with R² = 0.89, 0.90, and 0.90, respectively. Figure 6d presents the error distribution of the decision tree, with most errors below 10 MPa; however, a maximum error of 18.19 MPa is reported. A similar trend is observed for the ANN and GEP models, with maximum error values of 11.80 MPa and 7.48 MPa, respectively, as shown in Figure 6f,h. Other researchers have used different machine learning algorithms for the prediction of the mechanical properties of high strength concrete. Ahmed et al. [63] used an ANN algorithm to forecast the mechanical properties (slump and compressive strength) of HSC and revealed strong correlations of about 0.99 for both. Singh et al. [64] forecasted the mechanical properties of HSC using RF and M5P algorithms and reported correlations for the testing set of 0.876 and 0.814, respectively.

Permutation Feature Analysis (PFA)
Permutation feature analysis (PFA) is performed to determine the parameters that most influence the compressive strength of HSC, using a Python package. Figure 7 shows the results of the PFA. All the variables considered in this study strongly affect the compressive strength of HSC; however, the effect of the superplasticizer is greater than that of the other variables.
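The permutation procedure can be sketched in plain Python (this is not the authors' code; the toy two-feature model and the weighting that makes the second feature dominate are assumptions for illustration). Each feature column is shuffled in turn and the resulting drop in R² is recorded; a larger drop means a more influential variable:

```python
import random

def r_squared(y, y_hat):
    """Coefficient of determination between observed and predicted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in R^2 when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = r_squared(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            drops.append(baseline - r_squared(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data: the target depends weakly on feature 0 and strongly on feature 1
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [0.5 * x0 + 5.0 * x1 for x0, x1 in X]
predict = lambda row: 0.5 * row[0] + 5.0 * row[1]  # exact toy model

imp = permutation_importance(predict, X, y)
print(imp)  # the second importance should dominate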


Conclusions
Supervised machine learning can predict the mechanical properties of concrete with high accuracy, allowing users to forecast the desired properties without conducting an experimental program. The following conclusions are drawn from the machine learning algorithms:

1. Random forest is an ensemble approach that gives a robust agreement between observed and predicted values. This is attributed to its incorporation of a weak learner (decision tree) as base learner, and it yields a coefficient of determination R² = 0.96.

2. GEP is an individual model rather than an ensemble algorithm. It gives a good fit and provides an empirical relation that can be used to predict the mechanical properties of high strength concrete by hand calculation.

3. The RF and GEP models are compared with ANN and DT. RF outperforms the others with R² = 0.96; GEP gives R² = 0.90, while the ANN and DT models give 0.89 and 0.90, respectively. Moreover, RF produces smaller errors than the individual algorithms, owing to its bagging mechanism.

4. Permutation feature analysis identifies the most influential parameter in HSC. This helps in recognizing the dominant variables before experimental work; nevertheless, all the variables considered have an effect on the compressive strength.

Funding: This research received no external funding.