New Prediction Model for the Ultimate Axial Capacity of Concrete-Filled Steel Tubes: An Evolutionary Approach

Abstract: The complexity of predicting the ultimate capacity of concrete-filled steel tube (CFST) short circular columns calls for an in-depth analysis of the structural behavior of this member under axial load only. Gene expression programming (GEP) has been used to establish a prediction model for the axial behavior of long CFST. The proposed equation correlates the ultimate axial capacity of long circular CFST with the depth, thickness, yield strength of steel, compressive strength of concrete and length of the CFST, without the need for expensive and laborious experiments. A comprehensive database of CFST short circular columns under axial load, comprising 227 data samples, was compiled from the literature to build the proposed model and was subsequently used for verification. External validation was carried out using several statistical criteria recommended by researchers. The developed GEP model performed better than the available design methods of the AS5100.6, EC4, AISC, BS, DBJ and AIJ design codes. The proposed design equation can be reliably used for pre-design purposes, or as a fast check of deterministic solutions.


Introduction
A concrete-filled steel tube (CFST) consists of a steel tube filled with concrete. Over the last decade, the use of CFSTs as columns in the building-construction industry has increased exponentially [1,2]. They have been used in various modern construction projects [3][4][5][6]. The CFST structure provides distinct structural advantages, including desirable ductility with high energy-absorption capacity, high strength and fire resistance [7][8][9]. During concrete construction, the use of shuttering is also not [...] some flaws in ANN modeling, as an ANN acts as a black box and does not express the model as an explicit equation. This limits its usefulness for modeling. In addition, ANN parameters are set through several trial-and-error cycles, which increases the time needed for prediction. In contrast, the use of gene expression in a supervised mechanism produces a well-defined prediction model [42][43][44].
Ipek et al. predicted the axial capacity of concrete-filled double-skin steel column sections using gene expression programming [45]. The authors achieved a strong correlation between actual and predicted values with minimal error. A gene expression model takes the inputs, optimizes them and predicts the outcome by minimizing its error, and thus provides a well-fitted prediction. Numerous scholars have studied and used GP to generate accurate models in complex engineering domains. Different modifications have been proposed to enhance the performance of GP, of which gene expression programming (GEP) is the most advanced. Yet the use of GEP to address complex structural engineering problems has been limited [22]. Esra et al. estimated the axial load-carrying capacity of concrete-filled tubes using the GEP algorithm [46]. It is worth mentioning that the developed equation is too lengthy to be used in practice [47,48].
Experimental work is time-consuming and requires considerable resources to give a well-justified strength. This traditional approach, together with the misplacement of quantities during casting, adversely affects the strength. Hence, the use of supervised algorithms increases the efficiency of prediction, not only by learning from data points but also by helping to generate an equation suitable for hand calculation. This equation can then be used to predict the overall performance of the desired model. Moreover, many supervised machine learning approaches predict the strength with a strong correlation but cannot give a relation-based equation. Hence, the gene expression programming algorithm was used, which can give an explicit equation with a strong correlation between target and predicted values.
In this research, the GEP approach is used to evaluate the axial performance of CFST members. The developed model correlates the axial strength with a few influencing parameters. To design CFST members effectively and at lower cost, it is essential to establish models that correlate the basic parameters with the ultimate axial capacity of CFST members. Special attention has been given to obtaining a simplified equation that can predict the strength of CFST even by hand calculation. The proposed model is built on a large number of published axial tests on CFST members. The results produced by the developed model are then compared with those obtained from various codes of practice, as several authors have expressed concerns about the existing design codes [23].

Comparison of Genetic Programming vs. Gene Expression Programming
Ferreira [24] proposed a supervised machine learning algorithm that extends GP and is based on the principles of natural evolution. This modified form is termed gene expression programming (GEP). GEP evolves computer programs that are encoded in fixed-length chromosomes, whereas GP evolves a single tree expression [49][50][51]. GEP is similar to genetic algorithms (GA) and is an alternative form of traditional genetic programming (GP); it is used to predict the relationship between input and output data. In GEP, the chromosome consists of linear, symbolic strings of genes, and each gene encodes a selection of objects, while the expression tree (ET) serves a similar purpose. The parameters used by GEP are similar to those used in GP [52][53][54]. In GEP, the computer programs consist of character strings of fixed length, in contrast to the expression trees of varying length used in genetic programming. Each program is encoded as a compact string of fixed size and is then expressed as a tree structure; these structures are called expression trees (ETs) [55][56][57]. GEP separates the genotype from the phenotype, and this separation results in an evolutionary advantage [24]. In GEP, the size of the genome depends on the problem and is determined by trial and error. For this purpose, adaptive control, a method that exploits the capability of a system to choose its best possible mode of operation, is employed [58][59][60]. This approach uses the same parameters as GP. All adaptation takes place in a simple linear structure, so mutation and replication of the overall structure are not required.
Crystals 2020, 10, 741
Moreover, each chromosome is composed of genes, each of which has two well-defined adjacent regions: the head, and the tail, which holds the terminal symbols (leaf nodes). The symbols in the head code for the internal nodes of the ET, while those in the tail code for its terminal nodes [61][62][63]. Figure 1 shows a schematic layout of the GEP algorithm. The procedure starts with the random creation of a fixed-length chromosome for each individual. Next, the chromosomes are expressed as ETs and evaluated for fitness. Reproduction is then applied to the individuals selected by the fitness function. The whole cycle is repeated with the newly produced individuals until a satisfactory solution is attained. In short, genetic operators such as crossover, mutation and reproduction are used to transform the population.
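To make the head and tail encoding concrete, the following minimal Python sketch decodes a single gene, read breadth-first as is usual in GEP, into an expression tree and evaluates it. The function set, the symbols Q and C, the example gene and the variable names d0 and d1 are illustrative assumptions; they are not the settings or the equation of the model developed in this study.

```python
import math

# Function set: symbol -> (arity, implementation). Illustrative assumption.
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b),
    "Q": (1, lambda a: a * a),                                    # square
    "C": (1, lambda a: math.copysign(abs(a) ** (1.0 / 3.0), a)),  # cube root
}


def decode(gene):
    """Build the expression tree of a gene that is read breadth-first.

    `gene` is a list of symbols. Each node is a list [symbol, child, ...].
    Symbols left over once the tree is complete are non-coding and ignored.
    """
    root = [gene[0]]
    level = [root]
    pos = 1
    while level:
        next_level = []
        for node in level:
            arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
            for _ in range(arity):
                child = [gene[pos]]
                pos += 1
                node.append(child)
                next_level.append(child)
        level = next_level
    return root


def evaluate(node, variables):
    """Recursively evaluate an expression tree for the given variable values."""
    symbol, children = node[0], node[1:]
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, variables) for child in children))
    return variables[symbol]


# Head of length 3 followed by a tail of length 4 (terminals only).
# Tail length = head length * (maximum arity - 1) + 1 = 3 * (2 - 1) + 1.
gene = ["*", "+", "Q", "d0", "d1", "d0", "d1"]
tree = decode(gene)                      # (d0 + d1) * (d0 squared)
value = evaluate(tree, {"d0": 2.0, "d1": 3.0})
print(tree, value)
```

Because the tail contains only terminals and is long enough to supply every argument the head can request, any gene of this layout decodes to a valid tree, which is why mutation and crossover can act on the linear string without producing invalid programs.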

Experimental Database
The model is built with the aid of 227 test results collected from more than 40 studies in the literature, which are listed in Appendix A. Only results in which the infilled concrete contains no reinforcement were included in the database. Frequency histograms are used to visualize the data distribution, as shown in Figure 2 (see also Table 1). Moreover, Figure 3 shows the relationships of the individual variables with each other.
One major drawback of supervised machine learning algorithms is overfitting of the data [64,65]. Many remedies have been recommended in the literature to avoid this problem. Fulcher suggested training and validating the model on different sets of data [66]. In this study, this procedure is followed by randomly dividing the available data into three sets: a learning set, a validation set and a testing set. First, the model is built on the learning (training) set; it is then validated on the validation set; finally, its performance is evaluated on the testing set [67]. The validated model is thus tested on data that were not used for training.
Various parameters in the design of long circular CFST members may be interdependent. Interdependency needs to be checked, as it makes the model difficult to interpret. In addition, interdependency causes numerous problems during analysis, as it inflates the strength of the relations between different parameters. This kind of problem is often referred to as the "multicollinearity problem" [68]. Therefore, the correlation coefficients were calculated for all possible combinations of the parameters and are presented in Table 2. It can be seen that none of the correlation coefficients (positive or negative) is high, indicating no risk of a multicollinearity problem.
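The three-way division described above can be sketched as follows. This is only an illustration: the 70/15/15 proportions, the fixed random seed and the function name `split_indices` are assumptions, since the split actually used in this study is not stated here.

```python
import random


def split_indices(n_samples, learn_frac=0.70, valid_frac=0.15, seed=0):
    """Randomly partition sample indices into learning, validation and testing sets.

    The default 70/15/15 proportions are an assumption for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_learn = int(round(learn_frac * n_samples))
    n_valid = int(round(valid_frac * n_samples))
    learn = indices[:n_learn]
    valid = indices[n_learn:n_learn + n_valid]
    test = indices[n_learn + n_valid:]   # never seen while the model is built
    return learn, valid, test


learn, valid, test = split_indices(227)
print(len(learn), len(valid), len(test))
```

Keeping the testing indices completely separate from the learning and validation indices is what allows the final error to be read as an estimate of performance on unseen columns.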
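A multicollinearity screen of this kind can be sketched in a few lines of Python. The data values below are hypothetical and are not taken from the database in Appendix A, and the 0.8 threshold for flagging a pair of inputs is an assumed rule of thumb rather than a criterion stated in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def high_correlations(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of inputs with |r| above threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical input columns (diameter, thickness, yield strength).
data = {
    "D": [100.0, 150.0, 200.0, 250.0, 300.0],
    "t": [3.0, 2.0, 5.0, 4.0, 3.5],
    "fy": [275.0, 350.0, 300.0, 420.0, 345.0],
}
flagged = high_correlations(data)
print(flagged)   # an empty list means no pair exceeds the threshold
```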

Development of Model
The study aims to establish a novel prediction equation for the axial compressive strength of CFST members using the GEP approach. The main variables frequently used in earlier codes and analytical models were used as input variables. These parameters were selected based on the literature [15,21,69]. The ultimate axial strength of the CFST member was therefore formulated as a function of these variables (Equation (1)). In Equation (1), $N$ is the ultimate axial capacity of the long circular CFST column; $f_y$, $t$, $D$ and $L$ are the yield strength, thickness, outer diameter and length of the steel tube, respectively; and $f_c$ is the 28-day compressive strength of the concrete cylinder. The key input parameters used in the GEP algorithm are shown in Table 3. These variables influence the model, so care should be taken when selecting the governing ones. Moreover, six basic mathematical operators (+, −, ÷, ×, square, cube root) were used in building the model. The quality of the prediction and the time required for modeling depend on the difficulty of the problem, the population size and the number of variables. The run is stopped once the best fitness is reached. In addition, the gene size and the chromosomes of the model influence the prediction. Each gene encodes a unique expression tree. The number of genes per chromosome and the head size determine the complexity of the GEP-based model. The overall fitness of the new programs is calculated via the mean absolute error (MAE) function. The parameter values were determined by trial and error. GeneXproTools 5.0 by Gepsoft Lda-Portugal was used to implement the GEP algorithm [70].
To achieve a consistent distribution of data, numerous combinations of training and testing sets were tried. The data were divided into a learning set, a validation set and a testing set; the learning and validation sets were used in the GEP model to select the best response, and the testing set was used to assess the predictions of the selected model. An objective function presented by Babanajad, Gandomi [71] is used to measure the fitness on the learning and validation sets. The best GEP model was obtained by minimizing the objective function (Equation (2)).
In Equation (2), $n_V$ and $n_L$ are the numbers of tests in the validation and learning sets, respectively. $R^2_L$, $m_L$ and $rm_L$ are the determination coefficient, mean absolute error and root mean square error of the learning set, respectively. $R^2_V$, $m_V$ and $rm_V$ are the determination coefficient, mean absolute error and root mean square error of the validation set, respectively. The mathematical forms of the mean absolute error (MAE), root mean square error (RMSE) and determination coefficient are given in Equations (3) to (5).
In these equations, $x_i$ and $y_i$ are the actual output and the calculated output for the $i$th sample, respectively. It is worth noting that the objective function in Equation (2) considers $m$, $rm$ and $R$ together, which results in a more accurate model. Furthermore, the objective function takes into account the effect of the distinct data sets, i.e., the learning and validation sets. Lower values of $m$ and $rm$ indicate higher accuracy of the model.
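The three error measures named above can be computed as in the following sketch, where `actual` plays the role of $x_i$ and `predicted` the role of $y_i$. The MAE and RMSE follow their standard definitions; the determination coefficient is written here as 1 − SS_res/SS_tot, which is an assumption, because some studies instead use the squared Pearson correlation. The numerical values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    ss_tot = sum((x - mean_actual) ** 2 for x in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical capacities in kN, for illustration only.
actual = [1000.0, 1500.0, 2000.0, 2500.0]
predicted = [1100.0, 1400.0, 2100.0, 2400.0]
print(mae(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

The MAE and RMSE carry the units of the capacity itself, so they are only comparable between models evaluated on the same data, whereas the determination coefficient is dimensionless.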

Results and Discussion
The equation obtained for the ultimate axial capacity of circular CFST members is given in Equation (6). The value of the objective function ($f_{min}$) obtained from Equation (2) is 182.52. Equation (6) is derived from the expression tree shown in Figure 4. In Figure 4, c1-c9 represent the different constant values tried by the GEP, d0-d6 are the variables explained in Equation (1), and 3Rt denotes the cube root of a value. It can be seen from Equation (6) that the capacity of a concrete-filled steel tube depends on the input variables, namely the diameter, thickness, length-to-diameter ratio, yield strength and compressive strength. Moreover, every parameter has a key influence on the capacity, so increasing or decreasing any of them will have a beneficial or adverse effect on the strength.
where $N_{GEP}$ is the ultimate axial capacity of the column calculated from Equation (6) and $f_c'$ is the compressive strength of the infilled concrete. $D$, $t$, $L$ and $f_y$ are the diameter, thickness, length and yield strength of the steel tube, respectively. The relationship between predicted and experimental values is shown in Figure 5. The important statistical values of the proposed equation for the learning, validation and testing sets are given in Table 4. It can be seen that the $R^2$ value increases from 0.97 to 0.99, while the MAE and RMSE decrease from 134 to 124 and from 210 to 173, respectively. Moreover, the error for the testing set is lower than for the learning and validation sets. This illustrates that the present GEP model can accurately predict the axial capacity of CFST members and generalizes well [72].

Model Performance, Validity and Comparative Study
The existing formulae provided by six different design codes (AS5100.6 (2004), EC4 (2004), AISC, BS, DBJ, AIJ) are used for comparison with the proposed model. The procedure for calculating the axial load capacity of circular CFST columns under each code is described in Table 5. The Australian standard (AS5100.6) accounts for the interaction between the steel tube and the concrete core. It also includes the effect of concrete confinement. The relation given by the British standard (BS5400) contains an allowance for an eccentricity about the minor axis that does not exceed 0.03 times the least lateral dimension of the composite column. This is inadequate, as the engineer may prefer to increase it. The equation of the American Institute of Steel Construction (AISC 2005) accounts for the effect of the restraining hoop that results from transverse confinement. This phenomenon increases the usable concrete stress. The relation provided by the Architectural Institute of Japan (AIJ 2001) involves a confinement factor that accounts for the reduction in the effective yield stress of the steel tube caused by the hoop stresses. In Eurocode 4 (EC4 2004), the equation accounts for the confinement effect in addition to the interaction between the steel tube and the concrete core. The concrete strength is increased by the triaxial state of stress, while the hoop stress reduces the effective yield stress of the steel. The Chinese code (DBJ 1999) provides an equation for the ultimate axial capacity that cannot be used for ultra-high-strength concrete.
The comparison between the values predicted by the GEP model and by the different established codes is shown in Figure 6. In Figure 6, the model accuracy is highest at a value of 1. The frequency at 1 is highest for the GEP model and lowest for AS5100.6. In addition, it can be seen from the figures that the frequency distributions of all the codes lie above 1, which limits their practical use in calculating the strength. The GEP model, on the other hand, has its frequency distributed in the range between 0 and 1, which makes it a safe approach to prediction. The statistical parameters used for comparison are shown in Table 6. The R-value should approach 1 for maximum accuracy. A value of R greater than 0.8 is deemed acceptable [73]. The GEP model gives better results than the available design codes. Furthermore, the MAE and RMSE were calculated for the available design codes and the GEP model. Both MAE and RMSE should be as low as possible for higher accuracy. Based on MAE and RMSE, GEP gives the most accurate results, followed by AIJ and BS, respectively.
Figure 6. Evaluation of the experimental and predicted axial bearing capacity of the concrete-filled steel tube (CFST) columns.

The trade-off between the errors and the correlation coefficient is measured by the performance index (ρ) [74]. ρ has been used successfully by numerous researchers and is calculated using Equation (7), where $Rr_m$ is the relative $r_m$. A higher value of ρ indicates poorer performance of the model, and vice versa. From Table 6, it is concluded that the GEP model outperforms the available design codes by a large margin.
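As an illustration of how such an index behaves, the sketch below assumes one form that is commonly used in the GEP literature, ρ = RRMSE / (1 + R), with the relative RMSE taken as the RMSE divided by the absolute mean of the actual values and R as the Pearson correlation coefficient. Equation (7) is not reproduced in this text, so this form, the function names and the sample values are all assumptions and may differ from the definition used in this study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def pearson(x, y):
    """Pearson correlation coefficient R."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def performance_index(actual, predicted):
    """Assumed form: rho = RRMSE / (1 + R), with RRMSE = RMSE / |mean(actual)|."""
    relative_rmse = rmse(actual, predicted) / abs(sum(actual) / len(actual))
    return relative_rmse / (1.0 + pearson(actual, predicted))


# Hypothetical capacities in kN, for illustration only.
actual = [1000.0, 1500.0, 2000.0, 2500.0]
predicted = [1100.0, 1400.0, 2100.0, 2400.0]
rho = performance_index(actual, predicted)
print(rho)
```

Under this assumed form, ρ falls as the error shrinks and as the correlation rises, which matches the statement that lower values indicate a better model.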
The model accuracy can also be checked by several statistical measures. Frank and Todeschini [74] proposed that the accuracy of a model depends on the number of experimental tests and the number of parameters used in modeling. They suggested a criterion in which the ratio of the former to the latter should be greater than or equal to 5, as presented in Equation (8):

$$\frac{\text{No. of experimental tests}}{\text{No. of variables used}} \geq 5 \qquad (8)$$

In this research, the ratio is 44. Furthermore, external verification has also been suggested [75]. This test recommends that the slope of one of the regression lines passing through the origin should be approximately 1 [76]. In addition, the test recommended by Roy [77] was also conducted for the model. Table 7 outlines the acceptance criteria and the results for the GEP model. The GEP-based model clearly fulfils the criteria of all the above-mentioned tests. It is therefore inferred that the established GEP model is accurate and is not a simple correlation.
Simplicity is the foremost advantage of predicting mechanical properties with the GEP algorithm. It allows the ultimate axial capacity to be calculated by hand using the GEP-based formula. The GEP model is completely independent of previous equations and design models. Moreover, increasing the amount of training and validation data enhances the overall accuracy of the model.
A comparison of the GEP model with equations suggested by various authors was made on the whole data set [78][79][80]. Figure 7 shows that the GEP model gives an $R^2$ of about 0.94, which is higher than that of the other models. This is attributed to the simplified nature of GEP in prediction. Giakoumelis et al. [80] predicted the compressive behavior of CFST with an empirical relation that gives a strong correlation, with $R^2$ of about 0.895. Goode et al. [79] and Lu et al. [78] give the same empirical equation with some modifications, with $R^2$ values of 0.807 and 0.903, respectively, as illustrated in Figure 7. This study shows that GEP-based empirical equations can be used to predict different variables.

Conclusions
This study presents, for the first time, a novel and powerful method for deriving an expression to compute the ultimate axial capacity of long circular CFST columns by gene expression programming (GEP). The resulting equation is empirical and is based on previous experimental data published in the literature. The suggested equation is simple, and the CFST axial capacity can be determined by hand calculation. All the model outcomes show excellent agreement with the experimental results. Different statistical parameters, such as RMSE, MAE and $R^2$, demonstrated the accuracy and reliability of the GEP-based equation. In addition, this supervised machine learning algorithm can be used in many other domains, as it supports forecasting through the training and testing of data. This artificial intelligence-based algorithm can thus help the scientific community to take measures against and overcome the issues associated with mechanical or experimental work. Moreover, the comparison of the MAE, RMSE and $R^2$ of the GEP model with those of AS5100.6, EC4, AISC, BS, DBJ and AIJ shows that the GEP model performs best for all sets of data (learning, validation and testing). Even though the GEP-based model can calculate short CFST shear strength, it is restricted to long circular columns. The findings of this research will give civil engineers and structural designers useful information, and the model can be used as a modern and powerful tool to support decision-making in concrete construction.