A Gene Expression Programming Model for Predicting Tunnel Convergence

Underground spaces have become increasingly important in recent decades in metropolises. In this regard, the demand for the use of underground spaces and, consequently, the excavation of these spaces has increased significantly. Excavation of an underground space is accompanied by risks and many uncertainties. Tunnel convergence, as the tendency for reduction of the excavated area due to change in the initial stresses, is frequently observed, in order to monitor the safety of construction and to evaluate the design and performance of the tunnel. This paper presents a model/equation obtained by a gene expression programming (GEP) algorithm, aiming to predict convergence of tunnels excavated in accordance to the New Austrian Tunneling Method (NATM). To obtain this goal, a database was prepared based on experimental datasets, consisting of six input and one output parameter. Namely, tunnel depth, cohesion, frictional angle, unit weight, Poisson’s ratio, and elasticity modulus were considered as model inputs, while the cumulative convergence was utilized as the model’s output. Configurations of the GEP model were determined through the trial-error technique and finally an optimum model is developed and presented. In addition, an equation has been extracted from the proposed GEP model. The comparison of the GEP-derived results with the experimental findings, which are in very good agreement, demonstrates the ability of GEP modeling to estimate the tunnel convergence in a reliable, robust, and practical manner.


Introduction
Avoiding transportation problems and heavy traffic has led to the development of underground roads in metropolises. There are several ways to excavate tunnels in urban areas; among them, the New Austrian Tunneling Method (NATM), is the most preferred method, especially when dealing with soft grounds. In this method, the strength of the surrounding ground is used to the greatest extent possible to strengthen the tunnel structure. The excavation takes place sequentially, thus taking advantage of the ground conditions. In addition, a support system is installed, as needed. The temporary support system usually includes one or two layers of shotcrete, reinforced with steel wire mesh, if necessary. The final permanent support system is commonly a cast-in-place concrete lining, placed over a waterproofing membrane. Tunnel stability, ground displacements induced by tunneling, and the performance of the support system, are some of the issues that must be taken into consideration for the successful design and implementation of a NATM tunnel [1]. The ground movements, induced by tunneling, affect the safety of adjacent underground structures and surface buildings [2]. Therefore, it is necessary to predict the ground displacements and their effects on the existing buildings, throughout the tunnel route [3]. In this regard, ground displacements at the surface and subsurface around the tunnel are frequently measured during and after construction. Convergence monitoring, as a simple measurement, provides valuable data on the deformation around the tunnel and tunnel wall.
The instrumentation and monitoring of underground structures is of great importance, on account of the uncertainties in the mechanical behavior of the ground. In addition the implementation impacts of an underground structure on the adjacent structures increase the complexity of the design and necessitate a more precise monitoring plan. This is of even higher importance when underground structures are excavated in shallow urban environments. Instrumentation tools are used to precisely quantify certain parameters of structural behavior and to monitor their rate of change [4]. Depending on the use and importance of an underground structure, various instrumentation tools are used to monitor the behavior of the underground structure and its surrounding ground.
In-situ measurements during the construction process of a tunnel are a very useful tool for measuring the tunnel stability. Various techniques are now available for measuring stress and displacement in underground spaces. Among them, the best and most appropriate technique that can measure displacements, are convergence meters. In practice, obtained convergence data, during and after tunneling, are used for a wide variety of decision-making aims [5]. Due to the ease of use of convergence meters and the reliability of the obtained data, the displacement measurements can be used to obtain the stress distribution around the tunnel. Accordingly, a method has been proposed for tunnel stability evaluation based on the obtained strains and therefore, stress analysis is thus not required [6]. Convergence of a tunnel is the amount of closure on the tunnel walls, which is due to the loss of the stress-strain equilibrium state of the surrounding soil or rock mass, resulting in a redistribution of stress around the excavation [7][8][9][10].
An important application of convergence measuring is demonstrated in the convergence-confinement method. This method calculates the load applied to the support system, installed behind the tunnel face, and provides insight in relation to the interaction between tunnel lining and the surrounding ground [11][12][13][14][15]. The convergence-confinement method has been used in several researches; investigating active grouted rock-bolts in tunnel stability [16], design of active grouted rock-bolts in tunnel stability [17] and analysis of the interaction between the lining of a tunnel boring machine (TBM) tunnel and the ground [18].
In recent years, artificial intelligence (AI) methods have been extensively used in many fields of engineering and science  on account of the ability of these methods to generate nonlinear relationships between input and output data. Research has also been conducted aiming to predict tunnel convergence by means of AI approaches. By developing an adoptive neuro fuzzy inference system (ANFIS) model, Adoko and Wu [44] proposed an approach to predict tunnel convergences and convergence velocity of NATM tunnels, by using collected datasets from two different high-speed railway tunnels in China. The results of this research demonstrated the ability of the proposed ANFIS model to predict tunnel convergence and convergence velocity. Mahdevari and Torabi [45] proposed a back propagation artificial neural network (ANN) model to predict the convergences of TBM tunnels, using several datasets from convergence monitoring measurements in different sections of the Ghomroud water conveyance tunnel. The collected datasets were utilized to estimate the unknown non-linear relationships between the rock properties and the tunnel convergence. The results demonstrated that the developed model is fairly able to predict tunnel convergence. Further research has also been conducted to forecast tunnel convergence by means of AI approaches [46][47][48]. In this paper, a gene expression programming (GEP) model is developed to predict tunnel convergence in NATM tunnels. The advantages of the proposed model, in relation to the aforementioned researches, is that a mathematical equation can be derived from the proposed model to predict tunnel convergence in NATM tunnels. In typical AI approaches, such as ANNs, a model can be proposed, in contrast to an equation, which, thus, may not be useful for researchers and engineers in practice. In the following sections, after some explanations regarding the case study under examination, the principles of GEP will be described. Then, the used database, with detailed input and output parameters, together with GEP model development, will be discussed. Finally, an evaluation of the developed GEP equation will be given in detail.

Case Study
A large number of reliable data is required in order to generate a GEP model investigating tunnel convergence. It was decided to investigate Phase I of Line 2 of the Karaj Urban Railway (KUR) Project. This selection was made mainly because extensive geotechnical studies were carried out, before and during the tunnel construction, as part of the project. Furthermore, tunnel convergences were regularly monitored throughout construction of the tunnel, providing valuable data.
Karaj is a large industrial city located about 40 km to the west of Tehran, the capital city of Iran. The city is the capital of Alborz Province, located at the foothills of the Alborz Mountains. With 1.6 million inhabitants and as the fifth-largest city of Iran, Karaj is gradually becoming an extension of metropolitan Tehran. To address traffic problems of the city, six subway lines were designed, which cover all the crowded areas, aiming to facilitate transportation issues.
With a length of 27 km, Line 2 of KUR was constructed between Kamal-Shahr, in the north-west part of the city, and Malard, located in the south area of Karaj. This line includes a single tunnel with 8.40 m width and 7.80 m height, which was constructed using NATM. In addition to the tunnel, 23 underground stations were constructed using enlarged tunnel by NATM and open cut techniques.

Geological and Geotechnical Properties
Karaj County is situated in the Karaj sedimentary basin, which consists of two main formations, Alborz Mountains in the north and Central Plateau in the south. The topography of this sedimentary basin has the form of En Echelon fold and the tectonic evolution of this area began with the Alpine orogeny. Grain size of the Karaj alluvial fan is relatively homogeneous and is composed of rubble; more precisely, gravel, sand, and a little clay [49]. The KUR tunnels are excavated in young alluvial layers, whereas the major part of the city is situated on the young and unconsolidated deposits.
To determine the soil layers' thickness, the mechanical and physical properties of the soil and the groundwater level, boreholes were drilled all over the excavation area, spaced at every 100 m along the tunnel alignment. Analysis of boreholes displayed no water table in the containing soil of the tunnel and stations. To investigate the soil properties in the tunnel alignment and stations, geotechnical field testing, including SPT, permeability (Lefranc) test, density, and moisture content tests were performed. In addition, a limited number of large-scale tests, including the in-situ plate loading test and the in-situ direct shear test, were carried out throughout the tunnel alignment. Table 1 shows the results of the geotechnical field testing. The mechanical properties of the soil layers were determined based on the results of in-situ field measurements and respective laboratory tests. Table 2 shows the results of the mechanical properties of the soil layers in Part I of KUR project.

Construction Method
NATM was implemented for the tunnel excavation in the KUR project. Aiming to ensure ground self-stability until the completion of the lining installation, the excavation of the tunnel was distributed into two sections: Top heading and bench. The top heading was excavated in one step and the bench in two steps. Figure 2 shows the tunnel dimensions and excavation sequence. According to this figure, the tunnel has a horseshoe-shape with 9 m height and 9.6 m width, after excavation, and 7.8 m height and 8.4 m width with the lining. A layer of shotcrete lining with 0.30 m thickness including wire mesh (10 mm diameter at 150 mm by 150 mm spacing) together with lattice girders formed the primary support system. A 30 cm thick reinforced concrete layer shaped the final tunnel lining. As schematically represented in Figure 2, the excavation of the tunnel was separated into three steps during which the top heading (Step I) was excavated first. The excavated area was supported by using a primary support system after excavating the top heading for a distance of 1.2 m. Excavation for Steps II and later III began upon reaching the 120 m distance for the excavation of the top heading. Tunneling process was completed by applying concrete reinforced liner as a final tunnel support, after installation of the waterproof membrane.

Monitoring Program
Ground, tunnel, and building behavior during tunneling were monitored using several types of instrumentation in the KUR project. As a major part of the monitoring program, a large number of settlement markers were installed and surface settlements were measured frequently. In addition, aiming to investigate the subsurface settlement, several extensometers were installed in some stations of the project [50].
The tunnel convergence was measured using tape extensometers throughout the tunnel alignment. In this project, the convergence pins were installed in three points after top heading excavation and were increased to five points after full face excavation. The pins were installed at every 50 m throughout the tunnel route a few hours after excavation. Convergence pins are stainless steel eyebolts that were grouted into the tunnel lining to reveal the magnitude and direction of movements. In every convergence station, the distances between the convergence pins were measured using a tape extensometer. The typical monitoring plan implemented in the KUR project is depicted in

Gene Expression Programming (GEP)
Gene expression programming is a method for developing computer programs and mathematical modelling, based on evolutionary computation and inspired by natural evolution. It presents a solution, in the form of a tree structure, by using a certain dataset [51]. This method was invented by Ferreira in 1999 and formally introduced in 2001 [52]. The GEP algorithm, integrates the dominant view of the two prior inheritance (genetic algorithm (GA) and genetic programming (GP)) algorithms, aiming to overcome their weaknesses. In this method, the chromosome genotype is similar to a GA, and the phenotype of a chromosome is a tree structure, with variable length and size, similar to the GP algorithm. Therefore, by overcoming the dual role limitation of chromosomes in the earlier algorithms, the GEP algorithm enables multiple genetic operators to be assured of the permanent health of the offspring's chromosomes and at a faster rate than GP. In this sense, GEP succeeds in crossing the presumed as first and second thresholds in natural evolution processes (Replicator Threshold and Phenotype Threshold).
The logical relationship between several variables (if any) may be a function, or at least a function that can be described accurately. This function can be algebraic operators (+ − * /), Boolean logic operators (such as OR, AND, and IF), or a variety of algebraic functions such as trigonometric and exponential. Obviously, the logical relationship between variables should be studied. To find the relationship of variables a and b with y, using the GEP algorithm, a population of linear chromosomes is first formed. At each position of the genes of these chromosomes, one of the variables may be placed. Once the chromosomes have been created and their positions filled, it is time to test the fitness of each individual (chromosome) in the generation studied. For this purpose, in the GEP algorithm, the chromosomes are expressed as an expression tree (ET). An ET is similar to a protein in a natural cell which determines what a gene's phenotype is [53].
Ferreira [52] invented a simple and efficient language called Karva to express genes and generate ETs, whereby a mathematical equation (program) is created and extracted from each chromosome, consisting of random terminals and functions. Now, if a fitness case is given as an example (that is, a set of points where each value of y is subtracted and recorded for values a and b), one can estimate how close the value of y (calculated through the equation) is to the actual value of the same points, for the specified values of a and b at different points. As the difference between them decreases, accuracy of the equation increases, thus leading to higher fitness. For each chromosome produced in the first generation, fitness is calculated and assessed in this manner. These chromosomes receive a score for the next generation depending on their proportion of total fitness. In the meantime, the most graceful person in any generation, without the selection process, is transferred directly to the next generation.
To create the next generation, the linear state (genotype) of the chromosomes is used. As such, full-length chromosomes, whether active or inactive, will be present in the next generation. The inactive part of a gene may then become an active and fully adaptive part with a next-generation mutation. The five basic steps of designing a GEP algorithm are: The first step in generating the next generation is the replication process that is implemented using the Roulette Wheel. The wheel rotates (of course, conceptually) and picks a chromosome each time (this is essentially coded by generating and assigning random numbers). Higher-rated chromosomes have a greater chance of being selected, although there is no decision, and it is precisely this feature that brings the algorithm closer to the random selection process of nature. This process continues until the number of current generations is transferred to the next generation so that the number of chromosomes is constant throughout evolution.
At that time, restructuring begins. This means that the genetic operators are sequenced on the identical chromosomes in the order specified in the algorithm to transform the chromosomes. This trend in the generation of new generation chromosomes, and their sequential testing, gradually simulates natural evolution, often approaching the optimal equation of interest after a given number of generations. Of course, a limit can also be set for the number of iterations of the algorithm, to prevent memory and time loss, if the algorithm fails to recover after a certain period of time. Figure 4 shows the flowchart of a GEP.

Data Sets
To predict tunnel convergence using GEP, it is necessary to investigate influential parameters. It is important to stress that GEP uses given data in order to reveal the relationship between the data in question. Thus, in the prediction of tunnel convergence by means of GEP, the accuracy of the results is considerably influenced by the relevancy of the selected parameters. In addition, impertinent input rejection can improve the prediction results, as well as decrease the training time.
To this end, parameters affecting convergence of a tunnel were categorized into three universal groups, as shown in Figure 5, including "geological and geotechnical conditions", "tunnel properties and tunneling methods", as well as "distance from tunnel face".

Geotechnical Parameters
In the KUR project, various geotechnical tests were conducted throughout the tunnel alignment and tunnel convergence was recorded during tunnel construction. The boreholes were drilled at every 100 m throughout the tunnel alignment in the KUR project and some of the convergence stations are not located exactly at the same location point of the boreholes. Therefore, the nearest neighbor was used to determine the soil properties at the location of convergence stations. The nearest neighbor is a simple method for classifying objects based on closest training examples. This method is a type of instance-based learning, in which the function is only approximated and an object is classified by a majority vote of its neighbors.
Ground water level, as an important influential parameter, may affect the soil movements. High water level over the tunnel invert increases the possibility of the flow of water towards the tunnel face, which may in turn lead to large displacements in the tunnel face and, consequently, large convergences. In line 2 of KUR, the study of boreholes and hand-dug wells illustrated no water table in the containing soil of the tunnel and underground stations. Consequently, ground water level was not investigated in this paper as an influencing parameter.
Soil cohesion and friction angle, as independent parameters of soil strength, have also been investigated to estimate tunnel convergence using GEP. It is obvious that soil cohesion is a suitable parameter to evaluate only fine grained soils. However, due to the fact that this project includes a wide range of fine grained and coarse grained soils, this parameter was selected to be used in evaluating the surface settlement using AI methods. In addition, the friction angle, which represents the shear strength of granular soils, has also been investigated in simulations.
Two deformation parameters of soil which include elasticity modulus and Poisson's ratio have also been investigated. Obviously, more plastic points are formed in the area with low elasticity, and as a result, the convergences are expected to be larger in these areas.

Tunnel Geometry and Tunneling Operation
The depth of the tunnel and the excavation area are influential geometric parameters on ground movements. Generally, an underground excavation with a large diameter tends to cause high convergence, especially when tunneling in soft soils. It is estimated that a greater volume of soil moves towards the tunnel face in large diameter tunnels, when the values of the other parameters are constant, and may lead to larger convergences. However, since the excavation area of the tunnel in this project is constant, the influence of the area on tunnel convergence was neglected.
The tunnel depth is also an influential geometric parameter on tunnel convergence. The relationship between the tunnel depth and measured convergences for the KUR Project is shown in Figure 6. As one may notice, when the tunnel is deeper than 20 m, larger convergences generally occur.

Distance from Tunnel Face
In addition to the abovementioned parameters, the distance of the measurement point to the tunnel face should also be considered in the prediction of tunnel convergences. For a long time after tunnel construction and as the tunnel face moves away from the measurement point, the convergence of the tunnel continues. Figures 7-9 show the amount of tunnel convergence at the chainage 1 + 787 km of KUR. As it can be seen in Figure 9, convergences increased as the distance to the tunnel face increased. However, as the cumulative amount of maximum convergences is used as the output parameter in this research, the distance from the tunnel face was not investigated as an input parameter.

Input and Output Parameters
Six parameters were selected to be used as input data, aiming to predict tunnel convergence using GEP. Table 3 shows the range of parameters used in the prediction. A total of 118 input datasets were obtained, based on experimental data. These refer to Parts I, II, and III of the KUR Project, spanning a tunnel length of 14.5 km. Thus, each dataset includes a set of six input parameters. In addition, ranges of the output parameter, used for training the GEP models, are shown in Table 3. The output parameter corresponds to the amount of total cumulative convergences of the crown and the upper sides.

Convergence Prediction Using GEP
As previously mentioned, 118 datasets were prepared to simulate tunnel convergence by means of GEP. To validate the simulation results, available datasets were randomly divided into two groups; 70 percent of the total datasets (79 datasets) were used as training datasets to develop a model and the rest of the datasets (39 datasets) were used as validation/test datasets. GeneXproTools (Version 5.0.3919) was used to conduct simulations. In order to select the best model to predict tunnel convergence, following a trial and error approach, two important features of the GEP configurations, namely, the number of chromosomes and the number of genes, were investigated within a wide range of combinations. In this regard, root mean square error (RMSE) was set as the fitness function. Thus, while keeping the other parameters constant, the optimum number of chromosomes was determined in terms of minimum error and maximum coefficient of determination. Figures 10-12 show the performance of different models with different numbers of chromosomes. From these figures, it can be seen that the minimum RMSE and mean absolute error (MAE) were obtained when the number of chromosomes was equal to 35. The same results were obtained when investigating the coefficient of determination (R 2 ) and the maximum R 2 for training and testing datasets. Therefore, a number of 35 chromosomes was selected as the optimal number of chromosomes for the problem under investigation.   Several models were implemented to obtain the best GEP model with a fixed number of 35 chromosomes and variable numbers of genes. Figures 13-15 illustrate the performance of the implemented models in terms of RMSE, MAE, and R 2 , respectively. As shown in these figures, the model with five genes yields the best performance among the models and therefore, a number of five genes was selected for the model implementation. Table 4 shows the GEP configuration that was used for tunnel convergence simulation.

Evaluation of the Results
In order to evaluate the developed model, statistical evaluation criteria were used. These criteria consist of RMSE, MAE, and R 2 given by the following equations: Furthermore, a new engineering index, the recently proposed a10-index [54][55][56][57][58] was also used for the reliability assessment of the developed models: where M is the number of dataset samples and m10 is the number of samples with value of rate experimental value/predicted value between 0.90 and 1.10. It should be noted that for a perfect predictive model, the value of the a10-indexis expected to be unity. The proposed a10-index has the advantage that its value has a physical engineering meaning. It declares the relative amount of samples which satisfy predicted values, with a deviation of ±10% compared to experimental values. With the configuration settings given in Table 4, GEP modeling was performed several times aiming to obtain the best results. Table 5 shows the performance of the selected model for the prediction of tunnel convergence using GEP. The selected model is able to predict tunnel convergence with a high degree of accuracy, as can be seen in Table 5, as well as in the scatter plots presented in Figures 16  and 17, with an R 2 of 0.997 and an R 2 of 0.986 for training and testing datasets, respectively. It is worth noting that all samples used for the training process have a deviation of less than ±10% (points between the two dotted lines in Figure 16), while during the evaluation process only one sample presented a deviation greater than ±10% (Figure 17). Figures 18 and 19 present a comparison between the exact experimental values and the predicted values of the GEP model.     In fact, this figure shows the extracted relationships between input and output data by using the GEP algorithm. This feature, which enables the extraction of mathematical formulas from the GEP algorithm, makes its results more applicable, in comparison to other AI methods.
in which, Con is tunnel convergence (mm), D is tunnel depth (m), C and ϕ are the cohesion (kPa) and internal friction angle ( • ), respectively, γ is unit weight (kN/m 3 ), E is the elasticity modulus (MPa), and ν is Poisson's ratio.

Conclusions
A GEP model was proposed with the aim of predicting tunnel convergence, using experimental data. In order to develop this model, a database was used based on experimental datasets. Each dataset consisted of six input parameters, namely, tunnel depth, cohesion, frictional angle, unit weight, Poisson's ratio, and elasticity modulus and only one output parameter, the cumulative convergence. Thus, 118 sets of data, obtained from the KUR project, were utilized for the modeling procedure. Respectively, 79 and 39 sets of data were used for the model construction (training) and its assessment (testing). Statistical evaluation criteria were utilized to evaluate the performance of the developed model. Following the trial and error method, several models were implemented to determine the best GEP configurations and, finally, an optimum model, with 35 chromosomes and five genes, was developed. Based on the derived results, it can be concluded that the proposed GEP model is a useful and reliable tool to predict convergence of NATM tunnels. An equation, derived from the GEP model, offers the added advantage that it can be easily applied by all scientists and researchers for the prediction of cumulative tunnel convergence. Furthermore, the proposed method is advantageous over other methods, on behalf of the fact that it presents no limits concerning input parameters. In the future the authors plan to investigate other parameters as well, which seem to affect tunnel convergence, using an even more extended experimental database.