AdaBoost-Based Back Analysis for Determining Rock Mass Mechanical Parameters of Claystones in Goupitan Tunnel, China

Abstract: The back analysis is an effective tool to determine the representative values of rock mass mechanical properties in rock engineering. The surrogate model is widely used in back analyses since analytical or numerical models are usually unavailable for practical engineering problems. This study proposes a novel back analysis framework by adopting the AdaBoost algorithm for deriving the surrogate model. Moreover, the simplicial homology global optimization (SHGO) algorithm, which is robust and applicable for a black-box global problem, is also integrated into the framework. To evaluate the performance, an experimental tunnel in Goupitan Hydropower Station, China, is introduced, and the representative rheological properties of the surrounding rock are obtained by applying the proposed framework. Then the computed displacements based on the acquired properties via both surrogate and numerical models are compared with field measurements. By taking triple-day data, the discrepancy between the calculated and field-measured displacements is less than 0.5 mm. This validates the reliability of the obtained properties and the feasibility of the proposed framework. As an AdaBoost-based method, the proposed framework is sensitive to noise and outliers in the data, the elimination of which is recommended before application.


Introduction
The determination of rock mass parameters is essential to the design, excavation, and stability analysis in rock engineering [1] since the representative parameters of rock mass are crucial for choosing the proper engineering measures in rock excavation and support. Although the in-situ and laboratory tests are the typical methods of rock mass property determination, they always suffer from the high consumption of time and cost. Instead, the back analysis is commonly adopted in determining the soil or rock properties based on the field measurements [2]. In ordinary analyses, the mechanical properties of rock mass are required as input data, and the output of the analysis is the mechanical behavior of the rock mass. However, in back analyses, the observed mechanical behaviors are the required input, and the mechanical properties are the output to be determined [3]. The back analysis aims to find the representative values of parameters by minimizing the difference between the field measurements and the computed information via physical model and optimization techniques, which are the two essential components of back analyses. Finding a reliable physical model which describes the mechanical behavior based on the rock mass properties is usually challenging due to the inevitable complexities (e.g., local heterogeneities and unknown discontinuities [4]). Nowadays, more and more numerical models have been developed to serve as physical models with the advantages of simplicity and lower cost consumption. However, the developed numerical models become highly complex and time-consuming when the project is large in scale. It also brings challenges, such as trapping in local minima and confronting high dimensions, to the optimizing procedure.
To overcome the above issues, as an effective machine learning approach, a surrogate model was introduced into the back analysis as a substitute for physical models or complex numerical models. The surrogate model is an approximation model representing a complicated phenomenon of the systems in a computationally efficient manner [5]. Besides, many soft computing techniques, such as the support vector machine (SVM) and high dimensional model representation (HDMR), have also been adopted in assisting in the derivation of a proper surrogate model in rock engineering [1,6,7,8]. Ren et al. [9] adopted the symbolic regression method to automatically identify the closed-form equation for predicting the fatigue property of the cement-stabilized cold recycled mixtures. Feng et al. [10] developed an artificial neural network (ANN) model to replace the FLAC (Fast Lagrangian Analysis of Continua) [11]-based numerical model to determine the mechanical parameters of rock mass in the slope of the Three Gorges Permanent Lock, China. Pichler et al. [12] proposed an iterative parameter identification method, which involves ANN and genetic algorithm (GA), for back analyses. Yu et al. [13] combined the ANN and evolutionary calculations in their proposed back analysis method for the displacement of the earth-rockfill dam. Besides ANN, SVM was also used by Feng et al. [14] and Zhao et al. [15] in displacement back analyses. Although the aforementioned machine learning techniques had excellent performance in deriving surrogate models, the overfitting issue is still encountered sometimes during their applications. To avoid overfitting and improve efficiency, ensemble methods have been employed in the derivation of surrogate models. Zhang et al. [16] utilized extreme gradient boosting (XGBoost) and random forest techniques based on Bayesian optimization for the prediction of undrained shear strength. Zhao et al. [17] proposed an ensemble neural network (ENN) to predict tunnel boring machine (TBM) performance. Their proposed model can take the uncertainties embedded in the site data into account and make appropriate inferences using very limited data via the re-sampling technique. Due to their outstanding performance, the ensemble approaches were used in various studies [18][19][20][21]. Of the various existing ensemble methods, the Adaptive Boosting (AdaBoost) algorithm [22] is adopted in this study to improve the derivation of surrogate models. The AdaBoost algorithm is prevalent in both regression and classification problems. Liu et al. [23] integrated the AdaBoost algorithm with the classification and regression tree (CART) to predict the classification of surrounding rock mass. Ren et al. [24] adopted the AdaBoost algorithm in their proposed framework for determining the optimal composition of cement-based materials.
Optimization is the other essential component in back analyses. It is used to obtain the proper properties by reducing the difference between the measured and computed values via a physical model. Once the physical model is obtained, different candidate values of the concerned property are input into the physical model, based on which the mechanical behaviors are calculated and compared with the measured ones in the field. The smaller the difference between the computed and the measured behavior is, the closer the candidate input is to the representative value. When the difference is small enough and acceptable, the candidate input can be viewed as the representative value of the concerned property. The optimization algorithm carries out these steps in an effective way until the representative value is obtained. Among the various optimization methods, the simplicial homology global optimization (SHGO) algorithm is adopted in this study for determining the rock mass mechanical parameters. Obtaining the derivative information based on the numerical model and the surrogate model is difficult, and it hinders the application of traditional mathematical optimization methods to the back analysis. Several soft computing techniques, such as genetic algorithm [12,14], artificial bee colony [25], particle swarm optimization [26,27], etc., have also been utilized in optimization problems. Since trapping in the local minimum solutions is a commonly encountered issue, the global performance of the optimization algorithm is critical to the back analysis. The SHGO algorithm is a general-purpose global optimization algorithm that integrates simplicial integral homology and combinatorial topology [28]. It is believed that adopting the SHGO algorithm in this study will enhance the efficiency and guarantee the robustness of the back analysis.
The back analysis is not a novel technique in determining the rock mass properties in geotechnical engineering. It has been widely used in many subjects. Based on 93 samples, Barzegar et al. [18] developed an ensemble committee-based artificial neural network (ANN) model for the prediction of uniaxial compressive strength (UCS) of travertine rock. Yin et al. [19] constructed ensemble models via stacking techniques for rockburst intensity prediction, and the results indicate that the ensemble models perform better than the single classical models, especially for imbalanced data. Cai et al. [29] backcalculated the rock mass strength parameters from acoustic emission monitoring data in combination with elastic stress analyses. However, an effective and robust back analysis method for estimating the mechanical parameters of rock mass based on field deformation measurements in tunneling engineering is still lacking. Integrating AdaBoost and SHGO algorithms in this study will not only improve the effectiveness of the surrogate model derivation but also enhance the robustness of the back analysis.
This study aims to propose a novel back analysis framework by combining AdaBoost and SHGO algorithms to determine the mechanical parameters of the surrounding rock mass in rock engineering. The remainder of this study is arranged as follows: the concept of back analysis and the main ideas of AdaBoost and SHGO are revisited firstly; then the procedure of the proposed framework is presented in detail; to verify the proposed framework, the application to a tunnel project in Goupitan hydraulic engineering, China, is comprehensively investigated; the main results are presented, and the advantages of the proposed framework are pointed out.

Methodology
The back analysis is a commonly used tool for determining the proper rock mass properties in geotechnical engineering. The schematics of the back analysis are shown in Figure 1. Field measurements provide the basic data sets for the back analysis. The physical model, replaced instead with a surrogate model due to time and cost considerations, is the core of the back analysis. By selecting the proper optimization algorithm as well as the objective function, the analysis is conducted iteratively until the proper mechanical properties are obtained. The working principle of the optimization algorithm is to approach the optimal mechanical parameters by minimizing the objective function, which represents the difference between the field measurements and the calculated values determined by the physical model. In this study, the AdaBoost algorithm and SHGO algorithm are utilized in the surrogate model derivation and optimization section, respectively. The main ideas and computing details of the proposed framework are to be described briefly in this section.

Surrogate Model Derivation with AdaBoost Technique
Boosting is an efficient instrument for improving the predicting ability of learning systems both in regression and classification problems. It is also known as the lifting or enhanced learning method. The AdaBoost algorithm [22] is one of the most successful boosting algorithms, with the advantages of efficiency in speed and simplicity in programming and operation [30]. The boosting technique converts weak learners, which work slightly better than random guessing, into ones with arbitrarily high accuracy [22]. Among various AdaBoost regression algorithms, the AdaBoost R2 algorithm [31] is adopted in this study for surrogate model derivation. Consider a training set of $m$ samples, $\{(x_i, y_i)\}_{i=1}^{m}$, and let $w_{ki}$ denote the weight of the $i$th sample for the $k$th weak learner. The maximum error of the $k$th weak learner can be represented as Equation (1):
$$E_k = \max_{i} \left| y_i - G_k(x_i) \right|, \quad i = 1, 2, \ldots, m \tag{1}$$
where $G_k(x_i)$ is the output of the $k$th weak learner. Then the relative error for the $i$th data set can be expressed as Equation (2), where the linear error form is adopted.
$$e_{ki} = \frac{\left| y_i - G_k(x_i) \right|}{E_k} \tag{2}$$
Moreover, the error rate for the $k$th weak learner can be obtained by Equation (3), which is the weighted summation of relative errors for each data set.
$$e_k = \sum_{i=1}^{m} w_{ki} \, e_{ki} \tag{3}$$
The weight coefficient for the $k$th weak learner, $\alpha_k$, can be obtained by Equation (4):
$$\alpha_k = \frac{e_k}{1 - e_k} \tag{4}$$
For updating the sample weight for the successive weak learner, Equation (5) is used:
$$w_{k+1,i} = \frac{w_{ki}}{Z_k} \, \alpha_k^{\,1 - e_{ki}} \tag{5}$$
where $Z_k = \sum_{i=1}^{m} w_{ki} \, \alpha_k^{\,1 - e_{ki}}$ is the normalization factor for the $k$th weak learner.
Finally, by adopting the combination strategy, the weighted weak learner corresponding to the weight median is taken as the final strong learner, i.e., the final regressor, as shown in Equation (6):
$$f(x) = G_{k^*}(x) \tag{6}$$
where $G_{k^*}(x)$ is the weak learner whose weight $\ln(1/\alpha_k)$ is the median over $k = 1, 2, \ldots, K$, and $K$ is the number of weak learners.
Although the physical model in the back analysis is usually complex and difficult to obtain, the AdaBoost algorithm provides a promising method for the surrogate model derivation via the boosting technique. Without loss of generality, the surrogate model derived via the AdaBoost algorithm, $F(X)$, can be viewed as a mapping from $N$-dimensional inputs to $Q$-dimensional outputs. It can be defined as in Equation (7):
$$F(X): \mathbb{R}^N \rightarrow \mathbb{R}^Q, \qquad Y = F(X) \tag{7}$$
where $X = (x_1, x_2, \ldots, x_N)$ represents a vector of $N$ elements, which are, in this study, the $N$ measured properties of surrounding rock mass. $Y = (y_1, y_2, \ldots, y_Q)$, a $Q$-dimensional vector, represents the responses induced by excavation during construction in this study. Therefore, the AdaBoost-based algorithm proposed in this study facilitates the back analysis by providing a surrogate model, which serves as a physical model, and links the material properties to the system behavior. In this study, the proposed algorithm is implemented in Python.
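To make the weight update concrete, the following is a minimal sketch of a single AdaBoost R2 round with the linear error form, corresponding to Equations (1) to (5). It assumes the standard form of the algorithm in [31]; the function name `adaboost_r2_round` and the sample values are hypothetical and are not taken from the authors' implementation.

```python
def adaboost_r2_round(y_true, y_pred, weights):
    """Run one AdaBoost R2 weight update with the linear error form.

    Returns (alpha, new_weights). Inputs are plain lists of equal length;
    `weights` is assumed to be non-negative and to sum to one.
    """
    abs_err = [abs(y - g) for y, g in zip(y_true, y_pred)]
    max_err = max(abs_err)                                    # Equation (1)
    if max_err == 0.0:
        raise ValueError("weak learner is exact; no reweighting is needed")
    rel_err = [e / max_err for e in abs_err]                  # Equation (2)
    err_rate = sum(w * e for w, e in zip(weights, rel_err))   # Equation (3)
    if err_rate >= 1.0:
        raise ValueError("error rate must be below one")
    alpha = err_rate / (1.0 - err_rate)                       # Equation (4)
    # Samples with a large relative error keep more of their weight.
    unnormalized = [w * alpha ** (1.0 - e) for w, e in zip(weights, rel_err)]
    z = sum(unnormalized)                                     # normalization factor
    new_weights = [u / z for u in unnormalized]               # Equation (5)
    return alpha, new_weights


# Hypothetical targets and weak-learner outputs (illustrative values only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.0, 2.5, 4.0]
alpha, new_w = adaboost_r2_round(y_true, y_pred, [0.25] * 4)
print(alpha, new_w)
```

In this sketch the third sample, which has the largest error, receives the largest weight for the next round. For practical use, scikit-learn's `AdaBoostRegressor` implements AdaBoost.R2 with a linear loss option; whether the authors used it is not stated in the text.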

SHGO-Based Optimization
As aforementioned, optimization is also a necessary step in back analyses. In this study, a general-purpose global optimization algorithm, SHGO, is adopted for its efficiency and convergence speed. It is based on the applications of simplicial integral homology and combinatorial topology. Since the derivatives of objective functions are not required, and only the function evaluations are used in SHGO, this algorithm is applicable to black-box global optimization problems. The execution of the SHGO algorithm mainly consists of four steps. During the optimization process, the difference between the predicted system output via the physical model and measured system responses is expressed as the objective function. In this study, the root mean square (RMS) is adopted as the objective function, and the expression is shown in Equation (8):
$$\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2} \tag{8}$$
where $\hat{y}_i$ and $y_i$ are the $i$th predicted system outputs and measured system responses, respectively, and $N$ is the number of measurements.
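The RMS objective of Equation (8) can be sketched in a few lines. The function name `rms_objective` and the displacement values below are hypothetical and serve only to show how two candidate parameter sets would be ranked.

```python
import math


def rms_objective(predicted, measured):
    """Root mean square difference between predicted and measured responses."""
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(measured))


# Hypothetical displacements in mm (not taken from Table 1).
measured_mm = [2.0, 3.0, 4.0]
candidate_a = [2.1, 3.2, 3.9]   # predictions from one candidate parameter set
candidate_b = [3.0, 4.5, 6.0]   # predictions from another candidate parameter set
rms_a = rms_objective(candidate_a, measured_mm)
rms_b = rms_objective(candidate_b, measured_mm)
print(rms_a, rms_b)
```

The candidate with the smaller RMS value is the one the optimizer would prefer.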

Procedures of Proposed Framework
Based on the AdaBoost technique and SHGO algorithm, an integrated back analysis framework for rock engineering is proposed in this study. AdaBoost technique facilitates the derivation of the surrogate model, which serves as a physical model and describes the relationship between the properties of surrounding rock mass and their responses during excavation. SHGO algorithm is adopted for optimization sessions in back analysis. The flowchart of the proposed framework is shown in Figure 2 and the procedures are as follows.


Step 1: Collect the project information.
Step 2: Generate the training data based on numerical simulation via the experimental design. The training data consist of not only the numerical model configurations but also the rheological responses.
Step 3: Derive the surrogate model based on the AdaBoost algorithm to capture the nonlinear relationship between the mechanical parameters of surrounding rock mass and the corresponding responses during excavation.
Step 4: Conduct the optimization via SHGO based on the RMS objective function. Then the most likely representative mechanical properties can be obtained.
The results can then be applied in the stability analysis, design scheme optimization, and other works. The proposed framework is mainly based on the AdaBoost and SHGO algorithms and is implemented in Python with the SciPy package [32].
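Step 4 can be sketched with `scipy.optimize.shgo`, which needs only function evaluations and parameter bounds. The sketch below is illustrative only: `toy_surrogate` is a hypothetical two-parameter creep curve standing in for the trained AdaBoost surrogate, and the "measurements", bounds, and sampling settings are assumptions rather than values from the project.

```python
import math

from scipy.optimize import shgo


def toy_surrogate(params, day):
    """Hypothetical stand-in for the trained surrogate: displacement in mm."""
    amplitude, rate = params
    return amplitude * (1.0 - math.exp(-rate * day))


DAYS = [3, 5, 11]                      # triple-day data, as in the case study
TRUE_PARAMS = (4.0, 0.3)               # used only to synthesize "measurements"
MEASURED = [toy_surrogate(TRUE_PARAMS, d) for d in DAYS]


def objective(params):
    """RMS difference between surrogate predictions and measurements."""
    residuals = [toy_surrogate(params, d) - m for d, m in zip(DAYS, MEASURED)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Assumed search ranges for the two parameters (prior knowledge in practice).
BOUNDS = [(1.0, 10.0), (0.05, 1.0)]

result = shgo(objective, BOUNDS, n=128, sampling_method="sobol")
print(result.x, result.fun)
```

In the actual framework, the objective would call the AdaBoost-derived surrogate for every monitored displacement, and the bounds would come from the laboratory and in-situ ranges of the rheological properties.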

Project Overview
In order to verify the proposed framework, a practical tunneling engineering project, Goupitan Hydropower Station, was examined in this study. The project is located in Wujiang, Guizhou Province of China. It is a landmark of China's West-East electricity transmission project [33]. The preliminary design report of the project states that the tailwater tunnel of the underground powerhouse on the right-hand bank and the construction diversion tunnel pass through a soft claystone rock mass. The geological conditions and monitoring configurations are shown in Figure 3. There are three geological layers in this site, and they are labeled as  S  and 70 m deep from the ground surface. To monitor the deformation of surrounding rock after excavation, three boreholes, labeled as 4#, 5#, and 6# in Figure 3, were drilled at a depth of 11.6 m with a length of 7 m. The orientations of boreholes 4# and 6# are horizontal and vertical, respectively, and borehole 5# is inclined at 45°. Along each monitoring borehole, the monitoring instruments were placed at five monitoring points at depths of 0, 1, 2, 4, and 6 m, respectively. Once the monitoring instruments were set up successfully, the displacements of each point were measured and recorded continuously. As a part of field measurements, the relative displacements of monitoring point 5 with respect to point 1 of each borehole are summarized in Table 1.

Surrogate Model Derivation
As aforementioned, an analytical model to predict the mechanical behavior of the surrounding rock mass is usually unavailable for a practical project. To conduct the back analysis, the proposed framework and numerical simulation via FLAC software [34] were adopted here for deriving the surrogate model. Based on the ground investigation information, the numerical model was constructed as shown in Figure 4. The maximum dimensions of the numerical model were up to 100 m × 100 m to reduce the boundary effect. The roller supports were used for both vertical and bottom boundaries as boundary conditions such that only vertical movement was allowed along vertical boundaries and only horizontal movement was allowed along the bottom boundary. No support was assigned to the top boundary of the model, and it could deform freely. The mesh was refined around the tunnels to capture the more precise behavior of the surrounding rock. Three materials were adopted for different rock layers. The upper rock layer S  were assumed to be Burgers material [35,36] with different rheological properties. Based on the laboratory and in-situ tests, the range of rheological properties of layers $S_{2h}^{1-1}$ and $S_{2h}^{1-2}$ was obtained as listed in Table 2. Moreover, the average bulk unit weight, 0.0265 MN/m$^3$, was shared by all three rock layers.
According to the uniform design method and the prior information of relevant mechanical properties, a total of 42 training numerical model samples were generated. For each model, the rheological process was simulated via FLAC, and the evolution of displacements at given positions was recorded. By involving the configurations and rheological behaviors of all the samples in the proposed AdaBoost-based framework, a surrogate model linking rheological properties and concerned displacements was derived.

To verify the derived surrogate model, ten numerical testing models were generated as well. For each test model, the rheological properties were generated randomly according to Table 2, and the mechanical responses were obtained through numerical simulation via FLAC. The relative displacements between monitoring points 1 and 5 along 4# borehole predicted by the surrogate model and those directly from FLAC, for both training and testing models, are shown in Figure 5. Three different time points, Day 3, Day 5, and Day 11, were chosen to trace the evolutions of displacements. The coefficient of determination, denoted as $R^2$, was also adopted here to indicate the goodness of fit of the surrogate models to the numerical models. Good agreements and higher $R^2$ values (more than 0.97) were observed for the training samples for all three time points. Although lower $R^2$ values, ranging from 0.818 to 0.944, were found for testing samples, the displacements from surrogate models and numerical models agreed well for most testing samples, except for a few specific cases, e.g., Sample 2 on Day 3 and Sample 7 on Day 11. For most testing and training samples, the discrepancies were less than 0.5 mm. The largest discrepancy was about 2 mm and took place in Testing Sample 7 on Day 11. Similar conclusions can be drawn by comparing the results along 6# borehole shown in Figure 6. The largest discrepancy between the displacements from the surrogate model and numerical model happened in Testing Sample 7 on Day 11 and was about 2.5 mm. Therefore, the derived surrogate model was shown to capture the nonlinear behavior of surrounding rock with acceptable performance. Considering the balance between efficiency and performance, the derived surrogate model is believed to be a good choice for back analyses.
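For reference, the coefficient of determination used above can be computed as in the short sketch below. It assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares, with the FLAC results taken as the reference; the function name and sample values are hypothetical.

```python
def r_squared(reference, predicted):
    """Coefficient of determination of `predicted` against `reference`."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("inputs must have equal length of at least two")
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0.0:
        raise ValueError("reference values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical displacements in mm: numerical model versus surrogate model.
numerical_mm = [1.0, 2.0, 3.0, 4.0]
surrogate_mm = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(numerical_mm, surrogate_mm)
print(r2)
```

A value close to one means the surrogate reproduces the numerical results closely, while a value of zero corresponds to predicting the mean of the reference values for every sample.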

Back-Analysis
To obtain the proper rheological parameters via back analysis, the field measurements (Table 1), as well as the derived surrogate model, were adopted into the SHGO-based optimization process within the proposed framework. Table 3 summarizes the obtained representative rheological properties of the rock layers. Based on the data involved in the back analysis, there are three types of field data: single-day, double-day, and triple-day. Single-day data include only the field data on Day 3; double-day data include the field data on both Days 3 and 5; triple-day data include the field data on Days 3, 5, and 11. Different values were found for rheological properties by using different types of field data. The obtained representative rheological properties shown in Table 3 were further examined. Figure 7 shows the predicted displacements via the surrogate model based on different types of field data along 4# borehole. The field-monitored displacements are also provided for reference. The predicted displacements based on the single-day data matched the field measurements well on Days 3 and 5. However, the discrepancy increased dramatically after Day 5. This suggests that the rheological properties obtained from only Day 3 data are less capable of supporting a long-term prediction. Even for Day 11, which was not long after Day 3, the discrepancy was as large as 4 mm, approximately equal to the field measurement itself. Such a prediction is less reliable in guiding long-term tunnel excavation in practice. As observed clearly in Figure 7, the performance is much improved if double-day back analysis results are used. An even better agreement of the predicted displacements via the surrogate model with the field measurements can be found by taking the rheological properties based on triple-day data. From Day 3 to Day 30, the discrepancy between the predicted displacement via the surrogate model and the field measurements remained below 0.5 mm.
Similar observations can also be found in Figure 8, which shows the predicted displacements via the surrogate model based on the different types of field data along 6# borehole. The predicted displacement based on single-day data increased faster than the field measurements and became less and less reliable as time went on. For Day 30, the predicted displacement was about 14 mm, which is more than four times the field measurement. The discrepancies between the predicted displacements based on double-day data and those based on triple-day data along 6# borehole were smaller than those along 4# borehole. However, significant improvement can also be observed by taking double- or triple-day data compared to single-day data. This confirms that more field-monitored data can provide more accurate predictions and suggests that the rheological properties obtained based on triple-day data are reliable representatives of the surrounding rock mass.
Besides the surrogate model, the numerical model can also be used to examine the obtained rheological properties in Table 3. By assigning the obtained rheological properties based on different types of field data, the numerical models were executed, and the mechanical behaviors were recorded. Rather than the relative displacement between monitoring points 1 and 5, the displacement of monitoring point 3 along 4# borehole was examined here. Figure 9 compares the monitored displacements and those calculated via numerical models with the different obtained rheological properties. It was observed that the calculated displacements based on triple-day rheological properties agreed well with the monitored field results. This further validates the reliability of the obtained rheological properties and the feasibility of the proposed back analysis framework.
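One plausible reading of why single-day data extrapolate poorly is that a single measurement cannot separate a parameter controlling the magnitude of creep from one controlling its rate. The toy sketch below illustrates this under stated assumptions: the two-parameter curve `creep_displacement` is hypothetical (it is not the Burgers model used in the study), and all numbers are synthetic.

```python
import math


def creep_displacement(amplitude, rate, day):
    """Hypothetical creep curve: displacement in mm after `day` days."""
    return amplitude * (1.0 - math.exp(-rate * day))


# Synthetic "field measurement" on Day 3 from reference parameters.
REF_AMPLITUDE, REF_RATE = 4.0, 0.3
day3_measurement = creep_displacement(REF_AMPLITUDE, REF_RATE, 3)

# A different rate, with the amplitude tuned so that Day 3 is matched exactly.
alt_rate = 0.1
alt_amplitude = day3_measurement / (1.0 - math.exp(-alt_rate * 3))

gap_day3 = abs(creep_displacement(alt_amplitude, alt_rate, 3) - day3_measurement)
gap_day30 = abs(
    creep_displacement(alt_amplitude, alt_rate, 30)
    - creep_displacement(REF_AMPLITUDE, REF_RATE, 30)
)
print(gap_day3, gap_day30)
```

Both parameter sets reproduce the Day 3 value, yet their Day 30 predictions differ by several millimetres in this toy setting. Adding measurements from further days to the objective removes this ambiguity, which is consistent with the improvement observed for double- and triple-day data.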

Discussions
This section demonstrated the application of the proposed back analysis framework on the Goupitan Hydropower Station project. Numerical models were adopted in the surrogate model derivation since the practical project is usually complex and no analytical solution was available. The derived surrogate model was verified through the comparison between displacements from the surrogate model and the numerical model. The surrogate models for all the time points fitted well, with $R^2$ values greater than 0.818 for 4# borehole and greater than 0.701 for 6# borehole. The $R^2$ values for testing data were lower than those for training data. This is reasonable since the derivation was conducted based on training data. Based on the derived surrogate model, a back analysis was conducted via the SHGO algorithm, and the representative values were obtained for rheological properties. It is less practical to verify the obtained values directly by investigating the claystone properties via an in-situ test. For a test tunnel, the displacements along the borehole are much more meaningful than the rheological properties themselves. Therefore, the predicted displacements, both from the numerical model and surrogate model, were chosen for verifying the back analysis results. Compared with the field measurements, the predicted displacements based on the different types of field data (single-, double-, and triple-day) showed different reliabilities. The agreement of the predicted displacements with the field measurements was much improved when double- or triple-day data were taken into account, instead of single-day data.
The proposed framework is efficient and robust since it adopts AdaBoost and SHGO algorithms in surrogate model derivation and optimization, respectively. It can be widely applied to similar projects when prior knowledge of the concerned parameters and field measurements are available. However, as a boosting technique, the AdaBoost algorithm is sensitive to noise and outliers in the data. Therefore, it is highly recommended to eliminate the noise and outliers in the data before using the proposed framework.

Conclusions
The back analysis is a common and effective tool for finding the representative values of mechanical properties of rock mass for rock engineering. A novel framework combining AdaBoost and SHGO algorithms is proposed in this study. The AdaBoost algorithm is used to derive the surrogate model for predicting system outputs (e.g., mechanical behavior of surrounding rock in tunneling) in a computationally efficient manner. The SHGO-based optimization approach adopted in the proposed framework only requires function evaluations. This provides the framework with more robustness. The main procedures of the framework were presented, and the main ideas of the relevant algorithms were revisited. Goupitan Hydropower Station was then introduced as a case study in order to evaluate the proposed back analysis framework. The surrogate models, which describe the relationship between surrounding rock displacement and rheological properties, were derived via numerical models based on FLAC software. By comparing the displacements computed by the numerical models with those from the derived surrogate model, the discrepancies for most samples were found to be less than 0.5 mm. This shows that the surrogate model derived via the AdaBoost algorithm can capture the nonlinear mechanical response of surrounding rock with acceptable performance. Then the representative values of rheological properties were obtained by utilizing the SHGO algorithm within the proposed framework. The obtained properties were further examined by comparing the monitored displacements with the calculated displacements via the surrogate model and the numerical model, respectively. The predicted displacements based on single-day data were less reliable, and they cannot be used for long-term prediction. By adopting double-day or triple-day data, the agreements were significantly improved. With triple-day data, the discrepancy between the calculated displacement, via either the surrogate model or the numerical model, and the field measurements was kept below 0.5 mm for the whole monitoring period. The reliability of the obtained rheological properties and the feasibility of the proposed back analysis framework were confirmed.