Towards TBM Automation: On-The-Fly Characterization and Classification of Ground Conditions Ahead of a TBM Using Data-Driven Approach

Abstract: Pre-tunneling exploration for rock mass classification is a common practice in tunneling projects. This study proposes a data-driven approach that allows for rock mass classification. Two machine learning (ML) classification models, namely random forest (RF) and extremely randomized tree (ERT), are employed to classify the rock mass conditions encountered in the Pahang-Selangor Raw Water Tunnel in Malaysia using tunnel boring machine (TBM) operating parameters. Due to the imbalanced distribution of rock classes, an oversampling technique was used to obtain a balanced training dataset for unbiased learning of the ML models. A five-fold cross-validation approach was used to tune the model hyperparameters, and a validation-set approach was used for the model evaluation. ERT achieved an overall accuracy of 95%, while RF achieved 94%, in correctly classifying rock mass conditions. The results show that the proposed approach has the potential to identify and correctly classify ground conditions ahead of a TBM, which allows for early problem detection and on-the-fly support system selection based on the identified ground condition. This study, which is part of an ongoing effort towards developing reliable models that could be incorporated into TBMs, shows the potential of data-driven approaches for on-the-fly classification of ground conditions ahead of a TBM and could allow for the early detection of potential construction problems.


Introduction
Tunnel boring machines (TBMs) are currently the most utilized equipment for deep and long tunnels in both civil and mining industries. One important consideration prior to the actual excavation is evaluating ground conditions along the proposed tunnel alignment. This initial evaluation provides critical information for selecting the excavation type and developing preliminary ground support systems. Ground conditions are obtained by the characterization and subsequent classification of the rock mass based on a pre-defined system known as a rock mass classification system. Since the introduction of rock mass classification by Terzaghi, it has become a useful tool for rock engineering and is widely considered the most practical method for evaluating the quality of the rock mass in underground engineering practices. The common and widely used classification systems are the Q-system [1], Rock Mass Rating (RMR) [2], Rock Mass Index (RMi) [3], and Geological Strength Index (GSI) [4]. Aside from these classification systems, the Japanese Highway Classification System (JH system) and the Hydropower Classification System (HC system) are also popular in Asia.
One of the serious concerns in the use of rock mass classification schemes is that they are subjective. Field engineers with different experience levels classifying the same rock mass using, for example, RMR can arrive at significantly different assessments of rock mass behavior [5]. This is because most of these classification systems use both quantitative and qualitative methodologies. To reduce, if not eliminate, the subjectivity or experience factor in rock mass classification, a data-driven system is necessary. Some of the early attempts at data-driven approaches focused on the use of non-destructive forward geological prospecting techniques, including tunnel seismic prediction (TSP) and ground-penetrating radar (GPR), to assess the rock mass quality ahead of TBMs [6,7]. Although these geophysical techniques provide reliable and accurate results, they are expensive and cause undue project delays. Zhang et al. [8] indicated that these forward geophysical prospecting techniques are not directly related to the rock tunneling/excavation process since they can only be implemented when the TBM is not in operation. Besides the subjective nature of rock mass classification systems, limited space between the TBM cutterhead and the tunnel face makes geologic mapping for classifying in-situ ground conditions difficult, if not impossible [9].
Another data-driven approach for classification of rock mass conditions in tunnels excavated by TBMs is the application of artificial intelligence (AI) and machine learning (ML) techniques to TBM operating parameters. Several researchers [10-17] have applied ML algorithms, capable of handling complex non-linear problems, to establish the relationship between TBM operational data and rock mass conditions. Liu et al. [18] used cutterhead thrust, cutterhead torque, revolutions per minute (RPM), and penetration rate to develop a simulated annealing-back propagation neural network (SA-BPNN) model to predict rock mass properties (UCS, brittleness index (Bi), and the distance between planes of weakness (DPW)). Current research in rock excavation and tunneling is focused on developing reliable AI and ML models based exclusively on TBM operational data.
The overall objective of these efforts is to develop an on-board rock mass classification system for TBMs that will allow automated rock mass classification and possibly ground support system selection. Liu et al. [19] used TBM operational data to train a support vector classifier coupled with a genetic algorithm to classify rock masses based on the improved basic quality (BQ) classification system. Jung et al. [20] applied ANN to shield TBM operational data (penetration rate, cutterhead torque, and thrust force) to predict ground conditions ahead of the TBM. Zhang et al. [8] used RF, K-NN, and support vector classification (SVC) to predict ground conditions in tunnels using four TBM parameters, namely cutterhead torque, cutterhead thrust, cutterhead speed, and advance rate, and concluded that SVC outperformed the other techniques with an accuracy of 98%. They also indicated that, of the four TBM parameters analyzed, the cutterhead torque and thrust better reflected the changes in rock types. Based on the Hydropower Classification (HC) system, the study in [9] used TBM operational data to train five predictive models (AdaBoost-CART, CART, SVC, ANN, and KNN) and concluded that AdaBoost-CART was the best model for predicting rock mass conditions. Zhang et al. [21] used ANN, SVM, KNN, and CART to develop geologic type recognition classifiers based on advance rate, cylinder thrust, cutterhead torque, and cutterhead rotational speed. Erharter and Marcher [22] proposed multivariate sequence segmentation, abstraction, and classification (MSAC), a data-driven rock mass classification model, using the advance force, cutterhead torque, penetration rate, cutterhead rotations, advance speed, specific penetration, specific energy, and torque ratio.
This study explores the suitability of two supervised machine learning algorithms, random forest (RF) and extremely randomized trees (ERT), for predicting the ground conditions on the tunnel face ahead of a TBM based on the Japanese Highway Classification System. RF and ERT harness the predictive capabilities of multiple decision trees. Different sets of predictors are used at each node; hence, the variance of the resulting model is significantly reduced compared to the individual regression trees. RF was selected for this analysis because it has been applied successfully in a wide variety of projects and has seen tremendous acceptance in many disciplines due to its tendency to decrease the model variance [23]. ERT, on the other hand, is relatively unknown, especially in the area of rock excavation, but it was selected due to its high performance with less noisy data. In this study, TBM operating parameters, namely rate of penetration, cutterhead torque, cutterhead thrust force, cutterhead revolutions per minute, hydraulic cylinder stroke speed, boring pressure, pitching, and motor amps, were analyzed using the two ML algorithms to develop models for classifying the rock mass conditions in TBM tunnels. This research contributes to the ongoing efforts towards developing reliable models that could be incorporated into TBMs to allow for on-the-fly characterization and classification of ground conditions in tunnel excavation as well as eventual automation of ground support system selection.

The Japanese Highway Classification System (JH System)
The Japanese Highway Classification System was first developed in Japan in the 1960s for large dam foundations and later extended to tunnel rock mass characterization [24]. This classification system, commonly referred to as the JH System, has undergone several revisions since its introduction, like many rock mass classification systems. The JH System relies primarily on seven rock mass parameters, namely intact rock strength (compressive strength), weathering, spacing of discontinuities, condition of discontinuities, effect of discontinuity orientation, groundwater condition, and degradation by water. Each of these parameters is further subdivided into subgrades and assigned a grade point corresponding to the level of the rock mass feature being characterized. For example, the intact rock property (UCS) is divided into six subgroups: less than 3 MPa, 3-10 MPa, 10-25 MPa, 25-50 MPa, 50-100 MPa, and greater than 100 MPa. Each of these subgroups is assigned a grade point reflecting the strength of the intact rock material.
Once each rock mass parameter is graded, the grade points for the intact rock property, weathering, joint spacing, and condition of joints are added up, and the grade points for groundwater conditions, deterioration due to water, and effect of discontinuity orientation are subtracted from the sum to obtain the total grade point of the rock mass at that location. The total grade point ranges from 0 to 100, representing very poor rock and very good fresh rock, respectively. The total grade point is then used to categorize the rock mass into classes. The system has six rock mass classes: A, B, CI, CII, D, and E. Rock mass competence decreases from class A through class E, with class E being the least competent rock mass. In tunnel excavation, these rock mass classes are used to determine the ground support system required to stabilize the tunnel walls. Table 1 shows a typical JH System data collection sheet used in the Pahang-Selangor Raw Water Tunnel (PSRWT), while Table 2 shows typical ground support systems for the different rock mass classes.
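The grade point arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the grade values are hypothetical, and the class boundaries are left out because they are not given in the text.

```python
# Parameters whose grade points are added, and those whose grade points are
# subtracted, following the description of the JH System above.
ADDED = ("intact_rock_strength", "weathering",
         "discontinuity_spacing", "discontinuity_condition")
SUBTRACTED = ("groundwater", "degradation_by_water", "discontinuity_orientation")


def total_grade_point(grades):
    """Return the total grade point: sum of the added ratings minus the subtracted ones."""
    added = sum(grades[name] for name in ADDED)
    subtracted = sum(grades[name] for name in SUBTRACTED)
    return added - subtracted


# Hypothetical grade points for one mapped location (illustrative values only).
example = {
    "intact_rock_strength": 20, "weathering": 18,
    "discontinuity_spacing": 22, "discontinuity_condition": 19,
    "groundwater": 4, "degradation_by_water": 2, "discontinuity_orientation": 3,
}
total = total_grade_point(example)  # 79 added minus 9 subtracted
```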

Project Background
The Pahang-Selangor Raw Water Tunnel (PSRWT) is owned by the Malaysian Government and was constructed to convey raw water from the Semantan River, located in the southwestern part of Pahang, to Selangor State to address perennial water challenges. The tunnel is gravity driven and conveys approximately 1.89 billion liters of water per day to the Hulu Langat treatment plant. The tunnel, which is 44.6 km long and the 11th longest tunnel in the world, was constructed using two tunneling methods, i.e., the New Austrian Tunneling Method (NATM) and the TBM method. The TBM method was used to excavate 33 km of the tunnel length utilizing three Robbins Main Beam Tunnel Boring Machines, labeled TBM 1, TBM 2, and TBM 3. Figure 1 shows TBM 1, a Robbins 5.2 m diameter Main Beam Tunnel Boring Machine, which was used to collect the data analyzed in this paper.

Geologic Setting
Geologically, Peninsular Malaysia is made up of four major tectonic zones, namely the Western Stable Shelf, the Main Range Belt, the Central Graben, and the Eastern Belt [25]. The Kuala Lumpur Granite and the Genting Sempah Micro-granite are separated by the Kongkoi Fault, and the Bukit Tinggi Fault separates the Genting Sempah Micro-granite and the Bukit Tinggi Granite. While the Kuala Lumpur Granite is megacrystic, the Genting Sempah Micro-granite consists of micro-granodiorite. The Bukit Tinggi Granite consists of very coarse-grained biotite granite. The Main Range Granite is strongly deformed due to the intrusion of other granitic rocks. In general, the study area is underlain by coarse-grained, porphyritic biotite granite cut by minor porphyritic differentiates. Micro-granite, granodiorite, diorite, monzonite, granite porphyry, quartz porphyry, megacrystic biotite granite, megacrystic muscovite-biotite granite, and equigranular tourmaline-muscovite granite are the other rocks within the study area. Figure 3 is a geologic cross-section showing the tunnel alignment.

Database and Data Collection
The Pahang-Selangor Raw Water Tunnel was constructed by a Japanese firm. Consequently, the JH system was employed in the tunnel rock mass characterization. To do this, the tunnel face was divided into three zones: right, left, and center, as shown in Figure 4.
Each of these zones was characterized using the JH system described in Section 1.1. For the intact rock strength, Schmidt hammer measurements were made and converted to UCS values. The tunnel face was mapped by a geologist to provide the information needed to calculate the grade points for each zone. The final grade point was a weighted average of the grade points of the three zones. The tunnel was mapped every four (4) to ten (10) meters along its length. The database used for this paper consists of 180 rock mass records and 79,813 TBM operating data points. This dataset represents 11.6 km of the tunnel from chainage 6.85 km to chainage 18.59 km.

Data Exploration
The dataset used in this study contained 23,947 records after cleaning to remove missing values and duplicates. A summary of the input variables is presented in Table 3, with the cutterhead torque having the largest range followed by the boring pressure. A pairwise correlation of the input variables, presented in Table 4, shows that, apart from stroke speed and penetration rate, which have a strong positive correlation, the variables are only very weakly correlated. This indicates that multicollinearity is not a concern.
From Figure 5, the median cutterhead RPM decreases with decreasing rock mass competence. In a more competent rock mass, the penetration of the cutting tools into the rock mass is limited by the rock mass strength, therefore, the RPM of the cutterhead is higher than when rock mass is less competent (e.g., CII), where the cutting tool penetrates deeper. This is a possible explanation for the behavior of the cutterhead RPM observed in Figure 5.
A close observation of Figure 6 shows a consistent decline in the median boring pressure from rock class A through rock class CII. This general decline in the applied pressure can be attributed to the decrease in the integrity of the rock mass from class A to CII. Massive competent rock, such as that in class A, requires a higher excavation pressure for fragmentation than fractured rock such as that in class CII. Within each rock class, the boring pressure is widely variable with many outliers (Figure 6). This variability stems from local heterogeneities that are encountered within one rock mass class. The variability is more pronounced in the first three classes and less so in class CII.
In terms of the rate of penetration or advance rate, the median penetration rate increased from class A to class CI. This is intuitive since it is expected to be more difficult to advance in competent rock. Class CII, however, shows an unexpectedly low penetration rate, as shown in Figure 7. This may be attributed to other operational factors that accompany excavation in relatively weak rocks like class CII.
The dataset had an obvious imbalance in the number of data points in each rock mass class (Figure 8, Table 5). This imbalance tends to affect the performance of classification models. The majority of the rock mass records in the dataset were in class B. The numbers of data points in classes A and CI are comparable, with only a small fraction of the dataset falling in class CII. Due to this imbalance, an oversampling technique was employed to obtain a balanced training dataset for unbiased learning of the ML models. The upSample() function in the caret package in R was used to oversample the minority classes, A, CI, and CII, to equal the majority class, B.
It must be stated that the oversampling was only conducted in the training set and not the test set since an imbalance in the test set does not affect the performance of the already trained models.
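The oversampling step can be illustrated with a short Python sketch. The study used upSample() from the caret package in R; the function below is a hypothetical stand-in that only mirrors the idea of resampling each minority class with replacement until it matches the majority class, and the toy rows and labels are invented for illustration.

```python
import random
from collections import Counter, defaultdict


def upsample(rows, labels, seed=0):
    """Resample minority classes with replacement until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Keep every original record and add randomly drawn duplicates.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy training set only (the test set is left untouched, as in the study).
train_rows = [[i] for i in range(10)]
train_labels = ["B"] * 6 + ["A"] * 2 + ["CI"] + ["CII"]
balanced_rows, balanced_labels = upsample(train_rows, train_labels)
balanced_counts = Counter(balanced_labels)
```

Because duplicates are drawn only from the training records, no test record can leak into the training set through this step.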
As stated in Section 1.1, the Japanese highway classification system has six rock classes, but the dataset used in this study only contained four rock classes, A through CII. These classes fall in the general category of hard rock. Therefore, this study is applicable to hard rock tunnel excavations.

Variable Importance
A sensitivity analysis was conducted to ascertain the level of influence each input variable has on the models' classification capabilities. Each input variable was permuted in turn while the rest of the input variables were kept constant, and the mean decrease in Gini index, a measure of total variance across the rock mass classes, was recorded. The higher the mean decrease in Gini index, the higher the sensitivity to that variable.
Based on this analysis, cutterhead RPM is the variable to which the rock mass class is most sensitive, followed by the cutterhead thrust (Figure 9). Zhang et al. [8] observed a similar relationship between cutterhead torque, cutterhead thrust, and rock mass classes, and concluded that torque and thrust were good indicators of rock mass behavior. The least sensitive variables are the stroke speed and rate of penetration, respectively. The high sensitivity to the cutterhead RPM is somewhat intuitive since it is directly related to the integrity of the rock mass being excavated. With the same level of cutterhead torque, RPM will decrease significantly in less competent rock masses (e.g., class CII) as compared to more competent rock (e.g., class A), as seen in Figure 5. A similar argument can be made for the cutterhead thrust. In general, it is expected that the rate of excavation/penetration would increase significantly when cutting class CII as compared to operating in class A. This was observed from class A through CI, but the rate of penetration decreased in CII. The rate of penetration can be affected by several factors, such as intentional maneuvers by the operator due to the unstable nature of weak rocks (e.g., class CII). This response can be seen in Figure 7.
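To make the Gini-based measure concrete, the following Python sketch computes the decrease in Gini impurity produced by a single split. The reported importance averages such decreases over the splits and trees in which a variable is used; the function names and the toy labels here are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


parent = ["A", "A", "B", "B"]
pure_split = gini_decrease(parent, ["A", "A"], ["B", "B"])     # separates the classes fully
useless_split = gini_decrease(parent, ["A", "B"], ["A", "B"])  # leaves the mix unchanged
```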

Development of Machine Learning (ML) Models
Two machine learning techniques, random forest and extremely randomized trees, were applied to develop models for classifying the rock mass dataset into categories based on the JH rock classification system. This section discusses the data preprocessing, the machine learning models that were applied, and their learning process.

Data Preprocessing
The TBM operating parameters were recorded at a much higher resolution, a fraction of a meter, than the rock mass data, which were collected every 4 to 10 m. The rock mass data were taken at a coarse resolution because the rock properties in this section of the tunnel did not change much within a short interval. Where a change in rock mass characteristics was observed, a finer rock mass data collection resolution was used in order to capture all the variations in the rock mass. Another possible reason for the coarser resolution in the rock mass data is that collecting it involves shutting down operations to allow a geologist to access the tunnel walls. On the other hand, the machine operating parameters are easier to collect and do not require any downtime. For this study, the resolution of the machine data and the rock mass data had to be matched to enable the machine data to be used to predict the rock mass conditions. The chainage interval in the two datasets was used as a key to match them. That is, the rock mass record for a particular chainage interval was adjoined to all the TBM records in that chainage interval. This was done for all the data points in the rock mass dataset, creating the aggregate dataset used for this study.
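The chainage-based matching can be sketched as an interval lookup. The record layouts, the chainage values, and the choice to drop TBM records that fall outside every mapped interval are assumptions made for this illustration.

```python
from bisect import bisect_right

# Hypothetical rock mass records: (start_chainage_m, end_chainage_m, rock_class),
# sorted by start chainage and non-overlapping.
rock_mass = [(6850.0, 6856.0, "B"), (6856.0, 6864.0, "A"), (6864.0, 6870.0, "CI")]

# Hypothetical TBM records: (chainage_m, cutterhead_torque, cutterhead_thrust).
tbm = [(6851.2, 410.0, 5200.0), (6857.9, 455.0, 6100.0),
       (6863.5, 450.0, 6000.0), (6871.0, 300.0, 4000.0)]

starts = [start for start, _, _ in rock_mass]


def rock_class_at(chainage):
    """Return the rock class of the interval containing the chainage, or None."""
    idx = bisect_right(starts, chainage) - 1
    if idx < 0:
        return None
    start, end, rock_class = rock_mass[idx]
    return rock_class if start <= chainage < end else None


# Attach the rock class to every TBM record, then drop unmatched records.
labelled = [(ch, torque, thrust, rock_class_at(ch)) for ch, torque, thrust in tbm]
merged = [record for record in labelled if record[3] is not None]
```

Each rock mass record is thereby repeated for every TBM record in its interval, which is how a few hundred mapped faces can label tens of thousands of machine readings.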
The variables in the dataset span a wide range of scales, from tens to thousands. Consequently, the data was normalized so that the input variables are on the same scale. According to Jayalakshmi and Santhakumaran [26], normalization helps minimize bias caused by different scales of the input variables. Computational speed is also improved by data normalization since the features are put on the same scale. The dataset in this study was therefore normalized using min-max normalization, which preserves the relationship between the input and output variables. The input variables were scaled to a range between a minimum of zero and a maximum of one. The preProcess() function in R was used to normalize the data in this study. The normalization is achieved using Equation (1) [26].
$$x' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$
where $x'$ is the rescaled value of feature $x$, $x_{\max}$ is the maximum value of feature $x$, $x_{\min}$ is the minimum value of feature $x$, and $x_i$ is the $i$th value of feature $x$.
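A minimal Python sketch of the min-max rescaling is given here for illustration. The study performed this step with preProcess() in R; the function name and the torque values below are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1] using its minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map it to zero to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


torque = [120.0, 480.0, 300.0, 840.0]  # illustrative readings
scaled = min_max_scale(torque)
```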

Random Forest (RF)
According to Zhang and Ma [23], random forest (RF) has been applied successfully in a wide variety of projects and has seen tremendous acceptance in many disciplines, thus, its inclusion in this study. RF also has the capability of ranking the importance of all the input variables contributing to the prediction of the target variable.
Random forest is an ensemble method that harnesses the predictive abilities of multiple decision trees. Each decision tree is trained on a bootstrapped sample, and the predictions of all the trained trees are aggregated to form the final model. In constructing the trees, a number of predictors, mtry, is randomly chosen for consideration at each node during the recursive binary splitting instead of using all the predictors [27]. This gives the technique its name, random forest. At each node, a different set of predictors is used for node splitting; therefore, the variance of the resulting model is significantly reduced compared to an individual regression tree. In training the decision trees, each split is done to obtain two regions $R_1$ and $R_2$ as in Equation (2).
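$$R_1(j, s) = \{X \mid X_j < s\} \quad \text{and} \quad R_2(j, s) = \{X \mid X_j \geq s\} \qquad (2)$$
where $j$ is the index in the predictor space with an upper limit of mtry and $s$ is the cut point for the split.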
The objective is to obtain $j$ and $s$ values that minimize the function in Equation (3).
$$\sum_{i:\, x_i \in R_1(j,s)} \left(y_i - \hat{y}_{R_1}\right)^2 + \sum_{i:\, x_i \in R_2(j,s)} \left(y_i - \hat{y}_{R_2}\right)^2 \qquad (3)$$
where $\hat{y}_{R_1}$ is the mean response for the training observations in $R_1(j, s)$, and $\hat{y}_{R_2}$ is the mean response for the training observations in $R_2(j, s)$. This process is repeated until further splitting no longer decreases the residual sum of squares, at which point the terminal node is reached. The number of predictors to be considered in the splitting at the nodes, mtry, is a hyperparameter that was calibrated using 5-fold cross-validation (CV) to achieve the best prediction output in training the random forest model [27]. The optimal mtry was then used to fit the final model.
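The split search described above can be sketched as follows: mtry predictors are drawn at random, every midpoint between adjacent observed values is tried as a cut point, and the pair (j, s) with the smallest residual sum of squares is kept. The function names and the toy data are assumptions for illustration; production random forest libraries implement this far more efficiently.

```python
import random


def rss(values):
    """Residual sum of squares of the values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, mtry, rng):
    """Return (total_rss, j, s) for the best split among mtry randomly chosen predictors."""
    candidates = rng.sample(range(len(X[0])), mtry)
    best = None  # stays None if no predictor has two distinct values
    for j in candidates:
        levels = sorted(set(row[j] for row in X))
        for low, high in zip(levels, levels[1:]):
            s = (low + high) / 2.0  # midpoint between adjacent observed values
            left = [yi for row, yi in zip(X, y) if row[j] < s]    # region R1
            right = [yi for row, yi in zip(X, y) if row[j] >= s]  # region R2
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best


# Toy data: the first predictor separates the responses, the second does not.
X = [[1.0, 7.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]]
y = [0.0, 0.0, 1.0, 1.0]
total, j, s = best_split(X, y, mtry=2, rng=random.Random(0))
```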

Extremely Randomized Tree (ERT)
ERT is also an ensemble method similar to RF. The difference between RF and ERT lies in how tree nodes are split. While the splitting is deterministic in RF, it is randomized in ERT. The randomized splitting in ERT tends to further reduce the prediction variance when the dataset has a low level of noise. This implies that when the dataset is less noisy, ERT tends to perform significantly better than RF. However, when the data is noisy, ERT does not necessarily perform better than RF. Due to the randomized nature of node splitting, ERT is more computationally expensive than RF. Therefore, if the performance of ERT is not significantly better than that of RF, it is advisable to adopt RF. A detailed description of ERT can be found in [28]. ERT is considered in this study because of its resemblance to the RF model, which has proven to be effective in predicting mechanical excavator performance [29], and its tendency to have improved performance over RF.
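The randomized node splitting can be illustrated with a sketch in which cut points are drawn uniformly between the minimum and maximum of each candidate predictor rather than searched exhaustively. The function and parameter names (including num_random_cuts) and the toy data are assumptions for this illustration, and the residual sum of squares is used as the split criterion only for simplicity.

```python
import random


def rss(values):
    """Residual sum of squares of the values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def random_cut_split(X, y, mtry, num_random_cuts, rng):
    """Return (total_rss, j, s), keeping the best of a few randomly drawn cut points."""
    candidates = rng.sample(range(len(X[0])), mtry)
    best = None
    for j in candidates:
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # a constant predictor cannot be split
        for _ in range(num_random_cuts):
            s = rng.uniform(lo, hi)  # random cut point instead of an exhaustive search
            left = [yi for row, yi in zip(X, y) if row[j] < s]
            right = [yi for row, yi in zip(X, y) if row[j] >= s]
            if not left or not right:
                continue
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best


X = [[1.0, 7.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]]
y = [0.0, 0.0, 1.0, 1.0]
result = random_cut_split(X, y, mtry=2, num_random_cuts=3, rng=random.Random(1))
```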

Machine Learning Process
Since the response variable, rock mass class, has four levels, multi-class classification was conducted using random forest and extremely randomized trees. The models were trained on 70% of the dataset, and the remaining 30% was used to evaluate their classification performance. These fractions were chosen because the dataset is large and oversampling of the minority classes further increased the size of the training set; hence, 70% of the data was used for model training instead of the more commonly used 80%.
During the model training, 5-fold CV was used to tune the hyperparameters of the models. In RF, the hyperparameter mtry is the number of predictors that are considered in deciding the best split at each decision node [27]. The optimal mtry for this dataset was 5. The hyperparameters for ERT are mtry and numRandomCuts, where numRandomCuts is the number of randomly selected splits for each mtry. The optimal mtry and numRandomCuts in this study were both 6. After obtaining the optimal hyperparameters from the cross-validation run, the final models were fitted using these hyperparameters.
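The 70/30 validation-set split and the 5-fold partition of the training data can be sketched with index lists. Whether the authors stratified the split by rock class is not stated, so this sketch uses a plain shuffled split; the function names and the sample size are illustrative.

```python
import random


def train_test_split_indices(n, train_fraction, rng):
    """Shuffle the indices 0..n-1 and cut them into a training part and a test part."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = int(round(n * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold(indices, k):
    """Yield (training, validation) index lists, each fold serving once as validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for m, fold in enumerate(folds) if m != i for idx in fold]
        yield training, validation


rng = random.Random(42)
train_idx, test_idx = train_test_split_indices(100, 0.7, rng)
splits = list(k_fold(train_idx, 5))
```

For tuning, each candidate hyperparameter value would be fitted on the training part of every fold and scored on the validation part, and the value with the best mean score would be used to fit the final model on the whole training set.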

Accuracy and Balanced Accuracy
Accuracy is the measure of correct classifications. It is the ratio of the number of observations that are correctly classified to the total number of observations. This metric is only meaningful when evaluating balanced datasets. It loses its relevance when evaluating an unbalanced dataset [30]. In studies involving unbalanced datasets, balanced accuracy is a more meaningful performance metric. It is calculated as the average of the proportion of correct classifications in each class individually, that is, the arithmetic mean of the precision and recall (Equation (4)).
$$b = \frac{p + re}{2} \qquad (4)$$
where $b$ is the balanced accuracy, $p$ is the precision, and $re$ is the recall.

F1 Score
Precision measures the proportion of positive classifications that are correct in binary classification. It is the ratio of the number of correct positive classifications to the total number of observations classified as positive [30]. Recall is a measure of the proportion of actual positives that are identified correctly. This is also known as the sensitivity of the model [30]. There is usually a trade-off between precision and recall depending on the purpose of the classification and the risk associated with a false-positive classification. The F1 score is the harmonic mean of precision and recall (Equation (5)).
$$F_1 = \frac{2 \cdot p \cdot re}{p + re} \qquad (5)$$
where $p$ is the precision and $re$ is the recall.
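As an illustration, the following Python sketch computes precision, recall, and the F1 score for one class treated as positive against all other classes. The label lists are invented for the example and the function name is hypothetical.

```python
def one_vs_rest_scores(actual, predicted, positive):
    """Return (precision, recall, f1) with one class as positive and the rest as negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented labels for eight observations.
actual = ["A", "A", "B", "B", "B", "CI", "CI", "CII"]
predicted = ["A", "B", "B", "B", "A", "CI", "B", "CII"]
precision_b, recall_b, f1_b = one_vs_rest_scores(actual, predicted, "B")
```

Here class B has two true positives, two false positives, and one false negative, giving a precision of 1/2, a recall of 2/3, and an F1 score of 4/7.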

Cohen's Kappa Coefficient (k)
Kappa is a statistical measure of the agreement between different raters [31]. In this case, it is the measure of the agreement between the predicted and observed rock mass classes. Unlike accuracy, kappa takes into account classifications made by chance. The following descriptions are given to various ranges of kappa: 0 = agreement equivalent to chance; 0.01-0.20 = slight agreement; 0.21-0.40 = fair agreement; 0.41-0.60 = moderate agreement; 0.61-0.80 = substantial agreement; 0.81-0.99 = near perfect agreement; 1 = perfect agreement [31]. Kappa is given by Equation (6).
$$\kappa = \frac{p_0 - p_e}{1 - p_e} \qquad (6)$$
where $p_0$ is the relative observed agreement among raters, and $p_e$ is the hypothetical probability of chance agreement.
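A short Python sketch of the kappa calculation for predicted versus observed classes follows. The chance agreement is estimated from the class frequencies of the two label lists, which is the usual construction; the data and the function name are illustrative.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (p_0 - p_e) / (1 - p_e) for two equally long label lists."""
    n = len(actual)
    # Observed agreement: share of observations on which the two lists agree.
    p_0 = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal class proportions, summed over classes.
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    # Undefined (division by zero) when p_e equals 1, i.e., a single class throughout.
    return (p_0 - p_e) / (1 - p_e)


# Invented labels for eight observations.
actual = ["A", "A", "B", "B", "B", "CI", "CI", "CII"]
predicted = ["A", "B", "B", "B", "A", "CI", "B", "CII"]
kappa = cohen_kappa(actual, predicted)
```

In this example, the observed agreement is 5/8 and the chance agreement is 19/64, so kappa is 7/15, or about 0.47, which falls in the moderate agreement range.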

Classification Performance of the ML Models
The overall performance of the ML models was measured by the accuracy and Cohen's kappa. These were calculated by considering all the correct predictions and all the wrong predictions. The performance of the models in predicting each rock class was measured by the F1-score and balanced accuracy. Since the study involved multi-class classification, the performance metrics were computed by considering one class, e.g., class A, as positive, while the other three classes, e.g., classes B, CI, and CII, were considered negative. This was repeated until each rock mass class had been considered positive to obtain the metrics presented in Table 6. Both RF and ERT predicted rock class CII accurately, with an F1-score of at least 96% and a balanced accuracy of 99%. The worst model performance was recorded when predicting class A, with an F1-score of at least 92% and a balanced accuracy of 95%. The variation in performance across the different rock mass classes could be related to the TBM operation and excavation process. Rock mass class A ranges from slightly weathered granite with few or no fractures to fresh massive granite, which causes excessive cutter wear resulting in frequent replacement of consumable components, e.g., cutters. This wear and tear, and the subsequent replacement of cutters, can cause fluctuations in the TBM operating parameters and could have resulted in the lower prediction performance for class A, as can be seen in Table 6. As the rock mass becomes highly weathered and intensively fractured, like rock mass classified as CII, less cutter wear will be observed, resulting in a fairly consistent set of operating parameters, all other factors held constant. This can also be seen in Figure 6, where the boring pressure showed less variability in class CII as compared to the rest of the rock mass categories.
In general, a more consistent set of operating parameters should lead to models with high prediction performance. In terms of classifying the overall rock mass into the various rock mass classes, both models performed very well, with an overall accuracy of at least 0.94 and Cohen's kappa of at least 0.90, as shown in Table 7. The classifications by the two models are presented visually as confusion matrix heatmaps in Figures 10 and 11. The counter-diagonal boxes (top right to the bottom left corner) represent correct predictions of the rock mass class, while the rest of the boxes represent misclassifications. The intensity of the fill color of the boxes represents the proportion of the data points that have been categorized into that class by the model (misclassification and correct classification).
It is interesting to note that in both models, class A was only misclassified as class B and never as CI or CII (Figures 10 and 11). Class B was misclassified as A and CI on a few occasions and only misclassified as CII once. The misclassifications of CI were mostly as B, with only one being labeled as class A by RF and three labeled as CII. Both models only misclassified CII as CI once. This shows that the models do not predict rock classes that are far off from the actual class, especially in the case of A and CII. This means that, even in a worst-case scenario of misclassification, there is still confidence that the prediction is within the immediate neighborhood of the actual rock mass class. Since the predicted rock mass classes will be used to determine the required support type, it would be detrimental to classify CII as A and assume that it needs no support. On the other hand, classifying A as CII would result in an unnecessary escalation of the project cost in terms of the ground support needed for class CII.

Comparison of the ML Models
The overall classification performance of the two models was compared using bootstrap sampling. A total of 1000 bootstrap samples were taken from the test dataset with replacement, and the performance of both models was tested on each sample set. This gave normal distributions of accuracy and kappa (Figure 12). The mean performance of ERT is higher than that of RF; however, the 95% confidence intervals for the two models overlap in terms of both accuracy and kappa, as shown in Table 8. This indicates that, statistically, ERT does not significantly outperform RF.
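The bootstrap comparison can be sketched as follows. The test labels and the two sets of predictions are synthetic stand-ins, the interval is a simple percentile interval (the text does not state how the authors computed theirs), and all names are hypothetical.

```python
import random


def accuracy(actual, predicted):
    """Share of observations whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def bootstrap_scores(actual, predictions, n_boot, rng):
    """Resample the test set with replacement and score every model on the same resample."""
    n = len(actual)
    scores = {name: [] for name in predictions}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        for name, predicted in predictions.items():
            scores[name].append(sum(actual[i] == predicted[i] for i in idx) / n)
    return scores


def percentile_interval(scores, level=0.95):
    """Return the lower and upper percentile bounds of the bootstrap scores."""
    ordered = sorted(scores)
    lower = ordered[int((1 - level) / 2 * len(ordered))]
    upper = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lower, upper


# Synthetic test labels and predictions (not the study's data).
actual = ["A", "B", "CI", "CII"] * 50
pred_rf = ["B" if i % 16 == 0 else label for i, label in enumerate(actual)]
pred_ert = ["B" if i % 20 == 0 else label for i, label in enumerate(actual)]

scores = bootstrap_scores(actual, {"RF": pred_rf, "ERT": pred_ert}, 1000, random.Random(0))
rf_low, rf_high = percentile_interval(scores["RF"])
ert_low, ert_high = percentile_interval(scores["ERT"])
intervals_overlap = rf_low <= ert_high and ert_low <= rf_high
```

Scoring both models on the same resampled indices keeps the comparison paired, so differences between the two score distributions reflect the models rather than the particular resample.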

Conclusions
In this study, two machine learning (ML) classification algorithms, random forest and extremely randomized trees, were employed to characterize and classify ground conditions along the Pahang-Selangor Raw Water Transfer (PSRWT) tunnel alignment in Malaysia based on TBM operating parameters and rock mass data obtained using the JH rock mass classification system. The TBM operating parameters included in this approach are rate of penetration, cutterhead torque, cutterhead thrust, cutterhead revolutions per minute, hydraulic cylinder stroke speed, boring pressure, and motor amps. Due to imbalance in the rock mass data, an oversampling technique was used to obtain a balanced training dataset for unbiased learning of the ML models. Multi-class classification was done, categorizing the rock mass condition into A, B, CI, and CII classes per the JH system. The JH classification system categorizes rock mass into six classes, but the tunnel section from which the dataset was obtained consisted primarily of hard rocks. Consequently, only rock classes consistent with hard rock were encountered and analyzed in this paper. An extension of this study is needed with a dataset that includes all the soft rock mass classes to make the developed models comprehensive across all ground conditions that the TBM may encounter along the tunnel alignment.
The main conclusions of this study can be summarized as follows:
1. The proposed approach was applied to a dataset from the Pahang-Selangor Raw Water Tunnel (PSRWT) project in Malaysia. A comparison between the ML model classification results and the measured rock mass classes shows that the proposed approach is effective. The identification and classification accuracies were 95% and 94% for ERT and RF, respectively, with kappa values of at least 0.90.
2. A bootstrap comparison of the performance of the two ML models, RF and ERT, indicated that neither model significantly outperformed the other. Due to the randomized nature of node splitting, ERT is more computationally expensive than RF. Therefore, if the performance of ERT is not significantly better than that of RF, it is advisable to adopt RF.
3. The most influential TBM operating parameter in classifying the rock mass is the cutterhead RPM, followed by cutterhead thrust. The two least influential parameters are stroke speed and rate of penetration. Therefore, TBM thrust and RPM can be adjusted in real time by determining the rock mass class being excavated using the ML models developed in this paper.
4. From a practical standpoint, the overall results obtained in this study show that the data-oriented approach is a useful tool for on-the-fly identification, characterization, and classification of ground conditions along the tunnel alignment. It can be a tool for on-site decision making, such as selecting support systems or refining preliminary support systems based on the ground condition encountered.
5. Extension of this research should also focus on exploring other ML techniques, including deep learning methods, as well as developing a framework for operationalizing this approach in TBMs.