Prediction Method of Surrounding Rock Hydropower Classification (HC) Based on TBM Tunneling Data and Spearman-Weighted Supervised Prototype Classifier

Zhang, Lingli; Hou, Jian; Wang, Ruirui; Liu, Nana

doi:10.3390/app16115636


Lingli Zhang ¹,², Jian Hou ¹, Ruirui Wang ¹,²,* and Nana Liu ¹

¹ School of Civil Engineering, Shandong Jianzhu University, Jinan 250101, China

² Key Laboratory of Building Structural Retrofitting & Underground Space Engineering, Ministry of Education, Shandong Jianzhu University, Jinan 250101, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5636; https://doi.org/10.3390/app16115636

Submission received: 20 April 2026 / Revised: 1 June 2026 / Accepted: 2 June 2026 / Published: 4 June 2026


Abstract

Hydropower Classification (HC) is a widely used rock mass classification method in tunneling projects. Predicting the HC of the surrounding rock is important for selecting tunneling parameters and ensuring tunneling safety. To predict the HC of a tunnel excavated by a TBM, a Spearman-Weighted Supervised Prototype Classifier (SW-SPC) is proposed. Because the target task is supervised, the Spearman's correlations between TBM tunneling features and HC are used as distance weights to guide the prototype optimization process, while an unweighted distance-based prototype optimization classifier serves as a baseline for performance comparison. To verify the proposed method, a total of 275 field samples with matched tunneling data and HCs were collected from a tunnel project in Northeast China. Of these, 200 samples formed the training set and the remaining 75 formed the testing set. On this single static test split, the SW-SPC model achieved an accuracy of 82.7%, with average precision and recall of 84.6% and 81.7%, respectively. To evaluate the generalization of SW-SPC more rigorously, 5-fold cross-validation was implemented. It yielded a consistent global accuracy of 82.0%, with precision and recall ranging from 59.3% to 61.0% and from 58.3% to 66.5%, respectively. These metrics demonstrate the model's robust global performance while highlighting its localized sensitivity to the class distribution variance inherent in field-measured tunneling data.

Keywords:

tunnel boring machine; rock mass parameter; Hydropower Classification; K-means clustering

1. Introduction

The tunnel boring machine (TBM), with its advantages of high efficiency and safety, has recently been widely used in long and deep tunnels. Accurate and reliable investigation of the geological conditions of the surrounding rock is essential for selecting tunneling parameters that ensure tunneling safety and efficiency, and it has become a critical research focus in tunnel construction. However, because of the intricate mechanical structure of the TBM and the confined working space, conventional in situ testing techniques for rock mass parameters, which are readily applicable to surface excavations, face significant challenges in underground environments. Consequently, numerous scholars have sought suitable methods for determining rock mass parameters. Naeimipour et al. devised a Rock Strength Boring Probe (RSBP) that acquires geological data, including uniaxial compressive strength and tensile strength, by analyzing the depth of the scratches the probe makes on the rock surface [1]. Wang et al. developed a True Triaxial Rock Drilling Test System (TRD) capable of characterizing rock mass parameters under various lithological conditions [2]. In addition, Goh et al. employed the Spectral Analysis of Surface Waves (SASW) method to analyze shear wave velocity for assessing the Rock Quality Designation (RQD) [3]. Furthermore, Kong et al. [4] and Liu et al. [5] used point load test results to evaluate rock compressive strength.

The aforementioned studies provide valuable frameworks for acquiring geological data and assessing rock mass conditions in TBM-driven tunnels. Nevertheless, they share a common limitation: their testing and analysis procedures are time-consuming, which delays the acquisition of geological data and makes it impractical to obtain data in real time at the pace of TBM excavation. To overcome this latency, a viable alternative is to establish correlations between real-time TBM tunneling data and rock mass parameters through statistical methods, including traditional regression analysis and data mining techniques. Such studies typically use datasets obtained from field measurements or laboratory tests: known data are fed into the model as inputs, and the prediction or evaluation targets serve as outputs. Regression or data mining algorithms establish the functional relationship between inputs and outputs, which then serves as the basis for evaluation. When newly acquired data are fed into the model, the resulting outputs are the predicted values of the target parameters.

Mikaeil et al. [6], Hassanpur et al. [7], Samaei et al. [8], Nelson et al. [9], Grima et al. [10], and Entacher et al. [11] proposed models that characterize the relationship between rock mass properties and TBM tunneling parameters. Meanwhile, advances in computing have enabled machine learning approaches, which are well suited to regression problems involving large datasets and complex nonlinear patterns. Specifically, Armaghani et al. [12], Zare et al. [13,14], Mahdevari et al. [15], Liu et al. [16], Yagiz et al. [17], and Minh et al. [18] applied machine learning algorithms such as artificial neural networks, particle swarm optimization, fuzzy logic, and gene expression programming to construct rock–machine interaction models, with favorable predictive results. In rock mass parameter prediction, machine learning algorithms are increasingly replacing traditional regression.

To facilitate application in actual tunnel projects, researchers have proposed several rock mass classification methods, such as Q_TBM, RMR, RME, and GSI, which have been verified in actual tunnel projects and have proven effective in practice [19,20,21]. Among them, Hydropower Classification (HC) is widely used, particularly in the construction of hydraulic tunnels [20]. The main factors considered by HC are the uniaxial compressive strength and the integrity index of the surrounding rock. In addition, the condition of the structural planes (discontinuities), the attitude of the major discontinuity plane, and groundwater conditions are used to modify the classification results.

In essence, rock mass classification methods, including HC, are effective indexes for characterizing rock mass parameters, and predicting them is a classification problem in machine learning, which differs from the regression-type prediction of rock mass parameters. Supervised classifiers, such as the support vector machine (SVM) and artificial neural networks (ANNs), are widely used for this kind of task. In contrast, unsupervised algorithms, such as K-means clustering and fuzzy C-means clustering, only partition samples into unlabeled groups and therefore cannot be used directly for this task [22,23].

To bridge the gap between traditional unsupervised algorithms and supervised geotechnical tasks, this paper introduces the Spearman-Weighted Supervised Prototype Classifier (SW-SPC), an application-specific supervised prototype optimization approach, for evaluating the surrounding rock HC in TBM tunnels. Real-time TBM tunneling parameters are used as input variables, and the HC is the output. In the training set, the Spearman's correlation between each input and the output is calculated and used as the weight of that input when measuring distances between samples. Based on this weighted distance, samples are grouped into multiple prototype-defined clusters. A multiple regression of HC on the tunneling data is then used to estimate the expected HC value of each cluster prototype and to assign labels accordingly.

Compared with conventional prototype clustering, SW-SPC introduces two main improvements for HC evaluation. First, because the task is supervised, the Spearman's correlation between each input and the output can be computed from the sample labels; using it to weight distances improved evaluation accuracy in validation on field samples. Second, assigning labels to the clusters by multiple regression integrates the discriminative power of supervised learning into the prototype-based classification framework.

2. Data Collection

2.1. Original Data

This research is based on a hydraulic tunnel project in Northeast China. The tunnel is 23 km long, and the main landforms are valleys and hills. The dominant lithologies in the project area are granite and limestone, which account for about 38% and 30% of the tunnel length, respectively. Fractures in the surrounding rock are relatively well developed, and the surrounding rock consists primarily of class III and IV rock masses. In the limestone sections in particular, groundwater is abundant, and seepage frequently occurs at the tunnel face.

The tunnel is excavated by an open-type TBM whose cutterhead has a radius of 3.95 m. The TBM is equipped with 56 cutters, each 19 inches in diameter, and the cutter spacing is about 84 mm. During tunneling, the TBM data acquisition system records nearly 200 parameters, including mechanical, electrical, and hydraulic data, at a sampling frequency of 1 Hz. In addition, the HC is recorded manually for each tunneling cycle. The tunneling data and HC records constitute the original dataset for this research.

2.2. Division of the Dataset

Owing to its complexity, the original data cannot be used directly to establish the HC prediction model. First, the nearly 200 features of the tunneling data must be screened. In this research, redundant tunneling features that are not closely related to the results are removed [16,24], and only five features are used to implement the SW-SPC model: the cutterhead rotation speed (revolutions per minute), the torque, the cutterhead power, the penetration, and the thrust. These five parameters are directly related to the tunneling load or the energy used for rock breaking, and they are commonly regarded as the main controlling parameters in TBM tunneling research [25,26]. Previous research that investigated the relationship between rock mass and tunneling parameters with statistical methods such as principal component analysis (PCA) showed that variables directly related to the tunneling process consistently correlate more strongly with the rock mass than other tunneling parameters, such as electrical and hydraulic parameters [27].

In addition, the project comprises thousands of tunneling cycles, corresponding to thousands of samples, which is computationally prohibitive for the prototype optimization process. In this research, adjacent tunneling cycles with the same HC are considered homogeneous and are merged into a single representative sample. In other words, each sample is a continuous tunneling section with a single HC that spans multiple tunneling cycles.
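The merging rule amounts to a run-length grouping over the cycle sequence. The sketch below is a minimal illustration, assuming the HC of each cycle is available as a list in excavation order; the function name merge_homogeneous_cycles and the sample sequence are hypothetical. Note that an HC that reappears later (for example, III after IV) starts a new sample rather than joining the earlier one.

```python
from itertools import groupby


def merge_homogeneous_cycles(cycle_hcs):
    """Merge adjacent tunneling cycles that share the same HC.

    cycle_hcs: HC labels, one per tunneling cycle, in excavation order.
    Returns a list of (hc, [cycle indices]) tuples, one per merged sample.
    """
    merged = []
    # groupby only groups *adjacent* equal keys, which matches the rule that
    # only neighbouring cycles with the same HC form one homogeneous segment.
    for hc, group in groupby(enumerate(cycle_hcs), key=lambda pair: pair[1]):
        merged.append((hc, [idx for idx, _ in group]))
    return merged


# Hypothetical per-cycle HC sequence (illustrative, not project data)
cycles = ["III", "III", "IV", "IV", "IV", "III", "V"]
for hc, members in merge_homogeneous_cycles(cycles):
    print(hc, members)
# III [0, 1]
# IV [2, 3, 4]
# III [5]
# V [6]
```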

By merging homogeneous cycles, the thousands of tunneling cycles are reduced to 275 samples, which are divided into two subsets. One subset of 200 samples forms the training set used to establish the SW-SPC-based HC prediction model, and the remaining 75 samples form the testing set used to validate the trained model. The class distributions of the two sets differ. The proportions of the different surrounding rock HCs in the training and testing sets are shown in Figure 1.

3. Data Pretreatment

The mismatch in resolution between rock mass characteristics and TBM tunneling data poses a significant challenge for predicting rock mass parameters or rock classes. Specifically, a tunneling section with a single rock class value may correspond to tens of thousands of tunneling data points. Simply averaging all of these points is unreasonable, because the data recorded by the TBM acquisition system include invalid records, such as those collected during stoppages and trial tunneling, which lead to underestimated tunneling parameters. The key task of data pretreatment is therefore to identify the invalid tunneling data generated during TBM stoppages or trial tunneling. Currently, there is no universal standard for doing so, and tunneling data pretreatment is often based on subjective experience.

Take the selected tunneling features recorded from 0:00:00 to 18:00:00 on 20 October 2015, plotted in Figure 2a, as an example. During most of this period, for example from 0:00:00 to 5:09:58, all the selected features are zero. These data clearly come from TBM stoppage and should be removed during pretreatment. There are also short segments, such as from 5:09:59 to 5:10:11, in which only the cutterhead power and torque are positive while the other three features remain zero. In such segments, the cutterhead power and torque stay clearly lower than during a normal tunneling cycle. The TBM cutterhead is idling and not in contact with the tunnel face, which explains why the thrust and penetration remain at zero. This kind of segment is called trial tunneling, and its data should also be removed during pretreatment.
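The two exclusion rules can be applied record by record, as in the minimal sketch below. It assumes each 1 Hz record is a dict with the hypothetical keys rpm, torque, power, penetration, and thrust, and it encodes the rules literally: all five features at zero means stoppage, and only power and torque positive means trial tunneling. The numeric values are made up for illustration.

```python
# Hypothetical field names for the five selected features (1 Hz records)
FEATURES = ("rpm", "torque", "power", "penetration", "thrust")


def record_state(rec):
    """Classify one 1 Hz record as 'stoppage', 'trial', or 'active'."""
    if all(rec[f] == 0 for f in FEATURES):
        return "stoppage"  # every selected feature is zero
    others_zero = all(rec[f] == 0 for f in ("rpm", "penetration", "thrust"))
    if rec["power"] > 0 and rec["torque"] > 0 and others_zero:
        return "trial"  # only power and torque are positive
    return "active"


def drop_stoppage_and_trial(records):
    """Keep only records that are neither stoppage nor trial tunneling."""
    return [r for r in records if record_state(r) == "active"]


stopped = dict(rpm=0, torque=0, power=0, penetration=0, thrust=0)
idling = dict(rpm=0, torque=150.0, power=80.0, penetration=0, thrust=0)
boring = dict(rpm=6.0, torque=1500.0, power=900.0, penetration=8.0, thrust=9000.0)
print([record_state(r) for r in (stopped, idling, boring)])
# ['stoppage', 'trial', 'active']
```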

The cases above are clear-cut exclusions. However, part of the data within a complete tunneling cycle should also be removed. Take the tunneling cycle from 13:30:00 to 13:48:40, shown in Figure 2b, as an example. The data in this cycle clearly fall into three stages. In the increasing stage, from 13:30:00 to 13:34:16, the tunneling data rise sharply from 0 to their maximum. In the stable stage, from 13:34:17 to 13:46:36, the data remain relatively stable and fluctuate near their maximum. In the decreasing stage, from 13:46:37 to 13:48:40, the data fall from the maximum to 0. Previous research indicates that the trends of tunneling data in the increasing and decreasing stages are influenced not only by the rock mass conditions but also by the operating habits of the TBM operators. Therefore, only the tunneling data in the stable stage should be retained during pretreatment [28].

According to the above analysis, the target of data pretreatment is to exclude tunneling data from stoppage, trial tunneling, and the increasing/decreasing stages, while utilizing stable-stage data to construct the final tunneling dataset.

Based on these characteristics, the tunneling data were pretreated and the tunneling dataset was built in three steps. First, the tunneling data from stoppages and trial tunneling were removed. Second, the data from the increasing and decreasing stages were removed, leaving only the stable-stage data. Finally, for each sample, the average of the retained tunneling data within the corresponding mileage was used as the tunneling feature.

These three steps form a general procedure for handling TBM tunneling data in rock mass parameter prediction. The first and last steps, namely removing stoppage and trial-tunneling data and averaging the tunneling data, are relatively simple. In contrast, there is no widely recognized criterion for screening stable-stage data, because data trends differ among TBM specifications. In this research, the penetration rate, defined as the product of penetration and rotation speed, is adopted as the screening criterion. Within each tunneling cycle, the stable stage is judged to begin when the penetration rate stays above 10 mm/min for 10 consecutive seconds, and to end when it stays below 10 mm/min for 10 consecutive seconds. Taking the tunneling data in Figure 2a as an example, the pretreated results are shown in Figure 3. The pretreated data were used for training and testing the SW-SPC.
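The stable-stage rule and the final averaging step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: penetration is taken in mm/rev and rotation speed in rev/min so that their product is in mm/min; "higher/lower than 10 mm/min" is read as strict inequalities; and a cycle without a sustained drop is assumed to stay stable until its last record. The function names are hypothetical.

```python
def penetration_rate(rpm, penetration):
    """Penetration rate = penetration x rotation speed.

    Assumes penetration in mm/rev and rotation speed in rev/min, giving mm/min.
    """
    return penetration * rpm


def find_stable_stage(rates, threshold=10.0, hold=10):
    """Return (start, end) of the stable stage in one cycle (end exclusive), or None.

    rates: penetration rate of each 1 Hz record, so `hold` records = `hold` seconds.
    """
    start, run = None, 0
    for i, r in enumerate(rates):
        run = run + 1 if r > threshold else 0
        if run == hold:  # above threshold for `hold` consecutive seconds
            start = i - hold + 1
            break
    if start is None:
        return None
    run = 0
    for i in range(start + hold, len(rates)):
        run = run + 1 if rates[i] < threshold else 0
        if run == hold:  # below threshold for `hold` consecutive seconds
            return start, i - hold + 1
    return start, len(rates)  # cycle ended before a sustained drop


def sample_mean(cycles):
    """Average one feature over the stable stages of all cycles in a merged sample.

    cycles: list of (rates, values) pairs, one pair per tunneling cycle.
    """
    pooled = []
    for rates, values in cycles:
        span = find_stable_stage(rates)
        if span is not None:
            pooled.extend(values[span[0]:span[1]])
    return sum(pooled) / len(pooled) if pooled else None


# Hypothetical cycle: 5 s at rest, 30 s of boring, then 15 s at rest
rates = [0.0] * 5 + [20.0] * 30 + [0.0] * 15
print(find_stable_stage(rates))  # (5, 35)
```

Because the end of the stage requires a sustained drop, brief dips shorter than 10 s inside a cycle do not split the stable stage.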

4. Formulation of the Spearman-Weighted Supervised Prototype Classifier (SW-SPC)

4.1. Mechanics of the Prototype Optimization Classifier

Based on the pretreated data, a baseline prototype optimization classifier using unweighted distance is first developed as part of the supervised framework. It follows the iterative refinement process of K-means and categorizes tunneling sections according to their tunneling parameters, with the advantages of fast computation and good interpretability. Because a relatively stable stratum without sharp changes can be represented by a small number of samples, rock mass datasets usually contain only tens to hundreds of samples, and the prototype optimization classifier performs sufficiently well for such data volumes.

Before the unweighted prototype optimization classifier is executed, the number of target categories n must be determined from experience or by trial and error. Then, n samples are randomly selected as the initial prototypes and recorded as x₁, x₂, …, x_n, each representing one category (c₁, c₂, …, c_n). The Euclidean distance between each sample and each category prototype is calculated by Equation (1).

d = \sqrt{\sum_{j = 1}^{m} d_{j}^{2}},

(1)

where m is the number of features and d_j is the difference in the jth feature between the sample and the prototype. Each sample is then assigned to its nearest prototype, and the prototypes are updated as follows [29,30]:

C_{i} = \left( \frac{1}{n_{i}} \sum_{k = 1}^{n_{i}} x_{i 1 k}, \frac{1}{n_{i}} \sum_{k = 1}^{n_{i}} x_{i 2 k}, \dots, \frac{1}{n_{i}} \sum_{k = 1}^{n_{i}} x_{i m k} \right)^{T},

(2)

where C_i and n_i are the prototype and the number of samples of the ith category, respectively, and x_ijk is the value of the jth feature of the kth sample in that category. The recalculated C_i becomes the new prototype, and the assignment of samples and the recalculation of prototypes are repeated until the results are stable.
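The iteration in Equations (1) and (2) can be sketched as below. This is a minimal K-means-style illustration assuming normalized feature vectors stored as Python lists; the function names and the optional init argument are hypothetical additions that make the example deterministic. As with K-means in general, the result can depend on the initial prototypes, so a random initialization may converge to a poor local optimum.

```python
import random


def euclidean(a, b):
    """Equation (1): square root of the summed squared feature differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def optimize_prototypes(samples, n, init=None, max_iter=100, seed=0):
    """K-means-style prototype optimization with unweighted distance.

    samples: feature vectors (lists of floats), assumed already normalized.
    n: number of target categories.
    init: optional list of n starting prototypes; random samples otherwise.
    Returns (prototypes, assignments), where assignments[k] is the category of sample k.
    """
    if init is None:
        init = random.Random(seed).sample(samples, n)
    prototypes = [list(p) for p in init]
    assignments = None
    for _ in range(max_iter):
        # Assign every sample to its nearest prototype
        new = [min(range(n), key=lambda i: euclidean(s, prototypes[i])) for s in samples]
        if new == assignments:
            break  # classification unchanged, so the prototypes are stable
        assignments = new
        # Equation (2): each prototype becomes the feature-wise mean of its members
        for i in range(n):
            members = [s for s, a in zip(samples, assignments) if a == i]
            if members:  # keep the previous prototype if a category is empty
                prototypes[i] = [sum(col) / len(members) for col in zip(*members)]
    return prototypes, assignments


data = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9], [0.9, 1.0]]
protos, labels = optimize_prototypes(data, 2, init=[data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```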

According to the principle of prototype optimization and the structure of the dataset, the prediction code for HC is developed. The code consists of six parts, including data import, prototype initialization, distance calculation, sample classification, iteration, and results export. The structure and synopsis of the code are shown in Figure 4.

K-means is a typical unsupervised algorithm: it divides samples into categories, but the categories carry no labels. Evaluating HC, however, is a supervised task whose results are labels [31]. Therefore, K-means cannot be used directly to evaluate HC. In this research, K-means-style prototype optimization with unweighted distance serves only as a baseline. The classical K-means distance metric is then refined with non-uniform weights derived from Spearman's correlation, turning the algorithm into a strictly supervised prototype learning framework. The weighted distance and the assignment of ordered values are introduced in Sections 4.2 and 4.3, respectively.

4.2. Distance Metric Refinement Based on Spearman Ranks

In clustering problems, the classification results are determined by multiple inputs, and each input feature influences the outcome to a different degree. A stronger correlation between an input feature and the target variable indicates a greater influence, so that feature's distance component should receive a higher weight. Conventional K-means clustering weights all input features equally because it is designed primarily for unsupervised learning, in which the influence of the inputs on the output classification is unknown. For HC, however, the inputs and output have clear physical meaning and a data basis, and the influence of each input on the output can be analyzed statistically. Because HC is an ordinal variable, Spearman's rank correlation, rather than Pearson's correlation, is used to modify the distance in K-means clustering.

Before Spearman's correlation is calculated from the 200 samples in the training set, the values of each tunneling feature are ranked in ascending order from 1 to 200, and the rank of the ith sample is recorded as x_i. The HC values are also ranked in ascending order and recorded as y_i; because many samples share the same HC, tied samples receive the average of their positions. According to Figure 1a, class II samples occupy ranks 1 to 26, so their average rank, 13.5, is recorded as their rank y_i. Similarly, the ranks y_i of class III, IV, and V samples are 60, 131.5, and 185, respectively. From x_i and y_i, Spearman's correlation ρ can be calculated by Equation (3).

ρ = 1 - \frac{6 \sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}{n (n^{2} - 1)}

(3)

In Equation (3), n is the total number of samples in the training set, which is 200. The distribution of each tunneling feature under the different HCs is shown in Figure 5.
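Equation (3) and the tie handling described above can be checked with a short sketch. HC is coded numerically (II = 2, ..., V = 5), an assumption consistent with the regression values in Section 4.3, and the class counts 26/67/76/31 follow from the training-set proportions of 13%, 33.5%, 38%, and 15.5%. Because many HC values are tied, Equation (3) is only an approximation of Spearman's rho; the exact tie-aware value is the Pearson correlation of the average ranks, included for comparison. The function names are hypothetical.

```python
def average_ranks(values):
    """Ascending ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend over the run of values tied with values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            # positions start..end correspond to ranks start+1..end+1
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_eq3(x, y):
    """Spearman's rho from Equation (3) applied to average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def spearman_tie_exact(x, y):
    """Pearson correlation of the average ranks (tie-corrected Spearman's rho)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# HC labels of the 200 training samples, coded II=2 ... V=5, with counts 26/67/76/31
hc = [2] * 26 + [3] * 67 + [4] * 76 + [5] * 31
r = average_ranks(hc)
print(r[0], r[26], r[93], r[169])  # 13.5 60.0 131.5 185.0
```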

As Figure 5a–e show, the distributions of the selected tunneling features differ significantly among HCs, which supports the relevance of tunneling data to rock mass classification. Take the rotation speed in Figure 5a as an example: as the HC value increases, the integrity of the surrounding rock decreases, and the selected rotation speed decreases so that the volume of muck generated during tunneling does not exceed the load-bearing capacity of the belt conveyor. Accordingly, the average pretreated rotation speeds for surrounding rock of classes II, III, and IV are 0.77, 0.35, and 0.26, respectively. Similarly, penetration increases with increasing HC, as shown in Figure 5d. From the training data and Equation (3), the Spearman's correlations between the five selected tunneling features and HC, together with the corresponding p values, are calculated and listed in Table 1.

As shown in Table 1, the absolute Spearman's correlations for rotation speed and thrust are notably higher than those of the other three features, indicating that these two features have a stronger influence on HC. Moreover, the p values of all five features are below 0.01, so the null hypothesis of no correlation is rejected, indicating a statistically significant correlation between each feature and the HC of the surrounding rock. In addition, a sensitivity analysis with 1000 bootstrap resamples is conducted to test the stability of the Spearman's correlations. The results are shown in Figure 6; the 95% confidence intervals for the five tunneling features, in the order listed in Table 1, are [−0.58, −0.35], [−0.38, −0.10], [−0.47, −0.21], [0.10, 0.36], and [−0.63, −0.42].
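The bootstrap sensitivity analysis can be sketched as below. The source does not state how the intervals were formed, so this sketch assumes a percentile bootstrap over resampled (feature, HC) pairs with a tie-aware Spearman's rho; the function names and the synthetic data are hypothetical.

```python
import random


def _average_ranks(v):
    """Ascending ranks from 1; ties get the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks, i = [0.0] * len(v), 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Tie-aware Spearman's rho: Pearson correlation of average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / var ** 0.5 if var > 0 else float("nan")


def bootstrap_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for Spearman's rho."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample (x, y) pairs with replacement
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # skip NaN from degenerate resamples with no variation
            stats.append(rho)
    if not stats:
        return float("nan"), float("nan")
    stats.sort()
    lo = stats[int(round((1 - level) / 2 * (len(stats) - 1)))]
    hi = stats[int(round((1 + level) / 2 * (len(stats) - 1)))]
    return lo, hi


# Hypothetical data: a feature that tends to fall as HC rises
data_rng = random.Random(1)
hc = [data_rng.choice([2, 3, 4, 5]) for _ in range(200)]
feature = [-h + data_rng.gauss(0, 1.5) for h in hc]
print(bootstrap_ci(feature, hc))
```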

Features with stronger correlations should therefore receive higher distance weights in the prototype optimization classifier. Accordingly, the absolute values of the Spearman's correlations listed in Table 1 are used as distance weights, and Equation (1) is modified as shown in Equation (4).

d = \frac{1}{w} \sqrt{\sum_{j = 1}^{m} w_{j} \cdot d_{j}^{2}},

(4)

where d denotes the modified distance; w is the sum of the absolute Spearman's correlation coefficients of the five selected features, which is 1.82 according to Table 1; and w_j is the distance weight of the jth tunneling feature, with w_1, w_2, …, w_5 equal to 0.46, 0.24, 0.34, 0.24, and 0.54, respectively.
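Equation (4) can be sketched directly with the weights listed above. This is an illustration rather than the authors' code: the feature order follows Table 1, and the prototypes below are hypothetical normalized vectors chosen to show that a unit difference in thrust (weight 0.54) counts more than the same difference in torque (weight 0.24).

```python
import math

# Distance weights w_1..w_5 = |Spearman's rho| from Table 1, in the order:
# rotation speed, torque, cutterhead power, penetration, thrust
WEIGHTS = (0.46, 0.24, 0.34, 0.24, 0.54)


def weighted_distance(sample, prototype, weights=WEIGHTS):
    """Equation (4): d = (1 / w) * sqrt(sum_j w_j * d_j^2), where w = sum_j w_j."""
    w = sum(weights)  # 1.82 for the Table 1 weights
    return math.sqrt(sum(wj * (a - b) ** 2 for wj, a, b in zip(weights, sample, prototype))) / w


def nearest_prototype(sample, prototypes, weights=WEIGHTS):
    """Index of the prototype with the smallest weighted distance."""
    return min(range(len(prototypes)),
               key=lambda i: weighted_distance(sample, prototypes[i], weights))


# Hypothetical normalized vectors: each prototype differs from the sample by 1 in one feature
sample = [0.0] * 5
thrust_off = [0.0, 0.0, 0.0, 0.0, 1.0]  # differs in thrust (w = 0.54)
torque_off = [0.0, 1.0, 0.0, 0.0, 0.0]  # differs in torque (w = 0.24)
print(nearest_prototype(sample, [thrust_off, torque_off]))  # 1
```

Because the factor 1/w is the same for every prototype, it does not change which prototype is nearest; it only rescales the distance values.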

4.3. Assigning Ordered Values to the Prototype

Before the HC of the surrounding rock can be evaluated by prototype optimization, values must be assigned to the class prototypes. In the field data, HC is a known ordinal label, so evaluating the HC of the surrounding rock is a supervised classification problem [31]. The prototype optimization process in Section 4.1 follows the iterative logic of K-means clustering, a classical algorithm used mainly in unsupervised learning that cannot directly solve supervised problems. In other words, samples can be assigned to a specific category, but the HC of that category remains unknown. This section introduces a method for assigning values to the categories obtained by prototype optimization, thereby integrating the discriminative power of supervised learning into the prototype-based classification framework.

To assign values to each category, the most probable HC for each prototype must be determined. For this purpose, a normalized training set is used to establish a multiple regression equation for HC using the least squares method. Its basic form is shown in Equation (5).

{H C}^{'} = \sum_{i = 1}^{n} w_{i} \cdot x_{i} + c,

(5)

where HC′ is the HC value calculated by the multiple regression equation; n is the number of input variables, which is 5 in this research; w_i is the regression coefficient (weight) of the ith input variable; x_i is the normalized ith input variable; and c is a constant. The 200 training samples are used to search iteratively for the optimal combination of coefficients, and in each iteration the error is evaluated by Equation (6).

E = \frac{1}{m} \sum_{i = 1}^{m} {(H C - {H C}^{'})}^{2},

(6)

where HC represents the actual HC values in the training data and m is the number of training samples, which is 200. Following the variable order in Table 1, x₁ to x₅ represent revolutions per minute, torque, cutterhead power, penetration, and thrust, respectively, and the corresponding coefficients w₁ to w₅ are calculated as −1.38, −0.97, −0.84, 0.88, and −1.64, respectively. The constant c is calculated as 5.35. With these coefficients, the multiple regression equation is established and used to assign HC values to the categories obtained by K-means-style prototype optimization. Notably, standard linear regression cannot adapt well to complex tunneling field data and does not by itself give satisfactory evaluation results. Therefore, the multiple regression (Equation (5)) is used only as a rough indexing tool to order the prototypes.
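The text describes an iterative search that minimizes Equation (6). One way to do this is batch gradient descent, sketched below under stated assumptions: HC is coded numerically (II = 2, ..., V = 5), features are normalized to [0, 1], and the learning rate and iteration count are illustrative choices. Because Equation (6) is a convex quadratic, a closed-form least-squares solver would reach the same coefficients whenever the features are not collinear. The function names and the synthetic data are hypothetical.

```python
def fit_mlr_gd(X, y, lr=0.1, n_iter=5000):
    """Fit HC' = sum_i w_i * x_i + c (Equation (5)) by minimizing Equation (6).

    Batch gradient descent on E = (1/m) * sum (HC - HC')^2.
    X: normalized feature vectors; y: numeric HC values (e.g., II = 2, ..., V = 5).
    Returns (weights, c).
    """
    m, p = len(X), len(X[0])
    w, c = [0.0] * p, 0.0
    for _ in range(n_iter):
        # residual = HC' - HC for every training sample
        res = [sum(wi * xi for wi, xi in zip(w, row)) + c - yi for row, yi in zip(X, y)]
        # dE/dw_j = (2/m) * sum(res * x_j); dE/dc = (2/m) * sum(res)
        grad_w = [2 / m * sum(r * row[j] for r, row in zip(res, X)) for j in range(p)]
        grad_c = 2 / m * sum(res)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        c -= lr * grad_c
    return w, c


def mse(X, y, w, c):
    """Equation (6) for given coefficients."""
    preds = [sum(wi * xi for wi, xi in zip(w, row)) + c for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


# Synthetic check (hypothetical data): HC' = 2*x1 - x2 + 3 exactly
X = [[a, b] for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
y = [2 * a - b + 3 for a, b in X]
w, c = fit_mlr_gd(X, y)
print([round(v, 3) for v in w], round(c, 3))  # [2.0, -1.0] 3.0
```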

Note that applying multiple linear regression (MLR) to the ordinal HC is a statistical approximation, because the intervals between classes may not be strictly uniform. Future work may explore ordinal regression models or nonlinear mapping techniques to refine the classification boundaries.

4.4. Training of the Prediction Model of HC

The SW-SPC is applied to the 200 training samples. Given that the training set comprises four classes of surrounding rock, the number of clusters, k, is set to 4. Through the alternating calculation of prototypes and their corresponding distances, four clusters are successfully identified. The iterative convergence process of the distances is illustrated in Figure 7.

As shown in Figure 7, after 10 iterations the prototypes of the four categories reach a stable state and the classification results remain unchanged. The normalized prototypes of the four categories are listed in Table 2.

Table 2 shows the prototypes of the four categories obtained. Next, HC values are assigned to the four categories using the multiple regression equation introduced in Section 4.3. Substituting the prototype data in Table 2 into Equation (5) gives HC′ values of 3.03, 4.08, 2.06, and 4.90 for Categories 1 to 4. Ranking the four categories in ascending order of HC′ yields the sequence Category 3, 1, 2, and 4, which correspond to rock mass classes II, III, IV, and V, respectively.
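The ranking step can be sketched with the regression coefficients and HC′ values reported above. The function names are hypothetical: hc_prime simply evaluates Equation (5) at a prototype, and assign_classes sorts the categories by HC′ and pairs them with the ordered classes II to V. Since the Table 2 prototypes are not reproduced here, the example feeds in the reported HC′ values directly.

```python
# Regression coefficients and constant reported in Section 4.3, in the order:
# rotation speed, torque, cutterhead power, penetration, thrust
COEFS = (-1.38, -0.97, -0.84, 0.88, -1.64)
CONST = 5.35


def hc_prime(prototype, coefs=COEFS, const=CONST):
    """Equation (5) evaluated at one normalized prototype."""
    return sum(w * x for w, x in zip(coefs, prototype)) + const


def assign_classes(hc_values, classes=("II", "III", "IV", "V")):
    """Map category ids to HC classes by ranking HC' in ascending order.

    hc_values: dict {category id: HC' of that category's prototype}.
    """
    ranked = sorted(hc_values, key=hc_values.get)
    return dict(zip(ranked, classes))


# HC' values reported for Categories 1 to 4
print(assign_classes({1: 3.03, 2: 4.08, 3: 2.06, 4: 4.90}))
# {3: 'II', 1: 'III', 2: 'IV', 4: 'V'}
```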

Using the SW-SPC, the 200 training samples are divided into four categories, and each category is assigned an HC value. The assigned HCs are compared with the actual surveyed HC values, and the results are shown in Figure 8. In addition, precision and recall, which are widely used evaluation indices for classification problems, are used to assess the classification performance; they are calculated by Equations (7) and (8).

\text{Precision} = \frac{TP}{TP + FP} \times 100\%,

(7)

\text{Recall} = \frac{TP}{TP + FN} \times 100\%,

(8)

In Equations (7) and (8), TP is the number of samples correctly classified into a specific category, FP is the number of samples from other classes incorrectly assigned to this category, and FN is the number of samples belonging to this category but incorrectly assigned to other classes. Precision and recall thus evaluate the performance for each specific category rather than the overall classification accuracy. Taking class IV as an example, TP is 68, FP is 1 + 4 + 6 = 11, and FN is 1 + 0 + 7 = 8. Therefore, the precision for class IV is 68/(68 + 11) = 86.1%, and the recall is 68/(68 + 8) = 89.5%.
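Equations (7) and (8) reduce to counting over paired label lists. The sketch below (hypothetical function name) rebuilds only the class IV totals quoted above; the classes chosen for the FP and FN samples are arbitrary, since only the totals matter for class IV.

```python
def precision_recall(y_true, y_pred, cls):
    """Per-class precision and recall (Equations (7) and (8)), in percent."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = 100 * tp / (tp + fp) if tp + fp else float("nan")
    recall = 100 * tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Rebuild the class IV totals quoted above: TP = 68, FP = 11, FN = 8
y_true = ["IV"] * 68 + ["III"] * 11 + ["IV"] * 8
y_pred = ["IV"] * 68 + ["IV"] * 11 + ["V"] * 8
p, r = precision_recall(y_true, y_pred, "IV")
print(f"{p:.1f} {r:.1f}")  # 86.1 89.5
```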

As shown in Figure 8, the SW-SPC correctly classified the majority of samples in the training set. For instance, for class II rock mass, it correctly identified 21 of 26 samples, with a precision of 87.5% and a recall of 80.8%. The precision and recall of the four classes in the training set are listed in Table 3. Nearly all precision and recall values exceed 80%, with averages of 86.4% and 85.5%, respectively. Only the precision of class V rock falls below 80%, at 78.1%. This may be attributed to class imbalance in the training set, which contains only 31 class V samples. Overall, the classification results are acceptable, and the SW-SPC-based HC prediction model is considered trained. In other words, the prototypes listed in Table 2 are adopted as the final prototypes and applied to the testing set to further validate the model's prediction performance.

4.5. Results

Section 4.4 details the training process of the prediction model based on the SW-SPC, which demonstrates a satisfactory performance on the training set. To further validate the model’s predictive capability, it is applied to a testing set comprising 75 samples. The testing samples are collected from the same project, which is introduced in Section 2 of this paper.

A comparison between Figure 1a,b reveals a significant difference in the distribution of the four surrounding rock classes between the training and testing sets. Notably, the testing set exhibits a more balanced class distribution. Samples with HCs of II and V, whose proportions in the training set are only 13% and 15.5%, occupy 18.7% and 26.7% in the testing set. Based on the prototypes listed in Table 2, the 75 testing samples are directly categorized, yielding the prediction results of the SW-SPC. The comparison results between the SW-SPC and the actual HC are shown in Figure 9a. The precision and recall of the samples in the four categories are listed in Table 4.

The average precision and recall on the testing set reached 84.6% and 81.7%, respectively, and every class achieved a precision and recall of at least 75%. Compared with the training set, the average precision and recall decreased by only 1.8 and 3.8 percentage points, respectively, indicating that the prototypes generalize well to unseen samples. These findings show that the method is useful for predicting the HC of the tunnel surrounding rock.

To further validate the effectiveness of the proposed method, a random classifier was employed as a baseline on the testing set. The random classifier assigns classes II, III, IV, and V with probabilities of 13%, 33.5%, 38%, and 15.5%, respectively, equal to the class proportions of the training set shown in Figure 1a. As illustrated in Figure 9b, its average precision and recall were only 24.5% and 24.2%, close to the 25% expected by chance for four classes and far below those of the proposed method.
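Why the baseline lands near 25% can be seen analytically. A classifier that ignores the input and draws class k with probability q_k has an expected recall of q_k for that class and an expected precision equal to the share of class k in the testing set. Both macro averages are therefore about 1/4 for four classes, whatever the q_k are. The simulation below is a minimal sketch of this baseline. The testing-set class sizes (14, 16, 25, 20) are those used for the bootstrap in Section 5.3; the seed and the number of runs are arbitrary choices of ours.

```python
import random

CLASSES = ["II", "III", "IV", "V"]
TEST_COUNTS = {"II": 14, "III": 16, "IV": 25, "V": 20}     # testing-set class sizes
PRIOR = {"II": 0.13, "III": 0.335, "IV": 0.38, "V": 0.155}  # training-set proportions


def macro_precision_recall(y_true, y_pred):
    """Macro-averaged precision and recall; 0.0 for a class never predicted."""
    precisions, recalls = [], []
    for c in CLASSES:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred = sum(1 for p in y_pred if p == c)
        n_true = sum(1 for t in y_true if t == c)
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_true if n_true else 0.0)
    return sum(precisions) / len(CLASSES), sum(recalls) / len(CLASSES)


def simulate_random_baseline(n_runs=2000, seed=0):
    """Average macro precision/recall of a prior-matched random classifier."""
    rng = random.Random(seed)
    y_true = [c for c in CLASSES for _ in range(TEST_COUNTS[c])]
    weights = [PRIOR[c] for c in CLASSES]
    total_p = total_r = 0.0
    for _ in range(n_runs):
        # Each sample receives a label drawn independently of its features.
        y_pred = rng.choices(CLASSES, weights=weights, k=len(y_true))
        p, r = macro_precision_recall(y_true, y_pred)
        total_p += p
        total_r += r
    return total_p / n_runs, total_r / n_runs


mean_p, mean_r = simulate_random_baseline()
print(f"mean macro precision={mean_p:.3f}, mean macro recall={mean_r:.3f}")
```

A single 75-sample run scatters by several percentage points around 25%, so values such as 24.5% and 24.2% are consistent with chance-level performance.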

5. Discussion

5.1. 5-Fold Cross-Validation

Because the field dataset is limited, a single static split with 200 training samples and 75 testing samples was employed. To verify the stability of the evaluation performance, 5-fold cross-validation was conducted on the 200 training samples, which were divided into five sub-datasets by stratified sampling, as shown in Table 5.
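Stratified sampling means that each sub-dataset keeps roughly the class proportions of the whole training set. A minimal sketch, assuming a simple round-robin scheme (samples grouped by class and dealt to the five folds in turn, similar in spirit to scikit-learn's `StratifiedKFold`), with the training class sizes 26, 67, 76, and 31 for classes II to V:

```python
from collections import Counter

TRAIN_COUNTS = {"II": 26, "III": 67, "IV": 76, "V": 31}


def stratified_fold_composition(class_counts, n_folds=5):
    """Deal class-grouped labels to folds round-robin; return per-fold class counts."""
    labels = [c for c, n in class_counts.items() for _ in range(n)]  # grouped by class
    folds = [Counter() for _ in range(n_folds)]
    for i, label in enumerate(labels):
        folds[i % n_folds][label] += 1
    return folds


for k, fold in enumerate(stratified_fold_composition(TRAIN_COUNTS), start=1):
    row = "  ".join(f"{c}={fold[c]}" for c in TRAIN_COUNTS)
    print(f"K{k}: {row}  total={sum(fold.values())}")
```

Every fold then holds 40 samples, and the per-class counts differ between folds by at most one sample.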

Based on the five sub-datasets in Table 5, five iterations were performed with the proposed method, each using one sub-dataset for testing and the remaining four for training. The resulting class prototypes, accuracies, and average precision and recall of each iteration are presented in Table 6.
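The fold-level figures in Table 6 can be summarized as follows. This sketch only restates the published numbers. It shows that the 82.0% quoted in the Abstract is the mean of the five fold accuracies, and it reports their sample standard deviation, a statistic not given in the paper.

```python
from statistics import mean, stdev

# Fold-level results transcribed from Table 6:
# (accuracy, average precision, average recall), all in percent.
FOLDS = {
    "K1": (82.5, 60.5, 63.3),
    "K2": (82.5, 59.5, 58.3),
    "K3": (85.0, 61.0, 66.5),
    "K4": (77.5, 59.3, 65.3),
    "K5": (82.5, 60.8, 61.3),
}

acc = [a for a, _, _ in FOLDS.values()]
prec = [p for _, p, _ in FOLDS.values()]
rec = [r for _, _, r in FOLDS.values()]

print(f"accuracy: mean={mean(acc):.1f}%, sample sd={stdev(acc):.2f}, "
      f"range={min(acc)}-{max(acc)}%")
print(f"average precision range: {min(prec)}-{max(prec)}%")
print(f"average recall range: {min(rec)}-{max(rec)}%")
```

The gap between fold accuracy (about 82%) and the class-averaged metrics (about 60%) suggests that errors concentrate in the minority classes. Accuracy is dominated by the larger classes III and IV, whereas macro averages weight each class equally.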

As Figure 1 shows, field datasets from TBM tunnel excavation are naturally imbalanced. Classes II and V, the latter being the poorest and most hazardous surrounding rock, account for only 13% and 15.5% of the training set, so each 40-sample sub-dataset contains only five or six class II samples and six or seven class V samples. Consequently, the class prototypes may fluctuate considerably between folds, leading to a noticeable decline in testing performance on these minority classes.

Even under these adverse splits, the SW-SPC maintained fold accuracies of 77.5–85.0% (mean 82.0%), while the average precision and recall of each fold remained at approximately 60% (59.3–61.0% and 58.3–66.5%, respectively). The gap between accuracy and the class-averaged metrics suggests that misclassifications concentrate in the minority classes. The stable fold accuracies suggest that embedding the global Spearman rank correlations as feature-specific distance weights helps keep the prototype optimization from overfitting local sample density, supporting reliable generalization on complex field datasets.

5.2. Comparison to the Prototype Optimization with Unweighted Distance

The results in Section 4 show that the SW-SPC is effective and accurate. However, the contribution of the correlation-based distance weights still needs to be validated. For this purpose, prototype optimization with unweighted distance was applied to the pretreated dataset. The only difference between this unweighted variant and the SW-SPC is how the distance is calculated: the unweighted variant uses Equation (1) instead of Equation (3). All other aspects of training are the same as in Section 4, with the 200 training samples used to train the prototypes. The prototypes obtained with unweighted distance are listed in Table 7.
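The effect of changing only the distance can be illustrated with a nearest-prototype rule. The weighted formula is not restated in this section, so this sketch makes two assumptions: the unweighted distance is the ordinary Euclidean distance, and the weighted version scales each squared feature difference by the absolute Spearman coefficient from Table 1 (absolute values, because several coefficients are negative). The prototypes are those of Table 2; the sample is hypothetical. Torque and penetration match Category 1 exactly, so the unweighted distance picks Category 1. Revolution per minute and thrust, which carry the largest weights, lie closer to Category 2, so the weighted distance picks Category 2.

```python
import math

# Feature order: revolution per minute, torque, cutterhead power, penetration, thrust.
# Spearman coefficients from Table 1; absolute values used as weights (assumption).
SPEARMAN = [-0.46, -0.24, -0.34, 0.24, -0.54]
WEIGHTS = [abs(r) for r in SPEARMAN]

# SW-SPC prototypes from Table 2 (normalized features), keyed by category number.
PROTOTYPES = {
    1: [0.60, 0.53, 0.57, 0.46, 0.55],
    2: [0.39, 0.33, 0.36, 0.55, 0.36],
    3: [0.74, 0.73, 0.76, 0.39, 0.77],
    4: [0.20, 0.23, 0.21, 0.57, 0.17],
}


def distance(x, prototype, weights=None):
    """Euclidean distance; with weights, each squared difference is scaled by w_j."""
    if weights is None:
        weights = [1.0] * len(x)  # equal weights: the unweighted case
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype)))


def nearest_category(x, prototypes, weights=None):
    """Return the key of the prototype closest to x."""
    return min(prototypes, key=lambda k: distance(x, prototypes[k], weights))


sample = [0.45, 0.53, 0.45, 0.46, 0.42]  # hypothetical normalized tunneling record
print("unweighted ->", nearest_category(sample, PROTOTYPES))
print("weighted   ->", nearest_category(sample, PROTOTYPES, WEIGHTS))
```

Normalizing the weights to sum to one would not change which prototype is nearest, because it scales every distance by the same factor.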

As in Section 4, the order of the prototypes optimized with unweighted distance is randomly generated and does not correspond to the HC. According to Equation (5), the calculated $HC'$ values of Categories 1 to 4 are 3.07, 2.01, 4.84, and 3.91, respectively. Therefore, Categories 1, 2, 3, and 4 are regarded as classes III, II, V, and IV, respectively.
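Once each category has an $HC'$ value, the categories can be labelled by ordering them. The minimal sketch below sorts the four $HC'$ values reported for the unweighted prototypes and assigns classes II to V in ascending order. How $HC'$ itself is computed (Equation (5)) is not repeated here. Using a ranking rule rather than rounding is our assumption; both give the same labels for these values.

```python
CLASSES = ["II", "III", "IV", "V"]  # from the best to the poorest rock mass

# HC' values of Categories 1-4 for the prototypes optimized with unweighted distance.
HC_PRIME = {1: 3.07, 2: 2.01, 3: 4.84, 4: 3.91}


def map_categories_to_hc(hc_prime, classes=CLASSES):
    """Assign classes in ascending order of HC' (a ranking rule; an assumption)."""
    ordered = sorted(hc_prime, key=hc_prime.get)
    return {category: cls for category, cls in zip(ordered, classes)}


mapping = map_categories_to_hc(HC_PRIME)
print(mapping)
```

A ranking rule always yields a one-to-one mapping. Rounding could assign two categories to the same class if their $HC'$ values were close.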

On this basis, the 200 training samples and 75 testing samples are assigned to the four categories (classes II, III, IV, and V) using the distance in Equation (1) and compared with the actual HC. The comparison results are shown in Figure 10, and the corresponding precision and recall are listed in Table 8.

As Table 8 shows, the average precision and recall on the training set reached 87.3% and 86.6%, comparable to those of the proposed method. On the testing set, however, the unweighted variant performed poorly, with an average precision and recall of 70.3% and 70.0%, that is, drops of 17.0 and 16.6 percentage points from the training set. This indicates that prototype optimization with unweighted distance is prone to overfitting, whereas the proposed method generalizes better, with training-to-testing drops of only 1.8 and 3.8 percentage points. These results show that the SW-SPC performs better on field-collected data than the unweighted variant and that the correlation-based distance has a positive influence on prototype optimization.

These results show that the proposed method clearly outperforms prototype optimization with unweighted distance. To isolate the contribution of the Spearman-based weighting itself, an ablation experiment was conducted within the Nearest Centroid Classifier (NCC) framework (Figure 11). The input space is fixed to the five selected TBM features, and the equal-weight distance (Equation (1)) is contrasted with the proposed weighted distance (Equation (4)). In the NCC, the prototype of each class is the mean of the training samples of that class; the prototypes of classes II to V are listed in Table 9.
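In the NCC framework, prototypes are plain class means, so no iterative optimization is involved. The minimal sketch below uses four hypothetical normalized samples (not the study's data), chosen so that their class means coincide with the class II and class V prototypes in Table 9. As an assumption, the weighted distance scales each squared feature difference by the absolute Spearman coefficient from Table 1.

```python
import math
from collections import defaultdict

# Absolute Spearman coefficients from Table 1 used as feature weights (assumption).
WEIGHTS = [0.46, 0.24, 0.34, 0.24, 0.54]


def class_centroids(samples, labels):
    """NCC prototypes: the per-class mean of each feature."""
    sums = defaultdict(lambda: [0.0] * len(samples[0]))
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        counts[y] += 1
        sums[y] = [s + v for s, v in zip(sums[y], x)]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(x, centroids, weights=None):
    """Assign x to the class whose centroid is nearest (optionally weighted)."""
    w = weights if weights is not None else [1.0] * len(x)

    def dist(c):
        return math.sqrt(sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, c)))

    return min(centroids, key=lambda y: dist(centroids[y]))


# Hypothetical samples: (RPM, torque, cutterhead power, penetration, thrust).
X = [
    [0.80, 0.50, 0.50, 0.30, 0.70], [0.76, 0.52, 0.48, 0.40, 0.60],  # class II
    [0.25, 0.22, 0.45, 0.60, 0.20], [0.29, 0.20, 0.51, 0.62, 0.24],  # class V
]
y = ["II", "II", "V", "V"]

centroids = class_centroids(X, y)
print(centroids)
print(predict([0.70, 0.45, 0.50, 0.45, 0.60], centroids, WEIGHTS))
```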

Using the same prototypes, the 75 testing samples are assigned to four categories by the weighted distance (Equation (4)) and by the unweighted distance (Equation (1)); the results are shown in Figure 11. With the unweighted distance, 6, 5, 10, and 8 samples of classes II, III, IV, and V are correctly classified, compared with 7, 9, 12, and 11 with the weighted distance. The average precision and recall are 41.3% and 38.5% for the unweighted distance and 53.3% and 52.3% for the weighted distance. In terms of average precision and recall, the NCC framework performs worse than the K-means-based prototype optimization, likely because of the differences in the calculated prototypes. Nevertheless, the weighted distance again outperforms the unweighted distance, by 12.0 and 13.8 percentage points in average precision and recall, which confirms that the weighting has a positive influence on evaluation performance. Because the features and prototypes are held fixed in this ablation, the improvement can be attributed to the correlation-based distance weights rather than to feature selection.
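The recall figures can be checked directly from the correct counts above, because the testing set contains 14, 16, 25, and 20 samples of classes II to V. This short sketch does that. Average precision cannot be recomputed the same way, since the number of samples predicted for each class is not reported.

```python
CLASS_SIZES = {"II": 14, "III": 16, "IV": 25, "V": 20}  # testing-set composition

# Correctly classified testing samples per class, as reported for Figure 11.
CORRECT = {
    "unweighted": {"II": 6, "III": 5, "IV": 10, "V": 8},
    "weighted": {"II": 7, "III": 9, "IV": 12, "V": 11},
}


def macro_recall(correct, sizes):
    """Unweighted mean of per-class recall."""
    return sum(correct[c] / sizes[c] for c in sizes) / len(sizes)


def accuracy(correct, sizes):
    """Fraction of all testing samples that are correctly classified."""
    return sum(correct.values()) / sum(sizes.values())


for name, counts in CORRECT.items():
    print(f"{name:>10}: average recall={macro_recall(counts, CLASS_SIZES):.1%}, "
          f"accuracy={accuracy(counts, CLASS_SIZES):.1%}")
```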

5.3. Comparison with Supervised Classifiers

To further validate the reliability of the SW-SPC for evaluating the HC of the surrounding rock, a typical supervised classifier, random forest (RF), was implemented with the scikit-learn library in Python 3.7 and applied to the same dataset. The hyperparameters of the RF classifier were determined by systematic trial and error based on the average precision and recall on the training set. To account for the imbalance of the field dataset, the classes were weighted inversely to their frequencies. The search ranges and selected values of the main hyperparameters are listed in Table 10, and the results on the testing set are shown in Figure 12.
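With `class_weight='balanced'`, scikit-learn weights each class by n_samples / (n_classes × n_samples_in_class). The sketch below evaluates that formula for the training class sizes. It also enumerates the grid implied by the ranges and steps in Table 10, which gives 80 candidate settings. The dictionaries are meant as keyword arguments for `sklearn.ensemble.RandomForestClassifier`; the classifier is not imported here so that the sketch has no external dependency. The scoring of each candidate is left open.

```python
from itertools import product

TRAIN_COUNTS = {"II": 26, "III": 67, "IV": 76, "V": 31}


def balanced_class_weights(counts):
    """Same formula as scikit-learn's class_weight='balanced'."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def rf_grid():
    """Candidate settings from the ranges and steps listed in Table 10."""
    for n_estimators, max_depth, min_split in product(
            range(50, 201, 50), range(2, 11, 2), range(5, 21, 5)):
        yield {
            "n_estimators": n_estimators,
            "max_depth": max_depth,
            "min_samples_split": min_split,
            "bootstrap": True,
            "class_weight": "balanced",
        }


weights = balanced_class_weights(TRAIN_COUNTS)
print({c: round(w, 3) for c, w in weights.items()})
grid = list(rf_grid())
print(len(grid), "candidate settings")
```

The minority classes II and V receive roughly two to three times the weight of classes III and IV, which offsets their lower frequency in the impurity calculation.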

As Figure 12 shows, the average precision and recall of RF are 77.8% and 73.0%. Overall, RF performs better than the unweighted prototype optimization, but its average precision and recall are still 6.8 and 8.7 percentage points lower than those of the proposed method with weighted distance. Although RF achieves good results in many fields, it appears less suitable than the SW-SPC for the dataset in this research.

To further examine the evaluation performance, the confidence intervals (CIs) of the accuracies obtained by prototype optimization with unweighted distance, RF, and the proposed method were estimated by stratified (layered) bootstrap. In each of 1000 iterations, 14, 16, 25, and 20 samples of classes II, III, IV, and V, respectively, were drawn with replacement, and the accuracy, defined as the ratio of correctly classified samples to the total number of testing samples, was calculated. The accuracies were then sorted in ascending order, as illustrated in Figure 13. At a confidence level of 95%, the accuracy CIs of the proposed method, the prototype optimization with unweighted distance, and RF are [74.7%, 90.7%], [60.0%, 81.3%], and [64.0%, 82.7%], respectively. The proposed method has both the highest lower bound and the highest upper bound, indicating good stability, although the intervals overlap because of the small testing set.
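A stratified bootstrap resamples with replacement within each class, so every replicate keeps the testing-set composition (14, 16, 25, 20). The minimal sketch below applies a percentile interval to per-sample correctness flags. The per-class correct counts (11, 12, 22, 17, i.e., 62 of 75) are our back-calculation from Table 4 and the class sizes, and the seed is arbitrary. The bounds therefore will not match the published interval exactly.

```python
import random

CLASS_SIZES = {"II": 14, "III": 16, "IV": 25, "V": 20}
# Back-calculated correct counts per class for the SW-SPC testing results (assumption).
CORRECT = {"II": 11, "III": 12, "IV": 22, "V": 17}


def stratified_bootstrap_accuracy(correct, sizes, n_iter=1000, seed=0):
    """Return sorted bootstrap accuracies, resampling within each class."""
    rng = random.Random(seed)
    # One 0/1 correctness flag per testing sample, grouped by class.
    flags = {c: [1] * correct[c] + [0] * (sizes[c] - correct[c]) for c in sizes}
    total = sum(sizes.values())
    accs = []
    for _ in range(n_iter):
        hits = sum(sum(rng.choices(flags[c], k=sizes[c])) for c in sizes)
        accs.append(hits / total)
    return sorted(accs)


def percentile_interval(sorted_values, level=0.95):
    """Simple percentile interval read off the sorted bootstrap values."""
    n = len(sorted_values)
    lo = int(n * (1 - level) / 2)
    hi = int(n * (1 + level) / 2) - 1
    return sorted_values[lo], sorted_values[hi]


accs = stratified_bootstrap_accuracy(CORRECT, CLASS_SIZES)
low, high = percentile_interval(accs)
point = sum(CORRECT.values()) / sum(CLASS_SIZES.values())
print(f"point accuracy={point:.1%}, 95% CI=[{low:.1%}, {high:.1%}]")
```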

5.4. Limitations

This paper introduced an HC prediction method for the surrounding rock of TBM-excavated tunnels, and its effectiveness was verified on a total of 275 field samples. However, the method still has some limitations that need to be addressed.

1. In this research, the tunneling parameters recorded by the TBM's data acquisition system are statistically related to the HC of the surrounding rock in order to evaluate the HC. However, rock properties are not the only factors affecting the tunneling data. Other, largely uncontrollable factors also significantly influence the recorded tunneling parameters, such as operator habits, cutter wear, and design parameters, including cutter size and count. Incorporating these complex factors into the current HC dataset remains challenging. Therefore, the dataset in this research was drawn from a single tunnel project, which limits the influence of these factors on the statistical results.

2. In predicting rock mass parameters or TBM tunneling performance, standardizing the pretreatment of tunneling data remains a significant challenge. Given the structural discrepancies between tunneling and rock mass parameters, there is currently no universally accepted pretreatment protocol, especially for removing the ramp-up and ramp-down stages of each tunneling cycle. This study focuses on predicting HC from pretreated tunneling data rather than on optimizing the pretreatment itself, so the pretreatment method adopted here, introduced in Section 3, is partly subjective. The pretreatment of tunneling data should be studied in future research.

3. The ranges and distributions of tunneling data differ significantly between types of TBM, and the operating parameters are largely set according to the operators' experience. Therefore, the proposed prediction model and the prototypes listed in Table 2 cannot be directly applied to tunnels excavated by a different type of TBM, although the SW-SPC-based prediction approach can be transferred. For a new tunnel excavated by another type of TBM, samples with matched HC and tunneling data should first be collected and used to train the SW-SPC, yielding prototypes of the tunneling data for each class of surrounding rock. Subsequent HCs can then be predicted from these prototypes instead of by manual investigation, which is the practical significance of the proposed method.

6. Conclusions

The main conclusions of this paper are summarized as follows.

1. This paper introduces a method to predict the HC of tunnels excavated by a TBM, which requires hundreds of samples with matched HC and tunneling data. Using the tunneling data as input, the SW-SPC categorizes the samples into groups corresponding to specific HCs of the surrounding rock.

2. Based on the distribution of the field tunneling data, the SW-SPC, which weights distances by the Spearman's correlation between each tunneling feature and the HC, is introduced. In prototype optimization with unweighted distance, the distance weights of all input features are set to 1, implying that every feature has the same influence on the output. Because the correlation between each tunneling feature and the HC can be quantified in advance, Spearman's correlation is used as the weighting factor, so that features more strongly correlated with the HC receive higher weights. This weighting improved the prediction accuracy.

3. Tunneling data recorded by the TBM's data acquisition system and the corresponding HC data were collected from a tunnel project in Northeast China. After pretreatment, 275 samples with matched tunneling data and HC were obtained: 200 formed the training set used to establish the prediction model and calculate the prototypes, and the other 75 formed the testing set used to verify the model. The SW-SPC model achieved an accuracy of 82.7% (precision: 84.6%, recall: 81.7%) on the static testing set. Furthermore, 5-fold cross-validation yielded a mean accuracy of 82.0%, with fold-level average precision of 59.3–61.0% and average recall of 58.3–66.5%, demonstrating robust global performance while reflecting the sensitivity to class distribution variance inherent in field-measured tunneling data.

Author Contributions

L.Z. was responsible for the manuscript drafting and algorithm implementation. J.H. performed the data classification, processing, and analysis of the 275 TBM datasets. R.W. contributed to the conceptualization and methodology. N.L. supervised the manuscript writing. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Natural Science Foundation of Shandong Province (No. ZR2024QE150) and the Postdoctoral Innovation Program Project of Shandong Province (No. SDCX-ZG-202503076).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to express their sincere gratitude to the construction management department and technical teams of the hydropower tunnel project for providing the raw TBM operational data used in this study.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design, execution, interpretation, or writing of the study.


Figure 1. The sample proportions of different HCs in (a) training and (b) testing sets.

Figure 2. Recorded TBM tunneling data. (a) Tunneling data recorded from 0:00:00 to 18:00:00 on 20 October 2015. (b) Tunneling data of a tunneling cycle.

Figure 3. Tunneling data after pretreatment.

Figure 4. The structure and synopsis of the SW-SPC code.

Figure 5. (a) Revolution per minute, (b) torque, (c) cutterhead power, (d) penetration and (e) thrust across different HC rock mass classifications.

Figure 6. Bootstrap of Spearman’s correlation.

Figure 7. Prototype convergence with iterations.

Figure 8. Comparison of the SW-SPC results and the actual HC.

Figure 9. Evaluated results in testing set. (a) By the proposed method. (b) By the random classifier (used as a baseline).

Figure 10. Comparison between the actual and calculated HC by the prototype optimization with unweighted distance. (a) Results of the training set. (b) Results of the testing set.

Figure 11. HC evaluated results of the test set by weighted and unweighted distance. (a) Unweighted distance (Equation (1)). (b) Weighted distance (Equation (4)).

Figure 12. HC evaluated results of the test set by RF.

Figure 13. Accuracy of the three methods obtained by stratified (layered) bootstrap and the corresponding confidence intervals.

Table 1. Spearman’s correlation between each tunneling feature and HC.

Feature	Revolution per Minute	Torque	Cutterhead Power	Penetration	Thrust
Spearman’s correlation	−0.46	−0.24	−0.34	0.24	−0.54
p value	2.8 × 10⁻⁷	2.9 × 10⁻⁸	3.3 × 10⁻⁶	2.1 × 10⁻⁴	8.0 × 10⁻¹¹

Table 2. Prototypes results of the SW-SPC.

	Revolution per Minute	Torque	Cutterhead Power	Penetration	Thrust
Category 1	0.60	0.53	0.57	0.46	0.55
Category 2	0.39	0.33	0.36	0.55	0.36
Category 3	0.74	0.73	0.76	0.39	0.77
Category 4	0.20	0.23	0.21	0.57	0.17

Table 3. Precision and recall of SW-SPC results in training set.

	II Class	III Class	IV Class	V Class	Average
Precision	87.5%	93.8%	86.1%	78.1%	86.4%
Recall	80.8%	91.0%	89.5%	80.6%	85.5%

Table 4. Precision and recall of the SW-SPC results in testing set.

	II Class	III Class	IV Class	V Class	Average
Precision	91.6%	85.7%	75.9%	85.0%	84.6%
Recall	78.6%	75.0%	88.0%	85.0%	81.7%

Table 5. Composition of five sub-datasets.

	II	III	IV	V
K1	5	14	15	6
K2	5	14	15	6
K3	5	13	15	7
K4	5	13	16	6
K5	6	13	15	6

Table 6. Prototypes and accuracies.

Testing Set	HC	Revolution per Minute	Torque	Cutterhead Power	Penetration	Thrust	Accuracy	Average Precision	Average Recall
K1	II	0.74	0.45	0.41	0.41	0.64	82.5%	60.5%	63.3%
	III	0.37	0.36	0.69	0.65	0.66
	IV	0.22	0.46	0.46	0.90	0.45
	V	0.23	0.19	0.57	0.80	0.21
K2	II	0.52	0.81	0.72	0.27	0.51	82.5%	59.5%	58.3%
	III	0.38	0.41	0.49	0.61	0.64
	IV	0.27	0.23	0.79	0.85	0.26
	V	0.16	0.25	0.42	0.82	0.31
K3	II	0.87	0.64	0.14	0.20	0.71	85.0%	61.0%	66.5%
	III	0.42	0.39	0.58	0.55	0.64
	IV	0.18	0.52	0.46	0.75	0.45
	V	0.21	0.22	0.63	0.85	0.31
K4	II	0.44	0.57	0.68	0.67	0.74	77.5%	59.3%	65.3%
	III	0.29	0.39	0.43	0.37	0.55
	IV	0.36	0.45	0.46	0.72	0.49
	V	0.23	0.16	0.64	0.89	0.25
K5	II	0.39	0.41	0.67	0.42	0.65	82.5%	60.8%	61.3%
	III	0.26	0.26	0.35	0.29	0.35
	IV	0.32	0.20	0.78	0.73	0.29
	V	0.16	0.46	0.49	0.77	0.23

Table 7. Prototypes optimized with unweighted distance.

	Revolution per Minute	Torque	Cutterhead Power	Penetration	Thrust
Category 1	0.50	0.51	0.64	0.51	0.61
Category 2	0.74	0.66	0.77	0.31	0.79
Category 3	0.19	0.29	0.22	0.65	0.21
Category 4	0.40	0.43	0.31	0.50	0.40

Table 8. Precision and recall by the prototype optimization with unweighted distance.

		II Class	III Class	IV Class	V Class	Average
Training	Precision	84.0%	89.1%	88.8%	87.1%	87.3%
Training	Recall	80.8%	85.1%	93.4%	87.1%	86.6%
Testing	Precision	69.2%	55.0%	81.8%	75.0%	70.3%
Testing	Recall	64.3%	68.8%	72.0%	75.0%	70.0%

Table 9. Prototypes of class II to V samples in training set.

	Revolution per Minute	Torque	Cutterhead Power	Penetration	Thrust
Class II	0.78	0.51	0.49	0.35	0.65
Class III	0.36	0.44	0.70	0.50	0.64
Class IV	0.23	0.39	0.51	0.58	0.41
Class V	0.27	0.21	0.48	0.61	0.22

Table 10. Hyper-parameters of the RF classifier.

Hyper-Parameter	Ranges	Step	Selected Value
n_estimators	[50, 200]	50	50
max_depth	[2, 10]	2	2
min_samples_split	[5, 20]	5	10
bootstrap	-		True
class_weight	-		'balanced'

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, L.; Hou, J.; Wang, R.; Liu, N. Prediction Method of Surrounding Rock Hydropower Classification (HC) Based on TBM Tunneling Data and Spearman-Weighted Supervised Prototype Classifier. Appl. Sci. 2026, 16, 5636. https://doi.org/10.3390/app16115636

