Article

Knowledge-Driven Method for Constructing TBM Rock-Breaking Indexes

1
Hebei Transportation Investment Group Company Limited, Shijiazhuang 050051, China
2
Key Laboratory of Large Structure Health Monitoring and Control, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
3
Optoelectronic System Laboratory, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
4
National Key Laboratory of Green and Long-Life Road Engineering in Extreme Environment, Shenzhen University, Shenzhen 518060, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(12), 5950; https://doi.org/10.3390/app16125950
Submission received: 27 May 2026 / Revised: 9 June 2026 / Accepted: 9 June 2026 / Published: 12 June 2026

Abstract

As tunnel construction advances toward greater depths and lengths, full-face tunnel boring machines (TBMs) have become the preferred method for large-scale excavation. The operational efficiency of TBMs significantly impacts the progress, cost, and safety of tunnel projects. With the rapid development of machine learning and big data technologies, data-driven models based on cutterhead response signals have emerged as a key approach to improving TBM perception and decision-making. However, conventional rock-breaking indicators predominantly rely on single physical variables, limiting their ability to capture the complex dynamic interactions between the TBM and surrounding rock during excavation, thereby restricting their engineering applicability. To address this limitation, this study proposes a knowledge-driven data processing and indicator construction method to more accurately represent TBM operational states and surrounding rock properties. First, a novel excavation phase division algorithm based on time-domain and penetration-depth features is developed to accurately distinguish different tunneling stages. Subsequently, using data from the YC and YE projects, thrust- and torque-driven rock-breaking indicators are formulated, and the relationship between penetration depth and thrust/torque is optimized via power function fitting. Optimal exponents are determined through algorithmic optimization. Validation with field data confirms that the proposed indicators significantly enhance the accuracy and generalization of surrounding rock classification and control parameter prediction models.

1. Introduction

With the continuous expansion of infrastructure construction in transportation, water conservancy, and hydropower sectors, tunnels—as critical control structures—are being developed to greater depths and lengths. Tunnel boring machines (TBMs) offer notable advantages such as cleaner construction environments, higher automation, and significant economic benefits [1]. With advancements in machine learning technology, constructing intelligent models based on large-scale cutterhead response data has become a key approach to enhancing TBM perception and decision-making capabilities. The operational data collected during TBM tunneling serves as a fundamental basis for developing intelligent tunneling models. However, directly using all raw data for model training may introduce a substantial amount of irrelevant data, significantly reducing training efficiency. To address this, many researchers have selected or constructed rock-breaking indices that reflect rock mass conditions based on accumulated domain knowledge—a strategy known as knowledge-driven modeling. Studies have shown that intelligent tunneling models built on knowledge-driven features often outperform those based solely on data-driven approaches.
Data collected during TBM construction inevitably include outliers as well as invalid records from machine start-up and shutdown periods. The quality of these data directly determines the performance ceiling of machine learning models. Currently, data cleaning primarily focuses on two aspects: outlier handling and tunneling phase segmentation. In terms of outlier handling, the most commonly used method is the 3σ rule, which identifies data points beyond three standard deviations from the mean as outliers. Xu et al. [2], Zhang et al. [3], and Li et al. [4] each adopted statistical or feature-based filtering strategies for TBM operational data. However, the 3σ rule is only applicable to steady-state segments and fails to detect anomalies during the acceleration phase. For tunneling phase segmentation, distinguishing between tunneling and shutdown states is relatively straightforward: continuous non-zero data segments are generally considered active tunneling phases [5]. Zhang et al. [6], Wang et al. [7], and Li et al. [8] divided tunneling cycles into active and transitional operating phases, including idle thrust, acceleration, steady, and deceleration stages, with some researchers further splitting the idle thrust phase into idle rotation and idle thrust sub-phases. Distinguishing specific tunneling phases is more challenging. Current approaches include: identifying the boundary between acceleration and steady phases using cumulative sum and change point detection algorithms; determining the start of the steady phase based on the standard deviation of advance speed; identifying the start of the acceleration phase using the rate of decrease in thrust speed [4]; defining the steady phase by discarding the first 400 s and the last 300 s of a tunneling cycle; and recognizing the start of the steady phase based on the operator’s characteristic behavior of reducing thrust speed before entering a steady advance [2].
There are over a hundred types of monitoring signals in TBM operations. To improve the accuracy of intelligent rock mass classification models, researchers often identify the most contributive signal types: those most correlated with rock mass grade through correlation analysis and feature importance ranking. This process is known as data-driven feature extraction. However, purely data-driven approaches often extract signal types unrelated to tunneling conditions, such as temperature or gas concentrations. Moreover, the reliance on specific databases leads to inconsistencies across studies, with varying extracted features whose relevance to rock mass conditions remains to be validated [9]. In contrast, knowledge-driven methods rely on human expertise for feature selection and generally achieve higher accuracy in rock mass classification and tunneling parameter prediction. Xu et al. [2] used TBM operating parameters for control-parameter prediction, Gao et al. [10] developed recurrent neural network models for TBM operating-parameter prediction, and Liu et al. [11] used TBM construction data for hard-rock lithology prediction. These studies support the use of cutterhead rotation speed, advance rate, cutterhead torque, and total thrust during the steady phase as key features. Additionally, new features can be constructed based on prior knowledge, with the most well-known being the Torque Penetration Index (TPI) and Field Penetration Index (FPI). Li et al. [12] and Zhu et al. [13] both confirmed the usefulness of rock-breaking-related indicators in TBM datasets. Zhang et al. [6] and Wang et al. [14] further demonstrated the effectiveness of these indicators in cross-project or engineering applications. These studies show that FPI and TPI yield better performance in rock mass classification. 
Academician Chen Zuyu and others, based on tunneling data from the YS and YC projects, conducted theoretical analyses and statistical tests to systematically review the use of TPI and FPI in machine learning for TBM intelligent construction. Their results confirmed that knowledge-driven methods outperform data-driven ones. Jing Liujie [15], using data from the YS project, computed FPI and TPI under different uniaxial compressive strengths and demonstrated their strong capability to distinguish between rock mass types. Li et al. [12] analyzed the relationship between TPI, FPI, and penetration depth in the YS project, showing that most tunneling cycles exhibit linear relationships, and proposed additional indices for characterizing rock conditions. Zhang et al. [6] integrated tunneling data from the YS, YC, and DZ projects (with varying cutterhead diameters), and used FPI, TPI, cutterhead diameter, and cutter count as features in a cross-project rock mass classification model, achieving promising results. Therefore, knowledge-driven features exhibit higher accuracy and interpretability than data-driven ones. However, due to limited prior knowledge, the range of knowledge-driven features remains narrow, highlighting the need to extract a richer variety of such features.
This study proposes a more precise tunneling phase segmentation algorithm and constructs a knowledge-driven rock-breaking index. In terms of data preprocessing, the relationship between control parameters and cutterhead response was analyzed from both time-domain and penetration-depth perspectives. A novel tunneling phase segmentation method based on temporal and penetration features was developed, significantly improving the accuracy and applicability of phase identification. For knowledge-driven index construction, leveraging the prior knowledge that the relationship between rock-breaking force and control parameters follows a power-law function, an innovative method was proposed to construct ascending-phase indicators based on the power-law relationship between cutterhead thrust, torque, and penetration. The proposed indicators were validated using tunneling data from the YC and YE projects, demonstrating their effectiveness in capturing the interaction between the cutterhead and surrounding rock.
The overall workflow includes excavation phase segmentation, knowledge-driven index construction, and validation through surrounding rock classification and control-parameter prediction models.

2. Description of the Projects

2.1. YC Project

The construction big data used in this study are partly sourced from the entire tunneling section of Tunnel No. 6 and partial sections of Tunnel Nos. 3, 4, and 5 in Section II of the YC Project. A portion of the geological profile is shown in Figure 1. Two open-type TBMs were deployed in this section, and their main technical parameters are nearly identical, as detailed in Table 1. Table 2 summarizes the lengths of different rock mass classes in the data segments collected from Tunnel Nos. 3, 4, 5, and 6.

2.2. YE Project

Another portion of the construction big data is sourced from partial tunneling sections of TBM1-1 and TBM1-2 in Section II of the KS segment of the YE Project. The overall construction layout is illustrated in Figure 2. This project also employed open-type hard rock TBMs, with the main parameters detailed in Table 3. Table 4 summarizes the lengths of different rock mass classes within the tunneling data collected from TBM1-1 and TBM1-2. Compared to the YC Project, the TBMs used in the YE Project feature a larger cutterhead diameter and a greater number of disc cutters, and the distribution of rock mass classes is more imbalanced.

3. TBM Data Processing Method Based on Time-Domain and Penetration Features

3.1. Tunneling Data Analysis

3.1.1. Time-Domain Analysis

First, the time-history curves of cutterhead rotation speed, advance rate, thrust, and torque within a complete excavation cycle are analyzed, as shown in Figure 3a. Based on the variation patterns of control parameters and cutterhead responses, the excavation cycle can be divided into five stages.
Idle Rotation Stage: This is the initial startup phase. During this stage, the operator gradually increases the cutterhead rotation speed from 0 r/min to the set value within one minute, generating only a small torque. The advance rate remains at 0 mm/min, and the thrust stays at 0 kN.
No-load Advance Stage: This stage spans from the beginning of advance to the point when the disc cutters contact the rock. The advance rate is set during this stage, and the penetration rate is approximately 1 mm/rev. In the first half, thrust increases sharply while torque remains relatively unchanged. In the second half, penetration remains nearly constant, but both thrust and torque rise, indicating that initially the thrust is used to overcome shield-ground friction and drag forces from the rear systems, and later, the cutters begin to contact but not fully penetrate the rock.
Ramp-up Stage: This stage begins once the disc cutters fully penetrate the tunnel face. The operator then adjusts the advance rate to determine suitable control parameters. During this stage, the cutterhead rotation speed remains unchanged, and the advance rate gradually increases. The thrust growth rate slows down, suggesting that the additional thrust is effectively used for rock breaking. Torque, however, increases rapidly as the cutters start rotating against the fully engaged rock. Related studies have shown that the ramp-up stage effectively functions as an in situ excavation test under various control parameters and contains rich rock–machine interaction information.
Steady Stage: Once the operator identifies an appropriate advance rate, both cutterhead thrust and torque remain relatively stable until the hydraulic cylinders reach their maximum stroke. This stage usually occupies most of a single excavation cycle and reflects the relationship between different rock properties and TBM performance under consistent conditions.
Shutdown Stage: After reaching the maximum stroke of the hydraulic cylinders, the operator gradually reduces both the advance rate and the cutterhead rotation speed to zero. Consequently, both torque and thrust drop to zero, marking the end of one complete excavation cycle.

3.1.2. Penetration Feature Analysis

According to laboratory tests and related findings, penetration rate exhibits the highest correlation with cutterhead thrust and torque. Analyzing the relationships between penetration rate and cutterhead thrust and torque offers an alternative perspective for understanding the excavation stages. Therefore, the time-domain curves in Figure 3a are transformed into penetration rate–thrust and penetration rate–torque curves, as shown in Figure 3b.
It can be observed that when the penetration rate is 0 mm/rev, different levels of cutterhead torque exist while the thrust remains at zero. This indicates that only the cutterhead drive system is active without any forward advance, corresponding to the idle rotation stage. As the penetration rate increases and fluctuates within the range of 0–1.5 mm/rev, the cutterhead thrust rises sharply while torque changes only slightly—this corresponds to the no-load advance stage. With further increases in penetration rate, after the disc cutters fully engage with the tunnel face, the operator increases the advance rate, resulting in a rapid increase in torque and a relatively slower rise in thrust—this marks the beginning of the ramp-up stage. When cutterhead thrust, torque, and penetration rate stabilize, the data points cluster together, corresponding to the steady stage.
By comparing these two forms of representation, it becomes clear that time-domain curves primarily reveal cutterhead responses under the operator’s selected optimal excavation parameters, emphasizing the onset of the steady stage. In contrast, penetration rate feature curves focus on cutterhead responses under varying excavation parameters during the ramp-up stage, highlighting the trends in thrust and torque as penetration increases and more clearly indicating the onset of the ramp-up stage. Together, these two analytical approaches provide more precise geometric features, facilitating the identification and segmentation of excavation stages. In this study, the association between these variables is used qualitatively: time-domain curves locate the stable-stage boundary, whereas penetration-rate curves expose the thrust/torque response during the rising stage; combining the two reduces dependence on a single signal and improves phase-boundary identification.

3.2. Tunneling Phase Segmentation Method

This subsection presents the implementation process of the excavation stage division method based on time-domain and penetration-rate features. The process consists of the following five steps.
Step 1: Removal of shutdown segments and extraction of complete excavation segments. In the shutdown stage, the cutterhead rotation speed, advance rate, thrust, and torque are all zero. Continuous non-zero segments are extracted as individual excavation cycles. The state discrimination functions are defined in Equations (1) and (2). The number of excavation cycles extracted from the YC and YE projects is listed in Table 5, which aligns well with actual operational records.
$$f(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{1}$$
$$S(T_{ch}, F_{ch}, APM, RAM) = f(T_{ch})\, f(F_{ch})\, f(APM)\, f(RAM) \tag{2}$$
Step 2: Outlier removal and noise reduction. After evaluating various filtering methods, the median filter was ultimately selected for denoising. The effect of the filtering process is shown in Figure 4. It can be observed that the application of the median filter effectively reduced noise and fluctuations.
Step 3: Identification of the end point of the stable stage. By analyzing the data characteristics, it was observed that when excavation stops, the advance rate rapidly drops to zero. Based on this feature, the end point of the stable stage within an excavation cycle is defined as the first index, searching from the end toward the beginning, at which the advance rate exceeds the 3σ lower bound. Let the advance rate sequence be $V_{time} = \{v_1, v_2, v_3, \ldots, v_n\}$; then the index $i_{end}$ of the stable stage end point is determined as shown in Equation (3), where $\bar{V}$ and $\sigma_V$ are the mean and standard deviation of the advance rate, respectively, and $n$ is the total number of time points.
$$i_{end} = \max\left\{\, j \;\middle|\; v_j > \bar{V} - 3\sigma_V,\; n \ge j \ge 1 \,\right\} \tag{3}$$
Step 4: Identification of the starting point of the stable stage. On the time–penetration curve, the start of the stable stage appears as an inflection point where the penetration rate transitions from a rapid increase to a steady state. Before this point, the penetration rate increases with a gradually decreasing slope; after the inflection, it fluctuates within a narrow range. Based on this geometric feature, the point farthest from the straight line connecting the origin and the end point of the stable stage is defined as the starting point of the stable stage. The index $i_{se}$ corresponding to this point is found using Equation (4), where $t_{end}$ and $p_{end}$ represent the time and penetration rate at the end of the stable stage, and $p_i$ and $t_i$ represent the penetration and time at the $i$-th point, respectively.
$$i_{se} = \underset{1 \le i \le i_{end}}{\arg\max}\; \frac{\left| p_{end}\, t_i - t_{end}\, p_i \right|}{\sqrt{p_{end}^{2} + t_{end}^{2}}} \tag{4}$$
Step 5: Identification of the starting point of the rising stage. In the penetration–thrust curve, the geometric feature of the starting point of the rising stage is similar to that of the stable stage’s starting point on the time–penetration curve. Based on the same principle used in Step 4, the point that is farthest from the straight line connecting the origin to the stable stage starting point is defined as the start of the rising stage. The index $i_{ss}$ corresponding to this point is found using Equation (5), where $F_{se}$ and $p_{se}$ denote the thrust and penetration at the stable stage starting point, and $p_i$ and $F_i$ represent the penetration and thrust at the $i$-th point, respectively. This method is independent of the sampling frequency and achieves significantly higher segmentation accuracy and broader applicability than the standard deviation or mean-based methods commonly used in related studies. The final segmentation result is shown in Figure 4.
$$i_{ss} = \underset{1 \le i \le i_{se}}{\arg\max}\; \frac{\left| F_{se}\, p_i - p_{se}\, F_i \right|}{\sqrt{F_{se}^{2} + p_{se}^{2}}} \tag{5}$$
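The five steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the median-filter window of 5 samples, and the use of the population standard deviation are all hypothetical choices that the text does not specify.

```python
import math
import statistics


def extract_cycles(torque, thrust, advance, rotation):
    """Step 1: (start, end) index pairs of continuous runs where all four signals are > 0."""
    cycles, start = [], None
    for k, values in enumerate(zip(torque, thrust, advance, rotation)):
        active = all(v > 0 for v in values)  # product of the four f(.) terms in Equation (2)
        if active and start is None:
            start = k
        elif not active and start is not None:
            cycles.append((start, k - 1))
            start = None
    if start is not None:
        cycles.append((start, len(torque) - 1))
    return cycles


def median_filter(signal, window=5):
    """Step 2: sliding median; the window length is an assumed value."""
    half = window // 2
    return [statistics.median(signal[max(0, k - half): k + half + 1])
            for k in range(len(signal))]


def stable_end_index(advance):
    """Step 3: last index whose advance rate exceeds mean - 3*std (Equation (3))."""
    # Population standard deviation is an assumption; the text does not say which is used.
    bound = statistics.fmean(advance) - 3 * statistics.pstdev(advance)
    return max(j for j, v in enumerate(advance) if v > bound)


def farthest_from_chord(x, y, last):
    """Steps 4-5: index in [0, last] farthest from the line through the origin and (x[last], y[last])."""
    x_ref, y_ref = x[last], y[last]
    norm = math.hypot(x_ref, y_ref) or 1.0  # guard against a degenerate reference point
    distances = [abs(y_ref * x[i] - x_ref * y[i]) / norm for i in range(last + 1)]
    return max(range(last + 1), key=distances.__getitem__)
```

For one cycle, Step 4 would call `farthest_from_chord(time, penetration, i_end)` and Step 5 would call `farthest_from_chord(penetration, thrust, i_se)`. Because rescaling either axis multiplies every distance by the same factor, the selected index does not depend on the units of time, penetration, or thrust.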

4. Construction of TBM Rock-Breaking Indices Based on the Physical Laws of Rock–Machine Interaction

The cutterhead rotation speed, advance rate, thrust, and torque are the most representative signals reflecting the interaction between the TBM and the surrounding rock [16]. As these four signals are commonly monitored across most TBM systems, many existing studies construct rock-breaking indices based on them to enhance the generalizability of the resulting indicators. In this section, knowledge-driven rock-breaking indices are also developed exclusively using these four types of signals.

4.1. Rock-Breaking Index Construction Method for the Ascending Phase

4.1.1. Index Construction Method

The rock-breaking force exhibits a power-law relationship with the control parameters. Geng et al. [17] verified this relationship using a full-scale experimental cutterhead system, and Sun et al. [18] developed TBM cutting-force prediction formulas based on rotary cutting tests. It can thus be inferred that the relationship between control parameters and cutterhead response during the ascending phase should also follow a similar form. Based on this, the cutterhead thrust and torque during the ascending phase are fitted using power-law functions, and the corresponding fitting parameters are extracted as rock-breaking indices. The thrust acting on a single disc cutter can be simplified as the sum of the net thrust and the resistive force due to trailing support and average cutter resistance (Ff). The total thrust is the sum of the normal forces on all disc cutters, as expressed in Equation (6). Similarly, the torque of a single cutter can be represented as the sum of the net torque and the average frictional moment (Tf), and the total torque can be simplified as shown in Equation (7). The coefficients aF, mF, aT, mT, Ff, and Tf are undetermined constants that depend on rock properties, disc cutter geometry, and other factors. As rock strength and integrity increase, the incremental thrust required per unit penetration also increases, reflected in larger values of aF, mF, aT, and mT. For rock masses with similar strength and integrity, larger values of Ff and Tf indicate greater resistance to overcome, which can also serve as effective indicators of rock-breaking difficulty.
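$$F_{ch} = \sum_{i=1}^{N_{sum}} \left( a_{Fi}\, p_{cutter}^{\,m_{Fi}} + F_{fi} \right) = N_{sum} \left( a_F\, p_{cutter}^{\,m_F} + F_f \right) \tag{6}$$
$$T_{ch} = \sum_{i=1}^{N_{sum}} \left( a_{Ti}\, p_{cutter}^{\,m_{Ti}} + T_{fi} \right) = N_{sum} \left( a_T\, p_{cutter}^{\,m_T} + T_f \right) \tag{7}$$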
Power-law fitting is highly sensitive to data fluctuations. Directly fitting the penetration–thrust data during the ascending phase to estimate aF, mF and Ff, or fitting the penetration–torque data to obtain aT, mT, and Tf, can lead to large variations in the power-law coefficients, making it difficult to identify consistent patterns. Experimental results in this study and previous research [17] indicate that rock mass conditions have a relatively small effect on the power-law exponents but significantly influence the coefficients. Therefore, this study fixes the exponents mF and mT, and only fits the coefficients aF, Ff, aT, and Tf to reduce the sensitivity of the power-law model to data variability. Recent studies have similarly emphasized that penetration, thrust, torque, cutter geometry, and rock-mass characteristics should be considered jointly [19,20,21,22].
To identify the optimal fixed exponents, a grid search was performed over defined exponent ranges, using goodness of fit (R²) as the evaluation metric. Linear and logarithmic functions were also included for comparison. The search ranges were set as mF ∈ [0.1, 1.0] and mT ∈ [0.5, 1.5], with a step size of 0.1. Table 6 and Table 7 present the best-performing function types. In the YC and YE project datasets, the penetration–thrust relationship is best modeled by a power function, followed by logarithmic and linear functions. When the exponent is set to 0.4, the number of fits with R² in the range [0.9, 1.0] is the highest; when the exponent is 0.3, the number of poor fits (R² < 0.6) is the lowest. For the penetration–torque relationship, similar trends are observed. Power functions again show the best performance, followed by linear and logarithmic functions. When the exponent exceeds 1.2, the number of fits with R² < 0.6 increases. In contrast, when the exponent lies in the range [0.9, 1.0], the number of fits with R² ∈ [0.9, 1.0] increases. According to standard evaluation criteria, an R² > 0.6 indicates a good fit. Therefore, the optimal exponent is defined as the one that results in the fewest fits with R² < 0.6 and the most with R² ∈ [0.9, 1.0]. Based on this criterion, the optimal exponents for the penetration–thrust and penetration–torque relationships in the YE and YC datasets are determined to be 0.3 and 1.2, respectively. The fitting results for the tunneling cycles in Section 3.1 are shown in Figure 5, all with R² > 0.99, demonstrating excellent fitting performance.
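Once the exponent is fixed, Equations (6) and (7) become linear in the remaining coefficients, so each cycle can be fitted by ordinary least squares on the transformed variable p^m. The sketch below illustrates that idea and the exponent-selection criterion. The function names are hypothetical, and the lexicographic tie-breaking (fewest poor fits first, then most fits with R² ≥ 0.9) is an assumption, since the text does not say how the two counts are weighed against each other.

```python
def fit_fixed_exponent(p, y, m, n_cutters=1):
    """Least-squares fit of y = n_cutters * (a * p**m + f) with the exponent m held fixed.

    Returns (a, f, r2). With x = p**m the model is linear: y = slope * x + intercept.
    """
    x = [v ** m for v in p]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return slope / n_cutters, intercept / n_cutters, 1.0 - ss_res / ss_tot


def select_exponent(r2_by_exponent):
    """Pick the exponent with the fewest fits below 0.6, then the most fits at or above 0.9.

    r2_by_exponent maps each candidate exponent to the list of per-cycle R^2 values.
    The lexicographic ordering of the two counts is an assumed tie-breaking rule.
    """
    def score(m):
        r2s = r2_by_exponent[m]
        poor = sum(r < 0.6 for r in r2s)
        excellent = sum(r >= 0.9 for r in r2s)
        return (poor, -excellent)
    return min(r2_by_exponent, key=score)
```

In use, `fit_fixed_exponent` would be run for every ascending-phase cycle and every candidate exponent on the grid (0.1 to 1.0 for thrust, 0.5 to 1.5 for torque, step 0.1), and the resulting R² lists passed to `select_exponent`.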

4.1.2. Distribution Patterns of Rock-Breaking Indices

The distributions of the rock-breaking indices aF, Ff, aT, and Tf across different surrounding rock classes in the YC and YE projects are shown in Figure 6 and Figure 7. All four indices exhibit approximately normal distributions. With increasing surrounding rock class (i.e., from stronger to weaker rock masses), both aF and aT show a decreasing trend. This indicates that in stronger and more intact rock masses, higher thrust and torque are required by the TBM to achieve the same unit penetration, highlighting the increasing rock-breaking difficulty. Notably, Class V rock demonstrates a significant difference from the other rock classes, which aligns with the distribution patterns observed in the no-load segment. This consistency supports the reliability of the rock-breaking indices derived through power-law fitting. Ff and Tf also tend to decrease with increasing rock class, but their variation across classes is less regular. Additionally, the rock-breaking indices obtained from the YE project are generally higher than those from the YC project. This trend is consistent with the results from the no-load segment, further validating the accuracy of the index extraction and normalization method for the ascending phase.

4.2. Rock-Breaking Indices in Other Phases

In the no-load stage, cutterhead rotation speed remains stable, while penetration rate, thrust, and torque gradually increase. The thrust at the end of this stage reflects the combined force from rock pressure on the shield and the trailing system, while the torque reflects system friction and surrounding rock constraints. These parameters characterize the cutterhead’s mechanical response in the no-load stage. Since the cutters do not contact the rock face during this stage, to enable comparison across different projects, the thrust and torque are normalized by the cutterhead perimeter to define average frictional thrust (FKT) and average frictional torque (TKT), as shown in Equation (8).
$$F_{KT} = \frac{F_{ch}}{2\pi r_{ch}}, \qquad T_{KT} = \frac{T_{ch}}{2\pi r_{ch}} \tag{8}$$
In rock mass classification, key indicators are typically derived from the steady stage, including average cutterhead rotation speed, advance rate, penetration, thrust, and torque. To eliminate the effect of cutter quantity, thrust per cutter and torque per cutter are calculated by dividing thrust and torque by the number of cutters, as shown in Equations (9) and (10). Additionally, the ratio of thrust power (PF) to torque power (PT), referred to as the power ratio (BFT), is also a useful rock-breaking indicator because it reflects the relative contribution of axial thrust work and rotational cutting work. It is calculated using Equations (11)–(13).
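$$F_{one} = \frac{F_{ch}}{N_{sum}} \tag{9}$$
$$T_{one} = \frac{T_{ch}}{N_{sum}} \tag{10}$$
$$P_T = \sum_{i=1}^{N_{sum}} \frac{2\pi r_i\, F_{MRi}\, RPM}{60\, S_{ch}} = \frac{\pi\, T_{ch}\, RPM}{30\, S_{ch}} \tag{11}$$
$$P_F = \sum_{i=1}^{N_{sum}} \frac{F_{MNi}\, APM}{S_{ch}} = \frac{F_{ch}\, APM}{S_{ch}} \tag{12}$$
$$B_{FT} = \frac{30\, F_{ch}\, p_{cutter}}{\pi\, T_{ch}} \tag{13}$$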
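Equations (8)–(13) reduce to a handful of arithmetic operations per cycle, sketched below. The function names are hypothetical, and the sketch assumes that the penetration per revolution equals the advance rate divided by the rotation speed (p_cutter = APM / RPM), which is what makes Equation (13) equal to the ratio of Equation (12) to Equation (11). Input units are also assumed to be mutually consistent, since the text does not list them.

```python
import math


def frictional_indices(f_ch, t_ch, r_ch):
    """Equation (8): no-load thrust and torque normalized by the cutterhead perimeter."""
    perimeter = 2 * math.pi * r_ch
    return f_ch / perimeter, t_ch / perimeter


def per_cutter(f_ch, t_ch, n_sum):
    """Equations (9) and (10): steady-stage thrust and torque per disc cutter."""
    return f_ch / n_sum, t_ch / n_sum


def power_ratio(f_ch, t_ch, apm, rpm):
    """Equation (13): ratio of thrust power to torque power.

    Assumes penetration per revolution p_cutter = apm / rpm.
    """
    p_cutter = apm / rpm
    return 30 * f_ch * p_cutter / (math.pi * t_ch)
```

The cutterhead area S_ch cancels when Equation (12) is divided by Equation (11), which is why it does not appear as an argument of `power_ratio`.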

4.3. Correlation Analysis

Since the knowledge-driven rock-breaking indicators from both the no-load and steady stages involve cutterhead rotation speed, advance rate, penetration, thrust, and torque, high correlations among these indicators may lead to multicollinearity when used as model inputs. This could negatively affect the performance of machine learning models. To address this, Pearson correlation analysis was conducted on the above features, as shown in Figure 8. The results indicate that the correlation coefficients between cutterhead rotation speed and thrust per cutter from the no-load and steady stages exceed 0.8. Therefore, highly correlated features should not be used simultaneously in the construction of base learners.
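One simple way to act on the 0.8 threshold is to walk through the candidate features in a fixed order and keep a feature only if its absolute Pearson coefficient with every feature already kept does not exceed the threshold. The sketch below shows that greedy filter. The greedy, order-dependent rule and the function names are assumptions made for illustration, since the text only states that highly correlated features should not be used together.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def drop_correlated(features, threshold=0.8):
    """Keep features, in insertion order, whose |r| with every kept feature is <= threshold.

    features maps a feature name to its list of per-cycle values.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the result depends on the order of the dictionary, features judged most informative on physical grounds would be listed first so that they are the ones retained.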

5. Validation of Knowledge-Driven Rock-Breaking Indices Based on Surrounding Rock Classification

5.1. Dataset Construction and Evaluation Metrics

In this section, a cross-project training dataset was constructed using all of the excavation segments from Tunnel 6 and part of Tunnel 3 in the YC project, along with all of the excavation segments from TBM1-1 in the YE project. This dataset was then split into a training and validation set at a 4:1 ratio [23]. Two test sets were defined: Test Set 1 includes 200 excavation segments—50 segments each from Class II, III, IV, and V surrounding rocks—randomly selected from Tunnels 3, 4, and 5 in the YC project, to evaluate the model’s performance within the YC project. Test Set 2 comprises all excavation segments from TBM1-2 in the YE project, used to assess the model’s performance across projects. The features used include: no-load phase: cutterhead rotation speed, advance rate, penetration rate, average frictional thrust, and average frictional torque; ascending phase: aF, Ff, aT, and Tf; stable phase: cutterhead rotation speed, advance rate, penetration rate, thrust per cutter, torque per cutter, and power ratio; and others: cutterhead diameter and number of disc cutters. To improve classification performance, some studies simplify the task to binary classification by categorizing Classes II–III as “favorable” and Classes IV–V as “unfavorable” rock conditions, which can still effectively reflect the overall rock condition [6]. The distribution of surrounding rock classes is shown in Figure 9.
The binary classification performance was evaluated using the following metrics [24]: precision (PRE), recall (REC), accuracy (ACC), F1-score, and area under the ROC curve (AUC). Since the original dataset contains Grades II–V and some grades are strongly imbalanced in the YE project, full multi-class classification is reserved for future work with more balanced samples, while the original four-class distribution is retained in Figure 9 for completeness.
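The binary relabeling and the count-based metrics can be written out directly, as in the sketch below. Treating the "unfavorable" group (Classes IV–V) as the positive class is an assumption, as are the function names; the text does not state which group is positive. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def to_binary(rock_class):
    """Map Classes II-III to 0 (favorable) and Classes IV-V to 1 (unfavorable)."""
    return {"II": 0, "III": 0, "IV": 1, "V": 1}[rock_class]


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, accuracy, and F1-score computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"PRE": pre, "REC": rec, "ACC": (tp + tn) / len(pairs), "F1": f1}
```

With imbalanced classes, as in the YE project, accuracy alone can look high even when the minority class is poorly recognized, which is why precision, recall, and F1 are reported alongside it.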

5.2. Verification of the Effectiveness of Knowledge-Driven Features

To demonstrate the effectiveness of the proposed knowledge-driven indicators for cross-project classification of rock mass quality, and to evaluate different combinations of indicators from various tunneling phases, this study builds comparative models using a combined dataset from the YC and YE projects. As established by the Pearson correlation analysis in Section 4.3 (Figure 8), the indicators from the no-load and steady phases share the same underlying signals, and the correlation coefficients between cutterhead rotation speed and thrust per cutter from these two phases exceed 0.8. Highly correlated features are therefore not used simultaneously when constructing base learners.
Based on the above analysis, six models were developed using different feature combinations:
Model 1 (knowledge-driven indicators from the no-load phase): Features include cutterhead rotation speed, advance rate, penetration, average frictional thrust, and average frictional torque during the no-load phase, as well as cutterhead diameter and number of cutters.
Model 2 (knowledge-driven indicators from the penetration-increasing phase): Features include cutterhead rotation speed, aF, Ff, aT, and Tf during the penetration-increasing phase, along with cutterhead diameter and number of cutters.
Model 3 (knowledge-driven indicators from the steady phase): Features include advance rate, cutterhead rotation speed, penetration, thrust per cutter, and power ratio during the steady phase, plus cutterhead diameter and number of cutters.
Model 4 (baseline model using four commonly used indicators): Features include cutterhead rotation speed, advance rate, cutterhead thrust, and torque during the steady phase, as well as cutterhead diameter and number of cutters.
Model 5 (knowledge-driven indicators from the no-load and penetration-increasing phases combined): Features include the same no-load phase indicators as in Model 1 and aF, Ff, aT, and Tf from the penetration-increasing phase, along with cutterhead diameter and number of cutters.
Model 6 (all knowledge-driven indicators): Features include all indicators from the no-load phase (Model 1), the penetration-increasing phase (Model 2), and selected steady-phase indicators (Model 3), excluding highly correlated ones, together with cutterhead diameter and number of cutters.
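The six feature combinations can be written down as a small configuration, sketched below. The column names (for example `rpm_no_load` or `a_F`) are hypothetical identifiers, not names from the authors' dataset, and the features excluded from Model 6 are passed in as an argument because the text does not list which correlated features were dropped.

```python
# Hypothetical column names; the source describes the indicators only in prose.
NO_LOAD = ["rpm_no_load", "advance_rate_no_load", "penetration_no_load",
           "friction_thrust_mean", "friction_torque_mean"]
RISING = ["a_F", "F_f", "a_T", "T_f"]  # power-function fit coefficients
STABLE = ["advance_rate_stable", "rpm_stable", "penetration_stable",
          "thrust_per_cutter", "power_ratio"]
BASELINE = ["rpm_stable", "advance_rate_stable", "cutterhead_thrust",
            "cutterhead_torque"]
MACHINE = ["cutterhead_diameter", "n_cutters"]

FEATURE_SETS = {
    1: NO_LOAD + MACHINE,                   # no-load phase
    2: ["rpm_rising"] + RISING + MACHINE,   # penetration-increasing phase
    3: STABLE + MACHINE,                    # steady phase
    4: BASELINE + MACHINE,                  # conventional baseline
    5: NO_LOAD + RISING + MACHINE,          # no-load + penetration-increasing
}


def model6_features(excluded):
    """Union of the Model 1-3 indicators minus the caller-supplied exclusions."""
    excluded = set(excluded)
    merged = NO_LOAD + ["rpm_rising"] + RISING + STABLE + MACHINE
    return [name for name in merged if name not in excluded]


# Illustrative exclusion only: which features were actually removed is not stated.
FEATURE_SETS[6] = model6_features(excluded=["rpm_stable"])
```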
Based on the six feature sets, ten widely used classification algorithms were employed to evaluate surrounding rock quality recognition: four statistical learning methods (Logistic Regression (LG), Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN)); five ensemble learning methods (Random Forest (RF), LightGBM (LGBM), AdaBoost, Gradient Boosting Decision Tree (GBDT), and XGBoost); and one deep learning method (Long Short-Term Memory, LSTM). A total of 60 models (6 feature sets × 10 algorithms) were constructed, their hyperparameters were tuned via grid search, and the models were evaluated on the validation dataset. Table 8 summarizes the best-performing model for each feature set along with its performance metrics. All optimal models are based on ensemble learning algorithms (RF, LGBM, or XGBoost), indicating their superior capability in classifying surrounding rock quality. Across the six feature sets, the optimal models achieved an accuracy (ACC) of at least 0.975, precision (PRE) of at least 0.879, recall (REC) of at least 0.824, F1-score of at least 0.851, and area under the ROC curve (AUC) of at least 0.990. These findings suggest that conventional classification models can effectively extract key data features, although their predictive performance can still be improved.
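The 6 × 10 comparison amounts to a loop over every (feature set, algorithm) pair that keeps the best validation score per feature set. The sketch below shows only that bookkeeping: `evaluate` is a hypothetical caller-supplied function standing in for training and grid-searching one model, the selection metric is an assumption (the text does not say which metric decided the best model), and `dummy_evaluate` returns made-up numbers, not results from the paper.

```python
from itertools import product

FEATURE_SET_IDS = [1, 2, 3, 4, 5, 6]
ALGORITHMS = ["LG", "DT", "SVM", "KNN", "RF",
              "LGBM", "AdaBoost", "GBDT", "XGBoost", "LSTM"]


def best_per_feature_set(evaluate):
    """Return {feature_set_id: (algorithm, score)} with the top score per set.

    `evaluate(feature_set_id, algorithm)` must train and tune one model and
    return its validation score (higher is better).
    """
    best = {}
    for fs_id, algo in product(FEATURE_SET_IDS, ALGORITHMS):
        score = evaluate(fs_id, algo)
        if fs_id not in best or score > best[fs_id][1]:
            best[fs_id] = (algo, score)
    return best


def dummy_evaluate(fs_id, algo):
    """Placeholder scores for demonstration only; not values from the paper."""
    return 0.5 + 0.01 * fs_id + 0.001 * ALGORITHMS.index(algo)


grid = list(product(FEATURE_SET_IDS, ALGORITHMS))  # 60 combinations
best = best_per_feature_set(dummy_evaluate)
```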
A comparison of the evaluation metrics in Table 8 shows that the model built on knowledge-driven features from the no-load phase alone (Model 1) attains an F1-score of 0.851 and an AUC of 0.990. These values are slightly below those of the conventional feature-based baseline (Model 4; F1-score 0.874, AUC 0.992), which indicates that the no-load features by themselves carry a certain, but limited, classification capability. The models based on knowledge-driven features from the penetration-increasing phase (Model 2) and the steady phase (Model 3) perform better than Model 1, suggesting that the features from these phases are more informative; Model 2 also exceeds the baseline in F1-score (0.898) and AUC (0.996). Models incorporating multi-phase knowledge-driven rock-breaking features (Models 5 and 6) achieved higher F1-scores and AUC values than all single-phase models (Models 1–4). Notably, the model constructed using all knowledge-driven features (Model 6) achieved the highest F1-score (0.917) and AUC (0.998) among the six models, a relative improvement of 4.9% in F1-score and 0.6% in AUC over the conventional steady-phase baseline (Model 4). These results indicate that model performance improves as the feature set expands, particularly with the inclusion of penetration-increasing phase indicators. This supports the validity and advantage of the proposed knowledge-driven rock-breaking indices for cross-project surrounding rock quality recognition.
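As a quick check, the quoted gains are consistent with relative improvements computed from the F1 and AUC columns of Table 8 for Models 6 and 4. Reading them as relative rather than absolute differences is an assumption, since the text does not state which was intended.

```latex
\[
\frac{F1_{6} - F1_{4}}{F1_{4}} = \frac{0.917 - 0.874}{0.874} \approx 0.049,
\qquad
\frac{AUC_{6} - AUC_{4}}{AUC_{4}} = \frac{0.998 - 0.992}{0.992} \approx 0.006 .
\]
```

The corresponding absolute differences are 0.043 in F1-score and 0.006 in AUC.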
To further investigate the generalization capability of machine learning models on cross-project datasets, two additional models were constructed using all knowledge-driven rock-breaking features: one trained solely on the YC project dataset (Model 7) and the other solely on the YE project dataset (Model 8). These models were compared with the cross-project model (Model 6), which was trained on the combined dataset of both projects. The evaluation metrics for Models 6–8 are summarized in Table 8. The F1-score of the cross-project model (Model 6, 0.917) lies between those of the single-project models (0.928 for Model 7 and 0.792 for Model 8). Model 7 (YC-based) achieved the highest F1-score and recall, which is attributed to the more balanced distribution of surrounding rock quality categories in the YC dataset. In contrast, Model 8 (YE-based) reached the highest accuracy (0.996) but the lowest recall (0.704), primarily because of severe class imbalance in the YE dataset: only 2.4% of the samples belong to poor-quality surrounding rock. Overall, the cross-project model maintained performance comparable to the single-project models, indicating that integrating datasets from different projects did not compromise the learning ability of the machine learning model. This suggests that the proposed approach is suitable for cross-project classification tasks in TBM tunneling operations. Given the data used here, the method is expected to be most effective for open-type hard-rock TBM projects with stable sensor acquisition, identifiable excavation cycles, and sufficient samples covering the main surrounding rock classes.
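The effect of class imbalance on accuracy can be made concrete with a short calculation. The sketch below uses only the 2.4% share of poor-quality samples quoted above and the precision and recall values reported for Models 6–8 in Table 8; the helper names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(minority_fraction):
    """Accuracy of a trivial classifier that always predicts the majority class."""
    return 1.0 - minority_fraction


# With 2.4% poor-quality samples, predicting "favorable" everywhere is already
# 97.6% accurate while detecting no poor-quality rock at all (recall = 0).
trivial_acc = majority_baseline_accuracy(0.024)

# F1 recomputed from the reported (PRE, REC) pairs of Models 6, 7 and 8.
reported = {6: (0.925, 0.909), 7: (0.918, 0.939), 8: (0.905, 0.704)}
f1_check = {model: round(f1_score(p, r), 3) for model, (p, r) in reported.items()}
```

This is why the very high accuracy of the YE-only model says little on its own, and why F1-score and recall are the more informative metrics for the minority class.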

6. Conclusions

Based on prior knowledge of the mechanical interaction between the cutterhead and the surrounding rock, this paper proposes a new excavation phase division algorithm that uses time-domain and penetration-depth features. Using data from six tunnels in the YC and YE projects, a total of 12,598 excavation cycles were analyzed. For the thrust- and torque-driven rock-breaking indicators, this study introduces ascending-phase indicators obtained by fitting power functions of penetration depth to cutterhead thrust and to cutterhead torque. Through optimization, the optimal exponents of penetration depth in the thrust and torque relationships were determined to be 0.3 and 1.2, respectively. Validation with data from the YC and YE projects confirms the effectiveness of the proposed data processing and indicator construction methods.
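To illustrate how an ascending-phase indicator of the form aF·p^0.3 + Ff (or aT·p^1.2 + Tf for torque) might be extracted from one excavation cycle, the sketch below holds the exponent fixed and fits the two coefficients by ordinary least squares. The least-squares choice, the function name, and the synthetic data are assumptions; the fitting procedure itself is not described in this section.

```python
def fit_power_indicator(penetration, response, exponent):
    """Fit response ≈ a * penetration**exponent + b with a fixed exponent.

    Returns (a, b, r2). Penetration values must be positive.
    """
    x = [p ** exponent for p in penetration]
    n = len(x)
    mx, my = sum(x) / n, sum(response) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("penetration values must not all be equal")
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, response)) / sxx
    b = my - a * mx
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, response))
    ss_tot = sum((yi - my) ** 2 for yi in response)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r2


# Synthetic ascending-phase data (made-up values, no noise).
penetration = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
thrust = [120.0 * p ** 0.3 + 40.0 for p in penetration]
a_F, F_f, r2_thrust = fit_power_indicator(penetration, thrust, exponent=0.3)
```

Because the exponent is fixed, the problem is linear in the transformed variable p^exponent, so no iterative optimizer is needed; comparing R² across candidate exponents is one way the exponent itself could be selected.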
The innovation of this study lies in an indicator construction method that jointly considers multidimensional TBM operating data and their interrelationships. Combined with machine learning models, the proposed indicators improved the F1-score of binary surrounding rock classification by 4.9% over the conventional steady-phase baseline on the validation set. The findings provide new insights into intelligent TBM perception and decision-making and offer theoretical and technical support for data-driven management and optimization in tunnel construction. The method has good prospects for engineering application, especially in tunnel projects under complex geological conditions, where it may help improve excavation efficiency and reduce construction risks.
The proposed method has several limitations. It depends on reliable sensor data and accurate synchronization of thrust, torque, advance rate, and cutterhead rotation signals. Its applicability to shielded TBMs, highly fractured strata, or extremely heterogeneous geological conditions still requires further validation. Future work will focus on online adaptation, integration with additional signals, and more balanced multi-class datasets.

Author Contributions

Y.G.: Conceptualization, Resources, Writing (review and editing). H.S.: Data curation, Writing, Formal analysis, and Investigation. H.X.: Data curation, Writing (original draft). X.Z.: Software. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (Grant No. 52578481), the Science and Technology Research Project of Hebei Higher Education Institutions (Project No. JZX2024010), the Research on Long-Term Treatment Technologies and Applications for Defects in Heavy-Haul Railway Tunnels during Operation (Project No. SHTL-24-24), and the Hebei Provincial Science and Technology Program (Project No. 25365406D). The authors gratefully acknowledge the financial support of the above-mentioned organizations.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Acknowledgments

The authors would like to thank the mining company for providing monitoring data.

Conflicts of Interest

Author Haokai Sun was employed by Hebei Transportation Investment Group Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Chen, Z.; Zhang, Y.; Li, J.; Li, X.; Jing, L. Diagnosing tunnel collapse sections based on TBM tunneling big data and deep learning: A case study on the Yinsong Project, China. Tunn. Undergr. Space Technol. 2021, 108, 103700.
  2. Xu, C.; Liu, X.; Wang, E.; Wang, S. Prediction of tunnel boring machine operating parameters using various machine learning algorithms. Tunn. Undergr. Space Technol. 2021, 109, 103699.
  3. Zhang, Q.; Liu, Z.; Tan, J. Prediction of geological conditions for a tunnel boring machine using big operational data. Autom. Constr. 2019, 100, 73–83.
  4. Li, J.-B.; Chen, Z.-Y.; Li, X.; Jing, L.-J.; Zhang, Y.-P.; Xiao, H.-H.; Wang, S.-J.; Yang, W.-K.; Wu, L.-J.; Li, P.-Y.; et al. Feedback on a shared big dataset for intelligent TBM Part I: Feature extraction and machine learning methods. Undergr. Space 2023, 11, 1–25.
  5. Hou, S.; Liu, Y.; Yang, Q. Real-time prediction of rock mass classification based on TBM operation big data and stacking technique of ensemble learning. J. Rock Mech. Geotech. Eng. 2022, 14, 123–143.
  6. Zhang, Y.P.; Chen, Z.Y.; Jin, F.; Jing, L.J.; Xing, H.; Li, P.Y. Cross-project prediction for rock mass using shuffled TBM big dataset and knowledge-based machine learning methods. Sci. China-Technol. Sci. 2023, 66, 751–770.
  7. Wang, X.; Zhu, H.; Zhu, M.; Zhang, L.; Ju, J.W. An integrated parameter prediction framework for intelligent TBM excavation in hard rock. Tunn. Undergr. Space Technol. 2021, 118, 104196.
  8. Li, J.; Guo, D.; Chen, Z.; Li, X.; Li, Z. Transfer learning for collapse warning in TBM tunneling using databases in China. Comput. Geotech. 2024, 166, 105968.
  9. Li, J.; Li, P.; Guo, D.; Li, X.; Chen, Z. Advanced prediction of tunnel boring machine performance based on big data. Geosci. Front. 2021, 12, 331–338.
  10. Gao, X.; Shi, M.; Song, X.; Zhang, C.; Zhang, H. Recurrent neural networks for real-time prediction of TBM operating parameters. Autom. Constr. 2019, 98, 225–235.
  11. Liu, Z.; Li, L.; Fang, X.; Qi, W.; Shen, J.; Zhou, H.; Zhang, Y. Hard-rock tunnel lithology prediction with TBM construction big data using a global-attention-mechanism-based LSTM network. Autom. Constr. 2021, 125, 103647.
  12. Li, X.; Wu, L.-J.; Wang, Y.-J.; Li, J.-H. Rock fragmentation indexes reflecting rock mass quality based on real-time data of TBM tunnelling. Sci. Rep. 2023, 13, 10420.
  13. Zhu, M.; Gutierrez, M.; Zhu, H.; Ju, J.W.; Sarna, S. Performance Evaluation Indicator (PEI): A new paradigm to evaluate the competence of machine learning classifiers in predicting rockmass conditions. Adv. Eng. Inform. 2021, 47, 101232.
  14. Wang, S.; Wang, Y.; Li, X.; Liu, L.; Xing, H.; Zhang, Y. Big Data-Based Boring Indexes and Their Application during TBM Tunneling. Adv. Civ. Eng. 2021, 2021, 2621931.
  15. Jing, L.-J.; Li, J.-B.; Yang, C.; Chen, S.; Zhang, N.; Peng, X.-X. A case study of TBM performance prediction using field tunnelling tests in limestone strata. Tunn. Undergr. Space Technol. 2019, 83, 364–372.
  16. Li, X.; Yao, M.; Yuan, J.-D.; Wang, Y.-J.; Li, P.-Y. Deep learning characterization of rock conditions based on tunnel boring machine data. Undergr. Space 2023, 12, 89–101.
  17. Geng, Q.; He, F.; Ma, M.; Liu, X.; Wang, X.; Zhang, Z.; Ye, M. Application of Full-Scale Experimental Cutterhead System to Study Penetration Performance of Tunnel Boring Machines (TBMs). Rock Mech. Rock Eng. 2022, 55, 4673–4696.
  18. Sun, H.K.; Gao, Y.; Yang, C.Y. Full-Scale Rotary Cutting Experimental Study and Development of Prediction Formulas for TBM Cutting Force. Arab. J. Sci. Eng. 2023, 48, 13353–13376.
  19. Liu, H.; Yang, J.; Huang, F.; Shi, X.; Zhu, J. Experimental study on rock-breaking effect of TBM disc cutters in different lithological types. Int. J. Rock Mech. Min. Sci. 2021, 145, 104832.
  20. Zhang, H.; Hou, X.; Xia, Y.; Li, T. Experimental study on rock-breaking characteristics of tunnel boring machine disc cutters under variable penetration depths. Buildings 2024, 14, 1719.
  21. Shin, Y.J.; Kim, K.; Lee, C.; Yagiz, S. Performance evaluation of a full-scale rotary cutting machine on granite with a single disc cutter. Eng. Comput. 2024, 41, 155–182.
  22. Li, Z.; Dai, L.; Liu, H.; Tan, Y.; Li, X.; Zhang, X. A novel rotary cutting machine equipped with sensors to analyze rock breakage and disc cutter performance. Sensors 2024, 24, 6320.
  23. Ma, H.; Zhou, J.; Abbas, S.M.; Zhang, L.; Deng, X.; Khandelwal, M. Automated rock mass classification using a deep convolutional neural network and long short-term memory network with a new high-quality database. Comput. Geotech. 2024, 175, 106695.
  24. Luque, A.; Carrasco, A.; Martín, A.; Ana, D.L.H. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit. 2019, 91, 216–231.
Figure 1. Tunnel 6 in the YC project.
Figure 2. YE project.
Figure 3. Parameter characteristics during a tunneling cycle from the perspectives of time and penetration.
Figure 4. Data denoising and identification of the start and end points of the ascending phase.
Figure 5. Fitting results of penetration–thrust and penetration–torque.
Figure 6. Data distribution of power function fitting coefficients for the ascending phase in the YC project.
Figure 7. Data distribution of power function fitting coefficients for the ascending phase in the YE project.
Figure 8. Correlation analysis of rock-breaking indicators in the no-load and stable phases.
Figure 9. Proportion of rock grades in four-class and two-class distributions in the YC project and YE project. (a) YC project: entire section of Tunnel 6 and part of Tunnel 3. (b) YE project: TBM1-1 tunneling section. (c) YC project: tunneling sections in Test Set 1. (d) YE project: tunneling sections in Test Set 2.
Table 1. Main parameters of TBMs in the YC project.

| Feature Parameter | Value | Feature Parameter | Value |
|---|---|---|---|
| ID | CREC 667/668 | Max Thrust Speed (mm/min) | 120 |
| TBM Type | Open | Max Cutterhead Speed (r/min) | 11.45 |
| Cutterhead Diameter (mm) | 5200 | Number of Disc Cutters | 34 |
| Rated Cutterhead Thrust (kN) | 11,340 | Max Advancement (mm) | 1800 |
| Rated Cutterhead Torque (kN·m) | 3340 | Avg Cutter Spacing (mm) | 70 |
Table 2. Lengths of each rock mass grade in the YC project.

| Tunnel | TBM | Grade II/m | Grade III/m | Grade IV/m | Grade V/m | Sum/m |
|---|---|---|---|---|---|---|
| 3 | CREC667 | 0.00 | 0.00 | 0.00 | 301.52 | 301.52 |
| 4 | CREC668 | 66.93 | 66.08 | 0.00 | 0.00 | 133.00 |
| 5 | CREC668 | 0.00 | 4.61 | 67.39 | 0.00 | 72.00 |
| 6 | CREC667 | 1157.76 | 2619.43 | 365.23 | 10.26 | 4152.68 |
| Sum/m | \ | 1224.69 | 2690.12 | 432.61 | 311.78 | 4659.20 |
Table 3. Main parameters of the TBM in the YE project.

| Feature Parameter | Value | Feature Parameter | Value |
|---|---|---|---|
| TBM Type | Open Type | Cutterhead Drive Power (kW) | 2100 |
| Cutterhead Diameter (mm) | 7030 | Rated Cutterhead Torque (kN·m) | 4410 |
| Number of Disc Cutters | 48 | Cutterhead Unjamming Torque (kN·m) | 6620 |
| Max Cutterhead Speed (r/min) | 10.6 | Rated Cutterhead Thrust (kN) | 23,562 |
Table 4. Lengths of each rock mass grade in the YE project.

| Tunnel | Grade II/m | Grade III/m | Grade IV/m | Grade V/m | Sum/m |
|---|---|---|---|---|---|
| TBM1-1 | 7623.22 | 736.46 | 56.25 | 58.12 | 8474.05 |
| TBM1-2 | 5073.97 | 199.57 | 79.27 | 60.84 | 5413.65 |
| Sum/m | 12,697.19 | 936.03 | 135.52 | 118.96 | 13,887.71 |
Table 5. Rock mass classification and number of tunneling cycles at the YC and YE projects.

| Project | Tunnel | Grade II Length/m | Grade II Num | Grade III Length/m | Grade III Num | Grade IV Length/m | Grade IV Num | Grade V Length/m | Grade V Num |
|---|---|---|---|---|---|---|---|---|---|
| YC | 3 | 0.00 | 0 | 0.00 | 0 | 0.00 | 0 | 301.52 | 392 |
| YC | 4 | 66.93 | 50 | 66.08 | 46 | 0.00 | 0 | 0.00 | 0 |
| YC | 5 | 0.00 | 0 | 4.61 | 4 | 67.39 | 50 | 0.00 | 0 |
| YC | 6 | 1157.76 | 725 | 2619.43 | 1704 | 365.23 | 342 | 10.26 | 18 |
| YE | TBM1-1 | 7623.22 | 5096 | 736.46 | 543 | 56.25 | 66 | 58.12 | 71 |
| YE | TBM1-2 | 5073.97 | 3178 | 199.57 | 150 | 79.27 | 90 | 60.84 | 73 |
Table 6. Fitting results of cutterhead thrust and penetration in the ascending phase. The last five columns give the distribution of R² (%).

| Function Type | Fitted Function | (0.9, 1.0] | (0.8, 0.9] | (0.7, 0.8] | (0.6, 0.7] | <0.6 |
|---|---|---|---|---|---|---|
| Power function | $a_F \cdot (p_{cutter})^{0.4} + F_f$ | 63.68 | 17.94 | 6.63 | 3.30 | 8.36 |
| Power function | $a_F \cdot (p_{cutter})^{0.3} + F_f$ | 63.53 | 18.20 | 6.56 | 3.36 | 8.27 |
| Power function | $a_F \cdot (p_{cutter})^{0.5} + F_f$ | 63.51 | 17.89 | 6.68 | 3.37 | 8.45 |
| Power function | $a_F \cdot (p_{cutter})^{0.6} + F_f$ | 63.12 | 18.03 | 6.71 | 3.41 | 8.62 |
| Power function | $a_F \cdot (p_{cutter})^{0.2} + F_f$ | 63.11 | 18.48 | 6.57 | 3.41 | 8.32 |
| Logarithmic function | $a_F \cdot (\log_{0.1} p_{cutter}) + F_f$ | 61.36 | 20.06 | 6.84 | 3.27 | 8.38 |
| Linear function | $a_F \cdot p_{cutter} + F_f$ | 59.38 | 19.99 | 7.45 | 3.95 | 9.13 |
Table 7. Fitting results of cutterhead torque and penetration in the ascending phase. The last five columns give the distribution of R² (%).

| Function Type | Fitted Function | (0.9, 1.0] | (0.8, 0.9] | (0.7, 0.8] | (0.6, 0.7] | <0.6 |
|---|---|---|---|---|---|---|
| Power function | $a_T \cdot (p_{cutter})^{1.4} + T_f$ | 91.40 | 4.93 | 1.47 | 0.49 | 1.52 |
| Power function | $a_T \cdot (p_{cutter})^{1.3} + T_f$ | 91.26 | 5.09 | 1.43 | 0.51 | 1.50 |
| Power function | $a_T \cdot (p_{cutter})^{1.2} + T_f$ | 91.22 | 5.11 | 1.45 | 0.54 | 1.47 |
| Power function | $a_T \cdot (p_{cutter})^{1.1} + T_f$ | 91.13 | 5.07 | 1.56 | 0.56 | 1.47 |
| Power function | $a_T \cdot (p_{cutter})^{1.0} + T_f$ | 91.12 | 5.07 | 1.49 | 0.70 | 1.47 |
| Linear function | $a_T \cdot p_{cutter} + T_f$ | 91.08 | 5.07 | 1.49 | 0.65 | 1.47 |
| Logarithmic function | $a_T \cdot (\log_{1} p_{cutter}) + T_f$ | 78.86 | 14.46 | 3.08 | 1.35 | 2.08 |
Table 8. Evaluation results of models with different stages and features on the validation set.

| Model ID | Feature Stage | Dataset | Best Model Type | ACC | PRE | REC | F1 | AUC |
|---|---|---|---|---|---|---|---|---|
| 1 | Idle Stage | YC & YE Projects | RF | 0.975 | 0.879 | 0.824 | 0.851 | 0.990 |
| 2 | Rising Stage | YC & YE Projects | LGBM | 0.983 | 0.899 | 0.897 | 0.898 | 0.996 |
| 3 | Stable Stage | YC & YE Projects | XGBoost | 0.979 | 0.891 | 0.857 | 0.874 | 0.993 |
| 4 | Stable Stage (Control) | YC & YE Projects | RF | 0.979 | 0.904 | 0.847 | 0.874 | 0.992 |
| 5 | Idle + Rising Stages | YC & YE Projects | XGBoost | 0.983 | 0.910 | 0.896 | 0.903 | 0.997 |
| 6 | All Stages | YC & YE Projects | LGBM | 0.986 | 0.925 | 0.909 | 0.917 | 0.998 |
| 7 | All Stages | YC Project | LGBM | 0.967 | 0.918 | 0.939 | 0.928 | 0.995 |
| 8 | All Stages | YE Project | RF | 0.996 | 0.905 | 0.704 | 0.792 | 0.978 |
Citation

Sun, H.; Gao, Y.; Xu, H.; Zheng, X. Knowledge-Driven Method for Constructing TBM Rock-Breaking Indexes. Appl. Sci. 2026, 16, 5950. https://doi.org/10.3390/app16125950