5.1. Dataset Construction and Evaluation Metrics
A cross-project training dataset was constructed from all excavation segments of Tunnel 6 and part of Tunnel 3 in the YC project, together with all excavation segments of TBM1-1 in the YE project. This dataset was then split into training and validation sets at a 4:1 ratio [23]. Two test sets were defined. Test Set 1 contains 200 excavation segments (50 each from Class II, III, IV, and V surrounding rock) randomly selected from Tunnels 3, 4, and 5 in the YC project, and is used to evaluate the model’s performance within the YC project. Test Set 2 comprises all excavation segments from TBM1-2 in the YE project and is used to assess the model’s performance across projects. The input features are grouped as follows: no-load phase: cutterhead rotation speed, advance rate, penetration rate, average frictional thrust, and average frictional torque; ascending (penetration-increasing) phase: aF, Ff, aT, and Tf; stable (steady) phase: cutterhead rotation speed, advance rate, penetration rate, thrust per cutter, torque per cutter, and power ratio; and others: cutterhead diameter and number of disc cutters. To improve classification performance, some studies simplify the task to binary classification by grouping Classes II–III as “favorable” and Classes IV–V as “unfavorable” rock conditions, which still reflects the overall rock condition effectively [6]. The distribution of surrounding rock classes is shown in Figure 9.
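The label grouping and the 4:1 split described above can be sketched as follows. This is a minimal illustration only: the segment records are hypothetical, and a simple seeded random shuffle is assumed, since the text does not state whether the split was random or stratified.

```python
import random

# Binary grouping described in the text: Classes II-III are "favorable",
# Classes IV-V are "unfavorable".
BINARY_LABEL = {
    "II": "favorable",
    "III": "favorable",
    "IV": "unfavorable",
    "V": "unfavorable",
}


def to_binary(rock_class):
    """Map a surrounding rock class (II-V) to its binary label."""
    return BINARY_LABEL[rock_class]


def split_train_val(segments, val_fraction=0.2, seed=0):
    """Shuffle the segments and split them 4:1 into training and validation.

    Assumption: a plain random split; the source does not specify the scheme.
    """
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical (segment_id, rock_class) records, for illustration only.
segments = [(i, ["II", "III", "IV", "V"][i % 4]) for i in range(100)]
train, val = split_train_val(segments)
train_labels = [to_binary(rock_class) for _, rock_class in train]
```

If the class proportions must be preserved in both subsets, the same split can be applied per class and the parts concatenated.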
The binary classification performance was evaluated using the following metrics [24]: precision (PRE), recall (REC), accuracy (ACC), F1-score, and area under the ROC curve (AUC). Because the original dataset contains Classes II–V and some classes are strongly imbalanced in the YE project, full multi-class classification is reserved for future work with more balanced samples; the original four-class distribution is retained in Figure 9 for completeness.
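For reference, the five metrics can be computed from predictions as in the sketch below. It uses the standard definitions; the choice of which class is treated as positive (encoded here as 1) is an assumption, as are the toy labels and scores.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return PRE, REC, ACC, and F1 from true and predicted binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    acc = (tp + tn) / len(pairs)
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"PRE": pre, "REC": rec, "ACC": acc, "F1": f1}


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive sample is scored above
    a random negative sample (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC is quadratic in the sample count, which is acceptable for a sketch but not for large datasets.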
5.2. Verification of the Effectiveness of Knowledge-Driven Features
To demonstrate the effectiveness of the proposed knowledge-driven indicators for cross-project classification of rock mass quality, and to evaluate different combinations of indicators from the various tunneling phases, comparative models were built using a combined dataset from the YC and YE projects. Because the knowledge-driven indicators from both the no-load and steady phases involve cutterhead rotation speed, advance rate, penetration, thrust, and torque, high correlations among these features may lead to multicollinearity when they are used together as model inputs, which can degrade machine learning performance. Pearson correlation analysis was therefore conducted on these features, as shown in Figure 8. The results indicate that the correlation coefficients between cutterhead rotation speed and thrust per cutter from the no-load and steady phases exceed 0.8. As a result, highly correlated features should not be used simultaneously when constructing base learners.
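A correlation screen of this kind can be sketched as follows. The feature names and values are hypothetical and serve only to show how pairs with an absolute Pearson coefficient above 0.8 would be flagged; how the flagged pairs are then pruned is not specified in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero standard deviation).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlated_pairs(features, threshold=0.8):
    """List feature pairs whose absolute correlation exceeds the threshold."""
    names = sorted(features)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(features[first], features[second])
            if abs(r) > threshold:
                flagged.append((first, second, r))
    return flagged


# Hypothetical feature columns, for illustration only.
features = {
    "rpm_no_load": [5.0, 5.5, 6.0, 6.5, 7.0],
    "rpm_steady": [5.1, 5.4, 6.2, 6.4, 7.1],
    "power_ratio": [0.30, 0.10, 0.40, 0.20, 0.25],
}

pairs = correlated_pairs(features)
```

In this toy data only the two rotation-speed columns are flagged, so one of them would be dropped before training.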
Based on the above analysis, six models were developed using different feature combinations.
Model 1 (knowledge-driven indicators from the no-load phase): Features include cutterhead rotation speed, advance rate, penetration, average frictional thrust, and average frictional torque during the no-load phase, as well as cutterhead diameter and number of cutters.
Model 2 (knowledge-driven indicators from the penetration-increasing phase): Features include cutterhead rotation speed, aF, Ff, aT, and Tf during the penetration-increasing phase, along with cutterhead diameter and number of cutters.
Model 3 (knowledge-driven indicators from the steady phase): Features include advance rate, cutterhead rotation speed, penetration, thrust per cutter, and power ratio during the steady phase, plus cutterhead diameter and number of cutters.
Model 4 (baseline model using four commonly used indicators): Features include cutterhead rotation speed, advance rate, cutterhead thrust, and torque during the steady phase, as well as cutterhead diameter and number of cutters.
Model 5 (knowledge-driven indicators from the no-load and penetration-increasing phases combined): Features include the same no-load phase indicators as in Model 1 and aF, Ff, aT, and Tf from the penetration-increasing phase, along with cutterhead diameter and number of cutters.
Model 6 (all knowledge-driven indicators): Features include all indicators from the no-load phase (Model 1), the penetration-increasing phase (Model 2), and selected steady-phase indicators (Model 3), excluding highly correlated ones, together with cutterhead diameter and number of cutters.
Based on the six proposed feature sets, ten widely used classification algorithms were employed to evaluate the effectiveness of surrounding rock quality recognition. These algorithms include four statistical learning methods—Logistic Regression (LG), Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN); five ensemble learning methods—Random Forest (RF), LightGBM (LGBM), AdaBoost, Gradient Boosting Decision Tree (GBDT), and XGBoost; and one deep learning method—Long Short-Term Memory (LSTM). A total of 60 models (6 feature sets × 10 algorithms) were constructed, and optimal hyperparameters were determined via grid search. The models were then evaluated on the validation dataset.
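The experimental design can be outlined as a Cartesian product of feature sets and algorithms, with an exhaustive grid search per combination. The sketch below is an assumption-laden outline rather than the authors' implementation: the hyperparameter grid and the scoring function are hypothetical stand-ins for training a real model and scoring it on the validation set.

```python
from itertools import product

FEATURE_SETS = ["Model 1", "Model 2", "Model 3", "Model 4", "Model 5", "Model 6"]
ALGORITHMS = ["LG", "DT", "SVM", "KNN", "RF", "LGBM", "AdaBoost", "GBDT",
              "XGBoost", "LSTM"]

# 6 feature sets x 10 algorithms = 60 candidate models.
experiments = list(product(FEATURE_SETS, ALGORITHMS))


def grid_search(param_grid, score_fn):
    """Score every hyperparameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the source does not report the searched values.
toy_grid = {"n_estimators": [100, 200, 400], "max_depth": [3, 5, 7]}


def toy_score(params):
    """Stand-in for a validation score; peaks at 200 trees and depth 5."""
    return (-abs(params["n_estimators"] - 200) / 1000
            - abs(params["max_depth"] - 5) / 10)


best_params, best_score = grid_search(toy_grid, toy_score)
```

In practice `toy_score` would be replaced by a function that fits the chosen algorithm on the training set and returns, for example, the validation F1-score.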
Table 8 summarizes the best-performing model for each feature set along with its performance metrics. The results demonstrate that all optimal models are based on ensemble learning algorithms, indicating their superior capability in classifying surrounding rock quality. All optimal models achieved an accuracy (ACC) above 0.975, precision (PRE) above 0.879, recall (REC) above 0.824, F1-score above 0.851, and area under the ROC curve (AUC) above 0.990. These findings suggest that traditional classification models can effectively extract key data features; however, further improvement is still possible in terms of predictive performance.
A comparison of the evaluation metrics across the six models shows that the model built on knowledge-driven features from the no-load phase (Model 1) outperforms the baseline model built on commonly used features (Model 4), indicating that the no-load features alone already provide some classification capability. The models based on knowledge-driven features from the penetration-increasing phase (Model 2) and the steady phase (Model 3) performed better than Model 1, suggesting that the features from these phases are more informative. Models incorporating multi-phase knowledge-driven rock-breaking features (Models 5 and 6) consistently outperformed the single-phase models (Models 1–4). Notably, the model using all knowledge-driven features (Model 6) achieved the highest F1-score (0.917) and AUC (0.998) among the six models, improving the F1-score by 4.9% and the AUC by 0.6% compared with the steady-phase baseline model (Model 4). These results indicate that model performance improves as the feature set expands, and that the penetration-increasing phase indicators in particular contribute markedly to classification effectiveness. This supports the validity and advantage of the proposed knowledge-driven rock-breaking indices for cross-project surrounding rock quality recognition.
To further investigate the generalization capability of machine learning models on cross-project datasets, two additional models were constructed using all knowledge-driven rock-breaking features: one trained solely on the YC project dataset (Model 7) and the other solely on the YE project dataset (Model 8). These models were compared with the cross-project model (Model 6), which was trained on the combined YC and YE dataset. The evaluation metrics for Models 6–8 are summarized in Table 8. The results show that the performance of the cross-project model (Model 6) lies between that of the two single-project models. Specifically, Model 7 (YC-based) performed best, owing to a more balanced distribution of surrounding rock quality categories. In contrast, Model 8 (YE-based) performed less favorably, primarily because of severe class imbalance in the YE dataset, in which only 2.4% of the samples belong to poor-quality surrounding rock. Overall, the cross-project model (Model 6) maintained performance comparable to that of the single-project models, indicating that integrating datasets from different projects did not compromise the learning ability of the machine learning model. This suggests that the proposed approach is suitable for cross-project classification tasks in TBM tunneling operations. Therefore, the method is most effective for open-type hard-rock TBM projects with stable sensor acquisition, identifiable excavation cycles, and sufficient samples covering the main surrounding rock classes.
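The effect of a 2.4% minority class on the metrics can be illustrated with a short calculation. The sketch assumes a hypothetical total of 1000 samples and treats the poor-quality class as the positive class (encoded as 1); a trivial classifier that always predicts the majority class then reaches high accuracy while detecting no poor-quality segments.

```python
# Hypothetical sample count; only the 2.4% share comes from the text.
n_total = 1000
n_unfavorable = 24  # 2.4% of n_total

# Assumption: 1 = poor-quality (unfavorable) rock, 0 = favorable rock.
y_true = [1] * n_unfavorable + [0] * (n_total - n_unfavorable)

# Trivial baseline that always predicts the majority (favorable) class.
y_pred = [0] * n_total

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n_total
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / n_unfavorable
```

Here the accuracy is 0.976 and the recall is 0, which is why recall, F1-score, and AUC are more informative than accuracy alone for the YE data.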