1. Introduction
During the shield tunneling process, deviations between the actual tunneling axis and the design tunnel axis (DTA) can lead to adverse consequences, including segment misalignment, ground deformation, water leakage, and segment flotation. Accurate prediction of shield attitude is therefore of critical importance [1, 2, 3, 4]. In this study, the term “shield attitude” refers to the positional deviation state of the shield machine during tunneling, including the horizontal and vertical deviations of both the shield head and the shield tail. Existing prediction methods can be broadly classified into three categories: analytical formulas, numerical simulations, and machine learning (ML) algorithms.
Analytical approaches are generally established on the basis of mechanical principles tailored to specific engineering projects. For instance, Sugimoto et al. [5, 6, 7] investigated the intrinsic relationships among tunneling resistance, ground pressure, and attitude variation through force-balance analysis. Festa et al. [8] developed an empirical computational model based on shield construction parameters and field excavation records to control attitude deviations. Meanwhile, some researchers have employed Finite Element Method (FEM) simulations to analyze the forces acting on the shield machine and established mechanical adjustment models for prediction [9, 10].
However, both theoretical and numerical simulation methods typically necessitate numerous simplifying assumptions and incorporate a limited number of excavation parameters. These methods are also constrained in their ability to process the large-scale, real-time data generated during shield tunneling. As a result, their predictions may not fully meet the practical requirements of proactive risk prevention and control in modern construction. Consequently, recent studies have adopted ML algorithms to explore the complex nonlinear coupling relationships among multiple factors and improve predictive performance. For instance, Huang et al. [11] used a machine learning method to predict shield attitude. Similarly, Chen et al. [12] developed an intelligent attitude prediction model based on a Bayesian-optimized LightGBM framework, while Wang et al. [13] applied XGBoost to improve deviation prediction under complex geological conditions. To better capture temporal dependencies, an increasing number of studies have further introduced sequential architectures. These include CNN-LSTM models integrated with wavelet transforms [14], LSTM-Transformer hybrids [15], and the FTA-N-GRU model, which incorporates Feature Temporal Attention [16]. Other advancements involve PCA-assisted GRU models [17] and PCA-assisted Temporal Convolutional Networks (TCN) featuring SHAP-based interpretability [18]. Furthermore, long-term and irregular time-series modeling has been refined through time-aware LSTM variants [19] and attention-based CNN-BiLSTM-Transformer frameworks [20]. Notably, the D-T-RC_LSTM model offers the potential to extend source domains to multiple shield tunneling projects, facilitating further investigation into the feasibility of multi-domain transfer learning [21].
Despite the progress of ML algorithms in intelligent tunnel construction, the high uncertainty and heterogeneity of engineering environments often lead to limited transferability and poor generalization performance of single-model approaches across different projects [4, 13, 22]. In addition, conventional Stacking-based methods generally rely on simple combinations of heterogeneous learners, with limited consideration given to hierarchical feature interaction and preservation of original engineering information during multi-layer fusion. To address these limitations, this study proposes a novel PCA-SWO-Stacking framework for shield attitude prediction. The proposed method integrates Principal Component Analysis (PCA), Spider Wasp Optimizer (SWO), and a multi-layer heterogeneous Stacking architecture into a unified prediction framework. In contrast to conventional shallow ensemble strategies, the proposed model introduces residual-like feature connections by concatenating the raw input features with the outputs of preceding layers, thereby enabling hierarchical feature fusion while mitigating information loss during deep ensemble learning. Meanwhile, the SWO algorithm is employed to adaptively optimize the hyperparameters of heterogeneous base learners, improving model robustness and reducing the dependence on manual parameter tuning. The proposed framework not only enhances prediction accuracy but also improves computational efficiency and engineering adaptability for large-diameter shield tunneling under complex geological conditions. The PCA-SWO-Stacking algorithm is applied to the Shanghai Beiheng Passageway project, and its performance is validated through comparisons with commonly used ensemble algorithms and ablation experiments.
2. Methods
Alignment inaccuracy is defined as the deviation between the current tunneling axis (CTA) and the design tunnel axis (DTA). In engineering practice, four specific parameters are commonly monitored: Horizontal Deviation of the Shield Head (HDSH), Vertical Deviation of the Shield Head (VDSH), Horizontal Deviation of the Shield Tail (HDST), and Vertical Deviation of the Shield Tail (VDST). These parameters serve as the output variables for the shield attitude prediction model. The geometric representation of these shield attitude parameters is illustrated in Figure 1.
2.1. Data Collection and Preprocessing
During shield tunneling, a wide range of construction parameters is monitored in real time. Initially, 68 parameters closely related to shield attitude—such as advance rate, cutterhead speed, and total thrust—were selected. Preliminary analysis indicated that the raw dataset had high dimensionality and contained numerous zero values, outliers, and noise. Consequently, rigorous preprocessing and feature screening were essential. The preprocessing workflow consisted of four stages: zero-value removal, outlier handling, noise reduction, and normalization.
Zero-value Handling: During actual construction, the shield machine remains in a non-excavation state for a considerable duration. Monitoring data recorded during these intervals are not relevant to attitude prediction and were therefore removed.
Outlier Removal: Outliers in the raw dataset can distort subsequent noise reduction and normalization, potentially leading to model overfitting and reduced generalization performance. They were therefore handled before the remaining preprocessing steps.
Noise Reduction: To mitigate the impact of measurement noise on model training, the Savitzky–Golay (S-G) filter was applied. The window length was set to 11, and the polynomial order was set to 3. This method was selected for its ability to suppress noise while preserving the original shape and features of the signal.
Normalization: The parameters in the raw dataset have different units and vary across several orders of magnitude, which can hinder model convergence and prolong training time. The Min–Max normalization technique was utilized to map all features into the [0, 1] interval, ensuring numerical stability without altering the underlying distribution of the data.
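The four preprocessing stages can be sketched as follows. This is a minimal illustration rather than the exact procedure used in the study: the rule for identifying non-excavation records (a zero value in a hypothetical advance-rate column, `advance_col`) and the 3-sigma clipping rule for outliers are assumptions, since the text does not specify them. The Savitzky–Golay step is written out with NumPy as a local least-squares polynomial fit using the stated window length of 11 and order 3; in practice a library routine such as `scipy.signal.savgol_filter` would normally be used.

```python
import numpy as np

def savgol_smooth(y, window=11, order=3):
    """Savitzky-Golay smoothing via a local least-squares polynomial fit."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Row 0 of the pseudo-inverse of the Vandermonde matrix holds the weights
    # that evaluate the fitted polynomial at the centre of the window.
    vander = np.vander(offsets, order + 1, increasing=True)
    weights = np.linalg.pinv(vander)[0]
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")

def preprocess(data, advance_col=0):
    """Zero-value removal, outlier clipping, smoothing, Min-Max scaling."""
    data = np.asarray(data, dtype=float)
    # 1) Zero-value removal (assumed rule: advance rate of zero = not excavating).
    data = data[data[:, advance_col] > 0]
    # 2) Outlier handling (assumed rule: clip each feature to mean +/- 3 std).
    mean, std = data.mean(axis=0), data.std(axis=0)
    data = np.clip(data, mean - 3 * std, mean + 3 * std)
    # 3) Noise reduction, applied to each feature column separately.
    data = np.column_stack([savgol_smooth(col) for col in data.T])
    # 4) Min-Max normalization to [0, 1]; constant columns map to 0.
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span
```

When the data are later split into training and testing sets, the scaling bounds would normally be computed on the training portion only and then reused for the test portion, so that no test information leaks into training.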
2.2. Principal Component Analysis (PCA) Algorithm
In complex model training scenarios, input parameters often exhibit high dimensionality. Such high-dimensional data typically contain invalid features and redundant information, which can reduce training efficiency and degrade model performance. As a classical dimensionality reduction technique, Principal Component Analysis (PCA) removes irrelevant features and integrates redundant information while preserving the primary characteristics of the dataset, thereby establishing a more compact input feature system.
PCA is implemented by calculating the eigenvalues and eigenvectors of the covariance matrix obtained from mean-centered data. By ranking these eigenvalues in descending order, the principal components with higher values are selected. The original data are then projected onto a low-dimensional subspace formed by the corresponding eigenvectors. The specific implementation steps are as follows:
(1) Covariance Matrix Calculation
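The standardized (mean-centered) data are expressed as $X = [x_1, \ldots, x_N] \in \mathbb{R}^{D \times N}$. The covariance matrix is computed as follows:
$$C = \frac{1}{D-1} X^{\top} X$$
where $C \in \mathbb{R}^{N \times N}$ is the covariance matrix, $N$ is the number of features, $D$ is the number of samples, and $x_i$ represents the vector of the $i$-th dimension.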
(2) Eigenvalue Decomposition and Projection
Eigenvalue decomposition is performed on the covariance matrix $C$ to obtain the eigenvalues and their corresponding eigenvectors. The eigenvalues are sorted in descending order, and the first $d$ principal components are selected. Their corresponding eigenvectors are concatenated column-wise to form the basis matrix $U_d$. The transformation from $N$-dimensional data to a $d$-dimensional space is achieved through matrix projection:
$$Y = X U_d, \qquad U_d = [u_1, \ldots, u_d]$$
where $U_d$ is the basis matrix composed of the $d$ selected eigenvectors, $u_1, \ldots, u_d$ denote these eigenvectors, and $Y$ is the dimensionality-reduced dataset.
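The two steps above can be sketched with NumPy as follows. The function name `pca_project` and the use of the sample covariance (division by the number of samples minus one) are assumptions made for illustration; the number of retained components is chosen as the smallest value whose cumulative explained variance reaches a given threshold.

```python
import numpy as np

def pca_project(X, var_threshold=0.90):
    """Project samples (rows of X) onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # mean-centre each feature
    C = Xc.T @ Xc / (Xc.shape[0] - 1)          # covariance matrix (features x features)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest d whose cumulative explained variance reaches the threshold.
    d = min(int(np.searchsorted(cumulative, var_threshold)) + 1, len(eigvals))
    U_d = eigvecs[:, :d]                       # basis matrix of selected eigenvectors
    return Xc @ U_d, U_d, cumulative[:d]
```

The returned `cumulative` array gives the cumulative variance contribution of the retained components, which is the quantity compared against the retention threshold.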
2.3. Spider Wasp Optimizer (SWO) Algorithm
The Spider Wasp Optimizer (SWO) is a recently developed meta-heuristic optimization algorithm inspired by the predatory and reproductive behaviors of spider wasps in nature [23]. By simulating biological activities such as searching for spiders, paralyzing prey, and oviposition (egg-laying), SWO seeks a balance between global exploration and local exploitation. It has been reported to converge quickly and to escape from local optima effectively.
The core mechanism of SWO is categorized into three distinct phases: the searching phase, the hunting and paralyzing phase, and the oviposition and hatching phase.
(1) Searching Phase
In this stage, spider wasps fly randomly within the search space to locate potential spider prey. This phase emphasizes global exploration to prevent the algorithm from premature convergence. The position update is formulated as
$$X_i^{t+1} = X_i^{t} + r_1 \left(X_{best}^{t} - X_i^{t}\right) + r_2 \left(X_{rand}^{t} - X_i^{t}\right)$$
where $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1]; $X_i^{t}$ and $X_i^{t+1}$ denote the current and subsequent positions, respectively; $X_{best}^{t}$ represents the current global optimal solution; and $X_{rand}^{t}$ is a randomly selected individual from the population.
(2) Hunting and Paralyzing Phase
Upon locating prey, the wasp dynamically adjusts its flight trajectory to chase and paralyze it. This phase performs a fine-tuned local search around the optimal solution, effectively balancing exploration and exploitation:
$$X_i^{t+1} = X_i^{t} + \alpha \, e^{-\beta t / T} \left(X_{prey} - X_i^{t}\right)$$
where $X_i^{t}$ is the position of the $i$-th wasp at iteration $t$, $X_{prey}$ is the position of the prey, $\alpha$ is a coefficient controlling the hunting intensity, $\beta$ denotes the decay factor, and $T$ represents the maximum number of iterations.
(3) Oviposition and Hatching Phase
The paralyzed spider is dragged into the nest where the wasp lays its eggs, and the hatched larvae feed on the spider. This phase preserves high-quality solutions through a population update mechanism while introducing stochastic perturbations to maintain population diversity:
$$X_i^{t+1} = X_{best}^{t} + \sigma \cdot \mathcal{N}(0, 1)$$
where $X_{best}^{t}$ is the current optimal solution, $\sigma$ represents the mutation step size, and $\mathcal{N}(0, 1)$ denotes a random variable following a standard normal distribution, which is used to generate new individuals near the optimal solution.
SWO was employed to determine the optimal hyperparameter configurations for the machine learning models, as illustrated in Figure 2.
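To make the three phases concrete, the following is a deliberately simplified sketch of a population-based optimizer that follows the verbal description above. It is not the published SWO algorithm: the random choice among phases, the use of the current best individual as the prey, the greedy replacement rule, and the parameters `alpha`, `beta`, and `sigma` are all assumptions made for illustration. In a hyperparameter search, `objective` would return a validation error for a given hyperparameter vector.

```python
import numpy as np

def swo_like_minimize(objective, bounds, pop_size=40, max_iter=200,
                      alpha=1.0, beta=2.0, sigma=0.1, seed=0):
    """Simplified three-phase search loosely modelled on the SWO description."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    fit = np.array([objective(x) for x in pop])
    history = []
    for t in range(max_iter):
        best = pop[np.argmin(fit)].copy()
        decay = np.exp(-beta * t / max_iter)      # assumed decay schedule
        for i in range(pop_size):
            phase = rng.integers(3)               # assumed: phases chosen at random
            if phase == 0:
                # Searching: move relative to the best and a random individual.
                r1, r2 = rng.random(2)
                rand = pop[rng.integers(pop_size)]
                cand = pop[i] + r1 * (best - pop[i]) + r2 * (rand - pop[i])
            elif phase == 1:
                # Hunting: approach the prey (assumed to be the current best).
                cand = pop[i] + alpha * decay * (best - pop[i])
            else:
                # Oviposition: Gaussian perturbation around the best solution.
                cand = best + sigma * (high - low) * rng.standard_normal(low.size)
            cand = np.clip(cand, low, high)
            f = objective(cand)
            if f < fit[i]:                        # greedy replacement keeps good solutions
                pop[i], fit[i] = cand, f
        history.append(fit.min())
    k = int(np.argmin(fit))
    return pop[k], fit[k], history
```

Because a candidate only replaces an individual when it improves on it, the best objective value recorded in `history` never increases from one iteration to the next.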
Compared with commonly used hyperparameter optimization methods, such as Particle Swarm Optimization (PSO), Genetic Algorithms (GAs), and Bayesian Optimization (BO), SWO is reported to provide a better balance between exploration and exploitation and more stable convergence in complex nonlinear optimization problems. PSO is prone to premature convergence, whereas GAs usually require higher computational costs and more complicated parameter tuning. Although BO performs well in low-dimensional optimization tasks, its efficiency may decrease when applied to high-dimensional and coupled hyperparameter spaces. Since the proposed shield attitude prediction framework involves multiple coupled hyperparameters and nonlinear feature interactions, SWO was adopted to improve global search capability and enhance the robustness of model parameter optimization.
2.4. Stacking Ensemble Learning Framework
By constructing and integrating multiple sub-models, ensemble learning can achieve better performance than individual models in regression tasks. The core mechanism lies in leveraging the complementarity and diversity among constituent models to enhance overall generalization capability. The general structure of ensemble learning is conceptually illustrated in Figure 3.
At the algorithmic level, ensemble learning can be categorized into homogeneous and heterogeneous architectures. Homogeneous ensembles use a single algorithm to construct base learners; for instance, Random Forest uses only decision trees and generates diverse predictions through feature sampling and bootstrap aggregating (bagging). In contrast, heterogeneous ensembles integrate base models built using different algorithms, such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Gradient Boosting Decision Trees (GBDTs), to form a more diverse feature representation space. Among the three mainstream ensemble paradigms (Bagging, Boosting, and Stacking [24]), the former two primarily rely on the parallel or serial training of homogeneous weak learners to improve accuracy via variance reduction or bias correction. Conversely, the Stacking algorithm, characterized by its hierarchical modeling structure, consolidates the strengths of heterogeneous base models. It provides a more robust solution, particularly when significant uncertainty exists regarding the selection of an optimal individual algorithm.
The main advantage of the Stacking ensemble algorithm lies in its hierarchical feature transformation mechanism. Its fundamental architecture comprises two layers: the Base Layer (Level-0), which integrates multiple heterogeneous learners to generate primary predictions, and the Meta-Layer (Level-1), where a meta-learner performs high-order combinations of the base-layer outputs. This two-layered framework is conceptually illustrated in Figure 4.
To further enhance the predictive performance, this study extends the conventional Stacking framework by increasing the number of internal learning layers, resulting in a multi-layer Stacking architecture, as illustrated in Figure 5.
The multi-layer Stacking architecture achieves deep feature fusion through residual-like connections, in which the original input features are concatenated with the outputs of each preceding layer. This mechanism enables subsequent learners to use the abstract representations produced by earlier layers while retaining the raw data information. The design is analogous to residual mapping in deep networks: through skip connections, the input feature vector is merged with the intermediate layer outputs along the feature dimension, which mitigates the attenuation of raw-data information commonly encountered in deep architectures.
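One possible way to write this connection down is given below. The notation is introduced here for illustration only and is not taken from the original formulation: $\mathbf{x}$ is the raw input feature vector, $F^{(l)}$ collects the learners of layer $l$, $\mathbf{h}^{(l)}$ is the vector of their predictions, and $[\,\cdot\,;\,\cdot\,]$ denotes concatenation.

$$\mathbf{z}^{(0)} = \mathbf{x}, \qquad \mathbf{h}^{(l)} = F^{(l)}\!\left(\mathbf{z}^{(l-1)}\right), \qquad \mathbf{z}^{(l)} = \left[\mathbf{x};\, \mathbf{h}^{(l)}\right], \qquad l = 1, \ldots, L$$

Under this reading, every layer after the first receives the raw features alongside the predictions of the layer before it, so the raw information is never more than one concatenation away from any learner.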
In the specific implementation, each layer can be flexibly configured with different types and quantities of individual learners. For instance, the base layer may deploy models such as XGBoost and Random Forest (RF) to handle structured construction features. These outputs are then combined in the final meta-learning layer for model fusion, establishing a robust hierarchical feature extraction and fusion system.
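A minimal sketch of such a multi-layer Stacking scheme is shown below, using only NumPy so that it is self-contained. Several elements are assumptions for illustration: the two small learners `Ridge` and `KNN` stand in for the actual base models, each layer is trained with out-of-fold predictions over contiguous folds, and the final fusion is a weighted average with weights proportional to the inverse out-of-fold mean squared error.

```python
import numpy as np

class Ridge:
    """Closed-form ridge regression (stand-in for a linear base learner)."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        penalty = self.alpha * np.eye(A.shape[1])
        penalty[0, 0] = 0.0                      # leave the intercept unpenalised
        self.coef_ = np.linalg.solve(A.T @ A + penalty, A.T @ y)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

class KNN:
    """k-nearest-neighbour regressor (stand-in for a non-linear base learner)."""
    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        dist = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :self.k]
        return self.y_[nearest].mean(axis=1)

def oof_layer(factories, Z_tr, y_tr, Z_te, k=5):
    """Out-of-fold training predictions and test predictions for one layer."""
    n = len(Z_tr)
    folds = np.array_split(np.arange(n), k)      # contiguous folds
    oof = np.zeros((n, len(factories)))
    te = np.zeros((len(Z_te), len(factories)))
    for j, make in enumerate(factories):
        for val in folds:
            trn = np.setdiff1d(np.arange(n), val)
            oof[val, j] = make().fit(Z_tr[trn], y_tr[trn]).predict(Z_tr[val])
        te[:, j] = make().fit(Z_tr, y_tr).predict(Z_te)
    return oof, te

def stack_fit_predict(layers, X_tr, y_tr, X_te, k=5):
    """Train the layers in sequence and return fused test predictions."""
    Z_tr, Z_te = X_tr, X_te
    for factories in layers:
        oof, te = oof_layer(factories, Z_tr, y_tr, Z_te, k)
        # Residual-like connection: raw inputs + outputs of the layer just trained.
        Z_tr = np.hstack([X_tr, oof])
        Z_te = np.hstack([X_te, te])
    # Weighted-average fusion (assumed weights: inverse out-of-fold MSE).
    mse = ((oof - y_tr[:, None]) ** 2).mean(axis=0) + 1e-12
    weights = (1.0 / mse) / (1.0 / mse).sum()
    return te @ weights
```

Each layer is passed as a list of zero-argument factories, for example `[lambda: Ridge(1.0), lambda: KNN(5)]`, so that a fresh model is created for every fold. For a single meta-layer, scikit-learn's `StackingRegressor` with `passthrough=True` offers a comparable concatenation of original features and base predictions.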
To improve the accuracy of shield attitude prediction, a hybrid PCA-SWO-Stacking framework is proposed, as illustrated in Figure 6. The process begins with data normalization and Principal Component Analysis (PCA), which projects the high-dimensional data into a lower-dimensional space while retaining essential information through a cumulative variance threshold. Subsequently, multiple heterogeneous base learners are selected, and their hyperparameters are globally optimized using the Spider Wasp Optimizer (SWO) to ensure robust individual model performance. To capitalize on deep feature fusion, a multi-layer Stacking architecture is implemented, where each layer learns from an augmented feature set, namely a concatenation of the raw inputs and the preceding layer outputs. Finally, to mitigate the risk of overfitting inherent in such a complex hierarchical structure, a K-fold cross-validation strategy is employed to enhance the model’s generalization capability and ensure reliable performance evaluation on unseen data.
2.5. Evaluation Metrics
Shield attitude prediction is a regression task. To evaluate the performance of the proposed model, three commonly used metrics are employed. The Root Mean Squared Error (RMSE) maintains the same units as the original data and intuitively reflects the typical deviation between predicted and observed values:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
where $n$ is the number of samples in the testing set, $\hat{y}_i$ is the predicted value of the $i$-th sample, and $y_i$ is the corresponding actual value.
The Mean Absolute Error (MAE) represents the average of the absolute differences between the predicted and actual values:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
The Coefficient of Determination (R²) is a core statistical metric for measuring the goodness-of-fit in regression analysis:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
Here, $\bar{y}$ is the average of the observed values.
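The three metrics are straightforward to compute. The short sketch below uses NumPy; the function name `regression_metrics` is chosen here for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and the coefficient of determination R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)                          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2
```

For example, with observed values 1, 2, 3, 4 and predictions 1, 2, 3, 5, the only error is 1 on the last sample, which gives RMSE = 0.5, MAE = 0.25, and R² = 1 − 1/5 = 0.8.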
4. Analysis of Attitude-Prediction Results
Using the dataset from rings 700 to 1000 of the eastern section of the Beiheng Passageway, a total of 259,199 data samples were obtained after preliminary data preprocessing. The dataset was ordered by ring number to preserve the temporal sequence. To avoid data leakage, a time-based split was adopted: the first 80% of rings were used for training, and the last 20% were used for testing without random shuffling. In performing dimensionality reduction using PCA, the amount of information retained in the output matrix, defined as the cumulative variance contribution of the principal components, was set to no less than 90% of that of the original dataset.
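A time-based split of this kind can be sketched as follows. The function name `ring_time_split` and the choice to cut on whole rings (so that all samples of a ring fall on the same side) are assumptions made for illustration.

```python
import numpy as np

def ring_time_split(ring_ids, train_frac=0.8):
    """Boolean masks for a chronological split by ring number (no shuffling)."""
    ring_ids = np.asarray(ring_ids)
    rings = np.unique(ring_ids)                   # unique ring numbers, ascending
    cut = int(round(train_frac * len(rings)))     # number of rings used for training
    train_mask = np.isin(ring_ids, rings[:cut])
    return train_mask, ~train_mask
```

Because every training ring precedes every test ring, the model is never evaluated on samples that are earlier in the drive than the data it was trained on.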
Figure 8 illustrates the variance contribution rates of the principal components. As indicated, the cumulative explained variance of the first eight principal components reached 90%. Hence, the first eight principal components were selected as the dimensionality-reduced features for the subsequent training of the model.
The PCA-SWO-Stacking model was adopted for model training. The proposed Stacking ensemble framework integrates five heterogeneous base learners, namely LightGBM, KNN, DT, RF, and XGBoost, whose primary hyperparameters are summarized in Table 3. These models were selected because they are based on different algorithmic principles. Such heterogeneous characteristics contribute to lower correlation among prediction errors and improve the overall predictive performance and generalization capability of the ensemble model. The optimal hyperparameters for each sub-model were determined through the SWO algorithm. The population size for the SWO algorithm was set to 40, the maximum number of iterations to 200, and the range of hyperparameters is shown in Table 3. The Stacking architecture itself employs a two-layer stacked structure with the number of cross-validation folds set to five (K = 5). Given that this hierarchical framework already possesses substantial complexity, the requirements for the meta-learner’s complexity are correspondingly reduced; therefore, a weighted averaging method is utilized for model fusion.
The testing results are shown in Figure 9, where the red dashed line represents the ideal prediction line (where predicted values equal actual values). The scatter points in the figure represent the model’s predictions; points closer to the ideal line indicate higher prediction accuracy.
As shown in the figure, the overall prediction performance for the four shield attitude parameters is favorable, with all R² values exceeding 0.9. There is no apparent systematic deviation between the predicted and actual values. Notably, the prediction performance for the two attitude parameters at the shield head is generally lower than that for the shield tail. A plausible explanation is that the selected features are more strongly correlated with the shield tail attitude parameters and are therefore more informative for their prediction.
Employing the same dataset used for the PCA-SWO-Stacking model, comparative models based on RF, XGBoost, LSTM, and GRU algorithms were trained, and the identical test set was adopted for prediction. Taking the horizontal deviation of the shield tail as an illustrative example, Figure 10 presents a comparison of the prediction results obtained from the different models. As shown in the figure, the predicted values generated by the PCA-SWO-Stacking model are generally closer to the actual values than those produced by the other models. Although RF and XGBoost exhibit relatively stable predictive performance, their predictions still deviate from the actual values at certain locations, particularly for samples with larger fluctuations. Similarly, while the LSTM and GRU models are capable of capturing temporal dependencies in sequential tunneling data, their prediction accuracy and stability remain inferior to those of the proposed PCA-SWO-Stacking framework. Compared with the single-model approaches, the proposed PCA-SWO-Stacking model integrates heterogeneous base learners and further optimizes model parameters using the SWO algorithm, thereby achieving stronger nonlinear mapping capability and improved generalization performance under complex geological conditions. A quantitative performance comparison of the different models is provided in Table 4. Compared with the RF, XGBoost, LSTM, and GRU models, the PCA-SWO-Stacking model exhibits a substantially higher R² value, together with lower RMSE and MAE values. These results indicate that the proposed model more accurately characterizes the relationship between tunneling parameters and shield attitude during shield driving, thereby demonstrating superior predictive capability and robustness.
For further analysis of the PCA-SWO-Stacking model, data from rings 1 to 300 of the western section of the Beiheng Passageway were adopted for testing, following the same procedure described in the preceding sections. Ablation experiments were performed on the model. Specifically, the Stacking model with PCA only, the Stacking model with SWO hyperparameter optimization only, and the Stacking model with neither PCA nor SWO were each trained and evaluated. Moreover, individual sub-models using the same hyperparameters as in the PCA-SWO-Stacking model were constructed. Using the horizontal deviation of the shield tail (HDST) as an illustrative example, the performance comparison of the models is presented in Figure 11 and Table 5.
Experimental results indicate that ensemble learning, especially the Stacking framework, yields markedly superior prediction accuracy compared with conventional single models such as DT and KNN. The PCA-SWO-Stacking model achieves an R² of 0.927, in contrast to 0.723 and 0.795 for DT and KNN, respectively. Regarding computational efficiency, PCA reduces the prediction time of the Stacking model by approximately 34%, from 1.57 s to 1.03 s, accompanied by a slight improvement in accuracy, with R² increasing from 0.901 to 0.904. SWO further raises R² to 0.927 without increasing prediction time. The PCA-SWO-Stacking model also demonstrates favorable predictive performance for shield attitude in the western section of the project. Furthermore, it achieves a balance between high accuracy and the real-time requirements of shield attitude prediction, maintaining robust predictive capability while keeping prediction time within a reasonable range through PCA-based dimensionality reduction.
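The reported reduction in prediction time follows directly from the two timings:

$$\frac{1.57\ \text{s} - 1.03\ \text{s}}{1.57\ \text{s}} = \frac{0.54}{1.57} \approx 0.344 \approx 34\%$$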
To improve the interpretability and engineering explainability of the proposed PCA-SWO-Stacking framework, SHAP (SHapley Additive exPlanations) analysis was conducted to quantify the contribution of different input parameters to the prediction results of shield attitude. The corresponding SHAP analysis results are shown in Figure 12.
The SHAP results indicate that the four shield attitude parameters are primarily influenced by the thrust forces of different groups of hydraulic jacks. This is mainly because the differential distribution of jack thrust directly affects the force balance and attitude adjustment of the shield machine during tunneling. The analysis demonstrates that the proposed model not only achieves high prediction accuracy but also provides reasonable engineering interpretability consistent with the mechanical characteristics of shield tunneling operations. Therefore, the proposed framework exhibits strong potential for practical application in intelligent shield attitude prediction and control.
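SHAP attributions are based on Shapley values. To illustrate the underlying idea, the sketch below computes exact Shapley values for a single sample by enumerating all feature subsets. It does not use the `shap` library and is not the procedure used in the study: the value function, which replaces features outside a subset with their background mean, is an assumption, and the enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values of one sample x for a scalar-valued predict(z)."""
    x = np.asarray(x, dtype=float)
    base = np.asarray(background, dtype=float).mean(axis=0)
    n = len(x)

    def value(subset):
        # Features in the subset take their real value; the rest take the mean.
        z = base.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi
```

For a linear model with weights `w`, this value function gives the attribution `w[i] * (x[i] - mean[i])` for feature `i`, and the attributions always sum to the difference between the prediction for `x` and the prediction at the background mean.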
5. Discussion
This study proposes a shield attitude prediction model based on PCA-SWO-Stacking, which achieves favorable prediction performance and verifies the contributions of PCA preprocessing and the SWO optimization algorithm to the overall model. Specifically, PCA improves computational efficiency, whereas SWO enhances prediction accuracy.
The raw dataset used in this study was obtained from an engineering project in Shanghai, and the research primarily focuses on shield tunneling construction in soft-soil regions. To further assess the generalization capability and applicability of the proposed method, future work may consider collecting datasets from different geological conditions to conduct cross-regional comparative and validation studies. The Stacking ensemble model comprises five heterogeneous base models. Future investigations could explore the impact of incorporating more sub-models, as well as adopting new algorithms, on the overall Stacking architecture.
6. Conclusions
This study proposes a shield attitude prediction model based on PCA-SWO-Stacking, which achieves integrated fusion among different sub-models. The model architecture and its impact on the final prediction results are analyzed. The main conclusions are as follows:
(1) A complete pipeline for shield attitude prediction using PCA-SWO-Stacking is proposed. Principal Component Analysis (PCA) is employed to reduce the dimensionality of the high-dimensional shield tunneling data, effectively extracting key features and reducing noise interference. By integrating multiple heterogeneous base models within the Stacking ensemble learning framework and optimizing the hyperparameters using the SWO algorithm, the model achieves a significant improvement in prediction accuracy. The proposed model yields satisfactory prediction performance for all four shield attitude targets, with R² values of 0.940, 0.964, 0.997, and 0.991, respectively, and MAE values below 1.5 in all cases.
(2) To validate the superiority and stability of the PCA-SWO-Stacking shield attitude prediction model, four baseline models (RF, GRU, LSTM, and XGBoost) were constructed for comparison. Their R² values are 0.916, 0.883, 0.916, and 0.928, respectively, all of which are lower than that of the PCA-SWO-Stacking model. Furthermore, using data from the western section for testing, a performance analysis of the PCA-SWO-Stacking model and its sub-models was conducted. The results show that the overall model outperforms each individual sub-model constructed separately. Moreover, ablation experiments verify the contributions of PCA preprocessing and the SWO optimization algorithm to the overall model: the former improves computational efficiency, while the latter enhances prediction accuracy.
(3) The proposed PCA-SWO-Stacking framework demonstrates potential for practical engineering applications in shield tunneling. By enabling accurate real-time prediction of shield attitude, the proposed method can assist operators in optimizing tunneling parameter adjustments and reducing construction risks caused by excessive attitude deviation. Furthermore, the proposed framework provides technical support for intelligent tunnel construction management and automated shield control under complex geological conditions.