Prediction of Large-Diameter Shield Tunneling Attitude: PCA-SWO-Stacking Machine Learning Algorithm Application in a Case Study of the Shanghai Beiheng Passageway

Yu, Jingxiang; Zhang, Mengxi

doi:10.3390/app16115548


by Jingxiang Yu and Mengxi Zhang *

School of Mechanics and Engineering Science, Shanghai University, Shanghai 200444, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5548; https://doi.org/10.3390/app16115548

Submission received: 12 May 2026 / Revised: 22 May 2026 / Accepted: 23 May 2026 / Published: 2 June 2026


Abstract

To address the limited cross-domain generalization of single-algorithm models for shield attitude prediction, this study proposes a heterogeneous algorithm-fusion framework based on Stacking. The framework integrates multiple machine learning algorithms and uses the Spider Wasp Optimizer (SWO) for hyperparameter optimization, thereby overcoming the limitations of individual learners and reducing the need for laborious algorithm selection. Principal Component Analysis (PCA) is further used to reduce dimensionality and reconstruct high-dimensional features, which lowers computational complexity and improves prediction accuracy. The proposed PCA-SWO-Stacking algorithm was applied to shield attitude prediction using data from the Shanghai Beiheng Passageway project. The results show strong predictive performance, with all R² values exceeding 0.94 and all RMSE and MAE values remaining below 2. Comparative experiments with commonly used ensemble algorithms and ablation studies further confirm the effectiveness and robustness of the proposed method.

Keywords: super-large-diameter shield tunnel; attitude prediction; ensemble algorithms; Principal Component Analysis; Spider Wasp Optimizer

1. Introduction

During the shield tunneling process, deviations between the actual tunneling axis and the design tunnel axis (DTA) can lead to adverse consequences, including segment misalignment, ground deformation, water leakage, and segment flotation. Accurate prediction of shield attitude is therefore of critical importance [1,2,3,4]. In this study, the term “shield attitude” refers to the positional deviation state of the shield machine during tunneling, including the horizontal and vertical deviations of both the shield head and the shield tail. Existing prediction methods can be broadly classified into three categories: analytical formulas, numerical simulations, and machine learning (ML) algorithms.

Analytical approaches are generally established on the basis of mechanical principles tailored to specific engineering projects. For instance, Sugimoto et al. [5,6,7] investigated the intrinsic relationships among tunneling resistance, ground pressure, and attitude variation through force-balance analysis. Festa et al. [8] developed an empirical computational model based on shield construction parameters and field excavation records to control attitude deviations. Meanwhile, some researchers have employed Finite Element Method (FEM) simulations to analyze the forces acting on the shield machine and established mechanical adjustment models for prediction [9,10].

However, both theoretical and numerical simulation methods typically necessitate numerous simplifying assumptions and incorporate a limited number of excavation parameters. These methods are also constrained in their ability to process the large-scale, real-time data generated during shield tunneling. As a result, their predictions may not fully meet the practical requirements of proactive risk prevention and control in modern construction. Consequently, recent studies have adopted machine learning (ML) algorithms to explore the complex nonlinear coupling relationships among multiple factors and improve predictive performance. For instance, Huang et al. [11] used a machine learning method to predict shield attitude. Similarly, Chen et al. [12] developed an intelligent attitude prediction model based on a Bayesian-optimized LightGBM framework, while Wang et al. [13] applied XGBoost to improve deviation prediction under complex geological conditions. To better capture temporal dependencies, an increasing number of studies have further introduced sequential architectures. These include CNN-LSTM models integrated with wavelet transforms [14], LSTM-Transformer hybrids [15], and the FTA-N-GRU model, which incorporates Feature Temporal Attention [16]. Other advancements involve PCA-assisted GRU models [17] and PCA-assisted Temporal Convolutional Networks (TCN) featuring SHAP-based interpretability [18]. Furthermore, long-term and irregular time-series modeling has been refined through time-aware LSTM variants [19] and attention-based CNN-BiLSTM-Transformer frameworks [20]. Notably, the D-T-RC_LSTM model offers the potential to extend source domains to multiple shield tunneling projects, facilitating further investigation into the feasibility of multi-domain transfer learning [21].

Despite the progress of ML algorithms in intelligent tunnel construction, the high uncertainty and heterogeneity of engineering environments often lead to limited transferability and poor generalization performance of single-model approaches across different projects [4,13,22]. In addition, conventional Stacking-based methods generally rely on simple combinations of heterogeneous learners, with limited consideration given to hierarchical feature interaction and preservation of original engineering information during multi-layer fusion. To address these limitations, this study proposes a novel PCA-SWO-Stacking framework for shield attitude prediction. The proposed method integrates Principal Component Analysis (PCA), Spider Wasp Optimizer (SWO), and a multi-layer heterogeneous Stacking architecture into a unified prediction framework. In contrast to conventional shallow ensemble strategies, the proposed model introduces residual-like feature connections by concatenating the raw input features with the outputs of preceding layers, thereby enabling hierarchical feature fusion while mitigating information loss during deep ensemble learning. Meanwhile, the SWO algorithm is employed to adaptively optimize the hyperparameters of heterogeneous base learners, improving model robustness and reducing the dependence on manual parameter tuning. The proposed framework not only enhances prediction accuracy but also improves computational efficiency and engineering adaptability for large-diameter shield tunneling under complex geological conditions. The PCA-SWO-Stacking algorithm is applied to the Shanghai Beiheng Passageway project, and its performance is validated through comparisons with commonly used ensemble algorithms and ablation experiments.

2. Methods

Alignment inaccuracy is defined as the deviation between the current tunneling axis (CTA) and the design tunnel axis (DTA). In engineering practice, four specific parameters are commonly monitored: Horizontal Deviation of the Shield Head (HDSH), Vertical Deviation of the Shield Head (VDSH), Horizontal Deviation of the Shield Tail (HDST), and Vertical Deviation of the Shield Tail (VDST). These parameters serve as the output variables for the shield attitude prediction model. The geometric representation of these shield attitude parameters is illustrated in Figure 1.

2.1. Data Collection and Preprocessing

During shield tunneling, a wide range of construction parameters is monitored in real time. Initially, 68 parameters closely related to shield attitude—such as advance rate, cutterhead speed, and total thrust—were selected. Preliminary analysis indicated that the raw dataset had high dimensionality and contained numerous zero values, outliers, and noise. Consequently, rigorous preprocessing and feature screening were essential. The preprocessing workflow consisted of four stages: zero-value removal, outlier handling, noise reduction, and normalization.

Zero-value Handling: During actual construction, the shield machine remains in a non-excavation state for considerable periods. Monitoring data recorded during these intervals are not relevant to attitude prediction and were therefore removed.

Outlier Removal: Outliers in the raw dataset can distort subsequent noise reduction and normalization, potentially leading to model overfitting and reduced generalization performance.

Noise Reduction: To mitigate the impact of measurement noise on model training, the Savitzky–Golay (S-G) filter was applied. The window length was set to 11, and the polynomial order was set to 3. This method was selected for its ability to suppress noise while preserving the original shape and features of the signal.

Normalization: The parameters in the raw dataset have different units and vary across several orders of magnitude, which can hinder model convergence and prolong training time. The Min–Max normalization technique was utilized to map all features into the [0, 1] interval, ensuring numerical stability without altering the underlying distribution of the data.
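The preprocessing stages described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the record layout, the function names, and the choice to leave the first and last five samples unsmoothed are assumptions made here, and outlier handling is omitted because the text does not state which rule was used. The smoothing weights are the standard closed-form Savitzky–Golay coefficients for an 11-point window and a cubic fit.

```python
def drop_idle_records(records, signal_index=0):
    """Remove records whose excavation signal is zero (machine not advancing)."""
    return [r for r in records if r[signal_index] != 0]


# Closed-form Savitzky-Golay weights for an 11-point window and a cubic fit:
# (-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36) / 429
SG_WEIGHTS = [(89 - 5 * k * k) / 429 for k in range(-5, 6)]


def savgol_11_3(signal):
    """Smooth interior samples; the first and last five are left unchanged
    (a simplification assumed here, not taken from the paper)."""
    half = len(SG_WEIGHTS) // 2
    smoothed = list(signal)
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(SG_WEIGHTS, window))
    return smoothed


def min_max(values):
    """Map values linearly onto [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical records: (advance rate, total thrust)
records = [(0.0, 10.0), (32.0, 41000.0), (0.0, 12.0),
           (35.0, 43500.0), (30.0, 40000.0)]
active = drop_idle_records(records)
thrust_scaled = min_max([r[1] for r in active])
```

Because a cubic fit reproduces any cubic signal exactly, the filter leaves smooth trends untouched and only attenuates higher-frequency fluctuations, which is the shape-preserving property mentioned above.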

2.2. Principal Component Analysis (PCA) Algorithm

In complex model training scenarios, input parameters often exhibit high dimensionality. Such high-dimensional data typically contain invalid features and redundant information, which can reduce training efficiency and degrade model performance. As a classical dimensionality reduction technique, Principal Component Analysis (PCA) removes irrelevant features and integrates redundant information while preserving the primary characteristics of the dataset, thereby establishing a more compact input feature system.

PCA is implemented by calculating the eigenvalues and eigenvectors of the covariance matrix obtained from mean-centered data. By ranking these eigenvalues in descending order, the principal components with higher values are selected. The original data are then projected onto a low-dimensional subspace formed by the corresponding eigenvectors. The specific implementation steps are as follows:

(1) Covariance Matrix Calculation

The standardized data are expressed as X = [x_1, …, x_N]_{D×N}, where each column is one sample. The covariance matrix is computed as follows:

C = \frac{1}{N} \sum_{i=1}^{N} x_{i} x_{i}^{T}

(1)

where C is the D × D covariance matrix, D is the number of features, N is the number of samples, and x_i is the i-th mean-centered sample vector.

(2) Eigenvalue Decomposition and Projection

Eigenvalue decomposition is performed on the covariance matrix C to obtain the eigenvalues and their corresponding eigenvectors. The eigenvalues are sorted in descending order, and the first d principal components are selected. Their corresponding eigenvectors are concatenated column-wise to form the basis matrix U_d. The transformation from D-dimensional data to a d-dimensional space is achieved through matrix projection:

U_{d} = [\mu_{1}, \dots, \mu_{d}]_{D \times d}

(2)

Y = U_{d}^{T} X

(3)

where U_d is the basis matrix of the d-dimensional subspace composed of the selected eigenvectors, μ_1, …, μ_d denote those eigenvectors, and Y is the dimensionality-reduced dataset.
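Equations (1) to (3) translate almost line by line into NumPy. The sketch below is illustrative only: the function name `pca_project` and the synthetic data are assumptions, and the data matrix is taken to have features in rows and samples in columns so that it matches the D × N layout used above.

```python
import numpy as np


def pca_project(X, d):
    """X has shape (D, N): D features in rows, N samples in columns."""
    Xc = X - X.mean(axis=1, keepdims=True)   # mean-centre each feature
    C = (Xc @ Xc.T) / Xc.shape[1]            # Eq. (1): D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort in descending order
    U_d = eigvecs[:, order[:d]]              # Eq. (2): D x d basis matrix
    Y = U_d.T @ Xc                           # Eq. (3): d x N reduced data
    return Y, U_d, eigvals[order]


# Synthetic example: 5 observed features driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2, 200))
mixing = rng.normal(size=(5, 2))
X = mixing @ latent + 0.01 * rng.normal(size=(5, 200))
Y, U_d, eigvals = pca_project(X, d=2)
explained = eigvals[:2].sum() / eigvals.sum()
```

`np.linalg.eigh` is used rather than a general eigen-solver because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.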

2.3. Spider Wasp Optimizer (SWO) Algorithm

The Spider Wasp Optimizer (SWO) is a recently developed meta-heuristic optimization algorithm inspired by the predatory and reproductive behaviors of spider wasps in nature [23]. By simulating biological activities such as searching for spiders, paralyzing prey, and oviposition (egg-laying), the SWO balances global exploration against local exploitation. It has been reported to converge quickly and to escape local optima effectively [23].

The core mechanism of SWO is categorized into three distinct phases: the searching phase, the hunting and paralyzing phase, and the oviposition and hatching phase.

(1) Searching Phase

In this stage, spider wasps fly randomly within the search space to locate potential spider prey. This phase emphasizes global exploration to prevent the algorithm from premature convergence. The position update is formulated as

X_{i}^{t+1} = X_{i}^{t} + r_{1} \cdot (X_{best}^{t} - X_{i}^{t}) + r_{2} \cdot (X_{rand}^{t} - X_{i}^{t})

(4)

where r_1 and r_2 are random numbers uniformly distributed in [0, 1]; X_{i}^{t} and X_{i}^{t+1} denote the current and subsequent positions, respectively; X_{best}^{t} represents the current global optimal solution; and X_{rand}^{t} is a randomly selected individual from the population.

(2) Hunting and Paralyzing Phase

Upon locating prey, the wasp dynamically adjusts its flight trajectory to chase and paralyze it. This phase performs a fine-tuned local search around the optimal solution, effectively balancing exploration and exploitation:

X_{i}^{t+1} = X_{i}^{t} + \beta \cdot \exp(-\alpha t / T) \cdot (X_{prey}^{t} - X_{i}^{t})

(5)

where X_{prey}^{t} is the position of the prey, β is a coefficient controlling the hunting intensity, α denotes the decay factor, and T represents the maximum number of iterations.

(3) Oviposition and Hatching Phase

The paralyzed spider is dragged into the nest where the wasp lays its eggs, and the hatched larvae feed on the spider. This phase preserves high-quality solutions through a population update mechanism while introducing stochastic perturbations to maintain population diversity:

X_{new} = X_{best}^{t} + \sigma \cdot N(0, 1) \cdot (X_{best}^{t} - X_{rand}^{t})

(6)

where σ represents the mutation step size, and N(0, 1) denotes a random variable following a standard normal distribution, which is used to generate new individuals near the optimal solution.
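To make the three update rules concrete, the following sketch wires Eqs. (4) to (6) into a small minimization loop. It is a simplified illustration and not the full SWO of [23]. Several choices are assumptions introduced here: the first half of the iterations uses Eq. (4) and the second half Eq. (5), the prey position is taken to be the current best solution, N(0, 1) is drawn once per offspring, candidates are accepted greedily, and the default values of `alpha`, `beta`, and `sigma` are arbitrary.

```python
import math
import random


def swo_sketch(objective, dim, bounds, pop_size=20, max_iter=100,
               alpha=2.0, beta=1.0, sigma=0.1, seed=0):
    """Minimise `objective` with a loop built only from Eqs. (4)-(6)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        return [min(max(v, lo), hi) for v in x]

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    for t in range(max_iter):
        best = pop[min(range(pop_size), key=fit.__getitem__)][:]
        for i in range(pop_size):
            x, rand = pop[i], pop[rng.randrange(pop_size)]
            if t < max_iter // 2:
                # Eq. (4): searching phase (global exploration)
                r1, r2 = rng.random(), rng.random()
                cand = [xi + r1 * (b - xi) + r2 * (r - xi)
                        for xi, b, r in zip(x, best, rand)]
            else:
                # Eq. (5): hunting phase, prey assumed to be the current best
                step = beta * math.exp(-alpha * t / max_iter)
                cand = [xi + step * (b - xi) for xi, b in zip(x, best)]
            cand = clip(cand)
            f = objective(cand)
            if f < fit[i]:                      # greedy acceptance (assumed)
                pop[i], fit[i] = cand, f
        # Eq. (6): one offspring near the best replaces the worst if it is better
        rand = pop[rng.randrange(pop_size)]
        g = rng.gauss(0.0, 1.0)
        new = clip([b + sigma * g * (b - r) for b, r in zip(best, rand)])
        worst = max(range(pop_size), key=fit.__getitem__)
        f_new = objective(new)
        if f_new < fit[worst]:
            pop[worst], fit[worst] = new, f_new

    k = min(range(pop_size), key=fit.__getitem__)
    return pop[k], fit[k]


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f = swo_sketch(sphere, dim=3, bounds=(-5.0, 5.0))
```

In a hyperparameter-tuning setting, `objective` would be replaced by a function that trains a base learner with the candidate hyperparameters and returns its validation error.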

The Spider Wasp Optimizer (SWO) was employed to determine the optimal hyperparameter configurations for the machine learning models, as illustrated in Figure 2.

Compared with commonly used hyperparameter optimizers, such as Particle Swarm Optimization (PSO), Genetic Algorithms (GAs), and Bayesian Optimization (BO), the Spider Wasp Optimizer (SWO) provides a stronger balance between exploration and exploitation and more stable convergence in complex nonlinear optimization problems. PSO is prone to premature convergence, whereas GAs usually require higher computational costs and more complicated parameter tuning. Although BO performs well in low-dimensional optimization tasks, its efficiency may decrease when applied to high-dimensional and coupled hyperparameter spaces. Since the proposed shield attitude prediction framework involves multiple coupled hyperparameters and nonlinear feature interactions, SWO was adopted to improve global search capability and enhance the robustness of model parameter optimization.

2.4. Stacking Ensemble Learning Framework

By constructing and integrating multiple sub-models, ensemble learning can achieve better performance than individual models in regression tasks. The core mechanism lies in leveraging the complementarity and diversity among constituent models to enhance overall generalization capability. The general structure of ensemble learning is conceptually illustrated in Figure 3.

At the algorithmic level, ensemble learning can be categorized into homogeneous and heterogeneous architectures. Homogeneous ensembles use a single algorithm to construct base learners; for instance, Random Forest uses only decision trees and generates diverse predictions through feature sampling and bootstrap aggregating (bagging). In contrast, heterogeneous ensembles integrate base models built using different algorithms, such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Gradient Boosting Decision Trees (GBDTs), to form a multimodal feature representation space. Among the three mainstream ensemble paradigms—Bagging, Boosting, and Stacking [24]—the former two primarily rely on the parallel or serial training of homogeneous weak learners to improve accuracy via variance reduction or bias correction. Conversely, the Stacking algorithm, characterized by its hierarchical modeling structure, effectively consolidates the strengths of heterogeneous base models. It provides a more robust solution, particularly when significant uncertainty exists regarding the selection of an optimal individual algorithm.

The main advantage of the Stacking ensemble algorithm lies in its hierarchical feature transformation mechanism. Its fundamental architecture comprises two critical layers of components: the Base Layer (Level-0), which integrates multiple heterogeneous learners to generate primary predictions, and the Meta-Layer (Level-1), where a meta-learner performs high-order combinations of the base-layer outputs. This two-layered framework is conceptually illustrated in Figure 4.

To further enhance the predictive performance, this study extends the conventional Stacking framework by increasing the number of internal learning layers, resulting in a multi-layer Stacking architecture, as illustrated in Figure 5.

The multi-layer Stacking achieves deep feature fusion through residual-like connections, where the original input features are concatenated with the output of each preceding layer. This mechanism enables subsequent models to learn the abstract representations produced by the earlier layers while retaining the raw data information. Mathematically, this design resembles a residual mapping: through skip connections, the input feature vector is concatenated with the outputs of the preceding layer along the feature dimension. This approach mitigates the attenuation of raw-data information that is commonly encountered when training deep architectures.

In the specific implementation, each layer can be flexibly configured with different types and numbers of individual learners. For instance, the base layer may deploy models such as XGBoost and Random Forest (RF) to handle structured construction features. Their outputs are then combined in the final meta-learning layer, establishing a robust hierarchical feature extraction and fusion system.

To improve the accuracy of shield attitude prediction, a hybrid PCA-SWO-Stacking framework is proposed, as illustrated in Figure 6. The process begins with mean normalization and Principal Component Analysis (PCA) to project high-dimensional data into a lower-dimensional space while retaining essential information through an optimized variance threshold. Subsequently, multiple heterogeneous base learners are selected, and their hyperparameters are globally optimized using the Spider Wasp Optimizer (SWO) to ensure robust individual model performance. To capitalize on deep feature fusion, a multi-layer Stacking architecture is implemented, where each layer iteratively learns from an augmented feature set—a concatenation of raw inputs and preceding layer outputs. Finally, to mitigate the risk of overfitting inherent in such a complex hierarchical structure, a K-fold cross-validation strategy is employed to enhance the model’s generalization capability and ensure reliable performance evaluation on unseen data.
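The multi-layer Stacking idea, with raw inputs concatenated to each layer's outputs and K-fold out-of-fold predictions used as training features, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the two stand-in learners (`Ridge` and `KNN`, written here in NumPy), the function names, and the use of a ridge regressor as the final model are all placeholders chosen to keep the example self-contained.

```python
import numpy as np


class Ridge:
    """Linear least squares with a small L2 penalty (stand-in learner)."""
    def __init__(self, lam=1e-3):
        self.lam = lam

    def fit(self, X, y):
        A = np.hstack([X, np.ones((len(X), 1))])
        self.w = np.linalg.solve(A.T @ A + self.lam * np.eye(A.shape[1]), A.T @ y)
        return self

    def predict(self, X):
        return np.hstack([X, np.ones((len(X), 1))]) @ self.w


class KNN:
    """k-nearest-neighbour regressor (stand-in learner)."""
    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        dist = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(dist, axis=1)[:, :self.k]
        return self.y[idx].mean(axis=1)


def out_of_fold(make_model, X, y, k=5):
    """Predict each sample with a model that never saw it during fitting."""
    oof = np.empty(len(X))
    for fold in np.array_split(np.arange(len(X)), k):
        train = np.setdiff1d(np.arange(len(X)), fold)
        oof[fold] = make_model().fit(X[train], y[train]).predict(X[fold])
    return oof


def fit_stack(X, y, layers, meta, k=5):
    """layers: list of lists of model factories; meta: factory for the final model."""
    fitted, feats = [], X
    for layer in layers:
        oof = np.column_stack([out_of_fold(m, feats, y, k) for m in layer])
        fitted.append([m().fit(feats, y) for m in layer])
        feats = np.hstack([X, oof])   # skip connection: raw inputs + layer outputs
    return fitted, meta().fit(feats, y)


def predict_stack(X, fitted, meta_model):
    feats = X
    for layer in fitted:
        preds = np.column_stack([m.predict(feats) for m in layer])
        feats = np.hstack([X, preds])
    return meta_model.predict(feats)


# Synthetic regression problem with a nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2]
layers = [[Ridge, lambda: KNN(5)], [Ridge, lambda: KNN(5)]]
fitted, meta_model = fit_stack(X[:240], y[:240], layers, Ridge)
pred = predict_stack(X[240:], fitted, meta_model)
```

Training each layer on out-of-fold predictions keeps later layers from seeing outputs that were fitted on their own targets, which is the role K-fold cross-validation plays in limiting overfitting in a stacked model.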

2.5. Evaluation Metrics

Shield attitude prediction is a regression task. Three commonly used metrics are employed to evaluate the performance of the proposed model. The Root Mean Squared Error (RMSE) has the same units as the original data and reflects the typical deviation between predicted and observed values.

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(7)

where n is the number of samples in the testing set, \hat{y}_{i} is the predicted value of the i-th sample, and y_{i} is the corresponding actual value.

The Mean Absolute Error (MAE) represents the average of the absolute differences between the predicted and actual values.

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}|

(8)

The Coefficient of Determination (R^{2}) is a core statistical metric for measuring the goodness-of-fit in regression analysis.

R^{2} = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}{\sum_{i=1}^{n} (\bar{y} - y_{i})^{2}}

(9)

Here, \bar{y} is the average of the observed values.
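The three metrics in Eqs. (7) to (9) are short enough to write out directly. The sketch below uses plain Python; the function names and the four-sample example are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Eq. (7): root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Eq. (8): mean absolute error."""
    n = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Eq. (9): coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((b - a) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((mean - a) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observed and predicted deviations, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.5]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

For this example the squared errors sum to 0.75 and the total sum of squares is 5, so R² = 1 − 0.75/5 = 0.85, with MAE = 0.375 and RMSE = √0.1875 ≈ 0.433.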

3. Case Study

3.1. Project Background

The Shanghai Beiheng Passageway project is located in the southeastern part of the Yangtze River Delta alluvial plain. The geological formation within a depth of 75 m exhibits typical soft soil characteristics. According to the engineering investigation results, the Quaternary sedimentary layers in the project area are primarily composed of highly compressible clay and fluid-plastic mucky soil, forming a typical coastal soft-soil foundation. The second contract section of the Shanghai Beiheng Passageway is divided into the east line and the west line.

According to the tunnel design and construction drawings, the central burial depth of the excavation cross-section in the east-line shield tunneling section generally ranges from approximately 30 m to 40 m. In contrast, due to the influence of alignment changes in the west-line section, the tunnel burial depth varies considerably, ranging between 12 m and 38 m. The differences in the stratigraphic structures traversed by the east and west lines can be comparatively analyzed using the representative geological profiles shown in Figure 7. The corresponding geological parameters are presented in Table 1.

3.2. Feature Selection

Based on relevant literature [25,26,27] and on-site engineering investigations, a comprehensive set of input parameters was selected from the initial 68 variables. Specifically, the mechanical inputs include six sets of thrust cylinder forces, six sets of thrust cylinder pressures, and four groups of articulation cylinder strokes. To leverage the temporal continuity of the tunneling process, the current shield attitude—comprising the horizontal and vertical deviations of both the head and tail (HDSH, VDSH, HDST, and VDST)—was incorporated as the baseline for predicting the attitude at the subsequent time step. Furthermore, to fully incorporate the influence of known geological conditions, several geotechnical parameters were integrated into the input feature set. These include cohesion (C), internal friction angle (φ), coefficient of lateral earth pressure (K₀), average compression modulus (Es), and tunnel overburden depth (D).

The Pearson correlation coefficients were calculated for the 27 features selected in the previous steps. A near-linear correlation between features indicates the presence of redundant information, which justifies selective elimination to simplify the model. The results demonstrated a perfect linear correlation between each set of thrust cylinder pressures and their corresponding thrust forces. Additionally, a strong positive correlation was observed between penetration rate and advance rate. To eliminate multicollinearity and reduce the dimensionality of the input space, the six sets of thrust cylinder pressures and the penetration rate were excluded. Following this feature screening process, a final set of 19 input parameters (features) was retained, as detailed in Table 2.
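The correlation-based screening can be illustrated with a small greedy filter: a feature is kept only if it is not nearly collinear with one already kept. This is a sketch, not the procedure used in the study; the 0.95 threshold, the feature names, and the numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.95):
    """Keep a feature only if |r| with every previously kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: pressure is proportional to force, and
# penetration rate closely tracks advance rate.
force_a = [10.0, 12.0, 9.0, 14.0, 11.0, 13.0]
columns = {
    "force_A": force_a,
    "pressure_A": [0.5 * f for f in force_a],
    "advance_rate": [30.0, 25.0, 35.0, 28.0, 33.0, 27.0],
    "penetration_rate": [20.1, 16.6, 23.4, 18.7, 21.9, 18.1],
}
kept = drop_redundant(columns)
```

The result depends on the order of the columns, since the first feature of each correlated pair is the one retained; here the thrust force and the advance rate survive, mirroring the choice described above.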

4. Analysis of Attitude-Prediction Results

Using the dataset from rings 700 to 1000 of the eastern section of the Beiheng Passageway, a total of 259,199 data samples were obtained after preliminary data preprocessing. The dataset was ordered by ring number to preserve the temporal sequence. To avoid data leakage, a time-based split was adopted: the first 80% of rings were used for training, and the last 20% were used for testing without random shuffling. In performing dimensionality reduction using PCA, the amount of information retained in the output matrix, defined as the cumulative variance contribution of the principal components, was set to no less than 90% of that of the original dataset. Figure 8 illustrates the variance contribution rates of the principal components. As indicated, the cumulative explained variance of the first eight principal components reached 90%. Hence, the first eight principal components were selected as the dimensionality-reduced features for the subsequent training of the model.
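Two steps in this paragraph lend themselves to a short sketch: splitting by ring number so that no later ring leaks into training, and choosing the smallest number of principal components whose cumulative variance contribution reaches the threshold. The function names, ring identifiers, and eigenvalues below are illustrative assumptions rather than project data.

```python
def split_by_ring(ring_ids, train_fraction=0.8):
    """Return (train_indices, test_indices); earlier rings go to training."""
    rings = sorted(set(ring_ids))
    n_train = max(1, int(len(rings) * train_fraction))
    last_train_ring = rings[n_train - 1]
    train = [i for i, r in enumerate(ring_ids) if r <= last_train_ring]
    test = [i for i, r in enumerate(ring_ids) if r > last_train_ring]
    return train, test


def n_components_for(eigenvalues, target=0.90):
    """Smallest number of components whose cumulative variance share >= target."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += lam
        if running / total >= target:
            return k
    return len(eigenvalues)


# Hypothetical samples tagged with ring numbers 700-709 (several samples per ring).
ring_ids = [700, 700, 701, 702, 702, 703, 704, 704, 705, 706, 707, 708, 709]
train_idx, test_idx = split_by_ring(ring_ids)

# Hypothetical eigenvalues of a covariance matrix.
n_keep = n_components_for([5.0, 2.0, 1.5, 1.0, 0.3, 0.2])
```

Splitting on ring boundaries rather than on sample indices keeps all samples from one ring on the same side of the split, which is what prevents leakage between neighbouring time steps.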

The PCA-SWO-Stacking model was then trained on the reduced features. The Stacking ensemble integrates five heterogeneous base learners, namely LightGBM, k-nearest neighbors (KNN), decision tree (DT), RF, and XGBoost, whose primary hyperparameters are summarized in Table 3. These models were selected because they are based on different algorithmic principles; this heterogeneity lowers the correlation among prediction errors and improves the predictive performance and generalization capability of the ensemble. The optimal hyperparameters for each sub-model were determined by the SWO algorithm, with the population size set to 40, the maximum number of iterations set to 200, and the hyperparameter search ranges given in Table 3. The Stacking architecture uses a two-layer structure with five-fold cross-validation (K = 5). Because this hierarchical framework is already complex, a simple meta-learner suffices; therefore, a weighted averaging method is used for model fusion.
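Weighted-average fusion of the base-learner outputs can be sketched as below. The text does not state how the weights are obtained, so the inverse-validation-error rule used here is purely an assumption for illustration, as are the function names and numbers.

```python
def inverse_error_weights(val_errors):
    """Assumed weighting rule: weights proportional to 1 / validation error."""
    inverse = [1.0 / e for e in val_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_average(predictions, weights):
    """predictions: one list of predicted values per model, all of equal length."""
    n = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) for i in range(n)]


# Hypothetical validation errors for three base learners and their predictions.
weights = inverse_error_weights([1.0, 2.0, 4.0])   # -> 4/7, 2/7, 1/7
fused = weighted_average([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], weights)
```

Because the weights are non-negative and sum to one, the fused prediction always lies between the smallest and largest base-learner predictions for each sample.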

The testing results are shown in Figure 9, where the red dashed line represents the ideal prediction line (where predicted values equal actual values). The scatter points in the figure represent the model’s predictions; points closer to the ideal line indicate higher prediction accuracy.

As shown in Figure 9, the overall prediction performance for the four shield attitude parameters is favorable, with all R² values exceeding 0.9. There is no apparent systematic deviation between the predicted and actual values, indicating a high level of prediction accuracy. Notably, the prediction performance for the two attitude parameters at the shield head is generally lower than that for the shield tail. This discrepancy may be attributed to the selected features being more strongly correlated with the shield tail attitude parameters, which gives them a greater influence on the tail predictions.

Employing the same dataset used for the PCA-SWO-Stacking model, comparative models based on RF, XGBoost, LSTM, and GRU algorithms were trained, and the identical test set was adopted for prediction. Taking the horizontal deviation of the shield tail as an illustrative example, Figure 10 presents a comparison of the prediction results obtained from the different models. As shown in the figure, the predicted values generated by the PCA-SWO-Stacking model are generally closer to the actual values than those produced by the other models. Although RF and XGBoost exhibit relatively stable predictive performance, their predictions still deviate from the actual values at certain locations, particularly for samples with larger fluctuations. Similarly, while the LSTM and GRU models are capable of capturing temporal dependencies in sequential tunneling data, their prediction accuracy and stability remain inferior to those of the proposed PCA-SWO-Stacking framework. Compared with the single-model approaches, the proposed PCA-SWO-Stacking model integrates heterogeneous base learners and further optimizes model parameters using the SWO algorithm, thereby achieving stronger nonlinear mapping capability and improved generalization performance under complex geological conditions. A quantitative performance comparison of the different models is provided in Table 4. Compared with the RF, XGBoost, LSTM, and GRU models, the PCA-SWO-Stacking model exhibits a substantially higher R² value, together with lower RMSE and MAE values. These results indicate that the proposed model more accurately characterizes the relationship between tunneling parameters and shield attitude during shield driving, thereby demonstrating superior predictive capability and robustness.

To further evaluate the PCA-SWO-Stacking model, data from rings 1 to 300 of the western section of the Beiheng Passageway were used for testing, following the same procedure described in the preceding sections. Ablation experiments were also performed. Specifically, the Stacking model with PCA only, the Stacking model with SWO hyperparameter optimization only, and the plain Stacking model without PCA or SWO were each trained and evaluated. In addition, the individual sub-models were constructed using the same hyperparameters as in the PCA-SWO-Stacking model. Taking the horizontal deviation of the shield tail (HDST) as an illustrative example, the performance comparison of the models is presented in Figure 11 and Table 5.

Experimental results indicate that ensemble learning, especially the Stacking framework, yields markedly superior prediction accuracy compared with conventional single models such as DT and KNN. The PCA-SWO-Stacking model achieves an R² of 0.927, in contrast to 0.723 and 0.795 for DT and KNN, respectively. Regarding computational efficiency, PCA reduces the prediction time by approximately 34% from 1.57 s to 1.03 s for the Stacking model, accompanied by an improvement in accuracy, with R² increasing from 0.901 to 0.904. SWO further increases the accuracy to 0.927 without increasing prediction time. The PCA-SWO-Stacking model also demonstrates favorable predictive performance for shield attitude in the western section of the project. Furthermore, it achieves a balance between high accuracy and the real-time requirements of shield attitude prediction, maintaining robust predictive capability while keeping prediction time within a reasonable range through PCA-based dimensionality reduction.
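As a quick check of the figures quoted above, the relative reduction in prediction time and the two accuracy increments work out as follows (all values are taken from the paragraph above):

```latex
\frac{1.57\,\mathrm{s} - 1.03\,\mathrm{s}}{1.57\,\mathrm{s}} = \frac{0.54}{1.57} \approx 0.344 \approx 34\%,
\qquad
\Delta R^{2}_{\mathrm{PCA}} = 0.904 - 0.901 = 0.003,
\qquad
\Delta R^{2}_{\mathrm{SWO}} = 0.927 - 0.904 = 0.023
```

The arithmetic is consistent with the interpretation that PCA mainly shortens prediction time while SWO accounts for most of the accuracy gain.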

To improve the interpretability and engineering explainability of the proposed PCA-SWO-Stacking framework, SHAP (SHapley Additive exPlanations) analysis was conducted to quantify the contribution of different input parameters to the prediction results of shield attitude. The corresponding SHAP analysis results are shown in Figure 12.

The SHAP results indicate that the four shield attitude parameters are primarily influenced by the thrust forces of different groups of hydraulic jacks. This is mainly because the differential distribution of jack thrust directly affects the force balance and attitude adjustment of the shield machine during tunneling. The analysis demonstrates that the proposed model not only achieves high prediction accuracy but also provides reasonable engineering interpretability consistent with the mechanical characteristics of shield tunneling operations. Therefore, the proposed framework exhibits strong potential for practical application in intelligent shield attitude prediction and control.

5. Discussion

This study proposes a shield attitude prediction model based on PCA-SWO-Stacking, which achieves favorable prediction performance and verifies the contributions of PCA preprocessing and the SWO optimization algorithm to the overall model. Specifically, PCA improves computational efficiency, whereas SWO enhances prediction accuracy.

The raw dataset used in this study was obtained from an engineering project in Shanghai, and the research primarily focuses on shield tunneling construction in soft-soil regions. To further assess the generalization capability and applicability of the proposed method, future work may consider collecting datasets from different geological conditions to conduct cross-regional comparative and validation studies. The Stacking ensemble model comprises five heterogeneous base models. Future investigations could explore the impact of incorporating more sub-models, as well as adopting new algorithms, on the overall Stacking architecture.

6. Conclusions

This study proposes a shield attitude prediction model based on PCA-SWO-Stacking, which achieves integrated fusion among different sub-models. The model architecture and its impact on the final prediction results are analyzed. The main conclusions are as follows:

(1) A complete pipeline for shield attitude prediction using PCA-SWO-Stacking is proposed. Principal Component Analysis (PCA) reduces the dimensionality of the high-dimensional shield tunneling data, extracting key features and suppressing noise. Multiple heterogeneous base models are integrated within the Stacking ensemble learning framework, and their hyperparameters are optimized using the SWO algorithm, which improves prediction accuracy. The proposed model performs well on all four shield attitude targets, with R² values of 0.940, 0.964, 0.997, and 0.991, respectively, and MAE values below 1.5 in all cases.

(2) To validate the superiority and stability of the PCA-SWO-Stacking shield attitude prediction model, four baseline models (RF, GRU, LSTM, and XGBoost) were constructed for comparison. Their R² values are 0.916, 0.883, 0.916, and 0.928, respectively, all lower than the 0.997 of the PCA-SWO-Stacking model in Table 4. Furthermore, the PCA-SWO-Stacking model and its sub-models were tested on data from the western section. The results show that the overall model outperforms every individual sub-model. Ablation experiments further verify the contributions of PCA preprocessing and the SWO optimization algorithm: the former improves computational efficiency, while the latter enhances prediction accuracy.

(3) The proposed PCA-SWO-Stacking framework demonstrates potential for practical engineering applications in shield tunneling. By enabling accurate real-time prediction of shield attitude, the method can assist operators in adjusting tunneling parameters and in reducing construction risks caused by excessive attitude deviation. The framework may also provide technical support for intelligent tunnel construction management and automated shield control under complex geological conditions.

Author Contributions

Conceptualization, J.Y.; Methodology, J.Y.; Formal analysis, J.Y.; Data curation, J.Y.; Writing—original draft, J.Y.; Writing—review & editing, M.Z.; Supervision, M.Z.; Project administration, M.Z.; Funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant No. 52078286) and the Shanghai Tunnel Engineering Co., Ltd. Special Research Project (2022-SK-01-5).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The funder had the following involvement with the study: participated in the provision of data.

References

Wang, L.; Wang, S.; Pan, Q. Data-Driven Predictions of Shield Attitudes Using Bayesian Machine Learning Incorporating Cross-Correlations and Spatial Correlations. Eur. J. Environ. Civ. Eng. 2025, 30, 1–35.
Shen, X.; Yuan, D.; Jin, D.; Wang, X.; Chen, X. Shield Attitude Adjustment Induced by Slurry Pressure Balance (SPB) Shield Tunneling Considering the Effects of Overbreak Cutter: A Numerical Simulation by DEM and Engineering Application. Urban Rail Transit 2023, 9, 221–232.
Wang, L.; Pan, Q.; Wang, S. Data-Driven Predictions of Shield Attitudes Using Bayesian Machine Learning. Comput. Geotech. 2024, 166, 106002.
Xiao, H.; Xing, B.; Wang, Y.; Yu, P.; Liu, L.; Cao, R. Prediction of Shield Machine Attitude Based on Various Artificial Intelligence Technologies. Appl. Sci. 2021, 11, 10264.
Sugimoto, M.; Sramoon, A. Theoretical model of shield behavior during excavation. I: Theory. J. Geotech. Geoenviron. Eng. 2002, 128, 138–155.
Sugimoto, M.; Sramoon, A.; Konishi, S.; Sato, Y. Simulation of shield tunneling behavior along a curved alignment in a multilayered ground. J. Geotech. Geoenviron. Eng. 2007, 133, 684–694.
Sugimoto, M.; Asanprakit, A. Stack pipe model for pipe jacking method. J. Constr. Eng. Manag. 2010, 136, 683–692.
Festa, D.; Broere, W.; Bosch, J.W. Kinematic Behaviour of a Tunnel Boring Machine in Soft Soil: Theory and Observations. Tunn. Undergr. Space Technol. 2015, 49, 208–217.
Sun, W.; Yue, M.; Wei, J. Relationship between Rectification Moment and Angle of Shield Based on Numerical Simulation. J. Cent. South Univ. 2012, 19, 517–521.
Shen, X.; Chen, X.; Bao, X.; Zhou, R.; Zhang, G. Real-Time Prediction of Attitude and Moving Trajectory in Shield Tunneling Based Optimal Input Parameter Combination Using Random Forest Deep Learning Method. Acta Geotech. 2023, 18, 6687–6707.
Huang, H.; Chang, J.; Zhang, D.; Zhang, J.; Wu, H.; Li, G. Machine learning-based automatic control of tunneling posture of shield machine. J. Rock Mech. Geotech. Eng. 2022, 14, 1153–1164.
Chen, H.; Li, X.; Feng, Z.; Wang, L.; Qin, Y.; Skibniewski, M.J.; Chen, Z.S.; Liu, Y. Shield Attitude Prediction Based on Bayesian-LGBM Machine Learning. Inf. Sci. 2023, 632, 105–129.
Wang, P.; Kong, X.; Guo, Z.; Hu, L. Prediction of Axis Attitude Deviation and Deviation Correction Method Based on Data Driven During Shield Tunneling. IEEE Access 2019, 7, 163487–163501.
Zhou, C.; Xu, H.; Ding, L.; Wei, L.; Zhou, Y. Dynamic prediction for attitude and position in shield tunneling: A deep learning method. Autom. Constr. 2019, 105, 102840.
Dai, L.; Chen, W.; Xiao, M.; Sun, W.; Wang, Z. Prediction of Super-Large Diameter Shield Attitude Based on LSTM-Transformer. Sci. Rep. 2025, 15, 15725.
Zeng, L.; Chen, J.; Zhang, C.; Yan, X.; Ji, F.; Chang, X.; Wang, S.; Feng, Z.; Xu, C.; Xiong, D. Prediction of Shield Tunneling Attitude: A Hybrid Deep Learning Approach Considering Feature Temporal Attention. Meas. Sci. Technol. 2024, 35, 086211.
Zhang, J.; Ding, X.; Deng, T.; Zheng, L.; Chen, G. Multistep transferable prediction of shield attitude and position in shield tunneling based on PCA and deep learning method. Meas. Sci. Technol. 2025, 36, 036312.
Fu, Y.; Chen, L.; Xiong, H.; Chen, X.; Lu, A.; Zeng, Y.; Wang, B. Data-Driven Real-Time Prediction for Attitude and Position of Super-Large Diameter Shield Using a Hybrid Deep Learning Approach. Undergr. Space 2024, 15, 275–297.
Chen, L.; Tian, Z.; Zhou, S.; Gong, Q.; Di, H. Attitude Deviation Prediction of Shield Tunneling Machine Using Time-Aware LSTM Networks. Transp. Geotech. 2024, 45, 101195.
Dong, M.; Chen, C.; Zhong, F.; Jia, P. A Novel Hybrid Deep Learning for Attitude Prediction in Sustainable Application of Shield Machine. Sustainability 2025, 17, 10604.
Xu, J.; Zhang, Z.; Zhang, L.; Liu, D. Predicting shield position deviation based on double-path hybrid deep neural networks. Autom. Constr. 2023, 148, 104775.
Liu, X.; Yang, K. Shield machine pose prediction based on CNN-GRU-Attention. Eng. Res. Express 2024, 6, 035238.
Abdel-Basset, M.; Mohamed, R.; Jameel, M.; Abouhawwash, M. Spider wasp optimizer: A novel meta-heuristic optimization algorithm. Artif. Intell. Rev. 2023, 56, 11675–11738.
Ribeiro, M.; Coelho, L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl. Soft Comput. 2020, 86, 105837.
Yue, M.; Guo, L. Double Closed-Loop Adaptive Rectification Control of a Shield Tunneling Machine with Hydraulic Actuator Dynamics Subject to Saturation Constraint. J. Vib. Control 2016, 22, 309–319.
Zhang, Z.; Ma, L. Attitude Correction System and Cooperative Control of Tunnel Boring Machine. Int. J. Pattern Recognit. Artif. Intell. 2018, 32, 1859018.
Wang, X.; Yuan, D.; Wang, X.; Wu, J. Kinematic Analysis and Virtual Prototype Simulation of the Thrust Mechanism for Shield Machine. Appl. Sci. 2022, 12, 1431.

Figure 1. Geometric diagram of shield attitude parameters.

Figure 2. Workflow of the Spider Wasp Optimizer (SWO) algorithm.

Figure 3. General structure of ensemble learning.

Figure 4. Structure of the Stacking ensemble learning framework.

Figure 5. Multi-layer stacking ensemble learning structure.

Figure 6. Workflow of the PCA-SWO-Stacking model for shield attitude prediction.

Figure 7. Geological cross-sections of the east and west lines: (a) west line; (b) east line. Different colors represent different geological strata encountered along the tunnel alignment, including silty clay, sandy silt, and powdery clay layers. The solid black lines indicate the tunnel center axes, while the dashed red lines represent the designed tunnel alignments.

Figure 8. PCA scree plot.

Figure 9. Prediction results of shield attitude: (a) HDSH; (b) VDSH; (c) HDST; (d) VDST.

Figure 10. Comparison of test-set predictions from five models.

Figure 11. Comparison of the performance of different models.

Figure 12. SHAP analysis results of shield attitude: (a) HDSH; (b) VDSH; (c) HDST; (d) VDST.

Table 1. Physical and mechanical parameters of soil in each stratum.

Soil Layer Number	Soil Layer Name	Unit Weight γ (kN/m³)	Cohesion c (kPa)	Internal Friction Angle φ (°)	Compression Modulus Es (MPa)	Lateral Pressure Coefficient K₀
②₁	Brownish yellow to gray sandy silt	18.2	16	14.5	4.4	0.50
②₃	Yellow to gray sandy silt	18.5	3	30.6	8.9	0.35
③	Gray mucky silty clay	17.2	11	14.0	2.7	0.60
④	Gray mucky clay	16.8	10	10.8	2.3	0.62
⑤₁	Gray silty clay	17.8	15	15.4	3.7	0.50
⑤₃	Gray powdery clay interbedded with silt	18.0	15	18.2	4.9	0.45
⑤₄	Gray-green powdery clay	19.6	48	17.9	7.7	0.40
⑥	Dark green silty clay	19.6	48	15.9	8.0	0.40
⑦₁	Straw yellow to gray silty sand	18.9	2	31.4	12.7	0.32
⑦₂	Straw yellow to gray silty fine sand	18.7	1	32.8	12.7	0.30
⑧₁₋₁	Gray clay	18.0	19	15.0	5.4	0.50

Table 2. Input features for the shield attitude prediction model.

Category	Parameter	Unit	Count
Construction Parameters	Thrust cylinder force	kN	6
	Cutterhead speed	r/min	1
	Advance rate	mm/min	1
	Cutterhead torque	MN·m	1
	Total thrust	kN	1
Current Shield Attitude	HDSH	mm	1
	VDSH	mm	1
	HDST	mm	1
	VDST	mm	1
Geological & Design Info	Cohesion	kPa	1
	Internal friction angle	°	1
	Coefficient of lateral earth pressure	-	1
	Average compression modulus	MPa	1
	Tunnel overburden depth	m	1
Total			19

Table 3. Optimization range and values for model hyperparameters.

Model	Primary Hyperparameters	Hyperparameter Range	Value
LightGBM	n_estimators	[50, 500]	312
	learning_rate	[0.01, 0.30]	0.1
	num_leaves	[5, 255]	30
KNN	K	[1, 50]	7
DT	Splitting criterion	-	RMSE
	max_depth	[3, 20]	8
RF	n_estimators	[10, 200]	42
	max_depth	[3, 13]	9
	Splitting criterion	-	RMSE
XGBoost	n_estimators	[50, 500]	328
	max_depth	[3, 15]	9
	learning_rate	[0.01, 0.30]	0.2
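
As a hedged illustration of how the numeric search ranges in Table 3 might be encoded for an optimizer, the sketch below stores each range as a bound pair, checks that the reported values lie inside their ranges, and shows the bound-clipping step that population-based optimizers commonly apply to candidate solutions. The dictionary layout and the helper names (`sample_candidate`, `clip_to_bounds`, `violations`) are assumptions introduced here. The sampling is plain uniform random sampling, not the SWO update rules, and the splitting criterion is omitted because it has no numeric range.

```python
import random

# Numeric search ranges copied from Table 3 as (lower, upper) bounds.
SEARCH_SPACE = {
    "LightGBM": {"n_estimators": (50, 500), "learning_rate": (0.01, 0.30), "num_leaves": (5, 255)},
    "KNN": {"K": (1, 50)},
    "DT": {"max_depth": (3, 20)},
    "RF": {"n_estimators": (10, 200), "max_depth": (3, 13)},
    "XGBoost": {"n_estimators": (50, 500), "max_depth": (3, 15), "learning_rate": (0.01, 0.30)},
}

# Optimized values reported in the "Value" column of Table 3.
SELECTED = {
    "LightGBM": {"n_estimators": 312, "learning_rate": 0.1, "num_leaves": 30},
    "KNN": {"K": 7},
    "DT": {"max_depth": 8},
    "RF": {"n_estimators": 42, "max_depth": 9},
    "XGBoost": {"n_estimators": 328, "max_depth": 9, "learning_rate": 0.2},
}


def sample_candidate(model, rng):
    """Draw one random candidate inside the bounds (not the SWO update rule)."""
    candidate = {}
    for name, (low, high) in SEARCH_SPACE[model].items():
        if isinstance(low, int):
            candidate[name] = rng.randint(low, high)   # integer hyperparameter
        else:
            candidate[name] = rng.uniform(low, high)   # continuous hyperparameter
    return candidate


def clip_to_bounds(model, params):
    """Pull any out-of-range value back onto the nearest bound."""
    clipped = {}
    for name, value in params.items():
        low, high = SEARCH_SPACE[model][name]
        clipped[name] = min(max(value, low), high)
    return clipped


def violations(selected):
    """List every (model, hyperparameter) whose value lies outside its range."""
    return [
        (model, name)
        for model, params in selected.items()
        for name, value in params.items()
        if not SEARCH_SPACE[model][name][0] <= value <= SEARCH_SPACE[model][name][1]
    ]


print(violations(SELECTED))  # an empty list means every reported value is in range
print(sample_candidate("XGBoost", random.Random(1)))
```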

Table 4. Performance comparison of different models.

Model	R²	RMSE	MAE
PCA-SWO-Stacking	0.997	1.653	1.031
RF	0.916	8.061	5.129
XGBoost	0.928	7.442	5.126
GRU	0.883	9.497	6.846
LSTM	0.916	8.071	5.603
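
For readers who wish to reproduce the three metrics reported in Table 4, a minimal sketch is given below. It assumes the standard definitions of R², RMSE, and MAE. The sample values are made up for illustration and are not data from the project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for paired measured and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical attitude deviations in mm (illustrative only).
measured = [10.0, 12.0, 15.0, 11.0, 9.0]
predicted = [11.0, 12.0, 14.0, 10.0, 9.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

For these made-up values the function gives R² ≈ 0.858, RMSE ≈ 0.775, and MAE = 0.6. RMSE and MAE carry the unit of the target, whereas R² is dimensionless.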

Table 5. Comparison of model performance.

Model	R²	RMSE	Prediction Time (s)
PCA-SWO-Stacking	0.927	2.621	0.96
PCA-Stacking	0.904	2.937	1.03
SWO-Stacking	0.921	2.614	1.54
Stacking	0.901	2.873	1.57
RF	0.846	7.766	0.43
XGBoost	0.854	6.942	0.59
LightGBM	0.801	7.856	0.51
KNN	0.795	8.543	0.07
DT	0.723	9.932	0.08
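
The ablation effects can be read directly from the four Stacking rows of Table 5. The short sketch below does the arithmetic: the relative prediction-time saving attributable to PCA and the R² gain attributable to SWO. The dictionary and function names are assumptions introduced for illustration, and the numbers are copied from the table.

```python
# Stacking variants from Table 5: R², RMSE, and prediction time in seconds.
TABLE_5 = {
    "PCA-SWO-Stacking": {"r2": 0.927, "rmse": 2.621, "time_s": 0.96},
    "PCA-Stacking": {"r2": 0.904, "rmse": 2.937, "time_s": 1.03},
    "SWO-Stacking": {"r2": 0.921, "rmse": 2.614, "time_s": 1.54},
    "Stacking": {"r2": 0.901, "rmse": 2.873, "time_s": 1.57},
}


def time_saving(with_pca, without_pca):
    """Fraction of prediction time saved when PCA is added."""
    return 1 - TABLE_5[with_pca]["time_s"] / TABLE_5[without_pca]["time_s"]


def r2_gain(with_swo, without_swo):
    """Absolute increase in R² when SWO tuning is added."""
    return TABLE_5[with_swo]["r2"] - TABLE_5[without_swo]["r2"]


pca_effect = {
    "with SWO": time_saving("PCA-SWO-Stacking", "SWO-Stacking"),
    "without SWO": time_saving("PCA-Stacking", "Stacking"),
}
swo_effect = {
    "with PCA": r2_gain("PCA-SWO-Stacking", "PCA-Stacking"),
    "without PCA": r2_gain("SWO-Stacking", "Stacking"),
}
print({k: f"{v:.1%}" for k, v in pca_effect.items()})
print({k: round(v, 3) for k, v in swo_effect.items()})
```

With these table values, PCA shortens the prediction time by about 38% when SWO is used and by about 34% when it is not, while SWO raises R² by 0.023 with PCA and by 0.020 without it.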

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
