Article

Shield Machine Attitude Prediction Method Based on Causal Graph Convolutional Network

1
School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan 430068, China
2
Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, China
3
CCCC Wuhan Zhi Xing International Engineering Consulting Company Ltd., Wuhan 430068, China
*
Author to whom correspondence should be addressed.
Algorithms 2026, 19(3), 224; https://doi.org/10.3390/a19030224
Submission received: 9 January 2026 / Revised: 3 March 2026 / Accepted: 9 March 2026 / Published: 16 March 2026

Abstract

Accurately predicting and controlling the attitude of a shield tunneling machine is critical for quality assurance in shield tunneling projects. Existing prediction methods use historical data to build machine learning models that forecast future attitude deviations; however, such models are poorly interpretable and offer little practical engineering guidance. To address these shortcomings, this study proposes a causal graph convolutional network (C-GCN-GRU), a deep learning method aimed at improving the interpretability of shield attitude prediction. The causal relationships between key attitude features of the shield machine are identified and quantified with the PCMCI+ method. The discovered causal relationships are converted into adjacency matrices and fed into a model combining GCN and GRU, augmented with multi-head causal attention, to forecast the shield machine attitude. Results on a dataset from the Karnaphuli River Tunnel Project in Bangladesh show that the C-GCN-GRU model predicts the four variables characterizing shield attitude and position more accurately than four comparable models and provides decision support for attitude and position adjustments in shield tunnels.

1. Introduction

The uneven mass distribution, varying frictional resistance and operational complexity of the shield machine make its attitude and position hard to control in actual operation, resulting in a serpentine movement of its trajectory around the designed tunnel axis (DTA) [1]. Shield tunnel deviation can introduce errors into the tunnel structure, which in turn pose safety risks for later operations, such as quality problems that may lead to damage and leakage of tunnel tubes [2]. In addition, such deviation increases the complexity of segment assembly and may shift the excavation route, with significant consequences for the cost control and scheduling of the tunneling project [3].
Currently, feedback-based control methods are used for shield control. Because corrective action can take effect only after a deviation has occurred, such control is inherently untimely, and this is the main cause of serpentine motion in shield tunnel construction [4]. In addition, attitude and position control in shield tunneling exhibits hysteresis: there is a time and distance lag between a change in the thrust operation mode of the shield machine and the resulting change in its movement path. Current control also relies heavily on operator experience, which is highly uncertain and further exacerbates the serpentine motion [5].
The shield tunnel boring process has accumulated a wealth of experimental data, which cover a wide range of working conditions and present a comprehensive picture of the boring process and its results. It is worth noting that with the wide application of machine learning techniques in shield attitude prediction, these techniques have attracted much attention due to their ability to reveal the multifactorial and complex relationships affecting shield attitude. Although these methods offer certain advantages in enhancing the accuracy of shield attitude prediction, they often struggle to address the inherent correlations between shield attitude parameters and operational factors. Additionally, many existing machine learning techniques tend to implicitly capture the relationships between different parameters. While this may reduce computational complexity, it compromises the interpretability of the model. As a result, the interactions among various factors remain unclear, leading to models with opaque inner workings that are difficult to comprehend. These challenges hinder effective decision-making, risk management, and mitigation strategies in tunnel projects. Moreover, the limited interpretability of machine learning models further complicates the understanding of the fundamental principles driving their predictions [6]. The importance of integrating interpretability mechanisms into predictive modeling pipelines has been similarly recognized in other high-stakes decision domains, where structured explainability strategies have been shown to improve both model transparency and practical utility [7]. Some researchers have recently suggested explanatory models to shed light on black-box models and their predictive outcomes. However, these approaches often neglect the significance of causal knowledge [8,9]. 
Causality differs from correlation: it not only reveals the trend of change between variables but also explains how a change in one variable directly affects another. Constructing causal networks helps improve the accuracy, interpretability and reliability of forecasting models, and identifying causal relationships in time-series data is crucial for prediction in the field of shield tunneling [10].
The Graph Convolutional Network (GCN) extends neural network processing to non-regular graph data, improving processing efficiency and accuracy [11]. GCN has been used in conjunction with the Gated Recurrent Unit (GRU) for real-time traffic flow prediction to capture both spatial and temporal dependencies [12] and has shown potential for regression analysis of non-uniform data in tasks such as information mapping extraction [13]. The aim of this study is to combine causality and GCN to improve the accuracy and interpretability of shield tunneling attitude prediction.
The key contributions of this study are outlined below: (1) A data-driven approach was applied to extract the causal relationships between shield machine attitude features as input features for developing deep learning models to improve the interpretability of the models. (2) A novel deep learning method based on a causal spatio-temporal graph convolutional network is proposed to predict shield machine attitude parameters with high accuracy. (3) A multi-head causal attention mechanism is introduced into the model, which not only effectively reduces the influence of confounding factors but also focuses on the key causal chains more accurately than the traditional attention mechanism, thus significantly improving the overall performance and generalization ability of the model.
The proposed C-GCN-GRU method represents a targeted methodological advancement within the GCN-GRU framework for shield machine attitude prediction, through the principled integration of causal discovery and confounding-aware attention mechanisms. The goal is to improve both prediction accuracy and physical interpretability within this specific engineering application.
The rest of the paper is organized as follows. Section 2 describes the measurement methods related to shield attitude. Section 3 presents the proposed method in detail. In Section 4, the effectiveness of the proposed method is evaluated using real tunnel construction data from the Karnaphuli River Tunnel Project in Bangladesh. In Section 5, the results are fully compared and discussed. Finally, conclusions are drawn in Section 6.

2. Research Background of Shield Machine Attitude and Position Control

The control of the attitude and position of the shield machine is critical, as significant deviations in its trajectory can result in long-term issues with the lining quality. To provide a clearer context for this study, this section outlines the fundamentals, including current industry practices for shield machine position control and relevant past research in the literature.

2.1. Measurement of Shield Machine Attitude and Position

Accurate attitude and position measurements are essential during tunnel excavation, enabling operators to steer the shield machine along its designed trajectory (DTA) [14]. Earlier manual measurements, with low efficiency and poor accuracy, have been replaced by automated measurement methods such as laser guidance systems. A laser guidance system consists of a laser theodolite mounted on the jacking axis and a target board at the target position; the position and attitude of the shield machine are obtained by analyzing the laser reflection points. Meanwhile, an inclinometer can provide high-frequency measurements of pitch and roll angles but is susceptible to vibration [15]. To improve measurement accuracy and anti-interference capability, modern shields usually combine the two methods [16,17].
Using the gathered data, the attitude and position of the shield machine can be determined through sensors placed at six locations, with the results displayed as deviations from the target position. These sensors measure horizontal and vertical deviations at the head, center (near the joint jacks), and tail of the shield machine. Figure 1 shows a schematic diagram of the shield machine position measurement and sensor locations.

2.2. Deep Learning Methods for Intelligent Tunnel Boring Applications

In the last decade, deep learning has become one of the most prominent areas of artificial intelligence; with its powerful representation and modeling capabilities, deep learning methods have demonstrated remarkable performance and potential across many industries and applications. In project construction, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Graph Neural Networks (GNN) have been extensively explored with promising results [18]. Their applicability to shield machine attitude prediction is briefly described next.
CNN and its variants are widely used for classification and target recognition tasks, with major applications in the construction industry for structural health monitoring, schedule and resource management, and personnel safety management [19]. CNN has also been used in intelligent tunneling, such as Gong et al.’s real-time slag analysis system [20], Zhao et al.’s CNN-based tunnel crack detection method [21], and Qin et al.’s study on identifying tunnel lining elements using ground penetrating radar images [22], which demonstrated the superiority of CNN in image processing. In addition, Zhao et al. constructed a network for accurately detecting tunnel cracks by combining an attention module with a U-net [23]. However, the application of CNN in the prediction of shield machine operation parameters is still limited by its characteristic of only processing Euclidean data, while the data structure of shield machine parameters is more complicated.
RNN and its variants are very powerful in processing time-series data and are able to capture the cyclic patterns of each loop of data during tunneling, and thus are widely used for parameter prediction tasks in tunneling, such as penetration rates, load parameters and face pressures. For example, Fu and Zhang used LSTM to accurately estimate the penetration rate and analyze the key influencing factors [24], and Gao et al. investigated the performance of RNN, LSTM, and GRU and found that GRU was the best in the prediction of key parameters of the shield machine [25]. In addition, Li and Gong accurately predicted the slurry pressure by diagonal RNN [26], while Dai et al. used a hybrid approach combining CNN and GRU to predict shield machine attitude and position [27]. These studies have shown that RNN methods perform well in capturing the time dependence of shield machine operating parameters, but they still have limitations in modeling relationships between variables.
GNN and its variants are mostly used to process graph data with a unique and non-Euclidean structure [28], of which GCN is one of the most commonly used models and has been widely studied [29]. In the civil engineering and construction fields, GCN has especially excelled in traffic prediction. For example, Zhao et al. combined GCN with LSTM to estimate traffic flows in real time [12], while Lee and Rhee predicted traffic speed based on a hybrid GCN combining distance, direction and location relationships [30]. In tunneling, attitude parameters can be considered as nodes with dependencies as links, thus constituting graph-structured data. However, unlike traffic sensors, shield machine parameters are difficult to define relationships in terms of Euclidean distances, and conventional correlation coefficients, while describing variable dependencies, may not be sufficient to improve prediction accuracy.
In summary, CNN, RNN and GCN are the three key deep learning networks in shield machine research. CNN mainly deals with images but has limited applicability in shield machine prediction tasks; RNN has been widely used and is effective in shield machine parameter prediction, but the prediction accuracy can still be improved if inter-parameter relationships are taken into account; and GCN has great potential for load prediction but has challenges in constructing the adjacency matrix. To this end, we will combine the inter-parameter causal relationships to construct the GCN adjacency matrix required for shield machine attitude prediction. Therefore, this paper proposes a new C-GCN-GRU method to quantify variable causality and predict attitude factors with high precision.

3. Methodology

To infer the causal relationships between the attitude factors of the shield machine and provide reliable predictions of the key attitude factors, a new method, C-GCN-GRU, is put forward; its framework is shown in the flowchart in Figure 2. First, the collected data are preprocessed, and the causal relationships between the variables are detected with the PCMCI+ method, which allows the adjacency matrix to be constructed more accurately. Next, the adjacency matrix and the dataset are fed into our proposed deep learning model, which combines GCN and GRU to capture both spatial dependencies among shield parameters and temporal dynamics, with interpretability further enhanced by the causal adjacency matrix.

3.1. Data Preparation and Causation

The shield boring data were organized by ring number and saved as separate CSV files. However, the raw data contain various invalid entries, which add superfluous information and reduce the prediction accuracy of the model, so data preprocessing is crucial [31]. Downtime records are first filtered out of the dataset, and then outliers are handled. In this study, outliers were identified using the 3-σ rule [32] and replaced with the mean of adjacent data points. Note that this rule assumes an approximately normal distribution and performs best when the proportion of outliers is small (≤~1%); for the TBM operational parameters used in this study, these conditions were verified prior to preprocessing. A Butterworth filter was then applied to reduce noise and enhance the model's predictive accuracy. The filter's squared amplitude response is given by Equation (1), where $w$ is the frequency, $w_c$ the cutoff frequency, and $m$ the filter order; signal components outside the cutoff range are attenuated.
$$\left| H(w) \right|^2 = \frac{1}{1 + \left( w / w_c \right)^{2m}} \quad (1)$$
Due to the different dimensions of the in situ collected data, all features were normalized to [0, 1] using standard min–max normalization [33] to eliminate scale differences and improve training stability. The inverse transformation was applied to back-normalize predictions for evaluation.
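The preprocessing chain described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the filter order and cutoff are illustrative choices, since the paper does not report them.

```python
# Sketch of the preprocessing pipeline: 3-sigma outlier replacement,
# Butterworth low-pass filtering (Eq. 1), and min-max normalization.
# Filter settings (order m=4, cutoff 0.1 x Nyquist) are assumed values.
import numpy as np
from scipy.signal import butter, filtfilt

def replace_outliers_3sigma(x):
    """Replace points outside mean +/- 3*std with the mean of their neighbors."""
    x = x.astype(float).copy()
    mu, sigma = x.mean(), x.std()
    for i in np.where(np.abs(x - mu) > 3 * sigma)[0]:
        lo, hi = max(i - 1, 0), min(i + 1, len(x) - 1)
        x[i] = 0.5 * (x[lo] + x[hi])          # mean of adjacent samples
    return x

def butterworth_smooth(x, cutoff=0.1, order=4):
    """Zero-phase low-pass filter; cutoff is a fraction of the Nyquist rate."""
    b, a = butter(order, cutoff, btype="low")
    return filtfilt(b, a, x)

def min_max(x):
    """Scale a feature to [0, 1], as in the normalization step."""
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 10, 200)) + 0.1 * rng.standard_normal(200)
signal[50] = 15.0                              # inject an artificial outlier
clean = min_max(butterworth_smooth(replace_outliers_3sigma(signal)))
print(clean.min(), clean.max())                # values lie in [0, 1]
```

The inverse of `min_max` (multiply by the stored range, add the stored minimum) would be applied to back-normalize predictions for evaluation.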
To adapt the processed two-dimensional data for the time-series model, a sliding window method was used to convert it into a three-dimensional format organized by batch, time, and feature. Initially a two-dimensional matrix with time steps and features, the normalized data is incompatible with time-series inputs until reshaped.
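The sliding-window reshape can be illustrated as below; the window length and prediction horizon are illustrative values (the model description later uses a sequence length of 30 and 6 output steps).

```python
# Minimal sketch of the sliding-window conversion: a (time_steps, features)
# matrix becomes (batch, window, features) samples with aligned future targets.
import numpy as np

def sliding_window(data, window=30, horizon=6):
    X, y = [], []
    for t in range(len(data) - window - horizon + 1):
        X.append(data[t:t + window])                     # past `window` steps
        y.append(data[t + window:t + window + horizon])  # next `horizon` steps
    return np.stack(X), np.stack(y)

data = np.random.rand(100, 16)          # 100 time steps, 16 features
X, y = sliding_window(data)
print(X.shape, y.shape)                 # (65, 30, 16) (65, 6, 16)
```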
The causal relationships among the predicted parameters are then tested on the preprocessed data using the PCMCI+ algorithm, which comprises two phases: (1) a condition-selection phase based on the PC algorithm, which identifies the relevant lagged parents of every time series; and (2) the momentary conditional independence (MCI) test, which tests whether each lagged variable $X^i_{t-\tau}$ causally influences each variable $X^j_t$ at time $t$ (where $i$ and $j$ index different features in the original data).
In the first stage, the PC algorithm tests the dependence of all parameter nodes: conditional independence is tested for each pair of time series $X^i_{t-\tau}$ and $X^j_t$, and links whose significance falls below the threshold $\alpha_{PC}$ are discarded.
In the second stage, the MCI test examines and extracts potential directed causal links, conditioning on the lagged parent sets identified in the first stage together with all contemporaneous neighbors. The MCI test is carried out using Equation (2).
$$\text{MCI:}\quad X^i_{t-\tau} \;\perp\!\!\!\perp\; X^j_t \;\Big|\; \mathcal{P}\!\left(X^j_t\right) \setminus \left\{X^i_{t-\tau}\right\},\; \mathcal{P}\!\left(X^i_{t-\tau}\right) \quad (2)$$
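To make the role of this step concrete, the sketch below shows how discovered lagged links become the binary adjacency matrix fed to the GCN. It is only an illustrative stand-in: the paper uses PCMCI+ (reference implementation: the `tigramite` package), whereas this toy version merely thresholds lagged correlations and performs no conditional-independence testing.

```python
# Illustrative stand-in for the causal-graph step (NOT PCMCI+ itself):
# build a binary adjacency matrix A by thresholding lagged correlations,
# to show how discovered links become the GCN input matrix.
import numpy as np

def lagged_adjacency(data, tau_max=3, threshold=0.3):
    """data: (T, n_vars). A[i, j] = 1 if some lag of var i links to var j."""
    T, n = data.shape
    A = np.eye(n)                                   # keep self-loops
    for i in range(n):
        for j in range(n):
            for tau in range(1, tau_max + 1):
                r = np.corrcoef(data[:-tau, i], data[tau:, j])[0, 1]
                if abs(r) > threshold:
                    A[i, j] = 1.0
                    break
    return A

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y = np.roll(x, 2) + 0.1 * rng.standard_normal(300)  # y driven by x at lag 2
A = lagged_adjacency(np.column_stack([x, y, rng.standard_normal(300)]))
print(A)        # A[0, 1] = 1: the lag-2 link from x to y is recovered
```

In the actual method, the PCMCI+ significance tests replace the naive threshold, so that only links surviving conditioning on the parent sets enter the adjacency matrix.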

3.2. Causal Spatio-Temporal Deep Learning Model Development

In order to optimize the operating parameters of the shield machine and control the trajectory deviation, a model is needed to simulate the nonlinear relationship between the input parameters and the trajectory deviation. In this paper, we propose the causal extraction adjacency matrix to be incorporated into the GCN-GRU deep learning model with a multi-head causal attention mechanism. In this section, we introduce the GCN, GRU structure, multi-head causal attention mechanism and the training and evaluation process of the model in detail.

3.2.1. GCN-GRU Mechanism

During the shield machine tunneling process, there are complex interactions between the attitude parameters, which contain not only dependencies in the time dimension but also spatial correlations between the parameters. Therefore, we propose to combine the adjacency matrix established by causality with GCN to specifically capture the spatial dependencies among shield machine parameters. Unlike traditional adjacency matrices constructed based on correlation coefficients or expert experience, our adjacency matrix is constructed by causal relationships identified from actual construction data through the PCMCI+ method, which can more accurately reflect the physical influence mechanisms among parameters.
The main idea of GCN is to extract features by aggregating information from each node and its neighbors according to the connection relationships in the graph structure. The hidden state at layer $l+1$ can be expressed as in Equation (3). The GCN model structure is shown schematically in Figure 3.
$$H^{(l+1)} = f\!\left(H^{(l)}, A\right) \quad (3)$$
where $H^{(l)}$ is the input of the $l$-th layer ($H^{(0)}$ is the initial input data), and $A$ is the adjacency matrix. Models differ mainly in the choice of the propagation function $f(\cdot,\cdot)$. The forward propagation rule adopted here is given in Equation (4).
$$f\!\left(H^{(l)}, A\right) = \sigma\!\left(A H^{(l)} W^{(l)}\right) \quad (4)$$
where $W^{(l)}$ is the trainable weight matrix of layer $l$ and $\sigma(\cdot)$ is a nonlinear activation function. Multiplying by $A$ sums the feature vectors of all neighbors of each target node but excludes the node itself; adding the identity matrix to $A$ restores this self-connection. In addition, symmetrically normalizing the adjacency matrix as $D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, where $D$ is the degree matrix, keeps the feature scale stable during propagation. The forward propagation equation can therefore be written as Equation (5).
$$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \quad (5)$$
where $\tilde{A} = A + I$, $I$ is the identity matrix, and $\tilde{D}$ is the degree matrix of $\tilde{A}$. The resulting node embeddings can be fed into any loss function for forward propagation, with stochastic gradient descent and backpropagation used to adjust the weight parameters.
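One layer of the propagation rule in Equation (5) can be verified numerically; the sizes below are illustrative, and ReLU stands in for the unspecified activation $\sigma$.

```python
# Numerical sketch of one GCN layer, Eq. (5):
# H' = sigma(D~^{-1/2} A~ D~^{-1/2} H W), with ReLU as the activation.
import numpy as np

def gcn_layer(H, A, W):
    A_tilde = A + np.eye(A.shape[0])                 # A~ = A + I (self-loops)
    d = A_tilde.sum(axis=1)                          # degrees of A~
    D_inv_sqrt = np.diag(d ** -0.5)                  # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt        # symmetric normalization
    return np.maximum(0.0, A_hat @ H @ W)            # ReLU(A_hat H W)

rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])                         # 3-node chain graph
H = rng.standard_normal((3, 4))                      # 4 input features per node
W = rng.standard_normal((4, 8))                      # 8 output features
out = gcn_layer(H, A, W)
print(out.shape)                                     # (3, 8)
```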
The tunneling parameters of the shield machine show obvious time-series characteristics, and the historical trends of the parameters have an important influence on the future attitude. In our test data analysis, we observed complex nonlinear time-series dependencies of shield machine parameters for the Karnaphuli River Tunnel Project, as they varied in different geological conditions. Therefore, we need a model structure that can effectively capture the long-term time-series dependence and is computationally efficient.
Many studies have highlighted not only the causal relationships between TBM parameters but also their temporal dependency. In time-series prediction tasks, a commonly employed deep learning approach is the Recurrent Neural Network (RNN), which is particularly effective for handling sequential dependencies in data. The GRU is a computationally efficient variant of the LSTM that addresses the gradient-vanishing problem of standard RNNs through update and reset gating mechanisms, and it has demonstrated strong performance in TBM parameter prediction tasks [25]. This study employs the GRU as the temporal modeling component; its structure is depicted in Figure 4.
The GRU regulates information flow through two gating mechanisms: the reset gate $R_t$ controls how much of the previous hidden state $h_{t-1}$ is combined with the current input $x_t$, while the update gate $Z_t$ determines how much prior memory is retained. Each time step produces a hidden state $h_t$ passed to subsequent steps. The specific formulas of the GRU are given in Equations (6)–(10).
$$Z_t = \sigma\!\left(W_z \cdot \left[h_{t-1}, x_t\right]\right) \quad (6)$$
$$R_t = \sigma\!\left(W_r \cdot \left[h_{t-1}, x_t\right]\right) \quad (7)$$
$$\tilde{h}_t = \tanh\!\left(W \cdot \left[R_t \odot h_{t-1}, x_t\right]\right) \quad (8)$$
$$h_t = \left(1 - Z_t\right) \odot h_{t-1} + Z_t \odot \tilde{h}_t \quad (9)$$
$$\tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}} \quad (10)$$
Here, the output hidden state $h_t$ of the GRU serves as the input to the attention computation: a weight matrix $W$ is randomly initialized, $h_t$ is multiplied by $W$, a bias is added, and a $\tanh$ operation is applied to the sum to obtain the attention weights for each modality.
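A single GRU step following Equations (6)–(9) can be sketched numerically; the hidden and input sizes are illustrative, and the weight matrices act on the concatenation $[h_{t-1}, x_t]$ as in the formulas.

```python
# Numerical sketch of one GRU step, Eqs. (6)-(9). Bias terms are omitted
# for brevity; gates use the logistic sigmoid.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x_t, Wz, Wr, W):
    hx = np.concatenate([h_prev, x_t])                        # [h_{t-1}, x_t]
    z = sigmoid(Wz @ hx)                                      # update gate, Eq. (6)
    r = sigmoid(Wr @ hx)                                      # reset gate,  Eq. (7)
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x_t]))  # candidate,   Eq. (8)
    return (1.0 - z) * h_prev + z * h_tilde                   # new state,   Eq. (9)

rng = np.random.default_rng(0)
hidden, inputs = 5, 3
Wz, Wr, W = (rng.standard_normal((hidden, hidden + inputs)) for _ in range(3))
h = np.zeros(hidden)
for t in range(10):                                  # roll the cell forward
    h = gru_step(h, rng.standard_normal(inputs), Wz, Wr, W)
print(h.shape)                                       # (5,)
```

Because $h_t$ is a convex combination of $h_{t-1}$ and the $\tanh$-bounded candidate state, the hidden state stays within $(-1, 1)$, which is what makes the gating numerically stable over long sequences.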

3.2.2. Multiple Causal Attention Mechanisms

In the shield tunneling environment, the factors affecting the attitude of the shield machine are very complex, including changes in geological conditions, adjustments of operating parameters, and various unmeasured factors (e.g., groundwater and rock interfaces). These factors, as potential confounders, may cause the model to misidentify the relationship between certain parameters. For example, in our data analysis, we found that the correlation between cutter torque and thrust increased significantly when the shield traversed a water-bearing sand layer, but this correlation does not necessarily reflect a direct causal relationship between them; it may be due to the common influencing factor of groundwater pressure.
In the traditional self-attention mechanism, the weights are usually obtained by multiplying the query set Query with the key set Key and then used to update the value set Value. These weights are learned without supervision, i.e., the attention weights carry no labels during training, which can introduce data bias. For example, if the training data contain many descriptions of "man riding a horse", the self-attention mechanism will tend to associate "riding" with "man" and "horse". At test time, when encountering the scenario of "a person driving a horse and carriage", the self-attention mechanism may incorrectly associate "person" with "horse" and infer "riding", while overlooking the "carriage".
The problem is essentially caused by confounding factors (a technical term in causal inference): there is no direct causal relationship between X and Y, yet X and Y remain correlated. The situation can be explained by a causal structure diagram, as shown in Figure 5.
In the figure, X is the input image, Y is the label, C denotes common-sense knowledge (e.g., a person can ride a horse) and acts as the confounding factor, and M is the target object in image X. The causal graph shows two paths from X to Y: X→M→Y and X←C→M→Y (the confounded path). Therefore, no matter how large the dataset, if the confounders are unknown, the true causal effect can never be identified by training the model on P(Y|X) alone.
To solve this problem, Nanyang Technological University and Monash University, Australia, jointly proposed Causal Attention (CATT), which uses the front-door criterion and does not require assumed knowledge of the confounders [34]. In-sample attention (IS-ATT) and cross-sample attention (CS-ATT) are proposed to comply with the Q-K-V operation, and the parameters of the Q-K-V operation can be shared between IS-ATT and CS-ATT to further improve efficiency in certain architectures. The mathematical formulations of in-sample sampling, cross-sample sampling, and the front-door criterion are given in Equations (11)–(13).
$$\hat{Z} = \sum_{z} P\!\left(Z = z \mid h(X)\right) z \quad (11)$$
$$\hat{X} = \sum_{x} P\!\left(X = x \mid f(X)\right) x \quad (12)$$
$$P\!\left(Y \mid do(X)\right) = \sum_{z} P\!\left(Z = z \mid h(X)\right) \sum_{x} P\!\left(X = x \mid f(X)\right) P\!\left(Y \mid Z = z, X = x\right) \quad (13)$$
where $h(X)$ and $f(X)$ are both feature encoding functions, and $X$ and $Z$ denote vectors.
The structure of a single module of causal attention is shown in Figure 6.
These comprise IS-ATT and CS-ATT. After computing $\hat{Z}$ and $\hat{X}$, they can be fed into the predictor to make decisions, or passed through further stacked attention layers for deeper embeddings.
Correspondingly, the in-sample attention (IS-ATT) is computed as follows:
$$A_I = \mathrm{Softmax}\!\left(Q_I^{\top} K_I\right) \quad (14)$$
$$\hat{Z} = V_I A_I \quad (15)$$
where $K_I$ and $V_I$ are derived from the features of the current input sample, and $Q_I$ is derived from $h(X)$. In cross-modal attention, the query vectors represent the sentence context, while in the self-attention mechanism they still represent the input sample features. For $A_I$, each attention vector $a_I$ is an IS-sampling probability estimate of $P(Z = z \mid h(X))$, and the output $\hat{Z}$ is the IS-sampling evaluation vector.
Similar to IS-ATT, the structure of cross-sample attention (CS-ATT) is shown in the red part of Figure 6, and it is computed as follows:
$$A_C = \mathrm{Softmax}\!\left(Q_C^{\top} K_C\right) \quad (16)$$
$$\hat{X} = V_C A_C \quad (17)$$
where both $K_C$ and $V_C$ are derived from other samples in the training set, and $Q_C$ is derived from $f(X)$. Each attention vector $a_C$ approximates $P(X = x \mid f(X))$, and $\hat{X}$ is the CS-sampling evaluation vector.
Finally, the outputs of IS-ATT and CS-ATT are concatenated to form the final estimate of $P(Y \mid do(X))$. In this study, the "other samples" in CS-ATT are drawn exclusively from within the same training mini-batch, ensuring that no validation or test information is accessible during the attention computation. Mini-batches are constructed solely from the training partition under the chronological data split, strictly preventing information leakage through the cross-sample attention mechanism.
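The two Q-K-V branches above can be sketched as plain matrix operations. This is a shape-level illustration only: the dimensions and the column-wise token layout are assumptions, and the real module derives $K_C$, $V_C$ from other mini-batch samples rather than a single random matrix.

```python
# Minimal sketch of the shared Q-K-V operation in IS-ATT and CS-ATT:
# IS-ATT attends over the current sample's features, CS-ATT over features
# pooled from other training samples. Dimensions are illustrative.
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(Q, K, V):
    """Shared operation: output = V * Softmax(Q^T K)."""
    return V @ softmax(Q.T @ K, axis=-1)

rng = np.random.default_rng(0)
d, n_cur, n_other = 16, 30, 30
Q = rng.standard_normal((d, n_cur))
K_I = V_I = rng.standard_normal((d, n_cur))     # from the current sample (IS-ATT)
K_C = V_C = rng.standard_normal((d, n_other))   # from other samples (CS-ATT)
Z_hat = attend(Q, K_I, V_I)                     # in-sample estimate
X_hat = attend(Q, K_C, V_C)                     # cross-sample estimate
out = np.concatenate([Z_hat, X_hat], axis=0)    # concatenated representation
print(out.shape)
```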
The CATT module is incorporated primarily as a confounding-aware architectural component, whose design is theoretically inspired by the front-door criterion. While the front-door criterion provides the motivation for the intra-sample and cross-sample attention structure, the formal conditions required for front-door identification cannot be rigorously verified in the shield tunneling context. The module should therefore be understood as improving robustness to spurious correlations and distributional biases in the training data, rather than formally identifying causal effects in the do-calculus sense.
The C-GCN-GRU approach begins by using the PCMCI+ algorithm to extract and quantify causal relationships among shield machine parameters. These quantified causal connections are then processed through the CATT module to emphasize relevant causal features. Afterward, the data are fed into a deep learning model that integrates Graph Convolutional Networks (GCN) with Gated Recurrent Units (GRU) to model spatial relationships and temporal dependencies. Figure 7 illustrates the full structure of this deep learning model, showing how each component contributes to its overall predictive capability. The input temporal graph data of shape (batch_size, seq_len, 16) first pass through the input projection layer, which transforms the features into 128 dimensions. The Causal Self-Attention (CATT) module then processes the data while maintaining the temporal causal structure. The output of the attention module, $H^{(0)}$, flows through two consecutive GCN layers, which incorporate adjacency-matrix information in the graph convolution while maintaining a 128-dimensional feature representation. Following the two GCN layers, the output tensor of shape (batch_size, seq_len, 128) is transposed to (batch_size, 128, seq_len), so that the sequence-length dimension (30) serves directly as the input size of the first GRU layer. This transposition aligns the GCN spatial embeddings with the sequential input format required by the GRU temporal modeling component, without introducing additional learnable parameters. Three successive GRU layers process the sequence information using explicit hidden states ($h_{t-3}$, $h_{t-2}$, $h_{t-1}$). Finally, the GRU output is passed through fully connected layers to produce the final 6-dimensional output.
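The tensor-shape bookkeeping in Figure 7 can be traced with random arrays. This is purely a dimensional walk-through under stated assumptions (batch size 8 and GRU width 64 are illustrative; the CATT and GCN stages are shape-preserving and are represented by placeholders, not by real layers).

```python
# Shape-level walk-through of the Figure 7 pipeline, verifying that the
# stated dimensions line up: (batch, 30, 16) -> project to 128 -> CATT/GCN
# (shape-preserving) -> transpose -> GRU over 128 steps -> FC head -> 6 outputs.
import numpy as np

batch, seq_len, n_feat = 8, 30, 16
x = np.random.rand(batch, seq_len, n_feat)

W_in = np.random.rand(n_feat, 128)
h = x @ W_in                                   # input projection -> (8, 30, 128)
# CATT and the two GCN layers preserve the (batch, seq_len, 128) shape.
h = h.transpose(0, 2, 1)                       # -> (8, 128, 30): seq_len feeds GRU
gru_hidden = 64                                # illustrative GRU width
h = np.random.rand(batch, 128, gru_hidden)     # stand-in for the 3 GRU layers' output
W_fc = np.random.rand(gru_hidden, 6)
out = h[:, -1, :] @ W_fc                       # fully connected head on last step
print(out.shape)                               # (8, 6)
```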

3.3. Model Training and Evaluation

The dataset is divided into a training set, a validation set and a test set: the first 70% of the data is used for training, the next 10% for validation, and the remaining 20% for testing model performance. The dataset was partitioned in strict chronological order to preserve the temporal dependency structure of the boring process and prevent data leakage. Standard k-fold cross-validation was not applied, as it would violate the temporal ordering of the data and introduce look-ahead bias. This chronological partitioning strategy is consistent with the rolling-forecasting-origin approach for time-series modeling, ensuring that the model is evaluated solely on data recorded after the training period, in line with realistic operational deployment conditions. The sliding window was applied independently within each partition after the chronological train/validation/test split, with no window spanning partition boundaries, to prevent information leakage. The model adopts a direct multi-step prediction strategy, simultaneously outputting all 6 future time steps in a single forward pass without recursive feedback of predicted values, thereby avoiding exposure bias and compounding prediction errors during both training and evaluation. Normalization parameters were computed solely on the training set and applied consistently to the validation and test sets. During training, the input data are passed through sufficient iterations to fit the corresponding outputs, and the loss function used is the mean square error (MSE) to ensure model optimization and performance improvement. The loss function is calculated from Equation (18).
$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( f_i(x) - y_i \right)^2 \quad (18)$$
where $N$ is the total number of samples, $f_i(x)$ is the model output, and $y_i$ is the corresponding observed value.
Trained deep learning models must be carefully evaluated to check their performance, as only models with high performance can provide reliable results after optimization. Model performance is evaluated using three standard metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the coefficient of determination ($R^2$), where $n$ is the number of test samples, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values.
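The three metrics in their standard form can be computed directly; the toy values below are only to show the calculation.

```python
# Standard definitions of the three evaluation metrics.
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(mae(y, y_hat), rmse(y, y_hat), r2(y, y_hat))  # 0.15, ~0.158, 0.98
```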
The fine-tuning of model hyperparameters is guided by domain expertise and prior experience, followed by systematic grid search to identify optimal configurations. Table 1 consolidates the complete model configuration, including training hyperparameters, architectural parameters, data preprocessing settings, and computational environment details, to ensure full reproducibility. All experiments were conducted with a fixed random seed of 42.

4. Case Studies

4.1. Project Background

Bangladesh’s Karnaphuli River Tunnel, also known as the Bangabandhu Sheikh Mujibur Rahman Tunnel, is the first underwater road tunnel in South Asia. Launched in 2016, the project has a total length of 3.32 km, including two 2.45 km-long four-lane tunnel tubes dedicated to vehicular traffic. This tunnel serves as the case study of the present work.

4.2. Model Training and Validation Details

The data collected from the Shield Data Acquisition System includes 1224 rings, each with over 1700 features and more than 100 data records. These records were obtained at intervals of 20 mm during the digging process. This comprehensive dataset provides detailed measurements for further analysis and model development.
In this study, only four attitude parameters, HDH, HDT, VDH, and VDT, are considered, and the segment of the dataset from the 700th to the 800th ring (out of 1224 rings) is selected for demonstration. This segment was chosen because it represents a geologically homogeneous section under steady-state boring conditions, providing a controlled and physically interpretable testbed for the proposed methodology. The full dataset of 1224 rings is available, and the framework is designed to be applicable to other segments through re-estimation of the causal graph under local geological conditions. Cross-segment validation across the full tunnel alignment is identified as an important direction for future work. The feature selection procedure identified 15 shield features and 4 attitude features as input features for the prediction model. Specific descriptions of these features are presented in Table 2.
After identifying the desired features, outliers are first handled with the 3-σ rule, and a Butterworth filter is then applied to reduce the effect of individual noise points. To effectively remove noise while maintaining the integrity of the data trend, the filter order was set to 2 and the cutoff frequency to 0.2.
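A minimal sketch of the 3-σ rule is shown below; here outliers are clipped to the 3-σ bounds, one possible choice since the paper does not specify the replacement scheme. The subsequent Butterworth filtering (order 2, cutoff 0.2) would typically use `scipy.signal.butter`/`filtfilt` and is omitted to keep the sketch dependency-free.

```python
# Sketch of the 3-sigma outlier rule: samples beyond mean ± 3*std are
# clipped to the bound (clipping is an assumption; replacement by
# interpolation would be an alternative). Butterworth smoothing with
# order 2 and cutoff 0.2 would follow as a separate step.

def three_sigma_clip(values):
    n = len(values)
    mu = sum(values) / n
    sigma = (sum((v - mu) ** 2 for v in values) / n) ** 0.5
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return [min(max(v, lo), hi) for v in values]

raw = [10.0] * 50 + [500.0] + [10.0] * 49   # one gross outlier in 100 samples
clean = three_sigma_clip(raw)               # the spike is pulled to the 3-sigma bound
```

Clipping rather than deleting keeps the series equally spaced, which matters for the 20 mm-interval records used here.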
The preprocessed data serves dual purposes: it enables the extraction of causal information and supports training of the C-GCN-GRU deep learning model. In the PCMCI+ method, two primary parameters guide the causal detection process: the maximum lag time $\tau$ and the significance level $\alpha_{PC}$. The maximum lag time $\tau$ controls how many prior time steps are analyzed to detect causality, while $\alpha_{PC}$ sets the statistical significance threshold for confirming causal links. It is acknowledged that PCMCI+ operates under the assumptions of causal sufficiency, stationarity, and faithfulness. In this study, the selected segment (rings 700–800) exhibits relatively homogeneous geological conditions, supporting the approximate stationarity assumption. Causal sufficiency is approximated by including all primary operational and attitude parameters identified through domain knowledge, though unmeasured confounders (e.g., localized geological heterogeneity) cannot be fully excluded. The multi-head causal attention module provides partial robustness against unmeasured confounders via the front-door criterion. The identified causal structure should therefore be interpreted as a physically informed, data-driven approximation rather than verified ground-truth causality, and re-estimation of the causal graph is recommended when applying the framework to segments with substantially different geological conditions.
In this study, it is crucial to choose an appropriate significance threshold $\alpha_{PC}$, which directly affects the identification of causal relationships. To determine this threshold scientifically, we used the Akaike information criterion (AIC), a statistical method widely used for model selection. The purpose of the AIC is to select, from the candidate models, the one that best explains the data without overfitting. The AIC is computed by a simple formula:
$\mathrm{AIC} = 2k - 2\ln\hat{L}$
where $k$ is the number of estimated parameters, and $\hat{L}$ is the maximum value of the likelihood function.
In practice, we tested a range of $\alpha_{PC}$ values from 0.1 to 0.5 and evaluated the model by calculating the AIC value at each setting. The AIC value was minimized at $\alpha_{PC} = 0.3$, indicating that at this significance level the model achieves the best explanation of the data with an appropriate level of complexity. The choice of $\alpha_{PC} = 0.3$ thus reflects not only statistical significance but also a joint consideration of prediction accuracy and model simplicity. As an example, the causal effects at $\tau = 2$ are visualized in Figure 8 and Figure 9: Figure 8 shows a directed acyclic graph, and Figure 9 shows a time-series graph. The link colors indicate the strength of the causal effect between two parameters, while the node colors indicate the magnitude of the causal effect relative to the previous time step.
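The AIC-based selection of $\alpha_{PC}$ can be illustrated as follows; the parameter counts and log-likelihoods attached to each candidate threshold are hypothetical placeholders, not values from the project.

```python
# Illustrative sketch of AIC-based selection of alpha_PC: AIC = 2k - 2 ln(L-hat).
# Larger alpha admits more causal links (larger k) and fits better (higher ln L);
# AIC penalizes the extra parameters. All numbers below are hypothetical.

def aic(k, log_likelihood):
    """Akaike information criterion for k parameters and log-likelihood ln(L-hat)."""
    return 2 * k - 2 * log_likelihood

# hypothetical (alpha_PC, k, ln L-hat) triples for five candidate thresholds
candidates = [
    (0.1, 12, -40.0),
    (0.2, 18, -31.0),
    (0.3, 22, -20.0),   # denser graph with a much better fit
    (0.4, 30, -19.0),
    (0.5, 38, -18.5),
]

best_alpha = min(candidates, key=lambda c: aic(c[1], c[2]))[0]
```

With these placeholder values the minimum AIC falls at $\alpha_{PC} = 0.3$, mirroring the trade-off between goodness of fit and graph sparsity described above.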
Key hyperparameters such as the number of layers, the neuron count per layer, and the activation functions are essential for optimizing the GCN and GRU components of the model. In this study, various configurations of the GCN and GRU structures were tested to find the combination yielding the best performance. Through these trials, the final architecture was determined: a GCN with two layers and a GRU with three layers, providing an effective balance between complexity and predictive accuracy. The two GCN layers have 16 and 128 neurons, respectively, each GRU layer has 128 neurons, and the activation function is ELU. The problem of features having different ranges and units is addressed by normalization.
The input data required for model training includes the training data and an adjacency matrix. The training data comprises 16 sequential features: 15 generic features and 1 attitude feature to be predicted. Following engineering practice, a sliding window approach is used: the window length is set to 30, meaning that the model input covers the previous 30 time steps with a step size of 1, and the output predicts the attitude feature values for the next 6 time steps. One of the main challenges in this study is controlling the complexity of the model while maintaining its accuracy, especially with high-dimensional time-series data. To address this issue, a systematic approach is used to select the best time lag $\tau$ to optimize model performance and reduce potential confounding.
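The sliding-window construction above (window 30, stride 1, direct 6-step targets) can be sketched as follows; the feature and target arrays are stand-ins for the real records.

```python
# Sketch of sliding-window sample construction: each input is the previous
# 30 time steps, each output is the next 6 target values predicted jointly
# (direct multi-step strategy, no recursive feedback of predictions).

WINDOW, HORIZON = 30, 6

def make_samples(features, targets, window=WINDOW, horizon=HORIZON):
    X, Y = [], []
    for i in range(len(features) - window - horizon + 1):
        X.append(features[i:i + window])                       # 30-step input
        Y.append(targets[i + window:i + window + horizon])     # 6-step target
    return X, Y

feats = list(range(100))   # stand-in for the 16-dimensional feature vectors
targs = list(range(100))   # stand-in for one attitude parameter
X, Y = make_samples(feats, targs)
```

Because all six horizon values appear in each target vector, the model learns to emit them in a single forward pass, which is what avoids the error accumulation of recursive forecasting.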
Experimentally evaluating the impact of different time lags (from $t$ to $t-3$), we find that $\tau = 2$ provides the best balance. As shown in Table 3, Table 4 and Table 5, model performance at $\tau = 2$ is superior to that at $\tau = 0$ and $\tau = 1$, indicating that smaller lags fail to capture all relevant variables and dynamics in the data. Meanwhile, $\tau = 3$ yields performance similar to $\tau = 2$, but the longer lag increases model complexity, raising both computational cost and training time. By choosing $\tau = 2$ as the lag for model training, we effectively reduce unnecessary model complexity while controlling for potential confounding factors in the high-dimensional data. This choice not only improves the overall performance of the model but also ensures computational efficiency, making the model better suited to the speed and accuracy requirements of practical applications. Overall, by reasonably choosing the time lag and optimizing the model structure, we effectively control the potential confounding factors in the high-dimensional time-series data, which is crucial to ensuring the accuracy and reliability of the model predictions.

4.3. Analysis of Results

This section is organized into two parts: analysis of the causal relationships between parameters, and analysis of the model prediction results. Causal relationships between TBM features can be identified purely from data with the PCMCI+ method.
Traditional purely data-driven models lack consideration of the system's physical mechanisms, and when used in GCNs they still require an adjacency matrix as input. The adjacency matrix in traditional GCNs is usually based on spatial or physical distance thresholding, Pearson's correlation coefficients, or other statistical correlation metrics. These capture some relationship between nodes but mainly reflect statistical correlation rather than causality, and they ignore the system's physical mechanisms, resulting in a poorly interpretable model. In contrast, our method constructs the adjacency matrix from the causal relationships extracted by the PCMCI+ algorithm, thereby incorporating the physical causal mechanisms of shield machine operation into the deep learning framework. This 'physics-aware' property grounds the model's predictions not only in statistical correlation but also in the physical causal mechanisms of the system, which greatly improves interpretability.
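Converting a set of discovered causal links into a binary adjacency matrix for the GCN can be sketched as follows. The node names and links below are hypothetical; in the paper the links come from PCMCI+ (e.g., via the tigramite package), and collapsing all lags onto a single graph is one possible design choice.

```python
# Sketch: build a binary directed adjacency matrix from causal links.
# Node names and (cause, effect, lag) triples are hypothetical examples
# of what a causal-discovery step might report.

features = ["X1", "X2", "X3", "Y1"]
links = [("X1", "X2", 1), ("X1", "X3", 0), ("Y1", "X2", 2)]

idx = {name: i for i, name in enumerate(features)}
n = len(features)
adj = [[0] * n for _ in range(n)]        # adj[i][j] = 1 iff i -> j
for cause, effect, _lag in links:
    adj[idx[cause]][idx[effect]] = 1     # lags collapsed onto one static graph
```

Only discovered causal edges become nonzero entries, so the graph convolution aggregates information exclusively along physically meaningful directions, unlike a dense correlation matrix.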
Unlike traditional ‘black-box’ deep learning methods, our C-GCN-GRU model explicitly identifies and quantifies the causal relationships between key parameters of shield machines through the PCMCI+ algorithm and presents these relationships in the form of intuitive causal graphs. These diagrams directly demonstrate causal chains such as ‘Cutter Torque → Yaw Angle’, providing operators with an interpretable basis for decision-making and enabling them to understand the potential path of influence of parameter adjustments.
In order to introduce the relationship between causal and physical mechanisms more clearly, some features with strong correlation are extracted from Figure 8 for analysis. Figure 10 illustrates the data-driven dependency structure identified by the PCMCI+ algorithm under standard causal discovery assumptions (approximate stationarity, limited hidden confounding, and minimal sensor delays). The identified links are consistent with known physical mechanisms in shield tunneling. The diagram reveals the key engineering mechanisms: thrust (X1) increases cutting resistance and requires synchronized adjustment of the cutter speed (X2) to maintain optimal cutting efficiency (cutting balance), while higher thrust requires higher torque (X3) as the cutter penetrates deeper (rock cutting mechanics). Front Horizontal Deviation (Y1) triggers RPM adjustment to generate asymmetric lateral forces to correct trajectory (Guidance Feedback). Increased blade torque indicates greater ground resistance, prompting an automatic reduction in propulsion speed (X4) to prevent overload (drag control). Higher propulsion speeds result in increased rear horizontal deviation (Y2) due to machine inertia, control response lag, and reduced hydraulic accuracy (speed affects steering inertia). The ground pressure relationship shows how the bottom pressure (X7) transfers to the left side pressure (X8) in accordance with the principle of stress redistribution, whereas an uneven top pressure (X5) creates a transverse moment that affects the horizontal alignment of the front (top pressure effect). Differences in jack travel (X9) reflect machine tilt, requiring mud flow (X12) to be adjusted to maintain proper surface support (attitude compensation), while uneven mud distribution creates unbalanced lateral pressures that affect rear alignment (pressure equalization). 
Left side pressure changes trigger cutter RPM adjustments to offset uneven cutting resistance (Adaptive Adjustment), while mud pressure (X11) modifications require RPM adjustments to maintain digging efficiency (Mud Drive Adjustment). These findings indicate that the dependency structure identified by PCMCI+ is broadly consistent with established engineering knowledge of shield machine operation, lending physical plausibility to the inferred relationships. Nevertheless, we acknowledge that unobserved factors such as localized geological transitions and operator interventions may act as confounders, and that the identified structure may vary across geological regimes. These results not only validate the effectiveness of the PCMCI+ method but also provide a scientific basis for the optimization of shield machine operation.
With the remaining 20% of the data, the performance of the trained model can be evaluated. The results of the C-GCN-GRU model on the test data are shown in Table 6, Table 7 and Table 8. Since this study performs multi-step prediction of the shield attitude parameters, the evaluation differs from the traditional sliding-window single-step setting, which only needs to assess one prediction per series. To observe multi-step prediction performance, each $t+n$ prediction is compared with the corresponding $t+n$ actual value. In multi-step forecasting, prediction difficulty generally grows with the forecast horizon, so errors at more distant steps tend to be larger.
According to the prediction results, this paper analyzes them using MAE, RMSE, and $R^2$ as evaluation indexes. The scores show that the C-GCN-GRU model predicts all the positional results with high accuracy: the MAE does not exceed 1.94 and the highest RMSE is 2.70 at each time step ($t+1$ to $t+6$), and the lowest $R^2$ across the attitude parameters is 0.902, at the $t+6$ step, which is the hardest to predict. The mean MAE values for HDH, HDT, VDH, and VDT are 1.5239, 1.3587, 1.6769, and 0.7680, respectively; the mean RMSE values are 2.1993, 1.7962, 2.2789, and 1.0283, respectively; and the mean $R^2$ values are 0.9345, 0.9755, 0.9292, and 0.9940, respectively. These results show that the proposed method can offer dependable attitude predictions to the shield driver, allowing the shield attitude to be adjusted in time. Figure 11, Figure 12, Figure 13 and Figure 14 illustrate the prediction outcomes of the four attitude parameters at time step $t+6$. The predicted values closely align with the actual values, demonstrating that the model retains high prediction accuracy even at the farthest time step, with no significant deviations.
Figure 15 illustrates a scatter plot of the predicted results, where the horizontal axis shows the true values and the vertical axis shows the predicted values. The curves at the top and right of the plot show the distribution of the data. The fit between the predicted results and the true values is visualized by a red dashed line: the closer the scatter is to this line, the better the model predicts. The scatter points are almost always distributed around the $y = x$ line. Among these results, VDT shows an excellent fit. It should be noted that the predictions of HDH and VDH are relatively poor, with the average of the six time-step $R^2$ values being 0.9. A possible interpretation of these deviations is that the real HDH and VDH data fluctuate more strongly, while the forecast values tend to be smoothed. Nevertheless, the line graphs show that the proposed method tracks the deviation trends well, providing useful information for shield machine attitude adjustment.

5. Discussion

To assess the causal adjacency matrix extracted by the PCMCI+ algorithm, the GCN-GRU backbone, and the enhancement from the multi-head causal attention mechanism, this study designed ablation and comparative experiments to highlight the performance of C-GCN-GRU. Finally, the adjacency matrices extracted by the PCMCI+ algorithm and the Pearson correlation method are compared to illustrate the advantage of causality extraction for the model.

5.1. Ablation Experiment

To further demonstrate the capability of the proposed method, the model performance is compared with the traditional GRU model, GCN-GRU-NA, C-GCN-GRU-NA, and GCN-GRU models, as summarized in Table 9.
It is worth noting that the GRU baseline and the model in this paper share the same backbone, consisting of three GRU layers and two fully connected layers. The GCN-GRU model and the C-GCN-GRU-NA model have the same network topology as the C-GCN-GRU model, except that the former removes the multi-head causal attention module, and the latter replaces the adjacency matrix with an identity matrix. The average prediction results over the six time steps of each model are used as the indicators of model prediction performance in the subsequent study, as shown in Table 10, Table 11 and Table 12.
As shown in the tables, the three evaluation indexes for HDH, HDT, VDH, and VDT in the three ablation experiments are all inferior to those of the C-GCN-GRU. Compared with the conventional GRU, the proposed model reduces the MAE of HDH, HDT, VDH, and VDT by 13.12%, 3.91%, 21.63%, and 3.19%, respectively; reduces the RMSE by 5.94%, 2.41%, 7.87%, and 1.80%, respectively; and increases the $R^2$ by 1.83%, 1.22%, 1.29%, and 0.02%, respectively. Comparing the proposed model with GCN-GRU and C-GCN-GRU-NA leads to the conclusion that removing the multi-head causal attention mechanism, or replacing the adjacency matrix produced by the PCMCI+ algorithm with an identity matrix, degrades the overall predictive effectiveness. Notably, the model using the identity matrix in place of the causal adjacency matrix performs even worse than the traditional GRU, which reinforces the importance of correctly constructing the adjacency matrix when using graph convolution. All these results indicate the reliability of C-GCN-GRU for shield machine attitude prediction. All experiments were conducted with a fixed random seed (42) to ensure reproducibility. For ablation comparisons where performance differences are relatively small, the improvements should be interpreted as indicative trends rather than definitive conclusions, and multi-run statistical validation is recommended in future work to further confirm these findings.

5.2. Comparison Experiment

To demonstrate the superiority of the model, LSTM, CNN-LSTM, BIGRU, GCN-LSTM, Graph WaveNet, and MTGNN are chosen for comparison on the same dataset. LSTM and CNN-LSTM are more basic networks that performed well in early time-series prediction work. The BIGRU model is a recurrent neural network consisting of two independent GRU units; through its bidirectional structure, it captures both forward and backward information in the sequence data, leading to a better understanding and prediction of sequential patterns. Graph WaveNet learns implicit dependencies between variables through adaptive matrices and captures dynamics with temporal convolution networks, modeling both spatial and temporal dimensions and handling long series via extended receptive fields. MTGNN automatically learns variable dependencies to construct graph structures, combining graph learning modules, temporal convolution layers, and gating mechanisms; it enhances prediction through multivariate receptive fields and attention mechanisms, resulting in improved forecasting performance. GCN-LSTM was applied to shield machine attitude prediction in 2023; compared to it, the model proposed in this paper is more computationally efficient and better suited to scenarios requiring fast processing of time-series data. All baseline models were trained and evaluated under identical conditions. Hyperparameters for each baseline were selected through grid search over the same candidate value ranges used for C-GCN-GRU, with final configurations determined by validation set performance. No model-specific advantage was introduced through differential tuning.
All baseline and ablation models were trained using the same input feature set (X1–X15 and one historical attitude parameter) and the same sliding window length of 30 time steps, ensuring a fair and consistent comparison across all evaluated methods. Table 13, Table 14 and Table 15 present the specific experimental results, and Figure 16, Figure 17, Figure 18 and Figure 19 show line plots comparing the predicted and true values at step $t+6$ for each model. Among them, the GCN-LSTM model is even less effective than the base model in predicting the VDH parameter, possibly because the GCN captures spatial dependence well while the LSTM component does not fully exploit the temporal dependence. While the Graph WaveNet and MTGNN models have theoretical advantages in constructing dynamic graph structures and capturing spatial dependencies, they may overfit on the specific dataset of this study. In general, however, the above data show that the model proposed in this paper has the best prediction performance.
Figure 20 illustrates a box plot comparison of the error distributions of different prediction models. All models show relatively symmetric error distributions, with the median line close to zero, indicating that there is no obvious systematic deviation between the predicted and true values of the models. In terms of box height, the C-GCN-GRU model proposed in this paper has the narrowest quartile spacing, indicating that its prediction error distribution is more concentrated, and its prediction stability is higher than that of the other compared models. The smaller range of upper and lower whiskers in the box plot and the relatively small number of outliers further demonstrate that C-GCN-GRU maintains good prediction performance under various conditions. The C-GCN-GRU model proposed in this paper more effectively integrates the spatio-temporal features, thus achieving more accurate and stable prediction results.
This paper also compares the training time, total model parameters and memory usage of the models, as shown in Table 16.
Through a comprehensive comparative analysis of multiple deep learning models, our C-GCN-GRU model shows clear advantages. In terms of training cost, the running time of C-GCN-GRU is 2 min 36 s, slightly longer than the 1 min 39 s of CNN-LSTM; this small difference is acceptable given that our model has stronger representation capability and a more complex structural design. Notably, compared to the 3 min 20 s of GCN-LSTM, our model exhibits higher computational efficiency, reducing the training time by about 22% while maintaining a similar parameter size.
Although the memory footprint of C-GCN-GRU is relatively high, this is a necessary price for implementing the more complex graph convolution and GRU structures. It is worth noting that our model has only a slight increase in parameter count compared to the simple BIGRU model, yet provides stronger representation and higher accuracy.
In particular, compared to GCN-LSTM, which also employs a graph structure, C-GCN-GRU is able to capture the spatio-temporal dependencies of the data more efficiently while maintaining a similar parameter size. Compared to lightweight models such as Graph WaveNet and MTGNN, our approach, although with a larger parameter count, is able to handle more complex patterns and long-term dependencies, and is suitable for a wider range of application scenarios.
Overall, the C-GCN-GRU model strikes an excellent balance between computational efficiency and model expressiveness. Although C-GCN-GRU incurs a moderately higher training time than simpler baselines, this cost is justified by the substantial accuracy improvements achieved. The additional overhead stems from the GCN graph convolution layers and the multi-head causal attention module, while the PCMCI+ causal discovery is performed once offline and does not affect inference. As model training in shield tunneling is conducted offline prior to deployment, training duration does not constrain real-time operational use. Given that millimeter-level improvements in attitude prediction accuracy can directly reduce trajectory deviations and costly corrective interventions, the marginal increase in training time represents an acceptable trade-off for the engineering gains delivered by the proposed method.

5.3. Parameter Characterization Comparison

The C-GCN-GRU model has a powerful capability for predicting the attitude parameters of the shield machine. One of the main innovations of the method is exploiting the relationships between shield machine parameters through causality: these causal relationships are used to construct an adjacency matrix that helps the GCN model make accurate predictions. Other tools exist for characterizing the relationships between parameters; for example, some studies have used the Pearson correlation coefficient to construct the adjacency matrix. Therefore, this paper further investigates the effect of the adjacency matrix on model performance by comparing the results of the Pearson-extracted and PCMCI+-extracted adjacency matrices.
For a fair comparison, the model structures are identical except for the method used to extract the adjacency matrix. The comparison results are shown in Figure 21, Figure 22, Figure 23 and Figure 24: the red scatter represents the causal error, the blue scatter the Pearson error, and the solid black line at zero on the vertical axis the baseline. The proximity of the dots to the black baseline in Figure 21, Figure 22, Figure 23 and Figure 24 represents the magnitude of the error, with closer dots indicating a smaller error.
As seen in Figure 21, Figure 22, Figure 23 and Figure 24, extracting the adjacency matrix through causality with PCMCI+ outperforms the Pearson correlation coefficient method, especially for the two parameters HDH and VDH.
While Pearson correlation assumes joint normality and linear dependence, alternative nonparametric measures such as Spearman correlation, Kendall’s tau, and Hoeffding’s D could provide more robust estimates of nonlinear monotonic associations between shield parameters. However, these measures, like Pearson correlation, quantify undirected statistical dependence without temporal directionality. In contrast, PCMCI+ identifies directed causal relationships with explicit time-lag structure, which is physically meaningful in the shield tunneling context, where the timing of parameter influences on attitude deviation is operationally important. A systematic comparison of adjacency matrices constructed from these alternative dependence measures remains an interesting direction for future work.

6. Conclusions

In this paper, a new method named C-GCN-GRU is proposed, which combines a causal discovery algorithm, a deep learning framework based on GCN and GRU, and a multi-head causal attention mechanism. The method extracts the causal relationships between the attitude prediction features of the shield machine with the PCMCI+ algorithm and feeds them into the GCN-GRU model as adjacency matrices. Combined with the multi-head causal attention mechanism, four models are constructed for predicting HDH, HDT, VDH, and VDT, realizing accurate prediction of the shield machine's attitude parameters.
The model proposed in this paper is applied to the Karnaphuli River Tunnel Project in Bangladesh, with the following results: (1) The causal relationships of the input feature parameters were extracted by the PCMCI+ algorithm and fed into the model as adjacency matrices, which effectively improved the accuracy of the overall attitude prediction. (2) The C-GCN-GRU model predicts the attitude of the shield machine well over all six time steps, with an average MAE of 1.2881, an RMSE of 1.7923, and an $R^2$ of 0.9590. (3) Compared with the four similar deep learning models, the proposed C-GCN-GRU model outperforms them on all evaluation metrics.
Although the model developed in this study performs satisfactorily in terms of both accuracy and interpretability, some limitations remain: shield machine operating conditions may vary greatly across engineering environments. In future studies, more datasets from different projects will be used to enhance the model's ability to generalize to different scenarios, so as to maintain high prediction accuracy under a variety of operating conditions. In addition, we plan to adopt the proposed method in an actual tunneling project to further investigate its applicability and performance in real projects.

Author Contributions

Conceptualization, X.Y.; methodology, L.Z.; software, X.Y. and X.W.; validation, X.W.; formal analysis, X.Y.; investigation, C.Z.; resources, C.Z.; data curation, X.Y.; writing—original draft preparation, X.Y.; writing—review and editing, L.Z. and S.W.; visualization, X.Y.; supervision, C.Z.; project administration, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Project of Hubei Province (No. 2023BAB094) and the Open Foundation of Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System (No. HBSEES202106 and No. HBSEES202309).

Data Availability Statement

The data that support the findings of this study are not publicly available due to confidentiality restrictions imposed by the project owner. The data were obtained from the Karnaphuli River Tunnel Project and are not available for public access. Requests to access the data should be directed to the project owner.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Author Xue Wang was employed by the company CCCC Wuhan Zhi Xing International Engineering Consulting Company Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Schematic diagram of shield position measurement and sensor location.
Figure 2. Flowchart of the method.
Figure 3. GCN model structure.
Figure 4. GRU model structure.
Figure 5. Schematic diagram of cause and effect.
Figure 6. Sketch of a single causal attention module.
Figure 7. Schematic diagram of the proposed model.
Figure 8. Directed acyclic graph between parameters.
Figure 9. Causal diagram between time-series variables.
Figure 10. TBM engineering causal mechanism relationship diagram.
Figure 11. Predicted results for HDH.
Figure 12. Predicted results for HDT.
Figure 13. Predicted results for VDH.
Figure 14. Predicted results for VDT.
Figure 15. Joint plots of predicted results against ground truth for (a) HDH, (b) HDT, (c) VDH, (d) VDT.
Figure 16. HDH comparison experiment line graph.
Figure 17. HDT comparison experiment line graph.
Figure 18. VDH comparison experiment line graph.
Figure 19. VDT comparison experiment line graph.
Figure 20. Model comparison: prediction error distribution.
Figure 21. Comparison of the prediction errors of different algorithms for extracting the collocation matrix (HDH).
Figure 22. Comparison of the prediction errors of different algorithms for extracting the collocation matrix (HDT).
Figure 23. Comparison of the prediction errors of different algorithms for extracting the collocation matrix (VDH).
Figure 24. Comparison of the prediction errors of different algorithms for extracting the collocation matrix (VDT).
Table 1. Model configuration and training parameters.

| Parameter | Candidate Values | Setting |
|---|---|---|
| Learning rate | 0.001, 0.0005, 0.0001 | 0.0001 |
| Epochs | 80, 100, 120 | 100 |
| Batch size | 16, 32, 48 | 32 |
| Patience | 15, 20, 25 | 20 |
| Loss function | - | MSE |
| Optimizer | - | Adam |
| Learning rate decay | - | Exponential |
| Sliding window length | 20, 30, 40 | 30 |
| Prediction horizon | - | 6 steps |
| GCN layers | 1, 2 | 2 |
| GRU layers | 2, 3 | 3 |
| Dropout rate | - | 0.2 |
| GCN activation | - | ReLU |
| GRU activation | - | ELU |
| Normalization | - | Min-Max [0, 1], per-feature |
| Butterworth filter order | - | 2 |
| Butterworth cutoff frequency | - | 0.2 |
| Outlier detection | - | 3-sigma rule |
| PCMCI+ significance level α_PC | 0.1–0.5 | 0.3 |
| Causal lag τ | 0, 1, 2, 3 | 2 |
| Framework | - | PyTorch (Python 3.9) |
| Random seed | - | 42 |
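The sliding window length (30) and prediction horizon (6 steps) in Table 1 determine how training samples are cut from the tunneling time series. The following is a minimal pure-Python sketch of that windowing; the function name and toy data are illustrative, not taken from the paper.

```python
def make_windows(series, window=30, horizon=6):
    """Slice a multivariate time series (a list of per-step feature
    vectors) into (past window, future targets) training pairs."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]                      # past `window` steps
        y = series[start + window:start + window + horizon]   # next `horizon` steps
        samples.append((x, y))
    return samples

# Toy univariate series of 40 time steps, one feature per step.
data = [[float(t)] for t in range(40)]
pairs = make_windows(data, window=30, horizon=6)
print(len(pairs))          # 40 - 30 - 6 + 1 = 5 samples
print(pairs[0][1][0][0])   # first target immediately follows the window: 30.0
```

With window 30 and horizon 6, each training sample maps 30 past observations of all features to the next 6 attitude values, matching the "6 steps" horizon reported in Tables 6–8.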
Table 2. General description of the selected features.

| Factor | Unit | Mean | Std | Min | Median | Max |
|---|---|---|---|---|---|---|
| (X1) Thrust force | kN | 109,558.22 | 4277.11 | 83,386 | 109,435 | 126,812 |
| (X2) Cutterhead rotation speed | rpm | 0.93 | 0.12 | 0.83 | 0.84 | 1.21 |
| (X3) Cutterhead torque | kN·m | 4111.57 | 856.35 | 1411 | 3991 | 7088 |
| (X4) Penetration | mm/min | 31.44 | 2.92 | 18 | 31 | 52 |
| (X5) Earth pressure (top) | MPa | 3.16 | 0.29 | 2.23 | 3.18 | 3.81 |
| (X6) Earth pressure (right) | MPa | 15.75 | 5.24 | 5.6 | 22.5 | 26.9 |
| (X7) Earth pressure (down) | MPa | 3.01 | 0.31 | 1.7 | 3.03 | 3.9 |
| (X8) Earth pressure (left) | MPa | 21.89 | 4.62 | 8.3 | 21.8 | 27.3 |
| (X9) Differential jack travel (up-down) | mm | −65.92 | 17.50 | −104 | −61 | −32 |
| (X10) Differential jack travel (left-right) | mm | 3.02 | 0.67 | 1.8 | 3.2 | 5.7 |
| (X11) Mud delivery pressure | bar | 5.12 | 0.47 | 4.69 | 5.12 | 6.3 |
| (X12) Mud delivery flow | m³/min | 50.25 | 0.65 | 49.1 | 50.5 | 54 |
| (X13) Mud discharge flow | m³/min | 22.15 | 0.61 | 20.56 | 22.5 | 27.46 |
| (X14) Mud discharge pressure | bar | 4.9 | 0.36 | 4.1 | 4.8 | 6.3 |
| (X15) Bubble chamber pressure | bar | 4.90 | 0.05 | 4.75 | 4.9 | 5.1 |
| (Y1) HDH | mm | 50.23 | 1.53 | 48 | 50.2 | 54 |
| (Y2) HDT | mm | −18.96 | 22.72 | −89 | −14 | 37 |
| (Y3) VDH | mm | −11.68 | 17.11 | −53 | −9 | 23 |
| (Y4) VDT | mm | −40.03 | 18.28 | −90 | −36 | 1 |
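Table 1's preprocessing choices (per-feature Min-Max scaling to [0, 1] and the 3-sigma outlier rule) can be illustrated against the statistics in Table 2. This is a hedged pure-Python sketch; the helper names and the synthetic outlier value are our own, not the authors' pipeline.

```python
def min_max_scale(values, lo, hi):
    """Scale one feature into [0, 1] using its own min/max (per-feature)."""
    return [(v - lo) / (hi - lo) for v in values]

def three_sigma_outliers(values, mean, std):
    """Return indices of samples outside mean ± 3*std (the 3-sigma rule)."""
    return [i for i, v in enumerate(values) if abs(v - mean) > 3 * std]

# (Y1) HDH statistics from Table 2: min 48, max 54, mean 50.23, std 1.53.
hdh = [48.0, 50.2, 54.0, 61.0]            # 61.0 is a synthetic outlier
scaled = min_max_scale(hdh[:3], 48.0, 54.0)
print(scaled)                              # [0.0, ~0.367, 1.0]
print(three_sigma_outliers(hdh, 50.23, 1.53))  # only index 3 exceeds 3 sigma
```

The observed HDH range (48–54 mm) lies well inside mean ± 3σ (≈45.6–54.8 mm), so under these assumptions no genuine reading in Table 2 would be discarded by the 3-sigma filter.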
Table 3. Evaluation of MAE for HDH, HDT, VDH, and VDT with different time lags τ.

|  | τ = 0 | τ = 1 | τ = 2 | τ = 3 |
|---|---|---|---|---|
| HDH | 2.8021 | 1.7826 | 1.5239 | 1.5319 |
| HDT | 2.8830 | 1.8757 | 1.3587 | 1.3739 |
| VDH | 2.8650 | 1.8342 | 1.5019 | 1.6959 |
| VDT | 2.1405 | 1.2035 | 0.7680 | 0.8198 |
Table 4. Evaluation of RMSE for HDH, HDT, VDH, and VDT with different time lags τ.

|  | τ = 0 | τ = 1 | τ = 2 | τ = 3 |
|---|---|---|---|---|
| HDH | 3.6005 | 2.4083 | 2.1993 | 2.2105 |
| HDT | 3.4943 | 2.4399 | 1.7962 | 1.8101 |
| VDH | 3.5390 | 2.3968 | 2.1456 | 2.2829 |
| VDT | 2.6491 | 1.5683 | 1.0283 | 1.1008 |
Table 5. Evaluation of R² for HDH, HDT, VDH, and VDT with different time lags τ.

|  | τ = 0 | τ = 1 | τ = 2 | τ = 3 |
|---|---|---|---|---|
| HDH | 0.8173 | 0.9156 | 0.9345 | 0.9281 |
| HDT | 0.9080 | 0.9552 | 0.9755 | 0.9752 |
| VDH | 0.8189 | 0.9164 | 0.9321 | 0.9239 |
| VDT | 0.9611 | 0.9864 | 0.9940 | 0.9932 |
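Tables 3–5 score each candidate lag τ with MAE, RMSE, and R². For reference, the three metrics can be computed as follows; this is a plain-Python sketch with toy values, not the authors' evaluation code.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

# Toy deviations (mm) and predictions, purely illustrative.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(mae(y_true, y_pred), 3))   # 0.15
print(round(r2(y_true, y_pred), 3))    # 0.98
```

Under these definitions, lower MAE/RMSE and higher R² are better, which is why τ = 2 (the best column in all three tables for three of the four targets) was selected as the causal lag in Table 1.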
Table 6. MAE of C-GCN-GRU for the four output variables over six prediction steps.

| Time Steps | HDH | HDT | VDH | VDT |
|---|---|---|---|---|
| t + 1 | 1.0333 | 1.1220 | 1.2771 | 0.6480 |
| t + 2 | 1.3197 | 1.2926 | 1.3459 | 0.6935 |
| t + 3 | 1.5217 | 1.3990 | 1.4435 | 0.7506 |
| t + 4 | 1.6600 | 1.4123 | 1.5782 | 0.7932 |
| t + 5 | 1.7523 | 1.4433 | 1.6300 | 0.8383 |
| t + 6 | 1.8565 | 1.4830 | 1.7372 | 0.8845 |
| Average | 1.5239 | 1.3587 | 1.5019 | 0.7680 |
Table 7. RMSE of C-GCN-GRU for the four output variables over six prediction steps.

| Time Steps | HDH | HDT | VDH | VDT |
|---|---|---|---|---|
| t + 1 | 1.4311 | 1.4708 | 1.7289 | 0.8427 |
| t + 2 | 1.8622 | 1.7035 | 1.8850 | 0.9198 |
| t + 3 | 2.1977 | 1.8461 | 2.1817 | 1.0005 |
| t + 4 | 2.4255 | 1.8716 | 2.2578 | 1.0657 |
| t + 5 | 2.5741 | 1.9166 | 2.3553 | 1.1363 |
| t + 6 | 2.7053 | 1.9689 | 2.4652 | 1.2051 |
| Average | 2.1993 | 1.7962 | 2.1456 | 1.0283 |
Table 8. R² of C-GCN-GRU for the four output variables over six prediction steps.

| Time Steps | HDH | HDT | VDH | VDT |
|---|---|---|---|---|
| t + 1 | 0.9731 | 0.9837 | 0.9728 | 0.9961 |
| t + 2 | 0.9511 | 0.9782 | 0.9475 | 0.9953 |
| t + 3 | 0.9320 | 0.9744 | 0.9350 | 0.9945 |
| t + 4 | 0.9272 | 0.9736 | 0.9198 | 0.9937 |
| t + 5 | 0.9168 | 0.9723 | 0.9156 | 0.9929 |
| t + 6 | 0.9071 | 0.9708 | 0.9020 | 0.9920 |
| Average | 0.9345 | 0.9755 | 0.9321 | 0.9940 |
Table 9. Configuration of the three different strategies in each ablation model.

| Model \ Method | GCN | CATT | Causal Adjacency Matrix |
|---|---|---|---|
| GRU | × | × | × |
| GCN-GRU-NA | ✓ | × | × |
| C-GCN-GRU-NA | ✓ | × | ✓ |
| GCN-GRU | ✓ | ✓ | × |
| C-GCN-GRU | ✓ | ✓ | ✓ |
Table 10. Comparison of MAE values between the presented model and the other four models.

| Model | GRU | GCN-GRU-NA | C-GCN-GRU-NA | GCN-GRU | C-GCN-GRU |
|---|---|---|---|---|---|
| HDH | 1.7540 | 1.6864 | 1.6354 | 1.6290 | 1.5239 |
| HDT | 1.4140 | 1.4978 | 1.4803 | 1.3894 | 1.3587 |
| VDH | 1.9170 | 1.7552 | 1.7043 | 1.6363 | 1.5019 |
| VDT | 0.7933 | 0.9161 | 0.9013 | 0.7849 | 0.7680 |
Table 11. Comparison of RMSE values between the presented model and the other four models.

| Model | GRU | GCN-GRU-NA | C-GCN-GRU-NA | GCN-GRU | C-GCN-GRU |
|---|---|---|---|---|---|
| HDH | 2.3380 | 2.3138 | 2.2936 | 2.3539 | 2.1993 |
| HDT | 1.8406 | 1.9467 | 1.9397 | 1.7749 | 1.7962 |
| VDH | 2.3289 | 2.2548 | 2.1611 | 2.2291 | 2.1457 |
| VDT | 1.0471 | 1.2052 | 1.1715 | 1.0526 | 1.0283 |
Table 12. Comparison of R² values between the presented model and the other four models.

| Model | GRU | GCN-GRU-NA | C-GCN-GRU-NA | GCN-GRU | C-GCN-GRU |
|---|---|---|---|---|---|
| HDH | 0.9177 | 0.9110 | 0.9231 | 0.9178 | 0.9345 |
| HDT | 0.9643 | 0.9615 | 0.9716 | 0.9760 | 0.9755 |
| VDH | 0.9202 | 0.9216 | 0.9283 | 0.9270 | 0.9321 |
| VDT | 0.9938 | 0.9921 | 0.9923 | 0.9937 | 0.9940 |
Table 13. Comparison of contrasting experimental models (mean MAE).

| Model | LSTM | CNN-LSTM | BIGRU | GCN-LSTM | GWN | MTGNN | C-GCN-GRU |
|---|---|---|---|---|---|---|---|
| HDH | 2.0253 | 1.8849 | 1.9513 | 1.7374 | 1.9594 | 1.9631 | 1.5239 |
| Improvement | 24.76% | 19.13% | 21.88% | 12.29% | 22.23% | 22.37% | – |
| HDT | 2.1416 | 1.8280 | 1.4514 | 1.4194 | 1.4998 | 1.4220 | 1.3587 |
| Improvement | 36.55% | 25.69% | 6.39% | 4.27% | 9.41% | 4.45% | – |
| VDH | 2.3604 | 1.9083 | 1.6560 | 3.6312 | 2.4099 | 2.6532 | 1.5019 |
| Improvement | 36.38% | 21.27% | 9.30% | 58.63% | 37.68% | 43.39% | – |
| VDT | 1.1138 | 1.1127 | 1.0260 | 0.7899 | 1.1545 | 0.7730 | 0.7680 |
| Improvement | 31.04% | 30.97% | 25.18% | 2.78% | 33.48% | 0.65% | – |
Table 14. Comparison of contrasting experimental models (mean RMSE).

| Model | LSTM | CNN-LSTM | BIGRU | GCN-LSTM | GWN | MTGNN | C-GCN-GRU |
|---|---|---|---|---|---|---|---|
| HDH | 2.6033 | 2.5810 | 2.5316 | 2.4926 | 2.6671 | 2.6382 | 2.1993 |
| Improvement | 15.51% | 14.75% | 13.12% | 11.76% | 17.54% | 16.64% | – |
| HDT | 2.5884 | 2.3143 | 1.9207 | 1.8395 | 1.9751 | 1.8578 | 1.7962 |
| Improvement | 30.60% | 22.38% | 6.48% | 2.35% | 9.06% | 3.32% | – |
| VDH | 3.0252 | 2.4382 | 2.2613 | 4.3285 | 2.8429 | 3.0460 | 2.1456 |
| Improvement | 29.07% | 12% | 2.11% | 50.43% | 24.53% | 29.56% | – |
| VDT | 1.4028 | 1.3806 | 1.2861 | 1.0706 | 1.4461 | 1.0495 | 1.0283 |
| Improvement | 26.69% | 25.51% | 20.04% | 3.95% | 28.89% | 2.02% | – |
Table 15. Comparison of contrasting experimental models (mean R²).

| Model | LSTM | CNN-LSTM | BIGRU | GCN-LSTM | GWN | MTGNN | C-GCN-GRU |
|---|---|---|---|---|---|---|---|
| HDH | 0.8954 | 0.9026 | 0.8988 | 0.9111 | 0.8979 | 0.8997 | 0.9345 |
| Improvement | 4.18% | 3.42% | 3.82% | 2.50% | 4.08% | 3.87% | – |
| HDT | 0.9493 | 0.9595 | 0.9684 | 0.9743 | 0.9702 | 0.9732 | 0.9755 |
| Improvement | 2.68% | 1.64% | 0.73% | 0.12% | 0.55% | 0.24% | – |
| VDH | 0.8623 | 0.8931 | 0.9233 | 0.7292 | 0.8832 | 0.8657 | 0.9321 |
| Improvement | 7.48% | 4.18% | 0.94% | 21.77% | 5.54% | 7.67% | – |
| VDT | 0.9891 | 0.9893 | 0.9907 | 0.9935 | 0.9884 | 0.9937 | 0.9940 |
| Improvement | 0.49% | 0.47% | 0.33% | 0.05% | 0.57% | 0.03% | – |
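The "Improvement" rows in Tables 13 and 14 report the relative error reduction of C-GCN-GRU against each baseline. A one-line sketch of that calculation (illustrative only), checked against the HDH/LSTM entry of Table 13:

```python
def improvement_pct(baseline_error, model_error):
    """Relative error reduction of the proposed model vs. a baseline, in percent."""
    return (baseline_error - model_error) / baseline_error * 100

# HDH mean MAE from Table 13: LSTM 2.0253 vs. C-GCN-GRU 1.5239.
print(round(improvement_pct(2.0253, 1.5239), 2))   # 24.76, matching the table
```

This formula reproduces the MAE and RMSE improvement rows; the R² rows in Table 15 are a relative increase rather than a reduction, so they follow a slightly different convention.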
Table 16. Comparison of training duration and computational resource consumption.

| Model | Time | Model Parameters | Memory Usage |
|---|---|---|---|
| BIGRU | 2:17 min | 709,238 | 38.86 MB |
| CNN-LSTM | 1:39 min | 454,006 | 13.65 MB |
| GCN-LSTM | 3:20 min | 536,310 | 26.27 MB |
| Graph WaveNet | 1:11 min | 158,662 | 2.73 MB |
| MTGNN | 1:25 min | 158,662 | 2.73 MB |
| LSTM | 1:42 min | 341,110 | 23.24 MB |
| C-GCN-GRU | 2:36 min | 465,654 | 48.31 MB |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Zeng, L.; Yan, X.; Zhang, C.; Wang, X.; Wang, S. Shield Machine Attitude Prediction Method Based on Causal Graph Convolutional Network. Algorithms 2026, 19, 224. https://doi.org/10.3390/a19030224