Article

ST-GC-GRU: A Hybrid Deep Learning Approach for Shield Attitude Prediction Based on a Spatial–Temporal Graph

1
CCCC Wuhan Zhi Xing International Engineering Consulting Company Limited, Wuhan 430068, China
2
School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan 430068, China
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(3), 711; https://doi.org/10.3390/electronics15030711
Submission received: 21 January 2026 / Revised: 29 January 2026 / Accepted: 3 February 2026 / Published: 6 February 2026

Abstract

The accurate estimation of shield attitude deviation is directly related to the quality of tunnel construction. However, existing recurrent neural network (RNN)-based methods cannot efficiently capture the spatial correlations between different timestamps and exhibit poor prediction performance when handling drastically changing attitude data, which makes it difficult to estimate attitude deviation when attitude changes are frequent. This study proposes a shield machine attitude prediction model (ST-GC-GRU) based on a spatial–temporal graph. Unlike traditional attitude prediction methods, the proposed method first introduces an improved GCN (ST-GCN: spatial–temporal graph convolutional network) together with a temporal decomposition technique to enhance its representation of attitude change information, thus more rationally modeling the comprehensive spatial–temporal dependencies of the shield machine operation data. The method demonstrates better prediction performance than previous methods in extensive tests on real data and effectively improves the low-confidence predictions of the prediction model when dealing with large attitude changes. The results indicate that the proposed method outperforms seven other prediction models on all four attitude deviation values. The model and the research results can provide a reference for developing adaptive control technology in shield tunnel construction.

1. Introduction

Shield machines are extensively utilized in subway construction, underwater highways, high-speed railways, and similar infrastructure projects due to their high automation capabilities and robust anti-interference properties [1,2]. During shield tunneling operations, factors such as uneven geological conditions and improper control operations can cause the shield machine to deviate from the designed tunnel axis (DTA), resulting in attitude deviation. This deviation leads to significant engineering challenges, including tube piece assembly misalignment and tunnel structure cracking [3,4]. Current control strategies for shield tunneling machine (STM) deviation issues exhibit inherent temporal delays, meaning that operators’ corrective adjustments typically occur after attitude deviation has already manifested [5,6]. Accurate advance prediction of shield attitude information would therefore represent a major advancement in shield tunneling technology.
To address this challenge, researchers have conducted extensive studies on shield attitude prediction methods. As demonstrated in Table 1, these research approaches span from traditional experimental modeling (incorporating numerical analysis and theoretical calculations) to advanced recurrent neural network (RNN) techniques, all designed to extract comprehensive feature information from historical shield attitude data, identify underlying nonlinear relationships, and achieve precise shield attitude prediction [7,8]. Traditional modeling approaches typically utilize mechanics-based principles and field-measured parameters (thrust, torque, etc.) to establish theoretical frameworks, employing statistical (autoregressive) modeling methods for parameter optimization [9,10]. Sramoon et al. [11] investigated shield behavior simulation during tunneling in sandy gravel layers using earth-pressure-balanced shields, demonstrating that factors such as ground loosening and wire brush deformation must be considered to accurately capture actual shield behavior and align with field observations. Shen et al. [12] developed an enhanced calculation methodology for shield attitude prediction by modeling shield–soil interaction through equivalent springs and ground reaction curves, subsequently applying this approach to a metro project for pitch and yaw angle prediction and validation against real-time monitoring data. While mechanism-based approaches enhance understanding of shield tunneling behavior, statistical and experimental modeling methods cannot adequately account for operational condition variability, are computationally intensive, and demonstrate limited generalization capability [13].
Recent developments in data-driven methodologies have demonstrated remarkable capability in extracting complex feature representations from large-scale operational datasets for shield attitude prediction [20]. For example, Liu et al. [14] introduced a BWO-CNN-LSTM-GRU framework achieving 3 mm prediction deviation through systematic hyperparameter optimization, demonstrating the effectiveness of meta-heuristic algorithms in hybrid architectures. Zhou et al. [4] proposed a novel WT-CNN-LSTM hybrid architecture that integrates wavelet transform preprocessing with convolutional feature extraction and long short-term memory temporal modeling to predict critical shield parameters, including pitch angle, rolling angle, and both vertical and horizontal deviations, achieving superior prediction accuracy through enhanced feature representation. Zhen et al. [17] developed an explainable framework combining enhanced attention Informer (EAMInfor) with DeepLIFT, addressing the “black box” limitation of machine learning models. Their analysis of Xiamen Metro Line 3 revealed that push, thrust, and earth chamber pressure were the most significant features, with variations in importance exhibiting substantial differences across geological conditions. Wang et al. [16] established an innovative CNN-GRU fusion architecture that substantially improved prediction accuracy and model robustness in multi-phase shield attitude estimation tasks. Their comprehensive sensitivity analysis, incorporating both first- and second-order derivatives, revealed significant heterogeneity in the contribution weights of historical values across different temporal horizons. These investigations predominantly conceptualize shield machine attitude prediction as a temporal sequence forecasting problem, consequently driving extensive adoption of recurrent architectures, including RNN, LSTM, and GRU variants that excel in capturing temporal dependencies. 
Relative to conventional RNNs, LSTM and GRU architectures effectively mitigate gradient explosion and vanishing gradient phenomena through sophisticated gating mechanisms, enabling the robust learning of intricate long-term dependencies in sequential data, thereby establishing their prominence as fundamental architectural components [21].
Notably, in the authors’ previously proposed FTA-N-GRU model, the method employs feature and temporal attention to adaptively weight input parameters and model output dependencies. While effective, attention-based approaches primarily capture correlations within individual timestamps and face two critical challenges: (1) modeling spatial–temporal relationships across different timestamps, and (2) handling the irregular sampling patterns that are commonly present in shield tunneling data. This study addresses these limitations through a fundamentally different approach: spatial–temporal graph modeling with time decay functions. Unlike attention mechanisms that weight features within individual timestamps, our graph-based approach explicitly constructs and models relationships among all features across all timestamps, enabling comprehensive spatial–temporal dependency learning.
Contemporary shield attitude prediction methodologies predominantly employ RNN-based architectures to model temporal dependencies within sequential operational data [22]. However, shield operational datasets intrinsically exhibit complex spatial–temporal (ST) interdependencies characterized by both temporal correlations among operational parameters and spatial correlations between heterogeneous parameters within individual timestamps. Suboptimal modeling strategies can substantially degrade the predictive performance of data-driven approaches in shield attitude forecasting applications [17]. Moreover, existing methodologies encounter significant challenges in data acquisition and preprocessing protocols. Irregular temporal sampling of attitude measurements can introduce systematic bias and spurious correlations into time series models, particularly compromising predictive accuracy during periods of rapid attitude variation. Under such conditions, models may erroneously approximate future attitudes using historical trajectory patterns, resulting in critically low prediction confidence and fundamentally limiting the practical applicability of data-driven methodologies [23,24].
To systematically address these fundamental limitations, this investigation introduces a novel ST-GC-GRU methodology, with primary contributions delineated as follows.
  • Temporal decomposition for robustness to sudden attitude changes: A recursive quadratic decomposition framework is introduced to separate attitude signals into trend and residual components, enhancing the model’s capability to handle drastic attitude variations during geological transitions and under irregular sampling conditions.
  • Fully connected spatial–temporal graph with time decay matrix: A novel graph structure is proposed where all features across multiple timestamps are fully connected, with a time decay matrix jointly modeling spatial correlations (between heterogeneous parameters) and temporal dependencies (across different time steps) based on actual temporal distances.
  • Comprehensive performance superiority on real project data: Extensive experiments on the Bangladesh Karnaphuli River Tunnel Project demonstrate that the proposed model consistently outperforms seven baseline methods (CNN, CNN-GRU, GRU, LSTM, TCN, RF, XGBoost) across all four deviation indices (HDH, HDT, VDH, VDT).
Compared to attention-based methods such as Feature Extraction and Attention-based Machine Learning (FEK-AML) that primarily focus on feature importance within individual timestamps, the proposed ST-GC-GRU addresses their limitations in modeling spatial–temporal relationships across different timestamps and handling drastically changing attitude data under variable working conditions. By employing spatial–temporal graph modeling with time decay functions that explicitly construct relationships among all features across all timestamps, the method enables enhanced robustness under diverse operational scenarios, including those with irregular sampling patterns and geological transitions.
The remainder of this manuscript is structured as follows: Section 2 presents a comprehensive exposition of the proposed methodology, including theoretical foundations and architectural design principles. Section 3 demonstrates the efficacy and superiority of the developed model through rigorous evaluation using real-world case studies and comparative analysis with state-of-the-art baseline methods. Section 4 provides an in-depth discussion of the experimental findings, model performance characteristics, and practical implications of the proposed approach. Section 5 concludes the investigation with a comprehensive summary of key findings and delineates promising directions for future research endeavors.

2. Methodology

As illustrated in Figure 1, the comprehensive research framework comprises three interconnected components: (1) data preprocessing and feature engineering, (2) model architecture construction and optimization, and (3) model training, validation, and performance evaluation. The following subsections provide a systematic and detailed exposition of each constituent component.

2.1. Shield Machine Attitude Measurement

The primary prediction target in this investigation is shield attitude deviation, for which measurement precision directly determines the overall construction quality and structural integrity of a shield tunneling project [25,26,27]. To comprehensively characterize shield machine attitude, four critical parameters are identified and selected: horizontal deviation of the shield head (HDH), horizontal deviation of the shield tail (HDT), vertical deviation of the shield head (VDH), and vertical deviation of the shield tail (VDT). These four parameters constitute the most widely adopted engineering indicators for quantitative assessment of shield attitude in industrial applications. Figure 2 presents a detailed schematic representation of the shield machine’s designed tunnel axis (DTA) and associated attitude parameters. The red dotted trajectory represents the predetermined DTA of the shield machine, while the solid black line depicts the actual tunneling trajectory (TA). The angular intersection between these two trajectories quantifies the deviation angle, which serves as a fundamental metric for attitude assessment.

2.2. Data Preprocessing

Throughout shield tunneling operations, attitude dynamics are governed by a complex interplay of multiple influential factors, including historical attitude states, operational parameters, and heterogeneous geological conditions. These multifaceted factors exhibit intricate interactions that collectively determine the evolution of attitude variations [28]. However, in real-world engineering applications, acquired datasets frequently exhibit temporal discontinuities, containing inevitable missing values and anomalous outliers resulting from human operational interventions or equipment malfunctions [29]. Consequently, the implementation of robust data quality assurance protocols is imperative for ensuring model reliability. The proposed data cleaning methodology leverages the fundamental operational principle that shield thrust force equals zero during shutdown periods, which can be mathematically formulated as described in Equation (1).
D = F(x_F) \times F(x_v) \times F(x_T) \times F(x_n)
where F(x_v), F(x_T), and F(x_n) are associated with the penetration, cutter torque, and rotational speed, respectively, while F(x_F) represents the thrust force. The value of parameter D reflects the data condition of the shield machine. If D equals 0, the data is considered invalid and should be disregarded. Conversely, when D is not 0, the data is deemed valid, signifying that the shield machine is actively tunneling, and the data should be retained for analysis.
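The cleaning rule of Equation (1) reduces to a simple product check: since the thrust force is zero during shutdown periods, the product vanishes for any non-tunneling record. The following minimal Python sketch illustrates this; the function name and argument order are illustrative, not taken from the paper.

```python
def is_valid_record(thrust, penetration, torque, speed):
    # Equation (1): D is the product of the four parameter terms; since
    # thrust is zero during shutdown periods, D = 0 flags an invalid
    # (non-tunneling) record that should be discarded.
    D = thrust * penetration * torque * speed
    return D != 0
```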
The comprehensive dataset acquired in this investigation encompasses more than 1700 distinct features, providing extensive nonlinear information while simultaneously imposing substantial computational burdens on model training and inference [30]. Consequently, the implementation of systematic feature selection becomes critically important to ensure optimal data modeling efficiency and computational tractability. This investigation employs a model-agnostic filtering approach that directly evaluates features based on their intrinsic statistical properties, thereby eliminating the computational overhead associated with iterative model training procedures [31]. Specifically, this study first eliminates the invariant features of the shield machine. Subsequently, the Pearson correlation coefficient is applied to discard features weakly associated with the shield attitude. Finally, features that have been used in the field and those related to the attitude are retained based on relevant knowledge and expert experience. The specific calculation principle is illustrated in Equation (2) [32]. The correlation coefficient between the chosen features is represented by \rho(f_x, f_y), where f_{xi} and f_{yi} denote the i-th values of f_x and f_y, respectively. The standard deviations of f_x and f_y are denoted by \sigma_{f_x} and \sigma_{f_y}, respectively. Furthermore, \bar{f}_x and \bar{f}_y are the mean values of f_x and f_y, correspondingly.
\rho(f_x, f_y) = \frac{\mathrm{cov}(f_x, f_y)}{\sigma_{f_x}\,\sigma_{f_y}} = \frac{\sum_{i=1}^{n}(f_{xi} - \bar{f}_x)(f_{yi} - \bar{f}_y)}{\sqrt{\sum_{i=1}^{n}(f_{xi} - \bar{f}_x)^2}\,\sqrt{\sum_{i=1}^{n}(f_{yi} - \bar{f}_y)^2}}
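Feature filtering with the Pearson coefficient of Equation (2) can be sketched as follows. The function name, the dictionary-based interface, and the threshold value are illustrative assumptions; `np.corrcoef` computes the same quantity as Equation (2).

```python
import numpy as np

def pearson_filter(features, target, threshold=0.3):
    """Keep feature columns whose absolute Pearson correlation with the
    target series (Equation (2)) meets the threshold. The threshold is
    an illustrative value, not the paper's setting."""
    kept = []
    for name, series in features.items():
        x = np.asarray(series, dtype=float)
        y = np.asarray(target, dtype=float)
        rho = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
        if abs(rho) >= threshold:
            kept.append(name)
    return kept
```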
To enhance the model-training efficiency, this study employs the 3-sigma rule (Equation (3)) for outlier detection. Identified outliers are replaced with the mean of their preceding and succeeding values. Here, \sigma is the standard deviation and \mu is the mean value. The method generally assumes that 99.7% of data values lie within (\mu - 3\sigma, \mu + 3\sigma), with only a 0.3% probability of values falling outside this range. Such deviations are considered exceptionally rare and are thus classified as outliers [33].
P(|x - \mu| > 3\sigma) \le 0.003
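A minimal sketch of the 3-sigma cleaning step described above, assuming the stated replacement rule (each outlier is substituted with the mean of its neighboring values); the function name is illustrative.

```python
import numpy as np

def replace_outliers_3sigma(series):
    """Flag values outside (mu - 3*sigma, mu + 3*sigma) per Equation (3)
    and replace each flagged point with the mean of its preceding and
    succeeding values, as described in the text."""
    x = np.asarray(series, dtype=float)
    mu, sigma = x.mean(), x.std()
    cleaned = x.copy()
    for i in np.where(np.abs(x - mu) > 3 * sigma)[0]:
        prev = cleaned[i - 1] if i > 0 else cleaned[i + 1]
        nxt = cleaned[i + 1] if i < len(cleaned) - 1 else cleaned[i - 1]
        cleaned[i] = 0.5 * (prev + nxt)
    return cleaned
```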
Subsequent to outlier elimination, the removal of high-frequency noise components is essential to extract meaningful signal patterns and enhance model training effectiveness. The Butterworth filter is selected for this investigation due to its exceptional characteristics, including monotonic frequency response behavior and superior phase response properties, which enable effective noise suppression while preserving the underlying data distribution and fundamental signal trends [34]. The optimal balance between filtering performance and data integrity preservation is achieved through systematic calibration of critical filter parameters, specifically the cutoff frequency and filter order. Consequently, the Butterworth filter represents the most suitable denoising approach for this application. As mathematically expressed in Equation (4), \omega_c represents the cutoff frequency and m denotes the filter order. To achieve optimal filtering effectiveness while maintaining the integrity of the original data distribution characteristics, extensive empirical validation determined the optimal cutoff frequency to be 0.2 and the filter order to be 2, representing the configuration that maximizes noise reduction while minimizing signal distortion.
|H(\omega)|^2 = \frac{1}{1 + (\omega / \omega_c)^{2m}}
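Equation (4) can be evaluated directly to verify the characteristic Butterworth behavior under the settings reported in the text (cutoff 0.2, order 2): unity gain at zero frequency, half-power at the cutoff, and a monotonically decreasing response. This sketch only illustrates the squared magnitude response; in practice a library routine such as `scipy.signal.butter` would be used for the actual filtering.

```python
import numpy as np

def butterworth_gain(omega, cutoff=0.2, order=2):
    """Squared magnitude response of Equation (4):
    |H(w)|^2 = 1 / (1 + (w / w_c)^(2m)),
    with cutoff=0.2 and order=2 as reported in the text."""
    omega = np.asarray(omega, dtype=float)
    return 1.0 / (1.0 + (omega / cutoff) ** (2 * order))
```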
As formulated in Equations (5) and (6), data normalization is imperative prior to model training due to substantial heterogeneity in the numerical ranges and scales across different input features [35]. This preprocessing step ensures uniform scaling of all features to a standardized range, thereby preventing features with larger numerical magnitudes from dominating the learning process and enhancing model convergence stability. Subsequently, the predicted outputs require inverse normalization (denormalization) to restore values to their original scale, enabling accurate performance evaluation and meaningful comparison with ground truth measurements.
f(x_i) = \frac{F(x_i) - \min(F(x_i))}{\max(F(x_i)) - \min(F(x_i))}

F(x_i) = f(x_i)\left[\max(F(x_i)) - \min(F(x_i))\right] + \min(F(x_i))
where F(x_i) denotes the original i-th input feature, while \min(F(x_i)) and \max(F(x_i)) represent the minimum and maximum values of the feature, respectively. Furthermore, f(x_i) refers to the value obtained after applying max–min normalization.
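Equations (5) and (6) correspond to the following pair of helper functions (names are illustrative); keeping the (min, max) bounds of each feature is what makes the later denormalization of predictions possible.

```python
import numpy as np

def minmax_normalize(x):
    """Equation (5): scale a feature to [0, 1]; also return (min, max)
    so that predictions can later be mapped back to the original scale."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), (lo, hi)

def minmax_denormalize(f, bounds):
    """Equation (6): inverse normalization back to the original scale."""
    lo, hi = bounds
    return np.asarray(f, dtype=float) * (hi - lo) + lo
```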
The sliding window method is adopted in this study to transform the time series problem into a supervised learning problem. The sliding window operation involves two essential parameters: the window length and the sliding step size [36]. An illustration of the sliding window principle is shown in Figure 3. Specifically, each training sample contains historical data from three consecutive time steps (t − 2, t − 1, t), with the output being the shield attitude value at the next time step (t + 1). This study implements single-step-ahead prediction, where the model predicts the shield attitude parameters (HDH, HDT, VDH, VDT) at time step t + 1 based on the three-step historical window. The sliding step is set to 1, enabling continuous single-step predictions as the shield advances: each subsequent sample shifts the window forward by one step, taking the window ending at time step t + 1 as input and the attitude value at time step t + 2 as output. Previous research has demonstrated that a time window length of 3 yields optimal prediction performance, and further increasing the window length does not significantly enhance the effect [37]. Consequently, the time window length is set to 3 and the sliding step to 1.
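The sliding window construction can be sketched as follows for a univariate series; the function name is illustrative, and in the actual model each window would contain all selected features rather than a single value.

```python
import numpy as np

def sliding_windows(series, window=3, step=1):
    """Build (input, target) pairs: each input holds `window` consecutive
    steps (t-2, t-1, t) and the target is the value at t+1, matching the
    window length of 3 and sliding step of 1 used in this study."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(0, len(series) - window, step):
        X.append(series[start:start + window])   # input window
        y.append(series[start + window])         # next-step target
    return np.stack(X), np.array(y)
```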

2.3. Model Establishment

As shown in Figure 4, the ST-GC-GRU prediction model structure comprises three components: temporal decomposition representation, a fully connected spatial–temporal graph, and GRU.
The proposed methodology employs a hierarchical time decomposition approach to systematically extract trend components and residual terms from the original attitude time series data. These decomposed components are subsequently integrated with the original dataset to enhance the representation of shield attitude variation patterns and temporal dynamics. The augmented feature set serves as input to a fully connected spatial–temporal graph convolutional network (ST-GCN), which incorporates a sophisticated time decay function to construct dynamic adjacency matrices that effectively capture and model the complex spatial–temporal correlations among system variables. During the forward propagation process, input data traverses multiple computational units within the GRU architecture, generating a sequence of hidden states that encapsulate comprehensive spatial–temporal information. The effective modeling of complex nonlinear mapping relationships necessitates multi-layer recurrent architectures, with empirical evidence suggesting that a minimum of two RNN layers is required for adequate representational capacity. To achieve an optimal trade-off between model expressiveness and computational complexity, this investigation implements a two-layer GRU configuration that balances predictive performance with parameter efficiency. Subsequent to the GRU layers, the output layer (dense linear transformation) maps the learned hidden representations to final prediction values, specifically the shield machine attitude forecast at time step t + 1. Given the substantial parameter space inherent in GRU architectures, the model exhibits heightened susceptibility to overfitting phenomena. To mitigate this challenge, a dropout regularization mechanism is strategically incorporated between the GRU and output layers. 
During training iterations, the dropout component stochastically deactivates a predetermined fraction of neurons, thereby preventing excessive parameter specialization and enhancing the model’s generalization capability across diverse operational conditions [38,39].
The precise input tensor structure is as follows: Each sample contains N = 18 selected features (operational parameters A-O and one historical attitude parameter from P-S) across T = 3 consecutive time steps (t − 2, t − 1, t). After temporal decomposition augmentation (original + trend + residual components), the feature dimension becomes 3 N = 54. The fully connected spatial–temporal graph treats all N × T feature-timestamp combinations as nodes, resulting in 54 graph nodes. The input tensor shape to ST-GCN is (batch_size, 54, C_in), where C_in is the initial feature dimension, and the output shape is (batch_size, 54, 32), where 32 is the number of learned output channels.

2.3.1. Temporal Decomposition Representation

Empirical investigations have demonstrated that historical positioning information constitutes the most critical determinant for shield tunneling machine attitude estimation, aligning with established operational control patterns employed by experienced machine operators [39]. Specifically, when attitude deviations remain within predetermined empirical tolerance thresholds, seasoned operators typically adopt a minimal intervention strategy to preserve tunneling process continuity. Corrective actions are implemented only when deviations approach or exceed critical empirical limits, necessitating immediate operational adjustments. Consequently, shield machine tunneling attitude is predominantly governed by operator decision-making protocols and inherent machine motion inertia, resulting in strong temporal dependencies where future attitude states exhibit significant correlation with historical trajectory patterns.
The evolution of tunneling attitude dynamics lacks inherent periodic behavioral characteristics, instead demonstrating high dependence on historical attitude information and individual operator behavioral patterns. Although attitude variations fundamentally manifest as continuous state processes, practical engineering implementations frequently encounter irregular data acquisition patterns. Shield operational data collection methodology typically employs equidistant sampling based on tunneling distance intervals, deliberately avoiding high-frequency sampling protocols to prevent excessive data redundancy in raw datasets [40]. This equidistant sampling strategy effectively addresses issues associated with data duplication and overly fine temporal resolution. However, variations in operator behavioral patterns and various data quality anomalies (including null values and statistical outliers) necessitate the systematic removal of aberrant data points, inevitably introducing significant temporal discontinuities within specific observation windows. Theoretically, shield machine tunneling data should exhibit continuous state characteristics, with long-term attitude trends demonstrating gradual evolution without substantial or abrupt transitions. Traditional time series models, particularly RNN-based architectures, fundamentally assume uniform temporal spacing in historical input data, which may introduce substantial prediction errors when confronted with irregular sampling patterns and compromise overall model accuracy [41]. However, advanced time decomposition methodologies can systematically extract complex temporal patterns embedded within time series data (Figure 5), thereby elucidating distinct temporal characteristics and revealing underlying evolutionary trends. Consequently, the implementation of time decomposition techniques can substantially mitigate the adverse effects of temporal data irregularities on predictive model performance [23].
This investigation implements a recursive quadratic decomposition framework to systematically perform temporal decomposition of raw attitude feature sequences [24]. The decomposition methodology employs a moving average kernel function to extract the trend component representing long-term attitude evolution patterns. The residual component is subsequently computed as the difference between the original time series and the extracted trend component. The trend component captures the fundamental long-term directional changes within the time series, characterizing the overall upward or downward evolutionary trajectory of attitude variations. Conversely, the residual component emphasizes transient fluctuations and anomalous variations in the original dataset, effectively representing short-term attitude perturbations and localized deviations from the underlying trend. These decomposed components encapsulate distinct temporal patterns of attitude dynamics, providing complementary information about both macro-scale evolutionary trends and micro-scale variational characteristics. Both trend and residual components are subsequently integrated with the original time series to form an enriched input feature set for model training. The mathematical formulation of the temporal decomposition methodology is detailed as follows:
x_{t1} = \mathrm{avgpool}(\mathrm{padding}(x))

x_s = x - x_{t1}

x_{t2} = \mathrm{avgpool}(\mathrm{padding}(x_s))

x_s = x_s - x_{t2}

x_t = x_{t1} + x_{t2}
where x is the original data, and x_s and x_t represent the residual and trend terms extracted from the original data, respectively.
The temporal decomposition employs the following hyperparameters: (1) Kernel size: k = 2, corresponding to a 5-point moving average window (2 k + 1 = 5). (2) Decomposition levels: 2 levels (quadratic decomposition)—the first level extracts trend T1 and residual R1, and the second level further decomposes R1 into trend T2 and refined residual R2. (3) Padding strategy: ‘same’ padding mode to preserve the original sequence length of 3 timesteps. The final augmented feature set concatenates [X original, T1, R2] for each of the 18 input features, resulting in 54 total feature channels (18 × 3). The kernel size k = 2 is selected to match the temporal window length: a 5-point averaging window effectively smooths fluctuations within the 3-timestep input window while preserving meaningful trend information.
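Under the hyperparameters above (k = 2, two decomposition levels, ‘same’ padding), the recursive quadratic decomposition of Equations (7)–(11) can be sketched as follows. Edge replication is assumed as the padding scheme, which the paper does not specify, and the function name is illustrative.

```python
import numpy as np

def quadratic_decompose(x, k=2):
    """Recursive quadratic decomposition (Equations (7)-(11)): a
    (2k+1)-point moving average with length-preserving padding extracts
    a trend, the residual is decomposed once more, and the two trend
    terms are summed."""
    x = np.asarray(x, dtype=float)

    def moving_average(v):
        padded = np.pad(v, k, mode="edge")           # 'same'-length padding
        kernel = np.ones(2 * k + 1) / (2 * k + 1)    # 5-point window for k=2
        return np.convolve(padded, kernel, mode="valid")

    t1 = moving_average(x)      # x_t1 = avgpool(padding(x))
    r1 = x - t1                 # x_s  = x - x_t1
    t2 = moving_average(r1)     # x_t2 = avgpool(padding(x_s))
    r2 = r1 - t2                # refined residual
    trend = t1 + t2             # x_t  = x_t1 + x_t2
    return trend, r2
```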

2.3.2. Fully Connected Spatial–Temporal Graph

The reframed data can be represented as a two-dimensional matrix X \in \mathbb{R}^{N \times L}, where each sample contains N × L nodes corresponding to all shield machine parameters across the time window [42]. To model the comprehensive spatial–temporal dependencies, this study connects all nodes and quantifies their correlations using dot-product similarity:
e_{tr,ij} = g_s(x_{t,i}) \, (g_s(x_{r,j}))^{T}
where x_{t,i} represents the i-th feature at timestamp t, with t, r \in [1, L] and i, j \in [1, N]. Here, a linear mapping g_s(x) = x W_s is used for dimensional projection, where W_s denotes learnable weights [43]. Furthermore, the softmax function constrains the correlation values to the range [0, 1]. After integrating the information of each node, Z = \{x_{t,i}\}_{i=1,\dots,N;\, t=1,\dots,L}, and the relationships between them, E = \{e_{tr,ij}\}_{i,j=1,\dots,N;\, t,r=1,\dots,L}, a graph G = (Z, E) is formed. Graph G encompasses not only the spatial–temporal relationship of each node at the same timestamp but also the spatial–temporal relationships between different features across various timestamps. This enables the modeling of comprehensive spatial–temporal dependencies in the shield operation data [44].
To incorporate temporal proximity, a time decay matrix C is designed, where each element c_{tr,ij} = f(T) attenuates correlations based on temporal distance. As illustrated in Figure 6, features at closer timestamps exhibit stronger correlations than distant ones [16,18]. The enhanced adjacency matrix e'_{tr,ij} = e_{tr,ij} \cdot c_{tr,ij} ensures that temporal proximity is explicitly modeled in the graph structure.
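A sketch of the adjacency construction is given below. The softmax normalization and dot-product similarity follow Equation (12); the exponential form of the decay c_{tr,ij} and its rate are illustrative assumptions, since the text only requires that correlations attenuate with temporal distance.

```python
import numpy as np

def st_adjacency(X, Ws, decay=0.5):
    """Fully connected spatial-temporal graph sketch: nodes are all
    (timestamp, feature) pairs; edge weights are softmax-normalized
    dot products of the projected node features (Equation (12)),
    attenuated by an assumed exponential time decay exp(-decay*|t-r|).
    X has shape (L, N): L timestamps, N features (scalar per node)."""
    L, N = X.shape
    nodes = X.reshape(L * N, 1)             # one node per feature-timestamp
    g = nodes @ Ws                          # g_s(x) = x W_s
    e = g @ g.T                             # dot-product similarity
    e = np.exp(e - e.max(axis=1, keepdims=True))
    e = e / e.sum(axis=1, keepdims=True)    # row-wise softmax into [0, 1]
    ts = np.repeat(np.arange(L), N)         # timestamp index of each node
    c = np.exp(-decay * np.abs(ts[:, None] - ts[None, :]))
    return e * c                            # time-decayed adjacency matrix
```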
Graph Convolutional Networks (GCN) aggregate information between connected nodes through layer-wise propagation (Equation (13)) [45].
H^{(l+1)} = f(H^{(l)}, E)
The l-th layer propagates information through the adjacency matrix E with learnable weights and activation functions, as formulated in Equation (14):
f(H^{(l)}, E) = \sigma(E H^{(l)} W^{(l)})
To normalize information propagation, GCN incorporates a degree matrix \tilde{D} together with the adjacency matrix. Stacking multiple layers enables successive transformations as defined in Equation (15) [46]:
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{E} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)
where \tilde{D} is the degree matrix with \tilde{D}_{ii} = \sum_j \tilde{E}_{ij}, and \tilde{E} = E + I_N, where E is the adjacency matrix and I_N is the identity matrix. The degree matrix \tilde{D} captures the degree information of nodes within the graph structure, providing insights into the connectivity and importance of each node relative to its neighbors. By incorporating both the original adjacency matrix and the identity matrix, the augmented adjacency matrix \tilde{E} accounts for self-connections and reinforces the representation of each node in the graph.
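One propagation step of Equation (15) can be sketched as follows, assuming ReLU as the activation σ (ReLU is the activation reported for the ST-GCN layers in the training setup); the function name is illustrative.

```python
import numpy as np

def gcn_layer(H, E, W):
    """One GCN propagation step (Equation (15)): add self-loops, build
    the degree matrix, apply symmetric normalization, then the linear
    transform and a ReLU activation."""
    E_tilde = E + np.eye(E.shape[0])          # E~ = E + I_N (self-loops)
    d = E_tilde.sum(axis=1)                   # D~_ii = sum_j E~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D~^(-1/2)
    H_next = D_inv_sqrt @ E_tilde @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)            # sigma = ReLU (assumed)
```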

2.3.3. GRU

Following spatial–temporal graph convolution, a Gated Recurrent Unit (GRU) processes the graph-enhanced features to capture temporal dependencies. GRU is selected over LSTM due to its computational efficiency while maintaining comparable performance in modeling sequential patterns. The GRU architecture employs two gating mechanisms (update gate and reset gate) to selectively retain relevant historical information while filtering irrelevant patterns, as illustrated in Figure 7. The mathematical formulation is detailed in Equations (16)–(20). This investigation implements a two-layer GRU configuration with dropout regularization (rate = 0.2) between layers to prevent overfitting, balancing model expressiveness with computational efficiency.
$Z_t = \sigma\left(W_z \left[h_{t-1}, x_t\right]\right)$
$R_t = \sigma\left(W_r \left[h_{t-1}, x_t\right]\right)$
$\tilde{h}_t = \tanh\left(W_{\tilde{h}} \left[R_t \odot h_{t-1}, x_t\right]\right)$
$h_t = \left(1 - Z_t\right) \odot h_{t-1} + Z_t \odot \tilde{h}_t$
$\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
where $x_t$ signifies the input at time step $t$, $h_{t-1}$ denotes the hidden state from the preceding time step $t-1$, and $W_z$ and $W_r$ correspond to the weights associated with the update gate and reset gate, respectively. The function $\sigma$ represents the sigmoid activation function, which is crucial for the gating operations, whereas $\tanh(x)$ signifies the hyperbolic tangent activation function employed within the GRU structure for non-linear transformations of the input data.
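A single GRU step following Equations (16)–(19) can be sketched as below. Bias terms are omitted for brevity, and the weight shapes and toy dimensions are illustrative assumptions rather than the configuration used in this study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step; each weight matrix acts on a concatenated vector."""
    v = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ v)                                        # update gate
    r = sigmoid(Wr @ v)                                        # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1.0 - z) * h_prev + z * h_cand                     # blended state

rng = np.random.default_rng(0)
H, X = 3, 2                                                    # toy sizes
Wz, Wr, Wh = (rng.standard_normal((H, H + X)) for _ in range(3))
h = gru_step(rng.standard_normal(X), np.zeros(H), Wz, Wr, Wh)
```

Because the update gate and the tanh candidate are both bounded, each component of the new hidden state stays inside $(-1, 1)$.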

2.4. Model Training and Evaluation

The primary hardware components utilized include a 12th Gen Intel(R) Core(TM) i7-12700H processor and an NVIDIA GeForce RTX 3060 graphics card. To verify the performance of the proposed model, the original data are partitioned into three segments: 70% for training, 10% for overfitting assessment (validation), and 20% for model testing. This partitioning ensures the model's generalization capability and allows its predictive performance to be evaluated. The activation functions of the ST-GCN and GRU layers are set to the ReLU (Equation (21)) and ELU (Equation (22)) functions, respectively. The loss function used for training is the mean squared error (MSE, Equation (23)), where $y_i$ is the prediction of the proposed model and $\hat{y}_i$ is the true value of the output features. The optimizer is Adam.
$\mathrm{ReLU}(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases}$
$\mathrm{ELU}(x) = \begin{cases} x, & \text{if } x > 0 \\ a\left(e^x - 1\right), & \text{if } x \le 0 \end{cases}$
$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$
Training deep learning models requires a substantial number of iterations to achieve reliable predictive accuracy. Insufficient training may affect the model’s ability to extract relevant features, while excessive epochs can potentially lead to overfitting. To strike a balance between adequate training and overfitting prevention, this research initially employs an extended number of training epochs, followed by the implementation of an early-stopping strategy to monitor training progress. If the model’s accuracy fails to demonstrate significant improvement after a predefined number of epochs, training is terminated prematurely.
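A minimal early-stopping monitor of the kind described here might look as follows; the `patience` and `min_delta` values are hypothetical, not the thresholds used in this study:

```python
class EarlyStopper:
    """Stop training when the monitored loss fails to improve for `patience` epochs."""
    def __init__(self, patience=3, min_delta=1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:   # significant improvement
            self.best, self.bad_epochs = val_loss, 0
        else:                                       # no meaningful progress
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=3)
losses = [1.0, 0.8, 0.8, 0.8, 0.8]                  # improvement stalls after epoch 1
stopped_at = next(i for i, l in enumerate(losses) if stopper.should_stop(l))
```

On the toy loss curve above, training halts at epoch index 4, after three consecutive epochs without a significant improvement over the best loss of 0.8.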
The fine-tuning of model hyperparameters is guided by domain expertise and prior network tuning experiences, followed by a systematic grid search to identify optimal configurations. Table 2 consolidates the complete model configuration, including training hyperparameters, architectural parameters, data preprocessing settings, and implementation details, to ensure full reproducibility. The hyperparameter candidates listed in the table were systematically evaluated through grid search, with the final settings selected based on validation set performance.
To evaluate the prediction effect, the mean absolute error (MAE) and root mean squared error (RMSE) are used as evaluation metrics in this study. Compared to MAE, the RMSE is more susceptible to outliers. The detailed formulas are presented in Equations (24) and (25).
$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$
$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}$
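As a concrete check of the two metrics, the toy series below shows why RMSE reacts more strongly than MAE to a single large miss (the data are hypothetical):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (24)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (25); penalizes outliers more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [1.0, 2.0, 3.0, 10.0]
y_pred = [1.0, 2.0, 3.0, 6.0]      # one large miss dominates RMSE
err_mae, err_rmse = mae(y_true, y_pred), rmse(y_true, y_pred)
```

Here three exact predictions and one 4-unit miss give an MAE of 1.0 but an RMSE of 2.0, because squaring weights the single outlier more heavily.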

3. Case Study

3.1. Case Background and Data Processing

The Karnaphuli River Underwater Tunnel in Bangladesh, regarded within Bangladesh as the “National Father Tunnel”, is the first overseas long, large-diameter tunnel project implemented by the China Communications Construction Company (CCCC). Situated at the mouth of the Karnaphuli River on the periphery of Chittagong, Bangladesh, the tunnel extends 2450 m per tube, roughly 4900 m for the twin tubes, linking the eastern and western shores of the waterway. The project site overview is depicted in Figure 8.
The project is located in the alluvial delta region at the river’s estuary, characterized by Quaternary alluvial deposits. The stratigraphy of the west bank and the riverbed features interbedded clay and sand layers of Holocene alluvial origin within the Quaternary system, consisting predominantly of sand layers interspersed with thin clay layers and, in places, clay layers interspersed with thin sand layers. The clayey soils are mostly in a soft, plastic state with poor geotechnical properties. On the eastern bank, the surface layer comprises alternating cohesive clay and sandy soil layers, primarily dominated by sandy soil. Beneath this surface layer, there are Quaternary alluvial deposits with alternating layers of cohesive clay and sandy soil. The cohesive clay is mainly hard-state silty clay, occasionally exhibiting a semi-indurated state. As shown in Figure 9, when the shield machine passes through the fine sand layer, it encounters a complex geological stratum known as the “upper soft and lower hard” composite stratum, which easily causes misalignment of the shield machine, making stringent control over the shield machine’s attitude during construction necessary.
The shield machine is equipped with numerous sensors that continuously monitor its tunneling operation in real-time. All monitoring data are stored in the data acquisition system. The data collected in the early stage include a total of 1224 rings, which are recorded at an excavation interval of 20 mm. Each ring comprises over 1700 features and more than 100 sampling points, resulting in a total data volume of approximately 130,000 samples.
The collected parameters can be summarized into four categories: mechanical operation parameters (thrust, cutter head rotation speed, cutter head torque, jack stroke, etc.), geological condition parameters (surrounding earth pressure), mud system parameters (mud delivery pressure, mud flow rate, etc.), and running status values.
The mechanical operating parameters reflect the current tunneling state of the shield machine, and the operators can adjust them to achieve attitude control. (The raw data also include many items related to electrical equipment and chemical gas concentrations, which are not listed here in detail.) The geological condition parameters reflect the interaction forces between the cutter head and the complex strata during tunneling. The generation and distribution of these pressures are intimately related to geological conditions, construction depth, groundwater level, and the operation of the shield machine. Various soil layers (such as sand, clay, and rock) exhibit distinct densities and strengths, leading to different pressure distributions. The parameters of the mud system mainly reflect the impact of mud pressure and balance on the stability and directional control of the shield machine, and can indirectly indicate the distribution of mud pressure and flow. By monitoring and adjusting these parameters, the stability and safety of the tunneling process can be ensured.
Through feature selection and related reference, the key parameters selected are: thrust of the tunnel boring machine (A), cutter head rotation speed (B), cutter head torque (C), penetration (D), surrounding earth pressure (E–H), jack stroke difference (I, J), mud flow and pressure (K–N), the air bubble chamber pressure (O), and the horizontal and vertical posture deviation of the shield machine (P–S). To evaluate the performance of the proposed method, this study randomly selected 150 rings (from ring 670 to ring 819), totaling 14,689 samples, as the dataset. This data volume aligns with the average amount in existing studies on tunnel boring machine attitude prediction problems (generally around 14,000 samples in total). Table 3 provides a summary description of all selected data.
The selected data need to be prepared for model training and evaluation, which encompasses data cleaning, filtering, normalization, and reconstruction operations. Specifically, first, invalid information, such as shield machine downtime and null values, must be removed. For outliers, the 3-sigma rule is applied to identify them, and the mean value of the adjacent data points is used for replacement. The data with outliers removed are then sent to the Butterworth filter for denoising. Following this, min–max normalization is applied to rescale the data between zero and one. Finally, the sliding window method is utilized to reformat the time series for the supervision problem.
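The cleaning and normalization steps can be sketched as follows. The 3-sigma replacement and min–max rescaling follow the description above, while the Butterworth denoising step (typically `scipy.signal.butter`/`filtfilt`) is noted in a comment rather than implemented; the function names and the toy spike series are illustrative:

```python
import numpy as np

def replace_outliers_3sigma(x):
    """Replace points beyond mean +/- 3*std with the mean of adjacent samples."""
    mu, sd = x.mean(), x.std()
    y = x.copy()
    for i in np.where(np.abs(x - mu) > 3 * sd)[0]:
        lo, hi = max(i - 1, 0), min(i + 1, len(x) - 1)
        y[i] = (x[lo] + x[hi]) / 2.0       # mean of neighboring data points
    return y

def min_max(x):
    """Rescale to [0, 1]. In the full pipeline a Butterworth low-pass
    (scipy.signal.butter + filtfilt) would sit between the two steps."""
    return (x - x.min()) / (x.max() - x.min())

raw = np.linspace(0.0, 1.0, 20)            # toy sensor channel
raw[10] = 100.0                            # inject one spike to clean
clean = min_max(replace_outliers_3sigma(raw))
```

The spike at index 10 is flagged by the 3-sigma rule and replaced by its neighbors' mean, after which min–max normalization maps the channel onto [0, 1].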
Detailed statistics for all 18 features are provided in Table 4. The selected segment encompasses diverse geological transitions to ensure comprehensive model evaluation.

3.2. Training Details and Implementation

In total, 70% of the processed data is used for model training, the next 10% (the 70–80% segment) is utilized to evaluate potential overfitting, and the remaining 20% is used to validate the model’s performance. The training input includes features A to O from time $t-\tau$ to $t$, as well as one of the historical attitude parameters (P–S) from time $t-\tau$ to $t$. To determine the appropriate window length, this study examined the influence of various window size settings on model prediction performance. Table 5 and Table 6 present the model evaluation results for each window length setting, using the average MAE and RMSE as the evaluation metrics. The results indicate that a window length of three is the most appropriate choice, yielding the best performance in terms of average MAE and RMSE. When the window length is less than three, the model’s prediction accuracy is significantly compromised; when it exceeds three, the accuracy improvement becomes very minimal. Considering the data volume and model runtime, the window length is set to three.
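The sliding-window reformatting used to turn the series into a supervised learning problem can be sketched as below for a univariate toy signal with $\tau = 3$; the function name is hypothetical:

```python
import numpy as np

def sliding_windows(series, tau=3):
    """Reformat a series into (X, y) pairs: the input is the tau steps
    t - tau .. t - 1, and the target is the value at step t."""
    X, y = [], []
    for t in range(tau, len(series)):
        X.append(series[t - tau:t])
        y.append(series[t])
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)     # toy 1-D attitude signal
X, y = sliding_windows(series, tau=3)
```

For a length-10 signal this yields seven samples; the first pairs the window [0, 1, 2] with the target 3, and each subsequent window slides forward by one step.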
The proposed ST-GC-GRU model includes one ST-GCN layer and two GRU layers. The model hyperparameters are crucial for prediction performance and should be chosen based on a comparative analysis of different configurations. This study considers key candidates such as the number of GRU layers and the number of neurons in the ST-GCN and GRU layers. The number of ST-GCN layers is set to one, with candidate neuron counts of [16, 32, 64]. The GRU can have two to three layers, with candidate numbers of neurons being [64, 128, 256]. A total of six combinations covering all candidate parameters are selected, with detailed parameter choices and the average MSE results for each output listed in Table 7. According to the parameter sensitivity analysis results, combination 2 demonstrated superior performance compared to the other combinations. Therefore, the ST-GCN layer mapping neurons in the proposed ST-GC-GRU model are set to 32, and the neurons in the two GRU layers are set to 128. The optimizer is Adam, and the loss function is MSE. A summary of the model structure is shown in Table 8.
Figure 10 shows the loss function curves for both training and validation datasets across four attitude parameters. It is evident that the training and validation losses rapidly decrease within the first 10 epochs, subsequently stabilizing at a low level within 100 epochs, indicating that the training has been completed.

3.3. Analysis of Results

To assess the accuracy of the proposed model, this study evaluates its prediction performance on the test set. Figure 11, Figure 12, Figure 13 and Figure 14 show the predicted values and actual values for the corresponding four attitude parameters. To quantify prediction performance, the study employs MAE and RMSE as evaluation metrics. Table 9 summarizes the prediction results of the proposed model.
The four output line charts indicate that alignment between the predicted and actual values is exemplary, showcasing consistent trends. This demonstrates that the model proposed in this study can achieve high-precision prediction performance for all attitudes. The model’s performance on the MAE and RMSE evaluation metrics is particularly noteworthy, with the average MAE and RMSE across HDH, HDT, VDH, and VDT being as low as 0.3804 and 0.4991, respectively. The overall MAE does not exceed 0.58. Regarding RMSE, which is more susceptible to outliers, the overall accuracy is also exceptional, confirming that the model is highly capable of handling shield attitude prediction tasks. However, it is worth noting that the prediction results for VDH are relatively poor, with worse performance on the RMSE compared to the other three attitude parameters. A possible explanation for this is that there are more sudden changes in the VDH data, and the predicted values are smoother and more continuous compared to the actual values. Conversely, for VDT, which exhibits smoother data, the prediction performance is excellent, as can be seen from the distribution of the test set.
Table 10 presents the denormalized prediction errors on the test set. For the tolerance threshold of 50 mm, while HDT shows an MAE of 52.15 mm, this reflects the parameter’s large variation range (126 mm, from −89 mm to +37 mm) in the mixed geological conditions of this project. The key insight is that the normalized MAE of 0.41 indicates that the model captures 59% of the variation pattern, leaving only 41% as prediction error relative to the total variability. Similarly, for VDH with a 76 mm range, the normalized MAE of 0.57 shows that the model explains 43% of the variation. For practical application, these results demonstrate that the model effectively tracks attitude trends and provides reliable early warnings before deviations approach critical thresholds, enabling operators to implement timely corrective actions. The particularly strong performance on HDH (1.80 mm MAE, 3.6% of tolerance) and VDT (21.36 mm MAE, 42.7% of tolerance) further validates the model’s practical utility.
The denormalized error metrics demonstrate that the proposed model achieves prediction accuracy that is well-suited for practical construction control. For HDH and VDT, the MAE values remain within 43% of allowable tolerances, enabling operators to implement timely corrective measures before critical deviations occur. While HDT shows larger absolute errors due to its substantial variation range in mixed geology, the model’s ability to capture 59% of variation patterns provides reliable trend tracking for proactive attitude control. These performance levels are adequate for real-time shield guidance systems, where early warning capability is more critical than absolute prediction precision.
As shown in Figure 15, the VDH parameter exhibits noticeably larger prediction errors (MAE: 43.55 mm, normalized MAE: 0.57) compared to other attitude parameters. Segment-wise analysis reveals that this degraded performance is primarily concentrated in high-deviation ranges (|VDH| > 40 mm), where the RMSE increases to 2.95 mm compared to 1.05 mm in low-deviation segments (|VDH| < 20 mm), representing a 180% increase. This performance degradation in high-change segments can be attributed to several key factors: (1) data sparsity in extreme deviation scenarios limits the model’s exposure to critical cases despite comprising 41.6% of samples, as these events exhibit substantially higher variability; (2) increased geological complexity during large deviations, where sudden transitions (rock–soil interfaces, water-bearing zones) introduce highly nonlinear shield–geology interactions not fully captured by current features; (3) operating mode shifts, as operators implement aggressive corrective measures during severe deviations, introducing additional system dynamics that increase prediction complexity. To address these limitations, future improvements should focus on implementing attention mechanisms to dynamically emphasize geological transition indicators during high-deviation scenarios, developing residual correction modules trained specifically on extreme deviation subsets with customized loss functions, and incorporating multi-horizon loss functions that assign higher penalties to large deviations, thereby improving model sensitivity to critical control situations.
In addition to prediction accuracy, the time cost of model training is also a crucial evaluation criterion. Model parameters are not fixed once and for all; they require adjustment for different application scenarios and shield machines, so updating model parameters is essential and must be emphasized in the engineering application of data-driven models. Referring to Table 11, this study summarizes the training time costs for various time window lengths. Notably, the training time fully meets the needs of practical engineering: using 100 epochs as a standard, completing the model training takes only about three minutes, which is all that is required when the model must adapt to a new engineering environment and update its parameters with new data. As the number of input parameters increases, the training time also increases. Given that the inference time of the models in this study is fast, typically within 1 s, it is not discussed further.

4. Discussion

4.1. Comparison with Other Models

To further verify the superior performance of the proposed method, this study uses the same dataset to train and evaluate various commonly used methods. This section consists of two parts: ablation experiments and model comparison. In the ablation experiments, the improvement introduced by the two strategies, namely the time decay function and the time decomposition method, must be verified; each component is removed from the proposed model separately and the resulting prediction performance is analyzed. Table 12 and Table 13 list the MAE and RMSE values of the two ablation experiments. The tables include six sets of models. GCN-GRU is an ablation model with the time decay function removed: the decay function is simply set to a constant value of 1, making it a standard GCN structure. TS_GRU is a GRU model augmented with the time decomposition method; relative to it, the plain GRU model lacks the trend and residual terms in its input. The GRU model is a standard two-layer GRU network whose hidden-layer neuron count matches the proposed model, ensuring the validity of the experimental results. TS_LSTM is an LSTM model augmented with the time decomposition method. Both LSTM and GRU are commonly used time series models, so they are also included in the ablation analysis to verify generalizability. It should be emphasized that the number of neurons in the hidden layer of the LSTM is set to the same value as the GRU, because this part mainly examines whether the added strategies improve the predictive performance of the model. Therefore, the dataset, hyperparameters, and number of hidden layers are kept consistent with the proposed model.
The ablation results show that the MAE and RMSE values of the proposed model are markedly lower than those of the other five methods for all four attitude parameters. Specifically, compared to the standard GCN-GRU, the average MAE and RMSE of the model are improved by 18.61% and 17.89%, respectively. Compared to the GRU model without any improvement measures, the MAE and RMSE of the proposed model are decreased by 52.44% and 57.07%, respectively, indicating a significant enhancement. For the time decomposition representation, this study used GRU and LSTM as baseline models, incorporating attitude decomposition-related features into the model input. The prediction results show that TS_GRU reduces the MAE and RMSE of the GRU model by an average of 16.69% and 19.67%, respectively; similarly, TS_LSTM improves the MAE and RMSE of the LSTM model by an average of 11.98% and 18.05%. It is worth noting that RMSE, an evaluation metric sensitive to outliers, decreases more significantly, reflecting that temporal decomposition can, to a certain extent, effectively handle highly variable attitude data.
In addition to the above ablation models, this study also tests some popular deep learning and machine learning models, namely CNN, CNN-GRU, GRU, LSTM, Temporal Convolutional Network (TCN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Among them, CNN, CNN-GRU, and LSTM are the typically utilized models for shield attitude prediction. The TCN model, proposed in recent years for time series data, enhances the network’s receptive field through dilated and causal convolutions, thereby improving prediction performance. Related literature has confirmed that it has achieved better results than LSTM and GRU in many prediction tasks [47]. The RF model is an ensemble method based on Bagging, with each base learner computed in parallel. RF improves the accuracy of decision trees (DT) by introducing random attribute selection during the training process of the decision tree [48]. The XGBoost model, on the other hand, is an efficient gradient-boosting decision tree algorithm, which improves the original GBDT by using the ensemble concept of Boosting to integrate multiple weak learners into a strong learner through certain methods [49]. To ensure the validity of the experiment, the hyperparameter settings of the models used in these comparative methods are consistent with the proposed model. For the CNN model, three convolutional layers, two pooling layers, and two fully connected layers are set. The CNN-GRU model introduces two layers of GRU after the CNN layers, with 128 neurons in the GRU hidden layer. The GRU and LSTM settings are the same as in the ablation experiments. The TCN model is configured with four convolutional layers, all with 128 output channels, and two fully connected layers for output. The activation function, loss function, optimizer, and evaluation metrics remain identical to those employed in the preceding experiment. Table 14 and Table 15 show the specific experimental results.
From the comparison results, it is evident that the proposed model significantly outperforms other models according to overall prediction accuracy. When compared to the popular CNN-GRU model, the proposed model reduces the average MAE and RMSE by 41.88% and 43.13%, respectively. The prediction performance of the two machine learning models is notably inferior, which is likely due to the substantial amount of large-variation data in VDH, making the machine learning models less effective in fitting abnormal data. It should be mentioned that machine learning models are highly constrained by the distribution of training data in regression tasks, and if the data range exceeds that of the training data, their prediction performance may be severely limited. Compared to the RF model based on Bagging integration, the proposed model reduces the average MAE and RMSE by 74.42% and 75.50%, respectively. The improvements are even more pronounced when compared to the XGBoost model based on Boosting integration. This study visualizes the prediction details of four attitude parameters, with Figure 16, Figure 17, Figure 18 and Figure 19 illustrating the prediction results of each model.
To comprehensively evaluate the computational efficiency of different models, Table 16 presents a detailed comparison of model complexity and inference performance. All inference time measurements were conducted on the hardware configuration described in Section 2.4 (Intel Core i7 12700H processor with NVIDIA GeForce RTX 3060 GPU) using a batch size of 32, with results averaged over 1000 prediction iterations to ensure statistical reliability.
For tree-based models (RF, XGBoost), the parameter count represents the total number of leaf nodes across all trees in the ensemble. Inference times for these models include the overhead of ensemble prediction across all trees.
As detailed in Table 16, the proposed ST-GC-GRU model contains 177,923 trainable parameters, representing only a 1.2% increase over baseline GRU (175,745 parameters) while achieving 47.3% lower MAE, demonstrating superior parameter efficiency. The model achieves an average inference time of 4.50 ms per prediction step on the specified hardware (Section 2.4), representing merely 0.45% of typical TBM control cycles (1–2 s) and enabling seamless real-time deployment. Compared to baseline architectures, the proposed model delivers comparable or better computational efficiency (4.50 ms vs. GRU: 4.44 ms, CNN-GRU: 4.51 ms, LSTM: 6.50 ms, TCN: 5.12 ms) while substantially outperforming all methods in prediction accuracy, establishing practical viability for industrial shield tunneling control systems.

4.2. The Analysis of the Time Decay Function

The primary distinction between the proposed model and traditional GCN is the introduction of temporal information decay. This concept, which is applicable in many domains, aims to attenuate the influence of previously memorized content based on the elapsed time: the longer the elapsed time, the less impact the previous information has on the predicted output [47,50]. This idea is relevant to shield machine tunneling. Specifically, the future attitude changes in the shield machine are heavily dependent on its recent historical attitude and current attitude. Typically, the current attitude exerts the strongest correlation, whereas data from further in the past has a diminishing influence on future attitude changes, reflecting the decaying relationship of attitude information over time. In this study, the exponential decay function is selected as the model’s decay function. Additionally, constant decay functions $[1, a, a^2]$ (with $a = 0.85$ according to tuning) and logarithmic functions, which possess decay capabilities, are also employed in the analysis of prediction effects. Figure 20 illustrates the geometric interpretation of the three decay functions, which are retrained and tested while keeping all other processes constant. The results, shown in Table 17, indicate that for the four attitude deviation values, the RMSE values using the exponential time decay function outperform those of the other two decay functions. Therefore, the exponential decay function is selected as the more appropriate decay function.
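The three decay families compared here can be sketched as follows. The exponential and geometric forms follow the text (with $a = 0.85$), while the logarithmic form is an assumed example, since the paper does not give its exact expression:

```python
import math

def exponential_decay(k, alpha=0.5):
    """exp(-alpha * k): the kernel the study settles on; alpha is illustrative."""
    return math.exp(-alpha * k)

def constant_decay(k, a=0.85):
    """Geometric weights [1, a, a^2, ...] with a = 0.85 from tuning."""
    return a ** k

def log_decay(k):
    """An assumed logarithmic alternative: 1 / (1 + ln(1 + k))."""
    return 1.0 / (1.0 + math.log1p(k))

# Weight of a timestamp k steps in the past under each decay family.
weights = {f.__name__: [f(k) for k in range(3)] for f in
           (exponential_decay, constant_decay, log_decay)}
```

All three kernels assign full weight to the current timestamp ($k = 0$) and decrease monotonically with temporal distance, differing only in how quickly past information is forgotten.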
Although this method has achieved favorable prediction results, there remains room for further investigation. This study assumes a fixed time interval for setting the decay matrix. However, in actual engineering applications, the time intervals for data collection before and after tunneling are inconsistent. There may be variations in the design of the decay matrix. For instance, if the tunneling speed of the shield machine is relatively high, the difference in attitude before and after tunneling can be more pronounced, leading to greater differences in data relationships. When the tunneling speed is relatively low, the future attitude of the machine may depend more on its current attitude, or even be entirely determined by its current state. Therefore, this method still holds research value in the future.

4.3. The Analysis of the Temporal Decomposition

This study employs the time decomposition method to decompose the attitude data into long-term trend components and short-term residuals, effectively mitigating the impact of anomalous data on the model’s predictive performance. However, quantitative analysis lacks persuasiveness. This study selects data with significant changes in attitude and visualizes the prediction effects with and without time decomposition measures to specifically observe the differences in the model prediction results, as illustrated in Figure 21.
The dashed segments in the first three figures demonstrate that the shield machine’s attitude exhibits substantial fluctuations across various operational parameters. The final VDT figure presents only a partial segment due to relatively minimal data variations throughout the collection period. Examination of the prediction details in the initial three figures reveals that prediction results without time decomposition methodology share a characteristic limitation: in regions of significant variation, predicted values frequently approximate the actual values from the preceding time step, creating an apparent temporal lag effect where predictions mirror historical observations. This phenomenon corresponds to the operational behavior patterns of shield machines that are observed in engineering practice. Specifically, when attitude variations remain minimal, the future attitude state of the shield machine depends predominantly, or exclusively, on its current operational configuration. However, this predictive pattern can only ensure approximate trend alignment with actual values, while notable discrepancies in prediction accuracy persist. Incorporating attitude decomposition information substantially mitigates this limitation. As demonstrated in the attitude decomposition analysis presented in Section 2.3.1, the original data undergoes systematic decomposition into a trend component, representing long-term evolutionary patterns, and a residual component, capturing short-term variational characteristics. These components are responsible for extracting features of long-term trajectory trends and short-term attitude fluctuations, respectively. This decomposition enables the model to simultaneously capture the fundamental trends of the original data during periods of significant attitude changes while reducing the adverse impact of abrupt data fluctuations on predictive performance.
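A simple moving-average decomposition illustrates the trend/residual split discussed here; the averaging kernel and window length are assumptions, as Section 2.3.1 defines the actual method used:

```python
import numpy as np

def decompose(x, window=5):
    """Split a series into a moving-average trend and a residual,
    so that x = trend + residual exactly (assumed moving-average kernel)."""
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")           # avoid edge shrinkage
    trend = np.convolve(padded, kernel, mode="valid")
    return trend, x - trend

rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 3, 50)) + 0.01 * rng.standard_normal(50)
trend, resid = decompose(x)
```

The trend component carries the long-term trajectory, while the residual captures short-term fluctuations; by construction the two sum back to the original series, so no information is lost in the decomposition.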

5. Conclusions

This investigation proposes a novel hybrid methodology integrating spatial–temporal graph networks and attitude decomposition techniques for enhanced shield attitude prediction. The proposed approach effectively models both temporal correlations between sequential timestamps and spatial correlations across different temporal instances within shield tunneling operational datasets. Specifically, the methodology employs time decomposition to augment the temporal representation of shield tunneling attitude dynamics, systematically extracting long-term evolutionary trends and short-term variational patterns from attitude change sequences. Subsequently, a time decay function is implemented to construct a dynamic decay graph that interconnects historical values across all timestamps based on temporal proximity, enabling the comprehensive capture of spatial–temporal dependencies through consideration of feature correlations at different temporal intervals. Comparative analysis demonstrates that the proposed model significantly outperforms existing methodologies in terms of prediction accuracy. The effectiveness of the proposed approach is validated through application to the Karnaphuli River Tunnel Project as a comprehensive case study. The experimental results indicate that: (1) Implementation of the time decay matrix enables gradual attenuation of historical data’s influence on model predictions according to respective temporal relationships, thereby substantially improving predictive accuracy. Ablation studies demonstrate that the proposed model significantly outperforms standard GCN architectures in processing diverse shield attitude datasets. (2) The time decomposition methodology successfully extracts temporal patterns from shield attitude operational data, effectively mitigating the impact of anomalous data perturbations on predictive performance and substantially enhancing attitude prediction accuracy.
Despite achieving superior predictive performance and computational efficiency for shield machine attitude estimation, the ST-GC-GRU model exhibits certain inherent limitations that warrant future investigation. First, the current time-decay matrix assumes fixed temporal intervals between consecutive measurements, which may not fully capture temporal relationships under highly irregular sampling patterns caused by variable excavation speeds or operational interruptions. Future research should explore adaptive decay mechanisms that directly incorporate actual time gaps, such as continuous-time models (Neural ODEs), time-aware attention mechanisms, or event-driven recurrent architectures to enhance robustness across diverse sampling conditions. Second, while temporal decomposition effectively mitigates the impact of abrupt data variations, external factors such as human operational interventions also significantly influence shield machine attitude dynamics. Integrating physics-guided modeling strategies, as demonstrated in physics-informed degradation prediction [51], would balance data-driven flexibility with physics-based reliability by incorporating established mechanical principles (force equilibrium, geometric constraints) into the ST-GC-GRU framework, improving both interpretability and prediction accuracy in data-scarce scenarios [52]. Third, this study focuses on single-step-ahead prediction aligned with real-time shield control requirements, where continuous sampling frequency enables iterative predictions to guide real-time corrective actions. Unlike prognostic tasks requiring long-horizon forecasting for maintenance planning [53], shield attitude prediction serves real-time operational control where immediate accuracy is paramount. 
Future work could extend the framework to multi-step forecasting for planning applications through: (1) recursive multi-step prediction strategies, (2) uncertainty quantification for confidence-aware planning, and (3) hybrid single-step control with multi-step planning for enhanced decision-making under complex geological conditions.
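The recursive multi-step strategy in point (1) can be sketched as follows; `recursive_forecast` and the persistence placeholder model are hypothetical names used only for illustration, standing in for a trained single-step predictor such as ST-GC-GRU.

```python
import numpy as np

def recursive_forecast(model, history, horizon, window=3):
    """Recursive multi-step prediction: each one-step prediction is fed
    back into the input window to produce the next step."""
    buf = list(history[-window:])
    preds = []
    for _ in range(horizon):
        y_next = model(np.asarray(buf[-window:]))
        preds.append(y_next)
        buf.append(y_next)  # prediction becomes part of the next input
    return np.asarray(preds)

# toy placeholder "model": persistence of the last observed value
persistence = lambda w: w[-1]
out = recursive_forecast(persistence, np.array([0.1, 0.2, 0.3]), horizon=4)
```

A known caveat of this strategy, and a motivation for the uncertainty quantification mentioned above, is that prediction errors compound as the horizon grows.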
While this study demonstrates strong predictive performance on the Bangladesh Karnaphuli River Tunnel Project encompassing diverse geological conditions (soft clay, silty sand, and “upper soft and lower hard” composite strata), the generalizability of ST-GC-GRU to other shield tunneling projects, TBM types, and significantly different geological contexts remains an important direction for future validation. The segment-wise performance analysis reveals that prediction accuracy varies with geological complexity, particularly in high-deviation scenarios where sudden geological transitions introduce highly nonlinear shield–geology interactions. Future research should systematically evaluate the framework’s performance across multiple projects with distinctly different geological conditions, and transfer learning or domain adaptation techniques may be necessary to ensure robust performance across diverse tunneling environments without requiring complete model retraining.

Author Contributions

Conceptualization, J.C.; methodology, W.L.; software, J.C., X.Y. and C.Z.; validation, X.Y. and C.Z.; investigation, S.W. and X.W.; writing—original draft preparation, J.C.; writing—review and editing, W.L. and L.Z.; visualization, J.C.; supervision, S.W. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Key Project of Hubei Province (No. 2023BAB094), and the Open Foundation of Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System (No. HBSEES202106 and No. HBSEES202309).

Data Availability Statement

The data presented in this study are not publicly available due to confidentiality agreements with the shield tunneling construction project and privacy restrictions related to the ongoing engineering operations.

Acknowledgments

This work was partially supported by the Key Project of Hubei Province (No. 2023BAB094). We also gratefully acknowledge the support from the Open Foundation of Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System (No. HBSEES202106 and No. HBSEES202309). Their support and resources were invaluable in completing this research.

Conflicts of Interest

Authors Wen Liu and Xue Wang were employed by CCCC Wuhan Zhi Xing International Engineering Consulting Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Fu, X.; Zhang, L. Spatio-temporal feature fusion for real-time prediction of TBM operating parameters: A deep learning approach. Autom. Constr. 2021, 132, 103937. [Google Scholar] [CrossRef]
  2. Li, P.; Dai, Z.; Huang, D.; Cai, W.; Fang, T. Impact analysis for safety prevention and control of special-shaped shield construction closely crossing multiple operational metro tunnels in shallow overburden. Geotech. Geol. Eng. 2022, 40, 2127–2144. [Google Scholar] [CrossRef]
  3. Xu, J.; Zhang, Z.; Zhang, L.; Liu, D. Predicting shield position deviation based on double-path hybrid deep neural networks. Autom. Constr. 2023, 148, 104775. [Google Scholar] [CrossRef]
  4. Zhou, C.; Xu, H.; Ding, L.; Wei, L.; Zhou, Y. Dynamic prediction for attitude and position in shield tunneling: A deep learning method. Autom. Constr. 2019, 105, 102840. [Google Scholar] [CrossRef]
  5. Chen, J.; Mo, H. Numerical study on crack problems in segments of shield tunnel using finite element method. Tunn. Undergr. Space Technol. 2009, 24, 91–102. [Google Scholar] [CrossRef]
  6. Li, X.; Yan, Z.; Wang, Z.; Zhu, H. Experimental and analytical study on longitudinal joint opening of concrete segmental lining. Tunn. Undergr. Space Technol. 2015, 46, 52–63. [Google Scholar] [CrossRef]
  7. Ates, U.; Bilgin, N.; Copur, H. Estimating torque, thrust and other design parameters of different type TBMs with some criticism to TBMs used in Turkish tunneling projects. Tunn. Undergr. Space Technol. 2014, 40, 46–63. [Google Scholar] [CrossRef]
  8. Zhang, Q.; Qu, C.; Cai, Z.; Kang, Y.; Huang, T. Modeling of the thrust and torque acting on shield machines during tunneling. Autom. Constr. 2014, 40, 60–67. [Google Scholar] [CrossRef]
  9. Yue, M.; Sun, W.; Hu, P. Dynamic coordinated control of attitude correction for the shield tunneling based on load observer. Autom. Constr. 2012, 24, 24–29. [Google Scholar] [CrossRef]
  10. Hu, M.; Wu, B.; Zhou, W.; Wu, H.; Li, G.; Lu, J.; Yu, G.; Qin, Y. Self-driving shield: Intelligent systems, methodologies, practice. Autom. Constr. 2022, 139, 104326. [Google Scholar] [CrossRef]
  11. Sramoon, A.; Sugimoto, M.; Kayukawa, K. Theoretical model of shield behavior during excavation. II: Application. J. Geotech. Geoenviron. Eng. 2002, 128, 156–165. [Google Scholar] [CrossRef]
  12. Shen, X.; Yuan, D.-J.; Jin, D.-L. Influence of shield attitude change on shield–soil interaction. Appl. Sci. 2019, 9, 1812. [Google Scholar] [CrossRef]
  13. Gao, X.; Shi, M.; Song, X.; Zhang, C.; Zhang, H. Recurrent neural networks for real-time prediction of TBM operating parameters. Autom. Constr. 2019, 98, 225–235. [Google Scholar] [CrossRef]
  14. Liu, X.; Zhang, W.; Mengting, J.; Wang, Y.; Ma, L. Multi-step intelligent prediction of shield machine position attitude on the basis of BWO-CNN-LSTM-GRU. Meas. Sci. Technol. 2024, 35, 106205. [Google Scholar] [CrossRef]
  15. Jia, B.; Yang, Y.; Wang, X.; Li, L.; Zhang, Y.; Zheng, S. Real-time prediction method of shield tunneling attitude under complex geological conditions. Eng. Res. Express 2025, 7, 045102. [Google Scholar] [CrossRef]
  16. Wang, K.; Wu, X.; Zhang, L.; Song, X. Data-driven multi-step robust prediction of TBM attitude using a hybrid deep learning approach. Adv. Eng. Inform. 2023, 55, 101854. [Google Scholar] [CrossRef]
  17. Zhen, J.; Lai, F.; Huang, M.; Zheng, J.; Shiau, J.S.; Wang, P.; Zheng, J. An explainable deep learning approach to enhance the prediction of shield tunnel deviation. J. Rock Mech. Geotech. Eng. 2026, 18, 566–579. [Google Scholar] [CrossRef]
  18. Dong, M.; Chen, C.; Zhong, F.; Jia, P. A Novel Hybrid Deep Learning for Attitude Prediction in Sustainable Application of Shield Machine. Sustainability 2025, 17, 10604. [Google Scholar] [CrossRef]
  19. Fu, X.; Ponnarasu, S.; Zhang, L.; Tiong, R.L.K. Online multi-objective optimization for real-time TBM attitude control with spatio-temporal deep learning model. Autom. Constr. 2024, 158, 105220. [Google Scholar] [CrossRef]
  20. Pan, Y.; Zhang, L. Integrating BIM and AI for smart construction management: Current status and future directions. Arch. Comput. Methods Eng. 2023, 30, 1081–1110. [Google Scholar] [CrossRef]
  21. Zhang, N.; Zhang, N.; Zheng, Q.; Xu, Y.-S. Real-time prediction of shield moving trajectory during tunnelling using GRU deep neural network. Acta Geotech. 2022, 17, 1167–1182. [Google Scholar] [CrossRef]
  22. Zhang, L.; Guo, J.; Fu, X.; Tiong, R.L.K.; Zhang, P. Digital twin enabled real-time advanced control of TBM operation using deep learning methods. Autom. Constr. 2024, 158, 105240. [Google Scholar] [CrossRef]
  23. Cao, D.; Jia, F.; Arik, S.O.; Pfister, T.; Zheng, Y.; Ye, W.; Liu, Y. Tempo: Prompt-based generative pre-trained transformer for time series forecasting. arXiv 2023, arXiv:2310.04948. [Google Scholar]
  24. Miao, H.; Zhang, Y.; Ning, Z.; Jiang, Z.; Wang, L. Tdg4msf: A temporal decomposition enhanced graph neural network for multivariate time series forecasting. Appl. Intell. 2023, 53, 28254–28267. [Google Scholar] [CrossRef]
  25. Wang, P.; Kong, X.; Guo, Z.; Hu, L. Prediction of axis attitude deviation and deviation correction method based on data driven during shield tunneling. IEEE Access 2019, 7, 163487–163501. [Google Scholar] [CrossRef]
  26. He, B.; Zhu, G.; Han, L.; Zhang, D. Adaptive-neuro-fuzzy-based information fusion for the attitude prediction of TBMs. Sensors 2020, 21, 61. [Google Scholar] [CrossRef]
  27. Zhou, C.; Gao, Y.; Chen, E.J.; Ding, L.; Qin, W. Deep learning technologies for shield tunneling: Challenges and opportunities. Autom. Constr. 2023, 154, 104982. [Google Scholar] [CrossRef]
  28. Li, A.; Feng, M.; Li, Y.; Liu, Z. Application of outlier mining in insider identification based on boxplot method. Procedia Comput. Sci. 2016, 91, 245–251. [Google Scholar] [CrossRef]
  29. Sim, C.H.; Gan, F.F.; Chang, T.C. Outlier labeling with boxplot procedures. J. Am. Stat. Assoc. 2005, 100, 642–652. [Google Scholar] [CrossRef]
  30. Chen, Z.; Zhang, Y.; Li, J.; Li, X.; Jing, L. Diagnosing tunnel collapse sections based on TBM tunneling big data and deep learning: A case study on the Yinsong Project, China. Tunn. Undergr. Space Technol. 2021, 108, 103700. [Google Scholar] [CrossRef]
  31. Alkanhel, R.; El-kenawy, E.-S.M.; Abdelhamid, A.A.; Ibrahim, A.; Alohali, M.A.; Abotaleb, M.; Khafaga, D.S. Network intrusion detection based on feature selection and hybrid metaheuristic optimization. Comput. Mater. Contin. 2023, 74. [Google Scholar] [CrossRef]
  32. Li, Z.; Yang, Y.; Li, L.; Wang, D. A weighted pearson correlation coefficient based multi-fault comprehensive diagnosis for battery circuits. J. Energy Storage 2023, 60, 106584. [Google Scholar] [CrossRef]
  33. Uçar, K.T.; Çat, A. A comparative analysis of sigma metrics using conventional and alternative formulas. Clin. Chim. Acta 2023, 549, 117536. [Google Scholar] [CrossRef] [PubMed]
  34. Wang, Y.; Wu, T.; Su, L.; Qian, Y.; Li, N. Precession fault diagnosis method based on butterworth filter and convolutional neural network. In Chinese Intelligent Systems Conference; Springer: Singapore, 2023; pp. 525–533. [Google Scholar]
  35. Nagrecha, K.; Fisher, L.; Mooney, M.; Rodriguez-Nikl, T.; Mazari, M.; Pourhomayoun, M. As-encountered prediction of tunnel boring machine performance parameters using recurrent neural networks. Transp. Res. Rec. 2020, 2674, 241–249. [Google Scholar] [CrossRef]
  36. Guo, X.; Li, W.-J.; Qiao, J.-F. A self-organizing modular neural network based on empirical mode decomposition with sliding window for time series prediction. Appl. Soft Comput. 2023, 145, 110559. [Google Scholar] [CrossRef]
  37. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar]
  38. Flood, I.; Kartam, N. Neural networks in civil engineering. I: Principles and understanding. J. Comput. Civ. Eng. 1994, 8, 131–148. [Google Scholar] [CrossRef]
  39. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  40. Fu, Y.; Chen, L.; Xiong, H.; Chen, X.; Lu, A.; Zeng, Y.; Wang, B. Data-driven real-time prediction for attitude and position of super-large diameter shield using a hybrid deep learning approach. Undergr. Space 2024, 15, 275–297. [Google Scholar] [CrossRef]
  41. Weerakody, P.B.; Wong, K.W.; Wang, G.; Ela, W. A review of irregular time series data handling with gated recurrent neural networks. Neurocomputing 2021, 441, 161–178. [Google Scholar] [CrossRef]
  42. Wang, Y.; Wu, M.; Li, X.; Xie, L.; Chen, Z. Multivariate time series representation learning via hierarchical correlation pooling boosted graph neural network. IEEE Trans. Artif. Intell. 2023, 5, 321–333. [Google Scholar] [CrossRef]
  43. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  44. Jin, M.; Koh, H.Y.; Wen, Q.; Zambon, D.; Alippi, C.; Webb, G.I.; King, I.; Pan, S. A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection. arXiv 2023, arXiv:2307.03759. [Google Scholar] [CrossRef] [PubMed]
  45. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  46. Fan, J.; Zhang, K.; Huang, Y.; Zhu, Y.; Chen, B. Parallel spatio-temporal attention-based TCN for multivariate time series prediction. Neural Comput. Appl. 2023, 35, 13109–13118. [Google Scholar] [CrossRef]
  47. Luo, Y. Evaluating the state of the art in missing data imputation for clinical data. Brief. Bioinform. 2022, 23, bbab489. [Google Scholar] [CrossRef]
  48. Strobl, C.; Boulesteix, A.-L.; Zeileis, A.; Hothorn, T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinform. 2007, 8, 1–21. [Google Scholar] [CrossRef]
  49. Wu, L.-J.; Li, X.; Yuan, J.-D.; Wang, S.-J. Real-time prediction of tunnel face conditions using XGBoost random forest algorithm. Front. Struct. Civ. Eng. 2023, 17, 1777–1795. [Google Scholar] [CrossRef]
  50. Nguyen, A.; Chatterjee, S.; Weinzierl, S.; Schwinn, L.; Matzner, M.; Eskofier, B. Time matters: Time-aware lstms for predictive business process monitoring. In Proceedings of the Process Mining Workshops: ICPM 2020 International Workshops, Padua, Italy, 5–8 October 2020; Revised Selected Papers 2. Springer: Cham, Switzerland, 2021; pp. 112–123. [Google Scholar]
  51. Yin, C.; Li, Y.; Wang, Y.; Dong, Y. Physics-guided degradation trajectory modeling for remaining useful life prediction of rolling bearings. Mech. Syst. Signal Process. 2025, 224, 112192. [Google Scholar] [CrossRef]
  52. You, K.; Wang, P.; Gu, Y. Towards efficient and interpretative rolling bearing fault diagnosis via quadratic neural network with Bi-LSTM. IEEE Internet Things J. 2024, 11, 23002–23019. [Google Scholar] [CrossRef]
  53. Yin, C.; Sun, T.; Wu, H.; Dong, Y. Trustworthy multistep-ahead remaining useful life prediction for rolling bearings with limited data. Reliab. Eng. Syst. Saf. 2025, 111902. [Google Scholar] [CrossRef]
Figure 1. The framework of the ST-GC-GRU model.
Figure 2. The designed tunnel axis (DTA) and four attitude deviation parameters of the shield machine.
Figure 3. The sliding window operation principle.
Figure 4. The structure of the ST-GC-GRU model.
Figure 5. The temporal decomposition representation method.
Figure 6. Detailed principles of the decay matrix.
Figure 7. The gated recurrent unit (GRU).
Figure 8. The Bangladesh Karnaphuli River Tunnel project overview.
Figure 9. Longitudinal geological profile of the tunnel.
Figure 10. The loss for the training and validation sets of the ST-GC-GRU model. (a) HDH, (b) HDT, (c) VDH, (d) VDT.
Figure 11. The predicted values and actual measured data for HDH.
Figure 12. The predicted values and actual measured data for HDT.
Figure 13. The predicted values and actual measured data for VDH.
Figure 14. The predicted values and actual measured data for VDT.
Figure 15. Segment-wise RMSE analysis for VDH prediction performance.
Figure 16. The predicted results of different models and the actual measured data for HDH.
Figure 17. The predicted results of different models and the actual measured data for HDT.
Figure 18. The predicted results of different models and the actual measured data for VDH.
Figure 19. The predicted results of different models and the actual measured data for VDT.
Figure 20. The compared time decay functions.
Figure 21. The effect of temporal decomposition on model prediction results. (a) HDH, (b) HDT, (c) VDH, (d) VDT.
Table 1. Assessment of predictive methods for shield machine positioning.

Method Type | Relevant References | Advantages | Limitations
Mechanism methods | Sramoon et al. [11] | Analyzed the behavior of shield tunnels | The calculation process is very complex
Mechanism methods | Yue et al. [9] | Can provide accurate results based on structural soundness | Present research faces constraints in reflecting authentic behavioral interactions
Mechanism methods | Shen et al. [12] | Physical modeling of the shield attitude and soil pressure | Modeling for specific situations with low prediction accuracy
Data-driven methods | Liu et al. [14] | Improved machine learning algorithms for better prediction | Modeling for specific situations with low prediction accuracy
Data-driven methods | Jia et al. [15] | Systematic model comparison with superior efficiency | -
Data-driven methods | Zhou et al. [4] | Improved model feature extraction ability using convolutional neural networks | -
Data-driven methods | Wang et al. [16] | Improved the multi-step prediction robustness | -
Data-driven methods | Zhen et al. [17]; Dong et al. [18]; Dai et al. [19] | Can extract the spatial relationship of shield operation | Current method struggles with handling data with sudden changes
Table 2. Model configuration and training parameters.

Parameter Definition | Candidate Values | Setting
Learning rate | 0.001, 0.0005, 0.0001 | 0.0001
Epochs | 80, 100, 120 | 100
Batch size | 16, 32, 48 | 32
Patience | 15, 20, 25 | 20
Loss function | - | MSE
Optimizer | - | Adam
Window size | 1, 2, 3, 4, 5 | 3
ST-GCN hidden units | 16, 32, 64 | 32
GRU layers | 2, 3 | 2
GRU hidden units | 64, 128, 256 | 128
Dropout rate | - | 0.2
ST-GCN activation | - | ReLU
GRU activation | - | ELU
Decay type | - | Exponential
Normalization | - | Min-Max [0,1], per-feature
Filter type | - | Butterworth
Filter order | - | 2
Filter cutoff frequency | - | 0.2
Outlier detection | - | 3-sigma rule
Framework | - | PyTorch 2.0.1
Python version | - | 3.9.7
CUDA version | - | 11.7
Random seed | - | 42
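The preprocessing settings in Table 2 (3-sigma outlier detection and per-feature min-max normalization to [0, 1]) can be sketched as follows. `three_sigma_mask` and `min_max_scale` are hypothetical helper names; the Butterworth filtering step (order 2, cutoff 0.2) is omitted here, as it would typically be applied with a standard signal-processing library before normalization.

```python
import numpy as np

def three_sigma_mask(x):
    """Flag inliers within mean +/- 3*std per feature (the 3-sigma rule)."""
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    return np.abs(x - mu) <= 3 * sigma

def min_max_scale(x):
    """Per-feature min-max normalization to [0, 1], as listed in Table 2."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo + 1e-12)

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
Xn = min_max_scale(X)       # each column now spans [0, 1]
mask = three_sigma_mask(X)  # rows marked False would be dropped or imputed
```

Keeping the per-feature minima and maxima is also what makes the denormalization of errors in Table 10 possible.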
Table 3. The characterization of the chosen features.

Factor | Unit | Mean | Std | Min | Median | Max
(X1) Thrust force | kN | 109,558.22 | 4277.11 | 83,386 | 109,435 | 126,812
(X2) Cutterhead torque speed | rpm | 0.93 | 0.12 | 0.83 | 0.84 | 1.21
(X3) Cutterhead torque | kN·m | 4111.57 | 856.35 | 1411 | 3991 | 7088
(X4) Penetration | mm min−1 | 31.44 | 2.92 | 18 | 31 | 52
(X5) Earth pressure (top) | MPa | 3.16 | 0.29 | 2.23 | 3.18 | 3.81
(X6) Earth pressure (right) | MPa | 15.75 | 5.24 | 5.6 | 22.5 | 26.9
(X7) Earth pressure (down) | MPa | 3.01 | 0.31 | 1.7 | 3.03 | 3.9
(X8) Earth pressure (left) | MPa | 21.89 | 4.62 | 8.3 | 21.8 | 27.3
(X9) Differential jack travel (up-down) | mm | −65.92 | 17.50 | −104 | −61 | −32
(X10) Differential jack travel (left-right) | mm | 3.02 | 0.67 | 1.8 | 3.2 | 5.7
(X11) Mud delivery pressure | bar | 5.12 | 0.47 | 4.69 | 5.12 | 6.3
(X12) Mud delivery flow | m3 min−1 | 50.25 | 0.65 | 49.1 | 50.5 | 54
(X13) Mud discharge flow | m3 min−1 | 22.15 | 0.61 | 20.56 | 22.5 | 27.46
(X14) Mud discharge pressure | bar | 4.9 | 0.36 | 4.1 | 4.8 | 6.3
(X15) Bubble chamber pressure | bar | 4.90 | 0.05 | 4.75 | 4.9 | 5.1
(Y1) HDH | mm | 50.23 | 1.53 | 48 | 50.2 | 54
(Y2) HDT | mm | −18.96 | 22.72 | −89 | −14 | 37
(Y3) VDH | mm | −11.68 | 17.11 | −53 | −9 | 23
(Y4) VDT | mm | −40.03 | 18.28 | −90 | −36 | 1
Table 4. Dataset overview and sample distribution.

Characteristic | Details
Project | Bangladesh Karnaphuli River Tunnel Project
Ring range | Ring 670 to Ring 819 (150 rings)
Tunneling distance | Approximately 1206 m to 1476 m (270 m segment)
Geological conditions | Mixed soil–rock strata: soft clay, medium–hard rock transitions, weathered rock layers
Sampling method | Distance-based (irregular intervals: 20–100 mm per sample)
Raw data points | 14,689 samples
Invalid samples removed | 177 samples (shield downtime, null values)
Valid samples after cleaning | 14,512 samples
Samples lost in windowing | 3 samples (sliding window initialization, window length = 3)
Final dataset | 14,509 samples
Training set (70%) | 10,156 samples
Validation set (10%) | 1451 samples
Test set (20%) | 2902 samples
Number of features | 18 operational and attitude parameters (A–S)
Prediction targets | 4 attitude deviations (HDH, HDT, VDH, VDT)
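The windowing arithmetic in Table 4 (window length 3, so 14,512 valid samples yield 14,509 windowed samples) can be sketched as below; `make_windows` is a hypothetical helper name and the random arrays merely stand in for the cleaned dataset.

```python
import numpy as np

def make_windows(data, targets, window=3):
    """Build (window, features) inputs and next-step targets. The first
    `window` timestamps cannot form a complete input, so exactly `window`
    samples are lost, matching Table 4."""
    X, y = [], []
    for t in range(window, len(data)):
        X.append(data[t - window:t])
        y.append(targets[t])
    return np.asarray(X), np.asarray(y)

data = np.random.rand(14512, 18)   # cleaned samples x 18 features
targets = np.random.rand(14512)    # one attitude deviation series
X, y = make_windows(data, targets, window=3)
# 14,512 valid samples -> 14,509 windowed samples
```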
Table 5. The average MAE of the model for different window sizes.

Window Size | HDH | HDT | VDH | VDT | Average MAE
1 | 0.9874 | 1.1142 | 1.2429 | 0.5413 | 0.9714
2 | 0.5843 | 0.6574 | 0.7384 | 0.3217 | 0.5754
3 | 0.3001 | 0.4139 | 0.5730 | 0.2347 | 0.3804
4 | 0.3241 | 0.4184 | 0.5896 | 0.2459 | 0.3945
5 | 0.3382 | 0.4052 | 0.5766 | 0.2436 | 0.3909
Table 6. The average RMSE of the model for different window sizes.

Window Size | HDH | HDT | VDH | VDT | Average RMSE
1 | 1.3151 | 0.9541 | 1.4457 | 0.7956 | 1.1276
2 | 0.9125 | 0.8417 | 0.8859 | 0.4211 | 0.7653
3 | 0.3872 | 0.5327 | 0.7762 | 0.3005 | 0.4991
4 | 0.4253 | 0.5379 | 0.8314 | 0.3241 | 0.5296
5 | 0.4417 | 0.5247 | 0.7921 | 0.3167 | 0.5188
Table 7. Model hyperparameter settings and the average loss.

Type | No. ST-GCN Neurons | No. GRU Layers | No. GRU Neurons | Average MSE
1 | 16 | 2 | 64 | 8.97 × 10−4
2 | 32 | 2 | 128 | 7.46 × 10−4
3 | 64 | 2 | 256 | 7.96 × 10−4
4 | 16 | 3 | 64 | 9.02 × 10−4
5 | 32 | 3 | 128 | 7.61 × 10−4
6 | 64 | 3 | 256 | 8.04 × 10−4
Table 8. Details of the ST-GC-GRU structure.

Layer | Type | Output Shape | Activation Function | Parameters
1 | ST-GCN | (None, 54, 32) | ReLU | 2178
2 | GRU | (None, 54, 128) | ELU | 68,352
3 | GRU | (None, 128) | ELU | 99,072
4 | Dense | (None, 64) | - | 8256
5 | Dense | (None, 1) | - | 65
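As a point of reference for the GRU layers in Table 8, the following numpy sketch implements one step of a standard GRU cell (the gating structure illustrated in Figure 7). The 32-to-128 shapes mirror the ST-GCN output feeding the first GRU layer; bias terms are omitted for brevity, so this is an illustrative sketch rather than the paper's PyTorch implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One step of a standard GRU: update gate z, reset gate r, candidate
    state h~, and the gated interpolation of old and new hidden state."""
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate hidden state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(42)
d_in, d_h = 32, 128  # mirrors the 32 -> 128 sizes in Table 8
W = [rng.standard_normal((d_in, d_h)) * 0.1 for _ in range(3)]
U = [rng.standard_normal((d_h, d_h)) * 0.1 for _ in range(3)]
h = np.zeros(d_h)
x = rng.standard_normal(d_in)
h_next = gru_cell(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
```

Because the candidate state passes through tanh, every hidden unit stays bounded in (−1, 1), which is part of what makes GRUs stable over long attitude sequences.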
Table 9. The predicted results of the proposed model.

Metrics | HDH | HDT | VDH | VDT | Average
MAE | 0.3001 | 0.4139 | 0.5730 | 0.2347 | 0.3804
RMSE | 0.3872 | 0.5327 | 0.7762 | 0.3005 | 0.4991
Table 10. Denormalized prediction errors on test set.

Attitude Parameter | Data Range (mm) | MAE (mm) | RMSE (mm)
HDH | 6 | 1.80 | 2.32
HDT | 126 | 52.15 | 67.12
VDH | 76 | 43.55 | 58.99
VDT | 91 | 21.36 | 27.35
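The denormalized errors in Table 10 follow directly from the per-feature min-max normalization in Table 2: a normalized error converts back to millimetres by multiplying with the target's data range. A quick arithmetic check against the reported MAE values:

```python
# Data ranges (mm) and normalized MAE values as reported in Tables 9 and 10
ranges = {"HDH": 6, "HDT": 126, "VDH": 76, "VDT": 91}
mae_norm = {"HDH": 0.3001, "HDT": 0.4139, "VDH": 0.5730, "VDT": 0.2347}

# Denormalize: error_mm = error_norm * (max - min) of the target
mae_mm = {k: round(mae_norm[k] * ranges[k], 2) for k in ranges}
# e.g. HDH: 0.3001 * 6 = 1.80 mm, matching Table 10
```

The same conversion with the RMSE row of Table 9 reproduces the RMSE column of Table 10.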
Table 11. The average training time of the four attitude parameters.

Window Size | No. Features | Average Training Time
1 | 18 | 563 ms
2 | 36 | 784 ms
3 | 54 | 952 ms
4 | 72 | 1 s 58 ms
5 | 90 | 1 s 543 ms
Table 12. Comparison of prediction results of ablation models (MAE).

Output | GCN-GRU | TS_GRU | GRU | TS_LSTM | LSTM | The Proposed Model
HDH | 0.3896 | 0.5146 | 0.6310 | 0.5421 | 0.6057 | 0.3001
HDT | 0.5272 | 0.6421 | 0.7540 | 0.8912 | 1.0309 | 0.4139
VDH | 0.6177 | 0.7025 | 0.8243 | 1.2159 | 1.3929 | 0.5730
VDT | 0.3353 | 0.5465 | 0.6783 | 0.6684 | 0.7398 | 0.2347
Average | 0.4674 | 0.6014 | 0.7219 | 0.8294 | 0.9423 | 0.3804
Table 13. Comparison of prediction results of ablation models (RMSE).

Output | GCN-GRU | TS_GRU | GRU | TS_LSTM | LSTM | The Proposed Model
HDH | 0.5249 | 0.7541 | 0.9021 | 0.7122 | 0.8049 | 0.3872
HDT | 0.6739 | 0.8164 | 0.9825 | 1.0345 | 1.3839 | 0.5327
VDH | 0.7983 | 0.9565 | 1.2071 | 1.4251 | 1.7540 | 0.7762
VDT | 0.4350 | 0.6791 | 0.8997 | 0.8742 | 0.9950 | 0.3005
Average | 0.6080 | 0.8015 | 0.9978 | 1.0115 | 1.2344 | 0.4992
Table 14. Comparison of different shield orientation forecasting models (MAE).

Output | CNN | GRU | CNN-GRU | LSTM | TCN | RF | XGBoost | Proposed
HDH | 1.2265 | 0.6310 | 0.6876 | 0.6057 | 0.5622 | 0.6184 | 0.8115 | 0.3001
HDT | 1.1483 | 0.7540 | 0.6601 | 1.0309 | 0.9265 | 0.5638 | 0.6359 | 0.4139
VDH | 0.9112 | 0.8243 | 0.7829 | 1.3929 | 0.7891 | 4.0721 | 4.5978 | 0.5730
VDT | 0.6293 | 0.6783 | 0.4880 | 0.7398 | 0.4915 | 0.6941 | 0.6746 | 0.2347
Average | 0.9788 | 0.7219 | 0.6546 | 0.9423 | 0.6923 | 1.4871 | 1.6799 | 0.3804
Table 15. Comparison of different shield orientation forecasting models (RMSE).

Output | CNN | GRU | CNN-GRU | LSTM | TCN | RF | XGBoost | Proposed
HDH | 1.7321 | 0.9021 | 0.9422 | 0.8049 | 0.7721 | 1.0857 | 1.2881 | 0.3872
HDT | 1.5075 | 0.9825 | 0.8601 | 1.3839 | 1.2074 | 0.7448 | 0.8261 | 0.5327
VDH | 1.3315 | 1.2071 | 1.0605 | 1.7540 | 1.0867 | 5.3871 | 5.8282 | 0.7762
VDT | 0.8372 | 0.8997 | 0.6490 | 0.9950 | 0.6551 | 0.9332 | 0.8678 | 0.3005
Average | 1.3521 | 0.9978 | 0.8779 | 1.2344 | 0.9303 | 2.0377 | 2.2025 | 0.4992
Table 16. Model complexity and computational efficiency comparison.

Model | Total Parameters | Trainable Parameters | Inference Time (ms/step)
CNN | 156,432 | 156,432 | 3.82
GRU | 175,745 | 175,745 | 4.44
CNN-GRU | 187,745 | 187,745 | 4.51
LSTM | 233,740 | 233,740 | 6.50
TCN | 205,120 | 205,120 | 5.12
RF | 0.85 M (trees) | - | 12.34
XGBoost | 1.12 M (trees) | - | 15.67
ST-GC-GRU | 177,923 | 177,923 | 4.50
Table 17. Comparison of prediction results with different time decay functions.

Output | Constant Decay | Logarithmic Decay | Exponential Decay
HDH | 0.4126 | 0.3997 | 0.3872
HDT | 0.5498 | 0.5512 | 0.5327
VDH | 0.7923 | 0.7851 | 0.7762
VDT | 0.3154 | 0.3227 | 0.3005
Average | 0.5175 | 0.5146 | 0.4992
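The three compared decay families can be illustrated as follows. The exponential form matches the decay type selected in Table 2; the constant and logarithmic forms shown here are assumed shapes for illustration (a hard-cutoff constant weight and a slow logarithmic falloff), not necessarily the paper's exact formulations.

```python
import numpy as np

gap = np.arange(5)  # temporal distance |i - j| between timestamps

constant = np.where(gap <= 2, 1.0, 0.0)    # flat weight inside a cutoff
logarithmic = 1.0 / (1.0 + np.log1p(gap))  # slow decay with distance
exponential = np.exp(-0.5 * gap)           # fast decay; best in Table 17
```

All three assign full weight to the current timestamp; they differ in how quickly the influence of older measurements fades, which is what Table 17 evaluates.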