Article

A Multivariate Time Series Prediction Model for TBM Excavation Parameters Using a Convolution–GRU–Attention Neural Network

1
School of Civil Engineering, Harbin Institute of Technology, Harbin 150090, China
2
China Key Lab of Structures Dynamic Behavior and Control of the Ministry of Education, Harbin Institute of Technology, Harbin 150090, China
3
Chongqing Research Institute, Harbin Institute of Technology, Chongqing 401135, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(6), 2964; https://doi.org/10.3390/app16062964
Submission received: 15 January 2026 / Revised: 9 March 2026 / Accepted: 12 March 2026 / Published: 19 March 2026
(This article belongs to the Special Issue Advances in Tunnel Excavation and Underground Construction)

Abstract

Operating data from tunnel boring machines (TBMs) capture the state of both the machine and the ground, and accurate forecasting of their evolving operating variables is essential for assessing rock-mass stability and improving construction efficiency. However, current methods struggle to predict multivariate TBM driving parameters accurately. Therefore, a novel multivariate time series prediction method based on a Convolution–GRU–Attention (CGA) neural network is proposed. Initially, data preprocessing steps such as effective data extraction, segmentation, status judgment, and correlation analysis are applied to raw TBM excavation data to construct a parameter database encompassing 5987 TBM excavation cycles. Subsequently, the forecasting model is trained, incorporating techniques such as cross-validation, to ensure accurate predictions of excavation parameter trends. With the average coefficient of determination ($R^2$) reaching 0.883 for total cutterhead thrust prediction and 0.923 for cutterhead torque prediction, the CGA model with a filter outperforms the GRU and BPNN models. The results demonstrate that the proposed CGA model provides reliable predictions of key TBM operational parameters and offers useful insights into the evolution of TBM excavation behavior.

1. Introduction

Operational parameters are precursors to sudden incidents in the tunnel boring machine (TBM) [1]. For example, before and during occurrences such as machine jamming and collapses, operational parameters undergo significant changes, failing to remain within stable ranges [2]. To minimize the risks and losses of excavation and ensure the safety of TBM construction, it is urgent to study the changing rules and high-precision prediction methods of TBM operational parameters. At present, a large number of research methods for operating parameters mainly focus on theoretical research, numerical analysis, data-driven techniques and intelligent algorithms.
Theoretical research primarily focuses on analyzing the influencing factors of TBM operational parameters based on physical and mechanical models. By establishing mathematical models and analyzing the interaction between the cutterhead and rock, changes in TBM performance parameters under different geological conditions can be calculated and predicted. For example, several scholars have decomposed total shield thrust into three main components: cutterhead face resistance, shield shell friction, and rear support traction, and have conducted theoretical analyses of longitudinal quasi-static mechanical equilibrium in shield excavation. Zhang et al. [3], González et al. [4], Qi et al. [5], and Kong et al. [6] have each proposed adaptive improvements for the interactions between shield cutterheads and the soil, as well as for the frictional resistance experienced by the shield shell in different strata. Meanwhile, numerical analysis typically employs methods such as the finite element method and the discrete element method to simulate the TBM excavation process. These methods can resolve the interaction between the TBM and geological bodies in detail, predicting excavation performance under different working conditions. For example, Maynar and Rodríguez [7], Qu et al. [8], and Choi et al. [9,10] used the discrete element method and the coupled discrete element–finite difference method to analyze shield thrust and its influencing factors. The findings revealed that most of the thrust is used to overcome the friction between the soil and the shield shell surface, with only a small portion used for soil excavation. Additionally, cutterhead rotation speed and mucking rate were found to be significantly correlated with shield thrust.
With their rapid development in recent years, data-driven methods have been widely used in TBM parameter prediction. These methods construct complex nonlinear models by learning from large amounts of historical data, accurately predicting TBM operational parameters. Studies on TBM operational parameters using different kinds of intelligent methods have yielded promising results. BPNNs (Back Propagation Neural Networks) excel at modeling complex nonlinear relationships between TBM operational parameters and geological conditions. Huang et al. [11] applied traditional machine learning models such as MLP, SVM, and GBR to TBM parameter prediction tasks. While these models achieved reasonable accuracy, their ability to capture complex temporal dependencies and stage-specific behavior remains limited. Recurrent neural networks (RNNs), including LSTM and GRU, are highly effective at capturing temporal dependencies in time series data and perform well in predicting TBM operational parameters [12]. Convolutional Neural Networks (CNNs) are adept at extracting spatial features from multi-source data, enabling accurate predictions of pressure fields in TBM soil chambers. Zhang et al. [13] proposed a method for predicting the pressure field in a soil chamber using spatial distribution physical characteristic functions guided by deep learning. Fu et al. [14] proposed an SVMD–Informer-based interval prediction framework to address the strong non-stationarity and uncertainty inherent in TBM excavation performance data. The method of Zhang et al. constructs physical characteristic functions to decouple the spatial distribution characteristics of the soil chamber pressure; CNNs and gated recurrent units (GRUs) were utilized to extract the spatial characteristics of historical multi-source parameter information and the temporal characteristics of the feature coefficients, respectively.
By combining real-time multi-source parameter information, the feature coefficients were predicted, achieving prediction of the soil chamber pressure field. Although traditional machine learning and neural network models can handle multivariate inputs, their performance often degrades in highly nonstationary or transition-prone segments, motivating the development of more adaptive and interpretable architectures.
To address the complexities of TBM excavation data, characterized by high-dimensional temporal dependencies, phase-specific variability, and sensitivity to abrupt geological transitions, a hybrid Convolution–GRU–Attention (CGA) neural network architecture was proposed. First, convolutional layers are employed to extract representative features; then, GRUs are used to capture temporal dependencies; finally, an attention mechanism highlights key information to enhance prediction accuracy. The model is validated using real project data and benchmarked against traditional approaches, demonstrating superior performance and practical applicability. This study provides general insights into modeling non-stationary, phase-dependent multivariate time-series, beyond TBM applications.

2. Methodology

The workflow of the CGA-based prediction method includes three steps (Figure 1). First, the original data are collected from the constructed sections of the TBM tunnel in the learning area and preprocessed into an operational parameters dataset. Then, the CGA model is used to perform time series prediction on the construction parameters to evaluate the stability of the tunnel. Ultimately, comparing the actual measured indicators with the predicted ones enables monitoring of the evolution of excavation stability during TBM tunneling. The workflow of the TBM parameter prediction process can be illustrated as a flowchart consisting of three steps, which can be broadly divided into two main parts (Figure 2): data and modeling preparation, followed by analysis of the prediction results.
Data and Model Preparation: Raw tunnel data were gathered, followed by the extraction of relevant data and segmentation of excavation cycles. These data were then transformed into a time series format. To ensure the robustness of the data model and to optimize the structural parameters of the CGA, the K-fold cross-validation method was employed.
Prediction Results Analysis: The developed model was evaluated using metrics such as the coefficient of determination ($R^2$), mean absolute error ($MAE$), root mean square error ($RMSE$), and coefficient of variation ($CV$). The data model was then used to predict TBM operations. To interpret the results from a geomechanical perspective, rock fragmentation indices (RFIs, such as FPI, TPI, SE and WR) were calculated, allowing the linkage of predicted values with excavation efficiency and collapse risk. Comparative evaluations with GRU and BPNN prediction models demonstrated the superiority of the CGA framework in terms of both predictive accuracy and engineering applicability.

2.1. Convolution–GRU–Attention (CGA) Neural Network

A Convolution–GRU–Attention (CGA) framework is developed for multivariate time-series prediction of TBM operational parameters (Figure 3). Unlike conventional CNN–RNN–Attention architectures that mainly focus on generic sequence modeling, the proposed framework is specifically designed to address the characteristics of TBM excavation data, which typically exhibit strong multivariate coupling, phase-dependent nonstationarity, and abrupt parameter variations during tunneling cycles.
In the CGA architecture, a one-dimensional convolutional layer is first employed to extract local temporal features and perform multi-scale fusion of correlated operational parameters. This component enables the model to capture short-term interactions and abrupt local variations that frequently occur during transitions between different excavation phases. The extracted features are subsequently processed by gated recurrent units (GRUs), which model long-range temporal dependencies within sequential excavation data and preserve the evolution patterns of operational parameters across tunneling cycles. To further enhance the representation capability of the model, a temporal attention mechanism is introduced after the recurrent layer. This mechanism adaptively assigns weights to hidden states at different time steps, allowing the network to emphasize the most informative temporal patterns within the sequence. Through this design, the CGA framework integrates convolutional feature extraction, sequential dependency learning, and attention-based feature refinement into a unified architecture consisting of three stages: a convolutional feature extraction layer, a recurrent-attention learning layer, and a fully connected prediction layer. This structure enables the model to better capture the complex temporal characteristics of TBM operational data and improves the reliability of multivariate parameter prediction.

2.1.1. CNN

A Convolutional Neural Network (CNN) is a neural network model capable of processing raw data into high-level, abstract representations by utilizing local connections and weight sharing to effectively and automatically extract internal features from the data. The CNN model primarily consists of convolutional layers, pooling layers, and fully connected layers, which collectively reduce the number of weights and simplify the complexity of the network model.
The convolutional layers apply multiple convolutional kernels to perform sliding window operations on the input data, extracting local features at different scales and orientations. The pooling layers then downsample the feature maps produced by the convolutional layers, reducing the feature dimensions and enhancing feature robustness. The fully connected layers integrate the feature vectors output by the pooling layers to accomplish the final regression task. By leveraging these network layers, the CNN model effectively captures valuable information, automatically generates feature vectors from the data, reduces the difficulty of feature extraction and data reconstruction, and improves the quality of data features.
The time series data is first input into the convolutional layers, where one-dimensional convolution is employed to extract and fuse features of multivariate time series parameters across different time scales (Figure 3). The convolution operation, using parameter sharing, learns patterns and structures in the data with fewer parameters and can effectively control the local receptive field’s acquisition of local features in the time series data by adjusting the convolution kernel size. The input and output dimensions for one-dimensional convolution are calculated as follows:
$L_{out} = \dfrac{L_{in} - K + 2P}{S} + 1, \quad C = N_K$
where $L_{out}$ and $C$ represent the output feature length and dimension, $L_{in}$ is the input feature length, $K$ is the convolution kernel size, $P$ is the padding size, $S$ is the stride, and $N_K$ is the number of convolution kernels. The feature learning process of the convolutional layers for the input time series data can be seen as a transformation of low-dimensional features into high-dimensional sparse features, which better facilitates the learning process of subsequent recurrent neural networks.
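As a quick check of the dimension formula above, a minimal Python sketch (the helper name is ours, for illustration only):

```python
def conv1d_output_length(l_in, kernel_size, padding=0, stride=1):
    """L_out = (L_in - K + 2P) / S + 1 for a one-dimensional convolution."""
    return (l_in - kernel_size + 2 * padding) // stride + 1

# A window of 20 time steps with kernel size 3, no padding, stride 1:
print(conv1d_output_length(20, 3))  # -> 18
```

With padding $P = 1$ and $K = 3$, the output length equals the input length, which is why "same" padding is a common choice when stacking convolutional layers over fixed-length windows.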

2.1.2. GRU

Next, the high-dimensional sparse temporal features output by the convolutional layer are used as input to the memory recurrent attention layer, where a recurrent neural network learns the sequential features of the data. The recurrent layer of the established model employs a gated recurrent unit (GRU) [15]. The GRU introduces gating mechanisms that effectively capture long-term dependencies within the sequence (Figure 4). As noted by Lai et al. [16], the activation functions used make gradient backpropagation easier and enhance performance reliability. Therefore, the established model utilizes the ReLU function as the activation function for updating the hidden states. The hidden state at time step $t$ in the recurrent unit can be calculated as follows:
$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$
$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$
$c_t = \mathrm{ReLU}\big(x_t W_{xc} + (r_t \odot h_{t-1}) W_{hc} + b_c\big)$
$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$
where $\odot$ represents element-wise multiplication, $\sigma$ is the sigmoid activation function, $x_t$ is the input at time step $t$, and $W$ and $b$ are the recurrent network parameters. The hidden state output at time step $t$ is denoted as $h_t$.
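The four equations above can be sketched as a single NumPy step function (a minimal illustration; the dictionary-based parameter layout is ours, not the paper's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    """One GRU update following the r_t, u_t, c_t, h_t equations; W and b
    hold the matrices W_xr, W_hr, ... and the biases b_r, b_u, b_c."""
    r_t = sigmoid(x_t @ W["xr"] + h_prev @ W["hr"] + b["r"])  # reset gate
    u_t = sigmoid(x_t @ W["xu"] + h_prev @ W["hu"] + b["u"])  # update gate
    # candidate state with ReLU activation; the reset gate scales h_{t-1}
    c_t = np.maximum(0.0, x_t @ W["xc"] + (r_t * h_prev) @ W["hc"] + b["c"])
    return (1.0 - u_t) * h_prev + u_t * c_t                    # new hidden state

# Example: 13 input features (as in Section 3.1), hidden size 8
rng = np.random.default_rng(0)
d_in, d_h = 13, 8
W = {k: rng.normal(scale=0.1, size=(d_in if k[0] == "x" else d_h, d_h))
     for k in ["xr", "hr", "xu", "hu", "xc", "hc"]}
b = {k: np.zeros(d_h) for k in ["r", "u", "c"]}
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), W, b)
```

Note how the update gate $u_t$ interpolates between the previous hidden state and the ReLU candidate, which is what lets the unit carry information across many time steps.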

2.1.3. Attention Mechanism

Due to the lack of clear periodic information in the input temporal data and the varying importance of different time steps to the output results, an attention mechanism is introduced in the model to focus on the parts most relevant to the target task. This enhances the model's resistance to interference from information distant from the current time step. Specifically, the temporal attention weights $\alpha_t$ at time step $t$ and the output of the temporal attention layer $\hat{H}$ are calculated as follows:
$\alpha_t = \mathrm{softmax}\big(\mathrm{Score}(H,\, W_q h_t)\big)$
$\hat{H} = \alpha_t H$
where $H$ is the matrix representation of the hidden states at each time step in the recurrent unit, $W_q$ is the weight for computing the query values, and $\mathrm{Score}$ is the chosen similarity function, such as the dot product or cosine similarity.
Finally, the hidden state matrix obtained from the recurrent-attention layer is input into the fully connected layer for further feature fusion and analysis. Specifically, the temporal attention output feature matrix is concatenated with the original output feature matrix along the feature dimension. The resulting fused feature matrix is used as the input to the fully connected layer, leading to the final prediction result:
$\hat{y} = W_f\, \mathrm{concat}(H, \hat{H}) + b_f$
where $\mathrm{concat}$ represents concatenation along the feature dimension, and $W_f$ and $b_f$ represent the parameters of the fully connected layer.
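The attention and fusion steps can be sketched as follows. This is a minimal illustration using a dot-product score with the last hidden state as the query, which is one common choice; the paper leaves the exact query construction open:

```python
import numpy as np

def temporal_attention(H, W_q):
    """H is a (T x d) matrix of GRU hidden states. The query is W_q applied
    to the last hidden state; scores are dot products against every step."""
    query = W_q @ H[-1]                     # query vector from last state
    scores = H @ query                      # similarity per time step
    alpha = np.exp(scores - scores.max())   # numerically stable softmax
    alpha /= alpha.sum()
    H_hat = alpha @ H                       # attention-weighted context
    return alpha, H_hat

def predict(H, W_q, W_f, b_f):
    """Concatenate the last hidden state with the attention context and
    apply the fully connected output layer (the y_hat equation)."""
    _, H_hat = temporal_attention(H, W_q)
    fused = np.concatenate([H[-1], H_hat])
    return W_f @ fused + b_f
```

The softmax weights sum to one, so the context vector $\hat{H}$ is a convex combination of the hidden states, emphasizing the most informative time steps.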

2.2. Model Training Method

The model training employs the Adam optimization algorithm, a widely used gradient descent optimization method that can adaptively adjust the learning rate, quickly adapting to the characteristics of the dataset and thereby accelerating the convergence speed [17]. The calculation form is as follows:
$g_t = \dfrac{1}{S} \nabla_\theta \mathrm{Loss}$
$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \quad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$
$\hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)$
$\theta_t = \theta_{t-1} - \alpha\, \hat{m}_t / \big(\sqrt{\hat{v}_t} + \varepsilon\big)$
where $S$ is the batch size of the network (the number of samples in each mini-batch), $\nabla_\theta$ denotes the gradient with respect to the model parameters, $m_t$ and $v_t$ represent the first and second moment estimates of the gradient, and $\hat{m}_t$ and $\hat{v}_t$ represent the bias-corrected moment estimates. $\beta_1$ and $\beta_2$ are the exponential decay rates, taken as 0.9 and 0.999, respectively; $\alpha$ is the learning rate; and $\varepsilon$ is a small constant to maintain numerical stability, taken as $10^{-8}$. The model training uses the Mean Squared Error (MSE) loss function.
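The update rule above can be written out directly (a minimal sketch of one Adam step, shown here minimizing a toy quadratic rather than the model's MSE loss):

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * g            # first moment
    v = beta2 * v + (1 - beta2) * g ** 2       # second moment
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2, whose gradient is 2*theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, alpha=0.01)
```

Because each step is bounded by roughly $\alpha$, Adam walks steadily toward the minimum regardless of the gradient's raw scale, which is why it adapts quickly across datasets.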

3. Method Validation

3.1. Data and Preprocessing

To validate the proposed framework, operational data from a TBM project were utilized [18]. The dataset covered a broad range of TBM parameters, including thrust, torque, penetration rate, advance velocity, chamber pressures, and machine attitude indicators, together with rock mass classification information. Non-operational segments, such as idle or shutdown states, were excluded to focus on effective excavation cycles (Figure 5a).
To objectively identify stable excavation segments within each TBM driving cycle, a variability-based segmentation method was employed using the set propulsion speed time series (Figure 5b) [19]. Let $V_s(t)$ denote the set propulsion speed at time step $t$. A sliding window of length $L$ was applied to compute the local standard deviation of the set speed, expressed as
$\sigma_v(t) = \sqrt{\dfrac{1}{L} \displaystyle\sum_{i=t-L+1}^{t} \big(V_s(i) - \bar{V}_s(t)\big)^2}$
where $\bar{V}_s(t)$ represents the mean set speed within the window. During stable excavation, the propulsion command remains nearly constant, leading to low and steady values of $\sigma_v(t)$. In contrast, during the preparation stage, frequent parameter adjustments result in significantly higher speed variability. Based on this observation, a threshold criterion $\sigma_v(t) < \sigma_{th}$ was adopted to detect candidate stable segments. To avoid misclassification caused by short-term speed holding or transient operating conditions, an additional duration constraint was imposed, requiring the condition to be continuously satisfied for a minimum time span. Segments that met both the variability and duration criteria were identified as stable driving phases, while the remaining portions were classified as preparation phases. This rule-based segmentation approach enables consistent and reproducible extraction of steady excavation data, forming a reliable basis for subsequent time-series modeling and performance analysis.
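The two-criterion rule (rolling variability plus minimum duration) can be sketched as follows. The window length, threshold, and duration values here are illustrative placeholders, not the project's calibrated values:

```python
import numpy as np

def stable_mask(v_set, L=30, sigma_th=0.05, min_len=60):
    """Mark samples belonging to stable driving phases: the rolling std of
    the set speed must stay below sigma_th for at least min_len steps.
    (L, sigma_th, and min_len are illustrative, not the paper's values.)"""
    n = len(v_set)
    sigma = np.full(n, np.inf)
    for t in range(L - 1, n):                      # rolling standard deviation
        sigma[t] = np.std(v_set[t - L + 1 : t + 1])
    candidate = sigma < sigma_th                   # variability criterion
    stable = np.zeros(n, dtype=bool)
    run = 0
    for t in range(n):                             # duration criterion
        run = run + 1 if candidate[t] else 0
        if run >= min_len:
            stable[t - run + 1 : t + 1] = True
    return stable

# Noisy preparation phase followed by a constant-speed driving phase
rng = np.random.default_rng(1)
v = np.concatenate([rng.normal(3.0, 1.0, 150), np.full(300, 5.0)])
mask = stable_mask(v)
```

In this synthetic example the mask stays false over the noisy preparation samples and becomes true once the constant-speed phase has persisted long enough to satisfy both criteria.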
Thirteen representative TBM operational parameters were selected as input features to characterize both mechanical behavior and TBM attitude during excavation. These include cutterhead rotational speed (N) and its set value (Ns), total thrust (F), cutterhead torque (T), penetration rate (P), and advance velocity (V) together with its set value (Vs). In addition, multiple hydraulic pressures reflecting the chamber and TBM conditions were considered, such as bracing pressure (BP), left, right, and top shield pressures (LSP, RSP, and TSP), and the left and right chamber gripper-shoe pressures (LCGSP and RCGSP). Two attitude-control parameters, namely the guide roll angle (GRA) and guide pitch angle (GPA), were also incorporated. All features were normalized and reformatted into multivariate time-series sequences, which provided the basis for subsequent model development and evaluation.
TBM operational data inherently exhibit strong temporal dependency due to the sequential nature of the excavation process. During continuous tunneling, adjacent excavation cycles are often highly correlated because machine operating conditions, geological characteristics, and control parameters evolve gradually rather than changing abruptly. As a result, operational parameters such as thrust, torque, and penetration rate typically show strong autocorrelation and temporal continuity across neighboring cycles.
To account for this characteristic, the prediction framework is constructed based on a time-series windowing strategy, where consecutive observations are organized into ordered input sequences to preserve temporal structure during model training. This approach enables the model to capture both short-term parameter interactions and longer-term sequential dependencies within the excavation process. By explicitly modeling the temporal evolution of TBM operational parameters, the proposed framework better reflects the dynamic behavior of rock–machine interaction during tunneling and improves the reliability of multivariate time-series prediction.
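The windowing strategy described above can be sketched as follows (the helper name is ours; the 20-step window matches the configuration adopted in the sensitivity analysis of Section 3.2):

```python
import numpy as np

def make_sequences(series, window=20, horizon=1):
    """Slice a (n_steps x n_features) array into ordered input windows X
    and horizon-step-ahead targets y, preserving temporal order."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t : t + window])           # consecutive observations
        y.append(series[t + window + horizon - 1]) # value to predict
    return np.asarray(X), np.asarray(y)

# 13 operational features over 100 records, 20-step windows
data = np.random.default_rng(2).normal(size=(100, 13))
X, y = make_sequences(data)
```

Because windows are built from consecutive records rather than shuffled samples, the strong autocorrelation between neighboring excavation cycles is preserved in every training example.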

3.2. Model Training Process

The Convolution–GRU–Attention (CGA) model integrates three components to capture the complex characteristics of TBM operational data. First, one-dimensional convolutional layers are employed to extract multiscale temporal features and reduce local noise. These features are then processed by gated recurrent units (GRUs), which are effective in learning sequential dependencies across excavation cycles. Finally, a temporal attention mechanism is incorporated to emphasize the most relevant time steps, improving interpretability and robustness.
In the process of TBM excavation, in addition to the above 13 TBM operational parameters, geological parameters also affect the rock–machine interaction. Accordingly, the project's on-site geological parameters, specifically the surrounding rock grade, were collected. The one-hot encoding method [20] was then utilized to process the geological data, with each classification result of the surrounding rock grade represented as a vector. Following the excavation segment division method described earlier, 13 operational parameters and one geological parameter were selected, yielding 5987 valid excavation data samples. Of these, 1810 samples corresponding to typical excavation segments (excluding excavation cycles around collapse zones) were chosen. These samples were split into training and testing sets in a 4:1 ratio to train the CGA model.
A one-at-a-time (OAT) sensitivity analysis was conducted based on the established CGA training framework. The evaluated hyperparameters include the learning rate, batch size, number of training iterations, time-series window length, convolution kernel size, number of convolution kernels, and hidden state dimensions. The quantitative results of this analysis are summarized in Table 1, where the corresponding $R^2$, MAE, RMSE, and CV values are reported for each tested configuration. The results indicate that the CGA model exhibits a relatively stable performance plateau for moderate parameter values, rather than a sharp optimum (Figure 6). In particular, increasing the number of training iterations beyond 300, enlarging the convolution kernel size beyond 3, or further increasing the model capacity (e.g., $N_K$ and $C$) leads to only marginal performance gains while significantly increasing computational cost. Based on this analysis, the adopted baseline configuration ($\alpha = 0.001$, $W = 20$, $K = 3$, $N_K = 128$, $C = 128$, and 300 training iterations) was selected as a balanced trade-off between prediction accuracy, model stability, and computational efficiency. This OAT-based sensitivity analysis improves the transparency and reproducibility of the model configuration and supports the rationality of the chosen hyperparameters for the investigated TBM tunneling dataset. From an operational perspective, TBM parameters typically evolve gradually across consecutive excavation records due to the continuous interaction between the cutterhead and the surrounding rock. A window length of 20 therefore provides sufficient historical context to capture short-term operational dynamics while avoiding excessive redundancy from longer sequences.
The CGA model was subsequently employed to predict the thrust force ( F ) and torque ( T ) during TBM excavation cycles. The data processing and analysis for TBM excavation were conducted on a computer equipped with an Intel Core i7-12700K CPU, operating within a Windows environment and running at a clock speed of 4.20 GHz. The time series prediction model was executed on a computer with a 12 GB NVIDIA GeForce RTX 3060 GPU.
To ensure reliable model evaluation, the dataset was first divided into training and testing subsets using a 4:1 ratio. The testing set was reserved for final model evaluation, while K-fold cross-validation was applied within the training set to improve the robustness of model training and hyperparameter selection.

3.3. K-Fold Cross Validation

To mitigate overfitting in multivariate time series prediction, where the available data may be insufficient to support the model's complexity, K-fold cross-validation was employed (Figure 7). This method reduces the randomness of data partitioning and enhances the generalization ability of the model [21]. The dataset is divided into k subsets, and cross-validation is performed over k iterations. In each iteration, one subset serves as the validation set while the remaining k − 1 subsets are used for training. The average performance metrics from these k rounds provide an estimate of the model's prediction accuracy.
The average values of R2, MAE, and RMSE across different folds (k = 2, 5, 10, 15, 20) reflect the generalization capability and robustness of the prediction model (Figure 8). As k increases from 2 to 5, R2 shows a noticeable improvement, and both MAE and RMSE decrease, suggesting enhanced model stability under more representative validation. Beyond k = 5, all metrics tend to stabilize, and CV remains low, indicating diminishing variance and high robustness of the model. This aligns with prior findings by Marcot and Hanea [21], who concluded that a k value between 5 and 10 offers a reliable trade-off between evaluation accuracy and computational efficiency. Therefore, k = 5 was selected for the main analysis in this study, as it provides sufficiently stable and representative performance estimates without the drawbacks of excessive data fragmentation.
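The fold construction used in this validation scheme can be sketched in plain NumPy (the function name is ours, for illustration):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices once, split them into k complementary
    folds, and yield (train, validation) index pairs, one per fold."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# With k = 5, each fold holds 20% of the 1810 training samples
for train, val in kfold_indices(1810, k=5):
    pass  # fit the model on `train`, evaluate on `val`
```

Every sample appears in exactly one validation fold, so averaging the metrics over the five rounds uses the whole training set for evaluation without ever scoring a sample the current model was fitted on.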

4. Prediction Results Analysis

4.1. Smoothing Processing

Comparing predicted values with actual values during TBM operations serves as a powerful tool for detecting changes in geological conditions or anomalies in operating processes, allowing early identification of deviations from expected behavior that may indicate issues requiring immediate attention [22].
For the analysis of multivariate time series prediction results, non-stationary signals (such as trending series) present the issue of statistical characteristics changing over time. The Hodrick–Prescott (HP) filter is utilized to perform trend decomposition and detrending on such time series data. The HP filter decomposes time series data into trend and cyclical components, effectively reducing noise and eliminating some outliers [23]. The core optimization objective function is expressed as follows:
$\min_{\tau_t} \displaystyle\sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \displaystyle\sum_{t=2}^{T-1} (\tau_{t+1} - 2\tau_t + \tau_{t-1})^2$
where $y_t$ denotes the time series, $\tau_t$ represents the trend component, and $\lambda$ is the smoothing parameter. A sensitivity analysis was conducted by varying the smoothing parameter $\lambda$ over a wide range commonly adopted in engineering time-series analysis (Table 2). The results indicate that increasing $\lambda$ leads to smoother trends and slightly reduced error metrics (MAE and RMSE), with a stable performance plateau observed for $\lambda$ values above approximately 5000. This behavior suggests that small $\lambda$ values result in insufficient noise suppression, while excessively large $\lambda$ values do not introduce further improvement and mainly increase smoothness without altering the underlying trend.
Based on this analysis, λ = 10,000 was selected as a representative value, as it provides an effective balance between high-frequency noise reduction and preservation of the physically meaningful TBM torque evolution. It should be noted that the HP filter is intended to extract low-frequency trends rather than achieve point-wise fitting; therefore, error-based metrics (MAE and RMSE) are more appropriate than R2 for evaluating the filtering effect. The adopted λ value ensures that the filtered predictions retain essential engineering information while improving stability and interpretability for subsequent analysis.
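Minimizing the objective above has a closed-form solution: setting its derivative to zero gives the linear system $(I + \lambda D^\top D)\tau = y$, where $D$ is the second-difference operator from the penalty term. A minimal dense sketch follows; production code would typically use a sparse solver or an off-the-shelf routine such as `statsmodels.tsa.filters.hp_filter.hpfilter`:

```python
import numpy as np

def hp_trend(y, lam=10_000.0):
    """Hodrick-Prescott trend component via the closed form
    (I + lam * D'D) tau = y, with D the (T-2 x T) second-difference matrix."""
    T = len(y)
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t : t + 3] = [1.0, -2.0, 1.0]   # tau_{t+1} - 2*tau_t + tau_{t-1}
    return np.linalg.solve(np.eye(T) + lam * D.T @ D, y)
```

A useful sanity check: for a perfectly linear series the second differences vanish, so the HP trend reproduces the series exactly regardless of $\lambda$; for noisy data, larger $\lambda$ yields a smoother trend, consistent with the plateau reported in Table 2.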

4.2. Prediction Methods Comparison

Four representative evaluation metrics were employed to assess model performance: coefficient of determination ( R 2 ), mean absolute error ( MAE ), root mean squared error ( RMSE ), and coefficient of variation ( CV ). These metrics were used to quantify the performance of the CGA multivariate sequence prediction model. Considering the non-stationary characteristics of TBM operational time-series data, multiple evaluation metrics are used to assess model performance. While R 2 provides an overall measure of goodness of fit, error-based metrics such as MAE and RMSE offer a more direct evaluation of prediction accuracy.
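The four metrics can be computed as follows. Note that the paper does not spell out the normalization used for CV; this sketch assumes RMSE divided by the mean of the actual values, which is one common definition:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """R2, MAE, RMSE, and CV (here RMSE / mean of actuals, an assumed
    normalization; the paper's exact CV definition is not stated)."""
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(resid)),
        "RMSE": np.sqrt(np.mean(resid ** 2)),
        "CV": np.sqrt(np.mean(resid ** 2)) / np.abs(y_true.mean()),
    }
```

Reporting MAE and RMSE alongside $R^2$ matters here because $R^2$ can remain high on strongly trending series even when point-wise errors are substantial.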
Figure 9a illustrates the comparison between actual values and predicted values (both with and without filtering) for the total thrust ($F$) of Test Sample A; the predicted values have undergone HP filtering. The curve is divided into three distinct sections, as defined in Section 3.2: the preparing phase ($PP$, highlighted in light yellow), the moderately stable phase ($SP_{60}$, highlighted in blue), and the stable phase ($SP_{200}$, highlighted in gray). Initially, during $PP$, the predicted values exhibit significant variability compared with the actual values. As the process transitions into the moderately stable and stable sections, the prediction accuracy improves markedly. The filtered predicted values align closely with the actual values, demonstrating reduced noise and enhanced prediction fidelity. Some data are partially magnified in the figure. The HP-filtered predictions reflect the changing trend of the predicted values well and, owing to their smaller fluctuation range, often yield better performance metrics when evaluated against the original data. The performance metrics, including an $R^2$ value of 0.88, a low MAE of 238.94, an RMSE of 372.74, and a CV of 0.06, indicate a high level of prediction accuracy and reliability.
Figure 9b depicts the excavation parameter $T$ for Test Sample B during an excavation cycle. The overall response patterns of the curve are similar to those for $F$, but some differences are notable. The response values for $T$ are smaller than those for $F$ and score better on the evaluation metrics. The predicted values closely follow the actual values, with a similar trend in the response curve. The $R^2$ value is 0.92, the MAE is 84.09, the RMSE is 108.65, and the CV is 0.16. These metrics suggest that, despite the visual similarity in trends, discrepancies remain between the predicted and actual values, particularly in the periodic fluctuations of the prediction curve.
The scatter plot in Figure 10a illustrates the relationship between the predicted and actual values of T for Test Sample B, showing both unfiltered predictions and predictions after applying the Hodrick–Prescott filter. The tolerance bands y = (1 ± 0.1)x and y = (1 ± 0.2)x confirm the accuracy of the model, with most points aligning closely with the y = x line. Filtering has effectively reduced noise, as seen in the tighter clustering of the red points around y = x. Some outliers remain, indicating areas for potential improvement. Overall, the analysis demonstrates the robustness of the model and its enhanced accuracy after filtering, making it reliable for various excavation scenarios.
The distribution of residuals from the predictions of the model is presented in Figure 10b, overlaid with a normal distribution curve. The residuals exhibit a bell-shaped curve centered around zero, indicating that the prediction errors are symmetrically distributed. The majority of residuals fall within the range of −100 to 100, with the highest frequency observed near zero, suggesting that most predictions are close to the actual values. The presence of residuals at both tails of the distribution, albeit with lower frequencies, indicates occasional larger prediction errors.
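The evaluation metrics reported throughout this section can be reproduced with a few lines of NumPy. The sketch below assumes CV is defined as the RMSE normalized by the mean of the observed values, which is consistent with the reported magnitudes but not stated explicitly in the text.

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """R2, MAE, RMSE, and CV (assumed here to be RMSE / mean of y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(resid)),
        "RMSE": np.sqrt(np.mean(resid ** 2)),
        "CV": np.sqrt(np.mean(resid ** 2)) / np.mean(y_true),
    }
```

The residual array computed here is also what the histogram in Figure 10b summarizes.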
To provide meaningful baselines for comparison, a BPNN and a GRU network were implemented with model-specific but structurally appropriate hyperparameter settings, while maintaining consistent data preprocessing and training strategies. Due to fundamental architectural differences between feedforward and recurrent networks, identical parameter configurations are neither feasible nor meaningful; instead, each baseline was configured following commonly adopted practices for its respective model class.
The BPNN model was implemented as a feedforward neural network with one hidden layer. The number of hidden neurons was determined based on a balance between representation capacity and overfitting risk, and was selected to be comparable in scale to the feature dimension used in the proposed CGA model. ReLU activation was employed in the hidden layer to enhance nonlinear modeling capability, followed by a linear output layer. A dropout rate of 0.2 was applied to reduce overfitting. As BPNN does not explicitly model temporal dependencies, the input time-series window was flattened into a feature vector prior to training.
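The window-flattening step described above can be sketched as follows; the helper name and the choice of a full multivariate next-step target are illustrative assumptions, not the authors' code.

```python
import numpy as np

def make_bpnn_inputs(series, window=20):
    """Flatten sliding windows of a multivariate series of shape (T, n_feat)
    into feature vectors for a feedforward network.

    Sample i uses rows [i, i + window) flattened to length window * n_feat;
    its target is the (multivariate) value at step i + window.
    """
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window].ravel())
        y.append(series[i + window])
    return np.array(X), np.array(y)

data = np.arange(100, dtype=float).reshape(50, 2)  # 50 steps, 2 features
X, y = make_bpnn_inputs(data, window=20)
# X: 30 samples, each a flattened 20x2 window; y: the step after each window
```

In contrast, the GRU baseline consumes the unflattened (window, n_feat) sequences directly.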
The GRU baseline consisted of a single recurrent GRU layer followed by a fully connected output layer. The hidden-state dimension of the GRU was selected to ensure sufficient capacity for capturing temporal dependencies while maintaining computational efficiency. Unlike the CGA model, no convolutional feature extraction or attention mechanism was included, allowing the GRU baseline to focus solely on sequential modeling of the raw multivariate inputs.
For both baseline models, the same time-series window length ( W = 20 ), batch size (256), learning rate ( 1 × 10⁻³ ), optimizer (Adam), loss function (mean squared error), and number of training epochs (300) were used wherever applicable. This design ensures that performance differences primarily reflect model architecture and temporal modeling capability, rather than discrepancies in data handling or optimization strategy.
The prediction results of the GRU and BPNN models are compared to determine the most effective approach for accurate and reliable operational parameter prediction (Figure 11). Table 3 compares the average evaluation metrics of the different predictive methods: the CGA model, the CGA model with the HP filter, the GRU model, and the BPNN model. The CGA model integrates convolutional layers to capture spatial features, GRU layers for temporal sequence modeling, and an attention mechanism to focus on the most relevant parts of the data; the CGA with HP filter configuration applies the HP filter to the predictions of the same CGA model. Table 3 reveals that for F , the CGA with the HP filter achieves the highest R 2 value (0.883), the lowest MAE (282.972), and the lowest RMSE (430.754), indicating superior accuracy and fewer large errors compared to the other models. The plain CGA model follows closely in performance, while the GRU and BPNN models show progressively lower accuracy and higher errors. For predicting T , the CGA with the HP filter again leads with the highest R 2 value (0.923), the lowest MAE (73.323), and the lowest RMSE (92.015). The plain CGA model performs well, followed by the GRU and BPNN models, which exhibit significantly higher errors. Note that the CV for T prediction by the CGA with the HP filter (0.151) is slightly lower than that of the plain CGA model (0.164), indicating that filtering also reduces the relative dispersion of the torque predictions.

4.3. Rock Fragmentation Indices Analysis

To further interpret the prediction results from a geomechanical perspective, several rock fragmentation indices (RFIs), including the Field Penetration Index (FPI), Torque Penetration Index (TPI), Specific Energy (SE), and Work Ratio (WR), were analyzed across different excavation phases. These indices reflect the interaction between TBM operational parameters and the surrounding rock, thereby providing a physical interpretation of excavation stability and rock-breaking efficiency.
FPI = F / (N_c · P)
TPI = T / (0.3 · N_c · D_t · P)
SE = (F · P + 2π · T) / (π · (D_t / 2)² · P)
WR = c_0 · (T · n) / (F · v)^a
where N_c is the number of cutters, D_t is the diameter of the cutterhead, P is the penetration per revolution, n is the cutterhead rotation speed, and v is the advance rate; c_0 and a are constants determined according to the boring machine and geological conditions.
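As a sketch, the first three indices can be evaluated directly from the predicted operating parameters (WR is omitted because c_0 and a are machine- and site-specific). The function name, example values, and units are illustrative only and follow the formula forms given above.

```python
import math

def rock_fragmentation_indices(F, T, P, n_cutters, D_t):
    """FPI, TPI, and SE from thrust F, torque T, penetration per
    revolution P, the number of cutters, and cutterhead diameter D_t.
    Units must be consistent with the definitions in the text."""
    fpi = F / (n_cutters * P)
    tpi = T / (0.3 * n_cutters * D_t * P)
    se = (F * P + 2.0 * math.pi * T) / (math.pi * (D_t / 2.0) ** 2 * P)
    return fpi, tpi, se

# Hypothetical cycle-averaged values for illustration
fpi, tpi, se = rock_fragmentation_indices(
    F=5000.0, T=800.0, P=8.0, n_cutters=40, D_t=6.0)
```

In the post-analysis framework of this section, F and T would be the CGA-predicted values rather than measured ones.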
Figure 12a presents the response of the TBM RFIs of Test Sample A across different phases of the excavation process. During the preparing phase ( PP ), all indices exhibit significant fluctuations, reflecting the initial instability and adjustments in TBM operation. The indices quickly stabilize as the process transitions into the moderately stable phase ( SP 60 ), where they show reduced variability and the rock-breaking performance becomes more consistent. As the operation enters the stable phase ( SP 200 ), the indices maintain a steady state with minimal fluctuations, indicating optimal and stable TBM performance. Notably, the Field Penetration Index (FPI) shows a higher initial variance, which gradually decreases and stabilizes over time. The other indices, including the Torque Penetration Index (TPI), Specific Energy (SE), and Work Ratio (WR), follow a similar pattern of initial instability followed by stabilization.
From the comparison between Figure 12a and Figure 12b, the difference in effects between PP and SP 60 is not very distinct, whereas the distinction between SP 60 and SP 200 is more noticeable. During the SP 60 phase, there is a rapid increase in T , and the disparity between the actual and predicted values of T is significant. The RFIs, including FPI, TPI, and SE, continue to undergo substantial changes, while WR has already stabilized. In the SP 200 phase, the response curves of T and all RFIs appear more stable.
The analysis of results highlights the response of TBM parameters in different operational phases, demonstrating the adjustment of the machine as it transitions from preparation to a stable excavation state. The consistency in the indices during the stable phase underscores the effectiveness of the TBM operational adjustments and the reliability of the indices in monitoring the rock-breaking performance.
By integrating the prediction results in Figure 9 with the RFI responses in Figure 12, the excavation process of the TBM can be characterized more comprehensively. During the preparation phase, the CGA-predicted thrust and torque exhibit significant fluctuations relative to the actual values (Figure 9), and the RFIs, including FPI, TPI, SE, and WR, also display pronounced variability (Figure 12). This stage reflects the transient instability of the machine–rock interaction as the cutterhead gradually engages with the surrounding rock. As the process transitions into the moderately stable phase, the predicted values align more closely with the measured ones, and the RFIs begin to converge with reduced volatility, indicating improved excavation consistency and enhanced rock-breaking efficiency. In the stable phase, both the predicted parameters (Figure 9) and the RFIs (Figure 12) maintain steady trends with minimal fluctuations, highlighting efficient energy utilization, stable thrust–torque responses, and optimal machine performance. These findings demonstrate that the CGA model not only provides accurate parameter predictions but also, when combined with RFIs, yields valuable physical insights into excavation stability and rock fragmentation behavior across different operational phases.
In the current study, RFIs are intentionally used as a post hoc analytical tool to interpret the physical implications of the predicted TBM operational parameters, rather than as direct inputs or outputs of the prediction model. This design choice aims to decouple data-driven prediction accuracy from engineering interpretation, thereby enhancing model robustness and transparency. Although the RFIs analysis mainly focuses on trend interpretation, these indices provide physically meaningful indicators for understanding the interaction between TBM operational parameters and rock fragmentation behavior during different excavation phases.
Specifically, the CGA model is first employed to generate reliable forecasts of key TBM parameters (e.g., thrust and torque), after which RFIs are calculated based on these predicted parameters to analyze trends in rock fragmentation efficiency and excavation stability. This post-analysis framework allows RFIs to serve as an engineering interpretation layer, linking abstract model predictions to physically meaningful indicators of machine–ground interaction and excavation performance.
In future work, this framework can be naturally extended toward proactive decision support by embedding RFI thresholds or trend-change rules into an online monitoring system. For example, abrupt increases in predicted RFIs or sustained deviations from baseline ranges could be used to trigger early-warning signals or guide adaptive adjustment of TBM operational parameters. Therefore, while RFIs currently function as a post-analysis and validation mechanism in this case study, they provide a clear pathway toward real-time risk assessment and operational optimization in practical TBM applications.
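A minimal sketch of the threshold-based early-warning rule suggested above, assuming a rolling mean/standard-deviation baseline and a hypothetical k-sigma trigger; the window length, threshold, and function name are illustrative, not part of the current framework.

```python
import statistics

def rfi_alerts(rfi_series, baseline_window=30, k=3.0):
    """Flag positions where an RFI deviates more than k standard
    deviations from a rolling baseline built from the previous
    `baseline_window` values. Returns the list of flagged indices."""
    alerts = []
    for i in range(baseline_window, len(rfi_series)):
        hist = rfi_series[i - baseline_window:i]
        mu = statistics.fmean(hist)
        sd = statistics.pstdev(hist)
        if sd > 0.0 and abs(rfi_series[i] - mu) > k * sd:
            alerts.append(i)
    return alerts
```

In an online setting, a flagged index would trigger an early-warning signal or prompt adaptive adjustment of the TBM operational parameters.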

5. Conclusions

The study builds a novel time-series prediction method for TBM operation parameters, utilizing a hybrid CGA neural network model. The model is designed by combining a CNN for feature extraction, a GRU for temporal sequence modeling, and an attention mechanism to emphasize the most relevant time steps. The CGA model was trained on a substantial dataset of 5987 TBM excavation cycles and was evaluated against other models for predicting key operational parameters such as total thrust and cutterhead torque. The main conclusions are as follows:
  • The CGA model demonstrated superior predictive performance for TBM parameters such as total thrust and cutterhead torque. The HP filter reduced noise in the time series data, improving stability and error metrics (average R2 of 0.883 for total thrust and 0.923 for cutterhead torque) and outperforming traditional models such as GRU and BPNN.
  • The integration of convolutional feature extraction and GRU-based temporal modeling enables the CGA model to effectively capture both local temporal patterns and long-term dependencies in multivariate TBM operational data. The attention mechanism further enhances the model by adaptively focusing on the most informative time steps within the sequence.
  • By combining the predicted operational parameters with rock fragmentation indices (RFIs), the study provides a physically interpretable framework for analyzing TBM excavation behavior across different operational phases. The evolution of RFIs during preparation, transition, and stable excavation stages reflects changes in rock–machine interaction and excavation stability.
Despite the promising prediction performance achieved in this study, several limitations should be noted. The dataset is derived from a single TBM project, and further validation using data from different projects or machines would help evaluate the transferability of the proposed framework. It should be noted that the current framework focuses on deterministic point prediction of TBM operational parameters. Future work will explore uncertainty-aware prediction methods, such as probabilistic forecasting or prediction interval estimation, to further improve the interpretability and reliability of model outputs in practical TBM applications.

Author Contributions

C.Y.: Writing—original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. X.K.: Writing—review and editing, Resources, Methodology, Funding acquisition, Conceptualization. L.T.: Visualization, Methodology, Investigation, Data curation. X.L.: Supervision, Resources, Project administration, Funding acquisition. W.T.: Visualization, Software, Methodology, Investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Heilongjiang Provincial Science and Technology Innovation Base Award Project (Grant No. JD25B010); Open Research Fund Program of State Key Laboratory of Mechanical Behavior and System Safety of Traffic Engineering Structures (Grant No. KF2025-05); Key Lab of Structures Dynamic Behavior and Control of the Ministry of Education, Harbin Institute of Technology (Grant No. HITCE202408); Chongqing Urban Management Research Project (Urban Management Science Document No. 28, 2023); Natural Science Foundation of Heilongjiang Province of China (Grant No. LH2024D014); China Postdoctoral Science Foundation Project (Grant No. 2024M754193); the Chongqing Construction Science and Technology Project (Grant No. 2023-5-6); the Research and Development Project of the Ministry of Housing and Urban-Rural Development (Grant No. 2022-K-040).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rostami, J. Hard Rock TBM Cutterhead Modeling for Design and Performance Prediction. Geomech. Tunn. 2008, 1, 18–28. [Google Scholar] [CrossRef]
  2. Wang, L.; Sun, W.; Long, Y.; Yang, X. Reliability-Based Performance Optimization of Tunnel Boring Machine Considering Geological Uncertainties. IEEE Access 2018, 6, 19086–19098. [Google Scholar] [CrossRef]
  3. Zhang, Q.; Huang, T.; Huang, G.; Cai, Z.; Kang, Y. Theoretical Model for Loads Prediction on Shield Tunneling Machine with Consideration of Soil-Rock Interbedded Ground. Sci. China Technol. Sci. 2013, 56, 2259–2267. [Google Scholar] [CrossRef]
  4. González, C.; Arroyo, M.; Gens, A. Thrust and Torque Components on Mixed-Face EPB Drives. Tunn. Undergr. Space Technol. 2016, 57, 47–54. [Google Scholar] [CrossRef]
  5. Qi, W.; Wang, L.; Zhou, S.; Kang, Y.; Zhang, Q. Total Loads Modeling and Geological Adaptability Analysis for Mixed Soil-Rock Tunnel Boring Machines. Undergr. Space 2022, 7, 337–351. [Google Scholar] [CrossRef]
  6. Kong, X.; Ling, X.; Tang, L.; Tang, W.; Zhang, Y. Random Forest-Based Predictors for Driving Forces of Earth Pressure Balance (EPB) Shield Tunnel Boring Machine (TBM). Tunn. Undergr. Space Technol. 2022, 122, 104373. [Google Scholar] [CrossRef]
  7. Maynar, M.; Rodríguez, L. Discrete Numerical Model for Analysis of Earth Pressure Balance Tunnel Excavation. J. Geotech. Geoenviron. Eng. 2005, 131, 1234–1242. [Google Scholar] [CrossRef]
  8. Qu, T.; Wang, S.; Fu, J.; Hu, Q.; Zhang, X. Numerical Examination of EPB Shield Tunneling–Induced Responses at Various Discharge Ratios. J. Perform. Constr. Facil. 2019, 33, 04019035. [Google Scholar] [CrossRef]
  9. Choi, S.; Lee, H.; Choi, H.; Chang, S.-H.; Kang, T.-H.; Lee, C. Numerical Analysis of EPB TBM Driving Using Coupled DEM-FDM Part II: Parametric Study. Tunn. Undergr. Space 2020, 30, 496–507. [Google Scholar]
  10. Choi, S.; Lee, H.; Choi, H.; Chang, S.-H.; Kang, T.-H.; Lee, C. Numerical Analysis of EPB TBM Driving Using Coupled DEM-FDM Part I: Modeling. Tunn. Undergr. Space 2020, 30, 484–495. [Google Scholar]
  11. Huang, H.; Chang, J.; Zhang, D.; Zhang, J.; Wu, H.; Li, G. Machine Learning-Based Automatic Control of Tunneling Posture of Shield Machine. J. Rock Mech. Geotech. Eng. 2022, 14, 1153–1164. [Google Scholar] [CrossRef]
  12. Gao, X.; Shi, M.; Song, X.; Zhang, C.; Zhang, H. Recurrent Neural Networks for Real-Time Prediction of TBM Operating Parameters. Autom. Constr. 2019, 15, 130–140. [Google Scholar] [CrossRef]
  13. Zhang, C.; Zhu, M.; Lang, Z.; Chen, R.; Cheng, H. Predictive Method of Soil Chamber Pressure Field for Shield Machines Based on Deep Learning. J. Geotech. Eng. (Chin.) 2024, 46, 307–315. [Google Scholar]
  14. Fu, K.; Xue, Y.; Qiu, D.; Shao, T.; Lan, G. Interval Prediction of TBM Tunneling Performance under Uncertainty Using the Successive Variational Mode Decomposition (SVMD)–Informer Model. Autom. Constr. 2026, 181, 106656. [Google Scholar] [CrossRef]
  15. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
  16. Lai, G.; Chang, W.-C.; Yang, Y.; Liu, H. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks. In Proceedings of the SIGIR ’18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018. [Google Scholar]
  17. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  18. Chen, Z.; Fan, L.; Zhang, Y.; Xiao, H.; Wang, L. Knowledge-Based and Data-Based Machine Learning in Intelligent TBM Construction. Tumu Gongcheng Xuebao/China Civ. Eng. J. 2024, 57, 1–12. [Google Scholar] [CrossRef]
  19. Yao, C.; Kong, X.; Tang, L.; Ling, X. An Unsupervised Deep Learning Surrounding Rock Perception Method for TBM Operational Parameter Multiobjective Optimization. Results Eng. 2025, 27, 106925. [Google Scholar] [CrossRef]
  20. Potdar, K.; Pardawala, T.; Pai, C. A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers. Int. J. Comput. Appl. 2017, 175, 7–9. [Google Scholar] [CrossRef]
  21. Marcot, B.; Hanea, A. What Is an Optimal Value of k in K-Fold Cross-Validation in Discrete Bayesian Network Analysis? Comput. Stat. 2021, 36, 2009–2031. [Google Scholar] [CrossRef]
  22. Xu, C.; Liu, X.; Wang, E.; Wang, S. Prediction of Tunnel Boring Machine Operating Parameters Using Various Machine Learning Algorithms. Tunn. Undergr. Space Technol. 2021, 109, 103699. [Google Scholar] [CrossRef]
  23. Cogley, T.; Nason, J. Effects of the Hodrick-Prescott Filter on Trend and Difference Stationary Time Series Implications for Business Cycle Research. J. Econ. Dyn. Control 1992, 19, 253–278. [Google Scholar] [CrossRef]
Figure 1. The process diagram of the CGA approach for TBM parameter prediction.
Figure 2. The process diagram for TBM parameter prediction.
Figure 3. The overall architecture of the CGA model.
Figure 4. Unit structure of GRU.
Figure 5. The original TBM data segmentation method [19]: (a) raw excavation records of principal parameters and (b) a representative tunneling cycle divided into distinct phases.
Figure 6. CGA model training process.
Figure 7. Process diagram of CGA model training with K-fold cross-validation.
Figure 8. The evaluation metrics from different K-fold cross-validation tests.
Figure 9. Prediction performance in different operational phases: (a) prediction of F and (b) prediction of T.
Figure 10. Error analysis based on the prediction results of T: (a) comparison between predicted and actual values, with and without filtering; (b) residual histogram.
Figure 11. Prediction performance of T using traditional models: (a) GRU and (b) BPNN.
Figure 12. The response of RFIs in different operational phases: (a) Sample A and (b) Sample B.
Table 1. Hyper-parameter settings for model training.

| Hyper-Parameter | Symbol | Value | R2 | MAE | RMSE | CV |
|---|---|---|---|---|---|---|
| Learning rate | α | 5 × 10⁻⁴ | 0.864 | 0.183 | 0.261 | 0.096 |
| | | 10⁻³ | 0.883 | 0.165 | 0.239 | 0.082 |
| | | 2 × 10⁻³ | 0.876 | 0.172 | 0.248 | 0.087 |
| Batch size | B | 64 | 0.871 | 0.176 | 0.251 | 0.091 |
| | | 128 | 0.879 | 0.169 | 0.243 | 0.085 |
| | | 256 | 0.883 | 0.165 | 0.239 | 0.082 |
| Number of iterations | e | 300 | 0.883 | 0.165 | 0.239 | 0.082 |
| | | 600 | 0.886 | 0.162 | 0.236 | 0.079 |
| | | 1000 | 0.887 | 0.161 | 0.235 | 0.078 |
| Time series window | W | 10 | 0.857 | 0.192 | 0.274 | 0.101 |
| | | 20 | 0.883 | 0.165 | 0.239 | 0.082 |
| | | 30 | 0.878 | 0.170 | 0.245 | 0.086 |
| Convolution kernel size | K | 2 | 0.876 | 0.172 | 0.248 | 0.087 |
| | | 3 | 0.883 | 0.165 | 0.239 | 0.082 |
| | | 4 | 0.879 | 0.169 | 0.244 | 0.084 |
| Number of convolution kernels | Nk | 64 | 0.868 | 0.181 | 0.258 | 0.094 |
| | | 128 | 0.883 | 0.165 | 0.239 | 0.082 |
| | | 256 | 0.885 | 0.163 | 0.237 | 0.080 |
| Hidden state dimensions | C | 64 | 0.872 | 0.177 | 0.252 | 0.090 |
| | | 128 | 0.883 | 0.165 | 0.239 | 0.082 |
| | | 256 | 0.886 | 0.162 | 0.236 | 0.079 |
Table 2. The sensitivity analysis of the smoothing parameter λ.

| λ | R2 | MAE | RMSE |
|---|---|---|---|
| 100 | 0.919 | 239.211 | 374.075 |
| 300 | 0.920 | 238.830 | 373.016 |
| 500 | 0.920 | 238.600 | 372.509 |
| 1000 | 0.921 | 238.224 | 371.788 |
| 2000 | 0.922 | 237.822 | 371.073 |
| 5000 | 0.923 | 237.322 | 370.219 |
| 10,000 | 0.924 | 236.984 | 369.684 |
Table 3. Comparison of evaluation metrics of different methods.

| Model | R2 (F) | MAE (F, kN) | RMSE (F, kN) | CV (F) | R2 (T) | MAE (T, kN·m) | RMSE (T, kN·m) | CV (T) |
|---|---|---|---|---|---|---|---|---|
| CGA with filter | 0.883 | 282.972 | 430.754 | 0.080 | 0.923 | 73.323 | 92.015 | 0.151 |
| CGA | 0.876 | 296.486 | 454.301 | 0.084 | 0.911 | 77.633 | 98.937 | 0.164 |
| GRU | 0.825 | 395.877 | 566.453 | 0.090 | 0.834 | 95.211 | 118.982 | 0.187 |
| BPNN | 0.750 | 408.170 | 586.875 | 0.093 | 0.759 | 111.592 | 138.177 | 0.214 |

Share and Cite

MDPI and ACS Style

Yao, C.; Kong, X.; Tang, L.; Ling, X.; Tang, W. A Multivariate Time Series Prediction Model for TBM Excavation Parameters Using a Convolution–GRU–Attention Neural Network. Appl. Sci. 2026, 16, 2964. https://doi.org/10.3390/app16062964
