Prediction of Shield Machine Attitude Based on Various Artificial Intelligence Technologies

Abstract: The shield machine attitude (SMA) is the most important parameter in the process of tunnel construction. To prevent the shield machine from deviating from the design tunnel axis (DTA), it is of great significance to accurately predict the dynamic characteristics of the SMA. We establish eight SMA prediction models based on the data of five earth pressure balance (EPB) shield machines. The algorithms adopted in the models are four machine learning (ML) algorithms (KNN, SVR, RF, AdaBoost) and four deep learning (DL) algorithms (BPNN, CNN, LSTM, GRU). This paper obtains the hyperparameters of the models by utilizing grid search and K-fold cross-validation techniques and uses the explained variance score (EVS) and root mean squared error (RMSE) to verify and evaluate the prediction performance of the models. The prediction results reveal that the two best algorithms are LSTM and GRU, with EVS > 0.98 and RMSE < 1.5. Then, integrating the ML and DL algorithms, we design a warning predictor for the SMA. Using the data of the previous five cycles, the predictor can give a warning in advance if the SMA is about to deviate significantly from the DTA. This study indicates that AI technologies have considerable promise in the field of SMA dynamic prediction.


Introduction
Shield tunneling construction methods are widely used in subway, transportation, water conservancy, and other tunnel projects for their high efficiency, safety, and construction convenience [1]. The shield tunnel construction process is a large, complex, and dynamic system, and the direction of the shield machine is hard to control. Once the shield machine deviates from the DTA, it will cause dislocation of and damage to the segments, which will further affect the quality of the tunnel [2]. At present, the control of the shield machine attitude (SMA) is still at the manual control stage, and the control process can only be implemented after the SMA has deviated, which makes the control untimely and inaccurate [3,4]. Consequently, to guarantee the accuracy of the tunneling direction, the major task is to develop intelligent warning technology to assist machine operators in correcting deviations in advance.
Some researchers have used theoretical and experimental methods to study the prediction and control mechanism of the SMA. Wang et al. [5] proposed an effective pose control system based on a target motion determination algorithm and displacement control of thrust cylinders, then tested it on a test rig of the thrust system. Other relevant works include those by Sugimoto and Sramoon [6], Liu et al. [7], and Li et al. [8]. These methods address the mechanism but lack timely guidance for machine operators to adjust the SMA; in addition, they rarely consider factors such as shield data, resulting in low attitude control accuracy.
During the last five years, with the abundance of data and the development of AI technologies, many algorithms with good generalization performance have been widely used in shield performance prediction [9][10][11][12][13] and geological information prediction [14][15][16], being able to provide timely guidance for machine operators. Inspired by these methods, some scholars have also tried to apply AI technologies to real-time warning and control of the SMA. Based on shield data, Zhang et al. [17] established a hybrid trajectory deviation prediction model combining principal component analysis (PCA) and the gated recurrent unit (GRU). For SMA prediction, similar efforts include the extreme gradient boosting (XGBoost) model established by Wang et al. [18] and a hybrid deep learning model proposed by Zhou et al. [19]. These methods made significant attempts at developing SMA prediction models, but as far as we know, a universal approach is still lacking. Furthermore, the substantial volume of shield machine data makes the data preprocessing work more demanding. Therefore, there is an urgent need to comprehensively evaluate various AI algorithms for predicting the SMA and to develop professional data preprocessing algorithms. A literature review reveals that the comprehensive comparison of multiple AI technologies has attracted the attention of researchers in various fields, such as additive manufacturing [20], medical science [21,22], and civil engineering [13,23]. However, there are no reports that use various algorithms to predict the SMA and comprehensively evaluate their predictive effectiveness.
In summary, based on the existing research work, this paper aims to establish the prediction models of SMA by using various ML and DL algorithms. The datasets are collected from five earth pressure balance (EPB) shield machines in Chengdu Metro Line 19, China. Concurrently, based on the characteristics of EPB data, we develop a data preprocessing algorithm, including three phases of data segmentation, filtering, and standardization. Then, we use EVS and RMSE indicators to evaluate the prediction effect of SMA by different algorithms and specify the optimal model. Finally, we establish a warning predictor for SMA based on the prediction results.
The remainder of this paper is organized as follows. Section 2 introduces eight basic AI algorithms and SMA prediction framework. Section 3 introduces the data source, data preprocessing, modeling process, and prediction results in detail and gives an early warning predictor for SMA. Section 4 discusses the applicability of ML algorithms and DL algorithms in dealing with the prediction of SMA. Finally, Section 5 provides the concluding remarks.

Methods
AI is a science like biology or mathematics. It studies ways to build intelligent programs that can creatively solve problems in a manner that imitates human intelligence, and ML and DL are both subsets of AI [23]. This study uses eight existing basic AI algorithms to establish the prediction model of the SMA. The following is an introduction to these algorithms.

ML Algorithms
Based on the Scikit-learn ML package [24], four supervised learning algorithms are adopted in this paper: the conventional algorithms K-nearest neighbors (KNN) and Support Vector Regression (SVR), and the ensemble algorithms Random Forest (RF) and Adaptive Boosting (AdaBoost). The schematic of each algorithm is shown in Figure 1.

KNN
The structure of the KNN algorithm is shown in Figure 1a. The KNN algorithm is a nonparametric and lazy algorithm [25], which does not need to make assumptions about the data and has no explicit training process. Based on a chosen distance metric, KNN finds the K samples in the training set closest to a query sample and then makes predictions based on the information of these K samples. The whole prediction process includes three steps: first, calculate the distances to determine which samples are neighbors; then, select the K value; finally, make the decision. The KNN algorithm is a prediction technique that is easy to understand and implement and performs well in many cases [26,27].
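The three steps above can be sketched in a few lines of pure Python. This is a didactic illustration of the KNN regression idea, not the Scikit-learn implementation used in the paper; the `knn_predict` helper and its toy data are assumptions for illustration.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Minimal KNN regression: average the targets of the k nearest samples."""
    # Step 1: compute the Euclidean distance to every training sample.
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 2: select the k nearest neighbors.
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # Step 3: make the decision by averaging the neighbors' targets.
    return sum(y for _, y in nearest) / k

X = [(0.0,), (1.0,), (2.0,), (10.0,)]
y = [0.0, 1.0, 2.0, 10.0]
print(knn_predict(X, y, (1.2,), k=3))  # -> 1.0 (neighbors are x = 0, 1, 2)
```

In practice, the choice of K is the key hyperparameter; the grid search described later in this paper tunes exactly such values.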


SVR
Support Vector Machine (SVM) is a type of generalized linear classifier that performs binary classification of data [28]. SVR is the extension of SVM from classification problems to regression problems. The SVR structure is shown in Figure 1b. It maps the training samples to a high-dimensional feature space through a nonlinear mapping relationship selected in advance and then trains and solves on the samples, thereby obtaining the best-fitting function model [29]. Ultimately, solving nonlinear problems with the SVR algorithm is a process of solving for the optimal hyperplane. The SVR estimation function and the corresponding optimization problem are as follows [30]:

  f(x) = ω^T ϕ(x) + b

  min (1/2)‖ω‖² + C Σ_{i=1}^{B} (ξ_i + ξ̂_i)
  s.t. y_i − ω^T ϕ(x_i) − b ≤ ε + ξ_i,
       ω^T ϕ(x_i) + b − y_i ≤ ε + ξ̂_i,
       ξ_i ≥ 0, ξ̂_i ≥ 0

where x_i is the factor that affects the output target y_i; ϕ(x) is the nonlinear function that maps a sample to the high-dimensional space; ω^T is the coefficient of the independent variable function; b is the offset value; ‖ω‖ measures the model complexity; C is the penalty factor; ξ_i and ξ̂_i are the relaxation factors; B is the total number of samples; ε is the insensitive factor. By introducing the Lagrange multiplier method and combining it with duality theory, the SVR estimation function can be obtained. See Joachims [28], Yan et al. [30], and Yang et al. [31] for a more detailed derivation and calculation process.

RF
As a representative of the bagging algorithm, RF is a meta-estimator. Many theoretical and case studies have shown that RF has a high prediction accuracy and a good tolerance for abnormal data [32][33][34]. Through bootstrap resampling, multiple decision tree models are fitted to various subsamples of the dataset to improve the prediction accuracy. As shown in Figure 1c, bootstrap resampling is used to extract n subsets from the original dataset as the training data of n decision tree models. Finally, the mean of the predictions of the n decision tree models is used as the final prediction. The output of the RF can be expressed as [35]:

  ŷ(x) = (1/n) Σ_{i=1}^{n} y_i(x)

where y_i(x) is the individual prediction of a decision tree for an input x, and n is the total number of decision trees. By averaging the decision trees' predictions, the effect of an abnormal tree can be offset. RF achieves a reduced variance by combining diverse trees, sometimes at the cost of a slight increase in bias [35]. Therefore, it is necessary to select reasonable model parameters to achieve a balance between bias and variance.
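The two RF ingredients described above, bootstrap resampling and averaging of tree outputs, can be sketched as follows. The lambdas below are hypothetical stand-ins for fitted decision trees (training real trees is beyond this sketch), so only the resampling and averaging steps are shown.

```python
import random

def bootstrap_sample(X, y, rng):
    """Draw one bootstrap resample of (X, y), sampling with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def forest_predict(trees, x):
    """Average the individual tree predictions, as in the RF output equation."""
    return sum(tree(x) for tree in trees) / len(trees)

# Toy stand-ins for fitted decision trees, so the averaging step is visible.
trees = [lambda x: 2 * x, lambda x: 2 * x + 1, lambda x: 2 * x - 1]
print(forest_predict(trees, 3.0))  # (6 + 7 + 5) / 3 = 6.0

rng = random.Random(0)
Xb, yb = bootstrap_sample([1, 2, 3, 4], [10, 20, 30, 40], rng)
print(len(Xb), len(yb))  # 4 4: each resample is the size of the original set
```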

AdaBoost
AdaBoost is a kind of boosting algorithm introduced by Freund and Schapire [36] to improve the performance of ML algorithms. Its structure diagram is shown in Figure 1d. The core idea of AdaBoost is to train different weak classifiers on the same training set and then combine these weak classifiers to form a stronger final classifier. Unlike the RF model, which selects some variables as split variables, AdaBoost increases the selection probability of samples that are easy to misclassify and then weights the output of each decision tree to generate the final model prediction [37]. It has great potential in addressing nonlinear, complicated regression problems [7,38]. For the specific implementation steps of the algorithm, please refer to Drucker [39].

DL Algorithms
Compared with ML algorithms, DL algorithms can discover complex structures in large datasets. Thus, we used four basic supervised DL algorithms to train the data sets of SMA, including backpropagation neural network (BPNN), convolutional neural networks (CNN), long short-term memory networks (LSTM), and GRU. Their schematic is shown in Figure 2.

BPNN
BPNN is a basic neural network whose outputs propagate forward and whose errors propagate backward [14,40]. Figure 2a shows the basic structure of BPNN, which consists of an input layer, a hidden layer, and an output layer. The input data flow from the input layer to the output layer, and the error is propagated backward to find a set of weights and biases that make the network produce the same output value as the actual output value. The weights and biases are affected by the number of hidden layers and neurons. The outputs of the hidden and output layers in the BPNN can be presented as Equations (5) and (6) [41]:

  H = f(W1·X + b1)  (5)
  O = g(W2·H + b2)  (6)

where X is the input matrix; H and O are the outputs of the hidden and output layers, respectively; W1 and W2 are the weight matrices for the connections between the input and hidden layers and between the hidden and output layers, respectively; b1 and b2 are the bias vectors added in the hidden and output layers, respectively; f and g are the activation functions.
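Equations (5) and (6) amount to two matrix-vector products with activations. A minimal forward pass is sketched below; the sigmoid for f and identity for g are common choices assumed for illustration (the paper's actual activations may differ), and `forward` is a hypothetical helper.

```python
import math

def forward(x_row, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer BPNN:
    H = f(W1·X + b1), O = g(W2·H + b2), with f = sigmoid and g = identity."""
    f = lambda z: 1.0 / (1.0 + math.exp(-z))
    H = [f(sum(w * x for w, x in zip(row, x_row)) + b)   # Equation (5)
         for row, b in zip(W1, b1)]
    O = [sum(w * h for w, h in zip(row, H)) + b          # Equation (6)
         for row, b in zip(W2, b2)]
    return O

# Two inputs, one hidden neuron, one output; zero weights give H = [0.5].
print(forward([0.0, 0.0], W1=[[0.0, 0.0]], b1=[0.0], W2=[[2.0]], b2=[0.0]))  # [1.0]
```

Training then adjusts W1, W2, b1, b2 by propagating the output error backward, which is the "BP" in BPNN.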


CNN
CNN is a typical kind of feedforward neural network that contains convolution calculations and can learn implicit features automatically with low network complexity. It was first proposed by Fukushima and further developed by LeCun [42]. As shown in Figure 2b, a CNN is composed of input layers, convolution layers, pooling layers, and fully connected layers. The main function of the convolution layers is to extract implicit features, and the pooling layers are utilized for feature dimensionality reduction. In view of its powerful data feature extraction capability, CNN has developed rapidly and has been successfully applied in many fields, such as face recognition [43], medical diagnosis [44], and image understanding [45].

LSTM
LSTM is a variation of the RNN, which has the advantages of memorability, shared parameters, and Turing completeness [46]. It overcomes the vanishing and exploding gradient problems of the RNN and is suitable for processing time sequences. The structure of the LSTM model is shown in Figure 2c. Based on the RNN, LSTM adds mechanisms such as the memory cell, input gate, forget gate, and output gate to control the transmission of information at different times, thus greatly improving the ability of the model to learn long-term dependencies. The input gate determines how much new information will be recorded, the forget gate decides how much information will be forgotten, and the output gate determines the output using the information and cell state from the previous two gates. The working mechanism of each gate and the information flow can be expressed using the following equations [10,47]:

  f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
  i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
  c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
  c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
  o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
  h_t = o_t ⊙ tanh(c_t)

where x_t is the input at time t; h_{t−1} is the output at time t − 1, which represents the hidden state of the LSTM; c_{t−1} is the vector value of the memory cell at t − 1; f_t, i_t, and o_t are the forget, input, and output gates; W and b are the corresponding weight matrices and bias vectors; ⊙ denotes the element-wise product; tanh is the hyperbolic tangent function, which maps real numbers to (−1, 1); σ is the sigmoid activation function, which maps real numbers to (0, 1).
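The gate mechanism can be made concrete with a single LSTM step for scalar input and states. This is a didactic sketch, not the trained network used in the paper; the dictionary `w` of per-gate (input weight, recurrent weight, bias) triples is a hypothetical layout chosen for clarity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w):
    """One LSTM step for scalar input and states. `w` maps each gate name
    ('f', 'i', 'g', 'o') to (input weight, recurrent weight, bias)."""
    f = sigmoid(w["f"][0] * x_t + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x_t + w["i"][1] * h_prev + w["i"][2])    # input gate
    g = math.tanh(w["g"][0] * x_t + w["g"][1] * h_prev + w["g"][2])  # candidate cell
    o = sigmoid(w["o"][0] * x_t + w["o"][1] * h_prev + w["o"][2])    # output gate
    c_t = f * c_prev + i * g        # new memory cell state
    h_t = o * math.tanh(c_t)        # new hidden state (the LSTM output)
    return h_t, c_t

w0 = {name: (0.0, 0.0, 0.0) for name in "figo"}
print(lstm_step(1.0, 0.0, 0.0, w0))  # (0.0, 0.0): zero weights carry no signal
```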

GRU
The GRU algorithm was proposed in 2014, and it has shown excellent performance in time sequence prediction [48]. The GRU unit structure is shown in Figure 2d. Like the LSTM algorithm, GRU can solve the long-term memory and back-propagation gradient problems of the RNN algorithm [49,50]. Moreover, it has no storage unit and has higher operation and prediction efficiency. The GRU unit contains only two gates, the reset gate and the update gate. The reset gate determines how to combine the new input information with the previous memory, and the update gate defines the amount of the previous memory saved to the current time step.

Evaluation Criteria
To evaluate the accuracy of the eight algorithms, we adopted the explained variance score (EVS) and the root mean squared error (RMSE) as the performance metrics of the models.
For the measured values Y = {y_1, y_2, . . . , y_n} and the predicted values Ŷ = {ŷ_1, ŷ_2, . . . , ŷ_n} of the SMA parameters, these statistical metrics were calculated as follows [51]:

  EVS = 1 − Var(Y − Ŷ) / Var(Y)
  RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )

where Var is the sample variance; i is the sample number, ranging from 1 to n. EVS is the explained variance score of the regression model, and the best possible score is 1.0. The closer EVS is to 1, the better the predicted values explain the variance of the measured values. RMSE represents the standard error between the predicted values and the measured values: the smaller the RMSE, the better the predictability of the model. Therefore, we expect prediction algorithms with a higher EVS and a lower RMSE.
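Both metrics follow directly from their definitions; a minimal pure-Python sketch (the helper names `evs` and `rmse` are ours, not from the paper):

```python
import math

def evs(y_true, y_pred):
    """Explained variance score: 1 - Var(y - y_hat) / Var(y)."""
    def var(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / len(v)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - var(resid) / var(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y = [1.0, 2.0, 3.0, 4.0]
print(evs(y, y))                       # perfect prediction -> 1.0
print(rmse(y, [1.0, 2.0, 3.0, 6.0]))  # sqrt(4 / 4) = 1.0
```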

Intelligent Framework of SMA Prediction
The problem of dynamic real-time prediction of the SMA parameters can be transformed into a short-time-sequence prediction task. The tunneling parameters of several past periods (also called the time window) are used retrospectively to predict the SMA parameters of the next period. As the EPB advances, the window rolls forward continuously over the data sequence, achieving dynamic real-time prediction. On this basis, we propose a prediction framework for the SMA parameters using four ML algorithms and four DL algorithms. As shown in Figure 3, the framework includes three main steps: data collection and preprocessing, model establishment, and model comparison and evaluation.
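The rolling time-window framing can be sketched as follows. This is a simplified single-parameter version; the `make_windows` helper is illustrative, not from the paper.

```python
def make_windows(series, window=5):
    """Build (input, target) pairs in the 'five predict one' rolling mode:
    the values of the past `window` cycles predict the next cycle's value."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # past `window` cycles
        y.append(series[i + window])     # next cycle to predict
    return X, y

cycles = [1, 2, 3, 4, 5, 6, 7]
X, y = make_windows(cycles, window=5)
print(X)  # [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]
print(y)  # [6, 7]
```

In the actual models, each window element is a vector of tunneling parameters rather than a single number, but the rolling construction is the same.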

Figure 3. Overall framework of shield machine attitude (SMA) prediction using artificial intelligence (AI) algorithms.

• Step 1: Dataset collection and preprocessing. The raw data contain many anomalies, noise, and errors, and it is inefficient and unscientific to feed hundreds of millions of unprocessed records directly into the AI algorithms. Therefore, in view of the characteristics of EPB data, we develop a data preprocessing algorithm based on the Python language, which mainly includes three stages: segmentation, noise reduction, and standardization. Then, the data are divided into a training set and a testing set. The training set is further subdivided into a training subset and a validation subset to serve the modeling step.

• Step 2: Model establishment. We use eight AI algorithms (KNN, SVR, RF, AdaBoost, BPNN, CNN, LSTM, and GRU) to establish the SMA model of the EPB. The input and output of the model adopt the "five predict one" mode; that is, the input is the parameters of the past five drilling cycles, and the target is the SMA value of the next drilling cycle. Then, grid search and K-fold cross-validation are used to find the hyperparameters that optimize the generalization performance of each model. Finally, we complete the modeling process.
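The hyperparameter search in Step 2 combines two ideas: enumerate every combination in a grid, and score each combination by K-fold cross-validation. A minimal sketch is given below; `score_fn` is a hypothetical stand-in for training a model and scoring one validation fold (in practice this would be, e.g., Scikit-learn's `GridSearchCV`).

```python
from itertools import product

def kfold_indices(n, k):
    """Split sample indices 0..n-1 into k equal contiguous validation folds."""
    size = n // k
    return [list(range(i * size, (i + 1) * size)) for i in range(k)]

def grid_search(param_grid, score_fn, n_samples, k=5):
    """Score every hyperparameter combination by K-fold cross-validation
    and return the best-scoring combination (higher score is better)."""
    best, best_score = None, float("-inf")
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        scores = [score_fn(params, fold) for fold in kfold_indices(n_samples, k)]
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best, best_score = params, mean_score
    return best

# Hypothetical scorer that peaks at n_neighbors = 3.
best = grid_search({"n_neighbors": [1, 3, 5]},
                   lambda p, fold: -abs(p["n_neighbors"] - 3),
                   n_samples=20, k=5)
print(best)  # {'n_neighbors': 3}
```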

• Step 3: Model comparison and evaluation. At this step, the performance of the eight AI algorithms is compared through the EVS and RMSE indicators. Based on the model prediction results, we propose a warning predictor for the SMA. When the predictor perceives that the SMA is about to deviate significantly from the DTA, it can provide an early warning for the machine operators to adjust the tunneling parameters.
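The warning logic in Step 3 reduces to comparing the predicted deviation against a tolerance. The sketch below uses the two outcomes from Figure 3; the 50 mm threshold is an illustrative assumption, not the project's actual control criterion.

```python
def sma_warning(predicted_deviation_mm, threshold_mm=50.0):
    """Raise an early warning when the predicted SMA deviation from the DTA
    exceeds the tolerance; otherwise tunneling is considered normal."""
    if abs(predicted_deviation_mm) > threshold_mm:
        return "deviation alarm"
    return "normal tunneling"

print(sma_warning(12.0))   # normal tunneling
print(sma_warning(-80.0))  # deviation alarm
```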

Case Study

Project Overview
The data in this study are from Chengdu Metro Line 19 in China, with a total length of 43.186 km and a tunnel excavation diameter of 8.64 m. The location of the project is shown in Figure 4. All lines are underground, and there are 12 underground stations with an average station spacing of 3.58 km. Line 19 is still under construction, and we collected a large amount of data from five EPB shield machines in the Xin-hong, Hong-tian, and Tian-hua sections. The total length of these sections is about 4 km. The surface layer is miscellaneous fill and plain fill, and the strata crossed are mainly medium weathered sandstone and medium weathered mudstone. The geological proportions along the line are shown in Figure 4b. The overall geological condition is simple and suitable for excavation by an EPB shield machine.

All five EPB shield machines are made by the same manufacturer, China Railway Engineering Equipment Group Co., Ltd. (CREC), and their main technical parameters are summarized in Table 1. The EPB working parameters, such as advance rate, rotational speed, cutterhead torque, and chamber pressure, are automatically collected at an acquisition frequency of 1 Hz, and the total data volume exceeds 180 GB. To ensure that the EPB can excavate in accordance with the DTA, a laser navigation system is used to measure the SMA in real time. As represented in Figure 5, the system is composed of a total station, laser targets, rear-view prisms, industrial computers, and other components. The total station measures the spatial coordinates of the laser target with its emitted laser at regular intervals. It then measures the position of the shield head and tail center relative to the laser target according to the zero position, thereby calculating the geodetic coordinates of the EPB. Finally, the industrial computer calculates the SMA parameters relative to the DTA and displays the results in the cockpit.
In the tunneling process, the deviation of the shield head and shield tail from the DTA directly determines the future trajectory of the EPB machine and further affects the quality of the segment assembly [17]. Therefore, the horizontal deviation of the shield head (HDSH), vertical deviation of the shield head (VDSH), horizontal deviation of the shield tail (HDST), and vertical deviation of the shield tail (VDST) are used as the SMA prediction parameters of the EPB in Chengdu Metro Line 19. Figure 6 shows the distribution of the SMA parameters. By continuously monitoring these four parameters, the machine operators achieve manual control of the shield during the excavation process. However, owing to the inevitable hysteresis of manual control, it is particularly necessary to advance the control process and issue a timely warning when the SMA is about to deviate from the DTA.

Data Preprocessing
To ameliorate the prediction effect of the algorithms, the EPB data needs to be preprocessed before establishing the model. Hence, we proposed a data standard preprocessing method for EPB tunneling parameters, which included three phases: data segmentation, noise reduction, and data standardization. The advance rate (v) is used as an example to illustrate the preprocessing process, as shown in Figure 7.
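As a sketch of the standardization phase, the snippet below rescales one parameter column to zero mean and unit variance. Z-score standardization is assumed here as the common choice; the paper's exact scaling scheme may differ.

```python
def standardize(column):
    """Z-score standardization: rescale a parameter column to zero mean and
    unit variance so parameters with different units contribute comparably."""
    m = sum(column) / len(column)
    sd = (sum((x - m) ** 2 for x in column) / len(column)) ** 0.5
    return [(x - m) / sd for x in column]

v = [10.0, 20.0, 30.0]  # e.g., advance rate samples
print(standardize(v))   # symmetric about 0.0 with unit variance
```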


Data Segmentation
In tunnel construction, the length of one ring is not completed in a single excavation; one ring is usually cut into several segments due to the capacity of slag discharge and grouting. When the programmable logic controller (PLC) stores the effective drilling cycles, it also records a large amount of invalid data unrelated to the tunneling process, as shown in Figure 7a. The first step of data preprocessing was therefore to eliminate the invalid data and retain the valid data.
In a drilling cycle, the EPB relies on the propulsion system and the cutterhead system to roll, cut, and spall the soil layer continuously. The whole process can be summarized into four stages: free running, loading, boring, and unloading (Figure 7b). A typical drilling cycle has the following characteristics:

• There is a clear loading section with a group of ascending rotation speeds and advance rates set by the machine operators.
• There is a clear shut-down period at which the rotation speed and advance rate are set to zero by the machine operators, accompanied by a sudden drawdown of the cutterhead thrust and torque values to zero.
• The hydraulic cylinder limits the maximum possible advancement of a complete boring cycle, which is 1.5 m for the machines currently under investigation.
• After the unloading period, there is a reasonably long period of time with zero records of advance rate. Other readings of the mechanical parameters are also virtually zero. This indicates that the machine is idle and the machine operators are doing the necessary preparatory work for the next drilling cycle, such as slagging and segment assembly.

Data Preprocessing
To ameliorate the prediction effect of the algorithms, the EPB data needs to be preprocessed before establishing the model. Hence, we proposed a data standard preprocessing method for EPB tunneling parameters, which included three phases: data segmentation, noise reduction, and data standardization. The advance rate (v) is used as an example to illustrate the preprocessing process, as shown in Figure 7.

Data Segmentation
In tunnel construction, the length of one ring is not completed in a single excavation. One ring is usually cut into several segments due to the capacity of slag discharge and grouting. When the programmable logic controller (PLC) stores the effective drilling cycles, it also records a large amount of invalid data that are not related to the tunneling process, as shown in Figure 7a. The first step of data preprocessing was to eliminate invalid data and filter out valid data.
In a drilling cycle, EPB relies on the propulsion system and the cutterhead system to roll, cut, and spall the soil layer continuously. The whole process can be summarized into four stages: free running, loading, boring, and unloading (Figure 7b). The stages can be identified by the following characteristics:

•
There is a clear loading section with a group of ascending rotation speeds and advance rates set by the machine operators.

•
There is a clear shut-down period at which the rotation speed and advance rate are set by the machine operators to zero, accompanied by the sudden drawdown of the values of cutterhead thrust and torque to zeros.

•
The hydraulic cylinder limits the maximum possible advancement of a complete boring cycle, which is 1.5 m for the machine currently under investigation.

•
After the unloading period, there is a reasonably long period of time with zero records of advance rate. Other readings of the mechanical parameters are also virtually zero. This indicates that the machine is silent, and the machine operators are doing the necessary preparatory work for the next drilling cycle, such as slagging and segment assembly.
Based on the above characteristics, an algorithm for automatic segmentation of EPB data was developed, and the segmentation effect is shown in Figure 7b. More segmentation results are available at https://github.com/ChenZuyuIWHR/TBM-processing (accessed on 14 April 2021). It is worth noting that the advance rate at the free-running and unloading stages was zero. Therefore, the subsequent AI algorithms only used the data of the loading and boring stages.
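The segmentation criteria above can be sketched as a simple run-length split on the PLC log: a record belongs to a cycle while both the advance rate and the cutterhead rotation speed are positive, and the long zero-valued silent periods separate consecutive cycles. This is a minimal illustration of the idea, not the paper's released algorithm; the column names `v` and `N` and the minimum-length threshold are assumptions.

```python
import pandas as pd

def segment_cycles(df, v_col="v", n_col="N", min_len=60):
    """Split a raw PLC log into drilling cycles.

    A record is 'active' when both the advance rate and the cutterhead
    rotation speed are positive; runs of inactive records (the silent
    period between cycles) separate consecutive cycles.
    """
    active = (df[v_col] > 0) & (df[n_col] > 0)
    # Label contiguous runs: the label increments whenever activity toggles.
    run_id = (active != active.shift(fill_value=False)).cumsum()
    # Keep only active runs long enough to be real cycles, not sensor blips.
    return [g for _, g in df[active].groupby(run_id[active])
            if len(g) >= min_len]
```

Each returned DataFrame is one candidate drilling cycle; short spurious runs are dropped by `min_len`.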

Noise Reduction
The EPB tunneling parameters are recorded by the automatic acquisition system. The acquisition system works in a moving environment with strong vibration and electromagnetic interference, which sometimes causes data to be recorded beyond the reasonable range. Zhou et al. [19] and Hu et al. [52] indicated that noise can interfere with the training process and reduce the generalization ability of the model. Common methods used to eliminate noise include the Fourier transform, low-pass filtering, and the wavelet transform, among which low-pass filtering blocks or attenuates high-frequency signals to maximize the effect of noise reduction [53,54]. Thereby, the low-pass Butterworth filter was used to reduce the data noise. The expression of the Butterworth filter is as follows [55,56]:

|H(ω)|² = 1 / (1 + (ω/ω_c)^(2n))

where |H(ω)|² is the squared magnitude of the filter; n is the order of the filter; ω_c is the unit angular cutoff frequency. The characteristic of the Butterworth filter is that the frequency response curve in the pass band is a maximally flat approximation, because its response magnitude monotonically decays as the frequency increases [56]. As the order of the Butterworth filter increases, the degree of noise reduction increases. To achieve a good noise reduction effect while avoiding information loss, n and ω_c were taken as 2 and 0.2 through the trial-and-error method. The noise reduction effect of the advance rate is shown in Figure 7c.
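The filter settings described above (order 2, normalized cutoff 0.2) can be reproduced with SciPy's standard Butterworth design; the toy advance-rate signal below is illustrative, not project data.

```python
import numpy as np
from scipy.signal import butter, filtfilt

np.random.seed(0)

# 2nd-order low-pass Butterworth with normalized cutoff frequency 0.2,
# the (n, omega) pair the paper settled on by trial and error.
b, a = butter(N=2, Wn=0.2, btype="low")

def denoise(series):
    # filtfilt applies the filter forward and backward, so the smoothed
    # curve has zero phase shift relative to the raw record.
    return filtfilt(b, a, series)

# Toy advance-rate record: a constant 40 mm/min drive with sensor noise.
v_noisy = 40.0 + np.random.normal(0.0, 3.0, 500)
v_smooth = denoise(v_noisy)
```

Zero-phase filtering (`filtfilt` rather than `lfilter`) matters here: a time-shifted advance-rate curve would misalign the inputs and targets of the later time-window models.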

Data Normalization
The mean, unit, and value range of each physical quantity in the EPB tunneling parameters are different. If the parameters are sent directly to the model for training, the prediction accuracy of the model will be reduced. Thus, we normalized the tunneling parameters according to the standard deviation standardization (z-score) method [57]:

x′ = (x − u) / σ

where x is the original value; x′ is the normalized value; u and σ are the mean and the standard deviation of a dimension of the tunneling parameters, respectively. After the training process, the results needed to be transformed back to the original scale.

After data preprocessing, a total of 5607 drilling cycles were segmented from the raw data, which constituted our AI learning database. The dataset was further split into two groups: the first group contained 4800 drilling cycles (EPB1~EPB4) for training the model, and the second group contained 807 drilling cycles (EPB5) for verifying the predictive ability of the model.

Parameters Selection
The core of achieving SMA prediction is to accurately grasp the key tunneling parameters that influence the SMA. Parameters surely unrelated to the SMA, such as item number and oil temperature, were excluded. Practical experience and previous studies indicate that the rotation speed (N), advance rate (V), and rotation speed of the screw machine (Ns) are the main operating parameters set by the machine operators. Cutterhead thrust (F) and torque (T) indirectly reflect the interaction between the EPB machine and the excavation face, and the torque of the screw machine (Ts) reflects the conditions of slag discharge [58,59]. Chamber pressure (Cp) can be used to evaluate the stability of the excavation face and is a representative parameter of the EPB shield machine. Propelling pressure (PA~PF) is the main execution parameter of SMA correction for EPB operators [60]. Some of the above parameters were also used in the SMA prediction models of Wang et al. [18] and Zhou et al. [19].
Based on the above analysis, these 13 tunneling parameters were adopted as input options of the SMA prediction model. In addition, the boring time (Time) of a drilling cycle was added as an input feature, as it determines the boring length and indirectly affects the final SMA state. The actual SMA parameters of the past records were also selected as input features to calculate the future SMA [17]. These parameters are described in Table 2.
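The input/output columns discussed above can be collected in two lists for downstream preprocessing and model code; the short labels are informal shorthand for the Table 2 entries, not the dataset's actual column names.

```python
# 13 tunneling parameters plus boring time = 14 model inputs;
# the four SMA parameters are the prediction targets.
INPUT_FEATURES = [
    "N",    # cutterhead rotation speed
    "V",    # advance rate
    "Ns",   # screw machine rotation speed
    "F",    # cutterhead thrust
    "T",    # cutterhead torque
    "Ts",   # screw machine torque
    "Cp",   # chamber pressure
    "PA", "PB", "PC", "PD", "PE", "PF",  # propelling group pressures
    "Time", # boring time of the drilling cycle
]
SMA_TARGETS = ["HDSH", "HDST", "VDSH", "VDST"]
```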

Model Building and Training
According to the prediction framework introduced in Figure 3, we constructed eight prediction models for the SMA parameters. The characteristics of the SMA at time t can be expressed as:

y_j(t) = f(x_i(t − 1), x_i(t − 2), …, x_i(t − n))

where x_i and y_j are the i-th input and j-th output characteristic parameters, respectively; n is the time window, which determines the time span of the backtracking data. In this paper, the range of i was 1 to 14, representing the fourteen input parameters, and the range of j was 1 to 4, representing the four SMA parameters. The time window was taken as 5, i.e., the SMA parameters at drilling cycle t were predicted in advance based on the information of five historical drilling cycles. To build the time window, we used the shift function in Pandas to shift the entire column. Since the order of the tunneling data is important in our analysis, the shuffle option of all algorithms was set to False.

After determining the basic parameters of the AI algorithms, it was also necessary to determine the other hyperparameters, such as the number of trees in the RF algorithm and the hidden layer nodes in the BPNN, CNN, LSTM, and GRU algorithms. For these hyperparameters, we used grid search to determine specific values [13,61]. First, on the training set, we provided a series of a priori candidate values for the various model parameters; then, all parameter combinations were searched through the grid. Five-fold cross-validation was applied to improve the robustness of the model during the whole training process. Finally, we obtained the parameter combinations that made each algorithm perform optimally. The hyperparameters used in the AI algorithms are shown in Table 3. It is noteworthy that the input-output settings of the ML algorithms differ from those of the DL algorithms. For the physical meanings of the algorithm hyperparameters and the specific settings of the input-output formats, please refer to Chollet [62] and Géron [63].
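The Pandas `shift` windowing and the unshuffled 5-fold grid search can be sketched as follows. The synthetic single-input frame, the random-forest grid, and the candidate values are illustrative stand-ins; the real models use the 14 inputs of Table 2 and the Table 3 grids.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Toy frame standing in for the tunneling log: one input plus one SMA target.
rng = np.random.default_rng(0)
df = pd.DataFrame({"v": rng.normal(40, 2, 200)})
df["VDST"] = df["v"].rolling(3, min_periods=1).mean() + rng.normal(0, 0.1, 200)

# Time window n = 5: predict cycle t from the five preceding cycles,
# built by shifting the whole column (pandas shift).
n = 5
lags = pd.concat({f"v_t-{k}": df["v"].shift(k) for k in range(1, n + 1)}, axis=1)
data = pd.concat([lags, df["VDST"]], axis=1).dropna()
X = data.drop(columns="VDST").to_numpy()
y = data["VDST"].to_numpy()

# Grid search over candidate hyperparameters with 5-fold cross-validation;
# shuffle=False keeps the drilling cycles in chronological order.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, 10]},
    cv=KFold(n_splits=5, shuffle=False),
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
```

`grid.best_params_` then holds the combination that performed optimally under cross-validation.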
We trained four models for each algorithm, with each model predicting one of the four output variables. The ML algorithms were implemented using Scikit-learn 0.24.2, and the DL algorithms were implemented in Keras with TensorFlow 2.1.0 as the backend. All models were trained and optimized on a computer with a 64-bit Windows operating system and an Intel Core i7-7700K 4.20 GHz 8-core CPU with 32 GB RAM.
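A minimal Keras sketch of one such DL model (one SMA output, five-cycle window, fourteen inputs) is shown below. The layer width, optimizer, and training settings are illustrative placeholders, not the tuned values from Table 3; note the 3-D (samples, time steps, features) input shape that distinguishes the DL models from the flat matrices fed to the ML algorithms.

```python
import numpy as np
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

n_window, n_features = 5, 14          # five historical cycles, 14 inputs

model = Sequential([
    LSTM(64, input_shape=(n_window, n_features)),  # 64 units: illustrative
    Dense(1),                                      # one SMA parameter per model
])
model.compile(optimizer="adam", loss="mse")

# Dummy standardized batch just to exercise the pipeline.
X = np.random.rand(32, n_window, n_features).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
```

Swapping `LSTM` for `GRU` yields the sibling recurrent model with the same input/output contract.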

Performance Comparison of Eight Algorithms
Taking the parameter VDST as an example, the prediction results of the eight AI algorithms are shown in Figure 8 and Table 4. The algorithms differed in their prediction accuracy for VDST. The KNN algorithm had the worst accuracy among the ML methods, with an RMSE of 9.45 and an EVS of 0.19. The values predicted by the other three ML methods (SVR, RF, AdaBoost) were close to the measured VDST: their absolute errors were basically less than 5 mm, the EVS was higher than 0.95, and the RMSE was lower than 2.0.
As can be seen from Figure 8, the LSTM and GRU algorithms produced the best predictions among the DL methods and captured the measured observations of the target parameter most closely. Their RMSE and EVS were around 1.1 and 0.99, respectively, and the absolute error was within 2.5 mm. In contrast, the BPNN (RMSE = 2.9, EVS = 0.95) and CNN (RMSE = 3.65, EVS = 0.88) algorithms had a poor prediction effect. The prediction performance of the eight AI algorithms for VDST in descending order was as follows: LSTM > GRU > AdaBoost > RF > SVR > BPNN > CNN > KNN.
Then, we comprehensively compared the HDSH, HDST, VDSH, and VDST predictions of these AI algorithms, and the results are listed in Table 4. The prediction indexes for HDSH, HDST, VDSH, and VDST were as follows: the EVS ranges were 0.88~0.98, 0.92~0.99, 0.84~0.99, and 0.88~0.99, and the RMSE ranges were 1.36~3.45, 1.25~3.33, 1.15~5.7, and 1.09~3.65, respectively. The KNN algorithm was excluded from these statistics because it had the worst prediction effect. The remaining seven algorithms produced good predictions for the different SMA parameters, with EVS above 0.8 and RMSE below 6. LSTM and GRU were the two best-performing algorithms, with EVS greater than 0.98 and RMSE less than 1.5, followed by AdaBoost with an average RMSE of 1.392 and then RF with an average RMSE of 1.413. The forecast effects of the SVR and BPNN models were also within a reasonable range. By contrast, CNN was more sensitive to the choice of SMA parameter, with the lowest EVS of 0.84 for the VDSH parameter and the highest EVS of 0.92 for the HDST parameter.
In summary, these results indicated that most AI algorithms showed satisfactory performance in predicting the SMA parameters. Considering the EVS and RMSE indicators, the two best algorithms are LSTM and GRU.
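The two evaluation indicators used throughout this comparison are straightforward to compute with Scikit-learn; the measured/predicted VDST values below are toy numbers for illustration.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, mean_squared_error

# Toy measured vs. predicted VDST values; illustrative numbers only.
y_true = np.array([10.0, 12.0, 15.0, 14.0, 11.0])
y_pred = np.array([10.5, 11.5, 14.0, 14.5, 11.0])

# EVS -> 1 means the prediction explains all variance of the measurement.
evs = explained_variance_score(y_true, y_pred)
# RMSE shares the unit of the target (mm here), so it reads as a
# typical prediction error magnitude.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```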

Warning Predictor for SMA
Based on the prediction results, we proposed a warning predictor for the SMA parameters that integrates ML and DL algorithms. Once all algorithms predict that the SMA will deviate significantly from the DTA in a future drilling cycle, the predictor sends a warning message in advance, providing auxiliary support for the machine operators to adjust the tunneling parameters. However, the prediction performance of the various algorithms for the SMA parameters was uneven, and their calculation results differed. Accordingly, when the prediction results of the algorithms are inconsistent, the following warning criteria are used to judge the SMA situation:

•
The prediction results of the two best-performing algorithms shall prevail.

•
When the two best-performing algorithms fail to reach the same standard, the votes of all predicted results shall prevail.

•
When it still cannot be determined by voting, the algorithms are given different weights according to the prediction index EVS, and the judgment is remade.
The specific prediction process is shown in Figure 9. This paper does not consider the KNN algorithm with poor prediction performance, so the SMA situation can be judged by the first two criteria.
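The three-tier decision rule above could be implemented roughly as follows. This is a sketch of the voting logic only; the function name, the deviation threshold, and the algorithm labels are hypothetical, and the real predictor follows the flow of Figure 9.

```python
def sma_warning(predictions, evs_scores, best=("LSTM", "GRU"), threshold=70.0):
    """Decide whether to raise an SMA warning from per-algorithm predictions.

    predictions: {algorithm: predicted SMA deviation}
    evs_scores:  {algorithm: EVS on the validation set}, used as weights
    """
    exceeds = {a: abs(p) > threshold for a, p in predictions.items()}

    # Criterion 1: the two best-performing algorithms prevail when they agree.
    if exceeds[best[0]] == exceeds[best[1]]:
        return exceeds[best[0]]

    # Criterion 2: otherwise, majority vote over all algorithms.
    yes = sum(exceeds.values())
    no = len(exceeds) - yes
    if yes != no:
        return yes > no

    # Criterion 3: tie-break by an EVS-weighted vote.
    w_yes = sum(evs_scores[a] for a, e in exceeds.items() if e)
    w_no = sum(evs_scores[a] for a, e in exceeds.items() if not e)
    return w_yes > w_no
```

With KNN excluded, ties are rare and the first two criteria usually settle the verdict, matching the observation in the text.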
The following is an example of an actual VDST deviation (exceeding ±70) during the shield tunneling process, which illustrates the basic idea of the warning scheme. As shown in Figure 10, VDST gradually increased from the 130th drilling cycle, indicating that the shield tail migrated downward in the vertical direction and the EPB started to move away from the DTA. In the 139th drilling cycle, the LSTM and GRU algorithms successfully predicted the VDST deviation based on the data of the previous five drilling cycles (Figure 10b). The machine operators then took corrective measures, after which the VDST returned to the reasonable range, and the warning was lifted in the 149th drilling cycle (Figure 10c).
In conclusion, when the SMA is about to deviate significantly, the predictor can give a timely warning and provide the machine operators with time to adjust the tunneling parameters. Compared with manual control of the SMA, the AI algorithms have almost no delay and can provide real-time information on the future SMA. The proposed SMA warning predictor can provide a basis for future deviation-correction research and a reference for similar projects. Using the basic AI algorithms, this paper constructed the warning predictor by backtracking a certain length of historical data. When computing power and the database are further improved and enriched, more historical data will be usable for predicting the SMA of the EPB over longer distances.

Model Applicability
AI technologies are widely applied in the field of geotechnical engineering and have solved many practical problems. For the SMA prediction of EPB shield machines, we chose eight basic AI algorithms for comprehensive analysis and comparison. As shown in Figure 11, the average prediction effect of the DL algorithms was better than that of the ML algorithms. Furthermore, KNN has few parameters to be optimized [13], and its calculation results mainly depend on the computed distances, which means that KNN may produce poor prediction results for large-scale data features (Figure 8a). Based on the decision tree algorithm, the RF and AdaBoost algorithms use the same base regressors, and their prediction results are obtained by aggregating these regressors [33,38]. Thus, when solving the SMA prediction problem, the RF and AdaBoost algorithms performed better than the KNN and SVR algorithms [29]. As the simplest DL algorithm, BPNN is composed of a multilayer stack of simple modules, and a model with 5~20 hidden neurons can approximate extremely complex functions [40]. However, the BPNN structure cannot store historical information, and as the depth of the network structure increases, gradient explosion or vanishing will occur, which means that it is not suitable for predicting time-sequence data. Similarly, CNN predictors based on convolutional dimension reduction perform well in image recognition problems [45,64] but are slightly inadequate for processing time-sequence data. The LSTM and GRU, which are good at processing time sequences, had the best prediction results.
To sum up, various AI algorithms have their own range of adaptation; no single algorithm is suitable for all problems [20]. For the SMA prediction problem, the DL algorithms that are good at capturing time sequences had the best prediction effect.

Dataset Size

Urban subway construction is a short, fast, and frequent type of project, which requires furnishing SMA predictions based on the data available within a short tunneling distance. Therefore, we manually reduced the dataset and observed the prediction results of the LSTM and GRU models under different dataset sizes. As shown in Table 5, when the VDST dataset contained fewer than 1000 drilling cycles, the models no longer predicted well, with EVS lower than 0.8 and RMSE higher than 7.0. As the dataset increased to 3000, the prediction accuracy of the models reached an acceptable range. Consequently, to enhance the generalization performance and robustness of the model in other projects, we plan to collect more SMA data (straight-line or curved segments) from different shield machines under various geological conditions.

Conclusions
SMA is an important technical parameter of concern for machine operators during shield tunneling. In this paper, we committed to using various ML and DL algorithms to establish SMA prediction models. In parallel, we developed a data preprocessing algorithm for EPB tunneling parameters, including data segmentation, noise reduction, and standardization. These modules were used to extract effective data, eliminate interference signals, and unify the dimensions of the various parameters, providing more effective data for the training phase. Then, we used the EVS and RMSE indicators to comprehensively analyze and compare the prediction results of the various algorithms on SMA and to indicate the optimal model. Finally, we proposed a warning predictor for SMA based on the prediction results. The main conclusions are as follows:

1.
LSTM and GRU algorithms, with EVS > 0.98 and RMSE < 1.5, were the two most accurate models for predicting SMA, because they are better at capturing the characteristics of data in time-sequence format. The comparative analysis indicated that the prediction performance of the eight algorithms in descending order was as follows: GRU > LSTM > AdaBoost > RF > SVR > BPNN > CNN > KNN.

2.
The SMA warning predictor, which combines ML and DL algorithms, can give timely warnings based on the tunneling information of five historical drilling cycles. The predictor provides auxiliary support for machine operators to adjust the tunneling parameters in advance.

3.
A certain scale of tunneling dataset is necessary. When relatively small datasets are used (dataset size < 1000), the prediction accuracy of these algorithms is significantly reduced.
This study comprehensively assessed eight basic AI algorithms on the SMA prediction problem and then established a warning predictor, demonstrating the great potential of AI technology in the dynamic prediction of SMA. However, the following work needs to be further supplemented and improved: first, use optimization algorithms to improve the prediction accuracy of the models; second, collect more tunneling data to enhance the robustness of the models; third, combine with kinematics theory to establish a complete SMA intelligent warning and control system.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data sets generated and analyzed during the current study are not publicly available due to privacy but are available from the corresponding author on reasonable request.