A Novel Hybrid Transfer Learning Framework for Dynamic Cutterhead Torque Prediction of the Tunnel Boring Machine

A tunnel boring machine (TBM) is an important large-scale engineering machine that is widely applied in tunnel construction. Precise cutterhead torque prediction plays an essential role in estimating energy consumption and ensuring safe operation during tunneling, since it directly influences the adaptive adjustment of excavation parameters. Because geological conditions are complicated and variable, the operational and status parameters of the TBM usually exhibit spatio-temporally varying characteristics, which poses a serious challenge to conventional data-based methods for dynamic cutterhead torque prediction. In this study, a novel hybrid transfer learning framework, namely TRLS-SVR, is proposed to transfer knowledge from a historical dataset that may contain multiple working patterns and to alleviate noise interference in fresh data when addressing dynamic cutterhead torque prediction. Compared with conventional data-driven algorithms, TRLS-SVR considers long-ago historical data and can effectively extract and leverage the public latent knowledge implied in historical datasets for current prediction. A collection of in situ TBM operation data from a tunnel project located in China is used to evaluate the performance of the proposed framework.


Introduction
Tunnel boring machines (TBMs) are widely applied in various tunnel construction projects, such as subways, mines, and railways, owing to their high reliability, safety, and environmental friendliness [1]. Figure 1 illustrates the typical structure of a TBM, which contains multiple sub-systems, such as the cutterhead driving system, the thrust system, and the cutterhead system. In real-world applications, TBMs generally work in heterogeneous and complicated geological environments involving spalling, faulting, fracturing, rock bursts, squeezing, swelling, and high water inflow [2], which pose severe challenges to TBM operation. A schematic illustration of the geological conditions of a tunnel is shown in Figure 2. To ensure construction safety and reduce energy consumption, it is desirable to accurately predict the dynamic load (generally referring to the cutterhead torque) under spatio-temporally varying geological conditions and to adjust the TBM control parameters dynamically during excavation.
In general, the prediction methods for cutterhead torque can be roughly grouped into three types: rock-soil mechanics methods, empirical methods (combined with experiments), and soft computing methods. The rock-soil mechanics method establishes a model according to the force balance among the rock, the cutters, and the internal machinery [4,5]. The empirical models are based on engineering experience involving a large number of laboratory tests, field measurements, and construction records [6,7]. The soft computing methods are data-based solutions that predict the TBM's load through mathematical mapping. Rostami [8] discussed theoretical and empirical methods in a recent review. S. K. Shreyas [9] and Shahrour Isam [10] provided brief retrospectives of recent applications of soft computing methods to predict various parameters in tunneling and underground excavation.
By dividing the tunnel alignment into three general sections in terms of geological and geotechnical conditions, Avunduk et al. [12] proposed an empirical model for predicting the excavation performance of TBMs. Through a mechanical decoupling method for analyzing the cutterhead-ground interaction, Zhang et al. [13] proposed an approximate method for calculating the load acting on the cutterhead. Based on the interaction between the TBM and the excavated material, Faramarzi et al. [14] applied the discrete element method (DEM) to evaluate TBM torque and thrust. Both rock-soil mechanics methods and empirical models are premised on the geological information being known. However, accurately predicting a geological profile before excavation is a challenging task. In tunneling and underground excavation, geological information is obtained through borehole sampling, and the strata between sampling points are usually estimated by linear fitting. Because the sampling points are typically far apart, the estimated profile often differs from the real distribution, which may degrade the accuracy of rock-soil mechanics methods and empirical models [15].
Aided by advances in sensor and measurement technology, modern TBMs record a series of operational parameters closely related to the dynamic load, which provides a basis for the practical application of soft computing methods. Sun et al. [16] used the random forest (RF) algorithm to design a predictor for TBM load. Kong et al. [17] took geological conditions and operational data as inputs to build an RF-based model for predicting the driving forces of a TBM in soil-rock mixed-face ground. Li et al. [18] used a one-dimensional convolutional neural network combined with a long short-term memory network (CNN-LSTM) to predict cutterhead speed and penetration rate (PR). Qin et al. [19] applied a deep neural network-based method to predict dynamic cutterhead torque from operating data and status parameters. Suwansawat et al. [20] applied the multi-layer perceptron (MLP) to determine the correlation among TBM operational data, ground characteristics, and surface movements. Lau et al. [21] used a radial basis function (RBF) network to estimate the tunneling production rates of successive cycles. Gao et al. [22] used three kinds of recurrent neural networks (RNNs) for real-time prediction of TBM operating parameters. Soft computing methods usually involve the optimization of many parameters, and selecting these parameters empirically reduces the accuracy of the results. To deal with this problem, many hybrid methods have been proposed in the literature. For example, Zhou et al. [23] applied three optimization algorithms to optimize the hyper-parameters of the support vector machine (SVM) for forecasting the advance rate (AR) of TBMs. Armaghani et al. [24,25] proposed two hybrid intelligent systems, namely particle swarm optimization (PSO)-artificial neural network (ANN) and imperialist competitive algorithm (ICA)-ANN, to approximate the PR and AR of TBMs, respectively.
Although soft computing approaches can achieve relatively accurate predictions, most of them assume that training samples and future test samples have identical distributions, so their practicability still has room for improvement. During excavation, TBMs encounter varying geological and working conditions, such as accelerating, turning, and jam release, resulting in considerable changes in the underlying pattern of operation data over space and time. Consequently, historical datasets behave as non-stationary time series in which the correlations among parameters are highly complicated and changeable and are difficult to describe with simple or fixed mathematical expressions. Hence, it is a serious challenge to extract common knowledge from historical datasets to assist in building an adaptive model that changes dynamically with geological conditions and operating parameters, so as to predict the dynamic cutterhead torque at the current moment. To a certain degree, this problem resembles the paradigm of transfer learning [26,27], which uses experience gained from source tasks to improve the learning of new related tasks. Hu et al. [28] applied transfer learning to efficient wind speed prediction: the prediction model was trained on samples from older, data-rich wind farms to extract wind speed patterns and then fine-tuned with samples from newly built farms. Rui et al. [29] constructed a novel transfer learning paradigm for time series prediction. However, a TBM's historical data contain a variety of geological information and working modes, so directly adopting the most intuitive transfer learning method without distinguishing the working modes in the historical data may result in negative transfer.
Herein, a novel hybrid data-mining framework based on clustering, multitask learning (MTL), transfer learning, and least-squares support vector regression (LS-SVR), abbreviated as TRLS-SVR, is proposed for dynamic cutterhead torque forecasting of TBMs. In this framework, LS-SVR is selected as the baseline model because of its powerful capability to capture the underlying nonlinear relationships of a complex system. The underlying patterns in historical data are divided according to the relationship among attributes [30]. To take advantage of the knowledge contained in different working modes and to reduce the damage from dataset bias, we adopt the idea of MTL [31], which explicitly exploits commonalities and differences across multiple working modes by learning them simultaneously rather than individually, to improve knowledge extraction. Based on the common knowledge extracted from historical data, we use newly collected operation data to continuously update the pattern-specific bias parameters so as to adapt to changing geological and working conditions. This study offers the following innovations and contributions.
(1) An unsupervised clustering algorithm is combined with the MTL paradigm to explore and exploit the correlations among multiple working modes by learning them simultaneously rather than individually, which enhances the ability to extract public knowledge from a diversely recorded TBM historical dataset.
(2) A transfer learning paradigm is employed to reuse the public knowledge contained in the historical dataset to supplement the new data, which alleviates random noise interference and fits varying geological and working conditions well.
(3) TRLS-SVR achieves superior performance in geologically complex and changeable sections compared with conventional data-driven algorithms.
The rest of this study is organized as follows. Section 2 presents the details of the proposed framework. Section 3 presents the experimental verification. Section 4 discusses the experimental results. Section 5 concludes the study and outlines future work.

Overall Framework
The dynamic cutterhead torque prediction framework proposed in this paper draws inspiration from several machine learning methods, including clustering, MTL, and transfer learning. As described in Figure 3, TRLS-SVR mainly consists of four components: data pre-processing, division of typical working modes with an unsupervised clustering algorithm, extraction of implicit common knowledge with an MTL algorithm, and knowledge reuse based on transfer learning. In the first step, a large amount of historical data spanning a long period before the current sample is extracted from the database. In the second step, a clustering algorithm divides the historical dataset into working modes according to the relationship among attributes. Next, the MTL paradigm is used to extract representative knowledge from the multiple working modes. Finally, based on the transfer learning paradigm, the experience extracted from the historical dataset is retained and used to train a fresh model. Each component is described in detail below.

Clustering Based on the Relationship among Attributes
Owing to continuous changes in geological conditions and work patterns, the historical dataset may contain multiple modes. To better extract public knowledge under different patterns, the first step is to divide the historical dataset into clusters. Clustering, as a pre-processing step that uncovers underlying patterns and finds natural partitions within a dataset, is widely used in engineering data analysis, for example in fault detection, pattern recognition, and risk analysis. Widely used clustering algorithms such as k-means and the fuzzy c-means algorithm (FCM) mostly partition a dataset according to its spatial distribution. However, different categories of TBM operation data often have similar spatial distributions, so conventional clustering methods may not partition them effectively. The relationship among attributes, by contrast, varies considerably under different working and geological conditions and can be used to improve clustering performance [32]. Thus, in this paper, we employ the modified FCM algorithm SVR-FCM, presented by Shi et al. [30], for TBM operation data clustering; it is built on the FCM architecture but partitions the data according to the relationship among attributes rather than their spatial distribution. The distance $D_{ik}$ is defined as follows: The clustering objective function is modified as follows: The necessary conditions for minimizing (2) yield the following partition matrix: A more detailed description of the algorithm architecture can be found in [22].
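SVR-FCM itself is specified in [30]. As a lighter illustration of the same idea (clustering by how attributes relate rather than by where samples lie), the sketch below implements fuzzy c-regression models (FCRM), a known FCM variant in which every cluster owns a regression model and the distance of sample k to cluster i is its squared residual under that model. This is a swapped-in technique, not the authors' algorithm: we assume weighted linear least squares per cluster where SVR-FCM uses SVR, and the function name `fcrm` is hypothetical. The defaults for `m`, `eps`, and `max_iter` mirror the clustering settings reported for the experiments.

```python
import numpy as np


def fcrm(X, y, n_clusters=4, m=2.0, eps=1e-6, max_iter=1000, seed=0):
    """Fuzzy c-regression models (hypothetical helper, not SVR-FCM).

    X: (n, d) inputs, y: (n,) target.
    Returns U, the (n_clusters, n) fuzzy membership matrix whose columns sum to 1,
    and coefs, an (n_clusters, d + 1) array of linear models (last column = intercept).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])          # append an intercept column
    U = rng.random((n_clusters, n))
    U /= U.sum(axis=0, keepdims=True)             # random initial fuzzy partition
    coefs = np.zeros((n_clusters, Xb.shape[1]))
    for _ in range(max_iter):
        W = U ** m                                # fuzzified weights u_ik^m
        for i in range(n_clusters):
            # Weighted least squares: minimise sum_k W[i, k] * (y_k - Xb_k @ beta)^2
            sw = np.sqrt(W[i])
            coefs[i], *_ = np.linalg.lstsq(sw[:, None] * Xb, sw * y, rcond=None)
        # Distance of sample k to cluster i = squared residual under model i
        D = np.maximum((y[None, :] - coefs @ Xb.T) ** 2, 1e-12)
        # FCM-style update for squared distances: u_ik proportional to D_ik^(-1/(m-1))
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < eps
        U = U_new
        if converged:
            break
    return U, coefs
```

A quick way to see the point is to mix samples from y = 2x and y = -2x with x drawn from the same interval: their spatial distributions in x overlap completely, yet residual-based memberships separate them by slope.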

Extracting Public Knowledge from Historical Dataset
The clusters correspond to typical working modes, which are combinations of representative working and geological conditions. Note that the data distributions of different working modes are distinct but similar. To extract the public knowledge contained in typical working modes, we adopt the MTL paradigm, which explicitly exploits commonalities and differences across multiple working modes by learning them simultaneously rather than individually. MTL reinforces each task by using the interconnections between tasks, considering both the relevance and the differences between tasks to enhance generalization performance. Abundant literature on MTL shows that learning various related tasks simultaneously can improve predictive performance relative to learning them independently [33,34]. This study adopts an MTL method based on minimizing a regularized cost function similar to that of LS-SVR, which has been successfully used for single-task learning [35]. The LS-SVR model is formulated as Equation (4); it solves the regression problem by optimizing the output weight vector, w, and the bias term, b, to minimize a cost function subject to a constraint, as shown in Equation (5).
Here, e is a vector of slack variables, and the hyper-parameter ρ controls the relative weight of each term. The output weight vector of working mode s, denoted $w_s$, can be divided into a common vector $w_0$ shared by all working modes and a working-mode-specific bias vector $v_s$, which can be formulated as follows: We estimate all $v_s$ and the common $w_0$ simultaneously. To this end, we solve the following optimization problem, which is analogous to the LS-SVR used for single-task learning: The number of tasks, S, equals the number of clusters. Specifically, $x_{s,i}$ represents the ith sample of the sth task, λ is the constraint coefficient, γ and η are penalty coefficients, and $\xi_{s,i}$ and $\rho_{s,i}$ represent the training errors of the ith sample of the sth task. According to the Lagrangian multiplier method, solving Equation (7) is equivalent to solving the corresponding Lagrangian problem: where $\alpha_{s,i}$ and $\beta_{s,i}$ are the ith Lagrangian multipliers for the sth task. Based on the Karush-Kuhn-Tucker (KKT) conditions, the first partial derivatives of $L_D$ are set to zero. Eliminating $w_0$, $\{v_s\}_{s=1}^{S}$, $\{\xi_{s,i}\}_{s=1,i=1}^{S,n_s}$, and $\{\rho_{s,i}\}_{s=1,i=1}^{S,n_s}$ results in the solution of (9).
Energies 2022, 15, 2907
The working-mode-specific bias vectors can be formulated as follows: The extracted public knowledge is denoted as the following vector:
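The decomposition $w_s = w_0 + v_s$ can be made concrete with a standard regularized multi-task LS-SVR in the style of Evgeniou and Pontil. The sketch below is a simplified stand-in for Equation (7) under stated assumptions: one error term $\xi_{s,i}$ per sample with penalty `gamma`, a task-specific bias $b_s$, and the regularizer $\tfrac12\lVert w_0\rVert^2 + \tfrac{\lambda}{2S}\sum_s \lVert v_s\rVert^2$; the paper's second error term $\rho_{s,i}$ and its penalty η are not reproduced. Under these assumptions the KKT conditions give $w_0 = \sum \alpha_{s,i}\varphi(x_{s,i})$ and $v_s = (S/\lambda)\sum_i \alpha_{s,i}\varphi(x_{s,i})$, and eliminating the primal variables leaves one linear system in the biases and multipliers. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix, k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mtls_svr_fit(Xs, ys, lam=1.0, gamma=100.0, sigma=1.0):
    """Multi-task LS-SVR with w_s = w0 + v_s (hypothetical helper).

    Xs, ys: lists holding one (n_s, d) input array and one (n_s,) target array per task.
    """
    S = len(Xs)
    X = np.vstack(Xs)
    y = np.concatenate(ys)
    task = np.concatenate([np.full(len(t), s) for s, t in enumerate(ys)])
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    same_task = task[:, None] == task[None, :]
    # H = K + (S / lam) * blockdiag(K_s) + I / gamma
    H = K + (S / lam) * K * same_task + np.eye(n) / gamma
    A = (task[:, None] == np.arange(S)[None, :]).astype(float)   # (n, S) task indicators
    # KKT system: [[0, A^T], [A, H]] @ [b; alpha] = [0; y]
    M = np.block([[np.zeros((S, S)), A.T], [A, H]])
    sol = np.linalg.solve(M, np.concatenate([np.zeros(S), y]))
    return {"X": X, "task": task, "b": sol[:S], "alpha": sol[S:],
            "S": S, "lam": lam, "sigma": sigma}


def mtls_svr_predict(model, Xq, s=None):
    """Prediction of task s, or only the shared part phi(x)^T w0 when s is None."""
    Kq = rbf_kernel(Xq, model["X"], model["sigma"])
    alpha = model["alpha"]
    common = Kq @ alpha                        # w0 = sum_j alpha_j phi(x_j)
    if s is None:
        return common
    mask = model["task"] == s
    specific = (model["S"] / model["lam"]) * (Kq[:, mask] @ alpha[mask])   # phi(x)^T v_s
    return common + specific + model["b"][s]
```

Calling `mtls_svr_predict(model, X)` without a task index returns only the shared part $\varphi(x)^\top w_0$, which is the piece a subsequent transfer step would reuse.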

Dynamic Cutterhead Torque Prediction Based on Transfer Learning
Transfer learning is an emerging paradigm that uses previously acquired experience to solve new but similar problems faster and more effectively [33]. Transfer learning and MTL share some commonalities: both aim to improve the performance of learners via knowledge transfer. Transfer learning has been studied extensively for different applications in recent years, providing many opportunities for applying data-based methods to the design and analysis of complex engineering systems.
During excavation, geological information and operating parameters generally change continuously, so operation data around the excavation point are more informative for subsequent dynamic cutterhead torque prediction. In addition, vibration and shock often occur during excavation, and random noise inevitably affects the measurement of fresh data, which may substantially degrade prediction performance. Hence, it is advisable to train a new model using the knowledge contained in the historical dataset, both to reduce the number of new samples required and to alleviate random noise interference. To leverage the experience extracted from the historical dataset, the output weight vector of the fresh model, denoted $w_t$, is required to stay close to the public vector $w_0$, which can be regarded as the public knowledge transferred from the historical dataset. We train an approximator that minimizes both the parameter-vector norm and the training errors on the available fresh samples, which can be written as: where $w_t$ is the output weight vector for the fresh data, μ denotes the penalty parameter, C is the regularization parameter, $\xi_j$ is the training error, and $m_t$ is the number of fresh training samples around the excavation point. According to the Lagrangian multiplier method, solving Equation (12) is equivalent to solving the corresponding Lagrangian problem: where $\alpha_j$ is the jth Lagrangian multiplier. Based on the KKT conditions, the problem can then be solved with the Lagrangian multiplier method: From an analysis of Equation (14), it can be concluded that: Substituting Equations (12) and (13) into Equation (15), we obtain: Let the solution of (16) be α. The dynamic cutterhead torque prediction for fresh data can then be formulated as follows:
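The exact form of Equation (12) is not recoverable here, so the sketch below assumes a common biased-regularization form: minimize $\tfrac{\mu}{2}\lVert w_t - w_0\rVert^2 + \tfrac{C}{2}\sum_{j=1}^{m_t}\xi_j^2$ subject to $y_j = w_t^\top\varphi(x_j) + b + \xi_j$. Under that assumption, the KKT conditions give $w_t = w_0 + \tfrac{1}{\mu}\sum_j \alpha_j\varphi(x_j)$ and $\sum_j \alpha_j = 0$, so the fresh model is the transferred prediction $\varphi(x)^\top w_0$ plus an LS-SVR correction fitted to the residuals with the kernel scaled by $1/\mu$. The function `transfer_fit` and the callable `f0` (standing in for $\varphi(x)^\top w_0$ from the historical model) are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix, k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def transfer_fit(Xf, yf, f0, mu=10.0, C=100.0, sigma=1.0):
    """Fit w_t = w0 + (1/mu) * sum_j alpha_j phi(x_j) on m_t fresh samples (hypothetical helper).

    f0: callable returning phi(x)^T w0, the transferred public knowledge, for an (n, d) array.
    Returns a prediction function for new inputs.
    """
    m_t = len(yf)
    r = yf - f0(Xf)                            # what the transferred model fails to explain
    K = rbf_kernel(Xf, Xf, sigma)
    # KKT system: [[0, 1^T], [1, K / mu + I / C]] @ [b; alpha] = [0; r]
    M = np.zeros((m_t + 1, m_t + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K / mu + np.eye(m_t) / C
    sol = np.linalg.solve(M, np.concatenate([[0.0], r]))
    b, alpha = sol[0], sol[1:]

    def predict(Xq):
        return f0(Xq) + rbf_kernel(Xq, Xf, sigma) @ alpha / mu + b

    return predict
```

As μ grows, the correction shrinks toward a constant offset and the model relies almost entirely on the transferred knowledge; as μ shrinks, the fresh samples (and their noise) dominate. Under this assumed form, that is the trade-off the choice of μ controls.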

Numerical Experiments
In this section, a collection of real-world operational and status parameters of TBM is utilized to demonstrate the superiority and applicability of the framework.

Experimental Settings
The tunnel project studied here is located in Shenzhen, China; the tunnel is about 2000 m long and 6.4 m in diameter. As shown in Figure 4a, various geological layers, such as clay, sand, and rock, are unevenly distributed from the ground surface to the tunnel floor. The tunneling equipment used in this tunnel, shown in Figure 4b, is an earth pressure balance shield TBM with a total mass of 500 t and 120 cutting tools on its cutterhead. The basic equipment parameters are listed in Table 1. During tunneling, the operational and status data of the TBM were recorded by a programmable logic controller (PLC), read by an industrial computer at regular intervals, and stored in the database. Thus, fresh data were added to the database in batches during tunneling. The collected operation dataset represents the operational information and status parameters along the length of the tunnel and contains about 44 attributes, such as cutterhead torque, chamber pressure, and advance velocity.
Please refer to the appendix for a detailed list of these attributes (see Table A1). In the process of dynamic cutterhead torque prediction, data arrive in batches. We selected five sets of sequential data, covering various working and geological conditions, to construct the test datasets. Each collection contained approximately 5000 rows and 44 columns; the first 80% of each collection was used for training and the last 20% for testing. Each row represents the values of all physical quantities at a given moment, and each column represents the values of one physical quantity over time. To improve prediction accuracy, we first normalized the samples, an essential pre-processing step in machine learning. The min-max method used here, commonly referred to simply as "normalization" or "feature scaling", is formulated as:
$$x_{\text{norm}} = \frac{x - X_{\min}}{X_{\max} - X_{\min}} \qquad (18)$$
where x is the current value and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the entire dataset, respectively. The min-max method rescales values to the interval [0, 1]. The operational data modeling was conducted on a personal computer (CPU: Intel Core i7-10700; RAM: 32 GB). The framework was coded by the author in Matlab with the following settings. The clustering algorithm parameters follow reference [30]: the fuzzification parameter m was 2, the threshold value ε was $10^{-6}$, the number of clusters was 4, and the maximum number of iterations was 1000. The radial basis function (RBF) was selected as the kernel function for LS-SVR. Compared with ordinary LS-SVR models, the proposed framework has more hyper-parameters, such as η, λ, and μ, which determine the information extracted from historical data and the knowledge transferred for constructing a new model. In this section, we set η = 100 and λ = 1, and μ was selected from {1, 5, 10, 15} according to the forecast accuracy on the previous batch. Other hyper-parameters were set to the same values as in the baseline LS-SVR model, i.e., γ = C = 100.
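A minimal NumPy sketch of the pre-processing described above, Equation (18) plus the chronological 80/20 split, might look as follows. The function names are hypothetical. Two details are our own choices rather than the text's: constant columns ($X_{\max} = X_{\min}$) are mapped to 0 to avoid division by zero, and the minimum and maximum are passed in explicitly, so they can be computed over the entire dataset as described or, to avoid information leakage, over the training rows only.

```python
import numpy as np


def min_max_fit(X):
    """Column-wise minimum and maximum (X_min and X_max in Equation (18))."""
    return X.min(axis=0), X.max(axis=0)


def min_max_transform(X, x_min, x_max):
    """Rescale each column to [0, 1]; constant columns are mapped to 0."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span


def chronological_split(X, train_frac=0.8):
    """First train_frac of the rows (in time order) for training, the rest for testing."""
    cut = int(round(train_frac * len(X)))
    return X[:cut], X[cut:]
```

If the statistics are fitted on training rows only, test rows can fall slightly outside [0, 1]; that is usually acceptable for an RBF kernel.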

Experiments and Results
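To verify its effectiveness, TRLS-SVR was compared with existing data-driven methods, namely RF, SVR, Lasso, a deep neural network, i.e., a long short-term memory (LSTM) network [22], and an online learning method, i.e., online support vector regression (OSVR) [36]. The fitness of these prediction models was evaluated with four error criteria: the coefficient of determination ($R^2$), mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). These metrics are defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert$$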
where $n$ is the total number of samples, $\bar{y}$ is the average of all actual values, and $\hat{y}_i$ is the predicted value of $y_i$. The closer $R^2$ is to 1, the better the performance. MAE and RMSE measure the disparity between actual and predicted values and thus reflect the dispersion of the model errors. RMSE is more sensitive to large errors than MAE because the errors are squared, which amplifies large errors further. MAPE is the ratio between the errors and the actual values; it can be regarded as a relative error function, and the smaller its value, the higher the prediction accuracy. These four criteria evaluate the fit of the prediction models from different viewpoints. The evaluation results of the proposed TRLS-SVR and the five other data-driven models on the five test datasets are shown in Tables 2-5. In general, three indicators of TRLS-SVR, i.e., MAE, RMSE, and MAPE, are lower than those of the five other data-driven models, and its coefficient of determination, $R^2$, is higher. The average value of $R^2$ … and 5.28% for OSVR, respectively. Hence, the average MAE of TRLS-SVR is 80.3% lower than that of RF, 68.31% lower than LSTM, 67.48% lower than SVR, 60.55% lower than Lasso, and 28.8% lower than OSVR. In addition, the average RMSE of TRLS-SVR is 75.34% lower than that of RF, 59.56% lower than LSTM, 59.74% lower than SVR, 55.11% lower than Lasso, and 35.91% lower than OSVR. Moreover, the prediction precision of TRLS-SVR is 77.95% higher than that of RF, 65.85% higher than LSTM, 65.67% higher than SVR, 59.35% higher than Lasso, and 31.61% higher than OSVR.

For visual comparison, the actual and predicted cutterhead torque values of these models are shown in Figures 5-9. The existing data-driven models, i.e., RF, LSTM, SVR, and Lasso, show relatively low prediction accuracy: they can only track the average value and the changing trend, and cannot predict the torque dynamically and accurately. The main reason may be that the cutterhead torque sequence is nonlinear and non-stationary and may contain several different working conditions simultaneously. It is therefore not advisable to describe the cutterhead torque sequence with a simple or fixed mathematical formula. The in situ monitoring data are spatio-temporally coupled, and data close to the excavation point are more informative for subsequent load prediction. Dynamically updating the model parameters with these fresh data allows the model to capture how the load sequence changes with the geological and working parameters. This is why online learning-based methods achieve higher prediction accuracy than traditional statistical data-driven models. However, although OSVR, an online learning-based method, achieves high prediction accuracy on some samples, its accuracy over the entire dataset is still lower than that of TRLS-SVR, mainly because the cutterhead torque measurements are disturbed by random noise. Updating the model parameters with only a small amount of fresh data close to the excavation point inevitably over-fits the random noise and introduces model bias, which degrades performance.

TRLS-SVR can effectively partition the historical data according to different working and geological conditions and learn how the cutterhead torque sequence changes under each working mode. When newly arriving data are disturbed by random noise or by the geological conditions of the excavation section, the implicit knowledge contained in the historical data is explicitly transferred to reduce over-fitting to the random noise and to avoid introducing model bias. As a result, TRLS-SVR achieves better prediction performance than the existing data-driven methods.
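The four evaluation criteria can be computed in a few lines. The NumPy sketch below assumes the standard definitions ($R^2$ with $\bar{y}$ as the mean of the actual values, MAPE expressed as a percentage). The helper `relative_reduction` illustrates how a statement such as "the average MAE of TRLS-SVR is 80.3% lower than that of RF" would be obtained, assuming the reported percentages are relative reductions with respect to each baseline. Both function names are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, RMSE and MAPE (in %) for one test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around y-bar
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(residuals)),
        "RMSE": np.sqrt(np.mean(residuals ** 2)),    # squaring amplifies large errors
        # relative error; assumes no actual torque value is zero
        "MAPE": 100.0 * np.mean(np.abs(residuals / y_true)),
    }


def relative_reduction(proposed, baseline):
    """Percentage by which the proposed model's error is lower than the baseline's."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    print(regression_metrics([100, 200, 300, 400], [110, 190, 310, 390]))
    print(relative_reduction(2.0, 10.0))
```

For `y_true = [100, 200, 300, 400]` and `y_pred = [110, 190, 310, 390]`, MAE and RMSE are both 10 and $R^2 = 1 - 400/50000 = 0.992$; `relative_reduction(2.0, 10.0)` gives 80.0.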

Discussion
Compared with the baseline data-driven method, LS-SVR, TRLS-SVR has more hyper-parameters, such as η, λ, μ, and the size of the fresh training set, $m_t$. These hyper-parameters determine how much information is extracted from the historical data and how much weight this information carries in the model update, and may therefore affect the performance of the algorithm. As mentioned in Section 3.1, the regularization parameter μ is determined according to the prediction accuracy of the previous batch. In this section, we therefore focus on how the hyper-parameters η and λ and the size of the fresh training set, $m_t$, influence the prediction accuracy of the TRLS-SVR framework.
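For reference, the baseline LS-SVR that TRLS-SVR extends reduces to a single linear system. The NumPy sketch below assumes the standard LS-SVR dual formulation with an RBF kernel $k(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$; reading the paper's C as the error-regularization constant and γ as the kernel coefficient is our assumption, and the class name `LSSVR` is hypothetical. It deliberately contains none of the transfer terms controlled by η, λ, and μ.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off


class LSSVR:
    """Plain LS-SVR regressor (the baseline, not TRLS-SVR)."""

    def __init__(self, C=100.0, gamma=100.0):
        self.C = C          # weight on the squared training errors
        self.gamma = gamma  # RBF kernel coefficient

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = X.shape[0]
        # Dual (KKT) system: [[0, 1^T], [1, K + I/C]] @ [b, alpha] = [0, y]
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, self.gamma) + np.eye(n) / self.C
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        self.b, self.alpha, self.X_train = sol[0], sol[1:], X
        return self

    def predict(self, X):
        # f(x) = sum_i alpha_i k(x, x_i) + b
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K @ self.alpha + self.b


if __name__ == "__main__":
    X = np.linspace(0, 2 * np.pi, 30)[:, None]
    model = LSSVR(C=1e4, gamma=1.0).fit(X, np.sin(X).ravel())
    print(model.predict([[1.0], [2.0]]))
```

Because each extra regularization term in a transfer variant adds further terms to this objective, it is plausible that every added hyper-parameter also changes how strongly historical knowledge enters the solution, which is what the following sensitivity analyses probe.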

Analysis of the Fresh Training Set Size
In these experiments, the 10, 20, 50, 100, 200, 300, and 400 data points closest to the excavation point are selected as fresh training sets. The prediction accuracies obtained with these fresh training set sizes are compared in Figure 10. When the training set size, $m_t$, is small, the performance of TRLS-SVR improves rapidly as the number of samples increases; when $m_t$ is relatively large, the performance decreases as the number of samples increases. With $m_t$ = 50, the proposed framework tends to provide the best prediction performance. This is because too little training data cannot suppress noise interference, which leads to over-fitting of the noise and degrades prediction accuracy, whereas too much training data smooths out the changing characteristics of the continuous data, yielding only average statistical characteristics and reducing prediction accuracy.
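The fresh-window experiment can be organized as a rolling, one-step-ahead evaluation in which the model is refit on the $m_t$ samples immediately preceding each prediction point. The sketch below is only a harness for that procedure: a closed-form ridge regressor stands in for TRLS-SVR, the function names are hypothetical, and it does not reproduce the numbers in Figure 10.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1e-2):
    """Stand-in regressor (not TRLS-SVR): closed-form ridge regression with an
    appended constant column that is penalized like the other weights."""
    Xa = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    w = np.linalg.solve(Xa.T @ Xa + alpha * np.eye(Xa.shape[1]), Xa.T @ y_train)
    Xt = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    return Xt @ w


def rolling_window_mae(X, y, m_t, start):
    """One-step-ahead MAE from index `start` onward, refitting the model on the
    m_t samples immediately preceding each prediction point."""
    errors = []
    for k in range(start, len(y)):
        window = slice(max(0, k - m_t), k)  # the 'fresh' training window
        y_hat = ridge_fit_predict(X[window], y[window], X[k:k + 1])
        errors.append(abs(y[k] - y_hat[0]))
    return float(np.mean(errors))


def sweep_window_sizes(X, y, sizes, start):
    """MAE for each candidate m_t; use start >= max(sizes) so that every size
    is evaluated on the same test points with a full window."""
    return {m: rolling_window_mae(X, y, m, start) for m in sizes}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=600)
    print(sweep_window_sizes(X, y, [10, 20, 50, 100, 200, 300, 400], start=400))
```

Evaluating all window sizes on the same test indices keeps the comparison fair; with real TBM data the trade-off described above (noise over-fitting for small $m_t$, over-smoothing for large $m_t$) would show up as a minimum in the returned MAE values.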

Analysis of Regularization Parameters
We conduct experiments on the TBM dataset to analyze the sensitivity of the two regularization parameters η and λ. We fix the size of the fresh training set, $m_t$, at 50 and the hyper-parameters at C = γ = 100, while the regularization parameter μ is determined according to the prediction accuracy of the previous batch. For the sensitivity analysis of the regularization parameter η, we fix λ = 1 and vary η over $\{10^{-3}, 10^{-2}, \ldots\}$. As shown in Figure 11, TRLS-SVR achieves its best prediction accuracy with η = 100 when λ is fixed at 1. Figure 12 shows that the best prediction accuracy is achieved when λ is set to a small value. In addition, the prediction accuracy of TRLS-SVR changes only slightly when λ lies in the range $[10^{-5}, 1]$.
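The sensitivity study follows a one-at-a-time scheme: one regularization parameter is swept over a log-spaced grid while the other is held fixed. The sketch below shows only that harness. `evaluate` is a hypothetical hook that would train and test TRLS-SVR with the given settings (the keyword `lam` stands for λ because `lambda` is reserved in Python), and the grid end points beyond those mentioned in the text are assumptions.

```python
import numpy as np


def one_at_a_time_sweep(evaluate, grid, fixed):
    """Vary exactly one hyper-parameter over its grid, others held at `fixed`.

    evaluate : callable(**params) -> error (lower is better); hypothetical hook
    grid     : dict with a single entry {name: candidate values}
    fixed    : dict of hyper-parameters held constant, e.g. {"lam": 1.0}
    Returns (best value, {value: error}).
    """
    (name, values), = grid.items()  # raises ValueError if more than one entry
    scores = {}
    for v in values:
        params = dict(fixed, **{name: v})
        scores[v] = evaluate(**params)
    best = min(scores, key=scores.get)
    return best, scores


# Log-spaced grids; the upper bound of ETA_GRID is an assumption.
ETA_GRID = np.logspace(-3, 3, 7)  # 1e-3, 1e-2, ..., 1e3
LAM_GRID = np.logspace(-5, 0, 6)  # 1e-5, ..., 1 (the range discussed for λ)

if __name__ == "__main__":
    # Toy error surface standing in for a real TRLS-SVR evaluation.
    toy = lambda eta, lam: (np.log10(eta) - 2.0) ** 2 + lam
    print(one_at_a_time_sweep(toy, {"eta": ETA_GRID}, {"lam": 1.0}))
    print(one_at_a_time_sweep(toy, {"lam": LAM_GRID}, {"eta": 100.0}))
```

A one-at-a-time sweep is cheap but ignores interactions between η and λ; a full 2-D grid over both parameters would capture those at a higher cost.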

Limitations and Recommendations
The heterogeneous in situ TBM data include not only numerical data but also categorical data, such as geological data. A particular characteristic of these heterogeneous data is that the geological data and the operational data differ in size, which limits the direct application of data-driven techniques. Therefore, in this paper, we consider only the operational data and ignore the geological data. In future work, geological data should be integrated through multi-source heterogeneous data fusion to further improve the prediction accuracy of the framework.
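One simple way to reconcile the different sizes of the two sources is to look up, for every operational record, the geological segment it falls in and one-hot encode that segment's rock class. The sketch below assumes that each operational record carries a chainage (position along the tunnel) and that the geological survey is given as contiguous segments with an integer rock class; these field names and the segment representation are hypothetical, and this is an illustrative fusion step, not a method evaluated in this paper.

```python
import numpy as np


def attach_geology(op_chainage, seg_start, seg_class, n_classes):
    """One-hot rock class of the geological segment containing each sample.

    op_chainage : (n,) chainage of each operational record
    seg_start   : (k,) ascending start chainage of each geological segment
    seg_class   : (k,) integer rock class of each segment, in 0..n_classes-1
    """
    op_chainage = np.asarray(op_chainage, dtype=float)
    seg_start = np.asarray(seg_start, dtype=float)
    # index of the last segment whose start is <= the sample's chainage
    idx = np.searchsorted(seg_start, op_chainage, side="right") - 1
    idx = np.clip(idx, 0, len(seg_start) - 1)  # samples before segment 0 fall back to it
    classes = np.asarray(seg_class)[idx]
    return np.eye(n_classes)[classes]


def fuse(op_features, op_chainage, seg_start, seg_class, n_classes):
    """Concatenate numerical operating parameters with the encoded geology."""
    geo = attach_geology(op_chainage, seg_start, seg_class, n_classes)
    return np.hstack([np.asarray(op_features, dtype=float), geo])


if __name__ == "__main__":
    ops = np.array([[1.2, 0.8], [1.1, 0.9], [1.3, 0.7]])  # e.g. two operating parameters
    print(fuse(ops, [10.0, 120.0, 300.0], [0.0, 100.0, 250.0], [2, 0, 1], 3))
```

After this alignment, the fused matrix has one row per operational sample, so any of the regression models above can consume it without changes.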

Conclusions
In this study, a novel hybrid transfer learning framework, named TRLS-SVR, is proposed to enhance the accuracy of dynamic cutterhead torque prediction for TBMs. In the proposed framework, the underlying patterns in the historical datasets were effectively divided according to the relationships among attributes. The idea of multi-task learning (MTL) was adopted to exploit the commonalities and differences across the various working modes by learning them simultaneously rather than individually, thereby capturing the public knowledge in the historical datasets. To cope with changing geological and working conditions, the idea of transfer learning was adopted, and newly collected operational data were used as a supplement to continuously update the parameters of the forecasting model. Real-world in situ operational and status parameters from a tunnel in Shenzhen, China, were used to evaluate the efficacy and superiority of the proposed framework. The experimental results demonstrated that TRLS-SVR alleviates the shortcoming of traditional statistical data-driven methods, which can only predict the average value and changing trend of the cutterhead torque but cannot predict the load dynamically and accurately. Additionally, compared with online learning methods, which focus on data close to the excavation point, the framework is more robust, because it can use the knowledge contained in historical data to reduce the impact of random noise and alleviate over-fitting. In summary, the major novelty of this study is a first attempt at combining MTL and transfer learning for dynamic cutterhead torque prediction of TBMs. Although the framework is presented in the context of dynamic cutterhead torque prediction, it can readily be extended to the status monitoring of other engineering systems, such as wind power equipment and automobiles. In the near future, we plan to further investigate the adaptable adjustment of TBM's …
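The MTL idea of learning the working modes "simultaneously rather than individually" can be illustrated with a classic shared-plus-specific (mean-regularized) multi-task ridge regression, in which each mode's weight vector is a common vector $w_0$ plus a mode-specific offset $v_t$. This generic formulation is chosen for illustration only and is not the TRLS-SVR objective; the function name and penalty values are assumptions, and the model has no intercept (center the data or append a constant feature).

```python
import numpy as np


def fit_shared_plus_specific(tasks, lam_shared=1e-3, lam_specific=1.0):
    """Multi-task ridge: mode t predicts with weights w0 + v_t.

    Minimizes sum_t ||y_t - X_t (w0 + v_t)||^2
              + lam_shared * ||w0||^2 + lam_specific * sum_t ||v_t||^2.
    tasks : list of (X_t, y_t) pairs, one per working mode (same feature dim d).
    A large lam_specific pulls every mode toward the shared model w0.
    Returns (w0, [v_1, ..., v_T]).
    """
    T = len(tasks)
    d = tasks[0][0].shape[1]
    blocks, targets = [], []
    for t, (X, y) in enumerate(tasks):
        Z = np.zeros((X.shape[0], d * (T + 1)))
        Z[:, :d] = X                        # columns multiplying w0
        Z[:, d * (t + 1):d * (t + 2)] = X   # columns multiplying v_t
        blocks.append(Z)
        targets.append(np.asarray(y, dtype=float))
    Z = np.vstack(blocks)
    y = np.concatenate(targets)
    penalty = np.concatenate([np.full(d, lam_shared), np.full(d * T, lam_specific)])
    theta = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y)
    return theta[:d], [theta[d * (t + 1):d * (t + 2)] for t in range(T)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w_true = np.array([1.0, -1.0, 2.0])
    offsets = [np.array([0.5, 0.0, 0.0]), np.array([-0.5, 0.0, 0.0])]
    Xs = [rng.normal(size=(200, 3)) for _ in offsets]
    w0, V = fit_shared_plus_specific([(X, X @ (w_true + v)) for X, v in zip(Xs, offsets)])
    print(w0, V)
```

Because the offsets are penalized more heavily than the shared vector, whatever the modes have in common is absorbed into $w_0$, while each $v_t$ only records how its mode departs from it.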

Figure 2 .
Figure 2. Longitudinal geological profile of a tunnel. Reproduced with permission from [1 2019.

Figure 3 .
Figure 3. The proposed dynamic cutterhead torque prediction framework.

Figure 4 .
Figure 4. Geological sampling results and the TBM used. (a) Geological sampling results. (b) The TBM used.

Figure 5 .
Figure 5. Comparison between actual and predicted cutterhead torque for dataset 1. (a) Prediction result of RF. (b) Prediction result of LSTM. (c) Prediction result of SVR. (d) Prediction result of Lasso. (e) Prediction result of OSVR. (f) Prediction result of TRLS-SVR.

Figure 6 .
Figure 6. Comparison between actual and predicted cutterhead torque for dataset 2. (a) Prediction result of RF. (b) Prediction result of LSTM. (c) Prediction result of SVR. (d) Prediction result of Lasso. (e) Prediction result of OSVR. (f) Prediction result of TRLS-SVR.

Figure 7 .
Figure 7. Comparison between actual and predicted cutterhead torque for dataset 3. (a) Prediction result of RF. (b) Prediction result of LSTM. (c) Prediction result of SVR. (d) Prediction result of Lasso. (e) Prediction result of OSVR. (f) Prediction result of TRLS-SVR.

Figure 8 .
Figure 8. Comparison between actual and predicted cutterhead torque for dataset 4. (a) Prediction result of RF. (b) Prediction result of LSTM. (c) Prediction result of SVR. (d) Prediction result of Lasso. (e) Prediction result of OSVR. (f) Prediction result of TRLS-SVR.

Figure 9 .
Figure 9. Comparison between actual and predicted cutterhead torque for dataset 5. (a) Prediction result of RF. (b) Prediction result of LSTM. (c) Prediction result of SVR. (d) Prediction result of Lasso. (e) Prediction result of OSVR. (f) Prediction result of TRLS-SVR.

Table 1 .
Basic parameters of the TBM used.

Table 2 .
$R^2$ of different methods in five datasets.

Table 3 .
MAE of different methods in five datasets.

Table 4 .
RMSE of different methods in five datasets.

Table 5 .
MAPE of different methods in five datasets.