A Novel Principal Component Analysis Integrating Long Short-Term Memory Network and Its Application in Productivity Prediction of Cutter Suction Dredgers

Dredging is a fundamental construction activity for waterway improvement, harbor basin maintenance, land reclamation, environmental dredging, and deep-sea mining. The dredging process of cutter suction dredgers is so complex that the operational data show strong dynamics, nonlinearity, and time delay, which makes it difficult to predict productivity accurately with first-principles models. In this paper, we propose a novel integrated PCA-LSTM model to improve the productivity prediction of cutter suction dredgers. Firstly, multiple variables are reduced in dimension and selected by the PCA method based on the working mechanism of the cutter suction dredger. Then productivity is predicted via the mud concentration in a long short-term memory network fed with the relevant operational time-series data. Finally, the proposed method is successfully applied to an actual case study in China. It also performs well in cross-validation and a comparative study thanks to several important characteristics: (i) it selects operational parameters based on mechanism analysis; and (ii) it is a deep-learning-based approach that can handle operational series data with a special memory mechanism. This study provides a heuristic idea for integrating data-driven methods with the supervision of human knowledge in practical engineering applications.


Introduction
Marine-based transportation has always played a critical role in the national economy of China [1], while rivers may suffer from sediment accumulation that obstructs waterways and reduces their carrying capacity [2,3]. Cutter suction dredgers are common and useful machines that can remove the mud deposited at the bottom of the water and keep transportation routes in good condition [4]. Dredging productivity is one of the most important indexes for evaluating dredging performance, and it is affected by many factors such as soil properties, pump power, and the cutter's structural parameters [5]. The process of sand being cut into a mixture of mud and water by a rotating cutter is very complicated. Most of the parameters are dynamically influenced by the uncertain working environment and human operation [6]. Due to the limitations of dredging technology, there are real obstacles to parameter monitoring and real-time prediction, which makes it challenging to construct digital models that describe this process and the dredging productivity accurately [7].
With the development of sensor technology, more operational data have become available for analyzing dredging performance. In the literature, machine learning methods have recently been adopted to model the complex and dynamic construction process of the CSD (cutter suction dredger) for their excellent learning and mining ability [8]. Generally, learning-based prediction models can be divided into two main types based on the depth of their structure: shallow learning models and deep learning models [9]. The shallow learning methods mainly cover neural-network-based methods such as RBF (radial basis function) networks, ELMs (extreme learning machines), and SVM (support vector machine). Among traditional learning models used in productivity prediction, Wang et al. adopted an RBF neural network to deal with different working conditions and established an accurate nonlinear mathematical model for instantaneous output prediction with control variables [10]. Guan et al. modeled the cutter operation parameters using improved ELMs to simulate and predict the productivity distribution in actual construction [11]. Yang et al. predicted cutter suction dredger production with a double-hidden-layer BP neural network [12].
The deep learning prediction methods mainly include: DNN (deep neural network), DBN (deep belief network), CNN (convolutional neural network), and RNN (recurrent neural network). DNNs are built by stacking multiple auto-encoders (AEs) or denoising auto-encoders (DAEs), wherein features are extracted from high-dimensional unlabeled input data so that the distribution of the original data is represented by the deep network [13]. Wang et al. developed DNN models for production forecasting wherein the data-driven method handled hydraulic fractures and their intrinsic complexity well [14]. The DNN architecture replaces the sigmoid function with ReLU and maxout to overcome gradient vanishing, but it requires mini-batch training, which can lead to over-fitting and local-optimum problems. DBN is also a deep network, stacked from multiple Restricted Boltzmann Machines (RBMs) and a classification or regression layer. Xu et al. designed a DBN-based model to approximate the function-type coefficients of a state-dependent autoregressive model in a nonlinear system and realize predictive control [15]. Hu et al. adopted a DBN to extract deep hidden features behind monitoring signals and predict the remaining useful life of bearings [16]. Researchers have improved the DBN by combining it with a feed-forward neural network (FNN) to make predictions more accurate [17]. Zhang proposed a multi-target DBN ensemble method in which the outputs of multiple DBNs are weighted to yield the final output of the network set; this method performed well on NASA aero-engine data [18]. Furthermore, convolutional neural networks (CNNs) have developed greatly thanks to the excellent characteristics of parameter sharing and spatial pooling, which give them advantages in computing speed and accuracy [19]. However, all these ML methods are limited in situations that involve time-series input.
Recurrent neural networks (RNNs) add a twist: the output from the previous time step is fed as input to the current step. The most important feature of an RNN is that the hidden state can remember the information calculated over the previous sequence [20]. Thus, it can generate output based on prior input (the past memory) and what it learns in training. Parameter learning in recurrent neural networks is commonly done by back-propagation through time, wherein the error is propagated step by step backward through time. In [21], a learning-based method is applied to improve the RNN training process as the number of prediction time steps increases. However, RNNs still suffer from the long-term dependency problem, and long short-term memory (LSTM) fills the gap by introducing gate control units that can select and keep useful information from long sequential data. Unlike a traditional RNN, the model is trained on both the stored information of the last time step and the new input of the current moment, which greatly enhances prediction accuracy and stability [22].
However, for the practical application in CSDs, analyzing the interrelated influencing factors is just as significant as the productivity prediction itself. LSTM lacks effective processing for the high-dimensional characteristics of large-scale data; it should be integrated with other methods. Principal component analysis (PCA) is one of the most widely used algorithms for feature reduction; it reconstructs the main k'-dimensional features from the original n-dimensional features through a linear transformation. Since PCA is a purely data-driven method that cannot account for the causal relationships and correlations between variables, a procedure of variable analysis based on the working mechanism and human experience is necessary. Yang et al. described an HEPCA model, which supplemented variables based on expert knowledge after the PCA process and generated a more accurate input for the predictive model [23].
Therefore, combining the advantages and characteristics of the different methods described above, this paper presents a long short-term memory model integrating principal component analysis (PCA-LSTM) to predict productivity from monitoring sensor data. The PCA-LSTM is structured into four phases. In the first phase, monitoring sensors are analyzed to select related variables according to the working mechanism and domain knowledge. In the second phase, the PCA method is applied to extract deep features from the high-dimensional dataset and to obtain the correlations among variables. In the third phase, a prediction model is built and trained with the LSTM network. Finally, cross-validation and comparative analysis are conducted with a model generated from the dredger "Chang Shi 10" in China.

Preliminaries
In this section, the related preliminaries regarding PCA and LSTM will be introduced briefly on the basis of the practical application in this study.

Principal Components Analysis (PCA)
PCA is an important technique that can transform multiple variables into a few main components (comprehensive variables) by means of dimensional reduction, increasing interpretability while minimizing information loss [24]. These main components are usually expressed as linear combinations of the original variables, which can represent most of the information of the whole dataset.
For original data $X = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ with $X \in \mathbb{R}^{k \times n}$, we can compute the covariance matrix. After centralizing the data, the mean function $E[X]$ is zero and:

$$C_X = \frac{1}{n} X X^T$$

Assuming that there is a matrix $P$ ($P \in \mathbb{R}^{k' \times k}$) through which we can transform the original sample data matrix $X$ into a dimensionality-reduced matrix $Y$ ($Y \in \mathbb{R}^{k' \times n}$):

$$Y = P X$$

Then the original data dimension is successfully reduced from $k$ to $k'$, wherein the first $k'$ principal components explain most of the variance.
For matrix $Y$, its covariance matrix can be expressed through the original matrix $X$ as:

$$C_Y = \frac{1}{n} Y Y^T = \frac{1}{n} (PX)(PX)^T = P C_X P^T \tag{4}$$

It is obvious from Equation (4) that $C_X$ is guaranteed to be a non-negative definite matrix and is thus diagonalizable by an orthogonal matrix. The optimization objective is therefore to find an orthonormal transformation matrix $P$. Normally, we can use eigenvalue decomposition or singular value decomposition to solve for $P$, and the first $k'$ new features, corresponding to the $k'$ largest eigenvalues, represent the whole dataset best.
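The eigendecomposition route above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; it uses the rows-as-samples convention, i.e. the transpose of the $X \in \mathbb{R}^{k \times n}$ layout in the text:

```python
import numpy as np

def pca_reduce(X, k_prime):
    """Reduce k-dimensional samples (rows of X, shape n x k) to k' dimensions.

    Steps as in the text: centralize, form the covariance matrix C_X,
    eigendecompose it, and project onto the eigenvectors of the k' largest
    eigenvalues (the rows of P).
    """
    X_centered = X - X.mean(axis=0)            # E[X] = 0 after centralizing
    C_x = X_centered.T @ X_centered / len(X)   # covariance matrix, k x k
    eigvals, eigvecs = np.linalg.eigh(C_x)     # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
    P = eigvecs[:, order[:k_prime]].T          # P in R^{k' x k}
    Y = X_centered @ P.T                       # reduced data, n x k'
    explained = eigvals[order[:k_prime]].sum() / eigvals.sum()
    return Y, P, explained
```

The `explained` ratio corresponds to the variance share that the first $k'$ components carry, which is how the contribution plot in the case study is produced.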

Long Short-Term Memory Network (LSTM)
In this paper, an integrated model of a long short-term memory network based on principal component analysis (PCA-LSTM) is explored to analyze the operational time-series data generated by the dredging process. The proposed model is developed on the basis of the long short-term memory network (LSTM), a special form of recurrent neural network (RNN) that can address long-distance dependencies and delays in time-series modeling.
The LSTM architecture was first proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997 [25]. A special memory cell unit is added to the original hidden layer of the classic RNN architecture. The cell state is controlled by three gates: input gate $I_t$, forget gate $F_t$, and output gate $O_t$, as shown in Figure 1. The forget gate $F_t$ decides which information needs to be kept and which can be forgotten. The information consists of the current input $X_t$ and the previous hidden state/short-term memory $h_{t-1}$:

$$F_t = \sigma(W_{Forget} \cdot [h_{t-1}, X_t] + bias_{Forget})$$

For every time step, the sigmoid function generates values between 0 and 1 that indicate whether the old information is necessary: 0 denotes forget, and 1 means keep. $W_{Forget}$ is the weight matrix of the forget gate, and $bias_{Forget}$ is the connection bias.
The input gate decides what part of the new information should be stored in the long-term memory. It works on the current input $X_t$ and the previous short-term memory $h_{t-1}$ through two layers. In the first layer, the short-term memory and current input are passed through a sigmoid function whose output ranges from 0 (not important) to 1 (important):

$$I_t = \sigma(W_{Input} \cdot [h_{t-1}, X_t] + bias_{Input})$$

where $W_{Input}$ is the weight matrix of the sigmoid operator in the input gate and $bias_{Input}$ is the bias vector. The second layer uses the tanh function to regulate the network. The tanh operator creates a candidate vector $\tilde{C}_t$ with all possible values between −1 and 1:

$$\tilde{C}_t = \tanh(W_{Cell} \cdot [h_{t-1}, X_t] + bias_{Cell})$$

where $W_{Cell}$ is the weight matrix of the tanh operator and $bias_{Cell}$ is the bias vector. With these two layers' outputs, the cell updates to a new cell state (long-term memory):

$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$

where $\odot$ is the Hadamard product.
When it comes to the output gate, the current input $X_t$, the previous short-term memory $h_{t-1}$, and the newly obtained cell state $C_t$ determine the new short-term memory (hidden state) that will be passed on to the cell at the next time step:

$$O_t = \sigma(W_{Output} \cdot [h_{t-1}, X_t] + bias_{Output})$$

$$h_t = O_t \odot \tanh(C_t)$$

where $W_{Output}$ is the weight matrix of the output gate. This hidden state is used for prediction. Both the new cell state and the hidden state are carried over to the next time step.
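The gate equations above amount to a single forward step per time step. The following NumPy sketch stacks the four weight matrices into one array for brevity; the shapes and stacking order are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step following the gate equations above.

    W maps the concatenated [h_prev, x_t] to the four gates; b is the bias.
    Shapes: x_t (n_in,), h_prev/c_prev (n_hidden,),
            W (4*n_hidden, n_hidden + n_in), b (4*n_hidden,).
    """
    n_h = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0 * n_h:1 * n_h])       # forget gate F_t
    i_t = sigmoid(z[1 * n_h:2 * n_h])       # input gate I_t
    c_tilde = np.tanh(z[2 * n_h:3 * n_h])   # candidate cell state
    o_t = sigmoid(z[3 * n_h:4 * n_h])       # output gate O_t
    c_t = f_t * c_prev + i_t * c_tilde      # new cell state (Hadamard products)
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t
```

In practice a framework implementation (e.g. a Keras or PyTorch LSTM layer) would be used; the sketch only makes the data flow through the three gates explicit.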

The Proposed PCA-LSTM Model
As described above, the basic knowledge of PCA and LSTM networks was introduced to set up the proposed PCA-LSTM model presented in this section. By considering the mechanism and human experience, the PCA procedure yields a more accurate variable analysis for a practical multi-sensor system. The time-series data of the effective variables are subsequently learned by the LSTM network to output the target prediction.

PCA Based on Mechanism
The traditional PCA was introduced in Section 2.1. Because the process is purely data-driven, historical data are analyzed in PCA without any prior knowledge, which may cause redundant variables to be retained regardless of the causal relationships. Therefore, human experience is introduced to guide the variable selection procedure ahead of PCA, based on the known mechanism.
The monitoring system always contains a broad range of sensor data related to the target object. Some of the data are control variables, while others are merely display variables that visualize the parameters. Assume the sensor system obtains an initial dataset:

$$X = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$$

where $x_i$ represents the $i$-th sensor equipped in the system, with

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots)$$

where $x_{ij}$ represents the $j$-th data point obtained by the $i$-th sensor. When studying the working mechanism of the target, the causal relationships between variables are analyzed and some of the redundant variables are deleted, as well as some meaningless display parameters. This creates a new sample set. PCA based on the human-experience method then obtains a hyperplanar representation of all samples through nearest reconstruction, realizing the dimension reduction from $k$ to $k'$ with the least loss.
The samples are first centralized so that $\sum_i x_i = 0$. Then a new coordinate system $W = \{w_1, w_2, \ldots, w_k\}$ can be obtained after projection transformation, where $w_i$ is a standard orthonormal basis vector with $\|w_i\|_2 = 1$ and $w_i^T w_j = 0$ for $i \neq j$.
If a portion of the coordinates is abandoned, i.e., the dimension is reduced from $k$ to $k'$ ($k' < k$), the projection of sample $x_i$ in the low-dimensional coordinate system is $z_i = (z_{i1}, z_{i2}, \ldots, z_{ik'})$, where $z_{ij} = w_j^T x_i$ is the $j$-th coordinate of $x_i$ in the low-dimensional space; and $x_i$ can be reconstructed as:

$$\hat{x}_i = \sum_{j=1}^{k'} z_{ij} w_j$$

For the whole training dataset, the distance between the original samples $x_i$ and the reconstructed samples $\hat{x}_i$ can then be written as:

$$\sum_{i=1}^{m} \left\| \hat{x}_i - x_i \right\|_2^2 \;\propto\; -\mathrm{tr}\!\left( W^T \left( \sum_{i=1}^{m} x_i x_i^T \right) W \right) + const$$

where $const$ is a constant term. Since $\sum_i x_i x_i^T$ is a covariance matrix, minimizing the distance is equivalent to:

$$\min_W \; -\mathrm{tr}\!\left( W^T X X^T W \right), \quad \text{s.t. } W^T W = I$$

where $I$ is the identity matrix.
With the Lagrange multiplier method [26], the condition can be derived as:

$$X X^T W = \lambda W$$

After eigenvalue decomposition, the eigenvalues are obtained and sorted in descending order: $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k$. According to the practical demand, a reconstruction threshold $\mu$ is set to satisfy the condition:

$$\frac{\sum_{j=1}^{k'} \lambda_j}{\sum_{j=1}^{k} \lambda_j} \geq \mu$$

The eigenvectors corresponding to the first $k'$ eigenvalues constitute the PCA solution $W^* = (w_1, w_2, \ldots, w_{k'})$, and the variables corresponding to these eigenvectors are retained. Based on the variables obtained by the PCA procedure above, the correlation matrix can be calculated, and the variables most positively correlated with the target are carried forward into the subsequent prediction model.
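The reconstruction-threshold rule above amounts to picking the smallest $k'$ whose leading eigenvalues explain at least a fraction $\mu$ of the total variance. A minimal sketch (the threshold value is illustrative):

```python
import numpy as np

def select_by_threshold(eigvals, mu=0.97):
    """Pick the smallest k' whose leading eigenvalues explain >= mu of variance.

    eigvals: eigenvalues of the covariance matrix, in any order.
    Returns k', the number of principal components to keep.
    """
    ratios = np.cumsum(np.sort(eigvals)[::-1]) / eigvals.sum()
    return int(np.searchsorted(ratios, mu) + 1)
```

This is the rule that produces the "top 10 components explain more than 97%" result in the case study, for an appropriate choice of $\mu$.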

The Proposed Methodology
The variables most related to the target, obtained by PCA based on human experience, can be used as inputs to the subsequent LSTM network to obtain prediction results. Namely, with the current input $X^{*}_t$ being the selected variable set at time $t$, the current cell state and hidden state are updated as described in Section 2.2.
Based on the new cell state and hidden state, we define the gradients $\delta_h^{(t)}$ and $\delta_c^{(t)}$ to calculate the back-propagation error layer by layer:

$$\delta_h^{(t)} = \frac{\partial L}{\partial h^{(t)}}, \qquad \delta_c^{(t)} = \frac{\partial L}{\partial C^{(t)}}$$

where $L(t)$ is the loss function. At the last sequence index $\tau$, the gradients can be written as:

$$\delta_h^{(\tau)} = \frac{\partial L}{\partial h^{(\tau)}}, \qquad \delta_c^{(\tau)} = \delta_h^{(\tau)} \odot O^{(\tau)} \odot \left(1 - \tanh^2\!\left(C^{(\tau)}\right)\right)$$

Therefore, for any moment $t$, $\delta_h^{(t)}$ is assembled from the gradient of the current layer and the gradient passed back from step $t+1$ through the coefficient matrix $W$. The reverse gradient error $\delta_c^{(t)}$ can then be obtained from the gradient error returned through $h^{(t)}$ and the gradient error of the next time step $\delta_c^{(t+1)}$:

$$\delta_c^{(t)} = \delta_c^{(t+1)} \odot F^{(t+1)} + \delta_h^{(t)} \odot O^{(t)} \odot \left(1 - \tanh^2\!\left(C^{(t)}\right)\right)$$

Then the gradients of all parameters can be calculated easily using $\delta_h^{(t)}$ and $\delta_c^{(t)}$, and all parameters can be updated iteratively to minimize the error.
As described above, the proposed method runs according to Figure 2. It mainly consists of two parts: PCA and LSTM. The variables most related to the target are first obtained by PCA guided by expert knowledge and are then used as inputs to the LSTM network to get the prediction results.

Case Study
The cutter suction dredger is a special kind of ship that is widely used in dredging engineering. In this section, the proposed method is validated in a real case study of the well-equipped 4500 m³/h cutter suction dredger "Chang Shi 10", which serves in the Yangzi River region.
Mud and sand are cut and mixed with water by a rotary cutter during the construction operation of the dredger. Meanwhile, the dredge pump works and creates vacuum pressure at the suction mouth of the cutter. Under this great pumping force, mud is sucked into the dredger pipeline and finally discharged to the dumping area. The primary system in the dredging procedure is highlighted in Figure 3.

Principal Components Analysis Based on Mechanism and Knowledge
During construction, mud formation is influenced by many factors such as soil type, the mechanical parameters and rotation speed of the cutter, the traverse speed of the dredger, dredge pump parameters, and so on. To monitor and control the dredging process, up to 255 specific real-time sensors were arranged to collect the operational data [27]. Figure 4 shows some of the related monitoring parameters and their relationships in the automatic control system. As shown in Figure 4, some of the parameters are control variables, while others are only display variables that visualize the data.
Soil property is an important factor affecting the construction process and efficiency of cutter suction dredgers. Depending on the soil's solidity and water solubility, the achievable mud concentration is limited by cutting performance and silt mixing. The cutter structure, pipeline diameter, and pump motor power are all fixed (constant) variables, determined by the rated productivity requirement at the design stage. However, the cutter speed, trolley trip, cutter ladder movement, and dredge pump rotation are all control variables that can be adjusted during the operation process in a specific construction. When digging hard soil, the dredging depth should be reduced while enhancing the cutter speed to prevent the formation of large-diameter mud balls and pipe blocking. When the dredged soil is sediment or silt, the pump velocity should be appropriately increased to reduce the mud concentration and avoid sedimentation or clogging in the pipeline.
According to the actual sensor system of the cutter suction dredger "Chang Shi 10", we first select 20 variables from the initial operational dataset, as shown in Table 1 (for example: depth of dredging (m); S 12, rotation speed of the submersible pump (rpm); S 13, rotation speed of the cutter (rpm); S 20, flow (m³/h); S 23, soil density (kg/m³)). Traditionally, the instantaneous productivity of the cutter suction dredger is the product of the flow and the mud concentration:

$$P = Q \cdot C_m$$

where $C_m$ (%) represents the mud concentration and $Q$ is the flow amount per hour, which depends on the flow rate $v$ in the pipeline. As shown in Table 1, we choose S 21 (mud concentration) as the target variable. In the actual dredging construction process, the change of flow rate in the sludge pipeline is one of the important factors affecting the flow. Thus, we delete the redundant variable S 20 (flow) in the first step.
Meanwhile, the mud concentration is determined by the densities of the soil, water, and mud:

$$C_m = \frac{\gamma_m - \gamma_w}{\gamma_s - \gamma_w} \times 100\%$$

where $\gamma_m$ is the mud density; $\gamma_w$ is the water density; and $\gamma_s$ is the soil density. Then we drop three redundant variables, S 223, S 23, and S 164, in the second step. For the study period in this case, the ship works with only the No.1 dredge pump. Thus, the variables related to the No.2 dredge pump are not meaningful to the productivity; namely, S 101 and S 200 are dropped according to human analysis. Finally, we obtain the related variable set: X = {S 8 , S 182 , S 108 , S 13 , S 9 , S 201 , S 12 , S 198 , S 100 , S 199 , S 165 , S 79 , S 80 , S 21 }. As described in Section 3.1, the selected variable set X based on human experience is then processed by PCA. The contribution result is shown in Figure 5.
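The subsequent correlation-based screening can be sketched as follows. This is a hypothetical helper for illustration; the sensor names in the test are placeholders, and `n=9` matches the nine inputs used in the prediction model later:

```python
import numpy as np

def top_positive_correlations(data, names, target, n=9):
    """Rank variables by Pearson correlation with the target column.

    data: array of shape (samples, variables); names: column names;
    target: name of the target column (e.g. the mud concentration S21).
    Returns the n variables most positively correlated with the target.
    """
    corr = np.corrcoef(data, rowvar=False)   # full correlation matrix
    t = names.index(target)
    ranked = sorted((c, v) for v, c in zip(names, corr[t]) if v != target)
    return [v for c, v in reversed(ranked)][:n]
```

Note that ranking by the signed correlation (rather than its absolute value) follows the paper's choice of keeping the most *positively* relevant variables.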
It is obvious that the top 10 principal components represent more than 97% of the overall data. For the top two principal components, the dataset can be plotted as in Figure 6.
The most positively relevant variables to target can be further determined by the correlation matrix, as shown in Figure 7.
As the correlation matrix shows, the correlation between S 21 and S 199 is 0.48677, which means the discharge pressure of the No.1 dredge pump affects the concentration most. This is consistent with practical production: pressure influences the proportion of mud and water pumped into the pipeline. The variable S 165 shows a correlation of 0.34628, which has also been noted by other researchers [5,27]; the flow rate may determine the mud sedimentation during pipeline transportation. Furthermore, the vacuum correlation is 0.34152, since the vacuum gauge is installed on the upper part of the cutter, which is sensitive to changes in the mud concentration in the pipeline. Additionally, the angle of the cutter ladder, the depth of dredging, and the trolley trip are all factors through which the operators affect mud formation. The discharge of the submersible pump, however, is only an indirect factor reflecting the vacuum condition.
In general, the mud concentration is mainly inter-influenced by the dredge pump pressure, flow rate, vacuum, cutter ladder angle, dredging depth, and trolley trip.

Modeling Prediction Analysis
In this section, we choose the first segment of series data and follow the steps given in Sections 2.2 and 3.2 to train the proposed model. This segment is collected from the monitoring system at a frequency of 100 sample points per minute. We intercept a dataset of 18,000 points covering a 3-hour working time zone and obtain 16,764 samples after pre-processing.
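Before training, the pre-processed series must be sliced into fixed-length input windows paired with next-step targets, since an LSTM consumes sequences. A minimal sketch; the window length of 50 steps is an assumption for illustration, as the paper does not state one:

```python
import numpy as np

def make_windows(series, targets, window=50):
    """Slice multivariate time-series data into fixed-length windows for an LSTM.

    series: (T, n_features) array of the selected input variables;
    targets: (T,) array of the target (mud concentration);
    window: number of past time steps per training sample.
    Each window of past readings is paired with the next-step target value.
    """
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # past `window` readings
        y.append(targets[i + window])    # value at the next step
    return np.array(X), np.array(y)
```

The resulting `X` has shape `(samples, window, n_features)`, the layout expected by standard LSTM layer implementations.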

Learning Results Analysis
Following the variable selection process in Section 4.1, we use the nine most positively related parameters as input to predict the target output concentration C m (%). Considering the effect of data volume on the learning ability of data-driven models, we split the input with proportions of 6:4 and 7:3 to test the model twice. The learning results are shown in Figures 8 and 9, respectively. Concentration changes with the working conditions. As shown in the learning results, the normal range of the concentration is from 0 to 45%, which is a comprehensive result of the interaction of multiple factors. High concentration is not necessarily good for production since it may cause sedimentation or clogging in the pipeline. The results in this case are all normal and satisfactory. However, in the detailed comparison, the learning process with 60% of the dataset performs better than with 70%. For the 60% data training, the maximum and minimum errors are 0.3091 and 0.0149, respectively; in the 70% data training process, the maximum error is 0.526. Also, as shown in Figure 10, for the 60% dataset the loss value decreases and then stays steady during training, and the testing error falls and then stays steady as well. For the 70% dataset, however, both training and testing errors are less stable and consistent.

Cross Validation
Considering the necessary adaptability to dynamic changes, we use another dataset of 36,000 points covering a 6-hour working time zone and obtain 31,304 samples for further cross-validation, to illustrate the proposed method's effectiveness and generality. The learning results are shown in Figure 11. It is obvious that the proposed method performs well in both the training and testing processes. The average error in cross-validation is 1.021%, which decreases as the data volume grows. In other words, data volume is essential for a deep learning method to function properly. This is precisely the advantage we exploit in this model for prediction with operational "big data". In particular, the model can be updated with newly arriving data for more accurate results.

Comparative Study
This paper presents the novel PCA-LSTM method, which combines the advantages of PCA and the deep learning algorithm LSTM to manage big time-series data in operation monitoring systems. We compare the proposed method with other prediction methods, including traditional PCA-LSTM and plain LSTM, using the same dataset as Section 4.2 for further analysis. The results are shown in Figures 12 and 13. It is obvious in Figure 12 that the proposed method works better, with a satisfactory error range. LSTM shows the maximum deviation because there is no variable selection before the prediction process: although it is a powerful tool for big series data thanks to its special gate control function, it cannot attend to the variable analysis.
In Figure 13, it is easy to see that the novel PCA-LSTM performs better than both traditional PCA-LSTM and LSTM in the test. The yellow line in the figure marks the proposed PCA-LSTM, which has the lowest mean absolute error (MAE) of 0.9213%. The green line marks the traditional PCA-LSTM, which shows an MAE of 1.5301%. LSTM shows the worst result, with an MAE of 2.0269%. The differences in the results are mainly caused by the variable selection for the prediction model. As the input of a data-driven model, variable selection should be given more attention with human knowledge and experience.
From a practical point of view, the comparative results are also analyzed with different evaluation indicators: MAE (mean absolute error), R² (coefficient of determination), and RMSE (root mean square error). As shown in Table 2, all of the models show a good coefficient of determination, which confirms the effectiveness of LSTM. However, in terms of root mean square error, the proposed method shows better stability in the prediction results. The comparative results indicate that controlling the input is essential for machine learning methods.
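For reference, the three indicators in Table 2 can be computed with their standard definitions (this is a generic sketch, not code from the paper):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the three indicators used in Table 2: MAE, RMSE, and R^2."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae = np.mean(np.abs(y_true - y_pred))                 # mean absolute error
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))        # root mean square error
    ss_res = np.sum((y_true - y_pred) ** 2)                # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)         # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                             # coefficient of determination
    return mae, rmse, r2
```

RMSE penalizes large deviations more heavily than MAE, which is why it is the better indicator of prediction stability in this comparison.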

Conclusions
This paper proposes the novel PCA-LSTM method for the productivity prediction of cutter suction dredgers, wherein the deep learning process makes good use of real-time operational monitoring data. A PCA method based on mechanism and knowledge is proposed to analyze the multiple parameters and select relevant variables from the operation process. The results are then used as input to the LSTM model to obtain the target prediction. This approach is validated successfully by comparison against other methods on a real-world case in China. The productivity of a cutter suction dredger is influenced by many correlated factors, such as soil characteristics, cutter parameters, mud pump performance, and pipeline layout. Thus, the mud concentration should be stabilized at a suitable value through comprehensive adjustment to improve efficiency and productivity.
Nevertheless, this work remains only an initial extension of deep learning to the productivity prediction of cutter suction dredgers. In the future, we will construct dynamic predictive models that follow the changing working conditions. When the operational parameters change dynamically under different conditions, the generated data should be classified into a status space to study how the operation influences dredging performance. Additionally, considering the distances between sensors in the system, more time-delay factors should be incorporated to improve prediction accuracy.