Dynamic Displacement Forecasting of Dashuitian Landslide in China Using Variational Mode Decomposition and Stack Long Short-Term Memory Network

In recent decades, landslide displacement forecasting has received increasing attention because of its potential to reduce landslide hazards. To improve forecast accuracy, a dynamic forecasting model based on variational mode decomposition (VMD) and a stack long short-term memory (SLSTM) network is proposed. VMD decomposes the landslide displacement into displacement subsequences, and an SLSTM network forecasts each subsequence. The forecast values of landslide displacement are then obtained by reconstructing the forecast values of all displacement subsequences. In addition, the SLSTM networks are updated by adding the forecast values to the training set, which realizes dynamic displacement forecasting. The proposed model was verified on the Dashuitian landslide in China. The results show that the proposed model achieves higher forecast accuracy than two advanced forecasting models, the long short-term memory (LSTM) network and the empirical mode decomposition (EMD)-LSTM network.


Introduction
Landslides, as one of the most widespread and frequent natural hazards all over the world, not only directly threaten human life but also cause tremendous damage to the human living environment, resources, and property. To minimize the losses caused by landslide hazards, many scholars have carried out research on landslide prediction [1][2][3][4][5]. Among them, landslide displacement forecasting has been a contentious issue [6][7][8][9].
At present, one of the most widely used ideas for landslide displacement forecasting is to decompose the original landslide displacement, and then forecast each subsequence separately, and finally reconstruct all forecast values to obtain the forecast results. Following this idea, in [10], wavelet analysis is used to decompose the landslide displacement and a particle swarm-optimized support vector machine is used as the forecasting model. In [11], empirical mode decomposition (EMD) is matched with an extreme learning adaptive neuro-fuzzy inference system for landslide displacement forecasting. In [12], ensemble empirical mode decomposition (EEMD) is adopted to analyze the landslide displacement, which avoids the mode aliasing problem in EMD. Similarly, the authors of [13] also used the EEMD technique to decompose rainfall, reservoir level, and landslide cumulative displacement sequences into residual sequence and finite intrinsic mode functions with frequencies ranging from high to low. Moreover, the forecasting model combined with intelligent algorithms, such as genetic algorithm-least squares support vector machine [14], genetic algorithm-back propagation neural network [15], and particle swarm-optimized support vector machine [16], is favored by researchers.
Although these studies have made progress, shortcomings remain in both displacement analysis and forecasting models. For displacement analysis, wavelet analysis makes it difficult to choose the basis function and estimate the wavelet order; the number of components produced by EMD and EEMD cannot be controlled, and excessive decomposition increases the forecasting workload. For forecasting models, neural networks depend to some extent on the size of the data sample, support vector machine parameters are usually set manually and therefore somewhat arbitrarily, and intelligent algorithms can easily fall into a local optimum. Therefore, analyzing the original time series more rigorously and establishing a high-performance forecasting model remain the main difficulties of landslide displacement forecasting research.
To improve the forecast accuracy of landslide displacement, a dynamic forecasting model based on variational mode decomposition (VMD) and a stack long short-term memory (SLSTM) network is proposed. The main contributions of this paper are as follows: (1) VMD, which rests on a strict mathematical theory and allows the number of decomposed components to be controlled, is applied to displacement analysis and yields stable and fixed displacement subsequences; (2) An SLSTM network with "3 + 1" layers is designed to model and forecast each displacement subsequence, which improves on the performance of a basic long short-term memory (LSTM) network; (3) A dynamic forecasting method is proposed, which realizes long-term landslide displacement forecasting.
The rest of this paper is organized as follows: Section 2 describes the basic principle and training method of the SLSTM network. Section 3 is devoted to describing the combined process of the VMD and the SLSTM network. The dynamic forecasting process based on the VMD-SLSTM network is also presented in this section. In Section 4, a real application case is introduced to verify the proposed forecasting model. Several comparative experiments are carried out to demonstrate the superiority of the proposed forecasting model. The last section draws conclusions.

LSTM Network
Recently, owing to its advantages in automatic feature extraction and its high recognition rate or prediction accuracy, deep learning has been successfully applied to speech recognition [17], action recognition [18], remaining useful life prognosis [19], traffic flow prediction [20], and other fields. As a commonly used deep learning model, the recurrent neural network (RNN) is an effective method for modeling dynamic sequences [21][22][23][24][25][26][27][28]. However, an RNN struggles to model long sequences because of the vanishing gradient problem. To solve this problem, Hochreiter et al. proposed the long short-term memory (LSTM) network [29]. It uses multiple functionally distinct gates to control neurons and store information, and can therefore retain important information over longer periods of time. Information is passed through each gate by the dot product ⊗ with the gate's activation. Each gate state relies on a set of parameters trained by the gradient descent method. Figure 1 shows the structure of the LSTM network in detail. Each gate in the LSTM network has a specific function. The forget gate $f$ decides which information should be discarded from the previous state $h_{t-1}$. The update gate $u$, computed from the input $x_t$ and the previous output, determines together with the forget gate $f$ how much weight the candidate state $\tilde{h}_t$ receives when the state $h_t$ is updated. To generate output ..., a non-linear function $g_2$ filters the current state, and the result is returned after the output gate operation. The returned partial state $y_t$ then serves as the previous output $y_{t-1}$ at the next time step. Each gate depends on the current external input $x_t$ and the previous output $y_{t-1}$. The update process of the LSTM network is given by Equation (1).
where $x_t$ is the input vector at time $t$; $W_f$, $W_h$, $W_u$, and $W_o$ are the weight matrices associated with the input unit; $R_f$, $R_h$, $R_u$, and $R_o$ are the weight matrices for the hidden layer connections; $b_f$, $b_h$, $b_u$, and $b_o$ are bias vectors; $\sigma$ is the sigmoid or tanh function; $g_1$ and $g_2$ are the non-linear activation functions; and $\otimes$ denotes the dot product operation.
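The gate description and symbol list above are consistent with the following standard form of the LSTM update. It is offered as a hedged reconstruction, not as the authors' exact Equation (1): the gate activations $f_t$, $u_t$, $o_t$ and the candidate state $\tilde{h}_t$ are notation assumed for this sketch, the output-gate parameters are written $W_o$, $R_o$, $b_o$, and $\otimes$ is applied element-wise, as is usual in LSTM formulations.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + R_f y_{t-1} + b_f\right) && \text{(forget gate)} \\
u_t &= \sigma\left(W_u x_t + R_u y_{t-1} + b_u\right) && \text{(update gate)} \\
\tilde{h}_t &= g_1\left(W_h x_t + R_h y_{t-1} + b_h\right) && \text{(candidate state)} \\
h_t &= f_t \otimes h_{t-1} + u_t \otimes \tilde{h}_t && \text{(state update)} \\
o_t &= \sigma\left(W_o x_t + R_o y_{t-1} + b_o\right) && \text{(output gate)} \\
y_t &= o_t \otimes g_2\left(h_t\right) && \text{(output)}
\end{aligned}
```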

Training Method of the SLSTM Network
The architecture of the SLSTM network is shown in Figure 2. It consists of n LSTM layers and one fully connected layer. To reduce the model dimensions, the nth LSTM layer outputs a one-dimensional vector. The SLSTM network consists of two processes: forward computation and back propagation.
Similar to an artificial neural network (ANN), the forward computation of the SLSTM network is governed by the weight matrix $U$ of the input $X$, the weight matrix $W$ applied to the previous state $h_{t-1}$, the activation function $f$, and the weight matrix $V$ of the output layer. For back propagation, the SLSTM network uses the back propagation through time (BPTT) algorithm [30]. The main idea of the BPTT algorithm is to unfold the SLSTM network in time and then train it with a back propagation algorithm. Specifically, the error between the actual output and the expected output is calculated, the gradient of this error with respect to each weight is obtained, and the weights are updated iteratively to minimize the error.
Appl. Sci. 2019, 9
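The forward computation through stacked LSTM layers followed by one fully connected layer can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: every layer is scalar (hidden size 1), the gates use the logistic sigmoid, $g_1$ and $g_2$ are both taken to be tanh, and the helper names `lstm_step`, `slstm_forward`, and `make_params` are hypothetical. Training by BPTT is not shown.

```python
import math


def sigmoid(z):
    """Logistic function, assumed here for all gate activations."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, y_prev, h_prev, p):
    """One scalar LSTM step; p maps names such as 'W_f' to floats."""
    f = sigmoid(p["W_f"] * x_t + p["R_f"] * y_prev + p["b_f"])  # forget gate
    u = sigmoid(p["W_u"] * x_t + p["R_u"] * y_prev + p["b_u"])  # update gate
    h_cand = math.tanh(p["W_h"] * x_t + p["R_h"] * y_prev + p["b_h"])  # candidate
    o = sigmoid(p["W_o"] * x_t + p["R_o"] * y_prev + p["b_o"])  # output gate
    h_t = f * h_prev + u * h_cand      # updated state
    y_t = o * math.tanh(h_t)           # filtered output
    return y_t, h_t


def slstm_forward(sequence, layers, v, c):
    """Pass a sequence through stacked LSTM layers, then a linear output layer.

    layers: one parameter dict per LSTM layer, ordered bottom to top.
    v, c:   weight and bias of the final fully connected layer.
    """
    states = [(0.0, 0.0) for _ in layers]  # (y, h) per layer, zero-initialised
    for x_t in sequence:
        layer_input = x_t
        for i, p in enumerate(layers):
            y_prev, h_prev = states[i]
            y_t, h_t = lstm_step(layer_input, y_prev, h_prev, p)
            states[i] = (y_t, h_t)
            layer_input = y_t              # output of layer i feeds layer i + 1
    top_output = states[-1][0]             # last output of the top LSTM layer
    return v * top_output + c


def make_params(value):
    """Hypothetical helper: set every weight and bias of one layer to `value`."""
    names = [w + "_" + g for g in "fuho" for w in ("W", "R", "b")]
    return {name: value for name in names}


# "3 + 1" configuration: three LSTM layers plus one fully connected layer.
layers = [make_params(0.1) for _ in range(3)]
forecast = slstm_forward([0.2, 0.4, 0.6], layers, v=1.0, c=0.0)
print(forecast)
```

The weights here are arbitrary placeholders; in practice they would be learned from the training set.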

Variational Mode Decomposition
Unlike EMD, the intrinsic mode function (IMF) in VMD is redefined as an amplitude-modulated, frequency-modulated signal $u_k(t)$ [31], where $A_k(t)$ and $\phi_k(t)$ are its instantaneous amplitude and phase.
In VMD, the signal stripping method that EMD uses to obtain IMF components is discarded. Instead, after parameters such as the mode number $K$, the penalty parameter $a$, and the rising step $\tau$ are set, the signal is decomposed by finding the optimal solution of the constrained model in the variational framework given as Equation (4). In this model, each IMF component is represented by its analytic signal, $e^{-j\omega_k t}$ is the exponential term tuned to the estimated center frequency $\omega_k$ of each analytic signal, and $f(t)$ is the original signal.
To solve for the optimal solution of the constrained variational model, Equation (4) is transformed into an unconstrained model in which $\lambda$ is the Lagrange multiplier.
The saddle point of the unconstrained model, which is the optimal solution of the constrained variational model, is obtained with the alternating direction method of multipliers. The original signal $f(t)$ is thereby decomposed into $K$ IMF components.
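For reference, the three expressions discussed in this section take the following standard form in the VMD literature cited as [31]. This is a hedged restatement rather than a copy of the authors' typeset equations: the Dirac distribution $\delta(t)$, the convolution operator $*$, and the set notation $\{u_k\}$, $\{\omega_k\}$ are assumed here, and the penalty parameter is written $a$ to match the text.

```latex
% AM-FM definition of the k-th mode
\[
u_k(t) = A_k(t)\cos\bigl(\phi_k(t)\bigr)
\]

% Constrained variational model (Equation (4) in the text)
\[
\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K}
\left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)
\]

% Unconstrained model with quadratic penalty and Lagrange multiplier
\[
L\bigl(\{u_k\},\{\omega_k\},\lambda\bigr) =
a \sum_{k=1}^{K}
\left\| \partial_t\!\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2
+ \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2
+ \left\langle \lambda(t),\; f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle
\]
```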

VMD-SLSTM Forecasting Model
The landslide displacement-time curve is a non-stationary time series. If a forecasting model is applied directly to the landslide displacement, it is very difficult for the model to learn the intrinsic law of displacement change. A better idea is to decompose the original displacement into several stationary subsequences that are more suitable for modeling. Figure 3 shows the flow chart of landslide displacement forecasting based on the VMD-SLSTM network. The landslide displacement is first decomposed into $K$ components by the VMD algorithm. The SLSTM network is then used to forecast each component. Finally, the forecast values of landslide displacement are obtained by reconstructing the forecast values of all components.

Dynamic Forecasting Process
For the landslide displacement $[y_1, y_2, \cdots, y_{t-1}]$, the goal is to forecast the displacement value at time $t + q$. Figure 4 shows the dynamic forecasting process based on the VMD-SLSTM network. The main steps are as follows:
Step 1 The displacement is decomposed into $K$ components by the VMD: $[p^i_1, p^i_2, \cdots, p^i_{t-1}]$ with $i = 1, 2, \cdots, K$.
Step 2 Each component $\hat{p}^i_t$ is forecast separately using the SLSTM network.
Step 3 The sum of the $K$ forecast components gives the final forecast displacement $\hat{y}_t$.
Step 4 Each forecast component $\hat{p}^i_t$ is added to the input, and the SLSTM network is updated to forecast the next value.
Step 5 Steps 2-4 are repeated $q$ times, and the sum of the forecast components $\hat{p}^i_{t+q}$ gives the final forecast displacement $\hat{y}_{t+q}$.
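The recursive structure of Steps 1-5 can be sketched as a short Python loop. To keep the example self-contained, two stand-ins are assumed and are not the methods of this paper: `decompose_stub` replaces VMD with repeated 3-point smoothing that yields additive components, and `forecast_next_stub` replaces the SLSTM network with linear extrapolation. Because the stand-in forecaster has no trainable parameters, the network update in Step 4 reduces to appending each forecast to its component. All function names are hypothetical.

```python
def smooth(values):
    """3-point moving average with shortened windows at both ends."""
    out = []
    for j in range(len(values)):
        window = values[max(0, j - 1): j + 2]
        out.append(sum(window) / len(window))
    return out


def decompose_stub(series, k):
    """Stand-in for VMD: split a series into k components that sum to it."""
    components = []
    rest = list(series)
    for _ in range(k - 1):
        smoothed = smooth(rest)
        components.append([r - s for r, s in zip(rest, smoothed)])  # detail part
        rest = smoothed
    components.append(rest)  # remaining smooth trend
    return components


def forecast_next_stub(component):
    """Stand-in for an SLSTM network: linear extrapolation of the last two values."""
    return 2.0 * component[-1] - component[-2]


def dynamic_forecast(series, k, steps):
    """Return `steps` successive displacement forecasts beyond the history."""
    components = decompose_stub(series, k)                        # Step 1
    forecasts = []
    for _ in range(steps):                                        # Step 5
        next_values = [forecast_next_stub(c) for c in components]  # Step 2
        forecasts.append(sum(next_values))                        # Step 3
        for component, value in zip(components, next_values):     # Step 4
            component.append(value)
    return forecasts


# Synthetic displacement history in mm (illustrative values only).
history = [3.0, 5.5, 7.0, 9.5, 12.0, 13.5, 16.0, 18.5]
print(dynamic_forecast(history, k=3, steps=4))
```

Each forecast is fed back as input for the next one, so any error in an early forecast propagates to the later ones.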
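To evaluate the forecast accuracy of the VMD-SLSTM network, the mean absolute error (MAE) and root mean square error (RMSE) are adopted. In their standard form, they are defined as
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$
where $y_i$ is the observed displacement, $\hat{y}_i$ is the forecast displacement, and $N$ is the number of forecast values.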
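A minimal Python sketch of the two metrics, using their standard definitions, is given here. The function names and the sample values are illustrative assumptions and are not taken from the monitoring data.

```python
import math


def mae(observed, forecast):
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, forecast)) / len(observed)


def rmse(observed, forecast):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, forecast))
    return math.sqrt(squared / len(observed))


# Synthetic values in mm (illustrative only).
observed = [10.0, 12.0, 15.0, 11.0]
forecast = [11.0, 12.0, 13.0, 12.0]
print(mae(observed, forecast))
print(rmse(observed, forecast))
```

RMSE squares each error before averaging, so it penalizes large individual errors more heavily than MAE does.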

Dashuitian Landslide
The Dashuitian landslide, located in Baiquan Village, Longjing Township, Wushan County, Chongqing City, China (Figure 5), is taken as the research object of this paper. To ensure the safety of people's lives and property, the landslide was professionally monitored from 1 March 2009. Three GPS displacement detectors were installed to measure the deformation of the landslide. Monitoring point A1 was installed at the trailing edge of the landslide, where many cracks had developed, and the displacements measured at A1 were used to establish the forecast model. The displacement was monitored every 10 days, and 36 measured values had been obtained when deformation monitoring ended on 1 March 2010. As shown in Figure 6, the landslide displacement has an undulating, oscillating character during the evolution process because of non-periodic factors such as rainfall and human engineering activities.

Forecast Results Using the VMD-SLSTM Network
In the experiment, the measured data from the first 260 days were used as the training set, and the remaining data were used as the testing set. To ensure fidelity after displacement decomposition, the penalty parameter $a = 0.5$ and the rising step $\tau = 0.1$ were determined through multiple trial calculations. After several tests, $K = 3$ was found to give the better decomposition effect. Figure 7 shows the decomposition results of the original displacement for the training set.
Next, an SLSTM network with "3 + 1" layers (3 LSTM layers and 1 fully connected layer) was built to learn the training set. To improve the convergence speed of the network, the data were normalized by Z-score to have mean 0 and standard deviation 1. Figure 8 shows the forecast results of the displacement subsequences using the SLSTM networks. The forecast curve of each component visibly follows the changing trend of the corresponding observed displacement subsequence.
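The data preparation described here can be sketched in Python as follows. Several points are assumptions rather than details stated in the text: the first 260 days are taken to correspond to the first 26 of the 36 values at the 10-day sampling interval, the Z-score statistics are computed on the training set only and reused for the testing set, the population standard deviation is used, and the series itself is synthetic.

```python
import statistics

SAMPLE_INTERVAL_DAYS = 10  # monitoring interval stated in the text
TRAIN_DAYS = 260           # training period stated in the text


def split_by_days(series, train_days=TRAIN_DAYS, interval=SAMPLE_INTERVAL_DAYS):
    """Split a regularly sampled series into training and testing parts."""
    n_train = train_days // interval
    return series[:n_train], series[n_train:]


def zscore_fit(values):
    """Return the mean and population standard deviation of the values."""
    return statistics.mean(values), statistics.pstdev(values)


def zscore_apply(values, mean, std):
    """Normalize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


def zscore_invert(values, mean, std):
    """Map normalized values back to the original scale (for example, mm)."""
    return [v * std + mean for v in values]


# Synthetic series of 36 values standing in for the measurements at point A1.
series = [float(3 * i + (i % 4)) for i in range(36)]

train, test = split_by_days(series)
mean, std = zscore_fit(train)
train_z = zscore_apply(train, mean, std)
test_z = zscore_apply(test, mean, std)  # reuse training statistics
print(len(train), len(test))
```

Forecasts produced on the normalized scale would be passed through `zscore_invert` before errors in mm are computed.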
Finally, the forecast results of the landslide displacement were obtained by reconstructing the forecast results of all displacement subsequences. Figure 9 shows the forecast results and errors of landslide displacement. The forecast curve follows the observed curve well, indicating that the proposed model is valid. After 320 days, however, the forecast results are not ideal. This is because, although the network is retrained with the forecast values, the errors in those values accumulate and continue to impair network performance.

Comparison with Other Forecasting Models
To demonstrate the superiority of the proposed VMD-SLSTM network, two advanced forecasting models, the EMD-LSTM network [21] and the LSTM network [32], are used for comparison. Figure 10 shows the forecast results of landslide displacement using the three forecasting models. The forecast results of the LSTM network are the worst. This is because the original landslide displacement-time curve is a non-stationary oscillation curve, and it is difficult to learn the law of displacement change if the LSTM network is applied to it directly. The forecast curve of the EMD-LSTM network fluctuates regularly, while that of the VMD-SLSTM network follows the observed curve well.
Table 1 reports the mean absolute error (MAE) and root mean square error (RMSE) of the forecast results using the three forecasting models. The MAE of the VMD-SLSTM network is 1.99 mm, lower than the 3.99 mm of the LSTM network and the 2.25 mm of the EMD-LSTM network. Likewise, the RMSE of the VMD-SLSTM network is 2.50 mm, lower than the 4.15 mm of the LSTM network and the 2.76 mm of the EMD-LSTM network. These results show that the VMD-SLSTM network outperforms the LSTM network and the EMD-LSTM network.
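The size of these differences can be made explicit with a short calculation of the relative error reduction. The sketch uses the error values quoted in the text, reading the 2.50 mm RMSE as that of the VMD-SLSTM network and the 2.76 mm RMSE as that of the EMD-LSTM network; the percentage figures themselves are derived here and are not reported by the authors.

```python
# Error values in mm, as quoted in the text for Table 1.
errors_mm = {
    "LSTM": {"MAE": 3.99, "RMSE": 4.15},
    "EMD-LSTM": {"MAE": 2.25, "RMSE": 2.76},
    "VMD-SLSTM": {"MAE": 1.99, "RMSE": 2.50},
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {}
for model in ("LSTM", "EMD-LSTM"):
    for metric in ("MAE", "RMSE"):
        value = relative_reduction(
            errors_mm[model][metric], errors_mm["VMD-SLSTM"][metric]
        )
        reductions[(model, metric)] = round(value, 1)
        print(f"{metric} reduction versus {model}: {reductions[(model, metric)]}%")
```

Relative to the LSTM network, the reduction is about 50.1% in MAE and 39.8% in RMSE; relative to the EMD-LSTM network, it is about 11.6% in MAE and 9.4% in RMSE.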

Conclusions
In this work, a dynamic forecasting model based on variational mode decomposition (VMD) and a stack long short-term memory (SLSTM) network is proposed. The proposed model is successfully applied to the Dashuitian landslide in China. Compared with two advanced forecasting models, the LSTM network and the EMD-LSTM network, the proposed model has higher forecast accuracy.
The VMD method, which is backed by mathematical theory, is used to address the incomplete or excessive decomposition caused by traditional methods such as EMD and EEMD. It decomposes the landslide displacement into stable and fixed components, which lays the foundation for forecasting each component.
The use of the SLSTM network, which is a novel predictor that can remember historical information, is proposed to conduct each component forecasting task. As a deep learning method, the SLSTM network improves the network forecasting performance through stacking LSTM layers. Finally, the forecast values of landslide displacement are obtained by reconstructing the forecast values of all components.
The proposed forecasting model achieves only slightly better performance than the EMD-LSTM network when the dataset is small, so the VMD-SLSTM network remains promising if the dataset is expanded in future work. In addition, the complexity of the SLSTM network makes parameter training more difficult; if better network parameters are trained, the prediction accuracy can be further improved. Residual compensation for the forecasting model also deserves more attention.
Author Contributions: Y.X. and C.C. conceived and designed the experiments; Y.X. performed the experiments; K.C. and Y.B. analyzed the data; J.Y. and S.Z. carried out the data acquisition and manuscript editing; and Y.X. wrote the paper. All authors have read and approved the final manuscript.
Funding: This research was funded by National Key R&D Program of China, grant number 2018YFC1508603.

Conflicts of Interest: The authors declare no conflict of interest.