Landslide Displacement Prediction Combining LSTM and SVR Algorithms: A Case Study of Shengjibao Landslide from the Three Gorges Reservoir Area

: Displacement predictions are essential to landslide early warning systems establishment. Most existing prediction methods are focused on ﬁnding an individual model that provides a better result. However, the limitation of generalization that is inherent in all models makes it di ﬃ cult for an individual model to predict di ﬀ erent cases accurately. In this study, a novel coupled method was proposed, combining the long short-term memory (LSTM) neural networks and support vector regression (SVR) algorithm with optimal weight. The Shengjibao landslide in the Three Gorges Reservoir area was taken as a case study. At ﬁrst, the moving average method was used to decompose the cumulative displacement into two components: trend and periodic terms. Single-factor models based on LSTM neural networks and SVR algorithms were used to predict the trend terms of displacement, respectively. Multi-factors LSTM and SVR models were used to predict the periodic terms of displacement. Precipitation, reservoir water level, and previous displacement are considered as the candidate factors for inputs in the models. Additionally, ensemble models based on the SVR algorithm are used to predict the optimal weight to combine the results of the LSTM and SVR models. The results show that the LSTM models display better performance than SVR models; the ensemble model with optimal weight outperforms other models. The prediction accuracy can be further improved by also considering results from multiple models.


Introduction
Since the impoundment of the Three Gorges Reservoir area (TGRA) in June 2003, many new landslides have occurred and some existing landslides have been reactive. The Shanshucao landslide took place on 2 September 2014, interrupting the power supply for the town and damaged around 20 × 10 4 m 2 area of orange planting [1]. The Honyanzi landslide took place on 24 June 2015, initiating a reservoir tsunami that resulted in two deaths and significant damage to shipping facilities [2]. As an essential component for landslide early warning systems, displacement predictions become ever more important in the TGRA.
In general, landslide displacement predictions involve the following steps: (i) the decomposition of total displacement, (ii) the selection of candidate factors, (iii) the establishment of predicting models, and (iv) the evaluation of prediction results. For the decomposition of total displacement (step i), the moving average method (MAM) [3], the empirical mode decomposition [4], the wavelet analysis [5,6], and the variational mode decomposition [7] have been proposed to obtain sub-sequences of total displacement. The obtained sub-sequences are usually defined as trend terms and periodic terms. The trend term of the original total displacement represents the component mainly caused by the state factors and the periodic term represents the component mainly caused by the influence factors. For the selection of input factors (step ii), the mutual information, the Pearson cross-correlation coefficient [4], and the grey correlation analysis method (GCAM) [8,9] have been used. In the TGRA, the landslide deformation is the consequence of the state and the influence factors, and these controlling factors are very diverse [10][11][12]. Thus, it is necessary to select factors before establishing models.
Since Saito [13] proposed the empirical formula for landslide prediction, many approaches to landslide displacement prediction have been developed [14], including those based on creep theory, statistical theory, time-series analysis, and artificial intelligence methods [15,16]. For artificial intelligence methods that have been applied in landslide displacement, the Gaussian process model [17], the neural network model [18][19][20], the Support Vector Machine (SVM) model [9,21,22], the Extreme Learning Machine (ELM) model [8,23,24], and the LSTM model [4,25] have become popular. Moreover, these approaches can also be divided into two categories which are univariate forecasting methods and multivariate forecasting methods. Univariate forecasting methods only consider previous displacement, while the multivariate forecasting methods consider triggering factor as well.
The current studies mainly focus on choosing single prediction models from the numerous artificial intelligence methods. Since some models many be only suitable for specific landslide, a specific type of landslides, or a set of landslides with specific local geological conditions, periods, and variations, it is necessary to study how to accurately and comprehensively different models predict landslides by considering advantages from different models [15,16]. Moreover, the linear combination is perhaps most well developed and mature [26]. Therefore, this study mainly focuses on an Ensemble model with the weight coefficients based on linear combination theory.
In this study, the accumulated displacements of the selected monitoring points on the Shengjibao landslide were decomposed into trend term and periodic term using the MAM. The trend terms were predicted using the univariate forecasting methods based on the LSTM and SVR algorithm, respectively. The periodic terms were predicted by the multivariate forecasting methods based on the LSTM and SVR algorithm with analyzing the relationship between landslide displacement and the triggering factors separately. The predictive total displacements of each model were obtained by adding their predictive trend and periodic terms. The final total predicting displacements were obtained by adding the predictive total displacements of both models with the weight coefficients obtained from the prediction of the optimal weight.

Decomposition of the Displacement Time Series into Trend and Periodic Terms
Landslide displacement was modeled using two components: a trend component and a periodic component [3,27]. In the long-term, the deformation trend is controlled by internal geological conditions, such as lithology, geological structure, and progressive weathering. The short-term deformation in the TGRA is influenced by two main external triggering factors: rainfall and reservoir water level. The short-term displacement represents the periodic component in the model [25]. The accumulated displacement time series can be decomposed as follows: where t is the time step, S(t) is the accumulated displacement, φ(t) is the trend term, and η(t) is the periodic term.

Moving Average Methods
The decomposition of the trend and periodic terms forms the basis of the model. If the period of the time series and influencing factors can be determined accurately, the fluctuation can be smoothed by the moving average method, and the trend term of the displacement can be extracted. The function of the single moving average method is shown as follows: where S t is the trend term of the displacement at the time step t, S t is the accumulative displacement of the landslide at the time step t, n is the moving average cycle. Considering the scheduling cycle of the TGRA, the moving average cycle was selected as 12 months [21].

Long Short-Term Memory Neural Networks
Recurrent neural networks (RNNs) ( Figure 1) have been successfully applied to a variety of problems, such as speech recognition, language modeling, translation, and image captioning [28,29]. RNNs can utilize historical information and apply it to the current output, which allows for feedback in the networks [30]. The recurrent connections in the RNN implement passing information from one step of the network (time step t − 1) to the next one (time step t). For a given input sequence x = (x 1 , x 2 , · · · , x T ), the hidden state vector sequence is h = (h 1 , h 2 , · · · , h T ), and the output sequence is y = (y 1 , y 2 , · · · , y T ). The output sequence y can be obtained by iterating the following equations from time t = 1 to T: where x t and y t are the input and output at time step t; h t and h t−1 are the hidden states at time step t and t − 1, respectively; T is the prediction period; W xh is the weight matrix between input and hidden vectors; W hh represents the weight matrix between different time steps of the hidden vectors; W hy represents the weight matrix that connects the hidden information to the output vectors; b h and b y are the corresponding bias vectors of W hh and W hy , respectively; and is the nonlinear activation function for hidden nodes. However, in practice, RNNs are not capable of handling a problem as long-term dependencies because of the vanishing gradient or exploding gradient problems [31]. The LSTM neural networks are a special type of RNN, which are capable of learning long-term dependencies [32]. LSTMs replace every single unit in an RNN with a memory block ( Figure 2). There are three types of gate functions in this memory block [33]. The input gates control the flow of input activations into the memory cell, and the output gates control the output flow into other cells or as the final results [34]. The forget gate controls whether the value from the previous moment is forgotten or remembered. The combination of the three types of gates controls the output state. The key difference between RNN and LSTM lies in the formulation of the hidden vector (h t ). The hidden vector (h t ) in LSTM neural network can be obtained as follows: where i t , f t , o t , and c t are the values of the input gate, the forget gate, the output gate, and the memory cell in the memory block; b i , b f , b o , and b c are their corresponding bias values; σ is the sigmoid function; W x represents the weights between input nodes and hidden nodes; W h represents the weights between hidden nodes and memory cell; W c represents the weights that connect memory cell to output nodes. The output y t can then be obtained by Equation (2). The output sequence y = (y 1 , y 2 , · · · , y T ) will be obtained by iterating Equations (2) to (7) from times t = 1 to T.

Support Vector Regression
The support vector regression (SVR) model was proposed by Vapnik et al. in 1995 and has been widely used in nonlinear problem solving [35,36]. In SVR models, sample data are divided into a fitting sample and predicting sample. The fitting sample is then mapped to a high-dimensional feature space. The best fitting effect is obtained in the space of the optimal decision function model and the validation sample is used to validate the analytical model results. The regression function for SVR is: where W is the weight vector, Φ(x) is a nonlinear mapping from the input space to the output space, b is bias. Transforming the estimation function into a function minimization problem by the ε insensitive loss function: The constraint conditions are as follows: where C is the penalty factor; ξ i and ξ i * are relaxation factors. The Lagrange multiplier is introduced at last, and converted into an equivalent dual problem based on the Wolf duality theory: The constraint conditions are expressed in Equation (10): where . The SVR prediction model can be obtained by quadratic programming: where K(x i , x) is the kernel function of SVR. Linear kernel, polynomial kernel, radial basis kernel function (RBF), and sigmoid function [37] are four commonly used kernel functions for SVR algorithm. With a wide convergence domain, the RBF is widely used [21,38,39]. Therefore, the RBF is adopted as the kernel function for the SVR model in this paper.

Ensemble Model
We used ensemble models to reduce the instability of a single prediction model. The core idea of ensemble model is to assign different weight coefficients to the predictive results of different predicting models, to extract effective information from each model system, utilize the forecasting results of each model synthetically, disperse forecasting risks and errors, and improve the accuracy and reliability of predicting [15].
The displacement forecasts based on the LSTM model and the SVR model at the time step t were L(t) and S(t), the total displacement forecast of the ensemble algorithm was E(t), the original displacement in the validation dataset at the time step t was O(t).
where W is the dynamic weight coefficient obtained from the prediction of optimal weight.

Reliability Evaluation of the Model
After tuning the hyperparameters of the LSTM and SVR models with the fitting datasets, the training datasets and optimal hyperparameters were adopted to train the LSTM models and SVR models. To verify the prediction accuracy of the LSTM models, SVR models and the ensemble models, the root mean square error (RMSE) (Equation (13)) and mean absolute percentage error (MAPE) (Equation (14)) were calculated. The overall process is shown in Figure 3.
where x i is the measured value;x i is the predicted value; N is the number of predicted values.

Shengjibao Landslide
The Shengjibao landslide is located on the South Bank of the Yangtze River in Fengjie County ( Figure 4). The Yangtze River flows through the front of the landslide area from S72W. The overall landform type of the landslide is structure-erosion and denudation low-mountain valley. The Yangtze River Valley extends eastward in this section. The width of the river in this area is 350-400 m; the valley is V-shaped with asymmetrical banks. The topographic slope of the northern bank is steep, generally 30-50 degrees. The southern bank is a consequent slope with an overall slope of 15-20 degrees. The slope surface is ladder-like and gullies on the slope are well-developed. The slope gradient at the bottom of the gully is similar to that on the slope surface. The front elevation of the Shengjibao landslide is 81-85 m, the escarpment elevation is 370 m. The gradient range is 15-20 degrees and the longitudinal length is 1500 m. The lateral average width is about 1160 m, the average thickness is 25 m, the total area of landslide is about 2.45 km 2 , and the total volume of the landslide is 3998 × 10 4 m 3 .

Landslide Activity
The initial monitoring network of Shengjibao landslide was established in March 2007, which includes 16 horizontal displacement monitoring points ( Figure 5). In May 2007, Shengjibao landslide began to deform after the first 156 m impoundment of the TGRA, which was affected by the drop of the reservoir water level and seasonal rainfall. Movement has been small-scale controlled by the topography and generally occurred in the loose soil of the surface layer. The main manifestations are the deformation of some houses, discontinuous cracks on the ground, and ground tilt ( Figure 6). The main slide direction of the landslide is 348 degrees and a typical schematic geological cross-section is shown in Figure 7. Based on data collected from March 2007 to December 2015 (Figure 8), four stages of surface displacement along the II-II' profile could be identified (Table 1).     The displacement data showed a sharp increase in surface displacement in August and September 2013 ( Figure 8). However, a similar pattern was not detected in August and September 2014, despite comparable rainfall levels. According to the precipitation data showed in Figure 9, there was a rainfall pattern that lasted for 15 days during the period in 2013, and a rainfall pattern only lasted for 9 days in 2014. This indicates that under the same fluctuation conditions of reservoir water level, a longer duration of rainfall has a more important influence on landslide deformation.

Point Selection and Data Processing
As shown in Figure 5, there were three types of deformation zones on the Shengjibao landslide: strong deformation zone, obvious deformation zone, and weak deformation zone. We selected ZK2-1 and ZK2-3 from the obvious deformation zone and the strong deformation zone, respectively, to represent two types of deformation points on the Shengjibao landslide.
Eighty-four measured values from January 2009 to December 2015 were adopted for establishing predictive models from the GPS monitoring points as ZK2-1 and ZK2-3. First, the adopted values were decomposed with MAM into trend terms and periodic terms. After decomposition, as suggested by Yang [25], 72 values were set as the fitting dataset, 48 values were set as the training dataset to fit the models, and 24 values were set as validation dataset to tune the hyperparameters of the models. Twelve values were set as predicting dataset to evaluate the capability of the models ( Figure 10).

Factors Selection
Selecting the proper triggering factors is essential to guarantee the accuracy of a prediction. First, the candidate factors were determined with landslide deformation characteristics. Since displacement rates mainly increased during the rainy season and reservoir water level fluctuation stages (Figure 11), rainfall and reservoir water level were considered the main triggering factors that caused the deformation of the Shengjibao landslide. Additionally, the Wenchuan earthquake that occurred in May 2008 had an insignificant impact on the pattern of the landslide displacement according to the monitoring records shown in Figures 8 and 11. Thus, the events of the earthquake were not considered as candidate factors for prediction. As the values of GPS monitoring data were obtained at the end of each month, the candidate factors should include the precipitation during the current month, the precipitation during the past two months, the average reservoir level during the current month, and the change of the reservoir level during the current month, the change of the reservoir level during the past two months. These candidate factors were considered as f 1 , f 2 , f 3 , f 4 , f 5 . Slopes in different evolutionary states may respond differently to the same external trigger factors. When the slope is stable, even strong influence factors are usually not enough to cause large deformation. In contrast, if a slope is marginally stable or even unstable, the slightest increase in external "loads" may cause disequilibrium and large deformation [40]. Therefore, the state factors were also considered as input values. The candidate factors should include the change of the displacement during the last month, during the last two months, during the last three months, and the average displacement during the last 12 months (annual displacement rate). These candidate factors were considered as f 6 , f 7 , f 8 , f 9 .
A resolution coefficient of 0.5 was obtained using the GCAM. The relational degrees r k between the periodic term and the candidate factors are shown in Table 2. The candidate factors are closely related to the periodic term, r k > 0.6, suggesting that the parameters were properly selected [41].

Normalization and Inverse Normalization
LSTM is a type of Deep Learning Algorithms, while SVR is a type of Instance-Based Algorithm. Additionally, they are all Machine Learning (ML) Algorithms, while for ML Algorithms, normalization makes it easier to find optimal parameters and reduces the chances of getting stuck in local optimal [42,43]. To maximally preserve the distributions of the dataset, each triggering factor is normalized to [−1, 1] using linear normalization.
As stated in Section 4.1, the whole fitting datasets were assumed to be known, and the predicting datasets were considered unknown. To prevent information leakage, maximum and minimum values of inputs and outputs for normalization and inverse normalization were obtained from the fitting datasets, respectively.
where x * is the normalized value, x is the original value, x max is the maximum value of the samples, and x min is the minimum value of the samples.

Parameters of LSTMs and SVRs
Keras framework [44] was used to establish LSTM models, while TensorFlow was used as the back end, and Python language was adopted to write the codes. As a time series forecasting experiment, hyperparameters of LSTM models were tuned with the Grid Search (GS) method [44]. Its process can be described as follows: plausible values for each parameter are combined, and then all the combination values are used in the training process of the model. The performance metric of each combination is typically measured by validation on the validation dataset. When all the parameter combinations have been tried, an optimal parameter combination with the best performances is returned automatically.
For LSTM models establishment, batch-size, and numbers of layers of LSTM models were determined with the GS method successively. In GS processing, the gird range of batch-size is [1,20] with a grid step of 1, the gird range of layers is [2,8] with a grid step of 1. The optimal epochs were obtained with an early-stopping set during the algorithmic process, and the patience of early-stopping was set as 20 which means the algorithmic process will stop when the result has no improvement within 20 steps.
The SVR models were also operated in the Python language environment. The GS method was used to search for the optimal parameters of the SVR model. As stated in Section 2.4, the RBF was adopted as kernel function for the SVR model. The values of gamma (the kernel coefficient for RBF) and the values of C (the penalty parameter) were determined with GS. In GS processing, the search range of gamma is [0.0075, 1] with a grid step of 0.0075, the search range of C is [1,20] with a grid step of 1. The best parameters are shown in Table 3.      (Table 6). Table 6. Accuracy of the prediction in trend term and periodic term of ZK2-1 and ZK2-3. The LSTM models and SVR models both had worse performances after May 2015. LSTM models had significantly better performance in step a, period c, step d, and period e than SVR models. The performances of the models in step a, period b, period c, step d, and period e in the periodic terms were reflected in the total displacement ( Figure 12).

Ensemble Models
Eight models were established by tuning hyperparameters and training for prediction. In this section, the whole datasets with 84 time-steps of ZK2-1 and ZK2-3 were applied to make predictions with the LSTM models and SVR models. The total prediction displacements of ZK2-1 and ZK2-3 from January 2009 to December 2015 with LSTM models and SVR models can be obtained with this process. An optimal weight can be obtained with setting the original total displacements as E(t) in Equation (12).
The optimal weights were set as outputs, and all the inputs that show in Table 2, the obtained total prediction displacements with LSTM models, the obtained total prediction displacements with SVR models, and the difference between these two total prediction displacements were set as candidate input factors. The latter 3 new candidate factors were considered as f 10 , f 11 , f 12 .
An SVR model using the GS method was adopted to predict the optimal weight for an ensemble model. As stated in Section 4.1, 84 measured values were adopted for finding a better model between LSTMs and SVRs. For the ensemble model runs, 72 time-steps were set as the fitting dataset, 48 time-steps were set as a training dataset to fit the models, and 24 time-steps were set as validation dataset to tune the hyperparameters of the SVR models and to determine the optimal inputs (Table 7). Twelve time-steps were set to evaluate the models. The optimal weights were set as 1 when the predicting values were more than 1, and were set as 0 when the predicting values were less than 0. Table 7. Optimal hyperparameters and inputs of SVRs used for optimal weight prediction.

Point
Inputs Hyperparameters

C Gamma
ZK2-1  The total displacements predicted by the Ensemble model were in better agreement with the measured values than those with the LSTM or SVR models. As shown in Figure 13, in point a, point b, point c, and period d, the results of the Ensemble models tend to be closer to the results of original values than at least one of the LSTM or SVR models. This suggests that when the results of the LSTM and SVR models are different, the Ensemble model gives more weight to the results of the model that are closer to the real values. As shown in Table 8, in 12 predicted points of ZK2-1, 7 of them were had lower values of absolute error than the LSTM model, 5 of them had lower values of absolute error than the SVR model. As shown in Table 9, in 12 predicted points of ZK2-3, 2 of them were had lower values of absolute error than LSTM model, 10 of them had lower values of absolute error than SVR model.

Discussion
Considering predicting total displacement, the LSTM models showed a more satisfactory performance than the SVR models in ZK2-1 and ZK2-3, respectively, and the proposed Ensemble models yield relatively better results (Tables 8 and 9). In predicting trend terms, the LSTM models showed better performance than SVR models in ZK2-3 and worse performance in ZK2-1. In predicting periodic terms, the LSTM models showed better performance than SVR models in ZK2-1 and worse performance in ZK2-3. This suggests that the accuracy of the total displacement prediction does not depend in the case of the Shengjibao landslide on the accuracy of only one sub-sequence.
Meanwhile, the generalization ability of the models directly affects the performance of the test dataset. The models established based on the performance on the validation dataset do not always perform as expected on the predicting dataset. As shown in Tables 6 and 10, the LSTM models did not achieve the expected performance in predicting the trend term of ZK2-1 or the periodic term of ZK2-3, while the SVR model did not achieve the expected performance in predicting the trend term of ZK2-3. There have been few reports on the research of decomposing the original landslide displacement into trend term, periodic term, and stochastic term [7,9] up to now. The stochastic displacement is caused by some random factors such as wind load and vehicle load. In decomposing original displacement, we chose to ignore the stochastic term. One should consider means to develop optimal models for predicting the stochastic displacement in the future. Additionally, the stability of the location of the GPS monitoring control point is the key controlling factors of the reliability of the monitoring data. Thus, the evaluation of the stability of the GPS monitoring control point is an important content that cannot be ignored in the processing of displacement observation data.
As stated in Section 3.2, there is a sharp increase in the displacement rates in August and September 2013. Although the proposed method achieved good results in this study, if factors as the precipitation or water level suddenly change, or an unusual rainfall pattern occurs, the error in point-based displacement predictions will inevitably increase. Thus, solving this problem becomes highly necessary [9]. As the Shengjibao landslide is a wading landslide in the TGRA, the factors considered including precipitation, water level and displacement. One should, in the future, consider means to develop prediction approaches for landslides with different triggering factors. There are three types of deformation zones divided on Shengjibao landslide. Any single point cannot represent the displacement characteristics of the entire landslide. To differentiate spatial zones of comparably the deformation characteristics and select the monitoring data of representative points are within these zones are vital preparatory steps for displacement prediction studies.
According to the existed research on the application of ML techniques to landslide displacement prediction up to now, many hybrid models were established for prediction. However, this research mainly focus on the hybrid models of the coupled single ML models and models used to obtain optimal hyperparameters for these proposed single ML models. In this study, an ensemble model of the coupled LSTM and SVR algorithm with the weight coefficients based on linear combination theory was established for prediction, and the ensemble model performed well. Meanwhile, many approaches to landslide susceptibility mapping and landslide risk analysis have been developed [45,46], including those based on artificial neural network [47,48], support vector machine [49], alternative decision tree [36]. In order to achieve an accurate result in landslide susceptibility mapping and landslide risk analysis, it is necessary to research more advanced ML algorithms. Thus, the novel method proposed in this paper is recommended for conducting landslide susceptibility mapping, landslide risk analysis, and other fields.

Conclusions
The monitoring data are fundamental for establishing landslide displacement prediction models, and the results of prediction are significant for landslide early warning. In this study, the Shengjibao landslide in the TRGA was taken as an example. Based on the monitoring data, the response relationship between landslide displacement and factors is analyzed, an ensemble model of the coupled LSTM and SVR algorithm with the weight coefficients based on linear combination theory was established. The dynamic model based on the deep learning algorithm of LSTM and the static SVR model were also used to predict landslide displacement for comparison. The main conclusions are as follows: (1) Under the same fluctuation conditions of reservoir water level, a longer duration of rainfall has a more important influence on landslide deformation; (2) Overall, the LSTM models perform better than the SVR models, but the results of LSTM models are not closer to the original value than the results of SVR models at every time step in predicting dataset; (3) The proposed Ensemble model integrated both the advantages of LSTM and SVR algorithms, where the Ensemble model has higher prediction performance than both of LSTM and SVR models. In general, by combing the advantages of different ML algorithms, the ensemble model proposed in this paper can effectively construct the response relationship between landslide displacement and factors.
Overall, the proposed method, which applies time series analysis, LSTM and SVR, can achieve accurate predictions in case of slow and the step-like deformation periods. This method of establishing an Ensemble model has the potential for application to predict landslide displacement in the TGRA and other landslide-prone regions.