Large-Scale Landslide Displacement Rate Prediction Based on Multi-Factor Support Vector Regression Machine

Abstract: Forecasting the development of large-scale landslides is a contentious and complicated issue. In this study, we put forward the use of multi-factor support vector regression machines (SVRMs) for predicting the displacement rate of a large-scale landslide. The relative relationships between the main monitoring factors were analyzed based on the long-term monitoring data of the landslide and the grey correlation analysis theory. We found that the average correlation between landslide displacement and rainfall is 0.894, and the correlation between landslide displacement and reservoir water level is 0.338. Finally, based on an in-depth analysis of the basic characteristics, influencing factors, and development of landslides, three main factors (i.e., the displacement rate, reservoir water level, and rainfall) were selected to build single-factor, two-factor, and three-factor SVRM models. The key parameters of the models were determined using a grid-search method, and the models showed high accuracies. Moreover, the accuracy of the two-factor SVRM model (displacement rate and rainfall) is the highest, with the smallest root mean square error (RMSE) of 0.00614; it is followed by the three-factor and single-factor SVRM models, the latter of which has the lowest prediction accuracy, with the largest RMSE of 0.01644.


Introduction
Landslides are a type of geological disaster occurring on the crustal surface of the earth, and are regarded as serious hazards that are widely distributed throughout the globe [1]. They not only cause heavy casualties or property losses, but also serious social and environmental problems. Landslides result in significant harm to human beings mainly because it is difficult to accurately predict and forecast landslides in advance. In addition, because of financial constraints and other limitations, only an extremely small number of major landslides have been monitored, and control measures have not been taken for most landslides. However, in many cases, it is not easy to fully mine a large amount of useful information and accurately grasp the deformation and evolution of landslides, owing to a lack of effective data processing and prediction technology, even if the landslides are subjected to monitoring. Therefore, predicting the development process of large-scale landslides remains a contentious issue [1][2][3][4][5][6].
Because of the complexity of a landslide body and the diversity and randomness of its influencing factors, an accurate description of its development law is complex. However, the series of landslide body displacements, which vary over time, can be used to describe the general laws of landslide development. Therefore, some researchers have proposed different methods for the prediction of landslide deformation and displacement [2,4,7-15]. The methods can be roughly classified into four categories: deterministic methods [2,7,8,12], statistical methods [9,11], numerical simulations [4,10], and nonlinear methods [13][14][15]. Each method has its own features. The deterministic methods are based on creep theory and other physical mechanisms of rock and soil mass and can provide a clear physical concept; however, they have strict application conditions. The statistical methods are based on mathematical statistics and other prediction theories and can be used to obtain statistical laws from a known displacement to infer an unknown displacement, which makes them usable for landslides without clear physical mechanisms. Numerical simulation methods rely on simulation software or programs such as FLAC 3D to build a numerical-mechanical model for landslide deformation prediction, the parameters of which are not easy to determine. Nonlinear methods introduce nonlinear theories (e.g., machine learning theory, synergetic theory, and chaotic dynamics theory), which have the potential for coping with difficult and complicated problems [5]. Moreover, these methods can also be divided into two categories: single- and multi-factor prediction methods. The former only considers previous displacement or deformation, whereas the latter considers inducing factors in addition to landslide displacement or deformation.
Thus far, most of the methods mentioned above have applied single-factor prediction models, which are built from a single factor such as landslide displacement or deformation. However, the evolution of landslides is influenced by a variety of factors, such as rainfall and earthquakes. Multi-factor prediction models are therefore a promising choice for landslide displacement prediction in the future.
In recent years, machine learning methods such as artificial neural networks (ANNs) and support vector machines (SVMs) have been widely applied in landslide research because they can consider the influences of multiple complex factors on landslides and perform well in providing predictions, extracting associated features, and making decisions from the given information [13][14][15][16][17][18][19][20]. Among them, the SVM, a novel machine-learning algorithm presented by Vapnik (1995), is well suited to small-sample, nonlinear, and high-dimensional problems, while considering the influence of multiple factors. The method is based on statistical learning theory and the structural risk minimization principle. It seeks an optimal trade-off between the complexity and learning ability of models, given the limited sample information, to obtain the best generalization of the models [21,22]. Many studies regard the method as a promising direction in machine learning research. At present, SVM methods have obtained good results when applied to landslide assessment and prediction [23][24][25][26].
In Ref. [5], we built a GA-SVM model of large-scale landslides for landslide development prediction, which focuses on improving the prediction accuracy of SVM models by applying a genetic algorithm. This study primarily focuses on the application of a multi-factor SVRM for landslide displacement rate prediction, and mainly emphasizes the importance of choosing the appropriate influencing factors in building the prediction model of a large-scale landslide.

SVMs Theory
SVMs are of two types: classification and regression. The original SVM was designed to solve classification problems. In recent years, however, the support vector regression algorithm has shown excellent performance in many research fields [27][28][29][30]. The basic idea of a support vector regression machine (SVRM) is to map the training data x nonlinearly from the input space into a higher-dimensional feature space through a nonlinear mapping function φ(x), and to perform a linear regression within the feature space, which corresponds to a nonlinear regression in the input space [22,31,32].
Consider a set of training data $\{(x_i, y_i)\}_{i=1}^{n}$ with $(x_i, y_i) \in R^n \times R$, where $x_i$ is the input vector of an SVRM model, $y_i$ is the actual output value, and $n$ is the total number of training data. In general, the regression function in an SVRM model can be written as follows [31,32]:

$$f(x) = \omega \cdot \phi(x) + b \qquad (1)$$

where $\omega$ and $b$ are called the weight vector and bias term, respectively.
According to the related references, the SVRM can be solved in its dual formulation as a convex optimization problem [21,22,33] by maximizing

$$-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(a_i - a_i^*)(a_j - a_j^*)K(x_i, x_j) + \sum_{i=1}^{n} y_i (a_i - a_i^*) - \varepsilon \sum_{i=1}^{n}(a_i + a_i^*)$$

subject to

$$\sum_{i=1}^{n}(a_i - a_i^*) = 0, \quad 0 \le a_i, a_i^* \le C, \quad i = 1, 2, \ldots, n$$

where $a$ and $a^*$ are dual Lagrange multipliers, $C$ is a positive constant that determines the trade-off between the training error and model flatness, $\varepsilon$ is a permissible error, and $K(x_i, x_j)$ is the kernel function. In general, there are several types of kernel functions, such as linear, polynomial, and radial basis functions (RBFs). The RBF is adopted in this study because of its strong nonlinear mapping ability and good learning performance in practical applications [34][35][36][37]. The RBF is written as follows:

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)$$

where $\gamma$ is a kernel parameter. After $a$, $a^*$, and $b$ are determined, the explicit form of regression Formula (1) can be obtained as follows [36]:

$$f(x) = \sum_{i=1}^{n}(a_i - a_i^*)K(x_i, x) + b$$

The computation and analysis of the SVRM models in this paper were conducted in MATLAB, which has a friendly user interface and is simple to operate. The Libsvm toolbox for MATLAB, developed by Chang and Lin (2001) and freely available online, was used in the computation.
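As a minimal sketch of how the RBF kernel and the explicit regression function described above are evaluated, the following Python fragment computes a prediction from a set of support vectors. It does not reproduce the Libsvm implementation used in the study; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel: K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Explicit regression function: f(x) = sum_i (a_i - a_i*) K(x_i, x) + b.

    dual_coefs[i] holds the difference (a_i - a_i*) for support vector i.
    """
    weighted = (
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return sum(weighted) + b


# Hypothetical support vectors and coefficients (not taken from the study).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, b=0.1, gamma=0.5)
```

Only training points with a nonzero difference between the two multipliers contribute to the sum, which is why the prediction depends on the support vectors alone.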
The computing procedure of the SVRM mainly includes the following steps:
1. Transforming and scaling the data into the format accepted by the Libsvm software;
3. Using a grid-search method to determine the best parameters C and γ in the SVRM model; and
5. Using the SVRM model for prediction.
The grid-search method is the most commonly employed parameter optimization method and consists of a coarse search and a fine search. First, a rough result is obtained over a wide range of parameters through the coarse search. Second, a refined result is obtained through the fine search, which uses smaller steps within a smaller range determined by the rough result [38]. In this study, the coarse search range of parameters C and γ is $[2^{-8}, 2^{8}]$ with a step size of 1, and the fine search range is $[2^{-4}, 2^{4}]$ with a step size of 0.5.
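The coarse and fine searches described above can be sketched as follows. This is an illustration under stated assumptions, not the Libsvm grid-search tool: the step sizes are assumed to apply to the exponent of 2, and `toy_score` is a hypothetical stand-in for the cross-validated error that a real search would compute by training an SVRM for each (C, γ) pair.

```python
import math


def exponent_grid(lo, hi, step):
    """Exponents lo, lo + step, ..., hi (inclusive)."""
    count = int(round((hi - lo) / step)) + 1
    return [lo + k * step for k in range(count)]


def grid_search(score, exponents):
    """Return (best_score, C, gamma) minimizing score(C, gamma) over the grid."""
    best = None
    for exp_c in exponents:
        for exp_g in exponents:
            c, gamma = 2.0 ** exp_c, 2.0 ** exp_g
            value = score(c, gamma)
            if best is None or value < best[0]:
                best = (value, c, gamma)
    return best


def toy_score(c, gamma):
    """Hypothetical error surface with its minimum at C = 2^-1, gamma = 2^1.5."""
    return (math.log2(c) + 1.0) ** 2 + (math.log2(gamma) - 1.5) ** 2


coarse_exponents = exponent_grid(-8, 8, 1)    # coarse search, step size 1
fine_exponents = exponent_grid(-4, 4, 0.5)    # fine search, step size 0.5

coarse_best = grid_search(toy_score, coarse_exponents)
fine_best = grid_search(toy_score, fine_exponents)
```

With this toy surface, the coarse search can only reach γ = 2, whereas the fine search reaches γ = 2^1.5 ≈ 2.8284, a value that is representable only on the half-step grid.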

Grey Correlation Analysis Theory
Deng first presented the grey correlation analysis theory in 1989 [39]. It is usually applied to evaluate and compare the importance of certain factors in many practical problems. In this study, the method is used to analyze the importance of the factors influencing landslide deformation. Its main procedures are as follows.

Defining Consensus and Comparison Sequences
Consensus and comparison sequences are also called the mother sequence and subsequences, respectively. The consensus sequence is the basis of the grey correlation analysis method and can be chosen according to practical needs. Taking $x_0(k)$ and $x_i(k)$ as the mother sequence and a subsequence, respectively, we have the following:

$$x_0 = \{x_0(k) \mid k = 1, 2, \ldots, n\}$$

$$x_i = \{x_i(k) \mid k = 1, 2, \ldots, n\}, \quad i = 1, 2, \ldots, m$$

where $m$ is the number of factors considered, and $n$ is the number of data points analyzed. It should be noted that the data series must be normalized before the grey correlation analysis is applied.

Calculating Grey Correlation Coefficient
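The correlation coefficient $\xi$ at point $k$ between $x_0$ and $x_i$ can be calculated as follows:

$$\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}$$

where $\rho$ is called the discrimination coefficient, $\rho \in (0, +\infty)$. The smaller $\rho$ is, the greater the discrimination ability of the grey correlation method. The value of $\rho$ is generally set to 0.5.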

Calculating Grey Correlation Degree
The correlation degree $\gamma$, which measures the degree of similarity among the different factors, is calculated from the grey correlation coefficients and is defined as follows:

$$\gamma_i = \frac{1}{n}\sum_{k=1}^{n}\xi_i(k)$$
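The three steps above (defining the sequences, calculating the correlation coefficients, and averaging them into a correlation degree) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the sample sequences are hypothetical, the inputs are assumed to be normalized already, and ρ defaults to the commonly used value of 0.5.

```python
def grey_correlation_degrees(mother, subsequences, rho=0.5):
    """Return one grey correlation degree per subsequence.

    mother:        normalized mother sequence x_0(k), k = 1..n
    subsequences:  list of normalized subsequences x_i(k), each of length n
    rho:           discrimination coefficient
    """
    # Absolute differences |x_0(k) - x_i(k)| for every factor i and point k.
    diffs = [[abs(m - s) for m, s in zip(mother, sub)] for sub in subsequences]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0.0:
        # Every subsequence coincides with the mother sequence.
        return [1.0 for _ in subsequences]

    degrees = []
    for row in diffs:
        # Correlation coefficient at each point k.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Correlation degree: mean of the coefficients.
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Hypothetical normalized sequences (not the monitoring data of the study).
mother = [0.0, 0.5, 1.0]
subs = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
degrees = grey_correlation_degrees(mother, subs)
```

In this toy example, the first subsequence is identical to the mother sequence and therefore receives a degree of 1, whereas the reversed subsequence receives a lower degree.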

Geological Setting
The landslide is located 600 km upstream of a large hydro-electrical station in Sichuan Province, China, on the left bank slope of a secondary tributary of the Yangtze River (Figure 1). It is a typical slow large-scale landslide. The length of the landslide from head to toe is approximately 500-700 m, and the average width of the landslide body is approximately 700 m. Its volume is approximately 15 million m³. It is mainly influenced by a variety of factors, such as rain, reservoir water, and groundwater [1,5]. Since 1998, the landslide has been systematically and comprehensively observed, and a large amount of geological and monitoring data has been accumulated. According to the borehole investigation results, the landslide comprises a set of basalt with a block structure in the upper part and a set of layered sedimentary rocks in the lower part of the slope, and is mantled by loose Quaternary deposits. The sliding zone is mainly located in strongly weathered basalt (Figure 2) [1,40]. Based on the geological and geomorphological features and the deformation characteristics and history of the landslide, the landslide can be divided into three main sliding blocks, namely Block 1, Block 2, and Block 3 (Figure 3), which are, respectively, the old landslide area, main deformation area, and shallow landslide area. The three blocks have different movement and material characteristics. Block 1 has a total volume of 260 × 10⁴ m³ and is stable at present. Block 2 has a total volume of approximately 250 × 10⁴ m³, including a 180 × 10⁴ m³ rock mass that has developed into a strong creep mass. Block 3 has poor stability and has experienced several small shallow landslides and collapses at its front and side; however, deep instability does not occur easily.
Herein, our research focus is Block 2, which occupies the middle part of the landslide. It has an obvious deformation and is a potential threat to the hydro-electrical station.

Monitoring System
A complete landslide monitoring system was established in April of 1998. The aim is to analyze its stability and deformation tendency. The monitoring items include a precision geodetic survey of the surface displacement, as well as the borehole, flat tunnel, and meteorological monitoring. The distribution of monitoring points can be seen in Figure 2.
The precision geodetic survey includes a plane control network and an elevation control network. It is used to monitor the horizontal and vertical displacement to determine the direction and pattern of the surface displacement in different parts of the landslide. There are 21 monitoring points in total, with second-order accuracy. In the borehole monitoring, high-precision inclinometers (inclination angle accuracy of 0.01°) are used to monitor the lateral displacement of the rock mass above and below the sliding surface. On this basis, the location and displacement of the sliding surface can be determined. There are 8 borehole monitoring points on the landslide body, 6 of which are located in Block 2. A footrill is a good geological window for analyzing landslides in depth. Several precision instruments, including a rock mass deformation instrument (accuracy of 0.05 mm), clinometer (inclination angle accuracy of 0.01°), displacement meter (accuracy of 0.1 mm), and triangular water weir, are fixed in the footrill. Using these instruments, the deep deformation and groundwater level change of the landslide body can be clearly observed. Footrill #5 is located above the elevation of 1220 m and is a long-term monitoring point during the reservoir operation.
The meteorological observation of the landslide uses a 110-WS-16 automatic meteorological instrument, which can provide data on the rainfall, temperature, and wind speed for a landslide.
Initially, most items were monitored once per week. When the reservoir water level fluctuated sharply or abnormal conditions occurred, the monitoring frequency was increased appropriately. Most items are now monitored once a month.
The system has been checked and serviced regularly since 1998. To date, the system has recorded a considerable amount of comprehensive and real-time observation data concerning the change in movement of the landslide body and its influencing factors.

Deformation Characteristics
According to the monitoring data and geological surveys conducted since 1998, the deformation of the landslide varies clearly with region, elevation, depth, and season. The deformation rate is mainly related to rainfall and its intensity. It is also related to the continuous and fast rise and fall of the reservoir water level. Although there are some country roads at the front and top of the landslide body, and these roads have caused some small local and shallow landslides, particularly in Block 3, they have not caused any deep slides.
For the Block 2 creep body, the surface displacement is small, although the increase in displacement was significant from 1998 to 1999. The increase in displacement gradually diminished after 2000, whereas the total displacement of Block 2 shows an obvious spiral increasing trend. The creep is still developing.
In this paper, the data from footrill #5 in Block 2 from April 1998 to December 2005 are chosen to perform an in-depth analysis of the relationships between the landslide deformation and the influencing factors. The displacement rate is calculated from the monitored displacement, and the values of the rainfall and reservoir water level are normalized using the common min-max method. The relative relationship is shown in Figure 4.
Figure 4 shows that the relationship between the displacement rate and rainfall is obvious, and the rate peaks lag behind the rainfall peaks. This indicates that the impact of rain on the landslide has a certain delayed effect. The influence of the reservoir water level is mainly reflected in the early impounding stage. The rate obviously increased after the water storage commenced in 1998, after which the influence gradually weakened.
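The preprocessing mentioned above can be sketched as follows. The text does not state exactly how the displacement rate was derived, so this sketch assumes it is the difference between consecutive displacement readings divided by the sampling interval; the function names and sample values are hypothetical.

```python
def displacement_rate(displacements, interval=1.0):
    """Rate between consecutive readings (assumed definition).

    displacements: cumulative displacement readings at equal time intervals
    interval:      time between readings (e.g., 1 month)
    """
    return [
        (later - earlier) / interval
        for earlier, later in zip(displacements, displacements[1:])
    ]


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical readings (not the monitoring data of the study).
rates = displacement_rate([0.0, 2.0, 3.0, 7.0])
scaled_rainfall = min_max_normalize([10.0, 20.0, 30.0])
```

Normalizing the rainfall and water level to a common range makes series with different units comparable, which the grey correlation analysis requires.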

Monitoring Data Analysis
We took the average monthly displacement as the mother sequence, and the monthly rainfall and average monthly reservoir water level as the subsequences, to analyze the relative relationships between them using the aforementioned grey correlation analysis theory. The results are shown in Table 1. The table shows that the correlation between displacement and rainfall is much larger than that between displacement and reservoir water level in each year; the average correlation of the former from 1998 to 2005 is 0.894, whereas the average correlation of the latter is only 0.338.
From these correlations and the field survey, it is concluded that the deformation of the large-scale landslide is influenced by many factors. In addition, rain is always a key influencing factor, and the changes of the reservoir water level have obvious influences on the landslide only in the early impounding stage.

Results
This section presents the prediction results of several SVRM landslide models and compares their prediction performance. According to the analysis results of the monitoring data and SVRM theory, single- and two-factor SVRM models are established, along with a three-factor SVRM model to represent a complicated landslide.

Single-Factor SVRM Prediction Results
First, only the average monthly displacement rate of the landslide from April 1998 to December 2005 was used as a factor to build the single-factor SVRM model. The total number of data points is 93. The first 62 data points were used as the learning samples, and the remaining 31 data points were used as the testing samples. A single-factor SVRM model was built to predict the development of the landslide, and the parameters (C and γ) of the model were determined using the grid-search method, which was proposed by Chang and Lin (2001) (Figure 5) [41]. This enabled us to obtain the most appropriate values for the C (i.e., 0.5, Equation (3)) and γ (i.e., 2.8284, Equation (8)) parameters. These parameter values produce the SVRM model with the smallest mean square error (MSE). The predicted result of the single-factor SVRM model is shown in Figure 6. Figure 6 shows that the predictive curve of the single-factor SVRM model and the monitoring curve of the landslide display the same trend and agree overall. However, there are some differences between the predicted and monitored values at individual points, particularly at point no. 5 (Figure 6).
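The division of the 93 monthly records into learning and testing samples can be sketched as a simple chronological split, in which the order of the series is preserved and no shuffling takes place. The function name is hypothetical, and the integer indices stand in for the actual monitoring records.

```python
def chronological_split(samples, n_train):
    """Split a time-ordered series into learning and testing samples."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must lie strictly between 0 and len(samples)")
    return samples[:n_train], samples[n_train:]


# Placeholder indices for the 93 monthly records (April 1998 to December 2005).
records = list(range(93))
learning, testing = chronological_split(records, 62)
```

Keeping the testing samples strictly later in time than the learning samples means the model is evaluated on months it could not have seen during training.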

Two-Factor SVRM Prediction Results
Next, the average monthly displacement rate and monthly rainfall, and the average monthly displacement rate and average monthly reservoir water level, were used as the two main factors to build the two-factor SVRM models. Similarly, the initial 62 datasets and the remaining 31 datasets of the two factors were chosen as the learning and test samples, respectively. The parameters (C and γ) were determined using the same method. The results are shown in Figures 7 and 8. Figure 7 shows that the predictive curve of the two-factor SVRM model (displacement rate and rainfall) agrees well with the monitoring curve. Not only do the two curves display a common changing trend, the predicted and monitored values of the 93 points are also extremely close to each other.

Three-Factor SVRM Prediction Result
Finally, the displacement rate, reservoir water level, and rainfall over the landslide were used as the three main factors for building the three-factor SVRM model. The parameters (C and γ) were also determined using the grid-search method (Figure 9). Figure 10 shows the prediction results.

Discussions
The prediction of the deformation evolution of large-scale landslides is vital for early warning and prevention of landslide hazards. The complexity and randomness of landslides and their influencing factors complicate the accurate description of landslide development. Although many methods for the prediction of landslide displacement series have been presented [2,4,7-10,12,42,43], most of them belong to the category of single-factor models. As is well known, landslides can be influenced by a variety of factors during their evolution, such as rainfall and changes in the water level. Machine learning methods such as ANNs and SVMs can consider the influence of multiple complex factors on landslides. Specifically, the SVM is a novel machine-learning method that aims at the best generalization ability [31]. The method is well suited to small-sample, nonlinear, and high-dimensional problems and has a good predictive ability [31]. It overcomes certain disadvantages of other machine learning methods, such as the local minimum and overlearning problems of ANNs. To date, the approach has been widely used in many different fields [44][45][46][47].
The prediction ability of the models can be assessed by calculating the root mean square error (RMSE) (see Table 2). In general, the smaller the RMSE is, the higher the model accuracy.
Table 2 shows that the SVRM prediction models achieve a good predictive performance, with all of the RMSE values between the predicted and monitored data being less than 0.02. Among the different models, the accuracies of the two- and three-factor models are slightly higher than that of the single-factor model. The two-factor model (displacement rate and rainfall) has the highest accuracy, with the smallest RMSE of 0.00614. The three-factor model has the second highest accuracy, and the single-factor model has the lowest accuracy, with the largest RMSE of 0.01644.
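For reference, the RMSE used to compare the models is the square root of the mean squared difference between predicted and monitored values. The sketch below uses the standard definition; the function name and the numbers are hypothetical and are not taken from Table 2.

```python
import math


def rmse(predicted, monitored):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(monitored) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, monitored)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for illustration only.
perfect = rmse([0.01, 0.02, 0.03], [0.01, 0.02, 0.03])
example = rmse([0.0, 0.0], [3.0, 4.0])
```

Because the errors are squared before averaging, a few large deviations at individual points raise the RMSE more than many small ones.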
The landslide used in this study is a typical, slow, and large-scale landslide. Its deformation and evolution are generally affected by rainfall, changes in the reservoir water level, and other predisposing factors. According to the analysis of monitoring data collected since 1998 and a field survey, the deformation of the landslide has obvious regional differences, elevation differences, depth differences, and seasonal change characteristics. The deformation rate is mainly related to rainfall and rainfall intensity, followed by continuous and significant changes in the reservoir water level. The reservoir water level has an obvious influence on the landslide deformation during the early impounding stage. Thus, the prediction models that consider the rainfall factor have higher accuracies in this study. This helps explain why the accuracy of the two-factor (displacement rate and rainfall) model is the highest.

Conclusions
This study involved the analysis of a complicated large-scale landslide in a hydropower project area in China, and presented the application of multi-factor SVRM methods in landslide deformation prediction. Single-factor, two-factor, and three-factor SVRM models were built, and the parameters of these models were determined using a grid-search method. All of the models showed high accuracy in predicting landslide development. The accuracies of the two- and three-factor models were slightly higher than that of the single-factor model, and the two-factor model (displacement rate and rainfall) had the highest accuracy.
In this regard, the consideration of multiple factors is helpful for improving the accuracy of landslide prediction models because models that include more factors have a higher prediction accuracy than a single-factor model. However, more factors might not produce a higher accuracy model [6,43], depending on whether the factors involved in the model are key factors affecting the development process of a landslide [48]. Thus, determining the key controlling factors is crucial for accurately predicting landslide development.

Data Availability Statement: The data are not publicly available due to privacy or ethical restrictions.