## 1. Introduction

Landslides are among the most notable geological processes that frequently occur in mountainous regions, resulting in financial losses measured in hundreds of billions of euros annually, injuries, and fatalities [1,2,3]. These disasters have attracted unprecedented attention worldwide, and many researchers continue to study them, focusing mainly on landslide prediction [4,5].

Various approaches have been developed for landslide hazard assessment, and they can be grouped into three broad categories: analytic, statistical, and soft computing methods [6]. Analytic approaches consider the failure mechanisms of slopes and can provide an accurate prediction of slope instability. However, when the study area is large, these approaches may be difficult to apply. Statistical analyses, such as discriminant analysis [4,7], logistic regression [8,9], and Bayes learning [10,11], are deemed more suitable for geological hazard assessment in large and complex areas [12,13]. With the development of science and technology, soft computing techniques, such as data mining and artificial intelligence, have also been widely used in geological hazard assessment. Examples of these approaches include artificial neural networks [14,15,16,17], genetic algorithms [18,19], decision trees [20,21], and support vector machines [22,23,24,25]. However, the predictions provided by these methods are not probabilistic.

Hybrid methods, which combine statistical approaches with artificial intelligence, have also been adopted for assessing geological hazards; these include artificial neural network (ANN)-Bayes analysis [14], ANN-fuzzy logic [26], and neuro-fuzzy inference systems [27,28]. However, the ANN-based approaches cannot provide objective and stable assessment results because their outcomes are operator dependent [13,15].

The Bayes learning algorithm is considered to be an effective tool for knowledge representation and reasoning under uncertainty [10]. Based on this algorithm, the relevance vector machine (RVM), a recently developed machine learning technique, was introduced by Tipping [29]. As a Bayesian treatment of the sparse learning problem, the RVM can yield a probabilistic output [30].

In this study, a novel empirical model for slope failure analyses based on RVM is presented. We selected the lower reaches of the Jinsha River close to the Wudongde dam site as the study area; 55 landslides mapped in the region were utilized to train and test the RVM model. To evaluate the validity of the model, it was applied to another landslide site where the environmental conditions are similar to those of the study area.

## 2. Study Area

The study area (Figure 1) lies along the lower reaches of the Jinsha River and is the reservoir region of the Wudongde hydropower station, which is located in the mountains separating the Sichuan and Yunnan provinces. The occurrence of landslides not only poses a threat to human lives and property, but also affects the stability of the Wudongde dam. Thus, assessing the failure potential of slopes in this area is of great significance.

The elevation ranges from 740 to 3900 m. This area experiences a low-latitude plateau subtropical monsoon climate characterized by plenty of sun and large evaporation capacity throughout the year [31]. The mean annual temperature is 20.9 °C. It receives a mean annual rainfall varying from 600 to 800 mm, while the mean annual evaporation is 698 mm.

#### 2.1. Geological and Geomorphological Settings

From the tectonic standpoint, the study area is located in the eastern section of the Tethys-Himalaya tectonic domain, one of the tectonic zones of the Himalaya, which is characterized by intense compression and folding. The predominant regional structures are large-scale faults constituting the well-known Chuan-Dian N-S tectonic belt [32]. A total of 13 regional faults are situated in this region, trending predominantly N-S (Figure 2). Several strong earthquakes have been triggered by these faults since 1955 [33], such as the Lazha earthquake (magnitude 6.7, 1955), triggered by the Tanglang-Yimeng fault, and the Panzhihua earthquake (magnitude 6.1, 2008), triggered by the Mopanshan-Lvzhijiang fault.

The geology comprises two major components: a pre-Sinian crystalline basement and a Sinian-Cretaceous sedimentary cover. The former is mainly composed of metamorphic rocks (phyllite, slate, and schist), which crop out widely along the Jinsha River. The latter consists of limestone, sandstone, mudstone, and shale formations and is also widespread.

The geomorphological settings reflect the complex interplay between the geological and structural conditions of the area. Steep topography predominates in this region, with the average slope angle ranging from 30° to 45°. The valleys present a mountain canyon geomorphology. Geomorphic features include cliffs, ridges, gorges, rocky slopes, and Quaternary deposits along the river valleys. The distribution and extension of the river network and ridges are controlled by the structures to some extent. The effect of high relief and structural control is also well reflected by deep gorges and narrow valleys carved by numerous channels.

Phyllite, slate, schist, shale, and mudstone are strongly weathered and fractured and are prone to slope failure. Quaternary deposits, composed of alluvial and eluvial material, often crop out as a cover layer on the riverbed and gentle slopes and are affected by translational or rotational landslides triggered by the undercutting of the Jinsha River.

#### 2.2. Landslide Identification

Landslide identification was carried out using SPOT5 remote sensing images and field surveys. SPOT5 images, acquired by the SPOT5 satellite (launched in 2002) at a multispectral resolution of 10 m and a panchromatic resolution of 2.5 m, have been widely used in geological hazard surveys. A series of field investigations was conducted to confirm the landslides mapped from the remote sensing images and to investigate the link between landslide occurrence and environmental factors. In general, three main types of landslides are present in the study area: translational movements (55 locations), rock falls (26 locations), and debris flows (239 locations). Each type of landslide has a different mechanism and requires separate study for spatial prediction. Therefore, in this study, only the 55 translational landslides (Figure 3) were used for analysis. At the time of fieldwork, 44% of the slope movements were classified as active, 33% as dormant, and 23% as stabilized. The highest frequency of landslide phenomena was recorded in mudstones and metamorphic rock formations (phyllite, slate, and schist). The smallest landslide covered an area of 0.03 km^{2}, while the largest one covered 4.2 km^{2}. Figure 4 gives some examples of the landslides.

## 3. Influencing Factors

Landslide hazard assessment is often performed based on the assumption that future landslides will occur in areas where environmental conditions are similar to those of past and present failures [34]. Therefore, it relies strongly on identifying landslide scars, characterizing the properties of failed sites, and confirming the link between landslide distribution and the spatial variation of parameters [35]. There is no standard principle for selecting the influencing parameters because they often vary from one area to another. However, it is commonly recognized that parameter selection depends on the environmental conditions of the study area, the mechanisms of failed slopes, and the scale of analysis [17]. It should be implemented according to data availability and the significance of the data in relation to the problem in question [23].

Based on the review of previous literature [13,16,17,25] and field surveys, seven factors that were identified as causative and triggering factors for landslide activity in the study area were selected to predict the failure potential of slopes. They are lithology, slope angle, slope aspect, slope height, slope structure, distance from faults, and land use. This research selected a single slope as a computing unit [16], and values of the influence factors were derived for each slope in the database. Table 1 presents the factors and their classes used in this study. The frequency distribution of landslide occurrence in each class is shown in Figure 5.

#### 3.1. Lithology

Lithology has been considered to be one of the main parameters influencing the stability of slopes. Since different lithologic units have different slope stability performances, they are very important for landslide prediction [4]. The large surface development of formations, such as schist formations, flysch sediments, and metamorphic rock formations with anisotropic geomechanical behavior, facilitates the manifestation of abundant slope failures. The lithology was derived from a lithologic map at a 1:100,000 scale and was grouped into three categories (Table 1) according to Peng et al. [25] and Fourniadis et al. [36]. Figure 5a shows that approximately 87% of the landslides occurred on Quaternary deposits, strongly weathered sandstones, mudstones, and metamorphic rocks (phyllite, slate, and schist).

#### 3.2. Slope Angle

Slope angle is an essential factor controlling the stability of a slope. The shear stress induced by gravity increases with the slope angle; therefore, landslides tend to occur more frequently on steeper slopes [24]. The slope angle was measured from the slope profile derived from the digital elevation model (DEM). Note that a slope angle of 10° was taken as the threshold because very few slopes gentler than that angle have failed [16]. The slope angle of the study area was divided into five classes (Table 1): 0°–10°, 10°–20°, 20°–30°, 30°–40°, and >40°. Figure 5b illustrates that the largest numbers of landslides were recorded in the classes of 20°–30° (49.1%) and 10°–20° (32.7%).
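The class assignment described above can be illustrated with a short sketch. This is only an illustration, not the authors' procedure: the function name `slope_angle_class`, the example angles, and the choice to place a boundary value (for instance exactly 20°) in the lower class are assumptions, since the text does not say which class owns a boundary.

```python
import numpy as np

# Upper bounds (degrees) of the first four slope-angle classes listed in the text;
# angles above 40 degrees fall into the fifth class.
CLASS_EDGES = np.array([10.0, 20.0, 30.0, 40.0])
CLASS_LABELS = ["0-10", "10-20", "20-30", "30-40", ">40"]


def slope_angle_class(angle_deg):
    """Return the class index (0..4) for one or more slope angles in degrees."""
    # right=True assigns a boundary value such as 20.0 to the lower class (10-20).
    # This boundary convention is an assumption, not something stated in the source.
    return np.digitize(angle_deg, CLASS_EDGES, right=True)


# Hypothetical example angles, one per class.
angles = np.array([5.0, 15.0, 25.0, 35.0, 47.0])
classes = slope_angle_class(angles)
labels = [CLASS_LABELS[i] for i in classes]
```

The same binning pattern would apply to the other numeric factors (slope height, distance from faults) once their class limits from Table 1 are substituted for `CLASS_EDGES`.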

#### 3.3. Slope Height

Slope height denotes the elevation difference between the head and the toe of a slope. A higher slope often has a higher probability of experiencing a landslide event [37,38]. The slope height was directly derived from the DEM. As listed in Table 1, the slope height was partitioned into four classes according to Zhao et al. [38] and field surveys. The distribution of landslides in the height classes is shown in Figure 5c.

#### 3.4. Slope Aspect

Aspect controls some microclimatic parameters, such as exposure to sunlight and winds, rainfall intensity, and soil moisture [17]. It was obtained from the DEM and was categorized into four classes according to Conforti et al. [17], as shown in Table 1. The relationship between slope aspect and failed slopes is presented in Figure 5d.

#### 3.5. Slope Structure

Slope structure represents the spatial relationship between the rock strata and the slope face. Slopes in the study area were measured during field investigations and were classified into four structural categories (Table 1): anti-dip, insequent, transverse, and dip-bedded. Field investigations demonstrate that dip-bedded slopes are prone to failure because of the erosion of slope toes by the Jinsha River. Figure 5e indicates that slope failures predominantly occurred on dip-bedded slopes, which account for 70.9% of the landslides in the area. However, only 5.5% of the landslides occurred on anti-dip slopes.

#### 3.6. Distance from Faults

Field evidence suggests a strong influence of tectonic settings on the occurrence of landslides [39]. The distance from faults was obtained from the geological map of the region (1:100,000). Approximately 49.1% (Figure 5f) of slope failures occurred at sites with a minimum distance of less than 500 m from faults.

#### 3.7. Land Use

Land use has been used as a predisposing factor in landslide hazard assessment. It was divided into three classes (Table 1) based on field surveys. Figure 5g shows that barren land and residential land have a significant influence on the occurrence of landslides, and both of them account for 41.8% of slope failures.

## 4. Relevance Vector Machine

RVM is based on a Bayesian learning framework and can output the probabilities of class membership. It has the same functional form as the support vector machine (SVM) and shares many of the characteristics of SVM whilst avoiding the SVM’s main limitations [40]. The structure of the RVM is represented by a weighted sum of kernel functions, which is expressed as follows [41]:

$$y(\mathbf{x}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0 \tag{1}$$

where $w_i$ is the weight, $w_0$ is the bias, and $K(\mathbf{x}, \mathbf{x}_i)$ is a kernel function. The commonly used kernel is the Gaussian kernel $K(\mathbf{x}, \mathbf{x}_i) = \exp(-\lVert \mathbf{x} - \mathbf{x}_i \rVert^{2}/\sigma^{2})$, where $\sigma$ is the kernel parameter controlling the sensitivity of the kernel. This function is not sensitive to outliers and can handle the case in which the relationship between class labels and attributes is nonlinear [13].
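As a rough illustration of the kernel expansion and the Gaussian kernel described above, the following sketch evaluates the kernel, builds a design matrix with a leading column of ones, and computes the model output for given weights. The function names, the example points, and the weight values are hypothetical. The bias stored as the first weight is an assumption inferred from the leading 1 in the basis vector defined later in this section.

```python
import numpy as np


def gaussian_kernel(X, Z, sigma):
    """K(x, z) = exp(-||x - z||^2 / sigma^2) for every pair of rows of X and Z."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma**2)


def design_matrix(X, X_train, sigma):
    """Rows are [1, K(x, x_1), ..., K(x, x_N)], giving shape (len(X), N + 1)."""
    K = gaussian_kernel(X, X_train, sigma)
    return np.hstack([np.ones((X.shape[0], 1)), K])


def rvm_output(X_new, X_train, w, sigma):
    """y(x) = w_0 + sum_i w_i K(x, x_i); w has length N + 1 with the bias first."""
    return design_matrix(X_new, X_train, sigma) @ w


# Hypothetical training points and weights, for illustration only.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
w = np.array([0.5, 1.0, -1.0, 0.0])
K = gaussian_kernel(X_train, X_train, sigma=1.0)
Phi = design_matrix(X_train, X_train, sigma=1.0)
y = rvm_output(X_train[:1], X_train, w, sigma=1.0)
```

A weight of zero, as for the third training point here, removes that point's kernel from the sum, which is how sparsity shows up in the trained model.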

In this paper, the basic theory of RVM classification is briefly introduced. For further details of RVM, readers can refer to Tipping [29,41] and Bishop [40]. For two-class (binary) classification, RVM is used to predict the posterior probability of membership of one of the classes, given the input $\mathbf{x}$. By applying the logistic sigmoid function $\sigma(y) = 1/(1 + e^{-y})$ to $y(\mathbf{x})$ and adopting the Bernoulli distribution for $p(\mathbf{t} \mid \mathbf{w})$, the likelihood can be written as [40]:

$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{i=1}^{N} \sigma\{y(\mathbf{x}_i)\}^{t_i} \left[1 - \sigma\{y(\mathbf{x}_i)\}\right]^{1 - t_i} \tag{2}$$

where the targets $t_i \in \{0, 1\}$. In this study, 0 and 1 denote the stable and failed cases, respectively. The weights $\mathbf{w}$ cannot be integrated out analytically, so no closed-form expression is available for either the weight posterior $p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha})$ or the marginal likelihood $p(\mathbf{t} \mid \boldsymbol{\alpha})$, where $\boldsymbol{\alpha}$ is a hyper-parameter vector. Therefore, according to Tipping [40], the following approximation procedure proposed by MacKay [42] and based on Laplace’s method is utilized:

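(1) For the current fixed values of $\boldsymbol{\alpha}$, the most probable weights $\mathbf{w}_{MP}$ are found, giving the location of the mode of the posterior distribution. Since $p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}) \propto p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w} \mid \boldsymbol{\alpha})$, this is equivalent to finding the maximum, over $\mathbf{w}$, of

$$\log\{p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w} \mid \boldsymbol{\alpha})\} = \sum_{i=1}^{N} \left[t_i \log y_i + (1 - t_i) \log(1 - y_i)\right] - \frac{1}{2} \mathbf{w}^{T} \mathbf{A} \mathbf{w} \tag{3}$$

where $y_i = \sigma\{y(\mathbf{x}_i; \mathbf{w})\}$ and $\mathbf{A} = \mathrm{diag}(\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_N)$ for the current values of $\boldsymbol{\alpha}$. Since Equation (3) is a penalized logistic log-likelihood function, iterative maximization is required. The iteratively reweighted least-squares algorithm is adopted to find $\mathbf{w}_{MP}$ in the following procedure.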

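(2) Equation (3) is differentiated twice to give:

$$\nabla_{\mathbf{w}} \nabla_{\mathbf{w}} \log p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}) \Big|_{\mathbf{w}_{MP}} = -(\boldsymbol{\Phi}^{T} \mathbf{B} \boldsymbol{\Phi} + \mathbf{A}) \tag{4}$$

where $\mathbf{B} = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_N)$ is a diagonal matrix with elements $\beta_i = \sigma\{y(\mathbf{x}_i)\}[1 - \sigma\{y(\mathbf{x}_i)\}]$, and $\boldsymbol{\Phi} = [\phi(\mathbf{x}_1), \phi(\mathbf{x}_2), \ldots, \phi(\mathbf{x}_N)]^{T}$ is the $N \times (N + 1)$ design matrix with $\phi(\mathbf{x}_i) = [1, K(\mathbf{x}_i, \mathbf{x}_1), K(\mathbf{x}_i, \mathbf{x}_2), \ldots, K(\mathbf{x}_i, \mathbf{x}_N)]^{T}$. This is then negated and inverted to give the covariance $\boldsymbol{\Sigma}$ for a Gaussian approximation to the posterior over weights centered at $\mathbf{w}_{MP}$:

$$\boldsymbol{\Sigma} = (\boldsymbol{\Phi}^{T} \mathbf{B} \boldsymbol{\Phi} + \mathbf{A})^{-1} \tag{5}$$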

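(3) Using the statistics $\boldsymbol{\Sigma}$ and $\mathbf{w}_{MP}$ of the Gaussian approximation, MacKay’s approach is used to update the hyper-parameters $\boldsymbol{\alpha}$ by

$$\alpha_i^{\mathrm{new}} = \frac{1 - \alpha_i \Sigma_{ii}}{w_{MP,i}^{2}} \tag{6}$$

where $\Sigma_{ii}$ is the $i$th diagonal element of the covariance $\boldsymbol{\Sigma}$, $w_{MP,i}$ is the $i$th element of $\mathbf{w}_{MP}$, and $\mathbf{w}_{MP} = \boldsymbol{\Sigma} \boldsymbol{\Phi}^{T} \mathbf{B} \mathbf{t}$.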

During the optimization process, many $\alpha_i$ will take large (in principle infinite) values, so the corresponding weights tend toward zero. These weights and the corresponding basis functions are therefore removed from the model and play no role in making predictions for new inputs. The examples remaining with $w_i \neq 0$ are termed relevance vectors.
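Steps (1) to (3) and the pruning of basis functions can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation used in the study. The hyper-parameter update is taken to be the standard form $\alpha_i = (1 - \alpha_i \Sigma_{ii})/w_i^{2}$ from Tipping's formulation. The function name `fit_rvm`, the pruning threshold, the step-halving safeguard, the clipping constants, and the synthetic one-dimensional data are all illustrative choices.

```python
import numpy as np


def sigmoid(y):
    # Clip the argument so np.exp cannot overflow for extreme activations.
    return 1.0 / (1.0 + np.exp(-np.clip(y, -30.0, 30.0)))


def log_posterior(w, Phi, t, alpha):
    # Penalized logistic log-likelihood maximized in step (1), up to a constant.
    y = np.clip(sigmoid(Phi @ w), 1e-12, 1.0 - 1e-12)
    return np.sum(t * np.log(y) + (1 - t) * np.log(1 - y)) - 0.5 * np.sum(alpha * w**2)


def fit_rvm(Phi, t, n_outer=100, n_newton=25, prune_at=1e6):
    """Return (weights, indices of the retained columns of Phi)."""
    active = np.arange(Phi.shape[1])
    alpha = np.ones(active.size)
    w = np.zeros(active.size)
    for _ in range(n_outer):
        P = Phi[:, active]
        # Step (1): find w_MP for fixed alpha with Newton (IRLS-type) iterations.
        for _ in range(n_newton):
            y = sigmoid(P @ w)
            beta = y * (1.0 - y)
            grad = P.T @ (t - y) - alpha * w
            H = (P.T * beta) @ P + np.diag(alpha)  # negated Hessian
            step = np.linalg.solve(H, grad)
            lam, f_old = 1.0, log_posterior(w, P, t, alpha)
            # Safeguard (an addition of this sketch): halve steps that lower the objective.
            while log_posterior(w + lam * step, P, t, alpha) < f_old and lam > 1e-4:
                lam *= 0.5
            w = w + lam * step
            if np.max(np.abs(lam * step)) < 1e-8:
                break
        # Step (2): covariance of the Gaussian approximation centered at w_MP.
        y = sigmoid(P @ w)
        beta = y * (1.0 - y)
        Sigma = np.linalg.inv((P.T * beta) @ P + np.diag(alpha))
        # Step (3): hyper-parameter update, clipped to keep the arithmetic finite.
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha_new = np.clip(gamma / (w**2 + 1e-30), 1e-6, 1e12)
        converged = np.max(np.abs(np.log(alpha_new) - np.log(alpha))) < 1e-4
        alpha = alpha_new
        # Pruning: drop basis functions whose alpha has grown very large.
        keep = alpha < prune_at
        if not keep.any():
            break
        active, alpha, w = active[keep], alpha[keep], w[keep]
        if converged:
            break
    return w, active


# Synthetic 1-D example (hypothetical data): 0 = stable, 1 = failed.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-1.5, 1.0, 30), rng.normal(1.5, 1.0, 30)])
t = np.concatenate([np.zeros(30), np.ones(30)])
K = np.exp(-((X[:, None] - X[None, :]) ** 2) / 2.0**2)  # Gaussian kernel, sigma = 2
Phi = np.hstack([np.ones((X.size, 1)), K])
w, active = fit_rvm(Phi, t)
p_fail = sigmoid(Phi[:, active] @ w)  # predicted failure probabilities
```

The columns listed in `active` (other than column 0, the bias) correspond to the training examples that survive pruning, that is, the relevance vectors. The values in `p_fail` are the class-membership probabilities that distinguish the RVM from a plain SVM output.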

## 6. Conclusions

This paper presents a novel approach for assessing the failure probability of slopes based on a relevance vector machine (RVM). The landslides mapped in the lower reaches of the Jinsha River were used to train and test the RVM model. Seven parameters, namely lithology, slope angle, slope height, slope aspect, slope structure, distance from faults, and land use, were selected as influencing factors of slope failures. The trained RVM model with seven factors is shown to be effective in classifying slopes into groups of stable ones and failed ones. The accuracies of the model in predicting the failure potential of slopes, using both training and testing data sets, are all very high and deemed satisfactory. To evaluate the model’s performance, it was applied to landslide sites identified in the lower reaches of the Jinsha River, where environmental conditions are similar to those of the study area. An accuracy of approximately 92.9% was obtained, indicating that the model has a good generalization performance.

The study demonstrates that the RVM model is stable and reliable and could be used to predict the occurrence of translational landslides in hazard mitigation and guarding systems. Another important advantage of the model is that it can provide a probabilistic prediction for slope failures.