# Landslide Susceptibility Mapping Combining Information Gain Ratio and Support Vector Machines: A Case Study from Wushan Segment in the Three Gorges Reservoir Area, China


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Description of the Study Area

#### 2.2. Methodology

#### 2.2.1. Information Gain Ratio

$C_i$ (landslide, non-landslide) is a classification set of the sample data, and the information entropy of the factors can be obtained with the following formula:

For the subsets ($T_1$, $T_2$, …, $T_m$) split from $T$ by the causal factor $F$, the information entropy is estimated as:
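
A minimal Python sketch of the information gain ratio for one categorical causal factor, assuming the standard C4.5 definitions: $\mathrm{Info}(T) = -\sum_i p_i \log_2 p_i$ over the classes $C_i$, $\mathrm{Info}_F(T) = \sum_j \frac{|T_j|}{|T|}\mathrm{Info}(T_j)$, $\mathrm{SplitInfo}_F(T) = -\sum_j \frac{|T_j|}{|T|}\log_2\frac{|T_j|}{|T|}$, and $\mathrm{IGR}(F) = \frac{\mathrm{Info}(T) - \mathrm{Info}_F(T)}{\mathrm{SplitInfo}_F(T)}$. The function names are illustrative and not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(T): Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain_ratio(factor_values, labels):
    """IGR of one categorical causal factor F.

    factor_values: category of F for each sample (e.g. a slope interval)
    labels: 1 for landslide, 0 for non-landslide samples
    """
    n = len(labels)
    # Split T into subsets T_1..T_m, one per category of F
    subsets = {}
    for value, label in zip(factor_values, labels):
        subsets.setdefault(value, []).append(label)
    # Expected information after splitting T on F
    info_f = sum(len(s) / n * entropy(s) for s in subsets.values())
    # Split information penalises factors with many categories
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    if split_info == 0:
        return 0.0  # F has a single category and carries no information
    return (entropy(labels) - info_f) / split_info


# Toy usage with hypothetical samples
print(information_gain_ratio(["a", "a", "b", "b"], [1, 1, 0, 0]))  # → 1.0
```

A factor that separates the two classes perfectly scores 1.0, while one whose categories have identical landslide proportions scores 0.0; ranking factors by this score is how the less important ones can be identified.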

#### 2.2.2. Support Vector Machines

Given $n$ training samples $(x_i, y_i)$, $i = 1, 2, \dots, n$, with class labels $y_i \in \{-1, +1\}$, the optimal separating hyperplane can be obtained by solving the following problem:

where $\xi_i$ are positive slack variables that allow penalized constraint violations for the data points, and $C$ is the penalty parameter that controls the trade-off between the complexity of the decision function and the number of misclassified training examples. The problem can be converted into an equivalent dual problem based on Wolfe duality theory:

where $\alpha_i$ are Lagrange multipliers and $C$ is the penalty parameter. The decision function used to classify new data can then be written as:

where $K(x_i, x_j)$ is the kernel function. The radial basis function (RBF) kernel was adopted as the kernel function of the SVM model in this study.
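
A minimal sketch of how a trained RBF-kernel SVM classifies a new sample, assuming the usual decision rule $f(x) = \operatorname{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$ with $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$ and γ = 1.3 from the parameter table. The support vectors, multipliers, and bias are hypothetical toy values; in practice they come from solving the dual problem with an SVM library.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, alphas, labels, b, gamma):
    """sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    return sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, a, y in zip(support_vectors, alphas, labels)
    ) + b


def classify(x, support_vectors, alphas, labels, b, gamma=1.3):
    """Sign of the decision value: +1 landslide, -1 non-landslide."""
    return 1 if decision_value(x, support_vectors, alphas, labels, b, gamma) >= 0 else -1


# Hypothetical toy model: two support vectors in a two-factor space,
# with multipliers inside [0, C] for C = 20.
SV = [[0.9, 0.8], [0.1, 0.2]]
ALPHAS = [1.0, 1.0]
Y = [1, -1]
print(classify([0.85, 0.75], SV, ALPHAS, Y, b=0.0))  # → 1
```

The kernel value decays with squared distance, so a larger γ makes each support vector influence a smaller neighbourhood of the factor space.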

#### 2.2.3. Artificial Neural Networks

where $\eta_{min}$ is the minimum learning rate, $\eta_{max}$ is the maximum learning rate, and $d$ is the decay rate. In this study, the initial learning rate, the maximum and minimum learning rates, and the decay rate were set to 0.3, 0.1, 0.01, and 30, respectively.
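
A minimal sketch of one learning-rate schedule built from these four parameters. The update rule is an assumption, not the study's equation: each epoch η is multiplied by $\exp(\ln(\eta_{min}/\eta_{max})/d)$, so a rate starting at $\eta_{max}$ reaches $\eta_{min}$ after about $d$ epochs, and η is reset to $\eta_{max}$ whenever it would drop below $\eta_{min}$. The function name `eta_schedule` is hypothetical.

```python
import math


def eta_schedule(n_epochs, eta_init=0.3, eta_max=0.1, eta_min=0.01, d=30):
    """Hypothetical learning-rate schedule (exponential decay with reset).

    Each epoch eta is multiplied by exp(ln(eta_min / eta_max) / d); when the
    next value would fall below eta_min, eta is reset to eta_max.
    """
    factor = math.exp(math.log(eta_min / eta_max) / d)
    etas = [eta_init]
    for _ in range(n_epochs - 1):
        nxt = etas[-1] * factor
        etas.append(eta_max if nxt < eta_min else nxt)
    return etas


rates = eta_schedule(100)
print(rates[0], rates[45])  # → 0.3 0.1
```

Under this assumption the initial rate of 0.3 decays for 45 epochs before the first reset to 0.1, after which the rate cycles between 0.1 and 0.01.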

#### 2.2.4. Classification and Regression Tree

For the data set $X_{m,p}$, all samples were sorted by each attribute, and the average of every two adjacent values was taken as a candidate split point, denoted $\eta_s$ ($s = 1, 2, \dots, m-1$). The data set $X_{m,p}$ was divided into two subsets according to the value of attribute $F$: the subset $X_1$, with values larger than $\eta_s$, and the subset $X_2$, with values smaller than or equal to $\eta_s$. The GINI coefficient of this split can be expressed as:

where $|X_1|$ is the number of samples in subset $X_1$, $|X_2|$ is the number of samples in subset $X_2$, and $I(X)$ can be calculated using the following formula:

where $|X_j|$ is the number of samples in data set $X_j$, and $|C_j|$ is the number of samples in $X_j$ belonging to class $C_j$.

Because $X_{m,p}$ contained $m$ samples and $p$ attributes, each attribute had $m-1$ candidate split points. The GINI coefficient of each split point was $G_F^{\eta_s}(X)$, and the point with the minimum GINI coefficient was selected to partition the data set $X_{m,p}$.
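
A minimal sketch of the split search for one numeric attribute, assuming the standard Gini impurity $I(X) = 1 - \sum_j (|C_j|/|X|)^2$ and the weighted coefficient $G = \frac{|X_1|}{|X|} I(X_1) + \frac{|X_2|}{|X|} I(X_2)$. Candidate points are midpoints of adjacent sorted values, as described above; the function names are illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    """I(X) = 1 - sum_j (|C_j| / |X|)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (eta_s, G) minimising the weighted GINI coefficient."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    n = len(pairs)
    best = (None, float("inf"))
    for s in range(n - 1):
        if xs[s] == xs[s + 1]:
            continue  # identical adjacent values give no new split point
        eta = (xs[s] + xs[s + 1]) / 2
        x1 = [y for v, y in pairs if v > eta]   # X_1: values > eta_s
        x2 = [y for v, y in pairs if v <= eta]  # X_2: values <= eta_s
        g = len(x1) / n * gini_impurity(x1) + len(x2) / n * gini_impurity(x2)
        if g < best[1]:
            best = (eta, g)
    return best


print(best_split([5, 10, 20, 30], [0, 0, 1, 1]))  # → (15.0, 0.0)
```

Rebuilding both subsets for every candidate keeps the sketch close to the description but costs O(m²) per attribute; a production tree would update class counts incrementally while scanning the sorted values.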

#### 2.2.5. Logistic Regression

where $x_i$ ($i = 1, 2, \dots, n$) are the predictor variables and $\beta_i$ ($i = 1, 2, \dots, n$) are the coefficients of the LR model.
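
A minimal sketch of how a fitted LR model turns predictor values into a landslide probability, assuming the standard logistic form $P = 1/(1 + e^{-z})$ with $z = \beta_0 + \sum_i \beta_i x_i$. The intercept and coefficients in the usage line are hypothetical, not values fitted in this study.

```python
import math


def landslide_probability(x, beta0, betas):
    """P = 1 / (1 + exp(-z)) with z = beta0 + sum_i beta_i * x_i."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    # Numerically stable logistic function for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients and normalised factor values, for illustration only
print(landslide_probability([0.99, 0.794], beta0=-1.0, betas=[2.0, 1.5]))
```

Splitting on the sign of $z$ avoids overflow in `math.exp` for strongly negative linear predictors.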

#### 2.3. Data Preparation and Analysis

#### 2.3.1. Landslide Inventory Map

The area of a single landslide ranged from 1664 m² to 1.06 km². Most of the landslides in the study area occurred along the banks of the Yangtze River and in gullies.

#### 2.3.2. Landslide Causal Factors

where $A_s$ is the catchment area of the basin and β is the slope. The SPI was divided into four categories (Figure 2f): [0, 2), [2, 4), [4, 8), and [8, +∞), with information values of 0.262, −0.020, −0.327, and −0.436, respectively (Table 1).
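
The information values in Table 1 can be reproduced as the base-2 logarithm of the frequency ratio, i.e. the share of landslide pixels falling in a class divided by the share of domain pixels in that class; for SPI [0, 2), log₂((13,724/25,212)/(180,391/397,289)) ≈ 0.262. A minimal sketch, using the totals of 25,212 landslide pixels and 397,289 domain pixels obtained by summing any factor's rows in Table 1; SPI is computed with the common definition $A_s \tan\beta$, which is an assumption here. Function names are illustrative.

```python
import math

# Totals implied by Table 1 (each factor's rows sum to these counts)
TOTAL_LANDSLIDE_PX = 25_212
TOTAL_DOMAIN_PX = 397_289


def frequency_ratio(landslide_px, class_px,
                    total_landslide_px=TOTAL_LANDSLIDE_PX, total_px=TOTAL_DOMAIN_PX):
    """Share of landslide pixels in the class / share of domain pixels in the class."""
    return (landslide_px / total_landslide_px) / (class_px / total_px)


def information_value(landslide_px, class_px, **totals):
    """IV = log2(FR); -inf for a class with no landslide pixels."""
    if landslide_px == 0:
        return float("-inf")
    return math.log2(frequency_ratio(landslide_px, class_px, **totals))


def spi(catchment_area, slope_deg):
    """Assumed definition SPI = A_s * tan(beta), with beta in degrees."""
    return catchment_area * math.tan(math.radians(slope_deg))


def spi_class(value):
    """Index of the SPI interval: 0 -> [0,2), 1 -> [2,4), 2 -> [4,8), 3 -> [8, +inf)."""
    for index, upper in enumerate((2, 4, 8)):
        if value < upper:
            return index
    return 3


print(round(information_value(13_724, 180_391), 3))  # SPI [0, 2) → 0.262
```

The same `frequency_ratio` reproduces the Frequency Ratio column of the susceptibility-level table, e.g. about 4.432 for the SVM high-susceptibility class (22,360 landslide pixels in 79,500 domain pixels).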

#### 2.4. Landslide Causal Factors Selection

#### 2.4.1. Multicollinearity Analysis

#### 2.4.2. Factor Selection Using Information Gain Ratio

## 3. Results and Accuracy Analysis

#### 3.1. Landslide Susceptibility Modelling

#### 3.2. Accuracy Statistic

#### 3.3. Using ROC Curve

## 4. Discussion

(T₂b³, T₂b⁴) had a positive effect on landslides in this area, and their average merit values were 0.061 and 0.029, respectively (Figure 3). A total of 62% of the landslides lay within 300 m of the Yangtze River, and nearly 60% of the landslides occurred in the T₂b³ and T₂b⁴ strata, which are regarded as the main landslide-prone strata in the TGRA [37].

## 5. Conclusions

(T₂b³ and T₂b⁴); (2) IGR is an effective method for evaluating the importance of landslide causal factors, and eliminating the less important factors improved prediction accuracy in landslide susceptibility modelling (from 0.918 to 0.922 in this study); and (3) the SVM model showed the best performance in this study area (AUC of 0.927 for training and 0.922 for prediction), so it can be recommended for susceptibility modelling in the TGRA and other landslide-prone regions.

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

**Figure 1.** (**a**) Location of the Three Gorges Reservoir area (TGRA) in China. (**b**) Location of the study area. (**c**) Elevation map of the study area with landslide distribution (the landslide polygons were obtained from historical landslide data, field investigation, and high-resolution remote sensing images).

**Figure 2.** Landslide causal factors of the study area: (**a**) slope, (**b**) aspect, (**c**) curvature, (**d**) plan curvature, (**e**) profile curvature, (**f**) SPI, (**g**) TWI, (**h**) TRI, (**i**) lithology, (**j**) bedding structure, (**k**) distance to faults, (**l**) distance to rivers, (**m**) distance to gully.

**Figure 4.** Landslide susceptibility maps obtained from the (**a**) ANN model, (**b**) logistic regression (LR) model, (**c**) SVM model, and (**d**) classification and regression tree (CART) model.

**Figure 5.** The receiver operating characteristic (ROC) curves of the SVM, ANN, LR, and CART models in landslide susceptibility assessment: (**a**) training and (**b**) validation.

| Causal Factor | Category | Pixels in Landslides | Pixels in TD | Proportion of LTL (%) | Proportion of DTD (%) | IV | NC |
|---|---|---|---|---|---|---|---|
| Altitude (m) | <300 | 17,324 | 81,071 | 68.71 | 20.41 | 1.752 | 0.990 |
| | 300–450 | 6049 | 86,452 | 23.99 | 21.76 | 0.141 | 0.663 |
| | 450–750 | 1839 | 113,518 | 7.29 | 28.57 | −1.970 | 0.337 |
| | >750 | 0 | 116,248 | 0 | 29.26 | −∞ | 0.01 |
| Slope (°) | <6 | 538 | 8342 | 2.13 | 2.10 | 0.023 | 0.598 |
| | 6–15 | 4196 | 30,806 | 16.64 | 7.75 | 1.102 | 0.99 |
| | 15–24 | 9711 | 102,948 | 38.52 | 25.91 | 0.572 | 0.794 |
| | 24–33 | 7608 | 129,123 | 30.18 | 32.50 | −0.107 | 0.402 |
| | 33–51 | 3153 | 118,589 | 12.51 | 29.85 | −1.255 | 0.206 |
| | 51–75 | 6 | 7481 | 0.02 | 1.88 | −6.306 | 0.01 |
| Aspect (°) | 0–45 | 3427 | 45,388 | 13.59 | 11.42 | 0.251 | 0.849 |
| | 45–90 | 2363 | 39,597 | 9.37 | 9.97 | −0.089 | 0.283 |
| | 90–135 | 3380 | 43,368 | 13.41 | 10.92 | 0.296 | 0.99 |
| | 135–180 | 4067 | 60,128 | 16.13 | 15.13 | 0.092 | 0.707 |
| | 180–225 | 2058 | 44,740 | 8.16 | 11.26 | −0.464 | 0.01 |
| | 225–270 | 1750 | 33,824 | 6.94 | 8.51 | −0.295 | 0.141 |
| | 270–315 | 3180 | 50,727 | 12.61 | 12.77 | −0.018 | 0.424 |
| | 315–360 | 4987 | 79,517 | 19.78 | 20.01 | −0.017 | 0.566 |
| Curvature | −24 to −1 | 3254 | 369,402 | 12.91 | 92.98 | −2.849 | 0.01 |
| | −1 to 3 | 21,577 | 26,749 | 85.58 | 6.73 | 3.668 | 0.99 |
| | 3–7 | 372 | 993 | 1.48 | 0.25 | 2.562 | 0.663 |
| | 7–27 | 9 | 145 | 0.04 | 0.04 | −0.032 | 0.337 |
| Plan curvature | −13 to −1.5 | 562 | 13,106 | 2.23 | 3.30 | −0.566 | 0.5 |
| | −1.5 to 1.5 | 24,231 | 372,725 | 96.11 | 93.82 | 0.035 | 0.99 |
| | 1.5–10.5 | 419 | 11,458 | 1.66 | 2.88 | −0.795 | 0.01 |
| Profile curvature | −18 to −2 | 397 | 11,732 | 1.57 | 2.95 | −0.907 | 0.01 |
| | −2 to 2 | 24,319 | 372,535 | 96.46 | 93.77 | 0.041 | 0.99 |
| | 2–18 | 496 | 13,022 | 1.97 | 3.28 | −0.736 | 0.5 |
| Stream power index (SPI) | 0–2 | 13,724 | 180,391 | 54.43 | 45.41 | 0.262 | 0.99 |
| | 2–4 | 4304 | 68,746 | 17.07 | 17.30 | −0.020 | 0.663 |
| | 4–8 | 3196 | 63,159 | 12.68 | 15.90 | −0.327 | 0.337 |
| | >8 | 3988 | 84,993 | 15.82 | 21.39 | −0.436 | 0.01 |
| Topographic wetness index (TWI) | 0–4.5 | 18,990 | 289,614 | 75.32 | 72.90 | 0.047 | 0.663 |
| | 4.5–6.5 | 4856 | 85,391 | 19.26 | 21.49 | −0.158 | 0.337 |
| | 6.5–8.5 | 954 | 14,335 | 3.78 | 3.61 | 0.069 | 0.99 |
| | >8.5 | 412 | 7949 | 1.63 | 2.00 | −0.292 | 0.01 |
| Terrain roughness index (TRI) | 1–1.2 | 22,324 | 278,274 | 88.55 | 70.04 | 0.338 | 0.99 |
| | 1.2–1.4 | 2645 | 93,562 | 10.49 | 23.55 | −1.167 | 0.663 |
| | 1.4–1.6 | 239 | 18,431 | 0.95 | 4.64 | −2.291 | 0.337 |
| | >1.6 | 4 | 7022 | 0.02 | 1.77 | −6.800 | 0.01 |
| Distance to rivers (m) | 0–150 | 9958 | 41,767 | 39.50 | 10.51 | 1.910 | 0.99 |
| | 150–300 | 5659 | 35,396 | 22.45 | 8.91 | 1.333 | 0.794 |
| | 300–650 | 5047 | 67,801 | 20.02 | 17.07 | 0.230 | 0.598 |
| | 650–950 | 2259 | 47,096 | 8.96 | 11.85 | −0.404 | 0.402 |
| | 950–1550 | 1808 | 69,776 | 7.17 | 17.56 | −1.292 | 0.206 |
| | >1550 | 481 | 135,453 | 1.91 | 34.09 | −4.160 | 0.01 |
| Distance to gully (m) | 0–150 | 15,036 | 194,536 | 59.64 | 48.97 | 0.284 | 0.99 |
| | 150–350 | 7653 | 106,289 | 30.35 | 26.75 | 0.182 | 0.75 |
| | 350–500 | 1553 | 30,901 | 6.16 | 7.78 | −0.337 | 0.5 |
| | 500–900 | 962 | 36,022 | 3.82 | 9.07 | −1.249 | 0.26 |
| | >900 | 8 | 29,541 | 0.03 | 7.44 | −7.872 | 0.01 |
| Distance to faults (m) | 0–450 | 14,652 | 154,959 | 58.12 | 39.00 | 0.575 | 0.99 |
| | 450–900 | 7121 | 77,607 | 28.24 | 19.53 | 0.532 | 0.663 |
| | 900–1750 | 3155 | 75,914 | 12.51 | 19.11 | −0.611 | 0.337 |
| | >1750 | 284 | 88,809 | 1.13 | 22.35 | −4.311 | 0.01 |
| Lithology (L) | L1 | 3890 | 47,612 | 15.43 | 11.98 | 0.365 | 0.598 |
| | L2 | 15,126 | 132,299 | 60.00 | 33.30 | 0.849 | 0.794 |
| | L3 | 1316 | 20,209 | 5.22 | 5.09 | 0.037 | 0.402 |
| | L4 | 2003 | 16,307 | 7.94 | 4.10 | 0.953 | 0.99 |
| | L5 | 0 | 11,826 | 0.00 | 2.98 | −∞ | 0.01 |
| | L6 | 2877 | 168,880 | 11.41 | 42.51 | −1.897 | 0.206 |
| | L7 | 0 | 156 | 0.00 | 0.04 | −∞ | 0.01 |
| Bedding structure (BS) | BS1 | 206 | 509 | 0.82 | 0.13 | 2.673 | 0.99 |
| | BS2 | 1423 | 34,200 | 5.64 | 8.61 | −0.609 | 0.173 |
| | BS4 | 3204 | 87,211 | 12.71 | 21.95 | −0.789 | 0.337 |
| | BS5 | 4695 | 87,741 | 18.62 | 22.08 | −0.246 | 0.01 |
| | BS6 | 8549 | 113,523 | 33.91 | 28.57 | 0.247 | 0.5 |
| | BS7 | 3721 | 39,376 | 14.76 | 9.91 | 0.574 | 0.663 |
| | BS8 | 3414 | 34,729 | 13.54 | 8.74 | 0.631 | 0.827 |

| Category | Main Lithology | Geologic Group |
|---|---|---|
| A | Siltstone, silty mudstone | T₂b² |
| B | Siltstone, muddy limestone, dolostone with mudstone | T₂b³, T₂b⁴ |
| C | Mudstone, muddy limestone | T₂b¹ |
| D | Sandstone, silty shale | T₃xj¹, T₃e |
| E | Muddy limestone with limestone | T₁d¹, T₁d², T₁d³, T₁d⁴ |
| F | Limestone with dolostone, muddy limestone, dolomitic limestone | T₁j¹, T₁j², T₁j³, T₁j⁴ |
| G | Limestone, silty shale with coal seam | P₃w, P₃d |

| Category | Definition (slope: θ, aspect: σ, bed dip angle: α, bed dip direction: β) |
|---|---|
| BS1 | $\alpha < 10^\circ$ |
| BS2 | $(\lvert\sigma-\beta\rvert \in (0^\circ, 30^\circ] \lor \lvert\sigma-\beta\rvert \in [330^\circ, 360^\circ)) \land \alpha > 10^\circ \land \theta > \alpha$ |
| BS3 | $(\lvert\sigma-\beta\rvert \in (0^\circ, 30^\circ] \lor \lvert\sigma-\beta\rvert \in [330^\circ, 360^\circ)) \land \alpha > 10^\circ \land \theta = \alpha$ |
| BS4 | $(\lvert\sigma-\beta\rvert \in (0^\circ, 30^\circ] \lor \lvert\sigma-\beta\rvert \in [330^\circ, 360^\circ)) \land \alpha > 10^\circ \land \theta < \alpha$ |
| BS5 | $\lvert\sigma-\beta\rvert \in [30^\circ, 60^\circ) \lor \lvert\sigma-\beta\rvert \in [300^\circ, 330^\circ)$ |
| BS6 | $\lvert\sigma-\beta\rvert \in [60^\circ, 120^\circ) \lor \lvert\sigma-\beta\rvert \in [240^\circ, 300^\circ)$ |
| BS7 | $\lvert\sigma-\beta\rvert \in [120^\circ, 150^\circ) \lor \lvert\sigma-\beta\rvert \in [210^\circ, 240^\circ)$ |
| BS8 | $\lvert\sigma-\beta\rvert \in [150^\circ, 180^\circ) \lor \lvert\sigma-\beta\rvert \in [180^\circ, 210^\circ)$ |
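
A minimal sketch of the slope-structure classification defined in the table above. It assumes the angular difference is taken between aspect σ and bed dip direction β, uses the non-overlapping intervals [120°, 150°) ∪ [210°, 240°) for BS7 and [150°, 210°) for BS8, and assigns a difference of exactly 0° and a dip angle of exactly 10° to the non-gentle classes, since the definitions leave those cases open. The function name is illustrative.

```python
def bedding_structure(slope, aspect, dip_angle, dip_direction):
    """Classify slope structure BS1-BS8 (all angles in degrees).

    Assumptions: angular difference = |aspect - dip direction|; a difference
    of exactly 0 deg and a dip angle of exactly 10 deg go to the non-gentle
    classes; BS7 = [120,150) or [210,240), BS8 = [150,210).
    """
    if dip_angle < 10:
        return "BS1"  # gently dipping beds
    diff = abs(aspect - dip_direction) % 360
    if diff <= 30 or diff >= 330:  # slope and beds face the same way
        if slope > dip_angle:
            return "BS2"
        if slope == dip_angle:
            return "BS3"
        return "BS4"
    if diff < 60 or diff >= 300:
        return "BS5"
    if diff < 120 or diff >= 240:
        return "BS6"
    if diff < 150 or diff >= 210:
        return "BS7"
    return "BS8"  # slope faces opposite to the bed dip direction


print(bedding_structure(30, 100, 20, 280))  # → BS8
```

Checking the intervals in order from the smallest difference outward means each boundary is tested once and every difference in [0°, 360°) falls into exactly one class.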

| Factor | Tolerance (Original Factor System) | VIF (Original Factor System) | Tolerance (New Factor System) | VIF (New Factor System) |
|---|---|---|---|---|
| Altitude | 0.176 | 5.687 | / | / |
| Slope | 0.535 | 1.870 | 0.536 | 1.867 |
| Aspect | 0.979 | 1.021 | 0.980 | 1.021 |
| Curvature | 0.846 | 1.183 | 0.849 | 1.178 |
| Plan curvature | 0.926 | 1.080 | 0.927 | 1.079 |
| Profile curvature | 0.876 | 1.142 | 0.876 | 1.142 |
| TRI | 0.522 | 1.916 | 0.522 | 1.914 |
| Lithology | 0.489 | 2.044 | 0.544 | 1.837 |
| Bedding structure | 0.939 | 1.065 | 0.941 | 1.063 |
| Distance to faults | 0.603 | 1.658 | 0.627 | 1.595 |
| Distance to rivers | 0.235 | 4.259 | 0.751 | 1.332 |
| Distance to gully | 0.769 | 1.300 | 0.802 | 1.247 |
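
In the table above, tolerance and VIF are reciprocals: VIF = 1/Tolerance = 1/(1 − R²ⱼ), where R²ⱼ is the coefficient of determination when factor j is regressed on all other factors; for altitude, 1/0.176 ≈ 5.68, consistent with the reported 5.687 given rounding of the tolerance. A minimal pure-Python sketch, assuming numeric factor columns, computes VIF through the equivalent identity that VIFⱼ is the j-th diagonal entry of the inverse correlation matrix. The function names are illustrative.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    dev = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in d)) for d in dev]
    k = len(columns)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear factors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tolerance(columns):
    """VIF_j = j-th diagonal entry of the inverse correlation matrix; tolerance = 1 / VIF."""
    inv = invert(correlation_matrix(columns))
    vifs = [inv[j][j] for j in range(len(columns))]
    return vifs, [1.0 / v for v in vifs]


print(vif_and_tolerance([[1, 2, 3, 4, 5], [2, 1, 4, 3, 6]]))
```

With only two factors the VIF of each equals 1/(1 − r²) for their correlation r; with more factors the matrix inverse accounts for every other factor at once, which is what the tolerance in the table measures.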

| Model | Eliminated Factors | Accuracy |
|---|---|---|
| Model 1 | None | 0.918 |
| Model 2 | TWI | 0.918 |
| Model 3 | TWI, profile curvature | 0.920 |
| Model 4 | TWI, profile curvature, plan curvature | 0.919 |
| Model 5 | TWI, profile curvature, plan curvature, curvature | 0.922 |
| Model 6 | TWI, profile curvature, plan curvature, curvature, aspect | 0.908 |

| Model | Parameters | Notes |
|---|---|---|
| SVM | C = 20, γ = 1.3 | C is the penalty parameter; γ is the kernel function parameter |
| ANN | n = 5, α = 0.9 | n is the number of neurons; α is the momentum |

| Susceptibility Level | Pixels in Landslides | Pixels in Domain | Proportion of LD | Proportion of LTL | Proportion of DTD | Frequency Ratio |
|---|---|---|---|---|---|---|
| **SVM** | | | | | | |
| Very low | 6 | 154,275 | 0.00% | 0.02% | 38.83% | 0.001 |
| Low | 210 | 83,697 | 0.25% | 0.83% | 21.07% | 0.040 |
| Moderate | 2636 | 79,817 | 3.30% | 10.46% | 20.09% | 0.520 |
| High | 22,360 | 79,500 | 28.13% | 88.69% | 20.01% | 4.432 |
| **ANN** | | | | | | |
| Very low | 409 | 160,378 | 0.26% | 1.62% | 40.37% | 0.040 |
| Low | 1741 | 79,155 | 2.20% | 6.91% | 19.92% | 0.347 |
| Moderate | 5479 | 78,975 | 6.94% | 21.73% | 19.88% | 1.093 |
| High | 17,583 | 78,781 | 22.32% | 69.79% | 19.83% | 3.517 |
| **LR** | | | | | | |
| Very low | 393 | 161,746 | 0.24% | 1.56% | 40.71% | 0.038 |
| Low | 1838 | 79,127 | 2.32% | 7.29% | 19.92% | 0.366 |
| Moderate | 5640 | 78,411 | 7.19% | 22.37% | 19.74% | 1.133 |
| High | 17,341 | 78,005 | 22.23% | 68.78% | 19.63% | 3.503 |
| **CART** | | | | | | |
| Very low | 491 | 160,378 | 0.31% | 1.95% | 40.37% | 0.048 |
| Low | 1341 | 79,419 | 1.69% | 5.32% | 19.99% | 0.266 |
| Moderate | 7621 | 82,440 | 9.24% | 30.23% | 20.75% | 1.457 |
| High | 15,759 | 75,052 | 21.00% | 62.51% | 18.89% | 3.309 |

| Model | Area Under the ROC Curve (AUC) | Standard Error | 95% CI Lower Limit | 95% CI Upper Limit |
|---|---|---|---|---|
| **Training group** | | | | |
| SVM | 0.927 | 0.002 | 0.923 | 0.930 |
| ANN | 0.866 | 0.002 | 0.862 | 0.871 |
| LR | 0.860 | 0.002 | 0.855 | 0.864 |
| CART | 0.842 | 0.003 | 0.837 | 0.847 |
| **Prediction group** | | | | |
| SVM | 0.922 | 0.001 | 0.920 | 0.923 |
| ANN | 0.875 | 0.001 | 0.873 | 0.877 |
| LR | 0.863 | 0.001 | 0.860 | 0.865 |
| CART | 0.837 | 0.001 | 0.835 | 0.840 |
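
The AUC values above can be read as the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample. A minimal sketch, assuming per-sample scores are available, computes that probability with the rank-based Mann–Whitney formulation and adds a Hanley–McNeil standard error with a normal-approximation 95% interval; which standard-error method the study used is not stated here, so the second function is an assumption. Function names are illustrative.

```python
import math


def auc_rank(scores_pos, scores_neg):
    """AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    pooled = sorted([(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    rank_sum_pos = sum(r for r, (_, is_pos) in zip(ranks, pooled) if is_pos)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Hanley-McNeil standard error and a normal-approximation CI clipped to [0, 1]."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return se, max(0.0, auc - z * se), min(1.0, auc + z * se)


print(round(auc_rank([0.9, 0.8, 0.7], [0.1, 0.2, 0.75]), 3))  # → 0.889
```

Ranking once avoids comparing every landslide pixel with every non-landslide pixel, which matters for rasters with hundreds of thousands of cells.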

| Authors | Study Area | Accuracy of SVM |
|---|---|---|
| An et al. [38] | Wanzhou segment of the TGRA | 0.814 |
| Marjanovic et al. [20] | Fruška Gora Mountain, Serbia | 0.842 |
| Marjanovic et al. [39] | NW (northwest) slopes of Fruška Gora Mountain, Serbia | 0.880 |
| Chen et al. [40] | Hanyuan County, China | 0.875 |
| Bui et al. [10] | Son La hydropower basin, Vietnam | 0.887 |

