# Landslide Susceptibility Mapping in a Mountainous Area Using Machine Learning Algorithms

## Abstract


## 1. Introduction

## 2. Study Area

… km² (Figure 1). The Kamyaran–Sarvabad road is approximately 97 km long. Elevations range from 867 to 2474 m above sea level. The road crosses mountainous terrain and is affected by numerous slope movements, including rockfalls and landslides. Because of this mountainous setting, the ridge section of the road is snow-covered and temporarily blocked on some winter days. The study area lies in the Sirvan drainage basin, within the Sanandaj–Sirjan and Zagros structural zones. Bedrock lithologies include outcrops ranging in age from Cretaceous to Quaternary. Most of the study sites are underlain by Mesozoic, mainly Cretaceous, rocks, including limestone, sandstone, shale, and volcanic rocks.

## 3. Landslide Conditioning Factors

…(km/km²) using the “Line density” tool in ArcGIS 10.5 (Figure 2h). The “Euclidean distance” tool was used to map distance to rivers, which was then divided into five classes: 0–500, 500–1000, 1000–1500, 1500–2000, and >2000 m (Figure 2i). The river density map was extracted with the “Line density” tool and grouped into five classes: 0–0.115, 0.115–0.287, 0.287–0.458, 0.458–0.650, and 0.650–1.017 km/km² (Figure 2j).
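Grouping a continuous layer into the classes listed above is a simple reclassification. The sketch below is illustrative only: it uses a plain Python list in place of a raster, and it assumes that a value equal to a break goes to the higher class, because the text does not say how boundary values are assigned.

```python
from bisect import bisect_right

# Breaks between the five distance-to-river classes (m), from the text:
# 0–500, 500–1000, 1000–1500, 1500–2000, >2000.
RIVER_DISTANCE_BREAKS = [500, 1000, 1500, 2000]


def reclassify(value: float, breaks: list) -> int:
    """Return a 1-based class index for `value`.

    Assumption: lower bounds are inclusive, so a value equal to a
    break falls into the higher class (this is what bisect_right does).
    """
    return bisect_right(breaks, value) + 1


distances_m = [120.0, 500.0, 1240.5, 1999.9, 3500.0]  # hypothetical cell values
classes = [reclassify(d, RIVER_DISTANCE_BREAKS) for d in distances_m]
print(classes)  # [1, 2, 3, 4, 5]
```

The same function applies to the river density classes with breaks `[0.115, 0.287, 0.458, 0.650]` (km/km²).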

…(km/km²) (Figure 2l).

…

$$\mathrm{SPI} = A_S \tan\beta$$

where $A_S$ (${\mathrm{m}}^{2}{\mathrm{m}}^{-1}$) is the specific catchment area and β is the slope angle in degrees. The SPI map was divided into five classes: 0.0125–2059, 2059–10,295, 10,295–25,738, 25,738–56,625, and 56,625–262,534 (Figure 2m). TWI is an important conditioning factor for landslide occurrence [36]. It is a topographic index that combines the upslope contributing area with the local slope, and it describes the spatial distribution of soil wetness across the land surface. It may be calculated using the equation below [37]:

$$\mathrm{TWI} = \ln\left(\frac{A_S}{\tan\beta}\right)$$

where $A_S$ (${\mathrm{m}}^{2}{\mathrm{m}}^{-1}$) denotes the specific catchment area, that is, the total upslope area draining through a point per unit contour width, and tan β is the local slope gradient, with β the slope angle at that location. The TWI map was divided into five classes: 1.104–4.586, 4.586–5.994, 5.994–7.698, 7.698–10.218, and 10.218–19.999 (Figure 2n).
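Both indices can be computed per cell once the specific catchment area and slope are known. The sketch below uses the standard forms SPI = A_S tan β and TWI = ln(A_S / tan β) for single values rather than rasters; the input values are hypothetical, and the `min_tan` floor for flat cells is an assumption of this example, not a setting reported in the study.

```python
import math


def spi(a_s: float, slope_deg: float) -> float:
    """Stream Power Index: SPI = A_S * tan(beta), with beta in degrees."""
    return a_s * math.tan(math.radians(slope_deg))


def twi(a_s: float, slope_deg: float, min_tan: float = 1e-6) -> float:
    """Topographic Wetness Index: TWI = ln(A_S / tan(beta)).

    min_tan is an assumed floor that stops flat cells (beta = 0) from
    causing a division by zero; it is not a value taken from the study.
    A_S must be positive for the logarithm to be defined.
    """
    tan_b = max(math.tan(math.radians(slope_deg)), min_tan)
    return math.log(a_s / tan_b)


a_s = 100.0  # hypothetical specific catchment area, m^2 m^-1
beta = 45.0  # hypothetical slope angle, degrees
print(round(spi(a_s, beta), 3))  # 100.0
print(round(twi(a_s, beta), 3))  # 4.605
```

A TWI of about 4.605 would fall in the second class (4.586–5.994) of the classification above.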

## 4. Methodology

#### 4.1. Landslide Inventory Map (LIM)

#### 4.2. Background of the MLAs

#### 4.2.1. Random Forest

- (1) Compute each decision tree’s out-of-bag (OOB) error, errOOB1, using only that tree’s OOB data, i.e., the samples that are not contained in its bootstrap sample.
- (2) Add random noise to the values of feature X in every OOB sample (i.e., randomly perturb the sample values of feature X), recompute the OOB error, and record the result as errOOB2.
- (3) For an RF containing $N_S$ trees, the importance of feature X is:

$$\mathrm{Importance}(X) = \frac{1}{N_S}\sum_{i=1}^{N_S}\left(\mathrm{errOOB2}_i - \mathrm{errOOB1}_i\right)$$
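Steps (1)–(3) can be sketched in plain Python as below. This is an illustration rather than the study's code: the toy "trees" are hypothetical callables, randomly permuting feature X within each tree's OOB rows is assumed as the noise mechanism of step (2), and the error is taken to be the misclassification rate.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Tree = Callable[[Row], int]  # hypothetical: a fitted tree mapping a row to class 0 or 1


def error_rate(tree: Tree, rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Misclassification rate of one tree on the given rows."""
    wrong = sum(1 for r, y in zip(rows, labels) if tree(r) != y)
    return wrong / len(rows)


def permutation_importance(trees: Sequence[Tree], oob_sets: Sequence[Sequence[int]],
                           X: Sequence[Row], y: Sequence[int], feature: int,
                           seed: int = 0) -> float:
    """Mean over trees of (errOOB2 - errOOB1) for one feature."""
    rng = random.Random(seed)
    diffs = []
    for tree, oob in zip(trees, oob_sets):
        rows = [list(X[i]) for i in oob]  # copies, so X itself is never modified
        labels = [y[i] for i in oob]
        err1 = error_rate(tree, rows, labels)  # step (1): errOOB1
        values = [r[feature] for r in rows]
        rng.shuffle(values)  # step (2): perturb feature X (assumed: permutation)
        for r, v in zip(rows, values):
            r[feature] = v
        err2 = error_rate(tree, rows, labels)  # step (2): errOOB2
        diffs.append(err2 - err1)
    return sum(diffs) / len(trees)  # step (3): average over the N_S trees


def stump(r: Row) -> int:
    """Toy tree that splits on feature 0 only."""
    return 1 if r[0] >= 10 else 0


# Toy data: feature 0 fully determines the label, feature 1 is irrelevant.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
trees = [stump, stump, stump]
oob_sets = [list(range(0, 20, 2)), list(range(1, 20, 2)), list(range(20))]

print(permutation_importance(trees, oob_sets, X, y, feature=0))  # positive
print(permutation_importance(trees, oob_sets, X, y, feature=1))  # 0.0
```

Because the toy trees never look at feature 1, perturbing it leaves errOOB2 equal to errOOB1, so its importance is exactly zero.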

#### 4.2.2. Support Vector Machines

#### 4.2.3. Decision Tree Algorithm

#### 4.2.4. Validation of the Models

#### 4.2.5. Importance of the Factors Using Accuracy and the Gini Indexes

## 5. Results

#### 5.1. Importance of the Factors on Landslide Occurrence

#### 5.2. Performance of the Random Forest Model

#### 5.3. Developing Landslide Susceptibility Maps

## 6. Discussion

## 7. Conclusions

- (1) Based on the two indices, Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG), the RF model was the most accurate in identifying the significance of the landslide conditioning factors in the current experiment. In decreasing order of MDA, the conditioning factors for our research region rank as follows: distance to roads, road density, distance to rivers, lithology, land use, elevation, river density, distance to faults, aspect, fault density, SPI, slope, TWI, and curvature.
- (2) LSMs were prepared with the RF, DT, and SVM models using parameter tuning. Our results indicate that the RF model outperformed the DT and SVM models.
- (3) According to the landslide susceptibility maps, the most susceptible locations lie close to roads and follow road density. As a result, they are concentrated mainly in the central part of the study area. The findings of this study can therefore help land developers, planners, and civil engineers with preliminary slope management and land-use planning, supporting timely, evidence-based measures to mitigate landslide hazards.

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

**Figure 2.** Landslide influential factors used in the current study: (**a**) Slope, (**b**) Aspect, (**c**) Elevation, (**d**) Curvature, (**e**) Lithology, (**f**) Land use, (**g**) Distance to fault, (**h**) Fault density, (**i**) Distance to river, (**j**) River density, (**k**) Distance to road, (**l**) Road density, (**m**) SPI, and (**n**) TWI.

**Figure 5.** Performance of the random forest model: (**a**) error rate as the number of trees increases (0 = non-landslide, 1 = landslide); (**b**) ROC curve based on out-of-bag estimates.

**Figure 6.** Landslide susceptibility maps for the Kamyaran–Sarvabad main road, Iran, from three models: (**a**) Decision Tree, (**b**) Random Forest, and (**c**) Support Vector Machine.

| Category | Parameter | Source, Scale/Resolution |
|---|---|---|
| Topographic map | Elevation | ALOS PALSAR DEM, 12.5 m |
| | Slope | |
| | Aspect | |
| | Curvature | |
| Geological map | Lithology | Geology map, 1:100,000 |
| | Distance to fault | |
| | Fault density | |
| Hydrological map | Topographic Wetness Index (TWI) | ALOS PALSAR DEM, 12.5 m |
| | Stream Power Index (SPI) | |
| | Distance to river (m) | |
| | River density | |
| Land cover map | Land use | Iranian land use map |
| | Distance to road (m) | Topographical map, 1:50,000, and Google Earth imagery |
| | Road density | |

**Table 2.** Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG) indices from the random forest model for the landslide conditioning factors along the Kamyaran–Sarvabad main road.

Factor | 0 (non-landslide) | 1 (landslide) | MDA | MDG |
---|---|---|---|---|
Distance to roads | 16.3 | 24.5 | 24.1 | 5.9 |
Road density | 9.7 | 17.1 | 16.9 | 3.7 |
Distance to river | 5.3 | 10.5 | 10.14 | 1.3 |
Lithology | 7.6 | 8.3 | 9.4 | 1.7 |
Land use | 2.2 | 9.5 | 8.7 | 1.78 |
Elevation (m) | 1 | 8.8 | 8.3 | 0.83 |
River density | 2.8 | 6.1 | 6.05 | 0.89 |
Distance to fault | 1.8 | 5.8 | 5.9 | 0.52 |
Aspect | 2.1 | 5.84 | 5.8 | 1.32 |
Fault density | 2.08 | 4.01 | 3.9 | 0.38 |
SPI | 0 | 0 | 0 | 0.21 |
Slope | 0.36 | −0.21 | −0.1 | 0.31 |
TWI | 0.92 | −1.12 | −0.26 | 0.43 |
Curvature | −0.01 | −2.5 | −2.02 | 0.19 |
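The MDG column accumulates, for each factor, the Gini impurity decrease of every split made on that factor, averaged over the trees. The single-split building block is sketched below in plain Python. The node labels are hypothetical (0 = non-landslide, 1 = landslide), and weighting each child by its share of the parent's samples is one common convention; implementations differ in how they weight and normalise the accumulated sum.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a node: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: Sequence[int], left: Sequence[int],
                  right: Sequence[int]) -> float:
    """Decrease in Gini impurity produced by one binary split, with
    each child weighted by its share of the parent's samples."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


# Hypothetical node: 0 = non-landslide, 1 = landslide.
parent = [0, 0, 0, 1, 1, 1]
print(round(gini_decrease(parent, [0, 0, 0, 1], [1, 1]), 4))  # 0.25
print(round(gini_decrease(parent, [0, 0, 0], [1, 1, 1]), 4))  # 0.5
```

A perfectly separating split removes all of the parent's impurity (0.5 for a balanced two-class node), while a partial split removes less; summing these decreases over all nodes split on a factor gives that factor's raw contribution to MDG.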

**Table 3.** Area and percentage of each susceptibility class for the Kamyaran–Sarvabad main road, Iran, from the Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF) models (VLS, LS, MS, HS, and VHS: very low, low, moderate, high, and very high susceptibility).

Class | SVM Area (%) | SVM Class Area (ha) | DT Area (%) | DT Class Area (ha) | RF Area (%) | RF Class Area (ha) |
---|---|---|---|---|---|---|
VLS | 26.6 | 20,632.4 | 19.6 | 15,203.09 | 15.4 | 11,963.1 |
LS | 32.6 | 25,306.9 | 33 | 25,621.56 | 28.9 | 22,422.6 |
MS | 28.4 | 22,007 | 27.6 | 21,447.67 | 27.2 | 21,073.9 |
HS | 7.4 | 5750.7 | 13.6 | 10,529.34 | 21.3 | 16,489.4 |
VHS | 5 | 3875.5 | 6.2 | 4770.86 | 7.2 | 5623.5 |
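The percentage columns follow directly from the class areas: each class area is divided by the total mapped area of the same model. The short check below uses the RF column of Table 3; it is a worked arithmetic example only, not part of the study's workflow.

```python
# RF class areas in hectares, copied from Table 3 (VLS, LS, MS, HS, VHS).
rf_area_ha = [11963.1, 22422.6, 21073.9, 16489.4, 5623.5]


def area_percentages(areas: list) -> list:
    """Share of the total mapped area held by each class, in percent."""
    total = sum(areas)
    return [100.0 * a / total for a in areas]


print(round(sum(rf_area_ha), 1))  # 77572.5
print([round(p, 1) for p in area_percentages(rf_area_ha)])  # [15.4, 28.9, 27.2, 21.3, 7.2]
```

Summing the class areas of each model gives the same total (about 77,572.5 ha) for SVM, DT, and RF, which is a quick consistency check on the table.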

**Table 4.** Area under the curve (AUC) results for the models, using the validation dataset for the ROC curve.

Model | Area | Std. Error ᵃ | Asymptotic Sig. ᵇ | Asymptotic 95% CI Lower Bound | Asymptotic 95% CI Upper Bound |
---|---|---|---|---|---|
Decision Tree | 0.942 | 0.034 | 0.000 | 0.875 | 1.009 |
Random Forest | 0.823 | 0.067 | 0.001 | 0.692 | 0.953 |
SVM | 0.756 | 0.078 | 0.007 | 0.604 | 0.909 |

ᵃ Under the nonparametric assumption.

ᵇ Null hypothesis: true area = 0.5.
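The confidence bounds in Table 4 agree, to within rounding, with a symmetric interval AUC ± 1.96 × SE (for the Decision Tree, 0.942 ± 1.96 × 0.034 gives 0.875 to 1.009), which also explains why that upper bound exceeds 1. The sketch below assumes this interval form and computes the AUC in its pairwise (Mann–Whitney) form on hypothetical susceptibility scores; the standard errors are taken from the table rather than recomputed.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the probability that a landslide cell scores higher than a
    non-landslide cell; ties count one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def wald_ci(auc: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Asymptotic 95% interval AUC +/- z * SE (not clipped to [0, 1])."""
    return auc - z * se, auc + z * se


# Hypothetical validation scores: landslide cells vs non-landslide cells.
print(round(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 3))  # 0.889

# Decision Tree row of Table 4: AUC 0.942, SE 0.034.
lo, hi = wald_ci(0.942, 0.034)
print(round(lo, 3), round(hi, 3))  # 0.875 1.009
```

Because the interval is not clipped, an AUC close to 1 with a moderate standard error can yield an upper bound above 1, as in the Decision Tree row.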

