1. Introduction
Landslides are natural processes in which a mass of rock or soil on an inclined surface moves downward, either partially or completely, as a result of natural or human-induced causes [1]. As a common geological disaster, a landslide is sudden and destructive. It not only directly affects construction, transportation, and the ecological environment but also threatens the safety of people’s lives and property, causing far-reaching social and economic impacts [2,3]. According to statistical data from the China Natural Resources Bulletin (2024) [4], landslides accounted for 57.98% (3316 out of 5719) of all recorded geological disasters in China in 2024. Southwestern China, with its complex terrain and extensive mountain ranges, has long been at high risk of landslides and other geological disasters. The Department of Emergency Management of Yunnan Province (2025) reported that geological hazard events in 2024 affected an estimated 80,000 people across 48 county-level administrative units in 15 prefecture-level divisions of the province [5]. Therefore, it is crucial to undertake a comprehensive landslide susceptibility assessment, geohazard risk analysis, and identification of landslide-prone areas in Yunnan Province, southwestern China.
Landslide susceptibility is defined as the probability of landslides occurring in an area under its basic geographic conditions. Traditional models, such as the Weights of Evidence (WOE) method [6,7,8] and the Coefficient of Determination approach [9,10], have been widely employed in landslide susceptibility assessments. These models can reveal the dominant influencing factors of regional landslides. For example, results from the WOE method showed that landslides in Badong are predominantly concentrated along both sides of rivers and are influenced by faults [11]. Similarly, results from the Coefficient of Determination approach showed that rivers, faults, and rocks are the primary factors of landslides in the Qin-Ba Mountain region [12]. Although traditional models are straightforward, easy to comprehend, and highly interpretable, they are limited by their inability to account for all possible influencing factors, e.g., vegetation and extreme rainfall.
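As a minimal sketch of how the WOE method scores a single binary factor class (for example, "within a given distance of a river"), the positive weight, negative weight, and contrast can be computed from cell counts. The function name and all counts below are hypothetical illustrations and are not taken from the cited studies.

```python
import math

def weights_of_evidence(n_class_slide, n_class_total, n_slide_total, n_total):
    """Return (W_plus, W_minus, contrast) for one binary factor class.

    All arguments are cell counts; the values passed below are illustrative only.
    """
    slide_in = n_class_slide                      # landslide cells inside the class
    slide_out = n_slide_total - n_class_slide     # landslide cells outside the class
    stable_total = n_total - n_slide_total        # all non-landslide cells
    stable_in = n_class_total - n_class_slide     # non-landslide cells inside the class
    stable_out = stable_total - stable_in         # non-landslide cells outside the class

    # W+ = ln[ P(B | L) / P(B | not L) ]
    w_plus = math.log((slide_in / n_slide_total) / (stable_in / stable_total))
    # W- = ln[ P(not B | L) / P(not B | not L) ]
    w_minus = math.log((slide_out / n_slide_total) / (stable_out / stable_total))
    return w_plus, w_minus, w_plus - w_minus

# Hypothetical counts: 10,000 cells, 100 landslide cells, 2,000 cells in the class.
w_plus, w_minus, contrast = weights_of_evidence(
    n_class_slide=60, n_class_total=2000, n_slide_total=100, n_total=10000
)
```

A positive contrast indicates that the class is associated with landslide occurrence; because each factor is weighted separately, factors left out of the analysis simply do not contribute, which is the limitation noted above.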
In recent years, machine learning has provided a new framework for landslide susceptibility research, leveraging its strong capabilities in data mining and pattern recognition and providing more accurate predictive results. The most representative models are Random Forest (RF) [13,14], Back Propagation Neural Networks (BPNNs) [10,15], Support Vector Machines (SVMs) [9,16], Logistic Regression (LR), and Naive Bayes (NB) [17]. Many studies have evaluated different areas using different machine learning models. For example, a multivariate LR method was applied in northeast Kansas and identified slope as the most significant predictor of landslide risk [18]. Two machine learning methods, RF and RF-SVM, were developed to evaluate landslide susceptibility in the Rangit River watershed, and the results indicated both high regional risk and the superior predictive capability of the hybrid model [19]. An evaluation in the Nandakini River basin compared three machine learning models, RF, Deep Learning Neural Network (DLNN), and Artificial Neural Network (ANN), and showed that the DLNN achieved the highest predictive accuracy in landslide susceptibility assessment [20]. Similarly, a comparative evaluation of LR and other machine learning approaches found that the RF model exhibited the highest predictive accuracy in the Sihjhong watershed of Taiwan [21]. The consistently better performance of machine learning models across various regions points to their technical advantage in landslide susceptibility assessment.
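Model comparisons of this kind are commonly scored with a threshold-free metric such as the area under the ROC curve (AUC); whether each cited study used AUC is not stated here, so the following is only an illustrative sketch. The labels and the two sets of model scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random landslide cell outscores a random stable cell."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both landslide and non-landslide samples are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # landslide cell ranked above stable cell
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical test set: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]   # susceptibility scores from one model
model_b = [0.6, 0.4, 0.5, 0.7, 0.3, 0.2]   # susceptibility scores from another model

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
```

In this toy example the first model ranks landslide cells above stable cells more often, so it obtains the higher AUC.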
Landslide risk refers to the impact of landslides on human activities and economic factors. The Analytic Hierarchy Process (AHP) is a multi-criteria method designed for the hierarchical representation of a decision-making problem. This approach has been extensively employed in disaster risk assessment. For example, in ground subsidence risk assessment, AHP can quantitatively assign weights to hydrological, geological, anthropogenic, and other factors [22]. Similarly, AHP can be used in flood, landslide, and earthquake hazard assessment by assigning a weight to each selected factor to produce risk maps for multi-hazard assessment modelling [23,24,25]. It has previously been mathematically validated and used to identify landslide risk in India [26]. As a semi-quantitative technique, AHP combines qualitative and quantitative methods. Machine learning training and simulation rely heavily on inventory data, whereas landslide risk assessment goes a step further by considering socio-economic factors. Therefore, machine learning-based susceptibility maps serve as a key input to the AHP framework, creating a hybrid model. This approach weights the machine learning outputs together with other layers so that the final risk maps are both statistically grounded and informed by socio-economic judgment. Consequently, the hybrid model and the resulting map are valuable for decision-making.
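The hybrid idea can be sketched as follows: AHP derives weights from a pairwise comparison matrix, checks their consistency, and the weights are then used to combine a machine learning susceptibility layer with socio-economic layers. This is a minimal illustration under stated assumptions; the three criteria, the comparison judgments, and the normalized cell values are hypothetical and are not the ones used in this study.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison matrix by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]          # keep the weights summing to 1
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max

# Saaty's random consistency index for small matrices.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def consistency_ratio(lambda_max, n):
    ci = (lambda_max - n) / (n - 1)           # consistency index
    return ci / RANDOM_INDEX[n]

def weighted_overlay(layer_values, weights):
    """Combine normalized layer values (0 to 1) for one cell into a risk score."""
    return sum(w * v for w, v in zip(weights, layer_values))

# Hypothetical criteria: ML susceptibility, population density, road density.
pairwise = [
    [1.0,     3.0,     5.0],
    [1 / 3.0, 1.0,     3.0],
    [1 / 5.0, 1 / 3.0, 1.0],
]
weights, lambda_max = ahp_weights(pairwise)
cr = consistency_ratio(lambda_max, len(pairwise))

# One hypothetical cell with normalized layer values in the same order.
cell_risk = weighted_overlay([0.8, 0.5, 0.2], weights)
```

A consistency ratio below 0.1 is the conventional threshold for accepting the pairwise judgments; above it, the comparisons are usually revised before the weights are used.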
Rainfall, unlike terrain, geology, and other factors, is strongly affected by climate change. The relationship between rainfall and landslides has long been a research hotspot, and previous work has identified a close relationship between the two [27,28]. It is well accepted that shallow landslides and debris flows are triggered by high-intensity, short-duration rainfall, whereas deep-seated landslides result from less intense rain over a long period [29,30,31]. However, previous studies have mostly considered annual average precipitation [32,33,34]. Extreme rainfall is often a key factor in triggering landslides, especially in areas with existing potential landslides. Heavy rainfall acts on the land surface and thereby affects the movement of surface materials. Long-term continuous rainfall leads to severe water infiltration, affecting deep soil layers. Therefore, integrating extreme rainfall indices (e.g., rainfall intensity and duration) is expected to improve the predictive precision of landslide susceptibility mapping by capturing rainfall extremes that traditional mean-based metrics miss.
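To illustrate the difference between mean-based metrics and extreme rainfall indices, the sketch below computes a duration index (longest run of consecutive wet days) and two intensity indices (maximum 1-day and 5-day totals) from a daily series. The 1 mm wet-day threshold follows a common convention and is an assumption here, as are the function names and the ten-day series; the indices actually used in this study may differ.

```python
def consecutive_wet_days(daily_mm, wet_threshold=1.0):
    """Longest run of days with precipitation >= wet_threshold (mm)."""
    longest = current = 0
    for p in daily_mm:
        current = current + 1 if p >= wet_threshold else 0
        longest = max(longest, current)
    return longest

def max_n_day_total(daily_mm, n=5):
    """Largest precipitation total over any n consecutive days."""
    if len(daily_mm) < n:
        raise ValueError("series is shorter than the window")
    window = sum(daily_mm[:n])
    best = window
    for i in range(n, len(daily_mm)):
        window += daily_mm[i] - daily_mm[i - n]   # slide the window by one day
        best = max(best, window)
    return best

# Hypothetical daily precipitation (mm) for ten days.
daily = [0.0, 12.5, 3.0, 0.4, 22.0, 48.0, 5.5, 1.2, 0.0, 7.0]

cwd = consecutive_wet_days(daily)        # duration of the longest wet spell
rx1day = max(daily)                      # wettest single day
rx5day = max_n_day_total(daily, 5)       # wettest 5-day period
mean_daily = sum(daily) / len(daily)     # mean-based metric for comparison
```

The mean of this series is below 10 mm per day, whereas the wettest day and the wettest 5-day window are several times larger, which is the kind of signal a mean-based factor cannot represent.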
This study therefore employed four commonly used machine learning models and combined them with AHP for landslide susceptibility and risk assessment in Tengchong city. Compared with previous studies, this study makes two main contributions. First, in the selection of influencing factors, the influence of extreme rainfall on landslide occurrence is explicitly considered: extreme rainfall indicators are introduced, and two representative factors are screened by correlation analysis for inclusion in the landslide susceptibility assessment. This improves the accuracy of landslide prediction under extreme rainfall conditions. Second, a landslide risk zoning framework is developed by integrating AHP with machine learning modeling. The results can help local government develop targeted disaster mitigation measures and strengthen landslide risk management capabilities.
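One plausible form of the correlation screening step is sketched below: candidate indicators are examined in turn, and an indicator is kept only if its Pearson correlation with every indicator already kept is below a threshold. The greedy rule, the 0.7 threshold, the indicator names, and the sample values are all assumptions made for illustration; the exact procedure used in this study is not specified in this section.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_factors(candidates, max_abs_r=0.7):
    """Greedily keep factors whose |r| with every already-kept factor is below max_abs_r."""
    kept = []
    for name, values in candidates.items():
        if all(abs(pearson(values, candidates[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept

# Hypothetical indicator values sampled at six locations.
candidates = {
    "CWD":    [4, 5, 6, 7, 8, 9],
    "Rx5day": [60, 20, 50, 30, 70, 40],
    "Rx1day": [31, 10, 25, 16, 35, 20],   # nearly collinear with Rx5day
}
selected = screen_factors(candidates)
```

Here the third indicator is dropped because it carries almost the same information as the second, leaving two weakly correlated factors.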
4. Discussion
Geospatial analysis using machine learning models (e.g., Random Forest) demonstrates that the relative importance of key drivers (slope angle, curvature, land cover) varies across study regions, as revealed by permutation feature importance metrics [11,12,18,60]. For Tengchong, elevation, NDVI, lithology, and CWDs are the most critical factors for landslide assessment (Figure 7a). These factors represent topographic, land surface, lithological, and climatic dimensions, respectively, and underscore the multifaceted nature of landslide susceptibility. This highlights that such geological hazards are governed by complex interactions among diverse environmental variables.
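Permutation feature importance can be sketched as follows: one factor column is shuffled at a time, and the resulting drop in accuracy is taken as that factor's importance. The classifier below is a hypothetical stand-in (a fixed rule, not the RF model of this study), and the synthetic samples, feature order, and function names are assumptions for illustration only.

```python
import random

def toy_model(row):
    """Hypothetical stand-in classifier; it deliberately ignores slope."""
    elevation_m, ndvi, slope_deg = row
    return int(elevation_m < 1800 and ndvi < 0.57)

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)                       # break the link between feature j and the label
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic samples: (elevation in m, NDVI, slope in degrees).
data_rng = random.Random(42)
rows = [
    (data_rng.uniform(1000, 3000), data_rng.uniform(0.2, 0.9), data_rng.uniform(0, 40))
    for _ in range(200)
]
labels = [toy_model(r) for r in rows]    # labels generated by the same rule

importances = permutation_importance(toy_model, rows, labels, n_features=3)
```

Because the stand-in rule never reads slope, shuffling that column changes nothing and its importance is exactly zero, while elevation and NDVI receive positive importance.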
Topography is usually the primary factor controlling landslides. Landslide occurrence declines with increasing altitude as colluvial deposits become thinner in mountainous areas [61,62,63]. Our results support this: landslides are concentrated in lowland regions of Tengchong with elevations below 1800 m (Figure 6a). Theoretically, slope is also among the most critical factors, with a strong correlation between slope angle and landslide susceptibility [18]. However, this relationship seems less pronounced in Tengchong, where landslides are more concentrated in low-slope areas (Figure 6b). Moreover, in the RF model, slope ranks last among all evaluated factors. The generally low slope gradient (<25°) in Tengchong might explain this weaker influence on landslides.
Areas with denser vegetation are usually less susceptible to landslides, and vice versa [64]. This aligns with our finding that landslides in Tengchong are more concentrated in sparsely vegetated areas with NDVI values below 0.57 (Figure 5h). As vegetation cover increases, the relative density of landslides decreases. Vegetation stabilizes soil; where it is sparse, looser soils provide the surface material conditions for landslides. Vegetation in southwestern China has been increasing due to the CO2 fertilization effect and land-use management [65]. As such, we expect that vegetation restoration may reduce landslide risk to some extent in Tengchong.
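The relative density of landslides per factor class can be expressed, for instance, as a frequency ratio: the share of landslides falling in a class divided by the share of the area in that class. The sketch below assumes this definition, which may differ from the one used for the figures; apart from the 0.57 NDVI boundary mentioned above, the class limits and all counts are hypothetical.

```python
def frequency_ratio(class_cells, class_landslides):
    """Ratio of landslide share to area share for each class (values > 1 mean over-representation)."""
    total_cells = sum(class_cells.values())
    total_slides = sum(class_landslides.values())
    return {
        name: (class_landslides[name] / total_slides) / (class_cells[name] / total_cells)
        for name in class_cells
    }

# Hypothetical cell and landslide counts per NDVI class.
ndvi_cells = {"<0.57": 3000, "0.57-0.75": 4000, ">0.75": 3000}
ndvi_slides = {"<0.57": 60, "0.57-0.75": 30, ">0.75": 10}

fr = frequency_ratio(ndvi_cells, ndvi_slides)
```

With these illustrative counts the ratio falls from the sparsest to the densest vegetation class, mirroring the decreasing trend described above.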
Human activities in this assessment are mainly reflected in the distance from roads. Landslides in Tengchong are relatively concentrated within 500 m of roads (Figure 6f; Table 5). On the one hand, given Tengchong’s average elevation of approximately 2 km and its rugged hilly-mountainous terrain, the accumulation of rock and soil materials during road construction, coupled with excavation activities, compromises the stability of the original rock or soil mass. This process is also seen in the Loess Plateau [42]. On the other hand, during the operational phase, dynamic loads and vibrations from road traffic place further stress on the surrounding geotechnical medium.
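A distance-from-road statistic of this kind can be sketched as the share of landslide points lying within a buffer of a road polyline. The example assumes projected coordinates in metres; the road geometry, landslide locations, and function names are hypothetical, and only the 500 m buffer comes from the text.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment a-b (projected coordinates, metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def share_within(points, road, buffer_m=500.0):
    """Fraction of points whose distance to the road polyline is at most buffer_m."""
    def dist_to_road(p):
        return min(
            point_segment_distance(p, road[i], road[i + 1])
            for i in range(len(road) - 1)
        )
    near = sum(dist_to_road(p) <= buffer_m for p in points)
    return near / len(points)

# Hypothetical road polyline and landslide locations (metres).
road = [(0.0, 0.0), (2000.0, 0.0), (2000.0, 1500.0)]
landslides = [(500.0, 300.0), (1500.0, -450.0), (2600.0, 700.0), (1000.0, 1200.0)]

share = share_within(landslides, road)
```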
Landslides in Tengchong are concentrated in areas where the rock strata consist of Neogene sandstone and Quaternary alluvial deposits. For both sandstone and alluvial deposits, the mechanism of landslide occurrence lies mainly in the instability of the rock-soil structure and its high permeability [7]. These characteristics make such areas highly susceptible to disturbance and prone to landslides. The distance from faults is not significantly correlated with landslides in Tengchong, for which we see three possible explanations. First, most of the landslides in Tengchong may not have been triggered by fault activity, or the existing faults may not have shown significant movement in recent years. Second, the fault structure data may not be detailed enough, or other variables reflecting tectonic activity may be needed. Third, the current landslide survey may exhibit spatial sampling bias, in that landslides threatening human populations are more likely to have been documented, whereas those in remote mountainous regions may have been overlooked. This may partially explain why recorded landslides are closer to roads and rivers than to faults (Figure 6e).
Globally, regional studies are necessary because landslides are sensitive to different factors in different regions. In southern Colombia, where the climate is similar to that of Tengchong, researchers used RF to evaluate landslide susceptibility and found that elevation and soil silt content had the most significant impact, consistent with our results [66]. In this study, landslides are concentrated in low-elevation areas, whereas in the Sikkim Himalaya, India, at the same latitude, landslides are more likely to occur in high-elevation regions [67]. This may be because, in India, the monsoon winds encounter the Himalayas and are forced upwards by the terrain, generating significant rainfall and triggering landslides. In Yulong County, another region of Yunnan, researchers found that slope had the highest importance among their 12 evaluation factors, making it the most significant controlling factor for landslides [68]. Their result differs from ours; judging from the elevation map they presented, the highly undulating terrain (more than 4000 m) may be the reason. These comparisons reveal that although certain controlling factors (e.g., elevation) are important across regions, the landslide-environment relationship is highly region-specific and is primarily shaped by local geological and climatic conditions.
Several uncertainties and implications should be noted for future studies. First, the current landslide survey may be subject to spatial sampling bias. The dataset primarily documents landslides that pose a threat to human populations, which may lead to the omission of events in remote mountainous regions. This omission may explain why distance from faults, a key geological factor, exhibits a weak correlation with landslide locations (Figure 6e), especially given Tengchong’s high frequency of geological activity. The observed tendency of landslides to occur in low-slope areas (Figure 6b) may also be attributable to this sampling bias. Second, as the study area is relatively small, variables may be spatially autocorrelated, which may inflate accuracy metrics because nearby pixels share similar environmental conditions. In addition, although the results showed the importance of the extreme rainfall indicators in landslide susceptibility assessment, their minimal variation may make their estimated impact less reliable. Third, future research could integrate field survey data with InSAR data to achieve more precise landslide identification, thereby addressing the temporal constraints of remote sensing data and the spatial coverage limitations of field surveys. Finally, the occurrence of landslides is influenced by a variety of factors that interact in complex ways. Machine learning techniques may capture these interactions but are difficult to interpret, so sensitivity analysis of the factors should be carried out to diagnose landslide susceptibility.
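One common way to limit the inflation of accuracy metrics by spatial autocorrelation is spatially blocked cross-validation, in which neighbouring samples are always placed in the same fold. The sketch below is an assumption about how this could be done and is not the validation scheme used in this study; the block size, fold count, and sample coordinates are hypothetical.

```python
from collections import defaultdict

def spatial_block_folds(points, block_size_m, n_folds):
    """Assign sample indices to folds by grid block, so neighbours stay in the same fold."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        key = (int(x // block_size_m), int(y // block_size_m))
        blocks[key].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Deal blocks out round-robin in sorted order for a deterministic split.
    for k, key in enumerate(sorted(blocks)):
        folds[k % n_folds].extend(blocks[key])
    return folds

# Hypothetical sample coordinates in metres (projected).
points = [
    (120.0, 80.0), (150.0, 95.0),       # two close neighbours
    (1100.0, 60.0), (1180.0, 40.0),     # another close pair
    (300.0, 1500.0), (2050.0, 2100.0),
]
folds = spatial_block_folds(points, block_size_m=1000.0, n_folds=3)
```

Each fold is then held out in turn for testing, so a model is never evaluated on a pixel whose immediate neighbour was used for training.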