Article

Towards Sustainable Development: Landslide Susceptibility Assessment with Sample Optimization in Guiyang County, China

1
School of Engineering, Xizang University, Lhasa 850032, China
2
Bomi Geological Hazards Field Scientific Observation and Research Station of the Ministry of Education, Bomê, Nyingchi 860300, China
3
School of Ecology and Environment, Xizang University, Lhasa 850032, China
4
National Institute of Natural Hazards, Ministry of Emergency Management of China, Beijing 100085, China
5
Key Laboratory of Compound and Chained Natural Hazards Dynamics, Ministry of Emergency Management of China, Beijing 100085, China
6
Geospatial Survey and Monitoring Institute of Hunan Province, Changsha 410129, China
7
College of Earth and Planetary Sciences, Chengdu University of Technology, Chengdu 610059, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sustainability 2025, 17(21), 9575; https://doi.org/10.3390/su17219575
Submission received: 4 October 2025 / Revised: 20 October 2025 / Accepted: 23 October 2025 / Published: 28 October 2025

Abstract

Here we present a high-resolution landslide susceptibility model for Guiyang County, China, developed to support sustainable disaster risk management. Our approach couples optimized positive and negative training samples with an ensemble of machine-learning algorithms to maximize predictive fidelity. We compiled a georeferenced inventory of 146 landslides by integrating historical records with systematic field validation. Sample optimization was central to our methodology: landslide presence points were refined via buffer-based dilution, and four classifiers—SVM, LDA, RF, and ET—were trained with identical covariate sets to ensure comparability. Three strategies for selecting pseudo-absences—buffering, low-slope filtering, and coupling with the IOE—were benchmarked. The Slope-IOE-O model, which synergizes low-gradient screening with entropy-weighted sampling, yielded the highest predictive capacity (AUC = 0.965). SHAP-based interpretability revealed that slope, monthly maximum rainfall, surface roughness, and elevation collectively dominate susceptibility, with pronounced non-linearities and interactions. Slope contribution peaks at 20–30°, monthly maximum rainfall exhibits a critical threshold near 225 mm, and the synergy between high roughness and road density amplifies landslide risk. Spatially, susceptibility follows a pronounced north–south gradient, with high-hazard corridors aligned along northern and southern mountain belts and the urban core of southern Guiyang County. By integrating rigorously curated training data with robust machine-learning workflows, this study provides a transferable framework for proactive landslide risk assessment, offering scientific support for sustainable land-use planning and resilient development in mountainous regions.

1. Introduction

Landslides are a type of geological disaster, characterized by their high incidence, diversity, and widespread impact on a global scale [1,2]. They not only cause significant destruction to the natural environment but also exert profound impacts on human society. With the intensification of global climate change and the advancement of human engineering projects, the risk of landslide disasters is further exacerbated [3,4]. In 2024, a total of 5719 geological disasters occurred in China, of which 3316 were landslides, accounting for 58% of all geological disasters [5]. While landslides are influenced by both natural factors and human activities, their inherent natural processes are uncontrollable and subject to long-term dynamics, making complete elimination of landslide hazards unattainable [6,7]. However, we can predict and mitigate the impacts of landslides through scientific methods and technological means, thereby minimizing the losses caused by these disasters. Conducting landslide susceptibility assessments is an effective approach to addressing this issue.
Today, landslide-susceptibility studies rely chiefly on three approaches: expert rules, physical simulation, and data-driven algorithms [8,9]. The extensive dependence on expert knowledge in experience-driven models makes them susceptible to subjectivity, which constrains their generalizability and practical application [10]. Physics-based models demand exhaustive hydrological and geotechnical inputs, so they are rarely scaled to regional mapping [11]. Because data-driven methods are objective and easily transferable, they have become the preferred choice for most researchers [12]. These empirical techniques split into classical statistical tests and an expanding family of machine-learning algorithms [13]. Statistical inference retrospectively dissects the parameter constellations that precipitated historical landslides [14]; by contrast, machine-learning algorithms autonomously learn and quantify non-linear, high-dimensional coupling patterns among predisposing factors, yielding accurate extrapolation of landslide susceptibility to unmapped terrain.
Machine learning algorithms suitable for landslide susceptibility mapping include both single models and ensemble models, with single models comprising linear regression [15], logistic regression [16,17], support vector machines (SVM) [18], and others, and ensemble models including random forests (RF) [19], gradient boosting trees [20], and others. In recent years, researchers have developed additional models by employing techniques such as boosting, bagging, stacking, voting, and blending of different models, such as XGBoost [21], LightGBM [22], Simple Stacking [23], Weighted Voting [24], Basic Blending [25], and others. However, different algorithm types vary in their data processing methods and applicability. Currently, there is no consensus on a standardized methodology for landslide susceptibility assessment. We benchmarked two single models, SVM and linear discriminant analysis (LDA), against two ensemble models, RF and extra trees (ET), to systematically evaluate predictive performance.
A primary determinant of predictive accuracy in landslide susceptibility modeling is the quality, balance, and representativeness of the training samples, as they serve as the foundational basis for achieving reliable model performance [26]. For reliable landslide-susceptibility mapping, samples must capture the full spectrum of settings and traits tied to slope failure. Landslide samples, typically acquired through field surveys or remote sensing interpretation, are highly accurate. However, their spatial extent is significantly smaller than that of non-landslide areas, resulting in a substantial sample imbalance [27]. Landslides are episodic and sparse, especially in certain regions; this shortage skews machine-learning performance. Balancing and augmenting the landslide set is therefore imperative.
Currently, the two most widely used methods for optimizing landslide samples are oversampling and synthetic sample generation [28]. Oversampling offers a simple way to improve model performance on the minority class by augmenting its samples. Nevertheless, the potential for introduced redundancy necessitates careful consideration during implementation [29]. Synthetic sample generation is more complex to implement: it generates new samples by interpolating between minority class samples, which increases sample diversity and reduces the risk of overfitting. However, the generated synthetic samples may not fully conform to the actual distribution, particularly when the data distribution is complex, and an excessive number of synthetic samples may be generated in dense minority class regions, leading to overfitting in these areas [30]. Therefore, both optimization methods for landslide samples have their respective advantages and disadvantages. To address the scarcity and imbalance between landslide and non-landslide samples, this study employed a straightforward oversampling approach. Non-landslide samples are generally not directly obtainable and are typically acquired indirectly by avoiding potential landslide points using various methods. No consensus yet exists on how non-landslide points should be chosen; the three prevailing strategies are the buffer-distance, factor-constrained, and coupled-model methods [31]. The buffer size in the buffer-distance method is difficult to determine, and both excessively large and small buffer sizes can affect the classification of landslide susceptibility [32]. Caution must therefore be exercised in buffer zone sizing for non-landslide sample selection. An insufficient buffer can incorporate pseudo non-landslide samples (areas geomorphologically similar to landslide sites), which dilutes the model’s predictive capability and causes an underestimation of susceptibility.
Conversely, an excessively large buffer size may cause non-landslide samples to be overly confined in the environmental feature space, resulting in poor global representativeness of the non-landslide sample set and consequently an overestimation of landslide hazard [33]. Besides the factor-constrained selection method—which incorporates multiple factors, notably slope, for effective non-landslide sample selection [34,35]—coupled models (e.g., information value with frequency ratio) have also achieved commendable results [36,37]. Despite their ability to enhance model accuracy, these methods carry a risk: the selected non-landslide samples may fall within geomorphologically susceptible areas, ultimately compromising the predictive performance of the machine learning models. It is widely held that higher purity in non-landslide samples enables a prediction model to extract more accurate classification features from the conditioning factors, thereby improving its performance [38]. Therefore, this study employs these three methods and their three combined approaches to optimize non-landslide samples, aiming to identify the optimal method for optimizing non-landslide samples and thereby enhance the accuracy of machine learning.
This study develops a landslide susceptibility assessment method based on sample optimization and machine learning for areas with limited landslide data. First, both landslide and non-landslide samples were optimized to identify the most effective strategy. Then, the optimal hyperparameters for four models (SVM, LDA, RF, and ET) were determined using a grid search, and their susceptibility maps were generated. Finally, the models were comprehensively evaluated using the AUC value, Accuracy, Precision, Recall, F1-score, and landslide frequency ratio [39,40].

2. Materials and Methods

2.1. Study Area

2.1.1. Topographical and Geological Conditions

Guiyang County is in the southwest of Chenzhou, in southeastern Hunan Province, with longitude ranging from 112°13′26″ E to 112°55′46″ E and latitude from 25°27′15″ N to 26°13′30″ N. As the largest and most populous county in Chenzhou City, Guiyang County covers a total area of 2958.61 km² [41] (Figure 1). Guiyang County is situated on the northern side of the Nanling Mountains, with the Tashan and Dayishan Mountains in the north, the northern foothills of Qitianling in the south, and extensive hilly and upland areas in the middle. The regional topography is characterized by elevated northern and southern parts and a depressed central area, resulting in a saddle-shaped relief. The landscape comprises rugged ridges and deeply incised gullies; elevation rises from 59 m to 1400 m, a total relief of 1341 m, and slopes locally exceed 70°. This extreme steepness substantially undermines slope stability and precipitates a sharp rise in landslide probability [41].
Guiyang County lies at the intersection of radial Leiyang–Linwu and NE-trending Xinhua-Xia structures. Faults and folds strike ~20°, especially in the Huangshaping mine; the NE corner belongs to the Yong–Chen fold belt. Outcrops are Upper Paleozoic mudstone, shale, siltstone, limestone and dolomite, capped by thin Quaternary deposits. Under humid subtropical weathering, soft mudstone and shale hydrate and argillize, creating weak bedding planes; preferential erosion undercuts competent limestone, promoting toppling and planar slides. Karstic limestone is dissected by joints and faults that channel groundwater, further lowering shear strength. Intensive extraction of more than 60 minerals has damaged rock mass and hydrology [41]. Together, tectonic weak layers, karst groundwater and mining disturbance make slopes highly unstable and landslides frequent.
Guiyang County experiences a humid subtropical monsoon climate, receiving 1400–1800 mm of precipitation annually, 55% of which is concentrated between April and June when 3-h intensities can exceed 80 mm; these extreme rainfall events constitute a primary trigger of landslides. These mass-movements pose direct threats to critical linear infrastructure—including the Gui-Xin Expressway and the Ganzhou–Chenzhou Railway—and to strategic hydraulic assets such as the Ouyang-Hai Reservoir irrigation scheme. Consequently, quantitative landslide-susceptibility mapping is essential for delineating high-priority zones, optimizing corridor alignment, and designing slope-stabilization strategies that safeguard regional water security and underpin sustainable socio-economic development.

2.1.2. Hydrological and Climatic Conditions

Guiyang County has a well-developed river system comprising three primary rivers (Chongling River, Baishui River, and Yi River), 56 secondary rivers, 22 tertiary rivers, and 8 quaternary rivers, with a total river length of 1363 km, a river network density of 0.46 km/km², and an annual runoff volume of 2.03 billion cubic meters [42]. The dense river network may lead to soil saturation during heavy rainfall, further reducing slope stability. Guiyang County sits in a subtropical monsoon zone; annual precipitation is unevenly distributed, with a pronounced concentration in summer and frequent heavy rainstorms. Heavy rainfall is a primary trigger for landslides, as it raises pore water pressure and weakens the shear strength of slopes, thereby increasing disaster frequency.

2.2. Data Sources

2.2.1. Landslide Relic Data Sources

The analysis of landslide relic data is essential for understanding their spatial distribution and predicting future occurrences, thus holding vital importance for landslide hazard mitigation studies. The sources of landslide relic data mainly fall into two categories. First, data can be collected through field surveys, which yield accurate and reliable information but are time-consuming and labor-intensive, especially in remote or inaccessible areas. Second, data can be derived from the interpretation of high-resolution satellite images, which offers broad coverage and allows for rapid acquisition of landslide relic data over large areas at relatively low cost. However, the accuracy of interpretation is influenced by image resolution, weather conditions, and interpretation techniques, making it difficult to identify landslide relics in complex terrains or areas with dense vegetation [43,44]. Given that Guiyang County is in a subtropical monsoon climate zone, the vegetation throughout the area is extremely lush, which significantly affects landslide interpretation. Therefore, this study employs field surveys to obtain landslide relic data, with a total of 146 landslide relics identified in the study area (Table 1). The landslide points were mapped onto the Digital Elevation Model (DEM) using ArcGIS 10.8 (Figure 2).
As shown in Figure 2, landslides in Guiyang County are primarily distributed in the relatively high-altitude northern and southern mountainous regions. In addition, the southern area surrounding the county town, which has a higher road density, also experiences a higher frequency of landslides, while the central region, characterized by lower altitude and relatively flat terrain, has fewer landslides.

2.2.2. Data Sources of Landslide Conditioning Factors

Landslide conditioning factors can primarily be categorized into four major groups: Terrain, Hydrology, Land cover, and Lithology, with most studies selecting around 10 easily accessible factors [45,46]. Different conditioning factors may vary in their contributions across different study areas or models, and some researchers have summarized the usage frequency of each conditioning factor [27]. For a comprehensive representation of landslide characteristics, 18 conditioning factors with high and moderate usage frequencies were selected to establish a landslide susceptibility assessment system for Guiyang County. The corresponding data sources are detailed in Table 2.

2.3. Mapping Units

In landslide susceptibility prediction, slope units and grid units are two commonly used types of mapping units. Slope units are typically delineated using hydrological information from ridge lines and valley lines. This method can incorporate certain topographic and geomorphological information into the slope units, but there is no unified standard for the parameters used during the generation process, and the process is highly complex; manual corrections to the slope units, when required, can be very time-consuming [47,48]. Thus, grid cells are adopted as the mapping units in this study. The selection of grid cell size is crucial for both model accuracy and computational efficiency. A grid cell size that is too small can lead to excessive computational load, while a size that is too large may fail to accurately capture the spatial distribution characteristics of landslides. An established empirical formula exists for determining the appropriate grid cell size [49]:
$$G_s = 7.49 + 6 \times 10^{-4} S - 2 \times 10^{-9} S^{2} + 2.9 \times 10^{-15} S^{3} \tag{1}$$
where Gs denotes the appropriate grid size, and S is the denominator of the map scale.
This study employs a survey scale of 1:50,000. Using Equation (1), the theoretical grid size (Gs) is calculated as 32.853 m; however, a practical size of 30 m × 30 m was adopted for the evaluation unit to facilitate data analysis. This resulted in a total of 3,289,874 grid cells for Guiyang County.
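The arithmetic behind the theoretical grid size can be checked with a few lines of Python. This is only an illustrative sketch of Equation (1); the function name `grid_size` is a hypothetical helper introduced here, not something from the original workflow.

```python
def grid_size(scale_denominator: float) -> float:
    """Empirical grid size Gs (in metres) for a map of scale 1:S, per Equation (1)."""
    s = scale_denominator
    return 7.49 + 6e-4 * s - 2e-9 * s**2 + 2.9e-15 * s**3


# Survey scale used in this study: 1:50,000
gs = grid_size(50_000)
print(f"Theoretical grid size: {gs:.2f} m")
```

For S = 50,000 the individual terms are 7.49, 30, −5, and about 0.36, which sum to roughly 32.85 m, in line with the value reported above and close to the 30 m cell size adopted in practice.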

2.4. Landslide Conditioning Factors

A uniform coordinate system was applied to all landslide conditioning factors, using a Universal Transverse Mercator (UTM) projection referenced to zone 49N (based on the 3° division scheme). The classification was performed as follows: discrete variables (i.e., lithology, land use, aspect) were categorized based on observational criteria; continuous variables were classified employing the natural breaks method (Figure 3).
Elevation: Elevation is one of the fundamental characteristics of terrain. It does not directly trigger landslides but indirectly alters the types and frequencies of landslides by influencing climate, hydrology, vegetation, and human activities. The elevation data for Guiyang County was obtained from a national DEM with a spatial resolution of 30 m. The elevation, ranging from 59 to 1400 m, was categorized into five classes: 59–249 m, 249–393 m, 393–584 m, 584–826 m, and 826–1400 m. The first two classes (below 393 m) were found to occupy 75% of the study area.
Slope: Slope measures the steepness of a terrain. As the slope angle increases, so does the downslope gravitational force, making it easier to overcome the shear resistance of the material and consequently raising the probability of a landslide. Using a 30-m DEM, slopes were categorized into five classes: 0–6.86°, 6.86–13.73°, 13.73–21.79°, 21.79–32.53°, and 32.53–76.10°. The classification scheme resulted in the first two gentle slope classes (0–13.73°) collectively occupying 65% of the study area.
Aspect: Aspect refers to the geographical direction that a slope faces, which influences solar radiation, precipitation distribution, vegetation types, and weathering processes, thereby indirectly affecting slope stability. It is generally believed that sun-facing slopes are more prone to landslides than shaded slopes. Aspect was derived from a 30-m resolution DEM and classified into eight directional classes: north, northeast, east, southeast, south, southwest, west, and northwest.
Profile Curvature: Profile Curvature refers to the terrain curvature in the direction of slope gradient, which describes the concave and convex shapes of the slope in the vertical direction. Using a 30-m DEM, profile curvature was categorized into five classes: −37.08 to −4.65, −4.65 to −1.44, −1.44 to 0.81, 0.81 to 3.70, and 3.70 to 44.81, where the range from −1.44 to 3.70 accounts for 80%.
Plane Curvature: Plane Curvature refers to the terrain curvature in the direction of contour lines, which describes the bending shape of the slope in the horizontal direction. Using a 30-m DEM, plane curvature was categorized into five classes: −25.78 to −2.96, −2.96 to −0.98, −0.98 to 0.33, 0.33 to 2.09, and 2.09 to 30.18, where the range from −0.98 to 2.09 accounts for 83%.
Lithology Type: Lithology type refers to the material composition and engineering geological properties of surface or subsurface rocks and soils, which directly influences key stability parameters of slopes, such as shear strength, permeability, and weathering rate. The lithology types were interpreted from the 1:50,000 geological map of Hunan Province. Following the Chinese Engineering Rock Mass Classification Standard (GB/T 50218-2014 [50]), the lithologies were classified into five groups: Harder Rock (e.g., quartzite, basalt, granite), Hard Rock (e.g., slate, limestone), Weak Rock (e.g., dolomite, conglomerate), Weaker Rock (e.g., argillaceous limestone, sandstone, siltstone), and Loose Rock (e.g., silty clay). Collectively, Hard Rock and Weak Rock account for 88% of the study area.
DOF: DOF (fault density) is a structural metric representing the total length of faults per unit area (km/km²), thus quantifying the regional intensity of tectonic fragmentation. The higher the fault density, the more intense the tectonic influence on the rock mass and, generally, the poorer its stability. Fault traces were extracted from the 1:50,000 geological map of Hunan Province, and fault density was calculated using the fishnet method and divided into five categories: 0–0.25, 0.25–0.76, 0.76–1.23, 1.23–1.76, and 1.76–3.16.
Rainfall: Monthly Maximum Rainfall (unit: mm) is defined as the intensity of extreme precipitation events and is a major landslide trigger [51]. It functions through the following mechanism: heavy rainfall saturates the slope, increasing pore water pressure and reducing both effective stress and shear strength, which ultimately causes landslides. Based on the collected landslide dataset, Guiyang County experiences the highest number of landslides in June, which also coincides with the month of the highest monthly rainfall in the county. Therefore, this study utilizes the national monthly maximum rainfall data from 1991 to 2020, divided into five categories: 205.1–211.8, 211.8–216.9, 216.9–225.1, 225.1–236.2, and 236.2–256.7, with the range of 205.1–216.9 accounting for 73% of the area.
DOS: DOS (river density) is a hydrological metric representing the total length of rivers per unit area (km/km²), thus reflecting the degree of surface runoff development and the regional erosive capacity of the river network. Areas with high DOS typically exhibit strong terrain dissection and active hydrological activity, which significantly affect slope stability. This study extracted river vector data from basic geographical data for the study area and calculated river density using the fishnet method, dividing it into five categories: 0–0.26, 0.26–0.79, 0.79–1.24, 1.24–1.79, and 1.79–3.28.
NDVI: The normalized difference vegetation index (NDVI) reflects vegetation status and cover density, and it exerts a dual influence on landslide activity. Positively, root systems reinforce soil shear strength, while canopy interception reduces rainfall infiltration and runoff. Plant transpiration also lowers soil moisture, suppressing pore water pressure buildup and thus slope instability. Conversely, the weight of large trees can increase slope load, and decaying roots may create preferential flow paths for water, potentially triggering deep-seated landslides. At 30-m resolution, NDVI values were categorized into five classes: −0.16 to 0.42, 0.42 to 0.61, 0.61 to 0.74, 0.74 to 0.83, and 0.83 to 1.00, with the range of 0.74 to 1.00 accounting for 77% of the area.
TWI: The potential for water accumulation and soil moisture was assessed using the topographic wetness index (TWI). This index was computed from a 30-m resolution DEM and subsequently categorized into five classes: 3.97–7.60, 7.60–9.66, 9.66–12.60, 12.60–16.81, and 16.81–28.97. The combined area of the two lowest TWI classes (3.97–9.66) constitutes 75% of the total area.
Land Use Type: Dense forest cover significantly increases soil shear strength, thereby stabilizing slopes. Conversely, sparse vegetation—such as shrubs, grassland, and bare land—provides weak root reinforcement, resulting in low shear strength and a higher propensity for landslides. The land cover data were obtained from the 30-m annual dataset of China provided by the National Cryosphere Desert Data Center. A reclassification was performed, consolidating the original types into five categories: arable land, forest, water bodies, urban land, and other land uses (shrubs, grasslands, and bare land). Forests and arable land together constitute 97% of the total area.
DOR: DOR (road density) is a spatial metric representing the total length of the road network per unit area (km/km²), thus quantifying the intensity of human engineering activities and associated surface modification. This study extracted road vector data from basic geographical data for the study area and calculated road density using the fishnet method, dividing it into five categories: 0–0.49, 0.49–1.52, 1.52–2.99, 2.99–5.81, and 5.81–11.39.
SPI: The stream power index (SPI) is defined as the rate of kinetic energy loss per unit flow width, which reflects the erosive power of surface runoff. This makes it a useful measure for estimating the potential for slope erosion and sediment transport. The SPI, extracted from a 30-m resolution DEM, was classified into five intervals: 2.75–6.55, 6.55–8.69, 8.69–10.63, 10.63–13.23, and 13.23–26.40. The classification resulted in the middle two classes (6.55–10.63) occupying 71% of the study area.
CV: The dispersion of elevation was assessed using the coefficient of variation (CV), defined as the ratio of the standard deviation to the mean. This index was computed from a 30-m resolution DEM. The resulting CV values were categorized into five intervals: 0–0.0051, 0.0051–0.0096, 0.0096–0.0162, 0.0162–0.0278, and 0.0278–0.1289. The two lowest categories (0–0.0096) were found to occupy 71% of the study area, indicating predominantly uniform terrain.
Roughness: Topographic roughness is a parameter that characterizes the micro-scale undulations of the terrain, reflecting the complexity of local topography, which is a key indicator for identifying potential landslide surfaces (unit: m). Calculated from a 30-m DEM, the topographic roughness was divided into five intervals (1–1.04, 1.04–1.12, 1.12–1.26, 1.26–1.50, 1.50–4.16), with the range of 1–1.12 covering 92% of the study area.
Cutting-Depth: The cutting depth, defined as the vertical distance from the current surface to the pristine geomorphic datum, was computed from a 30-m DEM. This metric, expressed in meters, serves as a core indicator of surface modification intensity for landslide risk evaluation in high mountain gullies. The computed values were categorized into five intervals: 0–2.05, 2.05–4.78, 4.78–8.19, 8.19–13.19, and 13.19–58. The combined area of the first three intervals (0–8.19 m) constitutes 92% of the region.
Relief: The relief amplitude, which quantifies the maximum elevation difference within a local analysis window, was computed from a 30-m resolution DEM. This metric serves as a core indicator for characterizing terrain dissection and for disaster mitigation planning in mountainous regions (units: meters). The calculated values were categorized into five intervals: 0–5, 5–10, 10–17, 17–26, and 26–132. The combined area of the low to moderate amplitude categories (0–17 m) constitutes 93% of the total area.

2.5. Data Correlation Analysis Method

Given that landslide susceptibility assessment can be affected by collinearity among conditioning factors, which reduces model reliability, a two-step screening process was utilized. First, multicollinearity was assessed using the Variance Inflation Factor (VIF > 5 indicates issues [52]). Second, pairwise linear correlations were examined using the Pearson coefficient (|r| > 0.7 suggests a strong relationship [53]). Factors flagged by either criterion were considered for removal to minimize information redundancy.
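The two-step screening described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy factor values, and the use of the inverse correlation matrix to obtain VIF (a standard identity, where the j-th diagonal element equals the VIF of factor j) are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting.

    Fails with a division by zero if the matrix is singular
    (i.e. if two factors are perfectly collinear).
    """
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def screen_factors(columns, vif_max=5.0, r_max=0.7):
    """Flag factors with VIF > vif_max and factor pairs with |r| > r_max.

    columns: dict mapping a factor name to its list of sampled values.
    """
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    # Diagonal of the inverse correlation matrix gives each factor's VIF.
    vif = {name: inv[i][i] for i, name in enumerate(names)}
    high_vif = [name for name in names if vif[name] > vif_max]
    pairs = [(names[i], names[j])
             for i in range(len(names))
             for j in range(i + 1, len(names))
             if abs(corr[i][j]) > r_max]
    return vif, high_vif, pairs


# Toy values (hypothetical): "relief" is nearly a linear function of "slope".
factors = {
    "slope":  [5, 12, 18, 25, 31, 8, 22, 15],
    "relief": [11, 25, 35, 52, 61, 17, 45, 29],
    "ndvi":   [0.8, 0.3, 0.6, 0.5, 0.7, 0.4, 0.9, 0.2],
}
vif, high_vif, pairs = screen_factors(factors)
print(high_vif, pairs)
```

In this toy case both criteria flag the slope/relief pair, and one of the two would be a candidate for removal; which one to drop remains a modelling judgement.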

2.6. Landslide Susceptibility Assessment Method

2.6.1. IOE Model

The index of entropy (IOE) model is a statistical model designed to assess the significance of various landslide conditioning factors. It achieves this by measuring the degree of disorder (entropy) within the data, which objectively reflects each factor’s importance and its relative contribution to the overall assessment.
The following calculations and formulas are derived from the methodology presented in reference [40]:
$$FR_{ij} = \frac{a}{b}$$
$$P_{ij} = \frac{FR_{ij}}{\sum_{j=1}^{s} FR_{ij}}$$
$$H_i = -\sum_{j=1}^{s} P_{ij} \times \log_2 P_{ij}$$
$$H_{i,\max} = \log_2 s$$
$$I_i = \frac{H_{i,\max} - H_i}{H_{i,\max}}$$
$$P_i = \frac{1}{s} \sum_{j=1}^{s} FR_{ij}$$
$$W_i = I_i \times P_i$$
where the frequency ratio FRij is the landslide density within class j of factor i relative to that of the overall study area. The terms a and b denote the proportion of landslide points falling in that class and the proportion of the study area occupied by that class, respectively, while Pij is its probability density. The variable s denotes the total number of classes within the factor. Furthermore, Hi and Hi,max represent the calculated entropy and the maximum possible entropy for factor i, respectively, from which the information coefficient Ii is derived. Finally, Wi is the comprehensive weight resulting from this calculation for each factor.
$$LSI = \sum_{i=1}^{n} FR \times W_i$$
where LSI represents the Landslide Susceptibility Index.
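To make the chain of IOE calculations concrete, the sketch below computes the frequency ratios and the entropy-based weight for a single conditioning factor, and then the LSI for one grid cell. It is a simplified illustration under stated assumptions: the function names and the toy counts are hypothetical, every class is assumed to have a non-zero area, and classes with zero probability are skipped in the entropy sum (the usual convention that 0 × log 0 = 0).

```python
import math


def ioe_weight(landslide_counts, class_areas):
    """Return (FR list, weight W) for one factor with s classes.

    landslide_counts: number of landslide points in each class.
    class_areas: area (or cell count) of each class; must all be positive.
    """
    total_n = sum(landslide_counts)
    total_a = sum(class_areas)
    # FR_ij = a / b: share of landslides in the class over share of area.
    fr = [(n / total_n) / (a / total_a)
          for n, a in zip(landslide_counts, class_areas)]
    fr_sum = sum(fr)
    p = [f / fr_sum for f in fr]                        # P_ij
    h = -sum(v * math.log2(v) for v in p if v > 0)      # H_i
    h_max = math.log2(len(fr))                          # H_i,max
    info = (h_max - h) / h_max                          # I_i
    weight = info * fr_sum / len(fr)                    # W_i = I_i * P_i
    return fr, weight


def landslide_susceptibility_index(cell_fr, weights):
    """LSI of one grid cell: sum over factors of FR (of the cell's class) times W_i."""
    return sum(f * w for f, w in zip(cell_fr, weights))


# Toy example (hypothetical numbers): a factor with two equally sized classes,
# where all landslides fall in the first class.
fr, w = ioe_weight([10, 0], [1.0, 1.0])
print(fr, w)
```

When landslides are spread across classes in proportion to class area, every FR is 1, the entropy reaches its maximum, and the weight drops to zero, which reflects a factor that carries no discriminating information.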

2.6.2. SVM Model

The SVM model [54] is a powerful supervised learning algorithm. Its core idea is to find the optimal hyperplane that maximizes the margin between different classes, thereby enhancing generalization ability. This maximum-margin hyperplane is determined by the closest data points, known as support vectors. The SVM can handle both linear and non-linear problems by employing kernel functions to map data into higher-dimensional spaces.
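For reference, the maximum-margin idea described above is commonly written as the soft-margin optimization problem below. The notation (weight vector $\mathbf{w}$, bias $b$, slack variables $\xi_i$, penalty $C$, feature map $\phi$, and labels $y_i \in \{-1, +1\}$ for non-landslide and landslide samples) is standard textbook notation assumed here; the kernel and penalty value used in this study are not stated in this section.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
  y_i \left( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \quad i = 1, \dots, N
```

Minimizing $\lVert \mathbf{w} \rVert$ is equivalent to maximizing the margin $2 / \lVert \mathbf{w} \rVert$, and the samples for which the constraint is active are the support vectors.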

2.6.3. LDA Model

The LDA model [55] is a supervised dimensionality reduction and classification technique. It aims to find a linear projection that maximizes the separation between classes while minimizing the dispersion within each class. This is achieved by maximizing the ratio of between-class variance to within-class variance, under the assumption that all classes share a common covariance matrix and are normally distributed. LDA is particularly effective for datasets with high-dimensional features and a limited number of samples.
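The variance-ratio criterion described above is usually expressed as the Fisher criterion shown below. The symbols ($\mathbf{S}_B$ and $\mathbf{S}_W$ for the between-class and within-class scatter matrices, $\boldsymbol{\mu}_0$ and $\boldsymbol{\mu}_1$ for the non-landslide and landslide class means) are standard notation assumed for illustration rather than taken from the source.

```latex
J(\mathbf{w}) =
  \frac{\mathbf{w}^{\top} \mathbf{S}_B \, \mathbf{w}}
       {\mathbf{w}^{\top} \mathbf{S}_W \, \mathbf{w}},
\qquad
\mathbf{w}^{*} \propto \mathbf{S}_W^{-1} \left( \boldsymbol{\mu}_1 - \boldsymbol{\mu}_0 \right)
```

The second expression is the closed-form maximizer for the two-class case, which is the setting relevant to landslide versus non-landslide classification.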

2.6.4. RF Model

The RF model [56] is an ensemble learning algorithm that enhances model accuracy and robustness by constructing multiple decision trees. It effectively handles high-dimensional data, is robust to missing values and outliers, and can assess feature importance. RF helps reduce the risk of overfitting, enhances model generalization capability, and is easily parallelizable, speeding up the training process. This method can handle both classification and regression problems while effectively modeling nonlinear data patterns. RF typically requires no complex parameter tuning and performs well even with default settings, making it an ideal choice for solving complex data problems.

2.6.5. ET Model

The ET model [57] is an ensemble method that performs classification or regression by building many randomized decision trees and aggregating their predictions. Unlike RF, the ET model introduces more randomness in tree construction, selecting both the candidate features and the split points at random at each decision node. This extreme randomization can make the model more robust, enabling it to capture complex patterns and interactions in the data. The ET model is typically used for handling high-dimensional data, and it has some robustness against outliers and noise. Owing to its simple structure, fast training speed, and good performance on many datasets, the ET model has been widely used in practical applications.

2.6.6. Landslide Sample Optimization Methods

A landslide sample optimization procedure was performed to mitigate class imbalance and enhance model accuracy. First, a 30-m buffer zone was created around the original landslide points. Subsequently, synthetic samples, numbering three times the original landslide count, were randomly generated within this buffer. The final dataset, comprising both original and synthetic samples, contained a total of 584 landslide samples.
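The augmentation step can be sketched as follows. This is a minimal illustration that assumes projected coordinates in metres and assumes that each original point receives exactly three synthetic points inside its own 30-m buffer (the text fixes only the threefold total); the function name `augment_points`, the seed and the toy inventory are hypothetical.

```python
import math
import random

def augment_points(points, radius_m=30.0, copies=3, seed=42):
    """Return the original points plus `copies` random points inside each buffer."""
    rng = random.Random(seed)
    samples = list(points)
    for x, y in points:
        for _ in range(copies):
            # The square root of a uniform draw gives uniform density over the disc.
            r = radius_m * math.sqrt(rng.random())
            theta = rng.uniform(0.0, 2.0 * math.pi)
            samples.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return samples

# Hypothetical inventory of 146 points; real coordinates would come from the GIS layer.
inventory = [(1000.0 * i, 500.0 * i) for i in range(146)]
augmented = augment_points(inventory)
print(len(augmented))  # 146 originals + 3 * 146 synthetic points
```

With 146 original points the sketch returns 584 samples, matching the total reported above.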

2.6.7. Non-Landslide Sample Optimization Methods

Six methodologies for non-landslide sample optimization were implemented in a comparative framework to determine the most effective approach (Figure 4). The methodologies are delineated as follows:
Buf-O (Dataset 1): Non-landslide samples were randomly selected from beyond a 500-m buffer surrounding landslide inventories [32].
IOE-O (Dataset 2): Sampling was conducted within zones classified as very low or low susceptibility by the Index of Entropy (IOE) model.
Slope-O (Dataset 3): Samples were extracted from terrain characterized by low slope angles.
Slope-IOE-O (Dataset 4): Sampling was confined to the spatial overlap between low-slope areas and the IOE model’s very low/low susceptibility zones.
Buf-Slope-O (Dataset 5): Samples were derived from the intersection of low-slope areas and regions external to the 500-m landslide buffer.
Buf-IOE-O (Dataset 6): Samples were selected from the intersection of the IOE model’s very low/low susceptibility zones and the area outside the 500-m landslide buffer.
In all cases, the selected non-landslide samples were merged with the optimized landslide samples to form the final dataset for model training.
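The six selection rules listed above can be expressed as combinations of three elementary criteria. The sketch below is illustrative only: the cell dictionary layout, the function names and, in particular, the 5° low-slope threshold are assumptions, since the slope cut-off is not stated in this section.

```python
import random

# Each strategy is the intersection of one or two elementary criteria.
CRITERIA = {
    "Buf-O": ("buf",),
    "IOE-O": ("ioe",),
    "Slope-O": ("slope",),
    "Slope-IOE-O": ("slope", "ioe"),
    "Buf-Slope-O": ("buf", "slope"),
    "Buf-IOE-O": ("buf", "ioe"),
}

def eligible(cell, strategy, max_slope_deg=5.0, buffer_m=500.0):
    """True if a cell may be drawn as a non-landslide sample under `strategy`."""
    checks = {
        "buf": cell["dist_to_landslide_m"] > buffer_m,      # outside the 500-m buffer
        "ioe": cell["ioe_class"] in ("very low", "low"),    # IOE susceptibility zone
        "slope": cell["slope_deg"] <= max_slope_deg,        # hypothetical threshold
    }
    return all(checks[name] for name in CRITERIA[strategy])

def sample_non_landslides(cells, strategy, n, seed=42):
    """Randomly draw n eligible cells; raises ValueError if too few are eligible."""
    pool = [c for c in cells if eligible(c, strategy)]
    return random.Random(seed).sample(pool, n)

cells = [
    {"id": 0, "slope_deg": 2.0, "ioe_class": "very low", "dist_to_landslide_m": 900.0},
    {"id": 1, "slope_deg": 3.0, "ioe_class": "high", "dist_to_landslide_m": 1200.0},
    {"id": 2, "slope_deg": 25.0, "ioe_class": "low", "dist_to_landslide_m": 800.0},
    {"id": 3, "slope_deg": 4.0, "ioe_class": "low", "dist_to_landslide_m": 100.0},
]
pools = {name: [c["id"] for c in cells if eligible(c, name)] for name in CRITERIA}
```

In this toy grid, cell 3 is gentle and statistically low-susceptibility but lies close to a mapped landslide, so it is admitted by Slope-IOE-O and rejected by every buffer-based rule.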

2.6.8. SHAP Feature Interpretation Method

The “black box” nature of many machine learning models often obscures their internal prediction processes. To mitigate this, SHapley Additive exPlanations (SHAP)—a method rooted in the game theory of Lloyd Shapley [58] and introduced to machine learning by Lundberg [59]—provides a powerful solution for interpretability. SHAP improves transparency and trust by quantifying the contribution of each input feature to the model’s output. Its core strength lies in facilitating both global interpretation, which assesses overall feature importance across the dataset, and local interpretation, which deconstructs the prediction for a single sample. The formula for calculating Shapley values is as follows:
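$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_x(S \cup \{i\}) - f_x(S) \right]$$
where $\phi_i$ denotes the Shapley value of the i-th feature, F represents the set of all features (|F| being their total number), S is a feature subset that excludes feature i, $f_x(S \cup \{i\})$ represents the model prediction for sample x when feature i is added to the feature subset S, and $f_x(S)$ represents the model prediction when only the feature subset S is used.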
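For a handful of features the Shapley formula can be evaluated by direct enumeration, which makes the weighting of each subset explicit. The sketch below is a brute-force illustration, not the tree-specific algorithm used by SHAP for forest models; the toy value function and its scores are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values; `value(subset)` plays the role of f_x(S)."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # |S|! (|F| - |S| - 1)! / |F|! for every subset S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_model(subset):
    """Hypothetical additive scores plus one slope-rainfall interaction term."""
    base = {"slope": 0.30, "rainfall": 0.20, "roughness": 0.10}
    score = sum(base[f] for f in subset)
    if {"slope", "rainfall"} <= subset:
        score += 0.10
    return score

phi = shapley_values(["slope", "rainfall", "roughness"], toy_model)
```

The interaction bonus is split equally between slope and rainfall, and the three values sum to the difference between the full-model and empty-model outputs, which is the additivity property that gives SHAP its name.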

2.6.9. Validation Metrics

The classification models were evaluated using the Receiver Operating Characteristic (ROC) curve and a confusion matrix. The proximity of the ROC curve to the top-left corner and the corresponding Area Under the Curve (AUC) value were used to gauge model performance, with higher AUC values indicating greater robustness and classification accuracy. Additionally, the confusion matrix was used to compare predicted classifications with actual values, enabling the computation of Accuracy (Acc), Precision, Recall, and the F1-score (the harmonic mean of precision and recall). The formulas for these metrics are as follows [40]:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$F1 = \frac{2TP}{2TP + FP + FN}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
In the confusion matrix, TP (True Positive) and TN (True Negative) represent the counts of correctly classified positive and negative instances, respectively. Conversely, FP (False Positive) denotes the count of negative instances misclassified as positive, while FN (False Negative) refers to the count of positive instances misclassified as negative.
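The four metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The function name and the example counts are hypothetical and do not correspond to any result in the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical counts for a 200-sample test set.
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)

# F1 written as the harmonic mean of precision and recall gives the same value.
harmonic = 2 * m["precision"] * m["recall"] / (m["precision"] + m["recall"])
```

The count-based form of F1 used in the formulas above and the harmonic-mean form are algebraically identical; the sketch computes both so that the equivalence can be checked numerically.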

2.7. Landslide Susceptibility Assessment Workflow

2.7.1. Assessment Steps

(a) Landslide Inventory Compilation: A landslide inventory was constructed by integrating multiple data sources, including historical records, remote sensing interpretation, and field investigation data.
(b) Construction of Conditioning Factor System: A set of conditioning factors was established, covering topographic, hydrological, geological, and environmental attributes (e.g., slope, rainfall, lithology, NDVI).
(c) Screening of Conditioning Factors: Multicollinearity diagnostics and Pearson correlation coefficients were used to select the key factors affecting landslides and to eliminate highly correlated factors such as incision depth and relief amplitude.
(d) Construction of the Model Dataset: The landslide database was expanded by generating additional landslide points within a 30-m buffer zone, and the non-landslide dataset was constructed using random sampling and the non-landslide sample selection methods.
(e) Model Evaluation and Selection: The four machine learning models (SVM, LDA, RF, and ET) were evaluated and compared. The evaluation was based on a suite of indicators, including ROC curves, AUC values, and confusion matrices, to facilitate the selection of the superior model and dataset.
(f) Susceptibility Mapping and Accuracy Assessment: A landslide susceptibility map was generated using the selected optimal model. The accuracy of the prediction was assessed by comparing the statistical outcomes of the susceptibility zoning.

2.7.2. Technical Approach

Figure 5 depicts the technical workflow adopted in this study.

3. Results

3.1. Multicollinearity Diagnosis and Pearson Correlation Analysis

A collinearity diagnosis was conducted for all conditioning factors using the linear regression module in SPSS 27.0, with the results (VIF) summarized in Table 3. The Pearson correlation analysis between each pair of factors was conducted using the correlation analysis module in SPSS 27.0 (Figure 6).
Diagnostic results indicated that all 18 conditioning factors exhibited tolerance values exceeding 0.2 and VIF values below the threshold of 5, confirming that significant multicollinearity was not present among the factors. Notably, the VIF values for incision depth, relief, and slope were approximately 4, and their Pearson correlation coefficients surpassed 0.7. The slope factor was deemed essential for the assessment because of its fundamental influence on landslide mechanisms and was therefore retained, whereas incision depth and relief amplitude were removed from the analysis. Consequently, the remaining 16 conditioning factors exhibited neither collinearity nor strong correlation.
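The two screening criteria can be illustrated with a small sketch. The study ran these diagnostics in SPSS; the code below only reproduces the underlying arithmetic (the Pearson coefficient, and the relation VIF = 1/(1 − R²) with tolerance = 1/VIF), and the column values are invented for demonstration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_from_r2(r_squared):
    """VIF of a factor whose regression on the other factors yields R^2."""
    return 1.0 / (1.0 - r_squared)

def flag_pairs(columns, threshold=0.7):
    """Return factor pairs whose absolute Pearson r exceeds the threshold."""
    names = list(columns)
    flagged = []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if abs(pearson_r(columns[a], columns[b])) > threshold:
                flagged.append((a, b))
    return flagged

# Hypothetical factor values at five sample points.
columns = {
    "slope": [5, 10, 15, 20, 25],
    "relief": [12, 19, 33, 41, 52],
    "aspect": [90, 10, 300, 45, 180],
}
flagged = flag_pairs(columns)
```

A tolerance above 0.2 is equivalent to a VIF below 5, so the two thresholds quoted above express the same condition; the pairwise 0.7 correlation cut-off is a separate, stricter check, which is why factors can pass the VIF test and still be removed.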

3.2. Relationship Between Conditioning Factors and Landslide Relics

The landslide inventory points were assigned the frequency ratio value of their respective factor classes via the “Extract Multi Values to Points” function in ArcGIS 10.8. The frequency ratio for a class is calculated as (Number of landslide points in the class/Total landslide points)/(Area of the class/Total study area). Subsequently, the numbers of landslide points and grid cells in each class of the conditioning factors were tallied. Finally, the values of a, b, FRij*, Pij, Hi, Hi,max, Ii, Pi, and Wi* were calculated in Excel spreadsheets according to Equations (2)–(8) (Table 4).
The entropy index weight (Wi) serves as a metric for the influence of each conditioning factor on landslide development, thereby facilitating an objective analysis of their relative contributions. The weight assigned to a factor is directly proportional to its contribution to landslide susceptibility. The quantitative analysis ranks the conditioning factors in the following descending order of contribution: land type > road density > rainfall > roughness > lithology type > NDVI > CV > DEM > SPI > relief > slope > incision depth > TWI > profile curvature > fault density > stream density > plan curvature > aspect. Among them, the dominant factors such as land type, road density, and rainfall have smaller entropy values but larger weights, indicating their significant impact on landslide development. Factors such as plan curvature and aspect display higher entropy values yet lower weights, signifying their relatively limited role in landslide development.

3.3. Analysis of Different Sample Optimization Methods

We constructed SVM, LDA, RF, and ET models using the scikit-learn library in Python 3.11. All six datasets (1–6) were used to train and test these models with a 7:3 split between training and testing data. The predictive performance of the models was assessed using ROC curves, shown in Figure 7.
Hyperparameters are configuration variables that are fixed before learning commences and remain unchanged while the model optimises its parameters on data. These variables govern model capacity, convergence speed and generalisation; their selection directly affects predictive performance. Identifying suitable hyperparameters is thus critical to maximising model efficacy. Systematic tuning, whether through grid search [60], Bayesian optimisation [61], evolutionary strategies [62] or meta-learning [23], is now regarded as an indispensable component of machine-learning pipelines. Owing to the low-dimensional hyperparameter space of the models under study, exhaustive grid search offers the most direct route to the best configuration within the searched grid.
The optimal parameters for each model, finalized after multiple tuning iterations, are listed in Table 5. To mitigate the bias from imbalanced data distribution and improve training precision, we employed 10-fold cross-validation using the StratifiedKFold function with the parameters: n_splits = 10, shuffle = True, and random_state = 42.
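The idea behind stratified 10-fold cross-validation can be sketched without any third-party dependency. The generator below is a simplified stand-in, not scikit-learn's StratifiedKFold: its seed drives Python's own random module, so it will not reproduce the folds obtained in the study, and the equal number of non-landslide samples is an assumption made for the example.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, n_splits=10, seed=42):
    """Yield (train_idx, test_idx) pairs that keep class proportions in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within each class
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)     # deal indices round-robin
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test

# 584 landslide samples (label 1) and an assumed 584 non-landslide samples (label 0).
labels = [1] * 584 + [0] * 584
splits = list(stratified_kfold(labels))
```

Every sample is used for testing exactly once, and each test fold holds 58 or 59 samples of each class, so no fold is dominated by one class.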
Figure 7 shows that all models achieve robust classification across datasets 1–6, with AUC values consistently above 0.8. RF and ET outperform SVM and LDA, exhibiting markedly higher AUC values. Systematic variation of non-landslide sampling strategies substantially enhances predictive performance. Specifically, AUC improves by 0.089 for SVM (0.844 → 0.933), 0.112 for LDA (0.810 → 0.922), 0.058 for RF (0.907 → 0.965) and 0.054 for ET (0.907 → 0.961). For the best-performing RF model, the strategies yield AUCs in descending order: Slope-IOE-O > Slope-O > IOE-O > Buf-Slope-O > Buf-IOE-O > Buf-O. Within single-objective optimizations, performance decreases as Slope-O > IOE-O > Buf-O; under hybrid optimizations, the hierarchy becomes Slope-IOE-O > Buf-Slope-O > Buf-IOE-O. Overall, Slope-IOE-O delivers the highest predictive accuracy across all evaluated sampling strategies.
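The AUC values compared above have a direct probabilistic reading: the AUC equals the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample, with ties counted as one half. The pairwise sketch below makes this explicit; the labels and scores are invented, and for large datasets a rank-based routine would be preferable to this quadratic loop.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = landslide) and predicted susceptibility scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(y_true, scores)
```

In this toy case eight of the nine positive-negative pairs are ordered correctly, giving an AUC of 8/9.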

3.4. Analysis of Model Performance and Effectiveness

3.4.1. Analysis of Model Performance

Based on the comparison of sample optimization methods, the Slope-IOE-O method was found to be the most effective. Therefore, we applied the SVM, LDA, RF, and ET models with their optimal parameters to this dataset (Dataset 4); the resulting confidence intervals are presented in Table 6.
Table 6 shows that all models exhibit narrow 95% confidence intervals, signifying robust performance estimates. RF and ET surpass SVM and LDA across nearly all metrics, and their tighter intervals denote both reduced variance and heightened stability. Specifically, RF marginally excels in AUC and precision, whereas ET demonstrates superior accuracy, F1-score and recall.

3.4.2. Analysis of Model Effectiveness

A landslide susceptibility map for Guiyang County was generated using Python 3.11, Dataset 2, and four models (SVM, LDA, RF, and ET). This map was subsequently categorized in ArcGIS 10.8 into five susceptibility levels (very high, high, moderate, low, and very low) through the natural breaks classification method, as presented in Figure 8.
Model validation was performed by overlaying the original landslide points onto the susceptibility map. The number of landslides and grid cells were counted for each susceptibility level, enabling the computation of landslide density and frequency ratio. The frequency ratio for a given level was computed using the formula: (Percentage of Landslides in the Level)/(Percentage of Total Area occupied by the Level). The results of this analysis are presented in Table 7.
As evaluated by the landslide frequency ratio in the very high and high susceptibility zones (Table 7), the models rank in descending order of performance as RF, ET, SVM, and LDA, with RF and ET producing nearly identical areal proportions in these critical zones. Collectively, these zones cover merely 42% of the study region but encompass 96% of the inventoried landslides.
Overall, the RF and ET models demonstrate superior classification performance and effectiveness compared to the SVM and LDA models, with RF being marginally superior to ET and thus exhibiting the best overall performance.

3.5. Model Interpretability via SHAP

Many machine-learning models behave as black boxes whose decision logic is opaque; interpretability frameworks are therefore essential for trustworthy evaluation. We focus on Slope-IOE-O (Dataset 4), the optimal sampling strategy, and the top-performing RF model, leveraging SHAP for global and local interpretation. Using TreeExplainer (SHAP v0.44, Python 3.11), we quantify global feature importance and instance-level Shapley values, elucidate dominant landslide controls and factor interdependencies, and thereby dissect the model’s internal decision mechanism.

3.5.1. Global Interpretation

The SHAP summary plot is displayed in Figure 9. The SHAP value for each predictor, plotted on the horizontal axis, is interpreted as follows: a positive value signifies that the feature elevates the model’s landslide susceptibility prediction, whereas a negative value corresponds to a decrease in the predicted risk. Each point corresponds to an individual sample, with colour indicating the factor’s magnitude (red = high, blue = low). Factors exhibiting broad horizontal dispersion exert greater influence, whereas tight clustering around zero indicates limited predictive relevance. Transparent bars denote global feature importance, with bar length proportional to each factor’s relative contribution.
Transparent bars in the SHAP summary plot identify slope, maximum rainfall, surface roughness and elevation as the four dominant landslide drivers among the sixteen predictors, as evidenced by their greatest bar lengths. Conversely, profile curvature and stream density contribute minimally, exerting negligible influence on landslide occurrence. Slope, Rainfall, Roughness and DEM exhibit wide, positively skewed SHAP distributions, indicating that higher values amplify landslide risk: red (high-value) points align with positive SHAP scores, whereas blue (low-value) points cluster in the negative domain, denoting risk suppression. Low NDVI and DOF values map predominantly to positive SHAP scores, implying that reduced levels enhance landslide susceptibility. The dispersed SHAP values for DOR and land-use variables suggest strong interaction effects with other predictors. Aspect and profile curvature exhibit SHAP values tightly centred on zero, underscoring their negligible impact on landslide risk.
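The global ranking read from the bars can be reproduced from a matrix of per-sample SHAP values. The sketch below assumes, as is the default in the shap library's bar summary, that each bar length is the mean absolute SHAP value of the feature; the matrix and feature subset are invented for illustration.

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value across samples."""
    n = len(shap_matrix)
    means = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    # Largest mean |SHAP| first, mirroring the bar order in a summary plot.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values: one row per sample, one column per feature.
feature_names = ["slope", "rainfall", "aspect"]
shap_matrix = [
    [0.30, -0.10, 0.01],
    [-0.20, 0.15, -0.02],
    [0.25, 0.05, 0.00],
]
ranking = global_importance(shap_matrix, feature_names)
```

Taking absolute values before averaging matters: a feature such as slope that pushes some samples strongly up and others strongly down would otherwise appear unimportant because the signed contributions cancel.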

3.5.2. Local Interpretation

Focusing locally on the four dominant landslide drivers—slope, maximum rainfall, surface roughness and elevation—Figure 10a–d present univariate SHAP dependence plots in which the x-axis records each predictor’s raw value and the y-axis quantifies its direct, monotonic or non-monotonic, contribution to landslide probability, with every point representing an individual observation whose vertical position signals the magnitude and direction of influence on model output. Extending to interactions, Figure 10e–h display bivariate SHAP dependence landscapes where the x- and y-axes give the raw values of the primary driver and its strongest covariate, the colour scale encodes their joint SHAP interaction value, and warmer (cooler) hues reveal progressively stronger positive (negative) synergistic modulation of landslide susceptibility.
The SHAP dependence plot for slope shows low or negative SHAP values at gentle inclinations, indicating limited landslide susceptibility. Between 0° and 20°, SHAP values rise monotonically with slope, reflecting an increasingly positive contribution to landslide probability. Between 20° and 30°, SHAP values plateau at their maximum, signifying the strongest slope-driven amplification of risk. Above 30°, SHAP values decline, consistent with reduced landslide likelihood in steep terrains underlain by competent bedrock and devoid of loose regolith, which collectively enhance slope stability. The bivariate SHAP interaction plot with road density—the dominant covariate—shows that high-slope pixels coincide with sparse road networks, reflecting the county’s hilly terrain where roads cluster in gentle valleys and plains rather than steep uplands. Moreover, low-slope samples paired with high road density yield elevated SHAP values, evidencing that intensive road construction can trigger landslides even on gentle gradients.
The SHAP dependence plot for monthly maximum rainfall reveals low or negative SHAP values below 215 mm, denoting subdued landslide susceptibility. At 225 mm, SHAP values increase linearly and plateau at their maximum, signifying peak rainfall-driven landslide amplification. The interaction plot shows that, for identical rainfall, high-elevation samples (red) exhibit greater SHAP values than low-elevation samples (blue), evidencing elevation-dependent risk amplification. Between 225 mm and 235 mm, pronounced fluctuations and non-linearity suggest soil-moisture saturation and heightened modulation by covariates. Above 235 mm, SHAP values decrease; the interaction plot shows that most samples exceed 600 m, indicating that excessive rainfall attenuates landslide risk at high elevations.
Below a roughness index of 1.04, SHAP values remain low, and samples form a dense, linear cloud, indicating minimal landslide susceptibility and limited covariate influence. Between 1.04 and 1.2, SHAP values rise steeply, signaling a pronounced escalation in landslide risk. Beyond 1.2, SHAP values plateau and sample dispersion intensifies, indicating diminishing marginal risk and heightened interaction effects. Road construction markedly amplifies the landslide hazard associated with surface roughness when the roughness index exceeds 1.04 and road density (DOR) surpasses 1.0, a conjunction that demands heightened vigilance.
SHAP dependence plots for elevation reveal low or negative SHAP values below 300 m, reflecting stable fluvial plains and valley floors whose terrain exerts negligible, or even suppressive, effects on landslide initiation. Between 100 m and 450 m, SHAP values increase monotonically, mirroring escalating landslide susceptibility. Within the 450–700 m band, SHAP values plateau at elevated levels yet exhibit pronounced variability, denoting high landslide risk strongly modulated by covariates. Beyond 700 m, SHAP values decline modestly, with interaction plots revealing that landslide likelihood is instead dictated by the synergistic effects of high elevation and extreme rainfall.

4. Discussion

4.1. Hybrid Optimisation of Non-Landslide Samples

We present a systematic framework for sample optimisation. To mitigate the paucity (n = 146) and spatial bias of landslide inventories in Guiyang County, landslide records are augmented via oversampling and complemented by a novel non-landslide selection strategy—Slope-IOE-O. Low-slope constraints (Slope-O) are intersected—via a spatial overlay—with zones of minimal susceptibility delineated by the entropy-weighted information-value model (IOE-O), establishing a coupled physical–statistical framework for purifying non-landslide samples. The method implements a dual-guarantee mechanism: low-slope areas inherently exhibit geotechnical stability and low failure probability, while IOE-O statistically isolates zones of minimal susceptibility. Their spatial concordance yields negatives that are simultaneously topographically and statistically robust, elevating sample purity and minimising mislabelled pseudo-landslides. Relative to conventional buffer-based selection—prone to mislabeling proximal hazard zones—Slope-IOE-O exploits spatial concordance to sharpen the environmental distinction between positives and negatives, elevating RF AUC from 0.907 to 0.965. Accordingly, Slope-IOE-O is advocated as the primary optimisation strategy, with single-criterion alternatives reserved for data-scarce scenarios that preclude hybrid implementation. Landslide inventories derive from field surveys and archival records, ensuring veracity. Eighteen multi-source covariates—encompassing topography, hydrology, geology and anthropogenic drivers—were compiled; collinearity and correlation screening distilled the set to sixteen predictors, eliminating redundancy and furnishing high-fidelity model inputs. The integrated optimisation of samples and predictors yields a transferable protocol for susceptibility assessment in geologically complex, data-limited settings.

4.2. Mechanistic Attribution of Landslide Susceptibility Using SHAP

Using 146 inventoried landslides in Guiyang County, compiled from literature and field surveys, we model susceptibility via SVM, LDA, RF and ET. To elucidate factor contributions and interactions, we conduct a systematic SHAP analysis of model outputs. SHAP identifies slope, monthly maximum rainfall, surface roughness and elevation as the four dominant drivers, quantifies their importance in descending order, and reveals non-linear pathways and interdependencies. Dependence plots reveal that slope contributions peak at 20–30°, declining beyond 30° where exposed bedrock reduces susceptibility. Monthly rainfall exhibits a critical threshold (225 mm); beyond this threshold, rising pore-water pressure sharply increases failure probability. Interaction SHAP values further demonstrate significant synergy between surface roughness and road density, and between rainfall and elevation, jointly modulating landslide spatial patterns. By providing quantitative attribution, SHAP overcomes the black-box limitation, corroborates the physical plausibility of predictions, and enhances interpretability in geomechanical and hydrological contexts, thereby offering a robust visual and quantitative toolkit for landslide-susceptibility modelling in complex terrains [59,63].

4.3. Spatial Heterogeneity of Landslide Susceptibility and Its Primary Drivers

Landslide susceptibility in Guiyang County exhibits a pronounced north–south high-risk belt that sandwiches a low-lying central corridor, revealing strong spatial heterogeneity. Very-high susceptibility is confined to three hotspots: (i) the northern uplands (>584 m elevation, >21.79° slope, topographic roughness > 1.12); (ii) the steep southern highlands; and (iii) the built-up county seat and adjacent peri-urban fringe. Conversely, the central tract—comprising low-relief hills, terraces, plains and open water—is assigned to the very-low susceptibility class.
Excessive landslide probability in the northern mountains arises from synergistic topographic–hydrological forcing. High elevations and pronounced roughness focus runoff, sustaining chronic soil saturation [64]; steep terrain amplifies gravitational loading, and vigorous fluvial incision undermines mechanical stability [65], whereas intense monthly rainfall and elevated SPI values promote failure via surface erosion and elevated pore-water pressures [66]. Within the southern county seat, landslide susceptibility is exacerbated by the confluence of dense anthropogenic disturbance, lateral river undercutting and predisposed lithological structures. Compacted urban fabric (low NDVI) and pervasive cut slopes unload slope toes and relax rock mass [67]; lateral migration of the Xi River and attendant stage fluctuations weaken bank materials and scour slope bases; dense fault networks, intensely fractured strata and groundwater flow along discontinuities collectively diminish shear resistance. The central tract’s very-low susceptibility reflects subdued relief, gentle gradients, limited runoff convergence and negligible erosion, which together preclude landslide initiation.
RF, ET, SVM and LDA unanimously reproduce the north–south high-risk belt flanking a low-risk core, a pattern congruent with elevation and slope, corroborating model robustness. We therefore recommend focusing mitigation on (i) drainage and bio-engineered slope protection in the northern highlands, (ii) strict regulation of cut slopes, riparian armouring and revegetation in the southern county seat, and (iii) preservation of the central low-risk tract’s intact, low-relief topography.

4.4. Comparison with Other Studies

Leveraging 18 conditioning factors, we delineated landslide susceptibility across Guiyang County, revealing slope, monthly maximum rainfall, surface roughness and elevation as the four dominant drivers. Liu Leilei employed a deterministic factor model in Hunan’s red-bed terrain and identified land-use type, elevation and topographic relief as principal predictors [68]. Liu Ruiyang, coupling SHAP with RF, found that lithology, fault proximity, elevation, road proximity and river proximity dictated typhoon-triggered landslides in Zixing City [69]. Divergences arise from (1) distinct geological contexts and landslide sample characteristics, and (2) differing predictor suites and model architectures that reorder factor importance and modulate causal pathways [70]. To enhance interpretability, we employed SHAP, which quantifies each feature’s Shapley contribution within a game-theoretic framework, ensuring transparent and traceable decisions. Global SHAP metrics identify dominant drivers, whereas local explanations quantify site-specific contributions and elucidate underlying mechanisms, thereby supporting the inferred links between predictors and landslide occurrence.
Four landslide susceptibility maps for Guiyang County were produced using SVM, LDA, RF, and ET models. High spatial concordance among the models was observed, with the very-high and very-low susceptibility zones exhibiting consistent spatial patterns, thereby attesting to the robustness of the modeling framework. Susceptibility exhibits a north- and south-high, central-low gradient, with very-high zones concentrated in the northern mountains, the vicinity of the southern county seat and selected villages of Taihe Town, and very-low zones confined to the central lowlands. This pattern accords with the regional zonation of Xu Zhaojun [71], derived from an index-based method. However, Xu Zhaojun incorporated ground subsidence and rockfalls, potentially underestimating landslide susceptibility in some areas, and their coarse-resolution mapping limited spatial fidelity. Our machine-learning workflow delivers 30 m × 30 m resolution, markedly enhancing both accuracy and precision.

4.5. Future Research Directions

Comparative evaluation of non-landslide sampling protocols identifies an optimal strategy that substantially enhances classifier performance and partially alleviates data scarcity in landslide inventories. Landslide samples derive from rigorously validated historical records and field surveys, warranting the use of buffer-based augmentation. When the reliability of an inventory is suspect, buffer methods risk amplifying noise and introducing distributional bias and are therefore discouraged. Future research should integrate spatial-autocorrelation metrics and topographic constraints within sampling protocols or harness multi-source data fusion to enhance the fidelity and representativeness of both landslide and non-landslide inventories.
Landslide footprints are spatially restricted relative to stable terrain. Although preserving this imbalance can mirror natural prevalence, a universal landslide-to-non-landslide ratio remains elusive. Ratios ranging from 1:160 [27] to 1:1 [72] have been reported, with intermediate prescriptions of 1:2 [73] and 1:10 [74]. The optimal ratio appears contingent on both spatial extent and the representativeness of predictor space. Future work could integrate cost-sensitive weighting into grid search or Bayesian optimization frameworks to self-adaptively identify the optimal class ratio for maximizing predictive accuracy.
The prevalence of rainfall-induced landslides, which constitute 90% of events in China [75], mandates focused research. The risk is particularly acute in Southern China, where more than 60% of the annual precipitation is delivered between June and September. This concentration, coupled with a rising trend in extreme rainfall events, underscores the critical importance of developing accurate rainfall-threshold models for landslide early warning. Future efforts should assimilate high-resolution rainfall data with temporally explicit landslide inventories to dissect the coupled response of slope stability to rainfall intensity and duration. Machine-learning models can be trained to link rainfall metrics to landslide response, with k-fold cross-validation and Bayesian optimisation used to derive optimal thresholds. Accounting for spatial–temporal heterogeneity, dynamic threshold models should be developed to enhance predictive accuracy and to inform regional disaster-risk reduction.

5. Conclusions

Landslide susceptibility in Guiyang County, Hunan Province, is systematically assessed through an integrated framework that couples sample-optimization strategies with machine-learning models and interpretability analyses; key conclusions are summarised below.
(1) To counteract the paucity and spatial bias of landslide inventories, we devise and validate the Slope-IOE-O hybrid sampling protocol, integrating low-slope thresholds with IOE-delineated very-low and low-susceptibility zones to maximise the purity and representativeness of non-landslide samples. The strategy elevates the RF AUC from 0.907 to 0.965, markedly surpassing both traditional buffer techniques and single-criterion alternatives.
(2) Among SVM, LDA, RF and ET, RF delivers the best overall performance, attaining the highest AUC (0.965) and precision, while ET is marginally better in accuracy, F1 and recall; this attests to the reliability and stability of RF for landslide prediction in Guiyang County.
(3) SHAP interpretability identifies slope, monthly maximum rainfall, surface roughness and elevation as the four dominant drivers of regional landslide susceptibility. Each exhibits pronounced non-linear effects and interactions: slope contributions peak at 20–30°, monthly rainfall displays a threshold at 225 mm, and the synergistic effect of high roughness and road density markedly amplifies risk.
(4) Landslide susceptibility exhibits a north–south-high, central-low pattern; high-susceptibility areas are mainly distributed in the northern and southern mountains, the southern urban core and their surrounding areas, regions closely linked to topography, hydrology and anthropogenic forcing. Very-low susceptibility zones occupy the central hills and plains, characterised by gentle topography and high geotechnical stability.
(5) The proposed sample-optimization framework and modelling pipeline provide a transferable protocol for landslide assessment in geologically complex, data-scarce regions, especially in the heavy-rainfall zones of southern China.
Future research should integrate multi-source data, dynamic-threshold modelling and automated sample-ratio optimisation to enhance predictive performance under extreme climatic conditions. Real-time monitoring fused with deep-learning architectures will further advance intelligent, fine-resolution landslide early-warning systems.
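The hybrid negative-sampling idea summarised in conclusion (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_negatives`, the dictionary keys, the toy grid values and the thresholds (`slope_max_deg=5.0`, `ioe_max=0.2`) are all hypothetical, and the sketch assumes the two criteria are combined as an intersection, so that a candidate cell must be both low-slope and in the lowest IOE class.

```python
import random

def select_negatives(cells, n_samples, slope_max_deg, ioe_max, seed=0):
    """Pick non-landslide samples from cells that pass both screens.

    cells: list of dicts with hypothetical keys
           'id', 'slope' (degrees), 'ioe' (susceptibility index), 'is_landslide'.
    """
    candidates = [
        c for c in cells
        if not c["is_landslide"]
        and c["slope"] <= slope_max_deg   # low-slope screen
        and c["ioe"] <= ioe_max           # very-low IOE susceptibility screen
    ]
    if len(candidates) < n_samples:
        raise ValueError("not enough candidate cells; relax the thresholds")
    # Seeded sampling without replacement keeps the draw reproducible.
    return random.Random(seed).sample(candidates, n_samples)

# Toy grid: all values are made up for illustration only.
cells = [
    {"id": 0, "slope": 2.0, "ioe": 0.05, "is_landslide": False},
    {"id": 1, "slope": 25.0, "ioe": 0.80, "is_landslide": True},
    {"id": 2, "slope": 3.5, "ioe": 0.10, "is_landslide": False},
    {"id": 3, "slope": 4.0, "ioe": 0.60, "is_landslide": False},   # fails IOE screen
    {"id": 4, "slope": 18.0, "ioe": 0.15, "is_landslide": False},  # fails slope screen
    {"id": 5, "slope": 1.0, "ioe": 0.02, "is_landslide": False},
]
negatives = select_negatives(cells, n_samples=2, slope_max_deg=5.0, ioe_max=0.2)
print(sorted(c["id"] for c in negatives))
```

In practice the number of negatives would typically be matched to the number of landslide samples, and the two thresholds would come from the slope statistics and IOE classification of the study area.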

Author Contributions

Conceptualization, Y.K.; Methodology, Y.K.; Software, Y.K., K.Z., Z.M., X.C., L.C. and T.X.; Validation, Y.K.; Formal analysis, Y.K.; Investigation, Y.K.; Resources, Y.K.; Data curation, Y.K., C.X., H.K., W.T. and X.K.; Writing—original draft, Y.K. and Z.M.; Writing—review & editing, Y.K.; Visualization, Y.K.; Supervision, Y.K.; Project administration, Y.K.; Funding acquisition, Y.K. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (Grant No. U23A2047 awarded to H.W.), the Xizang Autonomous Region Science and Technology Department (Grant Nos. XZ202401YD0028, XZ202402ZD0001, and XZ202401ZY0057 awarded to H.W.), and Xizang University (Grant No. 2025-GSP-S011 awarded to Y.K.).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Mendeley Data at DOI: 10.17632/jprswjdvz4.2.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Seçkin, F.; Hakan, T.; Abdullah, A.; Luigi, L.; Petley, D.N.; Tolga, G. Understanding Fatal Landslides at Global Scales: A Summary of Topographic, Climatic, and Anthropogenic Perspectives. Nat. Hazards 2024, 120, 6437–6455. [Google Scholar] [CrossRef]
  2. Wen, H.; Li, W.; Xu, C.; Daimaru, H. Landslides in Forests around the World: Causes and Mitigation. Forests 2023, 14, 629. [Google Scholar] [CrossRef]
  3. Capobianco, V.; Choi, C.E.; Crosta, G.; Hutchinson, D.J.; Jaboyedoff, M.; Lacasse, S.; Nadim, F.; Reeves, H. Effective Landslide Risk Management in Era of Climate Change, Demographic Change, and Evolving Societal Priorities. Landslides 2025, 22, 2915–2933. [Google Scholar] [CrossRef]
  4. Mateja, J.A.; Nejc, B.; Ela, Š.; Peter, F.; Luigi, G.S.; Anže, M.; Tina, P. Climate Change Increases the Number of Landslides at the Juncture of the Alpine, Pannonian and Mediterranean Regions. Sci. Rep. 2023, 13, 23085. [Google Scholar] [CrossRef] [PubMed]
  5. Ministry of Natural Resources. Natural Resources Bulletin of China, 2024; China Natural Resources News; Ministry of Natural Resources: Beijing, China, 2025. [Google Scholar]
  6. Tao, Z.; Luo, S.; Zhu, C.; He, M. Dynamie Mechanical Monitoring of Landslide and Case Analysis of Failure Process. J. Eng. Geol. 2022, 30, 177–186. [Google Scholar] [CrossRef]
  7. Zhang, Z.; Huang, X.; Cai, Y.; Fu, J.; Yue, Z.; Yang, R.; Han, C. The Evolution Pattern and Influence of Human Activities of Landslide Driving Factors in Wulong Section of the Three Gorges Reservoir Area. Chin. J. Geol. Hazard Control 2022, 33, 39–50. [Google Scholar] [CrossRef]
  8. Somogyvári, M.; Chicas, S.D.; Li, H.; Mizoue, N.; Ota, T.; Du, Y. Landslide Susceptibility Mapping Core-Base Factors and Models’ Performance Variability: A Systematic Review. Nat. Hazards 2024, 120, 1–21. [Google Scholar] [CrossRef]
  9. Jia, Z.; Cheng, Z.; Chang, Z.; Li, Q.; Peng, Y.; Jiang, B.; Huang, F. Modeling and Uncertainty in Landslide Susceptibility Prediction Considering the Coupling Mode of Landslide Types. Earth Sci. 2025, 50, 2311–2329. [Google Scholar] [CrossRef]
  10. Zhang, L.; Jiang, S. Data Driven Weight Model for Reqional Landslide Susceptibility Assessment and Its Application. Hydrogeol. Eng. Geol. 2004, 6, 33–36. Available online: https://www.zhangqiaokeyan.com/academic-journal-cn_hydrogeology-engineering-geology_thesis/0201254217118 (accessed on 3 October 2025).
  11. Al-Najjar, H.A.H.; Pradhan, B.; He, X.; Sheng, D.; Alamri, A.; Gite, S.; Park, H.-J. Integrating Physical and Machine Learning Models for Enhanced Landslide Prediction in Data-Scarce Environments. Earth Syst. Environ. 2024. [Google Scholar] [CrossRef]
  12. Wang, J.; Wang, Y.; Li, Y.; Wei, S.; Li, C.; Wang, Y.; Qi, H. Landslide Susceptibility Assessment Based on Weighted Information Value Model: A Case Study of Chongqing City. Sci. Soil Water Conserv. 2023, 21, 53–62. [Google Scholar] [CrossRef]
  13. Lu, Y.; Xu, H.; Wang, C.; Yan, G.; Huo, Z.; Peng, Z.; Liu, B.; Xu, C. A Novel Strategy Coupling Optimised Sampling with Heterogeneous Ensemble Machine-Learning to Predict Landslide Susceptibility. Remote Sens. 2024, 16, 3663. [Google Scholar] [CrossRef]
  14. Marzini, L.; D’Addario, E.; Papasidero, M.P.; Chianucci, F.; Disperati, L. Influence of Root Reinforcement on Shallow Landslide Distribution: A Case Study in Garfagnana (Northern Tuscany, Italy). Geosciences 2023, 13, 326. [Google Scholar] [CrossRef]
  15. Vanani, A.A.G.; Shoaei, G.; Zare, M. Landslide Susceptibility Mapping in North Tehran, Iran: Linear Regression, Neural Networks, and Fuzzy Logic Approaches. Geotech. Geol. Eng. 2024, 42, 7159–7186. [Google Scholar] [CrossRef]
  16. Xu, C.; Xu, X. Logistic Regression Model and Its Validation for Hazardmapping of Landslides Triggered by Yushu Earthquake. J. Eng. Geol. 2012, 20, 326–333. Available online: http://www.gcdz.org/en/article/id/11136 (accessed on 3 October 2025).
  17. Xu, C.; Dai, F.; Xu, S.; Xu, X.; He, H.; Wu, X.; Shi, F. Application of Logistic Regression Model on the Wenchuan Earthquaketriggered Landslide Hazard Mapping and Its Validation. Hydrogeol. Eng. Geol. 2013, 40, 98–104. [Google Scholar] [CrossRef]
  18. Kavzoglu, T.; Sahin, E.K.; Colkesen, I. Landslide Susceptibility Mapping Using GIS-Based Multi-Criteria Decision Analysis, Support Vector Machines, and Logistic Regression. Landslides 2014, 11, 425–439. [Google Scholar] [CrossRef]
  19. Li, M.X.; Wang, H.Y.; Chen, J.L.; Zheng, K. Assessing Landslide Susceptibility Based on the Random Forest Model and Multi-Source Heterogeneous Data. Ecol. Indic. 2024, 158, 111600. [Google Scholar] [CrossRef]
  20. Yang, K.; Niu, R.; Song, Y.; Dong, J.; Zhang, H.; Chen, J. Dynamic Hazard Assessment of Rainfall Induced Landslides Using Gradient Boosting Decision Tree with Google Earth Engine in Three Gorges Reservoir Area, China. Water 2024, 16, 1638. [Google Scholar] [CrossRef]
  21. Wen, H.; Liu, B.; Di, M.; Li, J.; Zhou, X. A SHAP-Enhanced XGBoost Model for Interpretable Prediction of Coseismic Landslides. Adv. Space Res. 2024, 74, 3826–3854. [Google Scholar] [CrossRef]
  22. Sun, D.; Wu, X.; Wen, H.; Gu, Q. A LightGBM-Based Landslide Susceptibility Model Considering the Uncertainty of Non-Landslide Samples. Geomat. Nat. Hazards Risk 2023, 14, 2213807. [Google Scholar] [CrossRef]
  23. Song, Y.; Song, Y.; Wang, C.; Wu, L.; Wu, W.; Li, Y.; Li, S.; Chen, A. Landslide Susceptibility Assessment through Multi-Model Stacking and Meta-Learning in Poyang County, China. Geomat. Nat. Hazards Risk 2024, 15, 2354499. [Google Scholar] [CrossRef]
  24. Shruti, S.; Tarunpreet, B.; Verma, A.K. A Novel Voting Ensemble Model for Spatial Prediction of Landslides Using GIS. Int. J. Remote Sens. 2020, 41, 929–952. [Google Scholar] [CrossRef]
  25. Zhang, R.; Guan, Y. Application of CNN-LSTM Hybrid Model in Predicting Surface Displacement of Accumula Ted Landslide Sites. North China Farthquake Sci. 2025, 43, 1–8. [Google Scholar] [CrossRef]
  26. Wang, Y.; Wu, X.; Zhou, K.; Lin, G.; Peng, B.; Zhice, F. Integrating a Multi-Dimensional Deep Convolutional Neural Network with Optimized Sample Selection for Landslide Susceptibility Assessment. Geo-Spat. Inf. Sci. 2025, 15, 1–21. [Google Scholar] [CrossRef]
  27. Huang, F.; Xiong, H.; Jiang, S.H.; Yao, C.; Fan, X.; Catani, F.; Chang, Z.; Zhou, X.; Huang, J.; Liu, K. Modelling Landslide Susceptibility Prediction: A Review and Construction of Semi-Supervised Imbalanced Theory. Earth-Sci. Rev. 2024, 250, 104700. [Google Scholar] [CrossRef]
  28. Wu, H.Y.; Zhou, C.; Liang, X.; Wang, Y.; Yuan, P.C.; Wu, L.X. Evaluation of landslide susceptilbility based on sample optimization strategy research. Geomat. Inf. Sci. Wuhan Univ. 2023, 49, 1–15. [Google Scholar] [CrossRef]
  29. Liu, Y.; Chen, C.; He, Q.; Li, K. Landslide Susceptibility Evaluation Considering Positive and Negative Sample Optimization. Acta Geod. Cartogr. Sin. 2025, 54, 308–320. [Google Scholar] [CrossRef]
  30. Ge, Q.; Li, J.; Lacasse, S.; Sun, H.; Liu, Z. Data-Augmented Landslide Displacement Prediction Using Generative Adversarial Network. J. Rock Mech. Geotech. Eng. 2024, 16, 4017–4033. [Google Scholar] [CrossRef]
  31. Liu, M.M. Landslide Susceptibility Analysis Method Considering Sample Optimization and Spatial Characteristics. Ph.D. Thesis, Liaoning Technical University, Fuxin, China, 2024. [Google Scholar]
  32. Miao, Y.; Zhu, A.; Yang, L.; Bai, S.; Liu, J.; Deng, Y. Sensitivity of BCS for Sampling Landslide Absence Datain Andslide Susceptibility Assessment. Mt. Res. 2016, 34, 432–441. [Google Scholar] [CrossRef]
  33. Yao, X.; Tham, L.G.; Dai, F.C. Landslide Susceptibility Mapping Based on Support Vector Machine: A Case Study on Natural Slopes of Hong Kong, China. Geomorphology 2008, 101, 572–582. [Google Scholar] [CrossRef]
  34. Miao, Y.; Zhu, A.; Yang, L.; Bai, S.; Zeng, C. A New Method of Pseudo Absence Data Generation in Landslide Susceptibility Mapping. Geogr. Geo-Inf. Sci. 2016, 32, 61–67+127. [Google Scholar] [CrossRef]
  35. Cui, Y.; Zhu, L.; Xu, M.; Miao, H. Optimizing TSES Method Based on the Environmental Factors to Select Negative Samples and Its Application in Landslide Susceptibility Evaluation. Bull. Geol. Sci. Technol. 2024, 43, 192–199. [Google Scholar] [CrossRef]
  36. Guo, Y.; Dou, J.; Xiang, Z.; Ma, H.; Dong, A.; Luo, W. Susceptibility Evaluation of Wenchuan Coseismic Landslides by Gradientboosting Decision Tree and Random Forest Based on Optimal Negative Samplesampling Strategies. Bull. Geol. Sci. Technol. 2024, 43, 251–265. [Google Scholar] [CrossRef]
  37. Zhou, X.; Huang, F.; Wu, W.; Zhou, C.; Zeng, S.; Pan, L. Regional Landslide Susceptibility Prediction Based on Negative Sample Selected by Coupling Information Value Method. Adv. Eng. Sci. 2022, 54, 25–35. [Google Scholar] [CrossRef]
  38. Fu, Y.; Fan, Z.; Li, X.; Wang, P.; Sun, X.; Ren, Y.; Cao, W. The Influence of Non-Landslide Sample Selection Methods on Landslide Susceptibility Prediction. Land 2025, 14, 722. [Google Scholar] [CrossRef]
  39. Gu, T.; Duan, P.; Wang, M.; Li, J.; Zhang, Y. Effects of Non-Landslide Sampling Strategies on Machine Learning Models in Landslide Susceptibility Mapping. Sci. Rep. 2024, 14, 7201. [Google Scholar] [CrossRef] [PubMed]
  40. Kong, Y.; Wu, H.; Xu, C.; Sun, J.; Zhu, K.; Zhang, C.; Zhou, J.; Xu, T.; Su, T.; Zhang, Z.; et al. Landslide Susceptibility Mapping Using an Entropy Index-Based Negative Sample Selection Strategy: A Case Study of Luolong County. PLoS ONE 2025, 20, e0322566. [Google Scholar] [CrossRef]
  41. Zhang, L. Research on the Mode of Mineland Reclamation in the County of Guiyang. Master’s Thesis, Hunan Normal University, Changsha, China, 2013. [Google Scholar]
  42. Wang, P. Research on Guiyang County HV Distribution Network Planning. Master’s Thesis, Hunan University, Changsha, China, 2016. [Google Scholar]
  43. Zhang, Y.; Ming, D.; Zhao, W.; Xu, L.; Zhao, Z.; Liu, R. The Extraction and Analysis of Luding Earthquake—Induced Landslide Based on High- Resolution Optical Satellite Images. Remote Sens. Nat. Resour. 2023, 35, 161–170. [Google Scholar] [CrossRef]
  44. Liu, P.; Wei, Y.; Wang, Q.; Chen, Y.; Xie, J. Research on Post-Earthquake Landslide Extraction Algorithm Based on Improved U-Net Model. Remote Sens. 2020, 12, 894. [Google Scholar] [CrossRef]
  45. Zhu, Y.; Sun, D.; Wen, H.; Zhang, Q.; Ji, Q.; Li, C.; Zhou, P.; Zhao, J. Considering the Effect of Non-Landslide Sample Selection on Landslide Susceptibility Assessment. Geomat. Nat. Hazards Risk 2024, 15, 2392778. [Google Scholar] [CrossRef]
  46. Cui, Y.L.; Yang, W.H.; Xu, C.; Wu, S. Distribution of Ancient Landslides and Landslide Hazard Assessment in the Western Himalayan Syntaxis Area. Front. Earth Sci. 2023, 11, 1135018. [Google Scholar] [CrossRef]
  47. Lu, C.; Bo, Z. A New Slope Unit Extraction Method Based on Improved Marked Watershed. Matec Web Conf. 2018, 232, 04070. [Google Scholar] [CrossRef]
  48. Yu, C.; Chen, J. Application of a GIS-Based Slope Unit Method for Landslide Susceptibility Mapping in Helong City: Comparative Assessment of ICM, AHP, and RF Model. Symmetry 2020, 12, 1848. [Google Scholar] [CrossRef]
  49. Liu, S.; Zhu, J.; Yang, D.; Ma, B. Comparative Study of Geological Hazard Evaluation Systems Using Grid Units and Slope Units under Different Rainfall Conditions. Sustainability 2022, 14, 16153. [Google Scholar] [CrossRef]
  50. GB/T 50218-2014; Standard for Engineering Classification of Rock Mass. China Planning Press: Beijing, China, 2014.
  51. Riaz, M.T.; Basharat, M.; Ahmed, K.S.; Sirfraz, Y.; Shahzad, A.; Shah, N.A. Failure Mechanism of a Massive Fault–Controlled Rainfall–Triggered Landslide in Northern Pakistan. Landslides 2024, 21, 2741–2767. [Google Scholar] [CrossRef]
  52. Yu, L.B.; Wang, Y.; Pradhan, B. Enhancing Landslide Susceptibility Mapping Incorporating Landslide Typology via Stacking Ensemble Machine Learning in Three Gorges Reservoir, China. Geosci. Front. 2024, 15, 101802. [Google Scholar] [CrossRef]
  53. Zhao, P.; Wen, G.; He, Z.; Wang, G.; Chen, L.; Shen, X.; Wang, K.; Tang, H. Shallow Landslide Susceptibility Assessment in Jinsha River Basin Based on Machine Learning Models. Water Resour. Hydropower Eng. 2024, 55, 53–70. [Google Scholar] [CrossRef]
  54. Kumar, D.; Thakur, M.; Dubey, C.S.; Shukla, D.P. Landslide Susceptibility Mapping & Prediction Using Support Vector Machine for Mandakini River Basin, Garhwal Himalaya, India. Geomorphology 2017, 295, 115–125. [Google Scholar] [CrossRef]
  55. Alfonso, M.R.-C.; Luis, F.P.-S.; Mario, G.T.-V.; Juan, P.M.; Ana, C.S.-R. Linear Discriminant Analysis to Describe the Relationship between Rainfall and Landslides in Bogota, Colombia. Landslides 2016, 13, 671–681. [Google Scholar] [CrossRef]
  56. Du, P.; Chen, N.S.; Wu, K.N.; Li, Z.; Zhang, Y.Y.L. Evaluation of landslide susceptibility in southeast tibet based on a random forest model. J. Chengdu Univ. Technol. (Sci. Technol. Ed.) 2024, 51, 328–344. [Google Scholar] [CrossRef]
  57. Halder, K.; Srivastava, A.K.; Ghosh, A.; Das, S.; Banerjee, S.; Pal, S.C.; Chatterjee, U.; Bisai, D.; Ewert, F.; Gaiser, T. Improving Landslide Susceptibility Prediction through Ensemble Recursive Feature Elimination and Meta-Learning Framework. Sci. Rep. 2025, 15, 5170. [Google Scholar] [CrossRef]
  58. Shapley, L.S. A Value for N-Person Games; RAND Corporation: Santa Monica, CA, USA, 1952. [Google Scholar]
  59. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Auckland, New Zealand, 2–6 December 2024; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777. [Google Scholar]
  60. Kanwar, M.; Pokharel, B.; Lim, S. A New Random Forest Method for Landslide Susceptibility Mapping Using Hyperparameter Optimization and Grid Search Techniques. Int. J. Environ. Sci. Technol. 2025, 22, 10635–10650. [Google Scholar] [CrossRef]
  61. Xu, C.; Dai, F.C.; Yao, X.; Chen, J.; Tu, X.B.; Sun, Y.; Wang, Z.Y. Gis-based landslide susceptibility assessment using analytical hierarchy process in wenchuan earthquake region. Chin. Joural Rock Mech. Eng. 2009, 28, 3978–3985. [Google Scholar] [CrossRef]
  62. Rahman, A.S.A.; A’kif, A.F.; Mohamed, K.K.; Nouh, M.A.; Rida, A.A. Spatial Mapping of Landslide Susceptibility in Jerash Governorate of Jordan Using Genetic Algorithm-Based Wrapper Feature Selection and Bagging-Based Ensemble Model. Geomat. Nat. Hazards Risk 2022, 13, 2252–2282. [Google Scholar] [CrossRef]
  63. Zheng, D.; Li, Y.; Yan, C.; Wu, H.; Yamashiki, Y.A.; Gao, B.; Nian, T. Landslide Susceptibility Assessment Using AutoML-SHAP Method in the Southern Foothills of Changbai Mountain, China. Landslides 2025, 22, 1855–1875. [Google Scholar] [CrossRef]
  64. Zhang, T.; Li, L.; Liu, F.; Hong, Z.; Qian, F.; Hu, B.; Zhang, M. Evaluation of Loess Landslide Susceptibility Based on Optimised Max Ent Model: A Case Study of Wuqi County in Shaanxi Province. Northwestern Geol. 2025, 58, 172–185. [Google Scholar] [CrossRef]
  65. Cheng, J.; Xu, C.; Xu, X.; Zhang, S.; Zhu, P. Modeling Seismic Hazard and Landslide Occurrence Probabilities in Northwestern Yunnan, China: Exploring Complex Fault Systems with Multi-Segment Rupturing in a Block Rotational Tectonic Zone. Nat. Hazards Earth Syst. Sci. 2025, 25, 857–877. [Google Scholar] [CrossRef]
  66. Liu, Y.; Chen, C. Landslide Susceptibility Evaluation Method Considering Spatial Heterogeneity and Feature Selection. Acta Geod. Artographica Sin. 2024, 53, 1417–1428. [Google Scholar] [CrossRef]
  67. Wang, S.; Zhuang, J.; Fan, H.; Niu, P.; Jia, K.; Wang, J. Evaluation of Landslide Suseeplibilitly Based on Frequeney Raio and Ensemble Leaming Taking theBalang-Dege Seion in the Upstream of Jinsha River as an Example. J. Eng. Geol. 2022, 30, 817–828. [Google Scholar] [CrossRef]
  68. Liu, L.; Xiao, H.; Wang, C.; Yao, T. Landslide Analysis Based on Susceptibility to Factors Causing Geological Disasters in Red Beds Area of Hunan Province. Min. Metall. Eng. 2024, 44, 169–174. [Google Scholar] [CrossRef]
  69. Liu, R.; Xu, Q.; Pu, C.; Xu, F.; Wang, X.; Zhao, H.; Zhu, X.; He, N. Characteristics of Landslides Induced by Typhoon “GaeMi” in Zixing, Hunan, July 2024, and Their Geological Control Factors. Geomat. Inf. Sci. Wuhan Univ. 2025, 1–22. [Google Scholar] [CrossRef]
  70. Yang, J. Uncertainty Analysis of Rainfall-Induced Landslide Susceptibility Prediction and Risk Assessment Modeling. Master’s Thesis, Nanchang University, Nanchang, China, 2022. [Google Scholar]
  71. Xu, Z.; Xiao, N.; Liu, Z.; Li, Y. Research on Geological Disaster-Prone Area Basedon Susceptibility Index Method in Guiyang County. J. Chang. Inst. Technol. (Nat. Sci. Ed.) 2012, 13, 54–59. [Google Scholar] [CrossRef]
  72. Hong, H.; Miao, Y.; Liu, J.; Zhu, A.-X. Exploring the Effects of the Design and Quantity of Absence Data on the Performance of Random Forest-Based Landslide Susceptibility Mapping. Catena 2019, 176, 45–64. [Google Scholar] [CrossRef]
  73. Reza, P.H.; Aiding, K.; Norman, K.; Farzin, S. Investigating the Effects of Different Landslide Positioning Techniques, Landslide Partitioning Approaches, and Presence-Absence Balances on Landslide Susceptibility Mapping. Catena 2020, 187, 104364. [Google Scholar] [CrossRef]
  74. Sun, D.; Wen, H.; Wang, D.; Xu, J. A Random Forest Model of Landslide Susceptibility Mapping Based on Hyperparameter Optimization Using Bayes Algorithm. Geomorphology 2020, 362, 107201. [Google Scholar] [CrossRef]
  75. Li, Y. Method for the Warning of Precipitation Induced Landslides. Ph.D. Thesis, China University of Geosciences (Beijing), Beijing, China, 2005. [Google Scholar]
Figure 1. Location of the study area.
Figure 2. Landslide relic dataset.
Figure 3. Grading diagram of conditioning factors.
Figure 4. The datasets of different sample optimization methods.
Figure 5. Flow chart of technical research.
Figure 6. Correlation coefficient matrix.
Figure 7. AUC for different non-landslide sample optimization methods.
Figure 8. Landslide susceptibility mapping premised on different coupling models.
Figure 9. SHAP value summary.
Figure 10. SHAP single-factor and two-factor interaction dependence plots.
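Table 1. Landslide data table for Guiyang County (Date given as year/month; V. = volume).
| No. | Date | Type | V. (m³) | No. | Date | Type | V. (m³) | No. | Date | Type | V. (m³) | No. | Date | Type | V. (m³) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2001/5 | Soil | 4500 | 38 | 2006/7 | Soil | 24,000 | 75 | 2014/6 | Soil | 1200 | 112 | 2005/5 | Soil | 14,000 |
| 2 | 2004/5 | Soil | 2250 | 39 | 2014/3 | Rock | 5200 | 76 | 2001/5 | Soil | 28,800 | 113 | 2013/5 | Soil | 1500 |
| 3 | 2013/5 | Soil | 4500 | 40 | 2002/8 | Soil | 24,000 | 77 | 2008/6 | Soil | 8000 | 114 | 2014/5 | Soil | 100 |
| 4 | 2012/7 | Soil | 6000 | 41 | 2002/8 | Soil | 12,000 | 78 | 2004/6 | Soil | 12,800 | 115 | 2013/6 | Soil | 200 |
| 5 | 2008/6 | Soil | 8000 | 42 | 2014/6 | Rock | 1440 | 79 | 2014/5 | Soil | 120 | 116 | 2012/6 | Soil | 50 |
| 6 | 2013/6 | Soil | 108 | 43 | 2014/3 | Soil | 3000 | 80 | 1996/5 | Soil | 4440 | 117 | 2013/5 | Soil | 320 |
| 7 | 2014/4 | Soil | 720 | 44 | 2014/6 | Soil | 1950 | 81 | 2010/4 | Soil | 1200 | 118 | 2013/5 | Soil | 150 |
| 8 | 2006/7 | Soil | 12,000 | 45 | 2014/5 | Soil | 2550 | 82 | 2014/3 | Soil | 150 | 119 | 2013/5 | Soil | 8000 |
| 9 | 2007/7 | Soil | 720 | 46 | 2002/5 | Soil | 5400 | 83 | 2013/3 | Soil | 300 | 120 | 2007/3 | Soil | 300 |
| 10 | 2004/5 | Soil | 3600 | 47 | 2001/6 | Soil | 20,000 | 84 | 2002/6 | Soil | 42,970 | 121 | 2013/6 | Soil | 9600 |
| 11 | 2006/7 | Soil | 3000 | 48 | 2002/6 | Soil | 24,000 | 85 | 2003/6 | Soil | 1200 | 122 | 2013/6 | Soil | 2400 |
| 12 | 2006/7 | Soil | 5000 | 49 | 2002/4 | Soil | 28,800 | 86 | 2003/6 | Soil | 18,000 | 123 | 2010/6 | Soil | 19,200 |
| 13 | 2000/6 | Soil | 1000 | 50 | 2006/7 | Soil | 5400 | 87 | 2010/6 | Soil | 9600 | 124 | 2002/4 | Soil | 8000 |
| 14 | 2008/2 | Soil | 1360 | 51 | 2002/6 | Soil | 22,000 | 88 | 2005/5 | Soil | 4935 | 125 | 2001/3 | Soil | 21,600 |
| 15 | 1998/5 | Soil | 225 | 52 | 2012/3 | Soil | 2400 | 89 | 2014/4 | Soil | 8000 | 126 | 2003/5 | Soil | 28,000 |
| 16 | 2005/5 | Soil | 400 | 53 | 2010/6 | Soil | 4800 | 90 | 2010/6 | Soil | 4500 | 127 | 2013/4 | Rock | 2100 |
| 17 | 1997/5 | Soil | 1800 | 54 | 2002/6 | Soil | 40,000 | 91 | 2012/6 | Soil | 16,000 | 128 | 2012/6 | Soil | 16,800 |
| 18 | 2005/5 | Soil | 5250 | 55 | 2013/6 | Soil | 3600 | 92 | 2006/7 | Soil | 147,450 | 129 | 2001/6 | Soil | 12,800 |
| 19 | 2003/7 | Rock | 1000 | 56 | 2002/8 | Soil | 30,600 | 93 | 2005/6 | Soil | 9600 | 130 | 2006/7 | Soil | 3400 |
| 20 | 2001/5 | Soil | 3600 | 57 | 1992/4 | Soil | 3200 | 94 | 1996/8 | Soil | 15,000 | 131 | 1998/6 | Soil | 12,000 |
| 21 | 2004/5 | Soil | 6240 | 58 | 2013/6 | Soil | 3120 | 95 | 1998/5 | Soil | 1800 | 132 | 2002/6 | Soil | 8820 |
| 22 | 2012/4 | Soil | 18,800 | 59 | 2005/5 | Soil | 7500 | 96 | 2006/7 | Soil | 3600 | 133 | 2002/6 | Soil | 14,640 |
| 23 | 2014/3 | Soil | 400 | 60 | 2002/6 | Soil | 19,200 | 97 | 1997/8 | Soil | 3000 | 134 | 2002/6 | Soil | 3200 |
| 24 | 2014/4 | Soil | 28,710 | 61 | 2002/6 | Soil | 460 | 98 | 2003/5 | Soil | 16,000 | 135 | 2002/6 | Soil | 14,000 |
| 25 | 2014/3 | Soil | 140 | 62 | 2011/4 | Soil | 720 | 99 | 1994/6 | Soil | 10,000 | 136 | 1998/6 | Soil | 1720 |
| 26 | 2014/4 | Soil | 24,000 | 63 | 1999/7 | Soil | 3080 | 100 | 2007/6 | Soil | 12,800 | 137 | 2002/6 | Soil | 28,000 |
| 27 | 2014/3 | Rock | 1050 | 64 | 1992/5 | Soil | 11,200 | 101 | 2007/5 | Soil | 400 | 138 | 2002/6 | Soil | 11,200 |
| 28 | 2014/4 | Soil | 7050 | 65 | 2002/5 | Soil | 600 | 102 | 1992/7 | Soil | 1360 | 139 | 1998/5 | Soil | 14,400 |
| 29 | 2014/5 | Soil | 3000 | 66 | 1987/5 | Soil | 1200 | 103 | 2004/6 | Soil | 3360 | 140 | 1998/6 | Soil | 7200 |
| 30 | 2014/5 | Soil | 2100 | 67 | 2004/6 | Soil | 1600 | 104 | 2006/6 | Soil | 13,200 | 141 | 2002/6 | Soil | 8000 |
| 31 | 2014/6 | Soil | 1950 | 68 | 1979/6 | Soil | 2400 | 105 | 1998/5 | Soil | 1400 | 142 | 2002/6 | Soil | 12,000 |
| 32 | 2003/12 | Soil | 8160 | 69 | 1980/7 | Soil | 1050 | 106 | 1992/4 | Soil | 4560 | 143 | 2002/6 | Soil | 4000 |
| 33 | 2006/7 | Soil | 10,800 | 70 | 2012/6 | Soil | 1080 | 107 | 1996/6 | Soil | 2500 | 144 | 1998/6 | Soil | 9600 |
| 34 | 2006/7 | Soil | 8800 | 71 | 1997/6 | Soil | 1500 | 108 | 2014/5 | Soil | 3200 | 145 | 2014/5 | Soil | 338 |
| 35 | 2006/7 | Soil | 19,200 | 72 | 2002/7 | Soil | 10,000 | 109 | 1998/3 | Soil | 1,920,000 | 146 | 2013/9 | Soil | 2280 |
| 36 | 1996/8 | Soil | 4000 | 73 | 1972/5 | Soil | 60,000 | 110 | 1991/5 | Soil | 9600 | | | | |
| 37 | 2006/7 | Soil | 8000 | 74 | 2004/6 | Soil | 16,000 | 111 | 2012/4 | Soil | 2000 | | | | |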
Table 2. Data sources of landslide impact factors.
| Conditioning Factor | Name of the Data | Resolution/Scale | Data Type | Data Source |
|---|---|---|---|---|
| Density of fault (DOF), Lithology Type | Geological map of China | 1:50,000 | Vector | https://www.ngac.cn/ |
| Elevation, Slope, Aspect, Profile Curvature, Plane Curvature, Terrain Wetness Index (TWI), Stream Power Index (SPI), Roughness, Cutting-Depth, Relief, Elevation Coefficient of Variation (CV) | Spatial resolution DEM data for China | 30 m | Raster | http://www.gscloud.cn/ |
| Density of road (DOR), Density of stream (DOS) | Basic geographic data of river systems, roads, and administrative boundaries | 1:1,000,000 | Vector | http://www.gscloud.cn/ |
| Rainfall | The dataset of precipitation in China from 1991 to 2020 | 30 m | Raster | http://www.gisrs.cn/ |
| Normalized Difference Vegetation Index (NDVI) | NDVI data for China in 2020 | 30 m | Raster | http://www.nesdc.org.cn/ |
| Land use | Land cover data for China in 2020 | 30 m | Raster | https://www.ncdc.ac.cn/ |
Table 3. Multi-collinearity test by tolerance and VIF.
| Factors | Tolerances | VIF | Factors | Tolerances | VIF | Factors | Tolerances | VIF |
|---|---|---|---|---|---|---|---|---|
| Aspect | 0.940 | 1.063 | DOS | 0.968 | 1.033 | Lithology type | 0.580 | 1.723 |
| CV | 0.583 | 1.715 | DOR | 0.570 | 1.755 | Relief | 0.250 | 3.993 |
| DOF | 0.968 | 1.033 | Roughness | 0.602 | 1.660 | Cutting depth | 0.236 | 4.244 |
| DEM | 0.611 | 1.636 | Slope | 0.246 | 4.062 | Plane curvature | 0.812 | 1.231 |
| Landuse | 0.608 | 1.644 | SPI | 0.709 | 1.411 | Profile curvature | 0.928 | 1.077 |
| NDVI | 0.749 | 1.335 | TWI | 0.708 | 1.412 | Rainfall | 0.590 | 1.695 |
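The two columns of Table 3 are linked by the standard definition VIF = 1/tolerance, which the short Python check below recomputes from the tabulated tolerances. It is only a sketch: because the tolerances are rounded to three decimals, the recomputed VIFs can differ from the printed ones in the last digits, and the screening limit `VIF_LIMIT = 5.0` is an assumed, commonly used threshold rather than a value taken from this section.

```python
# Tolerance values copied from Table 3; VIF is defined as 1 / tolerance.
tolerance = {
    "Aspect": 0.940, "CV": 0.583, "DOF": 0.968, "DEM": 0.611,
    "Landuse": 0.608, "NDVI": 0.749, "DOS": 0.968, "DOR": 0.570,
    "Roughness": 0.602, "Slope": 0.246, "SPI": 0.709, "TWI": 0.708,
    "Lithology type": 0.580, "Relief": 0.250, "Cutting depth": 0.236,
    "Plane curvature": 0.812, "Profile curvature": 0.928, "Rainfall": 0.590,
}
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

VIF_LIMIT = 5.0  # assumed screening threshold, not taken from the paper
flagged = [name for name, value in vif.items() if value >= VIF_LIMIT]

# Show the three factors with the strongest collinearity.
for name, value in sorted(vif.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{name}: VIF ~ {value:.3f}")
print("flagged:", flagged)
```

Under that assumed limit no factor is flagged; cutting depth, slope and relief are the three closest to it.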
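Table 4. Collation of IOE model parameters.
| Factors | Classes | No. of Landslides | No. of Raster | a | b | FRij* | Pij | Hi | Hi,max | Ii | Pi | Wi | Wi* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Slope | 0–6.86 | 23 | 1,147,291 | 0.1575 | 0.3487 | 0.4518 | 0.0716 | 0.2723 | 2.3219 | 0.0543 | 1.2624 | 0.0686 | 0.0277 |
| | 6.86–13.73 | 46 | 979,935 | 0.3151 | 0.2979 | 1.0579 | 0.1676 | 0.4319 | | | | | |
| | 13.73–21.79 | 42 | 647,322 | 0.2877 | 0.1968 | 1.4621 | 0.2316 | 0.4888 | | | | | |
| | 21.79–32.53 | 23 | 377,973 | 0.1575 | 0.1149 | 1.3713 | 0.2173 | 0.4785 | | | | | |
| | 32.53–76.10 | 12 | 137,353 | 0.0822 | 0.0418 | 1.9688 | 0.3119 | 0.5243 | | | | | |
| Plane Curvature | −25.78–−2.96 | 3 | 56,199 | 0.0205 | 0.0171 | 1.2030 | 0.2016 | 0.4658 | 2.3219 | 0.0130 | 1.1934 | 0.0156 | 0.0063 |
| | −2.96–−0.98 | 24 | 366,651 | 0.1644 | 0.1114 | 1.4751 | 0.2472 | 0.4984 | | | | | |
| | −0.98–0.33 | 61 | 1,795,141 | 0.4178 | 0.5457 | 0.7658 | 0.1283 | 0.3801 | | | | | |
| | 0.33–2.09 | 50 | 935,136 | 0.3425 | 0.2842 | 1.2049 | 0.2019 | 0.4661 | | | | | |
| | 2.09–30.18 | 8 | 136,747 | 0.0548 | 0.0416 | 1.3184 | 0.2209 | 0.4813 | | | | | |
| DOS | 0–0.26 | 128 | 2,889,251 | 0.8767 | 0.8782 | 0.9984 | 0.2100 | 0.4729 | 2.3219 | 0.0194 | 0.9507 | 0.0184 | 0.0074 |
| | 0.26–0.79 | 5 | 120,621 | 0.0342 | 0.0367 | 0.9342 | 0.1965 | 0.4613 | | | | | |
| | 0.79–1.24 | 7 | 141,948 | 0.0479 | 0.0431 | 1.1113 | 0.2338 | 0.4902 | | | | | |
| | 1.24–1.79 | 5 | 95,776 | 0.0342 | 0.0291 | 1.1765 | 0.2475 | 0.4986 | | | | | |
| | 1.79–3.28 | 1 | 42,278 | 0.0068 | 0.0129 | 0.5331 | 0.1121 | 0.3540 | | | | | |
| DOF | 0–0.25 | 98 | 2,127,293 | 0.6712 | 0.6466 | 1.0382 | 0.2091 | 0.4721 | 2.3219 | 0.0187 | 0.9928 | 0.0185 | 0.0075 |
| | 0.25–0.76 | 17 | 265,940 | 0.1164 | 0.0808 | 1.4405 | 0.2902 | 0.5180 | | | | | |
| | 0.76–1.23 | 22 | 655,917 | 0.1507 | 0.1994 | 0.7559 | 0.1523 | 0.4135 | | | | | |
| | 1.23–1.76 | 5 | 147,171 | 0.0342 | 0.0447 | 0.7657 | 0.1542 | 0.4160 | | | | | |
| | 1.76–3.16 | 4 | 93,553 | 0.0274 | 0.0284 | 0.9635 | 0.1941 | 0.4591 | | | | | |
| DOR | 0–0.49 | 97 | 2,477,520 | 0.6644 | 0.7531 | 0.8823 | 0.0712 | 0.2714 | 2.3219 | 0.1561 | 2.4788 | 0.3869 | 0.1562 |
| | 0.49–1.52 | 17 | 493,559 | 0.1164 | 0.1500 | 0.7762 | 0.0626 | 0.2503 | | | | | |
| | 1.52–2.99 | 17 | 237,857 | 0.1164 | 0.0723 | 1.6106 | 0.1300 | 0.3826 | | | | | |
| | 2.99–5.81 | 13 | 72,027 | 0.0890 | 0.0219 | 4.0671 | 0.3282 | 0.5275 | | | | | |
| | 5.81–11.39 | 2 | 8911 | 0.0137 | 0.0027 | 5.0575 | 0.4081 | 0.5277 | | | | | |
| Rainfall | 2051–2118 | 40 | 1,210,437 | 0.2740 | 0.3679 | 0.7447 | 0.1167 | 0.3616 | 2.3219 | 0.2059 | 1.2765 | 0.2628 | 0.1061 |
| | 2118–2169 | 36 | 1,190,378 | 0.2466 | 0.3618 | 0.6816 | 0.1068 | 0.3446 | | | | | |
| | 2169–2251 | 38 | 532,952 | 0.2603 | 0.1620 | 1.6068 | 0.2517 | 0.5010 | | | | | |
| | 2251–2362 | 31 | 219,354 | 0.2123 | 0.0667 | 3.1846 | 0.4990 | 0.5005 | | | | | |
| | 2362–2567 | 1 | 136,753 | 0.0068 | 0.0416 | 0.1649 | 0.0258 | 0.1363 | | | | | |
| TWI | 3.97–7.60 | 73 | 1,305,235 | 0.5000 | 0.3967 | 1.2604 | 0.2853 | 0.5162 | 2.3219 | 0.0324 | 0.8834 | 0.0286 | 0.0116 |
| | 7.60–9.65 | 45 | 1,171,820 | 0.3082 | 0.3562 | 0.8654 | 0.1959 | 0.4607 | | | | | |
| | 9.65–12.60 | 21 | 491,509 | 0.1438 | 0.1494 | 0.9629 | 0.2180 | 0.4791 | | | | | |
| | 12.60–16.81 | 5 | 272,014 | 0.0342 | 0.0827 | 0.4143 | 0.0938 | 0.3202 | | | | | |
| | 16.81–28.97 | 2 | 49,296 | 0.0137 | 0.0150 | 0.9143 | 0.2070 | 0.4704 | | | | | |
| NDVI | −1649–4246 | 12 | 90,541 | 0.0822 | 0.0275 | 2.9866 | 0.3535 | 0.5303 | 2.3219 | 0.0878 | 1.6897 | 0.1483 | 0.0599 |
| | 4246–6115 | 20 | 199,141 | 0.1370 | 0.0605 | 2.2632 | 0.2679 | 0.5091 | | | | | |
| | 6115–7378 | 35 | 472,658 | 0.2397 | 0.1437 | 1.6687 | 0.1975 | 0.4622 | | | | | |
| | 7378–8260 | 40 | 915,188 | 0.2740 | 0.2782 | 0.9850 | 0.1166 | 0.3615 | | | | | |
| | 8260–9999 | 39 | 1,612,346 | 0.2671 | 0.4901 | 0.5451 | 0.0645 | 0.2551 | | | | | |
| DEM | 59–249 | 35 | 1,316,776 | 0.2397 | 0.4003 | 0.5990 | 0.1061 | 0.3435 | 2.3219 | 0.1055 | 1.1287 | 0.1190 | 0.0481 |
| | 249–393 | 49 | 1,147,025 | 0.3356 | 0.3487 | 0.9627 | 0.1706 | 0.4352 | | | | | |
| | 393–584 | 41 | 426,851 | 0.2808 | 0.1297 | 2.1645 | 0.3835 | 0.5303 | | | | | |
| | 584–826 | 19 | 276,005 | 0.1301 | 0.0839 | 1.5513 | 0.2749 | 0.5121 | | | | | |
| | 826–1400 | 2 | 123,217 | 0.0137 | 0.0375 | 0.3659 | 0.0648 | 0.2559 | | | | | |
| Lithology type | Harder Rock | 22 | 312,971 | 0.1507 | 0.0951 | 1.5841 | 0.2786 | 0.5136 | 2.3219 | 0.1710 | 1.1373 | 0.1944 | 0.0785 |
| | Hard Rock | 76 | 2,359,103 | 0.5205 | 0.7171 | 0.7260 | 0.1277 | 0.3791 | | | | | |
| | Weak Rock | 46 | 538,857 | 0.3151 | 0.1638 | 1.9237 | 0.3383 | 0.5290 | | | | | |
| | Weaker Rock | 2 | 31,022 | 0.0137 | 0.0094 | 1.4528 | 0.2555 | 0.5030 | | | | | |
| | Loose Rock | 0 | 47,909 | 0.0000 | 0.0146 | 0.0001 | 0.0000 | 0.0003 | | | | | |
| Land use | Cropland | 46 | 1,220,474 | 0.3151 | 0.3710 | 0.8494 | 0.0865 | 0.3054 | 2.3219 | 0.2723 | 1.9646 | 0.5350 | 0.2160 |
| | Forest | 88 | 1,955,766 | 0.6027 | 0.5945 | 1.0140 | 0.1032 | 0.3382 | | | | | |
| | Others | 1 | 4998 | 0.0068 | 0.0015 | 4.5086 | 0.4590 | 0.5157 | | | | | |
| | Water | 0 | 36,807 | 0.0000 | 0.0112 | 0.0001 | 0.0000 | 0.0002 | | | | | |
| | Built-up | 11 | 71,829 | 0.0753 | 0.0218 | 3.4509 | 0.3513 | 0.5302 | | | | | |
| Aspect | North | 12 | 459,635 | 0.0822 | 0.1397 | 0.5884 | 0.0725 | 0.2745 | 3.0000 | 0.0109 | 1.0141 | 0.0111 | 0.0045 |
| | Northeast | 17 | 334,051 | 0.1164 | 0.1015 | 1.1468 | 0.1414 | 0.3990 | | | | | |
| | East | 18 | 466,401 | 0.1233 | 0.1418 | 0.8697 | 0.1072 | 0.3454 | | | | | |
| | Southeast | 21 | 401,912 | 0.1438 | 0.1222 | 1.1775 | 0.1451 | 0.4041 | | | | | |
| | South | 22 | 410,405 | 0.1507 | 0.1247 | 1.2080 | 0.1489 | 0.4091 | | | | | |
| | Southwest | 20 | 367,835 | 0.1370 | 0.1118 | 1.2253 | 0.1510 | 0.4119 | | | | | |
| | West | 21 | 466,432 | 0.1438 | 0.1418 | 1.0146 | 0.1251 | 0.3751 | | | | | |
| | Northwest | 15 | 383,203 | 0.1027 | 0.1165 | 0.8821 | 0.1087 | 0.3481 | | | | | |
| SPI | 2.75–6.55 | 1 | 263,540 | 0.0068 | 0.0801 | 0.0856 | 0.0187 | 0.1072 | 2.3219 | 0.1106 | 0.9174 | 0.1015 | 0.0410 |
| | 6.55–8.68 | 37 | 1,104,319 | 0.2534 | 0.3357 | 0.7551 | 0.1646 | 0.4285 | | | | | |
| | 8.68–10.63 | 67 | 1,231,618 | 0.4589 | 0.3744 | 1.2259 | 0.2672 | 0.5088 | | | | | |
| | 10.63–13.23 | 35 | 571,961 | 0.2397 | 0.1739 | 1.3790 | 0.3006 | 0.5213 | | | | | |
| | 13.23–26.40 | 6 | 118,436 | 0.0411 | 0.0360 | 1.1416 | 0.2489 | 0.4994 | | | | | |
| Roughness | 1–1.04 | 81 | 2,292,192 | 0.5548 | 0.6967 | 0.7964 | 0.1159 | 0.3604 | 2.3219 | 0.1909 | 1.3738 | 0.2623 | 0.1059 |
| | 1.04–1.12 | 45 | 722,696 | 0.3082 | 0.2197 | 1.4032 | 0.2043 | 0.4681 | | | | | |
| | 1.12–1.26 | 19 | 210,869 | 0.1301 | 0.0641 | 2.0304 | 0.2956 | 0.5197 | | | | | |
| | 1.26–1.50 | 0 | 55,578 | 0.0000 | 0.0169 | 0.0001 | 0.0000 | 0.0002 | | | | | |
| | 1.50–4.16 | 1 | 8539 | 0.0068 | 0.0026 | 2.6390 | 0.3842 | 0.5302 | | | | | |
| Cutting depth | 0–2.05 | 24 | 1,272,842 | 0.1644 | 0.3869 | 0.4250 | 0.0744 | 0.2788 | 2.3219 | 0.0522 | 1.1427 | 0.0597 | 0.0241 |
| | 2.05–4.78 | 65 | 1,139,489 | 0.4452 | 0.3464 | 1.2855 | 0.2250 | 0.4842 | | | | | |
| | 4.78–8.19 | 38 | 599,351 | 0.2603 | 0.1822 | 1.4288 | 0.2501 | 0.5000 | | | | | |
| | 8.19–13.19 | 17 | 227,622 | 0.1164 | 0.0692 | 1.6830 | 0.2946 | 0.5194 | | | | | |
| | 13.19–58 | 2 | 50,570 | 0.0137 | 0.0154 | 0.8913 | 0.1560 | 0.4181 | | | | | |
| Relief | 0–5 | 32 | 1,608,874 | 0.2192 | 0.4890 | 0.4483 | 0.0751 | 0.2804 | 2.3219 | 0.0848 | 1.1945 | 0.1013 | 0.0409 |
| | 5–10 | 60 | 937,863 | 0.4110 | 0.2851 | 1.4417 | 0.2414 | 0.4950 | | | | | |
| | 10–17 | 38 | 524,499 | 0.2603 | 0.1594 | 1.6326 | 0.2733 | 0.5115 | | | | | |
| | 17–26 | 15 | 176,589 | 0.1027 | 0.0537 | 1.9142 | 0.3205 | 0.5261 | | | | | |
| | 26–132 | 1 | 42,049 | 0.0068 | 0.0128 | 0.5360 | 0.0897 | 0.3121 | | | | | |
| Profile Curvature | −37.08–−4.65 | 3 | 86,895 | 0.0205 | 0.0264 | 0.7781 | 0.1667 | 0.4309 | 2.3219 | 0.0243 | 0.9335 | 0.0227 | 0.0092 |
| | −4.65–−1.44 | 18 | 415,162 | 0.1233 | 0.1262 | 0.9771 | 0.2093 | 0.4723 | | | | | |
| | −1.44–0.81 | 81 | 1,991,942 | 0.5548 | 0.6055 | 0.9164 | 0.1963 | 0.4611 | | | | | |
| | 0.81–3.70 | 40 | 645,598 | 0.2740 | 0.1962 | 1.3962 | 0.2991 | 0.5208 | | | | | |
| | 3.70–44.81 | 4 | 150,277 | 0.0274 | 0.0457 | 0.5999 | 0.1285 | 0.3804 | | | | | |
| CV | 0–0.0051 | 37 | 1,224,790 | 0.2534 | 0.3723 | 0.6808 | 0.0954 | 0.3233 | 2.3219 | 0.0853 | 1.4279 | 0.1218 | 0.0492 |
| | 0.0051–0.0096 | 56 | 1,119,905 | 0.3836 | 0.3404 | 1.1269 | 0.1578 | 0.4204 | | | | | |
| | 0.0096–0.0162 | 38 | 689,186 | 0.2603 | 0.2095 | 1.2425 | 0.1740 | 0.4390 | | | | | |
| | 0.0162–0.0278 | 11 | 225,863 | 0.0753 | 0.0687 | 1.0975 | 0.1537 | 0.4153 | | | | | |
| | 0.0278–0.1289 | 4 | 30,130 | 0.0274 | 0.0092 | 2.9916 | 0.4190 | 0.5258 | | | | | |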
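The slope rows of Table 4 can be recomputed from the raw landslide and raster counts. The sketch below is illustrative and rests on an assumption: the formulas are the usual index-of-entropy definitions as inferred from the table's column values (frequency ratio, normalised density, Shannon entropy with base-2 logarithm, information coefficient, mean frequency ratio, and their product as the weight), not a transcription of the authors' code. The names `slope_classes` and `ioe_weight` are hypothetical.

```python
import math

# Slope classes from Table 4: (number of landslides, number of raster cells).
slope_classes = [
    (23, 1_147_291),
    (46, 979_935),
    (42, 647_322),
    (23, 377_973),
    (12, 137_353),
]

def ioe_weight(classes):
    """Return entropy-based quantities for one conditioning factor."""
    n_slides = sum(n for n, _ in classes)
    n_cells = sum(c for _, c in classes)
    # Frequency ratio: share of landslides divided by share of area.
    fr = [(n / n_slides) / (c / n_cells) for n, c in classes]
    p = [f / sum(fr) for f in fr]                     # normalised density P_ij
    h = -sum(x * math.log2(x) for x in p if x > 0)    # entropy H_i
    h_max = math.log2(len(classes))                   # maximum entropy H_i,max
    info = (h_max - h) / h_max                        # information coefficient I_i
    p_mean = sum(fr) / len(fr)                        # mean frequency ratio P_i
    return {"H": h, "H_max": h_max, "I": info, "P": p_mean, "W": info * p_mean}

result = ioe_weight(slope_classes)
print({key: round(value, 4) for key, value in result.items()})
```

The recomputed values agree with the slope row of Table 4 to within rounding. In the table, the final column Wi* equals Wi divided by the sum of Wi over all eighteen factors, and empty classes are assigned a small frequency ratio (0.0001) rather than zero; the `x > 0` guard above plays a similar role by skipping zero-probability terms.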
Table 5. Optimal hyperparameters for different models.
| No. | Model | Parameters |
|---|---|---|
| 1 | SVM | C = 10, gamma = scale, and kernel = rbf (all other parameters as default). |
| 2 | LDA | shrinkage = None, and solver = lsqr (all other parameters as default). |
| 3 | RF | max_depth = 20, min_samples_leaf = 1, min_samples_split = 2, and n_estimators = 200 (all other parameters as default). |
| 4 | ET | max_depth = 20, min_samples_leaf = 1, min_samples_split = 2, and n_estimators = 100 (all other parameters as default). |
Table 6. Confidence Intervals of AUC, Accuracy, Precision, F1, and Recall for the Different Models.
| Model | AUC | AUC CI | ACC | ACC CI | Precision | Precision CI | F1 | F1 CI | Recall | Recall CI |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.933 | [0.871, 0.946] | 0.872 | [0.784, 0.885] | 0.908 | [0.773, 0.964] | 0.861 | [0.781, 0.880] | 0.818 | [0.681, 0.833] |
| LDA | 0.922 | [0.862, 0.958] | 0.840 | [0.768, 0.900] | 0.875 | [0.765, 0.994] | 0.826 | [0.743, 0.893] | 0.782 | [0.622, 0.827] |
| RF | 0.965 | [0.912, 0.983] | 0.906 | [0.858, 0.943] | 0.925 | [0.848, 0.994] | 0.900 | [0.859, 0.942] | 0.876 | [0.783, 0.923] |
| ET | 0.961 | [0.923, 0.979] | 0.909 | [0.843, 0.936] | 0.916 | [0.820, 0.994] | 0.905 | [0.840, 0.937] | 0.894 | [0.749, 0.928] |
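As a quick consistency check on Table 6, F1 is the harmonic mean of precision and recall, so the reported F1 point estimates should be close to the values recomputed from the reported precision and recall. The Python sketch below does this; the identity is exact only for a single confusion matrix, so small differences are expected if the metrics were averaged over folds or resamples (how they were aggregated is not stated here and is treated as unknown).

```python
# Precision, recall and F1 point estimates as reported in Table 6.
reported = {
    "SVM": {"precision": 0.908, "recall": 0.818, "f1": 0.861},
    "LDA": {"precision": 0.875, "recall": 0.782, "f1": 0.826},
    "RF":  {"precision": 0.925, "recall": 0.876, "f1": 0.900},
    "ET":  {"precision": 0.916, "recall": 0.894, "f1": 0.905},
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recomputed = {}
for model, metrics in reported.items():
    recomputed[model] = f1_score(metrics["precision"], metrics["recall"])
    print(f"{model}: reported F1 = {metrics['f1']:.3f}, "
          f"recomputed = {recomputed[model]:.3f}")
```

For all four models the recomputed value matches the reported F1 to three decimals.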
Table 7. Classification performance of different coupling models.
| Model | Landslide Susceptibility Zone Levels | Number of Grid Cells | Area Proportion | Number of Landslides | Landslide Number Ratio | Landslide Frequency Ratio |
|---|---|---|---|---|---|---|
| ET | Very low | 982,144 | 0.30 | 2 | 0.01 | 0.05 |
| | Low | 638,882 | 0.19 | 2 | 0.01 | 0.07 |
| | Middle | 292,052 | 0.09 | 6 | 0.04 | 0.46 |
| | High | 321,124 | 0.10 | 11 | 0.08 | 0.77 |
| | Very high | 1,055,672 | 0.32 | 125 | 0.86 | 2.67 |
| RF | Very low | 912,176 | 0.28 | 2 | 0.01 | 0.05 |
| | Low | 660,851 | 0.20 | 1 | 0.01 | 0.03 |
| | Middle | 334,600 | 0.10 | 3 | 0.02 | 0.20 |
| | High | 224,453 | 0.07 | 23 | 0.16 | 2.31 |
| | Very high | 1,157,794 | 0.35 | 117 | 0.80 | 2.28 |
| SVM | Very low | 1,124,243 | 0.34 | 8 | 0.05 | 0.16 |
| | Low | 499,757 | 0.15 | 11 | 0.08 | 0.50 |
| | Middle | 237,673 | 0.07 | 8 | 0.05 | 0.76 |
| | High | 382,477 | 0.12 | 7 | 0.05 | 0.41 |
| | Very high | 1,045,724 | 0.32 | 112 | 0.77 | 2.41 |
| LDA | Very low | 1,550,851 | 0.47 | 17 | 0.12 | 0.25 |
| | Low | 272,606 | 0.08 | 10 | 0.07 | 0.83 |
| | Middle | 163,123 | 0.05 | 8 | 0.05 | 1.11 |
| | High | 161,655 | 0.05 | 4 | 0.03 | 0.56 |
| | Very high | 1,141,639 | 0.35 | 107 | 0.73 | 2.11 |
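The last three columns of Table 7 follow from the first two count columns: the area proportion is a zone's share of grid cells, the landslide number ratio is its share of landslides, and the frequency ratio is the second divided by the first. The short Python sketch below recomputes these for the ET rows directly from the counts; the variable names are illustrative only.

```python
# ET rows of Table 7: zone -> (number of grid cells, number of landslides).
et_zones = {
    "Very low":  (982_144, 2),
    "Low":       (638_882, 2),
    "Middle":    (292_052, 6),
    "High":      (321_124, 11),
    "Very high": (1_055_672, 125),
}

total_cells = sum(cells for cells, _ in et_zones.values())
total_slides = sum(slides for _, slides in et_zones.values())

frequency_ratio = {}
for zone, (cells, slides) in et_zones.items():
    area_share = cells / total_cells      # "Area Proportion"
    slide_share = slides / total_slides   # "Landslide Number Ratio"
    frequency_ratio[zone] = slide_share / area_share
    print(f"{zone:9s} area={area_share:.2f} "
          f"slides={slide_share:.2f} FR={frequency_ratio[zone]:.2f}")
```

A frequency ratio above 1 means a zone contains more landslides than its area alone would suggest; computing it from the raw counts avoids the rounding error that arises when the two-decimal proportions in the table are divided directly.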
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kong, Y.; Zhu, K.; Wu, H.; Xu, C.; Meng, Z.; Kong, H.; Tan, W.; Kong, X.; Chen, X.; Chen, L.; et al. Towards Sustainable Development: Landslide Susceptibility Assessment with Sample Optimization in Guiyang County, China. Sustainability 2025, 17, 9575. https://doi.org/10.3390/su17219575
