1. Introduction
Landslides represent a frequent natural hazard globally, endangering lives and infrastructure in mountainous and reservoir regions in particular [1]. The Three Gorges Reservoir area exemplifies these challenges, where complex geological conditions and harsh environments trigger frequent landslides along the Yangtze River. In recent years, periodic water level fluctuations, extreme rainfall, and accelerated urbanization have further exacerbated the risk of landslides [2]. Over the past decade, nearly 5000 landslides have been recorded in this region, including many highly destructive cases. Three key factors drive landslide proliferation: fragile geological structures, steep slopes, and intensive construction activities. These conditions promote slow-moving landslides that accumulate structural damage over time. Surface deformations during slope movements generate ground fissures and localized collapses. Such deformations escalate maintenance costs while posing critical safety risks for residents. Effective landslide susceptibility mapping addresses these challenges by enabling proactive risk assessments. Advanced predictive modeling enhances early warning capabilities and supports targeted mitigation strategies, ultimately strengthening community resilience and guiding sustainable land-use planning.
Landslide susceptibility provides a critical framework for assessing regional landslide risks by synthesizing geological, topographic, and meteorological factors [3,4,5,6]. This probabilistic evaluation depends on a systematic analysis of the predisposing factors to predict landslide likelihood and spatial patterns [7,8,9,10,11]. Three primary modeling approaches dominate current methodologies: statistical, physical, and machine learning models. Statistical models predict landslide occurrence probabilities from historical data through statistical analysis [12]. These models have been effectively demonstrated in national- and medium-scale assessments using public datasets [13,14,15]. Physical models, grounded in physical laws, simulate slope failure mechanisms through mechanistic equations, achieving high precision in controlled scenarios but requiring exhaustive geotechnical inputs that often prove impractical for regional applications [16,17,18,19]. Recent advances in data-driven machine learning techniques have revolutionized landslide prediction through automated pattern recognition in high-dimensional datasets. For instance, Azarafza et al. (2021) [20] and Nikoobakht et al. (2022) [21] developed deep convolutional neural networks for mapping landslide susceptibility [22]. They proposed a meta-learning approach that utilized Bayesian optimization for hyperparameter tuning, employing random forests as the predictive model. Additionally, Pradhan et al. (2023) [23] introduced an ML-based explainable algorithm, SHapley Additive exPlanations (SHAP), for landslide susceptibility modeling. Ensemble learning architectures enhance prediction reliability by strategically combining multiple models, with homogeneous and heterogeneous approaches offering distinct advantages for landslide susceptibility mapping [24,25]. Homogeneous ensembles (e.g., random forest, XGBoost, LightGBM, and CatBoost) employ collections of the same base algorithm, typically decision trees, reducing prediction error through parallel training with bootstrap aggregation (bagging) or sequential gradient boosting [26,27]. While these methods excel at handling high-dimensional data and mitigating overfitting, they may exhibit limited adaptability when geological processes require fundamentally different algorithmic approaches. In contrast, heterogeneous ensembles (e.g., bagging, stacking) integrate diverse machine learning models to leverage their complementary strengths [28,29,30]. Stacking, for instance, uses a meta-learner to optimally combine base model predictions, often improving performance in complex, nonlinear landscapes. Despite these developments, critical gaps persist in understanding ensemble learning's adaptability across varied geological contexts. This underscores the need for comprehensive investigations to optimize ensemble architectures for region-specific disaster risk assessments.
Landslide susceptibility modeling requires rigorous execution of three interdependent phases, namely data acquisition, predictor variable selection, and model development, all of which collectively determine prediction accuracy [4,31,32]. Recent advances include explainable AI frameworks that integrate synthetic aperture radar time series, NDVI dynamics, and geoenvironmental parameters for enhanced landslide forecasting [33]. Despite this progress, critical methodological gaps persist. Current approaches frequently prioritize statistical collinearity analysis during variable selection, yet this focus inadequately addresses model performance optimization. Such oversights in predictor selection propagate uncertainty during subsequent modeling stages, potentially omitting key geospatial parameters and compromising map reliability [32]. Factor engineering emerges as a vital corrective strategy: a systematic framework for transforming raw data into optimized predictive features through the creation, transformation, and strategic selection of relevant factors [34,35,36]. While successfully applied in general machine learning domains to boost model efficiency, its implementation in geohazard prediction remains nascent. Advancing factor engineering methodologies specifically for landslide susceptibility analysis could substantially improve both prediction accuracy and operational efficiency, enabling more robust disaster mitigation frameworks.
This study introduces an integrated methodology, combining homogeneous and heterogeneous ensemble architectures with systematic factor engineering, for comprehensive landslide susceptibility assessments. The key innovations include: (1) the integration of advanced machine learning frameworks, including random forest, XGBoost, LightGBM, CatBoost, bagging, and stacking, to rigorously evaluate the relevant influencing factors through the use of multidimensional importance ranking; and (2) the utilization of factor engineering techniques to optimize model performance by identifying and prioritizing the most critical factors. The refined framework generates high-resolution susceptibility maps that improve spatial risk stratification, addressing critical gaps in regional landslide modeling, wherein previous efforts inadequately explored synergistic model combinations and factor engineering’s performance impacts.
2. Geological Background
2.1. Study Area
The research focuses on Chongqing's southwestern region, within China's Three Gorges Reservoir area (Figure 1), spanning Fuling, Nanchuan, Zhongxian, and Fengdu districts. Positioned 172 km upstream from Chongqing's urban core and 434 km from the Three Gorges Dam, the area straddles critical tectonic boundaries: the Sichuan Basin's fold belt to the northwest and the southeastern uplift zone, demarcated by Jinfoshan Mountain's northern slope. The northwestern part belongs to the Sichuan Depression and the eastern Sichuan fold belt, while the southeastern part is part of the uplift fold of southeastern Sichuan. Geologically, the area is part of the Neo-Cathaysian tectonic system, with dominant structural orientations of NNE, NS, and NNW, as well as some arcuate structural lines. Stratigraphically, the area spans from the Cambrian of the Paleozoic to the Cretaceous of the Mesozoic, with notable gaps in the Devonian, Carboniferous, and parts of the Cretaceous. Quaternary sediments are sporadically distributed in river valleys, plains, and troughs. The Cambrian is characterized by dolomite and dolomitic limestone, the Ordovician by clastic rocks and fractured limestone, the Silurian by limestone, the Permian by gray-to-black aluminous shale and carbonaceous shale, the Triassic by interbedded limestone, sandstone, and siltstone, and the Jurassic by brown–purple sandstone and shale. Tertiary strata are absent in the Cenozoic, while Quaternary lithology consists primarily of accumulation layers. Elevations in the southeastern part exceed 1400 m, presenting a deeply dissected mid-mountain landscape. The northwestern part is lower, displaying the "red layer" geomorphology of eastern Sichuan, with elevations generally between 500 and 1100 m, forming a shallow-cut low-mountain landscape. Along the Chuanxiang Highway, the terrain consists of low mountains and trough basins, with elevations ranging from 500 to 800 m. The highest point in the study area is Fengchuiling of Jinfo Mountain at 2238 m, while the lowest point is Yutiaoyan in Qilong Township at 340 m, resulting in a relative height difference of 1898 m. Mid-mountains and low mountains dominate the topography, covering 50.71% and 48.07% of the area, respectively, with hills accounting for only 1.22%.
The area experiences a subtropical monsoon climate, influenced by the westerlies, the Western Pacific Subtropical High, the Southwest Vortex, and the Tibetan High. Winter is dominated by northerly airflows, while summer is influenced by the southerly monsoon, with the Pacific Subtropical High often extending westward. This results in distinct seasons, marked dry and wet periods, warm winters, hot summers, ample rainfall, and a long frost-free period. Temperatures decrease and precipitation increases from northwest to southeast, forming distinct vertical climate zonation. The average annual temperature is 18.1 °C, with an average annual precipitation of 1185 mm. In 2023, the annual precipitation reached 1380.8 mm, a 38.0% increase from the previous year. The study area is part of the Yangtze River basin, with abundant water resources and a well-developed hydrological network. The Yangtze River flows from northwest to northeast through the area, forming an irregular "W" shape, and is fed by primary, secondary, and tertiary tributaries, as well as seasonal streams. Secondary tributaries are primarily mid-mountain and high-hill water systems, with relatively gentle gradients. The Yangtze River within the study area is approximately 134 km long, with a mean annual flow of about 13,357 m³/s. The Three Gorges Reservoir causes water levels to fluctuate between a maximum of 175 m and a minimum of 145 m, with variations in the upstream flow during dry and wet periods directly impacting reservoir regulation. These changes are a significant factor in the reactivation and deformation of water-related landslides.
2.2. Landslide Inventory and Influencing Factors
A detailed and accurate landslide inventory is essential for conducting landslide susceptibility mapping. First, it provides comprehensive spatio-temporal information about historical landslide events, enhancing the accuracy of regional historical data. This enables researchers to better understand landslide patterns, characteristics, and frequency, as well as their triggering factors, laying a solid foundation for regional landslide risk assessments. Second, landslide inventory data are fundamental for constructing susceptibility and prediction models. By analyzing historical data, researchers can identify the spatial distribution characteristics of landslides, develop precise susceptibility models, and improve the accuracy and reliability of landslide predictions. Finally, an accurate landslide inventory plays a key role in guiding mitigation measures.
Through systematic data collection and field surveys, this study established a comprehensive landslide inventory for the period 1968–2022, documenting 2028 historical events within the study area. The temporal analysis reveals a marked increase in landslide frequency between 1998 and 2010 (Figure 2a), closely associated with the Three Gorges Reservoir impoundment project. Following the reservoir's initial filling in 2003, accelerated water-level fluctuations induced substantial alterations to the bank stress distributions and hydrodynamic conditions, precipitating widespread slope failures. As shown in Figure 2b, the majority of landslides cover an area of 0 to 50 × 10⁴ m², while a small proportion exceeds 100 × 10⁴ m². Regarding landslide volume, most landslides range from 0 to 500 × 10⁴ m³, with a few outliers reaching over 1000 × 10⁴ m³, and some even surpassing 1500 × 10⁴ m³ (Figure 2c), highlighting the massive scale and high risks posed by these events. The primary triggering factors identified include rainfall and fluctuations in reservoir water levels, along with substantial contributions from human activities, such as artificial slope cutting and loading.
In landslide susceptibility modeling, the selection and comprehensive analysis of influencing factors are critical to the accuracy and effectiveness of the model. These factors encompass various dimensions, including topography, geology, hydrology, soil characteristics, vegetation cover, and human activities, each offering distinct contributions to the preconditioning of slope instability. By offering detailed descriptions of the geographical and environmental conditions, these factors help researchers identify and quantify the key controls on slope failure. Integrating multi-source data into landslide susceptibility models enables the simulation and prediction of landslide probabilities within specific regions.
Based on literature reviews [37,38,39] and data availability, this study considers the DEM, slope, plan curvature, profile curvature, the Stream Power Index (SPI), the Topographic Wetness Index (TWI), lithology, soil type, distance to the road, distance to the river, land use, and the Normalized Difference Vegetation Index (NDVI) (Figure 3). The DEM, slope, plan curvature, and profile curvature data are derived from the ALOS Digital Elevation Model, provided by the Alaska Satellite Facility, with a spatial resolution of 12.5 m, in raster format (Table 1).
The DEM serves as a fundamental topographic parameter in landslide analysis, governing precipitation accumulation and surface water flow dynamics. Slope, a key control on mechanical stability, directly correlates with landslide risk; steeper slopes generally exhibit higher susceptibility. The plan curvature and profile curvature describe terrain concavity/convexity, influencing water flow convergence, sediment distribution, and surface erosion processes, all critical to landslide initiation. The Topographic Wetness Index, derived from DEM data using ArcGIS 10.6, quantifies localized soil moisture; elevated TWI values suggest higher saturation and, thus, greater landslide likelihood. The Stream Power Index reflects the erosive potential of runoff, indicating how precipitation and overland flows contribute to slope destabilization. The lithological data are derived from a 1:50,000 scale geological map, provided by local natural resource departments, in vector format. Lithology forms the geological foundation for landslide occurrences, as the mechanical properties and weathering characteristics of different lithological materials significantly affect slope stability. For instance, clay-rich rocks are more susceptible to landslides when saturated, while sandstone and limestone can also be vulnerable under heavy rainfall or seismic activity. The soil type data are obtained from the 1:1,000,000 scale Soil Map of China, compiled by the Nanjing Institute of Soil Research, Chinese Academy of Sciences, in vector format. The soil type directly influences landslide occurrences, with variations in permeability, water retention, and shear strength affecting soil stability. The distances to the road and the river are calculated using the Euclidean distance tool in ArcGIS and formatted as raster data. Road construction can disrupt the natural stability of the surface, increasing the risk of landslides, while river erosion can weaken slope support and trigger landslides. By calculating these distances, we can assess the potential impact of human activities and natural water systems on landslides. The Normalized Difference Vegetation Index (NDVI) is derived from Landsat 8 imagery, processed using ENVI 5.3 software, and formatted as raster data. The NDVI reflects surface vegetation cover, which plays a vital role in stabilizing soil and reducing rainfall-induced soil erosion and surface runoff, thus lowering the risk of landslides. The land use data are sourced from the Land Use Map, based on 2020 Landsat 8 imagery, produced by the Resource and Environmental Science Data Center (RESDC) of the Chinese Academy of Sciences, and formatted as raster data. Land use reflects different patterns of human activity, each influencing landslide risk differently: agricultural and built-up land may see reduced surface stability due to human activities, whereas forest land, with better vegetation cover, generally has a lower landslide risk.
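For readers who prefer a concrete view of how the terrain and vegetation indices mentioned above are commonly derived, the minimal Python sketch below computes the TWI, SPI, and NDVI from gridded arrays using their standard formulas (TWI = ln(a / tan β), SPI = a · tan β, NDVI = (NIR − Red)/(NIR + Red)). The array names, the synthetic inputs, and the use of NumPy are illustrative assumptions, not the exact ArcGIS/ENVI workflows used in this study.

```python
import numpy as np

# Illustrative inputs: specific catchment area (m^2/m), slope (degrees),
# and Landsat red/NIR reflectance grids. Real workflows would read these
# from rasters produced in ArcGIS/ENVI; here synthetic arrays are used.
rng = np.random.default_rng(0)
catchment_area = rng.uniform(10.0, 5_000.0, size=(100, 100))   # a
slope_deg = rng.uniform(0.5, 60.0, size=(100, 100))            # beta
red = rng.uniform(0.02, 0.30, size=(100, 100))
nir = rng.uniform(0.10, 0.60, size=(100, 100))

tan_beta = np.tan(np.radians(slope_deg))

# Topographic Wetness Index: high values indicate flow accumulation over gentle slopes.
twi = np.log(catchment_area / np.maximum(tan_beta, 1e-6))

# Stream Power Index: proxy for the erosive power of overland flow.
spi = catchment_area * tan_beta

# Normalized Difference Vegetation Index from red/NIR reflectance.
ndvi = (nir - red) / np.maximum(nir + red, 1e-6)

print(twi.mean(), spi.mean(), ndvi.mean())
```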
3. Methodology
A comprehensive methodological framework was adopted to enhance the accuracy and robustness of the landslide susceptibility modeling (Figure 4). The process began with the meticulous collection and processing of data from various sources, including topographic, geological, hydrological, soil, and human activity-related factors. Next, both homogeneous ensemble models (random forest, XGBoost, LightGBM, and CatBoost) and heterogeneous ensemble models (bagging and stacking) were employed to evaluate and rank the importance of the influencing factors. This analysis involved in-depth optimization of model performance, with critical influencing factors identified through targeted factor engineering techniques. The predictive performance of the models was validated using Receiver Operating Characteristic (ROC) analysis, followed by iterative optimization through the removal of less significant factors. Finally, the optimized models were validated using ROC curves and statistical analysis, resulting in the creation of a detailed landslide susceptibility distribution map. This map delineates areas of high, medium, low, and extremely low susceptibility, providing a solid scientific foundation for developing landslide risk management and prevention strategies.
3.1. Random Forest
Random forests, established by Leo Breiman [40], are an ensemble method that utilizes decision trees. The algorithm randomly samples both instances and features from the original dataset to construct multiple decision trees, each trained on a random subset of the data. This dual randomness in selecting samples and features distinguishes random forests from traditional decision tree methods. The approach assumes that: (1) the bootstrap samples accurately represent the data distribution, (2) feature collinearity does not significantly distort the importance scores, and (3) the majority of trees capture meaningful signals rather than noise. Random forests are particularly effective for high-dimensional datasets, allowing for classification without the need for dimensionality reduction. After training, the model calculates the factor importance to assess each factor's contribution to the predictions, helping to identify the most influential variables. Despite their high accuracy and robustness, a notable limitation of random forests is their limited interpretability: the complexity of visualizing and interpreting the combined results of hundreds or thousands of decision trees makes them less straightforward than single decision trees.
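As a concrete illustration of this workflow, the minimal scikit-learn sketch below trains a random forest on a tabular factor matrix and prints the impurity-based factor importances. The column names, the synthetic data, and the hyperparameter values are assumptions for illustration rather than the exact configuration used in this study.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the landslide/non-landslide factor table.
rng = np.random.default_rng(42)
factors = ["DEM", "slope", "lithology", "dist_river", "NDVI"]
X = pd.DataFrame(rng.normal(size=(1000, len(factors))), columns=factors)
y = (X["DEM"] + 0.5 * X["slope"] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Bootstrap-sampled trees with random feature subsets at each split.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)

# Gini-based importance scores, one per influencing factor.
for name, score in sorted(zip(factors, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
print("test accuracy:", rf.score(X_test, y_test))
```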
3.2. XGBoost
XGBoost (eXtreme Gradient Boosting) is an optimized gradient boosting algorithm that extends traditional gradient boosting decision trees (GBDTs). Designed to handle large-scale and high-dimensional data, XGBoost iteratively adds decision trees to correct the residuals of previous models and minimizes the loss function by incorporating regularization terms during each tree construction. This method assumes that additive tree structures can effectively approximate complex patterns, while regularization terms (L1/L2) suppress irrelevant features, reducing overfitting and improving the generalization ability. XGBoost excels in binary classification, multi-class classification, and multi-label classification tasks, providing both class labels and probability predictions for each class. Its key advantages include the incorporation of L1 and L2 regularization to mitigate overfitting, parallel processing for efficient tree construction, and a built-in mechanism for handling missing values. Additionally, XGBoost employs pruning techniques to prevent overfitting and integrates cross-validation functionality to automatically determine the optimal number of iterations, significantly streamlining the model selection and tuning process.
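A minimal sketch of this setup using the xgboost Python package follows; the regularization strengths, learning rate, and tree depth are illustrative assumptions, not the tuned values reported in this study.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Synthetic factor matrix standing in for the real landslide dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Gradient-boosted trees with L1/L2 regularization on leaf weights.
model = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=4,
    reg_alpha=0.1,    # L1 regularization
    reg_lambda=1.0,   # L2 regularization
    eval_metric="logloss",
)
model.fit(X_train, y_train)

# Gain-based importance: average loss reduction contributed by each factor's splits.
print(model.get_booster().get_score(importance_type="gain"))
print("test accuracy:", model.score(X_test, y_test))
```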
3.3. LightGBM
LightGBM (Light Gradient Boosting Machine), developed by Ke et al. (2017) [41], is an efficient gradient boosting framework designed to handle large-scale data. Known for its exceptional processing speed and low memory usage, LightGBM is widely used in data science competitions and commercial applications. Building on the decision tree algorithm, LightGBM enhances efficiency and reduces memory consumption compared to frameworks like XGBoost through two key innovations. First, LightGBM employs a histogram-based decision tree algorithm, which converts continuous feature values into discrete bins, assuming that discretization preserves sufficient predictive information. Second, it utilizes a leaf-wise growth strategy with depth constraints, prioritizing the splitting of leaves with the highest gain to accelerate error reduction. This strategy assumes that local greedy splits accumulate to approximate global optima. For classification tasks, LightGBM efficiently handles both binary and multi-class problems by iteratively optimizing decision trees to minimize prediction errors. Particularly in big data environments, its rapid training capability and low memory requirements make it ideal for resource-constrained settings, while maintaining high-performance processing of large datasets.
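The sketch below shows how such a LightGBM classifier is typically configured in Python, with the histogram bin count and the leaf-wise depth constraint exposed explicitly; all parameter values are illustrative assumptions.

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the influencing-factor matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] - 0.7 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = lgb.LGBMClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_bin=255,       # histogram discretization of continuous features
    num_leaves=31,     # leaf-wise growth ...
    max_depth=6,       # ... constrained in depth to limit overfitting
)
model.fit(X_train, y_train)

# Total split gain attributed to each factor.
print(model.booster_.feature_importance(importance_type="gain"))
print("test accuracy:", model.score(X_test, y_test))
```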
3.4. CatBoost
CatBoost (Categorical Boosting) is a gradient boosting decision tree algorithm specifically designed to handle categorical factors effectively. It operates under three key assumptions: (1) categorical features exhibit stable target statistics, (2) ordered boosting mitigates target leakage in sequential data, and (3) symmetric trees balance computational efficiency and overfitting. These design choices give CatBoost a distinct advantage in tasks involving large amounts of categorical data. As an open-source machine learning library, CatBoost simplifies practical applications by directly processing datasets with categorical factors, eliminating the need for cumbersome preprocessing. To enhance performance and model accuracy, CatBoost employs several strategies. First, it uses a symmetric tree structure, where all the nodes at each level split based on the same factor. This approach accelerates training, while reducing overfitting. Second, CatBoost introduces an ordered boosting strategy, which calculates residuals for each sample using data unseen by the previous model, effectively preventing target leakage and ensuring unbiased model evaluation. Another core feature of CatBoost is its automatic handling of categorical factors. It uses a unique encoding mechanism to process categorical data, preserving the integrity of categorical information, while mitigating overfitting caused by high cardinality. CatBoost excels in binary and multi-class classification tasks, demonstrating efficient and accurate performance. Notably, it can directly handle categorical data in string format without requiring one-hot encoding, offering significant convenience and efficiency in data science applications.
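A brief sketch of how categorical factors such as lithology or land use can be passed to CatBoost without one-hot encoding is shown below; the column names, the synthetic data, and all parameter values are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

# Synthetic table mixing numeric and categorical influencing factors.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "DEM": rng.uniform(300, 2300, n),
    "slope": rng.uniform(0, 60, n),
    "lithology": rng.choice(["sandstone", "limestone", "shale"], n),
    "land_use": rng.choice(["forest", "farmland", "built-up"], n),
})
y = ((df["slope"] > 30) & (df["lithology"] == "shale")).astype(int)

# String-valued categorical columns are passed directly via cat_features.
model = CatBoostClassifier(
    iterations=300,
    depth=6,                   # symmetric (oblivious) trees
    learning_rate=0.05,
    boosting_type="Ordered",   # ordered boosting to limit target leakage
    cat_features=["lithology", "land_use"],
    verbose=False,
)
model.fit(df, y)
print(dict(zip(df.columns, model.get_feature_importance())))
```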
3.5. Heterogeneous Ensemble Strategy
This study evaluates four homogeneous ensemble models, random forest, XGBoost, LightGBM, and CatBoost, each employing distinct modeling strategies that result in unique decision boundaries. As shown in Figure 5, random forest exhibits relatively smooth decision boundaries with clear separation lines, indicating stable performance across most regions, although occasional misclassifications may occur. XGBoost demonstrates smoother decision boundaries and superior classification performance, particularly in densely populated data regions; however, it may overfit in sparse data areas. LightGBM, characterized by more complex decision boundaries, showcases high flexibility and sensitivity to detail, enabling the capture of intricate patterns, but also increasing susceptibility to overfitting at boundaries. CatBoost's decision boundaries resemble those of XGBoost, showing stable performance in dense regions, while effectively avoiding overfitting in sparse areas. Given the unique strengths and limitations of each model, this study proposes a heterogeneous ensemble approach. By integrating the predictions from these models, the goal is to construct an ensemble with overall superior performance. This approach is expected to deliver more accurate and robust classification results across varying data distribution conditions, combining the individual advantages of each model, while mitigating their weaknesses.
Bagging, short for bootstrap aggregating, is an ensemble learning technique designed to enhance predictive performance by reducing model variance. The core idea of bagging is to generate multiple subsets from the original dataset through random sampling with replacement, to train a base model on each subset, and to combine their predictions using averaging (for regression tasks) or voting (for classification tasks). This process aims to improve the stability and accuracy of predictions by mitigating overfitting and enhancing generalization. In this study, a bagging strategy is employed to integrate the random forest, XGBoost, LightGBM, and CatBoost models, leveraging the strengths of each to enhance classification accuracy and stability. A soft voting strategy is utilized to make the final decision in classification tasks by combining the predicted probabilities from multiple base classifiers. In soft voting, each classifier outputs a probability distribution reflecting the likelihood of a sample belonging to each class. These distributions are then weighted and averaged, typically with equal weights, although adjustments can be made based on individual classifier performance. The class with the highest combined probability is selected as the final prediction. The advantage of soft voting lies in its ability to utilize the full predictive probability information from each classifier, rather than relying solely on hard classification results. This approach not only improves classification accuracy and stability, but also generates smoother decision boundaries, making it particularly effective for handling complex classification problems.
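A minimal sketch of this soft-voting combination using scikit-learn's VotingClassifier is given below; the equal weights and all hyperparameters are illustrative assumptions rather than the study's tuned settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Synthetic stand-in for the landslide factor matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Soft voting: average the predicted class probabilities of the four base models.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("xgb", XGBClassifier(n_estimators=200, learning_rate=0.05)),
        ("lgbm", LGBMClassifier(n_estimators=200, learning_rate=0.05)),
        ("cat", CatBoostClassifier(iterations=200, verbose=False)),
    ],
    voting="soft",
    weights=[1, 1, 1, 1],  # equal weights; could be tuned per model
)
ensemble.fit(X_train, y_train)
print("ensemble test accuracy:", ensemble.score(X_test, y_test))
```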
Stacking, a sophisticated ensemble learning technique, aims to construct a more effective final model by integrating the predictions of multiple diverse base models. This method employs a "meta-learner" or "second-level learner" to aggregate the predictions from various base models, creating a novel strategy that has been widely adopted in predictive tasks, particularly in competitions and complex problem-solving scenarios. The core mechanism of stacking involves using the predictions from each base model as new inputs, referred to as meta-factors, which are then analyzed by a meta-learner to produce the final prediction. The base models can include a variety of machine learning algorithms, such as random forest, XGBoost, LightGBM, CatBoost, and neural networks. The diversity of these models is a critical factor in enhancing the effectiveness of stacking, as it allows the meta-learner to capture complementary patterns and improve the overall predictive performance. In the implementation of heterogeneous ensemble stacking, multiple base learners are first trained on the original training data. These base models can be of the same or different types, depending on the problem and dataset. Next, the trained base models are used to generate predictions based on either the original training data or an independent validation set. These predictions are then treated as meta-factors and used to train the meta-learner, which learns the optimal way to combine the base model outputs. In practical applications, all the base models make predictions on new data, and these predictions are fed into the meta-learner to produce the final prediction. This approach effectively combines the strengths of multiple models, aiming to achieve higher predictive accuracy and better generalization capability.
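The sketch below illustrates such a stacking layout with scikit-learn's StackingClassifier, using logistic regression as the meta-learner and internal cross-validation to generate the meta-factors; the choice of meta-learner and the five-fold setting are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (0.8 * X[:, 0] - X[:, 4] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Base learners produce out-of-fold probabilities (meta-factors);
# the meta-learner then learns how to combine them.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("xgb", XGBClassifier(n_estimators=200, learning_rate=0.05)),
        ("lgbm", LGBMClassifier(n_estimators=200, learning_rate=0.05)),
        ("cat", CatBoostClassifier(iterations=200, verbose=False)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)
print("stacking test accuracy:", stack.score(X_test, y_test))
```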
3.6. Factor Engineering
Factor engineering is a critical preprocessing step in data science and machine learning, involving the creation, selection, and transformation of factors from raw data to build more efficient and accurate predictive models. The primary goal of factor engineering is to maximize the utilization of information in the dataset, enhance model performance, simplify the model structure, and expedite training processes. In this study, a comprehensive factor engineering strategy was adopted, leveraging advanced ensemble learning models to evaluate and rank the importance of all the potential influencing factors.
This study departs from traditional factor collinearity detection methods, focusing instead on examining the performance of influencing factors across various ensemble learning models. Specifically, it explores the strategies employed by random forest, XGBoost, LightGBM, and CatBoost in evaluating and ranking factor importance. Random forest assesses factor importance by analyzing the purity gain contributed by each factor during decision tree node splits. This process typically uses Gini impurity or entropy as metrics, calculating the reduction in model error associated with each factor. The number of splits involving each factor across all the trees and the average purity gain they provide are accumulated to form a comprehensive factor importance score. XGBoost uses “gain” as its primary evaluation metric, quantifying the improvement in model performance contributed by each factor when making splits. This is specifically represented by the average reduction in the loss function following a split. Similarly, LightGBM also relies on gain to determine factor importance, but its calculation method differs slightly due to the incorporation of Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) techniques, which focus on evaluating the total gain in performance improvement from each factor during splits. CatBoost employs a method similar to the aforementioned models, assessing factor importance based on the performance improvement each factor brings during splits. Additionally, it considers the frequency of factor usage across different trees to evaluate the overall contribution of each factor, thereby determining its importance ranking.
Based on the evaluation mechanisms of different models, the specific implementation steps of factor engineering in this study are as follows: (1) The dataset was independently trained using models such as random forest, XGBoost, LightGBM, CatBoost, bagging, and stacking. Each model provided factor importance scores, indicating the contribution of each factor to the prediction task. (2) By integrating the importance scores from each model, a consolidated factor importance list was created using either the average scores or more complex statistical methods. (3) Based on the importance list, the least significant factors were gradually eliminated, with the model retrained after each removal to assess its impact on prediction accuracy. (4) Model performance was compared before and after factor removal to analyze which eliminations significantly affected the outcomes. This step helps identify and exclude factors with minimal impact, simplifying the model and potentially improving efficiency. (5) A subset of factors that minimally impacted the prediction accuracy, while retaining those critical to model performance, was selected. This approach optimized the factor set, enhanced the model’s generalization capability, and improved prediction accuracy. By employing this systematic factor engineering method, machine learning models were constructed that were both efficient and accurate, demonstrating the critical role of factor engineering in improving predictive performance.
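Steps (1)–(5) can be condensed into a short backward-elimination loop. The sketch below averages normalized importance scores across models and drops the weakest factor as long as the cross-validated AUC does not degrade beyond a tolerance; the tolerance value, the two-model panel, and the helper names are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def averaged_importance(models, X, y):
    """Fit each model and return normalized importances averaged across models."""
    scores = []
    for model in models:
        model.fit(X, y)
        imp = np.asarray(model.feature_importances_, dtype=float)
        scores.append(imp / imp.sum())
    return pd.Series(np.mean(scores, axis=0), index=X.columns)

# Synthetic factor table standing in for the real dataset.
rng = np.random.default_rng(0)
cols = ["DEM", "slope", "lithology", "dist_river", "NDVI", "SPI"]
X = pd.DataFrame(rng.normal(size=(800, len(cols))), columns=cols)
y = (X["DEM"] + 0.6 * X["dist_river"] + rng.normal(scale=0.5, size=800) > 0).astype(int)

models = [RandomForestClassifier(n_estimators=200, random_state=0),
          XGBClassifier(n_estimators=200, learning_rate=0.05)]

kept = list(cols)
baseline = cross_val_score(models[0], X[kept], y, cv=5, scoring="roc_auc").mean()
tolerance = 0.005  # assumed acceptable AUC loss per removal

while len(kept) > 2:
    ranking = averaged_importance(models, X[kept], y).sort_values()
    candidate = ranking.index[0]                      # least important factor
    trial = [c for c in kept if c != candidate]
    auc = cross_val_score(models[0], X[trial], y, cv=5, scoring="roc_auc").mean()
    if auc >= baseline - tolerance:                   # keep the simpler factor set
        kept, baseline = trial, auc
    else:
        break

print("optimized factor subset:", kept, "AUC:", round(baseline, 3))
```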
3.7. Accuracy Test
In landslide susceptibility modeling, imbalanced datasets are common, with significantly fewer landslide (positive class) samples than non-landslide (negative class) samples. The ROC curve evaluates model performance across thresholds by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR). The TPR and FPR are defined as:
TPR = TP / (TP + FN), FPR = FP / (FP + TN),
where TP, FN, TN, and FP represent the number of true positives, false negatives, true negatives, and false positives, respectively. The Area Under the Curve (AUC), ranging from 0 to 1, quantifies model performance. Higher AUC values indicate better performance, while values near 0.5 suggest random guessing, and values below 0.5 indicate predictions worse than random, requiring adjustments. The ROC curve and AUC provide an intuitive method for comparing models, making them accessible to non-experts. They are particularly useful in multi-model comparisons and cross-study evaluations, offering clear benchmarks for decision making. By analyzing the ROC curve, researchers can identify the optimal balance between sensitivity and specificity, selecting a threshold that maximizes landslide detection, while minimizing false positives. This approach ensures robust model performance in real-world applications.
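For completeness, the short sketch below computes the ROC curve, the AUC, and a sensitivity/specificity balance point via Youden's J statistic with scikit-learn; the synthetic scores and the use of Youden's J to pick the threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic labels and predicted landslide probabilities.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.4 + rng.uniform(size=500) * 0.6, 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # FPR = FP/(FP+TN), TPR = TP/(TP+FN)
auc = roc_auc_score(y_true, y_score)

# Threshold maximizing TPR - FPR (Youden's J): one way to balance
# landslide detection against false alarms.
best = np.argmax(tpr - fpr)
print(f"AUC = {auc:.3f}, optimal threshold = {thresholds[best]:.3f}")
```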
5. Discussion
This study adopted an innovative approach by forgoing traditional collinearity detection methods for factors and, instead, exploring the effectiveness of different ensemble learning models in evaluating and ranking the influencing factors. Specifically, homogeneous ensemble models, namely random forest, XGBoost, LightGBM, and CatBoost, were employed alongside heterogeneous ensemble models, namely bagging and stacking, to comprehensively assess factor importance. The random forest model evaluates factor importance based on Gini impurity or entropy metrics, measuring the improvement in purity at decision tree nodes. XGBoost quantifies the contribution of factor splits to model performance using the “gain”. LightGBM utilizes Gradient-based One-Side Sampling and Exclusive Feature Bundling techniques, while CatBoost emphasizes the frequency of factor usage across multiple trees and its impact on decision making. Notably, the analysis of factor importance using heterogeneous ensemble models represents a novel strategy. In regard to the bagging and stacking models, the importance scores of the same factor from different models were combined and averaged to form a comprehensive factor importance score. The results indicate that the DEM was identified as the most critical factor across all the models, followed by lithology and the distance to the river. This finding aligns with the established understanding in landslide susceptibility research, wherein topographic features have long been recognized as fundamental determinants of slope stability. The DEM, as a comprehensive representation of terrain elevation, slope gradient, and aspect, encapsulates essential information related to gravitational forces, surface runoff, and soil erosion, all of which are key triggering mechanisms for landslides. Previous studies have shown that steeper slopes, as quantified by DEM-derived metrics, increase shear stress on soil masses, while elevation changes can influence hydrological conditions, leading to pore water pressure buildup and subsequent slope failure. Against this backdrop, our study posits a clear hypothesis: in regions characterized by complex topography and variable geological conditions, the DEM should emerge as the dominant factor in landslide susceptibility modeling due to its direct influence on multiple physical processes. The multi-model strategy employed herein, by aggregating factor importance scores from random forest, XGBoost, and other ensemble algorithms, not only enhanced the robustness of the predictions, but also reduced the risk of overfitting, thereby improving the accuracy and reliability of the results. This convergence in the findings across diverse models strengthens the validity of our hypothesis, as it demonstrates that the DEM’s influence is consistently recognized regardless of the underlying algorithmic approach used.
Furthermore, a novel method was employed to progressively remove factors deemed unimportant based on their assessed importance across different models, followed by retraining the models to explore the impact of these operations on prediction accuracy. The experimental results revealed significant performance differences among the models after the removal of certain factors, primarily due to the varying mechanisms and dependencies each model has in regard to the different factors. For instance, the random forest model relies on the collective intelligence of multiple decision trees, and the absence of certain key factors leads to a noticeable drop in performance. The XGBoost model demonstrates high sensitivity to core factors, iteratively correcting errors from previous steps, and, thus, heavily depends on the correctness of crucial factors. LightGBM, utilizing Gradient-based One-Side Sampling and Exclusive Feature Bundling techniques, is highly sensitive to the total gain of certain factors when optimizing high-dimensional data. CatBoost evaluates the frequency of factor usage across decision trees and excels in regard to handling categorical factors, but the removal of key factors also significantly impacts its performance. These findings collectively highlight the importance of meticulous factor selection and optimization for enhancing predictive performance and stability across different models. Notably, the integration of predictions from multiple base models using bagging and stacking strategies not only bolstered the robustness of the overall model, but also improved the predictive accuracy. After factor removal, these ensemble models exhibited greater resilience to fluctuations in input data and demonstrated high robustness under varying conditions. However, the post-factor removal performance varied among the ensemble models, reflecting the diversity in regard to the factor handling strategies and methodologies inherent to each base model. This variability underscores the necessity of a tailored approach to factor selection and model training, as different models may react differently to changes in the input factors. Such differences ultimately influence the overall effectiveness of landslide susceptibility assessments, emphasizing the importance of understanding model-specific behaviors when optimizing factor combinations.
In the further research conducted, the performance of different models utilizing specific combinations of factors based on the training and testing datasets was analyzed, with a detailed examination of the landslide susceptibility prediction results. This comprehensive analysis revealed significant performance disparities among the models when predicting areas of high susceptibility versus those of extremely low susceptibility. Notably, the stacking model excelled in identifying high susceptibility areas, underscoring its exceptional ability to recognize regions at elevated risk of landslides. Conversely, the XGBoost model demonstrated superior effectiveness in pinpointing extremely low susceptibility areas, highlighting its precision in accurately locating low susceptibility regions. By optimizing the factor combination techniques for each model, improvements in prediction accuracy were observed for both high and extremely low susceptibility areas across all the models. This finding suggests that meticulous factor engineering not only effectively reduces prediction uncertainty, but also significantly enhances overall model accuracy. For instance, after factor optimization, the stacking model exhibited a remarkable increase in the proportion of correctly identified high susceptibility areas, while XGBoost’s accuracy in identifying extremely low susceptibility areas also showed a notable improvement. Moreover, the analysis revealed that each model exhibited unique sensitivities and adaptive capabilities in predicting varying susceptibility levels, primarily driven by their distinct internal mechanisms and factor handling methodologies. The bagging and stacking strategies, which integrate predictions from multiple base models, demonstrated heightened robustness, particularly outperforming individual models in predicting both high and low susceptibility levels. These strategies highlighted their ability to balance the strengths of different models, resulting in improved prediction stability and accuracy. This systematic investigation of model performance illuminated the nuanced differences in how various models respond to changes in the input factors. It underscores the critical importance of meticulous factor engineering and multi-model integration strategies in enhancing prediction accuracy and stability. By leveraging the complementary strengths of individual models, ensemble approaches such as bagging and stacking provide a robust framework for improving the reliability of landslide susceptibility predictions.
The performance comparison in Table 2 demonstrates that our optimized stacking ensemble (AUC = 0.876) outperforms not only traditional machine learning methods (SVM = 0.813, logistic regression = 0.792) [3], but also deep learning approaches (CNN = 0.845, DNN = 0.832) [19,20]. Specifically, the stacking model achieves an AUC 3.1 percentage points higher than the CNN and 4.4 percentage points higher than the DNN, highlighting its effectiveness for landslide susceptibility prediction.
The optimized models were applied to the entire study area to create a landslide susceptibility distribution map. The results indicated that the stacking model, which integrates predictions from multiple base models, provided the most detailed distribution of susceptibility zones, clearly distinguishing between high, medium, and low susceptibility areas, with excellent performance. In the landslide susceptibility prediction for the study area, the models exhibited high consistency. Specifically, the northern region, characterized by steep terrain and concentrated rainfall, was classified as a high susceptibility area. The central region, influenced by frequent geological activities and human interventions, was also identified as a high susceptibility zone. The southwestern region, with its complex terrain and climatic conditions, was assessed as having moderate susceptibility. In contrast, the eastern and southern regions, with relatively stable geological conditions, were classified as low susceptibility areas, indicating a lower risk of landslides. From a regional distribution perspective, the northern and central areas, classified as high susceptibility zones, require focused monitoring and preventive measures. The southwestern region, as a moderate susceptibility zone, should implement appropriate protective measures to mitigate potential risks. The eastern and southern regions, classified as low susceptibility zones, are relatively safe, but should remain vigilant against potential hazards. The landslide susceptibility modeling strategy developed in this study not only improved the accuracy of landslide risk predictions, but also facilitated more effective landslide disaster prevention and mitigation efforts. This approach ensures the safety of residents and infrastructure within the region, while providing robust scientific support for the detailed delineation of landslide susceptibility zones and risk management.
Despite the promising results of this study, several limitations warrant consideration. Firstly, while a variety of ensemble learning models were employed, the selection of the factors was primarily based on the available geological and environmental data, which may not encompass all the relevant variables influencing landslide susceptibility. Consequently, if the model were applied to another region or country with different geological and climatic conditions, there is a significant risk that the predictions may be inaccurate or unreliable. This limitation arises because the model’s performance is heavily dependent on the quality and relevance of the input data; factors that are critical in one region may not hold the same importance in another. Future research should aim to incorporate additional data sources, such as socio-economic factors and data from real-time monitoring systems, to enrich the model inputs and enhance the predictive accuracy. Secondly, the models were evaluated within a specific geographical context, which may limit the generalizability of the findings to other regions. Therefore, it is essential to validate the developed models in diverse settings to assess their robustness and adaptability. Without such validation, there is a risk of misestimating landslide susceptibility, potentially leading to inadequate risk management strategies and jeopardizing safety in areas where the model is applied. Moreover, while the study emphasized the importance of factor engineering and multi-model integration, the intricate interactions between factors remain an area for further exploration. Future studies could investigate advanced techniques, such as deep learning and hybrid modeling approaches, to capture complex relationships among the factors that traditional ensemble methods may overlook. Additionally, incorporating uncertainty quantification methods could provide a more comprehensive understanding of the reliability of the predictions, enabling better informed decision making in landslide risk management. Finally, the reliance on historical data for model training and validation may introduce biases, particularly in regions with limited or inconsistent data availability. This could affect the model’s ability to generalize to future scenarios, especially under changing environmental or climatic conditions. Future research should consider integrating dynamic data sources and adaptive modeling frameworks to address these challenges and improve the model’s applicability to evolving conditions.
6. Conclusions
This study aimed to enhance landslide susceptibility modeling by integrating various machine learning models, including random forest, XGBoost, LightGBM, CatBoost, bagging, and stacking. The focus was on evaluating and ranking the importance of different influencing factors and employing advanced ensemble learning strategies to improve prediction accuracy and robustness. The methodology involved comprehensive factor importance analysis, stepwise factor elimination, and optimization of model performance. Homogeneous ensemble models (random forest, XGBoost, LightGBM, and CatBoost) and heterogeneous ensemble models (bagging and stacking) were utilized to assess and integrate the importance of each factor. The models were trained and tested using a dataset comprising factors such as the DEM, lithology, the distance to the river, the NDVI, and slope.
The key findings indicate that the DEM consistently emerged as the most critical factor across all the models, underscoring its pivotal role in predicting landslide susceptibility. The stacking model achieved the highest AUC value of 0.876, representing a 4.4% improvement over the other models and demonstrating superior predictive performance. The factor engineering process significantly enhanced model accuracy. For instance, the proportion of areas classified as highly susceptible by the stacking model increased to 42.54%, while the proportion classified as very low susceptibility by the XGBoost model rose from 12.96% to 13.57%. These improvements highlight the effectiveness of factor optimization in enhancing model performance. The optimized models were applied to create a detailed landslide susceptibility distribution map of the entire study area. The results indicated that the stacking model provided the most detailed distribution, clearly distinguishing between high, medium, and low susceptibility areas. The northern and central regions were classified as high susceptibility areas due to their steep terrain and concentrated rainfall, while the eastern and southern regions exhibited low susceptibility, indicating a lower landslide risk. These findings demonstrate the utility of ensemble models in producing robust and detailed landslide susceptibility maps.
In conclusion, the innovative approach of integrating multiple models and employing meticulous factor engineering strategies significantly improved the accuracy and robustness of landslide susceptibility predictions. The findings offer valuable insights for effective landslide disaster prevention and mitigation efforts, ensuring the safety of residents and infrastructure in susceptible regions. This study highlights the potential of ensemble learning techniques to enhance the reliability of landslide susceptibility assessments and provides a scientific basis for informed decision making in regard to landslide risk management.