1. Introduction
Landslides are among the most destructive natural hazards, causing significant damage to infrastructure, ecosystems, and human life, particularly in mountainous regions with steep terrain, fragile geology, and heavy rainfall [1]. Between 2004 and 2016, over 55,000 fatalities were attributed to landslides worldwide, with the highest concentration in tropical and subtropical Asia [2]. In Taiwan, vulnerability is amplified by its location along the Pacific Ring of Fire and frequent typhoons, which deliver intense seasonal rainfall [3]. Combined with active tectonics, these conditions create a high risk of slope failure.
Historic events, including the 1999 Chi-Chi Earthquake and Typhoon Morakot in 2009, underscore this susceptibility. Recently, Typhoon Khanun (August 2023) triggered widespread shallow landslides in central Taiwan. Long-term analyses using multi-seasonal Landsat imagery and nighttime light data indicate persistent landslide expansion in urban and peri-urban areas from 1998 to 2017 [4], reflecting increasing interactions between hazard processes and human settlements.
Conventional landslide mapping methods, such as field surveys and aerial photo interpretation, are labor-intensive and limited in spatial coverage. Consequently, remote sensing (RS) and digital terrain analysis integrated with geographic information systems (GIS) have become essential for large-scale landslide detection [5]. Recent studies highlight the growing use of object-based image analysis and machine learning for landslide mapping [6,7]. Algorithms like Support Vector Machines (SVM) [8], Artificial Neural Networks (ANN) [9], and Random Forest (RF) [10] have demonstrated strong performance in geospatial applications [11,12,13]. RF, in particular, is widely favored for its high accuracy, robustness to noise, and ability to model nonlinear relationships through ensemble decision trees [10].
RF has emerged as one of the most reliable and widely adopted machine learning algorithms for landslide detection, susceptibility mapping, and risk assessment, owing to its ability to capture complex, nonlinear relationships among environmental, geological, and anthropogenic factors [14,15,16,17]. Unlike traditional statistical models, RF builds an ensemble of decision trees that collectively reduce overfitting risk and improve prediction stability, even when datasets contain noise or redundant variables. Numerous studies have shown that RF achieves higher classification accuracy and predictive power than conventional methods such as logistic regression, single decision trees, and other non-ensemble algorithms [15,18]. A key strength of RF lies in its capacity to process large volumes of heterogeneous multi-source data, thereby enabling comprehensive modeling of landslide conditioning factors and enhancing the physical relevance of susceptibility maps [16,17]. RF has also demonstrated robustness when applied to highly imbalanced datasets, which are common in landslide mapping, particularly when combined with oversampling strategies such as Synthetic Minority Oversampling Technique (SMOTE) variants [19,20]. Its versatility has been demonstrated across diverse geographic contexts, from regional-scale assessments in Kerala, India [21] to highland tropical regions such as Cameron Highlands, Malaysia [14]. Collectively, these findings highlight RF’s position as a leading tool for landslide hazard assessment, capable of integrating diverse data sources and advanced optimization techniques to deliver accurate, reliable, and site-specific predictions.
While classical machine learning models like RF remain widely used, recent studies have shown that deep learning (DL) approaches, especially convolutional neural networks (CNNs), recurrent models (e.g., long short-term memory (LSTM)), and hybrid architectures, have become state-of-the-art in landslide susceptibility mapping. These methods excel at learning hierarchical spatial patterns from high-resolution imagery and large datasets. Notable recent examples include CNN–BiLSTM–Attention models [22], DL-based landslide detection using U-Net [23,24], DL-based rainfall-induced landslide prediction [25], and deep ensembles integrating vision transformers [26]. Recent work has also introduced advanced modular intelligence models, such as the Hybrid Block Neural Network (HBNN), which integrates modular neural structures with genetic algorithms to further enhance landslide susceptibility mapping [27]. However, despite achieving strong accuracy in benchmark studies, DL approaches often require extensive labeled datasets, long training times, and substantial computational resources, conditions that are rarely met immediately after a disaster. In such time-critical contexts, rapid-response mapping demands models that can be trained quickly and effectively on smaller datasets. Against this backdrop, the present study adopts classical ML methods, specifically RF combined with advanced oversampling techniques, as a practical compromise that balances accuracy and efficiency, making it particularly suitable for small-area, data-limited, post-disaster applications.
A persistent challenge in these tasks is class imbalance (also referred to as skewed class distribution, non-uniform class distribution, or disproportionate class representation), where minority classes such as landslides are underrepresented, leading to biased predictions. Technically, any dataset with unequal class distributions can be considered imbalanced; however, the community generally reserves this term for cases of significant or extreme imbalance [28]. This issue is particularly critical in post-disaster mapping, where rapid detection of landslide zones is essential despite their small spatial extent. Synthetic oversampling methods like SMOTE address this problem by generating new minority class samples via interpolation [29]. Variants such as Borderline-SMOTE, Adaptive Synthetic Sampling (ADASYN), and Geometric SMOTE (G-SMOTE) further refine sample generation near class boundaries [30,31]. These techniques have been successfully applied in environmental and geospatial studies [32,33,34], including landslide susceptibility mapping [19,35] and land use/land cover (LULC) classification in soil erosion modeling [36]. However, comparative evaluations across a broad range of SMOTE variants remain scarce, as most studies consider only a few variants and primarily target large-scale susceptibility mapping rather than detailed, site-specific classification after recent disasters. Given that SMOTE algorithms can perform differently depending on dataset characteristics, a systematic side-by-side comparison is especially important for localized, high-resolution post-disaster applications, where time-sensitive and operationally reliable mapping is essential but largely untested.
This study applies RF and 65 SMOTE-based oversampling variants to classify land cover and detect landslides in Nanfeng Village, Nantou County, following Typhoon Khanun (August 2023). It focuses on shallow rainfall-induced landslides common in Taiwan’s mountainous terrain during typhoons, characterized by rapid soil and colluvium movement on steep slopes. Differentiating landslide types is beyond the current scope but remains an important direction for future research to refine predictive variables for specific failure mechanisms. High-resolution Pléiades imagery was obtained from the Center for Space and Remote Sensing Research (CSRSR) at National Central University. Topographic derivatives (slope, aspect, curvature) came from a 20 m Digital Elevation Model (DEM) provided by Taiwan’s National Land Surveying and Mapping Center (NLSC), which was resampled to 2 m for alignment with the satellite imagery. Although resampling does not improve inherent resolution, it ensures spatial consistency for integrated analysis.
This approach combines high-resolution imagery, terrain factors, and advanced oversampling to enhance landslide detection under severe class imbalance. The integration of SMOTE with RF demonstrates practical potential for timely, accurate post-disaster mapping in complex mountainous terrain. This research is highly relevant to hazard mapping teams, local governments, and emergency management agencies in Taiwan and other mountainous regions. By determining which SMOTE variant most effectively improves landslide detection performance under severe imbalance, the findings offer an evidence-based guide for operational mapping workflows. This supports faster and more accurate post-disaster assessments, facilitates the targeted allocation of recovery resources, and strengthens the resilience of vulnerable communities. The objectives of this study are as follows:
To develop an RF-based land cover classification model capable of detecting landslides from high-resolution imagery and terrain features.
To evaluate the performance of 65 SMOTE variants in mitigating class imbalance and improving sensitivity to minority classes.
To generate a detailed, high-resolution LULC map emphasizing landslide distribution for disaster response and planning.
2. Materials and Methods
This study was conducted in Nanfeng Village, Nantou County, Taiwan, an area characterized by steep terrain and frequent slope failures (Figure 1). Elevation ranges from approximately 606 m to 2419 m above sea level, and the landscape is dissected by Mei Creek and its tributary, Nanshan Creek, forming a network of valleys and ridges. The region spans both subtropical and temperate monsoon zones, with an average annual rainfall of about 2100 mm and significant diurnal temperature variation. These conditions, combined with abundant water resources, support diverse agricultural activities, including high-mountain tea, plums, and other horticultural crops.
Nanfeng Village is highly susceptible to natural hazards due to its fragile geology, seismic history, and frequent typhoon exposure. The 1999 Chi-Chi Earthquake destabilized slopes across the region, and subsequent typhoons—such as Mindulle (2004), Sinlaku (2008), and most recently Typhoon Khanun on 3 August 2023—have repeatedly triggered severe landslides, damaging infrastructure and threatening residents.
To address the research objectives (developing an accurate post-disaster land cover classification model, assessing the effectiveness of SMOTE-based oversampling techniques, and generating detailed landslide detection maps), a workflow integrating satellite imagery, DEM-derived terrain features, and land cover information was implemented (Figure 2).
The core dataset comprises high-resolution Pléiades imagery and topographic data. The Pléiades images were acquired before and after Typhoon Khanun (26 June 2023 and 25 August 2023), providing valuable temporal context for mapping (Figure 3). Each image includes four multispectral bands, Blue (B), Green (G), Red (R), and Near-Infrared (NIR), at 2 m resolution and a panchromatic band at 0.5 m, enabling detailed land cover discrimination and landslide identification.
A 20 m-resolution DEM obtained from the NLSC was resampled to 2 m using bilinear interpolation for alignment with Pléiades imagery. While this resampling ensured spatial consistency, it may introduce interpolation artifacts that could affect terrain derivatives such as slope and curvature [37,38]. All datasets were standardized to the TWD97/TM2 coordinate reference system to ensure consistency across layers. Additionally, preprocessing steps were applied to guarantee spatial alignment of all data sources, including high-resolution Pléiades imagery, DEM-derived topographic layers, and vegetation indices. Temporal consistency was addressed by selecting satellite images immediately before (26 June 2023) and after (25 August 2023) Typhoon Khanun, minimizing seasonal variation and allowing accurate assessment of typhoon-induced changes. From the DEM, key topographic attributes were derived, including elevation, slope, aspect, curvature indices (general, profile, and plan), and terrain metrics such as the Terrain Ruggedness Index (TRI), Topographic Position Index (TPI), and roughness. These variables capture the geomorphological context influencing land cover and landslide occurrence.
In total, 22 features were prepared for classification, encompassing spectral bands, vegetation indices such as the Normalized Difference Vegetation Index (NDVI) and Soil Adjusted Vegetation Index (SAVI), a water index (Normalized Difference Water Index, NDWI), band ratios, and DEM-derived terrain variables (Table 1).
Figure 4 illustrates examples of these inputs, including elevation, slope, NDVI, and NDWI.
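The spectral indices named above can be sketched per pixel as follows. This is a minimal illustration, not the study's processing code: it assumes the standard NDVI definition, SAVI with a soil adjustment factor of 0.5, and the green/NIR form of NDWI, none of which are specified in the text. The reflectance values in `pixel` are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - R) / (NIR + R)."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index.

    soil_factor=0.5 is an assumed default; it presumes reflectance
    values scaled to the range 0..1.
    """
    denom = nir + red + soil_factor
    return (1 + soil_factor) * (nir - red) / denom if denom != 0 else 0.0


def ndwi(green, nir):
    """Normalized Difference Water Index, green/NIR form (assumed)."""
    denom = green + nir
    return (green - nir) / denom if denom != 0 else 0.0


# Hypothetical surface reflectance for one vegetated pixel.
pixel = {"green": 0.08, "red": 0.05, "nir": 0.45}
indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "NDWI": ndwi(pixel["green"], pixel["nir"]),
}
```

A freshly exposed landslide scar would typically show a much lower NDVI than the vegetated pixel used here, which is why such indices are useful inputs for detecting vegetation loss.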
LULC data were obtained from the 2023 NLSC dataset and reclassified into seven categories: farmland, forest, roads, water bodies, built-up areas, grassland, and landslides (Figure 5). The landslide class was manually delineated through visual interpretation of a post-disaster Pléiades image acquired on 25 August 2023, following Typhoon Khanun, as no official post-typhoon inventory was available. Landslides were identified directly from the image, focusing on indicators such as newly exposed bare soil, vegetation loss, and slope disturbances. A standardized interpretation protocol was applied, and random spot checks were conducted against high-resolution Google Earth imagery to reduce manual interpretation errors. Small ambiguous patches were excluded during digitization to minimize potential false positives. These annotations were used for both training and validation, ensuring that the dataset captured both typical land cover patterns and event-specific landslide disturbances.
2.1. Random Forest Classification Framework
The LULC classification model was developed to detect landslides by integrating high-resolution Pléiades imagery, DEM-derived terrain attributes, and machine learning techniques, following the workflow shown in Figure 2. The process included image preprocessing, sample preparation, handling class imbalance, RF model training, and performance evaluation.
To support model development, two datasets were constructed: a stratified training dataset and a balanced test dataset. The stratified sampling approach ensured that the training set reflected the actual land cover distribution, while the balanced test set enabled unbiased performance evaluation across all categories.
The training dataset contained 1000 samples selected proportionally to class area from the reclassified LULC classes. Class proportions were based on the 2023 national land use database, and the allocation of training and test samples by class is presented in Table 2. In contrast, the test dataset consisted of 210 samples (30 per class) selected through random sampling to achieve class balance. This dual sampling strategy provided a realistic basis for model training and a fair evaluation framework for minority classes, particularly landslides.
The RF classifier was implemented in Python 3.9 using the RandomForestClassifier from sklearn.ensemble. RF constructs multiple decision trees, each trained on a bootstrap sample of the training set, and determines splits based on the Gini impurity criterion [10]. Predictions are aggregated by majority vote, improving generalization and reducing overfitting, which makes RF suitable for heterogeneous geospatial data.
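The two mechanisms described above, Gini-based split evaluation and vote aggregation, can be illustrated with a short pure-Python sketch. It is not the scikit-learn implementation; the function names and toy labels are hypothetical. Note also that scikit-learn combines trees by averaging their predicted class probabilities rather than counting hard votes, so the `majority_vote` helper reflects the textbook description only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity_decrease(parent, left, right):
    """Impurity decrease achieved by splitting `parent` into two children."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Toy node with two classes that a single split separates perfectly.
parent = ["forest", "forest", "landslide", "landslide"]
decrease = split_impurity_decrease(parent, parent[:2], parent[2:])
vote = majority_vote(["forest", "landslide", "forest"])
```

A tree prefers the candidate split with the largest impurity decrease; here the split removes all impurity from the parent node.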
In this study, the RF model was configured with the following hyperparameters: n_estimators = 100 and max_depth = 5.
These parameters were fixed across all experiments for consistency when comparing SMOTE variants.
The RF model used the 22 input features previously described in Table 1, including Pléiades spectral bands, band ratios, vegetation and water indices, and DEM-derived terrain attributes. These features were selected for their relevance to land cover discrimination and landslide susceptibility.
Feature importance analysis was performed to quantify the contribution of each input variable to the classification process. The analysis utilized the built-in functionality of the RF model implemented in the scikit-learn library. Feature importance was computed using the Gini importance metric [10], also referred to as mean decrease in impurity. This approach measures the total decrease in Gini impurity attributable to each feature across all decision trees in the ensemble. The impurity decrease is accumulated for each feature over all internal nodes where the feature is used and is then averaged across the forest. The resulting scores were normalized such that the total importance across all features equals 1, allowing for direct comparison of relative contributions. In the broader domain of intelligent modeling, feature analysis is often conducted using the weight database of the optimum model in combination with sensitivity analysis techniques and feature selection approaches [39,40,41]. By contrast, in this study, we relied on the Random Forest’s built-in implementation, where feature importance is derived directly from the Gini importance function in scikit-learn [10].
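For reference, the mean decrease in impurity described above can be written compactly. The notation is introduced here for illustration and does not come from the original text: $p_k(t)$ is the share of class $k$ at node $t$, $N_t$ the number of samples reaching $t$, $N$ the number of samples at the root, $t_L$ and $t_R$ the child nodes, $v(t)$ the feature used to split at $t$, and $T$ the number of trees. This is a simplified sketch; library implementations may differ in the order of normalization and averaging.

```latex
G(t) = 1 - \sum_{k=1}^{n} p_k(t)^2,
\qquad
\Delta G(t) = \frac{N_t}{N}\left[ G(t) - \frac{N_{t_L}}{N_t}\,G(t_L) - \frac{N_{t_R}}{N_t}\,G(t_R) \right]

I_j = \frac{1}{T} \sum_{\tau=1}^{T} \; \sum_{t \in \tau,\; v(t) = j} \Delta G(t),
\qquad
\hat{I}_j = \frac{I_j}{\sum_{j'} I_{j'}}
```

The normalized scores $\hat{I}_j$ sum to 1 across features, which is what allows the relative contributions to be compared directly.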
2.2. Class Imbalance Mitigation Using SMOTE
Dataset imbalance is characterized by disproportionate sample sizes across classes, which can bias machine learning models towards majority classes, reducing detection accuracy for minority classes. Severe class imbalance posed a key challenge in this study. Landslides accounted for only 0.5% of the training samples (5 out of 1000), while forest dominated with over 87% (Table 2). Without correction, the model would likely favor majority classes, underperforming in detecting landslides.
To address this, we applied synthetic oversampling techniques using the smote-variants Python package (version 0.7.3) [42], which implements over 65 SMOTE-based algorithms. These include the original SMOTE [29], Borderline-SMOTE [30], ADASYN [31], and hybrid approaches such as SMOTE-Tomek Links [43]. Each method was applied to the same imbalanced training dataset, and the resulting models were evaluated on the fixed balanced test set (210 samples).
For each SMOTE variant, an RF model was trained with identical hyperparameters (n_estimators = 100, max_depth = 5) and evaluated using class-wise F1-scores. The variant that produced the best overall and minority-class performance was selected for final map generation.
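To make the interpolation idea behind SMOTE concrete, the sketch below generates synthetic minority samples in pure Python. It illustrates the basic SMOTE principle only. It is not the smote-variants package, and it does not reproduce Distance_SMOTE or any other specific variant evaluated in this study. The function name, the two-feature toy samples, and the choice of k are hypothetical; the 868 requested samples correspond to raising 5 landslide samples to the 873 of the forest class.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        # Real workflows would usually scale features before this step.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight, uniform in [0, 1)
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Five hypothetical landslide samples described by (NDVI, slope in degrees).
landslide = [(0.10, 32.0), (0.12, 35.0), (0.08, 40.0), (0.15, 38.0), (0.11, 30.0)]
new_points = smote_sketch(landslide, n_new=868, k=3)
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay within the observed feature range. This is also why synthetic data cannot add variability that the original five samples do not contain.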
2.3. Performance Evaluation Metrics
To comprehensively assess classification performance, this study employed widely used metrics: Overall Accuracy (OA), Kappa coefficient (κ), Producer’s Accuracy (PA), User’s Accuracy (UA), and F1-score. OA measures the proportion of correctly classified samples, while κ (Cohen’s kappa index [44]) accounts for chance agreement. PA and UA indicate class-level recall and precision, respectively, and F1-score combines both in a harmonic mean, offering a balanced measure of accuracy for each class.
Overall Accuracy (OA): The proportion of correctly classified samples to the total number of samples $N$:
$$\mathrm{OA} = \frac{1}{N} \sum_{i=1}^{n} x_{ii}$$
where $x_{ii}$ is the number of correctly classified samples in class $i$, and $n$ is the number of classes.
Kappa Coefficient (κ): A measure of agreement corrected for chance, calculated as:
$$\kappa = \frac{N \sum_{i=1}^{n} x_{ii} - \sum_{i=1}^{n} x_{+i}\, x_{i+}}{N^{2} - \sum_{i=1}^{n} x_{+i}\, x_{i+}}$$
where $x_{+i}$ and $x_{i+}$ represent the total number of samples predicted as and actually belonging to class $i$, respectively.
Producer’s Accuracy (PA): Also known as recall, it measures the proportion of correctly predicted samples out of all actual samples in class $i$:
$$\mathrm{PA}_i = \frac{x_{ii}}{x_{i+}}$$
User’s Accuracy (UA): Also known as precision, it measures the proportion of correct predictions in class $i$ among all samples predicted as class $i$:
$$\mathrm{UA}_i = \frac{x_{ii}}{x_{+i}}$$
F1-Score: The harmonic mean of PA and UA for class $i$:
$$\mathrm{F1}_i = \frac{2 \times \mathrm{PA}_i \times \mathrm{UA}_i}{\mathrm{PA}_i + \mathrm{UA}_i}$$
All metrics were calculated using the confusion matrix from the balanced test set to provide unbiased performance comparisons. Particular attention was given to the landslide class F1-score, reflecting the model’s capability to detect this critical minority class.
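As an illustration of how these metrics follow from a confusion matrix, the sketch below computes OA, kappa, PA, UA, and F1 in pure Python. The function name and the two-class matrix are hypothetical and are not taken from Table 3 or Table 4. Rows are assumed to hold actual classes and columns predicted classes.

```python
def classification_metrics(cm):
    """Compute OA, kappa, and per-class PA, UA, F1.

    cm[i][j] is the number of samples of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    actual = [sum(cm[i]) for i in range(n)]                          # row totals
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column totals

    oa = sum(diag) / total
    # Expected chance agreement from the row and column marginals.
    pe = sum(a * p for a, p in zip(actual, predicted)) / total ** 2
    kappa = (oa - pe) / (1 - pe) if pe != 1 else 0.0

    pa = [d / a if a else 0.0 for d, a in zip(diag, actual)]     # recall
    ua = [d / p if p else 0.0 for d, p in zip(diag, predicted)]  # precision
    f1 = [2 * r * q / (r + q) if (r + q) else 0.0 for r, q in zip(pa, ua)]
    return {"oa": oa, "kappa": kappa, "pa": pa, "ua": ua, "f1": f1}


# Hypothetical balanced test set with 30 samples per class.
cm = [
    [27, 3],   # actual class 0: 27 correct, 3 missed
    [1, 29],   # actual class 1: 1 false alarm for class 0, 29 correct
]
metrics = classification_metrics(cm)
```

For this toy matrix, class 0 has a PA of 27/30 and a UA of 27/28, showing how a class can have few false alarms while still missing some actual samples.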
3. Results and Discussion
This section presents the evaluation of the LULC classification and landslide detection models. We begin by assessing model performance on the original dataset, followed by a comparison with SMOTE-based oversampling methods. Lastly, we examine changes in landslide distribution before and after Typhoon Khanun.
3.1. Model Performance Using Original Dataset
The RF model was initially evaluated using the original dataset, which contained 1000 samples and exhibited class imbalance across the seven land cover categories. To ensure robust assessment of the model’s generalization capability, particularly for underrepresented classes, a balanced test set was constructed containing 210 samples, with 30 instances from each land cover class. This balanced evaluation approach is critical in land cover classification studies, as it provides unbiased estimates of model performance across all classes regardless of their representation in the training data. The classification results without applying SMOTE are summarized in the confusion matrix (Table 3), indicating an OA of 0.74 and a κ of 0.69, reflecting substantial agreement beyond chance. Although these values suggest acceptable overall reliability, they mask significant variability among individual land cover classes. This variability is largely driven by the inherent class imbalance in the training dataset and spectral similarity between certain land cover types.
The landslide class achieved the highest accuracy, with a PA of 0.90 and a UA of 0.96, resulting in an F1-score of 0.93. This indicates the model was highly effective at both detecting and correctly predicting landslides. Water bodies also performed strongly, with PA and UA values contributing to an F1-score of 0.86, reflecting consistent model reliability for this class. The forest class showed perfect PA (1.00), meaning all actual forest pixels were correctly classified; however, its lower UA (0.60) indicates a notable rate of false positives, suggesting that some non-forest areas were incorrectly classified as forest.
The built-up class showed moderate performance, with a PA of 0.70 and a UA of 0.72, leading to an F1-score of 0.71. In contrast, grassland exhibited the weakest performance: despite a perfect UA of 1.00 (indicating high precision), its low PA of 0.30 reflects poor recall and suggests many actual grassland instances were missed. The road class displayed a similar imbalance, with a relatively high UA of 0.89 but a lower PA of 0.53. These discrepancies indicate that while some classes were easily distinguishable (e.g., water and landslides), others, particularly grassland and roads, suffered from misclassification, likely due to overlapping spectral signatures with farmland and forest. This finding underscores the limitations of using an imbalanced dataset for land cover classification and highlights the need for an effective resampling strategy to improve performance for underrepresented classes.
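As a worked check, the F1-scores implied by the reported PA and UA values can be recomputed from the harmonic-mean definition. The figures below use the rounded PA and UA values quoted in the text, so small rounding differences from scores computed on raw counts are possible.

```latex
\mathrm{F1}_{\text{grassland}} = \frac{2 \times 0.30 \times 1.00}{0.30 + 1.00} = \frac{0.60}{1.30} \approx 0.46
\qquad
\mathrm{F1}_{\text{forest}} = \frac{2 \times 1.00 \times 0.60}{1.00 + 0.60} = \frac{1.20}{1.60} = 0.75
```

Both classes show how the harmonic mean penalizes a gap between recall and precision: a perfect score on one measure cannot compensate for a weak score on the other.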
3.2. Effectiveness of SMOTE-Based Oversampling for Model Enhancement
To improve model performance under imbalanced data conditions, this study evaluated a comprehensive set of 65 SMOTE-based oversampling techniques implemented in the smote-variants Python library. The objective was to identify the most effective method for enhancing classification accuracy, especially for the landslide class, without degrading performance on majority classes.
All 65 SMOTE variants were applied to the same original dataset, and RF models were trained using each oversampled dataset. Model performance was evaluated on a fixed, balanced test dataset containing 30 samples per class.
Figure 6 presents the OA and κ for all variants, providing a comparative view of their effectiveness.
Among all tested methods, Distance_SMOTE emerged as the best-performing technique with an OA of 0.85 and a κ of 0.82. These values represent a substantial improvement compared with the baseline model without oversampling, which achieved an OA of 0.74 and a κ of 0.69. Other top-performing methods included NT_SMOTE and G_SMOTE, each achieving OA values above 0.83 and κ values above 0.80. This analysis clearly demonstrates that synthetic oversampling can substantially enhance both overall classification accuracy and agreement beyond chance.
To further illustrate the benefits of synthetic balancing, we compared two RF training scenarios: (i) the original imbalanced dataset and (ii) a balanced dataset generated using Distance_SMOTE. Prior to applying SMOTE, the training data exhibited severe imbalance, with the forest class containing 873 samples, whereas the landslide and road classes had only 5 and 8 samples, respectively. After applying Distance_SMOTE, all classes were balanced to 873 samples, ensuring equal representation during model building.
Table 4 summarizes the confusion matrix of the RF model trained on the Distance_SMOTE-enhanced dataset. Compared with the original RF model (Table 3), the OA increased from 0.74 to 0.85, and κ improved from 0.69 to 0.82, indicating stronger overall agreement and reduced misclassification rates.
Class-level improvements were significant. The grassland class, previously the weakest performer, showed an increase in F1-score from 0.46 to 0.78, demonstrating markedly better recall and precision. Similarly, the roads class improved from 0.67 to 0.85, and farmland rose from 0.68 to 0.81. The forest class improved from 0.75 to 0.90, while the built-up class remained stable with moderate gains. Even minority classes such as landslides, which initially performed well, exhibited a slight increase in F1-score from 0.93 to 0.97, and water maintained a high level of accuracy. These enhancements confirm that Distance_SMOTE not only elevated performance for underrepresented classes but also preserved strong performance for majority classes.
Our results compare favorably with recent studies that addressed skewed class distributions in landslide mapping. For example, Lu et al. [19] applied four resampling methods to an imbalanced landslide dataset on Penang Island and reported that an RF + SMOTE-ENN (Edited Nearest Neighbor) model achieved a recall of 0.844 and an F2-score of 0.756, underscoring the value of oversampling for sensitivity to landslides. Similarly, Gupta and Shukla [35] used EasyEnsemble and BalanceCascade with SVM/ANN and reported AUC values up to 0.923 for the BCANN model, demonstrating substantial gains after rebalancing. In our case, the optimized RF combined with Distance_SMOTE achieved an overall accuracy of 0.85, a kappa of 0.82, and an F1-score of 0.97 for landslides, which are comparable to or exceed those reported in the literature. This indicates that systematically evaluating a wide range of SMOTE variants can yield substantial improvements for post-disaster, high-resolution landslide mapping where rapid and reliable results are critical.
In summary, applying Distance_SMOTE-based oversampling improved or maintained accuracy across the land cover categories, with the largest gains for classes that were previously underrepresented. These findings underscore the critical role of advanced data balancing techniques in improving land cover classification in heterogeneous and complex terrains.
Feature importance analysis of the RF model trained with Distance_SMOTE (
Figure 7) reveals clear patterns in variable contributions to classification accuracy. As explained in Section 2.1, the feature importance values shown in
Figure 7 are derived from the built-in Gini importance metric of the Random Forest model. The NIR band emerged as the most influential feature, followed closely by Roughness and TRI, underscoring the critical role of both spectral and topographic variables in capturing landslide-prone areas. Slope and the Red band ranked next, highlighting the importance of terrain gradients and visible spectrum information for differentiating land cover classes. Vegetation-related indices such as NDVI and SAVI, along with band ratios (e.g., Red/NIR), also exhibited strong influence, indicating their value in detecting vegetation disturbance and bare soil exposure commonly associated with landslides. In contrast, curvature-based metrics (general, profile, and plan curvature) and TPI displayed minimal contribution, suggesting that micro-topographic variations are less informative compared with broader terrain and spectral attributes. This ranking emphasizes that a combination of NIR reflectance, terrain ruggedness, and vegetation indicators provides the most discriminative power for accurate LULC classification with landslide detection, while features with low importance may be candidates for dimensionality reduction in future modeling efforts.
3.3. LULC Prediction Map and Landslide Distribution Change Before and After Typhoon Khanun
Building on the model selection results, the RF classifier trained with Distance_SMOTE was used to predict LULC classes and landslide areas for Nanfeng Village. To assess the model’s capability for landslide detection before and after a major event, predictions were performed on two Pléiades satellite images acquired on 26 June 2023 and 25 August 2023 (Figure 3), capturing the landscape before and after Typhoon Khanun.
Figure 8 illustrates the spatial distribution of the six selected sample areas (A–F) within the study site. These boxes were chosen to represent locations where landslides were observed or likely to occur due to factors such as proximity to creek corridors, tributary junctions, and steep slopes. Boxes A, B, C, and D are situated in the northern part of Nanfeng Village, while Boxes E and F are in the southern section. This selection ensured representation of diverse geomorphic settings most affected by intense rainfall events.
Figure 9 provides a detailed side-by-side comparison of each box before and after Typhoon Khanun, with red overlays representing landslides predicted by the model. The left panels (a, c, e, g, i, k) correspond to pre-typhoon imagery from 26 June 2023, and the right panels (b, d, f, h, j, l) show post-typhoon imagery from 25 August 2023. This visual comparison highlights both the persistence of pre-existing landslides and the occurrence of new failures caused by the typhoon.
Notable patterns include the following:
Box A: Significant lateral expansion of an existing landslide along a creek corridor.
Box B: Enlargement of scars near drainage lines, merging into broader disturbed zones.
Box C: Multiple small slides coalescing into elongated failures along steep slopes adjacent to creeks.
Box D: Formation of new landslides on previously undisturbed vegetated slopes.
Box E: Fresh landslides near slope toes adjacent to tributary streams.
Box F: Extensive new failures forming elongated scars along steep southern slopes.
The application of the Distance_SMOTE and RF model to the two Pléiades images demonstrates robust temporal generalization, accurately detecting both pre-existing landslides and new or enlarged areas following Typhoon Khanun. Observations from Boxes A, C, and F confirm the model’s ability to capture incremental changes as well as abrupt slope failures, underscoring the strong geomorphic response of the terrain to the typhoon, which destabilized slopes in both headwater regions and lower channels. The substantial post-typhoon increase in landslide extent within these boxes validates the model’s effectiveness for identifying both the expansion and initiation of landslides under extreme rainfall, highlighting the suitability of the RF + Distance_SMOTE approach for dynamic landslide hazard assessment.
3.4. Limitations and Future Research Directions
Our findings confirm that the application of SMOTE variants can significantly enhance the accuracy of landslide susceptibility models by addressing class imbalance issues. This improvement aligns with earlier works (e.g., [19,32]), which demonstrated that oversampling techniques effectively boost model sensitivity and overall performance. In this study, we conducted a comprehensive evaluation of 65 SMOTE variants, far exceeding the scope of most previous research, to provide a broader understanding of their relative effectiveness. The results indicate that certain variants, such as Distance_SMOTE, achieved higher accuracy than many commonly used alternatives. This systematic comparison is rare in the existing literature and highlights the importance of selecting appropriate oversampling methods to improve predictive performance in localized post-disaster landslide mapping. Although the combination of Distance_SMOTE and RF improved post-disaster LULC and landslide mapping, several limitations remain.
First, the landslide class in the original training data was extremely underrepresented, requiring synthetic oversampling. While SMOTE-based methods mitigated imbalance, synthetic samples may not fully capture real-world heterogeneity, especially in complex mountainous terrain, which could affect generalizability. Another limitation arises from the manual extraction of landslide inventories from Pléiades imagery, which, despite cross-checking with Google Earth, may still be subject to interpreter bias. Such biases could influence the accuracy of the reference data used for training and evaluation.
Second, despite efforts to maintain spatial independence between training and test sets, residual spatial clustering might still influence model performance. Future studies should apply spatial cross-validation and incorporate physically relevant variables—such as rainfall, lithology, soil properties, and historical landslide inventories—to better represent underlying processes.
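As an illustration of the spatial cross-validation suggested above, the following minimal sketch groups samples into square blocks and assigns whole blocks to folds, so that no block contributes to both training and testing. The function name `spatial_block_folds`, the 200 m block size, and the synthetic coordinates are assumptions made for illustration and are not part of this study's workflow.

```python
import numpy as np

def spatial_block_folds(x, y, block_size, n_folds, seed=0):
    """Assign each sample to a CV fold via the square block that contains it."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Integer block indices along each axis.
    bx = np.floor(x / block_size).astype(np.int64)
    by = np.floor(y / block_size).astype(np.int64)
    # Label identical (bx, by) pairs with one block id per sample.
    blocks = np.stack([bx, by], axis=1)
    _, block_id = np.unique(blocks, axis=0, return_inverse=True)
    block_id = block_id.ravel()
    n_blocks = int(block_id.max()) + 1
    # Shuffle the blocks, then deal them round-robin into folds.
    rng = np.random.default_rng(seed)
    fold_of_block = np.empty(n_blocks, dtype=np.int64)
    fold_of_block[rng.permutation(n_blocks)] = np.arange(n_blocks) % n_folds
    return fold_of_block[block_id], block_id

# Hypothetical sample coordinates (metres) inside a 1 km x 1 km area.
rng = np.random.default_rng(42)
x = rng.uniform(0, 1000, size=500)
y = rng.uniform(0, 1000, size=500)
folds, block_id = spatial_block_folds(x, y, block_size=200.0, n_folds=5)

for k in range(5):
    test = folds == k
    train = ~test
    # Every block lies entirely on one side of the split.
    assert not set(block_id[train]) & set(block_id[test])
```

Block-wise folds reduce, but do not eliminate, spatial dependence, because samples on either side of a block edge remain close to each other. The block size would ideally be chosen relative to the autocorrelation range of the predictors.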
Third, this analysis focused on a single event (Typhoon Khanun) in one locality, which limits transferability. Applying the approach to other regions and events is essential to assess scalability; multi-event training datasets and the inclusion of lithology, soil properties, and hydrological factors should improve model robustness and generalizability. The absence of geological and geotechnical variables in this study further constrains applicability to diverse terrains. In addition, the study does not distinguish between landslide types; future research should incorporate detailed classifications of landslide mechanisms (e.g., shallow vs. deep-seated failures, debris flows) to enhance the physical relevance of predictive models. Nor did this study benchmark against DL or hybrid models, which may offer improved performance. Comparative analyses involving cost-sensitive learning, ensemble resampling, or Generative Adversarial Network (GAN)-based augmentation could provide further insights.
Fourth, while performance metrics were reported, no spatial error analysis was conducted. Misclassifications may be concentrated near class boundaries or in shadowed terrain; examples include confusion between farmland and grassland and occasional misclassification of roads due to their narrow geometry. Mapping false positives and false negatives would help link statistical errors to geomorphic conditions.
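A spatial error analysis of this kind could be sketched as follows: each pixel is coded as a true negative, true positive, false positive, or false negative for the landslide class, and the error share is then summarized by bins of a terrain covariate such as slope. The function names, class code, tiny arrays, and the 30° threshold are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

def error_map(reference, predicted, target_class):
    """Code each pixel as 0=TN, 1=TP, 2=FP, 3=FN for one class."""
    ref = np.asarray(reference) == target_class
    pred = np.asarray(predicted) == target_class
    out = np.zeros(ref.shape, dtype=np.uint8)
    out[ref & pred] = 1
    out[~ref & pred] = 2
    out[ref & ~pred] = 3
    return out

def error_rate_by_bin(err, covariate, bin_edges):
    """Share of FP + FN pixels within each covariate bin (e.g., slope)."""
    is_error = ((err == 2) | (err == 3)).astype(float)
    bins = np.digitize(np.asarray(covariate), bin_edges).ravel()
    n_bins = len(bin_edges) + 1
    total = np.bincount(bins, minlength=n_bins)
    wrong = np.bincount(bins, weights=is_error.ravel(), minlength=n_bins)
    # Bins without pixels are reported as NaN rather than 0.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, wrong / total, np.nan)

# Hypothetical 2 x 3 rasters; class 1 stands for "landslide".
reference = np.array([[1, 1, 0], [0, 0, 0]])
predicted = np.array([[1, 0, 0], [1, 0, 0]])
slope_deg = np.array([[10, 40, 40], [40, 10, 10]])

err = error_map(reference, predicted, target_class=1)
rates = error_rate_by_bin(err, slope_deg, bin_edges=[30])
```

In this toy case both errors fall in the steeper bin. On real data, the same table could be computed for shadow masks or distance to class boundaries.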
Fifth, parameter tuning for SMOTE (e.g., the number of nearest neighbors) and RF (e.g., number of trees, maximum depth) was not explored. Default settings were retained for consistency across the 65 SMOTE variants. However, parameter optimization and sensitivity analysis could further improve model robustness and reduce overfitting.
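A possible starting point for such a sensitivity analysis is a cross-validated grid search over the RF parameters, sketched below with scikit-learn on synthetic, imbalanced binary data. The grid values, the F1 scoring choice, and the synthetic dataset are assumptions, and the binary setup simplifies the multi-class problem addressed in this study. Tuning the SMOTE neighbor count in the same way would require a resampling-aware pipeline, so that oversampling is applied only within the training folds; that step is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for an imbalanced dataset (about 10% minority class).
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=4,
    weights=[0.9, 0.1],
    random_state=0,
)

# Hypothetical grid; real ranges would depend on data size and feature count.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5, 10]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # F1 of the minority (positive) class
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
# Per-setting mean scores support a simple sensitivity check.
mean_scores = search.cv_results_["mean_test_score"]
```

Comparing the spread of `mean_scores` across the grid indicates how sensitive the classifier is to these settings.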
Sixth, DEM resampling from 20 m to 2 m ensured alignment with Pléiades imagery but introduced potential interpolation artifacts, which may affect terrain derivatives such as slope and curvature. Future research should evaluate the impact of DEM quality on classification accuracy.
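The effect of resampling on terrain derivatives can be illustrated with a synthetic example. The sketch below upsamples a planar 20 m ramp to 2 m by nearest-neighbor repetition and compares the resulting slopes. Nearest-neighbor is used here as a deliberately unfavorable case and is not necessarily the method applied in this study; the function names and the ramp are hypothetical.

```python
import numpy as np

def upsample_nearest(dem, factor):
    """Repeat every cell factor x factor times (no smoothing)."""
    return np.repeat(np.repeat(dem, factor, axis=0), factor, axis=1)

def slope_degrees(dem, cell_size):
    """Slope from finite differences (central in the grid interior)."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Hypothetical 20 m DEM: a planar ramp rising 20 m per cell (true slope 45 degrees).
coarse = np.tile(np.arange(5, dtype=float) * 20.0, (5, 1))
fine = upsample_nearest(coarse, 10)  # 2 m cells

slope_coarse = slope_degrees(coarse, 20.0)
slope_fine = slope_degrees(fine, 2.0)

# Fraction of 2 m cells that appear perfectly flat after resampling.
flat_share = np.mean(slope_fine == 0.0)
print(slope_coarse.max(), slope_fine.max(), flat_share)
```

The coarse grid returns 45° everywhere, whereas the resampled grid is flat in most cells and far steeper along the former cell edges. Smoother interpolators reduce this staircase effect but cannot add terrain detail that the 20 m source lacks.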
Overall, this study provides a timely demonstration of post-disaster mapping in Taiwan using widely available satellite and elevation data. Addressing the above limitations is essential for developing more physically grounded and transferable machine learning frameworks for landslide susceptibility assessment. In particular, incorporating spatial uncertainty quantification and benchmarking interpretable models against deep or hybrid architectures may help balance predictive accuracy with explainability in geospatial hazard applications. Looking forward, novel approaches such as automated hybrid ensemble-based deep learning, integration with 3D geo-models, and explicit uncertainty analysis [45] represent promising directions for enhancing both accuracy and reliability in post-disaster landslide mapping.