Assessing Debris-Flow Susceptibility at Local and Global Scales: A Deep-Learning-Based Comparative Study of Sichuan, China, and Worldwide

Nienkötter, Andreas; Bian, Ang; Di, Baofeng; Li, Jierui; Deng, Tian

doi:10.3390/rs18091442


by Andreas Nienkötter ¹, Ang Bian ²,*, Baofeng Di ¹,³, Jierui Li ¹ and Tian Deng ⁴

¹ Institute for Disaster Management and Reconstruction, Sichuan University-Hongkong Polytechnic University, Chengdu 610000, China

² School of Computer and Software Engineering, Xihua University, Chengdu 610000, China

³ Center for Archaeological Science, Sichuan University, Chengdu 610000, China

⁴ School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430000, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(9), 1442; https://doi.org/10.3390/rs18091442

Submission received: 17 March 2026 / Revised: 24 April 2026 / Accepted: 2 May 2026 / Published: 6 May 2026

(This article belongs to the Special Issue Remote Sensing for Landslide Investigations: Mapping, Monitoring and Forecasting)


Highlights

What are the main findings?

We constructed two large publicly available remote-sensing datasets consisting of the most important geological, meteorological, and other features for global and local debris-flow susceptibility based on a novel two-step negative sample generation scheme.
A unified end-to-end deep learning framework is proposed for debris-flow susceptibility using remote sensing images instead of manually crafted feature computing. Our work achieves state-of-the-art debris-flow susceptibility on both global and local scales. This cross-scale comparison also reveals that the impact of feature selection for multi-scale debris-flow susceptibility can be automatically balanced by deep learning.

What are the implications of the main findings?

Our datasets and unified deep learning framework provide a foundation for multi-scale debris-flow susceptibility research. Our end-to-end learning framework also shows that state-of-the-art debris-flow susceptibility is possible using a singular remote sensing image-based data generation scheme, leading to the possibility for rapid deployment in debris-flow-affected regions, without expensive feature collection and preparation.
Cross-scale feature importance comparison highlights that different scales focus on different features but can be automatically balanced by deep learning. This finding fills a critical gap in current debris-flow susceptibility analysis, which focuses on singular scales or regions using highly divergent, manually crafted feature combinations of various scales. Hence, our work will encourage the development of debris-flow susceptibility methods using advanced deep learning technologies.

Abstract

Debris flows pose a significant global geohazard, causing a large number of deaths and infrastructure damage every year. Effective protection and land-use planning in the affected regions requires understanding susceptibility to these events. Although debris flows are a global phenomenon, previous studies have focused extensively on local areas with specialized models and accordingly complex feature selections. In this study, we investigate whether a unified debris-flow susceptibility prediction paradigm can be achieved regardless of regional scale, using only very few global public remote sensing data sources. To this end, this work contributes in the following ways: (1) A novel two-step negative sample generation scheme is proposed, and two open debris-flow datasets are constructed based on global debris-flow locations and locations in Sichuan, China. (2) An open-source end-to-end machine learning platform using remote sensing features directly is proposed, which achieves state-of-the-art results with 0.947 and 0.957 AUC on the global and local scales, respectively, compared to 0.88 for previous methods on the same location data, while using far fewer features. (3) A comparative feature importance analysis shows that, despite the significant difference in feature distribution between the global and local datasets, the scale-level gap can be alleviated by leveraging advanced deep learning technologies. This allows our unified framework to be easily applied to any regional study of debris-flow susceptibility prediction.

Keywords:

debris-flow susceptibility; geohazards; multi-scale; deep learning

1. Introduction

Debris flows are a globally occurring natural geohazard consisting of a loose mass of debris, such as soil, rock, wood, or other material mixed with water, traveling down a slope. As a mixture of solid and fluid material, they are very dangerous to people and property alike: they are nearly as dense as rock avalanches, yet flow nearly as fast as water, at up to 35 km/h [1]. As a result, they cause extensive casualties and economic damage worldwide [2,3,4]. Because of their uncertain nature, land-use planning in the affected regions is severely restricted. A first step towards improving planning and warning against debris flows is susceptibility mapping, i.e., detecting the affected regions and conditions for upcoming debris flows. Susceptibility mapping can be used to find safe regions inside debris-flow-affected areas for development, as well as to analyze the (positive and negative) effects of the development on the susceptibility of the developed area.

Although a global phenomenon, as seen in the distribution of debris flows in the NASA global landslide catalog [5] shown in Figure 1, debris-flow susceptibility has been previously studied extensively in local areas, such as regions in China [6,7,8,9,10], the USA [11], France [12], Italy [13], Iran [14], and many others [2,15]. Recently, there were approaches for designing a global model using statistical modeling methods [16]. Interestingly, the current research on debris-flow susceptibility employs methods such as Gradient Boosting Trees [8], statistical modeling [16], Support Vector Machine [17], ANN [18], and other classic machine learning methods [7,14].

Deep learning, AI, and convolutional neural networks have recently gained wide popularity due to their impressive performance in a variety of tasks, but so far have found only limited application in susceptibility prediction of debris flows. Ref. [19], for example, employs shallow convolutional neural networks using hand-crafted features for the susceptibility of debris flows and other geohazards in Pakistan, while other works, such as [20], focus on landslides. More common are applications in related tasks such as simulation of impact on buildings [21] or simulation of the affected area [22].

An overview of the listed study regions, including scale, number of geohazard location samples, features, and methods, is given in Table 1. As can be seen, the current research overwhelmingly focuses on different local regions around the globe, with each method focusing on different models and features, often involving a large amount of manually collected and computed features. In this work, we therefore focus on three questions: (1) Can we use a simple set of publicly available remote sensing data for high-quality debris-flow susceptibility judgment regardless of the region or scale differences? (2) Can common deep learning models be used for high-quality debris-flow susceptibility, without the need for hand-crafted features as in previous works? (3) How does feature importance vary across small-scale local and large-scale global susceptibility modeling, even with the identical feature inputs?

To answer these questions, we contribute in the following ways: (1) Global and local debris-flow susceptibility research datasets are constructed, based on a novel two-step negative sample generation scheme. These datasets, including 6735 global debris-flow locations, 3290 locations from Sichuan, China, and their associated remote sensing features, are publicly available. (2) A unified end-to-end machine learning framework is proposed for debris-flow susceptibility prediction, where remote sensing images instead of handcrafted features are utilized directly. Our framework facilitates the easy deployment of both traditional machine learning and state-of-the-art deep learning methods for both global and any regional hazard studies. (3) Comprehensive experiments and analysis show that the state-of-the-art prediction accuracy is achieved under our framework on both global and local scales using only six publicly accessible remote sensing sources. Moreover, the prediction and feature impact tests on both datasets demonstrate the superior robustness of deep learning against the feature distribution shift in different scales.

The paper is structured as follows. We will first show the framework, study area, dataset generation, and end-to-end deep learning framework in Section 2. Results on both local and global scales are shown in Section 3. Section 4 follows up with a discussion of the results, including a comparative feature analysis. Finally, the work is concluded in Section 5.

2. Materials and Methods

The research framework used in this work is shown in Figure 2. After a brief summary of the study area in Section 2.1, the novel two-step dataset generation scheme is shown in Section 2.2. Then, the thorough machine learning model evaluation framework is detailed in Section 2.3. Finally, data and code availability are discussed in Section 2.4.

2.1. Overview of the Study Area

As a global geohazard, debris flows affect the population and infrastructure of all populated continents, and they occur in a large variety of topographies and climates. This can be seen in the distribution of recorded debris flows in the NASA global landslide catalog [5] as shown in Figure 1. This inventory contains 39,633 geohazards of various types, 321 of which are directly classified as debris flow. Following the same reasoning as [16], we include locations marked as “mudflow”, “earthflow” or similar in our dataset, as these terms are common misnomers for debris flow. This raises the total number of locations to 6735. As suggested by [23], there may be a sampling bias in this data, as debris flows are more likely to be reported near populated areas. Nevertheless, it should serve as a good foundation for global debris-flow susceptibility analysis due to its large coverage area.

China is one of the countries most affected by debris flows worldwide, with approximately 50,000 known debris-flow locations distributed over 48% of the country. Within China, Sichuan province, shown in Figure 3, is one of the most affected regions, with approximately 90 deaths annually between 2000 and 2016 [8]. The province is located in China’s southwest in the upper reaches of the Yangtze River, between latitudes 26°03′N and 34°19′N and longitudes 97°21′E and 108°33′E. It covers an area of 486,000 square kilometers with altitudes ranging from 212 m to 6904 m above sea level. The province contains a wide range of topographies, ranging from mountainous areas and basins to plateaus [10].

Meteorologically, the study area is divided between the subtropical monsoon climate and the plateau mountain climate. The monsoon season is between April and October, with an annual average rainfall of 1000 mm. Average temperatures range from 3–8 °C in January to 25–29 °C in July [10].

Three active faults run through the study area, namely the Longmenshan fault, the Xianshuihe fault, and the Anninghe fault. Geological hazards are frequently triggered due to complex geological and geomorphological characteristics caused by the interactions between the Qinghai–Tibet Plateau in the west and the Sichuan Basin in the east [10]. Landslides and debris flows, often caused by earthquakes, are common. Additionally, environmental change and human activity increase the magnitude and frequency of debris flows in the area [8].

Our local research area includes 3290 debris-flow locations collected from 1949 to 2017 in the Sichuan province, China [8], as shown in Figure 3. Similarly to the global data, it is not guaranteed to be free from biases, but the large time span and coverage area inside Sichuan serve as a good foundation for local debris-flow susceptibility analysis.

2.2. Dataset Generation

Given debris-flow locations such as above, two ingredients are needed for the construction of datasets for deep-learning-based susceptibility models. First, negative samples of low-susceptibility regions are essential for discriminative model learning. Second, a unified input feature standardization is required for end-to-end learning and fair comparison. Our solution for the problem of missing negative samples is proposed in Section 2.2.1. The remote sensing image patch features and the standardization details for both global and local scales are then presented in Section 2.2.2.

2.2.1. Negative Sample Generation

Since both datasets only contain locations where debris flows happened, we propose a two-step negative sample generation scheme to generate an appropriate number of negative samples for each dataset. This scheme is designed to ensure both the spatial and temporal closeness of negative samples with debris-flow locations, and coverage of the study area. This is achieved by dividing the study region into two regions: a ‘near’ region close to existing debris flows and a ‘far’ region covering the study area. The algorithm is shown in Algorithm 1. An illustration for the local dataset is shown in Figure 4.

Algorithm 1 The two-step negative sample generation scheme

Require: positive samples $P = \{p_i\}_{i=1}^{N}$; radii $r_1, r_2$; sample region $B$

$S_n \leftarrow \emptyset$; $S_f \leftarrow \emptyset$

// Generate near samples $S_n$

while $|S_n| < N/2$ do

    $p_i \leftarrow \mathrm{RandomUniform}(P)$

    $n_i \leftarrow \mathrm{RandomUniform}(\rho(p_i, r_2) \setminus \bigcup_{j=1}^{N} \rho(p_j, r_1))$

    $\mathrm{year}(n_i) \leftarrow \mathrm{year}(p_i)$

    $S_n \leftarrow S_n \cup \{n_i\}$

end while

// Generate far samples $S_f$

while $|S_f| < N/2$ do

    $n_i \leftarrow \mathrm{RandomUniform}(B \setminus \bigcup_{j=1}^{N} \rho(p_j, r_2))$

    $\mathrm{year}(n_i) \leftarrow \mathrm{year}(\mathrm{RandomUniform}(P))$

    $S_f \leftarrow S_f \cup \{n_i\}$

end while

return $S_n \cup S_f$

In the first step, a set of ‘near’ low-susceptibility samples is generated equal to half the number of debris-flow locations in the dataset. First, a random positive debris-flow location $p_i \in P$ is selected, where $P = \{p_i\}_{i=1}^{N}$ is the set of all positive locations. Then, a new random negative point $n_i \in \rho(p_i, r_2) \setminus \bigcup_{j=1}^{N} \rho(p_j, r_1)$ is chosen, where $r_1 < r_2$ are radii, and $\rho(p_i, r) = \{x \mid \|p_i - x\| \leq r\}$ denotes the neighborhood of $p_i$ with radius $r$. This ensures the negative points are sampled within a distance $r_2$ from $p_i$, while excluding areas that are closer than $r_1$ to any location in the positive set. The newly generated point inherits its year information from $p_i$. This process repeats until $N/2$ points are generated. This step yields low-susceptibility samples, half as many as the original locations, in close spatial and temporal proximity to known debris-flow locations, focusing the high susceptibility at the detected debris-flow locations and ensuring low susceptibility nearby.

In the second step, a set of ‘far’ low-susceptibility samples equal to half of the debris-flow locations is generated by randomly sampling locations on landmass in the whole study area. First, a random negative point $n_i$ is selected within the landmass of the study area $B$, excluding the region $\bigcup_{j=1}^{N} \rho(p_j, r_2)$. This generates negative samples with a distance greater than $r_2$ to any given debris-flow location while remaining inside the study area. Then, the year information of this generated location is set to the year information of a randomly chosen debris-flow location. This is again repeated until $N/2$ locations have been generated. This ensures coverage of the study area with negative samples in regions with no or very few reported debris flows, while covering the same temporal extent.
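The two steps above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the authors' released code: distances are planar Euclidean distances in degrees, the study region $B$ is a plain bounding box without a landmass mask, and set differences are realized by rejection sampling. The function names (`generate_negatives`, `sample_in_disc`, `in_any_disc`) are hypothetical.

```python
import math
import random

def in_any_disc(x, centers, r):
    """True if point x lies within distance r of any center."""
    return any(math.dist(x, c) <= r for c in centers)

def sample_in_disc(center, r, rng):
    """Uniform random point in the disc of radius r around center."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return (center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi))

def generate_negatives(positives, years, r1, r2, bbox, seed=0):
    """Return ('near', 'far') lists of (point, year) negative samples.

    Assumption: bbox = (xmin, ymin, xmax, ymax) stands in for region B.
    """
    rng = random.Random(seed)
    n = len(positives)
    near, far = [], []
    # Step 1: near samples, inside radius r2 of a random positive
    # but outside radius r1 of every positive.
    while len(near) < n / 2:
        i = rng.randrange(n)
        cand = sample_in_disc(positives[i], r2, rng)
        if not in_any_disc(cand, positives, r1):
            near.append((cand, years[i]))  # inherit the year of p_i
    # Step 2: far samples, anywhere in B outside radius r2 of every positive.
    xmin, ymin, xmax, ymax = bbox
    while len(far) < n / 2:
        cand = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if not in_any_disc(cand, positives, r2):
            far.append((cand, rng.choice(years)))  # year of a random positive
    return near, far
```

Rejection sampling keeps the sketch short; for inventories with thousands of locations, a spatial index would avoid the linear scan in `in_any_disc`.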

For our datasets, the radii $r_1$ and $r_2$ were chosen based on the distribution of the location data in the original datasets. Given that the mean distance to the nearest neighbor of each debris flow is ≈0.15 degrees in the global dataset, $r_1 = 0.15$ and $r_2 = 1.5$ are selected. In the local dataset, we selected $r_1 = 0.02$ and $r_2 = 0.2$, as the mean distance is ≈0.02 degrees. The effect of different radii on susceptibility performance will be evaluated in Section 3.2.
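As a rough illustration of how such radii could be derived from an inventory, the sketch below computes the mean nearest-neighbor distance and scales it. The factor of 10 between $r_1$ and $r_2$ is inferred from the reported values (0.15 and 1.5, 0.02 and 0.2) and is an assumption here, as are the function names and the use of planar distances in degrees.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Mean distance from each point to its closest other point (O(n^2))."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def choose_radii(points, factor=10.0):
    """Assumed rule: r1 is the mean nearest-neighbor distance, r2 = factor * r1."""
    r1 = mean_nearest_neighbor_distance(points)
    return r1, factor * r1
```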

Note that we cannot guarantee that negative samples do not coincide with possible real debris-flow locations, as it is unknown whether a negative location is an undetected or future debris-flow location. However, by enforcing a minimum distance of radius $r_1$ in the generation of negative samples, we aim to mitigate the chance of this occurring.

This data generation scheme leads to 13,465 locations in total for the global dataset and 6580 locations for the local dataset, as seen in Figure 5. Each dataset consists of approximately 50% reported debris-flow locations and 50% generated ‘near’ and ‘far’ low-susceptibility samples, ensuring a balanced dataset.

2.2.2. Remote Sensing Features

In our work, each sample in both global and local location datasets is represented by 6 remote sensing features, while each feature is composed of image patches centered on the given location. The 6 remote sensing features include global digital elevation model (DEM), soil moisture, soil depth, vegetation index, max. precipitation, and topsoil clay %. The features were chosen based on previous assessments of feature importance in the literature. An overview of the used features, their resolutions, and temporal extents is given in Table 2 and displayed in Figure 6.

The digital elevation model (DEM) features are extracted from the ASTER GDEM V003 dataset [24] with a resolution of one arc second or approximately 30 m. The DEM feature provides direct and indirect information on elevation, slope, and shape of the surrounding environment, feature types that are recognized as the most important debris-flow indicators in many previous works [8,16,29].

The soil moisture and soil depth features consist of the “soil_moisture_x” and “soil_depth_x” layers, respectively, taken from the ‘AMSR2/GCOM-W1 surface soil moisture (LPRM) L3 1 day 10 km × 10 km ascending V001 (LPRM_AMSR2_DS_A_SOILM3) at GES DISC’ raster dataset [25] with a resolution of approximately 10 km or 0.1 degrees. These features provide information on soil water content and soil depth, two features commonly associated with debris-flow susceptibility [16,29]. Both remote sensing layers are available as daily data from 2013 to 2025. For each year, the daily data were fused into a yearly feature by averaging the data of the respective year.

The vegetation index feature is the ‘Average VI’ layer from the ‘Vegetation Index and Phenology (VIP) Phenology EVI-2 Yearly Global 0.05 Deg CMG V004’ dataset [26] with a resolution of 0.05 degrees or approximately 5 km. The presence of vegetation is an important factor in debris flow due to its impact on soil stability, as well as providing mass for debris flows [29]. This dataset contains the average yearly features for the years 1981 to 2014.

The max. precipitation feature is the yearly maximum precipitation taken from the ‘GPM IMERG Late Precipitation L3 1 day 0.1 degree × 0.1 degree V06’ dataset [27] with a resolution of 0.1 degrees or approximately 10 km. Maximum daily precipitation is one of the top features indicating debris-flow susceptibility in many previous works [18,29], due to heavy rainfall often being a trigger of debris flows. The daily precipitation is available in this dataset for the years 1998 to 2025. For each available year, the pointwise daily precipitation was fused into a pointwise maximum yearly precipitation feature.
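The yearly fusion used for the daily layers (cell-wise mean for soil moisture and soil depth, cell-wise maximum for precipitation) can be sketched as follows. This is a simplified illustration, assuming each daily raster is a small list-of-rows grid with no missing values; the function names and the `how` parameter are hypothetical.

```python
from collections import defaultdict

def reduce_cell(values, how):
    """Reduce the daily values of one grid cell to a single yearly value."""
    if how == "mean":
        return sum(values) / len(values)
    if how == "max":
        return max(values)
    raise ValueError(f"unknown reduction: {how}")

def fuse_yearly(daily_grids, how):
    """daily_grids: iterable of (year, grid) pairs, grid given as list of rows.

    Returns a dict mapping year -> fused grid of the same shape.
    """
    by_year = defaultdict(list)
    for year, grid in daily_grids:
        by_year[year].append(grid)
    fused = {}
    for year, grids in by_year.items():
        rows, cols = len(grids[0]), len(grids[0][0])
        fused[year] = [
            [reduce_cell([g[r][c] for g in grids], how) for c in range(cols)]
            for r in range(rows)
        ]
    return fused
```

For example, `fuse_yearly(daily, "max")` would produce the yearly maximum precipitation layer, while `fuse_yearly(daily, "mean")` would produce the yearly soil moisture layer.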

The topsoil clay % feature is taken from the ‘Harmonized World Soil Database v1.2’ [28] with a resolution of 30 arc seconds or approximately 1 km. As the topsoil clay content is one of the stronger indicators of debris-flow susceptibility in previous works [16,30], only this subset of data from the World Soil Database was included.

Our work adopts image prediction models for end-to-end susceptibility prediction; hence, only minimal feature preprocessing is necessary for input standardization and missing value imputation. First, all remote sensing features were normalized by linear transformation into the range $[0, 1]$. Then, for each potential debris-flow location, a feature patch of $128 \times 128$ arc seconds (∼3.8 km) centered on the location was cut out from the layer of the year in which the debris flow occurred, and scaled to a resolution of $128 \times 128$ pixels. This corresponds to the original resolution of the high-resolution DEM layer; all other feature maps are reprojected onto the same coordinate system using bilinear interpolation. In the case of debris flows with missing year information or the year being outside the time range of the features, the average of all available years was used. If the patch overlaps the data only partially, missing values and values outside of landmass in the feature patch were filled by interpolation of the surrounding values. If the patch did not cover any data in a feature, the corresponding location was discarded. Thus, each location is associated with a feature patch of shape $(6 \times 128 \times 128)$. Each potential debris-flow location in the global and local dataset is given a susceptibility label of 1 if it is a debris-flow location and 0 if it is a generated ‘near’ or ‘far’ low-susceptibility sample.
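Two of the preprocessing rules above, the linear normalization into $[0, 1]$ and the fallback to the multi-year average when the event year is unavailable, can be sketched as below. This is a minimal sketch on list-of-rows grids; the function names are hypothetical, and mapping a constant layer to all zeros is an assumption made only to avoid a division by zero.

```python
def min_max_normalize(grid):
    """Linearly rescale all values of a 2D grid into [0, 1]."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption: a constant layer is mapped to zeros.
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]

def select_year_layer(yearly, year):
    """Return the layer of the event year, else the cell-wise mean of all years.

    yearly: dict mapping year -> grid; year may be None if unknown.
    """
    if year in yearly:
        return yearly[year]
    layers = list(yearly.values())
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [sum(layer[r][c] for layer in layers) / len(layers) for c in range(cols)]
        for r in range(rows)
    ]
```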

2.3. The Deep Learning Framework

Given the above local and global debris-flow datasets, a thorough end-to-end deep-learning-based debris-flow framework is possible. This work constructed several state-of-the-art debris-flow susceptibility models as shown in Figure 2. First, we will present the three machine learning models used in this study, followed by the training and evaluation scheme.

2.3.1. The Susceptibility Models

In this study, we compare three successful deep learning models for image classification, namely VGG-16, ResNet-50, and Vision Transformer.

VGG-16 is a convolutional neural network architecture introduced by [31]. It is characterized by its simple and uniform structure, and it consists of 16 layers in total, with 13 convolutional layers followed by three fully connected layers. The repeated use of small convolution filters allows the network to capture hierarchical features effectively while reducing the number of parameters compared to larger filters. VGG-16 has shown strong performance on many image classification tasks.

ResNet-50 is a deep convolutional neural network introduced by [32]. The key innovation of ResNet-50 is the use of skip connections (or residual connections), which mitigate the vanishing gradient problem and enable training of substantially deeper networks without losing accuracy. ResNet-50 consists of 50 layers, including 49 convolutional layers and 1 fully connected layer, organized into small bottleneck blocks (1 × 1, 3 × 3, 1 × 1 convolutions) to reduce computational cost. It achieved state-of-the-art performance on image classification tasks while being more efficient than VGG-16.

The Vision Transformer (ViT) is a groundbreaking neural network architecture introduced by [33], which adapts the Transformer model, originally designed for natural language processing, to image classification tasks. Unlike the convolutional neural networks above, ViT processes images by splitting them into fixed-size patches. These patches are then processed by the Transformer encoder used in language processing. The central mechanism of the Transformer encoder is the self-attention, which computes attention scores between image regions to capture global relationships between them. This allows processing of long-range dependencies in the input data, unlike the neighborhood-based convolutional layers. ViT has achieved superior performance compared to convolutional nets such as VGG-16 and ResNet-50 when trained on large-scale datasets.

2.3.2. Model Training

For model training, the global and local datasets are divided into roughly 90% training and 10% test data. This split is done in a grid-based and risk-stratified manner using a 0.5-degree grid spanning the study region. The grid-based split ensures spatially close locations are in the same split while still having regionally relevant samples in both training and testing sets. The stratification by risk ensures an equal proportion of risk and non-risk locations in both the training and test sets. Then, the training data is used for 10-fold cross-validation while the test data is held out for final evaluation.

In 10-fold cross-validation training, the training dataset is further divided into 10 subsets, again in a grid-based and risk-stratified manner, 9 of which are used for the model training itself and 1 for model validation. For each combination of these subsets, i.e., folds, the model was trained until loss convergence on the validation set. The model with the lowest validation loss of each fold was then used for evaluation on the hold-out test set to validate the robustness of the models.
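One way such a grid-based, risk-stratified split could be realized is sketched below. The source does not specify the exact procedure, so the greedy assignment used here is an assumption: whole 0.5-degree cells are moved to the test set as long as neither class exceeds its target share. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def grid_cell(lon, lat, size=0.5):
    """Index of the grid cell of side length `size` (degrees) containing a point."""
    return (math.floor(lon / size), math.floor(lat / size))

def grid_stratified_split(samples, test_frac=0.1, size=0.5, seed=0):
    """samples: list of (lon, lat, label) with label in {0, 1}.

    Returns (train_indices, test_indices); every cell ends up in exactly one set.
    """
    cells = defaultdict(list)
    for idx, (lon, lat, _label) in enumerate(samples):
        cells[grid_cell(lon, lat, size)].append(idx)
    # Target number of test samples per class (stratification by risk label).
    target = {lab: test_frac * sum(1 for s in samples if s[2] == lab) for lab in (0, 1)}
    keys = sorted(cells)
    random.Random(seed).shuffle(keys)
    in_test = {0: 0, 1: 0}
    train, test = [], []
    for key in keys:
        counts = {lab: sum(1 for i in cells[key] if samples[i][2] == lab) for lab in (0, 1)}
        if all(in_test[lab] + counts[lab] <= target[lab] for lab in (0, 1)):
            test.extend(cells[key])
            for lab in (0, 1):
                in_test[lab] += counts[lab]
        else:
            train.extend(cells[key])
    return train, test
```

The same function could be applied again to the training indices to build the cross-validation folds, since those are described as grid-based and risk-stratified as well.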

The training hyperparameters are as follows. For optimization, binary cross-entropy loss was used with the Adam optimizer and a learning rate of $10^{-4}$. Models were trained with a batch size of 16 until no validation loss improvement was made for 5 epochs, with a maximum of 40 epochs, a limit that was never reached in our experiments. The training data was augmented by rotating the feature patches in 90° steps, increasing the training data by a factor of four. All models were implemented and trained in Python 3.14 with PyTorch 2.11.
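The rotation augmentation and the stopping rule can be illustrated without any deep learning library. The sketch below is an assumption-laden simplification: it operates on a single 2D channel given as a list of rows (a 6-channel patch would rotate all channels together), and it replays a recorded sequence of validation losses rather than running a training loop. The function names are hypothetical.

```python
def rot90(patch):
    """Rotate a 2D list of rows by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*patch)][::-1]

def four_rotations(patch):
    """Original patch plus its 90, 180, and 270 degree rotations."""
    out = [patch]
    for _ in range(3):
        out.append(rot90(out[-1]))
    return out

def early_stopping_epoch(val_losses, patience=5, max_epochs=40):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` epochs pass without a new best
    validation loss, or when max_epochs is reached.
    """
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch
```

The model weights saved at `best_epoch` correspond to the lowest validation loss of the fold, which is the checkpoint the text describes using for evaluation.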

For VGG-16 and ResNet-50, we follow the standard architecture with an output size of 1 for risk prediction. The Vision Transformer was set up with a patch size of 16 and 8 transformer layers, where each transformer layer consists of 8 multihead attention layers with dimension 2048, followed by a 2048-dimensional feed-forward neural net.

2.3.3. Model Evaluation

Following the framework above, we train one model per fold, leading to 10 models in total for each dataset. All evaluations are shown as average and 95% confidence interval (CI) over these 10 models for statistically comparative analysis. In the following, we briefly introduce the evaluation criteria used.

Area under the curve (AUC). The area under the curve (AUC) of the receiver operating characteristic (ROC) curve is a common performance measure for binary classification. Given a model’s prediction scores $y_i^{pred} \in [0, 1]$ with the given true labels $y_i^{true} \in \{0, 1\}$, and a threshold $t \in [0, 1]$, one can compute the False Positive Rate ($FP$) and True Positive Rate ($TP$) as

$$FP_t = \frac{\sum_{i=1}^{n} \mathbb{1}(y_i^{pred} > t \land y_i^{true} = 0)}{\sum_{i=1}^{n} \mathbb{1}(y_i^{true} = 0)}, \quad TP_t = \frac{\sum_{i=1}^{n} \mathbb{1}(y_i^{pred} > t \land y_i^{true} = 1)}{\sum_{i=1}^{n} \mathbb{1}(y_i^{true} = 1)} \quad (1)$$

where $\mathbb{1}(p) = 1$ if $p$ is true, and 0 otherwise. In other words, if all model prediction scores larger than $t$ are classified as positive, then $FP_t$ is the fraction of negative samples that are wrongly classified as positive, and $TP_t$ is the fraction of positive samples that are correctly classified as positive.

The ROC curve is then the curve created by all possible $t$, plotted as $FP_t$ on the x-axis and $TP_t$ on the y-axis, while AUC is the area between the ROC curve and the x-axis. A classifier has an AUC of 1 if the prediction is perfect ($TP_t = 1$ and $FP_t = 0$ for all $t$), while a random guessing classifier has an AUC of 0.5.
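The definitions above translate directly into a short computation. The sketch below evaluates $FP_t$ and $TP_t$ at every distinct score and integrates the resulting ROC curve with the trapezoidal rule. It is a plain-Python illustration, not the evaluation code of the study; using negative infinity as a final threshold (so that the curve ends at the point (1, 1)) is an implementation choice made here.

```python
def roc_points(y_true, y_score):
    """ROC points (FP_t, TP_t), using the strict rule 'score > t' from Eq. (1)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Thresholds from high to low; -inf classifies every sample as positive.
    thresholds = sorted(set(y_score) | {float("-inf")}, reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, y_score) if s > t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s > t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(y_true, y_score):
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(y_true, y_score)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, one of the four positive-negative pairs is ranked in the wrong order, so `auc` returns 0.75.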

Classifier Calibration. Compared with standard metrics that only assess classification correctness, classifier calibration quantifies how well predicted probabilities align with the true likelihood of events. It provides a more interpretable measure of debris-flow susceptibility by attaching a meaningful hazard occurrence probability to the model output. For instance, given a set of samples with a prediction score of 0.85, one would naturally expect that 85% of these samples are actual positive hazard locations, while 15% belong to the negative class. The relationship between prediction scores and observed probabilities is commonly determined as follows. First, the prediction scores are divided into histogram bins of equal size, commonly 10 bins $b_1 = [0, 0.1), b_2 = [0.1, 0.2), \dots, b_{10} = [0.9, 1]$. Then, each sample $y_i$ in the test set is assigned to the bin of its model’s prediction score, i.e., $y_i \in b_j \Leftrightarrow y_i^{pred} \in b_j$. Finally, for each bin $b_j$ the observed probability is computed as $p_j = \sum_{y_i} \mathbb{1}(y_i^{true} = 1 \land y_i^{pred} \in b_j) / \sum_{y_i} \mathbb{1}(y_i^{pred} \in b_j)$. Calibration is commonly shown using a visual calibration curve. An ideal calibration leads to a linear calibration curve $p_j = b_j$, where $b_j$ is the predicted probability range and $p_j$ the observed probability.

For objective comparison, we use two metrics, namely calibration intercept and slope. Assuming a linear calibration curve $p_j = a b_j + c$, the slope $a$ describes the difference in rate of change between predicted and observed probabilities. An ideal value of 1 means that predicted probabilities and observed probabilities change equally, while a value larger (smaller) than 1 means the observed probabilities increase faster (slower) than the predicted scores, leading to a discrepancy between the values. A value of $a = 0.5$, for example, shows that an increase of 10 percentage points (pp) in model score on average leads to an increase of only 5 pp in the observed probability, typically leading to the model overestimating the real probability. The intercept, on the other hand, describes the vertical shift $c$ of the calibration curve. As a calibration curve with the ideal slope of 1 results in $p_j = b_j + c$, the intercept $c$ is a constant over- or under-estimation of the predicted probabilities by $c$. A value of $c = -0.1$, for example, would mean that the observed probability $p_j$ is on average 10 pp smaller than $b_j$, showing the model’s overestimation of the prediction scores. As such, a slope of 1 and an intercept of 0 are ideal, while positive diverging values show a general underestimation, and negative diverging values a general overestimation of the prediction probabilities. Both values are measured by linear regression of $(y_i^{pred}, y_i^{true})$.
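A compact sketch of both calibration steps is given below: the binned calibration curve and the slope and intercept from a linear fit of true labels against prediction scores. The text states only that a linear regression is used, so the ordinary least squares fit here is an assumption, as are the function names and the use of bin centers to represent each bin.

```python
def calibration_curve(y_true, y_score, n_bins=10):
    """Return (bin_center, observed_probability, count) for each non-empty bin."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for y, s in zip(y_true, y_score):
        j = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls into the last bin
        counts[j] += 1
        hits[j] += y
    return [
        ((j + 0.5) / n_bins, hits[j] / counts[j], counts[j])
        for j in range(n_bins)
        if counts[j] > 0
    ]

def calibration_slope_intercept(y_true, y_score):
    """Least squares fit y_true ~ a * y_score + c; returns (a, c)."""
    n = len(y_true)
    mean_x = sum(y_score) / n
    mean_y = sum(y_true) / n
    sxx = sum((x - mean_x) ** 2 for x in y_score)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(y_score, y_true))
    a = sxy / sxx
    return a, mean_y - a * mean_x
```

As a sanity check, five samples scored 0.2 with one positive and five samples scored 0.8 with four positives are perfectly calibrated, and the fit returns a slope of 1 and an intercept of 0 up to rounding.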

2.4. Data and Code Availability

The generated global and local remote sensing image patches are publicly available in the Zenodo repository https://zenodo.org (accessed on 1 May 2026) under https://doi.org/10.5281/zenodo.20034823. The Sichuan dataset contains all 6580 remote sensing data patches used in this study to construct the local model. The remote sensing data patches are generated as described in the previous section, containing images of 3290 collected debris-flow locations and the same number of generated negative samples. The global dataset contains all 13,465 global patches used to construct the global model, consisting of 6735 debris-flow locations from the NASA global landslide catalog and 6730 generated negative samples (5 no-data locations were discarded). The Python code used for our data generation and machine learning framework is available under https://gitlab.com/ankidmr/multi-scale-debris-flow-susceptibility (accessed on 1 May 2026).

3. Results

In this section, we present the results of the framework, data generation scheme, and deep learning models described in Section 2. First, we show the model performance in Section 3.1, followed by the impact of radius selection in data generation in Section 3.2. Susceptibility maps generated with the best performing models are shown in Section 3.3.

3.1. Model Performance

Table 3 shows the achieved results of the three models as mean AUC and 95% confidence interval (CI) over the 10 computed folds. The respective ROC curves of our models can be found in Figure 7, as a mean curve with 95% confidence interval. For comparison, we include the previous results by [16] and our previous work [8]. Although using different methodologies and features, both achieved their results on the same global and local debris-flow location datasets, thereby serving as a baseline result of this study. Additional results using the classic Random Forest and Histogram Gradient Boost classifiers are included to show an additional baseline result for our datasets when not using deep learning models.
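The summary of a k-fold result as a mean with a 95% confidence interval can be sketched as follows. How the interval is computed is not specified here, so this example assumes a normal approximation of the standard error of the mean; the function name `mean_with_ci` and the fold values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_ci(values: Sequence[float], confidence: float = 0.95) -> Tuple[float, float]:
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two fold results")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean, z * std_err


# Made-up AUC values of three folds, for illustration only.
fold_aucs = [0.94, 0.95, 0.96]
mean_auc, half_width = mean_with_ci(fold_aucs)
```

With only 10 folds, a Student-t critical value would give a somewhat wider interval than this normal approximation, so the choice should be stated alongside the reported numbers.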

All tested deep learning models outperform the baseline works by a large margin while using only six features. ResNet-50 has the best median AUC over all folds, with 0.947 and 0.957 for the global and local datasets, respectively, compared to the reported 0.888 [16] and 0.88 [8] of previous methods based on the same research locations. More specifically, we achieve a gain of $+0.059$ AUC compared to the global debris-flow susceptibility work of [16], which uses 12 different features. On the local scale, we improve the result of [8] by $+0.077$ in the Sichuan region while using the six base remote sensing maps, compared to the 72 features used in the reference work.

Compared to ResNet-50, VGG-16 and Vision Transformer show slightly worse results, with 0.945 and 0.930 AUC on the global dataset and 0.949 and 0.940 AUC on the local Sichuan dataset. All computed models show high confidence in the AUC result, as indicated by the very narrow confidence intervals in the table and figure. The classic machine learning methods, Random Forest and Histogram Gradient Boost, score lower than the deep learning models. Compared to the baseline works, however, they improve the result markedly on the global dataset (0.916 and 0.919 versus 0.888) and reach similar performance on the local dataset (0.874 and 0.894 versus 0.88) while relying on far fewer feature types, which also demonstrates the effect of our patch-based remote sensing dataset and learning framework. Interestingly, the classic models show a higher AUC on the global data, while the deep learning models perform better on the local data.

The calibration of the models is shown in Figure 8 and Table 4. Again, the shown calibration curves and values are the means and 95% confidence intervals of the k-fold cross-validation models. As is evident from the very narrow calibration curves along the ideal line, and from the calibration slope and intercept close to the ideal values of 1 and 0, VGG-16 and Vision Transformer are well calibrated on the global and local datasets. However, the best model in AUC performance, ResNet-50, shows a tendency to overestimate prediction probabilities on both global and local scales, with calibration slopes of 0.77 and 0.79 and intercepts of −0.098 and −0.134, respectively. Although techniques such as temperature scaling [34] exist for calibration correction, they did not improve calibration for these models and datasets.
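For readers unfamiliar with temperature scaling, the idea can be sketched for a binary classifier as follows: the logit is divided by a single scalar temperature before the sigmoid, and the temperature is chosen to minimize the negative log-likelihood on held-out data. This is a generic sketch under stated assumptions, not the procedure used in this work; the grid search, function names, and example values are hypothetical.

```python
import math
from typing import Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scaled_probability(logit: float, temperature: float) -> float:
    """Probability after dividing the logit by the temperature."""
    return sigmoid(logit / temperature)


def negative_log_likelihood(logits: Sequence[float], labels: Sequence[int], temperature: float) -> float:
    """Mean binary cross-entropy of the temperature-scaled probabilities."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(scaled_probability(z, temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits: Sequence[float], labels: Sequence[int],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Pick the temperature from a grid that minimizes the held-out loss."""
    if grid is None:
        grid = [0.1 * k for k in range(1, 101)]  # 0.1, 0.2, ..., 10.0
    return min(grid, key=lambda t: negative_log_likelihood(logits, labels, t))


# Made-up overconfident model: logits of +/-4 but only 75% of samples correct.
val_logits = [4.0, 4.0, 4.0, 4.0, -4.0, -4.0, -4.0, -4.0]
val_labels = [1, 1, 1, 0, 0, 0, 0, 1]
best_t = fit_temperature(val_logits, val_labels)
```

A fitted temperature above 1 pulls the probabilities toward 0.5. Because a single scalar only rescales the logits, it cannot repair every form of miscalibration.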

3.2. Influence of the Data Generation Radii

As the proposed data generation scheme depends on the values of the chosen radii, they have a measurable effect on the generated models. To evaluate this effect, we generated additional datasets with varying parameters, as seen in Table 5. In both global and local data, larger radii generally increase the average AUC, while smaller ones decrease it, regardless of the chosen model. This is expected, as larger minimum and maximum radii increase the chance of generating negative samples farther from the real hazard regions, whose features are more easily distinguished from those of debris-flow-affected locations. On the other hand, a smaller radius increases the likelihood of generating locations in debris-flow-affected regions with highly similar features, making classification more difficult.

As can be seen, the choice of radius is a trade-off between a high model AUC and a tighter focus of the susceptibility results on true debris-flow locations. Smaller radii carry greater practical relevance for debris-flow disaster prevention and mitigation, and even with a small radius that is lower than the average minimum distance between locations, the performance of the machine learning models remains strong. Therefore, larger radii are not recommended, since the gain in prediction accuracy does not contribute meaningfully to fine-scale debris-flow susceptibility mapping and damage prevention. In this work, the selection of radius is based on the average minimum distance of debris-flow locations in the respective dataset, ensuring that the choice scales with the spatial scale of the dataset.
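The average minimum distance between debris-flow locations, on which the radius selection is based, can be sketched as below. The sketch assumes planar coordinates in a projected system; for latitude and longitude a great-circle distance would be needed instead. The function names and points are hypothetical, and `is_far_enough` only illustrates a minimum-distance check for a generated negative location.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_nearest_neighbor_distance(points: Sequence[Point]) -> float:
    """Average distance from each location to its closest other location."""
    if len(points) < 2:
        raise ValueError("need at least two locations")
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force search; fine for a sketch, quadratic in the number of points.
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def is_far_enough(candidate: Point, positives: Sequence[Point], min_radius: float) -> bool:
    """True if the candidate keeps at least min_radius to every debris-flow location."""
    return all(math.dist(candidate, p) >= min_radius for p in positives)


# Made-up planar coordinates, for illustration only.
debris_flows = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
radius = mean_nearest_neighbor_distance(debris_flows)
```

For datasets with thousands of locations, a spatial index such as a k-d tree would replace the brute-force search.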

3.3. Susceptibility Mapping

As seen in the previous section, the ResNet-50 model achieves the best prediction results by AUC. Figure 9 and Figure 10 show the resulting susceptibility prediction maps for the global and local ResNet-50 model, with a detailed image of the northwest American coastal region for the global model and the Danba Region for the local model. The northwest American coastal region was chosen for the global model due to being a region with higher coverage of debris flows in the dataset, while the Danba Region in the local dataset was chosen for its high activity of debris flows along the four rivers Jinchuan, Geshiza, Donggu, and Dadu that meet in this region.

4. Discussion

In this section we will conduct a comparative analysis of both features and models to bridge the gap between global and local debris-flow analysis. Although the same ResNet-50 architecture using the same remote sensing features achieved the best AUC performance on both scales in the previous section, it is not guaranteed that it additionally focuses on the same features in its prediction on both scales. Therefore, we analyze the features more closely in the following. First, the training data is examined more closely in Section 4.1, followed by a feature importance analysis in Section 4.2. The relationship of model and feature distribution is analyzed in Section 4.3.

4.1. Feature Analysis on Global and Local Datasets

To investigate the influence of region scale on debris-flow susceptibility research, a feature correlation and distribution analysis is conducted in this subsection.

Figure 11 shows the Pearson correlation between the used remote sensing features in the global and local datasets, including the difference in correlation between them. As images were used for model training and evaluation, the correlation was computed by reducing each feature to its average value in the given image patch. As such, these values are only a coarse summary of the given features and show only a limited view of the underlying data, especially for complex and high-resolution data such as DEM.
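The reduction of each patch to its mean value, followed by a Pearson correlation between two features, can be sketched as follows. The patches are represented as nested lists for simplicity; the function names and values are hypothetical and not taken from the published code.

```python
import math
from typing import Sequence


def patch_mean(patch: Sequence[Sequence[float]]) -> float:
    """Average of all pixel values in a 2D patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up 2x2 patches of two features at three locations.
dem_patches = [[[100, 120], [110, 130]], [[900, 950], [920, 930]], [[3000, 3100], [3050, 3050]]]
veg_patches = [[[0.8, 0.7], [0.9, 0.8]], [[0.5, 0.6], [0.5, 0.4]], [[0.1, 0.2], [0.1, 0.2]]]
dem_means = [patch_mean(p) for p in dem_patches]
veg_means = [patch_mean(p) for p in veg_patches]
r = pearson(dem_means, veg_means)
```

In this made-up example, higher elevation coincides with lower vegetation index, so the coefficient is negative.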

Although many features express a near-equal correlation between global and local scales, there are notable differences in some features. The main differences lie in the DEM feature, which has a much higher negative correlation with vegetation index, topsoil clay %, and max. precipitation in the local Sichuan region than in the global region.

This can be explained by the unique characteristics of the Sichuan region due to the east–west split between the subtropical monsoon climate in the east and the plateau mountain climate in the west. This divide leads to a much higher max. precipitation in the relatively low monsoon region in the east, leading to a negative correlation between DEM and max. precipitation. Similarly, the high mountain plateau region generally has less vegetation than the low monsoon climate region.

These large differences in feature distribution can be seen when taking a closer look at the given values, shown in Figure 12 as a histogram of feature distribution.

In the DEM data, one can see a wide range of elevation values in the local data with two peaks, caused by the mountain plateau in the west, contrasted with the Chengdu basin in the east. Naturally, most debris flows are located along the mountainsides between them at intermediate elevations. In contrast, the global data is much more skewed towards lower elevations, although even here, the fact that debris flows occur on mountainsides is visible through the peak at higher elevations.

Globally, debris flows tend to occur in regions with high vegetation index, suggesting vegetation is a necessary requirement for debris flows. Comparatively, the vegetation index is much more evenly distributed between debris-flow and non-debris-flow locations in the local dataset due to the more similar vegetation inside the smaller region. Still, debris-flow locations show a higher average vegetation index.

Soil depth shows another big difference between global and local data. Due to the smaller local region, the features show a much higher overlap between debris-flow and non-debris-flow locations, while the global data has a noticeable peak in debris-flow locations. Note also the large spike of non-debris-flow locations in the global data caused by generated locations in desert regions, similarly visible in the vegetation index.

A case of data bias is visible in the max. precipitation data. Here, a large spike of similar max. precipitation can be seen, caused by debris flows collected in the wake of hurricanes in North America. Comparatively, the local data is much more balanced.

Globally, soil moisture is much lower than in the monsoon-affected Sichuan region, leading to another difference in global and local data, with both debris flows and non-debris flows concentrated along lower values in the global data, while they are relatively evenly spread in the local data.

Finally, topsoil clay % shows a very similar distribution in the global and local datasets, with most locations having a value of approximately $0.25$.

4.2. Feature Importance for Global and Local Datasets

Although both global and local susceptibility models use the same features, it is not guaranteed that both focus on the same features in their prediction, especially considering the very different correlation and distribution of features, as seen in the previous section, due to the different scales, climates, and topographies of the study regions. In this section, we will therefore analyze and compare feature importance across global and local scales for the highest performing ResNet-50 model.

The feature importance was measured using the leave-one-out method, i.e., by training new models with the same framework as shown in Section 2, but with selected feature maps left out in the training process. The feature importance is then evident as the difference in AUC scores between the models using all features and the models with features left out. It should be noted that this importance is only given indirectly as model dependence on the given features, which is not necessarily the strict causal importance.

Figure 13 shows the distribution of AUC scores for the best-performing ResNet-50 model with all feature layers, followed by models with different missing feature layers. For each feature, the distribution is realized by the 10 models generated through 10-fold cross-validation. Note that the order of features in these plots is sorted by the median AUC and therefore different between global and local models. Detailed views of the respective generated susceptibility maps are shown in Figure 14 and Figure 15.
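The leave-one-out importance described above reduces to a difference of median AUC values once the fold results are available. The following sketch assumes the AUC of every fold has already been computed for the full model and for each model with one feature removed; the function name and all numbers are made up for illustration and are not the results of this study.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def feature_importance(full_aucs: Sequence[float],
                       ablated_aucs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Median AUC drop per left-out feature, sorted from largest to smallest drop."""
    base = statistics.median(full_aucs)
    drops = {name: base - statistics.median(aucs) for name, aucs in ablated_aucs.items()}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


# Made-up fold AUCs: all features versus models trained without one feature.
full = [0.95, 0.96, 0.97]
without = {
    "Soil Depth": [0.94, 0.95, 0.96],
    "DEM": [0.70, 0.75, 0.80],
}
ranking = feature_importance(full, without)
```

A larger drop indicates stronger model dependence on the left-out feature, which, as noted above, is not the same as causal importance.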

Naturally, DEM is the most important feature in both global and local models, with an AUC reduction of 0.028 and 0.213, respectively. This shows the importance of topographic features, such as elevation and slope, for debris flows, as evident in prior works. Corominas et al. [29], for example, describe DEM-related features as critical for debris flows and similar geohazards. Compared to DEM, other features show much lower feature importance, ranging from 0.001 to 0.014 in AUC reduction for both scales.

Table 6 additionally shows whether the feature importance differs significantly between global and local models. For this, we employ the Wilcoxon rank-sum test [35]. This test compares whether one population tends to have larger values than the other by evaluating the ranks of the combined samples without assuming a normal distribution. In our case, it compares each model’s drop in AUC to the respective baseline median. As can be seen, there is no statistically significant difference in feature importance in our global and local models in all features except DEM, meaning the same features are statistically likely to have the same impact on debris-flow susceptibility regardless of scale. Comparing the generated susceptibility maps in Figure 14 and Figure 15, however, one can clearly see that the missing high-resolution DEM feature impacts the local region’s susceptibility map heavily, likely impacting the importance by misclassification of samples close to debris-flow locations.
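A rank-sum comparison of two sets of AUC drops can be sketched as follows. This sketch uses the normal approximation of the rank-sum statistic with average ranks for ties and no tie correction; with only 10 folds per group an exact test may be preferable. Which samples enter the test in this work is described only briefly, so the pairing of global and local AUC drops is an assumption, and the function name and values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test; returns (z statistic, p-value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = [(v, 0) for v in x] + [(v, 1) for v in y]
    pooled.sort(key=lambda item: item[0])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the ranks i+1, ..., j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - expected) / std
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up AUC drops of a feature in two groups of models.
global_drops = [0.001, 0.002, 0.003]
local_drops = [0.004, 0.005, 0.006]
z_stat, p_val = rank_sum_test(global_drops, local_drops)
```

A small p-value suggests that one group tends to have larger AUC drops than the other, without assuming normally distributed values.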

This is evident in Table 7. This table shows the $TP_{t}$ and $FP_{t}$ rates of the three subsets of data: real debris-flow locations, generated near, and generated far locations. As debris flows are the positive class, only $TP_{t}$ is defined for this subset. Generated near and far locations are the negative class, thus only $FP_{t}$ is defined for them. For each model, $t$ was selected by maximizing Youden’s Index $J = TP_{t} - FP_{t}$ over the test set, i.e., balancing a large $TP_{t}$ with a small $FP_{t}$. When breaking down test samples by their type, one can see that in the local dataset, the false positive rate $FP_{t}$ is dramatically increased for near-generated samples if the DEM feature is missing, rising from 0.122 to 0.410, meaning 41% of near samples are wrongly classified as debris-flow locations without DEM information. In contrast, $FP_{t}$ only rises from 0.144 to 0.208 in the global data for the same sample type. On the other hand, in both local and global data, the missing DEM does not affect the prediction rate of far samples and debris flows themselves nearly as much. By increasing the resolutions of the other datasets, this effect could likely be mitigated, thereby increasing the similarity of the DEM features between both scales.
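The threshold selection by Youden's Index and the per-subset rates can be sketched as follows. The sketch assumes that a sample is classified as a debris-flow location when its score is at least $t$ and that candidate thresholds are the observed scores; all names and values are hypothetical.

```python
from typing import Sequence, Tuple


def tp_fp_rates(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[float, float]:
    """True positive and false positive rates when predicting positive for score >= t."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative samples")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
    return tp / positives, fp / negatives


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold among the observed scores that maximizes J = TP_t - FP_t."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tpr, fpr = tp_fp_rates(scores, labels, t)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t


def positive_rate(scores: Sequence[float], t: float) -> float:
    """Share of a subset predicted positive: TP_t for debris flows, FP_t for negatives."""
    return sum(1 for s in scores if s >= t) / len(scores)


# Made-up test scores (1 = debris flow, 0 = generated negative sample).
test_scores = [0.1, 0.2, 0.6, 0.7, 0.8, 0.9]
test_labels = [0, 0, 0, 1, 1, 1]
t_best = youden_threshold(test_scores, test_labels)
near_fp = positive_rate([0.6, 0.8, 0.1], t_best)  # made-up 'near' negatives
```

Applying `positive_rate` separately to the debris-flow, near, and far subsets yields a breakdown of the kind reported per sample type.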

4.3. Model-Feature Distribution Analysis

In Section 3, it can be seen that deep learning models ResNet-50, VGG-16, and Vision Transformer show the highest AUC in the local model, while performing slightly worse in the global model. On the other hand, the classic Random Forest and Histogram Gradient Boost models perform much worse on the local data than on the global data. Comparing the feature distribution in Section 4.1, one can see a clear difference in the distribution of the global data, showing distinctive peaks for most features, and the local data, showing strong overlap for most features. This suggests that classic machine learning models struggle to distinguish between more similar feature distributions in the local data, while deep learning models are able to find deeper, non-linear connections between features and debris-flow susceptibility, even showing improved performance over easier-to-distinguish distributions.

5. Conclusions

In this work, we introduced a two-step negative data generation scheme for dataset creation and constructed a global and local debris-flow remote sensing dataset for public use. Then, a remote sensing image-based end-to-end machine learning framework was developed for debris-flow susceptibility study on both global and local scales.

Compared with the previous results of 0.888 and 0.88 on the same debris-flow location data, our work leads to state-of-the-art debris-flow susceptibility prediction both globally and locally, with an AUC of 0.947 and 0.957, respectively, together with fine-grained mapping resolution, using far fewer features. Most importantly, our unified framework uses remote sensing data directly, making it easy to apply in any regional study of geohazard susceptibility prediction.

Among all three compared deep learning models, the ResNet-50 model has achieved the best classification performance in AUC. However, it exhibits difficulties in aligning the predicted probability with the observed probability of a debris flow in the test set, with a tendency to overestimate the susceptibility of low-risk locations. Therefore, model calibration metrics of risk assessment models deserve due consideration for interpretable hazard risk assessment in future work.

Our multi-scale analysis demonstrates that there are significant differences in the global and local datasets, where the local region shows less diversity on multiple features. In this scenario, traditional machine learning methods suffer from performance reduction, while all deep learning models have improved performance, showing superior abilities in mining complex nonlinear dependencies. Hence, we conclude that debris-flow susceptibility prediction benefits from advanced deep learning techniques, especially in localized regions with higher feature similarity.

The feature importance analysis shows that while there is a difference in feature distribution between scales, feature importance is largely not statistically different between them. The only exception is the DEM dataset, which is likely a result of the relatively low resolution of the other features compared to the DEM. Increasing their resolution would likely reduce the feature importance of DEM in the local model to a level more similar to that of the global model.

Although debris-flow susceptibility is an important step in finding vulnerable regions, it is equally important to predict the timing of debris flows in advance to prevent injury and death in the affected population, and to deploy countermeasures that prevent damage to property and infrastructure. In the future, we will therefore expand this framework to include fine-grained temporal remote sensing data, such as daily and seasonal meteorological, vegetation, and soil data, for dynamic time-based debris-flow susceptibility. Additionally, the framework can be extended to enable global and local detection of other hazards such as landslides or flooding, thereby allowing for more complete susceptibility mapping of various hazards. Finally, in addition to general susceptibility, this work can be combined with population and infrastructure data for global and local risk assessment of the affected population, supporting the planning and distribution of aid infrastructure and supplies in the affected regions.

Author Contributions

Conceptualization, A.N. and A.B.; data curation, A.N. and J.L.; formal analysis, A.N.; funding acquisition, A.N. and B.D.; investigation, A.N. and A.B.; methodology, A.N.; project administration, A.N.; resources, B.D. and J.L.; software, A.N.; validation, A.B., B.D. and T.D.; visualization, A.N. and J.L.; writing—original draft, A.N.; writing—review and editing, A.B. and T.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China, grant number W2433165; the Key Research and Development Program of Sichuan Province, grant number 2023YFWZ0009; and the National Key Research and Development Program of China, grant number 2023YFE0121900.

Data Availability Statement

The data presented in this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.20034823. The Python code of the data generation and machine learning framework is available at https://gitlab.com/ankidmr/multi-scale-debris-flow-susceptibility (accessed on 1 May 2026).

Acknowledgments

The computations were performed on the HPC Platform of Huazhong University of Science and Technology.

Conflicts of Interest

The authors have no relevant financial or non-financial conflicts of interest to disclose.

References

1. Iverson, R.M. The physics of debris flows. Rev. Geophys. 1997, 35, 245–296.
2. de Carvalho Faria Lima Lopes, L.; de Almeida Prado Bacellar, L.; Amorim Castro, P.d.T. Assessment of the debris-flow susceptibility in tropical mountains using clast distribution patterns. Geomorphology 2016, 275, 16–25.
3. Cui, P.; Hu, K.; Zhuang, J.; Yang, Y.; Zhang, J. Prediction of debris-flow danger area by combining hydrological and inundation simulation methods. J. Mt. Sci. 2011, 8, 1–9.
4. Dowling, C.A.; Santi, P.M. Debris flows and their toll on human life: A global analysis of debris-flow fatalities from 1950 to 2011. Nat. Hazards 2014, 71, 203–227.
5. Kirschbaum, D.B.; Adler, R.; Hong, Y.; Hill, S.; Lerner-Lam, A. A global landslide catalog for hazard applications: Method, results, and limitations. Nat. Hazards 2010, 52, 561–575.
6. Liu, X.; Lei, J. A method for assessing regional debris flow risk: An application in Zhaotong of Yunnan province (SW China). Geomorphology 2003, 52, 181–191.
7. Zhao, Y.; Meng, X.; Qi, T.; Li, Y.; Chen, G.; Yue, D.; Qing, F. AI-based rainfall prediction model for debris flows. Eng. Geol. 2022, 296, 106456.
8. Di, B.; Zhang, H.; Liu, Y.; Li, J.; Chen, N.; Stamatopoulos, C.A.; Luo, Y.; Zhan, Y. Assessing Susceptibility of Debris Flow in Southwest China Using Gradient Boosting Machine. Sci. Rep. 2019, 9, 12532.
9. Liang, W.j.; Zhuang, D.f.; Jiang, D.; Pan, J.j.; Ren, H.y. Assessment of debris flow hazards using a Bayesian Network. Geomorphology 2012, 171–172, 94–100.
10. Xiong, K.; Adhikari, B.R.; Stamatopoulos, C.A.; Zhan, Y.; Wu, S.; Dong, Z.; Di, B. Comparison of Different Machine Learning Methods for Debris Flow Susceptibility Mapping: A Case Study in the Sichuan Province, China. Remote Sens. 2020, 12, 295.
11. Kern, A.N.; Addison, P.; Oommen, T.; Salazar, S.E.; Coffman, R.A. Machine Learning Based Predictive Modeling of Debris Flow Probability Following Wildfire in the Intermountain Western United States. Math. Geosci. 2017, 49, 717–735.
12. Kappes, M.S.; Malet, J.P.; Remaître, A.; Horton, P.; Jaboyedoff, M.; Bell, R. Assessment of debris-flow susceptibility at medium-scale in the Barcelonnette Basin, France. Nat. Hazards Earth Syst. Sci. 2011, 11, 627–641.
13. Calvo, B.; Savi, F. A real-world application of Monte Carlo procedure for debris flow risk assessment. Comput. Geosci. 2009, 35, 967–977.
14. Pal, S.C.; Chakrabortty, R.; Saha, A.; Bozchaloei, S.K.; Pham, Q.B.; Linh, N.T.T.; Anh, D.T.; Janizadeh, S.; Ahmadi, K. Evaluation of debris flow and landslide hazards using ensemble framework of Bayesian- and tree-based models. Bull. Eng. Geol. Environ. 2022, 81, 55.
15. Lay, U.S.; Pradhan, B.; Yusoff, Z.B.M.; Abdallah, A.F.B.; Aryal, J.; Park, H.J. Data Mining and Statistical Approaches in Debris-Flow Susceptibility Modelling Using Airborne LiDAR Data. Sensors 2019, 19, 3451.
16. Kurilla, L.J.; Fubelli, G. Global debris-flow susceptibility based on a comparative analysis of a single global model versus a continent-by-continent approach. Nat. Hazards 2022, 113, 527–546.
17. Yuan, L.; Zhang, Q.; Li, W.; Zou, L. Debris Flow Hazard Assessment Based on Support Vector Machine. In Proceedings of the 2006 IEEE International Symposium on Geoscience and Remote Sensing; IEEE: New York, NY, USA, 2006; pp. 4221–4224.
18. Ferentinou, M.; Chalkias, C. Mapping Mass Movement Susceptibility Across Greece with GIS, ANN and Statistical Methods. In Landslide Science and Practice: Volume 1: Landslide Inventory and Susceptibility and Hazard Zoning; Margottini, C., Canuti, P., Sassa, K., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 321–327.
19. Ullah, K.; Wang, Y.; Fang, Z.; Wang, L.; Rahman, M. Multi-hazard susceptibility mapping based on Convolutional Neural Networks. Geosci. Front. 2022, 13, 101425.
20. Zhao, Z.; Chen, T.; Dou, J.; Liu, G.; Plaza, A. Landslide Susceptibility Mapping Considering Landslide Local-Global Features Based on CNN and Transformer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 7475–7489.
21. Bai, T.; Jiang, Z.; Tahmasebi, P. Debris flow prediction with machine learning: Smart management of urban systems and infrastructures. Neural Comput. Appl. 2021, 33, 15769–15779.
22. Yokoya, N.; Yamanoi, K.; He, W.; Baier, G.; Adriano, B.; Miura, H.; Oishi, S. Breaking Limits of Remote Sensing by Deep Learning From Simulated Data for Flood and Debris-Flow Mapping. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4400115.
23. Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth-Sci. Rev. 2018, 180, 60–91.
24. NASA; METI; AIST; Japan Spacesystems; U.S./Japan ASTER Science Team. ASTER Global Digital Elevation Model V003; NASA Land Processes Distributed Active Archive Center: Sioux Falls, SD, USA, 2019.
25. de Jeu, R.; Owe, M. AMSR2/GCOM-W1 Surface Soil Moisture (LPRM) L3 1 Day 10 km × 10 km Ascending V001 (LPRM_AMSR2_DS_A_SOILM3) at GES DISC; Goddard Earth Sciences Data and Information Services Center: Greenbelt, MD, USA, 2014.
26. Didan, K.; Barreto, A. NASA MEaSUREs Vegetation Index and Phenology (VIP) Phenology EVI2 Yearly Global 0.05 Deg CMG; NASA Land Processes Distributed Active Archive Center: Sioux Falls, SD, USA, 2016.
27. Huffman, G.; Stocker, E.; Bolvin, D.; Nelkin, E.; Tan, J. GPM IMERG Late Precipitation L3 1 Day 0.1 Degree x 0.1 Degree V06; Goddard Earth Sciences Data and Information Services Center: Greenbelt, MD, USA, 2019.
28. FAO; IIASA. Harmonized World Soil Database Version 2.0; FAO: Rome, Italy; International Institute for Applied Systems Analysis (IIASA): Laxenburg, Austria, 2023.
29. Corominas, J.; van Westen, C.; Frattini, P.; Cascini, L.; Malet, J.P.; Fotopoulou, S.; Catani, F.; Van Den Eeckhaut, M.; Mavrouli, O.; Agliardi, F.; et al. Recommendations for the quantitative analysis of landslide risk. Bull. Eng. Geol. Environ. 2014, 73, 209–263.
30. Chen, N.S.; Zhou, W.; Yang, C.L.; Hu, G.S.; Gao, Y.C.; Han, D. The processes and mechanism of failure and debris flow initiation for gravel soil with different clay content. Geomorphology 2010, 121, 222–230.
31. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2016; pp. 770–778.
33. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
34. Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the International Conference on Machine Learning; PMLR; Journal of Machine Learning Research (JMLR) Inc.: Cambridge, MA, USA, 2017; pp. 1321–1330.
35. Wijnand, H.P.; van de Velde, R. Mann–Whitney/Wilcoxon’s nonparametric cumulative probability distribution. Comput. Methods Programs Biomed. 2000, 63, 21–28.

Figure 1. Distribution of debris flows contained in the NASA global landslide catalog. Shown is only the debris-flow-related subset of the full dataset.

Figure 2. Overview of the research framework for debris-flow susceptibility. The main contributions—the unified data generation scheme and end-to-end machine learning framework—are highlighted in red.

Figure 3. Distribution of debris flows in the Sichuan province, China.

Figure 4. Illustration of the two-step negative sample generation for the local dataset. ‘Near’ locations (green) are randomly generated in the ‘Near region’ close to debris-flow locations. ‘Far’ negative locations (black) are randomly generated in the ‘Far Region’ with a minimum distance to debris-flow locations. The red shaded areas show the location of the close up views.

Figure 5. Distribution of the real debris-flow locations (red), as well as the generated ‘near’ (blue) and ‘far’ (black) locations used for training the global (top) and local (bottom) models. The ‘near’ locations provide negative samples close to the debris-flow locations to focus the susceptibility prediction into a smaller area, while the ‘far’ samples provide coverage of regions with no recorded debris flows.

Figure 6. Examples of the used remote sensing features. Shown is the average of all available years. (a) DEM, (b) Soil Moisture, (c) Soil Depth, (d) Vegetation Index, (e) Max. Precipitation, (f) Topsoil Clay %.

Figure 7. ROC curves of the global and local models.

Figure 8. Calibration curves of the global and local models. Each curve shows the mean and 95% confidence interval of the 10 folds of the respective model.

Figure 9. Susceptibility map of the global ResNet-50 model. Display without (top) and with (bottom) known debris-flow locations.

Figure 10. Susceptibility map of the local ResNet-50 model for the Sichuan province. Display without (top) and with known debris-flow locations (bottom). The black box shows the location of the detailed view in Sichuan.

Figure 11. Correlation between features in the global (left) and local (center) datasets, with the difference (global–local, right) between them.

Figure 12. Distributions of mean values of the remote sensing feature patches for the global and local datasets, sorted by non-debris-flow locations (blue) and debris-flow locations (orange).

Figure 13. Impact of missing features on debris-flow susceptibility performance in the global and local ResNet-50 model. Shown are the distributions of the 10 models generated with k-fold cross-validation and the difference in median AUC from the base result.

Figure 14. Debris flow susceptibility map of the global model with detailed views of missing features. (a) All features with debris-flow locations, (b) missing DEM, (c) missing soil moisture, (d) missing soil depth, (e) missing vegetation index, (f) missing max. precipitation, and (g) missing topsoil clay %.

Figure 15. Debris flow susceptibility maps of the local model with detailed views of missing features. (a) All features with debris-flow locations, (b) missing DEM, (c) missing soil moisture, (d) missing soil depth, (e) missing vegetation index, (f) missing max. precipitation, and (g) missing topsoil clay %.

Table 1. Overview of the Related Work.

Statistical Modeling
Reference	Scale	Study Area	# Samples	# Features	Method
de Carvalho Faria Lima Lopes et al. [2]	Local	Southeast Brazil	87	8	Power Model
Liu & Lei [6]	Local	Yunnan, China	10	11	Power Model
Calvo & Savi [13]	Local	Alps, Italy	13	1	Monte-Carlo model
Kurilla & Fubelli [16]	Global	Global	7989	12	Maximum Entropy
Machine Learning
Reference	Scale	Study Area	# Samples	# Features	Method
Zhao et al. [7]	Local	Loess Plateau, China	∼380	1	Extra Trees
Di et al. [8]	Local	Sichuan, China	∼3800	72	Gradient Boosting
Liang et al. [9]	Local	China	716	7	Bayesian Network
Xiong et al. [10]	Local	Sichuan, China	∼2500	18	Various
Kern et al. [11]	Local	western USA	388	26	Various
Pal et al. [14]	Local	Markazi Province, Iran	∼700	15	Random Forest
Lay et al. [15]	Local	Cameron Highlands, Malaysia	∼700	12	Support Vector Machine
Yuan et al. [17]	Local	Yunnan, China	259	9	Support Vector Machine
Ferentinou & Chalkias [18]	Local	Greece	1200	16	Artificial Neural Network
Deep Learning
Reference	Scale	Study Area	# Samples	# Features	Method
Ullah et al. [19]	Local	Hindu Kush, Pakistan	n/a	15	Convolutional Neural Network
Zhao et al. [20]	Local	Three Gorges Res., China	∼4200	9	Transformer

Table 2. Remote sensing data sources used in this study.

Feature	Description	Resolution (Distance)	Resolution (Deg)	Time Range	Reference
DEM	Digital elevation model	∼30 m	0.00028°	-	[24]
Soil Moisture	Soil water content	∼10 km	0.1°	2013–2025	[25]
Soil Depth	Depth of the surface soil	∼10 km	0.1°	2013–2025	[25]
Vegetation Index	Average yearly vegetation index	∼5 km	0.05°	1981–2014	[26]
Max. Precipitation	Maximum daily rainfall in a year	∼10 km	0.1°	1998–2025	[27]
Topsoil Clay %	Clay content percentage of the topsoil	∼1 km	0.0083°	-	[28]

Table 3. Results of the proposed methods on global and local datasets.

Global Model
Method	AUC ↑	CI ↓	# Features
VGG-16	$0.945$	$\pm 0.010$	6
ResNet-50	0.947	$\pm 0.007$	6
Vision Transformer	$0.930$	$\pm 0.017$	6
Random Forest	$0.916$	$\pm 0.002$	6
Histogram Gradient Boost	$0.919$	$\pm 0.001$	6
Maximum Entropy (Kurilla & Fubelli [16])	$0.888$		12
Local Model
Method	AUC ↑	CI ↓	# Features
VGG-16	$0.949$	$\pm 0.005$	6
ResNet-50	0.957	$\pm 0.002$	6
Vision Transformer	$0.940$	$\pm 0.013$	6
Random Forest	$0.874$	$\pm 0.008$	6
Histogram Gradient Boost	$0.894$	$\pm 0.004$	6
Gradient Boosting Trees (Di et al. [8])	$0.88$		72

Values and models marked in bold have the highest AUC. ↑: larger is better. ↓: smaller is better.

Table 4. Calibration slope and intercept of the global and local models.

Global Model
Method	Calibration Slope	Calibration Intercept
VGG-16	$0.912 \pm 0.192$	$0.195 \pm 0.224$
ResNet-50	$0.776 \pm 0.199$	$- 0.098 \pm 0.279$
Vision Transformer	$0.976 \pm 0.089$	$0.033 \pm 0.094$
Local Model
Method	Calibration Slope	Calibration Intercept
VGG-16	$0.991 \pm 0.129$	$- 0.132 \pm 0.241$
ResNet-50	$0.790 \pm 0.09$	$- 0.134 \pm 0.167$
Vision Transformer	$0.955 \pm 0.05$	$- 0.060 \pm 0.193$
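
A common way to obtain a calibration slope and intercept such as those in Table 4 is to fit a logistic regression of the observed labels on the logit of the predicted probabilities; a slope of 1 and an intercept of 0 then indicate perfect calibration. The sketch below assumes that definition, which may differ from the exact procedure behind the table. The function names are hypothetical, and the fit uses plain Newton iterations with the standard library only.

```python
import math


def logit(p, eps=1e-7):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def calibration_slope_intercept(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton's method.

    Returns (slope b, intercept a). Assumes the labels are not perfectly
    separable by the predictions, otherwise the fit diverges.
    """
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            r = yi - mu              # residual
            w = mu * (1.0 - mu)      # logistic variance weight
            g_a += r
            g_b += r * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * delta = g
        d_a = (h_bb * g_a - h_ab * g_b) / det
        d_b = (h_aa * g_b - h_ab * g_a) / det
        a += d_a
        b += d_b
        if abs(d_a) + abs(d_b) < tol:
            break
    return b, a
```

Under this definition, a slope below 1 (as for ResNet-50 in Table 4) would indicate predictions that are more extreme than the observed frequencies support.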

Table 5. Effect of different radii on model performance, measured as average AUC and 95% confidence interval of k-fold cross-validation.

Global Model
$(r_{1}, r_{2})$	$(0.05, 0.5)$	$(0.15, 1.5)$	$(0.25, 2.5)$
VGG-16	$0.937 \pm 0.005$	$0.945 \pm 0.010$	$0.950 \pm 0.009$
ResNet-50	$0.936 \pm 0.004$	$0.947 \pm 0.007$	$0.951 \pm 0.006$
Vision Transformer	$0.886 \pm 0.030$	$0.930 \pm 0.017$	$0.940 \pm 0.009$
Local Model
$(r_{1}, r_{2})$	$(0.01, 0.1)$	$(0.02, 0.2)$	$(0.05, 0.5)$
VGG-16	$0.940 \pm 0.002$	$0.949 \pm 0.005$	$0.964 \pm 0.006$
ResNet-50	$0.943 \pm 0.006$	$0.957 \pm 0.002$	$0.964 \pm 0.004$
Vision Transformer	$0.921 \pm 0.014$	$0.940 \pm 0.013$	$0.958 \pm 0.006$

Values and models marked in bold have the highest AUC.
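
The "average AUC and 95% confidence interval" summaries in Table 5 can be illustrated with a short sketch. It computes the AUC per fold as the fraction of correctly ordered positive/negative pairs and then summarizes the fold values with a t-based interval. The interval construction is an assumption, as is the default critical value 2.262, which corresponds to 9 degrees of freedom (10 folds, following the Figure 13 caption). Function names and the example fold values are hypothetical.

```python
import math
import statistics


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the sample count,
    which is fine for illustration but slow for large datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci(values, crit=2.262):
    """Mean and half-width of a t-based interval over fold results.

    crit is the two-sided 95% Student-t critical value; 2.262 assumes
    10 folds (9 degrees of freedom) and must be changed for other k.
    """
    m = statistics.mean(values)
    half = crit * statistics.stdev(values) / math.sqrt(len(values))
    return m, half


# Hypothetical per-fold AUC values, for illustration only.
fold_aucs = [0.94, 0.95, 0.96, 0.95, 0.94, 0.96, 0.95, 0.95, 0.94, 0.96]
mean_auc, half_width = mean_ci(fold_aucs)
print(f"{mean_auc:.3f} +/- {half_width:.3f}")
```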

Table 6. Statistical significance of feature importance difference between the global and local models. Values are the p-values of a two-sided Wilcoxon rank-sum test. Values marked with a * are statistically significantly different.

Feature	Mean Difference	p
DEM	0.185	0.0002 *
Soil Depth	0.005	0.47
Topsoil Clay %	0.004	0.68
Vegetation Index	0.002	0.68
Max. Precipitation	0.006	0.21
Soil Moisture	0.001	0.73
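
The two-sided Wilcoxon rank-sum test named in the Table 6 caption can be sketched as follows. This is a minimal standard-library version using the normal approximation without tie or continuity correction; whether the reported p-values were computed with exactly this variant is an assumption. The inputs `x` and `y` stand for the per-fold feature-importance values of the global and local models, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p), where z is the standardized rank sum of x.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2.0        # expectation under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

With this approximation, two completely separated groups of ten values each give p ≈ 0.00016, the smallest p-value attainable for that group size.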

Table 7. Change in prediction rates depending on sample type. Columns show the base result and the results of models without the listed feature. Note that, when debris flows are all classified as positive, only $T P_{t}$ is defined, while for the negative generated near and far samples, only $F P_{t}$ is defined.

Global Model
Sample	Score	Base Res.	DEM	S. Depth	Tops. Clay %	Veg. Index	Max. Prec.	S. Moisture
debris flow	$T P_{t}$ ↑	0.866	0.868	0.872	0.820	0.877	0.884	0.871
near sample	$F P_{t}$ ↓	0.144	0.208	0.150	0.073	0.199	0.165	0.098
far sample	$F P_{t}$ ↓	0.058	0.147	0.061	0.046	0.081	0.081	0.046
Local Model
Sample	Score	Base Res.	DEM	S. Depth	Tops. Clay %	Veg. Index	Max. Prec.	S. Moisture
debris flow	$T P_{t}$ ↑	0.858	0.674	0.870	0.891	0.897	0.931	0.949
near sample	$F P_{t}$ ↓	0.122	0.410	0.106	0.128	0.170	0.202	0.202
far sample	$F P_{t}$ ↓	0.048	0.102	0.027	0.048	0.048	0.048	0.061

↑: larger is better. ↓: smaller is better.
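
The per-sample-type rates in Table 7 can be illustrated with a small sketch. It assumes that the subscript $t$ denotes a decision threshold on the predicted susceptibility, so that the rate for debris-flow samples is $T P_{t}$ and the rate for generated near and far samples is $F P_{t}$. The threshold value, the label strings, and the function name are hypothetical.

```python
def positive_rate_by_type(scores, sample_types, threshold=0.5):
    """Fraction of samples predicted positive, grouped by sample type.

    For true debris-flow samples this fraction is the true-positive rate;
    for generated negative samples (near, far) it is the false-positive
    rate. The threshold of 0.5 is an assumption, not a reported value.
    """
    counts, hits = {}, {}
    for score, kind in zip(scores, sample_types):
        counts[kind] = counts.get(kind, 0) + 1
        hits[kind] = hits.get(kind, 0) + (score >= threshold)
    return {kind: hits[kind] / counts[kind] for kind in counts}


# Hypothetical scores, for illustration only.
scores = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3, 0.1]
kinds = ["debris flow"] * 3 + ["near sample"] * 2 + ["far sample"] * 2
rates = positive_rate_by_type(scores, kinds)
```

Computing the same dictionary for a model trained without one feature and comparing it with the base model gives one column of the table.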

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nienkötter, A.; Bian, A.; Di, B.; Li, J.; Deng, T. Assessing Debris-Flow Susceptibility at Local and Global Scales: A Deep-Learning-Based Comparative Study of Sichuan, China, and Worldwide. Remote Sens. 2026, 18, 1442. https://doi.org/10.3390/rs18091442

