Multi-Scale Attention Network for Landslide Susceptibility Assessment

Zhan, Zhao; Chen, Shanxiong; Zhang, Min; Shi, Wenzhong; Sun, Yangjie; Luo, Hongbo

doi:10.3390/geosciences16050188

by Zhao Zhan ¹,²,³, Shanxiong Chen ¹, Min Zhang ⁴, Wenzhong Shi ³,⁵,*, Yangjie Sun ³,⁵ and Hongbo Luo ²

¹ Changjiang Spatial Information Technology Engineering Co., Ltd., Wuhan 430010, China
² Changjiang Survey, Planning, Design and Research Co., Ltd., Wuhan 430010, China
³ Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China
⁴ School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
⁵ Otto Poon Charitable Foundation Smart Cities Research Institute, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China
* Author to whom correspondence should be addressed.

Geosciences 2026, 16(5), 188; https://doi.org/10.3390/geosciences16050188

Submission received: 4 January 2026 / Revised: 17 March 2026 / Accepted: 26 March 2026 / Published: 7 May 2026

(This article belongs to the Special Issue Intelligent Landslide Early Warning: From Multi-Source Sensing to AI-Driven Forecasting)

Abstract

Landslide susceptibility assessment (LSA) is crucial for regional landslide risk evaluation and mitigation strategy formulation. Previous studies mostly adopted single-scale features, while landslide formation is influenced by multi-scale factors, making multi-scale information extraction more appropriate for assessment. This study proposes a deep learning framework integrating multi-scale and attention modules for object-based LSA. A multi-scale network extracts geo-environmental features at different scales, which are input into attention networks using multi-head attention and Squeeze-and-Excitation, termed MSMHA and MSSE, respectively, to enhance relevant features and suppress irrelevant ones. Finally, features are fused for classification and prediction. In a case study in Hong Kong, CNN-based and ML-based methods were compared using 9814 landslides and 11 influencing factors. Results show the proposed MSMHA (area under the curve, AUC 0.91) and MSSE (AUC 0.90) outperform conventional methods (e.g., random forest with AUC 0.86; multi-layer perceptron and support vector machine with AUC 0.85; DenseNet with AUC 0.86; CNN with AUC 0.88; VGG with AUC 0.87; GoogLeNet and ResNet with AUC 0.81). CNN-based methods outperformed ML-based ones, indicating that incorporating neighborhood information improves model performance. The rationality of the susceptibility map generated by MSMHA was verified via comparative analysis. Results confirm that the proposed multi-scale and attention-integrated framework outperforms traditional single-scale methods consistently. Equally importantly, the case study provides advanced CNN-based landslide susceptibility maps for Hong Kong, which can serve as a critical reference for regional landslide risk management and the formulation of targeted mitigation strategies.

Keywords: landslide susceptibility assessment; convolutional neural network; multi-scale network; multi-head attention; Squeeze-and-Excitation

1. Introduction

Landslides rank among the most destructive natural hazards, causing extensive damage to infrastructure and posing severe threats to human life and property [1,2,3]. Regional landslide susceptibility assessment (LSA) is a critical tool for mitigating these risks by evaluating the spatial likelihood of future landslides based on historical landslide inventories and associated conditioning factors [4]. LSA integrates diverse landslide-influencing factors (LIFs), including topographic, geological, hydrological, and anthropogenic variables, within a study area to identify zones with high potential for slope failure [5,6,7]. By delineating susceptible areas, LSA supports informed decision-making in land-use planning, disaster preparedness, and risk management.

Over the past decades, various methodologies have been employed for LSA, including empirical approaches [8,9], mechanistic models [10,11], and statistical techniques [12,13]. In recent years, machine learning (ML)-based methods have gained increasing prominence due to their strong capability in modeling complex, nonlinear relationships between environmental factors and landslide occurrences [14,15,16,17,18,19,20]. With rapid advances in artificial intelligence, numerous ML algorithms, such as support vector machines (SVM), random forests (RF), and multi-layer perceptrons (MLP), have been widely applied in landslide susceptibility mapping [21,22,23,24,25,26,27]. Concurrently, progress in remote sensing and geographic information systems (GIS) has enabled the integration of high-resolution, multi-source geospatial data into LSA frameworks, improving model reliability and spatial accuracy [14].

Despite these advancements, conventional ML models typically treat input features as one-dimensional vectors, representing individual sampling points. This point-based paradigm captures only local environmental conditions while neglecting the spatial context and interactions surrounding each location [28,29]. Such an approach fails to account for the spatial synergies among influencing factors, such as slope transitions, drainage patterns, and lithological boundaries, that are crucial for accurate susceptibility modeling. A more effective representation involves using gridded or areal data, such as raster layers, which preserve spatial structure and enable holistic analysis of the landscape.

Convolutional neural networks (CNNs), a class of deep learning models specifically designed for grid-structured data, have demonstrated success in capturing spatial patterns from such representations [28,30]. Their inherent properties of local connectivity and weight sharing allow CNNs to effectively extract hierarchical spatial features, making them well-suited for identifying geomorphological indicators of instability. Consequently, CNNs have been increasingly adopted in landslide studies. For instance, they have been used for landslide detection from remote sensing imagery [31] and for regional susceptibility assessment. Wang et al. (2019) pioneered the application of a LeNet-5-inspired CNN for LSA in Yanshan County, China, achieving higher accuracy than traditional ML methods [32]. Subsequent studies have explored more advanced architectures, including VGG [33], GoogLeNet [34], ResNet [20], and DenseNet [35], in diverse geographical settings [30]. Other works have integrated CNNs with recurrent networks, such as CNN-LSTM [29], or interpretability tools, such as CNN-SHAP [29], further advancing the field.

Nevertheless, most existing CNN-based LSA models rely on single-scale feature extraction through fixed-size convolutional kernels, resulting in a limited receptive field. This constraint poses a significant challenge given the wide variability in landslide sizes, ranging from tens of square meters to several square kilometers. A static receptive field may either fail to capture fine-grained details essential for detecting small landslides, if too large, or overlook broader geological and hydrological contexts necessary for characterizing large-scale failures, if too small. This inability to adapt to multiple spatial scales undermines model performance across heterogeneous landscapes and limits generalization capacity.

To address this limitation, multi-scale strategies have been introduced into CNN architectures for LSA [28,36]. For example, Yi et al. [28] proposed a multi-scale CNN that fuses features extracted at different resolutions, improving prediction accuracy. However, simple concatenation or summation of multi-scale features can lead to issues such as feature redundancy, scale conflict, and information interference. Specifically, dominant coarse-scale features may overwhelm finer-scale details during fusion, leading to the loss of critical information for small landslides.

To mitigate these challenges, attention mechanisms have emerged as a powerful solution [25,26,37,38,39]. These mechanisms enable the network to dynamically assign weights to different feature channels or spatial locations based on their relevance to the task, thereby enhancing informative features while suppressing irrelevant or redundant ones. When integrated into multi-scale CNNs, attention modules can selectively emphasize important cross-scale features, alleviating scale imbalance and improving feature fusion efficiency. Despite their potential, no study has yet systematically evaluated the effectiveness of combining multi-scale CNNs with advanced attention mechanisms, particularly both channel-wise and self-attention, for LSA.

To bridge this gap, we propose a multi-scale attention network (MSAN) framework for landslide susceptibility assessment. Within this unified architecture, multi-scale features are first extracted via parallel convolutional branches with kernels of varying sizes, such as $3 \times 3$, $5 \times 5$, and $7 \times 7$, enabling simultaneous capture of local, intermediate, and contextual patterns. Subsequently, two attention-based fusion strategies are implemented to refine the multi-scale feature representation. The first variant, termed the multi-scale Squeeze-and-Excitation (MSSE) CNN, applies channel-wise attention to recalibrate feature importance across scales. The second variant, termed the multi-scale multi-head attention (MSMHA) CNN, performs self-attention over flattened spatial-feature vectors to capture long-range dependencies and dynamic feature interactions. Both variants leverage the strengths of multi-scale representation while introducing adaptive fusion mechanisms to improve classification accuracy. The refined features are then passed through classification layers to generate a continuous susceptibility index.

The proposed MSAN framework is evaluated in the Hong Kong Special Administrative Region (China), a mountainous region highly prone to rainfall-induced landslides. Commonly used evaluation protocols in LSA were employed, including receiver operating characteristic (ROC) curve analysis with area under the curve (AUC) as the primary metric, along with precision, recall, and F-measure [4,14,21,40]. Spatial cross-validation techniques were implemented to ensure robust model validation, with landslide inventory data partitioned into training and testing datasets using appropriate sampling methods [10,28,41,42]. Our study focuses specifically on rainfall-induced shallow landslides, which represent the predominant landslide type in this region. Experiments compare the MSSE and MSMHA models against classical ML algorithms (RF, MLP, SVM) and established CNN variants (VGG, GoogLeNet, ResNet, DenseNet, and a standard multi-scale CNN baseline). Results demonstrate that the integration of multi-scale extraction with attention-based fusion enhances prediction accuracy and generalization.

This work contributes to the advancement of deep learning in geohazard assessment by (1) proposing a unified MSAN framework for LSA that supports flexible integration of different attention mechanisms; (2) developing and evaluating two variants, MSSE and MSMHA, that outperform existing models in both accuracy and feature adaptability; (3) demonstrating the effectiveness of attention-guided feature fusion in mitigating scale conflict and redundancy in multi-scale LSA models.

2. Study Area and Data

2.1. Overview of the Study Area

The study area encompasses the Hong Kong Special Administrative Region of China (Figure 1), spanning longitudes 113.80° E to 114.50° E and latitudes 22.15° N to 22.57° N and covering approximately 1100 square kilometers. It includes three major regions: Hong Kong Island, the Kowloon Peninsula, and the New Territories. The highest peak, Tai Mo Shan, reaches 957 m above sea level. Hong Kong experiences a tropical monsoon climate, with an average annual rainfall of 2400 mm, 80% of which occurs between May and September.

The terrain is predominantly hilly, with 75% of the land having slopes steeper than 15° and over 30% exceeding 30°. This rugged topography, combined with frequent tropical cyclones and intense rainfall (Figure 1c), renders the region highly susceptible to landslides, with an average of over 300 incidents per year. Urban development is concentrated in flatter areas nestled among steep slopes. Because Hong Kong is one of the most densely populated cities globally, home to approximately 7 million people, landslides pose significant threats to life and infrastructure. These risks are further exacerbated by population growth and projected increases in extreme rainfall due to climate change.

The geological framework of Hong Kong is a key determinant of landslide susceptibility. The region comprises diverse lithological units, including granitic rocks (e.g., diorite, granodiorite), metamorphic schists, and sedimentary formations (e.g., mudstone, sandstone). Granitic terrains, though generally resistant to weathering, develop deep weathering profiles that can lead to unstable slopes when fractured. Metamorphic schists exhibit pronounced foliation planes that act as potential slip surfaces, while sedimentary layers often display differential erosion rates, creating oversteepened slopes.

Major fault systems, such as the Lion Rock Fault and Tolo Fault Zone, traverse the study area, creating zones of structural weakness. These faults not only increase rock fragmentation but also facilitate groundwater flow paths that contribute to slope instability. Landslide density correlates strongly with proximity to mapped fault lines, particularly in granitic and metamorphic terrains where structural weaknesses dominate.

2.2. Historical Landslide Inventory

A reliable landslide inventory map (LIM) is crucial for LSA, as its completeness and accuracy directly affect model performance [41]. The LIM used in this study was compiled by the Geotechnical Engineering Office of the Civil Engineering and Development Department (CEDD) of Hong Kong, based on aerial photograph interpretation from 1924 to 2019. This dataset, known as the Enhanced Natural Terrain Landslide Inventory (ENTLI) [11], contains 111,408 landslide records up to 2019.

Each landslide in ENTLI is represented as a polyline (the landslide trail) with an associated source point (Figure 1c). In this study, the source point of each landslide is used as the center of the corresponding sample. Given that records after 1984 are more reliable, only landslides occurring between 1984 and 2019 are included, totaling 13,771 events.

2.3. Landslide Influencing Factors

Landslides result from the interaction of multiple LIFs [43]. Selecting appropriate factors is essential for accurate susceptibility modeling, balancing physical relevance and data availability. In this work, 11 key LIFs were selected based on prior research [44]: lithology, distance to faults, distance to roads, elevation, slope, aspect, profile curvature, plan curvature, stream power index (SPI), topographic wetness index (TWI), and sediment transport index (STI). These factors exhibit low inter-correlation, ensuring they represent distinct aspects of slope stability.

As illustrated in Figure 2 and Figure 3, these factors influence slope stability from various perspectives:

Aspect: The downslope direction of the maximum rate of change, measured in degrees clockwise from north, ranging from 0° to 360°. Aspect affects solar radiation exposure and soil moisture distribution, with north-facing slopes receiving less direct sunlight and retaining more moisture in the Northern Hemisphere.
Distance to road: Euclidean distance from each cell to the nearest road network, measured in meters. This factor reflects anthropogenic disturbance and potential slope destabilization due to construction activities.
Elevation: Height above mean sea level, measured in meters. Elevation influences local climate conditions, vegetation patterns, and geological processes.
Slope: The maximum rate of elevation change between each cell and its neighbors, expressed in degrees with values ranging from 0° (flat) to 90° (vertical). Slope directly affects gravitational stress on the terrain.
Plan and profile curvature:
– Profile curvature: Calculated as the curvature along the direction of maximum slope (tangent to the flow line), with positive values indicating convex surfaces (accelerating flow, erosion) and negative values indicating concave surfaces (decelerating flow, deposition). Values typically range from −1 to 1.
– Plan curvature: Calculated as the curvature perpendicular to the direction of maximum slope, with positive values indicating laterally convex surfaces (divergent flow) and negative values indicating laterally concave surfaces (convergent flow). Values typically range from −1 to 1. Together, profile and plan curvature control water convergence and flow acceleration, affecting slope stability.
SPI (Stream Power Index): Measures the erosive power of surface runoff, calculated as $A_{s} \tan \beta$, where $A_{s}$ is the contributing upslope area per unit contour length and $\tan \beta$ is the local slope gradient. Higher values indicate areas prone to erosion and sediment transport.
TWI (Topographic Wetness Index): Predicts saturation-prone areas, calculated as $\ln (A_{s} / \tan \beta)$, where $A_{s}$ is the specific catchment area and $\tan \beta$ is the local slope gradient. Higher values indicate areas with higher potential for saturation and water accumulation.
STI (Sediment Transport Index): Characterizes sediment transport potential, calculated as $(A_{s} / 22.13)^{0.6} \times (\sin \beta / 0.0896)^{1.3}$, where $A_{s}$ is the upslope contributing area and $\beta$ is the local slope angle. Higher values indicate greater potential for sediment movement.
Distance to fault: Euclidean distance from each cell to the nearest geological fault line, measured in meters. This factor indicates structural weakness and potential zones of rock fragmentation and instability.
Lithology: Classification of bedrock and surficial geological materials based on rock type, mineral composition, and engineering properties. Different lithological units have varying resistance to weathering and different geomechanical properties.
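
The three hydrological indices in the list above can be evaluated per cell from the specific catchment area and the slope angle. The following is a minimal sketch of those formulas, not the authors' GIS workflow; the function name `terrain_indices` and the small floor `eps` on $\tan \beta$ (used only to keep TWI finite on flat cells) are assumptions introduced for illustration.

```python
import math


def terrain_indices(specific_catchment_area, slope_deg, eps=1e-6):
    """Return (SPI, TWI, STI) for a single raster cell.

    specific_catchment_area: A_s, upslope contributing area per unit contour length.
    slope_deg: local slope angle beta, in degrees.
    eps: hypothetical lower bound on tan(beta) so that TWI stays finite on flat cells.
    """
    beta = math.radians(slope_deg)
    tan_b = math.tan(beta)
    spi = specific_catchment_area * tan_b                      # A_s * tan(beta)
    twi = math.log(specific_catchment_area / max(tan_b, eps))  # ln(A_s / tan(beta))
    sti = ((specific_catchment_area / 22.13) ** 0.6
           * (math.sin(beta) / 0.0896) ** 1.3)
    return spi, twi, sti
```

For a cell with $A_{s} = 22.13$ on a 45° slope, the first STI factor equals 1, so STI reduces to $(\sin 45° / 0.0896)^{1.3}$, which is a convenient sanity check.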

All LIFs were derived from the Hong Kong Open Data Platform (https://data.gov.hk). Topographic factors were generated from a high-resolution Digital Terrain Model (DTM) based on LiDAR data collected in January 2011, with a planimetric accuracy of 0.5 m. Geological and anthropogenic factors were extracted from 1:20,000-scale geomaps. A summary of data sources and resolutions is provided in Table 1.

Given the difficulty in aligning LIFs perfectly with historical landslide timing, we assume that all selected factors remained stable during the study period (1984–2019).

2.4. Data Preprocessing

Raw LIFs vary in data structure and must be preprocessed before model input. Since CNNs require regularized image inputs, all vector-based LIFs are rasterized using ArcGIS 10.7. All layers are unified under the same coordinate reference system (HK1980 Grid) and resampled to a consistent spatial resolution.

The original LiDAR-derived DTM has a resolution of 1.0 m/pixel, which provides sufficient detail for fine-scale LSA. Thus, all 11 LIFs are resampled to 1.0 m resolution. Areas with no-data values (e.g., water bodies) are masked out using the Hong Kong 2020 land use dataset. Additionally, urbanized and engineered slopes—where landslides are unlikely due to stabilization measures—are excluded using a second land-use-based mask.

All LIFs are normalized to the range $[0, 1]$ using min–max normalization:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

This normalization serves several important purposes: (1) it eliminates scale differences between various LIFs with different units and value ranges (e.g., elevation in meters, slope in degrees, lithology classifications), ensuring that all factors contribute equally to the model without being dominated by variables with larger numerical values; (2) it accelerates model convergence during training by keeping all input values within a consistent and manageable range for the neural network; (3) it stabilizes gradient propagation in the deep learning model, preventing potential gradient vanishing or exploding issues that can occur when input values vary significantly in magnitude.
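
As an illustration of Eq. (1), the sketch below rescales one raster layer with NumPy. It assumes that masked cells (for example water bodies) are stored as NaN and should stay NaN; that storage convention, the function name, and the handling of constant layers are assumptions made here, not details given in the text.

```python
import numpy as np


def min_max_normalize(layer):
    """Scale one LIF raster to [0, 1] following Eq. (1).

    layer: 2D float array; NaN marks masked cells (an assumed convention).
    """
    x_min = np.nanmin(layer)
    x_max = np.nanmax(layer)
    if x_max == x_min:
        # Constant layer: return zeros for valid cells to avoid division by zero.
        return np.where(np.isnan(layer), np.nan, 0.0)
    return (layer - x_min) / (x_max - x_min)
```

Applied to elevations of 0, 50, and 100 m, the function returns 0, 0.5, and 1, while any NaN cell is left unchanged.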

For CNN input, evaluation units are defined as fixed-size image cubes. Based on landslide morphology statistics (Table 2), most landslide source areas are smaller than 30 m in length and 900 m² in area. Considering the minimum input size requirement of 32 pixels for the CNN architecture and the 1.0 m/pixel resolution, a $32 \times 32$ m image cube is adopted as the evaluation unit.

3. Methods

3.1. Study Workflow

This study follows a four-stage workflow (Figure 4): (1) data collection and preprocessing, which has been elaborated in Section 2; (2) construction of dual-format datasets to support both deep learning and traditional machine learning models; (3) model training and comparative evaluation using the proposed MSAN and baseline methods; and (4) result verification through quantitative metrics and spatial pattern analysis. The workflow is designed to systematically assess the impact of spatial context and model architecture on landslide susceptibility prediction performance.

3.2. Dual Dataset Construction for CNN and Traditional ML Models

To enable a fair comparison between convolutional neural networks (CNNs) and traditional machine learning (ML) models, we construct two datasets—object-based and pixel-based—from the same set of landslide-influencing factors (LIFs). This dual-dataset strategy isolates the impact of spatial context while ensuring consistency in sample locations and class distribution.

The sample generation process (Figure 5) consists of the following steps:

1. Grid Partitioning: The study area (excluding water bodies) is divided into a regular grid of $200 \times 200$ m cells. This grid size was selected based on (1) the typical size distribution of landslides in the study area (most landslide source areas are smaller than 30 m in length and 900 m² in area, as shown in Table 2); (2) the spatial resolution of the input landslide inventory data and conditioning factors; and (3) the need to ensure adequate sample separation to minimize spatial autocorrelation effects, with 200 m separation exceeding the minimum distance threshold commonly recommended in spatial modeling studies.
2. Landslide Object Determination: Each grid cell is classified as a landslide object if it contains at least one landslide source point, or as a non-landslide object otherwise. These labeled objects serve as the basis for sample generation.
3. Sample Location Selection: All landslide objects are retained to generate positive samples. For each landslide object, the associated landslide source point is used as the sample location. Non-landslide objects are randomly selected to achieve a positive-to-negative ratio of approximately 1:2, with the grid centroid serving as the sample location for each selected non-landslide object. This results in 29,440 sample locations (9814 positive, 19,626 negative).
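
The three steps above can be sketched in a few lines of plain Python. This is an illustrative outline only: the function name `select_sample_locations`, the use of integer `(col, row)` cell indices, and the rule of keeping the first source point found in a cell (the text does not say how cells with several source points are handled) are all assumptions.

```python
import random

CELL = 200.0  # grid cell size in metres, as stated in the text


def select_sample_locations(source_points, all_cells, neg_per_pos=2, seed=0):
    """Return (positives, negatives) as lists of (x, y) sample locations.

    source_points: list of (x, y) landslide source coordinates in metres.
    all_cells: iterable of (col, row) indices of valid (unmasked) grid cells.
    """
    # Step 2: a cell is a landslide object if it holds at least one source point.
    point_in_cell = {}
    for x, y in source_points:
        cell = (int(x // CELL), int(y // CELL))
        point_in_cell.setdefault(cell, (x, y))  # assumption: keep the first point
    positives = list(point_in_cell.values())

    # Step 3: randomly draw non-landslide cells and use their centroids.
    candidates = sorted(c for c in set(all_cells) if c not in point_in_cell)
    rng = random.Random(seed)
    n_neg = min(len(candidates), neg_per_pos * len(positives))
    chosen = rng.sample(candidates, n_neg)
    negatives = [((c + 0.5) * CELL, (r + 0.5) * CELL) for c, r in chosen]
    return positives, negatives
```

With `neg_per_pos=2`, the sketch draws two non-landslide cells for every landslide object, which mirrors the ratio described in step 3.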

Based on these shared sample locations, the two datasets are constructed as follows:

Object-based dataset: A $32 \times 32$ pixel block centered on each sample location is extracted from each of the 11 LIF layers defined in Section 2.3 and stacked into a $32 \times 32 \times 11$ tensor, forming an image-like input for CNNs.
Pixel-based dataset: The LIF values at each sample location are extracted as an 11-dimensional feature vector, suitable for traditional ML models (e.g., SVM, RF, MLP).
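
To make the relationship between the two datasets concrete, the sketch below extracts both representations for one sample location from a stacked raster. The array layout `(H, W, 11)`, the function name `extract_samples`, and the choice to raise an error near the raster edge are assumptions for illustration; the text does not describe how edge samples are treated.

```python
import numpy as np

PATCH = 32  # patch side length in pixels (1.0 m/pixel in the text)


def extract_samples(stack, row, col):
    """Return (patch, vector) for one sample location.

    stack: array of shape (H, W, 11) holding the normalized LIF layers.
    (row, col): pixel index of the sample location.
    patch: object-based 32 x 32 x 11 tensor; vector: pixel-based 11-D vector.
    """
    half = PATCH // 2
    r0, c0 = row - half, col - half
    if r0 < 0 or c0 < 0 or r0 + PATCH > stack.shape[0] or c0 + PATCH > stack.shape[1]:
        raise ValueError("patch extends beyond the raster; pad or skip this sample")
    patch = stack[r0:r0 + PATCH, c0:c0 + PATCH, :]
    vector = stack[row, col, :]
    return patch, vector
```

Because the patch side is even, the sample location sits at index `[16, 16]` of the patch, so the pixel-based vector is exactly the set of values at that position of the object-based tensor.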

Both datasets are aligned at the sample level, ensuring identical spatial locations and class labels. All samples are randomly partitioned into training (60%), validation (30%), and test (10%) subsets, with class distribution preserved in each split. The training and validation sets are used for model training and hyperparameter optimization (e.g., via cross-validation for traditional ML models), while the test set is reserved for final performance evaluation. This consistent protocol ensures a fair comparison between CNN and traditional ML models. The 200 m separation between sample locations helps mitigate spatial autocorrelation.

Specifically, the 60%–30%–10% split was chosen based on the following considerations: (1) the 60% training set provides sufficient data for training complex deep learning models with multiple parameters; (2) the relatively large 30% validation set ensures robust hyperparameter tuning and reliable early stopping criteria, which is particularly important for spatial prediction tasks where spatial autocorrelation can affect model generalization; and (3) the 10% test set provides an adequate holdout sample for unbiased final performance assessment while maintaining sufficient sample sizes for each subset in our spatially distributed landslide data. This split balances the need for sufficient training data with the requirement for reliable validation and testing in spatial prediction applications.
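
A class-preserving 60%–30%–10% partition of the kind described above could be implemented as follows. This is a sketch under the assumption of a simple per-class random shuffle; the function name `stratified_split` and the fixed seed are illustrative and do not come from the paper.

```python
import random


def stratified_split(labels, fractions=(0.6, 0.3, 0.1), seed=0):
    """Return (train, val, test) index lists with class proportions preserved."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test
```

Splitting each class separately keeps the positive-to-negative ratio the same in all three subsets, which is what "class distribution preserved in each split" requires.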

Table 3 summarizes the two datasets.

3.3. Proposed MSAN

To address the limitations of conventional CNNs in capturing multi-scale terrain features and distinguishing critical from redundant inputs, we propose a multi-scale attention network (MSAN) for LSA.

The architecture (Figure 6) integrates three components: (i) multi-scale feature extraction through parallel convolutional branches with varying kernel sizes ($3 \times 3$, $5 \times 5$, $7 \times 7$) to capture landslide features at different spatial scales, from local slope variations to regional geological structures; (ii) attention-based feature refinement via adaptive recalibration mechanisms (MSSE and MSMHA variants) to mitigate scale conflict and redundancy, employing channel-wise SE blocks or spatial multi-head self-attention for dynamic feature selection; and (iii) a classification head that maps the fused features to a continuous susceptibility index, outputting a landslide probability for each input patch through fully connected layers with sigmoid activation.

The network takes a $32 \times 32 \times 11$ input patch and outputs a scalar susceptibility probability via a sigmoid activation. To evaluate the contribution of each component, we define three variants under the same architectural backbone:

MSCNN: multi-scale convolutional neural network, baseline with multi-scale extraction only.
MSSE: MSCNN + Squeeze-and-Excitation (SE) block for channel attention.
MSMHA: MSCNN + multi-head self-attention (MHA) for spatial attention.

All variants share the same input, depth, and training configuration, enabling a controlled ablation study.

3.3.1. Multi-Scale Feature Extraction

To capture terrain characteristics across multiple spatial scales, from local slope variations to regional geological structures, a multi-scale feature extraction module is employed at the input stage. This module processes the input landslide-influencing factors (LIFs) through parallel convolutional branches with kernel sizes of $3 \times 3$, $5 \times 5$, and $7 \times 7$, enabling the network to simultaneously perceive fine details and broad contextual patterns.

Each branch consists of a convolutional layer followed by batch normalization and a ReLU activation function. The smaller kernels, such as $3 \times 3$, are sensitive to fine-grained local details like abrupt elevation changes, while larger kernels, such as $7 \times 7$, capture broader contextual information, including watershed boundaries or fault zones.

The outputs from all branches are concatenated along the channel dimension, producing a fused multi-scale feature map that preserves both high-resolution detail and wide-field context. As illustrated in Figure 7, this design enables the network to adaptively respond to landslides of varying sizes and morphologies.
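
The branch-and-concatenate idea can be illustrated with a small NumPy sketch. It is not the authors' PyTorch implementation: batch normalization is omitted for brevity, the kernels are random rather than learned, and the choice of 16 filters per branch is an assumption made purely so the shapes can be shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, w):
    """Stride-1 convolution with zero padding that keeps H and W unchanged.

    x: (H, W, C_in); w: (k, k, C_in, C_out) with odd k.
    """
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    windows = sliding_window_view(xp, (k, k), axis=(0, 1))  # (H, W, C_in, k, k)
    return np.einsum("hwcij,ijco->hwo", windows, w)


def multi_scale_features(x, kernels):
    """Apply one branch per kernel, then ReLU and channel-wise concatenation."""
    branches = [np.maximum(conv2d_same(x, w), 0.0) for w in kernels]
    return np.concatenate(branches, axis=-1)


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32, 11))  # one input patch with 11 LIF channels
# 16 filters per branch is a hypothetical choice for this illustration.
kernels = [rng.normal(scale=0.1, size=(k, k, 11, 16)) for k in (3, 5, 7)]
features = multi_scale_features(x, kernels)
print(features.shape)  # (32, 32, 48)
```

Padding each branch by half its kernel size keeps every output at $32 \times 32$, which is what allows the three branches to be concatenated along the channel dimension.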

3.3.2. Attention Mechanism Integration

To enhance the model's ability to focus on critical features while suppressing irrelevant or noisy inputs, two distinct attention mechanisms are integrated: channel-wise attention and spatial self-attention. Each module is applied after the multi-scale feature extraction, in the MSSE and MSMHA variants respectively, to recalibrate feature importance in complementary ways.

Channel Attention (SE Block)

The Squeeze-and-Excitation (SE) block [23] is adopted to model inter-channel dependencies and adaptively re-weight feature channels based on their global contribution to landslide prediction. The mechanism operates in two steps: (1) Squeeze: global average pooling is applied across spatial dimensions to generate a channel descriptor; (2) Excitation: a bottleneck fully connected layer followed by a sigmoid activation produces channel-wise weights, which are then multiplied element-wise with the original feature map. This allows the model to emphasize relevant feature channels while down-weighting less informative ones. The SE block structure is illustrated in Figure 8.

Let $U = [U_{1}, U_{2}, \dots, U_{C}] \in \mathbb{R}^{H \times W \times C}$ denote the input to the channel attention layer, consisting of $C$ feature maps of size $H \times W$. The process involves three steps:

Step 1: Attention Extraction (Squeeze). Global average pooling compresses each channel into a scalar:

$$z_{c} = F_{sq}(U_{c}) = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} U_{c}(i, j) \tag{2}$$

Step 2: Attention Update (Excitation). A bottleneck MLP with ReLU ($\delta$) and sigmoid ($\sigma$) activations learns adaptive channel weights:

$$s = F_{ex}(z, W) = \sigma(W_{2} \, \delta(W_{1} z)) \tag{3}$$

where $W_{1} \in \mathbb{R}^{\frac{C}{r} \times C}$, $W_{2} \in \mathbb{R}^{C \times \frac{C}{r}}$, and $r$ is the reduction ratio.

Step 3: Attention Allocation (Scaling). The learned weights $s_{c}$ are applied via element-wise multiplication:

$$\tilde{X}_{c} = F_{scale}(U_{c}, s_{c}) = s_{c} \cdot U_{c} \tag{4}$$
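
The squeeze, excitation, and scaling steps of Eqs. (2)–(4) can be written compactly in NumPy. The sketch below assumes a channels-last feature map and a bias-free bottleneck (a ReLU layer followed by a sigmoid gate); the function name `se_block` and the example sizes are illustrative, not taken from the paper.

```python
import numpy as np


def se_block(U, W1, W2):
    """Squeeze-and-Excitation recalibration of a feature map.

    U: (H, W, C) feature map; W1: (C // r, C); W2: (C, C // r).
    """
    z = U.mean(axis=(0, 1))                   # squeeze, Eq. (2): shape (C,)
    hidden = np.maximum(W1 @ z, 0.0)          # ReLU bottleneck: shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))  # sigmoid gate, Eq. (3): shape (C,)
    return U * s                              # scaling, Eq. (4): broadcast over H, W
```

Each channel weight lies in $(0, 1)$, so the block can only attenuate channels relative to one another; with all-zero `W2` every weight is 0.5 and the output is simply half the input.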

Multi-Head Self-Attention (MHA)

To capture long-range spatial dependencies beyond the local receptive field of convolutions, a multi-head self-attention (MHA) module [38] is introduced. The MHA architecture is shown in Figure 9. The flattened multi-scale feature map is reshaped into a sequence of feature vectors, each representing a spatial location. Self-attention is computed across all positions, allowing each location to attend to others through query, key, and value transformations. By employing multiple attention heads, the model can jointly attend to information from different subspaces, capturing complex spatial relationships such as topographic convergence zones or fault-line alignments critical for landslide initiation.

The input features are linearly projected into query ($Q$), key ($K$), and value ($V$) matrices:

$$Q = X W^{Q}, \quad K = X W^{K}, \quad V = X W^{V} \tag{5}$$

Scaled dot-product attention is computed as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{6}$$

Multi-head attention divides the input into $h$ heads, performs attention independently, and concatenates the results:

$$\mathrm{head}_{i} = \mathrm{Attention}(Q_{i}, K_{i}, V_{i}), \quad i = 1, \dots, h \tag{7}$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}) \, W^{O} \tag{8}$$

where $W^{O} \in \mathbb{R}^{(h \cdot d_{k}) \times d}$ is a linear projection matrix, $d_{k}$ is the dimension per head, and $d$ is the output dimension.

The output retains positional information while encoding global context and is reshaped back into a 2D feature map for downstream processing.
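
Eqs. (5)–(8) can be traced step by step in NumPy. The sketch below treats a flattened feature map as a sequence of $N = H \times W$ positions; the function names, the bias-free projections, and the way heads are obtained by splitting the projected features are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, Wq, Wk, Wv, Wo, h):
    """Multi-head self-attention over a sequence of spatial positions.

    X: (N, d); Wq, Wk, Wv: (d, h * d_k); Wo: (h * d_k, d).
    """
    N = X.shape[0]
    d_k = Wq.shape[1] // h
    # Eq. (5), then split the projected features into h heads: (h, N, d_k)
    Q, K, V = ((X @ W).reshape(N, h, d_k).transpose(1, 0, 2) for W in (Wq, Wk, Wv))
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)       # (h, N, N)
    heads = softmax(scores) @ V                            # Eqs. (6)-(7): (h, N, d_k)
    concat = heads.transpose(1, 0, 2).reshape(N, h * d_k)  # concatenate the heads
    return concat @ Wo                                     # Eq. (8): (N, d)
```

A feature map of shape `(H, W, d)` would be passed in as `fmap.reshape(-1, d)` and the result reshaped back to `(H, W, d)`. Note that the attention matrix has $N \times N$ entries per head, so the cost grows quadratically with the number of spatial positions.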

3.3.3. Classification Head

Following feature refinement via attention, the enhanced feature map is fed into a classification head composed of two fully connected layers. The output layer uses a sigmoid activation function to produce a scalar landslide susceptibility probability for each input patch, representing the likelihood of landslide occurrence at the corresponding location.

4. Experiment and Result Evaluation

4.1. Object-Based LSA

To validate the effectiveness of the proposed LSA model, classical CNN classification models were selected for comparative experiments. These models include VGG [33], GoogLeNet [34,45], ResNet [20], and DenseNet [35]. VGG is a well-known deep CNN featuring a straightforward architecture and impressive performance, characterized by a deep hierarchical structure that employs a repetitive stacking of small convolutional kernels and pooling layers. In this study, VGG16 was utilized as the backbone of the network. GoogLeNet, developed by the Google team, is recognized for its innovative Inception structure, enabling the extraction of features at different scales and performing well in image classification tasks. ResNet introduced residual blocks, allowing the network to learn residual information effectively, enabling the training of deeper networks while improving performance. DenseNet is a densely connected CNN where each layer is connected to all preceding layers. This architecture promotes better information flow, addressing the gradient vanishing problem and enhancing feature delivery efficiency.

Given that the original implementations of these networks accept input data of size $224 \times 224$ with 3 channels, modifications were necessary to align with the dataset characteristics in this work. Specifically, the number of input channels in the first convolutional layer was adjusted to 11, while the output features in the last fully connected layer were modified accordingly to match the feature dimensions. These models maintained binary classification outputs.

Every model was trained with a batch size of 32 and a learning rate of 0.001 for up to 300 epochs. Fitting performance was assessed on the validation dataset at the end of each epoch, and early stopping was used to avoid overfitting: training was stopped and the best network parameters were kept if the validation loss did not improve for 10 consecutive epochs. All experiments were conducted on a workstation with an Intel Core i7-7700 processor (3.6 GHz) and an NVIDIA GeForce RTX 3090 GPU, running Windows 10. Models were implemented in PyTorch 2.0.1 with Python 3.8. Each input sample was $32 \times 32 \times 11$ in size. Hyperparameters were tuned empirically to find the best-performing settings.
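The early stopping rule (patience of 10 epochs, at most 300 epochs) can be sketched as below. The function name and the synthetic loss curve are hypothetical; in an actual training loop the loss would come from evaluating the model on the validation set, and the checkpoint would be saved where the comment indicates.

```python
def early_stopping_run(val_losses, patience=10, max_epochs=300):
    """Return (best_epoch, stop_epoch), both 1-indexed, for per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    stale = 0  # consecutive epochs without improvement
    stop_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            # An improvement: the model parameters would be checkpointed here.
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, stop_epoch


# Synthetic curve: the loss falls for 20 epochs, then plateaus at a worse value.
losses = [1.0 / e for e in range(1, 21)] + [0.06] * 50
best_epoch, stop_epoch = early_stopping_run(losses)
print(best_epoch, stop_epoch)
```

With this synthetic curve, the best parameters come from the last improving epoch, and training halts ten epochs later.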

The Adaptive Moment Estimation (Adam) optimizer [24] is employed to optimize the model parameters during training. Adam iteratively updates the neural network weights to minimize the loss function J, which uses binary cross-entropy to quantify the discrepancy between true and predicted probability distributions:

J = - \frac{1}{m} \sum_{i = 1}^{m} [y^{(i)} log h_{θ} (x^{(i)}) + (1 - y^{(i)}) log (1 - h_{θ} (x^{(i)}))]

(9)

where the following apply:

m: number of training samples.
$y^{(i)}$ : actual label (0 or 1).
$h_{θ} (x^{(i)})$ : predicted probability of landslide occurrence.
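Equation (9) can be evaluated directly, as in the short sketch below. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero, and the labels and probabilities are made-up examples.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy J of Eq. (9) over m samples."""
    m = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / m


labels = [1, 0, 1, 0]             # made-up labels (1 = landslide)
predicted = [0.9, 0.1, 0.8, 0.3]  # made-up predicted probabilities
loss = binary_cross_entropy(labels, predicted)
print(f"J = {loss:.4f}")
```

A model that outputs 0.5 for every sample gives J = ln 2, which is a useful reference value when reading training curves.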

After training, each model was applied to calculate the landslide susceptibility in the natural areas of Hong Kong. For this purpose, the land areas of Hong Kong were divided into a grid of non-overlapping $32 \times 32$ pixel image blocks. The LSMs for these natural areas, generated by each model, are presented in Figure 10 and Figure 11.
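The division into non-overlapping blocks can be sketched as follows. The raster size is hypothetical, and dropping incomplete blocks at the raster edge is an assumption, since the text does not say how edges were handled.

```python
def tile_origins(height, width, block=32):
    """Top-left (row, col) of every full, non-overlapping block in a raster."""
    return [
        (r, c)
        for r in range(0, height - block + 1, block)
        for c in range(0, width - block + 1, block)
    ]


def extract_block(raster, origin, block=32):
    """Cut one block x block window out of a raster stored as a list of rows."""
    r, c = origin
    return [row[c:c + block] for row in raster[r:r + block]]


height, width = 100, 70  # hypothetical raster size in pixels
raster = [[r * width + c for c in range(width)] for r in range(height)]
origins = tile_origins(height, width)
first = extract_block(raster, origins[0])
print(len(origins), len(first), len(first[0]))
```

In the setting described here, each extracted block would carry 11 factor layers rather than the single layer used in this toy raster.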

As shown in Figure 10 and Figure 11, the landslide susceptibility distributions from the models exhibit similar trends and align well with the actual landslide distribution (see Figure 1), though there are slight differences in classification tendencies. The LSM generated by the ResNet-based model tends to classify evaluation units into lower susceptibility categories, while the VGG-based model leans towards higher susceptibility categories. In contrast, the LSMs from the other models show a more balanced distribution across different susceptibility categories. The trends observed in the MSCNN and MSMHA models are quite similar and differ somewhat from the standard CNN-based models. High susceptibility areas are primarily concentrated in the southeastern mountains of Lantau Island, the western New Territories (e.g., Castle Peak), the mountains of Sai Kung (e.g., Sharp Peak), and various coastal slopes. The high landslide susceptibility in these regions can be attributed to steep slopes and the predominance of weathered volcanic rock with sparse vegetation, leading to low resistance against shear and erosion. The MSCNN, MSSE and MSMHA models identify more landslide-prone areas than the standard CNN classification model.

4.2. Pixel-Based LSA

To further validate the advantages of the proposed MSMHA and MSSE models, this work also compares them with traditional ML-based models that use pixels as samples. These ML-based models include RF, MLP, and SVM.

RF, MLP, and SVM are widely used methods for landslide susceptibility assessment. RF is an ensemble method based on decision trees that aggregates predictions via voting. MLP is a feedforward neural network with nonlinear activation functions and strong mapping capability. SVM finds an optimal separating hyperplane and uses kernel functions to handle nonlinear classification.

Different parameter combinations were tested to optimize the performance of each model on the test data. For SVM, the RBF kernel was used, and the kernel parameter and penalty coefficient were optimized by grid search, giving a penalty coefficient C of 0.5 and a gamma of 0.01. For RF, an ensemble of 1000 decision trees was used. To avoid overfitting, the maximum depth of each tree was limited to 10. The minimum number of samples required to split a node was set to 2, and the minimum number of samples per leaf node was set to 1. The number of features considered at each split was limited to the square root of the total number of features, which increases the diversity of the trees. Bootstrap sampling was enabled so that each tree is trained on a different random sample of the training data, and the Gini impurity criterion was used to assess the quality of the splits. These configurations aim to balance performance and generalization ability. The MLP uses three hidden layers of 1024 neurons each, with ReLU as the activation function in each layer.
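The reported settings can be collected in a small configuration sketch. The paper does not name the library used for these models; the keyword names below follow scikit-learn's `SVC`, `RandomForestClassifier`, and `MLPClassifier`, which is an assumption made for illustration. The estimators are only built when `build_models()` is called, so the configuration itself has no dependencies.

```python
# Hyperparameters as reported in the text; keyword names assume scikit-learn.
SVM_PARAMS = {"kernel": "rbf", "C": 0.5, "gamma": 0.01}

RF_PARAMS = {
    "n_estimators": 1000,      # number of decision trees
    "max_depth": 10,           # limits tree depth to curb overfitting
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "sqrt",    # square root of the number of features per split
    "bootstrap": True,
    "criterion": "gini",
}

MLP_PARAMS = {"hidden_layer_sizes": (1024, 1024, 1024), "activation": "relu"}


def build_models():
    """Instantiate the three estimators; requires scikit-learn to be installed."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    return {
        "SVM": SVC(**SVM_PARAMS),
        "RF": RandomForestClassifier(**RF_PARAMS),
        "MLP": MLPClassifier(**MLP_PARAMS),
    }
```

Each estimator would then be fitted on the pixel-level samples, with one row of 11 factor values per pixel.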

The above models were trained on the pixel-level dataset. Similarly, the natural areas of Hong Kong were divided into pixel grids, and the trained models were used to assess the susceptibility of each grid cell, yielding a territory-wide LSM for each model. Figure 12 illustrates the LSMs for all ML-based models.

Figure 12 shows that the LSMs of the ML-based models differ in detail, but in all of them the high susceptibility areas are concentrated in the Jurassic volcanic aggregates in the southwest hills of Lantau Island. Comparison with Figure 1c shows that the landslide susceptibility trends and spatial distributions obtained by each method are generally consistent, although the LSMs obtained by MSMHA and MSSE differ somewhat from those of the ML-based models. The LSMs from RF, SVM and MLP have a higher percentage of low susceptibility areas, whereas those from MSMHA and MSSE have a higher percentage of high susceptibility areas, with the MSSE model predicting a larger area of high and very high susceptibility.

4.3. Quantitative Evaluation

Model performance was evaluated using multiple metrics on both training and test datasets, with an ablation study to quantify the contribution of each architectural component.

Evaluation Metrics

This study employs a variety of statistical evaluation metrics, including accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the curve (AUC), to evaluate the performance of the landslide susceptibility model. These metrics are derived from the four outcomes of binary classification: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). Specifically, TP refers to landslide samples correctly predicted as landslides, while TN denotes non-landslide samples correctly classified. FP represents non-landslide samples incorrectly predicted as landslides, and FN indicates landslide samples misclassified as non-landslides.

Precision measures the reliability of positive predictions, while Recall reflects the model’s sensitivity in detecting actual landslides. The F1 score, as the harmonic mean of precision and recall, offers a balanced assessment of overall performance. The ROC curve visualizes the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR) across varying thresholds, and the AUC provides a scalar summary of discriminative power, with values approaching 1.0 indicating strong classification capability.

The formulas for these metrics are given in Equations (10)–(15):

Accuracy = \frac{T P + T N}{T P + T N + F P + F N}

(10)

T P R = \frac{T P}{T P + F N}

(11)

F P R = \frac{F P}{F P + T N}

(12)

Precision = \frac{T P}{T P + F P}

(13)

Recall = \frac{T P}{T P + F N}

(14)

F_{1} score = \frac{2 \times Recall \times Precision}{Recall + Precision}

(15)
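Equations (10)–(15) and the AUC can be computed as in the sketch below. The confusion-matrix counts and scores are made-up examples, and the AUC uses the rank interpretation (the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample), which is one standard way to obtain the area under the ROC curve.

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. (10)-(15) from confusion-matrix counts (no guard against empty classes)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)          # true positive rate, identical to recall
    fpr = fp / (fp + tn)          # false positive rate
    precision = tp / (tp + fp)
    recall = tpr
    f1 = 2 * recall * precision / (recall + precision)
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


metrics = classification_metrics(tp=80, tn=70, fp=30, fn=20)  # made-up counts
auc = auc_from_scores([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.2])
print({k: round(v, 3) for k, v in metrics.items()}, round(auc, 3))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small check but would usually be replaced by a sorted-rank computation on a full test set.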

Overall Model Performance

The performance of all models was evaluated on both training and test datasets to assess their fitting capability and generalization, respectively. Results are summarized in Table 4 and Table 5, with ROC curves for CNN-based and ML-based models shown in Figure 13 and Figure 14, respectively.

As shown in Table 4, the proposed MSMHA and MSSE models achieve the highest F1 scores (0.89 and 0.85, respectively) on the training set, suggesting effective learning of complex feature representations. On the test set (Table 5), MSMHA achieves the best predictive performance with an F1 score of 0.87 and an AUC of 0.91, outperforming the baseline CNN (F1 = 0.84) as well as all other CNN variants. In contrast, traditional machine learning models—SVM, RF, and MLP—achieve lower F1 scores (0.66–0.70), revealing a performance gap compared to deep learning approaches.

As illustrated in Figure 13 and Figure 14, the MSMHA model achieves the highest AUC of 0.91, followed by MSSE (0.90), indicating strong discriminative ability among the evaluated models. Traditional ML models exhibit notably lower AUC values (RF: 0.86, SVM: 0.85, MLP: 0.85), further highlighting their reduced effectiveness in capturing complex spatial relationships. According to the classification scheme of Guzzetti et al. [4], an AUC of 0.91 places the MSMHA model in Quality Level 5, corresponding to excellent predictive accuracy.

Ablation Analysis: Role of Multi-Scale and Attention Mechanisms

To systematically evaluate the contribution of each component in the proposed MSAN framework, we conduct an ablation study. The analysis focuses on three key variants: (i) MSCNN, the baseline model with only multi-scale feature extraction; (ii) MSSE, incorporating channel-wise attention via Squeeze-and-Excitation blocks; and (iii) MSMHA, capturing long-range spatial dependencies via a multi-head self-attention mechanism.

Despite incorporating multi-scale receptive fields, MSCNN underperforms with an F1 score of 0.64 on the test set—lower than the simple baseline CNN (0.84). This degradation suggests that naive concatenation of multi-scale features introduces redundancy and scale conflicts, impairing feature integration. Without a mechanism to prioritize informative scales, the model struggles to distinguish relevant terrain characteristics from noise.

The integration of attention modules substantially improves performance:

MSSE (F1 = 0.86, AUC 0.90) improves over MSCNN by 33% in F1 score, demonstrating that channel-wise attention effectively recalibrates feature importance and suppresses irrelevant multi-scale responses.
MSMHA (F1 = 0.87, AUC 0.91) achieves the best overall performance, indicating that spatial self-attention further enhances the model’s ability to capture long-range dependencies—such as drainage convergence zones or fault alignments—that are critical for landslide initiation.

Both MSSE and MSMHA outperform the baseline CNN, confirming that adaptive feature recalibration, rather than mere multi-scale representation, is the key driver of performance gains.

4.4. Validation of the Landslide Susceptibility Map

The landslide susceptibility map (LSM) was evaluated using three complementary checks: areal distribution of susceptibility classes, landslide density analysis, and temporal validation with independent data.

First, the areal distribution of susceptibility classes across the entire study region was analyzed (Figure 15a). The results indicate that the “very low” susceptibility class occupies the largest proportion of the landscape, whereas “high” and “very high” susceptibility classes cover relatively small areas. This distribution aligns with the geographical and geological characteristics of Hong Kong, where, despite widespread steep topography, the majority of slopes are either inherently stable or effectively managed through engineering and vegetation controls, resulting in only a limited number of persistently unstable zones.

Second, to quantitatively assess the model’s discriminative power, landslide density (defined as the number of landslides per km²) was computed for each susceptibility class using the complete historical landslide inventory from 1984 to 2019 (Figure 15b). A clear and progressive increase in landslide density with rising susceptibility levels is observed, from the lowest density in the “very low” class to the highest in the “very high” class. This monotonic relationship confirms a strong positive correlation between predicted susceptibility and actual landslide occurrences, which is a fundamental criterion for a reliable and meaningful LSM.
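The density check can be sketched as follows. The five class names, landslide counts, and class areas are placeholder values for illustration only; they are not the values behind Figure 15b.

```python
def landslide_density(counts, areas_km2):
    """Landslides per square kilometre for each susceptibility class."""
    return {cls: counts[cls] / areas_km2[cls] for cls in counts}


def is_strictly_increasing(values):
    """True if every value exceeds the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Placeholder inputs, ordered from lowest to highest susceptibility.
classes = ["very low", "low", "moderate", "high", "very high"]
counts = {"very low": 50, "low": 120, "moderate": 300, "high": 700, "very high": 900}
areas_km2 = {"very low": 400.0, "low": 200.0, "moderate": 150.0, "high": 100.0, "very high": 50.0}

density = landslide_density(counts, areas_km2)
monotonic = is_strictly_increasing([density[c] for c in classes])
print(density, monotonic)
```

A map passes this check when the density rises with every step up in susceptibility class, which is the monotonic relationship described above.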

Third, an independent temporal validation was performed using 3198 landslide locations from 1974 to 1983, predating the training dataset (1984–2019), to evaluate the model’s predictive capability over time. An equal number of non-landslide points were randomly sampled to ensure balanced evaluation. The spatial distributions of these points are shown in Figure 16a,c.

The MSMHA-generated LSM was applied to classify these historical points, with results presented in Figure 16b,d. As shown in Figure 16b, over 65% of the pre-inventory landslide locations were classified into the “high” and “very high” susceptibility categories, demonstrating the model’s ability to identify historically unstable areas despite temporal independence.

In contrast, non-landslide points are distributed relatively evenly across all susceptibility classes (Figure 16d). This contrast between historical landslide clustering in high-susceptibility zones and the uniform distribution of stable points confirms the model’s strong spatial discrimination and high sensitivity. These results validate the robustness of the proposed LSM for regional-scale hazard screening, land-use planning, and prioritization of slope safety inspections.
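The share of validation points falling in each susceptibility class, as used in this temporal check, can be computed as in the sketch below. The ten class labels are a toy input, not the classified 1974–1983 points.

```python
from collections import Counter


def class_shares(labels):
    """Fraction of points assigned to each susceptibility class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def high_share(labels, high_classes=("high", "very high")):
    """Fraction of points that fall in the high or very high classes."""
    return sum(1 for label in labels if label in high_classes) / len(labels)


# Toy input: the class read from the LSM at each validation point.
point_classes = ["very high"] * 4 + ["high"] * 3 + ["moderate"] * 2 + ["low"]
shares = class_shares(point_classes)
print(shares, high_share(point_classes))
```

Running the same two functions on the landslide points and on the non-landslide points gives the contrast that Figure 16b,d summarizes.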

5. Discussion

The MSMHA model achieves the best performance (AUC: 0.91, F1: 0.87) among all evaluated models. This section discusses key findings and their implications.

5.1. Role of Multi-Scale Representation and Attention Mechanisms

Contrary to initial expectations, the baseline MSCNN model underperforms compared to conventional single-scale CNNs (e.g., VGG, ResNet), achieving lower AUC and F1 score. This suggests that simple concatenation of multi-scale features—without additional regularization—can introduce redundancy and amplify noise. However, when combined with attention mechanisms, multi-scale architecture becomes advantageous: both MSSE (AUC 0.90) and MSMHA (AUC 0.91) outperform MSCNN (AUC 0.71), with relative AUC improvements of 26.8% and 28.2%, respectively, confirming that attention mechanisms enable dynamic selection and fusion of scale-specific information.
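The relative AUC improvements quoted above follow from the AUC values in the same paragraph, as this short check shows. The helper name is hypothetical.

```python
def relative_improvement(new, base):
    """Relative gain of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


auc_mscnn, auc_msse, auc_msmha = 0.71, 0.90, 0.91  # values stated in the text
gain_msse = relative_improvement(auc_msse, auc_mscnn)
gain_msmha = relative_improvement(auc_msmha, auc_mscnn)
print(f"MSSE: {gain_msse:.1f}%  MSMHA: {gain_msmha:.1f}%")
```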

The SE block in MSSE adaptively reweights feature channels, emphasizing those derived from key geoenvironmental factors such as slope and lithology and acting as a feature selector that suppresses noise. MSMHA’s multi-head self-attention captures long-range spatial dependencies beyond local receptive fields (such as upslope contributing areas or fault intersections), thereby modeling non-local interactions that govern landslide initiation.

5.2. Comparison with Existing Methods

The strongest CNN-based models (MSMHA: AUC 0.91; baseline CNN: AUC 0.88) outperform the traditional ML models (random forest: AUC 0.86; SVM: AUC 0.85; MLP: AUC 0.85). As shown in Table 6, MSMHA achieves the highest AUC among all evaluated models. Visual comparison (Figure 17) shows that MSMHA identifies more extensive very-high susceptibility zones across multiple areas, while random forest predicts a more restricted area of high susceptibility. MSMHA also outperforms VGG (AUC 0.87) and ResNet (AUC 0.81) without requiring residual connections or encoder–decoder structures.

5.3. Limitations and Future Work

The validation in this study is limited to rainfall-induced shallow landslides in Hong Kong. Different landslide typologies (rotational slides, translational slides, debris flows) have distinct kinematic characteristics and failure mechanisms that may require type-specific modeling strategies. For instance, rotational slides are primarily controlled by geological structure and groundwater conditions, while debris flows are more influenced by topographic confinement. Future research will extend to other landslide types and incorporate typology-sensitive features.

5.4. Practical Implications

Based on these results, we suggest:

For high-precision landslide risk mapping: MSMHA is recommended (AUC 0.91, F1 0.87), particularly for applications requiring identification of very-high susceptibility zones.
For resource-constrained deployments: Random forest offers a robust alternative with simpler implementation and stronger interpretability (AUC 0.86, F1 0.70).
Avoid relying solely on MSCNN: Its low recall (0.52) may miss substantial portions of landslide-prone areas.

6. Conclusions

This study proposes and evaluates a multi-scale attention network (MSAN) framework for landslide susceptibility assessment (LSA), with two attention-enhanced variants: MSSE and MSMHA. The key findings of this research are as follows:

Performance of proposed framework. Experimental results on a large-scale dataset from Hong Kong demonstrate that both MSSE and MSMHA outperform classical CNN architectures (VGG, ResNet, DenseNet, and GoogLeNet) and traditional machine learning methods (SVM, RF, MLP). The MSMHA model achieves the highest performance, with an F1 score of 0.87 and an AUC of 0.91. These results support the effectiveness of combining multi-scale feature extraction with attention mechanisms for modeling complex landslide-terrain relationships.

Critical role of attention mechanisms. The ablation study reveals that attention mechanisms are essential for realizing the benefits of multi-scale representation. The MSCNN variant (multi-scale without attention) underperforms with an F1 score of 0.64, while both MSSE (F1: 0.85) and MSMHA (F1: 0.87) achieve better results. This demonstrates that simply concatenating multi-scale features introduces redundancy and scale conflicts, which attention mechanisms effectively resolve by adaptively emphasizing critical features and suppressing irrelevant inputs. The spatial self-attention in MSMHA further enhances performance by modeling non-local feature interactions.

Advantages of CNN-based approaches. The baseline CNN, VGG, and the attention-enhanced models outperform traditional machine learning approaches in AUC (0.87–0.91 versus 0.85–0.86), demonstrating the advantage of learning hierarchical spatial representations directly from raw geo-environmental data without reliance on manual feature engineering. The advantage is not universal: GoogLeNet and ResNet (AUC 0.81) and MSCNN (AUC 0.71) fall below random forest (AUC 0.86). The spatial context captured by CNNs enables better identification of terrain patterns and geomorphological indicators that are crucial for accurate landslide susceptibility assessment.

Validation of predictive capability. Temporal validation using pre-inventory landslide data (1974–1983) demonstrates that the proposed MSMHA model successfully identifies over 65% of historically unstable areas within high- and very-high susceptibility zones, confirming the model’s strong spatial discrimination and predictive skill despite temporal independence from the training period.

Practical implications. The proposed MSAN framework, particularly the MSMHA variant, provides an accurate solution for regional landslide risk assessment, with potential for supporting landslide risk mitigation and land-use planning in urbanized mountainous regions. The framework’s ability to generate spatially coherent susceptibility maps with high accuracy offers valuable tools for hazard screening, prioritization of slope safety inspections, and informed decision-making in land-use planning.

Future work will explore the integration of dynamic environmental factors (e.g., rainfall and vegetation change) to further improve model performance and temporal resolution. Additionally, efforts will be directed toward developing interpretability analysis methods to enhance understanding of how individual conditioning factors contribute to landslide susceptibility predictions.

Author Contributions

Conceptualization, Z.Z. and W.S.; methodology, Z.Z. and M.Z.; software, Z.Z.; validation, Z.Z., M.Z. and Y.S.; formal analysis, Z.Z. and S.C.; investigation, Z.Z., S.C. and H.L.; resources, W.S. and M.Z.; data curation, Z.Z. and Y.S.; writing—original draft preparation, Z.Z.; writing—review and editing, W.S., M.Z. and Y.S.; visualization, Z.Z. and Y.S.; supervision, W.S. and M.Z.; project administration, W.S.; funding acquisition, W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Smart Cities Research Institute, The Hong Kong Polytechnic University, grant number CD06.

Institutional Review Board Statement

Not applicable. This study is based on historical landslide inventory data and geo-environmental factors, and does not involve human or animal subjects.

Informed Consent Statement

Not applicable.

Data Availability Statement

The landslide inventory data used in this study were obtained from the Geotechnical Engineering Office of the Civil Engineering and Development Department (CEDD) of Hong Kong. Due to data sharing restrictions, these data are not publicly available. Other landslide-influencing factors (topographic and geological) were derived from the Hong Kong Open Data Platform (https://data.gov.hk), which are available upon reasonable request. The trained model weights and source code are available from the corresponding author upon reasonable request.

Acknowledgments

During the preparation of this manuscript, the authors used AI-assisted tools for the purposes of language polishing and error checking. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

Author Z.Z. was employed by Changjiang Survey, Planning, Design and Research Co., Ltd. Authors S.C. and H.L. were employed by Changjiang Spatial Information Technology Engineering Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AUC	Area under the curve
CNN	Convolutional neural network
DTM	Digital terrain model
GIS	Geographic information system
LiDAR	Light detection and ranging
LIF	Landslide-influencing factor
LSA	Landslide susceptibility assessment
LSM	Landslide susceptibility map
ML	Machine learning
MLP	Multi-layer perceptron
MHA	Multi-head attention
MSAN	Multi-scale attention network
MSCNN	Multi-scale convolutional neural network
MSMHA	Multi-scale multi-head attention
MSSE	Multi-scale Squeeze-and-Excitation
RF	Random forest
ROC	Receiver operating characteristic
SE	Squeeze-and-Excitation
SPI	Stream power index
STI	Sediment transport index
SVM	Support vector machine
TWI	Topographic wetness index

References

Alvioli, M.; Guzzetti, F.; Rossi, M. Scaling Properties of Rainfall Induced Landslides Predicted by a Physically Based Model. Geomorphology 2014, 213, 38–47.
Arabameri, A.; Pradhan, B.; Rezaei, K.; Lee, C.W. Assessment of Landslide Susceptibility Using Statistical- and Artificial Intelligence-Based FR–RF Integrated Model and Multiresolution DEMs. Remote Sens. 2019, 11, 999.
Barredo, J.; Benavides, A.; Hervás, J.; van Westen, C.J. Comparing Heuristic Landslide Hazard Assessment Techniques Using GIS in the Tirajana Basin, Gran Canaria Island, Spain. Int. J. Appl. Earth Obs. Geoinf. 2000, 2, 9–23.
Guzzetti, F.; Reichenbach, P.; Ardizzone, F.; Cardinali, M.; Galli, M. Estimating the Quality of Landslide Susceptibility Models. Geomorphology 2006, 81, 166–184.
Meneses, B.M.; Pereira, S.; Reis, E. Effects of Different Land Use and Land Cover Data on the Landslide Susceptibility Zonation of Road Networks. Nat. Hazards Earth Syst. Sci. 2019, 19, 471–487.
Miao, F.; Zhao, F.; Wu, Y.; Li, L.; Török, Á. Landslide Susceptibility Mapping in Three Gorges Reservoir Area Based on GIS and Boosting Decision Tree Model. Stoch. Environ. Res. Risk Assess. 2023, 37, 2283–2303.
Petley, D. Global Patterns of Loss of Life from Landslides. Geology 2012, 40, 927–930.
Formetta, G.; Rago, V.; Capparelli, G.; Rigon, R.; Muto, F.; Versace, P. Integrated Physically Based System for Modeling Landslide Susceptibility. Procedia Earth Planet. Sci. 2014, 9, 74–82.
Lombardo, L.; Mai, P.M. Presenting Logistic Regression-Based Landslide Susceptibility Results. Eng. Geol. 2018, 244, 14–24.
Guzzetti, F.; Mondini, A.C.; Cardinali, M.; Fiorucci, F.; Santangelo, M.; Chang, K.T. Landslide Inventory Maps: New Tools for an Old Problem. Earth-Sci. Rev. 2012, 112, 42–66.
Dias, A.; Hart, J.; Fung, E.K.S. The Enhanced Natural Terrain Landslide Inventory. In Natural Hillside Study—Risk Mitigation Measures; Geotechnical Engineering Office, Civil Engineering and Development Department: Hong Kong, China, 2009; pp. 71–78.
Merghadi, A.; Yunus, A.P.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.T.; Avtar, R.; Abderrahmane, B. Machine Learning Methods for Landslide Susceptibility Studies: A Comparative Overview of Algorithm Performance. Earth-Sci. Rev. 2020, 207, 103225.
Ganesh, B.; Vincent, S.; Pathan, S.; Garcia Benitez, S.R. Machine Learning Based Landslide Susceptibility Mapping Models and GB-SAR Based Landslide Deformation Monitoring Systems: Growth and Evolution. Remote Sens. Appl. Soc. Environ. 2023, 29, 100905.
Kudaibergenov, M.; Nurakynov, S.; Iskakov, B.; Iskaliyeva, G.; Maksum, Y.; Orynbassarova, E.; Akhmetov, B.; Sydyk, N. Application of Artificial Intelligence in Landslide Susceptibility Assessment: Review of Recent Progress. Remote Sens. 2025, 17, 34.
Huang, Y.; Zhao, L. Review on Landslide Susceptibility Mapping Using Support Vector Machines. Catena 2018, 165, 520–529.
Ma, L.; Wang, J.; Cheng, J.; Wang, X.; Zhu, W. MLRP-KG: Mine Landslide Risk Prediction Based on Knowledge Graph. IEEE Trans. Artif. Intell. 2022, 3, 78–87.
Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support Vector Machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
Baum, E.B. On the Capabilities of Multilayer Perceptrons. J. Complex. 1988, 4, 193–215.
LeCun, Y.; Jackel, L.; Bottou, L.; Brunot, A.; Cortes, C.; Denker, J.; Drucker, H.; Guyon, I.; Muller, U.; Sackinger, E. Comparison of Learning Algorithms for Handwritten Digit Recognition. In Proceedings of the International Conference on Artificial Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 60, pp. 53–60.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
Ge, Y.; Liu, G.; Tang, H.; Zhao, B.; Xiong, C. Comparative Analysis of Five Convolutional Neural Networks for Landslide Susceptibility Assessment. Bull. Eng. Geol. Environ. 2023, 82, 377.
Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention Mechanisms in Computer Vision: A Survey. Comput. Vis. Media 2022, 8, 331–368.
Niu, Z.; Zhong, G.; Yu, H. A Review on the Attention Mechanism of Deep Learning. Neurocomputing 2021, 452, 48–62.
Huang, F.; Yan, J.; Fan, X.; Yao, C.; Huang, J.; Chen, W.; Chen, W.; Hong, H.; Hong, H. Uncertainty Pattern in Landslide Susceptibility Prediction Modelling: Effects of Different Landslide Boundaries and Spatial Shape Expressions. Geosci. Front. 2021, 13, 101317.
Yi, Y.; Zhang, Z.; Zhang, W.; Jia, H.; Zhang, J. Landslide Susceptibility Mapping Using Multiscale Sampling Strategy and Convolutional Neural Network: A Case Study in Jiuzhaigou Region. CATENA 2020, 195, 104851.
Chen, Y.; Ming, D.; Ling, X.; Lv, X.; Zhou, C. Landslide Susceptibility Mapping Using Feature Fusion-Based CPCNN-ML in Lantau Island, Hong Kong. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3625–3639.
Pugliese Viloria, A.d.J.; Folini, A.; Carrion, D.; Brovelli, M.A. Hazard Susceptibility Mapping with Machine and Deep Learning: A Literature Review. Remote Sens. 2024, 16, 3374.
Pradhan, B.; Dikshit, A.; Lee, S.; Kim, H. An Explainable AI (XAI) Model for Landslide Susceptibility Modeling. Appl. Soft Comput. 2023, 142, 110324.
Wang, Y.; Fang, Z.; Hong, H. Comparison of Convolutional Neural Networks for Landslide Susceptibility Mapping in Yanshan County, China. Sci. Total Environ. 2019, 666, 975–993.
Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
Zhao, Z.; Chen, T.; Dou, J.; Liu, G.; Plaza, A. Landslide Susceptibility Mapping Considering Landslide Local-Global Features Based on CNN and Transformer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 7475–7489.
Yang, Z.; Xu, C.; Shao, X.; Ma, S.; Li, L. Landslide Susceptibility Mapping Based on CNN-3D Algorithm with Attention Module Embedded. Bull. Eng. Geol. Environ. 2022, 81, 412.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Curran Associates, Inc.: Long Beach, CA, USA, 2017; Volume 30.
Zhong, Z.; Xiao, G.; Wang, S.; Wei, L.; Zhang, X. PESA-Net: Permutation-Equivariant Split Attention Network for Correspondence Learning. Inf. Fusion 2022, 77, 81–89.
Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A Review of Statistically-Based Landslide Susceptibility Models. Earth-Sci. Rev. 2018, 180, 60–91.
Steger, S.; Brenning, A.; Bell, R.; Glade, T. The Propagation of Inventory-Based Positional Errors into Statistical Landslide Susceptibility Models. Nat. Hazards Earth Syst. Sci. 2016, 16, 2729–2745.
Chang, Z.; Huang, J.; Huang, F.; Bhuyan, K.; Meena, S.R.; Catani, F. Uncertainty Analysis of Non-Landslide Sample Selection in Landslide Susceptibility Prediction Using Slope Unit-Based Machine Learning Models. Gondwana Res. 2023, 117, 307–320.
Chen, L.; Ma, P.; Fan, X.; Wang, X.; Ng, C.W.W. A Knowledge-Aware Deep Learning Model for Landslide Susceptibility Assessment in Hong Kong. Sci. Total Environ. 2024, 941, 173557.
Wang, H.; Zhang, L.; Luo, H.; He, J.; Cheung, R.W.M. AI-Powered Landslide Susceptibility Assessment in Hong Kong. Eng. Geol. 2021, 288, 106103.
Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.

Figure 1. Study area: (a) location, (b) contours, and (c) distribution of historical landslides.

Figure 2. Landslide-influencing factors (Part 1): (a) aspect, (b) distance to road, (c) elevation, (d) plan curvature, (e) slope, (f) profile curvature.

Figure 3. Landslide-influencing factors (Part 2): (a) SPI, (b) TWI, (c) STI, (d) distance to fault, (e) lithology.

Figure 4. Workflow of the landslide susceptibility assessment study.

Figure 5. Schematic diagram of sample generation: (a) determination of landslide objects; (b) sample generation for object-based and pixel-based datasets.

Figure 6. Architecture of the proposed MSAN model.

Figure 7. An example illustrating the multi-scale feature extraction process using a trained multi-scale convolutional neural network (MSCNN).

Figure 8. SE-based channel attention module, source: Hu et al. [23].

Figure 9. Schematic diagram of the multi-head attention module, adapted from Vaswani et al. [38].

Figure 10. Landslide susceptibility maps (LSMs) from classical CNN models: (a) DenseNet, (b) VGG, (c) GoogLeNet, (d) ResNet. Susceptibility levels range from very low (green) to very high (red), consistent across all maps.

Figure 11. Landslide susceptibility maps (LSMs) from advanced models: (a) Baseline CNN, (b) MSCNN, (c) MSSE, (d) MSMHA. Susceptibility levels range from very low (green) to very high (red), consistent across all maps.

Figure 12. LSMs produced by pixel-based machine learning models: (a) MLP, (b) SVM, (c) RF.

Figure 13. ROC curves of CNN-based LSA models on the test dataset.

Figure 14. ROC curves of ML-based LSA models on the test dataset.

Figure 15. Statistical analysis of the landslide susceptibility map generated by the MSMHA model: (a) percentage of total area falling into each susceptibility class; (b) landslide density (landslides per km²) across susceptibility classes, calculated using the 1984–2019 inventory.

Figure 16. Results of temporal validation using independent landslide and non-landslide data (1974–1983). (a) Spatial distribution of landslide points. (b) Classification of landslides across susceptibility classes. (c) Spatial distribution of non-landslide points. (d) Classification of non-landslide points across susceptibility classes.

Figure 17. Comparison of susceptibility maps: (a) Random forest predicts a more restricted area of high susceptibility with moderate-to-high susceptibility dominating central regions; (b) MSMHA identifies more extensive very-high susceptibility zones across western, southern, and eastern regions.

Table 1. Description of landslide-influencing factors (LIFs).

Data Type	LIF	Source	Scale/Resolution
Geological	Distance to fault	Geo-map	1:20,000
	Lithology	Geo-map	1:20,000
Topographic	Elevation	LiDAR-derived DTM	1.0 m
	Slope	LiDAR-derived DTM	1.0 m
	Aspect	LiDAR-derived DTM	1.0 m
	Profile curvature	LiDAR-derived DTM	1.0 m
	Plan curvature	LiDAR-derived DTM	1.0 m
Hydrological	SPI	LiDAR-derived DTM	1.0 m
	TWI	LiDAR-derived DTM	1.0 m
	STI	LiDAR-derived DTM	1.0 m
Human activity	Distance to road	Geo-map	1:20,000

Table 2. Statistics of landslide inventory (1984–2019). The proportion (%) represents the percentage of landslides within each interval relative to the total count.

Length Intervals (m)			Area Intervals (m²)
Interval	No. of Landslides	Proportion (%)	Interval	No. of Landslides	Proportion (%)
0–5	3714	26.97	0–50	7229	52.49
5–10	6793	49.33	50–100	3703	26.89
10–15	2255	16.37	100–150	1376	9.99
15–20	643	4.67	150–300	1134	8.23
20–25	225	1.63	300–450	206	1.50
25–30	79	0.57	450–900	109	0.79
>30	62	0.46	>900	14	0.10

Table 3. Summary of the two datasets used in the study.

Property	Object-Based Dataset	Pixel-Based Dataset
Input Scope	$32 \times 32$ spatial block	Single pixel
Input Representation	$32 \times 32 \times 11$ tensor	$1 \times 11$ vector
Spatial Context	Yes	No
Total Sample Size	29,440	29,440
Positive:Negative Ratio	1:2	1:2
Intended Model Type	CNN-based models	Traditional ML models
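
The two input forms in Table 3 can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the stacked factor raster is synthetic, the helper names object_sample and pixel_sample are hypothetical, and centring the 32 × 32 window on the sample cell (and rejecting windows that cross the raster edge) is an assumption rather than the sampling rule used in the study.

```python
import numpy as np

PATCH = 32      # block size listed in Table 3
N_FACTORS = 11  # number of influencing factors listed in Table 1


def object_sample(stack: np.ndarray, row: int, col: int, size: int = PATCH) -> np.ndarray:
    """Return a size x size x C block around (row, col).

    Assumption: the window starts size // 2 cells above and left of the
    sample cell; windows that leave the raster are rejected, not padded.
    """
    half = size // 2
    r0, c0 = row - half, col - half
    if r0 < 0 or c0 < 0 or r0 + size > stack.shape[0] or c0 + size > stack.shape[1]:
        raise ValueError("window falls outside the raster")
    return stack[r0:r0 + size, c0:c0 + size, :]


def pixel_sample(stack: np.ndarray, row: int, col: int) -> np.ndarray:
    """Return the 1 x C factor vector at a single cell (no spatial context)."""
    return stack[row, col, :].reshape(1, -1)


# Synthetic stand-in for the stacked factor rasters: (rows, cols, factors).
rng = np.random.default_rng(0)
stack = rng.random((100, 120, N_FACTORS))

block = object_sample(stack, 50, 60)   # input form for CNN-based models
vector = pixel_sample(stack, 50, 60)   # input form for traditional ML models
print(block.shape, vector.shape)
```

Under this assumed centring, the pixel-based vector is the factor profile of the cell at offset (16, 16) inside the corresponding block, so the object-based sample contains the pixel-based one plus its neighbourhood.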

Table 4. Model performance evaluation on the training dataset.

LSA Models	Precision	Recall	F1 Score	Accuracy
CNN-based models
Baseline CNN	0.73	0.97	0.83	0.77
VGG	0.84	0.80	0.82	0.79
GoogLeNet	0.71	0.96	0.82	0.74
ResNet	0.69	0.98	0.81	0.73
DenseNet	0.71	0.98	0.82	0.75
MSCNN	0.83	0.53	0.64	0.65
MSSE	0.77	0.95	0.85	0.81
MSMHA	0.83	0.95	0.89	0.86
ML-based models
SVM	0.69	0.66	0.68	0.77
RF	0.64	0.78	0.70	0.76
MLP	0.67	0.74	0.70	0.76

Note: Bold and underline indicate the best and second best results for the overall indicator, respectively.

Table 5. Model performance evaluation on the test dataset.

LSA Models	Precision	Recall	F1 Score	Accuracy
CNN-based models
Baseline CNN	0.74	0.98	0.84	0.78
VGG	0.84	0.80	0.82	0.79
GoogLeNet	0.72	0.96	0.82	0.75
ResNet	0.70	0.97	0.81	0.73
DenseNet	0.71	0.98	0.82	0.75
MSCNN	0.81	0.52	0.64	0.64
MSSE	0.78	0.96	0.86	0.81
MSMHA	0.81	0.95	0.87	0.83
ML-based models
SVM	0.67	0.64	0.66	0.75
RF	0.65	0.78	0.70	0.76
MLP	0.66	0.71	0.68	0.75

Note: Bold and underline indicate the best and second best results for the overall indicator, respectively.
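
As a reading aid, the F1 scores in Table 5 can be cross-checked against the reported precision and recall using the standard definition F1 = 2PR / (P + R). The sketch below is not part of the study's code; the tolerance of 0.015 is an assumption chosen to absorb the two-decimal rounding of the tabulated values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 5 (test dataset).
TABLE_5 = {
    "Baseline CNN": (0.74, 0.98, 0.84),
    "VGG": (0.84, 0.80, 0.82),
    "GoogLeNet": (0.72, 0.96, 0.82),
    "ResNet": (0.70, 0.97, 0.81),
    "DenseNet": (0.71, 0.98, 0.82),
    "MSCNN": (0.81, 0.52, 0.64),
    "MSSE": (0.78, 0.96, 0.86),
    "MSMHA": (0.81, 0.95, 0.87),
    "SVM": (0.67, 0.64, 0.66),
    "RF": (0.65, 0.78, 0.70),
    "MLP": (0.66, 0.71, 0.68),
}

TOLERANCE = 0.015  # assumed slack for values rounded to two decimals

checks = []
for name, (p, r, reported) in TABLE_5.items():
    computed = f1_score(p, r)
    ok = abs(computed - reported) <= TOLERANCE
    checks.append((name, computed, reported, ok))
    print(f"{name:13s} computed={computed:.3f} reported={reported:.2f} consistent={ok}")
```

Small differences between computed and reported values are expected, because the tabulated precision and recall are themselves rounded.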

Table 6. Model performance comparison on test dataset (AUC and F1-score).

Model	AUC	F1-Score
Traditional ML models
Random Forest	0.86	0.70
SVM	0.85	0.66
MLP	0.85	0.68
CNN-based models
VGG	0.87	0.82
DenseNet	0.86	0.82
Baseline CNN	0.88	0.84
MSCNN	0.71	0.64
MSSE	0.90	0.86
MSMHA (Proposed)	0.91	0.87

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

