1. Introduction
Landslides rank among the most destructive natural hazards, causing extensive damage to infrastructure and posing severe threats to human life and property [
1,
2,
3]. Regional landslide susceptibility assessment (LSA) is a critical tool for mitigating these risks by evaluating the spatial likelihood of future landslides based on historical landslide inventories and associated conditioning factors [
4]. LSA integrates diverse landslide-influencing factors (LIFs), including topographic, geological, hydrological, and anthropogenic variables, within a study area to identify zones with high potential for slope failure [
5,
6,
7]. By delineating susceptible areas, LSA supports informed decision-making in land-use planning, disaster preparedness, and risk management.
Over the past decades, various methodologies have been employed for LSA, including empirical approaches [
8,
9], mechanistic models [
10,
11], and statistical techniques [
12,
13]. In recent years, machine learning (ML)-based methods have gained increasing prominence due to their strong capability in modeling complex, nonlinear relationships between environmental factors and landslide occurrences [
14,
15,
16,
17,
18,
19,
20]. With rapid advances in artificial intelligence, numerous ML algorithms, such as support vector machines (SVM), random forests (RF), and multi-layer perceptrons (MLP), have been widely applied in landslide susceptibility mapping [
21,
22,
23,
24,
25,
26,
27]. Concurrently, progress in remote sensing and geographic information systems (GIS) has enabled the integration of high-resolution, multi-source geospatial data into LSA frameworks, improving model reliability and spatial accuracy [
14].
Despite these advancements, conventional ML models typically treat input features as one-dimensional vectors, representing individual sampling points. This point-based paradigm captures only local environmental conditions while neglecting the spatial context and interactions surrounding each location [
28,
29]. Such an approach fails to account for the spatial synergies among influencing factors, such as slope transitions, drainage patterns, and lithological boundaries, that are crucial for accurate susceptibility modeling. A more effective representation involves using gridded or areal data, such as raster layers, which preserve spatial structure and enable holistic analysis of the landscape.
Convolutional neural networks (CNNs), a class of deep learning models specifically designed for grid-structured data, have demonstrated success in capturing spatial patterns from such representations [
28,
30]. Their inherent properties of local connectivity and weight sharing allow CNNs to effectively extract hierarchical spatial features, making them well-suited for identifying geomorphological indicators of instability. Consequently, CNNs have been increasingly adopted in landslide studies. For instance, they have been used for landslide detection from remote sensing imagery [
31] and for regional susceptibility assessment. Wang et al. (2019) pioneered the application of a LeNet-5-inspired CNN for LSA in Yanshan County, China, achieving higher accuracy than traditional ML methods [
32]. Subsequent studies have explored more advanced architectures, including VGG [
33], GoogLeNet [
34], ResNet [
20], and DenseNet [
35], in diverse geographical settings [
30]. Other works have integrated CNNs with recurrent networks, such as CNN-LSTM [
29], or interpretability tools, such as CNN-SHAP [
29], further advancing the field.
Nevertheless, most existing CNN-based LSA models rely on single-scale feature extraction through fixed-size convolutional kernels, resulting in a limited receptive field. This constraint poses a significant challenge given the wide variability in landslide sizes, ranging from tens of square meters to several square kilometers. A static receptive field may either fail to capture fine-grained details essential for detecting small landslides, if too large, or overlook broader geological and hydrological contexts necessary for characterizing large-scale failures, if too small. This inability to adapt to multiple spatial scales undermines model performance across heterogeneous landscapes and limits generalization capacity.
To address this limitation, multi-scale strategies have been introduced into CNN architectures for LSA [
28,
36]. For example, Yi et al. [
28] proposed a multi-scale CNN that fuses features extracted at different resolutions, improving prediction accuracy. However, simple concatenation or summation of multi-scale features can lead to issues such as feature redundancy, scale conflict, and information interference. Specifically, dominant coarse-scale features may overwhelm finer-scale details during fusion, leading to the loss of critical information for small landslides.
To mitigate these challenges, attention mechanisms have emerged as a powerful solution [
25,
26,
37,
38,
39]. These mechanisms enable the network to dynamically assign weights to different feature channels or spatial locations based on their relevance to the task, thereby enhancing informative features while suppressing irrelevant or redundant ones. When integrated into multi-scale CNNs, attention modules can selectively emphasize important cross-scale features, alleviating scale imbalance and improving feature fusion efficiency. Despite their potential, no study has yet systematically evaluated the effectiveness of combining multi-scale CNNs with advanced attention mechanisms, particularly both channel-wise and self-attention, for LSA.
To bridge this gap, we propose a multi-scale attention network (MSAN) framework for landslide susceptibility assessment. Within this unified architecture, multi-scale features are first extracted via parallel convolutional branches with kernels of varying sizes, enabling simultaneous capture of local, intermediate, and contextual patterns. Subsequently, two attention-based fusion strategies are implemented to refine the multi-scale feature representation. The first variant, termed the multi-scale Squeeze-and-Excitation (MSSE) CNN, applies channel-wise attention to recalibrate feature importance across scales. The second variant, termed the multi-scale multi-head attention (MSMHA) CNN, performs self-attention over flattened spatial-feature vectors to capture long-range dependencies and dynamic feature interactions. Both variants leverage the strengths of multi-scale representation while introducing adaptive fusion mechanisms to improve classification accuracy. The refined features are then passed through classification layers to generate a continuous susceptibility index.
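To make the channel-wise recalibration idea concrete, the following is a minimal, framework-free sketch of a generic Squeeze-and-Excitation step (squeeze by global average pooling, excite through a small bottleneck, then rescale). It is not the MSSE implementation: the function name `squeeze_excite`, the toy feature maps, and the hand-picked weights are all hypothetical, and in a trained network the weights would be learned.

```python
import math


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Recalibrate the channels of a [C][H][W] nested-list tensor.

    w_reduce: [C_r][C] weights of the bottleneck layer (hypothetical values).
    w_expand: [C][C_r] weights of the expansion layer (hypothetical values).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expansion followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


# Two toy 2x2 channels standing in for concatenated multi-scale features.
features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[4.0, 4.0], [4.0, 4.0]],  # channel 1, mean 4.0
]
w_reduce = [[1.0, 0.25]]     # C = 2 -> C_r = 1 (hand-picked)
w_expand = [[1.0], [-1.0]]   # C_r = 1 -> C = 2 (hand-picked)
scaled, gates = squeeze_excite(features, w_reduce, w_expand)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the second channel is damped even though its raw values are larger, which is the kind of scale rebalancing that attention-based fusion is meant to provide.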
The proposed MSAN framework is evaluated in the Hong Kong Special Administrative Region (China), a mountainous region highly prone to rainfall-induced landslides. Commonly used evaluation protocols in LSA were employed, including receiver operating characteristic (ROC) curve analysis with area under the curve (AUC) as the primary metric, along with precision, recall, and F-measure [
4,
14,
21,
40]. Spatial cross-validation techniques were implemented to ensure robust model validation, with landslide inventory data partitioned into training and testing datasets using appropriate sampling methods [
10,
28,
41,
42]. Our study focuses specifically on rainfall-induced shallow landslides, which represent the predominant landslide type in this region. Experiments compare the MSSE and MSMHA models against classical ML algorithms (RF, MLP, SVM) and established CNN variants (VGG, GoogLeNet, ResNet, DenseNet, and a standard multi-scale CNN baseline). Results demonstrate that the integration of multi-scale extraction with attention-based fusion enhances prediction accuracy and generalization.
This work contributes to the advancement of deep learning in geohazard assessment by (1) proposing a unified MSAN framework for LSA that supports flexible integration of different attention mechanisms; (2) developing and evaluating two variants, MSSE and MSMHA, that outperform existing models in both accuracy and feature adaptability; and (3) demonstrating the effectiveness of attention-guided feature fusion in mitigating scale conflict and redundancy in multi-scale LSA models.
2. Study Area and Data
2.1. Overview of the Study Area
The study area encompasses the Hong Kong Special Administrative Region of China (Figure 1), located between longitudes 113.80° E and 114.50° E and latitudes 22.15° N and 22.57° N, covering approximately 1100 square kilometers. It includes three major regions: Hong Kong Island, the Kowloon Peninsula, and the New Territories. The highest peak, Tai Mo Shan, reaches 957 m above sea level. Hong Kong experiences a tropical monsoon climate, with an average annual rainfall of 2400 mm, 80% of which occurs between May and September.
The terrain is predominantly hilly, with 75% of the land having slopes steeper than 15° and over 30% exceeding 30°. This rugged topography, combined with frequent tropical cyclones and intense rainfall (Figure 1c), renders the region highly susceptible to landslides, with an average of over 300 incidents per year. Urban development is concentrated in flatter areas nestled among steep slopes. Because Hong Kong is one of the most densely populated cities globally, home to approximately 7 million people, landslides pose significant threats to life and infrastructure. These risks are further exacerbated by population growth and projected increases in extreme rainfall due to climate change.
The geological framework of Hong Kong is a key determinant of landslide susceptibility. The region comprises diverse lithological units, including granitic rocks (e.g., diorite, granodiorite), metamorphic schists, and sedimentary formations (e.g., mudstone, sandstone). Granitic terrains, though generally resistant to weathering, develop deep weathering profiles that can lead to unstable slopes when fractured. Metamorphic schists exhibit pronounced foliation planes that act as potential slip surfaces, while sedimentary layers often display differential erosion rates, creating oversteepened slopes.
Major fault systems, such as the Lion Rock Fault and Tolo Fault Zone, traverse the study area, creating zones of structural weakness. These faults not only increase rock fragmentation but also create groundwater flow paths that contribute to slope instability. Landslide density correlates strongly with proximity to mapped fault lines, particularly in granitic and metamorphic terrains where structural weaknesses dominate.
2.2. Historical Landslide Inventory
A reliable landslide inventory map (LIM) is crucial for LSA, as its completeness and accuracy directly affect model performance [
41]. The LIM used in this study was compiled by the Geotechnical Engineering Office of the Civil Engineering and Development Department (CEDD) of Hong Kong, based on aerial photograph interpretation from 1924 to 2019. This dataset, known as the Enhanced Natural Terrain Landslide Inventory (ENTLI) [
11], contains 111,408 landslide records up to 2019.
Each landslide in ENTLI is represented as a polyline (the landslide trail) with an associated source point (
Figure 1c). In this study, the source point of each landslide is used as the center of the corresponding sample. Given that records after 1984 are more reliable, only landslides occurring between 1984 and 2019 are included, totaling 13,771 events.
2.3. Landslide Influencing Factors
Landslides result from the interaction of multiple LIFs [
43]. Selecting appropriate factors is essential for accurate susceptibility modeling, balancing physical relevance and data availability. In this work, 11 key LIFs were selected based on prior research [
44]: lithology, distance to faults, distance to roads, elevation, slope, aspect, profile curvature, plan curvature, stream power index (SPI), topographic wetness index (TWI), and sediment transport index (STI). These factors exhibit low inter-correlation, ensuring they represent distinct aspects of slope stability.
As illustrated in
Figure 2 and
Figure 3, these factors influence slope stability from various perspectives:
Aspect: The downslope direction of the maximum rate of change, measured in degrees clockwise from north, ranging from 0° to 360°. Aspect affects solar radiation exposure and soil moisture distribution, with north-facing slopes receiving less direct sunlight and retaining more moisture in the Northern Hemisphere.
Distance to road: Euclidean distance from each cell to the nearest road network, measured in meters. This factor reflects anthropogenic disturbance and potential slope destabilization due to construction activities.
Elevation: Height above mean sea level, measured in meters. Elevation influences local climate conditions, vegetation patterns, and geological processes.
Slope: The maximum rate of elevation change between each cell and its neighbors, expressed in degrees with values ranging from 0° (flat) to 90° (vertical). Slope directly affects gravitational stress on the terrain.
Plan and profile curvature:
- Profile curvature: Calculated as the curvature along the direction of maximum slope (tangent to the flow line), with positive values indicating convex surfaces (accelerating flow, erosion) and negative values indicating concave surfaces (decelerating flow, deposition). Values typically range from −1 to 1.
- Plan curvature: Calculated as the curvature perpendicular to the direction of maximum slope, with positive values indicating convergent flow (convex shape) and negative values indicating divergent flow (concave shape). Values typically range from −1 to 1. These parameters control water convergence and flow acceleration, affecting slope stability.
SPI (Stream Power Index): Measures the erosive power of surface runoff, calculated as $\mathrm{SPI} = A_s \tan\beta$, where $A_s$ is the contributing upslope area per unit contour length and $\beta$ is the local slope gradient. Higher values indicate areas prone to erosion and sediment transport.
TWI (Topographic Wetness Index): Predicts saturation-prone areas, calculated as $\mathrm{TWI} = \ln\left(A_s / \tan\beta\right)$, where $A_s$ is the specific catchment area and $\beta$ is the local slope gradient. Higher values indicate areas with higher potential for saturation and water accumulation.
STI (Sediment Transport Index): Characterizes sediment transport potential, calculated as $\mathrm{STI} = \left(A_s / 22.13\right)^{0.6} \left(\sin\beta / 0.0896\right)^{1.3}$, where $A_s$ is the upslope contributing area and $\beta$ is the slope gradient. Higher values indicate greater potential for sediment movement.
Distance to fault: Euclidean distance from each cell to the nearest geological fault line, measured in meters. This factor indicates structural weakness and potential zones of rock fragmentation and instability.
Lithology: Classification of bedrock and surficial geological materials based on rock type, mineral composition, and engineering properties. Different lithological units have varying resistance to weathering and different geomechanical properties.
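As an illustration only, the three hydrological indices in the list above can be computed per raster cell as sketched below. The sketch assumes the widely used forms SPI = A_s·tan β, TWI = ln(A_s / tan β), and STI = (A_s / 22.13)^0.6 · (sin β / 0.0896)^1.3; whether these exact forms were used in this study is an assumption. The function name `terrain_indices` and the `min_tan` floor for flat cells are hypothetical.

```python
import math


def terrain_indices(specific_catchment_area, slope_deg, min_tan=1e-6):
    """Return (SPI, TWI, STI) for a single raster cell.

    specific_catchment_area: upslope contributing area per unit contour
        length (A_s), must be greater than zero.
    slope_deg: local slope gradient (beta) in degrees.
    min_tan: hypothetical floor on tan(beta) so that flat cells do not
        cause a division by zero in TWI.
    """
    beta = math.radians(slope_deg)
    tan_beta = max(math.tan(beta), min_tan)
    spi = specific_catchment_area * tan_beta
    twi = math.log(specific_catchment_area / tan_beta)
    sti = (specific_catchment_area / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3
    return spi, twi, sti


# Same catchment area, gentle versus steep cell (toy values).
spi_gentle, twi_gentle, sti_gentle = terrain_indices(100.0, 10.0)
spi_steep, twi_steep, sti_steep = terrain_indices(100.0, 45.0)
```

For a fixed catchment area, the steeper cell has higher SPI and STI (more erosive power and transport capacity) and lower TWI (water drains rather than accumulates), which matches the qualitative interpretation given for each factor.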
All LIFs were derived from the Hong Kong Open Data Platform (https://data.gov.hk). Topographic factors were generated from a high-resolution Digital Terrain Model (DTM) based on LiDAR data collected in January 2011, with a planimetric accuracy of 0.5 m. Geological and anthropogenic factors were extracted from 1:20,000-scale geomaps. A summary of data sources and resolutions is provided in Table 1.
Given the difficulty in aligning LIFs perfectly with historical landslide timing, we assume that all selected factors remained stable during the study period (1984–2019).
2.4. Data Preprocessing
Raw LIFs vary in data structure and must be preprocessed before model input. Since CNNs require regularized image inputs, all vector-based LIFs are rasterized using ArcGIS 10.7. All layers are unified under the same coordinate reference system (HK1980 Grid) and resampled to a consistent spatial resolution.
The original LiDAR-derived DTM has a resolution of 1.0 m/pixel, which provides sufficient detail for fine-scale LSA. Thus, all 11 LIFs are resampled to 1.0 m resolution. Areas with no-data values (e.g., water bodies) are masked out using the Hong Kong 2020 land use dataset. Additionally, urbanized and engineered slopes—where landslides are unlikely due to stabilization measures—are excluded using a second land-use-based mask.
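All LIFs are normalized to the range $[0, 1]$ using min–max normalization:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original value of a factor, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that factor.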
This normalization serves several important purposes: (1) it eliminates scale differences between various LIFs with different units and value ranges (e.g., elevation in meters, slope in degrees, lithology classifications), ensuring that all factors contribute equally to the model without being dominated by variables with larger numerical values; (2) it accelerates model convergence during training by keeping all input values within a consistent and manageable range for the neural network; (3) it stabilizes gradient propagation in the deep learning model, preventing potential gradient vanishing or exploding issues that can occur when input values vary significantly in magnitude.
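A minimal sketch of min–max normalization for one factor layer is given below, assuming a target range of [0, 1]. The function name `min_max_normalize`, the `nodata` sentinel value, and the handling of constant layers are illustrative assumptions rather than details reported in this study.

```python
def min_max_normalize(values, nodata=None):
    """Scale valid entries to [0, 1]; leave nodata entries untouched."""
    valid = [v for v in values if v != nodata]
    lo, hi = min(valid), max(valid)
    span = hi - lo
    if span == 0:
        # Constant layer: map every valid cell to 0.0 instead of dividing by zero.
        return [v if v == nodata else 0.0 for v in values]
    return [v if v == nodata else (v - lo) / span for v in values]


# Toy elevation samples in metres; -9999.0 marks a masked (no-data) cell.
elevation_m = [0.0, 239.25, 957.0, -9999.0]
normalized = min_max_normalize(elevation_m, nodata=-9999.0)
```

Here the masked cell is passed through unchanged so that it does not distort the minimum of the valid data.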
For CNN input, evaluation units are defined as fixed-size image cubes. Based on landslide morphology statistics (Table 2), most landslide source areas are smaller than 30 m in length and 900 m² in area. Considering the minimum input size requirement of 32 pixels for the CNN architecture and the 1.0 m/pixel resolution, a $32 \times 32$ m image cube is adopted as the evaluation unit.
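The extraction of an evaluation unit around a landslide source point can be sketched as follows. This is an illustration under stated assumptions: a north-up raster whose origin is its top-left corner, a 32-pixel window (inferred from the stated minimum input size and 1.0 m/pixel resolution), and hypothetical helper names `cube_window` and `extract_cube`.

```python
import math


def cube_window(source_x, source_y, origin_x, origin_y, size=32, resolution=1.0):
    """Return (row_start, row_stop, col_start, col_stop) around a source point.

    The raster is assumed north-up with (origin_x, origin_y) at its top-left
    corner, so rows increase as the y coordinate decreases.
    """
    col = math.floor((source_x - origin_x) / resolution)
    row = math.floor((origin_y - source_y) / resolution)
    half = size // 2
    # With an even size the source cell sits at index `half` of the window.
    return row - half, row + half, col - half, col + half


def extract_cube(layers, window):
    """Slice the same window out of every factor layer ([C][H][W] lists)."""
    r0, r1, c0, c1 = window
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    if r0 < 0 or c0 < 0 or r1 > n_rows or c1 > n_cols:
        raise ValueError("window extends beyond the raster")
    return [[row[c0:c1] for row in layer[r0:r1]] for layer in layers]


# Two toy 64x64 layers whose cell value encodes its own position.
n = 64
layers = [[[r * n + c for c in range(n)] for r in range(n)] for _ in range(2)]
window = cube_window(source_x=1032.5, source_y=1967.5,
                     origin_x=1000.0, origin_y=2000.0)
cube = extract_cube(layers, window)
```

In the study itself each cube would hold 11 layers, one per LIF; how source points near the raster edge are treated is not stated, so the sketch simply rejects them.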
4. Experiment and Result Evaluation
4.1. Object-Based LSA
To validate the effectiveness of the proposed LSA model, classical CNN classification models were selected for comparative experiments. These models include VGG [
33], GoogLeNet [
34,
45], ResNet [
20], and DenseNet [
35]. VGG is a well-known deep CNN featuring a straightforward architecture and impressive performance, characterized by a deep hierarchical structure that employs a repetitive stacking of small convolutional kernels and pooling layers. In this study, VGG16 was utilized as the backbone of the network. GoogLeNet, developed by the Google team, is recognized for its innovative Inception structure, enabling the extraction of features at different scales and performing well in image classification tasks. ResNet introduced residual blocks, allowing the network to learn residual information effectively, enabling the training of deeper networks while improving performance. DenseNet is a densely connected CNN where each layer is connected to all preceding layers. This architecture promotes better information flow, addressing the gradient vanishing problem and enhancing feature delivery efficiency.
Given that the original implementations of these networks accept input data of size $224 \times 224$ with 3 channels, modifications were necessary to align with the dataset characteristics in this work. Specifically, the number of input channels in the first convolutional layer was adjusted to 11, while the output features in the last fully connected layer were modified accordingly to match the feature dimensions. These models maintained binary classification outputs.
Every model was trained for up to 300 epochs with a batch size of 32 and a learning rate of 0.001. The fitting performance of each model was assessed on the validation dataset at the end of each epoch. Early stopping was used to avoid overfitting: training was stopped and the best network parameters were retained if the validation loss did not improve for 10 consecutive epochs. All experiments were conducted on a workstation running Windows 10 and equipped with an Intel Core i7-7700 processor (3.6 GHz) and an NVIDIA GeForce RTX 3090 GPU. Models were implemented in PyTorch 2.0.1 with Python 3.8. Each sample in the datasets was $32 \times 32$ pixels in size with 11 channels. Hyperparameters were tuned empirically to find the best-performing values.
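The early-stopping rule described above (a patience of 10 epochs on the validation loss, keeping the best parameters) can be sketched independently of any deep learning framework. The class name `EarlyStopping`, its interface, and the simulated loss values are hypothetical; in a PyTorch training loop the `state` argument would typically be the model's `state_dict()`.

```python
import copy


class EarlyStopping:
    """Track the best validation loss and signal when to stop training."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss, state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            # Improvement: remember the parameters and reset the counter.
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation losses: three improving epochs, then a plateau.
stopper = EarlyStopping(patience=10)
losses = [0.9, 0.7, 0.6] + [0.65] * 12
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss, state={"epoch": epoch}):
        stopped_at = epoch
        break
```

In this simulated run the loss last improves at epoch 3, so training stops at epoch 13 and the retained state is the one saved at epoch 3.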
The Adaptive Moment Estimation (Adam) optimizer [24] is employed to optimize the model parameters during training. Adam iteratively updates the neural network weights to minimize the loss function $J$, which uses binary cross-entropy to quantify the discrepancy between true and predicted probability distributions:
$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log \hat{y}_i + \left(1 - y_i\right)\log\left(1 - \hat{y}_i\right)\right]$$
where the following apply:
$m$: number of training samples.
$y_i$: actual label (0 or 1).
$\hat{y}_i$: predicted probability of landslide occurrence.
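For illustration, the binary cross-entropy loss over m samples can be evaluated as in the sketch below. The function name `binary_cross_entropy` and the clamping constant `eps`, which avoids taking the logarithm of zero, are assumptions added for numerical safety and are not described in this study.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    m = len(labels)
    total = 0.0
    for y, p in zip(labels, probabilities):
        # Clamp the prediction so log() never receives exactly 0 or 1 - 1.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / m


# One landslide sample (label 1) and one non-landslide sample (label 0).
loss_confident = binary_cross_entropy([1, 0], [0.9, 0.1])
loss_uncertain = binary_cross_entropy([1, 0], [0.5, 0.5])
```

Confident, correct predictions give a small loss, whereas predictions of 0.5 for every sample give a loss of ln 2, the value expected from an uninformative classifier on balanced data.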
After training, all models are applied to calculate the landslide susceptibility in the natural areas of Hong Kong. During the experiments, the land areas of Hong Kong were divided into a grid of non-overlapping $32 \times 32$ pixel image blocks. The LSMs for these natural areas, generated by each model, are presented in Figure 10 and Figure 11.
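The division of a raster into non-overlapping blocks can be sketched as below. The block size of 32 pixels is an assumption inferred from the stated minimum CNN input size, and dropping incomplete blocks at the raster edge is a simplification of this sketch, since the treatment of edge blocks is not described here. The function name `tile_windows` is hypothetical.

```python
def tile_windows(n_rows, n_cols, size=32):
    """Yield (row_start, col_start) of every complete non-overlapping block."""
    for r in range(0, n_rows - size + 1, size):
        for c in range(0, n_cols - size + 1, size):
            yield r, c


# A toy 100 x 70 raster yields 3 x 2 complete 32-pixel blocks.
windows = list(tile_windows(n_rows=100, n_cols=70, size=32))
```

Each yielded corner identifies one evaluation unit; the trained model is applied to the block starting at that corner and the predicted susceptibility is written back to the same footprint.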
As shown in
Figure 10 and
Figure 11, the landslide susceptibility distributions from the models exhibit similar trends and align well with the actual landslide distribution (see
Figure 1), though there are slight differences in classification tendencies. The LSM generated by the ResNet-based model tends to classify evaluation units into lower susceptibility categories, while the VGG-based model leans towards higher susceptibility categories. In contrast, the LSMs from the other models show a more balanced distribution across different susceptibility categories. The trends observed in the MSCNN and MSMHA models are quite similar and differ somewhat from the standard CNN-based models. High susceptibility areas are primarily concentrated in the southeastern mountains of Lantau Island, the western New Territories (e.g., Castle Peak), the mountains of Sai Kung (e.g., Sharp Peak), and various coastal slopes. The high landslide susceptibility in these regions can be attributed to steep slopes and the predominance of weathered volcanic rock with sparse vegetation, leading to low resistance against shear and erosion. The MSCNN, MSSE and MSMHA models identify more landslide-prone areas than the standard CNN classification model.
4.2. Pixel-Based LSA
To further validate the advantages of the proposed MSMHA and MSSE models, this work also compares them with traditional ML-based models that use pixels as samples. These ML-based models include RF, MLP, and SVM.
RF, MLP, and SVM are widely used methods for landslide susceptibility assessment. RF is an ensemble method based on decision trees that aggregates predictions via voting. MLP is a feedforward neural network with nonlinear activation functions and strong mapping capability. SVM finds an optimal separating hyperplane and uses kernel functions to handle nonlinear classification.
Different parameter combinations were tested to optimize the performance of each model on the test data. For SVM, the RBF kernel was used, and the kernel parameter and penalty coefficient were optimized by grid search, giving a penalty coefficient C of 0.5 and a gamma of 0.01. For RF, an ensemble of 1000 decision trees was used. To avoid overfitting, the maximum depth of each tree was limited to 10. The minimum number of samples required to split a node was set to 2, and the minimum number of samples per leaf node was set to 1. The number of features considered at each split was limited to the square root of the total number of features, which increases the diversity of the trees. Bootstrap sampling was enabled so that each tree is trained on a different random sample of the training data. In addition, the Gini impurity criterion was used to assess the quality of the splits. These configurations aim to balance the performance and generalization ability of the model. The MLP used three hidden layers, each containing 1024 neurons, with ReLU as the activation function in each layer.
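The reported settings can be collected as plain keyword dictionaries, as in the sketch below. The key names follow the constructor arguments of scikit-learn's `RandomForestClassifier`, `SVC`, and `MLPClassifier`; that scikit-learn was the library used is an assumption, since the toolkit for the ML baselines is not named here.

```python
# Random forest: 1000 trees, depth capped at 10, sqrt(n_features) per split.
rf_params = {
    "n_estimators": 1000,
    "max_depth": 10,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "sqrt",
    "bootstrap": True,
    "criterion": "gini",
}

# Support vector machine: RBF kernel with the grid-searched C and gamma.
svm_params = {"kernel": "rbf", "C": 0.5, "gamma": 0.01}

# Multi-layer perceptron: three hidden layers of 1024 ReLU units.
mlp_params = {
    "hidden_layer_sizes": (1024, 1024, 1024),
    "activation": "relu",
}
```

Under that assumption, the models would be instantiated as, for example, `RandomForestClassifier(**rf_params)`, `SVC(**svm_params)`, and `MLPClassifier(**mlp_params)`.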
The above models were trained using the pixel-level dataset. Similarly, the natural areas of Hong Kong were divided into a pixel grid, and the trained models were used to assess the susceptibility of each grid cell one by one to obtain a territory-wide LSM for each model. Figure 12 illustrates the LSMs for all ML-based models.
Figure 12 shows that the LSMs of the ML-based models differ in detail, but their high-susceptibility areas are all concentrated in the Jurassic volcanic aggregates in the southwest hills of Lantau Island. Comparison with Figure 1c shows that the landslide susceptibility trends and spatial distributions obtained by each method are generally consistent, although the LSMs obtained by MSMHA and MSSE differ somewhat from those of the ML-based models. The LSMs from RF, SVM, and MLP have a higher percentage of low-susceptibility areas, whereas MSMHA and MSSE have a higher percentage of high-susceptibility areas; of the two, the MSSE model predicts a larger area of high and very high susceptibility.
4.3. Quantitative Evaluation
Model performance was evaluated using multiple metrics on both training and test datasets, with an ablation study to quantify the contribution of each architectural component.
Evaluation Metrics
This study employs a variety of statistical evaluation metrics, including accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the curve (AUC), to evaluate the performance of the landslide susceptibility model. These metrics are derived from the four outcomes of binary classification: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). Specifically, TP refers to landslide samples correctly predicted as landslides, while TN denotes non-landslide samples correctly classified. FP represents non-landslide samples incorrectly predicted as landslides, and FN indicates landslide samples misclassified as non-landslides.
Precision measures the reliability of positive predictions, while Recall reflects the model’s sensitivity in detecting actual landslides. The F1 score, as the harmonic mean of precision and recall, offers a balanced assessment of overall performance. The ROC curve visualizes the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR) across varying thresholds, and the AUC provides a scalar summary of discriminative power, with values approaching 1.0 indicating strong classification capability.
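The formulas for these metrics are given in Equations (10)–(15):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$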
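As a worked illustration of these metrics, the sketch below computes them from the four confusion-matrix counts and estimates the AUC as the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample. The function names and all counts and scores are hypothetical toy values, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall (TPR), F1 and FPR from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    the number of samples, so it is only suitable for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy confusion counts and toy scores for four samples.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the toy score example, three of the four landslide/non-landslide pairs are ranked correctly, which gives an AUC of 0.75.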
Ablation Analysis: Role of Multi-Scale and Attention Mechanisms
To systematically evaluate the contribution of each component in the proposed MSAN framework, we conduct an ablation study. The analysis focuses on three key variants: (i) MSCNN, the baseline model with only multi-scale feature extraction; (ii) MSSE, incorporating channel-wise attention via Squeeze-and-Excitation blocks; and (iii) MSMHA, capturing long-range spatial dependencies via a multi-head self-attention mechanism.
Despite incorporating multi-scale receptive fields, MSCNN underperforms with an F1 score of 0.64 on the test set—lower than the simple baseline CNN (0.84). This degradation suggests that naive concatenation of multi-scale features introduces redundancy and scale conflicts, impairing feature integration. Without a mechanism to prioritize informative scales, the model struggles to distinguish relevant terrain characteristics from noise.
The integration of attention modules substantially improves performance:
MSSE (F1 = 0.85, AUC = 0.89) improves the F1 score by about 33% relative to MSCNN (0.64), demonstrating that channel-wise attention effectively recalibrates feature importance and suppresses irrelevant multi-scale responses.
MSMHA (F1 = 0.87, AUC = 0.91) achieves the best overall performance, indicating that spatial self-attention further enhances the model’s ability to capture long-range dependencies, such as drainage convergence zones or fault alignments, that are critical for landslide initiation.
Both MSSE and MSMHA outperform the baseline CNN, confirming that adaptive feature recalibration, rather than mere multi-scale representation, is the key driver of performance gains.
4.4. Validation of the Landslide Susceptibility Map
The landslide susceptibility map (LSM) was evaluated using three complementary checks: areal distribution of susceptibility classes, landslide density analysis, and temporal validation with independent data.
First, the areal distribution of susceptibility classes across the entire study region was analyzed (
Figure 15a). The results indicate that the “very low” susceptibility class occupies the largest proportion of the landscape, whereas “high” and “very high” susceptibility classes cover relatively small areas. This distribution aligns with the geographical and geological characteristics of Hong Kong, where, despite widespread steep topography, the majority of slopes are either inherently stable or effectively managed through engineering and vegetation controls, resulting in only a limited number of persistently unstable zones.
Second, to quantitatively assess the model’s discriminative power, landslide density (defined as the number of landslides per km²) was computed for each susceptibility class using the complete historical landslide inventory from 1984 to 2019 (Figure 15b). A clear and progressive increase in landslide density with rising susceptibility levels is observed, from the lowest density in the “very low” class to the highest in the “very high” class. This monotonic relationship confirms a strong positive correlation between predicted susceptibility and actual landslide occurrences, which is a fundamental criterion for a reliable and meaningful LSM.
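The landslide density check described above can be sketched as follows. The landslide counts and class areas are hypothetical toy values, not figures from this study, and the five class names (in particular "low" and "moderate") and the helper names are assumptions made for illustration.

```python
def landslide_density(counts, areas_km2):
    """Landslides per square kilometre for each susceptibility class."""
    return {cls: counts[cls] / areas_km2[cls] for cls in counts}


def strictly_increasing(values):
    """True when every value exceeds the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical counts and areas, ordered from lowest to highest class.
classes = ["very low", "low", "moderate", "high", "very high"]
counts = {"very low": 50, "low": 120, "moderate": 300,
          "high": 600, "very high": 900}
areas_km2 = {"very low": 500.0, "low": 200.0, "moderate": 150.0,
             "high": 100.0, "very high": 50.0}

density = landslide_density(counts, areas_km2)
ordered = [density[c] for c in classes]
is_monotonic = strictly_increasing(ordered)
```

A map passes this check when `is_monotonic` is true, that is, when density rises with every step up in susceptibility class.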
Third, an independent temporal validation was performed using 3198 landslide locations from 1974 to 1983, predating the training dataset (1984–2019), to evaluate the model’s predictive capability over time. An equal number of non-landslide points were randomly sampled to ensure balanced evaluation. The spatial distributions of these points are shown in
Figure 16a,c.
The MSMHA-generated LSM was applied to classify these historical points, with results presented in
Figure 16b,d. As shown in
Figure 16b, over 65% of the pre-inventory landslide locations were classified into the “high” and “very high” susceptibility categories, demonstrating the model’s ability to identify historically unstable areas despite temporal independence.
In contrast, non-landslide points are distributed relatively evenly across all susceptibility classes (
Figure 16d). This contrast between historical landslide clustering in high-susceptibility zones and the uniform distribution of stable points confirms the model’s strong spatial discrimination and high sensitivity. These results validate the robustness of the proposed LSM for regional-scale hazard screening, land-use planning, and prioritization of slope safety inspections.
6. Conclusions
This study proposes and evaluates a multi-scale attention network (MSAN) framework for landslide susceptibility assessment (LSA), with two attention-enhanced variants: MSSE and MSMHA. The key findings of this research are as follows:
Performance of proposed framework. Experimental results on a large-scale dataset from Hong Kong demonstrate that both MSSE and MSMHA consistently outperform classical CNN architectures (VGG, ResNet, DenseNet, and GoogLeNet) and traditional machine learning methods (SVM, RF, MLP). The MSMHA model achieves the highest performance, with an F1 score of 0.87 and an AUC of 0.91. These results validate the effectiveness of combining multi-scale feature extraction with attention mechanisms for modeling complex landslide-terrain relationships.
Critical role of attention mechanisms. The ablation study reveals that attention mechanisms are essential for realizing the benefits of multi-scale representation. The MSCNN variant (multi-scale without attention) underperforms with an F1 score of 0.64, while both MSSE (F1: 0.85) and MSMHA (F1: 0.87) achieve better results. This demonstrates that simply concatenating multi-scale features introduces redundancy and scale conflicts, which attention mechanisms effectively resolve by adaptively emphasizing critical features and suppressing irrelevant inputs. The spatial self-attention in MSMHA further enhances performance by modeling non-local feature interactions.
Advantages of CNN-based approaches. All CNN-based models consistently outperform traditional machine learning approaches, demonstrating the advantage of learning hierarchical spatial representations directly from raw geo-environmental data without reliance on manual feature engineering. The spatial context captured by CNNs enables better identification of terrain patterns and geomorphological indicators that are crucial for accurate landslide susceptibility assessment.
Validation of predictive capability. Temporal validation using pre-inventory landslide data (1974–1983) demonstrates that the proposed MSMHA model successfully identifies over 65% of historically unstable areas within high- and very-high susceptibility zones, confirming the model’s strong spatial discrimination and predictive skill despite temporal independence from the training period.
Practical implications. The proposed MSAN framework, particularly the MSMHA variant, provides an accurate solution for regional landslide risk assessment in urbanized mountainous regions. Its ability to generate spatially coherent susceptibility maps supports hazard screening, prioritization of slope safety inspections, landslide risk mitigation, and informed decision-making in land-use planning.
Future work will explore the integration of dynamic environmental factors (e.g., rainfall and vegetation change) to further improve model performance and temporal resolution. Additionally, efforts will be directed toward developing interpretability analysis methods to enhance understanding of how individual conditioning factors contribute to landslide susceptibility predictions.