Review

Harnessing Geospatial Artificial Intelligence and Deep Learning for Landslide Inventory Mapping: Advances, Challenges, and Emerging Directions

1 School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, AZ 85287, USA
2 U.S. Geological Survey, Center of Excellence for Geospatial Information Science (CEGIS), Rolla, MO 65401, USA
3 Ground Truth Alaska, Seldovia, AK 99663, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(11), 1856; https://doi.org/10.3390/rs17111856
Submission received: 6 March 2025 / Revised: 1 May 2025 / Accepted: 20 May 2025 / Published: 26 May 2025
(This article belongs to the Section Environmental Remote Sensing)

Abstract

Recent advancements in artificial intelligence (AI) and deep learning enable more accurate, scalable, and automated landslide inventory mapping. This paper provides a comprehensive review of the applications of AI, particularly deep learning, in landslide inventory mapping. In addition to examining commonly used data sources and model architectures, we explore innovative strategies such as feature enhancement and fusion, attention-boosted techniques, and advanced learning approaches, including active learning and transfer learning, to enhance model adaptability and predictability. We also highlight the remaining challenges and potential research directions, including the estimation of more diverse variables in landslide mapping, multimodal data alignment, modeling regional variability and replicability, as well as issues related to data misinterpretation and model explainability. This review aims to serve as a useful resource for researchers and practitioners, promoting the integration of deep learning into landslide research and disaster management.

1. Introduction

Landslides involve the gravitationally driven movement of rock, debris, or soil down a slope. This definition includes a wide range of phenomena, from rockfalls involving less than a cubic meter of material to submarine failures of thousands of cubic kilometers (>12 orders of magnitude), moving at rates ranging from less than a centimeter per year to tens of meters per second (>10 orders of magnitude) [1,2,3,4]. Materials range from strong igneous rocks to soupy loess mud, and failures can occur on slopes that are nearly vertical or as gentle as a few degrees. Landslides can be triggered by abrupt shaking, such as from earthquakes, or by increases in pore water pressure resulting from events such as extreme rainfall. However, the broader causes of landslides span a range of preconditioning factors, including weathering, permafrost degradation, glacial debuttressing, and undercutting by natural or artificial processes. Large landslides can exhibit emergent behavior that culminates in catastrophic failure, as evidenced by systematic precursory acceleration [5].
Landslides pose a serious hazard, both as a result of direct impacts and through cascading effects, such as tsunamis. Worldwide, landslides are among the deadliest natural hazards [6]. A tragic example occurred from 14 to 16 December 1999, when a rainstorm on the slopes of the Sierra de Avila, Venezuela, triggered thousands of landslides, resulting in the loss of 15,000 to 30,000 lives [7]. Hence, landslide mapping to support post-disaster damage assessments has become a critical task. In this context, landslide inventory mapping (LIM) plays a vital role by providing fundamental information about landslides, including their location, date of occurrence, and types of mass movement, as well as indicators of their triggering events, which are critical for predicting and understanding the processes [8]. If they are large enough, inventory datasets can be analyzed using statistics to predict risk factors and enhance process understanding. Such inventory maps also help with translating susceptibility estimates into probabilistic hazard assessments [9,10]. In addition to post-disaster landslide assessment, prediction and pre-event mapping are also important. Several studies have leveraged large datasets to generate landslide susceptibility maps, hazard assessments, and early warning systems before major events occur [11,12]. For example, susceptibility models trained on historical landslide inventories and environmental factors have been used to identify high-risk areas, aiding in proactive disaster management [13].
Traditionally, producing landslide inventory maps has been labor-intensive and time-consuming, as they are created manually by experts through field or remote sensing investigations. Given the increasing availability of high-resolution satellite imagery, new methods are emerging to support automated and accurate landslide mapping. Machine learning techniques that aim to derive new information through data-driven analysis, such as Decision Trees (DTs), Random Forest (RF), Support Vector Machines (SVM), k-nearest neighbors (kNNs), Logistic Regression (LR), and Artificial Neural Networks (ANNs), have been widely applied in landslide mapping. In terms of data processing granularity, the abovementioned algorithms can be further adapted for pixel-based and object-based analysis. Pixel-based classification leverages spectral or terrain information at the individual pixel level during training to output the landslide area or non-landslide area for each pixel on an image. In comparison, object-based classification identifies the landslide area based on a group of pixels that are similar in texture, shape, spatial distribution, and other features. For example, Moosavi et al. [14] compared two pixel-based classifications (ANN and SVM) with an object-based approach (the Taguchi method, which optimizes ANN and SVM) in producing a landslide inventory map in a forested area. Li et al. [15] proposed semi-automatic methods and object-based image analysis (OBIA) in forested landslide mapping using lidar data. Stumpf and Kerle [16] applied object-oriented image analysis and RF to remotely sensed images to identify landslides in four different regions: the Momance River (Haiti) and the Wenchuan (China), Messina (Italy), and Barcelonnette (France) Basins.
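To make the distinction between the two granularities concrete, the following toy sketch (our illustration, not any cited paper's method) labels a tiny single-band "brightness" grid both ways, assuming a hypothetical threshold where bright bare soil indicates a landslide scar. The pixel-based pass decides per pixel; the object-based pass first groups connected bright pixels and then decides per object, discarding objects too small to be plausible landslides.

```python
# Toy contrast of pixel-based vs. object-based classification.
# Threshold and minimum object size are illustrative assumptions.

def pixel_based(grid, threshold=0.6):
    """Label each pixel independently from its own brightness value."""
    return [[1 if v > threshold else 0 for v in row] for row in grid]

def object_based(grid, threshold=0.6, min_size=3):
    """Group 4-connected bright pixels into objects, then label whole
    objects, discarding objects smaller than min_size (likely noise)."""
    h, w = len(grid), len(grid[0])
    seen, label = set(), [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if grid[i][j] > threshold and (i, j) not in seen:
                # flood fill one connected object
                stack, obj = [(i, j)], []
                seen.add((i, j))
                while stack:
                    y, x = stack.pop()
                    obj.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and
                                grid[ny][nx] > threshold and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                if len(obj) >= min_size:      # decision made per object
                    for y, x in obj:
                        label[y][x] = 1
    return label

grid = [[0.9, 0.8, 0.1],
        [0.7, 0.1, 0.1],
        [0.1, 0.1, 0.9]]   # lone bright pixel at bottom-right
```

On this grid, the isolated bright pixel is kept by the pixel-based pass but rejected by the object-based pass, illustrating why object-based analysis is less sensitive to speckle-like noise.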
Although popular, these traditional machine learning methods require feature engineering to derive important attributes that humans use to delineate landslide areas and other properties. As a result, the identified features may not transfer well to another study area, limiting the scalability and accuracy of landslide analysis across spatially heterogeneous regions. The recent rapid growth of geospatial artificial intelligence (GeoAI) [17] methods, especially deep learning, has helped fill the gap by enabling the intelligent and automated analysis of geospatial big data [18]. The human brain’s structure has inspired the development of neural networks, also known as artificial neural networks (ANNs), which consist of connected neurons with weighted input, hidden, and output layers, serving as the basis for deep learning models. By mimicking human perception processes, which possess the property of compositionality (for example, complex, more cognitive concepts can be composed of arbitrary concepts at a lower cognitive level), a deep learning model is capable of learning important semantic features through the iterative composition of low-level features into high-level semantics.
Compared to the literature on using shallow machine learning for landslide mapping, there is a lack of research that provides a systematic review of the recent advances in deep learning in this area. Ma et al. [10] reviewed the applications of machine learning and deep learning in landslide detection and prediction and explored various data sources used in landslide studies. However, the paper provides only a brief introduction to the use of convolutional neural networks in landslide detection. It does not include discussions on the full range of deep learning models, such as Generative Adversarial Networks, Long Short-Term Memory Networks, or Transformers, and their potential role in supporting landslide prediction and mapping. Hemalatha [19] introduces deep learning in landslide susceptibility mapping and detection. For landslide detection, the review covers a few works based on three data sources (satellite images, DInSAR, and author-generated datasets from various sources) and discusses the challenges of these techniques in landslide studies. However, the literature covered is limited in scope and does not include newly proposed approaches. The importance of other critical data sources, such as digital elevation models (DEMs), in supporting landslide mapping is also not clarified in the paper. Tehrani et al. [20] provide a useful summary of the application of conventional machine learning and deep learning to landslide detection, susceptibility mapping, and temporal forecasting. In terms of deep learning, they categorize works into image classification, patch-wise image classification, and semantic segmentation, elaborating on the different models that have been used. However, this paper only briefly mentions a few works within these categories, so a more comprehensive review of deep learning applications in landslide mapping is desired, especially given the rapid growth in this area. Mohan et al. [21] provide an overview of landslide types and triggering factors, discussing several studies that use machine learning and deep learning models in landslide susceptibility mapping, detection, and segmentation. Regarding landslide detection tasks, this work primarily introduces various deep learning models that can be applied to landslide studies, some of which are promising candidates rather than techniques with demonstrated successful applications.
This paper addresses these gaps by providing a systematic review of the applications of deep learning in landslide inventory mapping. We focus our discussion on several key aspects to enhance the integration of novel AI technology in this critical area of disaster research. Specifically, we provide an in-depth discussion of data sources, deep learning models, and various learning strategies and techniques employed in landslide detection. Furthermore, we categorize the latest strategies and techniques based on their shared and distinct features and research objectives. Figure 1 provides a big picture view of the content this paper covers. By synthesizing and categorizing the latest strategies, this paper seeks to enhance the understanding of how advanced deep learning techniques can be effectively applied to landslide detection and characterization and address common challenges within the field.
The remainder of the paper is organized as follows: Section 2 summarizes the important data and deep learning models frequently used in landslide mapping. Section 3 introduces recent improvements to various models, including CNNs, GANs, LSTMs, and Transformers, that support landslide mapping. Section 4 discusses advanced strategies and techniques that are widely applied to enhance the accuracy and generalizability of models. Section 5 outlines the remaining challenges in this interdisciplinary field of research and proposes potential directions for future studies. Section 6 provides the conclusion of this review.

2. Data and Methods

2.1. Data Sources

Remote sensing images, including optical satellite imagery, Unmanned Aerial Vehicle (UAV) images, Synthetic Aperture Radar (SAR), and Interferometric Synthetic Aperture Radar (InSAR), have proven to be valuable tools for studying landslides. Among these, low-elevation photogrammetry images from UAVs provide high-resolution, on-demand data that are particularly useful for monitoring landslides. UAVs can be rapidly deployed to capture images immediately after a landslide, offering timely data for rapid response and assessment. However, their coverage is typically limited to smaller areas compared to satellites, and adverse weather conditions can hinder their effectiveness.
Optical satellite images, such as those from RapidEye, GaoFen-1, and Sentinel-2, can detect landslides by identifying brightness variations caused by exposed soil and rock [22,23,24,25,26]. To improve detection accuracy, super-resolution techniques can be applied to enhance image details [27]. However, optical imagery has notable limitations, including poor orthorectification and susceptibility to cloud cover, which can obscure the Earth’s surface and hinder effective landslide monitoring. This is where Synthetic Aperture Radar (SAR) provides a valuable alternative.
Unlike optical sensors, SAR uses microwave signals that can penetrate clouds, rain, and even vegetation. In addition, as SAR sensors are active sensors that transmit radar signals and record the returned signal, the images are not affected by the time of day or season, which can dramatically change illumination and shadowing in optical images. These capabilities allow SAR to collect terrain structure data both day and night, regardless of atmospheric conditions [20,28]. In addition, Interferometric Synthetic Aperture Radar (InSAR or IfSAR) can obtain precise information on the altitude and deformation of target objects through interferometry, using two or more SAR images of the same region captured at different times [29,30,31,32,33].
DEMs provide crucial topographic information for hilly areas, including slope, plan curvature, and aspect, which play key roles in landslide detection [34,35,36]. DEMs are acquired using various techniques, including photogrammetry (for example, structure-from-motion using UAV photographs), radar (InSAR), and lidar (laser mapping). Lidar transmits laser pulses that reflect off vegetation but also penetrate between leaves to measure the ground surface. The resulting point cloud data can be processed into bare-earth DEMs while also providing information about surface water and vegetation cover, making lidar the preferred data source for the geomorphic mapping of landslides in most cases [37]. Single-era DEMs can be used to measure parameters such as slope, aspect, and elevation, while multi-temporal DEMs that bracket landslide activity can be used for change detection and volume estimation.
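As a minimal sketch of how slope and aspect are derived from a DEM grid, the following uses a simple central-difference scheme on an interior cell (an illustrative formulation, not the algorithm of any specific GIS package), assuming rows run north-to-south and a hypothetical 30 m cell size:

```python
import math

def slope_aspect(dem, row, col, cell=30.0):
    """Slope (degrees) and downslope aspect (compass degrees, 0 = north)
    for an interior cell; rows are assumed to run north-to-south."""
    de = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell)  # eastward gradient
    dn = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell)  # northward gradient
    slope = math.degrees(math.atan(math.hypot(de, dn)))
    # bearing of steepest descent, measured clockwise from north
    aspect = math.degrees(math.atan2(-de, -dn)) % 360
    return slope, aspect

# A plane dipping due east: elevation drops 10 m per 30 m cell eastward
dem = [[20.0 - 10.0 * j for j in range(3)] for _ in range(3)]
s, a = slope_aspect(dem, 1, 1)   # slope ~18.4 degrees, aspect ~90 (east)
```

Production workflows typically use windowed estimators (e.g., Horn's method over all eight neighbors) for noise robustness, but the per-cell quantities are the same.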
Figure 2 shows visualizations of sample optical imagery (e.g., RGB and short-wave infrared band), DEM (from ALOS PALSAR), along with ground-truth landslide annotations from the AI-ready dataset, Landslide4Sense [38], for LIM.

2.2. Common Deep Learning Models in LIM

Convolutional neural networks (CNNs) are highly effective in tasks such as image classification, object detection, and image segmentation, as they can efficiently extract hierarchical features from images, which proves beneficial in landslide mapping tasks. Early-stage CNN models typically consist of three main layers: a convolutional layer applies kernels or filters to extract features, a pooling layer reduces spatial dimensions and captures dominant features from the convolution layer’s output, and a fully connected layer uses nonlinear transformations to connect input and output vectors through weighted sums and biases [39]. Key CNN architectures include the Residual Network (ResNet), which is known for its deep structures and residual connections that mitigate the vanishing gradient problem. Popular variants include ResNet50, ResNet101, and ResNet152 [40]. The Visual Geometry Group (VGG) employs uniform 3 × 3 convolutional layers, followed by max pooling, a feature prominent in models such as VGG16 and VGG19. Inception (GoogLeNet) uses parallel convolutions of varying sizes to efficiently capture multi-scale features, with notable versions including InceptionV3 and InceptionResNetV2. MobileNet is optimized for mobile and embedded devices, using depth-wise separable convolutions, whereas Xception leverages depth-wise separable convolutions inspired by Inception to further enhance computational efficiency. These architectures have significantly advanced deep learning applications in remote sensing, providing robust capabilities for extracting detailed features and improving accuracy in geological and environmental mapping tasks.
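The two core operations described above can be sketched in a few lines of pure Python (illustrative only; real models use optimized tensor libraries, and the edge-detecting kernel here is a made-up example):

```python
# Minimal sketch of a convolutional layer and a max-pooling layer.

def conv2d(image, kernel):
    """Valid (no-padding) 2D convolution of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keeps the dominant response per window."""
    return [[max(fmap[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

image = [[1, 0, 0, 0],
         [1, 0, 0, 0],
         [1, 0, 0, 0],
         [1, 0, 0, 0]]
edge = [[1, -1], [1, -1]]   # responds strongly to vertical brightness edges
fmap = conv2d(image, edge)  # 3x3 feature map with a strong left-edge response
```

Stacking such layers, with learned kernels, is what lets a CNN compose low-level edges into the higher-level shapes and textures that distinguish landslide scars.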
Generative Adversarial Networks (GANs) have demonstrated promise in landslide detection, particularly in scenarios involving landslide change detection. GANs comprise a generator that produces synthetic images and a discriminator that distinguishes between these fake images and real images from bitemporal datasets [41]. This adversarial setup enables GANs to learn how to generate realistic images that simulate changes in terrain or land cover over time, which is crucial for detecting subtle alterations that are indicative of landslides. Conditional GANs (cGANs) further enhance this capability by conditioning the image generation process on additional information, such as environmental variables, historical data, or specific terrain features. In landslide mapping, cGANs can leverage bitemporal satellite images or other remote sensing data to generate synthetic images that highlight potential landslide-prone areas based on learned features and environmental conditions [42]. While they are still an emerging area in remote sensing applications, GANs offer novel opportunities to improve the accuracy and efficiency of landslide detection and mapping tasks by synthesizing informative images that complement traditional CNN-based approaches. These advancements contribute to a deeper understanding and more effective mitigation of landslide risks in geological and environmental studies.
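The adversarial objective underlying this setup can be made concrete with the standard (non-saturating) GAN losses; the discriminator scores below are made-up numbers standing in for network outputs, not results from any cited system:

```python
import math

def bce(p, target):
    """Binary cross-entropy for a single predicted probability p."""
    eps = 1e-12
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

d_real = 0.9   # D(x): discriminator score for a real post-event image
d_fake = 0.2   # D(G(z)): score for a synthetic image from the generator

# Discriminator objective: push real images toward 1, generated toward 0.
loss_d = bce(d_real, 1.0) + bce(d_fake, 0.0)
# Generator objective: fool the discriminator into scoring its images as real.
loss_g = bce(d_fake, 1.0)
```

Training alternates gradient steps on these two losses; as the generator improves, `d_fake` rises and the two objectives pull against each other, which is the adversarial dynamic the text describes.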
Recurrent Neural Networks (RNNs) are essential for analyzing temporal changes in landslide mapping, utilizing hidden states to predict sequential outputs. Long Short-Term Memory (LSTM) networks address gradient issues with gated mechanisms, making them ideal for retaining long-term dependencies in time-series data, such as satellite imagery or weather records [43]. In landslide mapping, LSTMs excel in detecting changes that are indicative of landslides, enhancing early warning systems by leveraging historical data and temporal relationships. Convolutional LSTM (ConvLSTM) extends LSTM by integrating convolutional operations, thereby preserving spatial information while analyzing temporal sequences [44]. This integration is crucial for landslide mapping, allowing ConvLSTM models to extract spatial features from satellite images and learn spatio-temporal patterns of landslide-prone areas. By combining convolutional layers within LSTM cells, ConvLSTM enhances the accuracy of landslide detection models, providing comprehensive insights into terrain dynamics and improving disaster preparedness in vulnerable regions.
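The gating that lets an LSTM retain long-term dependencies can be shown with a scalar, single-unit step (a sketch of the standard formulation, not any specific landslide model; the weights are chosen by hand to hold the forget gate open):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W):
    """One LSTM time step. W maps each gate name to (w_x, w_h, bias)."""
    f = sigmoid(W['f'][0] * x + W['f'][1] * h + W['f'][2])    # forget gate
    i = sigmoid(W['i'][0] * x + W['i'][1] * h + W['i'][2])    # input gate
    g = math.tanh(W['g'][0] * x + W['g'][1] * h + W['g'][2])  # candidate value
    o = sigmoid(W['o'][0] * x + W['o'][1] * h + W['o'][2])    # output gate
    c_new = f * c + i * g          # cell state: gated memory update
    h_new = o * math.tanh(c_new)   # hidden state passed to the next step
    return h_new, c_new

# With a wide-open forget gate (large bias) and a nearly closed input gate,
# the cell state persists almost unchanged across the sequence:
W = {'f': (0.0, 0.0, 10.0), 'i': (0.0, 0.0, -10.0),
     'g': (1.0, 0.0, 0.0), 'o': (0.0, 0.0, 10.0)}
h, c = 0.0, 1.0
for x in [0.3, -0.2, 0.5]:   # a short toy "rainfall" sequence
    h, c = lstm_step(x, h, c, W)
```

Because the cell state is updated multiplicatively through the forget gate rather than rewritten each step, gradients can flow across many time steps, which is why LSTMs suit multi-year satellite or rainfall records.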
Transformers represent a significant advancement in neural network architectures and are designed to capture long-range dependencies and transform sequences effectively [45]. A key component of Transformers is self-attention, which assigns weights to input values based on similarity and generates attention vectors to capture relationships within the sequence. Multi-head attention further enhances this capability by processing multiple attention vectors simultaneously, enabling Transformers to model complex dependencies across data. For temporal data analysis, Transformers, including Transformer-XL and Reformer, are notable. Transformer-XL extends the architecture with recurrence mechanisms for the efficient handling of time-series data from satellite observations or weather records. The Reformer optimizes memory and computation through reversible layers and hashing, making it scalable for large-scale temporal datasets. In spatial analysis, Vision Transformers (ViT) and Multi-scale ViTs (e.g., MViTv2) have demonstrated superior performance [46]. ViT applies Transformers to image patches, capturing spatial relationships and context that are crucial for detecting landslide-prone areas using satellite imagery. MViTv2 enhances ViT models by extracting multi-scale features, resulting in stronger semantic representations and improved performance in dynamic geospatial mapping tasks.
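The self-attention operation at the heart of these architectures is compact enough to write out directly; the sketch below is single-head scaled dot-product attention over a toy sequence of three 2-dimensional vectors (shapes and numbers are illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    """Q, K, V: lists of d-dimensional vectors, one per sequence position."""
    d = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)             # attention weights sum to 1
        # context vector: weighted mix of the values
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = self_attention(Q, K, V)
```

Each output position is a weighted mixture of all positions, so every element can attend to every other in one step; multi-head attention simply runs several such maps in parallel with different learned projections.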

3. Progress on Deep Learning-Based Modeling for LIM

3.1. Early Attempts at Using Deep Learning in LIM

Recognizing the effectiveness of CNNs in image processing, initial endeavors in landslide mapping involved the application of CNNs with simple architectures [34,47,48]. Examples of CNN applications in landslide inventory mapping include work by Ding et al. [47], who preprocessed bitemporal landslide images obtained from GaoFen-1 with 8 m resolution, after removing clouds, vegetation, water, and buildings. They then fed the preprocessed images into a five-layer CNN model, which consisted of two convolutional layers, two pooling layers, and a fully connected layer, to detect potential landslide change areas. While the success of CNNs has been well documented, a recent study by Ghorbanzadeh et al. [34] found that not all CNN architectures significantly outperformed traditional machine learning methods such as ANNs, SVMs, and RF. The study also suggested that performance could be further enhanced with a larger and more representative set of training samples.

3.2. CNN Improvements in LIM

Later, additional studies were conducted to incorporate various modules or explore different CNN model architectures, aiming to enhance landslide detection accuracy. This section introduces how CNNs have been improved over time in LIM. Figure 3 visualizes four clusters of studies with enhanced CNN models for detecting landslides.
  • CNN model exploration in LIM
For the first cluster of research, Meena et al. [49] used a fully convolutional U-Net to segment landslides on two different datasets from the Rasuwa District, Nepal, to investigate the detection accuracy of U-Net using remote sensing images and topographic factors. The first dataset was based on bitemporal RapidEye optical images with a 5 m resolution, and the second dataset consisted of data from two sources: optical images and DEM. Unlike traditional CNNs geared toward image-level classification, U-Net’s encoder–decoder structure is tailored for semantic segmentation, which classifies images on a pixel basis. Additionally, U-Net’s decoder restores spatial dimensions for precise localization, aided by skip connections that preserve spatial details lost during downsampling. However, given the limited dataset size, although the U-Net model yielded the highest F1 score and Matthews Correlation Coefficient (MCC) of all compared methods, its performance was only slightly better than that of other machine learning techniques, such as SVM, kNN, and RF. Because traditional CNNs require a fixed input size, owing to their fully connected layers, which demand consistent dimensionality, resizing the data can result in a loss of visual information. To address this issue, Lei et al. [50] proposed a semantic segmentation approach leveraging a Fully Convolutional Network (FCN). The FCN is adept at handling inputs of varying sizes due to the absence of fully connected layers. The researchers also incorporated pyramid pooling (PP) to bridge the encoder and decoder of the FCN, creating a variant termed FCN-PP. This design enhances the exploitation of spatial information by providing multi-scale context to the network. Experimental results showed higher accuracy, F1 score, and precision, and lower overall error (OE), compared to U-Net and FCN.
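The F1 and MCC scores used in such comparisons follow standard definitions over pixel-wise confusion counts; the counts below are made-up numbers for illustration, not values from the cited study:

```python
import math

def f1_and_mcc(tp, fp, fn, tn):
    """F1 score and Matthews Correlation Coefficient from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # MCC accounts for all four cells, so it stays informative even when
    # non-landslide pixels vastly outnumber landslide pixels
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return f1, mcc

# hypothetical pixel counts for one test tile
f1, mcc = f1_and_mcc(tp=80, fp=20, fn=10, tn=890)
```

Because landslide pixels are typically a small fraction of a scene, class-imbalance-aware metrics like MCC are often reported alongside F1 rather than plain accuracy.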
Liu et al. [51] used a model called DenseNet, which has fewer parameters than ResNet, to segment landslides using aerial images with 5.8 m resolution and 19 topographic factors calculated from remote sensing images, DEMs, and geological data. The architecture of DenseNet is distinctive for its dense connectivity, in which each layer receives input from all preceding layers. This design facilitates the seamless flow of gradients during backpropagation, as the loss function’s gradients have direct paths through all layers rather than only the previous layer, as in traditional architectures. Consequently, this approach mitigates the vanishing gradient problem, making the network easier to train, even with fewer parameters. Moreover, the dense connections encourage feature reuse across layers, which is beneficial when working with smaller datasets. As a result, the model achieved higher precision, Kappa, and F1 scores than ResNet50 when segmenting landslides from limited data.
Su et al. [35] developed the LanDCNN, which generated a smooth segmentation map of landslides based on eight channels comprising bitemporal aerial images from Zeiss RMK, Lidar-Derived Digital Terrain Models (DTMs), and rasterized annotated labels for landslide areas. This model employs an encoder–decoder framework based on U-Net, incorporating ResNet50 (a residual CNN) into the encoder and utilizing its skip connections between layers to reduce the parameter space and enhance feature representations. The model outputs the pixel-wise probability for the landslide class, and the results are better than those of the traditional U-Net model. Janarthanan et al. [52] developed a lightweight model called SFCNet to segment landslides. This model employed a Separable Factorized Convolution, which reduces the number of learnable parameters and minimizes computational cost by decreasing the multiplication operations of the kernel. The model reduced the number of parameters to 54.9 k, which is approximately one hundred times less than ResNet50 and four hundred times less than VGG19; yet it achieved higher F1 scores and precision, accuracy, and Area Under the Curve (AUC) values than these models.
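The parameter savings behind separable convolutions can be verified with back-of-the-envelope counts (bias terms omitted for simplicity; the layer sizes below are illustrative, not SFCNet's actual configuration):

```python
# Parameter counts for a standard conv layer vs. a depthwise-separable one.

def standard_conv_params(k, c_in, c_out):
    """A k x k filter for every (input channel, output channel) pair."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in    # one k x k spatial filter per input channel
    pointwise = c_in * c_out    # 1 x 1 convolution to mix channels
    return depthwise + pointwise

k, c_in, c_out = 3, 128, 256    # hypothetical layer shape
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
ratio = std / sep               # roughly 8.7x fewer parameters here
```

The saving grows with kernel size and channel count, which is why architectures built from such layers (MobileNet, Xception, and lightweight models like SFCNet) can be orders of magnitude smaller than VGG-class networks overall.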
  • Component-level comparisons of the effectiveness of CNN models in LIM
Some research compares individual models or their combinations, utilizing different backbone networks, neural architectures, or optimizers [53,54,55,56,57,58]. For example, Ju et al. [53] compared RetinaNet, YOLOv3, and Mask R-CNN for detecting old loess landslides, which had vague boundaries and were less frequently studied. Optical images acquired from Google Earth were used for the analysis. The results showed that Mask R-CNN yielded the best precision, recall, and F1 score. Mask R-CNN’s two-stage framework produced better object localization, thereby improving overall detection accuracy compared to other architectures, such as YOLOv3, which predicts object location and category in a single stage to achieve faster inference speeds.
Similarly, Lin et al. [56] conducted a comparative analysis of various neural networks (DeepLab v3+, FCN, and U-Net) and optimization strategies (SGDM and Adam) for slope detection. DeepLab v3+ was of particular interest due to its use of atrous convolution with padding, which enables effective feature extraction by focusing on central features and leveraging information from previous feature maps. In addition, the network’s skip connections support multi-branch feature extraction, and batch normalization is applied at each layer to enhance parameter updates. Ultimately, the combination of DeepLab v3+ with ResNet18 and the SGDM optimizer demonstrated the highest precision, accuracy, and intersection over union (IoU) among all tested models. Yang et al. [57] compared different neural networks (U-Net, DeepLab v3+, and PSPNet) with varying backbones (VGG, MobileNet, Xception, and ResNet50). PSPNet uses a pyramid pooling module to divide the feature map into four distinct sizes (6 × 6, 3 × 3, 2 × 2, and 1 × 1), enabling the learning of representations of varying sizes. The feature maps are then upsampled and concatenated to build a final representation. Yang et al. [57] demonstrated that DeepLab v3+ with Xception outperforms MobileNet but struggles to distinguish landslides from roads. As a result, they found that PSPNet, with ResNet50 as its backbone, achieved the highest overall IoU and precision values. A visual comparison of the results between PSPNet and the other models is presented in Figure 4. Both papers conclude that advanced architectures achieved better performance than the popular, off-the-shelf U-Net model in landslide detection.
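The IoU metric reported in these comparisons has a simple definition over binary masks; the sketch below uses toy 1-D pixel lists for brevity (real evaluations operate on 2-D masks, but the arithmetic is identical):

```python
# Intersection over union (IoU) for binary segmentation masks.

def iou(pred, truth):
    inter = sum(p and t for p, t in zip(pred, truth))   # both flag the pixel
    union = sum(p or t for p, t in zip(pred, truth))    # either flags it
    return inter / union if union else 1.0  # both empty: perfect agreement

pred  = [1, 1, 1, 0, 0, 0]   # model's landslide mask
truth = [0, 1, 1, 1, 0, 0]   # annotated ground truth
score = iou(pred, truth)     # 2 overlapping pixels / 4 covered pixels
```

Unlike accuracy, IoU ignores the (usually dominant) true-negative background pixels, so it penalizes both over- and under-segmentation of the landslide area directly.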
Ganerød et al. [59] conducted two experiments: one experiment compared three models, including a pre-trained CNN, a continuous change detection and classification (CCDC) method, and a combined k-means clustering and RF classification model, for landslide segmentation at a global level using Sentinel-1, Sentinel-2 imagery, and DEM data on the Google Earth Engine. The other experiment aimed to compare the performance of a pre-trained U-Net with Classification and Regression Trees (CART) at a local level using both Sentinel-1 and Sentinel-2 imagery. The results showed that the locally pre-trained models outperformed globally pre-trained models, and the pre-trained deep learning-based U-Net model achieved better performance than the other machine learning models.
  • Multi-model integration in LIM
In addition to using a single CNN model, an Iterative Classification and Semantic Segmentation Network (ICSSN) was proposed to combine two CNN models by sharing the same encoder for landslide segmentation [60]. The encoder was a weight-sharing Siamese network that included a squeeze-and-excitation (SE) attention module, a skip connection, and an atrous spatial pyramid pooling (ASPP) module based on ResNet101. In the iterative training process, the upper branch was dedicated to object classification, while the lower branch featured a semantic segmentation network integrated with sub-object-level contrastive learning (SOCL) to extract boundary information. The encoder learned global and abstract features from the upper branch, as well as local and detailed features from the lower branch, through iterative training until both networks converged. ICSSN improved the F1 score and mIoU metrics beyond the enhancements achieved by an improved DeepLab v3+, a segmentation network with SOCL, and an object classification network augmented by OCL.
  • CNN model variants in LIM
Graph Convolutional Networks (GCNs) are a variant of CNNs. The similarity between GCNs and CNNs lies in their ability to learn hierarchical features through convolutional operations. However, GCNs are designed to handle graph-structured data consisting of nodes and edges. Unlike CNNs, GCNs can process varying sizes of neighboring features based on the graph’s topology, dynamically incorporating spatial proximity into their computations. As such, compared to CNNs, GCNs have advantages in utilizing topographical relationships among pixels to reveal contextual information about the spatial distribution and morphology of landslides [61,63]. For example, Li et al. [61] proposed a Dual-Channel Interaction Portable Graph Convolutional Network (DCI-PGCN) for landslide detection. The DCI-PGCN consists of three parts: an encoder, a dual-branch graph interaction module (DBGIM), and a decoder. The model combines a CNN and a GCN, where the CNN in the encoder extracts features that are then converted into a graph structure by the GCN. DBGIM embedded a dual-channel fusion graph structure, which was used to learn and refine the topologic features from the encoder. This structure includes a two-layer graph convolution applied to learn and fuse different channel features. A global maximum node connection algorithm with positive and negative connectivity was used to improve the efficiency of the graph feature propagation process. In the decoder, the output from DBGIM was upsampled to increase the feature dimensionality and fused with the features in the encoder at the corresponding depth, enriching the semantic information. This approach achieved the highest Mean Pixel Accuracy (MPA) and mIoU compared to other CNNs (such as DeepLabv3+), Transformers (e.g., TransResUNet [64]), and GCN-based methods (such as MGU-Net [65]).
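A single GCN propagation step, in the widely used symmetrically normalized form, shows how topological relationships enter the computation; the 3-node graph and weights below are a toy illustration (not the DCI-PGCN architecture itself):

```python
import math

def gcn_layer(A, H, W):
    """One graph convolution: relu(normalized(A + I) @ H @ W)."""
    n = len(A)
    # add self-loops, then symmetrically normalize the adjacency matrix
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # aggregate each node's features with its neighbors': norm @ H
    agg = [[sum(norm[i][k] * H[k][j] for k in range(n))
            for j in range(len(H[0]))] for i in range(n)]
    # linear transform + ReLU: relu(agg @ W)
    return [[max(0.0, sum(agg[i][k] * W[k][j] for k in range(len(W))))
             for j in range(len(W[0]))] for i in range(n)]

A = [[0, 1, 0],        # node 0 adjacent to node 1; node 2 isolated
     [1, 0, 0],
     [0, 0, 0]]
H = [[1.0], [0.0], [1.0]]   # one scalar feature per node (e.g., brightness)
W = [[1.0]]                  # trivial 1x1 weight for readability
out = gcn_layer(A, H, W)
```

The connected pair blend their features (both end up at 0.5) while the isolated node keeps its own value, which is exactly the neighborhood-dependent, variable-size aggregation the text contrasts with a CNN's fixed grid.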

3.3. The Application of GANs in LIM

Research has identified two key challenges in landslide mapping: misclassification between landslide and non-landslide areas, and the lack of pixel-level ground-truth data for training deep learning models. Several studies have explored the applications of GANs to address these issues, as shown in Figure 5.
  • Domain-adaptive GANs with adversarial learning in LIM
Fang et al. [66] developed a GAN-based Siamese Framework (GSF) consisting of two modules: a domain adaptation module and a Siamese detector. The goal of this framework is to distinguish between landslides and unchanged areas, as well as other types of changes, in bitemporal images. In the first module, the GAN generates post-landslide images from pre-landslide images with a 0.5 m spatial resolution, incorporating both adaptive loss due to domain shifts and adversarial loss from the discriminator. Minimizing these two losses helps separate changed areas from unchanged areas. In the second module, the Siamese detector distinguishes between post-landslide images generated by the GAN and real post-landslide images by introducing a contrastive loss. Finally, detected landslides can be separated from other changed areas after minimizing this contrastive loss.
  • Weakly supervised learning with inexact image-level labels in landslide segmentation
Zhou et al. [69] proposed a weakly supervised learning method that integrates Class Activation Maps (CAMs) with cycleGAN to achieve pixel-level landslide segmentation using only image-level labels, thereby reducing the need for large, annotated datasets. CAM generates the coarse spatial localization of landslides using learned model weights. cycleGAN is trained to perform style conversion, treating landslide and non-landslide images as two distinct styles. This enables cycleGAN’s two generators to synthesize landslide images from non-landslide images and, inversely, non-landslide images from landslide images. Adversarial loss and cycle consistency loss help the discriminators differentiate between real and fake images. After cycleGAN is trained, the non-landslide images generated by cycleGAN can be subtracted from real landslide images to segment the landslides. The final landslide areas can be identified by intersecting the results from CAM and cycleGAN.
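The cycle-consistency objective and the final intersection step described above can be sketched in NumPy; the linear "generators" below are toy stand-ins for the learned landslide/non-landslide style mappings, and the two masks in the intersection step are hypothetical:

```python
import numpy as np

def cycle_consistency_loss(x, g_ab, g_ba):
    """L1 cycle loss ||G_BA(G_AB(x)) - x||_1, as used in cycleGAN training."""
    return np.abs(g_ba(g_ab(x)) - x).mean()

# Toy 'generators': invertible linear transforms standing in for the
# landslide <-> non-landslide style mappings (illustrative only).
g_ab = lambda x: 2.0 * x + 1.0      # style A -> style B
g_ba = lambda x: (x - 1.0) / 2.0    # style B -> style A (exact inverse)

x = np.linspace(0, 1, 5)
loss = cycle_consistency_loss(x, g_ab, g_ba)
print(round(loss, 6))  # 0.0 for a perfect inverse pair

# The final map intersects the CAM localization with the cycleGAN
# difference mask; with boolean arrays this is a logical AND:
cam_mask = np.array([1, 1, 0, 0, 1], dtype=bool)
gan_mask = np.array([1, 0, 0, 1, 1], dtype=bool)
landslide = cam_mask & gan_mask
print(landslide)  # [ True False False False  True]
```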
  • Weakly semi-supervised learning in LIM
Zhang et al. [67] proposed a GAN-based semi-supervised multi-temporal deep representation fusion network (SMDRF-Net) for landslide mapping. As Figure 6 illustrates, the changed areas were first segmented based on pre- and post-event landslide images. Then, a fast fuzzy C-means (FCM) clustering algorithm was applied to classify landslide and non-landslide areas. A comprehensive uncertainty index was then applied to produce a pseudo-training set based on the uncertainty of the object labels, which was used to train and optimize the neural network. The results of this object-level segmentation informed the next step: a pre-trained Wasserstein generative adversarial network model with a gradient penalty (WGAN-GP) extracted pixel-level and object-level features from unlabeled pre- and post-event images, a process known as deep representation learning (DRL) on multi-temporal images. Lastly, the extracted objects from DRL were passed to a deep representation fusion (DRF) step, where a channel attention mechanism was employed to identify the inter-channel relationship in the multi-temporal features. A spatial attention mechanism was applied after the fusion of multi-temporal and multi-level features to enhance the feature representation, which was finally classified by a fully connected (FC) layer and a Softmax layer.
The segmentation accuracy of this GAN-based model was improved by adding multi-level analysis and spatial and channel attention mechanisms. This approach also reduced the demand for large amounts of manually labeled samples since DRF training only required pseudo-labels from previous analyses. When compared with two unsupervised LIM algorithms and two semi-supervised deep learning methods (the change-detection-based Markov Random Field model and the object-based majority voting method), it achieved the highest accuracy in landslide segmentation. This research demonstrates the successful integration of advanced deep learning strategies and their effective adaptation to landslide research.
Rather than improving the neural networks, Lu et al. [68] assessed the performance of a semi-supervised GAN (SSGAN) in classifying landslides using a dataset combining environmental factors and optical remote sensing images. This dataset consists of approximately 90,000 labeled landslide samples, 90,000 labeled non-landslide samples, and an equal number of unlabeled samples. The SSGAN framework reformulates the traditional N-class classification problem into an N+1-class task. Unlike standard GANs, which only distinguish between real and synthetic samples, SSGAN simultaneously distinguishes real images from model-generated images and classifies both labeled and unlabeled data into landslide and non-landslide categories. Differentiating real from model-generated images strengthens the discriminator’s ability to learn feature representations and classify landslide and non-landslide samples. The experimental results showed that the combined dataset consistently yielded higher classification accuracy than fully unsupervised learning.
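The N+1-class reformulation can be sketched as follows; the logits are hypothetical discriminator outputs, with the first N softmax classes covering real landslide/non-landslide samples and the extra class covering generated ones:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical discriminator logits over N+1 = 3 classes:
# [landslide, non-landslide, generated (fake)]
logits = np.array([[4.0, 1.0, 0.5],    # a confident real landslide sample
                   [0.2, 0.1, 3.0]])   # a confident generated sample
p = softmax(logits)

p_real = p[:, :2].sum(axis=1)   # probability the sample is real
p_fake = p[:, 2]                # probability the sample was generated

assert np.allclose(p_real + p_fake, 1.0)
print(p_real.round(2))  # first sample ~real, second ~fake
```

A single softmax head thus yields both the real-versus-generated decision and the landslide-versus-non-landslide decision.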

3.4. The Application of RNNs in LIM

In contrast to single-temporal image classification, which requires high spatial resolution, change detection based on time-series images offers distinct advantages in detecting landslides using more readily available but coarser spatial resolution images. RNNs have been implemented to analyze time-series data to detect landslides. Figure 7 illustrates a classification of RNN methods used for LIM.
  • Landslide temporal change detection using LSTM
Zhou et al. [70] employed an LSTM model to detect landslide-affected areas by predicting changes in the Normalized Difference Vegetation Index (NDVI). Initially, landslide areas were masked out from the time-series imagery, allowing the model to generate NDVI predictions using GF-1 and Landsat data at a 30 m resolution. The predicted NDVI was then compared to post-event NDVI from actual imagery, with discrepancies flagged as anomalies. These anomalies were subsequently classified as either landslide or non-landslide areas using an SVM model trained on remote sensing imagery and DEM data. In this study, the landslide segmentation achieved 99.37% accuracy in the study area. Compared to multi-temporal image classification based on CNNs or GANs, LSTMs leverage their ability to capture temporal dependencies through memory cells and gating mechanisms, thereby mitigating the impact of non-landslide changes induced by noise in landslide change detection.
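The anomaly-flagging step of this workflow can be sketched in NumPy; the predicted NDVI values below stand in for the LSTM's output, and the discrepancy threshold is a hypothetical choice rather than the value used by Zhou et al.:

```python
import numpy as np

# Predicted NDVI from the time-series model (stand-in values) versus the
# NDVI observed after the event; a large drop flags a candidate landslide.
ndvi_predicted = np.array([0.71, 0.68, 0.65, 0.70, 0.66])
ndvi_observed  = np.array([0.69, 0.21, 0.63, 0.18, 0.64])

threshold = 0.3  # hypothetical anomaly threshold on the NDVI discrepancy
anomaly = (ndvi_predicted - ndvi_observed) > threshold
print(anomaly)  # [False  True False  True False]
```

In the full pipeline, the flagged pixels would then be passed to the SVM for landslide/non-landslide classification.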
  • Landslide classification and extraction using LSTM
Some research has leveraged RNNs to classify landslides after the objects are segmented [37,71,72]. Mezaal et al. [37] compared a multi-layer perceptron neural network (MLP-NN) and an RNN in classifying landslide and non-landslide areas. They first segmented the landslides through a multiresolution segmentation algorithm, which delineated them based on segmentation parameters, such as scale, shape, and compactness, optimized through a fuzzy segmentation parameter-based optimizer. In the following step, they selected the training dataset through a stratified random sampling method, which could choose sufficient training sets from each class (landslide, cut slope, bare soil, and vegetation) to avoid sample selection bias. To classify landslides based on texture, spectral features, and lidar-derived data, they chose ten features (for example, digital terrain model, mean slope, and gray-level co-occurrence matrix homogeneity) from 39 candidates through a correlation-based selection algorithm. These selected variables were then used as inputs to train the MLP-NN and the RNN separately. The MLP-NN comprised two hidden layers and an Adam optimizer, whereas the RNN consisted of an LSTM layer, two fully connected layers, a dropout layer, and a Softmax layer. As a result, the RNN achieved 81.11% cross-validation accuracy, which is 6.55% higher than that of the MLP-NN.
Varangaonkar and Rode [71] proposed a lightweight landslide extraction model that uses a CNN as a feature extractor and an LSTM as a classifier. They segmented landslides into regions of interest (ROIs) based on threshold-based binary segmentation in MATLAB. Features were extracted by inputting the ROIs into the CNN model, which was a modified and pre-trained ResNet50 with fewer filters in the residual blocks, thus reducing computation. The LSTM predicted the probability of the extracted features and classified them into landslide and non-landslide objects. The CNN-LSTM model was compared with a CNN paired with either an ANN or an SVM as the classifier, and the results showed that the CNN-LSTM model performed better than the other models, achieving the highest accuracy and F1 score.

3.5. The Application of Transformers in Landslide Detection

A Transformer is a neural network that exploits the multi-head self-attention mechanism within an encoder–decoder structure. Compared to the CNN, the Transformer can better capture global context information by taking advantage of its model architecture [73,74,75]. As such, the Transformer has been commonly adopted for landslide detection. Figure 8 presents two main architectural strategies for its use in LIM.
  • Single Transformer adaptation in LIM
Several studies have used Transformers to detect landslides [23,74,75]. For example, Tang et al. [23] investigated the implementation of SegFormer, a Transformer-based segmentation model, for landslide detection. It comprises overlapped patch embedding layers that retain the local continuity of patches and a sequence reduction operation in the attention layer to improve computational efficiency. It adds a convolution and a skip connection in the position embedding layer to provide different levels of feature maps with positional information. The results showed that this model outperformed LandsNet [83], HRNet, DeepLabV3, Attention-UNet, U2Net, and FastSCNN in terms of IoU, mIoU, precision, recall, accuracy, and F1 score values.
  • Hybrid architecture designed for using Transformers with CNNs or GCNs in LIM
Rather than independently using Transformers, several studies involve combining Transformers and CNNs to map landslides more accurately [76,77,78,79,80,81,82]. For example, Kumar et al. [76] proposed a Glorot Initialization Optimal TransCNN that integrates a pre-trained Mask R-CNN and a Transformer to segment landslides. This study detected landslides by first denoising the images and preserving important features using a Maximum Information Coefficient-based wavelet filter, which measures the relationship between wavelet coefficients and landslide signals. Then, they separated the overlapping pixels using an independent component analysis-based clustering algorithm, which effectively decreased pixel ambiguity by grouping similar pixels. These steps produced individual landslide areas, which were then processed by a feature point descriptor to select the most informative features for landslides using the Archimedes Principle Optimization-based Attention Network (APO-AN). The APO-AN is introduced to capture complex relationships, reduce dimensionality, and enable interpretable feature selection.
After the features were selected, the TransCNN was applied to detect landslides. The TransCNN leverages the advantages of Mask R-CNN in object detection and the advantages of the Transformer in complex spatial context modeling, with the model’s weights optimized via Glorot Initialization. The Glorot Initialization scaled the initial weights to mitigate vanishing or exploding gradients, supported effective learning dynamics between two models’ components, and enhanced the initial training stability. The Glorot Initialization Optimal TransCNN demonstrated superior performance, achieving 1.32% higher accuracy and a 0.21% higher F1 score compared to the Fast Mask Regional Convolutional Neural Network (FMR-CNN), which was followed by the Region-Based CNN (R-CNN) and the plain CNN in terms of overall performance.
Wu et al. [82] developed a segmentation model called SCDUNet++ for landslide detection on 13 Sentinel-2 bands, four spectral index factors, and six topographic factors. In the Global Local Feature Extraction (GLFE) block, they applied a two-layer CNN to extract local details from images to prevent information loss in the subsequent deep Swin Transformer blocks. Meanwhile, the Detailed Spatial Spectral Aggregation (DSSA) block received the features extracted from GLFE and spectral information in two separate branches and fused these features to extract multi-scale representations. Furthermore, Deep Transfer Learning (DTL) was applied to train the model from a source domain and fine-tune the target model using target data so that the transferability of the model could be improved. The results showed that the proposed model obtained the highest precision, recall, MCC, IoU, and F1 scores compared to CNN models, such as FCN, UNet, STUNet, and DeepLabv3+, and other models integrated with or developed based on Transformers, such as Segformer and TransUNet.
ResU-Net can also be combined with Transformers [64,80,81]. For example, Yang et al. [80] designed a CTransUnet model with a Convolutional Block Attention Module (CBAM) and embedded a Transformer into ResU-Net to produce a landslide segmentation map. This study aims to fully leverage global contextual information—something that conventional CNNs struggle with—by incorporating a Transformer into the model architecture. With the help of ResNet50 and a Transformer in the encoder, ResNet50 can extract feature maps with different receptive fields, and the Transformer can extract global dependencies across the entire feature map. The CBAM in the decoder aims to enhance the important information and suppress the background noise. This modified model yielded a better IoU result than the standard ResU-Net, with an improvement of 1.84%.
Li et al. [81] proposed a hybrid network called DemDet, which consists of an encoder–decoder structure designed to fuse multimodal data for landslide detection. The encoder uses SegFormer to extract features from large hillshade maps and optical images, and ResNet to capture low-level features from DEM data. Since SegFormer is better at capturing global contextual information than CNNs, it was specifically applied to hillshade maps, where each pixel exhibits strong dependence on its neighboring pixels. In the decoding stage, multimodal features were concatenated through global pooling, and landslide regions were further enhanced using channel and spatial attention mechanisms. The experimental results showed that DemDet outperformed SegFormer, LandsNet [83], and other models, achieving the highest mIoU values and F1 scores. Specifically, DemDet achieved a 0.017 higher mIoU value than the SegFormer trained on hillshade and DEM, a 0.041 higher mIoU value when it was trained only on hillshade, and a 0.149 higher mIoU value when it was trained only on optical images.
Another realm of study combines Transformers and GCNs to map landslides more accurately. Fan et al. [62] introduced ETGC2-Net as a hybrid architecture that combines an enhanced Transformer (EFormer) and a Superpixel-Guided Graph Convolutional Network (SGCN) for landslide detection. EFormer fuses an Inverted Transformer and a Convolutional Block Attention Module to extract both global contextual and local semantic features with reduced computational cost. The Spatial Information Enhancement Module (SIEM) compensates for spatial detail loss during downsampling. SGCN utilizes superpixels to model spatial topology, thereby reducing computational burden while capturing structural dependencies among neighboring regions. The combination of Transformer and GCN leverages their complementary strengths—Transformers capture long-range global dependencies, while GCNs effectively model internal topological relationships. ETGC2-Net achieved notable performance improvements in mIoU values and F1 scores on two landslide datasets.

4. Advanced Deep Learning Strategies and Techniques Adopted for Landslide Mapping

4.1. Feature Enhancement and Fusion Techniques

Misclassifying landslide and non-landslide classes is a common challenge in LIM. This section introduces several studies that concentrate on mitigating this issue using feature enhancement techniques combined with CNN methods. Table 1 lists a classification of these techniques.
  • Image-derived feature representations enhancement in LIM
Several works enhance features based on original remote sensing images and secondary data derived from them. For example, remote sensing images provide multispectral data for creating new indices that help discern landslides from non-landslide areas by calculating differences between multiple bands. As such, the landslide features fed into the models can be enhanced by leveraging indices based on diverse radiation absorption and reflection, such as the grayscale index change and the NDVI [84,85,86]. For example, Wang et al. [25] fused the NDVI and NIR from GF-1 satellite images (2 m resolution), which were set as input data into a CNN. Because vegetation is removed in post-landslide areas, NDVI can differentiate landslide and non-landslide regions by highlighting vegetation coverage. NIR can detect soil areas in landslide regions to further improve the detection of landslides. Compared to a single band input, this enhancement increased the accuracy, precision, recall, and F1 score by 9.96%, 72.92%, 44.98%, and 65.53%, respectively. Compared to RGB bands, it increased the accuracy, precision, recall, and F1 score by 0.46%, 2.18%, 1.94%, and 2.06%, respectively.
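The index computation and band fusion can be sketched in a few lines; the reflectance values below are illustrative toy patches, not GF-1 measurements:

```python
import numpy as np

def ndvi(nir, red, eps=1e-8):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red + eps)

# Toy 2x2 reflectance patches: left column vegetated, right column bare soil
red = np.array([[0.10, 0.40],
                [0.12, 0.38]])
nir = np.array([[0.50, 0.42],
                [0.48, 0.40]])

veg_index = ndvi(nir, red)
# Stack NDVI with the NIR band to form an enhanced model input of shape
# (H, W, 2), analogous to the fused input used by Wang et al.
fused_input = np.dstack([veg_index, nir])
print(fused_input.shape)  # (2, 2, 2)
```

Vegetated pixels score high NDVI while freshly exposed soil scores near zero, which is exactly the contrast the fused input hands to the CNN.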
In addition to indices, calculating spatial relationships between image pixels, for example with the gray-level co-occurrence method, can also enhance textural features as secondary data. Xu et al. [26] developed a Feature-Based Constraint Deep U-Net model (FCDU-Net). This model enhanced landslide features by using NDVI and a gray-level co-occurrence matrix created from a grayscale image to extract the texture of features at the pixel level. They combined the enhanced features with the original images fed into Deep U-Net (DU-Net), which leveraged U-Net and DenseNet, thus allowing for the semantic segmentation of landslides using feature-based models. This FCDU-Net yielded a higher mean intersection over union (mIoU), precision, recall, F1 score, Kappa, and overall accuracy (OA) than the regular Deep U-Net.
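A gray-level co-occurrence matrix and the homogeneity feature derived from it can be computed with a minimal NumPy sketch (one offset, four gray levels); practical pipelines would typically use scikit-image's graycomatrix instead:

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=4):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    h, w = image.shape
    m = np.zeros((levels, levels))
    for y in range(h - dy):
        for x in range(w - dx):
            m[image[y, x], image[y + dy, x + dx]] += 1
    return m / m.sum()  # normalize to co-occurrence probabilities

def homogeneity(p):
    """GLCM homogeneity, one of the texture features used for LIM."""
    i, j = np.indices(p.shape)
    return (p / (1.0 + np.abs(i - j))).sum()

# Toy 4x4 grayscale image quantized to 4 levels
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
p = glcm(img)
print(round(homogeneity(p), 3))  # 0.833
```

Smooth regions concentrate mass on the GLCM diagonal and score high homogeneity, whereas the rough texture of a fresh landslide scar lowers it.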
  • Feature fusion enhancement in LIM
Feature fusion is another commonly used feature enhancement approach for enhancing the foreground features in deep learning models. For instance, Wang et al. [87] adopted a multi-level feature enhancement network (MFENet), which comprised ResNet34 as a feature extractor, a post-event feature enhancement module (PFEM), a bi-feature difference enhancement module (BFDEM), and a flow direction calibration model (FDCM). Figure 9 illustrates its architecture. This study used PFEM and BFDEM to enhance multi-level features, and FDCM aimed to calibrate the flow direction to detect complete landslides. PFEM obtained post-event features extracted from the Siamese network and improved these features by fusing low- and high-level features in an enhancement block. The enhanced post-event and the original pre-event features were then fed into BFDEM to improve the detection of change areas in the upsampling and upward propagation process. MFENet proved superior in landslide segmentation, achieving higher recall, IoU, and F1 score values than other methods, such as LanDCNN [35].
Similar to DeepLabv3+, Wang et al. [88] introduced the Gated Dual-Stream Convolutional Neural Network (GDSNet) for landslide segmentation, which incorporates two branches: one for extracting detailed features such as shapes and the other for enhancing features through gated convolution to adjust feature weights. In addition, an atrous spatial pyramid pooling feature fusion module (AFF) combined with an attention map merges these two branches and optimizes the model’s performance by synergizing detailed extraction and enhancement processes. With the aggregated information from gated convolution and the other branch, this model outperformed nine other models, such as FCN8s, U-Net, DeepLabv3+, DANet, and CCNet, in terms of mIoU, F1 score, OA, and Kappa coefficient measures.
Zhang et al. [89] introduced the concept of structural reparameterization into the encoder of a U-shaped network and trained the model on the Bijie Dataset, Luding Dataset, and GF-2 satellite imagery. The structural reparameterization is implemented with a five-branch feature extraction module (FFEM) to enhance feature capture. The first branch enriches the feature representation with 1 × 1 convolution and batch normalization (conv-bn). The second and third asymmetric branches consist of convolutional modules with small receptive fields that capture local relationships among feature maps. The fourth branch fuses and interacts with features in the channel dimension. In the fifth branch, 3 × 3 conv-bn obtains weight and bias from other branches. The design of structural reparameterization decouples the training and inference structures of the network. During inference, it combines multiple modules into a single convolutional layer to reduce the inference cost while maintaining improved accuracy. The proposed model achieved overall better performance compared to FCN, DeepLabv3+, Att-UNet, and UNet++.
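The inference-time merge at the heart of structural reparameterization, folding batch normalization into the preceding convolution, can be verified numerically; for brevity, a 1 × 1 convolution is written as a channel-mixing matrix, and all parameters are random stand-ins:

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the preceding convolution's weights and bias,
    the inference-time merge used by structural reparameterization."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None], (b - mean) * scale + beta

rng = np.random.default_rng(0)
c_in, c_out = 3, 4
w = rng.normal(size=(c_out, c_in))   # 1x1 conv as a channel-mixing matrix
b = rng.normal(size=c_out)
gamma, beta = rng.normal(size=c_out), rng.normal(size=c_out)
mean, var = rng.normal(size=c_out), rng.uniform(0.5, 2.0, size=c_out)

x = rng.normal(size=(c_in,))         # one pixel's channel vector

# Train-time path: convolution followed by batch normalization
y_two_ops = gamma * ((w @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
# Inference path: a single fused convolution
w_f, b_f = fuse_conv_bn(w, b, gamma, beta, mean, var)
y_fused = w_f @ x + b_f

print(np.allclose(y_two_ops, y_fused))  # True
```

The same algebra lets the multi-branch FFEM collapse into one convolution at inference, which is where the cost reduction comes from.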
Several papers adopted multi-scale feature fusion to enhance landslide detection [90,91,97]. Li et al. [90] developed a Multi-scale Feature Fusion Scene Parsing (MFFSP) framework to segment landslides. A Visual Geometry Module (VGM) was used to extract low-level features using 3 × 3 convolution kernels with a stride of 1. A Residual Learning Module (RLM) was applied to extract mid-level landslide feature maps by applying two convolution operations, batch normalization, and rectified linear units. Finally, the Transformer Module (TRM) was employed to obtain global-level long-range dependencies through patch embedding, attention, and positional embedding layers. Features at different levels were fused in the decoder. This method achieved the highest F1 score, Kappa, and mIoU values in two different experimental areas, Rio de Janeiro and Taiwan, compared to seven other Transformer-based or convolution-based networks.
In addition to the multi-level feature fusion strategies mentioned above, multimodal feature fusion combined with attention modules is commonly used to enhance feature representations in landslide mapping tasks [35,81,91,92]. For instance, Lu et al. [91] developed a dual-encoder architecture based on U-Net, incorporating a multi-source feature fusion module and a self-attention module in the decoder. A primary encoder was designed for Sentinel-2 imagery (consisting of 12 bands), while a secondary encoder was built for NASA DEM, slope, and aspect maps. Features from the two encoders were fused at each layer. The self-attention module was applied in the skip connections of the decoder to better integrate features from both the encoder and decoder. This model improved the F1 score and precision, achieving comparatively higher performance than SegNet, U-Net, and AttUNet.
Liu et al. [92] developed a feature fusion semantic segmentation network (FFS-Net), which employs a Siamese two-branch network without weight sharing to extract morphological features from high-resolution spectral images and topographic features from DEM data. These features are then fused, and semantic information is extracted from the fused features using a pre-trained ResNet101 network. Similarly, Yang et al. [93] introduced SAMLS, a two-branch encoder with a Cross-Branch Attention (CBA) mechanism embedded within the Segment Anything Model (SAM), to fuse optical features from RGB data and topographic features from DEM data. In the RGB and DEM branches, they modified three downscale adapters and eight feature adapters to adapt the original patch and positional embeddings to a shorter input sequence in SAM, enabling more efficient computation. The feature adapters incorporated Multi-Layer Perceptrons (MLPs) with activation functions to facilitate the fine-tuning of the pre-trained image encoder. In the DEM branch, the CBA in each global Transformer dynamically correlated optical and topographical features by taking features from the RGB branch as the query and terrain features from the DEM branch as the key and value.
Other studies have stacked multi-modality data to support landslide inventory mapping tasks [94,95]. For example, He et al. [94] proposed using four dimensions of input data: spectral bands, spectral indices (e.g., normalized green band), terrain factors (e.g., slope), and texture indices (e.g., entropy or the mean of a gray-level co-occurrence matrix). They applied a U-Net with an attention mechanism to analyze this multi-dimensional dataset. The results showed that the four-dimensional dataset achieved higher mIoU and F1 score values compared to datasets with fewer dimensions.
  • Background enhancements in LIM
In addition to the abovementioned feature fusion, which serves as a foreground feature enhancement strategy, several works have focused on background enhancement [36,96]. Yang et al. [36] proposed a background enhancement landslide segmentation method based on Mask R-CNN. In their work, they adopted two steps for enhancing the background. The first was image splicing (Mosaic), where each spliced image combined one landslide with three confusing non-landslide objects obtained from the training samples. These spliced images were used to train the model, aiming to enhance its ability to distinguish landslides from complex and confusing features. The second was a modified CutMix strategy, derived from Yun et al. [98] and designed to enhance landslide segmentation tasks. This strategy involved replacing a region in an image with pixels from another image, allowing the model to focus on the entire area rather than differentiating objects, thereby enhancing landslide recognition. These two enhancement steps trained the model to detect landslides more accurately rather than simply fusing features. They also set landslide-inducing topographic factors as input data to provide auxiliary information. They found that adding topographic factors as input data and training the model with background enhancement yielded the best precision, recall, and F1 score values.
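The region-replacement operation underlying CutMix can be sketched in a few lines; the tiles and box coordinates below are illustrative, not those used by Yang et al.:

```python
import numpy as np

def cutmix(image_a, image_b, box):
    """Paste a rectangular region of image_b into image_a (CutMix-style)."""
    y0, y1, x0, x1 = box
    out = image_a.copy()
    out[y0:y1, x0:x1] = image_b[y0:y1, x0:x1]
    return out

landslide = np.ones((8, 8))        # stand-in landslide tile
background = np.zeros((8, 8))      # stand-in confusing background tile
mixed = cutmix(landslide, background, box=(2, 6, 2, 6))

print(mixed.sum())  # 48.0: 64 pixels minus the 16-pixel pasted region
```

Mosaic splicing works analogously, tiling four such samples into one composite training image.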
Similarly, Liu et al. [96] proposed a background enhancement method with multi-scale samples (MSSCBE) based on Mask R-CNN. Although they also adopted CutMix and Mosaic, instead of training the model separately with the two enhanced samples, they randomly spliced the four samples into a final new sample based on the Mosaic method: non-landslide and landslide images of 256 × 256 pixels, as well as a background-enhanced sample based on CutMix and a landslide sample of 512 × 512 pixels. The multi-scale background enhancement techniques further improved the model’s overall performance compared to using CutMix or Mosaic background techniques only.

4.2. Attention-Boosted Neural Network for LIM

This section introduces various attention modules that have been applied in existing research to improve the model prediction accuracy of LIM. The attention mechanism is used to suppress background noise and help the model focus on foreground information, thereby increasing detection accuracy. Table 2 summarizes different attention-boosting techniques for LIM.
  • Attention blocks used in LIM
Nava et al. [28] incorporated an attention gate into U-Net to enable the model to focus on relevant information about landslides. The attention gate extracts spatial information from low-level feature maps to reduce the extraction of irrelevant features and increase the overall detection accuracy. Similarly, Wei et al. [99] deployed an attention convolutional block through element-wise multiplication and a sigmoid function, and its weights were further updated through the backpropagation process.
  • Spatial/channel attention mechanism applied in LIM
Spatial attention (shown in Figure 10a) and channel attention (shown in Figure 10b) are two specific types of attention modules. Spatial attention refines features within a feature map, whereas channel attention enhances the channels that contribute the most across different feature maps. Spatial and channel attention are commonly concatenated in attention modules, such as the Bottleneck Attention Module (BAM) and the Convolutional Block Attention Module (CBAM) [101,102,103]. For example, Qin et al. [104] modified the CBAM using different kernel sizes in a submodule to obtain higher landslide detection accuracy. While the integration of channel and spatial attention in the CBAM preserves global consistency, it fails to separately account for cross-dimensional interactions between the spatial and channel domains. To address this, Ji et al. [101] proposed a 3D spatial and channel attention module (3D SCAM) that integrated spatial and channel attention modules in parallel, enabling the integrated spatial and channel information to produce an attention map. The results of this study demonstrated its superiority over other attention mechanisms in overall metrics, including precision, recall, accuracy, and F1 score, compared to the BAM and CBAM, with ResNet50 or ResNet101 as the backbone. In the study, they also created the first openly available landslide dataset, called Bijie, which consists of 2 m TripleSat optical images, 2 m DEM, and vector boundaries of the landslides.
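A simplified NumPy sketch of the CBAM-style sequence (channel attention followed by spatial attention) is shown below; the max-pooling branch and the 7 × 7 convolution of the full CBAM are omitted, and all weights are random stand-ins:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Squeeze (global average pool) -> tiny MLP -> per-channel gate."""
    pooled = x.mean(axis=(1, 2))            # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ pooled, 0))
    return x * gate[:, None, None]

def spatial_attention(x):
    """Pool across channels -> per-pixel gate (conv omitted for brevity)."""
    gate = sigmoid(x.mean(axis=0))          # (H, W)
    return x * gate[None, :, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))              # (C, H, W) feature map
w1 = rng.normal(size=(2, 4))                # reduction MLP, ratio 2
w2 = rng.normal(size=(4, 2))

refined = spatial_attention(channel_attention(x, w1, w2))  # CBAM ordering
print(refined.shape)  # (4, 8, 8)
```

The 3D SCAM discussed above differs chiefly in applying the two gates in parallel rather than sequentially before producing the attention map.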
Fu et al. [105] deployed a CAL-Net architecture with a Dual-stream Conditional Attention Module (DCAM) embedded to enable real-time landslide segmentation. The DCAM included a channel conditional attention mechanism and a spatial conditional attention mechanism, both of which applied conditional convolution to dynamically adjust weights based on the input features. With the help of conditional convolution, the model can not only adjust the size of the convolution kernel to refine the features but also decrease the processing time due to a one-time computation after a linear combination.
Amankwah et al. [106] compared two attention-boosted neural network models in landslide delineation: (1) The first is a Spatial–Temporal Attention Neural Network (STANet), which is a model designed for landslide change detection that consists of a backbone network and a pyramid spatial–temporal attention module (PAM). This attention mechanism increases the capability to detect landslides on various scales and colors. (2) The second is a Siamese Nested U-Net (SNUNet), which adopts a skip connection to better use fine-grained localization features by transferring them from encoders to shallow sub-decoders, as well as the coarse-grained features that are obtained in deep sub-decoders. An Ensemble Channel Attention Module (ECAM) was used to merge the abovementioned multiple-level information. The results revealed that SNUNet performed better than STANet [121], with a higher F1 score and mIoU. It also exhibited superior performance in detecting scattered landslides and distinguishing them from riverbeds and road pixels using bitemporal PlanetScope imagery. In addition, SNUNet can better capture fine details than STANet, as it can fuse low-level and high-level features to extract intra-group relations. Because SNUNet has a shallower depth and fewer parameters than STANet, it takes less time to train SNUNet.
Many studies make efforts to use attention modules and feature fusion techniques together in the model architecture. Among the research, the squeeze-and-excitation network (SENet), which is a type of channel attention mechanism, is frequently used [107,108]. For example, Chen et al. [107] embedded the SENet into UNet to enable feature fusion. The SENet compressed the dimensions of the feature channel into one dimension, captured the nonlinear correlation between channels, and finally reweighted the aforementioned features based on the importance of each channel dimension. After that, the low-level features from the encoder and high-level features from the decoder were fused through the skip connection.
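A squeeze-and-excitation step of the kind Chen et al. embedded can be sketched as follows. The weight matrices `w1` and `w2` stand in for the learned bottleneck layers and are random here purely for illustration; the real SENet learns them end to end.

```python
import numpy as np

def se_block(fmap, w1, w2):
    """Squeeze-and-Excitation: squeeze (C, H, W) to (C,), excite through a
    two-layer bottleneck, then rescale each channel by its learned importance."""
    z = fmap.mean(axis=(1, 2))                 # squeeze: global average pool -> (C,)
    h = np.maximum(w1 @ z, 0.0)                # excitation: reduction layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))        # expansion layer + sigmoid -> (C,)
    return fmap * s[:, None, None]             # reweight channels

rng = np.random.default_rng(1)
C, r = 8, 2                                    # channels and reduction ratio
x = rng.normal(size=(C, 12, 12))
w1 = rng.normal(size=(C // r, C)) * 0.1        # stand-in for learned weights
w2 = rng.normal(size=(C, C // r)) * 0.1
out = se_block(x, w1, w2)
```

The bottleneck (reduction ratio `r`) is what lets the block capture nonlinear correlations between channels at a small parameter cost before reweighting them.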
Additionally, several works have combined attention modules and feature fusion techniques on top of the YOLO detection model [100,109,110,111,112]. For example, Zhang et al. [112] proposed an LS-YOLO model to detect landslides. Its multi-scale feature enhancement module consists of two branches: a residual connection and a branch embedded with an Efficient Channel Attention (ECA) module that enhances global spatial information and channel interactions. Furthermore, because the detection model is based on YOLOv5, an improved decoupled head replaced the original coupled head for the regression task. In this decoupled head, a Context Enhancement Module (CEM) using dilated convolution replaced the original 3 × 3 convolutions: dilated convolutions with dilation rates of one, three, and five produced three feature maps, which were then merged using three feature fusion techniques to improve the generalization and robustness of the CEM. The three fusions were (1) adaptive fusion, applying 1 × 1 convolutions to the three feature maps; (2) concatenation fusion, concatenating them along the channel dimension; and (3) weighted fusion, summing them in the spatial dimension. These improvements increased the average precision, precision, and recall with fewer parameters compared to Faster R-CNN, YOLOv5s, YOLOv7, and YOLOX, among others, while maintaining a faster speed.
  • Multi-scale mechanism applied in LIM
Multi-scale attention is also widely explored in combination with multi-level feature fusion [113,114,116]. For example, Lu et al. [114] designed an MS2LandsNet to segment landslides. The network consisted of three downsampling stages, each integrated with multi-scale feature fusion and multi-scale channel attention modules. This design retains the spatial details from low-level features and semantic information from high-level features. This network was compared with ten other models, including UNet, Attention UNet, ResUNet, and Deep U-Net, and achieved the highest F1 score and IoU value, as well as comparatively high precision and recall, with fewer parameters and a faster speed.
There are some other attention strategies used in landslide detection or segmentation. For example, Cheng et al. [116] designed a You Only Look Once Small Attention (YOLO-SA) model to enable the fast and accurate one-stage detection of landslide areas. This model extended YOLOv4 by adding group convolution and ghost bottlenecks to reduce the model parameters. It also added a selective kernel attention module to enable the adaptive adjustment of receptive field size, allowing for the detection of landslides of different sizes. Their model was compared with YOLOv4, Faster R-CNN, and nine other models, and it achieved the overall best average precision (AP) and fastest inference speed in landslide detection. Additionally, Han et al. [117] developed Dynahead-YOLO, which consists of three feature fusion branches in the YOLO neck network and combines task-aware, spatial-aware, and scale-aware attention blocks in the detection head. In the neck network, a fusion feature map was generated through convolutional layers in the first branch, which was then upsampled and concatenated with feature maps extracted from the backbone network (Darknet-53) in the second branch. This process was iteratively repeated with other feature maps of varying sizes in the third branch. Each of these fusion feature maps passed through the three attention blocks. The task-aware block activated different channels of the feature maps through an activation function to adapt to the detection task. The spatial-aware attention block used a deformable convolution layer to adapt to the shape and scale variations of the feature maps. The scale-aware block fused the input features based on different weights. The resulting maps of different levels, generated from the fusion branches and attention blocks, were finally filtered using non-maximum suppression to obtain the final prediction. This approach significantly increased the proportion of correct predictions compared to YOLOv3.
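The non-maximum suppression step that produces the final detections in these YOLO-style pipelines can be illustrated with a minimal NumPy sketch. The greedy IoU-based variant below is a generic textbook version; the box coordinates, scores, and the 0.5 threshold are illustrative, not values from the cited studies.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes; boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop
    rivals that overlap it beyond the threshold, repeat. Returns kept indices."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

# Two heavily overlapping candidate detections and one distant one.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)   # the lower-scoring overlapping box is suppressed
```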
  • Long-range dependency-capturing attention mechanism used in LIM
Self-attention (also known as intra-attention) is an attention mechanism that operates on a single input sequence, calculating weights based on the relationships between vectors within the same sequence after linear transformations. Liu et al. [115] compared CBAM and self-attention by embedding them separately into their Mask R-CNN module. Their results revealed that the self-attention module achieved the best F1 score of 84.12% and improved the mIoU by an additional 3.84% compared to CBAM.
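The core computation of self-attention, a softmax over pairwise affinities within one sequence after linear projections, can be sketched as follows. The projection matrices are random stand-ins for learned weights, and the "sequence" here would correspond to flattened feature-map locations in a vision model.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence of feature vectors.
    x: (n, d) sequence; wq/wk/wv: (d, d) linear projections."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])       # pairwise affinities (n, n)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax: each row sums to 1
    return attn @ v                              # weighted mix of value vectors

rng = np.random.default_rng(2)
n, d = 6, 4                                      # e.g., 6 flattened patch features
x = rng.normal(size=(n, d))
w = [rng.normal(size=(d, d)) for _ in range(3)]
out = self_attention(x, *w)
```

Because every position attends to every other, the operation captures long-range dependencies that local convolutions miss, which is the property the studies above exploit.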
Multi-head attention performs multiple self-attention operations in parallel, allowing information to be mixed more effectively across attention heads. Several studies have used multi-head self-attention to detect landslides [32,120]. For example, Niu et al. [32] integrated multi-head attention into the Faster R-CNN to generate region proposals; the added attention module enabled the model to detect landslides and mudslides from global images, and the improved model outperformed the original Faster R-CNN, achieving higher recall. Paheding et al. [97] proposed MarsLS-Net to segment Martian landslides. Departing from the encoder–decoder architecture, the model is composed of repeated Progressively Expanded Neuron Attention (PEN-Attention) blocks, which enhance the relevant regions at a global level by using concatenated two-branch ConvPE layers followed by multi-head self-attention. Compared with models such as UNet, ResUNet, SwinUNet, and TransUNet, this model achieved the highest mIoU, F1 score, recall, and precision values.

4.3. Addressing the Limitation of Insufficient Training Data

Landslide data inventories for AI applications are often limited in size due to the challenges associated with annotating such events. Multiple studies focus on solving this issue. A classification of these techniques can be found in Table 3.
  • Manual creation of benchmark dataset in LIM
Most of the studies developed their own landslide samples, used existing datasets, or increased landslide samples through data augmentation [23,57,59,137,138]. To address the data scarcity challenge, multiple landslide datasets have been made publicly available [38,81,104,111,123,138,139,140,141,142,143]. These include the Bijie dataset [101]; the Martian landslide dataset [97]; the CAS dataset, which includes landslides triggered by a mix of earthquake and rainfall factors [140]; the Luding dataset [141]; the Landslide4Sense dataset [38]; the Global Very-high-resolution Landslide Mapping (GVLM) dataset, which covers mixed landslide types triggered by various factors [123]; the Medium Resolution Global Sentinel Landslide Dataset (MRGSLD) [142]; the rainfall-induced Malawi dataset [139]; the earthquake-induced Global Distributed Coseismic Landslide Dataset (GDCLD) [143]; and other earthquake-induced landslide datasets, such as the Lushan and Jiuzhaigou datasets [81]. For example, Ji et al. [101] manually created an open landslide dataset with 770 positive samples. The training images consist of optical images, landslide shapefiles, and DEMs, which are widely used data sources in landslide studies [21,32,59]. Researchers initially identified landslide areas using historical inventory data. These identifications were then manually confirmed by geologists or through on-site field surveys. This validation process corrected positional biases and removed unidentifiable landslides from the images.
  • Tools for dataset updating to support LIM
However, gathering the landslide inventory manually is a time-consuming and labor-intensive task. Research has been conducted to assist in updating and creating landslide datasets. Nagendra et al. [122] focused on rainfall- and earthquake-triggered landslides, which are a significant cause of disaster due to their rapid movement. The authors collected high-resolution aerial and satellite images from Google Earth and selected the landslide candidates by comparing pre- and post-landslide images. The goal was to resolve the domain shift issue in multi-ecoregion landslide detection. Task-specific model updates (TSMU) were developed to facilitate inventory updates and creation without compromising the integrity of existing data or losing inventory information. The detection model was embedded with U-Net as the decoder and ResNet34 as the encoder, relying on shared parameters to keep the model updated when the ecoregion changed, thereby avoiding data forgetting.
  • Learning with fewer or less accurate labels in LIM
To reduce the demand for large labeled training datasets, several learning methods with limited supervision have been proposed. Wang et al. [124] proposed a partially supervised learning approach, the Weight Transfer Function, based on Mask R-CNN. Mask R-CNN extends Faster R-CNN by adding a mask branch to the bounding box detection model; the Weight Transfer Function treats samples that have both labeled bounding boxes and segmentation masks as strongly labeled, and samples with only a bounding box as weakly labeled. During training, the mask segmentation weights were predicted from the bounding box detection weights with the aid of a transfer function that generalized to classes lacking mask labels. This method helped resolve the issue of insufficient labels in landslide segmentation.
Weakly supervised learning is also commonly developed [125,126]. Wang et al. [126] proposed an Auto-Prompting Segment Anything Model (APSAM) with a Class Activation Map (CAM) for weakly supervised landslide segmentation. A coarse image-level pseudo-label was first identified to guide the SAM in segmenting irregular landslide boundaries; this label localizes a coarse region where landslides occur and serves as a box-level prompt for the SAM. Adaptive prompt algorithms then automatically generated point-level prompts from the regions of interest produced by the CAM. The point- and box-level hybrid prompts were separately encoded into the SAM to refine the instance segmentation results. This approach achieved more than 3% higher F1 score and mIoU values compared to using a single prompt and outperformed other models, such as the local–global anchor guidance network (LGAGNet).
Lv et al. [127] proposed a contrastive learning semi-supervised approach to segment landslides that is built on DeepLabV3+ with datasets composed of 1022 labeled landslide data points and 2776 unlabeled data points. They extended the mean–teacher framework, in which the student model learns from the teacher model to predict landslides at the pixel level. In this framework, the teacher model applied Virtual Adversarial Perturbation (VAP), a semi-supervised approach for identifying the most sensitive perturbation direction that decreases model stability and for generating confident pseudo-labels via an adaptive entropy threshold. The student network was trained using the pseudo-labels generated by the teacher model. Similarly, another contrastive learning method was used in self-supervised learning to solve the limited dataset issue.
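The confidence-filtered pseudo-labeling idea behind such teacher–student schemes can be illustrated with a minimal sketch. Only the entropy-thresholding step is shown; the adversarial perturbation (VAP) and the adaptive threshold of the cited work are omitted, and the threshold value here is arbitrary.

```python
import numpy as np

def confident_pseudo_labels(probs, max_entropy):
    """Keep a teacher prediction as a pseudo-label only when its predictive
    entropy is below a threshold; uncertain pixels stay unlabeled (-1).
    probs: (n, k) per-pixel class probabilities from the teacher model."""
    p = np.clip(probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)       # uncertainty per pixel
    labels = probs.argmax(axis=1)
    labels[entropy > max_entropy] = -1           # reject low-confidence pixels
    return labels

teacher_probs = np.array([
    [0.98, 0.02],    # confident background -> pseudo-label 0
    [0.55, 0.45],    # ambiguous -> rejected
    [0.05, 0.95],    # confident landslide -> pseudo-label 1
])
pseudo = confident_pseudo_labels(teacher_probs, max_entropy=0.3)
```

The student network would then be trained only on the pixels that received a pseudo-label, which keeps noisy teacher predictions out of the loss.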
Ghorbanzadeh et al. [128] proposed a contrastive self-supervised learning framework based on Swapping Assignments between Views (SwAV), which generates different views of the same image using random transformations, such as cropping. The SwAV loss function enforces consistency between the feature representations of these augmented views by comparing their prototype assignments. An online clustering method using the Sinkhorn–Knopp algorithm ensures that the soft assignments of feature vectors to prototypes are balanced and stable, facilitating effective clustering in the feature space.
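The balanced online clustering at the heart of SwAV can be sketched with the Sinkhorn–Knopp iteration itself. The score matrix below is a random stand-in for the dot products between image embeddings and learned prototypes; the point of the iteration is that no prototype ends up claiming all samples.

```python
import numpy as np

def sinkhorn(scores, n_iter=50):
    """Sinkhorn-Knopp normalization: turn a (batch, prototypes) score matrix
    into soft assignments whose columns are balanced across prototypes,
    preventing all samples from collapsing onto a single cluster."""
    q = np.exp(scores)
    for _ in range(n_iter):
        q /= q.sum(axis=0, keepdims=True)   # equalize prototype (column) mass
        q /= q.sum(axis=1, keepdims=True)   # each sample's assignment sums to 1
    return q

rng = np.random.default_rng(3)
scores = rng.normal(size=(8, 3))            # 8 embeddings scored against 3 prototypes
q = sinkhorn(scores)
```

After convergence, each row is a valid soft assignment and each prototype receives roughly batch/K of the total mass, which is the balance property SwAV relies on.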
Unsupervised conventional machine learning methods, such as clustering algorithms, are frequently used to construct training samples [85,129,130]. Lacking a training dataset, Garcia et al. [129] built theirs from two sources: a manually labeled dataset and the separate groups produced by k-means clustering. They also augmented samples to increase the number of landslide images. Zhang et al. [59] clustered segmented objects into landslides and non-landslides using fast fuzzy C-means (FCM), thus creating pseudo-labels for the segmented result.
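Cluster-based pseudo-labeling of this kind can be sketched with plain k-means, used here as a stand-in for fast FCM (which additionally produces soft memberships). The two synthetic blobs stand in for feature vectors of segmented objects.

```python
import numpy as np

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means: cluster feature vectors and return per-sample
    labels, which can then serve as pseudo-labels (e.g., landslide vs. background)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                 # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

# Two well-separated blobs standing in for segmented-object features.
rng = np.random.default_rng(4)
a = rng.normal(loc=0.0, size=(20, 2))
b = rng.normal(loc=8.0, size=(20, 2))
labels = kmeans(np.vstack([a, b]), k=2)
```

The cluster identities are arbitrary (landslide could be cluster 0 or 1), so in practice a small manually labeled subset is still needed to name the clusters before using them as training labels.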
Moreover, many studies have applied transfer learning in landslide detection [123,131,132,133,134,135]. Transfer learning not only addresses the concerns caused by insufficient training data but also significantly reduces training time in detection tasks and improves model generalization. Photography-based images play a crucial role in detecting landslides. Nonetheless, due to the lack of an available photogrammetric landslide inventory, Ullo et al. [24] manually labeled, resized, and augmented Unmanned Aerial Vehicle (UAV) photogrammetric images to increase the number of training samples. They leveraged Mask R-CNN, with ResNet50 and ResNet101 as backbone networks, for landslide segmentation. The Mask R-CNN was pre-trained on the Common Objects in Context (COCO) dataset and then fine-tuned on the landslide training samples. The model achieved its highest F1 score of 0.97 when utilizing ResNet101. Similarly, Qin et al. [104] proposed an attention mechanism for Distant Domain Transfer Learning (DDTL), pre-training a classification model, with an improved CBAM embedded, on various scene classification datasets, including the Bijie landslide dataset, land cover data, and other imagery. Asadi et al. [132] tested the cross-event transferability of a model consisting of a ResNet50 backbone and DeepLabv3+ without fine-tuning. Three independent landslide events were selected based on their regional similarity and the availability of 0.5 m aerial images. One of the datasets consisted of post-event imagery from the 2016 Mw 7.0 Kumamoto earthquake, which was used to train and test the model. The other two datasets included post-event images from a 2017 Asakura rainfall event and the 2018 Hokkaido earthquake, which were used to test the cross-event transferability.
To develop transferable deep learning models, unsupervised learning is a commonly used technique. For example, Zhang et al. [123] introduced Prototype-Guided Domain-aware Progressive Representation Learning (PG-DPRL), an unsupervised single-source-to-multitarget domain adaptation method for landslide segmentation based on panchromatic and multispectral images obtained from the SPOT 6 satellite. This method adapts one source domain with known labels to multiple unlabeled target domains using a progressive near-to-far representation alignment strategy. A representation divergence function was used to calculate the divergence between the source domain and the target domains, and the target domain with the least divergence was selected. Once selected, the target domain was aligned with the source domain through Prototype-Guided Adversarial Learning (PGAL). The Wasserstein distance was calculated in adversarial learning to measure the domain shift and served as an adversarial loss that misleads the discriminator into treating the target domain as the source domain. Pseudo-labels were assigned to the active target samples, as these were closer to the source domain and had lower clustering uncertainties. The pseudo-labels were then used to guide the adaptation, building a prototypical constraint and enhancing representation consistency through loss updates so that the target domain data could be merged into the source domain. This method improved the model’s transferability and outperformed other cross-domain landslide segmentation models, such as the category-certainty attention method and the multitarget collaborative consistency learning method, in terms of precision, recall, and F1 score.
Similar to Zhang et al. [123], Li et al. [133] developed a cross-scene domain adaptation method for landslide segmentation based on UNet3+ at two levels: the feature level and the dataset level. At the feature level, adversarial learning was applied to minimize the domain distance. At the dataset level, they generated new landslide samples using a Transformer-based network called StyTr2 to mitigate domain shift. The new landslide samples not only preserved detailed content from the source domain but also incorporated the rich style from the target domain using two decoders: one to generate a feature sequence for both content and style, and another to apply the style to the content. These steps helped the model better align its features and subsequently improved generalizability when it was trained on the source domain and then transferred to the target domain.
Wang and Brenning [135] proposed an Unsupervised Active-Transfer Learning (UATL) method for landslide classification based on high-resolution PlanetScope (PS) optical satellite imagery and DEM data. This approach combined active learning (AL) and Transfer Learning (TL) by adjusting the weights between the two models. The experiment adopted a pixel-based approach, classifying landslide and non-landslide points on images. In the AL process, 85 initial points were selected and labeled, and further batches of 25 points were iteratively selected for classification. In the TL process, they adopted Case-Based Reasoning (CBR), one of the TL strategies, which searched for regions similar to the source and target areas for training a model using geological and topographic factors. As the training set grew and the importance of AL came to exceed that of TL, the weight was adaptively (adaptive UATL) or statically (regular UATL) optimized to adjust the contributions of the two methods, using a k-means clustering algorithm and stratification between landslide and non-landslide samples. The final results resembled consecutive landslide classification maps, as an increasing number of points were classified in the target domain.
Aside from transfer learning and other forms of limited supervision, some studies have explored frameworks that utilize GANs to augment or expand landslide datasets. For example, Feng et al. [136] generated 770 synthetic landslide images using a StyleGAN2 that was trained on 770 real landslide images. The Fréchet inception distance (FID) was used to measure the diversity and similarity between synthetic images and real images. They incorporated synthetic images and authentic images together to train a series of CNNs and a series of Transformers. The results revealed that the enriched dataset empowered the Swin Transformer more than other neural networks. The mIoU value reached 0.8993, which was 4% higher than that obtained using authentic images alone. It also achieved an overall accuracy of 0.9866, which was approximately 10% higher than that obtained by solely using authentic images.
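The FID used by Feng et al. compares two sets of feature embeddings under a Gaussian assumption. A minimal NumPy version of the formula, with small random vectors in place of the Inception-network features used in practice, looks like this:

```python
import numpy as np

def fid(feats_a, feats_b):
    """Fréchet inception distance between two sets of feature vectors, each
    modeled as a Gaussian: ||mu1-mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)).
    In practice the features come from an Inception network; any embedding
    works for illustration."""
    mu1, mu2 = feats_a.mean(axis=0), feats_b.mean(axis=0)
    c1 = np.cov(feats_a, rowvar=False)
    c2 = np.cov(feats_b, rowvar=False)
    # Tr((C1 C2)^(1/2)) equals the sum of square roots of eigvals of C1 @ C2.
    eig = np.linalg.eigvals(c1 @ c2)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0, None)).sum()
    return float(((mu1 - mu2) ** 2).sum()
                 + np.trace(c1) + np.trace(c2) - 2 * tr_sqrt)

rng = np.random.default_rng(5)
real = rng.normal(size=(500, 4))
similar = rng.normal(size=(500, 4))           # same distribution -> small FID
shifted = rng.normal(loc=3.0, size=(500, 4))  # shifted distribution -> large FID
```

A low FID indicates that the synthetic images occupy roughly the same region of feature space as the real ones, which is why it is used to vet GAN-generated training samples before mixing them into a dataset.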

4.4. Detecting Different Types of Landslides

Landslides can be categorized into several types based on the material involved and the mechanism of movement, including falls, topples, slides, lateral spreads, flows, and slow slope deformations, along with complex landslides that involve a combination of two or more types [144,145]. The triggering factors behind landslide events are diverse, including earthquakes [146], intense or prolonged rainfall [63], wildfires [147], snowmelt, permafrost thaw [148], and anthropogenic activities [149]. Among these, earthquake-triggered and rainfall-triggered landslides are the most widely studied using deep learning. These two types of landslides differ in their spatial characteristics and dynamics. Rainfall-triggered landslides are typically more localized, whereas earthquake-triggered landslides may cover areas ranging from a few square meters to hundreds of thousands of square meters [150,151]. Guzzetti et al. [151] further reported that earthquake-triggered landslides often exhibit higher velocities and occur on steeper slopes compared to rainfall-triggered events. Table 4 lists the categorization of deep learning techniques for detecting different types of landslides.
  • Identifying earthquake-triggered landslides
In the context of earthquake-triggered landslides, the limited availability of labeled samples has prompted the use of transfer learning and multi-scale feature fusion techniques. For example, Meng et al. [118] employed the TLSMF-YOLO model using a C3-Swin Transformer pre-trained on the Visual Object Classification (VOC) dataset. The model was fine-tuned and evaluated on two earthquake-triggered landslide datasets—the Luding dataset and the Jiuzhaigou dataset—to improve detection. Recognizing the challenges in identifying small-scale landslides in mountainous terrain with complex topographic features, Xiang et al. [119] proposed a dual-feature pyramid-based U-Net (DFPU-Net) model to segment the small-scale earthquake-induced landslides using an atrous spatial pyramid pooling (ASPP) module and a pyramid pooling module (PPM) to extract multi-scale contextual information and background information. This model was trained on an augmented dataset comprising approximately 4000 manually delineated landslide images and specifically designed to detect earthquake-triggered landslides in rugged mountainous environments. Several other works have also been conducted on mapping earthquake-triggered landslides [23,28,70,72,74,78,81,83,88,152,153].
  • Identifying rainfall-triggered landslides
Similarly, rainfall-triggered shallow landslides present challenges due to the scarcity of large-scale annotated datasets, complex topography (e.g., quarries, terraces, slopes, and riverbeds), diverse vegetation cover, seasonal variability, and heterogeneity in landslide size. To address these complexities, recent studies have proposed customized deep learning models trained on purpose-built datasets. Wang et al. [154] incorporated a Context-Guided (CG) block, which integrates local, surrounding, and global contextual features, through a C2f two-branch feature fusion module into a lightweight attention-guided YOLO (LA-YOLO) model to capture multi-scale information from heterogeneous landscapes. Xu et al. [26] developed a Feature-Based Constraint Deep U-Net (FCDU-Net) architecture that combines DenseNet and U-Net, integrating both handcrafted and data-driven features (e.g., NDVI and the gray-level co-occurrence matrix) to assess their relative importance for landslide prediction. Other works on rainfall-triggered landslide datasets have also been explored [106,139,155,156].
  • Identifying old landslides
In addition to earthquake-triggered and rainfall-triggered landslides, several studies have focused on detecting old landslides [53,102]. These older landslides, originally triggered by various factors, have been altered over time by residential or agricultural activities, resulting in modified characteristics such as increased vegetation or human-induced changes. They may be reactivated by multiple factors, potentially leading to greater casualties and economic losses [157]. Ju et al. [53] developed a deep learning model to detect old loess landslides in the Loess Plateau of China. Three study sites with differing characteristics were selected. In the first and second sites, landslide boundaries remained visible but had been degraded by human activity. In the third site, landslides were caused by gravitational erosion, and their boundaries were better preserved. The landslide areas ranged from thousands to hundreds of thousands of square meters. The training datasets for the three sites included 1875 samples (first site), 870 samples (second site), and 1118 samples (third site).
To compare model performance, the study evaluated Mask R-CNN (with ResNet101 or FPN backbone), YOLOv3 (with DarkNet-53 backbone), and RetinaNet (with ResNet101 and FPN backbone). Among these, Mask R-CNN consistently outperformed the others, achieving the highest F1 score at the first site, followed by the second, with the lowest performance at the third site. This reduced performance was attributed to the unstable morphology caused by soil erosion, which made it more difficult to distinguish loess landslides from non-landslide slope erosion. Based on these findings, the authors suggested developing multi-class labels for non-landslide areas and different types of landslides (e.g., new or old) to improve classification accuracy.
While progress has been made in this area, it is worth noting that from a geoscience standpoint, it can sometimes be difficult to distinguish landslide triggers based solely on the characteristics of individual landslides.

5. Discussion of the Limitations of Current Research and Future Opportunities

Despite the growing body of literature on applying deep learning to landslide mapping, current solutions still exhibit several limitations, which also present opportunities for future development.
  • Estimating more diverse variables in landslide mapping
The complexity of landslides poses significant challenges for accurately mapping them. Landslides come in various forms, and it can be challenging for deep learning models to distinguish them from background features. For example, rock landslides may resemble bare rock masses, and rainfall-triggered landslides can be mistaken for turbid water. Furthermore, ancient landslides or those covered by vegetation are difficult to detect. As we reviewed and discussed, current deep learning methods for LIM are predominantly concentrated in the blue-shaded area shown in Figure 11, where landslides are small to medium in volume but exhibit rapid displacement rates. This could include both rainfall-triggered landslides and some earthquake-triggered landslides. In contrast, the green-shaded area—representing large-scale, slow-moving geomorphological landslides, especially ancient landslides—receives significantly less attention, despite its importance for understanding long-term landscape evolution and hazard assessment.
In addition to detecting different landslide types, AI techniques could also be explored for detecting precursory signals, such as increased rockfall activity, in data collected before catastrophic failures, as well as for quantifying creep rates in slow-moving landslides. Identifying features such as antiscarps, transported blocks, landslide-dammed lakes, molards, and en-echelon lateral shear bands would also be valuable for classifying landslides into various classification schemes. These key landslide characteristics could further help verify whether a deep learning model is capable of learning essential aspects of landslide characterization to make accurate predictions.
  • Regional variability and model replicability
Another major challenge is landscape variability, which limits the replicability and generalizability of deep learning models. Seasonal weather conditions, such as snow, rain, and fluctuating vegetation, introduce inconsistencies that can affect the accuracy of landslide mapping. In addition, models trained in one geographic region often fail to generalize well to other areas, limiting their broader applicability [158,159]. A promising solution to these challenges is the development of multimodal approaches and the domain adaptation of geospatial foundation models, such as NASA-IBM’s Prithvi model [160,161]. Such models are trained on multispectral and multi-temporal remote sensing images covering large geographical regions, enabling them to achieve high generalization ability and support high accuracy in landslide detection. Temporal–spatial data modeling also presents an opportunity to better capture the seasonal and environmental changes that impact landslide detection [162].
  • Multimodal data alignment challenges
While multimodal data integration enriches landslide mapping by combining information from various sensors, temporal misalignment remains a key challenge. Different sensors, such as optical imagery, InSAR, and PolSAR, often capture data at varying times relative to a landslide event, complicating alignment efforts. To address this issue, advanced data harmonization techniques could be employed, such as temporal interpolation or machine learning models that predict missing or temporally misaligned data points. Utilizing time-series analysis with multispectral and SAR imagery can help bridge these temporal gaps by capturing trends and changes over longer periods. Incorporating prior knowledge, such as pre-event stability indices from DEM and slope stability models, could serve as a proxy for missing data, adding weight to model predictions. Moreover, the development of multimodal fusion frameworks that prioritize features robust to temporal shifts—such as radar backscatter values that are less affected by vegetation—can enhance alignment efforts. Future work could also explore dynamic data acquisition protocols, leveraging advancements in real-time remote sensing to capture critical data closer to the time of landslide events.
  • High computational demand and resource consumption
The high computational demands of deep learning models, especially Transformer-based architectures, further constrain their applicability in landslide mapping. These models require large amounts of memory and processing power, which can make training time-consuming and resource-intensive. To mitigate this issue, future research could focus on developing lightweight Transformer models that reduce computational overhead while maintaining performance. Optimizing model architectures, such as by replacing deep layers with more efficient alternatives or streamlining feature maps, could reduce training time and improve scalability. Task-specific adaptation methods such as Low-Rank Adaptation (LoRA) train only a small, focused subset of parameters, achieving better performance and higher efficiency than traditional full fine-tuning of Transformers [163]. These strategies can make Transformer models more accessible, enabling broader application in real-time and large-scale landslide detection and monitoring.
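The parameter savings behind Low-Rank Adaptation come from freezing the pretrained weight and training only a low-rank update to it. A minimal sketch follows; the layer dimensions and rank are chosen arbitrarily for illustration.

```python
import numpy as np

def lora_forward(x, w, a, b, alpha=1.0):
    """LoRA-style forward pass: the frozen weight w is augmented by a
    trainable low-rank update b @ a, so only the small factors are optimized."""
    return x @ (w + alpha * (b @ a)).T

d_out, d_in, rank = 256, 256, 4
rng = np.random.default_rng(6)
w = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
a = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
b = np.zeros((d_out, rank))                 # trainable up-projection (zero init)

full_params = w.size                        # what full fine-tuning would train
lora_params = a.size + b.size               # what LoRA actually trains

x_in = rng.normal(size=(1, d_in))
y = lora_forward(x_in, w, a, b)             # equals x_in @ w.T at initialization
```

Here LoRA trains 2048 parameters instead of 65,536 (about 3%), and the zero-initialized `b` guarantees the adapted model starts out identical to the pretrained one.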
  • Data misinterpretation and model explainability
Data misinterpretation is a persistent challenge in landslide mapping, as surface changes from activities such as agriculture, irrigation, or construction often mimic landslide signals, leading to false positives. Conversely, subtle ground deformations or gradual soil compaction can obscure smaller-scale landslides, resulting in false negatives. Spectral similarities between bare soil, water-logged areas, and landslides in optical imagery further complicate accurate classification, whereas noise and artifacts in InSAR data add to the complexity. To address these issues, hybrid approaches that combine automated regional predictions with local field verification can enhance accuracy. Pre-processing techniques, such as filtering noise in InSAR data and detecting outliers, improve data reliability while integrating multimodal data sources like radar backscatter and optical indices, providing a more comprehensive perspective [30]. Leveraging domain knowledge, such as slope stability models and hydrological data, can also refine predictions.
  • Explainable AI for LIM
As deep learning techniques become increasingly central to LIM, concerns around model interpretability have grown. Explainable AI (XAI) offers a pathway to addressing these concerns by providing tools to analyze and visualize how input variables influence model outputs. In the broader landslide research community, particularly in landslide susceptibility mapping (LSM), XAI techniques such as SHapley Additive exPlanations (SHAP) have been widely adopted to uncover the contribution and influence of various environmental and geological factors on model predictions [164,165,166,167,168,169]. These approaches have helped improve model transparency and informed variable selection in susceptibility analyses. In contrast, the use of XAI in LIM remains limited. Unlike LSM, which focuses on predicting the likelihood of future landslides, LIM aims to detect and delineate existing landslide occurrences, often using satellite or aerial imagery and terrain data. New XAI techniques—such as Class Activation Mapping (CAM) [170], Gradient-Weighted CAM (Grad-CAM) [171], Integrated Gradients (IG) [172], and Layer-wise Relevance Propagation (LRP) [173]—have been used to support model interpretation at the decision level in landform mapping [174] and can be readily applied to LIM. Integrating XAI into LIM has the potential to reveal how deep learning models interpret spatial patterns, highlight key features influencing detection, and uncover systematic errors in inventory mapping. This integration can also improve model reliability, support validation efforts, and promote greater transparency in automated mapping workflows.
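The core computation behind Grad-CAM is compact enough to sketch directly: each channel of the final convolutional feature map is weighted by the spatially averaged gradient of the class score with respect to that channel, and the weighted sum is rectified to keep only positively contributing regions. The 2x2 feature maps and gradients below are toy values, not outputs of a real landslide model.

```python
# Hedged sketch of Gradient-Weighted Class Activation Mapping (Grad-CAM).
# For class score y and feature maps A_k, the channel weight is
# alpha_k = mean(dy/dA_k), and the map is ReLU(sum_k alpha_k * A_k).

def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: lists of K channels, each an HxW grid."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for A_k, g_k in zip(feature_maps, gradients):
        # Channel weight: global average pooling of the gradient.
        alpha = sum(sum(row) for row in g_k) / (h * w)
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * A_k[i][j]
    # ReLU keeps only regions with a positive influence on the class.
    return [[max(0.0, v) for v in row] for row in cam]

A = [[[1.0, 0.0], [0.0, 2.0]],     # channel 0 activations
     [[0.0, 3.0], [1.0, 0.0]]]     # channel 1 activations
dYdA = [[[1.0, 1.0], [1.0, 1.0]],      # channel 0 supports the class
        [[-1.0, -1.0], [-1.0, -1.0]]]  # channel 1 opposes it
print(grad_cam(A, dYdA))
```

In a LIM setting, upsampling such a map to the input resolution and overlaying it on the imagery would show which terrain and spectral patterns drove a landslide detection, which is exactly the decision-level insight discussed above.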

6. Conclusions

This paper provides a systematic review of the application and adaptation of AI, particularly deep learning techniques, for landslide mapping and segmentation. This rapidly expanding field has demonstrated promise in addressing long-standing challenges such as data processing, feature extraction, and model scalability, enabling more accurate and automated landslide mapping. By leveraging advanced methodologies such as attention mechanisms, multimodal data fusion, and transfer learning, researchers have achieved notable improvements in prediction accuracy and model efficiency. However, important gaps persist, including model generalizability across diverse regions, effective categorization of different landslide types, and the handling of multimodal and temporally misaligned datasets.
These advancements, while critical for enhancing scientific understanding, also play an essential role in improving disaster response and landslide relief efforts. The accurate and timely mapping of landslides is essential for the rapid assessment and allocation of resources during emergencies, potentially reducing casualties and economic losses. Recent efforts, such as leveraging near-real-time remote sensing data and deploying lightweight, interpretable models for field use, have made strides in facilitating actionable insights during disaster scenarios. However, there is still work to be done to ensure these technologies are reliable and accessible under operational constraints, such as limited computational resources and dynamic environmental conditions.
Looking ahead, future research could prioritize developing models that are robust to regional and environmental variability, interpretable to field responders, and adaptable to real-time scenarios. Expanding datasets to include diverse terrains and event types, along with fostering collaborations among AI researchers, geoscientists, and disaster response teams, will be essential. Integrating physics-based approaches and geospatial foundation models (e.g., NASA-IBM Prithvi [161]) may further enhance the reliability of predictions, ensuring that models can operate effectively in varied contexts. Moving beyond mapping, it is equally important to develop forecasting capabilities that can predict future landslide occurrences, supporting early warning systems and saving lives.
Although challenges remain, the progress achieved thus far highlights the transformative potential of AI in landslide research and disaster response. With sustained interdisciplinary efforts, these technologies can contribute meaningfully to reducing the impact of landslides, improving disaster relief coordination, and fostering community resilience.

Author Contributions

Conceptualization, W.L.; methodology, X.C. and W.L.; validation, W.L., C.-Y.H., S.T.A. and B.H.; formal analysis, X.C. and W.L.; investigation, X.C., W.L., C.-Y.H., S.T.A. and B.H.; resources, W.L.; data curation, X.C.; writing—original draft preparation, X.C. and W.L.; writing—review and editing, X.C., W.L., C.-Y.H., S.T.A. and B.H.; visualization, X.C., B.H. and W.L.; supervision, W.L.; project administration, W.L.; funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by NASA under grant number 80NSSC24PC446, the National Science Foundation under award 2120943, and the Arizona Board of Regents.

Data Availability Statement

No new landslide datasets were created in this review. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Crosta, G.B.; Agliardi, F. A Methodology for Physically Based Rockfall Hazard Assessment. Nat. Hazards Earth Syst. Sci. 2003, 3, 407–422. [Google Scholar] [CrossRef]
  2. Karstens, J.; Haflidason, H.; Berndt, C.; Crutchley, G.J. Revised Storegga Slide Reconstruction Reveals Two Major Submarine Landslides 12,000 Years Apart. Commun. Earth Environ. 2023, 4, 55. [Google Scholar] [CrossRef]
  3. Higman, B.; Shugar, D.H.; Stark, C.P.; Ekström, G.; Koppes, M.N.; Lynett, P.; Dufresne, A.; Haeussler, P.J.; Geertsema, M.; Gulick, S.; et al. The 2015 Landslide and Tsunami in Taan Fiord, Alaska. Sci. Rep. 2018, 8, 12993. [Google Scholar] [CrossRef]
  4. Measurement of Ridge-Spreading Movements (Sackungen) at Bald Eagle Mountain, Lake County, Colorado, II: Continuation of the 1975–1989 Measurements Using a Global Positioning System in 1997 and 1999. Available online: https://pubs.usgs.gov/publication/ofr00205 (accessed on 4 March 2025).
  5. McColl, S.T. Landslide Causes and Triggers. In Landslide Hazards, Risks, and Disasters; Elsevier: Amsterdam, The Netherlands, 2015; pp. 17–42. ISBN 978-0-12-396452-6. [Google Scholar]
  6. Landslides Kill and Hurt Thousands, but Science Largely Ignores These Disasters. Available online: https://www.scientificamerican.com/article/landslides-kill-and-hurt-thousands-but-science-largely-ignores-these-disasters/ (accessed on 4 March 2025).
  7. Larsen, M.C.; Wieczorek, G.F.; Eaton, L.S.; Torres-Sierra, H. The rainfall-triggered landslide and flash-flood disaster. In Proceedings of the Seventh Federal Interagency Sedimentation Conference, Reno, NV, USA, 25–29 March 2001. [Google Scholar]
  8. Hansen, A. Engineering geomorphology: The application of an evolutionary model of Hong Kong’s terrain. Zeitschrift für Geomorphologie. Supplementband 1984, 51, 39–50. [Google Scholar]
  9. Galli, M.; Ardizzone, F.; Cardinali, M.; Guzzetti, F.; Reichenbach, P. Comparing Landslide Inventory Maps. Geomorphology 2008, 94, 268–289. [Google Scholar] [CrossRef]
  10. Ma, Z.; Mei, G.; Piccialli, F. Machine Learning for Landslides Prevention: A Survey. Neural Comput. Applic. 2021, 33, 10881–10907. [Google Scholar] [CrossRef]
  11. Li, M.; Wang, H.; Chen, J.; Zheng, K. Assessing Landslide Susceptibility Based on the Random Forest Model and Multi-Source Heterogeneous Data. Ecol. Indic. 2024, 158, 111600. [Google Scholar] [CrossRef]
  12. Agboola, G.; Beni, L.H.; Elbayoumi, T.; Thompson, G. Optimizing Landslide Susceptibility Mapping Using Machine Learning and Geospatial Techniques. Ecol. Inform. 2024, 81, 102583. [Google Scholar] [CrossRef]
  13. Youssef, K.; Shao, K.; Moon, S.; Bouchard, L.S. Landslide susceptibility modeling by interpretable neural network. Commun. Earth Environ. 2023, 4, 162. [Google Scholar] [CrossRef]
  14. Moosavi, V.; Talebi, A.; Shirmohammadi, B. Producing a Landslide Inventory Map Using Pixel-Based and Object-Oriented Approaches Optimized by Taguchi Method. Geomorphology 2014, 204, 646–656. [Google Scholar] [CrossRef]
  15. Li, X.; Cheng, X.; Chen, W.; Chen, G.; Liu, S. Identification of Forested Landslides Using LiDar Data, Object-Based Image Analysis, and Machine Learning Algorithms. Remote Sens. 2015, 7, 9705–9726. [Google Scholar] [CrossRef]
  16. Stumpf, A.; Kerle, N. Object-Oriented Mapping of Landslides Using Random Forests. Remote Sens. Environ. 2011, 115, 2564–2577. [Google Scholar] [CrossRef]
  17. Li, W. GeoAI: Where Machine Learning and Big Data Converge in GIScience. J. Spat. Inf. Sci. 2020, 20, 71–77. [Google Scholar] [CrossRef]
  18. Li, W.; Hsu, C.-Y. GeoAI for Large-Scale Image Analysis and Machine Vision: Recent Progress of Artificial Intelligence in Geography. ISPRS Int. J. Geo-Inf. 2022, 11, 385. [Google Scholar] [CrossRef]
  19. Thirugnanam, H. Deep Learning in Landslide Studies: A Review. In Progress in Landslide Research and Technology; Springer International Publishing: Cham, Switzerland, 2022; Volume 1, pp. 247–255. ISBN 978-3-031-18470-3. [Google Scholar]
  20. Tehrani, F.S.; Calvello, M.; Liu, Z.; Zhang, L.; Lacasse, S. Machine Learning and Landslide Studies: Recent Advances and Applications. Nat. Hazards 2022, 114, 1197–1245. [Google Scholar] [CrossRef]
  21. Mohan, A.; Singh, A.K.; Kumar, B.; Dwivedi, R. Review on Remote Sensing Methods for Landslide Detection Using Machine and Deep Learning. Trans. Emerg. Telecommun. Technol. 2021, 32, e3998. [Google Scholar] [CrossRef]
  22. Schönfeldt, E.; Winocur, D.; Pánek, T.; Korup, O. Deep Learning Reveals One of Earth’s Largest Landslide Terrain in Patagonia. Earth Planet. Sci. Lett. 2022, 593, 117642. [Google Scholar] [CrossRef]
  23. Tang, X.; Tu, Z.; Wang, Y.; Liu, M.; Li, D.; Fan, X. Automatic Detection of Coseismic Landslides Using a New Transformer Method. Remote Sens. 2022, 14, 2884. [Google Scholar] [CrossRef]
  24. Ullo, S.; Mohan, A.; Sebastianelli, A.; Ahamed, S.; Kumar, B.; Dwivedi, R.; Sinha, G.R. A New Mask R-CNN-Based Method for Improved Landslide Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3799–3810. [Google Scholar] [CrossRef]
  25. Wang, Y.; Wang, X.; Jian, J. Remote Sensing Landslide Recognition Based on Convolutional Neural Network. Math. Probl. Eng. 2019, 2019, 1–12. [Google Scholar] [CrossRef]
  26. Xu, G.; Wang, Y.; Wang, L.; Soares, L.P.; Grohmann, C.H. Feature-Based Constraint Deep CNN Method for Mapping Rainfall-Induced Landslides in Remote Regions with Mountainous Terrain: An Application to Brazil. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2644–2659. [Google Scholar] [CrossRef]
  27. Sreelakshmi, S.; Vinod Chandra, S.S. Visual Saliency-Based Landslide Identification Using Super-Resolution Remote Sensing Data. Results Eng. 2024, 21, 101656. [Google Scholar] [CrossRef]
  28. Nava, L.; Bhuyan, K.; Meena, S.R.; Monserrat, O.; Catani, F. Rapid Mapping of Landslides on SAR Data by Attention U-Net. Remote Sens. 2022, 14, 1449. [Google Scholar] [CrossRef]
  29. Kamiyama, J.; Noro, T.; Sakagami, M.; Suzuki, Y.; Yoshikawa, K.; Hikosaka, S.; Hirata, I. Detection of landslide candidate interference fringes in DInSAR imagery using deep learning. Recall 2018, 90, 94–95. [Google Scholar]
  30. Liu, X.; Zhao, C.; Yin, Y.; Tomás, R.; Zhang, J.; Zhang, Q.; Wei, Y.; Wang, M.; Lopez-Sanchez, J.M. Refined InSAR Method for Mapping and Classification of Active Landslides in a High Mountain Region: Deqin County, Southern Tibet Plateau, China. Remote Sens. Environ. 2024, 304, 114030. [Google Scholar] [CrossRef]
  31. Liu, Y.; Yao, X.; Gu, Z.; Li, R.; Zhou, Z.; Liu, X.; Jiang, S.; Yao, C.; Wei, S. Research on Automatic Recognition of Active Landslides Using InSAR Deformation under Digital Morphology: A Case Study of the Baihetan Reservoir, China. Remote Sens. Environ. 2024, 304, 114029. [Google Scholar] [CrossRef]
  32. Niu, C.; Ma, K.; Shen, X.; Wang, X.; Xie, X.; Tan, L.; Xue, Y. Attention-Enhanced Region Proposal Networks for Multi-Scale Landslide and Mudslide Detection from Optical Remote Sensing Images. Land 2023, 12, 313. [Google Scholar] [CrossRef]
  33. Xiong, Z.; Zhang, M.; Ma, J.; Xing, G.; Feng, G.; An, Q. InSAR-Based Landslide Detection Method with the Assistance of C-Index. Landslides 2023, 20, 2709–2723. [Google Scholar] [CrossRef]
  34. Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Meena, S.; Tiede, D.; Aryal, J. Evaluation of Different Machine Learning Methods and Deep-Learning Convolutional Neural Networks for Landslide Detection. Remote Sens. 2019, 11, 196. [Google Scholar] [CrossRef]
  35. Su, Z.; Chow, J.K.; Tan, P.S.; Wu, J.; Ho, Y.K.; Wang, Y.-H. Deep Convolutional Neural Network–Based Pixel-Wise Landslide Inventory Mapping. Landslides 2021, 18, 1421–1443. [Google Scholar] [CrossRef]
  36. Yang, R.; Zhang, F.; Xia, J.; Wu, C. Landslide Extraction Using Mask R-CNN with Background-Enhancement Method. Remote Sens. 2022, 14, 2206. [Google Scholar] [CrossRef]
  37. Mezaal, M.R.; Pradhan, B.; Sameen, M.I.; Mohd Shafri, H.Z.; Yusoff, Z.M. Optimized Neural Architecture for Automatic Landslide Detection from High-Resolution Airborne Laser Scanning Data. Appl. Sci. 2017, 7, 730. [Google Scholar] [CrossRef]
  38. Ghorbanzadeh, O.; Xu, Y.; Zhao, H.; Wang, J.; Zhong, Y.; Zhao, D.; Zang, Q.; Wang, S.; Zhang, F.; Shi, Y.; et al. The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multisource Satellite Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 9927–9942. [Google Scholar] [CrossRef]
  39. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  41. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  42. Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  43. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  44. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.; Woo, W. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. arXiv 2015, arXiv:1506.04214. [Google Scholar]
  45. Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  46. Li, W.; Hsu, C.-Y.; Wang, S.; Yang, Y.; Lee, H.; Liljedahl, A.; Witharana, C.; Yang, Y.; Rogers, B.M.; Arundel, S.T.; et al. Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model’s Generalizability in Permafrost Mapping. Remote Sens. 2024, 16, 797. [Google Scholar] [CrossRef]
  47. Ding, A.; Zhang, Q.; Zhou, X.; Dai, B. Automatic Recognition of Landslide Based on CNN and Texture Change Detection. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 444–448. [Google Scholar]
  48. Yu, H.; Ma, Y.; Wang, L.; Zhai, Y.; Wang, X. A Landslide Intelligent Detection Method Based on CNN and RSG_R. In Proceedings of the 2017 IEEE International Conference on Mechatronics and Automation (ICMA), Takamatsu, Japan, 6–9 August 2017; pp. 40–44. [Google Scholar]
  49. Meena, S.R.; Soares, L.P.; Grohmann, C.H.; Van Westen, C.; Bhuyan, K.; Singh, R.P.; Floris, M.; Catani, F. Landslide Detection in the Himalayas Using Machine Learning Algorithms and U-Net. Landslides 2022, 19, 1209–1229. [Google Scholar] [CrossRef]
  50. Lei, T.; Zhang, Q.; Xue, D.; Chen, T.; Meng, H.; Nandi, A.K. End-to-End Change Detection Using a Symmetric Fully Convolutional Network for Landslide Mapping. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 3027–3031. [Google Scholar]
  51. Liu, T.; Chen, T.; Niu, R.; Plaza, A. Landslide Detection Mapping Employing CNN, ResNet, and DenseNet in the Three Gorges Reservoir, China. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 11417–11428. [Google Scholar] [CrossRef]
  52. Janarthanan, S.S.; Subbian, D.; Subbarayan, S.; Zhang, H.; Ko, S.B. SFCNet: Deep Learning-Based Lightweight Separable Factorized Convolution Network for Landslide Detection. J. Indian Soc. Remote Sens. 2023, 51, 1157–1170. [Google Scholar] [CrossRef]
  53. Ju, Y.; Xu, Q.; Jin, S.; Li, W.; Su, Y.; Dong, X.; Guo, Q. Loess Landslide Detection Using Object Detection Algorithms in Northwest China. Remote Sens. 2022, 14, 1182. [Google Scholar] [CrossRef]
  54. Liu, P.; Wei, Y.; Wang, Q.; Xie, J.; Chen, Y.; Li, Z.; Zhou, H. A Research on Landslides Automatic Extraction Model Based on the Improved Mask R-CNN. ISPRS Int. J. Geo-Inf. 2021, 10, 168. [Google Scholar] [CrossRef]
  55. Dang, K.B.; Nguyen, C.Q.; Tran, Q.C.; Nguyen, H.; Nguyen, T.T.; Nguyen, D.A.; Tran, T.H.; Bui, P.T.; Giang, T.L.; Nguyen, D.A.; et al. Comparison between U-Shaped Structural Deep Learning Models to Detect Landslide Traces. Sci. Total Environ. 2024, 912, 169113. [Google Scholar] [CrossRef]
  56. Lin, M.; Teng, S.; Chen, G.; Lv, J.; Hao, Z. Optimal CNN-Based Semantic Segmentation Model of Cutting Slope Images. Front. Struct. Civ. Eng. 2022, 16, 414–433. [Google Scholar] [CrossRef]
  57. Yang, S.; Wang, Y.; Wang, P.; Mu, J.; Jiao, S.; Zhao, X.; Wang, Z.; Wang, K.; Zhu, Y. Automatic Identification of Landslides Based on Deep Learning. Appl. Sci. 2022, 12, 8153. [Google Scholar] [CrossRef]
  58. Qin, H.; Wang, J.; Mao, X.; Zhao, Z.; Gao, X.; Lu, W. An Improved Faster R-CNN Method for Landslide Detection in Remote Sensing Images. J. Geovis. Spat. Anal. 2024, 8, 2. [Google Scholar] [CrossRef]
  59. Ganerød, A.J.; Lindsay, E.; Fredin, O.; Myrvoll, T.-A.; Nordal, S.; Rød, J.K. Globally vs. Locally Trained Machine Learning Models for Landslide Detection: A Case Study of a Glacial Landscape. Remote Sens. 2023, 15, 895. [Google Scholar] [CrossRef]
  60. Lu, Z.; Peng, Y.; Li, W.; Yu, J.; Ge, D.; Han, L.; Xiang, W. An Iterative Classification and Semantic Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
  61. Li, W.; Fu, Y.; Fan, S.; Xin, M.; Bai, H. DCI-PGCN: Dual-Channel Interaction Portable Graph Convolutional Network for Landslide Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–16. [Google Scholar] [CrossRef]
  62. Fan, S.; Fu, Y.; Li, W.; Bai, H.; Jiang, Y. ETGC2-Net: An Enhanced Transformer and Graph Convolution Combined Network for Landslide Detection. Nat. Hazards 2025, 121, 135–160. [Google Scholar] [CrossRef]
  63. Rajasekar, P.; Devendraiah, K.M.; Tamizhselvi, A.; Yadav, D.; Priyadharshini, A.; Suganthi, D. Rainfall-Induced Landslide Prediction Using Graph-Based Convolutional Network Models. In Proceedings of the 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkuru, India, 4 December 2024; pp. 1–6. [Google Scholar]
  64. Tomar, N.K.; Shergill, A.; Rieders, B.; Bagci, U.; Jha, D. TransResU-Net: Transformer Based ResU-Net for Real-Time Colonoscopy Polyp Segmentation. arXiv 2022, arXiv:2206.08985. [Google Scholar]
  65. Liu, L.; Chen, Q.; Su, J.; Du, X.G.; Lei, T.; Wan, Y. MGU-Net: A Multiscale Gate Attention Encoder-Decoder Network for Medical Image Segmentation. Int. J. Comput. Appl. Technol. 2023, 71, 275–285. [Google Scholar] [CrossRef]
  66. Fang, B.; Chen, G.; Pan, L.; Kou, R.; Wang, L. GAN-Based Siamese Framework for Landslide Inventory Mapping Using Bi-Temporal Optical Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2021, 18, 391–395. [Google Scholar] [CrossRef]
  67. Zhang, X.; Pun, M.-O.; Liu, M. Semi-Supervised Multi-Temporal Deep Representation Fusion Network for Landslide Mapping from Aerial Orthophotos. Remote Sens. 2021, 13, 548. [Google Scholar] [CrossRef]
  68. Lu, W.; Zhao, Z.; Mao, X.; Cheng, Y. An Accurate Recognition Method for Landslides Based on a Semi-Supervised Generative Adversarial Network: A Case Study in Lanzhou City. Appl. Sci. 2024, 14, 5084. [Google Scholar] [CrossRef]
  69. Zhou, Y.; Wang, H.; Yang, R.; Yao, G.; Xu, Q.; Zhang, X. A Novel Weakly Supervised Remote Sensing Landslide Semantic Segmentation Method: Combining CAM and cycleGAN Algorithms. Remote Sens. 2022, 14, 3650. [Google Scholar] [CrossRef]
  70. Zhou, Z.G.; Chen, B.; Li, Z.; Li, C. The Use of LSTM-Based RNN and SVM Models to Detect Ludian Coseismic Landslides in Time Series Images. J. Phys. Conf. Ser. 2020, 1631, 012085. [Google Scholar] [CrossRef]
  71. Varangaonkar, P.; Rode, S.V. Lightweight Deep Learning Model for Automatic Landslide Prediction and Localization. Multimed. Tools Appl. 2023, 82, 33245–33266. [Google Scholar] [CrossRef]
  72. Yang, Q.; Zhang, X.; Xie, Y.; Huang, S.; Kong, J.; Zhang, X. Seismic Landslide Detection Based on Transformer and RNN Algorithm. In Proceedings of the 2023 IEEE International Conference on Sensors, Electronics and Computer Engineering (ICSECE), Jinzhou, China, 18 August 2023; pp. 1394–1397. [Google Scholar]
  73. Yang, S.; Wang, Y.; Zhao, K.; Liu, X.; Mu, J.; Zhao, X. Partial Convolution-Simple Attention Mechanism-SegFormer: An Accurate and Robust Model for Landslide Identification. Eng. Appl. Artif. Intell. 2025, 151, 110612. [Google Scholar] [CrossRef]
  74. Huang, Y.; Zhang, J.; He, H.; Jia, Y.; Chen, R.; Ge, Y.; Ming, Z.; Zhang, L.; Li, H. MAST: An Earthquake-Triggered Landslides Extraction Method Combining Morphological Analysis Edge Recognition with Swin-Transformer Deep Learning Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 2586–2595. [Google Scholar] [CrossRef]
  75. Lv, P.; Ma, L.; Li, Q.; Du, F. ShapeFormer: A Shape-Enhanced Vision Transformer Model for Optical Remote Sensing Image Landslide Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2681–2689. [Google Scholar] [CrossRef]
  76. Kumar, A.; Misra, R.; Singh, T.N.; Dhiman, G. APO-AN Feature Selection Based Glorot Init Optimal TransCNN Landslide Detection from Multi Source Satellite Imagery. Multimed. Tools Appl. 2024, 83, 40451–40488. [Google Scholar] [CrossRef]
  77. Chen, X.; Liu, M.; Li, D.; Jia, J.; Yang, A.; Zheng, W.; Yin, L. Conv-Trans Dual Network for Landslide Detection of Multi-Channel Optical Remote Sensing Images. Front. Earth Sci. 2023, 11, 1182145. [Google Scholar] [CrossRef]
  78. Li, J.; Zhang, J.; Fu, Y. CTHNet: A CNN–Transformer Hybrid Network for Landslide Identification in Loess Plateau Regions Using High-Resolution Remote Sensing Images. Sensors 2025, 25, 273. [Google Scholar] [CrossRef]
  79. Li, Y.; Zhu, W.; Wu, J.; Zhang, R.; Xu, X.; Zhou, Y. DBSANet: A Dual-Branch Semantic Aggregation Network Integrating CNNs and Transformers for Landslide Detection in Remote Sensing Images. Remote Sens. 2025, 17, 807. [Google Scholar] [CrossRef]
  80. Yang, Z.; Xu, C.; Li, L. Landslide Detection Based on ResU-Net with Transformer and CBAM Embedded: Two Examples with Geologically Different Environments. Remote Sens. 2022, 14, 2885. [Google Scholar] [CrossRef]
  81. Li, D.; Tang, X.; Tu, Z.; Fang, C.; Ju, Y. Automatic Detection of Forested Landslides: A Case Study in Jiuzhaigou County, China. Remote Sens. 2023, 15, 3850. [Google Scholar] [CrossRef]
  82. Wu, L.; Liu, R.; Ju, N.; Zhang, A.; Gou, J.; He, G.; Lei, Y. Landslide Mapping Based on a Hybrid CNN-Transformer Network and Deep Transfer Learning Using Remote Sensing Images with Topographic and Spectral Features. Int. J. Appl. Earth Obs. Geoinf. 2024, 126, 103612. [Google Scholar] [CrossRef]
  83. Yi, Y.; Zhang, W. A New Deep-Learning-Based Approach for Earthquake-Triggered Landslide Detection from Single-Temporal RapidEye Satellite Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6166–6176. [Google Scholar] [CrossRef]
  84. Asadi, A.; Baise, L.G.; Koch, M.; Moaveni, B.; Chatterjee, S.; Aimaiti, Y. Pixel-Based Classification Method for Earthquake-Induced Landslide Mapping Using Remotely Sensed Imagery, Geospatial Data and Temporal Change Information. Nat Hazards 2024, 120, 5163–5200. [Google Scholar] [CrossRef]
  85. Shahabi, H.; Rahimzad, M.; Tavakkoli Piralilou, S.; Ghorbanzadeh, O.; Homayouni, S.; Blaschke, T.; Lim, S.; Ghamisi, P. Unsupervised Deep Learning for Landslide Detection from Multispectral Sentinel-2 Imagery. Remote Sens. 2021, 13, 4698. [Google Scholar] [CrossRef]
  86. Yusri, N.A.; Misbari, S.; Ismail, I.W.; Gisen, J.I.A. Satellite-Based Landslide Distribution Mapping with the Adoption of Deep Learning Approach in the Kuantan River Basin, Pahang. IOP Conf. Ser. Earth Environ. Sci. 2024, 1296, 012014. [Google Scholar] [CrossRef]
  87. Wang, L.; Zhang, M.; Shen, X.; Shi, W. Landslide Mapping Using Multilevel-Feature-Enhancement Change Detection Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 3599–3610. [Google Scholar] [CrossRef]
  88. Wang, X.; Wang, X.; Zheng, Y.; Liu, Z.; Xia, W.; Guo, H.; Li, D. GDSNet: A Gated Dual-Stream Convolutional Neural Network for Automatic Recognition of Coseismic Landslides. Int. J. Appl. Earth Obs. Geoinf. 2024, 127, 103677. [Google Scholar] [CrossRef]
  89. Zhang, R.; Zhu, W.; Li, Z.; Zhang, B.; Chen, B. Re-Net: Multibranch Network with Structural Reparameterization for Landslide Detection in Optical Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 2828–2837. [Google Scholar] [CrossRef]
  90. Li, P.; Wang, Y.; Si, T.; Ullah, K.; Han, W.; Wang, L. MFFSP: Multi-Scale Feature Fusion Scene Parsing Network for Landslides Detection Based on High-Resolution Satellite Images. Eng. Appl. Artif. Intell. 2024, 127, 107337. [Google Scholar] [CrossRef]
  91. Lu, W.; Hu, Y.; Zhang, Z.; Cao, W. A Dual-Encoder U-Net for Landslide Detection Using Sentinel-2 and DEM Data. Landslides 2023, 20, 1975–1987. [Google Scholar] [CrossRef]
  92. Liu, X.; Peng, Y.; Lu, Z.; Li, W.; Yu, J.; Ge, D.; Xiang, W. Feature-Fusion Segmentation Network for Landslide Detection Using High-Resolution Remote Sensing Images and Digital Elevation Model Data. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
  93. Yang, C.; Zhu, Y.; Zhang, J.; Wei, X.; Zhu, H.; Zhu, Z. A Feature Fusion Method on Landslide Identification in Remote Sensing with Segment Anything Model. Landslides 2025, 22, 471–483. [Google Scholar] [CrossRef]
  94. He, Y.; Chen, H.; Zhu, Q.; Zhang, Q.; Zhang, L.; Liu, T.; Li, W.; Chen, H. A Heterogeneous Ensemble Learning Method Combining Spectral, Terrain, and Texture Features for Landslide Mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 3746–3765. [Google Scholar] [CrossRef]
  95. Qu, Y.; Xing, H.; Sun, L.; Shi, X.; Huang, J.; Ao, Z.; Chang, Z.; Li, J. Integrating Sentinel-2a Imagery, DEM Data, and Spectral Feature Analysis for Landslide Detection via Fully Convolutional Networks. Landslides 2025, 22, 335–352. [Google Scholar] [CrossRef]
  96. Liu, X.; Xu, L.; Zhang, J. Landslide Detection with Mask R-CNN Using Complex Background Enhancement Based on Multi-Scale Samples. Geomat. Nat. Hazards Risk 2024, 15, 2300823. [Google Scholar] [CrossRef]
  97. Paheding, S.; Reyes, A.A.; Rajaneesh, A.; Sajinkumar, K.S.; Oommen, T. MarsLS-Net: Martian Landslides Segmentation Network and Benchmark Dataset. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 8236–8245. [Google Scholar]
  98. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032. [Google Scholar]
  99. Wei, R.; Ye, C.; Sui, T.; Zhang, H.; Ge, Y.; Li, Y. A Feature Enhancement Framework for Landslide Detection. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103521. [Google Scholar] [CrossRef]
  100. Yang, Y.; Miao, Z.; Zhang, H.; Wang, B.; Wu, L. Lightweight Attention-Guided YOLO With Level Set Layer for Landslide Detection from Optical Satellite Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3543–3559. [Google Scholar] [CrossRef]
  101. Ji, Q.; Liang, Y.; Xie, F.; Yu, Z.; Wang, Y. Automatic and Efficient Detection of Loess Landslides Based on Deep Learning. Sustainability 2024, 16, 1238. [Google Scholar] [CrossRef]
  102. Li, Y.; Ding, M.; Zhang, Q.; Luo, Z.; Huang, W.; Zhang, C.; Jiang, H. Old Landslide Detection Using Optical Remote Sensing Images Based on Improved YOLOv8. Appl. Sci. 2024, 14, 1100. [Google Scholar] [CrossRef]
  103. Ji, S.; Yu, D.; Shen, C.; Li, W.; Xu, Q. Landslide Detection from an Open Satellite Imagery and Digital Elevation Model Dataset Using Attention Boosted Convolutional Neural Networks. Landslides 2020, 17, 1337–1352. [Google Scholar] [CrossRef]
  104. Qin, S.; Guo, X.; Sun, J.; Qiao, S.; Zhang, L.; Yao, J.; Cheng, Q.; Zhang, Y. Landslide Detection from Open Satellite Imagery Using Distant Domain Transfer Learning. Remote Sens. 2021, 13, 3383. [Google Scholar] [CrossRef]
  105. Fu, Y.; Li, W.; Fan, S.; Jiang, Y.; Bai, H. CAL-Net: Conditional Attention Lightweight Network for In-Orbit Landslide Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [Google Scholar] [CrossRef]
  106. Amankwah, S.O.Y.; Wang, G.; Gnyawali, K.; Hagan, D.F.T.; Sarfo, I.; Zhen, D.; Nooni, I.K.; Ullah, W.; Duan, Z. Landslide Detection from Bitemporal Satellite Imagery Using Attention-Based Deep Neural Networks. Landslides 2022, 19, 2459–2471. [Google Scholar] [CrossRef]
  107. Chen, H.; He, Y.; Zhang, L.; Yao, S.; Yang, W.; Fang, Y.; Liu, Y.; Gao, B. A Landslide Extraction Method of Channel Attention Mechanism U-Net Network Based on Sentinel-2A Remote Sensing Images. Int. J. Digit. Earth 2023, 16, 552–577. [Google Scholar] [CrossRef]
  108. Liu, Q.; Wu, T.; Deng, Y.; Liu, Z. SE-YOLOv7 Landslide Detection Algorithm Based on Attention Mechanism and Improved Loss Function. Land 2023, 12, 1522. [Google Scholar] [CrossRef]
  109. Ge, X.; Zhao, Q.; Wang, B.; Chen, M. Lightweight Landslide Detection Network for Emergency Scenarios. Remote Sens. 2023, 15, 1085. [Google Scholar] [CrossRef]
  110. Du, Y.; Xu, X.; He, X. Optimizing Geo-Hazard Response: LBE-YOLO’s Innovative Lightweight Framework for Enhanced Real-Time Landslide Detection and Risk Mitigation. Remote Sens. 2024, 16, 534. [Google Scholar] [CrossRef]
  111. Mo, P.; Li, D.; Liu, M.; Jia, J.; Chen, X. A Lightweight and Partitioned CNN Algorithm for Multi-Landslide Detection in Remote Sensing Images. Appl. Sci. 2023, 13, 8583. [Google Scholar] [CrossRef]
  112. Zhang, W.; Liu, Z.; Zhou, S.; Qi, W.; Wu, X.; Zhang, T.; Han, L. LS-YOLO: A Novel Model for Detecting Multiscale Landslides with Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4952–4965. [Google Scholar] [CrossRef]
  113. Wu, Y.; Shi, L.; Xu, D.; Wang, H. A YOLOv5 Landslide Detection Model Based on Multi-Scale Feature Fusion. Res. Sq. 2024. submitted. [Google Scholar] [CrossRef]
  114. Lu, W.; Hu, Y.; Shao, W.; Wang, H.; Zhang, Z.; Wang, M. A Multiscale Feature Fusion Enhanced CNN with the Multiscale Channel Attention Mechanism for Efficient Landslide Detection (MS2LandsNet) Using Medium-Resolution Remote Sensing Data. Int. J. Digit. Earth 2024, 17, 2300731. [Google Scholar] [CrossRef]
  115. Liu, Y.; Yao, X.; Gu, Z.; Zhou, Z.; Liu, X.; Chen, X.; Wei, S. Study of the Automatic Recognition of Landslides by Using InSAR Images and the Improved Mask R-CNN Model in the Eastern Tibet Plateau. Remote Sens. 2022, 14, 3362. [Google Scholar] [CrossRef]
  116. Cheng, L.; Li, J.; Duan, P.; Wang, M. A Small Attentional YOLO Model for Landslide Detection from Satellite Remote Sensing Images. Landslides 2021, 18, 2751–2765. [Google Scholar] [CrossRef]
  117. Han, Z.; Fang, Z.; Li, Y.; Fu, B. A Novel Dynahead-Yolo Neural Network for the Detection of Landslides with Variable Proportions Using Remote Sensing Images. Front. Earth Sci. 2023, 10, 1077153. [Google Scholar] [CrossRef]
  118. Meng, S.; Shi, Z.; Pirasteh, S.; Liberata Ullo, S.; Peng, M.; Zhou, C.; Nunes Gonçalves, W.; Zhang, L. TLSTMF-YOLO: Transfer Learning and Feature Fusion Network for Earthquake-Induced Landslide Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–12. [Google Scholar] [CrossRef]
  119. Xiang, X.; Gong, W.; Zhao, F.; Cheng, Z.; Wang, L. Earthquake-Induced Landslide Mapping in Mountainous Areas Using a Semantic Segmentation Model Combined with A Dual Feature Pyramid. J. Earth Sci. 2025. [Google Scholar]
120. Lampert, J.; Pham, L.; Le, C.; Schlögl, M.; Schindler, A. Utilizing Deep Neural Networks for Landslide Detection and Segmentation in Remote Sensing Imagery. In Proceedings of the EGU General Assembly, Vienna, Austria, 14–19 April 2024; p. 20698. [Google Scholar]
  121. Chen, H.; Shi, Z. A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection. Remote Sens. 2020, 12, 1662. [Google Scholar] [CrossRef]
  122. Nagendra, S.; Kifer, D.; Mirus, B.; Pei, T.; Lawson, K.; Manjunatha, S.B.; Li, W.; Nguyen, H.; Qiu, T.; Tran, S.; et al. Constructing a Large-Scale Landslide Database Across Heterogeneous Environments Using Task-Specific Model Updates. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4349–4370. [Google Scholar] [CrossRef]
  123. Zhang, X.; Yu, W.; Pun, M.-O.; Shi, W. Cross-Domain Landslide Mapping from Large-Scale Remote Sensing Images Using Prototype-Guided Domain-Aware Progressive Representation Learning. ISPRS J. Photogramm. Remote Sens. 2023, 197, 1–17. [Google Scholar] [CrossRef]
  124. Wang, J.; Chen, G.; Jaboyedoff, M.; Derron, M.-H.; Fei, L.; Li, H.; Luo, X. Loess Landslides Detection via a Partially Supervised Learning and Improved Mask-RCNN with Multi-Source Remote Sensing Data. CATENA 2023, 231, 107371. [Google Scholar] [CrossRef]
  125. Wang, L.; Zhang, M.; Shi, W. CS-WSCDNet: Class Activation Mapping and Segment Anything Model-Based Framework for Weakly Supervised Change Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12. [Google Scholar] [CrossRef]
  126. Wang, J.; Zhang, X.; Ma, X.; Yu, W.; Ghamisi, P. Auto-Prompting SAM for Weakly Supervised Landslide Extraction. arXiv 2025, arXiv:2501.13426. [Google Scholar] [CrossRef]
  127. Lv, J.; Zhang, R.; Wu, R.; Bao, X.; Liu, G. Landslide Detection Based on Pixel-Level Contrastive Learning for Semi-Supervised Semantic Segmentation in Wide Areas. Landslides 2025, 22, 1087–1105. [Google Scholar] [CrossRef]
  128. Ghorbanzadeh, O.; Shahabi, H.; Piralilou, S.T.; Crivellari, A.; Rosa, L.E.C.L.; Atzberger, C.; Li, J.; Ghamisi, P. Contrastive Self-Supervised Learning for Globally Distributed Landslide Detection. IEEE Access 2024, 12, 118453–118466. [Google Scholar] [CrossRef]
  129. Garcia, G.P.B.; Grohmann, C.H.; Soares, L.P.; Espadoto, M. Relict Landslide Detection Using Deep-Learning Architectures for Image Segmentation in Rainforest Areas: A New Framework. Int. J. Remote Sens. 2023, 44, 2168–2195. [Google Scholar] [CrossRef]
  130. Notti, D.; Cignetti, M.; Godone, D.; Cardone, D.; Giordan, D. The unsuPervised shAllow laNdslide rapiD mApping: PANDA Method Applied to Severe Rainfalls in Northeastern Appenine (Italy). Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103806. [Google Scholar] [CrossRef]
  131. Bhuyan, K.; Meena, S.R.; Nava, L.; Van Westen, C.; Floris, M.; Catani, F. Mapping Landslides through a Temporal Lens: An Insight toward Multi-Temporal Landslide Mapping Using the u-Net Deep Learning Model. GISci. Remote Sens. 2023, 60, 2182057. [Google Scholar] [CrossRef]
  132. Asadi, A.; Baise, L.G.; Chatterjee, S.; Koch, M.; Moaveni, B. Regional Landslide Mapping Model Developed by a Deep Transfer Learning Framework Using Post-Event Optical Imagery. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2024, 18, 186–210. [Google Scholar] [CrossRef]
  133. Li, P.; Wang, Y.; Si, T.; Ullah, K.; Han, W.; Wang, L. DSFA: Cross-Scene Domain Style and Feature Adaptation for Landslide Detection from High Spatial Resolution Images. Int. J. Digit. Earth 2023, 16, 2426–2447. [Google Scholar] [CrossRef]
  134. Li, P.; Wang, Y.; Liu, G.; Fang, Z.; Ullah, K. Unsupervised Landslide Detection from Multitemporal High-Resolution Images Based on Progressive Label Upgradation and Cross-Temporal Style Adaption. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
  135. Wang, Z.; Brenning, A. Unsupervised Active–Transfer Learning for Automated Landslide Mapping. Comput. Geosci. 2023, 181, 105457. [Google Scholar] [CrossRef]
  136. Feng, X.; Du, J.; Wu, M.; Chai, B.; Miao, F.; Wang, Y. Potential of Synthetic Images in Landslide Segmentation in Data-Poor Scenario: A Framework Combining GAN and Transformer Models. Landslides 2024, 21, 2211–2226. [Google Scholar] [CrossRef]
  137. Kubo, S.; Yamane, T.; Chun, P. Study on Accuracy Improvement of Slope Failure Region Detection Using Mask R-CNN with Augmentation Method. Sensors 2022, 22, 6412. [Google Scholar] [CrossRef]
  138. Chen, T.-H.K.; Kincey, M.E.; Rosser, N.J.; Seto, K.C. Identifying Recurrent and Persistent Landslides Using Satellite Imagery and Deep Learning: A 30-Year Analysis of the Himalaya. Sci. Total Environ. 2024, 922, 171161. [Google Scholar] [CrossRef]
  139. Niyokwiringirwa, P.; Lombardo, L.; Dewitte, O.; Deijns, A.A.J.; Wang, N.; Van Westen, C.J.; Tanyas, H. Event-Based Rainfall-Induced Landslide Inventories and Rainfall Thresholds for Malawi. Landslides 2024, 21, 1403–1424. [Google Scholar] [CrossRef]
  140. Xu, Y.; Ouyang, C.; Xu, Q.; Wang, D.; Zhao, B.; Luo, Y. CAS Landslide Dataset: A Large-Scale and Multisensor Dataset for Deep Learning-Based Landslide Detection. Sci. Data 2024, 11, 12. [Google Scholar] [CrossRef]
  141. Huang, Y.; Xie, C.; Li, T.; Xu, C.; He, X.; Shao, X.; Xu, X.; Zhan, T.; Chen, Z. An Open-Accessed Inventory of Landslides Triggered by the MS 6.8 Luding Earthquake, China on September 5, 2022. Earthq. Res. Adv. 2023, 3, 100181. [Google Scholar] [CrossRef]
  142. Emani, G.F.; Xu, W.; Guédé, K.G.; Nattabi, F.S.; Yemele, O.M. Advancing Global Landslide Segmentation: A Coupled Multispectral Attention and Data Augmentation Approach Using the Novel MRGSLD Dataset. Earth Syst. Environ. 2025, 1–22. [Google Scholar] [CrossRef]
  143. Fang, C.; Fan, X.; Wang, X.; Nava, L.; Zhong, H.; Dong, X.; Qi, J.; Catani, F. A Globally Distributed Dataset of Coseismic Landslide Mapping via Multi-Source High-Resolution Remote Sensing Images. Earth Syst. Sci. Data 2024, 16, 4817–4842. [Google Scholar] [CrossRef]
144. Varnes, D.J. Landslide Types and Processes. Landslides Eng. Pract. 1958, 24, 20–47. [Google Scholar]
  145. Novellino, A.; Pennington, C.; Leeming, K.; Taylor, S.; Alvarez, I.G.; McAllister, E.; Arnhardt, C.; Winson, A. Mapping Landslides from Space: A Review. Landslides 2024, 21, 1041–1052. [Google Scholar] [CrossRef]
  146. Rodríguez, C.E.; Bommer, J.J.; Chandler, R.J. Earthquake-Induced Landslides: 1980–1997. Soil Dyn. Earthq. Eng. 1999, 18, 325–346. [Google Scholar] [CrossRef]
  147. Rengers, F.K.; McGuire, L.A.; Oakley, N.S.; Kean, J.W.; Staley, D.M.; Tang, H. Landslides after Wildfire: Initiation, Magnitude, and Mobility. Landslides 2020, 17, 2631–2641. [Google Scholar] [CrossRef]
  148. Pandey, A.C.; Islam, A.; Parida, B.R.; Dwivedi, C.S. Permafrost Destabilization Induced Hazard Mapping in Himalayas Using Machine Learning Methods. Adv. Space Res. 2025, 75, 6188–6206. [Google Scholar] [CrossRef]
  149. Pollock, W.; Wartman, J. Human Vulnerability to Landslides. GeoHealth 2020, 4, e2020GH000287. [Google Scholar] [CrossRef]
  150. Highland, L.M.; Bobrowsky, P. The Landslide Handbook—A Guide to Understanding Landslides; US Geological Survey: Reston, VA, USA, 2008. [Google Scholar]
  151. Guzzetti, F.; Peruccacci, S.; Rossi, M.; Stark, C.P. Rainfall Thresholds for the Initiation of Landslides in Central and Southern Europe. Meteorol. Atmos. Phys. 2007, 98, 239–267. [Google Scholar] [CrossRef]
  152. Ghorbanzadeh, O.; Meena, S.R.; Shahabi Sorman Abadi, H.; Tavakkoli Piralilou, S.; Zhiyong, L.; Blaschke, T. Landslide Mapping Using Two Main Deep-Learning Convolution Neural Network Streams Combined by the Dempster–Shafer Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 452–463. [Google Scholar] [CrossRef]
  153. Cui, W.; He, X.; Yao, M.; Wang, Z.; Li, J.; Hao, Y.; Wu, W.; Zhao, H.; Chen, X.; Cui, W. Landslide Image Captioning Method Based on Semantic Gate and Bi-Temporal LSTM. ISPRS Int. J. Geo-Inf. 2020, 9, 194. [Google Scholar] [CrossRef]
  154. Wang, L.; Lei, H.; Jian, W.; Wang, W.; Wang, H.; Wei, N. Enhancing Landslide Detection: A Novel LA-YOLO Model for Rainfall-Induced Shallow Landslides. IEEE Geosci. Remote Sens. Lett. 2025, 22, 1–5. [Google Scholar] [CrossRef]
  155. Qi, W.; Wei, M.; Yang, W.; Xu, C.; Ma, C. Automatic Mapping of Landslides by the ResU-Net. Remote Sens. 2020, 12, 2487. [Google Scholar] [CrossRef]
  156. Ghorbanzadeh, O.; Meena, S.R.; Blaschke, T.; Aryal, J. UAV-Based Slope Failure Detection Using Deep-Learning Convolutional Neural Networks. Remote Sens. 2019, 11, 2046. [Google Scholar] [CrossRef]
  157. Zhang, Y.; Ren, S.; Liu, X.; Guo, C.; Li, J.; Bi, J.; Ran, L. Reactivation Mechanism of Old Landslide Triggered by Coupling of Fault Creep and Water Infiltration: A Case Study from the East Tibetan Plateau. Bull. Eng. Geol. Environ. 2023, 82, 291. [Google Scholar] [CrossRef]
  158. Goodchild, M.F.; Li, W. Replication across Space and Time Must Be Weak in the Social and Environmental Sciences. Proc. Natl. Acad. Sci. USA 2021, 118, e2015759118. [Google Scholar] [CrossRef] [PubMed]
  159. Li, W.; Hsu, C.-Y.; Wang, S.; Kedron, P. GeoAI Reproducibility and Replicability: A Computational and Spatial Perspective. Ann. Am. Assoc. Geogr. 2024, 114, 2085–2103. [Google Scholar] [CrossRef]
  160. Jakubik, J.; Roy, S.; Phillips, C.; Fraccaro, P.; Godwin, D.; Zadrozny, B.; Szwarcman, D.; Gomes, C.; Nyirjesy, G.; Edwards, B.; et al. Foundation models for generalist geospatial artificial intelligence. arXiv 2023, arXiv:2310.18660. [Google Scholar]
  161. Hsu, C.-Y.; Li, W.; Wang, S. Geospatial Foundation Models for Image Analysis: Evaluating and Enhancing NASA-IBM Prithvi’s Domain Adaptability. Int. J. Geogr. Inf. Sci. 2024, 1–30. [Google Scholar] [CrossRef]
  162. Wang, T.; Dahal, A.; Fang, Z.; Van Westen, C.; Yin, K.; Lombardo, L. From Spatio-Temporal Landslide Susceptibility to Landslide Risk Forecast. Geosci. Front. 2024, 15, 101765. [Google Scholar] [CrossRef]
  163. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685. [Google Scholar]
  164. Wen, H.; Liu, B.; Di, M.; Li, J.; Zhou, X. A SHAP-Enhanced XGBoost Model for Interpretable Prediction of Coseismic Landslides. Adv. Space Res. 2024, 74, 3826–3854. [Google Scholar] [CrossRef]
  165. Razavi-Termeh, S.V.; Sadeghi-Niaraki, A.; Naqvi, R.A.; Choi, S.-M. Investigating the Efficacy of Physics-Based Metaheuristic Algorithms in Combination with Explainable Ensemble Machine-Learning Models for Landslide Susceptibility Mapping. Stoch. Environ. Res. Risk Assess. 2025, 39, 1109–1141. [Google Scholar] [CrossRef]
  166. Reddy, C.N. Explainable Artificial Intelligence (XAI) for Climate Hazard Assessment: Enhancing Predictive Accuracy and Transparency in Drought, Flood, and Landslide Modeling. Int. J. Sci. Technol. (IJSAT) 2025, 16, 1309. [Google Scholar] [CrossRef]
  167. Patanè, G.; Bortolotti, T.; Yordanov, V.; Biagi, L.G.A.; Brovelli, M.A.; Truong, X.Q.; Vantini, S. An Interpretable and Transferable Model for Shallow Landslides Detachment Combining Spatial Poisson Point Processes and Generalized Additive Models. Stoch. Environ. Res. Risk Assess. 2025, 39, 1723–1740. [Google Scholar] [CrossRef]
  168. Pradhan, B.; Dikshit, A.; Lee, S.; Kim, H. An Explainable AI (XAI) Model for Landslide Susceptibility Modeling. Appl. Soft Comput. 2023, 142, 110324. [Google Scholar] [CrossRef]
  169. Hsu, C.-Y.; Li, W. Explainable GeoAI: Can Saliency Maps Help Interpret Artificial Intelligence’s Learning Process? An Empirical Study on Natural Feature Detection. Int. J. Geogr. Inf. Sci. 2023, 37, 963–987. [Google Scholar] [CrossRef]
  170. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]
  171. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  172. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. In Proceeding of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  173. Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.-R.; Samek, W. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS ONE 2015, 10, e0130140. [Google Scholar] [CrossRef]
  174. Kumari, S.; Agarwal, S.; Agrawal, N.K.; Agarwal, A.; Garg, M.C. A Comprehensive Review of Remote Sensing Technologies for Improved Geological Disaster Management. Geol. J. 2024, 60, 223–235. [Google Scholar] [CrossRef]
Figure 1. Commonly used data, deep learning models, and learning techniques in LIM.
Figure 2. Visualization of sample optical and DEM data for LIM. (a) RGB images from Sentinel-2; (b) Short-wave infrared band (SWIR-2) from Sentinel-2; (c) DEM data from Advanced Land Observing Satellite (ALOS) PALSAR. Warmer colors (e.g., red) represent higher elevations, while cooler colors (e.g., purple) indicate lower elevations; and (d) Ground-truth labels for landslide areas (white) for training deep learning models.
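In practice, the optical bands and DEM shown in Figure 2 are co-registered and stacked into a single multi-channel input for a deep learning model. The following is a minimal illustrative sketch with synthetic arrays standing in for the Sentinel-2 bands, the ALOS PALSAR DEM, and the ground-truth mask (none of which are distributed with this article):

```python
import numpy as np

# Synthetic stand-ins for the data shown in Figure 2 (H x W tiles).
H = W = 64
rgb = np.random.rand(H, W, 3)            # (a) Sentinel-2 RGB, scaled to [0, 1]
swir = np.random.rand(H, W, 1)           # (b) Sentinel-2 SWIR-2 band
dem = np.random.rand(H, W, 1) * 3000.0   # (c) DEM, in meters
mask = np.random.rand(H, W) > 0.9        # (d) binary landslide labels

# Normalize the DEM so all channels share a comparable value range,
# then stack along the channel axis to form one model input tensor.
dem_norm = (dem - dem.min()) / (dem.max() - dem.min() + 1e-8)
x = np.concatenate([rgb, swir, dem_norm], axis=-1)   # shape (H, W, 5)

print(x.shape)   # (64, 64, 5)
```

The (image, mask) pairs produced this way are what supervised segmentation models such as U-Net train on; the normalization step is an assumption here, since each study cited in this review preprocesses its inputs differently.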
Figure 3. Four main clusters of strategies for improving CNNs that have been adapted in LIM. The first cluster summarizes work that applies various CNN-based models—such as FCN, U-Net, DenseNet, and LanDCNN—in LIM [35,49,50,51,52,53]. The second cluster evaluates performance variations by modifying individual components, including neural network architectures (e.g., Mask R-CNN, YOLO) [53,54,55,56,57], backbone networks (e.g., ResNet18, ResNet50) [57,58], and optimizers (e.g., Adam) [56], and examines the scalability of CNNs [59]. The third cluster combines multiple CNNs, often using shared encoders or hybrid architectures [60]. The fourth cluster explores alternative deep learning models derived from CNNs, such as Graph Convolutional Networks (GCNs), all of which are aimed at improving the adaptability of CNNs in LIM [61,62].
Figure 4. Model comparison results for landslide segmentation [57]. PSPNet with ResNet as the feature extraction backbone (last column) yields the best overall segmentation results for landslide areas (shown in red).
Figure 5. Strategies for adapting GANs in LIM. In the first cluster strategy, domain-adaptive GANs are applied with adversarial and contrastive learning to differentiate landslide and non-landslide areas [66]. In the second cluster strategy, one approach involves semi-supervised learning, such as clustering with pseudo-labeling to extract landslides [67], or training on both labeled and unlabeled images to classify landslide occurrences [68]. The third cluster strategy employs inexact supervised learning at the image level [69].
Figure 6. GAN-based architecture for landslide mapping (adapted from Zhang et al. [67]). In the yellow box, pseudo-labels are extracted from bi-temporal input images and used to guide the training process. The blue box illustrates the use of multi-level DRL to extract object-level and pixel-level representations through adversarial learning, implemented in the green box using WGAN-GP. These multi-level and multi-temporal deep representations are further refined via spatial and channel attention mechanisms and finally classified in the multi-temporal DRF module shown in the purple box.
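The pseudo-label extraction step in the yellow box of Figure 6 can be approximated by simple bi-temporal differencing: pixels whose spectral change magnitude exceeds a statistical threshold become candidate landslide labels. The sketch below is an illustrative assumption, not the exact procedure of Zhang et al. [67], whose pipeline refines such labels through adversarial learning:

```python
import numpy as np

def pseudo_labels(pre, post, k=2.0):
    """Rough change pseudo-labels from co-registered bi-temporal images.

    pre, post: (H, W, C) arrays; k scales the threshold
    (mean + k * std of the per-pixel change magnitude).
    """
    diff = np.linalg.norm(post.astype(float) - pre.astype(float), axis=-1)
    thresh = diff.mean() + k * diff.std()
    return diff > thresh  # True where change (candidate landslide pixel)

# Toy example: a bright patch appears in the post-event image.
pre = np.zeros((32, 32, 3))
post = pre.copy()
post[10:20, 10:20] = 1.0
labels = pseudo_labels(pre, post, k=1.0)
print(labels[15, 15], labels[0, 0])  # True False
```

Such noisy labels are only a starting point; the multi-level representation learning and attention modules in the blue and purple boxes exist precisely to correct their errors.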
Figure 7. Applications of RNNs adapted in LIM. RNNs have been used to detect temporal changes related to landslides [70], as well as for classification and feature extraction [37,71,72].
Figure 8. Two main strategies for adapting Transformers in LIM. The first cluster of studies applies a single Transformer to capture various features [23,74,75]. The second cluster of studies frequently combines Transformers with other models, such as CNNs and GCNs. Transformers are used for different tasks in combination with CNNs across various applications—for example, capturing spatial information [76], expanding the receptive field of CNNs [77,78,79,80,81], and extracting global features from large-scale imagery [82]. Transformers have also been shown to be effective for landslide segmentation when integrated with GCNs [62].
Figure 9. A multi-level feature enhancement framework for landslide mapping (adapted from Wang et al. [87]). Two branches of the Siamese Neural Network take pre- and post-event images as input data. PFEM, shown in the dashed green box, enhances post-event features using the DEM in PFEB and then sends them to BFDEB, shown in the dashed blue box, to amplify the differences between the bi-temporal images. In FDCM, shown in the dashed orange box, the DEM and the enhanced difference features are used for change detection, which is further refined using flow direction to complete the final landslide detection.
Figure 10. Illustrations of attention modules. (a) Spatial attention module: The input images undergo average pooling and max pooling to produce two spatial maps, which are then concatenated along the channel dimension and passed through a convolutional layer. Finally, the spatial attention map is generated using a sigmoid activation. (b) Channel attention module: The input images are processed with average pooling and max pooling to generate two descriptors, which are passed through a shared MLP, summed, and then used to generate the channel attention map via sigmoid activation.
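The two attention modules described in Figure 10 can be expressed compactly. The numpy sketch below follows the caption's pooling-MLP-sigmoid and pooling-concat-conv-sigmoid recipes; it is a pedagogical illustration rather than any cited implementation, and the usual 7×7 convolution of the spatial module is replaced by a 1×1 kernel for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Channel attention (Figure 10b): x has shape (C, H, W)."""
    avg = x.mean(axis=(1, 2))                   # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))                     # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)  # shared two-layer MLP
    return sigmoid(mlp(avg) + mlp(mx))          # (C,) channel weights

def spatial_attention(x, kernel):
    """Spatial attention (Figure 10a): x has shape (C, H, W)."""
    avg = x.mean(axis=0)                 # (H, W) channel-average map
    mx = x.max(axis=0)                   # (H, W) channel-max map
    stacked = np.stack([avg, mx])        # (2, H, W), concat along channels
    conv = kernel[0] * stacked[0] + kernel[1] * stacked[1]  # 1x1 conv
    return sigmoid(conv)                 # (H, W) spatial attention map

C, H, W = 4, 8, 8
x = np.random.rand(C, H, W)
w1 = np.random.rand(2, C)                # MLP hidden layer (reduced to 2 units)
w2 = np.random.rand(C, 2)                # MLP output layer back to C units
ca = channel_attention(x, w1, w2)                 # (C,)
sa = spatial_attention(x, np.array([0.5, 0.5]))   # (H, W)
refined = x * ca[:, None, None] * sa[None, :, :]  # attention-reweighted features
print(ca.shape, sa.shape, refined.shape)
```

Applied in sequence, as in CBAM, the two modules let a network emphasize both the spectral bands and the image regions most indicative of landslide scars.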
Figure 11. Progress in LIM using deep learning. Current deep learning methods for LIM are predominantly focused on landslides that are small to medium in volume but exhibit rapid displacement rates (blue-shaded area), with limited efforts aimed at detecting large-scale, slow-moving geomorphological landslides (green-shaded area).
Table 1. Classification of feature enhancement techniques.
Strategy | Techniques | References
Image-derived feature representation enhancement | NDVI, gray-level co-occurrence matrix | [25,26,84,85,86]
Feature fusion enhancement (multi-scale-level feature fusion, multimodal feature fusion, bitemporal feature differentiation) | Multi-level feature enhancement network (MFENet) and bi-feature difference enhancement module (BFDEM) [87]; Gated Dual-Stream Convolutional Neural Network (GDSNet) [88]; multi-branch feature extraction module (FFEM) [89]; shape-enhanced vision Transformer (ShapeFormer) [75]; Multi-scale Feature Fusion Scene Parsing (MFFSP) [90]; DemDet [81]; FFS-Net [92]; SAMLS [93] | [35,75,81,87,88,89,90,91,92,93,94,95]
Background enhancement | Partial image replacement | [36,96]
Table 2. Classification of attention-boosted deep learning techniques for landslide mapping.
Strategy | Techniques | References
Attention block | Attention gate, attention convolution block | [28,99,100]
Spatial/channel attention | Spatial attention, channel attention, CBAM, BAM, 3D SCAM, ECAM, PAM, SENet, dual-stream conditional attention module, etc. | [21,67,80,81,84,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115]
Multi-scale attention | Selective kernel attention mechanism; task-aware/spatial-aware/scale-aware attention block; light-pyramid feature reuse fusion attention mechanism, etc. | [32,105,113,114,115,116,117,118,119]
Long-range dependency-capturing attention | Self-attention module, multi-head self-attention, non-local attention | [23,32,62,76,77,78,79,80,81,82,91,97,113,115,120]
Table 3. Classification of deep learning techniques for addressing training data scarcity challenges.
Strategy | Techniques | References
Manual creation of benchmark datasets | Historical inventory utilization, on-site field surveys, data augmentation, data validation | [97]
Dataset updating tools | Task-specific model update (TSMU) | [122]
Learning with fewer or less accurate labels | Transfer learning; unsupervised active-transfer learning; unsupervised domain adaptation; weakly supervised learning; partially supervised learning; clustering algorithms (K-means, fuzzy C-means); GANs | [59,67,68,69,82,85,104,118,123,124,125,126,127,128,129,130,131,132,133,134,135,136]
Table 4. Classification of deep learning techniques for detecting various types of landslides.
Table 4. Classification of deep learning techniques for detecting various types of landslides.
Applications | Models | Techniques | References
Identify earthquake-triggered landslides | TLSTMF-YOLO with C3-Swin Transformer; dual-feature pyramid-based U-Net (DFPU-Net); Auto-Prompting Segment Anything Model (APSAM); SegFormer; LandsNet; Mask R-CNN; Attention U-Net; LSTM; etc. | Pyramid-structured module, transfer learning, Transformer, attention mechanism | [23,28,70,72,74,78,81,84,88,118,119,122,139,152,153]
Identify rainfall-triggered landslides | Lightweight attention-guided YOLO (LA-YOLO); Feature-based Constraint Deep U-Net (FCDU-Net); etc. | Context-guided block, feature constraint | [26,106,154,155,156]
Identify old landslides | Mask R-CNN; YOLO; YOLOv8; RetinaNet | Neural network comparisons | [53,102]