1. Introduction
Landslides constitute one of the most prevalent global geological hazards, whose formation and evolution are shaped by multiple factors such as topography, geological structures, geotechnical properties, and hydrology. Owing to their sudden onset, strong destructiveness, and widespread impact, landslides pose severe threats to human life, property safety, and socio-economic development [1,2,3]. Owing to its unique topography and climate, Southwest China is the most severely affected region, experiencing the highest frequency of and most substantial losses from landslide disasters [4,5,6].
The Jinsha River Basin serves as a pivotal base for hydropower energy development. Nevertheless, the interplay of its distinctive topography, vertical climate zones, fragile geology, and intensive human engineering activities has resulted in a high frequency of landslides. These events seriously threaten hydropower project safety and regional sustainable development [7,8,9]. The reservoir area of the Ahai Hydropower Station, situated in the middle reaches of the Jinsha River, constitutes a critical node in the West-East Electricity Transmission project. A combination of typical alpine canyon topography, active neotectonic movements, fragmented rock masses, and frequent heavy rainfall creates an environment predisposing the area to pronounced reservoir bank slope failures [10,11]. Consequently, research focused on landslide identification in this area is of paramount importance for ensuring the safe operation of the hydropower station and for protecting the lives and property within the surrounding environment [12].
Surface deformation monitoring is crucial for landslide prevention. While traditional techniques, such as leveling and Global Navigation Satellite System (GNSS) measurements, offer high accuracy, their application is often hindered by poor accessibility in high-altitude terrain, long observation cycles, and limited spatial coverage, making them inefficient for monitoring large reservoir areas [13,14]. In recent years, Interferometric Synthetic Aperture Radar (InSAR) has emerged as a pivotal tool for large-area deformation monitoring due to its broad coverage, high precision, and non-contact nature [15,16,17,18]. The evolution of time-series InSAR techniques, including Persistent Scatterer InSAR (PS-InSAR) [19,20], Small Baseline Subset InSAR (SBAS-InSAR) [21,22], and Distributed Scatterer InSAR (DS-InSAR) [23,24], has effectively mitigated decorrelation issues, significantly advancing landslide applications. For instance, studies have coupled these methods with machine learning for susceptibility mapping, identified numerous landslides in reservoir areas [25], and updated a regional landslide inventory in Tuscany using PS-InSAR, identifying 672 active landslides [26]. Further innovation is seen in the combination of multi-temporal InSAR with meta-learning to improve slow landslide detection in complex terrains such as Hong Kong [27]. Despite these advancements, large-scale landslide identification still relies heavily on manual interpretation of InSAR deformation data, which is labor-intensive and time-consuming. Efficient and accurate methods for the automated identification of landslide anomalies are still lacking.
In recent years, deep learning has made significant progress in the field of geological hazard identification, providing a new technological approach for large-scale landslide identification. Deep-learning-based landslide recognition uses a hierarchical feature learning mechanism to automatically extract key features such as morphology and texture, mitigating the low efficiency and subjectivity of traditional manual interpretation [28], and has achieved a series of important results in the field of landslide recognition [29,30,31,32,33]. In addition, the integration of InSAR technology with Convolutional Neural Network (CNN) algorithms has demonstrated good applicability in multiple fields, including seismic deformation monitoring [34], mining subsidence assessment [35], and ground subsidence detection [36]. Research has shown that CNNs have significant advantages in the automatic identification and monitoring of geological hazards based on InSAR datasets [37]. However, in alpine valley regions, where topography is complex and vegetation cover is dense, existing methods still exhibit limitations in feature representation and model generalization. The development of intelligent identification techniques tailored to complex terrain conditions remains an important research direction.
This study proposes an automated method for identifying large-scale potential landslide hazards by integrating InSAR technology with an improved CRF-Faster R-CNN model. Taking the Ahai Reservoir area in the Jinsha River Basin as a case study, we processed 248 ascending and descending Sentinel-1A images acquired between January 2019 and December 2021. The SBAS-InSAR technique was employed to derive surface deformation information. The Faster R-CNN architecture was enhanced by adopting a ResNet-50 backbone integrated with the Convolutional Block Attention Module (CBAM) and a Feature Pyramid Network (FPN). The model was trained on monthly deformation velocity maps generated through SBAS-InSAR processing to enable automated detection of deformation anomalies. Potential landslide hazards were then identified by combining high-resolution optical remote sensing imagery with field validation, and the detection accuracy was evaluated. The results support landslide prevention and mitigation efforts and offer insights for geological disaster risk management in southwestern China.
3. Materials and Methods
Figure 2 illustrates the technical flowchart of this study, which consists of three key components: (1) acquisition of surface deformation information using SBAS-InSAR technology, based on 88 ascending and 160 descending Sentinel-1A images acquired between January 2019 and December 2021; (2) automatic identification of landslide anomalies from deformation velocity maps using an improved Faster R-CNN model incorporating the CBAM attention mechanism, ResNet-50, and FPN; (3) accuracy assessment through model comparison, with optical imagery and field surveys integrated to confirm the landslide hazard inventory in the study area and to validate the landslide identification results.
3.1. Technical Principle of SBAS-InSAR
SBAS-InSAR technology, capable of monitoring large-scale, long-time-series surface deformation with millimeter-scale precision, has become an essential tool for observing slow surface displacement and estimating geophysical parameters [11]. In contrast to D-InSAR and PS-InSAR, the SBAS method performs interferometric analysis on multiple SAR images acquired over the same area at different times. Unlike PS-InSAR, which depends on permanent scatterers, SBAS-InSAR utilizes distributed scatterers with stable characteristics, providing superior performance in natural terrain monitoring and enhanced resistance to decorrelation.
First, precise orbit data were applied to geometrically rectify the SAR images, thereby improving their geolocation accuracy. During the interferometric processing stage, the input SAR data were co-registered as interferometric pairs. Given $N+1$ single-look complex (SLC) SAR images acquired at times $t_0, t_1, \ldots, t_N$, one image was selected as the master and the remaining images were co-registered to it. During interferogram generation, appropriate spatial and temporal baseline thresholds were set, ultimately resulting in $M$ interferometric pairs. The number of these interferometric pairs can be expressed as:

$$\frac{N+1}{2} \le M \le \frac{N(N+1)}{2}$$
Interferometric pairs with low coherence were discarded, leaving 216 ascending and 238 descending interferograms. For each interferometric pair, the phase difference was calculated to extract the interferometric phase. Goldstein filtering was applied to smooth the interferometric phase, and phase unwrapping was performed using the minimum cost flow (MCF) method. To mitigate atmospheric phase delays, GACOS data were used for atmospheric correction, and the topographic phase was removed using a high-accuracy DEM. After ground control points (GCPs) were selected for orbit refinement and re-flattening, two inversion steps and geocoding were performed to obtain the line-of-sight (LOS) deformation rates and associated results.
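The baseline-threshold step described above can be sketched as follows. This is a minimal illustration, not the processing chain used in this study: the acquisition dates, perpendicular baselines, the function name `select_pairs`, and both threshold values are hypothetical.

```python
from datetime import date
from itertools import combinations


def select_pairs(acquisitions, max_days, max_bperp_m):
    """Return index pairs (i, j), i < j, whose temporal and perpendicular
    baselines are both within the given thresholds."""
    pairs = []
    for i, j in combinations(range(len(acquisitions)), 2):
        (d_i, b_i), (d_j, b_j) = acquisitions[i], acquisitions[j]
        temporal = abs((d_j - d_i).days)   # temporal baseline in days
        spatial = abs(b_j - b_i)           # perpendicular baseline in metres
        if temporal <= max_days and spatial <= max_bperp_m:
            pairs.append((i, j))
    return pairs


# Hypothetical acquisitions: (date, perpendicular baseline in metres).
acquisitions = [
    (date(2019, 1, 5), 0.0),
    (date(2019, 1, 17), 45.0),
    (date(2019, 1, 29), -30.0),
    (date(2019, 2, 10), 60.0),
    (date(2019, 2, 22), 20.0),
]
# Hypothetical thresholds: 36 days and 100 m.
pairs = select_pairs(acquisitions, max_days=36, max_bperp_m=100.0)
n = len(acquisitions) - 1   # N, for N + 1 images
m = len(pairs)              # M, number of interferometric pairs
```

With these illustrative values, nine of the ten possible pairs pass; only the pair spanning 48 days is rejected. The count also falls within the usual small-baseline bounds, (N+1)/2 ≤ M ≤ N(N+1)/2.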
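Building upon the aforementioned processing, using $t_0$ as the initial time and defining $t_A$ and $t_B$ as subsequent acquisition times ($t_B > t_A$), the differential interferometric phase $\delta\varphi_j(x, r)$ at any pixel coordinates $(x, r)$ in the $j$-th differential interferogram can be expressed as:

$$\delta\varphi_j(x, r) = \varphi(t_B, x, r) - \varphi(t_A, x, r) \approx \frac{4\pi}{\lambda}\left[d(t_B, x, r) - d(t_A, x, r)\right]$$

where $\lambda$ represents the radar wavelength, and $d(t_B, x, r)$ and $d(t_A, x, r)$ denote the cumulative LOS deformation at times $t_B$ and $t_A$ relative to the reference epoch $t_0$.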
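To make the phase-deformation relation concrete, the sketch below converts unwrapped differential phase to LOS displacement differences and then solves a small interferogram network for cumulative displacement by least squares. It is a simplified single-pixel illustration: it assumes a fully connected network and noise-free phase, whereas practical SBAS processing typically applies an SVD-based velocity inversion and must handle atmospheric and topographic residuals. The function names and the sample network are hypothetical; the wavelength is the nominal Sentinel-1 C-band value.

```python
import numpy as np

WAVELENGTH_M = 0.05546576  # nominal Sentinel-1 C-band wavelength (5.405 GHz)


def phase_to_los(dphi_rad, wavelength_m=WAVELENGTH_M):
    """Invert dphi = (4*pi/lambda) * (d_B - d_A) for the LOS difference."""
    return dphi_rad * wavelength_m / (4.0 * np.pi)


def invert_network(pairs, dd, n_images):
    """Least-squares cumulative displacement at t_0..t_N, with d(t_0) = 0.

    pairs: list of (a, b) image indices with b > a
    dd:    LOS displacement difference d(t_b) - d(t_a) for each pair
    """
    design = np.zeros((len(pairs), n_images - 1))
    for row, (a, b) in enumerate(pairs):
        design[row, b - 1] = 1.0
        if a > 0:  # d(t_0) is the fixed reference and has no column
            design[row, a - 1] = -1.0
    d, *_ = np.linalg.lstsq(design, np.asarray(dd), rcond=None)
    return np.concatenate([[0.0], d])


# Hypothetical single-pixel example: simulate phases, then recover d(t).
d_true = np.array([0.0, -0.002, -0.005, -0.006])  # metres
pairs = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
dphi = [4.0 * np.pi / WAVELENGTH_M * (d_true[b] - d_true[a]) for a, b in pairs]
d_est = invert_network(pairs, [phase_to_los(p) for p in dphi], n_images=4)
```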
3.2. Faster RCNN Model
Proposed by Ren et al. in 2015, Faster R-CNN stands as one of the most representative achievements in the R-CNN series and a classical example among two-stage object detection algorithms [38]. In contrast to single-stage object detection algorithms such as Single Shot MultiBox Detector (SSD) [39] and You Only Look Once (YOLO) [40,41], Faster R-CNN exhibits higher detection accuracy and demonstrates superior adaptability to multi-scale objects. The algorithm introduces a Region Proposal Network (RPN) and performs refined classification and regression through Region of Interest Pooling (RoI Pooling), achieving end-to-end object detection with significant advantages in complex scene applications.
The Faster R-CNN network architecture comprises four principal components: the backbone feature extraction network, the Region Proposal Network (RPN), the RoI pooling layer, and the detection head, as illustrated in Figure 3. First, the input image is fed through a convolutional network to extract high-level feature maps enriched with semantic information. Subsequently, the RPN performs sliding-window detection across these feature maps, generating object proposals along with their corresponding confidence scores through a predefined anchor box mechanism. These variable-sized candidate regions are then normalized into fixed-dimensional feature blocks via RoI pooling. Finally, the pooled features are input into the detection head, which comprises two parallel branches: a classification branch that employs the softmax function to calculate the probability of each proposal belonging to the landslide category, and a regression branch that further refines the bounding box coordinates to output the precise spatial location of the landslide body.
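As an illustration of the anchor box mechanism used by the RPN, the sketch below generates the anchors centred on one sliding-window position. The scales and aspect ratios are the defaults from the original Faster R-CNN paper and are an assumption here; the anchor configuration used in this study is not stated. The function name `make_anchors` is hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred at (cx, cy).

    Each anchor has area scale**2 and height/width ratio `ratio`.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((cx - width / 2, cy - height / 2,
                            cx + width / 2, cy + height / 2))
    return anchors


# Nine anchors (3 scales x 3 ratios) at the centre of a 512 x 512 sample.
anchors = make_anchors(256, 256)
```

The RPN scores each such anchor as object or background and regresses offsets to it, so the choice of scales controls which landslide sizes are well covered by proposals.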
3.3. CRF-Faster RCNN Model
The traditional Faster RCNN model suffers from insufficient feature extraction capability caused by gradient vanishing in deep networks, and its single-layer feature output adapts poorly to multi-scale target detection. To address these problems, this study constructs a novel network model (CRF-Faster RCNN), built on the baseline algorithm, for the identification of landslide anomaly areas. The model replaces the traditional Visual Geometry Group (VGG) network with ResNet-50 and FPN as the backbone feature extraction network. Through the residual connection mechanism, it mitigates the gradient vanishing problem in deep network training and improves the quality and stability of feature extraction. It also integrates the CBAM module, using a channel-spatial dual attention mechanism to enhance the model’s adaptability to complex scenarios and improve detection robustness in vegetation-covered areas. The model can thus capture the semantic information and detailed features of landslide areas more efficiently, improving the accuracy and reliability of landslide monitoring.
Figure 4 shows the schematic diagram of the CRF-Faster RCNN model structure.
ResNet-50 is a deep convolutional neural network architecture based on residual learning, designed to address the critical challenges of vanishing gradients and performance degradation in deep network training, and represents a milestone achievement in computer vision [42]. By introducing a residual connection mechanism, this architecture significantly enhances both optimization efficiency and feature representation capability in deep networks. The ResNet-50 architecture comprises five feature extraction stages (Conv1~Conv5), each implemented through stacked bottleneck modules that facilitate multi-level feature representation, as illustrated in Figure 5. Unlike a traditional network that directly fits the target mapping function $H(x)$, ResNet-50 reformulates the target function in a residual learning form:

$$y = F(x, \{W_i\}) + W_s x$$

where $x$ is the input vector of the module, and $F(x, \{W_i\})$ is the residual function; when the input and output dimensions are consistent, $W_s$ is an identity matrix (i.e., $y = F(x, \{W_i\}) + x$), and the element-wise addition of the input and residual features is achieved through a shortcut connection; when the dimensions mismatch, $W_s$ is a linear projection matrix implemented by a $1 \times 1$ convolution that adjusts the number of input channels and ensures dimensional consistency.
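The two shortcut cases described above (identity when dimensions match, 1 × 1 projection when they do not) can be sketched in NumPy as follows. This is a schematic of the residual form only, not of the ResNet-50 bottleneck itself; `residual_fn` stands in for the stacked convolutions, and all names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight):
    """1 x 1 convolution: mixes channels independently at every pixel.

    x: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)
    """
    return np.einsum("oc,chw->ohw", weight, x)


def residual_block(x, residual_fn, w_s=None):
    """Compute y = F(x) + W_s x, using an identity shortcut if w_s is None."""
    shortcut = x if w_s is None else conv1x1(x, w_s)
    return residual_fn(x) + shortcut


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))

# Case 1: matching dimensions, identity shortcut. With a zero residual
# the block reduces to the identity mapping, y = x.
y_identity = residual_block(x, lambda t: np.zeros_like(t))

# Case 2: channel mismatch (4 -> 6), projection shortcut.
w_s = rng.normal(size=(6, 4))
w_f = rng.normal(size=(6, 4))
y_projected = residual_block(
    x, lambda t: np.maximum(conv1x1(t, w_f), 0.0), w_s=w_s)
```

The first case shows why residual learning eases optimization: the block only has to learn a correction to the identity, so gradients can pass through the shortcut unchanged.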
To enhance the multi-scale object detection capability, this study introduces FPN as the multi-scale feature fusion architecture [43]. Its core design includes the top-down pathway, lateral connections, and feature fusion modules. Based on the feature maps (C2~C5) extracted from different stages of ResNet-50, FPN performs channel alignment and element-wise addition through bilinear interpolation sampling in the top-down pathway and lateral connections, thereby fusing high-semantic features from deep layers (e.g., C5) with high-resolution features from shallow layers (e.g., C4) across levels. It iteratively generates a multi-scale feature pyramid (P2~P6) with rich semantic and spatial details to support the detection of objects of different sizes. ResNet-50-FPN integrates deep semantic features and multi-scale features, which significantly improves detection and recognition performance. The combined ResNet-50-FPN architecture is shown in Figure 6.
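The top-down pathway with lateral connections can be sketched as below. Several simplifications are assumed and should not be read as the configuration used in this study: nearest-neighbour upsampling replaces the bilinear interpolation described above, the 3 × 3 smoothing convolutions and the extra P6 level are omitted, and the channel counts are reduced for brevity (the C2~C5 stages of ResNet-50 output 256, 512, 1024, and 2048 channels). All function names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight):
    """Lateral 1 x 1 convolution used for channel alignment."""
    return np.einsum("oc,chw->ohw", weight, x)


def upsample2x(x):
    """Nearest-neighbour 2x upsampling (a stand-in for bilinear)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fpn_top_down(features, lateral_weights):
    """features: [C2, C3, C4, C5], shallow to deep. Returns [P2, ..., P5]."""
    laterals = [conv1x1(c, w) for c, w in zip(features, lateral_weights)]
    outputs = [laterals[-1]]                       # P5 comes from C5 alone
    for lateral in reversed(laterals[:-1]):
        # Fuse the lateral map with the upsampled, next-deeper pyramid level.
        outputs.insert(0, lateral + upsample2x(outputs[0]))
    return outputs


rng = np.random.default_rng(0)
channels, sizes, out_channels = (8, 16, 32, 64), (32, 16, 8, 4), 4
features = [rng.normal(size=(c, s, s)) for c, s in zip(channels, sizes)]
lateral_weights = [rng.normal(size=(out_channels, c)) for c in channels]
pyramid = fpn_top_down(features, lateral_weights)
```

Every pyramid level ends up with the same channel count, which is what lets a single RPN and detection head be shared across scales.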
In target detection, traditional network structures are limited by local feature extraction, making it difficult to capture global contextual information and prone to losing key features, which results in insufficient detection accuracy. Therefore, based on the ResNet-50 and FPN architectures, this study introduces the CBAM module, which consists of a Channel Attention Module (CAM) and a Spatial Attention Module (SAM) connected in series. As shown in Figure 7, the input feature map $P$ undergoes global average pooling and max pooling in the CAM to capture channel dependencies and generate channel weights, which are applied to produce an intermediate feature map $P'$; subsequently, $P'$ is processed by the SAM, which generates spatial weights through spatial pooling and convolution to further calibrate the feature map, and ultimately outputs the optimized feature map $P''$. This dual attention mechanism can effectively focus on the key regions of the target, enhancing the model’s recognition accuracy in complex scenarios.
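The serial channel-then-spatial attention can be sketched in NumPy as follows. The shared two-layer MLP, the reduction ratio, and the 7 × 7 spatial kernel follow the defaults of the original CBAM paper and are assumptions here; the weights are random placeholders rather than trained parameters, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(p, w1, w2):
    """p: (C, H, W). A shared MLP scores avg- and max-pooled descriptors."""
    avg = p.mean(axis=(1, 2))
    mx = p.max(axis=(1, 2))

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    weights = sigmoid(mlp(avg) + mlp(mx))            # one weight per channel
    return p * weights[:, None, None]


def spatial_attention(p, kernel):
    """kernel: (2, k, k), applied to the channel-wise avg and max maps."""
    stacked = np.stack([p.mean(axis=0), p.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, w = p.shape[1:]
    response = np.zeros((h, w))
    for dy in range(k):                               # zero-padded k x k conv
        for dx in range(k):
            window = padded[:, dy:dy + h, dx:dx + w]
            response += (kernel[:, dy, dx, None, None] * window).sum(axis=0)
    return p * sigmoid(response)[None, :, :]


def cbam(p, w1, w2, kernel):
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(p, w1, w2), kernel)


rng = np.random.default_rng(0)
p = rng.normal(size=(8, 16, 16))
w1, w2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 2))  # reduction ratio 4
kernel = rng.normal(size=(2, 7, 7)) * 0.1
p_out = cbam(p, w1, w2, kernel)
```

Because both attention maps lie in (0, 1), the module can only rescale features, never amplify them beyond their input magnitude; what it learns is which channels and locations to suppress least.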
3.4. Construction of Time-Series Deformation Dataset
This study constructed a sample dataset based on ascending and descending InSAR deformation measurements from the Ahai Reservoir area. The experimental data comprised SAR imagery acquired between 2019 and 2021, with monthly deformation data systematically incorporated into the dataset. The original deformation images, 248 scenes in total (ascending and descending combined), each measure 2285 × 3565 pixels. Because the spatial extent of the study area far exceeds the model input size, the images were cropped into 512 × 512 pixel samples, ensuring complete representation of deformation-intensive regions. This preprocessing stage generated 6560 image samples. To maintain detection accuracy, poor-quality samples were excluded, and 410 high-quality samples were retained as the foundational dataset. The dataset was then expanded through data augmentation to 1230 samples for model training and testing.
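A cropping scheme of this kind can be sketched as a sliding window over each deformation map. The stride, the edge handling, and the function names below are assumptions, since the overlap actually used is not stated; the sketch therefore does not attempt to reproduce the reported sample count.

```python
def tile_starts(length, tile, stride):
    """Window start offsets along one axis; the last window is shifted back
    so that it ends exactly at the image edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_windows(height, width, tile=512, stride=512):
    """Return (row0, col0, row1, col1) crop windows covering the image."""
    return [(r, c, r + tile, c + tile)
            for r in tile_starts(height, tile, stride)
            for c in tile_starts(width, tile, stride)]


# Non-overlapping stride, except for the final row and column of tiles.
windows = tile_windows(2285, 3565)
```

Under these assumptions each 2285 × 3565 map yields 5 × 7 = 35 crops, with the last row and column overlapping their neighbours so that no pixels are dropped. A smaller stride would produce more, overlapping samples. The growth from 410 to 1230 samples is a threefold expansion, which would be consistent with, for example, two augmented copies per retained sample, although the specific augmentations are not given.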
3.5. Loss Function and Evaluation Metrics
This study employs a joint loss function to optimize the model parameters. The total loss ($L$) is composed of the classification loss ($L_{cls}$) and the bounding box regression loss ($L_{bbox}$), with the aim of improving the classification accuracy and spatial positioning ability for the landslide target simultaneously. The model performance is evaluated using mAP as the core metric, comprehensively reflecting the balance between precision and recall in landslide detection.
3.5.1. Joint Loss Function Design
The Faster R-CNN model is trained by optimizing the loss function, with the objective of minimizing the discrepancy between the predictions and the ground truth, thereby enhancing the model’s generalization capability and robustness. The overall loss function in this study consists of two components: the classification loss $L_{cls}$ and the bounding box regression loss $L_{bbox}$. These two components are weighted through the balancing coefficient $\lambda$, i.e., $L = L_{cls} + \lambda L_{bbox}$.
The classification task employs the cross-entropy loss function, which quantifies the discrepancy between the probability distribution of model predictions and the distribution of ground-truth labels, thereby driving the optimization of classifier parameters. Combined with the Sigmoid function to map the outputs into a probability distribution, its mathematical expression is as follows:

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\log\left(p_{ic}\right)$$

where $N$ represents the total number of samples (including both positive and negative samples), $C$ denotes the total number of categories, $y_{ic}$ indicates the true label of the $i$-th sample for class $c$, and $p_{ic}$ represents the predicted probability that the $i$-th sample belongs to class $c$. The cross-entropy loss generates gradient signals proportional to the error through backpropagation, leading to substantial parameter updates when predictions are incorrect. In Faster R-CNN, this function is employed to determine whether candidate regions correspond to landslide targets or the background, enhancing the classifier’s discriminative ability through end-to-end training.
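A small worked example may help show how the cross-entropy loss penalizes wrong predictions. The sketch below evaluates the mean cross-entropy, −(1/N) Σᵢ Σ_c y_ic log(p_ic), for a hypothetical two-class case (landslide anomaly versus background); the probabilities are illustrative values, not model outputs.

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean cross-entropy over N samples with one-hot labels y_true."""
    total = 0.0
    for y_row, p_row in zip(y_true, p_pred):
        for y, p in zip(y_row, p_row):
            total -= y * math.log(max(p, eps))  # eps guards against log(0)
    return total / len(y_true)


# Two proposals: the first is a landslide anomaly, the second is background.
labels = [[1, 0], [0, 1]]
loss_good = cross_entropy(labels, [[0.9, 0.1], [0.2, 0.8]])  # mostly correct
loss_bad = cross_entropy(labels, [[0.1, 0.9], [0.8, 0.2]])   # mostly wrong
```

The mostly correct predictions give a loss of about 0.164, whereas the mostly wrong ones give about 1.956, roughly twelve times larger, which is the behaviour that drives large parameter updates on misclassified proposals.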
The bounding box regression task employs the Smooth $L_1$ loss function. By adopting a piecewise strategy, it maintains the smoothness of the $L_2$ loss function for small errors while inheriting the robustness of the $L_1$ loss function for large errors, making it the core loss function for bounding box regression in object detection models. Its mathematical expression is as follows:

$$\mathrm{Smooth}_{L_1}(y, \hat{y}) = \begin{cases} 0.5\,(y - \hat{y})^2, & |y - \hat{y}| < 1 \\ |y - \hat{y}| - 0.5, & \text{otherwise} \end{cases}$$

where $y$ denotes the true value, $\hat{y}$ represents the model’s predicted value, and $|y - \hat{y}|$ indicates the absolute value of the prediction error.
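The piecewise behaviour and the combination with the classification loss can be sketched as follows. The transition point at an absolute error of 1 is the standard Fast R-CNN definition and is assumed here; the balancing coefficient and the box coordinates are illustrative values, and the function names are hypothetical.

```python
def smooth_l1(y_true, y_pred):
    """Quadratic for |error| < 1, linear (minus 0.5) otherwise."""
    diff = abs(y_true - y_pred)
    return 0.5 * diff * diff if diff < 1.0 else diff - 0.5


def bbox_loss(box_true, box_pred):
    """Sum of Smooth L1 terms over the four box regression targets."""
    return sum(smooth_l1(t, p) for t, p in zip(box_true, box_pred))


def joint_loss(l_cls, l_bbox, balance=1.0):
    """Total loss: classification loss plus weighted regression loss."""
    return l_cls + balance * l_bbox


small_error = smooth_l1(0.0, 0.5)   # quadratic branch: 0.125
large_error = smooth_l1(0.0, 3.0)   # linear branch: 2.5
regression = bbox_loss((0.0, 0.0, 1.0, 1.0), (0.5, 0.0, 1.0, 4.0))
total = joint_loss(0.2, regression, balance=1.0)
```

The two branches meet at an error of 1 (both give 0.5), so the loss is continuous, while a single badly regressed coordinate contributes only linearly rather than quadratically.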
3.5.2. Evaluation Metrics
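The model performance is evaluated using the mean Average Precision ($mAP$) as the core metric, comprehensively assessing detection accuracy in multi-class scenarios. The $mAP$ combines the Precision-Recall ($P$-$R$) curve metric and computes the average of the Average Precision ($AP$) values across all classes. A higher $mAP$ value indicates greater model accuracy. The calculation process is as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$AP = \int_0^1 P(R)\,dR, \qquad mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i$$

where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, respectively, and $n$ is the number of classes.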
Because the model’s predictions for the image data include only two categories, landslide anomaly areas and non-landslide targets, the automatic identification of landslide anomaly areas is treated as a binary classification problem, and the metrics are calculated from a binary confusion matrix. Combined with the confusion matrix (Table 2), the model’s false positives and false negatives can be analyzed further.
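The metric computation can be sketched as follows for the single-class case, where mAP equals the AP of the landslide anomaly class. The sketch assumes that each detection has already been matched against the ground truth and flagged as a true or false positive, and it uses all-point interpolation of the P-R curve, which may differ from the interpolation used in this study. The scores and flags are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores, is_true_positive, n_ground_truth):
    """Area under the P-R curve (all-point interpolation)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:                      # sweep detections by confidence
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_ground_truth)
    # Make precision non-increasing with recall (precision envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, previous_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


p, r = precision_recall(tp=8, fp=2, fn=2)
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True],
                       n_ground_truth=4)
```

In this example three of four ground-truth anomalies are found with one false alarm ranked second, giving an AP of 0.625; the missed anomaly caps recall at 0.75 and the early false alarm lowers the precision envelope.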
4. Results and Analysis
4.1. InSAR Deformation Results
This study integrates both ascending and descending orbit datasets to obtain InSAR deformation information, achieving comprehensive monitoring of surface deformation in the reservoir area and significantly enhancing the accuracy and reliability of the monitoring results.
Figure 8 displays the spatial distribution of the annual average deformation rate along the radar LOS, where blue indicates surface displacement toward the satellite and red denotes displacement away from the satellite. Constrained by the high mountain-valley terrain and the side-looking imaging geometry of radar, the positive and negative deformation values serve merely as preliminary characterization of surface deformation activity.
As shown in Figure 8a, the deformation results from the ascending data reveal significant spatial heterogeneity in the average annual deformation rate across the study area. The maximum deformation rate reaches −79.2 mm/yr, primarily concentrated near Fengke Town and Donglian Village on the west side of the Jinsha River. The deformation results from the descending data, presented in Figure 8b, reveal an asymmetric distribution of extreme deformation values. The maximum deformation rate is −95.8 mm/yr, concentrated along the eastern side of the Jinsha River from Baiya to Fengke. The average annual deformation rate within the region primarily ranges from −35 to −30 mm/yr. Deformation activities are spatially clustered in a belt-like pattern along both banks of the Jinsha River, showing strong correlation with regional fault structures and the orientation of the reservoir shoreline.
Prominent deformation zones identified from the ascending data are primarily distributed on the western side of the Jinsha River, encompassing villages such as Labo, Gukongmei, and Baiya. The segment from Meigudi to Shuzhi, in particular, shows potential landslide instability due to the effects of the reservoir bank. In contrast, the descending data reveal more active deformation on the eastern side of the river basin. Regions such as Ligu, Ruziluo, and Kuzhi generally exhibit higher subsidence rates compared to the left bank of the Jinsha River, which may be strongly associated with rock mass creep on steep eastern slopes and well-developed fold-related faults. Notably, regions including Baiya, Xinjian, and Shuzhi show significant deformation signals in both ascending and descending data, suggesting the potential presence of multi-directional superimposed deformation in these areas.
4.2. Automatic Identification Results of Landslide Anomaly Areas Based on the CRF-Faster RCNN
4.2.1. Experimental Setup and Training Visualization
The experiments were implemented in the Python 3.9 programming environment and developed based on the PyTorch 1.10 deep learning framework. The hardware environment for running the experiments was a Linux 18.04 64-bit operating system, an AMD Ryzen 5 3600X CPU @ 3.70 GHz, 128 GB of memory, and an NVIDIA GeForce RTX 3080 Ti GPU. The specific experimental parameters are listed in Table 3.
4.2.2. Identification Results of Abnormal Landslide Areas in Ahai Reservoir Area
This study employs the Faster RCNN and CRF-Faster RCNN models for the automatic identification of landslide anomaly areas in InSAR deformation datasets, with the results shown in Figure 9 and Figure 10. Specifically, Figure 9 presents the identification results from the ascending images, while Figure 10 displays those from the descending images. In both figures, panels (a) and (b) show the identification results of the Faster RCNN and CRF-Faster RCNN models, respectively.
As shown in Figure 9 and Figure 10, the CRF-Faster R-CNN model demonstrates significant improvement in identifying landslide anomaly areas. In the ascending images, Faster RCNN identified 42 landslide anomaly areas, whereas the CRF-Faster RCNN model identified 57. In the descending images, Faster RCNN detected 74 landslide anomaly areas, while the CRF-Faster RCNN model identified 82. These results indicate that the proposed CRF-Faster RCNN model offers superior capability in detailed landslide recognition. However, overlapping detection boxes can be observed in the results, which is primarily due to the IoU threshold being set to 0.7 in this study, meaning that any prediction box with a probability exceeding 0.7 is recognized and retained as a landslide anomaly area. Furthermore, the study area is located in a high mountain valley region with substantial vegetation coverage, which may lead to decorrelation in the deformation results obtained by the SBAS-InSAR technique. Vegetation growth may also cause phase changes that are misinterpreted as surface deformation, leading to false identifications of landslides. Consequently, it remains challenging to completely exclude the influence of decorrelated areas during detection, which may introduce certain errors in the results.
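One way to see why overlapping boxes can survive is to read the 0.7 value as the IoU threshold of non-maximum suppression (NMS), under which a lower-scoring box is removed only if its IoU with a kept box exceeds 0.7. This reading is an assumption for illustration, since the text describes the threshold only briefly; the boxes and scores below are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: keep a box unless it overlaps a kept box too strongly."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 100, 100), (20, 0, 120, 100), (5, 0, 105, 100)]
scores = [0.95, 0.90, 0.80]
kept_at_07 = nms(boxes, scores, iou_threshold=0.7)
kept_at_05 = nms(boxes, scores, iou_threshold=0.5)
```

Here the first two boxes overlap with an IoU of about 0.67, so both are kept at a threshold of 0.7 but only one at 0.5. A permissive threshold therefore preserves adjacent anomalies at the cost of some visibly overlapping detections.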
4.3. Model Performance Evaluation
The identification capability of the model is evaluated using the loss function derived from the model testing data. In the experiments, the classification loss $L_{cls}$ employs the cross-entropy loss function, and the Smooth $L_1$ loss function is adopted for the bounding box regression loss $L_{bbox}$.
Figure 11 illustrates the variation trends of the loss functions for both the Faster RCNN and the proposed CRF-Faster RCNN models. As observed, the CRF-Faster RCNN model converges faster and achieves a lower loss value. The Faster RCNN model shows local peaks after 2000 iterations, accompanied by significant fluctuations in the loss value, as indicated by the red dashed rectangle in Figure 11a. In contrast, the CRF-Faster RCNN model shows only minor local peaks before 2000 iterations, with slight fluctuations in loss values, indicating that the CRF-Faster RCNN model performs better in object recognition tasks.
Figure 12 illustrates the trend of mAP for the Faster RCNN and CRF-Faster RCNN models. The two models employ different initial learning rates. The CRF-Faster RCNN model uses a lower initial learning rate, which enables it to learn image features more thoroughly and facilitates stable, rapid convergence towards the optimal solution. When the patience value is set to 10 epochs, the Faster RCNN model begins to converge after 20 epochs, whereas the CRF-Faster RCNN model converges after only 15 epochs. Furthermore, at the same iteration count, the CRF-Faster RCNN model demonstrates higher accuracy. The mAP of the Faster RCNN model eventually stabilized at 0.856, whereas the mAP of the CRF-Faster RCNN model converged to 0.888, approaching 0.9 asymptotically. These results demonstrate that the proposed CRF-Faster RCNN model not only converges faster but also achieves improved identification accuracy, rendering it more suitable for landslide identification tasks.
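The role of the patience value can be sketched as below, assuming it refers to an early-stopping criterion on validation mAP (it could equally refer to a learning-rate scheduler, which is not specified). The mAP history and the function name are hypothetical.

```python
def early_stopping_epoch(map_history, patience=10, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if
    the metric kept improving often enough."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(map_history, start=1):
        if value > best + min_delta:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical history: mAP improves for 3 epochs, then plateaus.
history = [0.50, 0.60, 0.70] + [0.70] * 10
stop_epoch = early_stopping_epoch(history, patience=10)
```

With a patience of 10, training in this example stops at epoch 13, ten epochs after the last improvement, so a model that plateaus earlier is also declared converged earlier.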
4.4. Results of Landslide Hazard Identification Using Combined Optical Imaging and Deep Learning
To address the limitations of relying solely on InSAR deformation data for landslide identification, we conducted comprehensive interpretation of automatically detected landslide anomaly zones using optical remote sensing imagery. This analysis incorporated textural features, vegetation coverage patterns, and topographic characteristics to validate potential landslides. Through systematic elimination of other surface deformation anomalies, 38 potential landslides were identified, as shown in Figure 13. Field verification confirmed an overall accuracy of 84% (32 of 38), comprising 15 confirmed historical landslides and 17 showing varying degrees of landslide evidence.
The ascending data detected 24 deformation zones, including 7 prominent deformation areas identified on the western bank of the Jinsha River. The maximum deformation rate observed was −79.2 mm/yr, as shown in Figure 13a. The descending data detected 22 deformation zones, with a maximum deformation rate of −53.59 mm/yr, and 8 significant deformation areas located on the eastern side of the river basin, as shown in Figure 13b.
Because the radar is right-looking, perpendicular to its flight direction, the ground projection of the InSAR observation vector points from west to east for the ascending track and from east to west for the descending track. This difference in observation geometry results in substantial differences in the surface coverage monitored by each track. As a consequence, only eight landslides (H3, H5, H8, H10, H11, H14, H15 and H16) were detected by both the ascending and descending data in this study.
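The reported counts can be cross-checked with a few lines of arithmetic. The sketch assumes that the 24 ascending and 22 descending deformation zones correspond one-to-one to inventory landslides, which the text implies but does not state explicitly.

```python
ascending_zones = 24
descending_zones = 22
detected_by_both = {"H3", "H5", "H8", "H10", "H11", "H14", "H15", "H16"}

# Inclusion-exclusion: landslides seen by at least one track.
unique_landslides = ascending_zones + descending_zones - len(detected_by_both)

# Field verification: confirmed historical plus those with landslide evidence.
verified = 15 + 17
accuracy = verified / unique_landslides
```

Under that assumption the two tracks together yield 24 + 22 − 8 = 38 unique landslides, matching the inventory size, and 32 verified cases out of 38 give approximately 84%, matching the reported accuracy.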
A total of nine typical landslides (H1, H8, H10, H11, H14, H16, H22, H30 and H31) were selected from the ascending and descending orbit data for in-depth analysis. Verification was conducted by overlaying and comparing InSAR-derived deformation monitoring points with high-resolution remote sensing images. Among these, the Baiya (H8), Xinjian (H10), Gukongmei (H14), Shuzhi (H16), and Ladingli (H30) landslides are paleolandslide masses that remain active to date. These landslides primarily develop in surface slopes, residual layers, and saturated or strongly weathered rock, classifying them as shallow or medium-thick landslides, and are closely associated with human activities.
Figure 14 presents the InSAR deformation rates and corresponding optical images for some of the potential landslide sites. The red solid lines delineate the landslide boundaries, the yellow dashed lines denote landslide subsidence areas, and the arrows represent the sliding directions.
4.5. Analysis of Typical Landslides
To further verify the accuracy of landslide identification, the Ligu (H3) landslide, which exhibits significant deformation, was selected for analysis. The H3 landslide is a giant ancient landslide located on the eastern bank of the Jinsha River Basin, adjacent to the river itself and in close proximity to the Ahai and Liyuan Hydropower Stations. Given its particular geographical setting, a potential reactivation of landslide movement at this location could trigger a cascade of adverse consequences. Specifically, it might result in river blockage, causing extensive damage to the downstream hydropower stations. This, in turn, could lead to subsequent flooding events and the onset of secondary geohazards. Such occurrences would pose a severe threat to the livelihoods and safety of the residents in downstream towns.
Figure 15 presents the descending deformation velocity along the LOS direction for the H3 landslide during the 2019–2021 period, derived using the SBAS-InSAR technique. The results reveal prominent deformation anomalies adjacent to the Jinsha River, corresponding to a newly identified landslide mass whose basal portion connects directly with the river channel. The maximum deformation rate of this newly identified landslide mass reaches −92.8 mm/yr. The landslide exhibits characteristic geomorphology with elevated margins and a depressed central portion, forming a distinct depression approximately 800 m wide, as shown in the white dashed line area in Figure 15c. The landslide boundaries are clearly delineated, within which the deformation rates show significant acceleration.
Figure 16a shows the optical characteristics of the H3 landslide. The landslide area exhibits low vegetation coverage, with predominantly exposed ground surface and locally steep slopes. The red dashed line marks the original boundary of the H3 landslide, measuring approximately 3000 m in length, 2500 m in width, and covering an area of about 7 km². Multiple gullies have developed along the longitudinal deposit zone of the landslide body, while tensile-shear cracks are sporadically distributed across its surface. The yellow dashed line indicates the boundary of the newly identified landslide section, and the white dashed lines mark typical gullies and cracks.
Figure 16b–f present field investigation photographs of the landslide.
Figure 16b,c show panoramic views of the newly identified landslide mass taken at different times, clearly illustrating the overall subsidence displacement and areal expansion of the landslide body.
Figure 16d–f depict the surrounding rock layers and internal debris condition of the landslide. Evidence of sliding events is preserved on both the ancient and new landslide surfaces, showing extensive landslide traces, including multiple scratch marks and mirror-like features. The lithology of the landslide mass is dominated by argillaceous slate and metamorphic sandstone, with surface cracks approximately 10 cm wide observed along the periphery. These phenomena indicate that the landslide is in a highly unstable state, and under the influence of factors such as river erosion and rainfall, it is highly susceptible to further instability.
5. Discussion
This study proposes a CRF-Faster RCNN model integrating CBAM, ResNet-50, and FPN, which realizes the automatic identification of landslide anomaly areas based on InSAR deformation features. The method improves recognition accuracy in complex terrain. Combined with high-resolution optical image verification and field investigation, it ensures the reliability of the identification results while maintaining efficiency, providing a feasible technical approach for the precise identification of landslide hazards. However, the method still has certain limitations. First, the recognition results are highly dependent on the quality and spatial continuity of InSAR data, which limits its application in areas with severe decorrelation, and the model’s ability to distinguish non-landslide deformation signals, such as those from engineering activities, remains inadequate. In addition, the current research is mainly based on a single model architecture, and a systematic multi-dimensional performance comparison with other mainstream landslide identification models has not yet been carried out.
Based on the current achievements and limitations, future research will proceed in the following directions. First, a multi-source remote sensing data fusion framework will be constructed to integrate optical imagery, LiDAR topographic features, and regional geological structure data, improving the model’s feature representation capability and generalization in complex environments. Second, multi-model coupling and comparison studies will be carried out to enhance the method’s versatility and practical engineering value by combining the advantages of different models. Third, weakly supervised and cross-regional transfer learning strategies will be explored for small-sample scenarios to reduce the model’s dependence on large labeled datasets, enhancing its applicability in data-scarce areas and providing more accurate and reliable technical support for geological disaster risk prevention and control.