1. Introduction
Earthquakes are a major trigger of geohazards: intense ground shaking reduces slope shear strength and induces extensive landslides that severely threaten human lives and infrastructure. Western China, a tectonically active region, faces high seismic risk and frequent geohazards, particularly in loess tablelands characterized by complex surface environments. Owing to the unique hydro-physical properties of loess, its structure is prone to collapse when saturated or subjected to external forces, leading to severe landslide hazards [1,2,3,4]. A stark example is the 1920 Haiyuan Earthquake, the largest seismic event ever recorded in loess terrain worldwide, which triggered 5384 landslides covering 218.78 km². The catastrophe claimed over 234,000 lives and caused devastating economic losses [5,6,7], critically impeding post-disaster rescue and reconstruction efforts. Consequently, rapidly identifying landslides and accurately mapping their distribution across loess tablelands after earthquakes is essential for guiding emergency response and damage assessment.
Modern remote sensing technologies have advanced rapidly, providing viable approaches for swift and accurate landslide detection and localization [8,9,10,11,12,13,14,15,16,17,18]. In particular, UAV photogrammetry has substantially improved the precision of geohazard detection. Methods for recognizing landslides in remote sensing imagery range from manual visual interpretation and pixel-based classification to object-oriented image analysis and deep learning models [19,20,21,22,23]. Deep learning architectures extract complex features from high-dimensional data through hierarchical representation learning, which allows them to replace manual detection while improving both the speed and the accuracy of target identification [24,25,26]. Researchers have successfully deployed diverse deep learning frameworks for automated landslide extraction, including convolutional neural networks (CNNs) [27,28,29], Fully Convolutional Networks (FCNs) [30], the DeepLabv3+ semantic segmentation algorithm [31], and ResU-Net models [32,33].
Driven by growing demand for geological hazard prevention and rapid advances in artificial intelligence, the intelligent recognition of earthquake-triggered landslides has progressed significantly in recent years [34,35]. Fu et al. (2023) applied an enhanced YOLOv4 algorithm to identify landslides triggered by the 2021 Haiti Ms 7.2 earthquake [36]. Ju et al. (2020) combined deep learning methods with Google Earth imagery to automatically detect historical landslides in typical loess regions of China [37]. Zeng et al. (2025) employed the IDNPM (InSAR Data–Newmark Physical Fusion Driver Model) to rapidly assess landslides induced by the 18 December 2023 Gansu Jishishan earthquake, enabling swift and accurate delineation of macro-scale landslide distributions [38]. Du et al. (2023) integrated CNNs and Transformer architectures, adopting a DETR network with a Transformer core for automated landslide detection [39]. Bai et al. (2024) developed a deep learning-based landslide extraction method for 1 m-resolution Google Earth imagery in which change features derived from object-oriented robust change vector analysis (RCVA) served as model inputs; their U-Net model, incorporating dense upsampling and asymmetric convolutions, improved detection accuracy [40]. Yang et al. (2022) studied loess landslide detection using GF-1 satellite imagery and DEM data, established a classified sample database of loess landslide remote sensing images and DEM features for their study area, and applied a channel-fused CNN model for landslide classification [41]. Other researchers have tested various models on the Bijie Landslide Database provided by Wuhan University [42]. Such research is often constrained either by the models themselves or by the use of lower-resolution satellite imagery, which has led to a predominant focus on single landslides or on landslide-prone areas within fixed regions. Consequently, neither the volume of landslide inventory data nor recognition accuracy has improved substantially. Furthermore, although numerous deep learning models exist for landslide detection, most are trained and applied on data in which landslides contrast distinctly with vegetation-rich backgrounds. In contrast, this study focuses on landslide detection in loess tablelands, where the minimal contrast between landslides and their surroundings makes recognition more challenging. Research on landslide object detection and segmentation in large-scale remote sensing imagery remains limited, particularly for the loess tableland area, where landslides range in size from approximately 0.3 km² to 20 km², exhibit diverse morphologies, and tend to cluster [43].
Existing public datasets predominantly use 640 × 640-pixel images, defining large targets as 160 × 160 pixels and small targets as ≤32 × 32 pixels. Baseline models with the standard three-level detection heads have limited small-target detection capability, while vanilla CNN architectures with single-scale convolutional kernels extract irregular landslide boundaries poorly, resulting in low extraction precision, blurred edges, high false-positive rates, and slow processing. To address these limitations, this study develops an enhanced model that integrates multiple approaches. Using 20 km² of centimeter-resolution UAV aerial imagery of loess tablelands acquired after the 2023 Gansu Jishishan Ms 6.2 earthquake, we trained the model with large-target samples. The key enhancement is a higher-resolution P2 detection head that increases feature map resolution so that micro-targets of 4–16 pixels are effectively covered, overcoming the original model's deficiency in small-target detection. Furthermore, an integrated workflow of mega-tile segmentation, landslide instance extraction, and tile mosaicking improves the timeliness and accuracy of landslide detection in loess tablelands.
3. Methods
After aerial data of the study area were acquired with a full-frame orthophoto camera mounted on a fixed-wing UAV, the Region 1 imagery was processed to establish an initial landslide dataset. Generative Adversarial Networks (GANs) and data augmentation techniques were then employed to expand the dataset, and landslide labels were created to produce the final annotated sample dataset for training instance segmentation models. Concurrently, the Region 2 imagery was processed into a digital orthophoto map (DOM) that served as inference data for validating the landslide instance segmentation model. The workflow is illustrated in Figure 3.
In parallel, the native YOLO architecture of the ultralytics framework was enhanced by adding a shallow P2 detection head alongside the original deep P3, P4, and P5 heads, which improves recognition accuracy for small targets within ultra-large images. Pre-trained YOLO weights were loaded into the modified network, which was then trained on the pre-constructed landslide sample dataset to develop an instance segmentation model for landslide detection. Instance segmentation is a computer vision task that extends object detection: it not only localizes individual objects but also accurately delineates their contours [49,50]. Training was refined iteratively until an optimal segmentation model was obtained, and this model was then deployed for inference and testing on landslides in Region 2. The overall methodology comprises three phases: (1) construction of the landslide sample dataset; (2) architectural enhancement of the native YOLO model; and (3) inference validation with the trained landslide instance segmentation model. Post-training evaluation, optimization, and iterative refinement are essential before final deployment.
3.1. YOLO and Baseline Models
This study employs YOLOv8 as the deep learning framework. The model's ability to detect multi-scale objects is governed primarily by its feature pyramid, which follows the principles of Feature Pyramid Networks (FPNs). Detection heads at different levels are responsible for targets within specific size ranges, and their effective coverage is tied to the resolution of the corresponding feature map (Figure 4). For a standard 640 × 640-pixel input, five successive stride-2 downsampling operations reduce the highest-level P5 feature map to 20 × 20. Each P5 cell therefore corresponds to a 32 × 32-pixel region of the original image (a stride of 32), so the smallest targets it can theoretically detect are about 32 pixels across. The intermediate P4 layer (stride 16, 40 × 40) primarily detects medium-sized targets of 16–64 pixels, and the P3 layer (stride 8, 80 × 80), closest to the input, handles smaller targets of typically 8–32 pixels. This standard three-head structure (P3–P5) therefore has limited capacity to capture targets below 8 pixels, creating detection blind spots; this is a significant limitation of the native YOLOv8 architecture.
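The grid size of each pyramid level follows directly from its stride. The short sketch below tabulates the levels for a 640 × 640 input, assuming the standard YOLOv8 convention that level Pk has stride 2^k; the function name pyramid_levels is illustrative only. It shows that adding a P2 level (stride 4) yields a 160 × 160 grid, twice the resolution of P3 along each axis.

```python
# Grid size and stride of each detection level for a square input.
# Assumption: level Pk has stride 2**k (P2 = 4, P3 = 8, P4 = 16, P5 = 32),
# the usual convention for YOLOv8 feature pyramids.

def pyramid_levels(input_size: int = 640, levels=(2, 3, 4, 5)):
    """Return {level_name: (stride, grid_size)} for a square input image."""
    table = {}
    for k in levels:
        stride = 2 ** k
        if input_size % stride:
            raise ValueError(f"input size {input_size} is not divisible by stride {stride}")
        table[f"P{k}"] = (stride, input_size // stride)
    return table


if __name__ == "__main__":
    # Prints one line per level, e.g. "P5: stride 32 px, 20 x 20 grid".
    for name, (stride, grid) in pyramid_levels(640).items():
        print(f"{name}: stride {stride:>2} px, {grid} x {grid} grid")
```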
3.2. Enhanced YOLO with Shallow Detection Heads
To overcome this limitation, this study adds a higher-resolution P2 detection head to YOLOv8's shallow architecture. Through upsampling, concatenation, and additional feature extraction between modules, this head integrates information from shallower feature layers and substantially increases feature map resolution, enabling the P2 layer to detect targets of 4–16 pixels and thereby alleviating the original model's deficiency in small-object recognition. The shallow-layer modification is indicated by the yellow region in Figure 5. It (1) initiates feature extraction and downsampling at Module 18; (2) performs cross-level feature fusion at Module 20 using outputs from Module 15; and (3) concatenates these fused features with the original Module 15 components to generate enhanced representations.
Deep network layers tend to lose the feature information of small targets, so shallow-layer detail must be preserved and fused with deep semantic information. The feature map output by the earlier C2 layer of the backbone (stride 4) has twice the resolution of the subsequent C3 layer (stride 8) and eight times that of the deepest C5 layer (stride 32) along each axis. This high resolution enables C2 to retain finer image structures and contour details. However, the original YOLOv8 design builds its feature pyramid solely from the C3, C4, and C5 layers, so the details carried by C2 are underutilized and rapidly diluted in subsequent processing.
The core innovation of our solution lies in establishing a bidirectional feature fusion pathway: (1) processing C2 features through convolution and normalization; (2) simultaneously upsampling C3 features; (3) merging the processed C2 features with upsampled C3 features at the pixel level; and (4) introducing a channel attention mechanism to adaptively balance the contributions of spatial details from C2 and semantic content from C3 in the fused features. The resulting P2 feature map preserves its native high-resolution advantage while incorporating a deeper semantic understanding, enabling the precise localization and effective recognition of minuscule targets. This hierarchical fusion strategy delivers dual benefits. Firstly, it creates bidirectional information exchange between shallow (C2) and deeper (C3) layers, substantially compensating for the spatial detail loss caused by repeated downsampling in deep networks. Secondly, the modular design of this enhancement allows direct integration into the standard YOLOv8 without extensive architectural modifications, ensuring seamless compatibility.
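The four fusion steps can be illustrated with a toy NumPy sketch. It is not the layer definition of the model in Figure 5: the 1 × 1 convolution with random weights, the per-channel standardization standing in for batch normalization, nearest-neighbour upsampling, concatenation as the pixel-level merge, and a squeeze-and-excitation style channel attention are all assumptions chosen for illustration, and the channel counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, weight):
    """1x1 convolution: mix channels at every pixel. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def normalize(x, eps=1e-5):
    """Per-channel standardization (stand-in for batch normalization)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return (x - mean) / (std + eps)


def upsample2x(x):
    """Nearest-neighbour 2x upsampling along height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style gating: global pool -> FC -> ReLU -> FC -> sigmoid."""
    squeezed = x.mean(axis=(1, 2))                  # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)         # (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # (C,), each gate in (0, 1)
    return x * gates[:, None, None], gates


def fuse_c2_c3(c2, c3, w_c2, w1, w2):
    c2_proc = normalize(conv1x1(c2, w_c2))          # (1) convolution + normalization on C2
    c3_up = upsample2x(c3)                          # (2) upsample C3 to the C2 resolution
    fused = np.concatenate([c2_proc, c3_up], 0)     # (3) pixel-aligned merge by concatenation
    return channel_attention(fused, w1, w2)         # (4) adaptive channel reweighting


# Toy shapes: a 640 x 640 input gives C2 at 160 x 160 and C3 at 80 x 80.
c2 = rng.standard_normal((16, 160, 160))
c3 = rng.standard_normal((32, 80, 80))
w_c2 = rng.standard_normal((16, 16)) * 0.1
channels = 16 + 32
w1 = rng.standard_normal((channels // 4, channels)) * 0.1
w2 = rng.standard_normal((channels, channels // 4)) * 0.1
p2, gates = fuse_c2_c3(c2, c3, w_c2, w1, w2)
print(p2.shape)  # (48, 160, 160)
```

The fused P2 map keeps the 160 × 160 resolution of C2 while carrying the upsampled C3 channels; the gates decide, per channel, how much each source contributes.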
3.3. Large-Scale Image Tile Staging Strategy
Tailored for geological remote sensing, this strategy tiles geocoded imagery so that landslide instance segmentation can be applied at large scale (Figure 6). The workflow begins with initialization: creating output directories, loading the raw GeoTIFF image, and retrieving critical metadata (image width, height, geotransform parameters, and projection information) with the GDAL library. The core procedure then calculates the overlap in pixels and the step size from the predefined tile dimensions (e.g., 1200 × 1200 pixels) and the overlap ratio (10–20% recommended): the step in the X direction equals tile_width - int(tile_width × overlap_ratio), and the step in the Y direction is computed analogously.
Tile starting positions are generated by traversing each image dimension at the computed step interval. Edge handling then checks whether the regular grid reaches the image boundary; if a residual strip narrower than one tile remains, a final tile is placed so that it ends flush with the image edge, ensuring complete coverage. Each tile is cut with GDAL's Translate function, whose srcWin parameter specifies the crop window while the original georeferencing and projection metadata are preserved, and LZW compression reduces storage. Integrated tqdm progress bars provide real-time feedback, displaying the current row and column position and the completion percentage so that technicians can monitor progress.
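The origin computation and edge rule can be sketched as follows. This is a minimal sketch, not the authors' script: the helper names tile_origins, tile_windows, and write_tiles and the output file naming are hypothetical, the edge rule is one reading of the description above, and the tqdm progress bar is omitted. Only gdal.Open, RasterXSize/RasterYSize, and gdal.Translate with srcWin and creationOptions are GDAL calls; GDAL is imported lazily so the pure-Python helpers run without it.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap_ratio: float) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    step = tile - int(tile * overlap_ratio)    # e.g. 1200 - int(1200 * 0.1) = 1080
    if step <= 0:
        raise ValueError("overlap_ratio must be smaller than 1")
    last_start = max(length - tile, 0)
    origins = list(range(0, last_start + 1, step))
    # Edge handling: if the regular grid stops short of the image edge,
    # add one final tile aligned flush with the edge.
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins


def tile_windows(width: int, height: int, tile: int = 1200,
                 overlap_ratio: float = 0.1) -> List[Tuple[int, int, int, int]]:
    """(xoff, yoff, xsize, ysize) windows in GDAL srcWin order, row by row."""
    windows = []
    for yoff in tile_origins(height, tile, overlap_ratio):
        for xoff in tile_origins(width, tile, overlap_ratio):
            windows.append((xoff, yoff, min(tile, width - xoff), min(tile, height - yoff)))
    return windows


def write_tiles(src_path: str, out_dir: str, tile: int = 1200, overlap_ratio: float = 0.1) -> None:
    """Crop a GeoTIFF into georeferenced, LZW-compressed tiles (requires GDAL)."""
    import os
    from osgeo import gdal  # lazy import: only needed when tiles are actually written

    os.makedirs(out_dir, exist_ok=True)
    src = gdal.Open(src_path)
    for xoff, yoff, xsize, ysize in tile_windows(src.RasterXSize, src.RasterYSize, tile, overlap_ratio):
        dst = os.path.join(out_dir, f"tile_{yoff}_{xoff}.tif")
        # srcWin crops in pixel space; Translate writes a geotransform shifted to the crop origin.
        gdal.Translate(dst, src, srcWin=[xoff, yoff, xsize, ysize],
                       creationOptions=["COMPRESS=LZW"])
```

For a 5000-pixel-wide image with 1200-pixel tiles and 10% overlap, tile_origins returns [0, 1080, 2160, 3240, 3800]: the last regular tile ends at 4440, so an extra tile starting at 3800 closes the gap to the edge.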
This method delivers significant advantages for landslide detection. Georeferencing preservation maintains precise coordinates and projections in each tile to support the subsequent spatial analysis. Configurable overlap mechanisms (10–15% recommended) effectively prevent landslide body truncation. The streaming processing design eliminates large memory requirements while handling multi-GB imagery. Intelligent edge handling guarantees complete spatial coverage. For landslide applications, parameters should be set according to feature dimensions: 1024 × 1024 pixels are suitable for small-to-medium landslides, whereas large landslides require 2048 × 2048 tiles. Projection distortion issues in high-latitude regions require special attention.
Key limitations include GDAL I/O bottlenecks that slow the processing of large images (significant time costs for GB-scale data), 30–40% storage overhead from the overlapping design, and tile counts that grow with image area, i.e., quadratically with the image's side length. Although the method has been deployed for provincial-scale landslide inventories and post-disaster emergency response, future enhancements could include multiprocessing parallelization (e.g., multiprocessing.Pool), cloud storage integration (AWS/Azure compatibility), GeoJSON tile-index generation, and adaptive tiling based on topographic complexity to further improve performance and utility.
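The storage overhead can be checked with a back-of-the-envelope estimate. With overlap ratio r, each interior tile adds only a (1 − r)² fraction of new area, so the stored pixels exceed the original by roughly 1/(1 − r)² − 1; ignoring the extra edge tiles is the simplifying assumption here, and the function name is illustrative.

```python
def overlap_overhead(overlap_ratio: float) -> float:
    """Approximate extra storage caused by tile overlap, ignoring edge tiles."""
    return 1.0 / (1.0 - overlap_ratio) ** 2 - 1.0


for r in (0.10, 0.15, 0.20):
    print(f"overlap {r:.0%}: ~{overlap_overhead(r):.0%} extra storage")
```

This gives about 23%, 38%, and 56% for 10%, 15%, and 20% overlap, so the 30–40% overhead quoted above corresponds to overlaps of roughly 12–16%.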
3.4. Dataset and Annotation
For deep learning applications, sample data must be partitioned into training, validation, and test datasets: the training data are used to learn target features, the validation data to select the optimal model, and the test data to evaluate model performance. A robust sample library is fundamental to deep learning-based landslide detection, because the quantity and quality of the samples largely determine recognition success. Guided by the principles of seismic emergency response and rapid disaster assessment, this study rapidly established a new sample database. We additionally employ GANs for multiscale augmentation of the initial landslide samples (Figure 7): through adversarial training, diverse synthetic loess landslide samples are generated to address sample scarcity in earthquake-affected areas. As illustrated in Figure 1, aerial imagery covering ~6 km² of Yangwa Village, Liuji Township (near the North Margin Fault of the Laji Mountains) was acquired as foundational data (Region 1). To optimize network training, all image samples were cropped to 1024 × 1024 pixels based on a joint analysis of pixel dimensions and landslide sizes.
From this area, 200 images containing ~300 landslides were selected as core samples. Because deep learning depends on large datasets, offline augmentation (mirroring, rotation, flipping, and brightness adjustment) was applied to improve recognition stability and accuracy, expanding the landslide samples to 2198. These were partitioned into training (1868 samples) and validation (330 samples) sets, a ratio of approximately 85:15. To rigorously validate the method's applicability to post-earthquake disaster assessment, the test set was strictly isolated: ~15 km² of aerial data covering Majia–Goujia Village in Liuji Township (Region 2) served as independent test samples (Table 1).
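Because the labels are polygons, each geometric augmentation must move the vertices consistently with the pixels. The minimal NumPy sketch below assumes YOLO-style normalized (x, y) vertices, with x along the image width and y along the height. The function names and the brightness factor are hypothetical, and GAN-based synthesis is not shown.

```python
import numpy as np

# A polygon label is an (N, 2) array of normalized (x, y) vertices in [0, 1].


def hflip(image, polygon):
    """Mirror left-right: column c -> W-1-c, so normalized x -> 1 - x."""
    flipped = polygon.copy()
    flipped[:, 0] = 1.0 - flipped[:, 0]
    return image[:, ::-1].copy(), flipped


def vflip(image, polygon):
    """Flip top-bottom: normalized y -> 1 - y."""
    flipped = polygon.copy()
    flipped[:, 1] = 1.0 - flipped[:, 1]
    return image[::-1].copy(), flipped


def rot90_ccw(image, polygon):
    """Rotate 90 degrees counter-clockwise (np.rot90, k=1): (x, y) -> (y, 1 - x)."""
    rotated = np.stack([polygon[:, 1], 1.0 - polygon[:, 0]], axis=1)
    return np.rot90(image, k=1).copy(), rotated


def adjust_brightness(image, factor):
    """Scale uint8 intensities and clip to [0, 255]; the polygon is unchanged."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)
```

For the rotation, a pixel at column c and row r of an H × W image moves to row W−1−c and column r, which in normalized pixel-centre coordinates is exactly (x, y) → (y, 1 − x).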
Landslide imagery was annotated with polygon labels in the LabelMe tool (Figure 8), generating one JSON file per image that contains the contour coordinates and category label of each landslide instance. Because LabelMe JSON is not directly compatible with YOLO, the annotations were converted to the YOLO instance segmentation format. This standardized format supports training mainstream segmentation models (e.g., YOLOv8-Seg) and is lightweight and efficient to parse. The workflow emphasizes tool adaptability (LabelMe → YOLO), technical execution (coordinate normalization), and quality control (visual verification), making it well suited to the geohazard detection tasks in this study.
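The conversion can be sketched in a few lines, assuming LabelMe's standard JSON keys (shapes, label, points, shape_type, imageWidth, imageHeight) and a single landslide class mapped to index 0. The function names are hypothetical. Each output line is a class index followed by the polygon vertices normalized by image width and height, which is the YOLO segmentation label layout.

```python
import json
from typing import Dict, List


def labelme_to_yolo_seg(labelme: dict, class_ids: Dict[str, int]) -> List[str]:
    """Convert one LabelMe annotation dict into YOLO segmentation label lines."""
    width = float(labelme["imageWidth"])
    height = float(labelme["imageHeight"])
    lines = []
    for shape in labelme.get("shapes", []):
        if shape.get("shape_type", "polygon") != "polygon" or len(shape["points"]) < 3:
            continue  # only closed polygons become instance masks
        coords = []
        for x, y in shape["points"]:
            # Normalize to [0, 1] and clamp vertices drawn slightly past the border.
            coords.append(min(max(x / width, 0.0), 1.0))
            coords.append(min(max(y / height, 0.0), 1.0))
        cls = class_ids[shape["label"]]
        lines.append(" ".join([str(cls)] + [f"{c:.6f}" for c in coords]))
    return lines


def convert_file(json_path: str, txt_path: str, class_ids: Dict[str, int]) -> None:
    """Read a LabelMe JSON file and write the matching YOLO .txt label file."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in labelme_to_yolo_seg(data, class_ids)))
```

For example, a triangle with vertices (0, 0), (512, 0), and (512, 256) on a 1024 × 1024 tile becomes the line "0 0.000000 0.000000 0.500000 0.000000 0.500000 0.250000".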
3.5. Progressive Training Strategy
Progressive training is a phased, hierarchical model optimization strategy. It uses a three-stage mechanism (freezing network layers, partially unfreezing parameters, and global fine-tuning) to balance model stability with parametric adaptability. Unlike conventional training, in which all layers are updated simultaneously, progressive training activates parameters in stages, which gives finer control over gradients, stabilizes feature learning, and uses resources more efficiently. When applied to complex detection architectures such as YOLO, it can accelerate convergence and improve small-object detection accuracy. Its main advantages are the mitigation of premature convergence, faster training, improved robustness and performance, broad applicability, and a reduced memory footprint, since frozen layers require no gradient computation.
3.6. Batch Segmentation and Coordinate Extraction Technique
The workflow operates within a Python 3.8+ environment utilizing libraries including ultralytics, GDAL, numpy, Pillow, and opencv-python, covering the entire process from remote sensing image reading and preprocessing to object detection, segmentation visualization, and georeferenced result export. For image processing, the system handles diverse data types through a normalize_image function that standardizes pixel values to the 0–255 range for model compatibility. Geospatial integrity is preserved when processing TIFF files via GDAL, which maintains geotransform parameters and projection metadata throughout loading and saving operations. The architecture supports 8-bit, 16-bit, and floating-point remote sensing data, with critical functions like load_tiff and normalize_image incorporating robust exception handling for immediate error feedback. Automated normalization dynamically adapts to varying data ranges, minimizing manual intervention. Enhanced visualization capabilities generate intuitive segmentation outputs featuring adjustable bounding boxes, centroids, and masks, and customizable parameters like line width and font size for analytical clarity. Key technical challenges stem from the inherent characteristics of remote sensing data: multispectral bands, large volumes, and varied formats create significant preprocessing and model adaptation hurdles across sensor platforms. Additionally, accurate transformation of pixel coordinates to real-world geographic positions requires the precise application of geotransform parameters, where any deviation compromises geolocational fidelity.
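The text names a normalize_image function but does not specify its stretch. One plausible sketch, assuming a per-band percentile stretch (plain min-max by default), pass-through for data already stored as 8-bit, and NaN or nodata pixels mapped to 0, is shown below; the signature is hypothetical and not the authors' implementation.

```python
from typing import Optional

import numpy as np


def normalize_image(array: np.ndarray, nodata: Optional[float] = None,
                    low_pct: float = 0.0, high_pct: float = 100.0) -> np.ndarray:
    """Rescale an 8-bit, 16-bit or floating-point raster to uint8 in [0, 255].

    Layout is (H, W) or (H, W, bands). uint8 input is returned unchanged; other
    types are stretched band by band between the low/high percentiles of the
    valid pixels (min-max by default). NaN and nodata pixels become 0.
    """
    if array.dtype == np.uint8:
        return array.copy()
    data = array.astype(np.float64)
    if data.ndim == 2:
        data = data[..., np.newaxis]
    out = np.zeros(data.shape, dtype=np.uint8)
    for b in range(data.shape[-1]):
        band = data[..., b]
        valid = np.isfinite(band)
        if nodata is not None:
            valid &= band != nodata
        if not valid.any():
            continue                        # fully invalid band stays 0
        lo, hi = np.percentile(band[valid], [low_pct, high_pct])
        if hi <= lo:
            continue                        # constant band: no contrast to stretch, stays 0
        scaled = np.clip((band - lo) / (hi - lo) * 255.0, 0.0, 255.0)
        out[..., b] = np.where(valid, scaled, 0.0).astype(np.uint8)
    return out[..., 0] if array.ndim == 2 else out
```

Setting low_pct and high_pct to values such as 2 and 98 would make the stretch robust to a few extreme pixels, at the cost of clipping them.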
4. Experimental Results
Built upon the YOLO framework, this research advances small-target detection capabilities through an optimized detection head, coupled with a processing pipeline of large-scale tiling, batch segmentation, and tile mosaicking. Integrated with progressive training, the method enables efficient annotation, training, detection, and refined segmentation of loess landslide data in the Jishishan seismic zone.
4.1. Model Training Outcomes
This study rigorously compares conventional training with a progressive training protocol, each spanning 500 epochs. The progressive strategy comprises three optimization phases: ① foundation feature stabilization (epochs 0–150), in which the backbone is fully frozen to preserve pre-trained feature representations and mitigate early-stage overfitting; ② mid-level feature tuning (epochs 151–350), in which intermediate backbone layers are selectively unfrozen to optimize feature extraction and balance localization and classification performance; and ③ global feature refinement (epochs 351–500), in which the whole network is unfrozen (100% trainable parameters) to enhance small-target detection under overfitting controls. The quantitative evaluation shows that progressive training improves accuracy: mAP50(M) increased from 0.704 (conventional) to 0.747 (progressive), a relative gain of 6.1%, and mAP50-95(M) rose from 0.413 to 0.468 (+13.3%). Inference speed decreased marginally from 45 FPS to 43 FPS (−4.4%), while the model size remained identical (87 MB) under both methods. Overall, progressive training delivers substantial accuracy gains with negligible computational cost and no change in model size (Table 2).
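The three-phase schedule can be expressed as a function from epoch to the set of trainable layers. This is a framework-agnostic sketch, not the authors' training code. Treating layers 0–9 as the backbone matches the layer ordering of the standard YOLOv8 model configuration. The choice of layers 4–9 as the "intermediate" slice, the helper names, and the layer count in the test are hypothetical. Depending on how the optimizer was constructed, it may need to be rebuilt at each phase change so that newly unfrozen parameters are updated.

```python
from typing import Iterable, Set

# Phase boundaries follow the schedule described above (epochs are 0-based).
PHASE1_END = 150   # backbone fully frozen up to and including this epoch
PHASE2_END = 350   # intermediate backbone layers unfrozen up to this epoch


def trainable_layers(epoch: int, num_layers: int,
                     backbone_end: int = 10,
                     partial: range = range(4, 10)) -> Set[int]:
    """Return indices of layers that should receive gradient updates at `epoch`.

    Hypothetical layout: layers [0, backbone_end) form the backbone and the rest
    are neck and heads; `partial` is the slice of backbone layers unfrozen in phase 2.
    """
    neck_and_heads = set(range(backbone_end, num_layers))
    if epoch <= PHASE1_END:
        return neck_and_heads                   # phase 1: whole backbone frozen
    if epoch <= PHASE2_END:
        return neck_and_heads | set(partial)    # phase 2: intermediate layers unfrozen
    return set(range(num_layers))               # phase 3: every layer trainable


def apply_freeze(layers: Iterable, trainable: Set[int]) -> None:
    """Toggle requires_grad on every parameter (e.g. a sequence of torch.nn.Module layers)."""
    for idx, layer in enumerate(layers):
        for param in layer.parameters():
            param.requires_grad = idx in trainable
```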
Box_loss converged rapidly from an initial 2.0 to below 1.0, indicating a stable improvement in landslide localization, and stabilized after 400 epochs (Figure 9A). Seg_loss dropped from 5.0 to approximately 1.0, indicating effective optimization of edge segmentation with minimal late-stage fluctuation (Figure 9B). Cls_loss converged steadily below 0.5, confirming improved landslide/non-landslide classification (Figure 9C). Dfl_loss decreased smoothly below 0.5, indicating reliable prediction of the bounding box probability distribution (Figure 9D). mAP50(B) (object detection, IoU = 0.5) reached 0.6 and was still rising, reflecting high landslide detection accuracy (Figure 9E). mAP50-95(B) (IoU = 0.5–0.95) improved progressively to 0.4, showing robust performance across localization precision requirements (Figure 9F). mAP50(M) (instance segmentation, IoU = 0.5) approached 0.6, confirming that the segmented contours align closely with the ground truth (Figure 9G). mAP50-95(M) (IoU = 0.5–0.95) reached 0.4 without saturating, suggesting room for further refinement in complex landslide segmentation (Figure 9H). The train_seg_loss descended to a stable low plateau by epoch 500, confirming effective feature learning and fitting of the training data. Because a low training loss alone does not rule out overfitting, the validation behavior must also be examined: the val_seg_loss declined strongly at first and then gradually stabilized, indicating sound generalization, although this should be corroborated by validation mAP metrics to ensure reliable field deployment.
The convergence of all four losses within 500 epochs confirms training stability. With mAP50(B/M) > 0.6 and mAP50(M) = 0.747, the algorithm detects and segments seismic loess landslides effectively. The stricter segmentation score (mAP50-95(M) = 0.468) is lower than the mAP50 metrics, reflecting the characteristic difficulty of delineating irregular geological boundaries. The approach thus enables rapid, accurate landslide detection in loess tablelands, which is critical for emergency geohazard response.
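The gap between mAP50 and mAP50-95 comes from how matches are counted. A prediction is a true positive at a given threshold only if its mask IoU with a ground-truth landslide reaches that threshold. mAP50 uses the single threshold 0.5, whereas mAP50-95 averages AP over ten thresholds from 0.50 to 0.95. The NumPy sketch below shows only this matching step; ranking by confidence and precision-recall integration are omitted, and the two toy masks are illustrative.

```python
import numpy as np

# mAP50-95 averages AP over ten IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)


def mask_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over union of two boolean masks with the same shape."""
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(intersection) / float(union) if union else 0.0


def matched_thresholds(iou: float) -> np.ndarray:
    """Flags: does a prediction with this IoU count as a true positive at each threshold?"""
    return IOU_THRESHOLDS <= iou


# Two 6 x 6 squares offset by one pixel in each direction.
pred = np.zeros((10, 10), dtype=bool)
pred[2:8, 2:8] = True
truth = np.zeros((10, 10), dtype=bool)
truth[3:9, 3:9] = True
iou = mask_iou(pred, truth)               # 25 / 47, about 0.53
print(f"IoU = {iou:.3f}, counted as TP at {matched_thresholds(iou).sum()} of 10 thresholds")
```

A one-pixel boundary offset on a small object already limits the match to one threshold out of ten, which illustrates why irregular landslide boundaries depress mAP50-95 far more than mAP50.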
4.2. Ablation Experiments
To evaluate how different modules affect small-target detection in YOLOv8, we conducted ablation experiments. Comparing the native model with progressively enhanced variants validated the effectiveness of each module and the need for joint optimization. Starting from the baseline YOLOv8n, we incrementally introduced the P2 small-target detection head and assessed its individual and combined effects. By adding a high-resolution detection layer, the P2 head substantially improved small-target feature representation: under progressive training, it raised mAP@0.5 by 8.4%, mAP@0.5:0.95 by 16.1%, and small-target recall by 17.4%, at a computational cost increase of 2.5 × 10⁹ FLOPs (2.5 GFLOPs). Although parameters and computation increased, the accuracy gains were substantial. These findings demonstrate the practical value of integrating shallow detection heads for small-target tasks and provide empirical evidence for balancing real-time performance and accuracy (Table 3).
4.3. Landslide Segmentation Results
Geospatial data processing and visualization use GDAL to read the image geotransform parameters and projection metadata precisely. Pixel coordinates are converted to geographic coordinates through the affine geotransform with sub-pixel accuracy (<0.5 pixel error). For visualization, OpenCV contour analysis computes mask centroids, and alpha blending (addWeighted) overlays green masks on the original imagery. All outputs retain the original georeferencing. The key feature is an adaptive rendering step that handles 8- to 16-bit imagery while preserving native coordinate systems and projections. It supports both single-band grayscale and multiband color images and leverages OpenCV hardware acceleration for high-throughput processing, providing the precision required for geohazard monitoring and visualization (Figure 10).
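A minimal sketch of the centroid-to-coordinate step is given below. It assumes one binary mask per landslide and GDAL's six-element geotransform convention (origin X, pixel width, row rotation, origin Y, column rotation, pixel height), as returned by Dataset.GetGeoTransform(). For a binary mask, the OpenCV moment ratios m10/m00 and m01/m00 equal the mean column and row of the mask pixels, so NumPy is used here in place of cv2.moments. The helper names and the example geotransform values are hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np


def mask_centroid(mask: np.ndarray) -> Tuple[float, float]:
    """Centroid (col, row) of a binary mask, equal to (m10/m00, m01/m00) image moments."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("empty mask")
    return float(cols.mean()), float(rows.mean())


def pixel_to_geo(col: float, row: float, gt: Sequence[float]) -> Tuple[float, float]:
    """Map pixel indices to geographic coordinates with a GDAL-style geotransform.

    GDAL's geotransform refers to the top-left corner of a pixel, so 0.5 is added
    to move from a pixel index to the pixel centre.
    """
    x = col + 0.5
    y = row + 0.5
    return (gt[0] + x * gt[1] + y * gt[2],
            gt[3] + x * gt[4] + y * gt[5])


# Hypothetical 5 cm north-up tile in a projected CRS.
gt = (500000.0, 0.05, 0.0, 3950000.0, 0.0, -0.05)
mask = np.zeros((40, 40), dtype=bool)
mask[10:20, 20:30] = True
col, row = mask_centroid(mask)            # (24.5, 14.5)
print(pixel_to_geo(col, row, gt))         # about (500001.25, 3949999.25)
```

Because each tile written with srcWin carries its own shifted geotransform, the tile's geotransform (not the mosaic's) should be used for masks expressed in tile pixel coordinates.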
Following batch instance segmentation and coordinate extraction, the large number of tiled TIFF files was consolidated with the Mosaic To New Raster tool in ArcMap 10.8.1. After parameter optimization, including coordinate system adjustment, pixel type specification, configuration of the number of bands, and selection of the mosaic colormap mode, a seamless 15 km² orthorectified DOM product was generated, as illustrated in the accompanying figure. The centroids of the identified landslides were stored in detected_cords.txt.
Statistical analysis of the processed tiles revealed 417 detected landslides, with orthographic areas ranging from 0.217 km² (maximum) to 7 × 10⁻⁴ km² (minimum). Landslide areas were predominantly distributed within the 0–0.02 km² range (aggregate area: 1.3 km²), confirming the prevalence of small-scale landslides. Visual verification identified eighteen false positives (false discovery rate: 4.3%) and seven missed/partial detections (omission rate: 1.6%). These accuracy metrics align with Chen et al.'s findings derived from GaoFen-1/6 imagery in the same seismic context [48].
The study area comprises three distinct sectors: northern Goujia Village, central Majia Village, and southern Liugou Township (Figure 11A). Landslides in northern Goujia are markedly clustered, with a high density per unit area and frequent coalescence into contiguous failure zones. This aggregation likely results from homogeneous geological conditions, topographic constraints, and concentrated seismic shaking. These clustered failures appear as large-scale morphological features in medium- to high-resolution imagery, predominantly with arcuate (horseshoe-shaped), dendritic, fan-shaped, or elongated tongue-like configurations that extend considerable distances downslope. Linear failures along gully systems are a frequently observed, distinct failure pattern (Figure 13B). Near Hongtuwa in particular, coalescing landslides form expansive complexes that pose significant secondary hazards. In central Majia Village, landslides concentrate predominantly along loess platform margins and adjacent gullies (Figure 11B), where seismic amplification destabilizes slope toes. Steep escarpments (>45°) are more susceptible because of amplified inertial forces during ground shaking. Anthropogenic activities further exacerbate the risk: improper irrigation and slope excavation trigger failures along modified terraces and cultivated field edges (Figure 11C), with 68% of landslides occurring within 200 m of human-altered terrain. Southern Liugou features dispersed, small-scale landslides concentrated near the portals of the highway tunnels that traverse the village. These pose critical secondary hazards requiring urgent mitigation, as evidenced by debris flow channels extending toward residential clusters.
The analysis reveals three defining characteristics of landslides in loess tableland terrain. Spatially, failures concentrate densely along river valleys and gullies, demonstrating strong topographic and structural control. Morphologically, diverse failure types, including shear-driven, liquefaction-induced, and seismic subsidence landslides, exhibit distinct slip surface geometries and triggering mechanisms. Temporally, synchronized multi-point failures occur during seismic events, predominantly generating small-to-medium landslides (hundreds to thousands of square meters), although occasional large-scale failures (>10⁴ m²) cause significant damage. By comparison, loess collapses and slumps are typically smaller (tens to hundreds of square meters).
5. Discussion
5.1. Feasibility Analysis of the Small-Target Segmentation Strategy
Through 3D model verification and field investigations, 18 false positives were identified and grouped into three primary error types: (1) topographic shadow misclassification, in which shadows cast by steep ridge tops under solar illumination mimic landslide scarps in optical imagery and lead to erroneous delineation (Figure 12A); (2) complex texture artifacts, in which terraced fields, erosional features, and ridge intersections form pseudo-circular or arcuate boundaries resembling landslide crowns (Figure 12B); and (3) anthropogenic feature confusion, in which haystack clusters on artificially modified earthen embankments mimic landslide morphology (Figure 12C). Additional limitations include partial segmentations (Figure 12D) and minor omissions (Figure 12E). Notably, 14 loess collapse features were misclassified as landslides (Figure 12F and Figure 13D). Because this study focuses on a rapid regional assessment of seismic hazards, and because collapses are critical secondary geohazards in loess terrain, these features were excluded from the false positive tally. The tile size used in processing (1200 × 1200 pixels) caused segmentation discontinuities at tile margins without compromising the recognition of whole landslides: in three instances, a single landslide body was split into multiple parts, yielding a total of nine segments that are preserved in the detected_cords.txt file. For convenience in statistical analysis and subsequent work, these segments were not deleted (Figure 12G). Crucially, the model discriminated well against spectrally similar features: rural earthen roads were never misclassified (Figure 12H), which supports the quality of the training samples and the model's recognition precision.
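The tile-boundary splitting described above can be illustrated with a short sketch of a non-overlapping 1200 × 1200 pixel tile grid, the shift from tile-local to mosaic coordinates, and a check that flags segments touching a tile border (candidates for a landslide split across tiles). Only the tile size comes from the text; the function names, the 3000 × 2000 pixel example mosaic, the absence of tile overlap, and the 2-pixel edge margin are assumptions, and this is not the authors' implementation.

```python
TILE = 1200  # tile edge length in pixels, as stated in the text


def tile_grid(width, height, tile=TILE):
    """Return (x0, y0, w, h) for non-overlapping tiles covering a mosaic.

    Tiles on the right and bottom edges are clipped to the mosaic size.
    """
    return [
        (x0, y0, min(tile, width - x0), min(tile, height - y0))
        for y0 in range(0, height, tile)
        for x0 in range(0, width, tile)
    ]


def to_global(polygon, x0, y0):
    """Shift a tile-local polygon [(x, y), ...] into mosaic pixel coordinates."""
    return [(x + x0, y + y0) for x, y in polygon]


def touches_tile_edge(polygon, w, h, margin=2):
    """True if any vertex lies within `margin` pixels of the tile border.

    Such segments are candidates for one landslide split across adjacent tiles.
    """
    return any(
        x <= margin or y <= margin or x >= w - 1 - margin or y >= h - 1 - margin
        for x, y in polygon
    )


grid = tile_grid(3000, 2000)
print(len(grid), grid[-1])  # 6 (2400, 1200, 600, 800)

seg = [(1150, 300), (1199, 300), (1199, 500), (1150, 500)]  # hypothetical segment
x0, y0, w, h = grid[0]
print(touches_tile_edge(seg, w, h))         # True: it reaches the right border
print(to_global([(10, 20)], *grid[4][:2]))  # [(1210, 1220)]
```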
Apart from the documented errors, the overall recognition success rate reached 94.1%. With high-resolution UAV data, precise instance segmentation training proved highly effective: nearly all detected landslides are newly formed failures characterized by short formation times, high spectral reflectance, and complete slope disintegration. Conversely, relict landslides that are spectrally homogeneous with their surroundings and have undergone anthropogenic modification (e.g., conversion to farmland or villages) were never misidentified, and all false positives fall into the categories outlined above.
The model achieves a processing speed exceeding 40 FPS, with minute-level latency for batch operations, while maintaining high precision: its mAP and recall values meet commonly used benchmarks for high-performing models, and the recognition rate reaches 94.1%. These results validate the capability of the proposed small-target segmentation strategy for rapid, intelligent detection and accurate delineation of landslides in the study area.
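To put the reported throughput in context, the following back-of-the-envelope sketch estimates how many 1200 × 1200 pixel tiles an area yields and how long inference takes at 40 FPS, treating one tile as one frame. The ground sampling distance (0.1 m in the usage below) is a hypothetical value, not one taken from this study, and I/O, tiling, and mosaicking overheads are ignored.

```python
import math


def tiles_and_seconds(area_km2, gsd_m, tile_px=1200, fps=40.0):
    """Estimate tile count and pure inference time for a tiled survey area.

    tile_px and fps follow the text; area_km2 and gsd_m (metres per pixel)
    are inputs. Tiles are assumed non-overlapping, one tile per frame.
    """
    tile_side_m = tile_px * gsd_m
    n_tiles = math.ceil(area_km2 * 1e6 / tile_side_m ** 2)
    return n_tiles, n_tiles / fps


n, seconds = tiles_and_seconds(20, 0.1)  # 0.1 m GSD is a hypothetical value
print(n, round(seconds, 1))  # 1389 34.7
```

Under these assumptions, an area of about 20 km² yields roughly 1389 tiles and about 35 s of pure inference time, the same order of magnitude as the minute-level batch latency reported above.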
5.2. Characteristics of Landslide Hazards in Loess Tableland Areas
Morphological and genetic characteristics of loess tableland landslides were elucidated through high-precision modeling and field validation. Statistical analysis reveals near-vertical scarps (70–90°) controlled by well-developed vertical joints in loess (Figure 13A,D). Failure morphologies are diverse, including arcuate (horseshoe-shaped), dendritic, fan-shaped, and elongated tongue-like forms with considerable downslope extents, particularly along gullies (Figure 11B,C), as well as armchair-shaped failures with curved main scarps and lateral confinement. Runout distances vary substantially (from meters to hundreds of meters) and are influenced by landslide scale, slope gradient, and water content, with saturated loess flows being highly mobile [51,52]. Overall, the landslide distribution is spatially heterogeneous across regions, but most areas show pronounced clustering, with numerous failures occurring close together at a high density per unit area.
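Clustering of this kind can be quantified in several ways; one standard option, not necessarily the one used here, is the Clark–Evans nearest-neighbour ratio R: the mean observed nearest-neighbour distance divided by the value expected for a random (Poisson) pattern of the same density, 0.5/√(n/A). R < 1 indicates clustering, R ≈ 1 randomness, and R > 1 regular spacing. The sketch below assumes landslide locations are represented by centroids in projected metres and applies no edge correction; the example coordinates are invented.

```python
import math


def clark_evans_ratio(points, area):
    """Clark-Evans nearest-neighbour ratio R for point locations in a study area.

    points: [(x, y), ...] in projected metres; area: study-area size in m^2.
    No edge correction is applied, so R is biased near the area boundary.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # distance from point i to its nearest other point
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)  # mean NN distance for a random pattern
    return observed / expected


# Invented examples in a 1 km x 1 km area (1e6 m^2).
clustered = [(100, 100), (105, 100), (100, 105), (105, 105),
             (900, 900), (905, 900), (900, 905), (905, 905)]
regular = [(x, y) for x in (500 / 3, 500, 2500 / 3) for y in (500 / 3, 500, 2500 / 3)]
print(round(clark_evans_ratio(clustered, 1e6), 3))  # about 0.028 (strong clustering)
print(round(clark_evans_ratio(regular, 1e6), 3))    # about 2.0 (regular spacing)
```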
Figure 13.
Field validation findings of landslides. (A) Landslide mass controlled by well-developed vertical joints in loess; (B) Gully slope landslide; (C) Extensive loess landslide mass; (D) Landslide mass controlled by well-developed vertical joints in loess.
5.3. Optimization Recommendations and Future Work
Our experimental observations reveal an inherent constraint: the limited number of landslides within the initial study area necessitates spatial expansion for larger-scale analysis. However, broadening the investigation scope introduces greater morphological diversity among landslides, which increases extraction complexity. To address this, we implement an incremental detection strategy. Models are first trained on smaller sub-regions and then applied to adjacent areas for preliminary detection. Newly identified landslides are iteratively incorporated into the training set, progressively enriching sample diversity and refining model performance. This cyclic optimization allows detection to expand gradually across the loess terrain until full coverage is achieved.
Furthermore, introducing the P2 detection head in YOLOv8 significantly enhanced small-landslide detection, improving mAP@0.5 by 8.4% and mAP@0.5:0.95 by 16.1% under progressive training. However, it also increased computational overhead by 15–30% and raised model complexity, potentially hindering edge deployment. Our forthcoming solution incorporates a Feature Enhancement Module (FEM) that addresses these limitations through multi-scale feature fusion and channel attention. This upgrade inserts a triple-branch structure at the end of the backbone: dilated convolutions capture large-receptive-field geological features, depthwise separable convolutions refine small-target textures, and channel attention amplifies landslide-sensitive features. The branch outputs are combined by weighted fusion and passed to the detection heads.
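A rough way to see why the FEM branch types suit a low computational budget is to compare standard parameter counts. The sketch below uses textbook formulas for a standard convolution, a depthwise separable convolution, the spatial extent of a dilated kernel, and the two fully connected layers of a squeeze-and-excitation (SE) block. The 256-channel width, the reduction ratio of 16, and the use of SE as the channel attention design are assumptions for illustration; the FEM's actual configuration is not specified here.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def dilated_extent(k, d):
    """Side length in pixels spanned by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def se_params(c, r=16):
    """Weights of the two fully connected layers in an SE block (bias omitted)."""
    return 2 * c * (c // r)


C = 256  # hypothetical channel width at the end of the backbone
std = conv_params(3, C, C)
dws = depthwise_separable_params(3, C, C)
print(std, dws, round(dws / std, 3))               # 589824 67840 0.115
print([dilated_extent(3, d) for d in (1, 2, 3)])   # [3, 5, 7]
print(se_params(C))                                # 8192
```

Under these assumptions, a 3 × 3 depthwise separable convolution needs about 11.5% of the weights of a standard 3 × 3 convolution, while dilation rates of 2 and 3 widen a 3 × 3 kernel's extent to 5 and 7 pixels without adding any weights.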
The FEM demonstrably improves the preservation of landslide edge features, boosts P2-layer small-target recall by ≥10%, resolves feature dilution with minimal computational impact, and, through its attention mechanism, effectively suppresses vegetation and bare-soil interference in remote sensing imagery, notably enhancing the discrimination of geological textures.
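The incremental detection strategy described at the start of this subsection can be sketched as a simple loop: train on the current label set, detect in the next adjacent region, and add the new detections to the training set. In the sketch, `train`, `detect`, and the optional `accept` filter (for example, a manual review step) are hypothetical stand-ins for a real training and inference pipeline, and landslide samples are represented as plain strings.

```python
from typing import Callable, Iterable, List, Optional, Set


def incremental_detection(
    regions: Iterable[str],
    train: Callable[[Set[str]], object],
    detect: Callable[[object, str], List[str]],
    seed_labels: Set[str],
    accept: Optional[Callable[[str], bool]] = None,
) -> Set[str]:
    """Grow a training set region by region (hypothetical sketch).

    regions: sub-regions ordered so each one is adjacent to those already covered.
    train:   builds a model from the current set of labelled landslides.
    detect:  returns landslides detected by the model in one region.
    accept:  optional filter, e.g. manual review; accepts everything if None.
    """
    labels = set(seed_labels)
    for region in regions:
        model = train(labels)  # retrain on everything labelled so far
        for sample in detect(model, region):
            if accept is None or accept(sample):
                labels.add(sample)  # enrich the training set
    return labels


# Toy usage: the "model" is just the training-set size, and each region
# yields two detections.
history = []


def toy_train(labels):
    history.append(len(labels))
    return len(labels)


def toy_detect(model, region):
    return [f"{region}-ls{i}" for i in range(2)]


final = incremental_detection(["B", "C"], toy_train, toy_detect, {"A-ls0", "A-ls1", "A-ls2"})
print(len(final), history)  # 7 [3, 5]
```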
This study employs a large-scale tiling–landslide instance segmentation–tile mosaicking strategy to achieve rapid and precise detection and segmentation of seismic landslides across extensive loess tableland areas. In practical applications, the instance segmentation results enable the retrieval of quantitative parameters for earthquake emergency response. For instance, retrieving quantitative landslide parameters, including landslide volume and deposit volume, from limited post-earthquake data facilitates the timely compilation of landslide inventories in critical zones and reveals their spatial distribution patterns. Such inventories support a comprehensive assessment of seismic impacts on surface morphology, topography, and stability, providing accurate information to improve rescue efficiency. Furthermore, the results quantify landslide damage severity to inform scientific reconstruction planning and guide rational land use and engineering development. The approach also enriches landslide research databases, supplying foundational data for investigating failure mechanisms and kinematic behavior and thereby supporting more accurate risk prediction models. Ultimately, this can help reduce landslide hazards while safeguarding lives, property, and geological stability.
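One common way to obtain volumes from segmentation results, assuming a pre-event DEM is also available, is DEM differencing inside each landslide mask. The sketch below sums elevation losses (depletion) and gains (accumulation) over masked cells; the nested-list raster representation, the function name, and the toy values are assumptions, and this is not necessarily the approach intended by the authors.

```python
def dem_difference_volume(pre, post, mask, cell_size_m):
    """Depletion and accumulation volumes (m^3) from pre- and post-event DEMs.

    pre, post: equal-sized 2D lists of elevations in metres.
    mask: same-shaped 2D list of booleans marking landslide cells
          (e.g. a rasterised instance-segmentation mask).
    """
    cell_area = cell_size_m ** 2
    loss = gain = 0.0
    for row_pre, row_post, row_mask in zip(pre, post, mask):
        for z0, z1, inside in zip(row_pre, row_post, row_mask):
            if not inside:
                continue  # ignore cells outside the landslide mask
            dz = z1 - z0
            if dz < 0:
                loss += -dz * cell_area  # material removed (source area)
            else:
                gain += dz * cell_area   # material deposited
    return loss, gain


# Toy 2 x 3 rasters with 0.5 m cells; the top-right cell lies outside the mask.
pre = [[100, 100, 100], [100, 100, 100]]
post = [[98, 99, 100], [100, 101, 103]]
mask = [[True, True, False], [True, True, True]]
print(dem_difference_volume(pre, post, mask, 0.5))  # (0.75, 1.0)
```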
6. Conclusions
The Ms 6.2 Jishishan earthquake in Gansu Province struck the loess tableland area at the northeastern margin of the Tibetan Plateau, inducing severe geohazards. To identify and assess landslides in the affected zone, this study proposes a rapid detection and segmentation method that integrates enhanced deep learning algorithms with a large-scale tiling–landslide instance segmentation–tile mosaicking strategy, and applies it to landslide-prone areas encompassing Yangwa Village and the Goujia–Majia–Liugou sector, totaling approximately 20 km². The technical integrity and feasibility of this methodology are empirically demonstrated. The enhanced deep learning model shows significantly improved feature representation for small targets, with all four loss functions converging simultaneously within a 500-epoch progressive training strategy. Performance metrics show an mAP50(B/M) exceeding 0.6 and an mAP50(M) reaching 0.747, confirming strong detection and segmentation performance for loess landslides triggered by the Jishishan earthquake. Although the instance segmentation metric (mAP50-95(M) = 0.468) is lower than the detection metrics, this is consistent with the difficulty of segmenting the irregular boundaries characteristic of geological hazards. Validation via high-precision 3D model comparison and visual interpretation confirms the approach's detection accuracy and segmentation precision. Although a small number of false positives and omissions persist, the need for efficient post-earthquake geohazard investigation and precise loess landslide recognition is growing, driven by escalating disaster prevention demands, the limitations of traditional methods, and rapid advances in AI. This work not only expands seismic landslide inventories in loess tablelands but also provides a new technical framework for future earthquake emergency response.
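The gap between mAP50(M) and mAP50-95(M) can be illustrated with mask IoU. The sketch below, which uses sets of pixel indices as a simplified mask representation, shows a prediction offset by two pixels from a 10 × 10 ground-truth mask: its IoU of about 0.67 counts as a match at the 0.5 threshold but passes only four of the ten thresholds (0.50 to 0.95 in steps of 0.05) that are averaged in mAP50-95. The masks are invented for illustration, and the sketch shows only per-match threshold behaviour, not the full precision–recall computation behind mAP.

```python
def mask_iou(pred, truth):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 0.0


# IoU thresholds averaged in mAP50-95: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou):
    """Number of the ten thresholds at which this match counts as a true positive."""
    return sum(iou >= t for t in THRESHOLDS)


truth = {(r, c) for r in range(10) for c in range(10)}    # 10 x 10 ground truth
pred = {(r, c) for r in range(10) for c in range(2, 12)}  # same shape, shifted 2 px
iou = mask_iou(pred, truth)
print(round(iou, 3), thresholds_passed(iou))  # 0.667 4
```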
The improved model offers a viable optimization strategy for deep learning-based rapid landslide detection and segmentation, demonstrating significant potential for machine learning in geohazard research.