Enhanced YOLO11n for UAV-Based Surface Crack Detection in Mining Subsidence Areas

Wang, Mo; Zhao, Nan; Liu, Chuangchuang; Rao, Wanxiang; Zhang, Zhijun

doi:10.3390/pr14121988


by Mo Wang ¹, Nan Zhao ¹, Chuangchuang Liu ¹, Wanxiang Rao ² and Zhijun Zhang ¹,*

¹ Langfang Integrated Natural Resources Survey Center, China Geological Survey, Langfang 065000, China

² Independent Researcher, Beijing 100000, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(12), 1988; https://doi.org/10.3390/pr14121988

Submission received: 27 May 2026 / Revised: 12 June 2026 / Accepted: 17 June 2026 / Published: 18 June 2026

(This article belongs to the Special Issue Intelligent and Sustainable Safe Coal Mining: AI-Assisted Disaster Mitigation, Carbon Sequestration, and Energy Utilization)


Abstract

Mining-subsidence-induced surface cracks pose substantial risks to ecological systems, infrastructure stability, and mining safety. Their thin, elongated, discontinuous, and low-contrast characteristics make accurate detection from unmanned aerial vehicle (UAV) imagery challenging, particularly under complex environmental conditions. This study proposes an enhanced YOLO11n framework for detecting surface cracks in mining subsidence areas. Switchable Atrous Convolution (SAConv) was incorporated to strengthen multi-scale feature extraction, while Cascaded Group Attention (CGA) was introduced to suppress background interference and improve feature discrimination, and Shape-IoU loss was adopted to enhance the localization of slender crack targets. The model was evaluated using 5000 annotated UAV images collected in the Zhungeer mining area. It achieved a precision of 85.6%, a recall of 77.9%, an mAP@0.5 of 84.3%, and an F1-score of 81.6%. Compared with the baseline YOLO11n, precision, recall, and mAP@0.5 increased by 1.4, 4.6, and 3.2 percentage points, respectively. Cross-dataset evaluation on the public Crack500 dataset further demonstrated improved robustness under domain variation. These results indicate that the proposed framework improves the detection and localization of slender and discontinuous cracks in complex mining environments, supporting its application in UAV-based geological hazard monitoring.

Keywords:

mining subsidence areas; surface crack detection; UAV imagery; YOLO11n; geological hazard monitoring

1. Introduction

Mining activities worldwide can induce ground deformation and surface cracks, which threaten ecological systems, infrastructure stability, and mine safety. In coal-producing regions, including but not limited to western China, large-scale and high-intensity extraction makes timely crack monitoring especially important [1,2,3,4].

Traditionally, surface crack monitoring in mining areas has relied on manual surveys, interferometric synthetic aperture radar (InSAR), and satellite remote sensing interpretation [5,6]. However, the spatial resolution of InSAR and satellite imagery is often insufficient for detecting narrow and discontinuous cracks, while manual surveys are inefficient for large-scale applications. Moreover, complex terrain and environmental variability further reduce monitoring reliability [7]. As a result, timely and accurate crack detection remains challenging, limiting the effectiveness of geological hazard assessment and early warning [8]. Therefore, developing high-resolution and automated crack detection methods remains an urgent research priority.

With the rapid development of unmanned aerial vehicle (UAV) technology, low-altitude remote sensing has provided new opportunities for geological hazard monitoring in mining areas [9]. Owing to their flexible deployment, high operational efficiency, and capability to acquire high-resolution imagery, UAVs have significantly improved the spatial accuracy and monitoring efficiency of surface crack detection [10,11]. Nevertheless, the processing and interpretation of UAV imagery remain challenging, particularly in the accurate extraction of crack information from large volumes of high-resolution images [12]. Consequently, developing efficient and robust crack detection methods from UAV imagery has become a key research focus in mining geological hazard monitoring.

With the rapid development of deep learning, intelligent surface crack detection has become an effective approach for geological hazard monitoring in mining areas [13]. Although deep learning models have achieved high accuracy in crack detection by learning complex image features [14], mining-induced surface cracks are characterized by elongated morphologies, large-scale variations, and complex background interference [15,16]. As a result, existing methods still face three key challenges: insufficient extraction of elongated and discontinuous crack features, vulnerability to background interference, and inaccurate localization of small and slender targets. Therefore, developing a robust crack detection method for complex mining environments remains crucial for improving monitoring accuracy and efficiency.

Through extensive evaluations of convolutional structures, attention mechanisms, and loss functions within the You Only Look Once (YOLO) series [17], the best performance in our experiments was obtained by integrating three modules into YOLO11n. The main contributions are as follows:

SAConv is integrated into selected C3k2 blocks of the backbone (C3k2_SAConv) to enhance the multi-scale representation of mining-induced surface cracks.
A CGA attention mechanism is incorporated to improve feature discrimination under complex backgrounds.
Shape-IoU loss is adopted to improve localization accuracy for slender and irregular cracks.
A UAV-based dataset of 5000 annotated crack images is constructed for performance evaluation in mining subsidence areas.

2. Related Work

2.1. Traditional Crack Detection Methods

Accurate detection of mining-induced surface cracks is essential for geological hazard monitoring and prevention [18,19]. Early studies mainly relied on manual field surveys, InSAR, satellite remote sensing interpretation, and other traditional monitoring techniques [20]. Although these methods provide valuable geological information, they are often limited by labor intensity, low efficiency, and insufficient spatial resolution for detecting fine cracks.

2.2. UAV-Based Crack Monitoring

With the rapid development of UAV technology, low-altitude remote sensing has become an important tool for surface crack monitoring owing to its flexible deployment, high efficiency, and centimeter-level imaging capabilities [21,22]. Its applications in geological hazard investigation have expanded considerably in recent years [23]. However, UAV-based crack detection still faces challenges in balancing detection accuracy, computational efficiency, and robustness when processing large-scale high-resolution images under complex mining conditions [24].

2.3. Deep Learning-Based Crack Detection

Deep learning has become the dominant approach for crack detection due to its strong feature-learning capability [25]. Hou et al. [26] and Kang et al. [27] demonstrated the superiority of UAV imagery over satellite data for detecting mining-induced cracks, although manual interpretation remains labor-intensive and subjective. Bo et al. [28] proposed the MF-GDOG algorithm to improve fine-crack extraction through multi-scale feature fusion, but its adaptability to irregular crack distributions is limited. Wei et al. [29] enhanced crack detection by introducing residual dilated convolutions and coordinate attention mechanisms, improving small-target recognition. Ming et al. [30] employed contour evolution for automated fissure extraction from UAV imagery, while Wang et al. [31] combined dynamic snake convolution and dilated convolution to improve fracture extraction accuracy. Zhang et al. [32] incorporated deformable convolution into YOLOv8n, enhancing small-target perception and adaptability to crack morphology.

Despite these advances, accurate detection of elongated and low-contrast cracks under complex mining environments remains challenging [33]. Mining-induced cracks often exhibit discontinuous structures, large-scale variations, and severe background interference from vegetation, shadows, and surface textures. In addition, limited annotated datasets and the trade-off between detection accuracy and computational efficiency further constrain the practical application of existing methods.

International studies have also reported strong progress in image-based crack detection using deep learning and remote sensing platforms. For example, convolutional neural networks and fully convolutional networks have been widely used for pavement and concrete crack detection, while UAV-assisted visual inspection has been applied to infrastructure and geohazard monitoring [34,35,36,37,38,39]. These studies indicate that robust crack detection is a global research topic rather than a problem limited to one country, and they provide useful methodological references for mining-subsidence crack detection.

As summarized in Table 1, previous studies have improved crack detection from different perspectives, but few simultaneously address elongated mining-induced cracks, background interference, and shape-aware localization under UAV photogrammetric conditions.

3. Proposed YOLO11n Network Architecture

Compared with YOLOv8, YOLO11 introduces improvements mainly in feature extraction and fusion while retaining a similar detection head for framework compatibility. Specifically, the backbone replaces the C2f module with the more efficient C3k2 module within the CSPDarknet architecture [40]. In the neck, YOLO11 maintains the PANet-based feature pyramid and further incorporates C3k2 modules to enhance multi-scale feature interaction and representation, thereby improving detection performance, particularly for small targets.

YOLO11n was selected as the baseline because it provides a compact one-stage detector with improved C3k2 feature extraction and PAN-FPN feature fusion while retaining high inference speed and straightforward deployment. Compared with heavier transformer-based detectors, RT-DETR-like frameworks, EfficientDet-style compound-scaled detectors, or segmentation networks, YOLO11n offers a practical balance between accuracy, training stability, and deployment simplicity for UAV crack inspection. Therefore, the enhanced model was built on YOLO11n to test whether crack-specific feature modules could improve detection while keeping the workflow compatible with lightweight YOLO deployment pipelines.

Based on YOLO11n, this study proposes an enhanced framework for surface crack detection in coal-mining subsidence areas. To address the challenges of multi-scale variation, slender morphology, and complex background interference, three improvements are introduced: an SAConv-enhanced C3k2 module (C3k2_SAConv), a CGA mechanism, and the Shape-IoU loss function. The detection head retains the decoupled design of YOLO11, in which classification and regression are performed separately to better exploit semantic and spatial information, while an optimized channel allocation strategy improves computational efficiency. The overall architecture of the proposed network is shown in Figure 1.

The integration strategy is as follows: SAConv is embedded into selected C3k2 blocks in the backbone to form C3k2_SAConv, so that low- and middle-level feature maps can capture elongated crack continuity with adaptive receptive fields. CGA is inserted into the neck after multi-scale feature fusion, where P3, P4, and P5 feature levels carry small, medium, and context-rich crack information. The detection head remains decoupled and unchanged to preserve YOLO11n compatibility. This design limits architectural changes to the feature-extraction and feature-fusion stages, which are most relevant to discontinuous crack representation and background suppression.

3.1. Loss Function Shape-IoU Improvement

YOLO11 employs the CIoU loss function for bounding box regression. However, CIoU inadequately captures the shape and scale characteristics of small targets, leading to reduced localization accuracy in small-object detection scenarios [41]. To address this issue, Shape-IoU, originally proposed as a shape- and scale-aware bounding-box regression metric, introduces shape-aware constraints by decoupling geometric properties from spatial location information, thereby improving regression stability and accuracy for slender crack targets [42].

Compared with CIoU, DIoU, EIoU, and SIoU, Shape-IoU explicitly considers object shape and scale when measuring bounding-box regression errors. This is beneficial for mining cracks because their bounding boxes are often highly elongated and sensitive to small localization deviations along the narrow direction. In such cases, a loss function that penalizes shape inconsistency can improve localization stability even when the intersection area changes only slightly.

L = 1 - \mathrm{IoU} + L_{d} + 0.5 \times L_{\Omega}

(1)

\mathrm{IoU} = \frac{|B \cap B_{gt}|}{|B \cup B_{gt}|}

(2)

L_{d} = H \times {(x - x_{g t})}^{2} / c^{2} + W \times {(y - y_{g t})}^{2} / c^{2}

(3)

L_{\Omega} = \sum_{t \in \{w, h\}} \left(1 - e^{-\omega_{t}}\right)^{\theta}, \quad \theta = 4

(4)

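Here, L_d and L_Ω denote the shape distance loss and the shape value loss, respectively. In Equation (3), (x, y) and (x_gt, y_gt) are the center coordinates of the predicted and ground-truth boxes, c is the diagonal length of the smallest box enclosing both boxes, and the weights W and H are defined in Equations (5) and (6). The summation in Equation (4) runs over the width and height dimensions, and each term measures the corresponding size mismatch between the predicted and ground-truth boxes. By explicitly modeling shape and scale information, Shape-IoU improves localization stability for slender crack targets.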

W = \frac{2 \times w_{gt}^{s}}{w_{gt}^{s} + h_{gt}^{s}}

(5)

H = \frac{2 \times h_{gt}^{s}}{w_{gt}^{s} + h_{gt}^{s}}

(6)

Here, W and H are the horizontal and vertical weighting factors, respectively; w_gt and h_gt denote the width and height of the ground-truth box; and s is a scale factor related to target size.
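A minimal plain-Python sketch of Equations (1) to (6) for a single pair of boxes is given below to make the weighting concrete. It rests on several assumptions: boxes are axis-aligned (x1, y1, x2, y2); the per-dimension shape terms ω_w and ω_h follow the original Shape-IoU formulation, since they are not written out in this section; and the default scale exponent s = 1.0 and the function name are illustrative choices rather than the settings used in this study.

```python
import math


def shape_iou_loss(pred, gt, scale=1.0, theta=4.0, eps=1e-9):
    """Shape-IoU loss for one predicted box and one ground-truth box.

    Boxes are (x1, y1, x2, y2). Returns (loss, iou, dist, shape).
    `scale` is the exponent s of Eqs. (5)-(6); 1.0 is an illustrative default.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    w, h = px2 - px1, py2 - py1
    w_gt, h_gt = gx2 - gx1, gy2 - gy1

    # Eq. (2): intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (w * h + w_gt * h_gt - inter + eps)

    # Eqs. (5)-(6): shape weights computed from the ground-truth box only
    ws, hs = w_gt ** scale, h_gt ** scale
    W = 2.0 * ws / (ws + hs)
    H = 2.0 * hs / (ws + hs)

    # Eq. (3): weighted centre offsets over the squared enclosing-box diagonal
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2 + eps
    dx = (px1 + px2 - gx1 - gx2) / 2.0
    dy = (py1 + py2 - gy1 - gy2) / 2.0
    dist = (H * dx ** 2 + W * dy ** 2) / c2

    # Eq. (4): shape-value term over t in {w, h}
    # (omega_t as in the original Shape-IoU paper; an assumption here)
    omega_w = H * abs(w - w_gt) / max(w, w_gt)
    omega_h = W * abs(h - h_gt) / max(h, h_gt)
    shape = sum((1.0 - math.exp(-o)) ** theta for o in (omega_w, omega_h))

    # Eq. (1)
    return 1.0 - iou + dist + 0.5 * shape, iou, dist, shape


# A wide, crack-like ground-truth box and two predictions shifted by 2 px
gt = (0.0, 0.0, 100.0, 10.0)
along = shape_iou_loss((2.0, 0.0, 102.0, 10.0), gt)   # shift along the crack
across = shape_iou_loss((0.0, 2.0, 100.0, 12.0), gt)  # shift across the crack
print(f"along: {along[0]:.4f}  across: {across[0]:.4f}")
```

For this elongated box, W is much larger than H, so the same 2-pixel centre offset contributes a larger distance term when it occurs across the narrow dimension than along the crack, which is the behaviour that motivates Shape-IoU for slender targets.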

The structure of Shape-IoU is shown in Figure 2. By introducing shape-aware weighting and directional distance penalties, Shape-IoU provides a more accurate representation of geometric discrepancies between predicted and ground-truth boxes, thereby improving localization performance for slender crack targets [43].

3.2. Attention Mechanism Improvements

To alleviate attention redundancy in conventional multi-head self-attention (MHSA), the CGA module, derived from EfficientViT [44], is integrated into the neck network. CGA performs attention computation in grouped channel subspaces, one group per head, and progressively aggregates inter-group information by cascading the heads, thereby enhancing feature representation for slender and discontinuous crack structures.

Y_{j} = \mathrm{Attention}(X_{j} + Y_{j-1}), \quad Y_{0} = 0

(7)

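Here, X_j denotes the j-th channel group of the input feature (the split assigned to the j-th attention head), and Y_j denotes the output of the j-th head; the outputs of all heads are concatenated and linearly projected to form the module output.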

By enhancing information interaction among attention heads, CGA improves feature discrimination and suppresses background interference with limited computational cost [45]. The architecture is shown in Figure 3.
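The cascade in Equation (7) can be illustrated with a small NumPy sketch that operates on flattened feature-map tokens. This is a simplified reading of CGA under stated assumptions: projection weights are random placeholders, the token-interaction and positional-bias components of EfficientViT are omitted, and the head count and key size are illustrative; it is not the configuration used in the proposed network.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cascaded_group_attention(x, num_heads, key_dim, rng):
    """Simplified cascaded group attention over tokens x of shape (N, C).

    Each head sees only its own channel group X_j plus the previous head's
    output Y_{j-1} (Eq. 7). Projection weights are random placeholders.
    """
    n, c = x.shape
    if c % num_heads:
        raise ValueError("channels must be divisible by num_heads")
    d = c // num_heads
    y_prev = np.zeros((n, d))  # Y_0 = 0
    head_outputs = []
    for x_j in np.split(x, num_heads, axis=1):  # channel groups X_1..X_h
        inp = x_j + y_prev  # cascade: X_j + Y_{j-1}
        w_q = rng.standard_normal((d, key_dim)) / np.sqrt(d)
        w_k = rng.standard_normal((d, key_dim)) / np.sqrt(d)
        w_v = rng.standard_normal((d, d)) / np.sqrt(d)
        q, k, v = inp @ w_q, inp @ w_k, inp @ w_v
        attn = softmax(q @ k.T / np.sqrt(key_dim))  # (N, N) attention over tokens
        y_prev = attn @ v  # Y_j has the width of X_j, so it can feed the next head
        head_outputs.append(y_prev)
    w_proj = rng.standard_normal((c, c)) / np.sqrt(c)
    return np.concatenate(head_outputs, axis=1) @ w_proj  # concatenate heads, project


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((32, 8, 8))   # (C, H, W) neck feature
tokens = feature_map.reshape(32, -1).T          # (H*W, C) = (64, 32)
out = cascaded_group_attention(tokens, num_heads=4, key_dim=16, rng=rng)
print(out.shape)
```

Because each head attends within a channel group of width C/h rather than over all C channels, per-head cost drops, while the cascade still lets later heads refine what earlier heads extracted.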

3.3. Convolution Enhancement

Because surface cracks exhibit elongated and discontinuous structures, adaptive receptive fields are beneficial for capturing long-range contextual information. Therefore, SAConv is incorporated into the backbone network to replace selected standard convolutions. By combining standard and atrous convolutions, SAConv dynamically adjusts the receptive field for multi-scale feature extraction [46].

Atrous convolution is suitable for slender crack extraction because it enlarges the receptive field without reducing feature-map resolution, helping the network connect discontinuous crack segments and capture long-range linear context [47]. Compared with deformable convolution, SAConv provides a more controlled receptive-field expansion and lower geometric instability for thin structures whose boundaries are weak and fragmented. Deformable convolution is powerful for irregular object shapes, but its learned offsets may be influenced by vegetation, shadows, and soil textures in UAV scenes. Therefore, SAConv was selected to strengthen context aggregation while preserving stable crack geometry.

\mathrm{Conv}(x, w, 1) \rightarrow S(x) \cdot \mathrm{Conv}(x, w, 1) + (1 - S(x)) \cdot \mathrm{Conv}(x, w + \Delta w, r)

(8)

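Here, x is the input feature, w is the convolution weight shared by both branches, r denotes the dilation rate of the atrous branch, Δw is a learnable weight offset, and S(x) is the adaptive switch function that weights the two branches at each spatial location.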

SAConv adapts the receptive field by blending the standard and atrous branches through the learned switch, enabling more effective extraction of elongated crack features and improving robustness to scale variation; as the ablation results in Section 4.5 show, however, this benefit comes with a substantial increase in computational cost. In this study, SAConv is integrated into the C3k2 module to form C3k2_SAConv, as illustrated in Figure 4.
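The blending in Equation (8) can be sketched for a single-channel feature map in NumPy. The switch below loosely follows the original SAC design, in which S(x) is computed from average-pooled input followed by a 1 × 1 convolution; the sigmoid, the parameters `a` and `b`, and the helper names are hypothetical choices of this sketch. In the real module, w and Δw are multi-channel tensors learned by backpropagation, whereas here they are random.

```python
import numpy as np


def dilated_conv2d(x, w, rate):
    """'Same'-size 2D cross-correlation of a single-channel map x with a
    k x k kernel w at the given dilation rate (zero padding)."""
    k = w.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)  # constant (zero) padding by default
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i * rate:i * rate + x.shape[0],
                                j * rate:j * rate + x.shape[1]]
    return out


def switch_map(x, a, b):
    """Hypothetical switch S(x): 5x5 average pooling, a 1x1 affine map,
    then a sigmoid (the sigmoid is an assumption of this sketch)."""
    pooled = dilated_conv2d(x, np.full((5, 5), 1.0 / 25.0), 1)
    return 1.0 / (1.0 + np.exp(-(a * pooled + b)))


def switchable_atrous_conv(x, w, delta_w, rate, s):
    """Eq. (8): blend a dilation-1 branch and a dilation-r branch that
    shares the weights w (plus a learnable offset delta_w)."""
    return s * dilated_conv2d(x, w, 1) + (1.0 - s) * dilated_conv2d(x, w + delta_w, rate)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
w = rng.standard_normal((3, 3))
delta_w = 0.1 * rng.standard_normal((3, 3))
s = switch_map(x, a=2.0, b=0.0)
y = switchable_atrous_conv(x, w, delta_w, rate=3, s=s)
print(y.shape, float(s.min()), float(s.max()))
```

With rate = 3, a 3 × 3 kernel covers a 7 × 7 neighbourhood without extra weights, which is how the atrous branch reaches across gaps between crack segments while the switch decides, pixel by pixel, how much of that wider context to use.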

4. Experimental Results and Analysis

4.1. Research Region Overview

Zhungeer Banner is located in the eastern part of southwestern Inner Mongolia and is bordered by the Yellow River on three sides. It is commonly referred to as “Jiming San Sheng” (literally, “a rooster's crow is heard in three provinces”), reflecting its location at the junction of Inner Mongolia, Shanxi, and Shaanxi [48]. The Zhungeer Banner coalfield lies within the arid and semi-arid region of northern China, where the ecological environment is fragile and water resources are limited. Continuous large-scale coal mining has induced extensive surface subsidence, significant declines in groundwater levels, and increasingly severe ecological and environmental degradation. The study area is located in the northeastern part of Zhungeer Banner, covering approximately 6.0 km² northeast of the Bulian Gou Coal Mine.

The major forms of ground subsidence in the mining area include surface cracks, subsidence trenches, and collapse pits. Among these, surface cracks are the most extensively developed [49], with widths ranging from 10 to 120 cm. These cracks commonly exhibit a parallel and stepped distribution pattern, with step heights generally varying between 15 and 130 cm and reaching a maximum of 2.5 m. In plan view, the cracks are predominantly curvilinear, although locally linear features are also observed.

4.2. Experimental Dataset

This study employed a Pegasus D20 unmanned aerial vehicle (UAV) (Feima Robotics Co., Ltd., Shenzhen, China) equipped with a D-OP3000 five-lens oblique photogrammetry system (Feima Robotics Co., Ltd., Shenzhen, China) for data acquisition. UAV flights were conducted between 09:00 and 15:00 to minimize the influence of shadows on aerial imagery. The flight altitude was maintained at 400 m above the take-off point, resulting in a ground sampling distance (GSD) of 5–7 cm across the survey area. The side overlap and forward overlap were set to 65% and 80%, respectively.

Each flight mission lasted less than 50 min, and a total of four flights were completed, covering an aerial survey area of approximately 6.071 km². The UAV flight scheme is illustrated in Figure 5.
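To relate the reported GSD to the field-observed crack widths in Section 4.1, the short calculation below converts ground lengths to pixel counts. It assumes that images are used at their native GSD (no resampling) and that each 1080 × 1080 training input is a direct crop of the source imagery; both are assumptions, because the tiling procedure is not described.

```python
def pixels_across(length_cm, gsd_cm):
    """Number of image pixels spanned by a ground length at a given GSD."""
    return length_cm / gsd_cm


GSD_RANGE_CM = (5.0, 7.0)        # reported ground sampling distance
WIDTH_RANGE_CM = (10.0, 120.0)   # field-observed crack widths (Section 4.1)
TILE_PX = 1080                   # training input size in pixels

narrowest = pixels_across(WIDTH_RANGE_CM[0], GSD_RANGE_CM[1])  # worst case
widest = pixels_across(WIDTH_RANGE_CM[1], GSD_RANGE_CM[0])     # best case
tile_m = [TILE_PX * g / 100.0 for g in GSD_RANGE_CM]           # tile side in metres
print(f"crack width: {narrowest:.1f}-{widest:.1f} px; "
      f"tile side: {tile_m[0]:.1f}-{tile_m[1]:.1f} m")
```

Under these assumptions, the narrowest cracks (10 cm at a 7 cm GSD) span only about 1.4 pixels, while the widest span up to 24 pixels, and one input tile covers roughly 54 to 75.6 m of ground on a side. This illustrates why thin, low-contrast cracks are difficult to resolve at this flight altitude.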

The dataset was constructed using UAV-acquired remote sensing imagery. To enhance model performance and improve data diversity, data augmentation was applied to the calibrated samples by adjusting brightness and contrast with scaling factors of 0.25, 0.5, and 0.75, resulting in a dataset of 5000 surface crack images.

Color-space augmentation was selected because illumination variation, shadows, exposed-soil brightness, and vegetation-background contrast are major sources of uncertainty in UAV images of the study area. Geometric transformations were not used in this study because crack orientation, continuity, and scale are directly related to the physical morphology of mining-induced ground fissures; aggressive rotation, scaling, or warping may produce samples that are inconsistent with the photogrammetric scene geometry. Moderate geometric augmentation may nevertheless further improve dataset balance and model robustness, and it will be investigated in future work.
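The brightness and contrast scaling described above can be sketched as follows. The exact transforms and the way variants were combined to reach 5000 images are not stated, so the brightness model (multiplying intensities), the contrast model (scaling deviations from the mean), the one-variant-per-factor scheme, and the function names are all assumptions.

```python
import numpy as np

FACTORS = (0.25, 0.5, 0.75)  # scaling factors reported in the text


def scale_brightness(img, factor):
    """Multiply pixel intensities by `factor` (assumed brightness model)."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def scale_contrast(img, factor):
    """Scale deviations from the mean intensity by `factor` (assumed contrast model)."""
    f = img.astype(np.float32)
    mean = f.mean()
    return np.clip(mean + (f - mean) * factor, 0, 255).astype(np.uint8)


def augment(img):
    """Return (name, image) pairs: one brightness and one contrast variant per factor."""
    variants = []
    for k in FACTORS:
        variants.append((f"brightness_{k}", scale_brightness(img, k)))
        variants.append((f"contrast_{k}", scale_contrast(img, k)))
    return variants


img = np.full((4, 4, 3), 200, dtype=np.uint8)  # toy uint8 RGB patch
print([name for name, _ in augment(img)])
```

Because these transforms change only pixel values, the crack annotations of the source image remain valid for every variant without any coordinate adjustment.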

After confirming that the image GSD met the requirements of the monitoring task, the dataset was randomly divided into training, validation, and test sets in a 7:2:1 ratio, corresponding to 3500 training, 1000 validation, and 500 test images. Crack outlines were then annotated in LabelMe (version 5.3.1; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA) in a segmentation-style format, and the annotations were exported as text files that represent each surface crack in an image as a point-based coordinate sequence.

To improve reproducibility, the annotation procedure is summarized as follows. The point-based LabelMe annotations were converted to the training-label format required by the detector. The annotation guideline included visible continuous, discontinuous, curvilinear, and stepped fissures, while excluding roads, vegetation boundaries, shadows, and exposed-soil textures that did not correspond to crack morphology. Ambiguous samples were rechecked during label conversion and dataset cleaning. In summary, the dataset comprises 5000 images (3500 training, 1000 validation, and 500 test), with a ground sampling distance of 5–7 cm, field-observed crack widths of 10–120 cm, and brightness/contrast scaling factors of 0.25, 0.5, and 0.75. Detailed image-level crack-length distributions and instance-level size or class-imbalance statistics were not retained; their absence is reported as a limitation, and they were not estimated retrospectively.
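One way to perform the LabelMe-to-detector conversion is sketched below. The text does not specify the conversion rule, so this sketch assumes the tightest axis-aligned box around each annotated point sequence, the normalized YOLO label line `class cx cy w h`, and a single class `crack` with id 0. It reads the standard LabelMe JSON fields `imageWidth`, `imageHeight`, and `shapes` (each with `label` and `points`).

```python
import json


def labelme_to_yolo(ann, class_ids):
    """Convert one LabelMe annotation dict to YOLO detection label lines.

    Each shape's point sequence is reduced to its axis-aligned bounding box,
    then written as 'class cx cy w h' normalised by the image size.
    """
    img_w, img_h = ann["imageWidth"], ann["imageHeight"]
    lines = []
    for shape in ann["shapes"]:
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x1, x2 = max(0.0, min(xs)), min(float(img_w), max(xs))  # clip to image
        y1, y2 = max(0.0, min(ys)), min(float(img_h), max(ys))
        cx, cy = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h
        bw, bh = (x2 - x1) / img_w, (y2 - y1) / img_h
        lines.append(f"{class_ids[shape['label']]} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    return lines


example = json.loads("""{
  "imageWidth": 200, "imageHeight": 100,
  "shapes": [{"label": "crack", "shape_type": "linestrip",
              "points": [[10, 20], [60, 35], [110, 40]]}]
}""")
print(labelme_to_yolo(example, {"crack": 0}))
```

For an oblique or curved crack, the enclosing box is mostly background and very elongated, which is one reason localization of these targets is sensitive to small box deviations (Section 3.1).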

4.3. Experimental Environment and Parameter Configuration

The hardware–software configuration was as follows: Windows 10 Pro, PyTorch 2.0.1, Python 3.11.7, and CUDA 12.9, running on a PC (Dell Technologies Inc., Round Rock, TX, USA) with i9-10900X and NVIDIA RTX 3090. Using the YOLO11n framework, the proposed model was trained on 1080 × 1080 drone images with a batch size of 28 over 300 epochs, starting with a learning rate of 0.01 and momentum set to 0.9.

Training used stochastic gradient descent (SGD) with an initial learning rate of 0.01, momentum of 0.9, and scheduled learning-rate decay over 300 epochs. All models were initialized from YOLO11n pretrained weights and trained with 1080 × 1080 inputs and a batch size of 28 in the hardware and software environment described above. Five independent runs with different random seeds were used for statistical validation. No early stopping was applied; model selection was based on validation-set performance, and the test set was used only for the final comparison. To ensure a controlled comparison, all competing models used the same data split, augmentation settings, input size, initialization strategy, hardware/software environment, training budget, and evaluation pipeline.
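For readers reproducing the baseline, the settings above map onto the Ultralytics training API roughly as in the sketch below. The text names the YOLO11n framework but not the code base, so the use of the `ultralytics` package is an assumption; `crack_dataset.yaml` and the seed values are placeholders, and the SAConv, CGA, and Shape-IoU changes require custom model code that is not shown. Because YOLO inputs are expected to be multiples of the maximum network stride (32), a requested size of 1080 is likely rounded up to 1088 by the framework; the training log should be checked to confirm the effective input size.

```python
import math

# Hypothetical dataset file; the study's actual file name is not given.
TRAIN_ARGS = dict(
    data="crack_dataset.yaml",
    epochs=300,
    imgsz=1080,
    batch=28,
    optimizer="SGD",
    lr0=0.01,
    momentum=0.9,
    patience=300,  # cannot trigger within 300 epochs, i.e. no early stopping
)
SEEDS = (0, 1, 2, 3, 4)  # placeholder seeds for the five independent runs


def stride_aligned(imgsz, stride=32):
    """Round an input size up to a multiple of the network stride."""
    return math.ceil(imgsz / stride) * stride


def main():
    # Requires the `ultralytics` package; trains the baseline YOLO11n only.
    from ultralytics import YOLO

    for seed in SEEDS:
        model = YOLO("yolo11n.pt")  # pretrained YOLO11n weights
        model.train(**TRAIN_ARGS, seed=seed, name=f"yolo11n_seed{seed}")
```

Calling `main()` starts the five baseline runs; keeping every argument except `seed` fixed mirrors the controlled-comparison setup described above.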

4.4. Experimental Evaluation Criteria

The model was evaluated using precision, recall, mean average precision at an intersection-over-union threshold of 0.5 (mAP@0.5), number of parameters, floating-point operations (FLOPs), frames per second (FPS), and model size. Precision, recall, and mAP@0.5 were used to evaluate detection accuracy, whereas parameters, FLOPs, FPS, and model size were used to assess computational efficiency. FPS was measured on the same hardware platform using 1080 × 1080 input images, so the reported values reflect relative inference efficiency under identical test conditions.

Additional indicators were also considered. The F1-score can be derived directly from precision and recall. mAP@0.5:0.95 and confusion-matrix analysis are useful for stricter localization assessment and error analysis and will be incorporated in future expanded evaluations, whereas ROC analysis is less directly applicable to one-stage object detection because detections depend on confidence thresholds and non-maximum suppression.

The F1-score was calculated from the reported precision and recall values using F1 = 2PR/(P + R) and was added to the comparative results. The proposed model achieved an F1-score of 81.6% on the mining-area UAV dataset, compared with 78.4% for the baseline YOLO11n, and 78.4% on Crack500, compared with 74.4% for YOLO11n. The archived experimental summaries contain mAP@0.5 but do not contain the complete threshold-wise AP outputs required for a valid COCO-style mAP@0.5:0.95 calculation. Because mAP@0.5:0.95 cannot be reconstructed reliably from mAP@0.5 alone, it was not inferred or estimated retrospectively. Future evaluations will retain complete threshold-wise outputs and report this metric using the COCO-style protocol.
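The F1 values on the mining-area dataset can be checked from the reported precision and recall. The baseline precision and recall used here (84.2% and 73.3%) are derived from the percentage-point gains stated in the abstract rather than read from Table 2.

```python
def f1_score(precision, recall):
    """F1 = 2PR / (P + R); inputs and output in percent."""
    return 2.0 * precision * recall / (precision + recall)


proposed = f1_score(85.6, 77.9)
# Baseline P and R recovered from the reported gains (85.6 - 1.4, 77.9 - 4.6)
baseline = f1_score(85.6 - 1.4, 77.9 - 4.6)
print(round(proposed, 1), round(baseline, 1))
```

As the harmonic mean, F1 is pulled toward the lower of the two inputs, so the larger recall gain accounts for most of the 3.2-point F1 improvement.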

P = \frac{TP}{TP + FP}

(9)

R = \frac{TP}{TP + FN}

(10)

In Equations (9) and (10), TP denotes correctly detected crack targets, FP denotes background or non-crack regions incorrectly detected as cracks, and FN denotes missed crack targets. Precision evaluates the reliability of positive detections, whereas recall measures the ability to avoid missing actual cracks.
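How TP, FP, and FN feed into mAP@0.5 can be sketched as follows. Detections are greedily matched to ground-truth boxes in descending confidence order at IoU ≥ 0.5, and AP is the area under the monotone precision envelope of the precision-recall curve (all-point interpolation). With a single crack class, mAP@0.5 equals this AP. Detection frameworks differ in interpolation and matching details, so this sketch is illustrative and may not reproduce framework-reported values exactly.

```python
import numpy as np


def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes (Eq. 2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(dets, gts, iou_thr=0.5):
    """Greedy matching in descending confidence; returns TP flags (1/0).

    dets: list of (score, box); gts: list of boxes for the same image.
    Each ground-truth box is matched at most once; unmatched detections are FP.
    """
    used, flags = set(), []
    for _, box in sorted(dets, key=lambda d: -d[0]):
        best, best_iou = None, iou_thr
        for g, gt_box in enumerate(gts):
            iou = box_iou(box, gt_box)
            if g not in used and iou >= best_iou:
                best, best_iou = g, iou
        if best is None:
            flags.append(0)
        else:
            used.add(best)
            flags.append(1)
    return flags


def average_precision(tp_flags, n_gt):
    """All-point interpolated AP from TP flags already sorted by confidence."""
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1 - np.asarray(tp_flags))
    recall = tp / n_gt            # Eq. (10) at each rank; unmatched GT are FN
    precision = tp / (tp + fp)    # Eq. (9) at each rank
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]  # monotone precision envelope
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)),
        (0.7, (20, 20, 30, 30)), (0.6, (50, 50, 60, 60))]
flags = match_detections(dets, gts)
print(flags, average_precision(flags, n_gt=len(gts)))
```

In the toy example, the second detection overlaps an already-matched crack and therefore counts as a false positive (a duplicate), illustrating how duplicate boxes on fragmented cracks lower precision even when recall is unaffected.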

4.5. Ablation Experiment Results and Analysis

Ablation studies were conducted to assess the performance gains contributed by the proposed modules in UAV aerial object detection. The SAConv, CGA mechanism, and Shape-IoU loss function were progressively integrated into the baseline YOLO11n model. The results are summarized in Table 2.

A controlled comparison between Configuration D (SAConv + CGA with CIoU) and the proposed model (SAConv + CGA with Shape-IoU) isolates the effect of the regression loss while keeping the feature-extraction architecture fixed. Replacing CIoU with Shape-IoU increased recall from 76.8% to 77.9% (+1.1 percentage points) and mAP@0.5 from 83.9% to 84.3% (+0.4 percentage points), while precision changed from 86.2% to 85.6% (−0.6 percentage points). Thus, Shape-IoU provides a complementary improvement in sensitivity and overall localization performance for slender cracks, although its standalone contribution is modest and does not uniformly improve all metrics.

Configuration A: Replacing CIoU with Shape-IoU increased precision by 0.7 percentage points, while the model size, number of parameters, and FLOPs remained unchanged. However, recall and mAP@0.5 decreased by 1.1 and 0.2 percentage points, respectively, and FPS decreased by 4.08%.

Configuration B: Introducing the CGA mechanism increased recall and mAP@0.5 by 1.9 and 0.7 percentage points, respectively, with no change in model size or FLOPs and a parameter reduction of 0.03 M. Precision decreased by 1.5 percentage points, and FPS decreased by 18.92%.

Configuration C: Incorporating the SAConv module improved recall and mAP@0.5 by 5.4 and 2.3 percentage points, respectively. However, it substantially increased computational complexity: parameters, model size, and FLOPs rose by 9.59 M, 18.34 MB, and 28.4 G, respectively, while precision decreased by 0.8 percentage points and FPS dropped by 51.57%.

Configuration D: The combined use of SAConv and CGA improved precision, recall, and mAP@0.5 by 2.0, 3.5, and 2.8 percentage points, respectively. This gain came at the cost of increased computational load, with parameters, model size, and FLOPs increasing by 9.49 M, 18.14 MB, and 28.4 G, respectively, while FPS decreased by 55.57%. Relative to YOLO11n, the full proposed model uses approximately 4.35 times as many parameters and 3.78 times as many FLOPs, but it still processes 57.2 frames per second on the RTX 3090. This operating point is appropriate for post-flight or semi-real-time UAV inspection, whereas the baseline remains preferable when edge-device latency and storage are the dominant constraints.

Proposed Model: By integrating SAConv, CGA, and Shape-IoU, the proposed model improves precision, recall, and mAP@0.5 by 1.4, 4.6, and 3.2 percentage points, respectively, compared with the baseline YOLO11n. In practical mining-crack monitoring, the 4.6-percentage-point recall improvement is particularly important because missed detections may delay field verification of active fissures and potential subsidence hazards. The improvement also indicates better continuity perception for slender and discontinuous cracks in complex UAV backgrounds. These gains are accompanied by increased computational cost, with the number of parameters increasing from 2.83 M to 12.32 M, FLOPs from 10.2 G to 38.6 G, model size from 5.76 MB to 23.9 MB, and FPS decreasing from 127.4 to 57.2. Therefore, the proposed model is more suitable for offline or semi-real-time UAV inspection workflows where improved crack recall and localization are prioritized over ultra-lightweight deployment.

Quantitatively, the FPS decrease from 127.4 to 57.2 corresponds to an approximate inference-latency increase from 7.85 ms to 17.48 ms per image on the same RTX 3090 platform. Thus, the proposed model is about 2.23 times slower than the baseline while improving mAP@0.5 by 3.2 percentage points and recall by 4.6 percentage points. Memory usage and energy consumption were not measured in this study; therefore, the computational-cost discussion is limited to parameters, FLOPs, model size, FPS, and derived latency.
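The derived efficiency figures can be reproduced from the reported values, which also makes explicit that FPS changes are relative percentages whereas accuracy changes are absolute differences in percentage points. The baseline recall (73.3%) is derived from the reported 4.6-point gain.

```python
BASE = {"fps": 127.4, "params_m": 2.83, "gflops": 10.2}
PROPOSED = {"fps": 57.2, "params_m": 12.32, "gflops": 38.6}


def latency_ms(fps):
    """Per-image latency implied by a throughput measurement."""
    return 1000.0 / fps


lat_base, lat_prop = latency_ms(BASE["fps"]), latency_ms(PROPOSED["fps"])
slowdown = BASE["fps"] / PROPOSED["fps"]
param_ratio = PROPOSED["params_m"] / BASE["params_m"]
flop_ratio = PROPOSED["gflops"] / BASE["gflops"]
fps_drop_pct = 100.0 * (1 - PROPOSED["fps"] / BASE["fps"])  # relative change, %
recall_gain_pp = 77.9 - 73.3                                  # absolute change, pp
print(f"{lat_base:.2f} ms -> {lat_prop:.2f} ms, x{slowdown:.2f} slower; "
      f"params x{param_ratio:.2f}, FLOPs x{flop_ratio:.2f}; "
      f"FPS -{fps_drop_pct:.1f}%; recall +{recall_gain_pp:.1f} pp")
```

Latency is the reciprocal of throughput, so a 2.23-fold FPS reduction translates directly into a 2.23-fold increase in per-image processing time under the same measurement conditions.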

Among the three updates, SAConv has the greatest effect on recall and mAP@0.5 because the adaptive receptive field improves the representation of elongated and discontinuous crack structures. However, it also introduces the largest increase in parameters and FLOPs. CGA mainly improves feature discrimination and background suppression with a moderate inference-speed cost, whereas Shape-IoU improves shape-aware localization but provides limited improvement when used alone. The final model therefore represents an accuracy–efficiency trade-off rather than a purely lightweight alternative to YOLO11n.

To further interpret the performance gains, Grad-CAM visualizations were generated for both YOLO11n and the proposed model, as shown in Figure 6. The purpose of Figure 6 is not to claim that all target cracks are invisible to the naked eye; rather, it illustrates that even visually identifiable cracks may be difficult for detectors under vegetation cover, shadows, soil texture interference, and discontinuous crack boundaries. The baseline YOLO11n exhibits dispersed activations and noticeable responses to vegetation and background textures. In contrast, the proposed model produces more concentrated activations along crack structures, indicating that the combination of SAConv and CGA enhances crack feature perception while suppressing irrelevant background interference.
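Grad-CAM weights each channel of a chosen feature layer by the spatial average of the gradient of a detection score with respect to that channel, sums the weighted activations, and keeps only positive evidence. A minimal NumPy sketch of this weighting step follows. Extracting activations and gradients from a YOLO detector requires framework hooks that are not shown, and the layer and score used for Figure 6 are not specified in the text; the toy inputs are hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map from one layer's activations A and gradients dY/dA.

    Both arrays have shape (K, H, W). Channel weights are the spatial mean of
    the gradients; the weighted sum of activations is passed through ReLU and
    scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))                         # alpha_k
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


rng = np.random.default_rng(0)
A = rng.random((4, 16, 16))
A[0, 8, :] += 5.0                          # channel 0 responds along a horizontal "crack"
G = np.zeros((4, 16, 16)); G[0] = 1.0      # only channel 0 influences the score
heat = grad_cam(A, G)
print(heat[8].mean() > heat[2].mean())
```

A concentrated heat map along a crack line, as in the toy example, corresponds to the pattern described for the proposed model, whereas activations spread across vegetation or soil texture correspond to the dispersed baseline response.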

Figure 7 presents feature maps of different network configurations. Compared with YOLO11n, SAConv enhances crack continuity representation, while the addition of CGA further suppresses background interference and strengthens crack feature discrimination. The visualization results agree with the quantitative improvements in Table 2, supporting the effectiveness of the proposed framework in complex mining environments.

4.6. Comparison Results and Analysis with Other Algorithms

To evaluate the performance of the proposed model for surface crack detection, several state-of-the-art object detection models were compared with the baseline YOLO11n, as reported in Table 3. The F1-score is calculated from the reported precision and recall values.

The current quantitative comparison focuses on lightweight YOLO-family detectors because they share the same bounding-box output format and can be trained under comparable settings. Transformer-based detectors and segmentation-based crack methods are valuable alternatives, but direct comparison requires consistent annotation formats, scale handling, and evaluation protocols. Therefore, they are discussed qualitatively in the literature comparison table, while future work will include controlled experiments with RT-DETR-like detectors and segmentation networks.

The proposed model achieved the highest F1-score (81.6%) among the compared YOLO-family detectors, exceeding the baseline YOLO11n (78.4%) by 3.2 percentage points. This result indicates that the improvement in recall was achieved while maintaining high precision.

To further demonstrate the advantages of the proposed algorithm, Figure 8 compares the precision (P), recall (R), and mAP@0.5 curves of the baseline YOLO11n and the proposed model during training. Both models converged within the 300-epoch training schedule, and the proposed model consistently outperformed the baseline in P, R, and mAP@0.5.

To evaluate the run-to-run stability of the proposed framework, five independent training runs with different random seeds were conducted for each model. Table 4 reports the mean and standard deviation of precision, recall, and mAP@0.5. The proposed model achieved higher mean values on all three metrics, with lower standard deviations than the baseline YOLO11n.

Independent-sample t-tests indicated that the improvements in all three accuracy metrics were statistically significant (p < 0.05; Table 4). The boxplots in Figure 9 illustrate the distribution of results across the repeated runs and show the stability of the proposed framework.
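The t statistics behind such a comparison can be recomputed from the summary statistics in Table 4. The Python sketch below uses Student's pooled-variance two-sample t-test and the two-sided 5% critical value for 8 degrees of freedom (2.306). It assumes that "±" in Table 4 denotes the sample standard deviation over n = 5 runs and that the pooled (equal-variance) variant was used. Neither assumption is stated explicitly in the text. The sketch is therefore not intended to reproduce the exact p-values in Table 4, which also depend on the test variant (for example, pooled vs. Welch, or one- vs. two-sided).

```python
import math

T_CRIT_DF8 = 2.306  # two-sided 5% critical value of Student's t, 8 degrees of freedom


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t statistic (pooled variance) for mean_b - mean_a."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / se, df


# Table 4: (mean, SD) for YOLO11n, then for the proposed model; n = 5 runs each.
table4 = {
    "Precision": ((84.2, 0.36), (85.6, 0.28)),
    "Recall": ((73.3, 0.42), (77.9, 0.31)),
    "mAP@0.5": ((81.1, 0.39), (84.3, 0.27)),
}

for metric, ((m0, s0), (m1, s1)) in table4.items():
    t, df = pooled_t(m0, s0, 5, m1, s1, 5)
    print(f"{metric}: t({df}) = {t:.2f}; |t| > {T_CRIT_DF8}: {abs(t) > T_CRIT_DF8}")
```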

For the five-run statistical analysis, approximate normality of each metric was assumed, because all runs used identical data splits and training settings and differed only in the random seed. The reported p-values indicate statistically significant improvements, and the lower standard deviations suggest improved stability. In future work, confidence intervals and effect sizes will be reported together with p-values to strengthen the statistical interpretation.
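The sketch below illustrates the effect-size and confidence-interval reporting proposed above. It computes Cohen's d with a pooled standard deviation and a 95% confidence interval for the mean mAP@0.5 gain. It relies on the usual two-sample t assumptions: approximate normality, equal variances, and "±" in Table 4 read as the sample standard deviation over five runs. These are assumptions for illustration, not a description of the authors' procedure.

```python
import math

T_CRIT_DF8 = 2.306  # two-sided 95% critical value of Student's t, 8 degrees of freedom


def effect_size_and_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, t_crit=T_CRIT_DF8):
    """Cohen's d (pooled SD) and a 95% CI for the mean difference mean_b - mean_a.

    The default t_crit is only valid when n_a + n_b - 2 == 8.
    """
    df = n_a + n_b - 2
    sd_pooled = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    diff = mean_b - mean_a
    half_width = t_crit * sd_pooled * math.sqrt(1 / n_a + 1 / n_b)
    return diff / sd_pooled, (diff - half_width, diff + half_width)


# mAP@0.5 from Table 4, read as mean ± sample SD over five runs per model.
d, (lo, hi) = effect_size_and_ci(81.1, 0.39, 5, 84.3, 0.27, 5)
print(f"Cohen's d = {d:.1f}; 95% CI for the gain: [{lo:.2f}, {hi:.2f}] percentage points")
```

A d value this large mainly reflects the small run-to-run standard deviations. With only five runs per model, both d and the interval are themselves imprecise.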

4.7. Generalization Evaluation on the Crack500 Dataset

To further evaluate the generalization capability of the proposed framework, cross-dataset experiments were conducted using the publicly available Crack500 dataset [34]. The model was trained on the self-constructed mining crack dataset and directly tested on Crack500 without additional fine-tuning. Because Crack500 is a close-range pavement crack dataset rather than a high-altitude UAV mining dataset, this experiment should be interpreted as a cross-domain stress test rather than proof of full operational transferability.

As shown in Table 5, the proposed model achieved the highest precision, recall, and mAP@0.5 among all compared methods. Specifically, it improved mAP@0.5 by 5.1 percentage points over the baseline YOLO11n (80.2% vs. 75.1%), indicating better transferability under this cross-domain protocol. F1-scores were calculated from the reported precision and recall values.
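mAP@0.5 is the headline metric in Tables 2, 3, and 5. A minimal sketch of how average precision at an IoU threshold of 0.5 is typically computed for a single class (here, cracks) may help readers interpret the gains. The sketch uses greedy, confidence-ordered matching and all-point interpolation of the precision-recall curve. Evaluation toolkits differ in interpolation scheme and matching details, so this sketch is illustrative only. It will not necessarily reproduce the values reported by the training framework used in this study. Boxes are assumed to be (x1, y1, x2, y2) in pixels.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_thr=0.5):
    """Single-class AP at a fixed IoU threshold.

    detections:    list of (image_id, confidence, box)
    ground_truths: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    precisions, recalls, tp, fp = [], [], 0, 0
    # Visit detections from most to least confident.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        ious = [box_iou(box, gt) for gt in ground_truths.get(img, [])]
        best = max(range(len(ious)), key=ious.__getitem__) if ious else -1
        if best >= 0 and ious[best] >= iou_thr and not used[img][best]:
            used[img][best] = True  # each ground-truth crack can be matched once
            tp += 1
        else:
            fp += 1                 # background hit, duplicate, or poor overlap
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # All-point interpolation: precision envelope, then area under the P-R curve.
    mrec = [0.0] + recalls + [1.0]
    mpre = [1.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


# Two ground-truth cracks; the second-ranked detection is a false positive.
gts = {"img0": [(0, 0, 100, 5), (0, 50, 100, 55)]}
dets = [("img0", 0.9, (0, 0, 100, 5)), ("img0", 0.8, (200, 200, 210, 210)),
        ("img0", 0.7, (0, 50, 100, 55))]
ap50 = average_precision(dets, gts)  # 0.5 * 1.0 + 0.5 * (2/3) = 5/6
```

With a single "crack" class, mAP@0.5 equals this AP. With several classes, it would be the mean of the per-class values.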

The image scale, background texture, and ground sampling distance differ substantially between the mining UAV dataset and Crack500. No scale alignment or domain adaptation was applied in this experiment. The reported gain therefore mainly indicates that the proposed modules improve relative robustness compared with the baseline under the same cross-domain protocol. Broader validation across multiple mining regions and UAV flight conditions is still required before general deployment capability can be claimed.

On Crack500, the proposed model achieved the highest F1-score (78.4%), compared with 74.4% for YOLO11n. This 4.0-percentage-point improvement supports the cross-domain robustness of the proposed feature-extraction and localization strategy.

Figure 10 presents representative detection results on Crack500. Compared with the baseline model, the proposed framework exhibits stronger continuity perception for elongated crack structures and better resistance to background interference, particularly under shadow and texture-rich conditions.

These results suggest that integrating SAConv, CGA, and Shape-IoU makes the feature representation more robust. Relative to the baseline, it also improves the network's generalization beyond the mining-specific training data.

4.8. Ablation Study and Comparative Analysis

To evaluate the effectiveness of the proposed improvements, systematic ablation and comparative experiments were conducted based on YOLO11n. The results show that SAConv, CGA, and Shape-IoU contribute differently to crack detection performance.

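Among the three modules, SAConv provides the largest individual improvement in recall and mAP@0.5. It raises recall from 73.3% to 78.7% and mAP@0.5 from 81.1% to 83.4% (Table 2, configuration C). This improvement is attributed mainly to its adaptive receptive field, which enhances multi-scale feature extraction and continuity perception for elongated and discontinuous crack structures. CGA further improves detection performance by suppressing responses to vegetation, shadows, and complex surface textures, thereby reducing background interference and strengthening crack feature representation. When combined with SAConv (configuration D), it increases precision from 83.4% to 86.2% and mAP@0.5 from 83.4% to 83.9%, with a slight decrease in recall. Shape-IoU alone does not improve overall accuracy: configuration A reaches an mAP@0.5 of 80.9%, versus 81.1% for the baseline. Its role is to improve localization stability by incorporating shape-aware geometric constraints into bounding-box regression, and in the full model it raises recall and mAP@0.5 relative to configuration D.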
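To make the shape-aware constraint concrete, the sketch below follows the Shape-IoU formulation of Zhang and Zhang [42] as we understand it. A shape-weighted center-distance term and a width/height shape cost are added to the plain IoU loss, with weights derived from the ground-truth box. The scale factor (scale = 0, which makes both weights equal to 1) and the exponent θ = 4 are illustrative defaults. The configuration actually used in this study is the one described in Section 3.1.

```python
import math


def shape_iou_loss(pred, gt, scale=0.0, theta=4.0, eps=1e-7):
    """Shape-IoU loss for boxes (x1, y1, x2, y2); gt is the ground-truth box."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph, gw, gh = px2 - px1, py2 - py1, gx2 - gx1, gy2 - gy1

    # Standard IoU.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Shape weights from the ground-truth box (scale = 0 gives ww = hh = 1).
    ww = 2 * gw**scale / (gw**scale + gh**scale)
    hh = 2 * gh**scale / (gw**scale + gh**scale)

    # Shape-weighted center distance, normalized by the enclosing-box diagonal.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw**2 + ch**2 + eps
    dx2 = ((gx1 + gx2) - (px1 + px2)) ** 2 / 4
    dy2 = ((gy1 + gy2) - (py1 + py2)) ** 2 / 4
    distance = (hh * dx2 + ww * dy2) / c2

    # Shape cost: penalize width and height mismatch with the ground truth.
    omega_w = hh * abs(pw - gw) / max(pw, gw)
    omega_h = ww * abs(ph - gh) / max(ph, gh)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + distance + 0.5 * shape_cost


# A slender crack box (100 x 4 px) and two predictions offset by 2 px.
crack = (0, 0, 100, 4)
along = shape_iou_loss((2, 0, 102, 4), crack, scale=1.0)   # shift along the crack
across = shape_iou_loss((0, 2, 100, 6), crack, scale=1.0)  # shift across the crack
```

In this toy case, most of the difference between the two losses comes from the IoU term itself. With scale > 0, the weights additionally emphasize offsets and size errors along the short side of a slender ground-truth box.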

By integrating all three modules, the proposed model achieves the best overall performance. Compared with the baseline YOLO11n, precision, recall, and mAP@0.5 increase by 1.4, 4.6, and 3.2 percentage points, respectively. Qualitative results suggest that the improvement is particularly evident for small, slender, and low-contrast crack targets, which are common in mining subsidence areas.
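The tables report percentages, so the gains quoted in the text are absolute differences in percentage points, not relative changes. The short Python sketch below computes both forms from the YOLO11n and proposed-model values in Table 3.

```python
def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in %)."""
    absolute = after - before
    return absolute, 100 * absolute / before


# YOLO11n vs. proposed model, values in % from Table 3.
baseline = {"precision": 84.2, "recall": 73.3, "mAP@0.5": 81.1}
proposed = {"precision": 85.6, "recall": 77.9, "mAP@0.5": 84.3}

for metric in baseline:
    pp, rel = gains(baseline[metric], proposed[metric])
    print(f"{metric}: +{pp:.1f} percentage points (+{rel:.1f}% relative)")
```

For example, the 3.2-point mAP@0.5 gain corresponds to a relative improvement of about 3.9%.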

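The comparative experiments (Table 3) show that YOLOv8n achieves competitive accuracy (mAP@0.5 of 82.1%) together with the highest inference speed (156.8 FPS). Its detection accuracy nonetheless remains lower than that of the proposed model. YOLO26n exhibits the weakest overall accuracy, followed by YOLO12n. The proposed model achieves the highest precision, recall, mAP@0.5, and F1-score. However, it also has the largest parameter count and model size and the lowest FPS among the compared detectors. It should therefore be regarded as the most accurate configuration rather than the most efficient one, which is consistent with its intended use for accuracy-oriented UAV inspection.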

The training curves of precision, recall, and mAP@0.5 show stable convergence, with all metrics gradually stabilizing after approximately 300 epochs. In addition, the Grad-CAM and feature-map visualizations presented in Figure 6 and Figure 7 indicate that the proposed model focuses more accurately on crack regions and suppresses irrelevant background responses. Representative results on the Crack500 dataset (Figure 10) and UAV imagery from the study area (Figure 11) further show fewer false positives and missed detections compared with YOLO11n, particularly for elongated and discontinuous cracks under complex environmental conditions.

Overall, the proposed framework achieves more accurate crack localization and detection while maintaining stable performance, demonstrating its applicability for UAV-based surface crack monitoring in mining areas.

5. Discussion

A deeper interpretation of the results indicates that the proposed architecture improves crack detection mainly by strengthening long-range context, suppressing background interference, and improving slender-box localization. However, these gains are accompanied by a clear computational trade-off, so the method should be considered an accuracy-oriented enhancement rather than a strictly lightweight detector.

The increase in parameters and FLOPs is acceptable for UAV-based inspection tasks in which images can be processed after flight or on a workstation. However, lightweight pruning, knowledge distillation, and edge-device deployment remain necessary for fully real-time field applications.

The proposed framework enhances surface crack detection in mining areas by integrating SAConv, CGA, and Shape-IoU into YOLO11n. Experimental results demonstrate improved detection performance on both the self-constructed mining crack dataset and the Crack500 dataset, indicating good adaptability to different crack scenarios.

Compared with recent UAV-based crack detection studies [16,17,23,32], the proposed framework achieves higher recall and mAP@0.5. These studies used different datasets and evaluation protocols, however, so the comparison is indicative rather than direct. The improvement is also obtained at the cost of substantially increased computational complexity. In particular, the proposed model improves mAP@0.5 by 3.2 percentage points over the baseline YOLO11n and demonstrates stronger cross-dataset performance on Crack500. These results suggest that the proposed framework is more effective in handling thin, elongated, and discontinuous crack structures. Once SAConv is introduced, however, it is no longer a strictly lightweight nano-scale model.

One notable advantage of the proposed framework is its robustness under complex environmental conditions. Mining surface images often contain vegetation, shadows, exposed rocks, and heterogeneous textures, which can interfere with crack identification. The incorporation of CGA improves feature discrimination and reduces background interference, enabling more reliable crack detection under challenging conditions. This advantage is further supported by the Grad-CAM and feature-map visualizations, which show more concentrated responses along crack regions and reduced activation in irrelevant background areas.
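Cascaded group attention, as introduced in EfficientViT [44], splits the channels into groups and assigns one group to each attention head. The output of each head is added to the input of the next, and the head outputs are then concatenated and projected. The NumPy sketch below illustrates only this cascade for a single layer on flattened feature-map tokens. It omits the query token-interaction convolution, normalization, and the exact placement of CGA in the proposed network (Section 3.2), and its random weights are placeholders.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cascaded_group_attention(x, heads, w_proj):
    """Cascaded group attention over N tokens with C channels.

    x:      (N, C) tokens, e.g. a flattened H x W feature map.
    heads:  list of (w_q, w_k, w_v); w_q, w_k: (C/h, d), w_v: (C/h, C/h).
    w_proj: (C, C) output projection.
    """
    chunks = np.split(x, len(heads), axis=1)  # one channel group per head
    outputs, prev = [], None
    for x_j, (w_q, w_k, w_v) in zip(chunks, heads):
        # Cascade: add the previous head's output to this head's input split.
        inp = x_j if prev is None else x_j + prev
        q, k, v = inp @ w_q, inp @ w_k, inp @ w_v
        attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # (N, N) attention weights
        prev = attn @ v                                 # (N, C/h)
        outputs.append(prev)
    return np.concatenate(outputs, axis=1) @ w_proj


# Toy usage: a 4 x 4 feature map (16 tokens) with 8 channels and 4 heads.
rng = np.random.default_rng(0)
n_tokens, channels, n_heads, d_key = 16, 8, 4, 2
x = rng.standard_normal((n_tokens, channels))
heads = [
    (rng.standard_normal((channels // n_heads, d_key)),
     rng.standard_normal((channels // n_heads, d_key)),
     rng.standard_normal((channels // n_heads, channels // n_heads)))
    for _ in range(n_heads)
]
y = cascaded_group_attention(x, heads, rng.standard_normal((channels, channels)))
```

Because each head works on only C/h channels, the cascade lets later heads refine earlier ones without the cost of full multi-head attention on all channels.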

Failure cases were associated primarily with three mechanisms. First, visual ambiguity caused false positives where strong shadows, vegetation boundaries, tire tracks, and erosion textures resembled crack edges. Second, incomplete visibility under dense vegetation or shadow interrupted crack continuity and increased false negatives for partially occluded cracks. Third, insufficient spatial evidence reduced sensitivity to very narrow, highly fragmented, or low-contrast fissures whose widths approach the effective image-resolution limit. These observations highlight the need for environment-stratified sampling, hard-negative mining, moderate geometry-preserving augmentation, and multi-region validation. Because the retained test records were not tagged by environmental category, category-specific error rates are not reported; this remains a target for future work.

The proposed framework also exhibits improved sensitivity to small, slender, and discontinuous cracks. Surface cracks in mining subsidence areas are typically characterized by elongated morphology, large-scale variations, and weak visual contrast. By adaptively adjusting the receptive field, SAConv enhances multi-scale feature extraction and continuity perception, allowing the network to better capture crack structures of varying sizes. Meanwhile, Shape-IoU introduces shape-aware constraints into bounding-box regression, improving localization stability for irregular crack targets. These improvements collectively contribute to the enhanced localization accuracy and detection performance observed across all experiments.
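Switchable atrous convolution, as defined in DetectoRS [45], mixes a standard convolution and a dilated convolution through a pixel-wise switch: y = S(x)·Conv(x, w, 1) + (1 − S(x))·Conv(x, w + Δw, r). Here S(x) is obtained from a 5 × 5 average pooling followed by a 1 × 1 convolution and a sigmoid. The single-channel NumPy sketch below implements only this formula. It omits the global-context modules and multi-channel weights of the original design. The atrous rate r = 3 and the switch parameters are illustrative assumptions, not the settings used in Section 3.3.

```python
import numpy as np


def conv2d_same(x: np.ndarray, k: np.ndarray, dilation: int = 1) -> np.ndarray:
    """Single-channel 2D cross-correlation with odd kernel, zero 'same' padding."""
    kh, kw = k.shape
    ph, pw = dilation * (kh // 2), dilation * (kw // 2)
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape)
    for i in range(kh):
        for j in range(kw):
            r, c = i * dilation, j * dilation
            out += k[i, j] * xp[r:r + x.shape[0], c:c + x.shape[1]]
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def switchable_atrous_conv(x, w, delta_w, rate=3, switch_weight=1.0, switch_bias=0.0):
    """y = S(x) * Conv(x, w, 1) + (1 - S(x)) * Conv(x, w + delta_w, rate)."""
    # Switch: 5x5 average pooling (stride 1), a 1x1 "conv" (scalar affine), sigmoid.
    pooled = conv2d_same(x, np.full((5, 5), 1.0 / 25.0))
    s = sigmoid(switch_weight * pooled + switch_bias)
    small_rf = conv2d_same(x, w, dilation=1)               # local detail
    large_rf = conv2d_same(x, w + delta_w, dilation=rate)  # wider context
    return s * small_rf + (1.0 - s) * large_rf


# Toy usage: a 9 x 9 patch with a thin diagonal "crack".
x = np.eye(9)
w = np.full((3, 3), 1.0 / 9.0)
delta_w = np.zeros((3, 3))  # in SAC, delta_w is a trainable weight offset
y = switchable_atrous_conv(x, w, delta_w)
```

Because the switch is computed per pixel, the layer can favor the small receptive field near fine crack detail and the dilated branch where wider context helps. This is the adaptive behavior described above.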

Despite these advantages, several limitations remain. Although the proposed model achieves higher detection accuracy, the introduction of SAConv substantially increases computational complexity. It reduces inference speed by 55.1% relative to the baseline YOLO11n (57.2 vs. 127.4 FPS). This trade-off may limit deployment on resource-constrained edge devices or in real-time monitoring systems. Furthermore, the dataset used in this study was collected from a single mining area, and its geological and environmental diversity remains limited. Therefore, the model’s generalization capability under extreme conditions requires further validation, for example under dense vegetation cover, severe ground deformation, or highly complex terrain backgrounds.
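The efficiency trade-off can be quantified directly from Table 2. The Python sketch below computes the relative FPS reduction and the per-frame latency implied by each FPS value. Latency is taken as 1000/FPS, which ignores batching and pre- and post-processing overheads. The sketch also computes the growth in parameters and FLOPs.

```python
# Efficiency figures from Table 2.
baseline = {"fps": 127.4, "params_m": 2.83, "gflops": 10.2}   # YOLO11n
proposed = {"fps": 57.2, "params_m": 12.32, "gflops": 38.6}   # proposed model

fps_drop_pct = 100 * (1 - proposed["fps"] / baseline["fps"])
latency_ms = {name: 1000 / m["fps"] for name, m in (("YOLO11n", baseline), ("proposed", proposed))}
param_ratio = proposed["params_m"] / baseline["params_m"]
flops_ratio = proposed["gflops"] / baseline["gflops"]

print(f"FPS reduction: {fps_drop_pct:.1f}%")
print(f"Latency per frame: {latency_ms['YOLO11n']:.1f} ms -> {latency_ms['proposed']:.1f} ms")
print(f"Parameters x{param_ratio:.2f}, FLOPs x{flops_ratio:.2f}")
```

Under this simple model, the proposed network needs about 17.5 ms per frame, versus about 7.8 ms for YOLO11n. It also has roughly 4.4 times the parameters and 3.8 times the FLOPs.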

Because all UAV images were collected from the Zhungeer mining area, the current dataset may reflect regional geological and environmental characteristics. Crack morphology in central-eastern mining areas, the Shendong mining area, Xinjiang mining areas, and southwestern mining areas may differ in soil color, vegetation coverage, fracture scale, and deformation pattern. Therefore, the current results should not be interpreted as fully representative of all mining regions; multi-region datasets are required to verify broader generalization.

Future work will focus on reducing computational overhead while maintaining detection accuracy. Lightweight network design, model compression, and knowledge distillation techniques may be explored to improve deployment efficiency. In addition, the integration of multi-source data, such as UAV imagery, LiDAR, and multispectral information, may further enhance the robustness and generalization capability of crack detection in complex mining environments.

6. Conclusions

This study presents an enhanced YOLO11n framework for UAV-based surface crack detection in coal-mining subsidence areas. By integrating SAConv, CGA, and Shape-IoU loss, the proposed method improved crack localization and detection accuracy under complex environmental conditions. Compared with the baseline model, precision, recall, and mAP@0.5 increased by 1.4, 4.6, and 3.2 percentage points, respectively. However, the improved accuracy is accompanied by more parameters, a larger model size, higher FLOPs, and lower FPS, and the dataset remains limited in regional diversity. Future work will focus on lightweight model design, pruning, knowledge distillation, edge-AI deployment, multi-region UAV datasets, digital-twin integration, and predictive geological-hazard monitoring systems.

Author Contributions

Conceptualization, M.W. and Z.Z.; Methodology, M.W., N.Z., W.R. and Z.Z.; Software, M.W., N.Z., C.L. and W.R.; Validation, M.W. and Z.Z.; Formal analysis, M.W. and C.L.; Investigation, M.W. and Z.Z.; Resources, M.W. and Z.Z.; Data curation, M.W., N.Z. and W.R.; Writing—original draft, M.W. and Z.Z.; Writing—review and editing, M.W. and Z.Z.; Supervision, M.W., N.Z. and C.L.; Project administration, M.W. and Z.Z.; Funding acquisition, M.W. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by geological survey projects of the China Geological Survey (No. DD202606301704).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Yang, Q.; Hu, Z.; Han, J.; Yang, K.; Fu, Y. Research on extraction method of ground fissures caused by mining through UAV images in coal mine areas. Coal Sci. Technol. 2023, 51, 187–196.
Hu, Z.; Li, Y.; Li, G.; Han, J.; Liu, S. Opportunities and challenges of land reclamation and ecological restoration in mining areas under carbon neutrality targets. Coal Sci. Technol. 2023, 51, 474–483.
Zhu, C.; Huang, Y.; Rui, G.; Zhou, Z. Development of ground fissures in coal mining areas induced by mining activities. Chin. J. Geol. Hazard Control 2017, 28, 47–52.
Zhang, J.; Wang, K.; Zhao, T.; Fang, P.; Qi, K.; Wei, B.; Li, Z. Status and development of UAV remote sensing technology in mining surface subsidence and fracture measurement. Coal Sci. Technol. 2024, 52, 435–444.
Ma, R.; Yu, H.; Liu, X.; Yuan, X.; Geng, T.; Li, P. InSAR-YOLOv8 for wide-area landslide detection in InSAR measurements. Sci. Rep. 2025, 15, 1595.
He, K.; Dong, J.; Ma, H.; Cai, Y.; Feng, R.; Dong, Y.; Wang, L. Remote sensing image interpretation of geological lithology via a sensitive feature self-aggregation deep fusion network. Int. J. Appl. Earth Obs. Geoinf. 2025, 137, 104384.
Zhang, L.; Gao, P.; Gan, Z.; Wu, W.; Sun, Y.; Zhu, C.; Long, S.; Liu, M.; Peng, H. Surface subsidence monitoring of mining areas in Hunan Province based on Sentinel-1A and DS-InSAR. Sensors 2023, 23, 8146.
Zhu, M.; Yu, X.; Tan, H.; Yuan, J. Integrated high-precision monitoring method for surface subsidence in mining areas using D-InSAR, SBAS, and UAV technologies. Sci. Rep. 2024, 14, 12445.
Hou, E.; Zhang, J.; Xie, X.; Xu, Y. Comparative application of UAV and satellite remote sensing technologies for ground surface crack detection in coal mining areas. Geol. Bull. China 2019, 38, 443–448.
Wei, C.; Wang, Y.; Wang, J.; Zhao, H. Extraction of ground fissure information in mining areas using UAV imagery. Met. Mine 2012, 436, 90–92.
Wang, X.; Tian, M.; Zhang, Z.; He, K.; Wang, S.; Liu, Y.; Dong, Y. SDSNet: Building extraction in high-resolution remote sensing images using a deep convolutional network with cross-layer feature information interaction filtering. Remote Sens. 2024, 16, 169.
Wang, H. Automatic Extraction of Ground Fissures in UAV Images Based on Deep Learning Techniques. Master’s Thesis, China University of Geosciences, Beijing, China, 2021.
Zhao, Y.; Xu, D.; Sun, B.; Jiang, Y.; Zhang, C.; He, X. Investigation on ground fissure identification using UAV infrared remote sensing and edge detection technology. J. China Coal Soc. 2021, 46, 624–637.
Yang, K.; Hu, Z.; Liang, Y.; Fu, Y.; Yuan, D.; Guo, J.; Li, G.; Li, Y. Automated extraction of ground fissures due to coal mining subsidence based on UAV photogrammetry. Remote Sens. 2022, 14, 1071.
Zhang, F.; Hu, Z.; Liang, Y.; Li, Q. Evaluation of surface crack development and soil damage based on UAV images of coal mining areas. Land 2023, 12, 774.
Wang, W.; Du, W.; Song, X.; Chen, S.; Zhou, H.; Zhang, H.; Zou, Y.; Zhu, J.; Cheng, C. DRA-UNet for coal mining ground surface crack delineation with UAV high-resolution images. Sensors 2024, 24, 5760.
Su, Y.; Tang, F.; Li, J.; Wang, C.; Zhang, X. Improved YOLOv7 model for intelligent recognition of mining surface cracks. Saf. Coal Mines 2024, 55, 169–176.
Xu, Z.; Lin, Y.; Zhang, Z. FS-YOLOv8: A deep learning network for ground fissure instance segmentation in UAV images of the coal mining area. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2024, 48, 777–785.
Chen, W.; Zhong, C.; Qin, X.; Wang, L. Intelligent interpretation for geological disasters: A space-air-ground integration perspective. In Springer Series in Geomechanics and Geoengineering; Springer: Singapore, 2023; pp. 171–233.
Lian, X.; Li, Y.; Wang, X.; Shi, L.; Xue, C. Research on identification and localization of mining landslides based on improved YOLO algorithm. Drones 2024, 8, 150.
Li, L. Research on Characteristics of Mining Surface Cracks Based on UAV Images. Master’s Thesis, Xi’an University of Science and Technology, Xi’an, China, 2021.
Wang, Z.; Wang, H.; Li, G. Weakly supervised learning for automatic extraction of ground fissures from UAV images. Exp. Technol. Manag. 2022, 39, 51–56.
Meng, J.; Xu, X.; Li, P.; Zhang, Z.; Zhao, W.; Ren, J. GF-Former: UAV-based high-precision ground fissure segmentation network. Int. J. Mach. Learn. Cybern. 2025, 16, 1127–1143.
Lian, X.; Han, Y.; Liu, X.; Hu, H.; Cai, Y. Progress and trends in UAV-based mining geological disaster monitoring. Met. Mine 2023, 1, 17–29.
Terven, J.; Cordova-Esparza, D.M.; Romero-Gonzalez, J.A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
Hou, E.; Mu, J.; Xie, X.; Feng, D.; Yang, L.; Li, Y.; He, T.; Bao, K.; Chen, X. Formation mechanisms and evolution of surface cracks induced by shallow coal seam mining. Coal Geol. Explor. 2025, 53, 107–117.
He, K.; Feng, R.; Zhang, Z.; Dong, Y. Remote sensing interpretation of geological elements via a synergistic neural framework with multi-source data and prior knowledge. Remote Sens. 2025, 17, 2772.
Wei, B.; Liu, G.; Wang, Z. Ground fissure extraction in loess regions using modified MF-FDOG algorithm and UAV images. Ce Hui 2018, 41, 51–56.
Xu, W.; Lu, H.; Cheng, Z.; Lu, A.; Wang, H.; Wang, Y.; Li, S. Crack detection in UAV bridge images based on improved YOLOX-S. J. Jilin Univ. Sci. Ed. 2025, 63, 1091–1098.
Hao, M.; Lin, H.; Gao, Y. Ground fissure extraction based on improved active contour model for UAV images. J. Geo-Inf. Sci. 2022, 24, 2448–2457.
Wang, X.; Cai, Y.; Hu, H. Mining crack extraction based on dynamic snake-dilation convolution model. Bull. Surv. Mapp. 2024, 10, 144–150.
Zhang, L.; Li, X.; Hao, S.; Yan, Q.; Wang, J.; Wang, M. A study on the identification of cracks in mine subsidence based on YOLOv8n improvement. Processes 2024, 12, 2716.
Wang, G.; Zhao, X.; Dang, D.; Wang, J.; Chen, Y. Enhancing object detection with Shape-IoU and Scale–Space–Task collaborative lightweight path aggregation. Appl. Sci. 2025, 15, 11976.
Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1525–1535.
Zakeri, H.; Nejad, F.M.; Fahimifar, A. Image based techniques for crack detection, classification and quantification in asphalt pavement: A review. Arch. Comput. Methods Eng. 2017, 24, 935–977.
Eschmann, C.; Kuo, C.M.; Kuo, C.H.; Boller, C. Unmanned aircraft systems for remote building inspection and monitoring. In Proceedings of the 6th European Workshop on Structural Health Monitoring, Dresden, Germany, 3–6 July 2012.
Dorafshan, S.; Thomas, R.J.; Maguire, M. Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete. Constr. Build. Mater. 2018, 186, 1031–1045.
Dung, C.V.; Anh, L.D. Autonomous concrete crack detection using deep fully convolutional neural network. Autom. Constr. 2019, 99, 52–58.
Cha, Y.J.; Choi, W.; Buyukozturk, O. Deep learning-based crack damage detection using convolutional neural networks. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378.
Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Wei, L.; Ke, Z.; Li, Q.; Cheng, M.; et al. YOLOv6: A single-stage object detection framework for industrial applications. IEEE Trans. Ind. Electron. 2023, 70, 11150–11161.
Zhang, Y.; Chen, Z. Improved YOLOv8 for small target detection on water surfaces. Comput. Syst. Appl. 2024, 33, 152–161.
Zhang, H.; Zhang, S. Shape-IoU: More accurate metric considering bounding box shape and scale. arXiv 2023, arXiv:2312.17663.
Rezatofighi, H.; Tsoi, N.; Gwak, J.Y.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
Liu, X.; Peng, H.; Zheng, N.; Yang, Y.; Hu, H.; Yuan, Y. EfficientViT: Memory efficient vision transformer with cascaded group attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14420–14430.
Qiao, S.; Chen, L.C.; Yuille, A. DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10213–10224.
Wang, X.; Liang, W.; Bi, C.; Li, J.; Wang, X. Hyperspectral image classification using hybrid convolution and cascaded group attention. Processes 2025, 45, 1485–1493.
Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. In Proceedings of the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016.
Guo, Y.; Liu, X. Environmental Geological Survey and Management in the Zhungeer Mining Area; China University of Mining and Technology: Xuzhou, China, 2021.
Wang, P.; Niu, Y.; Chen, K.; Zhang, J.; Gao, M.; Yu, Y. Field survey on the evolution of main ground cracks under condition of forced hard roof caving in mining ultra-thick coal seams under shallow overburden. J. China Coal Soc. 2023, 48, 3674–3687.

Figure 1. Schematic diagram of the proposed YOLO11 network architecture. Red circles indicate representative crack targets detected by the model.

Figure 2. Structure diagram of the Shape-IoU loss function.

Figure 3. Proposed attention mechanism diagram.

Figure 4. Convolution improvement module diagram.

Figure 5. Survey route planning map for the study area.

Figure 6. Grad-CAM visualization comparison of YOLO11n and the improved YOLO11n under four crack scenarios: vegetation cover, shadow interference, thin/discontinuous cracks, and complex-texture backgrounds. Warmer colors indicate higher activation intensity, whereas cooler colors indicate lower activation intensity.

Figure 7. Feature-map visualization of different network modules.

Figure 8. Performance comparison of different YOLO-based models on surface crack detection tasks in UAV mining scenarios: (A) precision; (B) recall; (C) mAP@0.5.

Figure 9. Boxplot comparison of precision, recall, and mAP@0.5 between YOLO11n and the proposed model over five independent runs.

Figure 10. Visual comparison of crack detection results on the Crack500 dataset.

Figure 11. Comparison of surface crack detection results at three representative UAV survey sites: (a) ground truth; (b) original YOLO11n; (c) improved YOLO11n. Green boxes denote TP, red boxes denote FP, yellow arrows denote FN, and numerical labels indicate detection confidence scores.

Table 1. Critical comparison of representative UAV and image-based crack detection studies and the proposed framework.

Study	Dataset/Platform	Architecture	Main Metrics	Complexity Report	Main Limitation
UAV mining fissure studies [16,17]	Coal-mining UAV imagery	Traditional/CNN-based extraction	Detection or segmentation accuracy	Limited or not reported	Often sensitive to shadows and irregular crack morphology
Improved YOLOv8n mining cracks [32]	Mining subsidence crack imagery	YOLOv8n + deformable modules	Precision, Recall, mAP	Lightweight YOLO metrics	Limited analysis of elongated discontinuous cracks
International pavement/concrete crack studies [34,35,36,37,38,39]	Close-range pavement, concrete, or infrastructure images	CNN/FCN/UAV inspection methods	Accuracy or mAP	Usually task-specific	Domain differs from high-altitude mining UAV imagery
This study	UAV mining subsidence crack imagery	YOLO11n + SAConv + CGA + Shape-IoU	Precision, Recall, mAP@0.5, FPS, FLOPs, parameters	12.32 M parameters, 38.6 G FLOPs, 57.2 FPS	Accuracy-oriented; edge deployment still needs compression

Table 2. Ablation experiment results.

Model	SAConv	CGA	Shape-IoU	P/%	R/%	mAP@0.5/%	FPS/f·s⁻¹	Parameters/M	FLOPs/G	Model Size/MB
YOLO11n				84.2	73.3	81.1	127.4	2.83	10.2	5.76
A			√	84.9	72.2	80.9	122.2	2.83	10.2	5.76
B		√		82.7	75.2	81.8	103.3	2.80	10.2	5.76
C	√			83.4	78.7	83.4	61.7	12.42	38.6	24.1
D	√	√		86.2	76.8	83.9	56.6	12.32	38.6	23.9
Proposed Model	√	√	√	85.6	77.9	84.3	57.2	12.32	38.6	23.9

Note: Check marks indicate the modules included in each configuration; bold values indicate the best performance in each corresponding column.

Table 3. Comparative experimental results.

Model	P/%	R/%	mAP@0.5/%	FPS/f·s⁻¹	Parameters/M	FLOPs/G	Model Size/MB	F1-Score/%
YOLOv8n	82.7	76.3	82.1	156.8	3.25	12.0	6.50	79.4
YOLOv9t	81.6	74.6	80.6	65.4	3.09	54.1	6.60	77.9
YOLOv10n	83.4	74.6	81.2	140.8	2.84	11.7	5.80	78.8
YOLO11n	84.2	73.3	81.1	127.4	2.83	10.2	5.76	78.4
YOLO11n-proposed	85.6	77.9	84.3	57.2	12.32	38.6	23.9	81.6
YOLO12n	82.1	72.7	79.4	86.8	2.81	10.2	5.80	77.1
YOLO26n	76.9	69.9	76.3	84.1	3.05	10.8	6.30	73.2

Table 4. Statistical significance analysis of five independent runs (mean ± standard deviation).

Model	Precision (%)	Recall (%)	mAP@0.5 (%)	FPS
YOLO11n	84.2 ± 0.36	73.3 ± 0.42	81.1 ± 0.39	127.4 ± 1.5
Proposed Model	85.6 ± 0.28	77.9 ± 0.31	84.3 ± 0.27	57.2 ± 0.8
p-value	0.014	0.002	0.001	—

Table 5. Generalization performance on the Crack500 dataset.

Model	Precision (%)	Recall (%)	mAP@0.5 (%)	F1-Score (%)
YOLO11n	78.4	70.8	75.1	74.4
YOLOv8n	77.9	71.2	74.8	74.4
YOLOv10n	79.3	69.5	75.7	74.1
Proposed YOLO11n	82.7	74.6	80.2	78.4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, M.; Zhao, N.; Liu, C.; Rao, W.; Zhang, Z. Enhanced YOLO11n for UAV-Based Surface Crack Detection in Mining Subsidence Areas. Processes 2026, 14, 1988. https://doi.org/10.3390/pr14121988

AMA Style

Wang M, Zhao N, Liu C, Rao W, Zhang Z. Enhanced YOLO11n for UAV-Based Surface Crack Detection in Mining Subsidence Areas. Processes. 2026; 14(12):1988. https://doi.org/10.3390/pr14121988

Chicago/Turabian Style

Wang, Mo, Nan Zhao, Chuangchuang Liu, Wanxiang Rao, and Zhijun Zhang. 2026. "Enhanced YOLO11n for UAV-Based Surface Crack Detection in Mining Subsidence Areas" Processes 14, no. 12: 1988. https://doi.org/10.3390/pr14121988

APA Style

Wang, M., Zhao, N., Liu, C., Rao, W., & Zhang, Z. (2026). Enhanced YOLO11n for UAV-Based Surface Crack Detection in Mining Subsidence Areas. Processes, 14(12), 1988. https://doi.org/10.3390/pr14121988

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.