Article

Subgrade Distress Detection in GPR Radargrams Using an Improved YOLOv11 Model

Beijing Rail Transit Line Safety and Disaster Prevention Engineering Research Center, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
Sustainability 2026, 18(3), 1273; https://doi.org/10.3390/su18031273
Submission received: 25 December 2025 / Revised: 22 January 2026 / Accepted: 23 January 2026 / Published: 27 January 2026

Abstract

This study compares three detectors—the Single Shot MultiBox Detector (SSD), the Faster Region-based Convolutional Neural Network (Faster R-CNN), and You Only Look Once v11 (YOLOv11)—for detecting subgrade distress in GPR radargrams. SSD converges fastest but shows the weakest detection performance, while Faster R-CNN achieves higher localization accuracy at the cost of slower convergence. YOLOv11 offers the best overall performance. To push YOLOv11 further, we introduce three enhancements: a Multi-Scale Edge Enhancement Module (MEEM), a Multi-Feature Multi-Scale Attention (MFMSA) mechanism, and a hybrid configuration that combines both. On a representative dataset, YOLOv11_MEEM yields a 0.2 percentage-point increase in precision and a 0.3 percentage-point gain in mean Average Precision@0.5:0.95 against a 0.2 percentage-point decrease in recall, indicating improved generalization and efficiency. YOLOv11_MFMSA achieves precision comparable to MEEM but suffers a substantial recall drop and slower inference. The hybrid YOLOv11_MEEM+MFMSA underperforms on key metrics due to gradient conflicts. MEEM reduces electromagnetic interference through dynamic edge enhancement while preserving real-time performance and robust generalization. Overall, MEEM-enhanced YOLOv11 is well suited to real-time subgrade distress detection in GPR radargrams. The findings offer technical support for the intelligent inspection of subgrade engineering while promoting the resilient development and sustainable operation and maintenance of urban infrastructure.

1. Introduction

As China’s highway network continues to expand and mature, the structural integrity and stability of subgrades remain critical to operational safety [1,2]. Subgrade anomalies, such as soil loosening and internal voids, are often concealed beneath the surface, complicating early detection. Under the combined influence of environmental stressors and repeated traffic loading, these defects can drive progressive structural deterioration, making subgrade integrity a central concern in highway maintenance [3]. Escalating maintenance demands, together with labor shortages and budget constraints, frequently delay necessary interventions and allow traffic to operate over compromised subgrades [4]. Consequently, assessing how subgrade distress impacts operational safety has become imperative.
Detecting subgrade distress is essential for infrastructure condition assessment. Ground-penetrating radar (GPR) has become a widely used tool for identifying subsurface anomalies [5,6]. At the same time, as a core non-destructive testing technique for evaluating the structural foundations of urban infrastructure, it also serves as a critical technical enabler for sustainable infrastructure development and smart city advancement. Beyond defect geometry and burial depth, the heterogeneity of surrounding pavement materials significantly affects GPR wavefield responses. These complexities often lead to low signal-to-noise ratios and features that are difficult to discern in radargrams [7]. Conventional image-processing methods frequently struggle to achieve accuracy and generalization on complex radargrams. Consequently, there is growing interest in applying deep learning to automate GPR interpretation [8,9].
Early detection frameworks relied primarily on rule-based logic and classical machine learning [10,11]. Although they achieved modest accuracy, their heavy reliance on manual feature engineering limited their effectiveness in localizing small-scale targets within cluttered radargrams. Today, SSD, Faster R-CNN, and the YOLO family offer robust distress-detection capabilities through end-to-end architectures that accelerate inference. Among them, SSD provides a favorable trade-off between speed and accuracy [12,13], while Faster R-CNN remains distinguished for its high-precision detection [14,15]. The YOLO series maintains competitive accuracy with superior real-time efficiency, underscoring its utility across diverse engineering applications [16,17]. Nevertheless, the generalization ability of detection models is constrained by the limited diversity of available GPR images and the lack of standardized data-collection protocols. Consequently, comparative evaluations of the detection performance of mainstream architectures—SSD, Faster R-CNN, and YOLO—in GPR imagery remain relatively scarce. Addressing the inherent complexities of GPR radargrams—namely cluttered textures, multi-scale targets, and ill-defined edges—will require targeted refinements of detection architectures to strengthen feature extraction and improve distress recognition performance.
This study develops a GPR dataset comprising loose soil, voids, cavities, and underground structures, derived from field inspections and forward modeling. Using this dataset, we benchmarked the detection performance of the SSD, Faster R-CNN, and YOLOv11 architectures. Building on these findings, three optimizations are proposed for YOLOv11: (1) integrating MEEM, (2) incorporating MFMSA, and (3) a hybrid configuration that combines both modules. Performance was assessed using precision, recall, and F1-score, with YOLOv11_MEEM identified as the optimal architecture for subgrade distress identification. The proposed approach substantially improves the accuracy and efficiency of identifying distress targets in GPR images, offering a viable technical solution for the automatic diagnosis of subgrade distress. It also provides robust technical support for enhancing the resilience and enabling the intelligent operation and maintenance of transportation infrastructure, demonstrating substantial engineering application value.

2. Object Detection Algorithms

2.1. Single Shot MultiBox Detector

Single Shot MultiBox Detector is a foundational single-stage detector that bypasses region proposals and makes dense predictions directly via convolutional layers, balancing speed and accuracy [18,19,20,21]. As shown in Figure 1, its architecture rests on a few core components.
(1)
Multi-scale Feature Maps: The architectural transition from the original VGG16 to a multi-scale framework is initiated by the convolutionalization of fully connected layers (FC6, FC7) and the subsequent integration of auxiliary stages (Conv8–Conv11). This structural reconfiguration establishes a hierarchical feature pyramid that facilitates a dual-pathway extraction mechanism. Within this hierarchy, shallow layers retain high-resolution spatial cues essential for identifying subtle distress details, while the deepened semantic layers capture the macro-scale signatures of extensive subgrade anomalies.
(2)
Default boxes: At every location on each feature map, a fixed set of default boxes is assigned. The scales are determined by the following expression:
$$S_k = S_{\min} + \frac{S_{\max} - S_{\min}}{m - 1}\,(k - 1)$$
where $S_{\min} = 0.2$, $S_{\max} = 0.9$, and $m$ is the total number of feature-map layers (a short numerical sketch of this schedule is given at the end of this subsection).
(3)
Detection Head: For each default box, the detection head produces two primary outputs: class confidence scores and localization offsets.
SSD enables multi-scale detection by leveraging feature maps from multiple layers. Its single-stage, dense prediction approach avoids a separate region-proposal step, thereby accelerating inference.
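To make the scale schedule concrete, the short sketch below enumerates $S_k$ for a six-layer feature pyramid; the choice of Python and of $m = 6$ is an illustrative assumption rather than a setting taken from this study.

```python
# Numerical sketch of the SSD default-box scale schedule defined above.
# m = 6 prediction layers is an illustrative assumption.
def default_box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> list:
    """Return the scale S_k for each feature-map layer k = 1..m."""
    return [s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in range(1, m + 1)]

print(default_box_scales(6))
# Approximately [0.20, 0.34, 0.48, 0.62, 0.76, 0.90]: shallow layers get
# small boxes for subtle distress, deep layers get large boxes.
```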

2.2. Faster Region-Based Convolutional Neural Network

Faster R-CNN represents a landmark two-stage detection framework. By integrating the Region Proposal Network (RPN), it unifies region proposal generation and object classification into an end-to-end system, substantially improving both efficiency and accuracy [22,23,24,25]. Its architecture, illustrated in Figure 2, comprises four main modules.
(1)
Feature Extraction Backbone: A Convolutional Neural Network (CNN) serves as the primary architecture for this module. Through a succession of convolutional layers, the network processes input GPR radargrams to extract multi-level, shared feature maps. This hierarchical mapping enables the internal representation of complex subgrade characteristics from the raw radar data.
(2)
Region Proposal Network (RPN): This module uses an anchor mechanism that presets multiple anchors at each spatial location of the feature map. It performs a multi-task objective—predicting objectness scores and bounding-box regression offsets—and subsequently selects the Top-N proposals based on their confidence.
(3)
Region of Interest (RoI) Pooling: The candidate regions generated by the RPN are initially mapped onto the feature maps. This mapping process is succeeded by an RoI Pooling stage, which normalizes variable-sized regions into uniform, fixed-dimension feature blocks. By standardizing these representations, the layer effectively resolves the discrepancy in candidate box scales while preserving the essential spatial integrity of the target signatures.
(4)
Classification and Regression Heads: These standardized feature blocks are propagated through fully connected (FC) layers to achieve high-level feature integration. Such integrated representations then serve as the common input for two parallel prediction branches: a Softmax layer determines the specific distress category, while a bounding box regression branch simultaneously refines the spatial coordinates. This dual-branch optimization culminates in the precise localization and identification of the subgrade anomalies within the GPR radargrams.
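As an orientation to this two-stage pipeline, the sketch below instantiates torchvision’s reference Faster R-CNN and resizes its classification head for four distress categories plus background; the random initialization, input resolution, and class count here are illustrative assumptions, not this study’s training configuration.

```python
# A minimal Faster R-CNN sketch built on torchvision's reference
# implementation; parameters here are illustrative assumptions.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)

# Swap the box predictor: 4 distress classes + 1 background class.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=5)

model.eval()
with torch.no_grad():
    radargram = torch.rand(3, 512, 512)   # placeholder B-scan as a 3-channel image
    outputs = model([radargram])          # one dict of boxes/labels/scores per image
print(outputs[0]["boxes"].shape, outputs[0]["scores"][:5])
```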

2.3. You Only Look Once v11

The YOLO framework stands as a cornerstone of single-stage object detection, characterized by its iterative evolution to reconcile high-speed processing with competitive detection accuracy [26,27,28]. The developmental trajectory of the YOLO series can be categorized into three distinct phases: foundational architectures, engineering-centric optimizations, and modern multidimensional innovations [29].
YOLOv11 introduces a specialized architectural design that yields superior metrics in detection accuracy, inference speed, and multi-scale feature extraction. As depicted in Figure 3, the framework is organized into four primary components: backbone, neck, and head networks, complemented by task-specific modules [30,31,32].
(1)
Backbone network: The backbone extracts discriminative features from input GPR radargrams. This extraction begins with a Conv module, which performs initial feature construction via 2-D convolution, normalization, and SiLU activation. The resulting representations are further refined by C3k2 modules that use multi-scale feature bottlenecks to enhance the network’s expressive capacity. The processing sequence concludes with an SPPF layer, which leverages multi-scale pooling to strengthen the receptive field for subgrade anomalies at varying depths.
(2)
Neck network: The neck network is configured with a Path Aggregation Network (PAN) architecture. Within this framework, the collaborative operation of Upsample, Concat, and C3k2 units facilitates the integration of multi-scale features. This integrative process enables the bidirectional transfer and optimization of hierarchical representations from the backbone, thereby enhancing the capture of multi-scale distress targets within the subgrade.
(3)
Head network: The “Detect” module executes the final prediction task on feature maps of varying resolutions. Facilitating this predictive task is the adoption of depthwise separable convolution, which decouples the process into channel-wise and point-wise components. This mechanism enables the precise output of distress categories and bounding box coordinates.
(4)
Specialized modules:
YOLOv11 integrates the C2PSA module, a refined variant of the C2f block that incorporates Pointwise Spatial Attention (PSA). This mechanism endows the module with robust attention capabilities, enabling prioritization of salient regions across the spatial domain.
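For reference, training and inference with YOLOv11 can be driven through the Ultralytics package, which distributes YOLO11 checkpoints such as yolo11n.pt; in the sketch below, the dataset YAML name, epoch budget, and image size are illustrative placeholders, not the settings used in this paper.

```python
# Hedged YOLO11 training/inference sketch via the Ultralytics API.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                 # pretrained nano checkpoint

# "gpr_subgrade.yaml" is a hypothetical dataset config listing the four
# distress classes; epochs/imgsz/batch are illustrative values.
model.train(data="gpr_subgrade.yaml", epochs=200, imgsz=640, batch=16)

results = model("radargram_sample.png")    # single-image inference
for box in results[0].boxes:               # class id, confidence, box corners
    print(int(box.cls), float(box.conf), box.xyxy.tolist())
```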

3. Evaluation of Object Detection Algorithms

3.1. Dataset Processing

To facilitate robust training for subgrade distress recognition, a comprehensive dataset was constructed by integrating in situ radargrams, acquired using an SIR-4000 portable GPR system manufactured by Laurel Industrial Company, San Jose, CA, USA, with synthetic data generated via numerical forward modeling in the GprMax 2.0-MATLAB R2021a framework. Specifically, the dataset consists of 1431 on-site detection images and 492 simulation images. For these simulations, the corresponding calculation parameters are presented in Table 1.
To enhance model generalization and robustness, the dataset was expanded using various augmentation techniques, including horizontal flipping, atomization, noise injection, sharpening, and Gaussian blur. Following random processing, samples that failed to maintain clear GPR characteristic features were manually discarded to ensure data quality. Representative samples of the expanded dataset are illustrated in Figure 4.
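The sketch below shows one plausible realization of these augmentations with OpenCV and NumPy; the kernel sizes and noise amplitude are illustrative assumptions, and atomization is approximated here as a fog-like blend toward a uniform gray level.

```python
# Illustrative augmentations for GPR radargrams; parameter values are
# assumptions, not the settings used to build this dataset.
import cv2
import numpy as np

def augment(img: np.ndarray) -> dict:
    flipped = cv2.flip(img, 1)  # horizontal flip
    noise = np.random.normal(0.0, 10.0, img.shape)
    noisy = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    sharpen_kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    sharpened = cv2.filter2D(img, -1, sharpen_kernel)
    blurred = cv2.GaussianBlur(img, (5, 5), 0)  # Gaussian blur
    fogged = cv2.addWeighted(img, 0.7, np.full_like(img, 200), 0.3, 0)  # "atomization"
    return {"flip": flipped, "noise": noisy, "sharpen": sharpened,
            "blur": blurred, "fog": fogged}
```

Samples whose characteristic reflections are destroyed by such transforms would still be screened out manually, as described above.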
The dataset was annotated using LabelImg, targeting four distinct classes: loosening, voids, cavities, and underground structures. For model development, the GPR dataset was partitioned into training and validation sets at a 4:1 ratio. The final dataset composition is detailed in Table 2.

3.2. Experimental Evaluation Metrics

Algorithm performance was assessed using precision (P), recall (R), F1-score (F1), mean average precision (mAP), and frames per second (FPS).
(1)
Precision (P)
Precision is the proportion of correctly identified positive samples among all samples predicted as positive, defined as:
$$P = \frac{TP}{TP + FP}$$
where TP denotes correctly predicted positive samples and FP denotes false positives (Type I errors).
(2)
Recall (R)
Recall is the proportion of actual positive samples that are correctly identified, expressed as:
$$R = \frac{TP}{TP + FN}$$
where FN denotes false negatives (Type II errors).
(3)
F1-Score (F1)
The F1-score is the harmonic mean of precision and recall:
$$F1 = \frac{2 \times P \times R}{P + R}$$
(4)
mean Average Precision (mAP)
mAP evaluates object detection performance by averaging the Average Precision (AP) values across all categories:
$$AP = \int_0^1 P(R)\,dR$$
$$mAP = \frac{1}{K}\sum_{i=1}^{K} AP_i$$
where K denotes the number of classes. It is commonly reported as mAP@0.5 or mAP@0.5:0.95. The notation “0.5:0.95” specifies the range of Intersection-over-Union (IoU) thresholds. Within this range, the average precision is calculated by incrementing the threshold from 0.5 to 0.95 with a step size of 0.05.
(5)
Frames Per Second (FPS)
FPS quantifies the number of image frames processed per second, representing the model’s real-time inference capability. It is typically calculated as:
$$FPS = \frac{1}{T}$$
where T denotes the average time required to process a single frame.
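To tie these definitions together, the sketch below computes precision, recall, and F1 from raw counts, approximates AP as the area under the precision-recall curve, and builds the IoU threshold grid for mAP@0.5:0.95; the trapezoidal integration is an illustrative simplification, since detection benchmarks typically use interpolated AP.

```python
# Metric sketch matching the formulas above; trapezoidal AP is a
# simplification of the interpolated AP used by detection benchmarks.
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def average_precision(precision: np.ndarray, recall: np.ndarray) -> float:
    """AP = area under the P(R) curve, integrated over recall."""
    order = np.argsort(recall)
    r, p = recall[order], precision[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

def mean_average_precision(per_class_ap) -> float:
    """mAP = mean of per-class AP values."""
    return float(sum(per_class_ap) / len(per_class_ap))

# IoU thresholds for mAP@0.5:0.95: 0.50, 0.55, ..., 0.95 (step 0.05).
iou_thresholds = np.linspace(0.5, 0.95, 10)
```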

3.3. Comparative Analysis of Detection Performance

The training performance of SSD, Faster R-CNN, and YOLOv11 was evaluated using standard loss functions and convergence metrics. The objective functions integrate classification loss (Cls Loss), bounding-box regression loss (Box Loss), and total loss, while the YOLOv11 framework further incorporates Distribution Focal Loss (DFL). Network convergence was assessed by analyzing loss trajectories across training iterations.
As illustrated in Figure 5, all models exhibit stable convergence as training epochs increase. SSD demonstrates the most rapid convergence, followed by Faster R-CNN, while YOLOv11 requires more epochs to stabilize. These trajectories indicate that all three architectures achieve robust convergence on the GPR dataset without evidence of divergence or overfitting, effectively learning to identify subgrade distress. A subsequent evaluation of key performance metrics is conducted to quantify their relative detection efficacy.
As shown in Figure 6 and Table 3, the comparative performance analysis reveals distinct training dynamics across the evaluated architectures. SSD demonstrates the most rapid initial precision growth within the first 20 epochs, with Faster R-CNN reaching a comparable level by approximately epoch 40. By epoch 60, precision values for all three frameworks converge; however, YOLOv11 ultimately achieves the highest peak precision of 0.986, outperforming the stabilized levels of its counterparts. In terms of Recall, while all three algorithms attain similar values by epoch 50, the performance gap widens as training progresses, with YOLOv11 reaching a superior peak of 0.952. A similar trend is observed for mean Average Precision@0.5, where YOLOv11 demonstrates steady improvement to a peak of 0.984. Regarding mean Average Precision@0.5:0.95, SSD exhibits pronounced fluctuations, whereas Faster R-CNN and YOLOv11 show stable, incremental gains. Notably, Faster R-CNN attains the highest value of 0.905 in this metric by epoch 200, indicating superior high-precision localization. Conversely, in terms of inference throughput, YOLOv11 achieves the highest Frames Per Second, followed by SSD, with Faster R-CNN exhibiting the lowest processing rate.
SSD yields the lowest overall performance among the evaluated models. While Faster R-CNN achieves the highest mean Average Precision@0.5:0.95, demonstrating superior efficacy in scenarios requiring high localization fidelity, its substantial inference latency constrains its practical engineering utility. Notably, YOLOv11 outperforms both counterparts across Precision, Recall, and F1-score, indicating a simultaneous reduction in false positives and missed detections. Its superior mean Average Precision@0.5 underscores its accuracy in defect localization, while its high Frames Per Second confirms its suitability for high-throughput, real-time GPR data processing. The stable convergence observed in the terminal training phases further validates the robustness of its feature extraction in complex environments. However, the localization robustness of baseline detection models for subgrade defects remains susceptible to complex background interference. Such vulnerability persists even in high-efficiency frameworks like YOLOv11, where achieving peak localization precision and capturing salient edge features necessitate further optimization.

4. Improvements to the YOLOv11 Algorithm

4.1. Multi-Scale Edge Enhancement Module

To address challenges such as diverse target scales, significant environmental noise, and blurred reflection boundaries in GPR radargrams, the MEEM is introduced. This module focuses on capturing subgrade distress signals through multi-physical dimension feature extraction [33,34,35]. This extraction process is operationalized through the specific feature extraction paths integrated within the core architecture of MEEM, as illustrated in Figure 7.
(1)
Multi-scale Response Path: Within GPR radargrams, reflected energy from various subgrade distresses and burial depths exhibits substantial spatial scale variations. Such multi-scale characteristics challenge traditional 3 × 3 convolutional kernels, whose limited receptive fields struggle to balance macroscopic structural outlines with microscopic localized details. To address this, the MEEM employs a parallel processing architecture that integrates multiple sets of 1 × 1 convolutional (Conv) and 3 × 3 average pooling (AP) layers. This structural configuration simulates diverse receptive fields, enabling the simultaneous capture of large-scale distress boundaries and minor anomaly features. Consequently, the reliance on this multi-scale representation effectively mitigates missed detections stemming from the significant size spans of subgrade targets.
(2)
Edge Enhancer (EE) Branch: Subgrade distress is primarily manifested in radargrams as amplitude fluctuations or phase reversals at dielectric interfaces. These interfacial reflections typically form hyperbolic diffraction arcs; however, soil attenuation and clutter interference frequently render their edges blurry and degrade the signal-to-noise ratio. To mitigate such signal degradation, the module incorporates a specialized EE branch that extracts image gradient information. By locking onto and amplifying energy jumps at the reflection interfaces while suppressing random background clutter, this branch effectively transforms blurry defect boundaries into sharp, clear feature-level representations.
(3)
Feature Integration and Optimization: The inherent complexity of subgrade environmental data necessitates the precise extraction of distress features from chaotic reflection signals. This requirement is addressed by concatenating (C) the enhanced multi-scale edge features with the branches of the Detail Enhancement Module (DEM). The resulting deep features are subsequently integrated through a 1 × 1 convolution layer (Conv) to concentrate the model’s energy on the core distress area. This focused energy distribution ultimately optimizes positioning accuracy and robustness within complex subgrade environments.
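A minimal PyTorch sketch consistent with this description is given below: parallel 1 × 1 convolution and 3 × 3 average-pooling stages approximate the multi-scale response path, a gradient-style edge enhancer sharpens interface responses, and a 1 × 1 convolution fuses the concatenated branches. Branch count, channel widths, and the exact edge operator are illustrative assumptions, not the authors’ implementation.

```python
# Hedged MEEM sketch; widths, branch count, and the edge operator are
# illustrative assumptions rather than the authors' exact design.
import torch
import torch.nn as nn

class EdgeEnhancer(nn.Module):
    """Amplify the high-frequency residual x - avgpool(x) to sharpen edges."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AvgPool2d(3, stride=1, padding=1)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        edge = x - self.pool(x)                        # local gradient component
        return x + torch.sigmoid(self.conv(edge)) * edge

class MEEM(nn.Module):
    def __init__(self, in_ch: int, mid_ch: int = 32, branches: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AvgPool2d(3, stride=1, padding=1),  # widen receptive field
                nn.Conv2d(mid_ch, mid_ch, kernel_size=1),
                nn.ReLU(inplace=True),
                EdgeEnhancer(mid_ch))
            for _ in range(branches))
        self.fuse = nn.Conv2d(mid_ch * (branches + 1), in_ch, kernel_size=1)

    def forward(self, x):
        y = self.reduce(x)
        feats = [y]
        for branch in self.branches:    # stacked pooling simulates growing scales
            y = branch(y)
            feats.append(y)
        return self.fuse(torch.cat(feats, dim=1))      # 1x1 fusion back to in_ch
```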

4.2. Multi-Feature Multi-Scale Attention

To address the challenges of variable target scales and low signal-to-noise ratios in GPR radargrams, the MFMSA module is integrated. The design of this module is strictly predicated on the physical characteristics of GPR signal echoes. This physically grounded logic synergizes frequency-domain analysis with spatial multi-scale features to enhance the discrimination of concealed targets [36]. The resulting structural configuration, illustrated in Figure 8, comprises the following core components:
(1)
Multi-Scale Decomposition: Within GPR radargrams, signal responses exhibit pronounced multi-scale characteristics resulting from the diverse burial depths and geometric dimensions of detection targets. Addressing these characteristics begins with a ResNeSt backbone that extracts fundamental features, followed by a Split-Attention mechanism that directs the feature flow into parallel branches. This branched architecture facilitates a Multi-Scale Decomposition stage, where differentiated physical receptive fields are constructed to achieve the simultaneous extraction of macroscopic structures and subtle distress details.
(2)
Multi-Frequency Channel Attention (MFCA): To effectively suppress the frequent background clutter and random noise in measured radar data, the MFCA module utilizes the 2D Discrete Cosine Transform (DCT) to map features into the frequency domain [37]. This spectral transformation facilitates the analysis of global frequency components, enabling a precise focus on reflection interfaces with physical significance. Compared to traditional spatial domain pooling, such targeted frequency-domain analysis significantly enhances signal discrimination in complex environments.
(3)
Multi-Scale Spatial Attention (MSSA): Building upon the frequency-domain calibration, the MSSA module implements adaptive weight parameters to modulate the information transfer ratio between distress targets and background strata. This dynamic modulation facilitates the precise capture of irregular scattering edges, an enhancement that ultimately strengthens the model’s positioning robustness within complex subgrade environments.
(4)
Feature Fusion and Calibration: Features extracted from individual branches are integrated via a Multi-scale Feature Fusion layer. This integration is subsequently refined through depth calibration using a 1 × 1 convolution (Conv) layer, a process that ultimately yields highly robust and discriminative features.
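The frequency-domain core of this design can be sketched as follows: channel descriptors are obtained by projecting pooled features onto a handful of 2-D DCT basis functions rather than using plain average pooling, and an excitation MLP turns them into channel weights. The frequency pairs, pooled size, and reduction ratio below are illustrative assumptions; the spatial branch (MSSA) and Split-Attention backbone are omitted for brevity.

```python
# Hedged sketch of multi-frequency channel attention (MFCA); frequency
# pairs, pooled size, and reduction ratio are illustrative assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def dct_basis(h: int, w: int, u: int, v: int) -> torch.Tensor:
    """One 2-D DCT-II basis of size (h, w) at frequencies (u, v)."""
    ys = torch.cos(math.pi * u * (torch.arange(h) + 0.5) / h)
    xs = torch.cos(math.pi * v * (torch.arange(w) + 0.5) / w)
    return ys[:, None] * xs[None, :]

class MFCA(nn.Module):
    def __init__(self, channels: int, pooled: int = 16,
                 freqs=((0, 0), (0, 1), (1, 0), (1, 1)), reduction: int = 8):
        super().__init__()
        self.pooled = pooled
        basis = torch.stack([dct_basis(pooled, pooled, u, v) for u, v in freqs])
        self.register_buffer("basis", basis)            # (F, pooled, pooled)
        self.fc = nn.Sequential(
            nn.Linear(channels * len(freqs), channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        y = F.adaptive_avg_pool2d(x, self.pooled)       # normalize spatial size
        desc = torch.einsum("bchw,fhw->bcf", y, self.basis)  # per-channel DCT responses
        weights = self.fc(desc.reshape(b, -1)).view(b, c, 1, 1)
        return x * weights                              # re-weight channels
```

Since global average pooling equals the (0, 0) DCT component up to scale, the additional frequency components let the attention respond to oscillatory reflection patterns rather than mean amplitude alone.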

4.3. Analysis of Ablation Experiment

To validate the effectiveness and synergistic impact of the MEEM and MFMSA modules on object detection performance, three comparative experiments—YOLOv11_MEEM, YOLOv11_MFMSA, and YOLOv11_MEEM+MFMSA—were designed based on the YOLOv11 architecture and systematically compared with the original YOLOv11 model.
As illustrated in Figure 9, the loss trajectories of the three enhanced models exhibit steady convergence, substantiating the architectural stability of the enhanced networks.
Figure 10 and Table 4 illustrate the distinct performance profiles of the three enhanced configurations. The baseline YOLOv11 yields the highest Recall and mean Average Precision@0.5, effectively minimizing missed detections. Although YOLOv11_MFMSA improves Precision, its overall performance gain remains limited. In contrast, YOLOv11_MEEM achieves the highest mean Average Precision@0.5:0.95, suggesting superior localization robustness and generalization in complex subgrade environments. YOLOv11_MEEM also improves Precision, thereby reducing false detections, while elevating inference speed to 294.1 Frames Per Second. Such computational efficiency, complemented by robust multi-scale adaptability, fulfills the real-time requirements of large-scale subgrade inspection.
As shown in Table 5, the YOLOv11_MEEM model maintains an extremely high precision of 0.995 for the challenging “loose” category, validating the module’s efficacy in capturing weak features. Such high-fidelity detection extends across all distress types, with precision values consistently exceeding 0.96, underscoring the model’s robust generalization across varying physical scales. This cross-scale consistency is further reflected in the evaluation metrics: despite slight fluctuations in mean Average Precision@0.5, a 0.3 percentage-point increase is achieved in mean Average Precision@0.5:0.95. This shift in performance indicators signifies that the optimization prioritizes refining distress boundary localization over merely expanding classification coverage.
Although YOLOv11_MEEM+MFMSA aims to leverage the complementary advantages of both modules, the increased architectural complexity and parameter density induce gradient interference during optimization. This is reflected in the performance drop, where mean Average Precision@0.5:0.95 and F1-score fell to 0.866 and 0.961, respectively, while Frames Per Second decreased to 243.9. Although MEEM and MFMSA initially sharpen localized features and cross-scale attention to boost early Precision, their interaction creates optimization bottlenecks in later training phases. As data diversity increases, the conflicting objectives between shallow feature saliency and deep spatial weighting lead to stochastic oscillations in confidence scores, ultimately preventing the model from reaching an optimal convergence state.

5. Conclusions

This study evaluates Faster R-CNN, SSD, and YOLOv11 using a dedicated GPR subgrade distress dataset. To address the inherent limitations of YOLOv11, three enhancements—MEEM, MFMSA, and their combination—were assessed through systematic ablation.
(1)
Stable convergence on the GPR dataset was achieved across all three architectures. This convergent stability, maintained without signs of overfitting, facilitated the effective learning of distress characteristics. YOLOv11 demonstrates the best overall performance balance among the evaluated models. While Faster R-CNN excels in mean Average Precision@0.5:0.95, highlighting its potential for high-precision localization, its low Frames Per Second restricts practical engineering deployment. Conversely, although SSD offers the fastest convergence, its inferior core metrics make it unsuitable for the demands of complex GPR scenarios.
(2)
YOLOv11_MEEM achieves a 0.2 percentage-point increase in Precision and a 0.3 percentage-point improvement in mean Average Precision@0.5:0.95 at the cost of a marginal 0.2 percentage-point reduction in Recall, making it well suited to multi-scale distress detection in noisy environments. While YOLOv11_MFMSA yields comparable Precision, its significantly lower Recall highlights the need for improved multi-frequency fusion efficiency. The YOLOv11_MEEM+MFMSA configuration exhibits the poorest performance across core metrics, confirming the functional incompatibility of these two modules.
(3)
By leveraging the dynamic edge enhancement mechanism, the MEEM achieves superior false detection control under complex road conditions, effectively balancing detection accuracy and computational efficiency. The YOLOv11_MEEM model is particularly well-suited for subgrade image recognition tasks. This model significantly reduces the detection and diagnosis cycle for subgrade defects, thereby providing reliable technical support for rapid infrastructure repair and resilience enhancement.
(4)
The advancements in the intelligent interpretation of GPR images in this study provide significant value for the construction of sustainable transportation systems. By enabling early identification and real-time monitoring of subgrade distresses, the proposed model effectively reduces long-term maintenance costs and prevents catastrophic traffic accidents caused by subgrade collapse. Consequently, it enhances the operational sustainability of urban infrastructure from both economic and social perspectives, while supporting the digital management of smart city assets.
(5)
Although effective, the model’s detection of small-scale distresses and the efficiency of feature fusion require further optimization. Given that data augmentation currently relies on manual screening, future research will introduce adaptive data augmentation techniques to enhance automation and systematically expand dataset diversity.

Author Contributions

Conceptualization, M.B.; methodology, Z.Z.; software, H.L.; validation, Q.M.; formal analysis, Q.M.; investigation, Q.M.; resources, M.B.; data curation, H.L.; writing—original draft preparation, M.B.; writing—review and editing, M.B. and Q.M.; visualization, Z.Z.; supervision, Z.Z.; project administration, H.L.; funding acquisition, M.B. All authors have read and agreed to the published version of the manuscript.

Funding

The study was supported by the Beijing Natural Science Foundation (8242018).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Qi, Y.L.; Bai, M.Z.; Li, Z.L.; Zhang, Z.L.; Wang, Q.H.; Tian, G. Study on intelligent recognition of urban road subgrade defect based on deep learning. Sci. Rep. 2024, 14, 28119. [Google Scholar] [CrossRef] [PubMed]
  2. Wang, D.Y.; Qi, Y.L.; Bai, M.Z.; Li, Z.L.; Song, L.L.; Tian, G. Research on Radar Forward Modeling for Detecting Urban Road Subgrade Disease Based on Radar. In Engineering Geology for a Habitable Earth: IAEG XIV Congress 2023 Proceedings, Chengdu, China; Springer: Berlin/Heidelberg, Germany, 2024; pp. 143–159. [Google Scholar]
  3. Cheng, Z.H.; Song, X.G.; Wang, J.Z.; Du, C.; Wu, J.Q. Intelligent identification for subgrade disease based on multi-source data. Measurement 2025, 251, 117200. [Google Scholar] [CrossRef]
  4. Wubuli, A.; Li, F.F.; Zhou, C.Z.; Zhang, L.L.; Jiang, J.R. Knowledge Graph- and Bayesian Network-Based Intelligent Diagnosis of Highway Diseases: A Case Study on Maintenance in Xinjiang. Sustainability 2025, 17, 1450. [Google Scholar] [CrossRef]
  5. Li, Y.N.; Liu, H.B.; Wang, S.L.; Jiang, B.; Fischer, S. Method of Railway Subgrade Diseases (defects) Inspection, based on Ground Penetrating Radar. Acta Polytech. Hung 2022, 19, 199–211. [Google Scholar] [CrossRef]
  6. Xiong, H.Q.; Li, J.; Li, Z.L.; Zhang, Z.Y. GPR-GAN: A Ground-Penetrating Radar Data Generative Adversarial Network. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5200114. [Google Scholar] [CrossRef]
  7. Jin, G.L.; Liu, Q.L.; Cai, W.L.; Li, M.J.; Lu, C.D. Performance Evaluation of Convolutional Neural Network Models for Classification of Highway Hidden Distresses with GPR B-Scan Images. Appl. Sci. 2024, 14, 4226. [Google Scholar] [CrossRef]
  8. Liu, Z.H.; Xiao, J.P.; Shen, R.J.; Liu, J.X.; Guo, Z.W. Deep Learning-Based Suppression of Strong Noise in GPR Data for Railway Subgrade Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5915709. [Google Scholar] [CrossRef]
  9. Yang, Y.; Huang, L.M.; Zhang, Z.H.; Zhang, J.; Zhao, G.M. CycleGAN-Based Data Augmentation for Subgrade Disease Detection in GPR Images with YOLOv5. Electronics 2024, 13, 830. [Google Scholar] [CrossRef]
  10. Huang, Z.Y.; Xu, G.Y.; Tang, J.M.; Yu, H.Y.; Wang, D.Y. Research on Void Signal Recognition Algorithm of 3D Ground-Penetrating Radar Based on the Digital Image. Front. Mater. 2022, 9, 850694. [Google Scholar] [CrossRef]
  11. Liu, H.; Wang, S.L.; Jing, G.Q.; Yu, Z.Y.; Yang, J.; Zhang, Y.; Guo, Y.L. Combined CNN and RNN Neural Networks for GPR Detection of Railway Subgrade Diseases. Sensors 2023, 23, 5383. [Google Scholar] [CrossRef]
  12. Zhao, K.; Ren, X.X.; Kong, Z.Z.; Liu, M. Object detection on remote sensing images using deep learning: An improved single shot multibox detector method. J. Electron. Imaging 2019, 28, 033026. [Google Scholar]
  13. Lenatti, M.; Narteni, S.; Paglialonga, A.; Rampa, V.; Mongelli, M. Dual-View Single-Shot Multibox Detector at Urban Intersections: Settings and Performance Evaluation. Sensors 2023, 23, 3195. [Google Scholar]
  14. Xi, R.; Hou, J.; Lou, W. Potato Bud Detection with Improved Faster R-CNN. Trans. Asabe 2020, 63, 557–569. [Google Scholar]
  15. Xu, X.Y.; Zhao, M.; Shi, P.X.; Ren, R.Q.; He, X.H.; Wei, X.J.; Yang, H. Crack Detection and Comparison Study Based on Faster R-CNN and Mask R-CNN. Sensors 2022, 22, 1215. [Google Scholar] [CrossRef]
  16. Tian, Z.; Yang, F.; Yang, L.; Wu, Y.J.; Chen, J.Y.; Qian, P. An Optimized YOLOv11 Framework for the Efficient Multi-Category Defect Detection of Concrete Surface. Sensors 2025, 25, 1291. [Google Scholar] [CrossRef]
  17. Zou, C.; Yu, S.Q.; Yu, Y.K.; Gu, H.T.; Xu, X.L. Side-Scan Sonar Small Objects Detection Based on Improved YOLOv11. J. Mar. Sci. Eng. 2025, 13, 162. [Google Scholar] [CrossRef]
  18. Zha, X.; Peng, H.; Qin, X.; Li, G.; Yang, S.H. A Deep Learning Framework for Signal Detection and Modulation Classification. Sensors 2019, 19, 4042. [Google Scholar] [CrossRef] [PubMed]
  19. Chen, Y.C.; Yu, K.M.; Kao, T.H.; Hsieh, H.L. Deep learning based real-time tourist spots detection and recognition mechanism. Sci. Prog. 2021, 104, 00368504211044228. [Google Scholar] [CrossRef] [PubMed]
  20. Wang, Y.Z.; Niu, P.H.; Guo, X.Y.; Yang, G.E.; Chen, J. Single Shot Multibox Detector with Deconvolutional Region Magnification Procedure. IEEE Access 2021, 9, 47767–47776. [Google Scholar] [CrossRef]
  21. Zhang, L.; Xing, B.W.; Wang, W.G.; Xu, J.X. Sea Cucumber Detection Algorithm Based on Deep Learning. Sensors 2022, 22, 5717. [Google Scholar] [CrossRef]
  22. Li, H.L.; Huang, Y.Q.; Zhang, Z.J. An Improved Faster R-CNN for Same Object Retrieval. IEEE Access 2017, 5, 13665–13676. [Google Scholar] [CrossRef]
  23. Qi, L.; Li, B.Y.; Chen, L.K.; Wang, W.; Dong, L.; Jia, X.; Huang, J.; Ge, C.W.; Xue, G.M.; Wang, D. Ship Target Detection Algorithm Based on Improved Faster R-CNN. Electronics 2019, 8, 959. [Google Scholar] [CrossRef]
  24. Ding, X.T.; Li, Q.D.; Cheng, Y.Q.; Wang, J.B.; Bian, W.X.; Jie, B. Local keypoint-based Faster R-CNN. Appl. Intell. 2020, 50, 3007–3022. [Google Scholar] [CrossRef]
  25. Huang, H.; Wang, C.; Liu, S.C.; Sun, Z.H.; Zhang, D.J.; Liu, C.C.; Jiang, Y.; Zhan, S.Y.; Zhang, H.F.; Xu, R. Single spectral imagery and faster R-CNN to identify hazardous and noxious substances spills. Environ. Pollut. 2020, 258, 113688. [Google Scholar] [CrossRef]
  26. Cheng, C.; Cheng, X.Y.; Li, D.B.; Zhang, J.W. Drill pipe detection and counting based on improved YOLOv11 and Savitzky-Golay. Sci. Rep. 2025, 15, 167779. [Google Scholar] [CrossRef]
  27. He, L.H.; Zhou, Y.Z.; Liu, L.; Cao, W.; Ma, J.H. Research on object detection and recognition in remote sensing images based on YOLOv11. Sci. Rep. 2025, 15, 14032. [Google Scholar] [CrossRef]
  28. Gao, Y.L.; Xin, Y.B.; Yang, H.; Wang, Y.J. A Lightweight Anti-Unmanned Aerial Vehicle Detection Method Based on Improved YOLOv11. Drones 2025, 9, 11. [Google Scholar] [CrossRef]
  29. Sapkota, R.; Flores-Calero, M.; Qureshi, R.; Badgujar, C.; Nepal, U.; Poulose, A.; Zeno, P.; Vaddevolu, U.; Khan, S.; Shoman, M.; et al. YOLO advances to its genesis: A decadal and comprehensive review of the You Only Look Once (YOLO) series. Artif. Intell. Rev. 2025, 58, 274. [Google Scholar] [CrossRef]
  30. He, L.H.; Zhou, Y.Z.; Liu, L.; Ma, J.H. Research and Application of YOLOv11-Based Object Segmentation in Intelligent Recognition at Construction Sites. Buildings 2024, 14, 3777. [Google Scholar] [CrossRef]
  31. Cheng, S.; Han, Y.; Wang, Z.Q.; Liu, S.J.; Yang, B.; Li, J.R. An Underwater Object Recognition System Based on Improved YOLOv11. Electronics 2025, 14, 201. [Google Scholar] [CrossRef]
  32. Zhang, L.; Zheng, A.; Sun, X.Y.; Sun, Z.P. Enhanced YOLOv11-Based River Aerial Image Detection Research. IEEE Geosci. Remote Sens. Lett. 2025, 22, 8002405. [Google Scholar] [CrossRef]
  33. Liu, J.; Zhao, J.Y.; Cao, Y.Y.; Wang, Y.; Dong, C.Y.; Guo, C.P. Road manhole cover defect detection via multi-scale edge enhancement and feature aggregation pyramid. Sci. Rep. 2025, 15, 10346. [Google Scholar] [CrossRef]
  34. Jia, L.; He, X.; Huang, A.; Jia, B.B.; Wang, X.F. Highly efficient encoder-decoder network based on multi-scale edge enhancement and dilated convolution for LDCT image denoising. Signal Image Video Process. 2024, 18, 6081–6091. [Google Scholar] [CrossRef]
  35. Chen, J.W.; Yue, J.H.; Zhou, H.; Hu, Z.Q. NAF-MEEF: A Nonlinear Activation-Free Network Based on Multi-Scale Edge Enhancement and Fusion for Railway Freight Car Image Denoising. Sensors 2025, 25, 2672. [Google Scholar] [CrossRef]
  36. Nam, J.H.; Syazwany, N.S.; Kim, S.J.; Lee, S.C. Modality-Agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 11480–11491. [Google Scholar]
  37. Mukherjee, D. Parallel implementation of discrete cosine transform and its inverse for image compression applications. J. Supercomput. 2024, 80, 23712–23735. [Google Scholar] [CrossRef]
Figure 1. The architecture of Single Shot MultiBox Detector.
Figure 2. The architecture of Faster Region-based Convolutional Neural Network.
Figure 3. The architecture of You Only Look Once v11.
Figure 4. Geological radar image data expansion. (a) original; (b) horizontal flipping; (c) atomization; (d) noise addition; (e) sharpening; (f) Gaussian blur.
Figure 5. Loss curves of the three object detection models. (a) SSD train loss; (b) Faster R-CNN train loss; (c) YOLOv11 train loss.
Figure 6. Core performance metrics of the three detection models. (a) Precision; (b) Recall; (c) mean Average Precision@0.5; (d) mean Average Precision@0.5:0.95.
Figure 7. The architecture of Multi-Scale Edge Enhancement Module.
Figure 8. The architecture of Multi-Feature Multi-Scale Attention.
Figure 9. Loss curves of the improved YOLOv11 models. (a) YOLOv11_MEEM train loss; (b) YOLOv11_MFMSA train loss; (c) YOLOv11_MEEM+MFMSA train loss.
Figure 10. Comparison of core metrics of the improved YOLOv11 models. (a) Precision; (b) Recall; (c) mean Average Precision@0.5; (d) mean Average Precision@0.5:0.95.
Table 1. Forward model parameters.

| Category | Parameter | Setting Value |
| --- | --- | --- |
| Basic configuration | Model size | 1.5 m × 1.7 m × 1.5 m |
| Basic configuration | Excitation source/frequency | Ricker wavelet / 400 MHz |
| Layer thickness, permittivity, conductivity | Air layer | 0.2 m |
| Layer thickness, permittivity, conductivity | Asphalt layer | 0.1 m, ϵ = 4, σ = 0.001 S/m |
| Layer thickness, permittivity, conductivity | Concrete layer | 0.2 m, ϵ = 9, σ = 0.05 S/m |
| Layer thickness, permittivity, conductivity | Soil layer | 1.2 m, ϵ = 23, σ = 0.1 S/m |
| Distress noise | Boundary random noise | 0.02 m |
Table 2. Dataset composition.

| Defect Types | Original | Augmented | Excluded | Total | Train | Val |
| --- | --- | --- | --- | --- | --- | --- |
| loosening; voids; cavities; underground structures | 1923 | 13,461 | 294 | 15,090 | 12,071 | 3019 |
Table 3. Core performance indicators for the three models.

| Model | Precision | Recall | F1 | mAP@0.5 | mAP@0.5:0.95 | FPS |
| --- | --- | --- | --- | --- | --- | --- |
| SSD | 0.971 | 0.881 | 0.924 | 0.946 | 0.886 | 126.9 |
| Faster R-CNN | 0.976 | 0.891 | 0.931 | 0.947 | 0.905 | 77.9 |
| YOLOv11 | 0.986 | 0.952 | 0.969 | 0.986 | 0.898 | 286.5 |
Table 4. Model core performance metrics.

| Model | Precision | Recall | F1 | mAP@0.5 | mAP@0.5:0.95 | FPS |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv11 | 0.986 | 0.952 | 0.969 | 0.986 | 0.898 | 286.5 |
| YOLOv11_MEEM | 0.988 | 0.949 | 0.968 | 0.982 | 0.901 | 294.1 |
| YOLOv11_MFMSA | 0.988 | 0.936 | 0.961 | 0.984 | 0.893 | 279.8 |
| YOLOv11_MEEM+MFMSA | 0.983 | 0.939 | 0.961 | 0.978 | 0.866 | 243.9 |
Table 5. Precision values for different distress types.

| Distress Type | YOLOv11 | YOLOv11_MEEM | Change |
| --- | --- | --- | --- |
| Loose | 0.995 | 0.995 | 0 |
| Void | 0.990 | 0.983 | −0.007 |
| Cavity | 0.967 | 0.965 | −0.002 |
| Underground structure | 0.993 | 0.986 | −0.007 |
| mAP@0.5 | 0.986 | 0.982 | −0.004 |
