1. Introduction
Ground-Penetrating Radar (GPR), an advanced non-destructive testing (NDT) technology, is renowned for its high-precision localization, rapid scanning, operational flexibility, and superior detection accuracy [1]. The technique finds broad application, from soil contamination assessment [2] and underground pipeline detection [3] to sediment detection [4] and subsurface exploration [5]. By emitting high-frequency electromagnetic waves and analyzing their reflections, GPR enables the accurate identification of subsurface anomalies [6]. Effective target localization enhances the interpretation of GPR data, thereby facilitating the accurate mapping of subsurface features. The advantages of precise localization are substantial: it enables the identification of underground voids, which is crucial for assessing potential hazards such as sinkholes or soil instability [7], and it ensures reliable mapping of utility networks, minimizing excavation risks during construction [8]. Such capabilities make GPR indispensable for infrastructure safety and geotechnical engineering.
Conventional GPR image recognition predominantly depends on template matching and image processing techniques. Liu et al. [9] applied the Sobel operator to detect hyperbolic edges in GPR images, while Li et al. [10] leveraged randomized Hough transforms for automated root identification. Maas et al. [11] adopted the Viola–Jones learning algorithm to constrain regions of interest, thereby reducing computational overhead in Hough transform-based localization. Sharafeldin et al. [12,13] deployed a total of 10 electrical resistivity imaging (ERI), 26 shallow seismic refraction (SSR), and 19 GPR survey lines across the Giza Plateau, and performed integrated inversion to build a three-layer subsurface model that accurately characterizes the groundwater aquifer and its water table depth. Luo et al. [14] introduced a Laser Dynamic Deflectometer method that uses vehicle-mounted laser Doppler sensors to capture pavement deflection velocity anomalies as indicators of subsurface cavities, enabling rapid, non-invasive road network screening. Recently, machine learning has revolutionized hyperbolic feature recognition in GPR data. Dou et al. [15] developed a Connected Component Clustering (C3) algorithm to isolate hyperbolas within target regions, whereas Zhang et al. [16] proposed a symmetry-based method for root detection, enabling radius estimation via multidirectional feature extraction. Nevertheless, these approaches, whether traditional or neural network-based, suffer from inherent limitations: manual parameter tuning, suboptimal training efficiency, and restricted detection accuracy.
Current methodologies primarily leverage deep learning to improve detection accuracy and computational efficiency, and are broadly categorized into two-stage and single-stage approaches. Two-stage detectors initially identify regions of interest through a candidate region generation step, followed by precise classification and bounding box regression on these regions to achieve high-precision object detection [17]. Pham et al. [18] employed Faster Region-based Convolutional Neural Networks (Faster R-CNN) [19], outperforming traditional HOG-based methods on real-world datasets; Cui et al. [20] developed a Faster R-CNN-based framework for highway GPR layer detection, enabling real-time, high-accuracy analysis; Li et al. [21] proposed an optimized Mask R-CNN model that achieves centimeter-scale detection of void morphology and location with high precision and quantitative analysis. However, two-stage detection methods suffer from slow processing speeds and prolonged training times, and insufficient information fusion between stages increases false alarm rates, impacting overall system performance and reliability.
In contrast, single-stage detectors such as YOLO (You Only Look Once) [22] and the Single Shot MultiBox Detector (SSD) [23] directly predict object categories and locations without region proposal networks, offering superior real-time performance for subsurface target detection [24]. Wang et al. [25] augmented SSD with feature pyramid fusion layers and a Generalized Intersection over Union loss to enhance underground target detection; Qui et al. [26] modified YOLOv5's architecture for small-target detection in GPR imagery; Hu et al. [27] incorporated attention mechanisms into YOLOv5 to refine hyperbolic feature recognition; Wang et al. [28] extended the YOLOv7 architecture with an unsupervised domain-adaptive network, training jointly on simulated finite-difference time-domain GPR data and real-world GPR images to robustly detect delamination defects in subsurface pavement layers; Tian et al. [29] proposed a state-space-model (SSM)-based detection method capable of processing arbitrarily long GPR B-scan sequences to robustly localize hyperbolic features; Wang et al. [30] integrated CBAM into YOLOv8 for urban subsurface defect detection. However, existing object detection models can only frame the approximate location of targets and do not provide precise subsurface target localization. To address this challenge, Li et al. [31] extended YOLOv4 with a keypoint detection branch (YOLOv4-hyperbola), enabling joint bounding-box and vertex prediction. Yet because these methods are tailored to specific subsurface targets, issues of limited compatibility and accuracy persist for tasks involving targets of different scales, affecting efficiency and reliability in practical applications.
This study introduces the Dual Attentive YOLOv11-based Keypoint Detector (DAYKD), a lightweight multi-task deep learning framework for precise detection and localization of underground targets in GPR data. As illustrated in Figure 1, the framework consists of three stages: dataset integration and annotation, target detection using an attention-enhanced YOLO-DAFRNet module, and keypoint localization within detected regions. By incorporating dual-task optimization, attention-based feature enhancement, and a cascaded learning structure, DAYKD effectively addresses three core challenges in GPR-based subsurface target interpretation:
(1) The methodology is structured into two distinct tasks: target detection and keypoint detection. In the initial task, the DAYKD model is trained using a target detection dataset to accurately identify and localize potential target regions. In the subsequent task, a portion of the weights from the first task is shared, and the model is further trained on a keypoint detection dataset, thus refining the accuracy of both target detection and localization in underground environments.
(2) Two specialized modules, the Convolution and Attention Fusion Module (CAFM) [32] and the Feature Refinement Network (FRFN) [33], are incorporated to enhance the network's performance during both the target detection and keypoint recognition tasks. These modules optimize the network architecture by improving its global perception capabilities and its ability to extract multi-scale features from images. This, in turn, refines the feature selection process and yields a substantial increase in recognition accuracy.
(3) A notable innovation of this study lies in the task-specific partitioning of the dataset. The complete dataset is divided into two tailored subsets: one dedicated to target detection, focusing on identifying the presence and number of buried objects; and the other oriented toward keypoint detection, aimed at localizing underground targets. This task-aligned dataset design allows the model to be trained more effectively for each objective, leveraging the shared feature representation while optimizing performance for both detection and localization tasks. This dual-purpose dataset structure enhances the modularity and flexibility of the proposed framework.
3. The Proposed DAYKD Model
The development of DAYKD focuses on enhancing feature extraction capability, multi-scale feature fusion, and the convergence speed and stability of the loss function. The proposed method inherits the three-part structure of YOLOv11, namely the backbone, neck, and head, as illustrated in Figure 5.
3.1. Basic Model Architecture
Inspired by the YOLOv11 model, we adopt it as the foundational framework and propose a novel dual-task detection architecture. As the first model in the YOLO series to officially support both object detection and keypoint detection tasks, YOLOv11 inherently possesses the capabilities needed for a two-task approach, dynamically invoking task-specific detection heads. Notably, the model has demonstrated exceptional performance in facial keypoint detection, indicating strong potential for effective transferability to GPR keypoint detection tasks.
Building upon YOLOv8 [41], YOLOv11 introduces optimizations across its backbone network, feature fusion layers, and detection heads, achieving superior inference speed and accuracy. The overall network architecture is illustrated in Figure 5.
The YOLOv11 backbone network is designed to extract hierarchical features from input images, efficiently capturing visual information across low-level to high-level representations. The architecture replaces the original C2F (Cross-Stage Partial with Two Fusions) module with an improved C3K2 (Cross-Stage Partial Network v3 with K2 convolution layers) module, enhancing feature extraction efficiency through a parallel convolutional design and adaptive parameter configuration. Specifically, the C3K2 module employs dual convolutional layers in place of a single large kernel, alongside a channel-splitting strategy to reduce computational complexity. Additionally, variable-sized convolutional kernels are applied to expand the receptive field, significantly improving performance in large-object detection and complex background scenarios. Following the SPPF (Spatial Pyramid Pooling Fast) module, YOLOv11 introduces a C2PSA (Cross-Stage Partial with Pyramid Squeeze Attention) module. This component splits feature maps using a CSP-based approach: one branch propagates features directly, while the other undergoes dynamic spatial refinement via a PSA attention mechanism before feature concatenation. This design not only reduces computational overhead but also enhances the model's ability to focus on occluded objects and critical regions. The backbone progressively extracts multi-scale feature maps (P1 to P5), enriching semantic information at varying depths while minimizing model parameters. This hierarchical structure significantly improves the recognition of complex patterns (e.g., hyperbolic signatures) and enhances the detection of subtle details in GPR images, such as minor subsurface structural variations and intricate hyperbolic-shaped targets.
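To make the split-transform-merge flow concrete, the following is a minimal PyTorch sketch of the CSP-style channel split shared by C3K2 and C2PSA, not the exact YOLOv11 implementation; the channel counts and the inner block are illustrative placeholders.

```python
import torch
import torch.nn as nn

class CSPSplitSketch(nn.Module):
    """Half the channels bypass the inner block; the halves are then re-fused."""
    def __init__(self, c: int, inner: nn.Module):
        super().__init__()
        self.cv_in = nn.Conv2d(c, c, 1)       # channel adjustment
        self.inner = inner                    # e.g. stacked small-kernel convs (C3K2)
                                              # or an attention block (C2PSA)
        self.cv_out = nn.Conv2d(c, c, 1)      # fuse after concatenation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = self.cv_in(x).chunk(2, dim=1)  # split into two c/2 branches
        b = self.inner(b)                     # only one branch is transformed
        return self.cv_out(torch.cat([a, b], dim=1))

# hypothetical inner block: two small-kernel convs instead of one large kernel
inner = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.SiLU())
y = CSPSplitSketch(64, inner)(torch.randn(1, 64, 40, 40))
```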
The feature fusion layer generates three multi-scale convolutional outputs with resolutions corresponding to 1/8, 1/16, and 1/32 of the original input image. In the detection head architecture, YOLOv11 employs a decoupled structure that independently processes classification, localization, and keypoint detection through dedicated convolutional branches. This design significantly enhances the model’s adaptability to GPR image characteristics. A key innovation in the classification detection head is the implementation of Depthwise Convolution (DWConv), which performs channel-wise spatial convolution while eliminating inter-channel interactions. This architectural choice achieves substantial reductions in both parameter count and computational complexity. The detection pipeline processes multi-scale feature maps through task-specific convolutional layers to extract relevant semantic features. These features are then unified through a final convolution operation that transforms the multi-scale representations into the required vector space, generating the model’s final output.
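As a concrete illustration of the depthwise design in the classification branch, the sketch below pairs a 3×3 depthwise convolution (groups equal to channels, so no inter-channel interaction) with a 1×1 pointwise projection; the channel sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DWConvBranch(nn.Module):
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        # Depthwise convolution: groups == in_channels, so each channel is
        # filtered independently, eliminating inter-channel interactions.
        self.dw = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False)
        self.bn = nn.BatchNorm2d(c_in)
        self.act = nn.SiLU()
        # Pointwise 1x1 convolution restores cross-channel interaction and
        # maps the features to the output (class-score) channels.
        self.pw = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pw(self.act(self.bn(self.dw(x))))

# e.g. one 1/8-scale feature map with 64 channels and 2 target classes
scores = DWConvBranch(64, 2)(torch.randn(1, 64, 80, 80))
```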
3.2. Enhanced Feature Representation for Object Detection
This section presents the optimization of the attention mechanism in the YOLOv11 framework. Specifically, the attention layer in the C2PSA module within the backbone has been replaced with the CAFM, which integrates the strengths of convolutional neural networks and transformers. By employing convolution operations to extract local features and the attention mechanism to capture global features, the CAFM effectively models both global and local characteristics, thereby enhancing detection performance.
Global and Local Feature Extraction: CAFM in Backbone
The conventional convolution operation, while effective at capturing local features, suffers from an inherently limited receptive field that hinders its ability to model global contextual information effectively. In contrast, transformer architectures excel at capturing long-range dependencies through their attention mechanisms. To bridge this gap, we introduce the CAFM, which synergistically integrates convolutional operations with attention mechanisms to enable comprehensive joint modeling of both local and global features. The proposed module comprises two branches: a local branch and a global branch. In the local branch, convolution operations and channel reordering are employed to extract local features. Meanwhile, the global branch utilizes the attention mechanism to capture long-range feature dependencies.
The proposed method improves the final module of the YOLOv11 backbone by replacing the attention component of the original C2PSA module. Building upon the original PSABlock, which enhances spatial features solely through multi-scale convolution and channel weighting, the CAFM introduces a series of structural improvements to enhance feature extraction and information interaction capabilities.
The CAFM processes features through two complementary branches. In the local feature extraction branch, channel shuffle operations disrupt channel independence to enhance feature diversity. Following channel dimension adjustment via convolution, features are divided into subgroups, within each of which depthwise convolution achieves cross-channel information mixing. This design not only reduces the number of parameters but also effectively enhances feature representation capability. In the global feature extraction branch, the CAFM further incorporates a learnable scaling parameter, which adjusts the magnitude of the Q-K similarity matrix prior to the SoftMax function. This adjustment enhances the model's representational capacity while mitigating the risk of the attention matrix becoming excessively sharp.
The module ultimately combines local and global features through element-wise addition, effectively synthesizing information from different receptive fields. This architecture significantly enhances both feature extraction capability and global dependency modeling. Compared to the original PSABlock, CAFM demonstrates superior performance in capturing complex patterns and long-range spatial relationships while maintaining computational efficiency.
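A minimal sketch of this dual-branch design is given below, assuming single-head attention computed over the channel dimension, a residual connection, and illustrative channel/group counts; it mirrors the described mechanism (channel shuffle, grouped convolution, and a learnable scale on the Q-K similarity) rather than reproducing the authors' exact CAFM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAFMSketch(nn.Module):
    def __init__(self, c: int, groups: int = 4):
        super().__init__()
        self.groups = groups
        # Local branch: 1x1 conv for channel adjustment + grouped 3x3 conv
        self.local_pw = nn.Conv2d(c, c, 1)
        self.local_gw = nn.Conv2d(c, c, 3, padding=1, groups=groups)
        # Global branch: Q/K/V projections and a learnable scale that
        # rescales the Q-K similarity before the softmax
        self.qkv = nn.Conv2d(c, 3 * c, 1)
        self.temperature = nn.Parameter(torch.ones(1))
        self.proj = nn.Conv2d(c, c, 1)

    def channel_shuffle(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x = x.view(b, self.groups, c // self.groups, h, w)
        return x.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Local features: shuffle breaks channel independence, grouped conv mixes
        local = self.local_gw(self.channel_shuffle(self.local_pw(x)))
        # Global features: channel-wise attention over flattened spatial positions
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)          # each (b, c, h*w)
        attn = F.softmax(self.temperature * (q @ k.transpose(-2, -1))
                         / (h * w) ** 0.5, dim=-1)                # (b, c, c)
        global_feat = self.proj((attn @ v).view(b, c, h, w))
        # Element-wise fusion of the two receptive fields (residual assumed)
        return x + local + global_feat
```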
3.3. Enhanced Feature Refinement for Keypoint Detection
In this section, the FRFN mechanism is incorporated into the C3k2 module of YOLOv11. Although the C3k2 module is an optimized version of the traditional CSP Bottleneck structure in YOLOv11, keypoint detection tasks demand stronger feature refinement and fusion capabilities. The FRFN mechanism effectively addresses this requirement by transforming and optimizing feature representations. Specifically, it enhances feature information, reduces redundant data, and improves information expression along the channel dimension, thereby strengthening the model’s ability to capture and utilize critical features.
Refined Details: C3K2 with FRFN
While the conventional feed-forward networks (FFNs) that complement self-attention in transformers process information independently at each pixel location to enhance feature representations, their performance in keypoint localization can be significantly degraded by redundant spatial information. To address this limitation, we introduce the FRFN with an "enhance-and-simplify" paradigm. This approach first employs Partial Convolution (PConv) to amplify informative feature elements critical for keypoint detection. Subsequently, a gating mechanism selectively suppresses redundant information propagation, effectively reducing interference from irrelevant features and improving localization accuracy.
Building upon this foundation, we develop an enhanced C3k2_FRFN module by integrating the FRFN mechanism with the C3K2 architecture. This hybrid module combines the computational efficiency of CSPNet’s partial feature flow design with the refined feature processing capabilities of FRFN. The architecture splits input feature maps into two parallel processing streams, maintaining optimal gradient flow while enabling more effective feature fusion. This dual-path design significantly enhances the network’s ability to extract and combine multi-scale features, particularly benefiting keypoint detection tasks that require precise spatial localization.
On this basis, the module applies the FRFN mechanism to further enhance feature refinement and fusion, particularly in capturing spatial and channel information. By combining partial convolution and depthwise separable convolutions, FRFN reduces computational complexity while expanding the receptive field, thereby improving the model's ability to express local features. Moreover, the C3k2_FRFN module uses a gating mechanism to adaptively fuse features from different sources, ensuring efficient feature flow and maximizing information extraction. This improved design not only enhances the network's performance in complex visual tasks but also significantly boosts the robustness and computational efficiency of the model, especially for high-precision tasks such as small object detection and keypoint localization.
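The sketch below illustrates the "enhance and simplify" idea, assuming a 1/4 partial-convolution ratio and a simple convolutional gate; the expansion factor and layer choices are illustrative assumptions rather than the authors' exact FRFN.

```python
import torch
import torch.nn as nn

class FRFNSketch(nn.Module):
    def __init__(self, c: int, expand: int = 2, part: int = 4):
        super().__init__()
        self.cp = c // part
        # Partial convolution: only the first c/part channels are convolved;
        # the remaining channels pass through untouched (cheap local enhancement).
        self.pconv = nn.Conv2d(self.cp, self.cp, 3, padding=1)
        # Two parallel 1x1 projections form a gate: one path carries features,
        # the other (after activation) decides what to let through.
        self.fc1 = nn.Conv2d(c, c * expand, 1)
        self.gate = nn.Conv2d(c, c * expand, 1)
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(c * expand, c, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Enhance: refine a fraction of the channels with PConv
        x = torch.cat([self.pconv(x[:, : self.cp]), x[:, self.cp :]], dim=1)
        # Simplify: the gated projection suppresses redundant activations
        return self.fc2(self.fc1(x) * self.act(self.gate(x)))

y = FRFNSketch(64)(torch.randn(1, 64, 40, 40))
```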
3.4. Loss Function Strategy
The original loss function of YOLOv11 includes classification loss, object confidence loss, and bounding box regression loss. The total loss function is shown in Equation (7):

$$L_{total} = \lambda_{cls} L_{cls} + \lambda_{obj} L_{obj} + \lambda_{box} L_{box} \tag{7}$$

In this context, $\lambda_{cls}$, $\lambda_{obj}$, and $\lambda_{box}$ denote the weighting factors of the classification loss, the object confidence loss, and the bounding box regression loss, respectively. By designing different weighting combinations, the same DAYKD model can be tailored to emphasize either hyperbolic target detection or keypoint detection tasks in GPR images.
The YOLOv11 architecture employs Binary Cross-Entropy (BCE) as its classification loss function to quantify the discrepancy between predicted probabilities and ground truth labels. Specifically, the loss function calculates the overall loss by averaging the loss for each predicted sample. The complete loss formulation is presented in Equation (8):

$$L_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \tag{8}$$

where $\hat{y}_i$ is the class probability output by the network, $y_i$ is the true class label, and $N$ is the total number of samples. These elements are fundamental in computing the classification loss, which evaluates the model's ability to correctly assign class labels across all samples in the dataset.
The object confidence loss function, defined in Equation (9), employs a binary cross-entropy formulation to assess the agreement between predicted objectness scores and their ground-truth counterparts. It comprises two weighted components: the first penalizes deviations in grid cells containing objects, while the second regulates predictions in background regions, with a balancing coefficient $\lambda_{noobj}$ mitigating class imbalance:

$$L_{obj} = -\sum_{i} \mathbb{1}_i^{obj} \left[ C_i \log \hat{C}_i + (1 - C_i) \log(1 - \hat{C}_i) \right] - \lambda_{noobj} \sum_{i} \mathbb{1}_i^{noobj} \left[ C_i \log \hat{C}_i + (1 - C_i) \log(1 - \hat{C}_i) \right] \tag{9}$$

where $\hat{C}_i$ is the object confidence output by the network and $C_i$ is the actual label indicating the presence of an object (1 or 0). This binary supervision guides the network in distinguishing between object and background regions during training.
The bounding box regression loss function, which quantifies the discrepancy between predicted bounding boxes and ground truth annotations, is formally defined in Equation (10). This loss function is pivotal in guiding the model toward precise object localization by minimizing spatial deviations throughout the training process:

$$L_{box} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v \tag{10}$$

where $IoU$ is the intersection over union between the predicted and ground-truth bounding boxes, $\rho(b, b^{gt})$ is the Euclidean distance between the centers of the predicted and ground-truth bounding boxes, $c$ is the diagonal length of the smallest enclosing area that contains both boxes, $v$ is an additional term that measures the consistency of the aspect ratio, and $\alpha$ is the weight used to balance the aspect ratio term.
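As a compact illustration, the snippet below combines the three terms of Equations (7)–(10), assuming logits and matched-pair CIoU values are already available; the default weights are illustrative placeholders, not the paper's tuned values.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the weighted three-term loss of Equation (7); targets are
# float tensors, and `ciou` holds the CIoU of each matched box pair, Eq. (10).
def total_loss(cls_logits, cls_targets,
               obj_logits, obj_targets,
               ciou,
               w_cls=0.5, w_obj=1.0, w_box=7.5):   # illustrative weights
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)  # Eq. (8)
    l_obj = F.binary_cross_entropy_with_logits(obj_logits, obj_targets)  # Eq. (9)
    l_box = (1.0 - ciou).mean()                                          # Eq. (10)
    return w_cls * l_cls + w_obj * l_obj + w_box * l_box
```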
4. Experiment Results and Analysis
4.1. Experimental Settings and Evaluation Metrics
4.1.1. Experimental Settings
The training protocol for the proposed model employed the hyperparameters detailed in Table 1. Model performance was evaluated at the end of each epoch, with the optimal training epoch selected according to the mean Average Precision (mAP) metric on the validation set.
Considering the varying image scales in the dataset, different batch sizes and image dimensions were used for different tasks: a batch size of 32 was employed in the object detection task and a batch size of 64 in the keypoint detection task, each with a task-specific input resolution. All experiments were conducted on a platform with an NVIDIA RTX 3050 GPU and a 12-core Intel Core i5-12500H CPU to satisfy computational demands.
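For concreteness, a run of this protocol might look like the following sketch, assuming the Ultralytics training interface and hypothetical dataset YAML files; the image-size values are placeholders, since the exact resolutions are not reproduced here.

```python
from ultralytics import YOLO

# Task 1: object detection on the bounding-box dataset (hypothetical YAML)
det_model = YOLO("yolo11n.pt")
det_model.train(data="gpr_det.yaml", epochs=300, batch=32, imgsz=640)

# Task 2: keypoint detection with a pose head on the cropped dataset
kpt_model = YOLO("yolo11n-pose.pt")
kpt_model.train(data="gpr_kpt.yaml", epochs=300, batch=64, imgsz=320)
```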
4.1.2. Evaluation Metrics
To conduct a more objective and scientific quantitative evaluation of the algorithm's performance, this study employs commonly used evaluation metrics such as precision, recall, mAP, and F1-score. Precision measures the model's ability to correctly identify positive samples, while recall assesses the model's capability to comprehensively cover positive samples. As a comprehensive metric, mAP further evaluates the model's overall performance by calculating a weighted average across multiple thresholds. F1-score effectively mitigates the weakest-link effect by balancing precision and recall through emphasizing their lower value. This multi-metric evaluation framework ensures robust performance characterization across all critical aspects of detection quality, validating both the effectiveness and practical reliability of the proposed method. The formulas for the aforementioned evaluation metrics are shown in Equations (11)–(16):

$$Precision = \frac{TP}{TP + FP} \tag{11}$$

$$Recall = \frac{TP}{TP + FN} \tag{12}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{13}$$

$$AP = \int_0^1 P(R) \, dR \tag{14}$$

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{15}$$

$$mAP_{50\text{-}95} = \frac{1}{N \cdot T} \sum_{k=1}^{T} \sum_{i=1}^{N} AP_i(t_k) \tag{16}$$
In the formulas of the above evaluation metrics, $TP$ (True Positives) denotes the number of correctly predicted positive samples, $FP$ (False Positives) refers to the number of incorrectly predicted positive samples, and $FN$ (False Negatives) represents the number of actual positive samples that were not detected by the model. $N$ denotes the total number of object categories involved in the evaluation. For mAP50-95, $t_k$ represents the threshold at the $k$-th level, ranging from 0.50 to 0.95 with a step size of 0.05, and $T = 10$ indicates the total number of such thresholds. The term $AP_i(t_k)$ corresponds to the Average Precision of the $i$-th category under the threshold $t_k$.
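A direct transcription of these formulas into Python, assuming the per-class AP values at each IoU threshold have already been computed:

```python
# Minimal sketch of Equations (11)-(16) from raw counts and precomputed AP.
def precision(tp: int, fp: int) -> float:          # Eq. (11)
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:             # Eq. (12)
    return tp / (tp + fn)

def f1(p: float, r: float) -> float:               # Eq. (13)
    return 2 * p * r / (p + r)

# Eq. (14) is the area under the precision-recall curve per class; here the
# per-class AP values are assumed to be precomputed inputs.
def mean_ap(ap_per_class: list[float]) -> float:   # Eq. (15)
    return sum(ap_per_class) / len(ap_per_class)

def map50_95(ap: list[list[float]]) -> float:      # Eq. (16)
    # ap[k][i]: AP of class i at threshold t_k, for t_k = 0.50, 0.55, ..., 0.95
    return sum(mean_ap(per_t) for per_t in ap) / len(ap)
```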
4.2. GPR Dataset Collection
In this section, a total of 1617 images from both the simulated and real datasets were annotated with bounding boxes using LabelMe [42] and divided into a training set and a test set in a ratio of 8:2. For the training of the keypoint dataset, a total of 6112 images were obtained through cropping, and the dataset was split in the same 8:2 ratio.
4.2.1. Simulated Dataset
To rigorously evaluate the proposed method's performance, this study conducted comprehensive experiments using both simulated and real-world datasets. The simulated dataset consists of 1500 images generated using gprMax [43] to address the issue of limited samples in the real-world dataset. The simulation environment was modeled as a 2.5 m × 0.5 m × 2.5 mm 3D domain, discretized with a spatial resolution of 5 mm (x, y) and 2.5 mm (z). A Ricker wavelet with a center frequency of 0.9 GHz served as the excitation signal. The propagation medium was primarily concrete, with an overlying air layer.
Cylindrical metallic targets (modeled as perfect electric conductors, PEC) were randomly placed within the concrete volume, with their axes aligned along the z-direction. These images are categorized into three classes based on the number of targets: images with 1–9 targets, 10–15 targets, and 16–20 targets, distributed in a ratio of 5:3:2. Each simulation scenario includes a paired modeling script and resulting B-scan image, ensuring one-to-one correspondence for analysis. This stratified design maintains a balanced representation of detection difficulty and reflects real-world complexity distributions.
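A representative gprMax input for one such scenario could look like the sketch below; the domain, discretization, and 0.9 GHz Ricker source follow the settings above, while the time window, material parameters, trace spacing, and target positions are illustrative assumptions.

```python
# Write an illustrative gprMax scene file and note how it would be run.
scene = """
#domain: 2.5 0.5 0.0025
#dx_dy_dz: 0.005 0.005 0.0025
#time_window: 12e-9

#material: 6 0.005 1 0 concrete
#waveform: ricker 1 0.9e9 src_wave
#hertzian_dipole: z 0.05 0.45 0 src_wave
#rx: 0.10 0.45 0
#src_steps: 0.01 0 0
#rx_steps: 0.01 0 0

#box: 0 0 0 2.5 0.40 0.0025 concrete
#cylinder: 0.80 0.20 0 0.80 0.20 0.0025 0.015 pec
#cylinder: 1.60 0.25 0 1.60 0.25 0.0025 0.015 pec
"""
with open("gpr_scene.in", "w") as f:
    f.write(scene)
# Run e.g. `python -m gprMax gpr_scene.in -n 230` to sweep the antenna
# along x and assemble the resulting traces into one B-scan image.
```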
4.2.2. Field Dataset
The real-world dataset comprises two distinct acquisition scenarios, with the data collection process detailed in Figure 6. The final constructed dataset contains a total of 167 GPR images. Field scene 1: the first part is from the Neyland Pedestrian Bridge at the University of Tennessee, Knoxville, as shown in Figure 6a. Data collection was performed using a GSSI SIR-4000 GPR system with a 2 GHz antenna. Field scene 2: the second part is from the laboratory of the School of Earth Sciences at Guilin University of Technology, using a GSSI SIR-4000 GPR system with a 400 MHz antenna. The data acquisition scene and the tested concrete wall diagram are shown in Figure 6b. Steel bars with a diameter of 5 cm were embedded on both the left and right sides of the wall; the left side contained solid steel bars, while the right side featured hollow steel bars. The distances of these steel bars from the outer wall boundary were 10 cm, 20 cm, 30 cm, and 35 cm, respectively.
4.3. Ablation Experiment
4.3.1. Performance of Individual or Combination Improvements
To rigorously evaluate the contribution of each proposed enhancement, we conducted systematic ablation studies assessing each component's impact on model performance. Each improvement was individually tested, with F1-score, mAP50, mAP50-95, precision, and recall selected as evaluation metrics for model accuracy. Parameter count and Giga Floating-point Operations (GFLOPs) were used to assess model complexity, while frames per second (FPS) was adopted as the metric for inference speed.
As evidenced by the ablation study results summarized in Table 2, the DAYKD framework exhibits marked improvements across all evaluation metrics, attaining an F1-score of 0.929, mAP50 of 0.947, and mAP50-95 of 0.825. Comparative analysis reveals significant performance advantages over the baseline model, with the F1-score exhibiting a 13% increment, mAP50 demonstrating a 12% enhancement, and mAP50-95 showing a notable 16% elevation. The YOLOv11 + CAFM configuration achieves respective improvements of 9% and 10% in mAP50 and mAP50-95 metrics relative to the baseline, substantiating the efficacy of the CAFM in augmenting feature extraction capabilities. The FRFN module further optimizes multi-scale feature fusion, allowing YOLOv11 + FRFN to attain an mAP50-95 of 0.788, corresponding to a 15% advancement compared to the baseline model.
The synergistic integration of CAFM and FRFN modules in DAYKD facilitates optimal detection performance. Precision and recall metrics demonstrate notable enhancements of approximately 6% and 7%, respectively, indicating superior capability in minimizing false positive rates while maximizing detection sensitivity. Although exhibiting increased parameter count (1.28 M vs. baseline 1.15 M) and computational complexity (3.2 G FLOPs vs. baseline 2.8 G FLOPs), DAYKD maintains real-time processing efficiency with an inference speed of 65.4 FPS, an 8% improvement over the baseline. While YOLOv11 + CAFM achieves marginally higher frame rates (68.2 FPS), its detection accuracy remains suboptimal compared to the integrated DAYKD architecture.
The experimental findings collectively demonstrate that the CAFM and FRFN modules significantly enhance feature representation capacity and detection precision through distinct yet complementary mechanisms. DAYKD effectively leverages both architectural enhancements, achieving comprehensive performance optimization that balances accuracy metrics with computational efficiency. This integration strategy yields state-of-the-art performance in object detection tasks, particularly in scenarios requiring high-precision recognition across multiple scales.
4.3.2. Convergence Stability Evaluation
The training dynamics and convergence behavior of the DAYKD model were rigorously analyzed through the examination of loss curves across different training tasks. Figure 7 presents the respective loss trajectories for both object detection and keypoint prediction tasks, revealing several important characteristics of the learning process. All loss functions demonstrate consistent monotonic decay throughout the training regimen, with values asymptotically approaching stable minima in later epochs. This smooth convergence profile, devoid of any oscillatory behavior or divergence, indicates robust optimization dynamics and effective learning without evidence of overfitting. The observed stabilization of loss values in the final training task further confirms the model's ability to reach a well-optimized solution state, suggesting both numerical stability and effective capacity utilization of the network architecture.
Figure 7a reveals distinct convergence characteristics in the object detection task, where the bounding box loss exhibits rapid initial descent during early optimization (typically epochs 1–50), followed by asymptotic stabilization. This biphasic convergence pattern demonstrates the model’s capacity for efficient spatial feature acquisition while maintaining stable optimization dynamics throughout extended training. The more gradual reduction in dfl_loss components reflects the inherent complexity of learning precise distribution focal representations for bounding box regression.
Conversely, Figure 7b demonstrates accelerated convergence in keypoint detection compared to traditional pose estimation architectures, with loss values decreasing approximately 40% faster during early training. This enhanced learning efficiency represents a marked improvement over conventional approaches, in which pose-related losses typically exhibit slower convergence due to the intricate nature of structural feature learning. Moreover, the stable decline of the keypoint object confidence loss (kobj_loss) indicates the model's increasing reliability in predicting the presence of keypoints. These results collectively demonstrate that the DAYKD model exhibits strong performance and stability in both object detection and keypoint detection tasks. The observed rapid optimization empirically validates three critical aspects of our approach: the effectiveness of the baseline architecture selection, the design of the parameter-sharing strategy between detection tasks, and the appropriate composition of the keypoint training dataset. Together, these results demonstrate the model's capability for simultaneous spatial and structural feature learning, establishing a strong benchmark for integrated object detection and keypoint estimation.
4.4. Performance Estimation of the Proposed Model
Figure 8 systematically illustrates the complete DAYKD recognition workflow, including the original data, enhanced processing, category probability visualization, object detection results before and after NMS, and final keypoint estimation. The original B-scan image in Figure 8a displays characteristic signal reflections through red-blue waveform patterns, representing unprocessed subsurface radar measurements. Following enhancement in Figure 8b, the processed image exhibits significantly improved contrast between target signals and background noise, facilitating subsequent detection tasks.
The object detection task produces a category probability thermogram, as shown in Figure 8c, with color intensity representing target likelihood: deeper red hues correspond to higher probabilities. These probability estimates, when integrated with the preliminary bounding box detections visible in Figure 8d, undergo Non-Maximum Suppression (NMS) processing to generate the final object localization results displayed in Figure 8e. The system successfully identifies hyperbolic features, marked by blue bounding boxes with confidence scores ranging from 0.65 to 0.81, demonstrating the method's consistent detection reliability.
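The suppression step between Figure 8d and Figure 8e can be reproduced with a few lines of PyTorch, shown here with made-up candidate boxes and an illustrative IoU threshold of 0.5:

```python
import torch
from torchvision.ops import nms

# Two overlapping candidates around the same hyperbola plus one distinct box,
# in xyxy pixel coordinates (values are illustrative only).
boxes = torch.tensor([[120., 40., 180., 90.],
                      [122., 42., 182., 92.],
                      [300., 55., 360., 105.]])
scores = torch.tensor([0.81, 0.74, 0.65])

keep = nms(boxes, scores, iou_threshold=0.5)   # indices of surviving boxes
final_boxes, final_scores = boxes[keep], scores[keep]
```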
Furthermore, Figure 8f illustrates the keypoint detection results. Keypoints are marked on each detected hyperbolic target to more accurately describe the geometric characteristics of the hyperbolic features. The confidence scores associated with keypoint annotations are generally higher, typically ranging from 0.88 to 0.94, suggesting that the method achieves high precision in keypoint localization. The combined results validate DAYKD's capability for integrated object localization and structural feature extraction in complex GPR data.
The detection results of the DAYKD model on simulated data for underground target detection tasks are illustrated in Figure 9 and Figure 10. Figure 11, Figure 12 and Figure 13 further present a comparison between the model's detection outcomes and the corresponding ground truth images for field scenes 1 and 2. Each detected bounding box contains a potential hyperbolic target, five keypoints, and the associated confidence score. As shown in the figures, all targets were accurately identified, with no missed or false detections. The vertex of each hyperbola was precisely located and marked with a red dot, while the other keypoints were highlighted in green.
From a model detection perspective, directly inputting images containing position and travel-time coordinate axes may interfere with the model's feature extraction and affect detection performance. To mitigate this, all images are preprocessed to remove coordinate axes during both the training and validation phases. After detection is completed, the position and travel-time coordinates are re-applied to the output images for subsequent analysis and visualization. From Figure 9 and Figure 10, it can be observed that despite challenging conditions such as horizontal and vertical overlaps and indistinct target boundaries, the DAYKD model is still capable of accurate detection and localization.
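A minimal version of this axis-removal step, assuming fixed-width axis margins (the pixel values and filenames here are placeholders):

```python
import cv2

def strip_axes(img, left=60, top=40, right=10, bottom=50):
    """Crop away the coordinate-axis margins, keeping only the B-scan area."""
    h, w = img.shape[:2]
    return img[top:h - bottom, left:w - right]

bscan = strip_axes(cv2.imread("scan.png"))
cv2.imwrite("scan_noaxes.png", bscan)   # axes are re-applied after detection
```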
Figure 11, Figure 12 and Figure 13 showcase the model's performance on real-world field data from two distinct acquisition scenarios. Despite the substantial background noise and frequent target occlusions characteristic of ground-penetrating radar environments, DAYKD maintains consistently high detection accuracy, with confidence scores predominantly above 0.90. The model exhibits particular robustness in high-density target scenarios with small object sizes, successfully resolving individual targets through precise spatial differentiation.
Quantitative analysis reveals close alignment between the detected targets and ground truth references in both spatial position and hyperbolic shape characteristics. This performance consistency across diverse testing conditions, from controlled simulations to complex field environments, confirms the model’s superior adaptability and reliability for subsurface detection tasks. The combination of high-confidence detections and geometrically accurate keypoint localization establishes DAYKD as a robust solution for challenging underground target identification applications.
4.5. Performance Comparison with Other Algorithms
As evidenced by the quantitative results in Table 3, the DAYKD model establishes new benchmarks in underground object detection, achieving a state-of-the-art mAP50 of 94.7%. This performance surpasses conventional approaches including Faster R-CNN (by 12.3%), RTMDet (by 8.9%), and the baseline model (by 11.5%), while remaining competitive with Cascade R-CNN (a difference of 1.2%). The model's exceptional capability becomes particularly evident in the more rigorous mAP50-95 metric, where it outperforms all comparison methods by substantial margins exceeding 10 percentage points, demonstrating remarkable robustness across diverse object scales and geometric configurations.
Further examination of the recall metric reveals DAYKD's superior detection completeness, achieving 92.2% recall, a 7.5% improvement over the second-best performer. This advantage stems from the model's optimized feature representation and effective handling of challenging subsurface targets. The marginal deficit in mAP50 relative to Cascade R-CNN can be attributed to fundamental architectural differences: Cascade's multi-stage refinement mechanism provides progressive bounding box optimization that proves particularly effective at higher IoU thresholds, while DAYKD's unified architecture prioritizes overall detection quality and computational efficiency. This trade-off reflects the inherent balance between precision refinement and holistic detection performance in deep learning-based object detection systems.
To effectively track the convergence of our proposed model during training, we recorded the box_loss convergence of the various models over 300 epochs as an indicator of target localization performance. Figure 14 illustrates the variation in box_loss for five different models. The results demonstrate that DAYKD achieved the best performance throughout the training process, with the fastest decline in loss value, ultimately stabilizing at 0.35, showcasing superior convergence and accuracy. YOLOv11 and RTMDet also exhibited strong performance, characterized by rapid convergence and low final loss values. The Cascade model presented a slightly higher box_loss, while Faster R-CNN maintained the highest box_loss throughout training, indicating inferior performance in the bounding box regression task.
This study conducts a systematic comparative analysis of five mainstream object detection algorithms under two representative experimental conditions: a simulated data environment and real-world field scene 1. The investigation aims to rigorously evaluate the practical efficacy of these models across varying environmental complexities. The evaluated algorithms comprise Faster R-CNN, RTMDet, Cascade, YOLOv11, and DAYKD. As evidenced by the detection outcomes depicted in Figure 15 and Figure 16, distinct performance differentials emerge among the models regarding detection accuracy and operational robustness. Notably, conventional detection frameworks including Faster R-CNN, RTMDet, and Cascade employ bounding-box-based localization methodologies without keypoint detection integration, thereby constraining their analytical capabilities. In contrast, both YOLOv11 and DAYKD incorporate advanced keypoint detection architectures, demonstrating superior precision and enhanced structural comprehension in target analysis.
Within controlled simulation environments, all algorithms demonstrate competent performance in basic recognition tasks, achieving target identification with negligible error margins. However, performance stratification becomes apparent with increasing target density and intersection scenarios. As quantitatively demonstrated in Figure 15, conventional models including Faster R-CNN, RTMDet, and Cascade manifest significant detection deficiencies in complex configurations, characterized by elevated rates of false negatives and suboptimal localization precision. These limitations indicate inherent constraints in high-density target processing capabilities. Comparative analysis further reveals that traditional architectures exhibit constrained feature representation capabilities and limited structural interpretability relative to their advanced counterparts.
In contrast, YOLOv11 demonstrates significant improvements in detection confidence through its integrated keypoint detection mechanism, thereby effectively mitigating false positive identifications and misdetections in regions devoid of distinct structural features. DAYKD exhibits the most balanced and robust performance metrics across all evaluation parameters, demonstrating exceptional boundary localization precision and superior small-object detection efficacy. The model achieves detection accuracy surpassing the 90% threshold, representing substantial performance enhancement over comparative architectures, which substantiates the effectiveness of its structural design and feature extraction methodology.
The evaluation on field scene 1, characterized by greater environmental complexity, further highlights the robustness of each model. Although Faster R-CNN, RTMDet, and Cascade can detect a majority of hyperbolic structures, their overall recognition performance is markedly inferior to that of YOLOv11 and DAYKD. In particular, DAYKD maintains high detection accuracy and boundary fitting in complex real-world conditions, with virtually no missed detections. This demonstrates its strong generalization ability and adaptability, making it the most robust and reliable model in this evaluation. In summary, DAYKD achieves the best detection performance across both tested environments, excelling in accuracy, robustness, and small-object recognition. It stands out as the most capable model among those evaluated. These results further validate the effectiveness of incorporating keypoint detection in enhancing both the precision and stability of object detection algorithms.
In another example from field scene 1, we conducted a detailed comparative analysis of the model combinations used in the ablation study. Unlike the relatively clear and scale-consistent detection targets in Figure 16, the objects in Figure 17 exhibit significant scale variations, along with complex backgrounds and occlusion effects, introducing substantial interference. These factors impose greater demands on the model, particularly in terms of global perception and multi-scale feature extraction capabilities.
As shown in Figure 17a–d, the performance differences among the models in handling this challenging scenario are evident. The baseline YOLOv11 model suffers from missed and false detections due to object scale variations and background noise. By integrating the CAFM during the object detection task, the model enhances its regional perception of targets, effectively mitigating background interference and significantly reducing missed detections. When the FRFN module is applied solely during the keypoint detection task, the model exhibits stronger feature fusion and scale modeling capabilities, leading to a notable improvement in keypoint confidence scores. Finally, the model incorporating the DAYKD strategy achieves the highest overall detection accuracy, demonstrating that this approach effectively enhances generalization and robustness.
However, it is worth noting that the proposed algorithm struggles to recognize incomplete hyperbolic targets, which may arise due to measurement line constraints or interference from adjacent objects. Although partial hyperbolic signatures often still indicate the presence of true subsurface targets, the model currently lacks the sensitivity to consistently interpret these incomplete patterns.
5. Conclusions
This paper proposes the DAYKD model, a dual-task framework based on YOLOv11, designed for efficient and precise underground target detection and keypoint localization in GPR images. DAYKD employs a two-task training strategy: the first task focuses on high-precision object detection, while the second task refines keypoint recognition accuracy. Ablation experiments demonstrate that DAYKD achieves a precision of 93.7% and an mAP@50 of 94.7% in object detection tasks. Compared to state-of-the-art models such as YOLOv11, Cascade R-CNN, RTMDet, and Faster R-CNN, DAYKD exhibits superior performance, with an mAP@50-95 of 82.5% and a recall of 92.2%.
DAYKD demonstrates strong performance in GPR image interpretation, particularly in distinguishing overlapping subsurface targets and maintaining accuracy in cluttered, noisy environments. Its dual-task architecture and modular design enable efficient, robust inference across diverse imaging conditions and GPR systems. The model exhibits resilience to background noise and interference, with low computational overhead. However, its generalization is limited by the diversity of real-world datasets. DAYKD also struggles to detect incomplete hyperbolic signatures, especially near image boundaries, which may lead to missed detections under complex conditions.
Future work will aim to improve the generalization ability of the DAYKD model by expanding on-site data collection across a wider range of geological conditions and deployment scenarios. A more diverse and representative dataset is expected to enhance the model’s robustness in real-world applications. In addition, we plan to explore the use of partially labeled samples, particularly those with incomplete hyperbolic features, to improve the model’s ability to detect subtle or truncated subsurface targets under challenging conditions. To support deployment in resource-constrained environments, future research will also focus on improving computational efficiency. This includes investigating model compression techniques such as pruning, knowledge distillation, and low-bit quantization. These methods are expected to reduce inference time and memory consumption, enabling real-time performance while maintaining detection accuracy in practical engineering settings.