1. Introduction
Borehole radar (BHR), a specialized application of ground-penetrating radar (GPR), is a geophysical technique operated within boreholes [
1,
2,
3]. Due to its flexible deployment capability in three-dimensional geological space, BHR detection offers several advantages, including close-range investigation within boreholes, reduced susceptibility to external electromagnetic interference, and high-resolution detection enabled by high-frequency electromagnetic waves. These features enable relatively refined detection of sub-meter-scale anomalies and have demonstrated promising performance in geological and hydrogeological investigations, mineral exploration, and structural health monitoring [
4,
5,
6,
7,
8], indicating considerable potential for engineering applications. However, conventional omnidirectional BHR systems are constrained by radar wave propagation characteristics and complex geological conditions, allowing only the determination of the distance between the target and the borehole and making directional detection of surrounding targets difficult [
9]. To address this limitation, the present study is based on a rotatable directional detection radar system, as illustrated in
Figure 1, which enables precise localization of adverse geological hazard sources. The proposed BHR system adopts an embedded design, in which the acquisition control module is implemented using an embedded chip to enable radar and gyroscope data acquisition, while communication with the host computer is achieved via a photoelectric converter. The gyroscope provides information on antenna orientation and in-borehole mileage, while a camera monitors the surrounding rock conditions near the antenna to assist in radar data interpretation. The radar system performs dense 360° scanning via an electric rotation controller, thereby enabling full-space observation of geological targets.
For B-scan profiles acquired by directional BHR, accurately determining the three-dimensional azimuth of geological targets within the profiles is of critical importance [
10]. Traditional anomaly localization algorithms for directional BHR profiles can generally be classified into two categories: the maximum-amplitude-based orientation algorithm and the intermediate-value-based localization algorithm [
9]. The fundamental principle of the maximum-amplitude-based orientation algorithm is as follows. When the directional BHR antenna performs rotational detection or uniform angular velocity scanning within the borehole, the transmitting antenna emits a directional electromagnetic beam, whose directivity is governed by the intrinsic radiation pattern of the antenna, thereby enabling directional scanning of the surrounding medium. When the beam axis (i.e., the direction corresponding to the maximum electromagnetic field intensity) is aligned with the anomaly, the reflected response exhibits the maximum amplitude. Therefore, when the echo pulse reaches its peak value, the corresponding beam axis direction is considered to represent the azimuth of the anomaly. In contrast, the intermediate-value-based method determines the anomaly azimuth by first setting a predefined detection threshold and then selecting several strong-amplitude points exceeding this threshold. Since the amplitude variation near the maximum value is relatively flat, the central position of these selected points is taken as the azimuth of the anomaly. As illustrated in
Figure 2, the maximum-amplitude-based orientation algorithm identifies the azimuth corresponding to the position with the strongest target response energy (
Figure 2a). By comparison, the intermediate-value-based localization algorithm first specifies a threshold (e.g., 0.7 in
Figure 2c) and then determines the target azimuth by calculating the average of the maximum and minimum angles within the range exceeding the threshold.
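As an illustration, the two traditional estimators described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the cited work: the function names are hypothetical, amplitudes are assumed to be positive envelope values, the threshold is assumed to be relative to the peak amplitude, and the selected angular range is assumed not to wrap across 0°/360°.

```python
from typing import Sequence


def max_amplitude_azimuth(angles_deg: Sequence[float],
                          amplitudes: Sequence[float]) -> float:
    """Return the scan angle at which the echo amplitude is largest."""
    peak_index = max(range(len(amplitudes)), key=lambda i: amplitudes[i])
    return angles_deg[peak_index]


def intermediate_value_azimuth(angles_deg: Sequence[float],
                               amplitudes: Sequence[float],
                               threshold: float = 0.7) -> float:
    """Average the smallest and largest angles whose amplitude,
    normalized by the peak (an assumption), exceeds the threshold."""
    peak = max(amplitudes)
    selected = [angle for angle, amp in zip(angles_deg, amplitudes)
                if amp / peak > threshold]
    return 0.5 * (min(selected) + max(selected))


# Example: a flat-topped, slightly asymmetric response.
angles = [100, 110, 120, 130, 140]
amps = [0.5, 0.9, 1.0, 0.95, 0.8]
print(max_amplitude_azimuth(angles, amps))       # 120
print(intermediate_value_azimuth(angles, amps))  # 125.0
```

In this example the two estimators disagree by 5° because the response is not symmetric about its peak, which is the kind of ambiguity that makes the choice of method and threshold subjective.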
However, due to the complexity of in situ geological conditions and the large volume of data generated during rapid acquisition, even experienced technicians face significant challenges, as BHR data interpretation is both time-consuming and labor-intensive. In practice, data collected by BHR within a single day may require several weeks for complete interpretation. Moreover, the aforementioned traditional methods inherently involve subjective uncertainty. Therefore, there is an urgent need to develop an efficient and reliable automated BHR interpretation approach [
11].
In recent years, advances in artificial intelligence—particularly in computer vision—have enabled deep learning algorithms to be applied to the automatic and efficient detection of structural damage in civil engineering [
12,
13]. For instance, Pham et al. [
14] adopted a Faster R-CNN model pre-trained on the CIFAR-10 dataset and conducted joint training and fine-tuning on both real and simulated GPR profiles to detect underground targets (i.e., hyperbolic reflections) in GPR B-scans. Qin et al. [
12] extended deep learning to tunnel lining inspection by developing an automatic recognition framework based on a Mask Region-Based Convolutional Neural Network (Mask R-CNN) for identifying steel ribs, voids, and initial linings from GPR profiles. Their method employed a ResNet-101 backbone combined with a Feature Pyramid Network (FPN) for feature extraction, a Region Proposal Network (RPN) for target detection, and a Fully Convolutional Network (FCN) for segmentation. To address data scarcity, they further enhanced training using synthetic GPR data generated by the finite-difference time-domain (FDTD) method and a deep convolutional generative adversarial network. Results from both synthetic and field experiments demonstrated high recognition accuracy, highlighting the potential of deep learning for intelligent interpretation of GPR data.
However, the aforementioned methods are often computationally intensive and time-consuming. By contrast, lightweight single-stage frameworks, including the Single Shot Multibox Detector (SSD) and You Only Look Once (YOLO), have demonstrated remarkable effectiveness in GPR profile interpretation and are therefore more suitable for efficient and low-cost analysis of large-scale datasets [
15]. Illustratively, Luo et al. [
16] proposed Multi-Task DL for GPR data (MTGPR), an automatic method for detecting voids and cavities in tunnel linings from GPR radargrams. CAPW-YOLO was employed to enhance feature extraction and fusion under infrastructure interference, and a refined synthetic dataset was used to augment training. Ablation experiments showed 88.5% accuracy and 84.0% precision, with a 46.9% increase in speed and a 10.2% reduction in model size compared with the YOLOv7 baseline. Khedr et al. [
17] demonstrated that YOLOv8 outperforms both Faster R-CNN and YOLOv7. They trained the YOLOv8 model using experimental and field data and validated its accuracy and rebar diameter classification capability on real-world building datasets.
The aforementioned deep learning-based automatic GPR profile detection algorithms have primarily focused on data acquired along conventional survey lines. In contrast, research on automatic detection for directional BHR profiles remains limited, particularly regarding the angular localization of geological targets within such profiles, which has not been thoroughly investigated. Pose estimation is a task that involves identifying the locations of specific points in an image, commonly referred to as keypoints. These keypoints can represent various parts of an object, such as joints, landmarks, or other distinctive features. Determining both the azimuth of targets in directional BHR profiles and the distance between targets and the antenna involves keypoint localization within the profiles and can therefore be formulated as a pose estimation task. Motivated by the YOLOv11n-pose model, this study develops a robust deep learning framework, termed BSS-Pose-BHR, for keypoint detection in directional BHR response profiles. The main contributions of this work are as follows:
A three-dimensional electromagnetic model of limestone containing a certain density of clay particles is constructed, and a simulation dataset of directional BHR response profiles is generated by setting rigorously designed and dynamically controlled simulation parameters.
The BSS-Pose-BHR model incorporates three innovative modules: (1) Backbone network optimization: Bi-Level Routing Attention (BRA) is introduced to replace Multi-Head Self-Attention (MHSA) in C2PSA. This query-aware dynamic sparse attention mechanism filters irrelevant features, significantly improving computational efficiency and memory utilization for large-scale datasets. (2) Keypoint extraction enhancement: A lightweight sliced attention module, Conv_SAMWS, is embedded in both the Backbone and Neck networks. By slicing feature maps and applying parameter-free attention weighting, it enhances keypoint feature representation while maintaining a lightweight architecture. (3) Detection head improvement: Spatial and Channel Reconstruction Convolution (SCConv) is adopted to optimize the detection head. By refining spatial and channel information, feature redundancy is reduced and local feature extraction is strengthened, making the model more suitable for keypoint detection tasks.
Using BSS-Pose-BHR for keypoint detection in directional BHR response profiles, the proposed method achieves superior performance compared with current mainstream detection models, including YOLOv8n-pose, YOLOv10n-pose, YOLOv11n-pose, YOLOv12n-pose, and YOLOv11n-SPPF_improved-BSAM-LSCD_LQE. In addition, it provides more accurate angular localization than traditional target azimuth estimation methods for directional BHR profiles.
2. Materials and Methods
2.1. Principle of Directional BHR Detection and Imaging
As shown in
Figure 3a, conventional GPR detection methods typically involve laying survey lines on the working face to acquire geological information in previously unexplored regions. Electromagnetic wave propagation within the medium follows Maxwell’s equations [
18], as expressed in Equation (1).
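$$
\begin{aligned}
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}, &\qquad \nabla \times \mathbf{H} &= \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}, \\
\nabla \cdot \mathbf{D} &= \rho, &\qquad \nabla \cdot \mathbf{B} &= 0, \\
\mathbf{J} &= \sigma \mathbf{E}, &\qquad \mathbf{D} &= \varepsilon \mathbf{E}, \qquad \mathbf{B} = \mu \mathbf{H}
\end{aligned}
\tag{1}
$$
In Equation (1), $\mathbf{J}$ represents the current density (A/m²); $\sigma$ denotes the electrical conductivity (S/m); $\mathbf{E}$ stands for the electric field intensity (V/m); $\mathbf{D}$ is the electric displacement (C/m²); $\varepsilon$ indicates the permittivity (F/m); $\mathbf{B}$ refers to the magnetic flux density (T); $\mu$ represents the magnetic permeability (H/m); $\mathbf{H}$ is the magnetic field intensity (A/m); and $\rho$ denotes the charge density (C/m³).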
However, this conventional approach has several limitations, including limited detection accuracy, restricted detection range, constrained detection depth, and insufficient spatial information regarding geological structures [
19].
BHR, as a specialized form of ground-penetrating radar, operates on the same fundamental principles as conventional geological radar [
20] while retaining its high-resolution capability. This method enables close-range detection of subsurface targets via boreholes, thereby improving detection accuracy and depth of investigation. BHR can be implemented using three measurement configurations: single-hole detection, cross-hole detection, and borehole-to-surface detection [
21]. In this study, the directional BHR configuration adopts the single-hole detection mode. As illustrated in
Figure 3b, the BHR antenna consists of an arc-shaped radiating element, a metallic shielding cover, and impedance-absorbing materials, with a borehole antenna diameter of 60 mm. The electric rotation controller shown in
Figure 1 enables the directional BHR system to perform high-precision 360° close-range rotational scanning within the borehole.
As the antenna rotates through the full spatial domain, its radiation direction also rotates in three-dimensional space, resulting in a continuous change in the orientation of the main lobe. During this process, the antenna center remains fixed, and its relative position with respect to the detection target does not change [
22]. As shown in
Figure 2a, within the profiles, the target-reflected waveforms exhibit an approximately horizontal linear distribution, indicating that the distance between the reflection points and the antenna remains nearly constant. Furthermore, due to the strong directivity of the directional BHR antenna, its radiated field is mainly concentrated within the main lobe. When the main lobe is aligned with the target center, the target lies within the region of maximum radiation intensity, and the corresponding echo energy reaches its maximum. As the antenna gradually deviates from the target direction, the target moves out of the main lobe and enters the sidelobe region with lower radiation intensity, resulting in reduced incident energy and a gradual attenuation of the target response.
2.2. Dataset Acquisition
This study is still at the early stage of instrument development, and large-scale field surveys in complex geological environments have not yet been conducted. Therefore, the dataset is primarily generated and augmented based on simulated electromagnetic models. As shown in
Figure 4a,b, a three-dimensional electromagnetic model of limestone containing a specified proportion of clay particles is constructed. The main simulation parameters of the model are listed in
Table 1.
The simulations are conducted using gprMax (v3.0) [
23,
24]. The main electromagnetic parameters of the materials used in the model are listed in
Table 2. In particular, the rotation of the cavity around the antenna’s central
Z-axis is described by Equation (2).
Here, $x$, $y$, and $z$ represent the three-dimensional coordinates of all points within the cavity anomaly. The function $f(x, y, z)$ gives the three-dimensional coordinates of the cavity after rotation. $\theta$ denotes the rotation angle of the cavity. $\Delta$ represents the model resolution, while $x_0$ and $y_0$ are the X–Y plane coordinates of the point around which the cavity rotates about the Z-axis. In this study, the default values are $x_0$ = 1.535 m, $y_0$ = 1.535 m, and $\Delta$ = 0.005 m.
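The rotation of the cavity about a vertical axis through a fixed point in the X–Y plane can be sketched as follows. This is a minimal sketch rather than the exact form of Equation (2): the function name is hypothetical, a counterclockwise-positive angle is assumed, and snapping the rotated coordinates to the model grid is one assumed way in which the resolution could enter.

```python
import math


def rotate_about_z(points, theta_deg, x0=1.535, y0=1.535, resolution=0.005):
    """Rotate (x, y, z) points by theta_deg about the vertical axis
    through (x0, y0), then snap x and y to the model grid.

    Assumptions: counterclockwise-positive angle and grid snapping;
    neither is confirmed by the source text.
    """
    theta = math.radians(theta_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y, z in points:
        dx, dy = x - x0, y - y0
        xr = x0 + cos_t * dx - sin_t * dy
        yr = y0 + sin_t * dx + cos_t * dy
        # Snap to the nearest multiple of the grid resolution.
        xr = round(xr / resolution) * resolution
        yr = round(yr / resolution) * resolution
        rotated.append((xr, yr, z))  # z is unchanged by a rotation about Z
    return rotated


# A point 0.5 m from the axis along +X moves to +Y after a 90° rotation.
print(rotate_about_z([(2.035, 1.535, 1.0)], 90.0))
```

The distance from the rotation axis is preserved up to grid rounding, which is consistent with the target staying at a nearly constant range from the antenna during a rotational scan.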
The dataset annotation in this study is performed using LabelMe (v5.9.1) [
25,
26].
Figure 4c illustrates the annotation of different BHR acquisition data. Regarding the keypoint information of the anomalies (distance and azimuth), the distance refers to the distance from the center of the anomaly to the antenna center ($d$), while the azimuth refers to the angle at which the anomaly is aligned with the antenna ($\varphi$). The corresponding formulas are as follows:
$$
d = \sqrt{(x_c - x_a)^2 + (y_c - y_a)^2 + (z_c - z_a)^2} \tag{3}
$$
$$
\varphi = \arctan\left(\frac{y_c - y_a}{x_c - x_a}\right) \tag{4}
$$
Here, $x_c$, $y_c$, and $z_c$ denote the geometric center coordinates of the anomaly along the X, Y, and Z axes, respectively, while $x_a$, $y_a$, and $z_a$ denote the geometric center coordinates of the antenna along the X, Y, and Z axes, respectively.
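The two keypoint labels can be computed from the anomaly and antenna centers as in the sketch below. The function name is hypothetical, and the azimuth convention (measured in the X–Y plane, counterclockwise from the +X axis, mapped to [0°, 360°)) is an assumption; `atan2` is used so that all four quadrants are resolved.

```python
import math


def keypoint_labels(anomaly_center, antenna_center):
    """Return (distance, azimuth_deg) of the anomaly relative to the antenna.

    Both arguments are (x, y, z) tuples in metres. The azimuth reference
    direction (+X axis, counterclockwise positive) is an assumption.
    """
    dx, dy, dz = (a - b for a, b in zip(anomaly_center, antenna_center))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth_deg = math.degrees(math.atan2(dy, dx)) % 360.0
    return distance, azimuth_deg


# An anomaly 1 m from the antenna along +Y lies at an azimuth of 90°.
print(keypoint_labels((1.535, 2.535, 1.0), (1.535, 1.535, 1.0)))
```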
2.3. Design of the BSS-Pose-BHR Model
2.3.1. Overall Model Architecture Design
As shown in
Figure 5, the overall architecture of BSS-Pose-BHR is built upon the YOLOv11n framework [
27,
28], forming an end-to-end pipeline consisting of Input–Backbone–Neck–Pose Head–Output. The numbers (0–22) denote the layer indices in the YOLO network architecture. The simulation dataset constructed in
Section 2.2 serves as the input, and all annotated images are uniformly resized to 640 × 640 pixels before being fed into the input layer.
The BSS-Pose-BHR model incorporates three major modifications:
Backbone network reconstruction (C2PSA → C2PSA_BRA): Bi-Level Routing Attention (BRA) replaces the Multi-Head Self-Attention (MHSA) within the Position-Sensitive Attention (PSA) module of C2PSA. BRA is an attention mechanism designed to address the scalability limitations of MHSA. Traditional attention mechanisms require each query to attend to all key–value pairs, which leads to excessive computational cost and memory consumption when processing large-scale data. BRA introduces a dynamic, query-aware sparse attention mechanism [
29], which filters out most irrelevant key–value pairs at a coarse region-level granularity while retaining only a small set of routed regions. Fine-grained token-to-token attention is then performed within the union of these routed regions, allowing each query to focus on a limited number of relevant key–value pairs, thereby improving computational efficiency and memory utilization.
Keypoint extraction enhancement (Conv → Conv_SAMWS in Backbone and Neck): The Conv modules in both the Backbone and Neck networks are reconstructed as Conv_SAMWS, which incorporates a Simple Parameter-Free Attention Module with Slicing (SimAMWithSlicing). SimAM is a lightweight and efficient attention mechanism that enhances the model’s ability to capture important features through simple computations [
30]. By performing slicing operations on the input feature maps, the module strengthens the attention weights of keypoint-relevant features while maintaining a lightweight architecture.
Detection head optimization (SCConv): The SCConv [
31] is introduced to optimize the Pose Head structure, reducing spatial and channel redundancy while enhancing local feature learning. This improvement is particularly effective for keypoint detection tasks.
2.3.2. BRA Module
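Figure 6 illustrates the internal structure of the BRA module. First, the input feature map $X \in \mathbb{R}^{H \times W \times C}$ is partitioned into $S \times S$ regions, and each region is projected to generate Query ($Q$), Key ($K$), and Value ($V$) tensors. Let $X^{r} \in \mathbb{R}^{S^{2} \times \frac{HW}{S^{2}} \times C}$ denote the reshaped regional feature matrix, and $W^{q}$, $W^{k}$, and $W^{v}$ represent the corresponding projection matrices, respectively. The computation is formalized as follows:
$$
Q = X^{r} W^{q}, \quad K = X^{r} W^{k}, \quad V = X^{r} W^{v} \tag{5}
$$
Then, the per-region mean values of $Q$ and $K$ are computed to obtain $Q^{r}$ and $K^{r}$, respectively. Equation (6) is then employed to construct the adjacency matrix $A^{r}$, which measures the semantic similarity across different regions.
$$
A^{r} = Q^{r} \left(K^{r}\right)^{T} \tag{6}
$$
The matrix $A^{r}$ is filtered using Equation (7), and only the top-$k$ connections are retained for each region to prune the association graph, resulting in the index matrix $I^{r}$.
$$
I^{r} = \operatorname{topkIndex}\left(A^{r}\right) \tag{7}
$$
Finally, for each query region, the key–value pairs of the selected regions are aggregated to perform token-to-token attention computation.
$$
K^{g} = \operatorname{gather}\left(K, I^{r}\right), \quad V^{g} = \operatorname{gather}\left(V, I^{r}\right), \quad O = \operatorname{Attention}\left(Q, K^{g}, V^{g}\right) + \operatorname{LCE}(V) \tag{8}
$$
where $K^{g}$ and $V^{g}$ are the aggregated key–value pairs from the selected regions, and $\operatorname{LCE}(\cdot)$ represents Local Context Enhancement (depthwise convolution).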
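The routing steps described above can be sketched in NumPy. This is a single-head illustration under stated assumptions, not the module used in the model: the function name is hypothetical, the feature-map height and width are assumed divisible by the number of regions per side, logits are scaled by the square root of the channel count, and the Local Context Enhancement (depthwise convolution) term is omitted for brevity.

```python
import numpy as np


def bi_level_routing_attention(x, w_q, w_k, w_v, num_regions=2, top_k=2):
    """Single-head bi-level routing attention on an (H, W, C) feature map.

    Returns the attended features with shape (H, W, C) and the routing
    index matrix with shape (S*S, top_k). The LCE term is omitted.
    """
    h, w, c = x.shape
    s = num_regions
    rh, rw = h // s, w // s
    # Partition into S*S regions: (H, W, C) -> (S*S, tokens per region, C).
    xr = x.reshape(s, rh, s, rw, c).transpose(0, 2, 1, 3, 4)
    xr = xr.reshape(s * s, rh * rw, c)
    q, k, v = xr @ w_q, xr @ w_k, xr @ w_v

    # Region-level queries and keys are per-region token means.
    q_r, k_r = q.mean(axis=1), k.mean(axis=1)
    adjacency = q_r @ k_r.T                          # (S*S, S*S)
    idx = np.argsort(-adjacency, axis=1)[:, :top_k]  # top-k regions per query region

    # Gather keys and values of the routed regions for every query region.
    k_g = k[idx].reshape(s * s, top_k * rh * rw, c)
    v_g = v[idx].reshape(s * s, top_k * rh * rw, c)

    # Fine-grained token-to-token attention inside the routed regions.
    logits = q @ k_g.transpose(0, 2, 1) / np.sqrt(c)
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v_g                              # (S*S, tokens per region, C)

    # Undo the region partition: back to (H, W, C).
    out = out.reshape(s, s, rh, rw, c).transpose(0, 2, 1, 3, 4).reshape(h, w, c)
    return out, idx
```

With `top_k` equal to the total number of regions, every query attends to every token and the result coincides with dense self-attention; smaller values of `top_k` shrink the key–value set per query, which is where the savings in computation and memory come from.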
2.3.3. SimAMWithSlicing Module
SimAM integrates spatial, channel, and feature dimensions to generate 3D weights. Figure 7a illustrates the generation of these 3D weights. The SimAM attention mechanism estimates the importance of each neuron by constructing and optimizing an energy function. By evaluating the linear separability of neurons, the energy function of SimAM can be expressed as follows:
$$
e_t^{*} = \frac{4\left(\sigma_t^{2} + \lambda\right)}{\left(t - \mu_t\right)^{2} + 2\sigma_t^{2} + 2\lambda}, \quad \mu_t = \frac{1}{M-1}\sum_{i=1}^{M-1} x_i, \quad \sigma_t^{2} = \frac{1}{M-1}\sum_{i=1}^{M-1}\left(x_i - \mu_t\right)^{2} \tag{9}
$$
where $e_t^{*}$ is the energy function value representing the degree of difference between the target neuron and the other neurons; $t$ is the input feature of the target neuron in a single channel; $\mu_t$ and $\sigma_t^{2}$ are the mean and the variance of all neurons in the corresponding channel except the target neuron $t$; $x_i$ is the input feature of the other neurons in the same channel; $M$ and $i$ are the number of neurons and the neuron index; and $\lambda$ is a regularization factor.
Equation (9) means that the lower the energy, the more the neuron $t$ differs from the surrounding neurons, and the higher its importance. Finally, the output features of SimAM, $\tilde{X}$, are expressed as follows:
$$
\tilde{X} = \operatorname{sigmoid}\left(\frac{1}{E}\right) \odot X \tag{10}
$$
where $X$ denotes the input feature tensor of SimAM, $E$ is a tensor composed of the energy function values of all features, and $\odot$ is the Hadamard product. The sigmoid function is used to scale the attention weights and suppress relatively large values.
When SimAM computes the mean pixel difference across the entire feature map, the weighting process may overlook the importance of small targets, resulting in weak enhancement for small targets or keypoints and limiting its effectiveness in keypoint detection tasks. To address this issue,
Figure 7b introduces a slicing operation during feature map computation. By dividing the feature map into separate blocks, large targets, due to their prominent texture characteristics, influence the block-wise mean, thereby reducing the additional weighting they receive. After merging the blocks, large targets still maintain high recognizability and may even receive further enhancement. In contrast, small targets exhibit larger deviations from the local mean, thereby receiving stronger weighting and feature enhancement. This approach improves the precision of keypoint localization, particularly for small or subtle features.
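The slicing idea can be sketched in NumPy as follows. This is a minimal illustration, not the Conv_SAMWS module itself: the function names, the 2 × 2 block grid, and the value of the regularization factor are assumptions, and each block is assumed to contain more than one pixel. The per-block weighting follows the commonly used SimAM formulation, in which the inverse energy is computed from the squared deviation of each neuron from the channel mean.

```python
import numpy as np


def simam(x, lam=1e-4):
    """Parameter-free SimAM weighting for a (C, H, W) feature map."""
    n = x.shape[1] * x.shape[2] - 1
    sq_dev = (x - x.mean(axis=(1, 2), keepdims=True)) ** 2
    # Inverse energy: larger for neurons that stand out from their channel.
    inv_energy = sq_dev / (4.0 * (sq_dev.sum(axis=(1, 2), keepdims=True) / n + lam)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-inv_energy)))


def simam_with_slicing(x, grid=2, lam=1e-4):
    """Apply SimAM independently to grid x grid spatial blocks, then merge.

    Statistics are computed per block, so a small response is compared
    with its local neighbourhood instead of the whole feature map.
    """
    x = np.asarray(x, dtype=float)
    _, h, w = x.shape
    row_edges = [i * h // grid for i in range(grid + 1)]
    col_edges = [j * w // grid for j in range(grid + 1)]
    out = np.empty_like(x)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            out[:, r0:r1, c0:c1] = simam(x[:, r0:r1, c0:c1], lam)
    return out
```

With `grid=1` the function reduces to plain SimAM over the whole feature map, which makes the effect of slicing easy to isolate in an ablation.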
2.3.4. SCConv Module
The structure of the Spatial and Channel Reconstruction Convolution (SCConv) is shown in
Figure 8 and primarily consists of the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU). The SRU suppresses spatial redundancy in the feature maps through separation and reconstruction operations, while the CRU reduces channel redundancy through split, transform, and fuse operations. By combining these two reconstruction units, SCConv captures complex relationships within the input features. This mitigates feature redundancy and reduces the number of model parameters and floating-point operations (FLOPs), while enhancing the model’s feature extraction capability.
2.4. Evaluation Metrics for Keypoint Detection Model
In order to evaluate the effectiveness of the proposed model [
32], the metrics used in this study include mean Average Precision (mAP50), mAP50–95, model size (MB), and floating point operations (FLOPs). The mAP metrics are computed based on the precision–recall framework, while different matching criteria are adopted for bounding box detection and keypoint detection tasks.
The mAP is a comprehensive metric that reflects both precision and recall. Precision (P) and Recall (R) are defined as follows:
$$
P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}
$$
where $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, respectively. By varying the confidence threshold, a precision–recall (P–R) curve can be obtained, and the Average Precision (AP) is defined as the area under the P–R curve; the mAP is the mean of the AP values over all classes:
$$
AP = \int_{0}^{1} P(R)\,dR, \quad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
$$
Here, $N$ is the total number of classes, and $AP_i$ denotes the average precision for the $i$-th class.
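A compact sketch of these definitions is given below. The function names are hypothetical, and the AP is computed as the plain rectangular area under the P–R curve; common evaluation toolkits use interpolated variants, so the numbers they report can differ slightly.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the P-R curve obtained by sweeping the confidence threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap = 0.0
    previous_recall = 0.0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recall = tp / num_ground_truth
        precision = tp / (tp + fp)
        ap += (recall - previous_recall) * precision  # rectangle under the curve
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Three detections for two ground-truth targets; the second one is a false positive.
print(average_precision([0.9, 0.8, 0.7], [True, False, True], 2))
```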
For bounding box detection, mAP50(B) and mAP50–95(B) are computed based on the Intersection over Union (IoU) between the predicted bounding box and the ground-truth box. IoU is defined as the ratio of the overlap area to the union area of the predicted box $B_{p}$ and the ground-truth box $B_{gt}$:
$$
IoU = \frac{\left|B_{p} \cap B_{gt}\right|}{\left|B_{p} \cup B_{gt}\right|}
$$
A prediction is considered correct when the IoU exceeds a specified threshold. Specifically, mAP50(B) denotes the mAP computed when the IoU threshold is set to 0.5, while mAP50–95(B) represents the average mAP over multiple IoU thresholds from 0.50 to 0.95 with a step of 0.05.
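For axis-aligned boxes, the IoU and the threshold set used for mAP50–95 can be sketched as follows. The function name and the corner-based box format `(x1, y1, x2, y2)` are assumptions made for illustration.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# IoU thresholds from 0.50 to 0.95 in steps of 0.05 (ten values).
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]

# Two 2 x 2 boxes overlapping in a 1 x 1 square: IoU = 1 / 7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```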
Similarly to bounding box evaluation, the metrics mAP50(P) and mAP50–95(P) are used for keypoint detection, where the Object Keypoint Similarity (OKS) defined in the MS COCO evaluation protocol is adopted instead of IoU. The OKS is calculated as:
$$
OKS = \frac{\sum_{i} \exp\left(-\dfrac{d_i^{2}}{2 s^{2} k_i^{2}}\right)\delta\left(v_i > 0\right)}{\sum_{i} \delta\left(v_i > 0\right)}
$$
Here, $i$ represents the annotated keypoint index, $d_i^{2}$ represents the squared Euclidean distance between the detected keypoint position and the ground-truth keypoint position, $s^{2}$ represents the area occupied by the detected target in the image, and $k_i$ represents the decay constant that controls the tolerance for keypoint $i$. In the case of multiple keypoints, $k_i$ can be calculated as the standard deviation of the corresponding ground-truth positions across the dataset, reflecting the annotation consistency of that point. The value of $k_i$ is normalized by the target region size in the OKS calculation. A larger $k_i$ indicates lower consistency (higher annotation variability), while a smaller $k_i$ indicates higher consistency (more reliable annotations). For a single geological keypoint, $k_i$ is typically set to 0.5. In other words, $k_i$ can be interpreted as a weight reflecting the importance of each keypoint: more important points can be assigned smaller values, requiring higher localization precision and contributing more to the OKS; less important points can be assigned larger values, allowing for larger prediction errors without significantly affecting the overall OKS.
$\delta(\cdot)$ is the impulse (indicator) function, indicating that the OKS value is computed only for keypoints that are labeled in the ground-truth annotations. $v_i$ represents the visibility of keypoint $i$, where 0 signifies unannotated, 1 signifies annotated but occluded, and 2 signifies annotated and visible.
In keypoint evaluation, a prediction is considered correct when the OKS exceeds a specified threshold. Therefore, mAP50(P) denotes the mAP computed when the OKS threshold is set to 0.5, while mAP50–95(P) represents the average mAP over multiple OKS thresholds from 0.50 to 0.95 with a step of 0.05.
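The OKS computation can be sketched as follows. The function name and argument layout are hypothetical; the target area and the per-keypoint decay constants are supplied by the caller, and keypoints with visibility 0 are skipped, following the MS COCO convention described above.

```python
import math


def object_keypoint_similarity(pred, truth, visibility, area, k):
    """OKS between predicted and ground-truth keypoints of one target.

    pred, truth : lists of (x, y) positions in pixels
    visibility  : list of 0 / 1 / 2 flags per ground-truth keypoint
    area        : target area in pixels squared
    k           : list of per-keypoint decay constants
    """
    total = 0.0
    labeled = 0
    for (px, py), (gx, gy), v, k_i in zip(pred, truth, visibility, k):
        if v > 0:  # only labeled keypoints contribute
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            total += math.exp(-d2 / (2.0 * area * k_i ** 2))
            labeled += 1
    return total / labeled if labeled else 0.0


# A single keypoint predicted exactly on the ground truth gives OKS = 1.
print(object_keypoint_similarity([(3.0, 4.0)], [(3.0, 4.0)], [2], 100.0, [0.5]))
```

Because the squared error is divided by the target area, the same pixel offset is penalized more heavily for a small target than for a large one.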
Meanwhile, model size is also crucial. Industrial equipment usually has limited resources, and smaller models are easier to deploy on edge devices or embedded systems. In addition, FLOPs are used to describe the number of floating-point computations required during model inference and are commonly adopted to evaluate the overall computational complexity of a model.
3. Results and Discussion
3.1. Implementation Details
All deep learning models are tested on a Windows 10 operating system. The experimental hardware includes an Intel(R) Xeon(R) Gold 6133 @ 2.50 GHz processor and an NVIDIA GeForce RTX 4090 GPU with 24 GB of video memory. Development is carried out using PyTorch 2.0.1 and CUDA 11.7, with Python 3.8 as the programming environment. The specific training parameters of all models used in the experiments are detailed in
Table 3.
In this study, for the simulation dataset, GPUs are employed to accelerate gprMax simulations, using a computational platform equipped with two RTX 4090 GPUs (24 GB VRAM each). However, since the simulations are conducted in three-dimensional space and involve detailed antenna modeling, the computational complexity is high. It takes nearly one month to generate 623 data pairs, which are further divided into training, validation, and test sets in an 8:1:1 ratio.
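An 8:1:1 partition of the 623 samples can be produced as in the sketch below. The function name, the fixed random seed, and the rounding rule (flooring the training and validation counts and assigning the remainder to the test set) are assumptions; the exact subset sizes used in the experiments are not stated here.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the items and split them into training, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(623))
print(len(train), len(val), len(test))  # 498 62 63
```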
3.2. Training Performance of BSS-Pose-BHR
As shown in
Figure 9, this study presents the variation curves of different loss functions and evaluation metrics of BSS-Pose-BHR over 400 training epochs. For the training set, the box loss initially exhibits a high value (~4.4) but decreases sharply within the first 50 epochs, indicating that the model rapidly learns the spatial localization and scale of cavity objects by refining bounding box predictions. After 50 epochs, the decreasing trend slows and the curve stabilizes at approximately 0.86, indicating convergence. Further training yields negligible improvements in box regression performance. Similarly, the classification loss, which starts at a relatively high value (~3.9), gradually decreases and stabilizes at 0.44 after 400 epochs. The initially high classification loss indicates difficulty in assigning correct class labels to detected bounding boxes; however, as training progresses, the model achieves improved classification accuracy for cavity targets at different locations. Furthermore, the stabilization of the distribution focal loss after 400 epochs suggests that the model has learned a consistent pattern for refining bounding box predictions and has increased confidence in delineating cavity response boundaries within BHR radargrams. The overall reduction in the three loss functions demonstrates an effective optimization process, with the model progressively converging as training proceeds. The final loss values, all below 1, indicate that the BSS-Pose-BHR model is sufficiently trained to capture the gradual energy response characteristics of cavities in directional BHR profiles. Regarding keypoint estimation losses, both pose loss and keypoint objectness loss (kobj_loss) stabilize after 400 epochs, with values below 0.1. This indicates that BSS-Pose-BHR achieves balanced confidence in keypoint prediction, effectively distinguishing true keypoints from background noise and learning robust spatial keypoint representations. 
In addition, key evaluation metrics such as Precision, Recall, mAP50, and mAP50–95 all reach stable peak values after 400 epochs.
For the validation set, the above loss functions and evaluation metrics exhibit smooth and gradual convergence as training progresses. No curve shows signs of overfitting or severe oscillations during training, indicating that the training configuration of BSS-Pose-BHR is reasonable and stable.
3.3. Comparative Experiments
The optimal weights of the BSS-Pose-BHR model obtained after 400 training epochs are evaluated on the test set, and its performance is compared with several state-of-the-art models. The comparison results are shown in
Table 4.
To ensure a fair comparison, all baseline models (YOLOv8n-pose, YOLOv10n-pose, YOLOv11n-pose, and YOLOv12n-pose) are trained on the proposed simulated BHR dataset under the same training configuration as BSS-Pose-BHR, including input resolution, optimizer, learning rate, batch size, and number of epochs. No additional hyperparameter tuning is applied to individual models. Among these keypoint detection models, YOLOv8n-pose and YOLOv11n-pose also achieve relatively strong performance. Specifically, compared with YOLOv10n-pose, BSS-Pose-BHR achieves improvements of 7.54% in mAP50(B), 4.00% in mAP50–95(B), 2.75% in mAP50(P), and 3.00% in mAP50–95(P). This significant performance gap is also reflected in the confusion matrices computed on the validation set during training. As shown in
Figure 10, the confusion matrix of BSS-Pose-BHR shows perfectly correct predictions on the validation set, with no misclassifications or missed detections. In contrast, YOLOv10n-pose misclassifies five target samples as background on the same validation set, further demonstrating the superiority of BSS-Pose-BHR over YOLOv10n-pose.
For YOLOv11n-SPPF_improve-BSAM-LSCD_LQE, the model integrates several advanced enhancement strategies. Based on YOLOv11n-pose, three major modifications are introduced. (1) In the backbone, the original SPPF module is enhanced by incorporating global average pooling and global max pooling layers. The resulting features are concatenated to embed global background information, providing a broader contextual representation. (2) The Bi-Level Routing Spatial Attention Module (BSAM) is appended after the C2PSA module. BSAM is an improved variant of the Convolutional Block Attention Module (CBAM), in which the original channel attention mechanism is replaced to enhance feature selection capability. (3) The Local Structure and Context Description–Local Quality Estimation (LSCD_LQE) module replaces the original detection head to improve localization and quality estimation. For the variant “Ours (LSCD),” the detection head of BSS-Pose-BHR is replaced with LSCD for comparison. Compared with the baseline model (YOLOv11n-pose), these two modified models show improvements on only a subset of evaluation metrics. In contrast, BSS-Pose-BHR achieves consistent and significant improvements across all four key metrics, with larger gains than both variants. This further demonstrates the effectiveness and rationality of the proposed module design. In terms of computational complexity, measured in GFLOPs, BSS-Pose-BHR maintains relatively low overhead and ranks third among all compared models, while YOLOv12n-pose achieves the lowest complexity. Although three improvement modules are introduced into YOLOv11n-pose, the computational cost of BSS-Pose-BHR increases by only 0.2 GFLOPs, indicating a marginal increase in complexity and still remaining lower than most competing models. Overall, these results demonstrate that the proposed method achieves a favorable trade-off between detection accuracy and computational efficiency.
To provide a more intuitive comparison between the proposed model and the baseline in terms of both bounding box detection and keypoint localization accuracy,
Figure 11 and
Figure 12 are presented. As shown in the detection results of eight representative examples in
Figure 11, both BSS-Pose-BHR and YOLOv11n-pose successfully detect the expected number of bounding boxes. However, the predicted bounding boxes of BSS-Pose-BHR exhibit higher confidence scores than those of YOLOv11n-pose, indicating that BSS-Pose-BHR provides more reliable predictions for practical BHR profile-based target recognition applications.
Figure 12 illustrates the proportional differences along the x–y axes between predicted keypoint positions and ground truth for approximately 450 cases, including both simulated and indoor experimental data. From the figure, YOLOv11n-pose shows a more dispersed distribution of proportional differences, indicating slightly larger deviations from the ground truth compared with the proposed method. Specifically, the mean and standard deviation are 0.0128 and 0.0099 for YOLOv11n-pose, and 0.0118 and 0.0086 for BSS-Pose-BHR, respectively. These results demonstrate that the proposed method achieves higher accuracy and better stability in keypoint position estimation.
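The statistics quoted above can be reproduced from paired keypoint coordinates with a few lines of NumPy. The following is a minimal sketch, assuming that the proportional difference is the absolute deviation between predicted and true keypoint coordinates divided by the image dimension along each axis; the function name `proportional_difference_stats` and the sample coordinates are hypothetical and are not taken from the dataset used in this study.

```python
import numpy as np

def proportional_difference_stats(pred_xy, true_xy, width=1.0, height=1.0):
    """Per-axis mean and standard deviation of normalized keypoint deviations.

    pred_xy, true_xy: arrays of shape (n_cases, 2) holding (x, y) positions.
    width, height: image size in the same units (1.0 if already normalized).
    """
    pred = np.asarray(pred_xy, dtype=float)
    true = np.asarray(true_xy, dtype=float)
    # Absolute deviation, scaled by the image size along each axis.
    diff = np.abs(pred - true) / np.array([width, height], dtype=float)
    return diff.mean(axis=0), diff.std(axis=0)

# Hypothetical normalized coordinates for two cases (illustration only).
pred = [[0.50, 0.40], [0.30, 0.70]]
true = [[0.52, 0.40], [0.27, 0.74]]
mean_xy, std_xy = proportional_difference_stats(pred, true)
```

A smaller mean indicates predictions closer to the ground truth, and a smaller standard deviation indicates a tighter, more stable distribution of errors.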
In addition,
Section 1 introduces two representative directional BHR profile target localization algorithms: the maximum amplitude-based directional algorithm and the median-value-based localization algorithm, both of which are used to estimate azimuth angles. Therefore, this study further compares these two methods with BSS-Pose-BHR in terms of azimuth angle prediction accuracy. The experimental data are consistent with the cases analyzed above, and the results are presented in
Table 5.
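To make the comparison concrete, the general idea behind a maximum-amplitude azimuth estimate can be sketched as follows. This is a simplified illustration rather than the referenced algorithm itself: it assumes that the profile is a two-dimensional array of time samples by traces, that the traces uniformly cover 0° to 360° starting at 0°, and that the azimuth is taken as the angle of the trace containing the largest absolute amplitude. The function name `max_amplitude_azimuth` and the optional `time_window` argument are hypothetical.

```python
import numpy as np

def max_amplitude_azimuth(profile, time_window=None):
    """Estimate the target azimuth (degrees) from the strongest trace.

    profile: 2-D array, shape (n_time_samples, n_traces).
    time_window: optional (start, stop) sample indices restricting the search.
    Assumes trace 0 corresponds to 0 degrees and traces are evenly spaced.
    """
    data = np.abs(np.asarray(profile, dtype=float))
    if time_window is not None:
        start, stop = time_window
        data = data[start:stop]
    # Largest absolute amplitude found on each trace.
    peak_per_trace = data.max(axis=0)
    trace_index = int(np.argmax(peak_per_trace))
    return trace_index * 360.0 / data.shape[1]

# Synthetic profile: 72 traces (5 degrees apart) with one strong response.
demo_profile = np.zeros((100, 72))
demo_profile[40, 18] = -3.0
demo_azimuth = max_amplitude_azimuth(demo_profile)
```

Because such an estimate depends on a single extreme sample, it is sensitive to noise spikes and residual interference, which is one plausible reason for comparing it against a learned keypoint detector.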
In summary, relative to the advanced deep learning-based methods, both the quantitative metrics for bounding box detection and keypoint localization and the qualitative visualization results demonstrate the superior performance of BSS-Pose-BHR. Compared with traditional directional BHR profile-based target azimuth estimation algorithms, BSS-Pose-BHR also achieves more accurate azimuth angle prediction.
3.4. Ablation Experiment
To verify the impact of the proposed improvement modules on model performance, ablation experiments are conducted based on YOLOv11n-pose. The results are shown in
Table 6.
The experimental results indicate that all improvement strategies contribute positively to overall model performance. Introducing the BRA module into the baseline YOLOv11n-pose model yields gains of 2.02% in mAP50(B) and 0.23% in mAP50(P), while increasing the model size by 0.56 MB. The module thus improves the model’s ability to extract and accurately localize targets in BHR profiles, at the cost of a slight increase in model weight size.
After introducing the SimAMWithSlicing module on top of the BRA module, the model achieves improvements of 3.14% in mAP50(B) and 0.69% in mAP50(P) compared with YOLOv11n-pose. Due to the lightweight design of the SimAMWithSlicing attention mechanism, the model size remains almost unchanged relative to the BRA-only model, while still providing noticeable performance gains.
To further improve accuracy, SCConv is incorporated into the detection head of the model that already integrates BRA and SimAMWithSlicing. This module separately optimizes spatial and channel information, reduces redundant features, and further improves mAP50(B) and mAP50(P), thereby significantly enhancing the performance of YOLOv11n-pose in detecting keypoint positions of underground targets in BHR profiles. Although the integration of these three modules yields substantial improvements in detection accuracy, it also increases the model size.
3.5. Indoor Experimental Testing
3.5.1. Single-Target Detection Experiment
Since the instrument research, supported by a major national project, is still at an early stage, this study conducts signal processing experiments using only indoor tests. In these experiments, the BHR system is positioned approximately 50 cm above the ground and performs a 360° full-space rotational scan. This setup is primarily used to observe the locations of ground response features in the BHR profiles. The antenna operates at 400 MHz and adopts a water-drop-shaped butterfly radiator. The borehole diameter is 60 mm, the load resistance is 200 Ω, and the antenna shielding arc is 180°. The trained optimal model weights of BSS-Pose-BHR and YOLOv11n-pose are then applied to perform target detection and keypoint localization on the acquired profiles. The results are shown in
Figure 13.
Due to factors such as the electronic design of the radar antenna, impedance mismatches between the antenna and the ground, and multiple reflections of electromagnetic waves at subsurface interfaces, the raw profiles obtained during indoor ground detection inevitably contain horizontal interference with varying amplitude levels. This interference degrades the quality of the high-resolution images provided by the system. Therefore, as shown in
Figure 13b, Robust Principal Component Analysis (RPCA) is applied to remove most low-rank components from the raw profiles. Subsequently, both BSS-Pose-BHR and YOLOv11n-pose are used to perform keypoint detection on the processed data. In terms of detection confidence, BSS-Pose-BHR shows a clear advantage. From the perspective of keypoint azimuth localization, the ground is theoretically located directly beneath the BHR system, and the corresponding angle of the strongest ground response in the profile should be approximately 180°, i.e., at the midpoint (0.5) of the horizontal axis. According to the quantitative results, BSS-Pose-BHR predicts the keypoint position at 0.527522, whereas YOLOv11n-pose predicts it at 0.551346. This indicates that BSS-Pose-BHR produces a keypoint estimation closer to the ground truth.
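The role of RPCA in this step can be illustrated with a compact implementation. The sketch below uses principal component pursuit solved by an augmented Lagrangian iteration, which is one common way to compute RPCA; the solver, the default parameters `lam` and `mu`, and the synthetic test profile are assumptions for illustration and are not the settings used in this study. Horizontal interference that is identical on every trace forms a low-rank matrix, whereas localized target responses are captured by the sparse component.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise shrinkage toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(x, tau):
    """Shrink the singular values of x by tau."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def rpca(m, lam=None, mu=None, tol=1e-6, max_iter=1000):
    """Split m into low_rank + sparse by principal component pursuit."""
    m = np.asarray(m, dtype=float)
    rows, cols = m.shape
    lam = 1.0 / np.sqrt(max(rows, cols)) if lam is None else lam
    mu = rows * cols / (4.0 * np.abs(m).sum()) if mu is None else mu
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        low_rank = singular_value_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual += mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse

# Synthetic profile: rank-1 horizontal banding plus a few strong responses.
rng = np.random.default_rng(0)
banding = np.outer(rng.normal(size=60), np.ones(80))
target = np.zeros((60, 80))
mask = rng.random((60, 80)) < 0.05
target[mask] = rng.choice([-5.0, 5.0], size=int(mask.sum()))
low_rank, sparse = rpca(banding + target)
```

In this decomposition the `sparse` output plays the role of the processed profile, while `low_rank` holds the removed interference. Real profiles are only approximately low-rank plus sparse, so the separation is less clean than on synthetic data.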
3.5.2. Multi-Target Detection Experiment
Based on the experimental setup described in
Section 3.5.1, a horizontal piece of aluminum foil is placed directly above the borehole radar. In this case, the BHR scans exhibit additional response features corresponding to the foil. The ground target keypoints are located at approximately 180° (i.e., the center of the scan), while the foil keypoints are located around 0°/360° (i.e., the edges of the scan). Detection is performed using both BSS-Pose-BHR and YOLOv11n-pose, and the comparative results are shown in
Figure 14. The results indicate that the two methods achieve similar performance in keypoint localization, while the proposed method produces higher bounding-box confidence scores than the baseline.
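Because the horizontal axis of each profile covers one full rotation, a normalized keypoint position can be converted into an azimuth angle, and errors near 0°/360° must account for wrap-around. The following minimal sketch assumes a linear mapping in which the left edge of the profile corresponds to 0° and the right edge to 360°; the function names are hypothetical. Under this assumption, the single-target predictions of 0.527522 and 0.551346 correspond to deviations of roughly 9.9° and 18.5° from the expected 180°.

```python
def keypoint_to_azimuth(x_norm):
    """Map a normalized horizontal position in [0, 1] to degrees in [0, 360)."""
    return (x_norm * 360.0) % 360.0

def circular_error(azimuth_deg, reference_deg):
    """Smallest absolute angular difference in degrees, with wrap-around."""
    diff = (azimuth_deg - reference_deg) % 360.0
    return min(diff, 360.0 - diff)

# Ground response, expected at 180 degrees (values reported in the text).
err_proposed = circular_error(keypoint_to_azimuth(0.527522), 180.0)
err_baseline = circular_error(keypoint_to_azimuth(0.551346), 180.0)

# A hypothetical foil keypoint near the right edge is still close to 0 degrees.
err_foil = circular_error(keypoint_to_azimuth(0.99), 0.0)
```

Without the wrap-around step, a foil keypoint at 356.4° would be scored as 356.4° away from the 0° reference instead of 3.6°.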
3.6. Robustness Analysis
To conduct a robustness analysis of the BSS-Pose-BHR model, we systematically evaluate its performance by introducing varying levels of Gaussian noise and different channel drop rates into the BHR profiles.
In practice, BHR data acquisition inevitably involves multiple sources of random interference, such as thermal noise from radar transmit/receive electronics, electromagnetic background noise, and stochastic perturbations in the signal acquisition chain. In addition, subsurface heterogeneity and scattering effects introduce further noise components. Therefore, degraded scenarios are constructed by artificially adding Gaussian noise with different intensities to the original clean profiles, enabling a systematic evaluation of the model’s robustness under complex real-world conditions.
Furthermore, to evaluate the model’s adaptability to incomplete sampling, profiles with varying degrees of missing channels are generated. The rotational directional BHR system acquires circumferential data by controlling the antenna rotation angle interval. In practice, the angular sampling density is sometimes reduced to improve acquisition efficiency, which can lead to missing traces in the profiles. Therefore, degraded scenarios with different missing-channel ratios are constructed to assess the stability and engineering applicability of the BSS-Pose-BHR model under sparse sampling and incomplete observation conditions.
Experiments are conducted on 40 profiles. Gaussian noise is added to achieve PSNR values ranging from 31 dB to 43 dB, representing light to moderate noise levels that preserve the main profile structures while challenging the model’s feature extraction and target recognition capabilities. In the simulations, when the PSNR falls below approximately 30 dB, the noise becomes strong enough to noticeably distort the morphology of the BHR profiles and may obscure the response characteristics of cavity anomalies. Conversely, when the PSNR exceeds 43 dB, the profiles are too clean for the noise to meaningfully test the robustness of the model. Profiles with a PSNR of around 30 dB can still preserve the basic structural information, but the slightly higher range of 31–43 dB is adopted as a relatively conservative setting that introduces noticeable but non-destructive noise while keeping the main structural features and anomaly responses clearly identifiable and the evaluation conditions consistent. For missing-channel scenarios, channels are randomly omitted from the original profiles, after which the remaining data are merged and resized to a uniform dimension.
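The two degradation procedures can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the exact pipeline of this study: profiles are assumed to be normalized so that the peak value used in the PSNR definition is 1.0, the noise standard deviation is derived from PSNR = 20 log10(peak / sigma), and the resizing after channel removal is done by linear interpolation along the trace axis. All function names are hypothetical.

```python
import numpy as np

def add_gaussian_noise(profile, target_psnr_db, rng, peak=1.0):
    """Add zero-mean Gaussian noise whose variance matches the target PSNR."""
    sigma = peak / (10.0 ** (target_psnr_db / 20.0))
    return profile + rng.normal(0.0, sigma, size=profile.shape)

def psnr(clean, degraded, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((clean - degraded) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def drop_channels(profile, drop_rate, rng):
    """Randomly remove traces, then resize back to the original width."""
    n_traces = profile.shape[1]
    n_drop = int(round(drop_rate * n_traces))
    dropped = rng.choice(n_traces, size=n_drop, replace=False)
    kept = np.delete(profile, dropped, axis=1)
    # Linear interpolation along the trace axis (an assumed resizing method).
    old_x = np.linspace(0.0, 1.0, kept.shape[1])
    new_x = np.linspace(0.0, 1.0, n_traces)
    resized = np.stack([np.interp(new_x, old_x, row) for row in kept])
    return resized, kept.shape[1]

rng = np.random.default_rng(0)
clean = rng.random((256, 256))            # stand-in for a normalized profile
noisy = add_gaussian_noise(clean, 31.0, rng)
measured_psnr = psnr(clean, noisy)        # close to the 31 dB target
sparse_profile, n_kept = drop_channels(clean, 0.35, rng)
```

The noisy profile is deliberately left unclipped so that the measured PSNR matches the target; clipping to the valid range would raise the PSNR slightly.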
As shown in
Figure 15 and
Figure 16, under different PSNR levels and missing-channel conditions, BSS-Pose-BHR consistently outperforms YOLOv11n-pose in terms of both mAP50–95(B) and mAP50–95(P), with its performance curves remaining above those of YOLOv11n-pose throughout. These quantitative results demonstrate the superior performance of BSS-Pose-BHR in both bounding box detection and keypoint estimation tasks. Representative cases further show that even at PSNR = 31 dB or a missing-channel rate of 0.35, BSS-Pose-BHR detects every expected bounding box and accurately predicts keypoints. As summarized in
Table 7, across different degradation scenarios, the average error between predicted and true azimuth angles indicates that BSS-Pose-BHR achieves significantly higher azimuth prediction accuracy than the two traditional methods. Notably, even under severe conditions (missing-channel rate = 0.35, PSNR = 31 dB), its average error remains lower than that of the traditional algorithms under much milder conditions (missing-channel rate = 0.06, PSNR = 43 dB). Moreover, among the traditional methods, the maximum-amplitude-based approach consistently underperforms the median-value-based method.
3.7. Limitations
Although the proposed BSS-Pose-BHR method achieves good performance in the experiments, several limitations should be noted:
(1) The dataset used in this study is mainly generated from numerical simulations, which may not fully represent the complexity of real geological environments. In future work, more realistic geological models and refined simulation settings will be considered, and real measured data or data augmentation techniques such as generative networks will be introduced to further improve the diversity and realism of the dataset.
(2) The simulations are conducted in three-dimensional space with detailed antenna modeling, resulting in high modeling complexity and limiting the efficiency of data generation. Future studies will explore simplified simulation strategies and reduced model complexity to improve computational efficiency and support the construction of larger-scale datasets.
(3) The keypoint definition in this study is limited to the azimuth of the target response, which is a simplified setting suitable for preliminary investigation. Physical validation in real borehole environments is time-consuming and resource-intensive. In addition, experiments involving more complex and non-symmetric geological structures are required in future work to further improve the generalization capability of the proposed method.