4.1. Algorithm Selection and Configuration
In our experiments, we evaluated the performance of several YOLO (You Only Look Once) object detection models—specifically YOLOv8, YOLOv9, YOLOv10, and YOLOv11—across their available model sizes, from the nano/tiny variants up to the extra-large ones. We selected these models for their advancements in real-time object detection, which is critical for automated, on-site GPR-based bridge inspections. Real-time detection methods enable faster scanning and analysis, thereby reducing both time and resources during field work.
Each YOLO version introduces architectural and algorithmic improvements to balance accuracy, inference speed, and computational complexity. These enhancements typically involve more efficient backbone networks, refined feature-aggregation strategies (such as FPN, PANet, and CSP connections), and either anchor-based or anchor-free detection heads. YOLOv8 pairs a CSP-based backbone with an anchor-free, decoupled detection head; YOLOv9 introduces Programmable Gradient Information (PGI) and the GELAN architecture to mitigate information loss in deep networks; YOLOv10 removes the need for non-maximum suppression through consistent dual label assignments, reducing inference latency; and YOLOv11 further refines the backbone and neck modules, targeting higher accuracy with fewer parameters than its predecessors.
We tested each model with multiple variant sizes to account for different resource constraints and to identify the best fit for our application. Smaller variants are ideal for real-time, edge-device scenarios where GPU memory and processing power are limited, whereas larger variants are more suitable for powerful workstations capable of handling heavier models. Hyperbolic signatures in GPR data can be subtle and vary significantly in size, so balancing accuracy and speed is essential for detecting critical structural features. This multi-model, multi-variant approach provides a comprehensive assessment of each YOLO architecture’s strengths and weaknesses on a diverse set of simulated and real GPR scans. Our findings can serve as a benchmark for future researchers seeking to optimize rebar detection models for ground-penetrating radar applications, helping guide the selection of an optimal trade-off between detection precision, inference speed, and resource usage.
4.2. Implementation and Metrics
The present study was conducted on a workstation running Ubuntu 20.04, equipped with an AMD Ryzen Threadripper PRO 5955WX CPU, 128 GB of RAM, and dual NVIDIA RTX A6000 GPUs for model training and testing. The networks were trained with the stochastic gradient descent (SGD) optimizer using carefully tuned hyperparameters. The initial learning rate was set to 0.01, with a final-learning-rate factor of 0.01, so that the learning rate decayed to 0.0001 by the end of training. The batch size was 32, and training ran for 50 epochs. The optimizer momentum was set to 0.937, and a weight decay of 0.0005 was applied to regularize the model and reduce overfitting. To ensure a smooth start to training, a warmup phase spanning 3 epochs was employed, during which the learning rate and momentum were increased gradually: the warmup momentum began at 0.8, and the initial bias learning rate was set to 0.1.
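As an illustrative reconstruction (not the exact training code), the warmup and decay behaviour implied by these hyperparameters can be sketched as follows. A linear decay schedule is assumed here; actual YOLO training pipelines typically interpolate per batch and per parameter group rather than per epoch.

```python
# Sketch of the learning-rate and momentum schedule described above.
# Assumptions: linear warmup and linear decay, evaluated per epoch.

LR0, LRF = 0.01, 0.01            # initial LR and final-LR fraction
EPOCHS, WARMUP_EPOCHS = 50, 3
MOMENTUM, WARMUP_MOMENTUM = 0.937, 0.8

def lr_at(epoch: float) -> float:
    """Linear warmup to LR0, then linear decay to LR0 * LRF."""
    if epoch < WARMUP_EPOCHS:
        return LR0 * epoch / WARMUP_EPOCHS
    frac = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return LR0 * (1.0 - frac) + (LR0 * LRF) * frac

def momentum_at(epoch: float) -> float:
    """Momentum ramps from 0.8 to 0.937 during warmup, then stays fixed."""
    if epoch < WARMUP_EPOCHS:
        return WARMUP_MOMENTUM + (MOMENTUM - WARMUP_MOMENTUM) * epoch / WARMUP_EPOCHS
    return MOMENTUM
```

Under this schedule, the learning rate reaches its peak of 0.01 at the end of warmup and falls to 0.0001 (LR0 × LRF) at epoch 50.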
Data augmentation techniques were employed to enhance the diversity and robustness of the training dataset, thereby improving the model’s generalization capabilities. Mosaic augmentation was applied with a probability of 100%, combining four images into a single composite image during training. This approach introduced significant variability in image composition, allowing the model to learn more generalized features. Additionally, geometric augmentations were applied, including translation and scaling. The translation factor of 10% allowed images to shift horizontally or vertically by up to 10% of their dimensions, while the scaling factor of 50% allowed image content to be rescaled by up to ±50% of its original size. These transformations helped the model become invariant to minor positional and size variations, further strengthening its robustness. Furthermore, color augmentations were performed by adjusting the HSV (hue, saturation, and value) color space: the hue was perturbed by up to 1.5%, the saturation by up to 70%, and the value (brightness) by up to 40%. These adjustments simulated variations in color intensity, making the model more resilient to real-world scenarios with diverse visual appearances.
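The HSV augmentation above can be sketched as drawing a random multiplicative gain for each channel, with amplitudes matching the reported settings. This is a minimal illustration; real pipelines apply these gains to the HSV-converted image (e.g., via lookup tables) rather than returning them directly.

```python
import random

# Hedged sketch of the HSV colour augmentation: each channel is scaled by a
# random gain drawn around 1.0, with amplitudes matching the reported
# settings (hue 0.015, saturation 0.70, value 0.40).

HSV_H, HSV_S, HSV_V = 0.015, 0.70, 0.40

def random_hsv_gains(rng: random.Random) -> tuple[float, float, float]:
    """Return multiplicative gains for the H, S, and V channels."""
    return tuple(rng.uniform(-1.0, 1.0) * amp + 1.0
                 for amp in (HSV_H, HSV_S, HSV_V))
```

For example, a saturation gain of 1.7 corresponds to the maximum +70% perturbation, while a hue gain never strays more than 1.5% from unity.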
We evaluated the performance of our YOLO-based rebar detection models using the mean Average Precision (mAP50-95) metric, a widely recognized benchmark in object detection. This metric assesses both detection accuracy and localization quality, providing a comprehensive measure of how well the models identify and precisely locate hyperbolic signatures corresponding to rebar in GPR scans. The mAP50-95 metric is calculated by averaging the Average Precision (AP) scores across multiple Intersection over Union (IoU) thresholds, ranging from 0.50 to 0.95 in increments of 0.05. This ensures that the evaluation captures model performance across varying degrees of localization strictness. At lower IoU thresholds, detections with some misalignment still contribute to the score, reflecting general detection capability. At higher IoU thresholds, only highly accurate bounding boxes are counted, emphasizing precise localization. By averaging over this range, mAP50-95 provides a balanced assessment of both detection robustness and localization accuracy, making it a reliable metric for evaluating rebar detection in GPR scans.
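The effect of the IoU threshold sweep can be illustrated with a small sketch: a moderately misaligned detection counts as a true positive at the looser thresholds but not at the stricter ones. The box coordinates below are hypothetical values chosen for illustration.

```python
# Illustration of the IoU threshold sweep underlying mAP50-95.
# Boxes are (x1, y1, x2, y2); the coordinates are hypothetical.

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95

gt  = (10, 10, 50, 50)   # ground-truth box around a rebar hyperbola
det = (14, 12, 54, 52)   # slightly shifted detection (IoU ~ 0.75)
matches = [iou(gt, det) >= t for t in THRESHOLDS]
```

Here the shifted detection matches at IoU thresholds 0.50 through 0.70 but fails from 0.75 upward, so it contributes to the AP terms at the looser thresholds only; averaging AP over all ten thresholds yields mAP50-95.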
4.3. Results
To assess the effectiveness of various YOLO models for hyperbolic feature detection in GPR scans, we conducted experiments comparing model performance when trained from scratch and when using transfer learning. The dataset was split into 70% training (1578 images) and 15% validation (338 images) for the real GPR dataset. For transfer learning, we first trained models on a simulated GPR dataset, using 14,000 images for training and 3000 images for validation. The best-performing model on the validation set was then used as pretrained weights for fine-tuning on the real GPR dataset. The results of our experiments are summarized in Table 4. The mean Average Precision (mAP50-95) values are reported for both training scenarios, along with the observed improvement due to transfer learning.
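The split described above can be sketched as follows. Note that the 2254-image total and the use of the remaining 15% as a held-out test split are assumptions inferred from the reported 1578/338 train/validation counts, not stated in the text.

```python
import random

# Minimal sketch of a 70/15/15 dataset split. The 2254-image total and the
# held-out test split are assumptions inferred from the reported counts.

def split_dataset(paths, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle image paths and split them into train/val/test subsets."""
    rng = random.Random(seed)
    shuffled = paths[:]
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical file names standing in for the real GPR scans.
train, val, test = split_dataset([f"scan_{i:04d}.png" for i in range(2254)])
```

With 2254 images this yields 1578 training and 338 validation images, matching the counts reported above.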
The results demonstrate that transfer learning significantly improves performance across all model variants. For most models, the mAP50-95 score increased by approximately 6% to 10%, highlighting the effectiveness of pretraining on simulated GPR data before fine-tuning on real GPR scans. Notably, YOLOv10x showed an exceptionally large improvement (+0.954 mAP50-95), likely due to instability in training from scratch, where the model struggled to converge. Smaller models such as YOLOv8n, YOLOv9t, and YOLOv10n, while computationally efficient, benefited from transfer learning but still achieved slightly lower overall accuracy compared to larger variants. Larger models such as YOLOv8x, YOLOv9e, YOLOv10x, and YOLOv11x exhibited the highest accuracy after transfer learning, demonstrating their ability to leverage complex features extracted from the pretraining stage.
To evaluate the robustness of the models under data-constrained scenarios, we conducted an experiment using only 339 images for training, while keeping the same validation set (338 images). The objective was to analyze how reducing the training dataset size affects model performance, and whether transfer learning could mitigate the performance drop typically observed when training with limited data. The results in Table 5 below highlight the model performance in terms of mAP50-95, comparing training from scratch and training with pretrained weights obtained from the simulated GPR dataset.
The results highlight the critical role of simulated GPR data in improving model performance, particularly in data-limited scenarios. When training from scratch with only 339 real GPR images, many models struggled to learn meaningful patterns, with some—particularly larger models—failing to converge at all. This is a common challenge in deep learning applications where real-world data is scarce, as models require a sufficient volume of labeled examples to effectively capture complex features, such as hyperbolic patterns indicative of rebar in bridge decks. Without enough training samples, models trained from scratch exhibited low accuracy and poor generalization, making them unreliable for real-world deployment.
However, when transfer learning was applied using pretrained weights from the simulated GPR dataset, the models exhibited substantial performance improvements, with gains ranging from 13.5% to over 95% in mAP50-95. This demonstrates that the simulated dataset successfully captured essential GPR signal characteristics, allowing models to learn fundamental feature representations before fine-tuning on real data. As a result, models pretrained on simulated scans generalized significantly better, despite being fine-tuned on a small real-world dataset. Notably, larger models that initially failed to converge (e.g., YOLOv10l, YOLOv10x, and YOLOv11x) achieved near-perfect accuracy when initialized with simulated GPR-trained weights, indicating that pretraining on synthetic data can compensate for the lack of extensive real-world annotations.
One of the key advantages of using simulated GPR data for pretraining is that it enables models to learn robust feature representations without requiring extensive manual labeling efforts. Generating high-quality simulated GPR scans with controlled variations in material properties, rebar spacing, and antenna frequencies provides a diverse dataset that mirrors real-world conditions. By training models on this rich synthetic dataset first, we can significantly reduce the dependency on large real-world datasets, which are often costly and labor-intensive to collect. This is particularly valuable for bridge inspection applications, where acquiring high-quality GPR scans with labeled rebar locations can be both time-consuming and impractical.
Additionally, the simulated dataset allows for systematic control over data characteristics, ensuring that the model is exposed to a wide range of potential bridge deck conditions, including different subsurface materials, noise levels, and rebar configurations. This structured variability improves model generalization, enabling it to adapt more effectively to unseen real-world conditions. The success of transfer learning in this study underscores how simulated GPR data serves as a scalable and cost-effective alternative to manually collected real-world datasets, offering a practical solution for training high-performance deep learning models for GPR-based infrastructure inspection.