1. Introduction
A broad spectrum of nondestructive evaluation (NDE) techniques has been developed to assess the condition of civil infrastructure without causing damage. These methods encompass visual inspection, ultrasonic and acoustic testing, magnetic particle and eddy current analysis, radiography, thermography, optical methods, microwave scanning, penetrant testing, acoustic emission monitoring, and ground penetrating radar (GPR) [1].
GPR continues to gain attention because of its effectiveness in locating steel reinforcing bars (rebars) and identifying subsurface anomalies in concrete structures. This attention can be attributed to GPR’s ability to provide rapid, non-invasive imaging, which makes it valuable for assessing structural integrity and informing maintenance strategies for transportation infrastructure. The increasing adoption of GPR in structural evaluations reflects its growing recognition as a reliable diagnostic tool in civil infrastructure. Recent guidelines from the American Concrete Institute, including ACI PRC-228.3-23 [2] and ACI 228.2R-13 [3], acknowledge the role of GPR in evaluating internal conditions and embedded elements in concrete structures.
Environmental conditions significantly influence GPR performance, particularly in detecting subsurface features such as rebars in concrete. Moisture from rainfall or snow can increase soil or overlay conductivity, resulting in greater signal attenuation and a reduced penetration depth. Similarly, freeze–thaw cycles alter the dielectric properties of the material, causing increased scattering and degradation of signal clarity [4,5]. Surface moisture after rainfall or during snow melt also introduces low-frequency noise and raises background signal levels, effectively masking weak reflections from embedded objects [6]. Zatar et al. [7] conducted experiments in an environmental chamber to investigate the impact of chloride content, rust from rebar corrosion, ambient temperature, and relative humidity on the GPR signal amplitude. The study found that both chloride contamination in concrete and rust formation on the steel surface reduced the amplitude of rebar reflections. Moreover, higher chloride concentrations led to greater signal attenuation. Under the experimental conditions, the reflection amplitudes from corroded rebars were approximately 1 dB lower than those from non-corroded rebars.
While prior research has introduced both analytical and deep learning-based methods to enhance rebar detection from radargrams, most approaches have been validated only under controlled or lightly obstructed conditions. For example, Zatar et al. [8,9] demonstrated the accurate localization of rebar using an analytical model for clean and moderately noisy radargrams.
Several researchers have reported on the use of a Convolutional Neural Network (CNN) for high-accuracy surface and deep detection [10,11,12,13]. Yang et al. [14] and Wang et al. [15] studied the use of deep learning to detect defects in manufacturing. Tian and Jia [16] developed a rapid detection method for steel surface defects. Several other researchers have examined applications in industrial and aviation systems [17,18,19,20].
Previous research conducted by the authors has demonstrated that deep learning models, especially those based on CNNs, can automate rebar detection from radargrams with high accuracy, even under moderate signal degradation. Among the many deep learning frameworks developed for object detection, the You Only Look Once (YOLO) series is widely recognized for its high-speed performance and real-time detection capabilities [21,22,23,24]. In 2020, Li et al. [25] implemented the YOLOv3 algorithm using Google’s TensorFlow framework to perform real-time pattern recognition in GPR images. A recent study proposed a two-stage strategy for detecting grouting defects in bridge tendon ducts by combining the impact-echo method with machine learning and spectrogram-based classification, achieving accuracies exceeding 90% in both laboratory and full-scale girder tests [26].
Building on this research, Li et al. [27] conducted a comparative study of YOLOv5 against YOLOv3 and YOLOv4, demonstrating that YOLOv5 delivered notable improvements, especially when trained on smaller datasets. It also showed enhanced robustness in detecting and distinguishing features within GPR data. Qiu et al. [28] applied YOLOv5 [29] for real-time target detection and coordinate localization in GPR imagery, highlighting the algorithm’s growing effectiveness in subsurface object identification. The YOLOv8 architecture has emerged as a fast and robust solution for real-time object detection [30,31]. The authors have trained separate YOLOv8 models using radargrams with varying levels of signal clarity, including clear hyperbolas from laboratory specimens and blurry hyperbolas from bridge beams with asphalt cover.
This study aims to evaluate the effectiveness of deep learning models for rebar detection in GPR radargrams under challenging field conditions. Unlike previous YOLO-based rebar detection studies, which were primarily conducted under controlled laboratory conditions, this research applies YOLOv8 to GPR data from in-service bridge beams with asphalt overlays, addressing real-world challenges such as signal attenuation, noise, and variable field conditions. Three YOLOv8 models were trained using distinct datasets: clear, interfering, and blurry. The clear-trained model was applied to field scans targeting stirrups beneath asphalt overlays, demonstrating strong performance in moderately attenuated conditions. The interfering-trained model was utilized for high-noise transverse scans of concrete beams in the laboratory, where voids and nearby rebars heavily distorted signal reflections. The blurry-trained model was used on radargrams collected from in-service bridge beams with thick asphalt and concrete cover, where signal quality was severely degraded. A post-processing filtering algorithm is needed to remove overlapping and potentially inaccurate hyperbolas.
8. AI Training Models
The dataset used to train and evaluate the YOLOv8 model was obtained from three laboratory-scale bridge box beams and one in-service bridge structure. These specimens featured asphalt overlays with thicknesses ranging from 1.5 to 2.5 inches, which are typical of standard construction practices and environmental protection requirements. The inclusion of varying asphalt conditions allowed for a comprehensive assessment of how overlay thickness affects the visibility and detectability of embedded rebar in GPR radargrams.
Meeting the input requirements of YOLOv8 and enhancing model performance involved preprocessing radargrams into image segments with a resolution of 640 × 640 pixels, a size that strikes a balance between computational efficiency and sufficient visual detail to detect hyperbolic signatures accurately. Overlapping image patches were extracted from each radargram to ensure complete coverage and to prevent truncation of hyperbolas located near image edges.
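The overlapping patch extraction described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function names and the 480-pixel stride (a 160-pixel overlap) are assumptions chosen for the example, and the radargram is assumed to be at least one patch in each dimension.

```python
import numpy as np

def window_starts(length, patch, stride):
    """Start offsets of windows covering [0, length), with the last window flush to the end."""
    if length <= patch:
        return [0]  # a shorter axis would need padding, which is not handled here
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # extra window so the tail of the scan is not dropped
    return starts

def extract_patches(radargram, patch=640, stride=480):
    """Yield (x0, y0, patch_array) for overlapping patches of a 2-D radargram image.

    The stride value is a hypothetical choice; any stride smaller than the
    patch size produces overlap between adjacent windows.
    """
    height, width = radargram.shape[:2]
    for y0 in window_starts(height, patch, stride):
        for x0 in window_starts(width, patch, stride):
            yield x0, y0, radargram[y0:y0 + patch, x0:x0 + patch]
```

Returning the patch offsets `(x0, y0)` alongside each crop makes it possible to map detections in a patch back to radargram coordinates.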
The rebar detection methodology developed by Zatar et al. [9] was essential in accurately identifying multiple hyperbolas within the radargrams. Their approach demonstrated high accuracy in controlled laboratory environments without asphalt overlays, providing a foundational reference for this study. However, detecting rebars beneath asphalt layers presents additional challenges due to signal attenuation and the reduced clarity of hyperbola signatures. To tackle these challenges, a hybrid approach was adopted, combining automated detection with manual verification to ensure greater reliability under in-service bridge conditions. The refined detection results were subsequently used to define bounding box locations for preparing the training dataset.
A hybrid Delphi–Python workflow was implemented to execute the process efficiently. The Delphi program automated the generation of bounding boxes for each annotated hyperbola, ensuring that each hyperbola was fully enclosed, while producing overlapping cropped image patches (default size: 640 × 640 pixels) to prevent any features from being cut off at the edges. In this context, the sliding window approach involves moving a fixed-size rectangular window step by step across the radargram, cropping each region into an image patch for YOLOv8 processing. Overlaps between adjacent windows are incorporated so that hyperbolas spanning window boundaries are fully captured in at least one patch. Additionally, the program generated YOLO-compatible label files, where the bounding box center coordinates (x_c, y_c) and dimensions (w, h) were normalized relative to the image dimensions (W, H) according to:
\[ x_n = \frac{x_c}{W}, \qquad y_n = \frac{y_c}{H}, \qquad w_n = \frac{w}{W}, \qquad h_n = \frac{h}{H} \]
where the subscript n denotes the normalized value written to the label file.
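The normalization of box centers and dimensions by the image width and height can be sketched in Python as follows. This is only an illustration of the standard YOLO label layout (class index followed by normalized center and size); the function name, the pixel-corner input convention, and the single class index 0 are assumptions, not details of the Delphi program.

```python
def to_yolo_label(x_min, y_min, x_max, y_max, img_w, img_h, class_id=0):
    """Convert a pixel-space box to one line of a YOLO label file.

    Inputs are box corners in pixels; class_id=0 assumes a single
    'hyperbola' class, which is a hypothetical choice for this sketch.
    """
    xc = (x_min + x_max) / 2.0 / img_w   # normalized center x
    yc = (y_min + y_max) / 2.0 / img_h   # normalized center y
    w = (x_max - x_min) / img_w          # normalized width
    h = (y_max - y_min) / img_h          # normalized height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For a 640 × 640 patch, a box spanning pixels (160, 320) to (480, 480) yields the line `0 0.500000 0.625000 0.500000 0.250000`.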
The processed dataset was then passed to a Python 3.9 training script utilizing the Ultralytics YOLOv8 framework. This script handled model initialization, hyperparameter configuration (epochs, batch size, learning rate, confidence threshold, and IoU threshold), and Automatic Mixed Precision (AMP) to speed up training and reduce GPU memory usage. It trained the model on the prepared dataset, validated performance on a separate validation set, and output key metrics, including precision, recall, and mean Average Precision (mAP). The training process also generated performance curves, loss vs. epoch, mAP vs. epoch, and F1–confidence curves, to monitor convergence and detection quality.
Improving the effectiveness and reliability of hyperbola detection in GPR imagery requires categorizing the data into three distinct groups based on quality: clear, interfering, and blurry. These categories reflect the visibility and interpretability of hyperbolic signatures in the radargrams, which is crucial for tailoring model training strategies to different data conditions.
Clear data refers to radargrams with well-defined hyperbolas, typically collected from stirrups under laboratory conditions with minimal interference. Interfering data arises when hyperbolas from stirrups are distorted by reflections from nearby main rebars, often leading to overlapping or noisy patterns. This issue is further complicated by internal voids within the box beam cross-section, which disrupt signal propagation and contribute to reflection artifacts. The clear and interfering datasets exhibited similar signal attenuation because both were collected from beams without asphalt overlays; the distinction arises from the interference caused by internal voids in the interfering class. Blurry data is characterized by weak or poorly defined hyperbolas, primarily due to signal attenuation from asphalt overlays. These conditions are common in field-collected radargrams from in-service bridges with asphalt covers ranging from 1.5 to 2.5 inches. The blurry dataset originated from a single asphalt-covered bridge beam, and since no additional asphalt-covered datasets were available, no further sub-categorization of blur severity was performed. Understanding these classifications is crucial for addressing detection challenges and improving model robustness across varying data qualities.
To ensure balanced and representative learning, the training data for each category was selected from specific sources:
- Clear data was collected from Beam #1 and Beam #2.
- Interfering data was obtained from Beam #2 and Beam #3.
- Blurry data was obtained from field radargrams captured on the in-service bridge.
Table 2 provides a summary of the number of images and key training parameters used for each data group. In all cases, images were standardized to a resolution of 640 × 640 pixels, and the YOLOv8s model was utilized with AMP enabled to optimize training performance. To improve model generalization, standard data augmentation techniques were applied, including color jittering, horizontal flipping, scaling, and random erasing.
The training strategy employed was tailored to the specific characteristics of each data group, considering the number of images, potential model challenges, and the application of data augmentation. Each category serves to enhance the model’s ability to generalize across different image qualities, which is crucial for real-world civil and transportation infrastructure applications where image clarity can significantly vary.
The model was trained on a relatively small dataset of clear images for 50 epochs. With a confidence threshold (conf) of 0.3 and an Intersection over Union (IoU) threshold of 0.5, the model aimed to strike a balance between sensitivity and specificity. A batch size of 8 and a learning rate of 0.01 were employed with the SGD optimizer. Despite the limited number of training images, the use of standard data augmentation techniques can help enhance the model’s robustness, enabling it to generalize beyond the training data.
The interfering-data model was trained on a larger set of 371 images over 150 epochs. The model maintained the same confidence and IoU thresholds as in the clear data category. A batch size of 2 was used, as the complexity of the data likely required more careful tuning during training. The considerable size of this dataset suggests that the model would encounter diverse scenarios of interference, thus providing it with a broader exposure to variations in the object detection task.
The training on blurry images, with the highest count of 1036, was conducted over the longest duration of 300 epochs. Here, the model’s confidence threshold was slightly reduced to 0.25 and IoU to 0.45. The smaller batch size of 2 could be attributed to the additional effort required by the model to learn from the more challenging blurry images. The extensive training duration suggests an emphasis on adapting to this data’s unique challenges, with augmentations further assisting the model in identifying blurred objects.
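The per-category settings described above could be organized as in the following sketch, which uses the Ultralytics Python API. It is not the authors' training script: only values stated in the text are included (the learning rate and optimizer are given for the clear category only), the helper names and the `data_yaml` path are hypothetical, and the confidence and IoU thresholds are passed to validation here as an assumption about where they apply.

```python
# Hyperparameters as stated in the text; values not given there are left out.
TRAIN_SETTINGS = {
    "clear":       {"epochs": 50,  "batch": 8, "lr0": 0.01, "optimizer": "SGD",
                    "conf": 0.30, "iou": 0.50},
    "interfering": {"epochs": 150, "batch": 2, "conf": 0.30, "iou": 0.50},
    "blurry":      {"epochs": 300, "batch": 2, "conf": 0.25, "iou": 0.45},
}

def split_settings(group):
    """Return (train_kwargs, val_kwargs) for one data group."""
    settings = dict(TRAIN_SETTINGS[group])
    val_kwargs = {"conf": settings.pop("conf"), "iou": settings.pop("iou")}
    # 640-pixel images and AMP were used for all groups according to the text.
    train_kwargs = {"imgsz": 640, "amp": True, **settings}
    return train_kwargs, val_kwargs

def train_group(group, data_yaml):
    """Train and validate one YOLOv8s model; requires the `ultralytics` package."""
    from ultralytics import YOLO  # imported lazily so the settings can be inspected without it
    train_kwargs, val_kwargs = split_settings(group)
    model = YOLO("yolov8s.pt")                    # pretrained YOLOv8s weights
    model.train(data=data_yaml, **train_kwargs)   # data_yaml: hypothetical dataset config path
    return model.val(data=data_yaml, **val_kwargs)
```

Keeping the settings in one table makes the differences between the three groups (epochs, batch size, thresholds) easy to compare at a glance.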
To evaluate the accuracy of hyperbola detection, the F1 score was employed. The F1 score is defined as the harmonic mean of precision and recall, providing a balanced measure of a model’s ability to accurately identify targets while minimizing both false positives and false negatives. This makes it particularly suitable for GPR hyperbola detection tasks, where both missed detections and false alarms can impact the reliability of the results. The F1 score is calculated as:
\[ F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
Figure 8 presents the F1 score curves for all training datasets. For clear hyperbolas, peak accuracy is achieved at a confidence threshold of 0.515, indicating that well-defined signals require higher confidence levels for optimal detection. Interfering hyperbolas achieve balanced performance at a confidence threshold of 0.351, indicating that moderate filtering is most effective in mitigating overlapping reflections. For blurry hyperbolas, maximum performance occurs within a confidence threshold range of 0 to 0.25, implying that a lower threshold is necessary to preserve weaker detections affected by asphalt-induced attenuation.
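How an F1–confidence curve leads to a preferred threshold can be illustrated with a short sketch. It assumes each detection has already been matched against ground truth, so that it carries a confidence value and a true-positive flag; the function names and the input format are hypothetical, and the numbers in the usage note are illustrative rather than taken from the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_confidence(detections, n_ground_truth, thresholds):
    """Return (best_f1, best_threshold) over the given confidence thresholds.

    detections: list of (confidence, is_true_positive) pairs, assumed to be
    pre-matched to ground truth. n_ground_truth: number of labeled hyperbolas.
    """
    best_f1, best_t = 0.0, None
    for t in thresholds:
        kept = [is_tp for conf, is_tp in detections if conf >= t]
        tp = sum(kept)                       # true positives that survive the threshold
        precision = tp / len(kept) if kept else 0.0
        recall = tp / n_ground_truth if n_ground_truth else 0.0
        score = f1_score(precision, recall)
        if score > best_f1:
            best_f1, best_t = score, t
    return best_f1, best_t
```

With the illustrative input `[(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]`, three ground-truth objects, and thresholds 0.1, 0.3, 0.5, and 0.7, the sweep selects 0.3: a lower threshold admits an extra false positive, while a higher one discards a true detection.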
Figure 9 presents the YOLOv8 training and validation curves for clear, interfering, and blurry datasets, showing stable convergence in loss, precision, recall, and mAP@0.5 values. Clear data achieved optimal performance fastest (~50 epochs), while interfering and blurry data required more epochs due to increased complexity.
Figure 10 shows the corresponding precision–recall curves, where the clear dataset achieved the highest AUC-PR, followed by interfering and blurry data. These results confirm the model’s robustness while illustrating the greater detection challenge in asphalt-covered conditions.
The trained YOLOv8s model was evaluated for its inference speed on a workstation equipped with an NVIDIA GeForce GTX 1050 Ti GPU and an Intel Core i7 CPU. The model achieved an average inference time of approximately 40 milliseconds per image, resulting in a throughput of about 25 frames per second (FPS). This performance meets the practical requirements for near-real-time deployment in the field, allowing for rapid interpretation of GPR radargrams during inspections. Such speed enables inspectors to make informed decisions promptly without delaying the inspection process. Additionally, further optimization, such as deploying the model on a higher-performance GPU or utilizing acceleration frameworks like NVIDIA TensorRT, could enhance processing speed even more, making it suitable for large-scale bridge assessments.
The results in Table 3 show that both YOLOv5 and YOLOv8 achieve high detection accuracy across the three datasets, with precision, recall, and mAP@0.5 values exceeding 0.93 in all cases. YOLOv5 generally achieves slightly higher precision, while YOLOv8 occasionally attains better mAP@0.5–0.95, particularly for the clear and interfering datasets. In terms of computational complexity, YOLOv8 in its small configuration (3.01M parameters for the clear dataset) demonstrates lower GFLOPs, faster inference times, and reduced GPU memory usage compared to YOLOv5, making it more suitable for resource-constrained deployments. However, the larger YOLOv8 configurations (used for interfering and blurry datasets) increase computational cost, leading to slower inference and higher memory requirements compared to YOLOv5. Overall, the comparison confirms that YOLOv8 can match or exceed YOLOv5 in accuracy while offering potential efficiency advantages depending on the chosen model size, enabling flexible trade-offs between accuracy and computational demands for different field conditions.
10. Results and Discussions
Figure 11 shows the hyperbolas detected by the interfering-trained model in radargrams with no void interference (left) and with high void interference (right). Despite the presence of strong horizontal reflections and diffractions caused by voids near the surface, the model successfully detects all four hyperbolic signatures representing rebars. This indicates YOLOv8’s ability to distinguish structural hyperbolas from background noise and non-rebar reflections, which often confuse traditional analytical approaches. Each bounding box is well-aligned with the apex of the hyperbolas, demonstrating high localization precision, even where signal distortion is evident, especially in the second and third hyperbolas.
Figure 12 presents a comparison between the YOLOv8 detection method and the analytical method for detecting rebars in 97 transverse radargrams collected from prestressed concrete beam #1. Each radargram contains four rebars at known locations, serving as ground truth for evaluation. The YOLOv8 detection method demonstrated significantly more consistent and accurate performance. It correctly detected exactly four rebars in 75% of the scans while over-predicting in 7% and under-predicting in 18% of the cases. In contrast, the analytical method correctly identified four rebars in only 41% of scans, over-predicting in 32% and under-predicting in 27% of the cases. The concentration of errors at the beginning of the beam can be attributed to the dense arrangement of stirrups in this region, with a spacing of only 2 inches to resist high shear forces. This close spacing produces overlapping reflections from both the stirrups and the main longitudinal rebars, which distorts the hyperbolic signatures in the radargrams. As a result, the reflections are difficult for the detection model to resolve, leading to reduced accuracy in this localized area.
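The exact, over-predicted, and under-predicted percentages reported above follow from comparing the per-scan detection count with the known count of four rebars. A minimal sketch of that tally is given below; the function name and the list-of-counts input are assumptions for illustration.

```python
from collections import Counter

def count_outcomes(detected_counts, expected=4):
    """Percentage of scans with exact, over-, and under-predicted rebar counts.

    detected_counts: number of detections in each radargram (hypothetical input).
    expected: known number of rebars per radargram (four in the text).
    """
    outcomes = Counter(
        "exact" if n == expected else "over" if n > expected else "under"
        for n in detected_counts
    )
    total = len(detected_counts)
    return {key: 100.0 * outcomes[key] / total for key in ("exact", "over", "under")}
```

For example, the illustrative counts `[4, 4, 5, 3, 4, 4, 2, 4]` give 62.5% exact, 12.5% over-predicted, and 25% under-predicted scans.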
YOLOv8’s lower false positive rate and higher consistency indicate that it is more robust against interference and signal distortion, particularly in regions affected by overlapping hyperbolas or noise. By learning from a diverse training set, the YOLOv8 model generalizes well to complex patterns that often confound traditional curve-fitting approaches. These results further support the superiority of the AI-based method for reliable automated rebar detection in real-world concrete structures.
Figure 13 presents a radargram of the in-service bridge from the longitudinal direction, with rebar locations marked. The hyperbolas shown represent rebar detections obtained using both the analytical and YOLOv8 detection methods. A comparison between the YOLOv8 detection and the analytical method for identifying hyperbolas corresponding to stirrups in a field scan is shown in Figure 14. The data were collected from a reinforced concrete bridge with a 2- to 3-inch asphalt cover, using a clear-data-trained model for YOLOv8 inference. Despite the attenuation and dispersion caused by the asphalt overlay, the YOLOv8 model was able to closely follow the rebar reflection pattern with a high degree of consistency.
The analytical method, based on curve-fitting of hyperbolic signatures, demonstrated reasonable performance but exhibited greater fluctuation, particularly at lower amplitudes or in the presence of distorted reflections. Quantitatively, the YOLOv8 model detected all but one stirrup, while the analytical method missed four. Additionally, the time-of-flight (TOF) estimates from the YOLOv8 predictions align more tightly along the trendline, indicating greater stability and robustness across varying scan traces. This comparison highlights the effectiveness of the YOLOv8 model in detecting subsurface objects, even when signal quality is degraded. Its performance suggests strong potential for field deployment without requiring retraining or tuning, provided the model has been exposed to diverse training conditions.
Figure 15 illustrates the detected hyperbolas using the blurry-trained model applied to transverse GPR scans of the in-service bridge. The reflection signals from the main rebars in these radargrams were highly distorted and significantly attenuated due to the presence of a thick asphalt overlay and concrete cover. A filtering algorithm was applied to the initial YOLOv8 hyperbola detections to reduce false positives and improve detection accuracy.
This algorithm removed redundant or inaccurate bounding boxes based on spatial overlap and position criteria. Before filtering, the model detected approximately 40 hyperbolas per image, significantly overestimating the actual count. After applying the filtering logic, which eliminates boxes whose top-center point lies within a more confident detection, only 24 hyperbolas remained, closely aligning with the expected number of rebars. This post-processing step proved essential for refining detection results, especially in cases with overlapping or noisy signals.
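The filtering rule stated above, removing a box whose top-center point lies inside a more confident detection, can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes image coordinates with y increasing downward (so the top edge is `y_min`), and it checks each box only against boxes that have already been kept, processed in order of decreasing confidence.

```python
def filter_detections(boxes):
    """Drop boxes whose top-center point falls inside a more confident kept box.

    boxes: list of (x_min, y_min, x_max, y_max, confidence) in pixel
    coordinates with y increasing downward (an assumption of this sketch).
    Returns the kept boxes, ordered by decreasing confidence.
    """
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        x_top = (box[0] + box[2]) / 2.0   # horizontal center of the top edge
        y_top = box[1]                    # top edge of the box
        inside = any(
            k[0] <= x_top <= k[2] and k[1] <= y_top <= k[3] for k in kept
        )
        if not inside:
            kept.append(box)
    return kept
```

Comparing only against kept boxes mirrors the greedy structure of non-maximum suppression; comparing against every more confident box, kept or not, would be an equally plausible variant of the stated rule.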
Figure 16 illustrates the performance of the YOLOv8 model in detecting rebars from blurry radargrams obtained from the field sections, which exhibit weak signal quality due to a thick asphalt layer. The analytical method was excluded from this evaluation due to its inability to detect hyperbolas under such degraded conditions. Two curves are shown: the initial detection (solid thin line) and the filtered detection (dashed line), with the actual number of rebars (22 per section) shown as a reference (bold horizontal line).
The initial detection was consistently overestimated, with an average of 30 detections per section, primarily due to overlapping or duplicate bounding boxes around low-contrast hyperbolas. After applying the filtering algorithm, which removes redundant detections based on geometric and confidence criteria, the average detection was reduced to 21, closely approximating the actual number of rebars. This result underscores the significance of post-processing in refining YOLOv8 outputs and minimizing false positives, particularly in low-visibility conditions. The filtered YOLOv8 predictions provide a reliable estimate of rebar count in blurry GPR images, where traditional methods fail.
The study results show that the filtering method significantly reduces false positives while preserving accurate rebar detections. They highlight the influence of tailored model training and robust post-processing in achieving reliable performance. The findings will inform recommendations for selecting or combining training datasets when preparing models for field deployment. Ultimately, this work contributes to improving the reliability and flexibility of AI-assisted rebar detection in diverse inspection scenarios.
The output from the YOLOv8 detection framework provides precise spatial coordinates of the detected rebars. These measurements are directly translated into engineering drawings that map both longitudinal and transverse reinforcement layouts. By accurately determining rebar spacing and location, the generated drawings not only document existing reinforcement configurations but also serve as a basis for evaluating structural capacity. This information enables engineers to assess the load-carrying performance of the beam, identify potential deficiencies, and plan targeted maintenance or rehabilitation measures, ensuring long-term structural integrity.
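As a rough illustration of how detected box coordinates could be turned into rebar positions and spacing for a drawing, consider the sketch below. The function name and the `units_per_pixel` horizontal scale are hypothetical; in practice the scale would come from the GPR survey-wheel calibration and the patch offsets, neither of which is specified in the text.

```python
def rebar_positions_and_spacing(boxes, units_per_pixel):
    """Convert box centers to along-scan positions and center-to-center spacing.

    boxes: (x_min, y_min, x_max, y_max) in radargram pixel coordinates.
    units_per_pixel: horizontal scan distance per pixel (hypothetical scale).
    Returns (sorted positions, list of gaps between neighboring positions).
    """
    positions = sorted((b[0] + b[2]) / 2.0 * units_per_pixel for b in boxes)
    spacing = [right - left for left, right in zip(positions, positions[1:])]
    return positions, spacing
```

The box center is used as the rebar position here because the hyperbola apex sits near the horizontal middle of a well-fitted bounding box; this is a simplifying assumption of the sketch.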