From Simulation to Reality: GAN-Based Transformation of Pavement Defect Images for YOLO Detection

Yang, Jiangang; Yu, Shukai; Yao, Yuquan; Cao, Shiji; Ai, Xiaojuan

doi:10.3390/app16062978

by Jiangang Yang ¹,²,³, Shukai Yu ³, Yuquan Yao ¹,²,³,*, Shiji Cao ³ and Xiaojuan Ai ³

¹ State Key Laboratory of Safety and Resilience of Civil Engineering in Mountain Area, East China Jiaotong University, Nanchang 330013, China

² Jiangxi Provincial Key Laboratory of Traffic Infrastructure Safety, East China Jiaotong University, Nanchang 330013, China

³ School of Civil Engineering and Architecture, East China Jiaotong University, Nanchang 330013, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(6), 2978; https://doi.org/10.3390/app16062978

Submission received: 27 February 2026 / Revised: 16 March 2026 / Accepted: 18 March 2026 / Published: 19 March 2026

(This article belongs to the Topic Service Safety and Green Maintenance Technology for Road Infrastructure in Complex Environments)


Abstract

The application of three-dimensional ground-penetrating radar (3D GPR) for intelligent pavement defect analysis is often constrained by the limited availability of labeled samples. To address this challenge, this study employed Ground Penetrating Radar Maxwell (GprMax) to simulate typical pavement defects, including cracks, loose materials, and interlayer debonding. A Cycle-Consistent Generative Adversarial Network (Cycle-GAN) was then introduced to perform style transfer on the simulated images, thereby reducing the domain gap between simulated and real radar images. Furthermore, four You Only Look Once (YOLO) models—YOLO version 5, YOLOX, YOLO version 7, and YOLO version 8—were systematically compared using real datasets to identify the best-performing model, which was subsequently used to evaluate the effect of different proportions of synthetic data on detection performance. The results demonstrated that the moderate inclusion of synthetic data improved the recognition accuracy of loose defects (from 76.7% to 78.9%), whereas its impact on crack and debonding detection was negative. Moreover, excessive reliance on synthetic data led to overfitting, thereby reducing the model’s generalization capability. Among the four models, YOLOv7 achieved the best overall performance, with a mean Average Precision (mAP) of 83.4% and a crack detection rate of 88.2%. This study thus provides a feasible technical pathway and model selection reference for automated GPR-based pavement defect identification, offering practical value for efficient and accurate road maintenance inspections.

Keywords:

ground penetrating radar (GPR); deep learning; YOLO; Cycle-GAN; pavement defect detection; domain adaptation

1. Introduction

Throughout their service life, roads are continuously exposed to traffic loading and environmental influences, which collectively contribute to the formation of pavement distress that threatens driving safety. Based on their spatial characteristics and visibility, pavement distress can be classified into surface-level defects and subsurface structural defects. Subsurface structural defects—such as cracks, disintegrated materials, and inadequate interlayer bonding—are recognized as critical factors that impair pavement performance and shorten service life.

Although traditional methods such as manual inspection and core drilling have provided valuable reference data, they are typically destructive, labor-intensive, and spatially limited, rendering them insufficient for the demands of modern pavement maintenance. Ground Penetrating Radar (GPR) [1,2,3], a non-destructive sensing technology that detects subsurface structures by transmitting and receiving electromagnetic waves, has been increasingly applied in pavement defect detection. Its use has significantly improved the convenience, coverage, and accuracy of subsurface defect detection in pavements [4]. However, GPR data are often complex and noisy, making manual interpretation challenging and inefficient [5]. In addition, the diversity and complexity of subsurface pavement defects (for example, variations in size, depth, and shape) further increase the difficulty of data analysis.

To investigate the mechanisms of subsurface target formation in radar images, researchers have employed various methods to model and simulate underground targets. For instance, Shangguan et al. used a forward modeling approach based on the finite-difference time-domain (FDTD) method to simulate GPR echoes and their interactions with pavements under varying density and moisture conditions. The validity of the simulation results was confirmed through laboratory experiments [6]. Hong et al. [7] demonstrated that rebar corrosion significantly affects GPR signals. For example, a decrease in rebar diameter and an increase in crack width reduce the amplitude of reflected waves, while the presence of corrosion products within cracks enhances the amplitude. This study enriched the theoretical understanding of GPR in detecting rebar corrosion. Additionally, Wang et al. [8] validated the application of GPR in detecting soil wetting bodies (SWB) through experiments and numerical simulations. They extracted size information of SWBs using the FK migration method, providing technical support for groundwater resource monitoring. Razali et al. [9] explored the impact of operating frequency on GPR imaging quality. They conducted simulation experiments targeting subsurface objects of varying sizes and depths, offering experimental evidence for the application of GPR systems in diverse scenarios.

Faced with challenges such as complex image features, limited data samples, and significant domain discrepancies in simulation, researchers have adopted deep learning techniques to automate the interpretation of pavement and radar images. Zhang et al. [10] used deep convolutional neural networks (CNNs) to classify image patches, achieving significantly better crack detection performance than traditional handcrafted feature extraction methods. Huyan [11] proposed CrackU-net, which employs a U-shaped network architecture to achieve pixel-level crack detection. Hou [12] developed an automatic method based on deep instance segmentation, combining a novel anchor box scheme with Mask Scoring R-CNN (Regions with Convolutional Neural Network features) to accurately segment targets from GPR scans. Kang et al. [13] developed a background filtering algorithm based on Basis Pursuit to enhance the visibility of subsurface objects. Li [14] proposed an automatic method for detecting pavement bottom damage that combines traditional signal processing with deep learning, using a piecewise linear function to implement automatic gain adjustment for radar signals. Dai [15] employed a prior-based three-dimensional convolutional neural network (3D CNN) combined with a feature attention mechanism, which effectively suppressed C-scan noise caused by complex subsurface soil conditions and improved image recognition accuracy.

Despite notable progress in radar image recognition, several challenges remain in practical applications. These include significant domain discrepancies between simulated and real data, as well as the multi-scale distribution of defect features, which makes accurate identification difficult. Traditional two-stage object detection models, such as R-CNN [16], Fast R-CNN [17], and Faster R-CNN [18], offer high accuracy in complex environments but are computationally intensive and therefore poorly suited to real-time detection. In contrast, single-stage models such as the You Only Look Once (YOLO) series, including YOLOv5 [19], YOLOv7 [20], YOLOv8 [21], and YOLOX [22], have become mainstream in recent research and engineering applications because of their efficient detection performance and low computational cost. However, training object detection models on simulated radar images presents a major challenge: these images differ significantly from real radar data in texture, noise, and contrast. This domain discrepancy directly affects the generalization ability and recognition accuracy of the models. Although previous studies attempted to mitigate data scarcity by increasing the volume of simulated data, they did not address the fundamental limitation that synthetic images cannot fully substitute for real ones. To overcome this issue, this study introduces Cycle-Consistent Adversarial Networks (Cycle-GAN). Originally proposed by Zhu et al. [23], Cycle-GAN is a style transfer model capable of translating between image domains without requiring paired training data. Its key advantage lies in its ability to reduce style differences, such as those in texture, contrast, and noise, between synthetic and real radar images. With more realistic synthetic images available for training, a detection model can better learn the spatial and morphological characteristics of pavement defects [24]. Additionally, by evaluating the similarity between generated and real images, the quality of the synthetic data can be monitored and controlled, supporting its effectiveness and reliability in real-world applications.

However, existing studies have mainly focused on either simulation-based data generation, single-model defect detection, or image-level domain translation separately. A systematic framework that jointly combines physics-based GPR simulation, simulation-to-real style transfer, cross-model detector comparison, and synthetic-data ratio analysis for pavement-defect recognition remains insufficiently explored. In particular, previous studies have rarely examined whether simulation-translated data can provide consistent benefits across different defect categories and different YOLO architectures under the same experimental setting.

The main contributions of this study are summarized as follows:

1. An integrated simulation-to-real detection framework for GPR pavement-defect recognition is established.

2. A systematic benchmark comparison of multiple YOLO detectors is conducted under the same GPR imaging conditions.

3. The practical utility of simulation-translated synthetic data is evaluated in a class-dependent manner through real-to-synthetic ratio experiments.

4. Practical guidance for simulation-assisted GPR pavement inspection is provided.

The findings of this study provide a more realistic basis for dataset construction, detector selection, and the cautious use of synthetic data in intelligent pavement inspection systems.

2. 3D Ground Penetrating Radar Data Acquisition

This experiment used the GeoScope™ 3D Ground Penetrating Radar system developed by 3D-Radar (Norway). The system is a stepped-frequency 3D GPR characterized by high resolution and a wide depth range (Figure 1). By utilizing frequency-modulated continuous wave (FMCW) technology, it offers enhanced resolution and penetration depth.

The system consists of five main components: a signal processing unit, a transceiver antenna array, a Distance Measuring Instrument (DMI), a Real-Time Kinematic (RTK) positioning system, and a data acquisition computer. Within the system architecture, the signal processing unit handles the received radar signals to extract information about subsurface targets. The transceiver antenna array is responsible for transmitting and receiving radar signals, achieving efficient system performance through optimized antenna configuration. The DMI records the position of the radar system during ground movement, providing accurate location data for subsequent data processing. The RTK positioning system further enhances the accuracy of positioning, ensuring the precision and reliability of radar data. Finally, the data acquisition computer integrates and stores data from all components, enabling effective management and analysis of subsurface target information. The system components and data acquisition process are illustrated in Figure 2.

This experiment was conducted along a section of an expressway project in Hubei Province, China. The highway spanned a total length of 185 km, with four lanes in both directions and a designed speed of 100 km per hour. During data acquisition, a time window of 25 ns was used, with a trace spacing of 4 cm and a dwell time of 2 μs to ensure adequate temporal and spatial resolution of the radar signals. Detailed information is presented in Figure 3.

3. Dataset Construction

To evaluate the effectiveness of the defect recognition framework—combining simulated images, style transfer, and deep detection models—we constructed a representative radar-image dataset featuring diverse defect characteristics. Given the scarcity of real radar images and the high cost of their annotation, we leveraged the flexibility of simulation to generate images with specific defect structures, despite the resulting domain gap relative to real data. We therefore assembled two separate datasets: one by extracting appropriate sections from field-collected radar scans, and the other by selecting images from simulation outputs. Both datasets satisfied the basic requirements for model training and ensured the smooth execution of our experiments.

3.1. Construction of the Real-World Dataset

(1): Data Preprocessing

The dataset construction involved data collection, processing, segmentation, and label creation. The detailed preprocessing workflow is shown in Figure 4. The raw 3D-GPR output is a series of complex frequency-domain samples, which must be converted to the time domain for interpretable visualization. This conversion was performed with the inverse fast Fourier transform (IFFT). The specific parameters are listed in Table 1.

The preprocessing parameters listed in Table 1 were selected empirically to balance noise suppression and defect-feature preservation in the 3D-GPR images. Specifically, the Kaiser window was adopted in the IFFT step because it provided a stable compromise between sidelobe suppression and signal resolution during the frequency-to-time-domain conversion. The background removal filter was configured using a sliding-window mean to suppress horizontal clutter and system background while retaining localized defect reflections. The gradual low-pass filtering was applied in both vertical and horizontal directions to reduce high-frequency noise and improve the continuity of subsurface target responses. In addition, the migration parameters were chosen to enhance the focusing of hyperbolic and interface-related reflections without introducing obvious geometric distortion. These settings were determined through repeated visual inspection of the processed radar images to obtain clearer defect signatures for subsequent annotation and detection-model training.
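The frequency-to-time conversion described above can be sketched as follows. This is a minimal illustration only: the function name, the Kaiser shape parameter `beta`, and the number and spacing of frequency steps are hypothetical values chosen for the example, not the settings of Table 1.

```python
import numpy as np


def stepped_frequency_to_time(freq_samples, beta=6.0, n_fft=None):
    """Convert one trace of complex frequency-domain samples to the time domain."""
    freq_samples = np.asarray(freq_samples, dtype=complex)
    n = freq_samples.shape[-1]
    window = np.kaiser(n, beta)      # taper that trades sidelobe level against resolution
    n_fft = n_fft or n               # optional zero-padding for finer time sampling
    return np.fft.ifft(freq_samples * window, n=n_fft)


# Synthetic check: a single reflector with two-way delay t0 produces a
# linear phase ramp across the frequency steps.
n_freq, df = 256, 10e6               # hypothetical: 256 steps spaced 10 MHz apart
t0 = 20e-9                           # hypothetical two-way delay of 20 ns
k = np.arange(n_freq)
spectrum = np.exp(-2j * np.pi * k * df * t0)

trace = stepped_frequency_to_time(spectrum)
dt = 1.0 / (n_freq * df)             # time sampling interval of the IFFT output
peak_time = np.argmax(np.abs(trace)) * dt
```

In this toy case the strongest time-domain sample falls within one sampling interval of the 20 ns delay, which is the behavior the IFFT step relies on. Raising `beta` lowers sidelobes at the cost of a wider main lobe, which is the compromise the text attributes to the Kaiser window.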

(2): Data Annotation

After data preprocessing, clear radar images were obtained. Each image covered 60 m × 0.8 m at a resolution of 1340 × 546 pixels. These images were divided into three segments of 20 m × 0.8 m each and uniformly renamed. The cutting process is illustrated in Figure 5. To avoid spatial leakage between the training and validation sets, the dataset split was performed at the level of the original scan strips before image cropping. Specifically, each 60 m radar image was assigned entirely to either the training set or the validation set, and only then segmented into adjacent 20 m sections. Neighboring crops derived from the same original scan were therefore never distributed across different subsets. This protocol kept the validation set spatially independent of the training data and reduced the risk of inflated generalization performance.

For data annotation, abnormal targets within the detection areas were first identified manually. LabelImg (v1.8.0) was used to annotate these targets, generating annotation files in XML (Extensible Markup Language) format. These files were then imported and normalized, producing training labels corresponding to the original radar images, as shown in Figure 6. In total, 10,000 real GPR images with their corresponding annotations were generated to construct the final real-world dataset for model training and evaluation. To improve dataset transparency and reproducibility, the class-wise distribution used in this study is summarized in Table 2. To ensure a fair comparison in the subsequent training and ratio experiments, the augmented synthetic dataset was constructed to follow the same class-wise composition as the real dataset.
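The strip-level split described in this subsection can be sketched as below. The function name, strip identifiers, validation fraction, and random seed are hypothetical; the text does not state the split ratio that was used.

```python
import random


def split_then_crop(strip_ids, val_fraction=0.2, crops_per_strip=3, seed=0):
    """Assign whole scan strips to train/val first, then enumerate their crops."""
    ids = sorted(strip_ids)
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    val_strips, train_strips = ids[:n_val], ids[n_val:]

    def crops(strips):
        # Each 60 m strip yields three adjacent 20 m sections, indexed 0..2
        return [(strip, k) for strip in strips for k in range(crops_per_strip)]

    return crops(train_strips), crops(val_strips)


train_crops, val_crops = split_then_crop([f"strip_{i:03d}" for i in range(10)])
train_strips = {strip for strip, _ in train_crops}
val_strips = {strip for strip, _ in val_crops}
```

Because the assignment happens before cropping, no strip identifier can appear in both subsets, which is the property that prevents adjacent 20 m sections from leaking across the split.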

3.2. Construction of the Simulated Dataset

Compared with real images, simulated images allow flexible control over defect type, shape, and location, offering greater freedom in sample construction and helping to address the problem of limited real data. However, because the imaging mechanisms and visual styles of simulated and real images differ significantly, training deep learning models directly on simulated data often leads to reduced recognition accuracy or even overfitting. To address this, structurally complete and parameter-defined defect simulations were first generated using GprMax 3.0. Cycle-GAN was then used to perform unsupervised style transfer from the simulated domain to the real-image domain. The result was a controllable training dataset with high visual similarity to real images.

GprMax is an electromagnetic wave simulator based on the finite-difference time-domain (FDTD) method. Initially developed in 1996 [25], it has been one of the most widely used simulation tools in the GPR field over the past two decades [26,27]. GprMax supports several advanced features, such as modeling material anisotropy, representing dielectric dispersion with multi-pole Debye functions, and simulating pavement structures with semi-empirical dielectric formulas and fractal geometric characteristics.

(1): Simulation Based on GprMax 3.0

The simulated road was designed as a 5 m × 1 m rectangle. Various parameters for the media were configured. Following the pavement structure, three structural layers were defined: surface layer, base layer, and sub-base layer, with relative permittivity values of 4, 9, and 12, respectively [28]. The pavement structure, from top to bottom, consisted of a 0.15 m surface layer, a 0.6 m base layer, and a 0.25 m sub-base layer. During the GprMax simulation, a time window of 25 ns and an electromagnetic wave excitation frequency of 2 GHz were used. The transmitting–receiving pair moved along the simulated surface with a step size of 0.02 m. The dielectric constant of asphalt concrete was set to 6.5, while the dielectric constant of air was defined as 1. The simulation parameters are listed in Table 3.
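As a quick plausibility check on these settings, the sketch below estimates the two-way travel time through the three layers from their thicknesses and relative permittivities (velocity v = c / sqrt(ε_r)) and compares it with the 25 ns time window. It also counts the scan positions implied by the 0.02 m step. The calculation assumes lossless, homogeneous layers and ignores the antenna offset, so it is an approximation rather than a reproduction of the simulation.

```python
C_M_PER_NS = 0.299792458  # speed of light in vacuum, metres per nanosecond

# (name, thickness in m, relative permittivity) as given in the text
layers = [("surface", 0.15, 4.0), ("base", 0.60, 9.0), ("sub-base", 0.25, 12.0)]


def two_way_time_ns(thickness_m, eps_r):
    """Two-way travel time through one layer, assuming v = c / sqrt(eps_r)."""
    velocity = C_M_PER_NS / eps_r ** 0.5
    return 2.0 * thickness_m / velocity


total_ns = sum(two_way_time_ns(t, e) for _, t, e in layers)

# Upper bound on the number of trace positions along the 5 m model
n_traces = round(5.0 / 0.02)
```

Under these assumptions the reflection from the bottom of the sub-base arrives after roughly 19.8 ns, so the 25 ns window covers the full 1 m structure, and the 5 m model corresponds to at most 250 trace positions.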

(2): Analysis of Simulation Results

The simulated results of various types of pavement defects are shown in Figure 7. Crack defects (Figure 7a) were simulated with varying depths and shapes. In the radar images, they appeared as high-amplitude vertical reflections with a characteristic inverted “U” shape, indicating enhanced signal reflections from the sub-base to the surface layer. To further validate the authenticity of these radar features, field coring was performed at the corresponding locations of the real GPR images, and the extracted cores confirmed the presence of structural cracking consistent with the radar response. Loose defects (Figure 7b) were modeled by introducing water- and air-filled voids within the structure, resulting in significant signal scattering. The radar images of loose materials exhibited noticeable variations in amplitude and signal delay, producing rough, mottled and discontinuous textures. Coring verification at these positions revealed loosened aggregate and voided areas, which aligned with the scattering characteristics observed in the radar images. Interlayer debonding defects (Figure 7c) were simulated by inserting thin air layers between structural layers. These generated enhanced interlayer reflections, typically appearing in radar images as distinct gap-like signals with high detectability. Corresponding field cores clearly showed separation between pavement layers, further confirming that the radar-identified features accurately represented interlayer debonding in the actual pavement structure.

To strengthen the empirical validation of the simulated radar signatures, field coring was conducted at radar-anomaly locations identified from the real GPR survey. The coring positions were determined according to the spatial coordinates or chainage of the anomalous reflections in the radar images, using the acquisition positioning information to ensure correspondence between radar responses and physical sampling locations. In total, 20 cores were extracted, including 7 for crack-like anomalies, 6 for loose-material anomalies, and 7 for interlayer debonding anomalies. Among these, 7/7, 5/6, and 6/7 anomalies, respectively, were physically confirmed by the extracted cores (18 of 20 overall). These results provide direct field evidence that the characteristic radar responses described in Figure 7 are associated with actual pavement structural defects.

Simulated radar data were generated using GprMax 3.0. To ensure consistency across the simulations, radar images of various pavement defects (cracks at different depths, interlayer debonding, and loose materials) were created on the predefined pavement structure. A batch simulation process initially produced 100 radar images, which were then expanded through data augmentation to a total of 10,000 simulated radar images for training and analysis. Even before augmentation and Cycle-GAN translation, the initial 100 GprMax-simulated images contained basic structural diversity rather than repeated instances of a single defect template: the simulated defect set varied in defect size, burial depth, geometric shape, and spatial position within the pavement structure. Representative examples of these initial simulated defect configurations are shown in Figure 8.

(3): Radar Image Generation Based on Adversarial Neural Networks

The Cycle-GAN model was used to generate radar images, aiming to achieve mutual conversion between simulated and real radar images [29]. This approach enables effective mapping between image domains without the need for paired image data.

(1) Generators and Discriminators

Cycle-GAN consists of two generators: G₁, which transforms simulated radar images into real-style images, and G₂, which converts real radar images into simulated-style images. Each generator comprises three main components: an encoder, a transformer, and a decoder. The encoder progressively extracts low-dimensional representations from the input image, capturing its semantic features. The transformer maps these features from the source domain to the target domain. The decoder then reconstructs a high-resolution image in the target domain from the transformed features [30,31]. The structure of the generator is shown in Figure 9.

The Cycle-GAN architecture includes two discriminators, each designed to distinguish real images of one domain from images produced by the generator that targets that domain. For example, the discriminator paired with G₁ (D₂ in Equations (1) and (3)) compares its input with real radar images and outputs the probability that the input belongs to the real radar image domain, as illustrated in Figure 10. Each discriminator is trained to classify real images as real and generated images as fake, which provides the adversarial signal for training the generator.

(2) Loss Functions

The Cycle-GAN objective combines an adversarial loss, a cycle consistency loss, and an identity loss. The adversarial loss encourages each generator to produce realistic translated images capable of deceiving the corresponding discriminator. The cycle consistency loss enforces consistent domain transformations by minimizing the difference between an input image and its reconstruction after two successive transformations. The identity loss encourages a generator to leave an image unchanged when that image already belongs to its target domain. The total loss is the weighted sum of these terms. The generators are trained to minimize it, while the discriminators are trained to separate real from generated images, so optimizing the two sides jointly yields high-quality image translation generators.

The adversarial loss governs the interaction between each generator and its discriminator. The generator's objective is to create synthetic samples that are indistinguishable from real ones, while the discriminator's goal is to accurately differentiate real from generated samples. For a generator G and discriminator D, the discriminator is trained to maximize the log-probability of labeling real images as real and generated images as generated, whereas the generator is trained to minimize the log-probability that its outputs are recognized as generated. For the two translation directions, the adversarial losses are defined as follows:

L_{1} (G_{1}, D_{2}, X, Y) = \log D_{2} (y) + \log (1 - D_{2} (G_{1} (x)))

(1)

L_{2} (G_{2}, D_{1}, Y, X) = \log D_{1} (x) + \log (1 - D_{1} (G_{2} (y)))

(2)

where G₁: X → Y and G₂: Y → X are the forward and backward generators, and D₁ and D₂ are the discriminators for the simulated domain X and the real domain Y, respectively. Here x ∈ X denotes a simulated radar image and y ∈ Y denotes a real radar image.

The cycle consistency loss is a key component of Cycle-GAN. Its objective is to ensure that an image returns to its original content after passing through both generators, achieving bidirectional consistency. During training, it forces the generators to learn mappings such that an image, after a forward and a reverse transformation, is as close as possible to the original image, as shown in Figure 11. Separately, to address the training instability commonly encountered in GANs, the adversarial terms were implemented in least-squares form, which replaces the logarithmic terms of Equations (1) and (2) with squared errors between the discriminator outputs and their target labels. This allows the generators to learn the translation mappings more stably and precisely. The least-squares adversarial losses are as follows:

L_{adv 1} (G_{1}, D_{2}, x, y) = \frac{1}{2} \times [{(D_{2} (G_{1} (x)) - 1)}^{2} + {(D_{2} (y) - 1)}^{2} + {(D_{2} (G_{1} (x)))}^{2}]

(3)

L_{adv 2} (G_{2}, D_{1}, x, y) = \frac{1}{2} \times [{(D_{1} (G_{2} (y)) - 1)}^{2} + {(D_{1} (x) - 1)}^{2} + {(D_{1} (G_{2} (y)))}^{2}]

(4)

L_{adv 3} (G_{1}, G_{2}, D_{1}, D_{2}) = L_{adv 1} (G_{1}, D_{2}, x, y) + L_{adv 2} (G_{2}, D_{1}, x, y)

(5)

where L_adv1 and L_adv2 represent the adversarial losses between G₁ and D₂, and G₂ and D₁, respectively. L_adv3 represents the final adversarial loss.
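As a numerical illustration of Equation (3), the sketch below evaluates the squared-error terms on scalar discriminator outputs. In least-squares GAN training the first term is the part minimized by the generator and the remaining two are the part minimized by the discriminator, so the sketch returns them separately before summing. The function names and the sample scores are hypothetical and are not taken from the study's code.

```python
def lsgan_generator_term(d_fake):
    """0.5 * mean((D(G(x)) - 1)^2): small when generated images are scored as real."""
    return 0.5 * sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)


def lsgan_discriminator_term(d_real, d_fake):
    """0.5 * [mean((D(y) - 1)^2) + mean(D(G(x))^2)]: small when D separates the sets."""
    real_part = sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake_part = sum(s ** 2 for s in d_fake) / len(d_fake)
    return 0.5 * (real_part + fake_part)


# Hypothetical discriminator scores in [0, 1] for a small batch
d_real = [0.9, 0.8]   # D2(y) on real radar images
d_fake = [0.2, 0.4]   # D2(G1(x)) on translated simulated images

gen_term = lsgan_generator_term(d_fake)
disc_term = lsgan_discriminator_term(d_real, d_fake)
adv1 = gen_term + disc_term   # full bracket of Equation (3), averaged over the batch
```

With these scores the discriminator term is small (the two sets are well separated) while the generator term is large, which is the situation in which the generator receives the strongest push toward more realistic outputs.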

The identity loss ensures that an image already in a generator's target domain is left essentially unchanged. Generator G₁ produces images in the style of the real domain Y, so feeding a real image y into G₁ should return y itself; in other words, G₁(y) and y should be as close as possible, and likewise G₂(x) and x. Without such a term, a generator might alter the overall tone of the image and cause global intensity or color shifts. Introducing the identity loss therefore helps preserve image consistency and original features while the generator learns the desired style. The identity loss is calculated as follows:

L_{Identity} (G_{1}, G_{2}) = E_{y \sim p_{data} (y)} [\| G_{1} (y) - y \|_{1}] + E_{x \sim p_{data} (x)} [\| G_{2} (x) - x \|_{1}]

(6)

The total loss function combines these terms as a weighted sum, as shown in Equation (7). The generators are trained to minimize the adversarial, cycle consistency, and identity losses, while the discriminators are trained to distinguish real images from generated ones. By optimizing these objectives, Cycle-GAN trains generators capable of bidirectional image translation between domains.

L (G_{1}, G_{2}, D_{1}, D_{2}) = L_{GAN} (G_{1}, D_{2}, X, Y) + L_{GAN} (G_{2}, D_{1}, Y, X) + \lambda L_{cyc} (G_{1}, G_{2})

(7)

where L_GAN(G₁, D₂, X, Y) represents the adversarial loss for generating domain Y from domain X, L_GAN(G₂, D₁, Y, X) represents the adversarial loss for generating domain X from domain Y, and L_cyc(G₁, G₂) denotes the cycle consistency loss for generators G₁ and G₂. The coefficient λ is a hyperparameter that scales the cycle consistency loss (lambda_A and lambda_B in the training settings). During training, the identity loss of Equation (6) is added to this objective with its own weight (lambda_identity).
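Equation (7) refers to L_cyc without writing it out. For reference, the cycle consistency term in the original Cycle-GAN formulation of Zhu et al. [23] takes the following form when expressed in the present notation (G₁: X → Y, G₂: Y → X). Whether this study used exactly this L1 form is an assumption, since the text does not give the expression.

```latex
L_{cyc} (G_{1}, G_{2}) = E_{x \sim p_{data} (x)} [\| G_{2} (G_{1} (x)) - x \|_{1}] + E_{y \sim p_{data} (y)} [\| G_{1} (G_{2} (y)) - y \|_{1}]
```

The first expectation penalizes the simulated → real → simulated round trip and the second penalizes the real → simulated → real round trip, so both generators are constrained at once.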

(3) Model Architecture and Principles

The core of Cycle-GAN lies in achieving bidirectional image translation between two domains using two generators (Generator A → B and Generator B → A, corresponding to G₁ and G₂). The cycle consistency loss additionally ensures that a translated image, after reverse mapping, remains consistent with the original image. The detailed steps are illustrated in Figure 12.

(4) Environment Configuration and Training Parameter Settings

The Cycle-GAN model was implemented using the PyTorch (v1.10.0) framework. The training parameters were set as follows: the number of epochs was 200; the identity loss weight (lambda_identity) was 0.5; the adversarial mode (gan_mode) was set to “lsgan”; the learning-rate decay policy (lr_policy) was linear; lr_decay_iters was 50; and the batch size was 1.

To determine suitable hyperparameters for the Cycle-GAN model, a two-step analysis was performed. First, the learning rate was evaluated using three candidate values: 0.0002, 0.0005, and 0.001, while the cycle consistency loss weights were fixed at the default setting (lambda_A = lambda_B = 10). Second, after selecting the learning rate, the cycle consistency loss weights were further tested using three configurations: (5, 5), (10, 10), and (20, 20).
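The two-step search described above can be laid out as a short sketch, together with a linear learning-rate decay of the kind commonly paired with Cycle-GAN training. The function names are hypothetical, and the split of the 200 epochs into 100 constant-rate epochs followed by 100 decay epochs is an assumption; the text states only the total epoch count and the linear policy.

```python
LEARNING_RATES = [0.0002, 0.0005, 0.001]
CYCLE_WEIGHTS = [(5, 5), (10, 10), (20, 20)]   # (lambda_A, lambda_B)


def two_step_search_plan(default_cycle=(10, 10), chosen_lr=0.0005):
    """Step 1 varies the learning rate; step 2 varies the cycle weights at the chosen rate."""
    step1 = [(lr, default_cycle[0], default_cycle[1]) for lr in LEARNING_RATES]
    step2 = [(chosen_lr, a, b) for a, b in CYCLE_WEIGHTS]
    return step1, step2


def linear_decay_factor(epoch, n_constant=100, n_decay=100):
    """Learning-rate multiplier: 1.0 at first, then a linear decrease toward zero."""
    return 1.0 - max(0, epoch + 1 - n_constant) / float(n_decay + 1)


step1, step2 = two_step_search_plan()
distinct_runs = set(step1) | set(step2)   # the (0.0005, 10, 10) run is shared by both steps
```

The sequential design needs five distinct training runs instead of the nine a full 3 × 3 grid would require, at the cost of not testing interactions between learning rate and cycle weight.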

(5) Practical Implementation Details of Cycle-GAN Training

For practical implementation, the Cycle-GAN was trained using unpaired simulated and real GPR image sets from the two domains. The simulated domain was constructed from the GprMax-generated defect images and their augmented versions, whereas the real domain consisted of the field-collected GPR images prepared through the preprocessing, segmentation, and annotation workflow described above. Before training, images from both domains were normalized and resized to a consistent input resolution required by the network. The final checkpoint was selected based on the combined consideration of convergence behavior during training, the visual quality of the translated radar images, and the quantitative similarity evaluation between translated and real images.

(6) Hyperparameter Testing

The initial learning rates were set to 0.0002, 0.0005, and 0.001, with the cycle consistency loss weights fixed at lambda_A = lambda_B = 10. The model was trained for 200 epochs under each setting. As shown in Figure 13, the case with a learning rate of 0.001 did not reach a sufficiently stable convergence state within 200 epochs, whereas both 0.0002 and 0.0005 showed more stable convergence behavior. Considering both convergence stability and training efficiency, 0.0005 was selected as the learning rate for the subsequent experiments.

After fixing the learning rate at lr = 0.0005, the cycle consistency loss weights were further tested using three combinations: lambda_A = lambda_B = 5, lambda_A = lambda_B = 10, and lambda_A = lambda_B = 20. As shown in Figure 14, all three settings achieved stable convergence within 200 epochs. Based on the overall convergence behavior and generated-image quality, training with lr = 0.0005 was therefore robust to the choice of cycle consistency weight within the tested range and showed stable optimization performance.

(7) Quantitative Evaluation of Simulation Fidelity and Cycle-GAN Translation Quality

For the quantitative evaluation of image similarity, two different comparison protocols were adopted according to the nature of the metrics. PSNR, SSIM, and MSE were computed on image pairs formed between the sampled simulated, translated, and real radar images after consistent preprocessing and intensity normalization. Since the Cycle-GAN training in this study was unpaired, these image pairs were used only for relative similarity comparison rather than for strict pixel-wise correspondence validation. By contrast, FID, KID, and LPIPS were evaluated at the distribution level between image sets from different domains, which is more suitable for unpaired image translation tasks. Before evaluation, all radar images were converted to the same image format and normalized to a consistent intensity range to ensure comparability across domains.

To quantitatively assess the similarity between simulated and real radar images, 100 simulated images and 100 real images were randomly selected for comparison using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM) and Mean Squared Error (MSE). As shown in Table 4, the simulated images achieved an average SSIM of 0.74 and a PSNR of 21.8 dB relative to real images, indicating moderate structural similarity. After Cycle-GAN translation, the SSIM increased to 0.82, and the PSNR reached 24.9 dB, while the MSE decreased from 0.0126 to 0.0084. These results demonstrate that the Cycle-GAN effectively narrows the domain gap between simulated and real radar images, producing images that more closely resemble real GPR observations.
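For reference, MSE and PSNR on a single image pair can be computed as in the sketch below. It assumes intensities normalized to [0, 1] (so the data range is 1.0), which is an assumption about the preprocessing rather than a stated detail; SSIM is omitted because it requires a windowed computation. The function names are illustrative.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two images of equal shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images with the given intensity range."""
    m = mse(a, b)
    if m == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / m))


# PSNR implied by the mean MSE values reported in Table 4 (assuming a [0, 1] range).
psnr_from_mean_mse_sim = float(10.0 * np.log10(1.0 / 0.0126))
psnr_from_mean_mse_translated = float(10.0 * np.log10(1.0 / 0.0084))
```

Converting the mean MSE values gives about 19.0 dB and 20.8 dB. Averaging per-pair PSNR values generally yields a larger number than converting the averaged MSE, which would be consistent with the 21.8 dB and 24.9 dB in Table 4 if the intensities were indeed scaled to [0, 1].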

To further quantify the performance of the Cycle-GAN, the Fréchet Inception Distance (FID), Kernel Inception Distance (KID), and Learned Perceptual Image Patch Similarity (LPIPS) metrics were calculated between simulated, translated, and real images. As summarized in Table 5, the original simulated images exhibited a high FID of 85.7, while the Cycle-GAN–translated images achieved a markedly lower FID of 46.3. The KID score decreased from 12.6 × 10⁻³ to 5.8 × 10⁻³, and the LPIPS value decreased from 0.412 to 0.276. These improvements indicate that Cycle-GAN reduces the domain discrepancy, producing synthetic GPR images that are closer to real radar images in both perceptual structure and statistical characteristics.
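The Fréchet distance underlying FID compares the mean and covariance of two feature sets. The sketch below computes it with NumPy for arbitrary feature matrices; in an actual FID evaluation the features would come from a pretrained Inception network, which is not reproduced here. The function name and the use of an eigenvalue decomposition for the matrix square-root trace are illustrative choices.

```python
import numpy as np


def frechet_distance(feat_x, feat_y):
    """Fréchet distance between Gaussians fitted to two (n_samples, n_features) arrays.

    Assumes at least two feature dimensions so that np.cov returns a matrix.
    """
    feat_x = np.asarray(feat_x, dtype=np.float64)
    feat_y = np.asarray(feat_y, dtype=np.float64)
    mu_x, mu_y = feat_x.mean(axis=0), feat_y.mean(axis=0)
    cov_x = np.cov(feat_x, rowvar=False)
    cov_y = np.cov(feat_y, rowvar=False)
    # The trace of the square root of cov_x @ cov_y equals the sum of the
    # square roots of its eigenvalues, which are real and non-negative for
    # covariance matrices up to numerical error.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)
```

Identical feature sets give a distance close to zero, and a pure shift of the mean by a vector d adds the squared length of d, so lower values indicate that the translated images lie closer to the real-image distribution in feature space.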

4. YOLO Model Training

YOLOv5, YOLOX, YOLOv7 and YOLOv8 were selected because they represent four major evolutionary stages of the YOLO family: (1) YOLOv5 as the widely used engineering version with mature tools; (2) YOLOX as an anchor-free redesign; (3) YOLOv7 as the state-of-the-art real-time detector in 2022; and (4) YOLOv8 as the latest Ultralytics framework. Earlier versions such as YOLOv4 or alternative frameworks such as YOLO-NAS were not adopted due to either lower accuracy on small object detection or lack of official benchmarking consistency.

For reproducibility, the detectors used in this study were the original official versions of YOLOv5, YOLOX, YOLOv7, and YOLOv8. No lightweight or enlarged scaled variants were separately introduced, and no structural modification was applied. All models were trained using their default architectures under the same dataset and optimization settings for fair comparison.

4.1. Principles and Framework of the YOLO Model

It is worth noting that the architectural characteristics of the YOLO models are particularly relevant to the challenges of GPR image analysis. GPR defects typically appear as small-scale, low-contrast, noise-like patterns embedded within complex backgrounds. YOLOv5 and YOLOv8 leverage PANet and C2f-based multi-scale feature aggregation, which enhances the detection of weak and small targets. YOLOX benefits from its anchor-free design and dynamic SimOTA label assignment, allowing it to better adapt to the irregular shapes of radar reflections. YOLOv7 incorporates the ELAN and SPPCSPC modules, which improve gradient flow and multi-scale representation, making it especially effective for identifying subtle waveform differences such as narrow cracks or thin debonding layers.

(1): YOLOv5

YOLOv5 was included as a mature and widely used one-stage detector with stable engineering implementation [32], as shown in Figure 15. Its multi-scale feature fusion mechanism is beneficial for detecting weak and small defect signatures in GPR images, making it an appropriate baseline model for the present comparison [33].

(2): YOLOX

YOLOX adopts an anchor-free prediction mechanism and the SimOTA label-assignment strategy. These characteristics make it potentially suitable for radar targets with irregular shapes and variable reflection patterns. In this study, YOLOX was used to examine whether anchor-free detection is advantageous for subsurface defect identification in GPR images [34]. The principle is illustrated in Figure 16.
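As a minimal sketch of anchor-free prediction, the code below decodes raw per-cell outputs into boxes without any predefined anchor sizes: the centre is the cell index plus a predicted offset, and the width and height are exponentials of the predicted values, all scaled by the feature-map stride. The array layout and function name are assumptions for illustration, and the sketch omits the objectness and class branches.

```python
import numpy as np


def decode_anchor_free(raw, stride):
    """Decode (H, W, 4) raw outputs (dx, dy, log_w, log_h) into pixel boxes.

    Returns an (H, W, 4) array of (cx, cy, w, h) in input-image pixels.
    """
    raw = np.asarray(raw, dtype=np.float64)
    h, w, _ = raw.shape
    # Grid of cell indices: ys varies along rows, xs along columns.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cx = (raw[..., 0] + xs) * stride
    cy = (raw[..., 1] + ys) * stride
    bw = np.exp(raw[..., 2]) * stride
    bh = np.exp(raw[..., 3]) * stride
    return np.stack([cx, cy, bw, bh], axis=-1)
```

Because box size is regressed directly rather than as a correction to a fixed anchor shape, elongated or irregular radar reflections are not tied to a preset aspect ratio.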

The network architecture of YOLOX consists of the backbone network CSPDarknet [35], an enhanced feature extraction network, and the YOLOHead prediction head, as shown in Figure 17.

(3): YOLOv7

In object detection, YOLOv7 integrates sophisticated structural innovations to enhance detection accuracy and processing efficiency [36]. The architecture is shown in Figure 18. Starting with image input, the data is transformed within the neural network to detect and accurately localize objects.

At its core, the architecture is divided into three components: the backbone, the neck, and the head. The backbone is designed for feature extraction, incorporating convolutional blocks (CBL), an Efficient Layer Aggregation Network (ELAN) module [37] to enhance feature extraction, and an arrangement of MaxPooling layers to reduce spatial dimensions [38].

(4): YOLOv8

YOLOv8 is the latest YOLO model from Ultralytics [39], renowned for object detection, image classification, and instance segmentation. It builds upon the success of YOLOv5 by modifying its architecture and improving the developer experience.

YOLOv8 is an enhanced object detection system based on the original YOLO concept. This innovative approach uses a grid-style framework to identify and classify objects. It first scales the input image to a standard size and then divides it into smaller sections or grid cells, as shown in Figure 19.

Overall, the comparison among these four detectors was intended to identify the model architecture most suitable for the current GPR task, rather than to propose a modified detection network. The subsequent experiments therefore focus on comparative detector performance under the same dataset and training settings.

4.2. Training Settings

In this experiment, all models were trained using the Stochastic Gradient Descent (SGD) optimizer, combined with a cosine annealing strategy to reduce the learning rate. The initial learning rate was set to 0.01, with the lowest point of cosine annealing set to 0.1. Additionally, the batch size was set to 8, and the number of training epochs was set to 300. The detailed experimental environment configurations are shown in Table 6.
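A cosine annealing schedule with these settings can be sketched as follows. The sketch interprets the value 0.1 as a final learning-rate factor, so that the rate decays from 0.01 to 0.01 × 0.1 = 0.001; this interpretation, the absence of warm-up, and the function name are assumptions, since the text does not define the lowest point more precisely.

```python
import math


def cosine_lr(epoch, total_epochs=300, lr0=0.01, final_factor=0.1):
    """Cosine-annealed learning rate for a 0-based epoch index.

    final_factor is assumed to scale lr0 to the minimum learning rate.
    """
    lr_min = lr0 * final_factor
    # Training progress in [0, 1] from the first to the last epoch.
    progress = epoch / float(total_epochs - 1)
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + math.cos(math.pi * progress))


# Learning rate for each of the 300 training epochs.
lr_schedule = [cosine_lr(e) for e in range(300)]
```

The schedule changes slowly near the start and end of training and fastest around the midpoint.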

4.3. Dataset Composition Ratio Experiment

The experiment used a training dataset consisting of 10,000 radar images containing three types of pavement defects. Four object detection models—YOLOv5, YOLOX, YOLOv7, and YOLOv8—were each trained for 300 epochs. The training process was analyzed by comparing the changes in loss values and detection accuracy across models. Sample images from the dataset are shown in Figure 20.

Based on the analysis of the results, the best-performing detection model was selected. This model was then trained on datasets that mixed real and synthetic images at different ratios, with the validation set consisting of real data. Comparing its performance across these dataset combinations made it possible to assess the impact of the synthetic-image proportion and to identify the most suitable ratio configuration. The specific dataset distribution ratios are shown in Table 7.
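One way to assemble the mixed training sets of Table 7 is sketched below: a fixed total of 10,000 images is drawn from the real and synthetic pools according to the real-data fraction. The random sampling, the fixed seed, and the function name are assumptions for illustration; the paper does not describe how individual images were chosen for each ratio.

```python
import random


def mix_dataset(real_items, synthetic_items, real_fraction, total=10000, seed=0):
    """Return a shuffled list with round(total * real_fraction) real items
    and the remainder drawn from the synthetic pool."""
    n_real = round(total * real_fraction)
    n_synthetic = total - n_real
    rng = random.Random(seed)
    mixed = rng.sample(real_items, n_real) + rng.sample(synthetic_items, n_synthetic)
    rng.shuffle(mixed)
    return mixed


# Real-data fractions of the five groups in Table 7.
REAL_FRACTIONS = [1.0, 0.7, 0.5, 0.3, 0.0]
```

Keeping the total constant across the five groups means that differences in detection performance reflect the composition of the training set rather than its size.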

5. Results and Discussion

After training the four object detection models—YOLOv5, YOLOX, YOLOv7, and YOLOv8—for 300 epochs, a comparative analysis of loss value changes and accuracy variations during the training process was conducted. The analysis results are shown in Figure 21 and Figure 22. These results provide an initial comparison of the learning behavior and convergence characteristics of the four models. To further assess their practical applicability, additional performance metrics such as inference speed and model size were also analyzed in subsequent sections.

As shown in Figure 21, the loss values of YOLOv5, YOLOX, and YOLOv7 stabilized before the end of training, demonstrating good convergence. The loss values on the training and validation sets for each model were as follows: YOLOv5 (2.89, 3.95), YOLOX (4.24, 3.59), YOLOv7 (4.97, 5.22), and YOLOv8 (3.45, 3.78). YOLOv8’s loss values were still decreasing at the end of 300 epochs, indicating that it had not fully converged, whereas the other three models had stabilized, suggesting sufficient training.

The mAP is one of the key metrics for evaluating detection performance. As shown in Figure 22, a comparison of the mAP over 300 epochs revealed that YOLOv7 achieved the highest value at 83%. YOLOv5 and YOLOX followed at 80% and 73%, respectively, while YOLOv8 had the lowest mAP at only 54%. These results indicate that YOLOv7 exhibits the best object detection capability for this task.

Further analysis of YOLOv7’s recognition accuracy for different defect categories (Figure 23) shows that it achieved the highest accuracy for cracks at 88.2%, followed by interlayer debonding at 85.3%, and the lowest accuracy for loose materials, at only 76.7%. The high accuracy for crack detection may be attributed to the distinct waveforms, concentrated regions, and consistent shapes of cracks in radar images, which exhibit significant radar characteristics. In contrast, looseness defects are more difficult to identify due to their similarity to background features and unclear boundaries in radar images, leading to lower accuracy.

As discussed above, YOLOv7 outperformed the other models in this experiment, not only leading in the mAP metric but also achieving a better balance in recognition accuracy across different defect categories. Furthermore, the comparison in Table 8 shows that YOLOv7 reached 89.1 FPS, second only to YOLOv8 (95.2 FPS), although its model size (72.23 MB) was the largest of the four. YOLOv7 therefore provides a favorable trade-off between detection accuracy and inference speed, and it was selected for the subsequent analysis.

The mixed datasets were sequentially input into the YOLOv7 model, with a real validation set used as the validation data. During the training process, training and validation losses were monitored, and corresponding charts were plotted for analysis. Based on the plotted charts, after 200 epochs of training, all dataset combinations involving different proportions of real and synthetic data exhibited a common phenomenon, except for the first group composed entirely of real data: the validation loss gradually stopped decreasing, while the training loss continued to decrease. This is considered a sign of overfitting. Under these dataset combinations, the model’s performance reached an implicit upper limit, as shown in Figure 24.
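The pattern described here, where the validation loss stalls while the training loss keeps falling, can be detected with a simple patience rule. The sketch below returns the epoch of the last meaningful improvement in validation loss; the patience of 10 epochs, the tolerance, and the function name are illustrative assumptions and were not part of the reported training procedure.

```python
def last_improvement_before_stall(val_losses, patience=10, min_delta=1e-3):
    """Return the epoch index of the best validation loss once no improvement
    larger than min_delta has been seen for `patience` epochs, else None."""
    best = float("inf")
    best_epoch = None
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Meaningful improvement: remember it and keep going.
            best, best_epoch = loss, epoch
        elif best_epoch is not None and epoch - best_epoch >= patience:
            return best_epoch
    return None
```

Applied to logged validation losses, such a rule would flag the mixed-data runs around the point where their curves flatten, while a run whose validation loss keeps decreasing returns None.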

The results of this training session are summarized in Table 9. As the proportion of real data decreases, the mean Average Precision (mAP) steadily declines from 83.4% to 79.3%, suggesting that synthetic data negatively affects the overall performance of the model. For crack detection, the accuracy drops from 88.2% to 78.9%, a decline of 9.3 percentage points. Similarly, for interlayer debonding, which originally had a high recognition accuracy, the addition of synthetic data caused the accuracy to decrease from 85.3% to 80.2%. Among the three defect types, the only one with improved recognition accuracy is looseness, where the accuracy increased slightly from 76.7% to 78.9%. A possible reason is that looseness was underrepresented in the original dataset and its initial recognition accuracy was relatively low, so training with synthetic data supplemented the limited real samples and improved the model’s ability to detect this class.
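The reported figures can be cross-checked with a short calculation, treating mAP as the unweighted mean of the three class values. The sketch assumes that the per-class accuracies quoted in the text are per-class average precisions and that the three lower-ratio values belong to the same training configuration; both are assumptions, and the dictionary keys are illustrative.

```python
def mean_ap(class_values):
    """Unweighted mean of per-class values, in percent."""
    return sum(class_values.values()) / len(class_values)


# Per-class values quoted in the text (percent).
real_only = {"crack": 88.2, "debonding": 85.3, "loose": 76.7}
with_synthetic = {"crack": 78.9, "debonding": 80.2, "loose": 78.9}

map_real_only = round(mean_ap(real_only), 1)
map_with_synthetic = round(mean_ap(with_synthetic), 1)

# Change per class in percentage points (negative means a decline).
delta_points = {k: round(with_synthetic[k] - real_only[k], 1) for k in real_only}
```

Under these assumptions the means come out at 83.4 and 79.3, matching the reported mAP values, and the class-wise changes are -9.3, -5.1, and +2.2 percentage points for cracks, debonding, and looseness, respectively.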

To provide a more comprehensive interpretation of the effect of different real-data ratios, the class-wise precision, recall, and F1-score of YOLOv7 are further presented in Figure 25. Overall, as the proportion of real data increases, the detection performance for crack and interlayer-debonding defects shows a clear upward trend across all three metrics. This tendency is particularly evident for crack detection, whose precision, recall, and F1-score all improve noticeably with increasing real-data participation, indicating that its radar signatures are more effectively learned from real images than from translated synthetic samples. A similar trend is also observed for interlayer debonding, although the improvement is slightly less pronounced.
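For clarity, the three class-wise metrics follow from the counts of true positives, false positives, and false negatives for each defect class. The sketch below shows the standard definitions; the function name and the example counts in the usage note are hypothetical and are not taken from the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1-score from detection counts for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, 80 correct detections with 20 false alarms and 10 missed defects would give a precision of 0.80, a recall of about 0.89, and an F1-score of about 0.84.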

In contrast, loose-material detection exhibits a different pattern. Its precision and F1-score remain relatively stable across different real-data ratios, while its recall shows only limited fluctuation and does not increase as clearly as that of the other two defect categories. This suggests that synthetic data may provide some supplementary variability for the loose-material class, which is more weakly represented and more difficult to identify in the original real dataset.

Taken together, the results in Figure 25 are consistent with those in Table 9 and further confirm that the contribution of synthetic data in the present framework is not universal but strongly class-dependent. Although the inclusion of synthetic data may provide limited support for loose-material detection, higher proportions of real data are still more beneficial for the recognition of crack and interlayer-debonding defects and for the overall stability of detection performance under the current experimental conditions.

A possible explanation for the class-dependent behavior is that the three defect categories exhibit substantially different radar-image characteristics. Crack and interlayer-debonding defects already showed relatively high recognition accuracy in the real-data setting, indicating that their signatures in real radar images were comparatively stable and discriminative. Under such conditions, the inclusion of translated synthetic data may have introduced distributional differences that weakened the detector’s adaptation to the real-image characteristics of these classes. In contrast, loose defects were more weakly defined in radar images, with diffuse boundaries and stronger similarity to background clutter, and were also relatively underrepresented in the original real dataset. For this reason, the addition of synthetic data may have provided useful supplementary variation for this difficult class, leading to a slight improvement in recognition performance.

Nevertheless, while the above results demonstrate the effectiveness of the proposed framework under the current experimental conditions, their generalizability should be interpreted with caution. In this study, the field dataset was collected from a single expressway section in Hubei Province using one specific GeoScope 3D GPR system. Since GPR image characteristics are highly sensitive to equipment configuration, antenna properties, acquisition settings, and pavement material composition, the radar responses observed in this work may remain site-specific. Therefore, the present results should be regarded as preliminary evidence of feasibility under the investigated conditions, rather than as universally applicable conclusions. Further validation using datasets collected from different road sections, pavement structures, environmental conditions, and GPR platforms is still required to assess the broader robustness and transferability of the proposed method.

6. Conclusions

This study focused on the intelligent recognition of internal pavement defects and developed an integrated recognition framework combining simulation modeling, image style transfer, and deep object detection. The main conclusions are as follows:

(1) A simulation dataset was created based on GprMax, simulating typical pavement defects such as cracks, loose materials, and interlayer debonding. Reasonable material parameters and structural models were designed to generate high-fidelity radar images, providing a controllable and scalable data source for subsequent detection models.

(2) Cycle-GAN was used to translate simulated radar images into a more realistic visual domain and thereby reduce the domain discrepancy between simulated and real images. However, the contribution of these translated synthetic images to downstream detection performance was limited. Under the present dataset conditions, their benefit was mainly observed for loose material defects, whereas no improvement was obtained for cracks, interlayer debonding, or the overall mAP.

(3) A systematic comparison was conducted to evaluate the performance of four detection models—YOLOv5, YOLOX, YOLOv7, and YOLOv8—on the GPR image defect recognition task. The results indicated that YOLOv7 performed best in terms of mean average precision (mAP), multi-scale feature detection capability, and the recognition of prominent features such as cracks, achieving an mAP of 83.4% and a crack detection rate of 88.2%.

(4) YOLOv7 was selected as the best-performing model, and the influence of different real-to-synthetic data ratios on detection performance was further evaluated. The results showed that synthetic data provided only very narrow and highly class-dependent utility. This conclusion was supported not only by the mAP results but also by the class-wise precision, recall, and F1-score analysis. As the proportion of real data increased, crack and interlayer-debonding detection generally improved across these metrics, whereas loose-material detection remained relatively stable and showed only limited fluctuation. Therefore, the synthetic data strategy in the present framework should not be interpreted as a broadly effective solution but rather as a limited and category-specific aid under the investigated conditions.

(5) Model performance gradually saturated after 200 training epochs, especially for the dataset combinations containing less than 100% real data, where the validation loss stopped decreasing midway through training. This indicated that both the dataset size and model capability had reached their limits. Further improvements in recognition accuracy would require optimization of the network structure, enhancement of real data diversity, or the introduction of semi-supervised learning strategies.

Author Contributions

Conceptualization, J.Y. and Y.Y.; methodology, S.Y.; software, S.Y.; validation, S.Y. and S.C.; formal analysis, S.Y.; investigation, S.C.; resources, X.A.; data curation, X.A.; writing—original draft preparation, S.Y.; writing—review and editing, Y.Y.; visualization, S.Y.; supervision, J.Y.; project administration, J.Y.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Jiangxi Province (Grant Number 20252BAC200341).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The testing and analysis data used to support the findings of this study are included within the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to acknowledge Lichao Zhang (Jiangxi Lutong Technology Co., Ltd., China) for his support in data processing and data curation for this work. All authors of the following references are much appreciated. Finally, the authors would like to thank the reviewers for their time and insightful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Leng, Z.; Al-Qadi, I.L. An innovative method for measuring pavement dielectric constant using the extended CMP method with two air-coupled GPR systems. NDT E Int. 2014, 66, 90–98.
2. Chen, D.H.; Hong, F.; Zhou, W.; Ying, P. Estimating the hotmix asphalt air voids from ground penetrating radar. NDT E Int. 2014, 68, 120–127.
3. Zhao, S.; Al-Qadi, I.L. Development of an analytic approach utilizing the extended common midpoint method to estimate asphalt pavement thickness with 3-D ground-penetrating radar. NDT E Int. 2016, 78, 29–36.
4. Zhao, S.; Al-Qadi, I.L. Super-resolution of 3-D GPR signals to estimate thin asphalt overlay thickness using the XCMP method. IEEE Trans. Geosci. Remote Sens. 2018, 57, 893–901.
5. Zhang, J.; Yang, X.; Li, W.; Zhang, S.; Jia, Y. Automatic detection of moisture damages in asphalt pavements from GPR data with deep CNN and IRS method. Autom. Constr. 2020, 113, 103119.
6. Shangguan, P.; Al-Qadi, I.L. Calibration of FDTD simulation of GPR signal for asphalt pavement compaction monitoring. IEEE Trans. Geosci. Remote Sens. 2014, 53, 1538–1548.
7. Hong, S.; Chen, D.; Dong, B. Numerical simulation and mechanism analysis of GPR-based reinforcement corrosion detection. Constr. Build. Mater. 2022, 317, 125913.
8. Wang, R.; Yin, T.; Zhou, E.; Qi, B. What indicative information of a subsurface wetted body can be detected by a ground-penetrating radar (GPR)? A laboratory study and numerical simulation. Remote Sens. 2022, 14, 4456.
9. Razali, M.; Joret, A.; Che, C.K.N.A.H.; Abdullah, M.F.L.; Baharudin, E. Simulation study for underground object detection using pulse ground-penetrating radar (GPR) system. In 2020 IEEE Student Conference on Research and Development (SCOReD); IEEE: New York, NY, USA, 2020; pp. 138–143.
10. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In 2016 IEEE International Conference on Image Processing (ICIP); IEEE: New York, NY, USA, 2016; pp. 3708–3712.
11. Huyan, J.; Li, W.; Tighe, S.; Xu, Z.; Zhai, J. CrackU-net: A novel deep convolutional neural network for pixelwise pavement crack detection. Struct. Control Health Monit. 2020, 27, e2551.
12. Hou, F.; Lei, W.; Li, S.; Xi, J. Deep learning-based subsurface target detection from GPR scans. IEEE Sens. J. 2021, 21, 8161–8171.
13. Kang, M.S.; Kim, N.; Lee, J.J.; An, Y.K. Deep learning-based automated underground cavity detection using three-dimensional ground penetrating radar. Struct. Health Monit. 2020, 19, 173–185.
14. Li, Y.; Liu, C.; Yue, G.; Gao, Q.; Du, Y. Deep learning-based pavement subsurface distress detection via ground penetrating radar data. Autom. Constr. 2022, 142, 104516.
15. Dai, Q.; Lee, Y.H.; Sun, H.H.; Ow, G.; Yusof, M.L.M.; Yucel, A.C. 3DInvNet: A deep learning-based 3D ground-penetrating radar data inversion. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–16.
16. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2014; pp. 580–587.
17. Girshick, R. Fast r-cnn. arXiv 2015, arXiv:1504.08083.
18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
19. Qiu, Z.; Zhao, Z.; Chen, S.; Zeng, J.; Huang, Y.; Xiang, B. Application of an improved YOLOv5 algorithm in real-time detection of foreign objects by ground penetrating radar. Remote Sens. 2022, 14, 1895.
20. Yan, K.; Xu, X.; Zhu, P.; Zhang, Z. Centralized feature pyramid-based supervised deep learning for object detection model from GPR data. Geophys. Prospect. 2024, 72, 3414–3435.
21. Goyal, P.; Tarai, S.; Maiti, S.; Chongder, P. Classification of Ground penetrating radar data using YOLOv8 model. In 2024 IEEE Space, Aerospace and Defence Conference (SPACE); IEEE: New York, NY, USA, 2024; pp. 1100–1103.
22. Cheng, Z.; He, Z.; Pan, P. 3D reconstruction of subsurface pipes and cavities using ground penetrating radar based on deep learning. NDT E Int. 2025, 158, 103579.
23. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV); IEEE: New York, NY, USA, 2017; pp. 2223–2232.
24. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2017; pp. 1125–1134.
25. Giannopoulos, A. Modelling ground penetrating radar by GprMax. Constr. Build. Mater. 2005, 19, 755–762.
26. Slob, E.; Sato, M.; Olhoeft, G. Ground-Penetrating Radar: Surface and borehole ground-penetrating-radar developments. In Geophysics Today: A Survey of the Field as the Journal Celebrates Its 75th Anniversary; Fomel, S., Ed.; Society of Exploration Geophysicists: Houston, TX, USA, 2010.
27. Solla, M.; Lorenzo, H.; Rial, F.I.; Novo, A. Ground-penetrating radar for the structural evaluation of masonry bridges: Results and interpretational tools. Constr. Build. Mater. 2012, 29, 458–465.
28. Liu, Y.; Zhang, Z.; Yuan, Y.; Zhu, Y.; Wang, K. Quantitative Evaluation of Internal Pavement Distresses Based on 3D Ground Penetrating Radar. Balt. J. Road Bridge Eng. 2025, 20, 45–69.
29. Yang, Y.; Huang, L.; Zhang, Z.; Zhang, J.; Zhao, G. CycleGAN-based data augmentation for subgrade disease detection in GPR images with YOLOv5. Electronics 2024, 13, 830.
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2016; pp. 770–778.
31. Ulyanov, D. Instance normalization: The missing ingredient for fast stylization. arXiv 2016, arXiv:1607.08022.
32. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A metric and a loss for bounding box regression. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2019; pp. 658–666.
33. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2017; pp. 2117–2125.
34. Ge, Z. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
35. Wang, C.Y.; Liao, H.; Wu, Y.H.; Chen, P.Y.; Yeh, I.H. A New Backbone that can Enhance Learning Capability of CNN. In 2020 CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: New York, NY, USA, 2020; pp. 390–391.
36. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2023; pp. 7464–7475.
37. Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. In Computer Vision—ECCV 2022; Springer Nature: Cham, Switzerland, 2022; pp. 649–667.
38. Yue, G.; Liu, C.; Li, Y.; Du, Y.; Guo, S. Gpr data augmentation methods by incorporating domain knowledge. Appl. Sci. 2022, 12, 10896.
39. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972.

Figure 1. Schematic diagram of step frequency.

Figure 2. Illustration of 3D ground penetrating radar data acquisition.

Figure 3. Detailed information of the surveyed section. (a) Project location overview; (b) Overview of Xiangjin expressway inspection; (c) Pavement structure diagram of the surveyed section.

Figure 4. Dataset preprocessing flowchart.

Figure 5. Image segmentation flowchart.

Figure 6. Annotation flowchart.

Figure 7. Comparison of radar images for defects. (a) Crack; (b) Loose materials/Soil; (c) Interlayer debonding.

Figure 8. Representative GprMax-simulated radar images under different parameter settings. (a) Simulated crack defects; (b) simulated loose materials defects; (c) simulated interlayer debonding defects.

Figure 9. Network structure diagram of Cycle-GAN generator.

Figure 10. Network structure diagram of Cycle-GAN discriminator.

Figure 11. Cycle-GAN network cycle consistency loss diagram.

Figure 12. Schematic diagram of the principle of the cyclic adversarial neural network.

Figure 13. Training loss diagrams for three learning rates. (a) 0.0002; (b) 0.0005; (c) 0.001.

Figure 14. Training loss diagrams for three sets of lambda values. (a) lambda_A = 5 lambda_B = 5; (b) lambda_A = 10 lambda_B = 10; (c) lambda_A = 20 lambda_B = 20.

Figure 15. YOLOv5 model architecture.

Figure 16. Slice operation in the focus structure. (a) Anchor-based prediction; (b) Anchor-free prediction.

Figure 17. YOLOX model architecture.

Figure 18. YOLOv7 model architecture.

Figure 19. YOLOv8 model architecture.

Figure 20. Sample images from the dataset. (a) Original Radar Image; (b) Adversarially Augmented Image.

Figure 21. Loss diagram during model training process. (a) YOLOv5; (b) YOLOX; (c) YOLOv7; (d) YOLOv8.

Figure 22. Mean average precision (mAP) change diagram during the model training process. (a) YOLOv5; (b) YOLOX; (c) YOLOv7; (d) YOLOv8.

Figure 23. Recognition accuracy of various target classes by the YOLOv7 model.

Figure 24. Loss diagram of model training with different proportions of real datasets. (a) 100%; (b) 70%; (c) 50%; (d) 30%; (e) 0%.

Figure 25. Class-wise precision, recall, and F1-score of YOLOv7 under different real data ratios. (a) precision; (b) recall; (c) F1-score.

Table 1. Preprocessing parameter settings.

Module	Parameter	Value
Interference suppression	Power limit	10
Interference suppression	Output percentages	Disabled
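Inverse Fast Fourier Transform (IFFT)	Window type	Kaiser
Inverse Fast Fourier Transform (IFFT)	Kaiser beta	6
Inverse Fast Fourier Transform (IFFT)	Use full BW	Enabled
Inverse Fast Fourier Transform (IFFT)	Minimum frequency	30
Inverse Fast Fourier Transform (IFFT)	Maximum frequency	3050
Background removal filter	Filter mode	Sliding window mean
Background removal filter	Removal	100
Background removal filter	Start depth	2
Background removal filter	Transition zone	4
Background removal filter	Filter length	100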
Gradual low-pass filter	Vertical filtering	Enabled
Gradual low-pass filter	Horizontal filtering	Enabled
Seismic migration	Maximum radius	0.75
Seismic migration	Half angle	30

Table 2. Class-wise distribution of the GPR dataset used for training and validation.

Subset	Crack	Loose Materials	Interlayer Debonding	Total
Training set	3200	1800	3000	8000
Validation set	800	450	750	2000

Table 3. Simulation parameter settings.

Filling Material	Relative Permittivity	Conductivity (S/m)
Surface Layer	4	0.005
Base Layer	9	0.05
Substrate Layer	12	0.1
Air	1	0
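
As a rough illustration of what the permittivity values in Table 3 imply, the sketch below computes the wave velocity in each material and the normal-incidence reflection coefficient at an interface. It assumes the low-loss approximation (conductivity ignored, non-magnetic media), which is a simplification and not the full-wave solution that GprMax computes. The function names are hypothetical and do not come from the paper.

```python
import math

C0 = 0.299792458  # speed of light in vacuum, m/ns

# Relative permittivities taken from Table 3
EPS_R = {"surface": 4.0, "base": 9.0, "substrate": 12.0, "air": 1.0}


def velocity(eps_r):
    """Wave velocity (m/ns) in a low-loss, non-magnetic medium (assumption)."""
    return C0 / math.sqrt(eps_r)


def reflection_coefficient(eps_upper, eps_lower):
    """Normal-incidence amplitude reflection coefficient, upper -> lower medium."""
    n1, n2 = math.sqrt(eps_upper), math.sqrt(eps_lower)
    return (n1 - n2) / (n1 + n2)


if __name__ == "__main__":
    for name, eps in EPS_R.items():
        print(f"{name:10s} v = {velocity(eps):.3f} m/ns")
    # Intact surface/base interface vs. an air-filled gap below the surface layer
    print(f"surface->base R = {reflection_coefficient(4.0, 9.0):+.3f}")
    print(f"surface->air  R = {reflection_coefficient(4.0, 1.0):+.3f}")
```

Under these assumptions the surface-to-air interface reflects with opposite polarity and larger magnitude than the intact surface-to-base interface, which is consistent with air-filled defects producing distinct signatures in the simulated images.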

Table 4. Quantitative similarity evaluation between simulated, Cycle-GAN–translated, and real GPR images.

Image Pair Comparison	PSNR (dB)	SSIM	MSE
Simulated to Real	21.8 ± 2.4	0.74 ± 0.06	0.0126 ± 0.0041
Translated (Cycle-GAN) to Real	24.9 ± 2.1	0.82 ± 0.04	0.0084 ± 0.0033
Simulated to Translated	23.5 ± 2.3	0.79 ± 0.05	0.0098 ± 0.0037
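
For readers unfamiliar with the pixel-level metrics in Table 4, the following minimal sketch shows how MSE and PSNR are conventionally computed. It assumes pixel intensities normalized to [0, 1] (the paper does not state the normalization) and represents each image as a flat sequence of floats. SSIM is omitted because it relies on windowed local statistics that are normally taken from an image-processing library.

```python
import math


def mse(img_a, img_b):
    """Mean squared error between two equally sized images (flat sequences)."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)


def psnr(img_a, img_b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(img_a, img_b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    # Toy 4-pixel "images" (hypothetical values, for illustration only)
    a = [0.0, 0.5, 1.0, 0.25]
    b = [0.1, 0.5, 0.9, 0.25]
    print(f"MSE = {mse(a, b):.4f}, PSNR = {psnr(a, b):.2f} dB")
```

Lower MSE and higher PSNR both indicate closer pixel-wise agreement, which is the direction of change reported for the translated images relative to the simulated ones.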

Table 5. FID/KID/LPIPS metrics before and after Cycle-GAN translation.

Evaluation Metric	Simulated to Real	Translated to Real	Improvement
FID	85.7	46.3	39.4
KID × 10³	12.6	5.8	6.8
LPIPS	0.412	0.276	0.136
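
The "Improvement" column in Table 5 is the absolute reduction of each metric (lower is better for all three). The short sketch below reproduces that column and also derives the relative reduction, which is not reported in the table; the helper name is hypothetical.

```python
# (simulated-to-real, translated-to-real) values from Table 5; lower is better
METRICS = {
    "FID": (85.7, 46.3),
    "KID x 10^3": (12.6, 5.8),
    "LPIPS": (0.412, 0.276),
}


def reduction(before, after):
    """Return (absolute reduction, relative reduction in percent)."""
    absolute = before - after
    return absolute, 100.0 * absolute / before


if __name__ == "__main__":
    for name, (before, after) in METRICS.items():
        absolute, relative = reduction(before, after)
        print(f"{name:11s} absolute = {absolute:.3f}, relative = {relative:.1f}%")
```

Expressed this way, the reductions are roughly 46% for FID, 54% for KID, and 33% for LPIPS.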

Table 6. Experimental environment configuration.

Type	Parameter
CPU model	AMD Ryzen 5 5600 (Advanced Micro Devices, Santa Clara, CA, USA)
Memory	16 GB
GPU model	NVIDIA GeForce RTX 3060 (NVIDIA Corporation, Santa Clara, CA, USA)
Graphics Memory	12 GB
CUDA Compute Capability	8.6
Programming Language	Python 3.8
Deep Learning Framework	PyTorch 2.0.1

Table 7. Dataset distribution ratios.

Number	Real Dataset Images	Adversarial Dataset Images	Proportion of Real Dataset
1	10,000	0	100%
2	7000	3000	70%
3	5000	5000	50%
4	3000	7000	30%
5	0	10,000	0%
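
One way to assemble the fixed-size mixtures listed in Table 7 is sketched below. The paper does not describe the sampling procedure, so random sampling without replacement, the seed, the function name, and the file names are all assumptions made for illustration.

```python
import random


def build_mixed_dataset(real_items, synthetic_items, real_fraction,
                        total=10_000, seed=0):
    """Sample a fixed-size pool containing the given share of real images."""
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(total * real_fraction)
    n_syn = total - n_real
    if n_real > len(real_items) or n_syn > len(synthetic_items):
        raise ValueError("not enough images in one of the pools")
    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    mixed = rng.sample(real_items, n_real) + rng.sample(synthetic_items, n_syn)
    rng.shuffle(mixed)  # interleave real and synthetic samples
    return mixed


if __name__ == "__main__":
    # Hypothetical file names standing in for the two image pools
    real = [f"real_{i}.png" for i in range(10_000)]
    synthetic = [f"gan_{i}.png" for i in range(10_000)]
    for fraction in (1.0, 0.7, 0.5, 0.3, 0.0):
        pool = build_mixed_dataset(real, synthetic, fraction)
        n_real = sum(name.startswith("real_") for name in pool)
        print(f"{fraction:.0%} real: {n_real} real, {len(pool) - n_real} synthetic")
```

Keeping the total at 10,000 images in every configuration means that any change in accuracy can be attributed to the composition of the data and not to its size.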

Table 8. Comparison of model accuracy and inference speed.

Model	mAP (%)	FPS (GPU)	Model Size (MB)
YOLOv5	80.1	82.2	27.12
YOLOX	73.2	75.3	25.71
YOLOv7	83.5	89.1	72.23
YOLOv8	54.1	95.2	43.56

Table 9. Relationship between real dataset proportion and model accuracy.

Number	Proportion of Real Dataset	Cracks (%)	Interlayer Debonding (%)	Loose Materials/Soil (%)	mAP (%)
1	100%	88.2	85.3	76.7	83.4
2	70%	83.3	82.9	77.3	81.2
3	50%	84.3	83.6	78.6	82.2
4	30%	80.6	81.5	76.8	79.6
5	0%	78.9	80.2	78.9	79.3
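
The mAP column in Table 9 appears to be the unweighted mean of the three class-wise values. The sketch below checks this reading against the tabulated numbers; the averaging rule is inferred from the table and is not stated by the authors.

```python
# Class-wise accuracy (%) from Table 9:
# (cracks, interlayer debonding, loose materials/soil)
TABLE_9 = {
    "100%": (88.2, 85.3, 76.7),
    "70%": (83.3, 82.9, 77.3),
    "50%": (84.3, 83.6, 78.6),
    "30%": (80.6, 81.5, 76.8),
    "0%": (78.9, 80.2, 78.9),
}


def mean_ap(class_values):
    """Unweighted mean over classes, rounded to one decimal place."""
    return round(sum(class_values) / len(class_values), 1)


if __name__ == "__main__":
    for proportion, values in TABLE_9.items():
        print(f"real data {proportion:>4s}: mAP = {mean_ap(values)}")
```

The computed means match the reported mAP values for all five configurations, so the class-wise columns and the mAP column are internally consistent.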

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

