Synthetic Data Generation Based on RDB-CycleGAN for Industrial Object Detection

Abstract: Deep learning-based methods have demonstrated remarkable success in object detection tasks when abundant training data are available. However, in the industrial domain, acquiring a sufficient amount of training data remains a challenge. Currently, many synthetic datasets are created using 3D modeling software, which can simulate real-world scenarios and objects but often cannot achieve complete accuracy and realism. In this paper, we propose a synthetic data generation framework for industrial object detection tasks based on image-to-image translation. To address the low image quality that can arise during the image translation process, we replace the original feature extraction module with the Residual Dense Block (RDB) module. We employ the resulting RDB-CycleGAN network to transform CAD models into realistic images. Additionally, we introduce the SSIM loss function to strengthen the network constraints of the generator and conduct a quantitative analysis of the synthetic data generated by the improved RDB-CycleGAN. Experiments show that the synthetic data we generate effectively enhance the performance of object detection algorithms on real images. Compared to using CAD models directly, the synthetic data adapt better to real-world scenarios and improve the model’s generalization ability.


Introduction
The advent of the Industry 4.0 era has brought enormous opportunities and challenges to the industrial sector [1]. In this digital and intelligent age, object detection technology has become particularly crucial in the industrial domain. As one of the key technologies in industrial automation and intelligent manufacturing, object detection provides strong support for the intelligent perception of industrial equipment, automated control of production processes, and improvement of product quality [2].
However, in industrial object detection, we face a series of challenges. Firstly, industrial environments often exhibit complex and dynamic characteristics, involving a wide variety of object types and shapes [3]: for instance, various components on a production line, different types of packaging, and more. These objects may come in different sizes, shapes, colors, and materials, and they might overlap, occlude one another, or be positioned at various orientations and angles. This diversity makes it difficult for object detection algorithms to adapt to such highly varied situations, leading to insufficient detection accuracy and robustness [4].
Secondly, data collection and annotation in industrial production processes are typically labor-intensive and time-consuming tasks. Data annotation requires skilled personnel to manually label objects, for example by drawing bounding boxes and assigning class labels. This process consumes considerable human resources and time, and usually requires professional knowledge. Particularly for certain industrial scenarios or rare industrial components, there may be no readily available datasets to use. This results in a relative scarcity of datasets that can be used for training, limiting the performance of object detection algorithms in industrial applications [5].
To address the lack of datasets in industrial object detection, many data augmentation methods have been proposed, such as the Object-Based Augmentation approach introduced by Svetlana Illarionova et al. in the field of remote sensing [6]. Additionally, Golnaz Ghiasi et al. proposed a simple copy-and-paste method for data augmentation [7]. Moreover, researchers have begun to use synthetic datasets [8][9][10]. Synthetic datasets are created using computer graphics techniques and simulated physical processes to generate realistic synthetic images with corresponding annotations of the targets. Compared to real datasets, synthetic datasets offer advantages such as rapid acquisition, flexible generation, and customization for different scenarios and tasks. B. Kiefer et al. utilized synthetic data for Unmanned Aerial Vehicle (UAV) object detection [11]. Through synthetic datasets, we can create more complex and diverse industrial scenes without the limitations of actual data collection. This helps expand the training data and improves the adaptability and generalization of object detection algorithms in industrial environments.
However, when using synthetic datasets, we also need to carefully consider their realism and effectiveness. Since synthetic data are generated by models, there may be some differences compared to real data [12]. Farzan Erlik Nowruzi et al. analyzed object detection performance using synthetic and real data [13]. Therefore, we need to adopt methods that keep synthetic data consistent with real data in terms of feature distribution and data distribution, thus ensuring the accuracy and robustness of the model in real industrial scenarios.
To overcome the differences between synthetic and real data and achieve image translation from the synthetic image domain to the real image domain, researchers can leverage Generative Adversarial Networks (GANs) [14], a powerful deep learning tool [15]. GANs consist of a generator and a discriminator, forming an adversarial model. The generator is responsible for generating synthetic data, while the discriminator is tasked with distinguishing between real data and the synthetic data generated by the generator. Both components continuously optimize their performance through adversarial training. The generator aims to generate increasingly realistic synthetic data, while the discriminator aims to accurately determine whether the input data are real or synthetic [16].
In recent years, the image translation technology of GANs has made significant progress and has been widely applied in various fields [17,18]. In industrial object detection, using GANs to generate realistic synthetic data not only increases the diversity and quantity of training data but also improves the adaptability of the object detection model to the complexities and variations in real industrial scenarios [19]. However, ensuring that the synthetic data generated by GANs match the quality and diversity of real data still requires careful design and tuning. Furthermore, to maintain the good generalization capabilities of the object detection algorithm after training with synthetic data, it is essential to strike a balance between synthetic and real data during the training process. Combining other data augmentation techniques is also necessary to enhance the diversity of the data.
In the task of image translation [20], we can regard the generator as a converter from the synthetic image domain to the real image domain. Through training GANs, the generator learns to transform synthetic images into images that resemble real data, thereby reducing the differences between synthetic and real data. This allows us to augment the real dataset with synthetic data, thus improving the performance of object detection algorithms in real industrial environments.
In this paper, we introduce a novel approach to provide more training data for object detection tasks in the industrial domain. Our main contributions are as follows:
1. We propose a synthetic data generation framework for industrial object detection tasks, enabling the creation of a larger volume of industrial part data from a small number of real industrial part images and CAD models.
2. To enhance the quality of generated images in the translation from CAD models to real images, we replace the original feature extraction module with an RDB (Residual Dense Block) module. Additionally, we introduce an SSIM (Structural Similarity Index Measure) loss function to strengthen the network constraints of the generator. The realistic images generated by the RDB-CycleGAN network are used to augment our dataset.
3. Experiments show that the synthetic data obtained through our method have a significant competitive advantage, effectively augmenting industrial part data and partially bridging the gap between synthetic and real data.

Related Work
To address the issue of limited datasets in industrial object detection, researchers have started exploring the use of synthetic datasets. In this section, we primarily focus on methods related to synthetic data generation for object detection and image translation networks.

Overview of Object Detection
Object detection in the industrial domain has been one of the hot research topics in recent years, as it is of significant importance in improving production efficiency, ensuring product quality, and achieving industrial intelligence. Many researchers have proposed various object detection algorithms and solutions for different industrial scenarios and tasks. In traditional industrial object detection, researchers often rely on manually designed feature extraction and detection algorithms. For example, methods based on features like Histogram of Oriented Gradients (HOG) and Scale-Invariant Feature Transform (SIFT) have been widely used in industrial object detection tasks over the past decade [21]. However, these methods often require a large amount of manual labor and expertise and have limited detection performance in complex scenes. With the rise of deep learning, Convolutional Neural Networks (CNNs) have made significant progress in industrial object detection. CNNs, through end-to-end learning, can automatically learn more efficient and useful feature representations from data, leading to excellent performance in complex scenes for object detection algorithms. Among them, Faster R-CNN, YOLO, and SSD [22,23] have become representative methods in the field of industrial object detection. Faster R-CNN introduces a Region Proposal Network (RPN) to optimize the process of candidate box generation for objects; YOLO (You Only Look Once) adopts a one-stage detection approach, achieving a good balance between high detection speed and accuracy; SSD (Single Shot Multibox Detector) combines multi-scale features for object detection, improving the detection capability for small objects. Despite the significant progress of deep learning methods in industrial object detection, challenges remain in practical applications, such as data scarcity and adaptability to complex industrial environments. To address these issues, some researchers have started exploring the use of synthetic datasets to increase training data [24]. Synthetic datasets, generated using techniques like Generative Adversarial Networks (GANs), can simulate data from real scenarios, thereby enhancing the diversity and quantity of training data. This approach provides new perspectives for industrial object detection and opens up new possibilities for efficient object detection in complex industrial environments [25].

Synthetic Data Generation
Synthetic data have been extensively researched and applied in the fields of computer vision and machine learning. With the advancement of deep learning techniques, synthetic data have become an effective approach to address data scarcity and generalization issues [26]. In the domain of industrial object detection, the use of synthetic data has also gained attention among researchers. Synthetic data are typically generated using Generative Adversarial Networks (GANs) or other generative models [27]. GANs can generate synthetic data that closely resemble real data through an adversarial process of training the generator and discriminator. The generator aims to produce realistic synthetic data, while the discriminator strives to differentiate between real and synthetic data. As the training progresses, the generator continuously improves, and the generated synthetic data become increasingly similar to the distribution of real data. The advantages of using synthetic data in industrial object detection lie in the ability to rapidly obtain large quantities of diverse data, especially when real data are difficult to obtain or costly [28]. Synthetic datasets can flexibly generate different types and shapes of target objects based on the needs of various industrial scenarios and tasks. Additionally, synthetic data allow for control over factors such as lighting, angles, and backgrounds, thereby enhancing the robustness and generalization capabilities of object detection algorithms. Some studies have shown that joint training with synthetic and real data can significantly improve model performance in object detection tasks. For instance, using synthetic data as an auxiliary dataset and employing transfer learning via pretraining the model on synthetic data and fine-tuning it on real data can enhance the model's performance on real datasets [29]. Furthermore, using synthetic data for data augmentation can increase the diversity of training data, thereby improving the model's adaptability to complex scenes.
However, the use of synthetic data also poses some challenges. Firstly, the generated synthetic data need to exhibit certain consistency with real data in terms of feature and data distributions; otherwise, the model's performance in real scenarios may decline [30]. Secondly, the quality of synthetic data significantly impacts the final model's performance. Ensuring that the generated synthetic data are sufficiently realistic is a crucial concern. Therefore, when utilizing synthetic data, careful design of synthetic data generation strategies, along with the incorporation of other data augmentation techniques, is necessary to ensure that the generated synthetic data positively contribute to the training of object detection algorithms [31].

The CycleGAN-Based Image Translation Networks
To address the issue of data scarcity in industrial object detection, some researchers have begun to explore the use of synthetic data and utilize a variant model of Generative Adversarial Networks (GANs) called CycleGAN to achieve image translation from the CAD image domain to the real image domain [32,33]. CycleGAN, proposed by Zhu et al. in 2017 [34], stands out for its ability to perform unpaired image translation, enabling bidirectional image conversion between two different domains while maintaining content consistency. In industrial object detection, CAD images are typically used by engineers for design and simulation, while real images are collected during the industrial production process. There is a significant difference between these two image domains, and traditional data augmentation and transfer learning methods often yield limited results in this scenario. Therefore, using CycleGAN for image translation has emerged as a novel solution. Through CycleGAN, CAD images can be translated into real images, thereby generating more realistic and diverse data in an industrial environment. This approach helps alleviate the data scarcity problem in industrial object detection and improves the model's generalization capabilities in real scenes. Additionally, CycleGAN can also perform inverse translation from the real image domain to the CAD image domain, converting real images into CAD images, further enriching the diversity of synthetic data.
However, the application of CycleGAN also faces challenges [35,36]. Firstly, the generated synthetic images need to possess sufficient realism and credibility to ensure the performance of the object detection model in real scenes. Therefore, careful parameter tuning and optimization of CycleGAN are required to obtain high-quality synthetic images. Secondly, there may be significant differences between CAD images and real images, necessitating the reasonable design of loss functions and weights during the training process to allow CycleGAN to learn better image conversion mappings [37][38][39].

Proposed Method
As illustrated in Figure 1, the proposed method mainly contains two steps: (1) Training images are collected for the image-to-image translation model, including industrial part CAD models and real part images from the scene. The collected CAD images and real images are cropped to a size of 256 × 256 and preprocessed to obtain the source domain X images and target domain Y images for the image-to-image translation model. (2) The preprocessed training images are fed into the image-to-image translation model based on an unpaired GAN, where the model learns the detailed characteristics of the parts from real images to achieve the image translation from CAD models to real images.
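The cropping in step (1) can be sketched as follows. This is only an illustration: the text states the 256 × 256 target size but not how the window is chosen, so the center crop and the function name `center_crop` are assumptions, and the image is represented as a plain list of pixel rows to keep the example dependency-free.

```python
def center_crop(image, size=256):
    """Return the central size x size window of an image given as a list of rows.

    Center cropping is an assumption; the source only states the target size.
    """
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 300 x 400 single-channel image whose pixel value encodes its position.
toy = [[r * 400 + c for c in range(400)] for r in range(300)]
patch = center_crop(toy)
```

The same function would be applied to both the CAD renderings (domain X) and the real photographs (domain Y) so that the two domains share one input size.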

Model Architecture
In this paper, we use a model based on the CycleGAN network structure for the task of image-to-image translation. The model performs the conversion between CAD model images and real scene images on an unpaired training dataset to generate synthetic data. The overall model architecture is illustrated in Figure 2. The model forms a circular network structure composed of two mirrored Generative Adversarial Networks (GANs), comprising two generators and two discriminators. The loss functions include adversarial loss, cycle consistency loss, and structural similarity loss. In the forward network, the generator G maps data Real_A from source domain X to target domain Y, producing Fake_B. The discriminator judges the generated image Fake_B, and then the generator F reconstructs it back to data Rec_A in domain X. Similarly, the transformation from the target domain Y to the source domain X follows the same process. The network thus learns the mapping between the source domain and the target domain and can also reconstruct back from the target domain, achieving the task of transforming CAD models into real part images.
During the model training process, first, for the input image domains X and Y, the generators produce the corresponding fake and reconstructed images. Then, the gradients of the generator networks are computed, and their weights are updated accordingly; next, the gradients of the discriminator networks are calculated, and the weights of the discriminator networks are updated. Finally, the latest network model is saved based on the set frequency parameter for model saving. The pseudo-code of the model algorithm is shown in Algorithm 1:
1. for each epoch do
2.   for each data in dataset do
3.     Generate domain X image fake_x and domain Y image fake_y;
4.     Set the gradient of the generator networks G and F to 0;
5.     Calculate the gradient of the generator networks G and F;
6.     Update the weight parameters of the generator networks G and F;
7.     Set the gradient of the discriminator networks D_X and D_Y to 0;
8.     Calculate the gradient of the discriminator networks D_X and D_Y;
9.     Update the weight parameters of the discriminator networks D_X and D_Y;
10.   end for
11.   if iters % save_model_freq == 0
12.     Save the latest model
13.   end if
14. end for
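The update order of Algorithm 1 can be sketched in plain Python as below. This is a schematic under stated assumptions, not the authors' training code: the `hooks` dictionary and its four callables are hypothetical placeholders for the real network operations, the outer loop is assumed to run over epochs, and the save check uses the epoch counter where the pseudo-code writes `iters`.

```python
def train(dataset, num_epochs, save_model_freq, hooks):
    """Run the update order of Algorithm 1 using caller-supplied hooks."""
    for epoch in range(1, num_epochs + 1):
        for real_x, real_y in dataset:
            # Step 3: translate in both directions (G: X -> Y, F: Y -> X).
            fake_y, fake_x = hooks["generate"](real_x, real_y)
            # Steps 4-6: zero, compute, and apply the generator gradients.
            hooks["update_generators"](real_x, real_y, fake_x, fake_y)
            # Steps 7-9: zero, compute, and apply the discriminator gradients.
            hooks["update_discriminators"](real_x, real_y, fake_x, fake_y)
        # Steps 11-13: periodic checkpoint (epoch counter assumed here).
        if epoch % save_model_freq == 0:
            hooks["save"](epoch)


def make_logging_hooks(events):
    """Hypothetical stand-ins that only record the order of the calls."""
    def generate(real_x, real_y):
        events.append("generate")
        return f"G({real_x})", f"F({real_y})"

    def update_generators(*batch):
        events.append("update_G_F")

    def update_discriminators(*batch):
        events.append("update_D_X_D_Y")

    def save(epoch):
        events.append(f"save@{epoch}")

    return {
        "generate": generate,
        "update_generators": update_generators,
        "update_discriminators": update_discriminators,
        "save": save,
    }


events = []
train([("x0", "y0"), ("x1", "y1")], num_epochs=2, save_model_freq=2,
      hooks=make_logging_hooks(events))
```

Updating the generators before the discriminators in each iteration mirrors the order of steps 4 to 9; in a real implementation each hook would wrap the corresponding optimizer calls.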

Network Structure
The CycleGAN network has a mirrored structure and consists of two parts, each of which is a sub-network based on GAN.The generator of each GAN network is composed of an encoder, a transformation module, and a decoder.The encoder module consists of two convolutional layers with a stride of 2 and a kernel size of 3 × 3. When the input image size is 128 × 128, the transformation module consists of 6 residual blocks with a kernel size of 3 × 3, and when the input image is 256 × 256, it consists of 9 residual blocks with a kernel size of 3 × 3. The decoder module consists of two transposed convolutional layers with a kernel size of 3 × 3, and the modules are connected through a fully convolutional network.
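As a rough check on the encoder description, the spatial size after the two stride-2 convolutions can be computed with the standard convolution output-size formula. The padding of 1 is an assumption (the text gives only the kernel size and stride), and `generator_plan` is a hypothetical helper name.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output size of one convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def generator_plan(input_size):
    """Return (feature-map size after the encoder, number of residual blocks)."""
    # 6 blocks for 128 x 128 inputs and 9 for 256 x 256, as stated in the text.
    blocks = {128: 6, 256: 9}[input_size]
    size = input_size
    for _ in range(2):  # two stride-2 convolutions in the encoder
        size = conv_out(size)
    return size, blocks


plan_256 = generator_plan(256)
plan_128 = generator_plan(128)
```

Under these assumptions the transformation module operates on feature maps a quarter of the input resolution, and the two transposed convolutions in the decoder restore the original size.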
To achieve the image translation task from industrial part CAD images to real images and improve the network performance, we adopted a series of improvement measures. First, we optimized the generator of the CycleGAN network by replacing its original residual blocks with RDB modules. The Dense Blocks use Conv + LReLU structures instead of the Conv + BN + ReLU structures used in ResNet, so batch normalization is removed. This change resulted in a significant performance improvement, not only increasing network stability but also reducing artifacts in the generated images, leading to a notable enhancement in the overall image quality. The RDB module and the improved generator structure are shown in Figures 3 and 4. In Figure 3, the dashed lines indicate the structure of a Dense Block. The yellow lines, blue lines, green lines, and brown lines represent dense connection layers.
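To make the dense connections concrete, the sketch below counts the input channels seen by each Conv + LReLU layer in one Dense Block, where every layer receives the block input concatenated with the outputs of all earlier layers. The numbers used (64 input channels, growth rate 32, four layers) and the final 1 × 1 fusion back to the input width follow the commonly used RDB design and are assumptions here, since the text does not give them.

```python
def rdb_channel_plan(in_channels=64, growth=32, num_layers=4):
    """Channel bookkeeping for one residual dense block (illustrative values).

    Layer i sees the block input plus the outputs of the i earlier layers.
    """
    layer_inputs = [in_channels + i * growth for i in range(num_layers)]
    # All features are concatenated and fused (assumed 1 x 1 convolution)
    # back to in_channels so the block output can be added to its input.
    fusion_input = in_channels + num_layers * growth
    return layer_inputs, fusion_input


layer_inputs, fusion_input = rdb_channel_plan()
```

The growing input width is what lets later layers reuse earlier features, and the residual addition keeps the block's output shape equal to its input shape.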

Loss Function
CycleGAN uses cycle consistency to establish the mapping relationship between the source domain and the target domain. The model employs both adversarial loss and cycle consistency loss in both Generative Adversarial Networks. Given two image domains X and Y, two mapping functions G: X→Y and F: Y→X are established between the two domains, where G and F are the generators in the GAN, and D_X and D_Y are the discriminators for domains X and Y. With this, a GAN loss can be defined, and the adversarial loss from X to Y is as follows:
$$L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$$
Similarly, the adversarial loss from Y to X is as follows:
$$L_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]$$
In this context, we denote X and Y as the source domain and target domain, respectively, with x∈X and y∈Y. p_data(x) represents the data distribution of the source domain X, and p_data(y) represents the data distribution of the target domain Y. E_{y∼p_data(y)} indicates the expectation of y under the distribution p_data(y), and E_{x∼p_data(x)} signifies the expectation of x under the distribution p_data(x).
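As a numerical illustration of the adversarial loss from X to Y, the sketch below evaluates the log-form GAN objective on a small batch, with the expectations replaced by batch means. It assumes the discriminator outputs probabilities in (0, 1); the function name and the sample values are hypothetical.

```python
import math


def adversarial_loss(d_real, d_fake):
    """Batch estimate of E[log D_Y(y)] + E[log(1 - D_Y(G(x)))].

    d_real: discriminator outputs on real target-domain images y
    d_fake: discriminator outputs on translated images G(x)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = adversarial_loss([0.5, 0.5], [0.5, 0.5])
# A confident, correct discriminator scores reals high and fakes low.
confident = adversarial_loss([0.9, 0.9], [0.1, 0.1])
```

The discriminator tries to raise this value while the generator tries to lower it, so a confident discriminator yields a larger value than an undecided one.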
When learning the mappings from X to Y and Y to X, the image domain X is transformed by the generator G to generate the forged domain Y_F, and then through F to generate the reconstructed domain X_R. The goal is to minimize the difference between X and X_R and, similarly, the difference between Y and Y_R. Therefore, the cycle consistency loss function is defined as follows:
$$L_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1]$$
In addition, the Structural Similarity (SSIM) loss function is introduced, which calculates the structural similarity between the generated images and their corresponding real images. Gwantae Kim et al. utilized SSIM to enhance the quality of super-resolution images [40]. Similarly, Fengquan Zhang et al. employed SSIM to improve the quality of image translation networks [41]. By minimizing the SSIM loss, we encourage the generator to preserve more image structural information during the translation process, thereby enhancing the quality and realism of the generated images. The introduction of SSIM loss also helps to reduce artifacts and blurriness that may occur in the generated images, thereby improving the stability and reliability of the image translation.
SSIM defines the structural information of an image from the perspective of image composition, reflecting the attributes of objects in the scene, and models distortion as a combination of three factors: luminance, contrast, and structure. In image processing, luminance is estimated by the mean, contrast by the standard deviation, and the degree of structural similarity by the covariance. µ_x and µ_y are the mean values of pixels in domains X and Y, respectively, σ_x^2 and σ_y^2 are the variances of pixels in domains X and Y, σ_xy is the covariance between domains X and Y, and C_1 and C_2 are small constants that keep the denominator away from zero. The formula for SSIM is as follows:
$$SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
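The SSIM statistic can be illustrated with a minimal sketch that computes the mean, variance, and covariance over a whole image at once. This is a simplification: practical SSIM implementations average the index over local windows, and the constants C_1 = (0.01·L)^2 and C_2 = (0.03·L)^2 with dynamic range L = 255 are the conventional defaults, assumed here rather than taken from the text.

```python
def ssim_global(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """SSIM from global statistics of two equally sized, flattened images."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * max_value) ** 2  # stabilizes the luminance term
    c2 = (k2 * max_value) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


image = [10.0, 20.0, 30.0, 40.0]
same = ssim_global(image, image)           # identical images
flipped = ssim_global(image, image[::-1])  # same pixels, reversed structure
```

Identical images score 1, while reversing the pixel order keeps the mean and variance but makes the covariance negative, which drives the index down; a loss of the form 1 − SSIM therefore penalizes structural changes.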
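Using the above formula, we obtain the Structural Similarity (SSIM) loss term, which decreases as the compared images x and y become structurally more similar:
$$L_{SSIM} = 1 - SSIM(x, y)$$
The overall loss function of the RDB-CycleGAN network is a weighted combination of three parts: adversarial loss, cycle consistency loss, and structural similarity loss:
$$L(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cyc}(G, F) + \theta L_{SSIM}$$
Here, we set λ to 1 and θ to 0.2.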
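The weighted combination can be sketched as below. The mapping of λ to the cycle consistency term and θ to the SSIM term is an assumption inferred from the order in which the three parts are listed; the helper names and the sample loss values are hypothetical.

```python
def cycle_l1(x, rec_x, y, rec_y):
    """Mean absolute error between inputs and their reconstructions (L1 cycle loss)."""
    forward = sum(abs(a - b) for a, b in zip(rec_x, x)) / len(x)
    backward = sum(abs(a - b) for a, b in zip(rec_y, y)) / len(y)
    return forward + backward


def total_loss(adv_xy, adv_yx, cycle, ssim_loss, lam=1.0, theta=0.2):
    """Adversarial terms plus weighted cycle and SSIM terms (assumed weighting)."""
    return adv_xy + adv_yx + lam * cycle + theta * ssim_loss


cycle = cycle_l1([0.0, 1.0], [0.0, 0.5], [1.0, 1.0], [1.0, 1.0])
loss = total_loss(adv_xy=0.7, adv_yx=0.6, cycle=cycle, ssim_loss=0.25)
```

With θ = 0.2 the SSIM term acts as a comparatively light structural regularizer next to the adversarial and cycle terms.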

Experiments and Discussion
We conducted two main sets of experiments. Firstly, a comparison was made between the improved RDB-CycleGAN network and other image translation networks. The experimental results demonstrated the effectiveness of the proposed enhancement in improving the quality of image generation. Additionally, we applied the images generated by the RDB-CycleGAN network to the YOLOv5 object detection algorithm to demonstrate the effectiveness of synthetic data.
The computer configuration used in our experiments consists of an Nvidia GeForce RTX 2080Ti GPU and an Intel i9-13900K CPU. Regarding parameter settings, we set the number of epochs to 200, the learning rate to 0.0002, and the batch size to 8. During the network training process, we selected 1000 real images and 1000 CAD images for each of the three industrial parts to achieve image domain conversion.
We conducted comprehensive experiments and evaluations on the generated images to objectively assess their performance and quality. For this purpose, we employed several evaluation metrics, including Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Fréchet Inception Distance (FID). PSNR is a traditional image quality metric used to compare the distortion between original images and GAN-generated images. A higher PSNR value indicates less distortion between the generated and real images and hence higher image quality. SSIM is another widely used image similarity metric that considers structural information together with luminance and contrast. An SSIM value closer to 1 indicates higher structural similarity between the generated and real images. FID is a metric used to assess the difference between two image distributions. It quantifies the difference between real and generated images by comparing their distributions in the feature space of an Inception network. A lower FID value indicates less difference between the generated and real images, indicating better performance of the generation model.
In our experiments, we calculated PSNR, SSIM, and FID to assess the quality of images generated by the adversarial network, using these metrics for a holistic performance evaluation. This approach offered an objective analysis, guiding enhancements in the network's generative efficiency. The FID score measures the distance between real and generated images at the feature level. Using InceptionV3, we generate N × 2048 vectors for N images in the real dataset to obtain the mean µ_x. Similarly, for M images generated in the synthetic dataset, we generate M × 2048 vectors and obtain the mean µ_y. Σ_x represents the covariance matrix for the real dataset, Σ_y represents the covariance matrix for the generated dataset, and Tr indicates the trace, i.e., the sum of the diagonal elements of a matrix. The FID is calculated using the following formula:
$$FID = \lVert \mu_x - \mu_y \rVert_2^2 + Tr\left(\Sigma_x + \Sigma_y - 2(\Sigma_x \Sigma_y)^{1/2}\right)$$
Peak Signal-to-Noise Ratio (PSNR) is used to compare the similarity between two images and evaluate the level of distortion between compressed or processed images and the original image. Mean Squared Error (MSE) represents the mean squared difference between the two images, and MAX_I denotes the maximum possible pixel value of the image. The calculation formula for PSNR is as follows:
$$PSNR = 10 \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$$
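Both metrics can be illustrated with a small dependency-free sketch. `psnr` follows the definition in terms of MSE and the maximum pixel value (255 is assumed for 8-bit images). `frechet_1d` is only the scalar special case of the Fréchet distance, for a single feature with Gaussian statistics; the FID used in the experiments applies the matrix form to 2048-dimensional InceptionV3 features, which requires a matrix square root and is not reproduced here. Both function names are hypothetical.

```python
import math


def psnr(original, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


def frechet_1d(mu_x, var_x, mu_y, var_y):
    """Fréchet distance between two 1-D Gaussians (scalar case of the FID formula)."""
    return (mu_x - mu_y) ** 2 + var_x + var_y - 2.0 * math.sqrt(var_x * var_y)


off_by_one = psnr([10.0, 20.0], [11.0, 21.0])  # MSE = 1
worst_case = psnr([0.0] * 4, [255.0] * 4)      # MSE = 255^2
same_stats = frechet_1d(0.0, 1.0, 0.0, 1.0)
shifted = frechet_1d(0.0, 1.0, 3.0, 4.0)
```

In the scalar case the trace term reduces to (σ_x − σ_y)^2, so the distance is zero exactly when both the means and the variances agree.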

Experiment Results
We conducted image translation experiments using DualGAN, CycleGAN, GP-UNIT, StarGAN-v2, and the improved CycleGAN proposed in this paper. The experimental results are shown in Figure 6. In Figure 6, the first column shows the CAD images, while the second through sixth columns display the corresponding image translation results of the DualGAN model, CycleGAN model, GP-UNIT model, StarGAN-v2 model, and the improved CycleGAN model, respectively.
From a subjective visual perspective, when using the original CycleGAN, DualGAN, GP-UNIT, and StarGAN-v2 networks for image style transfer, there are clearly many issues, such as significant structural deficiencies, excessive artifacts, and incomplete images, which have a significant impact on subsequent object detection tasks. In comparison, the RDB-CycleGAN generates fewer artifacts and has minimal structural deficiencies, resulting in less object deformation.
Additionally, we randomly selected 200 images from each of the three categories of generated datasets for the objective evaluation of image quality. Table 1 presents the quantitative results for the objective evaluation metrics. The best and second-best results are in red and blue, respectively. From Table 1, it can be observed that the RDB-CycleGAN algorithm performs better than the CycleGAN, DualGAN, GP-UNIT, and StarGAN-v2 algorithms in terms of the SSIM, FID, and PSNR metrics for the style transfer from CAD models to real images. The generated images have better quality and are closer to the standard real images. The objective evaluation results of the metrics are consistent with the subjective visual perception.
We conducted ablation experiments on the introduced RDB module and SSIM loss, and Figure 7 presents the results. From a subjective visual perspective, replacing the generator in the CycleGAN model with the RDB module has effectively improved the part completion in the image translation process. The generated features now align better with the original images, resulting in clearer images. Moreover, the inclusion of the SSIM loss function in the network model has also led to a certain degree of enhancement in the part's structural features. Through our ablation experiments, it is evident that our RDB module and SSIM loss function can effectively enhance the quality of the synthesized images.

Experiment Results
In addition, we employed two different data augmentation methods, CAD images and RDB-CycleGAN synthetic images, to explore their effectiveness in the object detection task. We used Yolov5 as the object detection model and conducted comparative experiments to evaluate the effects of these two data augmentation methods. For CAD images, we generated diverse images using computer-aided design techniques to simulate different perspectives in real-world scenarios. For the RDB-CycleGAN synthetic images, we employed image style transfer techniques to generate synthetic data, thereby increasing the diversity and complexity of the dataset. We retrained the Yolov5 object detection model using the augmented datasets and evaluated its performance on the same test set.
The YOLOv5 framework provides models of various sizes (Yolov5s, Yolov5m, Yolov5l, and Yolov5x) to cater to diverse requirements and computational resources. In our experiments, we selected the Yolov5s model for its balance of high detection accuracy and reduced computational expense. Furthermore, we set the number of epochs to 200 and the batch size to 8, with network input images configured at 640 × 640 pixels.
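The training settings above can be collected in a small configuration object. The sketch below is illustrative only: it assumes the training script of the public Ultralytics YOLOv5 repository (`train.py` with the `--weights`, `--data`, `--epochs`, `--batch-size`, and `--imgsz` options), and the dataset file name `industrial_parts.yaml` is a hypothetical placeholder, not a file from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in the text; file names are placeholders."""
    weights: str = "yolov5s.pt"               # Yolov5s pretrained checkpoint
    data_yaml: str = "industrial_parts.yaml"  # hypothetical dataset description
    epochs: int = 200
    batch_size: int = 8
    img_size: int = 640                       # 640 x 640 network input


def build_train_command(cfg):
    """Return the argument list for launching YOLOv5 training."""
    return [
        "python", "train.py",
        "--weights", cfg.weights,
        "--data", cfg.data_yaml,
        "--epochs", str(cfg.epochs),
        "--batch-size", str(cfg.batch_size),
        "--imgsz", str(cfg.img_size),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, since the repository
    # and dataset paths are assumptions.
    print(" ".join(build_train_command(TrainConfig())))
```

Keeping one configuration for both the CAD-augmented and the synthetic-augmented runs helps ensure that only the training data differ between the two experiments.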
The experimental results demonstrated that the model augmented with the RDB-CycleGAN synthetic images outperformed the model augmented with CAD images, achieving better detection accuracy and generalization ability. This improvement can be attributed to CycleGAN's capability to learn the feature distribution of real images and apply it to synthetic image generation, resulting in synthetic images that closely resemble the distribution of real data. In contrast, CAD images, being computer-generated, might have certain differences from real images, which could lead to inferior performance when used for data augmentation compared to the RDB-CycleGAN synthetic images. The experimental results suggest that CAD images can effectively augment the dataset to some extent, but the synthetic dataset obtained through image translation is more competitive. Therefore, the proposed synthetic dataset is deemed necessary and advantageous. The experimental results are shown in Table 2.
We also controlled the proportions of real data and synthetic data and set different Intersection over Union (IoU) thresholds to obtain the detection accuracy curves under different data ratios and IoU values. Figure 8 displays the curves obtained by varying the proportion of data synthesized using the RDB-CycleGAN network and real data, as well as the curves obtained by varying the proportion of CAD images and real data.
Several trends can be observed in Figure 8. As the IoU threshold changes from 0.9 to 0.5, the mean Average Precision (mAP) gradually increases, with the highest mAP value achieved at an IoU of 0.5. Additionally, when the ratio of real data to synthetic data/CAD images changes, the mAP value also varies. The closer the ratio is to 0.8, the higher the mAP, and the highest mAP value is achieved when the ratio is 0.8. Furthermore, in a cross-comparison, we find that when using our synthetic data, the mAP values are higher than when using CAD images directly under the same conditions. Therefore, we can conclude that our synthetic data demonstrate strong competitiveness, enhancing object detection accuracy and outperforming the use of CAD images.
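The dependence of mAP on the IoU threshold follows from how a detection is counted as correct. The sketch below is a minimal illustration, assuming axis-aligned boxes in `(x1, y1, x2, y2)` format; the function names and the example boxes are hypothetical and not taken from the experiments.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(predicted, ground_truth, threshold):
    """A prediction matches a ground-truth box if its IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold


if __name__ == "__main__":
    predicted = (0, 0, 10, 10)
    ground_truth = (2, 0, 12, 10)
    for threshold in (0.5, 0.7, 0.9):
        print(threshold, is_true_positive(predicted, ground_truth, threshold))
```

In this example the two boxes have an IoU of 2/3, so the prediction counts as correct at a threshold of 0.5 but not at 0.7 or 0.9. Stricter thresholds reject more predictions, which is consistent with mAP being highest at an IoU of 0.5.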
The proposed image translation method in this paper was used to synthesize data for three types of industrial parts. We established two sets of training data and conducted tests under the same set of images. The first set of data consisted of 200 real multi-category images and 600 single-category images synthesized using our method (200 images for each of three categories), with the test results shown in Figure 9. The second set comprised 200 real multi-category images and 600 single-category CAD images (200 images per category), with the test results depicted in Figure 10. Both datasets were trained using the Yolov5s model. The detection results indicate a noticeable improvement in accuracy when using our synthesized data.
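The composition of the two training sets (200 real images plus 200 additional images for each of three part categories) can be sketched as follows. This is an illustrative assumption about how such a set might be assembled, not the authors' pipeline; the function name, file names, and category names are hypothetical.

```python
import random


def build_training_set(real_images, extra_by_category, per_category=200, seed=0):
    """Combine all real images with a fixed number of extra images per category.

    `extra_by_category` maps a category name to its pool of synthetic
    (or CAD) image paths.
    """
    rng = random.Random(seed)  # fixed seed so both sets are reproducible
    training = list(real_images)
    for category in sorted(extra_by_category):
        pool = extra_by_category[category]
        if len(pool) < per_category:
            raise ValueError(f"not enough images for category {category!r}")
        training.extend(rng.sample(pool, per_category))
    rng.shuffle(training)
    return training


if __name__ == "__main__":
    real = [f"real_{i}.jpg" for i in range(200)]
    synthetic = {
        name: [f"{name}_syn_{i}.png" for i in range(300)]
        for name in ("part_a", "part_b", "part_c")
    }
    train = build_training_set(real, synthetic)
    print(len(train))  # 200 real + 3 * 200 synthetic images
```

Calling the same function once with the synthetic pools and once with the CAD pools yields two training sets of equal size, so the comparison in Figures 9 and 10 differs only in the source of the additional images.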
The results demonstrate that our synthetic data are highly competitive and can effectively augment industrial part data that are difficult to obtain. The following figures display some detection results from the test dataset.
From Figures 9 and 10, we can intuitively observe that within the same group of test images, the detection accuracy using our synthesized data is higher than that achieved with the direct use of CAD images. First, the data synthesis method proposed in this paper effectively expands the dataset: with only a small number of CAD images and real images, we can inexpensively acquire a large amount of synthesized data. Second, compared to the approach of directly using CAD images to expand the dataset, our synthesized data are more competitive.

Conclusions
This paper proposes a synthetic data generation framework for object detection tasks, comparing real and synthetic data to analyze how different combinations of real and synthetic data affect the accuracy of object detection models. In the original CycleGAN network, RDB modules and SSIM loss are introduced to improve the quality of synthetic data and complete the translation task from CAD images to real images effectively. In the experimental section, we controlled the ratio of synthetic data to real data, demonstrating that our synthetic data, being directly based on CAD images, effectively augment the dataset and improve detection accuracy.
Limitations and deficiencies: There is still room for further improvement in the quality of our synthetic data. Additionally, our synthetic data are relatively limited in scene diversity, lacking sufficient variation. Future work will focus on further enhancing the
The preprocessed training images are fed into the image-to-image translation model based on unpaired GAN, where the model learns the detailed characteristics of the parts from real images to achieve the image translation from CAD models to real images.
Mathematics 2023, 11, x FOR PEER REVIEW 5 of 18

Figure 1. Synthetic data generation for object detection.

Figure 3. The RDB module and Dense Block of the RDB-CycleGAN network.

Figure 4. The architecture of the generator network.

The discriminator of the GAN network in this paper adopts the original network's 70 × 70 PatchGAN network structure. Compared to traditional generative adversarial networks, this structure can better capture local features in the images. PatchGAN maps the input feature map into a 30 × 30-sized output feature map, which corresponds to the probabilities of multiple 70 × 70 local patches of the input feature map being real. The discriminator convolves over the entire N × N-sized image, resulting in a 30 × 30-sized output, and then takes the average value to obtain the final output. The structure of the discriminator is shown in Figure 5.
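The 70 × 70 patch size and the 30 × 30 output map can be checked with a short calculation. The sketch below assumes the layer layout of the standard PatchGAN from the original pix2pix/CycleGAN implementations (three 4 × 4 convolutions with stride 2 followed by two 4 × 4 convolutions with stride 1, all with padding 1) and a 256 × 256 input; neither the layer list nor the input size is stated in this section, so both are assumptions.

```python
# (kernel, stride, padding) of each convolution, in forward order.
# Assumed to match the standard 70 x 70 PatchGAN discriminator.
PATCHGAN_LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]


def output_size(input_size, layers=PATCHGAN_LAYERS):
    """Spatial size of the output map for a square input."""
    size = input_size
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


def receptive_field(layers=PATCHGAN_LAYERS):
    """Side length of the input patch seen by one output value."""
    field = 1
    # Walk backwards from a single output pixel to the input.
    for kernel, stride, _ in reversed(layers):
        field = (field - 1) * stride + kernel
    return field


if __name__ == "__main__":
    print(output_size(256))   # 30
    print(receptive_field())  # 70
```

Under these assumptions each of the 30 × 30 output values scores one 70 × 70 region of the input, and averaging the map gives the final real/fake decision described above.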

Figure 5. The architecture of the discriminator network.

Figure 8. Mean average precision at different image ratios and IoU levels: (a) real data and synthetic data; (b) real data and CAD images.

Figure 9. Object detection results trained on real data and our synthetic data.

Figure 10. Object detection results trained on real data and CAD images.


Table 1. The quantitative comparison results of SSIM, FID, and PSNR.

Table 2. The quantitative comparison of the Yolov5 object detection algorithm in terms of mAP at an IoU threshold of 0.5 under different amounts of our generated synthetic data, CAD data, and real data, with the best results in bold.