PreCaCycleGAN: Perceptual Capsule Cyclic Generative Adversarial Network for Industrial Defective Sample Augmentation

Machine vision is essential for intelligent industrial manufacturing driven by Industry 4.0, especially for surface defect detection of industrial products. However, this domain faces sparse and imbalanced defect data and poor model generalization, which affect industrial efficiency and quality. We propose a perceptual capsule cyclic generative adversarial network (PreCaCycleGAN) for industrial defect sample augmentation, which generates realistic and diverse defect samples from defect-free real samples. PreCaCycleGAN enhances CycleGAN with a U-Net and DenseNet-based generator to improve defect feature propagation and reuse, and adds a perceptual loss function and a capsule network to improve the authenticity and semantic information of generated features, enabling richer and more realistic global and detailed features of defect samples. We experiment on ten datasets, splitting each dataset into training and testing sets to evaluate model generalization across datasets. We train three defect detection models (YOLOv5, SSD, and Faster-RCNN) with original data and augmented data from PreCaCycleGAN and other state-of-the-art methods, such as CycleGAN-TSS and Tree-CycleGAN, and validate them on different datasets. Results show that PreCaCycleGAN improves the detection accuracy and rate and reduces the false detection rate of the detection models compared to other methods on different datasets, demonstrating its robustness and generalization under various defect conditions.

which can input additional labels, texts, or image information as conditions to the generator and discriminator and improve the realism of the generated samples according to this information. The discriminator tries to maximize its ability to distinguish between real samples and generated samples. AC-GAN [15] proposed a GAN based on an auxiliary classifier, which adds an extra classifier in the discriminator to predict the category of the input sample, and combines the classification loss and discrimination loss to optimize the network. This can improve the quality and diversity of the generated samples and also use category information to control the generation process. The objective function of the original GAN is a minimax game, which has problems such as gradient vanishing, saddle point, KL divergence asymmetry, etc. Therefore, WGAN introduced Wasserstein distance as a measure between the real distribution and the generated distribution and gave a simple and effective algorithm to optimize this distance. WGAN [16] can avoid gradient vanishing and mode collapse and provide a meaningful indicator of training progress. Models based on domain crossing mainly use the relationship or transformation rules between different domains to achieve cross-domain generation tasks. In the original GAN design, the generator can only map from a random noise space to a data space and cannot achieve transformation between different data spaces. Pix2Pix [17] proposed a GAN based on conditional GAN and U-Net structure, which can achieve supervised transformation from one image domain to another image domain, such as from sketch to color image, from day to night, etc. StarGAN [18] proposed a GAN based on conditional GAN and CycleGAN structure, which can achieve unsupervised transformation between multiple image domains, such as changing facial expressions, hairstyles, gender, etc. These methods provide some pioneering suggestions for solving this kind of problem.
Although GANs have a wide range of applications in the sample augmentation field, they mainly focus on domains such as face attribute transformation and landscape color transformation. In the industrial defect detection field, due to difficulties such as the lack of defect samples, low visibility of defects, irregular shapes, and unknown types, existing GAN-based augmented samples struggle to meet the task requirements of high accuracy and high speed at the same time. Therefore, designing a GAN model that can synthesize realistic and diverse defect samples with high fidelity and efficiency is a challenge in industrial defect detection. To address this challenge, we propose a perceptual capsule cyclic generative adversarial network (PreCaCycleGAN) for industrial defect sample augmentation, which aims to learn a more realistic distribution of industrial defect data. Our method leverages CycleGAN's framework of bi-directional mapping and cycle consistency loss and enhances it with a least-squares loss and a perceptual loss function. Moreover, our method adopts an optimized generator structure with U-Net and DenseNet modules, and a capsule network with viewpoint invariance, to further improve the generator's ability to learn the features of industrial defect samples. The main contributions of our model are as follows:
(i) We design a generator with a U-Net network structure [19] and DenseNet [20] modules to enhance the feature propagation and feature reuse of defects. This alleviates the vanishing gradient problem of deep networks. We also add a perceptual loss function to enhance the feature and semantic information of the generated images.
(ii) We use cycle consistency loss, identity mapping loss, and least-squares loss to construct an adversarial training framework that allows random changes in defect location and shape, keeps the generated samples consistent with the real samples in the non-defective regions, improves the similarity between the generated and real samples, and avoids mode collapse and gradient vanishing or oscillation.
(iii) We design a discriminator that splits into two branches after the initial feature extraction: a PatchGAN [21] branch and a capsule network [22] branch that uses the dynamic routing protocol. This design effectively extracts and retains the detailed features of defective samples, identifies both the local and overall features of the samples, and improves the authenticity and diversity of the generated industrial defect samples.
(iv) We compare our method with other generation algorithms and validate it on actual industrial manufacturing defect detection models. We show that our method yields the largest performance improvement among the compared methods for these detection models and can effectively increase their generalization ability.

Related Work
Data augmentation is a common technique to enhance the performance and generalization ability of machine learning models by artificially creating new data to expand and enrich the training dataset. Sample augmentation is a specific form of data augmentation that is tailored to the characteristics and requirements of different domains or tasks. Data augmentation has been widely applied in computer vision, especially for tasks such as image classification, object detection, and semantic segmentation, where it can effectively address the issues of data insufficiency, dataset imbalance, and overfitting. However, in industrial defect detection, obtaining defect samples is challenging because the high yield rate of intelligent manufacturing limits both the quantity and the diversity of defect samples. Moreover, industrial defect samples require manual inspection and annotation by professionals, which is time-consuming and expensive. Furthermore, industrial defect samples are highly complex and diverse and are often sensitive and confidential, which restricts data sharing and communication and hinders the development of the industrial defect detection field. Therefore, designing suitable data augmentation techniques to overcome the data scarcity and imbalance problems in the industrial defect domain and to improve the robustness and accuracy of industrial defect detection models is an important and meaningful topic. We review the current related research in data augmentation from three perspectives: model-free image augmentation, model-based image augmentation, and optimizing policy-based image augmentation, and analyze their advantages and challenges for industrial defect samples.

Model-Free Image Augmentation
Model-free Image Augmentation (MIA) is a data augmentation method that does not depend on any model training or optimization, and it augments the data by applying various geometric or color transformations to the original image, such as rotation, translation, scaling, cropping, flipping, brightness adjustment, contrast adjustment, etc. [23]. These transformations can be done in image space or frequency domain and can be randomly combined. However, these conventional transformation methods often only increase the data quantity but not the data diversity and may cause information loss or distortion. To address this issue, some researchers proposed methods such as CutMix [24], which mixes different images; Random Erasing [25], which replaces pixel values with random rectangles; Noise Injection [26], which adds random values from Gaussian distributions to an image; and Copy-Paste [27], which randomly pastes instance targets on background images. These methods can improve the data diversity and complexity by blending or erasing images, but they can also lose the details and boundary information of the images, which can affect the performance of the model for fine-grained target detection tasks.
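As an illustration of the operations described above, the following minimal sketch applies a horizontal flip, Random Erasing, and Noise Injection to a small grayscale image stored as a list of lists. The function names, the 8 × 8 test image, and the parameter values (`max_frac`, `sigma`) are hypothetical choices made for this example and are not taken from the cited methods.

```python
import random

def horizontal_flip(img):
    """Mirror each row of a 2-D grayscale image (list of lists)."""
    return [row[::-1] for row in img]

def random_erase(img, rng, max_frac=0.5):
    """Random Erasing: overwrite one random rectangle with random pixel values."""
    h, w = len(img), len(img[0])
    eh = rng.randint(1, max(1, int(h * max_frac)))  # rectangle height
    ew = rng.randint(1, max(1, int(w * max_frac)))  # rectangle width
    top = rng.randint(0, h - eh)
    left = rng.randint(0, w - ew)
    out = [row[:] for row in img]  # copy so the input stays untouched
    for r in range(top, top + eh):
        for c in range(left, left + ew):
            out[r][c] = rng.randint(0, 255)
    return out

def gaussian_noise(img, rng, sigma=10.0):
    """Noise Injection: add zero-mean Gaussian noise and clip to [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]

# Toy 8 x 8 image (assumed data, for illustration only).
rng = random.Random(0)
image = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
flipped = horizontal_flip(image)
erased = random_erase(image, rng)
noisy = gaussian_noise(image, rng)
```

All three operations keep the image size and only rearrange or overwrite existing pixels, which is why such transformations add quantity rather than new defect content.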
MIA is a general and simple data augmentation method that can be applied to any image data and task, but there are few studies on algorithms specifically designed for industrial defect sample augmentation. Only in 2023 did Farady et al. [28] propose PreAugNet, which uses a Support Vector Machine (SVM) as a class boundary classifier to filter the samples generated by MIA and combine them with the original ones. The limitations of MIA for industrial defect sample augmentation fall into two aspects. On the one hand, it cannot customize the transformations for a specific type of defect, and it usually requires manual setting of the transformation types and parameters, which are hard to adapt to different tasks and datasets. On the other hand, it can only transform the original image in its spatial or frequency domain and cannot change the content or structure of the image, so the difference between the generated samples and the original samples is limited, and it cannot effectively extend the data distribution or cover a new feature space. As a result, MIA cannot cope with industrial defect images that have specific structures or constraints and arise from complex and dynamic industrial scenarios, and excessive transformations may destroy the semantic information of the image and thus compromise the quality and authenticity of the generated industrial defect samples.

Optimizing Policy-Based Image Augmentation
Optimizing Policy-based Image Augmentation (OPIA) is an approach that uses an optimization algorithm to search for the optimal data augmentation policy. A policy is essentially a sequence of MIA operations and their parameters, such as rotation by 15 degrees + crop 0.8 + brightness adjustment 0.2. OPIA can automatically find the best data augmentation strategy for different datasets and tasks and can significantly improve the model performance on a test set. Cubuk et al. [29] proposed AutoAugment, the first OPIA method, which uses a reinforcement learning-based controller to select the optimal data augmentation strategy, but it is very slow and computationally intensive. Cubuk et al. [30] then proposed RandAugment based on the data augmentation strategy of the Neural Architecture Search (NAS) method [31], which reduces the search space, makes the search results more general and stable, and can adapt to models and datasets of different sizes and complexities. Lim et al. [32] further improved AutoAugment by proposing Fast AutoAugment, which uses Bayesian optimization and density matching to speed up the search process and is three orders of magnitude faster than AutoAugment in search time while achieving similar or better performance. Ho et al. [33] proposed Population-Based Augmentation (PBA), which optimizes both the target network and the data augmentation strategy and is four orders of magnitude faster than AutoAugment in search time while achieving similar or better performance. Zhang et al. [34] proposed Adversarial AutoAugment, based on generative adversarial networks, which uses adversarial loss and reinforcement learning to optimize the data augmentation strategy; it is 12 times faster than AutoAugment in search time while achieving the best performance on multiple datasets.
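To make the notion of a policy concrete, the sketch below represents a policy as a list of (operation, magnitude) pairs and selects one by plain random search. This is only a toy stand-in: the cited methods use reinforcement learning, Bayesian optimization, or population-based training instead of random search, and `toy_score` is a hypothetical placeholder for the validation accuracy of a model trained under the policy. Rotation is restricted to 90-degree turns to keep the example dependency-free.

```python
import random

def brightness(img, delta):
    """Shift pixel values by delta * 255 and clip to [0, 255]."""
    shift = round(delta * 255)
    return [[min(255, max(0, p + shift)) for p in row] for row in img]

def center_crop(img, frac):
    """Keep the central fraction `frac` of the rows and columns."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]

def rotate90(img, turns):
    """Rotate clockwise by 90 degrees, int(turns) times."""
    for _ in range(int(turns) % 4):
        img = [list(col) for col in zip(*img[::-1])]
    return img

OPS = {"brightness": brightness, "crop": center_crop, "rotate90": rotate90}
RANGES = {"brightness": (-0.2, 0.2), "crop": (0.6, 1.0), "rotate90": (0, 3)}

def apply_policy(img, policy):
    """Apply each (operation, magnitude) pair in order."""
    for name, magnitude in policy:
        img = OPS[name](img, magnitude)
    return img

def random_policy(rng, length=2):
    """Sample a policy: `length` operations with random magnitudes."""
    policy = []
    for _ in range(length):
        name = rng.choice(sorted(RANGES))
        low, high = RANGES[name]
        policy.append((name, rng.uniform(low, high)))
    return policy

def toy_score(policy):
    # Hypothetical stand-in for validation accuracy under this policy.
    return -sum(abs(m) for _, m in policy)

def search(score_fn, rng, trials=20):
    """Random search: keep the highest-scoring sampled policy."""
    return max((random_policy(rng) for _ in range(trials)), key=score_fn)

image = [[(r * 10 + c) % 256 for c in range(10)] for r in range(10)]
example_policy = [("rotate90", 1), ("crop", 0.8), ("brightness", 0.2)]
augmented = apply_policy(image, example_policy)
best_policy = search(toy_score, random.Random(0))
```

In a real search, each call to the scoring function would require training or fine-tuning a model, which is where the search cost discussed above comes from.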
However, OPIA still depends on MIA operations as its transformations and thus suffers from the same problems and limitations as model-free image augmentation techniques. For industrial defect detection, no studies have been found that use optimizing policy-based image augmentation techniques to improve model performance. This may be due to the lack of sufficiently large and high-quality training data and feedback signals in industrial defect detection, which makes it difficult for these techniques to effectively learn data augmentation strategies or parameters.

Model-Based Image Augmentation
Model-based Image Augmentation (MBIA) is an approach that leverages deep learning models to synthesize new data samples. With the advancement of deep learning, traditional data augmentation methods are gradually being replaced by data augmentation algorithms based on deep learning frameworks. Deep learning models can learn latent feature distributions from raw data and can generate new data samples from random noise or conditional inputs. MBIA can effectively increase the size and diversity of datasets and can produce high-quality and high-fidelity data samples. Kuo et al. [35] proposed FeatMatch based on Convolutional Neural Networks (CNNs) [36], which replaces simple transformations in image space with complex transformations generated in feature space, thus enhancing data diversity and consistency. However, the lack of interpretability of vector data in feature space makes training difficult and time-consuming. Therefore, Wong et al. [37] shifted the perspective of data augmentation to data space and found that data augmentation in data space is superior to data augmentation in feature space. However, data augmentation methods in both feature space and data space do not sufficiently learn the true distribution of the sample data, which has drawn research attention to data augmentation methods based on Generative Adversarial Networks (GANs) [38].
A GAN is a deep learning framework that consists of a generative model and a discriminative model that compete with each other. GANs can learn the underlying data distribution from raw samples and generate novel samples with diverse attributes such as types, positions, sizes, and shapes. The generation and discrimination processes are driven by a zero-sum game that ensures the progressive convergence between the generated and authentic data distributions. However, the GAN training process faces many challenges due to its non-convex and non-cooperative nature. Mode collapse, gradient vanishing, and oscillatory disturbances are common problems that affect the quality and diversity of generated samples. Various GAN variants such as WGAN [16], LS-GAN [39], and f-GAN [40] have introduced different loss functions and distance metrics to improve the similarity between generated and real samples. Likewise, models such as cGAN [14], AC-GAN [15], and InfoGAN [41] have modified the architectures of generators and discriminators to increase the expressiveness and diversity of generative models. However, due to the complex and variable features of industrial defects, relying only on GANs to generate new industrial defect samples from random noise might lead to significant differences or biases compared to real samples. The generated samples might lack plausibility or credibility, which limits the application of GANs in industrial defect sample generation.
To endow GANs with more control mechanisms for sample generation, models such as Pix2Pix [17] and CycleGAN [21] have used translation between different images to impose constraints on generated samples, ensuring their closer approximation to real images. Based on this idea, some researchers have explored industrial defect sample generation. Qin et al. [42] proposed Tree-CycleGAN, a cyclic generative adversarial network based on a symmetric tree structure. This method uses a tree-structured generator with maximal diversity loss to enable one-to-many generation mappings. Using a tree-structured reconstructor and dual discriminators, Tree-CycleGAN can generate multiple target domain samples from a single source domain sample while preserving differences and cyclic consistency across different branches. This method effectively alleviates the problem of industrial defect sample insufficiency.
Similarly, Song et al. [43] introduced CycleGAN-TSS, a Texture Self-Supervised CycleGAN that leverages texture information as a self-supervisory signal to guide the generator in acquiring enhanced shadow features. Compared to traditional CycleGAN, CycleGAN-TSS can produce more realistic shadow images, thereby improving road crack detection performance. Niu et al. [44] proposed a method that combines CycleGAN with a Defect Attention Module (DAM). This adaptive method adjusts the weights of defect regions and integrates structural similarity (SSM) into the original L1 loss to formulate the Defect Cycle Consistency Loss (DCL). By using grayscale and structural features, this method enhances the simulation of internal defect structures. Notably, unlike other GAN-based methods, this method yields clearer and more authentic defect images, thereby enhancing defect detection accuracy. In a different contribution, Shao et al. [45] introduced DuCaGAN, a Dual Capsule Generative Adversarial Network based on CycleGAN. DuCaGAN uses the Dual Capsule Network (DCN) [22] to generate diversified and high-fidelity industrial defect samples, which can be used for practical industrial data augmentation.
All these methods address the problem of industrial defect sample augmentation to some extent, but in real industrial manufacturing applications, they often suffer from low quality, low diversity, and low fidelity and do not adequately reflect the data distribution of the real industrial defect samples, which affects the detection accuracy and generalization of the deep learning-based defect detection model. Therefore, MBIA needs to design appropriate network structures and loss functions to adapt to different data characteristics and task requirements and balance the relationship between quality and diversity of generated samples while avoiding training difficulties such as mode collapse and gradient vanishing.

Overall Structure
To address the challenge of small sample sizes in industrial defect detection and the shortcomings of current model-based image augmentation methods, we present PreCaCycleGAN, a CycleGAN-based model that leverages defect-free samples to synthesize defective samples, as illustrated in Figure 1. The framework includes two generators, $G_{P2N}$ (positive sample to negative sample) and $G_{N2P}$ (negative sample to positive sample), and two discriminators, $D_{P2N}$ and $D_{N2P}$. The generators are optimized based on the U-Net network structure, and the perceptual loss function is incorporated as a constraint to enhance the feature and semantic quality of the generated images. In the discriminator, we employ capsule networks on top of PatchGAN to learn more refined global spatial features. Furthermore, we replace the sigmoid cross-entropy loss function with a least-squares loss to overcome the gradient vanishing problem during training and prevent mode collapse and training instability.

Generator Structure
The architecture of the generator G is illustrated in Figure 2. First, a convolution layer with a 3 × 3 kernel and a stride of 1 expands the channels of the input image. The feature map is then aggregated and reconstructed through four downsampling stages and four upsampling stages, and the defective sample is finally synthesized by the activation layer. To strengthen the extraction of local detail features of defective samples and improve training efficiency and accuracy, in the first three downsampling layers we sum each layer's output with that of the preceding layer before passing it on, and combine this with the residual module [46] to further preserve the feature integrity of the samples during downsampling. In the converter layer, we adopt the DenseNet block [20] instead of the ResNet structure, which considerably reduces the parameter count and computational overhead. During upsampling, the upsampling module concatenates the downsampled feature maps of the corresponding scale through skip connections and, after nearest-neighbor interpolation upsampling, fuses the features with the residual module before passing them to the next layer.
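The following sketch traces only the spatial sizes of the feature maps through the generator described above, to show why the skip connections line up scale by scale. Several details are assumptions made for illustration and are not stated in the text: a 512 × 512 input, a padding of 1, and downsampling implemented as a stride-2 3 × 3 convolution. Channel counts and the DenseNet and residual modules are omitted.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_generator_shapes(size=512, levels=4):
    """Return the spatial sizes after each encoder and decoder stage."""
    # 3x3, stride-1 channel-expansion convolution (padding of 1 is assumed).
    size = conv_out(size, kernel=3, stride=1, padding=1)
    encoder = [size]
    for _ in range(levels):
        # Assumed stride-2 convolution for each downsampling stage.
        size = conv_out(size, kernel=3, stride=2, padding=1)
        encoder.append(size)
    decoder = []
    for level in range(levels, 0, -1):
        size *= 2                   # nearest-neighbor upsampling by a factor of 2
        skip = encoder[level - 1]   # encoder feature map of the matching scale
        if size != skip:
            raise ValueError("skip connection needs matching spatial sizes")
        decoder.append(size)
    return encoder, decoder

encoder_sizes, decoder_sizes = trace_generator_shapes()
```

Under these assumptions, each upsampled map has the same spatial size as the encoder map it is concatenated with, so the output returns to the input resolution.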

Discriminator Structure
We propose a discriminator D with two branches, a PatchGAN branch and a capsule network branch [22], to optimize the discriminative output, as illustrated in Figure 3. The input image passes through three feature extraction layers and then splits into two branches for the output. The first branch employs the original PatchGAN discriminator structure to assess the local authenticity of the image. The second branch uses the capsule network to discriminate samples and evaluate the global consistency of the image. To better preserve the spatial information of industrial defect samples, we use the vector encoding of the primary capsule layer and the digit capsule layer to represent the probability of feature existence and the spatial information in the capsule network. This enhances the realism and diversity of the generated defect samples, as well as their interpretability and controllability. We also employ a dynamic routing mechanism between two consecutive capsule layers to iteratively learn and predict the features of the lower layer and achieve an adaptive feature combination. The dynamic routing relationship between capsule i in layer l and capsule j in layer (l + 1) is given in Equation (1).
$$\hat{u}_{j|i} = W_{ij} u_i \quad (1)$$
where the output of capsule i is $u_i$, the weight matrix between capsule i and capsule j is $W_{ij}$, and the prediction vector from capsule i to capsule j is $\hat{u}_{j|i}$. The discriminator only needs to output two classes, real or fake, so j takes values in {0, 1}. The index i is determined by the total number of primary capsules and takes values in {1 ≤ i ≤ 4096 | i ∈ N}. After the prediction vectors are input to the dynamic routing mechanism, a routing weight $b_{ij}$ is defined for each prediction vector, which is the log prior probability that capsule i is coupled to capsule j. The coupling coefficient is obtained using softmax, as shown in Equation (2).
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{j'} \exp(b_{ij'})} \quad (2)$$
where $c_{ij}$ is the coupling coefficient of the prediction vector and $b_{ij}$ is the routing weight. Since there is no initial routing preference, the initial routing weight of each capsule is the same, and the coupling coefficients of each capsule i sum to 1, so the initial value of $b_{ij}$ is set to 0. The weighted sum for capsule j in the (l + 1)th layer is then given by Equation (3).
$$s_j = \sum_i c_{ij} \hat{u}_{j|i} \quad (3)$$
where $s_j$ is the output vector of capsule j, representing the sum of all weighted prediction vectors routed to it. The squash function must be applied to $s_j$ to ensure that its length, interpreted as a probability, lies in [0, 1], as shown in Equation (4).
$$v_j = \mathrm{squash}(s_j) \quad (4)$$
where $v_j$ is the compressed output of capsule j, whose length gives the predicted probability. To prevent the denominator from being 0 in the calculation, a small constant $\epsilon$ is added to it, where $\epsilon$ is taken as $10^{-8}$, as shown in Equation (5).
$$\mathrm{squash}(s_j) = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\| + \epsilon} \quad (5)$$
The routing agreement is represented by the dot product of the output vector and the prediction vector. A larger dot product corresponds to a smaller angle between the two vectors and therefore to better consistency between them. The agreement is given in Equation (6).
$$a_{ij} = v_j \cdot \hat{u}_{j|i} \quad (6)$$
The dynamic routing mechanism is iterative, so the routing weights need to be updated before the next iteration, as shown in Equation (7).
$$b_{ij} \leftarrow b_{ij} + a_{ij} \quad (7)$$
We use $a_{ij}$ to measure the consistency between the output vector and the prediction vector. The higher $a_{ij}$ is, the larger the updated coupling coefficient becomes, and the higher the probability that capsule i is assigned to capsule j. This means that capsule j is more likely to be activated and to represent the existence of an entity. The dynamic routing mechanism replaces the scalar-output feature detector of the convolutional neural network with a vector output, replaces the max pooling layer with a routing-by-agreement mechanism, and optimizes the discriminative output. The number of routing iterations is denoted by r; here, r = 3. The dynamic routing mechanism is summarized in Algorithm 1.
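Algorithm 1. Dynamic routing mechanism.
Input: the prediction vectors from Equation (1) and the number of routing iterations r.
Initialize all routing weights to 0.
for k in 1, ..., r do
    Compute the coupling coefficients with softmax (Equation (2)).
    Compute the weighted sum of the prediction vectors for each capsule j (Equation (3)).
    Apply the squash function to obtain the output vectors (Equations (4) and (5)).
    Update the routing weights with the agreement (Equations (6) and (7)).
end for
Output: the output vectors of the capsules in layer (l + 1).

The dynamic routing mechanism processes each capsule in the primary capsule layer and then iteratively learns and predicts the features of the next layer. The activation capsule vector for each layer is found, and the output value of the primary capsule layer is obtained by continuous iterative updates.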
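The routing procedure described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it uses three lower-level capsules instead of 4096, two-dimensional prediction vectors, and hand-picked toy values, and the exact placement of the stabilizing constant inside the squash function is an assumption.

```python
import math

def squash(s, eps=1e-8):
    """Shrink vector s so that its length lies in [0, 1) and keeps its direction."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)  # eps guards against norm == 0
    return [scale * x for x in s]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, r=3):
    """u_hat[i][j] is the prediction vector from lower capsule i to upper capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing weights start at 0
    v = []
    for _ in range(r):
        c = [softmax(row) for row in b]  # coupling coefficients, softmax over j
        v = []
        for j in range(n_out):
            # Weighted sum of the prediction vectors routed to capsule j.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        for i in range(n_in):
            for j in range(n_out):
                # Agreement (dot product) raises the routing weight.
                b[i][j] += sum(v[j][d] * u_hat[i][j][d] for d in range(dim))
    return v

# Toy input: all three lower capsules agree on upper capsule 0, not on capsule 1.
u_hat = [
    [[1.0, 0.0], [0.1, 0.0]],
    [[0.9, 0.1], [0.0, 0.1]],
    [[1.1, -0.1], [-0.1, 0.0]],
]
outputs = dynamic_routing(u_hat, r=3)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in outputs]
```

In this toy case, the output capsule that the lower capsules agree on ends up with the longer vector, which is the behavior the discriminator relies on when j ∈ {0, 1} encodes real versus fake.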

Loss Function
The CycleGAN model generation involves two types of loss functions: the adversarial loss function and the reconstruction loss function. The adversarial loss function aims to minimize the discrepancy between the data distributions of the generated image and the target domain, thus producing more realistic images. The reconstruction loss function ensures that the mapping relation between the source and target domains is consistent and aligned. To enhance the quality of the industrial defect images, we incorporate the perceptual loss function and the capsule loss function based on the U-Net structure and the capsule network structure. We also add the identity mapping loss to better capture the industrial defect features. The overall loss function is given by Equation (8).

$$L = L_{adv}(G_{P2N}, D_{P2N}, X, Y) + L_{adv}(G_{N2P}, D_{N2P}, Y, X) + \lambda_1 L_{cyc}(G_{P2N}, G_{N2P}, X, Y) + \lambda_2 L_{idt}(G_{P2N}, G_{N2P}, X, Y) + \lambda_3 L_{per}(G_{P2N}, G_{N2P}, X, Y) \quad (8)$$
Standard generative adversarial networks adopt a two-player zero-sum game, which poses a minimax problem [38]. In this game, the generator and the discriminator compete to reach the final Nash equilibrium, as shown in Equation (9).
$$G^*, D^* = \min_G \max_D L(G_{P2N}, G_{N2P}, D_{P2N}, D_{N2P}, X, Y) \quad (9)$$
The CycleGAN model employs the sigmoid cross-entropy loss function as the adversarial loss function, which is suitable for binary classification problems. However, this loss function can cause gradient vanishing problems in the training of generative adversarial networks, affecting model convergence and optimization. To address this issue, we use the least-squares loss as the adversarial loss function and combine it with the edge loss of the capsule network. This improves the stability and convergence of the training and ensures the authenticity and diversity of the generated samples. The training cycle generates defective industrial samples from non-defective ones and vice versa, and both directions use the same form of loss function. As an example, we consider generating defective samples (y) from non-defective samples (x) using the generator $G_{P2N}$ and the discriminator $D_{P2N}$, and we show the adversarial loss function in Equation (10). In Equation (10), one branch of the discriminator uses the capsule network. The first part on the right-hand side of the equal sign is the loss function corresponding to the generator, and its optimization objective is to make the value $D_{P2N\text{-}1}(G(x))$ that the discriminator assigns to the generated image approach 1. The second and third parts are the loss functions corresponding to the discriminator, and their optimization objective is to make the value $D_{P2N\text{-}1}(y)$ that the discriminator assigns to the real image approach 1, and the value $D_{P2N\text{-}1}(G(x))$ that it assigns to the generated image approach 0.
The fourth and fifth parts are the edge loss functions, which are weighted by a hyperparameter that indicates the relative importance of the edge loss in the improved adversarial loss. The capsule discriminator only needs to determine whether the input image is a real image or a generated fake image, and its edge loss is defined in Equation (11).
$$L_M = T_k \max(0, m^+ - \|v_k\|)^2 + (1 - T_k) \max(0, \|v_k\| - m^-)^2 \quad (11)$$
where $v_k$ (see Equation (4)) is the output vector of the discriminator layer of the capsule network, with k = 0 for real data and k = 1 for generated fake data. $T_k = 1$ if the discriminator or generator wants the input to be judged as real data, and $T_k = 0$ if the discriminator wants it to be judged as generated fake data. $m^+$ and $m^-$ are the margins for deciding whether the input image is real or fake. When $T_k = 1$, the loss is 0 if the norm of the vector is larger than $m^+$; otherwise, it is the square of the difference between the two. When $T_k = 0$, the loss is 0 if the norm of the vector is smaller than $m^-$; otherwise, it is the square of the difference between the two.
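The piecewise behavior of the edge loss and its combination with the least-squares terms can be sketched as below. This is an illustrative reading of the description above, not the paper's exact formulation: the function names are hypothetical, the averaging over patch scores and the way the two edge-loss terms are summed are assumptions, and the default values 0.9, 0.1, and 0.5 are taken from the hyperparameter settings reported in the experiments.

```python
def margin_loss(v_norm, target, m_plus=0.9, m_minus=0.1):
    """Edge (margin) loss for one capsule output of length v_norm.

    target = 1: the input should be judged real; target = 0: judged fake.
    """
    present = max(0.0, m_plus - v_norm) ** 2   # zero once v_norm exceeds m_plus
    absent = max(0.0, v_norm - m_minus) ** 2   # zero once v_norm is below m_minus
    return target * present + (1 - target) * absent

def ls_generator_loss(d_fake):
    """Least-squares loss: push scores on generated patches toward 1."""
    return sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)

def ls_discriminator_loss(d_real, d_fake):
    """Least-squares loss: push real scores toward 1 and generated scores toward 0."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return real_term + fake_term

def discriminator_total(d_real, d_fake, caps_real_norm, caps_fake_norm,
                        edge_weight=0.5):
    """Assumed combination: least-squares terms plus weighted edge losses."""
    edge = margin_loss(caps_real_norm, 1) + margin_loss(caps_fake_norm, 0)
    return ls_discriminator_loss(d_real, d_fake) + edge_weight * edge

# Toy PatchGAN scores and capsule output lengths (assumed values).
loss_confident = discriminator_total([1.0, 1.0], [0.0, 0.0], 0.95, 0.05)
loss_unsure = discriminator_total([0.6, 0.7], [0.4, 0.5], 0.5, 0.5)
```

Unlike the sigmoid cross-entropy loss, the squared terms keep a non-zero gradient for samples that are classified correctly but lie far from the target value, which is the motivation given above for the least-squares form.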
To avoid mode collapse while the adversarial loss is driven toward Nash equilibrium, we add a perceptual loss function during generator training to enhance the feature and semantic information of the generated image, making it more realistic and clear. The formula is shown in Equation (12), where $\phi$ is the pre-trained VGG19 feature extractor.
Identity loss and cycle consistency loss ensure that the generated industrial defect images are consistent with the input defect-free images in terms of content and structure, and they constrain the generator to produce defect samples on the same background as the input defect-free samples. The formulas are shown in Equations (13) and (14).
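As a rough illustration of these two constraints, the sketch below uses the standard CycleGAN L1 forms of the cycle consistency and identity losses. Whether the paper's own equations match these forms exactly is an assumption, as is the assignment of the weights 10 and 0.5 to the cycle and identity terms. The two toy "generators" simply add or remove a constant offset and stand in for the real networks.

```python
def l1_distance(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_loss(g_p2n, g_n2p, x, y):
    """x -> g_p2n -> g_n2p should recover x; y -> g_n2p -> g_p2n should recover y."""
    return l1_distance(g_n2p(g_p2n(x)), x) + l1_distance(g_p2n(g_n2p(y)), y)

def identity_loss(g_p2n, g_n2p, x, y):
    """A generator fed an image already in its target domain should not change it."""
    return l1_distance(g_p2n(y), y) + l1_distance(g_n2p(x), x)

def weighted_reconstruction(g_p2n, g_n2p, x, y, lam_cyc=10.0, lam_idt=0.5):
    """Weighted sum of the two terms (weights are assumed, see lead-in)."""
    return (lam_cyc * cycle_loss(g_p2n, g_n2p, x, y)
            + lam_idt * identity_loss(g_p2n, g_n2p, x, y))

# Hypothetical toy generators: "adding a defect" is modeled as a constant offset.
def add_defect(img):
    return [p + 0.25 for p in img]

def remove_defect(img):
    return [p - 0.25 for p in img]

x = [0.0, 0.5, 1.0]      # defect-free sample (toy values)
y = [0.25, 0.75, 1.25]   # defective sample (toy values)
cyc = cycle_loss(add_defect, remove_defect, x, y)
idt = identity_loss(add_defect, remove_defect, x, y)
total = weighted_reconstruction(add_defect, remove_defect, x, y)
```

The toy generators are perfectly invertible, so the cycle term vanishes, while the identity term is non-zero because each generator still alters an image that is already in its target domain.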
The training process of the PreCaCycleGAN model is shown in Algorithm 2. Unlike the standard CycleGAN model, to ensure the convergence and accuracy of the model, we determined the optimal order of the training steps through experiments. In steps 3 to 6, we update the generator $G_{P2N}$, the discriminator $D_{P2N}$, the generator $G_{N2P}$, and the discriminator $D_{N2P}$ in turn to obtain the final industrial defect sample generation model.

Validation Experiments
To assess the effectiveness of PreCaCycleGAN-generated defect samples in enhancing the generalization performance of real industrial defect detection models, we employed the DAGM 2007 dataset as an experimental platform. DAGM 2007 [47] is a publicly accessible dataset of texture surface images with various types of defects, which simulates the real-world defect detection problem with high complexity and diversity. We compared the defect samples produced by PreCaCycleGAN with those produced by Tree-CycleGAN [42] and CycleGAN-TSS [43] and applied them to three state-of-the-art defect detection models that are widely adopted in practice, namely YOLOv5 [48], SSD [49], and Faster-RCNN [50]. We evaluated the impact of the generated defect samples on the generalization performance of the defect detection models by measuring the mAP, false detection rate, and other metrics obtained with the three generative models across different defect detection models and datasets. All experiments were conducted on a single NVIDIA GeForce RTX 3060 GPU.
The training process lasted for 150 iterations with a batch size of 1. The learning rate was initially set to 0.0002 and was linearly decayed starting from the 100th iteration. The hyperparameters in the loss function were empirically determined. We set $\lambda_1$ to 10, $\lambda_2$ to 0.5, $\lambda_3$ to 0.02, $m^+$ in the capsule network to 0.9, $m^-$ to 0.1, and the edge loss weight to 0.5, and used the Adam optimizer with default parameters. The comparison plots of defects generated by the three generative models are shown in Figure 4.

Figure 4. Comparison of defects generated by the three generative models. In (a,b,e), PreCaCycleGAN generates more realistic and diverse defects, while the other models suffer from blurring and distortion. (f-j) represent the Class 6-Class 10 datasets, respectively. In (h-j), PreCaCycleGAN generates a wider variety and a larger number of defects, while the other models can only generate a single defect.
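The learning-rate schedule described in the training settings can be sketched as a small function. The text states only that the rate is decayed linearly from the 100th iteration; that it reaches zero at iteration 150, as in the usual CycleGAN convention, is an assumption, and the function name is hypothetical.

```python
def learning_rate(iteration, base_lr=0.0002, decay_start=100, total=150):
    """Constant rate before decay_start, then linear decay to zero at `total`."""
    if iteration < decay_start:
        return base_lr
    remaining = max(0, total - iteration)  # iterations left until the end
    return base_lr * remaining / (total - decay_start)

# Learning rate for every iteration of the assumed 150-iteration run.
schedule = [learning_rate(i) for i in range(150)]
```

With these settings, the rate stays at 0.0002 for the first 100 iterations and then falls by an equal amount at each of the remaining 50.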
Visually, compared to the other two models, the PreCaCycleGAN model, which incorporates DenseNet into the U-Net network, exhibits more diverse defect sample generation in four datasets: (b), (h), (i), and (j). Moreover, in four datasets, (a), (e), (f), and (g), our model with a two-branch discriminator demonstrates a more refined feature representation. However, in the remaining datasets, our defect generation performance is not clearly superior to that of the other models.
Although visual inspection suggests that our model improves defect generation diversity and feature quality compared to the existing models, the generated defect samples need to be tested on actual industrial inspection models to verify their effectiveness. Therefore, further quantitative data are needed to support this apparent advantage of our model.

Detection Model Training Validation
To validate the effectiveness of the defect samples generated by our model in real industrial manufacturing, we selected three detection models that are currently widely used in industrial inspection, YOLOv5 [48], SSD [49], and Faster-RCNN [50], and used the generated images to enhance their generalization. Out of the 60 results from De-Train B and De-Train C training, PreCaCycleGAN achieved 51 top scores, and its results on the remaining items were very close to the top scores. In general, training on De-Train B improved detection accuracy by 3-5% over training on De-Train A, while training on De-Train C improved it by 8-10% over De-Train A, which indicates that the defect samples generated by all of the models have a significant impact on the generalization and accuracy of the detection model. The improvement is especially evident when a mixture of generated and real defect samples is used to train the detection model, which is consistent with actual industrial manufacturing practice. We further compared the defect samples generated by PreCaCycleGAN with those generated by Tree-CycleGAN and CycleGAN-TSS and found that PreCaCycleGAN-generated defect samples offered better detail features for the detection model to learn across the different datasets. Taking the YOLOv5 detection model as an example, PreCaCycleGAN-generated defect samples improved the detection accuracy by about 4% in (a), (b), (f), and (g), and by 1-2% in the remaining datasets, compared to the other two generative models. This indicates that our model can generate images with more detailed defect features and greater defect diversity.
The same trend was observed in both SSD and Faster-RCNN detection models, demonstrating that our model-generated images can be practically applied to industrial defect detection models and show good generalization.

Detection Model Test Validation
To further verify that the generated images improve the generalization of the detection model, we constructed test sets that show the application performance of the detection model along multiple dimensions and demonstrate the practicality of the defect samples generated by our model. For validation, we likewise built three types of test sets for each dataset: De-Test A, De-Test B, and De-Test C. De-Test A is composed of 120 defect-free samples and 60 real defect samples. De-Test B is composed of 120 defect-free real samples and 60 fake defect samples generated by the different models. De-Test C is composed of 120 defect-free samples, 30 real defective samples, and 30 fake defective samples generated by the corresponding training sets and generative models. During the experiments, we used the three most important metrics in real industrial manufacturing, data detection accuracy (DDA), defect detection rate (DDR), and false detection rate (FDR), to measure the detection performance [51]. The data detection accuracy is the percentage of correctly detected defect data and defect-free data in the total data volume, the defect detection rate is the percentage of correctly detected defects in the total defect data, and the false detection rate is the percentage of defect-free samples incorrectly detected as defective among all samples detected as defective, as shown in Equations (15)-(17):

$$\mathrm{DDA} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{15}$$

$$\mathrm{DDR} = \frac{TP}{TP + FN} \times 100\% \tag{16}$$

$$\mathrm{FDR} = \frac{FP}{TP + FP} \times 100\% \tag{17}$$
where TP is the number of defects correctly detected in the testing process, TN is the number of defect-free samples correctly identified as defect-free, FP is the number of defect-free samples incorrectly detected as defective, and FN is the number of defects incorrectly judged as background. The IoU threshold of the detection model in the testing process is set to 0.25. The results of the validation set of (a)-(j) are shown in Tables 4-14.
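The three metrics can be computed directly from the four counts defined above. The following is a minimal sketch that follows the prose definitions of DDA, DDR, and FDR; the function name and the example counts are hypothetical and are not taken from the reported tables.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (DDA, DDR, FDR) as percentages from confusion counts.

    DDA: correctly classified samples over all samples.
    DDR: correctly detected defects over all true defects.
    FDR: defect-free samples flagged as defective over all samples
         flagged as defective.
    """
    total = tp + tn + fp + fn
    dda = 100.0 * (tp + tn) / total if total else 0.0
    ddr = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    fdr = 100.0 * fp / (tp + fp) if (tp + fp) else 0.0
    return dda, ddr, fdr


if __name__ == "__main__":
    # Hypothetical counts for a test set shaped like De-Test A:
    # 120 defect-free samples (tn + fp) and 60 defect samples (tp + fn).
    dda, ddr, fdr = detection_metrics(tp=54, tn=117, fp=3, fn=6)
    print(f"DDA={dda:.2f}%  DDR={ddr:.2f}%  FDR={fdr:.2f}%")
```

Note that FDR is normalized by the number of samples flagged as defective, not by the number of defect-free samples, so it differs from the conventional false positive rate.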

Discussion
Out of 360 test results consisting of ten datasets, three generation models, and three detection models, our algorithm achieved 354 optimal results. The following averages over the ten datasets are for detection models trained with defects generated by our model mixed with real defects, compared with the original detection model, the CycleGAN-TSS generation model, and the Tree-CycleGAN generation model, respectively. For YOLOv5, the detection accuracy improved by 5.75%, 1.16%, and 1.26%, the detection rate improved by 14.94%, 3.60%, and 3.55%, and the false detection rate decreased by 5.44%, 1.22%, and 1.63%. For SSD, the detection accuracy improved by 5.46%, 1.23%, and 1.27%, the detection rate improved by 13.73%, 3.76%, and 3.35%, and the false detection rate decreased by 6.74%, 1.89%, and 2.42%. For Faster-RCNN, the detection accuracy improved by 5.64%, 0.88%, and 1.42%, the detection rate improved by 14.54%, 3.22%, and 3.62%, and the false detection rate decreased by 5.64%, 0.89%, and 2.26%.
The test results show that, compared with the Tree-CycleGAN and CycleGAN-TSS generation models, our model extracts the local and global features of defects more effectively and improves the fineness of features by adding the perceptual loss and the capsule discriminator. This is most visible for (d) and (i), which are datasets with complex backgrounds and obscure defect features; the Class4 and Class9 detection results are shown in Figures 5 and 6. In (d), the false detection rate of the original Faster-RCNN model is 12.12%. When the model is trained on mixed samples and tested on mixed samples, the false detection rate is 7.89% with Tree-CycleGAN and 5.26% with CycleGAN-TSS, and our model reduces it to 2.70%. In (i), the false detection rate of the original YOLOv5 model is 19.61%. Under the same mixed-sample training and testing, the false detection rate is 10.00% with Tree-CycleGAN and 10.53% with CycleGAN-TSS, and our model reduces it to 7.59%.
The comparison experiments show that our model can generate high-quality defect samples on different types of defect datasets, and that mixing them with real defect samples further improves the generalization ability and robustness of the defect detection model. The resulting detection model can effectively identify defects with obscure features in complex backgrounds, which reflects the improved authenticity and diversity of the generated defect features.
We subsequently optimized the IoU threshold of our best model for actual industrial defect detection. With an IoU threshold of 0.15, the false detection rate is 0% and the average accuracy across the datasets reaches 98.73%. The experiments show that the defect samples generated by our model are better than those generated by the current defect sample generation models and can be practically applied to industrial defect detection, where they effectively improve the robustness and generalization of the defect detection model.
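To make the role of the IoU threshold concrete, the sketch below computes the intersection over union of two axis-aligned boxes and compares the result against a threshold. It assumes boxes in (x1, y1, x2, y2) corner format; how the threshold is applied inside the authors' detection pipeline is not specified in the text, and the function names and example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def passes_threshold(box_a, box_b, threshold):
    """True if the overlap between the two boxes reaches the IoU threshold."""
    return iou(box_a, box_b) >= threshold


if __name__ == "__main__":
    reference = (0, 0, 10, 10)
    candidate = (7, 0, 17, 10)  # overlaps the reference by a 3 x 10 strip
    for thr in (0.15, 0.25):
        print(thr, passes_threshold(reference, candidate, thr))
```

In this example the overlap is 30/170, so the pair passes at a threshold of 0.15 but not at 0.25, which illustrates why the choice of threshold changes which detections are accepted.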

Conclusions
In this work, we design PreCaCycleGAN, an image-enhancement generative adversarial network for industrial defect sample augmentation. It aims to address the scarcity of defect data faced by current deep learning methods in industrial defect detection and to improve the generalization and robustness of defect detection models in real industrial manufacturing. We show that our model uses the DenseNet module to enhance the U-Net network when synthesizing defect samples and that the samples synthesized on different datasets are more realistic and more faithful to real defect samples. We also show that our model uses a dual-branch discriminator and an added perceptual loss function to make the defect details more refined. We compare the augmentation effect of PreCaCycleGAN on different detection models with that of state-of-the-art industrial defect sample generation models and demonstrate that our model generates more diverse defect samples. However, our proposed method still has some limitations in practical industrial defect detection applications. On the one hand, it is not very effective in industrial scenarios with limited resources or high efficiency requirements. On the other hand, the generated industrial defect samples still need manual annotation. In the future, we will reduce the algorithm complexity to lower the computational overhead of defect sample generation and integrate an annotation algorithm to annotate the generated defect samples automatically, so as to better suit industrial defect detection applications.

Data Availability Statement: Data sets generated during the current study are available from the corresponding author upon reasonable request. The DAGM2007 public dataset can be found at https://hci.iwr.uni-heidelberg.de/content/weakly-supervised-learning-industrial-optical-inspection (accessed on 20 December 2022).

Conflicts of Interest:
The authors declare no conflict of interest.