Detecting and Measuring Defects in Wafer Die Using GAN and YOLOv3

Abstract: This research used deep learning methods to develop a set of algorithms to detect die particle defects. A generative adversarial network (GAN) generated natural and realistic images, which improved the ability of you only look once version 3 (YOLOv3) to detect die defects. Defects were then measured based on the bounding boxes predicted by YOLOv3, which potentially provides the criteria for die quality sorting. The pseudo defective images generated by GAN from the real defective images were used in the training image set. The results obtained after training with the combination of real and pseudo defective images were 7.33% higher in testing average precision (AP) and more accurate by one decimal place in testing coordinate error than after training with the real images alone. The GAN can enhance the diversity of defects, which improves the versatility of YOLOv3 to some extent. In summary, the method of combining GAN and YOLOv3 employed in this study creates a feature-free algorithm that requires neither a massive collection of defective samples nor additional annotation of pseudo defects. The proposed method is feasible and advantageous for cases that deal with various kinds of die patterns.


Introduction
Wafers are the major material for making integrated circuits (ICs), and they play an indispensable role in electronic products. The upstream of the semiconductor industry consists of IC design companies and silicon wafer manufacturing companies: IC design companies design circuit diagrams according to customer needs, while silicon wafer manufacturing companies use polysilicon as the raw material for silicon wafers. The primary task of the IC manufacturing companies in the midstream is to transfer the circuit diagrams onto wafers. The completed wafer is then sent to the downstream IC packaging and testing companies for packaging and testing the functions of the ICs, concluding the whole manufacturing process.
With the continuous evolution of wafer manufacturing technology, wafer sizes have become larger and the patterns on the die have become more diverse. In order to inspect surface defects in the dies of a wafer, automated optical inspection (AOI), mainly using one or more optical imagery charge-coupled devices (CCDs), has gradually replaced traditional manual visual inspection (VI). Most previous studies applied deep learning methods to solve the problem of die defect classification, while relatively few focused on how to use those methods to solve the problem of die defect detection. The latter is the focus of the present study, which contributes by introducing an object detection method, you only look once version 3 (YOLOv3), to solve the die defect detection problem. The YOLOv3 model is able to predict the center coordinate, width, and height of each bounding box where a defect is located, and the confidence that each bounding box contains a defect. There is no need to rely on experts for feature engineering, and the model has a certain degree of invariance to interference such as translation and rotation, which are attractive characteristics for companies that face constantly changing die patterns. In addition, since the particle defects embedded in the dies are very tiny and some of the defects are dense, YOLOv3 uses DarkNet53 as the backbone and introduces multiscale detection, which is able to detect defects of different sizes on the extracted feature maps. In this way it can effectively detect tiny and dense defects, ensuring the quality of the die.
Moreover, there is another issue regarding defective sample collection and annotation in the factory environment. Operators do not have much time to collect various appearances of different kinds of defects. Since the collection of defect images is time-consuming, recent research on generating pseudo defective images with GAN has attracted attention. Chen et al. [12] used affine transformation and a naïve generative adversarial network (GAN) to tackle the problem of having unbalanced quantities of defect-free and defective images. They expanded the number of defective images, which enhanced the classifier's generalization ability. Tsai et al. [13] applied the cycle-consistent adversarial network (CycleGAN) to generate the saw-mark defect in heterogeneous solar wafer images and to solve the unbalanced classification problem arising in manufacturing inspection. Their experiment showed that the CNN's classification accuracy rates with GAN-based data augmentation were better than those obtained by over-sampling or by assigning higher class weights to minority classes. In addition to the research related to defect classification, GAN-based methods have also been applied to the field of defect detection. Yang et al. [14] introduced an image generating process for welded joints based on affine transformation and CycleGAN. The YOLOv3 model was then used for welding head detection, achieving a better average precision (AP) of 91.02% than the faster region-based convolutional neural network (Faster RCNN). Tian et al. [15] used CycleGAN to augment images of healthy apples and apples with anthracnose, thereby increasing the number of images and enriching their content. After that, YOLOv3-dense was used to detect anthracnose on the apples. Experiments showed that their model performed at an AP of 95.57%. This method could also be applied to the detection of apple surface diseases in orchards. However, some welded joints on metals or anthracnose lesions on apples generated by CycleGAN were not the expected output [14,15]. After augmenting the data using GAN, rich pseudo images are obtained. Although the appearance diversity of defects increases, no corresponding annotation files are provided [14,15], as operators have no time for time-consuming annotation work. Besides, GAN is currently unable to form specific structures, generating images that are not only blurry but also incorrectly colored. These undesirable generative images must be manually deleted. Coupling GAN with an automatic annotation method is another contribution of this study, so that data pairs become available for training the YOLOv3 die defect detection model. This research uses a series of pre- and post-generation digital image processing (DIP) techniques to reduce the generation load of the GAN and to develop an auto-annotation procedure for pseudo defective images. The DIP techniques not only help to generate realistic pseudo defects but also save the time needed for annotating the pseudo defective images.
This paper is composed of four parts. Following this introductory section, which has summarized the literature on die inspection and presented the contributions of this study, the second section describes the hardware architecture for capturing images and the methodology; it also introduces GAN, the automatic annotation mechanism, and the modified application of YOLOv3. The third section presents the experimental results: a spot-checking process helps us to determine YOLOv3 as the base model; the hyperparameters to be used in the GAN + YOLOv3 mechanism are derived based on the design of experiments (DOE); and the defect detection results are reported, compared, and analyzed. The final part is the conclusion.

Research Method
The overall research process of this study is shown in Figure 1. First, we captured the images through the image-capturing system. We examined the die image structure and composition, and the appearance, characteristics, and specification of the particles. The next step was to separate the image set into training, validation, and testing sets. We manually marked the fine particles embedded in the surface of the die with an annotation tool to create the annotation file corresponding to each image. Defect-free dies did not need to be annotated and were not included in the training process. In order to increase the diversity of defects, the study created pseudo particle defects with the help of GAN's automatic generation ability. The study also automatically generated an annotation file corresponding to each pseudo defective image through connected component labeling (CCL) [16]. The next step was to feed the real and pseudo defective images to the YOLOv3 model for learning. Finally, we measured the size of the defects. Details of the research procedure are explained in the following sub-sections.

Hardware Structure and Composition of the Die
In order to retrieve the surface images of the dies on the wafer, the study used the image-capturing system shown in Figure 2a. The CCD in this system was a Hitachi KP-FD202GV, and the resolution was a 1620 × 1220 color image. The lens was an Olympus lens with an optical magnification of 5×, a working distance of 19.6 mm, and a resolution of 3.36 µm, coupled with a lighting source with a 12 V/100 W coaxial yellow ring halogen lamp. The front-illuminated light source emphasizes the surface characteristics of the inspection object. During the shooting process, the researchers used the XY axis motion controller to capture the image of each die to be inspected with an S-shaped scanning path. By lowering the requirements for positioning precision, one image could contain multiple dies, as shown in Figure 2b. However, only the die pattern at the center of the image was intact, called the region of interest (ROI), and the eight neighboring dies had only partial patterns.
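For illustration, a minimal sketch of such a serpentine (S-shaped) scan order: the stage visits the die grid row by row, reversing direction on each row so that travel between rows stays short. The grid dimensions and pitch are made-up illustration values, not the case company's parameters.

```python
# Hypothetical sketch of an S-shaped scanning path for an XY motion stage.

def s_shaped_path(rows: int, cols: int, pitch_x: float, pitch_y: float):
    """Yield (x, y) stage coordinates for a serpentine scan over a die grid."""
    for r in range(rows):
        # Even rows go left-to-right, odd rows right-to-left.
        col_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in col_order:
            yield (c * pitch_x, r * pitch_y)

# Example: a 3 x 4 grid of dies with a 2.5 mm pitch in both axes.
for x, y in s_shaped_path(rows=3, cols=4, pitch_x=2.5, pitch_y=2.5):
    print(f"move stage to ({x:.1f} mm, {y:.1f} mm) and trigger the CCD")
```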
The appearance of the image of the die surface in this research is shown in Figure 2c. In compliance with a confidentiality agreement with the case company, the images displayed in this paper show only part of the die, and they have been flipped and discolored before presentation. The die was composed of the pad, the ion implantation zone, the bottom layer, and the testing block. The pad was mainly used for electrical testing to ensure the function of the die. The bottom layer was a protective layer covered by a thin film, which could protect the components from chemical reactions, moisture, corrosion, pollution, etc. The testing block was used by the foundry customers to perform special tests. During the manufacturing process, particle residues might contaminate the surface of the die, which could result in defective products. These particles appeared at random positions and might be seen anywhere on the surface of the die. The shape of a particle was irregular, either large or small, and sometimes dense clusters occurred. The testing block on the die was a dark rectangular pattern with an appearance similar to that of particle defects, which increased the difficulty of defect detection.

Manually Annotating Defects
After building the die image set, we needed a corresponding annotation set before the model could be trained. LabelImg was used as an annotating tool by the researchers to manually annotate the locations and names of the defects in the image one by one. These annotation messages were stored in the XML format, and the filename was the same as the filename of the annotated image, except for the filename extension. As shown in Figure 2b, only the central die in the image was the ROI. The traditional method might have been to design an algorithm to perform ROI image segmentation before proceeding to subsequent actions. However, since this study would use the object detection method in deep learning, the preprocess of extracting ROI could be omitted. As long as the researchers focused on framing the defects on the ROI die when annotating, later the algorithm would naturally ignore the defects on the eight neighboring dies when detecting defects.
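For concreteness, a small sketch of reading one such annotation file. LabelImg stores each box in Pascal VOC XML, so the standard library suffices; the file name "die_0001.xml" is a hypothetical example.

```python
# Minimal Pascal VOC (LabelImg) annotation reader.
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path: str):
    """Return a list of (name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes

# The XML file shares its base name with the annotated image.
print(read_voc_boxes("die_0001.xml"))
```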


Defect Data Augmentation by GAN and Their Autoannotation
GAN, as proposed by Goodfellow et al. [17], has a wide range of applications, such as fashion, advertising, science, and games. Since the images for defect detection are usually captured in a stable environment, each image is roughly the same regardless of location or color. Therefore, general traditional data augmentation methods are not necessarily applicable. The case company does not have much time for engineers to collect a huge image set, let alone the additional time-consuming annotation work for a large number of images. To overcome this difficulty, this study took advantage of the powerful generative capabilities of GAN to create richer types of defects. The basic idea of GAN is shown in Appendix A.1.
When we directly input a set of die images into the GAN model, however, we found that its objective function value fluctuated during the iterations and converged only with difficulty. We also found that the GAN model could not generate high-resolution pseudo images effectively; it could only generate the approximate outline of the die, and the details could not be identified. Consequently, a strategy of generating only the particle defects was adopted.
The detailed process is shown in Figure 3. We used the defect coordinate position and the length and width information that the annotator had previously noted in the real image to cut out the patch containing each particle defect. Otsu binarization [18] was then used to eliminate the background in the patch as far as possible, retaining the original appearance of the particle defect, and the defects were attached to a white background image with the same size as the GAN input image. As shown in the bottom left of Figure 3, GAN is composed of two networks: a generator and a discriminator. During the first iteration, the generator generated poor pseudo images and the discriminator distinguished them from real images easily. During the second iteration, the quality of the pseudo images generated by the generator improved, which fooled the current discriminator. As the ability of the discriminator rose in turn, real and pseudo images could again be recognized, which drove further improvement of the generator. The adversarial learning between the two networks continued until a generative model close to the real image distribution was created. Because the learning target was simpler, the objective function converged rapidly and more realistic pseudo particle defects were generated. Finally, we embedded the pseudo particle defects into defect-free dies to create a generative pseudo image set.

Although we used GAN for data augmentation to increase the diversity of defects, the annotation files of these pseudo defective images were not generated. In the previous literature, an additional manual method was adopted to annotate the pseudo defective images [14,15]. In order to save time when annotating the pseudo defective images, DIP techniques were used to automatically annotate the pseudo particles, as shown in the bottom right of Figure 3. The CCL algorithm [16] scanned the image from left to right and top to bottom. If the gray values of adjacent pixels were found to be similar during scanning, they were labeled with the same index. Each pseudo particle defect was regarded as a blob, and the information of its minimum bounding box was registered. The XML annotation file of the pseudo defective image could then be output, which reduced the time spent annotating the pseudo defective images.
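As an illustration, a minimal sketch of this Otsu + CCL auto-annotation step, assuming OpenCV and assuming it runs on the GAN output (dark pseudo particles on a white background) before the particles are embedded into the die image; the minimum-area filter is our own illustrative choice, not a parameter from the study.

```python
# Otsu binarization + connected component labeling (CCL) -> bounding boxes.
import cv2
import numpy as np

def autoannotate_pseudo_defects(image_bgr: np.ndarray, min_area: int = 10):
    """Return [(xmin, ymin, xmax, ymax), ...] for every pseudo particle blob."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu's threshold; THRESH_BINARY_INV makes dark particles the foreground.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # CCL: every connected blob gets an index plus bounding-box statistics.
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:  # drop isolated noise pixels
            boxes.append((x, y, x + w, y + h))
    return boxes  # each box can then be written into a Pascal VOC XML file
```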

Defect Detection and Measurement Using YOLOv3
This research used YOLOv3 [19] as the basis for die defect detection and measurement. The basic idea of YOLOv3 is shown in Appendix A.2. The YOLOv3 model is a one-stage method whose end-to-end training can be realized with a single network. The inference predicts the center coordinate, width, and height of each bounding box where a defect is located, and the confidence that each bounding box contains a defect.
After YOLOv3 output the predicted bounding boxes, the study further measured the defects in the corresponding patches and sorted the quality of the die, as shown in Figure 4. The process included Otsu binarization [18], the estimation of the bounding ellipse, and the calculation of the major and minor axes. The process could potentially assist in sorting the dies in accordance with the quality specifications of the customers. For example, there were three classes of die products: an excellent die had no particle defect after inspection; a qualified die had particle defects with a major axis length between 50 and 149 µm and a minor axis length less than 20 µm; an unqualified die had particle defects that exceeded the quality specification.
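A hedged sketch of this measurement step, assuming OpenCV is used. The micron-per-pixel scale (taken here from the lens resolution quoted earlier), the function names, and the treatment of defects falling outside the quoted "qualified" range are our assumptions, not the authors' implementation.

```python
# Otsu binarization -> largest blob -> bounding ellipse -> axis lengths.
import cv2
import numpy as np

UM_PER_PIXEL = 3.36  # assumption: one pixel equals the quoted lens resolution

def measure_defect(patch_bgr: np.ndarray):
    """Return (major_um, minor_um) for the particle inside one bounding box."""
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return 0.0, 0.0
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:          # cv2.fitEllipse needs >= 5 contour points
        return 0.0, 0.0
    (_, _), (d1, d2), _ = cv2.fitEllipse(largest)  # axis lengths in pixels
    return max(d1, d2) * UM_PER_PIXEL, min(d1, d2) * UM_PER_PIXEL

def sort_die(defects):
    """Map measured defects to the three example quality classes in the text.
    Defects outside the quoted 'qualified' range are treated as unqualified
    here; the full customer specification may differ."""
    if not defects:
        return "excellent"
    if all(50 <= major <= 149 and minor < 20 for major, minor in defects):
        return "qualified"
    return "unqualified"
```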


Analysis and Results of the Experiments
This study collected 669 die images: 198 defect-free images and 471 defective images with particles. Since training an object detection model needs only defect samples of the "object", this research randomly selected 300 defective images as the training image set. The remaining defective images were used as the testing image set to evaluate the inference performance of the model. In addition, on the production line, the appearance of defects is multifaceted, and it is not possible to produce distinctively different defect appearances by depending only on the jitter mechanism of YOLOv3 itself. Therefore, beyond the images generated by the jitter mechanism, this research also applied the GAN to generate pseudo die defect images.

Spot-Checking Experiment
The spot-checking experiment provides a quick assessment of different models on a custom dataset, letting researchers know which type of model is best suited to picking out the structure of the dataset. In order to demonstrate the performance of the object detection models for the detection of particle defects on the dies, this study compared YOLOv3, the Faster RCNN [20], and the single shot multibox detector (SSD) [21]. After all the training processes were completed, the validation AP was used to evaluate the performance of the models on defective images. As shown in the second column of Table 1, there were significant gaps in validation AP between YOLOv3 and the other two models. In practice, the inference speed of a model is always a concern. The frames per second (FPS) was adopted here to evaluate the inference speed of the models, as shown in the last column of Table 1. We found that the inference speed of SSD was the fastest, followed by YOLOv3 and lastly by Faster RCNN. Even though the FPS of YOLOv3 was not the fastest, it was sufficient for application on the production line. The spot-checking experiment indicated that YOLOv3 was the best model at learning the structure in the dataset, so we focused our attention on optimizing it.
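As a side note, a minimal sketch of how an FPS figure like those in Table 1 could be measured; `model` is a hypothetical stand-in object with a `detect(image)` method, not code from the study.

```python
# Wall-clock FPS: images processed per second of inference time.
import time

def measure_fps(model, images) -> float:
    start = time.perf_counter()
    for image in images:
        model.detect(image)  # one forward pass per image
    elapsed = time.perf_counter() - start
    return len(images) / elapsed
```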

Hyperparameter Sensitivity Experiment
The hyperparameters of a model are related to the flexibility and potential of its learning, and they directly influence the degree of generalization when the model makes inferences. Since training a deep learning model often takes a long time, it is extremely inefficient to find the optimal hyperparameter combination manually. Based on DOE, the research analyzed the validation AP of various hyperparameter combinations, endeavoring to identify the key combinations that affect the AP of die defect detection and thereby providing a reasonable basis for improving the AP of the model. The DOE of this research included four factors, each with three levels:

• Input image size of GAN: The quality of the image generated by the GAN is influenced not only by the background complexity of the input image but also by its size. If the size is too small, the GAN cannot generate detailed defects; if the size is too large, the GAN weakens during the generation process. This research set the three levels of this factor to 28 × 28, 64 × 64, and 96 × 96, with 28 × 28 as the default value and 96 × 96 the upper limit of the defect size.
• Fold size of GAN image augmentation: This study proposed an image augmentation technology based on the GAN. Defect patches were generated from the adversarial learning process and then pasted onto defect-free die images to generate pseudo defective images. Compared with the original defective die images in Figure 5(a1,a2), the defects in Figure 5(b1,b2) were naturally embedded in the die image. This process generated various shapes, sizes, and numbers of defects, increased the quantity of training images and the diversity of defects, allowed YOLOv3 to learn a richer appearance of defects, and improved the performance of model training. The three levels of this factor were 1, 1.5, and 2: 1 meant that the GAN image augmentation mechanism was turned off; 1.5 and 2 meant that the original number of defective die images was multiplied by 1.5 and 2, respectively, to produce the pseudo defective die images for later training.
• Upper limit of the input image size: There are only convolutional layers in YOLOv3, so its input image size is unrestricted. However, in order to strengthen the robustness of model inference, YOLOv3 adopts a multiscale training strategy: during training, the size of the input image is changed after a certain number of iterations, and YOLOv3 can define the upper limit of the input size range. In addition, because the smallest input feature map in Yoloblock is downsampled 32 times, the upper limit of the image size range must be a multiple of 32. In the experimental design, the factor was set to three levels: 416 × 416, 480 × 480, and 544 × 544, with 416 × 416 as the default value of YOLOv3.
• Degree of jitter: In addition to the pseudo images generated by the GAN, YOLOv3 has its own data augmentation mechanism, called jitter. It can flip, zoom, crop, and perform HSV contrast conversion on the input image to augment the images and suppress overfitting. This research set the factor to three levels: 0, 0.15, and 0.3, where 0.3 was the default value of YOLOv3 and 0 indicated that the jitter was turned off.

Next, this study removed 20% of the training image set to be used as the validation image set (not including any pseudo defective images). After conducting the 3^4 DOE, the main effect plots of the validation AP for all factor and level combinations were drawn as shown in Figure 6. Using the criterion "the larger the better" (LTB) for validation AP, the researchers selected the hyperparameter combination: the input image size and augmentation fold size of the GAN were 64 × 64 and 2, and the upper limit of the input image size and jitter degree of YOLOv3 were 416 × 416 and 0.3.
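For illustration, a hedged sketch of enumerating this 3^4 factor grid and selecting the LTB combination; `train_and_validate` is a hypothetical stand-in for one full GAN + YOLOv3 training run returning validation AP, and only the factor levels are taken from the text.

```python
# Full-factorial 3^4 DOE over the four hyperparameters described above.
from itertools import product

levels = {
    "gan_input_size": [28, 64, 96],     # GAN input image size (square)
    "gan_fold": [1, 1.5, 2],            # fold size of GAN image augmentation
    "max_input_size": [416, 480, 544],  # upper limit of YOLOv3 input size
    "jitter": [0, 0.15, 0.3],           # degree of jitter
}

def run_doe(train_and_validate):
    results = []
    for combo in product(*levels.values()):  # 3**4 = 81 training runs
        params = dict(zip(levels.keys(), combo))
        results.append((train_and_validate(**params), params))
    # Larger-the-better (LTB) criterion on validation AP.
    best_ap, best_params = max(results, key=lambda r: r[0])
    return best_ap, best_params
```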



Results of Die Defect Detection and Measurement
After deciding the hyperparameters of the GAN + YOLOv3 model and training the model, defect detection and measurement were performed on the remaining testing images. The pipeline of the testing process first inferred the predicted bounding boxes of the defects through YOLOv3. Then, the major and minor axes of each defect were measured from the content inside the bounding box. After the inference was completed, different evaluation metrics were used to measure the generalization ability of the proposed algorithm.
The testing AP was used to measure the performance of the predicted bounding boxes: after the testing image set was inferred by the object detection method, the predicted bounding boxes were compared with the ground truth boxes, and the AP was the average of the maximum precision values calculated at recall levels 0, 0.1, ..., 1.0. The coordinate prediction error was used to measure the accuracy of the coordinate prediction: after the testing image set was inferred by the object detection method, the center coordinates, length, and width of the predicted bounding boxes were compared with those of the ground truth boxes, which could be calculated through the first two terms in Equation (A2) of Appendix A. Figure 7 demonstrates the patches of the defect detection results. When inputting the die image of the product, as shown in Figure 7(a1-a4), the model precisely box-bounds the corresponding particles, as in Figure 7(b1-b4). The testing blocks in Figure 7(b1-b4) are not falsely box-bounded; YOLOv3 is able to discriminate between irregular-shaped particles and rectangular testing blocks. The model effectively detected particle defects on the surface of the die, and even very small defects could be detected successfully.
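For clarity, a small sketch of the 11-point interpolated AP just described: for each recall level r in {0, 0.1, ..., 1.0}, take the maximum precision over all operating points whose recall is at least r, then average the eleven values. The operating points in the usage example are made up for illustration, not the paper's data.

```python
# 11-point interpolated average precision (AP).

def eleven_point_ap(recalls, precisions) -> float:
    """recalls, precisions: paired operating points of the PR curve."""
    ap = 0.0
    for r in [i / 10 for i in range(11)]:        # 0.0, 0.1, ..., 1.0
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        ap += max(candidates, default=0.0) / 11  # 0 if no point reaches r
    return ap

# Tiny usage example with made-up operating points.
print(eleven_point_ap([0.2, 0.5, 0.8, 1.0], [1.0, 0.9, 0.7, 0.5]))
```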
In order to further demonstrate the performance improvement of the GAN-based image augmentation technology for the detection of particle defects on the dies, this research also constructed the YOLOv3 model, the GAN + YOLOv3 model (augmenting the training sample 1.5 times), the GAN + YOLOv3 model (augmenting the training sample 2 times), and the CycleGAN + YOLOv3 model (augmenting the training sample 1.5 times). After the training of the four models was completed, the study used the testing AP and the testing coordinate prediction error to evaluate the models on the testing images.
Before calculating AP, the precision-recall (PR) curve of each model was drawn as shown in Figure 8. The pseudo defective die images generated by the GAN worked well alongside the real defective die images in training the YOLOv3. In addition, when the GAN helped to increase the training images by about 1.5 times, the PR curve tended to converge as shown in Figure 8, and the corresponding testing AP jumped from 81.39% to more than 88%, an increase of about 7%, as shown in the second column of Table 2. As indicated by the testing coordinate prediction error, the coordinates, length, and width of the predicted bounding boxes were very close to those of the ground truth boxes. Even without the help of the GAN, the bounding box error predicted by the naïve YOLOv3 model was below three decimal places. After adding the GAN, the testing coordinate prediction error was reduced to below four decimal places, as shown in the last column of Table 2. This experiment shows that the pseudo defect images generated by the GAN play an important role in enriching the diversity of defects, which helps to improve the efficacy and versatility of the model. Besides, we also compared the CycleGAN + YOLOv3 with the proposed GAN + YOLOv3. The corresponding result is shown in the last row of Table 2. Its testing AP and coordinate prediction error were clearly unsatisfactory. The main reason is that the appearance of the particle patches generated by CycleGAN was far from that of real particles: not only was the defect area large, but the defect edges were not smooth.

Conclusions
Defect sample collection, defect annotation, and feature engineering have always been the most time-consuming tasks in defect detection. To address this issue, this research integrated the technology of generating pseudo defective samples (using GAN), automatic pseudo defect annotation (using DIP), and automatic feature extraction (using YOLOv3). The method proposed in this study does not rely on experts for feature engineering and does not need bulk defect samples. Massive defect annotations are not required, either. Users need only prepare a small defect image set, manually annotate it, and complete the model training before conducting inferences. This means that the method has great potential for application to various die patterns, whose appearances are changeable and complex. In addition, the experimental results show that after the addition of the GAN mechanism, both the overall detection precision of the predicted bounding box and the measurement accuracy of quality classification were improved. This indicates that the pseudo defect images generated by the GAN help enrich the diversity of the training data set, which to some extent improved the versatility of the model.
If semantic segmentation methods make a breakthrough in inference speed in the future, it may be possible to combine the GAN and semantic segmentation methods to perform defect segmentation. The annotation process of an object segmentation model captures the outline of the defect in the image, rather than simply annotating the rectangular bounding box, as happens in the object detection model. Therefore, the annotation does not contain the background and does not have to consider the angle, whereas a rectangular annotation may also enclose other defects nearby. In this way, the process of removing the background and the process of extracting blobs from the predicted bounding box can be omitted, and the efficiency of model inference can be improved.
During training, YOLOv3 used the modified loss function and the back-propagation algorithm to learn the weights, as shown in Equation (A2). The loss function of YOLOv3 was originally composed of three parts, namely the coordinate prediction error, the intersection over union (IoU) error, and the classification error [19]. However, since this research focused on a single-class problem, the classification error could be omitted.
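For reference, a plausible reconstruction of Equation (A2) from the term-by-term description below, assuming the standard YOLO loss form with the classification term removed; the square roots on the width and height follow the original YOLO formulation and are an assumption here, as is the grid notation ($S^2$ grid cells, $B$ boxes per cell).

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}^{\mathrm{obj}}_{ij}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}^{\mathrm{obj}}_{ij}
  \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
       + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{\mathrm{obj}}_{ij}
  \left( C_i - \hat{C}_i \right)^2
  + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}^{\mathrm{noobj}}_{ij} \left( C_i - \hat{C}_i \right)^2
\end{aligned} \tag{A2}
```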
The first two terms in Equation (A2) represent the coordinate prediction error, and $\lambda_{\mathrm{coord}}$ was a weight hyperparameter given in advance. Since the number of grid cells that did not contain objects far exceeded the number of grid cells that contained objects, the confidence loss of the cells without objects would be large; in order to reduce the impact of this problem on the network, $\lambda_{\mathrm{coord}}$ was generally set to 5. $\mathbb{1}^{\mathrm{obj}}_{ij}$ is the indicator function stating that the predicted bounding box $j$ of grid cell $i$ contains an object. The $\hat{x}_i$, $\hat{y}_i$, $\hat{w}_i$, and $\hat{h}_i$ represent the center coordinates and the width and height of the $i$th predicted bounding box, and $x_i$, $y_i$, $w_i$, and $h_i$ represent those of the $i$th ground truth box.
In addition, the last two terms in Equation (A2) represent the IoU error, where $\lambda_{\mathrm{noobj}}$ was a weight hyperparameter given in advance, which generally defaulted to 0.5. $\mathbb{1}^{\mathrm{noobj}}_{ij}$ is the indicator function stating that the predicted bounding box $j$ of grid cell $i$ does not contain an object. $\hat{C}_i$ represents the $i$th predicted confidence value, and $C_i$ indicates whether the $i$th ground truth box contains an object.