Article

CycleGAN-Based Data Augmentation for Scanning Electron Microscope Images to Enhance Integrated Circuit Manufacturing Defect Classification

Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(4), 803; https://doi.org/10.3390/electronics15040803
Submission received: 22 January 2026 / Revised: 11 February 2026 / Accepted: 11 February 2026 / Published: 13 February 2026

Abstract

Semiconductor defect inspection is frequently hindered by data scarcity and the resulting class imbalance in supervised learning. This study proposes a CycleGAN-based data augmentation pipeline designed to synthesize realistic defective CD-SEM images from abundant normal patterns, incorporating a quantitative quality control mechanism. Using an ADI CD-SEM dataset, we conducted a sensitivity analysis by cropping original 1024 × 1024 micrographs into 512 × 512 and 256 × 256 inputs. Our results indicate that increasing the effective defect-area ratio is critical for improving generative stability and defect visibility. To ensure data integrity, we applied a screening protocol based on the Structural Similarity Index (SSIM) and a median absolute deviation noise metric to exclude low-fidelity outputs. When integrated into the training of XceptionNet classifiers, this filtered augmentation strategy yielded substantial performance gains on a held-out test set, specifically improving the Recall and F1 score while maintaining a near-ceiling AUC. These results demonstrate that controlled CycleGAN augmentation, coupled with objective quality filtering, effectively mitigates class imbalance constraints and significantly enhances the robustness of automated defect detection.

1. Introduction

1.1. Defect Detection Plays a Key Role in Semiconductor Manufacturing Yield and Cost

Moore’s Law has driven the semiconductor industry into an era of unprecedented complexity, with critical dimensions (CDs) of integrated circuits shrinking into the single-digit nanometer regime. One of the critical techniques is the lithography process, which is responsible for transferring integrated circuit (IC) patterns from a photomask onto a silicon wafer. To overcome resolution limitations, optical lithography relies heavily on Resolution Enhancement Techniques (RETs) to mitigate image degradation caused by optical diffraction and IC process effects [1]. Among these RETs, Optical Proximity Correction (OPC) has become a necessary technique for patterning correction. However, its accuracy is inherently limited by complex interactions between optical, resist, and etch processes, particularly at advanced technology nodes where edge placement errors and pattern shifts become increasingly significant [1,2].
Despite continuous improvements in OPC modeling accuracy, systematic defects induced by residual OPC model errors are difficult to detect during early process verification stages [3]. These model-to-wafer deviations induce defects such as end-to-end shorts, bridging between lines, or broken lines. In fact, not only systematic defects but also random defects introduced at any stage of the IC manufacturing process can be devastating to integrated circuit functionality; random defects, in particular, are unpredictable in location and size [3]. Defect issues not only reduce IC manufacturing yield but also pose reliability risks. To overcome these challenges, new metrology techniques, including optical inspection and electron-beam inspection, have been widely deployed to monitor both critical dimensions and defectivity [4].
Recent advanced full-chip inspection systems, leveraging broadband illumination, AI-driven image analysis, and high-throughput platforms, enhance inspection capability to capture both systematic and random defects [5,6]. However, they also generate massive volumes of inspection data that require substantial human effort for defect verification, because tool-level classification capability remains insufficient. In our previous studies, deep learning-based frameworks have been developed to address this challenge by enabling automated defect recognition and SEM image analysis, including a two-stage CNN strategy for hotspot monitoring, layout-to-SEM image reconstruction with uncertainty calibration, and stochastic-aware hotspot detection using CycleGAN-augmented SEM data [7,8,9]. To save engineering time and improve defect inspection accuracy, an advanced methodology is needed, and AI-driven automation is a promising way to meet yield and reliability improvement targets.

1.2. Leveraging Generative Adversarial Networks (GANs) to Enhance Defect Detection Capability

Conventional defect inspection methods are labor-intensive and time-consuming, and the semiconductor industry has turned to machine learning (ML) techniques to automate defect detection and enhance defect classification accuracy [10]. Over the past decade, ML models have evolved significantly, transitioning from neural networks (NNs) to deep neural networks (DNNs), convolutional neural networks (CNNs), and other advanced architectures optimized for image recognition tasks [10,11]. These models have demonstrated remarkable success in identifying abnormal defects on semiconductor wafers, thereby offering a promising alternative to manual defect review. However, a traditional obstacle remains: the scarcity and imbalance of defect images. Unlike general computer vision images, the defect images in semiconductor processes are rare and often exhibit significant imbalanced classification, which constrains the model performance and increases the likelihood of false negatives [11]. Empirical studies have shown that the effectiveness of classical augmentation can be highly sensitive to the choice of operations and hyperparameters (e.g., magnitude, probability, and ranges), which may substantially influence both the image quality and downstream model performance [12]. In this work, we do not perform an augmentation policy search or hyperparameter tuning; instead, we intentionally adopt a minimal and fixed classical augmentation recipe (random horizontal flip and small random rotation) to avoid confounding effects and to isolate the contribution of CycleGAN-based sample enrichment.
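As an illustration, the fixed classical augmentation recipe described above (random horizontal flip plus a small random rotation) can be sketched as follows. The ±10° rotation range and the use of `scipy.ndimage.rotate` are assumptions for demonstration, not settings reported in the text.

```python
import numpy as np
from scipy.ndimage import rotate

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Minimal fixed augmentation: random horizontal flip + small rotation.

    The +/-10 degree range is an illustrative assumption; the point is that
    the recipe is deliberately small and fixed, with no policy search.
    """
    out = image
    if rng.random() < 0.5:            # random horizontal flip
        out = np.fliplr(out)
    angle = rng.uniform(-10.0, 10.0)  # small random rotation (degrees)
    out = rotate(out, angle, reshape=False, mode="nearest")
    return out

rng = np.random.default_rng(0)
img = rng.random((256, 256))
aug = augment(img, rng)               # same shape as the input
```

Keeping the recipe fixed in this way avoids confounding the augmentation policy with the CycleGAN-based enrichment under study.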
To address this issue, researchers have explored various data augmentation techniques, including geometric transformations and oversampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) [13,14,15]. These approaches have shown some efficacy in expanding training datasets and improving model robustness; however, they often fall short in capturing the real defects. This limitation implies that we need more sophisticated data generation methods to produce realistic and diverse defect images.
The advanced ML model of Generative Adversarial Networks (GANs) has created a new pathway for resolving data scarcity issues in defect inspection. GANs operate by learning the underlying distribution of defective images and generating synthetic images that closely resemble real defects [16]. Recent studies have demonstrated that GAN-based data augmentation can significantly enhance the defect detection performance, particularly for rare or unseen defect classes [17,18,19]. GANs enrich training datasets with high-fidelity synthetic images, enabling ML models to learn more diverse features, and thus improving the classification accuracy and reducing the reliance on manual verification. GAN-based augmentation has also been successfully applied to mitigate data imbalance in other industrial fault/defect settings, supporting its suitability as a class-enrichment mechanism [20]. Complementary to synthesis-based enrichment, recent few-shot learning approaches (e.g., global–local feature fusion) provide an alternative route to improve the performance under sample scarcity; our work is orthogonal and complementary in that it enriches the image domain (after QC) to better support standard CNN training under severe defect imbalance [21].

1.3. Advantages of GAN Model and Framework to Mitigate Data Scarcity Issue

To build a robust ML model for defect inspection, comprehensive image sets are necessary; however, paired good and defective images are scarce, and geometric features must be preserved, which poses a significant challenge for conventional supervised learning approaches [17,18,19]. Generative Adversarial Networks (GANs), particularly Cycle-Consistent GANs (CycleGANs), have emerged as a promising solution to address these limitations by enabling unpaired image-to-image translation [22]. A CycleGAN’s cycle-consistency loss ensures that transformations between domains maintain structural integrity, making it suitable for generating realistic defect images without compromising critical layout features [22,23].
The generator architecture for CycleGAN is a crucial choice for the fidelity of synthesized images. ResNet-based generators, which utilize residual blocks to learn localized differences, have demonstrated a superior performance in preserving essential circuit features such as linewidth and contact apertures [24,25]. In contrast, U-Net architectures, while effective in biomedical segmentation tasks, may introduce undesirable spatial distortions due to their aggressive feature mixing across scales [26,27]. Consequently, ResNet is the favored architecture for CD-SEM image synthesis, where maintaining topological accuracy is important.
Recent studies have validated the efficacy of generative models in semiconductor defect inspection, adapting conditional GANs and diffusion-based generators that have shown improved defect detection performance in limited-data regimes and highlighted the potential of synthetic data augmentation [17,18,19,28]. Our study focuses on CycleGAN with a ResNet architecture, as this approach supports high-fidelity generation of defective images and robust training of defect detection models, ultimately enhancing automated inspection capabilities in semiconductor fabrication.

1.4. Quantify the Quality of Synthesized Defective Images

CycleGAN performs unpaired image-to-image translation, which makes it possible to produce defective images that retain most of the original patterns while introducing localized anomalies; it mimics real defective images without requiring paired datasets [28,29]. One critical factor influencing the fidelity of CycleGAN-generated defect images is the input image resolution. A higher resolution enhances structural details in synthesized outputs, leading to better perceptual and defect image quality [28,29,30]. However, increasing the resolution also raises computational demands, making it necessary to balance circuit complexity against generation efficiency [30]. An alternative is to keep the image size fixed while increasing the effective resolution by cropping, although the drawback is a lower pattern complexity per image.
It is crucial to evaluate the quality of synthetic defect images, so robust image quality assessment (IQA) metrics are needed. For defective image generation, full-reference IQA (FR-IQA) methods are particularly suitable, as they compare synthesized images against original references to quantify deviations. Among FR-IQA metrics such as the mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and feature similarity index (FSIM), SSIM stands out for its alignment with human visual perception [31,32,33]. SSIM evaluates luminance, contrast, and structural distortions, and it outperforms purely pixel-wise metrics for defect-like anomalies [32]. This study adapts CycleGAN with a ResNet structure for defective image synthesis, prioritizing resolution balancing and SSIM-guided quality control. By defining an optimal SSIM interval, we aim to produce high-fidelity datasets that support machine learning model development.

1.5. Optimize Data Sampling Methodology to Ensure Robust ML Model Capability

The primary objective of this study is to enrich the minority defect class, thereby mitigating class imbalance and improving the accuracy of ML models. However, prior research highlights that the proportion of synthetic images in the training dataset must be carefully controlled. Excessive reliance on synthetic data can lead to overfitting to the generative model’s style, and it will reduce the model’s ability to detect real defects [34].
Another critical consideration is the domain gap between synthetic and real images; it can degrade model performance if not properly handled. Fortunately, prior studies have shown that CycleGAN-based domain adaptation techniques can significantly reduce this domain gap by learning bidirectional mappings between domains while preserving structural integrity [35,36]. This capability makes CycleGAN particularly suitable for synthetic images that essentially need to maintain fine structural details as original images. The quality of synthetic images also plays a key role in determining the effectiveness of data augmentation.
Recent research demonstrates that filtering out low-quality synthetic images leads to remarkable improvements in ML model performance on the same training datasets [35,36]. Consequently, incorporating image quality assessment into data augmentation is crucial for maximizing the benefits of synthetic data. We aim to determine an optimal ratio of synthetic to real images that balances diversity and realism, leverage CycleGAN’s domain adaptation capabilities to minimize domain discrepancies, and implement quality control mechanisms to exclude low-fidelity synthetic images.
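The quality-control idea described above can be sketched as a simple accept/reject filter over candidate synthetic images. The threshold values and the `ssim`/`noise` field names below are illustrative placeholders, since the paper determines its accepted SSIM interval empirically.

```python
def filter_by_quality(candidates, ssim_lo=0.70, ssim_hi=0.95, noise_max=0.05):
    """Keep synthetic images whose SSIM against the source lies inside an
    accepted interval and whose noise estimate is below a ceiling.

    Very high SSIM suggests the generator changed almost nothing (no defect
    introduced); very low SSIM or high noise suggests a blurred, low-fidelity
    output. All thresholds here are hypothetical examples.
    """
    return [c for c in candidates
            if ssim_lo <= c["ssim"] <= ssim_hi and c["noise"] <= noise_max]

candidates = [
    {"id": 1, "ssim": 0.88, "noise": 0.02},  # localized defect, kept
    {"id": 2, "ssim": 0.99, "noise": 0.01},  # too similar: likely no defect
    {"id": 3, "ssim": 0.55, "noise": 0.09},  # blurred / noisy, rejected
]
kept_ids = [c["id"] for c in filter_by_quality(candidates)]
```

Only the first candidate survives the filter in this toy example, matching the intuition that usable synthetic defects sit between "unchanged" and "degraded".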

1.6. Contribution and Scope of This Work

This study presents a new methodology to generate defective images using CycleGAN, which mitigates the major challenges of defective image scarcity and class imbalance in ML-based defect detection. Traditional augmentation methods, such as geometric transformations and oversampling, often fail to improve ML model accuracy sufficiently, and collecting more real defective images is impractical; the proposed CycleGAN flow provides a promising alternative by synthesizing defective images without requiring paired datasets. These models can enrich minority classes, improve classifier robustness, and reduce the reliance on manual inspection. This work contributes as follows:
(1)
Optimize CycleGAN-based augmentation to mitigate data scarcity and class imbalance.
(2)
Adopt ResNet-based generators to preserve geometric fidelity and reduce ML training time.
(3)
Introduce a resolution-aware synthesis with SSIM-guided IQA for perceptual realism.
(4)
Develop a robust data selection methodology for generated defective images and optimize synthetic-to-real image ratios to minimize discrepancies and filter low-quality images.
By combining a generative model, image quality assessment, and synthesized image selection, this study establishes a scalable pipeline for defect detection ML for semiconductor manufacturing. The proposed approach advances synthetic data-driven inspection and provides actionable insights for deploying AI-enhanced solutions for IC high-volume manufacturing.

2. Materials and Methods

Our study addresses the significant class imbalance in the After Development Inspection (ADI) CD-SEM image dataset, which comprises 517 normal and only 209 defective samples from the microlithography process. To mitigate the risk of biased model training, the CycleGAN approach was employed to generate realistic defective images from normal ones, thereby enriching the minority class and improving the dataset balance.
The generation process was systematically optimized in a multi-step approach. First, we determined the optimal image resolution by using noise as a key performance indicator. At the same time, we compared ResNet and U-Net generator architectures to select the most effective framework for producing realistic defect features. Secondly, we established the generation framework and identified the ideal training duration. The Structural Similarity Index (SSIM) was used as a metric to define an optimal training duration range, effectively balancing the trade-off between generating good defective images and avoiding image blurring.
Once the synthesis range was established, the generated defect images were integrated into the dataset. All images, including original and generated ones, were normalized to a unified grayscale format and resized to maintain a consistent resolution. The combined dataset was split into 60%, 20%, and 20% for the training, validation, and testing subsets, respectively, ensuring statistical independence between groups. This dataset provided the foundation for subsequent XceptionNet model training, where Accuracy, Precision, Recall, F1 score, and AUC served as performance indicators to compare the proposed CycleGAN-augmented dataset against conventional training datasets.
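A minimal sketch of the 60/20/20 shuffle-and-slice split, assuming the combined dataset is indexed 0..N−1. The seed and the slicing scheme are illustrative choices, not the authors' exact procedure.

```python
import random

def split_indices(n, seed=42, fractions=(0.6, 0.2, 0.2)):
    """Shuffle indices once, then slice into disjoint train/val/test subsets
    (60/20/20 as in the text), so every sample lands in exactly one group."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # deterministic shuffle
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]          # remainder absorbs rounding
    return train, val, test

# 517 normal + 209 defective original images = 726 samples
train, val, test = split_indices(726)
```

Because the three slices come from one shuffled permutation, statistical independence between the groups holds by construction.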

2.1. Dataset Acquisition and Characterization

This study utilized a dataset provided by a semiconductor manufacturer. All images were captured by a Critical Dimension Scanning Electron Microscope (CDSEM; Hitachi, Tokyo, Japan) in the ADI stage of a dedicated mask. Each image covers an on-wafer area of 1 μm × 1 μm with an image resolution of 1024 × 1024 pixels. To ensure coverage of a broad variety of lithographic pattern behaviors, data were systematically collected from dies located near the process window boundary in the same field, and approximately 20 pattern categories were identified. All images were manually examined and classified by senior process engineers. Based on lithographic print integrity, the dataset was divided into two classes: (1) normal patterns (517 images) and (2) defective patterns (209 images). Defects were defined as patterns exhibiting a complete line break or complete bridging relative to their nominal printed shape. The number of defective images was substantially lower than that of normal images because OPC significantly reduces the probability of systematic pattern failure. To mitigate this class imbalance, data collection was augmented by intentionally sampling dies at the process boundary, where defect occurrence is more likely. This strategy ensured that the dataset included defect types of high manufacturing relevance, improving the representativeness of the defect set for model training.
CDSEM images contain many patterns within a single field of view. A large image size facilitates the human detection of defects due to their broader coverage; however, prior studies have shown that extremely small defects produce weak signals in machine learning models, making them difficult to learn effectively. To address this limitation, regions of interest (ROIs) were extracted from each 1024 × 1024 image at two additional resolutions of 512 × 512 pixels and 256 × 256 pixels. This approach allowed us to evaluate which resolution yields the highest fidelity in generated defective patterns. CycleGAN was employed to synthesize additional defective patterns while preserving the majority of pattern geometry from the original images. Since we want the generated defect features to occur in a tiny region, the generated image should largely preserve the original features.
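The ROI extraction above can be sketched as a simple crop. Whether the paper's crops are centered or aligned to the defect location is not stated, so the centering used here is an assumption for illustration.

```python
import numpy as np

def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Extract a centered size x size ROI from a larger micrograph.

    The paper crops 1024x1024 CDSEM images down to 512x512 and 256x256 so
    the fixed-size defect occupies a larger fraction of the input.
    """
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]

full = np.zeros((1024, 1024), dtype=np.uint8)   # stand-in for a CDSEM frame
roi_512 = center_crop(full, 512)
roi_256 = center_crop(full, 256)
```

A 5 × 10 pixel defect covers ~0.0048% of the 1024 × 1024 frame but ~0.076% of a 256 × 256 crop, a 16× increase in effective defect-area ratio.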
Training a generative model requires numerous iterations; thus, the model was trained for an extended number of epochs, and image outputs were recorded at every 10,000-epoch interval. These intermediate snapshots facilitated the systematic analysis of training stability and generation quality across different resolutions.
The evaluation of generated defect images proceeded in two stages. First, humans visually examined generated samples at each resolution to qualitatively assess whether the intended defect features were successfully reproduced. Because the goal is to introduce a defect without altering unrelated regions, visual inspection was essential for verifying localized pattern modification. Second, a noise metric was computed to quantify unintended modifications introduced by CycleGAN. The noise score measured pixel-level differences between the generated image and the original image. Smaller noise values indicate that the generative model preserved most pattern geometry while modifying only defect-relevant regions. This metric was therefore used to compare the effectiveness of different image sizes in maintaining structural fidelity while producing detectable defects.

2.2. CycleGAN Architecture

This study employs a CycleGAN to translate normal CDSEM images into their defective counterparts without requiring paired training data. As shown in Figure 1, the framework consists of two generators and two discriminators that jointly learn a bidirectional mapping between the normal domain (X) and the defect domain (Y). Each generator is trained to produce visually realistic images that fool the corresponding discriminator, while the discriminator learns to distinguish generated outputs from real samples using the least-squares adversarial loss.
To ensure structural fidelity in CDSEM imagery, the model incorporates a cycle-consistency loss, which enforces that an image translated from (X → Y) and back to (X) reconstructs the original input. An identity loss further stabilizes training by encouraging generators to preserve global line patterns when the input already belongs to the target domain. This design enables the CycleGAN to modify only defect-related regions while maintaining geometric consistency, making the framework suitable for synthesizing realistic defective CDSEM patterns for downstream defect-detection tasks.
Two CycleGAN architectures were implemented to translate normal CDSEM images into synthetic defective images: one using a ResNet-based generator and the other using a U-Net-based generator. Both models were trained with a learning rate of 2 × 10⁻⁴, a batch size of 3, the Adam optimizer (β₁ = 0.5), and standard CycleGAN losses with weights of 10 for cycle consistency, 5 for identity, and 1 for adversarial loss.
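The weighted generator objective can be illustrated numerically. The toy tensors and individual loss values below are hypothetical, but the 1/10/5 weighting of the adversarial, cycle-consistency, and identity terms follows the text.

```python
import numpy as np

def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error, the usual choice for both the cycle and
    identity terms in CycleGAN."""
    return float(np.mean(np.abs(a - b)))

def cyclegan_total_loss(adv, cyc, idt, w_adv=1.0, w_cyc=10.0, w_idt=5.0):
    """Combine the three generator loss terms with the weights reported in
    the text: adversarial 1, cycle consistency 10, identity 5."""
    return w_adv * adv + w_cyc * cyc + w_idt * idt

x = np.ones((4, 4))                    # toy "real" image
x_rec = np.ones((4, 4)) * 0.9          # toy cycle reconstruction X -> Y -> X
cyc_loss = l1(x, x_rec)                # 0.1 in this toy case
total = cyclegan_total_loss(adv=0.25, cyc=cyc_loss, idt=0.02)
```

The large cycle-consistency weight is what pushes the generators toward identity-preserving edits, modifying only defect-relevant regions.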
The ResNet generator follows Zhu et al. [22] and contains an encoder, 9 residual blocks, and a decoder. Each residual block uses two 3 × 3 convolutions with instance normalization and skip connections. This structure enables identity-preserving transformations that modify only small defect-relevant regions while maintaining the global line-edge and line-width integrity essential for CDSEM imagery. The design is particularly robust when defects are subtle relative to the normal pattern.
The U-Net generator adopts an 8-level encoder–decoder with long skip connections at each symmetric layer. Convolutional downsampling reduces the feature resolution to a bottleneck representation, while transposed convolutions restore full resolution. Skip connections directly fuse encoder features into the decoder, enabling precise spatial alignment and improved detection of defects occupying only a few pixels. However, the same skip connections can propagate SEM noise into the generated output.

2.3. Adapt MAD and SSIM to Evaluate Image Quality

To evaluate the appropriateness of the image resolution and determine the degree of structural consistency between generated and original CD-SEM images, we employed two quantitative metrics: the Median Absolute Deviation (MAD) and SSIM. MAD provides a global assessment of pixel-wise perturbations, and SSIM offers localized, perceptually grounded insight into structural fidelity, which is useful for evaluating CycleGAN-generated micro-patterns.
To quantify the noise level in generated CDSEM images of different image sizes, we employ MAD, which is sensitive to strong signal features such as edges and textures. The noise standard deviation, sigma, is estimated using Equation (1).
$$\hat{\sigma} = \frac{\operatorname{median}\left(\left|W_{HH} - \operatorname{median}(W_{HH})\right|\right)}{0.6745} \tag{1}$$
where $W_{HH}$ represents the diagonal detail coefficients from the first level of the Haar wavelet transform. The denominator, 0.6745, is the inverse cumulative distribution function of the standard normal distribution at the 0.75 quantile, scaling the MAD to be a consistent estimator of the Gaussian noise standard deviation.
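Equation (1) can be implemented directly. The one-level Haar transform below is written in plain NumPy to avoid a wavelet-library dependency; this is an implementation choice for the sketch, not the authors' code.

```python
import numpy as np

def estimate_noise_mad(image: np.ndarray) -> float:
    """Robust noise estimate from the diagonal (HH) Haar subband:
    sigma_hat = median(|W_HH - median(W_HH)|) / 0.6745, as in Equation (1)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    img = img[:h - h % 2, :w - w % 2]          # trim to even dimensions
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    hh = (a - b - c + d) / 2.0                 # level-1 Haar diagonal details
    mad = np.median(np.abs(hh - np.median(hh)))
    return float(mad / 0.6745)

rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 5.0, (256, 256))       # pure Gaussian noise, sigma = 5
sigma_hat = estimate_noise_mad(noisy)          # recovers roughly 5
```

Because the HH subband mostly captures fine-scale fluctuations rather than pattern edges, the estimate stays small when the generator preserves the original geometry.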
To capture local structural fidelity, SSIM was employed. SSIM between patches x and y is defined as
$$\mathrm{SSIM}(x, y) = l(x, y)\, c(x, y)\, s(x, y)$$
where
$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$
Here, $\mu_x$ and $\mu_y$ denote local means, $\sigma_x$ and $\sigma_y$ their standard deviations, and $\sigma_{xy}$ the cross-covariance; $C_1$, $C_2$, and $C_3$ are stabilizing constants. By integrating these three components, SSIM emphasizes localized structural alignment rather than raw pixel similarity. CD-SEM images are characterized by high-contrast line edges, narrow trenches, and critical dimension features, so SSIM is particularly advantageous because it sensitively detects changes in edge sharpness, pattern rounding, and micro-bridge formation.
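A single-window SSIM following the luminance–contrast–structure decomposition above might look as follows. The constants C₁ = (0.01L)², C₂ = (0.03L)², and C₃ = C₂/2 are the conventional defaults rather than values stated in the text, and production implementations slide a Gaussian window over the image instead of using one global window.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM: the product l(x,y) * c(x,y) * s(x,y) computed
    once over whole patches. A simplified sketch of the full metric."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    C3 = C2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mu_x) * (y - mu_y)).mean()
    l = (2 * mu_x * mu_y + C1) / (mu_x**2 + mu_y**2 + C1)   # luminance
    c = (2 * sx * sy + C2) / (sx**2 + sy**2 + C2)           # contrast
    s = (sxy + C3) / (sx * sy + C3)                         # structure
    return float(l * c * s)

img = np.linspace(0, 255, 64 * 64).reshape(64, 64)
identical = ssim_global(img, img)                  # 1.0 for identical patches
brighter = ssim_global(img, np.clip(img + 40, 0, 255))  # < 1.0: means differ
```

Identical patches score exactly 1; any luminance, contrast, or structural deviation pulls the score below 1.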

2.4. Classifier Model and Model Performance KPI

2.4.1. Adapt Xception Model as Classifier

The classifier employed in this study is the Xception model, and the dataset consists of images categorized into “non-defective” and “defective” classes. Prior to training, all images were converted to JPEG format to ensure consistency and partitioned into a training set (60%), a validation set (20%), and a test set (20%). To enhance the model’s generalization capabilities and mitigate overfitting, data augmentation techniques, specifically random horizontal flipping and random rotation (10%), were applied during the training phase. The proposed model is a custom CNN architecture incorporating depthwise separable convolutions and residual connections. The network starts with a rescaling layer, followed by a series of convolutional blocks utilizing Rectified Linear Unit (ReLU) activation and Batch Normalization. The core architecture features residual blocks with increasing filter sizes (128, 256, 512, 728), employing SeparableConv2D layers to capture spatial and cross-channel correlations independently. The model concludes with a Global Average Pooling layer and a Dropout layer (rate = 0.5) to further regularize the network. The output layer utilizes a sigmoid activation function for binary classification. Training was conducted for 400 epochs using the Adam optimizer with a learning rate of 10⁻³ and a batch size of 8, minimizing the binary cross-entropy loss function.

2.4.2. Binary Classification Evaluation Metrics

The performance of the proposed machine learning model was rigorously evaluated using five standard metrics derived from the confusion matrix: Accuracy, Precision, Recall, F1 Score, and the Area Under the Receiver Operating Characteristic Curve (AUC). For a binary classification task, let TP (True Positives) be the number of instances correctly identified as positive, TN (True Negatives) the number correctly identified as negative, FP (False Positives) the number of negative instances incorrectly labeled as positive, and FN (False Negatives) the number of positive instances incorrectly labeled as negative. The total number of instances is $N = TP + TN + FP + FN$. We treat the defective class as the positive class, since detecting defective images is the primary goal of this classification task.
Accuracy is the ratio of correctly predicted instances to the total number of instances. It represents the overall effectiveness of the model.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision (also known as Positive Predictive Value) is the ratio of correctly predicted positive observations to the total predicted positive observations. It measures the model’s ability to avoid labeling negative samples as positive.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall (also known as Sensitivity or True Positive Rate) is the ratio of correctly predicted positive observations to all observations in the actual positive class. It measures the model’s ability to find all the positive samples.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The F1 Score is the harmonic mean of Precision and Recall. It is a robust metric, particularly useful when the class distribution is imbalanced, as it penalizes models that have a poor performance in either Precision or Recall.
$$F_1\ \mathrm{Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$$
The AUC measures the model’s ability to distinguish between positive and negative classes across all possible classification thresholds. The ROC curve plots the True Positive Rate (Recall) against the False Positive Rate, $\mathrm{FPR} = \frac{FP}{FP + TN}$, at various threshold settings. An AUC value of 1.0 indicates a perfect classifier, while an AUC of 0.5 suggests no better performance than random chance. In practice, the AUC is typically calculated using numerical integration methods or non-parametric statistics.
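The confusion-matrix metrics above can be computed in a few lines of Python; the confusion counts in the usage example are hypothetical and for illustration only.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, Precision, Recall, and F1 from confusion-matrix counts,
    with 'defective' treated as the positive class as in the text."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic-mean form, equivalent to 2*P*R / (P + R) when tp > 0.
    f1 = (2 * tp) / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical confusion counts for a defect classifier evaluation.
m = binary_metrics(tp=38, tn=100, fp=2, fn=4)
```

Note how F1 stays informative under imbalance: it ignores TN entirely, so a model that simply predicts "normal" everywhere scores 0.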

3. Results

3.1. Computation and Image Resource

The core of this study is a CycleGAN-based architecture employed to generate synthetic defect images from non-defect images; these generated images were then merged with the original defect dataset. The Xception architecture was adopted as the baseline model due to its strong defect classification capability reported in prior studies. All experiments were run on a workstation equipped with an Intel i7-13700 CPU, 64 GB DRAM, and an NVIDIA GeForce RTX 4090 GPU with 24 GB of GDDR6X VRAM, ensuring sufficient computational resources for both image generation and deep model training. The experimental dataset consisted of grayscale CDSEM images acquired from a leading foundry’s 22 nm process node in the ADI stage. All original images are in the TIF format, with 256-level grayscale, and 1024 × 1024 pixels in resolution. To investigate the influence of the image size on the generative model’s performance, a customized script was developed to crop and resize the original images into several different dimensions for this study. After the optimal image size was verified for CycleGAN image generation, the original dataset was converted to the new image size, which directly impacts both the training time of the CycleGAN and that of the subsequent classifier.

3.2. Optimizing Input Resolution for Defect Generation

This study began by exploring how to effectively generate high-quality defective CDSEM images with the CycleGAN generator. The ability to generate realistic defective patterns is the most critical step. As shown in Figure 2, the original image with a size of 1024 × 1024 pixels contains a small line break defect of 5 × 10 pixels that occupies only 0.0048% of the image area. This presents a low signal-to-noise ratio (SNR) that may hinder the CycleGAN’s ability to effectively learn defect features. Based on this hypothesis, we attempted to amplify this weak defect signal by cropping away defect-free pattern areas of the original image and regenerating new images of 512 × 512 and 256 × 256 pixels for CycleGAN model training. The goal was to find the image size that yields the most stable and realistic defect images from the generator. The results are summarized in Figure 3, where “original” denotes the defect-free image and subsequent panels show CycleGAN-generated results at different epochs.
As shown in Figure 3, the training duration was increased in increments of 10,000 epochs (denoted 10k epochs) to examine its influence on image quality. For the original 1024 × 1024 images, the generated outputs became noticeably blurred after 10k epochs, with details deviating from the originals; as training progressed, the image quality deteriorated further and no usable defect images were produced. For the 512 × 512 size, only minor surface changes appeared at 10k epochs and slight surface variations were visible at 20k epochs; after 30k epochs, heavy blurring set in, similar to the 1× behavior, and no good defect image was generated. In contrast, the 256 × 256 images exhibited mild surface changes from 10k to 30k epochs while retaining high similarity to the originals. After 40k epochs, the generated images not only preserved the original structure but also showed scum and bridging defect features, and this behavior remained stable through 50k epochs. These findings indicate that a 256 × 256 input size is the most effective for CycleGAN defect image generation; we therefore converted all original 1024 × 1024 images to 256 × 256.
For advanced IC manufacturing, defect sizes vary across process nodes, and this variability is commonly addressed by adjusting CD-SEM magnification during the review stage. By calibrating the magnification according to the process context, the effective defect-area ratio within a fixed input resolution can be normalized to a predefined size (e.g., 256 × 256). This normalization enables the CycleGAN model to operate on a consistent input resolution while remaining sensitive to defects at different physical scales.
We subsequently adapted the median absolute deviation (MAD) to quantitatively validate these visual findings by measuring the deviation between the generated and original images. As shown in Figure 4, the noise index for the 1/16× (256 × 256) images increased only slightly with extended training. In comparison, the noise for the 1/4× (512 × 512) images increased at approximately twice the rate of the 1/16× images, and the noise for the 1× (1024 × 1024) images increased at roughly double the rate of the 1/4× images. This quantitative result matches our visual assessments. Based on these findings, we selected the 1/16× size as the baseline setting for all subsequent CycleGAN generation.
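One plausible formulation of such a noise index, sketched here under the assumption that MAD is taken over the pixel-wise residual between generated and original images (the study's exact definition may differ):

```python
from statistics import median

def mad_noise(generated, original):
    """Median absolute deviation of the pixel-wise residual between a
    generated image and its defect-free source (both flat pixel lists).
    A rising value over training epochs signals accumulating noise."""
    residual = [g - o for g, o in zip(generated, original)]
    m = median(residual)
    return median(abs(r - m) for r in residual)

# Identical images give zero deviation; a perturbed copy does not.
src = [10, 12, 11, 13, 12, 10]
print(mad_noise(src, src))  # 0.0
print(mad_noise([p + d for p, d in zip(src, [0, 3, -2, 1, 0, -4])], src))
```

Because MAD is a robust statistic, a small localized defect barely moves it, while the diffuse blurring seen at high epochs drives it upward.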

3.3. Classification of Generated Defect Images

Based on the conclusions of the previous section, we used 263 defect-free CD-SEM images at the 256 × 256 size to generate synthetic defect images. The generated images could be roughly classified into three types, whose quality evolves with training epochs, as illustrated in Figure 5.
The first type (Type-1) exhibited only small surface speckles or discontinuities before approximately 40k training epochs, while maintaining a high degree of consistency with the original pattern. As training progressed, a distinct defect emerged while the structural similarity with the original image remained high. Type-1 represents the desirable outcome: the generated defects are visually realistic without structural distortion. The second type (Type-2) showed clear defect features after 20k training epochs, but with more noticeable deviations from the original pattern; after 60k epochs, these images became blurry and difficult to interpret, making them unsuitable for subsequent machine learning applications. The third type (Type-3) displayed severe noise accumulation along pattern edges as early as 10k epochs. After 20k epochs, numerous defect-like features appeared, but they deviated strongly from the original pattern. Beyond 60k epochs, these images degraded into completely blurred and chaotic signals, indicating overfitting or training instability.
These observations reveal that images generated under the same training conditions can exhibit substantial variability; a quantitative metric is therefore needed to judge whether the generated patterns fall within acceptable limits. The Structural Similarity Index (SSIM) was adopted for this purpose, since it quantifies how closely a generated image resembles the original while capturing localized distortions. Accordingly, all SSIM values reported here are comparisons between generated images and their originals. As shown in Figure 6, the SSIM of all generated images decreased as training epochs increased: the initial drop was steep, followed by a slower decline as training progressed. However, the rate of SSIM decay varied across pattern types. Type-1 images showed the slowest decline, indicating stable structural preservation and a higher likelihood of producing high-quality defective images; Type-2 images decayed more rapidly but still yielded moderate-quality defects; Type-3 images exhibited the steepest decline, often falling below SSIM = 0.2 beyond 70k epochs, corresponding to severely degraded image quality. These findings confirm that SSIM is an effective indicator for determining the appropriate training epoch range and for identifying whether generated images are suitable for downstream machine learning applications.
Importantly, SSIM is utilized here as a relative guiding metric rather than an absolute binary filter. Due to the inherent structural heterogeneity among the three pattern types, a universal SSIM threshold is impractical. Instead, the metric serves to describe the effective training window, identifying the transition from under-trained features to over-trained structural distortions.
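A minimal illustration of SSIM-based screening follows. For simplicity it evaluates the SSIM formula over a single global window rather than the usual sliding-window average, and the 0.3 / 0.95 bounds in the hypothetical `keep` helper are illustrative placeholders rather than thresholds taken from this study:

```python
from statistics import fmean

def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two grayscale images given as flat
    pixel lists, using the standard stabilizing constants C1 and C2."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean((p - mx) ** 2 for p in x)
    vy = fmean((q - my) ** 2 for q in y)
    cov = fmean((p - mx) * (q - my) for p, q in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def keep(generated, original, lo=0.3, hi=0.95):
    """Screening rule in the spirit of the text: discard outputs that are
    nearly identical to the source (no defect synthesized) or heavily
    degraded (structural collapse)."""
    return lo < global_ssim(generated, original) < hi

print(global_ssim([50, 80, 120], [50, 80, 120]))  # 1.0 for identical images
```

Production code would typically use a windowed implementation such as scikit-image's `structural_similarity`, which also captures the localized distortions discussed above.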

3.4. Comparison of Generator Architectures

Using the same images, two common CycleGAN generator architectures were compared: ResNet (the architecture used in the previous subsection) and U-Net. Since the previous section established SSIM as a suitable metric for assessing the quality of generated CD-SEM images, we used it to evaluate both architectures. After 50k training epochs, the SSIM for the U-Net architecture decreased at a much slower rate than for ResNet, as shown in Figure 7: the average SSIM was 0.47 for ResNet versus 0.72 for U-Net. Notably, 50.1% of the U-Net-generated images still scored above 0.8, while no ResNet-generated image exceeded 0.6.
To further evaluate practical defect synthesis capability, we applied a rule-based defect verification criterion identical to that used for labeling real CD-SEM images. Under this criterion, the ResNet-based generator achieved a defect generation success rate of 77% at 50k training epochs, whereas the U-Net-based generator achieved only 21%. This indicates that a large fraction of the high-SSIM U-Net outputs remain nearly indistinguishable from the original defect-free images, reflecting insufficient defect synthesis rather than superior image quality. In contrast, the ResNet-based generator reaches the effective defect-generation regime earlier, and its faster SSIM decline correlates with the emergence of realistic localized defect patterns.
In Figure 8, we selected three images from the U-Net model (trained for 50k epochs) with SSIM scores above 0.9 and compared them against both the original images and the ResNet outputs from the same training period. The U-Net images are almost identical to the originals, whereas the ResNet images differ considerably from the source, in agreement with the quantitative SSIM scores. This result indicates that U-Net converges significantly more slowly on our task and would require much longer training to produce good defective patterns. We therefore selected the ResNet architecture as the baseline CycleGAN generator.

3.5. Enhancing Defect Detection Performance via CycleGAN Data Augmentation

In semiconductor process monitoring, misclassifying a defective pattern as normal can delay the discovery of an abnormal product or process and thereby impact manufacturing yield. Conversely, if a normal pattern is misclassified as anomalous, the product is merely flagged for further inspection; such a false alarm does not affect product quality, and its only cost is the additional manpower and resources consumed for verification. Missed defects are therefore the more severe error and the metric we prioritize for improvement. Accordingly, we define defective patterns as the “positive class” and normal patterns as the “negative class”, which facilitates a focused evaluation of defect detection performance.
The model trained on the original image database is designated the baseline model, and additional models were developed by augmenting this dataset with synthetic defect images. Based on the findings of the previous section, images generated at 30k, 40k, and 50k training epochs were identified as the most suitable for training, yielding three augmented models denoted baseline+30Ke, baseline+40Ke, and baseline+50Ke, respectively. As discussed above, images generated at the same epoch do not have uniform quality; some are under-trained and others over-trained. Therefore, a “rule-based defect classification” and filtering procedure, identical to that used for labeling real wafer CD-SEM images, was applied to exclude unsuitable synthetic images. Specifically, a generated image was retained if (i) part of its geometry was substantially altered (e.g., a line break or bridging) rather than merely deformed, (ii) it resembled a lithography-related defect rather than a random defect, and (iii) its global pattern integrity was preserved. This procedure does not rely on subjective preference but follows the same labeling principles used for the baseline dataset. Datasets constructed in this way are denoted filtered datasets (e.g., filtered_30Ke), and a model trained on the baseline dataset augmented with such a dataset is named accordingly (e.g., “baseline + filtered_30Ke”). Applying the same procedure to the 40Ke and 50Ke datasets yielded three additional filtered models. We computed the accuracy, precision, recall, F1 score, and AUC for all models; comparative results are presented in Table 1.
Compared to the baseline model, accuracy for the models augmented with unfiltered datasets changed by −0.77% for 30Ke (a slight degradation), +0.37% for 40Ke, and +0.06% for 50Ke. In contrast, the filtered models showed consistent improvements: filtered_30Ke (+0.71%), filtered_40Ke (+0.72%), and filtered_50Ke (+1.7%). The F1 score improved for all augmented models: 30Ke (+0.46%), 40Ke (+0.99%), 50Ke (+1.45%), filtered_30Ke (+1.83%), filtered_40Ke (+1.86%), and filtered_50Ke (+3.23%). Recall likewise improved across the board: 30Ke (+0.9%), 40Ke (+2.8%), 50Ke (+2.8%), filtered_30Ke (+4.31%), filtered_40Ke (+3.03%), and filtered_50Ke (+4.43%). Precision decreased for 40Ke (−0.9%) and filtered_30Ke (−0.71%) but improved for 50Ke (+0.72%), filtered_40Ke (+0.63%), and filtered_50Ke (+1.96%). The AUC exceeded 99.5% for all models, with all differences relative to the baseline within approximately 0.1% (30Ke: −0.13%, 40Ke: −0.13%, 50Ke: −0.05%, filtered_30Ke: −0.01%, filtered_40Ke: −0.08%, filtered_50Ke: +0.05%). These results, shown in Table 1B, indicate that augmenting the dataset with unfiltered generated images benefits Recall and the F1 score while its impact on Accuracy and Precision is variable, whereas augmenting with the filtered image datasets yields consistent improvements across Accuracy, Precision, Recall, and F1 score.
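For reference, the reported Accuracy, Precision, Recall, and F1 values follow the standard confusion-matrix definitions with defects as the positive class; the sketch below is a generic implementation, not the evaluation code used in the study:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with the defect class
    labeled 1 (positive) and the normal class labeled 0 (negative)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

# Toy example: one missed defect (FN) and one false alarm (FP).
print(classification_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Because defects are the minority class, Recall and F1 react to a single missed defect far more strongly than Accuracy does, which is why those two metrics are emphasized in Table 1.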

4. Experimental Results and Discussion

4.1. Defect-Area Sensitivity Analysis

The experimental results demonstrate that the relative size of defects within CD-SEM images critically influences CycleGAN image generation performance. When the ratio of defect area to the whole image is too small, the generator struggles to capture the structural details needed for new image generation. In the examined sample (Figure 2), the defect occupies approximately 5 × 10 pixels, corresponding to 0.0048% of the total area of a 1024 × 1024 image. This extremely low signal-to-noise ratio limits the model’s ability to learn defect features effectively.
Cropping the original image to 512 × 512 pixels while preserving the defect increases the defect-area ratio to 0.019%, and further cropping to 256 × 256 pixels raises it to 0.076%. Our empirical results indicate that as this ratio increases, the noise metric of the generated images remains low and stable. This finding implies that a defect-size sensitivity test should be conducted before CycleGAN training to determine the optimal input resolution; in this study, an area ratio of around 0.076% yielded the most stable and realistic synthetic defects.
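The quoted ratios follow directly from the fixed defect footprint of roughly 5 × 10 = 50 pixels; a quick check:

```python
def defect_area_ratio(defect_px, image_side):
    """Fraction of a square image occupied by a defect, in percent."""
    return 100.0 * defect_px / (image_side ** 2)

# The same 5 x 10 px line break occupies a growing share as the crop shrinks.
for side in (1024, 512, 256):
    print(side, round(defect_area_ratio(50, side), 4))
# 1024 -> 0.0048, 512 -> 0.0191, 256 -> 0.0763 (percent)
```

Each halving of the side length quadruples the ratio, which is why the 256 × 256 crop lifts the defect signal by a factor of 16 over the full frame.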
However, every dataset contains images with different defect-size ratios, and the optimal input size for the CycleGAN generator may differ across patterns. Selecting a single image size that targets a specific defect-area ratio can leave some samples under-learned and others over-learned, because their actual ratios deviate from the target. Consequently, fixed image resizing may lead to inconsistent generation quality. A CycleGAN architecture with greater tolerance to the defect-area-ratio distribution could provide more robust CD-SEM image synthesis.

4.2. Evolution of Generated Image Quality and Architectural Comparison

The quality of CycleGAN-generated CD-SEM defect images gradually improves with increasing training epochs. However, the success rate differs across layout styles; in other words, not all non-defect images can be successfully converted into defective images. Because the generative process is inherently stochastic, post-generation quality evaluation is required to ensure meaningful output. CD-SEM defects typically occur locally, so high-quality generated images should preserve most of the original structure while introducing only local changes, such as bridging or break defects. SSIM is well suited to capturing such pixel-level deviations. Our results indicate that an SSIM value of around 0.5 often correlated with the generation of appropriate defect patterns. SSIM is not a perfect indicator, however: it measures all pixel deviations, including undesirable features and blurring, not only the intended defect. At the same training epoch, we observed a wide SSIM range (e.g., 0.35–0.45), confirming the coexistence of under-learning and over-learning. Furthermore, an SSIM value below 0.3 was a strong indicator of catastrophic generation failure, producing heavily blurred, unusable images. For certain patterns, prolonged training caused SSIM to drop abruptly below 0.3, implying excessive structural distortion; visual inspection confirmed severe blurring and poor fidelity in these samples. Generated images with SSIM < 0.3 are therefore unlikely to be acceptable defect representations. By evaluating downstream classifier performance across multiple epoch settings (30k, 40k, and 50k) rather than at a single SSIM point, this study provides a more transparent and reproducible framework that explicitly demonstrates how synthetic images influence classifier performance, acknowledging that optimal quality exists as a functional range tailored to specific process windows.
Accordingly, SSIM complements visual inspection as a reliable quantitative measure of generated-image quality, and a narrower SSIM distribution indicates more synchronized generation behavior across layout types. To investigate the effect of architectural choice, the CycleGAN backbone was switched from ResNet to U-Net. Quantitative comparison shows that, at the same number of epochs, CycleGAN + U-Net yields a higher average SSIM than CycleGAN + ResNet, reflecting outputs that remain close to the defect-free source; this implies slower convergence toward effective defect synthesis and greater variability in defect quality for the U-Net configuration. Within the current CD-SEM dataset, SSIM analysis consistently indicates that CycleGAN with a ResNet generator produces more stable and higher-quality defect synthesis than the U-Net variant.

4.3. Model-Performance Evaluation

The model’s Accuracy represents the overall correctness of its predictions. Among the six models trained with different CycleGAN epochs and filtering conditions, the 30Ke model exhibited lower accuracy than the baseline; the 40Ke model showed partial recovery but remained slightly below the baseline, and the 50Ke model roughly matched it. This suggests that unfiltered synthetic images can negatively affect both true-positive (TP) and true-negative (TN) classifications. However, since Recall at 30Ke and 40Ke exceeded the baseline, the accuracy degradation was primarily attributable to reduced TNs rather than TPs: low-quality unfiltered images introduce noise that hampers the model’s ability to correctly recognize non-defective images. A positive correlation was nevertheless observed between accuracy and CycleGAN training duration, and removing low-quality generated images yielded significant improvements, with the filtered_50Ke model achieving the highest overall accuracy.
The model’s Precision did not improve consistently with increasing training epochs, but it improved markedly after the low-quality generated images were removed in filtered_40Ke and filtered_50Ke. This improvement is attributed to a reduction in false positives (FPs), i.e., fewer non-defective images misclassified as defective. The filtered datasets contained a higher proportion of accurately generated defect features, which helped the model refine its decision boundaries and avoid false alarms on defect-free images.
Recall, which measures the model’s ability to correctly identify defective patterns, improved steadily from 30Ke to 50Ke as false negatives (FNs) decreased with longer CycleGAN training. After low-quality synthetic samples were removed, recall increased further, indicating that high-quality generated defect images positively influence defect detection sensitivity.
Given that the baseline dataset poses a binary classification problem with relatively few defective samples, the F1 score serves as a balanced indicator of model performance under class imbalance. Its trend closely followed recall, increasing progressively from 30Ke through 50Ke and reaching its highest value of 99.4% with filtered_50Ke. These results, summarized in Figure 9, demonstrate that although the original dataset was imbalanced, augmenting defective CD-SEM images via CycleGAN successfully mitigated this bias, thereby enhancing the model’s harmonic-mean performance.

4.4. Why CycleGAN Is Not a Conventional Data Augmentation Baseline

It is worth clarifying that the role of CycleGAN in this study is fundamentally different from that of conventional data augmentation techniques commonly used in computer vision tasks, such as geometric transformations or pixel-level mixing strategies. Classical augmentations, including flipping and rotation, primarily increase sample diversity through visual perturbations of existing images, without introducing new defect realizations.
In contrast, the proposed CycleGAN framework aims to enrich the defective-image distribution by generating non-identical but physically plausible defect instances that remain consistent with the same CD-SEM imaging process and lithographic conditions. From a manufacturing perspective, this process is conceptually analogous to acquiring additional SEM samples under an identical process window, rather than synthetically modifying a single observation.
This distinction is particularly important in semiconductor defect inspection, where defect patterns are governed by complex interactions between the layout, process variations, and imaging physics. Mixing-based augmentation strategies, such as copy-paste or CutMix, may inadvertently violate these physical constraints and generate unrealistic pattern combinations that do not correspond to manufacturable defects. Therefore, CycleGAN is treated in this work as a distribution-consistent sample enrichment mechanism, rather than a visual augmentation baseline.
Conventional augmentation techniques are still applied at the classifier-training stage to improve generalization, as described in Section 2.4.1; however, their role is complementary and orthogonal to the objective of addressing defect data scarcity through generative modeling.

5. Conclusions

This study focuses on improving ML model accuracy by mitigating the scarcity of defective CD-SEM images. We propose a CycleGAN-based method to generate synthetic defect images and present a systematic analysis of how the defect-area ratio and training epochs affect the quality of the generated images and, in turn, classifier performance. The results demonstrate that CycleGAN can effectively produce high-quality defective images and improve the accuracy and robustness of subsequent classification models.
When the defect-area ratio is small, the signal-to-noise ratio is low, and it is hard for the CycleGAN model to capture critical defect features. We demonstrate that cropping images to appropriate dimensions can significantly enhance both the stability and the quality of the synthetic images. These findings highlight the importance of analyzing defect-size sensitivity before CycleGAN training to ensure optimal input conditions for model learning. Experimental evaluations confirmed that while high-quality defect images can be generated, their quality is strongly influenced by training epochs and image morphology. SSIM proved to be an effective indicator for evaluating the quality of synthetic CD-SEM images, enabling a reliable SSIM range that excludes under-trained or over-trained images, which would otherwise introduce blurring or distortion. In addition, a comparison of generator architectures showed that the ResNet-based CycleGAN outperformed the U-Net variant in both generation speed and consistency, indicating ResNet’s superior capability in capturing the fine structural features of defects.
Classification experiments further verified that including high-quality CycleGAN-generated images effectively enhances defect classifiers. Accuracy and precision improved with longer CycleGAN training and with the image filtering strategy. When the model was trained on the baseline dataset augmented with filtered images generated at 50k epochs, recall and the F1 score exceeded the baseline by 4.43% and 3.23%, respectively.
Beyond the reported experimental results, an important practical consideration concerns the generalizability and industrial applicability of the proposed framework. Although the experimental dataset in this study was collected from a single technology node (22 nm), the proposed method is not inherently limited to a specific process generation. In advanced semiconductor manufacturing, circuit layouts typically evolve through direct scaling from mature technology nodes and are constructed from a finite set of standard-cell libraries. As a result, the fundamental layout topologies and dominant defect morphologies, such as line breaks, bridging, and local scumming, exhibit strong structural similarity across technology nodes.
From this perspective, the CycleGAN-based defect synthesis framework should be interpreted as a process-aware data enrichment strategy, rather than a node-specific classification solution. By learning defect distributions that are consistent with lithographic and SEM imaging physics, the generated defective images can be reused or rescaled to support early-stage defect modeling in more advanced nodes. Such a strategy enables datasets collected from mature processes to serve as prior knowledge, thereby accelerating defect classifier development and reducing data collection and manual inspection overhead during technology ramp-up.
In practical industrial scenarios, this capability is particularly valuable for emerging technology nodes, where defect data are scarce and rapid model deployment is required. The proposed approach provides a scalable pathway to bootstrap defect detection models using limited data, while maintaining physical plausibility and process consistency. Future work will focus on validating this transferability across multi-node and multi-tool datasets to further strengthen the robustness and applicability of the proposed framework in high-volume manufacturing environments.
In addition to cross-node generalization, another important aspect of robustness concerns validation across multiple datasets. Cross-dataset evaluation is indeed a valuable direction for further assessing the generalization ability of defect detection models. However, in semiconductor manufacturing, access to large-scale, multi-source CD-SEM image datasets is often constrained by data confidentiality and intellectual property (IP) considerations, as inspection images may directly reveal proprietary circuit layouts belonging to foundry customers.
As a result, the experiments in this study are conducted on a carefully curated single dataset to enable a controlled analysis of defect generation behavior, image quality, and classifier performance. We note that this limitation is not unique to the present work but is a common challenge in semiconductor defect inspection research. To address this, we are actively exploring opportunities to acquire additional real-chip inspection image databases under appropriate data-sharing agreements, with the goal of further validating the proposed framework across diverse datasets.
Looking forward, the proposed CycleGAN-based approach is well suited for extension to cross-dataset scenarios through domain-adaptive retraining or incremental calibration, as it explicitly models defect distributions under lithographic and SEM imaging constraints. Future validation on multi-dataset and multi-source inspection data will further strengthen the robustness and reliability of the proposed framework in realistic production environments.
This research not only confirms the critical interaction among the defect-area ratio, model architecture, and image quality in CycleGAN generative modeling for CD-SEM applications, but also establishes a complete workflow for defective-image inspection in the semiconductor industry. These findings provide a solid foundation for future high-precision defect detection and image augmentation technologies.

Author Contributions

Conceptualization, A.Y.; Methodology, A.Y.; Validation, N.C., J.C.; Writing—original draft, A.Y.; Writing—review & editing, N.C., J.C. and L.C.; Supervision, E.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

  27. Chen, Y.-H.; Ju, Y.-J.; Huang, J.-D. Capture the devil in the details via partition-then-ensemble on higher resolution images. In Diabetic Foot Ulcers Grand Challenge; Springer: Cham, Switzerland, 2022; pp. 52–64. [Google Scholar]
  28. Li, J.; Tao, R.; Li, S.; Li, Y.; Huang, X. Sample-imbalanced wafer map defects classification based on Jacobian regularized generative adversarial network. Meas. Sci. Technol. 2025, 36, 036112. [Google Scholar] [CrossRef]
  29. Yi, Z.; Zhang, H.; Tan, P.; Gong, M. Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
  30. Wang, B.; Zhu, Y.; Chen, L.; Liu, J.; Sun, L.; Childs, P. A study of the evaluation metrics for generative images containing combinational creativity. Artif. Intell. Eng. Des. Anal. Manuf. 2023, 37, e11. [Google Scholar] [CrossRef]
  31. Sheikh, H.R.; Bovik, A.C. Image information and visual quality. IEEE Trans. Image Process. 2006, 15, 430–444. [Google Scholar] [CrossRef]
  32. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  33. Borji, A. Pros and cons of GAN evaluation measures: New developments. Comput. Vis. Image Underst. 2022, 215, 103329. [Google Scholar] [CrossRef]
  34. Rožanec, J.M.; Zajec, P.; Theodoropoulos, S.; Koehorst, E.; Fortuna, B.; Mladenić, D. Synthetic data augmentation using GAN for improved automated visual inspection. IFAC-PapersOnLine 2023, 56, 11094–11099. [Google Scholar] [CrossRef]
  35. Ye, J.; Xue, Y.; Long, L.R.; Antani, S.; Xue, Z.; Cheng, K.C.; Huang, X. Synthetic sample selection via reinforcement learning. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; Springer: Cham, Switzerland, 2020. [Google Scholar]
  36. Xue, Y.; Ye, J.; Zhou, Q.; Long, L.R.; Antani, S.; Xue, Z.; Cornwell, C.; Zaino, R.; Cheng, K.C.; Huang, X. Selective synthetic augmentation with HistoGAN for improved histopathology image classification. Med. Image Anal. 2021, 67, 101816. [Google Scholar] [CrossRef]
Figure 1. CycleGAN architecture for unpaired translation between normal and defective CD-SEM images.
Figure 2. Example CD-SEM image (1024 × 1024) with a tiny line-break defect and the corresponding 512 × 512 and 256 × 256 cropped regions used as CycleGAN inputs.
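The cropping step illustrated in Figure 2 can be sketched as follows. This is a minimal, hypothetical implementation: the paper states only that 512 × 512 and 256 × 256 regions were cropped from the 1024 × 1024 micrographs; the centring-on-defect policy and the `crop_around` helper are assumptions made here for illustration. Each halving of the side length raises the effective defect-area ratio by roughly 4×.

```python
import numpy as np

def crop_around(img, cy, cx, size):
    """Crop a size x size window roughly centred on (cy, cx),
    clamped so the window stays inside the image bounds.
    NOTE: illustrative sketch only; the paper's exact cropping
    procedure is not specified in detail."""
    h, w = img.shape[:2]
    half = size // 2
    y0 = min(max(cy - half, 0), h - size)
    x0 = min(max(cx - half, 0), w - size)
    return img[y0:y0 + size, x0:x0 + size]

# Example: a 1024 x 1024 micrograph with a (hypothetical) defect at (600, 300)
full = np.zeros((1024, 1024), dtype=np.uint8)
crops = [crop_around(full, 600, 300, s) for s in (512, 256)]
```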
Figure 3. Evolution of CycleGAN-generated CD-SEM images with training epochs for three input sizes (1×, 1/4×, and 1/16×).
Figure 4. MAD-based noise index versus training epochs for three input sizes (1×, 1/4×, and 1/16×).
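A noise index of the kind tracked in Figure 4 can be sketched with the median absolute deviation (MAD) of a high-frequency residual. The exact definition used in the paper is not reproduced here; treating the horizontal first-difference of the image as the residual is an assumption for illustration. The MAD is robust to the line-edge structure that dominates CD-SEM patterns, so a rising index flags generator outputs drifting toward noisy, low-fidelity textures.

```python
import numpy as np

def mad_noise_index(img):
    """MAD of the horizontal first-difference residual of a
    grayscale image, used as a robust noise proxy.
    NOTE: one plausible MAD-based metric; the paper's exact
    formulation is an assumption here."""
    img = np.asarray(img, dtype=np.float64)
    resid = np.diff(img, axis=1)          # high-frequency residual
    med = np.median(resid)
    return float(np.median(np.abs(resid - med)))
```

A clean, smoothly varying pattern yields an index near zero, while added pixel noise raises it, which is the behaviour a screening threshold would exploit.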
Figure 5. Representative examples of three CycleGAN generation outcomes (Type-1 to Type-3) across training epochs.
Figure 6. SSIM versus training epochs for the three pattern types, indicating different rates of structural degradation.
Figure 7. SSIM distributions at 50k training epochs for CycleGAN with ResNet and U-Net generators (similarity to original image).
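The SSIM-based screening behind Figures 6 and 7 can be illustrated with a simplified, single-window SSIM (the windowed formulation of Wang et al. [32] is standard; the global variant below and the `lo`/`hi` thresholds in `keep_sample` are assumptions made for this sketch, not the paper's exact protocol).

```python
import numpy as np

# SSIM stability constants for 8-bit images (Wang et al., 2004)
C1, C2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2

def global_ssim(x, y):
    """Single-window (global) SSIM between two grayscale images,
    a simplification of the usual windowed SSIM."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + C1) * (2 * cov + C2)
    den = (mx**2 + my**2 + C1) * (x.var() + y.var() + C2)
    return float(num / den)

def keep_sample(orig, gen, lo=0.5, hi=0.95):
    """Hypothetical screening rule: keep generated images that stay
    structurally close to the source pattern but are not near-copies.
    Thresholds are illustrative, not taken from the paper."""
    return lo <= global_ssim(orig, gen) <= hi
```

Under this rule, both collapsed outputs (very low SSIM) and identity-mapped outputs that fail to inject a defect (SSIM near 1) would be excluded from the augmentation pool.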
Figure 8. Visual comparison of original CD-SEM patterns and images generated by ResNet- and U-Net-based CycleGANs after 50k epochs.
Figure 9. Classification performance ((a) Accuracy, (b) Precision, (c) Recall, (d) F1 score, and (e) AUC) of the baseline Xception model and CycleGAN-augmented models with and without filtered synthetic defects.
Table 1. (A) Test-set classification performance (Accuracy, Precision, Recall, F1, and AUC) of the baseline and CycleGAN-augmented models; (B) corresponding metric differences relative to the baseline, in percentage points.
(A) Model performance

Model                      Accuracy   Precision   Recall    F1        AUC
baseline                   97.83%     98.04%      94.34%    96.15%    99.88%
baseline + 30Ke            97.06%     98.04%      95.24%    96.62%    99.75%
baseline + 40Ke            97.46%     97.14%      97.14%    97.14%    99.75%
baseline + 50Ke            97.88%     98.08%      97.14%    97.61%    99.83%
baseline + filtered_30Ke   98.54%     97.33%      98.65%    97.99%    99.87%
baseline + filtered_40Ke   98.55%     98.67%      97.37%    98.01%    99.81%
baseline + filtered_50Ke   99.53%     100.00%     98.77%    99.38%    99.94%

(B) Difference from baseline

Model                      Accuracy   Precision   Recall    F1        AUC
baseline                   0.00%      0.00%       0.00%     0.00%     0.00%
baseline + 30Ke            −0.77%     0.00%       0.90%     0.46%     −0.13%
baseline + 40Ke            −0.37%     −0.90%      2.80%     0.99%     −0.13%
baseline + 50Ke            0.06%      0.04%       2.80%     1.45%     −0.05%
baseline + filtered_30Ke   0.71%      −0.71%      4.31%     1.83%     −0.01%
baseline + filtered_40Ke   0.72%      0.63%       3.03%     1.86%     −0.08%
baseline + filtered_50Ke   1.70%      1.96%       4.43%     3.23%     0.05%
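The (B) rows of Table 1 are simple percentage-point subtractions of the (A) rows from the baseline row, which the snippet below reproduces for the best model (dictionary names are ours, values are from the table; tiny mismatches such as the published AUC delta of 0.05 vs. a recomputed 0.06 stem from rounding of the underlying unrounded scores).

```python
# Table 1(A) values for the baseline and the filtered 50k-epoch model
baseline = {"Accuracy": 97.83, "Precision": 98.04,
            "Recall": 94.34, "F1": 96.15, "AUC": 99.88}
filtered_50k = {"Accuracy": 99.53, "Precision": 100.00,
                "Recall": 98.77, "F1": 99.38, "AUC": 99.94}

# Table 1(B): percentage-point differences relative to the baseline
delta = {k: round(filtered_50k[k] - baseline[k], 2) for k in baseline}
```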

Share and Cite

MDPI and ACS Style

Yen, A.; Chang, N.; Chien, J.; Chuang, L.; Lee, E. CycleGAN-Based Data Augmentation for Scanning Electron Microscope Images to Enhance Integrated Circuit Manufacturing Defect Classification. Electronics 2026, 15, 803. https://doi.org/10.3390/electronics15040803

