Integrating Local and Global Features for Wafer Defect Pattern Classification via Sequential Hybrid Architecture

Song, Jaeho; Oh, Seungmin; Noh, Juhyeon; Hahn, Minsoo; Kim, Jinsul

doi:10.3390/pr14071134

by Jaeho Song ¹, Seungmin Oh ¹, Juhyeon Noh ¹, Minsoo Hahn ² and Jinsul Kim ¹,*

¹ Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea

² Department of Computational and Data Science, Astana IT University, Astana 010000, Kazakhstan

* Author to whom correspondence should be addressed.

Processes 2026, 14(7), 1134; https://doi.org/10.3390/pr14071134

Submission received: 28 February 2026 / Revised: 27 March 2026 / Accepted: 27 March 2026 / Published: 31 March 2026

(This article belongs to the Special Issue Machine Learning, Control, and Optimization in Manufacturing and Industry 4.0)

Abstract

Wafer map defect pattern classification supports quality monitoring in semiconductor manufacturing, but public benchmark datasets such as WM-811K exhibit extreme class imbalance, where majority classes can dominate standard metrics. This study aims to improve minority class performance while maintaining inference efficiency. Building on an iFormer-based hybrid backbone, we propose the Pattern-Selective Sequential Hybrid Network (PSS-HNet), which redesigns attention blocks to sequentially integrate local interaction (Modulated Convolution) and global interaction (Modulated Axial Attention) and applies sigmoid-based gating to control contextual information injection. Experiments on WM-811K (9 classes) compare iFormer (baseline), Axial-only, Axial+Modulation, and PSS-HNet using macro-averaged metrics as primary indicators, along with class-wise analysis and efficiency evaluation. PSS-HNet improves Macro-Recall by 1.02 percentage points (from 0.8852 to 0.8954) and Macro-F1 by 0.54 percentage points (from 0.9044 to 0.9098) over the baseline while maintaining similar accuracy. It also reduces computational cost, model size, and inference latency to 0.754 GFLOPs, 4.381 M parameters, and 7.682 ms, respectively, compared with 1.103 GFLOPs, 6.245 M parameters, and 8.666 ms for the baseline. Overall, selective sequential local–global integration provides a favorable balance between minority class performance and efficiency.

Keywords: Axial Attention; class imbalance; deep learning; hybrid convolution–transformer; lightweight model; modulation; wafer defect pattern classification; wafer map; WM-811K

1. Introduction

As semiconductor manufacturing processes become increasingly advanced, the ability to diagnose defects rapidly and consistently becomes crucial for maintaining stable yield and product quality [1]. A wafer map is a spatial summary of defect distributions on a wafer, and it has been used to intuitively assess process conditions and detect anomalies via recurring defect patterns [2,3]. In manufacturing environments, defect characteristics are affected by process conditions, equipment states, and process steps. Accordingly, even defects belonging to the same class may show substantial variation in morphology and severity. As a result, manual interpretation and rule-based screening depend heavily on operator expertise and experience. They also impose substantial time and cost burdens and make it difficult to maintain consistent decision criteria. Moreover, as production scales up, fully manual inspection has inherent limitations. This highlights the need for automated wafer defect pattern classification methods that can be reliably deployed in practice [4].

This study addresses the wafer defect pattern classification problem using WM-811K, a public wafer map dataset. Public benchmark datasets provide an important starting point by improving reproducibility and comparability across studies; however, wafer map classification entails multiple challenges that closely reflect real-world settings. For example, class distributions are often imbalanced: while some patterns provide sufficient learning signals, minority patterns may be underrepresented during training [5,6]. In addition, defect patterns can be distinguished not only by local morphological characteristics but also by their spatial arrangement across the wafer and global structures [7], and noisy artifacts introduced during measurement can confound pattern recognition [8]. Therefore, a wafer map classification model must both discriminate local patterns precisely and capture global structures; for practical deployment, computational efficiency and robustness must also be weighed alongside predictive performance.

In prior studies on wafer map classification, Convolutional Neural Network (CNN)-based approaches have been widely adopted [9,10,11]. CNNs can effectively extract local morphological features, yielding strong baseline performance and providing representation learning that is well suited to the visual patterns in wafer maps. However, CNN-based models may have limited mechanisms for learning global structural relationships, and their representational capacity can be constrained when defect patterns are differentiated by wafer-wide spatial configurations or long-range dependencies. In contrast, Transformer-based models are strong at modeling global interactions and can directly capture correlations across the entire wafer region [12,13,14]. Nevertheless, they generally incur high computational cost, and additional considerations are required for training efficiency and generalization in settings with limited data or imbalanced class distributions. Accordingly, recent work has highlighted hybrid designs that combine local representations (convolution-based) and global interactions (global-attention-based) to achieve both performance and efficiency [15,16,17]. Even so, naive integration can lead to excessive information mixing or unnecessary computation; without clear design principles regarding what information to transmit, at which stage, and to what extent, it is difficult to achieve a balanced trade-off between predictive performance and computational efficiency.

Building on the hybrid design of iFormer [18], this study proposes the Pattern-Selective Sequential Hybrid Network (PSS-HNet), which improves the integration strategy to better suit wafer map classification. The key idea of PSS-HNet is an integration strategy that, rather than unconditionally merging local and global information, constructs representations sequentially and selectively transfers only the necessary information. Specifically, it first forms a base representation by reliably extracting local morphological cues and boundary characteristics of defects and then reinforces structural relationships from a wafer-wide perspective via global-level interactions. Finally, instead of assuming that globally enhanced information is always beneficial, PSS-HNet forwards only meaningful information to the next stage through a selective transfer mechanism, thereby mitigating interference that may arise from excessive mixing. This perspective can be particularly effective for wafer map classification, where (1) classes that rely primarily on local cues and (2) classes that depend on global structures coexist, and it aims not only to improve performance but also to reduce unnecessary computation to secure efficiency.

The contributions of this paper are as follows. First, we present PSS-HNet, a deployment-oriented design for wafer defect pattern classification that sequentially integrates local information and global interactions and incorporates selective transfer to account for efficiency. Second, we experimentally substantiate the validity of the proposed integration strategy by systematically evaluating its effectiveness on WM-811K using diverse baselines and ablation variants. Third, beyond classification metrics, we provide a comparative analysis from the perspective of model efficiency to discuss practical deployability and associated trade-offs. The remainder of this paper is organized as follows. Section 2 reviews related work, Section 3 describes the proposed method, and Section 4 presents the experimental setup. Section 5 discusses the experimental results and analyses and Section 6 concludes the paper and outlines directions for future work.

2. Related Works

2.1. Wafer Map Defect Pattern Classification and Benchmark Datasets

Wafer map-based defect pattern classification is the task of automatically identifying defect types (patterns) from inputs that represent the spatial distribution of defects on a wafer in an image-like form [19].

In wafer defect pattern classification research, public benchmarks play an important role because access to in-fab process data is limited; they enable model-to-model comparisons and reproducible experiments. This study conducts wafer defect pattern classification using the public WM-811K dataset, a widely used benchmark for wafer map-based defect pattern classification that is useful for comparing models’ pattern recognition capability because it includes a variety of defect pattern classes [20].

However, because wafer map classification has domain characteristics that differ from general natural-image classification, stable performance may not be achieved simply by increasing a model’s representational capacity. First, class distributions are often non-uniform, and training that is biased toward majority classes can degrade recognition performance for minority patterns [21]. Second, defect patterns may be distinguished not only by local morphological cues but also by wafer-wide spatial configurations, requiring representation learning that can jointly capture local and global information [22]. Third, noisy artifacts introduced during measurement and variations in defect severity can induce pattern variations even within the same class, which adds further difficulty for model generalization [23]. Therefore, metrics that account for imbalance and class-wise analyses are necessary.

2.2. CNN-Based and Lightweight Models for Wafer Map Classification

The mainstream approach to wafer defect pattern classification has formalized the task as an image classification problem and evolved toward learning morphological features using CNN-based models. CNNs, leveraging local receptive fields, are effective at hierarchically extracting morphological cues such as linear, ring-shaped, and clustered structures, and have therefore served as strong baselines for wafer map classification. Prior studies have demonstrated the effectiveness of deep learning-based classification on WM-811K by comparing a range of CNN architectures [24].

For deployment in manufacturing settings, efficiency that enables operation under constrained computational resources and response-time requirements is important in addition to accuracy. Reflecting this need, studies on wafer map classification have compared lightweight CNN models not only in terms of performance but also with respect to resource consumption and processing time [25,26]. For example, by evaluating lightweight models (e.g., EfficientNetV2, ShuffleNetV2, and MobileNetV2/V3) from both performance and efficiency perspectives, prior work reported that some lightweight architectures can achieve performance comparable to larger backbones while operating at a lower computational cost [27].

However, convolution-based models are inherently based on local receptive fields; therefore, when defect patterns are distinguished by wafer-wide spatial configurations or long-range correlations, global context may not be sufficiently captured. Moreover, naively increasing model capacity to enhance global context can raise computational cost and inference latency, thereby exacerbating deployment constraints in practice. Accordingly, wafer map classification requires modeling strategies that preserve the advantages of local representations while efficiently integrating global context [28].

2.3. Vision Transformers for Global Context Modeling

On wafer maps, defect patterns are not limited to locally clustered shapes such as Center and Donut; in some cases, continuity and directionality between spatially distant locations, as in wafer-spanning linear patterns (Scratch), serve as key cues for classification [12]. CNN-based approaches, which excel at local receptive-field modeling, are efficient for locally clustered shapes; however, it has been repeatedly noted that adequately capturing long-range dependencies often requires excessively deep networks or additional global interaction modules [29].

A representative line of research that directly targets global-context learning is the Vision Transformer (ViT). The ViT partitions an image into patch tokens and learns global interactions via self-attention, offering the advantage of directly modeling relationships between patterns that are spatially far apart. However, standard self-attention incurs rapidly increasing computational and memory costs with the number of tokens (quadratic complexity), making it difficult to apply directly in settings with high-resolution inputs or strict inference-latency constraints [30].

To alleviate these cost issues, approaches have been proposed that restrict the attention scope or construct hierarchical representations. For example, the Swin Transformer introduces a window-based attention mechanism that reduces computational burden while building hierarchical features [31]. Nevertheless, additional costs and design trade-offs have also been discussed, arising from window partitioning and reconstruction, inter-window information exchange, and memory access patterns [14].

Another direction is Axial Attention, which factorizes two-dimensional global interactions along axes [32]. By applying attention along each axis independently, Axial Attention can preserve a global receptive field while simplifying the computational structure and has been discussed as an alternative for reducing the cost burden of standard self-attention. In particular, in domains such as wafer maps where the background occupies a large portion and defect signals are sparse, uniformly diffusing global context across all locations may dilute defect cues; moreover, for structures with directionality and continuity, such as linear patterns, axis-based interactions may provide a useful inductive bias [33].
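As a rough illustration of this cost argument, the sketch below counts query-key pairs for full self-attention and for attention factorized along the two axes. The 14 × 14 feature-map size is a hypothetical value chosen for illustration, not a setting reported in this paper, and the count ignores projections, channel width, and implementation overheads.

```python
def full_attention_pairs(height, width):
    """Query-key pairs when every position attends to every position."""
    tokens = height * width
    return tokens * tokens


def axial_attention_pairs(height, width):
    """Query-key pairs when each position attends only along its row and its column."""
    tokens = height * width
    return tokens * (height + width)


# Hypothetical 14 x 14 feature map (an assumption for illustration only).
H, W = 14, 14
full = full_attention_pairs(H, W)
axial = axial_attention_pairs(H, W)
print(full, axial, full / axial)
```

Under this simplified count, a square map of side n gives a ratio of n/2, so the relative saving grows with resolution.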

2.4. Hybrid Architectures Combining Convolution and Global Interaction

In wafer map classification, both local morphological cues (local patterns) and wafer-wide spatial configurations (global context) are often important simultaneously. CNNs are strong at representing local patterns but may be limited in directly handling global interactions; conversely, Transformer-based models excel at global context modeling but can face practical constraints due to computational cost and memory overhead [34]. To mitigate this trade-off, hybrid architectures that combine convolution-based local representations with global interactions have been widely explored [35].

Hybrid architectures focus not on simply combining local representations and global context in parallel but on achieving both efficiency and representational power through stage-wise functional division. Typically, convolution-based processing is used in early stages to reliably extract local features, and global interactions are then applied in later stages to reinforce long-range relationships. In addition, strategies for controlling the integration strength have been discussed to avoid excessive diffusion of global information [36].

The iFormer has been discussed as a representative approach that systematizes such hybrid architectures for mobile and lightweight deployment settings [18]. iFormer presents a design that performs local processing efficiently on high-resolution inputs and introduces global interactions in later stages to enhance representational capacity, emphasizing a design philosophy that balances local processing and global interactions under inference-latency and resource constraints.

In the wafer defect classification domain specifically, recent work has explored hybrid designs that reflect similar principles. DefectTrackNet proposes a lightweight CNN–Transformer architecture for semiconductor defect analysis, combining parallel local and global feature extraction to achieve efficient defect retrieval under practical manufacturing constraints [15]. More broadly, hybrid Transformer designs have demonstrated that jointly leveraging local convolutional representations and global attention can reduce model parameters and computation while maintaining strong task performance [17]. These efforts collectively highlight that the key challenge lies not in combining local and global modules per se but in controlling how and to what extent global context is integrated, a design consideration that directly motivates the proposed PSS-HNet.

However, because wafer maps often contain a large background region and sparse defect signals, defect cues may be either amplified or, conversely, diluted depending on how the global context is integrated. Therefore, when applying hybrid architectures, it is necessary to jointly consider the integration scheme of global interactions and the strength of information transfer [2].

2.5. Efficient Global Interaction Mechanisms for Lightweight Vision Models

Transformer-based models that directly model global context are strong at capturing long-range dependencies; however, standard self-attention incurs substantially increased computational and memory costs as the number of input tokens grows. Accordingly, in lightweight vision models, introducing global interactions while controlling their cost has been repeatedly identified as a key design challenge [37].

In particular, in domains with a large background region and sparse defect signals, excessive diffusion of global information can dilute discriminative cues; therefore, an integration strategy that selectively controls “how much” global context to incorporate can become important. Accordingly, selective transfer mechanisms such as sigmoid-based gating and weighted integration have been discussed not only from an efficiency perspective but also in terms of stabilizing performance.

3. Proposed Method

This section describes the overall design and key components of PSS-HNet, the model proposed for wafer defect pattern classification. It also shows how the proposed model redesigns blocks within an iFormer-based backbone to integrate local and global interactions.

3.1. Overview of PSS-HNet

This study builds on an iFormer-based hybrid backbone and proposes PSS-HNet, which redesigns attention blocks to better fit the wafer map domain. The core idea of PSS-HNet is to replace a subset of the original attention blocks with redesigned attention blocks and sequentially integrate local and global interactions within each block. Specifically, the local interaction uses Modulated Convolution to strengthen morphology-driven local cues, while the global interaction uses Modulated Axial to capture long-range relationships and wafer-wide spatial configurations. To avoid assuming that global context is always beneficial, we introduce a sigmoid-based gating branch at each interaction stage to adaptively modulate the strength of information transfer conditioned on the input. With this design, PSS-HNet aims to achieve both stable local representations and the complementary benefits of global context, while alleviating unnecessary mixing and computational overhead.

In addition, with practical deployment in manufacturing settings in mind, this study aims to consider not only classification performance but also inference efficiency.

Figure 1 summarizes the overall backbone architecture of PSS-HNet and the locations where the redesigned attention blocks are applied. Figure 2 illustrates the internal flow of the redesigned attention block, and Figures 3 and 4 present the structures of the local interaction (Modulated Convolution) and global interaction (Modulated Axial) units, respectively.

3.2. Overall Architecture and Block Organization

PSS-HNet is built upon an iFormer-based hybrid backbone, while replacing the original attention blocks with redesigned attention blocks to better suit wafer defect pattern classification. The overall backbone follows a hierarchical flow in which early stages process input features with an emphasis on local operations and later stages reinforce long-range relationships via global interactions. This configuration avoids the indiscriminate application of costly global operations across all stages by introducing them only at selected stages where needed.

Figure 1 summarizes the overall flow of PSS-HNet and the locations where the redesigned attention blocks are applied. By replacing the original attention blocks with redesigned attention blocks in selected segments of the backbone, PSS-HNet performs local interaction and global interaction sequentially within a single integrated unit. Figure 2 illustrates the internal flow of the redesigned attention block. Within the block, local interaction (Modulated Convolution) is first applied to strengthen local morphological features, and global interaction (Modulated Axial) is subsequently used to incorporate long-range relationships and wafer-wide spatial configuration information. After each interaction stage, a feed-forward-style module that performs channel-wise transformations is inserted to enhance representational capacity, and each module is equipped with residual connections to enable stable learning.
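The data flow described above can be sketched as a composition of four residual sub-modules. This is a structural sketch only: the callables `local_unit`, `ffn_local`, `global_unit`, and `ffn_global` are hypothetical stand-ins, features are flattened to a plain list, and normalization layers or other details of the actual block are not modeled.

```python
def add(a, b):
    """Element-wise residual addition of two equally sized feature lists."""
    return [x + y for x, y in zip(a, b)]


def redesigned_block(x, local_unit, ffn_local, global_unit, ffn_global):
    """Local interaction -> FFN -> global interaction -> FFN, each with a residual."""
    x = add(x, local_unit(x))   # stands in for Modulated Convolution
    x = add(x, ffn_local(x))    # channel-wise feed-forward-style module
    x = add(x, global_unit(x))  # stands in for Modulated Axial
    x = add(x, ffn_global(x))   # channel-wise feed-forward-style module
    return x


# Toy check: sub-modules that output zeros leave the input unchanged,
# which is the identity path provided by the residual connections.
zeros = lambda v: [0.0 for _ in v]
features = [1.0, 2.0, 3.0]
unchanged = redesigned_block(features, zeros, zeros, zeros, zeros)
print(unchanged)
```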

The key design principle of the redesigned attention block is to control information transfer strength rather than simply adding global context. PSS-HNet introduces a sigmoid-based gating branch to generate gating signals and uses these gates to adaptively control the extent to which context features extracted from local or global interactions are injected into the next-stage representation. Figure 3 details the local interaction unit (Modulated Convolution) and Figure 4 details the global interaction unit (Modulated Axial).

3.3. Modulated Convolution and Modulated Axial Units

This subsection explains the operating principles of the two interaction units that constitute the redesigned attention block, namely, local interaction (Modulated Convolution) and global interaction (Modulated Axial). Both units share a common structure comprising a context branch that generates context features and a gating branch that produces gating signals to modulate the extent to which the context features are incorporated. The input features are transformed via a projection, passed through a sigmoid function, and then split along the channel dimension into a context part and a gating part. The final output is obtained by applying the gate to the context features in an element-wise manner.

The modulation operation shared by the two units can be expressed as Equation (1).

$$x_{0} = \mathrm{ctx}(x) \odot g(x) \tag{1}$$

Here, $x_{0}$ denotes the output feature, $\mathrm{ctx}(x)$ is the context feature produced by the context branch, $g(x)$ is the gating signal generated by the gating branch, and $\odot$ denotes element-wise multiplication. The gating signal is defined using a sigmoid function as in Equation (2).

$$g(x) = \sigma(\varphi(x)) \tag{2}$$

Here, $\varphi(\cdot)$ denotes a projection (linear transformation) and $\sigma(\cdot)$ denotes a sigmoid activation function. In the implementation, $\sigma(\cdot)$ is applied to the output of $\varphi(\cdot)$, after which $\mathrm{ctx}(x)$ and $g(x)$ are formed by splitting along the channel dimension; Equations (1) and (2) summarize this modulation process in a functional form.
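A minimal sketch of the functional form in Equations (1) and (2) follows, for a single feature vector. It is an illustration under simplifying assumptions, not the authors' implementation: the context branch is passed in as a hypothetical callable `ctx_fn`, the gate uses its own projection rather than a shared projection followed by a channel split, and the weights are toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(x, weight, bias):
    """Projection phi(x) = W x + b for one feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def modulate(x, ctx_fn, gate_weight, gate_bias):
    """Equation (1): ctx(x) * g(x), with g(x) = sigmoid(phi(x)) as in Equation (2)."""
    context = ctx_fn(x)
    gate = [sigmoid(z) for z in linear(x, gate_weight, gate_bias)]
    return [c * g for c, g in zip(context, gate)]


# Toy usage with an identity context branch (assumption for illustration).
x = [1.0, -2.0]
zero_weight = [[0.0, 0.0], [0.0, 0.0]]
half_open = modulate(x, lambda v: v, zero_weight, [0.0, 0.0])        # gate is 0.5 everywhere
nearly_closed = modulate(x, lambda v: v, zero_weight, [-20.0, -20.0])  # gate is close to 0
print(half_open, nearly_closed)
```

Because the gate lies strictly between 0 and 1, it can attenuate the context feature but never amplify it, which matches the role of controlling how much context is injected.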

The context-branch operation $\mathrm{ctx}(x)$ is defined by different operators for local interaction and global interaction. The Modulated Convolution unit is responsible for local interaction and integrates local morphological cues with the surrounding local context through convolution-based operations. Because wafer map defect patterns often include features that are distinguishable in local regions (such as locally clustered shapes, boundary variations, and thin, elongated substructures), operations that effectively capture local context play an important role in classification performance. In this setting, local context features can be constructed using efficient convolutional operations such as depthwise separable convolution, and the resulting output corresponds to $\mathrm{ctx}(x)$ in Equation (1). Figure 3 presents the overall flow of the Modulated Convolution unit.

The Modulated Axial unit is responsible for global interaction and incorporates long-range relationships and wafer-wide spatial configuration information through axis-based global interactions. In wafer maps, certain defect patterns can be distinguished by wafer-wide directionality, continuity, or inter-region relationships, making global context an important cue. In this study, the global context feature is represented as $\mathrm{AxialAttention}(\cdot)$ and used as $\mathrm{ctx}(x)$ in Equation (1). Accordingly, the global context can be defined as in Equation (3).

$$\mathrm{ctx}(x) = \mathrm{AxialAttention}(x) \tag{3}$$

Here, $\mathrm{AxialAttention}(\cdot)$ denotes a global interaction operator that applies single-head attention along the height and width axes independently, aggregating the results via summation to capture global spatial dependencies across both dimensions. Figure 4 presents the overall flow of the Modulated Axial unit.
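The operator in Equation (3) can be sketched in a few lines: single-head attention is run along each row and along each column, and the two results are summed. The sketch assumes identity query, key, and value projections and standard scaled dot-product scoring; the learned projections and any positional terms of the actual unit are not modeled.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_1d(tokens):
    """Single-head self-attention over a 1-D sequence of feature vectors.

    Queries, keys, and values are the tokens themselves (assumption).
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(query, key)) for key in tokens])
        outputs.append([sum(w * value[c] for w, value in zip(weights, tokens)) for c in range(dim)])
    return outputs


def axial_attention(grid):
    """grid[i][j] is the feature vector at row i, column j."""
    height, width, dim = len(grid), len(grid[0]), len(grid[0][0])
    along_width = [attend_1d(grid[i]) for i in range(height)]
    # along_height[j][i] is the result for column j, row i.
    along_height = [attend_1d([grid[i][j] for i in range(height)]) for j in range(width)]
    return [
        [[along_width[i][j][c] + along_height[j][i][c] for c in range(dim)] for j in range(width)]
        for i in range(height)
    ]


# Toy 2 x 3 map with 2-channel features (hypothetical values).
grid = [[[1.0, 2.0] for _ in range(3)] for _ in range(2)]
out = axial_attention(grid)
print(out[0][0])
```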

In summary, the redesigned attention block in PSS-HNet sequentially combines local interaction (Modulated Convolution) and global interaction (Modulated Axial) and controls the context injection strength at both stages via sigmoid-based gating. Moreover, the context branch is designed to reflect both the local morphological cues and wafer-wide spatial configurations required for wafer map classification, using efficient convolutional operations for local modeling and Axial Attention for global modeling.

3.4. Design Rationale

Wafer defect pattern classification is a task in which both local morphological cues and wafer-wide spatial configurations contribute to classification [38], and defect signals are often sparse in inputs with dominant background regions [39]. Therefore, a model should reliably preserve local cues while being able to reinforce long-range relationships and structural cues through global context when needed. Moreover, if the global context is incorporated unconditionally, non-discriminative background information may diffuse as well, which can dilute defect cues and increase unnecessary information mixing; thus, the context integration scheme should be designed with care.

Reflecting these data characteristics, PSS-HNet sequentially combines local and global interaction. It first forms stable morphological defect cues through local interaction and then strengthens structural cues—such as directionality, continuity, and inter-region relationships—via global interaction, thereby integrating information across different scales in a stage-wise manner. Rather than injecting context features at a fixed strength, the sigmoid-based gating mechanism continuously adjusts the contribution of contextual information according to the input and intermediate representations, suppressing unnecessary information mixing and achieving a balance between preserving local cues and enhancing global context.

4. Experiments

4.1. Dataset (WM-811K)

This study evaluated wafer defect pattern classification performance using the public wafer map dataset WM-811K. WM-811K consists of nine classes: Center, Donut, Local, Edge-Local, Edge-Ring, Scratch, Random, Near-Full, and None. Figure 5 shows randomly sampled wafer map examples from each class [20].

WM-811K contains a total of 811,457 samples; to use only labeled data, we excluded 638,507 unlabeled samples. As a result, experiments were conducted using the remaining 172,950 samples.
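The sample counts above can be checked with a short calculation; the variable names are illustrative only.

```python
TOTAL_SAMPLES = 811_457
UNLABELED_SAMPLES = 638_507

labeled_samples = TOTAL_SAMPLES - UNLABELED_SAMPLES
labeled_fraction = labeled_samples / TOTAL_SAMPLES
print(labeled_samples, f"{labeled_fraction:.1%}")
```

Only about a fifth of the released wafer maps therefore carry a label that can be used for supervised training and evaluation.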

After removing unlabeled samples, the dataset exhibits extreme class imbalance, with the None class accounting for the majority of samples. This distribution is presented in Figure 6.

4.2. Preprocessing and Data Split

We split the data into 80% for training and 20% for testing and performed a stratified split to preserve class proportions. No data resampling techniques, such as oversampling to mitigate class imbalance, were used. This choice was intentional: incorporating imbalance mitigation strategies such as focal loss or class weighting would introduce confounding factors, making it difficult to attribute performance differences specifically to the proposed architectural design. The goal of this study is to evaluate the architectural contribution of PSS-HNet in isolation under a controlled and consistent training protocol across all model variants.
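A stratified 80/20 split of this kind can be sketched as follows. This is a generic illustration rather than the authors' code: the function name, the seed, and the per-class rounding rule are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)  # rounding rule is an assumption
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Toy imbalanced label list (hypothetical counts).
labels = ["None"] * 100 + ["Donut"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting each class separately keeps minority classes represented in the test set, which matters when the reported metrics are macro-averaged.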

Data preprocessing used the same basic transformations for both training and testing, while data augmentation was applied probabilistically only during training. During testing, no random augmentation was applied; instead, inputs were normalized using a fixed preprocessing pipeline to evaluate performance.

Training-time augmentation combined techniques such as rotation, flipping, blurring, random cropping, and random erasing to reflect realistic variations in wafer map patterns, including changes in position, orientation, and partial occlusion. These augmentations aim to improve generalization by increasing input diversity while preserving the morphological characteristics of defect patterns.
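A simplified sketch of probabilistic, training-only augmentation on a wafer map stored as a 2-D list is shown below. It covers only rotation, flipping, and random erasing; blurring and random cropping are omitted. The application probability, the restriction to 90° rotations, and the erase size are assumptions, since the exact settings are not stated here.

```python
import random


def rotate90(grid):
    """Rotate a 2-D list clockwise by 90 degrees."""
    return [list(row) for row in zip(*grid[::-1])]


def hflip(grid):
    return [row[::-1] for row in grid]


def random_erase(grid, rng, size=2, fill=0):
    """Overwrite a random size x size patch with the fill value."""
    height, width = len(grid), len(grid[0])
    top = rng.randrange(0, height - size + 1)
    left = rng.randrange(0, width - size + 1)
    out = [row[:] for row in grid]
    for i in range(top, top + size):
        for j in range(left, left + size):
            out[i][j] = fill
    return out


def augment(grid, rng, training=True, p=0.5):
    """Apply each transform with probability p during training; no-op at test time."""
    out = [row[:] for row in grid]
    if not training:
        return out
    if rng.random() < p:
        for _ in range(rng.randrange(1, 4)):  # 90, 180, or 270 degrees
            out = rotate90(out)
    if rng.random() < p:
        out = hflip(out)
    if rng.random() < p:
        out = random_erase(out, rng)
    return out


# Hypothetical 3 x 3 map with non-negative cell values.
wafer = [[0, 1, 0], [1, 1, 0], [0, 0, 2]]
augmented = augment(wafer, random.Random(0))
print(augmented)
```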

4.3. Model Variants and Training Configuration

Based on an iFormer backbone, this study compared four model variants to examine the impact of global interaction and modulation (gating) configurations and assess the effectiveness of PSS-HNet. The compared models are: (1) the iFormer baseline; (2) Axial-only, which replaces the original attention-based global interaction with axis-based global interaction; (3) Axial+Modulation, which adds a modulation (gating) mechanism to Axial-only; and (4) PSS-HNet, which applies the redesigned attention block that sequentially integrates local (Modulated Convolution) and global (Modulated Axial) interactions.

Training followed the same protocol for all variants to ensure a fair comparison. Cross-entropy was used as the classification loss, and optimization was performed using AdamW. A cosine learning-rate schedule was adopted, including a warm-up stage.

The detailed training settings are as follows. The batch size was set to 512, and training was conducted for 100 epochs. The initial learning rate was 0.001, and the weight decay was 0.05. All models were trained on an NVIDIA H100 GPU. To account for run-to-run variability, each model was trained three times with different random seeds, and the mean ± standard deviation of the classification metrics is reported in Section 5.1. Inference latency and efficiency metrics were measured separately on an NVIDIA GeForce RTX 3060 Ti to reflect a realistic lightweight deployment scenario.
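The shape of a cosine schedule with warm-up can be sketched per epoch as below. The base learning rate (0.001) and epoch count (100) follow the settings above; the warm-up length of 5 epochs, the linear warm-up shape, and the final learning rate of 0 are assumptions, since they are not stated.

```python
import math


def lr_at_epoch(epoch, base_lr=1e-3, total_epochs=100, warmup_epochs=5, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine decay."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [lr_at_epoch(e) for e in range(100)]
print(schedule[0], schedule[5], schedule[50], schedule[99])
```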

The comparisons in this study are not intended as a broad backbone competition such as “CNN vs. Transformer”; rather, they aim to isolate how local interaction (convolution), global/axis-based interaction, and modulation integration within an iFormer-based architecture affect performance and efficiency. Accordingly, direct comparisons against standalone CNN backbones were excluded from the experimental scope. While comparisons with a wider range of state-of-the-art models would further contextualize the reported improvements, such comparisons would conflate the effects of different backbone designs with those of the specific architectural modifications under investigation. The controlled variant-based evaluation adopted here is better suited to the study’s primary objective: providing a rigorous ablation of the proposed design choices within a consistent architectural framework.

4.4. Evaluation Metrics and Protocol

Considering the extreme class imbalance of WM-811K, this study evaluated both overall performance and class-wise performance rather than relying on a single metric. Evaluation was conducted on the test set, and for each model, we reported Accuracy along with Precision, Recall, and F1-score. In addition to overall averaged metrics, we also present class-wise metrics (Precision/Recall/F1-score) to explicitly examine performance degradation in minority classes and confusion patterns between specific classes.

Given that majority classes can dominate overall performance in imbalanced datasets, this study uses macro-averaged metrics as the primary performance indicators. Macro averaging computes the unweighted mean of the per-class metrics, so each minority class counts as much as the majority class. We additionally report weighted-average metrics; weighted averaging uses the per-class sample count (support) as weights and therefore reflects the overall data distribution. These metrics are summarized in Table 1 (overall performance comparison) and Table 2 (class-wise performance comparison).
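The difference between the two averaging schemes can be illustrated with a small sketch. The three-class F1 scores and support counts below are hypothetical values chosen to mimic one dominant class and two small classes; they are not WM-811K results.

```python
def macro_average(per_class_scores):
    """Unweighted mean: every class counts equally."""
    return sum(per_class_scores) / len(per_class_scores)


def weighted_average(per_class_scores, supports):
    """Support-weighted mean: each class counts in proportion to its sample count."""
    total = sum(supports)
    return sum(s * n for s, n in zip(per_class_scores, supports)) / total


# Hypothetical three-class example (not taken from WM-811K):
# one dominant class with a high F1 and two small classes with lower F1.
f1_scores = [0.99, 0.80, 0.70]
supports = [9000, 600, 400]

macro_f1 = macro_average(f1_scores)
weighted_f1 = weighted_average(f1_scores, supports)
print(f"macro-F1    = {macro_f1:.4f}")
print(f"weighted-F1 = {weighted_f1:.4f}")
```

In this toy case the weighted average stays close to the majority-class score, while the macro average is pulled down by the two small classes, which is the behavior that motivates using macro-averaged metrics as the primary indicators.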

We also analyzed misclassification patterns across classes using a confusion matrix. A confusion matrix indicates which other classes a given class is confused with and is useful for interpreting the reasons for changes in macro-averaged metrics. In this study, we additionally present normalized confusion matrices to enable comparable analysis of class-wise classification tendencies.
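As a minimal sketch of the normalization step, the following code builds a count matrix and divides each row by its total. Row-wise normalization (by true class) is an assumption here, since the text only specifies per-class proportions; the labels are hypothetical toy data.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix with rows = true class and columns = predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so entry (i, j) is the share of class i predicted as j."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Hypothetical labels for a three-class toy problem (not WM-811K data).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 1]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
cm_norm = normalize_rows(cm)
for row in cm_norm:
    print(row)
```

With this convention the diagonal of the normalized matrix equals the per-class recall, which makes classes of very different sizes directly comparable.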

Finally, to discuss practical deployment, we compared efficiency metrics in addition to classification performance metrics. We report the number of parameters (Params), floating-point operations (FLOPs), inference latency, and Macro-F1 together to assess the balance between performance and efficiency. The classification metrics in Table 1 and the Macro-F1 values in Table 3 are reported as mean ± standard deviation over three independent runs. In addition, to qualitatively examine which regions of defect patterns the model relies on for its predictions, we performed Gradient-weighted Class Activation Mapping (Grad-CAM) [40] visualizations; representative cases are presented in Section 5.4.

5. Experimental Results and Analysis

5.1. Experimental Setup Recap

This subsection compares and analyzes the wafer defect pattern classification performance of the proposed model (PSS-HNet) on the WM-811K dataset (9 classes). WM-811K is a highly imbalanced dataset in which the normal (None) class accounts for a very large proportion, exceeding 85% of the test set. In this setting, Accuracy and Weighted-F1 can be largely dominated by majority class performance; therefore, we use Macro-Recall and Macro-F1 as the key comparison metrics to evaluate overall sensitivity across minority defect classes (reduced false negatives). Table 1 summarizes the overall metrics for the four models (iFormer, Axial-only, Axial+Modulation, and PSS-HNet).

5.2. Overall Performance Comparison

As shown in Table 1, all four models achieve high Accuracy, in the range of 0.9666–0.9782, and iFormer (0.9782 ± 0.0006) and PSS-HNet (0.9780 ± 0.0002) are nearly identical on this metric. However, the differences become more pronounced in macro-averaged metrics. Axial-only exhibits substantial decreases in Macro-Recall (0.8258 ± 0.0105) and Macro-F1 (0.8462 ± 0.0106), suggesting that using axis-based global interaction alone may reduce the reliability of minority-class predictions. In contrast, Axial+Modulation recovers Macro-F1 to 0.9063 ± 0.0025, indicating that modulation contributes to stabilizing the performance of axis-based global interaction. PSS-HNet achieves the highest Macro-Recall (0.8954 ± 0.0034) and Macro-F1 (0.9098 ± 0.0008) among the four models, while its Macro-Precision (0.9260 ± 0.0020) is on par with that of the baseline (0.9264 ± 0.0041), indicating improved minority-class performance under class imbalance. Table 2 presents the class-wise metrics.

5.3. Efficiency and Confusion Matrix Analysis

From the perspective of deployment in manufacturing settings, compute requirements and inference latency matter as well as classification performance. Table 3 compares the four models in terms of the number of parameters (Params), computational complexity (FLOPs), and per-sample inference latency. FLOPs were computed using fvcore (FlopCountAnalysis) with an input size of 224 × 224. PSS-HNet requires 4.381 M Params and 0.754 G FLOPs with a latency of 7.682 ms, giving the smallest model size, the lowest computational cost, and the shortest inference latency among the four models. In contrast, the iFormer baseline records 6.245 M Params, 1.103 G FLOPs, and 8.666 ms, whereas Axial-only (7.833 M, 1.223 G, 10.718 ms) and Axial+Modulation (8.435 M, 1.281 G, 11.431 ms) exceed the baseline on all three measures due to the modified global interaction. When Macro-F1 is considered alongside the efficiency metrics, PSS-HNet matches the design intent of maintaining classification performance comparable to, or better than, the other variants under a smaller computational budget. Because WM-811K is highly imbalanced with a very large proportion of the None class, Accuracy alone is insufficient to characterize a model’s practical defect classification capability. In this setting, PSS-HNet offers a balanced trade-off between overall performance (Macro-F1) and efficiency (Params/FLOPs/inference latency).

A confusion matrix provides an intuitive view of which class pairs are frequently confused and whether errors are concentrated in specific minority classes. Figure 7 presents the confusion matrices of the four models, and Figure 8 shows the same results in a normalized form (per-class proportions) to enable direct comparison.
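A per-sample latency figure of the kind reported in Table 3 is usually obtained with a timing loop such as the one sketched below. This is an illustration under stated assumptions rather than the authors' measurement code: the warm-up count, the repeat count, and the `dummy_predict` workload are hypothetical, and the paper does not describe its timing procedure.

```python
import statistics
import time


def measure_latency_ms(predict, sample, warmup=10, repeats=100):
    """Mean and standard deviation of per-call latency in milliseconds."""
    # Warm-up calls are discarded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        predict(sample)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings), statistics.stdev(timings)


def dummy_predict(sample):
    """Placeholder workload standing in for a model forward pass."""
    return sum(x * x for x in sample)


if __name__ == "__main__":
    mean_ms, std_ms = measure_latency_ms(dummy_predict, list(range(10_000)))
    print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms")
```

On a GPU, kernel launches are asynchronous, so the device would need to be synchronized before each clock read (for example with `torch.cuda.synchronize()` in PyTorch); otherwise the loop measures launch overhead rather than execution time.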

5.4. Qualitative Analysis with Grad-CAM

Along with quantitative metrics, this study performed Grad-CAM-based visualizations to qualitatively examine the spatial regions the models rely on for their predictions. Figure 9 presents representative cases across three defect classes—Scratch, Loc, and Edge-Loc—comparing the Grad-CAM activations of the iFormer baseline and PSS-HNet. Grad-CAM highlights input regions that contribute to a classification outcome and is used to interpret which parts of a defect pattern the model focuses on [40]. In the Scratch case (top row), iFormer distributes activation broadly across the wafer, including regions distant from the defect, whereas PSS-HNet produces a more localized response concentrated on the linear defect region. In the Loc case (middle row), iFormer exhibits diffuse activation extending beyond the defect area, while PSS-HNet focuses more sharply on the defect cluster near the wafer edge. In the Edge-Loc case (bottom row), iFormer shows broad activation that includes background noise regions, whereas PSS-HNet concentrates activation more selectively on the localized defect area despite the noisy input background. Across all three cases, PSS-HNet tends to produce more concentrated responses on regions where defect patterns are present, compared to the broader activation patterns of the iFormer baseline. However, this analysis is a qualitative observation based on representative examples, and the overall performance comparison is discussed primarily using the quantitative metrics in Table 1, Table 2 and Table 3.
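For readers unfamiliar with how the heatmaps are formed, the core Grad-CAM computation [40] can be sketched as follows: each feature map is weighted by the spatial average of its gradient with respect to the class score, the weighted maps are summed, and a ReLU keeps only positive evidence. The tiny 2 × 2 activations and gradients below are hypothetical; in practice they come from a chosen convolutional layer via automatic differentiation, and the layer used in this study is not stated here.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from K feature maps (K x H x W) and their gradients.

    activations[k][i][j]: feature map k of the chosen convolutional layer.
    gradients[k][i][j]:   derivative of the class score w.r.t. that activation.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of the gradients over spatial positions.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    # ReLU keeps only regions with a positive influence on the class score.
    return [[max(v, 0.0) for v in row] for row in cam]


# Hypothetical two-channel example on a 2 x 2 grid.
activations = [[[1.0, 0.0], [0.0, 2.0]],
               [[0.0, 1.0], [1.0, 0.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-2.0, -2.0], [-2.0, -2.0]]]

heatmap = grad_cam(activations, gradients)
print(heatmap)
```

The resulting map is normally upsampled to the input resolution and overlaid on the wafer map, which is how comparisons such as those in Figure 9 are typically rendered.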

6. Conclusions

In the WM-811K wafer map setting (9 classes) with extreme class imbalance, this study aimed to design an efficient classification backbone for mobile and lightweight deployment while reflecting minority defect class performance in a balanced manner. Because wafer map defects can be characterized not only by local patterns but also by long-range spatial context, it is necessary to introduce global context while controlling computational cost and mitigating unnecessary diffusion of global information.

Experiments were conducted on the WM-811K test set by comparing four models: iFormer (baseline), Axial-only, Axial+Modulation, and PSS-HNet. Because WM-811K is an imbalanced dataset in which the None class dominates the test set, we reported Macro-Recall and Macro-F1 as the key metrics, given the limitations of using Accuracy alone.

Quantitatively, compared with the iFormer baseline, PSS-HNet improved Macro-Recall by 1.02 percentage points (from 0.8852 to 0.8954) and Macro-F1 by 0.54 percentage points (from 0.9044 to 0.9098) while maintaining a similar Accuracy level, as shown in Table 1. In addition, the Axial-only configuration exhibits substantial drops in Macro-Recall and Macro-F1, indicating that applying axis-based global interaction alone may not ensure reliable predictions for minority classes. In contrast, Axial+Modulation recovers Macro-F1 to 0.9063, suggesting that incorporating selective transfer (gating/modulation) may help stabilize the performance of models based on global interaction. In terms of efficiency, PSS-HNet reduces both the number of parameters and the level of computation relative to the baseline and shortens per-sample inference latency by 0.984 ms (from 8.666 ms to 7.682 ms). Additionally, we qualitatively examined the basis of model predictions using Grad-CAM visualizations.
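The gains quoted above follow directly from Table 1 and Table 3, as the short check below shows. The absolute differences are those stated in the text; the relative reductions (roughly 30% for Params, 32% for FLOPs, and 11% for latency) are derived here from the tabulated values and are not figures stated by the authors.

```python
# Values copied from Table 1 and Table 3 (means over three runs).
baseline = {"macro_recall": 0.8852, "macro_f1": 0.9044,
            "params_m": 6.245, "flops_g": 1.103, "latency_ms": 8.666}
proposed = {"macro_recall": 0.8954, "macro_f1": 0.9098,
            "params_m": 4.381, "flops_g": 0.754, "latency_ms": 7.682}


def gain_pp(key):
    """Absolute gain in percentage points for a metric on a 0-1 scale."""
    return (proposed[key] - baseline[key]) * 100.0


def reduction_pct(key):
    """Relative reduction (in percent) of a cost metric versus the baseline."""
    return (baseline[key] - proposed[key]) / baseline[key] * 100.0


print(f"Macro-Recall gain: {gain_pp('macro_recall'):.2f} pp")
print(f"Macro-F1 gain:     {gain_pp('macro_f1'):.2f} pp")
for key in ("params_m", "flops_g", "latency_ms"):
    print(f"{key} reduction: {reduction_pct(key):.1f} %")
```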

From a practical deployment perspective, these efficiency advantages carry direct implications for semiconductor manufacturing environments. Inspection systems in fab settings typically operate under strict throughput and response-time constraints, where inference latency and model size are as critical as classification accuracy. The efficiency gains of PSS-HNet therefore suggest suitability for integration into real-time wafer inspection pipelines on resource-constrained shop-floor hardware. Notably, the experimental setup adopted in this study reflects a realistic deployment scenario, in which models are trained in a high-performance compute environment and evaluated for inference efficiency on a more modest platform—consistent with the common industrial practice of training offline and deploying on edge or line-side systems.

The limitations of this study are as follows. First, the evaluation is based on a single dataset (WM-811K), and the sample sizes of some minority classes are limited. Although WM-811K is the most widely used public benchmark for wafer defect pattern classification and provides a standardized basis for comparison, generalization to other wafer datasets or different process conditions remains to be validated. Second, the efficiency metrics are measured under a specific hardware configuration and on a per-sample basis; thus, further validation is needed in realistic batch and device settings, including throughput and inference-latency variability.

Future work will pursue several directions. First, we will evaluate PSS-HNet on additional wafer map datasets collected under different process conditions and equipment configurations to assess cross-dataset generalization. Second, we will investigate the integration of imbalance-aware training strategies—such as focal loss, class-weighted loss, and mixup-based augmentation—to further stabilize minority class performance independently of architectural design choices. Third, we plan to conduct throughput and latency evaluations under realistic batch inference settings and on edge-deployable hardware beyond the RTX 3060 Ti to more comprehensively characterize practical deployability.

Author Contributions

Conceptualization, J.S.; Methodology, J.S., S.O. and J.N.; Software, J.S.; Validation, J.S.; Formal analysis, J.S.; Writing—original draft, J.S.; Writing—review & editing, S.O., J.N., M.H. and J.K.; Visualization, J.S.; Supervision, M.H. and J.K.; Project administration, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP)-Innovative Human Resource Development for Local Intellectualization program grant funded by the Korean government (MSIT) (IITP-2026-RS-2022-00156287).

Data Availability Statement

The data presented in this study are available in IEEE Xplore at https://doi.org/10.1109/TSM.2014.2364237, reference number [20]. These data were derived from the following resources available in the public domain: Wafer Map Failure Pattern Recognition and Similarity Ranking for Large-Scale Data Sets (https://doi.org/10.1109/TSM.2014.2364237).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Khope, N.P.; Kshirsagar, U.A. Yield Analysis in Semiconductor Manufacturing: Techniques, Case Studies, and Future Direction. J. Inf. Syst. Eng. Manag. 2025, 10, 198–212.
2. Jeong, I.; Lee, S.Y.; Park, K.; Kim, I.; Huh, H.; Lee, S. Wafer Map Failure Pattern Classification Using Geometric Transformation-Invariant Convolutional Neural Network. Sci. Rep. 2023, 13, 8127.
3. Alawieh, M.B.; Boning, D.; Pan, D.Z. Wafer Map Defect Patterns Classification Using Deep Selective Learning. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020.
4. Kumar, N.; Kennedy, K.; Gildersleeve, K.; Abelson, R.; Mastrangelo, C.M.; Montgomery, D.C. A Review of Yield Modelling Techniques for Semiconductor Manufacturing. Int. J. Prod. Res. 2006, 44, 5019–5036.
5. Bao, Y.Y.; Li, E.C.; Yang, H.Q.; Jia, B.B. Wafer Map Defect Classification Using Autoencoder-Based Data Augmentation and Convolutional Neural Network. arXiv 2024, arXiv:2411.11029.
6. Chen, S.; Zhang, Y.; Hou, X.; Shang, Y.; Yang, P. Wafer Map Failure Pattern Recognition Based on Deep Convolutional Neural Network. Expert Syst. Appl. 2022, 209, 118254.
7. Fan, S.K.S.; Chiu, S.H. A New ViT-Based Augmentation Framework for Wafer Map Defect Classification to Enhance the Resilience of Semiconductor Supply Chains. Int. J. Prod. Econ. 2024, 273, 109275.
8. Yu, N.; Chen, H.; Xu, Q.; Hasan, M.M.; Sie, O. Wafer Map Defect Patterns Classification Based on a Lightweight Network and Data Augmentation. CAAI Trans. Intell. Technol. 2023, 8, 1029–1042.
9. Kim, E.S.; Choi, S.H.; Lee, D.H.; Kim, K.J.; Bae, Y.M.; Oh, Y.C. An Oversampling Method for Wafer Map Defect Pattern Classification Considering Small and Imbalanced Data. Comput. Ind. Eng. 2021, 162, 107767.
10. Chen, S.; Liu, M.; Hou, X.; Zhu, Z.; Huang, Z.; Wang, T. Wafer map defect pattern detection method based on improved attention mechanism. Expert Syst. Appl. 2023, 230, 120544.
11. Choi, J.; Suh, D. A Depthwise Convolutional Neural Network Model Based on Active Contour for Multi-Defect Wafer Map Pattern Classification. Eng. Appl. Artif. Intell. 2025, 139, 109707.
12. Mohammad, F.; Ryu, D. Semiconductor Wafer Map Defect Classification with Tiny Vision Transformers. arXiv 2025, arXiv:2504.02494.
13. Nath, S. A Comparative Analysis of Semiconductor Wafer Map Defect Detection with Image Transformer. arXiv 2025, arXiv:2512.11977.
14. Nafi, T.I.; Haque, E.; Farhan, F.; Rahman, A. High Accuracy Swin Transformers for Image-based Wafer Map Defect Detection. Int. J. Eng. Manuf. 2022, 12, 10–21.
15. Zeng, L.; Mei, Z.; Shi, Z.; Chen, Y. DefectTrackNet: Efficient Root Cause Analysis of Wafer Defects in Semiconductor Manufacturing Using a Lightweight CNN–Transformer Architecture. In Proceedings of the 30th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 20–23 January 2025.
16. Kim, Y.; Lee, J.H.; Jeong, J. Pre-trained CNN-based TransUNet Model for Mixed-Type Defects in Wafer Maps. WSEAS Trans. Inf. Sci. Appl. 2023, 20, 238–244.
17. Jeong, H.; Yoon, C.; Lim, H.; Chang, J.; Misra, S.; Kim, C. MT-former: Multi-task Hybrid Transformer and Deep Support Vector Data Description to Detect Novel Anomalies during Semiconductor Manufacturing. Light Adv. Manuf. 2025, 6, 306–318.
18. Zheng, C. iFormer: Integrating ConvNet and Transformer for Mobile Application. arXiv 2025, arXiv:2501.15369.
19. Misra, S.; Kim, D.; Kim, J.; Shin, W.; Kim, C. A voting-based ensemble feature network for semiconductor wafer defect classification. Sci. Rep. 2022, 12, 16254.
20. Wu, M.J.; Jang, J.S.R.; Chen, J.L. Wafer Map Failure Pattern Recognition and Similarity Ranking for Large-Scale Data Sets. IEEE Trans. Semicond. Manuf. 2014, 28, 1–12.
21. Manivannan, S. Semi-supervised imbalanced classification of wafer bin map defects using a Dual-Head CNN. Expert Syst. Appl. 2024, 238, 122301.
22. Tsai, T.H.; Wang, C.Y. Wafer map defect classification using deep learning framework with data augmentation on imbalance datasets. EURASIP J. Image Video Process. 2025, 6, 6.
23. Park, S.Y.; Kim, T.S. Fuzzy Inference System for Interpretable Classification of Wafer Map Defect Patterns. Electronics 2026, 15, 130.
24. Zheng, H.; Sherazi, S.W.A.; Son, S.H.; Lee, J.Y. A Deep Convolutional Neural Network-Based Multi-Class Image Classification for Automatic Wafer Map Failure Recognition in Semiconductor Manufacturing. Appl. Sci. 2021, 11, 9769.
25. Lee, J.; Ju, Y.; Lim, J.; Hong, S.; Baek, S.-W.; Lee, J. Enhancing Confidence and Interpretability of a CNN-Based Wafer Defect Classification Model Using Temperature Scaling and LIME. Micromachines 2025, 16, 1057.
26. Sheng, H.; Cheng, K.; Jin, X.; Jiang, X.; Dong, C.; Han, T. An Efficient Deep Learning Framework for Mixed-Type Wafer Map Defect Pattern Recognition. AIP Adv. 2024, 14, 045329.
27. Shin, E.; Yoo, C.D. Efficient Convolutional Neural Networks for Semiconductor Wafer Bin Map Classification. Sensors 2023, 23, 1926.
28. Zhang, X.; Liang, X.; Zhang, Y.; Li, J.; Wei, S. MLR-WM-ViT: Global high-performance classification of mixed-type wafer map defect using a multi-level relay Vision Transformer. Expert Syst. Appl. 2025, 277, 127121.
29. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–23 June 2018.
30. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
31. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021.
32. Ho, J.; Kalchbrenner, N.; Weissenborn, D.; Salimans, T. Axial Attention in Multidimensional Transformers. arXiv 2019, arXiv:1912.12180.
33. Wang, H.; Zhu, Y.; Green, B.; Adam, H.; Yuille, A.; Chen, L.-C. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. In Proceedings of the Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020.
34. Guo, J.; Han, K.; Wu, H.; Tang, Y.; Chen, X.; Wang, Y.; Xu, C. CMT: Convolutional Neural Networks Meet Vision Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), New Orleans, LA, USA, 19–24 June 2022.
35. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 1–41.
36. Dai, Z.; Liu, H.; Le, Q.V.; Tan, M. CoAtNet: Marrying Convolution and Attention for All Data Sizes. arXiv 2021, arXiv:2106.04803.
37. Hao, Z.; Guo, J.; Jia, D.; Han, K.; Tang, Y.; Zhang, C.; Hu, H.; Wang, Y. Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation. In Proceedings of the Advances in Neural Information Processing Systems, San Diego, CA, USA, 28 November–2 December 2022.
38. Kyeong, K.; Kim, H. Classification of Mixed-Type Defect Patterns in Wafer Bin Maps Using Convolutional Neural Networks. IEEE Trans. Semicond. Manuf. 2018, 31, 395–402.
39. Chen, S.; Huang, Z.; Wang, T.; Hou, X.; Ma, J. Wafer map defect recognition based on multi-scale feature fusion and attention spatial pyramid pooling. J. Intell. Manuf. 2025, 36, 271–284.
40. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 27–29 October 2017.

Figure 1. Overall architecture of PSS-HNet based on iFormer, with the redesigned attention block.

Figure 2. Redesigned attention block. CPE: Conditional Positional Encoding, FFN: Feed-Forward Network.

Figure 3. Modulated Convolution unit.

Figure 4. Modulated Axial unit.

Figure 5. Representative wafer map examples for each class in WM-811K.

Figure 6. Class distribution of labeled samples after removing unlabeled data in WM-811K.

Figure 7. Confusion matrices: (a) iFormer (baseline), (b) Axial-only, (c) Axial+Modulation, (d) PSS-HNet (proposed).

Figure 8. Normalized confusion matrices: (a) iFormer (baseline), (b) Axial-only, (c) Axial+Modulation, (d) PSS-HNet (proposed).

Figure 9. Grad-CAM comparison: (a) Input wafer map, (b) iFormer (baseline), (c) PSS-HNet (proposed). Rows correspond to Scratch, Loc, and Edge-Loc classes, respectively.

Table 1. Overall performance comparison on WM-811K test set (9 classes).

Model	Accuracy	Macro-Precision	Macro-Recall	Macro-F1	Weighted-F1
iFormer (baseline)	0.9782 ± 0.0006	0.9264 ± 0.0041	0.8852 ± 0.0022	0.9044 ± 0.0010	0.9775 ± 0.0006
Axial-only	0.9666 ± 0.0005	0.8744 ± 0.0405	0.8258 ± 0.0105	0.8462 ± 0.0106	0.9652 ± 0.0010
Axial+Modulation	0.9780 ± 0.0003	0.9249 ± 0.0030	0.8901 ± 0.0018	0.9063 ± 0.0025	0.9773 ± 0.0004
PSS-HNet (proposed)	0.9780 ± 0.0002	0.9260 ± 0.0020	0.8954 ± 0.0034	0.9098 ± 0.0008	0.9774 ± 0.0003

Table 2. Class-wise precision, recall, and F1-score on the WM-811K test set (9 classes).

Model	Metrics	Center	Donut	Edge-Loc	Edge-Ring	Loc	Near-Full	Random	Scratch	None
iFormer (baseline)	Precision	0.96	0.90	0.87	0.98	0.86	0.94	0.95	0.88	0.99
	Recall	0.95	0.86	0.72	0.97	0.78	1.00	0.91	0.79	1.00
	F1	0.96	0.88	0.79	0.97	0.82	0.97	0.93	0.84	0.99
Axial-only	Precision	0.90	0.68	0.77	0.98	0.79	0.93	0.96	0.84	0.98
	Recall	0.94	0.86	0.64	0.94	0.68	0.83	0.73	0.70	0.99
	F1	0.92	0.76	0.70	0.96	0.73	0.88	0.83	0.77	0.99
Axial+Modulation	Precision	0.96	0.91	0.86	0.98	0.86	0.97	0.93	0.88	0.99
	Recall	0.95	0.86	0.72	0.97	0.77	1.00	0.90	0.83	1.00
	F1	0.95	0.89	0.79	0.98	0.81	0.98	0.91	0.85	0.99
PSS-HNet (proposed)	Precision	0.96	0.93	0.85	0.98	0.86	0.97	0.94	0.86	0.99
	Recall	0.96	0.89	0.72	0.97	0.77	1.00	0.92	0.84	0.99
	F1	0.96	0.91	0.78	0.97	0.81	0.98	0.93	0.85	0.99

Table 3. Efficiency comparison (Params/FLOPs/Latency/Macro-F1).

Model	Params	FLOPs	Latency	Macro-F1
iFormer (baseline)	6.245 M	1.103 G	8.666 ms	0.9044 ± 0.0010
Axial-only	7.833 M	1.223 G	10.718 ms	0.8462 ± 0.0106
Axial+Modulation	8.435 M	1.281 G	11.431 ms	0.9063 ± 0.0025
PSS-HNet (proposed)	4.381 M	0.754 G	7.682 ms	0.9098 ± 0.0008


