# Automated Detection and Classification of Defective and Abnormal Dies in Wafer Images

## Abstract


## Featured Application

**This study presents a fully automated scheme for wafer inspection using scanning acoustic tomography (SAT) images. Unlike traditional template-matching-based methods, the proposed method combines an automatic template extraction algorithm with deep learning-based classification, making the inspection process more convenient and accurate.**


## 1. Introduction

- We propose an automatic procedure for extracting a standard template which is then utilized for detecting the die patterns from the original SAT image of a wafer.
- From the detected die patterns and their spatial properties, we present a simple method to predict the locations of pattern candidates that possibly contain certain predefined patterns.
- We design and implement a deep CNN-based classifier to identify all detected patterns and predicted pattern candidates. This classifier can categorize them into the background, alignment mark, normal, and abnormal classes.
- Finally, the proposed method uses the obtained patterns with the spatial properties and classification results to produce a wafer map. This map provides important information to engineers in their analysis regarding the root cause of die-scale failures [19].

## 2. The Proposed Method

#### 2.1. Automatic Template Extraction

#### 2.1.1. Template Size Estimation

- Initialize parameters: The original SAT image has a pixel resolution of ${w}_{\mathrm{Orig}}\times {h}_{\mathrm{Orig}}$, patch image has a pixel resolution of ${w}_{\mathrm{P}}\times {h}_{\mathrm{P}}$, and template has an initial pixel resolution of ${w}_{\mathrm{Tpl}}\times {h}_{\mathrm{Tpl}}$, with a similarity threshold of ${T}_{\mathrm{SIM}}$. These will be determined and discussed in Section 3.1.
- The original image is converted into a grayscale image.
- An image patch ${I}_{\mathrm{P}}$ with a pixel resolution of ${w}_{\mathrm{P}}\times {h}_{\mathrm{P}}$ is randomly cropped near the central area from the grayscale SAT image. If the original image is not too large, it can be considered an image patch; thus, this step can be skipped.
- Histogram equalization is applied to enhance the contrast of this cropped patch, so that the following steps perform consistently across different SAT imaging settings. Figure 2 shows the cropped patch before and after histogram equalization.
- An initial template ${I}_{\mathrm{Tpl}}$ with a size of ${w}_{\mathrm{Tpl}}\times {h}_{\mathrm{Tpl}}$ is randomly cropped from the patch ${I}_{\mathrm{P}}$, as shown in Figure 3. If step 3 is skipped, we crop this initial template from the grayscale SAT image.
- An ordinary template matching process is conducted to find the parts of image ${I}_{\mathrm{P}}$ that are similar to template ${I}_{\mathrm{Tpl}}$. This step slides the template image over the patch, as in a two-dimensional convolution, and calculates the following metric comparing the template ${I}_{\mathrm{Tpl}}$ against the local region ${I}_{\mathrm{Loc}}$ of the patch:$$R\left(x,y\right)=\frac{{\sum}_{{x}^{\prime},{y}^{\prime}}\left({I}_{\mathrm{Tpl}}\left({x}^{\prime},{y}^{\prime}\right)\cdot {I}_{\mathrm{Loc}}\left(x+{x}^{\prime},y+{y}^{\prime}\right)\right)}{\sqrt{{\sum}_{{x}^{\prime},{y}^{\prime}}{I}_{\mathrm{Tpl}}{\left({x}^{\prime},{y}^{\prime}\right)}^{2}\cdot {\sum}_{{x}^{\prime},{y}^{\prime}}{I}_{\mathrm{Loc}}{\left(x+{x}^{\prime},y+{y}^{\prime}\right)}^{2}}}$$
- A binary thresholding process is applied on this map to obtain a binary map ${R}_{\mathrm{B}}$ as follows:$${R}_{\mathrm{B}}\left(x,y\right)=\begin{cases}1, & \mathrm{if}\ R\left(x,y\right)\ge {T}_{\mathrm{SIM}};\\ 0, & \mathrm{otherwise}.\end{cases}$$
- A morphological opening operation is conducted to reduce small noise in map ${R}_{\mathrm{B}}$. Figure 5 shows the results of this step. As observed from the enlarged region depicted on the right, each presented bright dot is an object that is formed with connected bright pixels.
- The connected component method is applied to label all bright objects in map ${R}_{\mathrm{B}}$, and then calculate the centroid of every object. Here, ${c}_{i}=\left({x}_{i},{y}_{i}\right)$ denotes the center of the $i$-th object, and $1\le i\le {N}_{\mathrm{Obj}}$ for a total of ${N}_{\mathrm{Obj}}$ objects obtained from ${R}_{\mathrm{B}}$.
- A set of displacement tuples is found by considering every possible pair $\left(i,j\right)$, for $1\le j\le {N}_{\mathrm{Obj}}$ and $j<i\le {N}_{\mathrm{Obj}}$:$$\mathcal{D}=\left\{{d}_{i,j}=\left(\left|{x}_{i}-{x}_{j}\right|,\left|{y}_{i}-{y}_{j}\right|\right)\mid \forall \left(i,j\right)\right\}$$
- Every displacement vector ${d}_{i,j}$ contributes to a voting space $\mathcal{V}\left(p,q\right)$ as follows:$$\mathcal{V}\left(\left|{x}_{i}-{x}_{j}\right|,\left|{y}_{i}-{y}_{j}\right|\right)\leftarrow \mathcal{V}\left(\left|{x}_{i}-{x}_{j}\right|,\left|{y}_{i}-{y}_{j}\right|\right)+1$$
- Similar to steps 7–9, the centroid of every local peak is found in this voting space, and the centroid ${c}^{*}=\left({p}^{*},{q}^{*}\right)$ nearest to the origin of $\mathcal{V}$ is then localized. The template size is therefore estimated as$${w}_{\mathrm{Tpl}}^{*}={p}^{*}\quad \mathrm{and}\quad {h}_{\mathrm{Tpl}}^{*}={q}^{*}$$
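The size-estimation steps above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: `ncc_map` evaluates the normalized cross-correlation metric with explicit loops (an optimized version would use FFTs or OpenCV's `matchTemplate`), and `estimate_template_size` replaces the local-peak localization of steps 7–11 with a simple vote filter. Both function names are mine.

```python
import numpy as np

def ncc_map(patch, tpl):
    """Normalized cross-correlation R(x, y) between the template and every
    local region of the patch (valid positions only), per the metric above."""
    h, w = tpl.shape
    H, W = patch.shape
    out = np.zeros((H - h + 1, W - w + 1))
    tpl_norm = np.sqrt((tpl.astype(float) ** 2).sum())
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            loc = patch[y:y + h, x:x + w].astype(float)
            denom = tpl_norm * np.sqrt((loc ** 2).sum())
            out[y, x] = (tpl * loc).sum() / denom if denom > 0 else 0.0
    return out

def estimate_template_size(centroids):
    """Vote the pairwise absolute displacements of object centroids and
    return the well-supported displacement nearest the origin as (w*, h*)."""
    pts = np.asarray(centroids, dtype=float)
    votes = {}
    for i in range(len(pts)):
        for j in range(i):
            d = tuple(int(v) for v in np.abs(pts[i] - pts[j]).round())
            votes[d] = votes.get(d, 0) + 1
    vmax = max(votes.values())
    # keep displacements with substantial support and both components
    # nonzero, then pick the one nearest the origin of the voting space
    cands = [d for d, v in votes.items()
             if v >= vmax // 2 and d[0] > 0 and d[1] > 0]
    return min(cands, key=lambda d: d[0] ** 2 + d[1] ** 2)
```

On a regular grid of die centroids, the most-voted displacement with both components nonzero is the grid period, which is exactly the template size being sought.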

#### 2.1.2. Standard Template Extraction

- The initial template is first smoothed using a two-dimensional Gaussian filter with a kernel size of 5 $\times $ 5 pixels. We select this kernel size because, for a Gaussian with standard deviation $\sigma =1.0$, the weights outside a 5 $\times $ 5 window are effectively zero.
- After labeling all bright objects, the largest one is found and its centroid $\left({x}_{\mathrm{L}},{y}_{\mathrm{L}}\right)$ is recorded.
- A patch centered at $\left({x}_{\mathrm{L}},{y}_{\mathrm{L}}\right)$ is cropped to a size of $\left({w}_{\mathrm{Tpl}}^{*},{h}_{\mathrm{Tpl}}^{*}\right)$ pixels from the initial template. This cropped image can be considered the standard template. In Figure 7, the green rectangle in subplot (a) shows the extracted template and (b) shows its close-up.
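The extraction steps above can be sketched as follows, using a breadth-first flood fill in place of a library connected-components routine; the function names and the 4-connectivity choice are assumptions of this sketch.

```python
import numpy as np
from collections import deque

def largest_object_centroid(binary):
    """Label bright objects with a 4-connected flood fill and return the
    centroid (x_L, y_L) of the largest one."""
    H, W = binary.shape
    visited = np.zeros_like(binary, dtype=bool)
    best = []
    for sy in range(H):
        for sx in range(W):
            if binary[sy, sx] and not visited[sy, sx]:
                comp, queue = [], deque([(sy, sx)])
                visited[sy, sx] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((x, y))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < H and 0 <= nx < W \
                                and binary[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    xs, ys = zip(*best)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def crop_centered(image, center, size):
    """Crop a (w*, h*) window centered at (x_L, y_L); this crop is the
    standard template. Windows exceeding the image bounds are truncated."""
    (cx, cy), (w, h) = center, size
    x0 = int(round(cx - w / 2))
    y0 = int(round(cy - h / 2))
    return image[y0:y0 + h, x0:x0 + w]
```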

#### 2.2. Die Pattern Detection and Clustering

- Let the first coordinate point ${x}_{1}^{\mathrm{TL}}$ be taken as the first cluster center ${\mu}_{1}$. Let the selected set be $\mathcal{S}=\left\{1\right\}$, and the cluster set $\mathcal{C}=\left\{{c}_{1}\right\}$.
- Select the next point from $\left\{{x}_{l}^{\mathrm{TL}}\mid l\in \mathcal{K}\backslash \mathcal{S}\right\}$, compute the distance ${d}_{c}\left({x}_{l}^{\mathrm{TL}}\right)$ for every $c\in \mathcal{C}$, and add index $l$ to set $\mathcal{S}$.
- Compare the smallest distance ${d}_{c}\left({x}_{l}^{\mathrm{TL}}\right)$ with the threshold ${T}_{\mathrm{d}}$. If ${d}_{c}\left({x}_{l}^{\mathrm{TL}}\right)<{T}_{\mathrm{d}}$, assign ${x}_{l}^{\mathrm{TL}}$ to cluster $c$, and update the center ${\mu}_{c}$ by averaging all coordinate points belonging to cluster $c$. Otherwise, let ${x}_{l}^{\mathrm{TL}}$ become a new prototype point, and add a new cluster ${c}_{\#\left(\mathcal{C}\right)+1}$ with its center ${\mu}_{\#\left(\mathcal{C}\right)+1}={x}_{l}^{\mathrm{TL}}$. Here, $\#\left(\mathcal{C}\right)$ denotes the number of clusters in $\mathcal{C}$.
- Repeat steps 2–3 until all coordinate points belong to their corresponding clusters.
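The procedure above is a sequential, threshold-based (leader-style) clustering of the detected top-left coordinates. A compact sketch, with a hypothetical function name:

```python
import numpy as np

def leader_cluster(points, t_d):
    """Threshold-based sequential clustering of top-left coordinates.
    Each point joins the nearest existing cluster when its distance to
    that cluster's center is below t_d; otherwise it seeds a new cluster."""
    centers, members = [], []
    for p in map(np.asarray, points):
        if centers:
            dists = [np.linalg.norm(p - c) for c in centers]
            c = int(np.argmin(dists))
            if dists[c] < t_d:
                members[c].append(p)
                # update the center as the mean of the cluster's points
                centers[c] = np.mean(members[c], axis=0)
                continue
        members.append([p])               # new prototype point
        centers.append(p.astype(float))   # new cluster center
    return centers, members
```

Unlike k-means, the number of clusters is not fixed in advance; it is driven entirely by the distance threshold ${T}_{\mathrm{d}}$.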

#### 2.3. Pattern Classification for Inspection

## 3. Implementation and Experimental Results

#### 3.1. Experiments on Automatic Template Extraction and Die Detection

- The size of the original SAT image is: ${w}_{\mathrm{Orig}}=\mathrm{30,000}$ and ${h}_{\mathrm{Orig}}=\mathrm{30,000}$.
- The size of the image patch is: ${w}_{\mathrm{P}}={w}_{\mathrm{Orig}}/5=6000$ and ${h}_{\mathrm{P}}={h}_{\mathrm{Orig}}/5=6000$. This size is determined to ensure that there are sufficient die patterns in this image patch. If template extraction fails, this size can be increased by ${w}_{\mathrm{P}}={w}_{\mathrm{Orig}}/4=7500$, ${h}_{\mathrm{P}}={h}_{\mathrm{Orig}}/4=7500$, and so on.
- The size of the initial template is: ${w}_{\mathrm{Tpl}}={w}_{\mathrm{P}}/3=2000$ and ${h}_{\mathrm{Tpl}}={h}_{\mathrm{P}}/3=2000$. The criterion for determining this size is to ensure that there exists one (or more) whole die pattern in this initial template. Generally, this size is large enough to detect and extract a standard template.
- The similarity threshold is set to 90% of the maximum value of the map $R\left(x,y\right)$, that is, ${T}_{\mathrm{SIM}}=0.9\times \underset{x,y}{\mathrm{max}}\left\{R\left(x,y\right)\right\}$.
- The binarization thresholds are adaptively determined using Otsu’s method [20].
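The two thresholds above can be sketched as follows. `otsu_threshold` is a plain-NumPy version of Otsu's method [20], choosing the gray level that maximizes the between-class variance of the histogram; in practice a library implementation (e.g., OpenCV's) would normally be used, and both function names here are mine.

```python
import numpy as np

def similarity_threshold(R, ratio=0.9):
    """T_SIM = 0.9 x max R(x, y), as chosen in Section 3.1."""
    return ratio * R.max()

def otsu_threshold(gray):
    """Otsu's method on an 8-bit grayscale image: return the threshold
    maximizing the between-class variance of the histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # class-0 probability up to level k
    mu = np.cumsum(p * np.arange(256))    # class-0 cumulative mean
    mu_t = mu[-1]                         # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)      # 0/0 at the histogram extremes
    return int(np.argmax(sigma_b))
```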

#### 3.2. Implementation of Die Pattern Classification

#### 3.3. Comparison among Feature Extractors

#### 3.4. Wafer Map Generation for Inspection Visualization

## 4. Discussion

#### 4.1. More Discussion on Pattern Classification Models

#### 4.2. Generalizing to Inspect More Abnormal Patterns

## 5. Conclusions

## Funding

## Conflicts of Interest

## References

1. Huang, S.H.; Pan, Y.C. Automated visual inspection in the semiconductor industry: A survey. Comput. Ind. **2015**, 66, 1–10.
2. Shankar, N.G.; Zhong, Z.W. Defect detection on semiconductor wafer surfaces. Microelectron. Eng. **2005**, 77, 337–346.
3. Schulze, M.A.; Hunt, M.A.; Voelkl, E.; Hickson, J.D.; Usry, W.; Smith, R.G.; Bryant, R.; Thomas, C.E., Jr. Semiconductor wafer defect detection using digital holography. In Proceedings of the SPIE 5041, Process and Materials Characterization and Diagnostics in IC Manufacturing, Santa Clara, CA, USA, 27–28 February 2003; pp. 183–193.
4. Kim, S.; Oh, I.S. Automatic defect detection from SEM images of wafers using component tree. J. Semicond. Tech. Sci. **2017**, 17, 86–93.
5. Yeh, C.H.; Wu, F.C.; Ji, W.L.; Huang, C.Y. A wavelet-based approach in detecting visual defects on semiconductor wafer dies. IEEE Trans. Semicond. Manuf. **2010**, 23, 284–292.
6. Pan, Z.; Chen, L.; Li, W.; Zhang, G.; Wu, P. A novel defect inspection method for semiconductor wafer based on magneto-optic imaging. J. Low Temp. Phys. **2013**, 170, 436–441.
7. Hartfield, C.D.; Moore, T.M. Acoustic Microscopy of Semiconductor Packages. In Microelectronics Failure Analysis: Desk Reference, 6th ed.; Ross, R.J., Ed.; ASM International: Materials Park, OH, USA, 2011; pp. 362–382.
8. Sakai, K.; Kikuchi, O.; Kitami, K.; Umeda, M.; Ohno, S. Defect detection method using statistical image processing of scanning acoustic tomography. In Proceedings of the IEEE 23rd International Symposium on the Physical and Failure Analysis of Integrated Circuits, Singapore, 18–21 July 2016; pp. 293–296.
9. Kitami, K.; Takada, M.; Kikuchi, O.; Ohno, S. Development of high resolution scanning acoustic tomograph for advanced LSI packages. In Proceedings of the IEEE 20th International Symposium on the Physical and Failure Analysis of Integrated Circuits, Suzhou, China, 15–19 July 2013; pp. 522–525.
10. Sakai, K.; Kikuchi, O.; Takada, M.; Sugaya, N.; Ohno, S. Image improvement using image processing for scanning acoustic tomograph images. In Proceedings of the IEEE 22nd International Symposium on the Physical and Failure Analysis of Integrated Circuits, Hsinchu, Taiwan, 29 June–2 July 2015; pp. 163–166.
11. Brunelli, R. Template Matching as Testing. In Template Matching Techniques in Computer Vision: Theory and Practice, 1st ed.; John Wiley & Sons Ltd.: West Sussex, UK; New York, NY, USA, 2009; pp. 43–71.
12. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv **2014**, arXiv:1409.1556.
13. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
14. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
15. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
16. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826.
17. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–10 February 2017; pp. 4278–4284.
18. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv **2017**, arXiv:1704.04861.
19. Nakazawa, T.; Kulkarni, D.V. Wafer map defect pattern classification and image retrieval using convolutional neural network. IEEE Trans. Semicond. Manuf. **2018**, 31, 309–314.
20. Sezgin, M.; Sankur, B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging **2004**, 13, 146–165.
21. Picard, R.R.; Cook, R.D. Cross-validation of regression models. J. Am. Stat. Assoc. **1984**, 79, 575–583.
22. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. **2015**, 115, 211–252.
23. Bianco, S.; Cadene, R.; Celona, L.; Napoletano, P. Benchmark analysis of representative deep neural network architectures. IEEE Access **2018**, 6, 64270–64277.

**Figure 6.** Binarized image of the initial template in Figure 3.

**Figure 10.** Four different patterns: (**a**) background; (**b**) alignment mark; (**c**) normal; (**d**) abnormal pattern.

**Figure 11.** More examples of abnormal patterns: (**a**) crack; (**b**) defect; (**c**) and (**d**) errors caused by voids.

**Figure 18.** Results of our proposed inspection method for three SAT images: (**a**) img01; (**b**) img02; (**c**) img03.

| Image | Template | Template Size (Unit: Pixels) | # of Detected Die Patterns | # of Predicted Regions |
|---|---|---|---|---|
| img01 | 13(a) | $300\times 320$ | 6745 | 1718 |
| img02 | 13(b) | $306\times 318$ | 6756 | 1889 |
| img03 | 13(c) | $302\times 320$ | 6763 | 1882 |

**Feature Extractor: ResNet-50 Encoder**

| Layer Name | Kernel Size | Stride | Channels | Repeat Times |
|---|---|---|---|---|
| Conv 1 | $7\times 7$ | 2 | 3 $\to$ 64 | 1 |
| Pool 1 | $3\times 3$ | 2 | | 1 |
| Resblock 1 | $\left[\begin{array}{c}1\times 1\\ 3\times 3\\ 1\times 1\end{array}\right]$ | 1 | 64 $\to$ 256 | 3 |
| Resblock 2 | $\left[\begin{array}{c}1\times 1\\ 3\times 3\\ 1\times 1\end{array}\right]$ | 1 | 256 $\to$ 512 | 4 |
| Resblock 3 | $\left[\begin{array}{c}1\times 1\\ 3\times 3\\ 1\times 1\end{array}\right]$ | 1 | 512 $\to$ 1024 | 6 |
| Resblock 4 | $\left[\begin{array}{c}1\times 1\\ 3\times 3\\ 1\times 1\end{array}\right]$ | 1 | 1024 $\to$ 2048 | 3 |

**Classifier: Fully-Connected Neural Network**

| Layer Name | Input Dimension | Output Dimension |
|---|---|---|
| FC-1 ^{1} | 2048 | 1000 |
| FC-2 ^{1} | 1000 | 100 |
| FC-3 ^{1} | 100 | 4 |
| Softmax ^{2} | 4 | 4 |

^{1} FC = fully connected layer.

^{2} Softmax is used to map the output of a neural network to a probability distribution over the predicted output classes. This ensures that the sum of all output elements equals 1.
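The classifier head in the table above can be sketched in NumPy. This is a minimal forward-pass illustration only: random weights stand in for trained parameters, and the ReLU activations on the hidden layers are an assumption not stated in the table.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Map logits to a probability distribution (outputs sum to 1)."""
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

# FC-1, FC-2, FC-3 dimensions as in the table: 2048 -> 1000 -> 100 -> 4.
# Random weights stand in for trained parameters in this sketch.
dims = [2048, 1000, 100, 4]
weights = [rng.normal(0, 0.01, (m, n)) for m, n in zip(dims[:-1], dims[1:])]

def classify(feature):
    """Forward a 2048-D ResNet-50 feature vector through the FC head."""
    x = feature
    for i, W in enumerate(weights):
        x = x @ W
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)   # ReLU on hidden layers (assumed)
    return softmax(x)

probs = classify(rng.normal(size=2048))  # 4 class probabilities
```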

| Class Label | # of Training Samples | # of Validation Samples |
|---|---|---|
| Background | 417 | 83 |
| Alignment mark | 375 | 75 |
| Normal | 560 | 140 |
| Abnormal | 417 | 83 |

| True \ Predicted | Background | Alignment Mark | Normal | Abnormal | Accuracy (%) |
|---|---|---|---|---|---|
| Background | 83 | 0 | 0 | 0 | 100 |
| Alignment mark | 0 | 64 | 0 | 0 | 100 |
| Normal | 0 | 0 | 138 | 2 | 98.57 |
| Abnormal | 0 | 0 | 0 | 83 | 100 |

| Extractor | Min. Time (ms) | Max. Time (ms) | Avg. Time (ms) | Number of Extractor Parameters | Total Number of Model Parameters |
|---|---|---|---|---|---|
| VGG-16 | 30.25 | 35.63 | 31.02 | 14,714,688 | 39,904,192 |
| VGG-19 | 36.75 | 39.63 | 37.19 | 20,024,384 | 45,213,888 |
| InceptionV3 | 33.38 | 45.88 | 35.02 | 21,802,784 | 73,104,288 |
| MobileNet | 24 | 30.63 | 25.03 | 3,228,864 | 53,506,368 |
| ResNet-50 | 31 | 42 | 32.58 | 23,587,712 | 25,737,216 |

| Extractor | Size of Feature Vector | Hidden Layers (1000) | Hidden Layers (1000, 100) | Hidden Layers (1000, 100, 10) |
|---|---|---|---|---|
| VGG-16 | 25,088 | 39,807,692 | 39,904,192 | 39,904,842 |
| VGG-19 | 25,088 | 45,117,388 | 45,213,888 | 45,214,538 |
| InceptionV3 | 51,200 | 73,007,788 | 73,104,288 | 73,104,938 |
| MobileNet | 50,176 | 53,409,868 | 53,506,368 | 53,507,018 |
| ResNet-50 | 2048 | 25,640,716 | 25,737,216 | 25,737,866 |
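As a consistency check, the totals in the table above can be reproduced by adding the fully connected head's parameter count (weights plus biases per dense layer) to the extractor's parameter count. For example, the three ResNet-50 totals follow from its 2048-D feature vector and 23,587,712 extractor parameters; the helper name is mine.

```python
def head_params(feat_dim, hidden, n_classes=4):
    """Parameter count of a fully connected head: each dense layer from m
    to n units contributes (m + 1) * n parameters (weights plus biases)."""
    dims = [feat_dim, *hidden, n_classes]
    return sum((m + 1) * n for m, n in zip(dims[:-1], dims[1:]))

resnet50_extractor = 23_587_712
# Hidden composition (1000, 100): head has 2,149,504 parameters,
# giving the table's ResNet-50 total of 25,737,216.
total = resnet50_extractor + head_params(2048, (1000, 100))
```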

| Extractor | Hidden Layers (1000) | Hidden Layers (1000, 100) | Hidden Layers (1000, 100, 10) |
|---|---|---|---|
| VGG-16 | 0.8701 | 0.8809 | 0.8167 |
| VGG-19 | 0.8766 | 0.8802 | 0.8673 |
| InceptionV3 | 0.8016 | 0.8206 | 0.7976 |
| MobileNet | 0.7972 | 0.8109 | 0.6548 |
| ResNet-50 | 0.8794 | 0.8817 | 0.8663 |

| # of Neurons: Layer 2 \ Layer 1 | 1000 | 500 | 200 |
|---|---|---|---|
| 200 | 0.8715 | 0.8728 | 0.8756 |
| 100 | 0.8817 | 0.8789 | 0.8635 |
| 50 | 0.8722 | 0.8810 | 0.8744 |

| Predicted \ True | Background | Alignment Mark | Normal | Imaging Error | Crack | Pinhole | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Background | 83 | 0 | 0 | 0 | 0 | 0 | 100 |
| Alignment mark | 0 | 64 | 0 | 0 | 0 | 0 | 100 |
| Normal | 0 | 0 | 125 | 6 | 4 | 0 | 92.59 |
| Imaging error | 0 | 0 | 5 | 57 | 0 | 0 | 91.94 |
| Crack | 0 | 0 | 1 | 0 | 8 | 0 | 88.89 |
| Pinhole | 0 | 0 | 0 | 1 | 0 | 16 | 94.12 |

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chen, H.-C. Automated Detection and Classification of Defective and Abnormal Dies in Wafer Images. *Appl. Sci.* **2020**, *10*, 3423.
https://doi.org/10.3390/app10103423
