Performance Evaluation of CNN-Based End-Point Detection Using In-Situ Plasma Etching Data

Abstract: As the technology node shrinks and shifts towards complex architectures, accurate control of automated semiconductor manufacturing processes, particularly plasma etching, is crucial to yield, cost, and semiconductor performance. However, current endpoint detection (EPD) methods rely on the experience of skilled engineers, which results in process variations and even errors. This paper proposes an enhanced EPD method for the plasma etching process based on a convolutional neural network (CNN). The proposed approach performs feature extraction on the spectral data obtained by optical emission spectroscopy (OES) and successfully predicts the optimal EPD time. For comparison, a support vector machine (SVM) classifier and an AdaBoost ensemble classifier are also investigated; the CNN-based model demonstrates better performance than both.


Introduction
As the technology node shrinks and shifts towards complex architectures, accurate control of automated semiconductor manufacturing processes, particularly plasma etching, is crucial to yield, cost, and semiconductor performance. However, current end-point detection (EPD) methods rely on skilled engineers' experience, which results in process variations and even errors. Various approaches based on artificial intelligence (AI) have recently been put forward to reduce such variations and errors.
Artificial intelligence (AI) makes it possible to predict results and behaviors in advance from collected experimental data through training procedures. Owing to recent enhancements in computing capability and algorithms, it has progressed significantly and is widely used in a vast range of application areas [1][2][3][4][5][6][7][8][9][10]. Machine learning has received great attention in scientific and engineering problem-solving and in manufacturing processes [11][12][13]. As advanced manufacturing becomes more complex, faster, and more automated, quality control, process monitoring, and predictive maintenance become crucial. In this regard, AI is suitable for automated semiconductor manufacturing as the technology node shrinks and shifts towards complex architectures [14][15][16].
There are reported works on adopting machine learning and AI for yield improvement, electrical testing, and predictive equipment maintenance [17][18][19][20][21][22]. Recently, these techniques have been explored to improve semiconductor fabrication processes, such as defect detection and classification, lithography pattern recognition, and plasma etching [23][24][25][26][27][28][29]. For the sub-7 nm technology node, the plasma etching scheme for extreme ultraviolet (EUV) patterning is quite challenging, and optimal EPD is of paramount importance. Various EPD methods have been proposed as a means of controlling plasma etching, and non-invasive optical emission spectroscopy (OES) monitoring is widely adopted [30,31]. However, two concerns may limit its applicability for future technology nodes: (i) the difficulty of monitoring a vast amount of data across wide spectral ranges at sub-second intervals, and (ii) the weakening of the optical signal as the feature size becomes smaller. Machine learning and AI are therefore expected to provide solutions to these problems, and previous work on these issues has included SVM and K-means classifiers [32][33][34][35][36]. Recently, a neural network architecture was proposed to map sensor data as input to metrology as output, and the efficacy limits of the neural network model were demonstrated with a small dataset [37]. A deep learning-based domain adaptation method was proposed for fault diagnosis in semiconductor manufacturing [38]; in that study, a deep convolutional neural network is used for autonomous feature extraction and health condition classification. A deep learning approach was also proposed for virtual metrology that exploits semi-supervised feature extraction based on deep convolutional autoencoders [39]; this approach is applied to etch rate estimation from OES data.
In this study, a CNN model is developed and optimized to improve prediction accuracy using OES spectral data acquired during the plasma etching process. According to previous work [30,31], the OES spectral data at the endpoint display a specific pattern with respect to wavelength. CNNs are well known to be good at two-dimensional pattern recognition tasks such as image detection. For this reason, a CNN-based model is employed in this study. The OES data used in this experiment are collected from in-situ monitoring of the plasma etching process. The ground truth endpoint times are obtained by verifying the produced wafers. To compare the performance of the proposed model, the support vector machine (SVM) [40][41][42] and AdaBoost [43,44] are also employed as endpoint detectors.
This paper is organized as follows. Section 2 presents the proposed model used in the experiments, and feature extraction techniques are described in Section 3. The experimental results are presented and discussed in Section 4, and finally, Section 5 summarizes the work with future research directions.
Figure 1 shows an overview of the training process employed in this study. In the pre-processing stage, the input training data are converted into a matrix or a vector form, whichever is more convenient for learning. Thereafter, the data are normalized, and the model is trained by feeding back the prediction results. When the trained model is tested, the test data pass through the same pre-processing and normalization stages before being applied to the model. The prediction results are compared to the ground truth to evaluate the performance of the models.
In this study, a CNN model is investigated with respect to various parameters, including the number of layers and the size and number of filters, since such parameters strongly affect overfitting. A CNN model of eight layers is selected by trial and error: accuracy decreases when the number of layers and/or the filter sizes decrease, whereas the validation loss increases when they increase. To exploit the two-dimensional pattern recognition capability of the CNN, the data are reshaped from a 1 × 2048 vector to a 32 × 64 matrix. In the first layer, a convolution layer, the number of filters is set to 16, the kernel size is 3 × 3, and ReLU is employed as the activation function. The second layer is a maximum pooling layer with 2 × 2 kernels. The third layer is a convolution layer with 32 filters; its remaining settings are the same as those of the first layer. Maximum pooling is then performed in the fourth layer, and the fifth layer is a flatten layer. In the sixth layer, the 2688 flattened nodes are fully connected to 20 nodes.
The seventh layer, the dropout layer, is a regularization step that randomly removes some of the nodes entering a fully connected layer. Finally, the eighth layer is the output layer, which is fully connected to two nodes and applies the softmax function. The total number of parameters in this model is 58,622. Table 1 describes the construction of the model used in the experiment, listing each layer's type and dimension, the kernel size, and the number of connected perceptrons. Among the layer types, the flatten layer transforms two-dimensional information into one-dimensional form to convey the characteristics obtained from the convolution and pooling layers to the fully connected layer. The model structure of the CNN is illustrated in Figure 2. The optimizer used in this study is the Adam optimizer, and the loss function is categorical cross-entropy [45].
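The layer dimensions described above can be checked with a short calculation. The following is a minimal sketch in plain Python, assuming unpadded, stride-1 convolutions with one bias per filter; these settings are not stated in the text, but they reproduce both the 2688 flattened nodes and the reported total of 58,622, which under these assumptions equals the count of trainable weights and biases. The function names are illustrative only.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count of an unpadded, stride-1 convolution (assumed)."""
    h, w, c = shape
    out_shape = (h - kernel + 1, w - kernel + 1, filters)
    params = (kernel * kernel * c + 1) * filters  # weights plus one bias per filter
    return out_shape, params


def max_pool(shape, size=2):
    """Output shape of non-overlapping max pooling; pooling has no parameters."""
    h, w, c = shape
    return (h // size, w // size, c)


def dense(n_in, n_out):
    """Output width and parameter count of a fully connected layer with biases."""
    return n_out, (n_in + 1) * n_out


def summarize_model():
    """Walk through the eight layers and return (flattened size, total parameters)."""
    total = 0
    shape = (32, 64, 1)                 # reshaped 1 x 2048 spectrum
    shape, p = conv2d(shape, 16)        # layer 1 -> (30, 62, 16)
    total += p
    shape = max_pool(shape)             # layer 2 -> (15, 31, 16)
    shape, p = conv2d(shape, 32)        # layer 3 -> (13, 29, 32)
    total += p
    shape = max_pool(shape)             # layer 4 -> (6, 14, 32)
    flat = shape[0] * shape[1] * shape[2]  # layer 5: flatten
    width, p = dense(flat, 20)          # layer 6
    total += p
    # layer 7: dropout, no parameters
    width, p = dense(width, 2)          # layer 8: two-node softmax output
    total += p
    return flat, total


if __name__ == "__main__":
    print(summarize_model())
```

The per-layer contributions under these assumptions are 160, 4640, 53,780, and 42 parameters for the two convolution layers and the two fully connected layers, respectively.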

Optical Emission Spectroscopy (OES)
One of the most commonly used EPD techniques is to monitor the optical emission spectra gathered by OES during the plasma etching process. Figure 3 shows a schematic illustration of a plasma etching chamber with an OES sensor attached through a viewport, together with its multi-wavelength OES data. A reactive plasma generated by radio-frequency (RF) power under low pressure bombards the wafer surface and reacts with the targeted materials.
Consequently, the reactants and by-products of etching cause the optical emission spectra to change at a certain time. The OES data are influenced not only by the target materials but also by the sizes of the features to be etched, because reduced feature sizes (i.e., a low open area) provide only a low signal-to-noise ratio [46]. The endpoint is identified by monitoring the shift of the emission peaks. The OES measurement is non-invasive, yet it provides reliable real-time information on the etching process. However, the OES data are vast and multi-dimensional, being a function of wavelength, time, and intensity, and high-resolution data are needed to provide the sensitivity and accuracy required for EPD as the feature size decreases. The emission signal can be weak, and thus the existing simple method of tracking a few selected wavelengths may be insufficient for advanced technology nodes. Figure 4 shows a sample of the actual OES data used in this work. The collected spectra range from 190.0 to 892.8 nm, and the sampling interval is 0.1 s over about 60 s. Figure 5 shows the intensity fluctuations over time at the wavelengths of 440.1 nm, 516.5 nm, and 777.06 nm, which are related to C2 and SiF. The red line denotes the ground truth EPD time. Figure 6 illustrates one sample of the intensity pattern across wavelengths at the EPD time. To handle thousands of such OES data, feature extraction and the aforementioned CNN model are adopted.
Figure 7 illustrates the structure of the training data selection process. In the figure, the vertical axis represents wavelength, while the horizontal axis represents sample time. Each column denotes a 2048 × 1 vector whose components represent the intensity at each wavelength. The training data set consists of endpoint vectors selected in the endpoint block and non-endpoint vectors randomly selected in the non-endpoint block.
Three consecutive vectors are selected in the known endpoint time block and labeled as one for supervised learning. For the non-endpoint data, three vectors are randomly extracted after excluding the ten blocks ahead of the endpoint. Three vectors are used because the accuracy and loss of the model are better than when one or five vectors are extracted.
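The selection procedure described above can be sketched as follows. This is an illustrative sketch only: it assumes that the endpoint block begins at the ground truth endpoint index, that the excluded ten blocks are those immediately before the endpoint, and that non-endpoint vectors are drawn only from earlier time steps. The function name and its parameters are hypothetical.

```python
import random


def select_training_vectors(spectra, endpoint_idx, n_vectors=3, guard=10, seed=None):
    """Return (vector, label) pairs for one wafer.

    spectra      : list of per-time-step intensity vectors (one per 0.1 s sample)
    endpoint_idx : index of the ground truth endpoint (assumed start of the endpoint block)
    n_vectors    : number of vectors taken per class (three in the paper)
    guard        : number of blocks before the endpoint excluded from sampling (assumed)
    """
    last_candidate = endpoint_idx - guard      # non-endpoint region is [0, last_candidate)
    if endpoint_idx + n_vectors > len(spectra) or last_candidate < n_vectors:
        raise ValueError("recording too short for the requested selection")

    rng = random.Random(seed)
    endpoint_ids = range(endpoint_idx, endpoint_idx + n_vectors)      # consecutive vectors
    non_endpoint_ids = rng.sample(range(last_candidate), n_vectors)   # random, no repeats

    samples = [(spectra[i], 1) for i in endpoint_ids]        # label 1: endpoint
    samples += [(spectra[i], 0) for i in non_endpoint_ids]   # label 0: non-endpoint
    return samples
```

With three vectors per class, each wafer contributes six labeled feature vectors, which keeps the two classes balanced by construction.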

Feature Extraction
The OES data are obtained from a total of 2046 wafers, which are processed using two chambers; in this study, the characteristics of each chamber are not considered. Of these, 1911 OES data are randomly chosen as the training data, while the remaining 135 are allocated as the test data. From the 1911 OES data, 5733 endpoint feature vectors and 5733 non-endpoint feature vectors are acquired, as mentioned above. To prevent overfitting of the model, the numbers of endpoint and non-endpoint features are equalized. In the experiment, various data ratios, such as 8:2 and 6:4, are tested, but the ratio of 5:5 demonstrates the highest accuracy. After the training data are selected, a random function is applied to separate them into training data and validation data at an 8:2 ratio. As a result, the feature vector set is separated into 9172 training data and 2294 validation data.
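The dataset sizes quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes that the training share of the 8:2 split is rounded down, which is consistent with the reported counts; the function name is illustrative.

```python
def split_counts(n_train_wafers=1911, vectors_per_class=3, train_parts=8, total_parts=10):
    """Reproduce the feature-vector counts from the wafer counts given in the text."""
    endpoint = n_train_wafers * vectors_per_class       # endpoint vectors
    non_endpoint = endpoint                             # 5:5 class ratio
    total = endpoint + non_endpoint
    n_train = total * train_parts // total_parts        # assumed: rounded down
    n_val = total - n_train
    return endpoint, total, n_train, n_val


if __name__ == "__main__":
    print(split_counts())
```

The 135 held-out wafers are not part of these counts; they are used as complete time series in the first endpoint detection test.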

Experiment and Results
In this section, the performance of the proposed CNN-based model is evaluated using the data described in the previous section. For comparison, the SVM and AdaBoost are also employed to detect the EPD time using the same feature vectors. In the CNN-based model, the 2048 × 1 feature vector is transformed into a 32 × 64 matrix, whereas the other two models use the feature vector as given. The models are developed in Python 3.7 using Keras with a TensorFlow backend, and the experiments are run on a machine with an 8-core 3.7 GHz CPU, 32 GB of RAM, and an RTX 2080 Super GPU.
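The transformation of a feature vector into the CNN input can be sketched as below. The row-major ordering, in which each matrix row holds 64 adjacent wavelength channels, is an assumption; the text does not specify how the 2048 channels are arranged in the 32 × 64 matrix. The function name is hypothetical.

```python
def to_matrix(vector, rows=32, cols=64):
    """Reshape a flat spectrum into a rows x cols matrix in row-major order (assumed)."""
    if len(vector) != rows * cols:
        raise ValueError("expected %d channels, got %d" % (rows * cols, len(vector)))
    return [list(vector[r * cols:(r + 1) * cols]) for r in range(rows)]
```

Under this ordering, neighbouring columns in a row correspond to neighbouring wavelengths, while vertically adjacent entries are 64 channels apart.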
Three tests are carried out to verify the performance of the three trained models. In the first test, the accuracy is evaluated using the 20% of randomly selected validation data that are not involved in the learning phase. In the second test, the means and variances of the detection times of the three models are compared using the 135 test datasets. Each of these datasets contains about 600 consecutive 2048 × 1 feature vectors ordered in time; the number of feature vectors in a dataset varies with its EPD time. In the third test, the accuracy and variance of the CNN-based model are investigated according to the number of feature vectors selected in the EPD and non-EPD blocks.

Model Accuracy Test
The model accuracy is evaluated by comparing the model prediction outputs with the ground truth. The third-order SVM classifier demonstrates an accuracy of 99.3%, the AdaBoost ensemble classifier achieves 99.17%, and the CNN shows 99.81%. The CNN thus achieves the highest accuracy among the three models. These accuracy results are summarized in Table 2. In addition, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) of each model are investigated and shown in Figure 8. As observed in the figure, the AUC is 0.996979 for the SVM, 0.992447 for AdaBoost, and 0.999865 for the CNN. According to [47], an AUC greater than 0.9 indicates that the model achieves outstanding detection performance.
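For reference, the two metrics used in this test can be computed as in the following sketch. It is not the evaluation code used in the study; it uses the standard pairwise definition of the AUC (the probability that a randomly chosen endpoint sample receives a higher score than a randomly chosen non-endpoint sample, with ties counted as one half), and the function names are illustrative.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that equal the ground truth labels."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and of equal length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

Here `scores` would be the model's output for the endpoint class (for the CNN, the softmax probability of the EPD node), while `predictions` are the thresholded class decisions.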

First Endpoint Detection
This test is performed with the 135 test datasets, which are not involved in the learning. Each dataset contains feature vectors such as those shown in Figure 4. Each vector is of size 2048 × 1, and the number of vectors depends on the EPD time of the dataset, usually about 600.
In the test, all spectral data over time are applied as inputs to evaluate the endpoint detection performance in the actual etching process. That is, the feature vectors of one dataset are sequentially applied to each trained model, and the model outputs one or zero for each feature vector, where one represents EPD and zero represents non-EPD. In this test, the time point at which an EPD output (one) first appears is measured for each dataset. The test is carried out for the three models using the 135 datasets. Figure 9 shows the average of the 135 results obtained in the first endpoint detection test. On average, the first endpoint detection of the SVM classifier is 10.79 blocks ahead of the actual endpoint, that of the AdaBoost ensemble classifier is 6.91 blocks ahead, and that of the CNN is 5.96 blocks ahead. What the three models have in common in this test is that, after the initial detection, the endpoint is detected continuously without false detection. Therefore, if an appropriate number of consecutive detections is required, the result will match the actual endpoint.
Figure 10 shows the histograms of the 135 test results for the first endpoint detection test, and Table 3 summarizes their averages, standard deviations, and variances for the three models. As observed in Figure 10 and Table 3, the CNN-based model is superior to the other two models in terms of accuracy and variance. In both of the preceding experiments, the CNN-based model outperforms the third-order SVM classifier and the AdaBoost ensemble classifier.
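The first-detection measurement and the consecutive-detection rule suggested above can be illustrated with the following sketch. The function names and the `n_consecutive` parameter are hypothetical, and the choice of how many consecutive detections to require is an assumption that would have to be tuned for each model.

```python
def first_detection(predictions):
    """Index of the first block classified as EPD (1), or None if there is none."""
    for i, p in enumerate(predictions):
        if p == 1:
            return i
    return None


def blocks_ahead(predictions, true_endpoint):
    """How many blocks before the ground truth endpoint the first detection occurs."""
    first = first_detection(predictions)
    return None if first is None else true_endpoint - first


def confirmed_detection(predictions, n_consecutive):
    """Index at which n_consecutive EPD outputs in a row have been observed, or None."""
    run = 0
    for i, p in enumerate(predictions):
        run = run + 1 if p == 1 else 0   # reset the run on any non-EPD output
        if run == n_consecutive:
            return i
    return None
```

For example, if a model first outputs one at block 94 while the actual endpoint is block 100, the first detection is six blocks ahead, and requiring seven consecutive detections places the confirmed endpoint at block 100.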

Overfitting
The previous experiments show that the CNN-based model is relatively better than the remaining two models. In this section, further investigation is carried out for the CNN-based model regarding overfitting and feature size.
First, to investigate the overfitting of the CNN-based model, its accuracy and loss are plotted for 300 epochs in Figure 11. The figure reveals that, unlike the training loss, the validation loss no longer decreases after around epoch 150, which can be regarded as overfitting. To overcome this, the early stopping technique [48] is employed to train for an appropriate number of epochs. Figure 12 shows the accuracy and loss of the CNN-based model trained with the early stopping technique. In this experiment, the early stopping technique terminates the learning phase at 132 epochs.
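The logic of early stopping can be sketched as follows. This is a generic patience-based rule applied to a recorded validation loss curve, not the configuration used in the study; the `patience` and `min_delta` values are assumptions, since the stopping criterion is not specified in the text.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, for a validation loss history.

    Training stops once the validation loss has failed to improve on its best
    value by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0   # improvement: reset counter
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch                     # stop training here
    return len(val_losses), best_epoch                       # never triggered
```

In Keras, comparable behaviour is available through the `EarlyStopping` callback, which monitors a chosen quantity such as the validation loss and accepts a `patience` argument.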

Conclusions
In this paper, the CNN-based endpoint detection performance is investigated in terms of model accuracy and first endpoint detection time and compared with that of the third-order SVM classifier and the AdaBoost ensemble classifier. In addition, the application of the early stopping technique to prevent overfitting is investigated. The CNN-based model performs better than the other two classifiers in both investigations. Considering that these results are obtained in a non-optimized situation, artificial intelligence techniques using neural networks are expected to contribute greatly to improving the accuracy of endpoint detection. In the future, for the model to be applied to real process environments, an approach based on reinforcement learning needs to be investigated further.
Author Contributions: All authors contributed to writing, reviewing, and editing the paper. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: Restrictions apply to the availability of these data. Data were obtained from Prime Solution Co., LTD., and are available from the authors with the permission of Prime Solution Co., LTD.