Artificial Immune System for Fault Detection and Classification of Semiconductor Equipment

Abstract: Semiconductor manufacturing comprises hundreds of consecutive unit processes, and a single misprocess can jeopardize the whole manufacturing process. In current manufacturing environments, data from equipment condition monitoring, wafer metrology, and inspection are used to probe for any anomaly during manufacturing that could affect the final chip performance and quality. The purpose of this investigation is fault detection and classification (FDC). Various methods, such as statistical methods and data mining with machine learning algorithms, have been employed for FDC. In this paper, we propose an artificial immune system (AIS), a biologically inspired computing algorithm, for FDC of semiconductor equipment. Process shifts caused by the aging of parts and modules over time are a main cause of failure. We employ state variable identification (SVID) data, which contain the current equipment operating condition, and optical emission spectroscopy (OES) data, which represent plasma process information, both obtained from a faulty process scenario with intentional modification of the gas flow rate in a semiconductor fabrication process. We achieved a prediction accuracy of 94.69% with selected SVID and OES data and 93.68% with OES data alone. In conclusion, we propose the possibility of using an AIS for decision making in semiconductor processes.


Introduction
As semiconductor devices shrink, the semiconductor manufacturing process is becoming increasingly complicated, requiring hundreds of consecutive processing steps that can take as long as two months to produce the finished chips [1]. Factory integration has boosted manufacturing efficiency, making waste reduction, improved equipment utilization, and predictive operation parts of the semiconductor manufacturing infrastructure. Predictive operation uses predictive maintenance (PdM), metrology prediction via virtual metrology (VM), and fault detection and classification (FDC) to support reactive operations, reducing costs and improving quality [2][3][4].
An anomaly during the process may jeopardize the final chip performance and quality. A real-time monitoring and diagnosis system that uses a large amount of data is crucial for preventing shifts in the final chip performance, or chip failure, caused by a faulty process step. Moreover, research on smart manufacturing that deals with big data in various fields, such as quality control, maintenance, and scheduling, is actively being conducted [5][6][7]. The data generated across manufacturing, from process control operations to metrology and inspection, contain complicated information in fragmented sources. Therefore, despite the importance and high utility of manufacturing data, it is difficult to process such large amounts of collected and stored data. Moreover, current database management systems are limited in their ability to access, analyze, and extract information from the large variety of data sources in the semiconductor manufacturing industry [8]. Extracting data that contain significant domain knowledge of the process also requires subject matter expertise (SME). The predictive operation process begins with handling big data acquired from sensors, equipment, and wafer metrology and inspection. Related challenges include the following: (i) improving collection, transfer, storage, and computation speeds; (ii) improving data quality; and (iii) optimizing software for actionable analysis to improve decision making. Data collection and computation speeds have dramatically improved with the emergence of edge computing devices and graphics processing units. Data quality is also continuously improving with denoising features [9]. However, decision making in the semiconductor technology domain requires extensive SME. The importance of data-driven decision making cannot be overemphasized [10].
Advanced process control (APC) systems, which consist of FDC and run-to-run (R2R) control components, are pervasive in current fabs and utilize an extensive amount of data. They can provide the finer levels of control and diagnostics required for complex processes [11,12]. Sensors that monitor the plasma, which directly affects the actual process, have contributed greatly to APC systems, and studies using RF sensors, optical sensors, and various types of probes have been conducted [13,14]. Numerous measurement steps are essential in complex processes but are costly in time and money, and research on virtual metrology has been actively conducted to solve this problem [15,16]. As processes are refined, even a small change in the condition of the equipment can cause a fault in the process result, and the need for fault detection technology has increased. An FDC model is built in three stages, namely, feature extraction, feature selection, and classification, and employs various modeling and decision-making algorithms [17].
In this paper, we present an artificial immune system (AIS), inspired by biology, to support the initial decision making for fault identification [18]. Using an AIS algorithm, it can detect abnormal process conditions that are considered faulty and that affect the results of the process. In fact, COVID-19 inspired us to employ an AIS to solve the semiconductor process diagnostic problem. In this study, we employed two types of data. The first is equipment state variable identification (SVID) data acquired from etch equipment, and the second is optical emission spectroscopy (OES) data containing plasma chemistry information. OES is a useful plasma monitoring sensor that measures the variation of the optical emission intensity of the plasma as a function of the reactants and byproducts inside the etch chamber, representing plasma process information [19]. By developing a system that automatically transfers the two data types to a database in real time, the data can be managed and used for process monitoring and for detecting process shifts.

Related Work
Traditional semiconductor manufacturing relied on statistical process control to monitor the production process. As processes became more complex and refined, process control with finer precision and accuracy was required, and APC was proposed [20,21]. APC enables real-time process monitoring and control, using sensors to instantly identify and control critical process shifts [21]. Sensor-based APC studies include one that proposed strict control of CD and phase angle in a photomask dry-etch process using an RF sensor and one that used an in situ plasma monitoring sensor for process diagnosis and endpoint detection in an etch process [22][23][24].
FDC, a component of an APC system, addresses process faults caused by changes in equipment condition due to the aging of equipment parts. Through FDC, it became possible to detect an equipment abnormality and find its cause in real time. Accordingly, one study described the parts constituting the equipment and identified which part of an etcher was the root cause of a problem by applying a modular neural network to SVID data [25]. To determine process abnormality in real time, one study applied a conjugate-based least-squares support vector machine to SVID data, and another applied the isolation forest algorithm to in situ monitoring sensor data [26,27]. Research using multiple algorithms has also been presented, such as a fault detection study that applied an ensemble algorithm after selecting key SVID data with random forest and k-means clustering and evaluating the selected data with k-nearest neighbors (KNN) and naive Bayes classifiers [28]. Other work includes a fault detection study of high-density plasma chemical vapor deposition equipment using an autoencoder-based model, a study using imputed data with algorithms such as KNN, support vector machine, and logistic regression, and an anomaly detection study using time-series data [29][30][31]. Research using plasma monitoring sensor data as well as SVID data has also progressed. Fault detection using principal component analysis and statistical monitoring charts on OES data of three major wavelengths has been studied [32]. Neural networks (NNs) have been applied to OES and residual gas analysis data to achieve FDC in reactive ion etching [33]. In recent years, FDC studies have progressed using a wide range of data and algorithms.

Algorithm
An immune system is a biological system that identifies and eliminates foreign substances called antigens, which include viruses, bacteria, and other parasites, to keep the body healthy. This system consists of two levels of defense: the innate immune system and the adaptive immune system [34]. Both are maintained primarily by white blood cells. Granulocytes and macrophages play a major role in the innate immune system by attacking all foreign substances directly and immediately, without distinction. Conversely, lymphocytes play an important role in the adaptive immune system, reinforcing innate immunity by making antibodies that effectively remove antigens: they remember antigens that have invaded before and react specifically when those antigens reinvade [35,36]. The commonly used definition of immunity refers to the adaptive immune system.
AISs are biologically inspired algorithms that use the principles of the aforementioned immune system. On the basis of their classification and reasoning abilities, AIS algorithms help solve pattern recognition, optimization, machine learning, and computer network security problems [36]. The system largely employs positive selection (PS) and negative selection (NS) algorithms and a clonal selection (CS) process, in which B lymphocytes create more effective antibodies to remove antigens. In an AIS, the PS and NS algorithms distinguish between self and non-self. The CS algorithm produces antibodies that can more effectively remove common antigens [37].
In the initialization step shown in Figure 1, the training data are normalized and the distance and affinity measures are prepared [38]. The initial training data require accurate class labels, and each input is normalized to the range [0, 1] in this step. In this paper, we normalize the input data in every dimension under the assumption that all dimensions are independent, which simplifies the calculation of the distance between data points. The normalized data are later used to calculate affinity, by means of the Euclidean distance, in the training process called antigen training. Preparing a pool of memory cells, which are the training data exemplars, is crucial for successfully classifying unseen data with the AIS algorithm. Therefore, initializing the data and system variables for model training is very important. In the antigen training step, recognition cells in the memory pool are stimulated by the antigen, and each cell is allocated a stimulation value, which is the reciprocal of affinity. In the subsequent memory cell selection step, the memory cell with the greatest stimulation value is selected as the best match and serves as an antibody string in Figure 2. After the memory cell selection step, the classification process distinguishes between normal and faulty data. As Figure 2 shows, the NS algorithm consists of a random string generation step and a matching step. The random string step generates random antibodies, and the "self" samples of the training set are compared with the antibody strings. In the matching step, if an antibody classifies a sample into the "self" class, which is the normal state, the antibody is rejected from the antibody population.
As a result, if the Euclidean distance between newly entered data and the trained data is sufficiently small (expressed as a match in the AIS algorithm), the data are defined as self, the normal state; if not, they are defined as non-self, the abnormal state [39]. In this research, data that fail to match are classified as faulty. Table 1 explains the abbreviations used in the pseudocode, and Algorithm 1 is the pseudocode for the initialization and antigen training steps. In Algorithm 1, the "Calculate.affinity" function of the antigen training step calculates the Euclidean distance between the memory cell selected as the antibody string and the newly entered data, and the "distance.match" function checks whether the calculated value matches existing data. Algorithm 2 is the pseudocode for the classification method using the NS algorithm described in Figure 2. If the data to be predicted do not match the antibody acting as the detector, they are classified as "non-self"; if they match, they are classified as "self". The AIS algorithm demonstrates the following characteristics: self-regulation, high performance, parameter stability, and labeling necessity [40]. Self-regulation implies that there is no need to select the internal structure of the algorithm; the appropriate structure is discovered and learned by the algorithm itself during model training. Among widely known classification systems, the AIS algorithm shows relatively high accuracy even when it handles a small amount of data. The AIS algorithm can train models relatively fast since it is a single-shot detection algorithm. Its speed and its ability to find optimal parameter values provide parameter stability. In addition, because good results can be obtained over a wide range of parameters, careful tuning can further improve the results.
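The initialization and antigen training steps described above can be illustrated with a minimal Python sketch. It is not the authors' Algorithm 1: the function names, the `eps` guard against division by zero, and the toy values are assumptions made for illustration. It normalizes each dimension independently to [0, 1], uses the Euclidean distance as affinity, takes stimulation as the reciprocal of affinity, and selects the memory cell with the greatest stimulation.

```python
import math

def min_max_normalize(rows):
    """Scale each column of `rows` independently to [0, 1].

    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - low for col, low in zip(columns, lows)]
    return [
        [(v - low) / span if span else 0.0
         for v, low, span in zip(row, lows, spans)]
        for row in rows
    ]

def affinity(a, b):
    """Affinity as the Euclidean distance between two normalized vectors."""
    return math.dist(a, b)

def stimulation(cell, antigen, eps=1e-9):
    """Reciprocal of affinity; `eps` (an assumption) guards identical vectors."""
    return 1.0 / (affinity(cell, antigen) + eps)

def best_match(memory_cells, antigen):
    """Return the memory cell with the greatest stimulation for `antigen`."""
    return max(memory_cells, key=lambda cell: stimulation(cell, antigen))

# Toy two-dimensional data (hypothetical values, not from the experiment).
raw = [[170.0, 10.0], [180.0, 20.0], [190.0, 30.0]]
memory_cells = min_max_normalize(raw)
selected = best_match(memory_cells, [0.4, 0.6])
```

Because each dimension is scaled separately, no single input dominates the distance, which is consistent with the independence assumption stated in the text.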
An AIS may require an arduous labeling process in the training dataset for antigen definition (conceptually, the definition of self). However, differentiating between normal and faulty conditions quickly and with high accuracy is crucial. These features and classification functions of AIS algorithms enable the creation of a population of simple classifiers. In this paper, we implemented an AIS algorithm for real-time anomaly detection in semiconductor manufacturing equipment, focusing on the ability of the NS algorithm to distinguish foreign substances from self. The data were obtained by intentionally modifying a process parameter of the gas supply parts.
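The self/non-self decision described in this section can be sketched as a simple distance test against the trained normal exemplars. This is an illustrative reading of the matching rule, not the authors' Algorithm 2: the `threshold` parameter, the function names, and the toy vectors are assumptions, and the source does not state how the match criterion is chosen.

```python
import math

def classify(sample, self_cells, threshold):
    """Label a normalized sample as "self" or "non-self".

    A sample matches when its Euclidean distance to the nearest trained
    normal exemplar is within `threshold` (a hypothetical parameter).
    """
    nearest = min(math.dist(sample, cell) for cell in self_cells)
    return "self" if nearest <= threshold else "non-self"

def is_faulty(sample, self_cells, threshold):
    """Data that fail to match are treated as faulty, as stated in the text."""
    return classify(sample, self_cells, threshold) == "non-self"

# Hypothetical normalized exemplars of the normal state.
normal_cells = [[0.50, 0.50], [0.52, 0.48]]
near = classify([0.51, 0.50], normal_cells, threshold=0.05)
far = classify([0.90, 0.10], normal_cells, threshold=0.05)
```

A smaller threshold flags more samples as faulty, so in practice it would need to be tuned against labeled runs.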

Experiment and Data Acquisition
Semiconductor equipment consists of several system modules, such as the RF power, process, and gas delivery modules. Each module comprises numerous submodules and parts [25]. These parts degrade over time and eventually fail to function properly. Plasma etch process equipment is sensitive to many parameters. The amount of gas flowing through a mass flow controller (MFC) directly affects the chemical reactants in the plasma, which determine the etch profile and selectivity. We therefore focused on the performance degradation of MFCs in the plasma etch system. Experimental data were obtained under a plausible scenario of wear in the MFC, which controls the flow rate of gas into the process chamber [41]. We employed a 300 mm plasma etch system with 13.56 MHz RF-powered inductively coupled plasma reactive ion etching (ICP-RIE) for the experiment. Experimental runs were conducted under the scenario that the MFC for SF6 degraded over the process time during an anisotropic silicon trench etch using SF6/O2/Ar. We mimicked the malfunction by intentionally modifying the set value of the SF6 MFC by ±2 sccm, assuming that the parameter set point was not disturbed. In other words, although the actual amount of injected gas differed from the recipe in steady state, it is assumed that the MFC and the equipment do not recognize the difference by themselves. Table 2 presents the process recipe of the experiment. Collecting and monitoring the SVID data in real time is essential during the experiment, since a value drifts slightly even when it is set to a constant. These subtle changes are significant because the process is highly sensitive to the plasma state. The plasma is greatly influenced by process recipe parameters such as bias power and gas flow rate during its formation. To acquire real-time SVID data from the semiconductor manufacturing equipment, we used High-Speed SECS Message Services (HSMS) communication and MariaDB.
Figure 3 shows the overall data flow. HSMS is a standard transport protocol in semiconductor factories. It is a SEMI standard that runs over transmission control protocol/internet protocol (TCP/IP) on Ethernet and is an alternative to the simpler SECS-I protocol [42]. Because HSMS uses TCP/IP over Ethernet, which is faster than RS-232 communication, it can transfer a larger amount of data. In addition, the equipment hardware is easy to configure, since only a local area network (LAN) environment is required. We wrote the HSMS communication program in Visual Basic; the communication connection state diagram is shown in Figure 4, and Table 3 describes the connection method. The red dot in Figure 4 indicates the initial communication status. If there is no TCP/IP connection to the equipment, communication starts at the red dot before number 1; if it is connected, it starts at the dot in the connected status box. After the connection is confirmed through the Stream 1 Function 1 (S1F1) command, communication between the server computer and the semiconductor equipment is started through the S2F23 command, and the transaction message, that is, the SVID data value, is received with S6F1. The connection must be established and in the HSMS selected state to receive data. Through the program, the two types of data, SVID and OES, were obtained simultaneously at a time interval of 1 s from the described faulty process scenario with intentional modifications of the gas flow rate. Among the 948 SVIDs generated in the process module, the key parameters shown in Table 4 were selected. The values of each key parameter are normalized to the range [0, 1] and used as the input data in the AIS modeling.
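To make the message exchange above more concrete, the following Python sketch frames an HSMS data message such as S1F1. The authors' program was written in Visual Basic and is not reproduced here. The layout used (a 4-byte big-endian length, then a 10-byte header holding the session ID, the W-bit with the stream number, the function number, PType, SType, and 4 system bytes) follows our understanding of SEMI E37 and should be checked against the standard. The function names and the session and system byte values are hypothetical.

```python
import struct

HEADER_FORMAT = ">HBBBBI"  # session ID, W-bit|stream, function, PType, SType, system bytes

def hsms_data_message(session_id, stream, function, system_bytes,
                      wait_bit=True, body=b""):
    """Frame a SECS-II data message for transport over HSMS."""
    if not 0 <= stream < 128 or not 0 <= function < 256:
        raise ValueError("stream must fit in 7 bits and function in 8 bits")
    byte2 = (0x80 if wait_bit else 0x00) | stream  # W-bit is the top bit
    header = struct.pack(HEADER_FORMAT, session_id, byte2, function,
                         0, 0, system_bytes)  # PType 0, SType 0: data message
    return struct.pack(">I", len(header) + len(body)) + header + body

def parse_hsms_header(frame):
    """Decode the length prefix and the 10-byte header of a frame."""
    (length,) = struct.unpack(">I", frame[:4])
    session_id, byte2, function, ptype, stype, system_bytes = struct.unpack(
        HEADER_FORMAT, frame[4:14])
    return {
        "length": length,
        "session_id": session_id,
        "wait_bit": bool(byte2 & 0x80),
        "stream": byte2 & 0x7F,
        "function": function,
        "ptype": ptype,
        "stype": stype,
        "system_bytes": system_bytes,
    }

# S1F1 with the W-bit set and an empty body (session and system bytes are made up).
s1f1 = hsms_data_message(session_id=0, stream=1, function=1, system_bytes=1)
info = parse_hsms_header(s1f1)
```

A real client would also need the select handshake and SECS-II item encoding for the message body, both of which are outside this sketch.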
Process information and the set and real values of the gas flow rate are selected as main parameters to identify malfunctions due to MFC aging. Considering that the wavelength intensity corresponding to the thin film material to be removed in the plasma etch process can be monitored and its change can be detected through OES data [43], the change in the plasma due to MFC aging is detected to determine whether there is an anomaly in the equipment state. OES peak intensities related to F, O, and Ar seemed to be the most significant variables in the analysis of plasma condition. The selected species and wavelengths are shown in Table 5.

No.   Parameter
      Pre-selected intensity value of Wavelength #4
332   Pre-selected Wavelength #4
106   Pre-selected intensity value of Wavelength #5
337   Pre-selected Wavelength #5
107   Pre-selected intensity value of Wavelength #6
342   Pre-selected Wavelength #6
108   Pre-selected intensity value of Wavelength #7
347   Pre-selected Wavelength #7
109   Pre-selected intensity value of Wavelength #8
352   Pre-selected Wavelength #8

Table 5. Selected optical emission spectroscopy (OES) peak wavelengths.

Experiment Results
Intentionally modifying the gas flow rate in the experiment showed that it influenced the plasma and the process results. The etch rate and the OES intensities change with the SF6 gas flow rate according to the considered faulty process scenario. Figure 5 shows cross-sectional scanning electron microscope (SEM) images for SF6 gas flow rates increased to (a) 174, (b) 176, (c) 188, and (d) 190 sccm. Although the trend is not strictly linear, the etch depth appears to decrease as the gas flow rate increases. This phenomenon occurred because the number of collisions in the plasma increased as a result of the increased number of gas molecules at a given pressure. OES data support our hypothesis on the collisions. In Figure 6, three of the eight selected OES wavelengths, namely, 705 nm for F, 779 nm for O, and 357 nm for Ar, are presented. The SF6 gas flow rate and the OES intensity tend to be inversely related. Electrons in the plasma are created by electron collisions, and the plasma glow discharge occurs when energetically excited electrons lose their energy. It is inferred that the increased gas flow increased the collisional cross-section in the plasma, which is reflected in the changes in OES intensities at the related wavelength peaks. We observed that the OES intensity decreases as the gas flow rate increases. This can be explained by the electronegativity of the gases. As the SF6 gas becomes ionized, it generates F radicals, which are strongly electronegative and trap free electrons around them. O2 is strongly electronegative as well. Therefore, the number of free electrons decreased substantially, and the number of excited electrons decreased accordingly [44]. The opposite case, in which the OES intensity increases as the gas flow rate decreases, can be explained in the same way.
In conclusion, as the gas flow rate increased, the plasma changed such that the OES intensity decreased, and the etch depth decreased, as shown in Table 6.
Table 6. Phenomena caused by changes in SF6 gas flow rate.

Modeling and Results
We modeled two cases using different types of input data: (1) SVID equipment data with selected OES data and (2) selected OES data only. For the first case, the inputs were SVID 434, SVID 440, and SVID 491, which hold the real gas flow rate values, together with eight OES intensities at different wavelengths (as shown in Table 5). For the second case, only the eight OES inputs were used, so that the modeling results could be compared.
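As a minimal sketch of how the two input sets could be assembled, the following Python snippet selects either the three gas flow SVIDs plus eight OES intensities or the eight OES intensities alone. The dictionary keys (`SVID_434`, `OES_1`, and so on) and the record values are hypothetical placeholders, not the authors' data schema.

```python
# Hypothetical column names; the source gives only the SVID numbers.
SVID_COLUMNS = ["SVID_434", "SVID_440", "SVID_491"]
OES_COLUMNS = [f"OES_{i}" for i in range(1, 9)]  # eight selected wavelengths

def select_inputs(record, use_svid):
    """Build one input vector for case (1) SVID + OES or case (2) OES only."""
    columns = (SVID_COLUMNS if use_svid else []) + OES_COLUMNS
    return [record[name] for name in columns]

# One made-up, already normalized record sampled at a single time step.
record = {name: 0.5 for name in SVID_COLUMNS + OES_COLUMNS}
case1 = select_inputs(record, use_svid=True)
case2 = select_inputs(record, use_svid=False)
```

Keeping the OES columns identical in both cases means any accuracy difference can be attributed to the added gas flow SVIDs.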
The model predictions of whether the equipment is in a normal state are as follows. The prediction accuracies of the two modeling cases, with the selected SVID and OES data and with OES data alone, are 94.69% and 93.68%, respectively. Both models showed high classification accuracy, considering that the set value of the gas flow rate was changed by only 2 sccm in each process to simulate minute process shifts and that the real value of gas injected into the equipment fluctuated by ±1 sccm. Although the real gas flow rate values could be used as model inputs, the simulated experiment verified that OES data alone are sufficient to detect an MFC failure. The change in SF6 gas flow that we simulated causes a change in the OES data. As the SF6 gas flow decreases, the ratio of O2 in the total gas increases. Since O2 gas has a lower electronegativity than SF6 gas, an increase in the O2 ratio leads to an increase in electron density [45]. For this reason, the increase in O2 increases the F radicals related to Si etching, and the etch rate increases, as shown in Figure 5 [46]. In Figures 6-8, the change of F (703 nm) and O (777 nm) can be seen in the fault data where the SF6 gas flow is changed. The results reveal that OES is a useful tool for real-time equipment fault diagnosis, especially for gas-related faults.
It should be noted that the purpose of this research is to detect unnoticed equipment faults and classify the causes of faults. We demonstrated FDC with respect to the gas flow rate. The prediction accuracies of the two models are shown in Table 7. To visualize the classification results of the two models in two dimensions, the predictions are plotted against two of the eight OES wavelength inputs in Figure 7. Figure 7a,b show the classification results of the model using SVID and OES data, and Figure 7c,d show those of the model using only OES data. In addition, in Figure 7b,d, the prediction accuracy is expressed in detail through the figure labels. When the model predicted the process result state as normal, the point is labeled "T_normal" if the state was actually normal and "F_normal" if it was actually abnormal. In the same way, when the model predicted the state as abnormal, the point is labeled "T_abnormal" if the prediction was correct and "F_abnormal" if not. Figure 8 shows in a 3D plot that the high-dimensional data were well classified. As a result, we confirmed the feasibility of the new FDC methodology, namely equipment abnormality detection using the AIS algorithm. In addition, it was confirmed that the abnormal state can be identified with only OES data, which represent the real-time plasma state, since the accuracy was similar to that of the model using the selected SVID and OES data, which include the real value of the gas flow rate entering the equipment.
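The outcome labels used in Figure 7 map directly onto an accuracy calculation, sketched below in Python. The labels "T_abnormal" and "F_abnormal" are assumed here by analogy with "T_normal" and "F_normal", and the example predictions are invented for illustration; they are not the experimental results.

```python
from collections import Counter

def outcome_label(predicted, actual):
    """Prefix T when the prediction is correct, F otherwise."""
    prefix = "T" if predicted == actual else "F"
    return f"{prefix}_{predicted}"

def accuracy(predictions, actuals):
    """Return (accuracy, per-label counts) for paired state labels."""
    counts = Counter(outcome_label(p, a) for p, a in zip(predictions, actuals))
    total = sum(counts.values())
    correct = counts["T_normal"] + counts["T_abnormal"]
    return correct / total, counts

# Invented example with one outcome of each kind.
predicted = ["normal", "normal", "abnormal", "abnormal"]
actual = ["normal", "abnormal", "abnormal", "normal"]
score, label_counts = accuracy(predicted, actual)
```

Reporting the four counts alongside the accuracy shows whether errors are mostly missed faults ("F_normal") or false alarms ("F_abnormal").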

Conclusions
This research proposes real-time FDC of semiconductor etch processes using AIS algorithms, which are modeled on immunological response systems. AIS, the newly proposed algorithm for FDC, is described in Section 3, and the experimental scenario with intentional modification of the gas flow rate and the two types of data used for modeling are described in Section 4. In Section 5, we showed that a change in the SF6 gas flow rate affects the plasma and demonstrated FDC using the AIS algorithm. The prediction accuracy of the model using selected SVID and OES data was 94.69%, and the accuracy of the model using OES data alone was 93.68%. Both models classified high-dimensional data well and demonstrated high accuracy. In addition, unlike other machine learning-based methods that are widely used for implementing FDC, the AIS algorithm has shown self-regulation, which means that there is no need to optimize any hyperparameters [38]. It was possible to learn and predict without over-fitting, even though there were only 2408 training samples. In this paper, we used labeled training data for accurate learning rather than taking advantage of the ability of the AIS algorithm to work without data labeling. In addition, although actual semiconductor equipment is composed of various parts, we proposed and completed the FDC methodology assuming the aging of only a single part. A future objective is to identify the cause of a fault, as well as to detect it, in consideration of the various parts constituting semiconductor equipment. We recommend additional experiments that simulate the aging of various parts. The research should also be extended to modeling that accounts for the fact that the training data dimensions actually influence each other, rather than treating them as independent.
In conclusion, we confirmed that the abnormal state caused by MFC aging can be detected by monitoring the plasma state based on OES data and that the new FDC methodology using an AIS algorithm is feasible. Just as the immune system protects the health of the human body by detecting invading foreign substances such as viruses, when a part of semiconductor equipment malfunctions, the new FDC methodology can quickly identify the equipment abnormality by itself through the AIS algorithm using OES plasma monitoring data.