4.1. Next Generation Fault Detection and Classification (NG-FDC)
While fault detection has been an integral component of semiconductor manufacturing for at least a decade and provides significant benefits (such as scrap reduction, quality improvement, and equipment maintenance indications), it continues to be plagued by high setup costs and high rates of false and missed alarms. This fact was underscored at the 2015 Integrated Measurement Association APC Council meeting, held in conjunction with the 2015 APC Conference ([46] and [6]). Here, top APC specialists from the microelectronics manufacturing user and supplier communities reached consensus on the following points: (1) “[There is a] need for automation of front-end of FD from model building to limits management, but keep in process and equipment expertise”; and (2) a major pain point in today’s FDC systems is false and/or missed alarms. This observation was verified in the field; for example, a deployment expert indicated that it often takes up to two weeks to correctly configure univariate FD for a process tool, including collecting data, refining limits, and correlating limits violations to actual events of importance, and that even after this process is completed there are often too many false or missed alarms associated with a particular FD model [29].
The big data evolution afforded an opportunity in semiconductor manufacturing to provide an improved FDC capability that addresses these key pain points [6]. The FD portion of this NG-FDC technology contains two key components. The first component is trace-level automated analysis, which uses data-driven multivariate analysis (MVA) techniques to detect and characterize anomalies [48]. It has the advantage of being a one-size-fits-all approach to analysis that is easy to manage, and it can detect patterns in the data that are not readily apparent or not suggested by an SME. However, because it does not incorporate SME very well, it can result in high numbers of false and missed alarms, lack of appropriate prioritization of alarms, etc. [6].
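To make the data-driven trace-level idea concrete, the following minimal sketch scores summarized sensor traces with a PCA model using Hotelling T-squared and squared prediction error (Q) statistics; the data, variable names, and 99th-percentile limits are illustrative assumptions and not the specific method of [48]:

# Illustrative sketch (not the method of [48]): PCA-based anomaly scoring of
# summarized sensor traces. Data, dimensions, and thresholds are hypothetical.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))          # one row per historical "good" run, one column per trace summary
X_new = rng.normal(size=(5, 12))        # runs to be scored

scaler = StandardScaler().fit(X)
Z = scaler.transform(X)
pca = PCA(n_components=4).fit(Z)

def t2_and_spe(pca, z):
    """Hotelling T^2 in the retained subspace and squared prediction error (Q)."""
    scores = pca.transform(z)
    t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)
    recon = pca.inverse_transform(scores)
    spe = np.sum((z - recon)**2, axis=1)
    return t2, spe

t2_ref, spe_ref = t2_and_spe(pca, Z)
t2_lim, spe_lim = np.percentile(t2_ref, 99), np.percentile(spe_ref, 99)  # empirical limits

t2, spe = t2_and_spe(pca, scaler.transform(X_new))
flags = (t2 > t2_lim) | (spe > spe_lim)   # runs flagged as anomalous
print(flags)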
The second component, semi-automated trace partitioning, feature extraction, and limits monitoring, directly addresses the pain points identified in the aforementioned APC Council industry meeting. The approach uses trace partitioning, feature extraction, and limits setting techniques that are semi-automated, allowing the incorporation of SME. With this approach, model building times can be reduced by orders of magnitude, depending on the level of SME incorporation, and results indicate that false and missed alarm rates are improved ([47], [29], and [6]).
The trace partitioning component is illustrated in Figure 5. Common patterns of interest are determined by analyzing historical data and consulting with SMEs. Indeed, many of these patterns are signals that would typically result from a physical process such as turning on a switch (piecewise continuous step function), vibration or underdamped actuation (oscillation), a momentary disturbance (spike), or drift (ramp). Using a variety of search techniques organized in a hierarchical fashion, the boundaries of these features are determined and the sensor trace is partitioned; these techniques are described in more detail in [47]. Techniques specific to each feature type are then used to extract each feature model. Note that the solution is configurable in that there are settable parameters for each feature type used to: (1) distinguish a feature (sensitivity); and (2) distinguish between features (selectivity). In addition, the method is extensible in that additional feature types can be added to the detection hierarchy. SME is used to: (1) select the sensitivity and selectivity parameters; (2) choose, among the candidate features, which to extract and monitor for FD; and (3) fine-tune partitions and model parameters as necessary.
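As a rough illustration of this kind of semi-automated partitioning (not the hierarchical algorithm of [47]), the sketch below scans a single sensor trace for spike and step candidates using SME-tunable sensitivity parameters; the thresholds and the robust noise estimate are assumptions:

# Illustrative sketch of trace partitioning into candidate features; the
# thresholds (sensitivity parameters) and noise estimate are hypothetical.
import numpy as np

def partition_trace(y, step_sens=3.0, spike_sens=5.0):
    """Return candidate (feature_type, start, end) segments for a 1-D trace y."""
    d = np.diff(y)
    # Robust noise scale of the first difference (median absolute deviation).
    sigma = np.median(np.abs(d - np.median(d))) / 0.6745 + 1e-12
    features, spike_idx = [], set()
    # Spikes: a large excursion that is immediately reversed on the next sample.
    for i in range(1, len(d)):
        if abs(d[i - 1]) > spike_sens * sigma and np.sign(d[i]) == -np.sign(d[i - 1]):
            features.append(("spike", i, i + 1))
            spike_idx.update((i - 1, i))
    # Steps: a sustained level shift (large derivative that is not part of a spike).
    for i in range(len(d) - 1):
        if i in spike_idx:
            continue
        if abs(d[i]) > step_sens * sigma and abs(d[i + 1]) < step_sens * sigma:
            features.append(("step", i, i + 1))
    # Ramps and oscillations could be detected analogously and added to the hierarchy.
    return features

# Usage on a synthetic trace: flat section, step up, then a spike.
y = np.concatenate([np.zeros(50), np.ones(50)])
y[75] += 4.0
print(partition_trace(y))   # -> [('spike', 75, 76), ('step', 49, 50)]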
Figure 6 summarizes the results of applying this NG-FDC capability to a public etch data set ([47] and [6]). Three approaches to analysis were compared: (1) whole-trace statistics, in which the whole-trace mean and variance were used to determine faults; (2) manual windowing, in which best-guess manual techniques were used for partitioning and feature extraction; and (3) semi-automated trace partitioning and feature extraction as defined above. A self-organizing map (SOM) clustering approach was used to optimize the alarm limits setting for each approach [42]. The results indicate that semi-automated trace partitioning and feature extraction outperforms the other approaches in terms of reduced false alarms (false positive rate) and missed alarms (1 minus the true positive rate).
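For reference, the two metrics compared here can be computed directly from run-level labels; the sketch below is a generic illustration with synthetic data rather than the study's evaluation code:

# Illustrative computation of the two metrics compared in the study:
# false alarm rate (FPR) and missed alarm rate (1 - TPR). Data is synthetic.
import numpy as np

def alarm_rates(fault, alarm):
    """fault, alarm: boolean arrays, one entry per monitored run."""
    fault, alarm = np.asarray(fault, bool), np.asarray(alarm, bool)
    fpr = np.sum(alarm & ~fault) / max(np.sum(~fault), 1)     # false alarms / healthy runs
    missed = np.sum(~alarm & fault) / max(np.sum(fault), 1)   # missed alarms / faulty runs
    return fpr, missed

fault = np.array([0, 0, 0, 1, 1, 0, 1, 0], bool)
alarm = np.array([0, 1, 0, 1, 0, 0, 1, 0], bool)
print(alarm_rates(fault, alarm))   # -> (0.2, 0.333...)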
With semi-automated trace partitioning and feature extraction in place, a candidate set of FD models can automatically be presented to the SME for down-selection. The down-selection process can be aided if supervised data are available, as the relationship between features and process or equipment faults can then be determined. After the appropriate final list of FD models has been determined, monitoring limits must be set to provide a balance between false and missed alarms. As shown in Figure 7, a receiver operating characteristic (ROC) curve can be plotted for each FD model by adjusting its limits [49]. If the limits are very wide, the model is insensitive, so there are few false alarms but high levels of missed alarms. If the limits are very tight, the reverse is true. The challenge is to determine an optimal limit setting that balances the occurrences of true and false positives against the cost of these events in the particular application. A solution to this problem involves plotting a cost function on top of the ROC and is detailed in [50].
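A simple way to picture this limit optimization (not the exact cost formulation of [50]) is to sweep the limit over an FD statistic, trace out the ROC, and select the limit that minimizes an application-specific expected cost; the scores, costs, and fault rate below are hypothetical:

# Sketch of limit optimization over an ROC (costs and data are hypothetical):
# sweep the limit, compute TPR/FPR, and pick the limit minimizing expected cost.
import numpy as np

rng = np.random.default_rng(1)
score_healthy = rng.normal(0.0, 1.0, 500)    # FD statistic on healthy runs
score_faulty = rng.normal(3.0, 1.0, 50)      # FD statistic on faulty runs

C_FALSE, C_MISSED = 1.0, 20.0                # relative cost of each alarm type
P_FAULT = len(score_faulty) / (len(score_faulty) + len(score_healthy))

limits = np.linspace(-2, 6, 200)
fpr = np.array([(score_healthy > t).mean() for t in limits])
tpr = np.array([(score_faulty > t).mean() for t in limits])

# Expected cost per run as a function of the limit.
cost = C_FALSE * fpr * (1 - P_FAULT) + C_MISSED * (1 - tpr) * P_FAULT
best = np.argmin(cost)
print(f"optimal limit ~ {limits[best]:.2f}, FPR={fpr[best]:.3f}, missed={1 - tpr[best]:.3f}")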
After models and limits have been set up for a system, they must be managed in an environment that is oftentimes very dynamic, with numerous process and product disturbances and context changes (e.g., product changes). Model and limits management remains a challenge, but it is often accomplished with a variety of tools for model pre-treatment to address context, continuous model updating to support process dynamics, and model and limits update triggering with rebuilding or retargeting [51].
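As one hypothetical illustration of update triggering (not the specific tooling of [51]), a limits retarget can be requested when a monitored feature's recent mean drifts too far from the modeled baseline:

# Sketch of a simple limits-retarget trigger; the window, tolerance, and
# baseline values are illustrative assumptions.
import numpy as np
from collections import deque

class RetargetTrigger:
    def __init__(self, baseline_mean, baseline_std, window=50, tol=1.5):
        self.baseline = baseline_mean
        self.std = baseline_std
        self.buf = deque(maxlen=window)
        self.tol = tol                      # drift tolerance in baseline sigmas

    def update(self, value):
        """Return True when the feature model/limits should be retargeted."""
        self.buf.append(value)
        if len(self.buf) < self.buf.maxlen:
            return False
        drift = abs(np.mean(self.buf) - self.baseline) / self.std
        return drift > self.tol

# Usage: stream feature values and note the first run at which a retarget fires.
trig = RetargetTrigger(baseline_mean=10.0, baseline_std=0.5)
fired = [trig.update(v) for v in 10.0 + np.linspace(0, 2.0, 120)]
print(fired.index(True))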
4.2. Predictive Maintenance (PdM)
The concept of PdM in the semiconductor industry grew out of the high cost of unscheduled downtime, which includes the cost of yield loss and maintenance in addition to lost production time. Practitioners realized that univariate FD systems could sometimes be leveraged to detect trends in a particular variable; these trends could then be extrapolated to estimate the approximate remaining useful life (RUL) of a particular component. Scheduled maintenance could then be adjusted to reduce the occurrence of unscheduled downtime or to extend uptime (if scheduled maintenance is overly conservative). However, as noted earlier, semiconductor manufacturing processes are characterized by complexity and variability, and are subject to frequent disturbances. As a result, the univariate FD extrapolation mechanisms for predicting maintenance mentioned above are generally not optimal, robust, or even maintainable.
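The univariate extrapolation practice described above can be illustrated with a minimal sketch: fit a linear trend to a drifting FD indicator and extrapolate to a failure threshold to estimate RUL. The indicator, threshold, and units below are assumptions:

# Minimal sketch of univariate trend extrapolation to estimate remaining
# useful life (RUL). Data and the failure threshold are illustrative.
import numpy as np

t = np.arange(30.0)                               # runs (or days) observed so far
indicator = 0.8 * t + np.random.default_rng(2).normal(0, 2.0, t.size)
FAIL_LIMIT = 60.0                                 # hypothetical failure threshold

slope, intercept = np.polyfit(t, indicator, 1)    # linear degradation trend
rul = (FAIL_LIMIT - (slope * t[-1] + intercept)) / slope if slope > 0 else np.inf
print(f"estimated RUL ~ {rul:.1f} runs")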
The approach to PdM that seems to be most effective in semiconductor manufacturing uses FD or NG-FDC output data along with maintenance data, context data, process data, and potentially process metrology data to develop predictive models off-line using MVA techniques [6]. The goal, as shown in Figure 8a, is to develop a predictor that allows the user to predict a future failure, along with an estimated time-to-failure (TTF) horizon and some indication of the confidence in, or range of, the prediction. The failure indication trigger can be as simple as a threshold (as shown) or a more complex analysis of a degradation profile. The trigger is set based on the confidence of the prediction, but also to provide a TTF that is useful to the user (e.g., the approximate time necessary to align maintenance resources).
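A sketch of the kind of predictor behavior described for Figure 8a follows: a degradation trend is extrapolated to a failure threshold, and the TTF estimate is reported with a crude bootstrap range. The health index, threshold, and bootstrap approach are illustrative assumptions, not the paper's model:

# Sketch of a threshold-triggered failure predictor with a TTF estimate and
# a rough confidence range (all values and the method are hypothetical).
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(40.0)
health = 100.0 - 1.2 * t + rng.normal(0, 3.0, t.size)   # synthetic health index
FAIL_THRESHOLD = 20.0                                    # failure indication trigger

# Bootstrap the trend fit to get a range on the TTF estimate.
ttf_samples = []
for _ in range(500):
    idx = rng.integers(0, t.size, t.size)
    slope, intercept = np.polyfit(t[idx], health[idx], 1)
    if slope < 0:
        ttf_samples.append((FAIL_THRESHOLD - intercept) / slope - t[-1])

ttf = np.median(ttf_samples)
lo, hi = np.percentile(ttf_samples, [5, 95])
print(f"TTF ~ {ttf:.1f} (90% range {lo:.1f}..{hi:.1f}) time units")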
The off-line modeling process used to achieve this goal is illustrated in Figure 8b. Note that techniques for data aggregation, treatment, feature selection, model building and analysis, and model optimization through cost-benefit analysis are largely the same as the techniques used in NG-FDC. Indeed, these same techniques can be leveraged for other predictive capabilities such as VM and yield prediction. In the case of PdM, large amounts of data are needed so that failure prediction models accommodate factors such as the potentially long degradation profile of some failure types (which could be months or years), the large number of failure types and failure modes within each failure type, and the impact of process variability and context (e.g., product changes). There is often not sufficient data to fully characterize a particular piece of equipment using purely data-driven methods, and this problem is often not solvable with improved big data practices because the process or equipment changes enough over time to make longer-term data histories somewhat irrelevant. Thus, SME regarding equipment, equipment components, and processes is usually leveraged heavily in PdM model building and maintenance.
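The flavor of this off-line modeling flow (aggregation, treatment, feature selection, model building) can be sketched as a generic supervised pipeline; the libraries, feature set, label construction, and classifier choice below are all assumptions for illustration, not the paper's implementation:

# Schematic sketch of an off-line PdM modeling flow: aggregate FD/context
# features per run, select features, and fit a failure-horizon classifier.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 30))          # aggregated FD/context features per run (synthetic)
y = (X[:, 3] + X[:, 7] + rng.normal(0, 1, 400) > 1.5).astype(int)  # "fails within horizon"

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),      # data-driven feature selection
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())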
An example of the application of PdM to an epitaxial process is shown in Figure 9. Epitaxial equipment is used to grow films, such as oxides, on semiconductor surfaces. Banks of lamps are used to create heat so that a film is deposited or grown evenly to a precise thickness. Utilizing straightforward univariate (UVA) FD, lamp replacement can be predicted only 4–5 h in advance, which leads to a high risk of unexpected lamp failures and unexpected downtime. Utilizing MVA PdM, lamp failure can be predicted five days in advance with about 85% accuracy, reducing unscheduled downtime and increasing throughput. The potential impact is estimated at $108K (USD) per process chamber per year. The solution is robust to process disturbances, including R2R control process tuning; virtual sensor models were developed to decouple disturbances from the prediction signal [6].
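The disturbance-decoupling idea can be illustrated with a simple residual scheme, assuming (hypothetically) that the relevant R2R tuning inputs are known: a regression "virtual sensor" predicts the disturbance-driven part of the monitored signal, and the residual carries the slow degradation trend used for prediction. This is a sketch of the general concept, not the implementation in [6]:

# Illustrative sketch of disturbance decoupling via a virtual sensor: regress
# the monitored signal on known disturbance variables and keep the residual.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 300
r2r_inputs = rng.normal(size=(n, 3))               # disturbance-related variables (synthetic)
degradation = np.linspace(0, 5, n)                 # slow lamp-aging trend of interest
signal = r2r_inputs @ np.array([2.0, -1.0, 0.5]) + degradation + rng.normal(0, 0.3, n)

virtual = LinearRegression().fit(r2r_inputs, signal)    # virtual sensor for disturbances
decoupled = signal - virtual.predict(r2r_inputs)        # residual carries the aging trend
print(np.corrcoef(decoupled, degradation)[0, 1])        # should be close to 1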