
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

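A new data dimension-reduction method, called Internal Information Redundancy Reduction (IIRR), is proposed for Optical Emission Spectroscopy (OES) datasets from plasma etching processes.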

As indicated in recent International Technology Roadmap for Semiconductors reports [

Plasma etching is a key processing method employed in IC fabrication steps. By first masking areas of the silicon wafer being processed, subsequent exposure to plasma yields the required etched features on the surface of the wafer. The process is fundamentally complex from a physical and engineering control perspective and sensitive to an array of process parameters [

Generally, there are two types of plasma diagnostic sensors: intrusive sensors and non-intrusive sensors. One popular intrusive technology is the Langmuir probe [

Previous research on OES measurements of plasma etching processes has largely focused on the use of OES data for particular target applications, for example, virtual metrology methods [

Our overall approach to the design of an effective dimension-reduction method for OES data is guided by the following factors: (i) at a fundamental level, emission spectra from chemical species in a plasma are composed of emissions at discrete wavelengths only. Thus, we wish to isolate and work with only peak wavelength intensities in our spectral data, the assumption being that non-peak intensities represent only noise; (ii) as emission lines from each chemical species are highly correlated we expect considerable data redundancy within spectra; (iii) to maximize the utility of the dimension-reduced data, we wish to avoid transforming the data to an abstract variable space (as is common in many dimension-reduction methods), instead working directly with wavelength variables; (iv) as plasma processing is a dynamic process, it is important to preserve time domain information, that is, our focus is on dimension reduction in the wavelength domain only.

From a plasma-etching viewpoint, there has been little focus on dimension and redundancy reduction of the OES dataset per se. Most previous research has been focused on application of the dataset (e.g., for process fault detection) where dimension reduction is used as a data pre-processing step but is not the focus itself. In [

A general feature of these previous applications of dimension reduction to OES data is that generic methods (e.g., PCA, SPCA, or summary statistics) are applied directly to the full set of input wavelength variables, without regard to the specific nature of the dataset. As a result, these methods can have difficulty isolating important variables in the original variable space. For example, it is not possible to trace back to individual wavelength measurements at a certain time point when only summary statistics are the output of the method [

Other general dimension-reduction methods also have disadvantages for direct application to the problem at hand. Ensemble methods have been shown to be successful in identifying important variables in the original space [

Based on the above-mentioned difficulties in directly applying general dimension-reduction methods in our specific domain, we propose in this paper a new method, called Internal Information Redundancy Reduction (IIRR).

A complete dataset of OES data comprises time-stamped spectral scans collected over multiple etching process runs. Each scan holds the intensity measured at every wavelength index k at a single time point t_{j}, with j = 1, …, M.

We note here that the sensor employed in OES measurement can saturate its output value at certain wavelengths that are prominent in the process. A method to deal with de-saturation is described in the appendix material, together with the alignment of the timestamps t_{j} to a normalized time scale.

Given the above data as input, our proposed method is a dimension-reduction method with three sub-steps:

Absolute Peak Selection (APS),

Iterative Ranking Process (IRP),

Optimized Peak Selection (OPS).

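Together they comprise the IIRR method. The sub-steps are applied at each time point t_{j} in turn: APS selects the peak wavelengths at t_{j} and IRP ranks them, so that a ranked peak set is obtained for each of t_{1}, t_{2}, …, t_{M}.

Finally, at each time point, the OPS algorithm calculates a measure of how well the first r ranked peaks represent the full peak set, and uses it to choose how many peaks to retain at t_{j}.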

Absolute Peak Selection (APS) is a simple method to identify wavelength intensity variables that are higher in value than neighboring wavelength intensities, while accounting for noise in wavelength intensity measurements. The noise accompanying each wavelength intensity measurement is represented as a mean bias value at each time point t_{j}.

Having found the peaks in the samples at each time point, APS outputs one set of peak wavelengths for each of t_{1}, t_{2}, …, t_{M}.

Based on our OES dataset from a semiconductor etching process, we have found APS reduces the original 2,048 wavelength variables to a relatively small number of peaks at each time point, ranging from 22 to 113 peaks (averaging ∼47.7 peaks). Over all time points, 178 distinct peak wavelengths are detected.
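As an illustration of the kind of test APS performs, the following minimal sketch flags a wavelength channel as a peak when it exceeds both immediate neighbours by more than a noise margin. The function name `absolute_peak_selection`, the single `noise_margin` parameter and the toy spectrum are assumptions made for illustration; they are not the exact APS criterion or data of this paper.

```python
import numpy as np

def absolute_peak_selection(spectrum, noise_margin):
    """Return indices of channels that exceed BOTH neighbours by more than noise_margin.

    Hypothetical stand-in for APS: the noise is modelled here as one scalar margin.
    """
    s = np.asarray(spectrum, dtype=float)
    centre = s[1:-1]
    higher_than_left = centre - s[:-2] > noise_margin
    higher_than_right = centre - s[2:] > noise_margin
    # +1 converts positions in the trimmed array back to original channel indices
    return np.flatnonzero(higher_than_left & higher_than_right) + 1

# Toy spectrum (assumed values): two clear emission lines on a noisy baseline
spectrum = np.array([1.0, 1.2, 9.0, 1.1, 1.0, 1.3, 1.1, 6.0, 1.0])
peaks = absolute_peak_selection(spectrum, noise_margin=0.5)
print(peaks)
```

The small bump at channel 5 is not selected because it does not clear the noise margin, which is the behaviour the text describes as accounting for measurement noise.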

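The ultimate goal of the IIRR method is a small set of peak wavelength variables that retains the information in the full set. As a step towards this, IRP ranks the peaks found by APS at each time point.

For the set of peak wavelength intensity samples at time point t_{j}, each peak k in the current pool is regressed on the other peaks in the pool, and the quality (R^{2} value) of the prediction of peak k is recorded. The peak with the highest R^{2} value is removed from the pool and assigned the lowest rank not yet taken. The evaluation is then repeated on the reduced pool until every peak has been ranked.

This IRP process is repeated for each time point, yielding a ranked list of peaks for each of t_{1}, t_{2}, …, t_{M}.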

The rationale for the method is that peaks that are removed from the pool early (low ranked peaks) can be well predicted by the remaining pool of peaks and so hold relatively less information. Peaks that remain in the pool are less correlated with others and are ranked higher. We note that in our IRP method, removing peaks and reiterating the evaluation with a decreasing pool size should improve the sensitivity of the ranking between peaks. Particularly, in very highly redundant datasets, a simpler method of ranking based only on a single evaluation of R^{2} over the full pool can yield all R^{2} values very close to 1, giving only a weak distinction between peaks.
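The ranking loop can be sketched as follows for a single time point. This is a minimal sketch, assuming ordinary least-squares regression with an intercept as the predictor; the names `r_squared` and `iterative_ranking` and the synthetic data are hypothetical and chosen only to illustrate the idea.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)

def iterative_ranking(P):
    """Rank the columns of P (samples x peaks); returns column indices, highest rank first."""
    pool = list(range(P.shape[1]))
    removed = []
    while len(pool) > 1:
        scores = []
        for k in pool:
            others = [q for q in pool if q != k]
            scores.append(r_squared(P[:, others], P[:, k]))
        # The best-predicted peak is the most redundant one: remove it first
        most_redundant = pool[int(np.argmax(scores))]
        pool.remove(most_redundant)
        removed.append(most_redundant)
    removed.append(pool[0])
    return removed[::-1]

# Synthetic example (assumed data): column 2 is almost a linear mix of columns 0 and 1
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = 2 * a + 2 * b + 0.01 * rng.normal(size=200)
P = np.column_stack([a, b, c])
ranking = iterative_ranking(P)
print(ranking)
```

In this example the mixed column is removed in the first iteration and so receives the lowest rank, while the two independent columns stay in the pool.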

We apply IRP to the APS output of the OES test dataset mentioned in Section 3.1. The resulting plot shows the R^{2} value of the peak to be removed from the pool at each iteration. At the start of the procedure the R^{2} values are very close to 1 and towards the end of the procedure, only the highest ranked peak variables, with lower R^{2} values, remain. In terms of identifying an opportunity for data reduction, it can be seen that only relatively few high rank peaks have lower R^{2} values. This general pattern was observed at all time points in the IRP output.

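Having ranked peak wavelength variables using IRP, the OPS step decides how many of the highest ranked peaks to retain at each time point t_{j}. For a candidate set made up of the r highest ranked peaks, k_{1}, k_{2}, …, k_{r}, each peak k is regressed on the candidate set, giving an R^{2} value denoted R^{2}_{r,k}. The Mean Determination Ratio, MDR_{r}, is the mean of R^{2}_{r,k} over all peaks k.

Finally, an optimal value of r, denoted \hat{r}_{j}, is chosen at each time point t_{j}, and the retained peaks are k_{1}, k_{2}, …, k_{\hat{r}_{j}}.

Empirically, we have found that as r increases, MDR_{r} rises towards 1 and the gain from each additional peak shrinks. We therefore compare MDR_{r} against a threshold, MDR_{threshold}, to determine the optimal value \hat{r}_{j} at each of the time points t_{1}, t_{2}, …, t_{M}.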
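A minimal sketch of the selection step for one time point is given below. It assumes that the MDR for a candidate set is the mean R^{2} obtained when every peak is regressed (by ordinary least squares) on the r highest ranked peaks, and that the smallest r whose MDR reaches the threshold is kept. Both the averaging rule and the stopping rule, as well as all names, data and the threshold value, are assumptions for illustration only.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)

def mean_determination_ratio(P, ranking, r):
    """Mean R^2 over all peaks when each is predicted from the r highest ranked peaks."""
    top = P[:, ranking[:r]]
    return float(np.mean([r_squared(top, P[:, k]) for k in range(P.shape[1])]))

def optimized_peak_selection(P, ranking, threshold):
    """Return the shortest prefix of the ranking whose MDR reaches the threshold."""
    for r in range(1, P.shape[1] + 1):
        if mean_determination_ratio(P, ranking, r) >= threshold:
            return list(ranking[:r])
    return list(ranking)

# Synthetic example (assumed data): peak 2 is almost fully explained by peaks 0 and 1
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = 2 * a + 2 * b + 0.01 * rng.normal(size=200)
P = np.column_stack([a, b, c])
ranking = [0, 1, 2]  # assumed to come from a prior ranking step, highest rank first
selected = optimized_peak_selection(P, ranking, threshold=0.99)
print(selected)
```

With one peak the mean R^{2} is far below the threshold, while two peaks explain the third almost perfectly, so the sketch keeps the first two.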

In the previous section we have shown that, when applied to our test dataset, the IIRR method can greatly reduce the number of input variables without a significant loss in prediction accuracy from the remaining variables to the full set of original input variables. To further validate the method, in this section we quantify the prediction quality of the reduced set of variables produced by IIRR when predicting an independent output variable, the etch rate. Although this measurement is not normally available from plasma etch process monitoring, for our particular test dataset of spectral data from a real semiconductor etching process, we have a corresponding final etch rate measurement for each process run. Our validation procedure is as follows.

We have 900 process samples (process runs), which we split equally into a training group and a testing group. A process sample contains the time series OES data for the process run plus one final etch rate measurement. The distribution of the etch rate samples in each group is summarized by an estimated probability mass function (PMF).

The IIRR procedure is used to find a reduced set of wavelength measurements using only the training OES sample group. We note that the etch rate variable is not part of the IIRR training input, only the OES training samples.

A sample of the OES measurements before and after the IIRR process is shown in

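We next compare the prediction accuracy of the IIRR reduced dataset to the prediction accuracy when using the full set of OES data (we note that the full dataset is first de-saturated and time normalized, as described in the appendices). Three prediction models are compared, MLR, PLS and ANN, with accuracy reported as R^{2} and MAPE.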

We can see very good R^{2} and MAPE scores for the predictions. Interestingly, for the prediction of the testing dataset, there is better prediction accuracy (for the MLR and PLS cases) when using the IIRR reduced dataset compared to using the full dataset. We attribute this improvement to the noise reduction effect of IIRR. We additionally note that PLS achieves the best result. Beyond the overall R^{2} and MAPE values, there is very good prediction accuracy across all individual samples, as shown in the plot of individual etch rate predictions from the IIRR reduced dataset using PLS.
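For reference, the two accuracy scores can be computed as in the sketch below. It assumes MAPE is reported as a fraction rather than a percentage, and the function names and toy values are hypothetical; they are not data from this study.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (0.05 means 5%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def r2_score(y_true, y_pred):
    """Coefficient of determination of predictions against measured values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy etch-rate values (assumed, arbitrary units)
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(measured, predicted), r2_score(measured, predicted))
```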

We have presented a new dimension-reduction method, IIRR, for OES data from plasma etching processes.

We note that IIRR operates in the original variable space, rather than a transformed variable space, which makes the method useful for OES analysis methods whose goal relates to physical interpretation of the data and process, for example virtual metrology methods. We would also expect the method to be effective for high-dimensional spectral data from other processes, where the dataset represents a set of time series, each of which is an independent sample from the same fundamental process. Although the APS step of the algorithm is specific to OES datasets, the core method (IRP + OPS) could be expected to be effective for other (non-OES) high-dimensional time series datasets, where multiple independent samples of the same (repeatable) underlying process behaviour are available. However, we note a caveat here. As the IRP phase of the method ranks less correlated variables highly, there is a risk of biasing noise for inclusion in the final variable set. In our case, our interpretation of non-peak data as noise and its effective reduction/removal by APS avoids this scenario. For data from other processes, some similar insight into the nature of the noise and an effective noise reduction method would be required, so that a high level of data reduction can be achieved. On the other hand, as our IRP/OPS method is ‘internal’ in nature, not guided or biased by a chosen output variable, it is conservative in terms of attempting to distinguish unexplained variation from noise. This may be useful as a stand-alone method of preparing a universal reduced OES dataset that can be applied to the prediction of multiple different output variables of interest.

Future work will investigate application of the method to other such datasets. We will also consider how redundancy in the time domain can be reduced, which we have not considered in the present paper. In our current OES plasma data, at least for periods when the process is less dynamic, the process is most likely oversampled, and there is an opportunity for further reduction without significant loss of important time domain information.

The authors would like to thank all reviewers for their valuable opinions. The authors also thank Intel Ireland, the Irish Research Council and Dublin City University for their support.

Our particular OES test dataset presents a specific problem of saturated values in some wavelength intensity measurements, which we deal with before inputting our dataset to the IIRR procedure. Given the typical limited Signal-to-Noise Ratio (SNR) in OES measurement (in this case, the SNR is 300:1 at full signal [

Sample of OES measurements.

Our approach is simply to remove all wavelength variables at each time point that exhibit saturation. Our test for saturation is applied to each wavelength k and checks its measured intensity against the maximum sensor output value.
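The removal step can be sketched as below for the samples at one time point. The exact saturation test is an assumption here: a wavelength is treated as saturated if any sample reaches a given maximum value. The function name, the `max_value` parameter and the toy intensities are hypothetical.

```python
import numpy as np

def unsaturated_wavelengths(X, max_value):
    """X holds intensities at one time point, shaped (runs, wavelengths).

    Returns the indices of wavelength columns in which no run reaches max_value.
    Assumed rule: a single saturated reading is enough to discard the wavelength.
    """
    X = np.asarray(X)
    saturated = (X >= max_value).any(axis=0)
    return np.flatnonzero(~saturated)

# Toy data (assumed): three wavelengths, two runs, sensor ceiling of 4095 counts
X = np.array([[10, 4095, 30],
              [12, 4000, 4095]])
kept = unsaturated_wavelengths(X, max_value=4095)
print(kept)
```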

Each etching process run outputs a time series of spectral intensity scans; however, the sequence of timestamps is not necessarily identical from one process run to another. As the IIRR method needs to group all samples at a given time point during its data processing stages, the timestamps need to be aligned to a normalized time scale. The time between samples averages approximately 0.7 s and, over all process samples, the minimum final timestamp is 40.14 s. We set the normalized time scale to have 1 s intervals, with the final timestamp at 40 s. Having set the time scale, the values in each time series (process run) are transformed by linearly interpolating the wavelength intensity values between the points either side of each exact 1 s interval. The process is illustrated for the 585.93 nm wavelength in the accompanying time series plot.
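The interpolation onto a 1 s grid can be sketched with `numpy.interp`, as below. This is a minimal sketch for one wavelength of one process run; it assumes the normalized grid starts at 0 s, and the function name and the toy series are hypothetical.

```python
import numpy as np

def normalize_time_series(timestamps, intensities, step=1.0, final_time=40.0):
    """Linearly interpolate one wavelength's intensities onto a regular time grid.

    The grid is assumed to start at 0 s. numpy.interp holds the end values
    constant for grid points outside the measured timestamp range.
    """
    grid = np.arange(0.0, final_time + step / 2, step)
    return grid, np.interp(grid, timestamps, intensities)

# Toy run (assumed): samples roughly every 0.7 s, intensity growing linearly with time
timestamps = np.array([0.0, 0.7, 1.4, 2.1])
intensities = 10.0 * timestamps
grid, normalized = normalize_time_series(timestamps, intensities, final_time=2.0)
print(grid, normalized)
```

With the default arguments the grid runs from 0 s to 40 s in 1 s steps, giving 41 time points per wavelength.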

Original and normalized time series of the 585.93 nm wavelength intensity.

The authors declare no conflict of interest.

Workflow of IIRR.

R^{2} values from regressing the peak to be removed on the remaining peaks in the pool, for each IRP iteration.

The MDR values for differently sized candidate peak sets.

Estimated Probability Mass Function (PMF) of etch rates for the training and testing groups.


Individual etch rate predictions from the IIRR reduced dataset using PLS.

Acronym table.

| Acronym | Definition |
|---|---|
| ANN | Artificial Neural Network |
| APS | Absolute Peak Selection |
| FA | Factor Analysis |
| IC | Integrated Circuit |
| ICA | Independent Component Analysis |
| IIRR | Internal Information Redundancy Reduction |
| IRP | Iterative Ranking Process |
| MDR | Mean Determination Ratio |
| MAPE | Mean Absolute Percentage Error |
| MLR | Multiple Linear Regression |
| OES | Optical Emission Spectroscopy |
| OPS | Optimized Peak Selection |
| PC | Principal Component |
| PCA | Principal Component Analysis |
| PLS | Partial Least Squares |
| PMF | Probability Mass Function |
| SPCA | Sparse Principal Component Analysis |
| SNR | Signal to Noise Ratio |

Etch rate prediction accuracy comparison between the original (full) OES dataset and the IIRR reduced dataset obtained with the chosen threshold.

| Method | Dataset | Training R^{2} | Training MAPE | Testing R^{2} | Testing MAPE |
|---|---|---|---|---|---|
| MLR | IIRR Reduced Dataset (224) | 0.9930 | 0.0024 | 0.9430 | 0.0070 |
| MLR | Complete Dataset (2,048 × 41) | 0.9944 | 0.0021 | 0.9329 | 0.0074 |
| PLS | IIRR Reduced Dataset (224) | 0.9802 | 0.0041 | 0.9705 | 0.0051 |
| PLS | Complete Dataset (2,048 × 41) | 0.9805 | 0.0041 | 0.9676 | 0.0053 |
| ANN | IIRR Reduced Dataset (224) | 0.9710 | 0.0042 | 0.9049 | 0.0084 |
| ANN | Complete Dataset (2,048 × 41) | Input too large for computation | | | |