Review Reports - Multi-Domain Data Integration for Plasma Diagnostics in Semiconductor Manufacturing Using Tri-CycleGAN

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The idea presented in the paper is promising, but there are a few important considerations that should be addressed before it can be considered for publication. Here are a few questions to evaluate:

Is the integration of Optical Emission Spectroscopy (OES), Quadrupole Mass Spectrometry (QMS), and Time-of-Flight Mass Spectrometry (ToF-MS) sufficiently justified, and is their role in plasma-based process monitoring clearly explained?
Does the introduction of the Tri-CycleGAN model offer a genuine technical advancement over existing methods, and is the novelty of using three interconnected CycleGANs well-established?
Are the experimental results comprehensive enough to validate the effectiveness of the model under different plasma conditions, and is the data comparison with traditional methods clearly presented?
Does the paper adequately address the practical implications and limitations of implementing this multi-domain diagnostic integration for real-world semiconductor manufacturing processes?
More references including AI and CNN for EEG and optical fiber applications need to be added in the introduction section: a. Cognitive State Classification Using Convolutional Neural Networks on Gamma-Band EEG Signals - Applied Sciences, 2024. b. Color image identification and reconstruction using artificial neural networks on multimode fiber images: Towards an all-optical design - Optics Letters, 2018

Comments on the Quality of English Language

no comments.

Author Response

Response to Comments of Reviewer
The idea presented in the paper is promising, but there are a few important considerations that should be addressed before it can be considered for publication. Here are a few questions to evaluate:

Q1. Is the integration of Optical Emission Spectroscopy (OES), Quadrupole Mass Spectrometry (QMS), and Time-of-Flight Mass Spectrometry (ToF-MS) sufficiently justified, and is their role in plasma-based process monitoring clearly explained?

→ A1) The primary components generated when plasma is formed include radicals, ions, and excited electrons, which emit light. Radicals can be measured using QMS and ToF-MS, as they decompose during the etching process [1]. In contrast, OES is the only analytical tool suitable for use on production lines due to its ability to collect emitted light information non-invasively, without affecting plasma generation. However, OES has limitations in detecting low-intensity species and often encounters overlapping spectra. Similarly, QMS and ToF-MS can experience overlapping signals. Therefore, all three plasma diagnostic tools are essential to accurately identify and resolve overlaps and intersections in OES spectra. The ability to predict results from the other two diagnostic methods using only OES would enhance the accuracy of plasma diagnostics [2].

To address Reviewer #1's question regarding the integration of diagnostics we have added further clarification in the manuscript to explain their specific roles in plasma-based process monitoring:

→ (P. 2, Line. 36) “The generation of plasma involves the formation of several key components, including radicals, ions, neutral species, and excited electrons, which subsequently emit light [12]. To effectively monitor and control the chemical reactions within plasma, it is essential to diagnose its state during the process using various methods [13].”

→ (P. 2, Line. 42) “OES (Optical Emission Spectroscopy) analyzes the light emitted from excited radicals at various wavelengths, offering crucial insights into the electron energy levels of different species within the plasma [18]. However, it faces limitations, particularly in detecting species with low emission intensities or those that do not emit visible light, which can result in incomplete diagnostics. On the other hand, mass spectrometry-based techniques, such as Quadrupole Mass Spectrometer (QMS) [17, 19, 20] and Time-of-Flight Mass Spectrometry (ToF-MS) [21, 22], provide a more comprehensive detection capability, including neutral species and those with low emission characteristics, offering a more complete picture of the plasma state. Nevertheless, both QMS and ToF-MS are invasive, requiring the extraction of gas samples from the chamber, which may disturb the plasma. Furthermore, unstable species critical to plasma processes can degrade during sampling, reducing the accuracy of the diagnostics.”

→ (P. 2, Line. 78) “Additionally, cross-referencing overlapping spectral data from OES and mass data from QMS and ToF-MS enabled more precise identification of the species generated in the plasma.”

Q2. Does the introduction of the Tri-CycleGAN model offer a genuine technical advancement over existing methods, and is the novelty of using three interconnected CycleGANs well-established?

→ A2) The technical advancement of the Tri-CycleGAN model lies in applying a method conventionally used in image domains to semiconductor data. Specifically, the model handles diverse time-series data from OES, QMS, and ToF-MS, which requires resolving discrepancies between domains. To address this, we introduced Self-Attention and a Direct Loss Function to mitigate domain mismatches and facilitate seamless data transformations. This approach allows the model to maintain high accuracy across domains and reduces the information loss typically encountered in multi-domain transformations. By applying these improvements, the Tri-CycleGAN model achieves more precise and reliable data transformations, which marks a technical advancement for processing complex, multi-domain data in semiconductor manufacturing.

Regarding the novelty of the three interconnected CycleGANs, the Tri-CycleGAN demonstrates the ability to integrate and synchronize data across three diagnostic domains—OES, QMS, and ToF-MS—while preserving each domain's unique data characteristics. This interconnected structure not only enables accurate domain-specific transformations but also provides scalability and robustness for handling more than two domains. It offers a powerful framework for integrating and cross-referencing time-series data across multiple domains, reducing ambiguity and improving data accuracy. This multi-domain capability demonstrates the model's flexibility to scale beyond two domains, making it adaptable to a variety of data integration and diagnostic tasks.

Thus, Tri-CycleGAN is not just a technical enhancement for data transformation, but also a comprehensive analysis method with high potential for process monitoring and integration within the semiconductor industry.
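For readers unfamiliar with the underlying objective, the following is a sketch of how the standard CycleGAN loss could be written for three pairwise-connected domains. It assumes that each of the three domain pairs uses the usual adversarial and L1 cycle-consistency terms. The symbols $G_{X \to Y}$, $\lambda$, and the pair set $\mathcal{P}$ are notation introduced here for illustration, and the "Direct Loss Function" mentioned above is not defined in this response, so it is left out.

```latex
% Cycle-consistency term for one pair of domains (X, Y), standard CycleGAN form
\mathcal{L}_{\mathrm{cyc}}(X, Y) =
  \mathbb{E}_{x \sim X}\big[\lVert G_{Y \to X}(G_{X \to Y}(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim Y}\big[\lVert G_{X \to Y}(G_{Y \to X}(y)) - y \rVert_1\big]

% Assumed total over the three pairs of diagnostic domains
\mathcal{P} = \{(\mathrm{OES}, \mathrm{QMS}),\ (\mathrm{QMS}, \mathrm{ToF\text{-}MS}),\ (\mathrm{ToF\text{-}MS}, \mathrm{OES})\}

\mathcal{L}_{\mathrm{total}} =
  \sum_{(X, Y) \in \mathcal{P}}
  \big[\mathcal{L}_{\mathrm{GAN}}(X, Y) + \lambda\, \mathcal{L}_{\mathrm{cyc}}(X, Y)\big]
```

Under this reading, each pair contributes two generators and two discriminators, and $\lambda$ weights how strongly a round-trip transformation must reproduce its input.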

In response to Reviewer #1's comment regarding the technical advancement and novelty of the Tri-CycleGAN model, we have added further clarification in the manuscript to highlight the model's scalability, robustness, and its ability to handle multi-domain data integration:

→ (P. 2, Line. 65) “In our experimental setup, we connected the three diagnostic techniques (OES, QMS, and ToF-MS) to an inductively coupled plasma reactive ion etching (ICP-RIE) system, enabling systematic data collection under various plasma conditions. To analyze the diverse data from these diagnostic methods, we developed the Triangular Cycle-Consistent Generative Adversarial Network (Tri-CycleGAN). In contrast to conventional CycleGAN models, which are typically constrained to two domains, our approach demonstrates the capacity to transform data between three distinct domains, illustrating the potential for broader application to other semiconductor diagnostic techniques. To further enhance the accuracy of data transformation between domains, we introduced self-attention mechanisms and a custom loss function into the transformation model, improving feature extraction and optimizing the consistency of data integration across the three diagnostic techniques. By leveraging the strengths of all three diagnostic techniques, we were able to overcome the limitations of each individual method, resulting in a more comprehensive understanding of the plasma state.”

Q3. Are the experimental results comprehensive enough to validate the effectiveness of the model under different plasma conditions, and is the data comparison with traditional methods clearly presented?

→ A3) To ensure comprehensive experimental validation of the model across a wide range of plasma conditions, we varied key parameters such as CF₄ gas flow rates and RF power levels. Specifically, Ar and O₂ gases were introduced at a constant flow rate of 10 sccm each, and the CF₄ gas flow was adjusted across four conditions (5, 10, 15, and 20 sccm), with the chamber pressure held at 20 mTorr. RF power applied to the top coil was varied from 50 W to 110 W in 10 W increments, across seven conditions, while monitoring plasma stability. This resulted in a total of 28 experimental conditions.

For each condition, we collected data across three diagnostic domains (OES, QMS, and ToF-MS) targeting 8 key process gases, and each data collection was repeated three times. In response to Reviewer #1's comment, these experimental conditions have been clearly presented in the revised manuscript:

→ (P. 3, Line. 97) “To achieve stable plasma across a wide range of process conditions using Penning ionization [26], we introduced Ar and O₂ gases at a flow rate of 10 sccm each. Based on typical gas composition ratios for CF₄, Ar, and O₂ plasma etching, the CF₄ flow was varied across four conditions: 5, 10, 15, and 20 sccm, while maintaining the chamber pressure at 20 mTorr. The RF power applied to the top coil was adjusted from 50 W to 110 W in 10 W increments across seven conditions. This resulted in 28 combinations of gas flow and power levels. For each condition, data were collected across three diagnostic domains (OES, QMS, and ToF-MS), targeting 8 key process gases. Each collection was repeated three times to ensure sufficient data for model training and testing.”
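The condition count quoted above can be checked with a short enumeration. This is only an illustrative sketch: the flow and power values come from the response text, while the variable names and the assumption that each repeat counts as one acquisition per diagnostic domain are introduced here.

```python
from itertools import product

# Values taken from the response text; names are illustrative.
cf4_flows_sccm = [5, 10, 15, 20]         # four CF4 flow conditions
rf_powers_w = list(range(50, 111, 10))   # 50 W to 110 W in 10 W steps
REPEATS = 3                              # each collection repeated three times

# Full factorial grid of (CF4 flow, RF power) combinations.
conditions = [
    {"cf4_sccm": flow, "rf_power_w": power}
    for flow, power in product(cf4_flows_sccm, rf_powers_w)
]

n_conditions = len(conditions)
# Assumption: one acquisition per repeat, per diagnostic domain.
n_runs_per_domain = n_conditions * REPEATS

print(n_conditions)       # 28
print(n_runs_per_domain)  # 84
```

The grid reproduces the 28 conditions stated in the response. The figure of 84 acquisitions per domain follows only if the three repeats apply to every condition.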

Q4. Does the paper adequately address the practical implications and limitations of implementing this multi-domain diagnostic integration for real-world semiconductor manufacturing processes?

→ A4) Thank you for your thoughtful question. Although this study utilized data collected from CF4-based plasma in RIE equipment, the Tri-CycleGAN model developed in this work is versatile and can be applied to other semiconductor process equipment that uses plasma, such as PEALD, PECVD, and sputtering systems. The model's flexibility stems from its ability to transform time-series data across different diagnostic domains (OES, QMS, and ToF-MS), independent of the specific plasma equipment or chemistry. As long as time-series diagnostic data can be collected [3-11], the model can integrate process data from various gas-phase environments, demonstrating its scalability and potential for broader applications beyond the CF₄-based plasma used in this study.

Furthermore, we are currently conducting a follow-up study to validate the model's adaptability in Plasma Enhanced Atomic Layer Deposition (PEALD) processes, which involve more complex time-sequenced reactions compared to PVD or PECVD systems. Despite these complexities, the Tri-CycleGAN based model has demonstrated its ability to integrate different signals effectively in the PEALD system. This reinforces the model's utility in mass production environments, as it relies on diagnostic data and machine learning rather than being constrained to a particular plasma process. To clarify this versatility, we have now added a detailed description of the model’s adaptability to various chemistries and equipment in the revised manuscript.

→ (P. 2, Line. 80) “This novel approach not only facilitates comprehensive plasma diagnostics but also presents a framework that can be extended to various other plasma-based processes and semiconductor equipment, including PECVD, sputtering, and PEALD.”

Q5. More references including AI and CNN for EEG and optical fibers applications need to be added in the introduction section: a. Cognitive State Classification Using Convolutional Neural Networks on Gamma-Band EEG Signals - Applied Sciences, 2024. b. Color image identification and reconstruction using artificial neural networks on multimode fiber images: Towards an all-optical design - Optics Letters, 2018

→ A5) Thank you for your suggestion. We have added the requested references to the introduction to strengthen the connection between AI, CNN, and their applications in various fields, including EEG and optical fibers. The following references have been included:

Shabairou, N.; Cohen, E.; Wagner, O.; Malka, D.; Zalevsky, Z. Color Image Identification and Reconstruction Using Artificial Neural Networks on Multimode Fiber Images: Towards an All-Optical Design. Opt. Lett. 2018, 43, 5603–5606, doi:10.1364/OL.43.005603.
Avital, N.; Nahum, E.; Levi, G.C.; Malka, D. Cognitive State Classification Using Convolutional Neural Networks on Gamma-Band EEG Signals. Applied Sciences 2024, 14, 8380, doi:10.3390/app14188380.

Additionally, your suggestion helped us recognize that the introduction lacked sufficient mention of machine learning in relation to our study. We have revised this section to incorporate a broader discussion of machine learning, adding these references to ensure a smoother narrative. This now provides a clearer connection between general advancements in machine learning applications and the specific context of our research, where we apply these techniques to integrate and optimize data from different semiconductor diagnostic methods.

The additional part is as follows:

→ (P. 2, Line. 54) “To address these challenges, we employed machine learning to integrate data from the three diagnostic techniques, offering a method to complement their strengths and mitigate their respective limitations. Machine learning has shown its potential in various fields to handle complex, high-dimensional datasets, enabling more accurate and efficient analysis across different domains [23, 24]. Machine learning techniques such as deep learning and neural networks have been successfully applied to process large volumes of data that are challenging to interpret with traditional methods. Among these, Cycle-Consistent Generative Adversarial Network (CycleGAN) is particularly useful for transforming data between different domains without requiring paired datasets [25]. While primarily applied to image domain transformations, we adapted this technique to integrate data from different semiconductor process diagnostic methods.”

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes an approach to plasma diagnostics in semiconductor manufacturing by integrating (OES), (QMS), and (ToF-MS) into a reactive ion etching system. The authors use a machine learning model to transform and synchronize data from these three diagnostics, addressing the individual limitations of each technique. The model claims to improve accuracy in reconstructing plasma conditions, providing a more complete picture of chemical species present in CF4-based plasma, thus enhancing process monitoring, precision, and control in semiconductor fabrication.

While the paper demonstrates promising results for CF4-based plasma, how would the Tri-CycleGAN model perform with other plasma chemistries, such as those used in PVD processes (e.g., Ti, Al, Ni)? These materials are commonly used in semiconductor and coating processes. Can the model be easily adapted to handle these chemistries?

The paper notes limitations regarding gas sampling delays and degraded temporal resolution in QMS and ToF-MS diagnostics. What specific strategies can be employed to mitigate these issues, and how do they impact the real-time monitoring capabilities of the system?

Semiconductor manufacturing environments, including plasma etching and deposition, often experience fluctuations in parameters like pressure and RF power. How does the model account for or adapt to these real-world variations in plasma conditions? Can the Tri-CycleGAN effectively manage these process drifts?

How does the performance of the Tri-CycleGAN model compare to other traditional models for data integration and process control, such as Kalman filters or standard machine learning algorithms? What are the specific advantages in terms of accuracy, computational speed, or sensitivity to detecting rare events?

The paper touches on detecting subtle variations in plasma diagnostics. Could the Tri-CycleGAN model be enhanced with additional anomaly detection mechanisms to identify early process faults or deviations? How can this be integrated into a real-time control system for semiconductor manufacturing?

Author Response

Response to Comments of Reviewer

This paper proposes an approach to plasma diagnostics in semiconductor manufacturing by integrating (OES), (QMS), and (ToF-MS) into a reactive ion etching system. The authors use a machine learning model to transform and synchronize data from these three diagnostics, addressing the individual limitations of each technique. The model claims to improve accuracy in reconstructing plasma conditions, providing a more complete picture of chemical species present in CF₄-based plasma, thus enhancing process monitoring, precision, and control in semiconductor fabrication.

Q1. While the paper demonstrates promising results for CF4-based plasma, how would the Tri-CycleGAN model perform with other plasma chemistries, such as those used in PVD processes (e.g., Ti, Al, Ni)? These materials are commonly used in semiconductor and coating processes. Can the model be easily adapted to handle these chemistries?

⇒ A1) Thank you for your thoughtful question. We acknowledge that the versatility of the Tri-CycleGAN model for other plasma chemistries, such as those used in PVD processes, has not been sufficiently described.

The versatility of the Tri-CycleGAN is based on its ability to transform time-series data across different domains, rather than being limited to a specific process or data collection technique. As long as time-series diagnostic data can be collected simultaneously, the model can integrate process data from various gas-phase environments, including PVD processes involving materials like Ti, Al, and Ni. Numerous studies have successfully applied mass spectrometry-based diagnostics [3-7] and spectroscopy-based techniques [8-11] to monitor PVD processes. Unlike the traditional CycleGAN model, which is limited to transformations between two domains, our model has successfully transformed data across three domains (OES, QMS, and ToF-MS). This achievement alone demonstrates the model's scalability and versatility, highlighting its potential for broader applications.

Additionally, we are currently conducting a follow-up study that integrates emission spectroscopy and mass spectrometry-based diagnostic techniques in Plasma Enhanced Atomic Layer Deposition (PEALD) processes. Despite the more complex, time-sequenced nature of ALD compared to typical PVD and PECVD systems, we have confirmed that the Tri-CycleGAN-based model can integrate different signals in the PEALD system as well.

It seems that the versatility of our model, particularly in its ability to be extended to other chemistries, was not sufficiently emphasized in the original manuscript. To address this, we have now added a detailed description about the model’s adaptability to diverse chemistries in the manuscript:

⇒ (P. 2, Line. 75) “By leveraging the strengths of all three diagnostic techniques, we were able to overcome the limitations of each individual method, resulting in a more comprehensive understanding of the plasma state. Additionally, cross-referencing overlapping spectral data from OES and mass data from QMS and ToF-MS enabled more precise identification of the species generated in the plasma. This novel approach not only facilitates comprehensive plasma diagnostics but also presents a framework that can be extended to various other plasma-based processes and semiconductor equipment, including PECVD, sputtering, and PEALD.”

⇒ (P. 7, Line. 248) “These advantages in both speed and anomaly detection make Tri-CycleGAN highly suitable for process monitoring and diagnostics in semiconductor environments.”

This addition clarifies how the Tri-CycleGAN model can be applied to a variety of gas-phase environments beyond CF₄-based systems, demonstrating its scalability and broad applicability.

Q2. The paper notes limitations regarding gas sampling delays and degraded temporal resolution in QMS and ToF-MS diagnostics. What specific strategies can be employed to mitigate these issues, and how do they impact the real-time monitoring capabilities of the system?

⇒ A2) Thank you for this question. Temporal resolution differences and delays can be caused by specification differences between diagnostic equipment and the dead volume in the gas sampling interfaces [12]. This mismatch can disrupt the smooth integration of data across different diagnostic techniques.

To address this, we incorporated a Self-Attention Layer into the CycleGAN Generator. The Self-Attention mechanism is particularly effective in handling time-series data because it allows data points from different time steps to reference one another and evaluate their relationships. By using the Query, Key, and Value framework, Self-Attention can assess dependencies between different time points and compensate for the time mismatches [13] between diagnostic techniques. The Multi-Head Attention Block diagram below illustrates how the Self-Attention mechanism works:

The input data from the three domains (OES, QMS, and ToF-MS) is transformed into three vectors: Query (Q), Key (K), and Value (V), which are computed using learned weight matrices W_Q, W_K, and W_V.
Query represents data from the current time step, while Key contains data from all other time steps. The model calculates the similarity between each Query and Key via a dot-product operation, producing a set of similarity scores [14].
These scores are scaled and passed through the Softmax function, normalizing them into a set of weights [15].
The weights are then applied to the Value (V) vector, which holds the actual data values from other time steps. This allows the model to interpolate across time steps and align information from the three diagnostic domains. For instance, if QMS data at a certain time step lacks resolution or is delayed, higher-resolution ToF-MS or OES data can be used to fill in the gaps, transforming the data to ensure an accurate representation [16].
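The steps above can be sketched as a single-head scaled dot-product self-attention in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the toy input, and the identity weight matrices are all introduced here. It follows the standard formulation, in which each Query is compared against the Keys of every time step, including its own.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """x: (time steps x features). Returns (output, attention weights)."""
    q = matmul(x, w_q)                      # Query per time step
    k = matmul(x, w_k)                      # Key per time step
    v = matmul(x, w_v)                      # Value per time step
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]    # transpose K
    scores = matmul(q, k_t)                 # dot-product similarity
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights      # weighted sum of Values

# Toy example: 3 time steps, 2 features, identity projections (hypothetical).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1, so every output time step is a convex combination of the Value vectors. This is the sense in which information from other time steps can fill in a delayed or low-resolution sample.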

This weighted combination of values produces an output that integrates information from various time steps, allowing the model to capture the full temporal dynamics of the plasma. By referencing data across the entire sequence, Self-Attention ensures that critical information is retained while aligning data from asynchronous diagnostics like OES, QMS, and ToF-MS.

As Reviewer #2 correctly pointed out, we had not sufficiently explained our strategy for addressing gas sampling delays and temporal resolution degradation in the original manuscript. In response to this valuable feedback, we have now included this explanation:

⇒ (P. 2, Line. 72) “To further enhance the accuracy of data transformation between domains, we introduced self-attention mechanisms and a custom loss function into the transformation model, improving feature extraction and optimizing the consistency of data integration across the three diagnostic techniques.”

⇒ (P. 6, Line. 224) “To address the timing discrepancies between different diagnostic techniques, such as OES, QMS, and ToF-MS, a self-attention mechanism [48] is incorporated into the model. When data is collected through different mechanisms and equipment, slight temporal offsets can occur, leading to misalignments. The self-attention mechanism helps reduce these temporal misalignments by evaluating the importance of each data point in relation to others across the entire sequence [49].”

Q3. Semiconductor manufacturing environments, including plasma etching and deposition, often experience fluctuations in parameters like pressure and RF power. How does the model account for or adapt to these real-world variations in plasma conditions? Can the Tri-CycleGAN effectively manage these process drifts?

⇒ A3) Thank you for this insightful question. While our Tri-CycleGAN model does not explicitly take RF Power and Flow Rate as direct inputs, it was designed to account for these variations by training on data collected under a range of RF Power and Flow Rate conditions [17].

During the experimental phase, we systematically varied RF Power and Flow Rate while collecting data from OES, QMS, and ToF-MS. By training the model on this diverse dataset, it was able to learn how plasma characteristics shift in response to changes in these parameters. In essence, rather than directly incorporating RF Power and Flow Rate as inputs, the model learned these variations implicitly from the data.

This approach allows the model to adapt to changes in plasma conditions, as it has been trained to recognize and predict how plasma characteristics evolve when these parameters fluctuate. As a result, it can handle process drifts effectively, performing data transformations that reflect real-time changes in plasma behavior.

In our experiments, the Tri-CycleGAN model showed consistent performance across the range of RF Power and Flow Rate values included in the training dataset. However, we do recognize that its performance may degrade under extreme conditions outside this trained range. To address this, we plan to extend the model’s capabilities by gathering data from a broader range of process conditions. Additionally, we are considering incorporating RF Power and Flow Rate as direct inputs to the model, which may further enhance its adaptability.
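A simple guard can make the trained range explicit in practice. The sketch below is a hypothetical helper, not part of the described model: the bounds are the RF power and CF₄ flow ranges quoted in this response, and the function and key names are assumptions.

```python
# Ranges covered by the training data, as quoted in the response.
TRAINED_RANGE = {
    "rf_power_w": (50, 110),  # W
    "cf4_sccm": (5, 20),      # sccm
}

def outside_trained_range(condition, trained=TRAINED_RANGE):
    """Return the names of parameters that fall outside the trained range."""
    return [
        name
        for name, (low, high) in trained.items()
        if not low <= condition[name] <= high
    ]

in_range = outside_trained_range({"rf_power_w": 80, "cf4_sccm": 10})
out_of_range = outside_trained_range({"rf_power_w": 150, "cf4_sccm": 10})

print(in_range)      # []
print(out_of_range)  # ['rf_power_w']
```

A check of this kind only signals extrapolation. It says nothing about how much accuracy is lost outside the range.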

We believe that this approach has allowed the Tri-CycleGAN model to manage fluctuations in plasma conditions effectively within the studied range, demonstrating its potential applicability in semiconductor manufacturing environments.

Q4. How does the performance of the Tri-CycleGAN model compare to other traditional models for data integration and process control, such as Kalman filters or standard machine learning algorithms? What are the specific advantages in terms of accuracy, computational speed, or sensitivity to detecting rare events?

⇒ A4) Thank you for this important question. While the Kalman filter is well-suited for estimating the state of linear systems and managing noisy time series data, it has limitations when it comes to modeling nonlinear relationships or handling complex interactions across multiple domains [18]. Kalman filters update predictions at each time step based on previous data, which is efficient for a single stream of data but less suited for simultaneous processing of multiple domain data, such as OES, QMS, and ToF-MS.
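To make the sequential nature of the Kalman update concrete, here is a minimal scalar Kalman filter with a random-walk state model. It is a textbook sketch for a single data stream, not a method from the manuscript. The function name, the noise variances, and the test signal are assumptions chosen for illustration.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model.

    Each estimate depends on the previous one, so the measurements
    are processed strictly one time step after another.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by the process noise.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates

# Hypothetical constant signal of 5.0 observed for 20 time steps.
estimates = kalman_1d([5.0] * 20, process_var=1e-3, meas_var=0.5)
```

The loop carries `x` and `p` from one step to the next, which is the step-by-step dependency the response contrasts with transforming a whole sequence at once.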

In contrast, our Tri-CycleGAN model is designed to learn nonlinear relationships and handle data from multiple domains in parallel. By utilizing activation functions like ReLU (Rectified Linear Unit), the model captures nonlinearity and complex transformations across domains [19], allowing it to manage intricate domain-specific interactions [20]. The ability to process multiple domain data simultaneously is a key advantage of Tri-CycleGAN, particularly when compared to the Kalman filter's sequential processing.

Moreover, Tri-CycleGAN benefits from GPU-based parallel processing, where each CycleGAN model is independently trained, enabling faster execution. This parallelism enhances computational efficiency and speed, which is crucial when handling large datasets from multiple diagnostic sources in semiconductor manufacturing.

In terms of rare event detection, Tri-CycleGAN outperforms traditional models like the Kalman filter due to its competitive learning structure between the Generator and Discriminator. While the Generator transforms data between domains, the Discriminator evaluates whether the transformed data appears authentic when compared to the original domain. This adversarial learning setup helps the model identify anomalies more effectively, as it is trained to distinguish between normal and rare data patterns. Conventional models like Kalman filters, which are designed to handle more gradual variations, often struggle to detect sudden or rare anomalies.

In summary, Tri-CycleGAN offers significant advantages over traditional models like the Kalman filter in terms of handling nonlinear relationships, computational speed due to parallel processing, and its enhanced ability to detect rare events. These features make it particularly well-suited for complex, multi-domain data integration and process control in semiconductor manufacturing environments.

As Reviewer #2 noted, the comparison of our Tri-CycleGAN model with traditional models for data integration and process control was not sufficiently addressed in the original manuscript. In response, we have added further explanation of these advantages, particularly in terms of nonlinearity handling, computational efficiency, and anomaly detection:

⇒ (P. 7, Line. 239) “Unlike traditional models such as Kalman filters, which process data sequentially [50], Tri-CycleGAN offers a computational advantage by transforming the entire dataset in parallel once the model has been trained. This parallel processing reduces computational time, allowing for faster data transformations across multiple domains like OES, QMS, and ToF-MS.”

Q5. The paper touches on detecting subtle variations in plasma diagnostics. Could the Tri-CycleGAN model be enhanced with additional anomaly detection mechanisms to identify early process faults or deviations? How can this be integrated into a real-time control system for semiconductor manufacturing?

⇒ A5) Thank you for your question. The Tri-CycleGAN model is indeed capable of incorporating additional anomaly detection mechanisms to enhance sensitivity to subtle changes in plasma diagnostics. Each CycleGAN in the Tri-CycleGAN architecture includes a discriminator that distinguishes between real and generated data, enabling the model to identify inconsistencies that may indicate anomalies in the plasma state. As the discriminator is continuously trained to improve its ability to detect discrepancies, its capacity to identify anomalies becomes increasingly refined over time. If unexpected patterns or deviations occur during the data transformation process, the discriminator can detect these changes and flag potential anomalies.

The self-attention layer, which we introduced to capture time-series dependencies in the model [21], is also valuable for detecting anomalies. By assigning different levels of importance to various fluctuations in plasma parameters, such as pressure or gas flow, the self-attention mechanism allows the model to focus on specific variations over time. This capability makes the model more sensitive to rare anomalies.

To extend the Tri-CycleGAN model to real-time anomaly detection, we are currently implementing real-time diagnostics using data processed through a Sliding Window Approach and Incremental Updates. This method allows the system to detect anomalies by evaluating the reconstruction error [22] for each window of data. As real-time data is input, the model reconstructs it using transformations learned from normal process data. Under normal conditions the reconstruction error remains small, but it increases when anomalies occur. The model can therefore monitor incoming data continuously, learning and adapting as new data is processed, with reconstruction error as the metric for detecting anomalies.

This method effectively enables the Tri-CycleGAN model to monitor dynamic semiconductor processes and detect anomalies by continuously comparing new data with the model's learned representation of the normal state, enhancing its utility for real-time process control.
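The sliding-window scheme described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_reconstruct` is a hypothetical stand-in for the forward and reverse transformation of the trained model, and the mean plus three standard deviations threshold is an assumed rule.

```python
import numpy as np

def sliding_windows(series, width, step=1):
    """Return a (n_windows, width) array of overlapping windows."""
    starts = range(0, len(series) - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])

def reconstruction_errors(windows, reconstruct):
    """Mean squared error between each window and its reconstruction."""
    recon = np.stack([reconstruct(w) for w in windows])
    return np.mean((windows - recon) ** 2, axis=1)

def fit_threshold(normal_errors, k=3.0):
    # Assumed rule: mean + k * std of the errors observed on normal data.
    return normal_errors.mean() + k * normal_errors.std()

def toy_reconstruct(window):
    # Hypothetical stand-in for the learned transform and reverse transform:
    # it reproduces only the flat "normal" level of the window.
    return np.full_like(window, window.mean())

rng = np.random.default_rng(1)
width = 20

# Calibrate the threshold on data recorded under normal conditions (synthetic here).
normal = 1.0 + 0.01 * rng.standard_normal(500)
threshold = fit_threshold(
    reconstruction_errors(sliding_windows(normal, width), toy_reconstruct))

# Incoming stream with an injected deviation at samples 120..124.
stream = 1.0 + 0.01 * rng.standard_normal(200)
stream[120:125] += 0.5
errors = reconstruction_errors(sliding_windows(stream, width), toy_reconstruct)
flags = errors > threshold  # True where a window is flagged as anomalous
```

Every window that overlaps the injected deviation has a reconstruction error well above the threshold, while windows of normal data stay near the calibration level.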

Thank you for your insightful question. We realized the original manuscript lacked sufficient explanation regarding the method for detecting process faults or deviations using the Tri-CycleGAN model. In response, we have added a discussion on how our model can detect anomalies, staying within the scope of this study:

⇒ (P. 7, Line. 243) “Additionally, since the model learns complex nonlinear relationships, it maintains minimal reconstruction error when processing normal data. However, since the model is trained to accurately transform only normal data, the reconstruction error increases significantly after abnormal data undergoes both the transformation and reverse transformation process [51]. This contrast between low reconstruction error for normal data and high error for abnormal data allows for efficient differentiation between the two. These advantages in both speed and anomaly detection make Tri-CycleGAN highly suitable for process monitoring and diagnostics in semiconductor environments.”

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript proposes an approach to plasma diagnostics in semiconductor manufacturing based on CycleGAN model. The authors consider three diagnostic techniques (OES, QMS, and ToF-MS) integrated into a reactive ion etcher system, and develop three separate CycleGAN models to enable data transformation between each pair of the diagnostic techniques.

In terms of topic, the paper is suited to Journal. In terms of methodological approach and presentation, it suffers from some drawbacks, and I suggest to the authors to consider the possibility to address the following remarks.

Remarks:

1. One of the (main) contributions of this study is the underlying dataset, whose production has been well described in Section 2. However, to enable the interpretation of the reported results, the authors should describe the dataset in more details (e.g., dataset size, number and distribution of the comprised samples, etc.). The idea of this modification is that the authors convince the reader that the dataset is of sufficient quality (in terms of size, representativeness and balance) to support the research.

2. At the methodological level:

2.1 It is not clear whether there is a methodological novelty in the paper. The authors state that they introduce a direct loss function (cf. Eq. (5)). However, similar functions have already been applied in the context of CycleGAN.

2.2. How have the lambda parameters given in Eq. (6) been determined?

3. The results (In Section 3) are presented and discussed in a “descriptory” manner, which does not fully support the conclusion that the reported approach minimizes the reliance on multiple diagnostic techniques. It would be useful if the authors also provided a more quantitative insight into the obtained results (e.g., loss values, etc.) to support the conclusion.

Author Response

Response to Comments of Reviewer

The manuscript proposes an approach to plasma diagnostics in semiconductor manufacturing based on CycleGAN model. The authors consider three diagnostic techniques (OES, QMS, and ToF-MS) integrated into a reactive ion etcher system and develop three separate CycleGAN models to enable data transformation between each pair of the diagnostic techniques.
In terms of topic, the paper is suited to Journal. In terms of methodological approach and presentation, it suffers from some drawbacks, and I suggest to the authors to consider the possibility to address the following remarks.

Q1. One of the (main) contributions of this study is the underlying dataset, whose production has been well described in Section 2. However, to enable the interpretation of the reported results, the authors should describe the dataset in more details (e.g., dataset size, number and distribution of the comprised samples, etc.). The idea of this modification is that the authors convince the reader that the dataset is of sufficient quality (in terms of size, representativeness, and balance) to support the research.
⇒ A1) First of all, we would like to thank the reviewer for their valuable comments. As mentioned, the initial manuscript lacked a detailed explanation of the dataset. In response, we have included the total number of samples along with a more comprehensive description of the conditions, as outlined below.

To address this and ensure a thorough experimental validation of our model, we varied key parameters such as CF₄ gas flow rates and RF power levels. Specifically, Ar and O₂ gases were introduced at a constant flow rate of 10 sccm each, and the CF₄ gas flow was adjusted across four conditions (5, 10, 15, and 20 sccm), with the chamber pressure held at 20 mTorr. RF power applied to the top coil was varied from 50 W to 110 W in 10 W increments, across seven conditions, while monitoring plasma stability. This resulted in a total of 28 experimental conditions.

In each of these conditions, we collected data across three diagnostic domains (OES, QMS, and ToF-MS) targeting 8 key process gases, and each data collection was repeated three times. In response to Reviewer #3's comment, these experimental conditions have been clearly presented in the revised manuscript:

⇒ (P. 3, Line. 97) “To achieve stable plasma across a wide range of process conditions using Penning ionization [26], we introduced Ar and O₂ gases at a flow rate of 10 sccm each. Based on typical gas composition ratios for CF₄, Ar, and O₂ plasma etching, the CF₄ flow was varied across four conditions: 5, 10, 15, and 20 sccm, while maintaining the chamber pressure at 20 mTorr. The RF power applied to the top coil was adjusted from 50 W to 110 W in 10 W increments across seven conditions. This resulted in 28 combinations of gas flow and power levels. For each condition, data were collected across three diagnostic domains (OES, QMS, and ToF-MS), targeting 8 key process gases. Each collection was repeated three times to ensure sufficient data for model training and testing.”
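The experimental grid described above can be enumerated in a few lines. The sketch below only restates the arithmetic in the response (4 CF₄ flows × 7 RF powers = 28 conditions, each repeated three times); variable names are illustrative, and the number of time points per acquisition is not stated in the response, so it is not modeled.

```python
from itertools import product

cf4_flows_sccm = [5, 10, 15, 20]          # CF4 flow conditions
rf_powers_w = list(range(50, 111, 10))    # 50 W to 110 W in 10 W increments
repeats = 3                               # each collection repeated three times
domains = ["OES", "QMS", "ToF-MS"]        # diagnostics recorded for every run

# One condition per (flow, power) pair.
conditions = list(product(cf4_flows_sccm, rf_powers_w))

# One run per condition and repetition.
runs = [(flow, power, rep)
        for flow, power in conditions
        for rep in range(1, repeats + 1)]

print(len(rf_powers_w), len(conditions), len(runs))
```

Under these assumptions the grid yields 28 conditions and 84 repeated runs, each observed in all three diagnostic domains.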

Q2. At the methodological level:
Q2.1 It is not clear whether there is a methodological novelty in the paper. The authors state that they introduce a direct loss function (cf. Eq. (5)). However, similar functions have already been applied in the context of CycleGAN.

⇒ A2.1) Thank you for your insightful comments. We would like to clarify the methodological novelty of the direct loss function (ℒ_dir) that we introduced, as it plays a critical role in addressing issues such as overfitting and mode collapse. This function is particularly significant in the context of integrating and transforming 1D semiconductor data, where precise preservation of physical characteristics is essential.

The fundamental objective of CycleGAN is to maintain consistency between two domains while generating similar data in the target domain. The generator transforms data, and the discriminator distinguishes between transformed and real data to improve the system’s performance. CycleGAN’s cycle-consistency loss ensures that data transformed back to the original domain can be reconstructed. This process inherently aims to produce realistic data, similar to what the direct loss function is trying to achieve. However, CycleGAN is prone to mode collapse, especially when dealing with data with repetitive patterns, such as Chinese character images [23, 24] or semiconductor process data. In these cases, the generator may repeatedly produce similar outputs instead of generating diverse ones. Furthermore, when there is a lack of data, the generator may overfit the training data, reducing its ability to generalize to new inputs.
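The cycle-consistency idea in the paragraph above can be sketched with toy functions. This is a minimal illustration, assuming an L1 form of the loss; the affine maps `g_ab`, `g_ba_exact`, and `g_ba_biased` are hypothetical stand-ins for trained generators.

```python
import numpy as np

def cycle_consistency_loss(x, forward, backward):
    """Mean absolute error between x and its round trip backward(forward(x))."""
    return float(np.mean(np.abs(backward(forward(x)) - x)))

# Hypothetical toy "generators": affine maps standing in for trained networks.
def g_ab(x):
    return 2.0 * x + 1.0          # domain A -> domain B

def g_ba_exact(y):
    return (y - 1.0) / 2.0        # exact inverse of g_ab

def g_ba_biased(y):
    return (y - 1.0) / 2.0 + 0.1  # inverse with a constant reconstruction bias

x = np.linspace(0.0, 1.0, 11)
loss_exact = cycle_consistency_loss(x, g_ab, g_ba_exact)
loss_biased = cycle_consistency_loss(x, g_ab, g_ba_biased)
```

The exact inverse gives a round-trip loss of essentially zero, while the biased inverse is penalized by its 0.1 offset. Note that this loss only checks the round trip; it says nothing about whether the intermediate output matches a measured signal in the target domain.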

In our experiments with semiconductor process data, we observed several issues when training the model without the direct loss function:

Although the temporal features of the process (e.g., plasma on/off states) were well captured, the model generated data with offsets or scale differences. Moreover, even when we input data from different plasma conditions (real ToF-MS and OES data, shown on the left in the figure), the output remained highly similar (center of the figure). This mode collapse likely occurred due to the high similarity between input data from slightly different conditions.

To address this, we incorporated the direct loss function into the error term. By directly comparing the transformed data with real data, this straightforward method significantly reduced mode collapse. While approaches like stroke encoding reconstruction loss [24] or Bayesian frameworks [25] are also used to address similar issues in other contexts, our approach is much simpler and more intuitive. This is appropriate given that the goal of our research is not to capture partial patterns or manage uncertainty as in image-based tasks, but rather to preserve the physical characteristics critical for anomaly detection and data integration in semiconductor processes. Although further hyperparameter optimization and data collection remain challenges, the direct loss function has allowed us to better reflect subtle differences in plasma conditions, even with limited data.
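To make the effect of a paired comparison concrete, the sketch below assumes an L1 form for the direct loss (the exact form of Eq. (5) is not reproduced in this response) and uses synthetic sinusoids as stand-ins for paired measurements. It shows that an offset output and a collapsed output, which a round-trip check alone could miss, both incur a nonzero direct loss.

```python
import numpy as np

def direct_loss(generated, real):
    """Paired L1 loss (assumed form) between generated data and the measured target."""
    return float(np.mean(np.abs(generated - real)))

# Synthetic paired targets for two slightly different conditions (illustrative only).
t = np.linspace(0.0, 1.0, 100)
target_a = 1.00 * np.sin(2 * np.pi * t)
target_b = 1.10 * np.sin(2 * np.pi * t)

# Case 1: faithful output for condition A.
loss_faithful = direct_loss(target_a, target_a)

# Case 2: correct shape, but shifted by a constant offset.
loss_offset = direct_loss(target_a + 0.3, target_a)

# Case 3: mode collapse, i.e. the same output regardless of the input condition.
collapsed = 1.05 * np.sin(2 * np.pi * t)
loss_collapsed = direct_loss(collapsed, target_a) + direct_loss(collapsed, target_b)
```

Because the comparison is made against the paired measurement itself, the only way to drive this term to zero for both conditions is to produce a distinct, correctly scaled output for each.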

In the revised manuscript, we have clarified this context and highlighted the methodological contributions:

⇒ (P. 6, Line. 207) “Models without the direct loss function can capture distinct features such as plasma on/off states, but may struggle with offsets and scale differences in the generated data. This is due to mode collapse, where different plasma conditions produce similar outputs because of the high similarity between input data [44,45]. The direct loss function penalizes the model more heavily when mode collapse occurs, as it produces high errors in these cases. This ensures that the model is pushed to generate more diverse and accurate outputs by avoiding repetitive patterns or misrepresentations in the data, especially when dealing with highly similar input conditions.”

Thanks to Reviewer 3’s comments, we have been able to more clearly articulate the novelty and significance of our approach. We sincerely appreciate your feedback.

Q2.2. How have the lambda parameters given in Eq. (6) been determined?

⇒ A2.2) The lambda parameters (λ_cycle, λ_direct) in Equation (6) were determined through a combination of theoretical understanding and empirical testing. Here's an explanation of how these values were chosen based on existing literature and experimentation.

For λ_cycle, we used the common value of 10, as established in earlier studies [26, 27]. This value has consistently proven effective in balancing transformation accuracy and reconstruction by maintaining strong cycle consistency. In our testing, adjusting this value offered no significant improvements, so we retained 10 to ensure consistency in transformations.

For λ_direct, the process was different due to the introduction of the direct loss function, which leverages the paired nature of our 1D semiconductor data. Unlike standard CycleGAN models working with unpaired data [28], we introduced this parameter to directly compare paired data points and improve transformation accuracy. We tested various values for λ_direct, adjusting it to balance the direct loss with the cycle-consistency and adversarial losses. Cross-validation helped monitor metrics like transformation accuracy, overfitting, and training stability. The final value allowed for accurate transformations without overshadowing other loss terms.

In summary, both λ_cycle and λ_direct were selected through empirical validation, informed by literature and adapted to our dataset's needs. This ensures each loss term contributes effectively to the model’s performance.
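The balancing described above can be illustrated with a short sketch. It assumes that Eq. (6) is a weighted sum of adversarial, cycle-consistency, and direct terms, as the response implies. The cycle weight of 10 comes from the response, whereas the direct weight and all loss values below are hypothetical placeholders, since the final value of the direct weight is not stated.

```python
def generator_objective(adv_loss, cycle_loss, direct_loss,
                        lambda_cycle=10.0, lambda_direct=1.0):
    """Weighted sum of the three generator loss terms (assumed form of Eq. (6))."""
    return adv_loss + lambda_cycle * cycle_loss + lambda_direct * direct_loss

def term_shares(adv_loss, cycle_loss, direct_loss,
                lambda_cycle=10.0, lambda_direct=1.0):
    """Fraction of the total objective contributed by each weighted term."""
    total = generator_objective(adv_loss, cycle_loss, direct_loss,
                                lambda_cycle, lambda_direct)
    return {
        "adversarial": adv_loss / total,
        "cycle": lambda_cycle * cycle_loss / total,
        "direct": lambda_direct * direct_loss / total,
    }

# Illustrative loss values only; they are not results from the study.
adv, cyc, direct = 0.7, 0.05, 0.2
total = generator_objective(adv, cyc, direct)

# Hypothetical candidate values for the direct-loss weight.
for candidate in (0.1, 1.0, 10.0):
    shares = term_shares(adv, cyc, direct, lambda_direct=candidate)
    print(candidate, {name: round(value, 2) for name, value in shares.items()})
```

Inspecting the shares for each candidate weight is one simple way to check that the direct term contributes without overshadowing the other two.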

Q3. The results (In Section 3) are presented and discussed in a “descriptory” manner, which does not fully support the conclusion that the reported approach minimizes the reliance on multiple diagnostic techniques. It would be useful if the authors also provided a more quantitative insight into the obtained results (e.g., loss values, etc.) to support the conclusion.

⇒ A3) Thank you for your insightful feedback. We recognize the value of incorporating more quantitative insights to strengthen our conclusions, and we believe that the figures provided in the manuscript offer evidence of the model’s effectiveness.

In Figure 2, we demonstrate the comparison between the transformed and original data across three different domains, where plasma conditions were varied in 10 W increments. Figure 3 further illustrates this by comparing real, generated, and reconstructed data from three domains throughout the training process. This comparison highlights how our model adapts to different domains while preserving key features of the data.

In Figure 4 and the corresponding supporting figures, we compare original data, generated data from different domains, and reconstructed data for all eight major gas species in the CF₄ mixed plasma process, clearly displaying the model's ability to handle multiple transformations. Figure 5 and its supporting figures show the evolution of the discriminator and generator losses throughout training, providing insight into how the model converges as learning progresses for the eight major gas species in the plasma process.

In addition, we have provided more detailed explanations on the changes in generator and discriminator loss values during training:

⇒ (P. 11, Line. 345) “More specifically, for the OES to ToF-MS transformation model (Figure 5a), the loss values show stabilization around 5k epochs. This period reflects when the model has learned to handle the transformation between these two domains effectively. After 16k epochs, a further decrease in loss indicates additional fine-tuning in the generator’s performance. The other two transformation models, ToF-MS to QMS (Figure 5b) and QMS to OES (Figure 5c), show more regular patterns after approximately 4k epochs, highlighting the more consistent performance in these transformations. This stability suggests that the generator has successfully learned the relationships between the data from these domains earlier in the training process.”

While we have focused on the presentation of these visual and descriptive results in this manuscript, our goal is to introduce a novel methodology for integrating process data. We recognize that further quantitative comparisons could enhance the model evaluation, and we plan to pursue this in future studies as we continue to refine and develop our approach.

On the advice of the reviewers, we have appropriately mentioned the self-attention mechanisms and direct loss function and emphasized their role in improving data transformation accuracy in plasma diagnostics. Accordingly, we have revised the abstract and conclusion as follows:

⇒ (P. 1, Line. 8) “The model incorporates self-attention mechanisms to address temporal misalignments and a direct loss function to preserve fine-grained features, further enhancing data accuracy.”

⇒ (P. 11, Line. 373) “The model integrates multi-domain plasma diagnostics (OES, QMS, and ToF-MS), enabling effective data mapping across these different diagnostic techniques. By incorporating self-attention mechanisms to correct for temporal misalignments and a direct loss function to preserve fine-grained details, the model ensures more accurate and reliable data transformations. These improvements allow it to capture subtle but critical plasma variations, enhancing the precision of diagnostics. As the model is trained to synthesize data across multiple measurement domains, it effectively eliminates the need for redundant diagnostic techniques, offering a cost-efficient solution for complex diagnostic environments. This not only reduces operational costs but also leads to more reliable process monitoring and better decision-making in semiconductor manufacturing.”

References

1. Lieberman, M.A.; Lichtenberg, A.J. Principles of Plasma Discharges and Materials Processing; John Wiley & Sons, 2005; ISBN 978-0-471-72424-7.
2. Nasser, E. Fundamentals of Gaseous Ionization and Plasma Electronics; Wiley-Interscience, 1971; ISBN 978-0-471-63056-2.
3. Welzel, T.; Mändl, S.; Ellmer, K. Cluster Ion Formation during Sputtering Processes: A Complementary Investigation by ToF-SIMS and Plasma Ion Mass Spectrometry. J. Phys. D: Appl. Phys. 2014, 47, 065204, doi:10.1088/0022-3727/47/6/065204.
4. Cai, Y.; Henn-Lecordier, L.; Rubloff, G.W.; Sreenivasan, R.; Choo, J.-O.; Adomaitis, R.A. Multiplexed Mass Spectrometry for Real-Time Sensing in a Spatially Programmable Chemical Vapor Deposition Reactor. Journal of Vacuum Science & Technology B: Microelectronics and Nanometer Structures Processing, Measurement, and Phenomena 2007, 25, 1288–1297, doi:10.1116/1.2753851.
5. Jin, Y.; Takahashi, C.; Ono, T. Real-Time Etching Monitor Using Argon Quadrupole Mass Spectrometry for 100 Nm Class WSiN Gate Fabrication. Journal of Vacuum Science & Technology A 2003, 21, 1589–1594, doi:10.1116/1.1589527.
6. Müller, N.; Rettinghaus, G.; Strasser, G. A Trace Gas Mass Spectrometer for On‐line Monitoring of Sputter Processes at 10−2 Mbar without Pressure Reduction. Journal of Vacuum Science & Technology A 1990, 8, 2822–2825, doi:10.1116/1.576633.
7. Wild, Ch.; Wagner, J.; Koidl, P. Process Monitoring of a‐C:H Plasma Deposition. Journal of Vacuum Science & Technology A 1987, 5, 2227–2230, doi:10.1116/1.574962.
8. Kostrin, D.K.; Lisenkov, A.A.; Uhov, A.A. Spectrometric Control of Coatings Deposition Process. J. Phys.: Conf. Ser. 2016, 735, 012055, doi:10.1088/1742-6596/735/1/012055.
9. Malherbe, J.; Martinez, H.; Fernández, B.; Pécheyran, C.; Donard, O.F.X. The Effect of Glow Discharge Sputtering on the Analysis of Metal Oxide Films. Spectrochimica Acta Part B: Atomic Spectroscopy 2009, 64, 155–166, doi:10.1016/j.sab.2008.11.009.
10. Pulker, H.K. Optical Coatings Deposited by Ion and Plasma PVD Processes. Surface and Coatings Technology 1999, 112, 250–256, doi:10.1016/S0257-8972(98)00764-6.
11. Lu, C.; Guan, Y. Improved Method of Nonintrusive Deposition Rate Monitoring by Atomic Absorption Spectroscopy for Physical Vapor Deposition Processes. Journal of Vacuum Science & Technology A 1995, 13, 1797–1801, doi:10.1116/1.579771.
12. Batey, J.H. The Physics and Technology of Quadrupole Mass Spectrometers. Vacuum 2014, 101, 410–415, doi:10.1016/j.vacuum.2013.05.005.
13. Boulila, W.; Ghandorh, H.; Masood, S.; Alzahem, A.; Koubaa, A.; Ahmed, F.; Khan, Z.; Ahmad, J. A Transformer-Based Approach Empowered by a Self-Attention Technique for Semantic Segmentation in Remote Sensing. Heliyon 2024, 10, e29396, doi:10.1016/j.heliyon.2024.e29396.
14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need 2017.
15. Alkhunaizi, N.; Kamzolov, D.; Takáč, M.; Nandakumar, K. Suppressing Poisoning Attacks on Federated Learning for Medical Imaging 2022.
16. Che, X.; Zhang, H.K.; Li, Z.B.; Wang, Y.; Sun, Q.; Luo, D.; Wang, H. Linearly Interpolating Missing Values in Time Series Helps Little for Land Cover Classification Using Recurrent or Attention Networks. ISPRS Journal of Photogrammetry and Remote Sensing 2024, 212, 73–95, doi:10.1016/j.isprsjprs.2024.04.021.
17. Garvin, C.; Grizzle, J.W. RF Sensing for Real-Time Monitoring of Plasma Processing. In Proceedings of the 1998 international conference on characterization and metrology for ULSI technology; ASCE: Gaithersburg, Maryland (USA), 1998; pp. 442–446.
18. Pires, D.S.; Serra, G.L.O. Methodology for Modeling Fuzzy Kalman Filters of Minimum Realization from Evolving Clustering of Experimental Data. ISA Transactions 2020, 105, 1–23, doi:10.1016/j.isatra.2020.05.034.
19. Kulathunga, N.; Ranasinghe, N.R.; Vrinceanu, D.; Kinsman, Z.; Huang, L.; Wang, Y. Effects of the Nonlinearity in Activation Functions on the Performance of Deep Learning Models 2020.
20. Agarap, A.F. Deep Learning Using Rectified Linear Units (ReLU) 2018.
21. Islam, S.; Haque, Md.M.; Sadat, A.J.Md. Capturing Spectral and Long-Term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques 2023.
22. Choi, J.H.; Jang, S.K.; Cho, W.H.; Moon, S.; Kim, H. Motor PHM on Edge Computing with Anomaly Detection and Fault Severity Estimation through Compressed Data Using PCA and Autoencoder. MAKE 2024, 6, 1466–1483, doi:10.3390/make6030069.
23. Chang, B.; Zhang, Q.; Pan, S.; Meng, L. Generating Handwritten Chinese Characters Using CycleGAN Available online: https://arxiv.org/abs/1801.08624v1 (accessed on 22 October 2024).
24. Zeng, J.; Chen, Q.; Liu, Y.; Wang, M.; Yao, Y. StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding 2021.
25. You, H.; Cheng, Y.; Cheng, T.; Li, C.; Zhou, P. Bayesian Cycle-Consistent Generative Adversarial Networks via Marginalizing Latent Sampling. IEEE Transactions on Neural Networks and Learning Systems 2021, 32, 4389–4403, doi:10.1109/TNNLS.2020.3017669.
26. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks 2020.
27. Easthope, E. (Un)Paired Signal-to-Signal Translation with 1D Conditional GANs 2024.
28. Harms, J.; Lei, Y.; Wang, T.; Zhang, R.; Zhou, J.; Tang, X.; Curran, W.J.; Liu, T.; Yang, X. Paired cycle‐GAN‐based Image Correction for Quantitative Cone‐beam Computed Tomography. Medical Physics 2019, 46, 3998–4009, doi:10.1002/mp.13656.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The new version can be published

Author Response

I am glad to hear that you are satisfied with the revised text.

Thank you.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have addressed most of the remarks from my previous review report and the manuscript has been improved. I suggest to the authors to consider the possibility to include a summary of their response to Remark 2.2 in the revised manuscript.

Author Response

Response to Comments of Reviewer #3
The authors have addressed most of the remarks from my previous review report and the manuscript has been improved. I suggest to the authors to consider the possibility to include a summary of their response to Remark 2.2 in the revised manuscript.

→A) Thank you for your insightful feedback to help us improve our work. We have included a summary of our response to Remark 2.2 in the revised manuscript. We have added an explanation of how the lambda parameters in Equation (6) were determined. In the following paragraph, we have also included content about leveraging the advantages of utilizing paired data. We hope these additions address your concern.

→(P. 6, Line. 201) The weights were selected based on existing literature and empirical testing to balance transformation accuracy and stability. λ_direct was adjusted to balance against the cycle-consistency and adversarial losses.

By utilizing paired data collected during experiments, the direct loss function addresses issues such as overfitting and mode collapse. These issues are particularly critical for 1D semiconductor data transformation, where preserving physical characteristics precisely is essential. Directly comparing the transformed data with real data enables the model to capture subtle discrepancies and transient fault signals that might not be evident in the overall data distribution.

Author Response File: Author Response.pdf