Multiscale Modeling and Recurrent Neural Network Based Optimization of a Plasma Etch Process

: In this article, we focus on the development of a multiscale modeling and recurrent neural network (RNN) based optimization framework of a plasma etch process on a three-dimensional substrate with uniform thickness using the inductive coupled plasma (ICP). Speciﬁcally, the gas ﬂow and chemical reactions of plasma are simulated by a macroscopic ﬂuid model. In addition, the etch process on the substrate is simulated by a kinetic Monte Carlo (kMC) model. While long time horizon optimization cannot be completed due to the computational complexity of the simulation models, RNN models are applied to approximate the ﬂuid model and kMC model. The training data of RNN models are generated by open-loop simulations of the ﬂuid model and the kMC model. Additionally, the stochastic characteristic of the kMC model is presented by a probability function. The well-trained RNN models and the probability function are then implemented in computing an open-loop optimization problem, in which a moving optimization method is applied to overcome the error accumulation problem when using RNN models. The optimization goal is to achieve the desired average etching depth and average bottom roughness within the least amount of time. The simulation results show that our prediction model is accurate enough and the optimization objectives can be completed well.


Introduction
Plasma etch has been applied in the manufacturing of integrated circuits (IC) for over 50 years, and becomes even more essential due to the continuous decreasing of the fabricating scale [1,2]. The traditional "failures and corrections" experimental procedure is not enough to maintain and develop the plasma etch technique. In addition, therefore, simulation of plasma etch is an effective way to improve our understanding about the etching process and to help improve the process technique [3,4]. A lot of work has been done to simulate the gas flow and chemical reactions in the plasma chamber, and there are three main types of models: kinetic model, fluid model, and hybrid model. Fluid model is the most commonly used model because of its computational efficiency and flexibility in coupling the electromagnetic fields. Additionally, the complex transport phenomena and reactions which motivate the etching process are simulated by some quite precise approaches, like level set method and kMC method. Level set method is based on solving a Hamilton-Jacobi type equation for a level set function, which has stable, accurate, and efficient performance in dealing with interface evolution problems with sharp corners, change topology, and order of magnitude changes in speed [5,6], while, in order to realize a high resolution simulation of plasma etch process, kMC is the most potential method for which it has both an atom resolution and the ability to deal with relatively long-time scales [7][8][9]. An appropriate modeling methodology is established by the fact that kMC transforms every physical phenomena into stochastically selected events. The key step for kMC method is to attain the probability table of the simulated process through simulations or experiments.

Multiscale Model
The plasma reactor considered in this work is shown in Figure 1. A two-dimensional model can be used due to the cylindrical symmetry of ICP. The reactive gases enter the chamber through two inlets: the edge inlet (inlet1) and the center inlet (inlet2), and then the top coils generate and maintain the plasma (an electrically neutral mixture of molecules, atoms, ions, electrons, and photons). Ions are accelerated by the bottom electrode and bombard on the substrate, which is placed on top of the electrode. With chemical reactions, ion impact reactions, and particle transportation, the nanoscale etching process occurs on the substrate, and desorbed atoms are pumped out of the plasma domain. As is shown in Figure 1, the plasma is generated in a macroscopic chamber, while the etching process that occurs on the substrate is at the nano scale. It is quite a necessity to capture both the macroscopic plasma behavior and the microscopic etching behavior. Thus, a multiscale model that consists of two simulation models is presented: the continuous fluid model that describes the gas flow and chemical reactions in plasma chamber is established in COMSOL; and the kMC model that simulates the kinetic behavior of the etching surface is completed through C language. A spatial-temporal discrete method is applied to address the issue of computing the fluid model and the kMC model concurrently. The spatial discrete method is shown in Figure 1. The whole substrate is divided into several parts, and the fluxes data of each parts are considered uniform. In our pre-test, we find that the flux data are quite different in different parts of the substrate. Three is the lowest number that the substrate should be divided for that the discrete parts should reflect the flux data on the middle and two sides of the substrate. Considering the computational efficiency and our model complexity, we choose to divide the whole substrate into five parts. The etch process of each parts is calculated by one kMC model. Furthermore, the data exchanges between the macroscopic model and the microscopic model are operated in every time step t s . The following sections present both the macroscopic fluid model and the microscopic kMC model.

Macrosopic Model
Basically, there are three main modules in computing the macroscopic fluid model: the Maxwell equations, the drift-diffusion equations and the heavy species transportation equations (simplified Stefan-Maxwell equations). The electromagnetic field that spread all over the plasma etch equipment is solved through the computing of Maxwell equations. Furthermore, the electron density and the electron energy density are solved through the computing of drift-diffusion equations. In addition, all the heavy species (neutral particles and ions) densities are solved through the computing of heavy species transportation equations. All of these equations are partial differential equations and thus can be solved by the finite-element methods.
Specifically, the gas mixture in plasma chamber would be Cl 2 and Ar. Twelve species and their corresponding reactions are taken into account. Four excitation forms of Cl 2 are included: Cl 2 V, Cl 2 (B3PI), Cl 2 (C1PI), and Cl 2 (B3PI+C1PI). Cl 2 V is generated by vibration excitations and the latter three are generated by electronic excitations. The excited energy of these four species are 0.068 eV, 2.5 eV, 3.12 eV, and 9.25 eV, respectively. Cl − is generated by attachment, and this negative ion owns a very small percentage of all ions. Two ionization reactions are computed and the reaction products are Cl + and Cl + 2 , respectively. In addition, both of their ionization energies are 11.48 eV-while only one excited form of Ar is considered, which is Ars (excitation energy is 11.5 eV). In addition, one ionic species Ar + (ionization energies are 15.8 eV and 4.24 eV) is taken into account. The reactions of these species are divided into three types: electron impact reactions, heavy species reactions, and wall reactions. The excited species and ions are all generated by ion impact reactions, which makes it the most important part in the plasma chemistry. The heavy species reactions happen when two heavy particles impact and react to form a new particle. Wall reactions occur when particles impact on the plasma chamber wall. We note that the latter two types of reactions would transform the excited species and ions to the ground state species. All of the reaction coefficients and reaction rates can be found in [28][29][30][31].

Microscopic Model
The kMC method applied in this work follows closely our former work [16] and share many similarities with other commonly used Monte Carlo methods. The details will not described again in this article. In addition, in order to show the completeness of the modeling process, we will present the main structure of the kMC model.
The KMC method uses stochastically selected events to represent all the phenomena of etching process, and realize a kinetic simulation. In Figure 2, we show the simplified schematic diagram of the virtual simulation box. The lattice, representing the substrate, consists of atomic cubic cells, and each of them would only include one atom. The side length of each cube is set as L = ρ −1/3 , where ρ is the atomic density of the substrate material. The black sphere above the lattice represents the particles from the plasma region. When it bombards onto the lattice surface, the etching parameters are calculated based on the material type, injection angle, and local coverage type. The atomic kinetic simulation is realized by adding or removing atoms on the lattice. In this work, the substrate material would be silicon. In addition, the resist material can be set as incorruptible since the sputtering rate of etchable resist is relatively low. Specifically, the initial structure of the lattice would be: the size of the microscopic silicon lattice is 100 × 100 × 100 monolayers 3 (ML 3 ); the resistive mask is placed above the substrate and the height of the resistive mask would be 50 ML; a 40 × 40 ML 2 surface is uncovered by the resistive mask and is placed in the middle of the whole surface.

Simulation Methods
The particle transportation process in vacuum is computed by the three-dimensional injection trajectories. First, when one particle from plasma injects into the simulation box, the particle type is selected based on the compared results between the flux data from plasma and a random number. In addition, the initial inject location is chosen as a randomly allocated point of the interface between the plasma domain and the simulation box. We note that only one particle would be simulated in one time step, and the transportation process is assumed without any collision, which is commonly assumed in the high vacuum process where the mean free path is large. Then, the angular functions of different fluxes are given, and the incident angles are randomly sampled from these functions. The expressions of these functions are based on in situ plasma measurements and are given in [32,33]. The simulation box is shown. The black sphere is the particle coming from plasma, which is above the vacuum region and not shown in simulation. Each cubic cell contains only one atom, and the lattice is formed by these cells.
When particles reach the substrate surface, two types of reactions occur. Neutral particles from the plasma domain that have the energy order kT can participate in the chemical reactions with surface atoms. The chemical reaction probabilities can be found in [31,34], and will stored in a reaction matrix. When a neutral particle reaches one surface cell, the corresponding reaction probability will be found in the reaction matrix and compared to a random number P = rand(0,1). If reactions occur, surface atoms will desorb or be replaced. The ions accelerated by the bottom electrode have the energy order eV, and will etch the substrate physically, like sputtering and ion-enhanced etching. We note that the physical etching rate is much larger than the chemical etching rate. In our case, the chemical etching rate is nearly zero due to the nonvolatile character of the chemical reaction product, while they can be sufficiently removed by ion enhanced etching. Meanwhile, the surface atoms can also be removed through physical sputtering. The etching yield expressions of ion-enhanced etching and sputtering can be found in [3,35,36]. When an ion impacts on one surface cell, the number of surface atoms that are etched is determined by the average etching field of its neighbor cells and itself.
The injected particle will be reflected to vacuum again if no surface reactions happen until it reacts or moves out of the simulation box. The desorbed etch products, which are generated by surface reactions, will re-emit to the vacuum until they reach the substrate surface or move out of the simulation box. Etch products can deposit on the substrate surface and the deposition probabilities can be found in [31]. The etch products will be reflected to vacuum again if no depositions happen until they deposit or move out of the simulation box.

Numerical Algorithm of the Model
At the beginning of every step, the particle type is selected based on the flux data that come from the plasma model. Then, the injection trajectory is computed based on the selection of the particle initial position and the injection angle. When all the simulative particles and etch products have reacted with surface atoms or moved out of the simulation box, one simulation step ends. A periodic boundary condition is used in the sidewalls of the simulation box. In addition, a perfectly absorbing boundary is applied on the interface between the simulation box and the plasma, which means all particles and etch products would not be computed again when they reach the boundary. The flowchart of the numerical algorithm of the microscopic model is shown in Figure 3.

Prediction Model Establishment
With the fluid model and the kMC model, we have established the multiscale model of the plasma etch process. However, long time horizon prediction and optimization are too computationally expensive by using the fluid model and the kMC model. The fluid model and the kMC model are all nonlinear models with dynamic behaviors, while RNN can also describe nonlinear dynamic behaviors since it has the memory of past states. As is shown in Figure 4, the states derived from past inputs are imported into the next RNN cell, which shows the dynamic behavior. Through the open source software like pytorch and tensorflow, RNN can be easily established. In addition, thanks to the recent development of parallel computing acceleration technique of GPU, the training process of RNN can be quite efficient. Thus, RNN is an appropriate approach to address our issue. In the following parts, we will present the data generation process and the training process of the applied RNN models. From previous work, the theorem is given in which RNN can approximate any dynamic system to arbitrary accuracy [25]. However, it should be noted that RNN models are nonlinear models without stochastic characteristic, which means that the computation results under the same initial condition and inputs would be constant. Thus, using simple RNN models as the prediction model cannot capture the natural stochastic characteristic of the kMC model. In this article, a probability function is analyzed to describe this important part in the microscopic kMC model. The data for training and analyzing are from the open-loop simulations of the fluid model and the kMC model. In order to capture all the system dynamic behaviors, the open-loop simulations should be operated with a combination of different initial states and input values. In addition, the data are sampled at discrete time steps. Two RNN models will be presented to approximate the fluid model and the kMC model, respectively. To distinguish these two RNN models, the RNN model that approximated the macroscopic fluid model is named RNN f , and the other one is named RNN k . We will separately illustrate the data generation process and training process of the RNN f model and the RNN k model.

Data Generation
As we mentioned in Section 2, the inputs of the fluid model would be: Ar/Cl 2 ratio of the input gas at edge inlet (R 1 ), Ar/Cl 2 ratio of the input gas at center inlet (R 2 ), the power of the top coils (P r f ), and the bottom electrode bias (V B ). In addition, the outputs would be the particle flux data on the interface of the fluid model and the kMC model, which is noted as F i,k (i is the particle type number and k is the location number). The sampling period of the fluid model open-loop simulation would be t s = 0.2 s. We note that this time step is enough for the plasma states to coverage from any different initial states to the final states. Thus, we only need to consider the selection of the inputs. Specifically, the inputs are selected randomly at the beginning of every time step, and the selection range of each inputs would be: 5/100 ≤ R 1 ≤ 50/100, 5/100 ≤ R 2 ≤ 50/100, 800 W ≤ P r f ≤ 1400 W, 50 V ≤ V B ≤ 250 V. The data are finally obtained from an open-loop simulation which includes the 800 sampling periods.

Training Process
The inputs of RNN f would be the values of R 1 , R 2 , P r f , and V B at every time step t k (t k = kt s ). The outputs of RNN f are the values of F i,k after a t s simulation. The RNN f model is constructed by the RNN module of pytorch. The python version is 3.0, the torch version is 1.2.0, and the cuda version is 10.0. The GPU that is used for training is GTX 1080ti. We use the k fold cross validation method to establish four sub RNN models. In addition, an ensemble method is used: the RNN f model would be the ensemble of these four sub RNN models. The training process of these four sub RNN models and the RNN f model is shown in Figure 5: four sub RNN models are trained, and the RNN f model is the average of these four sub models. The validation data are not used for training the sub models. We note that using the ensemble of multiple RNN models can improve the model prediction accuracy and capture more system dynamic behaviors. The parameters of these four sub models like neuron numbers and layer numbers are determined by the grid search method. Specifically, there will be four layers in the recurrent layer and two linear layers as well as one Rectified Linear Unit (ReLU) layer in the output layer. The parameter optimization algorithm is the Adam optimization method. The loss function is mean square error (MSE). In order to avoid the over-fitting problem, the training process would be finished if the loss falls below the desired threshold (which is set to be 10 −6 ). The inputs and outputs data are normalized by the maximum and minimum values of each variable: where x j is the jth variable of the data, and x max,j and x min,j are the maximum and minimum values of the jth variable.

Microscopic Model
We note that the kMC model is a nonlinear model with a stochastic feature. While RNN is not suitable to capture the stochastic feature, we use RNN k to approximate the expectation of the kMC model. The stochastic feature is described by a probability function that is analysed based on the open-loop simulation data. The following parts will present the establishment process of the RNN k model and the probability function.

Data Generation
As we mentioned in Section 2, the inputs of the kMC model are the particle flux data from the plasma model (F i,k ) and the bottom electrode bias (V B ). The outputs of the kMC model should be the surface topology structure of the lattice. Thus, the height data of the 40 × 40 ML 2 uncovered surface sites are sampled as the output data. The sampling period would also be set as t s = 0.2 s. Then, the training data are generated through a 150 run open-loop simulation of the kMC model. The initial condition of each simulation is set as the same as the initial condition mentioned in Section 2.2. The operation time is set as 15 s, thus 75 sampling periods are included in each simulation, and the size of the training data would be 150 × 75 = 11,250. At the beginning of each sampling period, the inputs are randomly selected from the pre-tested ranges (these ranges should be consistent with the simulation results of the fluid model).
In order to analyze the stochastic characteristic of the kMC model, multiple run simulation under the same initial condition and inputs should be operated. Thus, we use the same method as above to generate the open-loop data; only the kMC model is operated 100 times instead of one time in each sampling period. A 20 run open-loop simulation is carried out, and the data size would be 40 × 75 × 100 = 300,000.

Training Process
We note that RNN k is applied to approximate the expectation of the microscopic model. Thus, the training data should be the average data of multiple runs open-loop simulation under the same initial condition and inputs, while, from pre-experiments, we found that the RNN model trained by using the single run open-loop data and the RNN model trained by using the average of multiple run open-loop data are quite similar. Thus, we can directly use the single run open-loop data to train the RNN k model. The inputs of RNN k are the values of V B and F i,k at every time step t k . The initial hidden states of RNN k are the initial surface topology structure (the height data of the 40 × 40 mL 2 uncovered sites) of the kMC model. The outputs are the final surface topology structure of the kMC model after a t s etch process. The parameters of RNN k like neuron number and layer number are determined by the grid search method. The neuron numbers of RNN k would be much larger than those of RNN f because the inner state and output state dimensions of RNN k are much larger than those of RNN f . Specifically, there is one layer in the recurrent layer and four linear layers as well as three Rectified Linear Unit (ReLU) layers in the output layer. The training and computation time of the RNN k model would be relatively long and therefore the ensemble method would not be applied here. The other settings of the training process are the same as the RNN f model.

Stochastic Characteristic Analysis
Due to the stochastic characteristic of the kMC method, the height of the lattice surface sites will stochastically oscillate around the expected height. In order to capture this important feature, we analyze the statistical property of the lattice surface height data. The sample data generation process is shown in the previous section. The variances of the height data are computed and most of the variance values are in range (0,5). The statistical properties of the variance are shown in Figure 6. Except a few beginning time steps, the mean values and the variances of the height variance oscillate around a constant value. Thus, we can assume that the variance distributes in range (0,5), and this distribution will not change over time. Then, the sampling statistical histogram of the variance at a typical moment and the sampling statistical histogram of the variance of all time steps are shown in Figure 7. The variance range is divided into 200 parts, and the corresponding sample numbers are given. This sampling statistical histogram of all time steps is quite similar to the sampling statistical histogram at a typical moment, which suggests that our previous assumption is reasonable. Then, a probability distribution curve is computed to fit the statistical histogram of all time steps. We note that the probability distribution curve in the figure is quite similar to the probability distribution curve of chi-square distribution, which suggests that the origin distribution of the lattice surface height data might be normal distribution. The probability distribution curve is then used as the variance probability function to simulate the stochastic oscillation of the surface sites. Combined with the RNN k model, the prediction model of the kMC model is established and is called RNN k,p .

Optimization Computation
The optimization goal is to achieve the desired average etching depth and average bottom roughness within the least amount of time. The RNN models and the probability function developed in the sections above are to predict the plasma state and the etch structure over t ∈ [t k ,t k+1 ]. Long time horizon prediction is established by using the previous prediction results as the initial data for the current prediction. We note that the model error between the RNN models and the fluid model as well as the kMC model will accumulate over time and will lead to a huge prediction error at the final time. Thus, computing a single optimization problem from the beginning time step to the final time step will not be enough to compute the real optimization trajectories. Therefore, we apply a moving optimization method to compute the optimization trajectories, which limits the model error in one time step instead of the whole time horizon. In every time step t k , an open-loop optimization problem is computed. The optimizing time range is from t k to t f inal and the optimized parameters are R 1 , R 2 , P r f , V B . The initial condition is from the fluid model and the kMC model. The optimization problem is written below: where w D , w r , and w t are the weight of the penalty on etching depth, bottom roughness and etching time, respectively;D f inal ,D set ,r b, f inal , andr b,set are the average etching depth at final time, the set average etching depth, the average bottom roughness at final time, and the set average bottom roughness, respectively; S init is the initial surface topology data (generated from the kMC model); R min , R max , P min , P max , V min , V max are the lowest and the highest bound of the optimized parameters. We note that the optimization problem can be solved in Matlab with the nonlinear programming (NLP) tool box. Specifically, we use the multistart function in Matlab and the solving algorithm is sequential quadratic programming (SQP). The multistart function is able to calculate multiple local solutions with several initial points and select the best solution as the global optimization solution. When the optimization problem is solved, the optimized parameters for the current time step will be stored and the optimization problem is resolved at the next time step. It should be noted that the optimization solution calculated at each time step would not be the real optimal solution due to model error. This error is limited in a single time step and is corrected in the next time step because the results of the fluid model and the kMC model are applied as the initial condition in the optimization problem. However, the global optimization solution is hard to obtain because the solutions at past time steps have already deviated from the global optimal trajectories.

Simulations and Results
The plasma chemistry and microscopic structure of the etched lattice have been described in Section 2. The work pressure of the plasma chamber is set as 40 mTorr and the chamber temperature is set to be 60°C. The ICP is excited by means of a ratio frequency (RF) power at 13.56 MHz supplied to the upper coils. Figure 8 shows an open-loop cross section profile evolution that was captured at 0 s, 3 s, 6 s, 9 s, 12 s, and 15 s. The etching depth is defined as the average etching depth (D) of all uncovered surface sites, and its unit is mL. Due to the existence of the resistive mask, the bottom surface is getting narrower over time. The red lines are used to define the range of the bottom surface in order to describe the bottom surface roughness more accurately. The bottom roughness (r b ) is defined by variance and its unit would be mL 2 : where N is the total number of the counted bottom surface sites, h i is the height of the surface site at position i, andh is the average height of all counted bottom surface sites.
In the following, we will present the results of the RNN models validation process and the optimization simulation results.

Validation of RNN Models
The prediction performance of the RNN models is carried out through the validation process. Figure 9 shows the Cl 2 flux evolutions at location 1. The solid line represents the flux evolution of the fluid model, and the dotted line represents the predicted flux evolution of the RNN f model. The flux data are sampled in every 0.2 s, and the model inputs are randomly selected from the selection range. Figure 10 shows theD andr b evolutions and the minimum and maximum values ofD andr b among the 100 run kMC model simulations. In Figure 10a

Optimization Simulations
The optimization goal is to achieve the desired average etching depth and average bottom roughness within the least amount of time, and the optimized parameters would be R 1 , R 2 , P r f , and V B . From the 100 run simulation, the final average etching depth and average bottom roughness after a 10 s etching process with R 1 = 12/100, R 2 = 12/100, P r f = 1000 W, V B = 150 V are 40.4293 ML and 11.0977 ML 2 , which are chosen asD set and r b,set . Subsequently, we compute the moving optimization problem to calculate the optimal trajectories and the optimal total etching time. To emphasize the necessity of using the moving optimization method, we will also show the optimization results by solving a single optimization problem from the beginning time step to the final time step. Figure 11 shows the optimal trajectories of R 1 , R 2 , P r f , V B and the evolutions ofD and r b . Both results by using the moving optimization method and the single optimization method are shown. The average etching depth and average bottom roughness reach the desired level concurrently at 7.4 s by utilizing the moving optimization method, while the desired objects are not completed at the final time by utilizing the single optimization method. As we mentioned above, the model error will accumulate over time and the error would be so large that we cannot complete the desired objects by solving a single optimization problem. It should be noted that the optimized inputs would be largely modified in one time step when using the single optimization method, and this modification is feasible. Such techniques can be found in [37]. We also note that the optimization results by using the moving optimization problem will not be the global optimal solution due to the existence of the model error. Although the solutions might not be the best, using the moving optimization method can address the model error accumulation problem, and the desired average etching depth and bottom roughness are achieved in a shorter amount of time.  It should be noted that, although the microscopic processes of the real etch process are the same given a specific etch chemistry, the macroscopic dynamics and reaction kinetics of the plasma are highly equipment dependent with respect to the accuracy needed for real applications. While such high fidelity models are not openly available, equipment vendors typically have their own proprietary plasma models for their plasma processing equipments, and these models can be easily integrated into our multiscale modeling framework when needed.

Conclusions
In this work, a multiscale modeling and RNN based optimization framework of the plasma etch process on a three-dimensional substrate with uniform thickness using the inductive coupled plasma (ICP) was proposed. The macroscopic plasma model is simulated by a continuous fluid model and the microscopic etching process is simulated by a kMC model. A spatial-temporal discrete method is applied to compute these two models concurrently. Then, the RNN f model and the RNN k model are trained based on the open-loop simulation data of the fluid model and the kMC model, in order to reduce the computational complexity of these two models. The stochastic characteristic of kMC model is described by a probability function. Then, long time horizon prediction and optimization are calculated based on the RNN models and the probability function. A moving optimization method is applied to overcome the model error accumulation problem. The optimization goal is to achieve the desired average etching depth and bottom roughness within the least amount of time. In simulations and results, the validation experiments of RNN models are first implemented, and then the simulation results applying the optimized parameters are presented. The results demonstrate that our prediction model is accurate enough and the desired average etching depth and bottom roughness can be achieved in a shorter amount of time.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.