Optimal Control of SiC Crystal Growth in the RF-TSSG System Using Reinforcement Learning

: We have developed a reinforcement learning (RL) model to control the melt ﬂow in the radio frequency (RF) top-seeded solution growth (TSSG) process for growing more uniform SiC crystals with a higher growth rate. In the study, the electromagnetic ﬁeld (EM) strength is controlled by the RL model to weaken the inﬂuence of Marangoni convection. The RL model is trained through a two-dimensional (2D) numerical simulation of the TSSG process. As a result, the growth rate under the control of the RL model is improved signiﬁcantly. The optimized RF-coil parameters based on the control strategy for the 2D melt ﬂow are used in a three-dimensional (3D) numerical simulation for model validation, which predicts a higher and more uniform growth rate. It is shown that the present RL model can signiﬁcantly reduce the development cost and offers a useful means of ﬁnding the optimal RF-coil parameters.


Introduction
Silicon carbide (SiC) crystal is a promising semiconductor material of power devices and the radio-frequency (RF) top-seeded solution growth (TSSG) method that has been used to produce high-quality SiC crystals. However, the unstable growth and slow growth rates of SiC crystals prevent the utilization of the TSSG method to grow large single crystals in industrial setups. In the RF-TSSG process, maintaining the uniform growth rate along the seed within a certain range can stabilize crystal morphology, and the melt flow during the crystal growth plays an important role in the change of growth rate. To improve the quality of SiC crystals, we have conducted numerical simulations to shed light on the phenomena governing the SiC growth in this process [1][2][3][4][5]. Most studies have carried out simulations of the melt flow using a particular condition, that is, under certain control parameters of the TSSG system, such as a fixed input power for the RF-coil, prescribed boundary conditions, the magnetic field strength, and seed rotation rate.
It is standard for a design parameter in a crystal growth system to be optimized one by one through numerical simulations. However, the design parameters involved are many and may have a combined effect on the melt flow. Thus, it would be inefficient and expensive to optimize the whole system by means of numerical simulations alone. For instance, in the TSSG process of SiC crystals with the control of static magnetic fields, Takehara et al. [6] applied a Bayesian optimization to determine the optimal set-up of a cusp magnetic field and seed rotation for high-and uniform-crystal growth rates. However, with the increase of control parameters to be taken into account, this approach becomes costly because the algorithm requires a large amount of data set. To this end, a fast and proper design of the TSSG system for an optimal control of the growth rate is required. To the best of our knowledge, the optimization and control models developed previously were only for the Czochralski growth process. The first-principle is used to establish a prediction model that relates input design parameters to output crystal parameters [7][8][9][10][11]. Some studies have focused on developing a model that represent the relationship between the changes of the crystal radius and the crystal slope angle at the meniscus section [12][13][14]. A proportional integral derivative (PID) design has been used to control the growth process based on the developed models [15,16]. Basically, all those models introduced earlier are based on a series of differential equations but do not fully involve the physics of fluid mechanics. In a more complicated crystal growth process, however, the improvement of the accuracy of the models remains a challenge, and PID control still has deficiencies in dealing with nonlinear and high-dimensional problems.
In recent years, the application of machine learning has received a notable attention in fluid control problems. This is because this approach allows to examine completely different cases and gets results faster. The combination of appropriate machine learning tools and fluid mechanics knowledge can be used to directly optimize the control strategy, to reduce or even eliminate the artificial control modeling and design, and to change the traditional approach. In the field of crystal growth, Dropka et al. [17] designed and trained artificial neural networks (ANNs) in directional solidification of silicon to identify the relation and the optimum combination of magnetic fields and growth parameters through the data of 2D CFD simulation. As expected, the accuracy of the model naturally depends on the amount and accuracy of the available data set in a given system. On the other hand, Reinforcement Learning (RL), which is one of the machine learning tools recently widely utilized in the field of optimal control of fluid flows [18][19][20][21], can automatically discover the optimal control strategies without any prior knowledge. This approach presents itself as a powerful tool in general in modelling, and would naturally be beneficial for modelling crystal growth techniques. To this end, in the present study, we introduce the RL technique to the TSSG system to control the melt flow during the SiC crystal growth process. In the RL model, an 'agent' tries to learn the policy to maximize a 'reward' function through an certain 'environment'. The environment can be any stochastic process. For example, the numerical simulation of the SiC melt flow can be taken as the 'environment' in this study. The agent first obtains the state of the simulation (environment). Then the agent performs 'actions' to affect the time evolution of the simulating melt flow (environment). After receiving the reward from the state of the environment controlled by actions, the agent completes one control loop. In the TSSG process, maximizing both the growth rate and its uniformity simultaneously is essential for growing high quality crystals. This is the main objective of the present study.  [22,23]. The IPM solves the process in three steps: (i) the coil-induced electromagnetic field; (ii) heat generation and heat transfer in the furnace; and (iii) the melt flow in the crucible. A 2D numerical simulation for the melt flow was taken as an interactive environment in the reinforcement learning process as explained later.

Electromagnetic Field
In RF-heating, the frequency of the electric current in the coil is too high to resolve the time resolution by numerical simulations. Thus, IPM uses period-averages for calculating the densities of the Lorentz force and heat generation. The Lorentz force and heat generation are calculated by: (1) where ω is the frequency of electric current in the coil, σ e is the electric conductivity, and J 0 the peak current in the coil, C and S are time-independent in-phase and out-phase amplitudes of the magnetic stream function.

Heat Transfer in the Furnace
The steady-state conductive and radiative heat transfers are considered for computing the temperature field in the furnace. The associated governing equations are given by: (3) where k is the thermal conductivity, T temperature, Q the Joule heat generation density, J i the radiosity, ε i emissivity, F ij the view factor, E b,i the emissive power of a black body and q i /A i the heat flux.

The Melt Flow
The governing equations of the melt flow are the well-known continuity, momentum balance, energy balance, and mass transport equations that take the following forms, ∂T ∂t ∂c ∂t where u is the flow velocity vector, ρ density, p pressure, µ the kinematic viscosity, a represents the control value that will be explained later, g the gravitational acceleration, β the thermal expansion coefficient, T re f reference temperature, α thermal diffusivity, C p the specific heat, D the diffusion coefficient, and c the carbon concentration. The initial and boundary conditions of the melt flow are obtained from the results of the heat transfer simulation in the furnace, and the overall simulation is actually coupled with the computation of electromagnetic field, heat transfer in the furnace, which are described in Sections 2.

Reinforcement Learning
Reinforcement learning involves an agent built by ANNs interacting with an environment. In this study, numerical simulation is regarded as the interactive environment through three steps: the agent makes an observation of the state s t (an array of fluid flow variable obtained from the simulation), imposes the action a t on the simulation, and computes a reward r t from the controlled simulation. Here, t is the discrete time step when the interaction takes place. The optimal control problem is aiming to learn an optimal policy that maximizes the expected cumulative reward.
where γ is a discount factor, π Θ is the policy function described by ANN (Θ is the weights). The current RL model training is based on an episode, which means that the model will learn active control strategy in a limited time before analyzing the obtained results and resume learning with a new episode. The sketch in Figure 2 presents the one episode of the learning process, interacted with the simulation of fully developed melt flow (started from 700 s [1]) in the TSSG process. The RL agent interacts with the 2D melt flow simulation via a state inquiry, and an action decision is made at every T = 0.25 s during the simulation, and one episode training lasts 5.0 s. The states in the current simulation are the supersaturation near the seed with 50 sample points, which are shown in Figure 2. Wang et al. [5] reported that the melt flow can be controlled by the RF-coil induced electromagnetic field, and the contribution of electromagnetic field is described as a source term in Equation (6). The value of a in the initial case without control is equal to 1.0. To simplify the calculation process, a reference case (at the RF-coil frequency 25 kHz and current 360 A) that performed in [1] is directly used as an initial case in the present study. According to the computational results of the effect of electromagnetic field in [5], applying the Lorentz force twice in magnitude (compared to the initial case) is detrimental to crystal growth. Therefore, the output action range a is limited between 0 and 2 when the model training is carried out. It should be noted that the parameters of the RF-coil control the electromagnetic field, and change the heat generation and temperature boundary conditions. For simplicity, we set the control value a representing the parameters of the RF-coil that ideally control electromagnetic field strength, and the heat generation and temperature boundary conditions are assumed not to be changed in the training process, the solution of the changing of temperature boundary condition under control during the training process is discussed in later Section 3.2. Due to the aim of improving the growth rate, the instantaneous reward function, r t , consists of two contributions: the growth rate gradient (uniformity) and the growth rate, where T is the per action duration time, v g is the growth rate, M SiC is the molar weight of SiC, ρ SiC is the crystal density, and G x is the growth rate gradient along the seed interface. The growth rate of the reference case around the seed edge is extremely high compared with those on the other interface positions [1], and thus calculating G x through the full seed radius makes the training process very difficult to converge. Thus, a partial growth interface (0-3 mm) is used in Equation (12).
To balance the growth rate contribution, a factor of 0.001 is set in Equation (10) due to the large order difference between G x and v g . The agent consists of simple feed-forward ANNs with a hidden layer of 512 neurons. The discount factor γ is set as 0.95. Proximal Policy Optimization PPO algorithm [24] is used to update the agent, it belongs to the policy gradient class. The details of the policy gradient can be found in the article [25].

Growth Rate Improvement through Lorentz Force Control
By adapting the algorithm and hyperparameters mentioned in Section 2.2, we performed a robust RL training. The reward of every training episode is shown in Figure 3. As seen, the reward increases quickly after 40 episodes and it is converged after 100 episodes. Therefore, we consider that the policy is the close to the optimal one can be obtained after 100 episodes. The policy at the 120th episode was chosen to control the initial case over 50 s (which is 10 times longer than training time). Figure 4 shows the value of the action a in 50 s. The red point in the figure is the initial case (a = 1.0). For the case with control, it is clear that the Lorentz force was enhanced, the enhancement is about 1.5 times to 1.8 times the initial value (a = 1.5 to a = 1.8). Meanwhile, there is no obvious pattern of the action sequences, according to our previous study [3], the results may be due to the effect of the unstable flow generated by electromagnetic and interfacial forces.  The growth rate uniformity and its value for the initial and controlled cases are compared in Figure 5. Although the growth rate gradient in the case without and with control partially overlaps during the simulation time, the gradient in the case with control is generally smaller than that in the case without control. The smaller growth rate gradient indicates that the uniformity is improved. On the other hand, the value of growth rate in the case with control increased significantly (almost twice) compared with that in the case without control, as seen in Figure 5b. The results indicate that the present policy for the Lorentz force can increase the growth rate and improve its uniformity at the same time.  Figure 6 shows the time-averaged temperature field in the melt without and with optimal control. The hottest part is located at the bottom corner of the crucible wall and the lowest temperature region is near the seed. In the case with control, the temperature field beneath the seed is flatter. The time-averaged flow velocity and supersaturation in the melt are presented in Figure 7, which more directly shows the comparison between the initial and controlled cases. In the computations, the supersaturation S is calculated by using (c − c eq )/c eq , where c eq is the equilibrium concentration [1]. The flow patterns of the two cases are very similar, which are characterized as the electromagnetic convection induced by Lorentz force in the main region of the melt and Marangoni convection along the free surface [4]. The directions of Marangoni convection are towards the crystal and crucible on the free surface due to the low temperature region in the vicinity of the crystal and upper corner of the crucible wall as seen in Figure 6. In the initial case, the Marangoni effect gives rise to a strong downward flow near the seed. This is the reason we predict non-uniform growth rate and a lower rate in this case. In the case with control, the downward flow is weakened significantly by the effect of the upward flow induced by the Lorentz force. Thus, as seen in Figure 7, we predict a more uniform supersaturation distribution below the seed.
The surface supersaturation along the crystal radius is quantitatively plotted in Figure 8. The predicted results of the cases without and with control are presented. We see that, after applying the by trained RL model, the supersaturation value and uniformity are apparently improved. It should be noticed that the growth rate non-uniformity in the 2D simulation is overestimated compared to that in the 3D simulation [1,3]. Thus, the result of optimal control in Figure 8 could be different in 3D simulations.

Discussion of the Optimal Control
In the previous section, RL is trained to optimally control the melt flow by directly adjusting the Lorentz force strength. If we consider a real case about changing the Lorentz force, the natural way is to change the frequency and current of the RF-coil. Figure 9 shows the relationship between the Lorentz force, heat generation density, and the frequency and current of the RF-coil. We can input different parameters with time according to the action of Figure 4. However, it arises some problems. For instance, adjusting the RF-coil not only changes the electromagnetic field but also changes the heat transfer in the furnace, which cannot ensure the accuracy of the control model. More importantly, the input parameters are usually fixed values before carrying out the crystal growth, which means the real-time control of the Lorentz force is quite difficult. Therefore, we implemented a compromised method to guarantee the same temperature boundary condition and a constantly optimized parameter. As in Figure 4, the optimized a fluctuates between 1.5 and 1.8, so that optimal electromagnetic field strength should in the range of 1.5 and 1.8 times the initial electromagnetic strength. Due to the reason that Marangoni induced downward flow is overestimated in 2D simulation, here, the minimum value of a = 1.5 is chosen as the optimized parameter in the 3D case. Figure 9 shows the frequency and current dependency of Lorentz force and heat generation. The value of heat generation should be close (Q max ≈ 1.32 × 10 7 W/m 3 ) for keeping the similar boundary condition between the initial and optimized case, and Lorentz force is 1.5 times the initial case (Fe max ≈ 29,000 N/m 3 ). According to the calculation results in Figure 9, the optimized input parameters were then estimated: (1) The initial case (at 25 kHz the coil frequency, 360 A the coil current).
(2) The optimized case (at 18 kHz the coil frequency, 390 A the coil current). In order to validate the optimization method, the initial and optimized parameters are applied in the 3D system. The computed Lorentz force density and temperature along the crucible for the initial and optimized cases are shown in Figure 10. As seen from the comparison of those two cases, the maximum value of Lorentz force density in the optimized case is 28,200 N/m 3 , which is around 1.5 times that in the initial case 19,137 N/m 3 , this is in line with our proposed optimization plan (a = 1.5). The temperature distributions along the wall and seed in Figure 10b,d are similar, and the temperature difference between the seed and crucible walls is almost the same. Thus, the thermocapillary numbers of the initial and optimized cases are very close, Re σ ≈ 4.8 × 10 5 [3], which means that at 18 kHz and 390 A, only the electromagnetic flow is enhanced without changing Marangoni convection significantly. The time-averaged flow velocity and supersaturation in the 3D melt for the initial and optimized cases are presented in Figure 11. As seen, the Marangoni induced downward flow is significantly weakened in Figure 11b. In the enlarged view of the region below the seed, the flow velocity is apparently weaker in the optimized case. Usually, as the Lorentz force gets stronger the flow becomes more unstable. However, in this case, the flow is more stable for the stronger Lorentz force. This is due to the competition between the Marangoni downward flow and the electromagnetically induced upward flow. The results indicate that the effect of Marangoni flow is reduced, and the melt flow becomes more stable under the optimized RF-coil parameters.
The supersaturation along the seed diameter is plotted in Figure 12. It is clear that in the optimized case we predict more uniform supersaturation and higher supersaturation distribution on the crystal surface. These results show the validation of the present optimization model. It should be noticed that present optimized parameters are rough values that are close to the accurate optimal condition since they were chosen based on an estimation of the optimal control strategy discovered by the RL model in Section 3.1. It would be more accurate if we say that proposing EM/Re σ 2 is in the range of 0.006 and 0.0072 is an optimization range for the selection of input variables. Here, EM/Re σ 2 represents the ratio of electromagnetic and Marangoni forces [5].

Conclusions
A reinforcement learning model was developed to optimally control the top-seeded solution growth (TSSG) process. The model was trained by the 2D numerical simulation, and it improved the growth rate (supersaturation) of SiC crystal through the automatic control of electromagnetic field strength. The model accuracy was validated using a constant optimized parameter value (at 18 kHz and 390 A) to the 3D system according to the obtained optimal strategy range by the RL model in the 2D TSSG process. The selected optimized parameter enhanced the electromagnetic field without significantly changing the heat generation, and the supersaturation along the crystal diameter is also improved.
Based on the characteristics of the RL model that is capable of high-dimensional output, other parameters such as seed and crucible rotations, RF-coil position, and external magnetic fields, can be simultaneously optimized and the model needs to be validated by experiments in the future work, to test the potential of RL in the field of crystal growth.