2.1. Finite Element Model
The coating process for hemispherical resonators is conducted in a vacuum environment at a pressure as low as 5 × 10⁻⁵ Pa. The corresponding air density is extremely low, which is crucial for ensuring the quality and uniformity of the deposited film. The motion of gas molecules within the vacuum chamber can be characterized by the Knudsen number (Kn), a dimensionless parameter used to identify the flow regime in a fluid system [19]. It is defined as the ratio of the mean free path (λ) of the gas molecules to a characteristic physical dimension (L) of the system, such as the diameter of a pipe or the thickness of a film being deposited. It is given by

$$\mathrm{Kn} = \frac{\lambda}{L} \qquad (1)$$
The mean free path (λ) of gas molecules is the average distance that a molecule travels before colliding with another molecule. It can be calculated using the following formula:

$$\lambda = \frac{kT}{\sqrt{2}\,\pi d^{2} P} \qquad (2)$$

where k is the Boltzmann constant (1.38 × 10⁻²³ J/K), T is the absolute temperature of the gas (in Kelvin), d is the diameter of the gas molecules (in meters), which is generally an estimated value, and P is the gas pressure (in Pascals).
The vacuum chamber pressure is maintained at 5 × 10⁻⁵ Pa during the coating process. The characteristic length, defined as the distance from the evaporation source to the resonator, is 0.35 m. Based on the above equations, the Knudsen number during the coating process is approximately 5.14 × 10². Gas flow is generally regarded as free molecular when the Knudsen number exceeds 10; therefore, the coating process can be simulated as free molecular flow [20].
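The Knudsen-number estimate can be reproduced in a few lines. This is a minimal sketch: the gas temperature T = 300 K and molecular diameter d = 3.7 × 10⁻¹⁰ m (a common estimate for air) are assumptions, because the text does not state the values used. With them, Kn comes out on the order of 10², well inside the free molecular regime; the exact 5.14 × 10² quoted above depends on the authors’ T and d.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def mean_free_path(T, d, P):
    """Mean free path (m): lambda = k T / (sqrt(2) * pi * d^2 * P)."""
    return K_B * T / (math.sqrt(2) * math.pi * d**2 * P)

def knudsen_number(T, d, P, L):
    """Knudsen number: Kn = lambda / L."""
    return mean_free_path(T, d, P) / L

# Assumed values: T and d are NOT given in the text; P and L are.
kn = knudsen_number(T=300.0, d=3.7e-10, P=5e-5, L=0.35)
regime = "free molecular" if kn > 10 else "transitional or continuum"
print(f"Kn = {kn:.3g} ({regime})")
```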
COMSOL Multiphysics with its free molecular flow module is used to solve the film deposition problem numerically. The hemispherical resonator is an axisymmetric structure composed of a hemispherical shell and a stem, and the junction between the stem and the shell has a rounded transition. The vacuum chamber is a cylinder with a diameter of 300 mm and a height of 1000 mm. The electron-beam (E-beam) evaporation source is located directly below the resonator, which is mounted 350 mm above it. A Quartz Crystal Monitor (QCM) installed at the top of the vacuum chamber monitors the thickness of the deposited film in real time. The resonator can rotate about two axes, and its inner wall is the target region for deposition. The axial rotation of the resonator is modeled with a rotating domain at a speed of 4 rad/s, and the oscillation of the resonator is represented by the initial deposition angle. In the simulation, the double-layer film of the resonator is simplified to a single layer, because the film material has minimal impact on the simulated uniformity. The model contains a total of 79,026 mesh elements, with refined meshing on the inner wall of the resonator. The spatial arrangement of the components is shown in Figure 1a. The parameters and dimensions of the resonator are presented in Figure 1b and Table 1, and the meshed resonator is illustrated in Figure 1c.
The hemispherical resonator is made of fused quartz, and the film consists of chromium and gold. To reduce computational cost, a single chromium layer is simulated in this paper. During the simulation, the resonator also rotates about its stem to ensure circumferential film uniformity. Because coating is a repetitive and cumulative process, it is unnecessary to simulate the entire process in detail. Instead, this paper presents the simulated film thickness distribution on the resonator at the moment the QCM reading reaches 1 nm. Reaching 1 nm takes 5 s, which is close to actual conditions. These data can then be used to extrapolate the film thickness distribution over the entire coating duration.
The coated areas include the inner surface of the shell and the stem. Simulation results indicate that the vast majority of the elastic energy (98.65%) is stored in the shell, while only a small portion (1.35%) is stored in the stem [21]. Film thickness uniformity is therefore critical in the shell region, because poor uniformity may increase energy losses. In the stem region, the film needs to be thick enough to ensure electrical conductivity, and uniformity is less important.
2.2. Simulation Results
The angle between the central axis of the hemispherical resonator and the evaporation source is denoted as θ, as shown in Figure 1a. The film thickness distribution of the resonator was simulated for θ = 0°, 10°, 20°, 30°, 40°, 45°, 50°, 60°, 70°, and 80°.
Figure 2 illustrates the key characteristics of film deposition at various angles. At θ = 0°, the highest deposition rate occurs on the inner wall at the center of the hemisphere, because the surface normal in this region is closely aligned with the deposition direction. In contrast, the peripheral regions of the hemisphere and the stem receive minimal deposition, because their surfaces are nearly parallel to the deposition direction. As θ increases through 10°, 20°, 30°, 40°, and 45°, the point of maximum deposition shifts from the center of the hemisphere toward the periphery, and effective deposition also appears on the stem. At θ = 50°, 60°, 70°, and 80°, the deposition rate on the inner wall near the periphery exceeds that at the center of the hemisphere. In addition, a new phenomenon emerges: because the spherical shell shadows its own center, the deposition rate at the center of the hemisphere drops to zero.
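These trends can be illustrated with a deliberately simplified line-of-sight model, which is a hypothetical sketch and not the COMSOL free molecular flow model. It assumes an isotropic point source at the origin, the sphere center 0.35 m above it, an assumed shell radius R = 15 mm (the actual dimensions are in Table 1), and a shell axis tilted by θ; the stem and chamber walls are ignored. Flux on the inner wall follows the cosine law with inverse-square falloff, and a ray is treated as blocked when it would have to pass through the closed side of the shell before reaching the point.

```python
import numpy as np

def inner_wall_flux(u, theta_deg, R=0.015, H=0.35):
    """Relative line-of-sight flux on the inner wall of a hemispherical shell.

    Hypothetical simplified model: isotropic point source at the origin,
    sphere centre at (0, 0, H), shell axis tilted by theta_deg in the x-z plane,
    stem and chamber walls ignored. R (m) is an assumed shell radius.
    u: direction from the sphere centre to the surface point.
    """
    th = np.radians(theta_deg)
    a = np.array([np.sin(th), 0.0, np.cos(th)])   # axis from centre to pole
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    if u @ a < -1e-12:
        raise ValueError("point is not on the closed hemisphere")
    p = np.array([0.0, 0.0, H]) + R * u            # surface point
    r = np.linalg.norm(p)
    d = p / r                                      # unit ray direction, source -> point
    cos_inc = u @ d                                # > 0: the ray arrives at the inner side
    if cos_inc <= 0.0:
        return 0.0
    # The ray entered the sphere at the mirrored point u_q; if u_q lies on the
    # closed hemisphere, the shell itself blocks the ray (self-shadowing).
    u_q = u - 2.0 * cos_inc * d
    if u_q @ a > 0.0:
        return 0.0
    return cos_inc / r**2                          # cosine law with inverse-square falloff

def pole(theta_deg):
    """Unit vector pointing at the centre of the inner wall (the pole)."""
    th = np.radians(theta_deg)
    return [np.sin(th), 0.0, np.cos(th)]

for theta in (0, 40, 50, 80):
    print(theta, inner_wall_flux(pole(theta), theta))
```

In this idealized geometry the pole is shadowed once θ exceeds roughly 45°, because a ray reaching the pole must then cross the closed half of the shell first. This matches the trend reported for θ ≥ 50°, although the real distributions come from the full COMSOL model.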
In this study, the film thickness uniformity in the circumferential direction of the resonator was analyzed first, examining the effects of deposition angle, circumferential position, and deposition time. The circumferential film thickness distribution at the middle of the inner shell from 0 to 5 s at θ = 45° is illustrated in Figure 3 and Figure 4. The X-axis in Figure 3 represents the angular position along the circumference. The results show that the film thickness at successive points along the circumference increases in turn as the resonator rotates. After one full rotation, the peak-to-valley (PV) value of the film thickness remains below 0.5 nm for the rest of the deposition. Circumferential uniformity therefore does not degrade as the film grows; on the contrary, because the PV value stays bounded while the average thickness increases, the relative uniformity improves. Further analysis shows the film thickness distribution at the bottom, middle, and top of the shell at 5 s under different deposition angles, as shown in Figure 4 and Figure 5. Across all angles and positions, the PV value of the film thickness does not exceed 1.5 nm. Since the circumferential PV value is largely unaffected by deposition time, the circumferential uniformity is maintained within acceptable precision when the film reaches the 100 nm target thickness.
Next, film thickness uniformity was analyzed in the meridional direction of the resonator. Based on the circumferential results, the meridional film thickness distribution was assumed to be the same along every meridian. An arbitrary meridian was chosen to show the film thickness distribution from 0 to 5 s; the results are shown in Figure 6. The X-axis represents the position (in mm) on the inner wall along the meridian, in the direction shown in Figure 7a. Because the deposition rate differs between positions, the film thickness grows at different rates along the meridian. Consequently, the PV value along the meridian increases with deposition time. Finally, the meridional thickness distributions at 5 s under various deposition angles are presented.
Figure 7 illustrates the directions along which the film thickness distributions are plotted, while Figure 8a,b show the film thickness distributions on the shell and stem, respectively, after 5 s of deposition under different deposition angles. The X-axis of Figure 8a is the same as that of Figure 6. The X-axis of Figure 8b represents the position (in mm) on the stem along its generatrix, in the direction shown in Figure 7b. In both panels, the Y-axis indicates the film thickness (in nm). These data can be used to calculate the meridional film thickness uniformity once the target film thickness has been reached.
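The contrast between the two directions can be sketched numerically. In this hypothetical illustration, a rotating point samples every azimuth of a non-uniform flux, so its circumferential thickness differences stay bounded, whereas points along a meridian keep fixed, position-dependent rates, so their PV grows linearly with time. The rate profiles and the integration step are assumptions for illustration only; only the 4 rad/s rotation speed comes from the text.

```python
import numpy as np

OMEGA = 4.0   # rad/s, axial rotation speed used in the simulation
DT = 1e-3     # s, time step of the illustrative integration (assumed)

def rate(psi):
    """Hypothetical deposition rate (nm/s) versus the chamber-fixed azimuth psi."""
    return 0.2 * (1.0 + 0.5 * np.cos(psi))

def circumferential_pv(rate_fn, t_end, n_points=72):
    """PV (nm) around one circumference of a part rotating at OMEGA."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    thickness = np.zeros(n_points)
    for t in np.arange(0.0, t_end, DT):
        # Each point receives the rate of the azimuth it currently faces.
        thickness += rate_fn(phi + OMEGA * t) * DT
    return float(thickness.max() - thickness.min())

def meridional_pv(rates, t_end):
    """PV (nm) along a meridian whose points deposit at fixed rates (nm/s)."""
    thickness = np.asarray(rates, dtype=float) * t_end
    return float(thickness.max() - thickness.min())

meridian_rates = [0.15, 0.20, 0.25]   # hypothetical position-dependent rates
for t_end in (5.0, 20.0):
    print(t_end, circumferential_pv(rate, t_end), meridional_pv(meridian_rates, t_end))
```

With these hypothetical rates, the circumferential PV stays at or below about 0.1 nm for any duration, because the rotating term integrates to a bounded oscillation, while the meridional PV grows from 0.5 nm at 5 s to 2 nm at 20 s.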
Based on the above simulations, the film thickness uniformity on the shell is calculated for a QCM reading of 100 nm. The formula for calculating the film thickness uniformity is

$$U = \frac{h_{\max} - h_{\min}}{\bar{h}} \times 100\% \qquad (3)$$

where $h_{\max}$ and $h_{\min}$ are the maximum and minimum film thickness, respectively, and $\bar{h}$ is the average film thickness.
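Equation (3), U = (h_max − h_min)/h̄ × 100%, can be checked numerically. The minimal sketch below (the function name is illustrative) applies it to the 16 shell thickness values reported in this section for the optimized process (Figure 14) and reproduces the stated uniformity of 5.24%.

```python
import numpy as np

def thickness_uniformity(h):
    """U = (h_max - h_min) / mean(h) * 100 %."""
    h = np.asarray(h, dtype=float)
    return (h.max() - h.min()) / h.mean() * 100.0

# Shell thickness values (nm) reported for the optimized process (Figure 14)
shell = [98.33, 102.25, 103.61, 103.51, 101.51, 99.26, 98.68, 100.65,
         103.37, 103.16, 99.45, 98.31, 99.38, 101.36, 102.47, 101.99]
print(f"U = {thickness_uniformity(shell):.2f} %")   # U = 5.24 %
```

Here the numerator is the PV value (103.61 − 98.31 = 5.30 nm) and the mean is about 101.08 nm.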
The film thickness uniformity at all angles is presented in Table 2. Although the uniformity at coating angles of 40° and 45° is better than at other angles, it is still not satisfactory. Consequently, a single deposition angle does not produce a uniform film on the inner shell of the resonator.
Inducing axial oscillation of the resonator during coating may improve film uniformity. The result of axial oscillation can be approximated by superposing the thickness distributions obtained at 10°, 20°, 30°, 40°, 50°, 60°, 70°, and 80° (45° is excluded). Because each simulated distribution corresponds to a QCM reading of 1 nm, reaching the actual target thickness of 100 nm requires superposing the simulated distributions 100 times. Under uniform oscillation, these 100 superpositions are distributed equally among the simulated angular positions. The resulting film thickness distribution is shown in Figure 9; its PV value is 60.4 nm. The X-axis of Figure 9 is the same as that of Figure 8a. The results indicate that this method does not improve film thickness uniformity, so an optimized coating strategy is needed. This paper uses Proximal Policy Optimization (PPO), a reinforcement learning algorithm, to optimize the coating process. The algorithm is combined with the finite element simulation data to improve the axial uniformity of the film.
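The uniform-oscillation estimate amounts to a weighted sum of the 1-nm thickness profiles. The sketch below is a hypothetical illustration (the names `superpose` and `uniform_oscillation_counts` are not the authors’ code). With eight angles, each receives 12.5 nm of QCM reading, so fractional weights are used.

```python
import numpy as np

def superpose(profiles, counts):
    """Accumulated thickness profile (nm).

    profiles: {angle_deg: thickness profile (nm) per 1 nm of QCM reading}
    counts:   {angle_deg: QCM nanometres deposited at that angle}
    """
    return sum(counts[a] * np.asarray(profiles[a], dtype=float) for a in counts)

def uniform_oscillation_counts(angles, total_nm=100.0):
    """Spread the total QCM thickness equally over the simulated angles."""
    return {a: total_nm / len(angles) for a in angles}

angles = [10, 20, 30, 40, 50, 60, 70, 80]   # 45 degrees excluded, as in the text
counts = uniform_oscillation_counts(angles)  # 12.5 nm of QCM reading per angle
```

In practice, `profiles` would hold the meridional profiles exported from the COMSOL runs at each angle, and the PV of `superpose(profiles, counts)` corresponds to the 60.4 nm reported above.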
2.3. Optimization Algorithm
Proximal Policy Optimization (PPO) is a reinforcement learning (RL) algorithm for training an agent’s policy to accomplish difficult tasks. When PPO is applied to a discrete action space, the policy network is typically a neural network that outputs a probability distribution over all possible actions, and at each time step the action is sampled from this distribution. The core idea of the algorithm is to update the policy by gradient ascent on a clipped surrogate objective, increasing the probability of actions that yield higher rewards and thereby gradually improving the agent’s overall performance [22,23,24]. PPO has several advantages over other optimization methods for this task. Unlike genetic algorithms (GAs), whose random crossover and mutation operations may disrupt high-performance sequences, PPO concentrates exploration in high-reward regions. Compared with grid search, PPO avoids the prohibitive computational cost of fine-grained discretization of angles and time steps in high-dimensional spaces. PPO is also well suited to continuous, high-dimensional, and dynamically interactive environments, which makes it a strong candidate for future multi-degree-of-freedom systems, such as the integration of planetary rotation systems and improved yaw-angle precision, where GAs are expected to perform less well. These advantages indicate that PPO is a suitable choice for coating optimization [25,26].
The PPO model was implemented in Python, using PyCharm Community Edition 2024.2.2 as the development platform. The coating process is divided into discrete time intervals, each corresponding to a 1 nm increase in thickness detected by the QCM. The total film thickness on the resonator is the sum of the incremental thicknesses over all intervals. The film thickness distribution at each coating angle has been determined through finite element simulation. To reach the target film thickness of 100 nm, the process is discretized into 100 steps; at each step, the thickness distribution for the selected angle is added to the accumulated distribution. The entire coating process is therefore simulated by 100 iterations, with the PPO agent selecting the angle at each step. The goal is to minimize the PV value of the accumulated film thickness distribution after these 100 steps. The optimal coating strategy is the resulting sequence of angle selections.
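This discretization maps naturally onto an episodic environment. The following is a minimal sketch under stated assumptions: the class name `CoatingEnv`, its simplified `reset`/`step` interface, and the reward are hypothetical. Since the exact form of Equation (4) is not reproduced here, the sketch uses a stand-in reward equal to the negative increase in PV, so that the return of one episode equals minus the final PV.

```python
import numpy as np

class CoatingEnv:
    """Hypothetical coating environment: n_steps increments of 1 nm QCM thickness."""

    def __init__(self, profiles, n_steps=100):
        # profiles[i]: simulated thickness profile (nm) per 1 nm QCM reading at angle i
        self.profiles = np.asarray(profiles, dtype=float)
        self.n_steps = n_steps
        self.reset()

    def reset(self):
        self.step_count = 0
        self.state = np.zeros(self.profiles.shape[1])  # initial film thickness is zero
        return self.state.copy()

    def _pv(self):
        return float(self.state.max() - self.state.min())

    def step(self, action):
        pv_before = self._pv()
        self.state = self.state + self.profiles[action]  # accumulate the chosen distribution
        self.step_count += 1
        # Stand-in reward (not the paper's Equation (4)): negative PV increase,
        # so the episode return equals minus the final PV.
        reward = -(self._pv() - pv_before)
        done = self.step_count >= self.n_steps
        return self.state.copy(), reward, done
```

A telescoping reward of this kind keeps the per-step signal dense while making the total return match the stated goal of minimizing the final PV; the authors’ Equation (4) additionally accounts for the average thickness.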
For this task, the state and action spaces must be defined and a reward function designed. The “state” represents all the observational information about the environment available to the agent. Here, the initial state is a film thickness of zero. After each action, that is, the selection of the film thickness distribution at a particular angle, the agent observes the accumulated film thickness at the next time step, denoted as S(t). The “action” is the agent’s decision based on the current state: the agent selects one of the ten thickness distributions corresponding to the different deposition angles for the subsequent accumulation, as shown in Figure 10. The selection is made by sampling from the probability distribution output by the Actor network. During training, this sampling encourages exploration, while during evaluation the highest-probability action is chosen. The probabilities are optimized via PPO’s clipped objective to maximize the reward function (Equation (4)), which penalizes peak-to-valley variations. No explicit cost function is used; the selection is learned end-to-end through policy gradients. In addition, continuous exploration of the action space during training helps the agent escape suboptimal solutions, and the stochastic policy updates support gradual convergence toward a low-PV solution, reducing the risk of becoming trapped in local minima, although PPO does not guarantee a global optimum.
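The action-selection rule described above (sample during training, take the most probable action during evaluation) can be sketched as follows. This is a hypothetical NumPy illustration in which `logits` stands in for the Actor network’s raw outputs; the log-probability is returned because PPO needs it later to form the probability ratio.

```python
import numpy as np

def select_action(logits, rng, greedy=False):
    """Pick one of the angle-specific thickness distributions.

    logits: hypothetical Actor-network outputs, one per action.
    Training samples from the softmax distribution (exploration);
    evaluation takes the most probable action.
    """
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    if greedy:
        action = int(np.argmax(probs))
    else:
        action = int(rng.choice(len(probs), p=probs))
    return action, float(np.log(probs[action]))

rng = np.random.default_rng(0)
print(select_action([0.1, 2.0, -1.0], rng))               # sampled (training)
print(select_action([0.1, 2.0, -1.0], rng, greedy=True))  # most probable (evaluation)
```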
The reward is a scalar value provided by the environment to assess the quality of the actions. The goal of this task is to minimize the difference between the maximum and minimum values of the accumulated film thickness. The reward function is designed to simultaneously minimize peak-to-valley thickness variation while maintaining a reasonable average thickness. Therefore, the reward function is defined as follows:
The training process consists of data collection and policy optimization. During data collection in each episode, the agent selects coating angles according to the current policy; the environment then performs the accumulation and returns the new state and reward. The experience tuples (state, action, reward, next state, done) are stored in a buffer. During policy optimization, at the end of each episode the learn method is invoked to compute the Generalized Advantage Estimation (GAE), normalize the rewards, and update the policy network using the clipped surrogate objective, which clips the probability ratio between the new and old policies to [1 − ε, 1 + ε] to limit the update magnitude. The Critic network is updated by minimizing the mean squared error (MSE) of the value function. An episode terminates once the maximum accumulation count is reached [27]. The core algorithm flow of PPO is illustrated in Figure 11.
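The two numerical ingredients of the learn step, GAE and the clipped surrogate loss, can be written compactly. This is a minimal NumPy sketch of the standard formulas, not the authors’ implementation. The values of `gamma`, `lam` (the GAE parameter, unrelated to the mean free path) and `eps` are placeholders rather than the Table 3 settings, and reward normalization and the network updates themselves are omitted.

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one episode."""
    values = np.append(np.asarray(values, dtype=float), last_value)
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # TD residual
        running = delta + gamma * lam * running
        adv[t] = running
    returns = adv + values[:-1]   # regression targets for the Critic (MSE)
    return adv, returns

def clipped_surrogate_loss(new_logp, old_logp, adv, eps=0.2):
    """PPO clipped objective, negated so that it can be minimized."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))
```

For a terminal step, such as the 100th accumulation, `last_value` is 0. Taking the minimum of the clipped and unclipped terms removes any incentive to move the probability ratio outside [1 − ε, 1 + ε].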
Table 3 presents the key parameters of the PPO algorithm along with their descriptions.
The algorithm incorporates a dynamic masking mechanism that tracks how often each thickness distribution array is selected in order to prevent local overfitting. It also employs a dual-save strategy, preserving both the best-performing and the final models, to balance exploration and exploitation. In addition, TensorBoard is used for real-time monitoring of Actor/Critic losses, reward trends, and state differences.
The model was trained multiple times, and the trained model was then applied to the simulation data in an iterative optimization framework, in which 1000 optimization cycles were executed to identify the optimal coating strategy.
Figure 12 shows the flowchart of PPO for the coating process. During each iteration, the model outputs the PV value of the film thickness under the current strategy. After 1000 iterations, the model identifies the iteration with the minimum PV value and provides the corresponding deposition time required for each coating angle under that strategy.
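The selection step at the end of the 1000 iterations can be sketched as a best-of-N search. In this hypothetical illustration, `run_once` is an assumed callable that runs the trained policy for one episode and returns the final PV together with the angle chosen at each 1-nm QCM step. The deposition “time” per angle is then the number of steps assigned to it.

```python
from collections import Counter

def best_strategy(run_once, n_iter=1000):
    """Run the trained policy n_iter times and keep the lowest-PV episode.

    run_once: hypothetical callable returning (pv, actions), where actions
    lists the deposition angle chosen at each 1-nm QCM step.
    """
    best_pv, best_actions, history = float("inf"), [], []
    for _ in range(n_iter):
        pv, actions = run_once()
        history.append(pv)                 # PV per iteration, as plotted in Figure 13
        if pv < best_pv:
            best_pv, best_actions = pv, list(actions)
    # Deposition time per angle, in units of 1-nm QCM increments
    time_per_angle = dict(sorted(Counter(best_actions).items()))
    return best_pv, time_per_angle, history
```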
Figure 13 shows the PV value output in each of the 1000 iterations. After approximately 250 iterations, the PV value consistently remains below 15 nm, indicating good convergence of the algorithm. This suggests that the model is well suited to optimizing coating strategies and could be applied to optimize the coating of resonators with other requirements and dimensions.
Table 4 presents the results obtained from the PPO output, showing the deposition angle and the corresponding time required for each angle. The unit of time is the time required for the QCM reading to increase by 1 nm. Since the rate of increase in the QCM reading is influenced by various deposition parameters, which are subject to continuous optimization during the coating process, the exact times are not specified in this study. Instead, the QCM reading is used as a reference for time. Since the target film thickness is 100 nm, the calculation is completed when the film thickness recorded by the QCM reaches this value.
Therefore, based on the PPO output in Table 4, the actual coating process is performed as follows. Deposition begins at a deposition angle of 30° until the QCM indicates a film thickness of 32 nm. The deposition angle is then adjusted to 40°, and deposition continues until the QCM reading increases by a further 11 nm. The angle is then increased to 45°, and the process continues in this stepwise manner. By following this protocol, a highly uniform film can be achieved on the inner surface of the resonator.
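Converting the per-angle times of Table 4 into QCM set points is simple bookkeeping. The sketch below is hypothetical: it assumes the angles are applied in ascending order, as in the protocol above, and the usage example uses only the two values stated in the text (30° for 32 nm, then 40° for 11 nm). The remaining Table 4 entries are not reproduced here.

```python
def to_protocol(time_per_angle):
    """Turn {angle_deg: QCM nm} into stepwise (angle, start_nm, end_nm) set points.

    Assumes the angles are applied in ascending order, as in the protocol above.
    """
    steps, qcm = [], 0
    for angle in sorted(time_per_angle):
        n = time_per_angle[angle]
        if n > 0:
            steps.append((angle, qcm, qcm + n))
            qcm += n
    return steps

print(to_protocol({30: 32, 40: 11}))   # [(30, 0, 32), (40, 32, 43)]
```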
Figure 14 presents the thickness distribution on the shell and stem obtained in COMSOL using the optimized coating process. The X-axis denotes the position (in mm) on the inner wall along the meridian, extending from the edge of the shell through the junction to the end of the stem. The Y-axis represents the deposited film thickness (in nm). The film thickness values in the shell section are as follows (in nanometers): 98.33, 102.25, 103.61, 103.51, 101.51, 99.26, 98.68, 100.65, 103.37, 103.16, 99.45, 98.31, 99.38, 101.36, 102.47, 101.99. According to Equation (3), the film thickness uniformity is 5.24%. This result demonstrates the effectiveness of combining the finite element method with reinforcement learning to improve film uniformity. However, as shown in Figure 15, another key issue arises: although the optimized process improves film uniformity on the shell, it leaves the film thinner in other regions, which increases the electrical resistance of the hemispherical resonator as a whole. The film thickness values at the junction and stem are as follows (in nanometers): 103.21, 71.95, 74.11, 75.56, 107.23, 109.36, 99.52, 83.68, 58.76, 46.78, 50.33, 53.24, 54.37, 55.33, 55.95, 56.35, 56.61, 66.52, 90.41. This paper addresses this issue by using a correction mask.