Efficient Rigorous Coupled-Wave Analysis Simulation of Mueller Matrix Ellipsometry of Three-Dimensional Multilayer Nanostructures

Mueller matrix ellipsometry (MME) is a powerful metrology tool for nanomanufacturing. The application of MME necessitates electromagnetic computations for inverse problems of metrology determination in both the conventional optimization process and the recent neutral network approach. In this study, we present an efficient, rigorous coupled-wave analysis (RCWA) simulation of multilayer nanostructures to quantify reflected waves, enabling the fast simulation of the corresponding Mueller matrix. Wave propagations in the component layers are characterized by local scattering matrices (s-matrices), which are efficiently computed and integrated into the global s-matrix of the structures to describe the optical responses. The performance of our work is demonstrated through three-dimensional (3D) multilayer nanohole structures in the practical case of industrial Muller matrix measurements of optical diffusers. Another case of plasmonic biosensing is also used to validate our work in simulating full optical responses. The results show significant numerical improvements for the examples, demonstrating the gain in using the RCWA method to address the metrological studies of multilayer nanodevices.


introduction
Mueller matrix ellipsometry is a powerful measurement for both academic and industrial applications. By measuring all polarizing states of the sample, this non-destructive measurement is very sensitive to optical responses and has been extensively used for characterizing metrological structures [1][2][3][4][5][6]. The nature of MME is an indirect experimental measurement that necessitates the inverse problem to extract the expected information (grating dimensions, layer thicknesses, optical indices) from the Mueller matrix [7]. The inverse problem is conventionally implemented by the optimization approach, which involves a long iterative process of fitting simulated and measured Mueller matrices [8]. On the contrary, in the recent neural network (NN) approach, well-trained NN models have been reported to achieve high performances in the prediction of metrological geometries [9][10][11]. However, when considering NNs for metrology purposes, a large synthetic dataset must be generated to train the NN model due to the lack of available experimental data. In both optimization and NN approaches, a fast and accurate simulation of nanostructures is critical in order to reduce the computational cost of facilitating the inverse problems of metrology characterization.
Among the numerical methods for periodic nanostructures, RCWA has been widely applied because of its simplicity and accuracy [12][13][14][15]. Since it is well-formulated, RCWA method has been extensively studied for speeding up the modeling of the optical properties of subwavelength-grating structures and improving the efficiency of inverse problemsolving for accurate dimensional determination. The representative studies include the diffractive interface theory to bypass the eigen decomposition of ultrathin metasurfaces [16], ties of subwavelength-grating structures and improving the efficiency of inverse pro solving for accurate dimensional determination. The representative studies inclu diffractive interface theory to bypass the eigen decomposition of ultrathin metasu [16], the perturbation theory to reduce eigen problems of non-lamellar layers [17-19 normal vectors to improve simulation convergence [20,21]. It is worth noting that studies were formulated and developed from theoretical structures composed of a sentative grating layer. The theoretical structures enable the simulation of mult nanodevices because the RCWA method treats component layers independently (F 1) Figure 1. Schematic of Mueller matrix ellipsometry and RCWA simulation of multilayers. By uring all polarizing states of reflected waves, the metrological information of the multilayers tained in the Mueller matrix (MM). In the RCWA method, local layers are independently sim and combined for the global s-matrix to quantify reflectance to compute MM in a wide ra measured wavelengths. However, this feature may limit the RCWA method when it comes to mult nanostructures. In practical applications of biosensing [22], optical diffusers [23] cells [24], and photodetectors [25], nanostructures are composed of top gratings an tom homogeneous layers for tuning the efficiency of the optical devices. Simulating multilayers involves computing s-matrices of gratings and homogeneous layers combining these multiple s-matrices into a global one [26,27] to quantify the expect tical responses. Although the computation of a homogeneous layer is simple, w solving the eigen decomposition, the handling stack of homogeneous layers coul numerical issue in 3D structures as these layers involve expensive matrix algebra in to connect to the grating layers. In this study, we present the efficient RCWA simu of 3D multilayers with a stack of bottom homogeneous layers. We introduce vectorcomputation to quickly simulate and integrate s-matrices of homogeneous layers in global s-matrix. In the RCWA simulation, we also apply a recursive algorithm wit tom-up construction [27] to reduce the components of the global s-matrix. It is note the bottom-up construction is presented in decades to enhance the RCWA method volving only a quarter of the global s-matrix to quantify reflected waves. Howeve Figure 1. Schematic of Mueller matrix ellipsometry and RCWA simulation of multilayers. By measuring all polarizing states of reflected waves, the metrological information of the multilayers is contained in the Mueller matrix (MM). In the RCWA method, local layers are independently simulated and combined for the global s-matrix to quantify reflectance to compute MM in a wide range of measured wavelengths. However, this feature may limit the RCWA method when it comes to multilayer nanostructures. In practical applications of biosensing [22], optical diffusers [23], solar cells [24], and photodetectors [25], nanostructures are composed of top gratings and bottom homogeneous layers for tuning the efficiency of the optical devices. Simulating such multilayers involves computing s-matrices of gratings and homogeneous layers, then combining these multiple s-matrices into a global one [26,27] to quantify the expected optical responses. Although the computation of a homogeneous layer is simple, without solving the eigen decomposition, the handling stack of homogeneous layers could be a numerical issue in 3D structures as these layers involve expensive matrix algebra in order to connect to the grating layers. In this study, we present the efficient RCWA simulation of 3D multilayers with a stack of bottom homogeneous layers. We introduce vector-based computation to quickly simulate and integrate s-matrices of homogeneous layers into the global s-matrix. In the RCWA simulation, we also apply a recursive algorithm with bottom-up construction [27] to reduce the components of the global s-matrix. It is noted that the bottom-up construction is presented in decades to enhance the RCWA method by involving only a quarter of the global s-matrix to quantify reflected waves. However, the application of this algorithm is nontrivial. The explanation is provided through the demonstration of modeling the Mueller matrix of the 3D multilayer nanohole structure.

RCWA Simulation
The RCWA method is well-established, and its numerical implementations have been released [14,[28][29][30]. However, the studies usually applied a framework investigated for gratings for a global structure that may decrease numerical efficiency. In this section, we present our RCWA simulation as an efficient solution for multilayers. The distinct feature of our approach is the vector-based formation, which is capable of the fast computing of homogeneous layers in multilayer nanostructures.

Grating Layer
The RCWA method involves the semi-discretization of Maxwell 's equations in Fourier space to compute the s-matrix of a grating layer. For example, the electric field of a layer is expressed as: where k 0 is the free-space wave number, and e presents the electric field of the layer in Fourier space. Matrix Ω in Equation (1) is defined by: Here, [[ε]] is used to describe the permittivity distribution in Fourier space on the x-y plane. K x , K y are diagonal matrices characterizing the wave vector components. (The derivations of Equations (1) and (2) are presented in Appendix A). In this study, matrix Ω in Equation (2) is used to secure the general characteristics of the diffraction. One can quantify the diffraction of s-and p-polarization light by the submatrices Ω 11 , Ω 22 and the crossconversion of these polarization modes in the submatrices Ω 12 , Ω 21 . In the studies of the RCWA method [31,32] for 2D structures, the incident plane perpendicular to the y-direction zeroed wave component in the y-direction (K y = 0) was applied, consequently resulting in vanished Ω 12 Ω 21 . There is no cross-conversion between the two polarization modes. Thus, the diffraction of s-or p-polarization light can be characterized independently. Either ] was separately presented in the formulation of the RCWA method for each polarization.
The s-matrix of the grating layer is composed of square submatrices. The submatrices determining reflected and transmitted waves are defined as ( [26]): where X, A, B are related to eigenvalues Λ and eigen matrix W of matrix Ω. The formulas of these matrices are presented in Appendix B. Equations (3a) and (3b) involve matrix computation with a matrix size of 4N 2 h . Herein, N h is the harmonic number in Fourier expansions, and N h = (2m x + 1) 2m y + 1 , where m x , m y are diffraction orders in the x-and y-directions, respectively.

Homogeneous Layer
For homogeneous layers, a simple computation of the s-matrix without solving the eigen problem is implemented with the advantage of the simple form of permittivity distribution [ with i being the unit matrix. Inserting Equation (4) into Equation (2), we obtain a simplified form of matrix Ω: In Equation (5), the submatrices Ω 11 = Ω 22 = K 2 x + K 2 y − iε are diagonal matrices that entail the form of a diagonal matrix for Ω, eliminating the eigen problem as eigenvalues are elements of the diagonal matrix and eigenvectors have a form of the unit matrix.

Vector-Based Formation
Equations (3a) and (3b) are also used to determine the s-matrix of homogeneous layers and may result in expensive matrix algebra. For 3D subwavelength gratings, the harmonic number N h is very large for accurate RCWA simulation [33], increasing the numerical burden for the computing of the stack of homogeneous layers, though it does not require eigen decomposition. To alleviate the issue, our approach introduces vector-based formation to circumvent the large matrix computation of homogeneous layers. On the righthand side of Equations (3a) and (3b), X is a diagonal matrix and A, B are square matrices [32]. However, matrices A and B own a special form composed of four sub-diagonal matrices ( Figure 2), which is applied to enhance the RCWA simulation of homogeneous layers. Our approach compactly presents such matrices by the vector of four diagonal elements. Then, the computation of these matrices with the size of 4N 2 h can be implemented fast by only relating to the diagonal elements with a size of 4N h (the details are presented in Appendix C).
In Equation (5), the submatrices = = + − ε that entail the form of a diagonal matrix for Ω , eliminating the e values are elements of the diagonal matrix and eigenvectors have trix.

Vector-Based Formation
Equations (3a) and (3b) are also used to determine the s-matr ers and may result in expensive matrix algebra. For 3D subwavel monic number Nh is very large for accurate RCWA simulation [33 ical burden for the computing of the stack of homogeneous layer quire eigen decomposition. To alleviate the issue, our approach i formation to circumvent the large matrix computation of homo right-hand side of Equations (3a) and (3b), X is a diagonal matr matrices [32]. However, matrices A and B own a special form com onal matrices (Figure 2), which is applied to enhance the RCWA neous layers. Our approach compactly presents such matrices by onal elements. Then, the computation of these matrices with the plemented fast by only relating to the diagonal elements with a are presented in Appendix C).

Figure 2. Example of the matrix form in the computation of local s-matri
On the left is a completed matrix form involved in grating layers. On th composed of four sub-diagonal matrices, which is applied to enhance th mogeneous layers.

Global s-Matrix
The global s-matrix is used to quantify the optical response The global s-matrix can be established by connecting multiple slayers. In the numerical implementation, the global s-matrix is co sive algorithm. The conventional top-down construction, from the substrate, defines the global s-matrix [26] as: where ⊗ is the Redheffer Star Product; the superscript i indicates t i Figure 2. Example of the matrix form in the computation of local s-matrices in the RCWA method.
On the left is a completed matrix form involved in grating layers. On the right is the special form composed of four sub-diagonal matrices, which is applied to enhance the RCWA simulation of homogeneous layers.

Global s-Matrix
The global s-matrix is used to quantify the optical responses of the nanostructure. The global s-matrix can be established by connecting multiple s-matrices of component layers. In the numerical implementation, the global s-matrix is computed with the recursive algorithm. The conventional top-down construction, from the top (superstrate) to the substrate, defines the global s-matrix S G [26] as: where ⊗ is the Redheffer Star Product; the superscript i indicates the i-th component layer (i = 1, N with N the number of layers in the nanostructure). S i expresses the s-matrix of layer I, and S G i characterizes the global s-matrix computed at iteration i or the i-th partially collective structure. With this notation, S G N (or S G in the reduced form) is the final global s-matrix of the nanostructure. Initially, S G 0 is the s-matrix of the superstrate that surrounds the sample. All the elements of S G i are obtained in the following expressions: It is noted that practical applications only require a particular interest in the S G i 11 and S G i 21 elements of the global s-matrix, as these elements are respectively used for characterizing reflectance and transmittance ( Figure 3). Equation (7a) shows that computing S G i 11 requires the four exact values of the full matrix S G i−1 . In other words, the construction in Equation (6) necessitates the full global s-matrix for modeling reflectance.
elements of the global s-matrix, as these elements are respectively used for characterizing reflectance and transmittance ( Figure 3). Equation (7a) shows that computing requires the four exact values of the full matrix . In other words, the construction in Equation (6) necessitates the full global s-matrix for modeling reflectance.
In order to reduce the submatrices of the global s-matrix, the computation can start from the bottom to the top. With the rearrangement of the Redheffer Star Product, the bottom-up construction of the global s-matrix is expressed as ( [27]): In this construction, the initial is the s-matrix of the substrate. The submatrices of interest and are defined as: Only two equations are needed instead of the four required for the top-down construction. Equation (9a) shows that is uniquely related by itself and layer s-matrices in numerical computation. The advantage of bottom-up construction is illustrated in Figure 3. Only half of the global s-matrix is required to simulate full optical responses, and only a quarter of the global s-matrix is involved in determining the reflectance, enabling the fast simulation of the corresponding Mueller matrix.  In order to reduce the submatrices of the global s-matrix, the computation can start from the bottom to the top. With the rearrangement of the Redheffer Star Product, the bottom-up construction of the global s-matrix is expressed as ( [27]): In this construction, the initial S G 0 is the s-matrix of the substrate. The submatrices of interest S G i 11 and S G i 21 are defined as: Only two equations are needed instead of the four required for the top-down construction. Equation (9a) shows that S G i 11 is uniquely related by itself and layer s-matrices in numerical computation. The advantage of bottom-up construction is illustrated in Figure 3. Only half of the global s-matrix is required to simulate full optical responses, and only a quarter of the global s-matrix is involved in determining the reflectance, enabling the fast simulation of the corresponding Mueller matrix.  21 also possess a special form composed of four sub-diagonal matrices. Thus, vector-based computation also facilitates integrating homogeneous layers into the structures in numerical work.

Case 1: Mueller Matrix Ellipsometry
We validate and discuss the performance of our work with the industrial case: MME of the 3D multilayer nanohole structure. The structure is composed of five layers: the first two nanohole patterns of Si 3 N 4 , SiO 2 are stacked on three following homogeneous layers: Si 3 N 4 , SiO 2 , and Si 3 N 4 , respectively. In our case, these nanoholes are metrology dies on STMicroelectronics wafers [34] in order to control the process for imaging devices (diffusive patterns). Experimental MME was measured at an angle of incidence of 65 • and azimuth of 45 • and provided by STMicroelectronics, Crolles, France. The cross dimension of the hole was measured at 120 nm using an inline CD-SEM (ATLAS system from Onto Innovation).
We use our laboratory measurement for the refractive indices of Si, SiO 2 (shown in Figure 4) and the Tauc-Lorentz single oscillator dispersion model [35] for the dielectric function of Si 3 N 4 , the parameters of which are given in Table 1: Generally, and in Equations (9a) and (9b) involve large matrix algeb the matrix size of 4N . However, by starting from the substrate, and a sess a special form composed of four sub-diagonal matrices. Thus, vector-based tation also facilitates integrating homogeneous layers into the structures in nu work.

Case 1: Mueller Matrix Ellipsometry
We validate and discuss the performance of our work with the industrial cas of the 3D multilayer nanohole structure. The structure is composed of five layers: two nanohole patterns of Si3N4, SiO2 are stacked on three following homogeneous Si3N4, SiO2, and Si3N4, respectively. In our case, these nanoholes are metrology STMicroelectronics wafers [34] in order to control the process for imaging devices sive patterns). Experimental MME was measured at an angle of incidence of 65° a muth of 45° and provided by STMicroelectronics, Crolles, France. The cross dimen the hole was measured at 120 nm using an inline CD-SEM (ATLAS system fro Innovation) We use our laboratory measurement for the refractive indices of Si, SiO2 (sh Figure 4) and the Tauc-Lorentz single oscillator dispersion model [35] for the d function of Si3N4, the parameters of which are given in Table 1:  RCWA is implemented by our in-house code for the numerical demonstratio nanohole structure (Figure 5a   RCWA is implemented by our in-house code for the numerical demonstration of the nanohole structure (Figure 5a). The source code is publicly accessible [36]. Matrix Ω in Equation (5) is used for computing the local s-matrices of the three bottom homogeneous layers. For Si 3 N 4 /SiO 2 nanohole layers, solving the eigen problem of matrix Ω in Equation (2) is required for these layer s-matrices. Three RCWA algorithms are implemented for the comparison. Conventional RCWA needs a full global s-matrix to model the reflected waves (Equations (7a)-(7d)). RCWA with bottom-up construction only involves a quarter of the global s-matrix (Equation (9a)). Our work applies both bottom-up construction and vector-based formation for the fast solution of modeling the multilayers. The Mueller matrix is calculated from the global s-matrix with zeroth-order reflected light (Appendix D).

Case 2: Full Optical Responses
Another multilayer nanohole pattern is considered to test our approach. The structure consists of an Al nanohole pattern on three homogeneous layers of SiO2, Si, and Ge, respectively (Figure 5c). Such a structure is widely applied in biosensing as its plasmonic resonance is sensitive to the refractive index change induced by targeted biological molecules [22]. In numerical implementation, the optical responses are simulated in deionized water at normal incidence, with the incident plane perpendicular to the y-direction. The documented refractive indices of Al [37], Ge [38], and deionized water [39] were selected in the computation. Similar to the demonstration in Case 1, we employ Equations (2) and (5) for the s-matrices of the homogenous layers and nanohole pattern, respectively. To obtain full optical responses, , from Equations (9a) and (9b) are used to calculate the reflectance (R) and the transmittance (T). The absorbance (A) is determined by: A = 1 − R − T. The vector-based formation is applied to the computation related to homogeneous layers.
It is noted that similar multilayers were also studied in [30] to demonstrate the advantage of the RCWA method over the finite-difference time domain (FDTD) in designing plasmonic applications. Their numerical results are used as a reference for the validation of our work.

Mueller Matrix Ellipsometry
The simulation time of the three algorithms is normalized by the total simulation time of our work and is provided in Table 2. The enhancement is presented in percentages defined by the ratio of the algorithms. For simplicity, we only calculate the ratio needed for the discussion. Compared to bottom-up and conventional RCWA, this work accelerates the computation by 1.51 and 1.91 times, respectively. The bottom-up algorithm optimizes the RCWA method by only computing reflected waves, resulting in a significant reduction of 70% in computing the global s-matrix. However, it is worth remembering that solving layer s-matrices is still the key issue in the RCWA method, highly affecting numerical computations. Consequently, the bottom-up algorithm only saves 21% of the total computation time compared to the conventional RCWA. It can be the reason that the bottom-up algorithm is nontrivial for the application of multilayers in metrological studies.

Case 2: Full Optical Responses
Another multilayer nanohole pattern is considered to test our approach. The structure consists of an Al nanohole pattern on three homogeneous layers of SiO 2 , Si, and Ge, respectively (Figure 5c). Such a structure is widely applied in biosensing as its plasmonic resonance is sensitive to the refractive index change induced by targeted biological molecules [22]. In numerical implementation, the optical responses are simulated in deionized water at normal incidence, with the incident plane perpendicular to the ydirection. The documented refractive indices of Al [37], Ge [38], and deionized water [39] were selected in the computation. Similar to the demonstration in Case 1, we employ Equations (2) and (5) for the s-matrices of the homogenous layers and nanohole pattern, respectively. To obtain full optical responses, S G 11 , S G 21 from Equations (9a) and (9b) are used to calculate the reflectance (R) and the transmittance (T). The absorbance (A) is determined by: A = 1 − R − T. The vector-based formation is applied to the computation related to homogeneous layers.
It is noted that similar multilayers were also studied in [30] to demonstrate the advantage of the RCWA method over the finite-difference time domain (FDTD) in designing plasmonic applications. Their numerical results are used as a reference for the validation of our work.

Mueller Matrix Ellipsometry
The simulation time of the three algorithms is normalized by the total simulation time of our work and is provided in Table 2. The enhancement is presented in percentages defined by the ratio of the algorithms. For simplicity, we only calculate the ratio needed for the discussion. Compared to bottom-up and conventional RCWA, this work accelerates the computation by 1.51 and 1.91 times, respectively. The bottom-up algorithm optimizes the RCWA method by only computing reflected waves, resulting in a significant reduction of 70% in computing the global s-matrix. However, it is worth remembering that solving layer s-matrices is still the key issue in the RCWA method, highly affecting numerical computations. Consequently, the bottom-up algorithm only saves 21% of the total computation time compared to the conventional RCWA. It can be the reason that the bottom-up algorithm is nontrivial for the application of multilayers in metrological studies. Although computing homogeneous layers can be fast without solving the eigen problem, it is also time-consuming to handle large matrices in order to connect these layers to grating layers. In conventional and bottom-up RCWA, the normalized time of three homogeneous layers lasts 0.4, which is about 43% of the time needed for two Si 3 N 4 , SiO 2 nanohole patterns. This ratio dramatically declines to less than 1% in our work. The impressive improvement results from our approach that circumvents large matrices by using representative vectors for fast computation. It also reduces the 88% computational time of the global s-matrix. With these improvements, our work saves 48% of the total simulation time.
The Mueller matrix is calculated from reflected waves. As shown in Figure 6, the simulated Mueller matrix is in good agreement with the industrial measured one of the demonstrated structure. This result confirms the accuracy of the RCWA method, which is favorable for the metrological studies of 3D nanomultilayers. Although computing homogeneous layers can be fast without solving the eigen prob lem, it is also time-consuming to handle large matrices in order to connect these layers to grating layers. In conventional and bottom-up RCWA, the normalized time of three ho mogeneous layers lasts 0.4, which is about 43% of the time needed for two Si3N4, SiO nanohole patterns. This ratio dramatically declines to less than 1% in our work. The im pressive improvement results from our approach that circumvents large matrices by using representative vectors for fast computation. It also reduces the 88% computational time o the global s-matrix. With these improvements, our work saves 48% of the total simulation time.
The Mueller matrix is calculated from reflected waves. As shown in Figure 6, the simulated Mueller matrix is in good agreement with the industrial measured one of the demonstrated structure. This result confirms the accuracy of the RCWA method, which is favorable for the metrological studies of 3D nanomultilayers.   Figure 7a presents the full optical responses with observed Fano resonance, e.g., a 1290 nm of the absorbance spectrum of the demonstrated nanomultilayers. Figure 7b shows the simulation time of the two RCWA algorithms. In all studied diffraction orders our work achieves higher numerical efficiency. Accurate performance of the RCWA method necessitates high diffraction orders, which exponentially increase simulation  Figure 7a presents the full optical responses with observed Fano resonance, e.g., at 1290 nm of the absorbance spectrum of the demonstrated nanomultilayers. Figure 7b shows the simulation time of the two RCWA algorithms. In all studied diffraction orders, our work achieves higher numerical efficiency. Accurate performance of the RCWA method necessitates high diffraction orders, which exponentially increase simulation time. For tasks of understanding important optical responses, a diffraction order m x = m y = 5 can be favorable for acceptable accuracy. A higher value, e.g., m x = m y = 10, should be considered for more rigorous demand in applications of inverse design in metrological investigations. In these instances, our work is about 10 times and 2 times faster, respectively. However, it must be noted that the comparison is relative as the numeric may result from differences in computational frameworks, such as the progr guages, simulation targets, and computer resources. For example, ref. [30] em in a node of a workstation cluster with 24 cores, while our work uses Pyth personal laptop equipped with Intel ® core i7 CPU to perform numerical com this section, we demonstrate that the numerical investigation of the multil conducted in a reasonable time using bottom-up construction and vector-bas tion. Our algorithm mainly deals with the matrix algebra of only one patter stead of all four layers, which entails high performance in the demonstration Lastly, it should be pointed out that the impressive improvement in the strated cases of Muller matrix modeling and full optical responses is obtai intervening with the eigen problem of the grating layers. Thus, our appro implemented in the RCWA method. Owing to the compelling enhancemen and accuracy, our work is useful to the community as similar structures with ber of homogeneous underlayers are widely used in optical applications.

Conclusions
Practical applications of multilayer nanostructures with manipulated wa However, it must be noted that the comparison is relative as the numerical efficiency may result from differences in computational frameworks, such as the programming languages, simulation targets, and computer resources. For example, ref. [30] employed Julia in a node of a workstation cluster with 24 cores, while our work uses Python within a personal laptop equipped with Intel ® core i7 CPU to perform numerical computations. In this section, we demonstrate that the numerical investigation of the multilayers can be conducted in a reasonable time using bottom-up construction and vector-based computation. Our algorithm mainly deals with the matrix algebra of only one patterned layer instead of all four layers, which entails high performance in the demonstration.
Lastly, it should be pointed out that the impressive improvement in the two demonstrated cases of Muller matrix modeling and full optical responses is obtained without intervening with the eigen problem of the grating layers. Thus, our approach is easily implemented in the RCWA method. Owing to the compelling enhancement, simplicity, and accuracy, our work is useful to the community as similar structures with a large number of homogeneous underlayers are widely used in optical applications.

Conclusions
Practical applications of multilayer nanostructures with manipulated wave propagations require efficient numerical simulations for targeted optical responses. By introducing the vector-based formation of fast computing and integrating homogeneous layers into nanostructures, we present the simple but efficient RCWA simulation of multilayers with bottom homogeneous layers. An impressive performance is obtained in the demonstration of the 3D nanomultilayers in both the industrial MME measurement of optical diffusers and the full optical response of plasmonic structures. Thus, our work is expected to serve as a numerical simulation approach in future work for the metrological studies of engineered diffraction applications.   In Equations (A1a) and (A1b), e and h are the electric field and the magnetic field in Fourier space, respectively. k 0 is the free-space wave number. Matrices P and Q present the cross-sectional distributions of material permittivity of the layer. P, Q are defined in the following equations: Here, [[ε]] is used to describe the permittivity distribution in Fourier space on the x-y plane. K x , K y are diagonal matrices characterizing the wave vector components. Their constructions are presented in the next subsections.
The semi-discretization of Maxwell's equations in Equations (A1a) and (A1b) and the expression of matrix P, Q in Equations (A2a) and (A2b) are the common forms used in many numerical frameworks developed for the RCWA method [26,30]. From Equation (A1a), the second-order derivation of e is obtained: Inserting Equation (A1a) into Equation (A3), the semi-discretization of the electric field in Fourier space is obtained: Using PQ = Ω in Equation (A4), Equation (1) is derived, and the definition of matrix Ω in Equation (2) is simply the multiplication product of P and Q in Equations (A2a) and (A2b). The RCWA method is based on the Fourier expansion of the electric field, the magnetic field, and layer permittivity, e.g., the expansion of permittivity in the x-y plane is written as: ε(m, n) = m x ∑ p=−m x n y ∑ q=−n y ε(p, q). exp j k x x + k y y (A5) where m x , m y are diffraction orders in the x-and y-directions. The harmonic number in Fourier expansion is calculated as: N h = (2m x + 1)(2m y + 1). In Equation (A5), k x , k y are the components of the wave vector: k y (n) = k y,inc + 2π L y n , with n = −n y , n y (A6b) Here, k x,inc , k y,inc are vector components in the x-y plane of wave incident. L x , and L y are the periodic lengths of gratings. Vectors k x , k y are constructed for the combination of all m and n: k y (u + v(2m x + 1)) = k y v − n y , with u = 0, 2m x v = 0, 2n y (A7b) For matrix multiplication, a normalized diagonal matrix by the free-space wavenumber k 0 is used: K x = diag k x /k 0 , K y = diag k y /k 0 (A8) [[ε]] (2m x + 1)v + u, 2n y + 1 l + k = ε n y + u − k, m x + v − l with u, k = −m x , m x , v, l = −n y , n y (A9)

Appendix B. Matrices Used for Calculating Layer s-Matrix
where L, k 0 are layer thickness and free-space wave number, respectively. Matrix V is defined as: V = QWΛ −1/2 . Matrices W 0 and V 0 are similar terms for free-space propagation.