PMCHWT Solver Accelerated by Adaptive Cross Approximation for Efficient Computation of Scattering from Metal Nanoparticles

An accelerated algorithm that can efficiently calculate the light scattering of a single metal nanoparticle was proposed. According to the equivalent principle, the method of moment (MoM) transforms the Poggio–Miller–Chang–Harrington–Wu–Tsai (PMCHWT) integral equations into linear algebraic equations, which are solved by the flexible generalized minimal residual solver (FGMRES). Each element of near field MoM impedance matrix was described by Rao–Wilton–Glisson (RWG) basis functions and calculated by double surface integrals. Due to the low-rank property, the adaptive cross approximation (ACA) algorithm based on the octree data structure was applied to compress the MoM impedance matrix of far field action leading to the significant reduction of solution time and memory. Numerical results demonstrated that the proposed method is both accurate and efficient. Compared with the traditional MoM, the ACA algorithm can significantly reduce the impedance matrix filling time and accelerate the scattering field’s computation from actual metal nanoparticles using PMCHWT integral equations.


Introduction
The unique optical properties of metal nanostructures have attracted significant attention from researchers in recent years [1,2]. In the optical band, the metal nanoparticles change from good conductor property to medium property. The physical mechanism of this phenomenon can be described by complex permittivity [3,4]. Surface plasmon polaritons (SPPs) are generated on the surface of the metal-dielectric interface when irradiated by light waves of a specific frequency [5]. For metal nanoparticles with a size smaller than or close to the wavelength, localized surface plasmon resonances (LSPRs) occur [6]. Theoretically, this phenomenon leads to three effects: enhanced light absorption, far-field scattering, and localized surface electric field. The functions and applications of optoelectronics, sensing, photoacoustic imaging, photothermal therapy, and detection based on the properties of metal nanostructured LSPRs have also been continuously developed, resulting in much meaningful progress in the application of LSPRs [7][8][9][10]. Therefore, discussing the rapid calculation of far-field scattering characteristics of metal nanoparticles can promote the research progress of LSPRs.
The study of the optical properties of metal LSPRs can be carried out using theoretical simulations and experiments. The theoretical simulations not only calculate the near-field and far-field optical properties of metal nanoparticles but also assist in designing experiments and theoretically analyzing the experimental results [11][12][13][14][15][16][17]. The optical response of metal can be well described in the classical framework based on Maxwell's equations.
Many numerical methods can be used for solving Maxwell's equations, such as the finite difference time domain (FDTD) [18] method, finite element method (FEM) [19], and method of moment (MoM) [20]. FDTD and FEM are based on differential equations, while MoM is based on integral equations. When calculating metal scattering properties, FEM and FDTD need to discretize the scatterer and the surrounding space, while the MoM only needs to discretize the scatterer, thus reducing the time cost and storage. In addition, it is not easy to construct boundary conditions that describe open-domain problems. However, the Green function automatically satisfies the radiation boundary conditions, so MoM is used to calculate the scattering of any particle, ensuring high accuracy of the solution.
For the electromagnetic problems of arbitrary three-dimensional targets, both the surface integral equation (SIE) [21] and the volume integral equation (VIE) [22] can be applied. Since the metal nanoparticle is considered a uniform medium, the SIE is preferred with fewer unknowns than VIE. Reference [23] used the SIE and VIE formulas to calculate the bistatic scattering of the plasma, respectively and then compared with the accurate results to show the superiority of the SIE method.
The major computational costs of the MoM for solving SIEs are combined in two parts: they are the times for filling the impedance matrix and iteratively solving the equations. According to reference [12], the most accurate and reliable formula among the SIEs commonly used in plasma simulation is PMCHWT integral equations [24,25]. However, the diagonal elements of the impedance matrix established by the PMCHWT integral equations are not dominant, and the iterative solution suffers from the issue of slow convergence. This deficiency can be overcome if preprocessing is performed before the iterative solution [12]. In reference [26], FGMRES was considered to replace GMRES for solving large linear equations. There is a preprocessing technique based on the traditional GMRES solver added to the FGMRES solver, which makes the iteration convergence more efficient. By reducing the number of steps required for convergence, FGMRES achieves the purpose of saving calculation time.
On the other hand, the accelerated impedance matrix filling can be carried out in two ways. One is to reduce the calculation amount of unknowns by improving the calculation accuracy of matrix elements [27], and the other is to reduce the calculation time by using the additional algorithm to accelerate the calculation of matrix elements without reducing the unknowns. This article chose the second method. ACA [28] is a purely algebraic acceleration algorithm based on octree structure grouping and low-rank matrix compression, which can speed up impedance matrix filling. An octree is a three-dimensional spatial hierarchical tree-like data structure, which effectively represents the grouping of the far and near fields of three-dimensional objects. The impedance matrix formed by the far-field region has a low-rank characteristic, which utilizes the decomposition and compression characteristics of ACA to speed up the filling of the impedance matrix.
The ACA algorithm was proposed by Bebendorf in 2000 [28]. It uses the mathematical method to process the matrix and decreases the computational complexity and memory requirements. In 2005, Zhao introduced the ACA algorithm into the MoM to analyze electromagnetic radiation and scattering problems [29]. Since the impedance matrix is composed of the interaction between the field point and the source point, the farther the distance between the field point and the source point is, the smaller the effect is, which is reflected in the impedance matrix in that the far-field group matrix block is low rank. The ACA algorithm uses the low-rank characteristic of the far-field group matrix block to decompose and compress it to accelerate the filling of the impedance matrix.
Gold and silver are commonly studied in metals. Gold nanospheres have plasmon resonances in the region where gold is highly absorbing. The resonances are therefore strongly damped, but gold spheroids or silver nanospheres at resonance may prove more difficult. This paper focused on the calculation of bistatic RCS of a single metal nanoparticle (taking the calculation of a gold nanoparticle as an example). The electromagnetic scattering characteristics of metal nanoparticles were analyzed based on the PMCHWT integral equations by using the MoM as the numerical solution method. Combined with the ACA algorithm of matrix compression, the fast filling of the impedance matrix was accelerated to realize the fast calculation of far-field scattering characteristics of metal nanoparticles.

Generic PMCHWT Formulation for a Metal Nanoparticle
Considering a metal nanoparticle in a homogeneous medium space, the subscripts of the metal nanoparticle, the homogeneous medium, and the free space related parameters are represented by 1, 2, and 0, respectively. ε 1 = ε 1r ε 0 and µ 1 = µ 1r µ 0 are the complex permittivity and permeability of the metal nanoparticle; ε 2 = ε 2r ε 0 and µ 2 = µ 2r µ 0 are the complex permittivity and permeability of the homogeneous medium; ε 0 and µ 0 are the permittivity and permeability of free space. Where ε ir , µ ir , i = 1,2 are the relative permittivity and relative permeability. The surface of the metal nanoparticle is denoted as S. There is uniform plane wave (E inc , H inc ) irradiation outside the surface S. In the internal and external regions of the metal nanoparticle, the equivalent model is constructed by using the equivalent principle to solve the distribution of internal and external fields. The equivalent model is shown in Figure 1. Wheren andt are the unit normal and unit tangential vector of the outer surface of the medium, respectively; J e and J m are the equivalent currents and magnetic currents on the outer surface, respectively; −J e and −J m are the equivalent currents and magnetic currents on the inner surface, respectively; E sca 1 and H sca 1 are the scattering electric field and the scattering magnetic field of the inner surface, respectively; E sca 2 and H sca 2 are the scattering electric field and the scattering magnetic field of the outer surface, respectively.
Equations (1) and (2) calculate the scattering electric and magnetic fields, respectively: where γ 1 and γ 2 represent the propagation constants of the inner and outer regions, respectively; Equation (3) where r' and r represent the source point and observation point, respectively. Equation (7) introduces boundary conditions of electric field and magnetic field on the outer surface; Equation (8) introduces boundary conditions of the electric field and magnetic field on the inner surface: Then, we establish normal surface integral Equations (9) and (10) for electric current and magnetic current by combining the scattered electromagnetic fields and boundary conditions on the inner and outer surfaces, respectively: Next, the tangential PMCHWT Equations (11) and (12) are obtained by applyingn× on both sides of Equations (9) and (10): Finally, balancing the magnitudes of the electric and magnetic fields of Equations (11) and (12) with the wave impedance η 0 , yielding the tangential PMCHWT Equations (13) and (14): (14) where, η ir = ). The current density and magnetic current density of the electric field integral equation are expanded by the RWG basis function Λ n (r'): When the unknown functions of the PMCHWT integral equations are expanded by the basis function Λ n (r'), the task of solving the integral equations is transformed into the unknown coefficients α n and β n of the basis function. The number of expansion basis functions is the number of equations unknowns.
When the MoM is used to analyze the SIE of a target, the surface of any shape object can be simulated by a planar triangular patch, and the surface current and magnetic current can be described by a planar RWG basis function. Each RWG basis function is composed of two adjacent triangles, describing the current or magnetic current flowing from one triangle (positive triangle) to another (negative triangle). The mathematical expression is as in Equation (16): where A ± n represents the area of the corresponding triangle, l n is the side length, ρ ± n is the direction vector of the current in the triangle, and T ± n represents the positive and negative triangle. The definitions of each symbol are shown in Figure 2. The RWG basis function has two important properties. The first characteristic is the continuity of the normal component of the edge, which ensures the continuity of the current across the common edge; the second characteristic is that the divergences of the basis function of the two triangles are equal, and the symbols are opposite, which ensures that the total charge corresponding to the basis function is zero. Its expression is as in Equation (17): Since the RWG basis function is a vector basis function, it is often used to expand the surface current density and magnetic current density.
The structure of the impedance matrix of the PMCHWT equations is shown in Figure 3. The impedance matrix elements are calculated by Equations (18)- (21). The double integrals S m Λ m (r)· S n = G i r, r ' ·Λ n r ' dS dS and S m Λ m (r)· S n ∇G i r, r ' × Λ n r ' The right vector elements are calculated by Equations (22) and (23):

Triangular-Triangular Cyclic Integral Method for Element Calculation of Impedance Matrix
In the calculation of the traditional MoM, the cycle unit filled by impedance matrix elements is often the basis function itself. Each current RWG basis function contains two triangular patches for the commonly used triangulation. It is easy to find that the basis functions of non-single-sided basis function units such as RWG as they often have overlapping regions in the solution region, which leads to a large number of repeated calculations in the process of taking the basis function itself as the iterative unit. If the RWG-RWG cyclic integral is used, three edges of each triangle correspond to an RWG basis function. The same triangle area score will be repeated nine times in the iterative integral of the interaction of a pair of basis functions. If the triangle-triangle cyclic integral is used instead of the RWG-RWG, the repeated calculation can be avoided, which dramatically improves the calculation speed and iterative efficiency of matrix elements.

The Octree Establishes Grouping
The ACA algorithm is based on the low-rank matrix factorization principle, so it is applied to solve the matrix equation in the electromagnetic field and must have certain preconditions; that is, it can only be used to solve the far-field impedance matrix generated by far-field interaction. The impedance matrix generated by near-field interaction is generally of high rank, which is suitable for solving by the traditional MoM. Therefore, the basis function must be grouped. To make the established matrix have good performance, in the regional division of the whole three-dimensional object, we should try to ensure that each sub-domain has little difference. An octree is a tree-like data structure used to describe three-dimensional space. The basic idea of the subdivision is equalizing in every direction. In the specific subdivision process, we use a cube to represent the space where the entity is located; that is, the octree starts from a cube. The first divides it into eight sub-cubes, the next continues to decompose according to certain rules, and then divides some sub-cubes into eight sub-cubes, according to this recursive division, until a certain division termination condition is satisfied. The whole partitioning process can be represented by a tree structure with 8 or 0 child nodes per node.
In this paper, the metal nanoparticle was stratified by a square box. The square's side length is the maximum length of the metal nanoparticle in the three coordinate axes of the three-dimensional coordinate system. The specific octree structure can be defined in the following way.
As shown in Figure 4, the cube surrounding the entire metal nanoparticle is first divided into eight small cubes; because these eight cubes cannot distinguish between far interaction or near interaction, each cube has to be further divided down. Therefore, there are 64 small cubes in the second division. We take these 64 cubes divided twice as the first layer and then divide them again as the second layer. In this way, this is the whole process of octree division of the object region. After the partition, the square boxes in the whole region are divided into non-empty groups and empty groups according to whether the small square contains the split triangular mesh. Empty groups are groups that do not need to be calculated and are discarded directly. According to the distance between non-empty boxes, the interaction is divided into far-field interaction and near-field interaction, and then calculated, respectively. The simplified two-dimensional plane diagram is shown in Figure 5. The self-acting group refers to the interaction between the basis functions in the same square box; the near-field group refers to the interaction between adjacent square boxes; the far-field group refers to the interaction between square boxes that are not separated.

Overview of the Adaptive Cross Approximation (ACA) Algorithm
The ACA algorithm draws on the octree structure model, divides the basis functions into each group according to the vertex position, then divides the impedance matrix into many sub-matrix blocks with different dimensions. The diagonal matrix block is a full rank matrix, representing the interaction between self-acting groups and near-field groups. The traditional MoM is used to calculate. The non-diagonal matrix block is a low-rank matrix block that represents the interaction between far-field groups and is calculated by the ACA algorithm. The matrix block diagram is shown in Figure 6. The ACA algorithm is a pure algebraic algorithm, which approximately decomposes a large matrix into the multiplication of two small matrices according to the decomposition principle of matrices [30]. Assuming that the rank of the matrix A is r, the number of dimension objective basis functions of the matrix is determined, and it is assumed to be m×n, which is approximately decomposed into the multiplication of two matrices U and V. Matrix decomposition is shown in Figure 7, and its expression in Equation (24): The rank of a matrix can be artificially set by stopping criteria. Equation (25) defines the threshold condition of iteration.
where R is an error matrix, ε denotes the stop precision, and · means the matrix Frobenius specification. The detailed process of the ACA Algorithm 1 is as follows [29]:

Algorithm 1. The detailed process of the ACA algorithm
Initialization steps: (1) Initializing the index of the first row: I 1 = 1, suppose A = 0; (2) Initializing the first row of the error matrix: R(I 1 , :)= A(I 1 , :); (3) Determining the index J 1 of the first column: | R(I 1 , J 1 )|= max j (| R(I 1 , j)|); (4) Obtaining the first row of the V matrix: v 1 = R(I 1 , :)/| R(I 1 , J 1 )|; (5) Initializing the first column of the error matrix: R(:, J 1 ) = A(:, J 1 ); (6) Obtaining the first column of the U matrix: u 1 = R(:, J 1 ); (7) Computing the approximate matrix: (8) Determining the index of the second row I 2 : | R(I 2 , J 1 )|= max j (| R(i, J 1 )|); Next, the k th iteration: (1) Updating the I k th row of the approximation matrix of the error: R(I k , :)= A(I k , :) − ∑ k−1 l=1 (u l ) I k v l ; (2) Finding the maximum value in line I k to determine the J k th column: | R(I k , J k )|= max j (| R(I k , j)|), j = J 1 , . . . ,J k ; (3) Obtaining the k th row of the V matrix: v k = R(I k , :); (4) Updating the J k th column of the error matrix: R(:, J k )= A(:, J k ) − ∑ k−1 l=1 (v l ) J k u l ; (5) Getting the k th column of the U matrix: u k = R(:, J k )/| R(I k , J k )|; (6) Computing the approximate matrix: , end iteration, else go on; (8) Continue looking for the next line I k+1 : When the ACA algorithm is used to solve the matrix U and V, according to the adaptive principle, only k rows and k columns in the matrix are selected, and each element of the impedance matrix is not calculated in advance, which reduces the complexity. Since k < min(m,n), the calculation number of impedance matrix elements is k × (m + n), far less than m × n, and the computational complexity is reduced to 2kX, where X is the number of unknowns.
The most prominent feature of the ACA algorithm is that it does not depend on Green's function and integral equation in the solution process. In addition, another advantage of the ACA algorithm is that it approximates the original matrix only by knowing the elements of the original matrix and updating the specific column information.

ACA Algorithm Accelerates Filling Impedance Matrix
The octree data structure has divided all unknowns into near-field and far-field groups. The near-field group's impedance matrix filling adopts the traditional storage method, and all impedance elements are stored in sequence. The far-field impedance matrix elements are screened by the ACA algorithm, and the sparse storage method only needs to store k rows and k columns of elements, which significantly saves memory. At the same time, the ACA algorithm changes the matrix-vector multiplication order for the iterative solution of the equation. As shown in Figure 8, A n×n ·X n = b n in the traditional MoM is changed to U n×k · V k×n ·X n = b n , where n is the number of unknowns. The amount of calculation is changed from n × n to 2kn, plus the element filling time of the impedance matrix of the near-field group, and the overall time complexity is reduced to nlogn.

Results and Discussion
In this section, we provide some examples to prove the accuracy and efficiency of the PMCHWT integral equations with the ACA algorithm in analyzing the far-field scattering characteristics of a metal nanoparticle. The CPU model used in all examples in this section is Intel CoreTMi5-7500, and the memory capacity is 8 GB. The relative permeability is 1.0.
Considering a gold nanosphere with a radius of 200 nm analyzed in reference [14], the uniform plane wave with a wavelength of 550 nm is illuminated on the metal nanosphere; meanwhile, the pitch angle is 0 • , and the azimuth angle is 0 • . The relative dielectric constant of the gold nanosphere at this incident wavelength is listed in reference [3] as ε 1r = −8.0 − j1.66. The number of unknowns is 1920. Figure 9a calculates the HH polarization and VV polarization of the bistatic RCS of the metal sphere; Figure 9b-d calculate the memory consumption of the near-field impedance matrix, the impedance matrix filling time, and the time required for each iteration step respectively after applying the ACA algorithm under HH polarization. It is obtained from Figure 9a-d that the PMCHWT equations solved by the pure moment method and the PMCHWT equations solved by the ACA algorithm acceleration are in good agreement and also reduce the computation time from N 2 to Nlog(N), where N is the number of unknowns. Table 1 shows the time comparison between filling impedance matrix with pure MoM and accelerating the filling impedance matrix with the ACA algorithm. Considering a gold nanosphere with a radius of 68.25 nm analyzed in reference [15] and a radius of 456 nm analyzed in reference [17], respectively, the uniform plane wave with a wavelength of 546 nm is illuminated on the metal nanosphere; meanwhile, the pitch angle is 0 • , and the azimuth angle is 0 • . The relative dielectric constant of the gold nanosphere at this incident wavelength is listed in reference [3] as ε 1r = −5.84 − j2.11. The number of unknowns is 1920. Figure 10a,b calculate the HH polarization and VV polarization of the bistatic RCS of the metal sphere. It can be seen from the graph that the PMCHWT equations solved by the pure moment method and the PMCHWT equations solved by the ACA algorithm acceleration are in good agreement. Table 2 shows the time comparison between the filling impedance matrix with pure moment method and accelerating the filling impedance matrix with ACA algorithm.

Conclusions
A method for rapidly calculating the light scattering of a metal nanoparticle was proposed. Aiming at the light-scattering problem of metal nanoparticles, the MoM discretized PMCHWT integral equations, and, finally, the integral equation system was solved by FGMRES. Due to the low-rank characteristic of the far-field impedance matrix, a pure algebraic ACA algorithm was used to compress the far-field impedance matrix, which reduced