Computing the Double-Gyroaverage Term Incorporating Short-Scale Perturbation and Steep Equilibrium Profile by the Interpolation Algorithm

In the gyrokinetic model and simulations, when the double-gyroaverage term incorporates the combining effect contributed by the finite Larmor radius, short scales of the perturbation, and steep gradient of the equilibrium profile, the low-order approximation of this term could generate unignorable error. This paper implements an interpolation algorithm to compute the double-gyroaverage term without low-order approximation to avoid this error. For a steep equilibrium density, the obvious difference between the density on the gyrocenter coordinate frame and the one on the particle coordinate frame should be accounted for in the quasi-neutrality equation. A Euler–Maclaurin-based quadrature integrating algorithm is developed to compute the quadrature integral for the distribution of the magnetic moment. The application of the interpolation algorithm to computing the double-gyroaverage term and to solving the quasi-neutrality equation is benchmarked by comparing the numerical results with the known analytical solutions. Finally, to take advantage of the interpolation solver clearer, the numerical comparison between the interpolation solver and a classical second order solver is carried out in a constant theta-pinch magnetic field configuration using SELALIB code. When the equilibrium profile is not steep and the perturbation only has the non-zero mode number along the parallel spatial dimension, the results computed by the two solvers match each other well. When the gradient of the equilibrium profile is steep, the interpolation solver provides a bigger driving effect for the ion-temperature-gradient modes, which possess large polar mode numbers.


Introduction
The micro-scale turbulence plays a significant role on the confinement capability of the magnetized fusion plasma through its interaction with the low-frequency zonal flow [9,26,1,26,37,12,19,38], the equilibrium profile at the pedestal region and the edge localized modes [39,18,3,20,28], etc. The importance of the pedestal region is reflected by the fact that the pressure of the plasma core is proportional to the pressure at the pedestal top [30,35,10]. The gradient of the equilibrium temperature and density at the pedestal region could be very sharp, alternatively, the truncation of exp(ρ · ∇) acting over n 0 or T 0 at the first order, which is utilized by the standard gyrokinetic model [13,25,24], is not a good approximation, where ρ is the Larmor-radius vector defined as ρ(x, µ, θ) = 1 q s 2µm s B (e 1 cos θ + e 2 sin θ) .
Here, q s , m s , µ are the particles' charge, mass and the magnetic moment on the guidingcenter coordinate frame; e 1 , e 2 are the unit vectors perpendicular to unit vector b of the equilibrium magnetic field, and e 1 , e 2 , b obey the right hand rule; B is the magnitude of the magnetic field; θ is the gyrophase angle; n 0 , T 0 are the equilibrium density and temperature profile. In the gyrokinetic theory, the first gyroaverage term could incorporate the effect contributed by the finite Larmor radius and short scales of the perturbation at the core and edge tokamak plasma, while the double-gyroaverage term (DGT) incorporates the combining effect contributed by the finite Larmor radius, the short-scale perturbation and the steep-gradient equilibrium profile within the edge transport barrier of tokamak plasma [11,24,27,36]. However, in the standard electrostatic gyrokinetic model and simulations, the approximation of DGT consists of up to the first order truncation of the Taylor expansion of the equilibrium profile and the second order truncation of the expansion of the perturbative potential [25,13,22,29,4,5,21,6,14]. The details of this method can be found in section (4). These kinds of low order approximation are not enough concerning the short-scale perturbations and the equilibrium profile possessing the steep gradient.
To overcome the mentioned drawbacks of the low-order approximation of DGT, this paper develops an interpolation algorithm to compute DGT for the purpose of the resolution of the short-scale perturbation and the steep equilibrium profile together. The interpolation algorithm can approach DGT with any accuracy by choosing enough interpolation points on the Larmor circle, with only the constraint from the length scale of the mesh. Meanwhile, contrary to the traditional way, the steep equilibrium density shouldn't keep the same before and after the gyroaverage operation, which is denoted by the symbol J in this paper. And the obvious difference between the density on gyrocenter coordinate and the one on the particle coordinates should be accounted for in the quasi-neutrality equation(QNE). The obvious difference is revealed by Fig.(1), for which the normalisation scheme can be found in subsection (3.3) and the parameters to obtain this figure are provided in subsection (8.3.3). In Fig.(1), the purple line denotes the equilibrium density profile on the gyrocenter coordinates, which is given as the initial condition, while the density profiles on the particle coordinate derived by the gyroaverage operation computed with different number of µ are shown by other curves. The numerical integration of µ is done by the Euler-Maclaurin-based quadrature integration algorithm.
As a comparison to the interpolation solver, a classical 2nd order truncation of DGT is carried out in this paper to obtain QNE with the 2nd-order accuracy. It consists of the truncation of the exponential operator exp(ρ · ∇) over the potential up to the second order and the truncation of the operator exp(ρ · ∇) acting on the density up to the first order. The numerical simulations are carried out to compare the two solvers. Since this paper is to compare the two solvers in general purpose, the deduction of flux surface average of the potential from the total potential is not carried out to obtain the electrons' adiabatic distribution. For the equilibrium profile without steep gradient, both algorithms have almost the same performance in terms of the perturbation only possessing the mode number along the parallel spatial dimension. For the steep equilibrium profile, the interpolation algorithm could provide a stronger driving effect for the perturbation of high polar mode numbers, and the saturation time of these modes computed by the interpolation algorithm is earlier.
The remaining parts of this paper are arranged as follows. Section (2) introduces the orders used in deriving the gyrokinetic model. The DGT and QNE incorporating the short-scale perturbation and steep equilibrium profile on the gyrocenter coordinate frame are derived in Section (3). Section (4) derives the QNE with the 2nd-order truncation. In Section (5), the interpolation solver is introduced. The benchmark of the interpolation algorithm is done in Sections (6) and (7) for the case of the single µ and µ obeying the distribution, respectively. The Euler-Maclaurin-based quadrature integrating algorithm is developed in Section (7) to compute the numerical quadrature integral of µ. The application of the interpolation algorithm and the Euler-Maclaurin-based quadrature integrating algorithm to the gyrokinetic simulations is provided by Section (8), where the parallel scheme of the whole simulations are presented. density radial position density gyrocenter N=8 particle coordinate N=64 particle coordinate Figure 1: The equilibrium density profile on the gyrocenter coordinate before the gyroaverage operation is shown by the purple line, while the ones on the particle coordinate derived by the gyroaverage operation and with the quadrature integral of µ computed by the Euler-Maclaurin-based algorithm with the forward finite difference scheme for µ meshes of 8 and 64 nodes are presented by the other two curves. The radial distance is normalised by ρ 0 .

The basic orders
There are several basic orders or scales contained by the perturbation. The first one is the length scale O(ε) of the nondimensionalized Larmor Radius being ε. The second one is the amplitude of the normalized electrostatic potential, whose ε-based order is denoted as O (ε σ ) with the power σ a positive real number. In the magnetically confinied fusion plasmas, due to the fact that the charged particle can nearly migrate freely in the environment with collective interactions, the magnitude of the potential the particles feel must be much smaller than that of its kinetic energy.
The third one is the length scale of the gradient of the electrostatic potential. De- where the subscript ⊥ and denotes the directions perpendicular and parallel the direction of equilibrium magnetic field, respectively. The gyrokinetic model of this paper adopts the scales for the perturbation where the short scales of the perturbation are accounted for. For any equilibrium quantity E ∈ {n 0 , T 0 }, this paper considers the scale due to the steep gradient of the equilibrium profiles, as well as the following scales where U is the parallel velocity and will be given later.
3 QNE incorporating the short-scale perturbation and steep equilibrium profile 3.1 The particle-coordinate density associated with the shortscale perturbation and steep equilibrium profile The procedure to derive the gyrokinetic model is composited by two parts. The first one is to derive the coordinate transform by decoupling the gyroangle from the dynamics of other coordinates, while the second one is to obtain the gyrokinetic quasi-neutrality equation by inducing the transformation of the distribution through the derived coordinate transforms [11,13,2]. The first step is accomplished by implementing Lie transform perturbation method on the fundamental Lagrangian one-form. Generally, four kinds of coordinate frameworks are involved in the procedure. The first one is the full-orbit coordinate with the velocity part in Cartesian coordinates. It's denoted asz ≡ (x, v) here. The second one is obtained by transforming v into the cylindrical coordinates, and it's written as z ≡ (x, , µ 1 is the velocity along the parallel direction, and θ 1 is the angle between ρ and e 1 . The x component in z is still in full-orbit frame. The third one is the guiding-center coordinatesZ = (X,μ,Ū ,θ), which is derived by decouplingθ from the dynamics of the other coordinate components without the presence of the perturbation. The fourth one is the gyrocenter coordinate Z = (X, µ, U, θ) which is derived by decouplingθ from the dynamics of the other coordinate components with the presence of the perturbation. The coordinate transforms betweenz, z,Z and Z are denoted as ψ f :z → z, ψ gc : z →Z and ψ gy :Z → Z, respectively, while the distributions on the four kinds of coordinates are respectively written asf (z), f (z),F (Z) and F (Z).
The coordinate transform ψ gc : z →Z is given bȳ while the coordinate transform ψ gy :Z → Z is provided by the following equations and And through the Lie transform perturbative method, we could derive the following Lagrangian on the gyrocenter coordinate frame where µ is a constant. The Euler-Lagrangian equations based on L in Eq. (7) provides the equations of motion as follows: where B * = B + ms qs U ∇ × b. If the denominate in Eq.(8a) is expanded by the order of the small factor ms qs U ∇ × b, we could obtain the curvature drift part mU 2 κ eB where κ = ∇B B .
For the Vlasov gyrokinetic simulation, we need to transform the distribution function from the gyrocenter coordinate to the full-orbit coordinate [13]. After obtaining the coordinate transform composited by Eqs.(3a-3d) and (4a-4d), given a distribution function on the gyrocenter coordinate F s (X, µ, U , t), the distribution function on the full orbit can be derived following the transform chain First, the total distribution function is separated into the sum of an equilibrium one plus a perturbative one as Then, the approximation of the distribution on the guiding-center coordinate can be derived based on the coordinate transform given by Eqs.(4a-4d) where the equilibrium distribution involving µ is assumed as The approximation of f s (z) is derived with Eq. (11) as the base where ΦF s0 Ts (x − ρ 0 (x, µ 1 , θ 1 ) , µ 1 , u 1 ) is the origin of DGT and comprises the effect combining together the finite Larmor radius, short-scale perturbation and steep equilibrium profile.
We assume that the equilibrium distribution F s0 can be decomposed as the product between the parallel part and the perpendicular part with probability conservation satisfied by where under the equilibrium condition, the metric B(x)/m s is used.
Then, through the integral n s (x, t) = f s (z) B(x) ms dµ 1 du 1 dθ 1 , the density can be assembled as with Here, the metric B(x)/m s of the phase space is used.
In this paper, we sometimes use the symbols J 1 (µ) and J 2 (µ) to denote the first and the second gyroaverage associated with the magnetic moment µ, specifically, 3.2 QNE with respect to the adiabatic distribution of electron and the steep equilibrium profile on gyrocenter-coordinate frame As shown in Fig.(1), the equilibrium density profile possessing the steep gradient on the gyrocenter coordinate frame is obviously different from the one on the particle-coordinate frame after the gyroaverage operation. Therefore, contrary to the traditional way, the implementation of the equilibrium density on the particle-coordinate frame should be different from the one on the gyrocenter-coordinate frame if its profile is steep in the radial dimension. We consider a plasma only including protons and electrons. The electrons obey the adiabatic distribution on the particle-coordinate space where the flux-surface average of the potential is not deducted from the total potential, because our purpose is to compare the two solvers. n g0 (x) is the equilibrium density in the particle-coordinate space and needs to be solved. The initial equilibrium distribution of ion is given on the guiding-center coordinate system by from which the equilibrium density on the guiding-enter coordinate is easily derived as n 0 (x). For the scales given by Eq.(2), the equilibrium density on the particle-coordinate space should be derive as Taking into account of the quasi-neutrality equilibrium, it's derived that Then, QNE of this plasma is

Normalization
In this paper, the quantities t, v, B, l, µ, T, φ are normalized by where T e0 ≡ T e (r p ) and r p ∈ [r min , r max ] is the radial position of the peak of the initial distribution function. Then, by choosing the equilibrium perpendicular distribution given by Eq. (12), the normalized QNE is with the normalized DGT and

2nd-order approximation of QNE
We first need to take Taylor expansion of DGT. According to Appendix A, the expansion rule of a general function A (x + ρ 0 ) is The gyroaverage of the following terms are needed The following identity to reduce the perpendicular distribution to the density is also necessary The first gyroaverage expanded up to the second order is The second gyroaverage expanded up to the second order is The density derived from Eq. (27) is Then, QNE derived by the 2nd-order approximation is With the assumption of B = 1, Eq.(29) becomes In Eq.(30), the gradient term of n 0 contains a factor of 1/2, which doesn't exist in the standard gyrokinetic model. If we replace the last term on the right side of Eq.(13) by a term of the form Φ(x − ρ) F s0 Ts (x − ρ), the expansion of that term doesn't provide the 1/2 factor. However, as the derivation to obtain Eq.(13) illuminates, the last term in Eq.(13) is the correct one.

QNE solver
To obtain the QNE solver comprising of the interpolation algorithm, QNE in Eq.(22) is rewritten as Φ (x) can be written as the discrete sum over µ and U as where δµ j and δU k are the step length for µ and U . Here, and in cylindrical coordinate frame, F sjk is defined . Φ j (r h , Θ p ) is used to denote the value at mesh node (r h , Θ p ) of Φ(X, µ j ) as the first gyroaverage of φ with respect to the magnetic moment µ j . It is computed on the grid discretizedly by the following right Riemann sum: where ρ j = 2µ j . In the same way,Φ j (r h , Θ p ) is used to denote the value at grid point (r h , Θ p ) ofΦ(X, µ j ) as the second gyroaverage of φ with respect to the magnetic moment µ j . It is also approximated by the following right Riemann sum: whereΦ jk = Φ j F sjk . The respective symbols + and − in Eq.(42) and Eq.(34) should be paid attention. The cubic splines interpolation is used to calculate φ at the interpolated point on the Larmor circle. The radial projection on the boundaries for the points outside the domain is used and we consider 2π-periodic condition in Θ.
In order to detail the steps of the solver, we note So,Φ jk can be written as the vector product between Φ jk and F sjk as The specific procedure is given by: The matrix A spl is independent of the Larmor radius or the magnetic moment µ j , while depending on the equilibrium quantities.
2. For a Larmor radius ρ j = 2µ j contributed by the magnetic moment µ j , constructing the matrix A contr l,ρ j ∈ M (Nr+1)×N Θ ,(Nr+1)×N Θ which gives the contribution of the gyroaverage of the radius ρ j as the function of the splines coefficients. Here, l = 1, 2 denoting the first and the second gyroaverage, respectively.

The first gyroaverge is given by
The second gyroaverage is written as the matrix formΦ jk = G 2,ρ jΦ jk , with G 2,ρ j ≡ A contr l,ρ j A spl . To realise the matrix form betweenΦ jk and φ, the vector product F sjk ·Φ j can be written as the product between a diagonal matrix denoted as F sjk and Φ j . Eventually, Φ jk can be written as the matrix form Note that the elements of matrix G 2,ρ j F sjk G 1,ρ j only depends on the equilibrium quantities. It can be computed once for all at the beginning of the simulation. 5. Then, the integral of µ contained by the double gyroaverage term can be formulated as the discrete sumΦ We construct the matrix of the quasi-neutrality operator in the Fourier basis. In fact, due to the periodic structure in the Θ dimension, A spl and A contr l,ρ j are recognized to have circulant block structure [17], which in Fourier basis can be transformed as block diagonal matrixes. More precisely, the matrixes A spl , A contr l,ρ j and F s0jk have the following structure Due to that F sjk is a diagonal matrix with only the main diagonal elements not equaling zero, the following arrangements of the product order in Eq.(35) equal The product order on the left is chosen in this paper. Then, the product A spl F sjk is further denoted as A spl jk in terms of the following expression where The three matrixes are diagonalisable in the Fourier basis as where * means the complex conjugate and the other matrixes are defined as follows where I n is the identity matrix of size n × n.
As explained before, the value of φ(x) and n 1 (x) on the grid points can be assembled to be the vectors φ and n 1 . Then, the first term and third term of Eq.(31) can be written in a matrix form with only the main diagonal elements of the coefficient matrix unequal to zero.
Due to the circulant block structure of the matrix A spl hjk , A spl , A contr l,ρ j and the coefficient matrixes of the first term and third term on the left of Eq.(31), the computation of the QNE solver comprising gyroaverage matrix G 2,ρ j F sjk G 1,ρ j can be performed using a fast algorithm: 1. Change to Fourier basis by FFT(φ). 2. Compute the product G 2,ρ j F sjk G 1,ρ j φ to obtain the total coefficient matrix of the left side of QNE. 3. Use the subroutine of Lapack and Blas to compute the inverse matrix of the coefficient matrix. 4. Change the results to the real space by FFT −1 . The use of polar mesh and FFT allows to make more quickly computations and provides a base for a future work in more complex geometry.

5.2
The algorithm to compute A spl and A contr l,ρ j In our interpolation scheme, the following cubic spline function is used as the basis function The potential on the grid point (h, p)is where x 1 and x 2 can be the coordinates in the Cartesian frame and in the polar frame. C h+a,p+b is the weight of the two dimensional basis B h+a B p+b . B h+a (x 1h )/B p+b (x 2p ) is the value of the one dimensional basis B h+a /B p+b at the node x 1h /x 2p . By assembling φ h,p into the vector φ as explained before, Eq.(40) can be rewritten as the tensor product of two matrixes multiplying the vector C comprising the weight coefficients of all the basises where C ≡ (C 0,0 , · · · , C Nr,0 , C 0,1 , · · · , C Nr,1 , · · · , C 0,N Θ−1 , · · · , C Nr,N Θ−1 ) t , and the subscript P denotes the periodic boundary condition used in the x 2 dimension, while b denotes the periodic or natural boundary condition used in x 1 dimension. B b (x 1 ) and B P (x 2 ) are the matrixes representing the value of B h (x 1 ), h ∈ {0, 1, 2, · · · , N r } and B p (x 2 ), p ∈ {0, 1, · · · , N Θ − 1} on the nodes of the x 1 and x 2 domains, respectively. Here, Because the rows of B P (x 2 ) have the circulant property, the matrix B P (x 2 ) ⊗ B b (x 1 ) is of the circulant block structure. Its inverse can be written as which is still a circulant block matrix. Then, we derive For the gyroaverage operation, Eqs. (42) and (34) can be rewritten in the following form The global coordinates of the mth interpolated points on the Larmor circle of the grid point (h, p) are computed as In Eqs.(43a) and (43b), C min,1 and C min,2 are the respective minimal value of the x 1 and x 2 domain; δ 1 and δ 2 are the uniform step length in the respective dimension; 0 ≤ c 1 < δ 1 and 0 ≤ c 2 < δ 2 hold. d hpm,1 and d hpm,2 are the global nodes numbers of the interpolated point on the Larmor circle denoted by the indexes h, p, m. C lj,hpm,1 and C lj,hpm,2 can be the global coordinates in the Cartesian or polar coordinate frames. In the 2nd equality of Eq.(42), the potential at the interpolated point (C lj,hpm,1 , C lj,hpm,2 ) is written by the sum of the contribution of the cubic spline basis. We use D hpma and D hpmb to denote the global node number of the grid point associated with the indexes (a, b). This grid point is where the basis function locates. Then, val(a, b) as the spline function value at (C lj,hpm,1 , C lj,hpm,2 ) is given by C lj,D hpma ,D hpmb in Eq.(42) is the weight coefficient of the cubic spline function locating at (D hpma , D hpmb ). Eventually, by summing the repeated contribution nodes together, the gyroaverage of φ could be written into a matrix form Here, we want to mention that due to the finite value of µ j , some interpolated points on the Larmor circle of the grid point close to the boundary could locate out of the chosen domain. At the out boundary, for a realistic plasma, the amplitude of the fluctuation of the potential is close to zero. So we approximate the contribution of such points by the contribution of their projection points on the boundary.
The elements of A contr To compute A contr l,ρ j , due to its circulant block structure, we only need to compute the block matrixes associated with p 1 = 1. The other rows of block matrixes associated with p 1 = 1 can be obtained by permuting the indexes p 2 s for p 1 − 1 times, as shown in Eq.(36b). The algorithm to compute A contr l,ρ j (h 1 , (p 2 − 1) * (N r + 1) + h 2 ) is given by Algorithm 1.
6 Benchmark of the interpolation algorithm for the single µ case To benchmark the interpolation algorithm, we integrate the absolute value of the error of the interpolation solver deviating from the analytical solution over the simulated domain, specifically, where (interp) hp and (anal) hp denote the value computed by the interpolation solver and by the analytical solution at the mesh node (h, p), respectively. δx 1 and δx 2 are the uniform steps in x 1 and x 2 dimensions. For some cases, we also implement the same scheme to the 2nd-order solver.

1st example: double periodic boundary condition in Cartesian coordinate
In this example, the periodic boundary condition is used in both x 1 and x 2 dimensions. According to Eqs. (67) and (68), for the function f (x) = cos(nx 1 +mx 2 ), the exact solution of J 1 (µ)f (x) and J 2 (µ)J 1 (µ)f (x) are J 1 (µ) cos(nx 1 + mx 2 ) = J 0 (ρ(µ) √ n 2 + m 2 ) cos(nx 1 + mx 2 ), Two cases are computed. For the first one, n = 0, m = 2 are used, and µ = 0.02, N θ = 100, while for the second one, n = 5, m = 5 are chosen. For both cases, we compute the domain (0, 2π) × (0, 2π) which are divided into three meshes of (16,16), (32,32), (64, 64) cells, respectively. The results are shown in Tables 1 and 2. It's found that compared with the 2nd-order solver, the interpolation algorithm has better accuracy and its results present the converging rate of the fourth-order accuracy, which is provided by the cubic spline interpolation.  For this case, the periodic boundary condition is used in x 2 dimension, while the natural boundary condition is used in x 1 dimension. The function under gyroaverage is f (x) = x 2 . Its first gyroaverage and double gyroaverage are  Table 3. To obtain this, we ignored the boundary rows in the x 1 dimension, for which the natural boundary condition is used. The fact that the abs error of the 2nd-order solver is zero is due to that the first gyroaverage and the double one of f (x) depends on the gradient of f (x) only through ∇ 2 ⊥ f (x), as Eq. (26) shows. ∇ 2 ⊥ f (x) in Cartesian coordinates is written as (∂ 2 x 1 + ∂ 2 x 2 )f (x). The second order derivative is approximated by the centered finite difference as ∂ 2 with the 2nd-order precision, where f j+1,k ≡ f (x 1,j+1 , x 2,k ). It's obvious that the finite difference of ∂ 2 x 1 over f (x) being x 2 equals zero, so does ∂ 2 x 2 over f (x) except at the boundary. The abs error of the 2nd-order solver equaling zero is obtained when the boundary row are ignored.

Benchmark the QNE solver in the Cartesian coordinate frame and in the polar coordinate frame
The previous examples benchmark the interpolation algorithm to compute the gyroaverage operation. This subsection is dedicated to benchmark the QNE solver, which comprises of solving the inverse matrix of A spl and A contri l,ρ j by FFT based on the algorithm given in Subsec. (5.1), and the interpolation algorithm to obtain the gyroaverage.
For the convenience to find the exact solution of QNE, we simplify QNE in Eq.(22) by choosing T i = T e = 0.5, n 0 = n g0 = 1.0. We also solve the single µ case first, so that QNE is simplified as We first benchmark the QNE solver in Cartesian coordinate frame with doubleperiodic boundary condition. By choosing We choose n = 3, m = 3, µ = 1.0 and µ = 0. The simulated domain (0, 4) × (0, 2π) are divided into three meshes including 32 × 32, 64 × 64 and 128 × 128 cells, respectively. The integrals of the absolute value of the error for the three kinds of mesh are presented in Table 5, which indicates that the QNE solver could provide a good solution. For the µ = 0.5 case, we also plot in Fig.(3) the exact solution and the results computed by the QNE solver with node number in x 2 dimension equaling 20. The two solutions in Fig.(3) fit each other well. To benchmark the QNE solver in the polar coordinate frame, the function n 1 in Eq.(45) is chosen as which leads to the exact solution of φ(x) through Eq.(45) The simulated domain (0, 4) × (0, 2π) is also divided into meshes of 32 × 32 and 64 × 64 cells, respectively. We computed two cases of µ = 0.02 and µ = 0. The integral of the absolute error over the simulated domain for the two µ cases are given in Table 6. The fourth-order converging rate doesn't appear anymore, due to the error at the boundary area where the natural boundary condition is used. The error has two sources. One is that the natural boundary condition is used. The other is that around boundary, due to the finite value of the Larmor radius, there are interpolation points locating out of the simulated domain and we project these points to the corresponding points at the boundary.
We still plot the exact solution and the numerical solutions of the polar profile at the radial position 7/4 and the radial profile at the polar angle 7π/8 in Fig.(4), which presents good match between the two solutions.

Benchmark of the interpolation algorithm for the multiple µ case
This benchmark is done in the polar-coordinate frame. The test function is chosen as We consider the gyroaverage associated with µ obeying the distribution e −µ/a where a ∈ (0.5, 2.2) is chosen. The integral of the distribution over µ is written as The exact solution of the first gyroaverage is which can be written into a discretized formula

The Euler-Maclaurin-based quadrature integration algorithm
Φ is given in Eq.(23) and F is proportional to exp(− µB T i (r) ). To obtain the integral over µ, the conventional way is to redefine a new magnetic moment by µ new = µ/T i (r). Then, the exponential factor exp(− µB T i (r) ) becomes exp(−µ new ) and the Gauss-Laguerre quadrature is implemented to obtain the discretized sum to approximate the continuous integral. However, in the new termΦ, the denominate of T i (r) in exp(− µB T i (r) ) together with n 0 (r) determines the radial density profile of the ions which possess the magnetic moment µ. Since T i (r) could possess the steep gradient in the radial dimension, its gyroaverge J (µ)T i (x) can't be ignored that it can not be eliminated from the denominate in the conventional way. Moreover, we can not use Gauss-Laguerre quadrature method to provide the same group of µ nodes and the associated weights along the radial dimension for different T i (r) along the radial dimension. Because the value of T i (r) is within a domain, which is chosen as (0.5, 2.2) in this paper, and the gradient of T i (r) after the gyroaverage could also change, resulting in that the roots and weights provided by Gauss-Laguerre quadrature for the integral µmax 0 e −µ dµ can not fit the integral µmax 0 e −µ/a dµ for all as belonging to the domain (0.5, 2.2) which is chosen in this paper.
We therefore develop a quadrature integrating scheme based on the following Euler-Maclaurin quadrature formula where h = (c−b)/n, n−1 is the number of the cells of the mesh, B 2r is the 2r-th Bernoulli number and R p is the residual quantity. The order of the precision of the right side is beyond O(h 2M ). For the distribution function proportional to exp(−µ/a), R p and the term depending on f 2r−1 (b) can be ignored for large enough b. B 2r can be solved by the recurrence relation The derivative f 2r−1 (b) is replaced by the finite difference scheme. For a function F (x), the finite difference scheme of the dth order derivative with the pth order precision can be formally written as the following summation of the function value on the grid points where G dpi is the coefficient of the finite difference with the pth order precision for the dth order derivative, and the step length of the finite difference is the same with that used in Eq.(46). For the forward finite difference scheme of the dth order derivative with pth order precision, i min = 0 and i max = d + p − 1 are derived, while for the centered finite difference scheme, we have −i min = i max = (d + p − 1)/2. By truncating the Taylor expanding of F (x + ih) at the order O((ih) d+p−1 ), Eq.(47) can be rewritten as The equaling of both sides of Eq.(48) results in the following identities imax i=i min through which the coefficient G dpi s can be solved. By substituting Eq.(48) into the Euler-Maclaurin formula, we obtain its discretized version For the forward finite difference scheme, Eq.(49) can be rearranged into where we select the condition satisfying 2M + p − 1 ≤ n − 1. The weight coefficients of f (b + ih) in Eq.(50) for all i ∈ {0, 1, 2, · · · , n − 1} are computed and stored at the beginning of the simulation for the following quadrature integration. For the central difference scheme, Eq.(49) can be rearranged in the same way into The value of f (b − ih) is obtained by considering the even or odd property of f (x − b) relative to the point x = b. The value of b equals 0 for the situation we consider, while the integrand is an even function relative to the point b = 0. One more point to mention is that instead of µ, we implement v ⊥ , which equals √ 2µ, as the integration argument for the achievement of the high precision. So dµ = v ⊥ dv ⊥ and the minimal value of v ⊥ equals 0. Based on this implementation, we developed the third one named derivative-reduction scheme. By replacing µ with v ⊥ , the integral f (2µ)dµ At the point v ⊥ = 0, we use the following formula to reduce the order of the derivative

Then, the third discretized expression of Euler-Maclaurin formula is derived as
where we have used the even property of the function To test the accuracy and the convergent rate of the three schemes based on the Euler-Maclaurin formula, we apply them to the computation of the exponential integral We also compare its results to those computed by the trapezoidal quadrature algorithm with uniform mesh. Through dividing the domain (0, 30) into 1E + 7 equal cells, the two integrals are approximated by the Riemann sum. By treating the Riemann sums as the exact value, the absolute value of the numerical error computed by the two algorithms are plotted in Fig.(5) for the three cases a = 0.5, 1.0, 2.2. Fig.(5) shows that compared with the trapezoidal quadrature algorithm, the three schemes of the Euler-Maclaurin-based algorithm provide a much higher precision order when the nodes number goes from 17 to 129.

Benchmark of DGT computed by the interpolation algorithm in the polar coordinate frame assisted by the forward scheme of the Euler-Maclaurin-based algorithm
This benchmark is done in the polar-coordinate frame and the forward scheme of the Euler-Maclaurin-based algorithm is implemented. In this benchmark, n = 0, m = 1 is chosen. The simulated domain of (r, Θ) is (0, 16) × (0, 2π), which is divided into 40 × 40 meshes. µ is in (0, 20) which is divided into one mesh including 64 nodes. Only the case of a = 1 is computed here. The polar profile at r = 6 and the radial profile at Θ = 7π    √ 2µ)e −µ/a dµ computed by the three schemes of Euler-Maclaurin-based quadrature integrating algorithm and the trapezoidal algorithm with uniform mesh. We choose M = 4 in the Euler-Maclaurin formula and p = 6 as the precision order of the finite difference. Here, "deri-red" denotes the derivative-reduction scheme, while "trap" denotes the trapezoidal algorithm.

Benchmark of the QNE solver in the polar-coordinate frame assisted by the forward scheme of the Euler-Maclaurin-based algorithm
We also benchmark the QNE solver in the polar coordinate frame with µ obeying the distribution e −µ/a , with which the quasi-neutrality equation in this case is where the same parameters T i0 , T e0 , n 0 , n g0 are used. We choose the test function of n 1 (x) = cos(nx 1 + mx 2 ). The exact solution of Eq.(53) is φ(x) = cos(nx 1 + mx 2 ) 4 − I 2 (µ max , n, m) .
In this benchmark, the radial domain is chosen as (0, 20). The domain of r × Θ is divided into a mesh of 64 × 64 cells. The weights of of the µ mesh for a = 1, 0.5, 2.2 are computed by the Euler-Maclaurin-based quadrature algorithm. The magnitude of the numerical error of I 0 , I 2 and the abs error of the integration of the solution over the domain for the three a cases are given in Table 7. The computing formula of the abs error is presented by Eq.(44). In Fig.(7), the radial profile at Θ = 7π 8 and the polar profile at r = 35 4 for a = 0.5,a = 1,a = 2.     The cylindrical coordinates frame with the constant theta-pinch magnetic field configuration is used in our simulations. The whole simulations are executed on the ATLAS HPC platform of IRMA by introducing our test modules into SeLaLiB code [31], which is a classic semi-Lagrangian library using cubic splines interpolation [8,32,15,16]. The predictorcorrector method, the Verlet algorithm for computing the characteristics, and the Strang splitting of the advection of Vlasov equation are also implemented [23,15,33,34,7]. The gyrokinetic Vlasov equation in cylindrical coordinate frame is The characteristics areṙ The quasi-neutrality equation is given by Eq.(22).

The algorithm to reduce the distribution to the density
In the discrete version of µ, the distribution of ions associated with each µ j with j ∈ {1, · · · , N µ } is denoted as F j (x, µ j , U ). Due to the identity dµ j dt = 0, F j (x, µ j , U ) satisfies the Vlasov equation in the cylindrical coordinate system where (r, Θ, x ) denotes the cylindrical coordinate frame. F j (x, µ j , U ) can be rewritten as the sum and T i (x) ). In the numerical simulation, F 0j (x, µ j , U ) doesn't evolve. At each time step, F j (x, µ j , U ) is obtained by solving Eq.(56) and F 1j (C, µ j , U ) is derived by using F j (x, µ j , U ) minus F 0j (x, µ j , U ).
The normalised version of Eqs.(17a) and (17b) to obtain n sg0 and n sg1 can be discretized in the µ dimension by obtaining a mesh of µ with N µ nodes. The contribution of each µ j component to the density n sg and n sg0 can be written as Then, the n sg1j can be derived as n sgj − n sg0j . And the summation of all µ j s leads to the total density. The specific procedures to compute the density on the particle-coordinate frame are listed below: 1. The distribution F j (x, µ j , U, t) for each µ j at time moment t is computed by Eq.(56).
4. The equilibrium density in the particle-coordinate frame obtained from the gyroaveage of the equilibrium distribution F 0j (x, µ j , U ) for the µ j component is done at the beginning following the procedure 2, 3 and is stored for the subsequent invoking. By inheriting the symbol used in Eq. (29), the jth equilibrium density on particle-coordinate spatial mesh is denoted as n g0j,hp . Then, the perturbative density on the particle-coordinate spatial mesh contributed by the µ j component is 5. The total perturbative density on the particle-coordinate space computed by all j ∈ {1, · · · , N µ } is obtained by "MPI_ALLREDUCE" the n g1j,hp as j n g1j,hp B hp δµ j , where B hp is the magnetic field magnitude at the node denoted by (h, p).
For an initial equilibrium density profile on the gyrocenter coordinate frame shown by the purple line of Fig.(1), assuming the equilibrium distribution of µ obeying Eq.(60), the density on the particle coordinate space after the gyroaverage operation is obtained and shown in Fig.(1).

The parallelization
MPI is used to the parallelization of the computing of µ. The processors are divided into N µ sub-communicators. F j (x, µ j , U ) with j ∈ {1, · · · , N µ } is exclusively computed by the jth sub-communicator. And the respective precomputing matrixes of Φ j are stored in the jth sub-communicator.Φ and n i1 (x) in Eq. (23) and Eq.(17b) are computed by "MPI_ALLREDUCE" the respective quantity stored in the processors of the same "color" with respect to the respective sub-communicator.
To calculate the advection of distribution function in the 4D domain (r, Θ, x , U ), two parallelization schemes are involved: the one for parallelizing x with r, Θ, U sequential is used to calculate the advection due toṙ,Θ,U ; the other one for parallelizing r, Θ, U with x sequential is used to compute the advection due to U . The 3D domain (r, Θ, x ) with respect to the potential function and the density implements the scheme: the parallelization in x with r, Θ sequential is used to compute the characteristicsṙ,Θ,U which are needed in computing the advection of the distribution in r, Θ, U dimensions, and to solve QNE in the poloidal cross section.

The simulation results
In the cylindrical coordinates system, the initial distribution is of the structure in Eq. (14). The distribution function is specifically given as where the equilibrium function F eq is F eq (r, µ, U ) = and n, m, p are the respective mode numbers. The profile T i (r), T e (r) and n 0 (r) are given by: where P ∈ {T i , T e , n 0 }, C T i = C Te = 1 and For all the following simulation cases, the time step ∆t = 8 is chosen and the data are stored every two time steps. This domain is divided into a mesh of 64×64×32×32×63 cells. The simulation is carried out by implementing 128 processes, which are executed by 4 nodes on ATLAS HPC of IRMA using hyperthreading. Each node has 24 CPUs and carries on 32 processes.
The evolution curves of the polar modes of the potential at the radial node 30 are shown in Fig.(10). The radial spectrum evolution is shown in Fig.(11). A good match of the results computed by the two solvers is revealed by these figures. . The equilibrium profile of ion's density and temperature are given by Fig.(12).
The simulated domain is the same with that of the 1st case and is divided into a mesh of 64 × 128 × 32 × 32 × 63 cells. For this case, the implementation of the processes and the partition of these processes on the nodes of ATLAS are the same with that of the 1st case.
The potential profile on the polar cross section at time moment 16, 2096, 3680 computed by the two solvers are shown in Fig.(13). The ion temperature modes are driven by the equilibrium gradient. The evolution curves of the polar modes of the potential at the radial node 35 are presented in Fig.(14), which shows that the growth rate computed   ion's equilibrium density profile on the gyrocenter coordinate frame, while the green curve is on the particle coordinate frame transformed from the purple curve by the gyroaverage operation. Ion's equilibrium temperature profile is given by the right figure. by the interpolation algorithm is larger and the the saturation time is earlier. The radial spectrum evolution is shown in Fig.(15). For this case, we could obtain k r ρ ≈ π with ρ = 1. We consider the parameters: η = 10 −4 , k n 0 = 13.0, κ T i = κ Te = 66.0, δr Te = δr Te = δr n 0 = 0.1, L = 1506.759067, r p = 0.45(r min + r max ), δr = 4δrn 0 δr T i . The equilibrium profile of ion's density and temperature are given by Fig.(16).
The simulated domain is the same and is divided into a mesh of 128×64×32×32×63 cells. The implementation of the processes on the nodes of ATLAS doesn't change. The potential profile on the polar cross section at time moment 16, 3200, 5236 computed by the two solvers are shown in Fig.(17). The evolution curves of the polar modes of the potential at the radial node 35 are presented in Fig.(18). The radial spectrum evolution is given in Fig.(22). The parameters of this case are the same with those used in the 3rd case except that p = 0 is chosen. The results are plotted in Figs. (20) and (21). Compared with the simulation results of the 3rd case, it can be observed from both figures that the growth rate of the polar Fourier modes without the initial radial wavelength are much larger than that with the presence of the initial radial wavelength. Such a fact may contribute a new element to the turbulence induced transport theory, which states that the shear flow could reduce      Figure 19: These figures are for the 3rd case. The radial Fourier modes of the perturbative potential computed by the interpolation algorithm and the 2nd-order solver.
the radial coherent length of the turbulent vortex, mitigating the transport rate as a consequence. Our simulations in 3rd and 4th cases suggest that with the presence of the low toroidal mode number, the short radial wavelength causes much smaller growth rate of the poloidal modes.

Conclusion and Discussion
This paper points out that the traditional low-order approximation of DGT is not suitable concerning the short-scale perturbation and the steep equilibrium profile. To obtain the more precise mesh-grids value of the function which is under the first gyroaverage or double gyroaverage in the presence of these extreme cases, the interpolation algorithm is developed by uniformly layouting many points on the Larmor circle surrounding the grid point, the function value at which is obtained by utilising the cubic-spline piecewise basis function for the interpolation. Due to the periodic boundary condition of the cubic spline interpolation on the polar dimension, the interpolation coefficient matrix has the periodic block structure. Then, FFT can be implemented to speed up the process of obtaining the   inverse of the discretized coefficient matrix of QNE. Since the density of each µ j depends on a weight factor exp( −µB(x) Te(x) ) and all quantities as the functions of x experience a transform x → x − ρ(x, µ j , θ), the Gaussian-Laguerre quadrature integrating method can not be used and we developed an Euler-Maclaurin-based quadrature integrating method to obtain the quadrature integration for the distribution of µ.
The interpolation algorithm itself and the interpolation solver for the numerical computation of QNE are benchmarked under various boundary conditions based on various analytical solutions. It's found that numerical solutions computed by both cases match the analytical solutions very well. The interpolation solver is also applied to the gyrokinetic integrated simulation which comprises of the other Vlasov solver and the characteristic solver. The perturbations possessing various model numbers and the equilibrium profiles possessing various radial gradient are implemented in the integrated simulations. To observe the advantage of the interpolation algorithm, its numerical results are compared with those computed by the 2nd-order solver. When the equilibrium profile is not steep and the perturbation only has the nonzero mode number along the parallel spatial dimension, the results computed by the interpolation solver and the 2nd-order solver match each other well. When the gradient of the equilibrium profile is steep, the interpolation solver provides a bigger driving effect for the ion-temperature-gradient modes which possesses large polar mode numbers.

Acknowledgments
The first author appreciates the help of Dr. Sever Hirstoaga and Dr. Matthieu Boileau for the implementation of ATLAS HPC of IRMA.