High-Order CFD Solvers on Three-Dimensional Unstructured Meshes: Parallel Implementation of RKDG Method with WENO Limiter and Momentum Sources

: In aerospace engineering, high-order computational fluid dynamics (CFD) solvers suitable for three-dimensional unstructured meshes are less developed than expected. The Runge– Kutta discontinuous Galerkin (RKDG) finite element method with compact weighted essentially non-oscillatory (WENO) limiters is one of the candidates, which might give high-order solutions on unstructured meshes. In this article, we provide an efficient parallel implementation of this method for simulating inviscid compressible flows. The implemented solvers are tested on lower-dimensional model problems and real three-dimensional engineering problems. Results of lower-dimensional problems validate the correctness and accuracy of these solvers. The capability of capturing complex flow structures even on coarse meshes is shown in the results of three-dimensional applications. For solving problems containing rotary wings, an unsteady momentum source model is incorporated into the solvers. Such a finite element/momentum source hybrid method eliminates the reliance on advanced mesh techniques, which might provide an efficient tool for studying rotorcraft aerodynamics.


Introduction
Computational fluid dynamics (CFD) solvers, widely used in aerospace engineering, are based either on finite difference (FD) schemes or on finite volume (FV) schemes. However, FD schemes are restricted on structured meshes, while most FV schemes have only first-or second-order accuracy. One way to develop higher-order schemes suitable for unstructured meshes is to use the discontinuous Galerkin (DG) finite element (FE) methods. As with other FE methods, DG methods assume continuous approximation in each element (i.e., cell). However, they allow discontinuities to exist on intercell boundaries. Such discontinuities are then resolved by numerical fluxes, e.g., exact or approximate Riemann solvers, as with Godunov-type schemes [1]. The original DG scheme was not proposed for CFD but for solving the linear neutron transport equation [2]. For nonlinear conservation laws, Chavent and Salzano [3] designed an explicit Euler DG cheme, which is second-order accurate in space but only first-order accurate in time. To improve the accuracy of time discretization, Cockburn and Shu [4] replaced the first-order forward Euler method with high-order Runge-Kutta (RK) methods, which led to the family of Runge-Kutta discontinuous Galerkin (RKDG) schemes for scalar conservation laws [5,6], for one-dimensional conservation law systems [7], for multi-dimensional scalar conservation laws [8] and for multi-dimensional conservation law systems [9]. The last case includes the Euler system that depicts inviscid compressible flows, which is the first application of RKDG in CFD. We will give a concise derivation of this scheme in Sections 2.1 and 2.2.

The RKDG Method with WENO Limiters and Momentum Sources
Mathematically, the problems that we are going to solve in Section 3 are defined by the three-dimensional Euler system ρu y ρu x u y ρu y u y + p ρu z u y ρh 0 u y with certain boundary and initial conditions, which describes inviscid compressible flows. In Equation (1), • Variables in the first column are called conservative variables; • The density ρ, the velocity components u x , u y , u z and the pressure p are called primitive variables; • The thermodynamic quantities e 0 and h 0 are the specific total energy and the specific total enthalpy, respectively; for ideal gases, they can be written as functions of the five primitive variables: in which γ is the heat capacity ratio of the gas, and we use γ = 1.4 in this article; • The body force components b x , b y , b z are assumed to be 0s, except in Section 3.3.
The method that we use for solving the Euler system (1) is the RKDG method with a WENO limiter. A detailed formulation of this method for sourceless problems has been given in our previous work [19]. In this section, we give a more complete formulation, which takes source terms into consideration. If all source terms were canceled, it would reduce to the form in [19].

The DG Space Discretization
Equation (1) can be written as in which F = F x e x + F y e y + F z e z is the flux vector whose components F x , F y , F z , and U, Q are all 5 × 1 matrices, each row of which is a scalar function depending on position x and time t. By multiplying both sides with an arbitrary test function ψ( x) and integrating the products on the i-th element (i.e., cell) E i , and applying integral by parts and Gauss's divergence theorem, Equation (2) can be turned into a weak form: where ν is the outer normal unit vector of ∂E i (which is the boundary of E). By choosing an orthonormal basis of the linear space spanned by polynomials less than the p-th degree over E, denoted as φ( x) := φ 1 ( x) · · · φ L ( x) , the unknowns U and the arbitrary test function ψ can be approximated as in which eachÛ l (t) is a 5 × 1 matrix of temporal functions, and eachψ l is an arbitrary constant number. Substituting them into the weak form (Equation 2) gives where ν · F =: F ν is the normal flux at some point on ∂E i , whose value could be given by an exact or approximate Riemann solver of Equation (2); see [27] for details. Recall the arbitrariness of ψ l L l=1 and adopt the inner product notation and we could turn Equation (4) into a system of ordinary differential equations: in whichÛ is a 5 × L matrix of temporal functions (which will be solved in Section 2.2), and is a variable matrix depending onÛ. The integrals in Equation (6) are evaluated by Gaussian quadrature rules, such as those given by [28].

The RK Time Discretization
Equation (5) is a typical system of nonlinear ordinary differential equations. In this article, we use the following explicit Runge-Kutta methods: first-order:Û n+1 =Û n + R n ∆t; second-order:Û n+1/2 =Û n + R n ∆t, third-order:Û n+1/3 =Û n + R n ∆t, in which the R µ := R(Û µ )s are the values mapped by the nonlinear operator R from the correspondingÛ µ s.
It has been proven by Gottlieb and Shu that they are all total variation diminishing (TVD) [29] or strong stability preserving (SSP) [30] (which is a desired feature for hyperbolic problems), and fourth-order Runge-Kutta methods cannot be TVD or SSP without introducing the adjoint operator of R, denoted as R † , which satisfieŝ It is non-trivial to implement such an adjoint operator and thus we do not implement Runge-Kutta methods whose accuracy order is higher than three.

The Compact WENO Limiter
Following our previous work [19], we use the three-dimensional extension of the two-dimensional WENO limiters by Zhong [17] and Zhu [18]. For simplicity, we denote the index set of E i 's neighbors as K i . For each k ∈ K i , we first transform Equation (2) to the ν-split form on the interface shared by E i and its neighbor E k : in which the flux Jacobian can be approximated by the average value of U on E i : From the hyperbolicity of Equation (2), there always exists the eigenvalue decomposition in which . The original conservative variable U is then projected into the space spanned by the columns of R, which defines the characteristic variable: Each scalar component of V can now be treated as independent functions, to which any suitable scalar WENO limiter (such as the one in [17,18]) can be applied. Once obtaining the reconstructed characteristic variable V new , it is turned back to the original conservative variable: The subscript i|k emphasizes the fact that it is a function defined on E i with the help of E k . There is one such reconstructed U new i|k for each k ∈ K i , so the final step is to weight them by the volume of the corresponding adjacent element:

The Momentum Source Model
To model a fixed or rotary wing with a high aspect ratio, we use the actuator line model, which is a special case of the more general momentum source model. In this model, a wing is sliced into a series of thin pieces, each of which is perpendicular to the wing's axis. When moving relative to the air, a piece feels a pair of aerodynamic forces: in which • c is the chord length of the piece, and s is the arc length parameter of the axis; • α is the angle of attack, which is related to u and w (velocity components resolved in the airfoil's local frame) by tan α = w/u; • C L and C D are the lift and drag coefficients of the airfoil, respectively; • L and D are the lift and drag per unit length, respectively.
Using Newton's third Law, the force per unit length felt by the air surrounding the piece is in which e L and e D are unit vectors along and perpendicular to the direction of airflow relative to the airfoil. To avoid ambiguity, we denote the force per unit length as b L and use b V to denote the force per unit volume. If the intersection of a wing's axis PQ and a DG element E is the line segment RT, then the body integral of the source term in Equation (6) is actually a line integral: By parameterizing the line segment RT, the line integral in Equation (13) can be evaluated as a definite integral:ˆR to which the standard Gaussian quadrature can be applied.

Results of Model Problems and Engineering Problems
In this section, we give the results of various problems to show the accuracy and performance of the methods described in Section 2.  [27], their exact solutions can be obtained, which can be used for assessing the accuracy of our solvers. In our earlier work [19], we have given the results obtained by running our solvers on an unstructured hexahedral mesh. This time, we use an unstructured tetrahedral mesh instead. Problem 1 (Sod). Solve the Euler system Equation (1) for t ∈ [0.0, 1.0] with the initial condition This is a classical problem of inviscid compressible flows. It contains all three types of discontinuities: a shock wave and a contact discontinuity running towards the right and an expansion wave running towards the left. In Figure 1, we plot the density contour given by our third-order solver with an EigenWeno limiter. In Figure 2, we compare the density distributions given by various solvers along the longitudinal axis (on which y = 0.5 and z = 0.25) of the box. The accuracy of our solvers and the effect of p-refinement are clear in these figures.  It is predictable that both mesh refinement (decreasing h) and order increment (increasing p) can help to improve accuracy. To compare the performance of solvers with different orders more fairly, it is better to use finer meshes for running lower-order solvers. After a few trials, we find that the solutions given by the h-p pairs listed in Table 1 roughly have the same level of accuracy, as shown in Figure 3. It is clear, at least for Problem 1, that high-order solvers are better than low-order ones in the sense of obtaining the same level of accuracy with less time and space costs. Similar conclusions were drawn in our previous work [19], in which the p = 3 solution of a linear advection problem on an h ≈ 1/4 mesh defeated the p = 1 solution of the same problem on an h ≈ 1/32 mesh in accuracy but saved quite a lot of time. These encouraging results justify our efforts to implement higher-order solvers.  This is another problem that involves all three types of discontinuities. It is more difficult than the previous one in the sense that its solution contains values beyond the range of initial values and the discontinuities are much steeper. As before, we plot the density contour of our third-order solution in Figure 4 and compare the density distributions given by various solvers in Figure 5. Both figures demonstrate the effect of p-refinement and the ability of our high-order solvers to suppress numerical oscillations.
This problem is easier than the previous two in the sense that no strong discontinuities (shock or contact discontinuity) occur in the solution. However, there is a region of vacuum, which does not occur in the previous problems, generated between the left-and rightrunning expansion waves. The velocity u inside the vacuumed region (denoted by a * in subscript) is undefined, but its boundary values should be determined from the left (denoted by an L in subscript) and right (denoted by an R in subscript) initial values by the two Riemann invariants: See [27] for more detailed discussions. Unfortunately, approximate Riemann solvers usually (have to) neglect this condition. For this reason, we use an exact Riemann solver to obtain fluxes on cell boundaries. As before, we plot the results in Figures 6 and 7, which again validate the correctness and robustness of our solvers.

The Forward
Step Problem This is a classical two-dimensional problem [31], but we treat it as a three-dimensional one: To show the applicability of our solvers on structured meshes, we solve this problem on meshes such as the one in Figure 8. As a common practice, we plot the density contours at various time steps in Figures 9-19.            The flow starts from a uniform supersonic (M = 3.0) state. In Figures 9 and 10, a curved bow shock wave generates in front of the forward-facing step and a fan-shaped expansion wave generates at the corner (x = 0.6, y = 0.2). The bow shock then hits the top of the tunnel ( Figure 11) and reflects from it (Figures 12 and 13). At approximately t = 1.2 ( Figure 14), the reflected shock hits the bottom of the tunnel (the top of the step) and reflects from it. Both of the two reflections are regular at these moments. Shortly before t = 2.0 ( Figure 14), a Mach stem that characterizes a Mach reflection is generated from the first reflection point. A wavy contact discontinuity generating from the triple point is captured by Figures 16-19. This flow structure would be smeared off if we solved Problem 4 using lower-order solvers or coarser meshes.

Three-Dimensional Engineering Problems
In this section, we solve the Euler system Equation (1) on a three-dimensional unstructured mesh, which discretizes the space around a YF-17 aircraft by tetrahedral cells, as shown in Figure 20. We solve this problem via two of our finite element solvers and plot the streamlines released from the strake and the wing at t = 10 in Figures 21 and 22. It can be seen that the streamlines obtained from the high-order (p = 3) solution are smoother than those from the low-order (p = 1) solution, which shows the benefit of p-refinement. The difference in accuracy is more obvious in Figure 23, which clearly shows a more detailed flow structure in the third-order solution (left half) and the piecewise constantness of the first-order solution (right half). Both solutions are able to capture the vortex trailing from the wing tip, which is generated from the the pressure difference between the upper and lower surfaces of the wing. When the wing generates positive lift, the pressure on the lower wing surface is higher than that on the upper wing surface. Under the action of this pressure difference, the air under the wing rolls up around the tip and flows backward, and the tip vortex is thus formed. The vortex trailing from the strake is generated from a similar mechanism.

YF-17 in Supersonic Flight
Problem 6. Solve the Euler system Equation (1)  We solve this problem using the same solvers as for Problem 5. Density contours on the y = 0 surface obtained from the first-and the third-order solutions are given in Figures 24 and 25, respectively. It is obvious that shock waves are captured well (without spurious oscillations) by both of them. To show the difference more clearly, we compare the two solutions in Figures 26 and 27. As in the subsonic case, the third-order solution (which is piecewise quadratic) outperforms the first-order one (which is piecewise constant). Shock waves (red) are generated on surfaces facing the wind, since the relative speed of air is larger than the speed of sound. Expansion waves (blue) are generated on leeward surfaces, since the solid body contracts there, which leaves more room for the supersonic flow.

Parallel Efficiency
Since the mesh in Figure 20 is highly unstructured and non-uniform, simple geometric partitioning cannot achieve a relatively balanced distribution of computational effort. In our code, we use the METIS library [32] to partition the dual graph of the mesh, which gives the results in Figures 28 and 29. The fluctuation of cell numbers is under 2%, which is quite good since the optimal partitioning of an unstructured mesh is an NP-hard problem. With such an approximately optimal partitioning, the parallel efficiency, which is defined as in which P is the number of processes, could surpass 99% in theory. In practice, however, the E values given by our numerical experiments are only around 80%, as shown in Figure 30, which is derived from the measured time costs given in Table 2. The main reason for the gap between theory and practice is that in the derivation of the ideal E value, inter-process communications are assumed to overlap in time with inner-cell computations, which is an over-optimistic assumption. Nevertheless, the acceleration is still significant, which reduces the wall clock time to an acceptable level.

Problems with Momentum Sources
In this section, we solve two problems of rotorcraft aerodynamics using the momentum source model described in Section 2.4. The problems are set to simulate wind tunnel tests of a rotor (a pair of rotary wings), using the mesh shown in Figure 31. All of the boundaries of the outside box are solid walls, except he left and right ends, which are set as the inlet and outlet, respectively. The spherical region is the circumscribed sphere of the rotor, whose rotating axis could point in any direction. The smaller box enclosing the sphere is used for refining the mesh in the surrounding and downstream regions of the rotor.

A Climbing Rotor
Problem 7. Solve Equation (1) in the surroundings of a rotor, whose rotating axis points in the direction of (−1, 0, 0). The initial condition is given as a uniform flow: ρ u v w p t=0 = 1.29 1.0 0.0 0.0 101325.0 , which is also the background state at the two ends of the wind tunnel.
Third-and first-order solutions at the same moment (t = 1) are shown in Figures 32 and 33, respectively, in which density contours and velocity directions are plotted on selected slices. Large-scale flow structures, such as the contraction of airflow near the rotor disk and the rolled-up airwake, are captured in both figures. However, finer details, such as the ripples generated from the rotor disk and the strong vortex ring in the downstream, are only visible in the third-order solution. The nearly axisymmetric flow structure is caused by the periodic movement of the rotary wings. The density of air immediately below the rotor (red) is greater than that above the rotor (blue), which means that the rotor is compressing the air flowing across it. According to Newton's third law, the rotor must be experiencing a force in the opposite direction exerted by the air. This is where the rotor thrust comes from.   (1) in the surrounding of a rotor, whose rotating axis points in the direction of (0, 0, 1). The initial condition is given as a uniform flow: ρ u v w p t=0 = 1.29 10.0 0.0 0.0 101325.0 , which is also the background state at the two ends of the wind tunnel.

Problem 8. Solve Equation
As with the climbing problem, we plot the third-and first-order solution at the same moment (t = 1) in Figures 34 and 35, respectively. As before, both of them can capture large-scaled structures, such as the skewed and rolled-up airwakes. The rolled-up structure is more clear in Figure 36, in which streamlines are plotted and colored by the magnitude of velocity. The piecewise constant first-order solution is much vaguer and loses many details, which shows again the value of developing high-order solvers.

Discussion
In this article, we give a concise but complete formulation of an RKDG scheme with a compact WENO limiter for solving inviscid compressible flows, possibly with momentum sources, on three-dimensional unstructured meshes. The algorithms are implemented on top of public available libraries, which support unstructured mesh partitioning, interprocess communication and distributed memory parallelization. The correctness and accuracy of the solvers are validated by lower-dimensional reference problems. Results of real three-dimensional applications are also presented, in which complex flow structures are captured by the third-order solver even on very coarse unstructured meshes. Future works may include implementing higher-order solvers under this framework and testing them on larger meshes and on larger high-performance computing platforms.

Data Availability Statement:
The data presented in this article are all generated from the source code publicly available in our GitHub repository at https://github.com/pvc1989/miniCFD accessed on 6 July 2022 (or the mirror site https://gitee.com/pvc1989/miniCFD accessed on 6 July 2022), except the mesh used in Section 3.2, which was downloaded from https://cgns.github.io/CGNSFiles.html accessed on 13 May 2022.

Acknowledgments:
The authors would like to express their deepest appreciation to Shen Xin for his professional guidance on computer science, and to Zhou Yukai for his patient assistance with mathtyping.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations and Nomenclature
The following abbreviations are used in this manuscript: Jacobian of F ν with respect to U b L body force per unit length b V body force per unit volume b x , b y , b z x, y, z-component of body force per unit volume c chord length of a blade E i and ∂E i i-th element (cell) and its boundary e 0 specific total energy of the gas F flux vector whose components are matrices F ν projection of F on ν F x , F y , F z x, y, z-component of F h 0 specific total enthalpy of the gas p pressure of the gas Q column matrix of source terms R changing rate ofÛ R ν matrix whose j-th column is the j-th eigenvector of A ν U column matrix of conservative variables U h and U new approximation of U and its WENO reconstruction U new i|k WENO reconstruction of U h on E i with the help of E k U coefficient matrix whose k-th column isÛ k U k projection of U on φ k U n and R nÛ and R at the n-th time step u x , u y , u z x, y, z-component of velocity V column matrix of characteristic variables V h and V new approximation of V and its WENO reconstruction