A Hybrid Time Integration Scheme for the Discontinuous Galerkin Discretizations of Convection-Dominated Problems

: Discontinuous Galerkin (DG) method is a popular high-order accurate method for solving unsteady convection-dominated problems. After spatially discretizing the problem with the DG method, a time integration scheme is necessary for evolving the result. Owing to the stability-based restriction, the time step for an explicit scheme is limited by the smallest element size within the mesh, making the calculation inefﬁcient. In this paper, a hybrid scheme comprising a three-stage, third-order accurate, and strong stability preserving Runge–Kutta (SSP-RK3) scheme and the three-stage, third-order accurate, L-stable, and diagonally implicit Runge–Kutta (LDIRK3) scheme is proposed. By dealing with the coarse and the reﬁned elements with the explicit and implicit schemes, respectively, the time step for the hybrid scheme is free from the limitation of the smallest element size, making the simulation much more efﬁcient. Numerical tests and comparison studies were made to show the performance of the hybrid scheme.


Introduction
The convection-dominated problem plays a key role in fluid dynamics, mass and heat transfer, and many other fields. The discontinuous Galerkin (DG) method has become popular for numerically solving convection-dominated problems mainly due to the attractive features of the method. Firstly, a polynomial is used to locally approximate the solution to the governing equation, allowing the method to easily achieve high-order accuracy on unstructured meshes. Secondly, within the framework of the DG method, an element only communicates with its nearest face neighbors. The compact stencil greatly benefits parallel computation. Thirdly, upwinding is introduced in the DG method using numerical flux. The agreement between the numerical method and the convective nature of the governing equation helps to stabilize the calculation.
In 1973, the DG method was first proposed by Reed and Hill in [1]. During the 1980s and the 1990s, the method was fully developed by Cockburn, Shu, Lin, and Hou in a series of papers [2][3][4][5][6]. In those papers, a method-of-lines approach was used for solving the time-dependent conservation laws. That is, the DG method was used to spatially discretize the governing equation; then, a time integration scheme was used to evolve the result in the time domain. This was followed by many other studies. For a comprehensive review of the approach, the reader is referred to [7] (pp. 97-108) and [8] (pp. 171-188). The former discussed cases where the time integration scheme is explicit; the latter concentrated on the cases where the scheme is implicit.
Both the explicit and the implicit schemes have their pros and cons. In realistic applications, the explicit scheme is easy to implement and the computing cost per time step is small. However, the time (1) where u = u(t, x), f = ( f 1 (u), f 2 (u)) is the convective flux, u 0 (x) defines the initial condition, b(t,x) defines the boundary conditions, and Ω ⊂ R 2 is a bounded domain with a Lipschitz boundary ∂Ω. For a deeper understanding for the DG discretization, the reader is referred to, e.g., Chapter 4 in [8].
We start by introducing notations. After partitioning Ω into K non-overlapping elements, the gathering of the elements is denoted as T h = {Ω k }. For simplicity, we assume the mesh is conforming and shape-regular. We further assume all elements in T h are triangular. By F h , we denote the gathering of element faces in T h . We say Ω k1 "neighbors on" Ω k2 when ∂Ω k1 ∩ ∂Ω k2 ∈ F h . For ∀Ω k ∈ T h , ∂Ω k denotes the boundary of the element, n k denotes the unit outward normal vector on ∂Ω k , and h k denotes a characteristic size of the element. The solution space for the DG method is defined as: S h = w : w ∈ L 2 (Ω) ; w| Ω k ∈ P N (Ω k ) ; ∀Ω k ∈ T h , (2) where P N denotes the space of polynomials of degree ≤ N. We choose the Lagrange polynomials l k i (x), x ∈ Ω k , i = 1, 2, · · · , N p , defined on uniformly distributed nodes as the bases of P N (Ω k ). For a triangular element, N p = (N + 1)(N + 2)/2. In addition, l k i (x), i = 1, 2, · · · , N p , k = 1, 2, · · · , K, form the bases of S h . On ∂Ω k , with w − and w + denoting the inner and outer traces of function w, respectively, we denote the jump and mean of w as: We approximate the solution of Equation (1) with: where u k h denotes the local approximation on Ω k . By multiplying the governing equation with a test function l k i (x) and integrating by parts on Ω k , we obtain: where C k i and D k i are the DG discretizations for the convective item and the diffusive item, respectively [19]. The expression for C k i is: whereĤ is the inviscid numerical flux. A popular type is the local Lax-Friedriches (LLF) flux [7] (p. 170):Ĥ where λ is defined as: If a face of Ω k coincides with ∂Ω, then the u + on the face is defined using the boundary conditions. An introduction of various inviscid numerical fluxes can be found in [20].
Within the frame of the DG method, different approaches are possible for dealing with the diffusive item: the Bassi-Rebay approach [21], the local DG approach [22], the compact DG approach [23], and the direct DG approach [24]. In our work, we apply the non-symmetric interior penalty Galerkin (NIPG) approach [8] (pp. 37-43). Therefore, where the penalty weight σ is defined as: where h Γ denotes the length of element face Γ. Similar to the case for the convective item, the outer traces on ∂Ω are provided by the boundary conditions.
Formulating Equation (5) using all base functions of S h , we obtain a system of ordinary differential equations (ODEs): where: For ∀Ω k ∈ T h : R k denotes the right-hand sides of Equation (5) with i ranging from 1 to N p and F k = (M k ) −1 R k . In our applications, the spatial integrations are calculated using the Gauss quadrature rule. Specifically, the accuracy orders for the integrations over an element and an element face are 2N and 2N + 1, respectively, with N denoting the degree for the polynomial used for local approximation.

Basic Schemes
Multi-stage Runge-Kutta (RK) schemes are considered in this paper to integrate Equation (11) from time t n to t n+1 . An s-stage RK scheme has the form [25]: where ∆t = t n+1 − t n , U n = U(t n ) and the coefficients a ij , b j , and c j differ with specific RK schemes. The coefficients for SSP-RK3 and LDIRK3 are given by the following Butcher tableaus: and For SSP-RK3, Equation (15) characterizes a ij = 0, i ≤ j. That is, the calculation for the ith stage only requires the data of previous stages. This makes the explicit scheme easy to implement. However, its time step length is severely restricted by the CFL condition. When the DG solution for a d-dimensional time-dependent conservation law is sought from S h , the time step corresponding to SSP-RK3 has to satisfy [3,26]: where λ k is the biggest characteristic velocity on Ω k . A rather small ∆t is required when tiny elements exist in T h , making the simulation less efficient. For LDIRK3, the stability-based restriction on the time step is loose. Its available time step can be hundreds of times bigger than that for an explicit scheme [27]. However, due to the non-zero diagonal coefficients in Equation (16), nonlinear equations have to be solved during the time integration. Taking the first stage of Equation (16) as an example, we need to solve to get U (1) . This is usually achieved with the Newton-Raphson method, i.e., where the global Jacobian matrix is defined as: where I is the identity matrix. In solving a scalar problem with the DG method, there are (N p · K) 2 values in J. Both the construction of the matrix and the solution of the linear equations, e.g., Equation (19), are computing-expensive and time-consuming. When the nonlinear-solver-related calculation cannot be compensated by a big time step, the implicit scheme is inefficient.

Elements Grouping and Scheme Coupling
According to their size, the elements in T h are gathered into the explicit set T h,ex and the implicit set T h,im . Generally, all elements in T h,ex are coarse, whereas all fine elements are collected in T h,im . An index quantifying the element size variance is defined as [9]: Similar to [9], the ODEs in Equation (11) are divided into two groups: where U ex and U im correspond to T h,ex and T h,im , respectively. From t n to t n+1 , the ith stage for an s-stage RK scheme corresponds to t n + c i ∆t, where ∆t = t n+1 − t n and c i can be found in the left row of the Butcher tableau. That is, U (i) in Equation (14) can be explained as the numerical solution corresponding to t n + c i ∆t. The ARK scheme developed in [10] consists of an explicit RK scheme, denoted by ARK(ex), and an implicit RK scheme, denoted by ARK(im). Their Butcher tableaus characterize that c i,ARK(ex) = c i,ARK(im) , i = 1, 2, · · · , s. Therefore, the ith stage of ARK(ex) and that of ARK(im) correspond to the same time, which considerably simplifies the coupling between the two schemes. In [9], for the ith stage of ARK, U In Equations (15) and (16), the coefficients c i , i = 1, 2, 3, for SSP-RK3 and LDIRK3 are different. However, the two ends of the time step are universal for the two schemes. Therefore, our hybrid scheme consists of two phases: achieving U ex (t n+1 ) with SSP-RK3 first, then achieving U im (t n+1 ) with LDIRK3.
According to Equation (15), the SSP-RK3 scheme can be written as: A sketch of a one-dimensional (1D) mesh for the DG method is provided in Figure 1, where the elements around the interface between T h,ex and T h,im are marked by δ, θ, ζ and η. In the following descriptions, we let the element marker represent the local approximation on the element and omit the time in F, e.g., δ n+1 denotes U n+1 (δ) and F(δ (1) ) denotes F(t n , U (1) (δ)). The elements in T h,im that are necessary in computing δ n+1 can be identified using Equation (23). Figure 1. Elements grouping for a one-dimensional (1D) mesh. Theorem 1. Within time step [t n , t n+1 ], when SSP-RK3 is taken to achieve the local DG solution on a given element, the nearest three layers of elements are involved in the computation.
Proof. We restrict our discussion to one side of a 1D element for simplicity.
According to the last line in Equation (23), F(δ (3) ) contributes to δ n+1 . Therefore, δ n+1 requires θ (3) because F(δ (3) ) involves θ (3) . θ (3) requiring η (1) is already known. Therefore, δ n+1 requires η (1) . SSP-RK3 depends on the solutions of t n . According to Theorem 1, to achieve a reliable U ex (t n+1 ) with SSP-RK3, the t n -approximations on the outer two layers beyond T h,ex , together with the trace value of the outer third layer, must be reliable. In our applications, a region two layers wider than T h,ex is named T h,ex1 . All elements in T h,ex1 undergo the SSP-RK3 integration during the explicit phase of the hybrid scheme. Only the solutions corresponding to T h,ex are retained as reliable approximations for t n+1 .
The calculation of U im (t n+1 ) is straightforward. At the ith stage of LDIRK3, the local approximations of t n + c i,LDIRK3 ∆t are first achieved using SSP-RK3, providing the boundary conditions for the implicit integration on T h,im . Here, the region for the SSP-RK3 is denoted by T h,ex2 . In Figure 1, T h,ex2 is comprised of five elements centered at δ because only the trace value of U(t n + c i,LDIRK3 ∆t, δ) is useful for the implicit integration on T h,im .

Constructing the Jacobian Matrix
During the implicit phase of the hybrid scheme, a Newton-type nonlinear solver is necessary. As is shown in Equation (19), the Jacobian matrix is a key to the solver.
Here, we describe an analytic method of constructing the matrix J R ≡ ∂R/∂U, where R and U are defined in Equation (12). There are (N p × K) 2 values in J R . Being divided into K 2 squares, the (m, n)th square, denoted by J m,n R , equals ∂R m /∂U n . According to Equation (5), J m,n R is full of zeros except for the cases where m = n or Ω m neighbors on Ω n .
We first consider J m,m R . By taking the partial derivative of the right-hand sides of Equation (5) with respect to u m j , the (i, j)th value in the square is calculated by: For the LLF flux in Equation (7), we take: Next, we consider J m,n R where Ω m neighbors on Ω n . By taking the partial derivative of the right-hand sides of Equation (5) with respect to u n j , the (i, j)th value of the square is calculated as: where, for the LLF flux, Within J R , only the portions related to T h,im are useful to the hybrid scheme. We restrict ourselves to calculating the squares corresponding to T h,im and leave out the the squares related to T h,ex . This greatly reduces the required time and computing resources.

Nonlinear Solver
In our applications, an inexact Newton-Raphson method is used for solving the nonlinear equations. That is, within the time step [t n , t n+1 ], the Jacobian matrix is computed with the data corresponding to t n , then the matrix is reused in the remaining iterations as well as stages [25]. Without hardly affecting the convergence rate, the inexact method leads to a large reduction in computing expense.
For the purpose of solving the linear equations arising from the inexact Newton-Raphson method, a preconditioned generalized minimal residual (GMRES) method is adopted. The preconditioner is constructed using the incomplete lower-upper (ILU) decomposition method. To further accelerate the convergence of the GMRES method, the reverse Cuthill-Mckee (RCM) algorithm is taken to narrow the difference between the preconditioner and the coefficient matrix for the linear equations. A comprehensive view of the linear solver can be found in, e.g., [28].

Stability
Here we discuss the regions of absolute stability of the time integrations schemes mentioned above. For the purpose of deciding the region, we apply the scheme to the model ODE: where λ is a complex constant. For RK schemes, the numerical solution corresponding to t n+1 can be expressed as the product of that corresponding to t n and an increment function, i.e., where z ≡ λ∆t and ∆t = t n+1 − t n . A time integration scheme is stable when |R(z)| ≤ 1.
In Figure 2 we plot the absolute stability regions for the SSP-RK3 scheme and the LDIRK3 scheme. For the evolution of the ODEs, e.g., Equation (11), the complex constant λ in Equation (28) can be explained as the eigenvalue with the biggest norm of the system. It quantifies the stiffness of the system and is determined by the by the specific way of spatial discretization. In [9], the stiffness portion related to the element size is denoted as "geometry-induced stiffness". This can be expressed As is shown in Figure 2a, the stability region of SSP-RK3 is bounded by a closed curve. Its time step have to vary with λ, i.e., ∆t = ∆t(λ(h k )) = ∆t(h k ), ∀Ω k ∈ T h , to ensure that z ≡ λ∆t lying within the stability region. On the other hand, as is seen in Figure 2b, ∀z, |z| = ∞ belongs to the stability region of LDIRK3, making whose time step free from the affection of λ. In our hybrid scheme, the elements in T h,ex and T h,im are separately treated with the SSP-RK3 scheme and LDIRK3 scheme. Therefore, its time step is limited by the elements sizes in T h,ex , but free from the element size restriction of T h,im , i.e., ∆t = ∆t(h k ), ∀Ω k ∈ T h,ex . To summarize, the time step for our hybrid scheme is decided in such a way that it maintains the stability of SSP-RK3 on T h,ex . Therefore, we believe Figure 2a can be reused to illustrate the stability region of our hybrid scheme, on condition taking in mind that ∆t = ∆t(h k ), ∀Ω k ∈ T h,ex .

Numerical Tests
We execute numerical tests with unsteady problems in this section. All computations are performed on a laptop equipped with an Intel Core 2.6 GHz processor and 6GB memory. For each problem, we apply the hybrid scheme to evolve the DG discretization. In addition, we also integrate the DG discretization with the SSP-RK3 scheme and a four-stage, third-order accurate SSP-RK scheme [29], denoted by SSP-RK3-4, to compare the performances of the schemes in terms of efficiency. In Appendix A, we provide the Butcher tableau of the SSP-RK3-4 scheme and plot its region of absolute stability. Compared with Figure 2a, the stability region for the SSP-RK3-4 scheme is bigger. According to [29], the available CFL number for the SSP-RK3-4 scheme is twice that for the SSP-RK3 scheme.
For our hybrid scheme, the time step length is decided by: where d denotes the spatial dimension of the problem, h k takes the element length for 1D problems or the inscribed circle diameter of the element in 2D problems, λ k differs with the specific problem, and C ex and C im are adjustable parameters. We take C im /C ex ≈ S, with S defined in Equation (21). For the SSP-RK3 scheme, the time step is decided by: For the SSP-RK3-4 scheme, we set the time step two times as big as that for the SSP-RK3 scheme, i.e., For problems with analytic solutions, we calculate the L 2 errors of the numerical results. Given the numerical solution, u h , and the analytic solution, u analy , the L 2 error is defined as [30]:

1D Linear Advection Problem
The problem is governed by: and the boundary conditions are periodic. The analytic solution to the problem is q(t, x) = 1 + cos(2π(x − at)).
We numerically solve the problem with a = 1 on three meshes, denoted by "Coarse 1D", "Middle 1D", and "Fine 1D". During the mesh generation, the computational domain is divided into three regions: the left, the central and the right. The left and the right regions are symmetric with respect to the origin. The grids are uniformly distributed in the two regions. The range of the central region is kept four times the element size in the left/right region. In the central region, the grids are symmetrically distributed. The smallest element size in the central region is kept one-tenth of that in the left/right region, resulting in S = 10. In the central region, the ratio between the adjacent element sizes is kept less than 1.5. The biggest element sizes in Coarse 1D, Middle 1D, and Fine 1D are 0.1, 0.05, and 0.025, respectively.
Here are some remarks on the meshes. Generally the ratio between the adjacent element sizes should not exceed 1.2 in order to avoid big interpolation error. However, in the DG method, the high-order accuracy is achieved by increasing node numbers within the elements. There is no need to expand the stencil, as is done in the finite difference (FD) method or the finite volume (FV) method. In addition, as is shown in Equations (6) and (9), the DG solution is allowed to be discontinuous on element interfaces. The coupling between the local approximations is weak. Therefore, at least for the problems with smooth solutions, we believe the impact of the big element size change is endurable for the DG method. Last but not least, this kind of sudden jump in the size of element also appears in [9]. See Figure 6a therein for reference. Figure 3 shows the Coarse 1D mesh and its elements grouping. The roles of T h,im , T h,ex , T h,ex1 , and T h,ex2 during the temporal integration are same as in Figure 1. Since the time step is restricted by the elements that subjected to the explicit integration, i.e., the elements in T h,ex1 , the outer two layers of elements in T h,im are coarse. The elements groupings for Middle 1D and Fine 1D are similar to that for Coarse 1D. For each calculation, we take N = 2 for local approximations. We spatially discretize the problem according to Section 2. Since there is no diffusive item in the governing equation, we leave the diffusion-related items in the discretizations as zeros. For deciding the time steps, we take d = 1, C ex = 1, C im = 10 and λ k = a. Each calculation is run up to t = 1. Table 1 shows the L 2 error of the numerical result for each mesh and the convergence order of the errors. We found that the convergence order in space is N + 1, which is optional for the DG method in solving conservation laws [6]. According to [6], to guarantee the domination of the spatial error, the accurate order for the time integration scheme is at least N + 1. The optional convergence order in Table 1 indicates that the hybrid scheme maintains the high-order accuracy of SSP-RK3 or LDIRK3. It also proves that the big size changes in the mesh has little effect on the accuracy of the method. Table 2 compares the efficiency of the SSP-RK3 scheme (RK3), the SSP-RK3-4 scheme (RK3-4) and our hybrid scheme (hybrid). The second to the fourth columns show the average time step lengths for different schemes. The fifth to the last columns record the CPU time (in seconds) for the simulations. In accordance with Equations (32)-(34), the average time steps for the SSP-RK3-4 scheme and the hybrid scheme are twice and S times as that for the SSP-RK3 scheme, respectively. On the other hand, the CPU time indicate that the acceleration of the computation does not scale linearly with the amplification of the time step. For the SSP-RK3-4 scheme, this is owing to the computation related to the extra fourth stage. For our hybrid scheme, this may comes from the nonlinear-solver-related calculations, e.g., constructing the Jacobian matrix and solving linear systems. Nevertheless, based on the given meshes, it is clear that the hybrid scheme is superior towards the SSP-RK3 scheme or the SSP-RK3-4 scheme in terms of efficiency.

1D Linear Advection-Difussion Problem
The problem is governed by: The analytic solution to the problem is q (t, x) = exp −π 2 t sin (π (x − at)). The boundary conditions are applied using the analytic solution.
We numerically solve the problem with a = 1 and = 0.01. The meshes being used are same as in Section 4.1. N = 1 is taken for the local approximations. The spatial discretization for the problem is the same as in Section 2. To find the upper limit of C em , we first apply the SSP-RK3 scheme. The definition of λ k is given by: Through numerical tests, we find that the maximum of C ex for maintaining the calculation stability is 0.3. For defining the time step for the hybrid scheme, we take d = 1, C ex = 0.3, C im = 3. Each calculation is run up to t = 0.1. Table 3 shows the L 2 errors of the numerical results and their convergence order. According to [8] (p. 75), when the NIPG approach is selected for dealing with the diffusive item, the theoretical convergence order measured in L 2 -norm is N + 1 when N is odd. We found that the numerical results matched well with the theory. Table 4 compares the efficiency of different schemes. Similar to the results in Section 4.1, in every simulation, the hybrid scheme enjoyed the biggest available time step and consumed the least CPU time.

1D Euler System
The problem is governed by: where: where ρ is the density, u is the velocity, p is the pressure, and E = ρu 2 /2 + p/(γ − 1) is the total energy. Here γ = 1.4. We define the initial condition as: and apply periodic boundary conditions. The analytic solution to the problem is: The meshes being used are same as in Section 4.1. N = 2 is taken for local approximations. During the spatial discretization, the Vijayasundaram flux [31]: is chosen as the inviscid numerical flux. The treatment for each equation of the governing system is similar to that in Section 2. In our application, the ODEs pending to the temporal integration are: where: where Q e , M e , R e , F e , e = 1, 2, 3, correspond to the eth equation of the governing system. Their definitions are similar to that in Equation (12). When integrating Equation (44) with the hybrid scheme, it is necessary to construct the matrix J R,sys ≡ ∂R sys /∂Q sys . For the 1D Euler system, J R,sys consists of 3 × 3 blocks. The calculation for the (e 1 , e 2 )th block, i.e., ∂R e1 /∂Q e2 , is similar to that in Section 3.2.2. The matrices P − and P + are directly taken as the partial derivatives ofĤ Vija with respect to q − and q + , respectively. For defining time steps using Equations (32)-(34), we take d = 1, C em = 1 and C im = 10. The definition of λ k is: where c = γp/ρ is the speed of sound. Each calculation is run up to t = 0.5. Table 5 shows the L 2 errors of the densities and their convergence order. We found that the optimal (N + 1)th order is well attained. Table 6 compares the available time step lengths and the CPU time for different schemes. Again the hybrid scheme enjoyed the biggest time step and the least CPU time.

2D Euler System
The problem is governed by: where: In Equation (48), ρ is the density, u and v are the velocity components, p is the pressure, and E = ρ(u 2 + v 2 )/2 + p/(γ − 1) with γ = 1.4 is the total energy. We consider an isentropic vortex problem with the analytic solution [7] (p. 209): It is used for defining the initial condition and boundary conditions for numerical simulations.
We solve the problem on three meshes, denoted by "Coarse 2D", "Middle 2D", and "Fine 2D". In each mesh, the domain is divided into the surrounding region and the central region. The coarse elements in the surrounding region are made by diagonalizing the squares of same sizes. The central region is a square whose side length is kept two times the leg length for the coarse element. The elements in the central region are refined artificially. The smallest element length is kept one-eighth of that in the surrounding region, leading to S = 8. The ratio between the adjacent element sizes is kept less than 1.5 during the refinement. Since all elements within the mesh are are isosceles right triangles, the skewness (measured by the normalized equiangular method) for the mesh is constant 0.25 and the aspect ratio is 1.41. The biggest leg lengths in Coarse 2D, Middle 2D, and Fine 2D are 0.5, 0.25, and 0.125, respectively.
Here are some remarks concerning the quality of the 2D meshes. Similar to the 1D cases, we believe the impact of the radical size change is minimized by the independence of the local approximations in the DG method. In addition, the radical size jump in element size is common in relevant researches, see, e.g., Figure 8 in [9], Figure 5 in [11] and Figure 1c in [12], for reference. Figure 4 shows the Coarse 2D mesh and its elements grouping. The grouping strategy for the 2D elements is similar to the 1D case. The relationship between the color and the membership for an element is given in Table 7. The elements groupings for Middle 2D and Fine 2D are similar to that for Coarse 2D.

Group Color
T h,im red + yellow T h,ex green + blue T h,ex1 yellow + green +blue T h,ex2 yellow + green For each calculation, N = 2 is taken for local approximations. The spatial discretization for the problem is similar to that in Section 4.3. For defining time steps, we take d = 2, C ex = 1 and C im = 8. The definition of λ k is: where v = (u, v) and c = γp/ρ. Each calculation is run up to t = 3. Table 8 shows the L 2 errors of the densities and their order of convergence. Similar to the 1D cases, the optimal (N + 1)th order for the DG method is achieved. It indicates that the high-order accuracy of the SSP-RK3 scheme or the LDIRK3 scheme is maintained by the hybrid scheme, and the impact of the big size changes in the mesh is negligible. Table 9 compares the efficiency of different time integration schemes. The hybrid scheme performs even better in simulating 2D problems than in 1D problems. For each 2D simulation, the CPU time it took was merely one-third of that for the SSP-RK3 scheme or less than half of that for the SSP-RK3-4 scheme.

2D Navier-Stokes System
The problem is governed by: where q, f, and g are defined same to Equation (48) and: In Equations (52) and (53), u and v are the velocity components, µ is the viscosity coefficient, κ is the heat conductivity, and T is the temperature. To derive a dimensionless form of the governing system, we introduce a reference density ρ re f , a reference velocity V re f , and a reference length L re f . L re f /V re f defines a reference time, ρ re f V 2 re f defines a reference pressure, and V 2 re f /c v defines a reference temperature with c v denoting the specific heat at constant volume. Normalizing Equation (51) with the reference quantities and denoting the dimensionless quantities with the apostrophe, we obtain: where q , f , and g are defined as: and r 1 and r 2 are defined as: where In Equations (56) and (57), Re re f = ρ re f V re f L re f /µ is the reference Reynolds number and Pr = γc v µ/κ is the Prandtl number. We take γ = 1.4 and Pr = 0.71 for air. In our applications, ρ re f = ρ ∞ /γ, V re f = c ∞ and L re f takes the cylinder diameter. The speed of sound is calculated by c = γp/ρ and the subscript ∞ refers to the state of the free-stream. Figure 5a shows the mesh for the numerical simulation. It is generated in a way similar to that in Figure 14, [9] and Figure 10, [32]. The diameter for the outer boundary is 40 times as big as that for the cylinder. There are 3840 triangular elements in the mesh. They are achieved by diagonalizing 48 × 40 structured rectangular elements. The heights of the first five layers of elements adhering to the cylinder are 0.02, 0.03, 0.05, 0.12 and 0.12. In the first layer of elements, we achieve the maximum of the aspect ratio, 3.25, and the maximum of the skewness (measured by the normalized equiangular method), 0.72. The biggest size changes appear at the third and the fourth layers of elements, i.e., 0.12/0.05 = 2.4. Within the mesh, the maximum and minimum of h k are 0.0607 and 0.0168, respectively, resulting in S ≈ 4. Figure 5b presents the coloring of the elements. The coloring strategy is the same as in Table 8. In our computation, the state for the free-stream is defined by: leading to Ma ∞ = 0.3, with the Mach number calculated by Ma = √ u 2 + v 2 /c. Besides, the Reynolds number of the free-stream is set to Re ∞ = 150. To speed up the simulation, our computation consists of two periods. In the first period, we pursue the steady inviscid solution to the problem. That is, the flow is governed by the 2D Euler system, with b ∞ defining the initial condition. During the computation, characteristic boundary conditions [33] and inviscid wall conditions are applied to the outer boundary and the cylinder surface, respectively. In the second period, the governing system for the problem is replaced by Equation (54). The inviscid solution just obtained is used for initializing the flow. Characteristic boundary conditions and viscous wall conditions [8] (p. 484) are applied during the remaining computation.
In our computation, N = 3 is taken for local approximations. The spatial discretization for the convective items are the same as that in Section 4.4. The NIPG approach for treating the viscous items can be found in [8] (pp. 486-491). To find the upper limit of C em , we first integrate the DG discretization using the SSP-RK3 scheme, with the time step length defined by: where: where v = (u , v ), c = γp /ρ , and h k is the dimensionless element size. Through numerical tests, we found that the maximum of C ex for maintaining the calculation stability is 0.8. Next, the hybrid scheme with ∆t = 0.0025 is used for the time integration. The fixed time step length is close to one obtained by Equation (32) with d = 2, C em = 0.8 and C im = 4.
According to the direct numerical simulations (DNS) results in [34] and the experimental results referred to therein, the flow is unsteady with Ma ∞ = 0.3 and Re ∞ = 150. Over time, vortices periodically shed from the cylinder, causing fluctuations in the physical quantities in the wake of the cylinder. Figure 6 shows the numerical results at t = 408. The fluctuations in the fields and the bending of the streamlines indicate that the shedding of vortices has been set up well. During the calculation, the lift coefficient C L of the cylinder is recorded every 50 time steps. The time history for C L from t = 350 to t = 408 is shown in Figure 7. It can be found that the fluctuation period of C L is T shed = 18. Table 10 compares our numerical results with that in reference [34]. The matches between the two sets of results illustrate the usefulness of the hybrid scheme.     Table 11 presents the available dimensionless time step length and the CPU time (in seconds) cost per time step for each scheme. After dividing the CPU time with the time step length for each scheme, we got the required CPU time for the simulation corresponding to unit dimensionless time, i.e., 2985 s for the SSP-RK3 scheme, 1923 s for the SSP-RK3-4 scheme and 1480s for our hybrid scheme. It is clear that the hybrid scheme performs best in terms of efficiency.

Conclusions
In this work, we developed a hybrid scheme for the temporal integration of the DG discretization for the unsteady convection-dominated problem. The hybrid scheme consists of two popular high-order schemes: the SSP-RK3 scheme and the LDIRK3 scheme. By separately treating the coarse elements and the refined elements with an explicit scheme and an implicit scheme, a time step bigger than that for a purely explicit scheme can be reached at a comparatively low cost. Using the independence of the local approximation for the DG method, we proposed a strategy of grouping the elements that is necessary for maintaining the stability and reducing the computing cost of the hybrid scheme. We tested our scheme by numerically solving several threshold problems. The matches between the numerical solutions and the reference results proved the usefulness of the hybrid scheme. The comparison with the traditional explicit schemes testified the effectiveness of the hybrid scheme on the meshes with certain level of geometry-induced stiffness.
Author Contributions: Conceptualization, methodology, software, validation, formal analysis, investigation, data curation, writing-original draft preparation, writing-review and editing, visualization: L.L.; resources, supervision, funding acquisition: S.W. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, grant number 91530325.

Acknowledgments:
The first author is sincerely thankful to Fujun Liu from the School of Mathematics and System Science, Beihang University for the advice on the setting for the last numerical test. The valuable suggestions given by the anonymous reviewers are deeply appreciated by the authors.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
A new class of SSP-RK schemes was developed in reference [29]. By allowing the stage numbers bigger than the order numbers, the available CFL numbers for the new SSP-RK schemes are larger than those for the traditional SSP-RK schemes. A four-stage, third-order accurate scheme of this class is denoted by SSP-RK3-4. Its Butcher tableau is given as: (A1) Similar to that in Section 3.4, we deduce the increment function for the scheme, i.e., and plot the stability region in Figure A1. For more information of the SSP-RK3-4 scheme, the reader is referred to reference [29].