Next Article in Journal
Relations between the Complex Neutrosophic Sets with Their Applications in Decision Making
Next Article in Special Issue
The Generalized Schur Algorithm and Some Applications
Previous Article in Journal
Umbral Methods and Harmonic Numbers
Previous Article in Special Issue
On a Class of Hermite-Obreshkov One-Step Methods with Continuous Spline Extension
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Efficient Implementation of ADER Discontinuous Galerkin Schemes for a Scalable Hyperbolic PDE Engine

1
Department of Civil, Environmental and Mechanical Engineering, University of Trento, I-38123 Trento, Italy
2
Department of Informatics, Technical University of Munich, D-85748 Munich, Germany
3
Department of Computer Science, University of Durham, Durham DH1 3LE, UK
*
Author to whom correspondence should be addressed.
Axioms 2018, 7(3), 63; https://doi.org/10.3390/axioms7030063
Submission received: 21 May 2018 / Revised: 21 August 2018 / Accepted: 22 August 2018 / Published: 1 September 2018
(This article belongs to the Special Issue Advanced Numerical Methods in Applied Sciences)

Abstract

:
In this paper we discuss a new and very efficient implementation of high order accurate arbitrary high order schemes using derivatives discontinuous Galerkin (ADER-DG) finite element schemes on modern massively parallel supercomputers. The numerical methods apply to a very broad class of nonlinear systems of hyperbolic partial differential equations. ADER-DG schemes are by construction communication-avoiding and cache-blocking, and are furthermore very well-suited for vectorization, and so they appear to be a good candidate for the future generation of exascale supercomputers. We introduce the numerical algorithm and show some applications to a set of hyperbolic equations with increasing levels of complexity, ranging from the compressible Euler equations over the equations of linear elasticity and the unified Godunov-Peshkov-Romenski (GPR) model of continuum mechanics to general relativistic magnetohydrodynamics (GRMHD) and the Einstein field equations of general relativity. We present strong scaling results of the new ADER-DG schemes up to 180,000 CPU cores. To our knowledge, these are the largest runs ever carried out with high order ADER-DG schemes for nonlinear hyperbolic PDE systems. We also provide a detailed performance comparison with traditional Runge-Kutta DG schemes.

1. Introduction

Hyperbolic partial differential equations are omnipresent in the mathematical description of time-dependent processes in fluid and solid mechanics, in engineering and geophysics, as well as in plasma physics, and even in general relativity. Among the most widespread applications nowadays are (i) computational fluid mechanics in mechanical and aerospace engineering, in particular compressible gas dynamics at high Mach numbers; (ii) geophysical and environmental free surface flows in oceans, rivers and lakes, describing natural hazards such as tsunami wave propagation, landslides, storm surges and floods; (iii) seismic, acoustic and electromagnetic wave propagation processes in the time domain are described by systems of hyperbolic partial differential equations, namely the equations of linear elasticity, the acoustic wave equation and the well-known Maxwell equations; (iv) high energy density plasma flows in nuclear fusion reactors as well as astrophysical plasma flows in the solar system and the universe, using either the Newtonian limit or the complete equations in full general relativity; (v) the Einstein field equations of general relativity, which govern the dynamics of the spacetime around black holes and neutron stars, can be written under the form of a nonlinear system of hyperbolic partial differential equations.
The main challenge of nonlinear hyperbolic PDE arises from the fact that they can contain at the same time smooth solutions (like sound waves) as well as small scale structures (e.g., turbulent vortices), but they can also develop discontinuous solutions (shock waves) after finite time, even when starting from perfectly smooth initial data. These discontinuities were first discovered by Bernhard Riemann in his ground breaking work on the propagation of waves of finite amplitude in air [1,2], where the term finite should actually be understood in the sense of large, rather than simple sound waves of infinitesimal strength that have been considered in the times before Riemann. In the abstract of his work, Riemann stated that his discovery of the shock waves might probably not be of practical use for applied and experimental science, but should be mainly understood as a contribution to the theory of nonlinear partial differential equations. Several decades later, shock waves were also observed experimentally, thus confirming the new and groundbreaking mathematical concept of Riemann.
The connection between symmetries and conservation laws were established in the work of Emmy Noether [3] at the beginning of the 20th century, while the first methods for the numerical solution of hyperbolic conservation laws go back to famous mathematicians such as Courant and Friedrichs and co-workers [4,5,6,7]. The connection between hyperbolic conservation laws, symmetric hyperbolic systems in the sense of Friedrichs [8] and thermodynamics was established for the first time by Godunov in 1961 [9], and was rediscovered again by Friedrichs and Lax in 1971 [10]. Within this theoretical framework of symmetric hyperbolic and thermodynamically compatible (SHTC) systems, established by Godunov and Romenski [11,12], it is possible to write down the Euler equations of compressible gas dynamics, the magnetohydrodynamics (MHD) equations [13], the equations of nonlinear elasticity [14], as well as a rather wide class of nonlinear hyperbolic conservation laws [15] with very interesting mathematical properties and structure. Very recently, even a novel and unified formulation of continuum physics, including solid and fluid mechanics only as two particular cases of a more general model, have been cast into the form of a single SHTC system [16,17,18,19]. In the 1940ies and 1950ies, major steps forward in numerical methods for hyperbolic PDE have been made in the ground-breaking contributions of von Neumann and Richtmyer [20] and Godunov [21]. While the former introduce an artificial viscosity to stabilize the numerical scheme in the presence of discontinuities, the latter constructs his scheme starting from the most elementary problem in hyperbolic conservation laws for which an exact solution is still available, the so-called Riemann problem. The Riemann problem consists in a particular Cauchy problem where the initial data consist of two piecewise constant states, separated by a discontinuity. In the absence of source terms, its solution is self-similar. While provably robust, these schemes are only first order accurate in space and time and thus only applicable to flows with shock waves, but not to those also involving smooth sound waves and turbulent small scale flow structures. In his paper [21], Godunov has also proven that any linear numerical scheme that is required to be monotone can be at most of order one, which is the well-known Godunov barrier theorem. The main goal in the past decades was to find ways how to circumvent it, since it only applies to linear schemes. The first successful nonlinear monotone and higher order accurate schemes were the method of Kolgan [22] and the schemes of van Leer [23,24]. Subsequently, many other higher order nonlinear schemes have been proposed, such as the ENO [25] and WENO schemes [26], and there is a rapidly growing literature on the subject. In this paper we mainly focus on a rather recent family of schemes, which is of the discontinuous finite element type, namely the so-called discontinuous Galerkin (DG) finite element methods, which were systematically introduced for hyperbolic conservation laws in a well-known series of papers by Cockburn and Shu and collaborators [27,28,29,30,31]. For a review on high order DG methods and WENO schemes, the reader is referred to References [32,33]. In this paper we use a particular variant of the DG scheme that is called the ADER discontinuous Galerkin scheme [34,35,36,37,38], where ADER stands for arbitrary high order schemes using derivatives, first developed by Toro et al. in the context of high order finite volume schemes [39,40,41,42]. In comparison to traditional semi-discrete DG schemes, which mainly use Runge-Kutta time integration, ADER-DG methods are fully-discrete and are based on a predictor-corrector approach that allows the achievement of a naturally cache-blocking and communication-avoiding scheme, which reduces the amount of necessary MPI communications to a minimum. These properties make the method well suitable for high performance computing (HPC).

2. High Order ADER Discontinuous Galerkin Finite Element Schemes

In this paper we consider hyperbolic PDEs with non-conservative products and algebraic source terms of the form (see also References [34,35])
Q t + · F Q + B ( Q ) · Q = S ( Q ) ,
where t R 0 + is the time, x Ω R d is the spatial position vector in d space dimensions, Q Ω Q R m is the state vector, F ( Q ) is the nonlinear flux tensor, B ( Q ) · Q is a non-conservative product and S ( Q ) is a purely algebraic source term. Introducing the system matrix A ( Q ) = F / Q + B ( Q ) the above system can also be written in quasi-linear form as
Q t + A ( Q ) · Q = S ( Q ) .
The system is hyperbolic if for all n 0 and for all Q Ω Q , the matrix A ( Q ) · n has m real eigenvalues and a full set of m linearly independent right eigenvectors. The system in Equation (1) is provided with an initial condition Q ( x , 0 ) = Q 0 ( x ) and appropriate boundary conditions on Ω . In some parts of the paper we will also make use of the vector of primitive (physical) variables denoted by V = V ( Q ) . For very complex PDE systems, such as the general-relativistic MHD equations, it may be much easier to express the flux tensor F in terms of V rather than in terms of Q , however the evaluation of V = V ( Q ) can become very complicated.

2.1. Unlimited ADER-DG Scheme and Riemann Solvers

We cover the computational domain Ω with a set of non-overlapping Cartesian control volumes in space Ω i = [ x i 1 2 Δ x i , x i + 1 2 Δ x i ] × [ y i 1 2 Δ y i , y i + 1 2 Δ y i ] × [ z i 1 2 Δ z i , z i + 1 2 Δ z i ] . Here, x i = ( x i , y i , z i ) denotes the barycenter of cell Ω i and Δ x i = ( Δ x i , Δ y i , Δ z i ) is the mesh spacing associated with Ω i in each spatial dimension. The domain Ω = Ω i is the union of all spatial control volumes. A key ingredient of the ExaHyPE engine http://exahype.eu is a cell-by-cell adaptive mesh refinement (AMR), which is built upon the space-tree implementation of Peano [43,44]. For further details about cell-by-cell AMR, see Reference [45]. High order finite volume and finite difference schemes for AMR can be found, e.g., in References [46,47,48,49,50,51,52]. For high order AMR with better than second order accurate finite volume and DG schemes in combination with time-accurate local time stepping (LTS), the reader is referred to References [37,53,54,55,56,57]. Since the main focus of this paper is not on AMR, at this point we can only give a very brief summary of existing AMR methods and codes for hyperbolic PDE, without pretending to be complete. The starting point of adaptive mesh refinement for hyperbolic conservation laws was of course the pioneering work of Berger et al. [58,59,60], who were the first to introduce a patched-based block-structured AMR method. Further developments are reported in References [61,62,63] based on the second order accurate wave-propagation algorithm of LeVeque. We also would like to draw the attention of the reader to the works of Quirk [64], Coirier and Powell [65] and Deiterding et al. [66,67]. For computational astrophysics, relevant AMR techniques have been documented, e.g., in References [68,69,70,71,72,73,74,75,76], including the RAMSES, PLUTO, NIRVANA, AMRVAC and BHAC codes. For a recent and more complete survey of high level AMR codes, the reader is referred to the review paper Reference [77].
In the following, the discrete solution of the PDE system in Equation (1) is denoted by u h and is defined in terms of tensor products of piecewise polynomials of degree N in each spatial direction. The discrete solution space is denoted by U h in the following. Since we adopt a discontinuous Galerkin (DG) finite element method, the numerical solution u h is allowed to jump across element interfaces, as in the context of finite volume schemes. Within each spatial control volume Ω i the discrete solution u h restricted to that control volume is written at time t n in terms of some nodal spatial basis functions Φ l ( x ) and some unknown degrees of freedom u ^ i , l n :
u h ( x , t n ) Ω i = l u ^ i , l Φ l ( x ) : = u ^ i , l n Φ l ( x ) ,
where l = ( l 1 , l 2 , l 3 ) is a multi-index and the spatial basis functions Φ l ( x ) = φ l 1 ( ξ ) φ l 2 ( η ) φ l 3 ( ζ ) are generated via tensor products of one-dimensional nodal basis functions φ k ( ξ ) on the reference interval [ 0 , 1 ] . The transformation from physical coordinates x Ω i to reference coordinates ξ = ξ , η , ζ [ 0 , 1 ] d is given by the linear mapping x = x i 1 2 Δ x i + ( ξ Δ x i , η Δ y i , ζ Δ z i ) T . For the one-dimensional basis functions φ k ( ξ ) we use the Lagrange interpolation polynomials passing through the Gauss-Legendre quadrature nodes ξ j of an N + 1 point Gauss quadrature formula. Therefore, the nodal basis functions satisfy the interpolation property φ k ( ξ j ) = δ k j , where δ k j is the usual Kronecker symbol, and the resulting basis is orthogonal. Furthermore, due to this particular choice of a nodal tensor-product basis, the entire scheme can be written in a dimension-by-dimension fashion, where all integral operators can be decomposed into a sequence of one-dimensional operators acting only on the N + 1 degrees of freedom in the respective dimension. For details on multi-dimensional quadrature, see the well-known book of Stroud [78].
In order to derive the ADER-DG method, we first multiply the governing PDE system in Equation (1) with a test function Φ k U h and integrate over the space-time control volume Ω i × [ t n ; t n + 1 ] . This leads to
t n t n + 1 Ω i Φ k Q t d x d t + t n t n + 1 Ω i Φ k · F ( Q ) + B ( Q ) · Q d x d t = t n t n + 1 Ω i Φ k S ( Q ) d x d t ,
with d x = d x d y d z . As already mentioned before, the discrete solution is allowed to jump across element interfaces, which means that the resulting jump terms have to be taken properly into account. In our scheme this is achieved via numerical flux functions (approximate Riemann solvers) and via the path-conservative approach that was developed by Castro and Parés in the finite volume context [79,80]. It has later been also extended to the discontinuous Galerkin finite element framework in References [35,81,82]. In classical Runge-Kutta DG schemes, only a weak form in space of the PDE is obtained, while time is still kept continuous, thus reducing the problem to a nonlinear system of ODE, which is subsequently integrated with standard ODE solvers in time. However, this requires MPI communication in each Runge-Kutta stage. Furthermore, each Runge-Kutta stage requires accesses to the entire discrete solution in memory. In the ADER-DG framework, a completely different paradigm is used. Here, higher order in time is achieved with the use of an element-local space-time predictor, denoted by q h ( x , t ) in the following, and which will be discussed in more detail later. Using Equation (3), integrating the first term by parts in time and integrating the flux divergence term by parts in space, taking into account the jumps between elements and making use of this local space-time predictor solution q h instead of Q , the weak formulation of Equation (4) can be rewritten as
Ω i Φ k Φ l d x u ^ i , l n + 1 u ^ i , l n + t n t n + 1 Ω i Φ k D q h , q h + · n d S d t t n t n + 1 Ω i Φ k · F ( q h ) d x d t + + t n t n + 1 Ω i Φ k B ( q h ) · q h d x d t = t n t n + 1 Ω i Φ k S ( q h ) d x d t ,
where the first integral leads to the element mass matrix, which is diagonal since our basis is orthogonal. The boundary integral contains the approximate Riemann solver and accounts for the jumps across element interfaces, also in the presence of non-conservative products. The third and fourth integral account for the smooth part of the flux and the non-conservative product, while the right hand side takes into account the presence of the algebraic source term. According to the framework of path-conservative schemes [35,79,80,82], the jump terms are defined via a path-integral in phase space between the boundary extrapolated states at the left q h and at the right q h + of the interface as follows:
D q h , q h + · n = 1 2 F ( q h + ) + F ( q h ) · n + 1 2 0 1 B ( ψ ) · n d s Θ q h + q h ,
with B · n = B 1 n 1 + B 2 n 2 + B 3 n 3 . Throughout this paper, we use the simple straight-line segment path
ψ = ψ ( q h , q h + , s ) = q h + s q h + q h , 0 s 1 .
In order to achieve exactly well-balanced schemes for certain classes of hyperbolic equations with non-conservative products and source terms, the segment path is not sufficient and a more elaborate choice of the path becomes necessary, see e.g., References [83,84,85,86]. In Equation (6) above the symbol Θ > 0 denotes an appropriate numerical dissipation matrix. Following References [35,87,88], the path integral that appears in Equation (6) can be simply evaluated via some sufficiently accurate numerical quadrature formulae. We typically use a three-point Gauss-Legendre rule in order to approximate the path-integral. For a simple path-conservative Rusanov-type method [35,89], the numerical dissipation matrix reads
Θ Rus = s max I , with s max = max λ ( q h ) , λ ( q h + ) ,
where I denotes the identity matrix and s max is the maximum wave speed (eigenvalue λ of matrix A · n ) at the element interface. In order to reduce numerical dissipation, one can use better Riemann solvers, such as the Osher-type schemes proposed in References [88,90], or the recent extension of the original HLLEM method of Einfeldt and Munz [91] to general conservative and non-conservative hyperbolic systems recently put forward in Reference [92]. The choice of the approximate Riemann solver and therefore of the viscosity matrix Θ completes the numerical scheme in Equation (5). In the next subsection, we shortly discuss the computation of the element-local space-time predictor q h , which is a key ingredient of our high order accurate and communication-avoiding ADER-DG schemes.

2.2. Space-Time Predictor and Suitable Initial Guess

As already mentioned previously, the element-local space–time predictor is an important key feature of ADER-DG schemes and is briefly discussed in this section. The computation of the predictor solution q h ( x , t ) is based on a weak formulation of the governing PDE system in space–time and was first introduced in References [34,93,94]. Starting from the known solution u h ( x , t n ) at time t n and following the terminology of Harten et al. [95], we solve a so-called Cauchy problem in the small, i.e., without considering the interaction with the neighbor elements. In the ENO scheme of Harten et al. [95] and in the original ADER approach of Toro and Titarev [40,41,42] the strong differential form of the PDE was used, together with a combination of Taylor series expansions and the so-called Cauchy-Kovalewskaya procedure. The latter is very cumbersome, or becomes even unfeasible for very complicated nonlinear hyperbolic PDE systems, since it requires a lot of analytic manipulations of the governing PDE system in order to replace time derivatives with known space derivatives at time t n . This is achieved by successively differentiating the governing PDE system with respect to space and time and inserting the resulting terms into the Taylor series. For an explicit example of the Cauchy–Kovalewskaya procedure applied to the three-dimensional Euler equations of compressible gas dynamics and the MHD equations, see References [96,97]. Instead, the local space–time discontinuous Galerkin predictor, introduced in References [34,93,94], requires only pointwise evaluations of the fluxes, source terms and non-conservative products, for element Ω i the predictor solution q h is now expanded in terms of a local space–time basis
q h ( x , t ) Ω i s t = l θ l ( x , t ) q ^ l i : = θ l ( x , t ) q ^ l i ,
with the multi-index l = ( l 0 , l 1 , l 2 , l 3 ) and where the space–time basis functions θ l ( x , t ) = φ l 0 ( τ ) φ l 1 ( ξ ) φ l 2 ( η ) φ l 3 ( ζ ) are again generated from the same one-dimensional nodal basis functions φ k ( ξ ) as before, i.e., the Lagrange interpolation polynomials of degree N passing through N + 1 Gauss-Legendre quadrature nodes. The spatial mapping x = x ( ξ ) is also the same as before and the coordinate time is mapped to the reference time τ [ 0 , 1 ] via t = t n + τ Δ t . Multiplication of the PDE system in Equation (1) with a test function θ k and integration over the space–time control volume Ω i s t = Ω i × [ t n , t n + 1 ] yields the following weak form of the governing PDE, which is different from Equation (4), because now the test and basis functions are both time dependent:
t n t n + 1 Ω i θ k ( x , t ) q h t d x d t + t n t n + 1 Ω i θ k ( x , t ) · F ( Q ) + B ( q h ) · q h d x d t = t n t n + 1 Ω i θ k ( x , t ) S ( q h ) d x d t .
Since we are only interested in an element local predictor solution, i.e., without considering interactions with the neighbor elements we do not yet take into account the jumps in q h across the element interfaces, because this will be done in the final corrector step of the ADER-DG scheme in Equation (5). Instead, we introduce the known discrete solution u h ( x , t n ) at time t n . For this purpose, the first term is integrated by parts in time. This leads to
Ω i θ k ( x , t n + 1 ) q h ( x , t n + 1 ) d x t n t n + 1 Ω i t θ k ( x , t ) q h ( x , t ) d x d t Ω i θ k ( x , t n ) u h ( x , t n ) d x = t n t n + 1 Ω i θ k ( x , t ) · F ( q h ) d x d t + t n t n + 1 Ω i θ k ( x , t ) S ( q h ) B ( q h ) · q h d x d t .
Using the local space–time ansatz (9), Equation (11) becomes an element-local nonlinear system for the unknown degrees of freedom q ^ i , l of the space–time polynomials q h . The solution of Equation (11) can be found via a simple and fast converging fixed point iteration (a discrete Picard iteration) as detailed e.g., in References [34,98]. For linear homogeneous systems, the discrete Picard iteration converges in a finite number of at most N + 1 steps, since the involved iteration matrix is nilpotent, see Reference [99].
However, we emphasize that the choice of an appropriate initial guess q h 0 ( x , t ) for q h ( x , t ) is of fundamental importance to obtain a faster convergence and thus a computationally more efficient scheme. For this purpose, one can either use an extrapolation of q h from the previous time interval [ t n 1 , t n ] , as suggested e.g., in Reference [100], or one can employ a second-order accurate MUSCL-Hancock-type approach, as proposed in Reference [98], which is based on discrete derivatives computed at time t n . The initial guess is most conveniently written in terms of a Taylor series expansion of the solution in time, where then suitable approximations of the time derivatives are computed. In the following we introduce the operator
𝓛 ( u h ( x , t n ) ) = S ( u h ( x , t n ) ) · F ( u h ( x , t n ) ) B u h ( x , t n ) · u h ( x , t n ) ,
which is an approximation of the time derivative of the solution. The second-order accurate MUSCL-type initial guess [98] then reads
q h 0 ( x , t ) = u h ( x , t n ) + t t n 𝓛 ( u h ( x , t n ) ) ,
while a third-order accurate initial guess for q h ( x , t ) is given by
q h 0 ( x , t ) = u h ( x , t n ) + t t n k 1 + 1 2 t t n 2 k 2 k 1 Δ t .
Here, we have used the abbreviations k 1 : = 𝓛 u h ( x , t n ) and k 2 : = 𝓛 u h ( x , t n ) + Δ t k 1 . For an initial guess of even higher order of accuracy, it is possible to use the so-called continuous extension Runge-Kutta (CERK) schemes of Owren and Zennaro [101]; see also Reference [102] for the use of CERK time integrators in the context of high-order discontinuous Galerkin finite element methods. If an initial guess with polynomial degree N 1 in time is chosen, it is sufficient to use one single Picard iteration to solve Equation (11) to the desired accuracy.
At this point, we make some comments about a suitable data-layout for high order ADER-DG schemes. In order to compute the discrete derivative operators needed in the predictor Equation (11), especially for the computation of the discrete gradient q h , it is very convenient to use an array-of-struct (AoS) data structure. In this way, the first or fastest-running unit-stride index is the one associated with the m quantities contained in the vector Q , while the other indices are associated with the space–time degrees of freedom, i.e., we arrange the data contained in the set of degrees of freedom q ^ l i as q ^ v , l 1 , l 2 , l 3 , l 0 i , with 1 v m and 1 l k N + 1 . The discrete derivatives in space and time direction can then be simply computed by the multiplication of a subset of the degrees of freedom with the transpose of a small ( N + 1 ) × ( N + 1 ) matrix D k l from the right, which reads
D k l = 1 h 0 1 ϕ k ( ξ ) ϕ m ( ξ ) d ξ 1 0 1 ϕ m ( ξ ) ϕ l ( ξ ) ξ d ξ ,
where h is the respective spatial or temporal step size in the corresponding coordinate direction, i.e., either Δ x i , Δ y i , Δ z i or Δ t . For this purpose, the optimized library for small matrix multiplications libxsmm can be employed on Intel machines, see References [103,104,105] for more details. However, the AoS data layout is not convenient for vectorization of the PDE evaluation in ADER-DG schemes, since vectorization of the fluxes, source terms and non-conservative products should preferably be done over the integration points l. For this purpose, we convert the AoS data layout on the fly into a struct-of-array (SoA) data layout via appropriate transposition of the data and then call the physical flux function F ( q h ) as well as the combined algebraic source term and non-conservative product contained in the expression S ( q h ) B ( q h ) · q h simultaneously for a subset of VECTORLENGTH space–time degrees of freedom, where VECTORLENGTH is the length of the AVX registers of modern Intel Xeon CPUs, i.e., 4 for those with the old 256 bit AVX and AVX2 instruction sets (Sandy Bridge, Haswell, Broadwell) and 8 for the latest Intel Xeon Scalable CPUs with 512 bit AVX instructions (Skylake). The result of the vectorized evaluation of the PDE, which is still in SoA format, is then converted back to the AoS data layout using appropriate vectorized shuffle commands.
The element-local space–time predictor is arithmetically very intensive, but at the same time it is also by construction cache-blocking. While in traditional RKDG schemes, each Runge-Kutta stage requires touching all spatial degrees of freedom of the entire domain once per Runge-Kutta stage, in our ADER-DG approach the spatial degrees of freedom u h need to be loaded only once per element and time step, and from those all space–time degrees of freedom of q h are computed. Ideally, this procedure fits entirely into the L3 cache or even into the L2 cache of the CPU, at least up to a certain critical polynomial degree N c = N c ( m ) , which is a function of the available L3 or L2 cache size, but also of the number of quantities m to be evolved in the PDE system.
Last but not least, it is important to note that it is possible to hide the entire MPI communication that is inevitably needed on distributed memory supercomputers behind the space–time predictor. For this purpose, the predictor is first invoked on the MPI boundaries of each CPU, which then immediately sends the boundary-extrapolated data q h and q h + to the neighbor CPUs. While the messages containing the data of these non-blocking MPI send and receive commands are sent around, each CPU can compute the space–time predictor of purely interior elements that do not need any MPI communication.
For an efficient task-based formalism used within ExaHyPE in the context of shared memory parallelism, see Reference [106]. This completes the description of the efficient implementation of the unlimited ADER-DG schemes used within the ExaHyPE engine.

2.3. A Posteriori Subcell Finite Volume Limiter

In regions where the discrete solution is smooth, there is indeed no need for using nonlinear limiters. However, in the presence of shock waves, discontinuities or strong gradients, and taking into account the fact that even a smooth signal may become non-smooth on the discrete level if it is underresolved on the grid, we have to supplement our high order unlimited ADER-DG scheme described above with a nonlinear limiter.
In order to build a simple, robust and accurate limiter, we follow the ideas outlined in References [36,37,38,107], where a novel a posteriori limiting strategy for ADER-DG schemes was developed, based on the ideas of the MOOD paradigm introduced in References [108,109,110,111] in the finite volume context. In a first run, the unlimited ADER-DG scheme is used and produces a so-called candidate solution, denoted by u h * ( x , t n + 1 ) in the following. This candidate solution is then checked a posteriori against several physical and numerical detection criteria. For example, we require some relevant physical quantities of the solution to be positive (e.g., pressure and density), we require the absence of floating point errors (NaN) and we impose a relaxed discrete maximum principle (DMP) in the sense of polynomials, see Reference [36]. As soon as one of these detection criteria is not satisfied, a cell is marked as troubled zone and is scheduled for limiting.
A cell Ω i that has been marked for limiting is now split into ( 2 N + 1 ) d finite volume subcells, which are denoted by Ω i , s . They satisfy Ω i = s Ω i , s . Note that this very fine division of a DG element into finite volume subcells does not reduce the time step of the overall ADER-DG scheme, since the CFL number of explicit DG schemes scales with 1 / ( 2 N + 1 ) , while the CFL number of finite volume schemes (used on the subgrid) is of the order of unity. The discrete solution in the subcells Ω i , s is represented at time t n in terms of piecewise constant subcell averages u ¯ i , s n , i.e.,
u ¯ i , s n = 1 | Ω i , s | Ω i , s Q ( x , t n ) d x .
These subcell averages are now evolved in time with a second or third order accurate finite volume scheme, which actually looks very similar to the previous ADER-DG scheme in Equation (5), with the difference that now the test function is unity and the spatial control volumes Ω i are replaced by the sub-volumes Ω i , s :
u ¯ i , s n + 1 u ¯ i , s n + t n t n + 1 Ω i , s D q h , q h + · n d S d t + t n t n + 1 Ω i , s B ( q h ) · q h d x d t = t n t n + 1 Ω i , s S ( q h ) d x d t .
Here we use again a space–time predictor solution q h , but which is now computed from an initial condition given by a second order TVD reconstruction polynomial or from a WENO [26] or CWENO reconstruction [51,112,113,114] polynomial w h ( x , t n ) computed from the cell averages u ¯ i , s n via an appropriate reconstruction operator. The predictor is either computed via a standard second order MUSCL–Hancock-type strategy, or via the space–time DG approach of Equation (11), but where the initial data u h ( x , t n ) are now replaced by w h ( x , t n ) and the spatial control volumes Ω i are replaced by the subcells Ω i , s .
Once all subcell averages u ¯ i , s n + 1 inside a cell Ω i have been computed according to Equation (17), the limited DG polynomial u h ( x , t n + 1 ) at the next time level is obtained again via a classical constrained least squares reconstruction procedure requiring
1 | Ω i , s | Ω i , s u h ( x , t n + 1 ) d x = u ¯ i , s n + 1 Ω i , s Ω i , and Ω i u h ( x , t n + 1 ) d x = Ω i , s Ω i | Ω i , s | u ¯ n + 1 i , s .
Here, the second relation is a constraint and means conservation at the level of the control volume Ω i . This completes the brief description of the subcell finite volume limiter used here.

3. Some Examples of Typical PDE Systems Solved With the ExaHyPE Engine

The great advantage of ExaHyPE over many existing PDE solvers is its great flexibility and versatility for the solution of a very wide class of hyperbolic PDE systems in Equation (1). The implementation of the numerical method and the definition of the PDE system to be solved are completely independent of each other. The compute kernels are provided either as generic or as an optimized implementation for the general PDE system given by Equation (1), while the user only needs to provide particular implementations of the functions F ( Q ) , B ( Q ) and ( Q ) . It is obviously also possible to drop terms that are not needed. This allows to solve all the PDE systems listed below in one single software package. In all numerical examples shown below, we have used a CFL condition of the type
Δ t α | λ max x | Δ x + | λ max y | Δ y + | λ max z | Δ z ,
where Δ x , Δ y and Δ z are the mesh spacings and | λ max x | , | λ max x | and | λ max x | are the maximal absolute values of the eigenvalues (wave speeds) of the matrix A · n in x, y and z direction, respectively. The coefficient α < 1 / ( 2 N + 1 ) can be obtained via a numerical von Neumann stability analysis and is reported for some relevant N in Reference [34].

3.1. The Euler Equations of Compressible Gas Dynamics

The Euler equations of compressible gas dynamics are among the simplest nonlinear systems of hyperbolic conservation laws. They only involve a conservative flux F ( Q ) and read
t ρ ρ v ρ E + · ρ v ρ v v + p I v ρ E + p = 0 .
Here, ρ is the mass density, v is the fluid velocity, ρ E is the total energy density and p is the fluid pressure, which is related to ρ , ρ E and v via the so-called equation of state (EOS). In the following we show the computational results for two test problems. The first one is the smooth isentropic vortex test case first proposed in Reference [115] and also used in Reference [36], which has an exact solution and is therefore suitable for a numerical convergence study. Some results of Reference [36] are summarized in Table 1 below, where N x denotes the number of cells per space dimensions. From the results one can conclude that the high order ADER-DG schemes converge with the designed order of accuracy in both space and time. In order to give a quantitative assessment for the cost of the scheme, we define and provide the TDU metric, which is the cost per degree of freedom update per CPU core, see also Reference [34]. The TDU metric is easily computed by dividing the measured wall clock time (WCT) of a simulation by the number of elements per CPU core and time steps carried out, and by the number of spatial degrees of freedom per element, i.e., ( N + 1 ) d . With the appropriate initial guess and AVX 512 vectorization of the code discussed in the previous section, the cost for updating one single degree of freedom for a fourth order ADER-DG scheme ( N = 3 ) for the 3D compressible Euler equations is as low as TDU = 0.25 µs when using one single CPU core of a new Intel i9-7900X Skylake test workstation with 3.3 GHz nominal clock speed, 32 GB of RAM and a total number of 10 CPU cores. This cost metric can be directly compared with the cost to update one single point or control volume of existing finite difference and finite volume schemes.
In the following we show the results obtained with an ADER-DG scheme using piecewise polynomials of degree N = 9 for a very stringent test case, which is the so-called Sedov blast wave problem detailed in References [100,107,116,117]. It consists in an explosion propagating in a zero pressure gas, leading to an infinitely strong shock wave. In our setup, the outer pressure is set to 10 14 , i.e., close to machine zero. In order to get a robust numerical scheme, it is useful to perform the reconstruction step in the subcell finite volume limiter as well as the space–time predictor of the ADER-DG scheme in primitive variables, see Reference [100]. The computational results obtained are shown in Figure 1, where we can observe a very good agreement with the reference solution. One furthermore can see that the discrete solution respects the circular symmetry of the problem and the a posteriori subcell limiter is only acting in the vicinity of the shock wave.

3.2. A Novel Diffuse Interface Approach for Linear Seismic Wave Propagation in Complex Geometries

Seismic wave propagation problems in complex 3D geometries are often very challenging due to the geometric complexity. Standard approaches either use regular curvilinear boundary-fitted meshes, or unstructured tetrahedral or hexahedral meshes. In all cases, a certain amount of user interaction for grid generation is required. Furthermore, the geometric complexity can have a negative impact on the admissible time step size due to the CFL condition, since the mesh generator may create elements with very bad aspect ratio, so-called sliver elements. In the case of regular curvilinear grids, the Jacobian of the mapping may become ill-conditioned and thus reduce the admissible time step size. In Reference [118] a novel diffuse interface approach has been forwarded, where only the definition of a scalar volume fraction function α is required, where α = 1 is set inside the solid medium, and α = 0 in the surrounding gas or vacuum. The governing PDE system proposed in Reference [118] reads
σ t E ( λ , μ ) · 1 α ( α v ) + 1 α E ( λ , μ ) · v α = S σ ,
α v t α ρ · σ 1 ρ σ α = S v ,
α t = 0 , λ t = 0 , μ t = 0 , ρ t = 0 ,
and clearly falls into the class of PDE systems described by Equation (1). Here, σ denotes the symmetric stress tensor, v is the velocity vector, α [ 0 , 1 ] is the volume fraction, λ and μ are the Lamé constants and ρ is the density of the solid medium. The elasticity tensor E is a function of λ and μ and relates stress and strain via the Hooke law. The last four quantities obey trivial evolution equations, which state that these parameters remain constant in time. However, they still need to be properly included in the evolution system, since they have an influence on the solution of the Riemann problem. An analysis of the eigenstructure of Equations (21)–(23) shows that the eigenvalues are all real and are independent of the volume fraction function α . Furthermore, the exact solution of a generic Riemann problem with α = 1 on the left and α = 0 on the right yields the free surface boundary condition σ · n = 0 at the interface, see Reference [118] for details. In this new approach, the mesh generation problem can be fully avoided, since all that is needed is the specification of the scalar volume fraction function α , which is set to unity inside the solid and to zero outside. A realistic 3D wave propagation example based on real DTM data of the Mont Blanc region is shown in Figure 2 and Figure 3, where the 3D contour colors of the wave field as well as a set of seismogram recordings in two receiver points are reported. For this simulation, a uniform Cartesian base-grid of 80 3 elements was used, together with one level of AMR refinement close to the free surface boundary determined by the DTM model. A fourth order ADER-DG scheme ( N = 3 ) has been used in this simulation. We stress that the entire setup of the computational model in the diffuse interface approach is completely automatic, and no manual user interaction was required. The reference solution was obtained with a high order ADER-DG scheme of the same polynomial degree N = 3 using an unstructured boundary-fitted tetrahedral mesh [119] of similar spatial resolution, containing a total of 1,267,717 elements. We observe an excellent agreement between the two simulations, which were obtained with two completely different PDE systems on two different grid topologies.

3.3. The Unified Godunov-Peshkov-Romenski Model of Continuum Mechanics (GPR)

A major achievement of ExaHyPE was the first successful numerical solution of the unified first order symmetric hyperbolic and thermodynamically compatible Godunov–Peshkov–Romenski (GPR) model of continuum mechanics, see References [17,18]. The GPR model is based on the seminal papers by Godunov and Romenski [14,15,120] on inviscid symmetric hyperbolic systems. The dissipative mechanisms, which allow to model both plastic solids as well as viscous fluids within one single set of equations were added later in the groundbreaking work of Peshkov and Romenski in Reference [16]. The GPR model is briefly outlined below, while for all details the interested reader is referred to References [16,17,18]. The governing equations read
ρ t + x k ρ u k = 0 ,
ρ u i t + x k ρ u i u k + p δ i k σ i k = 0 ,
A i k t + A i m u m x k + u j A i k x j A i j x k = ψ i k θ 1 ( τ 1 ) ,
ρ J i t + x k ρ J i u k + T δ i k = 1 θ 2 ( τ 2 ) ρ H i ,
ρ E t + x k u k ρ E + u i p δ i k σ i k + q k = 0 .
Furthermore, the system is also endowed with an entropy inequality, see Reference [17]. Here, ρ is the mass density, [ u i ] = v = ( u , v , w ) is the velocity vector, p is a non-equilibrium pressure, [ A i k ] = A is the distorsion field, [ J i ] = J is the thermal impulse vector, T is the temperature and ρ E is the total energy density that is defined according to Reference [17] as
ρ E = ρ e + 1 2 ρ v 2 + 1 4 ρ c s 2 tr ( dev ) G T ( dev G ) + 1 2 ρ α 2 J 2
in terms of the specific internal energy e = e ( p , ρ ) given by the usual equation of state (EOS), the kinetic energy, the energy stored in the medium due to deformations and in the thermal impulse. Furthermore, G = A T A is a metric tensor induced by the distortion field A , which allows to measure distances and thus deformations in the medium, c s is the shear sound speed and α is a heat wave propagation speed; the symbol dev G = G 1 3 tr G indicates the trace-free part of the metric tensor G. From the definition of the total energy Equation (29) and the relations H i = E J i , ψ i k = E A i j , σ i k = ρ A m i E A m k , T = E S and q k = E S E J k the shear stress tensor and the heat flux read σ = ρ c s 2 G dev G and q = α 2 T J . It can furthermore be shown via formal asymptotic expansion [17] that via an appropriate choice of θ 1 and θ 2 in the stiff relaxation limit τ 1 0 and τ 2 0 , the stress tensor and the heat flux tend to those of the compressible Navier-Stokes equations
σ μ v + v T 2 3 · v I and q λ T ,
with transport coefficients μ = μ ( τ 1 , c s ) and λ = λ ( τ 2 , α ) related to the relaxation times τ 1 and τ 2 and to the propagation speeds c s and α , respectively. For a complete derivation, see References [17,18]. In the opposite limit τ 1 the model describes an ideal elastic solid with large deformations. This means that elastic solids as well as viscous fluids can be described with the aid of the same mathematical model. At this point we stress that numerically we always solve the unified first order hyperbolic PDE system in Equations (24)–(28), even in the stiff relaxation limit in Equation (30), when the compressible Navier-Stokes-Fourier system is retrieved asymptotically. We emphasize that we never need to discretize any parabolic terms, since the hyperbolic system in Equations (24)–(28) with algebraic relaxation source terms fits perfectly into the framework of Equation (1).
In the Figure 4 we show numerical results obtained in Reference [17] for a viscous heat conducting shock wave and the comparison with the exact solution of the compressible Navier-Stokes equations.

3.4. The Equations of Ideal General Relativistic Magnetohydrodynamics (GRMHD)

A very challenging PDE system is given by the equations of ideal general relativistic magnetohydrodynamics (GRMHD). The governing PDE are a result of the Einstein field equations and can be written in compact covariant notation as follows:
μ T μ ν = 0 , and μ * F μ ν = 0 and μ ( ρ u μ ) = 0 ,
where μ is the usual covariant derivative operator, T μ ν is the energy-momentum tensor, * F μ ν is the Faraday tensor and u μ is the four-velocity. The compact equations above can be expanded into a so-called 3+1 formalism, which can be cast into the form of Equation (1), see References [57,121,122] for more details. The final evolution system involves nine field variables plus the 10 quantities of the background space–time, which is supposed to be stationary here. A numerical convergence study for the large amplitude Alfvén wave test problem described in Reference [122] solved in the domain Ω = [ 0 , 2 π ] 3 up to t = 1 and carried out with high order ADER-DG schemes in Reference [57] is reported in the Table 2 below, where we also show a direct comparison with high order Runge-Kutta discontinuous Galerkin schemes. We observe that the ADER-DG schemes are competitive with RKDG methods, even for this very complex system of hyperbolic PDE. The results reported in Table 2 refer to the non-vectorized version of the code. Further significant performance improvements are expected from a carefully vectorized implementation of the GRMHD equations, in particular concerning the vectorization of the cumbersome conversion of the vector of conservative variables to the vector of primitive variables, i.e., the function V = V ( Q ) . For the GRMHD system V cannot be computed analytically in terms of Q , but requires the iterative solution of one nonlinear scalar algebraic equation together with the computation of the roots of a third order polynomial, see Reference [122] for details. In our vectorized implementation of the PDE, we have therefore in particular vectorized the primitive variable recovery via a direct implementation in AVX intrinsics. We have furthermore made use of careful auto-vectorization via the compiler for the evaluation of the physical flux function and for the non-conservative product. Thanks to this vectorization effort, on one single CPU core of an Intel i9-7900X Skylake test workstation with 3.3 GHz nominal clock frequency and using AVX 512 the CPU time necessary for a single degree of freedom update (TDU) for a fourth order ADER-DG scheme ( N = 3 ) could be reduced to TDU = 2.3 µs for the GRMHD equations in three space dimensions.
As second test problem we present the results obtained for the Orszag-Tang vortex system in flat Minkowski spacetime, where the GRMHD equations reduce to the special relativistic MHD equations. The initial condition is given by
ρ , u , v , w , p , B x , B y , B z = 1 , 3 4 2 sin y , 3 4 2 sin x , 0 , 1 , sin y , sin 2 x , 0 ,
and we set the adiabatic index to Γ = 4 / 3 . The computational domain is Ω = [ 0 , 2 π ] 2 and is discretized with a dynamically adaptive AMR grid. For this test we chose the P 5 version of the ADER-DG scheme with FV subcell limiter and the rest mass density as indicator function for AMR, i.e., φ ( Q ) = ρ . Figure 5 shows 1D cuts through the numerical solution at time t = 2 and at y = 0.01 , while Figure 6 shows the numerical results for the AMR-grid with limiter-status map (blue cells are unlimited, while limited cells are highlighted in red), together with Schlieren images for the rest-mass density at time t = 2 . The same simulation has been repeated with different refinement estimator functions χ that tell the AMR algorithm where and when to refine and to coarsen the mesh: (i) A simple first order derivative estimator χ 1 based on discrete gradients of the indicator function φ ( Q ) , (ii) the classical second order derivative estimator χ 2 based on Reference [123], (iii) a novel estimator χ 3 based on the action of the a posteriori subcell finite volume limiter, i.e., the mesh is refined where the limiter is active (iv) a multi-resolution estimator χ 4 based on the difference in L norm of the discrete solution on two different refinement levels and 1 . The reference solution is obtained on a uniform fine grid corresponding to the finest refinement level, i.e., a uniform composed of 270 × 270 elements. The results shown in Figure 6 clearly show that the numerical results obtained by means of different refinement estimator functions are comparable with each other and thus the proposed AMR algorithm is robust with respect to the particular choice of the mesh.
As a last test case we simulate a stationary neutron star in three space dimensions using the Cowling approximation, i.e., assuming a fixed static background spacetime. The initial data for the matter and the spacetime are both compatible with the Einstein field equations and are given by the solution of the Tolman–Oppenheimer–Volkoff (TOV) equations, which constitute a nonlinear ODE system in the radial coordinate that can be numerically solved up to any precision at the aid of a fourth order Runge-Kutta scheme using a very fine grid. We setup a stable nonrotating TOV star without magnetic field and with central rest mass density ρ ( 0 , 0 ) = 1.28 × 10 3 and adiabatic exponent Γ = 2 in a computational domain Ω = [ 10 , + 10 ] 3 discretized with a fourth order ADER-DG scheme ( N = 3 ) using 32 3 elements, which corresponds to 128 3 spatial degrees of freedom. The pressure in the atmosphere outside the compact object is set to p a t m = 10 13 . We run the simulation until a final time of t = 1000 and measure the L error norms of the rest mass density and the pressure against the exact solution, which is given by the initial condition. The error measured at t = 1000 for the rest mass density is L ( ρ ) = 1.553778 × 10 5 while the error for the pressure is L ( p ) = 1.605334 × 10 7 . The simulation was carried out with the vectorized version of the code on 512 CPU cores of the SuperMUC phase 2 system (based on AVX2) and required only 3010 s of wallclock time. The same simulation with the established finite difference GR code WhiskyTHC [124] required 8991 s of wall clock time on the same machine with the same spatial mesh resolution and the same number of CPU cores. The time series of the relative error of the central rest mass density in the origin of the domain is plotted in the left panel of Figure 7. At the final time t = 1000 , the relative error of the central rest mass density is still below 0.1%. In the right panel of Figure 7 we show the contour surfaces of the pressure at the final time t = 1000 . In Figure 8 we show a 1D cut along the x axis, comparing the numerical solution at time t = 1000 with the exact one. We note that the numerical scheme is very accurate, but it is not well-balanced for the GRMHD equations, i.e., the method cannot preserve the stationary equilibrium solution of the TOV equations exactly at the discrete level. Therefore, further work along the lines of research reported recently in Reference [86] for the Euler equations with Newtonian gravity are needed, extending the framework of well-balanced methods [79,80,125] also to general relativity. Finally, in Figure 9 we compare the exact and the numerical solution at time t = 1000 in the xy plane.

3.5. A Strongly Hyperbolic First Order Reduction of the CCZ4 Formulation of the Einstein Field Equations (FO-CCZ4)

The last PDE system under consideration here are the Einstein field equations that describe the evolution of dynamic spacetimes. Here we consider the so-called CCZ4 formulation [126], which is based on the Z4 formalism that takes into account the involutions (stationary differential constraints) inherent in the Einstein equations via an augmented system similar to the generalized Lagrangian multiplier (GLM) approach of Dedner et al. [127] that takes care of the stationary divergence-free constraint of the magnetic field in the MHD equations. In compact covariant notation the undamped Z4 Einstein equations in vacuum, which can be derived from the Einstein–Hilbert action integral associated with the Z4 Lagrangian 𝓛 = g μ ν R μ ν + 2 μ Z ν , read
R μ ν + ( μ Z ν ) = 0 ,
where g μ ν is the 4-metric of the spacetime, R μ ν is the 4-Ricci tensor and the 4-vector Z ν accounts for the stationary constraints of the Einstein equations, as already mentioned before. After introducing the usual 3+1 ADM split of the 4-metric as
d s 2 = α 2 d t 2 + γ i j d x i + β i d t d x j + β j d t ,
the equations can be cast into a time-dependent system of 25 partial differential equations that involve first order derivatives in time and both first and second order derivatives in space, see Reference [126]. Nevertheless, the system is not dissipative, but a rather unusual formulation of a wave equation, see Reference [128]. In the expression above, α denotes the so-called lapse, β i is the spatial shift vector and γ i j is the spatial metric. In the original form presented in Reference [126], the PDE system does not fit into the formalism given by Equation (1). After the introduction of 33 auxiliary variables, which are the spatial gradients of some of the 25 primary evolution quantities, it is possible to derive a first order reduction of the system that contains a total of 58 evolution quantities. However, a naive procedure of converting the original second order evolution system into a first order system leads only to a weakly hyperbolic formulation, which is not suitable for numerical simulations since the initial value problem is not well posed in this case. Only after adding suitable first and second order ordering constraints, which arise from the definition of the auxiliary variables, it is possible to obtain a provably strongly hyperbolic and thus well-posed evolution system, denoted by FO-CCZ4 in the following. For all details of the derivation, the strong hyperbolicity proof and numerical results achieved with high order ADER-DG schemes, the reader is referred to Reference [129]. In order to give an idea about the complexity of the Einstein field equations, it should be mentioned that one single evaluation of the FO-CCZ4 system requires about 20,000 floating point operations! In order to obtain still a computationally efficient implementation, the entire PDE system has been carefully vectorized using blocks of the size VECTORLENGTH, so that in the end a level of 99.9% of vectorization of the code has been reached. Using a fourth order ADER-DG scheme ( N = 3 ) the time per degree of freedom update (TDU) metric per core on a modern workstation with Intel i9-7900X CPU that supports the novel AVX 512 instructions is TDU = 4.7 µs.

4. Strong MPI Scaling Study for the FO-CCZ4 System

A major focus of this paper is the efficient implementation of ADER-DG schemes for high performance computing (HPC) on massively parallel distributed memory supercomputers. For this purpose, we have very recently carried out a systematic study of the strong MPI scaling efficiency of our new high order fully-discrete one-step ADER-DG schemes on the Hazel Hen supercomputer of the HLRS center in Stuttgart, Germany, using from 720 up to 180,000 CPU cores. We have furthermore carried out a systematic comparison with conventional Runge-Kutta DG schemes using the SuperMUC phase 1 system of the LRZ center in Munich, Germany.
As already discussed before, the particular feature of ADER-DG schemes compared to traditional Runge-Kutta DG schemes (RKDG) is that they are intrinsically communication-avoiding and cache-blocking, which makes them particularly well suited for modern massively parallel distributed memory supercomputers. As governing PDE system for the strong scaling test the novel first-order reduction of the CCZ4 formulation of the 3+1 Einstein field equations has been been adopted [129]. We recall that FO-CCZ4 is a very large nonlinear hyperbolic PDE system that contains 58 evolution quantities.
The first strong scaling study on the SuperMUC phase 1 system uses 64 to 64,000 CPU cores. The test problem was the gauge wave problem [129] setup on the 3D domain Ω = [ 0.5 , 0.5 ] 3 . For the test we have compared a fourth order ADER-DG scheme ( N = 3 ) with a fourth order accurate RKDG scheme on a uniform Cartesian grid composed of 120 3 elements. It has to be stressed, that when using 64,000 CPU cores for this setup each CPU has to update only 3 3 = 27 elements. The wall clock time as a function of the used number of CPU cores (nCPU) and the obtained parallel efficiency with respect to an ideal linear scaling are reported in the left panel of Figure 10. We find that ADER-DG schemes provide a better parallel efficiency than RKDG schemes, as expected.
The second strong scaling study has been performed on the Hazel-Hen supercomputer, using 720 to 180,000 CPU cores. Again we have used a fourth order accurate ADER-DG scheme ( N = 3 ), this time using a uniform grid of 200 × 180 × 180 elements, solving again the 3D gauge wave benchmark problem detailed in Reference [129]. The measured wall-clock-times (WCT) as a function of the employed number of CPU cores, as well as the corresponding parallel scaling-efficiency are shown in Figure 10. The results depicted in Figure 10 clearly show that our new ADER-DG schemes scale very well up to 90,000 CPU cores with a parallel efficiency greater than 95%, and up to 180,000 cores with a parallel efficiency that is still greater than 93%. Furthermore, the code was instrumented with manual FLOP counters in order to measure the floating point performance quantitatively. The full machine run on 180,000 CPU cores of Hazel Hen took place on 7 May 2018. During the run, each core has provided an average performance of 8.2 GFLOPS, leading to a total of 1.476 PFLOPS of sustained performance. To our knowledge, this was the largest test run ever carried out with high order ADER-DG schemes for nonlinear hyperbolic systems of partial differential equations. For large runs with sustained petascale performance of ADER-DG schemes for linear hyperbolic PDE systems on unstructured tetrahedral meshes, see Reference [105].

5. Conclusions

In this paper we have presented an efficient implementation of high order ADER-DG schemes on modern massively parallel supercomputers using the ExaHyPE engine. The key ingredients are the communication-avoiding and cache-blocking properties of ADER-DG, together with an efficient vectorization of the high level user functions that provide the evaluation of the physical fluxes F ( Q ) , of the non-conservative products B ( Q ) · Q and of the algebraic source terms S ( Q ) . The engine is highly versatile and flexible and allows to solve a very broad spectrum of different hyperbolic PDE systems in a very efficient and highly scalable manner. In order to support this claim, we have provided a rather large set of different numerical examples solved with ADER-DG schemes. To show the excellent parallel scalability of the ADER-DG method, we have provided strong scaling results on 64 to 64,000 CPU cores including a detailed and quantitative comparison with RKDG schemes. We have furthermore shown strong scaling results of the vectorized ADER-DG implementation for the FO-CCZ4 formulation of the Einstein field equations using 720 to 180,000 CPU cores of the Hazel Hen supercomputer at the HLRS in Stuttgart, Germany, where a sustained performance of more than one petaflop has been reached.
Future research in ExaHyPE will concern an extension of the GPR model to full general relativity, able to describe nonlinear elastic and plastic solids as well as viscous and ideal fluids in one single governing PDE system. We furthermore plan an implementation of the FO-CCZ4 system [129] directly based on AVX intrinsics, in order to further improve the performance of the scheme and to reduce computational time. The final aim of our developments are the simulation of ongoing nonlinear dynamic rupture processes during earthquakes, as well as the inspiral and merger of binary neutron star systems and the associated generation of gravitational waves. Although both problems seem to be totally different and unrelated, it is indeed possible to write the mathematical formulation of both applications under the same form of a hyperbolic system given by Equation (1) and thus to solve both problems within the same computer software.

Author Contributions

Conceptualization and Methodology: M.D., M.B. and T.W.; Software: M.D., T.W., F.F. and M.T.; Validation: F.F. and M.T.; Strong MPI Scaling Study: F.F.; Writing original draft and editing: M.D.; Supervision and Project Administration: M.B., T.W. and M.D.; Funding Acquisition: M.B., T.W. and M.D.

Funding

This research was funded by the European Union’s Horizon 2020 Research and Innovation Programme under the project ExaHyPE, grant no. 671698 (call FETHPC-1-2014). The first three authors also acknowledge funding from the Italian Ministry of Education, University and Research (MIUR) in the frame of the Departments of Excellence Initiative 2018–2022 attributed to DICAM of the University of Trento. M.D. also acknowledges support from the University of Trento in the frame of the Strategic Initiative Modeling and Simulation.

Acknowledgments

The authors acknowledge the support of the HLRS supercomputing center in Stuttgart, Germany, for awarding access to the HazelHen supercomputer, as well as of the Leibniz Rechenzentrum (LRZ) in Munich, Germany for awarding access to the SuperMUC supercomputer. In particular, the authors are very grateful for the technical support provided by Björn Dick (HLRS) and Nicolay Hammer (LRZ). M.D., F.F. and M.T. are all members of the Italian INdAM Research group GNCS.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Riemann, B. Über die Fortpflanzung ebener Luftwellen von Endlicher Schwingungsweite; Göttinger Nachrichten: Göttingen, Germany, 1859; Volume 19. [Google Scholar]
  2. Riemann, B. Über die Fortpflanzung ebener Luftwellen von endlicher Schwingungsweite. Abhandlungen der Königlichen Gesellschaft der Wissenschaften zu Göttingen 1860, 8, 43–65. [Google Scholar]
  3. Noether, E. Invariante Variationsprobleme. In Nachrichten der königlichen Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse; Weidmannsche Buchhandlung: Berlin, Germany, 1918; pp. 235–257. [Google Scholar]
  4. Courant, R.; Friedrichs, K.; Lewy, H. Über die partiellen Differenzengleichungen der mathematischen Physik. Math. Annal. 1928, 100, 32–74. [Google Scholar] [CrossRef]
  5. Courant, R.; Isaacson, E.; Rees, M. On the solution of nonlinear hyperbolic differential equations by finite differences. Commun. Pure Appl. Math. 1952, 5, 243–255. [Google Scholar] [CrossRef]
  6. Courant, R.; Hilbert, D. Methods of Mathematical Physics; John Wiley and Sons, Inc.: New York, NY, USA, 1962. [Google Scholar]
  7. Courant, R.; Friedrichs, K.O. Supersonic Flows and Shock Waves; Springer: Berlin, Germany, 1976. [Google Scholar]
  8. Friedrichs, K. Symmetric positive linear differential equations. Commun. Pure Appl. Math. 1958, 11, 333–418. [Google Scholar] [CrossRef]
  9. Godunov, S. An interesting class of quasilinear systems. Dokl. Akad. Nauk SSSR 1961, 139, 521–523. [Google Scholar]
  10. Friedrichs, K.; Lax, P. Systems of conservation equations with a convex extension. Proc. Natl. Acad. Sci. USA 1971, 68, 1686–1688. [Google Scholar] [CrossRef] [PubMed]
  11. Godunov, S.; Romenski, E. Thermodynamics, conservation laws, and symmetric forms of differential equations in mechanics of continuous media. In Computational Fluid Dynamics Review 95; John Wiley: New York, NY, USA, 1995; pp. 19–31. [Google Scholar]
  12. Godunov, S.; Romenski, E. Elements of Continuum Mechanics and Conservation Laws; Kluwer Academic/Plenum Publishers: New York, NY, USA, 2003. [Google Scholar]
  13. Godunov, S. Symmetric form of the magnetohydrodynamic equation. Numer. Methods Mech. Contin. Med. 1972, 3, 26–34. [Google Scholar]
  14. Godunov, S.; Romenski, E. Nonstationary equations of the nonlinear theory of elasticity in Euler coordinates. J. Appl. Mech. Tech. Phys. 1972, 13, 868–885. [Google Scholar] [CrossRef]
  15. Romenski, E. Hyperbolic systems of thermodynamically compatible conservation laws in continuum mechanics. Math. Comput. Model. 1998, 28, 115–130. [Google Scholar] [CrossRef]
  16. Peshkov, I.; Romenski, E. A hyperbolic model for viscous Newtonian flows. Contin. Mech. Thermodyn. 2016, 28, 85–104. [Google Scholar] [CrossRef]
  17. Dumbser, M.; Peshkov, I.; Romenski, E.; Zanotti, O. High order ADER schemes for a unified first order hyperbolic formulation of continuum mechanics: Viscous heat-conducting fluids and elastic solids. J. Comput. Phys. 2016, 314, 824–862. [Google Scholar] [CrossRef]
  18. Dumbser, M.; Peshkov, I.; Romenski, E.; Zanotti, O. High order ADER schemes for a unified first order hyperbolic formulation of Newtonian continuum mechanics coupled with electro-dynamics. J. Comput. Phys. 2017, 348, 298–342. [Google Scholar] [CrossRef]
  19. Boscheri, W.; Dumbser, M.; Loubère, R. Cell centered direct Arbitrary-Lagrangian-Eulerian ADER-WENO finite volume schemes for nonlinear hyperelasticity. Comput. Fluids 2016, 134–135, 111–129. [Google Scholar] [CrossRef]
  20. Neumann, J.V.; Richtmyer, R.D. A method for the numerical calculation of hydrodynamic shocks. J. Appl. Phys. 1950, 21, 232–237. [Google Scholar] [CrossRef]
  21. Godunov, S.K. A finite difference Method for the Computation of discontinuous solutions of the equations of fluid dynamics. Math. USSR Sbornik 1959, 47, 357–393. [Google Scholar]
  22. Kolgan, V.P. Application of the minimum-derivative principle in the construction of finite-difference schemes for numerical analysis of discontinuous solutions in gas dynamics. Trans. Central Aerohydrodyn. Inst. 1972, 3, 68–77. (In Russian) [Google Scholar]
  23. Van Leer, B. Towards the Ultimate Conservative Difference Scheme II: Monotonicity and conservation combined in a second order scheme. J. Comput. Phys. 1974, 14, 361–370. [Google Scholar] [CrossRef]
  24. Van Leer, B. Towards the Ultimate Conservative Difference Scheme V: A second Order sequel to Godunov’s Method. J. Comput. Phys. 1979, 32, 101–136. [Google Scholar] [CrossRef]
  25. Harten, A.; Osher, S. Uniformly high–order accurate nonoscillatory schemes I. SIAM J. Num. Anal. 1987, 24, 279–309. [Google Scholar] [CrossRef]
  26. Jiang, G.; Shu, C. Efficient Implementation of Weighted ENO Schemes. J. Comput. Phys. 1996, 126, 202–228. [Google Scholar] [CrossRef] [Green Version]
  27. Cockburn, B.; Shu, C.W. TVB Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws II: General framework. Math. Comput. 1989, 52, 411–435. [Google Scholar]
  28. Cockburn, B.; Lin, S.Y.; Shu, C. TVB Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws III: One dimensional systems. J. Comput. Phys. 1989, 84, 90–113. [Google Scholar] [CrossRef]
  29. Cockburn, B.; Hou, S.; Shu, C.W. The Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws IV: The multidimensional case. Math. Comput. 1990, 54, 545–581. [Google Scholar]
  30. Cockburn, B.; Shu, C.W. The Runge-Kutta discontinuous Galerkin method for conservation laws V: Multidimensional systems. J. Comput. Phys. 1998, 141, 199–224. [Google Scholar] [CrossRef]
  31. Cockburn, B.; Shu, C.W. The Local Discontinuous Galerkin Method for Time-Dependent Convection Diffusion Systems. SIAM J. Numer. Anal. 1998, 35, 2440–2463. [Google Scholar] [CrossRef]
  32. Cockburn, B.; Shu, C.W. Runge-Kutta Discontinuous Galerkin Methods for Convection-Dominated Problems. J. Sci. Comput. 2001, 16, 173–261. [Google Scholar] [CrossRef]
  33. Shu, C. High order WENO and DG methods for time-dependent convection-dominated PDEs: A brief survey of several recent developments. J. Comput. Phys. 2016, 316, 598–613. [Google Scholar] [CrossRef] [Green Version]
  34. Dumbser, M.; Balsara, D.; Toro, E.; Munz, C. A Unified Framework for the Construction of One-Step Finite–Volume and discontinuous Galerkin schemes. J. Comput. Phys. 2008, 227, 8209–8253. [Google Scholar] [CrossRef]
  35. Dumbser, M.; Castro, M.; Parés, C.; Toro, E. ADER Schemes on Unstructured Meshes for Non–Conservative Hyperbolic Systems: Applications to Geophysical Flows. Comput. Fluids 2009, 38, 1731–1748. [Google Scholar] [CrossRef]
  36. Dumbser, M.; Zanotti, O.; Loubère, R.; Diot, S. A posteriori subcell limiting of the discontinuous Galerkin finite element method for hyperbolic conservation laws. J. Comput. Phys. 2014, 278, 47–75. [Google Scholar] [CrossRef] [Green Version]
  37. Zanotti, O.; Fambri, F.; Dumbser, M.; Hidalgo, A. Space-time adaptive ADER discontinuous Galerkin finite element schemes with a posteriori sub-cell finite volume limiting. Comput. Fluids 2015, 118, 204–224. [Google Scholar] [CrossRef]
  38. Dumbser, M.; Loubère, R. A simple robust and accurate a posteriori sub-cell finite volume limiter for the discontinuous Galerkin method on unstructured meshes. J. Comput. Phys. 2016, 319, 163–199. [Google Scholar] [CrossRef]
  39. Titarev, V.; Toro, E. ADER: Arbitrary High Order Godunov Approach. J. Sci. Comput. 2002, 17, 609–618. [Google Scholar] [CrossRef]
  40. Toro, E.; Titarev, V. Solution of the generalized Riemann problem for advection-reaction equations. Proc. R. Soc. Lond. 2002, 458, 271–281. [Google Scholar] [CrossRef]
  41. Titarev, V.; Toro, E. ADER schemes for three-dimensional nonlinear hyperbolic systems. J. Comput. Phys. 2005, 204, 715–736. [Google Scholar] [CrossRef]
  42. Toro, E.F.; Titarev, V.A. Derivative Riemann solvers for systems of conservation laws and ADER methods. J. Comput. Phys. 2006, 212, 150–165. [Google Scholar] [CrossRef]
  43. Bungartz, H.; Mehl, M.; Neckel, T.; Weinzierl, T. The PDE framework Peano applied to fluid dynamics: An efficient implementation of a parallel multiscale fluid dynamics solver on octree-like adaptive Cartesian grids. Comput. Mech. 2010, 46, 103–114. [Google Scholar] [CrossRef]
  44. Weinzierl, T.; Mehl, M. Peano-A traversal and storage scheme for octree-like adaptive Cartesian multiscale grids. SIAM J. Sci. Comput. 2011, 33, 2732–2760. [Google Scholar] [CrossRef]
  45. Khokhlov, A. Fully Threaded Tree Algorithms for Adaptive Refinement Fluid Dynamics Simulations. J. Comput. Phys. 1998, 143, 519–543. [Google Scholar] [CrossRef]
  46. Baeza, A.; Mulet, P. Adaptive mesh refinement techniques for high-order shock capturing schemes for multi-dimensional hydrodynamic simulations. Int. J. Numer. Methods Fluids 2006, 52, 455–471. [Google Scholar] [CrossRef]
  47. Colella, P.; Dorr, M.; Hittinger, J.; Martin, D.F.; McCorquodale, P. High-order finite-volume adaptive methods on locally rectangular grids. J. Phys. Conf. Ser. 2009, 180, 012010. [Google Scholar] [CrossRef] [Green Version]
  48. Bürger, R.; Mulet, P.; Villada, L. Spectral WENO schemes with Adaptive Mesh Refinement for models of polydisperse sedimentation. ZAMM J. Appl. Math. Mech. Z. Math. Mech. 2013, 93, 373–386. [Google Scholar] [CrossRef]
  49. Ivan, L.; Groth, C. High-order solution-adaptive central essentially non-oscillatory (CENO) method for viscous flows. J. Comput. Phys. 2014, 257, 830–862. [Google Scholar] [CrossRef]
  50. Buchmüller, P.; Dreher, J.; Helzel, C. Finite volume WENO methods for hyperbolic conservation laws on Cartesian grids with adaptive mesh refinement. Appl. Math. Comput. 2016, 272, 460–478. [Google Scholar] [CrossRef]
  51. Semplice, M.; Coco, A.; Russo, G. Adaptive Mesh Refinement for Hyperbolic Systems based on Third-Order Compact WENO Reconstruction. J. Sci. Comput. 2016, 66, 692–724. [Google Scholar] [CrossRef]
  52. Shen, C.; Qiu, J.; Christlieb, A. Adaptive mesh refinement based on high order finite difference WENO scheme for multi-scale simulations. J. Comput. Phys. 2011, 230, 3780–3802. [Google Scholar] [CrossRef]
  53. Dumbser, M.; Zanotti, O.; Hidalgo, A.; Balsara, D. ADER-WENO Finite Volume Schemes with Space-Time Adaptive Mesh Refinement. J. Comput. Phys. 2013, 248, 257–286. [Google Scholar] [CrossRef]
  54. Dumbser, M.; Hidalgo, A.; Zanotti, O. High Order Space-Time Adaptive ADER-WENO Finite Volume Schemes for Non-Conservative Hyperbolic Systems. Comput. Methods Appl. Mech. Eng. 2014, 268, 359–387. [Google Scholar] [CrossRef]
  55. Zanotti, O.; Fambri, F.; Dumbser, M. Solving the relativistic magnetohydrodynamics equations with ADER discontinuous Galerkin methods, a posteriori subcell limiting and adaptive mesh refinement. Mon. Not. R. Astron. Soc. 2015, 452, 3010–3029. [Google Scholar] [CrossRef] [Green Version]
  56. Fambri, F.; Dumbser, M.; Zanotti, O. Space-time adaptive ADER-DG schemes for dissipative flows: Compressible Navier-Stokes and resistive MHD equations. Comput. Phys. Commun. 2017, 220, 297–318. [Google Scholar] [CrossRef]
  57. Fambri, F.; Dumbser, M.; Köppel, S.; Rezzolla, L.; Zanotti, O. ADER discontinuous Galerkin schemes for general-relativistic ideal magnetohydrodynamics. Mon. Not. R. Astron. Soc. 2018, 477, 4543–4564. [Google Scholar] [CrossRef]
  58. Berger, M.J.; Oliger, J. Adaptive Mesh Refinement for Hyperbolic Partial Differential Equations. J. Comput. Phys. 1984, 53, 484–512. [Google Scholar] [CrossRef]
  59. Berger, M.J.; Jameson, A. Automatic adaptive grid refinement for the Euler equations. AIAA J. 1985, 23, 561–568. [Google Scholar] [CrossRef] [Green Version]
  60. Berger, M.J.; Colella, P. Local adaptive mesh refinement for shock hydrodynamics. J. Comput. Phys. 1989, 82, 64–84. [Google Scholar] [CrossRef] [Green Version]
  61. Leveque, R. Clawpack Software. Available online: http://depts.washington.edu/clawpack/ (accessed on 23 August 2018).
  62. Berger, M.J.; LeVeque, R. Adaptive mesh refinement using wave-propagation algorithms for hyperbolic systems. SIAM J. Numer. Anal. 1998, 35, 2298–2316. [Google Scholar] [CrossRef]
  63. Bell, J.; Berger, M.; Saltzman, J.; Welcome, M. Three-dimensional adaptive mesh refinement for hyperbolic conservation laws. SIAM J. Sci. Comput. 1994, 15, 127–138. [Google Scholar] [CrossRef]
  64. Quirk, J. A parallel adaptive grid algorithm for computational shock hydrodynamics. Appl. Numer. Math. 1996, 20, 427–453. [Google Scholar] [CrossRef]
  65. Coirier, W.; Powell, K. Solution-adaptive Cartesian cell approach for viscous and inviscid flows. AIAA J. 1996, 34, 938–945. [Google Scholar] [CrossRef] [Green Version]
  66. Deiterding, R. A parallel adaptive method for simulating shock-induced combustion with detailed chemical kinetics in complex domains. Comput. Struct. 2009, 87, 769–783. [Google Scholar] [CrossRef] [Green Version]
  67. Lopes, M.M.; Deiterding, R.; Gomes, A.F.; Mendes, O.; Domingues, M.O. An ideal compressible magnetohydrodynamic solver with parallel block-structured adaptive mesh refinement. Comput. Fluids 2018, 173, 293–298. [Google Scholar] [CrossRef]
  68. Dezeeuw, D.; Powell, K.G. An Adaptively Refined Cartesian Mesh Solver for the Euler Equations. J. Comput. Phys. 1993, 104, 56–68. [Google Scholar] [CrossRef]
  69. Balsara, D. Divergence-free adaptive mesh refinement for magnetohydrodynamics. J. Comput. Phys. 2001, 174, 614–648. [Google Scholar] [CrossRef]
  70. Teyssier, R. Cosmological hydrodynamics with adaptive mesh refinement. A new high resolution code called RAMSES. Astron. Astrophys. 2002, 385, 337–364. [Google Scholar] [CrossRef] [Green Version]
  71. Keppens, R.; Nool, M.; Tóth, G.; Goedbloed, J.P. Adaptive Mesh Refinement for conservative systems: Multi-dimensional efficiency evaluation. Comput. Phys. Commun. 2003, 153, 317–339. [Google Scholar] [CrossRef]
  72. Ziegler, U. The NIRVANA code: Parallel computational MHD with adaptive mesh refinement. Comput. Phys. Commun. 2008, 179, 227–244. [Google Scholar] [CrossRef]
  73. Mignone, A.; Zanni, C.; Tzeferacos, P.; van Straalen, B.; Colella, P.; Bodo, G. The PLUTO Code for Adaptive Mesh Computations in Astrophysical Fluid Dynamics. Astrophys. J. Suppl. Ser. 2012, 198, 7. [Google Scholar] [CrossRef]
  74. Cunningham, A.; Frank, A.; Varnière, P.; Mitran, S.; Jones, T.W. Simulating Magnetohydrodynamical Flow with Constrained Transport and Adaptive Mesh Refinement: Algorithms and Tests of the AstroBEAR Code. Astrophys. J. Suppl. Ser. 2009, 182, 519. [Google Scholar] [CrossRef]
  75. Keppens, R.; Meliani, Z.; van Marle, A.; Delmont, P.; Vlasis, A.; van der Holst, B. Parallel, grid-adaptive approaches for relativistic hydro and magnetohydrodynamics. J. Comput. Phys. 2012, 231, 718–744. [Google Scholar] [CrossRef]
  76. Porth, O.; Olivares, H.; Mizuno, Y.; Younsi, Z.; Rezzolla, L.; Moscibrodzka, M.; Falcke, H.; Kramer, M. The black hole accretion code. Comput. Astrophys. Cosmol. 2017, 4, 1–42. [Google Scholar] [CrossRef]
  77. Dubey, A.; Almgren, A.; Bell, J.; Berzins, M.; Brandt, S.; Bryan, G.; Colella, P.; Graves, D.; Lijewski, M.; Löffler, F.; et al. A survey of high level frameworks in block-structured adaptive mesh refinement packages. J. Parallel Distrib. Comput. 2014, 74, 3217–3227. [Google Scholar] [CrossRef] [Green Version]
  78. Stroud, A. Approximate Calculation of Multiple Integrals; Prentice-Hall Inc.: Englewood Cliffs, NJ, USA, 1971. [Google Scholar]
  79. Castro, M.; Gallardo, J.; Parés, C. High-order finite volume schemes based on reconstruction of states for solving hyperbolic systems with nonconservative products. Applications to shallow-water systems. Math. Comput. 2006, 75, 1103–1134. [Google Scholar] [CrossRef]
  80. Parés, C. Numerical methods for nonconservative hyperbolic systems: A theoretical framework. SIAM J. Numer. Anal. 2006, 44, 300–321. [Google Scholar] [CrossRef]
  81. Rhebergen, S.; Bokhove, O.; van der Vegt, J. Discontinuous Galerkin finite element methods for hyperbolic nonconservative partial differential equations. J. Comput. Phys. 2008, 227, 1887–1922. [Google Scholar] [CrossRef] [Green Version]
  82. Dumbser, M.; Hidalgo, A.; Castro, M.; Parés, C.; Toro, E. FORCE Schemes on Unstructured Meshes II: Non–Conservative Hyperbolic Systems. Comput. Methods Appl. Mech. Eng. 2010, 199, 625–647. [Google Scholar] [CrossRef]
  83. Müller, L.O.; Parés, C.; Toro, E.F. Well-balanced high-order numerical schemes for one-dimensional blood flow in vessels with varying mechanical properties. J. Comput. Phys. 2013, 242, 53–85. [Google Scholar] [CrossRef]
  84. Müller, L.; Toro, E. Well-balanced high-order solver for blood flow in networks of vessels with variable properties. Int. J. Numer. Methods Biomed. Eng. 2013, 29, 1388–1411. [Google Scholar] [CrossRef] [PubMed]
  85. Gaburro, E.; Dumbser, M.; Castro, M. Direct Arbitrary-Lagrangian-Eulerian finite volume schemes on moving nonconforming unstructured meshes. Comput. Fluids 2017, 159, 254–275. [Google Scholar] [CrossRef] [Green Version]
  86. Gaburro, E.; Castro, M.; Dumbser, M. Well balanced Arbitrary-Lagrangian-Eulerian finite volume schemes on moving nonconforming meshes for the Euler equations of gasdynamics with gravity. Mon. Not. R. Astron. Soc. 2018, 477, 2251–2275. [Google Scholar] [CrossRef]
  87. Toro, E.; Hidalgo, A.; Dumbser, M. FORCE Schemes on Unstructured Meshes I: Conservative Hyperbolic Systems. J. Comput. Phys. 2009, 228, 3368–3389. [Google Scholar] [CrossRef]
  88. Dumbser, M.; Toro, E.F. A Simple Extension of the Osher Riemann Solver to Non-Conservative Hyperbolic Systems. J. Sci. Comput. 2011, 48, 70–88. [Google Scholar] [CrossRef]
  89. Castro, M.; Pardo, A.; Parés, C.; Toro, E. On some fast well-balanced first order solvers for nonconservative systems. Math. Comput. 2010, 79, 1427–1472. [Google Scholar] [CrossRef]
  90. Dumbser, M.; Toro, E.F. On Universal Osher–Type Schemes for General Nonlinear Hyperbolic Conservation Laws. Commun. Comput. Phys. 2011, 10, 635–671. [Google Scholar] [CrossRef]
  91. Einfeldt, B.; Roe, P.L.; Munz, C.D.; Sjogreen, B. On Godunov-type methods near low densities. J. Comput. Phys. 1991, 92, 273–295. [Google Scholar] [CrossRef]
  92. Dumbser, M.; Balsara, D. A New, Efficient Formulation of the HLLEM Riemann Solver for General Conservative and Non-Conservative Hyperbolic Systems. J. Comput. Phys. 2016, 304, 275–319. [Google Scholar] [CrossRef]
  93. Dumbser, M.; Enaux, C.; Toro, E. Finite Volume Schemes of Very High Order of Accuracy for Stiff Hyperbolic Balance Laws. J. Comput. Phys. 2008, 227, 3971–4001. [Google Scholar] [CrossRef]
  94. Dumbser, M.; Zanotti, O. Very High Order PNPM Schemes on Unstructured Meshes for the Resistive Relativistic MHD Equations. J. Comput. Phys. 2009, 228, 6991–7006. [Google Scholar] [CrossRef]
  95. Harten, A.; Engquist, B.; Osher, S.; Chakravarthy, S. Uniformly high order essentially non-oscillatory schemes, III. J. Comput. Phys. 1987, 71, 231–303. [Google Scholar] [CrossRef]
  96. Dumbser, M.; Käser, M.; Titarev, V.; Toro, E. Quadrature-Free Non-Oscillatory Finite Volume Schemes on Unstructured Meshes for Nonlinear Hyperbolic Systems. J. Comput. Phys. 2007, 226, 204–243. [Google Scholar] [CrossRef]
  97. Taube, A.; Dumbser, M.; Balsara, D.; Munz, C. Arbitrary High Order Discontinuous Galerkin Schemes for the Magnetohydrodynamic Equations. J. Sci. Comput. 2007, 30, 441–464. [Google Scholar] [CrossRef]
  98. Hidalgo, A.; Dumbser, M. ADER Schemes for Nonlinear Systems of Stiff Advection-Diffusion-Reaction Equations. J. Sci. Comput. 2011, 48, 173–189. [Google Scholar] [CrossRef]
  99. Jackson, H. On the eigenvalues of the ADER-WENO Galerkin predictor. J. Comput. Phys. 2017, 333, 409–413. [Google Scholar] [CrossRef]
  100. Zanotti, O.; Dumbser, M. Efficient conservative ADER schemes based on WENO reconstruction and space-time predictor in primitive variables. Comput. Astrophys. Cosmol. 2016, 3, 1. [Google Scholar] [CrossRef]
  101. Owren, B.; Zennaro, M. Derivation of efficient, continuous, explicit Runge–Kutta methods. SIAM J. Sci. Stat. Comput. 1992, 13, 1488–1501. [Google Scholar] [CrossRef]
  102. Gassner, G.; Dumbser, M.; Hindenlang, F.; Munz, C. Explicit one-step time discretizations for discontinuous Galerkin and finite volume schemes based on local predictors. J. Comput. Phys. 2011, 230, 4232–4247. [Google Scholar] [CrossRef]
  103. Heinecke, A.; Pabst, H.; Henry, G. LIBXSMM: A High Performance Library for Small Matrix Multiplications. Technical Report, SC’15: The International Conference for High Performance Computing, Networking, Storage and Analysis, Austin (Texas), 2015. Available online: https://github.com/hfp/libxsmm (accessed on 23 August 2018).
  104. Breuer, A.; Heinecke, A.; Bader, M.; Pelties, C. Accelerating SeisSol by generating vectorized code for sparse matrix operators. Adv. Parallel Comput. 2014, 25, 347–356. [Google Scholar]
  105. Breuer, A.; Heinecke, A.; Rettenberger, S.; Bader, M.; Gabriel, A.; Pelties, C. Sustained petascale performance of seismic simulations with SeisSol on SuperMUC. Lect. Notes Comput. Sci. (LNCS) 2014, 8488, 1–18. [Google Scholar]
  106. Charrier, D.; Weinzierl, T. Stop talking to me—A communication-avoiding ADER-DG realisation. arXiv, 2018; arXiv:1801.08682. [Google Scholar]
  107. Boscheri, W.; Dumbser, M. Arbitrary–Lagrangian–Eulerian Discontinuous Galerkin schemes with a posteriori subcell finite volume limiting on moving unstructured meshes. J. Comput. Phys. 2017, 346, 449–479. [Google Scholar] [CrossRef]
  108. Clain, S.; Diot, S.; Loubère, R. A high-order finite volume method for systems of conservation laws—Multi-dimensional Optimal Order Detection (MOOD). J. Comput. Phys. 2011, 230, 4028–4050. [Google Scholar] [CrossRef] [Green Version]
  109. Diot, S.; Clain, S.; Loubère, R. Improved detection criteria for the Multi-dimensional Optimal Order Detection (MOOD) on unstructured meshes with very high-order polynomials. Comput. Fluids 2012, 64, 43–63. [Google Scholar] [CrossRef] [Green Version]
  110. Diot, S.; Loubère, R.; Clain, S. The MOOD method in the three-dimensional case: Very-High-Order Finite Volume Method for Hyperbolic Systems. Int. J. Numer. Methods Fluids 2013, 73, 362–392. [Google Scholar] [CrossRef]
  111. Loubère, R.; Dumbser, M.; Diot, S. A New Family of High Order Unstructured MOOD and ADER Finite Volume Schemes for Multidimensional Systems of Hyperbolic Conservation Laws. Commun. Comput. Phys. 2014, 16, 718–763. [Google Scholar] [CrossRef] [Green Version]
  112. Levy, D.; Puppo, G.; Russo, G. Central WENO schemes for hyperbolic systems of conservation laws. M2AN Math. Model. Numer. Anal. 1999, 33, 547–571. [Google Scholar] [CrossRef] [Green Version]
  113. Levy, D.; Puppo, G.; Russo, G. Compact central WENO schemes for multidimensional conservation laws. SIAM J. Sci. Comput. 2000, 22, 656–672. [Google Scholar] [CrossRef]
  114. Dumbser, M.; Boscheri, W.; Semplice, M.; Russo, G. Central weighted ENO schemes for hyperbolic conservation laws on fixed and moving unstructured meshes. SIAM J. Sci. Comput. 2017, 39, A2564–A2591. [Google Scholar] [CrossRef]
  115. Hu, C.; Shu, C. Weighted essentially non-oscillatory schemes on triangular meshes. J. Comput. Phys. 1999, 150, 97–127. [Google Scholar] [CrossRef]
  116. Sedov, L. Similarity and Dimensional Methods in Mechanics; Academic Press: New York, NY, USA, 1959. [Google Scholar]
  117. Kamm, J.; Timmes, F. On Efficient Generation of Numerically Robust Sedov Solutions; Technical Report LA-UR-07-2849; LANL: Los Alamos, NM, USA, 2007. [Google Scholar]
  118. Tavelli, M.; Dumbser, M.; Charrier, D.; Rannabauer, L.; Weinzierl, T.; Bader, M. A simple diffuse interface approach on adaptive Cartesian grids for the linear elastic wave equations with complex topography. arXiv, 2018; arXiv:1804.09491. [Google Scholar]
  119. Dumbser, M.; Käser, M. An arbitrary high-order discontinuous Galerkin method for elastic waves on unstructured meshes—II. The three-dimensional isotropic case. Geophys. J. Int. 2006, 167, 319–336. [Google Scholar] [CrossRef]
  120. Godunov, S.K.; Zabrodin, A.V.; Prokopov, G.P. A Difference Scheme for Two-Dimensional Unsteady Aerodynamics. J. Comp. Math. Math. Phys. USSR 1961, 2, 1020–1050. [Google Scholar]
  121. Antón, L.; Zanotti, O.; Miralles, J.A.; Martí, J.M.; Ibáñez, J.M.; Font, J.A.; Pons, J.A. Numerical 3+1 general relativistic magnetohydrodynamics: A local characteristic approach. Astrophys. J. 2006, 637, 296. [Google Scholar] [CrossRef]
  122. Zanna, L.D.; Zanotti, O.; Bucciantini, N.; Londrillo, P. ECHO: An Eulerian Conservative High Order scheme for general relativistic magnetohydrodynamics and magnetodynamics. Astron. Astrophys. 2007, 473, 11–30. [Google Scholar] [CrossRef]
  123. Löhner, R. An adaptive finite element scheme for transient problems in CFD. Comput. Methods Appl. Mech. Eng. 1987, 61, 323–338. [Google Scholar] [CrossRef]
  124. Radice, D.; Rezzolla, L.; Galeazzi, F. High-order fully general-relativistic hydrodynamics: New approaches and tests. Class. Quantum Gravity 2014, 31, 075012. [Google Scholar] [CrossRef]
  125. Bermúdez, A.; Vázquez, M. Upwind methods for hyperbolic conservation laws with source terms. Comput. Fluids 1994, 23, 1049–1071. [Google Scholar] [CrossRef]
  126. Alic, D.; Bona-Casas, C.; Bona, C.; Rezzolla, L.; Palenzuela, C. Conformal and covariant formulation of the Z4 system with constraint-violation damping. Phys. Rev. D 2012, 85, 064040. [Google Scholar] [CrossRef] [Green Version]
  127. Dedner, A.; Kemm, F.; Kröner, D.; Munz, C.D.; Schnitzer, T.; Wesenberg, M. Hyperbolic Divergence Cleaning for the MHD Equations. J. Comput. Phys. 2002, 175, 645–673. [Google Scholar] [CrossRef] [Green Version]
  128. Gundlach, C.; Martin-Garcia, J. Symmetric hyperbolic form of systems of second-order evolution equations subject to constraints. Phys. Rev. D 2004, 70, 044031. [Google Scholar] [CrossRef]
  129. Dumbser, M.; Guercilena, F.; Köppel, S.; Rezzolla, L.; Zanotti, O. Conformal and covariant Z4 formulation of the Einstein equations: Strongly hyperbolic first–order reduction and solution with discontinuous Galerkin schemes. Phys. Rev. D 2018, 97, 084053. [Google Scholar] [CrossRef]
Figure 1. Sedov blast wave problem using an ADER-DG P 9 scheme with a posteriori subcell finite volume limiter using predictor and limiter in primitive variables, see Reference [100]. Unlimited cells are depicted in blue, while limited cells are highlighted in red (left). 1D cut through the numerical solution and comparison with the exact solution (right).
Figure 1. Sedov blast wave problem using an ADER-DG P 9 scheme with a posteriori subcell finite volume limiter using predictor and limiter in primitive variables, see Reference [100]. Unlimited cells are depicted in blue, while limited cells are highlighted in red (left). 1D cut through the numerical solution and comparison with the exact solution (right).
Axioms 07 00063 g001
Figure 2. Wave field of a seismic wave propagation problem with the novel diffuse interface approach on adaptive Cartesian grids developed in Reference [118] (left) compared with the reference solution obtained on a classical boundary-fitted unstructured tetrahedral mesh [119] (right).
Figure 2. Wave field of a seismic wave propagation problem with the novel diffuse interface approach on adaptive Cartesian grids developed in Reference [118] (left) compared with the reference solution obtained on a classical boundary-fitted unstructured tetrahedral mesh [119] (right).
Axioms 07 00063 g002
Figure 3. Seismogram recordings in two observation points obtained with the diffuse interface approach on adaptive Cartesian meshes [118] and with a reference solution obtained with high order ADER-DG schemes on boundary-fitted unstructured meshes [119].
Figure 3. Seismogram recordings in two observation points obtained with the diffuse interface approach on adaptive Cartesian meshes [118] and with a reference solution obtained with high order ADER-DG schemes on boundary-fitted unstructured meshes [119].
Axioms 07 00063 g003
Figure 4. Viscous heat conducting shock. Comparison of the exact solution of the compressible Navier-Stokes equations with the numerical solution of the GPR model based on ADER-DG P 3 schemes. Density profile (top left), velocity profile (top right), heat flux (bottom left) and stress σ 11 (bottom right).
Figure 4. Viscous heat conducting shock. Comparison of the exact solution of the compressible Navier-Stokes equations with the numerical solution of the GPR model based on ADER-DG P 3 schemes. Density profile (top left), velocity profile (top right), heat flux (bottom left) and stress σ 11 (bottom right).
Axioms 07 00063 g004
Figure 5. Results for the GRMHD Orszag-Tang vortex problem in flat space–time (SRMHD) at t = 2 obtained with ADER-DG- P 5 schemes supplemented with a posteriori subcell finite volume limiter and using different refinement estimator functions χ . A set of 1D cuts taken at y = 10 2 are shown. From (left) to (right): the rest-mass density, the velocity u and the magnetic field component B x . One can note an excellent agreement between the reference solution and the ones obtained on different AMR grids.
Figure 5. Results for the GRMHD Orszag-Tang vortex problem in flat space–time (SRMHD) at t = 2 obtained with ADER-DG- P 5 schemes supplemented with a posteriori subcell finite volume limiter and using different refinement estimator functions χ . A set of 1D cuts taken at y = 10 2 are shown. From (left) to (right): the rest-mass density, the velocity u and the magnetic field component B x . One can note an excellent agreement between the reference solution and the ones obtained on different AMR grids.
Axioms 07 00063 g005
Figure 6. Results for the GRMHD Orszag-Tang vortex problem in flat space–time (SRMHD) at t = 2 obtained with ADER-DG- P 5 schemes, supplemented with a posteriori subcell finite volume limiter and using different refinement estimator functions χ . (i) first order-derivative estimator χ 1 (top left); (ii) second-order derivative estimator χ 2 (top right); (iii) a new limiter-based estimator χ 3 (row two, left) and (iv) a new multi-resolution estimator χ 4 based on the difference between the discrete solution on two adjacent refinement levels (row two, right).
Figure 6. Results for the GRMHD Orszag-Tang vortex problem in flat space–time (SRMHD) at t = 2 obtained with ADER-DG- P 5 schemes, supplemented with a posteriori subcell finite volume limiter and using different refinement estimator functions χ . (i) first order-derivative estimator χ 1 (top left); (ii) second-order derivative estimator χ 2 (top right); (iii) a new limiter-based estimator χ 3 (row two, left) and (iv) a new multi-resolution estimator χ 4 based on the difference between the discrete solution on two adjacent refinement levels (row two, right).
Axioms 07 00063 g006
Figure 7. Computational results for a stable 3D neutron star. Time series of the relative error of the central rest mass density ρ ( 0 , t ) ρ ( 0 , 0 ) / ρ ( 0 , 0 ) (left) and 3D view of of the pressure contour surfaces at time t = 1000 (right).
Figure 7. Computational results for a stable 3D neutron star. Time series of the relative error of the central rest mass density ρ ( 0 , t ) ρ ( 0 , 0 ) / ρ ( 0 , 0 ) (left) and 3D view of of the pressure contour surfaces at time t = 1000 (right).
Axioms 07 00063 g007
Figure 8. Computational results for a stable 3D neutron star. Comparison of the numerical solution with the exact one at time t = 1000 on a 1D cut along the x-axis for the rest mass density (left) and the pressure (right).
Figure 8. Computational results for a stable 3D neutron star. Comparison of the numerical solution with the exact one at time t = 1000 on a 1D cut along the x-axis for the rest mass density (left) and the pressure (right).
Axioms 07 00063 g008
Figure 9. Computational results for a stable 3D neutron star. Cut through the xy plane with pressure on the z axis and rest mass density contour colors. Exact solution (left) and numerical solution at time t = 1000 (right).
Figure 9. Computational results for a stable 3D neutron star. Cut through the xy plane with pressure on the z axis and rest mass density contour colors. Exact solution (left) and numerical solution at time t = 1000 (right).
Axioms 07 00063 g009
Figure 10. Strong MPI scaling study of ADER-DG schemes for the novel FO-CCZ4 formulation of the Einstein field equations recently proposed in Reference [129]. (Left) comparison of ADER-DG schemes with conventional Runge-Kutta DG schemes from 64 to 64,000 CPU cores on the SuperMUC phase 1 system of the LRZ supercomputing center (Garching, Germany). (Right) strong scaling study from 720 to 180,000 CPU cores, including a full machine run on the Hazel Hen supercomputer of HLRS (Stuttgart, Germany) with ADER-DG schemes (right). Even on the full machine we observe still more than 90% of parallel efficiency.
Figure 10. Strong MPI scaling study of ADER-DG schemes for the novel FO-CCZ4 formulation of the Einstein field equations recently proposed in Reference [129]. (Left) comparison of ADER-DG schemes with conventional Runge-Kutta DG schemes from 64 to 64,000 CPU cores on the SuperMUC phase 1 system of the LRZ supercomputing center (Garching, Germany). (Right) strong scaling study from 720 to 180,000 CPU cores, including a full machine run on the Hazel Hen supercomputer of HLRS (Stuttgart, Germany) with ADER-DG schemes (right). Even on the full machine we observe still more than 90% of parallel efficiency.
Axioms 07 00063 g010
Table 1. L 1 , L 2 and L errors and numerical convergence rates obtained for the two-dimensional isentropic vortex test problem using different unlimited ADER-DG schemes, see Reference [36].
Table 1. L 1 , L 2 and L errors and numerical convergence rates obtained for the two-dimensional isentropic vortex test problem using different unlimited ADER-DG schemes, see Reference [36].
N x L 1 Error L 2 Error L Error L 1 Order L 2 Order L OrderTheor.
N = 3 25 5.77 × 10 4 9.42 × 10 5 7.84 × 10 5 4
50 2.75 × 10 5 4.52 × 10 6 4.09 × 10 6 4.394.384.26
75 4.36 × 10 6 7.89 × 10 7 7.55 × 10 7 4.554.304.17
100 1.21 × 10 6 2.37 × 10 7 2.38 × 10 7 4.464.174.01
N = 4 20 1.54 × 10 4 2.18 × 10 5 2.20 × 10 5 5
30 1.79 × 10 5 2.46 × 10 6 2.13 × 10 6 5.325.375.75
40 3.79 × 10 6 5.35 × 10 7 5.18 × 10 7 5.395.314.92
50 1.11 × 10 6 1.61 × 10 7 1.46 × 10 7 5.505.395.69
N = 5 10 9.72 × 10 4 1.59 × 10 4 2.00 × 10 4 6
20 1.56 × 10 5 2.13 × 10 6 2.14 × 10 6 5.966.226.55
30 1.14 × 10 6 1.64 × 10 7 1.91 × 10 7 6.456.335.96
40 2.17 × 10 7 2.97 × 10 8 3.59 × 10 8 5.775.935.82
Table 2. Accuracy and cost comparison between ADER-DG and RKDG schemes of different orders for the GRMHD equations in three space dimensions. The errors refer to the variable B y . The table also contains total wall clock times (WCT) measured in seconds using 512 MPI ranks of the SuperMUC phase 1 system at the LRZ in Garching, Germany.
Table 2. Accuracy and cost comparison between ADER-DG and RKDG schemes of different orders for the GRMHD equations in three space dimensions. The errors refer to the variable B y . The table also contains total wall clock times (WCT) measured in seconds using 512 MPI ranks of the SuperMUC phase 1 system at the LRZ in Garching, Germany.
N x L 2 Error L 2 OrderWCT [s] N x L 2 Error L 2 OrderWCT [s]
ADER-DG ( N = 3 )RKDG ( N = 3 )
8 7.6396 × 10 4 0.0938 8.0909 × 10 4 0.107
16 1.7575 × 10 5 5.441.37116 2.2921 × 10 5 5.141.394
24 6.7968 × 10 6 2.346.85424 7.3453 × 10 6 2.816.894
32 1.0537 × 10 6 6.4821.64232 1.3793 × 10 6 5.8121.116
ADER-DG ( N = 4 )RKDG ( N = 4 )
8 6.6955 × 10 5 0.3638 6.8104 × 10 5 0.456
16 2.2712 × 10 6 4.885.69616 2.3475 × 10 6 4.866.666
24 3.3023 × 10 7 4.7628.03624 3.3731 × 10 7 4.7829.186
32 7.4728 × 10 8 5.1789.27132 7.7084 × 10 8 5.1387.115
ADER-DG ( N = 5 )RKDG ( N = 5 )
8 5.2967 × 10 7 1.0908 5.7398 × 10 7 1.219
16 7.4886 × 10 9 6.1416.71016 8.1461 × 10 9 6.1417.310
24 7.1879 × 10 10 5.7884.42524 7.7634 × 10 10 5.8083.777
32 1.2738 × 10 10 6.01263.02132 1.3924 × 10 10 5.97260.859

Share and Cite

MDPI and ACS Style

Dumbser, M.; Fambri, F.; Tavelli, M.; Bader, M.; Weinzierl, T. Efficient Implementation of ADER Discontinuous Galerkin Schemes for a Scalable Hyperbolic PDE Engine. Axioms 2018, 7, 63. https://doi.org/10.3390/axioms7030063

AMA Style

Dumbser M, Fambri F, Tavelli M, Bader M, Weinzierl T. Efficient Implementation of ADER Discontinuous Galerkin Schemes for a Scalable Hyperbolic PDE Engine. Axioms. 2018; 7(3):63. https://doi.org/10.3390/axioms7030063

Chicago/Turabian Style

Dumbser, Michael, Francesco Fambri, Maurizio Tavelli, Michael Bader, and Tobias Weinzierl. 2018. "Efficient Implementation of ADER Discontinuous Galerkin Schemes for a Scalable Hyperbolic PDE Engine" Axioms 7, no. 3: 63. https://doi.org/10.3390/axioms7030063

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop