An algorithmic comparison of the Hyper-Reduction and the Discrete Empirical Interpolation Method for a nonlinear thermal problem

An algorithmic summary and comparison of the methodological and numerical properties of competing parametric model reduction techniques is presented for a planar nonlinear thermal conduction problem. First, the Galerkin reduced basis (RB) formulation is presented, which fails to provide significant gains in computational efficiency for the nonlinear problem. Renowned methods for reducing the computing time of nonlinear reduced order models are the Hyper-Reduction and the (Discrete) Empirical Interpolation Method (EIM, DEIM). An algorithmic description and a methodological comparison of both methods are provided. The accuracy of the predictions of the hyper-reduced model and of the (D)EIM in comparison to the Galerkin RB is investigated. All three approaches are applied to uncertainty quantification, and the results are compared to computationally intense finite element simulations.


Introduction
Numerical models in engineering and the natural sciences are becoming more and more complex; they may be nonlinear and depend on unknown or controllable design parameters. Simultaneously, simulation settings increasingly move from single forward simulations to higher-level simulation scenarios: optimization and statistical investigations require multiple solves, interactive applications require real-time simulation response, and slim computing environments, e.g. simple controllers, require rapid and memory-saving models. For such applications the field of model reduction has gained increasing attention during the last decade. The goal is to accelerate a given numerical model by constructing a low-dimensional approximate surrogate, the so-called reduced order model. Due to the reduced dimension, the computation should ideally be rapid and hence applicable in the mentioned multi-query, real-time or slim-computing simulation scenarios. Well-known techniques for linear problems comprise Proper Orthogonal Decomposition (POD) [1,2] and control-theoretic approaches such as Balanced Truncation, Moment Matching or Hankel-norm approximation [3]. For parametric problems, certified Reduced Basis (RB) methods have been developed [4,5]. Nonlinear problems pose additional challenges. In particular, POD suffers from the well-known drawback that a high-dimensional reconstruction of the reduced solution is required for each evaluation of the nonlinearity. Several techniques provide a remedy for this problem, mainly sampling-based approaches such as Empirical Interpolation [6] and its discrete variants [7,8,9,10], or Hyper-Reduction [11,12]. Further approaches exist as well, such as Gappy-POD [13], Missing Point Estimation (MPE) [14] or Gauss-Newton with Approximated Tensors (GNAT) [15]. Most of these methods identify a subset of the arguments of the nonlinear function.
Then, based solely on the evaluation of these few components, they construct an approximation of the full solution. Such nonlinear problems emerge in many applications. For instance, the effective behavior of dissipative microstructured materials needs to be predicted by nonlinear homogenization techniques. These imply a multi-query context in the sense of different loadings (and load paths) applied to the reference volume element in order to obtain the related effective mechanical response. Reduced basis methods combining the purely algorithmic gains of the reduced basis with a reformulation of the problem incorporating micromechanics have been shown to be efficient for the prediction of the effective material response, for multi-level FE simulations and for nonlinear multiscale topology optimization [e.g., 16,17].
In the current paper, we want to compare two of these efficient methods, namely the Discrete Empirical Interpolation Method (DEIM) and the Hyper-Reduction (HR), for the reduction of a nonlinear parametric thermal model. We chose these two methods as the authors have extensive experience with them, which promises a fair comparison. Including more procedures (and potentially suboptimal implementations) in the study might lead to biased results.
The paper is structured as follows. We introduce the nonlinear parametrized thermal model problem in Section 2. Subsequently, in Section 3 the methods under investigation are introduced and formally compared: As a benchmark the POD-Galerkin procedure is formulated, then the DEIM and the hyper-reduction technique. The numerical comparison is provided in Section 4, and we conclude in Section 5.

Strong formulation
In the following we consider a stationary planar nonlinear heat conduction problem on a plate with a hole, see Fig. 1. The problem is parametrized by a vector p. The nonlinearity of the problem is introduced by an isotropic Fourier-type heat conductivity µ(u; p) that depends on the current temperature u and on the parameter vector. The Dirichlet and Neumann boundary data are denoted by u* and q* ≡ 0, respectively. The corresponding Dirichlet and Neumann boundaries are denoted by Γ_1 and Γ_2, respectively. The strong formulation of the boundary value problem reads

−∇ · (µ(u; p) ∇u) = 0 in Ω,    u = u*(p) on Γ_1,    −µ(u; p) ∇u · n = q* on Γ_2,    (1)

where µ is the temperature-dependent conductivity.

Weak formulation
The weak form of the heat conduction problem (1) is given by the quasi-linear form

∫_Ω µ(u; p) ∇u · ∇v dΩ = 0  for all v ∈ V_0,

with the unknown temperature field u ∈ V = V_0 + {u*} sought after. The function space V_0 := {u ∈ H^1(Ω) : u = 0 on Γ_1} is referred to as the space of test functions vanishing on the Dirichlet boundary Γ_1. The Dirichlet conditions on Γ_1 are assumed to depend on two parameters g_x and g_y via

u*(x, y; p) := g_x x + g_y y,  g_x, g_y ∈ [0, 1],  (x, y) ∈ Ω.    (3)
While the solution space V depends on the parameters p via u*, the space of test functions V_0 is independent of p. For the heat conductivity µ, an explicit dependence on the temperature via the nonlinear constitutive model is assumed (see also Fig. 1, right). The parameter vector p in the present context collects the boundary parameters g_x, g_y and the constitutive parameters. In the current study, attention is confined to the parameter range specified in Section 4.

Discrete formulation
In order to solve the nonlinear problem (1), Finite Elements (FE) are used in a classical Galerkin formulation [e.g., 18,19]. The space of finite element test functions V_0^h ⊂ V_0 is assumed to contain n linearly independent and continuous ansatz functions ϕ_i. Assuming a solution expansion

ũ = u*(p) + ∑_{j=1}^n u_{h,j}(p) ϕ_j    (7)

with the vector of coefficients u(p) = (u_{h,j}(p))_{j=1}^n ∈ R^n, the discrete nonlinear equations that have to be solved for any given parameter vector p are

r_i(u; p) := ∫_Ω ∇ϕ_i · µ(ũ; p) ∇ũ dΩ = 0,  i = 1, …, n.    (8)

Note that the Dirichlet conditions defined by the prescribed boundary data u*(p) explicitly contribute to the residual equations. They are exactly captured by the discrete solution due to linearity. The condition (8) can be expressed in the compact vector notation

r(u; p) = 0.    (9)

In order to solve the nonlinear problem (9), expressed component-wise in (8), the finite element method is used to provide the functions for the expansion (7). In the following, N denotes a row vector of finite element ansatz functions for the temperature, and B is the gradient of N. The FE approximations of the temperature u and of its gradient ∇u are given by N u and B u (plus the respective Dirichlet lift contributions). Then the exact Jacobian of the finite element system is

J*(u; p) = ∫_Ω B^T µ(ũ; p) B dΩ + ∫_Ω B^T ∇ũ µ′(ũ; p) N dΩ.

In the following, the symmetric approximation

J(u; p) := ∫_Ω B^T µ(ũ; p) B dΩ    (12)

of J* is used. The numerical solution of the nonlinear problem (9) is then obtained by a fixed-point scheme using J as an approximation of the differential stiffness matrix J*. The resulting iteration scheme is commonly referred to as successive substitution [20, page 66]. Note also that J is well-defined for arbitrary u and p, while the exact Jacobian is not defined at the critical temperature u_c = (µ_1 − µ_0)/c, since the conductivity µ is non-differentiable at this temperature.
Algorithm 1 illustrates the nonlinear finite element solution procedure for the considered problem. The comment lines in the algorithm refer to the computational costs c_reloc (computation of u and ∇u), c_const (for the constitutive model), c_rhs (linked to the residual computation), c_Jac (related to the Jacobian) and c_sol (for the linear solver). These will be relevant in the comparison section.
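For illustration, the successive substitution scheme can be sketched in a few lines of Python. The following toy is not the paper's FE model: it uses a hypothetical 1D finite-difference analogue with a smooth conductivity µ(u) = µ_0 + c u² (the paper's constitutive model is piecewise linear), but it exhibits the same frozen-conductivity fixed-point structure.

```python
import numpy as np

def mu(u, mu0=1.0, c=0.5):
    # hypothetical smooth conductivity model (stand-in for the paper's mu(u; p))
    return mu0 + c * u**2

def solve_fixed_point(n=50, tol=1e-9, max_iter=200):
    """Successive substitution for -d/dx( mu(u) du/dx ) = 0, u(0)=0, u(1)=1."""
    x = np.linspace(0.0, 1.0, n + 2)
    u = x.copy()                               # initial guess satisfying the BCs
    for alpha in range(max_iter):
        # conductivity frozen at the previous iterate (cell midpoints)
        k = mu(0.5 * (u[:-1] + u[1:]))
        # assemble the symmetric frozen-coefficient stiffness J for interior nodes
        main = k[:-1] + k[1:]
        J = np.diag(main) - np.diag(k[1:-1], 1) - np.diag(k[1:-1], -1)
        rhs = np.zeros(n)
        rhs[-1] = k[-1] * 1.0                  # Dirichlet lift u(1) = 1
        u_new = u.copy()
        u_new[1:-1] = np.linalg.solve(J, rhs)  # linear solve with frozen mu
        if np.linalg.norm(u_new - u) < tol:    # stop on small temperature change
            return u_new, alpha + 1
        u = u_new
    return u, max_iter

u, iters = solve_fixed_point()
```

As in Algorithm 1, each pass re-evaluates the constitutive model at all integration points (here: cell midpoints), reassembles J and the right hand side, and solves a linear system; the iteration stops once the temperature update is sufficiently small.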

Reduced basis ansatz
In the following an m-dimensional basis of global ansatz functions ψ_k is considered in the reduced framework, where m ≪ n is implicitly asserted in order to obtain an actual model order reduction. The functions ψ_k define a linear subspace of V_0^h which can be characterized by a matrix V = (V_jk) ∈ R^{n×m} via ψ_k = ∑_{j=1}^n ϕ_j V_jk. The reduced coefficient vector γ = (γ_k)_{k=1}^m ∈ R^m defines the temperature field in the reduced setting by the affine relation ũ = u*(p) + ∑_{k=1}^m γ_k ψ_k. The corresponding vector of discrete values ũ := (ũ_j)_{j=1}^n ∈ R^n of the temperature with respect to the basis of V_0^h is given by the linear relation V γ for the homogeneous part. Hence, ũ can equivalently be expressed as

ũ = u*(p) + V γ.    (14)

In the following, the matrix V defining the space of the reduced ansatz functions is assumed to be given by a snapshot POD [21] obtained from full resolution finite element simulations evaluated at s different parameter vectors p^(i) (i = 1, …, s). The POD subspace Ṽ_0^h ⊂ V_0^h is then obtained as the best-approximating m-dimensional subspace with respect to the L²-error, i.e.
where P_Ṽ0 is the orthogonal projection onto the subspace Ṽ_0. Numerically, a POD basis of this space can be obtained from a corresponding matrix eigenvalue problem, e.g., cf. [22,1,5]. Here, we use the L² product for the computation of the symmetric snapshot inner product matrix.

Algorithm 2: snapshot proper orthogonal decomposition (snapshot POD)
Input : s snapshot parameters p^(i) ⊆ P; reduced dimension m (or: threshold δ)
Output: reduced basis functions ψ_k (k = 1, …, m) and coefficient matrix V
1 for i = 1, …, s do
2   solve high-fidelity problem (e.g. via FE solution; Algorithm 1) for parameter p^(i) → u^(i)
3   store u^(i) as i-th column of the snapshot matrix S
4 compute snapshot inner product matrix M := S^T L S with metric L (e.g. mass matrix)
5 compute m leading eigenpairs (q_k, λ_k) of M (k = 1, …, m)
6 set the columns of V as V_·k := S q_k / √λ_k (k = 1, …, m) and ψ_k = ∑_{j=1}^n ϕ_j V_jk

A pseudo-code implementation of the snapshot POD is given by Algorithm 2.
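The snapshot POD of Algorithm 2 translates almost line by line into NumPy; the sketch below uses a random snapshot matrix and the identity metric purely as placeholder data.

```python
import numpy as np

def snapshot_pod(S, L, m):
    """Snapshot POD (Algorithm 2): S is the n x s snapshot matrix, L the metric
    (e.g. the mass matrix); returns the coefficient matrix V and all eigenvalues."""
    M = S.T @ L @ S                      # s x s snapshot inner product matrix
    lam, Q = np.linalg.eigh(M)           # eigenpairs in ascending order
    lam, Q = lam[::-1], Q[:, ::-1]       # leading eigenpairs first
    V = S @ Q[:, :m] / np.sqrt(lam[:m])  # columns V_k = S q_k / sqrt(lam_k)
    return V, lam

# placeholder usage: random snapshots, identity metric
rng = np.random.default_rng(0)
S = rng.standard_normal((100, 10))
V, lam = snapshot_pod(S, np.eye(100), m=4)
```

By construction the resulting modes are L-orthonormal, i.e. V^T L V = I, which is the property exploited later for the simple convergence criterion of the reduced iteration.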
We are well aware that there are many other techniques to determine reduced projection spaces, e.g., greedy techniques, balanced truncation, Hankel-norm approximation, moment matching and others [e.g., 3]. However, we do not aim at a comparison of projection space techniques, but rather want to focus on the treatment of the nonlinearities. Therefore, we decide for the POD space as common reduced projection space for all subsequent methods.
Note that in view of (3) we have parameter separability for the Dirichlet data with two parameter-independent components. The discrete representation of the latter is given by u*(p) = g_x u_{*,1} + g_y u_{*,2}. Hence, any reduction scheme can still provide an approximation of u(p) via (14) which exactly matches the prescribed Dirichlet data. The components of the vectors u_{*,1}, u_{*,2} are chosen as (u_{*,1})_i = x_i and (u_{*,2})_i = y_i, where x_i and y_i denote the coordinates of node number i. Thereby, constant gradients within the structure are also represented exactly.

Galerkin reduced basis approximation
A classical Galerkin projection on a POD space is used to provide reference solutions for the other reduction methods. In order to solve the weak form of the nonlinear problem for given parameters g_x, g_y, a successive substitution is performed in which the local conductivity µ is computed using the previous iterate of the temperature ũ^(α) := u*(p) + N V γ^(α) in the reduced setting, where α ∈ N is the iteration number and γ^(α) is the iterate of the reduced degrees of freedom in iteration α. As in a classical Galerkin approach, we assume the (reduced) test function ṽ = ∑_{j=1}^m λ_j ψ_j with arbitrary coefficients λ_j (j = 1, …, m) represented by the vector λ. For convenience, we define the conductivity corresponding to the α-th iterate µ^(α) := µ(ũ^(α); p). Here γ^(α) can be interpreted as a constant parameter similar to p during the subsequent iteration, which provides the new iterate γ^(α+1) as the solution of the linearized reduced problem (22). Rewriting (22) using the approximated Jacobian J and the residual r leads to the linear system

V^T J(ũ^(α); p) V δγ^(α) = −V^T r(ũ^(α); p),  γ^(α+1) := γ^(α) + δγ^(α).    (23)

The projection of the residual onto the reduced basis defines the reduced residual V^T r. In view of the definition of the reduced basis by L²(Ω)-orthogonal POD modes, the norm of the reduced increment directly measures the change of the temperature field. This gives rise to the simple convergence criterion ‖δγ^(α)‖_{l²} < ε_max, i.e. the iteration should stop upon sufficiently small changes of the temperature field. Algorithm 3 summarizes the online phase of the Galerkin RB method.

Algorithm 3: Galerkin reduced basis solution (online phase)
Input : parameter vector p ∈ P; reduced temperature matrix T_i and gradient matrix G_i at the integration points i = 1, …, n_gp
Output: reduced vector γ and nodal temperatures ũ (optional)

The projected system can be interpreted as a finite element method with global, problem-specific ansatz functions ψ_k, whereas the classical finite element method uses local and rather general ansatz functions ϕ_j (e.g., piecewise defined polynomials).
Note that the solution of (22), (23) is also the minimizer of a corresponding potential. Therefore, variational methods can directly be applied to solve the minimization problem, and alternative numerical strategies are available. Such variational schemes are also used, e.g., in the context of solid mechanical problems involving internal variables [e.g., 23].
The Galerkin RB method with a well-chosen reduced basis ψ_k (represented via the matrix V) can replicate the FEM solution to high accuracy (see Section 4). It also provides a significant reduction of the memory requirements: instead of u ∈ R^n, only γ ∈ R^m needs to be stored. Despite the significant reduction of the number of unknowns from n to m, the Galerkin RB cannot attain substantial accelerations of the nonlinear simulation due to the computationally expensive assembly procedure with complexity O(n_gp) for the residual vector r and the fixed point operator J (compare c_rhs and c_Jac in Algorithm 3 and Algorithm 1). However, if the linear systems are not solved with optimal complexity, e.g., using sparse LU or Cholesky decompositions with at least O(n²) operations, then a reduction of complexity can still be achieved. It shall be pointed out that for very large n (i.e. for millions of unknowns) the linear solver usually dominates the overall computational expense. Then the Galerkin RB can provide good accelerations without further modifications.
In order to significantly improve the computational efficiency while maintaining the reduced number of degrees of freedom (and thus the reduced storage requirements), the Hyper-Reduction [11,12] and the Discrete Empirical Interpolation Method (DEIM, [7,8,9]) are used. Both methods are specifically designed for the computationally efficient approximation of the nonlinearity of the PDE.

Discrete Empirical Interpolation Method (DEIM)
The empirical interpolation method (EIM) was introduced by [6] to approximate parametric or nonlinear functions by separable interpolants. This technique has meanwhile become standard in the reduced basis methodology for parametric PDEs. Discrete versions of the EIM for instationary problems have been introduced as empirical operator interpolation [24,7,8] or alternatively (and in some cases equivalently) as discrete empirical interpolation (DEIM) [9,10]. In particular, a-posteriori [24,8,25] and a-priori [10] error control is possible under certain assumptions.
We present a formulation for the present stationary problem. Instead of approximating a continuous field variable, the goal of the discrete versions is to provide an approximation r̃ for the vectorial nonlinearity of the nodal residual vector r of the form

r̃ := U (P^T U)^{−1} P^T r,    (27)

where the columns of U ∈ R^{n×M} are called the collateral reduced basis and P = [e_{i_1}, …, e_{i_M}] ∈ R^{n×M} is a projection matrix with interpolation indices (also known as magic points) i_1, …, i_M ∈ {1, …, n}, with e_i being the i-th unit vector. By multiplication of (27) with P^T, we verify that P^T r̃ = P^T r. In this sense the approximation acts as an interpolation in the set of magic points.
The identification of the interpolation points is an incremental procedure which is performed during the offline phase. We assume the existence of a sufficiently rich set of training snapshots Y, from which POD basis vectors u_1, …, u_M are computed. In step l, the interpolation residual q_l of the current basis vector u_l with respect to the already selected points is determined, and the next index is

i_l := arg max_{i ∈ {1,…,n}} |(q_l)_i|.

Finally, we set P := P_M, U := U_M and I := I_M, which concludes the construction of r̃. Intuitively, in each iteration, the interpolation error q_l for the current POD basis vector u_l is determined, and the vector entry i_l with maximum absolute value gives the next index. Regularity of the matrix P^T U is required for a well-defined interpolation. This condition is automatically satisfied under the aforementioned assumption of a sufficiently rich set of snapshots Y. As training set Y, one can either use samples of the nonlinearity ([8]), POD modes of these ([10]), snapshots of the state vector, or combinations thereof. In contrast to the instationary case, we may not use only training snapshots of r at converged states: as the residual r is zero for all snapshots, we would try to find a basis for a zero-dimensional space span(Y) = {0}, which is not possible for M > 0. But the residual at the intermediate (non-equilibrium) iterates is non-zero, and this is also a good target quantity for the (D)EIM, as these terms appear on the right hand side of the linear system during the fixed point iteration. Hence, a reasonable set Y is obtained from the residuals of the intermediate iterates of the full simulations, where α_1, …, α_s are the numbers of fixed point iterations of the full simulation scheme for parameters p^(1), …, p^(s).
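The greedy point selection can be sketched compactly in NumPy. The parametric test function below is our own placeholder; only the incremental residual/argmax structure mirrors the procedure described above.

```python
import numpy as np

def deim_points(U):
    """Greedy DEIM magic-point selection from a collateral basis U (n x M)."""
    n, M = U.shape
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for l in range(1, M):
        # interpolation residual q_l of the l-th basis vector w.r.t. current points
        c = np.linalg.solve(U[idx, :l], U[idx, l])
        q = U[:, l] - U[:, :l] @ c
        # q vanishes at the already selected points, so a new index is found
        idx.append(int(np.argmax(np.abs(q))))
    return np.array(idx)

# toy collateral basis: POD modes (left singular vectors) of snapshots of a
# hypothetical parametric nonlinearity on 200 grid points
x = np.linspace(0.0, 1.0, 200)
snaps = np.column_stack([np.exp(-m_) * np.sin(3 * x * m_)
                         for m_ in np.linspace(1, 4, 25)] )
snaps = snaps + np.column_stack([np.cos(m_ * x) for m_ in np.linspace(1, 4, 25)])
U = np.linalg.svd(snaps, full_matrices=False)[0][:, :6]
idx = deim_points(U)
```

Since the residual q_l is exactly zero at the previously selected indices, the argmax always yields a fresh point, and the interpolation matrix U[idx, :] stays regular as long as the columns of U are linearly independent.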
Inserting r̃ from (27) for the nonlinearity into the full problem and projecting by left-multiplication with a weight matrix W ∈ R^{n×m} yields the POD-DEIM reduced m-dimensional nonlinear system for the unknown γ. This low-dimensional nonlinear problem is iteratively solved by a fixed point procedure, i.e. at the current approximation γ^(α) we solve the linearized problem for δγ^(α) and set γ^(α+1) := γ^(α) + δγ^(α). If M < m, this linear system cannot be solved uniquely. In that case, an alternative would be to solve a residual least-squares problem, similar to the GNAT procedure, cf. [15]. Note that the assembly of this system does not involve any high-dimensional operations, as the product of the first four matrices on the left and right hand side can be precomputed as a small matrix X := W^T U (P^T U)^{−1}. The terms P^T J, P^T r then also do not require any high-dimensional operations, as the multiplication with P^T(·) = (·)_I corresponds to an evaluation of the "magic" rows of the Jacobian and right hand side, respectively. Typically in discretized PDEs, these M rows only depend on few M̃ entries of the unknown variable (e.g. the DOFs related to neighboring elements). This number M̃ is typically bounded by a certain multiple of M due to regularity constraints on the mesh [8].
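The offline precomputation of X and the resulting small online step can be written as follows. This is an illustrative sketch with our own names; for compactness the full J and r are formed, whereas an actual implementation would assemble only the rows selected by `idx`.

```python
import numpy as np

def deim_offline(W, U, idx):
    """Offline: small matrix X = W^T U (P^T U)^{-1}; P^T(.) is row extraction."""
    return W.T @ U @ np.linalg.inv(U[idx, :])

def deim_step(X, idx, J, r, V):
    """One linearized DEIM step: solve  X (P^T J V) d_gamma = -X (P^T r)."""
    return np.linalg.solve(X @ J[idx, :] @ V, -X @ r[idx])

# consistency check from the text: with M = n (all rows are magic points) and an
# invertible square collateral basis, X P^T = W^T and the DEIM step reduces to
# the plain Galerkin step (here W = V)
rng = np.random.default_rng(2)
n, m = 12, 3
V = np.linalg.qr(rng.standard_normal((n, m)))[0]
J = np.eye(n) + 0.1 * rng.standard_normal((n, n))
r = rng.standard_normal(n)
U = np.linalg.qr(rng.standard_normal((n, n)))[0]
idx = np.arange(n)
X = deim_offline(V, U, idx)
d_deim = deim_step(X, idx, J, r, V)
d_gal = np.linalg.solve(V.T @ J @ V, -V.T @ r)
```

Only the m x M matrix X survives into the online phase; everything high-dimensional is confined to the offline stage.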
From (34) it can be recognized what is required of the collateral basis U: if J(V γ; p)V δγ ∈ colspan(U) and r(V γ; p) ∈ colspan(U), then we are exactly solving the Galerkin-POD reduced linearized system W^T J V δγ^(α) = −W^T r. Hence, this gives a guideline for an alternative reasonable choice of the training set Y, namely consisting of snapshots of both (columns of) J or JV and r.
In the Galerkin projection case, one can choose W = V. This is the choice that we pursue in the experiments, in order to make the procedure more similar to the other reduction approaches. The offline phase of the DEIM is summarized in Algorithm 4 and the online phase in Algorithm 5.

Hyper-Reduction (HR)
In order to improve the numerical efficiency, the Hyper-Reduction method [11] introduces a Reduced Integration Domain (RID) denoted Ω_Z ⊂ Ω. The RID depends on the reduced basis and is constructed by offline algebraic operations. The hyper-reduced equations are a Petrov-Galerkin formulation of the equilibrium equations, obtained by using truncated test functions having zero values outside the RID. The vector form of the reduced equations is similar to the one obtained by the Missing Point Estimation method [26] proposed for the Finite Volume Method. The strength of the Hyper-Reduction is its ability to reduce mechanical models in materials science while keeping the formulation of the constitutive equations unchanged [12]. The smaller the RID, the lower the computational complexity and the higher the approximation errors. These points have been developed in previous papers dealing with various mechanical problems [e.g. 27,28].
The offline procedure of the Hyper-Reduction method involves two steps. The first step is the construction of the Reduced Integration Domain Ω_Z. For the present benchmark test the RID is the union of a subdomain Ω_u generated from the reduced vector gradients (∇ψ_k)_{k=1,…,m} and a domain Ω_+ corresponding to a set of elements neighboring this subdomain. Usually in the Hyper-Reduction method, the user can select an additional subdomain of Ω in order to extend the RID over a region of interest. This subdomain is denoted by Ω_user. In the sequel, to obtain small RIDs, Ω_user is empty. The set Ω_u consists of aggregated contributions Ω_u^k, k = 1, …, m, from all the reduced vectors: Ω_u = ∪_{k=1}^m Ω_u^k. To give the full details of the procedure, we introduce the partition of the domain into finite elements (Ω_j^e ⊂ Ω)_{j=1,…,n_el}: Ω = ∪_{j=1}^{n_el} Ω_j^e, where n_el is the number of elements in the mesh. The domain Ω_u^k is the element in which the maximum L² norm of the reduced vector ∇̃ψ_k is attained. In [11], ∇̃ψ_k was set equal to ∇ψ_k. Here, when applying the DEIM to (∇ψ_k)_{k=1,…,m}, the interpolation residuals provide a new reduced basis (q_k)_{k=1,…,m} (cf. Algorithm 6) related to temperature gradients. In this paper, ∇̃ψ_k is the output reduced basis produced by the DEIM when it is applied to (∇ψ_k)_{k=1,…,m}. Other procedures for the RID construction are available in previous papers on hyper-reduction. The element selection reads, for k = 1, …, m:

Ω_u^k := arg max_{Ω_j^e, j=1,…,n_el} ‖∇̃ψ_k‖_{L²(Ω_j^e)},

where ‖·‖_{L²(Ω_j^e)} is the L² norm restricted to the element Ω_j^e. Several layers of surrounding elements, denoted Ω_+, can be added to Ω_u.
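The element selection and the layer growth can be sketched as follows, with hypothetical data structures of our choosing: `grad_norms[k, j]` holds the element-wise norm of the k-th gradient mode, and `adjacency` is an element-neighbor list.

```python
import numpy as np

def select_rid_elements(grad_norms, n_layers, adjacency):
    """RID construction sketch: per mode k, pick the element maximising
    grad_norms[k, :], then add 'n_layers' layers of neighbouring elements."""
    rid = {int(np.argmax(grad_norms[k])) for k in range(grad_norms.shape[0])}
    for _ in range(n_layers):
        rid |= {nb for e in rid for nb in adjacency[e]}
    return sorted(rid)

# 5 elements in a chain, 2 modes peaking in elements 1 and 3
grad_norms = np.array([[0.1, 1.0, 0.2, 0.0, 0.0],
                       [0.0, 0.2, 0.1, 3.0, 0.5]])
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rid0 = select_rid_elements(grad_norms, 0, adjacency)  # -> [1, 3]
rid1 = select_rid_elements(grad_norms, 1, adjacency)  # one layer of neighbours
```

Adding layers both connects the scattered seed elements and enlarges the set of equilibrium equations available on the RID, which matters for the rank condition discussed below.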
The second step of the offline Hyper-Reduction procedure is the generation of truncated test functions which are zero outside of the RID. The truncation operator P_Z sets all nodal values outside the RID to zero: for u^h ∈ V_0^h with nodal coefficients u_j, the coefficients of P_Z u^h equal u_j for j ∈ F and zero otherwise. Here, F is the set of indices of internal points, i.e., inner FE nodes in Ω_Z, which are related to the FE equilibrium equations that remain available when predictions are forecasted only over Ω_Z. The operator P_Z can be represented by a truncated projector denoted Z. More precisely, if F = {i_1, i_2, …, i_l} with l := card(F), then Z := [e_{i_1}, …, e_{i_l}] ∈ R^{n×l}, with e_i ∈ R^n the i-th unit vector. Therefore, the hyper-reduced form of the linearized prediction step is: for a given γ^(α), find δγ^(α) such that

V^T Z Z^T J(ũ^(α); p) V δγ^(α) = −V^T Z Z^T r(ũ^(α); p),    (41)

where J is given by (12) and γ^(α+1) := γ^(α) + δγ^(α).
In addition to Z we also introduce the operator Z̃ ∈ R^{n×l̃} that is a truncated projection operator onto the l̃ points contained in the RID. In practice, the discrete unknowns are computed at these l̃ ≥ l points in order to compute the residual at the l inner points. Note that often l̃ is significantly larger than l, especially if the RID consists of disconnected (scattered) regions.
The complexity of the products related to the fixed point operator J on the left hand side scales with 2ζlm + 2lm², where ζ is the maximum number of non-zero entries per row of J. For the right hand side the computational complexity is 2lm. For both products, the complexity reduction factor is n/l. To obtain a well-posed hyper-reduced problem, the condition l ≥ m must be fulfilled. If this condition is not fulfilled, the linear system of equations (41) is rank deficient. In case of rank deficiency, one has to add more surrounding elements to the RID. The closer l is to m, and the lower m, the less complex is the solution of the hyper-reduced formulation. The RID construction must generate a sufficiently large RID. If not, the convergence can be hampered, the number of correction steps can increase and, moreover, the accuracy of the prediction can suffer. When Ω_Z = Ω, then Z is the identity matrix and the hyper-reduced formulation coincides with the usual system obtained by the Galerkin projection. An a posteriori error estimator for hyper-reduced approximations has recently been proposed in [29] for generalized standard materials.
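The rank condition l ≥ m can be verified numerically. The following sketch uses a synthetic orthonormal basis and J = I as a stand-in for the actual operator; it shows that the hyper-reduced system matrix in (41) becomes rank deficient whenever l < m.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 6
V = np.linalg.qr(rng.standard_normal((n, m)))[0]   # reduced basis, V^T V = I
J = np.eye(n)                                      # placeholder fixed-point operator
ranks = {}
for l in (4, 10):                                  # l < m versus l >= m
    Z = np.eye(n)[:, :l]                           # truncated projector onto l points
    A = V.T @ Z @ Z.T @ J @ V                      # m x m hyper-reduced system matrix
    ranks[l] = np.linalg.matrix_rank(A)            # rank(A) <= min(l, m)
```

With l = 4 < m = 6, the m x m matrix has rank at most 4 and the linear system is singular; enlarging the RID (here: l = 10) restores full rank.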
The offline phase of the hyper-reduction is summarized in Algorithm 6 and an algorithm of the online phase is provided in Algorithm 7. The concluding lines of Algorithm 6 read:
9 Ω_Z ← Ω_Z ∪ Ω_user ; // zone of interest is considered
10 construct Z̃ and Z based on all/internal-only points of Ω_Z

Methodological comparison
We comment on some formal commonalities and differences between the HR and the (D)EIM.

Algorithm 7: online phase of the Hyper-Reduction (HR)
Input : parameter vector p ∈ P; reduced basis; HR offline outputs
Output: reduced vector γ and nodal temperatures ũ (optional)
1 set u^(0) = u*(p); α = 0 ; // initialization
2 set r = 0, J = 0 ; // reset r.h.s. and Jacobian
3 for e ∈ Ω_Z do // loop over reduced integration domain

We first note that both methods reproduce the Galerkin-POD case if l = M = n. For the HR this means that the RID is the full domain, which implies that Z is a square permutation matrix, hence invertible, yielding ZZ^T = I; thus (41) reduces to the POD-Galerkin reduced system (23). For the (D)EIM this implies that the magic points consist of all grid points. We similarly obtain that P and U are invertible, thus U(P^T U)^{−1} P^T = I, and (34) also reproduces the POD-Galerkin reduced system (23).
Further, we can state an equivalence of the DEIM and the HR under certain conditions; more precisely, the reduced system of the HR is a special case of the DEIM reduced system. Let us assume that the sampling matrices coincide and that the collateral basis is also chosen as this sampling matrix, i.e. U = P = Z.
Let us further assume that we have a Galerkin projection by choosing W = V for the DEIM. Then P^T U = Z^T Z = I_M is the M-dimensional identity matrix, hence we obtain X = W^T U (P^T U)^{−1} = V^T Z. Then (34) yields

V^T Z Z^T J V δγ^(α) = −V^T Z Z^T r,

which is exactly the HR reduced system (41).
A common aspect of the HR and the (D)EIM is obviously the point selection by a projection matrix. The difference, however, is the selection criterion for the internal points. In the case of the DEIM these points are used as interpolation points, while for the HR they are used to specify the reduced integration domain.
A main difference between the (D)EIM and the HR is the way an additional collateral reduced space is introduced in the reduced setting of the equations. The HR is simpler by not using an additional basis related to the residuals, but relying on the implicit assumption that colspan(V) (which approximates u) also approximates r and J well. This is a very reasonable assumption for symmetric elliptic problems and, in a certain way, it mimics the idea of having the same ansatz and test space as in any Galerkin formulation. But, from a mathematical point of view, it may not be valid in some more general cases, as in principle U and V are completely independent. For example, we can multiply the vectorial residual equation (9) by an arbitrary regular matrix, hence arbitrarily changing r (and thus U for the DEIM), but not changing u at all (i.e. not changing the POD basis V). Hence, the collateral basis in the (D)EIM is first of all an additional technical ingredient and difficulty, which in turn allows one to adapt the approximation space to the quantities that need to be approximated well.
Theoretically, the EIM is well founded by analytical convergence results [30]. As a downside, however, the Lebesgue constant, which bounds the interpolation error in terms of the best-approximation error, can grow exponentially. The DEIM is substantiated with a-priori error estimates [10]. We are not aware of such a-priori results for the HR, but a-posteriori error control has been presented in [29].

Computational complexity
With respect to the runtimes, we have decided not to provide numbers, as these would heavily depend on the chosen implementation. In order to be more specific, the computational effort can generally be decomposed into:
• the computation of the local unknowns and of their gradients c_reloc (gradient/temperature computation),
• the evaluations of the (nonlinear) constitutive model c_const,
• the assembly of the residual c_rhs and of the Jacobian c_Jac,
• the solution of the (dense) reduced linear system c_sol.
The presented methods differ with respect to c_reloc, c_const, c_rhs and c_Jac:

• Galerkin RB
c_Jac = (4m + m²) n_gp (direct Jacobian assembly)

• Hyper-Reduction
In the following, n_gp^RID is the number of integration points in the RID. Further, c_FE^{N,B} denotes the cost for the evaluation of u and ∇u using the FE matrices N and B, c_FE^r is the cost for the residual computation on element level (both at least linear in the number of nodes per element + scattered assembly + overhead), and c_FE^K is the cost of the contribution to the element stiffness at one element (∼ number of nodes per element squared + scattered assembly + overhead). Lastly, c_A is the cost for the Jacobian assembly (i.e. matrix scatter operations).
c_Jac = (mω + m²) l + n_gp^RID c_FE^K + c_A (Jacobian assembly and projection)
c_sol ∼ m³

• Discrete Empirical Interpolation Method
The computational cost of the DEIM is closely related to that of the HR by substituting M for l and M̃ for l̃ (denoting the number of nodes which are needed to evaluate the residual at the M magic points). In the cost notation of Algorithm 5, we obtain
c_rhs ∼ M M̃ (residual assembly and projection)
c_Jac ∼ M M̃ m (Jacobian assembly and projection)
From these considerations, the following conclusions can be drawn. First, the number of integration points (n_gp, n_gp^RID, n_gp^DEIM) required for the residual and Jacobian evaluation enters linearly into the effort. Second, the reduced basis dimension m enters linearly into the residual assembly, and both linearly and quadratically into the Jacobian assembly. Third, for the HR and the (D)EIM, the ratios l̃/l and M̃/M have a significant impact on the efficiency: for the considered 2D problem with quadratic ansatz functions these ratios can range from 1 up to 21, i.e. for the same number of magic points pronounced variations in the runtime are in fact possible. The ratio l̃/l is determined by the topology of the RID, i.e. for connected RIDs it is smaller than for a scattered RID (i.e. many disconnected regions forming the RID). Similarly, the (D)EIM has a much smaller computational complexity if the magic points belong to connected elements. Fourth, the Galerkin-POD can be based on simplified algebraic operations, as no nodal variables need to be computed: the reduced residual and Jacobian are directly assembled without recourse to nodal coordinates or any standard FE routine.

ONLINE/OFFLINE decomposition and RB identification
In the following we investigate the behavior of the heat conduction problem for the parameters which we recall from equation (6): p = [g_x, g_y, c, µ_0, µ_1]. First, a regular parameter grid containing 125 equidistant snapshot points is generated and the high-fidelity FE model is solved for all these points, yielding solutions u_{h,i}, i = 1, …, 125. Here, the FEM discretization is based on 800 biquadratic quadrilateral elements comprising a total of 2560 nodes (including 160 boundary nodes). The problem hence has n = 2400 unknowns. In order to exemplify the nonlinearity of the problem due to the temperature-dependent conductivity, the conductivity (top row) and the temperature field (bottom row) are shown for three different parameters in Figure 2.
Then a snapshot POD is performed in order to obtain a RB from the snapshots. Different dimensions of the RB are considered, with the approximation error given in Table 1. Here P̃ denotes the orthogonal projection operator onto the m-dimensional RB with respect to the standard inner product ⟨·, ·⟩_L²(Ω) of the function space L²(Ω). The low approximation errors given in Table 1 indicate that the training data can be represented rather accurately by a projection onto the RB. Additionally, the projection error naturally decreases with increasing dimension. However, the error of the solution of the reduced problem does not necessarily follow the same monotonicity. Since µ is bounded away from 0 due to µ_0 > 0, the problem under consideration is coercive. Similar to the linear case, we expect the approximation error e(p) = ‖ũ(p) − u_h(p)‖ and the projection error ‖u_h(p) − P̃ u_h(p)‖ to be comparable in the sense that the ratio C(p) = e(p)/‖u_h(p) − P̃ u_h(p)‖ is small. Due to the best-approximation property of the orthogonal projection, we certainly have C(p) ≥ 1. The constant C(p) cannot be provided in closed form for the considered nonlinear problem; it can only be computed a posteriori by comparing the reduced to the full solution. Numerical values are provided in the following.
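The snapshot POD step can be sketched as follows. This is a minimal illustration using the Euclidean inner product instead of the L²(Ω) inner product used in the paper (which would require weighting by the FE mass matrix); the function name and signature are illustrative, not the authors' code:

```python
import numpy as np

def snapshot_pod(snapshots, m):
    """Compute an m-dimensional reduced basis from a snapshot matrix
    of shape (n, n_s) via the thin SVD, and report the relative
    projection error of the training data (Frobenius norm)."""
    # thin SVD: columns of V are the POD modes, ordered by singular value
    V, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    V_m = V[:, :m]                       # orthonormal RB of dimension m
    # orthogonal projection of the snapshots onto the RB
    proj = V_m @ (V_m.T @ snapshots)
    rel_err = np.linalg.norm(snapshots - proj) / np.linalg.norm(snapshots)
    return V_m, rel_err
```

The relative projection error returned here corresponds to the (monotonically decreasing) training error reported in Table 1.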

Test cases
In order to investigate the accuracy of the reduced models, additional parameter vectors p need to be considered that do not match the training data. Two test cases are considered in the sequel: [A] A diagonal of the parameter space, parameterized by β ∈ [0, 1], is considered. A total of 101 equally spaced values β^(j) was chosen, i.e. β^(j) = j/100 for j = 0, 1, …, 100.
[B] A set of 1000 random parameters p^(j) was generated using a uniform distribution in parameter space, i.e. a uniform distribution U([0, 1]) was chosen for g_x, g_y and the parameter c was assumed to be distributed via U([1, 2]).
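The two test-case parameter sets can be generated as follows (the seed and the variable names are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)   # seed chosen arbitrarily for reproducibility

# test case [A]: 101 equally spaced values on a diagonal of parameter space
beta = np.arange(101) / 100.0     # beta^(j) = j/100, j = 0, ..., 100

# test case [B]: 1000 random parameters with
# g_x, g_y ~ U([0, 1]) and c ~ U([1, 2])
n_p = 1000
params = np.column_stack([
    rng.uniform(0.0, 1.0, n_p),   # g_x
    rng.uniform(0.0, 1.0, n_p),   # g_y
    rng.uniform(1.0, 2.0, n_p),   # c
])
```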

Certification of the Galerkin RB method
First, the ability of the Galerkin RB solution to approximate the optimal orthogonal projection, and thereby the high-fidelity solution, was verified. Therefore, the constant C(p) was evaluated for all samples; as the values in Table 2 state, the POD approximation error is found to be close to the projection error. This confirms the quality of the chosen RB. In addition to the error magnification factor C(p), the minimum, average and maximum of the relative error are computed for all samples. The results are provided in Table 3. Note that for test case [A] the minimum error is truly zero, since g_x = g_y = 0 implies a homogeneous zero temperature. For a RB of dimension 32 the mean error is well below 10⁻³ for all tests and the maximum error over all tests is 3.23 · 10⁻³. This basis provides a compromise between accuracy and computational cost and is therefore used for the comparison of the methods in the sequel. Note that the slow decay of the error of the Galerkin approximation indicated by the data in Table 3 suggests that the information content captured in the training snapshots is not sufficient to provide better accuracies. Therefore we decided on a dimension m = 32 for the reduced basis in the subsequent experiments.
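The a-posteriori evaluation of C(p) can be sketched as follows, again using the Euclidean inner product for brevity (the L²(Ω) variant would insert the FE mass matrix); the function name is illustrative:

```python
import numpy as np

def error_magnification(u_h, u_rb, V_m):
    """A-posteriori constant C(p) = ||u_rb - u_h|| / ||u_h - P u_h||,
    where P is the orthogonal projection onto the RB spanned by the
    orthonormal columns of V_m. By the best-approximation property of
    the orthogonal projection, C(p) >= 1 always holds."""
    proj = V_m @ (V_m.T @ u_h)
    return np.linalg.norm(u_rb - u_h) / np.linalg.norm(u_h - proj)
```

For the Galerkin RB solution itself, but also for the HR and DEIM solutions, this ratio quantifies how far the reduced prediction is from the best possible approximation in the RB.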
With respect to the computational efficiency of the RB solution it is observed that, unfortunately, the Galerkin RB solution is almost as costly as the high-fidelity solution. This is due to the fact that the assembly of the reduced system is computationally intense, an aspect which becomes more important for rather low-dimensional problems such as the one considered here. In order to actually reduce the computational cost, additional reduction techniques such as the Hyper-Reduction (HR) and the Discrete Empirical Interpolation Method (DEIM) are required. Note also that these reduction techniques using a POD basis are only approximations of the Galerkin RB. Hence, the HR and DEIM solutions cannot be better than the Galerkin RB solution except in the few cases where C_HR(p) < C(p) or C_DEIM(p) < C(p), where C_HR(p) and C_DEIM(p) denote the constant C from (47) for the HR and DEIM method, respectively. In our numerical tests this occurred only exceptionally.

Application to Uncertainty Quantification
In real world simulation scenarios, material coefficients and boundary conditions are often not exactly known and one is interested in the impact of this uncertainty on the quantities of interest. To this end, uncertainty quantification (UQ) has been proposed and has become an active research field on its own. In classical forward UQ, the critical parameters are modeled as random variables; the distributions and correlation are derived from measurements as for example shown for nonlinear material curves in [31].
Finally, the forward model is evaluated at collocation points p i in the parameter space according to a quadrature method as e.g. Monte Carlo, [32]. Typically many collocations (or 'sampling') points are needed and therefore model order reduction has been shown to significantly reduce the computational costs, e.g. [33].
In the case of our thermal benchmark problem, the parameter vector p = P(ω) is considered as a realization of a random vector, where ω ∈ Ω_p and (Ω_p, F, P) is the usual probability space. We refer to this as test case [C] and assume that the random variables are independent and uniformly distributed as already introduced in test case [B], i.e. G_x, G_y ∼ U([0, 1]) and C ∼ U([1, 2]).
Finally, the statistical moments of the solution u_⋆(x; P(ω)) are approximated by the Monte Carlo method, e.g. [32], via the sample mean ū_⋆ = (1/n_p) Σ_{j=1}^{n_p} u_⋆(x; p^(j)), where p^(j) are the same n_p = 1000 random sample points generated for case [B], and ū_⋆ and m_⋆^k are the approximations of the solution mean and its k-th centered moment (k > 1) obtained from Finite Elements, Galerkin-RB, DEIM and HR, i.e., ⋆ ∈ {h, RB, DEIM, HR}, respectively. For the DEIM we selected M = 400 magic points.
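The Monte Carlo estimation of the mean and the centered moments can be sketched as follows (an illustrative helper, not the authors' implementation; the sample array stands in for the collected solutions u_⋆(x; p^(j))):

```python
import numpy as np

def mc_moments(samples, k_max=4):
    """Monte Carlo estimates of the sample mean and the centered
    moments m_k, k = 2, ..., k_max, of a discrete field given
    samples of shape (n_p, n)."""
    u_bar = samples.mean(axis=0)            # sample mean (expected value)
    centered = samples - u_bar              # fluctuations about the mean
    moments = {k: (centered**k).mean(axis=0) for k in range(2, k_max + 1)}
    return u_bar, moments
```

Applying this estimator to the FE, Galerkin-RB, DEIM and HR solutions at the same n_p = 1000 sample points yields the quantities compared in Figure 3.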
An estimation of the normalized root mean square error of the finite element solution due to Monte Carlo sampling can be obtained from the sample standard deviation of the Monte Carlo estimator; here it is on the order of 2%. This implies that the accuracy of the reduced models in the prediction of ū_⋆ should be around 2% or better in order to render the reduced models capable of providing meaningful quantitative statistical information. Figure 3 shows the error in the moments computed for the various reduction techniques with respect to the finite element reference solution.
The figure indicates that the approximations of the expected value ū_⋆ are accurate to at least 10⁻³ and thus below (i.e. better than) the sampling accuracy E_MC. Generally, the DEIM tends to perform better than the HR; the largest errors occur for E_HR^(7) = 1.09 · 10⁻¹ and E_DEIM^(7) = 3.46 · 10⁻², respectively.

Accuracy of the HR and DEIM vs. number of interpolation points
Both the HR and the DEIM sample the nonlinearity only at a subset of the entries of the right-hand side: the interpolation points. The HR has one additional parameter ℓ describing how many element layers around a certain degree of freedom (DOF) are considered in order to generate Ω⁺, which adds additional interpolation points to the RID. The DOFs around which the interpolation points are located are selected based on the criterion described in Section 3.3. In contrast to that, the DEIM selects individual interpolation points directly. The resulting relative errors are compared in Figure 4 in terms of the statistical distribution function P(t) of the relative error, i.e., the probability of finding a relative error δ that is smaller than or equal to t.
Obviously the number of interpolation points has a pronounced impact on the distribution. For low numbers of points, the distribution function shows a significantly increased computational error of the DEIM in comparison with the POD. With an increasing number of points the distribution function approaches the one of the POD. In our tests the use of more than 300 sampling points improves the accuracy only marginally. We note that, in general, the accuracy of the DEIM need not be a monotonic function of the number of interpolation points. For the hyper-reduced predictions, different layers of elements are added in Ω⁺ in order to extend the RID. We have considered ℓ = 1, 2, 3 and 4 layers of elements connected to Ω_u, for both test cases [A] and [B]. The resulting relative errors are compared in Figure 5 in terms of the statistical distribution function P(t) of the relative error. With an increasing number of layers the distribution function approaches the one of the POD. The number of internal points l increases rapidly when increasing the number of layers; in the case of the DEIM, the growth of the number of interpolation points is much more gradual. More than two layers of elements do not improve the accuracy significantly.
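The statistical distribution function P(t) used in Figures 4 and 5 is simply the empirical cumulative distribution function of the relative errors over all test samples; a minimal sketch:

```python
import numpy as np

def error_cdf(errors, t):
    """Empirical distribution function P(t): the fraction of samples
    whose relative error delta satisfies delta <= t."""
    errors = np.asarray(errors)
    return np.count_nonzero(errors <= t) / errors.size
```

Evaluating `error_cdf` on a grid of thresholds t for each method and point budget reproduces the curves compared in the figures; e.g., P(t) = 0.8 at t ≈ 0.002 means that 80% of the samples have a relative error below 0.002.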
One layer of additional elements gives predictions that are less accurate than those of the DEIM for approximately the same number of internal points/interpolation points (here: l = 416 for the HR and M = 300 for the DEIM). However, the number of additional points required for the residual evaluation differs considerably: l̃ = 657 for the HR vs. M̃ = 1490 for the DEIM. This can be explained by a direct comparison of the reduced domains in Figure 6: for the Hyper-Reduction the RID is rather compact (leftmost plot), while the magic points of the DEIM are rather scattered, thus requiring the temperature evaluation at many additional points.
The predictions of the Hyper-Reduction are less accurate, as can be seen by comparing Figure 4 (left, test case [A], black line) and Figure 5 (left, test case [A], red line), where 80% of the samples lead to errors below ≈ 0.002 for the DEIM but only below ≈ 0.01 for the HR. Nevertheless, the accuracy of the hyper-reduced predictions is generally of the same order of magnitude as that of the DEIM.

Summary and conclusion
The presented study revisits an, at first sight, rather simple nonlinear heat conduction problem. In order to accelerate the nonlinear computations, a Galerkin reduced basis ansatz is proposed (see Section 3.1), using preliminary offline computations in the established framework of the snapshot POD.
Both the HR and the (D)EIM can achieve significant reductions of the computing time. These scale approximately with the number of magic points (here: l or M) and with the number of points connected to the magic points (l̃ and M̃, respectively). In the presented examples less than 25% of the original mesh was considered in both the (D)EIM and the HR; by virtue of the considerations presented in Section 3.5, the computing times are reduced accordingly.
The selection of the magic points in the HR and the (D)EIM requires the computation of the solution at an increased number of nodes, i.e. l̃ ≥ l and M̃ ≥ M, respectively. The higher the space dimension (2D, 3D, …), the more scattered the magic points, and the higher the number of degrees of freedom per element, the more l̃ and M̃ are increased in comparison to l and M.
Our examples also show that coarse predictions of the HR resulted in more robust computations than those of the DEIM with the same number of magic points. However, the DEIM can achieve accuracies better than those of the HR in some situations, i.e. if the collateral basis is sufficiently accurate.
With respect to the implementation it shall be noted that the HR is less intrusive than the (D)EIM, as it uses standard simulation outputs to generate the modes, while the (D)EIM requires additional outputs for the construction of the collateral basis. Other than that, both techniques can share mostly the same implementation, which is also confirmed by the similarity of the two methods presented in Section 3.4.