Math. Comput. Appl. 2018, 23(1), 8; https://doi.org/10.3390/mca23010008

Article
An Algorithmic Comparison of the Hyper-Reduction and the Discrete Empirical Interpolation Method for a Nonlinear Thermal Problem
1
EMMA—Efficient Methods for Mechanical Analysis, Institute of Applied Mechanics (CE), University of Stuttgart, Pfaffenwaldring 7, 70569 Stuttgart, Germany
2
Institute of Applied Analysis and Numerical Simulation, University of Stuttgart, Pfaffenwaldring 57, 70569 Stuttgart, Germany
3
MAT-Centre des Matériaux, MINES ParisTech, PSL Research University, CNRS UMR 7633, BP 87, 91003 Evry, France
4
Graduate School of Computational Engineering, Technische Universität Darmstadt, Dolivostraße 15, 64293 Darmstadt, Germany
*
Author to whom correspondence should be addressed.
Received: 19 December 2017 / Accepted: 7 February 2018 / Published: 13 February 2018

## Abstract

A novel algorithmic discussion of the methodological and numerical differences of competing parametric model reduction techniques for nonlinear problems is presented. First, the Galerkin reduced basis (RB) formulation is presented, which fails at providing significant gains with respect to the computational efficiency for nonlinear problems. Renowned methods for the reduction of the computing time of nonlinear reduced order models are the Hyper-Reduction and the (Discrete) Empirical Interpolation Method (EIM, DEIM). An algorithmic description and a methodological comparison of both methods are provided. The accuracy of the predictions of the hyper-reduced model and the (D)EIM in comparison to the Galerkin RB is investigated. All three approaches are applied to a simple uncertainty quantification of a planar nonlinear thermal conduction problem. The results are compared to computationally intense finite element simulations.
Keywords:
model order reduction (MOR); reduced basis model order reduction (RB MOR); uncertainty quantification (UQ); (discrete) empirical interpolation method (EIM; DEIM); hyper-reduction (HR)

## 1. Introduction

Numerical models in engineering and the natural sciences are becoming increasingly complex; they may be nonlinear and may depend on unknown or controllable design parameters. Simultaneously, simulation settings increasingly move from single forward simulations to higher-level simulation scenarios. For example, optimization and statistical investigations require multiple solves, interactive applications require real-time simulation response, and slim-computing environments, e.g., simple controllers, require rapid and memory-saving models. For such applications, the field of model reduction has gained increasing attention during the last decade. Its goal is the acceleration of a given numerical model based on the construction of a low-dimensional approximate surrogate model, the so-called reduced order model. Due to the reduced dimension, the computation should ideally be rapid and hence applicable to the mentioned multi-query, real-time or slim-computing simulation scenarios. Well-known techniques for linear problems comprise Proper Orthogonal Decomposition (POD) [1,2] and control-theoretic approaches such as Balanced Truncation, Moment Matching or Hankel-norm approximation. For parametric problems, certified Reduced Basis (RB) methods have been developed [4,5]. Nonlinear problems pose additional challenges. In particular, a well-known drawback of POD is that a high-dimensional reconstruction of the reduced solution is required for each evaluation of the nonlinearity. Several approximation techniques exist that remedy this problem. These are mainly sampling-based techniques such as Empirical Interpolation and its discrete variants [7,8,9,10] or Hyper-Reduction [11,12]. Further approaches exist, such as Gappy-POD, Missing Point Estimation (MPE), Gauss–Newton with Approximated Tensors (GNAT) and Energy Conserving Mesh Sampling and Weighting (ECSW).
Most of those methods identify a subset of the arguments of the nonlinear function. Then, based solely on an evaluation of these few components, they construct an approximation of the full solution.
Nonlinear problems emerge in many applications. For instance, the effective behavior of dissipative microstructured materials needs to be predicted by nonlinear homogenization techniques. These imply a multi-query context in the sense of different loadings (and load paths) applied to the reference volume element in order to obtain the related effective mechanical response. Reduced basis methods combining the purely algorithmic gains of the reduced basis with a reformulation of the problem incorporating micromechanics have been shown to be efficient for the prediction of the effective material response, for multi-level FE simulations and for nonlinear multiscale topology optimization (e.g., [17,18]).
In this paper, we analyze and compare two of those efficient methods, namely the Discrete Empirical Interpolation Method (DEIM) and the Hyper-Reduction (HR), for the reduction of a non-trivial nonlinear parametric thermal model. The comparison is carried out on the numerical, algorithmic and mathematical level. A new condition under which DEIM and HR are equivalent is given, and numerical examples illustrate the application. Similar studies, often with stronger emphasis on the performance, have also recently been conducted by other authors.
The paper is structured as follows. We introduce the nonlinear parametrized thermal model problem in Section 2. Subsequently, in Section 3, the methods under investigation are introduced and formally compared: as a benchmark, the POD–Galerkin procedure is formulated, then the DEIM and the hyper-reduction technique. The numerical comparison is provided in Section 4. A classical scenario from uncertainty quantification is demonstrated since model order reduction is particularly interesting for such many-query scenarios. Finally, we conclude in Section 5.

#### 1.1. Nomenclature

In the manuscript, the following notation is used: bold face symbols denote vectors (lowercase letters) or matrices (uppercase letters). The spatial gradient operator is denoted by $∇ •$ and the divergence is expressed by $∇ · •$. The dependence on spatial coordinates is omitted for simplicity of notation.

## 2. Nonlinear Reference Problem

#### 2.1. Strong Formulation

In the following, we consider a stationary planar nonlinear heat conduction problem on a plate with a hole (see Figure 1). The problem is parametrized by a vector p. The nonlinearity of the problem is introduced by an isotropic Fourier-type heat conductivity $μ ( u ; p )$ that depends on the current temperature u and the parameter vector. The Dirichlet and Neumann boundary data are denoted by $u *$ and $q *$, respectively. Note that, in the example, $q * = 0$ is considered. However, the formulation of the weak form is considered with $q *$ being arbitrary for the sake of generality. The corresponding Dirichlet and Neumann boundaries are denoted by $Γ 1$ and $Γ 2$, respectively. The strong formulation of the boundary value problem is
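$-\nabla \cdot \big( \mu(u; p)\, \nabla u \big) = 0 \ \text{ in } \Omega, \qquad u = u^*(p) \ \text{ on } \Gamma_1, \qquad -\mu(u; p)\, \nabla u \cdot n = q^* \ \text{ on } \Gamma_2, \qquad (1)$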
where $μ$ is the temperature dependent conductivity.

#### 2.2. Weak Formulation

The weak form of the heat conduction problem (1) is given by
$a ( u , δ u ; p ) − l ( δ u ; p ) = 0 ( ∀ δ u ∈ V 0 ) ,$
where
$a ( u , δ u ; p ) = ∫ Ω μ ( u ; p ) ∇ u · ∇ δ u d Ω , and l ( δ u ; p ) = − ∫ Γ 2 δ u q * d Γ ,$
with the unknown temperature field $u ∈ V = V 0 + { u ¯ * ( p ) }$ being sought-after. The function space $V 0 : = { u ∈ H 1 ( Ω ) : u = 0 on Γ 1 }$ is referred to as space of test functions vanishing on the Dirichlet boundary $Γ 1$ and $u ¯ * ( p ) ∈ H 1 ( Ω )$ is a field defined on the full domain that satisfies the Dirichlet conditions. The Dirichlet conditions on $Γ 1$ are assumed to depend on two parameters $g x$ and $g y$ via
$u^*(p) := g_x\, x + g_y\, y, \qquad g_x, g_y \in [0, 1], \quad (x, y) \in \Gamma_1. \qquad (4)$
A trivial choice is to set $u ¯ * ( p ) = g x x + g y y$ in the full domain. This particular choice is considered in the remainder of this study. While the solution space $V$ depends on the parameters p via $u ¯ *$, the space of test functions $V 0$ is independent of the parameters p. For the heat conductivity $μ$, an explicit dependence on the temperature via the nonlinear constitutive model
$\mu(u; p) := \max \{ \mu_1, \ \mu_0 + c\, u \} \qquad (5)$
is assumed (see also Figure 1, right). The parameter vector p in the present context is
$p := [ g_x, g_y, c, \mu_0, \mu_1 ], \qquad (6)$
and attention will be confined to the compact parameter domain
$p \in \mathcal{P} := [0, 1] \times [0, 1] \times [1, 2] \times \{1\} \times \{0.5\} \subset \mathbb{R}^5. \qquad (7)$
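As a minimal illustration of the constitutive model above, the following Python sketch evaluates the conductivity and the temperature $(\mu_1 - \mu_0)/c$ at which the two branches of the maximum coincide. The function names `conductivity` and `critical_temperature` are illustrative assumptions and are not taken from the authors' implementation; the default values $\mu_0 = 1$ and $\mu_1 = 0.5$ follow the parameter domain stated above.

```python
import numpy as np

def conductivity(u, c, mu0=1.0, mu1=0.5):
    """Piecewise linear conductivity mu(u; p) = max(mu1, mu0 + c*u)."""
    return np.maximum(mu1, mu0 + c * np.asarray(u, dtype=float))

def critical_temperature(c, mu0=1.0, mu1=0.5):
    """Temperature at which mu0 + c*u equals the lower bound mu1."""
    return (mu1 - mu0) / c

# Example: c = 2 lies in the admissible interval [1, 2]
u = np.array([-1.0, -0.25, 0.0, 1.0])
mu = conductivity(u, c=2.0)          # lower bound is active for u <= -0.25
u_c = critical_temperature(c=2.0)    # kink of the conductivity curve
```

The conductivity is continuous but not differentiable at the kink, which is the reason why a fixed point operator without the derivative of the conductivity is convenient for this model.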

#### 2.3. Discrete Formulation

In order to solve the nonlinear problem (1), nodal Finite Elements (FE) are used in a classical Galerkin formulation (e.g., [20,21]). The space of finite element test functions $V_0^h \subset V_0$ is assumed to be spanned by n linearly independent and continuous ansatz functions $\varphi_i$ associated with nodes $(x_i, y_i)$ and satisfying $\varphi_i(x_j, y_j) = \delta_{ij}$, where $i, j = 1, \dots, n$. We expand the solution into a linear field $\bar{u}^*$, which is defined over the full domain $\Omega$ via (4) and complies with the parametrized boundary data, and a fluctuation term according to
$u_h(p) := \bar{u}^*(p) + \sum_{j=1}^{n} w_{h,j}(p)\, \varphi_j, \qquad \bar{u}^*(p) := g_x\, x + g_y\, y \quad \forall (x, y) \in \Omega. \qquad (8)$
Here, the coefficient vector $w(p) = ( w_{h,j}(p) )_{j=1}^{n} \in \mathbb{R}^n$ contains the nodal temperature fluctuations, which represent the unknowns of the system. They define the coefficient vector containing the nodal temperatures via
$u(w; p) = w(p) + \bar{u}^*(p), \qquad (9)$
where $u ¯ * ( p )$ is the vector composed of the nodal values of the superimposed linear field $u ¯ * ( p )$. The discrete nonlinear equations that have to be solved for any given parameter vector p are given by
$r_i(w; p) := a\Big( \bar{u}^*(p) + \sum_{j=1}^{n} w_{h,j}\, \varphi_j, \ \varphi_i; \ p \Big) - l(\varphi_i; p) = 0 \qquad \forall i \in \{1, \dots, n\}. \qquad (10)$
In our specific numerical example, the linear functional $l(\varphi_i; p)$ is zero due to the chosen homogeneous Neumann conditions. However, problem (10) still has a nontrivial solution due to the inhomogeneous Dirichlet data provided through $\bar{u}^*(p)$. Condition (10) can be expressed in a compact representation using the vector notation
$r(w; p) := ( r_i(w; p) )_{i=1}^{n} = 0. \qquad (11)$
In order to solve the nonlinear problem (11), expressed component-wise in (10), the finite element method is used to provide the functions for the expansion (8). In the following, N denotes a row vector of finite element ansatz functions for the temperature, and G and $G T$ are the discrete gradient and divergence matrices, respectively. The FE approximation of the temperature u and its gradient $∇ u$ are given by
$u_h(w; p) = N\, u(w; p) = N\, w(p) + N\, \bar{u}^*(p), \qquad (12)$
$\nabla u_h(w; p) = G\, u(w; p) = G\, w(p) + \bar{g}(p), \qquad \bar{g}(p) = \begin{bmatrix} g_x \\ g_y \end{bmatrix}. \qquad (13)$
Then, the exact Jacobian of the finite element system is
$J^*(w; p) = \int_\Omega \Big[ \mu( u_h(w; p); p )\, G^T G + \frac{\partial \mu( u_h(w; p); p )}{\partial u_h}\, G^T G\, u(w; p)\, N \Big] d\Omega, \qquad (14)$
where the first part is the classical finite element stiffness matrix and the second part accounts for the thermal sensitivity of the conductivity $μ$. In the following, a symmetric approximation of $J *$ given by
$J(w; p) = \int_\Omega \mu( u_h(w; p); p )\, G^T G \, d\Omega \qquad (15)$
is used, i.e., the thermal sensitivity is neglected. The numerical solution of the nonlinear problem (11) is then obtained by a fixed-point scheme using J as an approximation of the differential stiffness matrix $J *$. The resulting iteration scheme is commonly referred to as successive substitution (Ref. , p. 66). Note also that J is well defined for arbitrary thermal fields, while the exact Jacobian is not defined, or more precisely the Jacobian is semi-smooth, for the critical temperature $u c = ( μ 1 − μ 0 ) / c ,$ which implies a non-differentiability of the conductivity $μ$.
Algorithm 1 illustrates the nonlinear finite element solution procedure for the considered problem. The comment lines in the algorithm comprise references to computational costs $c reloc$ (computation of u and $∇ u$), $c const$ (for the constitutive model), $c rhs$ (linked to the residual computation), $c Jac$ (related to the Jacobian) and $c sol$ (for the linear solver). This will become relevant in the comparison section.
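The successive substitution described above can be sketched on a one-dimensional analogue of the problem. The following Python code is a simplified illustration and not the authors' planar implementation: it assumes linear elements on the unit interval, one midpoint quadrature point per element, the Dirichlet data $u(0) = 0$, $u(1) = g$, and a dense linear solver. The names `assemble_J` and `solve_fe` are hypothetical; the comments refer to the cost labels used in the text.

```python
import numpy as np

def conductivity(u, c, mu0=1.0, mu1=0.5):
    return np.maximum(mu1, mu0 + c * u)

def assemble_J(u, h, c):
    """Symmetric fixed point operator J(u): element sum of mu * G^T G."""
    n = u.size
    J = np.zeros((n, n))
    mu = conductivity(0.5 * (u[:-1] + u[1:]), c)    # c_const: one value per element
    for e in range(n - 1):                           # c_Jac: loop over all elements
        J[e:e + 2, e:e + 2] += mu[e] / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return J

def solve_fe(g=1.0, c=1.5, n=51, tol=1e-10, max_iter=200):
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    u_bar = g * x                    # linear field carrying the Dirichlet data
    w = np.zeros(n)                  # nodal fluctuations (zero on the boundary)
    free = np.arange(1, n - 1)       # interior nodes
    for it in range(1, max_iter + 1):
        u = u_bar + w                                # c_reloc
        J = assemble_J(u, h, c)
        r = J @ u                                    # c_rhs: residual, l = 0 here
        dw = np.zeros(n)
        dw[free] = np.linalg.solve(J[np.ix_(free, free)], -r[free])  # c_sol
        w += dw
        if np.linalg.norm(dw) < tol:                 # stop on small updates
            break
    return x, u_bar + w, it

x, u, iterations = solve_fe()
```

Because the derivative of the conductivity is neglected in J, the iteration converges only linearly, but every linear system is symmetric and well defined even at the kink of the conductivity.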
Algorithm 1: Finite Element Solution

#### 2.4. Reduced Basis Ansatz

In the following, an m-dimensional basis of global ansatz functions $ψ k$ is considered in the reduced framework, where $m ≪ n$ is implicitly asserted in order to obtain an actual model order reduction. The functions $ψ k$ define a linear subspace of $V 0 h$, which can be characterized by a matrix $V = ( V i j ) ∈ R n × m$ via
$ψ k : = ∑ i = 1 n V i k φ i ( k = 1 , ⋯ , m ) .$
The reduced coefficient vector $\gamma(p) = ( \gamma_k(p) )_{k=1}^{m} \in \mathbb{R}^m$ defines the temperature fluctuations in the reduced setting by the affine relation
$\tilde{u}(p) = \tilde{u}(\gamma(p); p) := u^*(p) + \tilde{w}(p), \qquad \tilde{w}(p) := \sum_{k=1}^{m} \gamma_k(p)\, \psi_k. \qquad (17)$
The corresponding vector of discrete values $\tilde{u}(p) := ( \tilde{u}_j(p) )_{j=1}^{n} \in \mathbb{R}^n$ of the temperature with respect to the basis of $V_0^h$ is given by the linear relation $\tilde{u} = \bar{u}^*(p) + \tilde{w}(p)$ with the coefficient vector $\tilde{w}(p) = V \gamma(p) = ( \tilde{w}_j(p) )_{j=1}^{n}$ of the reduced fluctuation field. Hence, the reduced fluctuation $\tilde{w}$ can equivalently be expressed as
$\tilde{w}(p) = \sum_{j=1}^{n} \tilde{w}_j(p)\, \varphi_j = \sum_{j=1}^{n} \varphi_j \sum_{k=1}^{m} V_{jk}\, \gamma_k(p). \qquad (18)$
In the following, the matrix V defining the space of the reduced ansatz functions is assumed to be given by a snapshot POD obtained from full-resolution finite element simulations evaluated at s different parameter vectors $p^{(i)}$ ($i = 1, \dots, s$). The POD subspace $\tilde{V}_0^h \subset V_0^h$ is then obtained as the best-approximating m-dimensional subspace with respect to the $L^2(\Omega)$-error, i.e.,
$\tilde{V}_0^h = \underset{\tilde{V} \subset V_0^h,\ \dim(\tilde{V}) = m}{\arg\min} \ \sum_{i=1}^{s} \big\| w_h(p^{(i)}) - P_{\tilde{V}}\big( w_h(p^{(i)}) \big) \big\|_{L^2(\Omega)}^{2}, \qquad (19)$
where $P V ˜$ is the orthogonal projection of the thermal fluctuations onto the subspace $V ˜$. Numerically, a POD-basis of this space can be obtained by a corresponding matrix eigenvalue problem, e.g., cf. [1,5,24]. Here, we use the $L 2 ( Ω )$ inner product for the computation of the symmetric snapshot inner product matrix
$C = ( C_{ij} )_{i,j=1}^{s} \in \mathbb{R}^{s \times s}, \qquad C_{ij} := \int_\Omega w_h(p^{(i)})\, w_h(p^{(j)})\, d\Omega. \qquad (20)$
Note that the correlation matrix is computed from the temperature fluctuations $w_h$ only in order to obtain a reduced basis that will comply with the prescribed Dirichlet conditions that are accounted for in $\bar{u}^*(p)$. The entries of C are defined by the discrete snapshot solutions $w^{(i)}$ via the unit mass matrix M as follows:
$M := \int_\Omega N^T N\, d\Omega, \qquad C_{ij} = ( w^{(i)} )^T M\, w^{(j)}. \qquad (21)$
A pseudo-code implementation of the snapshot POD is given by Algorithm 2. Alternatively, a weighted SVD of the snapshot matrix can be used to obtain the same basis. The classical SVD w.r.t. the $l^2$-norm may be less prone to truncation errors, but it will yield a result differing from the optimum in (19), which is based on optimality with respect to $\| \cdot \|_{L^2(\Omega)}$. This is due to the missing consideration of the inner product matrix M.
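The snapshot POD with respect to the inner product induced by M can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 2: the snapshots of the fluctuations are assumed to be stored column-wise in a dense matrix `W`, the mass matrix `M` is assumed to be available as a dense array, and the function name `snapshot_pod` is hypothetical.

```python
import numpy as np

def snapshot_pod(W, M, m):
    """Return m POD modes (columns of V) that are orthonormal w.r.t. M.

    W : (n, s) snapshot matrix of nodal fluctuations
    M : (n, n) symmetric positive definite mass matrix
    m : number of modes, m <= rank(W)
    """
    C = W.T @ M @ W                          # s x s snapshot inner product matrix
    lam, Phi = np.linalg.eigh(C)             # eigenvalues in ascending order
    lam, Phi = lam[::-1], Phi[:, ::-1]       # reorder: largest eigenvalue first
    V = (W @ Phi[:, :m]) / np.sqrt(lam[:m])  # scale so that V^T M V = I
    return V, lam

# Small synthetic example (random data, for illustration only)
rng = np.random.default_rng(0)
W = rng.standard_normal((30, 8))
M = np.diag(rng.uniform(0.5, 1.5, size=30))
V, lam = snapshot_pod(W, M, m=3)
```

The sum of the discarded eigenvalues equals the squared projection error of the snapshots in the M-weighted norm, which gives a simple criterion for choosing m.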
Algorithm 2: Snapshot Proper Orthogonal Decomposition (Snapshot POD)

There are many other techniques to determine reduced projection spaces, e.g., greedy techniques, balanced truncation, Hankel-norm approximation, moment matching and others. However, we do not aim at a comparison of projection space techniques, but rather focus on the treatment of the nonlinearities. Therefore, we choose the POD space as the common reduced projection space for all subsequent methods.
Note that, for the given problem, the Dirichlet data can be considered via the linear field $u ¯ * ( p )$ defined in (4). More precisely, $u ¯ *$ is parametrized by two independent parameters $g x$, $g y$. Its discrete representation is
$\bar{u}^*(p) = g_x\, \bar{u}^{*,1} + g_y\, \bar{u}^{*,2}, \qquad (22)$
with the components of the vectors $\bar{u}^{*,1}$, $\bar{u}^{*,2}$ defined as
$\bar{u}^{*,1}_i := x_i, \qquad \bar{u}^{*,2}_i := y_i \qquad (i = 1, \dots, n), \qquad (23)$
where $x i$ and $y i$ denote the coordinates of node number i. By accounting for $u ¯ * ( p )$ via (22), the Dirichlet data is exactly represented without further algebraic constraints. Additionally, this approach captures exactly constant temperature gradients.

## 3. Sampling-Based Reductions

#### 3.1. Galerkin Reduced Basis Approximation

A classical Galerkin projection on a POD space is used to provide reference solutions for the HR and the DEIM. In order to solve the weak form of the nonlinear problem for given parameters $g x$, $g y$, a successive substitution is performed in which the local conductivity $μ$ is computed using the previous iteration of the temperature $u ˜ ( α ) : = u * ( p ) + w ˜ ( α )$ with $w ˜ ( α ) = N V γ ( α )$ in the reduced setting, where $α ∈ N$ is the iteration number and $γ ( α )$ is the iterate of the reduced degrees of freedom in iteration $α$. As in a classical Galerkin approach, we assume the (reduced) test function
$v ˜ : = ∑ i = 1 n ∑ j = 1 m φ i V i j λ j$
with arbitrary coefficients $λ j$ ($j = 1 , ⋯ , m$) represented by the vector $λ$. For convenience, we define the conductivity corresponding to the $α$-th iterate $γ ( α )$ of the reduced coefficient vector $γ ( p )$
$\mu^{(\alpha)} := \mu\big( \tilde{u}(\gamma^{(\alpha)}; p); \ p \big). \qquad (25)$
Here, $\gamma^{(\alpha)}$ can be interpreted as a constant parameter similar to p during the subsequent iteration, which provides the new iterate $\gamma^{(\alpha+1)}$ as the solution of
$a(\tilde{u}, \tilde{v}; p) = \sum_{j=1}^{m} \lambda_j \int_\Omega \mu^{(\alpha)}\, \nabla \tilde{u}(\gamma^{(\alpha+1)}; p) \cdot \nabla \psi_j \, d\Omega = 0 \qquad \forall \lambda \in \mathbb{R}^m. \qquad (26)$
Rewriting (26) using the approximated Jacobian J and the residual r leads to the linear system
$V^T J(V \gamma^{(\alpha)}; p)\, V\, ( \gamma^{(\alpha+1)} - \gamma^{(\alpha)} ) + V^T r(V \gamma^{(\alpha)}; p) = 0. \qquad (27)$
The projection of the residual onto the reduced basis defines the reduced residual
$r ˜ ( α ) : = V T r ( V γ ( α ) ; p ) .$
In view of the definition of the reduced basis by $L 2 ( Ω )$-orthogonal POD modes and with $δ γ ( α ) : = γ ( α + 1 ) − γ ( α ) ,$ one obtains
$∥ δ γ ( α ) ∥ l 2 = ∥ u ˜ ( α + 1 ) − u ˜ ( α ) ∥ L 2 ( Ω ) = ∥ w ˜ ( α + 1 ) − w ˜ ( α ) ∥ L 2 ( Ω ) .$
This gives rise to the simple convergence criterion $∥ δ γ ( α ) ∥ l 2 < ϵ max$, i.e., the iteration should stop upon sufficiently small changes of the temperature field. Algorithm 3 summarizes the online phase of the Galerkin RB method.
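The online iteration of the Galerkin RB method can be sketched as follows. The code is a generic illustration under assumptions: the callables `residual` and `operator` stand for a full assembly of r and J and are hypothetical placeholders, and the small test problem at the end is synthetic and unrelated to the thermal model of this paper.

```python
import numpy as np

def galerkin_rb_solve(V, u_bar, residual, operator, tol=1e-10, max_iter=100):
    """Fixed point iteration for the Galerkin-projected system.

    V        : (n, m) reduced basis
    u_bar    : (n,) nodal values of the field carrying the Dirichlet data
    residual : callable u -> r(u), full residual vector of length n
    operator : callable u -> J(u), full fixed point operator (n, n)
    """
    gamma = np.zeros(V.shape[1])
    for it in range(1, max_iter + 1):
        u = u_bar + V @ gamma                  # reconstruct all n nodal values
        r = residual(u)                        # full assembly (cost ~ n_gp)
        J = operator(u)                        # full assembly (cost ~ n_gp)
        d_gamma = np.linalg.solve(V.T @ J @ V, -(V.T @ r))   # m x m system
        gamma += d_gamma
        if np.linalg.norm(d_gamma) < tol:      # criterion on the l2 norm of the update
            break
    return gamma, it

# Synthetic, mildly nonlinear toy problem (illustration only)
rng = np.random.default_rng(1)
n, m = 12, 3
A = np.diag(np.arange(1.0, n + 1))
b = rng.standard_normal(n)
b /= np.linalg.norm(b)
V, _ = np.linalg.qr(rng.standard_normal((n, m)))
operator = lambda u: (1.0 + 0.1 * np.tanh(u @ u)) * A
residual = lambda u: operator(u) @ u - b
gamma, iterations = galerkin_rb_solve(V, np.zeros(n), residual, operator)
```

Only the final linear solve is of reduced dimension m; the two assembly calls still operate on the full mesh in every iteration.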
Algorithm 3: Galerkin Reduced Basis Solution (Online Phase)

The projected system can be interpreted as a finite element method with global, problem specific ansatz functions $\psi_k$, whereas the classical finite element method uses local and rather general ansatz functions $\varphi_j$ (e.g., piecewise defined polynomials).
Note that the solution of (26), (27) is also a minimizer of the potential
$\Pi(\gamma; \gamma^{(\alpha)}, p) := \frac{1}{2} \int_\Omega \mu^{(\alpha)}\, \nabla \tilde{u}(\gamma; p) \cdot \nabla \tilde{u}(\gamma; p)\, d\Omega. \qquad (30)$
Therefore, variational methods can directly be applied to solve the minimization problem and alternative numerical strategies are available. Such variational schemes are also used, e.g., in the context of solid mechanical problems involving internal variables.
The Galerkin RB method with well-chosen reduced basis functions $\psi_k$ (represented via the matrix V) can replicate the FEM solution to a high accuracy (see Section 4). It also provides a significant reduction of the memory requirements: instead of $u \in \mathbb{R}^n$, only $\gamma \in \mathbb{R}^m$ needs to be stored. Despite the significant reduction of the number of unknowns from n to m, the Galerkin RB cannot attain substantial accelerations of the nonlinear simulation, because the assembly of the residual vector r and of the fixed point operator J remains computationally expensive with complexity $O(n_{gp})$ (compare $c_{rhs}$ and $c_{Jac}$ in Algorithm 1 and Algorithm 3). Here, $n_{gp}$ is the number of quadrature points in the mesh. However, if the linear systems of the full model are not solved with optimal complexity, e.g., when sparse $LU$ or Cholesky decompositions with a complexity of at least $O(n^2)$ are used, then a reduction of complexity can still be achieved. It shall be pointed out that, for very large n (i.e., for millions of unknowns), the linear solver usually dominates the overall computational expense. Then, the Galerkin RB may provide good accelerations without further modifications.
In order to significantly improve the computational efficiency while maintaining the reduced number of degrees of freedom (and thus the reduced storage requirements), the Hyper-Reduction [11,12] and the Discrete Empirical Interpolation Method (DEIM, [7,8,9]) are used. Both methods are specifically designed for the computationally efficient approximation of the nonlinearity of PDEs.

#### 3.2. Discrete Empirical Interpolation Method (DEIM)

The empirical interpolation method (EIM) was introduced to approximate parametric or nonlinear functions by separable interpolants. This technique has meanwhile become standard in the reduced basis methodology for parametric PDEs. Discrete versions of the EIM for instationary problems have been introduced as empirical operator interpolation [7,8,27] or alternatively (and in some cases equivalently) as discrete empirical interpolation (DEIM) [9,10]. In particular, a posteriori [8,27,28] and a priori error control is possible under certain assumptions.
We present a formulation for the present stationary problem. Instead of approximating a continuous field variable, the goal of the discrete versions is to provide an approximation $r ˜$ for the vectorial nonlinearity of the nodal residual vector r of the form
$\tilde{r}(u; p) := U ( P^T U )^{-1} P^T r(u; p), \qquad (31)$
where the columns of $U \in \mathbb{R}^{n \times M}$ are called the collateral reduced basis and $P = [ e_{i_1}, \dots, e_{i_M} ] \in \mathbb{R}^{n \times M}$ is a sampling matrix with interpolation indices (also known as magic points) $i_1, \dots, i_M \in \{1, \dots, n\}$, with $e_i$ being the i-th unit vector. By multiplication of (31) with $P^T$, we verify that
$( \tilde{r}(u; p) )_{i_j} = ( r(u; p) )_{i_j}, \qquad j = 1, \dots, M. \qquad (32)$
In this sense, the approximation acts as an interpolation within the set of magic points.
The identification of the interpolation points is an incremental procedure, which is performed during the offline phase. We assume the existence of a set of training snapshots $Y : = { y 1 , … , y n train } ⊂ R n$ with $dim ( span ( Y ) ) ≥ M$.
Then, a POD of these snapshots results in the collateral basis vectors $u_1, \dots, u_M$ and we define $U_l := [ u_1, \dots, u_l ]$ for $l = 1, \dots, M$. The algorithm for the point selection is initialized with $P_0 = [\,]$, $I_0 := \emptyset$, $U_0 = [\,]$ and then computes for $l = 1, \dots, M$ (for $l = 1$ the correction term is empty, so that $q_1 = u_1$)
$q_l := u_l - U_{l-1} ( P_{l-1}^T U_{l-1} )^{-1} P_{l-1}^T u_l, \qquad (33)$
$i_l := \arg\max_{i \in \{1, \dots, n\}} | (q_l)_i |, \qquad (34)$
$P_l := [ P_{l-1}, e_{i_l} ], \qquad I_l := I_{l-1} \cup \{ i_l \}. \qquad (35)$
Finally, we set $P := P_M$, $U := U_M$ and $I := I_M$, which concludes the construction of $\tilde{r}$. Intuitively, in each iteration, the interpolation error $q_l$ for the current POD basis vector $u_l$ is determined and the vector entry $i_l$ with maximum absolute value is identified, which gives the next index. Regularity of the matrix $P^T U$ is required for a well-defined interpolation. This condition is automatically satisfied under the aforementioned assumption of a sufficiently rich set of snapshots $Y$. As training set $Y$, one can use samples of the nonlinearity, snapshots of the state vector, or combinations thereof. In contrast to the instationary case, we cannot use only training snapshots of r taken at the converged solutions: as the residual r is zero for all of these snapshots, we would try to find a basis for the zero-dimensional space $\mathrm{span}(Y) = \{0\}$, which is not possible for $M > 0$. However, the residual at the intermediate (non-equilibrium) iterates is non-zero and this is also a good target quantity for the (D)EIM, as these terms appear on the right-hand side of the linear system during the fixed point iteration. Hence, a reasonable set $Y$ is obtained via
$Y = [ y_1, \dots, y_{n_{train}} ] = [ r( u^{(0)}(p^{(1)}); p^{(1)} ), \dots, r( u^{(\alpha_1)}(p^{(1)}); p^{(1)} ), \dots, r( u^{(\alpha_s)}(p^{(s)}); p^{(s)} ) ], \qquad (36)$
where $\alpha_1, \dots, \alpha_s$ are the numbers of fixed point iterations of the full simulation scheme for the parameters $p^{(1)}, \dots, p^{(s)}$.
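The incremental point selection described above can be sketched in a few lines. The following Python code is a minimal illustration, assuming that the collateral basis is given as a dense matrix `U` with linearly independent columns; the function names `deim_points` and `deim_interpolate` are hypothetical. Multiplication with $P^T$ is realized by row indexing, so the sampling matrix is never formed.

```python
import numpy as np

def deim_points(U):
    """Greedy selection of M interpolation indices for the basis U (n x M)."""
    n, M = U.shape
    idx = [int(np.argmax(np.abs(U[:, 0])))]          # l = 1: q_1 = u_1
    for l in range(1, M):
        U_prev = U[:, :l]
        # interpolate u_l at the points found so far, using the first l vectors
        coeff = np.linalg.solve(U_prev[idx, :], U[idx, l])
        q = U[:, l] - U_prev @ coeff                 # interpolation error q_l
        idx.append(int(np.argmax(np.abs(q))))        # next magic point
    return np.array(idx)

def deim_interpolate(U, idx, r_sampled):
    """Evaluate U (P^T U)^{-1} P^T r from the M sampled entries r[idx]."""
    return U @ np.linalg.solve(U[idx, :], r_sampled)

# Synthetic example: 5 collateral vectors from random training data
rng = np.random.default_rng(2)
Y = rng.standard_normal((40, 12))
U = np.linalg.svd(Y, full_matrices=False)[0][:, :5]
idx = deim_points(U)
```

Since $q_l$ vanishes at all previously selected indices, each new index differs from the earlier ones, which is what makes $P^T U$ regular for linearly independent basis vectors.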
Inserting $r ˜$ from (31) for the nonlinearity into the full problem and projection by left-multiplication with a weight matrix $W ∈ R n × m$ yields the POD-DEIM reduced m-dimensional nonlinear system for the unknown $γ$
$W^T U ( P^T U )^{-1} P^T r( \bar{u}^*(p) + V \gamma; p ) = 0. \qquad (37)$
This low-dimensional nonlinear problem is iteratively solved by a fixed point procedure, i.e., at the current approximation $\gamma^{(\alpha)}$, we solve the linearized problem for $\delta \gamma^{(\alpha)}$
$W^T U ( P^T U )^{-1} P^T J( \bar{u}^*(p) + V \gamma^{(\alpha)}; p )\, V\, \delta \gamma^{(\alpha)} = - W^T U ( P^T U )^{-1} P^T r( \bar{u}^*(p) + V \gamma^{(\alpha)}; p ) \qquad (38)$
and set $\gamma^{(\alpha+1)} := \gamma^{(\alpha)} + \delta\gamma^{(\alpha)}$. If $M < m$, this linear system cannot be solved uniquely. In that case, an alternative would be to solve a residual least-squares problem, similar to the GNAT procedure. Note that the assembly of this system does not involve any high-dimensional operations, as the product of the leading matrices on the left- and right-hand side can be precomputed as a small matrix $X := W^T U ( P^T U )^{-1} \in \mathbb{R}^{m \times M}$. The terms $P^T J$ and $P^T r$ do not require any high-dimensional operations either, as the multiplication $P^T (\cdot) = (\cdot)_I$ corresponds to the evaluation of the “magic” rows of the Jacobian and of the right-hand side, respectively. Typically, in discretized PDEs, these M rows only depend on a small number $\bar{M}$ of entries of the unknown variable (e.g., the DOFs related to neighboring elements). This number $\bar{M}$ is typically bounded by a certain multiple of M due to regularity constraints on the mesh.
Equation (38) reveals what is required of the collateral basis U: if $J(V\gamma; p)\, V\, \delta\gamma \in \mathrm{colspan}(U)$ and $r(V\gamma; p) \in \mathrm{colspan}(U)$, then we are exactly solving the Galerkin–POD reduced linearized system
$W T J ( V γ ; p ) V δ γ = − W T r ( V γ ; p ) .$
Hence, this gives a guideline for an alternative reasonable choice of the training set $Y$, namely consisting of snapshots of both (columns of) J or $J V$ and r.
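The statement above can be checked numerically on synthetic data. In the following Python sketch, the right-projected operator and the residual are constructed to lie in the column span of U, and the DEIM step computed from M sampled rows is compared with the Galerkin step. All matrices are random placeholders, and the function names `deim_projector` and `deim_step` are hypothetical.

```python
import numpy as np

def deim_projector(W, U, idx):
    """Precompute X = W^T U (P^T U)^{-1}, a small m x M matrix."""
    return W.T @ U @ np.linalg.inv(U[idx, :])

def deim_step(X, JV_rows, r_rows):
    """Solve X (P^T J V) d_gamma = -X (P^T r) from the sampled rows only."""
    return np.linalg.solve(X @ JV_rows, -(X @ r_rows))

rng = np.random.default_rng(3)
n, m, M = 30, 3, 6
U = rng.standard_normal((n, M))
U[:M, :] = np.eye(M) + 0.1 * rng.standard_normal((M, M))  # well-conditioned sampled rows
idx = np.arange(M)                        # sampled rows (assumed magic points)
V, _ = np.linalg.qr(rng.standard_normal((n, m)))

JV = U @ rng.standard_normal((M, m))      # J V, constructed to lie in colspan(U)
r = U @ rng.standard_normal(M)            # r, constructed to lie in colspan(U)

X = deim_projector(V, U, idx)             # Galerkin choice W = V
d_deim = deim_step(X, JV[idx, :], r[idx])
d_galerkin = np.linalg.solve(V.T @ JV, -(V.T @ r))
```

Under these span conditions both updates agree up to round-off; for data outside the span of U, the difference between them measures the interpolation error introduced by the DEIM.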
In the Galerkin projection case, one can choose $W = V$. This is the choice that we pursue in the experiments to make the procedure more similar to the other reduction approaches. The offline phase of the DEIM is summarized in Algorithm 4 and an algorithm of the online phase is provided in Algorithm 5.
Algorithm 4: Offline Phase of the Discrete Empirical Interpolation Method (DEIM)

Algorithm 5: Online Phase of the Discrete Empirical Interpolation Method (DEIM)

Input: parameters $p \in \mathcal{P}$, reduced basis V, POD-DEIM sampling matrix $X := W^T U ( P^T U )^{-1}$ and magic point index set I

Output: reduced vector $\gamma$ and nodal temperatures $\tilde{u}$ (optional)

1. set $\gamma^{(0)} = 0$; $\tilde{u}^{(0)} = \bar{u}^*(p)$; $\alpha = 0$; // initialize
2. $\bar{J} \leftarrow ( J(\tilde{u}^{(\alpha)}; p)\, V )_I$; // $c_{Jac}$; evaluate M rows of the right-projected Jacobian
3. $\bar{r} \leftarrow ( r(\tilde{u}^{(\alpha)}; p) )_I$; // $c_{rhs}$; evaluate M rows of the right-hand side
4. solve $X \bar{J}\, \delta\gamma^{(\alpha)} = - X \bar{r}$; // $c_{sol}$; fixed point iteration for $\delta\gamma^{(\alpha)}$
5. update $\gamma^{(\alpha+1)} \leftarrow \gamma^{(\alpha)} + \delta\gamma^{(\alpha)}$; // update
6. compute $P_{\bar{M}} \tilde{u}^{(\alpha+1)} \leftarrow P_{\bar{M}} \tilde{u}^{(\alpha)} + P_{\bar{M}} V\, \delta\gamma^{(\alpha)}$ and set $\alpha \leftarrow \alpha + 1$; // $c_{reloc}$
7. converged ($\| \delta\tilde{u}^{(\alpha)} \|_{L^2(\Omega)} = \| \delta\gamma^{(\alpha)} \|_{l^2} < \epsilon_{max}$)? → end; else: go to 2

#### 3.3. Hyper-Reduction (HR)

In order to improve the numerical efficiency, the Hyper-Reduction method introduces a Reduced Integration Domain (RID) denoted by $\Omega_Z \subset \Omega$. The RID depends on the reduced basis. It is constructed by offline algebraic operations. The hyper-reduced equations are a Petrov–Galerkin formulation of the equilibrium equations, obtained by using truncated test functions that vanish outside the RID. The vector form of the reduced equations is similar to the one obtained by the Missing Point Estimation method proposed for the Finite Volume Method. The strength of the Hyper-Reduction is its ability to reduce mechanical models in material science while keeping the formulation of the constitutive equations unchanged. The smaller the RID, the lower the computational complexity and the higher the approximation errors. These points have been developed in previous papers dealing with various mechanical problems (e.g., [29,30]).
The offline procedure of the Hyper-Reduction method involves two steps. The first step is the construction of the Reduced Integration Domain $Ω Z$. For the present benchmark test, the RID is the union of a subdomain denoted by $Ω u$ generated from the reduced vector gradients $( ∇ ψ k ) k = 1 , … , m$, and a domain denoted by $Ω +$ corresponding to a set of neighboring elements to the previous subdomain. Usually, in the Hyper-reduction method, the user can select an additional subdomain of $Ω$ in order to extend the RID over a region of interest. This subdomain is denoted by $Ω u s e r$. In the sequel, to get small RIDs, $Ω u s e r$ is empty. The set $Ω u$ consists of aggregated contributions $Ω k u , k = 1 , … , m$ from all the reduced vectors:
$\Omega_Z := \Omega^u \cup \Omega^+ \cup \Omega^{user}, \qquad \Omega^u := \bigcup_{k=1}^{m} \Omega_k^u. \qquad (40)$
To give the full details of the procedure, we introduce the partition of the domain into finite elements $( \Omega_j^e \subset \Omega )_{j=1,\dots,n_{el}}$ with $\Omega = \bigcup_{j=1}^{n_{el}} \Omega_j^e$, where $n_{el}$ is the number of elements in the mesh. The domain $\Omega_k^u$ is the element in which the $L^2$ norm of the reduced vector $\nabla \tilde{\psi}_k$ attains its maximum. In previous work, $\nabla \tilde{\psi}_k$ was set equal to $\nabla \psi_k$. Here, when applying the DEIM to $( \nabla \psi_k )_{k=1,\dots,m}$, the interpolation residuals provide a new reduced basis $( q_k )_{k=1,\dots,m}$ (cf. Algorithm 6) related to temperature gradients. In this paper, $\nabla \tilde{\psi}_k$ is the output reduced basis produced by the DEIM when it is applied to $( \nabla \psi_k )_{k=1,\dots,m}$. Other procedures for the RID construction are available in previous papers on hyper-reduction, e.g., [11,12]. The element selection reads for $k = 1, \dots, m$:
$\Omega_k^u = \arg\max_{\Omega_j^e,\ j = 1, \dots, n_{el}} \big\| \nabla \tilde{\psi}_k \big\|_{L^2(\Omega_j^e)}, \qquad (41)$
where $∥ . ∥ L 2 ( Ω j e )$ is the $L 2$ norm restricted to the element $Ω j e$. Several layers of surrounding elements denoted $Ω +$ can be added to $Ω u$.
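The construction of the RID from element-wise norms can be sketched as follows. This Python code is a simplified illustration under assumptions: the element-wise $L^2$ norms of the (modified) gradient modes are assumed to be precomputed in an array `grad_norms`, the mesh is described by a connectivity array `conn`, no user-defined subdomain is added, and the function name `build_rid` is hypothetical. Besides the selected elements, the sketch returns the nodes whose ansatz functions vanish outside the RID and all nodes contained in the RID.

```python
import numpy as np

def build_rid(grad_norms, conn, layers=1):
    """Select RID elements and classify the RID nodes.

    grad_norms : (n_el, m) element-wise L2 norms of the m gradient modes
    conn       : (n_el, nodes_per_element) element connectivity
    layers     : number of layers of neighboring elements to add
    """
    n_el = conn.shape[0]
    rid = set(int(e) for e in np.argmax(grad_norms, axis=0))  # one element per mode
    for _ in range(layers):                                   # add surrounding elements
        nodes = np.unique(conn[sorted(rid)])
        touching = np.isin(conn, nodes).any(axis=1)
        rid |= set(np.flatnonzero(touching).tolist())
    rid = np.array(sorted(rid))
    outside = np.setdiff1d(np.arange(n_el), rid)
    rid_nodes = np.unique(conn[rid])                          # all nodes in the RID
    # inner nodes: not attached to any element outside the RID
    inner_nodes = np.setdiff1d(rid_nodes, np.unique(conn[outside]))
    return rid, inner_nodes, rid_nodes

# One-dimensional mesh with 10 two-node elements and m = 2 modes
conn = np.array([[e, e + 1] for e in range(10)])
grad_norms = np.zeros((10, 2))
grad_norms[2, 0] = 1.0      # mode 1 has its largest gradient in element 2
grad_norms[7, 1] = 1.0      # mode 2 has its largest gradient in element 7
rid, inner_nodes, rid_nodes = build_rid(grad_norms, conn, layers=1)
```

In this example the RID consists of two disconnected patches, so the number of nodes in the RID is twice the number of inner nodes at which equilibrium equations are available.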
The second step of the offline Hyper-Reduction procedure is the generation of truncated test functions that are zero outside of the RID. The truncation operator $P Z$ is defined for all $u h ∈ V 0 h$ by
$P_Z(u_h) := \sum_{i \in F} \varphi_i\, u_{h,i}, \qquad F = \Big\{ i \in \{1, \dots, n\} \ \Big| \ \int_{\Omega \setminus \Omega_Z} \varphi_i\, \varphi_i\, d\Omega = 0 \Big\}. \qquad (42)$
Here, $F$ is the set of indices of internal points, i.e., inner FE nodes, in $\Omega_Z$. For these nodes, the FE equilibrium equations can be evaluated from predictions that are available only over $\Omega_Z$, i.e., with $\Gamma_2^Z := \Gamma_2 \cap \partial \Omega_Z$, it holds for all $w \in V_0^h$ that
$a(u_h, P_Z(w); p) - l(P_Z(w); p) = \int_{\Omega_Z} \mu(u_h; p)\, \nabla u_h \cdot \nabla ( P_Z(w) )\, d\Omega + \int_{\Gamma_2^Z} P_Z(w)\, q^*\, d\Gamma. \qquad (43)$
The operator $P Z$ can be represented by a truncated projector denoted Z. More precisely, if $F = { i 1 , i 2 , … , i l }$ with $l : = card ( F )$, then
$Z := [ e_{i_1}, e_{i_2}, \dots, e_{i_l} ] \in \mathbb{R}^{n \times l}, \qquad P_Z(u_h) := \sum_{i=1}^{n} \varphi_i\, ( Z Z^T u )_i \qquad (44)$
with $e_i \in \mathbb{R}^n$ the i-th unit vector. Therefore, the hyper-reduced form of the linearized prediction step is: for a given $\gamma^{(\alpha)}$, find $\delta\gamma^{(\alpha)}$ such that
$V^T Z Z^T J(V \gamma^{(\alpha)}; p)\, V\, \delta\gamma^{(\alpha)} = - V^T Z Z^T r(V \gamma^{(\alpha)}; p), \qquad (45)$
where J is given by (15) and $γ ( α + 1 ) : = γ ( α ) + δ γ ( α )$.
In addition to $Z$, we also introduce the operator $\bar Z \in \mathbb{R}^{n \times \bar l}$, a truncated projection operator onto the $\bar l$ points contained in the RID. In practice, the discrete unknowns are computed at these $\bar l \ge l$ points in order to compute the residual at the $l$ inner points. Note that $\bar l$ is often significantly larger than $l$, especially if the RID consists of disconnected (scattered) regions.
The complexity of the products involving the fixed point operator $J$ on the left-hand side scales with $2 \zeta l m + 2 l m^2$, where $\zeta$ is the maximum number of non-zero entries per row of $J$. For the right-hand side, the computational complexity is $2 l m$. For both products, the complexity reduction factor is $n / l$. To obtain a well-posed hyper-reduced problem, the condition $l \ge m$ must be fulfilled. Otherwise, the linear system of Equation (45) is rank deficient, and more surrounding elements have to be added to the RID. The closer $l$ is to $m$, and the lower $m$, the less complex is the solution of the hyper-reduced formulation. The RID construction must generate a sufficiently large RID. If not, the convergence can be hampered, the number of correction steps can increase and, moreover, the accuracy of the prediction can suffer. When $\Omega_Z = \Omega$, $Z$ is the identity matrix and the hyper-reduced formulation coincides with the usual system obtained by the Galerkin projection. An a posteriori error estimator for hyper-reduced approximations has recently been proposed in  for generalized standard materials.
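For orientation, the operation counts quoted above can be evaluated directly. The helper below is a hypothetical convenience function. The values $n = 2400$, $m = 32$ and $l = 416$ are taken from Section 4, whereas $\zeta = 21$ is only an assumed value for the sparsity of $J$. The full-order counts are obtained by replacing $l$ with $n$, which is what the stated reduction factor $n/l$ implies.

```python
def hr_operation_counts(n, l, m, zeta):
    """Leading-order counts quoted in the text for the hyper-reduced products."""
    if l < m:
        raise ValueError("l >= m is required, otherwise the system is rank deficient")
    lhs = 2 * zeta * l * m + 2 * l * m**2        # products with J on the left-hand side
    rhs = 2 * l * m                              # projection of the residual
    lhs_full = 2 * zeta * n * m + 2 * n * m**2   # same counts with l replaced by n
    rhs_full = 2 * n * m
    return {"lhs": lhs, "rhs": rhs,
            "lhs_reduction": lhs_full / lhs, "rhs_reduction": rhs_full / rhs}

# zeta = 21 is an assumption made for this example only
counts = hr_operation_counts(n=2400, l=416, m=32, zeta=21)
```

With these numbers, both reduction factors equal $2400/416 \approx 5.8$.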
The offline phase of the hyper-reduction is summarized in Algorithm 6 and an algorithm of the online phase is provided in Algorithm 7.

#### 3.4. Methodological Comparison

We comment on some formal commonalities and differences between the HR and the (D)EIM.
We first note that both methods reproduce the Galerkin–POD case if $l = M = n$. For the HR, this means that the RID is the full domain, which implies that $Z$ is a square permutation matrix, hence invertible with $Z Z^T = I$; thus (45) reduces to the POD–Galerkin reduced system (27). For the (D)EIM, this implies that the magic points consist of all grid points. We similarly obtain that $P$ and $U$ are invertible and thus $U (P^T U)^{-1} P^T = I$, so that (38) also reproduces the POD–Galerkin reduced system (27).
Algorithm 6: Offline Phase of the Hyper-Reduction (HR)
Algorithm 7: Online Phase of the Hyper-Reduction (HR)
Furthermore, we can state an equivalence of the DEIM and the HR under certain conditions; more precisely, the reduced system of the HR is a special case of the DEIM reduced system. Let us assume that the sampling matrices coincide and that the collateral basis is also chosen as this sampling matrix, i.e., $U = P = Z$. Let us further assume that we have a Galerkin projection by choosing $W = V$ for the DEIM. Then, $P^T U = Z^T Z = I_M$ is the $M$-dimensional identity matrix, hence we obtain
$U (P^T U)^{-1} P^T = Z (Z^T Z)^{-1} Z^T = Z Z^T.$
Then, (38) yields
$V^T Z Z^T J V \delta\gamma = - V^T Z Z^T r,$
which is exactly the HR reduced system (45).
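Both statements, the special case $U = P = Z$ and the limiting case $l = M = n$, are easy to check numerically. The sketch below uses a small hypothetical index set; `deim_projector` is a name introduced here for the oblique projector $U (P^T U)^{-1} P^T$.

```python
import numpy as np

def deim_projector(U, P):
    """Oblique DEIM projector U (P^T U)^{-1} P^T."""
    return U @ np.linalg.solve(P.T @ U, P.T)

n = 8
idx = [0, 3, 4, 6]                 # sampled rows (hypothetical choice)
Z = np.eye(n)[:, idx]              # selection matrix Z in R^{n x l}

# special case U = P = Z: the DEIM projector equals Z Z^T
proj_hr_case = deim_projector(Z, Z)

# limiting case l = M = n: Z is a square permutation matrix, the projector is I
perm = np.eye(n)[:, np.random.default_rng(1).permutation(n)]
proj_full = deim_projector(perm, perm)
```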
A common aspect of the HR and the (D)EIM is the point selection by a sampling matrix. The difference, however, is the selection criterion of the internal points. In the case of the DEIM, these points are used as interpolation points, while, for the HR, they are used to specify the reduced integration domain.
A main difference between the (D)EIM and the HR is the way an additional collateral reduced space is introduced in the reduced setting of the equations. The HR is simpler in that it does not use an additional basis related to the residuals, but relies on the implicit assumption that $\operatorname{colspan}(V)$ (which approximates $u$) also approximates $r$ and $J$ well. This is a very reasonable assumption in symmetric elliptic problems and, in a certain way, it mimics the idea of having the same ansatz and test space as in any Galerkin formulation. However, from a mathematical point of view, it may not be valid in some more general cases, as in principle $U$ and $V$ are completely independent. For example, we can multiply the vectorial residual Equation (11) by an arbitrary regular matrix, hence arbitrarily change $r$ (and thus $U$ for the DEIM), without changing $u$ at all (i.e., without changing the POD basis $V$). Hence, the collateral basis in the (D)EIM is first an additional technical ingredient and difficulty, which in turn allows for adapting the approximation space to the quantities that need to be approximated well.
Theoretically, the EIM is well founded by analytical convergence results . As a downside, however, the Lebesgue constant, which essentially relates the interpolation error to the best-approximation error, can grow exponentially. The DEIM is substantiated with a priori error estimates . In particular, the error bounds depend on the conditioning of the small matrix $P^T U$. We are not aware of such a priori results for the HR, but a posteriori error control has been presented in .
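The role of the small matrix $P^T U$ can be made concrete with the standard greedy DEIM point selection. The following sketch is an illustration under assumptions: the snapshot family is made up, `deim_points` is a name chosen here, and the quantity $\|(P^T U)^{-1}\|_2$ is computed as the factor that typically appears in DEIM error bounds for a collateral basis with orthonormal columns.

```python
import numpy as np

def deim_points(U):
    """Greedy DEIM point selection for a collateral basis U (n x M)."""
    n, M = U.shape
    pts = [int(np.argmax(np.abs(U[:, 0])))]
    for k in range(1, M):
        # interpolate the next basis vector at the points chosen so far
        c = np.linalg.solve(U[pts, :k], U[pts, k])
        residual = U[:, k] - U[:, :k] @ c
        # the next point is where the interpolation residual is largest
        pts.append(int(np.argmax(np.abs(residual))))
    return pts

x = np.linspace(0.0, 1.0, 200)
# hypothetical snapshots of a parameter-dependent nonlinearity
snapshots = np.column_stack([1.0 / (1.0 + mu * x**2) for mu in np.linspace(1, 20, 30)])
U = np.linalg.svd(snapshots, full_matrices=False)[0][:, :6]

pts = deim_points(U)
PtU = U[pts, :]                                          # P^T U, an M x M matrix
amplification = np.linalg.norm(np.linalg.inv(PtU), 2)    # ||(P^T U)^{-1}||_2

# DEIM approximation of one snapshot: exact at the selected points
f = snapshots[:, 7]
f_deim = U @ np.linalg.solve(PtU, f[pts])
```

Since $U$ has orthonormal columns, the amplification factor is at least one; a poorly conditioned $P^T U$ makes it large.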

#### 3.5. Computational Complexity

While the aim of model reduction is ultimately a reduction of the computing time, this quantity may heavily depend on the chosen implementation (see Section 4.6). Generally, the computational effort can be decomposed into the following contributions:
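• the computation of the local unknowns and of their gradients $c_{\mathrm{reloc}}$ (gradient/temperature computation),
• the evaluations of the (nonlinear) constitutive model $c_{\mathrm{const}}$,
• the assembly of the residual $c_{\mathrm{rhs}}$ and of the Jacobian $c_{\mathrm{Jac}}$,
• the solution of the (dense) reduced linear system $c_{\mathrm{sol}}$.
From a theoretical point of view, the presented methods differ with respect to $c_{\mathrm{reloc}}$, $c_{\mathrm{const}}$, $c_{\mathrm{rhs}}$ and $c_{\mathrm{Jac}}$:
• Finite Element Simulation ($n_{\mathrm{gp}}$: number of integration points; $n_{\mathrm{el}}$: number of elements; $n_{\mathrm{el}}^{\mathrm{DOF}}$: degrees of freedom per element)
$\begin{aligned}
c_{\mathrm{reloc}} &= 2\, n_{\mathrm{gp}}\, n_{\mathrm{el}}^{\mathrm{DOF}} && \text{(gradient/temperature computation)} \\
c_{\mathrm{const}} &\sim n_{\mathrm{gp}} && \text{(constitutive model)} \\
c_{\mathrm{rhs}} &= 2\, n_{\mathrm{gp}}\, n_{\mathrm{el}}^{\mathrm{DOF}} && \text{(residual assembly)} \\
c_{\mathrm{Jac}} &= n_{\mathrm{gp}} \left( n_{\mathrm{el}}^{\mathrm{DOF}} \right)^2 + 4\, n_{\mathrm{el}}^{\mathrm{DOF}} && \text{(Jacobian assembly)} \\
c_{\mathrm{sol}} &\sim n^2 .
\end{aligned}$
• Galerkin–POD
$\begin{aligned}
c_{\mathrm{reloc}} &= 3\, n_{\mathrm{gp}}\, m && \text{(gradient/temperature computation)} \\
c_{\mathrm{const}} &\sim n_{\mathrm{gp}} && \text{(constitutive model)} \\
c_{\mathrm{rhs}} &= 2\, m\, n_{\mathrm{gp}} && \text{(direct residual assembly)} \\
c_{\mathrm{Jac}} &= (4 m + m^2)\, n_{\mathrm{gp}} && \text{(direct Jacobian assembly)} \\
c_{\mathrm{sol}} &\sim m^3 .
\end{aligned}$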
• Hyper-Reduction
In the following, $n_{\mathrm{gp}}^{\mathrm{RID}}$ is the number of integration points in the RID. Furthermore, $c_{\mathrm{FE}}^{N,B}$ denotes the cost for the evaluation of $u$ and $\nabla u$ using the FE matrices N and G, $c_{\mathrm{FE}}^{r}$ is the cost of the residual computation at element level (both at least linear in the number of nodes per element + scattered assembly + overhead), and $c_{\mathrm{FE}}^{K}$ is the cost of the contribution to the element stiffness of one element (∼ number of nodes per element squared + scattered assembly + overhead). Lastly, $c_{A}$ is the cost for the Jacobian assembly (i.e., matrix scatter operations).
$\begin{aligned}
c_{\mathrm{reloc}} &= \bar l\, m + n_{\mathrm{gp}}^{\mathrm{RID}}\, c_{\mathrm{FE}}^{N,B} && \text{(get } u,\ \partial_x u,\ \partial_y u \text{ in } \Omega_Z\text{)} \\
c_{\mathrm{const}} &\sim n_{\mathrm{gp}}^{\mathrm{RID}} && \text{(constitutive model)} \\
c_{\mathrm{rhs}} &= n_{\mathrm{gp}}^{\mathrm{RID}}\, c_{\mathrm{FE}}^{r} + m\, l && \text{(residual assembly and projection)} \\
c_{\mathrm{Jac}} &= (m\, \omega + m^2)\, l + n_{\mathrm{gp}}^{\mathrm{RID}}\, c_{\mathrm{FE}}^{K} + c_{A} && \text{(Jacobian assembly and projection)} \\
c_{\mathrm{sol}} &\sim m^3 .
\end{aligned}$
• Discrete Empirical Interpolation Method
The computational cost for the DEIM is closely related to that of the HR by substituting $M$ for $l$ and $\bar M$ for $\bar l$ (the latter denoting the number of nodes needed to evaluate the residual at the $M$ magic points). Similar to the other methods, we denote by $n_{\mathrm{gp}}^{\mathrm{DEIM}}$ the number of quadrature points needed to evaluate the entries of the residual and of the Jacobian. In the cost notation of Algorithm 5, we obtain
$\begin{aligned}
c_{\mathrm{reloc}} &\sim m + n\, m \\
c_{\mathrm{const}} &= n_{\mathrm{gp}}^{\mathrm{DEIM}} && \text{(constitutive model)} \\
c_{\mathrm{rhs}} &\sim M\, \bar M && \text{(residual assembly and projection)} \\
c_{\mathrm{Jac}} &\sim M\, \bar M\, m && \text{(Jacobian assembly and projection)} \\
c_{\mathrm{sol}} &\sim m^3 .
\end{aligned}$
From these considerations, the following conclusions can be drawn: First, the number of integration points ($n_{\mathrm{gp}}$, $n_{\mathrm{gp}}^{\mathrm{RID}}$, $n_{\mathrm{gp}}^{\mathrm{DEIM}}$) required for the residual and Jacobian evaluation enters linearly into the effort. Second, the reduced basis dimension $m$ enters linearly into the residual assembly and both linearly and quadratically into the Jacobian assembly. Third, for the HR and the (D)EIM, the ratios $\bar l / l$ and $\bar M / M$ have a significant impact on the efficiency: for the considered 2D problem with quadratic ansatz functions, these ratios can range from 1 up to 21, i.e., for the same number of magic points, pronounced variations in the runtime are possible. The ratio $\bar l / l$ is determined by the topology of the RID, i.e., it is smaller for a connected RID than for a scattered RID (i.e., for many disconnected regions forming the RID). Similarly, the (D)EIM has a much smaller computational complexity if the magic points belong to connected elements. Fourth, the Galerkin-POD can be based on simplified algebraic operations, as no nodal variables need to be computed. This is due to the fact that the reduced residual and Jacobian are assembled directly, without recourse to nodal coordinates or to any standard FE routine.

## 4. Numerical Results

#### 4.1. ONLINE/OFFLINE Decomposition and RB Identification

In the following, we investigate the behavior of the heat conduction problem (1) for parameters, which we recall from Equation (7)
$p = [g_x, g_y, c, \mu_0, \mu_1] \in [0,1] \times [0,1] \times [1,2] \times \{1\} \times \{0.5\}.$
First, a regular parameter grid containing 125 equidistant snapshot points is generated and the high-fidelity FE model is solved for all those points, yielding solutions $u_{h,i}$, $i = 1,\dots,125$. Here, the FEM discretization is based on a discretization into 800 biquadratic quadrilateral elements comprising a total of 2560 nodes (including 160 boundary nodes). The problem hence has $n = 2400$ unknowns. In order to exemplify the nonlinearity of the problem due to the temperature dependent conductivity, the conductivity (top row) and the temperature field (bottom row) are shown for three different parameters in Figure 2.
Then, a snapshot POD is performed in order to obtain an RB from the snapshots, and the normalized spectrum of the snapshot correlation matrix is shown in Figure 3. Different dimensions of the RB are considered with the projection error
$E_m = \frac{\sum_{i=1}^{125} \left\| u_h(p_i) - \tilde P_m u_h(p_i) \right\|_{L^2(\Omega)}^2}{\sum_{i=1}^{125} \left\| u_h(p_i) \right\|_{L^2(\Omega)}^2}$
given in Table 1; the decay is visualized in Figure 3 (right). Here, $\tilde P_m$ denotes the orthogonal projection operator onto the $m$-dimensional RB with respect to the standard inner product $\langle \cdot, \cdot \rangle_{L^2(\Omega)}$ in the $L^2(\Omega)$ function space
$\tilde P_m u_h(p) = \sum_{i=1}^{m} \sum_{k=1}^{m} \psi_i\, (\tilde M^{-1})_{ik}\, \langle u_h(p), \psi_k \rangle_{L^2(\Omega)}, \qquad \tilde M_{ik} = \langle \psi_i, \psi_k \rangle_{L^2(\Omega)} \quad (i, k = 1,\dots,m).$
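A discrete version of the snapshot POD and of the projection error $E_m$ can be sketched as follows. The snapshot family, the lumped (diagonal) mass matrix standing in for the $L^2(\Omega)$ inner product, and the function names are assumptions made for this example; they are not the data or the code of the paper.

```python
import numpy as np

def snapshot_pod(S, Mass, m):
    """Snapshot POD with respect to the inner product <u, v> = u^T Mass v."""
    C = S.T @ Mass @ S                       # snapshot correlation matrix
    xi, W = np.linalg.eigh(C)                # eigenvalues in ascending order
    order = np.argsort(xi)[::-1][:m]         # keep the m largest
    xi, W = xi[order], W[:, order]
    Psi = S @ W / np.sqrt(xi)                # Mass-orthonormal modes
    return Psi, xi

def projection_error(S, Mass, Psi):
    """E_m: summed squared projection error relative to summed squared norms."""
    G = Psi.T @ Mass @ Psi                   # Gram matrix of the basis
    coeff = np.linalg.solve(G, Psi.T @ Mass @ S)
    R = S - Psi @ coeff                      # projection residuals, one per column
    num = np.einsum("ij,ij->", R, Mass @ R)
    den = np.einsum("ij,ij->", S, Mass @ S)
    return num / den

x = np.linspace(0.0, 1.0, 101)
weights = np.full(x.size, x[1] - x[0])
weights[[0, -1]] *= 0.5                      # trapezoidal rule as lumped mass
Mass = np.diag(weights)
# made-up snapshots: Gaussian bumps with a moving centre
S = np.column_stack([np.exp(-((x - c) / 0.1) ** 2) for c in np.linspace(0.2, 0.8, 20)])

errors = []
for m in (1, 2, 3):
    Psi, xi = snapshot_pod(S, Mass, m)
    errors.append(projection_error(S, Mass, Psi))
```

For a POD basis, the same number follows from the spectrum alone, since $E_m$ equals one minus the sum of the $m$ retained eigenvalues divided by the trace of the correlation matrix.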
The low approximation errors $E_m$ given in Table 1 and in Figure 3 (right) indicate that the training data can be represented rather accurately by the RB using a projection. Additionally, the projection error naturally decreases with increasing dimension. However, the solution of the reduced problem does not necessarily follow the same monotonicity. Since $\mu$ is bounded away from 0 due to $\mu_0 > 0$, the problem under consideration is coercive. Similar to the linear case, we expect that the approximation error $e(p) = \tilde w(\gamma(p); p) - w_h(p)$ and the projection error are comparable in the sense that
$\eta(p) := \frac{\left\| e(p) \right\|_{L^2(\Omega)}}{\left\| u_h - \tilde P_m u_h \right\|_{L^2(\Omega)}}$
is small. Due to the best-approximation property of the orthogonal projection, we certainly have $\eta(p) \ge 1$. Numerical values are provided in the following. The constant $\eta(p)$ is generally not available in closed form for the considered nonlinear problem. Assuming a constant conductivity $\mu$ evaluated at the solution $u(p)$, an upper bound $\eta_{\mathrm{UB}}(p)$ for the numerically determined $\eta(p)$ is provided via Céa's lemma (for symmetric problems):
$\eta_{\mathrm{UB}}^{h} := \frac{\hat c_h(p)}{\check c_h(p)}, \qquad \hat c_h := \sup_{v_h, w_h \in V_0^h \setminus \{0\}} \frac{a_c(v_h, w_h; p)}{\| v_h \|_{L^2(\Omega)}\, \| w_h \|_{L^2(\Omega)}}, \qquad \check c_h := \inf_{v_h \in V_0^h \setminus \{0\}} \frac{a_c(v_h, v_h; p)}{\| v_h \|_{L^2(\Omega)}^2}.$
Here, $\hat c_h$ and $\check c_h$ are the continuity constant and the coercivity constant of the condensed bilinear form
$a_c(w_h, v_h; p) := \int_{\Omega} \nabla w_h \cdot \mu(u_h(p); p)\, \nabla v_h \, d\Omega \qquad (\forall\, v_h, w_h \in V_0^h),$
with $u_h(p)$ denoting the finite element solution of the nonlinear problem. The estimate $\eta_{\mathrm{UB}}^{h}$ is not a tight bound and, at the same time, its numerical evaluation is computationally expensive, so it is typically considered to be of limited practical use.
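For a symmetric bilinear form, the two constants are the extreme eigenvalues of a generalized eigenvalue problem with the stiffness and mass matrices, which also shows why the bound is costly and loose. The sketch below uses a one-dimensional stand-in problem with linear elements and made-up conductivity values; none of this is the planar benchmark of the paper.

```python
import numpy as np

def continuity_coercivity(K, Mass):
    """Extreme generalized eigenvalues of K v = lambda Mass v (both SPD)."""
    L = np.linalg.cholesky(Mass)
    Linv = np.linalg.inv(L)
    lam = np.linalg.eigvalsh(Linv @ K @ Linv.T)   # ascending order
    return lam[-1], lam[0]        # (continuity, coercivity) w.r.t. the Mass norm

# 1D stand-in: linear elements on (0, 1), homogeneous Dirichlet data,
# conductivity frozen elementwise (hypothetical values between 1 and 1.5)
n_el = 20
h = 1.0 / n_el
mu = 1.0 + 0.5 * np.linspace(0.0, 1.0, n_el)
K = np.zeros((n_el + 1, n_el + 1))
Mass = np.zeros_like(K)
for e in range(n_el):
    idx = np.ix_([e, e + 1], [e, e + 1])
    K[idx] += mu[e] / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    Mass[idx] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
K, Mass = K[1:-1, 1:-1], Mass[1:-1, 1:-1]   # remove the two boundary nodes

c_hat, c_check = continuity_coercivity(K, Mass)
eta_ub = c_hat / c_check
```

Even for this small mesh the ratio is far above one and it grows under mesh refinement, which is consistent with the remark that the bound is not tight.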

#### 4.2. Test Cases

In order to investigate the accuracy of the reduced models, additional parameter vectors need to be considered that are not matching the training data. Two test cases are considered in the sequel:
[A]
A diagonal in the parameter space is considered with
$p^{(j)} := p_0 + \beta^{(j)} (\hat p - p_0), \qquad p_0 := [0, 0, 1, 1, 1/2], \qquad \hat p := [1, 1, 2, 1, 1/2].$
A total of 101 equally spaced values of $\beta^{(j)}$ was chosen, i.e., $\beta^{(j)} = j/100$ for $j = 0, 1, \dots, 100$.
[B]
A set of 1000 random parameters $p^{(j)}$ was generated using a uniform distribution in parameter space, i.e., a uniform distribution $U([0,1])$ was chosen for $g_x$ and $g_y$, and the parameter $c$ was assumed to be distributed via $U([1,2])$.
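The two parameter sets can be generated as follows. The random seed and the variable names are arbitrary choices for this sketch, so the random samples will not coincide with those used in the paper.

```python
import numpy as np

p0 = np.array([0.0, 0.0, 1.0, 1.0, 0.5])
p_hat = np.array([1.0, 1.0, 2.0, 1.0, 0.5])

# test case [A]: 101 equally spaced points on the diagonal of the parameter space
beta = np.arange(101) / 100.0
params_A = p0 + beta[:, None] * (p_hat - p0)

# test case [B]: 1000 uniformly distributed samples (the seed is an arbitrary choice)
rng = np.random.default_rng(2018)
n_B = 1000
params_B = np.column_stack([
    rng.uniform(0.0, 1.0, n_B),   # g_x
    rng.uniform(0.0, 1.0, n_B),   # g_y
    rng.uniform(1.0, 2.0, n_B),   # c
    np.full(n_B, 1.0),            # mu_0 (fixed)
    np.full(n_B, 0.5),            # mu_1 (fixed)
])
```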

#### 4.3. Certification of the Galerkin RB Method

First, the ability of the Galerkin RB solution to approximate the optimal orthogonal projection and, thereby, the high-fidelity solution, was verified. To this end, the constant $\eta(p)$ was evaluated for all samples of test cases [A] and [B]. The minimum, the mean and the maximum of $\eta(p)$ were determined for the 101 and 1000 tests of cases [A] and [B], respectively. The results in Table 2 show that the error of the Galerkin POD approximation is close to the projection error. This confirms the quality of the chosen RB.
Note that in test case [B] only a finite number of random parameter vectors was chosen, which does not necessarily contain the extreme values of $\eta(p)$. The numerical data in Table 2 show that test case [A] indeed contains parameters leading to larger values of $\eta(p)$. When increasing the size of the random parameter set for [B], the maximum values of $\eta(p)$ in case [B] should become equal to or larger than the maximum values of case [A].
In addition to the error magnification parameter $\eta(p)$, the minimum, average and maximum of the relative error
$\delta(\tilde u(p), p) = \frac{\left\| \tilde u(p) - u_h(p) \right\|_{L^2(\Omega)}}{\left\| u_h(p) \right\|_{L^2(\Omega)}}$
are also computed for all samples. The results are provided in Table 3. Note that for test case [A] the minimum error is truly zero for $g_x = g_y = 0$, which implies a homogeneous zero temperature. For an RB of dimension 32, the mean error is well below $10^{-3}$ for all tests and the maximum error over all tests is $3.23 \times 10^{-3}$. This basis provides a compromise between accuracy and computational cost and is therefore used for the comparison of the methods in the sequel.
Note that the slow decay of the error of the Galerkin approximation seen in Table 3 indicates that the information content captured in the training snapshots is not sufficient to provide better accuracy. Therefore, we decided on a dimension $m = 32$ for the reduced basis in the subsequent experiments.
In order to further reduce the computational cost, additional reduction techniques such as the Hyper-Reduction (HR) and the Discrete Empirical Interpolation Method (DEIM) are required. Note also that the reduction techniques using a POD basis are only approximations of the Galerkin RB. Hence, the HR and the DEIM cannot be better than the Galerkin RB solution, except in the few cases where $\eta_{\mathrm{HR}}(p) < \eta(p)$ or $\eta_{\mathrm{DEIM}}(p) < \eta(p)$, where $\eta_{\mathrm{HR}}(p)$ and $\eta_{\mathrm{DEIM}}(p)$ denote the constant $\eta$ from (51) for the HR and the DEIM, respectively. In our numerical tests, this occurred only exceptionally, which can be explained by the slightly different residual considered in comparison to the Galerkin RB. Let us now turn to a more realistic multi-query situation where RB is crucial for non-prohibitive runtimes.

#### 4.4. Application to Uncertainty Quantification

In real-world simulation scenarios, material coefficients and boundary conditions are often not exactly known and one is interested in the impact of this uncertainty on the quantities of interest. To this end, uncertainty quantification (UQ) has been proposed and has become an active research field of its own. In classical forward UQ, the critical parameters are modeled as random variables; the distributions and correlations are derived from measurements, as shown for example for nonlinear material curves in . Finally, the forward model is evaluated at collocation points $p_i$ in the parameter space according to a quadrature method such as, e.g., Monte Carlo, . Typically, many collocation (or 'sampling') points are needed and therefore model order reduction has been shown to significantly reduce the computational costs, e.g., .
In the case of our thermal benchmark problem, the parameter vector $p = P(\omega)$ is considered as a realization of the random vector, where $\omega \in \Omega_p$ and $(\Omega_p, F, P)$ is the usual probability space. We refer to this as test case [C] and assume that the random variables are independent and uniformly distributed as already introduced in test case [B]
$P(\omega) = [G_x(\omega), G_y(\omega), C(\omega)] \quad \text{with} \quad G_x, G_y \sim U([0,1]) \quad \text{and} \quad C \sim U([1,2]).$
Finally, the statistical moments of the solution $u_\star(x; P(\omega))$ are approximated by the Monte Carlo method, e.g., 
$\mathbb{E}\left[ u_\star(x; P) \right] \approx \frac{1}{n_p} \sum_{j=1}^{n_p} u_\star(x; p^{(j)}) =: \bar u_\star(x),$
$\mathbb{M}^k\left[ u_\star(x; P) \right] \approx \frac{1}{n_p} \sum_{j=1}^{n_p} \left( u_\star(x; p^{(j)}) - \bar u_\star(x) \right)^k =: m_\star^k(x),$
where $p^{(j)}$ are the same $n_p = 1000$ random sample points generated for case [B], and $u_\star$ and $m_\star^k$ are the approximations of the solution and of its $k$-th centered moment ($k > 1$) obtained from Finite Elements, Galerkin RB, DEIM and HR, i.e., $\star \in \{h, \mathrm{RB}, \mathrm{DEIM}, \mathrm{HR}\}$, respectively (e.g., $m_h^3$ is the third statistical moment obtained from Finite Element computations). For the DEIM, we selected a fixed number of $M = 400$ magic points, which is a conservative choice (see Figure 4 for a discussion). In practice, one may choose an adaptive selection strategy.
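The Monte Carlo estimators for the mean and the centered moments amount to simple averages over the sample axis. In the sketch below, `samples` is a hypothetical array holding one nodal solution vector per parameter sample; the field used to fill it is a made-up stand-in for the solver output.

```python
import numpy as np

def mc_moments(samples, orders=(2, 3)):
    """Sample mean and centered moments along the sample axis.

    samples : (n_p, n) array, one nodal solution vector per parameter sample
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    moments = {k: (centered**k).mean(axis=0) for k in orders}
    return mean, moments

rng = np.random.default_rng(0)
n_p, n = 1000, 5
# hypothetical stand-in for u(x; p^(j)): a field that is linear in one random input
g = rng.uniform(0.0, 1.0, n_p)
x = np.linspace(0.0, 1.0, n)
samples = g[:, None] * x[None, :]

u_bar, m_k = mc_moments(samples, orders=(2, 3))
```

The second centered moment computed this way is the biased sample variance (normalization by $n_p$), in line with the formula above.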
An estimate of the normalized root mean square error of the finite element solution due to Monte Carlo sampling can be obtained by
$E_{\mathrm{MC}} = \frac{1}{\sqrt{n_p}\, \left\| \bar u_{\mathrm{FE}} \right\|_{L^2(\Omega)}} \left\| \sqrt{m_h^2} \right\|_{L^2(\Omega)} \approx 1.78 \cdot 10^{-2} \equiv 1.78\%.$
This implies that the accuracy of the reduced models in the prediction of $\bar u_\star$ should be around 2% or better, in order to render the reduced models capable of providing meaningful quantitative statistical information. Figure 5 shows the error in the moments computed for the various reduction techniques with respect to the finite element reference solution
$E_\star^k = \frac{\left\| m_\star^k - m_h^k \right\|_{L^2(\Omega)}}{\left\| m_h^k \right\|_{L^2(\Omega)}}.$
The figure indicates that the approximations of the expected value $\bar u_\star$ are at least accurate up to $10^{-3}$ and thus below (i.e., better than) the sampling accuracy $E_{\mathrm{MC}}$. Generally, the DEIM tends to perform better than the HR; the largest errors are $E_{\mathrm{HR}}^7 = 1.09 \times 10^{-1}$ and $E_{\mathrm{DEIM}}^7 = 3.46 \times 10^{-2}$, respectively.
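The two error measures of this section can be evaluated with a discrete $L^2(\Omega)$ norm. The sketch below assumes nodal quadrature weights (a lumped mass matrix) and, for the sampling error, the usual Monte Carlo form "standard deviation divided by $\sqrt{n_p}$"; both are assumptions of this example, as are the function names and the toy numbers.

```python
import numpy as np

def l2_norm(v, weights):
    """Discrete L2(Omega) norm with nodal quadrature weights (lumped-mass assumption)."""
    return np.sqrt(np.sum(weights * v**2))

def mc_sampling_error(u_bar, m2, n_p, weights):
    """Normalized RMS error of the sample mean (standard deviation / sqrt(n_p) assumed)."""
    return l2_norm(np.sqrt(m2), weights) / (np.sqrt(n_p) * l2_norm(u_bar, weights))

def moment_error(m_red, m_ref, weights):
    """Relative L2 error of a reduced-model moment field w.r.t. the reference."""
    return l2_norm(m_red - m_ref, weights) / l2_norm(m_ref, weights)

# toy numbers: constant mean 1, constant variance 0.25, 100 samples
weights = np.full(10, 0.1)
u_bar = np.ones(10)
m2_ref = np.full(10, 0.25)
e_mc = mc_sampling_error(u_bar, m2_ref, n_p=100, weights=weights)
e_2 = moment_error(1.01 * m2_ref, m2_ref, weights)
```

In the toy setting, the sampling error is $0.5 / \sqrt{100} = 0.05$ and a uniform 1% perturbation of the moment field gives a relative error of 0.01.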

#### 4.5. Accuracy of the HR and DEIM vs. Number of Interpolation Points

Both the HR and the DEIM sample the nonlinearity only at selected entries of the right-hand side: the interpolation points. The HR has one additional parameter describing how many element layers around a certain degree of freedom (DOF) should be considered in order to generate $\Omega^+$, which adds further interpolation points to the RID. The DOFs around which the interpolation points are located are selected based on the criterion described in Section 3.3. In contrast, the DEIM selects the points using only the right-hand side information of the system. For the DEIM, the number of sampling points $M$ is an input parameter describing the dimension of the collateral basis. The effect of the number of points is investigated in the following. Based on the considerations in Section 3.4, both the HR and the DEIM should reproduce the POD solution for a large number of sampling sites, while, for a lower number of points, accuracy is traded for computational efficiency.
For the DEIM, different numbers of interpolation points are considered for both test cases [A] and [B]. The resulting relative errors are compared in Figure 4 in terms of the statistical distribution function $P(t)$ of the relative error, i.e., the probability of finding a relative error $\delta$ that is smaller than or equal to $t$. Obviously, the number of interpolation points has a pronounced impact on the distribution. Generally, the error distribution for a low number of points shows a significant increase of the error due to the DEIM in comparison with the POD. With an increasing number of points, the distribution function approaches the one of the POD. In our tests, the use of more than 300 sampling points improves the accuracy only in a minor way. We note that, in general, the accuracy of the DEIM need not be a monotonic function of the number of interpolation points.
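The distribution function $P(t)$ used in these comparisons is the empirical distribution of the sampled relative errors. A minimal sketch with made-up error values (the function names are chosen here for illustration):

```python
import numpy as np

def error_distribution(errors, t):
    """P(t): fraction of samples whose relative error is <= t."""
    errors = np.sort(np.asarray(errors))
    return np.searchsorted(errors, t, side="right") / errors.size

def quantile_level(errors, prob):
    """Smallest observed error t with P(t) >= prob."""
    errors = np.sort(np.asarray(errors))
    k = int(np.ceil(prob * errors.size)) - 1
    return errors[max(k, 0)]

errors = np.array([4e-3, 1e-3, 2e-3, 8e-3, 1e-3])   # made-up relative errors
P = error_distribution(errors, np.array([0.0, 1e-3, 2e-3, 1e-2]))
t80 = quantile_level(errors, 0.8)
```

Statements such as "80% of the samples lead to errors below a given value" correspond to reading off `quantile_level(errors, 0.8)`.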
For the hyper-reduced predictions, different layers of elements are added in $\Omega^+$ in order to extend the RID. We have considered here $\ell = 1, 2, 3$ and $4$ layers of elements connected to $\Omega^u$, for both test cases [A] and [B]. The resulting relative errors are compared in Figure 6 in terms of the statistical distribution function $P(t)$ of the relative error. With an increasing number of layers, the distribution function approaches the one of the POD. The number of internal points, $l$, increases rapidly when increasing the number of layers. In the case of the DEIM, the growth of the number of interpolation points is much more gradual. More than two layers of elements do not improve the accuracy significantly.
One layer of additional elements gives predictions that are less accurate than those of the DEIM for approximately the same number of internal points/interpolation points (here: $l = 416$ for the HR and $M = 300$ for the DEIM). However, the number of additional points required for the residual evaluation differs considerably: $\bar l = 657$ vs. $\bar M = 1490$. This can be explained by direct comparison of the reduced domains in Figure 7. For the Hyper-Reduction, the RID is rather compact (leftmost plot), while the magic points of the DEIM are rather scattered, thus requiring the temperature evaluation at many additional points.
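The effect of compact versus scattered point sets on the number of support nodes can be reproduced on a toy mesh. The sketch below uses a structured mesh of 4-node quadrilaterals for simplicity (the paper uses quadratic elements, so the absolute ratios differ), and the two point sets are hypothetical.

```python
import numpy as np

def quad_mesh(nx, ny):
    """Connectivity of a structured mesh of 4-node quadrilaterals."""
    node = lambda i, j: j * (nx + 1) + i
    return np.array([[node(i, j), node(i + 1, j), node(i + 1, j + 1), node(i, j + 1)]
                     for j in range(ny) for i in range(nx)])

def support_nodes(conn, points):
    """All nodes of the elements that contain at least one of the given nodes."""
    touching = np.isin(conn, points).any(axis=1)
    return np.unique(conn[touching])

conn = quad_mesh(10, 10)                       # 11 x 11 = 121 nodes
node = lambda i, j: j * 11 + i
compact = [node(i, j) for j in range(4, 7) for i in range(4, 7)]      # 3 x 3 block
scattered = [node(i, j) for j in (1, 5, 9) for i in (1, 5, 9)]        # 9 isolated nodes

n_support_compact = support_nodes(conn, compact).size
n_support_scattered = support_nodes(conn, scattered).size
ratio_compact = n_support_compact / len(compact)
ratio_scattered = n_support_scattered / len(scattered)
```

For the same nine selected nodes, the compact block needs 25 support nodes while the scattered set needs 81, i.e., the analogue of the ratios $\bar l / l$ and $\bar M / M$ is roughly 2.8 versus 9.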
The predictions of the Hyper-Reduction are noticeably less accurate, which can especially be seen by comparing Figure 4 (left, test case [A], black line) and Figure 6 (left, test case [A], red line), where 80% of the samples lead to errors below $\approx 0.002$ for the DEIM and $\approx 0.01$ for the HR. Nevertheless, the accuracy of the hyper-reduced predictions is generally of the same order of magnitude as the accuracy of the DEIM.

#### 4.6. Computing Times

Although the main objective of model reduction is the reduction of computing time, this paper does not aim at benchmarking HR and DEIM. If not optimized properly, runtime benchmarks compare the performance of the implementations rather than that of the algorithms. Nonetheless, the numbers in Table 4 indicate the effect of varying parameters in the respective techniques using our Matlab implementation. In the case of a few magic points, divergence of the fixed point iteration was observed for a small number of configurations in the parametric space. The number of such cases is stated in the table as failures. HR and DEIM have similar performance, although HR seems to be slightly more robust for the problem at hand in the sense that it exhibits slightly fewer failures of the iterative solver. Finally, as expected, the overall runtime scales linearly with the number of solves, i.e., computing test case [B] is approx. 10 times slower than [A].

## 5. Conclusions

The presented study revisits a nonlinear heat conduction problem that is, at first sight, rather simple. In order to accelerate the nonlinear computations, a Galerkin reduced basis ansatz is proposed (see Section 3.1) using preliminary offline computations in the established framework of the snapshot POD.
Both the HR and the (D)EIM can achieve significant reductions of the computing time. These scale approximately with the number of magic points (here: $l$ or $M$) and/or with the number of points connected to the magic points ($\bar l$ and $\bar M$, respectively). In the presented examples, less than 25% of the original mesh was considered in both the (D)EIM and the HR. By virtue of the considerations presented in Section 3.5, the computing times are reduced accordingly.
The selection of the magic points in the HR and the (D)EIM requires the computation of the solution at an increased number of nodes, i.e., $\bar l \ge l$ and $\bar M \ge M$, respectively. The higher the space dimension (2D, 3D, ...), the more scattered the magic points, and the higher the number of degrees of freedom per element, the larger $\bar l$ and $\bar M$ become in comparison to $l$ and $M$.
With respect to the implementation, it shall be noted that the HR is less intrusive than the (D)EIM as it uses standard simulation outputs to generate the modes, while the (D)EIM requires additional outputs for the construction of the collateral basis. Other than that, both techniques can be implemented using mostly the same code, which is consistent with the similarity of both techniques discussed in Section 3.4.

## Acknowledgments

The authors acknowledge the generous funding of the CoSiMOR scientific network by the German Research Foundation/Deutsche Forschungsgemeinschaft (DFG) under grants DFG-FR-2702/4-1, DFG-FR-2702/7-1. Felix Fritzen is thankful for financial support in the framework of the DFG Emmy-Noether group EMMA under grant DFG-FR2702/6. Bernard Haasdonk and Felix Fritzen acknowledge the support via the cluster of excellence SimTech (EXC310). Sebastian Schöps acknowledges support from the Excellence Initiative of the German Federal and State Governments and the Graduate School of CE at TU Darmstadt.

## Author Contributions

All authors performed experiments, contributed to the software and the paper.

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Gubisch, M.; Volkwein, S. POD for Linear-Quadratic Optimal Control. In Model Reduction and Approximation: Theory and Algorithms; Benner, P., Cohen, A., Ohlberger, M., Willcox, K., Eds.; SIAM: Philadelphia, PA, USA, 2017. [Google Scholar]
2. Volkwein, S. Optimal control of a phase-field model using proper orthogonal decomposition. ZAMM 2001, 81, 83–97. [Google Scholar] [CrossRef]
3. Antoulas, A. Approximation of Large–Scale Dynamical Systems; SIAM Publications: Philadelphia, PA, USA, 2005. [Google Scholar]
4. Patera, A.; Rozza, G. Reduced Basis Approximation and a Posteriori Error Estimation for Parametrized Partial Differential Equations; Version 1.0, Copyright MIT 2006-2007, to Appear in (Tentative Rubric) MIT Pappalardo Graduate Monographs in Mechanical Engineering; Massachusetts Institute of Technology (MIT): Cambridge, MA, USA, 2007. [Google Scholar]
5. Haasdonk, B. Reduced Basis Methods for Parametrized PDEs—A Tutorial Introduction for Stationary and Instationary Problems. In Model Reduction and Approximation: Theory and Algorithms; Benner, P., Cohen, A., Ohlberger, M., Willcox, K., Eds.; SIAM: Philadelphia, PA, USA, 2017. [Google Scholar]
6. Barrault, M.; Maday, Y.; Nguyen, N.; Patera, A. An ’empirical interpolation’ method: Application to efficient reduced-basis discretization of partial differential equations. C. R. Math. Acad. Sci. Paris Ser. I 2004, 339, 667–672. [Google Scholar] [CrossRef]
7. Haasdonk, B.; Ohlberger, M. Reduced basis method for explicit finite volume approximations of nonlinear conservation laws. In Proceedings of the HYP 2008, International Conference on Hyperbolic Problems: Theory, Numerics and Applications, Providence, RI, USA, 9–13 June 2008; American Mathematical Society: Providence, RI, USA, 2009; Volume 67, pp. 605–614. [Google Scholar]
8. Drohmann, M.; Haasdonk, B.; Ohlberger, M. Reduced Basis Approximation for Nonlinear Parametrized Evolution Equations based on Empirical Operator Interpolation. SIAM J. Sci. Comput. 2012, 34, A937–A969. [Google Scholar] [CrossRef]
9. Chaturantabut, S.; Sorensen, D.C. Discrete empirical interpolation for nonlinear model reduction. In Proceedings of the 48th IEEE Conference on Decision and Control and the 28th Chinese Control Conference (CDC/CCC 2009), Shanghai, China, 15–18 December 2009; pp. 4316–4321. [Google Scholar]
10. Chaturantabut, S.; Sorensen, D.C. A State Space Error Estimate for POD-DEIM Nonlinear Model Reduction. SIAM J. Numer. Anal. 2012, 50, 46–63. [Google Scholar] [CrossRef]
11. Ryckelynck, D. A priori hypereduction method: An adaptive approach. J. Comput. Phys. 2005, 202, 346–366. [Google Scholar] [CrossRef]
12. Ryckelynck, D. Hyper reduction of mechanical models involving internal variables. Int. J. Numer. Methods Eng. 2009, 77, 75–89. [Google Scholar] [CrossRef]
13. Everson, R.; Sirovich, L. Karhunen-Loève procedure for gappy data. J. Opt. Soc. Am. A 1995, 12, 1657–1664. [Google Scholar] [CrossRef]
14. Astrid, P.; Weiland, S.; Willcox, K.; Backx, T. Missing point estimation in models described by proper orthogonal decomposition. IEEE Trans. Autom. Control 2008, 53, 2237–2251. [Google Scholar] [CrossRef]
15. Carlberg, K.; Bou-Mosleh, C.; Farhat, C. Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations. Int. J. Numer. Methods Eng. 2011, 86, 155–181. [Google Scholar] [CrossRef]
16. Farhat, C.; Avery, P.; Chapman, T.; Cortial, J. Dimensional reduction of nonlinear finite element dynamic models with finite rotations and energy-based mesh sampling and weighting for computational efficiency. Int. J. Numer. Methods Eng. 2014, 98, 625–662. [Google Scholar] [CrossRef]
17. Fritzen, F.; Hodapp, M. The Finite Element Square Reduced (FE2R) method with GPU acceleration: Towards three-dimensional two-scale simulations. Int. J. Numer. Methods Eng. 2016, 107, 853–881. [Google Scholar] [CrossRef]
18. Fritzen, F.; Xia, L.; Leuschner, M.; Breitkopf, P. Topology optimization of multiscale elastoviscoplastic structures. Int. J. Numer. Methods Eng. 2016, 106, 430–453. [Google Scholar] [CrossRef]
19. Dimitriu, G.; Ştefănescu, R.; Navon, I.M. Comparative numerical analysis using reduced-order modeling strategies for nonlinear large-scale systems. J. Comput. Appl. Math. 2017, 310, 32–43. [Google Scholar] [CrossRef]
20. Bathe, K.I. Finite-Elemente-Methoden; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
21. Zienkiewicz, O.; Taylor, R.; Zhu, J. Finite Element Method; Butterworth-Heinemann: Oxford, UK, 2006. [Google Scholar]
22. Kelley, C.T. Iterative Methods for Linear and Nonlinear Equations; SIAM: Philadelphia, PA, USA, 1995. [Google Scholar]
23. Sirovich, L. Turbulence and the dynamics of coherent structures part I: Coherent structures. Q. Appl. Math. 1987, 65, 561–571. [Google Scholar] [CrossRef]
24. Joliffe, I. Principal Component Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2002. [Google Scholar]
25. Ştefănescu, R.; Sandu, A.; Navon, I.M. Comparison of POD reduced order strategies for the nonlinear 2D shallow water equations. Int. J. Numer. Methods Fluids 2014, 76, 497–521.
26. Fritzen, F.; Hodapp, M.; Leuschner, M. GPU accelerated computational homogenization based on a variational approach in a reduced basis framework. Comput. Methods Appl. Mech. Eng. 2014, 278, 186–217.
27. Haasdonk, B.; Ohlberger, M.; Rozza, G. A Reduced Basis Method for Evolution Schemes with Parameter-Dependent Explicit Operators. ETNA Electron. Trans. Numer. Anal. 2008, 32, 145–161.
28. Wirtz, D.; Sorensen, D.; Haasdonk, B. A Posteriori Error Estimation for DEIM Reduced Nonlinear Dynamical Systems. SIAM J. Sci. Comput. 2014, 36, A311–A338.
29. Ryckelynck, D.; Missoum Benziane, D. Multi-level a priori hyper reduction of mechanical models involving internal variables. Comput. Methods Appl. Mech. Eng. 2010, 199, 1134–1142.
30. Ryckelynck, D.; Vincent, F.; Cantournet, S. Multidimensional a priori hyper-reduction of mechanical models involving internal variables. Comput. Methods Appl. Mech. Eng. 2012, 225, 28–43.
31. Ryckelynck, D.; Gallimard, L.; Jules, S. Estimation of the validity domain of hyper-reduction approximations in generalized standard elastoviscoplasticity. Adv. Model. Simul. Eng. Sci. 2015, 2, 6.
32. Maday, Y.; Nguyen, N.; Patera, A.; Pau, G. A General, Multi-Purpose Interpolation Procedure: The Magic Points; Technical Report RO7037; Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie: Paris, France, 2007.
33. Römer, U.; Schöps, S.; Weiland, T. Stochastic Modeling and Regularity of the Nonlinear Elliptic curl-curl Equation. SIAM/ASA J. Uncertain. Quantif. 2016, 4, 952–979.
34. Xiu, D. Numerical Methods for Stochastic Computations: A Spectral Method Approach; Princeton University Press: Princeton, NJ, USA, 2010.
35. Haasdonk, B.; Urban, K.; Wieland, B. Reduced Basis Methods for parameterized partial differential equations with stochastic influences using the Karhunen-Loeve expansion. SIAM/ASA J. Uncertain. Quantif. 2013, 1, 79–105.
Figure 1. Geometry of the planar benchmark problem (left) and nonlinearity of the temperature dependent conductivity $μ$ (right).
Figure 2. Parameter dependent conductivity $μ ( u ; p )$ (top row) and solution $u ( x ; p )$ (bottom row) for three different snapshot parameters.
Figure 3. Decay of the spectrum of $C$ (normalized to the largest eigenvalue $\xi_1$) and the relative projection error $E_m$ of the snapshot data defined by (49).
Figure 4. Statistical distribution function of the relative error of the DEIM for different numbers of magic points $M ∈ { 150 , 200 , 250 , 300 }$ (dimension of POD basis: $m = 32$).
Figure 5. $L^2(\Omega)$ error of the centered moments w.r.t. the finite element solution. Please note that we have set $m_\star^1 := \bar{u}_\star$ for simplicity of notation.
Figure 6. Statistical distribution function of the relative error of the hyper-reduction for different layers of elements added to the RID $ℓ ∈ { 1 , 2 , 3 , 4 }$ (dimension of POD basis: $m = 32$).
Figure 7. Position of the magic points (blue points) and the additional points required for the evaluation of the residual (green points); Hyper-reduction (left) vs. DEIM (middle, right) for $m = 32$.
Table 1. Dimension of the RB vs. projection error for the snapshots.

| Dimension of RB | 16 | 24 | 32 | 48 | 60 |
|---|---|---|---|---|---|
| $E_m$ | $6.629 \times 10^{-3}$ | $4.120 \times 10^{-3}$ | $3.180 \times 10^{-3}$ | $2.017 \times 10^{-3}$ | $1.486 \times 10^{-3}$ |
Table 2. Computed values of $\eta(p)$ for different mode sets and for test cases [A], [B]; the last row represents the upper bound $\eta_{\mathrm{UB}}^{h} \geq \eta(p)$.

| | [A] min. | [A] mean | [A] max. | [B] min. | [B] mean | [B] max. |
|---|---|---|---|---|---|---|
| $m = 16$ | 1.000 | 1.4708 | 1.8248 | 1.000 | 1.3185 | 2.0239 |
| $m = 24$ | 1.000 | 1.9088 | 2.8926 | 1.000 | 1.3187 | 2.6559 |
| $m = 32$ | 1.000 | 1.7273 | 2.7441 | 1.000 | 1.3679 | 2.4393 |
| $m = 48$ | 1.000 | 1.5386 | 2.0232 | 1.000 | 1.3051 | 1.9371 |
| $m = 60$ | 1.000 | 1.5096 | 1.9333 | 1.000 | 1.3447 | 1.8922 |
| $\eta_{\mathrm{UB}}^{h}$ cf. (52) | 62.12 | 64.693 | 70.581 | 62.51 | 74.728 | 82.504 |
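Table 3. Relative error of the Galerkin RB approximation.

| | [A] min. | [A] mean | [A] max. | [B] min. | [B] mean | [B] max. |
|---|---|---|---|---|---|---|
| $m = 16$ | $0.00 \times 10^{-3}$ | $2.54 \times 10^{-3}$ | $7.28 \times 10^{-3}$ | $0.67 \times 10^{-3}$ | $1.75 \times 10^{-3}$ | $7.49 \times 10^{-3}$ |
| $m = 24$ | $0.00 \times 10^{-3}$ | $1.07 \times 10^{-3}$ | $3.80 \times 10^{-3}$ | $0.16 \times 10^{-3}$ | $0.88 \times 10^{-3}$ | $5.29 \times 10^{-3}$ |
| $m = 32$ | $0.00 \times 10^{-3}$ | $0.82 \times 10^{-3}$ | $3.23 \times 10^{-3}$ | $0.18 \times 10^{-3}$ | $0.63 \times 10^{-3}$ | $3.13 \times 10^{-3}$ |
| $m = 48$ | $0.00 \times 10^{-3}$ | $0.31 \times 10^{-3}$ | $1.14 \times 10^{-3}$ | $0.06 \times 10^{-3}$ | $0.33 \times 10^{-3}$ | $2.07 \times 10^{-3}$ |
| $m = 60$ | $0.00 \times 10^{-3}$ | $0.20 \times 10^{-3}$ | $0.70 \times 10^{-3}$ | $0.05 \times 10^{-3}$ | $0.26 \times 10^{-3}$ | $1.55 \times 10^{-3}$ |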
Table 4. Elapsed time and number of failures of the nonlinear solver for the solution of all computations of test case [A] (101 solves) and [B] (1000 solves) for Finite Element (FE), Hyper-Reduction (HR) with varying number of additional element layers $\ell \in \{1, 2, 3, 4\}$, DEIM for $M \in \{200, 250, 300, 400\}$ magic points; all computations are carried out using $N = 32$ POD modes.

| | [A] Time | [A] Fail | [B] Time | [B] Fail |
|---|---|---|---|---|
| FE | 59.9 s | - | 660.5 s | - |
| HR, $\ell = 1$ | 10.6 s | 0 | 118.9 s | 5 |
| HR, $\ell = 2$ | 17.4 s | 0 | 180.6 s | 2 |
| HR, $\ell = 3$ | 24.5 s | 0 | 247.2 s | 1 |
| HR, $\ell = 4$ | 30.9 s | 0 | 303.5 s | 0 |
| DEIM, $M = 200$ | 20.6 s | 0 | 292.3 s | 108 |
| DEIM, $M = 250$ | 21.4 s | 0 | 240.6 s | 7 |
| DEIM, $M = 300$ | 24.4 s | 0 | 249.0 s | 3 |
| DEIM, $M = 400$ | 27.6 s | 0 | 272.7 s | 2 |
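
The elapsed times in Table 4 can be turned into speed-up factors relative to the finite element (FE) run. The following is a minimal sketch in Python, not part of the original study: the dictionary `times` and the helper `speedups` are hypothetical names introduced here, the numbers are copied from Table 4, and the FE time for test case [B] is assumed to be given in seconds like all other entries.

```python
# Elapsed times in seconds, copied from Table 4.
# Each entry maps a method to (time for test case [A], time for test case [B]).
# Assumption: the FE entry for [B] (660.5) is in seconds like the others.
times = {
    "FE": (59.9, 660.5),
    "HR, l=1": (10.6, 118.9),
    "HR, l=2": (17.4, 180.6),
    "HR, l=3": (24.5, 247.2),
    "HR, l=4": (30.9, 303.5),
    "DEIM, M=200": (20.6, 292.3),
    "DEIM, M=250": (21.4, 240.6),
    "DEIM, M=300": (24.4, 249.0),
    "DEIM, M=400": (27.6, 272.7),
}


def speedups(times, reference="FE"):
    """Return {method: (speedup_A, speedup_B)} relative to the reference run."""
    ref_a, ref_b = times[reference]
    return {
        name: (ref_a / t_a, ref_b / t_b)
        for name, (t_a, t_b) in times.items()
        if name != reference
    }


# Print one line per reduced method with both speed-up factors.
for name, (s_a, s_b) in speedups(times).items():
    print(f"{name:12s}  [A]: {s_a:5.2f}x   [B]: {s_b:5.2f}x")
```

These ratios only compare total wall-clock time. They do not account for the solver failures listed in the same table, so a run with many failures (for instance DEIM with $M = 200$ in test case [B]) should be read together with its failure count.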