A parallel solver for FSI problems with fictitious domain approach

We present and analyze a parallel solver for the solution of fluid-structure interaction problems described by a fictitious domain approach. In particular, the fluid is modeled by the non-stationary incompressible Navier-Stokes equations, while the solid evolution is represented by the elasticity equations. The parallel implementation is based on the PETSc library, and the solver has been tested in terms of robustness with respect to mesh refinement and weak scalability by running simulations on a Linux cluster.


Introduction
The analysis of fluid-structure interaction (FSI) problems is important for several applications in science and engineering; typical examples that we have in mind are the study of the fluid dynamics of heart valves, particulate flows, and the propagation of free surfaces. FSI problems are challenging both from the mathematical and the computational point of view: the difficulties originate from the necessity of handling interactions between several objects and from the presence of nonlinear terms in the governing equations. As a consequence, several approaches have been presented over the years for the numerical modeling of FSI problems: among them, we mention the Arbitrary Lagrangian Eulerian (ALE) formulation [26,21,22,28], the unfitted Nitsche method [14], the level set formulation [16], and the fictitious domain approach [24,23]. It is important to notice that each method is effective for a selected class of problems, since no single method can be applied to all possible situations.
Our fictitious domain approach with distributed Lagrange multiplier was first considered in [9] as an evolution of the immersed boundary method [34,13], originally introduced by C. Peskin in the Seventies for cardiac simulations of heart valves. This approach is based on the idea that the fluid domain is fictitiously extended to the region occupied by the immersed body. In particular, the fluid is governed by the incompressible time dependent Navier-Stokes equations, while the structure can be characterized by either linear or nonlinear constitutive laws for viscous elastic materials. The fluid dynamics is studied on a fixed Eulerian mesh, while we resort to a Lagrangian description to represent the motion and the deformation of the immersed structure; the Lagrangian frame is built by introducing a reference domain which is mapped, at each time step, into the actual position of the body.
In order to get accurate results, simulations of FSI problems in general require huge computational resources, both in terms of time and memory. In this framework, the design of robust parallel solvers is an important tool to perform, in a reasonable amount of time, computations involving a large number of time steps and fine space discretizations. Several works focus on this task, mainly in the setting of ALE formulations [7,17,20,8,36,25,29] or by applying a variational transfer in order to couple fluid and solid [30,32]. To the best of our knowledge, preconditioners for the fictitious domain formulation with distributed Lagrange multiplier have been introduced only recently in [11].
Here, we continue the analysis of the parallel solver introduced in [11]: we focus our attention on the robustness with respect to mesh refinement, for different choices of the time step, and on the weak scalability. We also describe some implementation issues. The finite element discretization is performed by choosing the (Q2, P1) element for the velocity and pressure of the fluid and the Q1 element for the structure variables; the time marching scheme is a first order semi-implicit finite difference algorithm. Moreover, the fluid-structure coupling matrix is assembled by exact computations over non-matching meshes as described in [10]; the computational cost of this procedure will be discussed. At each time step, the linear system arising from the discretization is solved by the GMRES method, combined with either a block-diagonal or a block-triangular preconditioner. Our parallel implementation is based on the PETSc library from Argonne National Laboratory [6,5] and on the parallel direct solver Mumps [2,3], which is used to invert the diagonal blocks in the block preconditioners.
After recalling some functional analysis notation, in Section 3 we present the mathematical model describing fluid-structure interaction problems in the spirit of the fictitious domain approach. In Section 4, we describe the numerical method implemented for our simulations, and in Section 5 we introduce two possible choices of preconditioner for our parallel solver. Finally, in Section 6 we present some numerical tests aimed at assessing the robustness with respect to mesh refinement and the weak scalability.

Notation
We recall some useful functional analysis notation [31]. Let us consider an open and bounded domain D. The space of square integrable functions is denoted by L^2(D), with scalar product (·, ·)_D. In particular, L^2_0(D) is the subspace of functions with null mean over D. We denote Sobolev spaces by W^{s,q}(D), with s ∈ R referring to the differentiability and q ∈ [1, ∞] to the summability exponent. When q = 2, we adopt the classical notation H^s(D) = W^{s,2}(D). In addition, H^1_0(D) ⊂ H^1(D) is the space of functions with zero trace on the boundary ∂D. For vector valued spaces the dimension is explicitly indicated.

Continuous formulation
We simulate fluid-structure interaction problems characterized by a visco-elastic incompressible solid body immersed in a viscous incompressible fluid. We denote by Ω_t^f and Ω_t^s the two regions in R^d (with d = 2, 3) occupied by the fluid and the structure, respectively, at the time instant t; the interface between these two regions is denoted by Γ_t. The evolution of such a system takes place inside Ω, the union of Ω_t^f and Ω_t^s: this new domain is independent of time and we assume that it is connected and bounded with Lipschitz continuous boundary ∂Ω. It is worth mentioning that, even if we are going to consider only thick solids, the evolution of thin structures can also be treated by our mathematical model.
The dynamics of the fluid is studied by considering an Eulerian description, associated with the variable x. On the other hand, the evolution of the immersed body is modeled by a Lagrangian description: we introduce the solid reference domain B, associated with the variable s, so that the deformation can be represented by the map X : B → Ω_t^s. This means that x ∈ Ω_t^s is the image at time t of a certain point s ∈ B, and the motion of the structure is represented by the kinematic equation

(1)  u_s(x, t) = ∂X/∂t (s, t)  for x = X(s, t),

where u_s denotes the material velocity. The deformation gradient ∇_s X is denoted by F and J(s, t) = det F(s, t). Since we are assuming that the solid body is incompressible, the determinant J is constant in time.
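As a quick illustration of these kinematic quantities (not part of the solver itself), the following Python sketch approximates F = ∇_s X by central finite differences for a simple volume-preserving shear map and checks that J = det F stays equal to 1, in agreement with the incompressibility assumption; the map X used here is purely illustrative.

```python
import numpy as np

def X(s, t):
    # Illustrative deformation map: a simple shear of the reference domain,
    # x1 = s1 + t*s2, x2 = s2, which preserves volume (det F = 1).
    return np.array([s[0] + t * s[1], s[1]])

def deformation_gradient(X, s, t, h=1e-6):
    """Approximate F = grad_s X by central finite differences."""
    F = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = h
        F[:, j] = (X(s + e, t) - X(s - e, t)) / (2 * h)
    return F

s = np.array([0.3, 0.7])
for t in (0.0, 0.5, 1.0):
    F = deformation_gradient(X, s, t)
    J = np.linalg.det(F)
    print(f"t = {t:.1f}: J = {J:.6f}")  # J stays 1 for this shear map
```

For a genuinely incompressible structure, a discrete map X_h only satisfies J = const approximately; this is one role of the pressure-like variable in the solid equations.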
In our model, we consider a Newtonian fluid with density ρ_f and viscosity ν_f > 0, so that the Cauchy stress tensor can be written as

(2)  σ_f = −p_f I + 2 ν_f ε(u_f),

where u_f denotes the velocity of the fluid, p_f its pressure, and I the identity tensor.
In particular, the symbol ε(·) refers to the symmetric gradient ε(v) = (∇v + ∇v^T)/2. Therefore, the dynamics in Ω_t^f is governed by the incompressible Navier-Stokes equations

(3)  ρ_f (∂u_f/∂t + u_f · ∇u_f) = div σ_f,   div u_f = 0   in Ω_t^f.

For the solid, we consider a viscous-hyperelastic material with density ρ_s and viscosity ν_s > 0; for this type of material, the Cauchy stress tensor σ_s can be seen as the sum of two contributions: a viscous part, similar to the one of the fluid,

(4)  σ_s^v = −p_s I + 2 ν_s ε(u_s),

and an elastic part σ_s^e, which can be written, moving from the Eulerian to the Lagrangian setting, in terms of the Piola-Kirchhoff stress tensor P

(5)  P(F(s, t)) = J(s, t) σ_s^e(x, t) F(s, t)^{−T}   for x = X(s, t).
In particular, hyperelastic materials are characterized by a positive energy density W(F), which is related to P by P(F) = ∂W/∂F. Consequently, the elastic potential energy of the solid body can be expressed as

(6)  E(X(t)) = ∫_B W(F(s, t)) ds.

Finally, the system is described by the momentum balance and incompressibility equations for fluid and solid in strong form (7), completed by two transmission conditions enforcing continuity of velocity and stress along the interface Γ_t,

(8)  u_f = u_s   and   σ_f n_f = −σ_s n_s   on Γ_t,

where n_f and n_s denote the outer normals to Ω_t^f and Ω_t^s, respectively. Moreover, we consider initial conditions for the velocity and for the position of the structure, together with boundary conditions for the velocity on ∂Ω (9). The idea of the fictitious domain approach is to extend the first two equations in (7) to the whole domain Ω, so that all the involved variables are defined on a domain which is independent of time. Consequently, following [9], we introduce two new unknowns u and p, defined on all of Ω, which coincide with the fluid velocity and pressure in Ω_t^f and with the corresponding solid quantities in Ω_t^s. In this new setting, (1) becomes a constraint on u, since we have to impose that

(11)  u(X(s, t), t) = ∂X/∂t (s, t)   for s ∈ B.

This condition can be weakly enforced by employing a distributed Lagrange multiplier. To this end, we set Λ = (H^1(B)^d)', the dual space of H^1(B)^d, and denote by ⟨·, ·⟩ the duality pairing between Λ and H^1(B)^d; for Y ∈ H^1(B)^d, this pairing enjoys a useful property discussed in [9]. At this point, following [9,12], the equations in (7), endowed with conditions (8) and (9), can be written in variational form.
Problem 1. Given initial data for the velocity and the structure position, for almost all t ∈ (0, T) find u(t) ∈ H^1_0(Ω)^d, p(t) ∈ L^2_0(Ω), X(t) ∈ H^1(B)^d, and λ(t) ∈ Λ satisfying the variational equations of the fictitious domain formulation [9,12]; in particular, the fluid-structure coupling is enforced by the terms ⟨λ(t), v(X(·, t))⟩ and ⟨µ, u(X(·, t), t) − ∂X/∂t(·, t)⟩. Moreover, ν is the extended viscosity, with value ν_f in Ω_t^f and ν_s in Ω_t^s. For our numerical tests, we consider ν_f = ν_s, since this is a reasonable assumption for biological models [35].
For our simulations, we consider a simplified version of the problem: we drop the convective term of the Navier-Stokes equations and we assume that fluid and solid materials have the same density, i.e., ρ_s = ρ_f. We focus on this case since it is interesting to see how the solver behaves when this assumption is combined with a semi-implicit time advancing scheme in the setting of the fictitious domain approach. This is indeed the critical situation in which the added mass effect can cause instabilities. For instance, in [15] a simplified one dimensional setting is considered for which non implicit schemes are proved to be unconditionally unstable when applied to FSI problems modeled by ALE if fluid and solid have the same density ρ_s = ρ_f; this phenomenon appears regardless of the discrete parameters. In order to alleviate such critical behavior, appropriate treatments of the transmission conditions have been investigated, for instance, in [19,4]. Therefore, the problem we are going to simulate reads as follows.

Discrete formulation
Before discussing the discrete formulation, we remark that, from now on, we focus on two dimensional problems (d = 2).
The time semi-discretization of Problem 2 is based on the Backward Euler scheme. The time interval [0, T] is uniformly partitioned into N parts of size ∆t = T/N; we denote the subdivision nodes by t^n = n∆t. For a generic function g depending on time, setting g^n = g(t^n), the time derivative is approximated as

(14)  ∂g/∂t (t^{n+1}) ≈ (g^{n+1} − g^n)/∆t.

Moreover, the nonlinear coupling terms ⟨λ(t), v(X(·, t))⟩ and ⟨µ, u(X(·, t), t)⟩ are treated semi-implicitly by considering the position of the structure at the previous time step, that is, as ⟨λ^{n+1}, v(X^n)⟩ and ⟨µ, u^{n+1}(X^n)⟩.
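The semi-implicit idea can be illustrated on a scalar model problem: the stiff linear part is treated implicitly (Backward Euler), while the remaining term is evaluated at the previous step, in analogy with the lagging of the coupling terms at X^n above. The coefficients and the function f below are illustrative stand-ins, not quantities from the paper.

```python
import numpy as np

# Model problem u' = -a*u + f(u): "a" plays the role of the stiff linear
# operator (treated implicitly), f the role of the lagged nonlinear term
# (evaluated at the previous step). Values are illustrative only.
a = 50.0
f = lambda u: np.sin(u)

T, N = 1.0, 100
dt = T / N
u = 1.0
for n in range(N):
    # (u^{n+1} - u^n)/dt = -a*u^{n+1} + f(u^n)
    # => only a linear solve in u^{n+1} is needed per step.
    u = (u + dt * f(u)) / (1.0 + dt * a)
print(round(u, 6))
```

The same structure appears in the fully discrete FSI system: lagging the coupling at X^n keeps the linear system per time step at the cost of a conditional (rather than unconditional) treatment of the nonlinearity.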
For the discretization in space, we work with quadrilateral meshes for both fluid and solid. For the fluid, we consider a partition T_h^Ω of Ω with meshsize h_Ω and two finite element spaces V_h ⊂ H^1_0(Ω)^d and Q_h ⊂ L^2_0(Ω) for velocity and pressure, respectively, satisfying the inf-sup condition for the Stokes problem. In particular, we work with the (Q2, P1) pair, one of the most popular Stokes elements, which makes use of continuous piecewise biquadratic velocities and discontinuous piecewise linear pressures.
For the solid domain, we choose a partition T_h^B of B with meshsize h_B, independent of T_h^Ω. We then consider two finite dimensional spaces S_h ⊂ H^1(B)^d and Λ_h ⊂ Λ. We assume S_h = Λ_h and we approximate both the mapping X and the Lagrange multiplier λ with piecewise bilinear elements on quadrilaterals. Other stable combinations of finite element spaces for our class of problems have been studied in [1], both from the theoretical and the numerical point of view.
We notice that, since Λ_h is included in L^2(B)^d, at the discrete level the duality pairing can be replaced by the scalar product in L^2(B)^d. Therefore, we get the following fully discrete problem.
Assuming for simplicity P(F) = κF, Problem 3 can be represented in matrix form as (16). Here, φ_i and ψ_k denote the basis functions of V_h and Q_h, respectively, while χ_j are the basis functions of the space defined on B. We observe that, since S_h = Λ_h, C_s is the mass matrix of S_h or Λ_h.
The matrix in (16) splits into four blocks, where A_11 is related to the fluid dynamics, A_22 to the solid evolution, while A_12 and A_21 contain the coupling terms. Particular attention has to be paid to the assembly of the coupling matrix C_f(X_h^n), since it involves the integration over B of solid and mapped fluid basis functions. In order to compute these integrals, we need to know how each element E of T_h^B is mapped into the fluid domain. We implement an exact quadrature rule by computing, at each time step, the intersection between the fluid mesh T_h^Ω and each mapped solid element X_h^n(E). In order to detect all the intersections, each solid element is tested against all the fluid elements by means of a bounding box technique, which in this particular case is trivial since the fluid is discretized with a Cartesian mesh of squares; the intersections are then explicitly computed by means of the Sutherland-Hodgman algorithm. For more details about this procedure in a similar situation, we refer to [10]. The implementation of this composite rule is quite involved and is not straightforwardly parallelizable.
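For illustration, a minimal Python version of the Sutherland-Hodgman clipping step used in the intersection computation might look as follows; the mapped solid element and the fluid cell below are hypothetical examples, and the actual implementation [10] is considerably more elaborate (composite quadrature, bounding boxes, parallel distribution).

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip a subject polygon against a convex clip
    polygon; both are lists of (x, y) vertices in counter-clockwise order."""
    def inside(p, a, b):
        # p lies on the left of (or on) the directed edge a -> b
        return (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]) >= 0.0
    def intersect(p, q, a, b):
        # intersection of segment p-q with the infinite line through a, b
        den = (p[0]-q[0])*(a[1]-b[1]) - (p[1]-q[1])*(a[0]-b[0])
        t = ((p[0]-a[0])*(a[1]-b[1]) - (p[1]-a[1])*(a[0]-b[0])) / den
        return (p[0] + t*(q[0]-p[0]), p[1] + t*(q[1]-p[1]))
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i+1) % len(clip)]
        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j+1) % len(inp)]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output

# Example: a mapped solid element clipped against one Cartesian fluid cell.
cell = [(0, 0), (1, 0), (1, 1), (0, 1)]
solid = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]
print(clip_polygon(solid, cell))
```

Each resulting intersection polygon is then triangulated (or subdivided) and used as the integration domain for the products of solid and mapped fluid basis functions.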
In general, when P(F) is nonlinear, the resulting nonlinear system of equations is solved by the Newton method.

Parallel preconditioners
The design of an efficient parallel solver concerns two aspects of the numerical method: first, the finite element matrices need to be assembled in parallel on each processor; second, the saddle point system arising from the discretization has to be solved saving computational resources, in terms of both memory and execution time. For this purpose, we implemented a Fortran90 code based on the PETSc library from Argonne National Laboratory [6,5]. This library is built on the MPI standard and offers advanced data structures and routines for the parallel solution of partial differential equations, from basic vector and matrix operations to more complex linear and nonlinear equation solvers. In our code, vectors and matrices are built and subassembled in parallel on each processor.

Our parallel solver adopts two possible choices of preconditioner:
• the block-diagonal preconditioner

  P_diag = [ A_11  0
             0     A_22 ];

• the block-triangular preconditioner

  P_tri = [ A_11  0
            A_21  A_22 ].

We solve the linear system by the parallel GMRES method combined with the action of these preconditioners, which consists in the exact inversion of the diagonal blocks, performed by the parallel direct solver Mumps [2,3].
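The action of the two preconditioners can be sketched in Python on a small synthetic 2x2 block system; the blocks below are random stand-ins for the fluid, solid, and coupling blocks, and a sparse LU factorization plays the role of the exact inversion performed in the paper by Mumps.

```python
import numpy as np
from scipy.sparse import bmat, identity, random as sprandom
from scipy.sparse.linalg import LinearOperator, gmres, splu

rng = np.random.default_rng(0)
n1, n2 = 120, 40
# Synthetic, diagonally dominant stand-ins for the fluid/solid/coupling blocks.
A11 = (sprandom(n1, n1, 0.05, random_state=rng) + 10*identity(n1)).tocsc()
A22 = (sprandom(n2, n2, 0.05, random_state=rng) + 10*identity(n2)).tocsc()
A12 = sprandom(n1, n2, 0.05, random_state=rng)
A21 = sprandom(n2, n1, 0.05, random_state=rng)
A = bmat([[A11, A12], [A21, A22]]).tocsc()
b = rng.standard_normal(n1 + n2)

# "Exact" inversion of the diagonal blocks (here: sparse LU instead of Mumps).
lu11, lu22 = splu(A11), splu(A22)

def apply_diag(r):
    # block-diagonal: solve independently with A11 and A22
    return np.concatenate([lu11.solve(r[:n1]), lu22.solve(r[n1:])])

def apply_tri(r):
    # block-triangular: the solid solve also accounts for the A21 coupling
    y1 = lu11.solve(r[:n1])
    y2 = lu22.solve(r[n1:] - A21 @ y1)
    return np.concatenate([y1, y2])

for name, apply_prec in (("block-diag", apply_diag), ("block-tri", apply_tri)):
    M = LinearOperator(A.shape, matvec=apply_prec)
    x, info = gmres(A, b, M=M, restart=200)
    res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
    print(f"{name}: info={info}, relative residual={res:.2e}")
```

The triangular variant costs one extra sparse matrix-vector product per application (the A_21 term) but typically yields a noticeably lower GMRES iteration count, which is consistent with the behavior reported in the numerical tests.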

Numerical tests
The proposed preconditioners have been widely studied in [11] in terms of robustness with respect to mesh refinement, strong scalability, and refinement of the time step. In this work, after reporting new results in terms of optimality, we analyze the weak scalability of our solver. We consider both linear and nonlinear models for the solid material.
In the GMRES solver, we adopt as stopping criterion a 10^{-8} reduction of the Euclidean norm of the relative residual and a restart parameter of 200. In the case of the nonlinear model, the stopping criterion adopted for the Newton method is a 10^{-6} reduction of the Euclidean norm of the relative residual.
Our simulations were run on the Shaheen cluster at King Abdullah University of Science and Technology (KAUST, Saudi Arabia). It is a Cray XC40 cluster composed of 6,174 dual-socket compute nodes, based on 16 core Intel Haswell processors running at 2.3 GHz; each node has 128 GB of DDR4 memory running at 2300 MHz.

6.1. Linear solid model. We consider a quarter of the elastic annulus {x ∈ R^2 : 0.3 ≤ |x| ≤ 0.5} included in Ω = [0, 1]^2: in particular, the solid reference domain B corresponds to the resting configuration of the body, that is, the quarter annulus itself. The dynamics of the system is generated by stretching the annulus and observing how internal forces bring it back to the resting condition. In this case, Ω_0^s coincides with the stretched annulus. Four snapshots of the evolution are shown in Figure 1.
The solid behavior is governed by a linear model, P(F) = κF with κ = 10. We choose fluid and solid materials with the same density ρ_f = ρ_s = 1 and the same viscosity ν_f = ν_s = 0.1. We impose no slip conditions for the velocity on the upper and right edges of Ω, while on the other two edges we allow the motion of both fluid and structure along the tangential direction. Finally, suitable initial conditions are considered for the velocity and for the position of the structure.

In Table 1, we report the results for the optimality test, where the robustness of the solver is studied by refining the mesh while keeping the number of processors fixed. In particular, we set the time step to ∆t = 0.01, the final time of our simulation to T = 2, and the number of processors to 32.

Table 1. Refining the mesh in the linear solid model. The simulations are run on the Shaheen cluster. procs = number of processors; dofs = degrees of freedom; T_ass = CPU time to assemble the stiffness and mass matrices; T_coup = CPU time to assemble the coupling term; its = GMRES iterations; T_sol = CPU time to solve the linear system; T_tot = total simulation CPU time. The quantities T_coup, its and T_sol are averaged over the time steps. All CPU times are reported in seconds.

The time T_ass needed to assemble the matrix of the problem increases moderately, while the time T_coup, needed for the assembly of the coupling matrix by computing the intersections between the involved meshes, exhibits a superlinear growth. In terms of preconditioners, we can see that block-diag is not robust with respect to mesh refinement, since the number of GMRES iterations grows from 13 to 430; clearly, this phenomenon also affects the time T_sol needed to solve the system. On the other hand, block-tri is robust, since the number of GMRES iterations remains bounded by 14 when the mesh is refined. Therefore, T_sol presents only a moderate growth and, for 1074054 dofs, it is 30 times smaller than the value obtained with the block-diag preconditioner.
The weak scalability of the proposed parallel solver is analyzed in Table 2. Again, we choose T = 2 and ∆t = 0.01, and we perform six tests by doubling both the global number of dofs and the number of processors. Thanks to the resources provided by PETSc, the time T_ass to assemble the stiffness and mass matrices is perfectly scalable. On the other hand, the assembly procedure for the coupling matrix is much more complicated: in order to detect all the intersections between solid and fluid elements, the algorithm consists of two nested loops. For each solid element (outer loop), we check its position with respect to all the fluid elements (inner loop); only the outer loop is distributed over the processors. Consequently, T_coup is not scalable, since the number of fluid elements, traversed serially, increases at each test. We now discuss the behavior of the two proposed preconditioners. It is evident that block-diag is not scalable, since the number of GMRES iterations drastically increases as we increase dofs and procs, clearly affecting T_sol and T_tot. On the other hand, block-tri behaves well: even if it is not perfectly scalable, the number of iterations increases only slightly, from 8 to 18, and T_sol ranges from 2.24 · 10^{-1} s to 11.43 s.

6.2. Nonlinear solid model. For this test, we set again the fluid domain Ω to be the unit square; the immersed solid body is a bar represented, at the resting configuration, by the rectangle B = Ω_0^s = [0, 0.4] × [0.45, 0.55]. During the time interval [0, 1], the structure is pulled down by a force applied at the middle point of the right edge. Therefore, when released, the solid body returns to its resting configuration by the action of internal forces. Four snapshots of the evolution are shown in Figure 2.
The energy density of the solid material is given by the potential strain energy function W of an isotropic hyperelastic material, depending on the trace tr(F^T F) and on the parameters γ = 1.333 and η = 9.242. It can be proved that W is a strictly locally convex strain energy function, as discussed in [18,27,33].
Also for this test we assume that fluid and solid materials share the same density, equal to 1, and the same viscosity, equal to 0.2. The velocity is imposed to be zero on the boundary of Ω, while the following initial conditions are imposed: u(x, 0) = 0, X(s, 0) = s.
Results for the mesh refinement test are reported in Table 3: we consider the evolution of the system during the time interval [0, 2], with time step ∆t = 0.002. The number of processors for the simulations is set to 64, while the number of dofs increases from 21222 to 741702. As for the linear case, T_ass increases moderately and T_coup follows a superlinear growth. Both preconditioners are robust with respect to mesh refinement: the number of Newton iterations is 2 for each test and the average number of GMRES iterations per nonlinear iteration is bounded by 15 for block-diag and by 10 for block-tri. This behavior of block-diag is in contrast with the results obtained for the linear solid model and is due to the finer time step chosen for this simulation.
In order to study the weak scalability, we choose T = 0.1 and ∆t = 0.002. The results, reported in Table 4, are similar to those obtained for the linear case. As before, T_coup is not scalable, due to the algorithm implemented for the assembly of the coupling term. Even if it is not perfectly scalable, block-tri performs rather well, since the average number of linear iterations per nonlinear iteration increases only from 15 to 19. On the other hand, the good behavior of block-diag registered in Table 3 is not confirmed: the average number of linear iterations reaches 101, showing a lack of weak scalability, as already seen in Table 2.

Conclusions
We analyzed two preconditioners, block-diagonal and block-triangular, for the saddle point systems originating from the finite element discretization of fluid-structure interaction problems with the fictitious domain approach. We focused on the case where fluid and solid have the same density, which, based on previous studies, should be the most challenging to solve. In particular, the analysis has been carried out by studying the robustness with respect to mesh refinement and the weak scalability, applying the parallel solver to both linear and nonlinear problems.
Only block-triangular appears to be robust in terms of mesh refinement for both linear and nonlinear problems; on the other hand, block-diagonal works well when the time step is very small. Moreover, the study of the weak scalability highlights two further limitations of the proposed method, which will be the subject of future studies. First, the time to assemble the coupling matrix is not scalable: the assembly is based on two nested loops, related to solid and fluid elements respectively, but only the outer one is distributed over the processors. In order to improve this procedure, one may subdivide the involved meshes into clusters or, alternatively, keep track of the position of the solid body at the previous time instant. Second, since the action of the preconditioners consists in the exact inversion of two matrices, the time for solving the linear system slightly increases when the mesh is refined. Some preliminary results have shown that, in contrast with the fluid block A_11, the solid block A_22 does not behave well when an inexact inversion is applied: our future studies will be primarily focused on overcoming this problem.
Concerning the modeling approach, several extensions are possible: first of all, we have neglected the convective term in the Navier-Stokes equations; second, we have considered an isotropic nonlinear constitutive law for the structure instead of anisotropic variants; finally, we have focused only on two dimensional problems.

Figure 1. Four snapshots of the evolution of the structure with linear constitutive law.


Figure 2. Four snapshots of the evolution of the structure with nonlinear constitutive law.

Table 2. Weak scalability for the linear solid model. The simulations are run on the Shaheen cluster. Same format as Table 1.

Table 3. Refining the mesh in the nonlinear solid model. The simulations are run on the Shaheen cluster (procs = 64, T = 2, ∆t = 0.002). procs = number of processors; dofs = degrees of freedom; T_ass = CPU time to assemble the stiffness and mass matrices; T_coup = CPU time to assemble the coupling term; nit = Newton iterations; its = GMRES iterations to solve the Jacobian system; T_sol = CPU time to solve the Jacobian system; T_tot = total simulation CPU time. The quantities T_coup and nit are averaged over the time steps, whereas its and T_sol are averaged over the Newton iterations and the time steps. All CPU times are reported in seconds.

Table 4. Weak scalability for the nonlinear solid model. The simulations are run on the Shaheen cluster. Same format as Table 3.