Steady-State Anderson Accelerated Coupling of Lattice Boltzmann and Navier–Stokes Solvers

We present an Anderson acceleration-based approach to spatially couple three-dimensional Lattice Boltzmann and Navier–Stokes (LBNS) flow simulations. This allows to locally exploit the computational features of both fluid flow solver approaches to the fullest extent and yields enhanced control to match the LB and NS degrees of freedom within the LBNS overlap layer. Designed for parallel Schwarz coupling, the Anderson acceleration allows for the simultaneous execution of both Lattice Boltzmann and Navier–Stokes solver. We detail our coupling methodology, validate it, and study convergence and accuracy of the Anderson accelerated coupling, considering three steady-state scenarios: plane channel flow, flow around a sphere and channel flow across a porous structure. We find that the Anderson accelerated coupling yields a speed-up (in terms of iteration steps) of up to 40% in the considered scenarios, compared to strictly sequential Schwarz coupling.


Introduction
The choice of an optimal numerical solver for a given fluid dynamics problem is often problematic. Memory and runtime requirements, numerical accuracy and stability, treatment of boundaries, parallel scalability and flow physics are only some of the criteria that need to be taken into consideration. In particular, many flow problems cannot be grouped into a particular solver's "favorite problems". One approach to locally exploit the features of different solvers consists in splitting the computational domain and apply different flow solvers on the respective subdomains.
Navier-Stokes (NS) solvers [1,2] and Lattice Boltzmann (LB) methods [3,4] are well established techniques for simulating in-or weakly compressible fluid flow. One or the other method may be advantageous in particular flow scenarios, and both approaches have been compared [5,6] and studied in detail over the last decades. Moreover, it is well-known that Lattice Boltzmann methods yield the Navier-Stokes equations in the asymptotics of a vanishing Mach number (convective scaling), and even result in the incompressibleNavier-Stokes system using diffusive scaling [7]. They can be considered equivalent in that sense. When using the term "Navier-Stokes solver" in the following, we refer to a direct discretization of the (incompressible) Navier-Stokes system and a respective numerical method solving the arising non-linear system. Despite their particular features-high locality (LB), local/simple treatment of complex geometries (LB), low memory requirements (NS), to only name a few-only little effort has been undertaken so far to spatially couple the two approaches [8][9][10][11][12] or, more generally speaking, to couple LB and direct PDE solvers [13,14].
In [10,11], we developed a coupling methodology and applied it to steady-state and recently to transient problems [15]. In the transient case, we considered explicit coupling of a Lattice Boltzmann and an incompressible Navier-Stokes solver. Although good agreement of the velocity profiles could be obtained in various scenarios, we also demonstrated that minor density/pressure perturbations may have an essential impact on the solution. In order to improve the matching of degrees of freedom at the coupling interface between LB and NS subdomains, more enhanced or pressure correction schemes may represent promising approaches. In this contribution, we take a step towards sophisticated coupling of LB and NS based on overlapping Schwarz-based domain decomposition. We restrict considerations to three-dimensional, steady-state scenarios. We discuss three coupling methodologies-referred to as sequential coupling, parallel coupling and Anderson accelerated coupling, cf. Section 2.3-and study their convergence behavior for channel flow scenarios. A particular focus is put on the Anderson acceleration method. This method has been used to accelerate and stabilize fixed-point iterations in, for example, fluid-structure interaction [16], groundwater flow [17] or electronic structure computations [18]. It dates back to 1965 [19], and recently re-attracted interest as it shows great potential for applications in which a Newton matrix is inaccessible or too expensive to store [20,21]. Though the method requires additional computations for processing the coupled variables at the interface, it has become attractive for HPC simulations as it allows for fully parallel execution of two coupled solvers. Schwarz methods for steady-state coupled problems may require many iterations to converge; the Anderson acceleration technique reduces the number of iterations steps. We demonstrate that these features of the Anderson acceleration also hold for LBNS coupling. We further show that our hybrid LBNS method can be used to efficiently solve flow problems with complex geometries, such as the flow through and around porous media. In particular, both LB and NS can be combined to optimally exploit LB and NS features, that is to reduce memory footprint (NS) while applying the local LB algorithm only in selected regions of interest on a potentially refined grid. This is similar to the idea of LB grid refinement; yet, this approach does not only refine the resolution, but also the model for the fluid flow by combining mesoscopic (LB) and continuum (NS) approaches. Besides, capturing different flow physics by using an efficient LBNS coupling may be enabled. For example, the mesoscopic nature of LB allows to access finite Knudsen flows for regimes up to Kn ∼ O(1) [22,23], potentially using higher-order LB velocity space discretizations. Flows in micro-devices such as microfluidic networks may thus be simulated using a NS solver in wider bulk regions and a LB solver in narrow regions. Another mesoscopic feature of LB consists in the strictly local treatment of Brownian fluctuations [24] which is basically impossible in the (incompressible) NS context.
We start with a short introduction to LB and NS solvers and a description of the three coupling methods in Section 2. Implementation of the coupling, validation results and a numerical convergence study of the Anderson acceleration scheme are provided in Section 3. We close with a brief conclusion and give an outlook to future work in Section 4.

Navier-Stokes
We solve the three-dimensional incompressible Navier-Stokes equations with fluid velocity u ∈ R 3 , pressure p ∈ R and kinematic viscosity ν ∈ R + . We apply the finite difference methodology described in [25]. The computational domain is discretized using a Cartesian grid. The unknowns for pressure and fluid velocity are stored according to a staggered scheme: the pressure is evaluated in the cell centers whereas the velocity components u = (u 0 , u 1 , u 2 ) are computed in the midpoints of the cell faces. A mix of first-and second-order finite differences is employed to discretize the differential expressions in Equations (1) and (2); the first-order expressions are used to enhance stability due to the convection term. An explicit time stepping for the velocities and making use of the continuity Equation (1)

Lattice Boltzmann
We consider the Bhatnagar-Gross-Krook (BGK)-based [26] Lattice Boltzmann method which computes the particle distributions f i (x, t) according to the well-known "collide-stream" scheme: Fluid density ρ ∈ R, pressure p ∈ R, momentum ρu ∈ R 3 and equilibrium distributions f eq i are given by where c s := 1/ √ 3 and w i are lattice-specific weights. We consider the discretizations D3Q15, D3Q19 and D3Q27, i.e., Q ∈ {15, 19, 27}. The relaxation time τ is related to the kinematic viscosity of the fluid via ν = c 2 s (τ − 0.5) and is restricted to the interval (0.5, 2) due to stability. It can be shown that the presented Lattice Boltzmann scheme is asymptotically equivalent to solving the Navier-Stokes equations in the weakly compressible limit [27].

Spatial Coupling: General Methodology Domain Decomposition, Interpolation and Unit Conversion
We decompose the computational domain into a LB and NS domain. Both domains overlap, see Figure 1; with LB and NS solver operating on Cartesian grids of mesh size dx LB , dx NS , the thickness of the overlap region is chosen as dx ovl p := k · max{dx LB , dx NS }, k ∈ N, k ≥ 1.
We have found that the size of the overlap region is a rather insensitive parameter in the coupling. Only for k = 1, minor deficiencies in the overall accuracy and coupling convergence could be observed. In all our studies, we choose k = 2 as compromise between minimizing the overlap region on the one hand and tightly coupling flow information of both solvers on the other hand. We expect the application of Cartesian grids-in particular on NS side-to be no strict requirement. Different grids and solvers (such as finite volume or finite element schemes) may be used, and (the thickness of) the overlap region needs to be adapted correspondingly.

Inlet
Outlet Ω NS Ω LB Ω Ovlp Figure 1. Decomposition of the computational domain into subdomains Ω NS and Ω LB for the NS and LB solver, and an overlap region Ω Ovl p .
We construct valid information on the outer overlap boundary cells of LB/NS using second-order interpolation, applied to the NS/LB quantities, see Figure 2. Since NS and LB solver work at different units, velocity and pressure values as well as kinematic viscosity need to be matched. Given a kinematic viscosity ν, characteristic velocity u c and pressure p c on NS side, LB values are computed as follows: where dx LB , dt LB correspond to LB mesh size and time step.
From Lattice Boltzmann to Navier-Stokes Dirichlet velocity boundary conditions are constructed from LB values and are imposed to the NS solver. LB velocity values are interpolated at the NS staggered grid nodes and scaled to NS units (based on Equation (8)). The Dirichlet velocity boundary conditions yield homogeneous Neumann conditions for the pressure [25].

From Navier-Stokes to Lattice Boltzmann
The number of unknown distributions f i , i = 1, ..., Q, is larger than the number of known quantities (pressure, velocity, fluid stresses) on Navier-Stokes side, Q > 1 + 3 + 6. Various approaches to construct the distributions from pressure, velocity [28] and stresses [11,29] have been proposed. We apply the optimization-based approach from [10,11] to obtain a unique set of distributions. Each distribution function is split into equilibrium and non-equilibrium part, To compute f eq i , the LB density and velocity are determined from the known NS pressure and velocity values by second-order interpolation and scaling (see Equation (8)). The NS pressure is not necessarily fixed, but may be shifted by an arbitrary constant throughout the whole computational domain. We introduce a reference pressure p re f which is given by the pressure value at the outlet (in a respective channel scenario). We set this pressure value to zero, p re f = 0. We note that an alternative approach consists in choosing p re f as average of the pressure values in the LBNS overlap region [9]; both approaches showed same performance and comparable accuracy in our simulations. The LB density ρ LB is computed from The non-equilibrium parts f neq i are minimized while retaining the viscous stresses at the LBNS interface. The arising minimization problem reads: This approach is similar to the one taken by entropic LB schemes and Grad's approximation techniques [30]. In contrast to the regular asymptotic Chapman-Enskog expansion of LB towards the Navier-Stokes systems, the right hand side of the last side constraint from Equation (10) is normalized by the fluid density ρ LB to match the incompressible NS description. The minimization problem can be solved analytically by the method of Langrange multipliers and yields a unique solution. This solution is computed from a cell-local, computationally cheap matrix-vector product; the input vector corresponds to the right hand side of the side constraints in Equation (10).

Sequential Coupling
A standard approach to solve steady-state coupled problems is given by Schwarz coupling [31], cf. Algorithm 2. Both LB and NS solver are executed alternately using the other solver's data as boundary conditions. When steady-state is reached, boundary data is extracted and imposed to the other solver, which is subsequently iterated until it reaches steady-state. This alternation is continued until the global solution has converged. Since LB and NS solver need to be executed one after the other, we refer to this method as sequential coupling.
Let t LB/NS par/seq denote the time-to-solution for LB/NS solver in one particular coupling cycle in sequential mode, where par/seq stand for the parallelizable and strictly sequential parts of the underlying program. We assume that both solvers are executed on the same processes. The total time-to-solution t sol seq for the sequential coupling using M processes evolves at where M NS , M LB denote the number of processes used simultaneously for NS and LB. If the numbers of processes M LB , M NS are chosen such that the computational loads of both solvers are perfectly balanced, i.e., and that all computational resources M are used, we obtain Inserting Equations (13)- (15) into Equation (11) yields a comparison of perfectly balanced parallel and sequential coupling: The runtime per coupling cycle of the sequential coupling is thus always slower than in the parallel coupling. Determining the perfect splitting M NS , M LB for each coupling cycle, however, is difficult: computational loads may significantly vary between coupling cycles, and dynamic load balancing is required. Furthermore, the NS/LB solver uses "old" data from the LB/NS solver as boundary input in case of the parallel coupling. The number of coupling cycles needs to be doubled compared to sequential coupling to achieve the same level of convergence.

Anderson Accelerated Coupling
Convergence of the sequential and parallel coupling can be accelerated by applying a fixed-point correction to the interface values, like the Anderson acceleration method [19,21]. For fluid-structure interaction problems, we showed that both sequential and parallel coupling result in nearly the same convergence order if an Anderson acceleration is applied [32]. The parallel Anderson accelerated coupling thus allows to simultaneously solve both problems, while not degrading the convergence order. We therefore restrict our considerations in this section to the parallel coupling, see Algorithm 4.
is fulfilled. For simplification, we abstract Equation (17)  and p NS ovl p , we include in the vector x and, thus, in the acceleration. We call such coupling variables primary variables of the coupling. Secondary variables are coupling variables that are not part of the fixed-point equation, and are deduced from the same linear combination with coefficients α. Algorithm 5 gives a formal description.

Algorithm 5 Anderson acceleration in pseudocode
// send data from LB/NS to NS/LB and solve LB/NS simultaneouslỹ

Implementation
We implement the different coupling approaches in the Advanced Scientific Computing Development Toolkit (ASCoDT) [33], a specialized component-based architecture. Optimized for HPC applications, the component architecture supports multiple component multiple data (MCMD) parallelization, where each parallel application is a component on its own. The LB solver, NS solver and coupling scheme are each modeled as a component and are implemented in C++. For the realization of parallel components, we attach multiple instances of a server responsible for handling incoming connections and interaction calls from other components. The data exchange is established via parallel request-based data communication [34].
The LB solver is given by a 3D test implementation using standard domain decomposition for distributed memory execution.
For solving the NS system, we rely on the in-house code NS-EOF. It uses a modular design separating Cartesian grid data structures, domain decomposition handling, grid traversal, and cell-wise (stencil-like) operator evaluations. The pressure Poisson equation is solved using PETSc, version 3.3 [35].
The implementation of the coupling component, in particular the Anderson acceleration, is realized by reusing a module of preCICE, a coupling library particularly designed for fluid-structure interaction [36].

Validation: Optimization-Based LB Boundary Conditions
Given pressure/density, velocity and fluid stresses, we use the cell-local optimization-based boundary condition on LB side to specify the respective NS values. We validate this boundary treatment for the discretizations D3Q15, D3Q19, D3Q27 in three different scenarios of type "flow between two parallel infinite plates", as illustrated in Figure 3. In all three setups, a cubic domain is considered, consisting of 10 × 10 × 10 inner cells and one additional layer of boundary cells. We impose a parabolic velocity profile and linear pressure drop in the boundary cells of the cube: pressure, velocity values and stresses are specified as required by the optimization problem, cf. Equation (10). We consider a coordinate system aligned with the Cartesian LB grid. The setups are defined as follows: • Normal: the LB grid is aligned with the plates and the main flow direction, cf. Figure 3a. The main flow direction is thus given by (1, 0, 0) . • Diagonal: the LB grid is rotated 45 • away from the main flow axis and kept aligned with the plates, cf. Figure 3b. The main flow direction is hence parallel to (1, 1, 0) . • Bi-Diagonal: the LB grid is rotated such that only one corner is placed on each boundary plane, cf. Figure 3c. The main flow direction is aligned with (1, 1, 1) , the normal of the plates is given by ± The LB viscosity is set to ν LB = 1/6. The avg. inlet velocity is given by u LB = 0.05 for Normal and Diagonal, and by u LB = 0.05 · √ 6 4 for Bi-Diagonal to retain the same Reynolds number as in the other two setups. After reaching steady-state, the profiles of the velocity and the shear stress expression  The results for the different LB discretizations are shown in Figure 4. For all discretizations, both velocity and shear stress are captured correctly. The same holds for the linear pressure drop (not shown in Figure 4).

LBNS Validation: Plane Channel
For validation of the coupled scenario, we consider laminar flow in an empty channel. The channel has a box-like shape, stretching over (l NS x , l NS y , l NS z ) = (4.0 × 2.0 × 2.0); the lower left front corner is located at (0, 0, 0). In the NS simulation, the channel is discretized using 40 × 20 × 20 cells, yielding a uniform mesh size dx NS = 0.1. The viscosity is set to ν = 1.0. On the left side, an inlet velocity profile With the parallel coupling exactly converging 50% slower than the sequential method, we restrict the considerations to the parallel coupling and the Anderson accelerated coupling. The Anderson acceleration starts "remembering and reusing" information only after one coupling cycle. Besides, the LB simulation is initialized at zero velocity and unit density, yielding a significantly different solution than the channel flow prediction of the NS solver in the first coupling cycle. For these reasons, we activate the Anderson acceleration after two coupling iterations. The different coupling methods (parallel/Anderson acceleration) converge towards the same solution, cf. Figure 5. The Anderson acceleration significantly speeds up the convergence. The velocity profiles are measured along a cross-section through the coupling region for the different coupling methods. After the first few iterations, the Anderson acceleration converges significantly faster than the sequential coupling. The overhead of the Anderson acceleration-compared to the parallel coupling scheme-amounts in the current (non-optimized) implementation to approx. 63% (computed from the average computational time over the first 10 coupling cycles). However, only 10-11 Anderson coupling iterations are required to reach an accurate description of the velocity profile, compared to 35 coupling iterations in the case of parallel coupling (17-18 coupling iterations in case of sequential coupling, respectively), cf. Figure 5. Since the Anderson acceleration allows for faster convergence in terms of coupling iterations and yields a better ratio of parallelizable to purely sequential tasks, we conclude that Anderson accelerated coupling is a promising candidate for massively parallel LBNS simulations. We further computed the solution using LB everywhere in the channel at the same resolution dx LB . Figure 6 shows good agreement between the pure LB and LBNS solution for both velocity profile and linear pressure drop.

LBNS: Flow Past Spherical Obstacle
Next, we consider the same channel setup, cf. Section 3.3, and extend it by introducing a spherical obstacle in the middle of the LB domain. We use the half-way bounce back scheme to treat the spherical boundary as no-slip obstacle. The sphere is located at (1.5, 1.0, 1.0) and has a radius r = 0.25. The coupled setup is shown in Figure 7. The three coupling schemes show similar convergence properties compared to the empty channel scenario, see Figure 8. We further compared the steady-state LBNS solution with the pure LB simulation. For this purpose, we consider the profiles before (x = 0.5), at (x = 1.5) and behind (x = 3.0) the obstacle at respective cross-sections. As it can be seen from Figure 9, all profiles are in excellent agreement.

Anderson Acceleration: Parameter Study
In order to further evaluate the performance of the Anderson acceleration applied to our coupled simulation (cf. Section 2.3.4), we reconsider the plane channel scenario, cf. Section 3.3. We quantify the convergence rate using the relative convergence criterion The criterion is evaluated for each coupling variable individually between two successive iterations. This yields a zick-zack-like convergence of both LB and NS solver for the parallel coupling, cf. Figure 10a. Both LB and NS solver converge at the same rate, but convergence is shifted by one iteration.
To avoid imbalances in the least-squares system of the Anderson acceleration, that is to put more weight on either LB or NS, we extend the formulation by normalizing the vectors of coupling variables u NS ovl p , p NS ovl p , and u LB ovl p . Slightly faster convergence is observed for the normalized variants compared to the original scheme, see Figure 10b. The first zick-zack of the parallel coupling is also visible in this case, due to the initialization of the coupled setup: zero velocity is initially imposed in the LB region, and at least one coupling iteration is required to transport flow information from NS into the LB region. Since this still implies very low velocities in the LBNS overlap, the solution and thus the convergence rate are hardly affected. The mean speed of convergence of the normalized Anderson accelerated coupling is significantly faster than for the parallel coupling, see Figure 10a.  Table 1 lists the number of iterations that different Anderson acceleration approaches need to converge below a threshold of 10 −5 and 10 −7 . The maximum number of iterations in u NS ovl p , u NS ovl p or p NS ovl p (shown as bold number in Table 1) dictates the overall number of iterations that are required for convergence. Similarly, in Table 2, we study the effect of different ratios of Lattice Boltzmann to Navier-Stokes cells on the convergence rate. Several aspects can be observed in both tables and are detailed in the following. Normalizing the vectors of coupling variables has an influence on the convergence speed, and can shift the bias from the Navier-Stokes to the Lattice Boltzmann variables and vice versa, compare also Figure 10b. This influence, however, is very limited in this example which is mainly due to the fact that all coupling variables already live on the same characteristic physical scale. Furthermore, Table 1 shows that the normalization is not necessarily the optimal balance. Due to its very limited influence, we omit further detailed studies here.
If only coupling variables of one solver are considered, the coupling information is incomplete, and the overall scheme does not converge in this case.
Consider Table 1. Introducing p NS ovl p as a third primary coupling variable does not imply faster convergence. Apparently, this does not result in new information compared to u NS ovl p . We can even observe a slight decrease in the overall convergence due to an imbalance in LB and NS information within the least-squares system.
Consider Table 1. If p NS ovl p is used as a primary variable instead of u NS ovl p , we observe a slightly slower convergence. This substitution, however, reduces the size of the least-squares system by a factor of ∼3 (the size of the vector u LB ovl p ∈ R 5760 is much smaller than the size of u NS ovl p ∈ R 302580 in this example).
Consider Table 2. The ratio of of Lattice Boltzmann and Navier-Stokes cells in the overlap region has no significant influence on the convergence behavior. We conclude from this parameter study that the usage of the two velocity vectors u NS ovl p and u LB ovl p as primary variables is recommended. A substitution of u NS ovl p by p NS ovl p can be useful for scenarios, where the size of the least-squares system matters, i.e., scenarios with a big overlap region compared to the overall domain. The normalization of coupling variables as well as the ratio between the number of Lattice Boltzmann and Navier-Stokes cells have little influence on the convergence behavior.

LBNS Showcase: Flow in Porous Structures
Finally, we apply the LBNS methodology to flow through a porous structure [5,37,38], pointing out the applicability of our method to parallel (component-based) simulations with complex geometries. The setup is shown in Figure 11 and consists of a channel domain Ω of size 32.0 × 16.0 × 16.0 with a set of 100 random-sized rigid spheres packed inside a volume Ω LB of 8.0 × 8.0 × 8.0. This volume, i.e., its lower left front corner, is located at 8.0 × 4.0 × 4.0. Due to the complex geometry of the porous medium-like sphere packing, we apply the highly localized LB method in this region Ω LB . The rest of the channel Ω NS = Ω\Ω LB is solved by NS. We discretize Ω NS with Cartesian grid cells of equidistant size dx NS = 0.1 and Ω LB with cells of size dx LB = dx NS /4 = 0.025 respectively, resulting in 320 × 160 × 160 NS and 321 × 321 × 321 LB nodes. The fluid parameters are chosen as ν = 1.0 and τ = 1.0, the avg. inlet velocity is set to u NS c = 1.0. Ten iterations of the parallel coupling scheme are carried out using 32 NS processes and 64 LB processes. A visualization of the flow field around and in the porous structure after the tenth coupling iteration is shown in Figure 12.

Conclusions
We presented a novel spatial coupling method for fully three-dimensional, steady-state simulations. The method combines Lattice Boltzmann and incompressible Navier-Stokes solvers and allows to locally exploit the features of each simulation approach. We validated the LB boundary conditions used in the LBNS overlap region, studied the Anderson acceleration method for LBNS coupling and compared it to purely sequential and simple Schwarz-based parallel coupling schemes. All coupling schemes converged in the presented test cases, i.e., plane channel flow and flow past a spherical obstacle, towards the expected solution. The Anderson acceleration method showed very good performance in terms of convergence rates which clearly outperformed the sequential and simple parallel coupling (11 vs. 18/35 coupling cycles, cf. Section 3.3). We currently work towards incorporating the iterative LBNS coupling into transient LBNS simulations to investigate whether this approach reduces any compressibility perturbations in the flow field. Besides, more detailed studies are required in future to evaluate the influence of normalization of the different coupling vectors (cf. Section 3.5). Although the influence was limited in the presented study, it may become more significant in case of more complex scenarios, or in case of bigger gradients in the coupling region.
Author Contributions: Atanas Atanasov implemented large parts of the coupling within a component-based framework for scientific computing. Benjamin Uekermann contributed coupling algorithms and implementations (in particular the implementation of the Anderson Acceleration), mostly within the preCICE library. Carlos A. Pachajoa Mejía carried out the validation of LB boundary conditions. Hans-Joachim Bungartz supervised the work, coordinated investigations and significantly contributed to the arrangement of this article. Philipp Neumann supervised parts of the work, in particular with regard to the mapping of degrees of freedom between LB and NS solvers. He further developed and revised the optimization-based LB boundary condition and other Lattice Boltzmann-specific aspects.

Conflicts of Interest:
The authors declare no conflict of interest.