Enhancing Computational Precision for Lattice Boltzmann Schemes in Porous Media Flows

Abstract: We reassess a method for increasing the computational accuracy of lattice Boltzmann schemes by a simple transformation of the distribution function originally proposed by Skordos which was found to give a marginal increase in accuracy in the original paper. We restate the method and give further important implementation considerations which were missed in the original work and show that this method can in fact enhance the precision of velocity field calculations by orders of magnitude and does not lose accuracy when velocities are small, unlike the usual LB approach. The analysis is framed within the multiple-relaxation-time method for porous media flows, however the approach extends directly to other lattice Boltzmann schemes. First, we compute the flow between parallel plates and compare the error from the analytical profile for the traditional approach and the transformed scheme using single (4-byte) and double (8-byte) precision. Then we compute the flow inside a complex-structured porous medium and show that the traditional approach using single precision leads to large, systematic errors compared to double precision, whereas the transformed approach avoids this issue whilst maintaining all the computational efficiency benefits of using single precision.


Introduction
The lattice Boltzmann (LB) method solves a discrete, meso-scale form of the Boltzmann equation [1], which can be shown to reduce to the incompressible Navier-Stokes equation in the low Mach number limit.Because of its computational efficiency and simplicity, the LB approach is now widely-used in many fields of computational fluid dynamics such as thermal and multiphase flows [2] and reactive transport [3], while rapidly developing in applications such as turbulent flows [4].
In a 1993 paper, Skordos presented an approach to increase the computational precision of lattice Boltzmann schemes in which the distribution functions were transformed by negating the equilibrium zero-velocity distribution function [5].In principle, this should have maintained more significant bits during the calculation, leading to higher accuracy which no longer depended on the local velocity.Curiously though, when compared with the standard method of calculation involving the unmodified distribution functions, the approach was shown only to provide a very minor benefit to the accuracy of the calculation, which was still strongly dependent on the velocity (see Figure 4 in [5]).
In this work, we revisit the idea and show that application of the method with an important extra consideration can indeed lead to orders of magnitude increase in the accuracy of LB calculations while incurring no extra computational cost.The velocity-dependence of the accuracy, which is observed with the standard approach, is also removed.In the first section, we describe the method and highlight important considerations in the implementation that were missed in Skordos' original paper which probably account for the underwhelming result.Then we demonstrate in a simple capillary how the accuracy of the solution is greatly improved, and unfavourable velocity-dependence of the accuracy is removed.Finally, the method is applied to the calculation of single-phase flow in a complex porous medium and shown to be critical to obtaining the correct flow-field if single precision is to be used.Many researchers have incorporated this method into their codes in recent years, including the open-source codes Palabos [6] and OpenLB [7].Dellar, for example, also used Skordos' approach [8] but, to the best of our knowledge, the full potential of this method has not been shown or quantified in the literature.This paper is intended to serve as a concise manual which will help LB practitioners fully exploit this simple but highly effective method.
The performance of optimised implementations of the LB model are known to be bound by memory bandwidth on CPUs and graphics processing units (GPUs).Using the single precision (float) datatype rather than double precision can provide up to twice the throughput between the main memory and the cache, offering a comparable performance enhancement.Cache occupancy is also doubled, resulting in fewer cache-misses.Single-precision arithmetic itself is faster than double precision in GPUs, and can be on CPUs if compiled using optimal hardware instructions.The performance benefits of using single precision rather than double precision are greatly advantageous in multi-component or reactive flow simulations which may take days to run, as well as the reduced memory requirement, which is often a limiting factor on modern GPUs.
Here, we apply LB to single-phase flow calculations in a porous medium, such as is used for permeability and transport calculations in petro-physical analysis, as a practical example.Pore structures in certain carbonate rocks can be extremely heterogeneous [9] and as such lead to wide flow-velocity distributions.When simulating flow in these cases, it is important not to have velocity-dependence in the accuracy of the calculation as low velocity regions are found to play an important role in transport properties [10,11].
We use an LB model which is particularly suited to porous media flows in this work.Although the Bhatnagar-Gross-Krook (BGK) approximation of the collision operator in conjunction with the halfway bounce-back scheme for treating solid-fluid boundaries is most commonly found in the literature, this model suffers from deficiencies such as viscosity-dependent slip at the walls [12] and low numerical stability.As such, it is difficult to obtain the true flow properties of a porous medium with this method.Instead, the multiple-relaxation-time (MRT) model offers much greater flexibility by transforming the distribution function into a set of moments, each of which may be relaxed with an individual rate [13,14].By tuning the unphysical relaxation parameters, viscosity-independence can be achieved with the bounce-back boundary conditions [15].Although boundary interpolation schemes demonstrate slightly more consistent behaviour [16,17], we have found that the standard bounce-back method gives accurate results for flow in complex porous media [18] and remain with this approach here for its computational efficiency.The fluid is driven by a body-force term, for which a precise treatment is needed to eliminate error terms in velocity gradients [19] and is incorporated into the MRT framework [20,21].

Method
The multiple-relaxation-time (MRT) calculation proceeds through the following steps.Firstly, the macroscopic node variables density ρ and velocity u are obtained from the distribution function f (x, t) The definition for velocity is according to the forcing scheme of Guo [21] where g is a body-force, and e i is the ith of the 19 velocity vectors at each node (Figure 1) which are defined as where c = dx/dt is the lattice speed.The lattice spacing dx and time-step dt are both unity.The equilibrium distribution ( , ) and forcing term are then computed from the density and velocity of the node where the components are [21] ( Here is the weight coefficient for the corresponding velocity vector, given by = 1 3 ⁄ , = 1 18 ⁄ , = 1 36 ⁄ and is the identity matrix.The post-collision distribution function ( , ) is computed using the MRT operator where is the orthogonal matrix which transforms the distribution function into moment space and is its inverse. is the diagonal matrix of relaxation rates for each moment.Full details of the MRT model can be found in [13].Finally, the post-collisional distribution is streamed to the neighbouring nodes To illustrate how accuracy is lost when computing the algorithm numerically, we consider an expansion of the summations involved in computing the macroscopic velocity at the node Each component of the velocity is a summation of differences (subtractions).Physically this is the difference between vectors of opposing direction (Figure 1).If the node velocity is very small (and hence the distribution is quite symmetric), these differences can become smaller than the ability of the floating point data type to resolve, and the calculation becomes unstable.
Floating point types are stored as two parts: a mantissa (1.2666665) and exponent (−1) in the example The mantissa in single precision (4-byte float) types is accurate to around 7 decimal places (in base 10), and the exponent ranges from −38 to +38.Therefore a difference can only be resolved to an accuracy of 10 the largest value in the negation.For a node with a small velocity, and distribution close to the zero-velocity (equilibrium) distribution ( = 1, = ), defined in D3Q19 by The equilibrium distribution f Eq (ρ, u) and forcing term F are then computed from the density and velocity of the node where the components are [21] Here w i is the weight coefficient for the corresponding velocity vector, given by w 0 = 1/3, w 1´7 = 1/18, w 8´18 = 1/36 and I is the identity matrix.
The post-collision distribution function f 1 (x, t) is computed using the MRT operator where M is the orthogonal matrix which transforms the distribution function into moment space and M ´1 is its inverse.S is the diagonal matrix of relaxation rates for each moment.Full details of the MRT model can be found in [13].Finally, the post-collisional distribution is streamed to the neighbouring nodes To illustrate how accuracy is lost when computing the algorithm numerically, we consider an expansion of the summations involved in computing the macroscopic velocity at the node ρu " ¨f1 ´f2 `f7 ´f8 `f9 ´f10 `f15 ´f16 `f17 ´f18 Each component of the velocity is a summation of differences (subtractions).Physically this is the difference between vectors of opposing direction (Figure 1).If the node velocity is very small (and hence the distribution is quite symmetric), these differences can become smaller than the ability of the floating point data type to resolve, and the calculation becomes unstable.
Floating point types are stored as two parts: a mantissa (1.2666665) and exponent (´1) in the example 1.2666665 ˆ10 ´1 The mantissa in single precision (4-byte float) types is accurate to around 7 decimal places (in base 10), and the exponent ranges from ´38 to +38.Therefore a difference can only be resolved to an accuracy of 10 ´7 the largest value in the negation.For a node with a small velocity, and distribution close to the zero-velocity (equilibrium) distribution f 0 (ρ 0 = 1, u 0 = 0), defined in D3Q19 by i " 1 to 6 2.7777778 ˆ10 ´2 i " 7 to 18 (9) only velocities larger than O(10 ´2) ¨10 ´7 = 10 ´9 can be resolved at all, and the calculation is verifiably unstable when velocities are less than 10 ´7 since only a few decimal places accuracy is maintained.When computing the flow in complex geometries, often with minimal connectivity, the magnitude of the velocity vectors can become smaller than this.A double precision implementation can handle this easily, but comes with the drawback of requiring considerably more compute time and memory than float precision.The small velocity can, in some cases, be counteracted by choosing a larger body-force.However if this is too large, it can exceed the model stability limits and the calculation will fail.Non-Darcy effects may also begin to appear in faster flow paths if the Reynolds number becomes too high [22].In these cases, slow flows are desirable for accurate permeability calculation.Ideally, it should not become an art to choose an appropriate body-force for a given simulation domain; often the connectivity in a simulation is not known beforehand.
The calculation variables can be transformed so that greater accuracy is obtained.Instead of storing the distribution function f (x, t), we define a perturbation df from the zero-velocity distribution in the following way, as was suggested by Skordos [5] f px, tq " f 0 `df px, tq so that the distribution function at the node is decomposed into a reference distribution, chosen as the zero-velocity equilibrium distribution f 0 (ρ 0 , u 0 ) with ρ 0 = 1 and u 0 = 0, and a perturbation df.The appropriateness of this transformation rests on the lattice Boltzmann algorithm being memory-bandwidth-limited.This means that it would be preferable to compute the full distribution function locally by adding together f 0 dbl and df f lt using double precision arithmetic (and where the subscripts dbl and flt express the variable's data type), storing df as a single-precision (float) data type in the main memory.However, even this is unnecessary if we transform the macroscopic quantities ρ " Although we are still computing differences in the expression for velocity, the zero-velocity expression for the distribution df is 0, so these differences are not bound by the order of magnitude of the calculation values, and are computed to full 7 d.p. accuracy.Note that the expression for the local density is still bound by the order 10 ˝.
The difference f Eq ´f which arises in the collision term should be considered as well to maximise the accuracy of the calculation.Writing this in terms of the deviation distribution df, and using the subscripts dbl and flt to indicate the precision (double and single respectively) to which the variables might be stored or calculated, we obtain Computation 2016, 4, 11 5 of 9 The difference could be computed to double precision and converted back to single precision afterwards.This is because the node distribution and equilibrium distribution will be of order 10 ´1, but the difference often considerably smaller.
Finally though, the need for double precision can be avoided completely if we first compute the equilibrium distribution as a perturbation from the zero equilibrium df Eq = f Eq (ρ, u) ´f 0 (ρ 0 , u 0 ) such that The density difference ρ ´ρ0 will be small relative to the density values of order 10 0 .However, this calculation can be made more accurate if we identify the following ρ ´ρ0 " We must explicitly compute this density difference from the right hand side summation and not via the densities themselves, otherwise precision is lost.This substitution is particularly important in the low velocity limit and was not mentioned in Skordos' original paper which only considered how the now comparable magnitudes of each term in (our) Equation ( 14) increases accuracy under addition, rather than the precision of each individual term [5].Using the transformed quantities, the evolution of the LB equation is given as In the single-relaxation-time (BGK) scheme, the collision part would similarly be written with a relaxation time τ df 1 px, tq " df px, tq `1 τ ´df Eq ´df ¯`ˆ1 ´1 2τ ˙F (18)

Results
The two methods of calculating the collision term are compared by computing the velocity distribution U(x) between two infinite parallel plates of separation L (Figure 2), for which the analytical solution is well known, using a viscosity µ and body-force component g in a direction parallel to the plates: Computation 2016, 4, 11 5 of 9 We must explicitly compute this density difference from the right hand side summation and not via the densities themselves, otherwise precision is lost.This substitution is particularly important in the low velocity limit and was not mentioned in Skordos' original paper which only considered how the now comparable magnitudes of each term in (our) Equation ( 14) increases accuracy under addition, rather than the precision of each individual term [5].Using the transformed quantities, the evolution of the LB equation is given as In the single-relaxation-time (BGK) scheme, the collision part would similarly be written with a relaxation time

Results
The two methods of calculating the collision term are compared by computing the velocity distribution ( ) between two infinite parallel plates of separation (Figure 2), for which the analytical solution is well known, using a viscosity and body-force component in a direction parallel to the plates: The method in which the complete distribution function is used, Equation ( 5) is referred to as the distribution method, and the scheme evolving the perturbation from the zero distribution, Equation ( 16) is referred to as the perturbation method.The deviation from the analytical solution for each method is computed as the relative error The method in which the complete distribution function is used, Equation ( 5) is referred to as the distribution method, and the scheme evolving the perturbation from the zero distribution, Equation ( 16) is referred to as the perturbation method.The deviation from the analytical solution for each method is computed as the relative error where u(x) is the computed velocity component in the body-force direction as a function of position x between the plates.For assuredly good convergence, computations are run for 10 6 time-steps.The error is shown in Figure 3 for the distribution and perturbation methods using respectively float (single-precision) data types and double (precision) data types.The body force used was g = 10 ´6 in dimensionless lattice units (Lu) and the viscosity µ " 1 6 For this system, the perturbation methods are consistently 2 to 3 orders of magnitude more accurate than the distribution method of the same data type, though the perturbation method with float precision cannot match the accuracy of the double-precision distribution method in this calculation.
Computation 2016, 4, 11 6 of 9 data types and double (precision) data types.The body force used was = 10 in dimensionless lattice units (Lu) and the viscosity = For this system, the perturbation methods are consistently 2 to 3 orders of magnitude more accurate than the distribution method of the same data type, though the perturbation method with float precision cannot match the accuracy of the double-precision distribution method in this calculation.To illustrate how the accuracy of the perturbation scheme is freed from dependence on the magnitude of the velocities, the relative error averaged across the flow profile is plotted in Figure 4 for different average flow velocities.These were obtained by systematically reducing the body-force.It is clear that as the distribution methods lose accuracy for lower lattice velocities, the perturbation methods maintain a consistent relative error.We also note that the accuracy of the distribution methods converge to those of the perturbation schemes as the magnitude of the velocities approaches that of the zero equilibrium distribution Equation (9).Finally, the accuracy of the perturbation method with float precision can exceed that of the distribution method with double precision at grid velocities below around 10 .To illustrate how the accuracy of the perturbation scheme is freed from dependence on the magnitude of the velocities, the relative error averaged across the flow profile is plotted in Figure 4 for different average flow velocities.These were obtained by systematically reducing the body-force.It is clear that as the distribution methods lose accuracy for lower lattice velocities, the perturbation methods maintain a consistent relative error.We also note that the accuracy of the distribution methods converge to those of the perturbation schemes as the magnitude of the velocities approaches that of the zero equilibrium distribution Equation (9).Finally, the accuracy of the perturbation method with float precision can exceed that of the distribution method with double precision at grid velocities below around 10 ´11 .
To demonstrate the practical advantages of the enhanced accuracy afforded by the perturbation scheme at low velocities, we compute the flow in a 3D pore space image of a Portland carbonate rock sample (Figures 5 and 6) obtained from micro-CT imaging [10].The sample is 400 3 lattice units in size, but reflected about the x = 0 plane to give continuous loop boundary conditions so that a body-force may be used [10].The structure of the pore-space is highly heterogeneous and of low permeability.
The body-force used was again g x = 10 ´6 and the viscosity µ " 1 6 .

6
It is clear that as the distribution methods lose accuracy for lower lattice velocities, the perturbation methods maintain a consistent relative error.We also note that the accuracy of the distribution methods converge to those of the perturbation schemes as the magnitude of the velocities approaches that of the zero equilibrium distribution Equation (9).Finally, the accuracy of the perturbation method with float precision can exceed that of the distribution method with double precision at grid velocities below around 10 .To demonstrate the practical advantages of the enhanced accuracy afforded by the perturbation scheme at low velocities, we compute the flow in a 3D pore space image of a Portland carbonate rock sample (Figures 5 and 6) obtained from micro-CT imaging [10].The sample is 400 3 lattice units in size, but reflected about the = 0 plane to give continuous loop boundary conditions so that a body-force may be used [10].The structure of the pore-space is highly heterogeneous and of low permeability.The body-force used was again = 10 and the viscosity = .The average velocity throughout the medium is given over the simulation time for the different schemes in Figure 7.Most strikingly, the distribution method using floating point precision exhibits a large, systematic deviation from the double precision calculation of almost 50% and as such would give an overestimate of the permeability by the same amount.The perturbation method with float precision on the other hand matches the double-precision distribution calculation closely, yet requires considerably less memory and computing time to perform.Run on a Tesla K20 GPU, the float precision calculations required 27.8 s per 1000 time-steps and the double precision calculation required 48.3 s.In our sparse grid implementation, the array indices of each fluid node's To demonstrate the practical advantages of the enhanced accuracy afforded by the perturbation scheme at low velocities, we compute the flow in a 3D pore space image of a Portland carbonate rock sample (Figures 5 and 6) obtained from micro-CT imaging [10].The sample is 400 3 lattice units in size, but reflected about the = 0 plane to give continuous loop boundary conditions so that a body-force may be used [10].The structure of the pore-space is highly heterogeneous and of low permeability.The body-force used was again = 10 and the viscosity = .The average velocity throughout the medium is given over the simulation time for the different schemes in Figure 7.Most strikingly, the distribution method using floating point precision exhibits a large, systematic deviation from the double precision calculation of almost 50% and as such would give an overestimate of the permeability by the same amount.The perturbation method with float precision on the other hand matches the double-precision distribution calculation closely, yet

Conclusions
We have shown how the simple transformation of the LB distribution function proposed by Skordos [5], does indeed lead to a considerable increase in computational precision when correctly implemented.Furthermore, the accuracy of this approach is no longer dependent on the magnitude of the velocity at the grid node.With application to flows in complex porous media, the modified scheme was shown to accurately compute the velocity field inside a heterogeneous pore-space when the average velocity became small.This means that it is no longer imperative to revert to using double precision arithmetic to achieve numerical stability and accuracy in simulations involving small grid velocities.Since the perturbation method incurs no extra computational burden, the computational efficiency is dramatically improved for these applications.

Figure 1 .
Figure 1.The 19 velocity components of the D3Q19 scheme.

Figure 3 .
Figure 3.The relative error of the flow between plates for different calculation schemes and precisions for a body-force of 10 .

Figure 3 .
Figure 3.The relative error of the flow between plates for different calculation schemes and precisions for a body-force of 10 ´6.

Figure 4 .Figure 4 .
Figure 4.The average relative error of the flow profile between plates for different calculation schemes against mean flow velocity.

Figure 5 .
Figure 5.A sample of Portland carbonate of porosity 9.0%.

Figure 6 .
Figure 6.The velocity distribution computed in the pore space image of the Portland carbonate sample.

Figure 5 .
Figure 5.A sample of Portland carbonate of porosity 9.0%.

Figure 5 .
Figure 5.A sample of Portland carbonate of porosity 9.0%.

Figure 6 .
Figure 6.The velocity distribution computed in the pore space image of the Portland carbonate sample.

Figure 6 .Figure 7 .
Figure 6.The velocity distribution computed in the pore space image of the Portland carbonate sample.