MRT-Lattice Boltzmann Model for Multilayer Shallow Water Flow

: The objectives of this study are to introduce a multiple-relaxation-time (MRT) lattice Boltzmann model (LBM) to simulate multilayer shallow water ﬂows and to introduce graphics processing unit (GPU) computing to accelerate the lattice Boltzmann model. Using multiple relaxation times in the lattice Boltzmann model has an advantage of handling very low kinematic viscosity without causing a stability problem in the shallow water equations. This study develops a multilayer MRT-LBM to solve the multilayer Saint-Venant equations to obtain horizontal ﬂow velocities in various depths. In the multilayer MRT-LBM, vertical kinematic viscosity forcing is the key term to couple adjacent layers. We implemented the multilayer MRT-LBM to a GPU-based high-performance computing (HPC) architecture. The multilayer MRT-LBM was veriﬁed by analytical solutions for cases of wind-driven, density-driven, and combined circulations with non-uniform bathymetry. The results show good speedup and scalability for large problems. Numerical solutions compared well to the analytical solutions. The multilayer MRT-LBM is promising for simulating lateral and vertical distributions of the horizontal velocities in shallow water ﬂow.


Introduction
The shallow water equations are used to describe flow in bodies of water where the horizontal length scales are much greater than the fluid depth (e.g., river or lake hydrodynamics, coastal and estuarine circulation, overland flow, etc.). They have wide applications in hydraulic engineering [1][2][3] and coastal engineering [4], and can be used to study main physical phenomena of interest to scientists and engineers such as storm surges [5], tsunami and bore wave propagation [6], the stationary hydraulic jump, forces acting on off-shore structures, and river, reservoir and open channel flows [1,7]. The shallow water equations can also be coupled to transport equations to model pollutant transport [8], salinity and temperature [9], and sediment transport [6,10], which are important subjects in many industrial and environmental projects. More advanced depth-averaged models were developed for shear shallow water flows [11,12] and for coastal waves in the shoaling and surf zones [13,14].
When vertical effects are important, such as in baroclinic regimes, where density varies with salinity and temperature, three-dimensional flow should be used. This would require the solution of a system of equations coupling the Navier-Stokes equation to a moving free surface boundary. However, solving full Navier-Stokes equations in three dimensions is computationally challenging and may have difficulties handling the discontinuities in the free surface. Substantial literature exists on the application of various numerical methods, e.g., finite difference methods (FDMs), finite equations with very small viscosity. This stability would be even worse in the multilayer shallow water equations. This study will introduce MTR-LBM to increase stability and accuracy and eliminate spurious oscillations. Moreover, this study aims to investigate LBM performance on GPU-based HPC environments using MATLAB code and the Jacket GPU engine.

Multilayer Shallow Water Equations
Consider a shallow water flow regime in which the vertical length scale is much smaller than the horizontal length scale. By depth-integrating the continuity equation and the Navier-Stokes equations for incompressible and viscous flows with free surface, the shallow water equations with horizontal viscous terms are [3]: where i and j are Cartesian indices and the Einstein summation convention is used, h is the water depth, u i is the depth-averaged velocity component in the i direction, g is the gravitational acceleration, ν is the kinematic viscosity, F i is the external force acting on the shallow water flow, and t is the time. Based on the multilayer Saint-Venant system [49,50], the governing equations are similar to the above shallow water equations with horizontal viscous terms for each layer and additional terms for transferring momentum between layers: where h ( ) is the local water height in layer , u  [49][50][51], density gradient forcing, F ( ) ρ [52], and forcing from the Coriolis effect (F ( ) Ci ) as follows: where δ M is the Kronecker delta function (δ M = 1, if = M) and τ W i is the wind stress in i direction, which is the product of fluid density (ρ), air density (ρ a ), wind stress coefficient (C W ), wind speed measured at 10 m above water surface (W s ), and wind velocity component in i direction (U Wi ).
Water 2019, 11, 1623 4 of 21 where z b is the bed elevation.
where δ 1 is Kronecker delta function (δ 1 = 1, if = 1), κ is the bottom friction coefficient, and µ is the vertical (kinematic) eddy viscosity, which is also called vertical viscosity [49][50][51] or internal friction coefficient [52]. The linear bed friction law is chosen to calculate bed friction. The 2nd and 3rd terms on the right-hand side of Equation (8) represent internal friction between layers. The vertical eddy viscosity µ is much larger than the kinematic viscosity ν in Equation (4).
is at the center of layer . The fluid density is constant in this study. However, this study follows [52] to include longitudinal density gradient in order to study the baroclinic circulation.
where f c = 2 sin ϕ is the Coriolis parameter, which a function of Earth rotation rate ( ) and latitude (ϕ).

MRT-Lattice Boltzmann Modeling
This study adopts the multiple-relaxation-time (MRT) lattice Boltzmann model [53,54] to solve the multilayer shallow water equations. Specifically, the authors implement the MRT-LBM to a D2Q9 lattice, which defines the streaming velocity as 2c cos 1 4 (2α − 9)π , sin 1 4 (2α − 9)π α = 5, 6, 7, 8 where c α = {c αi } is a streaming velocity along a streaming direction defined by an index α, and c is the lattice speed. Given square lattice size ∆x and time step ∆t, the lattice speed is c = ∆x/ ∆t. Then, the evolution equation for the MRT-LBM on a D2Q9 lattice for each layer is where f ( ) = f ( ) α , α = 0, 1, 2, . . . , 8 ∈ R 9 is a vector of particle distribution functions for layer m ( ) ∈ R 9 and m ( )eq ∈ R 9 are vectors of moments and their equilibria, respectively, for layer , and M ∈ R 9×9 is a transformation matrix that transforms the particle distribution functions and equilibrium distribution functions (EDFs) from velocity space to moment space, which makes m ( ) = M f ( ) and m ( )eq = M f ( )eq . f ( )eq = f ( )eq α , α = 0, 1, 2, . . . , 8 ∈ R 9 is a vector of the EDFs for layer . S = diag(s 0 , s 1 , . . . , s 8 ) ∈ R 9×9 is a diagonal matrix of multiple relaxation rates, where s α are the relaxation rates. right-hand side, particle collision is achieved by linear relaxation processes executed in the moment space. For each time step, particle distribution functions reach their neighboring nodes simultaneously through prescribed lattice connections. The transformation matrix M for D2Q9 is given by Lallemand and Luo [22]: where the vectors {b α } are mutually orthogonal [53,55]. Inserting matrices M and S into Equation (13), the evolution equation in the direction α for each layer becomes [54] f ( ) The moments m ( ) applied to the shallow water equations for each layer are where m xy are related to the diagonal and off-diagonal components of the stress tensor, respectively. When applied to the shallow water equations, the conserved moments are the water depth and the flow momenta in each layer: The remaining moments are not conserved quantities. The EDFs applied to the multilayer shallow water are Equation (17).
where c 2 s = gH/2 is known as the squared speed of sound and is obtained through the recovery of the shallow water equations up to second order by the Chapman-Enskog analysis. For the D2Q9 lattice, the weighting coefficient ω α = 1/3 is for α = 1,2,3,4 and ω α = 1/12 is for α = 5,6,7,8. Since each layer has the same planar discretization, the weighting coefficients ω α remain the same for each layer. Using Equations (14) and (19), the equilibrium moments, m ( )eq = M f ( )eq , are [46] m ( )eq 1 The equilibria of the conserved moments (m ) are equal to themselves. Therefore, the relaxation rates s 0 , s 3 , and s 5 have no effect on MRT-LBM solutions. With the moment equilibria given by Equations (20)-(22), the shallow water equations can be recovered with the shear viscosity (ν) and bulk viscosity (ζ) given [55] as Setting s α = 1/τ, the evolution Equation (13) reduces to the SRT-LBM model, where τ is the relaxation time. The LBM with a single relaxation time is commonly referred to as the Bhatnagar-Gross-Krook (BGK) collision operator model (LBGK), and has been introduced to the shallow water equations [34,56]. For the SRT-LBM, it has ν = 2ζ= 1 3 τ − 1 2 c∆x. When τ is close to 1/2, the kinematic viscosity becomes very small (e.g., ν = 1 × 10 −6 m 2 /s), causing the numerical instability. The MRT-LBM model has no such problem because the relaxation rates can be selected to attain stable solutions. In addition, to increase solution stability, this study adopts an implicit step to update flow velocities [39].

Initial Conditions
The initial conditions for a physical problem to be modeled are given in the form of macroscopic variables, which is normal practice in traditional numerical methods. Since the lattice Boltzmann formulation is based on solving Equation (15), the initial conditions must be written in terms of the distribution function. Given the initial macroscopic boundary conditions, (19)), are computed and used as initial conditions for f ( ) α .

Periodic Boundary Conditions
In cases where the flow pattern repeats itself at boundaries, a periodic boundary condition is required. To achieve boundary conditions in the x direction in the lattice Boltzmann formulation, we set the unknown distribution functions, f  Figure 1a for an example) to be the same as known distribution functions at the outflow boundary, Γ out f low : The unknown distribution functions, f 3 , f 6 and f 7 at the outflow boundary are set to the corresponding known distribution functions at the inflow boundary: Periodic boundary conditions similar to Equations (25) and (26) in the y direction can be formulated.

Solid Boundary Conditions
Solid boundaries in the flow region can be either no-slip or free-slip, which corresponds to zero velocity or zero normal velocity, respectively, at the boundary. A no-slip boundary condition is achieved by the bounce-back scheme under symmetry conditions [21]. We set the unknown distribution functions, , , .
This results in zero normal and tangential flow velocities at the boundary. A free-slip boundary condition is also achieved by the bounce-back scheme [21]. The unknown distribution functions,  , , .
This ensures no flow across the normal direction and flow along the tangential direction.

Open Boundary Conditions
Open boundary conditions refer to known macroscopic boundary values or functions at boundaries (e.g., known water depth at inflow or outflow boundaries). If flow velocity and water depth at the boundaries are known, unknown distribution functions can be computed using Equations (17-18) following the method described by Zou and He [57]. For example, at the inflow boundary (see Figure 1a), Equations (17-18) lead to the following three equations:

Solid Boundary Conditions
Solid boundaries in the flow region can be either no-slip or free-slip, which corresponds to zero velocity or zero normal velocity, respectively, at the boundary. A no-slip boundary condition is achieved by the bounce-back scheme under symmetry conditions [21]. We set the unknown distribution functions, f This results in zero normal and tangential flow velocities at the boundary. A free-slip boundary condition is also achieved by the bounce-back scheme [21]. The unknown distribution functions, f This ensures no flow across the normal direction and flow along the tangential direction.

Open Boundary Conditions
Open boundary conditions refer to known macroscopic boundary values or functions at boundaries (e.g., known water depth at inflow or outflow boundaries). If flow velocity and water depth at the boundaries are known, unknown distribution functions can be computed using Equations (17) and (18) following the method described by Zou and He [57]. For example, at the inflow boundary (see Figure 1a), Equations (17) and (18) lead to the following three equations: Given known flow velocities and water heights in each layer in Equations (29)

GPU Accelerated LBM
This study implements the MRT-LBM code to a GPU architecture to solve the multilayer shallow water equations. The MRT-LBM code was written in MATLAB ® (2008, Natick, MA, USA) and is parallelized with AccelerEyes Jacket GPU Engine and the NVIDIA ® Compute Unified Device Architecture (CUDA) [58] on a single GPU workstation. In all simulations, the computations were performed in a single precision. In this study, the simulations are performed on a single workstation 3.0-GHz Intel ® Core™2 Extreme quad core with an NVIDIA ® Tesla™ C1060 Computing Processor. The NVIDIA ® Tesla™ C1060 Computing Processor contains 240 stream processors running at 1.3 GHz, which has a peak performance of 933 GFLOPs. Readers refer to [46] for detail explanations of how the GPU accesses memory and processes data.

Jacket's GPU Engine
The Jacket GPU engine for MATLAB ® is built on NVIDIA's CUDA technology. It enables a standard MATLAB code to run on the GPU and connects the user-friendliness of MATLAB directly to the speed and visual computing capability of the GPU [59]. MATLAB GPU computing with Jacket starts at the most basic level through the replacement of low-level MATLAB data structures which normally reside on the CPU with data structures that reside on the GPU. This replaces the lowest level of MATLAB's CPU-bound computation engine, moving the work MATLAB would normally do on the CPU to the GPU. Jacket Beta version 0.  [59] on a 32-bit Windows XP with MATLAB R2007b is used. Jacket is run on the 2.0 beta version of the CUDA toolkit for Windows, which uses version 1.1 compute capabilities.
Jacket-enabled MATLAB scripts achieve speed improvements in the range of 2-10 times, and in some cases up to 100x improvements, over equivalent CPU versions. While Jacket accelerates MATLAB functions and computations at a lower level, the overall speedup of an algorithm depends on the nature of the algorithm. The LBM has a very straightforward implementation consisting of only local calculations (collisions) and nearest neighbor memory transfers (streaming), which makes it a great candidate to be implemented both on the GPU and in MATLAB.

Optimizing MATLAB GPU Performance
Implementing algorithms on the GPU using Jacket requires certain considerations to optimize performance. Both MATLAB and Jacket perform best on vectorized code. A vectorized code can make it easy to determine which parts of an algorithm are inherently serial or parallel. Both MATLAB and Jacket take advantage of the inherent parallelism of the MATLAB scripting M-language which is extremely powerful when utilized wisely. A good understanding of the memory dependencies of an algorithm is necessary as CPUs are inherently serial computing devices and GPUs are inherently parallel computing devices. In a program, one can control which segments of code are run on each device through the casting operations. Each casting operation to and from the GPU pushes or pulls data back and forth from CPU memory to GPU memory. Excessive memory transfers should be avoided as they will reduce the performance of an application. The Jacket software minimizes these memory transfers automatically in normal operation. However, care must be taken in implementing an algorithm. Fortunately, the LBM can be completely vectorized and therefore all computations can be carried out on the GPU. Transfers to CPU memory are only necessary for outputting solutions at desired intervals. Currently, a transfer to CPU is necessary for MATLAB plotting routines; however, due to the nature of the GPU, plots can be created through OpenGL.

Computational Aspects
The basic code to be parallelized on the GPU using Jacket is written in MATLAB M-Language and follows the same traditional practice of explicitly separating the collision and streaming operations. The solution algorithm has not changed. However, in order to take advantage of the GPU and MATLAB, the codes must be vectorized. Due to vectorization, the solution procedure focuses on three main steps: the calculation of local macroscopic variables from distribution functions, the collision step and the streaming step. Two copies of the distribution functions are used to allow the code to be vectorized. The vectorized version of the code is straightforward. The Jacket GPU engine makes translating the code on the GPU as simple as casting the variables to the GPU. From there, all calculations are performed on the GPU. Since the LBM is inherently parallel, there is no need to cast variables back to the CPU until the end of the simulation or when variables are written to file.

GPU Performance
The parallel performance on the GPU is investigated in this section. A rectangular lake, 170 km × 60 km, with a flat bottom was used to simulate wind-driven circulation. The model domain was discretized into four different grid cell sizes for comparison. The numbers of columns, rows, and layers are: 171 × 61 × 10, 341 × 121 × 10, 681 × 241 × 10, and 1361×481×10 with cell sizes of 1000 m, 500 m, 250 m, and 125 m, respectively, and 10 layers. Each layer has an initial local water height of 1 m, such that initial water depth is 10 m. The initial flow velocity is zero. We used τ = 0.501 and c = 20 m/s with ∆t calculated as ∆x/c for LBM parameters. Other parameters are f c = 0 s −1 , ρ = 1025 kg/m 3 , ρ a = 1.2 kg/m 3 , C W = 0.0015, µ = 0.01 m 2 /s, κ 0.001 m/s, U Wx =7.4536 m/s and U Wy = 0 (positive x wind direction). According to Equation (6), wind stress is τ W x = 0.1 N/m 2 along the x direction. The EDFs in Equation (19) with static water and 1-m initial local water height determines the initial condition for the distribution functions. We applied free-slip bounce-back boundary conditions to the vertical walls.
The parallel performance of the GPU is based on arithmetic intensity and data access patterns [54]; therefore, the parallel performance is investigated based on the scalability of the speedup with increasing problem size. The simulations were run for a simulation time of 30 h where steady state has been achieved. The average time per time step is investigated to make a fair comparison on computational effort.
The execution time per time step and speedup for the GPU over a single core of the CPU in MATLAB for the four cases are shown in Table 1. It demonstrates the importance of arithmetic intensity in LBM performance on the GPU. If the number of lattice nodes is sufficiently high, the computations outweigh the data transferring overhead, yielding a high arithmetic intensity. The multilayer LB algorithm yields a speedup of approximately 2.2-fold on the smallest number of lattice nodes with the maximum speedup of approximately 22.0-fold on the largest number of lattice nodes.

Verification
We first verify the multilayer MRT-LBM using the steady-state analytical solutions in [52] by neglecting the effects of inertial terms and the unsteady terms for cases of wind-driven, density-driven and combined wind-driven and density-driven circulations. The velocity in the x direction is where H 0 is the still water depth. The free-surface slope term g∂H/∂x is The model is a 3400 m × 1400 m rectangular lake with a flat bottom. The initial water depth is 65 m. Three numerical models with difference number of layers were developed for comparison. The model domain was discretized into 501 × 206 lattices in the planar direction with a cell size of 6.8 m and five, ten and twenty layers. The corresponding initial local water heights for each layer is 13, 6.5, and 3.25 m, respectively. Fluid density is considered to be constant in this study but can have a constant density gradient along the longitudinal direction when density-driven circulation is considered. The LBM parameters are ∆t = 0.17 s and c = 40 m/s. To achieve a kinematic viscosity of ν = 1 × 10 −6 m 2 /s, the relaxation time parameter in the SRT-LBM is given by τ = 1 2 + 3ν/c∆x, i.e., τ = 0.5 + 1.1029 × 10 −8 . For the MRT-LBM, the relaxation rates s 4 =s 6 =s 7 =s 8 =1/τ, and s 1 =s 2 =s 7 −0.6 were used. The linear friction law was used for bed friction. The approach to initialize distribution functions and the boundary conditions to the four vertical walls are the same as in the previous numerical case.
The wind-driven circulation validation is performed for two different wind stress values, τ W iz = 0.03 N/m 2 and τ W i = 0.3 N/m 2 . Wind direction is along the positive x direction. The physical parameters for this case are ∂ρ/ ∂x = 0, ∂ρ/ ∂y = 0, ρ = 1025 kg/m 3 , ρ a = 1.2 kg/m 3 , C W = 0.0015, µ = 0.004 m 2 /s, κ = 0.005 m/s. As shown in Figure 2, the multilayer MRT-LBM solutions from the 5-layer model to the 20-layer model compare well to the analytical solutions for u x profile at the location (x, y) = (1700 m, 700 m) (the center of the lake). Figure 2 indicates that the 10-layer model is sufficient to capture the velocity profile.  Figure 3. Figure 3 indicates that the 10-layer model is sufficient to capture the velocity profile. The density-driven circulation validation is performed for two different horizontal density gradients, ∂ρ/ ∂x = −5 × 10 −7 kg/m 4 and ∂ρ/∂x = −5×10 −5 kg/m 4 . The physical parameters for this case are τ W i = 0, ∂ρ/ ∂y = 0, ρ = 1025 kg/m 3 , ρ a =1.2 kg/m 3 , C W = 0.0015, µ = 0.004 m 2 /s, κ = 0.005 m/s. The multilayer MRT-LBM solutions with three different numbers of layers compare well to the analytical solutions for u x profiles for constant horizontal density as shown in Figure 3. Figure 3 indicates that the 10-layer model is sufficient to capture the velocity profile.    Figure 4, the multilayer MRT-LBM solutions compare well to the analytical solutions for u x profile for the combined effects. Figure 4 indicates that the 10-layer model is sufficient to capture the velocity profile. multilayer MRT-LBM solutions compare well to the analytical solutions for x u profile for the combined effects. Figure 4 indicates that the 10-layer model is sufficient to capture the velocity profile.

Wind-Driven and Density-Driven Circulation in Rotating Basins
In this example, the multilayer MRT-LBM is demonstrated on GPU-based HPC. The study simulates wind-driven, density-driven, and combined wind-density-driven circulations over a Gaussian bathymetry profile [39]: where D is the width of the basin. The x axis coincides with the southern lateral wall of the basin and is pointed toward the head of the system. The y axis is laid along the closed boundary at x = 0. The numerical domain is 100 by 15 km. The maximum depth, 20 m, occurs at y = −3D/10 km. The local maximum depth, 16 m, occurs at y = 3D/10 km. For each case, the grid size is 801 × 121 × 10. The LBM parameters are ∆x = 125 m, ∆t = 6.25 s, and c = 20 m/s. To achieve a kinematic viscosity of ν = 1 × 10 −6 m 2 /s, the relaxation time parameter in the SRT-LBM is τ = 0.5 + 1.2 × 10 −9 . For the MRT-LBM, the relaxation rates, s 4 = s 6 = s 7 = s 8 = 1/τ, and s 1 = s 2 = s 7 − 0.6, were used. All closed boundaries have the free-slip bounce-back boundary condition. The initial water is stationary. The horizontal density gradient is constant. Uniform wind stress was linearly increased for the first six simulated hours and became constant after that. Wind direction is along positive x direction. We executed numerical simulations on a single workstation with a 3.0-GHz Intel ® Core™2 Extreme quad core processor and a NVIDIA ® Tesla™ C1060 Computing Processor. The parallel performance of a single core of the quad core processor and the Tesla are compared.
For the wind-driven case, the physical parameters are τ W i = 0.04 N/m 2 , ∂ρ/ ∂x = 0, ∂ρ/ ∂y = 0, ρ = 1025 kg/m 3 , ρ a = 1. For the density-driven case, the physical parameters are τ W i = 0, ∂ρ/ ∂x = −5 × 10 −8 kg/m 4 , ∂ρ/ ∂y = 0, ρ = 1025 kg/m 3 , ρ a = 1.2 kg/m 3 , C W = 0.0015, µ = 0.004 m 2 /s, κ = 0.005 m/s, and f c = 1 × 10 −4 s −1 . Figure 6 shows the u x and u y distributions at plane x = 50 km and plane x = 98 km. The u x flows in the direction of the horizontal gradient at all depths in the shallow regions along the transverse boundaries and the center of the channel shown in Figure 6a,b. The u x is in the opposite direction of horizontal gradient at deep depths. These flow directions are the opposite of those found in the purely wind-driven case. Highest flow occurs near the surface and flow decreases with depth due to the bottom friction. Figure 6c,d show the transverse flow u y at plane x = 50 km and plane x = 98 km. Although the flow field is reversed for this case, the effect of the Earth's rotation is consistent with the wind-driven case. Moreover, the transverse velocities exhibit similar behavior as in the wind-driven case with stronger magnitude at the x = 98 km plane. However, at the x = 50 km plane, the velocities are small, yet exhibit a vertical distribution of positive and negative flows.
For the combined wind-driven and density-driven case, the physical parameters are τ W i = 0.04 N/m 2 , ∂ρ/ ∂x = −5 × 10 −8 kg/m 4 , ∂ρ/ ∂y = 0, ρ = 1025 kg/m 3 , ρ a = 1.2 kg/m 3 , C W = 0.0015, µ = 0.004 m 2 /s, κ = 0.005 m/s, and f c = 1 × 10 −4 s −1 . Figure 7 shows the u x and u y distributions at plane x = 50 km and plane x = 98 km. For the combined wind-driven and density-driven case, the u x distribution is similar to the density-driven case in terms of direction of the flow in shallow and deep regions as shown in Figure 7a,b. Figure 7c,d show the contours of the transverse flow, u y at x = 50 km and x = 98 km planes. The flow patterns created by bottom friction, the Earth's rotation, and bathymetry are all consistent with previous results. The main difference in the combined case is that the magnitudes of the flow are smallest near bed, increase in the positive z direction, and then decrease again near the surface. This is due to the bottom friction and the wind stress occurring in the opposite direction of the density gradient. The density gradient accounts for the direction of the flow, while the wind stress accounts for the smaller magnitude velocities near the surface.       The simulation time for each case was 47.3 h on a single core of the CPU and 1.68 h on the Tesla GPU, demonstrating a 28.16-fold speedup.

Conclusions
The lattice Boltzmann model (LBM) with multiple-relaxation-time (MRT) collision operation is a potential numerical method to simulate shallow water flow. The MRT-LBM can handle very low kinematic viscosity by using the last two relaxation rates (reciprocal of relaxation time) in D2Q9 lattice. Other relaxation rates can be determined with flexibility to ensure solution stability and accuracy. The MRT-LBM can avoid a stability problem which is often encountered when LBM with single-relaxation-time collision operation is used and the relaxation time is very close to 0.5.
The multilayer MRT-LBM is able to solve the multilayer Saint-Venant equations to obtain horizontal flow velocity distributions at different depths. The implementation of the multilayer MRT-LBM along with a given initial condition, boundary conditions, and forcing terms is straightforward. This study demonstrated the MRT-LBM to irregular bathymetry. The MRT-LBM solutions compare well to analytical solutions for horizontal velocity in various depths under the effects of wind-driven forcing, density-driven forcing, and their combined forcing.
The multilayer MRT-LBM is suitable for graphics processing unit (GPU) computing due to the locality of particle interaction and the transport of particle information in the LBM algorithm. For small grid sizes, the speedup is not impressive because the data transferring overhead between grid blocks on the GPU is not small compared to the actual computational cost. However, as the grid size increases, the computational cost becomes larger and dominates the data transferring overhead. This results in large speedup.