Validation of the GPU-Accelerated CFD Solver ELBE for Free Surface Flow Problems in Civil and Environmental Engineering

This contribution is dedicated to demonstrating the high potential and manifold applications of state-of-the-art computational fluid dynamics (CFD) tools for free-surface flows in civil and environmental engineering. All simulations were performed with the academic research code ELBE (efficient lattice boltzmann environment, http://www.tuhh.de/elbe). The ELBE code follows the supercomputing-on-the-desktop paradigm and is especially designed for local supercomputing, without tedious accesses to supercomputers. ELBE uses graphics processing units (GPU) to accelerate the computations and can be used in a single GPU-equipped workstation of, e.g., a design engineer. The code has been successfully validated in very different fields, mostly related to naval architecture and mechanical engineering. In this contribution, we give an overview of past and present applications with practical relevance for civil engineers. The presented applications are grouped into three major categories: (i) tsunami simulations, considering wave propagation, wave runup, inundation and debris flows; (ii) dam break simulations; and (iii) numerical wave tanks for the calculation of hydrodynamic loads on fixed and moving bodies. This broad range of applications in combination with accurate numerical results and very competitive times to solution demonstrates that modern CFD tools in general, and the ELBE code in particular, can be a helpful design tool for civil and environmental engineers.


Introduction
Hydrodynamic free-surface flows with and without fluid-structure interactions (FSI) have received an ever-increasing interest over the past few years (see, e.g., [1]). Applications of such flows range from naval architecture to offshore, marine, coastal and environmental engineering. In the present publication, free surface flows and potential applications of free surface solvers in civil and environmental engineering are addressed. Such flow problems involve an air and a water phase, so that, technically, such flows can be classified as two-phase flows. However, as for most problems of practical relevance, the flow behavior is dominated by the water phase, it is not necessary to resolve the full aerodynamics inside the air phase. Instead, the latter can be represented by proper kinematic and dynamic boundary conditions at the free surface, and the sharp interface is allowed to move freely [2].
Typical free surface flow problems occur in many different fields of civil and environmental engineering. Moreover, large-scale free surface flows, such as breaking waves or tidal currents, are fascinating, as they give us indications of the enormous amount of energy that water can contain, for both good and bad reasons. Recently, natural disasters have attracted public attention to hazardous free surface flow events. Particularly, the 2004 Boxing Day event and the recent 2011 Tohoku tsunami [3] have increased the awareness of the potential threat posed by tsunamis and, consequently, the responsibility of civil engineers to minimize the consequences of such major impacts. Here, numerical tools can help to establish early warning systems and to provide accurate and fast predictions for tsunami severeness and the details of coastal impact [4].
Apart from such challenging, even life-threatening long wave propagation applications, local wave impact and wave slamming are of primary importance in the design process in various kinds of engineering sciences, e.g., for the design of offshore structures, such as offshore wind turbines, which are possibly one answer to the open questions concerning the future of our energy supply, the wave impact force on monopiles is the main load case during the design process and has been part of extensive research [5]. For naval engineers, the flow patterns around and in the wake of a ship hull are important to know, as they highly depend on turbulent effects in the boundary layer. From a numerical point of view, all of those applications require fully three-dimensional, viscous and turbulent free surface flow simulations.
Even in this limited list of applications, the diversity of free surface flow problems and the wide range of length and time scales involved becomes obvious. On top of this, research on free surface flows and efficient numerical tools is by nature multidisciplinary, not only regarding flow theory and conservation laws, but also developing efficient algorithms and high performance computer models that are crucial for the development of accurate and efficient free surface flow solvers for desktop computers.
In this paper, selected applications of our in-house numerical tool ELBE (efficient lattice boltzmann environment, http://www.tuhh.de/elbe) for the simulation of free surface flow problems are presented. State-of-the-art high performance graphics processing units will be combined with an efficient numerical method to address various problems in civil and environmental engineering, on different length and time scales. After a quick review of the solver in Section 2, four applications will be addressed: tsunami-related propagation and inundation simulations in Section 3, debris flow in Section 4, dam break scenarios in Section 5 and a numerical wave tank for on-and off-shore structures in Section 6.

The ELBE Code
The efficient lattice Boltzmann environment ELBE is a versatile, efficient toolkit for the numerical simulation of complex two-and three-dimensional flow problems. Emphasis is given to marine-and coastal-engineering free-surface flows, such as wave breaking, tank sloshing or tsunami propagation. The model considers non-linear flow behavior with and without a free surface, the effects of viscosity and turbulence. On top of this, the software uses a bidirectional, partitioned, explicit coupling approach for the simulation of fluid-structure interaction problems. The rigid body motion is modeled with a quaternion-based motion modeler or a physics engine. ELBE is based on the lattice Boltzmann method (LBM) and offers various different collision operators, boundary conditions, turbulence models, interface capturing methods and grid refinement techniques. Thanks to the very efficient numerical back-end, three-dimensional simulations of complex flows are possible in a very competitive simulation time, as discussed in the following.

Lattice Boltzmann Method
While classical CFD solvers are based on the macroscopic Navier-Stokes equations, the LBM handles CFD problems on a microscopic scale. LBM's fundamental variable is the particle distribution function f (t, x, ξ), which specifies the probability to meet a fictive particle with velocity ξ at position x and time t. To obtain a model with reduced computational costs, the velocity space is discretized, and discrete particle velocities e i are introduced. In the present work, a three-dimensional model with 19 discrete microscopic particle velocities e i and corresponding particle distribution functions f i (t, x) is applied. A subsequent standard finite difference discretization of the velocity-discrete Boltzmann equation on an equidistant Cartesian grid then leads to the lattice Boltzmann equation: where the left-hand side is an advection-type expression and the discrete collision operator Ω i on the right-hand side models the interactions of particles at the microscopic scale. For the latter, the advanced multiple relaxation time (MRT) model [6] is used in this work. The MRT transforms the particle distribution functions into moment space, where they are relaxed to an equilibrium state with several different relaxation rates. The benefits of this operator are an increased stability and the possibility to develop more accurate boundary conditions [7]. The solutions of the lattice Boltzmann equation satisfy the incompressible Navier-Stokes equations up to errors of O(∆x 2 , Ma 2 ) [8]. The first two hydrodynamic moments of the particle distribution functions include the macroscopic values for the density and momentum fluctuation: A Smagorinsky large eddy model (LES) captures turbulent structures in the flow, including an additional turbulent viscosity ν T for the effects of unresolved sub-grid eddies [9]. Since after the advection step, the incoming particle distribution functions at the domain boundaries are missing, they are reconstructed with the help of boundary conditions. For no-slip and velocity boundaries, a simple bounce back scheme is used [10]. At the free surface, the anti-bounce-back rule [11] balances the fluid pressure and the surrounding atmospheric pressure. Volume forces, like gravity, are added directly to the distribution functions at every time step [12].

Free Surface Model
Free surface flows are immiscible two-phase flows, which are dominated by the denser phase, due to high viscosity and density ratios between the two phases. If capillarity is neglected, the simulation of the denser phase is sufficient for many applications. The influence of the lighter phase on the flow dynamics can be approximated by kinematic and dynamic boundary conditions at the interface. Numerically, the free surface represents a moving boundary, which is allowed to move freely, but at the same time has to be kept sharp. A common and straightforward way to simulate free surface flows in the scope of LBM is the combination of a volume of fluid (VOF) method and a flux-based mesoscopic advection scheme [11]. In contrast to common VOF methods, the flux terms are expressed directly in terms of the distribution functions. The VOF approach captures the interface via the fluid fill level ε of a cell: ε = 0.0 marks an empty cell in the inactive gas domain, and ε = 1.0 corresponds to a filled cell inside the fluid domain. Fluid and gas cells are separated by a closed interface layer with a fill level ε ∈ (0.0; 1.0). The new fill level of a cell is calculated by balancing the mass fluxes between the neighboring cells and updating the fill level. The wet area between two cells is calculated on the basis of a simplified surface reconstruction, e.g., as the arithmetic mean of the fill levels of two neighboring cells. Consequently, the lattice Boltzmann VOF advection scheme can be considered as a specialized, geometry-based VOF method on the basis of a mesoscopic advection model. Note that, in contrast to higher order schemes [13], the normal vector information is not considered here. Nevertheless, the model has been shown to perform well for many different free-surface scenarios [14,15] and also in the context of GPU implementations [16].

Fluid-Structure Interaction
Apart from standard bulk and free surface flows, ELBE also provides a fluid-structure interaction interface. Floating-body motions follow from a motion solver, which converts the external and hydrodynamic forces exerted on the body into the equivalent motions. For the coupling of the explicit LBM and the motion solvers, a bidirectional and explicit coupling approach is used [17,18]. First, the hydrodynamic loads on the rigid bodies are evaluated by integrating the stress tensor over the bodies' surface (stress-integration method (SIM)). Since the rigid bodies are considered to be infinitely stiff and do not allow elastic or plastic deformations, the evaluation of the integral force on each obstacle is sufficient. Once the hydrodynamic loads are calculated, the force and torque information is transferred to the motion solver, and the subsequent time step of rigid body motion is computed. The resulting displacements and velocities are transferred back to ELBE, where the geometries are updated, and a modified bounce back scheme serves to incorporate the rigid body boundary velocity. The bodies are represented by triangulated surface meshes and are mapped to the fluid field by our highly efficient in-house voxelizator [19], so that the grid update is up to two orders of magnitude faster than one flow field update.
For single rigid bodies, we apply either a straightforward spring-damper model or a more advanced quaternion-based motion modeler. For floating and colliding rigid multi-body systems, we recently coupled ELBE to a physics engine, the open-source Open Dynamics Engine (ODE) (http://www.ode.org). ODE uses a hard contact model and detects collisions very efficiently, even for triangulated mesh geometries. The equations of motion then are impulse-based solved, yielding a linear complementarity problem (LCP) [20]. If necessary, friction can be included in the simulations by a simplified and efficient Coulomb friction model. The ODE is well known for its high performance, free and easy access and the real-time simulation capabilities. It is mainly used in robotics and vehicle simulations, virtual reality environments, computer games and simulation tools [21][22][23].
Note that, in the coupling with ELBE, all fluid-related computations (hydrodynamics, force calculation, grid update, boundary conditions) are executed on the GPU device, while the solution of the rigid body equations of motion and/or the full ODE contact problem is done on the CPU.

Implementation
The main purpose of this paper is the extensive validation of ELBE for large-scale engineering applications. However, several ELBE-specific implementation details will be briefly discussed in the following. For the parallelization, the NVIDIA CUDA toolkit is used, as LBMs are especially well suited for implementation in a GPU context [24,25]. The latter provides a large amount of computing cores (i.e., up to 4992 on a recent NVIDIA Kepler K80 board) and sufficient device memory (24 GB, K80) for typical engineering applications. In 3D, this allows the allocation of up to 120 million lattice nodes per GPU board. The average performance yields more than 500 million node updates per second, for a single precision implementation on a state-of-the-art GPU. Thanks to this very efficient numerical back-end, three-dimensional simulations of complex flows are possible in a very competitive time.
However, many GPU-accelerated solvers (see, e.g., [26]) combine efficient and optimized solver techniques with conventional CPU-based pre-and post-processing routines for the initialization, the transient grid update and the data storage operations ( Figure 1). Due to the comparably low bandwidth of the PCI-Express slots that host the GPUs and additional latency effects, this dilutes the overall performance of the algorithms. In ELBE, innovative grid generation and post-processing techniques were developed instead and help to avoid this performance bottleneck completely. Lattice Boltzmann methods, as addressed in the present publication, discretize the governing equations on an equidistant Eulerian grid. Grid points of a computational domain used for lattice Boltzmann CFD implementations can be of a variety of different kinds. Those points representing a fluid particle ensemble are considered fluid particles, whereas other points can represent a slip wall, a no-slip wall, an inflow or even moving velocity boundaries. Since the grid itself is static and does not change over time, it is, apart from proper dynamic boundary conditions at the body surface, the changing character of the points through which a moving body's motion is manifested. Figure 2 (right) shows fluid particles as black circles and solid body particles as red dots. Black circles overlaid with red dots signify solid body particles from previous time steps, where a lighter shade of red indicates solid body particles from iterations further back in time. Our fast surface voxelization technique [19] maps arbitrary, tessellated triangular surface meshes to the LBM grids, directly on the GPU and without access to CPU host memory or CPU resources. The algorithm was optimized for massively parallel execution in a GPU context: its performance is superior to the performance of the flow field computations per time step, which allows efficient fluid-structure interaction simulations, without noticeable performance loss due to dynamic grid updates. Additionally, an integrated OpenGL-based visualizer tool allows one to visualize the simulation results during runtime, based on innovative, shared CUDA-OpenGL memory spaces residing in GPU memory, and allows for simulations without any CPU-GPU communication at all. ELBEvis is designed to help software developers, scientists and engineers to assess and interact with their computations in real time by providing an efficient and easy to use on-device implementation of several visualization features. After initialization, any time step's data are stored on device memory, so they can be used to compute the next time step's dataset without having to load new data from the host. Thus, the main computation's time loop could almost entirely be carried out on the device. To decouple visualization and computation, they are executed in two separate threads. Figure 3 shows several available filters to process the simulation data, e.g., colored slices through the computational domain, streamlines, glyphs and a volume-rendering representation. Moreover, in the case of, e.g., wave impact simulations, the surface pressure fields can be mapped to the surface of the geometry under consideration. On top of this, a graphical user interface (GUI) has been designed to provide a suitable control environment for the user. The GUI is executed as an entirely independent process and communicates with and instructs the computation/visualization via a shared memory segment that is spawned by the ELBEvis process upon its start.
Two different hardware concepts have been used with ELBE so far ( Figure 4). Typically, the numerical simulations are run on our local GPU compute server. Moreover, thanks to the reduced use of host-device communication in our method, we recently assembled a configuration based on an external expansion chassis and attached an NVIDIA GTX Titan board to an off-the-shelf laptop. Even though the connection bandwidth was extremely low (PCI-E 1x), no performance degradation was observed thanks to the reduced use of host-device communication in the method.

Summary
The ELBE code follows the paradigm of supercomputing on the desktop and aims at running numerical simulations on off-the-shelf commodity hardware to perform a near real-time analysis and evaluation of three-dimensional turbulent flow fields, without tedious access to supercomputers. The recently added proper pre-and post-processing routines reduce the host-to-device communication to further increase performance. We consider ELBE as a prototype for next-generation CFD tools for a near real-time analysis of complex flow fields and simulation-based design (SBD) on commodity hardware. The rich features and the outstanding performance of the code make it a tool that is also applicable to various kinds of challenges in civil and environmental engineering, as discussed in the following sections.

Wave Propagation and Inundation Modeling
As the first principal topic of this publication, tsunami-related simulations are addressed. The most recent tsunami with major impact was triggered by a seismic event approximately 40 miles off the coast of Japan, in March 2011. The earthquake triggered powerful tsunami waves that killed nearly 20,000 people. Apart from the massive destruction, long-term harmful environmental impact and the psychological effects, a major nuclear disaster at the Fukushima Daiichi nuclear power plant was triggered, including meltdowns and the release of radioactive material [27]. The degree of damage drastically illustrated how vulnerable modern society is to such natural hazards. The availability of a proper tsunami warning systems (as in Japan) hence is essential. Once earthquakes are detected, tsunami warning centers can predict whether a tsunami was generated or not and, based on additional real-time observations, feed long wave propagation models to simulate the expected tsunami propagation, tsunami impact and inundation. ELBE is already part of a Tsunami related project at Ocean Networks Canada [28] and eventually will be considered as part of the near-shore tsunami early-warning system for British Columbia. However, despite the existence of proper early warning systems, the Tohoku tsunami caused massive destruction. Consequently, apart from early warning systems, such violent flow scenarios also have to be simulated a priori in order to assess the hydrodynamic loads on potentially threatened structures and buildings. In this section, the applicability of ELBE to wave propagation and inundation modeling is discussed, consisting of the subsequent simulation of tsunami propagation, near-shore wave impact and debris flow. Ideally, these three components go hand in hand to provide a complete examination of potential wave impacts and serve as a helpful tool for civil and environmental engineers to assess the potential threat to coastal areas and structures.

Wave Propagation in Shallow Waters
The standard approach in real-time tsunami forecasting is based on models based on nonlinear shallow water (NSW) equations, which are a depth-integrated (hydrostatic) version of Navier-Stokes equations (e.g., [29]). The NSW equations are accurate for long wave propagation and runup problems, in which the horizontal length scale is much larger than the vertical length scale, such as for most tsunamis. Besides tsunamis, these equations are widely used in the field of ocean engineering (e.g., tide and ocean modeling) and atmospheric modeling, where one length scale is dominant. If vertical variations must be taken into account, these can be separated from the horizontal ones, resulting in a set of shallow water equations for a series of horizontal fluid layers (i.e., multilayer NSW). Recently, non-hydrostatic models, such as Boussinesq models (e.g., [27,30,31]), have increasingly been used in research and damage assessment following major tsunami events.
In NSW-LBMs, depth-averaging yields a modified collision operator for subcritical [32] and supercritical [33] flows. Depth-averaged LBM was successfully applied to a variety of illustrative benchmark problems, including wave run-up on a sloping beach [34], applications featuring bed slope and friction terms [35], wind-and density-driven circulation over irregular bathymetry [36] and tank sloshing examples [37]. While the predictive agreement with traditional methods was generally good, LBM run-times were always very competitive. However, instabilities were induced near the wet/dry boundaries. The development of improved wetting/drying models and second-order wet/dry boundary conditions is currently part of active research. In the following, the NSW-LBM module of ELBE is first applied to two benchmark problems, before discussing the application of ELBE to the 2011 Tohoku tsunami event.

Validation with Catalina Benchmark Problems
Here, we focus on two standard validation test cases of the tsunami community, which were initially proposed as part of the Third international workshop on long-wave runup models, which took place in 2004 on Catalina Island, CA, USA [38]. This and other benchmark test cases can be accessed online [39]. We examined two selected test cases: the 2D tsunami runup over a plane beach and tsunami runup over a complex 3D natural beach (Monai Valley). Note, in those, the latter benchmark problem is a demanding case, with very complex and realistic bottom topography. In both cases, the LBM solution is compared to available analytical, numerical or experimental reference data from the literature. In addition, the accuracy and performance of ELBE is compared to FUNWAVE, a state-of-the-art Boussinesq model used for tsunami propagation [31].

A. 2D Tsunami Runup over a Plane Beach
The benchmark reference data for this 2D runup problem was obtained from the analytical solution of NSW equations of Carrier et al. [40], for the initial tsunami wave profile shown in Figure 5 (without initial velocity) and a given beach slope (1:10). The benchmark task was to compute the free surface elevation in the runup region at three points in time (t = 160, 175, 220 s) and to compare those to the reference solution. Apart from the offshore wave propagation properties of the solver, the accuracy of the shoreline algorithm for the treatment of the air-water-beach interface at the shoreline is the key point of this benchmark. The LBM simulation parameters are summarized in Figure 5a. Figure 6 compares the resulting free surface profiles in the runup region to the benchmark reference data. A very good agreement can be seen, even at a relatively low grid resolution in ELBE. The larger differences for the last few nodes in the runup region are likely due to the employed first-order runup model, which does not use extrapolation of the water levels or complex reconstruction of the bottom elevation. Higher-order runup models would further improve the runup behavior, while the overall flow is already well-represented in the present simulation. Note that the Carrier solution is valid for the inviscid NSW equations, while our model includes viscous effects. Hence, for this benchmark, the fluid viscosity was iteratively minimized, and convergence to the presented results was observed. The viscosity for the given LB parameters in Figure 5a still is several orders of magnitude higher than the viscosity of water (ν ≈ 25 m 2 /s), but a further decrease does not significantly change the results. t=160 s t=175 s t=220 s Figure 6. Free surface in the runup region. ELBE results (solid lines) in comparison to the reference nonlinear shallow water (NSW) solution of [40] (alternating dots and dashes) for three points in time.
The simulation was run on an NVIDIA Tesla C2070 GPU in about 30 s, nearly ten-times faster than real time. Frandsen [41] computed the same test case with a D1Q3 LB model on a scalar processor, using a lattice of 100,000 nodes, yielding a grid spacing ∆x = 0.5 m and a time step ∆t = 0.002 s.
The CPU time for the whole D1Q3 simulation amounts to 6700-11,000 s for results with a very similar numerical accuracy in comparison to our results for the free surface elevation in the runup region. We found that a further grid refinement does not significantly change the results, so that we decided to keep our comparably low grid resolution. From a technical point of view, ELBE would have been able to increase the grid resolution up to a total of 50 million lattice nodes on the C2070 device. Despite the different grid resolutions, the comparison of times-to-solution of the two models is valid, as ELBE employs a 2D numerical model with nine degrees of freedom per lattice node (instead of three in the D1Q3 model) and a grid resolution of four lattice nodes in lateral direction, yielding an algorithmic overhead of approximately a factor of 20.

B. Tsunami Runup over a Complex 3D Natural Beach
After the theoretical benchmark case, ELBE is tested against experimental field data for the Okushiri 1993 tsunami event, which was also part of the Catalina set of benchmark problems. The Okushiri tsunami caused unexpectedly disastrous damage on the coast, whose survey provided numerous field data that can be used to validate and benchmark tsunami runup models. The corresponding 1/400 scale laboratory experiments of runup in the Monai Valley were conducted in order to reproduce and better understand the coastal impact of the Okushiri tsunami.
The scale model dimensions and parameters are given in Figure 7. The grid spacing was chosen as recommended in the benchmark setup information, and the given time-dependent offshore wave height (specified along the x = 0 boundary) is used. The benchmark is especially demanding in terms of bottom elevation. Moreover, the runup model had to be extended to cope with changes of bottom topography that are not aligned with the grid axes. The numerical results for the surface elevation at three different wave gauges are compared to experimental data from the scale model experiment in Figure 8. All in all, the ELBE results reproduce the salient features of the tsunami-induced flow quite well, particularly for the lower frequency components. The time of the initial wave impact and the arrival of the reflected wave match the experimental values very well for all three probes. The maximum water elevation at the numerical probes, however, is approximately 25% less than the experimental values. Attempts to fine-tune the LBM runup model and/or further reduce the fluid viscosity, to obtain a more realistic (and higher) water elevation induced instabilities in the runup region. This can potentially be improved by more sophisticated wetting/drying algorithms for the simulation of runup in complex bottom topography. As these run-up algorithms operate on a very small subset of lattice nodes only, the global workload per time step and the performance of the code are not affected significantly.  [30,31]. The flux terms and first-order derivatives are discretized with a fourth-order scheme, while for the higher order derivatives, central difference schemes are used. A third-order Runge-Kutta method is used for the time discretization. FUNWAVE's parallel code uses a domain decomposition method based on three rows of ghost layers and the message passing interface MPI. Grid resolutions and numerical setups for the simulations in ELBE and FUNWAVE are given in Table 1. Table 1. Simulation setup and performance results for FUNWAVE in comparison to NSW-LBM. Total number of node updates (NUP) and core hours (Core-h); resulting performance in million node updates per second (MNUPS) and thousand node updates per computational core (KNUPS/C).  [42], and a very good agreement of numerical and experimental data, also for the higher frequency variations, can be observed. The benchmark simulations were run on a Mac desktop with two quad-core CPUs (in hyper-threading mode), with a core clock of 2.93 GHz each; the ELBE simulations were run on a Tesla C2070 GPU board with 448 cores running at a clock speed of 1.15 GHz. Table 1 summarizes the resulting performance for the two simulations: with 2.6 × 10 10 node updates, due to the smaller time step, the NSW-LBM model needs 20-times more iterations than the Boussinesq solver. In the end, however, ELBE still runs about ten-times faster than FUNWAVE, due to the higher number of cores on the GPU. Moreover, the number of node updates per core also is a factor of four higher than in FUNWAVE. All of this shows the competitiveness of the ELBE solver for tsunami-related simulations. In combination with the currently-developed new runup models, the solver has the potential to become a competitive, fast and efficient tool for wave propagation and inundation simulations in the medium term.

Tohoku Tsunami
After the initial validation with universally-accepted benchmarks, the ELBE solver is applied to the most recent tsunami event, the 2011 Tohoku tsunami. The bathymetry is extracted from the ETOPO1 NOAA database [43], for a simulation domain of 1000 km×1000 km. The initial free surface elevation is generated by an analytical model [44], which is used to simulate ground deformation produced by, e.g., tectonic faults, such as earthquakes. For a given rectangular fault geometry, the free-surface elevation can be computed. The case is addressed with four consecutively-refined ELBE grids, as specified in Figure 9. In this one-directional grid coupling, the simulation results are interpolated from the coarser grids and serve as initial conditions for the fine grids. A bidirectional coupling approach has successfully been implemented in ELBE for the single-phase solver and is currently adapted to the shallow water cases. Figure 10 shows selected snapshots of the free surface elevation on three of the four grids, as the wave approaches the shoreline. In future work, the computed free surface elevations will be compared to available field data, and maximum water levels and inundation lengths will be verified with observations and numerical reference data. In any case, thanks to the recently-implemented interface to the ETOPO1 database and the Okada scripts, the ELBE code is applicable to almost any part of the ocean subject to tsunami generation, propagation and impact. As soon as reliable information on the earthquake source regions are available, initial conditions can be generated to feed the actual tsunami propagation simulations. Moreover, thanks to the efficient implementation, different input parameter sets can be simultaneously tested and simulated in a priori simulations, e.g., to identify the worst-case scenarios in civil engineering design.

Near-Field Wave Impact
Once the Tsunami propagation and inundation is successfully modeled, attention can be devoted to the near-field wave impact. Wave impact on engineering structures is one of the most popular applications of free surface flow solvers. Apart from single obstacles in the flow, which could also be analyzed in experimental setups and have been intensely investigated numerically, scenarios that involve more complete and complex geometries are still of great interest. From the numerical point of view, these test cases are demanding due to high Reynolds numbers, large computational domains and distinct geometrical features, which ask for massively parallel computing to tackle the simulations. Here, we exemplarily show the impact of a wave on South Manhattan to demonstrate the capability of the ELBE solver to deal with complex geometries of a large scale. The wave is generated by a breaking dam scenario. Snapshots of the simulation for selected time steps and camera angles are given in Figure 11. The simulation was run on one single NVIDIA Tesla card using the maximum available device memory on a 512 × 512 × 80 grid. Even with a full-scale grid spacing of ∆x = 2.3 m, detailed information on the liquid entry after the initial wave impact can be obtained.

Debris Flow: Coupling to a Physics Engine
Wave impact simulations on complex geometries provide detailed information on liquid entry and impact pressures. With the recent advances in high performance computing and GPU development, it has even become possible to go beyond the scope of static geometries and grid layouts and to resolve structural dynamics in the context of hydrodynamic simulation.

Validation of the Coupling Methodology
Before addressing more complex wave-multibody interactions, the coupling of the explicit LBM and the motion solver is examined, and a standard validation example is given. Here, a freely falling cube is analyzed, in line with the experimental work of [45] (Figure 12).
In Figure 13, the results of ELBE (left), our in-house finite-volume code FreSCo + (middle) and the experiment (right) are compared, and a very good agreement can be seen. Exemplarily, the Euler angles are plotted against FreSCo + reference data in Figure 14, and a reasonable agreement is found. At the same time, the validity of both ELBE motion modelers is shown, as both the ODE and our in-house quaternion-based motion modeler produce almost identical results.  φ, θ, ψ The numerical simulation of the twp-second event took approximately 5.4 h on a 10-core CPU machine using FreSCo + , while the ELBE simulation lasted less than 10 min on a recent NVIDIA GTX Titan gaming board.

ODE Performance Considerations
For enhanced debris flow simulations, we recently coupled ELBE to a physics engine, the open-source Open Dynamics Engine (ODE) (http://www.ode.org), to model floating and colliding rigid multi-body systems. After the initial physical validations, the second test case serves to investigate the calculation costs of ODE's collision method, as the collision detection/handling for triangulated surface mesh geometries was not investigated by other groups in detail yet and is examined in the following. ODE splits the collision handling into two separate parts, the collision detection and the collision handling. To accelerate the collision detection, it is split up into two phases. First, it is tested if the axis-aligned bounding boxes (AABB) of the geometries under consideration overlap (broad phase). If this is the case, the selected collision detection library tests for actual intersections of the two putatively colliding geometries (narrow phase). In certain situations, further efficiency improvements can be gained by using one of four different hierarchical bounding spaces or an arbitrary combination of them.
Two complex triangulated mesh geometries are considered [46]. One geometry is fixed, and the second geometry is freely falling and accelerated towards the first one by gravitational forces. To achieve as much contacts as possible in time, the restitution coefficient is set to k = 0.01. The time step size is set to ∆t = 10 −5 s. A total of five different surface mesh sizes were tested. In the first simulation, both surface meshes consists of 10.960 triangles. In the following runs, the triangle count is increased quadratically by the factors 2 2 , 3 2 , 4 2 and 5 2 . Three time-consuming operations have to be considered: the collision detection, the collision handling and the overhead of ELBE to finish the current time step and to initialize the collision detection of the next time step. Figure 15. Performance of ODE's collision detection method, coupled to ELBE, ∆t = 10 −5 s. Figure 15 shows the detailed time measurements of the collision detection part of the computation, plotted against the total simulation time t. In the broad phase, the computational costs are almost zero (beginning and end of the simulation), while the computational costs in the narrow phase (once the AABBs intersect) are significantly higher (middle part, in gray, 0.18 s < t < 0.42 s). When the actual collisions take place, the calculation time rises again significantly (yellow). Two fundamental conclusions can be drawn here: First, since the calculation times of the five simulations nearly increase linearly, while the triangle count n increases quadratically, the computational cost of the collision detection is approximately of O( √ n). Second, as a triangle mesh refinement has no direct impact on the size of the LCP itself, the computational cost of the collision handling remains constant for all five cases, t ≈ 0.01 ms. The remaining part of the time loop, i.e., the force calculation and the re-gridding step, is much more expensive. A linear increase with the number of triangles (and consequently, the number of rigid bodies) is observed. To sum up, the ODE can efficiently address complex multibody problems, also for tessellated surface meshes, as commonly used to describe and model complex engineering geometries.

Debris Flow Application
To demonstrate the applicability of the coupled ELBE-ODE solver, complex fluid-multi-body interactions of an incoming wave over a beach-section with obstacles are shown ( Figure 16). The incoming wave is generated by a breaking dam scenario. The obstacles are two pairs of containers and a duck, being characterized by surface meshes, as specified in Table 2. Two-point-five seconds of real-time flow were simulated within 4 h of computation, yielding an average node update rate of 40 million node updates per second (MNUPS). Further optimizations of the GPU-accelerated coupling algorithms, including an optimized force calculation based on k-d trees and axis-aligned bounding boxes, are currently part of active research. At the same time, in the scope of naval architecture, several scenarios of complex fluid-multi-body interactions, e.g., of incoming waves that hit a section of a shipping harbor and ship-ice interactions, are currently investigated. Nevertheless, the debris flow simulation nicely demonstrates the capability of the coupled ELBE-ODE solver to simulate fluid-structure interactions in civil and environmental engineering.

Dam Break Simulations
The second principal application for CFD problems in civil engineering that is addressed in this contribution is related to dam break scenarios. In these applications, a column of a fluid is freed at Time 0, to collapse and flow under gravity, yielding a surge wave of considerable height that potentially interacts with structures in the flow. Dam breaks have been examined in many publications, starting with the pioneering work of Martin and Moyce [47] in the early 1950s. Here, two more sophisticated dam breaks with practical relevance for civil engineers are addressed: (i) a bund wall overtopping case; and (ii) a dam break with surge wave impact on an obstacle in the flow.

Bund Wall Overtopping
The first breaking dam test case is related to the safety of cylindrical reservoirs containing dangerous chemicals. In case of catastrophic failure, fluid would flow from the reservoirs into a retaining trench encircled by a vertical containment wall. The question is how much fluid would overtop the wall for certain failure types and configurations. The problem can be studied in a radial direction (i.e., 2D vertical slice), and 2D laboratory experimental data are available for validation. The final goal of such simulations is, e.g., to develop predictions for the overtopping characteristics and the dynamic pressures on the bund walls post catastrophic failure of the storage vessels and, if possible, to propose methods of mitigation.
For the purpose of numerical modeling, a 3D tank rupture is simulated as a 2D dam break problem (Figure 17b). The fluid volume of height H and radius R in the tank is released at time t = 0. The fluid then propagates towards and hits a bund of height h (modeled as a stationary wall), whose toe is located at a distance L from the tank. The modeling will continue until the fluid is stationary, and then, the amount of fluid that overtops the wall is calculated.
The 12 investigated cases are given in Table 3a, in accordance with the work of [48]. Resulting free surface locations and wave patterns for four selected cases are depicted in Figure 18. Entire flow patterns and overtopping volumes for different tank and bund wall configurations can be observed. In Table 3b, numerical results for the overtopping volume Q (relative to the total fluid volume Q 0 ) and the bund wall impact pressure p (relative to the hydrostatic pressure p 0 ) are compared to the experimental reference data for all 12 cases. A very good agreement can be seen for both parameters: the relative error is below 16% (for the overtopping volume) and 17% (for the impact pressure), emphasizing the predictive accuracy of ELBE even for complex engineering applications.  Figure 17. Specification of overtopping test cases. (a) Parameters; (b) tank and bund nomenclature [48]. Table 3. Simulation parameters and results. (a) Simulation parameters in accordance with the tank spill experiments of [48]; (b) results of simulations in comparison to the tank spill experiments of [48].  1  300  120  574  36  2  300  300  574  90  3  300  600  574  180  4  300  120  520  48  5  300  300  520  120  6  300  600  520  240  7  300  120  822  24  8  300  300  822  60  9  300  600  822  120  10  300  120  775  36  11  300  300  775  90  12 300 600 775 180  Figure 18. Results for the overtopping test case of Atherton [48] for four different cases.

Wave Impact
Breaking dam scenarios are standard validation cases for free surface flow solvers. Such cases do not require the use of elaborate boundary conditions, so that here, a more complex experiment including wave gauges and pressure probes is selected for validation purposes; see Figure 19.  Figure 19. Specification of the wave impact test case. (a) Parameters; (b) test case setup [49].
The experiments were conducted in a laboratory tank at the Maritime Research Institute Netherlands (MARIN) [49,50]. A box was placed in the tank and was covered by eight pressure sensors, four on the front of the box at height z = {0.025, 0.063, 0.099, 0.136} m and four on the top of the box at x = {0.806, 0.769, 0.733, 0.696} m. In the back part of the tank, a water column is constrained by a door, which is pulled up by releasing a weight at the initial time and is made to collapse. The water elevation then was measured in the tank at four locations (H1-H4, in the reservoir and in the tank, at x i = {0.5, 1.0, 1.5, 2.66} m). The experimental results for the height probes H2 and H4 and the pressure probes P1 and P7 are public domain.
The problem is addressed with a fully two-dimensional model, as the main physical effects in this case are basically two-dimensional. No periodic boundary conditions were employed. In Figure 20, the ELBE results are plotted against the experimental reference data for two selected pressure probes at the front surface (P1) and the upper surface (P7) of the box. The plotted LBM pressure signals were processed with a moving average low-pass filter with a reasonable filter width to remove the high-frequency pressure noise. The experimental data are reproduced very well, both in terms of amplitude and frequency. On top of the accurate prediction of the initial impact, even the arrival time of the reflected wave at t ≈ 4.7 s is reproduced very well. The numerical simulation for this test case took approximately 60 min, on the two million node grid, including a full post-processing. In combination with the high result accuracy, this emphasizes the applicability of ELBE to civil-engineering-related wave impact scenarios.

Numerical Wave Tank for On-and Off-Shore Structures
Finally, a more classic application of CFD in engineering is addressed. While the first two scenarios demonstrated the applicability of the ELBE code for violent flow situations, such as tsunamis and wave impact, the evaluation of hydrodynamic forces for non-violent cases is one of the main and very basic applications for CFD techniques in civil and environmental engineering. Three application examples are discussed in the following.

River Engineering
As a first application, the flow past a weir is examined, as detailed in Figure 21. The parameters for our simulation are chosen in such a way that the subcritical inflow switches to a supercritical state while or shortly after passing the weir and jumps back to the subcritical state after passing the weir. In this hydraulic jump, energy is dissipated, so that the downstream flow velocity is lowered and erosion effects in the river bed are reduced.  (Figure 22a). The Froude number Fr 2 behind the weir also determines the characteristics of the hydraulic jump [51]. The estimated Froude number around Fr 2 ≈ 4.5 is generally considered to be sufficiently high for a stable, oscillating and wavy hydraulic jump. The weir geometry has to be chosen carefully, to guarantee the required performance and to avoid negative pressures and the risk of cavitation. On top a step in the rear part of the domain prevents the hydraulic jump from moving downstream and leaving the domain.
A rendered snapshot of the simulation after t = 60 s is shown in Figure 21b. Moreover, our numerical results for flow Sections 2 and 3 are compared to the Bernoulli estimates in Figure 22a, for the four consecutively-refined grids. In general, a fairly good agreement can be observed. However, in the supercritical flow section on the back of the weir, the numerical predicted water level on the coarsest grid is three-times higher than the Bernoulli prediction. Grid refinement slightly improves (i.e., lowers) the value of water height Probe 2, but still, the numerical prediction is two-times higher. This discrepancy might occur due to two reasons: (i) The no-slip boundary condition on the weir surface decelerates the flow, which in combination with the fairly low grid resolution in this area, yields an increased water level and a reduced flow velocity in Section 2. (ii) Bernoulli's equation is valid for inviscid, irrotational flows only and does not consider viscous and/or turbulent dissipation, so that the total energy of the fluid in Section 3 is overestimated, yielding a lower theoretical water level than observed numerically.
In Figure 22b, the x-component of the flow velocity in the hydraulic jump is shown. The maximum flow velocity does match the predicted value of v 2 = 8.51 m/s, and the expected flow behavior in the hydraulic jump can be observed, including turbulent structures and local backflow. Moreover, the near-wall slowdown on the back of the weir can be identified. Finally, the Smagorinsky LES model tends to overestimate the turbulent effects in the near-wall region, which should be tackled by the use of dynamic Smagorinsky models or advanced turbulence models. The simulations on the coarsest grid (∆x = 0.1 m, ∆t = 1.4 × 10 −3 s) are in the same order of magnitude as the time scale of the actual fluid event.

Wave-Current-Induced Loads on Submerged Bodies
As a second validation example and potential application of the ELBE wave tank, wave-current-induced loads on submerged bodies are addressed, for the case of a generic gravity foundation. In the scope of the development of renewable energies, tidal stream generators are getting more and more popular. While there exist several different concepts to install tidal stream generators in the open sea, one of the most popular ones is the gravity foundation. During the structural design process, it is important to know the hydrodynamic loads on the foundation caused by current and waves. In addition to simplified hydrodynamic models, which estimate the structural loads based on inviscid and/or irrotational flows, fully-resolved CFD tools, including wave and current boundary conditions, can give insight into the detailed hydrodynamic loads. In this context and to validate the employed CFD tools, a numerical and experimental benchmark for hydrodynamic wave-current loads acting on generic gravity foundations was recently published by Markus et al. [52]. Five different geometries are analyzed experimentally and numerically (Figure 23b) for scenarios with currents, waves and combined wave-current loads. Here, we present ELBE results for two selected scenarios, to show the validity of the hydrodynamic approach, the force calculation methods and wave-current inlet/outlet boundary conditions.  The five geometries have a cross-sectional area of 250 cm 2 , a height from the ground of 10.5 cm and a top base dimension of at least 5 cm. The load conditions are split up into three groups: current loads, wave-induced loads and wave-current loads. To demonstrate the applicability of ELBE to such scenarios, two cases are selected: a current load (case C3 in [52]) with a flow velocity of v = 0.38 m/s and a wave load (case H3 in [52]) with a wave height h = 0.12 m and a wave period of T = 1.9 s. The initial water height above the structure is chosen to be h 0 ≈ 0.35 m, in accordance with the experimental data.
The resulting horizontal force results are summarized in Table 4. Note that the given current loads are mean force values over a time span of 5 s, and the given wave loads are the average value of the horizontal force maxima. In Figure 24, ELBE results are plotted against the foundation inclination angle, again in comparison to the experimental data for both load scenarios. Very good agreement can be seen for smaller inclination angles, while the ELBE solution differs with increasing inclination angle. This is due to the geometry representation in ELBE, which was restricted to first order for the selected cases. Hence, the geometry with a higher inclination angle is represented as worse than rectangular, axis-parallel geometries.
Consequently, and as expected, the best results are obtained for Geometry 1, where the numerical and experimental results differ by only about 3%. Finally, the transient force signals are compared to the experimental data; see Figure 25. A very good agreement both for the maximum amplitude and the period of the force signal can be observed. To conclude, it could be shown that ELBE is applicable to the numerical simulation of wave/current-induced loads on submerged structures and can be a helpful tool in the structural design process. Thanks to the efficient implementation and the innovative GPU hardware that is used by the code, the presented simulations could be run in less than one minute, on a Kepler K40M board. Hence, such simulations could easily be included into the foundation design process and serve as a basis for a proper shape optimization.

Vortex-Induced Vibrations
Last, but not least, a very recent and up-to-date phenomenon is addressed: structural vibrations that are induced by transient, turbulent flows. Detailed investigations of such vortex-induced vibrations were, e.g., made by Mittal and Kumar [53]. They examined a circular cylinder in a uniform flow at Re = 325 with two translational degrees of freedom using a stabilized space-time finite element formulation. Results were presented for various values of structural frequency of the one-mass oscillator. We repeated these test cases with ELBE (see parameters in Table 5) and achieved results with a quality comparable to those of Mittal and Kumar [53] and Geller [17]. Figure 26 shows the trajectories for eight different dimensionless eigenfrequencies F s . Note that F s = 0.21 represents the resonance case with the highest amplitude. Even here, a very good agreement can be seen. Further information on the test case can be found in [17,53].

Summary and Conclusions
We presented several applications of the efficient lattice Boltzmann environment ELBE with practical relevance for various civil engineering applications. Wave propagation, wave impact and less violent hydrodynamic load cases were addressed. The validations show a good performance of the code and a good agreement of the numerical results with the reference data in most of the cases. In combination with the recently implemented OpenGL-based visualizer tool, which allows one to visualize the simulation results during runtime [54], the incorporation of this high-end numerical tool into a simulation-based design (SBD) process of civil and environmental engineers has become more than possible. In future work, several model aspects will be improved, particularly concerning wetting/drying models in the shallow water simulations, higher order boundary conditions near the solid surfaces, grid refinement and a coupling to inviscid methods.