# A GPU-Accelerated and LTS-Based Finite Volume Shallow Water Model

^{1}

^{2}

^{3}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Governing Equations and Empirical Relations

^{2}; ${S}_{bx}=-\partial {z}_{b}/\partial x$ and ${S}_{by}=-\partial {z}_{b}/\partial y$ are the bed slopes, where ${z}_{b}$ is the bed elevation; ${S}_{fx}={n}^{2}u\sqrt{{u}^{2}+{v}^{2}}/{h}^{4/3}$ and ${S}_{fy}={n}^{2}v\sqrt{{u}^{2}+{v}^{2}}/{h}^{4/3}$ are the friction slopes, where $n$ is the Manning roughness.

#### 2.2. Finite Volume Discretization on Unstructured Triangular Grids

#### 2.3. The Local Time Step Estimations

^{−}

^{6}m. The second step is to compute the global minimum value of time step $\Delta {t}_{\mathrm{min}}$:

#### 2.4. The Six SWM Versions and Their Numerical Structures

- (a)
- The model is initiated on the host by reading data (e.g., mesh information; basic parameters; boundary conditions such as time series of water level or discharge, if any); setting initial conditions; computing basic parameters (e.g., face length, normal direction, cell area). Afterwards, necessary data are transferred from the host (CPU) to the device (GPU). This data transfer occurs once per numerical case, of which the overhead is negligible as compared to the whole computational cost.
- (b)
- The LTS estimation involves twice implementation of the reduction algorithm (RA1, RA2) and five kernels: Kernel ‘Cell_Time_Step (CTS)’ computes the locally allowable maximum time step; the reduction algorithm (RA1) finds the globally minimum time step; Kernel ‘Cell_Potential_Exponent (CPE)’ computes the potential grade exponent. Kernel ‘Cell-Modified_Exponent (CME)’ modifies the potential grade exponents around dynamic/static fronts. Kernel ‘Face_Final_Exponent (FFE)’ computes the grade exponent of each face. Kernel ‘Cell_Final_Exponent (CFE)’ computes the actual grade exponent of cells. The reduction algorithm (RA2) finds ${m}_{\mathrm{max}}$.
- (c)
- The parameter ${m}_{\mathrm{max}}$ and $\Delta {t}_{\mathrm{min}}$ is transferred from the device to the host, such that a full cycle of ${S}_{c}=1,2,~,{2}^{{m}_{\mathrm{max}}}$ can be defined in the host. For each sub-cycle, the parameter ${S}_{c}$ is firstly transferred from the host to the device (on constant memory); afterwards, two kernels [Face_Flux (FF), Cell_Variable (CV)] are invoked: ‘Face_Flux (FF)’ computes numerical fluxes of faces; Cell_Variable (CV)’ updates the conserved variables of cells at a time interval of ${2}^{{m}_{i}^{*}}\Delta {t}_{\mathrm{min}}$ by firstly estimating the bed slope and frictions, and secondly implement Equation (3). At the end of each sub-cycle, ${S}_{c}$ is increased by unity until it exceeds ${2}^{{m}_{\mathrm{max}}}$, and subsequently, the computation goes to the LTS estimation again.

## 3. Results and Discussion

#### 3.1. Simulation of the Idealized Dam-Break Flows

^{−2}m, 5.24 × 10

^{−2}m, 3.46 × 10

^{−2}m, and 2.30× 10

^{−2}m for the four set of meshes, respectively. Such magnitudes of quantitative discrepancies are negligible. Figure 3 presents visual comparisons of the computed and analytical solutions for water depth profiles at four transients. From Figure 3, when the water is released, a forward-propagating water bore, and a backward-propagating rarefaction wave are formed.

#### 3.2. Simulation of the Experimental Partial Dam-Break Flow Propagating through Buildings

#### 3.3. Simulation of Tidal Flows in Yangtze Estuary and Hangzhou Bay

^{3}/s is specified. The open sea boundary was driven by nine tidal constituents (i.e., M2, S2, N2, K2, K1, O1, P1, Q1, and M4), which are obtained from TPXO. The Manning roughness is set equal to 0.01 + 0.012/h. Although strong sediment transport occurs in the Yangtze estuary and the Hangzhou bay, only clear-water flow over a fixed bed was simulated in this paper.

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Cea, L.; French, J.R.; Vazquez-Cendon, M.E. Numerical modelling of tidal flows in complex estuaries including turbulence: An unstructured finite volume solver and experimental validation. Int. J. Numer. Methods Eng.
**2006**, 67, 1909–1932. [Google Scholar] [CrossRef] - Hu, P.; Han, J.J.; Li, W.; Sun, Z.L.; He, Z.G. Numerical investigation of a sandbar formation and evolution in a tide-dominated estuary using a hydro-sediment-morphodynamic model. Coast. Eng. J.
**2018**, 60, 466–483. [Google Scholar] [CrossRef] - Luan, H.L.; Ding, P.X.; Wang, Z.B.; Ge, J.Z. Process-based morphodynamic modeling of the Yangtze Estuary at a decadal timescale: Controls on estuarine evolution and future trends. Geomorphology
**2017**, 290, 347–364. [Google Scholar] [CrossRef] - Qin, X.; LeVeque, R.J.; Motley, M.R. Accelerating an adaptive mesh refinement code for depth-averaged flows using GPUs. J. Adv. Modeling Earth Syst.
**2019**, 11, 2606–2628. [Google Scholar] [CrossRef] [Green Version] - Kim, D.H.; Cho, Y.S.; Yi, Y.K. Propagation and run-up of nearshore tsunamis with HLLC approximate Riemann solver. Ocean Eng.
**2007**, 34, 1164–1173. [Google Scholar] [CrossRef] - Kernkamp, H.W.J.; van Dam, A.; Stelling, G.S.; de Goede, E.D. Efficient scheme for the shallow water equations on unstructured grids with application to the Continental Shelf. Ocean Dyn.
**2011**, 61, 1175–1188. [Google Scholar] [CrossRef] - Hu, P.; Li, W.; He, Z.G.; Pahtz, T.; Yue, Z.Y. Well-balanced and flexible modelling of swash hydrodynamics and sediment transport. Coast. Eng.
**2015**, 96, 27–37. [Google Scholar] [CrossRef] [Green Version] - Hu, P.; Tan, L.; He, Z. Numerical investigation on the adaptation of dam-break flow-induced bed load transport to the capacity regime over a sloping bed. J. Coast. Res.
**2020**, 36, 1237–1246. [Google Scholar] [CrossRef] - Yu, H.; Huang, G.; Wu, C. Efficient finite volume model for shallow-water flows using an Implicit Dual Time-Stepping Method. J. Hydraul. Eng. ASCE
**2015**, 141, 04015004. [Google Scholar] [CrossRef] - Gaudreault, S.; Pudykiewicz, J.A. An efficient exponential time integration method for the numerical solution of the shallow water equations on the spher. J. Comput. Phys.
**2016**, 322, 827–848. [Google Scholar] [CrossRef] [Green Version] - Crossley, A.J.; Wright, N.G. Time accurate local time stepping for the unsteady shallow water equations. Int. J. Numer. Methods Fluids
**2005**, 48, 775–799. [Google Scholar] [CrossRef] - Dazzi, S.; Maranzoni, A.; Mignosa, P. Local time stepping applied to mixed flow modeling. J. Hydraul. Res.
**2016**, 54, 1–13. [Google Scholar] [CrossRef] - Dazzi, S.; Vacondio, R.; Palu, A.D.; Mignosa, P. A local time stepping algorithm for GPU-accelerated 2D shallow water models. Adv. Water Resour.
**2018**, 111, 274–288. [Google Scholar] [CrossRef] - Hu, P.; Lei, Y.L.; Han, J.J.; Cao, Z.X.; Liu, H.H.; Yue, Z.Y.; He, Z.G. An improved local-time-step for 2D shallow water modeling based on unstructured grids. J. Hydraul. Eng. ASCE
**2019**, 145, 06019017. [Google Scholar] [CrossRef] - Hu, P.; Lei, Y.L.; Han, J.J.; Cao, Z.X.; Liu, H.H.; He, Z.G. Computationally efficient hydro-morphodynamic modelling using a hybrid local-time-step and the global maximum-time-step. Adv. Water Resour.
**2019**, 127, 26–38. [Google Scholar] [CrossRef] - Sanders, B.F. Integration of a shallow water model with a local time step. J. Hydraul. Res.
**2008**, 46, 466–475. [Google Scholar] [CrossRef] - Sanders, B.F.; Schubert, J.E.; Detwiler, R.L. ParBreZo: A parallel, unstructured grid, Godunov-type, shallow-water code for high-resolution flood inundation modeling at the regional scale. Adv. Water Resour.
**2010**, 33, 1456–1467. [Google Scholar] [CrossRef] - Sanders, B.F.; Schubert, J.E. PRIMo: Parallel raster inundation model. Adv. Water Resour.
**2019**, 126, 79–95. [Google Scholar] [CrossRef] - Brodtkorb, A.R.; Sætra, M.L.; Altinakar, M. Efficient shallow water simulations on GPUs: Implementation, visualization, verification, and validation. Comput. Fluids
**2012**, 55, 1–12. [Google Scholar] [CrossRef] - Castro, M.J.; Ortega, S.; de la Asuncion, M.; Mantas, J.M.; Gallardo, J.M. GPU computing for shallow water flow simulation based on finite volume schemes. Comptes Rendus Mécanique
**2011**, 339, 165–184. [Google Scholar] [CrossRef] - de la Asuncion, M.; Castro, M.J. Simulation of tsunamis generated by landslides using adaptive mesh refinement on GPU. J. Comput. Phys.
**2017**, 345, 91–110. [Google Scholar] [CrossRef] - Lacasta, A.; Morales-Hernandez, M.; Murillo, J.; Garcia-Navarro, P. An optimized GPU implementation of a 2D free surface simulation model on unstructured meshes. Adv. Eng. Softw.
**2014**, 78, 1–15. [Google Scholar] [CrossRef] - Lacasta, A.; Morales-Hernandez, M.; Murillo, J.; Garcia-Navarro, P. GPU implementation of the 2D shallow water equations for the simulation of rainfall/runoff events. Environ. Earth Sci.
**2015**, 74, 7295–7305. [Google Scholar] [CrossRef] - Vacondio, R.; Palu, A.D.; Ferrari, A.; Mignosa, P.; Aureli, F.; Dazzi, S. A non-uniform efficient grid type for GPU-parallel Shallow Water Equations models. Environ. Model. Softw.
**2017**, 88, 119–137. [Google Scholar] [CrossRef] - Hou, J.; Liang, Q.; Simons, F.; Hinkelmann, R. A 2D well-balanced shallow flow model for unstructured grids with novel slope source term treatment. Adv. Water Resour.
**2013**, 52, 107–131. [Google Scholar] [CrossRef] - Liang, Q.; Marche, F. Numerical resolution of well-balanced shallow water equations with complex source terms. Adv. Water Resour.
**2009**, 32, 873–884. [Google Scholar] [CrossRef] - Audusse, E.; Bouchut, F.; Bristeau, M.; Klein, R.; Perthame, B.T. A fast and stable well-balanced scheme with hydrostatic reconstruction for shallow water flows. SIAM J. Sci. Comput.
**2004**, 25, 2050–2065. [Google Scholar] [CrossRef] - He, Z.G.; Wu, T.; Weng, H.X.; Hu, P.; Wu, G.F. Numerical investigation of the vegetation effects on dam-flows and bed morphological changes. Int. J. Sediment Res.
**2017**, 32, 105–120. [Google Scholar] [CrossRef] - Toro, E. Shock-Capturing Methods for Free-Surface Shallow Flows; Wiley: Chichester, UK, 2001. [Google Scholar]
- Yoon, T.H.; Kang, S. Finite volume model for two-dimensional shallow water flows on unstructured grids. J. Hydraul. Eng. ASCE
**2004**, 130, 678–688. [Google Scholar] [CrossRef] - Soares-Frazão, S.; Zech, Y. Dam-break flow through an idealised city. J. Hydraul. Res. IAHR
**2008**, 46, 648–658. [Google Scholar] [CrossRef] [Green Version]

**Figure 1.**Sketches of the unstructured triangular meshes: (

**a**) an internal cell (i) and its three neighboring cells; (

**b**) a face (j) and its two neighboring cells (jL and jR).

**Figure 2.**Numerical structures of these six versions: (

**a**) single-core CPU and multi-core CPU; (

**b**) GPU; (

**c**) single-core CPU + LTS, and multi-core CPU + LTS; (

**d**) GPU + LTS.

**Figure 4.**Speeding-up ratios of the six SWM versions for simulations of the dam break flows, as compared to the single-core CPU version.

**Figure 5.**Experimental configuration for dam-break flows through idealized city (modified from [28]).

**Figure 6.**Computed and measured hydrodynamics: (

**a**–

**d**) water level profiles; (

**e**–

**h**) flow velocity profiles along the B-B section at four transients.

**Figure 7.**Speeding-up ratios of the six SWM versions for simulations of the experimental dam-break flows propagating through buildings, as compared to the single-core CPU version.

**Figure 8.**Initial bed topography of computational domain, unstructured mesh and stations for Yangtze estuary and Hangzhou bay case.

**Figure 10.**Speeding-up ratios of the six SWM versions for simulations of tidal flows, as compared to the single-core CPU version.

SWM Version | LTS | Parallel Language | |
---|---|---|---|

V1 | single-core CPU | No | NA |

V2 | (multi-core CPU) | Open MP | |

V3 | GPU | CUDA Fortran | |

V4 | single-core CPU + LTS | YES | NA |

V5 | multi-core CPU + LTS | Open MP | |

V6 | GPU + LTS | CUDA Fortran |

**Table 2.**Statistics for simulations of the idealized dam-break flows (V1-V6 refers to the six SWM versions).

No. | ${\mathit{N}}_{\mathit{c}}$$/{\mathit{L}}_{50}$$\left(\mathbf{m}\right)/\mathit{\sigma}$$/\mathit{G}\mathit{c}$ | Computational Cost (s) of V1 | Speeding-Up Ratios Compared to Single-Core CPU Version | LTS Benefits | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|

V1 | V2 | V3 | V4 | V5 | V6 | V4 vs. V1 | V5 vs. V2 | V6 vs. V3 | |||

1 | 1010/0.14/0.7/1.1 | 0.079 | 1 | 1.4 | 2.6 | 2.6 | 2.0 | 2.3 | 2.6 | 1.4 | 0.9 |

2 | 4762/0.066/0.7/1.0 | 0.585 | 1 | 2.4 | 12.4 | 2.1 | 3.8 | 9.3 | 2.1 | 1.6 | 0.7 |

3 | 18,824/0.033/0.7/1.0 | 6.772 | 1 | 2.9 | 37.4 | 2.8 | 6.6 | 37.2 | 2.8 | 2.3 | 1.0 |

4 | 73,508/0.016/0.7/1.0 | 44.56 | 1 | 2.8 | 77.4 | 2.2 | 5.2 | 82.1 | 2.2 | 1.9 | 1.1 |

No. | ${\mathit{N}}_{\mathit{c}}$$/{\mathit{L}}_{50}$$\left(\mathbf{m}\right)/\mathit{\sigma}$$/\mathit{G}\mathit{c}$ | Computational Cost (s) of V1 | Speeding-Up Ratios Compared to Single-Core CPU Version | LTS Benefits | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|

V1 | V2 | V3 | V4 | V5 | V6 | V4 vs. V1 | V5 vs. V2 | V6 vs. V3 | |||

1 | 5000/0.1/1.5/1.5 | 2.9 | 1 | 1.2 | 3.7 | 2.4 | 3.7 | 7.3 | 2.4 | 3.0 | 2.0 |

2 | 10,510/0.1/1.6/1.7 | 6.3 | 1 | 1.4 | 6.3 | 2.1 | 3.2 | 7.9 | 2.1 | 2.3 | 1.3 |

3 | 20,523/0.07/1.8/2.0 | 26.3 | 1 | 1.9 | 13.8 | 3.8 | 2.3 | 10.1 | 3.8 | 3.4 | 1.4 |

4 | 28,694/0.07/1.7/1.9 | 72.1 | 1 | 2.1 | 17.6 | 4.5 | 8.8 | 28.8 | 4.5 | 4.1 | 1.6 |

5 | 43,302/0.05/1.6/1.8 | 105.8 | 1 | 2.2 | 25.2 | 4.5 | 9.2 | 39.2 | 4.5 | 4.1 | 1.6 |

6 | 68,892/0.04/1.3/1.3 | 196.0 | 1 | 2.1 | 31.3 | 3.4 | 8.0 | 53.0 | 3.4 | 3.8 | 1.7 |

7 | 153,218/0.03/1.4/1.5 | 694.1 | 1 | 2.0 | 39.4 | 3.3 | 7.9 | 79.8 | 3.3 | 4.0 | 2.0 |

8 | 263,115/0.02/1.8/2.1 | 1628.1 | 1 | 1.8 | 44.2 | 4.0 | 8.4 | 101.8 | 4.0 | 4.6 | 2.3 |

9 | 609,916/0.01/1.4/1.5 | 5954.3 | 1 | 1.9 | 51 | 3.6 | 9.1 | 132.9 | 3.6 | 4.9 | 2.6 |

No. | ${\mathit{N}}_{\mathit{c}}$$/{\mathit{L}}_{50}$$\left(\mathbf{m}\right)/\mathit{\sigma}$$/\mathit{G}\mathit{c}$ | Computational Cost (min) of V1 | Speeding-Up Ratios Compared to Single-Core CPU Version | LTS Benefits | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|

V1 | V2 | V3 | V4 | V5 | V6 | V4 vs. V1 | V5 vs. V2 | V6 vs. V3 | |||

1 | 21,629/10.3/1.7/1.7 | 20.3 | 1 | 3.2 | 7.0 | 8.1 | 33.8 | 25.4 | 8.1 | 10.5 | 3.6 |

2 | 44,606/9.6/1.9/1.9 | 28.5 | 1 | 2.4 | 10.2 | 5.6 | 20.4 | 35.6 | 5.6 | 8.4 | 3.5 |

3 | 61,315/9.6/1.9/2.0 | 35.7 | 1 | 2.9 | 17.9 | 6.0 | 14.3 | 44.6 | 6.0 | 5.0 | 2.5 |

4 | 82,717/9.6/1.8/1.8 | 59.3 | 1 | 2.9 | 22.0 | 4.1 | 12.6 | 65.9 | 4.1 | 4.4 | 3.0 |

5 | 190,308/2.5/1.7/1.8 | 104.8 | 1 | 2.8 | 18.7 | 4.1 | 11.9 | 61.6 | 4.1 | 4.3 | 3.3 |

6 | 269,945/2.4/1.4/1.4 | 442.9 | 1 | 3.1 | 24.2 | 6.4 | 18.1 | 79.1 | 6.4 | 5.9 | 3.3 |

7 | 422,853/1.8/1.2/1.2 | 662.7 | 1 | 3.0 | 24.6 | 5.5 | 18.3 | 98.9 | 5.5 | 6.1 | 4.0 |

8 | 518,175/1.6/1.2/1.2 | 1024.8 | 1 | 2.5 | 27.0 | 6.0 | 19.9 | 150.7 | 6.0 | 8.1 | 5.6 |

9 | 1,756,679/1.3/1.4/1.4 | 5954.8 | 1 | 2.6 | 108.1 | 3.5 | 12.4 | 350.3 | 3.5 | 4.7 | 3.2 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Hu, P.; Zhao, Z.; Ji, A.; Li, W.; He, Z.; Liu, Q.; Li, Y.; Cao, Z.
A GPU-Accelerated and LTS-Based Finite Volume Shallow Water Model. *Water* **2022**, *14*, 922.
https://doi.org/10.3390/w14060922

**AMA Style**

Hu P, Zhao Z, Ji A, Li W, He Z, Liu Q, Li Y, Cao Z.
A GPU-Accelerated and LTS-Based Finite Volume Shallow Water Model. *Water*. 2022; 14(6):922.
https://doi.org/10.3390/w14060922

**Chicago/Turabian Style**

Hu, Peng, Zixiong Zhao, Aofei Ji, Wei Li, Zhiguo He, Qifeng Liu, Youwei Li, and Zhixian Cao.
2022. "A GPU-Accelerated and LTS-Based Finite Volume Shallow Water Model" *Water* 14, no. 6: 922.
https://doi.org/10.3390/w14060922