Implementation of a Local Time Stepping Algorithm and Its Acceleration E ﬀ ect on Two-Dimensional Hydrodynamic Models

: The engineering applications of two-dimensional (2D) hydrodynamic models are restricted by the enormous number of meshes needed and the overheads of simulation time. The aim of this study is to improve computational e ﬃ ciency and optimize the balance between the quantity of grids used in and the simulation accuracy of 2D hydrodynamic models. Local mesh reﬁnement and a local time stepping (LTS) strategy were used to address this aim. The implementation of the LTS algorithm on a 2D shallow-water dynamic model was investigated using the ﬁnite volume method on unstructured meshes. The model performance was evaluated using three canonical test cases, which discussed the inﬂuential factors and the adaptive conditions of the algorithm. The results of the numerical tests show that the LTS method improved the computational e ﬃ ciency and fulﬁlled mass conservation and solution accuracy constraints. Speedup ratios of between 1.3 and 2.1 were obtained. The LTS scheme was used for navigable ﬂow simulation of the river reach between the Three Gorges and Gezhouba Dams. This showed that the LTS scheme is e ﬀ ective for real complex applications and long simulations and can meet the required accuracy. An analysis of the inﬂuence of the mesh reﬁnement on the speedup was conducted. Coarse and reﬁned mesh proportions and mesh scales observably a ﬀ ected the acceleration e ﬀ ect of the LTS algorithm. Smaller proportions of reﬁned mesh resulted in higher speedup ratios. Acceleration was the most obvious when mesh scale di ﬀ erences were large. These results provide technical guidelines for reducing computational time for 2D hydrodynamic models on non-uniform unstructured grids. v 1 ) at 0 s was updated to U ∗ 1 (cid:16) h ∗ 1 , u ∗ 1 , v ∗ 1 (cid:17) after 0.04 The blank value at 0.02 s was estimated using the average of the values at the former and the latter time-step level, giving U (cid:48) 1 (cid:18) h 1 + h ∗ 1 2 , u 1 + u ∗ 1 2 , v 1 + v ∗ 1 2 (cid:19) . Immediately following this, the intermediate grids ( m = 1) were updated twice over this maximum time step. The values of variables at 0.01 and 0.03 s were similarly approximated. Finally, the reﬁned mesh grids ( m = 0) were updated four times over this interval. All variables were ultimately updated to 0.04 s, which allowed the next time step to be performed.


Introduction
The numerical simulation of shallow water flow is frequently used in flood forecasting, river regulation, and flood-control planning [1]. Given the need for better accuracy and breadth of shallow-water simulations in engineering applications, a balance between the number of cells, simulation accuracy, and computational time needs to be found. Addressing this issue using optimization of the solving algorithm [2][3][4] has received much research attention, as has the use of computer hardware acceleration and parallel computing technology [5][6][7].
The local mesh refinement technique is especially efficient for balancing the number of grids and the accuracy of the simulation. It is flexible and allows grids to be arranged according to different engineering concerns, while retaining a high level of refinement in regions requiring detail, such as where flows change sharply or where there are key features. A coarse mesh is used in general or slow-flowing areas. Consequently, the simulation accuracy can be ensured with only a small increase also valuable for advancing the knowledge base. No specific comparison of the acceleration efficiency of the LTS algorithm has been demonstrated in the existing research. To address these issues, the LTS technique was applied in the establishment of a high-efficiency, 2D, shallow-water, dynamic model using the FVM and unstructured grids. Experiments were performed to verify the flux approximation at the interface, to assess the accuracy of the results, and to analyze the feasibility analysis of the LTS algorithm for complex applications. Model evaluation was performed using 2D simulations of two instantaneous dam break test cases and engineering applications that simulate the navigable flow of the river reach between the Three Gorges and Gezhouba Dams. These tests show the acceleration effect, influencing factors, and the adaptive conditions of the LTS algorithm.

Governing Equation
The governing equations used herein are the 2D shallow-water equations (SWEs), written in conservation form [23]: In the present study, the modified form of the SWEs was selected to guarantee that the scheme was well-balanced. The variables follow from where U is the vector of conserved variables; F and G are the tensors of fluxes in the x and y directions, respectively; S 0 and S f are the bed and friction slope source terms, respectively; x and y represent the spatial horizontal coordinates; u and v represent the average velocity components from integration of water depth in the x and y directions, respectively; h is the flow depth; z b is the bed elevation above the datum; g represents the acceleration related to gravity; n is Manning's roughness coefficient; T s is the duration.

Numerical Technique
Because of the nonlinear characteristics of the SWEs, the analytical solution cannot be obtained generally but can be approximated by numerical discretization. The FVM was used to discretize the governing equation. The computational region was discretized into several points. With these points as the center, the whole computational domain was divided into several interconnected but non-overlapping control volumes. By integrating the basic equation with each control volume, a set of algebraic equations with the unknown quantity on the calculation node was obtained. A triangular mesh was used to discretize the computational domain to adapt the boundary conditions for a complex terrain. The cell-centered mode was used in the calculations, which means that the physical variables are defined in the center of the control volume [24], as shown in Figure 1. Integration of Equation (1) over the area of the governing volume , gives In Equation (2), = [ , ] represents a 2D matrix. The volume integral is transformed to a line integral along the edge of the control volume using Gauss' formula where is the average value of the governing cell; is the edge of the kth control volume  ; ∆ is the area of each calculation cell; is the unit vector of the normal direction outside the edge. After discretization of Equation (3), the following formula is obtained: where ∆ is the length of each side (using triangular grids, = 1, 2, 3); and and represent the numerical fluxes and the external normal unit vector of edge .
= ∫   and = ∫   = ∆ represent the integral values of the bed and friction slope source terms in the mesh cell, respectively. The notation , denotes the inner product. Commonly used approximate Riemann solvers for the discontinuous problem include the Osher, Roe, Harten-Lax-van Leer (HLL), and HLLC schemes [25]. In the present study, the fluxes at the interface of the governing cell were estimated using the Roe scheme. The physical variables and on the interface of the left and right grid were determined to solve the Riemann discontinuity problem. The flux expression is defined as follows [26][27][28][29]: where = ( ⋅ ) is the Roe average of the Jacobian matrix. The three eigenvalues of the Jacobian matrix are ( = 1,2,3), while the related eigenvectors are ̃ . The left and right interface conservative form variables are and , respectively. To improve the adaptability of the model to complicated terrain conditions, the source term reflecting the bottom topography was managed using an eigen-decomposition method, while upwind treatment was applied to balance the interface flux [30,31]: where Ω k is the kth control volume; n is the unit normal vector; p(i, j) is a grid point; and ∆l k j is the length of a side.
Integration of Equation (1) over the area of the governing volume Ω, gives In Equation (2), E = [F, G] represents a 2D matrix. The volume integral is transformed to a line integral along the edge of the control volume using Gauss' formula ∆u k ∆t where U k is the average value of the governing cell; l k is the edge of the kth control volume Ω k ; ∆s k is the area of each calculation cell; n is the unit vector of the normal direction outside the edge. After discretization of Equation (3), the following formula is obtained: where ∆l k j is the length of each side (using triangular grids, j = 1, 2, 3); and E k j and n k j represent the numerical fluxes and the external normal unit vector of edge j. S 0 = Ω i S 0 dΩ and S f = Ω i S f dΩ = ∆s k S f represent the integral values of the bed and friction slope source terms in the mesh cell, respectively. The notation E k j , n k j denotes the inner product.
Commonly used approximate Riemann solvers for the discontinuous problem include the Osher, Roe, Harten-Lax-van Leer (HLL), and HLLC schemes [25]. In the present study, the fluxes at the interface of the governing cell were estimated using the Roe scheme. The physical variables U L and U R on the interface of the left and right grid were determined to solve the Riemann discontinuity problem. The flux expression is defined as follows [26][27][28][29]: Water 2020, 12, 1148 ∂U is the Roe average of the Jacobian matrix. The three eigenvalues of the Jacobian matrix are λ k (k = 1, 2, 3), while the related eigenvectors are e k . The left and right interface conservative form variables are U L and U R , respectively.
To improve the adaptability of the model to complicated terrain conditions, the source term reflecting the bottom topography was managed using an eigen-decomposition method, while upwind treatment was applied to balance the interface flux [30,31]: where sign() is the sign function. In this case, Furthermore, c is the average velocity given by the Roe scheme; and are the bottom elevations of the left and right side, respectively.
An explicit scheme that discretizes the frictional slope source term was carried out to increase the stability of the computation. Eventually, the variation of the conservation after undergoing ∆t was obtained.
where I is the unit matrix. Superscripts n and n + 1 refer to the current and updated time levels, respectively. ∆t is the globally permissible time step, which must be computed according to the CFL stability condition [32]: where i is the computational cell index, with values of 1, 2, . . . , N e . The term N e refer to the total number of computational grid elements; d i,j is the distance from the center of the cell (triangle centroid) to the three sides; u i and v i are the velocities in the x and y directions, respectively, of cell i; h i is the flow depth of cell i. The Courant number Cr was set to 0.8. In the traditional GTS algorithm, the time step ∆t is equal to the minimum value for the whole computational domain and is proportional to the size of the grid elements. When the scale of each mesh element varies markedly, the uniform time-step restriction will severely reduce the efficiency of the model operation. Therefore, it is essential to explore an efficient method in which the time-step changes with the mesh size and satisfies the CFL conditions.

Local Time Stepping Scheme
The key approach in the LTS algorithm is to advance each cell with a time step as close as possible to its maximum allowable value while satisfying the stability of the CFL condition. The algorithm reconstructs the time-step level of each cell with the aim of reducing the computational load of coarse grids, consequently saving computational time. Figure 2 compares the implementation of GTS versus LTS algorithms. The LTS algorithm differs in two aspects: reconstructing the local time step of each cell (Section 2.2.1) and updating values of the conserved variables ∆U (Section 2.2.2). possible to its maximum allowable value while satisfying the stability of the CFL condition. The algorithm reconstructs the time-step level of each cell with the aim of reducing the computational load of coarse grids, consequently saving computational time. Figure 2 compares the implementation of GTS versus LTS algorithms. The LTS algorithm differs in two aspects: reconstructing the local time step of each cell (Section 2.2.1) and updating values of the conserved variables ∆ (Section 2.2.2).

Reconstructing the Local Time Step of Each Cell
This procedure is defined as follows: (1) Compute the initial time step ∆t * i The ∆t * i of each cell satisfying the CFL condition is computed via Equation (9a): where u and v are the maximum velocities in the x and y directions, respectively; and h is the maximum water depth. Next, calculate the minimum time step ∆t r in the whole computational domain: (2) Set an initial temporal resolution level for each cell In terms of the CFL condition, the initial time step ∆t * i in cell i is proportional to the mesh scale d i,j . Taking the minimum time step ∆t r over the whole computational domain as the reference standard, a binary logarithmic function is selected as the conversion tool. Each cell is classified based on its initial time step ∆t * i , with the allocation of an initial time-step level m * i , reflecting the mesh scale and having integer values of 1, 2, 4, 8, 16, 32, 64, . . . The initial m-level distribution map is shown in Figure 3a.
where the function of int() is to obtain the maximum integer that does not exceed the given number.
The maximum time step ∆ is

Calculation of Element Variables at the Interface
Once the local time-step level of each computation cell is reconstructed, no further cells are updated with the same time step but are instead updated with the locally allowable time step at each cell. The algorithm focuses on the convergence of the element variables at different time levels along the interface. In a maximum time step ∆ , the number of updates at each cell is calculated using The expression for Δ in Equation (7) is then replaced by Equation (13), in which Δ from Equation (11c) takes the place of Δ in each computing cell. The improved expression is as follows: Then, Equation (13) is iterated times, and the conserved variables of each cell are updated to the same time level , from which the variables at the next time level + 1 are calculated until the calculation time terminates.
The fluxes on the interface of the governing cell are estimated from the approximate Riemann solution of the Roe scheme. However, the variable values of all cells at the current time level need to be given to confirm the and values of the adjacent states on the left and right sides of the interface. In the initial GTS model, the unified global time step ∆ can meet these requirements. In the LTS algorithm, however, when the m-level of the current and neighboring cells differ within the maximum time step ∆ , the update times are different, and time splitting occurs at the interface. In the present study, the value of the missing element variable at the interface was obtained by linear interpolation of the value of the element variable in two adjacent time layers. This is shown in Figure 4, where the maximum time-step level = 2. At the starting time of 0, we assumed that ∆ = 0.01 s. First, the element variables of the coarse grid (m = 2) were computed only once, which consumes 0.04 s. Next, the values for the intermediate meshes (m = 1) were calculated twice, costing 0.02 s for each calculation. The local time step of the coarse mesh (m = 2) at the interface is, however, 0.04 s, which suggests that the variable values at 0.02 s are missing. In the present study, the variable values U(ℎ, , ) were estimated approximately using the average value of the former and the latter time step (1/2 of the variable value at 0 and 0.04 s). Finally, the refined grids (m = 0) were calculated at four intervals, with time steps of 0.01 s. At 0.01 and 0.03 s, the values of the variables (3) Modify the time-step level The initial m-level distribution calculated by Equation (10) is usually scattered and broken (see Figure 3a), especially within the transition zone between coarse and refined grids. This distribution varies dramatically because of the large difference in mesh size. Appropriate measures need to be taken to smooth the transition and to reduce mass and momentum errors in these zones related to the calculation of conserved variables at interfaces between cells with different m-levels. The initial m-levels are modified via Equation (11a), which produces a buffer zone.
where m i,j (j = 1, 2, 3) is the time-step level of the three neighboring cells of cell i. The evaluation of the time-step level m i in cell i is based on values of the neighboring cells: m i,1 , m i,2 , m i,3 . If the time-step level of the current cell is larger than that of the neighboring cells, then the m i value will be reduced by one; otherwise, it will be retained. Thus, the difference between the m-levels of the current and neighboring cells cannot exceed one. In this way, the time-step level m i (see Figure 3a) is modified via Equation (11a), and the new m-level distribution is as shown in Figure 3b. The maximum of time-step level m max is given by where L = m max + 1 refers to the local time-step rank of the LTS algorithm in the computation (L = 1 represents the GTS solution). Building on the requirement of simulation accuracy, the maximum value of the different time-step level m u is selected according to the simulation accuracy requirements, and the restriction condition . In this case, the time step adopted by each calculation cell ∆t b i is The maximum time step ∆t max is

Calculation of Element Variables at the Interface
Once the local time-step level of each computation cell is reconstructed, no further cells are updated with the same time step but are instead updated with the locally allowable time step at each cell. The algorithm focuses on the convergence of the element variables at different time levels along the interface. In a maximum time step ∆t max , the number of updates at each cell N i is calculated using The expression for ∆U in Equation (7) is then replaced by Equation (13), in which ∆t b i from Equation (11c) takes the place of ∆t r in each computing cell. The improved expression is as follows: Then, Equation (13) is iterated N i times, and the conserved variables of each cell are updated to the same time level n, from which the variables at the next time level n + 1 are calculated until the calculation time terminates.
The fluxes on the interface of the governing cell are estimated from the approximate Riemann solution of the Roe scheme. However, the variable values of all cells at the current time level need to be given to confirm the U R and U L values of the adjacent states on the left and right sides of the interface.
In the initial GTS model, the unified global time step ∆t can meet these requirements. In the LTS algorithm, however, when the m-level of the current and neighboring cells differ within the maximum time step ∆t max , the update times are different, and time splitting occurs at the interface. In the present study, the value of the missing element variable at the interface was obtained by linear interpolation of the value of the element variable in two adjacent time layers. This is shown in Figure 4, where the maximum time-step level m max = 2. At the starting time of 0, we assumed that ∆t r = 0.01 s. First, the element variables of the coarse grid (m = 2) were computed only once, which consumes 0.04 s. Next, the values for the intermediate meshes (m = 1) were calculated twice, costing 0.02 s for each calculation. The local time step of the coarse mesh (m = 2) at the interface is, however, 0.04 s, which suggests that the variable values at 0.02 s are missing. In the present study, the variable values U(h, u, v) were estimated approximately using the average value of the former and the latter time step (1/2 of the variable value at 0 and 0.04 s). Finally, the refined grids (m = 0) were calculated at four intervals, with time steps of 0.01 s. At 0.01 and 0.03 s, the values of the variables are absent at the interface of the cells having m = 1. The above approximation method was again used. At 0.04 s, all cell variables were updated to the same time level in preparation for the next iteration.
Water 2020, 12, x FOR PEER REVIEW 8 of 24 are absent at the interface of the cells having m = 1. The above approximation method was again used. At 0.04 s, all cell variables were updated to the same time level in preparation for the next iteration. .03 s were similarly approximated. Finally, the refined mesh grids (m = 0) were updated four times over this interval. All variables were ultimately updated to 0.04 s, which allowed the next time step to be performed.

Numerical Tests
Numerical tests were carried out to verify the reliability and adaptability of the LTS strategy within a 2D hydrodynamic model, and to make a comparison between the effects of the novel LTS and the original GTS strategies on the simulation results. To quantify these differences, we used the mean-square error (MSE) where ( ) refers to the MSE of variables (such as water depth ℎ, and velocity components and in the and directions, respectively), and the subscript L refers to the local time-step rank of the solutions.
The principle purpose of using the LTS algorithm instead of the GTS algorithm is to reduce the computing time. The relative time saving ratio and speedup ratio were used as evaluation indices of the computing efficiency for different L values.
The time saving ratio is given by The blank value at 0.02 s was estimated using the average of the values at the former and the latter time-step level, giving U 1 . Immediately following this, the intermediate grids (m = 1) were updated twice over this maximum time step. The values of variables at 0.01 and 0.03 s were similarly approximated. Finally, the refined mesh grids (m = 0) were updated four times over this interval. All variables were ultimately updated to 0.04 s, which allowed the next time step to be performed.

Numerical Tests
Numerical tests were carried out to verify the reliability and adaptability of the LTS strategy within a 2D hydrodynamic model, and to make a comparison between the effects of the novel LTS and the original GTS strategies on the simulation results. To quantify these differences, we used the mean-square error (MSE) where L s ( f ) refers to the MSE of variables f (such as water depth h, and velocity components u and v in the x and y directions, respectively), and the subscript L refers to the local time-step rank of the solutions. The principle purpose of using the LTS algorithm instead of the GTS algorithm is to reduce the computing time. The relative time saving ratio T r and speedup ratio S n were used as evaluation indices of the computing efficiency for different L values.
The time saving ratio T r is given by Water 2020, 12, 1148 10 of 24 The speedup ratio S n is given by where T GTS is the computation time of the GTS strategy, and T L is the computation time of the LTS strategy using different local time-step rank values (L).

Anti-Symmetric Dam Break Case
A classic verification experiment that considers the anti-symmetric square dam break case has been described by Fennema and Chaudhry [33]. The computational domain consists of a 200 × 200 m square region with a flat bottom (bed elevation of 0 m). In this problem, the dam is assumed to fail instantaneously at a certain moment, producing an anti-symmetric breach with a width of 75 m at the center of the calculation domain, which is 95 m from the x-axis boundary. The detailed geometry of this problem is shown in Figure 5. The speedup ratio is given by where is the computation time of the GTS strategy, and is the computation time of the LTS strategy using different local time-step rank values (L).

Anti-symmetric dam break case
A classic verification experiment that considers the anti-symmetric square dam break case has been described by Fennema and Chaudhry [33]. The computational domain consists of a 200 × 200 m square region with a flat bottom (bed elevation of 0 m). In this problem, the dam is assumed to fail instantaneously at a certain moment, producing an anti-symmetric breach with a width of 75 m at the center of the calculation domain, which is 95 m from the x-axis boundary. The detailed geometry of this problem is shown in Figure 5. The computational domain was discretized on triangular meshes. Local mesh refinement technology was adopted near the break to accurately reflect the important flow regions while controlling the number of grids. The spatial resolution level varied from 6 to 1 m, with a refined area of 5%. A total of 13,338 computing cells were generated. The LTS algorithm was applied to shorten the computing time and improve the calculation efficiency. The m-level distribution of the LTS strategy is presented in Figure 6. The simulation duration was = 160 s. At the initial moment, upstream water depth was set to 10.0 m, while downstream water depth was assumed to be 5.0 m. Manning's roughness coefficient was set to 0.03 [34]. The computational domain was discretized on triangular meshes. Local mesh refinement technology was adopted near the break to accurately reflect the important flow regions while controlling the number of grids. The spatial resolution level varied from 6 to 1 m, with a refined area of 5%. A total of 13,338 computing cells were generated. The LTS algorithm was applied to shorten the computing time and improve the calculation efficiency. The m-level distribution of the LTS strategy is presented in Figure 6. The simulation duration was T s = 160 s. At the initial moment, upstream water depth was set to 10.0 m, while downstream water depth was assumed to be 5.0 m. Manning's roughness coefficient was set to 0.03 [34].
Before making a quantitative analysis of the MSE generated by the LTS strategy, it is useful to verify the validity of the GTS scheme applied to the original model. The numerical solution results at t = 7.2 s were obtained using the GTS strategy; the resulting water surface profiles are shown in Figure 7a, contours of water depth in Figure 7b, and velocity vector distribution in Figure 7c. These results effectively simulated the dam break waves and are consistent with previous research results [34][35][36]. Before making a quantitative analysis of the MSE generated by the LTS strategy, it is useful to verify the validity of the GTS scheme applied to the original model. The numerical solution results at t = 7.2 s were obtained using the GTS strategy; the resulting water surface profiles are shown in Figure 7a, contours of water depth in Figure 7b, and velocity vector distribution in Figure 7c. These results effectively simulated the dam break waves and are consistent with previous research results [34][35][36].  Table 1 shows the MSE of water depth ℎ, as well as velocity components and along the and directions, respectively, of the simulation results. As the value of the local time-step rank L increased, the MSE increased accordingly. Both precision and efficiency are important when performing hydrodynamic numerical simulations. By reducing the value of L, the calculation efficiency can be improved when levels of high accuracy are reached. This observation agrees well with the research results of Hu et al. [37]. In the present study, the error of the solution using the  Table 1 shows the MSE of water depth h, as well as velocity components u and v along the x and y directions, respectively, of the simulation results. As the value of the local time-step rank L increased, the MSE increased accordingly. Both precision and efficiency are important when performing hydrodynamic numerical simulations. By reducing the value of L, the calculation efficiency can be improved when levels of high accuracy are reached. This observation agrees well with the research results of Hu et al. [37]. In the present study, the error of the solution using the LTS algorithm was within the 10 −3 to 10 −2 range. At the beginning of the calculation, the total water volume was 3 × 10 5 m 3 . When the value of L was 2, 3, and 4, the relative error of the water volume was 0.0017%, 0.0019%, and 0.0045%, respectively, which indicates that the simulation meets the accuracy and conservation of water volume requirements. Table 2 shows the computing time, time-saving ratio, and speedup ratio for different values of L. The simulation time of the dam break was 160 s. A time saving of 40% to 50% was achieved using the LTS algorithm in the numerical experiment of the anti-symmetric square dam break. For L = 4, a speedup ratio of 2.1 was achieved.
This clearly shows that the calculation efficiency of the numerical simulation of the anti-symmetric square dam break is improved using the LTS technique ( Figure 8). As the value of L increased, the speedup ratio also increased, thus shortening the model operation.
Water 2020, 12, x FOR PEER REVIEW 13 of 24 Table 2 shows the computing time, time-saving ratio, and speedup ratio for different values of L. The simulation time of the dam break was 160 s. A time saving of 40% to 50% was achieved using the LTS algorithm in the numerical experiment of the anti-symmetric square dam break. For L = 4, a speedup ratio of 2.1 was achieved.
This clearly shows that the calculation efficiency of the numerical simulation of the anti-symmetric square dam break is improved using the LTS technique ( Figure 8). As the value of L increased, the speedup ratio also increased, thus shortening the model operation.

Non-Flat Bottom Dam Break Case
In this test case, the length of the rectangular channel was 75 m, and its width was 30 m. Three obstacles were placed on the bottom of the rectangular channel. Two obstacles had a radius of 6.5 m and a height of 1 m, and one had a radius of 8.6 m and a height of 2 m. The elevation expression is defined as follows: Local mesh refinement was applied with a spatial resolution level varying from 0.5 to 3 m. The grid generation results are shown in Figure 9. The total number of grid cells for the region was 6848. In the calculation, a solid wall boundary condition was adopted. Manning's roughness coefficient was set to 0.018 [30]. The water depth for 16 m in the computational domain was set to 2 m

Non-Flat Bottom Dam Break Case
In this test case, the length of the rectangular channel was 75 m, and its width was 30 m. Three obstacles were placed on the bottom of the rectangular channel. Two obstacles had a radius of 6.5 m and a height of 1 m, and one had a radius of 8.6 m and a height of 2 m. The elevation expression is defined as follows: Local mesh refinement was applied with a spatial resolution level varying from 0.5 to 3 m. The grid generation results are shown in Figure 9. The total number of grid cells for the region was 6848. In the calculation, a solid wall boundary condition was adopted. Manning's roughness coefficient was set to 0.018 [30]. The water depth for x ≤ 16 m in the computational domain was set to 2 m and was assumed to be 0 m elsewhere. The variation of the ground level is shown in Figure 10a. In the GTS scheme, the time step was 0.005 s, the initial total water volume was 959.841 m 3 , and the final water volume was 959.978 m 3 . The relative error was 0.01%, which is within the discrete error range. The model dealt well with wetting and drying fronts over the complex terrain, satisfied computational stability, and conservation of water volume requirements, and provided results that were consistent with previous experimental results [38,39].
Water 2020, 12, x FOR PEER REVIEW 14 of 24 and was assumed to be 0 m elsewhere. The variation of the ground level is shown in Figure 10a. In the GTS scheme, the time step was 0.005 s, the initial total water volume was 959.841 m 3 , and the final water volume was 959.978 m 3 . The relative error was 0.01%, which is within the discrete error range. The model dealt well with wetting and drying fronts over the complex terrain, satisfied computational stability, and conservation of water volume requirements, and provided results that were consistent with previous experimental results [38,39]. The distribution of m-values obtained by applying the LTS strategy is shown in Figure 9. The maximum time-step level is 2. The simulation of the dam break water flow was modeled for = 100 s. The inundation related to the dam break is shown in Figure 10. To observe the influence of the LTS algorithm on the simulation results, four bathymetric measuring points were arranged (see Figure 9 for the location of the measuring points). Figure 11 provides a comparison of water depths generated by the LTS algorithm and the classical algorithm during the process of the dam break. A visual inspection indicates that the simulated water depths from the two algorithms are essentially the same. Thus, the LTS technique improves the efficiency of calculation without any reduction in accuracy.
According to the statistical results for this simulation (shown in Table 3), adopting the LTS technology in the numerical experiment of the non-flat bottom dam break created a saving of up to 26% in computation time, and a speedup ratio of 1.3, without compromising the solution accuracy. The best result was obtained for L = 3, which reduced the computational cost and improved the calculation efficiency.    The distribution of m-values obtained by applying the LTS strategy is shown in Figure 9. The maximum time-step level m max is 2. The simulation of the dam break water flow was modeled for T s = 100 s. The inundation related to the dam break is shown in Figure 10. To observe the influence of the LTS algorithm on the simulation results, four bathymetric measuring points were arranged (see Figure 9 for the location of the measuring points). Figure 11 provides a comparison of water depths generated by the LTS algorithm and the classical algorithm during the process of the dam break. A visual inspection indicates that the simulated water depths from the two algorithms are essentially the same. Thus, the LTS technique improves the efficiency of calculation without any reduction in accuracy.

Navigable Flow Simulation Case
There are several noticeable perilous waterways between the Three Gorges Dam and the Gezhouba Dam. The river section is about 38 km long [40,41]. Bed elevations measured in October 2008 provided by the Three Gorges Navigation Authority were used as the initial bed topography. In the present study, the three segments with the highest risk to shipping (the Letianxi, Liantuo, and Shipai riverways), and the outlet of the Three Gorges Reservoir are adopted for local mesh refinement. The spatial resolution level ranged from 10 to 100 m, and 36,297 cells and 19,011 nodes were generated. The results of the grid generation and refined areas are shown in Figure 12. According to the statistical results for this simulation (shown in Table 3), adopting the LTS technology in the numerical experiment of the non-flat bottom dam break created a saving of up to 26% in computation time, and a speedup ratio of 1.3, without compromising the solution accuracy. The best result was obtained for L = 3, which reduced the computational cost and improved the calculation efficiency.

Navigable Flow Simulation Case
There are several noticeable perilous waterways between the Three Gorges Dam and the Gezhouba Dam. The river section is about 38 km long [40,41]. Bed elevations measured in October 2008 provided by the Three Gorges Navigation Authority were used as the initial bed topography. In the present study, the three segments with the highest risk to shipping (the Letianxi, Liantuo, and Shipai riverways), and the outlet of the Three Gorges Reservoir are adopted for local mesh refinement. The spatial resolution level ranged from 10 to 100 m, and 36,297 cells and 19,011 nodes were generated. The results of the grid generation and refined areas are shown in Figure 12. The LTS algorithm was applied to the numerical simulation of the navigable flow of the waterway. The results are provided in Table 4. The maximum time-step level ( = 4) and the minimum time step (∆ = 0.03 s) were calculated for the whole computational domain. The simulation duration was set to = 4 h. The absolute deviation in the water level between the calculated and measured values was 0.06 to 0.07 m, and the relative deviation in flow velocity was between −3.8% and 4.6%. There is no significant difference between the LTS and GTS algorithm simulation results for water level and flow velocity. The mass errors are non-negligible for a large computational area and a long simulation. This was addressed by adopting a smaller maximum time-step level . The obvious acceleration effect that this provides in the real application ensures that accuracy requirements are met. It is clear that the LTS algorithm has provided obvious improvements in the execution time without compromising the solution accuracy. This is of practical value and importance in navigation. The LTS algorithm was applied to the numerical simulation of the navigable flow of the waterway. The results are provided in Table 4. The maximum time-step level (m max = 4) and the minimum time step (∆t r = 0.03 s) were calculated for the whole computational domain. The simulation duration was set to T s = 4 h. The absolute deviation in the water level between the calculated and measured values was 0.06 to 0.07 m, and the relative deviation in flow velocity was between −3.8% and 4.6%. There is no significant difference between the LTS and GTS algorithm simulation results for water level and flow velocity. The mass errors are non-negligible for a large computational area and a long simulation. This was addressed by adopting a smaller maximum time-step level m max . The obvious acceleration effect that this provides in the real application ensures that accuracy requirements are met. It is clear that the LTS algorithm has provided obvious improvements in the execution time without compromising the solution accuracy. This is of practical value and importance in navigation. 2008.08. 20

The Influence of the Proportion of Refined Mesh on the Acceleration Effect
In the anti-symmetric dam break case, the 75-m breach was used as the core area in which to analyze the acceleration effect of the LTS algorithm. In this area, we set the refined area to different proportions of the total area (4 hm 2 ). The spatial resolution level varied from 1 to 6 m, with a minimum resolution close to the breach of 1 m, surrounded by a smooth transition up to a maximum resolution of 6 m in the outer domain. The mesh generation results for the refined areas for various proportions in the anti-symmetric dam break case is shown in Figure 13. The numerical simulation was performed for refined areas of 5%, 10%, 25%, 50%, and 75%. The impact of increasing the local refinement area on the acceleration effect was analyzed for a time-step level of m = 0 in the refinement area, m = 2 in the coarse grid area, and m = 1 in the buffer zone.

The Influence of the Proportion of Refined Mesh on the Acceleration Effect
In the anti-symmetric dam break case, the 75-m breach was used as the core area in which to analyze the acceleration effect of the LTS algorithm. In this area, we set the refined area to different proportions of the total area (4 hm 2 ). The spatial resolution level varied from 1 to 6 m, with a minimum resolution close to the breach of 1 m, surrounded by a smooth transition up to a maximum resolution of 6 m in the outer domain. The mesh generation results for the refined areas for various proportions in the anti-symmetric dam break case is shown in Figure 13. The numerical simulation was performed for refined areas of 5%, 10%, 25%, 50%, and 75%. The impact of increasing the local refinement area on the acceleration effect was analyzed for a time-step level of m = 0 in the refinement area, m = 2 in the coarse grid area, and m = 1 in the buffer zone. The duration of the dam break was kept at 160 s. Table 5 shows the number of generated grids, computational cost, speedup ratio, and time-saving ratio for different refined area proportions. As the refined area was increased, the total number of meshes generated by partitioning increased. The grid number increased from 13,338 at 0.2 hm 2 to 73,306 at 3 hm 2 , which represents an increase by a factor of 5.5. The operating time of the model rose from 608 to 3394 s, which represents an increase by a factor of 5.6. The number of grids is a key factor determining the overhead of computing. There is a direct positive correlation between the number of grids and computing overheads. The duration of the dam break T s was kept at 160 s. Table 5 shows the number of generated grids, computational cost, speedup ratio, and time-saving ratio for different refined area proportions. As the refined area was increased, the total number of meshes generated by partitioning increased. The grid number increased from 13,338 at 0.2 hm 2 to 73,306 at 3 hm 2 , which represents an increase by a factor of 5.5. The operating time of the model rose from 608 to 3394 s, which represents an increase by a factor of 5.6. The number of grids is a key factor determining the overhead of computing. There is a direct positive correlation between the number of grids and computing overheads. The speedup ratio was strongly affected by the ratio of the locally refined area to the total area. As the proportion of the refined area increased, the speedup ratio obtained by the LTS algorithm diminished, reducing the acceleration effect. When L = 3, the speedup ratio decreased from 1.39 times to 1.01 times; when the refined area was increased from 5% to 75%, there was scarcely any improvement in computational efficiency at high proportions of refined area. These patterns occurred because a high proportion of refined area brings about an unreasonable number of refined grids (with a local time-step level, m = 0), which reduces the number of coarse grids (m = 1, 2) that help optimize the operation. This suggests that little improvement is gained by implementing the LTS algorithm in cases that have a high proportion of refined grids. Conversely, when refined grids account for a small proportion of the total number of grids, the LTS algorithm can markedly shorten the computing time and achieve an excellent acceleration effect. Furthermore, the results in Figure 14 suggest that L has an influence on the calculation efficiency. The acceleration ratio increased as L increased.

The Impact of the Different Scale of the Mesh on Acceleration Effect
In the case study of the anti-symmetric dam break, the test was carried out using a refinement of 5% of the breach area, as shown in Figure 13a. A grid discretization method was employed to analyze the influence of using different grid scales on the acceleration effect while using the LTS algorithm. The refined grid had a minimum spatial step of 1 m in the burst section, while the coarse grids had spatial steps of 2, 3, 4, 5, and 6 m. The computational results are shown in Table 6 and Figure 15. These results indicate that the speedup ratio improves with the increasing differences in grid resolution. The CFL condition of Equation (9a) shows that the time step of the mesh is positively correlated with the mesh scale difference. As the difference between the maximum and minimum scales of the grids increases, the maximum time-step level becomes higher, and a larger L can be adopted. In this test, once the spatial step of the coarse grid was 4, 5, and 6 m, it was possible to set the L value to 3, while the other two conditions were only subdivided into two grades. This explains why greater grid diversity leads to a higher speedup ratio.

The Impact of the Different Scale of the Mesh on Acceleration Effect
In the case study of the anti-symmetric dam break, the test was carried out using a refinement of 5% of the breach area, as shown in Figure 13a. A grid discretization method was employed to analyze the influence of using different grid scales on the acceleration effect while using the LTS algorithm. The refined grid had a minimum spatial step of 1 m in the burst section, while the coarse grids had spatial steps of 2, 3, 4, 5, and 6 m. The computational results are shown in Table 6 and Figure 15. These results indicate that the speedup ratio improves with the increasing differences in grid resolution. The CFL condition of Equation (9a) shows that the time step of the mesh is positively correlated with the mesh scale difference. As the difference between the maximum and minimum scales of the grids increases, the maximum time-step level m max becomes higher, and a larger L can be adopted. In this test, once the spatial step of the coarse grid was 4, 5, and 6 m, it was possible to set the L value to 3, while the other two conditions were only subdivided into two grades. This explains why greater grid diversity leads to a higher speedup ratio.

Conclusions
Improvements in the computational efficiency of the 2D hydrodynamic model have a significant bearing on its engineering applications. In the present study, an improved 2D shallow-water dynamic model applying an LTS algorithm was established using an FVM on a triangular mesh. Results from simulation and analysis of three canonical test cases show that the LTS scheme has a satisfactory ability for adapting real complex applications and long simulations while meeting the required accuracy. The following conclusions were drawn: (1) Based on the FVM for unstructured grids, an LTS algorithm was implemented that improved the computational efficiency of the model, while satisfying water conservation conditions. In the anti-symmetric dam break case, a speedup ratio of 2.1 was achieved, which saved 53% in execution time. The speedup ratio of the non-flat bottom dam break case was 1.3, which represented a shortening of 26% in the calculation time. The numerical simulation of the navigable flow of the river reach between the Three Gorges and Gezhouba Dams achieved a speedup ratio of 1.9, which represented a saving of 49% in modeling time.
(2) The proportions of coarse to refined meshes on the acceleration effect of the LTS algorithm were noticeable. It was evident that a higher speedup ratio was obtained when the proportion of the refined mesh was minimized. When the proportion of the refined mesh was high, the acceleration effect was not significant. It is clear that the LTS algorithm is best suited to situations in which refinement is only required in small regions.
(3) When using the LTS algorithm on non-uniform unstructured grids, the larger the grid scale difference, the more obvious the grid layering became. This led to increased acceleration effects. However, computational accuracy was slightly impaired by excessive differences in grid mesh size.

Conclusions
Improvements in the computational efficiency of the 2D hydrodynamic model have a significant bearing on its engineering applications. In the present study, an improved 2D shallow-water dynamic model applying an LTS algorithm was established using an FVM on a triangular mesh. Results from simulation and analysis of three canonical test cases show that the LTS scheme has a satisfactory ability for adapting real complex applications and long simulations while meeting the required accuracy. The following conclusions were drawn: (1) Based on the FVM for unstructured grids, an LTS algorithm was implemented that improved the computational efficiency of the model, while satisfying water conservation conditions. In the anti-symmetric dam break case, a speedup ratio of 2.1 was achieved, which saved 53% in execution time. The speedup ratio of the non-flat bottom dam break case was 1.3, which represented a shortening of 26% in the calculation time. The numerical simulation of the navigable flow of the river reach between the Three Gorges and Gezhouba Dams achieved a speedup ratio of 1.9, which represented a saving of 49% in modeling time. (2) The proportions of coarse to refined meshes on the acceleration effect of the LTS algorithm were noticeable. It was evident that a higher speedup ratio was obtained when the proportion of the refined mesh was minimized. When the proportion of the refined mesh was high, the acceleration effect was not significant. It is clear that the LTS algorithm is best suited to situations in which refinement is only required in small regions. (3) When using the LTS algorithm on non-uniform unstructured grids, the larger the grid scale difference, the more obvious the grid layering became. This led to increased acceleration effects. However, computational accuracy was slightly impaired by excessive differences in grid mesh size.