A New Parallel Framework of SPH-SWE for Dam Break Simulation Based on OpenMP

Due to its Lagrangian nature, Smoothed Particle Hydrodynamics (SPH) has been used to solve a variety of fluid-dynamic processes with highly nonlinear deformation such as debris flows, wave breaking and impact, multi-phase mixing processes, jet impact, flooding and tsunami inundation, and fluid–structure interactions. In this study, the SPH method is applied to solve the two-dimensional Shallow Water Equations (SWEs), and the solution proposed was validated against two open-source case studies of a 2-D dry-bed dam break with particle splitting and a 2-D dam break with a rectangular obstacle downstream. In addition to the improvement and optimization of the existing algorithm, the CPU-OpenMP parallel computing was also implemented, and it was proven that the CPU-OpenMP parallel computing enhanced the performance for solving the SPH-SWE model, after testing it against three large sets of particles involved in the computational process. The free surface and velocities of the experimental flows were simulated accurately by the numerical model proposed, showing the ability of the SPH model to predict the behavior of debris flows induced by dam-breaks. This validation of the model is crucial to confirm its use in predicting landslides’ behavior in field case studies so that it will be possible to reduce the damage that they cause. All the changes made in the SPH-SWEs method are made open-source in this paper so that more researchers can benefit from the results of this research and understand the characteristics and advantages of the solution proposed.


Introduction
The Smooth Particle Hydrodynamics (SPH) is a meshless method [1] very commonly used nowadays [2][3][4][5][6][7][8][9][10][11][12]. Gingold and Monaghan [13] were the first to propose this method to solve astrophysical simulations, using statistical techniques to recover analytical expressions for the physical variables from a known distribution of fluid elements. The SPH method is typically used for solving the equations of hydrodynamics in which Lagrangian discretized mass elements are followed [14].
Over the years, with the development of parallel computing, the development of OpenMP and MPI in parallel methods has matured, and MPI is widely used in the field of engineering computing [59][60][61][62][63]. However, in this multi-machine cluster environment, memory is not shared. For global shared data operations, data must be transferred by the communication between machines [64,65].
OpenMP is based on the shared storage mode of multi-core processors, and it is commonly used in parallel processing of single workstations. Although it is limited by the processing capacity and memory capacity of a single node, it can simplify the past multi-core computing to the present multi-core computers. The program design is relatively simple, and it can secure the advantages of economic and programming optimizations [66].
Therefore, this paper adopts the parallel computing method of CPU-OpenMP that is applied to a single machine and a multi-core to calculate the new SPH-SWEs framework for the parallel computing of the test case of 2-D dry-bed dam break with particle splitting [67,68] and a 2-D debris flow with a rectangular obstacle downstream the dam [2]. The accuracy of the new SPH-SWEs framework was then verified by comparing the serial algorithm, and the advantages of CPU-OpenMP parallel computing were analyzed.
The paper is organized as follows: Section 2 describes the methodology adopted presenting the theoretical derivation of the numerical model applied and the governing equations. Section 3 explains the setup of the SPH-SWEs method used. Section 4 provides the results of the application tested with a discussion of the results obtained. Lastly, Section 5 produces a brief summary and concluding remarks of the whole study.

Methodology
Ata et al., [30] and Paz and Bonet [32] have initiated the idea of SPH-SWE model solution, which is described in Section 2.1 with all the governing equations.

Governing Equations
By ignoring the Coriolis effect and the fluid viscosity, SWEs can be written in the Lagrangian form as follows: dd dt = −d∇·v dv dt = −g∇d + g(∇b + S f ) (1) d represents the water depth, g the acceleration of gravity, v is velocity, b represents the riverbed elevation, and S f represents the riverbed friction. In the SWEs, the area density is defined as: ρ represents the density, and ρ w represents the density of water.

Water Depth Solutions
According to the SPH idea, the area density (i.e., water depth) of particles is solved as shown below in the implicit function: xi/xj represents the particle coordinates; mj represents the particle mass of j; h i and ρ i represent the smooth length and the area density of the particle i; h 0 and ρ 0 represent the initial values of the smooth length and the area density, respectively; d m represents the latitude (1 represents one dimension, 2 represents two dimensions) and W represents the kernel function.

Speed Solution
According to the Lagrangian equation of motion [69], a i is the acceleration of the particle i, and the solution formula of each particle can be obtained as follows: k i = ∇(∇b i ) represents b(x) of the curvature tensor [70]; t i represents the acceleration caused by the internal force; ∇b i represents the riverbed gradient of the particle i, and in order to deal with any complex terrain problem, the riverbed gradient can be modified as follows [71,72]: ∇W i denotes the gradient of the modified kernel function, which is modified by the correction matrix L i , as shown below: In order to reduce the numerical oscillation and ensure the stability of the calculation, one method is to increase the viscosity term as introduced below [73]: ρ ij x ij 2 +ζ 2 (7) β represents the correction coefficient caused by variable smooth length; r ij represents the particle spacing; π ij represents the numerical viscosity added to maintain stability. However, this method has the problem of numerical dissipation.
To reduce this issue, the interaction between two particles was treated as a Riemann problem [74], as follows: d l and d r represent the water depth on the left and right sides, respectively; k = l and k = r represent the left and right states, respectively; d 0 represents the initial estimated water depth; c = gh represents the shallow water wave velocity.

Time Integration and Boundary Processing
In order to update the particle velocity and displacement, the leap-frog time integration scheme [75] was used. By using this method, both time and space are of second-order accuracy, and the storage demand is relatively low, however the calculation efficiency is relatively high, as shown below: ∆t represents the time step; where the time step must meet the Courant number condition [76] displayed as follows: To solve the boundary problem, this study adopts the Modified Virtual Boundary Particle (MVBP) method [36]. MVBP method is an improvement of the virtual boundary particle (VBP) method [77]. The virtual particles on the boundary will neither move with the fluid particles nor interact with them but generate the virtual particles similar to the mirror image through point symmetry. This method is easier and simpler in dealing with the complex boundary.
Compared with the VBP method, the MVBP method has two improvements: (1) When a virtual boundary particle is within the range of the kernel function of a fluid particle, two layers of newly generated virtual particles can be added (X k,1 = 2X v − X i and X k,2 = 4X v − X i ). Among them, X k,1 and X k,2 represent the coordinates of the newly generated virtual particles, and X v represents the virtual boundary particles; (2) When the internal angle of the boundary is less than or equal to 180 • , two newly generated virtual particles are added outside the corner. Compared with the single point of the VBP method, this improvement reduces the kernel truncation error.
In order to verify the effect of different kernel functions to simulate the dam break, this paper used the SPH-SWE open source code [34][35][36][37]68] and applied different kernel functions (B-spline, super Gauss, quadratic spline, Gauss, quartic spline, quintic and Bell) to simulate case 2 open source scenario [68]. These were the initial conditions considered for the dam-break: (i) simulation area was 2000 m long; (ii) the river bed elevation was 0 m; (iii) the initial fluid particle area was 1000 m long; (iv) the particle spacing was 10 m; (v) the initial water depth was 10 m, (vi) and the simulation duration was 50 s. The position, the water depth, and speed of the fluid particles were obtained as an output every 10 s. After verification, the advantages and disadvantages of different kernel functions and different numerical oscillation processing methods (see Equations (7)-(8)) identified from the results are consistent. At t = 50 s, the water depth dissipated after 1500 m, and the results of different kernel functions and different numerical oscillation processing methods can be visually reflected through the graphs in Figure 1; at the same time, it also illustrates the continuity of the numerical oscillation issue. Therefore, the data of t = 50 s (Figure 1 results) was selected for analysis in this study.
It can be noticed from Figure 1 that in the numerical simulation of dam break based on SPH-SWEs approach, all kernel functions are characterized by numerical oscillation except the option where the Bell kernel function is considered. The three kernel functions that provide a more accurate estimation of the water depth and the velocity are B-spline, quadratric spline, and quartic spline (above 85.7%). Figure 2a-d shows the absolute error between the calculated water depths and velocities using these three kernel functions vs. the analytical solution. Results displayed confirm the optimal performance of the B-spline kernel function in dam break simulations. It can be noticed from Figure 1 that in the numerical simulation of dam break based on SPH-SWEs approach, all kernel functions are characterized by numerical oscillation except the option where the Bell kernel function is considered. The three kernel functions that provide a more accurate estimation of the water depth and the velocity are B-spline, quadratric spline, and quartic spline (above 85.7%). Figure 2a-d shows the absolute error between the calculated water depths and velocities using these three kernel functions vs. the analytical solution. Results displayed confirm the optimal performance of the B-spline kernel function in dam break simulations.  If the kernel function adopted is B-spline, the numerical oscillation is treated as adding the viscosity term (numerical viscosity, lax Friedrichs flux [78]). The results are shown in Figures 2d and  3. It is shown in Figure 3 that the Riemann solvers method has better characteristics in dealing with numerical oscillation problems when using the same kernel function. This confirms that it is possible  If the kernel function adopted is B-spline, the numerical oscillation is treated as adding the viscosity term (numerical viscosity, lax Friedrichs flux [78]). The results are shown in Figures 2d and 3. It is shown in Figure 3 that the Riemann solvers method has better characteristics in dealing with numerical oscillation problems when using the same kernel function. This confirms that it is possible to increase the viscosity by increasing π ij in Equation (7), which helps to reduce the numerical oscillation; however, this can cause a decrease in the accuracy of the calculation results and an possible increase on computational time. If the kernel function adopted is B-spline, the numerical oscillation is treated as adding the viscosity term (numerical viscosity, lax Friedrichs flux [78]). The results are shown in Figures 2d and 3. It is shown in Figure 3 that the Riemann solvers method has better characteristics in dealing with numerical oscillation problems when using the same kernel function. This confirms that it is possible to increase the viscosity by increasing πij in Equation (7), which helps to reduce the numerical oscillation; however, this can cause a decrease in the accuracy of the calculation results and an possible increase on computational time. According to Figures 2a,d and 3, it can be found that the two-shocks Riemann solver is more advantageous in dealing with numerical oscillations when considering dam break cases. The absolute error of the solutions considered is significantly small. Table 1 displays the errors of the three methods adopted, and it can be seen that the standard deviations of the velocity and water depth (WD) of the three methods are small and, basically, of similar magnitude. However, the average  Figure 3, it can be found that the two-shocks Riemann solver is more advantageous in dealing with numerical oscillations when considering dam break cases. The absolute error of the solutions considered is significantly small. Table 1 displays the errors of the three methods adopted, and it can be seen that the standard deviations of the velocity and water depth (WD) of the three methods are small and, basically, of similar magnitude. However, the average relative errors of water depth and speed solved by the two-shocks Riemann solver seem to be the smallest, 0.92% and 5.86%, respectively. In order to verify the effect of the selected B-spline kernel function and the numerical oscillation processing method, a wet case was simulated (simulation range of 2000 m, initial water depth in the range of 0-1000 m is 10 m, and water depth in the range of 1000-2000 m is 5 m. The simulation time was 50 s, and the calculation results are generated every 10 s). The results are summarized in Table 2 and are shown in Figure 4.
According to the numbers displayed in Table 2, using the SPH-SWE model adopting the B-spline kernel function and the two-shocks Riemann solver method to solve the wet case, the simulation results are better, and their mean absolute errors to simulate velocity and water depth are within the 6%.
Based on these results, it was decided to select the B-spline kernel function and the two-shocks Riemann solver method to perform the two-dimensional dam-break numerical simulation to verify the computational efficiency of the new SPH-SWE model solution framework proposed.   According to the numbers displayed in Table 2, using the SPH-SWE model adopting the B-spline kernel function and the two-shocks Riemann solver method to solve the wet case, the simulation results are better, and their mean absolute errors to simulate velocity and water depth are within the 6%.
Based on these results, it was decided to select the B-spline kernel function and the two-shocks Riemann solver method to perform the two-dimensional dam-break numerical simulation to verify the computational efficiency of the new SPH-SWE model solution framework proposed.

SPH-SWE Model Solution Framework
When simulating dam break cases with a large number of particles involved, there is a challenge to be faced associated with low calculation efficiency. The code runs in serial steps and before each variable calculation, the mesh is divided into small grids (mesh size is 2H, smooth length, in order to calculate the corresponding parameters of each particle), making the process more repetitive and requiring a lot of calculation time. Moreover, the code framework is complex and it is demanding to complete any modification. As the open source code solves the model with a large number of particles, it has the problem of low calculation efficiency and cannot even be calculated (the reason is that the array overflows). This paper proposes a new SPH-SWE model solution framework, which

SPH-SWE Model Solution Framework
When simulating dam break cases with a large number of particles involved, there is a challenge to be faced associated with low calculation efficiency. The code runs in serial steps and before each variable calculation, the mesh is divided into small grids (mesh size is 2H, smooth length, in order to calculate the corresponding parameters of each particle), making the process more repetitive and requiring a lot of calculation time. Moreover, the code framework is complex and it is demanding to complete any modification. As the open source code solves the model with a large number of particles, it has the problem of low calculation efficiency and cannot even be calculated (the reason is that the array overflows). This paper proposes a new SPH-SWE model solution framework, which can dynamically allocate the storage space of particle information, solve the problems of repeated particle search and unsuccessful memory allocation of the array of stored particle information, and can quickly solve the large-scale SPH-SWE model. Furthermore, the model framework proposed is simple, and any modification can be made easily to this algorithm.
For this study, Algorithm 1 was developed to solve the problem of data analysis and realize the CPU-OpenMP parallel computing. Algorithm 1. Calculation framework of the SPH-SWEs model. This algorithm is needed to read the particles data (include fluid particles/virtual particles/open boundary particle/riverbed particles).

Read parameters
Output initial data of the model Mesh riverbed particles and calculate fluid particles and the net water depth Loop 1 Step 1: Calculate the water depth of fluid particles Loop 3 .
Step 2: Calculate time water depth of fluid particles and the speed gradient Loop 4 .
Step 3: Calculate time increments. ∆t = CFL Step 4: Calculate accelerations of fluid particle,corrections of riverbed gradients,speeds, and displacements Step 5: Calculate displacements of open boundary particles.
Step 6: Step 7: Calculate fluid particle of the riverbed Loop 1 .

end do
The calculation of each time step includes five parts, which can be summarized as follows: 1.
Calculation of fluid particles in the riverbed; 2.
Particle search, and calculation of the water depth; 3.
Calculation of the fluid particle depth and velocity gradient; 4.
Calculation of velocity and displacement rate.
In this paper, multiple two-dimensional arrays were used to store the information of fluid particles, riverbed particles, virtual particles, and open boundary particles. When calculating the parameters related to the above five parts, each particle can be calculated separately, and there is no dependency between particles. Therefore, the CPU-OpenMP parallel computing can be realized (since each cycle is for all fluid particles, the calculation amount is known, so the schedule in the cycle configuration is set to Static mode) and the SPH-SWE model with a large number of particles can also be calculated.

Fluid Particle Riverbed Calculation
When calculating the riverbed particles in each time step, considering that the bed range and the smooth length of the riverbed particles are the same and there is no relationship between them, the calculation of the riverbed fluid particles was carried as follows.
Firstly, the grid division was completed, then the calculation of the riverbed fluid particles was conducted. It was verified that the calculation efficiency has no advantage, and additional array was needed to store riverbed particle information and increase data reading time. Please see below Algorithm 2 for the calculation framework adopted to achieve this task. Algorithm 2. Computing fluid particle riverbed.
Calculate particles' mesh locations based on the riverbed's mesh 7.
CALL PURE celij_hb (i, h_t(i), sum_h_t(i)) 9. endif 10. enddo 11. !$OMP END PARALEL DO Because the calculated variables were two arrays, h_t and sum_h_t, it was not difficult to realize OpenMP parallel, and multiple threads were set to calculate the riverbed fluid particles at the same time, following these formulae: After the calculation, the CSPM [79] was corrected, as shown in the next formula:

Particle Search
In this paper, the particle search technique [80,81] was used as a separate module to prepare for the calculation of parameters such as water depth, acceleration, and velocity.
In the particle search, the mesh was firstly divided, and then the mesh area of each particle search was calculated; in order to ensure the symmetry of particle interaction, less than 2h i or 2h j was used as the judgment condition of i effective particles.
The specific steps of the particle search technique adopted [82] (see Figure 5) are described as follows: 1.
Before each time step, the temporary grid position was updated, and each grid was assigned to a unique number; the grid size can be set to a fixed size dx_grid/dy_grid; 2.
According to the position of the current SPH particles, all the SPH particles were allocated to the temporary mesh space, and the particle chain in the mesh was established;

3.
According to the range (2h i ) of the tight support region of particle i, the search of other meshes (-xsize to xsize, -ysize to ysize) was completed in the tight support region of the mesh, storing the mesh number; 4.
All the SPH particles i and j in the mesh were searched (icell-xsize to icell + size, jcell-ysize to jcell + ysize) in the tight support domain.
Water 2020, 12, x FOR PEER REVIEW 12 of 31 3. According to the range (2hi) of the tight support region of particle i, the search of other meshes (-xsize to xsize, -ysize to ysize) was completed in the tight support region of the mesh, storing the mesh number; 4. All the SPH particles i and j in the mesh were searched (icell-xsize to icell + size, jcell-ysize to jcell + ysize) in the tight support domain. To avoid high time consumption caused by repeated particle search in the meshless SPH-SWE model, Algorithm 3 was produced. To avoid high time consumption caused by repeated particle search in the meshless SPH-SWE model, Algorithm 3 was produced.
In the open source code, the information of particles was stored in a three-dimensional array, and the grid was divided by a maximum smooth length of 2hmax (by adopting this, all particles can be stored into a cell, causing failure of memory allocation; however, the particle search method mentioned above solved this issue).

Water Depth Calculation
According to the Newton-iteration method, the water depth of particles was calculated, and the maximum number of iterations and the iteration-errors were taken as a criteria to terminate the iterations. In order to reduce the calculation time while ensuring the accuracy of the results, the maximum number of iterations of each particle was generally set to 50, and the iteration error was setup to 10 −3 . Nevertheless, different approaches can be selected according to different calculation models.
Before each calculation step, the water depth and smooth length of each particle were guessed. At the same time, the smooth length h and the correction coefficient α i k were re-calculated and updated.
In the same time step, the updated smooth length was then used for the sub-sequent SPH interpolation. The calculation framework is displayed in Algorithm 4. Calculate search mesh range of particle i:xsize/yszie 11. do row ∈ -ysize,ysize 12. irow=jcell+1 13.
Calculate number of search grid: gridn 16.
!Search for Virtual particles in the scope of I particle 18.
if particle_i and particle_j are neighbours then 20.
!Search for Fluid particles in the scope of i particle 24.
if particle_i and particle_j are neighbours then 26.
!Search for Open boundary particles in the scope of i particle 30.
if particle_i and particle_j are neighbours then 32. Write

!$OMP END PARALLEL DO
After each water depth calculation, each particle was judged on whether the error requirements were met, and re-calculation was then completed for the particles that did not meet them. After this iteration cycle, the water depth, speed of the water, and volume were constantly updated. if particle_i is valid then 6.
%Calculate next step's water depth and the smooth length

Velocity Calculations
In order to calculate the acceleration ( a ) caused by internal force, the gradient of velocity and water depth had to be calculated, and the kernel function was adopted to complete this task as shown below: where p i is the depth/velocity of particle i. The calculation conducted for this step is displayed in Algorithm 5. If a variable time-step was implemented, the next step was to calculate the time-step according to Equation (9). The calculation framework of the time step included a loop and no sub-routine. It was found that the speed-up of time step parallel computing was less than 2 to perform serially variable time step calculations. if particle_i is valid then 6.
!First conduct matrix for gradient correction 7.

Calculation of Fluid Particle Acceleration, Riverbed Scouring, Speed, and Displacement
The acceleration ( t ) caused by the internal force was firstly calculated, followed by the riverbed gradient and the total acceleration ( a ). After these calculations, the velocity and the displacement of fluid particles needed to be regularly updated as shown in the calculation framework Algorithm 6.
Algorithm 6: Calculation of acceleration, velocity, and position. The Lagrangian equation of motion for a particle i is d/dt ∂L/(∂v_i )-∂L/(∂x_i )=0, where the Lagrangian functional L is defined in term of kinetic energy K and potential energy π as L = K-π, where π is a function of particles position but not velocity. ! ar(i) is used to calculate depth 9.

15.
Stage 4: Calculate velocity and position of fluid particle i 16. enddo

!$OMP END PARALLEL DO
If the open boundary was adopted, the displacement was updated, and it was checked whether it became a fluid particle or a buffer zone; if the case under investigation had a particle splitting zone, the fluid particles that meet the conditions identified were split [37]. Achieved this landmark, the calculation process was then completely repeated according to Algorithm 1.

Validation 1: 2-D Dry Bed Dam Break with Particle Splitting
In order to test the performance of the new computing SPH-SWE framework according to the CPU-OpenMP parallel computing, the open source case "2-D dry bed dam break with particle splitting", referred to as DBDB case [67,68], was considered. Initial conditions of this open source DBDB case were setup as follows: area of 2.6 m × 1.2 m, initial fluid particle layout of 0.8 m × 1.2 m, and initial water depth equal to 0.15 m, as shown in Figure 6.  Table 3 shows the number of fluid particles, virtual particles, and riverbed particles selected for the 3 cases tested (considering that no inflow and outflow conditions were set, the number of open boundary particles was maintained equal to 0). The particles arrangements in the three cases were calculated using the open source code [67,68] and the CPU-OpenMP parallel code (in terms of 1000 particles per thread and 2000 particles per thread) to ensure that the calculation results were consistent and comparison of each model's performance could be completed.
For t = 0, the instantaneous burst occurs and the fluid flows downstream. Figure 7 shows flow velocities for T = 1.2 s under the three fluid particle arrangements displayed in Table 3. It can be seen from Figure 7 that the more fluid particles there were, the more water flow characteristics and velocity distribution characteristics after dam break were seen. Figure 8 shows the comparison between numerical and experimental results for the three cases tested in Table 3.  Table 3 shows the number of fluid particles, virtual particles, and riverbed particles selected for the 3 cases tested (considering that no inflow and outflow conditions were set, the number of open boundary particles was maintained equal to 0). The particles arrangements in the three cases were calculated using the open source code [67,68] and the CPU-OpenMP parallel code (in terms of 1000 particles per thread and 2000 particles per thread) to ensure that the calculation results were consistent and comparison of each model's performance could be completed.
For t = 0, the instantaneous burst occurs and the fluid flows downstream. Figure 7 shows flow velocities for T = 1.2 s under the three fluid particle arrangements displayed in Table 3. It can be seen from Figure 7 that the more fluid particles there were, the more water flow characteristics and velocity distribution characteristics after dam break were seen. Figure 8 shows the comparison between numerical and experimental results for the three cases tested in Table 3.
It can be concluded that with the increase of fluid particles (Case 3), the error between the numerical and the experimental results was smaller.
Hence, bigger is the number of fluid particles computed, and more valuable are the water depth and velocity calculations for each time step and location across the dam system, therefore providing more support for the formulation of dam break mitigation plans. This also reflects the superiority of SPH-SWE model in dealing with large deformation and free surface problems.    It can be concluded that with the increase of fluid particles (Case 3), the error between the numerical and the experimental results was smaller.
Hence, bigger is the number of fluid particles computed, and more valuable are the water depth and velocity calculations for each time step and location across the dam system, therefore providing more support for the formulation of dam break mitigation plans. This also reflects the superiority of SPH-SWE model in dealing with large deformation and free surface problems.
The time and speedup of total time t(s) (Equation 14) required for calculating water depth, acceleration, speed, and displacement were quantified according to open source code and CPU-OpenMP parallel code (under different thread configurations) as shown in Table 4.
where t(k) represents the running time of open source code and t(c) represents the running time of the parallel code. In Table 4, R(t) represents the particle search time; C(t) represents the time to calculate the water depth; A(t) represents the time to calculate the acceleration, speed, and displacement; T(t) represents the total running time of the case (including t(k) and t(c)); the time unit is always seconds (s). It can be seen from the results that the number of particles computed was higher, and the parallel calculation was larger, based on CPU-OpenMP (Figure 9).  The time and speedup of total time t(s) (Equation (14)) required for calculating water depth, acceleration, speed, and displacement were quantified according to open source code and CPU-OpenMP parallel code (under different thread configurations) as shown in Table 4.
where t(k) represents the running time of open source code and t(c) represents the running time of the parallel code. In Table 4, R(t) represents the particle search time; C(t) represents the time to calculate the water depth; A(t) represents the time to calculate the acceleration, speed, and displacement; T(t) represents the total running time of the case (including t(k) and t(c)); the time unit is always seconds (s). It can be seen from the results that the number of particles computed was higher, and the parallel calculation was larger, based on CPU-OpenMP ( Figure 9).  It can be concluded that with the increase of fluid particles (Case 3), the error between the numerical and the experimental results was smaller.
Hence, bigger is the number of fluid particles computed, and more valuable are the water depth and velocity calculations for each time step and location across the dam system, therefore providing more support for the formulation of dam break mitigation plans. This also reflects the superiority of SPH-SWE model in dealing with large deformation and free surface problems.
The time and speedup of total time t(s) (Equation 14) required for calculating water depth, acceleration, speed, and displacement were quantified according to open source code and CPU-OpenMP parallel code (under different thread configurations) as shown in Table 4.
where t(k) represents the running time of open source code and t(c) represents the running time of the parallel code. In Table 4, R(t) represents the particle search time; C(t) represents the time to calculate the water depth; A(t) represents the time to calculate the acceleration, speed, and displacement; T(t) represents the total running time of the case (including t(k) and t(c)); the time unit is always seconds (s). It can be seen from the results that the number of particles computed was higher, and the parallel calculation was larger, based on CPU-OpenMP (Figure 9).   In parallel computing, S(p) (the speedup ratio) and E(p) (parallel efficiency) were important indexes to evaluate the parallel effect. S(p) was the ratio between the serial time and the multi-core parallel time when threads calculate (p) and solved the iteration at the same time, as follows: where, T s is the time spent by a single processor in the serial mode; T p is the time spent by threads (p) in the parallel mode. E(p) (parallel efficiency) is the ratio of the acceleration ratio to the number of CPU cores used in the calculation (and E(p)≤ 1), indicating the average execution efficiency of each processor. When the acceleration ratio was close to the number of cores, the parallel efficiency was higher, and the utilization rate of each thread was higher, as calculated below.  In order to check the parallel effect of CPU-OpenMP, case 3 was calculated with different threads. The acceleration ratio and parallel efficiency of different threads are shown in Table 5, and the performances of parallel algorithms are displayed in Figure 10. According to Table 5 and Figure 10, as the number of enabled threads increased, the speedup ratio, parallel efficiency and calculation time were affected by the following trends: (1) the calculation time decreases with the increase of threads, but when the number of online processes exceeded 10, the time-consuming reduction speed changed from fast to slow reaching towards a balance;

R(t) C(t) A(t) T(t) t(s)
(2) the acceleration ratio increased all the time, but the improvements varied, and after 10 threads, the increase rate was from fast to slow; (3) the parallel efficiency decreased with the increase of threads, but the decrease rate fluctuated. When calculating the number of 10 threads, the parallel efficiency started to be less than 90%; therefore, case 3 could allocate one thread according to 5000 particles, with the acceleration ratio of 7.47 and the parallel efficiency of 93.35%. According to Table 5 and Figure 10, as the number of enabled threads increased, the speedup ratio, parallel efficiency and calculation time were affected by the following trends: (1) the calculation time decreases with the increase of threads, but when the number of online processes exceeded 10, the time-consuming reduction speed changed from fast to slow reaching towards a balance; (2) the acceleration ratio increased all the time, but the improvements varied, and after 10 threads, the increase rate was from fast to slow; (3) the parallel efficiency decreased with the increase of threads, but the decrease rate fluctuated. When calculating the number of 10 threads, the parallel efficiency started to be less than 90%; therefore, case 3 could allocate one thread according to 5000 particles, with the acceleration ratio of 7.47 and the parallel efficiency of 93.35%.

Validation 2: 2-D Dam Break with A Rectangular Obstacle Located in the Downstream Area
In order to solidify the accuracy and advantages of the new SPH-SWE model proposed calculated by CPU-OpenMP, a second case has been considered for validation. This case involved the dam break flow with a rectangular obstacle located in the downstream area as shown in Figure 11  .

Validation 2: 2-D Dam Break with A Rectangular Obstacle Located in the Downstream Area
In order to solidify the accuracy and advantages of the new SPH-SWE model proposed calculated by CPU-OpenMP, a second case has been considered for validation. This case involved the dam break flow with a rectangular obstacle located in the downstream area as shown in Figure 11  . For this second validation scheme, the fluid particles were arranged according to the particle spacing of 0.01, 0.005, and 0.002 m. Figures 12 and 13 display the results for t = 0.74 s and t = 1.76 s. for the same particle spacing used by Gu et al., [2] (0.01 m) and increasing number of fluid particles involved. For this second validation scheme, the fluid particles were arranged according to the particle spacing of 0.01, 0.005, and 0.002 m. Figures 12 and 13 display the results for t = 0.74 s and t = 1.76 s for the same particle spacing used by Gu et al., [2] (0.01 m) and increasing number of fluid particles involved.
In Table 6, it is possible to check the number of particles involved in each simulation, and it can be noticed that the speed up of total time (t) slightly increased with the rise of particles (using 8 threads in all three processes) ( Figure 14).  Figure 12. Velocity distribution at 0.74 s for Case 4 (a1), 5 (b1) and 6 (c1) displayed in Table 6. Figure 12. Velocity distribution at 0.74 s for Case 4 (a1), 5 (b1) and 6 (c1) displayed in Table 6.  Figure 13. Velocity distribution at 1.76 s for Case 4 (a2), 5 (b2) and 6 (c2) displayed in Table 6.
In Table 6, it is possible to check the number of particles involved in each simulation, and it can be noticed that the speed up of total time (t) slightly increased with the rise of particles (using 8 threads in all three processes) ( Figure 14).   Table 6.  Table 7 unveils values of R(t), which represents the particle search time; C(t), which represents the time to calculate the water depth; and A(t), which represents the time to calculate the acceleration, speed, and displacement. It can be seen from the results (Figure 14) that when the number of particles computed was higher, the parallel calculation was slightly larger, based on CPU-OpenMP.  Table 7 unveils values of R(t), which represents the particle search time; C(t), which represents the time to calculate the water depth; and A(t), which represents the time to calculate the acceleration, speed, and displacement. It can be seen from the results (Figure 14) that when the number of particles computed was higher, the parallel calculation was slightly larger, based on CPU-OpenMP. The simulation results are consistent with the previous case to demonstrate the improvement made by the proposed SPH-SWE calculation framework. When analyzing the same parameters as in the paper by Gu et al., [2], 12,423 fluid particles were involved. Results displayed in Figure 15 confirmed that the agreement between numerical results and experimental datasets improved even when simulating an increase of the number of fluid particles. However, the improvements do not involve the entire domain because in some positions (for example (a) H4 gauge, x = 1.3 m-1.8 m; (b) H2 gauge, x = 3.50 m-4.5 m) there are still minor inaccuracies (caused by the truncation error of the kernel function and the processing of boundary particles in the SPH method), even with the model with 323,145 fluid particles; hence, future work will focus on improving the algorithm to progress the calculation accuracy.  The simulation results are consistent with the previous case to demonstrate the improvement made by the proposed SPH-SWE calculation framework. When analyzing the same parameters as in the paper by Gu et al., [2], 12,423 fluid particles were involved. Results displayed in Figure 15 confirmed that the agreement between numerical results and experimental datasets improved even when simulating an increase of the number of fluid particles. However, the improvements do not involve the entire domain because in some positions (for example (a) H4 gauge, x = 1.3 m -1.8 m; (b) H2 gauge, x = 3.50 m -4.5 m) there are still minor inaccuracies (caused by the truncation error of the kernel function and the processing of boundary particles in the SPH method), even with the model with 323,145 fluid particles; hence, future work will focus on improving the algorithm to progress the calculation accuracy.  Figure 15. Comparison between experimental and numerical results water depth datasets: (a) gauge H4 in [2] and [80]; (b) gauge H2 in [2] and [80].

Conclusions
Vacondio et al. [34][35][36][37] made an open source version called SWE-SPHysics, which has been optimized and adapted based on the hydrodynamics investigated by other researchers during the last decade [38][39][40][41][42][43][44][45][46][47]. However, despite continuous progress, there was still a limitation related to the computational efficiency when the number of particles to simulate is very large, and this aspect still needs to be improved.
To fill this gap, in this study, a new solution to the SPH-SWE model introduced by Vacondio et al. [34][35][36][37] was proposed, and it was validated against two open source case studies of a 2-D dry-bed dam break with particle splitting [67,68] and of a 2-D dam break with a rectangular obstacle downstream [2,83]. To test the computing performance against the first case study, when involving large numbers of particles, three cases, involving different particles numbers, were tested (case 1-4374 particles; case 2-9801 particles; case 3-38,801 particles). Furthermore, this paper adopted the parallel computing method of CPU-OpenMP that is applied to a single machine and a multi-core to calculate the new SPH-SWEs framework.
By applying this CPU-OpenMP method, results have confirmed that the computing speeds of case 1/case 2/case 3 were increased by 4.01 times/6.14 times/7.17 times, respectively, to compute the new solution framework of the SPH-SWE model proposed to the open source case study previously mentioned [67,68]. According to the new solution framework of SPH-SWE model, case 3, characterized by the highest number of particles, was also calculated by using different threads. It was found that the speedup ratio can reach 7.47 when the parallel efficiency was more than 90%, which fully proves the good calculation performance of the CPU-OpenMP parallel new SPH-SWE model. Additionally, in the new solution framework of the SPH-SWE model proposed, particle search was used as a separate module for parallel computing, which greatly improved the computing efficiency and could replace the meshless SPH-SWE model in the open source code [67,68].
Therefore, using CPU-OpenMP parallel computing demonstrated that the SPH-SWE model new framework can accurately and timely simulate the flood evolution after a dam break.
In future works, the SPH-SWE model can be put into existing clusters to achieve more threads and further improve the calculation efficiency. Furthermore, this would enable the possibility of introducing more effective new algorithms into the SPH-SWE model (i.e., debris flow or water pollution modules) in order to expand its application. Continuous development of technology aids the improvement of new tools to design and inspect more accurate solutions, and this is an area in continuous development that needs to be addressed to support local and national authorities in making decisions to mitigate drastic effects generated by dam break.