Peer-Review Record

Comparison of Shallow Water Solvers: Applications for Dam-Break and Tsunami Cases with Reordering Strategy for Efficient Vectorization on Modern Hardware

Water 2019, 11(4), 639; https://doi.org/10.3390/w11040639
by Bobby Minola Ginting * and Ralf-Peter Mundani
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 13 February 2019 / Revised: 20 March 2019 / Accepted: 21 March 2019 / Published: 27 March 2019

Round 1

Reviewer 1 Report

The authors present a strategy to carry out parallel shallow flow simulations. Specifically, they present a vectorization approach and compare its performance for three numerical solvers used to approximate the interfacial flux. The centered scheme is found to be the most efficient one. This is expected, as computing the eigenstructure of the problem is computationally expensive and requires branching.
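
To illustrate the branching point with a minimal C sketch (not taken from the paper's code): an upwind-type flux must select among wave-speed cases, while a centered (Rusanov-type) flux is a single branch-free expression that a compiler can vectorize directly.

    /* Upwind-type flux (HLL form): the wave-speed selection
       introduces data-dependent branches that hinder SIMD. */
    double hll_flux(double ql, double qr, double fl, double fr,
                    double sl, double sr)
    {
        if (sl >= 0.0) return fl;   /* all waves move right */
        if (sr <= 0.0) return fr;   /* all waves move left */
        return (sr * fl - sl * fr + sl * sr * (qr - ql)) / (sr - sl);
    }

    /* Centered-type flux (Rusanov form): one arithmetic expression
       with no branches, so a loop over interfaces vectorizes readily. */
    double centered_flux(double ql, double qr, double fl, double fr,
                         double a /* local wave-speed bound */)
    {
        return 0.5 * (fl + fr) - 0.5 * a * (qr - ql);
    }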

The results are presented in an organized manner. The language is clear and easy to follow. Arguments are supported by the presented data.

High-performance parallel computing is an important topic of numerical modeling in water resources. Hence, the topic is of interest to the readership of the journal.

My only comment would be to improve the readability of Figures 6, 9, and 15. The HLLC line is not very visible. It may help to include a marker (perhaps an 'X', as in some other figures) to make the HLLC solution more visible.

Consequently, I recommend accepting the article for publication.

Author Response

Please see the attached file. Many thanks. 

Author Response File: Author Response.pdf

Reviewer 2 Report

This article presents a model that solves the 2D shallow water equations. Three different solvers (Roe, HLLC, and CU) are implemented and compared in terms of accuracy and efficiency. To increase efficiency, several strategies are presented that exploit the vectorization capabilities of modern processors, with which the authors obtain remarkable speed-ups.

The paper is well written, and the solutions presented are interesting and well explained, with detailed information about their implementation. This work could be very useful for people working with SWE models who are interested in increasing the efficiency of their codes, and more concretely for those aiming to use vectorization to improve them.

In my opinion, the following suggestions should be considered:

In line 145 there is a small typo: "Gudonov-type" should be changed to "Godunov-type".

In Section 2, lines 146-147, it is stated that the NUFSAW2D code supports both structured and unstructured meshes. However, only structured meshes are used in this study. Does the vectorization also work for unstructured meshes? If so, how does it perform?
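
For context, the usual obstacle on unstructured meshes is indirect addressing: an edge-based loop reads and writes through index arrays, forcing gather/scatter accesses instead of the unit-stride loads available on a structured grid. A purely illustrative C sketch follows; the arrays left[], right[], q[], and residual[] are hypothetical and not taken from NUFSAW2D.

    /* Hypothetical edge loop on an unstructured mesh. The index
       arrays left[] and right[] force gather loads of q[], and the
       scatter updates of residual[] can conflict within one vector,
       so compilers vectorize this far less readily than a
       unit-stride loop on a structured grid. */
    void accumulate_fluxes(int n_edges, const int *left, const int *right,
                           const double *q, double *residual)
    {
        for (int e = 0; e < n_edges; e++) {
            double dq = q[right[e]] - q[left[e]];   /* gather */
            residual[left[e]]  += dq;               /* scatter */
            residual[right[e]] -= dq;               /* scatter */
        }
    }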

In Section 3.3, Algorithm 6 is claimed to provide up to a 48% improvement for the vectorized code. However, the reader may be curious about the penalty this new layout may impose on non-vectorized hardware. I am also concerned about whether this layout could perform worse in cases where a high percentage of the cells remain dry most of the time.
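
To make the concern concrete, a common pattern of this kind, sketched here in C purely as an assumption about how such a reordering can look and not as the authors' Algorithm 6, compacts the indices of wet cells into a contiguous list so the compute loop is branch-free; rebuilding that list is exactly the overhead that could dominate when most cells stay dry.

    /* Hypothetical wet-cell compaction: gather the indices of wet
       cells (h > h_dry) into wet_ids[]. The compaction itself is
       branchy and not vectorizable, which is the suspected penalty
       on non-vectorized hardware. */
    int compact_wet_cells(int n_cells, const double *h, double h_dry,
                          int *wet_ids)
    {
        int n_wet = 0;
        for (int i = 0; i < n_cells; i++)
            if (h[i] > h_dry)
                wet_ids[n_wet++] = i;
        return n_wet;
    }

    /* The update loop then runs over wet cells only, with no wet/dry
       branch inside, so the compiler can emit SIMD code. */
    void update_wet_cells(int n_wet, const int *wet_ids, double dt,
                          const double *flux_div, double *h)
    {
        #pragma omp simd
        for (int k = 0; k < n_wet; k++) {
            int i = wet_ids[k];
            h[i] -= dt * flux_div[i];
        }
    }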

Figure 9: the lines corresponding to the numerical results are hidden by the observation points. The observation points should be placed behind the numerical lines, as in the other plots.

Section 4.5: it is not clear how many processors the AVX and AVX2 machines have; the number of processors in each machine should be specified. The Intel Xeon E5-2690 has 8 physical cores (16 logical cores), and the Intel Xeon E5-2697 v3 has 14 physical cores (28 logical cores). Some readers may think that the results were achieved with just one processor.

Section 4.5: it should be clarified that the Xeon Phi machine has a different hardware architecture. It supports the x86 instruction set, but it is a many-core processor that bundles 64 low-power cores, each with a rather weak floating-point unit, so it relies on its AVX-512 units to achieve the advertised computing power. This explains both the impressive speed-ups obtained with vectorization on this machine and its lower performance without vectorization compared with the AVX2 machine; it also makes the paper especially interesting for people working with this kind of hardware. Without this clarification, some readers may think it is a regular AVX-512 processor such as an Intel Xeon with the Skylake microarchitecture.

The Intel Fortran compiler is known for its excellent vectorization support. I am curious whether the authors performed any tests with an open-source compiler such as gfortran, which also supports OpenMP 4.0, and how it performs compared with the Intel compiler. Readers who work with other compilers may wonder whether the solutions presented are also applicable to their workflows.
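
For reference, the portable route in both compiler families is the OpenMP 4.0 simd construct; a minimal C sketch follows (the paper's code is Fortran, where the equivalent directive is !$omp simd), with illustrative compile lines.

    /* Minimal OpenMP 4.0 SIMD loop. Accepted by the Intel compilers
       and by GCC from version 4.9 onward, e.g.:
         icc -qopenmp -O3 -xHost        kernel.c
         gcc -fopenmp -O3 -march=native kernel.c */
    void axpy(int n, double a, const double *restrict x,
              double *restrict y)
    {
        #pragma omp simd
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }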

Line 484: it is stated that 56 cores of the AVX-512 machine were used, but in the rest of the paper it is described as a 64-core machine. It is unclear whether 64 or only 56 threads were used to perform the tests.

In Section 4.5.1, it is stated that the performance measurements use the average values from the four test cases. However, it is unclear whether there were significant performance differences between those cases. Are some cases more "vectorization-friendly" than others?

Author Response

Please see the attached file. Many thanks. 

Author Response File: Author Response.pdf
