Article

Parallelization of a 3-Dimensional Hydrodynamics Model Using a Hybrid Method with MPI and OpenMP

1 Water Quality Assessment Research Division, Water Environment Research Department, National Institute of Environmental Research, Incheon 22212, Korea
2 Research and Development Institute, GeoSystem Research Corporation, Gunpo 15807, Korea
3 Water Environment Research Department, National Institute of Environmental Research, Incheon 22689, Korea
* Author to whom correspondence should be addressed.
Processes 2021, 9(9), 1548; https://doi.org/10.3390/pr9091548
Submission received: 11 May 2021 / Revised: 19 August 2021 / Accepted: 24 August 2021 / Published: 30 August 2021

Abstract

Process-based numerical models developed to perform hydraulic/hydrologic/water quality analysis of watersheds and rivers have become highly sophisticated, with a corresponding increase in their computation time. However, for incidents such as water pollution, rapid analysis and decision-making are critical. This paper proposes an optimized parallelization scheme to reduce the computation time of the Environmental Fluid Dynamics Code-National Institute of Environmental Research (EFDC-NIER) model, which has been continuously developed for water pollution or algal bloom prediction in rivers. An existing source code and a parallel computational code with open multi-processing (OpenMP) and a message passing interface (MPI) were optimized, and their computation times compared. Subsequently, the simulation results for the existing EFDC model and the model with the parallel computation code were compared. Furthermore, the optimal parallel combination for hybrid parallel computation was evaluated by comparing the simulation time based on the number of cores and threads. When code parallelization was applied, the performance improved by a factor of approximately five compared to the existing source code. Thus, if the parallel computational source code applied in this study is used, urgent decision-making will be easier for events such as water pollution incidents.

1. Introduction

In areas with monsoon climates, including South Korea, changes in hydraulic characteristics are significant because weather conditions such as precipitation and air temperature vary throughout the year [1,2]. Flows are strong during the flood season but weak during the dry season. During the summer, when temperatures are high, stratification develops and sometimes causes various environmental problems [3]. In particular, the multi-functional weirs built as part of the Four Major River Restoration Project increase the residence time and water depth, which intensifies stratification.
Changes in hydraulic characteristics also affect water quality [4]. Therefore, the water column needs to be resolved into vertical layers to precisely simulate hydraulics and water quality under these conditions. Furthermore, cyanobacteria, which proliferate in large quantities during the summer, often show different distribution characteristics across the lateral direction of a river, and water quality deteriorates downstream of basic environmental facilities and confluences with major pollutant sources. Vertically integrated 2D or x-z 2D models are limited in their ability to reproduce these 3D variation characteristics. Therefore, a 3D model with an appropriate vertical/horizontal resolution is required, but the computational requirements can increase significantly if a model is constructed using 3D high-resolution grids [5]. In particular, the computational requirements increase sharply if a long-term simulation is performed for an entire river, such as the Nakdong River. To overcome this problem, it is necessary to reduce the calculation time by adjusting the number of calculations or the calculation intervals using techniques such as parallel calculation, independent computation of the hydraulic and water quality models, or application of an implicit scheme.
In particular, research on parallelization is very important in the field of modeling. With advancements in computer performance, it has become possible to simulate various phenomena on a large scale and over extended periods. However, the number of input data items required by the models has increased, as has the number of parameters required for simulation. Most models currently in use are executed serially, so simulations take a long time to complete [6]; as the models evolve, this growing computational load conversely slows down the analysis. To overcome this issue, the models must be parallelized. Large-scale hydrological models simulate water resources, changes in water quality, and the water cycle at global scales, and therefore require large amounts of data such as weather, climate, runoff, and topography. For this reason, previous studies have tried to reduce the simulation time by applying parallelization techniques to large-scale basin models. Neal et al. [7] applied three parallelization techniques based on OpenMP, message passing, and specialized accelerator cards to improve the simulation speed of a 2D flood inundation model that requires various input data. Rouholahnejad et al. [8] significantly reduced the time to calibrate parameters by parallelizing the Soil and Water Assessment Tool (SWAT) model. Liu et al. [9] improved the simulation speed by parallelizing a grid-based distributed hydrological model with OpenMP. Avesani et al. [6] developed HYPERstreamHS, which parallelizes a large-scale river basin model to efficiently consider various hydraulic structures. In particular, the parallel performance was better in large-scale models than in small-scale models [9,10].
Parallel calculation is an efficient technique that uses multiple computing resources simultaneously through job allocation and data distribution, whereby processes with separate local memories send and receive data for sharing. A problem should be divided into separate task fragments that can be solved simultaneously, so that it is solved in less time with multiple calculation resources than with a single one. For example, Barney [11] proposed a conceptual scheme for parallel computing in which a problem is subdivided and processed by multiple processes. Parallel computing methods include the message passing interface (MPI) based on a distributed memory system [12], open multi-processing (OpenMP) based on shared memory [13], and methods based on manycore processors such as Intel Xeon processors or graphics processing units (GPUs). Each parallel computing method has advantages and disadvantages. OpenMP is simple to implement, but its scalability is limited by the size of the shared memory system. In contrast, MPI is scalable for high-dimensional problems, but its computing efficiency tends to decrease as the number of CPU cores increases because of the growing communication between the CPUs; a high-performance coding technique is required to handle this effectively. In the parallelization of manycore processors such as GPUs, the device dependency is high, and the performance improves only for certain types of calculations. In recent years, studies have been actively conducted on hybrid techniques that combine the OpenMP and MPI methods [14,15,16,17,18,19,20]. Recently, a hybrid technique combining OpenMP and MPI was applied to an urban flood model, and its parallelization performance was confirmed to be better than that of OpenMP or MPI alone [21].
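As a minimal illustration of the hybrid pattern discussed above (this sketch is not taken from EFDC-NIER; the program and variable names are ours), each MPI process reduces its own block of data with OpenMP threads, and the per-process results are then combined across processes with MPI:

PROGRAM HYBRID_SUM
  ! Minimal hybrid MPI + OpenMP sketch (illustration only, not EFDC-NIER code):
  ! each MPI process sums its own block of an array with OpenMP threads, and the
  ! per-process partial sums are combined across processes with MPI_ALLREDUCE.
  USE MPI
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 1000000
  INTEGER :: IERR, MYRANK, NPROC, PROVIDED, I
  REAL(8) :: LOCALSUM, GLOBALSUM
  REAL(8), ALLOCATABLE :: A(:)

  ! Request funneled thread support because OpenMP threads run inside each process
  CALL MPI_INIT_THREAD(MPI_THREAD_FUNNELED, PROVIDED, IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYRANK, IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)

  ALLOCATE(A(N))
  A = 1.0D0                              ! each rank holds its own block of data

  ! Shared-memory parallelism within a node: OpenMP reduction over the local block
  LOCALSUM = 0.0D0
!$OMP PARALLEL DO REDUCTION(+:LOCALSUM)
  DO I = 1, N
    LOCALSUM = LOCALSUM + A(I)
  ENDDO
!$OMP END PARALLEL DO

  ! Distributed-memory parallelism between nodes: MPI reduction over all processes
  CALL MPI_ALLREDUCE(LOCALSUM, GLOBALSUM, 1, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, IERR)

  IF (MYRANK.EQ.0) WRITE(*,*) 'Global sum =', GLOBALSUM
  CALL MPI_FINALIZE(IERR)
END PROGRAM HYBRID_SUM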
Among the various process-based numerical models, the Environmental Fluid Dynamics Code (EFDC) was developed in FORTRAN by the Virginia Institute of Marine Science (VIMS) in the United States, and the Environmental Protection Agency (EPA) released a generalized vertical coordinate (GVC) version [22,23,24,25]. The numerical scheme employed in EFDC to solve the equations of motion uses second-order accurate spatial finite differences on a staggered (C) grid, and the internal momentum equation solution, at the same time step as the external, is implicit with respect to vertical diffusion [25]. See [25] for further details of the model formulation.
Dynamic Solutions International (DSI) released the EFDC_DS (20100328 version), and has been continually updating the source code since then. The EFDC model is used worldwide, which demonstrates its performance and applicability.
Based on the EFDC_DS (20100328 version), the National Institute of Environmental Research (NIER) in South Korea has added more features, such as the operating function of hydraulic structures in the major rivers of South Korea; a simulation function of multi-algae species; a vertical migration mechanism for cyanobacteria; akinete generation and germination mechanisms; and a mechanism for the effect of salt and toxicity on freshwater and sea algae, wind stress, and benthic flux of inorganic nutrients according to changes in the oxidation/reduction conditions [26].
Ahn et al. [26,27] established a method for the short-term prediction of algae by using an improved source code with an operating function for hydraulic river structures and mechanisms for the vertical migration of cyanobacteria and for akinete creation and germination. They also proposed an optimal method for predicting algae by applying hyperspectral remote-sensing data to the EFDC-NIER model. The model, which the NIER has named EFDC-NIER, has improved functions suited to the major rivers of South Korea, and it is used to support policies for algae and water quality control in major rivers and lakes in South Korea. Because the number of calculations required for analysis has increased in various fields such as hydraulics, hydrology, water quality, and aquatic ecosystems, a parallel computational code needs to be applied so that faster decision-making can support policy applications.
In this study, we applied a hybrid parallel computational code constructed using the OpenMP and MPI methods to the EFDC-NIER model, and compared the calculation times required for the existing version and the parallel computational code version. We also compared the simulation results for the existing EFDC model and the parallel computational code model to check if they corresponded. Furthermore, the optimal parallel combination was determined through a comparative evaluation of the simulation time based on the number of cores and threads.

2. Materials and Methods

2.1. Research Trend in Parallel Calculation of EFDC Model

The EFDC is a general-purpose modeling package used for the simulation of 3D flow, material transport, and biochemical processes in systems such as rivers, lakes, estuaries, reservoirs, wetlands, and coasts. The EFDC model is open-source software developed by John Hamrick at the Virginia Institute of Marine Science (VIMS, Gloucester Point, VA, USA) [28]. The EFDC is one of the models recommended by the US EPA for the management of the total maximum daily load (TMDL), and the US EPA continues to support its development. The EFDC has been extensively tested and used by many researchers in several modeling studies.
DSI has developed a version of the code that streamlines the modeling process and connects it to the pre- and post-processing tools of DSI [29]. DSI has developed and commercialized the EFDC_DSI OpenMP code by applying the OpenMP library for parallel computing. Figure 1 shows the comparison between the simulation times when the EFDC_DSI OpenMP is applied [30]. The model using an octa-core processor reduced the execution time of all subroutines by approximately 75% compared to the model that used a single-core processor (the dotted line on the left). However, although the calculation time of the subroutine that simulated the water quality response mechanism decreased slightly when a dual-core processor was used, the calculation time was similar to that when using a single-core processor. As the number of processors increased (the dotted line on the right), the calculation time did not significantly decrease.
IBM used the MPI library to parallelize the GVC version of the EFDC model released by the EPA [31] and published it on GitHub [32]. For continuous management during the parallelization work, the setup process for parallel execution was automated (1) to limit the changes in a large number of source files and avoid computational errors, (2) to ensure that the results of serial and parallel calculations match, and (3) to ensure that an originally configured serial model runs properly in the parallel code. Figure 2 shows the parallel efficiency as a function of the number of processors using IBM’s parallelized code. When six processors were used, the parallel efficiency was 50%, and when 25 processors were used, an efficiency of 40% was achieved. However, the rate of increase in efficiency fell when more than six processors were used. The parallel efficiency for the water quality response calculation is unknown because only the hydraulic module of the Galway Bay tidal current model was parallelized.

2.2. Development of Parallel Computational Code

As described above, DSI performed parallelization using the OpenMP method, and IBM performed parallelization using the MPI method, but only for the subroutines related to the hydraulic calculations. Therefore, in this study, we aimed to apply parallel code to the entire EFDC-NIER model based on a hybrid method that applies both OpenMP and MPI. Because both MPI and OpenMP have advantages and disadvantages, the hybrid method can utilize the advantages of both while minimizing their drawbacks. In hybrid parallel programming, intensive calculations are performed within a single node using the OpenMP method, while large numbers of calculations are distributed by communicating between different nodes over the network based on the MPI method. In 2019, a research team at the University of Bialystok in Poland studied hybrid parallelization of the K-means algorithm: statistics such as vector sums and counts were calculated with OpenMP at each MPI node and gathered at the master node, where the remaining statistical calculations were completed before being delivered back to each processor for the next iteration. Four algorithms were parallelized in this way, and the computational experiments showed that all of them were superior to the conventional Lloyd algorithm in terms of computing time [33].
MPI is a library specification containing standardized subroutines and functions for handling communication [34], whereas OpenMP is a shared memory model implemented as an add-on to the compiler. One advantage of OpenMP is that there is no communication between nodes, but its drawback is that, because a single node cannot be extended indefinitely, the calculation time desired by the user cannot be achieved for large calculations. In contrast, with MPI, calculation nodes can be added to attain the calculation speed desired by the user.
Parallel code development was carried out in two stages: sequential code optimization and parallel code creation. Sequential code optimization refers to the process of maximizing the performance of the sequential code before parallelizing it (Table A1). The optimization was performed by finding unnecessary statements through precise measurement of the computation time of each computational statement in the subroutines identified earlier by hotspot analysis. A DEBUG variable was added to the CALUVW subroutine so that the log file (CFL.OUT) is composed only when debugging output is required, and the variable I/O in EEXPOUT was reduced by collapsing the three-level (NSP, K, L) loop used to write the results (Table A1).
Parallel code was developed using a hybrid parallel programming method that includes both OpenMP and MPI. It was created by adding the statements used by both parallel libraries, and parallelization was performed for approximately 40 source files. For the parallel code development, we chose the hybrid parallel programming approach and devised a technique for dividing the computational area according to the processors used. The MPI parallel code was composed by partitioning the LA variable (the number of calculation grids), which corresponds to the 2D index in a Do loop, and the OpenMP parallel code was composed using a thread fork-join approach over the partitioned indices. To aid in understanding how the hybrid parallel code is created, Table A2 shows an example. First, the statements repeated from 2 to LA in the Do loop of the sequential code were changed to repeat from LMPI2 to LMPILA, partitioned according to the MPI rank, in the parallel code. Each Do loop was then parallelized using the OpenMP directive !$OMP PARALLEL DO. The RS8Q variable is configured to calculate the sum over all indices by simultaneously using OpenMP’s REDUCTION clause and MPI’s ALLREDUCE.
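The derivation of the per-rank loop bounds LMPI2 and LMPILA is not shown here; a plausible block decomposition of the index range 2..LA is sketched below (the routine name and splitting logic are assumptions for illustration only):

SUBROUTINE DECOMPOSE_LA(LA, MYRANK, NPROC, LMPI2, LMPILA)
  ! Hypothetical sketch of a block decomposition of the active cell index
  ! range 2..LA over NPROC MPI ranks, producing per-rank bounds of the kind
  ! used in Table A2 (the actual EFDC-NIER routine is not shown here).
  IMPLICIT NONE
  INTEGER, INTENT(IN)  :: LA, MYRANK, NPROC
  INTEGER, INTENT(OUT) :: LMPI2, LMPILA
  INTEGER :: NCELL, CHUNK, NREM

  NCELL = LA - 1                        ! active cells are indexed 2..LA
  CHUNK = NCELL / NPROC                 ! base block size per rank
  NREM  = MOD(NCELL, NPROC)             ! leftover cells go to the lowest ranks

  LMPI2  = 2 + MYRANK*CHUNK + MIN(MYRANK, NREM)
  LMPILA = LMPI2 + CHUNK - 1
  IF (MYRANK .LT. NREM) LMPILA = LMPILA + 1
END SUBROUTINE DECOMPOSE_LA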
We developed MPI functions for area partitioning and communication according to the number of MPI nodes, and Table 1 presents the specific role of each function. Among them, the BROADCAST_BOUNDARY function was developed to communicate updated boundary values between adjacent MPI nodes. The COLLECT_IN_ZERO function transmits variables from all nodes to the master node so that the results can be written and the consistency analyzed at the master node. For the data type used in MPI communication, a user-defined data type (MPI_TYPE_VECTOR) was used to reduce the frequency and volume of communication. For the topology, a single topology without partitioning was used. The MPI-only functions that handle the communication between nodes were inserted so that boundary values are sent and received a minimal number of times; otherwise, a communication load would occur.
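The BROADCAST_BOUNDARY routine itself is not listed here; a simplified halo-exchange sketch of the kind described (a strided MPI_TYPE_VECTOR datatype and one exchange with each neighbouring rank) could look like the following, where the array layout, argument list, and neighbour logic are assumptions:

SUBROUTINE EXCHANGE_BOUNDARY(F, LA, KC, LMPI2, LMPILA, MYRANK, NPROC)
  ! Hypothetical halo exchange for a 2-D field F(1:LA,1:KC): each rank sends its
  ! last owned cell column to the next rank and receives that rank's first cell
  ! column as a ghost layer. A strided MPI_TYPE_VECTOR lets the KC vertical
  ! layers of one cell travel in a single message.
  USE MPI
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: LA, KC, LMPI2, LMPILA, MYRANK, NPROC
  REAL(8), INTENT(INOUT) :: F(LA, KC)
  INTEGER :: COLTYPE, LEFT, RIGHT, LGL, LGR, IERR, STAT(MPI_STATUS_SIZE)

  ! One "column" = the KC layer values of a single cell, strided by LA in memory
  CALL MPI_TYPE_VECTOR(KC, 1, LA, MPI_DOUBLE_PRECISION, COLTYPE, IERR)
  CALL MPI_TYPE_COMMIT(COLTYPE, IERR)

  LEFT  = MYRANK - 1
  RIGHT = MYRANK + 1
  IF (LEFT  .LT. 0)     LEFT  = MPI_PROC_NULL
  IF (RIGHT .GE. NPROC) RIGHT = MPI_PROC_NULL
  LGL = MAX(LMPI2 - 1, 1)     ! ghost index on the left (unused on rank 0)
  LGR = MIN(LMPILA + 1, LA)   ! ghost index on the right (unused on the last rank)

  ! Send the last owned cell to the right neighbour, receive its first cell
  CALL MPI_SENDRECV(F(LMPILA,1), 1, COLTYPE, RIGHT, 0, &
                    F(LGL,1),    1, COLTYPE, LEFT,  0, MPI_COMM_WORLD, STAT, IERR)
  ! Send the first owned cell to the left neighbour, receive its last cell
  CALL MPI_SENDRECV(F(LMPI2,1),  1, COLTYPE, LEFT,  0, &
                    F(LGR,1),    1, COLTYPE, RIGHT, 0, MPI_COMM_WORLD, STAT, IERR)

  CALL MPI_TYPE_FREE(COLTYPE, IERR)
END SUBROUTINE EXCHANGE_BOUNDARY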

2.3. Parallel Calculation Test Model Sets

To evaluate the parallelization performance and consistency of the EFDC-NIER according to the number of grids, we used the model sets presented in Table 2 and Table 3. Measuring the consistency of the parallel code involves evaluating whether the results for the serial and parallel calculations match.

3. Results and Discussion

3.1. Source Code Analysis for the EFDC-NIER Model

In models such as EFDC-NIER, which rely on data-intensive calculations, it is important to identify the time-consuming calculation codes in the program before creating the parallelization code. Such a time-consuming function or section is called a hotspot, and it is crucial to clearly identify hotspots to perform optimization and parallelization for overall performance improvement.
In this study, we identified 20 subroutines and calculation statements as hotspots among the approximately 250 subroutines executed in the source code of EFDC-NIER. Table 4 presents the names of the subroutines identified as hotspots, their execution times, and their proportions of the total execution time. The HDMT2T subroutine is the main execution component of the numerical operations, and the proportions of the 20 hotspots are calculated by comparing their execution times to the execution time of HDMT2T. Twelve subroutines accounted for more than 1% of the total execution time, led by water quality component simulation (WQ3D) > flow rate/direction component simulation (CALUVW) > concentration (water temperature, sediment) simulation (CALCONC) > turbulence intensity simulation (CALQQ2T) > explicit momentum equation calculation (CALEXP2T). The proportions of these subroutines were high because they simulate 3D variables.
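The instrumentation itself is not shown here; Table 1 lists MPI_TIC/MPI_TOC wrappers around the MPI wall clock, so a minimal timing sketch of that kind might look as follows (the module and accumulator names are ours):

MODULE HOTSPOT_TIMER
  ! Minimal wall-clock instrumentation sketch of the MPI_TIC/MPI_TOC kind listed
  ! in Table 1 (the actual EFDC-NIER routines are not shown in this paper).
  USE MPI
  IMPLICIT NONE
  REAL(8), SAVE :: T_START = 0.0D0
CONTAINS
  SUBROUTINE MPI_TIC()
    T_START = MPI_WTIME()                          ! wall-clock time at entry
  END SUBROUTINE MPI_TIC

  SUBROUTINE MPI_TOC(T_TOTAL)
    REAL(8), INTENT(INOUT) :: T_TOTAL
    T_TOTAL = T_TOTAL + (MPI_WTIME() - T_START)    ! accumulate elapsed seconds
  END SUBROUTINE MPI_TOC
END MODULE HOTSPOT_TIMER

! Usage around a candidate hotspot, e.g. the water quality kinetics call:
!   CALL MPI_TIC()
!   CALL WQ3D
!   CALL MPI_TOC(T_WQ3D)
! The ratio T_WQ3D/T_HDMT2T gives the percentage reported in Table 4.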

3.2. Parallel Performance Evaluation

For the parallel performance evaluation, we compared the execution times of the model for the cases presented in Table 2 and examined the execution times and performance improvement rates of the major subroutines for each parallel combination.
When the simulation times for each case were compared, the source code optimization and parallel code application resulted in a performance improvement by a factor of approximately five compared to Case 1 (Table 5). When the optimization was performed for EEXPOUT (shown in Table A1), the time decreased by approximately 1.5 h, indicating that a significant amount of time was spent in the writing method. A comparison of OpenMP and MPI showed that the MPI method required 0.05 h less in the simulation time. However, in the case of personal computers, it would be sufficient to use OpenMP because the core resource is limited.
For the combination of OpenMP and MPI, we compared the simulation times by increasing the number of OpenMP threads from one to five while decreasing the number of MPI nodes from five to one (Table 2). When four OpenMP threads and two MPI nodes were applied, the fastest simulation was executed in 0.65 h. It is difficult to state that one particular combination of OpenMP threads and MPI nodes is always best; an appropriate combination should be chosen according to the specifications of the user’s computer.
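As a usage note, such a hybrid run is typically launched by setting the standard OpenMP environment variable OMP_NUM_THREADS to the desired thread count (here 4) and starting the model with the MPI launcher, e.g., mpiexec -n 2 followed by the executable name; the executable name depends on the user’s build and is not specified here.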
We conducted this study using a Windows PC, which offers versatility. OpenMP can use only the cores corresponding to one computer, whereas MPI can utilize all cores of multiple computers. Therefore, the performance improvement will increase further if the software is applied to a Linux-based supercomputer. Appendix B describes the evaluation results for the parallel computing performance in a Linux-based cluster, and Appendix C describes the evaluation results for the parallel computing performance of the OpenMP and MPI methods.

3.3. Consistency Evaluation

In the process of parallelizing the sequential code, two categories of factors can violate the consistency of the parallel code. The first is an insufficient understanding of the code: a complete understanding of the code is required before developing the parallel version. If parallel programming directives are added without sufficient understanding of the algorithm, variables that need to be communicated may not be communicated, resulting in improper synchronization. Additionally, subroutines that are calculated only at certain intervals may not be recognized, resulting in the omission of their parallelization. These cases mean that the parallel code has not been written properly, and a sufficient understanding of the code should be gained before proceeding.
The second case arises because the order of calculations changes when the sequential code is parallelized, so rounding errors may occur owing to the limitations of floating-point arithmetic. Rounding errors occurred in the CONGRAD subroutine during the development of the parallel code for EFDC-NIER. To resolve this problem, we changed the types of the variables used in the calculations (RPCG, PAPCG, RPCGN, ALPHA, and BETA) from the 4-byte REAL type to the 8-byte REAL type. Consistency of the parallel code means that, for the simulation domains and options available to the user, the parallel execution produces the same simulation results as the sequential model; the errors of floating-point arithmetic were taken into account when judging whether the results were the same. The consistency was evaluated by comparing the sum of the absolute values of all matrix values of the variables simulated up to a certain prediction period, based on the variables written to the results file. As shown in Table 6 and Table 7, the consistency evaluation confirms that the sequential execution of the parallel code, the OpenMP execution, and the MPI execution produce exactly matching results.
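A minimal sketch of the consistency metric described above is given below; the function name and interface are assumptions, but the idea (summing the absolute values of a simulated field in 8-byte reals so that serial, OpenMP, and MPI runs can be compared digit for digit) follows the text:

FUNCTION ABS_SUM(F, LA, KC) RESULT(S)
  ! Hypothetical sketch of the consistency metric: the sum of the absolute
  ! values of a simulated field, accumulated in 8-byte reals so that serial,
  ! OpenMP, and MPI results can be compared exactly (cf. Tables 6 and 7).
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: LA, KC
  REAL(8), INTENT(IN) :: F(LA, KC)
  REAL(8) :: S
  INTEGER :: L, K

  S = 0.0D0                       ! 8-byte accumulator avoids the 4-byte
  DO K = 1, KC                    ! rounding differences observed in CONGRAD
    DO L = 2, LA
      S = S + ABS(F(L, K))
    ENDDO
  ENDDO
END FUNCTION ABS_SUM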

4. Conclusions

In this study, we optimized the source code of the EFDC-NIER model, which has been enhanced by the National Institute of Environmental Research (NIER) since 2010, and applied a parallel computational code. We then examined the consistency of the results between the existing EFDC model and the parallel computational code and compared their calculation times. Furthermore, we determined an optimal parallel combination through a comparative evaluation of the simulation times for different numbers of cores and threads. The major findings of this study are as follows.
(1) The source code optimization and parallel code application resulted in a performance improvement by a factor of approximately five compared to the existing source code (Case 1). In the case of the existing EFDC, a large amount of time was consumed by the subroutine that wrote the results, and when this was improved, the calculation took approximately half as much time. As shown in Appendix B, the parallel calculation performance of the OpenMP and MPI methods applied in this study was similar to that of the version developed and released by DSI. Therefore, the parallel calculation of the EFDC-NIER is better than or on par with that of the EFDC+ MPI, especially considering that its improvement values include the simulation results of the water quality factors.
(2) For a Windows PC, there is little difference in the calculation time reduction between the OpenMP and MPI methods because the core and thread resources are limited. However, as shown in Appendix C, when a Linux server is used, the MPI method scales closer to the ideal than the OpenMP method. For the hybrid method that uses both OpenMP and MPI, the optimal computing combination should be chosen according to the performance and computing resources of the computer on which the simulation will be performed.
(3) In South Korea, algae prediction information for the water supply source sections of large rivers is sent to water quality managers at eight-day intervals, and when predicting algae eight days into the future, all prediction work must be completed within 8 h. The reduction of the fastest simulation’s calculation time to 0.65 h is therefore a very important factor; if the optimization and the parallel computational source code applied in this study are used, quick calculation will be facilitated when urgent decision-making is required for an event such as a water pollution incident.

Author Contributions

Conceptualization, J.M.A.; methodology, J.M.A., H.K. and J.K.; software, J.M.A.; validation, J.M.A. and J.K.; formal analysis, J.M.A. and J.K.; investigation, J.M.A., H.K., J.G.C. and T.K.; resources, T.K. and Y.-s.K.; data curation, J.M.A., Y.-s.K. and J.K.; writing—original draft preparation, J.M.A.; writing—review and editing, J.K.; visualization, H.K. and J.G.C.; supervision, J.K.; project administration, J.M.A. and J.K.; funding acquisition, T.K. and Y.-s.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a grant from the National Institute of Environmental Research (NIER) (grant number NIER-2020-01-01-012), which is funded by the Ministry of Environment (MOE) of the Republic of Korea.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

This study was supported by a grant (NIER-2020-01-01-012) from the National Institute of Environmental Research (NIER), which is funded by the Ministry of Environment (MOE) of the Republic of Korea.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Examples of source code optimization.

1. CALUVW (improved):
IF (ISCFL.GE.1.AND.DEBUG) THEN
  IF (MYRANK.EQ.0) THEN
    OPEN (1, FILE = 'CFL.OUT', STATUS = 'UNKNOWN', POSITION = 'APPEND')
  ENDIF
  IF (MYRANK.EQ.0) THEN
    IF (ISCFL.EQ.1) WRITE (1,1212) DTCFL, N, ICFL, JCFL, KCFL
    IF (ISCFL.GE.2.AND.IVAL.EQ.0) WRITE (1,1213) IDTCFL
  ENDIF
ENDIF

2. EEXPOUT (conventional):
DO NSP = 1, NXSP
  DO K = 1, KC; DO L = 2, LA
    WQ = WQVX(L, K, NSP)
    WRITE (95) WQ
  ENDDO; ENDDO
ENDDO

EEXPOUT (improved):
DO NSP = 1, NXSP
  WRITE (95) WQVX(:,:,NSP)
ENDDO
Table A2. Example of hybrid parallel code application.

Sequential code:
DO L = 2,LA
  RCG_R8(L) = CCC(L)*P(L) + CCS(L)*PSOUTH(L) + CCN(L)*PNORTH(L)
 &          + CCW(L)*P(L-1) + CCE(L)*P(L+1) - FPTMP(L)
ENDDO
DO L = 2,LA
  RS8Q = RS8Q + RCG_R8(L)*RCG_R8(L)
ENDDO

Parallel code:
!$OMP PARALLEL DO
DO L = LMPI2,LMPILA
  RCG_R8(L) = CCC(L)*P(L) + CCS(L)*PSOUTH(L) + CCN(L)*PNORTH(L)
 &          + CCW(L)*P(L-1) + CCE(L)*P(L+1) - FPTMP(L)
ENDDO
!$OMP PARALLEL DO REDUCTION(+:RS8Q)
DO L = LMPI2,LMPILA
  RS8Q = RS8Q + RCG_R8(L)*RCG_R8(L)
ENDDO
CALL MPI_ALLREDUCE(RS8Q, MPI_R8, 1, MPI_DOUBLE,
 &                 MPI_SUM, MPI_COMM_WORLD, IERR)
RS8Q = MPI_R8

Appendix B

DSI, the development and distribution organization of the EFDC model and EFDC-Explorer, presented the development and performance results of the MPI-based EFDC+ in July 2020 [35]. The previous EFDC+ performed parallelization using OpenMP, and at that time the calculation speed improvement was a factor of 2.5, similar to the improvement factor of approximately two to three obtained with OpenMP in this study. In the case of OpenMP, further speed improvement is impossible without better CPUs because the performance is bounded by the computing resources of a single node. MPI, in contrast, is more favorable for speed improvement than OpenMP because distributed computing resources can be used, although the improvement varies depending on the constructed model. As a result of applying the source code optimization and the OpenMP and MPI methods in this study, the calculation speed improved by a factor of approximately five. In the case of OpenMP, the calculation speed improvement was similar for different numbers of calculation grids, whereas in the case of MPI, the calculation speed improved further as the number of grids increased.
In addition, we compared the results of using similar processors to evaluate the MPI parallel code of EFDC+, developed by DSI. Table A3 presents an overview of the model and specifications of the hardware used. The number of horizontal grids is approximately 120,000 in EFDC-NIER and approximately 200,000 in EFDC+. However, the number of vertical layers was 11 in EFDC-NIER and four in EFDC+. DSI’s EFDC+ provides the simulation results for only the hydraulic factors, and the EFDC-NIER produces the results by performing parallelization of the hydraulics and the water quality. Table A4 presents the performance improvement in the calculation speed for each MPI processor. Because numerical results are not provided in the report for EFDC+, we compared the results digitized from the speed improvement graph.
The speed improvements of EFDC-NIER and EFDC+ for the same number of MPI processors were similar, both showing an improvement of approximately a factor of four with four processors. When 16 processors were used, the improvement was a factor of 11.77 for EFDC-NIER and 11 for EFDC+, indicating that EFDC-NIER was slightly superior. When 32 processors were used, the improvement was a factor of 15 for EFDC-NIER and 17 for EFDC+, showing that EFDC+ was superior. In the case of EFDC+ MPI, the performance improvement increased as the number of processors increased. However, the speed improvement varies according to the model configuration and hardware specifications. We determined that EFDC-NIER is better than or similar to EFDC+ MPI, especially considering that its improvement values include the simulation results of the water quality factors.
Table A3. Overview of EFDC-NIER and EFDC+ models and the hardware used.

Category | EFDC-NIER | EFDC+
Study area | Nakdong River | Chesapeake Bay
Number of horizontal grids | 116,473 | 204,000
Vertical layers | 11 | 4
Simulation factors | Hydraulics, water temperature, salinity, water quality, suspended sediments | Hydraulics, water temperature, salinity
CPU | Xeon E5-2650 v3 ×2 | Xeon Platinum 8000 series
Number of cores | 10 (×2) | 36
Number of nodes | 10 | 4
Table A4. Performance improvement in calculation speed by MPI processor (unit for speed improvement: times).

Number of MPI processors | 4 | 8 | 16 | 32
EFDC-NIER | 3.97 | 7.29 | 11.77 | 15.01
EFDC+ | 4 | 7 | 11 | 17

Appendix C

For the performance evaluation of the parallel code, we performed OpenMP, MPI, and hybrid parallel performance evaluation for the model sets displayed in Table A3. The performance evaluation test was conducted using up to four nodes and 48 CPUs, considering the size of the model area. It was performed based on a 1-h computation, after which the calculation times were compared. Here, the combinations of the processors used in the OpenMP, MPI, and hybrid parallel performance evaluations were determined by considering the number of model grids and the hardware used in the experiment.
Table A5 shows the cluster used for the parallel performance evaluation. The cluster consists of one login node and ten calculation nodes. Each calculation node is equipped with two Intel Xeon E5-2650 CPUs running CentOS 6.7, and data communication is based on Fourteen Data Rate (FDR) InfiniBand. Intel Parallel Studio 2017.1.043 was used to build and run the hybrid parallel program, with Intel OpenMP and Intel MPI 2017.1.132 used as the OpenMP and MPI libraries, respectively. In the case of OpenMP, up to 20 processors can be used because each node has a maximum of 20 cores; however, the test was conducted with up to 16 processors only, because basic operating system processes were also running. In the case of MPI and the hybrid method, the processors were configured in combinations based on the number of nodes and the number of cores per node, and the test was conducted with 32 or 40 processors depending on the number of model grids.
Figure A1 and Figure A2 show the ideal and actual values of the calculation times when the execution times of the model are compared. Here, the ideal value is the theoretical calculation time based on the increase in the number of calculation processors. For the ideal value, we applied the performance improvement–time ratio equation, which is known as Amdahl’s law [36] and is expressed as
S = \frac{1}{(1 - f) + f/n}  (A1)
where S is the improvement ratio of the calculation speed, f is the proportion of the total calculation time occupied by the improved part, and n is the number of processors. For convenience of comparison, we assumed that f was 1. Furthermore, we applied Equation (A1) by expressing it as Equation (A2), the reduction ratio of the execution time:
TCT = \left( (1 - f) + \frac{f}{n} \right) \times ACT = \frac{1}{n} \times ACT,  (A2)
where TCT is the theoretical calculation time and ACT is the actual calculation time.
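As a concrete check against Table A7 (with f taken as 1), the theoretical calculation time for HDMT2T at 16 MPI nodes is TCT = 656,953/16 ≈ 41,060 s, whereas the measured time was 55,814 s, corresponding to a parallel efficiency of about 74% and consistent with the speed-up factor of 11.77 reported in Table A4.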
Table A5. Specification of the parallel performance evaluation system.

Category | Cluster
CPU product | Intel Xeon CPU E5-2650 v3 × 2 ea
CPU cores | 10 (total 20)
Frequency | 2.30 GHz
Cache | 25 MB
Instruction | 64-bit
Extension | Intel AVX2
Memory | 64 GB
OS | CentOS release 6.7
Network | InfiniBand ConnectX-3 VPI FDR, IB (56 Gb/s)
Compiler | Intel Parallel Studio 2017.1.043
OpenMP | Intel OpenMP
MPI | Intel MPI 2017.1.132
(1) OpenMP Parallel Performance Evaluation.
In the OpenMP parallel computation evaluation, the performance was analyzed based on the number of OpenMP threads. Because OpenMP was scalable on a single node, eight experimental combinations of OpenMP threads were configured from 1 to 16. Table A6 and Figure A1 present the execution time for each function based on the number of OpenMP threads. As the number of threads increased, the OpenMP parallel calculation did not exhibit a calculation speed improvement close to the ideal value.
When examined based on the total time (HDMT2T), two threads showed a performance improvement of a factor of 1.7, and 12 threads a factor of 3.1. For most functions, the execution time decreased as the number of threads increased, but the improvement was far below the ideal.
Table A6. Execution time for each function based on the number of OpenMP threads.

Category (number of OpenMP threads) | 1 | 2 | 3 | 4 | 6 | 8 | 12 | 16
HDMT2T | 656,953 | 385,124 | 308,917 | 259,871 | 231,408 | 221,675 | 215,195 | 210,575
CALAVB | 21,823 | 11,475 | 8013 | 6445 | 5056 | 4282 | 3425 | 3083
CALTSXY | 1722 | 1053 | 802 | 746 | 726 | 743 | 751 | 752
CALEXP2T | 52,024 | 29,061 | 21,940 | 18,948 | 17,606 | 17,201 | 16,832 | 16,910
CALCSER | 1006 | 1013 | 1148 | 1022 | 1059 | 1091 | 1088 | 1085
CALPUV2C | 19,746 | 13,194 | 10,790 | 9814 | 9245 | 8976 | 8742 | 8842
ADVANCE | 9040 | 7050 | 6393 | 6352 | 6367 | 6323 | 6266 | 6228
CALUVW | 54,457 | 32,161 | 25,360 | 22,692 | 20,167 | 18,573 | 17,323 | 16,409
CALCONC | 92,621 | 53,055 | 42,789 | 36,727 | 32,873 | 31,405 | 30,168 | 29,506
SEDIMENT | 1931 | 1001 | 712 | 557 | 425 | 360 | 294 | 278
WQ3D | 283,800 | 169,636 | 141,214 | 114,729 | 101,733 | 98,847 | 98,500 | 96,071
CALBUOY | 6102 | 4293 | 3823 | 3716 | 3672 | 3619 | 3637 | 3691
NLEVEL | 1781971678549486489491505
CALHDMF | 37,468 | 19,858 | 14,267 | 11,580 | 9188 | 8022 | 6687 | 6335
CALTBXY | 10,391 | 5492 | 3768 | 2973 | 2278 | 1857 | 1446 | 1257
QQSQR | 3063 | 1667 | 1199 | 997 | 786 | 673 | 559 | 555
CALQQ2T | 55,889 | 29,635 | 21,593 | 17,561 | 15,242 | 14,577 | 14,262 | 14,231
SURFPLT | 16 | 17 | 17 | 17 | 18 | 20 | 21 | 24
VELPLTH | 38 | 37 | 38 | 37 | 39 | 43 | 48 | 55
SALPTH | 726 | 824 | 827 | 861 | 864 | 878 | 862 | 883
EEXPOUT | 3184 | 3499 | 3416 | 3423 | 3444 | 3448 | 3483 | 3536
Figure A1. Execution time as a function of the number of OpenMP threads (detailed grid #2 of Nakdong River).
(2) MPI Parallel Performance Evaluation.
In the MPI parallel computation evaluation, the performance was analyzed based on the number of MPI nodes. Because MPI was scalable on multiple nodes, four calculation nodes were used to configure seven experimental combinations for the number of MPI nodes from 1 to 40. Table A7 and Figure A2 present the execution times for each function based on the number of MPI nodes. The MPI parallel computation showed that the calculation speed improvement approached the ideal value as the number of nodes increased.
When examined based on the total time (HDMT2T), a performance improvement of a factor of 4.0 was exhibited with four MPI nodes, 11.8 with 16 MPI nodes, and 16.8 with 40 MPI nodes, reaching the maximum improvement. The MPI communication time (COMMUNICATION) did not increase as the number of MPI nodes increased; the elapsed time was 6605 s, which was the maximum when the number of MPI nodes was 12. In the MPI parallel performance, the parallel scalability of CALTSXY and CALPUV2C was not high. In contrast, CALAVB, CALUVW, CALCONC, and WQ3D, which have extensive calculations, showed larger performance improvements as the number of MPI nodes increased.
Table A7. Execution time for each function based on the number of MPI nodes (detailed grid #2 of Nakdong River).

Category (number of MPI nodes) | 1 | 2 | 4 | 8 | 16 | 32 | 40
HDMT2T | 656,953 | 323,191 | 165,614 | 90,132 | 55,814 | 43,759 | 40,212
CALAVB | 21,823 | 10,987 | 5606 | 2816 | 1435 | 857 | 703
CALTSXY | 1722 | 860 | 487 | 290 | 179 | 124 | 108
CALEXP2T | 52,024 | 25,490 | 12,655 | 6546 | 3541 | 2365 | 2006
CALCSER | 1006 | 1018 | 995 | 1015 | 1122 | 1996 | 1936
CALPUV2C | 19,746 | 9959 | 5517 | 3125 | 2020 | 1628 | 1733
ADVANCE | 9040 | 4205 | 2126 | 1107 | 656 | 592 | 456
CALUVW | 54,457 | 26,806 | 13,885 | 6912 | 3739 | 2350 | 1879
CALCONC | 92,621 | 46,994 | 23,973 | 13,098 | 7977 | 7168 | 6571
SEDIMENT | 1931 | 954 | 477 | 239 | 124 | 72 | 57
WQ3D | 283,800 | 130,926 | 63,214 | 30,843 | 15,659 | 9898 | 8316
CALBUOY | 6102 | 3123 | 1623 | 765 | 404 | 321 | 252
NLEVEL | 178129314676413023
CALHDMF | 37,468 | 17,835 | 12,250 | 9587 | 10,347 | 9145 | 9478
CALTBXY | 10,391 | 4420 | 2206 | 1105 | 577 | 332 | 274
QQSQR | 3063 | 1444 | 732 | 375 | 197 | 122 | 99
CALQQ2T | 55,889 | 27,152 | 13,673 | 7129 | 3762 | 2349 | 1920
SURFPLT | 16 | 18 | 17 | 17 | 17 | 19 | 19
VELPLTH | 38 | 39 | 37 | 38 | 37 | 42 | 46
SALPTH | 726 | 376 | 183 | 88 | 55 | 55 | 40
EEXPOUT | 3184 | 3641 | 3474 | 3438 | 3481 | 3980 | 3988
COMMUNICATION | 0 | 6606 | 2291 | 1497 | 413 | 278 | 272
Figure A2. Execution time as a function of the number of MPI nodes.
(3) Hybrid Parallel Performance Evaluation.
In the evaluation of the hybrid parallel computation, the performance was analyzed based on the combinations of MPI and OpenMP threads. For quantitative evaluation, we composed four experimental hybrid combinations by fixing the number of utilizable CPUs to 40. In each case, the number of MPI nodes was increased from 4 to 8, 20, and 40. At the same time, the number of OpenMP threads was reduced from 10 to 5, 2, and 1. These particular numbers were chosen so that if the number of MPI nodes was multiplied by the number of OpenMP threads, the result would be 40.
As shown in Table A8, the best performance was achieved with eight MPI nodes and five OpenMP threads. When the computational time was examined for each function, however, no single combination was superior across the board, and no consistent pattern was observed. For example, CALEXP2T and CALPUV2C showed their shortest execution times with eight MPI nodes and five OpenMP threads, whereas in the combination of 40 MPI nodes and one OpenMP thread, the functions that require long computational times, such as CALUVW, SEDIMENT, and WQ3D, exhibited the best performance.
Table A8. Execution time for each function based on the hybrid combination.

Category (MPI nodes × OpenMP threads, 40 CPUs in total) | 4 × 10 | 8 × 5 | 20 × 2 | 40 × 1
HDMT2T | 54,457 | 39,300 | 41,800 | 40,212
CALAVB | 975 | 752 | 765 | 703
CALTSXY | 213 | 122 | 135 | 108
CALEXP2T | 3411 | 1921 | 2126 | 2006
CALCSER | 1078 | 1079 | 1307 | 1936
CALPUV2C | 3224 | 1724 | 2128 | 1733
ADVANCE | 1350 | 515 | 436 | 456
CALUVW | 3293 | 2283 | 2213 | 1879
CALCONC | 8102 | 5851 | 5816 | 6571
SEDIMENT | 155 | 88 | 73 | 57
WQ3D | 17,240 | 11,109 | 10,090 | 8316
CALBUOY | 1016 | 357 | 245 | 252
NLEVEL | 122422723
CALHDMF | 4039 | 5794 | 8882 | 9478
CALTBXY | 324 | 279 | 263 | 274
QQSQR | 152 | 104 | 100 | 98
CALQQ2T | 3738 | 2065 | 2295 | 1920
SURFPLT | 20 | 19 | 19 | 19
VELPLTH | 43 | 45 | 43 | 46
SALPTH | 178 | 57 | 32 | 40
EEXPOUT | 3625 | 3584 | 3516 | 3988
COMMUNICATION | 2101 | 1477 | 1256 | 272

References

1. Thornton, K.W.; Kimmel, B.L.; Payne, F.E. Reservoir Limnology-Ecological Perspectives; A Wiley Interscience Publication, John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1990.
2. Winston, W.E.; Criss, R.E. Geochemical variations during flash flooding, Meramec River basin. J. Hydrol. 2000, 265, 149–163.
3. You, K.A.; Byeon, M.S.; Hwang, S.J. Effects of hydraulic-hydrological changes by monsoon climate on the zooplankton community in lake Paldang, Korea. Korean J. Limnol. 2012, 45, 278–288.
4. Jung, W.S.; Kim, Y.D. Effect of abrupt topographical characteristic change on water quality in a river. KSCE J. Civ. Eng. 2019, 23, 3250–3263.
5. Bomers, M.; Schielen, R.M.J.; Hulscher, S.J.M.H. The influence of grid shape and grid size on hydraulic river modelling performance. Environ. Fluid Mech. 2019, 19, 1273–1294.
6. Avesani, D.; Galletti, A.; Piccolroaz, S.; Bellin, A.; Majone, B. A dual-layer MPI continuous large-scale hydrological model including Human Systems. Environ. Model. Softw. 2021, 139, 105003.
7. Neal, J.; Fewtrell, T.; Bates, P.; Wright, N. A comparison of three parallelisation methods for 2D flood inundation models. Environ. Model. Softw. 2010, 25, 398–411.
8. Rouholahnejad, E.; Abbaspour, K.C.; Vejdani, M.; Srinivasan, R.; Schulin, R.; Lehmann, A. A parallelization framework for calibration of hydrological models. Environ. Model. Softw. 2012, 31, 28–36.
9. Liu, J.; Zhu, A.X.; Liu, Y.; Zhu, T.; Qin, C.Z. A layered approach to parallel computing for spatially distributed hydrological modeling. Environ. Model. Softw. 2014, 51, 221–227.
10. Neal, J.; Fewtrell, T.; Trigg, M. Parallelisation of storage cell flood models using OpenMP. Environ. Model. Softw. 2009, 24, 872–877.
11. Lawrence Livermore National Laboratory. Available online: https://computing.llnl.gov/tutorials/parallel_comp/ (accessed on 5 April 2021).
12. Gropp, W.; Lusk, E.; Doss, N.; Skjellum, A. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput. 1996, 22, 789–828.
13. Chapman, B.; Jost, G.; Van Der Pas, R. Using OpenMP: Portable Shared Memory Parallel Programming; MIT Press: Cambridge, MA, USA, 2007.
14. Eghtesad, A.; Barrett, T.J.; Germaschewski, K.; Lebensohn, R.A.; McCabe, R.J.; Knezevic, M. OpenMP and MPI implementations of an elasto-viscoplastic fast Fourier transform-based micromechanical solver for fast crystal plasticity modeling. Adv. Eng. Softw. 2018, 126, 46–60.
15. Jiao, Y.; Zhao, Q.; Wang, L.; Huang, G.; Tan, F. A hybrid MPI/OpenMP parallel computing model for spherical discontinuous deformation analysis. Comput. Geotech. 2019, 106, 217–227.
16. Ouro, P.; Fraga, B.; Lopez-Novoa, U.; Stoesser, T. Scalability of an Eulerian-Lagrangian large-eddy simulation solver with hybrid MPI/OpenMP parallelisation. Comput. Fluids 2019, 179, 123–136.
17. Zhou, H.; Tóth, G. Efficient OpenMP parallelization to a complex MPI parallel magnetohydrodynamics code. J. Parallel Distrib. Comput. 2020, 139, 65–74.
18. Klinkenberg, J.; Samfass, P.; Bader, M.; Terboven, C.; Müller, M.S. Chameleon: Reactive load balancing for hybrid MPI + OpenMP task-parallel applications. J. Parallel Distrib. Comput. 2020, 138, 55–64.
19. Stump, B.; Plotkowski, A. Spatiotemporal parallelization of an analytical heat conduction model for additive manufacturing via a hybrid OpenMP + MPI approach. Comput. Mater. Sci. 2020, 184, 109861.
20. Zhao, Z.; Ma, R.; He, L.; Chang, X.; Zhang, L. An efficient large-scale mesh deformation method based on MPI/OpenMP hybrid parallel radial basis function interpolation. Chin. J. Aeronaut. 2020, 33, 1392–1404.
21. Noh, S.J.; Lee, J.H.; Lee, S.; Kawaike, K.; Seo, D.J. Hyper-resolution 1D-2D urban flood modelling using LiDAR data and hybrid parallelization. Environ. Model. Softw. 2018, 103, 131–145.
22. Stacey, M.W.; Pond, S.; Nowak, Z.P. A numerical model of circulation in Knight Inlet, British Columbia, Canada. J. Phys. Oceanogr. 1995, 25, 1037–1062.
23. Adcroft, A.; Campin, J.M. Rescaled height coordinates for accurate representation of free-surface flows in ocean circulation models. Ocean Model. 2004, 7, 269–284.
24. Tetra Tech. EFDC Technical Memorandum. Theoretical and Computational Aspects of the Generalized Vertical Coordinate Option in the EFDC Model; Tetra Tech, Inc.: Fairfax, VA, USA, 2007.
25. Tetra Tech. The Environmental Fluid Dynamics Code, User Manual, US EPA Version 1.01; Tetra Tech, Inc.: Fairfax, VA, USA, 2007.
26. Ahn, J.M.; Kim, J.; Park, L.J.; Jeon, J.; Jong, J.; Min, J.H.; Kang, T. Predicting cyanobacterial harmful algal blooms (CyanoHABs) in a regulated river using a revised EFDC model. Water 2021, 13, 439.
27. Ahn, J.M.; Kim, B.; Jong, J.; Nam, G.; Park, L.J.; Park, S.; Kang, T.; Lee, J.K.; Kim, J. Predicting cyanobacterial blooms using hyperspectral images in a regulated river. Sensors 2021, 21, 530.
28. Hamrick, J.M. A Three-Dimensional Environmental Fluid Dynamics Computer Code: Theoretical and Computational Aspects. In Applied Marine Science and Ocean Engineering; Special Report No. 317; Virginia Institute of Marine Science: Gloucester Point, VA, USA, 1992; p. 58.
29. Craig, P.M. Users Manual for EFDC_Explorer: A Pre/Post Processor for the Environmental Fluid Dynamics Code, Ver 160307; DSI LLC: Edmonds, WA, USA, 2016.
30. DSI. EFDC_DSI/EFDC_Explorer Modeling System, Use and Applications for Alberta ESRD Environmental Modelling Workshop; DSI LLC: Edmonds, WA, USA, 2013.
31. O’Donncha, F.; Ragnoli, E.; Suits, F. Parallelisation study of a three-dimensional environmental flow model. Comput. Geosci. 2014, 64, 96–103.
32. GitHub EFDC-MPI. Available online: https://github.com/fearghalodonncha/EFDC-MPI (accessed on 12 April 2021).
33. Kwedlo, W.; Czochanski, P.J. A hybrid MPI/OpenMP parallelization of K-means algorithms accelerated using the triangle inequality. IEEE Access 2019, 7, 42280–42297.
34. Gropp, W.; Hoefler, T.; Thakur, R.; Lusk, E. Using Advanced MPI: Modern Features of the Message-Passing Interface; MIT Press: Cambridge, MA, USA, 2014.
35. Mausolff, Z.; Craig, P.; Scandrett, K.; Mishra, A.; Lam, N.T.; Mathis, T. EFDC+ Domain Decomposition: MPI-Based Implementation; DSI LLC: Edmonds, WA, USA, 2020.
36. Amdahl, G.M. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the AFIPS Conference, Atlantic City, NJ, USA, 18–20 April 1967; AFIPS Press: Reston, VA, USA, 1967; Volume 30, pp. 483–485.
Figure 1. Comparison of simulation times when the EFDC_DSI OpenMP is applied (Vert Diff: Vertical Diffusion; PUV: Pressure, U, and V flux; QQ: Turbulent intensity; UVW: U, V, and W velocity; EXP: Explicit momentum equation term; T/B SH: Bottom friction; V&D: Viscosity and Diffusivity; Heat: Heat flux; SSED: Sediment transport; WQ Kin: WQ kinetic equation; WQ Trans: WQ transport; TOX: Toxic).
Figure 2. Parallel efficiency as a function of the number of processors using IBM’s MPI-parallelized code.
Table 1. MPI-only functions of the EFDC-NIER parallel code.

Function | Description
MPI_INITIALIZE | Initializes MPI and sets up MPI variables (number of nodes, rank of each node)
MPI_DECOMPOSITION | Partitions the LA index
BROADCAST_BOUNDARY | Performs communication of boundary values between MPI nodes (1D variables)
BROADCAST_BOUNDARY_ARRAY | Performs communication of boundary values between MPI nodes (2D and 3D variables)
COLLECT_IN_ZERO | Performs collective communication to send variables to the master node (1D)
COLLECT_IN_ZERO_ARRAY | Performs collective communication to send variables to the master node (2D and 3D)
MPI_TIC/MPI_TOC | Timestamp for hotspot analysis (MPI Walltime)
Table 2. Method for evaluating the parallelization performance.

Case | Description
Case 1 | Before optimization and parallel code application
Case 2-1 | Sequential code optimization
Case 2-2 | Case 2-1 + Changed method of writing the result file
Case 3 | Case 2-2 + OpenMP (6 threads)
Case 4 | Case 2-2 + MPI (6 nodes)
Case 5-1 | Case 2-2 + Hybrid (1 thread + 5 nodes)
Case 5-2 | Case 2-2 + Hybrid (2 threads + 4 nodes)
Case 5-3 | Case 2-2 + Hybrid (3 threads + 3 nodes)
Case 5-4 | Case 2-2 + Hybrid (4 threads + 2 nodes)
Case 5-5 | Case 2-2 + Hybrid (5 threads + 1 node)
Comparison method: consistency of calculation times and results for simulations from 1 July 2015 to 9 July 2015.
Table 3. Model and computer specifications used for the evaluation of the parallel calculation.

Category | Item | Description
Model | Number of grids | Horizontal: 6998 units, vertical: 11 layers
Windows PC | CPU | Intel Core i7-8700 3.20 GHz, 6 cores
Windows PC | Memory | 32 GB
Windows PC | OS | Microsoft Windows 10 Pro 64-bit
Windows PC | Compiler | Intel Parallel Studio XE 2020, 19.1.2.254 20200623
Table 4. Hotspot analysis results for the EFDC-NIER model.

Function Name | Description | Proportion of Execution Time (%)
HDMT2T | Main code of numerical simulation | 100.00
WQ3D | Water quality component simulation | 38.03
CALUVW | Flow rate/direction component simulation | 18.45
CALCONC | Concentration (water temperature, sediment) simulation | 10.65
CALQQ2T | Turbulence intensity simulation | 7.33
CALEXP2T | Explicit momentum equation calculation | 6.17
CALHDMF | Simulation of horizontal viscosity and spreading momentum | 4.66
CALPUV2C | Simulation of surface P, UHDYE, and VHDXE | 4.12
EEXPOUT | Writing a result file | 3.15
CALAVB | Simulation of vertical viscosity and dispersion | 3.12
ADVANCE | Updating the next timestep value | 0.98
CALBUOY | Buoyancy simulation | 0.98
CALTBXY | Bottom friction factor calculation | 0.60
QQSQR | Updating the turbulence intensity value | 0.56
CALCSER | Updating the boundary data | 0.35
CALTSXY | Updating the surface wind stress | 0.28
SEDIMENT | Sediment simulation | 0.25
NLEVEL | Distribution of variables for each timestep | 0.11
SALPTH | Writing the salinity result | 0.09
DUMP | Recording the model variable dump | 0.06
Table 5. Parallelization performance evaluation results.

Case | Simulation Time (h)
Case 1 | 3.22
Case 2-1 | 2.68
Case 2-2 | 1.69
Case 3 | 0.74
Case 4 | 0.69
Case 5-1 | 0.66
Case 5-2 | 0.73
Case 5-3 | 0.68
Case 5-4 | 0.65
Case 5-5 | 0.79
Table 6. Consistency analysis results for each evaluation subject.

Variable | Case 1 | Case 3 | Case 4
TSX | 6.173959 × 10^-4 | 6.173959 × 10^-4 | 6.173959 × 10^-4
TSY | 6.763723 × 10^-4 | 6.763723 × 10^-4 | 6.763723 × 10^-4
TBX | 2.715987 × 10^-2 | 2.715987 × 10^-2 | 2.715987 × 10^-2
TBY | 3.796839 × 10^-1 | 3.796839 × 10^-1 | 3.796839 × 10^-1
AV | 6.224335 | 6.224335 | 6.224335
AB | 7.667015 | 7.667015 | 7.667015
AQ | 2.551961 | 2.551961 | 2.551961
HP | 3.046219 × 10^4 | 3.046219 × 10^4 | 3.046219 × 10^4
HU | 3.080725 × 10^4 | 3.080725 × 10^4 | 3.080725 × 10^4
HV | 3.018550 × 10^4 | 3.018550 × 10^4 | 3.018550 × 10^4
P | 1.726112 × 10^6 | 1.726112 × 10^6 | 1.726112 × 10^6
U | 1.927491 × 10^2 | 1.927491 × 10^2 | 1.927491 × 10^2
V | 1.632354 × 10^3 | 1.632354 × 10^3 | 1.632354 × 10^3
W | 8.197821 × 10^-1 | 8.197821 × 10^-1 | 8.197821 × 10^-1
TEM | 1.677452 × 10^5 | 1.677452 × 10^5 | 1.677452 × 10^5
SEDT | 4.259870 × 10^5 | 4.259870 × 10^5 | 4.259870 × 10^5
QQ | 1.507725 × 10 | 1.507725 × 10 | 1.507725 × 10
QQL | 1.070996 | 1.070996 | 1.070996
WQV | 2.552020 × 10^5 | 2.552020 × 10^5 | 2.552020 × 10^5
WQVX | 5.815869 × 10^4 | 5.815869 × 10^4 | 5.815869 × 10^4
Table 7. Consistency results of major water quality factors.

Date | BOD | T-N | T-P (values identical for Case 1, Case 3, Case 4, and Case 5-4)
01-07-2015 | 1.453 | 3.590 | 0.080
02-07-2015 | 1.481 | 3.609 | 0.079
03-07-2015 | 1.521 | 3.658 | 0.080
04-07-2015 | 1.547 | 3.700 | 0.081
05-07-2015 | 1.520 | 3.712 | 0.079
06-07-2015 | 1.596 | 3.672 | 0.081
07-07-2015 | 1.637 | 3.612 | 0.080
08-07-2015 | 1.563 | 3.568 | 0.081
09-07-2015 | 1.634 | 3.493 | 0.081
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
