1. Introduction
Hydrological model parameter optimization is a key issue for model application [1,2,3,4,5]. With the development of mathematics and computer technology, a large number of parameter optimization algorithms have been proposed. Among the most popular is the Monte Carlo sampling (MCS)-based parameter optimization method, which, as a basic algorithm, is frequently adopted in real-world applications due to its simplicity and practicality. MCS is widely applied in hydrology, where it uses random sampling to obtain representative samples and assess uncertainties [6,7,8]. MCS has found applications in flood risk analysis, water resources management, and hydrological modeling [9]. In flood risk analysis, Monte Carlo simulations based on statistical distributions enable the estimation of flood magnitudes and the assessment of associated risks; this helps in designing flood protection measures, establishing flood warning systems, and evaluating infrastructure vulnerability [10]. Water resources management benefits from MCS by incorporating uncertainties related to factors such as precipitation patterns and water demand; such simulations aid decision-makers in assessing the reliability and sustainability of water supply systems and support optimal management strategies and long-term planning [11]. Hydrological modeling relies on MCS to quantify uncertainties associated with model inputs and parameters, generating ensembles of model outputs that improve the credibility and understanding of model projections; applications include rainfall-runoff modeling, streamflow prediction, and groundwater modeling [12].
However, the computational cost of the MCS-based method is high due to the extensive parameter search space of the Monte Carlo experiments and the huge number of objective function evaluations. The single-threaded, or serial, MCS-based parameter optimization method consumes too much time, which prevents researchers and engineers from applying it to highly complex real-world problems. Binley and Beven [13] attempted to assess the uncertainty associated with the predictions of a distributed rainfall-runoff model and carried out parameter optimization using an MCS-based method, GLUE. However, they recognized that the computational burden of the MCS is very high and therefore constrained their MCS to 500 simulations, even though they adopted a relatively simple distributed model, the Institute of Hydrology Distributed Model version 4 (IHDM4). At that time, performing this level of computation required significant code enhancement in order to fully use the computational horsepower of an 80-node transputer parallel computer [14]. In that era, these pioneering studies were trying to employ hardware and software that was new to hydrological sciences and related disciplines. With further development of computer technology, the constraints computers impose on the application of MCS-based methods have been relaxed to some extent. However, it remains an issue, either because a model is particularly slow to run, so that it is still not possible to sample sufficient simulations, or because of the high number of parameter dimensions. The largest number of model runs used in a GLUE application that we know of is the two-billion-run application [15,16]. This was for a model written in just a few lines of code but including 17 parameters for calibration. We can infer that for more complex models, two billion samples are still insufficient. Therefore, a large number of samples is necessary when carrying out a Monte Carlo experiment, which consequently requires much more powerful parallel acceleration techniques.
The development of modern microelectronic technology provides more powerful parallel computers. Multi-core and many-core hybrid heterogeneous parallel computing platforms have become prominent in the recent high-performance computing field due to their stupendous floating-point computing capabilities compared with traditional CPU-only and older-generation computers. Recently, several heterogeneous supercomputers, such as Summit and Sierra, have shown excellent performance on the TOP500 test. The success of the CPU-GPU heterogeneous platform owes to its better cost performance and energy consumption. The modern CPU-GPU hybrid computing platform has become the best choice for researchers and engineers who need high-performance computing [17,18,19,20,21,22,23]. Moreover, the GPU is widely installed in modern PCs; therefore, CPU-GPU hybrid heterogeneous computer systems are readily available for scientific computing. The popularization of GPU cards enables CPU-GPU hybrid parallel programs to execute on almost all PCs. Although the GPUs available in PCs mainly target gaming and entertainment tasks rather than double-precision floating-point computation, these devices still perform very well in applications that do not require double-precision capability. Further, the software development toolkits of the CPU-GPU hybrid platform are easy to start with and can be acquired for free. Therefore, a modern CPU-GPU heterogeneous parallel computing platform can be established easily at relatively low cost, and it shows good prospects in engineering applications.
More recently, MCS-based parameter optimization has been sped up for some applications using parallel computing techniques such as multi-core CPUs. Even though a number of researchers have studied the acceleration of MCS-based parameter optimization, little research has fully utilized the huge computational horsepower of the new generation of CPU-GPU hybrid high-performance computer clusters. With the development of modern heterogeneous parallel computing technology, new-generation hardware integrated with versatile software development tools can provide tremendous computing horsepower and much better energy efficiency than ever before. The acceleration of the MCS-based parameter optimization method should catch up with the state of the art of modern high-performance computing technology.
With the arrival of the big data era, hydrological model parameter optimization requires an unprecedentedly large amount of computing horsepower. This research focused on the MCS-based parameter optimization method coupled with the newly emerged modern CPU-GPU hybrid high-performance computer cluster acceleration technology. In order to further improve the computational efficiency of MCS-based hydrological model parameter optimization, a CPU-GPU hybrid computer cluster-based parallel parameter optimization method was proposed. The parallel method was implemented on a CPU cluster and a GPU cluster, respectively. We utilized a total of five CPUs and five GPUs to achieve satisfactory acceleration results. Further, the scalability issue was investigated to demonstrate the robustness and scalability of the parallel optimization method. Additionally, the correctness of the proposed method was tested using sensitivity and uncertainty analysis of the model parameter sample points generated by the proposed method. Study results indicate good acceleration efficiency and reliable correctness of the proposed parallel optimization methods, which demonstrates excellent prospects in practical applications.
2. Methodology
2.1. Xinanjiang Model Parameter Optimization Based on Monte Carlo Sampling
2.1.1. The Monte Carlo Sampling
The brief procedure of the traditional MCS for model parameter optimization is listed below, and detailed descriptions of the MCS method can be found in relevant literature.
- (1)
A formal definition of a likelihood measure or set of likelihood measures is required. For hydrological model applications, the Nash–Sutcliffe coefficient of efficiency (NSCE) is usually adopted as the likelihood measure or, in other words, the objective function value. It can be calculated as follows:

NSCE = 1 − [∑_{i=1}^{n} (Q_{obs,i} − Q_{sim,i})²] / [∑_{i=1}^{n} (Q_{obs,i} − Q̄_{obs})²]  (1)

where Q_{sim,i} denotes simulated discharge at time step i; Q_{obs,i} denotes observed discharge at time step i; Q̄_{obs} denotes the mean of the observed discharge values; n denotes the number of discharge data.
- (2)
An appropriate definition of the range and distribution of the parameter values is necessary for a particular model structure. Generally speaking, the ranges of parameters are predefined according to the physical meanings of the specific hydrological model, and the uniform distribution is adopted in most cases since the actual distribution of parameters is usually unknown.
- (3)
Sampling of the parameter sets in the feasible space is achieved by utilizing the Monte Carlo approach, and the likelihood values are evaluated with the objective function after obtaining the simulation results of the hydrological model.
- (4)
The optimality of different parameter sets is evaluated based on their likelihood value.
2.1.2. The Xinanjiang Model
The Xinanjiang model was developed in 1973 and published in 1980 [24,25,26]. Its main feature is the concept of runoff formation on repletion of storage, which means that runoff is not generated until the soil moisture content of the vadose zone reaches field capacity, and thereafter, runoff equals excess rainfall without further loss. This hypothesis was first proposed in the 1960s, and much subsequent experience supports its validity for humid and semi-humid regions. According to the original formulation, the generated runoff was separated into two components using Horton's concept of a final, constant infiltration rate: infiltrated water was assumed to move into groundwater storage, and the remainder became surface or storm runoff. However, evidence of variability in the final infiltration rate and in the unit hydrograph assumed to connect the storm runoff to the discharge from each sub-basin suggested the necessity of a third component. Guided by the work of Kirkby, an additional component, interflow, was added to the model in 1980. The modified model is now successfully and widely applied in China. The model structure is demonstrated in Figure 1. Detailed descriptions of the principles of the Xinanjiang model can be found in the relevant literature.
2.1.3. Model Parameter Optimization
The traditional MCS-based Xinanjiang model parameter optimization involves two aspects: (a) parameters upper and lower boundaries and their constraints and (b) the objective function or likelihood measurement. Additionally, for hydrological simulation with the Xinanjiang model in this study, both the computational time step and hydro-meteorological data time interval were set to one day.
For model parameters, the specification of lower and upper boundaries is listed in Table 1. For the Xinanjiang model, the number of parameters (n) that need to be sampled is 15. Parameters KG and KI of the linear reservoir-based flow concentration module have a structural constraint KG + KI = 0.7. Therefore, we sample KG in this research, and KI is calculated as 0.7 − KG.
The Xinanjiang daily model focuses on water balance and hydrograph simulation. Therefore, the objective function (OBJ) adopted herein is calculated from the RDRE and the NSCE, where RDRE and NSCE represent the Runoff Depth Relative Error and the Nash–Sutcliffe Coefficient of Efficiency, respectively. The computation of the NSCE has been listed in Equation (1), and the RDRE is calculated as follows:

RDRE = [(∑_{i=1}^{n} Q_{sim,i} − ∑_{i=1}^{n} Q_{obs,i}) / ∑_{i=1}^{n} Q_{obs,i}] × 100%

where Q_{sim,i} denotes simulated discharge at time step i; Q_{obs,i} denotes observed discharge at time step i; n denotes the number of discharge data.
The parameters and state variables of the Xinanjiang model require two additional constraints to ensure the correctness of the model's physical meaning. The constraints are applied to parameters CG, CI, and CS (CG ≥ CI ≥ CS) and to soil moisture W (W must be non-negative). In order to consider the first constraint in the MCS procedure, before calculating the OBJ, we test the CG, CI, and CS values to verify whether the constraint is satisfied. If it is satisfied, we continue the calculation of the OBJ; otherwise, we set the OBJ to a penalty term computed with a penalty coefficient λ, which was set to 1000 in this research. If the first constraint is satisfied, we run the Xinanjiang daily model using the hydro-meteorological data to generate the simulated discharge time series. After the model simulation finishes, a "flag" is returned to indicate whether the simulation succeeded. If the simulation stops early and returns a "flag" indicating that the state variable W has negative values, we set the OBJ to a penalty term that depends on WMUB, the upper boundary of parameter WM, and the WM value generated from the MCS. This penalty term forces the algorithm to search toward larger WM values to avoid negative W values.
If both of the above-mentioned constraints are satisfied, we calculate the OBJ from the RDRE and NSCE as described above. After the OBJ of each parameter set is calculated, the parameter set with the minimum OBJ value is selected as the optimal parameter set.
2.2. Xinanjiang Model Parameter Optimization Based on Parallel Monte Carlo Sampling
2.2.1. Parallel Monte Carlo Sampling
The parallelization of the MCS-based parameter optimization involves two steps, which include the parallelization of the Monte Carlo sampling and the optimal parameter set reduction. We implemented the parallel MCS method on a multi-core CPU computer cluster and a many-core GPU computer cluster, respectively. A detailed description of the implementation can be found in the following paragraphs.
2.2.2. CPU Computer Cluster Implementation
The parallel optimization method was implemented on a multi-core CPU computer cluster, which contains four HP Z-series workstations hosting a total of five INTEL Xeon E5-2630v3 multi-core CPUs. The flow chart of the CPU computer cluster implementation of the parallel optimization method is demonstrated in Figure 2.
The parallel optimization method starts from the initial settings on the master node. The hydro-meteorological data were loaded from CSV (comma-separated values) files, which include daily rainfall, runoff discharge, evapotranspiration, and catchment geographical information. Further, the program sets the total sample number of the Monte Carlo experiment (NS), the likelihood threshold (TH), the model parameter boundaries, and the KG plus KI constraint. After initial data loading and settings, the algorithm queries the number of slave nodes (NN) and the number of CPU cores in each slave node by using the MPI and OpenMP APIs, respectively. The workload quantity assigned to each slave node is calculated as follows:

NSS_i = NS × NC_i / NT

where NSS_i denotes the number of samples generated in slave node i; NC_i denotes the number of CPU cores in slave node i; NT denotes the total number of CPU cores of the computer cluster; i = 1, 2, …, NN.
After initial data loading and model settings have been finished, the above-mentioned data and settings are broadcast to all slave nodes by using the MPI_Bcast API. Each slave node (take slave node i as an example) generates NCi threads to sample NSSi parameter sets that fall in the parameter feasible space. Each thread runs the Xinanjiang hydrological model and computes the likelihood function value using the generated parameter set. A parameter set with a likelihood value higher than TH is preserved. Model simulations and likelihood function evaluations in the NCi threads are executed in parallel by using OpenMP. After the NCi threads of calculations have finished, an OpenMP parallel reduction is started to find the optimal parameter set of each slave node. Finally, the feasible parameter sets and the optimal parameter set are sent to the master node by using the MPI_Send API.
During the MCS, the master node waits until the likelihood calculations and parallel reduction in all the slave nodes are complete. Once all the above computations are complete, the master node receives the feasible and best parameter sets from each slave node by using the MPI_Recv API. Finally, the master node chooses the parameter set with the largest likelihood function value as the optimal parameter set and finishes the execution by using the MPI_Finalize API.
2.2.3. GPU Computer Cluster Implementation
The parallel optimization method was implemented on a many-core GPU computer cluster, which is constructed from four HP Z-series workstations hosting a total of five NVIDIA Tesla K40c GPUs. The flow chart of the GPU computer cluster implementation of the parallel optimization method is demonstrated in Figure 3.
The parallel optimization method starts from the initial settings on the master node. The hydro-meteorological data are loaded from CSV (comma-separated values) files, which include daily rainfall, runoff discharge, evapotranspiration, and the geographical information of the study catchment. The algorithm also sets the total sample number of the Monte Carlo experiment (NS), the likelihood threshold (TH), the model parameter boundaries, and the KG plus KI constraint. After initial data loading and settings, the algorithm begins the MPI execution and queries the number of slave nodes (NN), the number of GPUs in each slave node, and the number of GPU cores in each slave node by using the MPI and CUDA APIs, respectively. The workload quantity assigned to each slave node is calculated as follows:

NSS_i = NS × (∑_{j=1}^{NG_i} NC_{i,j}) / NT

where NSS_i denotes the number of samples generated in slave node i; NC_{i,j} denotes the number of GPU cores of GPU j in slave node i; NG_i denotes the number of GPUs in slave node i; NT denotes the total number of GPU cores of the computer cluster; i = 1, 2, …, NN; j = 1, 2, …, NG_i.
After initial data loading and settings have been finished, the above-mentioned data and settings are broadcast to all slave nodes by using the MPI_Bcast API. Each slave node (take slave node i as an example) creates NGi CPU threads to control its NGi GPUs by using OpenMP and offloads the data and settings onto the GPUs by using CUDA APIs such as cudaMemcpy. Slave node i samples NSSi parameter sets within the parameter feasible space on its GPUs. Each GPU thread runs the Xinanjiang hydrological model and computes the likelihood function value using the generated parameter set, and parameter sets with a likelihood value higher than TH are preserved. The model and likelihood function calculations are executed in parallel on the GPUs by using OpenMP and CUDA; GPU j is responsible for parameter set generation, model runs, and likelihood function evaluations for its share of the samples. After the likelihood calculations, a CUDA parallel reduction is started to find the feasible parameter sets and the optimal parameter set of each slave node. Finally, the optimal parameter set is sent to the master node by using the MPI_Send API.
During the MCS, likelihood calculations, and parallel reduction in all the slave nodes, the master node waits for all these computations to complete. Once they are completed, the master node receives the feasible parameter sets and the optimal parameter set of each slave node by using the MPI_Recv API. Finally, the master node chooses the parameter set with the largest likelihood function value as the best parameter set and finishes the execution by using the MPI_Finalize API.
2.3. Sensitivity and Uncertainty Analysis Based on GLUE
For the purpose of verifying the correctness of the proposed parallel parameter optimization method, we carry out sensitivity and uncertainty analysis on the Monte Carlo-generated parameter set samples by using the Generalized Likelihood Uncertainty Estimation methodology, GLUE.
The principle of the GLUE method can be summarized as follows. GLUE was proposed by Beven, and it is a framework for model calibration and uncertainty analysis in hydrological and environmental sciences. The method assumes that the true system behavior cannot be fully represented by a single set of model parameters, so it considers multiple alternative model parameter sets to capture the uncertainties. GLUE uses a likelihood measure to compare the observed data with model simulations, quantifying how well each model reproduces the observed data. Each model simulation is characterized by a set of parameter values, and GLUE accounts for parameter uncertainty by sampling from prior distributions. GLUE generates a large number of model realizations by sampling parameters randomly from their defined distributions using a Monte-Carlo-based sampling method. Model performances are ranked based on how well they reproduce the observed data using the likelihood measure. A threshold value is defined to determine an "acceptable" range of model performance, and model parameter sets falling within this range are considered plausible. The ensemble of plausible model parameter sets is used to generate predictions and assess uncertainty using various statistical metrics and visualization techniques. Through the above-mentioned procedures, GLUE provides a comprehensive framework for uncertainty analysis. It should be noted that this is only a brief overview of the GLUE method's principles; further technical details and variations, depending on the specific application, can be found in the relevant references.
2.4. Hardware Adopted in This Study
The hardware utilized in this research is a computer cluster composed of one HP Z820 and three HP Z840 workstations. The screen and four workstations are demonstrated in Figure 4. The USB KVM switch, which is used for controlling and switching among the four workstations with one set of screen, keyboard, and mouse, is shown in Figure 5. This computer cluster has five INTEL Xeon E5-2630v3 CPUs and five NVIDIA Tesla K40c GPUs.
The Xeon E5-2630v3 CPU is a high-end server-level microprocessor. It is a Haswell-EP architecture CPU built on a 22 nm manufacturing process. It has 8 CPU cores and supports hyper-threading technology with up to 16 parallel threads. The base frequency of the CPU cores is 2.4 GHz, and the turbo frequency is 3.2 GHz. The level 1 cache consists of 8 × 32 KB 8-way set associative instruction and data caches. The level 2 cache consists of 8 × 256 KB 8-way set associative caches. The level 3 cache is a 20 MB shared cache. It supports many features such as MMX instructions, SSE (streaming SIMD extensions), AVX (advanced vector extensions), TBT 2.0 (turbo boost technology 2.0), etc. The core voltage is 0.65–1.3 V. The maximum operating temperature is 72 °C. The minimum power dissipation is 32 W in the C1E state and 12 W in the C6 state.
The Tesla K40c GPU is a high-end professional graphics card. Built on the 28 nm process and based on the GK110B graphics processor, the card supports DirectX 12.0. The GK110B graphics processor is a large chip with a die area of 561 mm² and 7.08 billion transistors. It features 2880 shading units, 240 texture mapping units, and 48 ROPs. NVIDIA has placed 12,288 MB of GDDR5 memory on the card, connected via a 384-bit memory interface. The GPU operates at a frequency of 745 MHz, and the memory runs at 1502 MHz. Being a dual-slot card, the NVIDIA Tesla K40c draws power from 1 × 6-pin + 1 × 8-pin power connectors, with a power draw rated at 245 W maximum. The Tesla K40c is connected to the rest of the system using a PCIe 3.0 × 16 interface. The card measures 267 mm in length and features a dual-slot cooling solution.
2.5. Software Adopted in This Study
The software is developed based on the Microsoft Windows 7 64-bit operating system. The software ecosystem applied in this study is composed of MPICH2, Microsoft VC++2010 with OpenMP, and NVIDIA CUDA 6.5.
MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard. The goals of MPICH are: (1) to provide an MPI implementation that efficiently supports different computation and communication platforms, including commodity clusters, high-speed networks, and proprietary high-end computing systems, and (2) to enable cutting-edge research in MPI through an easy-to-extend modular framework for derived implementations. MPICH is distributed as source code, and it has been tested on several platforms, including Linux (on IA32 and x86-64), Mac OS/X (PowerPC and Intel), Solaris (32- and 64-bit), and Windows.
The OpenMP API supports multi-platform shared-memory parallel programming in C/C++ and Fortran. The OpenMP API defines a portable, scalable model with a simple and flexible interface for developing parallel applications on platforms from the desktop to the supercomputer. In this research, we adopted the OpenMP in Visual C++2010 implementation to develop parallel codes. The OpenMP C and C++ application program interface lets us write applications that effectively use multiple processors.
CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the GPU. With millions of CUDA-enabled GPUs sold to date, software developers, scientists, and researchers are using GPU-accelerated computing for broad-ranging applications. The NVIDIA CUDA Toolkit adopted in this study provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing the performance of applications.
2.6. Hydro-Meteorological Data Utilized in This Study
The study area of this research is the Ba River basin. It originates from the north slope of the Qinling Mountains, China. The full length of the Ba River is 92.6 km. The elevation difference from the headwater to the outlet of the river is 1142 m. The total slope is 12.8%. The catchment area is 2577 km². The Ba River basin is an asymmetric watershed: the left bank tributaries are sparse and long, while the right bank tributaries are dense and short. The Ba River is a mountainous river, and its discharge hydrograph rises and falls steeply. The peak flow usually occurs in the summer season, and the dry season is winter. The average annual precipitation of the study area is 630.9 mm. The average annual evaporation is 949.7 mm. The average annual runoff is 493.1 million m³. There are ten rainfall gauges located in this area, and the outlet station is the Maduwang station. Observed daily rainfall, evaporation, and daily average discharges from 2000 to 2010 were utilized as the calibration data set. The map of the Maduwang catchment is demonstrated in Figure 6.
In this study, we could only obtain daily data for 11 years to carry out the model calibration. A wider time window should be applied in future studies to ensure that the model and methods perform equally well across several climatic fluctuations, correlations, and combinations of input parameters. This is mainly due to the large variability and correlation of hydrological-cycle processes, whose complexity can only be represented by a temporal window of at least 30 years (corresponding to the climatic scale). This is a known issue in hydrological-cycle processes and corresponds to the so-called Hurst behavior, long-term persistence, or long-range dependence [27].