Bayesian Modeling of Flood Frequency Analysis in Bangladesh Using Hamiltonian Monte Carlo Techniques

In recent years, several Bayesian Markov chain Monte Carlo (MCMC) methods have been proposed in extreme value analysis (EVA) for assessing the flood risk in a certain location. In this study, the Hamiltonian Monte Carlo (HMC) method was employed to obtain the approximations to the posterior marginal distribution of the Generalized Extreme Value (GEV) model by using annual maximum discharges in two major river basins in Bangladesh. As a comparison, the well-known Metropolis-Hasting (MH) algorithm was also applied, but did not converge well and yielded skewness values opposite those of HMC and the statistical characteristic of the data sets. The discharge records of the Ganges and Brahmaputra rivers in Bangladesh for the past 42 years were analyzed. To estimate flood risk, a return level with 95% confidence intervals (CI) has also been calculated. Results show that the shape parameter of each station was greater than zero, which describes the heavy-tailed Frechet cases of the GEV distributions. One station, Bahadurabad in the Brahmaputra river basin, estimated 141,387 m3·s−1 with a 95% CI range of [112,636, 170,138] for the 100-year return level, and the 1000-year return level was 195,018 m3·s−1 with a 95% CI of [122,493, 267,544]. The other station, Hardinge Bridge at the Ganges basin, estimated 124,134 m3·s−1 with a 95% CI of [108,726, 139,543] for the 100-year return level, and the 1000-year return level was 170,537 m3·s−1 with a 95% CI of [133,784, 207,289]. As Bangladesh is a flood-prone country, the approach of Bayesian with HMC in EVA can help policy-makers to plan initiatives that could result in preventing damage to both lives and assets.


Introduction
In Bangladesh, major floods are frequent, due to its unique geographic location. About one-quarter to one-third of the country is inundated by overflowing rivers during the monsoon season almost every year. Bangladesh is in the active delta of three major rivers, the Ganges, Brahmaputra and Meghna (GBM). The intensity and time duration of floods in Bangladesh typically depends on the GBM river system. Bangladesh is an agriculture-based economy. Calculating the risk level of river discharge is important for making plans to protect the ecosystem and increase crop and fish production.
Extreme value analysis (EVA) is a branch of probability theory which deals with the stochastic behavior of extreme values of a set of random variables. Typically, EVA is applied for describing a rare event, (e.g., the upper or lower tails of a distribution) [1]. The techniques of EVA are becoming widely applied in various disciplines, including environmental science [2], engineering [3], finance [4], hydroclimatology [5,6] and water resources engineering and management [7]. The fundamental concepts, techniques and guidelines of the application of the theory of EVA were detailed by Coles [8]. In hydrology, the goal of EVA is to estimate the risk to human beings and environments by extrapolating a certain range of historical observed values, such as for flood frequency analysis (FFA) or rainfall frequency analysis. In particular, EVA often demands estimation of the probability of hydrologic events that are more extreme than any that have already been observed. Here an "extreme event" is defined as the maximum value of a quantity over a given period, for example, the maximum annual discharge in a river basin.
In the application of EVA to flood peaks in at-site and regional settings, the generalized extreme value (GEV) distribution has been widely used [9][10][11]. The GEV distribution, introduced by Jenkinson, is a three-parameter distribution that associates three extreme value types into a single form [12], and is commonly used for frequency analysis of extreme values. It has been applied in various studies on regional FFA [13][14][15]. One of the principle goals of FFA is the estimation of the frequency of rare events. For practical use in the design of hydraulic structures, water management strategies and flood insurance, there is great demand for improved accuracy in the estimations yielded by FFA [16,17].
Several techniques have been recommended for parameter estimation in extreme value models. Each technique has its pros and cons. The likelihood-based technique is attractive, but the difficulty is the regularity conditions that are required for the usual asymptotic properties associated with the maximum likelihood estimator to be valid. Subsequently, Bayesian technique, an alternative parameter estimator technique of maximum likelihood estimators, has become popular in recent years [8]. The popularity can be clarified by the rediscovery of the utility of Markov chain Monte Carlo (MCMC) algorithms in the early 1990s [18]. The other reasons for the popularity of Bayesian analysis are the ability to comprise other sources of information through a prior distribution, and the posterior distribution, the output result of a Bayesian analysis, provides a more complete inference than the corresponding maximum likelihood analysis.
MCMC methods are well-known simulation methods, and are widely used for simulating data from an arbitrary distribution. MCMC methods afford a way of simulating from complex distributions by simulating from Markov chains which have the target distributions as their stationary distributions. Besides the general algorithms of MCMC, namely the Gibbs [19], the Metropolis [20,21] and the Metropolis-Hastings [22], for the purpose of estimating the parameters in GEV distributions, Hamiltonian Monte Carlo (HMC) algorithms are more efficient. In addition, the HMC parameter estimation is relatively robust and much faster [23]. In extreme value analysis with GEV models, avoidance of random-walk behavior is one of the major advantages of HMC [24]. Though HMC is much more computationally costly than Metropolis or Gibbs sampling, its proposals are typically much more effective. Therefore, it does not need as many samples to define the posterior distribution.
In determining the hydrologic risk of failure, the return period analysis is widely used. A common use of the return period is to extrapolate the recurrence interval of an event, such as flood peak discharges, droughts, earthquakes, landslides, and others. As an example, one can estimate the magnitude of a flood peak that is expected to occur once in 100 years or any other chosen period [25]. It can be estimated from quantiles of a parametric probability distribution, such as GEV distribution, fitted to the extreme values [26]. A full Bayesian MCMC provides a more exact description of flood risk and the parameter uncertainties with more accurate credible intervals.
The main aim of this article is to calculate a Bayesian regional model for the annual maximum discharge of two major rivers in Bangladesh, as well as flood risk estimation and extrapolation of the return period for the two rivers. For the block maxima or annual maximum discharge, the GEV distribution is used. For describing the Bayesian, the Hamiltonian Monte Carlo is used as the MCMC algorithm.

Data and Study Area
The area of the GBM river basin, the third largest freshwater outlet to the world's oceans [27], is about 1.7 million km 2 [28], of which 93% of the catchment is outside the geographic boundary of the Water 2018, 10, 900 3 of 21 country [29]. In the GBM basin, most of the rivers originate from the Himalayas, north of the country, and flow through the country to the Bay of Bengal, south of Bangladesh. The Ganges river originates at the Gangotri glaciers in the Himalayas, passes through Nepal, China and India and enters through the west part of the country, finally ending in the Bay of Bengal at Bangladesh. It is a snowmelt-fed river; its natural flow is controlled by several dams which were constructed by the countries upstream. The dam with the most direct influence on the flow of the Ganges into Bangladesh is the Farakka Barrage in India, located about 16 km upstream from the border. The main objective of the dam is to divert about 1130 m 3 ·s −1 of water via a canal to the Hooghly River for use in India during the dry season [30]. This reduces the discharge that reaches Bangladesh. The dam was commissioned in May 1975. This has caused a dispute between India and Bangladesh, with some temporary agreements leading to a water use treaty in 1996 [31] which established the discharge share for each country during the dry season from January to May, with 50/50 sharing when the discharge is lowest and a maximum diversion to India of 1130 m 3 ·s −1 . The treaty does not apply during the monsoon season when water is plentiful, and the diverted water flow is less than 10% of the average discharge during the monsoon period.
Since the Farakka Barrage started operation, the downstream flow has reduced significantly during the dry season months (from December to May). Among the wet months (from June to November), the flow of water also tends to be reduced in the early wet months (June and July) [32]. During June and July, the average monthly maximum peak discharge decreased by 26% and 4% respectively [33]. Also, the sediment movement has been reduced by the low flows. The lack of low-flow transport could be accountable for raising the riverbed levels [34]. However, in August, September and October, the river flow of this basin is dominated by monsoon rainfall, about 70% of rainfall in Bangladesh occurs during this time. Therefore, the Farakka Barrage has less impact on the monsoon-dominated period [32]. According to officials, "all flood gates of Farakka are kept open during the peak rainy season" [35]. Thus, during the period when annual maximum discharge occurs it can be assumed that there is no active, varying regulation of the discharge. However, the more passive effect of the open dam itself will affect the flow.
Pal & Pani [36] compared the Ganges discharge upstream and downstream of the Farakka Barrage. Over the study period (1998-2014) the annual maximum discharge downstream was always greater than upstream, which they conclude is "probably" a result of extra water release from the barrage during strong monsoon flow.
As a measure to counter the reduced flow in the dry season by retaining monsoon rain water for use in the dry season in Bangladesh, another barrage has been planned on the Ganges within the country. However, the project was postponed in 2017, pending further negotiations between India and Bangladesh [37].
The Brahmaputra river begins from a glacier in the Kailash range in Tibet at an elevation of 5300 m above sea level. It is a major transboundary river and the drainage area is around 5,300,000 km 2 . The basin is situated within four different countries: China (50.5%), India (33.6%), Bangladesh (8.1%) and Bhutan (7.8%) [38]. This river enters into the country from the north. China is planning to construct new dams in Tibet that will likely influence the Brahmaputra [39].
The Meghna river is comparatively small and runs through a mountainous region in India before entering Bangladesh. Major characteristics of the GBM river basin are shown in Table 1 (for more details see [40] and references therein).
The basins are rich with crop-related biodiversity and ecological diversity. Together they support livelihoods in this area. In the Ganges basin area, the average annual precipitation is about 1500 mm, and in the Brahmaputra basin it is about 2025 mm [41]. This is expected to increase under a warmer climate [42]. Perez and Henebry estimated that from June-October the maximum of accumulated precipitation is over 250 mm month −1 along the foothills of the Himalayas extending from Himachal Pradesh to Eastern Nepal in the Ganges basin. For the eastern part of the Ganges basin up to 80 • E, the authors estimated moderate precipitation varying between 100 and 250 mm month −1 , while west of 80 • E an estimated 100 mm month −1 accumulates in the same time duration. In the Brahmaputra basin, over 600 mm month −1 were estimated from Sikkim to Arunachal Pradesh. The northern (above 30 • N) part of this basin, over the Tibet Plateau, is mostly dry [43].  Figure 1). The total area is 147,570 km 2 . Bangladesh is a riverine country; the land was formed by the river delta process. So, most of the country (79%) is a floodplain. There are some hilly areas, 12% of the total area, which are situated in the southeast and northeast regions of Bangladesh. The rest of the area (9%) is occupied by four uplifted blocks, which are in the northwest and central parts of the country. The climate of this area is the tropical monsoon type, with a hot summer monsoon and dry season in the winter. The effect of climate on hydrology has many facets. During the summer monsoon time, from June to October, extreme seasonal rainfall occurs. Also, in the Himalayas, the storage of water in the form of snow and ice (in glaciers) provides a large water reservoir that regulates annual water distribution. As most of the rivers originate in the Himalayas, the upper catchment areas are snow-covered and flow through steep mountains. As climatic variability (sea-surface temperature, atmospheric circulation system, El Niño-southern oscillation are well-known climatic factors in this area) occurs, the impact is felt in downstream countries, that is, India and Bangladesh. Also, the seasonal rainfall of upstream countries is a major issue that contributes significantly to the streamflow contribution in the major rivers of Bangladesh. Therefore, the flood peaks are basically determined by basin-wide precipitation [47]. In the Brahmaputra basin, over 600 mm month −1 were estimated from Sikkim to Arunachal Pradesh. The northern (above 30° N) part of this basin, over the Tibet Plateau, is mostly dry [43]. Bangladesh is situated between latitudes 20°30′ N and 26°45′ N, and longitude 88°0′ E and 92°45′ E ( Figure 1). The total area is 147,570 km 2 . Bangladesh is a riverine country; the land was formed by the river delta process. So, most of the country (79%) is a floodplain. There are some hilly areas, 12% of the total area, which are situated in the southeast and northeast regions of Bangladesh. The rest of the area (9%) is occupied by four uplifted blocks, which are in the northwest and central parts of the country. The climate of this area is the tropical monsoon type, with a hot summer monsoon and dry season in the winter. The effect of climate on hydrology has many facets. During the summer monsoon time, from June to October, extreme seasonal rainfall occurs. Also, in the Himalayas, the storage of water in the form of snow and ice (in glaciers) provides a large water reservoir that regulates annual water distribution. As most of the rivers originate in the Himalayas, the upper catchment areas are snow-covered and flow through steep mountains. As climatic variability (seasurface temperature, atmospheric circulation system, El Niño-southern oscillation are well-known climatic factors in this area) occurs, the impact is felt in downstream countries, that is, India and Bangladesh. Also, the seasonal rainfall of upstream countries is a major issue that contributes significantly to the streamflow contribution in the major rivers of Bangladesh. Therefore, the flood peaks are basically determined by basin-wide precipitation [47]. Streamflow data were collected from the Hydrology Division of the Bangladesh Water Development Board (BWDB). There are only 16 stations in Bangladesh, with more than 40 years of river discharge measurement data. Taking the 3 stations closest to the outlets of the 3 rivers of the GBM system should represent the discharge of the GBM system as a whole. The 3 rivers are linked, having a common outlet to the Bay of Bengal. The Brahmaputra and Ganges join to become the Padma to the west of Dhaka. Then the Meghna joins the Padma to the south of Dhaka and flows to the bay. Here, we chose the stations located just upstream of each of these major forks in the GBM system. Station Hardinge Bridge is in the Ganges river. Station Bahadurabad is in the Brahmaputra river, and station Bhairab Bazar is in the Meghna.
In this study, 42 years of data, from 1976-2017, are used. This excludes the period before the start of operations of the Farakka Barrage in 1975. Normally, water discharges are measured weekly by the velocity-area method. The Brahmaputra river is highly braided, so here the discharge measurement was carried out on multiple channels. The Bhairab Bazar station on the Meghna has a large amount of missing data over the time period (over 40%) and could not be useful for this analysis. As the Meghna only contributes about 10% of the total discharge of the GBM basin, this loss of data should not have much effect in an overall assessment of the trends in the GBM system. Thus, only the 2 stations (Hardinge Bridge and Bahadurabad) are used in this analysis. The stations' basic information is presented in Table 2. For modeling block maxima, here the Annual Maxima (AM), extreme value theory was used. Figure 2 shows the AM of river discharge of the two stations with a time series of 42 years from 1976 to 2017. Streamflow data were collected from the Hydrology Division of the Bangladesh Water Development Board (BWDB). There are only 16 stations in Bangladesh, with more than 40 years of river discharge measurement data. Taking the 3 stations closest to the outlets of the 3 rivers of the GBM system should represent the discharge of the GBM system as a whole. The 3 rivers are linked, having a common outlet to the Bay of Bengal. The Brahmaputra and Ganges join to become the Padma to the west of Dhaka. Then the Meghna joins the Padma to the south of Dhaka and flows to the bay. Here, we chose the stations located just upstream of each of these major forks in the GBM system. Station Hardinge Bridge is in the Ganges river. Station Bahadurabad is in the Brahmaputra river, and station Bhairab Bazar is in the Meghna.
In this study, 42 years of data, from 1976-2017, are used. This excludes the period before the start of operations of the Farakka Barrage in 1975. Normally, water discharges are measured weekly by the velocity-area method. The Brahmaputra river is highly braided, so here the discharge measurement was carried out on multiple channels. The Bhairab Bazar station on the Meghna has a large amount of missing data over the time period (over 40%) and could not be useful for this analysis. As the Meghna only contributes about 10% of the total discharge of the GBM basin, this loss of data should not have much effect in an overall assessment of the trends in the GBM system. Thus, only the 2 stations (Hardinge Bridge and Bahadurabad) are used in this analysis. The stations' basic information is presented in Table 2. For modeling block maxima, here the Annual Maxima (AM), extreme value theory was used. Figure 2 shows the AM of river discharge of the two stations with a time series of 42 years from 1976 to 2017. In the 20th century, the country faced major floods in 1951,1987,1988,1998. More recent floods include the years 2004 and 2007. In 1998, 68% of total land of the country was inundated by water. At Bahadurabad station in 1988, the estimated maximum water level was 20.62 m and flowed above the danger level for 27 days. At this station, the theoretical danger level is above 19.5 m. In 1998, the estimated maximum water level was 20.37 m and flowed above the danger level for 66 days. On the other hand, the theoretical danger level of the Hardinge Bridge station is 14.25 m above water level.
In 1988, the estimated maximum level was 14.87 m and flowed above danger level for 23 days. In 1998, it was 15.19 m where water flowed above danger level for 27 days. In recent years, in 2004, 2007, 2015 water flowed above danger level at least once in Bahadurabad station. When the discharge of water flow increases in both river basins during the monsoon period (June to October), the water level increases, causing floods on the both sides of the river.

Materials and Methods
The GEV distribution is a standard tool for modeling flood peaks by using the annual maximum series (AMS). In this work, a Bayesian approach using the Hamiltonian Monte Carlo (HMC) and Metropolis Hasting (MH) algorithms was applied to estimate the parameters in the GEV model. In addition, extreme flood quantiles (up to 1000-year flood magnitude analysis) were extrapolated for risk estimation.

The GEV Distribution
The GEV distribution comprises into a single form all three Extreme Value (EV) distributions: Gumbel (EV-I, ξ = 0), Fréchet (EV-II, ξ > 0), and Weibull (EV-III, ξ < 0). The density function g(z) and cumulative distribution function G(z) of the GEV distribution can be written as [23], [8]: The distribution function is defined on the set {z: 1 + ξ (z − µ)/σ > 0}, where the parameters satisfy −∞ < µ < ∞, σ > 0 and ∞ < ξ < ∞. The distribution model has three parameters: µ is the location parameter, σ is the scale parameter and ξ is the shape parameter. The value of the shape parameter, ξ expresses the tail behavior of the distribution. The Gumbel distribution (when ξ = 0) has an exponentially decaying tail. The Fréchet distribution has a more slowly decaying tail and the Weibull distribution has an upper limit. The density function of all three forms of GEV distributions are shown in Figure 3 as an example case. When the discharge of water flow increases in both river basins during the monsoon period (June to October), the water level increases, causing floods on the both sides of the river.

Materials and Methods
The GEV distribution is a standard tool for modeling flood peaks by using the annual maximum series (AMS). In this work, a Bayesian approach using the Hamiltonian Monte Carlo (HMC) and Metropolis Hasting (MH) algorithms was applied to estimate the parameters in the GEV model. In addition, extreme flood quantiles (up to 1000-year flood magnitude analysis) were extrapolated for risk estimation.

The GEV Distribution
The GEV distribution comprises into a single form all three Extreme Value (EV) distributions: Gumbel (EV-I, ξ = 0), Fréchet (EV-II, ξ > 0), and Weibull (EV-III, ξ < 0). The density function g(z) and cumulative distribution function G(z) of the GEV distribution can be written as [23], [8]: The distribution function is defined on the set {z: 1 + ξ (z − µ)/σ > 0}, where the parameters satisfy −∞ < µ < ∞, σ > 0 and ∞ < ξ < ∞. The distribution model has three parameters: µ is the location parameter, σ is the scale parameter and ξ is the shape parameter. The value of the shape parameter, ξ expresses the tail behavior of the distribution. The Gumbel distribution (when ξ = 0) has an exponentially decaying tail. The Fréchet distribution has a more slowly decaying tail and the Weibull distribution has an upper limit. The density function of all three forms of GEV distributions are shown in Figure 3 as an example case.   Now suppose that the observed data set z = (z 1 , . . . , z n ) are realizations of a random variable with a density from the parametric family F = {f (z; θ):θ ∈ Θ)}. Also, suppose that prior beliefs about θ can be expressed by a probability density function π(θ) with no reference to the observed data. The likelihood for θ can be expressed as: The prior information and the likelihood are combined using Bayes' theorem to express a posterior distribution for θ as follows: So, the distribution notifies the beliefs about θ after observing the data and can be expressed as: The function of an EVA is to express the extremes of an observed process to find the probability of extreme events occurring in the future. If an appropriate prior can be specified, there are good reasons to choose Bayesian procedures. The main reason for rejecting the Bayesian procedures is the difficulty in calculating the integral in Equation (4). This problem can be solved by using simulation-based techniques such as MCMC to simulate realizations of the posterior distribution. Parameter estimates of the posterior distribution can then be obtained from the simulated sample.

MCMC Techniques (HMC)
MCMC techniques describe a method of simulating from a complex distribution by simulating from Markov chains which have the target distributions as their stationary distributions. There are many MCMC techniques, including HMC, which was used in this work. HMC was originally proposed by Duane [48] for simulating molecular dynamics under the name of Hybrid Monte Carlo. For an up to date review about the theoretical and practical aspects on HMC, the reader is referred to Neal [24]. The HMC method is used in the context of GEV models in this work.
In a HMC algorithm, a fictitious dynamical system is considered in which auxiliary variables called momentum, p ∈ R N , are introduced and the uncertain parameters θ ∈ R N in the target distribution π(θ) are treated as the variables for the displacement. The total energy, also known as the Hamiltonian function, of the fictious dynamical system is defined by H(θ, p) = V(θ) + W(p), where its potential energy is defined by V(θ) = −ln(P(θ|D)) and the kinetic energy is defined by W(p) = p T M −1 p/2, which only depends on p and some chosen positive definite "mass" matrix M ∈ R N×N . The Hamiltonian method is mentioned as a half-stochastic, hybrid approach that has a Molecular Dynamic trajectory. This property can be convenient when the HMC algorithm is implemented for large and complicated systems [49][50][51][52]. This method combines a molecular dynamic (MD) trajectory with a Monte Carlo (MC) acceptance-rejection step, for this reason it is called a hybrid method [51].
Using Hamilton's Equations, the evolution of (θ, p) through fictitious time t can be expressed as: The Hamiltonian function can be used to define a joint distribution, which can be expressed as: where β B = 1/K B T and K B is defined as the Boltzmann constant and T is temperature. Practically, Equations (6) and (7) have to be solved numerically using some time-stepping algorithm, most commonly-used is a leapfrog algorithm [43]. For time step δt, it works as follows: where ∇V is obtained by finite differences as: where, ∆ = [∆ 1 , ∆ 2 , . . . , ∆ N ] T is defined as the perturbation vector and h is a scalar that dictates the size of the perturbation of θ.
After each iteration of Equations (9)-(11), the output candidate is accepted or rejected according to the Metropolis criterion based on the value of the Hamiltonian H(θ, p). Summarization of the HMC algorithm can be expressed as follows: 1.
Initialize θ 0 to begin the algorithm.

3.
Initiate the leapfrog algorithm with (θ, p) and the evaluate the algorithm for L time steps to obtain (θ*, p*).

4.
Update θ* and obtain new analytic frequencies f a i and then compute H(θ*, p*).
If the probability in step 5 > r, then let θ in the next iteration be θ*, else keep as θ.
Repeat the steps 3-7 for N s samples. Here, ∆H = H(θ*, p*) − H(θ, p), and the estimated mean of the uncertain parameters can be expressed as:θ where the E( * ) defines the mean value of a quantity *. The variance of the uncertain parameters is expressed as: where the standard error is defined by σ θ = V(θ). The combination of the Bayesian inference and the HMC algorithm for parameter estimation is presented in Figure 4 as a flowchart.
In this approach, for estimating a Bayesian model using MCMC as the HMC algorithm, an open source programming language called Stan [53,54] was used for solving the mathematical terms.
To create an alternate benchmark to assess the accuracy of the MCMC results, another MCMC method, the MH algorithm, was used to estimate the parameters of the distribution models. An outline of the MH algorithm is given in Appendix A.

Inference for Return Levels
Estimates of return levels of the annual maxima are of particular interest in hydrologic extremes, as they give an estimate of the level the process is expected to exceed once, on average, in a given number of years. For the GEV distribution model, the return levels are defined in the following Equation (15): where = −log(1 − 1/ ); r is the return period (up to 1000-year flood magnitude analysis was estimated in this article). Moreover, by the delta method, where is the variance-covariance matrix of (,̂,̂) and evaluated at (,̂,̂). In the return level estimation, 95% confidence intervals (CI) were also calculated and presented.

Result and Discussion
Simulation based techniques, MCMC such as HMC and MH, have provided a way for analyzing the Bayesian techniques of extreme value data for calculating the risk level of flood frequency analysis in the major rivers in Bangladesh. This section shows the estimated parameters of the distribution model and presents the model using diagnostic plots defined as probability plots, quantile plots and density plots. Finally, the return levels of annual peak discharge were calculated, which yielded risk estimation for the subject river basin.

Inference for Return Levels
Estimates of return levels of the annual maxima are of particular interest in hydrologic extremes, as they give an estimate of the level the process is expected to exceed once, on average, in a given number of years. For the GEV distribution model, the return levels are defined in the following Equation (15) where y r = − log(1 − 1/r); r is the return period (up to 1000-year flood magnitude analysis was estimated in this article). Moreover, by the delta method, where V c is the variance-covariance matrix of (μ,σ,ξ) and evaluated at (μ,σ,ξ). In the return level estimation, 95% confidence intervals (CI) were also calculated and presented.

Result and Discussion
Simulation based techniques, MCMC such as HMC and MH, have provided a way for analyzing the Bayesian techniques of extreme value data for calculating the risk level of flood frequency analysis in the major rivers in Bangladesh. This section shows the estimated parameters of the distribution model and presents the model using diagnostic plots defined as probability plots, quantile plots and density plots. Finally, the return levels of annual peak discharge were calculated, which yielded risk estimation for the subject river basin.

Statistical Qualities of the Sample Data
The characteristics of the AM samples and best-fit distribution results of each station are shown in Table 3. The stations have similar AM values for mean and standard deviation. In a statistical analysis, skewness is a measure of the asymmetry of the probability distribution of a random variable about its mean. The Hardinge Bridge station is highly positive-skewed and the Bahadurabad station is more symmetric.
In a time series analysis, checking stationarity and homogeneity is essential if the results are to be applied to predict future events. Both parametric and non-parametric methods are used to detect the statistical significance of monotonic trends or non-stationarity.
The Von Neumann (VN) ratio test [55,56] and standard normal homogeneity (SNH) test [57,58] were used in the XLSTAT software package to check the homogeneity of the sample data set. These two test results are shown in Table 3. At the 5% significance level for the data sample size of 42, the critical value is 8.214 for measuring the homogeneity using the SNH test [58]. For both river data sets, the test statistic was less than the critical value. Thus, the sample data can pass as homogeneous. The VN ratio test also passed as homogeneous for both data series at the 5% significance level [56].
In a stationary data series, the statistical qualities are not time-dependent. While non-stationary variables provide misleading and spurious results. The Augmented Dickey-Fuller (ADF) test [59,60], a type of statistical test called a unit root test, was used to check the stationarity of sample data in this work. The ADF statistic is a negative number, and more negative is the stronger rejection of the hypothesis that there is a unit root, and thus the data series has stationarity. The results are shown in Table 3. The ADF statistic value of the stations were less than the critical value of −3.544 at the 5% significance level. Thus, the time series of this sample data pass as stationary. As the analyzed sample data were homogeneous and stationary, flood frequency analysis can be performed.
Also, the goodness-of-fit test statistic results of the commonly used frequency distributions, parametric distributions, in hydrology (see Methodology section in Alam et al. [5]) were also used and the results are presented in Table 3. The GEV distribution yielded the best-fit for Hardinge Bridge station. For Bahadurabad station, the GEV distribution was ranked third according to the test statistic results of goodness-of-fit test of the K-S and A-D. But the difference of the test statistic result of the GEV with the first ranked distribution was comparatively smaller.

Parameter Estimation of the GEV by the Bayesian MCMC Method
In practical applications, the objective of an EVA is typically an estimate of the probability of future events reaching extreme levels, naturally expressed through a predictive distribution. For example, an appropriate model for the annual maximum Z of a process is Z~GEV (µ, σ, ξ). Estimation of θ = (µ, σ, ξ) can be calculated based on the previous observed data set z = (z 1 , . . . , z n ). Defining a prior distribution is a necessary element of a Bayesian analysis. The prior distribution assumed was a trivariate normal on (µ, log(σ), ξ) with mean vector zero and diagonal and diagonal variance covariance matrices. Figure 5 shows the trace plots and posterior densities of the GEV parameters of both stations by using HMC. The HMC algorithm was implemented using the "rstan" package in the "R" programming language.
Typically, two properties are desired in the trace plots: stationarity and good mixing. Stationarity means the path is staying within the posterior distribution. More specifically, the traces all stick around a very stable central tendency and a center of gravity of each dimension of the posterior. In Figure 5, on the left side, experimental trace plots of the parameters have a stable stationarity characteristic. The second property, "good mixing" of a chain means each successive sample within each parameter is not highly correlated with sample before it. Visually, a rapid zig-zag motion of each path is seen, as the trace crosses the posterior distribution without getting mired anyplace. In the experimental trace plots, the second characteristic also shows very well. Two chains were used which are marked by black and red. The number of samples from the chains can be controlled by using the "warmup" and "iter" parameters. In the simulation process, 1000 warmup and 3000 iterations were used. The estimated posterior densities of the parameters are also shown in the same figure on the right side.
Water 2018, 10, x FOR PEER REVIEW 11 of 21 a trivariate normal on (µ, log(σ), ξ) with mean vector zero and diagonal and diagonal variance covariance matrices. Figure 5 shows the trace plots and posterior densities of the GEV parameters of both stations by using HMC. The HMC algorithm was implemented using the "rstan" package in the "R" programming language.
Typically, two properties are desired in the trace plots: stationarity and good mixing. Stationarity means the path is staying within the posterior distribution. More specifically, the traces all stick around a very stable central tendency and a center of gravity of each dimension of the posterior. In Figure 5, on the left side, experimental trace plots of the parameters have a stable stationarity characteristic. The second property, "good mixing" of a chain means each successive sample within each parameter is not highly correlated with sample before it. Visually, a rapid zigzag motion of each path is seen, as the trace crosses the posterior distribution without getting mired anyplace. In the experimental trace plots, the second characteristic also shows very well. Two chains were used which are marked by black and red. The number of samples from the chains can be controlled by using the "warmup" and "iter" parameters. In the simulation process, 1000 warmup and 3000 iterations were used. The estimated posterior densities of the parameters are also shown in the same figure on the right side. Besides the HMC method, the Metropolis Hasting (MH) algorithm, the most commonly used algorithm in the Bayesian MCMC method, was also applied for a basis of comparison to judge the suitability of the HMC method in this case. The algorithm is discussed in Appendix A. Figures A1  and A2 in Appendix A show the trace plot and posterior densities of the GEV parameters of the two stations where the MH method was applied. Judging from the figures, the parameter µ in each station was not converged well. On the other hand, the HMC gave a good convergence. Therefore, the estimated parameters and simulated values of the HMC were considered for the further analysis in this work.
The posterior means of the parameters with 95% confidence intervals for each station are given in Table 4 including the results of both the HMC and MH methods. Also, the effective number of samples, indicated as "n_eff", and healthy chain status, indicated as "Rhat" are presented in the same table.  Besides the HMC method, the Metropolis Hasting (MH) algorithm, the most commonly used algorithm in the Bayesian MCMC method, was also applied for a basis of comparison to judge the suitability of the HMC method in this case. The algorithm is discussed in Appendix A. Figures A1  and A2 in Appendix A show the trace plot and posterior densities of the GEV parameters of the two stations where the MH method was applied. Judging from the figures, the parameter µ in each station was not converged well. On the other hand, the HMC gave a good convergence. Therefore, the estimated parameters and simulated values of the HMC were considered for the further analysis in this work.
The posterior means of the parameters with 95% confidence intervals for each station are given in Table 4 including the results of both the HMC and MH methods. Also, the effective number of samples, indicated as "n_eff", and healthy chain status, indicated as "Rhat" are presented in the same table. In both stations, the shape parameters found by the HMC method are greater than zero, which yields heavy-tailed Fréchet cases. In Figure 5, it is also obvious that the posterior density of the shape parameter is positive skewed. The location and scale parameters define how dry or wet place is and how much variability there is in the yearly extremes. In Table 4, n_eff is an important tool to find the convergence diagnostic characteristics. When the effective number of samples, n_eff, is much lower than the actual number of iterations (3000) minus warmup (1000) in the chains (2 chains), it means the chains are inefficient, but possibly still valid. Warmup samples are used to adjust sampling, and so are not actually part of the target posterior distribution. It does not matter in the result how long the warmup continues. The second convergence diagnostic is the Gelman-Rubin convergence diagnostic, Rhat (R). When the output result of Rhat after convergence is above 1, it indicates that the chain has not yet converged, thus the result based on the n_eff should not be trusted. In a healthy set of chains, Rhat should approach 1. In Table 4, Rhat of all the parameters approached 1.00, and n_eff were well below the actual number of iterations (3000) minus warmup (1000). The MH method yields the shape parameter as negative for both stations. This is the opposite of both the HMC result and the statistical qualities of both data sets (see Table 3).

Diagnostic Plots and Return Level
The goal for fitting a statistical model to an observed data set is to reach a conclusion about some aspect of the population from which the data were drawn. Such a conclusion can be drawn onto the fitted model. Therefore, it is essential to check the model fits well. The various diagnostic plots for checking the accuracy of the GEV model fitted to the stream flow data of the two main rivers are shown in Figure 6. The practical application part of the EVA is the calculation of the return period analysis, which yields risk estimations for the event. These are also shown in Figure 6. In this figure the return level plot shows the discharge (m 3 ·s −1 ) heights of the maximum 1000-year return period with 95% CI. For calculating the 95% CI of the return level it is important to calculate the variance-covariance matrix, V c , of (μ,σ,ξ). The approximate variance-covariance matrix, which were calculated from the  A probability plot is defined as a comparison of the empirical and fitted distribution functions. With ordered block maximum data, the empirical distribution function should lie close to the unit diagonal. Any significant deviations from the linearity are symbolic of the approximation failing in the GEV model. A weakness of the probability plot for extreme value models is that the probability plot provides the least information in the region of most interest. This weakness is avoided by the quantile plot. Both the probability plot and the quantile plot draw the same information expressed on a different scale. However, the observations gained on different scales can be significant, because what looks like a reasonable fit on one scale may be a poor fit on the other scale. In Figure 6, both the probability plot and the quantile plot show the validity of the fitted model. Each set of the plotted points is near-linear. The corresponding density plot in Figure 6 also seems consistent with the histogram of the data. As the shape parameter is positive, the density plots of both stations show a positive skewness. In summary, all three diagnostic plots support the fitted GEV model.
The return level plot,ẑ r (from Equation (15)) against y r = − log(1 − 1/r) on a logarithmic scale, is particularly convenient for presenting extreme value models. In this process, the tail of the distribution is compressed, as a result return level estimates are easily made for long return periods. Return level heights were calculated by using the value of parameters in the Equation (15)  where Var(ẑ 1000 ) was 3.52 × 10 8 . For expressing the uncertainty level, the estimation of CI is important in risk analysis as well as the design purposes. For a certain return period, for example, for a 20-year or 50-year duration, the return level heights with 95% CI can be found from the return level plot in Figure 6.
It must be noted that in the Bahadurabad data there is a single data point (109,913 m 3 ·s −1 ) which may seem to be an outlier, as it is over 35% larger than the next highest data point. Although the river level was not as high as many of the smaller AM cases, the velocity was high. This maximum occurred in 2005, when there was no major flood. It is possible this is a result of influence of the Farakka Barrage, even though the operators claim all gates are left open during the rainy season. It is also possible that all AM data points are in part affected by the Farakka Barrage. Comparing AM for 40 years before Farakka was built to 30 years after, Gain and Giupponi found that both floods and droughts happen more frequently in the post-Farakka period [32]. Thus, it may be assumed that the results here could only be valid for the current Farakka Barrage operating period (from 1976) until such time when other major dams are built, possibly in the near future, or if operation policy of the Farakka gates changes.

Conclusions
This analysis has provided a demonstration of the Bayesian approach to modeling the extreme discharge of two major rivers in Bangladesh. Bangladesh is a flood-prone country because of its geographic characteristics. Knowing the risk level when designing infrastructure makes it possible to reduce the damage by water induced calamities in this area. Block maxima data, each from 42 years of data, was used in a GEV distribution model. There are several techniques to estimate the parameters in extreme value models. Bayesian analysis was used here because it provides a more complete inference than other parameter estimation methods, for example, the maximum likelihood method. For solving the Bayesian approach, a simulation-based technique (MCMC) was selected. As a more efficient, robust and much faster MCMC algorithm, the Hamiltonian Monte Carlo (HMC) algorithm was used. Avoidance of random-walk behavior is a major advantage of this algorithm. Before implementing the sample data in Bayesian MCMC flood frequency analysis, the homogeneity and stationarity was checked by the provided test. The sample data passed as homogeneous and stationarity. Research into the operations of the dam with the most direct influence on the flow of the Ganges into Bangladesh, the Farakka Barrage, showed that its gates are all left open during the monsoon season. Although the barrage has been shown to yield higher AM than before [32]. There is no active regulation of discharge by that dam during the monsoon.
Estimation of parameters, θ = (µ, σ, ξ) were calculated based on the historical observed data set z = (z 1 , . . . , z n ). In Bayesian analysis for the GEV distribution, the prior distribution is assumed to be a trivariate normal on (µ, log(σ), ξ) with mean vector zero and diagonal and diagonal variance covariance matrices. Two important characteristics, stationarity and good mixing, were checked in the trace plots. The resulting traces of the parameters had stable stationarity characteristics, and good mixing. Two chains were used, where the number of samples from the chains can be controlled by using the "warmup" and "iter" parameters. In the simulation process, 1000 warmup and 3000 iterations were used. There results also show the estimated posterior densities of parameters. Besides the graphical diagnostic process, numerical diagnostic methods; the effective number of samples (n_eff) and the Gelman-Rubin convergence diagnosticR were also used. Both gave a good convergence result. 95% credible intervals of the parameters were also presented. For both stations, the shape parameters were greater than zero, that shows a heavy-tailed Fréchet case with a positive skewness. The well-known Metropolis-Hasting (MH) algorithm was also used as a comparison with the HMC algorithm to assess the benefit of using HMC. The MH method resulted in poor convergence of the location parameter for each station, also yielding a negative shape parameter, which is opposite the HMC result, as well as opposite the statistical qualities of the data sets. Thus, the HMC is much better suited than the MH in this case.
Various diagnostic plots, such as probability plot, quantile plot and density plot with histogram of the data, were created to judge the accuracy of the GEV model fitted to the stream flow data of the two main rivers which were supported to the fitted GEV model. Return period estimations with 95% CI, which yields risk assessments for the events were also calculated. For the Bahadurabad station (in the Brahmaputra river basin), the estimated 100-year return level was 141,387 m 3 ·s −1 with a 95% CI range of [112,636, 170,138] and for 1000-year return level was 195,018 m 3 ·s −1 where 95% CI was [122,493,267,544]. The other station at Hardinge Bridge (in the Ganges river basin), the estimated 100-year return level was 124,134 m 3 ·s −1 with 95% CI range of [108,726, 139,543], and for the 1000-year return level was 170,537 m 3 ·s −1 with a 95% CI range of [133,784,207,289]. For a certain return period, such as for a 20-year or 50-year duration, it was also estimated with 95% CI. These results (for the Bahadurabad station) may be influenced by the Farakka Barrage. Future construction of dams or a change in Farakka gate control policy would change the hydrology of the river system. As Bangladesh is a flood prone country, this study can help policy-makers to plan initiatives that could result in preventing damage to both lives and assets. Further, the distributions found here could be used in current research on the effects of the Farakka Barrage or by future researchers as a historic baseline of comparison to determine the effects of future dam construction in the GBM system.