Maximum Entropy-Copula Method for Hydrological Risk Analysis under Uncertainty : A Case Study on the Loess Plateau , China

Copula functions have been extensively used to describe the joint behaviors of extreme hydrological events and to analyze hydrological risk. Advanced marginal distribution inference, for example, the maximum entropy theory, is particularly beneficial for improving the performance of the copulas. The goal of this paper, therefore, is twofold; first, to develop a coupled maximum entropy-copula method for hydrological risk analysis through deriving the bivariate return periods, risk, reliability and bivariate design events; and second, to reveal the impact of marginal distribution selection uncertainty and sampling uncertainty on bivariate design event identification. Particularly, the uncertainties involved in the second goal have not yet received significant consideration. The designed framework for hydrological risk analysis related to flood and extreme precipitation events is exemplarily applied in two catchments of the Loess plateau, China. Results show that (1) distribution derived by the maximum entropy principle outperforms the conventional distributions for the probabilistic modeling of flood and extreme precipitation events; (2) the bivariate return periods, risk, reliability and bivariate design events are able to be derived using the coupled entropy-copula method; (3) uncertainty analysis highlights the fact that appropriate performance of marginal distribution is closely related to bivariate design event identification. Most importantly, sampling uncertainty causes the confidence regions of bivariate design events with return periods of 30 years to be very large, overlapping with the values of flood and extreme precipitation, which have return periods of 10 and 50 years, respectively. The large confidence regions of bivariate design events greatly challenge its application in practical engineering design.


Introduction
Extreme hydrological events (e.g., floods, rainstorms, droughts) have had disastrous effects on society and the environment in recent years.Specifically, floods, as one of the most frequent and costly natural disasters, have posed a serious threat to the human life and economic development [1][2][3].A report issued by UNISDR (2015) highlights the fact that, between 1995 and 2015, floods affected 2.3 billion people, worldwide, accounting for 56% of the people affected by weather-related disasters [4,5].Flood risk analysis can provide extremely valuable information by estimating the occurrence of floods for flood control and disaster mitigation, hydraulic structure design, reservoir management, and so on [6,7].It is widely known that, in rain-dominant watersheds, river floods are commonly triggered by extreme precipitation events [8,9].Therefore, in practice, reducing the flood risk also requires information on extreme precipitation [10][11][12].Consequently, the present work focuses on exploring the bivariate risk of annual maximum flood discharge (AMF) and associated extreme precipitation (Pr) events.
Up until now, copula functions have been used extensively to evaluate the bivariate risk of hydro-meteorological events [13][14][15][16][17].For instance, Chen et al. (2013) constructed four-dimensional copulas to model the behaviors of drought events; She et al. (2016) applied copula-based severity-duration-frequency curves to evaluate the spatio-temporal variability of dry spells and wet spells.Compared with traditional bivariate hydrologic modeling, the main advantage of copulas is that they allow the joint dependence structure to be modeled, without any restrictions on marginal distributions [18].Given this, practitioners can flexibly choose marginal and joint probability functions [19,20].Consequently, the selection of marginal distribution is of crucial importance as it strongly impacts the performance of the copula in modeling bivariate variables [6].
However, distributions that model the univariate hydro-meteorological series are diverse.In terms of hydrologic frequency analysis, the most widely used distributions are parametric ones, such as the general extreme value distribution, normal distribution, lognormal distribution, Pearson type 3 distribution, Log Pearson type 3 distribution, Gamma distribution and so on [6,[21][22][23].When utilizing these distributions, one obvious drawback is that selecting the appropriate distribution from a variety of candidates is time-consuming [12].Worse, if the univariate probability distribution is misidentified, results derived from the copulas tend to be underestimated/overestimated [24].Hence, a widely applicable probability distribution with high accuracy is urgently needed.The maximum entropy principle (MEP), first expounded by [25], offers a methodology for deriving probability distribution functions (PDFs) with a minimum of bias from limited information in a more objective way [7,26].The MEP proposes a criterion for selecting the most appropriate PDF on the basis of the rationale that the desired PDF possesses maximum uncertainty, subject to a set of constraints [27].As Zhang and Singh (2012) stated, an entropy-based methodology is able to reach a universal solution, and can better capture the shape of the probability density function, without first knowing the format of the a priori distribution [24].More moments of observations, beyond just the second moment, can be accounted for in the MEP approach.Additionally, various generalized distributions, such as Pearson type 3 distribution, Gamma distribution, etc., can be derived from the MEP-based distribution using different constraints [28,29].Attracted by the splendid performance of MEP distribution, therefore, it has been extensively used in the hydrology field [6,12,24,30].For instance, Mishra et al. (2009) employed the entropy concept to investigate the spatial and temporal variability of precipitation time series for the State of Texas, USA [31]; Rajsekhar et al. (2013) used the entropy concept to identify the homogenous regions based on drought severity and duration [32].
Given the above, the present work takes advantage of the outstanding performance of MEP distribution, and subsequently develops a framework based on a coupled MEP-copula model for bivariate hydrological risk analysis in terms of AMF and Pr.
Also of note is that uncertainty accompanies the copula-based hydrological risk analysis.As Michailidi and Bacchi (2017) stated, flood risk evaluation without accounting for uncertainty is deceptive [33].Serinaldi (2013) also stressed that the uncertainty of multivariate design event estimation should be considered carefully for practical application, rather than speculation [34].However, previous studies have paid considerably less attention to the impact of uncertainty on hydrological risk analysis [34][35][36].Therefore, another contribution of this paper is to present a framework aiming to reveal the impact of marginal distribution selection uncertainty and sampling uncertainty on hydrological risk analysis.The two sources of uncertainty are often overlooked in spite of their widely recognized importance; particularly sampling uncertainty, due to its difficult estimation and interpretation [35,36].
The Loess Plateau (LP) is known as the "cradle of Chinese civilization", and is also one of the most serious soil erosion areas worldwide.On the LP, annual average soil erosion reaches to around 2000-2500 t/km 2 , and the area suffering severe soil and water loss covers more than 60% [37].Sparse vegetation cover, highly intense rainfall events, and the long history (over 5000 years) of human activities are generally considered to be the principal factors causing severe soil loss on the LP [38].Due to the arid and semi-arid continental monsoon climate, however, most previous studies have primarily focused on low flow and drought conditions [39,40].Studies investigating the bivariate risk of flood and extreme precipitation events for the LP are still few.
Consequently, the present study primarily aims to advance the coupled MEP-copula model for bivariate risk analysis, and to reveal the impact of the marginal distribution selection uncertainty and sampling uncertainty on hydrological risk analysis.The developed framework is exemplarily applied for two catchments of the LP.The remainder of the paper is constructed as follows.Section 2 describes the study area and data.Section 3 introduces the methods adopted in this study.The results and discussion are presented in Section 4. Section 5 shows the main conclusions drawn from this study.

Study Area
The Weihe River basin (104-107 • E and 33-34 • N), located in the southern part of the LP, was selected as our study area (Figure 1).The basin has a typical continental climate, and lies in the semi-humid and semi-arid transitional zone [41].The Weihe River (hereafter WR) provides the water supply for 9300 km 2 of fertile fields in the Guanzhong Plain, and more than 61% of the Shaanxi Province's population [42].Additionally, the start-point of the well-known Silk Road Economic Belt, Xi'an City, is situated in this basin.Mean annual precipitation varies between 400 and 600 mm, of which approximately 70% falls between June and September.Floods occur frequently after rainstorms.The largest gauged flood event at the Linjiacun station since 1960 occurred in 1966, and was 4200 m 3 /s.
The Weihe River basin is also one of the most serious soil loss areas on the LP.Areas suffering from severe soil loss cover approximately 65% of the total land area of this basin.It is of note that floods accelerate soil and water loss.Accelerated serious soil loss has caused severe sediment deposition in the lower reach of the WR, which poses great challenges for local flood control.primarily focused on low flow and drought conditions [39,40].Studies investigating the bivariate risk of flood and extreme precipitation events for the LP are still few.
Consequently, the present study primarily aims to advance the coupled MEP-copula model for bivariate risk analysis, and to reveal the impact of the marginal distribution selection uncertainty and sampling uncertainty on hydrological risk analysis.The developed framework is exemplarily applied for two catchments of the LP.The remainder of the paper is constructed as follows.Section 2 describes the study area and data.Section 3 introduces the methods adopted in this study.The results and discussion are presented in Section 4. Section 5 shows the main conclusions drawn from this study.

Study Area
The Weihe River basin (104-107° E and 33-34° N), located in the southern part of the LP, was selected as our study area (Figure 1).The basin has a typical continental climate, and lies in the semihumid and semi-arid transitional zone [41].The Weihe River (hereafter WR) provides the water supply for 9300 km 2 of fertile fields in the Guanzhong Plain, and more than 61% of the Shaanxi Province's population [42].Additionally, the start-point of the well-known Silk Road Economic Belt, Xi'an City, is situated in this basin.Mean annual precipitation varies between 400 and 600 mm, of which approximately 70% falls between June and September.Floods occur frequently after rainstorms.The largest gauged flood event at the Linjiacun station since 1960 occurred in 1966, and was 4200 m 3 /s.
The Weihe River basin is also one of the most serious soil loss areas on the LP.Areas suffering from severe soil loss cover approximately 65% of the total land area of this basin.It is of note that floods accelerate soil and water loss.Accelerated serious soil loss has caused severe sediment deposition in the lower reach of the WR, which poses great challenges for local flood control.

Dataset
The Linjiacun (107°03′ E, 34°22′ N) and Huaxian (109°78′ E, 34°51′ N) stations are important control stations upstream and downstream of the Weihe River basin, respectively.The locations of the two stations are displayed in Figure 1.Annual maximum flood records (1960-2012) from the two stations are utilized.The data quality was strictly controlled by the hydrology bureau of the Yellow

Dataset
The Linjiacun (107 • 03 E, 34 •  stations are utilized.The data quality was strictly controlled by the hydrology bureau of the Yellow River Conservancy Commission before the data was released.Data collected from Linjiacun and Huaxian stations can characterize the water hazard control in the Weihe River basin. Daily precipitation data  are provided by the China Meteorological Data Sharing Service System (http://cdc.cma.gov.cn).The flood discharge is closely linked to the accumulated rainfall amounts before the occurrence of annual peaks [43].Given this, the extreme precipitation event (Pr) used in present study is defined as: where Pr denotes the accumulated rainfall from the 1st to i-th day, l is the occurrence time of peak discharge Q, n (n = 0, 1, 2, 3, 4) indicates lag time (i.e., time from peak discharge to the beginning of rainfall), Rain i means the i-th day of rainfall.The Pr 1 , Pr 2 , Pr 3 , Pr 4 and Pr 5 represent the accumulated 1-, 2-, 3-, 4-and 5-day consecutive rainfall amounts (i.e., n = 0, n = 1, n = 2, n = 3, n = 4).The Thiessen polygon method is applied to compute the areal accumulated rainfall.
To select the extreme precipitation events most closely correlated to AMF, the Kendall's tau correlation coefficient was computed (Table 1).The Kendall's tau is a rank-based coefficient that is robust to departures from normality.It can be found from Table 1 that Pr 2 and Pr 3 were most closely correlated with AMF as gauged at Xianyang and Huaxian stations, respectively.

Methodological Framework
As mentioned above, the aim of this paper is to disclose the bivariate hydrological risk and to reveal the impact of marginal distribution selection uncertainty and sampling uncertainty on hydrological risk analysis.To achieve this goal, this paper presents the following framework, as shown in Figure 2.
First, appropriate marginal distributions for AMF and Pr series were ascertained from the MEP distribution, Pearson type III distribution (P3), lognormal distribution (Logn), normal distribution (Norm) and gamma distribution (Gam).These parametric distributions are popular for characterizing the probability distributions of extreme hydrological events due to their better performance [6,44].Second, we constructed copula models to depict the dependence structure of AMF and Pr series by joining their marginal distributions.Afterwards, the joint return periods, risk and reliability of AMF and Pr pairs were estimated for hydrological risk analysis.Last, the bivariate hydrological design events of specific joint return period were selected for hydraulic engineering design.To provide robust information for hydraulic structures design, we examined the impact of the marginal distribution selection uncertainty and sampling uncertainty on bivariate hydrological design event estimation.Here, the 6 candidate marginal distributions were combined with each other to form 36 combinations for modeling the AMF and Pr series.These combinations were utilized to explore the impact of marginal distribution uncertainty.Moreover, one Monte Carlo-based algorithm was designed to discover the impact of sampling uncertainty.
The Shannon entropy

( ) H x of a probability density function (PDF) ( )
where a and b denote the lower and upper limits of the variable X, respectively.In order to attain the least biased probability distribution, the maximum entropy principle is performed by maximizing the entropy given by Equation (2) subject to the following constraints
The Shannon entropy H(x) of a probability density function (PDF) f (x) for a continuous variable where a and b denote the lower and upper limits of the variable X, respectively.In order to attain the least biased probability distribution, the maximum entropy principle is performed by maximizing the entropy given by Equation (2) subject to the following constraints where g r (x), r = 1, 2, • • • , n, denotes some known functions of x; n represents the number of constraints; g r (x) means the expectation of g r (x).Equation ( 3) states that the probability density function must satisfy the total probability theorem.Here, the g r (x) can be expressed as the power of x such that In general, different numbers of constraints would obtain different performances [50].In light of this, the present study employs 3 and 4 constraints to build the MEP-based distributions for AMF and Pr series, respectively.The corresponding distributions are denoted as MEP-3 and MEP-4, hereafter.The impact of the number of constraints in terms of MEP distribution on bivariate design event identification will be discussed in Section 4.
To achieve the maximization of entropy, the Lagrange multiplier method is one simple and frequently used method [51].Subject to the constraints expressed in Equations ( 3) and ( 4), the Lagrangian function L can be written as where λ r (r = 1, 2, • • • , n) denotes the Lagrange multipliers.f (x) can be attained through maximizing the function L, and therefore one differentiates L with respect to f (x) being equal to zero: Hence, the resulting maximum entropy-based PDF of a variable X in terms of the given constraints can be written by Inserting Equation (8) into Equations ( 3) and ( 4), respectively, we can obtain With the use of Equation ( 9), the zeroth Lagrange multiplier can be written as Substituting Equation (11) into Equation (8), f (x) can be expressed as Entropy 2017, 19, 609 Here, the PDF expressed in Equation ( 12) can preserve the most important statistical moments.The cumulative distribution function (CDF) can be obtained through integration of Equation ( 12) Based on the PDF function, the Lagrange multipliers λ r can be estimated by minimizing the convex function, shown as In the present study, the conjugate gradient method is employed to determine the Lagrange multipliers λ r in Equation ( 14).The method is superior for solving large-scale nonlinear optimization problems.Its primary advantages are super-linear convergence, simple recurrence formula, and less calculation.Readers interested in the detailed process of determining the Lagrange multipliers λ r through the conjugate gradient method are referred to the papers of Fan et al. (2016) [6].

Copula Function
The copula, as introduced by Sklar (1959) is a powerful tool for modeling the dependence structures of individual variables [52].A d-dimensional copula is defined as a multivariate distribution function F [0, 1] d → [0, 1], linking standard uniform marginal distributions.Formally, the copula can be divided into two components: individual univariate distributions, and a copula function describing dependence structures between variables based on the copula and its parameter(s).
According to Sklar's theorem, one d-dimensional multivariate distribution function F for random variables X 1 , X 2 , . . ., X d with marginal distribution of F 1 , F 2 , . . ., F d can be expressed as where Θ is the copula parameter vector and In the field of hydrology, Archimedean copulas are quite popular due to their explicit functional forms.Moreover, they are superior for characterizing a wide range of dependence structures with several desirable properties.In the present study, the Clayton, Frank, and Gumbel-Hougaard copulas, which belong to the Archimedean class of copulas, were employed to evaluate the bivariate hydrological risk.The specific formula of these copulas was first reported by Nelsen (1999) [18].

Joint Return Periods, Risk and Reliability
To conduct a bivariate frequency analysis, the joint probability behaviors are defined in terms of variables X and Y, with thresholds x and y, respectively: (1): {X>x} OR {Y>y}, (2): {X>x} AND {Y>y} Accordingly, the joint return periods (also called primary return periods) can be written as [53][54][55]: Here, µ T indicates the average inter-arrival time between two successive events (µ T = 1 for maximum annual events).In engineering practice, risk (hereafter, Risk) and reliability (hereafter, Reliability) are another two important indices for hydraulic structure design [56].To date, they have received widespread attention within the hydrological community [55][56][57][58][59].
Risk is defined as the probability of occurrence of at least one event exceeding the design event for the project life of n years: where p indicates the annual exceedance probability.
It should be noted that the Risk also can be determined from the return period T (T OR and T AND in this study) by substituting p = 1 T into Equation ( 18): Reliability, which signifies the probability that a dangerous event will not occur within a project life of n years, i.e., that a system will remain in a satisfactory state within its lifetime, is defined as:

Bivariate Design Event Derived from Joint Distribution
Practical engineering design applications desire one appropriate multivariate design event or an appropriate subset of multiplets, instead of a large set of potential multiplets for the specific return period [34,60,61].However, in a multivariate context, there exists a problem of inherent ambiguity, whereby various combinations of random variables X and Y share the same joint probability, and thus produce the same return period.Therefore, Salvadori et al. (2011) proposed a method for solving the ambiguity problem by identifying the most-likely design event [62].The essence of the method is to identify one design event lying on critical layers L F t for a critical level t with the largest joint probability density.The most-likely design realization δ can be written as: Here, f indicates the density of F(x, y) = C(F X (x), F Y (y)), which can be expressed as where f X (x) and f Y (y) represent the probability density function of F X (x) and F Y (y), respectively; c(F X (x), F Y (y)) indicates the probability density function of C(F X (x), F Y (y)).L F t is defined as: where t ∈ (0, 1).Then, the design event δ can be estimated by determining the largest joint probability density in the logarithmic domain on the critical layers L F t , with the corresponding (x * , y * ) as the design realization with the joint return period T (T OR and T AND in this study).

Uncertainty Due to the Marginal Distribution Selection
To assess the impact of marginal distribution selection uncertainty on the most-likely design event identification, we designed one experiment project by combining the 6 candidate distributions with each other to model the random variables X and Y.The 36 designed combinations are displayed in Figure 3.Then, the 36 fitted combinations (F X (x), F Y (y)) are carried into Equation ( 21), and thus the most-likely design events for the different combinations are obtained.Through comparison among these bivariate design events, the impact of marginal distribution uncertainty is expected to be discovered.

Sampling Uncertainty
To determine the impact of sampling uncertainty on the most-likely design event estimation, the following Monte Carlo-based procedures were designed: 1. Estimate the parameter Θ of the copula for the observations (i.e., X and Y) as well as the parameters x α and y α of the marginal distributions for X and Y, respectively; 2. Simulate B bivariate samples of size n on the basis of the copula parameter, and then apply the marginal backward transformations using the estimated parameters x α and y α .The simulated bivariate samples are denoted as ) . n is equal to the length of the observed sample.Here, B is set equal to 10,000; 3. Estimate the parameters Θ , x α and y α for the simulated sample * j Z using the same estimation method used for the observations; 4. Identify the most-likely design realization δ for different ( ) x, y pairs. 5. Estimate the confidence intervals for δ at 95% confidence level by the method of highest density regions (denoted as HDR) propose by Hyndman et al. (1996) [63].

Marginal Distribution Selection
To construct the copula model for (AMF, Pr) in the study regions, the first step was to select appropriate marginal distributions.Table 2 lists the relevant parameters for different marginal distributions for the AMF and Pr series in the study regions.

Sampling Uncertainty
To determine the impact of sampling uncertainty on the most-likely design event estimation, the following Monte Carlo-based procedures were designed: 1.
Estimate the parameter Θ of the copula for the observations (i.e., X and Y) as well as the parameters α x and α y of the marginal distributions for X and Y, respectively; 2.
Simulate B bivariate samples of size n on the basis of the copula parameter, and then apply the marginal backward transformations using the estimated parameters α x and α y .The simulated bivariate samples are denoted as Z * = (X * , Y * ) = x ij , y ij , (i = 1, . . ., n; j = 1, . . ., B ). n is equal to the length of the observed sample.Here, B is set equal to 10,000; 3.
Estimate the parameters Θ, α x and α y for the simulated sample Z * j using the same estimation method used for the observations; 4.
Identify the most-likely design realization δ for different (x, y) pairs. 5.
Estimate the confidence intervals for δ at 95% confidence level by the method of highest density regions (denoted as HDR) propose by Hyndman et al. (1996) [63].

Marginal Distribution Selection
To construct the copula model for (AMF, Pr) in the study regions, the first step was to select appropriate marginal distributions.Table 2 lists the relevant parameters for different marginal distributions for the AMF and Pr series in the study regions.
; PDF of Norm: ; PDF of Gam:    It can be seen from this figure that the CDFs and PDFs exhibit variations in fitting performance between the theoretical and empirical distributions.In spite of this, it is difficult to select an appropriate fitting distribution by visual assessment.Therefore, the widely used root mean square error (RMSE) was employed to select the most appropriate model from among the candidate distribution models.The marginal distribution characterized by the minimum RMSE value was selected as the preferred model.Moreover, the goodness-of-fit test (the Kolmogorov-Smirnov (K-S) It can be seen from this figure that the CDFs and PDFs exhibit variations in fitting performance between the theoretical and empirical distributions.In spite of this, it is difficult to select an appropriate fitting distribution by visual assessment.Therefore, the widely used root mean square error (RMSE) was employed to select the most appropriate model from among the candidate distribution models.The marginal distribution characterized by the minimum RMSE value was selected as the preferred model.Moreover, the goodness-of-fit test (the Kolmogorov-Smirnov (K-S) approach) was also performed to provide support in evaluating the validity of these distribution models.The K-S statistic (denoted as S) aims to quantify the largest vertical difference between the empirical and estimated distributions [64].The p-value of the K-S statistic was obtained by using Miller's approximation.A value of p bigger than 0.05 indicates that the candidate distribution can appropriately fit random variables at the 5% significance level.
The RMSE values and K-S test results are presented in Table 3.It can be seen from Table 3 that the p values were much higher than the significance level 0.05, signifying that these candidate distributions are suitable for fitting the distributions of the AMF and Pr series.The RMSE values listed in Table 3 indicate that the MEP-4 distribution could be selected as the appropriate distribution for fitting the AMF and Pr series in the upper catchment of Linjiacun station, while MEP-3 and MEP-4 performed best among the candidates for fitting the AMF and Pr series, respectively, in the upper catchment of Huaxian station.A closer look at the fitting performance of these distributions for the AMF series at Linjiacun station, as presented in Figure 4 and Table 3, indicates that the fitting performance among the different distributions were similar, except for the Norm distribution.The RMSE values were 0.0303, 0.0296, 0.0309, 0.0366 and 0.0405 for the MEP-3, MEP-4, P3, Logn and Gam distributions, respectively, and 0.0823 for the Norm distribution.In terms of the Pr 2 series in the upper catchment of Linjiacun station, the Norm distribution had the worst performance of all the candidates.
As for the AMF series at the Huaxian station, the MEP-3 distribution outperformed other candidates (RMSE = 0.0195).The RMSE values for MEP-4, P3, Logn, Norm and Gam distributions ranged from 0.0202 to 0.0686.As shown in Figure 4, the histogram of the AMF series at Huaxian station has two peaks.MEP distribution can deal with multiple modes, while conventional models fail to fit a distribution with more than one mode.This is the reason that the conventional distributions show poor performance when fitting the AMF series at Huaxian station, while for the Pr 3 series in the upper catchment of Huaxian station, a comparison of RMSE values among the candidates indicates that the MEP-4 distribution (RMSE = 0.0267) performed better than other candidates (ranging from 0.0299 to 0.0390).
In light of the above findings, it can be concluded that MEP-related functions provide a better alternative than other conventional distributions for modeling the AMF and Pr series.Specifically, due to the PDF of the variables consisting of a single mode, the fitting performances of MEP-related distributions were similar to certain conventional distributions.The fitting performance of distributions for the AMF and Pr 2 series at Linjiacun station and the Pr 3 series at Huaxian station are able to demonstrate this inference.Moreover, it also can be inferred that the MEP-related distributions outperform the conventional distributions, particularly when modeling the distribution of random variables exhibiting multiple modes in the histogram, such as the AMF series at Huaxian station.These findings are similar to those obtained in Zhou et al. (2010) and Liu and Chang (2011) when fitting the distribution of wind speed data [65,66].
Additionally, we should also note that there exist some disadvantages to the MEP distribution; for example, the CDF of MEP-related distributions cannot be expressed in a closed form, parameter estimation is computationally expensive, and the mathematical expression of MEP-related distributions is more complex to develop as a computer program [6,66].

Copula Function Construction
Once the marginal distribution is chosen, the next step is to estimate the parameters of the copula and select an appropriate copula function from among the Clayton, Frank and Gumbel copulas.The parameters of the copulas are estimated using the maximum pseudo-likelihood method.The constructed MEP-related distributions are used to quantify the marginal probabilities of the AMF and Pr series.The estimated parameters of these candidate copulas are listed in Table 4.To select the suitable copulas, the Cramér-von Mises test was used to test goodness-of-fit based on the empirical copula.Table 4 lists the goodness-of-fit statistics S n corresponding to the Cramér-von Mises criteria, and their associated p-value based on N = 10,000 parametric bootstrap samples.A larger value of S n indicates greater distance between the estimated and the empirical copulas.p-value > 0.05 means the estimated copula can be accepted at the 5% level.Results displayed in Table 4 illustrate that these candidate copulas are all acceptable for fitting the dependence structures between AMF and Pr in each region at a 5% significance level.
Further, the corrected Akaike information criterion (AICc) indicator is employed to select the copula with the highest fitting performance (shown in Table 4).The AICc indicator is much stricter than classical AIC, particularly when the size of the hydrological observations is limited [67].The copula distribution characterized by the minimum AICc value was selected as the preferred model.It can be seen from Table 4 that the Gumbel and Frank copulas should be chosen to model the joint distribution of AMF and Pr series in the upper catchments of Linjiacun and Huaxian stations, respectively.For simplicity, the bivariate model constructed by integrating the MEP distributions into the copula is denoted as MEP-copula.

Bivariate Return Period, Risk and Reliability Analysis Based on the MEP-Copula
Exploring the concurrence probabilities of various combinations of AMF and Pr is of critical importance for practical flood control and disaster mitigation.As expressed as Equations ( 16) and ( 17), the bivariate return periods are estimated based on the constructed MEP-copula models.Figure 5 displays the bivariate joint return periods of "AND" and "OR" cases for various (AMF, Pr) pairs.It can be seen from Figure 5 that the joint return period level of the "AND" case is concave, Entropy 2017, 19, 609 14 of 23 while that for the "OR" case is convex.Generally, the joint return period of the "AND" case is lower than that of the "OR" case.For instance, if both the AMF and Pr observed in the upper catchment of Linjiacun station are in the 20-year return period, the "AND" joint return period of the (AMF, Pr) pair is 42.23 years, while the "OR" joint return period is 13.10 years.Additionally, in practical terms, water resource managers and policy-makers can identify the return periods for various (AMF, Pr) pairs of observations or forecasts through Figure 5.   A,C) denote the joint return period for the "OR" case, while (B,D) denote the joint return period for the "AND" case.
In engineering design of hydrological infrastructures, risk can be defined as the likelihood of experiencing at least one event exceeding the design event over the design life (denoted as n) of the hydraulic structure [6].Furthermore, reliability over the project life is commonly chosen to describe the probability that the hydraulic structure will remain in a satisfactory state within its project life [60].In the present study, the "AND" joint return period case is applied to define the bivariate risk and reliability through Equations ( 18)-( 20).Here, the service time of a river levee is assumed to be 10 years, i.e., n = 10.
Figure 6 exhibits the bivariate risk and reliability under different designed AMF and Pr combinations in the two regions.It can be found from Figure 6 that the risk value would decrease as the designed AMF or Pr increased, which is contrary to the reliability value.In other words, higher values of the designed AMF or Pr for the hydraulic structure indicate a smaller probability for the occurrence of an undesirable flood event within a project life of n years, and a higher probability that the hydraulic structure will remain in a satisfactory state.
The implication for the bivariate risk and reliability of (AMF, Pr) pairs is to provide decision support for hydraulic structures and practical flood control.Decision-makers can attain the corresponding risk and reliability under various AMF and Pr scenarios as the design events for the hydraulic structure.For example, if the design event for a river levee in the downstream area of Huaxian station is set to (5000 m 3 /s, 39.02 mm), the corresponding risk (reliability) that that event  A,C) denote the joint return period for the "OR" case, while (B,D) denote the joint return period for the "AND" case.
In engineering design of hydrological infrastructures, risk can be defined as the likelihood of experiencing at least one event exceeding the design event over the design life (denoted as n) of the hydraulic structure [6].Furthermore, reliability over the project life is commonly chosen to describe the probability that the hydraulic structure will remain in a satisfactory state within its project life [60].
In the present study, the "AND" joint return period case is applied to define the bivariate risk and reliability through Equations ( 18)- (20).Here, the service time of a river levee is assumed to be 10 years, i.e., n = 10.
Figure 6 exhibits the bivariate risk and reliability under different designed AMF and Pr combinations in the two regions.It can be found from Figure 6 that the risk value would decrease as the designed AMF or Pr increased, which is contrary to the reliability value.In other words, higher values of the designed AMF or Pr for the hydraulic structure indicate a smaller probability for the occurrence of an undesirable flood event within a project life of n years, and a higher probability that the hydraulic structure will remain in a satisfactory state.
The copula functions that modeled the joint distributions of (AMF, Pr) pairs in the upper catchments of the Linjiacun and Huaxian stations were the Gumbel and Frank copulas, respectively.Figure 3 exhibits the analyzed combinations of different marginal distributions.Given T OR = T AND = 30 years, and the estimated most-likely design events under different combinations of marginal distributions are shown in Figure 7 and Table 5. significance level.The copula functions that modeled the joint distributions of (AMF, Pr) pairs in the upper catchments of the Linjiacun and Huaxian stations were the Gumbel and Frank copulas, respectively.Figure 3 exhibits the analyzed combinations of different marginal distributions.Given TOR = TAND = 30 years, and the estimated most-likely design events under different combinations of marginal distributions are shown in Figure 7 and Table 5.It can be seen from Figure 7 and Table 5 that there is a remarkable difference among these design events for the "OR" case under different combinations of marginal distributions.Among these estimated (AMF, Pr) pairs, the AMF value ranges from 3108.07 m 3 /s to 4681.88 m 3 /s in the upper catchment of Linjiacun station, while the Pr value ranges from 51.29 mm to 82.35 mm.The corresponding cumulative probability for the AMF series varies between 0.97 and 0.98, while that for the Pr series ranges between 0.97 and 0.99.
As for the "AND" case, it can be seen from Figure 7 that the difference among these design events estimated under different combinations of marginal distributions is smaller than that for the "OR" case.Take the upper catchment of Linjiacun station, for example; among these estimated (AMF, Pr) pairs, the AMF value ranges from 2481.91 m 3 /s to 3204.83 m 3 /s in the "AND" case, while the Pr It can be seen from Figure 7 and Table 5 that there is a remarkable difference among these design events for the "OR" case under different combinations of marginal distributions.Among these estimated (AMF, Pr) pairs, the AMF value ranges from 3108.07 m 3 /s to 4681.88 m 3 /s in the upper catchment of Linjiacun station, while the Pr value ranges from 51.29 mm to 82.35 mm.The corresponding cumulative probability for the AMF series varies between 0.97 and 0.98, while that for the Pr series ranges between 0.97 and 0.99.
As for the "AND" case, it can be seen from Figure 7 that the difference among these design events estimated under different combinations of marginal distributions is smaller than that for the "OR" Entropy 2017, 19, 609 18 of 23 case.Take the upper catchment of Linjiacun station, for example; among these estimated (AMF, Pr) pairs, the AMF value ranges from 2481.91 m 3 /s to 3204.83 m 3 /s in the "AND" case, while the Pr value ranges from 42.96 mm to 60.07 mm.The corresponding cumulative probability of AMF values varies between 0.89 and 0.96, while that for the Pr value varies between 0.84 and 0.95.
To further explore the reasons for the difference among design events for the "OR" and "AND" cases, we exemplarily display part of CDF curves for AMF and Pr series for the upper catchment of Linjiacun station in Figure 8. Figure 8 illustrates that the difference of the fitting performance of different marginal distributions increases with the values of the variables.In other words, the smaller the value of AMF/Pr or the cumulative probability of AMF/Pr is, the smaller the difference in fitting performance among these distributions is.When T OR = T AND = 30 years, the corresponding cumulative probability of AMF in the "OR" case ([0.97-0.98]) is larger than that in the "AND" case ([0.89-0.96]),which is the same as that of Pr.Therefore, the difference among design events in the "AND" case is smaller than that in the "OR" case.This finding is consistent with that of Dung et al. (2015) [35].To further explore the reasons for the difference among design events for the "OR" and "AND" cases, we exemplarily display part of CDF curves for AMF and Pr series for the upper catchment of Linjiacun station in Figure 8. Figure 8 illustrates that the difference of the fitting performance of different marginal distributions increases with the values of the variables.In other words, the smaller the value of AMF/Pr or the cumulative probability of AMF/Pr is, the smaller the difference in fitting performance among these distributions is.When TOR = TAND = 30 years, the corresponding cumulative probability of AMF in the "OR" case ([0.97-0.98]) is larger than that in the "AND" case ([0.89-0.96]),which is the same as that of Pr.Therefore, the difference among design events in the "AND" case is smaller than that in the "OR" case.This finding is consistent with that of Dung et al. (2015) [35].

Uncertainty Due to the Limited Size of Hydrological Records
In order to uncover the impact of sampling uncertainty on the most-likely design event estimation, the designed Monte Carlo algorithm in Section 3.5.2was utilized.Here, the AMF and Pr series were modeled by the MEP-related distributions.The Gumbel and Frank copulas were applied to describe the joint distribution of (AMF, Pr) pairs in the upper catchments of Linjiacun and Huaxian stations, respectively.Moreover, Figure 9 clearly shows the confidence regions of the most-likely design events with TOR = TAND = 30 years using the HDR method.The highest density regions (95% and 99%) are exhibited in a two-dimensional plane (AMF-Pr) that correspond to a return period of 30 years.It can be seen from Figure 9 that the confidence regions of the most-likely design events for TOR = TAND = 30 years are very large in the study regions.The 95% confidence region for the most-likely design events in terms of AMF and Pr with a return period of 30 years could range between values for AMF and Pr with return periods of 10 and 50 years, at least.Take the upper catchment of Linjiacun station, for example; in the 95% region for TOR = 30 years, the AMF could assume values with a univariate return period from 10 to 50 years.Large uncertainties due to sampling uncertainty can also be found in the works of [34,35,68,69], i.e., the return periods of the most-likely design events overlap.These large uncertainties present a significant challenge for reservoir design, flood risk, etc., particularly for Guanzhong plain, as one of the most important Chinese agricultural production regions, and Xi'an city, as one of the four major ancient capitals of civilization, which lie in the floodplain between Linjiacun and Huaxian stations.Uncertainty of copula-based frequency analysis for the study regions should arouse critical concern, particularly when constituting policy for flood control and hazard reduction.

Uncertainty Due to the Limited Size of Hydrological Records
In order to uncover the impact of sampling uncertainty on the most-likely design event estimation, the designed Monte Carlo algorithm in Section 3.5.2was utilized.Here, the AMF and Pr series were modeled by the MEP-related distributions.The Gumbel and Frank copulas were applied to describe the joint distribution of (AMF, Pr) pairs in the upper catchments of Linjiacun and Huaxian stations, respectively.Moreover, Figure 9 clearly shows the confidence regions of the most-likely design events with T OR = T AND = 30 years using the HDR method.The highest density regions (95% and 99%) are exhibited in a two-dimensional plane (AMF-Pr) that correspond to a return period of 30 years.It can be seen from Figure 9 that the confidence regions of the most-likely design events for T OR = T AND = 30 years are very large in the study regions.The 95% confidence region for the most-likely design events in terms of AMF and Pr with a return period of 30 years could range between values for AMF and Pr with return periods of 10 and 50 years, at least.Take the upper catchment of Linjiacun station, for example; in the 95% region for T OR = 30 years, the AMF could assume values with a univariate return period from 10 to 50 years.Large uncertainties due to sampling uncertainty can also be found in the works of [34,35,68,69], i.e., the return periods of the most-likely design events overlap.These large uncertainties present a significant challenge for reservoir design, flood risk, etc., particularly for Guanzhong plain, as one of the most important Chinese agricultural production regions, and Xi'an city, as one of the four major ancient capitals of civilization, which lie in the floodplain between Linjiacun and Huaxian stations.Uncertainty of copula-based frequency analysis for the study regions should arouse critical concern, particularly when constituting policy for flood control and hazard reduction.The most-likely design events with the "OR" (A,C) and "AND" (B,D) joint period of 30 years.Black lines denote the "OR" and "AND" joint periods estimated with the original data series.The shaded area denotes the confidence regions of the most-likely design events with T OR = T AND = 30 years at 95% and 99% confidence interval.

Summary and Conclusions
In this study, one general framework, aiming to analyze bivariate hydrological risk through a coupled maximum entropy-copula method and to reveal the impact of marginal distribution selection uncertainty and sampling uncertainty on hydrological risk analysis, is proposed.
The framework excels previous studies in applying the maximum entropy principle-based marginal distribution for modeling random variables and accounting for the impact of different uncertainty sources on hydrological risk analysis.The joint return periods, risk reliability, and bivariate design events are derived based on the coupled maximum entropy-copula method.For the purpose of practical engineering design applications, the so-called most-likely design event is identified to characterize the bivariate design event.To reveal the impact of marginal distribution selection uncertainty and sampling uncertainty on the bivariate design event identification, we designed a corresponding experiment project and specific Monte Carlo-based algorithm to achieve the two goals, respectively.Here, to elucidate the impact of marginal distribution uncertainty on the bivariate design event identification, 6 candidate distributions were combined with each other to produce 36 combinations for fitting univariate flood and extreme precipitation series.Then, these combinations concerning the marginal distributions of flood and extreme precipitation events were utilized to derive the bivariate design event.For the second goal, the Monte Carlo-based algorithm was designed to disclose the impact of sampling uncertainty on the bivariate design event identification.
Two sub-catchments of Loess Plateau, which were typical eco-environmentally vulnerable regions, were selected as the study regions.The primary conclusions are drawn as follows: (1) The maximum entropy principle (MEP)-based distributions outperform the conventional distributions (i.e., P3, Logn, Norm and Gam at least in this study) in quantifying the probability of flood and extreme precipitation events.Results of this study indicate the better performance of MEP distribution, suggesting that it could be an attractive alternative for quantifying the marginal probability of random variables.(2) The Gumbel and Frank copulas were suitable dependence models for quantifying the joint probabilities of flood and extreme precipitation events in the upper catchments of Linjiacun and Huaxian stations, respectively.(3) The bivariate return periods, risk and reliability of flood and extreme precipitation events for the two study regions were calculated based on the coupled maximum entropy-copula models, which were expected to provide practical support for the local flood control and disaster mitigation.(4) The bivariate design realizations were estimated for the study regions.Comprehensive uncertainty analysis revealed that the fitting performance of univariate distribution is closely related to the bivariate design event identification.If the difference of the fitting performance between two marginal distributions is small, values of the bivariate design events are similar, and vice versa.Therefore, advanced univariate distribution is critical for the bivariate design event selection.
Most importantly, the uncertainty related to the limited sample size is considerable, and should arouse critical attention.The bivariate design events of a specific return period exhibit significant variation.In other words, the return periods of the most-likely design events overlap.The 95% confidence regions of bivariate design events for flood and extreme precipitation with a return period of 30 years could reach between the values for flood and extreme precipitation with return periods of 10 and 50 years.The overlap phenomenon poses great challenges for practical engineering design applications, flood control, and so on.To enable a more reliable estimation of the design realization, increasing the information content by expanding the temporal, spatial or causal data is desirable, as proposed by Merz and Blöschl (2008) [70].

Figure 1 .
Figure 1.Location of the studied watershed.

Figure 1 .
Figure 1.Location of the studied watershed.
22 N) and Huaxian (109 • 78 E, 34 • 51 N) stations are important control stations upstream and downstream of the Weihe River basin, respectively.The locations of the two stations are displayed in Figure 1.Annual maximum flood records (1960-2012) from the two Entropy 2017, 19, 609 4 of 23

Figure 2 .
Figure 2. Flow chart of hydrological risk analysis under uncertainty.

Figure 2 .
Figure 2. Flow chart of hydrological risk analysis under uncertainty.

1 .Figure 3 .
Figure 3. Experimental design for assessing the uncertainty of marginal distribution in bivariate design event estimation.

Figure 3 .
Figure 3. Experimental design for assessing the uncertainty of marginal distribution in bivariate design event estimation.

Figure 4
Figure 4 illustrates the distributions of the AMF and Pr series in the upper catchments of Linjiacun and Huaxian stations fitted by the MEP-3, MEP-4, P3, Logn, Norm and Gam distributions.The curves of CDF and PDF are exhibited in this figure.
a complete gamma function.

Figure 4
Figure 4 illustrates the distributions of the AMF and Pr series in the upper catchments of Linjiacun and Huaxian stations fitted by the MEP-3, MEP-4, P3, Logn, Norm and Gam distributions.The curves of CDF and PDF are exhibited in this figure.

Figure 4 .
Figure 4. Frequency distributions of the AMF and Pr series in the upper catchments of Linjiacun and Huaxian stations.(A,C,E,G) denote the cumulative probabilities of the AMF and Pr series, while (B,D,F,H) denote the probability density of the AMF and Pr series.

Figure 4 .
Figure 4. Frequency distributions of the AMF and Pr series in the upper catchments of Linjiacun and Huaxian stations.(A,C,E,G) denote the cumulative probabilities of the AMF and Pr series, while (B,D,F,H) denote the probability density of the AMF and Pr series.

Entropy 2017 ,
19, 609 14 of 23 resource managers and policy-makers can identify the return periods for various (AMF, Pr) pairs of observations or forecasts through Figure5.

Figure 5 .
Figure 5. Bivariate return periods for (AMF, Pr) pairs in the upper catchments of Linjiacun and Huaxian stations.(A,C) denote the joint return period for the "OR" case, while (B,D) denote the joint return period for the "AND" case.

Figure 5 .
Figure 5. Bivariate return periods for (AMF, Pr) pairs in the upper catchments of Linjiacun and Huaxian stations.(A,C) denote the joint return period for the "OR" case, while (B,D) denote the joint return period for the "AND" case.

Figure 7 .
Figure 7. Most-likely design events under different combinations of marginal distributions.(A,C) denote the most likely design events under different combinations of marginal distributions when TOR = 30 years in the upper catchments of Linjiacun and Huaxian stations, respectively, while (B,D) denote the most likely design events when TAND = 30 years in the two regions, respectively.

Figure 7 .
Figure 7. Most-likely design events under different combinations of marginal distributions.(A,C) denote the most likely design events under different combinations of marginal distributions when TOR = 30 years in the upper catchments of Linjiacun and Huaxian stations, respectively, while (B,D) denote the most likely design events when T AND = 30 years in the two regions, respectively.

Figure 8 .
Figure 8. Part of CDF curves for AMF (A) and Pr (B) series in the upper catchment of Linjiacun station.

Figure 8 .
Figure 8. Part of CDF curves for AMF (A) and Pr (B) series in the upper catchment of Linjiacun station.

Figure 9 .
Figure9.The most-likely design events with the "OR" (A,C) and "AND" (B,D) joint period of 30 years.Black lines denote the "OR" and "AND" joint periods estimated with the original data series.The shaded area denotes the confidence regions of the most-likely design events with T OR = T AND = 30 years at 95% and 99% confidence interval.

Table 1 .
Correlation coefficients between AMF and Pr.Bold numbers denote the extreme precipitation events most correlated with AMF.

Table 2 .
Parameters of the fitted distribution for the AMF and Pr series in the catchments of Linjiacun and Huaxian stations.

Table 2 .
Parameters of the fitted distribution for the AMF and Pr series in the catchments of Linjiacun and Huaxian stations.

Table 3 .
The goodness-of-fit and RMSE values of the candidate distributions for the AMF and Pr series in the study regions.

Table 4 .
Parameters and goodness-of-fit test values for the candidate copulas.
Entropy 2017, 19, 609 18 of 23 value ranges from 42.96 mm to 60.07 mm.The corresponding cumulative probability of AMF values varies between 0.89 and 0.96, while that for the Pr value varies between 0.84 and 0.95.