Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories

Multivariate hydrologic frequency analysis has been widely studied using: (1) commonly known joint distributions or copula functions with the assumption of univariate variables being independently identically distributed (I.I.D.) random variables; or (2) directly applying the entropy theory-based framework. However, for the I.I.D. univariate random variable assumption, the univariate variable may be considered as independently distributed, but it may not be identically distributed; and secondly, the commonly applied Pearson’s coefficient of correlation (g) is not able to capture the nonlinear dependence structure that usually exists. Thus, this study attempts to combine the copula theory with the entropy theory for bivariate rainfall and runoff analysis. The entropy theory is applied to derive the univariate rainfall and runoff distributions. It permits the incorporation of given or known information, codified in the form of constraints and results in a universal solution of univariate probability distributions. The copula theory is applied to determine the joint rainfall-runoff distribution. Application of the copula theory results in: (i) the detection of the nonlinear dependence between the correlated random variables-rainfall and runoff, and (ii) capturing the tail dependence for risk analysis through joint return period and conditional return period of rainfall and runoff. The methodology is validated using annual daily maximum rainfall and the corresponding daily runoff (discharge) data collected from watersheds near Riesel, Texas (small agricultural experimental watersheds) and Cuyahoga River watershed, Ohio.

In the above three types of applications, use of the copula theory separates approach II from approaches I and III with the capability of capturing the nonlinear dependence structure of studied variables, whereas the application of Pearson's linear covariance in approaches I and III is not sensitive to the nonlinear dependence structure.The advantage of approach III is that by applying the maximum entropy theory, one may reach the universal solution and better capture the shape of probability density function (PDF) [30][31][32][33][34][35][36].Considering approaches I and II, there exists one common assumption, i.e., the univariate hydrological variables are considered as independently identically distributed (I.I.D.) random variables.Although depending on how the data is collected, it may be valid to assume it as independently distributed random variables, the assumption of the variable being identically distributed may not be valid for the unviariate data with a mixed structure.The misidentification of univariate probability distribution may result in underestimation/overestimation of the joint and conditional return period in case of risk analysis.In addition, even if the I.I.D. random variable assumption is valid, the univariate distribution determined is usually not universal for the same datasets.Thus, it is important to re-evaluate the determination of univariate distributions.
With the limitations of each approach discussed above, this study attempts to utilize the advantages held by approaches II and III and aims to provide a framework to link the maximum entropy and copula theories for the study of multivariate hydrological frequency analysis to avoid misusing the assumptions.Comparing to the existing frameworks, the proposed framework has the following advantages: (i) the universal probability distribution can be obtained from appropriately defined constraints; (ii) the multi-mode can be captured using the maximum entropy theory if the data show the multi-mode structure which may result in better estimation of multivariate/conditional return periods of given events; and (iii) the nonlinear dependence can be captured among the correlated random variables by applying the copula theory rather than applying the known or entropy-based multivariate probability distribution with the dependence captured by linear covariance.For illustration, the paper applies rainfall and runoff (discharge) data from: (1) watersheds near Riesel, Texas (the agricultural experimental watersheds maintained by the USA Department of Agriculture, Agricultural Research Service), and (2) the Cuyahoga River watershed in Ohio, collected by USGS and NOAA.The paper is organized as follows: after introducing the subject in this section, univariate rainfall and runoff frequency distributions are derived using the entropy theory in Section 2. Section 3 discusses the joint probability distribution estimation using copula theory, tail dependence for extreme events and corresponding joint and conditional return period analysis.Section 4 discusses the goodness of fit statistics, and application of the methodology is presented in Section 5.The paper is concluded in Section 6.

Determination of Maximum Entropy-Based Univariate Distributions
Derivation of univariate distributions of rainfall and runoff using the entropy theory entails: (1) defining entropy and specifying the known information about the random variables in terms of constraints, and (2), maximizing entropy to obtain the probability density function using the method of Lagrange multipliers and determining these multipliers.

Entropy and Specification of Constraints
For a univariate random variable X with a continuous probability density function f X (x), the Shannon entropy [37], H(X) can be expressed as: In accordance with the principle of maximum entropy (POME) [38,39], one can obtain the most probable probability density function (PDF) for random variable X with the available information (i.e., constraints) by maximizing Equation (1).In this study, the sample statistical moments are used as constraints with two main advantages.First, it avoids assuming certain types of distributions from data based on a nonparametric approach (frequency histogram or kernel density function), and hence one may reach the universal PDF for the dataset analyzed.Second, the PDF so derived may capture the possible multi-modes embedded in the data.
It is well known that annual maximum daily rainfall amount and corresponding daily discharge are skewed to the right.Thus, at least the first three non-central sample statistical moments need to be considered as constraints.According to the probability theory, it is also known that if the excess kurtosis is significantly different from 0, the probability density function of the random variable is heavily tailed and results in the necessity to include the fourth non-central statistical moment as a constraint.This necessity is determined based on the excess kurtosis as follows: In Equations ( 2), stands for the excess kurtosis and G 2 stands for the sample excess kurtosis.Then, whether G 2 is significantly different from zero can be determined by statistic (T) as: (3) where SEK stands for the standard error of kurtosis as: In Equations (2,3), n is the sample size.For statistics T: if |T| > 2, the excess kurtosis is significantly different from zero and the fourth non-central moment needs to be applied as a constraint, otherwise, the fourth non-central moments does not need to be applied.In addition, considering the rainfall and runoff data structure, the first moment in the logarithm domain may also contribute to the PDF.Hence, the constraints for the maximum entropy-based distributions are: if excess kurtosis is not significantly different from zero:

Entropy and Specification of Constraints
With the constraints defined in Equations (4-6), the entropy function [Equation (1)] is maximized using the method of Lagrange multipliers with the resulting maximum entropy-based PDF expressed as: where ′s are the Lagrange multipliers.The PDF defined by Equation ( 7) will be able to preserve the most important statistical moments that dominate its shape.Following [40,41], the Lagrange multipliers can be estimated.In what follows, the estimation concept and procedure are described in detail.
Substituting Equation (7) into Equation ( 4) one can obtain the partition function as: or: It is proved that λ 0 is a strictly convex function of λ 1 , λ 2 , λ 3 , λ N+1 [41].Thus, one can write the objective function as: , , , ln exp ln (9) where a i stands for the sample statistical moment of the constraint.
It should be noted that the objective function Z so defined is a convex function of s, and minimizing the objective function Z will result in the maximum entropy.Now, the Lagrange parameters can be determined using Newton's method as follows: Let: Then the objective function [(Equation (9)] can be approximated with the second-order Taylor series around parameter vector λ = [ λ 1 , λ 2 ,…, λ N+1 ] as: where the elements ( ) of gradient vector G and the element ( , ) of Hessian matrix H can be written as: The Lagrange parameters can then be estimated using Newton's method with the initial parameter set 0, 0, 0, 0 and 0, 0, 0, 0, 0 and the corresponding constraints of gradient vector as G = 0.It is necessary to state that needs to be greater than 0 [42].

Bivariate Rainfall and Runoff Distribution Using Copula Theory
Using the copula theory, one may successfully capture the nonlinear dependence between rainfall and runoff (discharge) variables.The copula concept was first introduced by Sklar [43].For a bivariate case, let observations (x 1 , y 1 ), (x 2 , y 2 ),…, (x n , y n ), be drawn from the bivariate population of (X,Y) with the marginal distributions as and .Then, the joint distribution, i.e., H(X,Y) or simply H can be expressed using the copula as: where C is the copula.C is a unique mapping when and are continuous, and captures the dependence between random variable X and Y.
In what follows, the topics essential to apply the copula theory for rainfall and runoff analysis are discussed, i.e., dependence measure, choice of copulas, parameter estimation, tail dependence, and joint/conditional return period determination.

Dependence Measure for Bivariate Random Variables and Choice of Copulas
To apply the copula theory to investigate the bivariate random variables X and Y, the dependence structure can be examined using the rank-based coefficient of correlation, e.g., Kendall's , Spearman's , and Geni's  [44].The rank-based coefficient of correlation is distribution free and sensitive to the nonlinear dependence structure which makes it more robust than the commonly applied Pearson's coefficient of correlation (only sensitive to linear dependence structure).In this study, the rank-based coefficients of correlation (i.e., Kendall's , Spearman's ) were applied to detect the dependence structure of rainfall and runoff variables.
It is known that the dependence between rainfall and runoff are usually positive by nature.Thus, the copula models dealing with positive dependence are selected as the candidates to model the joint rainfall and runoff distribution.Appendix I lists the copula functions examined, including one-and two-parameter Archimedean copulas, extreme-value copulas, and Plackett copula.

Estimation of Copula Parameters
Parameters of a copula model can be estimated using nonparametric estimation through rank-based coefficient of correlation, i.e., Kendall's , Spearman's , and Geni's .The parameters can also be estimated using the maximum likelihood estimation (MLE).In this study, MLE was applied for parameter estimation.
Let the empirical probability distributions of rainfall (X) and runoff (discharge) (Y) random variables be and , then for a given copula model candidate , the maximum log-likelihood function may be written as: where θ represents the copula parameter vector, n is the sample size, and , represents the copula density function as: then, the copula parameter was optimized by maximizing the log-likelihood function or minimizing the negative log-likelihood function.

Tail Dependence of Copula
In rainfall and runoff analysis, one is usually interested in the extreme behavior of the rainfall and runoff (discharge) variables for risk analysis, i.e., , , and the conditional probability, i.e., | and (or) | .However, the best-fitted copula may not be guaranteed to appropriately model the extreme behavior [45].Thus, it is important to study the tail dependence of the bivariate rainfall and runoff data.The tail dependence may be studied either graphically using the Chi-plot [46] or numerically from an empirical copula, a given group of multivariate distributions, and a given group of copula functions [47].In this study, the tail dependence was numerically investigated by nonparametric estimation.
Nonparametric estimation was based on the empirical copula with no assumption imposed on either copula or marginal distributions [47].Let (R x , R y ) be the paired rank of the bivariate random sample , , 1, … , , the empirical copula C m is written as: then, the nonparametric upper-tail dependence coefficient may be estimated in three different forms as: where is the sample size; is the chosen threshold for Equations (14a,b); and in Equation (14b) denotes the relationship to the scant of the copula's diagonal.
Equation (14a) was first proposed in [48], whereas Equation (14b) first appeared in [49] and it is sensitive when the extreme values are not along the diagonal as SEC stands for.The threshold in Equations (14a,b) can be estimated following the heuristic plateau-finding algorithm discussed in [47].Equation (14c) was first proposed in [50] and may be appropriately applied only under the assumption that the empirical copula function approximates an extreme value (EV) copula.

Return Period of Bivariate Variables Using the Copula Theory
In rainfall and runoff analysis, the purpose of deriving the joint distribution and study of the tail dependence is to estimate the joint/conditional return period of extreme events.With the upper tail dependence appropriately assessed, the joint and conditional return period of extreme events may be studied.

Joint Return Period "AND" Case Using Copula Theory
Following [51], the joint return period can be determined with the appropriately selected copula function as follows.Considering the 2-dimensional continuous bivariate random variables , , , , the "AND" case may be determined using Kendall distribution, component-wise and most-likely excess design realizations [51].In this study, the most-likely design realization approach was adopted.For rainfall and runoff variables X and Y, the joint return period is written as: where stands for the critical layer and t stands for the joint return period: , is the joint probability density function derived from copula function as: where stands for the copula density function as Equation (13a); and and stand for the fitted univariate PDF.
Then, the design event (x, y) can be estimated by finding the maximum of the joint density function in the logarithm domain over the critical layer with the corresponding (x * , y * ) as the design event with T-year return period.The critical layer can be obtained using the Kendall distribution.

Conditional Return Period of Runoff Events Given Rainfall Events
Again, using X as rainfall random variable and Y as runoff random variable, the conditional return period of runoff events of given rainfall events can be written in two cases: Case I: Return period of runoff events conditioned on rainfall events greater than the given rainfall values: Applying the copula theory, the exceedance conditional distribution is written as: The corresponding conditional return period is written as: Case II: Return period of runoff events conditioned on rainfall events equal to the given rainfall values: similarly, the exceedance conditional probability is written as: Equation ( 16) can be also rewritten as: The corresponding conditional return period is written as: In Equations (16,17), x * represents the rainfall events; T represents the conditional return period of runoff events; and y * represents the runoff events that need to be estimated based on T and x * .In addition, Equation ( 16) is right tail increasing (RTI) if it is a nondecreasing function of x for all y, and Equation ( 17) or (17a) is stochastic increasing (SI) if it is a nondecreasing function of x for all y.
It should also be addressed that 1 in Equations (16a) and (17b) stands for the annual event.If one considers the partial duration time series (i.e., the events over a given threshold), 1 should be replaced with  (the expected number of event/year).

Goodness-of-Fit Statistics
Before applying the copula-entropy framework to study the bivariate rainfall and runoff frequency and risk analysis, the goodness-of-fit statistic test need to be performed for both fitted univariate distribution and copula functions.

Goodness-of-Fit Statistics for Univariate Distribution
With the parametric univariate probability distribution fitted to the random variable X, the goodness-of-fit statistical tests need to be performed to assess whether the fitted probability distribution is valid.In the study, three goodness-of-fit statistics were considered.
The goodness-of-fit statistics using the root mean square error (RMSE) may be expressed respectively as: where RMSE is root mean square error; is the estimated value from the fitted univariate probability distribution; is the corresponding observed value; and n is the sample size.The Kolmogorov-Smirnov (K-S) goodness-of-fit test is a nonparametric probability distribution free test.For continuous random variables, it quantifies the distance between the empirical distribution (F) and the specified distribution function ( ).The null hypothesis (H 0 ) is: X follows the specified distribution function .The alternative hypothesis (H a ) is: X does not follow the specified distribution function.The K-S goodness-of-fit statistics is defined as: where • : sample data sorted in increasing order.
The Anderson-Darling (A-D) goodness-of-fit test is the test to examine whether the sample data is drawn from a specific probability distribution.Comparing with the K-S goodness-of-fit test, the A-D goodness-of-fit test is not distribution free and gives more weight to tails than the K-S goodness-of-fit test [53].The null hypothesis (H 0 ) is: X follows the specified distribution.The alternative (H a ) is: X does not follow the specified distribution.The A-D goodness-of-fit test can be expressed as follows: (20) where is sample size;  is parameter vector of fitted probability distribution; and • is sample data sorted in increasing order.In Equation (20), the null hypothesis (H 0 ) is rejected if . .The .value is approximated using parametric bootstrap simulation for maximum entropy-based univariate distribution.

Goodness-of-Fit statistics for Copula
The formal goodness-of-fit statistics for multivariate distributions have been extensively discussed based on the copula theory [54,55].Following their discussion, the goodness-of-fit test based on the probability integral transformation (i.e., Kendall's univariate probability transformation) was employed in the study.
For a given bivariate probability distribution function using a copula function [Equation ( 12)], the corresponding Kendall's nonparametric univariate probability transformation can be written as: where n is sample size and: The null hypothesis is H 0 : the bivariate random variable can be modeled by a given copula function through the measure of the distance between K n and parametric estimation using: Now the test statistic of rank-based Cramér-von Mises statistics ( ) can be written as: (23) The corresponding P-value of the statistic is then determined using the parametric bootstrap procedure proposed in [14] outlined as follows: (1) Estimate parameter vector for the copula function using MLE with pseudo-observations.
(2) Calculate • from Equation ( 21).(3) Determine and • .The Archimedean copula family has the analytical formulation of • , and thus the statistics defined in Equation ( 22) may be calculated directly.Otherwise the Monte Carlo simulation can be applied to approximate • with the following steps:  Generate a random sample , from the fitted copula function with the sample size at least as the same length of the observed data. Calculate the approximated • using an approach similar to Equation ( 21) as:  Calculate the approximated as (25) S n ( K ) (4) Use parametric bootstrap procedure with a large number N to determine the associated P-value as follows:  Generate N bivariate random samples from the fitted copula function of the observed data. Estimate the parameters for the fitted copula functions using the generated bivariate random samples. Calculate , , 1: for each bivariate samples using Equation (21). Repeat step (3) to determine , , , for each sample. Approximate the associated P-Value for the Cramér-von Mises statistic:

Data
In this study, four watersheds were selected for analysis (two agricultural experimental watersheds in Riesel, Texas, and two watersheds from th Cuyahoga River Watershed, Ohio).Two experimental watersheds are located near Riesel (Waco), Texas, and are maintained by Agricultural Research Service (ARS) of the U.S. department of Agriculture (USDA).In what follows, the procedure for selecting rainfall-runoff events from these watersheds is outlined: (1) Agricultural experimental watershed near Riesel (Waco), Texas: The experimental watersheds near Riesel (Waco) are, W1 and Y2 watersheds [Figure 1(a)] and these were selected based on the watershed area and the length of records maintained.There are multiple raingages in both watersheds, so the Thiessen polygon method was applied to determine daily areal rainfall depth.The Thiessen polygon weights and daily rainfall and corresponding runoff were obtained from the USDA-ARS data warehouse.Furthermore, annual maximum daily rainfall amounts and the resulting daily discharges were applied for rainfall and runoff analysis.
(2) Cuyahoga River Watershed, Ohio: The discharge gages at Old Portage (USGS 04206000) and Independence (USGS 04208000) were selected for analysis.The digital terrain model (DTM) flow lines were obtained from USGS.The watersheds contributing to Old Portage and Independence are delineated in the Geographical Information System (GIS), as shown in Figure 1(b).The raingages within the watersheds were identified from the raingage information maintained by National Oceanic and Atmospheric Administration (NOAA).Again, the Thiessen polygon method was applied to determine the daily areal rainfall.The annual maximum daily rainfall amount and the resulting daily discharge were applied for rainfall and runoff analysis.Table 1 lists the pertinent information of the selected watersheds (i.e., drainage area, raingages and length of the record for each watershed).Table 2 lists the Thiessen polygon weight for Old Portage and Independence determined in GIS.This information is further applied to determine the areal rainfall amount at Old Portage and Independence.

Entropy-Based Univariate Rainfall and Runoff Distributions
As discussed in Section 2, the first moment in the logarithm domain and at least first three non-central moments (Table 3) are needed as constraints to derive the maximum entropy-based univariate distribution for rainfall and runoff random variables with the necessity of fourth non-central moment based on the study of excess kurtosis [Equations (2,3)].The study of excess kurtosis for rainfall and runoff variables indicates that the fourth non-central moment needs to be considered, except for daily rainfall of Old Portage watershed and daily runoff (discharge) of Independence watershed.With the number of the non-central moments identified, the Lagrange multipliers of the PDF defined in Equation (7) were estimated by finding the minimum of the objective function defined in Equation (9) with the constraints and Hessian matrix given by Equations (11a,b).Table 4 lists the parameters estimated for each watershed.Table 5 lists the relative differences between sample moments and those calculated from entropy-based distributions.Table 5 indicates that the sample moments were well preserved.Further, the goodness-of-fit, i.e., RMSE [Equation ( 18)], the K-S goodness-of-fit test [Equation ( 19)], and the A-D goodness-of-fit test [Equation (20)] were applied to examine whether the maximum entropy-based probability distribution may appropriately represent the underlining univariate rainfall and runoff probability distributions.The P-value was approximated using Miller's approximation for the K-S goodness-of-fit test and Monte Carlo simulation with parametric bootstrap resampling procedure (10,000 parametric bootstrap samples) for the A-D goodness of fit test.The test results in Table 6 indicate that the P-value calculated from both the K-S and A-D goodness-of-fit tests was much higher than the critical level  = 0.05.So the null hypothesis cannot be rejected, that is, the maximum entropy-based probability distribution can appropriately represent the univariate rainfall/runoff probability distributions.The RMSE results in Table 6 show that the corresponding error is also small.In addition, to compare graphically, the maximum entropy-based PDF is compared with the frequency histograms (Figures 2 and 3), which indicate the proposed maximum entropy-based probability density function is able to capture the shape of the frequency histogram.Thus, from both the formal goodness-of-fit statistics and graphical comparison for univariate rainfall and runoff random variables, the univariate entropy-based distribution derived represents the PDF of rainfall and runoff variables well.It is worth stating that the appropriate identification of univariate rainfall and runoff distribution plays an important role in the study of joint and conditional return period in case of extreme behavior of rainfall and runoff variables.

Bivariate Rainfall and Runoff Distribution
Considering rainfall and runoff as continuous random variables, the copula theory was applied to capture the dependence with a unique copula function C [Equation (12)].Table 7 lists sample Kendall's τ and Spearman's ρ rank coefficients of correlation.Results showed that overall there existed positive dependence structure for all the watersheds studied.It is therefore appropriate to apply copula functions listed in Appendix I.The parameters of the copula function were estimated using the Pseudo-Maximum Likelihood method in which the empirical marginal distribution was applied.Table 8 lists the parameters estimated and the corresponding maximum Log-Likelihood (LL).Table 8 indicates that Galambos copula, belonging to the extreme value copula family, reached the largest maximum LL for watersheds W1, Y2 and Old Portage.However, the Frank copula reached the largest maximum LL for Independence watershed.Note: [a] when θ 1 → 0 converge to Gumbel-Hougaard copula; [b] when θ 1 = 1 BB5 copula is Galambos copula; [c] whenθ 1 = 1 BB7 copula is the Clayton copula.
In order to better assess the copula functions estimated using the Pseudo-Maximum Likelihood method, the formal goodness-of-fit analysis was performed to test whether the given copula function may appropriately model the joint distribution using the goodness-of-fit test based on the integral probability transformation discussed in Section 4. The Cramér-von Mises test statistic was calculated using Equations (21)(22)(23).The corresponding P-value was approximated using Equations (24-26) with 10,000 parametric bootstrap samples.Table 9 lists the test statistics and the corresponding P-values forall the copula functions studied.It indicates: (i) the copula functions, reaching the maximum LL, can appropriately measure the full dependence of the rainfall and runoff variables, (ii) for the Independence watershed, the Plackett copula reached a much higher P-value than did the Frank copula, and there exists minimal differences for the maximum LL calculated from the Frank and Plackett copulas (4.5%).Thus, the Galambos copula can be applied to represent the joint distribution for W1, Y2 and Old Portage watersheds, and the Plackett copula can be applied to represent the joint distribution for Independence watershed.Figures 4-5 compare the empirical PDF (CDF) and the parametric PDF (CDF) determined from the fitted copula function for experimental watersheds, i.e., W1 and Y2, and Cuyahoga River watershed, i.e., Old Portage and Independence.The figures indicate that: (i) there clearly exists an upper tail dependence for experimental watersheds W1 and Y2 (joint PDF in Figure 4), (ii) the upper tail dependence for Old portage is not as significant as that of experimental watersheds, and (iii) there is no clear evidence of upper tail dependence for Independence which is an interesting finding through the study of the annual maximum daily rainfall amount and corresponding daily discharge.The findings for watersheds at Old Portage and Independence may be explained by the natural flow of the stream affected by flow diversion, storage reservoirs, and power plants located in the watersheds (USGS).To further assess the above findings numerically, the upper tail dependence coefficient was calculated from both the empirical copula and the copula function candidates (Appendix II).Equations (14a-c) were applied to determine the upper tail dependence coefficient nonparametrically from the empirical copula where the thresholds k in Equations (14a,b) were determined by applying the plateau-finding algorithm [10].The equations listed in Appendix II were applied to determine the upper tail dependence coefficient for the copula functions.Table 10 lists the results of the upper tail dependence coefficient.It shows that the differences are relatively small from the nonparametric estimation (the maximum relative difference being around 10% comparing Equations (14a,b) with Equation (14c) for W1, Y2 and Old Portage watersheds.For Independence watershed, the upper tail dependence coefficient was estimated to be close to 0 from Equations (14a,b), however it reached around 0.43 if Equation (14c) was applied.Again comparing with the graphical finding (Figure 5), Equation (14c) cannot be applied to estimate the upper tail dependence coefficient for Independence watershed, due to the strong underlining assumption of empirical copula approximating the extreme value copula.
To this end, the conclusion is that the extreme value copula can be applied to assess the upper tail dependence for W1, Y2 and Old Portage watersheds uwing the Galambos copula.No upper tail dependence was found for Independence watershed and the Plackett copula can be reasonably applied.Thus, in what follows, the Galambos and Plackett copula were applied to study the joint (and conditional) return periods.Note: [a] with b = 1 with threshold; [b] no threshold needed.

Return Period of Rainfall and Runoff Events
In rainfall and runoff frequency analysis as well as other multivariate hydrologic frequency analyses, the purpose is to estimate the joint and conditional return period (joint and conditional exceedance probabilities) of the extreme events for risk analysis and to provide a framework for engineering design.Following the discussion in Section 3.4, the rainfall and runoff events with given joint and conditional return periods were studied.

Joint Return Period of Rainfall and Runoff Events
The joint return period (i.e., 25-, 50-, and 100-yr) for the "AND" case was determined following [32] using the most-likely design realization [Equation (15)] discussed in Section 3.4.1.Using Old Portage watershed as an example, Figure 6 shows the procedure for the identification of critical layer and the corresponding rainfall and runoff event (x * , y * ).Considering the Galambos copula belonging to the extreme value copula family, the parametric Kendall distribution is given as: where  is the parameter, i.e., Kendall correlation of coefficient.Graphically, it is seen that the empirical Kendall distribution matches the parametric Kendall distribution function for the Galambos copula fairly well especially for the upper tail (Figure 6a). Figure 6b provides the graphical link for the identification of t which results in the joint K(t) being equal to the nonexceedance probability of 25-, 50-, and 100-year joint return periods.The identified t's are the cumulative probability for the identified critical layer shown in Figure 6c.Using 100-year joint return period as an example, Figure 6d plots the negative log-likelihood of function , [Equation (15b)].The critical event is then estimated by finding the minimum of the negative log-likelihood function.It is worth noting that in case of the Plackett copula applied to the Independence watershed, the Kendall distribution of the Plackett copula needs to be estimated using Monte Carlo simulation with the parametric bootstrap sampling technique as discussed in Section 4.2.
Table 11 lists the critical rainfall and runoff events with joint return period of 25-, 50-, and 100-year.The joint return period study indicates that the rainfall and runoff variables for all four watersheds are positively quadrant dependent (PQD) [28] as: , or equivalently: and for illustration purposes, for Old Portage watershed, the exceedance probabilities for rainfall events with joint return periods of 25-, 50-, and 100-year are 0.05, 0.02, and 0.01; the right side of Equation (28a) is calculated as: 0.023, 0.004 and 0.001, respectively.As discussed in Section 3.4.2,both cases were studied for conditional return period analysis.The critical runoff events (y * ) of given conditional return periods are estimated from daily rainfall amount.Table 12 lists the daily rainfall amount with univariate return period of 25-, 50-, and 100-year estimated from fitted entropy-based univariate distribution.Then the conditional return period of Case I (i.e., | was estimated using Equation ( 16) and that of Case II (i.e., | is estimated using Equation (17).Table 13 lists the runoff events obtained for Cases I and II with the conditional return periods of 25-, 50-, and 100-year.Using Old Portage as an example, Figure 7 plots the conditional exceedance probabilities for both cases.Figure 7 indicates that Equation (16) and Equations ( 17) are nondecreasing functions of given rainfall event for all runoff events.It further indicates that rainfall and runoff variables hold right tail increasing (RTI, for case I) and stochastic increasing (SI, for case II) properties.The same results are reached for the other two watersheds modeled by the Galambos copula as well (i.e., W1, Y2).On the other hand, Figure 8 plots the conditional exceedance probabilities for Independence watershed.One may note the minimal difference in exceedance probabilities (return periods) obtained by conditioning on the rainfall events of different return periods for cases I and II.This finding again indicates the RTI and SI properties do not hold for Independence watershed.

Conclusions
This study investigates the relationship between annual maximum daily rainfall amount and the corresponding daily runoff (discharge) using maximum entropy and copula theories to address the questions arising from the assumptions in the commonly applied approaches and to better estimate risk.The maximum entropy theory is applied to derive the univariate rainfall and runoff distributions.The joint distribution of rainfall and runoff is studied using the copula method.The following conclusions are drawn from the study: (1) The rainfall and runoff variables are fat tailed except for rainfall variable at Old Portage and runoff variable at Independence.Thus, except for these two cases, the fourth non-central moment is necessary to be considered as one of the constraints for the derivation of maximum entropy-based distribution.The maximum entropy-based univariate distribution can successfully model the rainfall and runoff variables, and it also provides the universal solution for the univariate rainfall and runoff frequency analysis.(2) The copula functions capturing the positive dependence structure may appropriately model the bivariate rainfall and runoff distribution.The Galambos copula (belonging to extreme value copula family) appropriately models the dependence between rainfall and runoff variables for watersheds W1, Y2 and Old Portage based on the MLE and formal goodness-of-fit statistics.
Similarly, the Plackett copula appropriately models the dependence for watershed Independence.(3) Upper tail dependence is found for watersheds W1, Y2, and Old Portage, and the nonparametric/parametric estimation of upper tail dependence coefficient indicates that the Galambos copula may again model the extreme events which in turn can be applied to study the joint and conditional return periods for these 3 watersheds.(4) No upper tail dependence is found for watershed Independence.It may be explained by the natural flow of the stream affected by diversion, storage reservoirs and power plants located in the watersheds.The fitted Plackett copula can be applied to study the joint and conditional return periods for watershed Independence.(5) The positive dependence structure and joint return period ("AND" case) study of the rainfall and runoff variables show that rainfall and runoff are positive quadrant dependent.(6) For watersheds W1, Y2, and Old Portage, Case I conditional return period indicates the right tail increasing (RTI) property, and Case II conditional return period indicates the stochastic increasing (SI) property.These findings are in agreement with the upper tail dependence identified for the above three watersheds.(7) For watershed Independence, Case I and II conditional return periods indicate that there does not exist RTI or SI (i.e., with given rainfall events of different return periods, the conditional exceedance probability exhibits minimal difference).This finding is in agreement with no upper tail dependence found for the watershed.
In summary, the study provides an appropriate framework to link the maximum entropy theory and copula theory in multivariate frequency analysis.This framework may lead to a better study of both univariate and multivariate studies and permit a better estimation of risk and better engineering design (e.g., runoff of a given rainfall event in this study).With different types of watersheds, the study shows that for experimental watersheds (well maintained and minimal human activity induced changes), the dependence and tail dependence structure between rainfall and runoff variables tend to follow the law of natural rainfall and runoff process.For the watersheds Old Portage and Independence belonging to Cuyahoga River basin, even though the positive dependence structure still holds for the whole dataset analyzed, the upper tail dependence is significantly lower.In case of watershed Old Portage, the upper tail dependence is in the range of [0.3, 0.4], and for Independence, there is no upper tail dependence existing.This may be explained by the intensity of human activity induced hydrological response changes.This finding provides an insight that one needs to pay attention to the real world situation when applying the copulas belonging to extreme value copula family (e.g., commonly applied Gumbel-Houggard copula as an example) to study the annual maximum multivariate hydrological time series.

Figure 1 .
Figure 1.Riesel experimental watershed and Cuyahoga river watershed maps.

Figure 4 .Figure 5 .
Figure 4. Comparison of empirical PDF and CDF versus parametric PDF and CDF of the best fitted copula function for experimental watersheds: W1 and Y2.

Figure 6 .
Figure 6.(a) Kendall distribution plot, (b,c) critical layer identification for 50-and 100-year event, (d) critical rainfall and runoff event for return period = 100-year as example.

Figure 7 .
Figure 7. Conditional exceedance probability estimated for Cases I and II with watershed Old Portage as an example.

Figure 8 .
Figure 8. Conditional exceedance probability for Cases I and II with watershed Independence as an example.

Table 2 .
Thiessen polygon weight for Old Portage and Independence.

Table 3 .
Sample statistics for each watershed.

Table 4 .
Lagrange multipliers for univariate rainfall and discharge distribution.
Note:  1 parameter for ln(X);  2 parameter for X;  3 parameter for X 2 ;  4 parameter for X 3 ; and  5 parameter for X4.

Table 5 .
Relative differences between sample moments and those obtained from entropy-based distribution.

Table 6 .
Goodness-of-fit statistics for univariate rainfall and discharge analysis.

Table 7 .
Rank correlation of coefficients for rainfall and discharge variables.

Table 8 .
Estimated copula parameters for bivariate rainfall and discharge analysis.

Table 10 .
Estimated upper tail dependence coefficient.

Table 13 .
Daily Runoff (m 3 /s) estimated based on Cases I and II for the return period of 50and 100-year with 50-and 100-year daily rainfall amount (mm).