Confidence Interval Estimation for Precipitation Quantiles Based on Principle of Maximum Entropy

The principle of maximum entropy (POME) has been used for a variety of applications in hydrology; however, it has not been used in confidence interval estimation. Therefore, the POME was employed for confidence interval estimation for precipitation quantiles in this study. The gamma, Pearson type 3 (P3), and extreme value type 1 (EV1) distributions were used to fit the observation series. The asymptotic variances and confidence intervals of gamma, P3, and EV1 quantiles were then calculated based on POME. Monte Carlo simulation experiments were performed to evaluate the performance of the POME method and to compare it with the widely used method of moments (MOM) and the maximum likelihood (ML) method. Finally, the confidence intervals of T-year design precipitations were calculated using the POME for the three distributions and compared with those of MOM and ML. Results show that the POME is superior to MOM and ML in reducing the uncertainty of quantile estimators.


Introduction
One of the objectives of hydrological frequency analysis is to estimate the magnitude of a hydrologic event with a given return period [1]. Due to limited data records, inappropriate assumptions regarding the parent distribution, and errors associated with parameter estimation, there are inevitably uncertainties in this estimation [2,3]. Hence, a point estimate of the quantile corresponding to a desired return period is usually not enough because it cannot adequately describe the reliability of the estimation. A confidence interval is a convenient way to quantify the uncertainty of the estimates and provides more information than just a point estimate or its standard error [4].
The calculation of a confidence interval requires the standard error of the quantile estimator, and several methods have been proposed for determining such standard errors. Hoshi and Burges derived the expressions for calculating the sampling variances and covariances of log-Pearson type 3 (P3) distribution parameters as well as the sampling variance of the T-year flood event using the method of moments (MOM) [5]. Condie gave the maximum likelihood estimators for the parameters of a log-Pearson type 3 distribution, derived the expressions for the asymptotic standard error of a T-year event, and concluded that the maximum likelihood (ML) method is markedly superior to MOM in the estimation of the asymptotic standard error of the T-year event [6]. Lu and Stedinger derived simple formulas for estimating the asymptotic variance of probability weighted moments (PWM) quantile estimators for the generalized extreme value (GEV) distribution when the location and scale parameters were estimated with a fixed regional shape parameter or when all three parameters were estimated [4]. Phien derived explicit formulas for the variances and covariances of the parameter estimates of the log-Pearson type 3 distribution when the method of direct and mixed moments was used for parameter estimation [7]. The confidence intervals of MOM and ML quantile estimators for the log-Gumbel, Weibull, and generalized logistic distributions have also been investigated [8][9][10].
Shannon defined the concept of entropy as a measure of uncertainty of a random variable or its probability distribution [11]. Jaynes later formulated the principle of maximum entropy (POME), which provides a rational approach to choosing the most unbiased probability distribution for hydrologic frequency analysis [12]. Sonuga developed a minimally biased probability distribution appropriate for hydrologic frequency analysis in the absence of a large amount of data [13]. Singh developed a procedure for derivation of a number of frequency distributions used in hydrology using POME [14]. Lu derived the generalized distribution for flood and extreme rainfall frequency analysis, and she concluded that the entropy-based generalized distributions are superior or comparable to other traditional distributions [15,16]. POME also provides a way to estimate parameters of a given distribution from the specified constraints. Singh summarized the entropy method for parameter estimation for the commonly used distributions and indicated that the entropy method is reasonable and efficient for parameter estimation [17]. The POME-based parameter estimations for some other distributions have also been derived [18][19][20]. In recent years, an integration of entropy and copula has been developed to construct joint distribution function capable of bivariate flood and drought analysis as well as streamflow simulation [21][22][23].
For the estimation of the POME-based variance, Phien provided the formulas for calculating the approximate variances of the parameter estimators and the T-year event for the extreme value type 1 (EV1) and P3 distributions [24,25]. Through applications of the formulas to simulated data, he concluded that the approximate variances of the estimates of the T-year event are of sufficient accuracy. However, there have been no follow-up studies on the POME-based confidence interval estimation of quantile estimators.
The objective of this study is therefore to apply POME further in the estimation of confidence intervals of quantile estimators. Monte Carlo simulation was carried out to evaluate the performance of POME in the calculation of confidence intervals based on simulated data sets. Then, the gamma, P3, and EV1 distributions were used to fit the observed annual precipitation series. The distribution parameters and confidence intervals of annual precipitation quantiles for different return periods were estimated using POME, MOM, and ML. Finally, the confidence intervals based on the different methods were compared.

Estimation of Quantile
A general form for calculating the quantile $\hat{x}_T$ of a given distribution can be written in terms of the distribution moments and the frequency factor $K_T$ [26]:

$$\hat{x}_T = \hat{\mu}_1 + K_T\,\hat{\mu}_2^{1/2} \qquad (1)$$

where $\hat{\mu}_1$ and $\hat{\mu}_2$ are the mean and the variance of the population, respectively (so that $\hat{\mu}_2^{1/2}$ is the standard deviation), and they equal the sample moments only when the MOM is used for parameter estimation; $K_T$ is the frequency factor specific to the chosen distribution, which can be derived from the distribution parameters, sample size, and return period T or cumulative probability of exceedance of the design event. Expressions of $K_T$ for different distributions are commonly given in statistics texts [1].

Calculation of Confidence Interval
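The standard error and confidence interval are two measures to describe the precision of a statistical quantity, such as the T-year quantile estimator $\hat{x}_T$. The $(1-\alpha)$ confidence interval for $\hat{x}_T$ is approximated by [27]:

$$\hat{x}_L = \hat{x}_T \pm u_{1-\alpha/2}\, s_T \qquad (2)$$

where $\hat{x}_L$ denotes the limits of the confidence interval; $u_{1-\alpha/2}$ is the quantile of the standard normal distribution for a probability level equal to $1-\alpha/2$; $\hat{x}_T$ is the design value for the return period T; and $s_T$ is the standard error of $\hat{x}_T$, which can be expressed as [27]:

$$s_T^2 = \sum_{i=1}^{3}\left(\frac{\partial x_T}{\partial \theta_i}\right)^2 \operatorname{var}(\hat{\theta}_i) + 2\sum_{i<j}\frac{\partial x_T}{\partial \theta_i}\frac{\partial x_T}{\partial \theta_j}\operatorname{cov}(\hat{\theta}_i,\hat{\theta}_j) \qquad (3)$$

where $\hat{\theta}_i$, i = 1, 2, 3, denotes the estimators of either moments or distribution parameters; $\operatorname{var}(\hat{\theta}_i)$ is the variance of $\hat{\theta}_i$; and $\operatorname{cov}(\hat{\theta}_i,\hat{\theta}_j)$ is the covariance of $\hat{\theta}_i$ and $\hat{\theta}_j$, i, j = 1, 2, 3.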
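The first-order variance propagation and the normal-approximation interval described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `quantile_confidence_interval` and the numbers in the usage lines are hypothetical, and the covariance matrix is assumed to be symmetric with the variances on its diagonal.

```python
from statistics import NormalDist


def quantile_confidence_interval(x_t, grad, cov, alpha=0.05):
    """Return (s_T, lower, upper) for a quantile estimate x_t.

    grad[i]   : partial derivative of x_T with respect to theta_i
    cov[i][j] : cov(theta_i, theta_j); variances sit on the diagonal
    """
    k = len(grad)
    # Full double sum; for a symmetric cov it equals
    # sum_i g_i^2 var_i + 2 * sum_{i<j} g_i g_j cov_ij.
    var_t = sum(grad[i] * grad[j] * cov[i][j]
                for i in range(k) for j in range(k))
    s_t = var_t ** 0.5
    # Standard normal quantile for probability 1 - alpha/2.
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return s_t, x_t - u * s_t, x_t + u * s_t


# Hypothetical two-parameter example (illustrative numbers only).
s_t, lower, upper = quantile_confidence_interval(
    x_t=100.0, grad=[1.0, 2.0], cov=[[4.0, 1.0], [1.0, 9.0]])
print(round(s_t, 3), round(lower, 2), round(upper, 2))
```

Passing the full covariance matrix rather than separate variance and covariance lists keeps the same function usable for two- and three-parameter distributions.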
In this paper, the MOM, ML, and POME were considered, and the asymptotic variances estimated by these methods are described below.

Method of Moments (MOM)
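The MOM asymptotic variance of $\hat{x}_T$ for a three-parameter distribution is given by [27]:

$$s_T^2 = \frac{\mu_2}{n}\left\{1 + K_T\gamma_1 + \frac{K_T^2}{4}(\gamma_2 - 1) + \frac{\partial K_T}{\partial \gamma_1}\left[2\gamma_2 - 3\gamma_1^2 - 6 + K_T\left(\gamma_3 - \frac{6\gamma_1\gamma_2}{4} - \frac{10\gamma_1}{4}\right)\right] + \left(\frac{\partial K_T}{\partial \gamma_1}\right)^2\left[\gamma_4 - 3\gamma_3\gamma_1 - 6\gamma_2 + \frac{9\gamma_1^2\gamma_2}{4} + \frac{35\gamma_1^2}{4} + 9\right]\right\} \qquad (4)$$

where $\gamma_j$, j = 1, 2, 3, 4, are the cumulants. For a two-parameter distribution, the frequency factor $K_T$ does not depend on $\gamma_1$, so $\partial K_T/\partial \gamma_1 = 0$ in the above equation and the expression simplifies to:

$$s_T^2 = \frac{\mu_2}{n}\left[1 + K_T\gamma_1 + \frac{K_T^2}{4}(\gamma_2 - 1)\right] \qquad (5)$$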
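As a quick numerical illustration of the two-parameter case, the sketch below assumes the simplified expression takes the widely quoted form $s_T^2 = (\mu_2/n)[1 + K_T\gamma_1 + K_T^2(\gamma_2 - 1)/4]$, with $\gamma_1$ the skewness and $\gamma_2$ the kurtosis $\mu_4/\mu_2^2$ (not the excess kurtosis). The function name is hypothetical.

```python
def mom_variance_two_param(mu2, n, k_t, skew, kurt):
    """Asymptotic MOM variance of x_T when K_T does not depend on skewness.

    mu2  : population variance
    n    : sample size
    k_t  : frequency factor K_T
    skew : gamma_1 (skewness)
    kurt : gamma_2 (kurtosis mu4 / mu2^2, equal to 3 for the normal case)
    """
    return (mu2 / n) * (1.0 + k_t * skew + 0.25 * k_t ** 2 * (kurt - 1.0))


# Normal distribution check: skew = 0, kurt = 3 gives (mu2/n) * (1 + K_T^2 / 2).
print(mom_variance_two_param(mu2=1.0, n=1, k_t=2.0, skew=0.0, kurt=3.0))
```

With the EV1 values $\gamma_1 = 1.1396$ and $\gamma_2 = 5.4$, the bracket becomes $1 + 1.1396K_T + 1.1K_T^2$.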

Maximum Likelihood (ML) Method
ML is a probability distribution-related method that requires the log-likelihood function of the probability density function (pdf) of a specific distribution. The ML parameter estimators of the distributions commonly used in hydrology are available in the literature [1].
The asymptotic variance and covariance terms for the ML parameter estimators are the elements of the inverse of the information matrix $I$ [28]:

$$\begin{bmatrix} \operatorname{var}(\hat{\theta}_1) & \operatorname{cov}(\hat{\theta}_1,\hat{\theta}_2) & \operatorname{cov}(\hat{\theta}_1,\hat{\theta}_3) \\ \operatorname{cov}(\hat{\theta}_1,\hat{\theta}_2) & \operatorname{var}(\hat{\theta}_2) & \operatorname{cov}(\hat{\theta}_2,\hat{\theta}_3) \\ \operatorname{cov}(\hat{\theta}_1,\hat{\theta}_3) & \operatorname{cov}(\hat{\theta}_2,\hat{\theta}_3) & \operatorname{var}(\hat{\theta}_3) \end{bmatrix} = I^{-1} \qquad (6)$$

Differentiating Equation (1) with respect to the parameters $\theta_1$, $\theta_2$, and $\theta_3$, one obtains the derivatives of $x_T$ with respect to $\theta_1$, $\theta_2$, and $\theta_3$. Substituting the derivative terms and the asymptotic variances and covariances in Equation (6) into Equation (3) yields the asymptotic variance of the ML quantile estimators. Finally, the confidence interval of quantile estimators can be calculated by using Equation (2).

Principle of Maximum Entropy (POME) Method
POME involves essentially five steps in the estimation of the distribution parameters: (1) specification of constraints from the given information; (2) derivation of the probability density function of the maximum entropy distribution; (3) derivation of the relationship between Lagrange multipliers and constraints; (4) derivation of the relationship between Lagrange multipliers and distribution parameters; and (5) derivation of the relationship between distribution parameters and constraints [17,19].
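The constraints in POME can be expressed in terms of moments; therefore, the variances and covariances of the parameters can be obtained from the relationship between the variances and covariances of the moments and those of the parameter estimates. Let P, Q, and R denote the three moments; one can then approximately write the vector of variances and covariances of P, Q, and R of a three-parameter distribution as [24]:

$$V_M = D\,V_P$$

where $V_M$ and $V_P$ are the vectors of variances and covariances of the moments and parameter estimators, respectively:

$$V_M = \left[\operatorname{var}(P),\ \operatorname{var}(Q),\ \operatorname{var}(R),\ \operatorname{cov}(P,Q),\ \operatorname{cov}(Q,R),\ \operatorname{cov}(P,R)\right]^{T}$$

$$V_P = \left[\operatorname{var}(\hat{\theta}_1),\ \operatorname{var}(\hat{\theta}_2),\ \operatorname{var}(\hat{\theta}_3),\ \operatorname{cov}(\hat{\theta}_1,\hat{\theta}_2),\ \operatorname{cov}(\hat{\theta}_2,\hat{\theta}_3),\ \operatorname{cov}(\hat{\theta}_1,\hat{\theta}_3)\right]^{T}$$

and $\theta_1$, $\theta_2$, and $\theta_3$ are the distribution parameters; D is the matrix with elements $d_{ij}$ (1 ≤ i, j ≤ 6), which are formed from the partial derivatives of the moments with respect to the distribution parameters. For example, the first-order expansion of $\operatorname{var}(P)$ gives $d_{11} = (\partial P/\partial \theta_1)^2$ and $d_{14} = 2(\partial P/\partial \theta_1)(\partial P/\partial \theta_2)$. Consequently, $V_P$ can be calculated using Equation (10) as long as the elements of D and $V_M$ have been calculated:

$$V_P = D^{-1} V_M \qquad (10)$$

where $D^{-1}$ is the inverse matrix of D.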
Substituting the elements of $V_P$ and the partial derivatives of $x_T$ with respect to the distribution parameters into Equation (3), one can obtain the variances of quantile estimators. The confidence interval of quantile estimators can then be calculated by using Equation (2).
The variances and covariances of MOM parameter estimates are calculated by using the relationship between the parameters and the population moments, which is relatively simple and understandable. However, the calculation of the second and higher order sample moments introduces sampling errors, which affect the accuracy of the estimation. The ML method is frequently applied owing to its large-sample properties of yielding consistent estimates with minimum variance. Its estimates for small samples have found general acceptance in practice as well [28]. However, this method involves some complicated calculations and approximations, which makes it inconvenient. The POME requires fewer subjective assumptions when data are insufficient. Though it is comparable to the ML in parameter estimation, POME has the advantage of simple and fast calculation [17]. The calculation of the POME-based confidence interval also requires some approximations. Therefore, it is necessary to compare the performance of the different methods to choose the most efficient one.

Asymptotic Variances of Quantile Estimators for Different Distributions
Three commonly used distributions (the gamma, P3, and EV1 distributions) were considered in this study.

Gamma Distribution
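The pdf of the gamma distribution is given by:

$$f(x) = \frac{1}{\alpha^{\beta}\,\Gamma(\beta)}\, x^{\beta-1} e^{-x/\alpha} \qquad (11)$$

where $\alpha$ and $\beta$ are the scale and shape parameters, $\Gamma(\cdot)$ is the gamma function, and $0 < x < \infty$.
For the gamma distribution, the T-year quantile is given by:

$$x_T = \alpha\beta + K_T\,\alpha\sqrt{\beta} \qquad (12)$$

Differentiation of Equation (12) with respect to $\alpha$ and $\beta$ yields:

$$\frac{\partial x_T}{\partial \alpha} = \beta + K_T\sqrt{\beta} \qquad (13)$$

$$\frac{\partial x_T}{\partial \beta} = \alpha + \frac{\alpha K_T}{2\sqrt{\beta}} - \frac{\alpha}{\beta}\frac{\partial K_T}{\partial C_s} \qquad (14)$$

where $C_s = 2/\sqrt{\beta}$ is the coefficient of skewness and $\partial K_T/\partial C_s$ can be calculated by using the Wilson-Hilferty transformation [1].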
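The role of the frequency factor can be made concrete with a short sketch. It assumes the usual Wilson-Hilferty approximation $K_T \approx (2/C_s)[(1 + C_s z/6 - C_s^2/36)^3 - 1]$, where $z$ is the standard normal quantile for non-exceedance probability $1 - 1/T$, and it assumes the gamma quantile is written as mean plus $K_T$ times standard deviation with $C_s = 2/\sqrt{\beta}$. The function names are hypothetical, and the derivative is taken numerically rather than from a closed form.

```python
from statistics import NormalDist


def frequency_factor_wh(cs, T):
    """Wilson-Hilferty approximation of the gamma/P3 frequency factor K_T."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)  # standard normal quantile
    if abs(cs) < 1e-8:
        return z  # limiting normal case as the skewness tends to zero
    return (2.0 / cs) * ((1.0 + cs * z / 6.0 - cs ** 2 / 36.0) ** 3 - 1.0)


def gamma_quantile(alpha, beta, T):
    """x_T = mean + K_T * standard deviation for scale alpha and shape beta."""
    cs = 2.0 / beta ** 0.5
    return alpha * beta + frequency_factor_wh(cs, T) * alpha * beta ** 0.5


def dk_dcs(cs, T, h=1e-5):
    """Central-difference estimate of dK_T/dC_s."""
    return (frequency_factor_wh(cs + h, T)
            - frequency_factor_wh(cs - h, T)) / (2.0 * h)


# Illustrative (hypothetical) parameters: scale 50, shape 16, T = 100 years.
print(gamma_quantile(50.0, 16.0, 100), dk_dcs(0.5, 100))
```

The approximation is most accurate for moderate skewness; for strongly skewed samples an exact inverse of the gamma distribution function may be preferable.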

Estimation of Asymptotic Variances by MOM and ML
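Based on MOM, the standard error of $\hat{x}_T$ for the gamma distribution can be calculated directly by [29]:

$$s_T^2 = \frac{\mu_2}{n}\left[(1 + K_T C_v)^2 + \frac{1}{2}\left(K_T + 2C_v\frac{\partial K_T}{\partial \gamma_1}\right)^2 (1 + C_v^2)\right] \qquad (15)$$

where $C_v$ is the coefficient of variation, $C_v = \mu_2^{1/2}/\mu_1$, and $\gamma_1 = C_s$ is the coefficient of skewness. The asymptotic variances and covariance of the ML parameter estimators are derived as [1]:

$$\operatorname{var}(\hat{\alpha}) = \frac{\alpha^2 \psi' D}{n}, \qquad \operatorname{var}(\hat{\beta}) = \frac{\beta D}{n}, \qquad \operatorname{cov}(\hat{\alpha},\hat{\beta}) = -\frac{\alpha D}{n} \qquad (16)$$

where $\psi' = d\psi(\beta)/d\beta$ is the tri-gamma function and $D = 1/(\beta\psi' - 1)$. Substituting Equations (13) and (14) and the variance and covariance terms in Equation (16) into Equation (3) yields the variance of the ML quantile estimator.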
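The two gamma results can be evaluated with the standard library alone. The sketch below is illustrative and rests on stated assumptions: the MOM variance is taken as $(\mu_2/n)[(1 + K_TC_v)^2 + \tfrac{1}{2}(K_T + 2C_v\,\partial K_T/\partial C_s)^2(1 + C_v^2)]$, the ML covariance terms are taken from the inverse of the gamma information matrix, and `trigamma` is a hand-written helper (recurrence plus asymptotic series), not a library call.

```python
def trigamma(x):
    """Trigamma function psi'(x) for x > 0 (recurrence + asymptotic series)."""
    total = 0.0
    while x < 6.0:  # shift the argument upward so the series is accurate
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv + inv2 / 2.0 + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return total + series


def gamma_mom_variance(alpha, beta, n, k_t, dk_dcs):
    """Assumed MOM variance of x_T for the gamma distribution."""
    mu1 = alpha * beta            # mean
    mu2 = alpha ** 2 * beta       # variance
    cv = mu2 ** 0.5 / mu1         # coefficient of variation
    return (mu2 / n) * ((1.0 + k_t * cv) ** 2
                        + 0.5 * (k_t + 2.0 * cv * dk_dcs) ** 2 * (1.0 + cv ** 2))


def gamma_ml_covariance(alpha, beta, n):
    """Return (var(alpha), var(beta), cov(alpha, beta)) from the inverse information matrix."""
    tri = trigamma(beta)
    d = 1.0 / (beta * tri - 1.0)
    return alpha ** 2 * tri * d / n, beta * d / n, -alpha * d / n


# Hypothetical parameter values for illustration.
print(gamma_ml_covariance(2.0, 3.0, 50))
print(gamma_mom_variance(2.0, 3.0, 50, k_t=2.5, dk_dcs=0.6))
```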

Estimation of Asymptotic Variances by POME
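For the gamma distribution, the relation between parameters and constraints can be expressed as [17]:

$$E[x] = \alpha\beta, \qquad E[\ln x] = \ln\alpha + \psi(\beta)$$

where $\psi(\beta)$ is the digamma function. The parameter estimators $\hat{\alpha}$ and $\hat{\beta}$ can be obtained by solving the following equations:

$$\alpha\beta = \bar{X}, \qquad \ln\alpha + \psi(\beta) = \bar{W}$$

where $\bar{X}$ is the sample mean of x, and $\bar{W}$ is the sample mean of the random variable W = ln(x). Then, $V_M$ and $V_P$ are written as:

$$V_M = \left[\operatorname{var}(\bar{X}),\ \operatorname{var}(\bar{W}),\ \operatorname{cov}(\bar{X},\bar{W})\right]^{T}, \qquad V_P = \left[\operatorname{var}(\hat{\alpha}),\ \operatorname{var}(\hat{\beta}),\ \operatorname{cov}(\hat{\alpha},\hat{\beta})\right]^{T}$$

For the gamma distribution, $\operatorname{var}(\bar{X}) = \alpha^2\beta/n$. Exact formulas for computing $\operatorname{var}(\bar{W})$ and $\operatorname{cov}(\bar{X},\bar{W})$ are derived in Appendix A. Consequently, one obtains:

$$V_M = \frac{1}{n}\left[\alpha^2\beta,\ \psi',\ \alpha\right]^{T} \qquad (23)$$

Additionally, taking the partial derivatives of $\bar{X}$ and $\bar{W}$ with respect to $\alpha$ and $\beta$, one can obtain the matrix D:

$$D = \begin{bmatrix} \beta^2 & \alpha^2 & 2\alpha\beta \\ 1/\alpha^2 & \psi'^2 & 2\psi'/\alpha \\ \beta/\alpha & \alpha\psi' & \beta\psi' + 1 \end{bmatrix} \qquad (24)$$

where $\psi' = d\psi(\beta)/d\beta$ is the tri-gamma function. Thus, all the components of $V_M$ and D are obtained. Substituting $V_M$ and D [Equations (23) and (24)] into Equation (10) yields $V_P$. The variance of the quantile estimator can then be obtained by substituting the terms of $V_P$ and Equations (13) and (14) into Equation (3).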
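The step from $V_M$ and D to $V_P$ is a small linear solve, sketched below for the gamma case. The sketch assumes the moment relations $\bar{X} = \alpha\beta$ and $\bar{W} = \ln\alpha + \psi(\beta)$ and the standard gamma results $\operatorname{var}(\bar{X}) = \alpha^2\beta/n$, $\operatorname{var}(\bar{W}) = \psi'(\beta)/n$, and $\operatorname{cov}(\bar{X},\bar{W}) = \alpha/n$; the helper names (`trigamma`, `solve_linear`, `gamma_pome_covariance`) are hypothetical.

```python
def trigamma(x):
    """Trigamma function psi'(x) for x > 0 (recurrence + asymptotic series)."""
    total = 0.0
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return total + inv + inv2 / 2.0 + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gamma_pome_covariance(alpha, beta, n):
    """Return V_P = [var(alpha), var(beta), cov(alpha, beta)] from V_M = D V_P."""
    tri = trigamma(beta)
    # V_M = [var(Xbar), var(Wbar), cov(Xbar, Wbar)] (assumed gamma results).
    v_m = [alpha ** 2 * beta / n, tri / n, alpha / n]
    # Rows follow from first-order expansions of var(Xbar), var(Wbar), cov(Xbar, Wbar).
    d = [
        [beta ** 2, alpha ** 2, 2.0 * alpha * beta],
        [1.0 / alpha ** 2, tri ** 2, 2.0 * tri / alpha],
        [beta / alpha, alpha * tri, beta * tri + 1.0],
    ]
    return solve_linear(d, v_m)


print(gamma_pome_covariance(2.0, 3.0, 50))  # hypothetical parameter values
```

Solving the system directly avoids forming $D^{-1}$ explicitly. Under these assumptions the result for the two-parameter gamma case coincides with the inverse information matrix, because the POME constraints for this distribution are also its sufficient statistics.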

Pearson Type 3 (P3) Distribution
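The pdf of the P3 distribution is given by:

$$f(x) = \frac{1}{\alpha\,\Gamma(\beta)}\left(\frac{x-\gamma}{\alpha}\right)^{\beta-1}\exp\left(-\frac{x-\gamma}{\alpha}\right) \qquad (25)$$

where $\alpha$, $\beta$, and $\gamma$ are the scale, shape, and location parameters, respectively, and $\gamma < x < \infty$.
The T-year quantile of the P3 distribution is given by:

$$x_T = \gamma + \alpha\beta + K_T\,\alpha\sqrt{\beta} \qquad (26)$$

Taking partial derivatives of Equation (26) with respect to $\alpha$, $\beta$, and $\gamma$ yields:

$$\frac{\partial x_T}{\partial \alpha} = \beta + K_T\sqrt{\beta} \qquad (27)$$

$$\frac{\partial x_T}{\partial \beta} = \alpha + \frac{\alpha K_T}{2\sqrt{\beta}} - \frac{\alpha}{\beta}\frac{\partial K_T}{\partial C_s} \qquad (28)$$

$$\frac{\partial x_T}{\partial \gamma} = 1 \qquad (29)$$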

Estimation of Asymptotic Variances by MOM and ML
For the P3 distribution, the asymptotic variance of the MOM quantile estimator is given by: The asymptotic variances and covariances of the ML parameter estimators are given by [1]: Substituting Equations (27)-(29) and the variance and covariance terms in Equation (31) into Equation (3) yields the asymptotic variance of the quantile estimator.

Estimation of Asymptotic Variances by POME
On the basis of POME, the relation between parameters and constraints for the P3 distribution is given by [17]: where $\psi = \psi(\beta)$ is the digamma function. The parameter estimators $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ can be obtained by solving the following equations: where $\bar{X}$ and $S^2$ are the sample mean and variance of x, and $\bar{W}_1$ is the sample mean of the random variable $W_1$ defined as $W_1 = \ln(x - \gamma)$. Therefore, $V_M$ and $V_P$ can be written as: Following Phien [24], $V_M$ is given by: Taking partial derivatives of $\bar{X}$, $\bar{W}_1$, and $S^2$ with respect to $\alpha$, $\beta$, and $\gamma$ yields the matrix D: where ψ = ψ (β) =

Extreme Value Type 1 (EV1) Distribution
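The pdf and the cumulative distribution function of the EV1 distribution can be expressed respectively as:

$$f(x) = \frac{1}{\alpha}\exp\left[-\frac{x-u}{\alpha} - \exp\left(-\frac{x-u}{\alpha}\right)\right] \qquad (37)$$

$$F(x) = \exp\left[-\exp\left(-\frac{x-u}{\alpha}\right)\right] \qquad (38)$$

where $\alpha$ and $u$ are the scale and location parameters, respectively, and $-\infty < x < \infty$.
The T-year quantile of the EV1 distribution can be obtained from Equation (38) by substituting F(x) = 1 − 1/T and solving for x:

$$x_T = u - \alpha\ln\left[-\ln\left(1 - \frac{1}{T}\right)\right] \qquad (39)$$

Differentiating Equation (39) with respect to $\alpha$ and $u$ yields the derivatives of $x_T$ with respect to $\alpha$ and $u$:

$$\frac{\partial x_T}{\partial \alpha} = -\ln\left[-\ln\left(1 - \frac{1}{T}\right)\right], \qquad \frac{\partial x_T}{\partial u} = 1$$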
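Inverting the EV1 distribution function is simple enough to check numerically. The sketch below assumes the usual Gumbel form $F(x) = \exp[-\exp(-(x-u)/\alpha)]$; the function names and the parameter values in the usage lines are hypothetical.

```python
import math


def ev1_reduced_variate(T):
    """y_T = -ln(-ln(1 - 1/T)), the reduced variate for return period T."""
    return -math.log(-math.log(1.0 - 1.0 / T))


def ev1_quantile(alpha, u, T):
    """T-year quantile x_T = u + alpha * y_T (alpha: scale, u: location)."""
    return u + alpha * ev1_reduced_variate(T)


def ev1_quantile_gradient(T):
    """Partial derivatives (dx_T/dalpha, dx_T/du)."""
    return ev1_reduced_variate(T), 1.0


# Hypothetical parameters: scale 80 mm, location 500 mm.
x100 = ev1_quantile(80.0, 500.0, 100)
# Round trip: the distribution function at x_T should return 1 - 1/T.
print(x100, math.exp(-math.exp(-(x100 - 500.0) / 80.0)))
```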

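Estimation of Asymptotic Variances by MOM and ML
The asymptotic variance of the MOM quantile estimator is given by:

$$s_T^2 = \frac{\mu_2}{n}\left(1 + 1.1396K_T + 1.1K_T^2\right)$$

where $K_T$ is given by:

$$K_T = -\frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\left[\ln\left(\frac{T}{T-1}\right)\right]\right\}$$

Substituting the derivatives of $x_T$ and the asymptotic variances and covariance of the ML parameter estimators into Equation (3) yields the asymptotic variance of the quantile estimator.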
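For a rough feel of how the MOM and ML variances compare for EV1, the sketch below uses standard textbook coefficients, which are assumptions here rather than values taken from the text: EV1 skewness 1.1396 and kurtosis 5.4 for MOM, and the asymptotic ML results $\operatorname{var}(\hat{\alpha}) = 0.6079\alpha^2/n$, $\operatorname{var}(\hat{u}) = 1.1087\alpha^2/n$, and $\operatorname{cov}(\hat{\alpha},\hat{u}) = 0.2570\alpha^2/n$. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def ev1_frequency_factor(T):
    """K_T = -(sqrt(6)/pi) * (0.5772 + ln ln(T / (T - 1)))."""
    return -(math.sqrt(6.0) / math.pi) * (EULER_GAMMA + math.log(math.log(T / (T - 1.0))))


def ev1_mom_variance(alpha, n, T):
    """MOM variance of x_T, using the EV1 variance mu2 = (pi * alpha)^2 / 6."""
    mu2 = (math.pi * alpha) ** 2 / 6.0
    k = ev1_frequency_factor(T)
    return (mu2 / n) * (1.0 + 1.1396 * k + 1.1 * k * k)


def ev1_ml_variance(alpha, n, T):
    """ML variance of x_T = u + alpha * y via first-order propagation."""
    y = -math.log(-math.log(1.0 - 1.0 / T))
    # var(u) + 2 * y * cov(alpha, u) + y^2 * var(alpha), all in units of alpha^2 / n
    return (alpha ** 2 / n) * (1.1087 + 2.0 * 0.2570 * y + 0.6079 * y * y)


# Hypothetical scale parameter and sample size.
for T in (10, 100, 200):
    print(T, ev1_mom_variance(80.0, 50, T) ** 0.5, ev1_ml_variance(80.0, 50, T) ** 0.5)
```

These are asymptotic values evaluated at known parameters, so they illustrate the ordering of the two methods rather than reproduce the sample-based medians reported in the simulation tables.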

Estimation of Asymptotic Variances by POME
The relation between parameters and constraints for the EV1 distribution can be expressed as [17]: The estimators $\hat{\alpha}$ and $\hat{u}$ of the parameters can be obtained by solving the following equations: where $\bar{Y}$ and $\bar{V}$ are the sample means of the variables defined by $y = (x - u)/\alpha$ and $V = \exp(-y)$, respectively. The variances and covariances of the moments and parameter estimators are written respectively as: According to the derivations in [17], $V_M$ is given by: Taking partial derivatives of $\bar{Y}$ and $\bar{V}$ with respect to $\alpha$ and $u$ yields: where $W = y\exp(-y)$.

Simulation Experiments
In this section, Monte Carlo simulation experiments were performed to evaluate the performance of POME in the calculation of the asymptotic variances and confidence intervals of quantiles and to compare it with the MOM and ML methods. Four kinds of data sets were generated from the Wakeby distribution with parameters as shown in Table 1 [16,30]. The quantile function of the Wakeby distribution is given by [31]:

$$x(F) = \xi + \frac{\alpha}{\beta}\left[1 - (1-F)^{\beta}\right] - \frac{\gamma}{\delta}\left[1 - (1-F)^{-\delta}\right]$$

where F is the uniform (0,1) variate, and $\xi$, $\alpha$, $\beta$, $\gamma$, $\delta$ are the parameters. Ns = 1000 samples with size n (n = 20, 50, 100, 1000) were generated from each Wakeby distribution. Then, the quantiles $\hat{x}_T$ corresponding to different return periods (T = 10, 100, and 200) and their asymptotic variances and confidence intervals were calculated for EV1. Table 2 lists the median values of the estimated quantiles ($\hat{x}_T$), standard errors (St), and confidence interval widths (CI width).
From Table 2, for all methods and all cases, the standard errors and confidence interval widths of the quantiles increased with the return period T and decreased with the sample size. For all cases, the three selected methods exhibited very similar behaviors. Thus, case III is taken as an example in the following discussion.
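Sample generation from a quantile function is a direct inverse-transform step, sketched below. The sketch assumes the five-parameter Wakeby form $x(F) = \xi + (\alpha/\beta)[1 - (1-F)^{\beta}] - (\gamma/\delta)[1 - (1-F)^{-\delta}]$; the parameter values are placeholders, not the values of Table 1, and the function names are hypothetical.

```python
import random


def wakeby_quantile(f, xi, a, b, c, d):
    """Wakeby quantile function x(F) for 0 <= F < 1 (b and d nonzero)."""
    return xi + (a / b) * (1.0 - (1.0 - f) ** b) - (c / d) * (1.0 - (1.0 - f) ** (-d))


def wakeby_samples(n_samples, size, params, seed=0):
    """Draw n_samples samples of the given size by inverse-transform sampling."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    return [[wakeby_quantile(rng.random(), *params) for _ in range(size)]
            for _ in range(n_samples)]


# Placeholder parameters (xi, alpha, beta, gamma, delta), for illustration only.
params = (0.0, 1.0, 5.0, 1.0, 0.2)
samples = wakeby_samples(n_samples=1000, size=50, params=params)
print(len(samples), len(samples[0]))
```

Each generated sample would then be fitted with EV1 by MOM, ML, and POME, and the medians of the resulting quantiles, standard errors, and interval widths taken over the 1000 replicates.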
From case III, it was observed that the POME generally gave the smallest median of both standard errors and confidence interval widths of quantiles regardless of the sample size and return period. MOM was always the worst of the three competing methods and gave the largest results. The results of ML fell between MOM and POME. For example, when the sample size equaled 50, the median standard errors of MOM, ML, and POME quantile estimator for return period T = 100 were 69.2, 66.2, and 63.5, respectively. Correspondingly, the confidence interval widths were 271.4, 259.4, and 248.8, respectively, which indicated that the uncertainty of the POME estimator was less than that of the MOM and the ML estimators. Therefore, the performance of the POME was found to be superior to the MOM and the ML.
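The relation between the quoted standard errors and interval widths can be checked with simple arithmetic. The sketch below assumes a 95% confidence level for the simulation, which this paragraph does not state explicitly, and compares $2u_{0.975}s_T$ with the quoted widths for n = 50 and T = 100.

```python
from statistics import NormalDist

u = NormalDist().inv_cdf(0.975)  # about 1.96 for an assumed 95% interval

# (median standard error, median CI width) quoted for n = 50, T = 100.
reported = {"MOM": (69.2, 271.4), "ML": (66.2, 259.4), "POME": (63.5, 248.8)}

widths = {name: 2.0 * u * se for name, (se, _) in reported.items()}
for name, (se, width) in reported.items():
    print(name, round(widths[name], 1), width)

# Relative reduction of the POME standard error with respect to MOM.
reduction = (reported["MOM"][0] - reported["POME"][0]) / reported["MOM"][0]
print(round(100.0 * reduction, 1))
```

The computed widths agree with the quoted ones to within rounding of the standard errors, which is consistent with the assumed level.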
In addition, for each method considered, the median of both standard errors and confidence interval widths of quantiles decreased significantly when the sample size increased from 20 to 1000. For T = 100, when the sample size increased from 20 to 1000, the median of standard errors of MOM quantile estimators decreased from 109.5 to 8.2, the median of standard errors of ML quantile estimators decreased from 99.9 to 8.7, and the median of standard errors of POME quantile estimators decreased from 99 to 8.3. The median of confidence interval widths decreased from 429.3 to 32.2 for MOM quantile estimators, 391.8 to 34.2 for ML quantile estimators, and 388 to 32.6 for POME quantile estimators. This was an indication of the influence sample size had on the estimation accuracy.

Application
The annual precipitation data from four gauging stations in the Weihe River basin in China were considered as the case study. All data were obtained from the National Climate of China Meteorological Administration and were complete. The detailed information of these data is given in Table 3. The gamma, P3, and EV1 distributions were used to fit the data set, and the MOM, ML, and POME were used to estimate the parameters of these distributions, as given in Table 4. It can be seen that the parameters of the gamma distribution estimated by MOM, ML, and POME were very close, as were those of the EV1 distribution, while those of the P3 distribution differed significantly. To evaluate and compare the performances of the three methods and the distributions, the ordinary least square (OLS) criterion, Akaike information criterion (AIC), and quasi-optimal deterministic coefficient test (QD) were employed and can be defined as: where $x_i$ and $\hat{x}_i$ are the observed data and the predicted values of a given (i-th) quantile, respectively, $\bar{x}$ is the mean value of the observed data, m is the number of parameters of a given model, and n is the sample size. The OLS criterion is recommended as a curve optimization rule for measuring the difference between empirical and theoretical values in hydrological frequency analysis in China. Smaller OLS values represent better performance of the model. The AIC is more appropriate for the comparison of models that have different numbers of parameters. Given a set of candidate models for the data, the best model is the one with the minimum AIC value. QD is used to describe the degree of fit between observed and theoretical values, and the best-fit model is the one whose QD value is closest to 1. The OLS, AIC, and QD were calculated as given in Table 5.

Table 5. Ordinary least square (OLS), Akaike information criterion (AIC) and quasi-optimal deterministic coefficient test (QD) values of three distributions calculated by MOM, ML, and POME.
It is seen from Table 5 that the three criteria select the same best parameter estimation method for each distribution, and they also agree on the best-fitted distribution for each station. Take the Changwu station in Table 5 as an example. According to the smallest OLS and AIC values and the largest QD values, the POME, MOM, and POME are suggested to be the best methods for parameter estimation for the gamma, P3, and EV1 distributions, respectively. The best-fitted distribution for Changwu station recommended by the OLS, AIC, and QD criteria is the P3 distribution. Additionally, according to the results given in Table 5, the best-fitted distribution for the gauging stations Meixian, Tongguan, and Lintong recommended by the OLS, AIC, and QD criteria is the EV1 distribution with the parameters estimated by POME. Thus, the best estimation method for each station is POME, which is consistent with the results of the simulation experiments in Section 4, where the performance of POME was better than that of MOM and ML. The bold values in the table denote the smallest OLS and AIC values and the largest QD values.
The quantiles along with the standard errors and 95% confidence intervals for return periods of 10, 20, 50, 100, 200, and 500 years of the best-fitted distribution based on the parameters estimated by POME are given in Table 6. For the sake of comparison, the quantiles, standard errors, and 95% confidence interval widths based on MOM and ML are also given in Table 6. The results show that the standard errors and confidence interval widths of quantile estimators obtained by POME were smaller than those obtained by the MOM and the ML methods, with the exception of the results for T = 10 at Changwu station, which indicated that the POME yielded more precise parameter and quantile estimates. To better understand the performance of the different methods, the differences in the uncertainty reductions for the standard errors and 95% confidence interval widths of the quantile estimators are given in terms of relative deviation, as shown in Table 7. For the relatively long return periods (T ≥ 50), there were significant reductions in the standard errors and 95% confidence interval widths obtained by POME compared to MOM. For example, for a return period of T = 500, the reductions in the standard errors and the confidence interval widths were about 32%, 17%, 17%, and 16% for Changwu, Lintong, Meixian, and Tongguan, respectively. It can also be seen from Table 6 that, for Changwu station, the reductions in the standard errors and 95% confidence interval widths obtained by POME were significant when compared to ML. For example, the reductions in the standard error and confidence interval width of the 500-year quantile were about 19%. For Lintong, Meixian, and Tongguan stations, the reductions were smaller, at about 6%, 6%, and 5%, respectively. Overall, the POME provided more accurate quantile estimators.
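A small sketch of how such criteria might be computed is given below. The exact definitions used in the study are not reproduced here, so the forms in the code are assumptions based on common usage: OLS as the root mean squared deviation between observed and predicted quantiles, AIC as $n\ln(\mathrm{MSE}) + 2m$, and QD as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the data are hypothetical.

```python
import math


def fit_criteria(observed, predicted, m):
    """Return (OLS, AIC, QD) under the assumed definitions described above.

    observed  : observed values, paired with the predicted quantiles
    predicted : theoretical values at the same empirical frequencies
    m         : number of parameters of the fitted distribution
    """
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    ols = math.sqrt(sse / n)                # assumed: root mean squared deviation
    aic = n * math.log(sse / n) + 2 * m     # assumed: least-squares form of AIC
    qd = 1.0 - sse / sst                    # assumed: deterministic coefficient
    return ols, aic, qd


# Hypothetical data for illustration only.
print(fit_criteria([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], m=2))
```

Under these assumed forms, smaller OLS and AIC values and a QD value closer to 1 all point to the same model whenever the candidates have the same number of parameters, since all three are then monotone functions of the residual sum of squares.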

Conclusions
In this study, the POME method was applied for the estimation of the asymptotic variances and confidence intervals of quantiles, and the corresponding calculation formulas for the gamma, P3, and EV1 distributions based on the POME method were derived. The calculation procedures of the MOM and the ML methods were also reviewed briefly for comparison. Monte Carlo simulation experiments were carried out to evaluate the performance of the POME method and to compare it with the MOM and the ML methods. In addition, annual precipitation data from four stations in the Weihe River basin in China were selected as the case study. The following conclusions were drawn from this study: (1) The calculation formulas of the asymptotic variances and confidence intervals of quantiles for three distributions based on POME are given. The results of the simulation experiments and the case study show that the POME method can provide an effective way of reducing the uncertainty of quantile estimators. (2) Results of the simulation experiments demonstrate that the POME method yields the smallest standard errors and the narrowest confidence intervals of quantile estimators compared with the results of MOM and ML. This may result from fewer sampling errors and approximations in the derivation. Thus, the POME can give more accurate estimates. Furthermore, the standard errors and confidence interval widths of the quantiles increased with the return period T and decreased with the sample size. (3) Results of the case study indicate that the different criteria for distribution selection give consistent results, and the POME is the optimal method for parameter estimation. Furthermore, the POME can give more reliable precipitation quantiles since the standard errors and 95% confidence interval widths of precipitation quantiles obtained by POME are smaller than those obtained by the MOM and the ML methods.
This study investigated the calculation of asymptotic variances and confidence intervals based on POME for three commonly used distributions and compared the performance of POME with that of MOM and ML. The POME-based asymptotic variances and confidence intervals of quantiles for other distributions deserve further investigation.
Author Contributions: S.S. and T.W. designed the computations; S.S. and T.W. wrote the paper. All authors have read and approved the final manuscript.

Funding:
The present study is financially supported by the National Natural Science Foundation of China (Grant Nos. 51479171, 41501022 and 51409222).

Acknowledgments:
The authors would like to thank the editor and anonymous reviewers for their constructive comments, which greatly improved the quality of this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The gamma distribution: estimation of $\operatorname{var}(\bar{W})$ and $\operatorname{cov}(\bar{X},\bar{W})$.

Estimation of var W
For the gamma distribution, the mean and variance are given by: Let y = x/α, x = αy, dx = αdy. Substituting these quantities into Equation (A4) and changing the integral limits, we obtain: