MLE-Based Parameter Estimation for Four-Parameter Exponential Gamma Distribution and Asymptotic Variance of Its Quantiles

Abstract: The choice of a probability distribution function and the confidence interval of estimated design values have long been of interest in flood frequency analysis. Although the four-parameter exponential gamma (FPEG) distribution has been developed for application in hydrology, its maximum likelihood estimation (MLE)-based parameter estimation method and the asymptotic variance of its quantiles have not been well documented. In this study, the MLE method was used to estimate the parameters and confidence intervals of quantiles of the FPEG distribution. This method entails parameter estimation and asymptotic variances of quantile estimators. The parameter estimation consisted of a set of four equations which, after algebraic simplification, were solved using a three-dimensional Levenberg-Marquardt algorithm. Based on the sample information matrix and Fisher's expected information matrix, derivatives of the design quantile with respect to the parameters were derived. The method of estimation was applied to annual precipitation data from the Weihe watershed, China, and confidence intervals for quantiles were determined. Results showed that the FPEG distribution was a good candidate to model annual precipitation data and can provide guidance for estimating design values.


Introduction
Hydrological frequency analysis is important for planning, designing and managing water resources projects. The design values (e.g., design flood, design rainfall) computed by frequency analysis involve uncertainties due to the sampling method, sample length, empirical frequency formula, cumulative distribution function (CDF) or probability density function (PDF), parameter estimation method, goodness-of-fit test, and extent of data extrapolation [1,2]. Among these uncertainty sources, there has been considerable interest in the choice of CDF for a given sample, because the true CDF of a hydrological variable is unknown. Rao and Hamed summarized commonly used distributions (normal and related, gamma family, extreme value, Wakeby, and logistic) as well as their applications [3].
Further, quantifying the uncertainty of the estimated design values is important in the planning, design, and management of water resources projects [1,4]. In practice, standard error and confidence interval (confidence limits) are employed to measure the uncertainty of a statistical quantity [4]. Rao and Hamed provided confidence intervals of some common distributions, with parameters estimated by the method of moments (MOM), the probability weighted moments (PWM), and maximum likelihood estimation (MLE) [3]. Shin [5] and Shin et al. [6] summarized methods for computing confidence intervals. Methods of estimating the confidence intervals of design quantities include Monte Carlo simulation [7], approximate method [8], analytical method [5,9-11], asymptotic variance of the expected moments algorithm [12], bootstrap method [13-16], and standard error of regional population index flood (RPIF) [17]. Studies show that confidence intervals mainly depend on the method of estimation of the parameters of the probability distribution function.
The four-parameter exponential gamma (FPEG) distribution has been applied in hydrology in China and can be specialized into 10 kinds of probability distribution functions: gamma, Pearson type III (P-III), K-M, Weibull, Chi-square, exponential, normal, Pearson type V (P-V), log-normal, and Gumbel. The properties of the FPEG distribution, its relations to other distributions, and its potential for application have been investigated [18,19]. However, the MLE-based parameter estimation method and algorithms for computing confidence intervals of the design values for the FPEG distribution have received little attention.
The objective of this paper, therefore, is to present the MLE method for estimating the FPEG distribution parameters, and to derive confidence intervals of quantiles using asymptotic variances. The method of parameter estimation involves a set of four equations which are solved by a three-dimensional Levenberg-Marquardt algorithm. Following Kendall and Stuart [20], the expected values of the second-order partial derivatives of the log-likelihood function with respect to the parameters, and the explicit formulae for the variances and covariances, are analytically derived. The proposed estimation procedure is illustrated by using observed annual precipitation data.
The paper is organized as follows. Section 2 describes the FPEG distribution and the estimation of its quantiles, and derives the set of four equations of the MLE method for the parameters and the confidence intervals of quantiles. An application to annual precipitation from the Weihe watershed in China follows in Section 3. Conclusions, along with a summary of the main features of the proposed method, are given in Section 4.

Probability Density Function and Cumulative Distribution Function
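The FPEG distribution has the probability density function (PDF), f(x), expressed as [18,19]:
$$f(x) = \begin{cases} \dfrac{\beta^{\alpha}}{b\,\Gamma(\alpha)}\,(x-\delta)^{\frac{\alpha}{b}-1}\exp\left[-\beta\,(x-\delta)^{\frac{1}{b}}\right], & \delta \le x < \infty \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
where α > 0 is the shape parameter; β > 0 is the scale parameter; δ > 0 is the location parameter; b > 0 is the transformation parameter; Γ(α) is the complete gamma function; and x is the value of the random variable X. Figure 1 shows some typical shapes of the PDF. The CDF can be expressed as
$$F(x) = \int_{\delta}^{x} f(u)\,du$$
When the design frequency p is given, the corresponding design value $x_p$ (quantile) can be expressed as
$$p = P(X \ge x_p) = \int_{x_p}^{\infty} f(x)\,dx \qquad (2)$$
Equation (2) can be transformed to a one-parameter gamma distribution using the substitution $t = \beta\,(x-\delta)^{1/b}$, so that $x = \delta + (t/\beta)^{b}$ and $dx = \frac{b}{\beta}(t/\beta)^{b-1}\,dt$. Substitution of these quantities into Equation (2) results in [18,19]:
$$p = \frac{1}{\Gamma(\alpha)}\int_{t_p}^{\infty} t^{\alpha-1}e^{-t}\,dt \qquad (3)$$
where $t_p = \beta\,(x_p-\delta)^{1/b}$, which can be determined by the incomplete gamma function.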
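As a quick numerical check of the density, the following minimal sketch assumes the form f(x) = β^α / (b Γ(α)) · (x − δ)^(α/b − 1) · exp[−β (x − δ)^(1/b)] for x > δ, which is the density implied by the substitution t = β (x − δ)^(1/b) with t following a one-parameter gamma distribution. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def fpeg_pdf(x, alpha, beta, delta, b):
    """FPEG density under the assumed form stated in the lead-in; zero for x <= delta."""
    if x <= delta:
        return 0.0
    z = x - delta
    # Work in logs so that a large alpha does not overflow Gamma(alpha).
    log_f = (alpha * math.log(beta) - math.log(b) - math.lgamma(alpha)
             + (alpha / b - 1.0) * math.log(z) - beta * z ** (1.0 / b))
    return math.exp(log_f)

def integrate(f, lo, hi, steps=20000):
    """Composite midpoint rule; crude but dependency-free."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

# Hypothetical parameter values, not taken from the paper.
alpha, beta, delta, b = 4.0, 2.0, 1.0, 1.5
total = integrate(lambda x: fpeg_pdf(x, alpha, beta, delta, b), delta, delta + 60.0)
print("integral of the density:", round(total, 4))
```

With b = 1 the same function reduces to a three-parameter gamma (P-III) density, which gives a second, independent check of the assumed form.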

Estimation of Quantiles
The quantile corresponding to the probability of exceedance p, $x_p$, is obtained as
$$x_p = \delta + \left(\frac{t_p}{\beta}\right)^{b} \qquad (4)$$
Also, the estimator $x_p$ may be generally written in terms of the mathematical expectation E(X), the coefficient of variation $C_v$, and the frequency factor $\Phi_p$ as
$$x_p = E(X)\,(1 + C_v\,\Phi_p) \qquad (5)$$
Given the probability of exceedance p, the frequency factor $\Phi_p$ can be written in the following form [18,19]:
$$\Phi_p = \frac{\Gamma(\alpha)\,t_p^{\,b} - \Gamma(\alpha+b)}{\sqrt{\Gamma(\alpha)\,\Gamma(\alpha+2b) - \Gamma^{2}(\alpha+b)}} \qquad (6)$$
Note that from Equations (3) and (6), the frequency factor $\Phi_p$ is a function of the probability of exceedance and the parameters α and b only. Some numerical values of this function are shown in Table 1. For large α (e.g., α > 100), the differences among the $\Phi_p$ values for a given p are small.
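The quantile computation can be sketched in a few lines. The sketch below assumes that t_p solves Q(α, t_p) = p, where Q is the regularized upper incomplete gamma function, that x_p = δ + (t_p/β)^b, and that Φ_p = (x_p − E(X))/σ with the moments of X following from E[T^s] = Γ(α + s)/Γ(α) for T gamma-distributed with shape α. The incomplete gamma routine is a standard series and continued-fraction evaluation written here only to keep the example dependency-free; all function names and parameter values are hypothetical.

```python
import math

def gamma_q(a, x, eps=1e-14, max_iter=100000):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_pref = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series for P(a, x); then Q = 1 - P.
        term = 1.0 / a
        total = term
        n = a
        for _ in range(max_iter):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_pref)
    # Modified Lentz evaluation of the continued fraction for Q(a, x).
    tiny = 1e-300
    bb = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / bb
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        bb += 2.0
        d = an * d + bb
        if abs(d) < tiny:
            d = tiny
        c = bb + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        factor = d * c
        h *= factor
        if abs(factor - 1.0) < eps:
            break
    return h * math.exp(log_pref)

def t_quantile(p, alpha, tol=1e-12):
    """Solve Q(alpha, t_p) = p by bisection (Q decreases as t grows)."""
    lo, hi = 0.0, max(1.0, 2.0 * alpha)
    while gamma_q(alpha, hi) > p:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if gamma_q(alpha, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def quantile(p, alpha, beta, delta, b):
    """Design value x_p for exceedance probability p."""
    return delta + (t_quantile(p, alpha) / beta) ** b

def frequency_factor(p, alpha, b):
    """Phi_p = (x_p - E(X)) / sigma; depends on p, alpha and b only."""
    t_p = t_quantile(p, alpha)
    m1 = math.exp(math.lgamma(alpha + b) - math.lgamma(alpha))        # E[T^b]
    m2 = math.exp(math.lgamma(alpha + 2.0 * b) - math.lgamma(alpha))  # E[T^(2b)]
    return (t_p ** b - m1) / math.sqrt(m2 - m1 * m1)

# Hypothetical parameter values, not taken from Table 5.
print(quantile(0.70, 80.0, 4.5, 0.05, 2.1), frequency_factor(0.70, 80.0, 2.1))
```

Working with log-gamma differences rather than Γ(α) itself keeps the moment ratios finite for the large shape values (α of several tens) reported for the study sites.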

Maximum Likelihood Estimation of the Parameters
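For the maximum likelihood estimation (MLE), the log-likelihood function for a sample $x = \{x_1, x_2, \cdots, x_n\}$ drawn from the FPEG distribution can be written as
$$\ln L = n\alpha\ln\beta - n\ln b - n\ln\Gamma(\alpha) + \left(\frac{\alpha}{b}-1\right)\sum_{i=1}^{n}\ln(x_i-\delta) - \beta\sum_{i=1}^{n}(x_i-\delta)^{\frac{1}{b}} \qquad (7)$$
where n is the sample size, δ ≤ x < ∞; α > 0; β > 0. The MLE parameters can be obtained by taking the derivatives of the log-likelihood with respect to the parameters, setting them equal to zero, and solving for the parameters. Differentiating Equation (7) partially with respect to each parameter and equating each partial derivative to zero yields
$$\frac{\partial\ln L}{\partial\delta} = -\left(\frac{\alpha}{b}-1\right)\sum_{i=1}^{n}\frac{1}{x_i-\delta} + \frac{\beta}{b}\sum_{i=1}^{n}(x_i-\delta)^{\frac{1}{b}-1} = 0 \qquad (8)$$
$$\frac{\partial\ln L}{\partial\alpha} = n\ln\beta - n\,\psi(\alpha) + \frac{1}{b}\sum_{i=1}^{n}\ln(x_i-\delta) = 0 \qquad (9)$$
$$\frac{\partial\ln L}{\partial\beta} = \frac{n\alpha}{\beta} - \sum_{i=1}^{n}(x_i-\delta)^{\frac{1}{b}} = 0 \qquad (10)$$
$$\frac{\partial\ln L}{\partial b} = -\frac{n}{b} - \frac{\alpha}{b^{2}}\sum_{i=1}^{n}\ln(x_i-\delta) + \frac{\beta}{b^{2}}\sum_{i=1}^{n}(x_i-\delta)^{\frac{1}{b}}\ln(x_i-\delta) = 0 \qquad (11)$$
where $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$ is the psi (digamma) function. Equations (8)-(11) can be solved numerically to obtain the parameters $\hat{\alpha}$, $\hat{\beta}$, $\hat{\delta}$, and $\hat{b}$.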
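The log-likelihood is straightforward to evaluate directly. The following minimal sketch assumes the density f(x) = β^α / (b Γ(α)) · (x − δ)^(α/b − 1) · exp[−β (x − δ)^(1/b)], so that ln L = nα ln β − n ln b − n ln Γ(α) + (α/b − 1) Σ ln(x_i − δ) − β Σ (x_i − δ)^(1/b). Setting the β-derivative to zero then gives β in closed form for fixed α, δ, and b. The function names and the sample are hypothetical.

```python
import math

def fpeg_loglik(xs, alpha, beta, delta, b):
    """Log-likelihood of an FPEG sample under the assumed density."""
    if min(xs) <= delta:
        return float("-inf")  # the support requires every x_i > delta
    n = len(xs)
    sum_log = sum(math.log(x - delta) for x in xs)
    sum_pow = sum((x - delta) ** (1.0 / b) for x in xs)
    return (n * alpha * math.log(beta) - n * math.log(b) - n * math.lgamma(alpha)
            + (alpha / b - 1.0) * sum_log - beta * sum_pow)

def beta_hat(xs, alpha, delta, b):
    """Closed-form beta from d(ln L)/d(beta) = n*alpha/beta - sum((x - delta)^(1/b)) = 0."""
    return len(xs) * alpha / sum((x - delta) ** (1.0 / b) for x in xs)

# Hypothetical sample and trial parameters, for illustration only.
sample = [3.2, 4.1, 5.7, 2.9, 6.3, 4.8]
bh = beta_hat(sample, alpha=3.0, delta=1.0, b=1.2)
print(bh, fpeg_loglik(sample, 3.0, bh, 1.0, 1.2))
```

Because β can be eliminated this way, the numerical search only needs to run over the three remaining parameters, which is why a three-dimensional solver suffices.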

Confidence Intervals of Quantiles
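For a given p, the design value estimate $\hat{x}_p$ is a random variable. The 1 − q confidence intervals for the population quantiles $x_p$ may be determined by [2,3]
$$\hat{x}_p \pm u_{1-q/2}\,S_{\hat{x}_p} \qquad (12)$$
where $u_{1-q/2}$ is the 1 − q/2 quantile of the standard normal distribution, $\hat{x}_p$ is the quantile estimator corresponding to the probability of exceedance p, which can be determined from Equation (4) or Equation (5), and $S_{\hat{x}_p}$ is the standard deviation or standard error of $\hat{x}_p$. The standard error determined by the MLE is given in what follows. For the FPEG distribution, when parameters α, β, δ, and b are estimated by the MLE, $x_p$ is a function of α, β, δ, and b: $x_p = f(\alpha, \beta, \delta, b)$. The variance in this case is given by [21]
$$\mathrm{Var}(x_p) = \sum_{i=1}^{4}\left(\frac{\partial x_p}{\partial\theta_i}\right)^{2}\mathrm{Var}(\theta_i) + 2\sum_{i<j}\frac{\partial x_p}{\partial\theta_i}\frac{\partial x_p}{\partial\theta_j}\,\mathrm{Cov}(\theta_i,\theta_j) \qquad (13)$$
where $(\theta_1, \theta_2, \theta_3, \theta_4) = (\alpha, \beta, \delta, b)$. The variance and covariance matrix of the parameters in Equation (13) is the inverse of Fisher's expected information matrix [3]:
$$\left[\mathrm{Cov}(\theta_i,\theta_j)\right] = \left[E\left(-\frac{\partial^{2}\ln L}{\partial\theta_i\,\partial\theta_j}\right)\right]^{-1}$$
where E represents the expected value. The elements of Fisher's expected information matrix can be determined by taking the expected value of the sample information matrix. The elements of the sample information matrix and of Fisher's expected information matrix are derived in Equations (A1)-(A16) and Equations (A17)-(A32) in Appendix A, respectively.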
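The variance propagation and the interval construction can be illustrated with a short sketch. It assumes the usual first-order (delta-method) form Var(x_p) = gᵀ V g, where g is the vector of partial derivatives of x_p with respect to the parameters and V is the parameter variance-covariance matrix, and a symmetric normal interval x_p ± u_{1−q/2} S. The function names and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist

def quantile_variance(grad, cov):
    """First-order variance g^T V g; the off-diagonal terms of V supply the covariance part."""
    k = len(grad)
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))

def confidence_interval(x_p, variance, q=0.05):
    """Symmetric (1 - q) interval around x_p using the standard normal quantile."""
    u = NormalDist().inv_cdf(1.0 - q / 2.0)
    half_width = u * math.sqrt(variance)
    return x_p - half_width, x_p + half_width

# Hypothetical two-parameter illustration (the FPEG case has four parameters).
grad = [1.0, 2.0]
cov = [[4.0, 1.0],
       [1.0, 9.0]]
var_xp = quantile_variance(grad, cov)
print(var_xp, confidence_interval(100.0, var_xp))
```

Writing the variance as a quadratic form avoids listing the four variance terms and six covariance terms separately and makes it easy to swap in an expected or an observed information matrix.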
The derivatives of $x_p$ with respect to the parameters of the FPEG distribution are obtained by differentiating Equation (4). For $\partial x_p/\partial\alpha$ in Equation (33), p is a constant and is a function of $t_p$ and α through Equation (3), so $\partial t_p/\partial\alpha$ is obtained by implicit differentiation of Equation (3). The psi function appearing in the resulting expression has a different meaning from that in Equation (9); values obtained numerically are listed in Table 2.
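As a worked sketch of these derivatives, assume $x_p = \delta + (t_p/\beta)^{b}$ and $p\,\Gamma(\alpha) = \int_{t_p}^{\infty} t^{\alpha-1}e^{-t}\,dt$, with $t_p$ depending only on p and α. The notation below is illustrative and may differ in form from the expressions used in the paper; $\Gamma'(\alpha)$ is written explicitly to avoid any clash with the psi notation.

```latex
\begin{aligned}
\frac{\partial x_p}{\partial \delta} &= 1, \qquad
\frac{\partial x_p}{\partial \beta} = -\frac{b}{\beta}\left(\frac{t_p}{\beta}\right)^{b}, \qquad
\frac{\partial x_p}{\partial b} = \left(\frac{t_p}{\beta}\right)^{b}\ln\frac{t_p}{\beta}, \\[4pt]
\frac{\partial x_p}{\partial \alpha} &= \frac{b}{\beta}\left(\frac{t_p}{\beta}\right)^{b-1}\frac{\partial t_p}{\partial \alpha}, \\[4pt]
% Differentiate p\,\Gamma(\alpha) = \int_{t_p}^{\infty} t^{\alpha-1} e^{-t}\,dt with p held fixed:
p\,\Gamma'(\alpha) &= \int_{t_p}^{\infty} t^{\alpha-1} e^{-t}\ln t\,dt \;-\; t_p^{\alpha-1} e^{-t_p}\,\frac{\partial t_p}{\partial \alpha}, \\[4pt]
\frac{\partial t_p}{\partial \alpha} &= e^{t_p}\,t_p^{\,1-\alpha}\left[\int_{t_p}^{\infty} t^{\alpha-1} e^{-t}\ln t\,dt \;-\; p\,\Gamma'(\alpha)\right].
\end{aligned}
```

The remaining integral has no elementary closed form, which is consistent with the statement that the corresponding values are obtained numerically.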

Data and Case Study
Annual precipitation data from eight sites in the Weihe watershed, China (1959-2007), were used to compute the parameters, quantiles, and confidence intervals for the FPEG distribution. All data were obtained from the National Climate of China Meteorological Administration (http://data.cma.cn (accessed on 29 July 2021)) and are complete. The sites and some statistical characteristics of the data are summarized in Table 4. For annual precipitation from these eight sites, the values of skewness were lower than 1 and the values of kurtosis were higher than 3. All the annual precipitation records also had very low first-order serial correlation coefficients. Anderson's test of independence showed that these gauge data are independent at the 90% confidence level. Hence, they are considered suitable for precipitation frequency analysis. One of the advantages of the FPEG distribution is that it accommodates a wide range of skewness and kurtosis values, which is one reason it was applied to these sites.
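The independence check can be sketched as follows. The sketch assumes the commonly used form of Anderson's limits for the lag-k serial correlation coefficient of an independent series, (−1 ± u·sqrt(n − k − 1)) / (n − k), with u the standard normal quantile for the chosen level; the paper does not state the exact formula it used, so this form and the function names are assumptions.

```python
import math
from statistics import NormalDist

def lag1_autocorrelation(xs):
    """Sample first-order serial correlation coefficient."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def anderson_limits(n, k=1, level=0.90):
    """Lower and upper limits for r_k under independence (assumed Anderson form)."""
    u = NormalDist().inv_cdf(0.5 + level / 2.0)
    root = math.sqrt(n - k - 1)
    return (-1.0 - u * root) / (n - k), (-1.0 + u * root) / (n - k)

def is_independent(xs, level=0.90):
    """True when r_1 falls inside the limits, i.e. independence is not rejected."""
    lower, upper = anderson_limits(len(xs), 1, level)
    return lower <= lag1_autocorrelation(xs) <= upper

# Limits for a 49-year record such as 1959-2007.
print(anderson_limits(49))
```

A record whose lag-1 coefficient lies inside these limits would not be rejected as independent at the chosen level.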

Parameters Estimation
Since there is no explicit solution for the parameters in Equations (8)-(11), the three-dimensional Levenberg-Marquardt algorithm was used to obtain a numerical solution for the MLE estimates of α, β, δ, and b. The procedure is summarized as follows [22].
(1) From Equation (10), $\beta = n\alpha \big/ \sum_{i=1}^{n}(x_i-\delta)^{1/b}$. Substituting this quantity into Equations (8), (9), and (11), respectively, the result is a system of three nonlinear equations in δ, α, and b, written here as $F(y) = 0$ with $y = (\delta, \alpha, b)^{T}$.
(2) Employ the three-dimensional Levenberg-Marquardt algorithm to solve for parameters δ, α and b:
$$y_{i+1} = y_i - \left[J^{T}(y_i)\,J(y_i) + c_i I\right]^{-1} J^{T}(y_i)\,F(y_i)$$
where $c_i \ge 0$ is the scaling factor; $y_i$ and $y_{i+1}$ are the parameter vector estimates $(\delta, \alpha, b)^{T}$ at iteration i and i + 1, respectively; I is the three-dimensional identity matrix; $J(y_i)$ is the Jacobian matrix at $y_i$, the elements of which can be numerically calculated by central differences or from their first derivatives derived in Equations (A73)-(A85); and $J^{T}(y_i)$ is the transpose of $J(y_i)$. Throughout this paper the iterative procedure was repeated until the relative change in all parameters was less than 0.01%, that is, $\max_k \left|(y_{i+1,k} - y_{i,k})/y_{i,k}\right| < 10^{-4}$.
(3) After obtaining parameters $\hat{\delta}$, $\hat{\alpha}$ and $\hat{b}$, and substituting these quantities into Equation (10), one obtains parameter $\hat{\beta}$.
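The iteration in step (2) can be sketched generically. The code below is a minimal, dependency-free Levenberg-Marquardt loop with a central-difference Jacobian, a simple accept/reject rule for adapting the scaling factor (an assumption, since the paper does not state how c_i was updated), and the 0.01% relative-change stopping rule. It is demonstrated on a hypothetical three-equation toy system rather than on the FPEG likelihood equations; all names are illustrative.

```python
def solve_linear(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def numerical_jacobian(func, y, h=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates dF_i/dy_j."""
    cols = []
    for j in range(len(y)):
        step = h * max(1.0, abs(y[j]))
        y_plus, y_minus = y[:], y[:]
        y_plus[j] += step
        y_minus[j] -= step
        f_plus, f_minus = func(y_plus), func(y_minus)
        cols.append([(fp - fm) / (2.0 * step) for fp, fm in zip(f_plus, f_minus)])
    return [[cols[j][i] for j in range(len(y))] for i in range(len(cols[0]))]

def levenberg_marquardt(func, y0, c=1e-3, tol=1e-4, max_iter=200):
    """Iterate y <- y - (J^T J + c I)^(-1) J^T F(y) until the relative change < tol."""
    y = list(y0)
    f = func(y)
    cost = sum(v * v for v in f)
    n, m = len(y), len(f)
    for _ in range(max_iter):
        if cost < 1e-24:
            break
        jac = numerical_jacobian(func, y)
        jtj = [[sum(jac[k][i] * jac[k][j] for k in range(m)) for j in range(n)]
               for i in range(n)]
        jtf = [sum(jac[k][i] * f[k] for k in range(m)) for i in range(n)]
        damped = [[jtj[i][j] + (c if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
        step = solve_linear(damped, [-v for v in jtf])
        trial = [y[i] + step[i] for i in range(n)]
        f_trial = func(trial)
        cost_trial = sum(v * v for v in f_trial)
        if cost_trial < cost:
            # Accept the step, relax the damping, and test the 0.01% criterion.
            rel = max(abs(step[i]) / max(abs(y[i]), 1e-12) for i in range(n))
            y, f, cost = trial, f_trial, cost_trial
            c *= 0.1
            if rel < tol:
                break
        else:
            c *= 10.0  # reject the step and increase the damping
    return y

def toy_system(y):
    """Hypothetical system with a root at (1, 2, 3); not the FPEG equations."""
    return [y[0] + y[1] + y[2] - 6.0,
            y[0] * y[1] * y[2] - 6.0,
            y[0] ** 2 + y[1] ** 2 + y[2] ** 2 - 14.0]

solution = levenberg_marquardt(toy_system, [0.8, 2.3, 3.2])
print(solution)
```

For the FPEG problem, `toy_system` would be replaced by a function returning the three left-side functions of the reduced likelihood equations for a trial (δ, α, b), with δ kept below the smallest observation.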
The values of the distribution parameters are given in Table 5. For the eight sites, the values of α fell in the range (72, 92), the values of β were higher than 4, and the values of δ were lower than 0.1 except at one site, where δ was as high as 1.75. The sixth, seventh and eighth columns show the computed values of the left-side functions in Equations (8), (9), and (11). These computed values were close to zero, indicating a satisfactory performance of the three-dimensional Levenberg-Marquardt algorithm.

Goodness-of-Fit Tests and Confidence Interval Calculation
Goodness-of-fit tests are designed to measure the agreement between a theoretical probability distribution and an empirical distribution for a random sample. Here, we used the Kolmogorov-Smirnov (K-S) test statistic $D_n$ for the goodness-of-fit test of the FPEG distribution. The K-S statistic $D_n$ is also called an empirical distribution function test statistic, because it measures the distance between a continuous distribution function and the empirical distribution function.
Let $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$ be the order statistics of a sample of size n whose population is defined by a continuous cumulative distribution function F(x), and let $F_0(x)$ be a specified distribution that contains a set of parameters θ (θ is the value estimated from the sample of size n). For an annual precipitation series, the null hypothesis $H_0$ that the true distribution was $F_0$ with parameters θ was tested. The K-S test statistic $D_n$ can be expressed as [2]:
$$D_n = \sup_{x}\left|F_n(x) - F_0(x)\right|$$
where $F_n(x)$ is the empirical distribution function of the sample. The sample values of the K-S test statistic $D_n$ are shown in Table 6. The critical value $D_n^{*}$ of the FPEG distribution (at the significance level a = 0.05, for sample size n) was 0.1940. The statistics of the observed annual precipitation series were all less than the critical value, so the null hypothesis was not rejected for any of the series.
Given the standard errors of the quantile estimates obtained by the above methods, the 95 percent confidence intervals may be set at ±1.96 standard errors around the $x_p$ values. Table 7 shows the quantiles and confidence interval widths estimated by the above methods for different probabilities of exceedance. For example, the p = 70% annual precipitation at the Binxian site was 465.56 mm. Using Equation (12), the 95% confidence interval for the p = 70% annual precipitation was 430.66 mm to 500.47 mm, with a width of 69.80 mm. From Table 7, it is seen that for p = 30-95% the estimated confidence interval widths were much smaller than those for p < 30% and p > 95%.
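The K-S statistic is easy to compute from the order statistics. The sketch below uses the standard computational form, the largest of i/n − F0(x_(i)) and F0(x_(i)) − (i − 1)/n over the sorted sample, which equals the supremum distance between the empirical and fitted distribution functions. The large-sample approximation 1.358/sqrt(n) for the 5% critical value is an assumption on our part; it reproduces 0.1940 for n = 49, but the paper does not say how its critical value was obtained. Function names are hypothetical.

```python
import math

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # Check the gap just after and just before the i-th jump of the empirical CDF.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d

def ks_critical_5pct(n):
    """Large-sample 5% critical value (assumed approximation)."""
    return 1.358 / math.sqrt(n)

# Toy check against a uniform(0, 1) CDF; the data are hypothetical.
d_n = ks_statistic([0.1, 0.4, 0.7], lambda x: x)
print(d_n, ks_critical_5pct(49))
```

For the FPEG fit, `cdf` would be the non-exceedance probability 1 − p evaluated at each observation with the estimated parameters.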

Conclusions
The use of the FPEG distribution has received only limited attention from the hydrologic community, but some investigations in China suggest that this distribution performs well in modeling hydrological data. The MLE is proposed for determining the parameters and confidence intervals of the FPEG distribution. It involves parameter estimation and asymptotic variances of quantile estimators. The parameter estimation formulas constitute a system of nonlinear equations that have tedious forms. However, this should not be an insurmountable difficulty with the Levenberg-Marquardt algorithm, given the available numerical tools and computer power. Analytical expressions of the sample information matrix and Fisher's expected information matrix, and derivatives of the design value with respect to the parameters, were then derived. The asymptotic variances of the MLE quantile estimators for the FPEG distribution were expressed as a function of the probability (return period), parameters and sample size. Such variances can be employed for estimating the confidence intervals of the FPEG distribution quantiles. The FPEG distribution was applied to precipitation data of the Weihe watershed in China. The observed annual precipitation data were all accepted by the Kolmogorov-Smirnov test. These results showed that the FPEG distribution is a good candidate for modelling annual precipitation data. We expect that our results will provide guidance for estimating design values of random variables in other parts of the world. In addition, Bayesian inference is a promising method for estimating the parameters and quantiles of the FPEG distribution, and will be studied further.

Fisher's Expected Information Matrix
Multiplying Equations (A1)-(A16) and taking the mathematical expectation, the elements of Fisher's expected information matrix can be obtained as Equations (A17)-(A32). Because these equations contain some unknown mathematical expectations, these expectations need to be derived first.
The required expectations are given in Equations (A33)-(A56). Following the binomial theorem, $(x + a)^{n} = \sum_{k=0}^{n}\binom{n}{k}x^{k}a^{n-k}$, Equation (A42) can be expanded term by term. Substituting Equations (A33)-(A56) into Equations (A17)-(A32), we can get the elements of the expected information matrix.
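As an illustration of the kind of expectation required, assume that $T = \beta\,(X-\delta)^{1/b}$ follows a one-parameter gamma distribution with shape α, so that $E[T^{s}] = \Gamma(\alpha+s)/\Gamma(\alpha)$ for $\alpha + s > 0$ and $E[\ln T] = \psi(\alpha)$, with ψ the digamma function. The results below follow directly from that assumption; they are a sketch and are not claimed to match the numbering or form of Equations (A33)-(A56).

```latex
\begin{aligned}
E\!\left[(X-\delta)^{r}\right]
  &= \beta^{-rb}\,E\!\left[T^{\,rb}\right]
   = \frac{\Gamma(\alpha+rb)}{\beta^{\,rb}\,\Gamma(\alpha)}, \qquad \alpha + rb > 0, \\[4pt]
E\!\left[(X-\delta)^{1/b}\right]
  &= \frac{E[T]}{\beta} = \frac{\alpha}{\beta}, \\[4pt]
E\!\left[\ln(X-\delta)\right]
  &= b\left(E[\ln T] - \ln\beta\right)
   = b\left[\psi(\alpha) - \ln\beta\right].
\end{aligned}
```

Negative powers such as $E[(X-\delta)^{-1}]$ follow from the first line with $r = -1$ and exist only when $\alpha > b$.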