Entropy-Based Parameter Estimation for the Four-Parameter Exponential Gamma Distribution

Two methods based on the principle of maximum entropy (POME), the ordinary entropy method (ENT) and the parameter space expansion method (PSEM), are developed for estimating the parameters of a four-parameter exponential gamma distribution. Using six data sets of annual precipitation in the Weihe River basin in China, the PSEM was applied to estimate the parameters of the four-parameter exponential gamma distribution and was compared with the method of moments (MOM) and maximum likelihood estimation (MLE). It is shown that PSEM enables the four-parameter exponential gamma distribution to fit the data well, and can further improve the estimation.


Introduction
Hydrological frequency analysis is a statistical prediction method that consists of studying past events that are characteristic of a particular hydrological process in order to determine the probabilities of the occurrence of these events in the future [1,2]. It is widely used for planning, design, and management of water resource systems. Probability distributions containing four or more parameters may exhibit some useful properties [3]: (1) versatility and (2) the ability to represent data from mixed populations. Popular distributions of this kind include the Wakeby, two-component lognormal, two-component extreme value, and four-parameter kappa distributions. Since the pioneering frequency analysis of streamflow records by Herschel and Freeman during the period from 1880 to 1890, hydrological frequency analysis has undergone extensive further development. There is a multitude of methods for estimating the parameters of hydrologic frequency distributions. Some of the popular methods include [3,4]: (1) the method of moments; (2) the method of probability weighted moments; (3) the method of mixed moments; (4) L-moments; (5) maximum likelihood estimation; (6) the least squares method; and (7) the entropy-based parameter estimation method.
Among the above parameter estimation methods, entropy, which is a measure of the uncertainty of random variables, has attracted much attention and has been used for a variety of applications in hydrology [5-23]. Examples include an entropy-based derivation of the daily rainfall probability distribution [24] and the Burr XII-Singh-Maddala (BSM) distribution function, derived from the maximum entropy principle using the Boltzmann-Shannon entropy with some constraints [25]. "Entropy-Based Parameter Estimation in Hydrology" is the first book focusing on parameter estimation using entropy for a number of distributions frequently used in hydrology [3], including the uniform distribution, exponential distribution, normal distribution, two-parameter lognormal distribution, three-parameter lognormal distribution, extreme value type I distribution, log-extreme value type I distribution, extreme value type III distribution, generalized extreme value distribution, Weibull distribution, gamma distribution, Pearson type III distribution, log-Pearson type III distribution, beta distribution, two-parameter log-logistic distribution, three-parameter log-logistic distribution, two-parameter Pareto distribution, two-parameter generalized Pareto distribution, three-parameter generalized Pareto distribution and two-component extreme value distribution.
Recently, two entropy-based methods, the ordinary entropy method (ENT) and the parameter space expansion method (PSEM), both based on the principle of maximum entropy (POME), have been applied for estimating the parameters of the extended Burr XII distribution and the four-parameter kappa distribution [5,16]. The results show that the entropy method enables these two distributions to fit the data better than the other estimation methods. In entropy-based parameter estimation, the distribution parameters are expressed in terms of the given constraints, so the method also provides a way to derive the distribution from the specified constraints. The general procedure for the ENT for a hydrologic frequency distribution involves the following steps [3]: (1) define the given information in terms of the constraints; (2) maximize the entropy subject to the given information; and (3) relate the parameters to the given information. The PSEM employs an enlarged parameter space and maximizes the entropy with respect to both the parameters and the Lagrange multipliers [3]. The parameters of the distribution can then be estimated by maximizing the entropy function.
The Pearson III distribution is recommended as a standard distribution to fit hydrological data in China. In addition, the generalized Pareto distribution (GPD), the generalized extreme value (GEV) distribution and the three-parameter Burr type XII distribution have also been applied to flood frequency analysis [26].
Inspired in large part by the two-parameter gamma distribution, a four-parameter exponential gamma distribution has been developed and applied in many areas, such as wind and flood frequency analysis in the Yellow River basin, Yangtze River basin, Aumer Basin and Liaohe River basin of China [4]. Depending on the parameter values, the four-parameter exponential gamma distribution can be turned into a Pearson type III distribution, Weibull distribution, Maxwell distribution, Kritsky and Menkel distribution, Chi-square distribution, Poisson distribution, half-normal distribution and half-Laplace distribution. The properties of the four-parameter exponential gamma distribution and the relations between this distribution and other distributions have been investigated [4]. These investigations suggest that the four-parameter exponential gamma distribution may have potential in hydrology. Despite the advances mentioned above, entropy-based parameter estimation for the four-parameter exponential gamma distribution has received comparatively little attention from the hydrologic community.
The objective of this paper is to apply two entropy-based methods that both use the POME to estimate the parameters of the four-parameter exponential gamma distribution; to compute the annual precipitation quantiles using this distribution for different return periods; and to compare the resulting parameters with those obtained by the method of moments (MOM) and maximum likelihood estimation (MLE).

Probability Density Function and Cumulative Distribution Function
The probability density function (PDF) of the four-parameter exponential gamma distribution can be expressed as [4]: where α, β, δ and b are, respectively, the shape, scale, location and transformation parameters.
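The density and distribution function can be evaluated numerically. The sketch below assumes the density has the form f(x) = β^α / (b Γ(α)) · (x − δ)^(α/b − 1) · exp[−β (x − δ)^(1/b)] for x > δ, which is equivalent to (x − δ)^(1/b) following a gamma distribution with shape α and rate β. This form is an assumption here, since the equation itself is not reproduced above, and the function names are illustrative only.

```python
import math


def egd_pdf(x, alpha, beta, delta, b):
    """Density under the ASSUMED form stated in the lead-in (not taken from the text)."""
    if x <= delta:
        return 0.0
    t = x - delta
    # Work in log space to avoid overflow in beta**alpha or Gamma(alpha).
    log_f = (alpha * math.log(beta) - math.log(b) - math.lgamma(alpha)
             + (alpha / b - 1.0) * math.log(t) - beta * t ** (1.0 / b))
    return math.exp(log_f)


def reg_lower_gamma(a, z, max_terms=500):
    """Regularized lower incomplete gamma function P(a, z) via its power series."""
    if z <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= z / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def egd_cdf(x, alpha, beta, delta, b):
    """CDF under the same assumption: F(x) = P(alpha, beta * (x - delta)**(1/b))."""
    if x <= delta:
        return 0.0
    return reg_lower_gamma(alpha, beta * (x - delta) ** (1.0 / b))
```

With α = 1, b = 1 and δ = 0 these functions reduce to the exponential distribution with rate β, which gives a quick sanity check. The power series is adequate for moderate arguments; a production implementation would normally switch to a continued fraction for large arguments.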

Quantile Corresponding to the Probability of Exceedance
The quantile corresponding to the probability of exceedance $p$, $x_p$, is obtained by Equation (14) or Equation (15): here, $\bar{x}$ and $C_v$ are the mean and coefficient of variation of a sample, respectively, and $\Phi_p$ is the frequency factor corresponding to $x_p$. Given the expectation and variance of the population, the frequency factor $\Phi_p$ is given by [4]: where $\mu_1$ and $\mu_2$ are the expectation and variance of the population. If $b = 10$, the frequency factors of the four-parameter exponential gamma distribution, $\Phi_p$, are very close to those of the log-normal distribution (Table 1). If $C_s = 1.1395$, the $\Phi_p$ values are very close to those of the Gumbel distribution (Table 2).
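The relation between a quantile and its frequency factor can be illustrated with a short sketch. It assumes the standard frequency-factor forms used in hydrology, $x_p = \bar{x}(1 + C_v \Phi_p)$ and $\Phi_p = (x_p - \mu_1)/\sqrt{\mu_2}$; the equations of the text are not reproduced above, so these forms, the function names and the numbers in the usage lines are assumptions for illustration only.

```python
import math


def frequency_factor(x_p, mean, variance):
    """Standardized quantile: (x_p - mean) / standard deviation (assumed form)."""
    return (x_p - mean) / math.sqrt(variance)


def quantile_from_frequency_factor(phi_p, sample_mean, cv):
    """Quantile from a frequency factor: mean * (1 + Cv * Phi_p) (assumed form)."""
    return sample_mean * (1.0 + cv * phi_p)


# Hypothetical numbers: mean annual precipitation 600 mm, Cv = 0.2, Phi_p = 2.0.
x_p = quantile_from_frequency_factor(2.0, 600.0, 0.2)
# Round trip: the standard deviation is Cv * mean = 120 mm.
phi_back = frequency_factor(x_p, 600.0, (0.2 * 600.0) ** 2)
```

In this hypothetical case the quantile is about 840 mm, and converting it back recovers a frequency factor of about 2.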

Cumulants and Moments
The first three cumulants of the four-parameter exponential gamma distribution are expressed as [4]: Using the relations between moments and cumulants and Equations (17)-(19), the expressions for the first four moments of the four-parameter exponential gamma distribution are given below: In the next sections, we use two methods of parameter estimation, ENT and PSEM, to derive the parameter estimation equations of the four-parameter exponential gamma distribution.

Ordinary Entropy Method
For ENT, three steps are involved in the estimation of the parameters of a probability distribution: (1) specification of appropriate constraints, (2) derivation of the entropy function of the distribution, and (3) derivation of the relations between parameters and constraints [3,16].

Specification of Constraints
Taking the natural logarithm of Equation (1), we obtain: Multiplying Equation (24) by $[-f(x)]$ and integrating from δ to ∞, we obtain the entropy function: To maximize $S$ in Equation (25), the following constraints should be satisfied:
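The two steps described above can be written out as a sketch, assuming the density has the form $f(x) = \frac{\beta^{\alpha}}{b\,\Gamma(\alpha)}(x-\delta)^{\alpha/b-1}e^{-\beta(x-\delta)^{1/b}}$ for $x > \delta$. This form is an assumption here (the equations of the text are not reproduced), so the expressions below are illustrative rather than a restatement of Equations (24) and (25).

```latex
% Assumed density: f(x) = \beta^{\alpha} / (b \Gamma(\alpha)) (x-\delta)^{\alpha/b - 1} e^{-\beta (x-\delta)^{1/b}}
\ln f(x) = \alpha \ln\beta - \ln b - \ln\Gamma(\alpha)
           + \left(\frac{\alpha}{b} - 1\right)\ln(x-\delta) - \beta\,(x-\delta)^{1/b}

S = -\int_{\delta}^{\infty} f(x)\,\ln f(x)\,dx
  = -\alpha \ln\beta + \ln b + \ln\Gamma(\alpha)
    - \left(\frac{\alpha}{b} - 1\right) E[\ln(x-\delta)]
    + \beta\, E\!\left[(x-\delta)^{1/b}\right]
```

Under this assumption the entropy depends on the data only through the expectations $E[\ln(x-\delta)]$ and $E[(x-\delta)^{1/b}]$, which is why expectations of this kind are natural candidates for constraints alongside the normalization condition.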

Construction of Partition Function and Zeroth Lagrange Multiplier
The least-biased PDF, $f(x)$, consistent with Equations (26) to (29) and corresponding to the POME takes the form: where $\lambda_0$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are Lagrange multipliers. Substitution of Equation (30) in Equation (26) yields: The argument of the exponential function on the left side of Equation (31) has two parts: the zeroth Lagrange multiplier, which does not involve the random variable, and the Lagrange multipliers $\lambda_1$, $\lambda_2$ and $\lambda_3$, which multiply functions of the random variable. The zeroth Lagrange multiplier part is separated out and is expressed as: To calculate the above integral, let: Substituting the above quantities in Equation (32), we obtain: Taking the logarithm of Equation (33) gives the zeroth Lagrange multiplier $\lambda_0$ as a function of the Lagrange multipliers $\lambda_1$, $\lambda_2$ and $\lambda_3$, with the expression given as:

Relation between Lagrange Multiplier and Parameters
Introduction of Equation (34) in Equation (30) produces: A comparison of Equation (48) with Equation (1) shows that:

Relation between Parameters and Constraints
The four-parameter exponential gamma distribution has four parameters α, β, δ and b that are related to the Lagrange multipliers by Equations (49) and (50). In turn, these parameters are related to the known constraints by Equations (44)-(47). Eliminating the Lagrange multipliers among these four sets of equations, we obtain the following equations:

Parameter Space Expansion Method

Specification of Constraints
Following reference [3], the constraints consistent with the POME method and appropriate for the four-parameter exponential gamma distribution are specified by Equations (26), (27) and (29).

Construction of Zeroth Lagrange Multiplier
The least-biased PDF corresponding to POME and consistent with Equations (26), (27) and (29) takes the form: where $\lambda_0$, $\lambda_1$ and $\lambda_2$ are Lagrange multipliers. Substitution of Equation (52) into Equation (26) yields: Substituting the above quantities in Equation (54) and changing the limits of integration, we obtain: This yields the zeroth Lagrange multiplier:

Derivation of Entropy Function
Introduction of Equation (56) into Equation (52) yields: A comparison of Equation (57) with Equation (1) shows that: Taking the logarithm of Equation (57) yields: Then, making use of Equation (60), the entropy function can be written as:

Relation between Parameters and Constraints
Taking partial derivatives of Equation (61) with respect to b, δ, $\lambda_1$ and $\lambda_2$, and equating each derivative to zero yields: Introduction of Equations (58)-(59) into Equations (62)-(65) and recalling Equations (62)-(65) yields, respectively: The expectations in Equation (66) are replaced by their sample estimates, and the simplification of Equation (66) leads to: Equation (51) involves second moments, which introduces some bias. Therefore, Equation (67) should be used for the estimation of the parameters.

Two Other Parameter Estimation Methods
Two other methods of parameter estimation frequently used in hydrology are the method of moments (MOM) and the MLE method.

Method of Moments
The four-parameter exponential gamma distribution has four parameters α, β, δ and b. Therefore, four moments are needed for parameter estimation. The detailed derivation of the four moments is presented in Appendix A: For a sample, $x = \{x_1, x_2, \cdots, x_n\}$, the estimation equations become: where $n$ is the sample size and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
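The two ingredients of the method of moments, theoretical moments and sample moments, can be sketched as follows. The theoretical part assumes the density f(x) = β^α / (b Γ(α)) · (x − δ)^(α/b − 1) · exp[−β (x − δ)^(1/b)], for which the substitution y = (x − δ)^(1/b) gives E[(X − δ)^r] = Γ(α + r b) / (Γ(α) β^(r b)). Both the assumed density and the function names are illustrative and are not taken from Appendix A.

```python
import math


def shifted_moment(r, alpha, beta, b):
    """E[(X - delta)^r] under the ASSUMED density (see lead-in)."""
    return math.exp(math.lgamma(alpha + r * b) - math.lgamma(alpha)
                    - r * b * math.log(beta))


def mean_and_variance(alpha, beta, delta, b):
    """Population mean and variance from the first two shifted moments."""
    m1 = shifted_moment(1, alpha, beta, b)
    m2 = shifted_moment(2, alpha, beta, b)
    return delta + m1, m2 - m1 * m1


def sample_moments(xs):
    """Sample mean, variance (1/n form), skewness and kurtosis coefficients."""
    n = len(xs)
    mean = sum(xs) / n

    def central(k):
        return sum((x - mean) ** k for x in xs) / n

    m2, m3, m4 = central(2), central(3), central(4)
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2
```

For b = 1 the first function reproduces the familiar gamma (Pearson type III) results, mean δ + α/β and variance α/β². A moment estimator then equates four such theoretical quantities to their sample counterparts and solves the resulting non-linear system.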

Method of Maximum Likelihood Estimation
For the MLE method, the log-likelihood function ln L for a sample $x = \{x_1, x_2, \cdots, x_n\}$ is given by: The MLEs of the parameters α, β, δ and b are taken to be the values that yield the maximum of ln L. Differentiating Equation (70) partially with respect to each parameter and equating each partial derivative to zero produces: These are the parameter estimation equations, and the results obtained are the same as those of the PSEM method.
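As an illustration of the quantity being maximized, the sketch below evaluates a log-likelihood for the assumed density f(x) = β^α / (b Γ(α)) · (x − δ)^(α/b − 1) · exp[−β (x − δ)^(1/b)]. The density form is an assumption (Equation (70) is not reproduced above) and the function name is hypothetical.

```python
import math


def egd_log_likelihood(xs, alpha, beta, delta, b):
    """Log-likelihood of a sample under the ASSUMED density (see lead-in)."""
    # Outside the admissible region the likelihood is zero.
    if alpha <= 0.0 or beta <= 0.0 or b <= 0.0 or min(xs) <= delta:
        return float("-inf")
    n = len(xs)
    sum_log = sum(math.log(x - delta) for x in xs)
    sum_pow = sum((x - delta) ** (1.0 / b) for x in xs)
    return (n * (alpha * math.log(beta) - math.log(b) - math.lgamma(alpha))
            + (alpha / b - 1.0) * sum_log
            - beta * sum_pow)
```

Under this assumed form, setting the derivative with respect to β to zero gives β = nα / Σ (x_i − δ)^(1/b), so β can be eliminated and the numerical search reduced to three dimensions. Returning negative infinity outside the admissible region keeps a generic optimizer away from δ ≥ min(x_i).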

Evaluation and Comparison of Parameter Estimation Methods
The PSEM as presented in this paper is applied to six annual precipitation data sets observed from 1959 to 2008, without any missing records, in the Weihe River basin of China. All data were obtained from the National Climate of China Meteorological Administration. The characteristics of these data are summarized in Table 3. All annual precipitation records have very low first-order serial correlation coefficients, ρ. Anderson's test of independence shows that these gauge data are independent at the 90% confidence level. Hence, they are suitable for meteorological frequency analysis. None of the three methods discussed above yields explicit solutions for the parameters of the four-parameter exponential gamma distribution.
The parameter estimation Equations were therefore solved for α, β, δ and b by the four-dimensional Levenberg-Marquardt method.
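The idea behind a Levenberg-Marquardt solution of such a system can be shown with a minimal pure-Python sketch. It is not the Matlab code used for this study: it solves a hypothetical two-equation toy problem (the moment equations α/β = mean and α/β² = variance of the gamma special case δ = 0, b = 1) instead of the four-dimensional system, uses a finite-difference Jacobian, and all function names and starting values are assumptions.

```python
def solve_linear(a, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [a[i][:] + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def numerical_jacobian(func, theta, h=1e-7):
    """Forward-difference Jacobian: jac[i][j] = dF_i / dtheta_j."""
    f0 = func(theta)
    jac = [[0.0] * len(theta) for _ in f0]
    for j in range(len(theta)):
        shifted = list(theta)
        shifted[j] += h
        fj = func(shifted)
        for i in range(len(f0)):
            jac[i][j] = (fj[i] - f0[i]) / h
    return jac


def levenberg_marquardt(func, theta0, max_iter=200, tol=1e-20):
    """Minimize sum(F_i^2) so that F(theta) is driven towards zero."""
    theta = list(theta0)
    f = func(theta)
    cost = sum(v * v for v in f)
    damping = 1e-3
    n = len(theta)
    for _ in range(max_iter):
        if cost < tol:
            break
        jac = numerical_jacobian(func, theta)
        jtj = [[sum(row[i] * row[j] for row in jac) for j in range(n)]
               for i in range(n)]
        jtf = [sum(jac[k][i] * f[k] for k in range(len(f))) for i in range(n)]
        for i in range(n):
            jtj[i][i] += damping * (jtj[i][i] + 1e-12)  # Marquardt scaling
        step = solve_linear(jtj, [-g for g in jtf])
        trial = [t + s for t, s in zip(theta, step)]
        f_trial = func(trial)
        cost_trial = sum(v * v for v in f_trial)
        if cost_trial < cost:   # accept the step and trust the model more
            theta, f, cost = trial, f_trial, cost_trial
            damping = max(damping / 10.0, 1e-12)
        else:                   # reject the step and damp more strongly
            damping *= 10.0
    return theta, f


def gamma_moment_equations(theta, mean=6.0, variance=12.0):
    """Toy system F(theta) = 0 for the gamma special case (delta = 0, b = 1)."""
    alpha, beta = theta
    return [alpha / beta - mean, alpha / beta ** 2 - variance]


solution, residuals = levenberg_marquardt(gamma_moment_equations, [2.5, 0.6])
```

For this toy problem the exact solution is α = 3 and β = 0.5, and the residuals returned alongside the solution play the same role as a check that the left-hand sides of the estimation equations are close to zero. Convergence depends on the starting values, so a full four-parameter application would need sensible initial estimates, for example from the method of moments.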
Equations (67)-(68) and (71) can be written in the form $F_i(\alpha, \beta, \delta, b) = 0$, $i = 1, \ldots, 4$. Following the above procedures, Matlab (Version R2007b) codes were developed and used to calculate the parameters. To verify the validity of the parameters, the left-side functions $F_1$, $F_2$, $F_3$ and $F_4$ in Equations (67)-(68) and (71) are listed in Table 4. These computed quantities are close to zero, indicating satisfactory performance of the four-dimensional Levenberg-Marquardt algorithm. The values of the distribution parameters are given in Table 5. The results of PSEM and MLE are the same. To evaluate and compare the performance of the three methods, the relative error (RERR) was employed, defined as: where $x_{0i}$ and $x_{pi}$ are the observed and predicted values of a given ($i$-th) quantile, respectively, and $n$ is the sample size. The RERR values are summarized in Table 5. Examination of Table 5 shows that the parameters estimated using PSEM and MLE are comparable to those of MOM in terms of RERR, so the methods are difficult to distinguish from one another on this measure. However, PSEM and MLE yield the best parameter estimates. Thus, the parameters estimated by PSEM should be adopted as the parameters of the four-parameter exponential gamma distribution at the case study sites.
To measure the agreement between a theoretical probability distribution and the empirical distribution of the samples, the Kolmogorov-Smirnov (K-S) test statistic $D_n$ was used to assess the goodness-of-fit. Let $x_1 < x_2 < \cdots < x_n$ be the order statistics of a sample of size $n$ whose population is defined by a continuous cumulative distribution function $F(x)$, and let $F_0(x_i)$ be a specified distribution that contains a set of parameters $\theta$ ($\hat{\theta}$ is the value estimated from a sample of size $n$). For an annual precipitation series, the null hypothesis $H_0$ that the true distribution is $F_0$ with parameters $\hat{\theta}$ was tested. The K-S test statistic $D_n$ can be expressed as: The sample values of the K-S test statistic $D_n$ are shown in Table 6. The critical value $D_n^*$ for the four-parameter exponential gamma distribution (at the significance level a = 0.05, for sample size $n$) is 0.18654. Table 6 shows that the statistics of the observed annual precipitation series are all smaller than the corresponding critical value. Therefore, the null hypothesis is not rejected by the K-S test for any of the annual precipitation series.
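The computation of the test statistic can be sketched as follows. The expression for $D_n$ is not reproduced above, so the sketch assumes the standard two-sided form, the largest distance between the fitted CDF and the empirical step function evaluated just before and at each ordered observation. The function name and the uniform example are illustrative only.

```python
def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov-Smirnov statistic D_n (standard form, assumed here)."""
    xs = sorted(sample)
    n = len(xs)
    d_n = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at the i-th order statistic.
        d_n = max(d_n, i / n - f0, f0 - (i - 1) / n)
    return d_n


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used only for the example."""
    return min(max(x, 0.0), 1.0)


d_example = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
```

In an application, `cdf` would be the fitted four-parameter exponential gamma distribution function and the resulting statistic would be compared with the critical value quoted above. Because the parameters are estimated from the same sample, tabulated critical values for a fully specified distribution tend to be conservative.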

Conclusions
Hydrologic frequency analysis, in spite of having developed a great number of distribution models and parameter estimation methods for reliable parameter and quantile estimates, comes up against practical difficulties imposed by short samples. The Pearson Type III distribution is recommended as a standard distribution in hydrological frequency analysis in China. A large number of studies have shown that the fit of the Pearson Type III distribution in the small and large return period segments is affected by its skewness value. Different studies employing the same parameter estimation methods may obtain different results. The four-parameter exponential gamma distribution has emerged as an attempt to reduce the estimation errors in the small and large return period segments. The advantage of the proposed entropy method is that only first moments are used in the calculation of the distribution parameters, instead of variance, skewness and kurtosis. The results of the case study show that the entropy method enables the four-parameter exponential gamma distribution to fit the data well. Entropy-based parameter estimation also provides a new way to estimate the parameters of the four-parameter exponential gamma distribution. The disadvantage of the method is that it is computationally cumbersome because four parameters are involved. However, this should not be an insurmountable difficulty, given the currently available numerical tools and computing power. Also, there are significant differences between the MOM estimates and the PSEM and MLE estimates. Such large differences may arise because the non-linear parameter estimation equations involve the second central moment of the variable for MOM, but only first moments for PSEM and MLE. In addition, the confidence intervals of quantiles for the four-parameter exponential gamma distribution deserve thorough investigation.

Table 1. Frequency factors of the four-parameter exponential gamma distribution and log-normal distribution (b = 10).

Table 2. Frequency factors of the four-parameter exponential gamma distribution and Gumbel distribution under $C_s = 1.1395$.

Table 3. Characteristics of data used for parameter estimation.

Table 5. Parameter values estimated by the three methods.

Table 6. Sample values of K-S test statistic $D_n$ of case study sites.