New Approaches on Parameter Estimation of the Gamma Distribution

This paper discusses new approaches to parameter estimation of the gamma distribution based on representative points. In the first part, the existence and uniqueness of gamma mean squared error representative points (MSE-RPs) are discussed theoretically. In the second part, by comparing three types of representative points, we show that gamma MSE-RPs perform well in parameter estimation and simulation. The last part proposes a new Harrell–Davis sample standardization technique. Simulation studies reveal that the standardized samples can be used to improve estimation performance or generate MSE-RPs. In addition, a real data analysis illustrates that the proposed technique yields efficient estimates for gamma parameters.


Introduction
The term representative points (RPs) indicates a set of supporting points with corresponding probabilities, which can be used as the best approximation of a $d$-dimensional probability distribution. Representative points can be regarded as a discretization of a continuous distribution, and are expected to retain as much information as possible. In the univariate case, let $X$ be a population random variable with cumulative distribution function (cdf) $F(x)$. A discrete random variable $Z$ is defined to approximate $X$, with probability mass function (pmf) supported on a set of points $z = \{z_1, \dots, z_k\}$, where $P(Z = z_i) = p_i$ and $\sum_{i=1}^{k} p_i = 1$. In the literature, there are several approaches to choosing the supporting point set $z$. For example, a set of random samples from $F(x)$ can be viewed as a representative of the distribution; Fang and Wang [1] suggest generating representative points based on the number theoretic method. In 1957, Cox [2] proposed using the mean squared error (MSE) to measure the loss of information from $F(x)$, where

$$\mathrm{MSE}(z) = E\Big[\min_{1 \le i \le k} (X - z_i)^2\Big] = \int_{-\infty}^{\infty} \min_{1 \le i \le k} (x - z_i)^2 \, dF(x). \tag{1}$$

The point set $z^{\mathrm{MSE}} = \{z_1, \dots, z_k\}$ at which $\mathrm{MSE}(z)$ attains its minimum is called the mean squared error representative points (MSE-RPs) of $F(x)$. MSE-RPs are found to have many good properties and have been applied in fields such as signal compression (Gersho and Gray [3]), numerical integration (Pagès [4,5]), simulation of stochastic differential equations (Gobet et al. [6]; El Amri et al. [7]), statistical simulation (Fang et al. [8], Fang et al. [9]) and clothing standard settings (Fang and He [10]; Flury [11]).

To compute MSE-RPs for different distributions, effective numerical methods have been proposed. The Fang-He algorithm (Fang and He [10]) calculates MSE-RPs by solving a system of non-linear equations; the Lloyd I algorithm (Lloyd [12]), the LBG algorithm (Linde et al. [13]) and the Competitive Learning Vector Quantization algorithm (Pagès [5]) obtain MSE-RPs by iterating over a long training sequence of data; Tarpey's self-consistency algorithm (Tarpey [14]) brings the idea of the k-means algorithm to the generation of MSE-RPs; Chakraborty et al. [15] provide an accelerated algorithm using Newton's method. When the number of MSE-RPs ($k$) is large, obtaining MSE-RPs becomes computationally intensive. Fang and He [10] present some discussion on the optimum choice of $k$.
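As a rough illustration of the training-sequence idea behind Lloyd-type algorithms, the following Python sketch alternates between a nearest-point partition and cell means on a simulated sample. The function names `lloyd_rps` and `mse`, the quantile-based starting points, and the Ga(2, 0.5) training sample are assumptions made for this example; it is not the implementation from any of the cited works.

```python
import random

def lloyd_rps(sample, k, iters=200, tol=1e-10):
    """Approximate k MSE-RPs of the distribution behind `sample` (1-D Lloyd iteration)."""
    xs = sorted(sample)
    n = len(xs)
    # Illustrative starting points: k equally spaced empirical quantiles.
    z = [xs[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    probs = [1.0 / k] * k
    for _ in range(iters):
        # Nearest-point cells are separated by midpoints of adjacent points.
        cuts = [(z[i] + z[i + 1]) / 2 for i in range(k - 1)]
        sums, counts, j = [0.0] * k, [0] * k, 0
        for x in xs:
            while j < k - 1 and x > cuts[j]:
                j += 1
            sums[j] += x
            counts[j] += 1
        # Each point moves to the mean of its cell (an empty cell keeps its point).
        new_z = [sums[i] / counts[i] if counts[i] else z[i] for i in range(k)]
        probs = [c / n for c in counts]
        shift = max(abs(u - v) for u, v in zip(new_z, z))
        z = new_z
        if shift < tol:
            break
    return z, probs

def mse(sample, z):
    """Empirical version of MSE(z): mean squared distance to the nearest point."""
    return sum(min((x - zi) ** 2 for zi in z) for x in sample) / len(sample)

rng = random.Random(0)
# random.gammavariate takes (shape, scale); rate b = 0.5 corresponds to scale 2.
data = [rng.gammavariate(2.0, 2.0) for _ in range(5000)]
points, probs = lloyd_rps(data, 5)
print(points)
print(probs)
print(mse(data, points))
```

Because every point is replaced by the mean of its own cell, the probability-weighted average of the returned points reproduces the sample mean, which mirrors the mean-preserving property of MSE-RPs.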
Recently, the properties of MSE-RPs for some distributions have been studied in detail, including the normal distribution (Fang et al. [8]), mixed normal distribution (Fang et al. [9] and Li et al. [16]), arcsine distribution (Jiang et al. [17]) and exponential distribution (Xu et al. [18]). A general relationship between MSE-RPs and the population distribution can be found in the work of Fei [19] and Fang et al. [9]. The study of the gamma distribution's MSE-RPs (gamma MSE-RPs) can be traced back to Fu [20], which discusses the existence of gamma MSE-RPs and establishes an algorithm for computing these points. Since the gamma distribution is one of the most important distributions in statistics and probability theory, it is worth taking a closer look at gamma MSE-RPs and discovering their merits. The innovations of this paper are listed as follows:
1. New theoretical results prove the uniqueness of gamma MSE-RPs;
2. Gamma MSE-RPs are found to outperform other types of representative points in parameter estimation;
3. A new standardization technique is proposed to improve the estimation performance of random samples from the gamma distribution.
Our discussion will focus on these three perspectives. Section 2 provides some preliminary knowledge of the gamma distribution and different types of representative points, so that readers can follow the content easily. Section 3 gives some theoretical discussion on the existence and uniqueness of gamma MSE-RPs, and recommends an algorithm for generating gamma MSE-RPs. Section 4 compares the performance of three typical kinds of gamma representative points in parameter estimation and simulation. The results demonstrate that gamma MSE-RPs have advantages over other representative points in many scenarios. Section 5 introduces a new Harrell-Davis standardization technique. Simulation studies show that the standardized samples perform better than random samples in estimation and can be used to generate gamma MSE-RPs. Section 6 provides a real clinical data analysis and illustrates that the standardization technique yields efficient estimates for gamma parameters.

Preliminaries

The Gamma Distribution and Gamma MSE-RPs
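A gamma-distributed random variable with shape parameter $a$ and rate parameter $b$ is denoted $X \sim \mathrm{Gamma}(a, b) \equiv \mathrm{Ga}(a, b)$. The corresponding probability density function (pdf) in the shape-rate parametrization is

$$f(x; a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-bx}, \quad x > 0,\ a > 0,\ b > 0, \tag{2}$$

where $\Gamma(\cdot)$ is the gamma function. The mean, variance, skewness and kurtosis of $X$ are

$$E(X) = \frac{a}{b}, \quad \mathrm{Var}(X) = \frac{a}{b^{2}}, \quad \mathrm{Sk}(X) = \frac{2}{\sqrt{a}}, \quad \mathrm{Ku}(X) = 3 + \frac{6}{a} \ \ (\text{excess kurtosis } 6/a),$$

accordingly. Let $\{z_1, \dots, z_k\}$ with $z_1 < \dots < z_k$ be a set of MSE-RPs for $\mathrm{Ga}(a, b)$, and derive the following intervals

$$I_1 = \Big(0, \frac{z_1 + z_2}{2}\Big), \quad I_i = \Big(\frac{z_{i-1} + z_i}{2}, \frac{z_i + z_{i+1}}{2}\Big),\ i = 2, \dots, k-1, \quad I_k = \Big(\frac{z_{k-1} + z_k}{2}, \infty\Big), \tag{3}$$

with the corresponding probabilities in these intervals as

$$p_i = \int_{I_i} f(x; a, b)\, dx, \quad i = 1, \dots, k. \tag{4}$$

Here $f(x; a, b)$ is the pdf in (2).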

Other Types of Representative Points
In addition to MSE-RPs, two other types of representative points are frequently discussed in the literature: Monte Carlo representative points and number theoretic representative points.
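(A) Monte Carlo representative points (MC-RPs)
MC-RPs are generated by the Monte Carlo method. Consider a random sample $\{x_1, x_2, \dots, x_k\}$ from the distribution function $F(x)$; this can be treated as a set of MC-RPs, written as $z^{\mathrm{MC}} = \{x_1, x_2, \dots, x_k\}$, with each point carrying probability $1/k$.

(B) Number theoretic representative points (NT-RPs)
NT-RPs are determined from the number theoretic method (Fang and Wang [1]). Given the one-dimensional interval $(0, 1)$, it is known that the point set $\{(2i-1)/(2k),\ i = 1, \dots, k\}$ is uniformly scattered on this interval. Based on the inverse transformation method, the points

$$z_i^{\mathrm{NT}} = F^{-1}\Big(\frac{2i-1}{2k}\Big), \quad i = 1, \dots, k, \tag{5}$$

each with probability $1/k$, form a set of NT-RPs of $F(x)$.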
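The two constructions can be sketched in a few lines of Python. The helper names `nt_rps`, `mc_rps` and `exp_quantile` are hypothetical, and the example uses Ga(1, 0.5) only because its quantile function has a closed form (it is the exponential distribution); other shapes would need a numerical inverse of the gamma cdf.

```python
import math
import random

def nt_rps(quantile, k):
    """NT-RPs: the quantile function evaluated at (2i - 1) / (2k), i = 1..k."""
    return [quantile((2 * i - 1) / (2 * k)) for i in range(1, k + 1)]

def mc_rps(a, b, k, rng):
    """MC-RPs: a sorted random sample of size k from Ga(a, b), with b the rate."""
    # random.gammavariate expects (shape, scale), hence scale = 1 / b.
    return sorted(rng.gammavariate(a, 1.0 / b) for _ in range(k))

def exp_quantile(u, b=0.5):
    """Closed-form quantile of Ga(1, b), i.e. the exponential distribution."""
    return -math.log(1.0 - u) / b

k = 5
nt = nt_rps(exp_quantile, k)                 # deterministic, equal weights 1/k
mc = mc_rps(1.0, 0.5, k, random.Random(1))   # random, equal weights 1/k
print(nt)
print(mc)
```

Both point sets carry probability 1/k per point; the NT-RPs are reproducible, whereas the MC-RPs change with the random seed.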

Harrell-Davis Quantile Estimator
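In Harrell and Davis [21], a distribution-free quantile estimator is proposed, which consists of a linear combination of the order statistics admitting a jackknife variance. Let $X_1, X_2, \dots, X_n$ denote a random sample of size $n$ from $\mathrm{Ga}(a, b)$, with order statistics $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$; the $p$th quantile estimator is

$$Q_p = \sum_{i=1}^{n} W_{n,i}\, X_{(i)}, \tag{6}$$

where

$$W_{n,i} = I_{i/n}\big(p(n+1), (1-p)(n+1)\big) - I_{(i-1)/n}\big(p(n+1), (1-p)(n+1)\big)$$

and $I_x(\alpha, \beta)$ denotes the regularized incomplete beta function. This method can be used for sample standardization. More details are discussed in Section 5.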
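A minimal, standard-library-only sketch of this estimator follows. It assumes the usual form of the weights as differences of the regularized incomplete beta function, which is evaluated here with a textbook continued-fraction expansion; the names `betainc_reg` and `hd_quantile` are chosen for this example.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence.
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def hd_quantile(sample, p):
    """Harrell-Davis estimate of the p-th quantile: weighted sum of order statistics."""
    xs = sorted(sample)
    n = len(xs)
    alpha, beta = (n + 1) * p, (n + 1) * (1 - p)
    cdf = [betainc_reg(alpha, beta, i / n) for i in range(n + 1)]
    return sum((cdf[i] - cdf[i - 1]) * xs[i - 1] for i in range(1, n + 1))

print(hd_quantile([1.0, 2.0, 3.0, 4.0, 5.0], 0.5))
```

Unlike a plain sample quantile, every order statistic receives a positive weight, which is what smooths the estimate.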

The Existence and Uniqueness of Gamma MSE-RPs
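Let $X \sim \mathrm{Ga}(a, b)$ and let $z = \{z_1, \dots, z_k\}$, $0 < z_1 < \dots < z_k$, be the set of supporting points of $X$. To minimize $\mathrm{MSE}(z)$, we take the partial derivatives of (1) with respect to each $z_i$ and set them to zero, which gives

$$\int_{(z_{i-1}+z_i)/2}^{(z_i+z_{i+1})/2} (x - z_i)\, f(x)\, dx = 0, \quad i = 1, \dots, k, \tag{7}$$

where the lower limit is $0$ for $i = 1$, the upper limit is $\infty$ for $i = k$, and $f(x)$ is the pdf of the gamma distribution (2). When $k = 1$, the system of Equations (7) has only one equation

$$\int_{0}^{\infty} (x - z_1)\, f(x)\, dx = 0.$$

Obviously, it has one solution $z_1 = a/b = \mu$, which is the only representative point. When $k \ge 2$, MSE-RPs exist if the system of Equations (7) has a solution. After several transformations, (7) becomes an equivalent system, Equation (8), expressed through the cdf $F(x)$. Theorem 1 shows that the system of Equations (8) has a solution:

Theorem 1.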

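1. For given $z_1 > 0$, the equation $\int_{0}^{(z_1+z_2)/2} (x - z_1)\, f(x)\, dx = 0$ has a solution $z_2$ if and only if $z_1 < \mu$.
2. For given $z_i$, there exists a solution $z_{i+1}$ when $z_{i-1} < z_{i,i-1}$, where $z_{i,i-1}$ is the $(i-1)$th representative point in the set of gamma MSE-RPs of size $k = i$.
3. For a given $z_{k-1} > 0$, the equation $\int_{(z_{k-1}+z_k)/2}^{\infty} (x - z_k)\, f(x)\, dx = 0$ has a solution $z_k$.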
Theorem 1 guarantees the existence of gamma MSE-RPs. Its proof is provided in Appendix A.
For the special case $k = 2$, existence follows from statements 1 and 3 in Theorem 1.
Next, we show the uniqueness of gamma MSE-RPs in Theorem 2.
Theorem 2. Suppose $X \sim \mathrm{Ga}(a, b)$. For any $k \in \mathbb{N}^{+}$, the set of gamma MSE-RPs is unique if $a \ge 1$.
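The stationarity conditions discussed in this section say that each representative point equals the conditional mean of $X$ on its own interval. One way to approximate gamma MSE-RPs numerically is therefore a fixed-point (self-consistency) iteration, sketched below. This is an illustrative sketch, not the algorithm recommended in the paper: the starting points, the series-based cdf, and the names `gamma_cdf` and `gamma_mse_rps` are assumptions. It uses the identity $\int_u^v x f(x; a, b)\,dx = (a/b)\,[F(v; a+1, b) - F(u; a+1, b)]$.

```python
import math

def gamma_cdf(x, a, b):
    """cdf of Ga(a, b) (shape a, rate b) via the series of the lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    t = b * x  # the plain series is adequate for moderate t (it overflows for t near 700)
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= t / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(t) - t - math.lgamma(a)))

def gamma_mse_rps(a, b, k, iters=2000, tol=1e-10):
    """Self-consistency iteration: move each point to the mean of its interval."""
    mu = a / b
    z = [2.0 * mu * (i + 1) / (k + 1) for i in range(k)]  # arbitrary increasing start
    probs = [1.0 / k] * k
    for _ in range(iters):
        cuts = [0.0] + [(z[i] + z[i + 1]) / 2 for i in range(k - 1)] + [math.inf]
        f0 = [gamma_cdf(c, a, b) for c in cuts]        # F(x; a, b) at the cut points
        f1 = [gamma_cdf(c, a + 1.0, b) for c in cuts]  # F(x; a + 1, b) at the cut points
        probs = [f0[i + 1] - f0[i] for i in range(k)]
        new_z = [mu * (f1[i + 1] - f1[i]) / probs[i] for i in range(k)]
        shift = max(abs(u - v) for u, v in zip(new_z, z))
        z = new_z
        if shift < tol:
            break
    return z, probs

points, probs = gamma_mse_rps(2.0, 0.5, 5)
print(points)
print(probs)
```

For $k = 1$ the iteration returns the single point $a/b$, in line with the $k = 1$ case above. For $a \ge 1$, where Theorem 2 gives uniqueness, the limit does not depend on the starting points as long as the iteration converges.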

Gamma MSE-RPs in Parameter Estimation and Simulation
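This section compares the performance of gamma MSE-RPs with other types of representative points, i.e., NT-RPs and MC-RPs, in terms of parameter estimation and simulation. Recall that the random variable $X \sim \mathrm{Ga}(a, b)$ and $Z$ is a discrete approximation of $X$. The mean, variance, skewness and kurtosis of $Z$ are

$$E(Z) = \sum_{i=1}^{k} p_i z_i, \quad \mathrm{Var}(Z) = \sum_{i=1}^{k} p_i \big(z_i - E(Z)\big)^2, \quad \mathrm{Sk}(Z) = \frac{\sum_{i=1}^{k} p_i \big(z_i - E(Z)\big)^3}{\mathrm{Var}(Z)^{3/2}}, \quad \mathrm{Ku}(Z) = \frac{\sum_{i=1}^{k} p_i \big(z_i - E(Z)\big)^4}{\mathrm{Var}(Z)^{2}}. \tag{11}$$

By the method of moments, we have

$$\hat{a} = \frac{E(Z)^2}{\mathrm{Var}(Z)}, \quad \hat{b} = \frac{E(Z)}{\mathrm{Var}(Z)}, \tag{12}$$

which are the point estimators of $a$ and $b$ in $\mathrm{Ga}(a, b)$. As $Z$ is a discrete approximation of $X$, it is expected that the moments of $Z$ and the estimates in (12) are close to the moments of $X$, $a$ and $b$ accordingly. The following theorem shows some connections between gamma MSE-RPs and the corresponding $\mathrm{Ga}(a, b)$.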
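Theorem 3. Let $Z$ be the discrete random variable supported on the MSE-RPs $\{z_1, \dots, z_k\}$ of $X$ with corresponding probabilities in (4); then, $E(Z) = E(X)$ and $\mathrm{Var}(Z) = \mathrm{Var}(X) - \mathrm{MSE}(z)$. The proof of Theorem 3 is provided in Appendix A. Note that Theorem 3 holds not only for the gamma distribution but also for all continuous population distributions.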
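To make the moment and method-of-moments calculations concrete, here is a small Python sketch for an arbitrary discrete approximation $Z$. The function names are hypothetical, the kurtosis is taken as the fourth standardized moment without subtracting 3 (an assumption), and the two-point example is purely illustrative rather than a set of gamma MSE-RPs.

```python
def discrete_moments(z, p):
    """Mean, variance, skewness and kurtosis of Z with P(Z = z[i]) = p[i]."""
    mean = sum(pi * zi for zi, pi in zip(z, p))
    var = sum(pi * (zi - mean) ** 2 for zi, pi in zip(z, p))
    skew = sum(pi * (zi - mean) ** 3 for zi, pi in zip(z, p)) / var ** 1.5
    kurt = sum(pi * (zi - mean) ** 4 for zi, pi in zip(z, p)) / var ** 2
    return mean, var, skew, kurt

def mom_estimates(z, p):
    """Method-of-moments estimates of shape a and rate b from the moments of Z."""
    mean, var, _, _ = discrete_moments(z, p)
    return mean ** 2 / var, mean / var

# Illustrative two-point example: Z takes the values 1 and 3 with equal probability.
print(discrete_moments([1.0, 3.0], [0.5, 0.5]))
print(mom_estimates([1.0, 3.0], [0.5, 0.5]))
```

The estimates require at least two distinct points, since a single point has zero variance.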
Next, the moments and estimates in (12) are calculated from MSE-RPs, NT-RPs, and MC-RPs of different $\mathrm{Ga}(a, b)$. Three typical shapes of gamma distributions are chosen (Ga(1, 0.5), monotone decreasing; Ga(2, 0.5), right skewed; and Ga(7.5, 1), bell-shaped; their pdfs are plotted in Figure 1), and the representative points are set to three sizes ($k = 5, 20, 100$). The first part of Tables 1-3 summarizes the results for these scenarios. This practice is also carried out in Tables 3-6, A1, and A2.

Next, the comparison focuses on the estimation performance of samples drawn from representative points. We take samples from different shapes of gamma distributions (Ga(1, 0.5), Ga(2, 0.5) and Ga(7.5, 1)), as well as from their representative points with different sizes ($k = 5, 20, 100$). With sample size $N = 1000$ and $M = 10{,}000$ repetitions for each scenario, the method of moments estimates ($\hat{a}_{m_2}$ and $\hat{b}_{m_2}$) and maximum likelihood estimates ($\hat{a}_{mle}$ and $\hat{b}_{mle}$) are calculated. Define

$$\mathrm{PD}_{\hat{\theta}} = \frac{1}{M} \sum_{j=1}^{M} \frac{|\hat{\theta}_j - \theta|}{\theta}$$

as the average proportional deviation between the estimates and the parameter, where $\hat{\theta}_j$ is the estimate of $\theta \in \{a, b\}$ from the $j$th sample. The second part of Tables 1-3 shows that MSE-RPs samples have the smallest average proportional deviation in most of the selected scenarios. Tables A1 and A2 in Appendix C give medians and 95% empirical confidence intervals of $\hat{a}_{m_2}$, $\hat{b}_{m_2}$, $\hat{a}_{mle}$ and $\hat{b}_{mle}$. In this simulation study, we observe that the point estimates of $a$ and $b$ from MSE-RPs samples generally have good estimation accuracy with both the moment and maximum likelihood methods. Meanwhile, when $k$ is large, the estimation performance of MSE-RPs samples is similar to that of samples from the corresponding $\mathrm{Ga}(a, b)$. It is also worth mentioning that when $k = 5$, the proportional deviations $\mathrm{PD}_{\hat{a}_{m_2}}$ and $\mathrm{PD}_{\hat{b}_{m_2}}$ are much smaller than $\mathrm{PD}_{\hat{a}_{mle}}$ and $\mathrm{PD}_{\hat{b}_{mle}}$. That is, when the size of gamma MSE-RPs is small, it is better to estimate parameters using the method of moments.
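The sampling experiment can be outlined as follows. This sketch makes several assumptions that differ from the study above: it uses NT-RPs of Ga(1, 0.5) because they have a closed form, only the method of moments, a much smaller number of repetitions (`M = 200`), and a proportional deviation defined as the mean absolute relative error. All function names are hypothetical.

```python
import math
import random

def mom_from_sample(xs):
    """Method-of-moments estimates (shape, rate) from a sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean ** 2 / var, mean / var

def proportional_deviation(estimates, truth):
    """Assumed form of PD: average of |estimate - truth| / truth."""
    return sum(abs(e - truth) for e in estimates) / (len(estimates) * truth)

a, b, k = 1.0, 0.5, 20
# NT-RPs of Ga(1, 0.5): exponential quantiles at (2i - 1) / (2k), equal weights.
points = [-math.log(1.0 - (2 * i - 1) / (2 * k)) / b for i in range(1, k + 1)]
probs = [1.0 / k] * k

rng = random.Random(2023)
N, M = 1000, 200  # M is kept small here so the sketch runs quickly
a_hats, b_hats = [], []
for _ in range(M):
    draws = rng.choices(points, weights=probs, k=N)  # sample from the discrete Z
    a_hat, b_hat = mom_from_sample(draws)
    a_hats.append(a_hat)
    b_hats.append(b_hat)

pd_a = proportional_deviation(a_hats, a)
pd_b = proportional_deviation(b_hats, b)
print(pd_a, pd_b)
```

Swapping `points` and `probs` for MSE-RPs or MC-RPs gives the corresponding comparison under the same assumptions.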

Generating MSE-RPs from Harrell-Davis Standardized Samples
This section discusses how to generate MSE-RPs from a gamma-distributed sample. A commonly used approach has two steps as follows:
1. Calculate the maximum likelihood estimates (MLEs) for $a$ and $b$, namely $\hat{a}$ and $\hat{b}$, based on the sample dataset;
2. Generate MSE-RPs from the gamma distribution with the estimated parameters, i.e., $\mathrm{Ga}(\hat{a}, \hat{b})$.

As we know, the representativeness of MSE-RPs depends on the estimates of the gamma parameters. More accurate estimates will produce better representativeness. However, if a random sample does not represent the population well, the estimates may show large deviations from the true parameters. Hence, the MSE-RPs that are generated are not good representatives of the population distribution. This usually occurs when the sample size is small or medium. Next, we introduce a new Harrell-Davis (HD) standardization technique that can reduce the effect of randomness from samples. This technique transforms a random sample into a set of HD quantile estimators and then treats these estimators as a new "sample". Recall that a set of quantiles with equal probability is a set of NT-RPs for the population; a similar idea is utilized for sample standardization.
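Let $X_1, X_2, \dots, X_n$ be a random sample from $\mathrm{Ga}(a, b)$. The HD standardized sample is defined as

$$x^{\mathrm{HD}} = \{Q_{p_1}, Q_{p_2}, \dots, Q_{p_n}\},$$

where $Q_{p_i}$ is the $p_i$th HD quantile estimator defined in (6), $p_i = (2i-1)/(2n)$ and $P(Q_{p_i}) = 1/n$ for $i = 1, \dots, n$. Note here that $x^{\mathrm{HD}}$ is not a random sample because $Q_{p_1}, Q_{p_2}, \dots, Q_{p_n}$ are not independent. However, since the quantile estimators are equiprobable ($P(Q_{p_i}) = 1/n$), the set $x^{\mathrm{HD}}$ is treated as an arbitrarily selected sample, which can be used to calculate MLEs for $a$ and $b$. A new approach to generate MSE-RPs is proposed as follows:
1. Obtain the HD standardized sample;
2. Calculate the MLEs for $a$ and $b$, namely $\hat{a}$ and $\hat{b}$, based on the HD standardized sample;
3. Generate MSE-RPs from the gamma distribution with the estimated parameters, i.e., $\mathrm{Ga}(\hat{a}, \hat{b})$.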
Next, a simulation study is provided to show the good performance of HD standardized samples in parameter estimation. Consider three gamma distributions (Ga(1, 0.5), Ga(2, 0.5) and Ga(7.5, 1)) and three different sample sizes ($n = 50, 200, 500$). In each scenario, $N = 10{,}000$ random samples are generated and their HD standardized samples are obtained. The MLEs are calculated for each sample/standardized sample and summarized in Table 4. This shows that the means of estimates from HD standardized samples are closer to the true value in most scenarios. Moreover, the estimates from HD standardized samples appear to have smaller standard deviations than those from random samples. Based on these results, we conclude that HD standardized samples outperform random samples in terms of estimation accuracy and stability. Therefore, it is recommended to use the new three-step approach to generate MSE-RPs.

Here, a comparison study between the MSE-RPs generated by random samples and HD samples is provided. The estimates ($\hat{a}$ and $\hat{b}$) in Table 4 are used to generate gamma MSE-RPs. Table 5 summarizes the results when $n = 200$ with the size of MSE-RPs $k = 20$. It shows that the moments of gamma MSE-RPs from HD samples are close to the moments of the original $\mathrm{Ga}(a, b)$. Meanwhile, the method of moments estimates in (12) are obtained. The estimates from HD samples have a better accuracy than those from random samples. This conclusion is generally valid when $n = 50$ and $500$.

It is noteworthy that the HD standardization technique can also be applied in resampling. Consider another simulation study with the same settings as Table 4. We resample from each sample/standardized sample using $n_r = n$ and calculate the MLEs. The means and standard deviations of the resampled MLEs are summarized in Table 6. This shows that estimates from standardized samples generally have a better accuracy and smaller standard deviations when resampling.
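The MLE step used in both approaches has no closed form for the shape parameter. The sketch below solves the standard likelihood equation $\ln \hat{a} - \psi(\hat{a}) = \ln \bar{x} - \overline{\ln x}$ by bisection and then sets $\hat{b} = \hat{a} / \bar{x}$. The digamma approximation (recurrence plus asymptotic series), the bisection bounds, and the function names are choices made for this example; the same function could be applied to an HD standardized sample instead of the raw sample.

```python
import math
import random

def digamma(x):
    """psi(x) for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))

def gamma_mle(sample):
    """Maximum likelihood estimates (shape a, rate b) for positive data."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.log(mean) - sum(math.log(x) for x in sample) / n
    if s <= 0.0:
        raise ValueError("sample must contain at least two distinct positive values")
    # g(a) = ln(a) - psi(a) decreases from +inf to 0, so bracket the root of g(a) = s.
    lo, hi = 1e-8, 1.0
    while math.log(hi) - digamma(hi) > s:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    a_hat = (lo + hi) / 2
    return a_hat, a_hat / mean

rng = random.Random(7)
# Ga(2, 0.5) in the shape-rate form; random.gammavariate takes (shape, scale).
sample = [rng.gammavariate(2.0, 2.0) for _ in range(5000)]
a_hat, b_hat = gamma_mle(sample)
print(a_hat, b_hat)
```

Bisection is slower than Newton's method but needs only the digamma function and cannot leave the bracket.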

Real Data Illustration
In this section, we consider a real-world dataset and illustrate the HD standardization technique proposed in the previous section. In this clinical study, 97 Swiss females ($n = 97$) aged 70-74 inclusive at the time of diagnosis of dementia (a form of mental disorder) were studied for survival times (in years) by Elandt-Johnson and Johnson [23]. These data were analyzed by Ozonur and Paul [24] using the likelihood ratio test and score test with p-values 0.233 and 0.140, which are greater than 0.05. Both tests suggest that the two-parameter gamma distribution adequately fits the dementia data.
Point estimates (MLE) and the bootstrap interval estimates [25] based on the original sample data and the corresponding HD sample are calculated. The approximate $(1 - \alpha)$ bootstrap percentile interval is defined as

$$\big[\hat{\theta}^{*}_{(\alpha/2)},\ \hat{\theta}^{*}_{(1-\alpha/2)}\big],$$

where $\hat{\theta}^{*}_{(q)}$ denotes the $q$th empirical quantile of the bootstrap replications. In practice, we resample the original data $M = 1000$ times to obtain 1000 replications of the parameter estimate $\hat{\theta}^{*}$ (i.e., $\hat{a}$ and $\hat{b}$ for the gamma distribution) with $\alpha = 0.05$. These estimates are sorted and the 25th value is used as the lower bound; the 975th value is the upper bound. The MLEs based on the HD standardized sample are $\hat{a}_{HD} = 1.4602$ and $\hat{b}_{HD} = 0.2886$ with confidence intervals (1.3846, 1.8073) and (0.2637, 0.3839). The lengths of these confidence intervals are shorter than those based on the original sample data, where $\hat{a}_{origin} = 1.4602$ and $\hat{b}_{origin} = 0.2886$ with confidence intervals (1.3777, 1.8632) and (0.2659, 0.3914).
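The percentile interval described above can be sketched generically in Python. The function name, the synthetic gamma sample of size 97 (which is not the dementia data), and the use of the sample mean as the statistic are all assumptions for illustration; in the analysis above the statistic would be the gamma MLE.

```python
import random

def bootstrap_percentile_interval(data, statistic, m=1000, alpha=0.05, rng=None):
    """Percentile interval from m bootstrap replications of `statistic`."""
    rng = rng or random.Random()
    n = len(data)
    reps = sorted(
        statistic([rng.choice(data) for _ in range(n)])  # resample with replacement
        for _ in range(m)
    )
    # With m = 1000 and alpha = 0.05 these are the 25th and 975th sorted values.
    lower = reps[max(round(m * alpha / 2) - 1, 0)]
    upper = reps[min(round(m * (1 - alpha / 2)) - 1, m - 1)]
    return lower, upper

def sample_mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(42)
# Synthetic stand-in data of the same size as the clinical sample (n = 97).
data = [rng.gammavariate(1.5, 1.0 / 0.3) for _ in range(97)]
low, high = bootstrap_percentile_interval(data, sample_mean, rng=rng)
print(low, sample_mean(data), high)
```

Passing a different `statistic`, for example a function returning the MLE of the shape parameter, gives the interval for that parameter without changing the resampling code.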

Concluding Remarks
In the first part of this paper, the existence and uniqueness of gamma MSE-RPs are proved using two different approaches. An effective algorithm is recommended for the generation of gamma MSE-RPs. The second part of this paper compares gamma MSE-RPs with other representative points in terms of parameter estimation and simulation. This shows that the moments and estimates based on gamma MSE-RPs are the closest to the true values in different scenarios. In addition, samples from gamma MSE-RPs show a good general estimation accuracy. The last part of this paper introduces the new HD standardization technique. When a gamma-distributed sample is at hand, we recommend first transforming it into the HD standardized sample and then using it to estimate gamma parameters or generate MSE-RPs.
In future work, we would like to study whether the MSE-RPs of other distributions can also perform well in parameter estimation. It would also be interesting to explain, through a theoretical demonstration, how the HD standardization technique reduces the randomness from samples.

In Tables 1-3, the last line of each table presents the moments and parameters of Ga(a, b). It is clear that if k is fixed, the moments and estimates of MSE-RPs are closer to the true values than those of other representative points. Moreover, we can observe that the means of MSE-RPs are almost equal to the means of Ga(a, b) in all scenarios; when k becomes large, the moments and estimates of MSE-RPs converge to the true values much faster than those of other representative points. These results are consistent with the description in Theorem 3.

Table 1. Summary of results from RPs of Ga(1, 0.5) in parameter estimation. RPs are randomly generated 100 times for each Ga(a, b). All values are the average values of the repeats.

Table 2. Summary of results from RPs of Ga(2, 0.5) in parameter estimation.

Table 3. Summary of results from RPs of Ga(7.5, 1) in parameter estimation.

Table 4. Mean (standard deviation) of MLEs from samples and HD standardized samples.

Table 5. Summary of results for MSE-RPs from the estimated gamma distributions.

Table 6. Mean (standard deviation) of resampled MLEs from samples and HD standardized samples.