A Compound Class of the Inverse Gamma and Power Series Distributions

Abstract: In this paper, the inverse gamma power series (IGPS) class of asymmetric distributions is introduced. This family is obtained by compounding the inverse gamma and power series distributions. We present the density, survival and hazard functions, the moments and the order statistics of the IGPS. Estimation is first discussed by means of the quantile method. Then, an EM algorithm is implemented to compute the maximum likelihood estimates of the parameters. Moreover, a simulation study is carried out to examine the effectiveness of these estimates. Finally, the performance of the new class is analyzed by means of two asymmetric real data sets.


Introduction
In the last few decades, several papers have discussed the derivation of new probabilistic families by compounding different distributions with the power series (PS) model. Examples include the exponential geometric (EG, Adamidis and Loukas [1]), exponential Poisson (EP, Kus [2]) and exponential logarithmic (EL, Tahmasbi and Rezaei [3]) distributions. The exponential PS class was introduced by Chahkandi and Ganjali [4]; Morais and Barreto-Souza [5] presented the Weibull PS (WPS) class of distributions; Mahmoudi and Jafari [6] defined the generalized exponential PS (GEPS) distributions; Silva et al. [7] studied the extended Weibull PS (EWPS) class; and Bagheri et al. [8] studied the generalized modified Weibull PS (GMWPS) distribution. More recently, Warahena-Liyanage and Pararai [9] introduced the Lindley PS (LPS) distributions, Alizadeh et al. [10] studied the exponentiated power Lindley PS class of distributions, Elbatal et al. [11] proposed and studied a new family of exponential Pareto PS distributions and, finally, the generalized Burr XII PS distribution was given by Elbatal et al. [12].
In this work, we propose to study the resulting model obtained by compounding the inverse gamma (IG) and the PS distribution introduced by Noack [13].
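We say that a random variable X follows an IG distribution with shape parameter α > 0 and scale parameter β > 0 (henceforward, the notation X ∼ IG(α, β) will be used) if its probability density function (pdf) is given by

$$f_{IG}(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{-(\alpha+1)}\, e^{-\beta/x}, \quad x > 0,$$

and its survival function is

$$S_{IG}(x; \alpha, \beta) = \frac{\gamma(\alpha, \beta/x)}{\Gamma(\alpha)}, \quad x > 0,$$

where $\gamma(a, z) = \int_0^z u^{a-1} e^{-u}\, du$ is the lower incomplete gamma function.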
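The IG density and survival function can be checked numerically. The sketch below assumes the shape/scale parameterization stated above and that NumPy and SciPy are available; the helper names `ig_pdf` and `ig_sf` are illustrative and not part of the paper. SciPy's `gammainc` is the regularized lower incomplete gamma function, so it already includes the division by Γ(α).

```python
import numpy as np
from scipy.special import gammainc, gammaln
from scipy.stats import invgamma


def ig_pdf(x, alpha, beta):
    """IG(alpha, beta) density, evaluated on the log scale for numerical stability."""
    x = np.asarray(x, dtype=float)
    log_pdf = alpha * np.log(beta) - gammaln(alpha) - (alpha + 1.0) * np.log(x) - beta / x
    return np.exp(log_pdf)


def ig_sf(x, alpha, beta):
    """IG(alpha, beta) survival function: regularized lower incomplete gamma at beta / x."""
    x = np.asarray(x, dtype=float)
    return gammainc(alpha, beta / x)


# Compare with SciPy's built-in inverse gamma (shape a = alpha, scale = beta).
x = np.array([0.5, 2.0, 10.0])
print(ig_pdf(x, 1.2, 5.0), invgamma.pdf(x, 1.2, scale=5.0))
print(ig_sf(x, 1.2, 5.0), invgamma.sf(x, 1.2, scale=5.0))
```

The two printed rows in each pair should agree, which suggests that `scipy.stats.invgamma` can be used directly wherever IG(α, β) quantities are needed.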

The Model
Let M ≥ 1 be the number of concurrent causes producing the event of interest in a subject. For instance, in a cancer context, M represents the number of carcinogenic cells that a patient has and, depending on this number, the metastasis process might be triggered. In electronic circuits connected in series, M represents the number of components in the circuit, so if one of those components fails, the entire circuit fails. In credit scoring, M represents the number of different factors for which a customer stops paying their bills (economic, psychological, family, etc.). Let us also assume that M follows a PS distribution (Noack [13]) with probability mass function (pmf) given by

$$P(M = m) = \frac{a_m \theta^m}{A(\theta)}, \quad m = 1, 2, \ldots, \tag{2}$$

where $a_m > 0$, θ > 0 is called the power parameter and the series function $A(\theta) = \sum_{m=1}^{\infty} a_m \theta^m$ is finite. We highlight that the pmf in Equation (2) also corresponds to the generalized power series distribution discussed in Patil [14]. However, many works that have discussed this distribution refer to it as the PS model (see, for instance, Adamidis and Loukas [1]; Morais and Barreto-Souza [5]). Hereafter, (2) will be denoted as PS(θ, A(θ)). Table 1 lists the values of $a_m$, A(θ) and the parameter space Θ for four members of the PS family.
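As an illustration of the pmf in (2), the sketch below evaluates it for the four members named in the paper (Poisson, logarithmic, geometric and binomial). The coefficients and series functions used here are the standard closed forms for the corresponding zero-truncated distributions; they are stated as assumptions, since the body of Table 1 is not reproduced in this text. The function names and the value q = 5 are hypothetical.

```python
import math


def ps_family(name, q=5):
    """Return (a_m, A) for a power series member; q is only used by the binomial."""
    if name == "poisson":
        return (lambda m: 1.0 / math.factorial(m)), (lambda th: math.expm1(th))
    if name == "logarithmic":
        return (lambda m: 1.0 / m), (lambda th: -math.log1p(-th))
    if name == "geometric":
        return (lambda m: 1.0), (lambda th: th / (1.0 - th))
    if name == "binomial":
        return (lambda m: float(math.comb(q, m))), (lambda th: (1.0 + th) ** q - 1.0)
    raise ValueError(f"unknown family: {name}")


def ps_pmf(m, theta, name, q=5):
    """P(M = m) = a_m * theta^m / A(theta), for m = 1, 2, ..."""
    a, A = ps_family(name, q)
    return a(m) * theta ** m / A(theta)


# Each pmf should sum to (approximately) one over its support.
cases = [("poisson", 1.5), ("logarithmic", 0.5), ("geometric", 0.85), ("binomial", 0.7)]
for name, theta in cases:
    total = sum(ps_pmf(m, theta, name) for m in range(1, 151))
    print(f"{name:12s} sum over m = 1..150: {total:.10f}")
```

The truncation at m = 150 is an arbitrary choice that keeps the factorials within floating-point range while leaving a negligible tail for the parameter values shown.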
Table 1. Special cases of the PS(θ, A(θ)) distribution. For the binomial distribution, q is considered known.

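Let the random variable $W_a$ denote the time at which the ath concurrent cause produces the event of interest. The variables $W_a$, a = 1, 2, . . ., M, are assumed conditionally independent and identically distributed given M, with common distribution IG(α, β). The inverse gamma PS (henceforth, IGPS) model is defined as the marginal distribution of $T_{(1)} = \min(W_1, \ldots, W_M)$. Writing $f_{IG}$ and $S_{IG}$ for the pdf and the survival function of the IG(α, β) distribution, the survival function for the IGPS model is given by

$$S(t; \theta, \alpha, \beta) = \sum_{m=1}^{\infty} P(M = m)\, [S_{IG}(t; \alpha, \beta)]^{m} = \frac{A(\theta\, S_{IG}(t; \alpha, \beta))}{A(\theta)}, \quad t > 0, \tag{3}$$

and its corresponding density function is provided by

$$f(t; \theta, \alpha, \beta) = \frac{\theta\, f_{IG}(t; \alpha, \beta)\, A'(\theta\, S_{IG}(t; \alpha, \beta))}{A(\theta)}, \quad t > 0. \tag{4}$$

The hazard function is

$$h(t; \theta, \alpha, \beta) = \frac{\theta\, f_{IG}(t; \alpha, \beta)\, A'(\theta\, S_{IG}(t; \alpha, \beta))}{A(\theta\, S_{IG}(t; \alpha, \beta))}, \quad t > 0.$$

Figure 1 shows the density and hazard functions for the inverse gamma Poisson (IGP), inverse gamma logarithmic (IGL), inverse gamma geometric (IGG) and inverse gamma binomial (IGB) distributions.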
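The survival, density and hazard functions of the IGPS model can be sketched in a few lines, assuming the compounding structure described above (survival A(θ S_IG(t))/A(θ), with S_IG the IG survival function). Only the Poisson and geometric members are coded, using their standard series functions; the dictionary `SERIES` and the function names are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.stats import invgamma

# Series function A and its derivative A' for two members (standard closed forms).
SERIES = {
    "poisson": (lambda u: np.expm1(u), lambda u: np.exp(u)),
    "geometric": (lambda u: u / (1.0 - u), lambda u: 1.0 / (1.0 - u) ** 2),
}


def igps_sf(t, theta, alpha, beta, family="geometric"):
    """S(t) = A(theta * S_IG(t)) / A(theta)."""
    A, _ = SERIES[family]
    return A(theta * invgamma.sf(t, alpha, scale=beta)) / A(theta)


def igps_pdf(t, theta, alpha, beta, family="geometric"):
    """f(t) = theta * f_IG(t) * A'(theta * S_IG(t)) / A(theta)."""
    A, dA = SERIES[family]
    s = invgamma.sf(t, alpha, scale=beta)
    return theta * invgamma.pdf(t, alpha, scale=beta) * dA(theta * s) / A(theta)


def igps_hazard(t, theta, alpha, beta, family="geometric"):
    """h(t) = f(t) / S(t)."""
    return igps_pdf(t, theta, alpha, beta, family) / igps_sf(t, theta, alpha, beta, family)


t = np.linspace(0.5, 20.0, 5)
print(igps_sf(t, 0.85, 2.0, 5.0))
print(igps_hazard(t, 1.5, 2.0, 5.0, family="poisson"))
```

A quick consistency check is that the density equals minus the derivative of the survival function, which can be verified with a central finite difference.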
In the following, we will examine some properties of the probability density function (4).
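Proposition 1. The IG distribution for $T_{(1)}$ is a limiting special case of the IGPS model when $\theta \to 0^{+}$.

Proof. The cumulative distribution function (cdf) of the IGPS model is $F(t) = 1 - A(\theta\, S_{IG}(t))/A(\theta)$, where $S_{IG}$ is the survival function of the IG(α, β) distribution. Following the proof given by Morais and Barreto-Souza [5], the limit is given by

$$\lim_{\theta \to 0^{+}} F(t) = 1 - \lim_{\theta \to 0^{+}} \frac{\sum_{m=1}^{\infty} a_m \theta^m [S_{IG}(t)]^m}{\sum_{m=1}^{\infty} a_m \theta^m} = 1 - [S_{IG}(t)]^{c}, \quad c = \min\{m \in \mathbb{N} : a_m > 0\},$$

which is the cdf of the first order statistic of c independent IG(α, β) random variables. When $a_1 > 0$, as for the Poisson, logarithmic, geometric and binomial members, c = 1 and the limit is the IG(α, β) cdf.

Proposition 2. The probability density function of the IGPS(θ, α, β) distribution has the following representation.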
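Since $A'(\theta) = \sum_{m=1}^{\infty} m\, a_m \theta^{m-1}$, inserting this expression in (4) gives

$$f(t; \theta, \alpha, \beta) = \sum_{m=1}^{\infty} P(M = m)\, f_{T_{(1)}|M}(t; m, \alpha, \beta), \tag{6}$$

where $f_{T_{(1)}|M}$ is the conditional pdf of $T_{(1)}$ given the value of M for the IG distribution. Therefore, this density function can be expressed as an infinite linear combination of densities of the first order statistic of the inverse gamma distribution.

Proof. Let P(M = m) be given by (2) and let $f_{T_{(1)}|M}(t; m, \alpha, \beta)$ be given by

$$f_{T_{(1)}|M}(t; m, \alpha, \beta) = m\, f_{IG}(t; \alpha, \beta)\, [S_{IG}(t; \alpha, \beta)]^{m-1},$$

where $f_{IG}$ and $S_{IG}$ denote the pdf and the survival function of the IG(α, β) distribution. Then

$$\sum_{m=1}^{\infty} P(M = m)\, f_{T_{(1)}|M}(t; m, \alpha, \beta) = \frac{\theta\, f_{IG}(t; \alpha, \beta)}{A(\theta)} \sum_{m=1}^{\infty} m\, a_m [\theta\, S_{IG}(t; \alpha, \beta)]^{m-1} = \frac{\theta\, f_{IG}(t; \alpha, \beta)\, A'(\theta\, S_{IG}(t; \alpha, \beta))}{A(\theta)},$$

which is the density (4).

The following proposition gives the moments of the IGPS model.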
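The mixture representation also suggests a direct way to simulate from the model: draw M from the PS distribution and then draw the minimum of M inverse gamma variables. The sketch below does this for the geometric member only and compares empirical and closed-form survival probabilities. It is an illustration under the stated compounding assumptions; the function names, the seed and the parameter values are hypothetical.

```python
import numpy as np
from scipy.special import gammaincinv
from scipy.stats import invgamma


def sample_igg(n, theta, alpha, beta, rng):
    """Draw n values of T = min(W_1, ..., W_M), geometric M, IG(alpha, beta) causes."""
    # Geometric PS member: P(M = m) = (1 - theta) * theta^(m - 1), m = 1, 2, ...
    m = rng.geometric(1.0 - theta, size=n)
    # Given M = m the minimum has survival S_IG(t)^m; invert it at a uniform draw.
    u = rng.uniform(size=n)
    # S_IG(t) = P(alpha, beta / t), hence t = beta / P^{-1}(alpha, u^(1/m)).
    return beta / gammaincinv(alpha, u ** (1.0 / m))


def igg_sf(t, theta, alpha, beta):
    """A(theta * S_IG(t)) / A(theta) with A(u) = u / (1 - u)."""
    s = invgamma.sf(t, alpha, scale=beta)
    return s * (1.0 - theta) / (1.0 - theta * s)


rng = np.random.default_rng(2024)
draws = sample_igg(20_000, theta=0.5, alpha=2.0, beta=5.0, rng=rng)
for t0 in (1.0, 2.0, 4.0):
    print(t0, np.mean(draws > t0), igg_sf(t0, 0.5, 2.0, 5.0))
```

For other PS members, only the line that draws M would change, provided a sampler for the corresponding zero-truncated distribution is available.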

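Proposition 3. The rth moment of an IGPS(θ, α, β) distribution is given by

$$E(T^r) = \sum_{m=1}^{\infty} P(M = m)\, E\big(T_{(1)}^r \mid M = m\big) = \sum_{m=1}^{\infty} \frac{a_m \theta^m}{A(\theta)} \int_0^{\infty} t^r f_{T_{(1)}|M}(t; m, \alpha, \beta)\, dt.$$

Proof. By taking expression (6) and using the Monotone Convergence Theorem, the rth moment of the random variable T is calculated as follows:

$$E(T^r) = \int_0^{\infty} t^r \sum_{m=1}^{\infty} P(M = m)\, f_{T_{(1)}|M}(t; m, \alpha, \beta)\, dt = \sum_{m=1}^{\infty} P(M = m) \int_0^{\infty} t^r f_{T_{(1)}|M}(t; m, \alpha, \beta)\, dt.$$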
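The pdf of the ith order statistic $T_{(i)}$ in a random sample of size n is given by

$$f_{i:n}(t) = \frac{n!}{(i-1)!\,(n-i)!}\, f(t)\, [1 - S(t)]^{i-1}\, [S(t)]^{n-i},$$

where f(·) is the pdf given in (4) and S(·) is the survival function given in (3). The cdf of $T_{(i)}$ is given by

$$F_{i:n}(t) = \sum_{k=i}^{n} \binom{n}{k} [1 - S(t)]^{k}\, [S(t)]^{n-k}.$$

Using the result of Barakat and Abdelkader [15], with S(t) for our model as in (3), we obtain the rth moment of the order statistic:

$$E\big(T_{(i)}^r\big) = r \sum_{k=n-i+1}^{n} (-1)^{k-n+i-1} \binom{k-1}{n-i} \binom{n}{k} \int_0^{\infty} t^{r-1} [S(t)]^{k}\, dt,$$

for i = 1, . . ., n.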
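Since the moment expressions involve integrals without a simple closed form, a numerical check can be useful. The sketch below computes the first moment of the geometric member in two ways, by integrating t^r f(t) and by integrating r t^(r-1) S(t), and the two values should agree. It assumes the geometric series function A(u) = u/(1 - u) and r < α so that the integrals are finite; the function names and parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import invgamma


def igg_sf(t, theta, alpha, beta):
    """Survival function of the geometric member."""
    s = invgamma.sf(t, alpha, scale=beta)
    return s * (1.0 - theta) / (1.0 - theta * s)


def igg_pdf(t, theta, alpha, beta):
    """Density of the geometric member: (1 - theta) f_IG / (1 - theta S_IG)^2."""
    s = invgamma.sf(t, alpha, scale=beta)
    return (1.0 - theta) * invgamma.pdf(t, alpha, scale=beta) / (1.0 - theta * s) ** 2


def moment_from_pdf(r, theta, alpha, beta):
    value, _ = quad(lambda t: t ** r * igg_pdf(t, theta, alpha, beta), 0.0, np.inf)
    return value


def moment_from_sf(r, theta, alpha, beta):
    value, _ = quad(lambda t: r * t ** (r - 1) * igg_sf(t, theta, alpha, beta), 0.0, np.inf)
    return value


theta, alpha, beta = 0.5, 3.0, 5.0   # r = 1 < alpha, so both integrals are finite
m1_pdf = moment_from_pdf(1, theta, alpha, beta)
m1_sf = moment_from_sf(1, theta, alpha, beta)
print(m1_pdf, m1_sf)
```

The second form, based on the survival function, is the same type of integral that appears in the order-statistic moment formula, with S(t) raised to the power k.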

Estimation
In this section, we discuss two different methods to estimate the parameters of the IGPS distribution. The first one is based on matching theoretical and sample quantiles for the IGPS and the second is based on the EM algorithm (see Dempster et al. [16]).

Quantile-Matching Estimation Method
A first set of estimates for ψ = (θ, α, β) is obtained by matching the first, second and third sample quartiles (denoted as $q_1$, $q_2$ and $q_3$, respectively) with their theoretical counterparts. Writing F = 1 − S for the cdf of the IGPS model, with S as in (3), the resulting equations are

$$F(q_j; \theta, \alpha, \beta) = \frac{j}{4}, \quad j = 1, 2, 3.$$

Solving one of these equations for β expresses β in terms of θ and α. Therefore, the system is reduced to two equations in (θ, α) that can be solved numerically.
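The reduction step can be made concrete for the geometric member, where the quantile function has a closed form in terms of the inverse regularized incomplete gamma function. In the sketch below, the median equation is solved for β and the remaining two quartile equations are written as residuals in (θ, α). Using the median for the elimination is an assumption made here for illustration, and the function names and "true" parameter values are hypothetical.

```python
import numpy as np
from scipy.special import gammaincinv
from scipy.stats import invgamma


def igg_quantile(p, theta, alpha, beta):
    """Solve A(theta * S_IG(t)) / A(theta) = 1 - p for t, with A(u) = u / (1 - u)."""
    surv = 1.0 - p
    s_ig = surv / (1.0 - theta + theta * surv)   # required IG survival level
    return beta / gammaincinv(alpha, s_ig)       # invert S_IG(t) = P(alpha, beta / t)


def beta_from_median(q2, theta, alpha):
    """Median equation solved for beta, leaving (theta, alpha) as the unknowns."""
    s_ig = 0.5 / (1.0 - 0.5 * theta)
    return q2 * gammaincinv(alpha, s_ig)


def reduced_residuals(params, q1, q2, q3):
    """First- and third-quartile equations after eliminating beta."""
    theta, alpha = params
    beta = beta_from_median(q2, theta, alpha)
    fitted = igg_quantile(np.array([0.25, 0.75]), theta, alpha, beta)
    return fitted - np.array([q1, q3])


theta, alpha, beta = 0.5, 2.0, 5.0            # hypothetical "true" values
q1, q2, q3 = igg_quantile(np.array([0.25, 0.5, 0.75]), theta, alpha, beta)
print(beta_from_median(q2, theta, alpha))     # recovers beta up to rounding
print(reduced_residuals((theta, alpha), q1, q2, q3))
```

In practice, `reduced_residuals` could be passed to a bounded root finder or least-squares routine such as `scipy.optimize.least_squares`, with sample quartiles in place of the theoretical ones used here.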

EM-Type Algorithm
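For a sample $t_1, \ldots, t_n$ from the IGPS model, the (observed) log-likelihood function for ψ = (θ, α, β) is given by

$$\ell(\psi) = n \log\theta - n \log A(\theta) + \sum_{i=1}^{n} \log f_{IG}(t_i; \alpha, \beta) + \sum_{i=1}^{n} \log A'(\theta\, S_{IG}(t_i; \alpha, \beta)), \tag{11}$$

where $f_{IG}$ and $S_{IG}$ denote the pdf and the survival function of the IG(α, β) distribution. Direct maximization of (11) can be hard. For this reason, since the distribution given in (4) is obtained through a mixing process, we propose an EM-type algorithm to perform the parameter estimation. In this problem, the vector $M = (M_1, \ldots, M_n)$ is unobservable and the vector $t = (t_1, \ldots, t_n)$ represents the observable data. Thus, the vector $D_{comp} = (t, M)$ represents the complete data. Up to a constant, the complete log-likelihood function for ψ is given by

$$\ell_c(\psi) = \log\theta \sum_{i=1}^{n} M_i - n \log A(\theta) + \sum_{i=1}^{n} \log f_{IG}(t_i; \alpha, \beta) + \sum_{i=1}^{n} (M_i - 1) \log S_{IG}(t_i; \alpha, \beta).$$

Let $\psi^{(k)}$ be the estimate of ψ at the kth iteration and denote by $Q(\psi \mid \psi^{(k)})$ the conditional expectation of $\ell_c(\psi)$ given the observed data and $\psi^{(k)}$. Therefore,

$$Q(\psi \mid \psi^{(k)}) = \log\theta \sum_{i=1}^{n} \tilde{M}_i^{(k)} - n \log A(\theta) + \sum_{i=1}^{n} \log f_{IG}(t_i; \alpha, \beta) + \sum_{i=1}^{n} \big(\tilde{M}_i^{(k)} - 1\big) \log S_{IG}(t_i; \alpha, \beta),$$

where $\tilde{M}_i^{(k)} = E(M_i \mid t_i, \psi^{(k)})$ can be computed using Proposition 1 in Gallardo et al. [17]. We also note that the maximization with respect to θ can be performed independently of the values of α and β. However, the maximization with respect to α (β) can be performed conditionally on the value of β (α), producing a conditional maximization (CM) step (see Meng and Rubin [18] for details).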
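The conditional expectation of M given an observation can be sketched for specific PS members. By Bayes' rule, the conditional weights of M = m given t are proportional to a_m θ^m m S_IG(t)^(m-1), which leads to simple closed forms in u = θ S_IG(t). These closed forms are derived here for illustration and may differ in presentation from Proposition 1 of Gallardo et al. [17]; the code cross-checks them against a truncated brute-force sum. All function names are hypothetical.

```python
import math


def e_step_closed_form(s, theta, family):
    """E(M | t) written in terms of u = theta * S_IG(t), where s = S_IG(t)."""
    u = theta * s
    if family == "poisson":
        return 1.0 + u
    if family == "geometric":
        return (1.0 + u) / (1.0 - u)
    if family == "logarithmic":
        return 1.0 / (1.0 - u)
    raise ValueError(f"unknown family: {family}")


def e_step_brute_force(s, theta, a, m_max=150):
    """Same expectation from weights proportional to a_m * theta^m * m * s^(m-1)."""
    ms = range(1, m_max + 1)
    weights = [a(m) * theta ** m * m * s ** (m - 1) for m in ms]
    return sum(m * w for m, w in zip(ms, weights)) / sum(weights)


coefficients = {
    "poisson": lambda m: 1.0 / math.factorial(m),
    "geometric": lambda m: 1.0,
    "logarithmic": lambda m: 1.0 / m,
}
for family, theta in [("poisson", 3.0), ("geometric", 0.85), ("logarithmic", 0.5)]:
    s = 0.6  # hypothetical value of S_IG(t_i) at the current (alpha, beta)
    print(family, e_step_closed_form(s, theta, family),
          e_step_brute_force(s, theta, coefficients[family]))
```

In an EM iteration, `s` would be the IG survival function evaluated at each observation using the current values of α and β.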
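In summary, the kth iteration of the EM algorithm has the following form:
• E-step: Given $\psi^{(k)}$, compute the conditional expectations $E(M_i \mid t_i, \psi^{(k)})$, for i = 1, . . ., n.
• CM-step I: Update $\theta^{(k)}$ by maximizing $Q(\psi \mid \psi^{(k)})$ with respect to θ.
• CM-step II: Given $\beta^{(k)}$, update $\alpha^{(k)}$ as the solution of the non-linear equation $\partial Q(\psi \mid \psi^{(k)})/\partial\alpha = 0$.
• CM-step III: Given $\alpha^{(k)}$, update $\beta^{(k)}$ as the solution of the non-linear equation $\partial Q(\psi \mid \psi^{(k)})/\partial\beta = 0$.
• If some convergence condition is satisfied, then stop iterating; otherwise, move back to the E-step for another iteration.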
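As a sketch of the θ update, assume that the part of Q depending on θ is log(θ) times the sum of the E-step pseudo-values minus n log A(θ). Setting its derivative to zero gives θ A'(θ)/A(θ) equal to the average pseudo-value, which has a closed form for the geometric member and can be solved by bisection for the Poisson member. This derivation and the function names are illustrative and are not taken from the paper's equations.

```python
import math


def mean_of_m(theta, family):
    """theta * A'(theta) / A(theta), which equals E(M) under PS(theta, A(theta))."""
    if family == "poisson":
        return theta / -math.expm1(-theta)
    if family == "geometric":
        return 1.0 / (1.0 - theta)
    raise ValueError(f"unknown family: {family}")


def update_theta(m_bar, family, lo=1e-10, hi=50.0, tol=1e-12):
    """Solve mean_of_m(theta) = m_bar; requires 1 < m_bar < mean_of_m(hi)."""
    if family == "geometric":
        return 1.0 - 1.0 / m_bar          # closed-form solution
    while hi - lo > tol:                  # bisection: mean_of_m is increasing
        mid = 0.5 * (lo + hi)
        if mean_of_m(mid, family) < m_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m_tilde = [1.2, 2.9, 1.7, 3.4, 2.1]       # hypothetical E-step pseudo-values
m_bar = sum(m_tilde) / len(m_tilde)
print(update_theta(m_bar, "poisson"), update_theta(m_bar, "geometric"))
```

The updates for α and β do not have closed forms in general and would need a one-dimensional root finder applied to the corresponding score equations.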
The standard errors of the estimates $\hat{\psi} = (\hat{\theta}, \hat{\alpha}, \hat{\beta})$ can be estimated using the method given by Louis [19]. Here, we use the observed information matrix instead of the expected Fisher information matrix and replace the missing values by the corresponding pseudo-values calculated in the last iteration of the ECM algorithm.

Randomized Quantile Residuals
As a graphical method of model diagnosis, we use QQ-plots of the randomized quantile residuals (see Dunn and Smyth [20]). The ith randomized quantile residual is defined as

$$r_{q,i} = \Phi^{-1}\big(F(t_i; \hat{\psi})\big), \quad i = 1, \ldots, n,$$

where F(·) is the cdf of the model specified by (4) and Φ(·) is the cdf of the standard normal distribution. If the model is correctly specified, the $r_{q,i}$ are a random sample from the standard normal distribution. In particular, the ith randomized quantile residual of the IGPS distribution is given by

$$r_{q,i} = \Phi^{-1}\left(1 - \frac{A\big(\hat{\theta}\, S_{IG}(t_i; \hat{\alpha}, \hat{\beta})\big)}{A(\hat{\theta})}\right),$$

where $S_{IG}$ is the survival function of the IG distribution. The latter expression will be used to draw the QQ-plots in the applications section.
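The residuals can be computed directly from the fitted cdf. The sketch below uses the geometric member and simulated data with known parameters in place of fitted values, so the residuals should look approximately standard normal. The simulation settings, seed and function names are assumptions for illustration only.

```python
import numpy as np
from scipy.special import gammaincinv
from scipy.stats import invgamma, norm


def igg_cdf(t, theta, alpha, beta):
    """F(t) = 1 - A(theta * S_IG(t)) / A(theta) with A(u) = u / (1 - u)."""
    s = invgamma.sf(t, alpha, scale=beta)
    return 1.0 - s * (1.0 - theta) / (1.0 - theta * s)


def quantile_residuals(t, theta, alpha, beta):
    """r_i = Phi^{-1}(F(t_i)); a continuous model needs no extra randomization."""
    return norm.ppf(igg_cdf(t, theta, alpha, beta))


# Simulate from the model itself (geometric M, minimum of M inverse gamma draws).
rng = np.random.default_rng(7)
theta, alpha, beta = 0.5, 2.0, 5.0
m = rng.geometric(1.0 - theta, size=5000)
t = beta / gammaincinv(alpha, rng.uniform(size=5000) ** (1.0 / m))

r = quantile_residuals(t, theta, alpha, beta)
print(round(r.mean(), 3), round(r.std(ddof=1), 3))
```

Sorting `r` and plotting it against standard normal quantiles would give the QQ-plot described in the text.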

Simulation Study
In the following, we study the behavior of the maximum likelihood estimates (MLE) in finite samples, to verify empirically that these estimates satisfy desirable properties (unbiasedness, asymptotic efficiency and asymptotic normality). For this purpose, the EM algorithm was used to compute the estimates, and their corresponding standard errors were obtained by means of the Hessian matrix. This process is replicated 1000 times with sample sizes n = 50, 100, 200 for the parameters θ = 1.5, 3 in the Poisson model and θ = 0.2, 0.85 in the geometric model. The values α = 1.2, 2 and β = 5, 10 are maintained for both models. Then, for each estimate we calculated its average bias (bias), average standard error (se) and root of the mean squared error (RMSE), as shown in Table 2. We observed that the averages are close to the true values for the IGP and IGG models. Additionally, as expected, the bias and RMSE decrease as the sample size increases.
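The summary measures reported in such a study are straightforward to compute once the replicate estimates are available. The sketch below uses a small hypothetical vector of estimates (not output from the paper's simulations) to show the bias and RMSE calculations.

```python
import numpy as np


def summarize(estimates, true_value):
    """Average bias and root mean squared error across Monte Carlo replicates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    return bias, rmse


# Hypothetical replicate estimates of theta when the true value is 1.5.
theta_hats = [1.4, 1.6, 1.5, 1.7]
bias, rmse = summarize(theta_hats, true_value=1.5)
print(bias, rmse)
```

The squared RMSE equals the squared bias plus the (population) variance of the estimates, which gives a convenient internal check.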

Real Data Illustration
In this section, we apply the IGPS distribution to two real data sets.
We first use the quantile-matching estimation method to find the estimates for the IGP, IGL and IGG distributions, obtaining θ = 0.01, α = 0.8656 and β = 0.9592 for the IGP model, θ = 0.01, α = 0.8657 and β = 0.9592 for the IGL model and θ = 0.01, α = 0.8661 and β = 0.9597 for the IGG model. Then, taking those figures as starting values, we estimate the parameters via the aforementioned EM algorithm. For model comparison, we have also fitted the WPS model discussed in Morais and Barreto-Souza [5] and the EG, EP and EL models of Adamidis and Loukas [1], Kus [2] and Tahmasbi and Rezaei [3], respectively. Table 3 exhibits three measures of model selection for the repair times data set: the maximum of the log-likelihood function ($\ell_{\max}$), Akaike's information criterion (AIC) (see Akaike [22]) and the Bayesian information criterion (BIC) (see Schwarz [23]). For the first measure a larger value is preferable, whereas for the last two measures a lower value is desirable. Table 4 shows the estimates and standard errors (in brackets) for the three models, one from each of the EPS, WPS and IGPS families, with the lowest AIC and BIC statistics.

In addition, it is useful to express the fit of the model to the data in terms of distribution functions. In particular, empirical distribution function (EDF) goodness-of-fit measures quantify the "distance" between the empirical distribution function constructed from the data and the cumulative distribution function of the fitted model. In this paper, we use the Anderson-Darling (AD) test statistic. We are also interested in testing for normality by means of the Shapiro-Francia (SF) and Shapiro-Wilk (SW) tests. For the AD test, smaller values of the test statistic indicate a better fit of the model to the data. For the SF and SW tests, the null hypothesis is that the data are drawn from a normal distribution. Judging by the p-values of the corresponding test statistics presented in Table 4, none of the models is rejected at the 5% significance level, so the models are statistically legitimate candidates to explain this data set.

Furthermore, Figure 2 shows the histogram and the estimated density functions for this data set. Finally, the QQ-plot of the randomized quantile residuals is shown in Figure 3. A perfect alignment with the 45° line implies that the residuals are normally distributed. It can be observed that the residuals for the IGG distribution underestimate the lower part and overestimate the upper part of the distribution of residuals.
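For reference, the two information criteria used for model comparison can be computed from the maximized log-likelihood as follows. The numerical inputs below are placeholders chosen for illustration and are not values from Table 3; the function names are hypothetical.

```python
import math


def aic(loglik, n_params):
    """Akaike's information criterion: 2k - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k * log(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Placeholder inputs: a three-parameter model fitted to 50 observations.
loglik, k, n = -100.0, 3, 50
print(aic(loglik, k), bic(loglik, k, n))
```

Because all IGPS members considered here have three parameters, comparisons within the family reduce to comparing the maximized log-likelihoods; the penalties matter when comparing against families with a different number of parameters.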

Gauge Lengths Data Set
The second data set was originally reported by Badar and Priest [24] and is also discussed in Kundu and Raqab [25]. It deals with the strength, measured in GPa, of single carbon fibers and impregnated 1000-carbon fiber tows. Single fibers were tested under tension at gauge lengths of 10 mm (n = 63). The data set consists of the following observations:

We again use the quantile-matching estimation method to find the estimates for the IGP, IGL and IGG distributions, obtaining θ = 0.01, α = 20.1179 and β = 58.5640 for the IGP model, θ = 0.01, α = 20.0862 and β = 58.3896 for the IGL model and θ = 0.01, α = 20.0958 and β = 58.5036 for the IGG model. Next, we estimate the parameters by means of the EM algorithm using those numbers as initial values. For the sake of model comparison, we have also fitted the WPS, EG, EP and EL models. Table 5 exhibits three measures of model selection for the gauge lengths data set: the maximum of the log-likelihood function ($\ell_{\max}$), AIC and BIC. Table 6 shows the estimates and standard errors (in brackets) for the three models, one from each of the EPS, WPS and IGPS families, with the lowest AIC and BIC statistics.

Moreover, we again use the Anderson-Darling (AD) test statistic. We are also interested in testing for normality by means of the Shapiro-Francia (SF) and Shapiro-Wilk (SW) tests. For the AD test, smaller values of the test statistic indicate a better fit of the model to the data. For the SF and SW tests, the null hypothesis is that the data are drawn from a normal distribution. Judging by the p-values of the corresponding test statistics presented in Table 6, none of the models is rejected at the 5% significance level, so the models are statistically legitimate candidates to explain this data set.

Once again, Figure 4 shows the histogram and the estimated density functions for this data set. Finally, the QQ-plot of the randomized quantile residuals is displayed in Figure 5. It is again noticeable that the residuals for the IGG distribution underestimate the lower part and overestimate the upper part of the distribution of residuals.

Conclusions
In this paper, the inverse gamma power series (IGPS) family of probability distributions has been introduced. This family has been obtained by mixing the inverse gamma and power series distributions. Moreover, four particular members of this family have been derived and examined. Some of its most relevant properties have been studied, including the probability density function, the survival and hazard functions, and the order statistics. The issue of parameter estimation was first discussed by means of the quantile-matching estimation method. Then, the estimates obtained by the latter method were used as initial values in a novel EM algorithm to carry out maximum likelihood estimation. Furthermore, a simulation study was performed to examine the efficiency of these estimates.

Figure 1. Density and hazard functions for the IGP, IGL, IGG and IGB distributions with different combinations of parameters.

Figure 2. (a) Density function for IGG, WG and EP models and (b) for the right tail in the repair times data set.

Figure 3. QQ-plot of the randomized quantile residuals of the IGG distribution for the repair times data set.

Figure 4. Density function for IGG, WG and LG models in the gauge lengths data set.

Figure 5. QQ-plot of the randomized quantile residuals of the IGG distribution for the gauge lengths data set.

Table 2. Simulation study for IGP and IGG models.

Table 3. Maximum of the log-likelihood function $\ell_{\max}$, AIC and BIC for EPS, WPS and IGPS models in the repair times data set.

Table 4. Estimates, standard errors (in brackets) and p-values associated with the AD, SF and SW statistics for IGG, WG and EP models in the repair times data set.

Table 5. Maximum of the log-likelihood function $\ell_{\max}$, AIC and BIC for EPS, WPS and IGPS models in the gauge lengths data set.

Table 6. Estimates, standard errors (in brackets) and p-values associated with the AD, SF and SW statistics for IGG, WG and EP models in the gauge lengths data set.