An Asymmetric Distribution with Heavy Tails and Its Expectation–Maximization (EM) Algorithm Implementation

In this paper we introduce a new distribution constructed as the quotient of two independent random variables: a half-normal random variable in the numerator and a power of an exponential random variable with parameter 2 in the denominator. The result is a distribution with greater kurtosis than the well-known half-normal and slashed half-normal distributions. We study the density function of this distribution together with some of its properties, its moments, and its coefficients of asymmetry and kurtosis. We develop the expectation–maximization algorithm and present a simulation study. We calculate the moment and maximum likelihood estimators and present three illustrations on real data sets to show the flexibility of the new model.


Introduction
In recent years the half-normal (HN) model has been widely used for data with positive support, specifically lifetime or reliability data. Its probability density function (pdf) is given by

f(x; σ) = (2/σ) φ(x/σ), x ≥ 0,

where σ > 0 is the scale parameter and φ(·) represents the standard normal pdf. We denote this by writing X ∼ HN(σ). Some generalizations of this model are proposed by Cooray and Ananda [1], Cordeiro et al. [2], Bolfarine and Gómez [3] and Gómez and Vidal [4].
Olmos et al. [5] extended the HN distribution by incorporating a kurtosis parameter q, with the purpose of obtaining heavier tails, i.e., greater kurtosis than the base model. They called this model the slashed half-normal (SHN) distribution. Its construction is based on the quotient of two independent random variables, with X ∼ HN(σ) in the numerator and U^(1/q), where U ∼ U(0, 1), in the denominator (see Rogers and Tukey [6] and Mosteller and Tukey [7] for more details). The resulting model has more flexible coefficients of asymmetry and kurtosis than the HN model. We say that a random variable T follows a SHN distribution if its pdf is given by

f_T(t; σ, q) = (q 2^(q/2) σ^q Γ((q + 1)/2)/√π) t^(−(q+1)) G(t²; (q + 1)/2, 1/(2σ²)), t > 0,

where σ > 0 is a scale parameter, q > 0 is a kurtosis parameter, G(z; a, b) = ∫₀^z g(x; a, b) dx is the cumulative distribution function (cdf) of the gamma distribution and g(·; a, b) is the pdf of the gamma model with shape and rate parameters a and b, respectively.
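As a numerical sanity check, the SHN density can be evaluated with SciPy's gamma cdf (shape a, rate b corresponds to scale 1/b) and integrated over (0, ∞). This is only an illustrative sketch; the function name is not from the paper.

```python
import numpy as np
from scipy import integrate, special, stats

def shn_pdf(t, sigma, q):
    """SHN density: (q 2^(q/2) sigma^q Gamma((q+1)/2) / sqrt(pi))
    * t^(-(q+1)) * G(t^2; (q+1)/2, 1/(2 sigma^2))."""
    const = (q * 2 ** (q / 2) * sigma ** q
             * special.gamma((q + 1) / 2) / np.sqrt(np.pi))
    # G(z; a, b) with rate b = 1/(2 sigma^2), i.e. scale 2 sigma^2
    G = stats.gamma.cdf(t ** 2, a=(q + 1) / 2, scale=2 * sigma ** 2)
    return const * t ** (-(q + 1)) * G

# A proper density should integrate to 1 over (0, inf).
total, _ = integrate.quad(shn_pdf, 0, np.inf, args=(1.0, 2.0))
print(total)  # approximately 1
```

The t^(−(q+1)) factor is tamed near the origin by the gamma cdf, which vanishes like t^(q+1) there, so the integrand is well behaved at both ends.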
Reyes et al. [8] introduced the modified slash (MS) distribution, whose construction replaces the uniform denominator by an exponential (Exp) distribution with parameter 2, i.e., E ∼ Exp(2). The motivation for the choice of the Exp(2) distribution is given in Reyes et al. [8]. That work shows that the MS model has a greater coefficient of kurtosis, a characteristic that is very important for modeling data sets that contain atypical observations. The principal goal of this article is to use the idea published by Reyes et al. [8] to construct an extension of the half-normal model with a greater range of the coefficient of kurtosis than the SHN model, in order to use it to model atypical data. This will allow us to obtain a new model generated as a scale mixture of an HN and a Weibull (Wei) distribution.
The rest of the paper is organized as follows: Section 2 contains the representation of this model and we generate the density of the new family, its basic properties and moments, and its coefficients of asymmetry and kurtosis. In Section 3 we make inferences using the moments and maximum likelihood (ML) methods. In Section 4 we implement the expectation-maximization (EM) algorithm. In Section 5 we carry out a simulation study for parameter recovery. We show three illustrations in real datasets in Section 6 and finally in Section 7 we present our conclusions.

An Asymmetric Distribution
In this section we introduce the representation, its pdf, and some important properties and graphs to show the flexibility of the new model.

New Distribution
The representation of this new distribution is

T = X / Y^(1/q),     (3)

where X ∼ HN(σ) and Y ∼ Exp(2) are independent, σ > 0 and q > 0. We call the distribution of T the modified slashed half-normal (MSHN) distribution. This is denoted by T ∼ MSHN(σ, q).
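The representation above translates directly into a sampler. The sketch below assumes T = X/Y^(1/q) with Y exponential of rate 2 (scale 1/2); the function name is illustrative.

```python
import numpy as np

def rmshn(n, sigma, q, rng):
    """Draw n MSHN(sigma, q) variates via T = X / Y^(1/q)."""
    x = np.abs(rng.standard_normal(n)) * sigma   # X ~ HN(sigma)
    y = rng.exponential(scale=0.5, size=n)       # Y ~ Exp(rate = 2)
    return x / y ** (1.0 / q)

rng = np.random.default_rng(0)
t = rmshn(100_000, 1.0, 200.0, rng)
# For large q, Y^(1/q) concentrates at 1, so the sample mean should
# approach the HN(1) mean sigma * sqrt(2/pi).
print(t.mean(), np.sqrt(2 / np.pi))
```

The large-q behavior shown in the final lines anticipates the limit result discussed later (convergence in law to HN(σ)).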

Density Function
The following proposition gives the pdf of the MSHN distribution with scale parameter σ and kurtosis parameter q, generated using the representation given in (3).

Proposition 1. Let T ∼ MSHN(σ, q). Then, for t > 0, σ > 0 and q > 0, the pdf of T can be written in terms of the function N(·, ·, ·, ·) defined in Lemma 1 in the Appendix A.
Proof. Using the stochastic representation given in (3) and the Jacobian method, we obtain the density function associated with T. Making the change of variable u = t²w² and applying Lemma 1 as set forth in the Appendix A, we obtain the result.

We now present a brief comparison illustrating that the tails of the MSHN distribution are heavier than those of the SHN distribution. Table 1 shows the tail probabilities for different values in the SHN and MSHN models. It is immediately apparent that the MSHN tails are heavier than those of the SHN distribution.
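A quick Monte Carlo experiment in the spirit of Table 1, comparing SHN and MSHN tail probabilities at the same (σ, q) via their quotient representations; the cut-off values below are illustrative, not the ones tabulated in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma, q = 500_000, 1.0, 2.0

# SHN draws: X / U^(1/q), U ~ Uniform(0, 1)
x1 = np.abs(rng.standard_normal(n)) * sigma
shn = x1 / rng.uniform(size=n) ** (1.0 / q)

# MSHN draws: X / Y^(1/q), Y ~ Exp(rate = 2)
x2 = np.abs(rng.standard_normal(n)) * sigma
mshn = x2 / rng.exponential(scale=0.5, size=n) ** (1.0 / q)

for c in (3.0, 5.0, 10.0):
    print(c, (shn > c).mean(), (mshn > c).mean())
```

Heuristically, the MSHN denominator Y^(1/q) falls near zero roughly twice as often as U^(1/q) does (P(Y < s) ≈ 2s versus P(U < s) = s for small s), which is why its tail probabilities come out larger at every cut-off.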

Properties
In this sub-section we study some properties of the MSHN distribution.
Proof. Using Proposition 1 with σ = q = 1 and changing the variable x = u², we obtain the result.
Proof. The result is obtained from the marginal pdf of T together with Lemma 1 in the Appendix A.

Proof. Since T ∼ MSHN(σ, q), applying Slutsky's Lemma (see Lehmann [9]) to T = X/W shows that, for increasing values of q, T converges in law to a HN(σ) distribution.
Remark 1. Proposition 2 shows that the MSHN(1, 1) distribution has a closed-form expression. Proposition 3 shows that an MSHN(σ, q) distribution can also be obtained as a scale mixture of an HN and a Wei distribution. This property is very important since it makes it possible to generate random numbers and to implement the EM algorithm. Proposition 4 implies that, as q → ∞, the cdf of an MSHN(σ, q) model approaches the cdf of a HN(σ) distribution.

Moments
In this sub-section the following proposition gives the moments of a random variable T ∼ MSHN(σ, q), from which the coefficients of asymmetry and kurtosis are obtained.
Proposition 5. Let T ∼ MSHN(σ, q). Then, for q > r, the r-th moment of T is given by

μ_r = E(T^r) = (2^(r/2 + r/q) σ^r Γ((r + 1)/2) Γ(1 − r/q))/√π,

where Γ(·) denotes the gamma function.
The expectation and variance of T are then given, respectively, by

E(T) = (2^(1/2 + 1/q) σ Γ(1 − 1/q))/√π, q > 1, and Var(T) = 2^(2/q) σ² [Γ(1 − 2/q) − (2/π) Γ(1 − 1/q)²], q > 2.

The coefficients of asymmetry (β₁) and kurtosis (β₂) follow from these moments, for q > 3 and q > 4, respectively.

Remark 2. Figure 2 shows graphs of these coefficients for the MSHN distribution compared with those of the SHN distribution. Note that the MSHN distribution attains higher asymmetry and kurtosis values than the SHN distribution. Furthermore, in both distributions, as q → ∞ the coefficients of asymmetry and kurtosis converge to √2(4 − π)(π − 2)^(−3/2) and (3π² − 4π − 12)(π − 2)^(−2), respectively; these coincide with the coefficients of the HN distribution.
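Under the representation in (3), independence gives E[T^r] = E[X^r] E[Y^(−r/q)], which leads to the closed form coded below. This is a sketch derived here from that representation (checked against Monte Carlo), and may differ in written form from the paper's displayed expression.

```python
import math
import numpy as np

def mshn_moment(r, sigma, q):
    """E[T^r] = 2^(r/2 + r/q) sigma^r Gamma((r+1)/2) Gamma(1 - r/q) / sqrt(pi),
    valid for q > r (otherwise the moment does not exist)."""
    assert q > r, "moment exists only for q > r"
    return (2 ** (r / 2 + r / q) * sigma ** r
            * math.gamma((r + 1) / 2) * math.gamma(1 - r / q)
            / math.sqrt(math.pi))

# Monte Carlo cross-check using the quotient representation T = X / Y^(1/q).
rng = np.random.default_rng(3)
n, sigma, q = 400_000, 1.0, 6.0
t = (np.abs(rng.standard_normal(n)) * sigma
     / rng.exponential(scale=0.5, size=n) ** (1.0 / q))
print(t.mean(), mshn_moment(1, sigma, q))
print((t ** 2).mean(), mshn_moment(2, sigma, q))
```

The existence condition q > r is consistent with the conditions quoted in the text (q > 2 for the variance, q > 3 and q > 4 for the asymmetry and kurtosis coefficients).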

Inference
Proposition 6. Let T₁, . . . , T_n be a random sample of size n from the T ∼ MSHN(σ, q) distribution. Then, for q > 2, the moment estimators of σ and q are given by (9) and (10), where T̄ denotes the sample mean of the observations and T̄₂ the sample mean of the squared observations.
Proof. From Proposition 5, we consider the first two equations of the method of moments. Solving the first equation for σ, we obtain σ_M as given in (9). Substituting σ_M into the second equation, we obtain the result given in (10).
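Expressions (9) and (10) are not reproduced here; as a sketch, the same moment estimators can be computed numerically. Using moments derived from the representation in (3), the ratio T̄²/T̄₂ depends on q alone, so a moment estimate of q can be found by root-finding, after which σ is recovered from the first moment. All names below are illustrative and the closed forms are derived here, not copied from (9)–(10).

```python
import math
import numpy as np
from scipy.optimize import brentq

def moment_ratio(q):
    """mu_1^2 / mu_2 as a function of q alone (sigma cancels);
    increases from 0 (q -> 2+) to 2/pi (q -> infinity)."""
    return 2 * math.gamma(1 - 1 / q) ** 2 / (math.pi * math.gamma(1 - 2 / q))

def mom_estimates(t):
    m1, m2 = t.mean(), (t ** 2).mean()
    # Clip the empirical ratio below its theoretical supremum 2/pi,
    # so the root bracket below is always valid.
    r = min(m1 ** 2 / m2, moment_ratio(1e6) - 1e-9)
    q_hat = brentq(lambda z: moment_ratio(z) - r, 2.01, 1e6)
    sigma_hat = (m1 * math.sqrt(math.pi)
                 / (2 ** (0.5 + 1 / q_hat) * math.gamma(1 - 1 / q_hat)))
    return sigma_hat, q_hat

# Illustrative run on simulated data with true (sigma, q) = (2, 6).
rng = np.random.default_rng(11)
t = (np.abs(rng.standard_normal(20_000)) * 2.0
     / rng.exponential(scale=0.5, size=20_000) ** (1.0 / 6.0))
print(mom_estimates(t))
```

As in the paper, such moment estimates are natural starting values for the likelihood-based procedures of the following sections.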

EM Algorithm
The EM algorithm (Dempster et al. [10]) is a useful method for ML estimation in the presence of latent variables.
To facilitate the estimation process, we introduce latent variables W₁, . . . , W_n through the hierarchical representation of the MSHN model implied by Proposition 3: T_i | (W_i = w_i) ∼ HN(σ/w_i) and W_i ∼ Wei(q, (1/2)^(1/q)), independently for i = 1, . . . , n. In this setting, the complete log-likelihood function can be expressed in terms of the latent variables. Letting ŵ_i = E[W_i | t_i, θ = θ̂], the conditional expectation of the complete log-likelihood function takes the form of a term in σ plus Q(q | θ̂) = n log(q) + q S_1n − 2 S_2n,q, with S_1n = Σ_{i=1}^n E[log W_i | t_i, θ̂] and S_2n,q = Σ_{i=1}^n E[W_i^q | t_i, θ̂].

As both quantities S_1n and S_2n,q have no explicit forms in the context of the MSHN model, they have to be computed numerically. Thus, to compute Q(q | θ̂) we use an approach similar to that of Lee and Xu ([11], Section 3.1), i.e., considering {w_r; r = 1, . . . , R} to be a random sample from the conditional distribution W | (T = t, θ = θ̂), Q(q | θ̂) can be approximated by Monte Carlo averages over this sample.

Therefore, the EM algorithm for the MSHN model is given by:

E-step: Given θ̂ = θ̂^(k) = (σ̂^(k), q̂^(k)), calculate ŵ_i^(k), for i = 1, . . . , n.

CM-step II:
The E-step, CM-step I and CM-step II are repeated until a convergence rule is satisfied, e.g., until |ℓ(θ̂^(k+1)) − ℓ(θ̂^(k))| is sufficiently small. Finally, standard errors (SE) can be estimated using the inverse of the observed information matrix.
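As an independent check on the EM output, the actual log-likelihood can be maximized directly, with the density obtained by numerically integrating the scale mixture f_T(t) = ∫ w f_HN(tw; σ) f_W(w) dw, where f_W(w) = 2q w^(q−1) e^(−2w^q) is the Weibull density implied by Y ∼ Exp(2) in (3). This is a sketch with illustrative names, not the paper's EM implementation.

```python
import numpy as np
from scipy import integrate, optimize

def mshn_pdf(t, sigma, q):
    """MSHN density via the scale-mixture integral (numerical)."""
    def integrand(w):
        f_hn = np.sqrt(2 / np.pi) / sigma * np.exp(-(t * w) ** 2 / (2 * sigma ** 2))
        f_w = 2 * q * w ** (q - 1) * np.exp(-2 * w ** q)   # W = Y^(1/q), Y ~ Exp(2)
        return w * f_hn * f_w
    val, _ = integrate.quad(integrand, 0, np.inf)
    return val

def negloglik(params, data):
    sigma, q = np.exp(params)            # optimize on the log scale
    return -sum(np.log(mshn_pdf(t, sigma, q)) for t in data)

# Simulated data with true (sigma, q) = (1, 3); a small n keeps this cheap.
rng = np.random.default_rng(5)
data = (np.abs(rng.standard_normal(150))
        / rng.exponential(scale=0.5, size=150) ** (1.0 / 3.0))
res = optimize.minimize(negloglik, x0=[0.0, 0.0], args=(data,),
                        method="Nelder-Mead", options={"maxfev": 150})
print(np.exp(res.x))  # rough (sigma_hat, q_hat) to compare with EM output
```

Because each likelihood evaluation performs n numerical integrations, this direct approach is slower than EM; it is useful mainly as a convergence check, in the spirit of the CML-step mentioned below.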

Remark 3.
i. As q → ∞, the update for σ in the M-step reduces to that obtained when the HN distribution is used; ii. An alternative to CM-step II is obtained by considering the idea in Lin et al. ([12], Section 3) and using the following step: CML-step: Update q̂^(k) by maximizing the constrained actual log-likelihood function, i.e., q̂^(k+1) = arg max_q ℓ(σ̂^(k+1), q).

Simulation
We present a simulation study to assess the performance of the EM algorithm for the parameters σ and q in the MSHN model. We consider 1000 samples generated from the MSHN model for each of three sample sizes: n = 30, 50 and 100. To generate T ∼ MSHN(σ, q), the stochastic representation in (3) was used. For each generated sample, the ML estimates were computed using the EM algorithm. Table 2 shows the estimated mean bias for each parameter (bias), its SE and the estimated root mean squared error (RMSE). From Table 2 we conclude that the ML estimates are quite stable. The bias is reasonable and diminishes as the sample size increases. As expected, SE and RMSE become closer as the sample size increases, suggesting that the SE of the estimators is well estimated.
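A compact version of such a parameter-recovery experiment can be sketched as follows. To keep run times small it uses a moment-based estimator of σ derived from the representation in (3) instead of the paper's EM-based ML estimates, and all settings (true parameters, sample size, number of replicates) are illustrative.

```python
import math
import numpy as np
from scipy.optimize import brentq

def moment_ratio(q):
    # mu_1^2 / mu_2 as a function of q alone (sigma cancels)
    return 2 * math.gamma(1 - 1 / q) ** 2 / (math.pi * math.gamma(1 - 2 / q))

def sigma_from(m1, q):
    # invert E(T) = 2^(1/2 + 1/q) sigma Gamma(1 - 1/q) / sqrt(pi)
    return m1 * math.sqrt(math.pi) / (2 ** (0.5 + 1 / q) * math.gamma(1 - 1 / q))

rng = np.random.default_rng(8)
sigma, q, n, reps = 1.0, 6.0, 1000, 200
est = []
for _ in range(reps):
    t = (np.abs(rng.standard_normal(n)) * sigma
         / rng.exponential(scale=0.5, size=n) ** (1.0 / q))
    r = min(t.mean() ** 2 / (t ** 2).mean(), moment_ratio(1e6) - 1e-9)
    q_hat = brentq(lambda z: moment_ratio(z) - r, 2.01, 1e6)
    est.append(sigma_from(t.mean(), q_hat))
est = np.asarray(est)
print("bias:", est.mean() - sigma, "RMSE:", np.sqrt(((est - sigma) ** 2).mean()))
```

As in Table 2, bias and RMSE are the natural summaries; replacing the moment-based estimator by the EM-based ML estimator inside the loop reproduces the study's design at a higher computational cost.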

Applications
In this section we provide three applications to real datasets that illustrate the flexibility of the proposed model.

Application 1
Lyu [13] presents a data set of 104 programming times collected at the Centre for Software Reliability (CSR). Some descriptive statistics are: mean = 147.8, variance = 60,071.7, skewness = 3, and kurtosis = 14.6. The moment estimates for the MSHN model were σ_M = 74.085 and q_M = 2.402; these were used as initial values to compute the ML estimates in Table 3.
For each distribution we report the estimated log-likelihood. To compare the competing models, we consider the Akaike information criterion (AIC) (Akaike [14]) and the Bayesian information criterion (BIC) (Schwarz [15]), defined as AIC = 2k − 2 log lik and BIC = k log(n) − 2 log lik, respectively, where k is the number of parameters in the model, n is the sample size and log lik is the maximum value of the log-likelihood function. Table 4 displays the AIC and BIC for each fitted model. Figure 3 presents the histogram of the data fitted with the HN, SHN and MSHN distributions using the ML estimates. The QQ-plots for the MSHN and SHN distributions are presented in Figure 4.
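The two criteria are direct to compute from a fitted model's maximized log-likelihood; the sketch below simply restates the formulas in code, and the example values are hypothetical, not the paper's fitted results.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log lik."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k log(n) - 2 log lik."""
    return k * math.log(n) - 2 * loglik

# Hypothetical two-parameter fit on a sample of size 104:
print(aic(-100.0, 2), bic(-100.0, 2, 104))
```

Lower values of either criterion indicate a better trade-off between fit and model complexity, which is how the tables in this section rank the HN, SHN and MSHN fits.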

Application 2
The second dataset is taken from Von Alven [16], and represents 46 active repair times (in hours) for an airborne communication transceiver. Some descriptive statistics are: mean = 3.607, variance = 24.445, skewness = 2.888, and kurtosis = 11.802.
Initially we computed the moment estimators for the MSHN distribution, obtaining σ_M = 2.407 and q_M = 2.635. We used these estimates as initial values in computing the ML estimators presented in Table 5. For each distribution we report the estimated log-likelihood. Table 6 displays the AIC and BIC for each fitted model. Figure 5 presents the histogram of the data fitted with the HN, SHN and MSHN distributions using the ML estimates.
Application 3
For the third dataset, we initially computed the moment estimators for the MSHN distribution, obtaining σ_M = 38.125 and q_M = 2.743. We used these estimates as initial values in computing the ML estimators presented in Table 7. For each distribution we report the estimated log-likelihood. Table 8 displays the AIC and BIC for each fitted model. Figure 6 presents the histogram of the data fitted with the HN, SHN and MSHN distributions using the ML estimates.

Conclusions
In this paper, we have introduced a new, more flexible model, which increases kurtosis and contains the HN distribution as a particular case. The EM algorithm is implemented, obtaining acceptable results for the maximum likelihood estimators. In applications using real data it performs very well, better than competing models. Some further characteristics of the MSHN distribution are:
• The MSHN distribution has greater kurtosis than the SHN distribution, as is clearly reflected in Table 1.
• The proposed model has a closed-form expression and presents more flexible asymmetry and kurtosis coefficients than those of the HN model.
• Two stochastic representations of the MSHN model are presented. One is the quotient of two independent random variables: an HN in the numerator and a power of an Exp(2) in the denominator. The other shows that the MSHN distribution is a scale mixture of an HN and a Wei distribution.
• Using the scale mixture representation, the EM algorithm was implemented to calculate the ML estimators.
• Results from a simulation study indicate that an acceptable bias is obtained with a reasonable sample size.
• Three illustrations using real data show that the MSHN model achieves a better fit in terms of the AIC and BIC criteria.