Generalized Modified Slash Birnbaum – Saunders Distribution

In this paper, a generalization of the modified slash Birnbaum–Saunders (BS) distribution is introduced. The model is defined by using the stochastic representation of the BS distribution, where the standard normal distribution is replaced by a symmetric distribution proposed by Reyes et al. It is proved that this new distribution is able to model more kurtosis than other extensions of BS previously proposed in the literature. Closed expressions are given for the pdf (probability density functio), along with their moments, skewness and kurtosis coefficients. Inference carried out is based on modified moments method and maximum likelihood (ML). To obtain ML estimates, two approaches are considered: Newton–Raphson and EM-algorithm. Applications reveal that it has potential for doing well in real problems.


Introduction
The BS distribution was introduced by Birnbaum and Saunders [1,2].The aim of this distribution is to model the fatigue in lifetime of certain materials.Nowadays, its use has spread to other contexts such as economic and environmental data.In these new applications, it is quite common to find real datasets in which a BS model with heavier tails would be suitable.Slash models are a good option to deal with this kind of situations, in which heavy tails are a serious problem for the data analyst.This is the main reason slash distributions have received a great deal of attention during the last decades.In this context, we face the problem of improving BS distribution by introducing a generalization able to model more kurtosis than other slash extensions previously proposed in the literature.In these extensions, the emphasis is on kurtosis because, as Moors [3] pointed out, the presence of heavy tails produces high kurtosis.Next, we briefly describe the BS-model and the most relevant slash precedents of our proposal.
As properties (see, for instance, Leiva [5]), we highlight that the BS distribution is continuous, unimodal and positively skewed (asymmetry to right).β is the median of the distribution.α is a shape parameter that modifies the skewness and kurtosis of the distribution.As α tends to zero, the BS distribution tends to be symmetrical around β and its variability decreases.On the other hand, as α increases, the BS distribution exhibits heavier tails.

Slash Methodology
To use the BS distribution for modeling data with outliers, Gómez et al. [6] and Reyes et al. [7] proposed extensions of BS model based on the slash (S) and modified slash (MS) distribution.In this way, they got extensions of BS distribution with a high kurtosis coefficient.
The canonic slash distribution was introduced by Rogers and Tukey [8].This model is defined as the ratio of a N(0, 1) and an independent uniform U(0, 1) distribution.It is proposed as a model for bell-shaped data with heavier tails than the corresponding normal distribution.Their theoretical properties can be seen, for instance, in Rogers and Tukey [8] or Johnson et al. [4].The slash model, denoted as S, in which a kurtosis parameter q is introduced, is defined as with Z ∼ N(0, 1) independent of U ∼ U(0, 1) and q > 0. Based on the representation given in Equation ( 5), Reyes et al. [7] proposed the modified slash (MS) distribution in which the variable at the denominator of Equation ( 5) is replaced by an exponential distribution of parameter 2, that is, U ∼ Exp(2).The MS model exhibits greater kurtosis than the S model.A new extension, called generalized modified slash (GMS) distribution, was introduced recently by Reyes et al. [9].These authors proposed a new slash model where the denominator in Equation ( 5) is a Gamma distribution of parameters (2q, q) with q > 0. The GMS model generalizes the MS model.As main features of the GMS model, we highlight that is a bell-shaped distribution, symmetrical with respect to zero, and exhibits a greater level of kurtosis than its predecessors, thus it can be of interest to study the distribution of the BS extension obtained when Z ∼ N(0, 1) in Equation ( 1) is replaced by a GMS distribution with kurtosis parameter q > 0. This proposal is a generalization of the papers by Gómez et al. [6] and Reyes et al. [7] where slash versions of the BS distribution were considered based on the slash and modified slash distribution, called slash Birnbaum-Saunders (SBS) and modified slash Birnbaum-Saunders (MSBS), respectively.This paper is outlined as follows.In Section 2, the stochastic representation of the generalized modified slash Birnbaum-Saunders (GMSBS) distribution is introduced, and its probability density function, properties, moments, skewness and kurtosis coefficients are obtained.Section 3 is devoted to estimation methods: modified moment and maximum likelihood estimation (an iterative method and the EM-algorithm are proposed).Section 4 assesses the performance of the MLE using the EM algorithm via a simulation study.Two practical applications are given in Section 5.

GMSBS Distributions
In this section, the stochastic representation of a GMSBS distribution is introduced.A closed expression for its pdf is obtained and its properties are studied in depth.Motivated by Equation (1), the stochastic expression proposed for a GMSBS distribution is where X follows a Generalized Modified Slash distribution, X ∼ GMS(0, 1, q), q > 0. It is then said that T follows a GMSBS distribution with parameters α, β, and q, T ∼ GMSBS(α, β, q).Similar to the BS distribution, α is a shape parameter and β is a scale parameter.It is shown below that the new parameter q allows us to control the kurtosis and skewness of this new model and to obtain distributions with greater level of kurtosis than other slash Birnbaum-Saunders models.This fact allows us to model real datasets in which a BS-model can be appropriate but we have heavy tails, especially on the right.

Probability Density Function
Since T, introduced in Equation ( 6), is given as a function X with X ∼ GMS(0, 1, q), to obtain the distribution of T, we need the pdf of X, which is given in next lemma.Lemma 1.Let X ∼ GMS(0, 1, q) be defined as X = Z/V with Z ∼ N(0, 1) independent of V ∼ Ga(2q, q), q > 0.Then, the pdf of X is where φ() denotes the pdf of a N(0, 1) distribution.
Proof.It can be seen in Reyes et al. [9].
Lemma 2. The following closed expression for Equation (7) can be given where U(•) denotes the confluent hypergeometric function of the second kind (Abramowitz and Stegun [10], p. 505).
Proof.It can be seen in Reyes et al. [9].
The next corollary relates the new model, proposed in Equation ( 6), to other slash models previously introduced in the literature.
Proof.This corollary follows from the fact that, for q = 1, a Ga(2, 1) distribution reduces to an exponential, Exp(2), and the stochastic representation proposed in Equation ( 6).
Figure 1 illustrates the effect of the parameter q on the tails of our proposal.Plots given in this figure compare the pdfs of several GMSBS models for different values of q.Specifically, the pdfs of a GMSBS(0.3, 2, q) distribution for q = 8, 3, 1 are given.Note that a greater level of kurtosis is observed for small values of q.These appreciations are formalized in Section 2.3 where moments are obtained.

Properties
In this subsection, some properties of GMSBS distributions are deduced.
Let t p be the pth quantile of T, 0 < p < 1.
where x p denotes the pth quantile of X ∼ GMS(0, 1, q).In particular, the median of T is β, Proof.(1).Equation ( 13) follows from the fact that Equation ( 6) is a one-to-one transformation from R to R + .
On the other hand, t 0.5 = β since X ∼ GMS(0, 1, q) is a symmetric distribution around zero, and therefore x 0.5 = 0.
(2) and (3) are immediate from Proposition 1 by properly using the change-of-variable technique.Next, we show that if q → ∞ then GMSBS(α, β, q) model approaches to a Birnbaum-Saunders distribution.The subscript q is included in the notation to highlight this fact.
Proof.See Appendix A, Proof of Proposition 3.
Proposition 3 means that, for large q, GMSBS(α, β, q) model can be approached by a Birnbaum-Saunders distribution.

Moments
Formulae for the moments of order r, r ∈ Z + , in a GMSBS distribution are given next.Proposition 4. Let T ∼ GMSBS(α, β, q).For r ∈ Z + , E[T r ] exists if and only if q > 2r and Proof.See Appendix A, Proof of Proposition 4.

Remark 1.
From Equation (15), note that E [T r ] is a polynomial in β of degree r, in α of degree 2r (only even powers are obtained), and coefficients that involve rational functions of q (with numerator and denominator of the same degree).
Next, some non-central moments for the GMSBS distribution are given.These expressions involve the Pochhammer symbol or rising factorial, defined for a > 0 and k ∈ Z + as + 1 , q > 2 (mean or expected value of T), (q−8) 8 Proof.The proposed results follow from Proposition 4 and Equation (16).Aditional details can be seen in Appendix A, Proof of Corollary 3.
From Corollary 3, it follows that the variance of T is where The skewness coefficient, β 1 , and the kurtosis coefficient, β 2 , can be computed by using the previous expressions and the relationships Next, the behavior of β 1 and β 2 as functions of the kurtosis parameter q is studied.Although the convergence in law, in general, does not imply the convergence in moments, in this case, we have such convergence as q → ∞.The notation β 1 (q) and β 2 (q) is used.The next corollary states explicit results for β 1 (q) and β 2 (q), if q → ∞, along with others that help us to understand the behavior of these features.The explicit expressions of β 1 (q) and β 2 (q), given in Appendix A, Equation (A12), are used.

Corollary 4. (1) Limit behavior of skewness coefficient
that is, if q → ∞ then the skewness coefficient of a GMSBS(α, β, q) tends to the skewness coefficient of a BS(2α, β) distribution.
(2) Limit behavior of kurtosis coefficient that is, if q → ∞ then the kurtosis coefficient of a GMSBS(α, β, q) tends to the kurtosis coefficient of a BS(2α, β) distribution.
Proof.The proposed results follow from expressions for β 1 (q) and β 2 (q) given in Appendix A, Equation (A12), and the moments µ r , given in Equation ( 15).
(i) In the GMSBS model, as in the Birnbaum-Saunders distribution, β > 0 is a scale parameter, which is also the median of the distribution (see Equation ( 6) and Proposition 2).
(ii) It can be seen in Leiva [5] that in the Birnbaum-Saunders distribution α > 0 is a shape parameter that modifies the skewness and kurtosis of the distribution.As α tends to zero, the BS distribution tends to be more symmetrical around its median β and its variability decreases.The expressions of the skewness coefficient β 1 , given in Equations (A12) and ( 17), suggest that α has a similar interpretation in the GMSBS model.(iii) As for the parameter q > 0, it is proven through this paper that controls the kurtosis and skewness coefficient in the GMSBS model, in such a way that allows us to obtain models with greater level of kurtosis than other slash BS distributions, previously introduced in the literature.
As graphical aid, to show the way in which α and q determine the asymmetry and kurtosis of a GMSBS(α, β, q) model, see plots in Figure 2. Without loss of generality, the scale parameter is taken equal to one, β = 1.They illustrate the way in which the asymmetry and kurtosis coefficients depend on both parameters.Plots in Figure 2 suggest that, on the one hand, for increasing values of α, the asymmetry and kurtosis increase.On the other hand, if α is fixed, asymmetry and kurtosis coefficients are decreasing functions of q.These considerations motivate that GMSBS distribution can be used for modeling more kurtosis than other slash Birnbaum-Saunders distributions previously introduced in the literature such as SBS and MSBS densities.Figure 3 displays the GMSBS pdf plot along with MSBS and SBS densities.Note that the right tail of the GMSBS distribution is heavier than the tails of the other ones.

Estimation
Let T 1 , . . ., T n be a simple random sample (srs) from T ∼ GMSBS(α, β, q), n > 3.In this section, we face the problem of estimating (α, β, q).Next, we propose a couple of techniques to tackle this problem.

Modified Moment Estimation
Following Ng et al. [12], a modified method moment based on Property (3) given in Proposition 2 is next introduced.Thus, we propose to equal E[T], E[T 2 ], and E[1/T] to their corresponding sample moments, that is where the sample harmonic mean.
The solutions of previous equations for q > 4 and α > 0 are called the modified moment (MM) estimators, denoted as α MM , β MM , and q MM .

Let us denote by d(
. Then, the following iterative process can be proposed for k ≥ 0 (21) which needs starting values α (0) , β (0) and q (0) to start the recursion.As initial values, the modified moment estimators, previously proposed, can be considered.
(2) It can be seen in Leiva [5] p. 41 that in the Birnbaum-Saunders model, BS(α, β), the iterative equations for the MLEs of α and β are (24) The effect of introducing the generalized modified slash variable on the BS(α, β) model can be appreciated by comparing Equations ( 21) and ( 22) to Equations ( 24) and (25).

ML Estimation Using EM-Algorithm
Taking advantage of the stochastic representation of the GMSBS model, we can develop a more attractive iterative method to find the MLEs based on the EM algorithm (Dempster et al. [13]).This is a well-known tool when unobserved (missing) data or latent variables are present while modeling.This algorithm enables the computationally efficient determination of the ML estimates when iterative procedures are required.Looking at the stochastic representation of a generalized modified slash distribution given in Equation ( 6), we note that the scale factor V depends on the parameter q, thus we consider a re-parameterization to get the EM-algorithm in the GMSBS model.Then, the resulting stochastic representation for T can be expressed as where X = U −1/2 Z, with Z ∼ N(0, 1) independent of U ∼ GG q, 2q, 2 , i.e., the generalized gamma distribution whose pdf can be expressed as h(u) = 2 q−1 q q u q/2−1 exp{−2qu −1/2 }/Γ(q), u > 0.
Under the new parameterization, we have the conditional distribution of T, given U = u, follows the BS(α/ √ u, β) distribution.Consequently, the pdf of the T reduces to where φ(•) is the pdf of N(0, 1) distribution.
Let T 1 , ..., T n be a simple random sample of size n of T ∼ GMSBS(α, β, q).Here, the parameter denote the complete-data log-likelihood function and its expected value, respectively.Each iteration of the EM algorithm involves two steps.Note that the above setup can be represented through a hierarchical representation given by Let t = [t 1 , . . ., t n ] and u = [u 1 , . . ., u n ] be observed and unobserved data, respectively.The complete data t c = [t , u ] corresponds to the original data t augmented with u.We now detail the implementation of the ML estimation of parameters of GMSBS distributions by using the EM-algorithm.In this section, the hierarchical representation given in Equations ( 28) and ( 29) is useful to obtain the complete log-likelihood function associated with t c , which can be expressed as where c (q|t c ) = n [(q − 1) log(2) + q log q − log Γ(q)] + (q/2 − 1) it follows that the conditional expectation of the complete log-likelihood function has the form where As both quantities S 1n and S 2n have no explicit forms in the context of our model, they have to be computed numerically.Thus, to compute Q(q| θ), we use a similar approach to that by Lee and Xu (2004, Section 3.1) [14].Specifically, let {u r ; r = 1, ..., R} be a sample randomly drawn from the conditional distribution U|(T = t, θ = θ), so the quantity Q(q| θ) can be approximated as follows: We then have the EM-algorithm for the ML estimation of the parameters of the GMSBS distributions as follows: Results in Table 1 show that, when the sample size increases, MLEs' bias tends to zero, and their standard errors and √ MSE s decrease.Therefore, they are consistent.

Applications
Next, the model is illustrated with two datasets collected by the Department of Mines of the University of Atacama, Chile, representing Neodymium and Nickel levels in samples of minerals.

Neodymium Dataset
The descriptive summaries are given in Table 2 where t denotes the sample mean, S t the sample standard deviation, g 1 the sample skewness coefficient, and g 2 the sample kurtosis coefficient.GMSBS, MSBS and SBS distributions are fitted to this dataset, the parameters are estimated via maximum likelihood (EM-algorithm), abd their corresponding standard errors are given in parentheses in Table 3.As goodness of fit criteria, the Akaike Information Criterion (AIC) and QQ-plots are considered.Recall that AIC = −2 ln(likelihood) + 2p where p is the number of parameters to be estimated [16].The AIC values we obtained are given in Table 3.They suggest that GMSBS model provides the best fit to these data since this model exhibits less AIC.
Figure 4 depicts the histogram for the data with the fitted density and the empirical cdf along with the cdf estimated by GMSBS model, as well QQ-plots given in Figure 5; these also show the good agreement of the GMSBS model for the Neodymium data.

Nickel Dataset
The descriptive summaries are given in Table 4. GMSBS, MSBS and SBS distributions are fitted to this dataset, the parameters are estimated via maximum likelihood (EM-algorithm), and their corresponding standard errors are given in parentheses in Table 5.The AIC values we obtained are given in Table 5.They suggest that GMSBS model provides the best fit to these data since this model exhibits less AIC.
Figure 6 depicts the histogram for the data with the fitted density and the empirical cdf along with the cdf estimated by GMSBS model.QQ-plots are given in Figure 7.All of them show the good agreement of the GMSBS model for the Nickel data.Funding: This research received no external funding.

Appendix A. Some Proofs of Results Given in Section 2
In this appendix, details about results dealing with the convergence in law of a GMSBS(α, β, q) model to a BS distribution (q → ∞), moments, skewness and kurtosis coefficients for the GMSBS(α, β, q) are given.
Proof of Proposition 3. To obtain the result proposed in Proposition 3, we must prove that lim q→∞ F T q (t) = F T (t), with F T () the cdf of a BS(2α, β) model (see, for instance, Rohatgi and Ehsanes Saleh, [17]).
Since T q ∼ GMSBS(α, β, q), then we can write T q = h(X q ) where X q ∼ GMS(0, 1, q) and h(•) was given in Equation (6).Recall that we have the following relationship for the cdf of T q F T q (t) = F X q (w(t)) with w(t) = 1 It can be seen in Reyes et al. [9] Proposition 3 that, given X q ∼ GMS(0, 1, q), then X q L −→ X as q → ∞ where X ∼ N(0, 2), that is, lim q→∞ F X q (w) = Φ w 2 , with Φ() : cdf of a N(0, 1).
So, taking the limit in Equation (A1), we have that corresponds to the cdf of a BS(2α, β) distribution.Thus, we obtain the proposed result.
Proof of Proposition 4. By using the stochastic representation given in Equation ( 6), we have with X ∼ GMS(0, 1, q).
By using the binomial formula and therefore E [T r ] exists iff E X 2r exists, that is, iff 2r < q (Reyes et al. [9], Proposition 4).Next we also show that Equation (A3) allows us to obtain the explicit expression of E [T r ] given in Equation (15).Note that for odd s

3 ,Figure 3 .
Figure 3.Comparison of right tails of densities for GMSBS, MSBS and SBS models for the same value for parameters α, β and q.

Figure 4 .Figure 5 .
Figure 4. (left) Histogram of the Neodymium data with estimated pdf of GMSBS distribution; and (right) empirical cdf (dotted lines) with estimated cdf of GMSBS model.

Figure 6 .
Figure 6.(left) Histogram of the Nickel data with estimated pdf of GMSBS distribution; and (right) empirical cdf (dotted lines) with estimated cdf of GMSBS model.

Table 1 .
Empirical bias, standard error and √ MSE for the MLEs of α, β, and q using the EM-algorithm.

Table 2 .
Summary of Neodymium dataset.

Table 3 .
MLEs for Neodymium dataset, their standard errors (in parenthesis) and AIC values.

Table 4 .
Summary of Nickel dataset.

Table 5 .
MLEs for Nickel dataset, their standard errors (in parenthesis) and AIC values.