EM Algorithm in the Slash 2S-Lindley Distribution with Applications

Muñoz, Héctor A.; Castillo, Jaime S.; Gallardo, Diego I.; Venegas, Osvaldo; Gómez, Héctor W.

doi:10.3390/axioms14020101


by Héctor A. Muñoz ¹, Jaime S. Castillo ¹, Diego I. Gallardo ²,*, Osvaldo Venegas ³ and Héctor W. Gómez ¹

¹ Departamento de Estadística y Ciencias de Datos, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile

² Departamento de Estadística, Facultad de Ciencias, Universidad del Bío-Bío, Concepción 4081112, Chile

³ Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile

* Author to whom correspondence should be addressed.

Axioms 2025, 14(2), 101; https://doi.org/10.3390/axioms14020101

Submission received: 26 December 2024 / Revised: 21 January 2025 / Accepted: 24 January 2025 / Published: 29 January 2025

(This article belongs to the Special Issue Probability, Statistics and Estimations, 2nd Edition)


Abstract

In this work, we present a new distribution, which is a slash extension of the distribution of the sum of two independent Lindley random variables. This new distribution is developed using the slash methodology, resulting in a distribution with more flexible kurtosis, i.e., the ability to model atypical data. We study the density function of the new model and some of its properties, such as the cumulative distribution function, moments, and its asymmetry and kurtosis coefficients. The parameters are estimated by the maximum likelihood method with the EM algorithm. Finally, we apply the proposed model to two real datasets with high kurtosis, showing that it provides a better fit than two distributions known in the literature.

Keywords:

maximum likelihood; 2S-Lindley distribution; slash distribution

MSC:

62E15; 62E20; 62F10; 62P99

1. Introduction

The slash distribution is a distribution with heavier tails than the normal distribution, and its representation is the quotient between two independent random variables, one normal and the other a power of the uniform distribution. We say that X has a slash distribution if its representation is given by

X = Y / U^{1 / q},

(1)

where $Y \sim N (0, 1)$, $U \sim \mathrm{Uniform} (0, 1)$, Y is independent of U and $q > 0$; its representation can be seen in Johnson et al. [1]. Properties of this family are discussed by Rogers and Tukey [2] and Mosteller and Tukey [3]. The maximum likelihood estimators for location and scale are discussed in Kafadar [4]. Wang and Genton [5] offer a multivariate version and a multivariate skew version of the slash distribution. Gómez et al. [6] extend the slash distribution using the family of univariate and multivariate elliptical distributions. This methodology for increasing the weight of tails has also been used in distributions with positive support; for example, by Olmos et al. ([7,8]) in the half-normal and generalized half-normal distributions, Astorga et al. [9] in the power Muth distribution, and Rivera et al. [10] in the Rayleigh distribution.

A distribution with positive support is the Lindley model (see Lindley [11]); we say that a random variable X has a Lindley (L) distribution if its probability density function (pdf) is given by

f_{X} (x; θ) = \frac{θ^{2}}{θ + 1} (1 + x) exp (- θ x), x > 0,

where

θ > 0

is the shape parameter. We denote this

X \sim L (θ)

. The L distribution has been used in various areas; studies include those of Ghitany ([12,13]), Gómez-Déniz [14], Krishna and Kumar [15], Bakouch et al. [16], Gui [17], Oluyede and Yang [18], Shanker et al. [19], and Abouammoh et al. [20]. Gui [17] extended the L distribution using the slash methodology described in (1): taking $Y \sim L (θ)$ yields the L slash distribution (LSD). The L distribution and its generalizations have also been studied by Tomy [21].

Chesneau et al. [22] introduced a distribution constructed as the sum of two independent

L (β)

random variables, i.e., if

X_{1}

and

X_{2}

are independent and identically distributed as

L (β)

, then a new random variable is defined as

Y = X_{1} + X_{2}

. We say Y has a 2SL distribution with shape parameter

β

and its pdf is given by:

f_{Y} (y; β) = \frac{β^{4}}{{(1 + β)}^{2}} y (\frac{y^{2}}{6} + y + 1) exp (- β y), y > 0,

where

β > 0

is the shape parameter. We denote this by

Y \sim 2 S L (β)

. The derivation of the pdf of Y is based on the convolution product, and is detailed in Section 2.1 of Chesneau et al. [22].

The principal object of this paper is to increase the weight of the tail of the 2SL distribution, using the slash methodology given in (1) and considering

Y \sim 2 S L (β)

. In this way we obtain a distribution with a heavier right tail than the 2SL distribution, for modelling atypical data.

The article is organized as follows: in Section 2 we describe the new distribution and its properties. In Section 3, we carry out inferences by the moments and maximum likelihood (ML) methods using the EM algorithm, and perform a simulation study. In Section 4, we present two applications to real datasets, comparing them with the 2SL and LSD distributions. In Section 5, we provide some conclusions.

2. Density and Properties

In this section, we provide the representation, pdf and basic properties of the new distribution.

2.1. Stochastic Representation

A random variable X follows a slash 2SL (S2SL) distribution with parameters $β > 0$ and $q > 0$ if X is obtained as

X = Y / U^{1 / q},

(2)

where Y and U are independent, $Y \sim 2SL (β)$ and $U \sim U (0, 1)$. We denote this as $X \sim S2SL (β, q)$.

Proposition 1.

Let $X \sim S2SL (β, q)$. Then, the density function of X is given by

f_{X} (x; β, q) = \frac{q x^{- (q + 1)}}{6 β^{q} (1 + β)^{2}} \left\{ γ_{4} + 6 β [γ_{3} + β γ_{2}] \right\}, x > 0,

where $γ_{i} = γ (q + i, β x)$ and $γ (a, v) = \int_{0}^{v} w^{a - 1} e^{- w} d w$ is the lower incomplete gamma function.

Proof.

Using the stochastic representation given in (2) and the random vectors transformation method, we obtain

\left. \begin{matrix} X = Y / U^{1 / q} \\ W = U^{1 / q} \end{matrix} \right\} \Rightarrow \left. \begin{matrix} Y = X W \\ U = W^{q} \end{matrix} \right\} \Rightarrow J = \begin{vmatrix} \frac{\partial y}{\partial x} & \frac{\partial y}{\partial w} \\ \frac{\partial u}{\partial x} & \frac{\partial u}{\partial w} \end{vmatrix} = \begin{vmatrix} w & x \\ 0 & q w^{q - 1} \end{vmatrix} = q w^{q} .

Then,

f_{X, W} (x, w) = | J | f_{Y, U} (x w, w^{q}) = q w^{q} \frac{β^{4}}{(1 + β)^{2}} (x w) \left( \frac{x^{2} w^{2}}{6} + x w + 1 \right) e^{- β x w}, 0 < w < 1, x > 0 .

Marginalizing with respect to the random variable W, we have that

f_{X} (x) = \frac{x β^{4} q}{(1 + β)^{2}} \left[ \frac{x^{2}}{6} \int_{0}^{1} w^{q + 3} e^{- β x w} d w + x \int_{0}^{1} w^{q + 2} e^{- β x w} d w + \int_{0}^{1} w^{q + 1} e^{- β x w} d w \right] .

Substituting $t = β x w$, each integral takes the form $\int_{0}^{1} w^{q + i - 1} e^{- β x w} d w = (β x)^{- (q + i)} γ (q + i, β x)$ for $i = 2, 3, 4$, and collecting terms gives the result. □
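As a numerical sanity check of Proposition 1, the closed-form density can be compared with the integral over w that appears in the proof. The following R sketch is illustrative only: it uses base R's `pgamma` to evaluate the lower incomplete gamma function, and the helper names (`linc`, `d2sl`, `ds2sl`, `ds2sl_num`) are assumptions introduced here, not the authors' code.

```r
# Lower incomplete gamma function: gamma(a, v) = Gamma(a) * P(a, v)
linc <- function(a, v) gamma(a) * pgamma(v, shape = a)

# pdf of the 2SL(beta) distribution
d2sl <- function(y, beta) {
  beta^4 / (1 + beta)^2 * y * (y^2 / 6 + y + 1) * exp(-beta * y)
}

# Closed-form pdf of S2SL(beta, q) from Proposition 1
ds2sl <- function(x, beta, q) {
  v <- beta * x
  q * x^(-(q + 1)) / (6 * beta^q * (1 + beta)^2) *
    (linc(q + 4, v) + 6 * beta * (linc(q + 3, v) + beta * linc(q + 2, v)))
}

# Same density by integrating the joint density q * w^q * f_Y(x * w) over w
ds2sl_num <- function(x, beta, q) {
  integrate(function(w) q * w^q * d2sl(x * w, beta), 0, 1)$value
}

closed_form   <- ds2sl(2, beta = 1, q = 3)
by_quadrature <- ds2sl_num(2, beta = 1, q = 3)
print(c(closed_form = closed_form, by_quadrature = by_quadrature))
```

The two values should agree up to quadrature error, which gives a quick way to catch transcription mistakes in the formula.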

The 2SL distribution is an alternative to the L distribution, and the construction of the S2SL distribution aims to increase the right tail of the 2SL distribution. On the other hand, one of the representations of the S2SL distribution facilitates parameter estimation using the EM algorithm, thereby transforming it into an alternative distribution to other heavy-tailed distributions.

Figure 1 shows the density of the S2SL and 2SL distributions for

β = 1

and different values of parameter q. It can be seen that as parameter q diminishes, the density function of the S2SL distribution presents greater kurtosis.

2.2. Properties

Proposition 2.

Let $X \sim S2SL (β, q)$. Then the cumulative distribution function (cdf) of X is given by

F_{X} (x; β, q) = \frac{(β x)^{- q}}{6 (1 + β)^{2}} \left\{ (β x)^{q} [γ_{4 - q} + 6 β (β γ_{2 - q} + γ_{3 - q})] - 6 β (β γ_{2} + γ_{3}) - γ_{4} \right\} .

Proof.

Using the definition of cdf and integrating by parts, the result is obtained. □
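The cdf of Proposition 2 can be checked against a numerical integral of the density. The sketch below is a hedged illustration in base R; the function names `linc`, `ds2sl` and `ps2sl` are hypothetical, and note that $γ_{i - q} = γ (i, β x)$ in the notation of Proposition 1.

```r
# Lower incomplete gamma function via pgamma
linc <- function(a, v) gamma(a) * pgamma(v, shape = a)

# pdf of S2SL(beta, q), Proposition 1
ds2sl <- function(x, beta, q) {
  v <- beta * x
  q * x^(-(q + 1)) / (6 * beta^q * (1 + beta)^2) *
    (linc(q + 4, v) + 6 * beta * (linc(q + 3, v) + beta * linc(q + 2, v)))
}

# cdf of S2SL(beta, q), Proposition 2, written as (A - (beta x)^(-q) B) / (6 (1 + beta)^2)
ps2sl <- function(x, beta, q) {
  v <- beta * x
  A <- linc(4, v) + 6 * beta * (beta * linc(2, v) + linc(3, v))
  B <- linc(q + 4, v) + 6 * beta * (beta * linc(q + 2, v) + linc(q + 3, v))
  (A - v^(-q) * B) / (6 * (1 + beta)^2)
}

x0 <- 3
cdf_closed <- ps2sl(x0, beta = 1, q = 2)
cdf_quad   <- integrate(function(x) ds2sl(x, beta = 1, q = 2), 0, x0)$value
print(c(cdf_closed = cdf_closed, cdf_quad = cdf_quad))
```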

Figure 2 presents a graphical comparison of the cdf of the S2SL model (with

β = 1

) for different values of q, compared to the 2SL distribution.

The survival and hazard functions are defined as

R_{X} (x; β, q) = 1 - F_{X} (x; β, q)

and

h_{X} (x; β, q) = f_{X} (x; β, q) / [1 - F_{X} (x; β, q)]

, respectively. These are two important functions in survival analysis because they represent the probability that an observation does not present the event of interest as a function of time, and the approximate probability of presenting the event of interest at the immediately following instant. For the S2SL distribution, these functions are presented in the following proposition.

Proposition 3.

Let

X \sim S 2 S L (β, q)

; then, the survival and hazard functions are given by

R_{X} (x; β, q) = \frac{6 {(1 + β)}^{2} - {(β x)}^{- q} {{(β x)}^{q} [γ_{4 - q} + 6 β (β γ_{2 - q} + γ_{3 - q})] - 6 β (β γ_{2} + γ_{3}) - γ_{4}}}{6 {(1 + β)}^{2}},

h_{X} (x; β, q) = \frac{q x^{- (q + 1)} {γ_{4} + 6 β [γ_{3} + β γ_{2}]}}{6 β^{q} {(1 + β)}^{2} - x^{- q} {{(β x)}^{q} [γ_{4 - q} + 6 β (β γ_{2 - q} + γ_{3 - q})] - 6 β (β γ_{2} + γ_{3}) - γ_{4}}} .

Proof.

Using the definitions of the survival and hazard functions,

R_{X} (x; β, q) = 1 - F_{X} (x; β, q); h_{X} (x; β, q) = \frac{f_{X} (x; β, q)}{1 - F_{X} (x; β, q)},

and replacing

f_{X} (x; β, q)

and

F_{X} (x; β, q)

, the result is obtained. □

Figure 3 shows the survival function (left) and hazard function (right) of the S2SL distribution for $β = 1$ and different values of q, compared with the 2SL distribution.

Table 1 shows

P (X > x)

for different values of x in the mentioned distribution.

The size of the right tail of a distribution is crucial when the chosen model aims to capture values far from the beginning of the distribution’s support, such as outliers. The concept of heavy tails is fundamental in actuarial statistical applications. In this context, distributions such as Pareto, Lognormal, and Weibull, among others, have been widely used to model losses in automobile insurance and catastrophic insurance. It is well-established that any probability distribution defined by its cdf

F_{X} (x)

on the real line is classified as heavy right-tailed (see Rolski et al. [23]) if

\limsup_{x \to \infty} \left( - \log R_{X} (x) / x \right) = 0

. An important topic in extreme value theory is regular variation (see Bingham [24]), a concept formalized in the following definition.

Definition 1.

A distribution function is called regularly varying at infinity with index $- α$ if

\lim_{x \to \infty} \frac{R_{X} (t x)}{R_{X} (x)} = t^{- α}, t > 0,

(3)

where the parameter $α \geq 0$ is called the tail index.

The following proposition states that the survival function of the S2SL distribution exhibits regular variation.

Proposition 4.

The survival function of the random variable

X \sim S 2 S L (β, q)

is a survival function with regularly varying tails.

Proof.

Applying the above definition and using L’Hospital’s rule we have that

lim_{x \to \infty} \frac{R_{X} (t x)}{R_{X} (x)} = t lim_{x \to \infty} \frac{f_{X} (t x; β, q)}{f_{X} (x; β, q)} = t^{- q} lim_{x \to \infty} \frac{γ (q + 4, β t x) + 6 β [γ (q + 3, β t x) + β γ (q + 2, β t x)]}{γ (q + 4, β x) + 6 β [γ (q + 3, β x) + β γ (q + 2, β x)]} .

Since

{lim}_{x \to \infty} γ (q + i, β t x) = {lim}_{x \to \infty} γ (q + i, β x) = Γ (q + i)

, the result is obtained when calculating the limit. □
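The limit in Proposition 4 can be illustrated numerically: for large x the ratio $R_{X} (t x) / R_{X} (x)$ should approach $t^{- q}$. The R sketch below is an illustration under assumed parameter values ($β = 1$, $q = 2$, $t = 2$); the helper names are hypothetical and the survival function is computed as one minus the cdf of Proposition 2.

```r
# Lower incomplete gamma function via pgamma
linc <- function(a, v) gamma(a) * pgamma(v, shape = a)

# cdf of S2SL(beta, q), Proposition 2
ps2sl <- function(x, beta, q) {
  v <- beta * x
  A <- linc(4, v) + 6 * beta * (beta * linc(2, v) + linc(3, v))
  B <- linc(q + 4, v) + 6 * beta * (beta * linc(q + 2, v) + linc(q + 3, v))
  (A - v^(-q) * B) / (6 * (1 + beta)^2)
}

# Survival function R_X(x) = 1 - F_X(x)
surv <- function(x, beta, q) 1 - ps2sl(x, beta, q)

b <- 1; q <- 2; t_mult <- 2          # illustrative values only
x <- c(10, 50, 200)
ratio <- surv(t_mult * x, b, q) / surv(x, b, q)
print(rbind(x = x, ratio = ratio, limit = t_mult^(-q)))
```

The ratio moves towards $t^{- q}$ as x grows, consistent with a tail index equal to q.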

A direct consequence of the above proposition is that the S2SL distribution is heavy right-tailed (see Rolski et al. [23]).

Proposition 5 shows that the S2SL distribution can be obtained as a scale mixture of the 2SL distribution, with a Beta mixing distribution.

Proposition 5.

Let

X | W = w \sim 2 S L (w^{- 1}, β)

and

W \sim B e t a (q, 1)

. Then,

X \sim S 2 S L (β, q)

.

Proof.

The marginal density function of X is given by

\begin{aligned} f_{X} (x; β, q) & = \int_{0}^{1} f_{X | W} (x | w) \cdot f_{W} (w) d w \\ & = \int_{0}^{1} \frac{w^{2} β^{4}}{(1 + β)^{2}} x \left( \frac{w^{2} x^{2}}{6} + w x + 1 \right) e^{- β x w} \cdot q w^{q - 1} d w \\ & = \frac{x β^{4} q}{(1 + β)^{2}} \left[ \frac{x^{2}}{6} \int_{0}^{1} w^{q + 3} e^{- β x w} d w + x \int_{0}^{1} w^{q + 2} e^{- β x w} d w + \int_{0}^{1} w^{q + 1} e^{- β x w} d w \right] . \end{aligned}

Finally, substituting the variable

u = β x w

, the result is obtained. □

Proposition 6.

Let

X \sim S 2 S L (β, q)

. If

q \to \infty

, then

X \overset{D}{\to} 2 S L (β)

, where

\overset{D}{\to}

denotes convergence in distribution.

Proof.

Let $X \sim S2SL (β, q)$ with $X = Y / U^{1 / q}$ as given in (2). First, we study the convergence in probability of $U^{1 / q}$. Since $U \sim U (0, 1)$, we have $W = U^{1 / q} \sim Beta (q, 1)$, with $E [W] = q / (q + 1)$ and $E [W^{2}] = q / (q + 2)$. Thus, we obtain

E [(W - 1)^{2}] = \frac{q}{q + 2} - \frac{2 q}{q + 1} + 1 = \frac{2}{(q + 1) (q + 2)},

so that $E [(W - 1)^{2}] \to 0$ as $q \to \infty$. Therefore, $W \overset{P}{\to} 1$, where $\overset{P}{\to}$ denotes convergence in probability. Finally, applying Slutsky’s theorem to $X = Y / W$, we have that $X \overset{D}{\to} Y \sim 2SL (β)$. □

2.3. Moments

Proposition 7.

Let

X \sim S 2 S L (β, q)

, then the r-th moment of X is given by

μ_{r} = E [X^{r}] = \frac{q (r + 1)!}{6 β^{r} (q - r) {(1 + β)}^{2}} [6 β^{2} + 6 β (r + 2) + r^{2} + 5 r + 6] .

Proof.

Using the stochastic representation given in (2), we have that

\begin{matrix} μ_{r} & = E [X^{r}] = E [{(\frac{Y}{U^{\frac{1}{q}}})}^{r}] = E [Y^{r}] \cdot E [U^{- \frac{r}{q}}], \end{matrix}

where

E [U^{- \frac{r}{q}}] = \frac{q}{q - r}

,

q > r

and

E [Y^{r}] = \frac{(r + 1)!}{6 β^{r} {(1 + β)}^{2}} [6 β^{2} + 6 β (r + 2) + r^{2} + 5 r + 6]

, are the r-th moments of

U^{- \frac{1}{q}}

and Y, respectively, where

U \sim U (0, 1)

and

Y \sim 2 S L (β)

. □
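The moment formula of Proposition 7 factors into a 2SL part and a uniform part, which makes it easy to check numerically. The following base R sketch is illustrative; the function names `m2sl`, `ms2sl` and `d2sl` are assumptions, and the parameter values are arbitrary.

```r
# r-th raw moment of 2SL(beta), as used in the proof of Proposition 7
m2sl <- function(r, beta) {
  factorial(r + 1) / (6 * beta^r * (1 + beta)^2) *
    (6 * beta^2 + 6 * beta * (r + 2) + r^2 + 5 * r + 6)
}

# r-th raw moment of S2SL(beta, q); it is finite only when q > r
ms2sl <- function(r, beta, q) {
  if (q <= r) return(Inf)
  q / (q - r) * m2sl(r, beta)
}

# pdf of 2SL(beta), used for a quadrature check of m2sl
d2sl <- function(y, beta) {
  beta^4 / (1 + beta)^2 * y * (y^2 / 6 + y + 1) * exp(-beta * y)
}

b <- 1.5; q <- 5                      # illustrative values only
ey2_quad <- integrate(function(y) y^2 * d2sl(y, b), 0, Inf,
                      rel.tol = 1e-10)$value
mu1_corollary <- 2 * q * (b + 2) / (b * (q - 1) * (1 + b))  # mu_1, Corollary 1
print(c(ey2_formula = m2sl(2, b), ey2_quad = ey2_quad,
        mu1_formula = ms2sl(1, b, q), mu1_corollary = mu1_corollary))
```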

Corollary 1.

If

X \sim S 2 S L (β, q)

with β and

q > 0

, the first four moments and variance of X are

μ_{1} = E [X] = \frac{2 q (β + 2)}{β (q - 1) (1 + β)}, q > 1 .

μ_{2} = E [X^{2}] = \frac{2 q (3 β^{2} + 12 β + 10)}{β^{2} (q - 2) {(1 + β)}^{2}}, q > 2 .

μ_{3} = E [X^{3}] = \frac{24 q (β^{2} + 5 β + 5)}{β^{3} (q - 3) {(1 + β)}^{2}}, q > 3 .

μ_{4} = E [X^{4}] = \frac{120 q (β^{2} + 6 β + 7)}{β^{4} (q - 4) {(1 + β)}^{2}}, q > 4 .

V (X) = \frac{2 q (β^{2} q^{2} - 2 β^{2} q + 3 β^{2} + 4 β q^{2} - 8 β q + 12 β + 2 q^{2} - 4 q + 10)}{β^{2} {(1 + β)}^{2} (q - 2) {(q - 1)}^{2}}, q > 2 .

The asymmetry and kurtosis coefficients are defined as

E (Z^{3})

and

E (Z^{4})

, respectively, where

Z = (X - E (X)) / \sqrt{Var (X)}

represents the standardized variable. These coefficients are of great importance, since the first allows us to quantify the degree of asymmetry of a variable, while the kurtosis coefficient can be used to detect the presence of heavy tails in the underlying distribution. The following proposition presents these coefficients for the S2SL distribution.

Proposition 8.

Let

X \sim S 2 S L (β, q)

; then, the asymmetry and kurtosis coefficients of the random variable X are given by

\sqrt{β_{1}} = \frac{24 κ_{3} (β^{2} + 5 β + 5) - 12 κ_{1} κ_{2} (β^{2} + 3 β + 2) (3 β^{2} + 12 β + 10) + 16 κ_{1}^{3} {(β^{2} + 3 β + 2)}^{3}}{{[2 κ_{2} (3 β^{2} + 12 β + 10) - 4 κ_{1}^{2} {(β^{2} + 3 β + 2)}^{2}]}^{3 / 2}},

β_{2} = \frac{120 κ_{4} (β^{2} + 6 β + 7) - 192 κ_{1} κ_{3} (β^{2} + 3 β + 2) (β^{2} + 5 β + 5) + 48 κ_{1}^{2} κ_{2} {(β^{2} + 3 β + 2)}^{2} (3 β^{2} + 12 β + 10) - 48 κ_{1}^{4} {(β^{2} + 3 β + 2)}^{4}}{{(2 κ_{2} (3 β^{2} + 12 β + 10) - 4 κ_{1}^{2} {(β^{2} + 3 β + 2)}^{2})}^{2}},

where

κ_{r} (β, q) = \frac{q}{β^{r} (q - r) {(1 + β)}^{2}}

.

Proof.

Using the definitions of the standardized asymmetry and kurtosis coefficients,

\sqrt{β_{1}} = \frac{E [(X - E (X))^{3}]}{(V (X))^{3 / 2}} = \frac{μ_{3} - 3 μ_{1} μ_{2} + 2 μ_{1}^{3}}{(μ_{2} - μ_{1}^{2})^{3 / 2}} \quad \text{and} \quad β_{2} = \frac{E [(X - E (X))^{4}]}{(V (X))^{2}} = \frac{μ_{4} - 4 μ_{1} μ_{3} + 6 μ_{1}^{2} μ_{2} - 3 μ_{1}^{4}}{(μ_{2} - μ_{1}^{2})^{2}},

where

μ_{1}

,

μ_{2}

,

μ_{3}

and

μ_{4}

are given by Corollary 1. The result is obtained by substituting the corresponding terms. □

Figure 4 shows that when the values of parameter q are low, the asymmetry and kurtosis coefficients increase.

3. Inference

In this section, we carry out parameter estimation of the S2SL distribution using the moments and ML methods with the EM algorithm, and perform a simulation study.

3.1. Moments Estimators

Proposition 9.

Let

X_{1}, X_{2}, \dots, X_{n}

be a random sample from

X \sim S 2 S L (β, q)

, then the moments estimators of

θ = (β, q)

are given by

{\hat{q}}_{M} = \frac{\bar{X} {\hat{β}}_{M} (1 + {\hat{β}}_{M})}{\bar{X} {\hat{β}}_{M} (1 + {\hat{β}}_{M}) - 2 ({\hat{β}}_{M} + 2)},

(4)

\frac{\bar{X} {\hat{β}}_{M} (1 + {\hat{β}}_{M}) ({\hat{β}}_{M}^{2} {(1 + {\hat{β}}_{M})}^{2} \bar{X^{2}} - 2 (3 {\hat{β}}_{M}^{2} + 12 {\hat{β}}_{M} + 10))}{\bar{X} {\hat{β}}_{M} (1 + {\hat{β}}_{M}) - 2 ({\hat{β}}_{M} + 2)} - 2 {\hat{β}}_{M}^{2} {(1 + {\hat{β}}_{M})}^{2} \bar{X^{2}} = 0,

(5)

where

\bar{X}

is the sample mean and

\bar{X^{2}}

is the sample mean of the squares of sample units. We solve Equation (5) numerically to obtain

{\hat{β}}_{M}

. Then,

{\hat{β}}_{M}

must be replaced in Equation (4) to obtain

{\hat{q}}_{M}

.

Proof.

Using Proposition 7 and substituting

E [X]

by

\bar{X}

and

E [X^{2}]

by

\bar{X^{2}}

, the following equations are obtained:

\bar{X} = \frac{2 q (β + 2)}{β (q - 1) (1 + β)},

(6)

\bar{X^{2}} = \frac{2 q (3 β^{2} + 12 β + 10)}{β^{2} (q - 2) {(1 + β)}^{2}} .

(7)

Solving Equation (6) for parameter q, we obtain Equation (4). Then, replacing

{\hat{q}}_{M}

in Equation (7), we obtain Equation (5). □

3.2. ML Estimators

Let

X_{1}, \dots, X_{n}

, be a random sample of size n of a random variable X with

S 2 S L (β, q)

distribution, then the log-likelihood function for

θ = {(β, q)}^{T}

can be expressed as

\begin{aligned} ℓ (θ, x_{i}) \propto{} & n \log (q) - (q + 1) \sum_{i = 1}^{n} \log (x_{i}) - n q \log (β) - 2 n \log (1 + β) \\ & + \sum_{i = 1}^{n} \log \left[ γ (q + 4, β x_{i}) + 6 β [γ (q + 3, β x_{i}) + β γ (q + 2, β x_{i})] \right] . \end{aligned}

(8)

Differentiating the log-likelihood function with respect to $β$ and q and setting the derivatives equal to zero, we obtain the following equations:

\frac{\partial ℓ (θ, x_{i})}{\partial β} = - \frac{n q}{β} - \frac{2 n}{1 + β} + \sum_{i = 1}^{n} \frac{6 γ (q + 3, β x_{i}) + 12 β γ (q + 2, β x_{i}) + β^{q + 3} x_{i}^{q + 2} e^{- β x_{i}} (x_{i}^{2} + 6 x_{i} + 6)}{γ (q + 4, β x_{i}) + 6 β [γ (q + 3, β x_{i}) + β γ (q + 2, β x_{i})]} = 0,

(9)

\frac{\partial ℓ (θ, x_{i})}{\partial q} = \frac{n}{q} - \sum_{i = 1}^{n} log (x_{i}) - n log (β) + \sum_{i = 1}^{n} \frac{I (q + 4, β x_{i}) + 6 β [I (q + 3, β x_{i}) + β I (q + 2, β x_{i})]}{γ (q + 4, β x_{i}) + 6 β [γ (q + 3, β x_{i}) + β γ (q + 2, β x_{i})]} = 0,

(10)

where

I (a, v) = \int_{0}^{v} t^{a - 1} log (t) e^{- t} d t

,

a > 0

, and

v > 0

(see Milgram [25]); $I (a, v)$ is related to the generalized integro-exponential function when $v = \infty$.

The solutions to Equations (9) and (10) can be obtained using numerical methods such as the Newton–Raphson algorithm. One alternative for obtaining the ML estimators is to maximize Equation (8) using the optim function of the R software [26] version 4.0.5. However, in order to obtain a more robust estimation procedure, in the next subsection we will explore the use of the EM algorithm for this particular problem.

3.3. EM Algorithm

The EM algorithm (see Dempster et al. [27]) is a widely used tool for computing ML estimates in scenarios with unobserved or latent data. In this context, the S2SL distribution can also be expressed through the following hierarchical representation:

\begin{aligned} Y_{i} ∣ Z_{1 i} = z_{1 i}, Z_{2 i} = z_{2 i}, U_{i} = u_{i} & \sim G (2 + z_{1 i} + z_{2 i}, β u_{i}), \\ Z_{1 i} & \sim Bern \left( \frac{1}{1 + β} \right), \\ Z_{2 i} & \sim Bern \left( \frac{1}{1 + β} \right), \\ U_{i} & \sim Beta (q, 1), \end{aligned}

(11)

where $Z_{1 i}$, $Z_{2 i}$ and $U_{i}$, for $i = 1, \dots, n$, represent the unobserved variables. The observed data are given by $D_{o} = y^{⊤}$, where $y^{⊤} = (y_{1}, \dots, y_{n})$. The vectors $z_{1}^{⊤} = (z_{1 1}, \dots, z_{1 n})$, $z_{2}^{⊤} = (z_{2 1}, \dots, z_{2 n})$ and $u^{⊤} = (u_{1}, \dots, u_{n})$ are the latent variables, and the vector $D_{c} = {(y^{⊤}, z_{1}^{⊤}, z_{2}^{⊤}, u^{⊤})}^{⊤}$ contains the complete data. The joint distribution of $(Y_{i}, Z_{1 i}, Z_{2 i}, U_{i})$ is given by

\begin{aligned} f (y_{i}, z_{1 i}, z_{2 i}, u_{i}) & = f (y_{i} ∣ z_{1 i}, z_{2 i}, u_{i}) \times f (z_{1 i}) \times f (z_{2 i}) \times f (u_{i}) \\ & = \frac{(β u_{i})^{2 + z_{1 i} + z_{2 i}}}{Γ (2 + z_{1 i} + z_{2 i})} y_{i}^{1 + z_{1 i} + z_{2 i}} e^{- β u_{i} y_{i}} \times \left( \frac{1}{1 + β} \right)^{z_{1 i} + z_{2 i}} \left( \frac{β}{1 + β} \right)^{2 - (z_{1 i} + z_{2 i})} \times q u_{i}^{q - 1} \\ & = \frac{q β^{4} u_{i}^{2 + z_{1 i} + z_{2 i} + q - 1}}{Γ (2 + z_{1 i} + z_{2 i}) (1 + β)^{2}} y_{i}^{1 + z_{1 i} + z_{2 i}} e^{- β u_{i} y_{i}} . \end{aligned}

Thus the complete log-likelihood function for

θ = (β, q)

can be expressed as

ℓ_{c} (θ; D_{c}) = n [4 log β - 2 log (1 + β) + log q] - β \sum_{i = 1}^{n} u_{i} y_{i} + q \sum_{i = 1}^{n} log u_{i} + c,

where c is a constant that does not depend on the parameter vector. Thus the conditional expectation of $ℓ_{c} (θ; D_{c})$ given the observed data is

Q (θ ∣ θ^{(k)}) = n [4 \log β - 2 \log (1 + β) + \log q] - β \sum_{i = 1}^{n} \hat{u}_{i}^{(k)} y_{i} + q \sum_{i = 1}^{n} \hat{k}_{i}^{(k)},

where $\hat{u}_{i} = E (U_{i} ∣ y_{i})$ and $\hat{k}_{i} = E (\log U_{i} ∣ y_{i})$. Note that

\begin{aligned} f (u_{i}, z_{1 i}, z_{2 i} ∣ y_{i}) \propto{} & \underbrace{\frac{(β y_{i})^{2 + z_{1 i} + z_{2 i} + q}}{Γ (2 + z_{1 i} + z_{2 i} + q)} \cdot \frac{u_{i}^{(2 + z_{1 i} + z_{2 i} + q) - 1} e^{- β y_{i} u_{i}}}{G (1; 2 + z_{1 i} + z_{2 i} + q, β y_{i})}}_{U_{i} ∣ z_{1 i}, z_{2 i}, y_{i} \sim T G_{(0, 1)} (2 + z_{1 i} + z_{2 i} + q, β y_{i})} \\ & \times \underbrace{\frac{Γ (2 + z_{1 i} + z_{2 i} + q)}{Γ (2 + z_{1 i} + z_{2 i})} β^{- z_{1 i} - z_{2 i}} G (1; 2 + z_{1 i} + z_{2 i} + q, β y_{i})}_{Z_{1 i}, Z_{2 i} ∣ y_{i} \sim Bernoulli (v_{i})} \end{aligned}

(12)

where $v_{i} = \frac{6 β Γ (3 + q) G_{3} + Γ (4 + q) G_{4}}{6 β^{2} Γ (2 + q) G_{2} + 6 β Γ (3 + q) G_{3} + Γ (4 + q) G_{4}}$ and $G (y; a) = \int_{0}^{y} \frac{1}{Γ (a)} t^{a - 1} e^{- t} d t$ is the cdf of the gamma model with shape a and unit rate, so that the cdf with shape a and rate b evaluated at 1 is $G (1; a, b) = G (b; a)$. Furthermore, we define $G_{p} = G (β y_{i}; q + p)$, and $T G_{(0, 1)} (a, b)$ denotes the gamma distribution with shape parameter a and rate b truncated to the interval $(0, 1)$.

Therefore, using properties of conditional expectations, we have that

E (U_{i} ∣ y_{i}) = E [E (U_{i} ∣ Z_{1 i}, Z_{2 i}, y_{i}) ∣ y_{i}]

; according to (12), this expectation is simple to compute, and we obtain

E (log U_{i} ∣ y_{i})

similarly. The results are as follows:

E (U_{i} ∣ y_{i}) = \frac{1}{β y_{i} S} [Γ (2 + q) (2 + q) G_{3} + \frac{Γ (3 + q) (3 + q) G_{4}}{β} + \frac{Γ (4 + q) (4 + q) G_{5}}{6 β^{2}}],

(13)

E (\log U_{i} ∣ y_{i}) = - log (β y_{i}) + \frac{I (q + 2, β y_{i})}{S} + \frac{I (q + 3, β y_{i})}{β S} + \frac{I (q + 4, β y_{i})}{6 β^{2} S},

(14)

where

S

is the normalization constant, defined as

S = Γ (2 + q) G_{2} + \frac{Γ (3 + q) G_{3}}{β} + \frac{Γ (4 + q) G_{4}}{6 β^{2}} .

Thus, the EM algorithm for estimating the vector

θ = (β, q)

is as follows:

Step E: given ${\hat{β}}^{(k - 1)}$ and ${\hat{q}}^{(k - 1)}$ , for $i = 1, \dots, n$ compute ${\hat{u_{i}}}^{(k)}$ and ${\hat{k_{i}}}^{(k)}$ using Equations (13) and (14).
Step M1: update ${\hat{q}}^{(k)}$ as,

${\hat{q}}^{(k)} = \frac{- n}{\sum_{i = 1}^{n} {\hat{k_{i}}}^{(k)}} .$
Step M2: update ${\hat{β}}^{(k)}$ as the solution of the following non-linear equation

$\frac{4 n}{β} - \frac{2 n}{1 + β} = \sum_{i = 1}^{n} y_{i} {\hat{u_{i}}}^{(k)} .$

Steps E, M1, and M2 are repeated until convergence is reached, defined when the difference between the estimations of two consecutive iterations is less than a previously fixed value. Note that Step M1 has an explicit solution, while

β

can be solved using, for example, the uniroot function in R.
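Steps E, M1 and M2 can be put together in a short R function. The sketch below is a minimal illustration under stated assumptions, not the code used in the article: it relies only on base R (`pgamma` for the incomplete gamma functions, `integrate` for $I (a, v)$, `uniroot` for Step M2), all function names are hypothetical, the upper limit of the quadrature for $I (a, v)$ is capped where the remaining gamma tail is negligible, and the bracket for $β$ uses the bounds $2 n / β < 4 n / β - 2 n / (1 + β) < 4 n / β$. Equation (13) is simplified with $Γ (a) a = Γ (a + 1)$.

```r
# Lower incomplete gamma function gamma(a, v), via pgamma
linc <- function(a, v) gamma(a) * pgamma(v, shape = a)

# I(a, v) = int_0^v t^(a-1) log(t) exp(-t) dt by adaptive quadrature.
# The upper limit is capped where the gamma tail mass is below 1e-12,
# so the neglected part is negligible (an approximation made here).
I_fun <- function(a, v) {
  upper <- min(v, qgamma(1e-12, shape = a, lower.tail = FALSE))
  integrate(function(t) t^(a - 1) * log(t) * exp(-t), 0, upper,
            rel.tol = 1e-8, abs.tol = 0, stop.on.error = FALSE)$value
}

# Observed-data log-likelihood, Equation (8) with the constant -n log(6)
loglik_s2sl <- function(beta, q, y) {
  v <- beta * y
  sum(log(q) - (q + 1) * log(y) - log(6) - q * log(beta) - 2 * log(1 + beta) +
        log(linc(q + 4, v) + 6 * beta * (linc(q + 3, v) + beta * linc(q + 2, v))))
}

em_s2sl <- function(y, beta = 1, q = 1, max_iter = 500, tol = 1e-6) {
  n <- length(y)
  trace <- loglik_s2sl(beta, q, y)
  for (k in seq_len(max_iter)) {
    v <- beta * y
    # Step E: Equations (13) and (14)
    S <- linc(q + 2, v) + linc(q + 3, v) / beta + linc(q + 4, v) / (6 * beta^2)
    u_hat <- (linc(q + 3, v) + linc(q + 4, v) / beta +
                linc(q + 5, v) / (6 * beta^2)) / (v * S)
    I2 <- sapply(v, I_fun, a = q + 2)
    I3 <- sapply(v, I_fun, a = q + 3)
    I4 <- sapply(v, I_fun, a = q + 4)
    k_hat <- -log(v) + (I2 + I3 / beta + I4 / (6 * beta^2)) / S
    # Step M1: closed-form update for q
    q_new <- -n / sum(k_hat)
    # Step M2: solve 4n/b - 2n/(1 + b) = sum(y * u_hat); root lies in [2n/T, 4n/T]
    T_k <- sum(y * u_hat)
    beta_new <- uniroot(function(b) 4 * n / b - 2 * n / (1 + b) - T_k,
                        lower = 2 * n / T_k, upper = 4 * n / T_k,
                        tol = 1e-12)$root
    trace <- c(trace, loglik_s2sl(beta_new, q_new, y))
    done <- abs(beta_new - beta) + abs(q_new - q) < tol
    beta <- beta_new
    q <- q_new
    if (done) break
  }
  list(beta = beta, q = q, loglik = trace, iterations = k)
}

# Illustration on data simulated from representation (11) with beta = 1, q = 2
set.seed(1)
n <- 150
z <- rbinom(n, size = 2, prob = 1 / (1 + 1))   # Z1 + Z2
u <- rbeta(n, shape1 = 2, shape2 = 1)
y <- rgamma(n, shape = 2 + z, rate = u * 1)
fit <- em_s2sl(y, max_iter = 40)
print(fit[c("beta", "q", "iterations")])
```

A useful diagnostic is that `fit$loglik` should never decrease from one iteration to the next, which is the defining property of EM; for very large q the calls to `gamma()` may overflow, so a production version would work on the log scale.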

3.4. Simulation Study

In this section, we present a simulation study to evaluate the performance of the EM algorithm in estimating the parameters of the S2SL distribution. A total of 1000 replicas were generated for four sample sizes:

n = 50, 100, 200

and 500, using fixed values for parameters

β

and q. The initial values to start the EM algorithm are

β^{(0)} = 1

and

q^{(0)} = 1

. Based on the stochastic representation given in Equation (11), random numbers can be generated from the S2SL model, leading to Algorithm 1.

Algorithm 1 Simulating values from the distribution $X \sim S2SL (β, q)$

1: Generate $Z_{1 i} \sim Bernoulli (\frac{1}{1 + β}), i = 1, 2, \dots, n$ .
2: Generate $Z_{2 i} \sim Bernoulli (\frac{1}{1 + β}), i = 1, 2, \dots, n$ .
3: Generate $U_{i} \sim Beta (q, 1), i = 1, 2, \dots, n$ .
4: Generate $X_{i} \sim Gamma (2 + Z_{1 i} + Z_{2 i}, U_{i} β)$, so that $X_{i} \sim S 2 S L (β, q), i = 1, 2, \dots, n$ .
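Algorithm 1 translates almost line by line into base R. The sketch below is illustrative; the function name `rs2sl`, the seed and the parameter values are assumptions, and the second argument of the gamma distribution is taken to be a rate, as in representation (11).

```r
# Draw n values from S2SL(beta, q) following the four steps of Algorithm 1
rs2sl <- function(n, beta, q) {
  z1 <- rbinom(n, size = 1, prob = 1 / (1 + beta))   # step 1
  z2 <- rbinom(n, size = 1, prob = 1 / (1 + beta))   # step 2
  u  <- rbeta(n, shape1 = q, shape2 = 1)             # step 3
  rgamma(n, shape = 2 + z1 + z2, rate = u * beta)    # step 4
}

set.seed(2025)
x <- rs2sl(1e5, beta = 1, q = 5)

# Compare the sample mean with mu_1 from Corollary 1 (finite because q > 1)
mu1_theory <- 2 * 5 * (1 + 2) / (1 * (5 - 1) * (1 + 1))
print(c(sample_mean = mean(x), mu1_theory = mu1_theory))
```

For small q the sample mean converges slowly or not at all, since the r-th moment exists only when $q > r$.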

Table 2 shows the estimated mean for each parameter (Mean), together with their standard errors (SE), the root mean squared error (RMSE), and the coverage percentage (CP) of the ML estimators, based on a 95% confidence interval. It may be concluded from the results that the ML estimators are consistent. As the sample size increases, the estimation means draw progressively closer to the true value of the parameter. As might be expected, the values of the SE and the RMSE diminish and stabilize as the sample size increases, suggesting that the standard errors of the estimators are calculated correctly. The R codes are available in Appendix A.

4. Applications

In this section, we analyse two real datasets to evaluate the performance of the S2SL distribution in modelling data with high kurtosis. A comparison is made between the S2SL, 2SL, and LSD distributions, using the Akaike information criterion (AIC) presented in Akaike [28], and the Bayesian information criterion (BIC) proposed in Schwarz [29]. Below, we present the pdf of the LSD distribution (see Gui [17]):

f_{Y} (y; θ, σ, q) = \frac{q θ^{2}}{σ (1 + θ)} \int_{0}^{1} (1 + \frac{y t}{σ}) e^{- \frac{θ y t}{σ}} t^{q} d t, y, θ, σ, q > 0 .

4.1. Application 1: Patients with Acute Bone Cancer

The dataset contains the survival times (in days) of 73 patients diagnosed with acute bone cancer. The data were originally presented by Mansour et al. [30] and subsequently analysed by Klakattawi [31] and Alanzi et al. [32]. The dataset is available in the R software package [26] “ComRiskModel” with the “data_acutebcancer” database.

Table 3 presents the descriptive statistics of the data: sample mean, standard deviation, sample asymmetry and kurtosis coefficients. Figure 5 shows a boxplot for the patients with acute bone cancer dataset, which is seen to present atypical observations and high kurtosis ($b_{2} = 51.78$).

The moments estimators for the parameters of the S2SL model are

{\hat{β}}_{M} = 1.4399

and

{\hat{q}}_{M} = 1.0715

. These estimators were used as initial values to calculate the ML estimators. Table 4 shows the ML estimations with their standard errors and the AIC and BIC criteria. The S2SL distribution shows a better fit to the bone cancer patients dataset than the 2SL and LSD distributions, as the AIC and BIC values are smaller.

Figure 6 shows that the theoretical quantiles of the proposed S2SL model fit the sample quantiles of the survival data more closely than those of the 2SL and LSD distributions. This supports the above finding, since according to the AIC and BIC selection criteria, the S2SL model presents a better fit to this dataset.

4.2. Application 2: Air Transceiver Repair Times

The second application is to a set of 46 repair times for an air communications transceiver, measured in hours. The complete dataset was taken from Jorgensen [33]. Table 5 shows the descriptive statistics for the repair times, which present high kurtosis. Figure 7 shows the boxplot of the dataset, in which the existence of outliers can also be appreciated.

The moments estimators used as starting points for estimation by ML of the S2SL distribution are

{\hat{β}}_{M} = 1.2196

and

{\hat{q}}_{M} = 1.0421

. Table 6 shows the ML estimates for the parameters, with their respective SE, and the values of the AIC and BIC criteria for each distribution compared. Figure 8 presents the QQ-plots for the 2SL, LSD and S2SL distributions. All these summaries and graphs enable us to conclude that the S2SL distribution provides the best fit to the repair times data.

5. Conclusions

In this work, we present the S2SL distribution, an extension of the 2SL distribution in which the slash methodology is used to increase its flexibility for modelling data with heavy tails and outlying observations. Some properties of this new distribution are obtained, and its parameters are estimated by the ML method using the EM algorithm. Below, we highlight some of the most important characteristics of the S2SL distribution:

The S2SL distribution has two different stochastic representations, given in Equation (2) and Proposition 5.
The expressions of the pdf, cdf, and hazard function are obtained, all of which have a closed form and are represented by the lower incomplete gamma function.
When the coefficients of asymmetry and kurtosis are analysed, the S2SL model is shown to be more flexible than the 2SL model. Furthermore, as shown in Table 1, the distribution tails become heavier as parameter q diminishes.
Implementation of the EM algorithm allows ML estimators for the model parameters to be obtained more efficiently.
The simulation study shows that as the sample size is increased, the ML estimators draw progressively closer to the true values of the parameters, suggesting that the estimators are consistent and stable.
In the applications to real data, the S2SL distribution is seen to provide a better fit to the data when compared with the 2SL and LSD distributions, reflected in lower values in the AIC and BIC criteria.

In future work, we will consider exploring Bayesian inference for model parameters using the Bayesian bootstrap algorithm described by Lyddon et al. [34], as it represents a relevant complementary approach to the methodology presented in this study.

Author Contributions

Conceptualization, H.A.M. and H.W.G.; methodology, D.I.G. and H.W.G.; software, J.S.C. and D.I.G.; validation, J.S.C., D.I.G. and O.V.; formal analysis, H.A.M. and H.W.G.; investigation, J.S.C.; writing—original draft preparation, H.A.M.; writing—review and editing, D.I.G., O.V. and H.W.G.; funding acquisition, D.I.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset for Application 1 is available in the R software package [26]. Specific details can be found in the text. The dataset for Application 2 was taken from Jorgensen [33].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Codes in R to reproduce the results.

Density function
rm(list=ls(all=TRUE))
library(expint)  # gammainc(a, x): upper incomplete gamma function
x <- seq(0.04,15,0.006)
# pdf of the S2SL(beta, q) distribution. G4, G3 and G2 are lower incomplete
# gamma functions, obtained as gamma(a) - gammainc(a, beta*x).
pdf_S2SL <- function(x,beta,q){
  G4 <- gamma(q+4)-gammainc(q+4,beta*x)
  G3 <- gamma(q+3)-gammainc(q+3,beta*x)
  G2 <- gamma(q+2)-gammainc(q+2,beta*x)
  ((q*x^(-(q+1)))/(6*beta^q*(1+beta)^2))*(G4+6*beta*G3+6*beta^2*G2)
}
# Densities for beta = 1 and q = 1, 3, 5
resultado1 <- pdf_S2SL(x,1,1)
resultado2 <- pdf_S2SL(x,1,3)
resultado3 <- pdf_S2SL(x,1,5)
plot(x,resultado1,type="l",lty=1,lwd=2,xlab="x",ylab="Density",
     xlim=c(0,13),ylim=c(0,0.25),
     cex.lab=1.35,cex.axis=1.35)
lines(x,resultado2,lty=2,lwd=2)
lines(x,resultado3,lty=3,lwd=2)
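As a quick sanity check on the density above, the following sketch rewrites it with base R only and integrates it numerically; the result should be close to 1. The helper names `lower_inc_gamma` and `pdf_S2SL_base` are illustrative and not part of the original appendix, and the identity γ(a, z) = Γ(a) · pgamma(z, a) is used here in place of the `expint` call.

```r
# Lower incomplete gamma function gamma(a, z), written with base R only.
# pgamma(z, shape = a) is the regularized version, so multiply by gamma(a).
lower_inc_gamma <- function(a, z) gamma(a) * pgamma(z, shape = a)

# Hypothetical base-R version of pdf_S2SL (same formula, no expint needed).
pdf_S2SL_base <- function(x, beta, q) {
  G4 <- lower_inc_gamma(q + 4, beta * x)
  G3 <- lower_inc_gamma(q + 3, beta * x)
  G2 <- lower_inc_gamma(q + 2, beta * x)
  (q * x^(-(q + 1))) / (6 * beta^q * (1 + beta)^2) *
    (G4 + 6 * beta * G3 + 6 * beta^2 * G2)
}

# Numerical integral of the density over (0, Inf) for beta = 1 and q = 3.
total <- integrate(pdf_S2SL_base, lower = 0, upper = Inf,
                   beta = 1, q = 3)$value
print(total)
```

Other values of `beta` and `q` can be checked the same way; for small q the tail is heavy, so the numerical integration may need more subdivisions.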
Hazard function
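library(expint)
x <- seq(0.04, 15, 0.006)  # evaluation grid, as in the density plot
# Hazard function h(x) = f(x)/(1 - F(x)) of the S2SL(beta, q) distribution.
# expint::gammainc(a, x) takes the shape a first, so each lower incomplete
# gamma function is gamma(a) - gammainc(a, beta*x), as in pdf_S2SL.
hazard_S2SL <- function(x, beta, q) {
  G44 <- gamma(q+4)-gammainc(q+4,beta*x)
  G33 <- gamma(q+3)-gammainc(q+3,beta*x)
  G22 <- gamma(q+2)-gammainc(q+2,beta*x)
  G4 <- gamma(4)-gammainc(4,beta*x)
  G3 <- gamma(3)-gammainc(3,beta*x)
  G2 <- gamma(2)-gammainc(2,beta*x)
  numerator <- q*x^(-(q+1))*(G44+6*beta*(G33+beta*G22))
  denominator <- 6*beta^(q)*(1+beta)^2-x^(-q)*((beta*x)^q*
        (G4+6*beta*(beta*G2+G3))-6*beta*(beta*G22+G33)-G44)
  hazard <- numerator/denominator
  return(hazard)
}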
Asymmetry and kurtosis coefficient
rm(list=ls(all=TRUE))
library(plot3D)
library(latex2exp)
beta <- seq(3.1,15,length=40)
q <- seq(3.1,15,length=40)
beta2 <- seq(4.1,15,length=40)
q2 <- seq(4.1,15,length=40)
Skewness_S2SL <- function(beta,q){
  m1 <- (2*q*(beta+2))/(beta*(q-1)*(1+beta))
  m2 <- (2*q*(3*beta^2+12*beta+10))/(beta^2*(q-2)*(1+beta)^2)
  m3 <- (24*q*(beta^2+5*beta+5))/(beta^3*(q-3)*(1+beta)^2)
  (m3-3*m1*m2+2*m1^3)/((m2-m1^2)^(3/2))
}
Kurtosis_S2SL <- function(beta2,q2){
  m1 <- (2*q2*(beta2+2))/(beta2*(q2-1)*(1+beta2))
  m2 <- (2*q2*(3*beta2^2+12*beta2+10))/(beta2^2*(q2-2)*(1+beta2)^2)
  m3 <- (24*q2*(beta2^2+5*beta2+5))/(beta2^3*(q2-3)*(1+beta2)^2)
  m4 <- (120*q2*(beta2^2+6*beta2+7))/(beta2^4*(q2-4)*(1+beta2)^2)
  (m4-4*m1*m3+6*m1^2*m2-3*m1^4)/((m2-m1^2)^2)
}
Resultado_Skewness <- outer(beta,q,Vectorize(Skewness_S2SL))
Resultado_Kurtosis <- outer(beta2,q2,Vectorize(Kurtosis_S2SL))
persp(beta,q,Resultado_Skewness,theta=55,phi=20,col="#CAFF70",
      xlab=TeX("$\\beta$"),ylab=TeX("$q$"),zlab="Skewness",
      ticktype="detailed",nticks=4,shade=0.3,cex.lab=1.2,
      cex.axis=1.2,cex.main=1.2,cex.sub=1.2)
persp(beta2,q2,Resultado_Kurtosis,theta=55,phi=20,col="#FFD700",
      xlab=TeX("$\\beta$"),ylab=TeX("$q$"),zlab="Kurtosis",
      ticktype="detailed",nticks=4,shade=0.3,cex.lab=1.2,
      cex.axis=1.2,cex.main=1.2,cex.sub=1.2)
Simulation study for the S2SL distribution
rm(list=ls(all=TRUE))
library(knitr)
library(pracma)

set.seed(1234)

replicas=1000
b_true=4
q_true=0.5
muestra<-c(50,100,200,500)

resultados<-list()

for(J in muestra){
  cat("Processing sample size:",J,"\n")
  flush.console()

  bias.rep<-c()
  se.rep<-c()
  CP.rep<-c()
  est.rep<-c()

  for(j in 1:replicas){
    if(j%%100==0){
      cat("Replica:",j,"for sample size:",J,"\n")
      flush.console()
    }
    Z1<-rbinom(J,1,1/(1+b_true))
    Z2<-rbinom(J,1,1/(1+b_true))
    U<-rbeta(J,q_true,1)
    shape_Y<-2+Z1+Z2
    rate_Y<-b_true*U
    x<-rgamma(J,shape=shape_Y,rate=rate_Y)
    # EM algorithm: starting values and stopping rule
    beta_last=1
    q_last=1
    dif=1
    max.iter=10000
    i<-1
    n<-length(x)
    while(i<=max.iter&&dif>0.0001){
      # E-step: u[m] and k[m] are the conditional expectations of U and
      # log(U) given x[m], evaluated at the current estimates
      u<-numeric(n)
      k<-numeric(n)
      # index m is used so that the replica counter j is not overwritten
      for(m in 1:n){
        g2<-gamma(2+q_last)
        g3<-gamma(3+q_last)
        g4<-gamma(4+q_last)
        G2<-pgamma(x[m]*beta_last,q_last+2)
        G3<-pgamma(x[m]*beta_last,q_last+3)
        G4<-pgamma(x[m]*beta_last,q_last+4)
        G5<-pgamma(x[m]*beta_last,q_last+5)
        Sumf<-g2*G2+(g3*G3)/beta_last+
              ((g4*G4)/(6*(beta_last)^2))
        u[m]<-(1/(x[m]*beta_last*Sumf))*(g2*G3*(2+q_last)+
               ((g3*(3+q_last)*G4)/beta_last)+
               ((g4*(4+q_last)*G5)/(6*beta_last^2)))

        int1<-integrate(function(w) log(w)*w^(q_last+1)*exp(-w),
                        0,x[m]*beta_last)$value
        int2<-integrate(function(w) log(w)*w^(q_last+2)*exp(-w),
                        0,x[m]*beta_last)$value
        int3<-integrate(function(w) log(w)*w^(q_last+3)*exp(-w),
                        0,x[m]*beta_last)$value

        k[m]<--log(x[m]*beta_last)+int1/Sumf+
               int2/(beta_last*Sumf)+
               int3/(6*((beta_last)^2)*Sumf)
      }
      # M-step: closed form for q, one-dimensional root search for beta
      q_new<--n/sum(k)
      solve_beta<-function(beta_val){
        (4*n/beta_val)-(2*n)/(1+beta_val)-sum(x*u)
      }
      result<-uniroot(solve_beta,interval=c(0.01,100))
      beta_new<-result$root
      # stop when the largest change in (beta, q) is below the tolerance
      dif<-max(abs(c(beta_new,q_new)-c(beta_last,q_last)))
      beta_last<-beta_new
      q_last<-q_new
      i<-i+1
    }
    param<-cbind(beta_last,q_last)
    loglike<-function(theta,x,t.param=TRUE){
      beta=theta[1]
      q=theta[2]
      if(t.param){beta=exp(theta[1]);q=exp(theta[2])}
      ll=log(q)-(q+1)*log(x)-log(6)-q*log(beta)-
         2*log1p(beta)+
         log(exp(pgamma(beta*x,shape=q+4,log.p=TRUE)+
             lgamma(q+4))+6*beta*(exp(pgamma(beta*x,shape=q+3,
             log.p=TRUE)+lgamma(q+3))+beta*exp(pgamma(beta*x,
             shape=q+2,log.p=TRUE)+lgamma(q+2))))
      -sum(ll)
    }

    H<-hessian(loglike,x0=param,x=x,t.param=FALSE)
    var.est<-diag(solve(H))
    if(min(var.est)>0){
      bias.rep<-rbind(bias.rep,param-c(b_true,q_true))
      se.rep<-rbind(se.rep,sqrt(var.est))
      est.rep<-rbind(est.rep,param)

      lim.inf<-param-1.96*sqrt(var.est)
      lim.sup<-param+1.96*sqrt(var.est)

      cp.aux<-as.numeric(c(b_true,q_true)>lim.inf&
                c(b_true,q_true)<lim.sup)
      CP.rep<-rbind(CP.rep,cp.aux)
    }
  }
  est_mean<-round(apply(est.rep,2,mean),3)
  se_prom<-round(apply(se.rep,2,mean),3)
  rmse<-round(sqrt(apply(bias.rep^2,2,mean)),3)
  cp_prom<-round(apply(CP.rep,2,mean),3)
  resultados[[as.character(J)]]<-cbind(est_mean,se_prom,rmse,cp_prom)
}

tabla_resultados<-as.matrix(do.call(cbind,resultados))

tabla_latex<-kable(tabla_resultados,format="latex",
                booktabs=TRUE,digits=3)
print(tabla_latex)
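The data-generation step inside the simulation loop can be isolated as a small sampler. The sketch below wraps it in a hypothetical helper `rS2SL` (a name introduced here for illustration only) and compares the sample mean with the first-moment expression coded in `Skewness_S2SL`, which is finite only for q > 1. The values β = 1 and q = 5 are arbitrary choices for the example.

```r
# Hypothetical sampler for S2SL(beta, q), mirroring the simulation code:
# Z1, Z2 ~ Bernoulli(1/(1+beta)), U ~ Beta(q, 1),
# X | (Z1, Z2, U) ~ Gamma(shape = 2 + Z1 + Z2, rate = beta * U).
rS2SL <- function(n, beta, q) {
  Z1 <- rbinom(n, 1, 1 / (1 + beta))
  Z2 <- rbinom(n, 1, 1 / (1 + beta))
  U <- rbeta(n, q, 1)
  rgamma(n, shape = 2 + Z1 + Z2, rate = beta * U)
}

set.seed(1234)
beta <- 1
q <- 5
x <- rS2SL(1e5, beta, q)

# First moment as coded in Skewness_S2SL (requires q > 1).
m1 <- (2 * q * (beta + 2)) / (beta * (q - 1) * (1 + beta))
cat("sample mean:", mean(x), " theoretical mean:", m1, "\n")
```

Keeping the sampler separate from the estimation loop makes it easier to test the generator on its own before running the full study.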

References

1. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley: New York, NY, USA, 1995; Volume 1.
2. Rogers, W.H.; Tukey, J.W. Understanding some long-tailed symmetrical distributions. Stat. Neerl. 1972, 26, 211–226.
3. Mosteller, F.; Tukey, J.W. Data Analysis and Regression; Addison-Wesley: Reading, MA, USA, 1977.
4. Kafadar, K. A biweight approach to the one-sample problem. J. Am. Statist. Assoc. 1982, 77, 416–424.
5. Wang, J.; Genton, M.G. The multivariate skew-slash distribution. J. Stat. Plan. Inference 2006, 136, 209–220.
6. Gómez, H.W.; Quintana, F.A.; Torres, F.J. A New Family of Slash-Distributions with Elliptical Contours. Stat. Probab. Lett. 2007, 77, 717–725; Erratum in Stat. Probab. Lett. 2008, 78, 2273–2274.
7. Olmos, N.M.; Varela, H.; Gómez, H.W.; Bolfarine, H. An extension of the half-normal distribution. Stat. Pap. 2012, 53, 875–886.
8. Olmos, N.M.; Varela, H.; Bolfarine, H.; Gómez, H.W. An extension of the generalized half-normal distribution. Stat. Pap. 2014, 55, 967–981.
9. Astorga, J.M.; Reyes, J.; Santoro, K.I.; Venegas, O.; Gómez, H.W. A Reliability Model Based on the Incomplete Generalized Integro-Exponential Function. Mathematics 2020, 8, 1537.
10. Rivera, P.A.; Barranco-Chamorro, I.; Gallardo, D.I.; Gómez, H.W. Scale Mixture of Rayleigh Distribution. Mathematics 2020, 8, 1842.
11. Lindley, D.V. Fiducial distributions and Bayes’ theorem. J. R. Stat. Soc. Ser. B 1958, 20, 102–107.
12. Ghitany, M.E.; Atieh, B.; Nadarajah, S. Lindley distribution and its applications. Math. Comput. Simul. 2008, 78, 493–506.
13. Ghitany, M.; Al-Mutairi, D.; Balakrishnan, N.; Al-Enezi, I. Power Lindley distribution and associated inference. Comput. Stat. Data Anal. 2013, 64, 20–33.
14. Gómez-Déniz, E.; Calderin-Ojeda, E. The discrete Lindley distribution: Properties and application. J. Stat. Comput. Simul. 2011, 81, 1405–1416.
15. Krishna, H.; Kumar, K. Reliability estimation in Lindley distribution with progressively type II right censored sample. Math. Comput. Simul. 2011, 82, 281–294.
16. Bakouch, H.S.; Al-Zaharani, B.; Al-Shomrani, A.; Marchi, V.; Louzada, F. An extended Lindley distribution. J. Korean Stat. Soc. 2012, 41, 75–85.
17. Gui, W. Statistical properties and applications of the Lindley slash distribution. J. Appl. Statist. Sci. 2012, 20, 283–298.
18. Oluyede, B.O.; Yang, T. A new class of generalized Lindley distribution with applications. J. Stat. Comput. Simul. 2014, 85, 2072–2100.
19. Shanker, R.; Hagos, F.; Sujatha, S. On modeling of Lifetimes data using exponential and Lindley distributions. Biom. Biostat. Int. J. 2015, 2, 140–147.
20. Abouammoh, A.M.; Alshangiti, A.M.; Ragab, I.E. A new generalized Lindley distribution. J. Stat. Comput. Simul. 2015, 85, 3662–3678.
21. Tomy, L. A retrospective study on Lindley distribution. Biom. Biostat. Int. J. 2018, 7, 163–169.
22. Chesneau, C.; Tomy, L.; Gillariose, J. On a Sum and Difference of two Lindley Distributions: Theory and Applications. REVSTAT 2020, 18, 673–695.
23. Rolski, T.; Schmidli, H.; Schmidt, V.; Teugels, J. Stochastic Processes for Insurance and Finance; John Wiley & Sons: Hoboken, NJ, USA, 1999.
24. Bingham, N. Regular Variation; Cambridge University Press: Cambridge, UK, 1987.
25. Milgram, M.S. The generalized integro-exponential function. Math. Comput. 1985, 44, 443–458.
26. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.r-project.org/ (accessed on 16 October 2024).
27. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B 1977, 39, 1–38.
28. Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 1974, 19, 716–723.
29. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
30. Mansour, M.; Yousof, H.M.; Shehata, W.A.; Ibrahim, M. A new two parameters Burr XII distribution: Properties, copula, different estimation methods and modeling acute bone cancer data. J. Nonlinear Sci. Appl. 2020, 13, 223–238.
31. Klakattawi, H.S. Survival analysis of cancer patients using a new extended Weibull distribution. PLoS ONE 2022, 17, e0264229.
32. Alanzi, A.R.; Imran, M.; Tahir, M.H.; Chesneau, C.; Jamal, F.; Shakoor, S.; Sami, W. Simulation analysis, properties and applications on a new Burr XII model based on the Bell-X functionalities. AIMS Math. 2023, 8, 6970–7004.
33. Jorgensen, B. Statistical Properties of the Generalized Inverse Gaussian Distribution; Lecture Notes in Statistics; Springer: New York, NY, USA, 1982.
34. Lyddon, S.P.; Holmes, C.C.; Walker, S.G. General Bayesian updating and the loss-likelihood bootstrap. Biometrika 2019, 106, 465–478.

Figure 1. Graphical comparison of the pdf between the 2SL and S2SL distributions for a fixed β (β = 1) and different values of q.

Figure 2. Graphical comparison of the cdf between the 2SL and S2SL distributions for a fixed β (β = 1) and different values of q.

Figure 3. Plots of the survival function (left) and the hazard function (right) for the S2SL distribution with β = 1 and different values of q, compared to the 2SL distribution.

Figure 4. Graphs of the asymmetry and kurtosis coefficients of the S2SL(β, q) model.

Figure 5. Boxplot for the bone cancer dataset.

Figure 6. QQ-plot for the S2SL, 2SL, and LSD distributions for the bone cancer patients dataset.

Figure 7. Boxplot for repair times dataset.

Figure 8. QQ-plot for the S2SL, 2SL and LSD distributions for the repair times data.

Table 1. Comparison of the tails of the 2SL and S2SL distributions.

Distribution	$P (X > 5)$	$P (X > 10)$	$P (X > 15)$
S2SL (1, 1)	0.5586	0.2995	0.2000
S2SL (1, 5)	0.2428	0.0267	0.0041
S2SL (1, 10)	0.1876	0.0104	0.0005
2SL (1)	0.1387	0.0041	0.0001
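The tail probabilities in Table 1 can be checked numerically. The sketch below assumes the survival function implied by the denominator of the hazard code in Appendix A, namely S(x) = S_2SL(x) + (x/q) f(x), where S_2SL is the survival function of a mixture of Gamma(2, β), Gamma(3, β) and Gamma(4, β) distributions with weights β², 2β and 1, each divided by (1 + β)². All helper names are illustrative and do not come from the original code.

```r
# Lower incomplete gamma function via base R.
lower_inc_gamma <- function(a, z) gamma(a) * pgamma(z, shape = a)

# S2SL density, same formula as pdf_S2SL in Appendix A.
pdf_S2SL_base <- function(x, beta, q) {
  G4 <- lower_inc_gamma(q + 4, beta * x)
  G3 <- lower_inc_gamma(q + 3, beta * x)
  G2 <- lower_inc_gamma(q + 2, beta * x)
  (q * x^(-(q + 1))) / (6 * beta^q * (1 + beta)^2) *
    (G4 + 6 * beta * G3 + 6 * beta^2 * G2)
}

# Survival function of 2SL(beta) as a three-component gamma mixture.
surv_2SL <- function(x, beta) {
  w <- c(beta^2, 2 * beta, 1) / (1 + beta)^2
  w[1] * pgamma(x, shape = 2, rate = beta, lower.tail = FALSE) +
    w[2] * pgamma(x, shape = 3, rate = beta, lower.tail = FALSE) +
    w[3] * pgamma(x, shape = 4, rate = beta, lower.tail = FALSE)
}

# Survival function of S2SL(beta, q) under the assumed relation.
surv_S2SL <- function(x, beta, q) {
  surv_2SL(x, beta) + (x / q) * pdf_S2SL_base(x, beta, q)
}

x <- c(5, 10, 15)
tails <- rbind(
  "S2SL (1, 1)"  = surv_S2SL(x, 1, 1),
  "S2SL (1, 5)"  = surv_S2SL(x, 1, 5),
  "S2SL (1, 10)" = surv_S2SL(x, 1, 10),
  "2SL (1)"      = surv_2SL(x, 1)
)
colnames(tails) <- c("P(X > 5)", "P(X > 10)", "P(X > 15)")
print(round(tails, 4))
```

The printed rows follow the order of Table 1, so the two can be compared entry by entry.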

Table 2. Simulation study for the parameters β and q in the S2SL model.

True Value			$n = 50$				$n = 100$				$n = 200$				$n = 500$
$β$	$q$	Estim.	Mean	SE	RMSE	CP	Mean	SE	RMSE	CP	Mean	SE	RMSE	CP	Mean	SE	RMSE	CP
2	0.5	$\hat{β}$	2.022	0.410	0.403	0.939	2.020	0.288	0.288	0.949	2.004	0.201	0.207	0.94	2.013	0.128	0.125	0.951
	0.5	$\hat{q}$	0.515	0.091	0.099	0.949	0.505	0.063	0.065	0.955	0.503	0.044	0.046	0.94	0.497	0.027	0.027	0.948
	1	$\hat{β}$	1.999	0.338	0.356	0.941	1.996	0.237	0.232	0.948	2.000	0.168	0.161	0.954	2.002	0.106	0.108	0.941
	1	$\hat{q}$	1.080	0.246	0.297	0.970	1.030	0.159	0.167	0.970	1.014	0.109	0.114	0.961	1.004	0.068	0.068	0.954
	1.5	$\hat{β}$	1.973	0.315	0.318	0.934	2.005	0.224	0.221	0.945	1.995	0.157	0.160	0.934	1.996	0.099	0.093	0.961
	1.5	$\hat{q}$	1.765	0.680	1.969	0.960	1.577	0.302	0.324	0.967	1.546	0.204	0.220	0.954	1.521	0.124	0.126	0.958
4	0.5	$\hat{β}$	4.019	0.873	0.898	0.918	4.034	0.618	0.622	0.940	4.046	0.438	0.448	0.945	4.014	0.274	0.275	0.949
	0.5	$\hat{q}$	0.519	0.094	0.101	0.945	0.504	0.063	0.065	0.951	0.499	0.044	0.045	0.946	0.498	0.028	0.028	0.947
	1	$\hat{β}$	3.958	0.725	0.758	0.916	3.997	0.514	0.501	0.953	4.026	0.366	0.379	0.939	4.008	0.230	0.224	0.960
	1	$\hat{q}$	1.103	0.270	0.375	0.968	1.043	0.165	0.180	0.961	1.012	0.111	0.119	0.937	1.004	0.069	0.068	0.953
	1.5	$\hat{β}$	3.966	0.691	0.714	0.932	4.019	0.487	0.506	0.929	4.000	0.341	0.349	0.937	3.996	0.215	0.222	0.936
	1.5	$\hat{q}$	1.948	1.009	4.297	0.962	1.583	0.318	0.361	0.963	1.546	0.210	0.218	0.959	1.515	0.128	0.137	0.945
6	0.5	$\hat{β}$	6.078	1.375	1.410	0.921	6.054	0.965	1.008	0.938	6.015	0.676	0.695	0.941	6.002	0.425	0.426	0.955
	0.5	$\hat{q}$	0.522	0.094	0.106	0.946	0.509	0.065	0.068	0.953	0.505	0.045	0.048	0.948	0.503	0.028	0.028	0.943
	1	$\hat{β}$	6.045	1.155	1.197	0.926	6.019	0.810	0.862	0.928	6.005	0.569	0.59	0.944	6.004	0.358	0.364	0.945
	1	$\hat{q}$	1.075	0.253	0.294	0.953	1.035	0.166	0.180	0.963	1.016	0.113	0.12	0.949	1.008	0.070	0.069	0.953
	1.5	$\hat{β}$	5.958	1.082	1.123	0.925	5.962	0.755	0.787	0.930	5.997	0.534	0.547	0.948	5.988	0.336	0.346	0.939
	1.5	$\hat{q}$	1.931	1.136	4.178	0.964	1.623	0.340	0.414	0.957	1.544	0.213	0.220	0.967	1.525	0.130	0.140	0.942

Table 3. Descriptive statistics for the application to bone cancer patients.

n	$\bar{x}$	s	$\sqrt{b_{1}}$	$b_{2}$
73	3.76	10.60	6.80	51.78

Table 4. Estimates for the 2SL, LSD and S2SL distributions.

Estimations	2SL (SE)	LSD (SE)	S2SL (SE)
$\hat{β}$	0.8245 (0.0509)	-	2.4243 (0.2969)
$\hat{θ}$	-	0.0343 (0.0371)	-
$\hat{σ}$	-	0.0233 (0.0258)	-
$\hat{q}$	-	2.3092 (0.7187)	1.4611 (0.2903)
AIC	468.1453	290.0614	282.5487
BIC	470.4357	296.9328	287.1296
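Both criteria are computed from the same maximized log-likelihood ℓ, with AIC = 2k − 2ℓ and BIC = k log(n) − 2ℓ, so the two rows of Table 4 can be cross-checked through BIC − AIC = k(log n − 2). The sketch below does this with n = 73 from Table 3; the parameter counts k = 1, 3, 2 are an assumption read off the number of estimates reported for each model.

```r
# Sample size (Table 3) and assumed number of parameters per model.
n <- 73
k <- c("2SL" = 1, "LSD" = 3, "S2SL" = 2)

# Values reported in Table 4.
aic <- c("2SL" = 468.1453, "LSD" = 290.0614, "S2SL" = 282.5487)
bic_reported <- c("2SL" = 470.4357, "LSD" = 296.9328, "S2SL" = 287.1296)

# BIC recovered from AIC through BIC - AIC = k * (log(n) - 2).
bic_from_aic <- aic + k * (log(n) - 2)

print(round(cbind(aic, bic_reported, bic_from_aic), 4))

# Model with the smallest value of each criterion.
best_aic <- names(which.min(aic))
best_bic <- names(which.min(bic_reported))
```

The same check applies to Table 6 with n = 40 from Table 5.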

Table 5. Descriptive statistics for the application to repair times.

n	$\bar{x}$	s	$\sqrt{b_{1}}$	$b_{2}$
40	4.01	5.17	1.85	10.02

Table 6. ML estimates for the 2SL, LSD and S2SL distributions.

Estimations	2SL (SE)	LSD (SE)	S2SL (SE)
$\hat{β}$	0.7787 (0.0647)	-	2.0323 (0.4068)
$\hat{θ}$	-	0.0158 (0.0141)	-
$\hat{σ}$	-	0.0113 (0.0102)	-
$\hat{q}$	-	1.3333 (0.4079)	1.2423 (0.3462)
AIC	223.5754	189.9217	187.8777
BIC	225.2642	194.9884	191.2554

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
