Some New Contributions on the Marshall–Olkin Length Biased Lomax Distribution: Theory, Modelling and Data Analysis

: The Lomax distribution is arguably one of the most useful lifetime distributions, explaining the developments of its extensions or generalizations through various schemes. The Marshall–Olkin length-biased Lomax distribution is one of these extensions. The associated model has been used in the frameworks of data ﬁtting and reliability tests with success. However, the theory behind this distribution is non-existent and the results obtained on the ﬁt of data were sufﬁciently encouraging to warrant further exploration, with broader comparisons with existing models. This study contributes in these directions. Our theoretical contributions on the the Marshall–Olkin length-biased Lomax distribution include an original compounding property, various stochastic ordering results, equivalences of the main functions at the boundaries, a new quantile analysis, the expressions of the incomplete moments under the form of a series expansion and the determination of the stress–strength parameter in a particular case. Subsequently, we contribute to the applicability of the Marshall–Olkin length-biased Lomax model. When combined with the maximum likelihood approach, the model is very effective. We conﬁrm this claim through a complete simulation study. Then, four selected real life data sets were analyzed to illustrate the importance and ﬂexibility of the model. Especially, based on well-established standard statistical criteria, we show that it outperforms six strong competitors, including some extended Lomax models, when applied to these data sets. To our knowledge, such comprehensive applied work has never been carried out for this model.


Introduction
The Lomax distribution introduced by [1] can be described as a simple two-parameter lifetime distribution with a varying polynomial decay. By denoting the shape parameter as α > 0 and the scale parameter as β > 0, it is specified by the following probability density function (pdf): and f L (x; α, β) = 0 for x < 0. Basically, the Lomax distribution corresponds to the famous Pareto distribution that has been shifted to the left until its support starts at 0 (see [2], p. 573). It has been widely used for the modeling of various measures in reliability and life testing from heavy tailed data. The literature on the Lomax distribution and its applications is vast, including [3][4][5][6][7][8][9], to name of few.
In particular, based on the concept of length-biased distribution pioneered by [20,24] introduced the length-biased Lomax (LBLO) distribution with the following pdf: where µ L denotes the mean of the Lomax distribution. That is, by taking into account that µ L = β/(α − 1) for α > 1, the pdf above can be expressed as for α > 1 and β > 0, and f LBLO (x; α, β) = 0 for x < 0. One can remark that f LBLO (0; α, β) = 0 against f L (0; α, β) = α/β. Thus the parameters α and β only governed the shapes of the pdf independently of the values of the initial value, contrary to the former Lomax distribution, while keeping a similar level of flexibility. In this sense, for some applied problems, the LBLO model is an interesting alternative to the Lomax model, with the same number of parameters. Further detail can be found in [20].
Recently, [23] proposed a new extension of the LBLO distribution called Marshall-Olkin length-biased Lomax (MOLBL) distribution. It consists in modifying the LBLO distribution via the Marshall-Olkin scheme pioneered by [25]. The aim is to add more flexibility to the LBLO distribution through ratio-type definitions of the main functions depending on a tuning parameter. Precisely, the corresponding pdf with the shape parameters α > 0 and γ > 0, and scale parameter β > 0 is f MOLBL (x; α, β, γ) = α(α − 1)γ β 2 x(1 + x/β) −(α+1) and f MOLBL (x; α, β, γ) for x < 0. Thus, the parameter γ modulates the denominator function; the LBLO distribution being recovered by taking γ = 1. In [23], the parameters of the MOLBL model are estimated by the maximum likelihood estimation method, without convergence evidence. The remission data set by [26] is analyzed, and it is proved that the MOLBL model has a better fit to the former LBLO model by considering the Akaike information criterion (AIC) and Bayesian information criterion (BIC), only. In addition, a reliability test plan is developed to accept or reject a submitted lot of products for inspection whose lifetime is directed to be a MOLBL distribution.
In this study, we complete the study of [23] on several important aspects, making significant theoretical and practical contributions to the MOLBL distribution. For the theoretical findings, (i) we prove that the MOLBL distribution can be derived by a simple compounding argument, (ii) new stochastic ordering properties are established, (iii) asymptotic equivalences are described for the first time with discussion on the role played by the parameters in this regard, (iv) a quantile analysis is performed with a special focus on the case α = 2, (v) the incomplete moments are expressed, as well as the ordinary moments, and (vi) the stress-strength parameter is determined for a special configuration on the parameters. For the practical contributions, (a) a complete simulation study guaranties the numerical convergence of the maximum likelihood estimates, (b) four different data sets are considered, and (c), for these data sets, six competitors are used, including some extended Lomax distributions. We show that the MOLBL model is the best based on the following benchmarks: AIC as well as its consistent version (CAIC), BIC, Hannan-Quinn information criterion (HQIC), Anderson-Darling (A * ), Cramer-von Mises (W * ), Kolmogorov-Smirnov (KS) and the p-value of the corresponding KS statistical test. A graphical analysis of the obtained fits is also provided, showing the high quality of the MOLBL model.
The paper is as follows. Section 2 is devoted to the theoretical contributions on the MOLBL distribution. Section 3 completes the above by offering the practical contributions. Finally, Section 4 contains some concluding remarks.

Theoretical Contributions
This section is devoted to new theoretical facts about the MOLBL distribution.

Main Functions of the MOLBL Distribution
We now recall the main functions of the MOLBL distribution, as sketched in [23]. First, the cumulative distribution function (cdf) of the MOLBL distribution is given as and F MOLBL (x; α, β, γ) = 0 for x < 0. The cdf of the LBLO distribution is obtained as a special case; it follows by substituting γ = 1 in Equation (4). Based on Equation (4), the survival function (sf) of the MOLBL distribution can be expressed as and S MOLBL (x; α, β, γ) = 1 for x < 0. We recall that the pdf of the MOLBL distribution is specified by Equation (3), corresponding to the derivative of F MOLBL (x; α, β, γ) with respect to x. From Equations (3) and (5), we can express the hazard rate function (hrf) of the MOLBL distribution by and h MOLBL (x; α, β, γ) = 0 for x < 0. The above analytical definitions are fundamental to explore the possibilities of the MOLBL model. They are used intensively in the remainder of the paper.
For the purposes of this study, the MOLBL distribution is denoted as MOLBL(α, β, γ) distribution when the parameters must be communicated.

Compounding
The following proposition shows that the MOLBL distribution follows from a special compounding distribution involving the classical exponential distribution with parameter γ.

Proposition 1. Let X and Y be continuous random variables such that
• the conditional cdf of X | {Y = y} is given as which is a well-identified cdf specified later, • Y has the exponential distribution with parameter γ > 0, that is with the pdf defined by f E (y; γ) = γe −γy for y ≥ 0 and f E (y; γ) = 0 for y < 0.
Proof. By using the distribution of X conditionally to {Y = y}, the cdf of X is obtained as This entails the desired result.
As announced, the conditional cdf of X | {Y = y} expressed in Equation (7) is well identified; it corresponds to the cdf of the Weibull-G family of distributions by [27] defined with the parameters α = y andβ = 1, and with the LBLO distribution as parental distribution. However, to our knowledge, it has not received a special attention, and can be the object of a future study.

Stochastic Ordering
The notion of first-order stochastic dominance is the most basic stochastic ordering concept. It consists in giving a mathematical setting to compare several distributions through their cdfs or, equivalently, their sfs. More precisely, we say that a distribution symbolized by A first-order stochastically dominates (FOSD) a distribution symbolized by B if their respective cdfs, say F A (x) and F B (x), satisfy the following inequality: F A (x) ≤ F B (x), for all x ∈ R. This concept finds numerous applications in actuarial sciences, econometrics, reliability and biometrics. One may refer to [28,29].
The next result shows several first order stochastic order dominance results for the MOLBL distributions based on all its parameters. Proposition 2. The following results hold.
The next result is about a hazard rate ordering satisfied by the MOLBL distribution. We say that a distribution symbolized by A dominates a distribution symbolized by B in the hazard rate ordering if their respective hrfs, say h A (x) and h B (x), satisfy the following inequality: h A (x) ≤ h B (x), for all x ∈ R. This kind of stochastic ordering is a useful concept in reliability and order statistics (see [30]).
Proof. On the basis of Equation (6), after differentiation and standard operations, we obtain which is clearly negative. Therefore, h MOLBL (x; α, β, γ) is decreasing with respect to γ, implying that, . This ends the proof of Proposition 3.
Note that, by the relation between the first-stochastic order dominance and hazard rate ordering, Proposition 3 implies Proposition 2 (iii).

Equivalences
The asymptotic behaviors of the main functions of the MOLBL distribution are useful to understand the role of the parameters played in the limit bounds and also, to prove the existence of important probabilistic quantities such as the moments. When x tends to 0, since the following equivalence at the order two holds: The last result implies that both f MOLBL (x; α, β, γ) and h MOLBL (x; α, β, γ) tend to 0 with a polynomial rate of degree 1. When x tends to +∞, the following equivalences hold: Therefore, f MOLBL (x; α, β, γ) and h MOLBL (x; α, β, γ) tend to 0 under all circumstances. This convergence is with a polynomial decay with degree α for f MOLBL (x; α, β, γ), and with a polynomial decay with degree 1 for h MOLBL (x; α, β, γ).
As a consequence, by the obtained equivalence: When x tends to +∞, implying the existence of the s th moments of the MOLBL distribution for any positive integer s satisfying this condition. Also, with a similar argument, we show that, for all t > 0, +∞ 0 e tx f MOLBL (x; α, β, γ)dx = +∞, meaning that the MOLBL distribution has a heavy (right) tail.

Incomplete Moments
The interests of the incomplete moment of a random variable or a distribution are (i) to generalize the notion of ordinary moments, (ii) to be involved in the definitions of important curves, deviation measures and functions, such as the Lorenz curve, mean deviation about the mean and mean residual life function. Discussions and applications on incomplete moments are available in [31,32]. Here, the incomplete moments of the MOLBL distribution are investigated, with discussion on the ordinary moments as well.
First, we need the following general integral result.

Lemma 1.
For any integer a ≥ 0, and real numbers b > 0, c > 0 and t ≥ 0, let us set Then, the following sum formula is valid: This equality is true for t → +∞ provided to b > 1 + a, and we have Proof. By performing the change of variables Since a is a positive integer, the classical binomial formula holds and we obtain For the case t → +∞, it is enough to notice that, for b > 1 + a, we have j − b + 1 < a − b + 1 < 0, implying that (1 + t/c) j−b+1 tends to 0. The desired result follows. This ends the proof of Lemma 1.
Lemma 1 can be used independently of interest, but will be at the heart for manageable series expression of the incomplete moments.
We are in the position to present the main results of this section, regarding the incomplete moments of the MOLBL distribution. The proposition below proposes a series expansion of any of these incomplete moments in the case γ ∈ (0, 1).

Proposition 4.
Let s be an integer and X be a random variable having the MOLBL(α, β, γ) distribution with γ ∈ (0, 1). Then, for t ≥ 0, the s th incomplete moment of X according to t is given as where I({X ≤ t}) is a random variable having the Bernoulli distribution with parameter P(X ≤ t), and .
Also, provided to s < α − 1, by applying t → +∞, the s th ordinary moment of X is given as for any x ≥ 0, based on the geometric series expansion, we can express f MOLBL (x; α, β, γ) in Equation (3) as Now, by the classical binomial formula, we obtain Therefore, by multiplication with x s , integrating over (0, +∞) with respect to x and introducing the integral function I(t; a, b, c) defined in Equation (8), it comes The desired result follows from Lemma 1 applied to I(t; a, b, c) with a = s + + 1, b = α(k + 1) + 1 and c = β, after some elementary simplifications. This concludes the proof of Proposition 4.
Based on Proposition 4, the following approximations are acceptable: where K denotes any large integer. Such finite sums can give precise numerical evaluations of moments, better in terms of error than computational integration procedures. The next proposition completes Proposition 4 by investigating the case γ > 1.

Proposition 5.
We adopt the same setting to Proposition 4 but with γ > 1. Then, for t ≥ 0, the s th incomplete moment of X according to t is given as where .
Also, provided to s < α − 1, by applying t → +∞, the s th ordinary moment of X is given as where Ω * k, ,m,j;s = −Ω k, ,m,j;s .

Proof.
Of course, the integral definition set in Equation (9) still holds. Now, remark that, after some developments, we can write for any x ≥ 0, the geometric series expansion gives The classical binomial formula applied two times in a row gives Through the use of the integral function I(t; a, b, c) defined in Equation (8), we obtain By virtue of Lemma 1 applied to I(t; a, b, c) with a = s + m + 1, b = α( + 1) + 1 and c = β, the stated result follows after some developments. The proof of Proposition 5 is ended.
Thanks to Proposition 5, the following approximations are possible: where K denotes any large integer. Such finite sums can give precise numerical evaluations of moments, better in terms of error than computational integration techniques. From the moments of the MOLBL distribution, under some condition on α, one can derive standard measures of centrality, dispersion, asymmetry and peakness, such as the mean (µ), variance (V), moments skewness coefficient (S) and moments kurtosis coefficient (K), respectively. They are classically defined by respectively, all existing for α > 5. Table 1 indicates numerical values for mean, variance, skewness and kurtosis of the MOLBL distribution for α = 6 and some selected values of parameters β and γ. For the considered values, we see that the MOLBL distribution is right skewed. Wide variations for the considered measures are observed.

Stress-Strength Parameter
The stress-strength parameter of a distribution naturally appears in many random systems and population comparison (see [33][34][35]). Here, we formulate a result on the expression of this parameter in the context of the MOLBL distribution. Proposition 6. Let us define the stress-strength parameter by R = P(Y ≤ X), where X and Y are independent random variables following the MOLBL(α, β, γ 1 ) and MOLBL(α, β, γ 2 ) distributions, respectively. Then, we have Proof. We follow the lines of ([36] [Section 2]). Based on the independence of X and Y, and the expressions of their pdf and sf in Equations (3) and (5), respectively, we get the following integral expression: By performing the change of variables y = (1 + x/β) −α (1 + αx/β), the above integral is reduced to We get the desired result.
Proposition 6 is the first step for the statistical treatment of R, as derived in [36], for instance.

Applied Contributions
We now focus on the applicability of the MOLBL model in a concrete statistical setting.

Estimation with Simulation
As developed in [23], the parameters α, β and γ of the MOLBL model can be estimated via the maximum likelihood method. That is, based on n data supposed to be drawn from the MOLBL distribution, say x 1 , . . . , x n , the maximum likelihood estimates (MLEs) of α, β and γ, sayα,β andγ, respectively, are defined by The log-likelihood function as well as the related score equations can be found in [23]. However, it is worth mentioning that the MLEsα,β andγ have no closed-form expressions. For practical purposes, they can be determined numerically by the use of statistical software. Here, we employ the R software with the package named maxLik (see [37]).
As a new contribution, we conduct a simulation study to check the asymptotic behavior of the MLEs of the model using Newton-Raphson method. The algorithm used in this simulation study is as follows.
Step 1: We chose the number of replications denoted by N.
Step 2: We chose the sample size denoted by n, the values of the parameters α, β, γ and an initial value denoted by x 0 . Step 3: We generate a value denoted by u from a random variable with the unit uniform distribution.
Step 4: We update x 0 by using the Newton formula in the following way: Step 5: For a small enough tolerance limit denoted by , if |x 0 − x | ≤ , we store x = x as a sample from MOLBL distribution.
Step 8: Compute the MLEs of the parameters.
Step 9: Repeat Steps 3-8 N times to generate N MLEs.
The results are obtained from N = 1000 replications. In each replication, a random sample of size n = 80, 120, 200, 300 and 800 is generated for different combinations of α, β and γ. Here, the considered values of α, β and γ are (1.5, 5, 0.5), (1.5, 5, 1), (2.5, 10, 0.5), (1.75, 10, 1) and (1.5, 8, 0.5). Tables 2-6 list the average MLEs, biases and the corresponding mean squared errors (MSEs). We recall that the average MLEs of α, β and γ are given bŷ respectively, the biases of α, β and γ are respectively, and the MSEs of α, β and γ are respectively. The values in Tables 2-6 show that, as the sample size increases, the MSEs of the estimates of the parameters tend to zero and the average estimates of the parameters tend closer to the true parameter values. One can notice that the convergence is slow for the estimation of β. This can be explained by the fact that it is taken relatively large in our experiments, i.e., at 5, 8 and 10. The overall numerical convergence can certainly be improved by using modern algorithms, such as the Simulated Annealing (SANN) described in [38]. Indeed, the SANN method guarantees a convergence that does not depend on the initial values, even when several local extrema are present. Further details and applications of this method can be found in [39]. Alternatively, Bayesian estimation can be investigated in a similar manner to the former Lomax distribution, as performed in [8]. However, these methods require additional developments that we leave for future work. Table 3. Average MLEs, biases and MSEs for α = 1.5, β = 5 and γ = 1.  Table 6. Average MLEs biases and MSEs for α = 1.5, β = 8 and γ = 0.5.

Applications to Four Data Sets
This section provides new applications to explore the potential of the MOLBL model with other six well known competitive models, namely the power Lomax (POLO) (see [19]), exponentiated Lomax (EXLO) (see [14]), Marshall-Olkin length-biased exponential (MOLBE) (see [40]), length-biased Lomax (LBLO), original Weibull and original Lomax models. The MOLBL, POLO and EXLO models have three parameters, whereas the MOLBE, LBLO, Weibull and Lomax models have two parameters. The pdfs of these competitive models are shown below.

•
The pdf of the POLO model is The pdf of the EXLO model is The pdf of the LBLO model is given as Equation (2), that is The pdf of the MOLBE model is and f MOLBE (x; α, β) = 0 for x < 0.

•
The pdf of the Weibull model is The pdf of the Lomax model is specified by Equation (1), that is and f L (x; α, β) = 0 for x < 0.
Four data sets were considered and analyzed, chosen for their interests as well as their different statistical natures (right-skewed, left-skewed, high peak, etc.) The model parameters were classically estimated by the maximum likelihood method, as described in Section 3.1 for the MOLBL model. Then, we compared the considered models by taking into account the AIC, CAIC, BIC, HQIC, A * , W * , KS and the p-value of the corresponding KS test. The best model is the one with the smallest values for the AIC, CAIC, BIC, HQIC, A * , W * , KS and the greatest value for the p-value of the KS test.
Data set 1: The data were extracted from [41]. It represents the survival times of a group of patients suffering from Head and Neck cancer disease and treated using radiotherapy. The data are as follows: 6 Table 7 shows the MLEs of the parameters of the considered models, with their standard errors.  Table 8 indicates the values of the AIC, CAIC, BIC, HQIC, A * , W * , KS and p-value of the considered models.  Table 9 shows the MLEs of the parameters of the considered models, with their standard errors.  Table 10 indicates the values of the AIC, CAIC, BIC, HQIC, A * , W * , KS and p-value of the considered models.  Table 11 shows the MLEs of the parameters of the considered models, with their standard errors.  Table 12 indicates the values of the AIC, CAIC, BIC, HQIC, A * , W * , KS and p-value of the considered models.  Data set 4: The data were taken from [44]. They represent the survival data on the death times of psychiatric patients admitted to the University of Iowa hospital. The data are as follows: 1, 1, 2, 22,30,28,32,11,14,36,31,33,33,37,35,25,31,22,26,24,35,34,30,35,40,39. Table 13 shows the MLEs of the parameters of the considered models, with their standard errors.   A graphical analysis is now performed, showing the fitted pdfs and cdfs of all the models. The fitted pdfs are superposed over the corresponding histogram of the data, and the estimated cdfs are superposed over the corresponding empirical cdf of the data. The plots are displayed in Figures 1-4, for Data sets 1, 2, 3 and 4, respectively.   In all the figures, we see that the MOLBL model better adjusted the empirical objects, making enough pliancy to adapt to the right or left skewness property of the data, as well as versatile peakness.

Concluding Remarks
The present study completes the work of [23] about the Marshall-Olkin length-biased Lomax distribution by providing important theoretical and applied contributions. New results on the following subjects are proved: (i) compounding, (ii) stochastic ordering, (iii) asymptotic equivalences of the main functions, (iv) quantile, (v) incomplete and ordinary moments, and (vi) stress-strength parameter. Thanks to a simulation study, the maximum likelihood estimates of the parameters of the Marshall-Olkin length-biased Lomax model are proved to be numerically efficient in the convergence sense. New applications are given, revealing that the Marshall-Olkin length-biased Lomax model is more powerful than expected; it can outperform the famous power Lomax, exponentiated Lomax, Marshall-Olkin length-biased exponential, length-biased Lomax, Weibull and Lomax models. This fact is illustrated by the analysis of four different data sets coming from real-life experiments. Graphic evidence is also provided.
We hope that the present study has revealed the potential of the Marshall-Olkin length-biased Lomax distribution for various probabilistic and statistical purposes, also opening new application horizons.
Author Contributions: Conceptualization, J.M. and C.C.; methodology, J.M. and C.C.; validation, J.M. and C.C.; formal analysis, J.M. and C.C.; investigation, J.M. and C.C.; writing-review and editing, J.M. and C.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.