A New Power Topp–Leone Generated Family of Distributions with Applications

In this paper, we introduce a new general family of distributions obtained by a subtle combination of two well-established families of distributions: the so-called power Topp–Leone-G and inverse exponential-G families. Its definition is centered around an original cumulative distribution function involving exponential and polynomial functions. Some desirable theoretical properties of the new family are discussed in full generality, with comprehensive results on stochastic ordering, quantile function and related measures, general moments and related measures, and the Shannon entropy. Then, a statistical parametric model is constructed from a special member of the family, defined with the use of the inverse Lomax distribution as the baseline distribution. The maximum likelihood method was applied to estimate the unknown model parameters. From the general theory of this method, the asymptotic confidence intervals of these parameters were deduced. A simulation study was conducted to evaluate the numerical behavior of the estimates we obtained. Finally, in order to highlight the practical perspectives of the new family, two real-life data sets were analyzed. All the measures considered are favorable to the new model in comparison to four serious competitors.


Introduction
Owing to the growing amount of data from various applied fields and unstoppable computer progress, there is increasing motivation to develop efficient and flexible statistical models. Such models can be derived from general families of distributions having desirable properties, such as those constructed from a generator distribution. The main idea of this construction is to add shape parameter(s) to a baseline distribution with the aim of upgrading its flexibility level. Among the well-known examples of such families, there are the beta-G [1], Kumaraswamy-G [2], Weibull-G [3], Garhy-G [4], type II half logistic-G [5], transmuted Topp-Leone-G [6], generalized odd log-logistic-G [7], odd Fréchet-G [8], power Lindley-G [9], Fréchet Topp-Leone-G [10], exponentiated generalized Topp-Leone-G [11], and truncated inverted Kumaraswamy-G [12]. We also refer to the exhaustive survey in [13]. Recently, several researchers used the Topp-Leone (TL) distribution as
1. To provide very simple models and create new simple distributions.
2. To improve the flexibility of existing distributions on various aspects (such as mode, median, skewness, and kurtosis).
3. To provide better fits than competing modified models having the same or a higher number of parameters.
We support these claims both in full generality and by highlighting the special member of the NPTL-G family defined with the inverse Lomax (ILx) distribution as the baseline distribution (the reason for this choice will be explained later). The resulting distribution, called the new power Topp-Leone inverse Lomax (NPTLILx) distribution, offers a new three-parameter lifetime distribution with a high potential of applicability. We illustrate this by means of two practical data sets with different features: the first one is from [23] and concerns active repair times for an airborne communication transceiver, and the second one is from [24] and concerns actual tax revenue in Egypt. Favorable results were obtained for the proposed model in comparison to serious competitors, motivating its wider use in statistical applications.
The contents of this paper are organized as follows. In Section 2, the basics of the NPTL-G family are presented, as is the NPTLILx distribution. Various mathematical properties of the family are discussed in Section 3. Section 4 is devoted to the estimation of the unknown parameters from the NPTLILx model, with a comprehensive simulation study. The data analyses are shown in Section 5 with numerical and graphical illustrations. A conclusion and perspectives are formulated in Section 6.

Basics of the NPTL-G Family
The basics of the NPTL-G family are presented in this section, with a focus on the main functions of interest.
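For reference, the cdf and pdf of the family can be written as below. This form is a reconstruction and should be read as an assumption rather than a quotation: it is the one consistent with the quadratic equation solved in the proof of Proposition 2 and with the tail equivalence f(x; α, β, θ) ∼ 2αβ²θ²x⁻³ stated later for the NPTLILx distribution. The symbol g(x; ξ), for the pdf corresponding to G(x; ξ), is also assumed here.

```latex
% Assumed form of the cdf, labeled (1), with \alpha, \beta > 0
F(x;\alpha,\beta,\xi)
  = e^{\alpha\beta\left(1-\frac{1}{G(x;\xi)}\right)}
    \left[2-e^{\beta\left(1-\frac{1}{G(x;\xi)}\right)}\right]^{\alpha},
  \qquad x\in\mathbb{R}. \tag{1}

% Assumed form of the pdf, labeled (2), obtained by differentiating (1)
f(x;\alpha,\beta,\xi)
  = 2\alpha\beta\,\frac{g(x;\xi)}{G(x;\xi)^{2}}\,
    e^{\alpha\beta\left(1-\frac{1}{G(x;\xi)}\right)}
    \left[1-e^{\beta\left(1-\frac{1}{G(x;\xi)}\right)}\right]
    \left[2-e^{\beta\left(1-\frac{1}{G(x;\xi)}\right)}\right]^{\alpha-1}. \tag{2}
```

Setting y = e^{β(1 − 1/G(x; ξ))}, the cdf reads F = [y(2 − y)]^α, which is the power Topp-Leone transform applied to an inverse exponential-G type cdf.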
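Furthermore, when G(x; ξ) → 1, we have $f(x; \alpha, \beta, \xi) \sim 2\alpha\beta^{2}\, g(x; \xi)\,[1 - G(x; \xi)]$, where g(x; ξ) denotes the pdf corresponding to G(x; ξ). The variations of f(x; α, β, ξ) can be studied in a standard manner, starting with the critical point(s) given by the solution(s) of the non-linear equation according to x: $\{\ln[f(x; \alpha, \beta, \xi)]\}' = 0$.
Then, for a critical point $x_c$, the sign of $\{\ln[f(x; \alpha, \beta, \xi)]\}''|_{x = x_c}$ is informative on its nature (minimum, maximum, or inflection point).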

Hazard Rate Function
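The hazard rate function (hrf) of the NPTL-G family is given by $h(x; \alpha, \beta, \xi) = f(x; \alpha, \beta, \xi) / [1 - F(x; \alpha, \beta, \xi)]$. Some asymptotic results on h(x; α, β, ξ) are presented below. When G(x; ξ) → 0, we have $h(x; \alpha, \beta, \xi) \sim 2^{\alpha}\alpha\beta\, g(x; \xi)\, G(x; \xi)^{-2}\, e^{\alpha\beta(1 - 1/G(x; \xi))}$, where g(x; ξ) denotes the pdf corresponding to G(x; ξ). When G(x; ξ) → 1, we have $h(x; \alpha, \beta, \xi) \sim 2 g(x; \xi) / [1 - G(x; \xi)]$. Thus, the parameters α and β have a significant effect on the asymptotes when G(x; ξ) → 0, but no effect when G(x; ξ) → 1.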
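The cdf, pdf, and hrf can be evaluated from the baseline values G(x; ξ) and g(x; ξ) alone. The following is a minimal sketch, assuming the cdf form F = [y(2 − y)]^α with y = e^{β(1 − 1/G)}; this form is inferred from the proofs and tail behavior in the paper and is not quoted from it. The function names and the unit exponential baseline are hypothetical and serve only as an illustration.

```python
import math

def nptl_cdf(G, alpha, beta):
    """cdf of the family evaluated from the baseline cdf value G in (0, 1]."""
    y = math.exp(beta * (1.0 - 1.0 / G))
    return (y * (2.0 - y)) ** alpha

def nptl_pdf(G, g, alpha, beta):
    """pdf obtained by differentiating nptl_cdf; g is the baseline pdf value."""
    y = math.exp(beta * (1.0 - 1.0 / G))
    return (2.0 * alpha * beta * g / G**2
            * y**alpha * (1.0 - y) * (2.0 - y) ** (alpha - 1.0))

def nptl_hrf(G, g, alpha, beta):
    """hrf defined as pdf / (1 - cdf)."""
    return nptl_pdf(G, g, alpha, beta) / (1.0 - nptl_cdf(G, alpha, beta))

# Hypothetical baseline used only for illustration: unit exponential.
def base_cdf(x):
    return 1.0 - math.exp(-x)

def base_pdf(x):
    return math.exp(-x)

# Right-tail behavior: the hrf should be close to 2 g / (1 - G),
# whatever the values of alpha and beta.
alpha, beta, x = 2.0, 1.5, 8.0
h_right = nptl_hrf(base_cdf(x), base_pdf(x), alpha, beta)
limit_right = 2.0 * base_pdf(x) / (1.0 - base_cdf(x))
print(h_right, limit_right)
```

The two printed values should nearly coincide, which illustrates that α and β do not influence the hrf when G(x; ξ) → 1.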

A Special Member: The NPTLILx Distribution
The NPTL-G family contains distributions of various natures, depending on the choice of the baseline distribution. In this study, as mentioned in the introduction, we chose the inverse Lomax distribution with shape parameter θ > 0 as the baseline distribution to define the NPTLILx distribution. The baseline is defined by the following cdf: $G(x; \theta) = (1 + x^{-1})^{-\theta}$, x > 0 (another parameter of the original definition of the inverse Lomax distribution has been set to 1 for the purposes of the paper). Let us now briefly motivate this choice. As suggested by its name, the inverse Lomax distribution is the distribution of the random variable Y = 1/X, where X denotes a random variable following the standard Lomax distribution (with parameters θ and 1). The corresponding pdf and hrf are, respectively, given by $g(x; \theta) = \theta x^{-2}(1 + x^{-1})^{-\theta - 1}$ and $h_{G}(x; \theta) = \theta x^{-2}(1 + x^{-1})^{-\theta - 1} / [1 - (1 + x^{-1})^{-\theta}]$, x > 0. In addition to being simple, this distribution has proven to be very flexible for modeling data with an underlying non-monotonic hrf. Further details and applications can be found in [25][26][27]. Thus, the NPTLILx distribution is defined by the following cdf: $F(x; \alpha, \beta, \theta) = e^{\alpha\beta[1 - (1 + x^{-1})^{\theta}]}\{2 - e^{\beta[1 - (1 + x^{-1})^{\theta}]}\}^{\alpha}$, x > 0. (3) Possible shapes of the pdf and hrf of the NPTLILx distribution are illustrated in Figures 1 and 2, respectively. In particular, from Figure 1, we see that the pdf can be right skewed and reversed-J shaped. From Figure 2, we see that the hrf can be increasing, decreasing, upside-down bathtub shaped, and bathtub shaped. All these curvature properties are known to be desirable to create flexible statistical models.
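As an illustration of the special member, the sketch below evaluates the NPTLILx cdf and pdf, assuming the inverse Lomax baseline G(x; θ) = (1 + 1/x)^(−θ) and the cdf form F = [y(2 − y)]^α with y = e^{β(1 − 1/G)}. Both are reconstructions consistent with the text, not quotations. The function names are hypothetical; `math.expm1` and `math.log1p` are used only to limit rounding errors in the tails.

```python
import math

def ilx_cdf(x, theta):
    """Inverse Lomax cdf with the second parameter fixed at 1."""
    return (1.0 + 1.0 / x) ** (-theta)

def nptlilx_cdf(x, alpha, beta, theta):
    # t = 1 - 1/G(x) = 1 - (1 + 1/x)^theta, computed in a stable way
    t = -math.expm1(theta * math.log1p(1.0 / x))
    y = math.exp(beta * t)
    return (y * (2.0 - y)) ** alpha

def nptlilx_pdf(x, alpha, beta, theta):
    t = -math.expm1(theta * math.log1p(1.0 / x))
    y = math.exp(beta * t)
    one_minus_y = -math.expm1(beta * t)
    G = ilx_cdf(x, theta)
    g = theta * x ** (-2.0) * (1.0 + 1.0 / x) ** (-theta - 1.0)
    return (2.0 * alpha * beta * g / G**2
            * y**alpha * one_minus_y * (2.0 - y) ** (alpha - 1.0))

# Tail check: f(x) * x^3 / (2 alpha beta^2 theta^2) approaches 1 as x grows.
alpha, beta, theta = 2.0, 1.5, 0.5
x = 1.0e4
tail_ratio = (nptlilx_pdf(x, alpha, beta, theta) * x**3
              / (2.0 * alpha * beta**2 * theta**2))
print(tail_ratio)
```

The tail ratio compares the pdf with the equivalent 2αβ²θ²x⁻³ given in the discussion of the moments; a value near 1 supports the assumed form.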

Some Mathematical Properties
This section presents some important mathematical properties of the NPTL-G family.

On a Stochastic Ordering
The following result shows some inequalities involving F(x; α, β, ξ).

Proposition 1.
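For any x ∈ R such that G(x; ξ) > 0, the following inequalities hold: $e^{\alpha\beta(1 - 1/G(x; \xi))}[2 - G(x; \xi)^{\beta}]^{\alpha} \le F(x; \alpha, \beta, \xi) \le 2^{\alpha} e^{\alpha\beta(1 - 1/G(x; \xi))}$.
Proof. The bracket term in the definition of F(x; α, β, ξ) given by (1) is central. Since $2 - e^{\beta(1 - 1/G(x; \xi))} \le 2$, we have $[2 - e^{\beta(1 - 1/G(x; \xi))}]^{\alpha} \le 2^{\alpha}$, implying the second inequality. For the first inequality, the following well-known logarithmic inequality: for y > −1, $y/(1 + y) \le \ln(1 + y)$, applied with y = G(x; ξ) − 1, gives $e^{1 - 1/G(x; \xi)} \le G(x; \xi)$, and hence $2 - e^{\beta(1 - 1/G(x; \xi))} \ge 2 - G(x; \xi)^{\beta}$. The first inequality follows. This ends the proof of Proposition 1.
An immediate consequence of Proposition 1 is the following stochastic ordering result: since $2 - G(x; \xi)^{\beta} \ge 1$, we have $F(x; \alpha, \beta, \xi) \ge H_{*}(x; \alpha\beta, \xi)$, where $H_{*}(x; \alpha\beta, \xi) = e^{\alpha\beta(1 - 1/G(x; \xi))}$ is the cdf of the exponentiated IE-G family (with power parameter αβ).
Another stochastic ordering result comes from the following remark: the function $F_{o}(x; \alpha, \beta, \xi)$ given by $F_{o}(x; \alpha, \beta, \xi) = e^{\alpha\beta(1 - 1/G(x; \xi))}[2 - G(x; \xi)^{\beta}]^{\alpha}$ has the properties of a cdf, with the corresponding pdf given by $f_{o}(x; \alpha, \beta, \xi) = \alpha\beta\, g(x; \xi)\, G(x; \xi)^{-2}\, e^{\alpha\beta(1 - 1/G(x; \xi))}[2 - G(x; \xi)^{\beta}]^{\alpha - 1}[2 - G(x; \xi)^{\beta} - G(x; \xi)^{\beta + 1}]$, where g(x; ξ) denotes the pdf corresponding to G(x; ξ). To the best of our knowledge, it is new in the literature (and outside the scope of this paper).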
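The inequalities of Proposition 1 can be checked numerically on a grid of baseline values. The sketch below assumes the bounds e^{αβ(1 − 1/G)}(2 − G^β)^α ≤ F ≤ 2^α e^{αβ(1 − 1/G)}, which are reconstructed from the proof and should be treated as an assumption; the function names are hypothetical.

```python
import math

def prop1_terms(G, alpha, beta):
    """Return (lower bound, cdf value, upper bound) for a baseline value G."""
    t = 1.0 - 1.0 / G
    ie_power = math.exp(alpha * beta * t)   # exponentiated IE-G cdf, power alpha*beta
    F = ie_power * (2.0 - math.exp(beta * t)) ** alpha
    lower = ie_power * (2.0 - G**beta) ** alpha
    upper = 2.0**alpha * ie_power
    return lower, F, upper

def check_prop1(alpha, beta, tol=1e-12):
    """Check lower <= F <= upper for G = 0.01, 0.02, ..., 1 (up to rounding)."""
    for k in range(1, 101):
        lower, F, upper = prop1_terms(k / 100.0, alpha, beta)
        if not (lower <= F * (1.0 + tol) and F <= upper * (1.0 + tol)):
            return False
    return True

print(check_prop1(2.0, 1.5), check_prop1(0.5, 3.0))
```

A grid check of this kind is not a proof; it only guards against sign or exponent mistakes when the bounds are implemented.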

Quantile Function with Some Related Measures and Functions
The quantile function (qf) of the NPTL-G family is expressed in the following result.

Proposition 2.
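The qf of the NPTL-G family is given by $Q(u; \alpha, \beta, \xi) = Q_{G}\left(\left[1 - \beta^{-1}\ln\left(1 - \sqrt{1 - u^{1/\alpha}}\right)\right]^{-1}; \xi\right)$, u ∈ (0, 1), where $Q_{G}(u; \xi)$ is the qf corresponding to G(x; ξ).
Proof. For the sake of simplicity, let us set $x_{u} = Q(u; \alpha, \beta, \xi)$ for u ∈ (0, 1). Then, by the definition of a qf, $x_{u}$ satisfies the non-linear equation: $u = F(x_{u}; \alpha, \beta, \xi)$, implying that $u^{1/\alpha} = y(2 - y)$ with $y = e^{\beta(1 - 1/G(x_{u}; \xi))}$, which is equivalent to solving the polynomial equation according to y: $y^{2} - 2y + u^{1/\alpha} = 0$. By determining the two roots of this polynomial, and keeping only the one in the unit interval (since y ∈ (0, 1)), we get $y = 1 - \sqrt{1 - u^{1/\alpha}}$. After some algebra, we get $G(x_{u}; \xi) = \left[1 - \beta^{-1}\ln\left(1 - \sqrt{1 - u^{1/\alpha}}\right)\right]^{-1}$. The desired result follows by compounding with $Q_{G}(u; \xi)$, ending the proof of Proposition 2.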
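The inversion in the proof can be sketched in code and verified by a round trip through the cdf. This is a sketch under stated assumptions: the cdf form F = [y(2 − y)]^α with y = e^{β(1 − 1/G)} is inferred from the quadratic equation in the proof, the inverse Lomax baseline is used as an example, and all function names are hypothetical. The root 1 − sqrt(1 − ε) is written as ε/(1 + sqrt(1 − ε)), an algebraically equivalent form that is more accurate for small ε.

```python
import math

def baseline_level(u, alpha, beta):
    """Value of G(x_u) solving u = F(x_u), for u in (0, 1)."""
    eps = u ** (1.0 / alpha)
    # Root of y^2 - 2y + eps = 0 lying in (0, 1), written in a stable form.
    y = eps / (1.0 + math.sqrt(1.0 - eps))
    return 1.0 / (1.0 - math.log(y) / beta)

def nptl_quantile(u, alpha, beta, base_qf):
    """qf of the family: baseline qf composed with baseline_level."""
    return base_qf(baseline_level(u, alpha, beta))

def nptl_cdf(G, alpha, beta):
    y = math.exp(beta * (1.0 - 1.0 / G))
    return (y * (2.0 - y)) ** alpha

# Example baseline: inverse Lomax with shape theta (second parameter set to 1).
theta = 0.5

def ilx_cdf(x):
    return (1.0 + 1.0 / x) ** (-theta)

def ilx_qf(v):
    return 1.0 / (v ** (-1.0 / theta) - 1.0)

for u in (0.1, 0.5, 0.9):
    x_u = nptl_quantile(u, 2.0, 1.5, ilx_qf)
    print(u, x_u, nptl_cdf(ilx_cdf(x_u), 2.0, 1.5))
```

In each printed row, the last value should reproduce u up to rounding errors.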
From the qf, we can define several quantities of importance, providing distributional properties of the family. Some of them are presented below.
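The three quartiles of the NPTL-G family are defined by $Q_{1} = Q(1/4; \alpha, \beta, \xi)$, $Q_{2} = Q(1/2; \alpha, \beta, \xi)$, and $Q_{3} = Q(3/4; \alpha, \beta, \xi)$. In particular, the median of the NPTL-G family is given by $M = Q_{2} = Q_{G}\left(\left[1 - \beta^{-1}\ln\left(1 - \sqrt{1 - 2^{-1/\alpha}}\right)\right]^{-1}; \xi\right)$. Additionally, the inter-quartile range is given by $IQR = Q_{3} - Q_{1}$, allowing one to define the Galton coefficient of skewness and the Moors coefficient of kurtosis, given by, respectively, $S = (Q_{3} - 2Q_{2} + Q_{1}) / IQR$ and $K = [Q(7/8; \alpha, \beta, \xi) - Q(5/8; \alpha, \beta, \xi) + Q(3/8; \alpha, \beta, \xi) - Q(1/8; \alpha, \beta, \xi)] / IQR$. See [28,29] for more details on these coefficients, respectively. On the other hand, upon differentiation of Q(u; α, β, ξ) according to u, the corresponding quantile density function is given by $q(u; \alpha, \beta, \xi) = \dfrac{u^{1/\alpha - 1}\, w(u)^{2}}{2\alpha\beta \sqrt{1 - u^{1/\alpha}}\left[1 - \sqrt{1 - u^{1/\alpha}}\right]}\, q_{G}(w(u); \xi)$, with $w(u) = \left[1 - \beta^{-1}\ln\left(1 - \sqrt{1 - u^{1/\alpha}}\right)\right]^{-1}$, where $q_{G}(u; \xi)$ is the quantile density function corresponding to G(x; ξ). Also, the hazard quantile function is defined by $H(u; \alpha, \beta, \xi) = 1 / [(1 - u)\, q(u; \alpha, \beta, \xi)]$. These functions have central roles in reliability. Further details can be found in [30].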
Last but not least, the qf allows us to generate values from members of the NPTL-G family. This property will be used in Section 4.2 in the context of the NPTLILx distribution; i.e., with the qf given by $Q(u; \alpha, \beta, \theta) = \left\{\left[1 - \beta^{-1}\ln\left(1 - \sqrt{1 - u^{1/\alpha}}\right)\right]^{1/\theta} - 1\right\}^{-1}$, u ∈ (0, 1). As a numerical illustration, Table 1 shows the values of $Q_{1}$, M, $Q_{3}$, S, and K of the NPTLILx distribution for some parameter values. We see in Table 1 that the effects of α, β, and θ on the quartiles are significant. Moreover, we always have S > 0, so the distribution is right-skewed, and K shows only moderate variations.
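Quantile-based measures of the kind reported in Table 1 can be computed directly from the qf. The following sketch assumes the NPTLILx qf Q(u) = {[1 − β⁻¹ ln(1 − sqrt(1 − u^{1/α}))]^{1/θ} − 1}⁻¹, reconstructed from Proposition 2 with the inverse Lomax baseline, together with the usual Galton and Moors definitions. The function names and parameter values are hypothetical, and no attempt is made to reproduce the entries of Table 1.

```python
import math

def nptlilx_quantile(u, alpha, beta, theta):
    eps = u ** (1.0 / alpha)
    y = eps / (1.0 + math.sqrt(1.0 - eps))   # equals 1 - sqrt(1 - eps)
    w = 1.0 - math.log(y) / beta             # equals 1 / G(x_u), always > 1
    return 1.0 / (w ** (1.0 / theta) - 1.0)

def quantile_summary(alpha, beta, theta):
    """Quartiles, Galton skewness S and Moors kurtosis K."""
    def Q(u):
        return nptlilx_quantile(u, alpha, beta, theta)
    q1, m, q3 = Q(0.25), Q(0.5), Q(0.75)
    iqr = q3 - q1
    S = (q3 - 2.0 * m + q1) / iqr
    K = (Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8)) / iqr
    return {"Q1": q1, "M": m, "Q3": q3, "S": S, "K": K}

summary = quantile_summary(alpha=2.0, beta=1.5, theta=0.5)
for name, value in summary.items():
    print(name, round(value, 4))
```

Because S and K depend only on quantiles, they remain well defined even though the variance of the NPTLILx distribution does not exist.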

Series Expansion
The exp-G family of distributions, introduced by [31], is defined by the following cdf: $H_{\gamma}(x; \xi) = G(x; \xi)^{\gamma}$, x ∈ R, with γ > 0. The corresponding pdf is given by $h_{\gamma}(x; \xi) = \gamma\, g(x; \xi)\, G(x; \xi)^{\gamma - 1}$, where g(x; ξ) denotes the pdf corresponding to G(x; ξ). The exp-G family is of interest because its properties are well known for many baseline cdfs G(x; ξ). For instance, the member of the exp-G family defined with the inverse Lomax distribution with shape parameter θ as the baseline is the inverse Lomax distribution with shape parameter γθ.
The following result concerns a series expansion for the pdf of the NPTL-G family in terms of pdfs of the exp-G family.

Proposition 3.
We have the following series expansion: Proof. We first investigate a series expansion of F(x; α, β, ξ) based on the Equation (1). Since On the other hand, thanks to the power series of the exponential function, we get . Now, it follows from the generalized and standard binomial formulas that By combining all the above equalities together, we obtain Upon differentiation of F(x; α, β, ξ) according to x, we get the desired result, by removing the term in q = 0, which vanished. Proposition 3 is proven.

General Moments with Some Related Measures and Functions
Let X be a random variable having the cdf given by (1) (defined on a probability space (Ω, A, P), with an expectation denoted by E). Then, for any function φ(x) (such that all the following introduced quantities exist or converge), we have Two equivalent expressions involving already introduced qfs are as follows: Numerical solutions exist to evaluate them for given G(x; ξ), φ(x) and α, β, and θ. Alternatively, we can consider Proposition 3, which implies that where In some circumstances, truncated sums can be considered for practical purposes; for a large integer K, the following approximation reveals to be tractable and efficient: Some specific choices for φ(x) are of particular interest. Some of them are discussed below.
• By taking $\varphi(x) = x^{s}$, we get the s-th moment of X, i.e., $\mu_{s} = E(X^{s})$, including the mean of X, i.e., $\mu = \mu_{1} = E(X)$. This allows us to express the variance of X, i.e., $\sigma^{2} = \mu_{2} - \mu^{2}$, and to calculate the s-th general coefficient of X given by $C_{s} = \mu_{s}/\sigma^{s}$, among others. This coefficient is useful to investigate the skewness and kurtosis properties of X.
• By taking $\varphi(x) = e^{tx}$, we get the moment generating function of X according to the variable t; i.e., $M(t) = E(e^{tX})$.
• By taking $\varphi(x) = e^{itx}$, we get the characteristic function of X according to the variable t; i.e., $\phi(t) = E(e^{itX})$. Like the cdf, the characteristic function entirely determines the NPTL-G family.
• By taking $\varphi(x) = x^{s} 1_{\{x \le y\}}$, which is equal to $x^{s}$ if x ≤ y and 0 otherwise, we get the s-th incomplete moment of X according to the variable y; i.e., $\mu_{s}(y) = E(X^{s} 1_{\{X \le y\}})$. This function is useful to define mean deviations of X, the corresponding residual life function, Bonferroni and Lorenz curves, and others.
In the case of the NPTLILx distribution, since $f(x; \alpha, \beta, \theta) \sim 2\alpha\beta^{2}\theta^{2} x^{-3}$ when x → +∞, the mean exists but the variance does not, nor do the moments of order greater than 2 (there is no problem when x → 0). However, all the incomplete moments exist for any fixed y > 0. In this regard, Table 2 provides the first four incomplete moments of X with y = 1000.
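Incomplete moments can be approximated numerically through the qf, using the identity E(X^s 1{X ≤ y}) = ∫ from 0 to F(y) of Q(u)^s du, which follows from X = Q(U) with U uniform on (0, 1). The sketch below uses a plain midpoint rule and assumes the reconstructed NPTLILx cdf and qf; the function names are hypothetical, and this is not the procedure used to produce Table 2. For large y and s the integrand varies strongly near the upper limit, so many more steps (or an adaptive rule) would be needed.

```python
import math

def nptlilx_cdf(x, alpha, beta, theta):
    t = -math.expm1(theta * math.log1p(1.0 / x))   # 1 - (1 + 1/x)^theta
    y = math.exp(beta * t)
    return (y * (2.0 - y)) ** alpha

def nptlilx_quantile(u, alpha, beta, theta):
    eps = u ** (1.0 / alpha)
    y = eps / (1.0 + math.sqrt(1.0 - eps))
    w = 1.0 - math.log(y) / beta
    return 1.0 / (w ** (1.0 / theta) - 1.0)

def incomplete_moment(s, y, alpha, beta, theta, steps=20000):
    """Midpoint-rule approximation of E[X^s 1{X <= y}]."""
    upper = nptlilx_cdf(y, alpha, beta, theta)
    width = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * width
        total += nptlilx_quantile(u, alpha, beta, theta) ** s
    return total * width

alpha, beta, theta = 2.0, 1.5, 0.5
for s in range(1, 5):
    print(s, incomplete_moment(s, 10.0, alpha, beta, theta))
```

For s = 0 the routine returns F(y), which gives a quick sanity check of the integration limits.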

Shannon Entropy
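Here, we study the Shannon entropy of the NPTL-G family as defined by [32]. We recall that the Shannon entropy of a random variable measures the amount of uncertainty for the outcome of this variable. A high entropy reveals a high degree of uncertainty. Now, let X be a random variable having the cdf given by (1). Then, the Shannon entropy of X is defined by $\eta = -E\{\ln[f(X; \alpha, \beta, \xi)]\} = -\int_{-\infty}^{+\infty} f(x; \alpha, \beta, \xi)\ln[f(x; \alpha, \beta, \xi)]\,dx$. By the use of any mathematical software, for a given baseline cdf G(x; ξ) and given parameter values, we can determine this integral. Another approach consists of developing η by the use of the pdf given by (2): $\eta = -\ln(2\alpha\beta) - E\{\ln[g(X; \xi)]\} + 2E\{\ln[G(X; \xi)]\} - \alpha\beta E\{1 - 1/G(X; \xi)\} - E\{\ln[1 - e^{\beta(1 - 1/G(X; \xi))}]\} - (\alpha - 1)E\{\ln[2 - e^{\beta(1 - 1/G(X; \xi))}]\}$, where g(x; ξ) denotes the pdf corresponding to G(x; ξ). Some expectation terms can be expressed by using (5) with an appropriate function φ(x) as soon as $U_{q}(\varphi, G)$ exists and the sums converge.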
In the context of the NPTLILx distribution, some values of η are collected in Table 3 for some parameter values. In Table 3, the values belong to the wide interval [−12.8, 3.55], meaning that α, β, and θ have an important impact on the amount of information quantified by η.

Estimation with Numerical Results
In this section, we investigate the NPTLILx model characterized by the cdf given by (3). Thanks to its attractive theoretical and practical properties, the maximum likelihood method is used to estimate the parameters α, β, and θ. Numerical results attest to the efficiency of the estimates obtained.
Hereafter, we consider a random variable X following the NPTLILx distribution with parameters α, β, and θ.
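The corresponding observed Fisher information matrix is given by $J(\alpha, \beta, \theta) = -\left(\partial^{2}\ell / \partial a\,\partial b\right)_{a, b \in \{\alpha, \beta, \theta\}}$, where ℓ denotes the log-likelihood function (the elements of J(α, β, θ) are available upon request from the authors). When n is large, the distribution of the underlying random vector behind $(\hat{\alpha}, \hat{\beta}, \hat{\theta})$ can be approximated by a three-dimensional normal distribution with mean vector (α, β, θ) and covariance matrix $J(\hat{\alpha}, \hat{\beta}, \hat{\theta})^{-1}$. By denoting $v_{\hat{\alpha}}$, $v_{\hat{\beta}}$, and $v_{\hat{\theta}}$ the diagonal elements of this matrix, we are able to construct asymptotic confidence intervals for α, β, and θ. Indeed, with the adopted notations, the asymptotic (equitailed) confidence intervals (CIs) of α, β, and θ at the level 100(1 − γ)% are given by, respectively, $CI_{\alpha} = [\hat{\alpha} - z_{\gamma/2}\sqrt{v_{\hat{\alpha}}},\ \hat{\alpha} + z_{\gamma/2}\sqrt{v_{\hat{\alpha}}}]$, $CI_{\beta} = [\hat{\beta} - z_{\gamma/2}\sqrt{v_{\hat{\beta}}},\ \hat{\beta} + z_{\gamma/2}\sqrt{v_{\hat{\beta}}}]$, and $CI_{\theta} = [\hat{\theta} - z_{\gamma/2}\sqrt{v_{\hat{\theta}}},\ \hat{\theta} + z_{\gamma/2}\sqrt{v_{\hat{\theta}}}]$, where $z_{\gamma/2}$ is the upper γ/2-th percentile of the normal distribution N(0, 1). For practical purposes, if the lower bound of one of these intervals is negative, we can set it to 0, since all the parameters are supposed to be positive. All the technical details can be found in [33].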
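The interval construction described above can be sketched as follows, assuming the usual Wald form estimate ± z_{γ/2} × sqrt(variance) with the lower bound truncated at 0. The function name and the numerical inputs are hypothetical; in practice the variance would be the relevant diagonal element of the inverse observed information matrix.

```python
import math
from statistics import NormalDist

def asymptotic_ci(estimate, variance, level=0.95):
    """Equitailed asymptotic CI for a positive parameter."""
    gamma = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)   # upper gamma/2 percentile of N(0, 1)
    half_width = z * math.sqrt(variance)
    lower = max(0.0, estimate - half_width)       # parameters are assumed positive
    return lower, estimate + half_width

# Hypothetical MLE and variance, for illustration only.
print(asymptotic_ci(1.2, 0.09, level=0.90))
print(asymptotic_ci(1.2, 0.09, level=0.95))
```

The 95% interval is wider than the 90% one, and a large variance relative to the estimate triggers the truncation at 0.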

Numerical Results
Here, we provide a simulation study to show the nice behavior of the MLEs for the NPTLILx model presented in the subsection above. First of all, let us mention that a random sample from X can be obtained by the use of the qf: for any random sample of size n from the uniform distribution U (0, 1), say u 1 , . . . , u n , the corresponding random sample of size n of X is given by x 1 , . . . , x n with x i = Q(u i ; α, β, θ).
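The inverse transform step described above can be sketched as follows. The sketch assumes the reconstructed NPTLILx qf Q(u) = {[1 − β⁻¹ ln(1 − sqrt(1 − u^{1/α}))]^{1/θ} − 1}⁻¹; the function names are hypothetical, and Python's `random` module stands in for the software used in the paper.

```python
import math
import random

def nptlilx_quantile(u, alpha, beta, theta):
    eps = u ** (1.0 / alpha)
    if not 0.0 < eps < 1.0:
        raise ValueError("u is too close to 0 or 1 for double precision")
    y = eps / (1.0 + math.sqrt(1.0 - eps))   # equals 1 - sqrt(1 - eps)
    w = 1.0 - math.log(y) / beta
    return 1.0 / (w ** (1.0 / theta) - 1.0)

def sample_nptlilx(n, alpha, beta, theta, seed=None):
    """Draw n values x_i = Q(u_i) with u_i uniform on (0, 1)."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        try:
            sample.append(nptlilx_quantile(rng.random(), alpha, beta, theta))
        except ValueError:
            continue  # redraw in the very rare degenerate case
    return sample

xs = sample_nptlilx(20000, alpha=2.0, beta=1.5, theta=0.5, seed=1)
median = nptlilx_quantile(0.5, 2.0, 1.5, 0.5)
print(sum(x <= median for x in xs) / len(xs))  # should be close to 0.5
```

Fixing the seed makes the simulated samples reproducible, which is convenient when comparing estimation results across sample sizes.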
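From N random samples of X, let ε be either α, β, or θ and $\hat{\varepsilon}_{i}$ be the MLE of ε constructed from the i-th sample. Then, we define the (mean) MLE, bias, and mean square error (MSE) by, respectively, $\hat{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N}\hat{\varepsilon}_{i}$, $Bias = \frac{1}{N}\sum_{i=1}^{N}(\hat{\varepsilon}_{i} - \varepsilon)$, and $MSE = \frac{1}{N}\sum_{i=1}^{N}(\hat{\varepsilon}_{i} - \varepsilon)^{2}$. Additionally, the asymptotic (mean) confidence intervals of α, β, and θ at the level 100(1 − γ)% can be determined. By denoting $LB_{i}$ and $UB_{i}$ the lower and upper bounds of the asymptotic confidence interval of ε constructed from the i-th sample, we define the (mean) lower bounds (LBs), (mean) upper bounds (UBs), and (mean) average lengths (ALs) by, respectively, $LB = \frac{1}{N}\sum_{i=1}^{N} LB_{i}$, $UB = \frac{1}{N}\sum_{i=1}^{N} UB_{i}$, and $AL = \frac{1}{N}\sum_{i=1}^{N}(UB_{i} - LB_{i})$. For the purposes of this study, we consider the levels 90% and 95%, so $z_{0.05} = 1.644854$ and $z_{0.025} = 1.959964$, respectively. The software Mathematica 9 was employed.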
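The summary measures of the simulation study can be computed from the per-sample estimates and interval bounds as sketched below. The definitions used (mean estimate, mean deviation from the true value, mean squared deviation, mean bounds, and mean length) are the standard ones and are assumed to match those of the study; the function names and the toy inputs are hypothetical.

```python
def simulation_summary(estimates, true_value):
    """Return the mean estimate, bias and MSE over the N replications."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = mean_estimate - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return mean_estimate, bias, mse

def interval_summary(lower_bounds, upper_bounds):
    """Return the mean lower bound, mean upper bound and average length."""
    n = len(lower_bounds)
    mean_lower = sum(lower_bounds) / n
    mean_upper = sum(upper_bounds) / n
    return mean_lower, mean_upper, mean_upper - mean_lower

# Toy inputs, for illustration only.
print(simulation_summary([1.0, 2.0, 3.0], true_value=2.0))
print(interval_summary([0.5, 1.5], [2.5, 3.5]))
```

Since the average length is linear in the bounds, it equals the difference between the mean upper and mean lower bounds.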
Our simulation study was based on the following plan.
• N = 1000 random samples of size n = 100, 200, 300, and 1000 are generated from X.
From Tables 4-6, we can see that, when n increases, the biases, MSEs, and ALs decrease. This observation is consistent with the well-known convergence properties of the MLEs.

Data Analysis
In this section, we demonstrate the flexibility of the NPTLILx model by analyzing two practical datasets. The fits of the NPTLILx model are compared to those of the competing models listed in Table 7. The common point of all of them is the use of the inverse Lomax distribution as the baseline distribution.

Table 7. The competitive models considered.
Distribution | Reference
Inverse Lomax (ILx) | [34]
Inverse Power Lomax (PILx) | [35]
Topp-Leone Inverse Lomax (TILx) | [36]
Weibull Inverse Lomax (WILx) | [37]

Except for the original inverse Lomax distribution, the considered models possess three or four parameters. The comparison of these models was performed by using the following well-known statistical benchmarks: CVM (Cramér-von Mises); AD (Anderson-Darling); KS (Kolmogorov-Smirnov) statistic with the corresponding p-value; minus log-likelihood ($-\hat{\ell}$); AIC (Akaike information criterion); CAIC (corrected Akaike information criterion); BIC (Bayesian information criterion); and HQIC (Hannan-Quinn information criterion). For the CVM, AD, KS, $-\hat{\ell}$, AIC, CAIC, BIC, and HQIC, the smaller the value is, the better the fit to the data. Additionally, the higher the p-value of the KS test is, the better the fit to the data. All these measures were computed by using the R software.
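The four information criteria can be computed from the maximized log-likelihood, the number of parameters k, and the sample size n. The sketch below uses the standard textbook definitions, with CAIC taken as the small-sample corrected AIC; whether these coincide exactly with the formulas implemented in the R routines used for the paper is an assumption. The function name and the input values are hypothetical.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, CAIC, BIC and HQIC from the maximized log-likelihood."""
    aic = 2.0 * k - 2.0 * loglik
    caic = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)   # corrected AIC
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}

# Hypothetical fit: log-likelihood -100 with k = 3 parameters and n = 50 observations.
crit = information_criteria(loglik=-100.0, k=3, n=50)
for name, value in crit.items():
    print(name, round(value, 3))
```

All four criteria penalize the number of parameters, so a three-parameter model must achieve a sufficiently larger likelihood than the one-parameter inverse Lomax model to be preferred.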
The graphical and numerical analyses of these two datasets are as follows. Figure 3 presents the total time on test (TTT) plots of the two datasets. The first plot shows a convex curve, indicating that a decreasing hrf for the fitting model is appropriate for Dataset I, whereas the second plot shows a concave curve, indicating that an increasing hrf for the fitting model is appropriate for Dataset II. These cases are covered by the NPTLILx model, as shown in Figure 2. Tables 8 and 9 present the CVM, AD, KS, and the related p-value, and the MLEs of the models' parameters for Datasets I and II, respectively. The obtained p-values indicate that the NPTLILx model is the best. Tables 10 and 11 report the $-\hat{\ell}$, AIC, CAIC, BIC, and HQIC of the models for Datasets I and II, respectively. Since the smallest values are obtained for the NPTLILx model, it can be considered the best according to these criteria. The estimated pdfs and cdfs for the considered models are displayed in Figures 4 and 5 for Datasets I and II, respectively. The plots of the estimated pdfs are shown individually, for better readability, in Figures 6 and 7. In order to give another point of view, we illustrate the adequacy of the models via probability-probability (PP) plots in Figures 8 and 9, for Datasets I and II, respectively. In particular, for Dataset II, the scatter plot follows the PP line very closely, indicating that the NPTLILx model provides a better fit in comparison to the other models. To summarize, the NPTLILx model reveals itself to be the most appropriate model for the two datasets, illustrating its applicability in a concrete setting. We end this section by providing some additional graphical and numerical elements on the NPTLILx model, related to the quantities presented in Section 4.1.
To illustrate the uniqueness of the MLEs of α, β and θ, the profiles of the log-likelihood function are proposed in Figures 10 and 11 for Datasets I and II, respectively. The Fisher information matrices of the NPTLILx model taken at the MLEs for Datasets I and II are, respectively, given by Then, the confidence intervals for α, β, and θ at the levels 90% and 95% are provided in Table 12.
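The TTT plots of Figure 3 are based on the scaled total time on test transform. A minimal sketch of its empirical version is given below, assuming the usual definition T(r/n) = [x_(1) + ... + x_(r) + (n − r) x_(r)] / (x_(1) + ... + x_(n)) for the ordered sample; the function name and the toy data are hypothetical and unrelated to Datasets I and II.

```python
def scaled_ttt(data):
    """Return the points (r/n, T(r/n)) of the empirical scaled TTT transform."""
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    points = []
    running = 0.0
    for r, x in enumerate(xs, start=1):
        running += x   # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * x) / total))
    return points

# Toy data, for illustration only.
for point in scaled_ttt([4.0, 1.0, 3.0, 2.0]):
    print(point)
```

A curve lying above the diagonal (concave) suggests an increasing hrf, and a curve lying below it (convex) suggests a decreasing hrf, which is how the two plots of Figure 3 are read.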

Conclusion and Perspectives
In this paper, we introduced and studied a new general family of distributions, called the NPTL-G family, based on the so-called power Topp-Leone-G and inverse exponential-G families. Various mathematical properties were presented, including stochastic ordering, quantile function and related measures, general moments and related measures, and the Shannon entropy, with discussions. Then, we paid special attention to a member of the family defined with the inverse Lomax distribution, called the NPTLILx distribution. The estimation of the unknown model parameters was done with the maximum likelihood method, with numerical guarantees on their behavior via a simulation study. The applicability of the NPTLILx model was then illustrated by the consideration of two practical datasets. It was shown that the NPTLILx model is a serious alternative to other models that also use the inverse Lomax distribution as the baseline. Future work will include the construction of various regression models, Bayesian estimation of the parameters, and analyses of new datasets. Thanks to its numerous qualities, we believe that the NPTL-G family can be helpful for the practitioner, for statistical analyses beyond the scope of this paper.
Among the interesting perspectives of work, one could investigate the confidence bounds and supersaturation properties of the cdfs of the members of the NPTL-G family, which are useful for choosing an appropriate model for given data, following the spirit of [38][39][40][41][42]. All these aspects need further investigation, which we leave for future work.