Statistical Properties and Different Methods of Estimation for Type I Half Logistic Inverted Kumaraswamy Distribution

Ramadan A. ZeinEldin 1,2, Christophe Chesneau 3,*, Farrukh Jamal 4 and Mohammed Elgarhy 5 1 Deanship of Scientific Research, King AbdulAziz University, Jeddah 21589, Saudi Arabia; rzainaldeen@kau.edu.sa 2 Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt 3 Department of Mathematics, Université de Caen, LMNO, Campus II, Science 3, 14032 Caen, France 4 Department of Statistics, Govt. S.A Postgraduate College Dera Nawab Sahib, Bahawalpur, Punjab 63360, Pakistan; drfarrukh1982@gmail.com 5 Valley High Institute for Management Finance and Information Systems, Obour, Qaliubia 11828, Egypt; m_elgarhy85@yahoo.com * Correspondence: christophe.chesneau@unicaen.fr; Tel.: +33-02-3156-7424


Introduction
Over the last years, numerous families of distributions have been introduced, providing new opportunities in terms of statistical modeling and data analysis. One of the most promising of them is the so-called type I half-logistic-G (TIHL-G) family introduced by [1]. The TIHL-G family is characterized by the cumulative distribution function (cdf) given by
$$F(x;\lambda,\xi)=\frac{1-[1-G(x;\xi)]^{\lambda}}{1+[1-G(x;\xi)]^{\lambda}},$$
where $G(x;\xi)$ is the baseline cdf of a continuous distribution depending on a parameter vector $\xi$, and $\lambda>0$. The definition of $F(x;\lambda,\xi)$ derives from the T-X technique of [2] applied with the half-logistic distribution as main generator, that is,
$$F(x;\lambda,\xi)=\int_0^{-\log[1-G(x;\xi)]}f_0(t;\lambda)\,dt,$$
where $f_0(t;\lambda)=2\lambda e^{-\lambda t}/(1+e^{-\lambda t})^2$, $t>0$, is the probability density function (pdf) of the half-logistic distribution. Among the qualities of the TIHL-G family, we would like to mention the tractability of the corresponding functions and the fact that the additional parameter $\lambda$ can significantly increase the flexibility of the mode and the tails of the baseline distribution characterized by $G(x;\xi)$. That explains why several special members have recently been studied in detail. We refer the reader to [3] for the type I half-logistic Burr XII distribution (also called the extended Burr XII distribution), [4] for the type I half-logistic generalized Weibull distribution, [5] for the type I half-logistic Lomax distribution, and [6] for the type I half-logistic exponential distribution. Also, with the use of an additional parameter, the type I half-logistic-G family was recently generalized by [7].
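The closed form of the TIHL-G cdf follows from an elementary antiderivative of the half-logistic pdf. The short derivation below is our own worked step, not reproduced from [1]. It uses $T=-\log[1-G(x;\xi)]$:

```latex
\frac{d}{dt}\left(\frac{2}{1+e^{-\lambda t}}\right)=\frac{2\lambda e^{-\lambda t}}{(1+e^{-\lambda t})^{2}}=f_0(t;\lambda)
\;\Longrightarrow\;
\int_0^{T}f_0(t;\lambda)\,dt=\frac{2}{1+e^{-\lambda T}}-1=\frac{1-e^{-\lambda T}}{1+e^{-\lambda T}},
\qquad
e^{-\lambda T}=[1-G(x;\xi)]^{\lambda}
\;\Longrightarrow\;
F(x;\lambda,\xi)=\frac{1-[1-G(x;\xi)]^{\lambda}}{1+[1-G(x;\xi)]^{\lambda}}.
```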
In parallel, several new flexible continuous distributions have been studied. In particular, [8] recently introduced a new two-parameter lifetime distribution, called the inverted Kumaraswamy (IK) distribution. It is characterized by the cdf given by
$$G(x;a,b)=\left[1-(1+x)^{-a}\right]^{b},\quad x>0,$$
where $a,b>0$. The IK distribution arises as follows: it is the distribution of the random variable $V=1/U-1$, where $U$ follows the standard Kumaraswamy distribution introduced by [9], i.e., with the cdf given by $G^*(x;a,b)=1-(1-x^a)^b$, $x\in(0,1)$. Indeed, $P(V\le x)=P(U\ge 1/(1+x))=1-G^*(1/(1+x);a,b)$, which gives $G(x;a,b)$. Among the features of this distribution, it covers many well-established distributions, such as the Lomax (Pareto type II) distribution when $b=1$, the beta type II (inverted beta) distribution when $a=1$, the log-logistic (Fisk) distribution when $a=b=1$, the inverted Weibull distribution when $b\to+\infty$, and the generalized exponential distribution when $a\to+\infty$. Also, the IK distribution demonstrates great flexibility in terms of the curvature of its distribution functions, especially regarding the mode and the tails.
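The representation $V=1/U-1$ can be checked by simulation. The Python sketch below uses the standard library only. The names `ik_cdf`, `kumaraswamy_draw` and `ik_sample` are our own and do not come from the paper or from [8]. It draws $U$ from the Kumaraswamy law by inverse transform and compares the empirical cdf of $V$ with $G(x;a,b)$.

```python
import math
import random


def ik_cdf(x, a, b):
    """Inverted Kumaraswamy cdf G(x; a, b) = [1 - (1 + x)^(-a)]^b, x > 0."""
    return (1.0 - (1.0 + x) ** (-a)) ** b


def kumaraswamy_draw(a, b, rng):
    """Inverse-transform draw from the standard Kumaraswamy cdf 1 - (1 - u^a)^b on (0, 1)."""
    p = rng.random()
    while p == 0.0:                                 # keep p in (0, 1) so that U > 0
        p = rng.random()
    q = -math.expm1(math.log1p(-p) / b)             # 1 - (1 - p)^(1/b), accurate for small p
    return q ** (1.0 / a)


def ik_sample(n, a, b, seed=0):
    """Draw n values of V = 1/U - 1 with U ~ Kumaraswamy(a, b)."""
    rng = random.Random(seed)
    return [1.0 / kumaraswamy_draw(a, b, rng) - 1.0 for _ in range(n)]
```

With 20000 draws, the empirical and theoretical cdfs typically agree to about two decimals. The seed is fixed only to make the check reproducible.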
In this paper, we define a new three-parameter lifetime distribution by combining the TIHL-G family with the IK distribution. It can thus be viewed as a new special member of the TIHL-G family benefiting from the qualities of the IK distribution, and it is naturally called the type I half-logistic inverted Kumaraswamy (TIHLIK) distribution. This study is devoted to both its theoretical and practical features, with an emphasis on the applied side. Indeed, a substantial part is devoted to the estimation of the model parameters via various methods, presented in full detail. The estimates perform well, and their behavior is discussed. Then, we show that the related TIHLIK model provides better fits for some data sets when compared to recent rivals that also use the IK model as baseline. The required computations are carried out in the R language of the R Development Core Team [10]. Beyond data analysis, the TIHLIK distribution (and its symmetric version around 0) can find applications in many other domains. For instance, it can be used to construct new prior distributions in a Bayesian setting and new mixtures of distributions in a discriminant analysis framework. Modern developments in these directions can be found in [11,12].
The rest of the paper is divided into six sections. Section 2 presents the TIHLIK distribution. Some mixture representations of the main functions in terms of Lomax distribution functions are given in Section 3. In Section 4, we attempt to derive the main mathematical and statistical properties of the TIHLIK distribution. Section 5 is devoted to the estimation of the model parameters, with a simulation study. Applications are given in Section 6. Section 7 provides concluding remarks.

The TIHLIK Distribution
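By combining the TIHL-G family with the IK distribution, i.e., by taking $G(x;\xi)=G(x;a,b)$, the cdf of the TIHLIK distribution is given by
$$F(x;\lambda,a,b)=\frac{1-\{1-[1-(1+x)^{-a}]^{b}\}^{\lambda}}{1+\{1-[1-(1+x)^{-a}]^{b}\}^{\lambda}},\quad x>0.\tag{1}$$
Upon differentiation, the corresponding pdf is given by
$$f(x;\lambda,a,b)=\frac{2\lambda ab(1+x)^{-(a+1)}[1-(1+x)^{-a}]^{b-1}\{1-[1-(1+x)^{-a}]^{b}\}^{\lambda-1}}{\left(1+\{1-[1-(1+x)^{-a}]^{b}\}^{\lambda}\right)^{2}},\quad x>0.\tag{2}$$
Also, the hazard rate function (hrf) of the TIHLIK distribution is given by
$$h(x;\lambda,a,b)=\frac{\lambda ab(1+x)^{-(a+1)}[1-(1+x)^{-a}]^{b-1}}{\{1-[1-(1+x)^{-a}]^{b}\}\left(1+\{1-[1-(1+x)^{-a}]^{b}\}^{\lambda}\right)},\quad x>0,$$
and the corresponding cumulative hazard rate function (chrf) is given by
$$H(x;\lambda,a,b)=-\log[1-F(x;\lambda,a,b)]=\log\left(1+\{1-[1-(1+x)^{-a}]^{b}\}^{\lambda}\right)-\log 2-\lambda\log\{1-[1-(1+x)^{-a}]^{b}\},\quad x>0.$$
In order to illustrate the flexibility of the TIHLIK distribution, Figures 1 and 2 present plots of the above pdfs and hrfs. The values of the parameters λ, a, and b were chosen arbitrarily so as to obtain a wide variety of shapes for the involved functions.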
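For numerical work, the cdf, pdf, hrf and chrf of the TIHLIK distribution can be coded directly. The following is a minimal Python sketch under our own conventions. It uses the standard library only, and the names `tihlik_cdf`, `tihlik_pdf`, `tihlik_hrf`, `tihlik_chrf` and the helper `_ik_parts` are hypothetical; the paper's own computations were done in R. Everything is evaluated through $\log S(x)$, with $S(x)=1-[1-(1+x)^{-a}]^{b}$. `expm1` and `log1p` limit cancellation near $x=0$ and for large $x$.

```python
import math


def _ik_parts(x, a, b):
    """IK pdf g(x) and log S(x), where S = 1 - G and G(x) = [1 - (1 + x)^(-a)]^b, for x > 0."""
    la = a * math.log1p(x)
    z = math.exp(-la)                                      # (1 + x)^(-a)
    # log[1 - (1 + x)^(-a)]: expm1 is accurate for small x, log1p for large x
    log_w = math.log(-math.expm1(-la)) if z > 0.5 else math.log1p(-z)
    big_g = math.exp(b * log_w)                            # IK cdf G(x)
    log_s = math.log1p(-big_g) if big_g < 0.5 else math.log(-math.expm1(b * log_w))
    g = a * b * z / (1.0 + x) * math.exp((b - 1.0) * log_w)
    return g, log_s


def tihlik_cdf(x, lam, a, b):
    """TIHLIK cdf: F = (1 - S^lam) / (1 + S^lam)."""
    if x <= 0.0:
        return 0.0
    _, log_s = _ik_parts(x, a, b)
    return -math.expm1(lam * log_s) / (1.0 + math.exp(lam * log_s))


def tihlik_pdf(x, lam, a, b):
    """TIHLIK pdf: f = 2 lam g S^(lam - 1) / (1 + S^lam)^2."""
    if x <= 0.0:
        return 0.0
    g, log_s = _ik_parts(x, a, b)
    return 2.0 * lam * g * math.exp((lam - 1.0) * log_s) / (1.0 + math.exp(lam * log_s)) ** 2


def tihlik_hrf(x, lam, a, b):
    """Hazard rate for x > 0: h = f / (1 - F) = lam g / [S (1 + S^lam)]."""
    g, log_s = _ik_parts(x, a, b)
    return lam * g / (math.exp(log_s) * (1.0 + math.exp(lam * log_s)))


def tihlik_chrf(x, lam, a, b):
    """Cumulative hazard for x > 0: H = -log(1 - F) = log(1 + S^lam) - log 2 - lam log S."""
    _, log_s = _ik_parts(x, a, b)
    return math.log1p(math.exp(lam * log_s)) - math.log(2.0) - lam * log_s
```

For instance, `tihlik_pdf(1.0, 2.0, 1.5, 2.0)` evaluates the pdf at x = 1 for the first parameter set of the simulation study. The hrf and chrf can be cross-checked through $h=f/(1-F)$ and $H=-\log(1-F)$.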
We observe that the pdf can be left-skewed, right-skewed, symmetrical, or reverse J-shaped, while the hrf can be increasing, decreasing, or upside-down bathtub shaped.

Mixture Representations
In this section, we will express the pdf and cdf of the TIHLIK distribution in terms of pdfs and cdfs of the well-established Lomax distribution (also called the Pareto Type II distribution).

Proposition 1.
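We have the following mixture representation for $F(x;\lambda,a,b)$:
$$F(x;\lambda,a,b)=\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}\sum_{m=0}^{+\infty}\alpha_{k,\ell,m}\,S_m(x;a),\qquad \alpha_{k,\ell,m}=(-1)^{k+\ell+m}\binom{b\ell}{m}\left[\binom{\lambda k}{\ell}-\binom{\lambda(k+1)}{\ell}\right],$$
and $S_m(x;a)$ denotes the survival function of the Lomax distribution with parameters $am$ and 1, i.e., $S_m(x;a)=(1+x)^{-am}$.
Proof. Set $S(x)=1-[1-(1+x)^{-a}]^{b}$, so that $F(x;\lambda,a,b)=[1-S(x)^{\lambda}]/[1+S(x)^{\lambda}]$. Since $S(x)^{\lambda}\in(0,1)$, the power series formula gives
$$F(x;\lambda,a,b)=[1-S(x)^{\lambda}]\sum_{k=0}^{+\infty}(-1)^{k}S(x)^{\lambda k}=\sum_{k=0}^{+\infty}(-1)^{k}\left[S(x)^{\lambda k}-S(x)^{\lambda(k+1)}\right].$$
Since $[1-(1+x)^{-a}]^{b}\in(0,1)$ and $(1+x)^{-a}\in(0,1)$, it follows from the generalized binomial formula applied two times in a row that, for any $c\ge0$,
$$S(x)^{c}=\sum_{\ell=0}^{+\infty}(-1)^{\ell}\binom{c}{\ell}\left[1-(1+x)^{-a}\right]^{b\ell}=\sum_{\ell=0}^{+\infty}\sum_{m=0}^{+\infty}(-1)^{\ell+m}\binom{c}{\ell}\binom{b\ell}{m}(1+x)^{-am}.$$
By putting the above equalities together with $c=\lambda k$ and $c=\lambda(k+1)$, we obtain the stated representation. This ends the proof of Proposition 1.
Corollary 1.
Upon differentiation, since $\partial S_m(x;a)/\partial x=-f_m(x;a)$, we have
$$f(x;\lambda,a,b)=\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}\sum_{m=1}^{+\infty}\beta_{k,\ell,m}\,f_m(x;a),$$
where $\beta_{k,\ell,m}=-\alpha_{k,\ell,m}$ and $f_m(x;a)=am(1+x)^{-am-1}$ denotes the pdf of the Lomax distribution with parameters $am$ and 1.
In Corollary 1, we would like to mention that the sum over $m$ begins at 1 since $f_0(x;a)=0$. This remains an important detail for the coming technical developments involving the distributional properties of the Lomax distribution.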
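Proposition 1 can be sanity-checked numerically by truncating the triple sum and comparing it with Equation (1). The Python sketch below is ours. The names `gbinom_row`, `cdf_closed` and `cdf_mixture` are hypothetical, and so are the truncation levels. The alternating sums cancel heavily, so keeping many terms quickly loses floating-point accuracy. The small truncation levels here are a choice for this illustration, not a recommendation from the paper.

```python
def gbinom_row(c, jmax):
    """Generalized binomial coefficients C(c, j), j = 0..jmax, for real c (exact zeros past c if c is a nonnegative integer)."""
    row = [1.0]
    for j in range(jmax):
        row.append(row[-1] * (c - j) / (j + 1))
    return row


def cdf_closed(x, lam, a, b):
    """Equation (1), evaluated directly (adequate for moderate x)."""
    s = 1.0 - (1.0 - (1.0 + x) ** (-a)) ** b
    return (1.0 - s ** lam) / (1.0 + s ** lam)


def cdf_mixture(x, lam, a, b, kmax=8, lmax=40, mmax=80):
    """Truncated Proposition 1: sum of alpha_{k,l,m} * S_m(x; a), with S_m(x; a) = (1 + x)^(-a m) and
    alpha_{k,l,m} = (-1)^(k+l+m) C(b l, m) [C(lam k, l) - C(lam (k+1), l)]."""
    total = 0.0
    for k in range(kmax + 1):
        c_k = gbinom_row(lam * k, lmax)            # C(lam k, l)
        c_k1 = gbinom_row(lam * (k + 1), lmax)     # C(lam (k + 1), l)
        for l in range(lmax + 1):
            diff = c_k[l] - c_k1[l]
            if diff == 0.0:
                continue
            c_bl = gbinom_row(b * l, mmax)         # C(b l, m)
            for m in range(mmax + 1):
                alpha = (-1) ** (k + l + m) * c_bl[m] * diff
                total += alpha * (1.0 + x) ** (-a * m)
    return total
```

With integer λ and b, as in the test values λ = 2 and b = 2, the sums over ℓ and m are finite. Only the truncation in k then matters, and the error behaves like $S(x)^{\lambda(k_{\max}+1)}$.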
The following result considers a mixture representation for the exponentiated F(x; λ, a, b).

Proposition 2.
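Let ζ be a positive integer. Then, we have the following mixture representation:
$$F(x;\lambda,a,b)^{\zeta}=\sum_{j=0}^{\zeta}\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}\sum_{m=0}^{+\infty}\omega^{(\zeta)}_{j,k,\ell,m}\,S_m(x;a),\qquad \omega^{(\zeta)}_{j,k,\ell,m}=(-1)^{j+k+\ell+m}\binom{\zeta}{j}\binom{\zeta+k-1}{k}\binom{\lambda(j+k)}{\ell}\binom{b\ell}{m},$$
where $S_m(x;a)=(1+x)^{-am}$.
Proof. Let $S(x)=1-[1-(1+x)^{-a}]^{b}$, so that $F(x;\lambda,a,b)^{\zeta}=[1-S(x)^{\lambda}]^{\zeta}[1+S(x)^{\lambda}]^{-\zeta}$. Owing to the (standard) binomial formula, we get
$$[1-S(x)^{\lambda}]^{\zeta}=\sum_{j=0}^{\zeta}(-1)^{j}\binom{\zeta}{j}S(x)^{\lambda j}.$$
Since $S(x)^{\lambda}\in(0,1)$, $[1-(1+x)^{-a}]^{b}\in(0,1)$ and $(1+x)^{-a}\in(0,1)$, the generalized binomial formula applied three times in a row gives
$$[1+S(x)^{\lambda}]^{-\zeta}S(x)^{\lambda j}=\sum_{k=0}^{+\infty}(-1)^{k}\binom{\zeta+k-1}{k}S(x)^{\lambda(j+k)}=\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}\sum_{m=0}^{+\infty}(-1)^{k+\ell+m}\binom{\zeta+k-1}{k}\binom{\lambda(j+k)}{\ell}\binom{b\ell}{m}(1+x)^{-am}.$$
We obtain the desired result by combining the above equalities, ending the proof of Proposition 2.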
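As with Proposition 1, the representation of $F(x;\lambda,a,b)^{\zeta}$ can be checked numerically by truncation. In the Python sketch below, the names and truncation levels are our own assumptions. `math.comb` supplies the integer binomial weights $\binom{\zeta}{j}$ and $\binom{\zeta+k-1}{k}$.

```python
import math


def gbinom_row(c, jmax):
    """Generalized binomial coefficients C(c, j), j = 0..jmax, for real c."""
    row = [1.0]
    for j in range(jmax):
        row.append(row[-1] * (c - j) / (j + 1))
    return row


def cdf_closed(x, lam, a, b):
    """Equation (1), evaluated directly."""
    s = 1.0 - (1.0 - (1.0 + x) ** (-a)) ** b
    return (1.0 - s ** lam) / (1.0 + s ** lam)


def cdf_power_mixture(x, zeta, lam, a, b, kmax=8, lmax=40, mmax=80):
    """Truncated Proposition 2: F(x)^zeta as a sum of omega_{j,k,l,m} (1 + x)^(-a m)."""
    total = 0.0
    for j in range(zeta + 1):
        for k in range(kmax + 1):
            # (-1)^(j+k) C(zeta, j) C(zeta+k-1, k): standard and negative binomial weights
            outer = (-1) ** (j + k) * math.comb(zeta, j) * math.comb(zeta + k - 1, k)
            c_l = gbinom_row(lam * (j + k), lmax)          # C(lam (j + k), l)
            for l in range(lmax + 1):
                if c_l[l] == 0.0:
                    continue
                c_m = gbinom_row(b * l, mmax)              # C(b l, m)
                for m in range(mmax + 1):
                    total += outer * (-1) ** (l + m) * c_l[l] * c_m[m] * (1.0 + x) ** (-a * m)
    return total
```

For ζ = 1 the sums collapse to those of Proposition 1, which is a convenient consistency check.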

Mathematical and Statistical Properties
This section deals with the mathematical and statistical properties of the TIHLIK distribution. Hereafter, we consider a random variable X following the TIHLIK distribution, i.e., with the cdf given by Equation (1) and the pdf given by Equation (2).

Shapes and Asymptotes
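When $x\to0$, we have
$$f(x;\lambda,a,b)\sim\frac{\lambda}{2}a^{b}b\,x^{b-1},\qquad h(x;\lambda,a,b)\sim\frac{\lambda}{2}a^{b}b\,x^{b-1}.$$
From these equivalences, when $b\in(0,1)$, we get $f(x;\lambda,a,b)\to+\infty$; when $b=1$, we obtain $f(x;\lambda,a,b)\to\lambda a/2$; and when $b>1$, we have $f(x;\lambda,a,b)\to0$. The same holds for $h(x;\lambda,a,b)$ under the same conditions. When $x\to+\infty$, we have
$$f(x;\lambda,a,b)\sim2\lambda ab^{\lambda}x^{-a\lambda-1},\qquad h(x;\lambda,a,b)\sim\frac{\lambda a}{x},$$
so that both functions tend to 0, the right tail of the pdf being of power type with index $a\lambda$. The shapes of $f(x;\lambda,a,b)$ also depend on its critical point(s), given by the solution(s) of the equation $\partial\log f(x;\lambda,a,b)/\partial x=0$, i.e.,
$$-\frac{a+1}{1+x}+\frac{a(b-1)(1+x)^{-a-1}}{1-(1+x)^{-a}}-(\lambda-1)\frac{g(x)}{S(x)}+\frac{2\lambda S(x)^{\lambda-1}g(x)}{1+S(x)^{\lambda}}=0,$$
where $g(x)=ab(1+x)^{-(a+1)}[1-(1+x)^{-a}]^{b-1}$ is the IK pdf and $S(x)=1-[1-(1+x)^{-a}]^{b}$. In the same way, the shapes of $h(x;\lambda,a,b)$ also depend on its critical point(s), given by the solution(s) of the equation
$$-\frac{a+1}{1+x}+\frac{a(b-1)(1+x)^{-a-1}}{1-(1+x)^{-a}}+\frac{g(x)}{S(x)}+\frac{\lambda S(x)^{\lambda-1}g(x)}{1+S(x)^{\lambda}}=0.$$
These equations provide some mathematical background for Figures 1 and 2.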
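The asymptotic equivalences and the left-hand sides of the critical-point equations can be verified numerically. In the Python sketch below, the helper names are ours, and the evaluation points $x=10^{-7}$ and $x=10^{6}$ are arbitrary choices. `dlog_pdf` and `dlog_hrf` return $\partial\log f/\partial x$ and $\partial\log h/\partial x$ as written in the two equations. A root finder applied to them would locate the mode of $f$ or the turning point of $h$.

```python
import math


def _ik_parts(x, a, b):
    """IK pdf g(x) and log S(x), with S = 1 - [1 - (1 + x)^(-a)]^b, for x > 0."""
    la = a * math.log1p(x)
    z = math.exp(-la)
    log_w = math.log(-math.expm1(-la)) if z > 0.5 else math.log1p(-z)
    big_g = math.exp(b * log_w)
    log_s = math.log1p(-big_g) if big_g < 0.5 else math.log(-math.expm1(b * log_w))
    return a * b * z / (1.0 + x) * math.exp((b - 1.0) * log_w), log_s


def pdf(x, lam, a, b):
    g, log_s = _ik_parts(x, a, b)
    return 2.0 * lam * g * math.exp((lam - 1.0) * log_s) / (1.0 + math.exp(lam * log_s)) ** 2


def hrf(x, lam, a, b):
    g, log_s = _ik_parts(x, a, b)
    s = math.exp(log_s)
    return lam * g / (s * (1.0 + s ** lam))


def ratio_at_zero(x, lam, a, b):
    """f(x) divided by its equivalent (lam/2) a^b b x^(b-1) as x -> 0."""
    return pdf(x, lam, a, b) / (0.5 * lam * a ** b * b * x ** (b - 1.0))


def ratio_at_infinity(x, lam, a, b):
    """f(x) divided by its equivalent 2 lam a b^lam x^(-a lam - 1) as x -> +infinity."""
    return pdf(x, lam, a, b) / (2.0 * lam * a * b ** lam * x ** (-a * lam - 1.0))


def _common(x, a, b):
    """Terms shared by both equations: -(a+1)/(1+x) + a(b-1)(1+x)^(-a-1) / [1 - (1+x)^(-a)]."""
    w = -math.expm1(-a * math.log1p(x))
    return -(a + 1.0) / (1.0 + x) + a * (b - 1.0) * (1.0 + x) ** (-a - 1.0) / w


def dlog_pdf(x, lam, a, b):
    """Left-hand side of the critical-point equation for f."""
    g, log_s = _ik_parts(x, a, b)
    s = math.exp(log_s)
    return _common(x, a, b) - (lam - 1.0) * g / s + 2.0 * lam * s ** (lam - 1.0) * g / (1.0 + s ** lam)


def dlog_hrf(x, lam, a, b):
    """Left-hand side of the critical-point equation for h."""
    g, log_s = _ik_parts(x, a, b)
    s = math.exp(log_s)
    return _common(x, a, b) + g / s + lam * s ** (lam - 1.0) * g / (1.0 + s ** lam)
```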

Quantile Function
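The quantile function (qf) of X is obtained by inverting Equation (1): for $y\in(0,1)$,
$$Q(y;\lambda,a,b)=\left\{1-\left[1-\left(\frac{1-y}{1+y}\right)^{1/\lambda}\right]^{1/b}\right\}^{-1/a}-1.$$
The median of X is given by $M=Q(1/2;\lambda,a,b)$. The other quartiles can be defined in a similar manner.
Simulated values from the TIHLIK distribution can be generated by using the following result: for any random variable U following the uniform distribution U(0, 1), $X_U=Q(U;\lambda,a,b)$ follows the TIHLIK distribution.
Upon differentiation of $Q(y;\lambda,a,b)$, the corresponding quantile density function is given by
$$q(y;\lambda,a,b)=\frac{\partial Q(y;\lambda,a,b)}{\partial y}=\frac{1}{f(Q(y;\lambda,a,b);\lambda,a,b)},\quad y\in(0,1).$$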
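The qf, inverse-transform sampling and the quantile density fit in a few lines of Python. The sketch below is ours. The function names are hypothetical, and the pdf and cdf are evaluated directly from Equations (1) and (2), which is adequate away from extreme x.

```python
import random


def cdf(x, lam, a, b):
    """Equation (1)."""
    s = 1.0 - (1.0 - (1.0 + x) ** (-a)) ** b                 # S = 1 - G
    return (1.0 - s ** lam) / (1.0 + s ** lam)


def pdf(x, lam, a, b):
    """Equation (2)."""
    w = 1.0 - (1.0 + x) ** (-a)
    s = 1.0 - w ** b
    return (2.0 * lam * a * b * (1.0 + x) ** (-a - 1.0) * w ** (b - 1.0)
            * s ** (lam - 1.0) / (1.0 + s ** lam) ** 2)


def quantile(y, lam, a, b):
    """Q(y) = {1 - [1 - ((1 - y)/(1 + y))^(1/lam)]^(1/b)}^(-1/a) - 1, for 0 < y < 1."""
    s = ((1.0 - y) / (1.0 + y)) ** (1.0 / lam)             # S(Q(y)), since S^lam = (1 - y)/(1 + y)
    return (1.0 - (1.0 - s) ** (1.0 / b)) ** (-1.0 / a) - 1.0


def quantile_density(y, lam, a, b):
    """q(y) = dQ/dy = 1 / f(Q(y))."""
    return 1.0 / pdf(quantile(y, lam, a, b), lam, a, b)


def sample(n, lam, a, b, seed=0):
    """Inverse-transform sampling: X = Q(U) with U ~ U(0, 1)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:                                         # U = 0 would give the boundary value 0
            out.append(quantile(u, lam, a, b))
    return out
```

For example, `quantile(0.5, 2.0, 1.5, 2.0)` returns the median M for λ = 2, a = 1.5, b = 2, and `sample(1000, 2.0, 1.5, 2.0)` draws a sample of size 1000.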

Ordinary Moments
We begin the study of the ordinary moments of X by an existence result.
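Proposition 3. Let r be a positive integer. Then, the r-th ordinary moment of X, i.e., $\mu_r=E(X^r)$, exists if and only if $a\lambda>r$.
Proof. The proof is centered around the equivalence results presented in Section 4.1. When $x\to0$, we have $x^rf(x;\lambda,a,b)\sim(\lambda/2)a^bb\,x^{r+b-1}$ and, for any $\epsilon>0$, by the Riemann integral criterion, $\int_0^{\epsilon}x^{r+b-1}dx<+\infty$ since $r+b-1>-1$. When $x\to+\infty$, we have $x^rf(x;\lambda,a,b)\sim2\lambda ab^{\lambda}x^{r-a\lambda-1}$, and $\int_{\epsilon}^{+\infty}x^{r-a\lambda-1}dx<+\infty$ if and only if $a\lambda>r$. This ends the proof. Then, when $a\lambda>r$, the r-th ordinary moment of X is defined by
$$\mu_r=\int_0^{+\infty}x^rf(x;\lambda,a,b)\,dx.$$
This integral can be evaluated by any mathematical software. If $a\min(\lambda,1)>r$, an alternative expression, with a possible gain in precision, is given by using Corollary 1, i.e.,
$$\mu_r=\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}\sum_{m=1}^{+\infty}\beta_{k,\ell,m}\frac{\Gamma(am-r)\Gamma(r+1)}{\Gamma(am)},$$
where $\Gamma(s)=\int_0^{+\infty}t^{s-1}e^{-t}dt$, $s>0$. After some algebra, one can remark that $\Gamma(am-r)\Gamma(r+1)/\Gamma(am)=r!/\prod_{u=1}^{r}(am-u)$. For practical purposes, the infinite sums can be truncated at a large integer, say 40.
As a consequence, if $a\min(\lambda,1)>2$, the mean and the variance of X can be expressed as, respectively, $\mu=\mu_1$ and $\sigma^2=\mu_2-\mu^2$.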

Skewness and Kurtosis
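Assuming that $a\lambda>4$, the first four moments of X can be used to determine the measures of skewness and kurtosis of X defined by, respectively,
$$\gamma_1=\frac{\mu_3-3\mu\mu_2+2\mu^3}{\sigma^3},\qquad\gamma_2=\frac{\mu_4-4\mu\mu_3+6\mu^2\mu_2-3\mu^4}{\sigma^4}.$$
To relax the assumption $a\lambda>4$, one can consider measures of skewness and kurtosis based on the qf, such as the Bowley skewness and the Moors kurtosis defined by, respectively,
$$\gamma_1^*=\frac{Q(3/4)+Q(1/4)-2Q(1/2)}{Q(3/4)-Q(1/4)}\quad\text{and}\quad\gamma_2^*=\frac{Q(7/8)-Q(5/8)+Q(3/8)-Q(1/8)}{Q(6/8)-Q(2/8)},$$
where $Q(y)=Q(y;\lambda,a,b)$. They were introduced by [13,14], respectively. The plots of $\gamma_1^*$ and $\gamma_2^*$ are shown in Figures 3-8, for different parameter ranges. We see smooth non-monotonic variations of these measures, attesting to a significant effect of a, b, and λ on them.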
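Since $\gamma_1^*$ and $\gamma_2^*$ only involve the qf, they are cheap to tabulate even when $a\lambda\le4$. The Python sketch below uses our own function names. The parameter values in the loop are illustrative and are not those of Figures 3-8.

```python
def quantile(y, lam, a, b):
    """TIHLIK quantile function Q(y; lam, a, b), 0 < y < 1."""
    s = ((1.0 - y) / (1.0 + y)) ** (1.0 / lam)
    return (1.0 - (1.0 - s) ** (1.0 / b)) ** (-1.0 / a) - 1.0


def bowley_skewness(lam, a, b):
    """gamma_1^* = [Q(3/4) + Q(1/4) - 2 Q(1/2)] / [Q(3/4) - Q(1/4)]."""
    q1, q2, q3 = (quantile(y, lam, a, b) for y in (0.25, 0.5, 0.75))
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def moors_kurtosis(lam, a, b):
    """gamma_2^* = [Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)] / [Q(6/8) - Q(2/8)]."""
    q = {k: quantile(k / 8.0, lam, a, b) for k in range(1, 8)}
    return (q[7] - q[5] + q[3] - q[1]) / (q[6] - q[2])


if __name__ == "__main__":
    # Illustrative scan over lam with a = 1.5, b = 2 (moments may not exist for small lam)
    for lam in (0.5, 1.0, 2.0, 4.0):
        print(lam, round(bowley_skewness(lam, 1.5, 2.0), 4), round(moors_kurtosis(lam, 1.5, 2.0), 4))
```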

Incomplete Moments
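Let us now study the incomplete moments of X. For any t > 0, the r-th incomplete moment of X exists and is given by
$$\mu_r(t)=E\left(X^r1_{\{X\le t\}}\right)=\int_0^tx^rf(x;\lambda,a,b)\,dx,$$
where $1_{\{X\le t\}}$ denotes the random variable such that $1_{\{X\le t\}}=1$ if $\{X\le t\}$ is realized, and 0 otherwise. It follows from Corollary 1 that
$$\mu_r(t)=\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}\sum_{m=1}^{+\infty}\beta_{k,\ell,m}\,am\int_0^tx^r(1+x)^{-am-1}dx.$$
For $t\in(0,1)$, we have the following sum expression for the integral term:
$$\int_0^tx^r(1+x)^{-am-1}dx=\sum_{p=0}^{+\infty}\binom{-am-1}{p}\frac{t^{r+p+1}}{r+p+1}.$$
The first incomplete moment is given by $\mu_1(t)$. It is useful to define other important quantities, such as the mean deviation of X about µ given by $\delta_1=E(|X-\mu|)=2\mu F(\mu;\lambda,a,b)-2\mu_1(\mu)$, the mean deviation of X about M given by $\delta_2=E(|X-M|)=\mu-2\mu_1(M)$, the Bonferroni curve given by $B(y)=\mu_1(Q(y;\lambda,a,b))/(y\mu)$, $y\in(0,1)$, and the Lorenz curve given by $L(y)=yB(y)$, $y\in(0,1)$.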

Stress Strength Parameter
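This subsection is devoted to the stress strength parameter, as described in [15], in the context of the TIHLIK distribution. It is defined by $R=P(X_2<X_1)$, where $X_1$ and $X_2$ are two independent random variables following the TIHLIK distribution with parameters $\lambda_1$, $a_1$ and $b_1$, and $\lambda_2$, $a_2$ and $b_2$, respectively.
Proposition 4. Under the setting described above, we have
$$R=\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}\sum_{m=0}^{+\infty}\sum_{u=0}^{+\infty}\sum_{v=0}^{+\infty}\sum_{w=1}^{+\infty}\upsilon_{k,\ell,m,u,v,w}\,\frac{a_1w}{a_1w+a_2m},$$
where $\upsilon_{k,\ell,m,u,v,w}=\alpha^{(2)}_{k,\ell,m}\beta^{(1)}_{u,v,w}$, $\alpha^{(2)}_{k,\ell,m}$ denoting $\alpha_{k,\ell,m}$ of Proposition 1 with parameters $(\lambda_2,a_2,b_2)$ and $\beta^{(1)}_{u,v,w}$ denoting $\beta_{u,v,w}$ of Corollary 1 with parameters $(\lambda_1,a_1,b_1)$.
Proof. Owing to Proposition 1 and Corollary 1, we have
$$R=\int_0^{+\infty}f(x;\lambda_1,a_1,b_1)F(x;\lambda_2,a_2,b_2)\,dx=\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}\sum_{m=0}^{+\infty}\sum_{u=0}^{+\infty}\sum_{v=0}^{+\infty}\sum_{w=1}^{+\infty}\alpha^{(2)}_{k,\ell,m}\beta^{(1)}_{u,v,w}\int_0^{+\infty}f_w(x;a_1)S_m(x;a_2)\,dx.$$
Then, one can notice that $\alpha^{(2)}_{k,\ell,m}\beta^{(1)}_{u,v,w}=\upsilon_{k,\ell,m,u,v,w}$ and that the integral term can be expressed as
$$\int_0^{+\infty}a_1w(1+x)^{-a_1w-a_2m-1}dx=\frac{a_1w}{a_1w+a_2m}.$$
By putting the above equalities together, we end the proof of Proposition 4.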
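In practice, R can also be obtained by direct quadrature of $\int_0^{+\infty}f(x;\lambda_1,a_1,b_1)F(x;\lambda_2,a_2,b_2)\,dx$. The Python sketch below is ours and is not the series of Proposition 4. It uses the trapezoidal rule in $u=\log x$, and all names are hypothetical. It also encodes the closed-form Lomax integral used in the proof, so that this step can be checked numerically.

```python
import math


def _ik_parts(x, a, b):
    """IK pdf g(x) and log S(x), with S = 1 - [1 - (1 + x)^(-a)]^b, for x > 0."""
    la = a * math.log1p(x)
    z = math.exp(-la)
    log_w = math.log(-math.expm1(-la)) if z > 0.5 else math.log1p(-z)
    big_g = math.exp(b * log_w)
    log_s = math.log1p(-big_g) if big_g < 0.5 else math.log(-math.expm1(b * log_w))
    return a * b * z / (1.0 + x) * math.exp((b - 1.0) * log_w), log_s


def pdf(x, lam, a, b):
    g, log_s = _ik_parts(x, a, b)
    return 2.0 * lam * g * math.exp((lam - 1.0) * log_s) / (1.0 + math.exp(lam * log_s)) ** 2


def cdf(x, lam, a, b):
    log_s = _ik_parts(x, a, b)[1]
    return -math.expm1(lam * log_s) / (1.0 + math.exp(lam * log_s))


def stress_strength(theta1, theta2, lo=-40.0, hi=40.0, n=8000):
    """R = P(X2 < X1) = int_0^inf f(x; theta1) F(x; theta2) dx, theta = (lam, a, b); trapezoid in u = log x."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = math.exp(lo + i * h)
        total += (0.5 if i in (0, n) else 1.0) * x * pdf(x, *theta1) * cdf(x, *theta2)
    return total * h


def lomax_pair_integral(a1, w, a2, m):
    """int_0^inf f_w(x; a1) S_m(x; a2) dx = a1 w / (a1 w + a2 m), the integral term of Proposition 4."""
    return a1 * w / (a1 * w + a2 * m)
```

When both variables share the same parameters, R should equal 1/2. Swapping the roles of $X_1$ and $X_2$ should give $1-R$.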

Order Statistics
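Order statistics were first studied by [16] in the context of the standard normal distribution. More generally, order statistics naturally arise in the modeling of a wide variety of phenomena, mainly in reliability and life testing. Here, we provide some useful results involving the order statistics of the TIHLIK distribution. Let $X_1,\ldots,X_n$ be n independent random variables having the TIHLIK distribution as common distribution, and let $X_{(i)}$ be the i-th order statistic, i.e., the i-th random variable obtained by arranging $X_1,\ldots,X_n$ in increasing order, so that $X_{(1)}\le X_{(2)}\le\ldots\le X_{(n)}$. Then, a well-known result ensures that the cdf of $X_{(i)}$ is given by
$$F_{(i)}(x;\lambda,a,b)=\sum_{j=i}^{n}\binom{n}{j}F(x;\lambda,a,b)^{j}[1-F(x;\lambda,a,b)]^{n-j}.$$
By expanding $[1-F(x;\lambda,a,b)]^{n-j}$ with the binomial formula and applying Proposition 2 to the resulting integer powers of $F(x;\lambda,a,b)$, we obtain a mixture representation of the form
$$F_{(i)}(x;\lambda,a,b)=\sum\phi_{i,j,k,\ell,m,q}\,S_q(x;a),\tag{3}$$
where the weights $\phi_{i,j,k,\ell,m,q}$ collect the coefficients of these expansions and $S_q(x;a)=(1+x)^{-aq}$. Also, it is well known that the pdf of $X_{(i)}$ is given by
$$f_{(i)}(x;\lambda,a,b)=\frac{n!}{(i-1)!(n-i)!}f(x;\lambda,a,b)F(x;\lambda,a,b)^{i-1}[1-F(x;\lambda,a,b)]^{n-i}.$$
Upon differentiation of Equation (3), one can express $f_{(i)}(x;\lambda,a,b)$ as a mixture of pdfs of the Lomax distribution, i.e.,
$$f_{(i)}(x;\lambda,a,b)=\sum\psi_{i,j,k,\ell,m,q}\,f_q(x;a),\tag{4}$$
where $\psi_{i,j,k,\ell,m,q}=-\phi_{i,j,k,\ell,m,q}$ and $f_q(x;a)$ denotes the pdf of the Lomax distribution with parameters aq and 1. From this expression, one can derive some structural properties of $X_{(i)}$ (ordinary moments, incomplete moments, etc.).
Let us now focus on the first order statistic $X_{(1)}=\min(X_1,\ldots,X_n)$, which is of particular importance. The cdf and pdf of $X_{(1)}$ are, respectively, given by
$$F_{(1)}(x;\lambda,a,b)=1-[1-F(x;\lambda,a,b)]^{n},\qquad f_{(1)}(x;\lambda,a,b)=nf(x;\lambda,a,b)[1-F(x;\lambda,a,b)]^{n-1}.$$
The corresponding mixture expressions in terms of Lomax distribution functions are given by Equations (3) and (4), respectively, by taking i = 1.
Let us now derive the asymptotic distribution of $X_{(1)}$. It follows from the equivalence results of Section 4.1 that $F(x;\lambda,a,b)\sim(\lambda/2)a^bx^b$ as $x\to0$, so that, for any $x>0$,
$$\lim_{t\to0}\frac{F(tx;\lambda,a,b)}{F(t;\lambda,a,b)}=x^{b}.$$
Hence, since $F(0;\lambda,a,b)=0$, it follows from [17] (Theorem 8.3.6) that the asymptotic distribution of $X_{(1)}$ is the Weibull distribution with parameter b, i.e., there exist two sequences of real numbers $(u_n)_{n\in\mathbb{N}}$ and $(v_n)_{n\in\mathbb{N}}$ such that $\lim_{n\to+\infty}P[u_n(X_{(1)}-v_n)\le x]=1-e^{-x^{b}}$, $x>0$.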
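The cdf and pdf of $X_{(i)}$ and the regular-variation limit behind the Weibull approximation of $X_{(1)}$ are easy to evaluate numerically. The Python sketch below is ours; the function names and the value $t=10^{-4}$ are illustrative assumptions. It uses the finite binomial form of $F_{(i)}$ rather than the mixture (3).

```python
import math


def cdf(x, lam, a, b):
    """Equation (1)."""
    s = 1.0 - (1.0 - (1.0 + x) ** (-a)) ** b
    return (1.0 - s ** lam) / (1.0 + s ** lam)


def pdf(x, lam, a, b):
    """Equation (2)."""
    w = 1.0 - (1.0 + x) ** (-a)
    s = 1.0 - w ** b
    return (2.0 * lam * a * b * (1.0 + x) ** (-a - 1.0) * w ** (b - 1.0)
            * s ** (lam - 1.0) / (1.0 + s ** lam) ** 2)


def order_cdf(x, i, n, lam, a, b):
    """F_(i)(x) = sum_{j=i}^{n} C(n, j) F^j (1 - F)^(n - j)."""
    big_f = cdf(x, lam, a, b)
    return sum(math.comb(n, j) * big_f ** j * (1.0 - big_f) ** (n - j) for j in range(i, n + 1))


def order_pdf(x, i, n, lam, a, b):
    """f_(i)(x) = n! / [(i-1)! (n-i)!] f(x) F(x)^(i-1) [1 - F(x)]^(n-i)."""
    big_f = cdf(x, lam, a, b)
    const = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return const * pdf(x, lam, a, b) * big_f ** (i - 1) * (1.0 - big_f) ** (n - i)


def small_t_ratio(t, x, lam, a, b):
    """F(t x) / F(t), which tends to x^b as t -> 0 (the condition behind the Weibull limit of X_(1))."""
    return cdf(t * x, lam, a, b) / cdf(t, lam, a, b)
```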

Estimation with Simulation
This section is devoted to some statistical features of the TIHLIK model, assuming that λ, a, and b are unknown. The estimation of λ, a, and b is performed by several well-known methods. Hereafter, $x_1,\ldots,x_n$ denote n observed values from X, and $x_{(1)},\ldots,x_{(n)}$ the same values sorted in ascending order, i.e., $x_{(1)}\le\ldots\le x_{(n)}$.

Method of Maximum Likelihood Estimation
The method of maximum likelihood estimation in the context of the TIHLIK model is described below. We refer the reader to [18] for the general details. The maximum likelihood estimates (MLEs) of λ, a, and b can be obtained by maximizing, with respect to λ, a, and b, the likelihood function given by $L(\lambda,a,b)=\prod_{i=1}^nf(x_i;\lambda,a,b)$ or, alternatively, the log-likelihood function given by $\ell(\lambda,a,b)=\log[L(\lambda,a,b)]$, i.e.,
$$\ell(\lambda,a,b)=n\log(2)+n\log(\lambda)+n\log(a)+n\log(b)-(a+1)\sum_{i=1}^n\log(1+x_i)+(b-1)\sum_{i=1}^n\log\left[1-(1+x_i)^{-a}\right]+(\lambda-1)\sum_{i=1}^n\log S_i-2\sum_{i=1}^n\log\left(1+S_i^{\lambda}\right),$$
where $S_i=1-[1-(1+x_i)^{-a}]^{b}$. Thus, the MLEs are obtained by solving the following equations simultaneously: $\partial\ell(\lambda,a,b)/\partial\lambda=0$, $\partial\ell(\lambda,a,b)/\partial a=0$, and $\partial\ell(\lambda,a,b)/\partial b=0$. For instance, the first one reads
$$\frac{n}{\lambda}+\sum_{i=1}^n\log S_i-2\sum_{i=1}^n\frac{S_i^{\lambda}\log S_i}{1+S_i^{\lambda}}=0.$$
These equations can be solved numerically by using any mathematical software. When n is large enough, under some regularity conditions, the distribution of each MLE can be approximated by a normal distribution, with variance given by the corresponding diagonal element of the inverse of the observed information matrix evaluated at the MLEs. Owing to these distributional results, approximate confidence intervals and statistical tests for λ, a, and b can be constructed.
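The log-likelihood above can be maximized by any general-purpose optimizer. The paper reports using L-BFGS-B in R for the applications. The self-contained Python sketch below swaps in a basic Nelder-Mead search on the log-parameters instead, which is our own substitution. The names `loglik`, `sample`, `nelder_mead` and `fit_mle` are hypothetical. The sketch is meant to show the structure of the computation, not to reproduce the paper's numbers.

```python
import math
import random


def loglik(theta, xs):
    """TIHLIK log-likelihood l(lam, a, b), term by term as in the text."""
    lam, a, b = theta
    if min(theta) <= 0.0:
        return -math.inf
    total = len(xs) * (math.log(2.0) + math.log(lam) + math.log(a) + math.log(b))
    for x in xs:
        la = a * math.log1p(x)
        z = math.exp(-la)
        log_w = math.log(-math.expm1(-la)) if z > 0.5 else math.log1p(-z)   # log[1 - (1+x)^(-a)]
        big_g = math.exp(b * log_w)
        log_s = math.log1p(-big_g) if big_g < 0.5 else math.log(-math.expm1(b * log_w))
        total += (-(a + 1.0) * math.log1p(x) + (b - 1.0) * log_w
                  + (lam - 1.0) * log_s - 2.0 * math.log1p(math.exp(lam * log_s)))
    return total


def sample(n, lam, a, b, seed=0):
    """Inverse-transform draws X = Q(U); tiny U values are skipped to keep X > 0."""
    rng, out = random.Random(seed), []
    while len(out) < n:
        y = rng.random()
        if y > 1e-12:
            s = ((1.0 - y) / (1.0 + y)) ** (1.0 / lam)
            out.append((1.0 - (1.0 - s) ** (1.0 / b)) ** (-1.0 / a) - 1.0)
    return out


def nelder_mead(fun, x0, step=0.1, iters=400):
    """Basic Nelder-Mead minimizer (reflection 1, expansion 2, contraction and shrink 0.5)."""
    dim = len(x0)
    pts = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(dim)] for i in range(dim)]
    vals = [fun(p) for p in pts]
    for _ in range(iters):
        order = sorted(range(dim + 1), key=lambda i: vals[i])
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        cen = [sum(p[j] for p in pts[:-1]) / dim for j in range(dim)]
        refl = [2.0 * c - w for c, w in zip(cen, pts[-1])]
        f_r = fun(refl)
        if f_r < vals[0]:
            expd = [3.0 * c - 2.0 * w for c, w in zip(cen, pts[-1])]
            f_e = fun(expd)
            pts[-1], vals[-1] = (expd, f_e) if f_e < f_r else (refl, f_r)
        elif f_r < vals[-2]:
            pts[-1], vals[-1] = refl, f_r
        else:
            cont = [0.5 * (c + w) for c, w in zip(cen, pts[-1])]
            f_c = fun(cont)
            if f_c < vals[-1]:
                pts[-1], vals[-1] = cont, f_c
            else:                                           # shrink towards the best vertex
                pts = [pts[0]] + [[0.5 * (b0 + pj) for b0, pj in zip(pts[0], p)] for p in pts[1:]]
                vals = [vals[0]] + [fun(p) for p in pts[1:]]
    best = min(range(dim + 1), key=lambda i: vals[i])
    return pts[best], vals[best]


def fit_mle(xs, start=(1.0, 1.0, 1.0), iters=400):
    """Maximize loglik over (log lam, log a, log b), which keeps the parameters positive."""
    def neg(t):
        if max(abs(v) for v in t) > 50.0:                   # keep exp() of the log-parameters finite
            return math.inf
        return -loglik([math.exp(v) for v in t], xs)
    best, _ = nelder_mead(neg, [math.log(v) for v in start], iters=iters)
    return [math.exp(v) for v in best]
```

Standard errors would additionally require the observed information matrix, e.g. by numerical differentiation of `loglik` at the estimates. That step is not shown here.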

Methods of Least Squares and Weighted Least Squares Estimation
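We now consider the methods of least squares and weighted least squares estimation introduced by [19]. The least squares estimates (LSEs) of λ, a, and b can be determined by minimizing, with respect to λ, a, and b, the following function:
$$LS(\lambda,a,b)=\sum_{i=1}^n\left[F(x_{(i)};\lambda,a,b)-\frac{i}{n+1}\right]^2.$$
Thus, the LSEs are obtained by solving the following equations simultaneously: $\partial LS(\lambda,a,b)/\partial\lambda=0$, $\partial LS(\lambda,a,b)/\partial a=0$, and $\partial LS(\lambda,a,b)/\partial b=0$, i.e.,
$$\sum_{i=1}^n\left[F(x_{(i)};\lambda,a,b)-\frac{i}{n+1}\right]\eta^{(s)}_i(\lambda,a,b)=0,\quad s=1,2,3,$$
where, with $z_i=(1+x_{(i)})^{-a}$ and $S_i=1-(1-z_i)^{b}$,
$$\eta^{(1)}_i(\lambda,a,b)=\frac{\partial F(x_{(i)};\lambda,a,b)}{\partial\lambda}=-\frac{2S_i^{\lambda}\log S_i}{(1+S_i^{\lambda})^2},\tag{5}$$
$$\eta^{(2)}_i(\lambda,a,b)=\frac{\partial F(x_{(i)};\lambda,a,b)}{\partial a}=\frac{2\lambda b\,S_i^{\lambda-1}z_i(1-z_i)^{b-1}\log(1+x_{(i)})}{(1+S_i^{\lambda})^2},\tag{6}$$
$$\eta^{(3)}_i(\lambda,a,b)=\frac{\partial F(x_{(i)};\lambda,a,b)}{\partial b}=\frac{2\lambda\,S_i^{\lambda-1}(1-z_i)^{b}\log(1-z_i)}{(1+S_i^{\lambda})^2}.\tag{7}$$
The weighted least squares estimates (WLSEs) of λ, a, and b can be determined by minimizing, with respect to λ, a, and b, the following function:
$$WLS(\lambda,a,b)=\sum_{i=1}^n\frac{(n+1)^2(n+2)}{i(n-i+1)}\left[F(x_{(i)};\lambda,a,b)-\frac{i}{n+1}\right]^2.$$
Thus, the WLSEs can be determined by solving the equations $\partial WLS(\lambda,a,b)/\partial\lambda=0$, $\partial WLS(\lambda,a,b)/\partial a=0$, and $\partial WLS(\lambda,a,b)/\partial b=0$ simultaneously. These are similar to those previously presented, with the weights $(n+1)^2(n+2)/[i(n-i+1)]$ inserted in the sums.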
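The LS and WLS criteria and their gradients follow directly from $F$ and the derivatives $\eta^{(1)}_i$, $\eta^{(2)}_i$, $\eta^{(3)}_i$. The Python sketch below is ours, and the names `cdf_and_eta` and `ls_objective` are hypothetical. It returns both the criterion and its gradient, so it can feed any gradient-based optimizer. The test checks the gradient against finite differences.

```python
import math


def cdf_and_eta(x, lam, a, b):
    """F(x) and (eta1, eta2, eta3), the partial derivatives of F(x) w.r.t. (lam, a, b)."""
    z = (1.0 + x) ** (-a)
    w = 1.0 - z
    big_g = w ** b
    s = 1.0 - big_g
    t = s ** lam
    f_val = (1.0 - t) / (1.0 + t)
    common = 2.0 * lam * s ** (lam - 1.0) / (1.0 + t) ** 2    # equals -dF/dS
    eta1 = -2.0 * t * math.log(s) / (1.0 + t) ** 2
    eta2 = common * b * z * w ** (b - 1.0) * math.log1p(x)
    eta3 = common * big_g * math.log(w)
    return f_val, (eta1, eta2, eta3)


def ls_objective(theta, xs, weighted=False):
    """LS (or WLS if weighted) criterion and its gradient 2 * sum_i wt_i [F(x_(i)) - i/(n+1)] eta_i."""
    xs = sorted(xs)
    n = len(xs)
    value, grad = 0.0, [0.0, 0.0, 0.0]
    for i, x in enumerate(xs, start=1):
        f_val, eta = cdf_and_eta(x, *theta)
        wt = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) if weighted else 1.0
        resid = f_val - i / (n + 1)
        value += wt * resid ** 2
        for k in range(3):
            grad[k] += 2.0 * wt * resid * eta[k]
    return value, grad
```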

Method of Cramer-von Mises Minimum Distance Estimation
Another well-known estimation method is the method of Cramer-von Mises minimum distance estimation introduced by [20]. Applied to the TIHLIK model, the Cramer-von Mises minimum distance estimates (CVEs) of λ, a, and b can be obtained by minimizing, with respect to λ, a, and b, the following function:
$$C(\lambda,a,b)=\frac{1}{12n}+\sum_{i=1}^n\left[F(x_{(i)};\lambda,a,b)-\frac{2i-1}{2n}\right]^2.$$
Thus, the CVEs are obtained by solving the following equations simultaneously: $\partial C(\lambda,a,b)/\partial\lambda=0$, $\partial C(\lambda,a,b)/\partial a=0$, and $\partial C(\lambda,a,b)/\partial b=0$, i.e., $\sum_{i=1}^n[F(x_{(i)};\lambda,a,b)-(2i-1)/(2n)]\,\eta^{(s)}_i(\lambda,a,b)=0$, $s=1,2,3$, where $\eta^{(1)}_i(\lambda,a,b)$, $\eta^{(2)}_i(\lambda,a,b)$, and $\eta^{(3)}_i(\lambda,a,b)$ are given by Equations (5)-(7), respectively.
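The Cramer-von Mises criterion only needs the cdf at the ordered observations. In the Python sketch below, the names and the simulated sample are our own assumptions. It evaluates $C(\lambda,a,b)$ and illustrates that, for a reasonably large sample, the criterion is smaller at the generating parameters than at clearly wrong ones. Minimizing it with an optimizer then yields the CVEs.

```python
import random


def cdf(x, lam, a, b):
    """Equation (1)."""
    s = 1.0 - (1.0 - (1.0 + x) ** (-a)) ** b
    return (1.0 - s ** lam) / (1.0 + s ** lam)


def sample(n, lam, a, b, seed=0):
    """Inverse-transform draws X = Q(U), U ~ U(0, 1), using the TIHLIK quantile function."""
    rng, out = random.Random(seed), []
    while len(out) < n:
        y = rng.random()
        if y > 0.0:
            s = ((1.0 - y) / (1.0 + y)) ** (1.0 / lam)
            out.append((1.0 - (1.0 - s) ** (1.0 / b)) ** (-1.0 / a) - 1.0)
    return out


def cvm_objective(theta, xs):
    """C(lam, a, b) = 1/(12 n) + sum_i [F(x_(i)) - (2i - 1)/(2n)]^2."""
    xs = sorted(xs)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x, *theta) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )
```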

Simulation Study
In this section, we present a numerical study comparing the behavior of the different estimates presented above. We generate N = 1000 random samples of sizes n = 30, 50, and 100 from the TIHLIK distribution. Four sets of parameters are considered: Set1: (a = 1.5, λ = 2, b = 2), Set2: (a = 2, λ = 2, b = 2), Set3: (a = 1.5, λ = 3, b = 2), and Set4: (a = 1.5, λ = 3, b = 3). The MLEs, LSEs, WLSEs, CVEs, and right-tail Anderson-Darling estimates (RTADEs) of λ, a, and b are determined. We also compute their mean estimates $\mathrm{Est.}=(1/N)\sum_{i=1}^N\hat\varepsilon_i$ and their mean square errors (MSEs), i.e., $\mathrm{MSE}=(1/N)\sum_{i=1}^N(\hat\varepsilon_i-\varepsilon)^2$. Here, for a given method, ε corresponds to λ, a, or b, and $\hat\varepsilon_i$ denotes the estimate of ε obtained from the i-th random sample. All the numerical values are reported in Tables 2-5.
Table 2. Estimates and mean square errors (MSEs) of the TIHLIK model for maximum likelihood (ML), least squares (LS), weighted least squares (WLS), Cramer-von Mises minimum distance (CV), and right-tail Anderson-Darling (RTAD) estimates for Set1, i.e., a = 1.5, λ = 2, b = 2.
For all methods of estimation, the MSEs of a, λ, and b decrease as n increases. Tables 2-5 show that the MLEs have the smallest MSEs for a, λ, and b in all situations, even though some extra bias is observable.
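The two summary statistics of the simulation study are simple averages over the N replications. The Python sketch below codes them. To show the surrounding loop structure, it adds a toy Monte Carlo that is our own illustration and not one of the paper's methods. The toy study estimates the median $M=Q(1/2)$ by the sample median under the Set1 parameters, which keeps the example free of numerical optimization. All names are hypothetical.

```python
import random
import statistics


def est_and_mse(estimates, true_value):
    """Est. = (1/N) sum_i eps_hat_i and MSE = (1/N) sum_i (eps_hat_i - eps)^2 over N replications."""
    n_rep = len(estimates)
    est = sum(estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    return est, mse


def quantile(y, lam, a, b):
    """TIHLIK quantile function Q(y; lam, a, b)."""
    s = ((1.0 - y) / (1.0 + y)) ** (1.0 / lam)
    return (1.0 - (1.0 - s) ** (1.0 / b)) ** (-1.0 / a) - 1.0


def toy_study(n, n_rep=1000, theta=(2.0, 1.5, 2.0), seed=1):
    """Toy Monte Carlo (illustration only): sample median as an estimator of M = Q(1/2)."""
    rng = random.Random(seed)
    target = quantile(0.5, *theta)
    estimates = []
    for _ in range(n_rep):
        xs = [quantile(rng.random(), *theta) for _ in range(n)]
        estimates.append(statistics.median(xs))
    return est_and_mse(estimates, target)
```

As in Tables 2-5, the MSE of this toy estimator shrinks as n grows from 30 to 100.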

Applications to Practical Data Sets
This section provides applications of the TIHLIK model to two practical data sets. We compare the TIHLIK model with some recent competing models: those corresponding to the IK distribution, the generalized inverted Kumaraswamy (GIK) distribution proposed by [22], the Marshall-Olkin extended inverted Kumaraswamy (MOEIK) distribution introduced by [23], and the Topp-Leone generalized inverted Kumaraswamy (TLGIK) distribution developed by [24]. The corresponding pdfs are presented below.

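The pdf of the IK distribution is $g(x;a,b)=ab(1+x)^{-(a+1)}[1-(1+x)^{-a}]^{b-1}$, $x>0$, obtained by differentiating the IK cdf. The pdfs of the MOEIK, GIK, and TLGIK distributions are given in [23], [22], and [24], respectively, where all the involved parameters λ, α, β, θ are positive.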
Since they have no analytical expressions, the MLEs of the model parameters are computed using an iterative optimization technique: the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints on the variables (L-BFGS-B). The following well-known goodness-of-fit measures are computed: minus the maximized log-likelihood ($-\hat\ell$), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Anderson-Darling ($A^*$) and Cramer-von Mises ($W^*$) statistics. The lower the values of these criteria, the better the fit. The value of the Kolmogorov-Smirnov (KS) statistic, along with its p-value, is also provided. We recall that the required computations were carried out with the R software.
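For reference, the information criteria and the empirical-distribution statistics can be computed from a fitted cdf and the maximized log-likelihood. The Python sketch below is ours, and the name `gof_measures` is hypothetical. It computes the classical $W^2$ and $A^2$ statistics. The starred versions $W^*$ and $A^*$ reported in the paper may include additional corrections, so the values need not coincide. The KS p-value is not computed here.

```python
import math


def gof_measures(xs, cdf, max_loglik, k):
    """AIC, BIC, KS statistic D, and the classical Cramer-von Mises W^2 and Anderson-Darling A^2.
    cdf: fitted cdf as a one-argument function; max_loglik: maximized log-likelihood; k: number of parameters."""
    n = len(xs)
    u = sorted(cdf(x) for x in xs)                  # u_(1) <= ... <= u_(n), stored 0-based
    aic = 2 * k - 2 * max_loglik
    bic = k * math.log(n) - 2 * max_loglik
    ks = max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))
    w2 = 1.0 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u))
    a2 = -n - sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
                  for i in range(n)) / n
    return {"AIC": aic, "BIC": bic, "KS": ks, "W2": w2, "A2": a2}
```

For the TIHLIK model, `cdf` would be Equation (1) evaluated at the MLEs, with `k = 3`.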
The first data set (data set 1) consists of 30 observations of March precipitation (in inches) in Minneapolis/Saint Paul. It was originally reported by [25]. A first analysis of the data is given in Table 6, with descriptive statistics of the two considered data sets. Figures 9 and 10 present the boxplots and total time on test (TTT) plots for data sets 1 and 2, respectively. In particular, we see that both TTT plots are concave, indicating that the underlying hrfs are increasing, as the hrf of the TIHLIK distribution can be for some values of the parameters. We refer the reader to [27] for more facts about the TTT plot. The use of the TIHLIK model for these data sets is thus pertinent at first glance; this will be refined below. Tables 7 and 8 show the MLEs of the considered models along with the corresponding standard errors for data sets 1 and 2, respectively. The goodness-of-fit measures for the considered models are given in Tables 9 and 10 for data sets 1 and 2, respectively. Probability-Probability (P-P) plots and Quantile-Quantile (Q-Q) plots for the estimated TIHLIK model are presented in Figures 11 and 12 for data sets 1 and 2, respectively. The scatter points lie close to the line of equation y = x, illustrating that the TIHLIK model is adequate for the considered data. In order to visualize the obtained fits, plots of the estimated pdfs over the histograms corresponding to data sets 1 and 2 are shown in Figures 13 and 14, respectively.

Figure 11. Probability-Probability (P-P) plot and Quantile-Quantile (Q-Q) plot of the estimated TIHLIK model for data set 1.
The values of the AIC, BIC, A*, and W* in Tables 9 and 10 are the smallest for the TIHLIK model, so we conclude that it provides the best fit for the considered data sets. This is also confirmed by the p-value of the KS test. The superiority of the TIHLIK model over the others is particularly evident for data set 2 in view of the fits in Figure 14. Last but not least, the TIHLIK model has only three parameters; it is thus less complex than the TLGIK and MOEIK models, which both have one more parameter.

Conclusions
In this paper, we introduce a new three-parameter lifetime distribution called the type I half-logistic inverted Kumaraswamy (TIHLIK) distribution, following the methodology described in Appendix A. Its main mathematical and statistical properties are derived, including mixture representations of crucial functions, shapes and asymptotes, the quantile function, ordinary moments, skewness and kurtosis, incomplete moments, the stress strength parameter, and order statistics. Several methods are investigated to estimate the TIHLIK model parameters, and their efficiency is supported by a simulation study with varying sample sizes. We use two practical data sets to show that the new model can provide adequate fits as compared to four rivals, also based on the inverted Kumaraswamy distribution. In this setting, the smallest values of AIC, BIC, W*, A*, and KS are obtained for the TIHLIK model. We thus believe that it can be of interest to statisticians seeking precise fits for various data sets, including those arising from sophisticated experiments.