Truncated Inverted Kumaraswamy Generated Family of Distributions with Applications

In this article, we introduce a new general family of distributions derived from the truncated inverted Kumaraswamy distribution (on the unit interval), called the truncated inverted Kumaraswamy generated family. Among its qualities, it is characterized by tractable functions, can enhance the flexibility of a given baseline distribution, and has desirable statistical properties, including competitive fits for various kinds of data. Particular attention is given to a special member of the family defined with the exponential distribution as baseline, which yields a new three-parameter lifetime distribution. This new distribution has a hazard rate function that can be monotonically increasing, monotonically decreasing, or upside-down bathtub shaped. In full generality, important properties of the new family are determined, with an emphasis on the entropy (Rényi and Shannon entropy). The model parameters are estimated by the maximum likelihood method. A numerical simulation study illustrates the good performance of the obtained estimates. Two practical data sets are then analyzed. They illustrate the potential of the new model in terms of fitting, with favorable results in comparison with other modern parametric models from the literature.


Introduction
The inverted Kumaraswamy distribution was introduced by [1], with the motivation to offer a new flexible lifetime distribution with tractable distributional properties. As suggested by its name, it corresponds to the distribution of the random variable V = (1 − U)/U, where U follows the standard Kumaraswamy distribution (more detail on the Kumaraswamy distribution can be found in the original work of [2]). Thus, it is characterized by the cumulative distribution function (cdf) given by
$$F_{\mathrm{IK}}(x; a, b) = \left[1 - (1 + x)^{-a}\right]^{b}, \quad x > 0, \tag{1}$$
with a, b > 0. Upon differentiation, the corresponding probability density function (pdf) is given by
$$f_{\mathrm{IK}}(x; a, b) = a b (1 + x)^{-a-1} \left[1 - (1 + x)^{-a}\right]^{b-1}, \quad x > 0. \tag{2}$$
From an analytical point of view, it also corresponds to a special case of the exponentiated Lomax distribution introduced by [3] (with λ = 1), which has been widely used in data analysis over the last decade.
With the aim of exploring new statistical horizons, we use the qualities of the inverted Kumaraswamy distribution to create a new general family of distributions. That is, we propose to truncate the inverted Kumaraswamy distribution on the unit interval and to compose it with a general cdf of a continuous distribution. Such a truncation technique has been employed with success to define new general families from well-established distributions on the half-line (0, +∞). See, for instance, [4] who introduced the truncated Fréchet-G family (by using the truncated Fréchet distribution on (0, 1)), [5] who proposed the truncated Weibull-G family (by using the truncated Weibull distribution on (0, 1)), and [6] who developed the truncated Burr-G family (by using the truncated Burr distribution on (0, 1)). However, to the best of our knowledge, the consideration of the truncated inverted Kumaraswamy distribution on (0, 1) in this setting remains new and motivates this study. For the mathematical foundation, the cdf of the truncated inverted Kumaraswamy distribution on (0, 1) is given by
$$F_{\mathrm{TIK}}(x; a, b) = \frac{F_{\mathrm{IK}}(x; a, b)}{F_{\mathrm{IK}}(1; a, b)} = (1 - 2^{-a})^{-b} \left[1 - (1 + x)^{-a}\right]^{b}, \quad x \in (0, 1). \tag{3}$$
Thus, the restriction on the support implies an adjustment of the normalization constant depending on the two shape parameters a and b, which is now $(1 - 2^{-a})^{-b}$. Then, for any cdf of a continuous distribution G(x; ξ), by a natural composition technique, we introduce the cdf given by
$$F(x; a, b, \xi) = (1 - 2^{-a})^{-b} \left[1 - (1 + G(x; \xi))^{-a}\right]^{b}, \quad x \in \mathbb{R}, \tag{4}$$
defining the truncated inverted Kumaraswamy generated (TIK-G) family of distributions. One can notice that the TIK-G family is defined with a simple cdf, offering a tractable alternative to other families that are sometimes defined with sophisticated cdfs. This study discusses the main distributional and practical properties of the TIK-G family, with an emphasis on entropy, as well as its potential for applications.
We also introduce a special member of the family defined with the exponential distribution as baseline, forming a new three-parameter lifetime distribution called the truncated inverted Kumaraswamy exponential (TIKEx) distribution. By considering two practical data sets, we show that the related model fits better than nine other well-established models, which illustrates the interest of the TIK-G family of distributions in a data analysis setting.
The rest of the article is organized as follows. The main distributional functions related to the TIK-G family are presented in Section 2, with a discussion of special members of interest. Section 3 is devoted to the mathematical properties of the TIK-G family, with a focus on the entropy. In Section 4, the maximum likelihood method is employed to obtain the estimates of the model parameters. In Section 5, we apply the TIKEx model to two practical data sets, with a fair comparison to other well-established models. We provide a conclusion in Section 6.

The TIK-G Family
Here, we present the main functions related to the TIK-G family and a short list of special members, and we discuss the special member of the family defined with the exponential distribution as baseline.

Main Functions
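We recall that the TIK-G family is defined by the cdf given by (4). The corresponding survival function (sf) is given by
$$S(x; a, b, \xi) = 1 - (1 - 2^{-a})^{-b} \left[1 - (1 + G(x; \xi))^{-a}\right]^{b}. \tag{5}$$
Upon differentiation of F(x; a, b, ξ) with respect to x, the corresponding pdf is given by
$$f(x; a, b, \xi) = a b (1 - 2^{-a})^{-b} g(x; \xi) (1 + G(x; \xi))^{-a-1} \left[1 - (1 + G(x; \xi))^{-a}\right]^{b-1}, \tag{6}$$
where g(x; ξ) is the pdf corresponding to G(x; ξ). The corresponding hazard rate function (hrf) is given by
$$h(x; a, b, \xi) = \frac{f(x; a, b, \xi)}{S(x; a, b, \xi)} = \frac{a b\, g(x; \xi) (1 + G(x; \xi))^{-a-1} \left[1 - (1 + G(x; \xi))^{-a}\right]^{b-1}}{(1 - 2^{-a})^{b} - \left[1 - (1 + G(x; \xi))^{-a}\right]^{b}}. \tag{7}$$
The corresponding cumulative hazard rate function (chrf) is given by
$$H(x; a, b, \xi) = -\log S(x; a, b, \xi) = -\log\left\{1 - (1 - 2^{-a})^{-b} \left[1 - (1 + G(x; \xi))^{-a}\right]^{b}\right\}. \tag{8}$$
The corresponding quantile function (qf), say Q(u; a, b, ξ), is characterized by the equation F(Q(u; a, b, ξ); a, b, ξ) = u, for u ∈ (0, 1). After some algebra, we obtain
$$Q(u; a, b, \xi) = Q_G\left(\left[1 - u^{1/b} (1 - 2^{-a})\right]^{-1/a} - 1; \xi\right), \tag{9}$$
where $Q_G(u; \xi)$ denotes the qf corresponding to G(x; ξ). Among the important quantities related to the qf, one can mention the median, defined by Med = Q(1/2; a, b, ξ), and the interquartile range, defined by
$$\mathrm{IQR} = Q(3/4; a, b, \xi) - Q(1/4; a, b, \xi).$$
Upon differentiation of Q(u; a, b, ξ) with respect to u, the corresponding quantile density function (qdf) is given by
$$q(u; a, b, \xi) = \frac{1 - 2^{-a}}{a b}\, u^{1/b - 1} \left[1 - u^{1/b} (1 - 2^{-a})\right]^{-1/a - 1} q_G\left(\left[1 - u^{1/b} (1 - 2^{-a})\right]^{-1/a} - 1; \xi\right), \tag{10}$$
where $q_G(u; \xi)$ denotes the qdf corresponding to G(x; ξ).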
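The functions of this subsection can be sketched numerically for an arbitrary baseline. The snippet below is a minimal illustration, not the authors' code: it assumes the cdf form F = (1 − 2^{−a})^{−b} [1 − (1 + G)^{−a}]^{b}, which follows from the normalization constant stated in the Introduction, and the names `tikg_cdf`, `tikg_pdf`, and `tikg_qf` are hypothetical. The baseline is passed as plain callables, here the exponential distribution.

```python
import math

def tikg_cdf(x, a, b, G):
    """Assumed TIK-G cdf: ((1 - (1 + G(x))^-a) / (1 - 2^-a))^b."""
    return ((1.0 - (1.0 + G(x)) ** (-a)) / (1.0 - 2.0 ** (-a))) ** b

def tikg_pdf(x, a, b, G, g):
    """Derivative of the assumed cdf with respect to x."""
    Gx = G(x)
    core = 1.0 - (1.0 + Gx) ** (-a)
    norm = (1.0 - 2.0 ** (-a)) ** b
    return a * b * g(x) * (1.0 + Gx) ** (-a - 1.0) * core ** (b - 1.0) / norm

def tikg_qf(u, a, b, QG):
    """Quantile function obtained by inverting the assumed cdf."""
    w = (1.0 - u ** (1.0 / b) * (1.0 - 2.0 ** (-a))) ** (-1.0 / a) - 1.0
    return QG(w)

# Exponential baseline with an illustrative rate theta = 1.5.
theta = 1.5
G = lambda x: 1.0 - math.exp(-theta * x)
g = lambda x: theta * math.exp(-theta * x)
QG = lambda u: -math.log(1.0 - u) / theta

median = tikg_qf(0.5, 2.0, 3.0, QG)
iqr = tikg_qf(0.75, 2.0, 3.0, QG) - tikg_qf(0.25, 2.0, 3.0, QG)
```

Passing the baseline as callables keeps the sketch independent of any particular choice of G; swapping in another continuous baseline only requires its cdf, pdf, and qf.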

The TIKEx Distribution
The TIKEx distribution is defined by the following cdf:
$$F(x; a, b, \theta) = (1 - 2^{-a})^{-b} \left[1 - (2 - e^{-\theta x})^{-a}\right]^{b}, \quad x > 0. \tag{11}$$
It corresponds to the special member of the TIK-G family defined with the cdf of the exponential distribution with parameter θ as baseline, i.e., ξ = θ and $G(x; \theta) = 1 - e^{-\theta x}$, x, θ > 0. It thus constitutes a new three-parameter lifetime distribution and is the main focus of the rest of the study.

Remark 1.
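Let us notice that, by taking a = 1, the cdf given by (11) is reduced to
$$F(x; 1, b, \theta) = \left[\frac{2 (1 - e^{-\theta x})}{2 - e^{-\theta x}}\right]^{b} = \left[\frac{1 - e^{-\theta x}}{1 - e^{-\theta x}/2}\right]^{b}, \quad x > 0, \tag{12}$$
which corresponds to the exponentiated cdf of the Marshall-Olkin-G family introduced by [7] defined with α = 1/2 and with the cdf of the exponential distribution with parameter θ as baseline (or the M transformation of the exponential distribution introduced by [8]). To the best of our knowledge, it is new in the literature. Hence, in our study, we consider a generalization of it thanks to the shape parameter a.
The corresponding pdf is given by
$$f(x; a, b, \theta) = a b \theta (1 - 2^{-a})^{-b} e^{-\theta x} (2 - e^{-\theta x})^{-a-1} \left[1 - (2 - e^{-\theta x})^{-a}\right]^{b-1}, \quad x > 0. \tag{13}$$
The corresponding hrf and qf are, respectively, given by
$$h(x; a, b, \theta) = \frac{a b \theta\, e^{-\theta x} (2 - e^{-\theta x})^{-a-1} \left[1 - (2 - e^{-\theta x})^{-a}\right]^{b-1}}{(1 - 2^{-a})^{b} - \left[1 - (2 - e^{-\theta x})^{-a}\right]^{b}}, \quad x > 0, \tag{14}$$
and
$$Q(u; a, b, \theta) = -\frac{1}{\theta} \log\left\{2 - \left[1 - u^{1/b} (1 - 2^{-a})\right]^{-1/a}\right\}, \quad u \in (0, 1). \tag{15}$$
In particular, the median of the TIKEx distribution is given by Med = Q(1/2; a, b, θ).
In order to illustrate the flexibility of the shapes of f(x; a, b, θ) and h(x; a, b, θ), Figures 1 and 2 display the plots of f(x; a, b, θ) and h(x; a, b, θ), respectively, for some values of the parameters a, b, and θ. We observe that the pdf can be left skewed, reversed-J shaped, or approximately symmetrical, while the hrf can be increasing, decreasing, or upside-down bathtub shaped.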

Properties
This section is devoted to the most fundamental properties of the TIK-G family, with an emphasis on the entropy. The general properties are also applied to the TIKEx distribution as an illustration.

Some Series Expansions
The following result presents a series expansion for the pdf of the TIK-G family.

Remark 2.
By following the lines of the proof of Proposition 1, and by applying the generalized binomial formula to $(1 + G(x; \xi))^{-ak}$ as
$$(1 + G(x; \xi))^{-ak} = \sum_{\ell=0}^{+\infty} \binom{-ak}{\ell} G(x; \xi)^{\ell},$$
we also have the following series expansion in terms of pdfs of the exponentiated-G family (see [9]): where $g_{\ell}(x; \xi) = (\ell + 1) g(x; \xi) G(x; \xi)^{\ell}$ and
Two different generalizations of Proposition 1 are given in Propositions 2 and 3.

Proposition 2.
Let κ > 0. Then, for x such that G(x; ξ) ∈ (0, 1), we have the following series expansion: where $v^{(\kappa)}_{k,\ell}$

Proof. Following the lines of the proof of Proposition 1, by replacing b by b(κ + 1), we get The series expansion of $f(x; a, b, \xi) F(x; a, b, \xi)^{\kappa}$ is obtained upon differentiation of $F(x; a, b, \xi)^{\kappa+1}$ with respect to x and a change of indices. The proof of Proposition 2 is completed.

Now, we propose an expansion for the exponentiated pdf of the TIK-G family. where

Proof. We have , and $S_G(x; \xi)/2 \in (0, 1)$, the generalized binomial formula gives and Therefore, by putting the above equalities together, we get the desired result. The proof of Proposition 3 is completed.

To end this section, let us notice that, if G(x; ξ) is the cdf of the exponential distribution with parameter θ, i.e., $G(x; \theta) = 1 - e^{-\theta x}$, x, θ > 0, then, for any positive integer ℓ, we have which is the pdf of the exponential distribution with parameter (ℓ + 1)θ. Also, for any positive real number κ, we have where $\psi^{*}_{\ell}(x; \theta)$ denotes the pdf of the exponential distribution with parameter (κ + ℓ)θ. We thus take advantage of the above results in this setting; the well-established distributional properties of the exponential distribution are useful to determine those of the TIKEx distribution.

Critical Points of the pdf and hrf
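The critical point(s) of f(x; a, b, ξ) is/are given by the solution(s) of the equation ∂ log f(x; a, b, ξ)/∂x = 0, implying that
$$\frac{g'(x; \xi)}{g(x; \xi)} - (a + 1) \frac{g(x; \xi)}{1 + G(x; \xi)} + a (b - 1) \frac{g(x; \xi) (1 + G(x; \xi))^{-a-1}}{1 - (1 + G(x; \xi))^{-a}} = 0,$$
where g'(x; ξ) denotes the derivative of g(x; ξ) with respect to x. The nature of a critical point of f(x; a, b, ξ), say $x_{*}$, depends on the sign of $\partial^{2} \log f(x; a, b, \xi)/\partial x^{2}$ evaluated at $x = x_{*}$. The same approach can be applied to the critical point(s) of h(x; a, b, ξ); it/they is/are given by the solution(s) of the equation ∂ log h(x; a, b, ξ)/∂x = 0, that is,
$$\frac{g'(x; \xi)}{g(x; \xi)} - (a + 1) \frac{g(x; \xi)}{1 + G(x; \xi)} + a (b - 1) \frac{g(x; \xi) (1 + G(x; \xi))^{-a-1}}{1 - (1 + G(x; \xi))^{-a}} + h(x; a, b, \xi) = 0.$$
The nature of a critical point of h(x; a, b, ξ), say $x_{o}$, depends on the sign of $\partial^{2} \log h(x; a, b, \xi)/\partial x^{2}$ evaluated at $x = x_{o}$. In the context of the TIKEx distribution, the critical point(s) of f(x; a, b, θ) is/are the solution(s) of the following equation in x:
$$-\theta - (a + 1) \frac{\theta e^{-\theta x}}{2 - e^{-\theta x}} + a (b - 1) \frac{\theta e^{-\theta x} (2 - e^{-\theta x})^{-a-1}}{1 - (2 - e^{-\theta x})^{-a}} = 0,$$
and the critical point(s) of h(x; a, b, θ) is/are the solution(s) of the following equation in x:
$$-\theta - (a + 1) \frac{\theta e^{-\theta x}}{2 - e^{-\theta x}} + a (b - 1) \frac{\theta e^{-\theta x} (2 - e^{-\theta x})^{-a-1}}{1 - (2 - e^{-\theta x})^{-a}} + \frac{a b \theta\, e^{-\theta x} (2 - e^{-\theta x})^{-a-1} \left[1 - (2 - e^{-\theta x})^{-a}\right]^{b-1}}{(1 - 2^{-a})^{b} - \left[1 - (2 - e^{-\theta x})^{-a}\right]^{b}} = 0.$$
Clearly, the nature of a critical point depends on a, b, and θ, and no closed form exists. For given values of a, b, and θ, the critical points can be determined numerically by using mathematical software (R, Matlab, Mathematica, Python, etc.). We refer the reader to Figures 1 and 2 for graphical illustrations of these critical points.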
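As an illustration of the numerical determination mentioned above, the following sketch locates the mode of the TIKEx pdf by a golden-section search and checks it against the log-derivative equation. It is only a sketch under assumptions: the pdf is taken as a b θ (1 − 2^{−a})^{−b} e^{−θx} (2 − e^{−θx})^{−a−1} [1 − (2 − e^{−θx})^{−a}]^{b−1}, the parameter values (a, b, θ) = (2, 3, 1) are arbitrary, the search assumes a single interior mode on the bracket, and all function names are hypothetical.

```python
import math

def tikex_pdf(x, a, b, theta):
    # Assumed TIKEx pdf (TIK-G pdf with exponential baseline).
    e = math.exp(-theta * x)
    core = 1.0 - (2.0 - e) ** (-a)
    norm = (1.0 - 2.0 ** (-a)) ** b
    return a * b * theta * e * (2.0 - e) ** (-a - 1.0) * core ** (b - 1.0) / norm

def dlog_pdf(x, a, b, theta):
    # Derivative of log f with respect to x; its zeros are the critical points.
    e = math.exp(-theta * x)
    t = 2.0 - e
    return (-theta
            - (a + 1.0) * theta * e / t
            + a * (b - 1.0) * theta * e * t ** (-a - 1.0) / (1.0 - t ** (-a)))

def golden_max(func, lo, hi, tol=1e-10):
    # Golden-section search for the maximizer of a unimodal function on [lo, hi].
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if func(c) > func(d):
            hi = d  # the maximizer lies in [lo, d]
        else:
            lo = c  # the maximizer lies in [c, hi]
    return 0.5 * (lo + hi)

a, b, theta = 2.0, 3.0, 1.0  # illustrative values only
mode = golden_max(lambda x: tikex_pdf(x, a, b, theta), 1e-9, 10.0)
residual = dlog_pdf(mode, a, b, theta)  # should be close to zero at the mode
```

A derivative-free search is used here so that the sketch needs only the standard library; a root finder applied to `dlog_pdf` would be an equally reasonable choice. The same pattern applies to the hrf by replacing the target function.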

Moments
Hereafter, we consider a random variable X having the cdf of the TIK-G family given by (4). Assuming that it exists, for any positive integer s, the s-th moment of X is given by
$$\mu_s = E(X^s) = \int_{-\infty}^{+\infty} x^s f(x; a, b, \xi)\, dx.$$
For given G(x; ξ), a and b, we can evaluate this integral numerically. From an analytical point of view, Proposition 1 can be useful. Indeed, assuming that the sum and integral signs can be interchanged, we have where $u_{k,\ell}$ is given by (18) and the integral term can be calculated in a simple way, depending on the complexity of G(x; ξ). For instance, in the context of the TIKEx distribution, we have ξ = θ and Hence, in this special case, we have The mean of X is given by $\mu = \mu_1 = E(X)$ and the variance of X is given by $\sigma^2 = E((X - \mu)^2) = \mu_2 - \mu^2$. Also, the s-th general coefficient of X is given by For s = 1, we obtain $C_1$, which is used to define the coefficient of variation of X as This coefficient is a useful standardized measure of dispersion. For s = 3, $C_s$ becomes the skewness coefficient of X and, for s = 4, it becomes the kurtosis coefficient of X; these are traditionally used to evaluate the asymmetry and the peakedness of the corresponding distribution, respectively. We end this subsection by giving numerical values of some central, dispersion, skewness, and kurtosis parameters for the TIKEx distribution in Table 1.
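The numerical evaluation of the moments can be sketched as follows for the TIKEx distribution. This is a minimal illustration under assumptions: the pdf is taken as a b θ (1 − 2^{−a})^{−b} e^{−θx} (2 − e^{−θx})^{−a−1} [1 − (2 − e^{−θx})^{−a}]^{b−1}, the integral is truncated at 60/θ (where the exponential tail is negligible), a composite Simpson rule replaces a library integrator, and b ≥ 1 is assumed so that the integrand is bounded at 0. The helper names are hypothetical.

```python
import math

def tikex_pdf(x, a, b, theta):
    # Assumed TIKEx pdf (TIK-G pdf with exponential baseline).
    e = math.exp(-theta * x)
    core = 1.0 - (2.0 - e) ** (-a)
    norm = (1.0 - 2.0 ** (-a)) ** b
    return a * b * theta * e * (2.0 - e) ** (-a - 1.0) * core ** (b - 1.0) / norm

def simpson(func, lo, hi, n=20000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * h)
    return total * h / 3.0

def tikex_moment(s, a, b, theta):
    # s-th raw moment E(X^s), with the integral truncated at 60 / theta.
    upper = 60.0 / theta
    return simpson(lambda x: x ** s * tikex_pdf(x, a, b, theta), 0.0, upper)

a, b, theta = 2.0, 3.0, 1.0  # illustrative values only
mean = tikex_moment(1, a, b, theta)
variance = tikex_moment(2, a, b, theta) - mean ** 2
cv = math.sqrt(variance) / mean  # coefficient of variation, sigma / mu
```

A convenient sanity check is the case a = b = 1, where the survival function reduces to e^{−θx}/(2 − e^{−θx}) and the mean equals log(2)/θ exactly.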

Probability Weighted Moments
Assuming that it exists, for any positive integers s and t, the (s, t)-probability weighted moment of X is given by
$$E\left(X^s F(X; a, b, \xi)^t\right) = \int_{-\infty}^{+\infty} x^s f(x; a, b, \xi) F(x; a, b, \xi)^t\, dx.$$
Again, for given G(x; ξ), a and b, this integral can be evaluated numerically. One can also use Proposition 2 in the following manner. Assuming that the sum and integral signs can be interchanged, we have where $v^{(t)}_{k,\ell}$ is given by (24). In the context of the TIKEx distribution, we have The probability weighted moments appear naturally in many applied areas, such as those using order statistics. We refer the reader to [10].

Incomplete Moments
Assuming that it exists, for any positive integer s, the s-th incomplete moment of X is given by
$$\mu_s(y) = E(Y_y^s) = \int_{-\infty}^{y} x^s f(x; a, b, \xi)\, dx,$$
where $Y_y$ is a random variable such that $Y_y = X$ if X ≤ y and $Y_y = 0$ elsewhere. Owing to Proposition 1, we can express $\mu_s(y)$ as For the special case of the TIKEx distribution, we have where $\gamma(a, x) = \int_0^x y^{a-1} e^{-y}\, dy$ is the lower incomplete gamma function. Among the possible applications of the incomplete moments, we would like to mention the Lorenz and Bonferroni curves, which use the first incomplete moment; they are, respectively, defined by Numerous real-life applications have employed such curves. We refer the reader to [11] and [12], respectively. Figure 3 shows the plots of these curves in the context of the TIKEx distribution for selected values of the parameters.
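The first incomplete moment and the two curves can be sketched numerically as follows. Several choices here are assumptions rather than the paper's stated definitions: the Lorenz curve is taken as L(y) = µ_1(y)/µ and the Bonferroni curve as B(y) = L(y)/F(y) (other parametrizations in terms of a probability level also exist), the TIKEx cdf and pdf are taken in the forms written in the comments, the mean is approximated by truncating at 60/θ, and all function names are hypothetical.

```python
import math

def tikex_cdf(x, a, b, theta):
    # Assumed TIKEx cdf: ((1 - (2 - e^{-theta x})^-a) / (1 - 2^-a))^b.
    e = math.exp(-theta * x)
    return ((1.0 - (2.0 - e) ** (-a)) / (1.0 - 2.0 ** (-a))) ** b

def tikex_pdf(x, a, b, theta):
    # Derivative of the assumed cdf.
    e = math.exp(-theta * x)
    core = 1.0 - (2.0 - e) ** (-a)
    norm = (1.0 - 2.0 ** (-a)) ** b
    return a * b * theta * e * (2.0 - e) ** (-a - 1.0) * core ** (b - 1.0) / norm

def simpson(func, lo, hi, n=20000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * h)
    return total * h / 3.0

def first_incomplete_moment(y, a, b, theta):
    # mu_1(y): integral of x f(x) over (0, y).
    return simpson(lambda x: x * tikex_pdf(x, a, b, theta), 0.0, y)

def lorenz(y, a, b, theta):
    # Assumed definition: L(y) = mu_1(y) / mu, with mu truncated at 60 / theta.
    mu = first_incomplete_moment(60.0 / theta, a, b, theta)
    return first_incomplete_moment(y, a, b, theta) / mu

def bonferroni(y, a, b, theta):
    # Assumed definition: B(y) = L(y) / F(y).
    return lorenz(y, a, b, theta) / tikex_cdf(y, a, b, theta)

a, b, theta = 2.0, 3.0, 1.0  # illustrative values only
L_half = lorenz(0.5, a, b, theta)
B_half = bonferroni(0.5, a, b, theta)
```

With these definitions, L(y) ≤ F(y) for every y, so the Bonferroni curve stays below 1, which gives a quick consistency check on the numerical output.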

Entropy
The entropy of a random variable X is a measure of variation of the uncertainty: high entropy means high uncertainty. The entropy plays a fundamental role in information theory, where several entropy measures have been introduced. We refer the reader to the review of [13], and the references therein. This subsection is devoted to two notable ones: the Rényi entropy and Shannon entropy.

Rényi Entropy
The Rényi entropy of X is defined by
$$I_\delta = \frac{1}{1 - \delta} \log\left[\int_{-\infty}^{+\infty} f(x; a, b, \xi)^{\delta}\, dx\right],$$
where δ > 0 and δ ≠ 1. The original work and motivations can be found in [14]. For suitable choices of G(x; ξ), a, b, and δ, it can be computed numerically. Also, owing to Proposition 3, we can express $I_\delta$ as where $w^{(\delta)}_{k,\ell}$ is defined by (27). As an example, for the TIKEx distribution, by using (32), we have implying that Numerical values of the Rényi entropy for the TIKEx distribution for various values of the parameters are documented in Table 2. In Table 2, we observe that the Rényi entropy can take negative and positive values belonging to the interval [−6.13, 1.66] for the considered values of δ, a, b, and θ. Thus, these parameters have a strong effect on the Rényi entropy, showing different degrees of uncertainty.
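The numerical computation of the Rényi entropy can be sketched as below, using the standard definition I_δ = (1 − δ)^{−1} log ∫ f^δ. This is an illustration under assumptions, not the procedure used for Table 2: the TIKEx pdf is taken in the form written in the comment, the support is truncated at a user-chosen upper bound, b ≥ 1 is assumed so that the integrand is bounded at 0, and the function names are hypothetical. The exponential distribution is included because its Rényi entropy is known in closed form, −log θ + log(δ)/(δ − 1).

```python
import math

def tikex_pdf(x, a, b, theta):
    # Assumed TIKEx pdf (TIK-G pdf with exponential baseline).
    e = math.exp(-theta * x)
    core = 1.0 - (2.0 - e) ** (-a)
    norm = (1.0 - 2.0 ** (-a)) ** b
    return a * b * theta * e * (2.0 - e) ** (-a - 1.0) * core ** (b - 1.0) / norm

def simpson(func, lo, hi, n=20000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * h)
    return total * h / 3.0

def renyi_entropy(pdf, delta, upper):
    # I_delta = log( integral of pdf^delta over (0, upper) ) / (1 - delta).
    if delta <= 0.0 or delta == 1.0:
        raise ValueError("delta must be positive and different from 1")
    integral = simpson(lambda x: pdf(x) ** delta, 0.0, upper)
    return math.log(integral) / (1.0 - delta)

# Closed-form check with the unit exponential pdf: I_2 = log(2).
exp_check = renyi_entropy(lambda x: math.exp(-x), 2.0, 60.0)

# TIKEx example with illustrative parameter values.
tik = lambda x: tikex_pdf(x, 2.0, 3.0, 1.0)
I_half = renyi_entropy(tik, 0.5, 120.0)
I_two = renyi_entropy(tik, 2.0, 120.0)
```

Since the Rényi entropy is non-increasing in δ, `I_half` should not be smaller than `I_two`; this ordering is a cheap check on the integration bounds.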

Shannon Entropy
The Shannon entropy of X is defined by
$$\eta = -E\left[\log f(X; a, b, \xi)\right] = -\int_{-\infty}^{+\infty} f(x; a, b, \xi) \log f(x; a, b, \xi)\, dx.$$
It was introduced by [15]. One can show that it is obtained by letting δ → 1 in the Rényi entropy presented above. Another expression comes from the former definition. Indeed, by using the expectation expression, we can write
$$\eta = -\log a - \log b + b \log(1 - 2^{-a}) - E\left[\log g(X; \xi)\right] + (a + 1) E\left[\log(1 + G(X; \xi))\right] - (b - 1) E\left[\log\left(1 - (1 + G(X; \xi))^{-a}\right)\right].$$
We now propose a series expansion for η. Owing to Proposition 1, we have Under some circumstances, the integral term can be determined. On the other hand, the series expansion of the logarithmic function gives and, with the application of the generalized binomial formula in a second step, For the expectation terms in the sums, one can notice that, for any positive integer κ, by Remark 2, (one can also use Proposition 1, but with more developments in this case). Some numerical values of the Shannon entropy for some values of the parameters are collected in Table 3. For the considered values of a, b, and θ in Table 3, the Shannon entropy takes its values in the interval [−0.95, 1.9]. Thus, the amount of uncertainty is impacted by these parameters, showing the richness of the TIKEx distribution in this sense.
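The link between the two entropies can be checked numerically. The sketch below computes the Shannon entropy as −∫ f log f and compares it with the Rényi entropy for δ close to 1. It is an illustration under assumptions: the TIKEx pdf is taken in the form written in the comment, the support is truncated at a chosen upper bound, the convention 0 log 0 = 0 is applied, and the helper names are hypothetical. The exponential distribution, whose Shannon entropy is 1 − log θ, serves as a closed-form check.

```python
import math

def tikex_pdf(x, a, b, theta):
    # Assumed TIKEx pdf (TIK-G pdf with exponential baseline).
    e = math.exp(-theta * x)
    core = 1.0 - (2.0 - e) ** (-a)
    norm = (1.0 - 2.0 ** (-a)) ** b
    return a * b * theta * e * (2.0 - e) ** (-a - 1.0) * core ** (b - 1.0) / norm

def simpson(func, lo, hi, n=20000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * h)
    return total * h / 3.0

def shannon_entropy(pdf, upper):
    # eta = - integral of f log f, with the convention 0 * log(0) = 0.
    def integrand(x):
        fx = pdf(x)
        return -fx * math.log(fx) if fx > 0.0 else 0.0
    return simpson(integrand, 0.0, upper)

def renyi_entropy(pdf, delta, upper):
    # I_delta = log( integral of pdf^delta ) / (1 - delta), delta != 1.
    integral = simpson(lambda x: pdf(x) ** delta, 0.0, upper)
    return math.log(integral) / (1.0 - delta)

# Closed-form check: exponential with rate 2 has entropy 1 - log(2).
exp_eta = shannon_entropy(lambda x: 2.0 * math.exp(-2.0 * x), 40.0)

# TIKEx example with illustrative parameter values.
tik = lambda x: tikex_pdf(x, 2.0, 3.0, 1.0)
eta = shannon_entropy(tik, 60.0)
near_limit = renyi_entropy(tik, 1.0001, 60.0)  # should be close to eta
```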

Maximum Likelihood Estimation
This section focuses on the estimation of the TIK-G model parameters by the maximum likelihood method.

Basics on the Maximum Likelihood Method
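Let $x_1, \ldots, x_n$ be a random sample of size n of X. Then, the log-likelihood function is defined by
$$\ell(a, b, \xi) = n \log a + n \log b - n b \log(1 - 2^{-a}) + \sum_{i=1}^{n} \log g(x_i; \xi) - (a + 1) \sum_{i=1}^{n} \log(1 + G(x_i; \xi)) + (b - 1) \sum_{i=1}^{n} \log\left[1 - (1 + G(x_i; \xi))^{-a}\right].$$
Assuming that ℓ(a, b, ξ) is differentiable with respect to a, b, and ξ, the maximum likelihood estimates (MLEs) are given by the simultaneous solutions of the following equations:
$$\frac{\partial \ell}{\partial a} = \frac{n}{a} - \frac{n b\, 2^{-a} \log 2}{1 - 2^{-a}} - \sum_{i=1}^{n} \log(1 + G(x_i; \xi)) + (b - 1) \sum_{i=1}^{n} \frac{(1 + G(x_i; \xi))^{-a} \log(1 + G(x_i; \xi))}{1 - (1 + G(x_i; \xi))^{-a}} = 0,$$
$$\frac{\partial \ell}{\partial b} = \frac{n}{b} - n \log(1 - 2^{-a}) + \sum_{i=1}^{n} \log\left[1 - (1 + G(x_i; \xi))^{-a}\right] = 0,$$
and, by setting $g_\xi(x; \xi) = \partial g(x; \xi)/\partial \xi$ and $G_\xi(x; \xi) = \partial G(x; \xi)/\partial \xi$,
$$\frac{\partial \ell}{\partial \xi} = \sum_{i=1}^{n} \frac{g_\xi(x_i; \xi)}{g(x_i; \xi)} - (a + 1) \sum_{i=1}^{n} \frac{G_\xi(x_i; \xi)}{1 + G(x_i; \xi)} + a (b - 1) \sum_{i=1}^{n} \frac{G_\xi(x_i; \xi) (1 + G(x_i; \xi))^{-a-1}}{1 - (1 + G(x_i; \xi))^{-a}} = 0.$$
Let us denote the MLEs of a, b, and ξ by $\hat{a}$, $\hat{b}$, and $\hat{\xi}$, respectively. Then, the equation $\partial \ell(\hat{a}, \hat{b}, \hat{\xi})/\partial b = 0$ yields the following simple relation:
$$\hat{b} = -n \left\{\sum_{i=1}^{n} \log\left[\frac{1 - (1 + G(x_i; \hat{\xi}))^{-\hat{a}}}{1 - 2^{-\hat{a}}}\right]\right\}^{-1}.$$
Under standard regularity conditions, all the well-established theoretical properties of the MLEs can be applied, allowing the construction of confidence intervals and statistical tests, among others. The complete theory can be found in [16].
To end this subsection, we would like to mention that, in the context of the TIKEx distribution, i.e., ξ = θ and $G(x; \theta) = 1 - e^{-\theta x}$, x, θ > 0, the above partial derivatives become $g_\theta(x; \theta) = (1 - \theta x) e^{-\theta x}$ and $G_\theta(x; \theta) = x e^{-\theta x}$.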
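The log-likelihood and the closed-form relation for the estimate of b can be sketched for the TIKEx model as follows. This is a minimal illustration under assumptions: the TIKEx pdf is taken as a b θ (1 − 2^{−a})^{−b} e^{−θx} (2 − e^{−θx})^{−a−1} [1 − (2 − e^{−θx})^{−a}]^{b−1}, the relation for b is the one obtained by setting the derivative of this log-likelihood with respect to b to zero, the five data values are made up for the example (they are not from the data sets of the paper), and the function names are hypothetical.

```python
import math

def tikex_loglik(data, a, b, theta):
    # Log-likelihood of the assumed TIKEx pdf.
    n = len(data)
    total = n * (math.log(a) + math.log(b) + math.log(theta))
    total -= n * b * math.log(1.0 - 2.0 ** (-a))
    for x in data:
        t = 2.0 - math.exp(-theta * x)  # 1 + G(x) for the exponential baseline
        total += -theta * x - (a + 1.0) * math.log(t)
        total += (b - 1.0) * math.log(1.0 - t ** (-a))
    return total

def profile_b(data, a, theta):
    # Value of b solving d loglik / d b = 0 for fixed a and theta.
    c = 1.0 - 2.0 ** (-a)
    s = 0.0
    for x in data:
        t = 2.0 - math.exp(-theta * x)
        s += math.log((1.0 - t ** (-a)) / c)
    return -len(data) / s

sample = [0.3, 0.8, 1.1, 2.5, 0.6]  # made-up values, for illustration only
a, theta = 2.0, 1.0
b_hat = profile_b(sample, a, theta)
loglik_at_b_hat = tikex_loglik(sample, a, b_hat, theta)
```

Because b can be profiled out in this way, a numerical optimizer only needs to search over a and θ (or a and ξ in general), which reduces the dimension of the maximization by one.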

Simulation
Here, we consider exclusively the TIKEx model. Let X be a random variable following the TIKEx distribution with parameters a, b, and θ. For each n = 50, 100, 200, 500, and 1000, we generate N = 1000 random samples of size n from X. This simulation is based on the fact that, for any random variable A following the uniform distribution U(0, 1), $x_A = Q(A; a, b, \theta)$ follows the TIKEx distribution with parameters a, b, and θ. We consider 8 different sets of parameters with b fixed as b = 2. Then, the performance of the MLEs is evaluated by considering the mean of the estimates (estimate) and the root-mean-squared error (RMSE), respectively defined by
$$\mathrm{Estimate} = \frac{1}{N} \sum_{i=1}^{N} \hat{\varphi}_i, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{\varphi}_i - \varphi)^2},$$
where φ denotes a, b, or θ and $\hat{\varphi}_i$ denotes the corresponding MLE obtained by using the i-th random sample. The numerical results, obtained by the use of the R software, are documented in Table 4. From Table 4, we see that the RMSEs of the model parameters decrease as n increases, which is consistent with the theory of the maximum likelihood method (see [16]).
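The simulation scheme can be sketched in a reduced form. The code below draws TIKEx samples by inverse transform, using the qf obtained by inverting the assumed cdf (1 − 2^{−a})^{−b} [1 − (2 − e^{−θx})^{−a}]^{b}, and reports the mean estimate and RMSE. It deliberately simplifies the study described above and should be read as an assumption-laden sketch: only b is estimated, with a and θ held at their true values so that the closed-form relation for the estimate of b can be used without an optimizer; the study in the text estimates all three parameters jointly in R. The parameter values, sample sizes, number of replications, seed, and function names are all illustrative.

```python
import math
import random

def tikex_qf(u, a, b, theta):
    # Quantile function obtained by inverting the assumed TIKEx cdf.
    w = (1.0 - u ** (1.0 / b) * (1.0 - 2.0 ** (-a))) ** (-1.0 / a) - 1.0
    return -math.log(1.0 - w) / theta

def profile_b(data, a, theta):
    # Closed-form estimate of b when a and theta are treated as known.
    c = 1.0 - 2.0 ** (-a)
    s = sum(math.log((1.0 - (2.0 - math.exp(-theta * x)) ** (-a)) / c) for x in data)
    return -len(data) / s

def simulate(a, b, theta, sizes=(50, 200), reps=200, seed=12345):
    # Returns {n: (mean estimate of b, RMSE of the estimate of b)}.
    rng = random.Random(seed)
    out = {}
    for n in sizes:
        estimates = []
        for _ in range(reps):
            # Inverse transform sampling; the bounds avoid u = 0 and u = 1.
            sample = [tikex_qf(rng.uniform(1e-12, 1.0 - 1e-12), a, b, theta)
                      for _ in range(n)]
            estimates.append(profile_b(sample, a, theta))
        mean_est = sum(estimates) / reps
        rmse = math.sqrt(sum((e - b) ** 2 for e in estimates) / reps)
        out[n] = (mean_est, rmse)
    return out

results = simulate(a=0.5, b=2.0, theta=1.5)
```

Even in this reduced setting, the RMSE should shrink as n grows, which mirrors the behavior reported for the full three-parameter study.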
The second data set consists of 179 values of successive failures of an air conditioning system. For the data and more detail, we refer the reader to [24,25].
First of all, we would like to mention that the following data analyses are performed by the use of the R software. Table 5 gives a first description of the data, revealing different natures, mainly in the range, the skewness, and the kurtosis. Figure 4 presents the total test time (TTT) plots for the two data sets. We can see that the curve in the first TTT plot is concave, which indicates that the first data set is related to an increasing failure rate. The curve in the second TTT plot is convex, indicating that the second data set is related to a decreasing failure rate (see [26] for further detail on the TTT plot). These cases are covered by the TIKEx distribution, motivating its use (see Figure 2).
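For readers who wish to reproduce a TTT plot, the scaled TTT transform can be sketched as below. It uses the usual empirical definition, T(i/n) = [x_(1) + ... + x_(i) + (n − i) x_(i)] / (x_1 + ... + x_n), where x_(i) is the i-th order statistic; this is an assumption about the construction behind Figure 4, the three data values are made up, and the function name is hypothetical.

```python
def scaled_ttt(sample):
    # Returns the points (i / n, T(i / n)) of the scaled TTT transform.
    xs = sorted(sample)
    n = len(xs)
    total = sum(xs)
    points = []
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x  # sum of the i smallest observations
        points.append((i / n, (running + (n - i) * x) / total))
    return points

points = scaled_ttt([3.0, 1.0, 2.0])  # made-up values, for illustration only
```

Plotting these points against the diagonal gives the visual diagnostic used in the text: a curve above the diagonal (concave) suggests an increasing failure rate, and a curve below it (convex) suggests a decreasing one.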
The model parameters are estimated by their MLEs; the essentials of the MLEs for the TIKEx model can be found in Section 4.1. For the first data set, Table 6 presents the MLEs for all the considered models. Then, Table 7 presents some standard goodness-of-fit measures, including the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Anderson-Darling statistic ($A^{*}$), and the Cramér-von Mises statistic ($W^{*}$). The minus log-likelihood of the estimated model ($-\hat{\ell}$) is also computed. The lower the values of these measures, the better the fit. We complete them by providing the Kolmogorov-Smirnov (KS) statistic, along with its p-value.
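Three of the measures above are simple enough to sketch directly. The snippet uses the standard definitions AIC = 2k − 2ℓ̂ and BIC = k log(n) − 2ℓ̂, where k is the number of parameters and ℓ̂ the maximized log-likelihood, and computes the KS statistic as the largest gap between the empirical cdf and a fitted cdf. It is an illustration, not the R code used in the paper; the numerical inputs are made up and the function names are hypothetical.

```python
import math

def aic(loglik, k):
    # Akaike Information Criterion.
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    # Bayesian Information Criterion.
    return k * math.log(n) - 2.0 * loglik

def ks_statistic(sample, cdf):
    # Largest distance between the empirical cdf and the fitted cdf.
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

# Made-up inputs: a three-parameter model, n = 100, maximized log-likelihood -100.
example_aic = aic(-100.0, 3)
example_bic = bic(-100.0, 3, 100)
# KS distance of a tiny made-up sample from the uniform cdf on (0, 1).
example_ks = ks_statistic([0.1, 0.4, 0.7], lambda x: x)
```

For a three-parameter model such as TIKEx, k = 3; the penalty terms are what make the comparison with two-parameter competitors fair.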
For the second data set, Table 8 gives the MLEs for all the considered models. Table 9 is the analogue of Table 7 for the second data set.
For data set 1, all the estimated pdfs, superimposed on the histogram of the data, are given in Figure 5. A simultaneous comparison of the estimated cdfs with the empirical cdf of the data can be seen in Figure 6. Similarly, for data set 2, the estimated pdfs and cdfs can be observed in Figures 9 and 10, respectively. Figures 11 and 12 show the same, respectively, but with an individual treatment of the estimated functions.
In view of Tables 7 and 9, the TIKEx model gives the best fit on the reported criteria (smallest AIC, smallest BIC, etc.), except for the second data set, where the log-normal model has a better KS statistic and p-value (the TIKEx model remains the best for the other measures). The good performance of the TIKEx model is also supported by the figures illustrating the fits of the models to the considered data. All these results support the usefulness of the TIKEx model in the context of data analysis.

Conclusions
A new general family of distributions is introduced by the use of a truncated version of the inverted Kumaraswamy distribution and composition. It is called the TIK-G family. Thanks to its simplicity, richness, and the flexible properties demonstrated throughout the study, it provides a suitable alternative to other, more complex general families. Its main theoretical properties are discussed, with expressions and numerical analyses for the Rényi entropy and Shannon entropy. A special focus is put on the special member of the family defined with the exponential distribution, called the TIKEx distribution. After showing its qualities (flexible pdf and hrf, series expansions for the moments, entropy, etc.), we show that the TIKEx model is capable of fitting various types of data, better than several modern models also derived from the exponential distribution. Thus, we hope that the TIK-G family can attract wider applications in the many applied fields where a sharp data analysis is essential to explain new phenomena.