The Truncated Cauchy Power Family of Distributions with Inference and Applications

The statistical literature lacks a general family of distributions based on the truncated Cauchy distribution. In this paper, such a family is proposed, called the truncated Cauchy power-G family. It stands out for the originality of the involved functions, its overall simplicity and its desirable properties for modelling purposes. In particular, (i) only one parameter is added to the baseline distribution, avoiding over-parametrization, (ii) the related probability functions (cumulative distribution, probability density, hazard rate, and quantile functions) have tractable expressions, and (iii) thanks to the combined action of the arctangent and power functions, the flexibility of the baseline distribution (symmetry, skewness, kurtosis, etc.) can be substantially enhanced. These aspects are discussed in detail, with the support of comprehensive numerical and graphical results. Furthermore, important mathematical features of the new family are derived, such as the moments, skewness and kurtosis, two kinds of entropy and order statistics. On the applied side, new models can be created for fitting data sets with simple or complex structure. This last point is illustrated by considering the Weibull distribution as baseline, the maximum likelihood method of estimation and two practical data sets with different skewness properties. The obtained results show that the truncated Cauchy power-G family is very competitive in comparison to other well-established general families.


Introduction
The general version of the truncated Cauchy distribution is defined by the following cumulative distribution function (cdf):
$$F(x) = \frac{\arctan\left(\frac{x-\mu}{\sigma}\right) - \arctan\left(\frac{a-\mu}{\sigma}\right)}{\arctan\left(\frac{b-\mu}{\sigma}\right) - \arctan\left(\frac{a-\mu}{\sigma}\right)}, \quad x \in [a, b],$$
where $\mu \in \mathbb{R}$ is a location parameter, $\sigma > 0$ is a scale parameter and $a < b$ are the truncation limits. In particular, for $a = 0$, $b = 1$, $\mu = 0$ and $\sigma = 1$, it reduces to $F(x) = (4/\pi)\arctan(x)$, $x \in (0, 1)$.
In this paper, we offer a comprehensible alternative to existing Cauchy-based families by introducing the truncated Cauchy power-G (TCP-G) family. It is defined on the basis of the truncated Cauchy distribution on the interval (0, 1) and the exp-G family. Indeed, the cdf of the TCP-G family is given by
$$F(x; \alpha, \xi) = \frac{4}{\pi}\arctan\left[G(x; \xi)^{\alpha}\right], \quad x \in \mathbb{R}, \qquad (1)$$
where α > 0 and G(x; ξ) denotes the cdf of a univariate continuous distribution with parameter vector ξ. As an immediate remark, the cdf of the TCP-G family has a simple expression, with an immediate series expansion, which is not the case for the GOHC-G or OPC-G families. The related probability functions can be deduced easily, with tractable expressions and immediate series expansions. Thus, the main properties of the TCP-G family can be derived, including the analyses of the shapes of the probability density and hazard rate functions, as well as their asymptotic properties, the quantile function, moments and related functions, several measures of skewness and kurtosis, Rényi and q-entropies and order statistics.
Then, the estimation of the TCP-G model parameters is investigated by the maximum likelihood method, with an emphasis on the model defined with the Weibull distribution as baseline. To evaluate the performance of the obtained estimates, two sampling schemes are considered, namely simple random sampling and ranked set sampling. Satisfactory numerical results are obtained for both. Then, two practical data sets are employed to show the modelling ability of the TCP-G family. More precisely, with the Weibull distribution as baseline, we show that the TCP-G family generates very competitive models compared with other widely known general families, such as the Kumaraswamy-G and beta-G families, which however involve one more parameter.
The rest of the paper is organized as follows. In Section 2, further mathematical background is given on the TCP-G family. Its most notable properties are presented in Section 3. The estimation of the model parameters is discussed in Section 4. Section 5 is devoted to the applied part. Some concluding remarks and perspectives are given in Section 6.

The TCP-G Family
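This section is devoted to the description of the main probability functions of the TCP-G family, namely the probability density, hazard rate and quantile functions, with discussions on some of their analytical properties. A special member of the family is presented as an example.
Upon differentiation of (1), the probability density function (pdf) of the TCP-G family is given by
$$f(x; \alpha, \xi) = \frac{4\alpha}{\pi}\,\frac{g(x; \xi)\,G(x; \xi)^{\alpha-1}}{1 + G(x; \xi)^{2\alpha}}, \quad x \in \mathbb{R}, \qquad (2)$$
where g(x; ξ) denotes the pdf corresponding to G(x; ξ).
The critical point(s) of f(x; α, ξ) is (are) of interest for the uni/multimodality analysis and, a fortiori, for modelling perspectives. A critical point $x_c$ of f(x; α, ξ) satisfies the non-linear equation $\{\log[f(x; \alpha, \xi)]\}'|_{x=x_c} = 0$, i.e.,
$$\frac{g'(x_c; \xi)}{g(x_c; \xi)} + (\alpha - 1)\frac{g(x_c; \xi)}{G(x_c; \xi)} - 2\alpha\,\frac{g(x_c; \xi)\,G(x_c; \xi)^{2\alpha-1}}{1 + G(x_c; \xi)^{2\alpha}} = 0.$$
The nature of $x_c$ depends on the sign of $\eta = \{\log[f(x; \alpha, \xi)]\}''|_{x=x_c}$, i.e.,
$$\eta = \frac{g''g - (g')^2}{g^2} + (\alpha - 1)\frac{g'G - g^2}{G^2} - 2\alpha\,G^{2\alpha-2}\,\frac{g'G\,(1 + G^{2\alpha}) + g^2\,(2\alpha - 1 - G^{2\alpha})}{(1 + G^{2\alpha})^2},$$
where, for brevity, g, g', g'' and G are evaluated at $(x_c; \xi)$. Hence, if η > 0, then $x_c$ is a local minimum, if η < 0, then $x_c$ is a local maximum and if η = 0, then $x_c$ is an inflexion point. There is no closed form for $x_c$ or η; mathematical software is required to evaluate them numerically.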

Hazard Rate Function
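The hazard rate function (hrf) of the TCP-G family is defined by
$$h(x; \alpha, \xi) = \frac{f(x; \alpha, \xi)}{1 - F(x; \alpha, \xi)} = \frac{4\alpha\,g(x; \xi)\,G(x; \xi)^{\alpha-1}}{\left[1 + G(x; \xi)^{2\alpha}\right]\left\{\pi - 4\arctan\left[G(x; \xi)^{\alpha}\right]\right\}}, \quad x \in \mathbb{R}.$$
We present some of its immediate analytical properties below. When G(x; ξ) → 0, we get $h(x; \alpha, \xi) \sim f(x; \alpha, \xi) \sim (4\alpha/\pi)\,g(x; \xi)\,G(x; \xi)^{\alpha-1}$. Hence, as for f(x; α, ξ), the parameter α plays an important role in the asymptotic properties of h(x; α, ξ). When G(x; ξ) → 1, by using the following equivalence: when y → 1, arctan(y) ∼ π/4 − (1 − y)/2, we get
$$h(x; \alpha, \xi) \sim \frac{g(x; \xi)}{1 - G(x; \xi)},$$
which is the hrf of the baseline distribution and does not depend on α.
The possible shapes of h(x; α, ξ) are of interest from the modelling point of view. Here, we only discuss the critical point(s) of this function. A critical point $x_o$ of h(x; α, ξ) satisfies the non-linear equation $\{\log[h(x; \alpha, \xi)]\}'|_{x=x_o} = 0$, i.e.,
$$\frac{g'(x_o; \xi)}{g(x_o; \xi)} + (\alpha - 1)\frac{g(x_o; \xi)}{G(x_o; \xi)} - 2\alpha\,\frac{g(x_o; \xi)\,G(x_o; \xi)^{2\alpha-1}}{1 + G(x_o; \xi)^{2\alpha}} + \frac{4\alpha\,g(x_o; \xi)\,G(x_o; \xi)^{\alpha-1}}{\left[1 + G(x_o; \xi)^{2\alpha}\right]\left\{\pi - 4\arctan\left[G(x_o; \xi)^{\alpha}\right]\right\}} = 0.$$
The nature of $x_o$ depends on the sign of $\upsilon = \{\log[h(x; \alpha, \xi)]\}''|_{x=x_o}$. We omit its expression for brevity. Again, there is no closed form for $x_o$ or υ, but mathematical software can be used to evaluate them.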
The quantile function (qf) of the TCP-G family is obtained by inverting the cdf (1). It is given by
$$Q(u; \alpha, \xi) = Q_*\left(\left[\tan\left(\frac{\pi u}{4}\right)\right]^{1/\alpha}; \xi\right), \quad u \in (0, 1), \qquad (4)$$
where $Q_*(u; \xi)$ denotes the qf of the baseline distribution. The quantile function is useful to simulate values from distributions belonging to the TCP-G family. Indeed, for a given baseline cdf G(x; ξ), if $u_1, \ldots, u_n$ are n values randomly and independently obtained from the uniform distribution over (0, 1), then $x_1, \ldots, x_n$ with $x_i = Q(u_i; \alpha, \xi)$ are n values randomly and independently obtained from the corresponding TCP-G distribution.
Furthermore, the quantile function allows us to define some skewness and kurtosis measures. They have the advantage of always existing, contrary to those defined with moments.
If $Q_*(u; \xi)$ has no closed-form expression but can be expressed as a power series (such as the qf of the normal distribution), one can determine a power series for Q(u; α, ξ) by proceeding as in Section 3.4 of [25].
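The simulation scheme described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the cdf has the arctangent-power form F = (4/π) arctan(G^α) and takes the Weibull baseline G(x) = 1 − exp(−λx^θ); the function names `tcpw_cdf`, `tcpw_quantile` and `tcpw_sample`, the seed and the parameter values are all illustrative choices.

```python
import math
import random


def tcpw_cdf(x, alpha, lam, theta):
    """Assumed TCP-G cdf with Weibull baseline G(x) = 1 - exp(-lam * x**theta)."""
    G = -math.expm1(-lam * x ** theta)
    return 4.0 / math.pi * math.atan(G ** alpha)


def tcpw_quantile(u, alpha, lam, theta):
    """Invert the cdf: G(x)**alpha = tan(pi*u/4), then apply the Weibull quantile."""
    level = math.tan(math.pi * u / 4.0) ** (1.0 / alpha)  # required value of G(x)
    return (-math.log1p(-level) / lam) ** (1.0 / theta)


def tcpw_sample(n, alpha, lam, theta, rng):
    """Inverse-transform sampling: x_i = Q(u_i) with u_i uniform on (0, 1)."""
    return [tcpw_quantile(rng.random(), alpha, lam, theta) for _ in range(n)]


# Illustrative parameter values (not taken from the paper).
rng = random.Random(12345)
sample = tcpw_sample(20000, alpha=1.5, lam=0.5, theta=2.0, rng=rng)
median = tcpw_quantile(0.5, 1.5, 0.5, 2.0)
share_below_median = sum(x <= median for x in sample) / len(sample)
```

If the sketch is correct, roughly half of the simulated values should fall below the theoretical median, and applying `tcpw_cdf` to `tcpw_quantile(u, ...)` should return `u` up to rounding.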

Example: The Truncated Cauchy Power Weibull Distribution
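By construction, the TCP-G family is rich and contains numerous new distributions of potential statistical interest (with different supports, numbers of parameters, properties, etc.). Here, we focus on the member of the TCP-G family defined with the Weibull distribution as baseline, i.e., with cdf $G(x; \lambda, \theta) = 1 - e^{-\lambda x^{\theta}}$, x > 0, where λ, θ > 0. For the purpose of this paper, it is called the truncated Cauchy power Weibull (TCPW) distribution. Its cdf is given by
$$F(x; \alpha, \lambda, \theta) = \frac{4}{\pi}\arctan\left[\left(1 - e^{-\lambda x^{\theta}}\right)^{\alpha}\right], \quad x > 0, \qquad (5)$$
the corresponding pdf is
$$f(x; \alpha, \lambda, \theta) = \frac{4\alpha\lambda\theta}{\pi}\,\frac{x^{\theta-1}e^{-\lambda x^{\theta}}\left(1 - e^{-\lambda x^{\theta}}\right)^{\alpha-1}}{1 + \left(1 - e^{-\lambda x^{\theta}}\right)^{2\alpha}}, \quad x > 0,$$
and the qf is
$$Q(u; \alpha, \lambda, \theta) = \left\{-\frac{1}{\lambda}\log\left(1 - \left[\tan\left(\frac{\pi u}{4}\right)\right]^{1/\alpha}\right)\right\}^{1/\theta}, \quad u \in (0, 1). \qquad (8)$$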
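To make the TCPW functions concrete, the following Python sketch evaluates the cdf, pdf and hrf. It assumes the arctangent-power form of the cdf, F = (4/π) arctan(G^α), with the Weibull baseline G(x) = 1 − exp(−λx^θ); the function names and the parameter values used in the checks are illustrative, not taken from the paper.

```python
import math


def weibull_cdf(x, lam, theta):
    """Baseline cdf G(x) = 1 - exp(-lam * x**theta)."""
    return -math.expm1(-lam * x ** theta)


def weibull_pdf(x, lam, theta):
    """Baseline pdf g(x) = lam * theta * x**(theta - 1) * exp(-lam * x**theta)."""
    return lam * theta * x ** (theta - 1.0) * math.exp(-lam * x ** theta)


def tcpw_cdf(x, alpha, lam, theta):
    """Assumed TCPW cdf: (4/pi) * arctan(G(x)**alpha)."""
    return 4.0 / math.pi * math.atan(weibull_cdf(x, lam, theta) ** alpha)


def tcpw_pdf(x, alpha, lam, theta):
    """Derivative of the cdf: (4*alpha/pi) * g * G**(alpha-1) / (1 + G**(2*alpha))."""
    G = weibull_cdf(x, lam, theta)
    g = weibull_pdf(x, lam, theta)
    return 4.0 * alpha / math.pi * g * G ** (alpha - 1.0) / (1.0 + G ** (2.0 * alpha))


def tcpw_hrf(x, alpha, lam, theta):
    """Hazard rate h = f / (1 - F)."""
    return tcpw_pdf(x, alpha, lam, theta) / (1.0 - tcpw_cdf(x, alpha, lam, theta))


# Finite-difference check that the pdf is the derivative of the cdf.
x0, step = 1.3, 1e-6
numeric_pdf = (tcpw_cdf(x0 + step, 2.0, 0.5, 1.5) - tcpw_cdf(x0 - step, 2.0, 0.5, 1.5)) / (2 * step)

# In the upper tail the hrf should approach the Weibull hrf lam*theta*x**(theta-1),
# which equals 1 for lam = theta = 1.
tail_ratio = tcpw_hrf(10.0, 2.0, 1.0, 1.0) / 1.0
```

The two checks at the end are a sanity test of the formulas: the central difference of the cdf should match the pdf, and the hrf should be close to the baseline hrf when G(x) is near 1.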

Notable Properties
In this section, some notable properties of the TCP-G family, and of the TCPW distribution in particular, are derived.

Linear Representations
Simple series expansions for the pdf and cdf of the TCP-G family are obtained in terms of the cdf and pdf of the exponentiated-G family by [8], given by $G_{\gamma}(x; \xi) = G(x; \xi)^{\gamma}$ and $g_{\gamma}(x; \xi) = \gamma\,g(x; \xi)\,G(x; \xi)^{\gamma-1}$, where γ > 0. The interest of such series expansions is mainly practical: determining some properties of the TCP-G family via these expansions can be more efficient than computing them directly by numerical integration of the corresponding pdf (which is known to be prone to rounding errors).
Since $G(x; \xi)^{\alpha} \in (0, 1)$, owing to the well-known series decomposition of the arctangent function, $\arctan(y) = \sum_{k=0}^{+\infty} (-1)^k y^{2k+1}/(2k+1)$ for |y| ≤ 1, we have the following series expansion for F(x; α, ξ):
$$F(x; \alpha, \xi) = \frac{4}{\pi}\sum_{k=0}^{+\infty}\frac{(-1)^k}{2k+1}\,G_{\alpha(2k+1)}(x; \xi). \qquad (9)$$
Upon differentiation of F(x; α, ξ), a series expansion for f(x; α, ξ) follows:
$$f(x; \alpha, \xi) = \frac{4}{\pi}\sum_{k=0}^{+\infty}\frac{(-1)^k}{2k+1}\,g_{\alpha(2k+1)}(x; \xi). \qquad (10)$$
The coefficients in these series expansions are readily computed numerically using any standard mathematical software. Also, in any numerical calculation using these series expansions, infinity should be replaced by a large integer. In this sense, some properties of the exponentiated-G family can be useful to determine those of the TCP-G family, as developed for the moments and related functions in the next section.
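As a quick numerical check of the truncation remark above, the following Python sketch compares truncated arctangent-based series with the closed forms they are meant to approximate. It assumes the cdf F = (4/π) arctan(G^α) and the pdf obtained by differentiating it; the helper names and the test values of G, g and α are illustrative.

```python
import math


def tcp_cdf_exact(G, alpha):
    """Assumed closed-form cdf as a function of the baseline cdf value G."""
    return 4.0 / math.pi * math.atan(G ** alpha)


def tcp_cdf_series(G, alpha, n_terms):
    """Truncated series: (4/pi) * sum_k (-1)**k / (2k+1) * G**(alpha*(2k+1))."""
    total = 0.0
    for k in range(n_terms):
        total += (-1.0) ** k / (2 * k + 1) * G ** (alpha * (2 * k + 1))
    return 4.0 / math.pi * total


def tcp_pdf_exact(G, g, alpha):
    """Assumed closed-form pdf as a function of baseline cdf value G and pdf value g."""
    return 4.0 * alpha / math.pi * g * G ** (alpha - 1.0) / (1.0 + G ** (2.0 * alpha))


def tcp_pdf_series(G, g, alpha, n_terms):
    """Truncated series built from exponentiated-G pdfs gamma * g * G**(gamma-1)."""
    total = 0.0
    for k in range(n_terms):
        gamma = alpha * (2 * k + 1)
        total += (-1.0) ** k / (2 * k + 1) * gamma * g * G ** (gamma - 1.0)
    return 4.0 / math.pi * total


# Alternating series: the truncation error is bounded by the first omitted term.
G_val, alpha_val, K = 0.95, 1.0, 20
error = abs(tcp_cdf_series(G_val, alpha_val, K) - tcp_cdf_exact(G_val, alpha_val))
bound = 4.0 / math.pi * G_val ** (alpha_val * (2 * K + 1)) / (2 * K + 1)
```

The bound illustrates why more terms are needed when G(x) is close to 1: the omitted term decays only geometrically in G^(2α).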

On Moments and Related Functions
Now, let X be a random variable with the cdf given by (1), defined on a probability space (Ω, A, P).
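By virtue of (10), for any function ψ(x) such that all the introduced quantities are well-defined, we have the following integral expression:
$$E(\psi(X)) = \int_{-\infty}^{+\infty}\psi(x)\,f(x; \alpha, \xi)\,dx = \frac{4}{\pi}\sum_{k=0}^{+\infty}\frac{(-1)^k}{2k+1}\int_{-\infty}^{+\infty}\psi(x)\,g_{\alpha(2k+1)}(x; \xi)\,dx. \qquad (12)$$
For some configurations, the integral term can be calculated or, at least, evaluated numerically by any mathematical software.
In particular, the s-th moment of X is obtained by choosing $\psi(x) = x^s$, i.e., $\mu_s(\alpha, \xi) = E(X^s)$. Hence, by taking s = 1, we get the mean of X, i.e., $\mu(\alpha, \xi) = \mu_1(\alpha, \xi)$. Furthermore, by taking s = 2, we obtain $\mu_2(\alpha, \xi) = E(X^2)$, from which we can express the variance of X defined by
$$\sigma^2(\alpha, \xi) = E\left[(X - \mu(\alpha, \xi))^2\right] = \mu_2(\alpha, \xi) - \mu(\alpha, \xi)^2.$$
From the first s moments of X, the s-th central moment of X can be deduced as
$$\mu_s^*(\alpha, \xi) = E\left[(X - \mu(\alpha, \xi))^s\right] = \sum_{k=0}^{s}\binom{s}{k}(-1)^k\,\mu(\alpha, \xi)^k\,\mu_{s-k}(\alpha, \xi),$$
with the convention $\mu_0(\alpha, \xi) = 1$. Then, some properties of the TCP-G family, such as the skewness and kurtosis properties, can be investigated by the study of the s-th general coefficient of X given by
$$C_s(\alpha, \xi) = \frac{\mu_s^*(\alpha, \xi)}{\sigma(\alpha, \xi)^s},$$
with s = 3 for the skewness and s = 4 for the kurtosis. The moment generating function of X according to t is obtained by choosing $\psi(x) = \psi_t(x) = e^{tx}$, i.e., $M(t; \alpha, \xi) = E(e^{tX})$. Similarly, the characteristic function of X according to t is obtained by choosing $\psi(x) = e^{itx}$ with $i^2 = -1$, i.e., $\varphi(t; \alpha, \xi) = E(e^{itX})$. Another important function is the s-th incomplete moment of X according to y, which follows from the choice $\psi(x) = \psi_y^*(x) = x^s 1_{\{x \le y\}}$, where $1_A$ denotes the indicator function equal to 1 if A is satisfied and 0 otherwise, i.e., $\mu_s(y; \alpha, \xi) = E(X^s 1_{\{X \le y\}})$. In particular, the first incomplete moment allows us to define the mean deviation about the mean, i.e., $\delta(\alpha, \xi) = E(|X - \mu(\alpha, \xi)|) = 2\mu(\alpha, \xi)\,F(\mu(\alpha, \xi); \alpha, \xi) - 2\mu_1(\mu(\alpha, \xi); \alpha, \xi)$, as well as the Lorenz curve, the Gini inequality index and the Zenga curve, which are of great importance in many applied fields. Further details can be found in [27,28].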
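Let us now discuss some of the above properties in the context of the TCPW distribution, with the use of (12). Thus, X is a random variable following the TCPW distribution, i.e., having the cdf given by (5). Then, the s-th moment $\mu_s(\alpha, \lambda, \theta)$ exists. Owing to (12) and the generalized binomial formula applied to $(1 - e^{-\lambda x^{\theta}})^{\alpha(2k+1)-1}$, we get
$$\mu_s(\alpha, \lambda, \theta) = \frac{4\alpha}{\pi}\,\lambda^{-s/\theta}\,\Gamma\left(\frac{s}{\theta} + 1\right)\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}(-1)^{k+\ell}\binom{\alpha(2k+1)-1}{\ell}(\ell + 1)^{-(s/\theta+1)},$$
where $\Gamma(x) = \int_0^{+\infty} t^{x-1}e^{-t}\,dt$ denotes the gamma function. From this expression, we obtain the mean $\mu(\alpha, \lambda, \theta)$ and the variance $\sigma^2(\alpha, \lambda, \theta)$ of X by proceeding as above. To illustrate the effect of the parameters α, λ and θ on them, Figure 3 represents $\mu(\alpha, \lambda, \theta)$ and $\sigma^2(\alpha, \lambda, \theta)$ under two different scenarios: (i) for fixed λ and θ and varying α and (ii) for fixed θ and α and varying λ. We see that the mean can increase with a nearly constant variance (see Figure 3a), whereas it can decrease with large variations in the variance (Figure 3b). This illustrates the flexibility of these two measures with respect to the distribution parameters.
We conclude this part with the description of the incomplete moments of X. By introducing the lower incomplete gamma function defined by $\gamma(x, y) = \int_0^y t^{x-1}e^{-t}\,dt$, the s-th incomplete moment of X is given by
$$\mu_s(y; \alpha, \lambda, \theta) = \frac{4\alpha}{\pi}\,\lambda^{-s/\theta}\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty}(-1)^{k+\ell}\binom{\alpha(2k+1)-1}{\ell}(\ell + 1)^{-(s/\theta+1)}\,\gamma\left(\frac{s}{\theta} + 1, (\ell + 1)\lambda y^{\theta}\right).$$
Thus, the first incomplete moment can be derived, as well as the related important quantities and functions (mean deviations, Lorenz curve, etc.).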
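As a complement to the series-based expressions, the moments of the TCPW distribution can also be approximated numerically. The following Python sketch is one possible way to do so, not the authors' method: it uses the identity E(X^s) = ∫₀¹ Q(u)^s du with a midpoint rule, assuming the closed-form TCPW quantile function obtained by inverting F = (4/π) arctan(G^α) with a Weibull baseline. The function names, the number of nodes and the parameter values are illustrative.

```python
import math


def tcpw_quantile(u, alpha, lam, theta):
    """Assumed TCPW quantile: Weibull quantile applied to tan(pi*u/4)**(1/alpha)."""
    level = math.tan(math.pi * u / 4.0) ** (1.0 / alpha)
    return (-math.log1p(-level) / lam) ** (1.0 / theta)


def tcpw_moment(s, alpha, lam, theta, n_nodes=100000):
    """Midpoint-rule approximation of E(X**s) = integral over (0, 1) of Q(u)**s du."""
    h = 1.0 / n_nodes
    total = 0.0
    for j in range(n_nodes):
        total += tcpw_quantile((j + 0.5) * h, alpha, lam, theta) ** s
    return h * total


def tcpw_mean_variance(alpha, lam, theta):
    """Mean and variance from the first two raw moments."""
    m1 = tcpw_moment(1, alpha, lam, theta)
    m2 = tcpw_moment(2, alpha, lam, theta)
    return m1, m2 - m1 ** 2


# Illustrative parameter values (not taken from the paper).
mean, variance = tcpw_mean_variance(alpha=2.0, lam=1.0, theta=1.5)
```

Working on the quantile scale avoids choosing an upper integration limit on the x-axis; the price is a mild logarithmic singularity of Q(u) near u = 1, which the midpoint rule handles at a reduced convergence rate.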

Skewness and Kurtosis Based on Quantiles
As previously mentioned, one can define measures of skewness and kurtosis based on quantiles. In comparison to those defined with moments, they are simpler to calculate and are not influenced by possible extreme tails of the distribution. One of the most useful quantile-based skewness measures is the MacGillivray skewness introduced by [29]. In the context of the TCP-G family, based on (4) and the median, it is given by the following function:
$$\rho(u; \alpha, \xi) = \frac{Q(1-u; \alpha, \xi) + Q(u; \alpha, \xi) - 2Q(1/2; \alpha, \xi)}{Q(1-u; \alpha, \xi) - Q(u; \alpha, \xi)}, \quad u \in (0, 1/2).$$
We can use this robust function to describe efficiently the effect of the parameters (α, ξ) on the skewness: the more the shapes of the graphs of ρ(u; α, ξ) vary with the parameters, the more flexible the skewness is. One can notice that, for u = 1/4, it becomes the Galton skewness studied by [30]. The sign of the Galton skewness indicates whether the distribution is right skewed, symmetric or left skewed: ρ(1/4; α, ξ) > 0 means that the distribution is right skewed, ρ(1/4; α, ξ) = 0 means that the distribution is symmetric and ρ(1/4; α, ξ) < 0 means that the distribution is left skewed. Also, the kurtosis of the TCP-G family can be studied by considering the Moors kurtosis proposed by [31]. It is defined by
$$K(\alpha, \xi) = \frac{Q(7/8; \alpha, \xi) - Q(5/8; \alpha, \xi) + Q(3/8; \alpha, \xi) - Q(1/8; \alpha, \xi)}{Q(6/8; \alpha, \xi) - Q(2/8; \alpha, \xi)}.$$
A high value of K(α, ξ) means that the distribution has heavy tails and a small value of K(α, ξ) means that the distribution has light tails.
We now investigate the skewness and kurtosis of the TCPW distribution. In this case, thanks to (8), the MacGillivray skewness and Moors kurtosis have closed forms. We now propose some visual explorations of these measures. Figure 4 presents the MacGillivray skewness when (i) λ and θ are constant, i.e., λ = 1.5 and θ = 0.3, and α increases and (ii) α and λ are constant, i.e., α = 1.5 and λ = 0.5, and θ increases. Moderate variations can be seen in the curves of Figure 4a, meaning that the parameter α has a moderate effect on the skewness, whereas wide variations in the shapes of the curves are observed in Figure 4b, showing that the parameter θ strongly influences the skewness. Then, a similar visual approach is performed for the Galton skewness in Figure 5. For the selected values of the parameters, we see that the Galton skewness decreases. Also, it is observed that it can be positive (see Figure 5a) or negative (see Figure 5b with λ = 2, α ∈ {0.4, 0.6, 1.2} and θ > 5 approximately), meaning that the TCPW distribution can be right or left skewed, respectively. Figure 6 displays the Moors kurtosis following the same scenarios. We see that the TCPW distribution can exhibit different kurtosis behaviours, with small or high possible values. All these facts show the great skewness and kurtosis flexibility of the TCPW distribution.
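The quantile-based measures discussed in this subsection are straightforward to compute once a quantile function is available. The Python sketch below is illustrative only: it uses the standard forms of the MacGillivray, Galton and Moors measures, and it assumes the closed-form TCPW quantile obtained by inverting F = (4/π) arctan(G^α) with a Weibull baseline. The function names are hypothetical; the parameter values α = 1.5, λ = 1.5, θ = 0.3 are among those mentioned for Figure 4.

```python
import math


def tcpw_quantile(u, alpha, lam, theta):
    """Assumed TCPW quantile: Weibull quantile applied to tan(pi*u/4)**(1/alpha)."""
    level = math.tan(math.pi * u / 4.0) ** (1.0 / alpha)
    return (-math.log1p(-level) / lam) ** (1.0 / theta)


def macgillivray_skewness(u, q):
    """rho(u) = [Q(1-u) + Q(u) - 2*Q(1/2)] / [Q(1-u) - Q(u)] for a quantile function q."""
    return (q(1.0 - u) + q(u) - 2.0 * q(0.5)) / (q(1.0 - u) - q(u))


def galton_skewness(q):
    """Galton (quartile) skewness, i.e., the MacGillivray skewness at u = 1/4."""
    return (q(0.75) + q(0.25) - 2.0 * q(0.5)) / (q(0.75) - q(0.25))


def moors_kurtosis(q):
    """Moors kurtosis based on octiles."""
    return (q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)) / (q(6 / 8) - q(2 / 8))


def tcpw_q(u):
    """TCPW quantile at alpha = 1.5, lam = 1.5, theta = 0.3."""
    return tcpw_quantile(u, alpha=1.5, lam=1.5, theta=0.3)


galton = galton_skewness(tcpw_q)
moors = moors_kurtosis(tcpw_q)
```

Because the three measures only take a quantile function as argument, the same helpers can be reused with any other baseline by swapping `tcpw_q`.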

Rényi Entropy and q-Entropy
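Entropy is a fundamental measure to quantify the amount of information in a distribution, finding applications in information science, thermodynamics and statistical physics. Here, we investigate two different and complementary kinds of entropy of the TCP-G family arising from various physical experiments: Rényi entropy and q-entropy, as introduced by [32,33], respectively. As a common interpretation, the lower the entropy, the lower the randomness of the related system. For further detail, we refer the reader to the survey of [34].
Rényi entropy is defined by
$$I_{\delta}(\alpha, \xi) = \frac{1}{1-\delta}\log\left[\int_{-\infty}^{+\infty} f(x; \alpha, \xi)^{\delta}\,dx\right],$$
with δ ∈ (0, +∞)\{1}. We aim to provide a series expansion of $I_{\delta}(\alpha, \xi)$. Owing to (2) and the generalized binomial formula, we get
$$f(x; \alpha, \xi)^{\delta} = \left(\frac{4\alpha}{\pi}\right)^{\delta}\sum_{k=0}^{+\infty}\binom{-\delta}{k}\,g(x; \xi)^{\delta}\,G(x; \xi)^{2\alpha k + \delta(\alpha-1)}.$$
Therefore, we can express $I_{\delta}(\alpha, \xi)$ as
$$I_{\delta}(\alpha, \xi) = \frac{1}{1-\delta}\left\{\delta\log\left(\frac{4\alpha}{\pi}\right) + \log\left[\sum_{k=0}^{+\infty}\binom{-\delta}{k}\int_{-\infty}^{+\infty} g(x; \xi)^{\delta}\,G(x; \xi)^{2\alpha k + \delta(\alpha-1)}\,dx\right]\right\}.$$
For given functions and parameters, mathematical software can be used to evaluate this last integral numerically.
If we consider the case of the TCPW distribution, we can formulate $I_{\delta}(\alpha, \lambda, \theta)$ by the above expression and the following series expansion, valid when the integral exists:
$$\int_0^{+\infty} g(x; \lambda, \theta)^{\delta}\,G(x; \lambda, \theta)^{2\alpha k + \delta(\alpha-1)}\,dx = \lambda^{(\delta-1)/\theta}\,\theta^{\delta-1}\,\Gamma\left(\frac{\delta(\theta-1)+1}{\theta}\right)\sum_{\ell=0}^{+\infty}\binom{2\alpha k + \delta(\alpha-1)}{\ell}(-1)^{\ell}(\ell + \delta)^{-[\delta(\theta-1)+1]/\theta}. \qquad (13)$$
In the general context of the TCP-G family, the q-entropy is defined by
$$H_q(\alpha, \xi) = \frac{1}{q-1}\left\{1 - \int_{-\infty}^{+\infty} f(x; \alpha, \xi)^{q}\,dx\right\},$$
with q ∈ (0, +∞)\{1}. Proceeding as for the Rényi entropy, we can express it as
$$H_q(\alpha, \xi) = \frac{1}{q-1}\left\{1 - \left(\frac{4\alpha}{\pi}\right)^{q}\sum_{k=0}^{+\infty}\binom{-q}{k}\int_{-\infty}^{+\infty} g(x; \xi)^{q}\,G(x; \xi)^{2\alpha k + q(\alpha-1)}\,dx\right\}.$$
For the TCPW distribution, by replacing δ by q, we can express the integral term as in (13).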
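For a quick numerical evaluation of both entropies, one may integrate a power of the pdf directly. The Python sketch below is illustrative and makes several assumptions: it uses the standard definitions of the Rényi entropy, (1 − δ)⁻¹ log ∫ f^δ, and of the q-entropy, (q − 1)⁻¹ (1 − ∫ f^q); it assumes the TCPW pdf obtained by differentiating F = (4/π) arctan(G^α) with a Weibull baseline; and it truncates the integral at the quantile of order 1 − 10⁻¹², which is a pragmatic choice rather than anything prescribed by the paper. All names are hypothetical.

```python
import math


def tcpw_pdf(x, alpha, lam, theta):
    """Assumed TCPW pdf: (4*alpha/pi) * g * G**(alpha-1) / (1 + G**(2*alpha))."""
    G = -math.expm1(-lam * x ** theta)
    g = lam * theta * x ** (theta - 1.0) * math.exp(-lam * x ** theta)
    return 4.0 * alpha / math.pi * g * G ** (alpha - 1.0) / (1.0 + G ** (2.0 * alpha))


def tcpw_quantile(u, alpha, lam, theta):
    """Assumed TCPW quantile, used here only to choose the integration range."""
    level = math.tan(math.pi * u / 4.0) ** (1.0 / alpha)
    return (-math.log1p(-level) / lam) ** (1.0 / theta)


def integral_pdf_power(delta, alpha, lam, theta, n_nodes=20000):
    """Midpoint-rule approximation of the integral of f(x)**delta over (0, upper)."""
    upper = tcpw_quantile(1.0 - 1e-12, alpha, lam, theta)
    h = upper / n_nodes
    total = 0.0
    for j in range(n_nodes):
        total += tcpw_pdf((j + 0.5) * h, alpha, lam, theta) ** delta
    return h * total


def renyi_entropy(delta, alpha, lam, theta):
    """Renyi entropy of order delta (delta > 0, delta != 1)."""
    return math.log(integral_pdf_power(delta, alpha, lam, theta)) / (1.0 - delta)


def q_entropy(q, alpha, lam, theta):
    """q-entropy of order q (q > 0, q != 1)."""
    return (1.0 - integral_pdf_power(q, alpha, lam, theta)) / (q - 1.0)
```

A convenient consistency check is the scale property: for θ = 1, halving λ doubles the scale of X, so the Rényi entropy should increase by log 2.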

Order Statistics
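We now present the main properties of the order statistics in the context of the TCP-G family. The general theory can be found in [35]. Let $X_1, \ldots, X_n$ be a random sample from the TCP-G family and $X_{i:n}$ be the i-th order statistic, i.e., the i-th smallest of these random variables (in the standard probabilistic ordering sense, i.e., X ≤ Y if and only if P(X ≤ Y) = 1). Then, it is well-known that the cdf and pdf of $X_{i:n}$ are, respectively, given by
$$F_{i:n}(x; \alpha, \xi) = \frac{n!}{(i-1)!(n-i)!}\sum_{j=0}^{n-i}\binom{n-i}{j}\frac{(-1)^j}{j+i}\,F(x; \alpha, \xi)^{j+i}$$
and
$$f_{i:n}(x; \alpha, \xi) = \frac{n!}{(i-1)!(n-i)!}\sum_{j=0}^{n-i}\binom{n-i}{j}(-1)^j\,f(x; \alpha, \xi)\,F(x; \alpha, \xi)^{j+i-1}.$$
We now focus on the determination of tractable series expansions for $F_{i:n}(x; \alpha, \xi)$ and $f_{i:n}(x; \alpha, \xi)$.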
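By the series decomposition of the arctangent function, for any positive integer s, we have $F(x; \alpha, \xi)^s = (4/\pi)^s\,G(x; \xi)^{\alpha s}\left[\sum_{k=0}^{+\infty} b_k\,G(x; \xi)^{2\alpha k}\right]^s$ with $b_k = (-1)^k/(2k+1)$. Since $b_0 = 1$, a classical result on a power series raised to a positive integer power gives
$$\left(\sum_{k=0}^{+\infty} b_k y^k\right)^s = \sum_{m=0}^{+\infty} c_{s,m}\,y^m, \qquad (15)$$
where $c_{s,0} = 1$ and, for any m ≥ 1, $c_{s,m}$ is defined by the following relation:
$$c_{s,m} = \frac{1}{m}\sum_{\ell=1}^{m}\left[\ell(s+1) - m\right] b_{\ell}\,c_{s,m-\ell}.$$
Thus, it follows from (15) that
$$F_{i:n}(x; \alpha, \xi) = \sum_{j=0}^{n-i}\sum_{k=0}^{+\infty} d_{j,k;i:n}\,G_{\alpha(2k+j+i)}(x; \xi), \quad d_{j,k;i:n} = \frac{n!}{(i-1)!(n-i)!}\binom{n-i}{j}\frac{(-1)^j}{j+i}\left(\frac{4}{\pi}\right)^{j+i} c_{j+i,k}$$
($c_{j+i,k}$ is defined as in (15) with s = j + i). This shows that the cdf of the order statistics of the TCP-G family can be expressed as an infinite linear combination of cdfs of the exponentiated-G family by [8]. Therefore, the well-established properties of the exponentiated-G family can be used to determine those of the order statistics of the TCP-G family. Indeed, from $F_{i:n}(x; \alpha, \xi)$, one can deduce the corresponding pdf by differentiation as follows:
$$f_{i:n}(x; \alpha, \xi) = \sum_{j=0}^{n-i}\sum_{k=0}^{+\infty} d_{j,k;i:n}\,g_{\alpha(2k+j+i)}(x; \xi).$$
This expression allows us to determine moments, skewness, kurtosis, and other important measures and functions.
In the case of the TCPW distribution, a refinement of these series expansions is possible. Indeed, we can expand $G_{\alpha(2k+j+i)}(x; \xi)$ in a series expansion as in (11), which implies that
$$F_{i:n}(x; \alpha, \lambda, \theta) = \sum_{j=0}^{n-i}\sum_{k=0}^{+\infty}\sum_{\ell=0}^{+\infty} e_{j,k,\ell;i:n}\,S(x; \ell\lambda, \theta),$$
where $e_{j,k,\ell;i:n} = \binom{\alpha(2k+j+i)}{\ell}(-1)^{\ell}\,d_{j,k;i:n}$ and $S(x; \lambda, \theta) = e^{-\lambda x^{\theta}}$ (we recall that it is the survival function of the Weibull distribution with parameters λ and θ).
Also, upon differentiation of $F_{i:n}(x; \alpha, \lambda, \theta)$, the pdf of $X_{i:n}$ is given by
$$f_{i:n}(x; \alpha, \lambda, \theta) = -\sum_{j=0}^{n-i}\sum_{k=0}^{+\infty}\sum_{\ell=1}^{+\infty} e_{j,k,\ell;i:n}\,g(x; \ell\lambda, \theta),$$
where $g(x; \lambda, \theta) = \lambda\theta x^{\theta-1}e^{-\lambda x^{\theta}}$ (we recall that it is the pdf of the Weibull distribution with parameters λ and θ). As a direct application, the r-th moment of $X_{i:n}$ can be obtained as
$$E(X_{i:n}^r) = -\lambda^{-r/\theta}\,\Gamma\left(\frac{r}{\theta} + 1\right)\sum_{j=0}^{n-i}\sum_{k=0}^{+\infty}\sum_{\ell=1}^{+\infty} e_{j,k,\ell;i:n}\,\ell^{-r/\theta}.$$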
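The expansion of the order-statistic cdf into powers of the baseline cdf can be checked numerically. The Python sketch below is a hedged illustration rather than the authors' implementation: it assumes F = (4/π) arctan(G^α), uses the classical recurrence for the coefficients of a power series raised to an integer power (with b_k = (−1)^k/(2k+1) for the arctangent series), and compares the truncated double series with the standard binomial-sum formula for the cdf of the i-th order statistic. The function names and truncation level are illustrative.

```python
import math


def arctan_power_coeffs(s, n_terms):
    """Coefficients c_{s,m} with (sum_k b_k z**k)**s = sum_m c_{s,m} z**m, b_k = (-1)**k/(2k+1)."""
    b = [(-1.0) ** k / (2 * k + 1) for k in range(n_terms)]
    c = [1.0]  # c_{s,0} = b_0**s = 1
    for m in range(1, n_terms):
        c.append(sum((l * (s + 1) - m) * b[l] * c[m - l] for l in range(1, m + 1)) / m)
    return c


def order_stat_cdf_direct(F, i, n):
    """P(X_{i:n} <= x) = sum_{r=i}^{n} C(n, r) F**r (1 - F)**(n - r), with F = F(x)."""
    return sum(math.comb(n, r) * F ** r * (1.0 - F) ** (n - r) for r in range(i, n + 1))


def order_stat_cdf_series(G, alpha, i, n, n_terms=80):
    """Truncated double series in powers G**(alpha*(2k + j + i)) of the baseline cdf value G."""
    const = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    total = 0.0
    for j in range(n - i + 1):
        s = j + i
        c = arctan_power_coeffs(s, n_terms)
        weight = const * math.comb(n - i, j) * (-1.0) ** j / s * (4.0 / math.pi) ** s
        total += weight * sum(c[k] * G ** (alpha * (2 * k + s)) for k in range(n_terms))
    return total


# Illustrative values: baseline cdf value G = 0.5, alpha = 1.2, second of four.
G_val, alpha_val = 0.5, 1.2
F_val = 4.0 / math.pi * math.atan(G_val ** alpha_val)
direct = order_stat_cdf_direct(F_val, i=2, n=4)
series = order_stat_cdf_series(G_val, alpha_val, i=2, n=4)
```

The agreement between `direct` and `series` degrades when G is close to 1, since the underlying power series in G^(2α) then converges slowly and more terms are required.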

Estimation of the TCP-G Model Parameters
This section is devoted to the inferential properties of the TCP-G model. The estimation of the parameters α and ξ is performed by the maximum likelihood method. Two different sampling schemes are considered: the simple random sampling (SRS) and the ranked set sampling (RSS). In what follows, n denotes a positive integer measuring the size of the considered sample; it can be small or large.

Maximum Likelihood Method under SRS
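Let $x_1, \ldots, x_n$ be a SRS from the TCP-G family, i.e., with the pdf given by (2). Then, the corresponding likelihood function is defined by
$$L(\alpha, \xi) = \prod_{i=1}^{n} f(x_i; \alpha, \xi) = \left(\frac{4\alpha}{\pi}\right)^n\prod_{i=1}^{n}\frac{g(x_i; \xi)\,G(x_i; \xi)^{\alpha-1}}{1 + G(x_i; \xi)^{2\alpha}}.$$
Thus, the corresponding log-likelihood function is defined by
$$\ell(\alpha, \xi) = \log[L(\alpha, \xi)] = n\log(4) + n\log(\alpha) - n\log(\pi) + \sum_{i=1}^{n}\log[g(x_i; \xi)] + (\alpha - 1)\sum_{i=1}^{n}\log[G(x_i; \xi)] - \sum_{i=1}^{n}\log\left[1 + G(x_i; \xi)^{2\alpha}\right].$$
Then, the maximum likelihood estimates (MLEs) of α and ξ are defined by $(\hat{\alpha}, \hat{\xi}) = \mathrm{argmax}_{(\alpha,\xi)} L(\alpha, \xi) = \mathrm{argmax}_{(\alpha,\xi)} \ell(\alpha, \xi)$. Assuming that ℓ(α, ξ) is differentiable, the MLEs can be obtained by solving the following non-linear equations simultaneously: ∂ℓ(α, ξ)/∂α = 0 and ∂ℓ(α, ξ)/∂ξ = 0, with
$$\frac{\partial\ell(\alpha, \xi)}{\partial\alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n}\log[G(x_i; \xi)] - 2\sum_{i=1}^{n}\frac{G(x_i; \xi)^{2\alpha}\log[G(x_i; \xi)]}{1 + G(x_i; \xi)^{2\alpha}}$$
and, by setting $g_{\xi}(x; \xi) = \partial g(x; \xi)/\partial\xi$ and $G_{\xi}(x; \xi) = \partial G(x; \xi)/\partial\xi$,
$$\frac{\partial\ell(\alpha, \xi)}{\partial\xi} = \sum_{i=1}^{n}\frac{g_{\xi}(x_i; \xi)}{g(x_i; \xi)} + (\alpha - 1)\sum_{i=1}^{n}\frac{G_{\xi}(x_i; \xi)}{G(x_i; \xi)} - 2\alpha\sum_{i=1}^{n}\frac{G_{\xi}(x_i; \xi)\,G(x_i; \xi)^{2\alpha-1}}{1 + G(x_i; \xi)^{2\alpha}}.$$
In general, these non-linear equations cannot be solved explicitly. However, the corresponding MLEs can be evaluated by using any well-known numerical optimization technique. Thanks to the well-established theory of the maximum likelihood method, by assuming that n is large enough and that some regularity conditions hold, we can construct asymptotic confidence intervals for the model parameters. In this regard, we need the approximate inverse of the observed information matrix. By letting r be the number of components of the vector ξ and $\xi = (\xi_1, \ldots, \xi_r)$, it is given by
$$I(\hat{\alpha}, \hat{\xi})^{-1} = \left.\begin{pmatrix} -\dfrac{\partial^2\ell}{\partial\alpha^2} & -\dfrac{\partial^2\ell}{\partial\alpha\,\partial\xi_1} & \cdots & -\dfrac{\partial^2\ell}{\partial\alpha\,\partial\xi_r} \\ -\dfrac{\partial^2\ell}{\partial\xi_1\,\partial\alpha} & -\dfrac{\partial^2\ell}{\partial\xi_1^2} & \cdots & -\dfrac{\partial^2\ell}{\partial\xi_1\,\partial\xi_r} \\ \vdots & \vdots & \ddots & \vdots \\ -\dfrac{\partial^2\ell}{\partial\xi_r\,\partial\alpha} & -\dfrac{\partial^2\ell}{\partial\xi_r\,\partial\xi_1} & \cdots & -\dfrac{\partial^2\ell}{\partial\xi_r^2} \end{pmatrix}^{-1}\right|_{(\alpha,\xi) = (\hat{\alpha},\hat{\xi})}.$$
Then, the asymptotic confidence intervals of α and $\xi_i$, for i = 1, ..., r, at the level 100(1 − ν)% are, respectively, given by
$$\left[\hat{\alpha} - z_{1-\nu/2}\sqrt{v_{\hat{\alpha}}},\ \hat{\alpha} + z_{1-\nu/2}\sqrt{v_{\hat{\alpha}}}\right], \quad \left[\hat{\xi}_i - z_{1-\nu/2}\sqrt{v_{\hat{\xi}_i}},\ \hat{\xi}_i + z_{1-\nu/2}\sqrt{v_{\hat{\xi}_i}}\right],$$
where $v_{\hat{\alpha}}$ and $v_{\hat{\xi}_i}$ are the first and (i + 1)-th elements of the main diagonal of $I(\hat{\alpha}, \hat{\xi})^{-1}$, respectively, and $z_{\gamma}$ is the quantile of the standard normal distribution taken at γ.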
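For the special case of the TCPW model, we recall that ξ = (λ, θ), $G(x; \lambda, \theta) = 1 - e^{-\lambda x^{\theta}}$ and $g(x; \lambda, \theta) = \lambda\theta x^{\theta-1}e^{-\lambda x^{\theta}}$. Thus, the equations to obtain the MLEs $\hat{\alpha}$, $\hat{\lambda}$ and $\hat{\theta}$ of α, λ and θ, respectively, can be expressed by using the following partial derivatives, where $G_i = 1 - e^{-\lambda x_i^{\theta}}$:
$$\frac{\partial\ell}{\partial\alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n}\log(G_i) - 2\sum_{i=1}^{n}\frac{G_i^{2\alpha}\log(G_i)}{1 + G_i^{2\alpha}},$$
$$\frac{\partial\ell}{\partial\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n}x_i^{\theta} + (\alpha - 1)\sum_{i=1}^{n}\frac{x_i^{\theta}e^{-\lambda x_i^{\theta}}}{G_i} - 2\alpha\sum_{i=1}^{n}\frac{x_i^{\theta}e^{-\lambda x_i^{\theta}}G_i^{2\alpha-1}}{1 + G_i^{2\alpha}},$$
$$\frac{\partial\ell}{\partial\theta} = \frac{n}{\theta} + \sum_{i=1}^{n}\log(x_i) - \lambda\sum_{i=1}^{n}x_i^{\theta}\log(x_i) + (\alpha - 1)\lambda\sum_{i=1}^{n}\frac{x_i^{\theta}\log(x_i)\,e^{-\lambda x_i^{\theta}}}{G_i} - 2\alpha\lambda\sum_{i=1}^{n}\frac{x_i^{\theta}\log(x_i)\,e^{-\lambda x_i^{\theta}}G_i^{2\alpha-1}}{1 + G_i^{2\alpha}}.$$
The approximate inverse of the observed information matrix, i.e., $I(\hat{\alpha}, \hat{\lambda}, \hat{\theta})^{-1}$, is obtained in the same way from the second partial derivatives. We omit them here for brevity.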
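To illustrate how such likelihood equations can be solved numerically, the following Python sketch treats a deliberately simplified case: λ and θ are assumed known and only α is estimated, by bisection on the score equation. This is not the procedure used in the paper (which estimates all parameters jointly); it assumes the pdf f = (4α/π) g G^(α−1)/(1 + G^(2α)) and a Weibull baseline, and the function names, seed and parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist


def tcpw_quantile(u, alpha, lam, theta):
    """Assumed TCPW quantile, used to simulate data."""
    level = math.tan(math.pi * u / 4.0) ** (1.0 / alpha)
    return (-math.log1p(-level) / lam) ** (1.0 / theta)


def loglik_alpha_part(alpha, log_G):
    """Terms of the log-likelihood that depend on alpha, given the values log G(x_i)."""
    n = len(log_G)
    return (n * math.log(alpha) + (alpha - 1.0) * sum(log_G)
            - sum(math.log1p(math.exp(2.0 * alpha * lg)) for lg in log_G))


def score_alpha(alpha, log_G):
    """Derivative of the log-likelihood with respect to alpha."""
    w = (lg * math.exp(2.0 * alpha * lg) / (1.0 + math.exp(2.0 * alpha * lg)) for lg in log_G)
    return len(log_G) / alpha + sum(log_G) - 2.0 * sum(w)


def observed_info_alpha(alpha, log_G):
    """Minus the second derivative of the log-likelihood with respect to alpha."""
    w = (lg ** 2 * math.exp(2.0 * alpha * lg) / (1.0 + math.exp(2.0 * alpha * lg)) ** 2
         for lg in log_G)
    return len(log_G) / alpha ** 2 + 4.0 * sum(w)


def mle_alpha(log_G, tol=1e-10):
    """Bisection on the score, which is strictly decreasing in alpha."""
    lo, hi = 1e-8, 1.0
    while score_alpha(hi, log_G) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score_alpha(mid, log_G) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Simulated data with illustrative true values alpha = 2, lam = 1, theta = 1.5.
rng = random.Random(2020)
lam, theta = 1.0, 1.5
data = [tcpw_quantile(rng.random(), 2.0, lam, theta) for _ in range(5000)]
log_G = [math.log(-math.expm1(-lam * x ** theta)) for x in data]

alpha_hat = mle_alpha(log_G)
half_width = NormalDist().inv_cdf(0.975) / math.sqrt(observed_info_alpha(alpha_hat, log_G))
ci_95 = (alpha_hat - half_width, alpha_hat + half_width)
```

Bisection is chosen here only because the score in α is monotone when the baseline parameters are fixed; for the full three-parameter problem, a general-purpose optimizer applied to the log-likelihood would be the more usual choice.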

Maximum Likelihood Method under RSS
First of all, let us briefly present the considered RSS, as introduced by [36], in our distributional context and in the following simple scheme: it is supposed that the set size is n and that the number of cycles is n. In this scheme, let $x_1, \ldots, x_{n^2}$ be a SRS of size $n^2$ from the TCP-G family, i.e., with the cdf and pdf given by (1) and (2). Then, the obtained values are randomly divided into n sets of n units each. In each set, we rank the n elements. In the first set, we select the element with the smallest rank, denoted by $x_{1(1)}$. In the second set, we select the element with the second smallest rank, denoted by $x_{2(2)}$. We follow this process until we have ranked the elements in the n-th set and selected the element with the largest rank, denoted by $x_{n(n)}$.
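The selection mechanism just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the n sets are formed by drawing n independent values each, which for independent and identically distributed draws is equivalent to randomly dividing a sample of size n² into n sets; the helper names are hypothetical, and the TCPW quantile assumes the cdf F = (4/π) arctan(G^α) with a Weibull baseline.

```python
import math
import random


def tcpw_quantile(u, alpha, lam, theta):
    """Assumed TCPW quantile, used to draw individual units."""
    level = math.tan(math.pi * u / 4.0) ** (1.0 / alpha)
    return (-math.log1p(-level) / lam) ** (1.0 / theta)


def ranked_set_sample(n, draw):
    """Form n sets of n units each, rank every set, keep the i-th smallest unit of the i-th set."""
    sets = [[draw() for _ in range(n)] for _ in range(n)]
    return [sorted(units)[i] for i, units in enumerate(sets)]


# Illustrative use with TCPW draws (parameter values are arbitrary).
rng = random.Random(7)
rss = ranked_set_sample(5, lambda: tcpw_quantile(rng.random(), 1.5, 0.5, 2.0))
```

Passing the sampler as a callable keeps the selection logic independent of the distribution, so the same function can be used with any baseline.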

Simulation Study
As a logical sequel to the previous subsection, we provide a numerical study of the MLEs of the TCPW model parameters based on simple random sampling (SRS) and ranked set sampling (RSS). The estimates are compared by considering the mean squared errors (MSEs) and relative efficiencies (REs) defined by RE = MSE(RSS)/MSE(SRS). Also, lower bounds (LBs) and upper bounds (UBs) of the related asymptotic confidence intervals, as well as their average lengths (ALs) defined by AL = UB − LB at the levels 90% and 95%, are calculated based on RSS and SRS via Mathematica 9. The simulation procedure consists of the following six steps.
Step 3: For the chosen set of parameters and each sample of size n, the MLEs are computed under SRS and RSS as described in the above subsection.
Step 4: Repeat Steps 1 to 3 N times with different samples, where N = 1000. Then, the MSEs and REs are computed.
Step 5: The LB, UB and AL for the selected values of the parameters are calculated based on SRS and RSS.
• For both sampling schemes, the MSEs decrease as n increases.
• For both sampling schemes, the ALs of the confidence intervals decrease as n increases.
• The estimates based on RSS have smaller MSEs than the corresponding estimates based on SRS. For this reason, when a high level of precision is required, RSS is preferable.

Application to Two Practical Data Sets
The TCPW model finds a concrete interest in the precise modelling of real life data sets. Here, we illustrate this aspect by considering the two following data sets.
The first data set is taken from tests on the endurance of deep-groove ball bearings. The measurements represent the number of millions of revolutions reached by each bearing before fatigue failure (see [37]). The first data set is given by: 17 Table 9. From Table 10, we see that the data are highly right skewed with a substantial kurtosis, which is a case also covered by the TCPW model.
Then, we compare the TCPW model to the following well-established models: the Kumaraswamy-Weibull-exponential (KwWE) model by [39], the Kumaraswamy-Weibull (Kw-W) model by [9], the beta Weibull (BW) model by [40], and the standard Weibull (W) model. The results are obtained using the R software.
Following standard practice in the field, all the parameters are estimated by the MLEs in the SRS case, even though the simulation study is favorable to RSS for the TCPW model (see the subsection above). Then, standard goodness-of-fit measures are taken into account, namely the Cramér-von Mises (CVM) statistic, the Anderson-Darling (AD) statistic and the Kolmogorov-Smirnov (KS) statistic along with the corresponding p-value. The obtained results are summarized in Tables 11 and 12 for the first and second data sets, respectively. We see that the TCPW model has the smallest CVM, AD and KS values and the greatest p-value (with p-value ≈ 0.94 and ≈ 0.97 for the first and second data sets, respectively, which are quite close to the upper limit 1), indicating that it provides the best fit among the considered models for these data sets.
To support this claim, we provide the minus estimated log-likelihood function ($-\hat{\ell}$), Akaike information criterion (AIC), corrected Akaike information criterion (CAIC), Bayesian information criterion (BIC), and Hannan-Quinn information criterion (HQIC) in Tables 13 and 14 for the first and second data sets, respectively. We observe that the TCPW model has the smallest AIC, CAIC, BIC and HQIC, attesting to its superiority in terms of modelling. To illustrate this, Figures 7 and 8 show the fits of (i) the estimated pdfs over the corresponding histograms and (ii) the estimated cdfs over the corresponding empirical cdfs of the related models, for the first and second data sets, respectively. As expected, good fits can be seen for the TCPW model.
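For reference, the information criteria mentioned above can be computed from the maximized log-likelihood, the number of parameters k and the sample size n. The Python sketch below uses the standard textbook formulas, which are assumed (not confirmed by the text) to be the ones behind Tables 13 and 14; the function name and the numerical inputs in the example are purely illustrative and are not values from the paper.

```python
import math


def information_criteria(loglik_hat, k, n):
    """Standard information criteria from the maximized log-likelihood.

    loglik_hat: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    aic = -2.0 * loglik_hat + 2.0 * k
    return {
        "AIC": aic,
        "CAIC": aic + 2.0 * k * (k + 1) / (n - k - 1),  # small-sample corrected AIC
        "BIC": -2.0 * loglik_hat + k * math.log(n),
        "HQIC": -2.0 * loglik_hat + 2.0 * k * math.log(math.log(n)),
    }


# Hypothetical inputs, chosen only to show the call.
criteria = information_criteria(loglik_hat=-100.0, k=3, n=23)
```

Since all four criteria penalize the number of parameters, a three-parameter model is favoured over four-parameter competitors whenever the maximized log-likelihoods are comparable.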

Concluding Remarks and Perspectives
In this paper, we offered a new general family of distributions based on the truncated Cauchy distribution and the exp-G family, called the truncated Cauchy power-G (TCP-G) family. A focus was put on the special member of the family defined with the Weibull distribution as baseline, called the TCPW distribution. Its cdf has the feature of being simply defined with the arctangent and power functions, allowing tractable expressions for the other corresponding functions (pdf, hrf, qf, etc.). In addition to its simplicity, we revealed the desirable properties of the family, such as very flexible shapes for the pdf and hrf, and tractable skewness, kurtosis, moments and entropy. By considering the special TCPW model, a full simulation study illustrated the good performance of the maximum likelihood method in the estimation of the model parameters. The detailed analysis of two well-known data sets shows the potential of the new family, with fair and favorable comparisons to well-established models in the same setting.
As perspectives of this work, one can apply the TCP-G family in a regression model framework (creating new possible distributions for the error term). Also, one can investigate some natural (and not too complicated) extensions of the TCP-G family as those defined by where α > 0, λ ∈ (0, 1] and G(x; ξ) denotes the cdf of a univariate continuous distribution with parameter vector denoted by ξ.
These extensions need further investigation; no guarantee of their superior efficiency over the TCP-G family can be provided at this stage, which opens new directions for future work.