On a New Result on the Ratio Exponentiated General Family of Distributions with Applications

Abstract: In this paper, we first show a new probability result which can be concisely formulated as follows: the function $2G^{\beta}/(1+G^{\alpha})$, where $G$ denotes a baseline cumulative distribution function of a continuous distribution, can have the properties of a cumulative distribution function beyond the standard assumptions on α and β (possibly different and negative, among others). Then, we provide a complete mathematical treatment of the corresponding family of distributions, called the ratio exponentiated general family. To link it with the existing literature, it constitutes a natural extension of the type II half logistic-G family or, from another point of view, a compromise between the so-called exponentiated-G and Marshall-Olkin-G families. We show that it possesses tractable probability functions, desirable stochastic ordering properties and simple analytical expressions for the moments, among others. Also, it reaches high levels of flexibility in a wide statistical sense, mainly thanks to the wide ranges of possible values for α and β, and can thus be used quite effectively for real data analysis. We illustrate this last point by considering the Weibull distribution as baseline and three practical data sets, with estimation of the model parameters by the maximum likelihood method.


Introduction
The sharp and fine analysis of modern data sets often requires in-depth statistical treatments, beyond the capabilities of the usual statistical models. Accordingly, particular attention has been paid to defining models with new features in this regard, motivating the development of general families of distributions having the ability to generate flexible distributions. Among these families, we may mention the skew-normal family pioneered by [1], the exponentiated-generated (Exp-G) family proposed by [2], the Marshall-Olkin-generated (MO-G) family studied by [3], the order statistics-generated family introduced by [4], the Sinh-arcsinh-generated family developed by [5], the beta-generated (B-G) family introduced by [6], the transmuted-generated (T-G) family developed by [7], the gamma-generated (Gam-G) family proposed by [8] and the Pareto ArcTan family developed by [9]. Each of them has generated a plethora of distributions and statistical models, widely used in practice.
In order to illustrate their diversity, we now succinctly present some of these families from the mathematical point of view. The simplest one uses the power transform; the Exp-G family is defined by the following cumulative distribution function (cdf): $F(x; \alpha, \xi) = G(x; \xi)^{\alpha}$, $x \in \mathbb{R}$, where $G(x; \xi)$ denotes the cdf of a continuous distribution, with ξ as a generic parameter vector, and α > 0. Based on the geometric series expansion as a prime structure, the MO-G family is defined by the following cdf: $F(x; \theta, \xi) = 1 - \theta[1 - G(x; \xi)]/\{1 - (1 - \theta)[1 - G(x; \xi)]\}$, $x \in \mathbb{R}$, where θ > 0. Centered around the so-called beta function, the B-G family is defined by the following cdf: $F(x; \alpha, \beta, \xi) = B_{G(x;\xi)}(\alpha, \beta)/B_{1}(\alpha, \beta)$, $x \in \mathbb{R}$, where $B_{y}(\alpha, \beta) = \int_{0}^{y} t^{\alpha-1}(1 - t)^{\beta-1}\,dt$, and α, β > 0. By the use of the quadratic rank transmutation, the T-G family is defined by the following cdf: $F(x; \lambda, \xi) = G(x; \xi)[(1 + \lambda) - \lambda G(x; \xi)]$, $x \in \mathbb{R}$, with λ ∈ [−1, 1]. Finally, emerging from the so-called gamma function, the Gam-G family is defined by the following cdf: $F(x; \alpha, \xi) = \gamma(\alpha, -\log[1 - G(x; \xi)])/\Gamma(\alpha)$, $x \in \mathbb{R}$, where $\gamma(\alpha, y) = \int_{0}^{y} t^{\alpha-1}e^{-t}\,dt$, $\Gamma(\alpha) = \lim_{y\to+\infty}\gamma(\alpha, y)$ and α > 0.
In this paper, we contribute to the subject by going beyond some standard assumptions; we first show that the following function:
$$F(x; \alpha, \beta, \xi) = \frac{2G(x; \xi)^{\beta}}{1 + G(x; \xi)^{\alpha}}, \quad x \in \mathbb{R}, \qquad (1)$$
remains a valid cdf under wide assumptions on the parameters α and β (allowing possibly different and/or negative values). This cdf can be viewed as an extension of the type II half logistic-G (TIIHL-G) family by [10] or, alternatively, a hybrid version of the cdfs corresponding to the Exp-G and MO-G families. More specifically, if α = 0, it corresponds to the cdf of the Exp-G family, if β = α = 1, it corresponds to the cdf of the MO-G family with parameter θ = 1/2 (also corresponding to the M-G family by [11]) and, if β = α, it becomes the cdf of the TIIHL-G family.
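The special cases mentioned above can be checked numerically. The following is a minimal sketch, working directly on the baseline cdf value u = G(x; ξ); the function names (`exp_g`, `mo_g`, `t_g`, `re_g`) are illustrative choices made here, and `re_g` implements the function $2G^{\beta}/(1+G^{\alpha})$ stated in the abstract.

```python
def exp_g(u, alpha):
    """Exp-G cdf written as a function of the baseline cdf value u = G(x)."""
    return u**alpha


def mo_g(u, theta):
    """MO-G cdf, in the form recalled in the text."""
    return 1.0 - theta * (1.0 - u) / (1.0 - (1.0 - theta) * (1.0 - u))


def t_g(u, lam):
    """T-G cdf (quadratic rank transmutation), lam in [-1, 1]."""
    return u * ((1.0 + lam) - lam * u)


def re_g(u, alpha, beta):
    """Ratio form 2 u^beta / (1 + u^alpha) studied in the paper."""
    return 2.0 * u**beta / (1.0 + u**alpha)


grid = [i / 100 for i in range(1, 100)]

# alpha = 0 should give the Exp-G cdf with power beta.
max_gap_exp = max(abs(re_g(u, 0.0, 2.5) - exp_g(u, 2.5)) for u in grid)

# alpha = beta = 1 should give the MO-G cdf with theta = 1/2.
max_gap_mo = max(abs(re_g(u, 1.0, 1.0) - mo_g(u, 0.5)) for u in grid)
```

Both gaps should be at the level of floating-point rounding, which is what the reductions stated in the text predict.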
However, to the best of our knowledge, the general case including possible β ≠ α remains unexplored, and is the motivation for this study. We thus introduce the ratio exponentiated general (or generated) (RE-G) family of distributions defined by the cdf (1), with values of α and β to be specified later. We investigate some interesting mathematical properties of the family, including the analytical expressions of the main corresponding functions, useful stochastic ordering results, analysis of the asymptotes and mode(s) for the probability density and hazard rate functions, series expansions of the probability density function, various measures involving moments and general formulas related to the maximum likelihood method. Then, we pay particular attention to a special member of the family based on the Weibull distribution as baseline. It constitutes a new and simple four-parameter distribution with many attractive features for the statistician. In particular, we show that the corresponding probability density and hazard rate functions enjoy flexible shape properties, which are desirable for modelling purposes. Thus, the related model is able to capture the complexity of various kinds of data. We illustrate this claim by the use of the maximum likelihood method (validated by a simulation study), and by means of three practical data sets. Our model proves to be competitive in comparison to five other strong competitors, with notable gains in terms of well-established criteria.
This paper is organized as follows. In Section 2, we present the mathematical foundations of the RE-G family. Some notable properties are derived in Section 3. Numerical studies are provided in Section 4, including a special member of the RE-G family, a simulation study and analyses of three practical data sets, with discussions. Section 5 draws some concluding remarks.

The RE-G Family
Here, we give the essentials of the RE-G family, including its genesis and the main corresponding functions.

Central Result
The RE-G family is based on the following new theoretical result.
Theorem 1. Let α, β ∈ R, G(x; ξ) be the cdf of a continuous distribution and F(x; α, β, ξ) be the function defined by
$$F(x; \alpha, \beta, \xi) = \frac{2G(x; \xi)^{\beta}}{1 + G(x; \xi)^{\alpha}}, \quad x \in \mathbb{R}.$$
Then, F(x; α, β, ξ) has the properties of a cdf if β ≠ 0 and β ≥ α/2, or if β = 0 and α < 0.
Now, by denoting g(x; ξ) the pdf corresponding to G(x; ξ), upon differentiation with respect to x, we have, almost everywhere,
$$\frac{\partial}{\partial x}F(x; \alpha, \beta, \xi) = \frac{2g(x; \xi)G(x; \xi)^{\beta-1}\{\beta + (\beta - \alpha)G(x; \xi)^{\alpha}\}}{[1 + G(x; \xi)^{\alpha}]^{2}},$$
which is nonnegative under the assumptions of Theorem 1.
We claim that Theorem 1 provides a contribution to the TIIHL-G family proposed by [10]. Indeed, we recall that the TIIHL-G family is defined with the cdf F(x; α, β, ξ) with β = α only; Theorem 1 shows that it can be significantly extended and enriched with wider ranges of values for α and/or β, allowing negative values for α and β as well. To the best of our knowledge, such a result is unexplored in the literature and opens new perspectives of work. The next sections are devoted to the prime theoretical or practical features of the RE-G family.
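The statement of Theorem 1 can be explored numerically on a grid of baseline values u = G(x; ξ) ∈ (0, 1]. The sketch below is only a sanity check, not a proof; the parameter condition coded in `is_admissible` is the one read here from the theorem (β ≠ 0 and β ≥ α/2, or β = 0 and α < 0), and the helper names are hypothetical.

```python
def re_g_transform(u, alpha, beta):
    """Map a baseline cdf value u in (0, 1] to 2 u^beta / (1 + u^alpha)."""
    return 2.0 * u**beta / (1.0 + u**alpha)


def is_admissible(alpha, beta):
    """Parameter condition of Theorem 1, as read in this text (an assumption)."""
    return (beta != 0 and beta >= alpha / 2) or (beta == 0 and alpha < 0)


def looks_like_cdf(alpha, beta, n=2000, tol=1e-12):
    """Grid check: values stay in [0, 1], are nondecreasing, and equal 1 at u = 1.

    The limit at u -> 0 is not checked here, only the behaviour on the grid.
    """
    grid = [(i + 1) / n for i in range(n)]  # u in (0, 1]
    vals = [re_g_transform(u, alpha, beta) for u in grid]
    monotone = all(b >= a - tol for a, b in zip(vals, vals[1:]))
    bounded = all(-tol <= v <= 1.0 + tol for v in vals)
    return monotone and bounded and abs(vals[-1] - 1.0) < 1e-12
```

For instance, (α, β) = (−4, −1) satisfies the condition and passes the grid check, while (α, β) = (3, 1) violates β ≥ α/2 and the transform exceeds 1 before returning to 1 at u = 1.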

Definition and Main Functions
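The definition of the RE-G family, as well as some important functions, are described below. Based on Theorem 1, we define the RE-G family by the cdf given by
$$F(x; \alpha, \beta, \xi) = \frac{2G(x; \xi)^{\beta}}{1 + G(x; \xi)^{\alpha}}, \quad x \in \mathbb{R}, \qquad (2)$$
where α and β are two shape parameters satisfying β ≠ 0 and β ≥ α/2, or β = 0 and α < 0 (these assumptions will hold implicitly in the sequel of the study). Upon differentiation of F(x; α, β, ξ) with respect to x (almost everywhere), the corresponding probability density function (pdf) is obtained as
$$f(x; \alpha, \beta, \xi) = \frac{2g(x; \xi)G(x; \xi)^{\beta-1}\{\beta + (\beta - \alpha)G(x; \xi)^{\alpha}\}}{[1 + G(x; \xi)^{\alpha}]^{2}}, \quad x \in \mathbb{R}. \qquad (3)$$
This pdf will be central to express important measures and functions of the RE-G family (moments, likelihood function, etc.). Also, the possible shapes of this pdf are informative on the nature of the related model. On the other hand, the corresponding survival function (sf) is given by
$$S(x; \alpha, \beta, \xi) = 1 - F(x; \alpha, \beta, \xi) = \frac{1 + G(x; \xi)^{\alpha} - 2G(x; \xi)^{\beta}}{1 + G(x; \xi)^{\alpha}}.$$
The corresponding hazard rate function (hrf), reverse hazard rate function (rhrf) and cumulative hazard rate function (chrf) are, respectively, given by
$$h(x; \alpha, \beta, \xi) = \frac{2g(x; \xi)G(x; \xi)^{\beta-1}\{\beta + (\beta - \alpha)G(x; \xi)^{\alpha}\}}{[1 + G(x; \xi)^{\alpha}][1 + G(x; \xi)^{\alpha} - 2G(x; \xi)^{\beta}]},$$
$$r(x; \alpha, \beta, \xi) = \frac{g(x; \xi)\{\beta + (\beta - \alpha)G(x; \xi)^{\alpha}\}}{G(x; \xi)[1 + G(x; \xi)^{\alpha}]}$$
and
$$H(x; \alpha, \beta, \xi) = \log[1 + G(x; \xi)^{\alpha}] - \log[1 + G(x; \xi)^{\alpha} - 2G(x; \xi)^{\beta}].$$
When the support of the distribution related to G(x; ξ) is (0, +∞), as for any lifetime baseline distribution, the sf, hrf, rhrf and chrf are involved in a plethora of applications in survival analysis (see [12]). The quantile function (qf), say Q(u; α, β, ξ), can be obtained by inverting F(x; α, β, ξ), i.e., it satisfies the following non-linear equation: F(Q(u; α, β, ξ); α, β, ξ) = u for u ∈ (0, 1). In full generality, there is no closed form for this function. Closed forms are only available in particular cases; for instance, for β > 0, $Q(u; 0, \beta, \xi) = Q_{G}(u^{1/\beta}; \xi)$, $Q(u; \beta, \beta, \xi) = Q_{G}([u/(2 - u)]^{1/\beta}; \xi)$ and $Q(u; 2\beta, \beta, \xi) = Q_{G}(\{[1 - \sqrt{1 - u^{2}}]/u\}^{1/\beta}; \xi)$, where $Q_{G}(u; \xi)$ denotes the qf corresponding to G(x; ξ). Thanks to the qf, one can express the quartiles and octiles of the considered distribution, as well as several measures of skewness and asymmetry, such as the Bowley or MacGillivray skewness (see [13]).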
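The main functions of this subsection translate directly into code. The following sketch assumes, purely for illustration, a unit exponential baseline, G(x) = 1 − exp(−x), which is not a choice made in the paper; the pdf is the derivative of $2G^{\beta}/(1+G^{\alpha})$ and the quantile function is obtained by bisection, since no closed form exists in general.

```python
import math


# Illustrative baseline (an assumption, not from the paper): unit exponential.
def G(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def g(x):
    return math.exp(-x) if x > 0 else 0.0


def cdf(x, a, b):
    u = G(x)
    return 0.0 if u <= 0.0 else 2.0 * u**b / (1.0 + u**a)


def pdf(x, a, b):
    u = G(x)
    if u <= 0.0:
        return 0.0
    return 2.0 * g(x) * u ** (b - 1.0) * (b + (b - a) * u**a) / (1.0 + u**a) ** 2


def sf(x, a, b):
    return 1.0 - cdf(x, a, b)


def hrf(x, a, b):
    return pdf(x, a, b) / sf(x, a, b)


def quantile(p, a, b, lo=0.0, hi=50.0, iters=200):
    """Solve cdf(x) = p by bisection; [lo, hi] is assumed to bracket the solution."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With α = β, the bisection result can be compared with the closed form $Q_{G}([u/(2-u)]^{1/\beta})$, where here $Q_{G}(v) = -\log(1 - v)$.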

Mathematical Properties
In this section, we derive the main mathematical properties of the RE-G family, with discussions.

Stochastic Ordering Results
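Here, we investigate some stochastic ordering relations between the RE-G and Exp-G families according to the values of α and β.
First of all, since $1 + G(x; \xi)^{\alpha} \le 2$ for α ≥ 0 and $1 + G(x; \xi)^{\alpha} \ge 2$ for α < 0, we have $F(x; \alpha, \beta, \xi) \ge G(x; \xi)^{\beta}$ for α ≥ 0, and $F(x; \alpha, \beta, \xi) \le G(x; \xi)^{\beta}$ for α < 0. We thus see the importance of the parameter α according to its sign, regarding an immediate stochastic hierarchy between the RE-G and Exp-G families.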
The following result proposes a refinement of the upper bound for F(x; α, β, ξ).
Hence, Proposition 1 shows the deep relation existing between the RE-G and Exp-G families. Also, for practical purposes, the above result shows that the RE-G family reaches different targets in terms of modelling in comparison to the Exp-G family; the RE-G models can be more adequate than the Exp-G models, depending on the nature of the data.
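The elementary comparison with the Exp-G cdf follows from the sign of α: since $G^{\alpha} \le 1$ for α ≥ 0 and $G^{\alpha} \ge 1$ for α < 0, the ratio $2G^{\beta}/(1+G^{\alpha})$ lies above $G^{\beta}$ in the first case and below it in the second. A minimal grid check of this inequality, with a hypothetical helper name, could read as follows.

```python
def re_g(u, alpha, beta):
    """Ratio form 2 u^beta / (1 + u^alpha), with u = G(x) in (0, 1)."""
    return 2.0 * u**beta / (1.0 + u**alpha)


def ordering_holds(alpha, beta, n=999, tol=1e-12):
    """Check F >= G^beta when alpha >= 0 and F <= G^beta when alpha < 0 on a grid."""
    for i in range(1, n + 1):
        u = i / (n + 1)
        gap = re_g(u, alpha, beta) - u**beta
        if alpha >= 0 and gap < -tol:
            return False
        if alpha < 0 and gap > tol:
            return False
    return True
```

This only illustrates the immediate ordering; it does not cover the refined bound of Proposition 1.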

On the RE-G pdf
Let us now present some properties on the curvature of f (x; α, β, ξ), which can be informative for fitting purposes (uni/multimodality nature, polynomial/exponential decay on the tails, etc.).
First of all, when x → −∞, let us distinguish the cases α > 0 on the one hand, then α = 0 and α < 0 on the other hand.
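For a given G(x; ξ), a possible polynomial or exponential decay of the limiting functions characterizes the heaviness of the tails of the corresponding RE-G distribution. The mode(s) of a distribution give important information on the related model, mainly on its uni/multimodality nature. Here, the mode(s) is (are) among the critical point(s) of f(x; α, β, ξ), which is (are) given by the solution(s) of the following non-linear equation, obtained by differentiating log f(x; α, β, ξ) with respect to x:
$$\frac{\partial g(x; \xi)/\partial x}{g(x; \xi)} + (\beta - 1)\frac{g(x; \xi)}{G(x; \xi)} + \frac{\alpha(\beta - \alpha)g(x; \xi)G(x; \xi)^{\alpha-1}}{\beta + (\beta - \alpha)G(x; \xi)^{\alpha}} - \frac{2\alpha g(x; \xi)G(x; \xi)^{\alpha-1}}{1 + G(x; \xi)^{\alpha}} = 0.$$
Then, a mode, say $x_{m}$, is a critical point at which the second derivative of log f(x; α, β, ξ) with respect to x is negative. For a given baseline distribution and parameters, the use of mathematical software is required to determine the numerical value of a mode.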
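As an illustration of the numerical determination of a mode, the sketch below solves the critical-point equation, i.e., sets the derivative of log f to zero, by bisection. It assumes a unit exponential baseline (for which the ratio g'/g equals −1) and a single sign change on the search interval; both are assumptions made here for the example, not properties stated in the paper.

```python
import math


def pdf(x, a, b):
    """RE-G pdf with a unit exponential baseline (illustrative assumption)."""
    u = 1.0 - math.exp(-x)
    return 2.0 * math.exp(-x) * u ** (b - 1.0) * (b + (b - a) * u**a) / (1.0 + u**a) ** 2


def dlog_pdf(x, a, b):
    """Derivative of log f with respect to x, for the unit exponential baseline."""
    u = 1.0 - math.exp(-x)  # G(x)
    gx = math.exp(-x)       # g(x), so that g'(x)/g(x) = -1
    ua = u**a
    return (-1.0
            + (b - 1.0) * gx / u
            + a * (b - a) * gx * (ua / u) / (b + (b - a) * ua)
            - 2.0 * a * gx * (ua / u) / (1.0 + ua))


def critical_point(a, b, lo=1e-6, hi=20.0, iters=200):
    """Bisection on dlog_pdf, assuming one change of sign (+ to -) on [lo, hi]."""
    if not (dlog_pdf(lo, a, b) > 0.0 > dlog_pdf(hi, a, b)):
        raise ValueError("no sign change of the score on the search interval")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dlog_pdf(mid, a, b) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For (α, β) = (1.5, 2), the returned point is an interior maximum of the pdf, which can be confirmed by comparing pdf values on both sides.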

On the RE-G hrf
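Now, let us present some properties on the curvature of h(x; α, β, ξ), which is informative on several survival analysis aspects (see [12]). When x → −∞, let us distinguish the cases α > 0 on the one hand, then α = 0 and α < 0 on the other hand.
When x → +∞, for β > α/2, we have $h(x; \alpha, \beta, \xi) \sim g(x; \xi)/[1 - G(x; \xi)]$, i.e., the hrf corresponding to G(x; ξ). The critical point(s) of h(x; α, β, ξ) is (are) given by the solution(s) of the following equation, obtained by differentiating log h(x; α, β, ξ) with respect to x:
$$\frac{\partial g(x; \xi)/\partial x}{g(x; \xi)} + (\beta - 1)\frac{g(x; \xi)}{G(x; \xi)} + \frac{\alpha(\beta - \alpha)g(x; \xi)G(x; \xi)^{\alpha-1}}{\beta + (\beta - \alpha)G(x; \xi)^{\alpha}} - \frac{\alpha g(x; \xi)G(x; \xi)^{\alpha-1}}{1 + G(x; \xi)^{\alpha}} - \frac{g(x; \xi)[\alpha G(x; \xi)^{\alpha-1} - 2\beta G(x; \xi)^{\beta-1}]}{1 + G(x; \xi)^{\alpha} - 2G(x; \xi)^{\beta}} = 0.$$
The nature of a critical point, say $x_{c}$, can be determined by studying the sign of $\partial^{2}\log h(x; \alpha, \beta, \xi)/\partial x^{2}$ at $x = x_{c}$. Again, there is no closed form for $x_{c}$; mathematical software seems necessary to obtain an efficient numerical approximation.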

Series Expansions
Now, we claim that the pdf of the RE-G family can be expressed as an infinite linear combination of pdfs of the Exp-G family, in a similar fashion to the pdfs of the families developed by [8,14,15], among others. This is formulated in the result below.
Proof. Let us distinguish the cases α > 0 on the one hand, then α = 0 and α < 0 on the other hand.
This ends the proof of Proposition 2.
The interest of Proposition 2 is the use of some well-known properties and definitions of the Exp-G family to derive those of the RE-G family. This point is illustrated for the moments and some crucial functions in the next subsection.

Moments: Related Measures and Functions
Let X be a random variable having the cdf of the RE-G family, i.e., given by (2). Then, for any function φ(x), by the transfer theorem, the expectation of φ(X) is given by
$$\Theta_{\varphi}(X) = E(\varphi(X)) = \int_{-\infty}^{+\infty}\varphi(x)f(x; \alpha, \beta, \xi)\,dx$$
(provided that it exists). The considered domain of integration is R in full generality; it can be reduced, depending on the supports of φ(x) and g(x; ξ). By using the change of variables u = G(x; ξ), it can also be expressed as
$$\Theta_{\varphi}(X) = 2\int_{0}^{1}\varphi(Q_{G}(u; \xi))\frac{u^{\beta-1}\{\beta + (\beta - \alpha)u^{\alpha}\}}{(1 + u^{\alpha})^{2}}\,du.$$
For given G(x; ξ) and parameters, we can provide a numerical evaluation of this integral by using any mathematical software. For analytical purposes, Proposition 2 can be of interest; it implies that $\Theta_{\varphi}(X)$ can be approximated by the sum of the first K terms of the corresponding series of Exp-G expectations, for a large enough integer K. From the numerical point of view, this approximation may be more efficient than computing directly the integral form of $\Theta_{\varphi}(X)$, which can be prone to rounding-off errors, as discussed in [14]. Some notable measures and functions derived from $\Theta_{\varphi}(X)$ can be obtained in this way. In particular, the mth incomplete moment with respect to t follows by taking φ(x) = x^m if x ≤ t, and 0 elsewhere, and, by choosing $\varphi(x) = e^{itx}$, $i = \sqrt{-1}$, we get the characteristic function of X with respect to t. Further applications of these measures and functions under the forms (4) can be found in [8,14,15], among others.
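The integral obtained with the change of variables u = G(x; ξ) is convenient for numerical work because it lives on (0, 1). A minimal sketch using the midpoint rule is given below; the unit exponential quantile function `exp_quantile` is an illustrative baseline chosen here, and the integrand weight is the density of $2u^{\beta}/(1+u^{\alpha})$ with respect to u.

```python
import math


def expectation(phi, a, b, quantile_g, n=50_000):
    """Midpoint-rule approximation of E[phi(X)] written in the variable u = G(x)."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        weight = 2.0 * u ** (b - 1.0) * (b + (b - a) * u**a) / (1.0 + u**a) ** 2
        total += phi(quantile_g(u)) * weight
    return total / n


def exp_quantile(u):
    """Quantile function of the unit exponential baseline (illustrative assumption)."""
    return -math.log1p(-u)


# With alpha = 0 and beta = 1 the model reduces to the baseline itself,
# so the first two moments should be close to 1 and 2, respectively.
mean_exp = expectation(lambda x: x, 0.0, 1.0, exp_quantile)
second_exp = expectation(lambda x: x * x, 0.0, 1.0, exp_quantile)

# Total probability mass for a genuinely two-parameter case.
total_mass = expectation(lambda x: 1.0, 1.5, 2.0, exp_quantile)
```

The midpoint rule is used only for simplicity; integrands with endpoint singularities (for instance β < 1) would call for a more careful quadrature.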

Maximum Likelihood Method
Here, we adopt a general statistical point of view. We consider the RE-G models and investigate the estimation of the model parameters by a widely used and efficient estimation method: the maximum likelihood method, for complete samples only. Let $x_{1}, \ldots, x_{n}$ be n independent observations of a random variable having the pdf of the RE-G family, i.e., given by (3). Then, the log-likelihood function is defined by
$$\ell(\alpha, \beta, \xi) = n\log 2 + \sum_{i=1}^{n}\log g(x_{i}; \xi) + (\beta - 1)\sum_{i=1}^{n}\log G(x_{i}; \xi) + \sum_{i=1}^{n}\log\{\beta + (\beta - \alpha)G(x_{i}; \xi)^{\alpha}\} - 2\sum_{i=1}^{n}\log[1 + G(x_{i}; \xi)^{\alpha}].$$
Hence, the maximum likelihood estimates (MLEs) of α, β and ξ, say $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\xi}$, respectively, are defined by $(\hat{\alpha}, \hat{\beta}, \hat{\xi}) = \mathrm{argmax}_{\alpha, \beta, \xi}\,\ell(\alpha, \beta, \xi)$. This maximization can be performed either directly by using any statistical software such as R (with the package AdequacyModel) or SAS (with the procedure PROC NLMIXED), or by solving the nonlinear likelihood equations obtained by differentiating $\ell(\alpha, \beta, \xi)$ with respect to the model parameters. In this regard, the score function is useful; it can be expressed as $U(\alpha, \beta, \xi) = (\partial\ell(\alpha, \beta, \xi)/\partial\alpha, \partial\ell(\alpha, \beta, \xi)/\partial\beta, \partial\ell(\alpha, \beta, \xi)/\partial\xi)$, whose elements are given in the Appendix. Thus, the solutions of the system of non-linear equations U(α, β, ξ) = 0 with respect to α, β and ξ give $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\xi}$. Also, the observed information matrix can be expressed analytically, allowing the corresponding standard errors to be derived. The complete theory can be found in [16].
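To make the objective function concrete, the following sketch evaluates the log-likelihood as the sum of the log-pdf values, with the baseline cdf `G` and pdf `g` passed as plain functions. The function name, the parameter check (read from Theorem 1) and the convention of returning −∞ outside the admissible region are choices made here for illustration; the paper itself relies on R or SAS routines.

```python
import math


def log_likelihood(a, b, data, G, g):
    """RE-G log-likelihood for shape parameters (a, b) and a fixed baseline (G, g).

    Returns -inf outside the admissible parameter region or when a term is undefined.
    """
    if not ((b != 0 and b >= a / 2) or (b == 0 and a < 0)):
        return float("-inf")
    total = len(data) * math.log(2.0)
    for x in data:
        u, gx = G(x), g(x)
        if u <= 0.0 or gx <= 0.0:
            return float("-inf")
        core = b + (b - a) * u**a
        if core <= 0.0:
            return float("-inf")
        total += (math.log(gx) + (b - 1.0) * math.log(u)
                  + math.log(core) - 2.0 * math.log1p(u**a))
    return total


# Illustrative baseline and data (both hypothetical): unit exponential.
def exp_cdf(x):
    return 1.0 - math.exp(-x)


def exp_pdf(x):
    return math.exp(-x)


sample = [0.3, 0.8, 1.7, 2.4]
ll_baseline = log_likelihood(0.0, 1.0, sample, exp_cdf, exp_pdf)
```

With α = 0 and β = 1 the model reduces to the baseline, so `ll_baseline` should equal the exponential log-likelihood, here minus the sum of the observations. In practice this function would be handed to a numerical optimizer, jointly with the baseline parameters.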

Numerical Studies
This section is devoted to the applicability of the RE-G family in a concrete statistical setting. First of all, we introduce a special member of interest, then we investigate the efficiency of the MLEs of the related parameters and analyze three practical data sets, with discussions.

The REW Distribution
For practical purposes, we pay particular attention to a special member of the RE-G family defined with the Weibull distribution with parameters µ > 0 and θ > 0 as baseline. We call it the REW distribution. Thus, we aim to improve some characteristics of the Weibull distribution (and the related model as well), such as the skewness, kurtosis and heaviness of the tails, by tuning the parameters α and β in (2). Hence, the cdf and pdf of the REW distribution follow from (2) and (3), respectively, by taking the Weibull cdf and pdf for G(x; ξ) and g(x; ξ), with ξ = (µ, θ). Here, µ > 0 is a scale parameter, and β, θ > 0 and α ∈ R with β ≥ α/2 are shape parameters (α can be negative). The corresponding hrf is obtained in the same way. The REW distribution constitutes a new four-parameter lifetime distribution, whose pdf and hrf enjoy attractive flexible properties for modelling purposes, as illustrated in Figures 1 and 2. From Figure 1, we see that the pdf can be left skewed, right skewed, nearly symmetrical and can also present reverse J shapes. The tails of the distribution can be more or less heavy, mainly depending on the values of µ; the tails are heavier as µ decreases. We also observe various degrees of leptokurtic, mesokurtic and platykurtic shapes. As a notable fact, Figure 2 reveals that the hrf of the REW distribution exhibits a wide variety of monotonic and non-monotonic shapes, such as increasing, decreasing, decreasing-increasing-decreasing, constant, bathtub and upside-down bathtub shapes.
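A small sketch of the REW functions is given below. The Weibull parametrization used, G(x) = 1 − exp(−(µx)^θ) for x > 0, is an assumption made here for the example; the exact parametrization of the paper is not recoverable from the text, and other conventions would only change how µ enters the formulas.

```python
import math


def weibull_cdf(x, mu, theta):
    # ASSUMED parametrization: G(x) = 1 - exp(-(mu * x)**theta), x > 0.
    return 1.0 - math.exp(-((mu * x) ** theta)) if x > 0 else 0.0


def weibull_pdf(x, mu, theta):
    if x <= 0:
        return 0.0
    z = (mu * x) ** theta
    return theta * z / x * math.exp(-z)  # theta * mu^theta * x^(theta-1) * exp(-z)


def rew_cdf(x, a, b, mu, theta):
    u = weibull_cdf(x, mu, theta)
    return 0.0 if u <= 0.0 else 2.0 * u**b / (1.0 + u**a)


def rew_pdf(x, a, b, mu, theta):
    u = weibull_cdf(x, mu, theta)
    if u <= 0.0:
        return 0.0
    gx = weibull_pdf(x, mu, theta)
    return 2.0 * gx * u ** (b - 1.0) * (b + (b - a) * u**a) / (1.0 + u**a) ** 2


def rew_hrf(x, a, b, mu, theta):
    return rew_pdf(x, a, b, mu, theta) / (1.0 - rew_cdf(x, a, b, mu, theta))
```

Two sanity checks follow from the general theory: with α = 0 and β = 1 the REW cdf coincides with the Weibull cdf, and if in addition θ = 1 the hrf is constant (equal to µ under the assumed parametrization).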
In addition, all the general theoretical properties presented in the above section can be applied without effort.
Naturally, one can consider other special members of the RE-G family by choosing other baseline distributions, depending on the context of the study. In particular, the joint action of the parameters α and β can be of interest to modulate the tail properties of distributions that are somewhat rigid in this respect. We may think of the ("only" right heavy-tailed) Pareto distribution, whose extensions have received a lot of attention in recent years. We may refer to [9,17], and the references therein.
The remainder of the study focuses on the numerical aspects of the REW model. The software R is used (see [18]), with the package AdequacyModel, which allows the adequacy of statistical models to be analyzed via several statistics for a given data set. For all the technical details, we refer the reader to [19].

Simulation Study
Here, viewing the REW distribution as a statistical model with parameters α, β, µ and θ, we propose a Monte Carlo simulation study to evaluate the accuracy of the MLEs $\hat{\alpha}$, $\hat{\beta}$, $\hat{\mu}$ and $\hat{\theta}$, as described in Section 3.6. Under some technical conditions, it is well known that the MLEs are asymptotically unbiased and consistent. We illustrate these properties numerically for several sample sizes. Tables 1 and 2 show that, as the sample size n increases, the MLEs get close to the true values of the parameters and the corresponding mean squared errors (MSEs) tend to 0, which is in accordance with the theoretical properties of the MLEs.
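The sampling step of such a Monte Carlo study can be sketched by inverse transform: draw p uniformly on (0, 1), solve $2v^{\beta}/(1+v^{\alpha}) = p$ for the baseline level v by bisection, and map v through the baseline quantile function. The code below is a hypothetical illustration, not the procedure used in the paper; it assumes the Weibull parametrization G(x) = 1 − exp(−(µx)^θ), and the estimation step (maximizing the log-likelihood on each sample) is left out.

```python
import math
import random


def solve_level(p, a, b, iters=100):
    """Solve 2 v^b / (1 + v^a) = p for v in (0, 1) by bisection (transform is nondecreasing)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 2.0 * mid**b / (1.0 + mid**a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rew_sample(n, a, b, mu, theta, rng):
    """Draw n values by inverse transform (assumed Weibull baseline, see lead-in)."""
    out = []
    for _ in range(n):
        v = min(solve_level(rng.random(), a, b), 1.0 - 2.0**-53)  # keep v < 1
        out.append((-math.log1p(-v)) ** (1.0 / theta) / mu)      # Weibull quantile
    return out
```

A typical use would be `rew_sample(200, 1.5, 2.0, 1.0, 1.5, random.Random(1))`, repeated over many replications and sample sizes to estimate biases and MSEs.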

Data Fitting
Application of the REW model is provided by means of the following three practical data sets.
Data set 1: Carbon dioxide. The first data set represents the annual mean growth rate of carbon dioxide at Mauna Loa (Hawaii). The measurements are given in parts per million per year. The data are available on the Earth System Research Laboratory website, at the following link: https://www.esrl.noaa.gov/gmd/ccgg/trends/gr.html We have analyzed these data for the period from 1959 to 2014.
Data set 2: Annual flood discharges. The second data set was originally reported by [20]. It represents the maximum annual flood discharges of the North Saskatchewan River at Edmonton, over a period of 47 years. The measurements are given in 1000 cubic feet per second.
Data set 3: Rainfall. The third data set consists of the mean of maximum daily rainfall for 30 years at 35 stations in the middle and west of peninsular Malaysia. These data were recently studied by [21].
As immediate remarks, the overall histogram shape of Carbon dioxide is nearly symmetrical, the one of Annual flood discharges is highly right skewed and the one of Rainfall is moderately left skewed; these data sets are thus of different natures. In what follows, we show that the REW model has the ability to fit these data in an efficient manner.
In this regard, we aim to compare the fits of the REW model with those of five other solid models, also defined with the Weibull distribution as baseline, namely: transmuted-Weibull (TW) model studied by [22], Marshall-Olkin exponential Weibull (MOEW) model introduced by [23], odd log-logistic modified-Weibull (OLLMW) model proposed by [24], Kumaraswamy-Weibull (KW) model introduced by [15] and Beta-Weibull (BW) model proposed by [25].
The values of the MLEs of the model parameters are given (with four decimals) in Tables 3-5 for Carbon dioxide, Annual flood discharges and Rainfall, respectively. For fitting comparison purposes, we consider the following criteria: minus log-likelihood ($-\hat{\ell}$), i.e., the minus log-likelihood of the model taken at the corresponding MLEs, Akaike information criterion (AIC), Cramér-von Mises (W), Anderson-Darling (AD) and Kolmogorov-Smirnov (KS) statistics, as well as the p-value of the related KS test. In their standard forms, they are, respectively, defined by
$$\mathrm{AIC} = -2\hat{\ell} + 2p, \quad W = \frac{1}{12n} + \sum_{i=1}^{n}\left[\frac{2i - 1}{2n} - y_{i}\right]^{2}, \quad AD = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[\log y_{i} + \log(1 - y_{n+1-i})\right],$$
$$KS = \max_{i=1,\ldots,n}\left(\frac{i}{n} - y_{i},\; y_{i} - \frac{i - 1}{n}\right)$$
and p-value = $P(D_{n} \ge KS)$, with $D_{n} = \sup_{x\in\mathbb{R}}|F_{n}(x) - \hat{F}(x)|$, where p is the number of parameters of the considered model, $x_{(1)}, \ldots, x_{(n)}$ are the ordered observations, $y_{i} = \hat{F}(x_{(i)})$, where $\hat{F}(x)$ denotes the cdf of the model defined with the corresponding MLEs for the parameters, and $F_{n}(x)$ denotes the random empirical cdf. Details on these measures can be found in [26]. We also refer to [27] concerning the KS test. For a given data set and model, the values of these measures are listed in the output of the command goodness.fit of the R package AdequacyModel.
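For readers who want to reproduce such criteria outside R, the following sketch computes the AIC and the textbook forms of the Cramér-von Mises, Anderson-Darling and Kolmogorov-Smirnov statistics from the fitted cdf values at the ordered observations. The function name and the returned dictionary are choices made here; the R routine cited in the text may report adjusted variants of W and AD, so values need not coincide exactly.

```python
import math


def fit_statistics(y, loglik, p):
    """Goodness-of-fit summaries from y_i = fitted cdf at the i-th ordered observation.

    y must be sorted increasingly with all values strictly inside (0, 1).
    """
    n = len(y)
    aic = -2.0 * loglik + 2.0 * p
    w = 1.0 / (12.0 * n) + sum(
        ((2 * i - 1) / (2.0 * n) - yi) ** 2 for i, yi in enumerate(y, start=1)
    )
    ad = -n - sum(
        (2 * i - 1) * (math.log(y[i - 1]) + math.log(1.0 - y[n - i]))
        for i in range(1, n + 1)
    ) / n
    ks = max(max(i / n - yi, yi - (i - 1) / n) for i, yi in enumerate(y, start=1))
    return {"AIC": aic, "W": w, "AD": ad, "KS": ks}
```

As a quick check, for the ideal values y_i = (2i − 1)/(2n) the sum in W vanishes, so W = 1/(12n) and KS = 1/(2n).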
The rule is universal: a lower value of $-\hat{\ell}$, AIC, W, AD and KS, and a larger value of the p-value of the KS test, indicate a better fit. The obtained results for the considered models are given in Tables 6-8 for Carbon dioxide, Annual flood discharges and Rainfall, respectively. For the three data sets, Tables 6-8 indicate that the REW model has the smallest values for $-\hat{\ell}$, AIC, W, AD and KS, and the largest value of the p-value for the KS test (which is near optimal in each case, in the sense that p-value ≈ 1), indicating that it provides the best fit among the competitors.
We illustrate this sharpness by plotting various fits of the REW model, namely the curves of the fitted pdfs, cdfs and sfs, i.e., $\hat{f}(x) = f(x; \hat{\alpha}, \hat{\beta}, \hat{\mu}, \hat{\theta})$, $\hat{F}(x) = F(x; \hat{\alpha}, \hat{\beta}, \hat{\mu}, \hat{\theta})$ and $\hat{S}(x) = S(x; \hat{\alpha}, \hat{\beta}, \hat{\mu}, \hat{\theta})$, over the corresponding histograms, empirical cdfs and empirical sfs, respectively, and the probability-probability (P-P) plots in Figures 3-5 for Carbon dioxide, Annual flood discharges and Rainfall, respectively. In all the visual investigations of Figures 3-5, near perfect fits are observed, attesting to the adequacy of the REW model for further studies on these data sets, among others.

Concluding Remarks
The paper started by providing a new theoretical result, involving a generalized cumulative distribution function, which can be viewed as an extension of the one defining the TIIHL-G family. The corresponding family is called the ratio exponentiated-generated (RE-G) family. Then, we used this new result to elaborate upon a new attractive flexible family of distributions from the statistical point of view. In particular, a new modified four-parameter Weibull distribution is derived, called the REW distribution. We show how it can be applied in a concrete statistical framework, involving the analysis of data sets with different features. In particular, it is shown that the REW model can perform better than the following five other well-reputed models: TW, MOEW, OLLMW, KW and BW models, all having a plethora of applications in various articles. The same fate is expected for the REW model. As a perspective of work following the same scheme, since the RE-G and MO-G families are complementary in their definitions, one can consider a generalization of the MO-G family by providing an answer to the following question: What are the possible values for α, β and θ such that the following function has the properties of a cdf?
$$F(x; \alpha, \beta, \theta, \xi) = 1 - \frac{\theta[1 - G(x; \xi)^{\beta}]}{1 - (1 - \theta)[1 - G(x; \xi)^{\alpha}]}, \quad x \in \mathbb{R}.$$
This expression is motivated by the following facts: for α = β = 1, F(x; α, β, θ, ξ) becomes the cdf of the MO-G family (not covered by the RE-G family), and, on the other hand, by taking θ = 1/2, we get $F(x; \alpha, \beta, \theta, \xi) = [G(x; \xi)^{\alpha} + G(x; \xi)^{\beta}]/[1 + G(x; \xi)^{\alpha}]$, corresponding to the cdf of the M-G family by [11] with two baseline exponentiated cdfs with power parameters α and β. This perspective thus unifies these two families, generating a myriad of new ratio-type distributions with possible wide