1. Introduction
The sharp and fine analysis of modern data sets often requires in-depth statistical treatments, beyond the capabilities of the usual statistical models. Accordingly, particular attention has been paid to defining models with new features, motivating the development of general families of distributions able to generate flexible distributions. Among these families, we may mention the skew-normal family pioneered by [1], the exponentiated-generated (Exp-G) family proposed by [2], the Marshall-Olkin-generated (MO-G) family studied by [3], the order statistics-generated family introduced by [4], the sinh-arcsinh-generated family developed by [5], the beta-generated (B-G) family introduced by [6], the transmuted-generated (T-G) family developed by [7], the gamma-generated (Gam-G) family proposed by [8] and the Pareto ArcTan family developed by [9]. Each of them has generated a plethora of distributions and statistical models, widely used in practice.
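In order to illustrate their diversity, we now succinctly present some of these families from the mathematical point of view. The simplest one uses the power transform; the Exp-G family is defined by the following cumulative distribution function (cdf): $F(x;\alpha,\xi)=G(x;\xi)^{\alpha}$, $x\in\mathbb{R}$, where $G(x;\xi)$ denotes a cdf of a continuous distribution, with $\xi$ as a generic parameter vector, and $\alpha>0$. Based on the geometric series expansion as a prime structure, the MO-G family is defined by the following cdf: $F(x;\theta,\xi)=G(x;\xi)/\{1-(1-\theta)[1-G(x;\xi)]\}$, $x\in\mathbb{R}$, where $\theta>0$. Centered around the so-called beta function, the B-G family is defined by the following cdf: $F(x;a,b,\xi)=\frac{1}{B(a,b)}\int_0^{G(x;\xi)}t^{a-1}(1-t)^{b-1}\,dt$, $x\in\mathbb{R}$, where $a>0$, $b>0$ and $B(a,b)=\int_0^1 t^{a-1}(1-t)^{b-1}\,dt$. By the use of the quadratic rank transmutation, the T-G family is defined by the following cdf: $F(x;\lambda,\xi)=(1+\lambda)G(x;\xi)-\lambda G(x;\xi)^2$, $x\in\mathbb{R}$, with $\lambda\in[-1,1]$. Finally, emerging from the so-called gamma function, the Gam-G family is defined by the following cdf: $F(x;\delta,\xi)=\frac{1}{\Gamma(\delta)}\int_0^{-\log[1-G(x;\xi)]}t^{\delta-1}e^{-t}\,dt$, $x\in\mathbb{R}$, where $\delta>0$, and $\Gamma(\delta)=\int_0^{\infty}t^{\delta-1}e^{-t}\,dt$.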
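To make the generator idea concrete, the following minimal sketch evaluates three of the families above on a common baseline. It assumes the standard literature forms of the Exp-G, MO-G and T-G cdfs; the Weibull baseline, the function names and the parameter names (`alpha`, `theta`, `lam`) are choices of this sketch, not notation taken from the paper.

```python
import math

def weibull_cdf(x, shape=1.5, scale=1.0):
    """Baseline cdf G(x): a Weibull cdf, chosen here only for illustration."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def exp_g(G, alpha):
    """Exp-G generator (assumed standard form): G**alpha, with alpha > 0."""
    return G ** alpha

def mo_g(G, theta):
    """MO-G generator (assumed standard form): G / (1 - (1 - theta)(1 - G)), theta > 0."""
    return G / (1.0 - (1.0 - theta) * (1.0 - G))

def t_g(G, lam):
    """T-G generator (assumed standard form): (1 + lam) G - lam G**2, |lam| <= 1."""
    return (1.0 + lam) * G - lam * G * G

# Evaluate each generated cdf on a grid and check that it is non-decreasing.
xs = [0.1 * i for i in range(0, 41)]
generated = {
    "Exp-G": lambda G: exp_g(G, 2.0),
    "MO-G": lambda G: mo_g(G, 0.5),
    "T-G": lambda G: t_g(G, -0.7),
}
for name, gen in generated.items():
    values = [gen(weibull_cdf(x)) for x in xs]
    monotone = all(v2 >= v1 for v1, v2 in zip(values, values[1:]))
    print(name, "non-decreasing on grid:", monotone)
```

Each generator only transforms the value of the baseline cdf, so swapping `weibull_cdf` for another continuous cdf yields a different member of the same family.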
In this paper, we contribute to the subject by going beyond some standard assumptions; we first show that the following function:
remains a valid cdf under wide assumptions on the parameters
and
(allowing possibly different and/or negative values). This cdf can be viewed as an extension of the type II half logistic-G (TIIHL-G) family by [10] or, alternatively, a hybrid version of the cdfs corresponding to the Exp-G and MO-G families. More specifically, if
, it corresponds to the cdf of the Exp-G family, if
, it corresponds to the cdf of the MO-G family with parameter
(also corresponding to the M-G family by [11]) and, if
, it becomes the cdf of the type II half logistic-G (TIIHL-G) family. However, to the best of our knowledge, the general case including possible
remains unexplored, and is the motivation for this study.
We thus introduce the ratio exponentiated general (or generated) (RE-G) family of distributions defined by the cdf (1), with values of
and
to be specified later. We investigate some interesting mathematical properties of the family, including the analytical expressions of the main corresponding functions, useful stochastic ordering results, analysis of the asymptotes and mode(s) of the probability density and hazard rate functions, series expansions of the probability density function, various measures involving moments and general formulas related to the maximum likelihood method. Then, we pay particular attention to a special member of the family based on the Weibull distribution as baseline. It constitutes a new and simple four-parameter distribution with many attractive features for the statistician. In particular, we show that the corresponding probability density and hazard rate functions enjoy flexible shape properties, which are desirable for modelling purposes. Thus, the related model is able to capture the complexity of various kinds of data. We illustrate this claim by the use of the maximum likelihood method (validated by a simulation study) and by means of three practical data sets. Our model proves to be competitive against five strong competing models, with notable gains in terms of well-established criteria.
This paper is organized as follows. In Section 2, we present the mathematical foundations of the RE-G family. Some notable properties are derived in Section 3. Numerical studies are provided in Section 4, including a special member of the RE-G family, a simulation study and analyses of three practical data sets, with discussions. Section 5 draws some concluding remarks.
3. Mathematical Properties
In this section, we derive the main mathematical properties of the RE-G family, with discussions.
3.1. Stochastic Ordering Results
Here, we investigate some stochastic ordering relations between the RE-G and Exp-G families according to the values of and .
First of all, as immediate results,
if , we have ,
if , the reversed inequality holds: .
Indeed, if , we have so and if , we have so . The sign of the parameter thus determines an immediate stochastic hierarchy between the RE-G and Exp-G families.
The following result proposes a refinement of the upper bound for .
Proposition 1. For any (such that and , or and ), we have
Proof. Let us distinguish the following two cases: and on the one hand, and and on the other hand.
- Case I:
and . We can express
as
Now remark that since , implying the desired result.
- Case II:
and . We can express
as
Since , the desired result follows.
This ends the proof of Proposition 1. □
Hence, Proposition 1 shows the close relation between the RE-G and Exp-G families. Also, for practical purposes, the above result shows that the RE-G family reaches different modelling targets than the Exp-G family; the RE-G models can be more adequate than the Exp-G models, depending on the nature of the data.
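The first-order ordering between the two families can be checked numerically. The sketch below assumes that the RE-G cdf has the ratio form 2 G^alpha / (1 + G^beta); this form and the names `alpha`, `beta` are assumptions of the sketch (the form is chosen because it reduces to the Exp-G cdf at beta = 0 and to the TIIHL-G cdf at alpha = beta), not a quotation of the paper's definition.

```python
def re_g_cdf(G, alpha, beta):
    """Assumed RE-G form: 2 * G**alpha / (1 + G**beta), for a baseline value 0 < G < 1."""
    return 2.0 * G ** alpha / (1.0 + G ** beta)

def exp_g_cdf(G, alpha):
    """Exp-G cdf: G**alpha."""
    return G ** alpha

# Values of the baseline cdf strictly inside (0, 1).
grid = [i / 100.0 for i in range(1, 100)]

def ordering_holds(alpha, beta, tol=1e-12):
    """Check RE-G >= Exp-G when beta >= 0, and RE-G <= Exp-G when beta < 0."""
    if beta >= 0:
        return all(re_g_cdf(G, alpha, beta) >= exp_g_cdf(G, alpha) - tol for G in grid)
    return all(re_g_cdf(G, alpha, beta) <= exp_g_cdf(G, alpha) + tol for G in grid)

print(ordering_holds(2.0, 1.0))   # beta >= 0 branch
print(ordering_holds(1.5, -0.5))  # beta < 0 branch
```

Under the assumed form, the check reflects the fact that G^beta is at most 1 when beta is non-negative and at least 1 when beta is negative, so the denominator is at most 2 or at least 2, respectively.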
3.2. On the RE-G pdf
Let us now present some properties of the curvature of , which can be informative for fitting purposes (uni/multimodality, polynomial/exponential decay of the tails, etc.).
First of all, when , let us distinguish the cases on the one hand, then and on the other hand.
If , we have .
If , we have .
If , we have .
Also, when , we have . For a given , a possible polynomial or exponential decay of the limiting functions characterizes the heaviness of the tails of the corresponding RE-G distribution.
The mode(s) of a distribution give important information about the related model, mainly on its uni/multimodality. Here, the mode(s) is (are) given by the critical point(s) of
, which is (are) given by the solution(s) of the following non-linear equation:
, with derivative with respect to
x, where
Then, a critical point, say , is a local maximum if , a local minimum if and a point of inflexion if . For a given baseline distribution and parameters, mathematical software is required to determine the numerical value of a mode.
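As an illustration of such a numerical determination, the sketch below locates a mode by a crude grid search. It assumes the ratio form 2 G^alpha / (1 + G^beta) for the RE-G cdf, whose derivative gives the pdf used in the code, together with a Weibull baseline; the form, the baseline and all names are assumptions of this sketch rather than the paper's definitions.

```python
import math

def re_weibull_pdf(x, alpha, beta, shape, scale):
    """Pdf obtained by differentiating the assumed cdf 2 * G**alpha / (1 + G**beta),
    where G is a Weibull cdf with the given shape and scale (x > 0)."""
    z = (x / scale) ** shape
    G = -math.expm1(-z)  # 1 - exp(-z)
    g = (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-z)
    Gb = G ** beta
    return 2.0 * g * G ** (alpha - 1) * (alpha + (alpha - beta) * Gb) / (1.0 + Gb) ** 2

def grid_mode(pdf, lo, hi, n=20000):
    """Return the midpoint of the grid cell in (lo, hi) where pdf is largest."""
    xs = [lo + (hi - lo) * (i + 0.5) / n for i in range(n)]
    return max(xs, key=pdf)

# Sanity case: alpha = 1, beta = 0 gives back the Weibull baseline,
# whose mode is scale * ((shape - 1) / shape) ** (1 / shape) for shape > 1.
mode = grid_mode(lambda x: re_weibull_pdf(x, 1.0, 0.0, 2.0, 1.0), 0.0, 5.0)
print(mode, math.sqrt(0.5))
```

A grid search only returns the global maximiser on the grid; detecting several local modes would require scanning for every sign change of the discrete slope.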
3.3. On the RE-G hrf
Now, let us present some properties of the curvature of
, which is informative on several survival analysis aspects (see [12]). When
, let us distinguish the cases
on the one hand, then
and
on the other hand.
If , we have .
If , we have .
If , we have .
Also, when , we have , where , i.e., the hrf corresponding to .
The critical point(s) of
is (are) given by the solution(s) of the following equation:
, where
The nature of a critical point, say , can be determined by studying the sign of . Again, there is no closed form for ; mathematical software is needed to obtain an accurate numerical approximation.
3.4. Series Expansions
Now, we claim that the pdf of the RE-G family can be expressed as an infinite linear combination of pdfs of the Exp-G family, in a similar fashion to the pdfs of the families developed by [8, 14, 15], among others. This is formulated in the result below.
Proposition 2. There exist two sequences of real numbers and such that, for x satisfying (excluding the limit cases), we have
where is the pdf of the Exp-G family with power parameter .
Proof. Let us distinguish the cases on the one hand, then and on the other hand.
- Case I:
. Since
, the standard power series expansion gives
Upon differentiation with respect to
x, we get
Hence, we can take and .
- Case II:
. We have immediately , so the desired result holds with and for any , and .
- Case III:
. By using again the standard power series expansion, we get
Upon differentiation with respect to
x, we get
Hence, we can take and .
This ends the proof of Proposition 2. □
Proposition 2 is of interest because well-known properties and definitions of the Exp-G family can be used to derive those of the RE-G family. This point is illustrated for the moments and some crucial functions in the next subsection.
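The convergence of such an expansion can be illustrated numerically. The sketch below assumes the ratio form 2 G^alpha / (1 + G^beta) for the RE-G cdf and, for beta > 0, expands 1 / (1 + G^beta) as a geometric-type power series, which expresses the cdf as a signed sum of Exp-G cdfs with powers alpha + k * beta. The form, the names and the chosen numerical values are assumptions of this sketch.

```python
def re_g_cdf(G, alpha, beta):
    """Assumed RE-G form: 2 * G**alpha / (1 + G**beta)."""
    return 2.0 * G ** alpha / (1.0 + G ** beta)

def re_g_cdf_series(G, alpha, beta, K):
    """Truncated expansion 2 * sum_{k=0}^{K} (-1)**k * G**(alpha + k * beta),
    valid for beta > 0 and 0 < G < 1 (so that G**beta < 1)."""
    return 2.0 * sum((-1) ** k * G ** (alpha + k * beta) for k in range(K + 1))

# Truncation error at one baseline value for increasing K.
G, alpha, beta = 0.5, 2.0, 1.5
errors = {K: abs(re_g_cdf(G, alpha, beta) - re_g_cdf_series(G, alpha, beta, K))
          for K in (5, 20, 60)}
for K, err in errors.items():
    print(K, err)
```

Differentiating each term G^(alpha + k * beta) with respect to x gives an Exp-G pdf with power parameter alpha + k * beta, which is how a cdf expansion of this kind carries over to the pdf.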
3.5. Moments: Related Measures and Functions
Let
X be a random variable having the cdf of the RE-G family, i.e., given by (2). Then, for any function
, by the transfer theorem, the expectation of
is given by
(provided that it exists). The considered domain of integration is
in full generality; it can be reduced, depending on the supports of
and
only. By using the change of variables
, it can also be expressed as
For given and parameters, we can provide a numerical evaluation of this integral by using any mathematical software.
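A minimal sketch of such a numerical evaluation is given below. It assumes the ratio form 2 G^alpha / (1 + G^beta) for the RE-G cdf, applies the change of variables u = G(x) so that the integral runs over (0, 1), and uses a plain midpoint rule. The cdf form, the Weibull quantile function and all names are assumptions of this sketch.

```python
import math

def re_g_expectation(phi, quantile, alpha, beta, n=20000):
    """Midpoint-rule approximation of E[phi(X)] = int_0^1 phi(Q(u)) w(u) du, where Q is the
    baseline quantile function and w(u) is the derivative of 2 * u**alpha / (1 + u**beta)."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        ub = u ** beta
        weight = 2.0 * u ** (alpha - 1) * (alpha + (alpha - beta) * ub) / (1.0 + ub) ** 2
        total += phi(quantile(u)) * weight
    return total / n

def weibull_quantile(u, shape=1.0, scale=1.0):
    """Quantile function of the Weibull baseline (unit exponential by default)."""
    return scale * (-math.log1p(-u)) ** (1.0 / shape)

# Sanity case: alpha = 1, beta = 0 gives back the baseline, whose mean is 1 here.
mean = re_g_expectation(lambda x: x, weibull_quantile, alpha=1.0, beta=0.0)
# Total mass for another parameter choice should be close to 1.
mass = re_g_expectation(lambda x: 1.0, weibull_quantile, alpha=2.0, beta=1.0)
print(mean, mass)
```

Working on (0, 1) avoids truncating an infinite integration range, at the cost of an integrable singularity of the quantile function near u = 1, which a finer grid or an adaptive rule would handle more accurately.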
For analytical purposes, Proposition 2 can be of interest; it implies that
for a large enough integer
K. From the numerical point of view, this approximation may be more efficient than directly computing the integral form of
, which can be prone to rounding-off errors, as discussed in [14]. Some notable measures and functions derived from
are listed below. By taking
,
becomes the mean of
X denoted by
, by choosing
, we obtain the
raw moment of
X, by taking
,
becomes the
central moment of
X, the
incomplete moment with respect to
t follows by taking
if
, and 0 elsewhere, and, by choosing
,
, we get the characteristic function of
X with respect to
t. Further applications of these measures and functions under the forms (4) can be found in [8, 14, 15], among others.
3.6. Maximum Likelihood Method
Here, we adopt a general statistical point of view. We consider the RE-G models and investigate the estimation of the model parameters by a very efficient estimation method: the maximum likelihood method, for complete samples only. Let
be
n independent observations of a random variable having the pdf of the RE-G family, i.e., given by (3). Then, the log-likelihood function is defined by
Hence, the maximum likelihood estimates (MLEs) of
,
and
, say
,
and
, respectively, are defined by
. This maximization can be performed either directly by using any statistical software such as R (with the package AdequacyModel) or SAS (with the procedure PROC NLMIXED), or by solving the nonlinear likelihood equations obtained by differentiating
with respect to the model parameters. In this regard, the score function is useful; it can be expressed as
, whose elements are given in Appendix A. Thus, the solutions of the system of non-linear equations:
with respect to
,
and
give
,
and
. Also, the observed information matrix can be expressed analytically, which allows the corresponding standard errors to be computed. The complete theory can be found in [16].
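The direct maximization route can be sketched as follows. The code assumes the ratio form 2 G^alpha / (1 + G^beta) for the RE-G cdf with a Weibull baseline, and replaces the R and SAS routines named above with a simple derivative-free compass search written for this illustration. The cdf form, the parameter constraint, the simulated data and every name are assumptions of the sketch.

```python
import math
import random

def re_weibull_logpdf(x, alpha, beta, shape, scale):
    """Log-pdf derived from the assumed cdf 2 * G**alpha / (1 + G**beta), Weibull baseline G.
    The constraint alpha > 0, beta < 2 * alpha is a sufficient validity condition
    worked out for this assumed form, not a condition quoted from the paper."""
    if x <= 0 or shape <= 0 or scale <= 0 or alpha <= 0 or beta >= 2 * alpha:
        return -math.inf
    z = (x / scale) ** shape
    G = -math.expm1(-z)  # 1 - exp(-z)
    if not 0.0 < G < 1.0:
        return -math.inf
    Gb = G ** beta
    log_g = math.log(shape / scale) + (shape - 1) * math.log(x / scale) - z
    return (math.log(2.0) + log_g + (alpha - 1) * math.log(G)
            + math.log(alpha + (alpha - beta) * Gb) - 2.0 * math.log1p(Gb))

def loglik(params, data):
    """Log-likelihood of a complete sample; invalid or overflowing points give -inf."""
    alpha, beta, shape, scale = params
    try:
        return sum(re_weibull_logpdf(x, alpha, beta, shape, scale) for x in data)
    except (OverflowError, ValueError):
        return -math.inf

def compass_search(f, start, step=0.5, tol=1e-4, max_iter=500):
    """Maximise f by trying +/- step along each coordinate; halve the step when stuck."""
    best, best_val = list(start), f(start)
    for _ in range(max_iter):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = best[:]
                trial[i] += delta
                val = f(trial)
                if val > best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0
            if step < tol:
                break
    return best, best_val

# Simulated data from the baseline itself (alpha = 1, beta = 0), for illustration only.
random.seed(0)
data = [random.weibullvariate(1.0, 1.5) for _ in range(100)]
start = [1.0, 0.0, 1.0, 1.0]
estimate, max_loglik = compass_search(lambda p: loglik(p, data), start)
print(estimate, max_loglik)
```

A compass search is slow and can stall on flat likelihood surfaces; in practice a quasi-Newton routine with several starting points, followed by a check of the score equations, would be a more reliable choice.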
5. Concluding Remarks
The paper started by providing a new theoretical result involving a generalized cumulative distribution function, which can be viewed as an extension of the one defining the TIIHL-G family. We then used this result to construct a new flexible family of distributions, called the ratio exponentiated-generated (RE-G) family. In particular, a new modified four-parameter Weibull distribution is derived, called the REW distribution. We show how it can be applied in a concrete statistical framework, involving the analysis of data sets with different features. In particular, we show that the REW model can perform better than the following five well-established models: the TW, MOEW, OLLMW, KW and BW models, all of which have numerous applications in the literature. We expect the REW model to find similarly wide use. As a direction for future work following the same scheme, since the RE-G and MO-G families are complementary in their definitions, one can consider a generalization of the MO-G family by providing an answer to the following question: What are the possible values for
,
and
such that the following function has the properties of a cdf?
This expression is motivated by the following facts: for
,
becomes the cdf of the MO-G family (not covered by the RE-G family), and, on the other hand, by taking
, we get
, corresponding to the cdf of the M-G family by [11] with two baseline exponentiated cdfs with power parameters
and
. This perspective thus unifies these two families, generating a myriad of new ratio-type distributions with a possibly wide range of values for
,
and
, beyond the standard positive values. The complete answer remains a mathematical challenge, requiring deep investigation in the future.