1. Introduction
The twenty-first century began with the establishment and extension of new tools for modern statistics. In distribution theory, one important development is the definition of useful new models, which are then tested on real-life data sets arising from simple to complex phenomena. Modern distribution theory has also motivated statisticians and practitioners to propose new generalized (G) families and to investigate their special models, which can be used effectively in different fields, in particular medicine, reliability engineering, agriculture, survival analysis, demography, actuarial science and others. The G-families proposed by Azzalini [
1] (skew-Normal-G (SN-G)), Marshall and Olkin [
2] (Marshall-Olkin-G (MO-G)), Gupta et al. [
3] (exponentiated-G (exp-G) [Lehmann alternative 1 (LA1) and Lehmann alternative 2 (LA2)]), Eugene et al. [
4] (beta-G), Gleaton and Lynch [
5] (odd log-logistic-G (OLL-G)), Shaw and Buckley [
6] (transmuted-G), Zografos and Balakrishnan [
7] (ZBgamma-G), Cordeiro and de-Castro [
8] (Kumaraswamy-G (Kw-G)), Alexander et al. [
9] (McDonald-G (Mc-G)), Ristić and Balakrishnan [
10] (RBgamma-G), Cordeiro et al. [
11] (exponentiated-generalized-G (EG-G)), Bourguignon et al. [
12] (odd Weibull-G (OW-G)), Tahir et al. [
13] (odd generalized-exponential (OGE-G)), Tahir et al. [
14] (logistic-X) and Rezaei et al. [
15] (Topp-Leone-G (TL-G)) have received increased attention in recent statistical literature. For more G-families, the reader is referred to Tahir and Nadarajah [
16], and Tahir and Cordeiro [
17].
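Kumaraswamy [18] pioneered a two-parameter model on the bounded unit interval $(0,1)$, which we denote here by the random variable (rv) $T$. The cumulative distribution function (cdf) and probability density function (pdf) of $T$ are
$$F(t) = 1 - (1 - t^{a})^{b}, \quad t \in (0,1), \tag{1}$$
and
$$f(t) = a\,b\,t^{a-1}(1 - t^{a})^{b-1}, \tag{2}$$
respectively, where $a > 0$ and $b > 0$ are shape parameters.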
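Cordeiro and de-Castro [8] defined the cdf and pdf of the Kw-G family by
$$F(x; a, b, \boldsymbol{\xi}) = 1 - \left[1 - G(x; \boldsymbol{\xi})^{a}\right]^{b} \tag{3}$$
and
$$f(x; a, b, \boldsymbol{\xi}) = a\,b\,g(x; \boldsymbol{\xi})\,G(x; \boldsymbol{\xi})^{a-1}\left[1 - G(x; \boldsymbol{\xi})^{a}\right]^{b-1}, \tag{4}$$
where $g(x; \boldsymbol{\xi}) = dG(x; \boldsymbol{\xi})/dx$ is the baseline pdf, $a > 0$ and $b > 0$ are two additional shape parameters, and $\boldsymbol{\xi}$ is the vector of baseline parameters.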
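As a minimal sketch of how the Kw-G construction acts on a baseline, the following Python code evaluates the Kw-G cdf and pdf for a Weibull baseline. The function names and the parameter values are illustrative assumptions, not part of the original formulation; the Weibull baseline is chosen only because it is the baseline used later in the paper.

```python
import math

def weibull_cdf(x, shape, scale):
    """Baseline cdf G(x) of a Weibull(shape, scale) rv (zero for x <= 0)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def weibull_pdf(x, shape, scale):
    """Baseline pdf g(x) of a Weibull(shape, scale) rv (zero for x <= 0)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))

def kw_g_cdf(x, a, b, G):
    """Kw-G cdf: 1 - [1 - G(x)^a]^b for a baseline cdf G."""
    return 1.0 - (1.0 - G(x) ** a) ** b

def kw_g_pdf(x, a, b, G, g):
    """Kw-G pdf: a b g(x) G(x)^(a-1) [1 - G(x)^a]^(b-1)."""
    Gx = G(x)
    return a * b * g(x) * Gx ** (a - 1.0) * (1.0 - Gx ** a) ** (b - 1.0)

# Hypothetical parameter values, used only for illustration.
G = lambda x: weibull_cdf(x, shape=1.5, scale=2.0)
g = lambda x: weibull_pdf(x, shape=1.5, scale=2.0)

# The pdf should match a central-difference derivative of the cdf.
x0, h = 1.3, 1e-5
numeric_pdf = (kw_g_cdf(x0 + h, 2.0, 3.0, G) - kw_g_cdf(x0 - h, 2.0, 3.0, G)) / (2.0 * h)
print(numeric_pdf, kw_g_pdf(x0, 2.0, 3.0, G, g))
```

Setting $a = b = 1$ returns the baseline itself, which is a quick sanity check on any implementation of the family.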
The Kw-G family has received wide-spread recognition and more than sixty special models have been studied so far, namely: exponential, exponentiated-exponential, Weibull, exponentiated-Weibull, modified Weibull (Lai et al. [
19]), flexible-Weibull (Bebbington et al. [
20]), generalized power Weibull, log-logistic, half-logistic, Lomax, Burr, Kumaraswamy, generalized gamma, exponentiated-gamma, generalized Rayleigh, Pareto, generalized Pareto, Pareto-IV, Gumbel, exponentiated-Gumbel (type-II), Fréchet, Laplace, Gompertz, Gompertz-Makeham, normal, inverse Gaussian, skew-normal, generalized half-normal, Birnbaum-Saunders, skew-
t, Nadarajah-Haghighi, linear failure rate, quadratic hazard rate, Lindley, quasi-Lindley, Lindley-Poisson, Sushila, half-Cauchy, inverse exponential, inverse Rayleigh, inverse Weibull, inverse Weibull-Poisson, inverse flexible-Weibull, modified inverse Weibull (using LA2), Fisher-Snedecor, compound-Rayleigh, exponential-Rayleigh, exponential-Weibull (compounded), exponentiated-Chen, generalized Kappa, generalized extreme-value, Weibull-geometric (WG), complementary WG, Marshall-Olkin exponential, Marshall-Olkin Fréchet (MOFr), Marshall-Olkin Lindley, transmuted Weibull, transmuted Pareto, transmuted modified-Weibull (Sarhan-Zaindin), transmuted exponentiated modified Weibull, transmuted exponentiated additive Weibull and transmuted MOFr.
Some other special models of the Kw-G family have also been reported in the literature, but these suffer from non-identifiability (for example, when two parameters appear only as a product, so that their individual effects cannot be determined). These special models are: power function, Burr III, generalized linear failure rate, exponentiated-Pareto, exponentiated Burr and exponentiated-Lomax.
Note 1. The citations and the references of the authors of special models of the Kw-G family [
8] are omitted from this section and from the reference list to save space.
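Alzaatreh et al. [21] proposed a general method for constructing G-families by using the transformed-transformer (T-X) approach. Let $r(t)$ be the pdf and $R(t)$ be the cdf of a rv $T \in [a, b]$ for $-\infty \le a < b \le \infty$, and let $W[G(x)]$ be a function of the cdf $G(x)$ or survival function (sf) $\bar{G}(x) = 1 - G(x)$ of any baseline rv ($W[\cdot]$ is known as the generator) such that $W[G(x)]$ satisfies three conditions:
- (i)
$W[G(x)] \in [a, b]$,
- (ii)
$W[G(x)]$ is differentiable and monotonically non-decreasing, and
- (iii)
$W[G(x)] \to a$ as $x \to -\infty$ and $W[G(x)] \to b$ as $x \to \infty$.
The cdf of the T–X family is
$$F(x) = \int_{a}^{W[G(x)]} r(t)\,dt = R\{W[G(x)]\}, \tag{5}$$
where $W[G(x)]$ satisfies the conditions (i)–(iii).
The pdf corresponding to Equation (5) is
$$f(x) = \left\{\frac{d}{dx} W[G(x)]\right\} r\{W[G(x)]\}. \tag{6}$$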
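The T–X construction can be sketched in a few lines of Python as the composition of the cdf of $T$ with a generator applied to the baseline cdf. The helper `tx_cdf`, the exponential baseline and the parameter values are assumptions made for illustration. Two standard cases are checked: a Kumaraswamy $T$ with the identity generator gives the Kw-G cdf, and a unit-exponential $T$ with the generator $-\log[1 - G(x)]$ returns the baseline unchanged.

```python
import math

def tx_cdf(R, W, G):
    """Return the T-X cdf x -> R(W(G(x))), where R is the cdf of T,
    W is the generator and G is the baseline cdf."""
    return lambda x: R(W(G(x)))

# Hypothetical baseline: unit exponential on x >= 0.
G = lambda x: 1.0 - math.exp(-x)

# Case 1: T ~ Kumaraswamy(a, b) on (0, 1) with W(u) = u gives the Kw-G cdf.
a, b = 2.0, 3.0
R_kw = lambda t: 1.0 - (1.0 - t ** a) ** b
F_kw = tx_cdf(R_kw, lambda u: u, G)

# Case 2: T ~ Exp(1) on [0, inf) with W(u) = -log(1 - u) recovers G itself.
R_exp = lambda t: 1.0 - math.exp(-t)
F_same = tx_cdf(R_exp, lambda u: -math.log1p(-u), G)

x0 = 0.7
print(F_kw(x0), 1.0 - (1.0 - G(x0) ** a) ** b)
print(F_same(x0), G(x0))
```

Passing the generator as a function makes it easy to swap in a different $W$ on the same support and compare the resulting families.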
The main motivations for proposing a new G-family are:
- (i)
Constructing new and novel G-families as a function of a cdf,
, is a difficult task these days. A few pioneering G-families were developed in the literature considering
viz. exponentiated-G with power parameter
(LA1 and LA2) (Gupta et al. [
3]) [
], beta-G (Eugene et al. [
4]) [
], ZBgamma-G (Zografos and Balakrishnan [
7]) [
], odd log-logistic-G (Gleaton and Lynch [
5]) [
], RBgamma-G (Ristić and Balakrishnan [
10]) [
], log-odd logistic-G (Torabi and Montazeri [
22]) [
], Gumbel-X (Al-Aqtash et al. [
23]) [
], Weibull-X (T-X approach) and Weibull-X (Ahmad et al. [
24]) [
]. Other G-families, whether non-composite (based on a single well-established parent model), composite (a mixture of two G-families) or compounded, are extensions or modifications of the pioneering G-families described above. For example, the generator
, where
was pioneered by (Eugene et al. [
4]) for defining the beta-G family, and later this generator was adopted by (Cordeiro and de-Castro [
8]; Alexander et al. [
9]; Rezaei et al. [
15]) for defining the Kw-G, Mc-G and TL-G families, respectively. Similarly, the odd generator
(where
) was suggested by (Gleaton and Lynch, [
5]) for proposing the odd log-logistic-G family, and it was adopted by (Bourguignon et al. [
12]; Torabi and Montazeri [
25]; Tahir et al. [
13]; Silva et al. [
26]; Cordeiro et al. [
27]; Alizadeh et al. [
28]; Cordeiro et al. [
29]; Hassan et al. [
30]; Hassan and Nassr [
31]; Maiti and Pramanik [
32]; El-Morshedy and Eliwa [
33]; Alizadeh et al. [
34]; El-Morshedy et al. [
35]; Eliwa et al. [
36]) for defining the odd Weibull-G, odd gamma-G, odd generalized-exponential-G, odd Lindley-G, odd Burr-G, odd power-Cauchy-G, odd half-Cauchy, odd additive Weibull-G, odd power-Lindley-G, odd Xgamma-G, odd flexible Weibull-H, odd log-logistic Lindley-G, odd Chen generator and exponentiated odd Chen-G, respectively, among others.
- (ii)
The proposed extension of the Kumaraswamy-G model is based on a new generator for the unit interval $(0,1)$, instead of the only existing generator, the baseline cdf $G(x)$ itself, from which the beta-G, Kw-G, Mc-G and TL-G classes have been developed so far.
- (iii)
The proposed generator
seems a little complicated in comparison with the earlier well-established generator for the unit interval, but it can produce better estimates and goodness-of-fit (GoF) test results, which can make it attractive to applied researchers (as is evident from the results in
Section 5 and
Section 7).
- (iv)
For most families and models, if the cdf is in closed form, then the quantile function (qf) is straightforward to obtain. In some families and models, the qf depends on special functions such as the beta or gamma function, and can only be determined by using power series. In our case, the cdf of the family is in closed form but the qf can be obtained only numerically.
Note 2. A complete and independent investigation of the properties and application of our proposed generator as a new family such as transmuted-G (Tr-G) and exponentiated-generalized-G (EG-G) will appear in another outlet very soon. It is noted here that the two G-families (Tr-G and EG-G) have not been developed from any existing parent model similar to our proposed one.
The paper unfolds as follows. In
Section 2, we define the
new Kumaraswamy generalized (NKw-G) family. In
Section 3, some of its mathematical properties are determined from a useful linear representation of the family density. We investigate the asymptotics and shapes of the density and hazard rate, ordinary and incomplete moments, generating function, mean deviations and estimation of the model parameters. Several properties of a special model viz.
new Kumaraswamy Weibull (NKwW) distribution are discussed in
Section 4. A simulation study is also conducted to assess the performance of maximum likelihood estimators of the newly proposed model in this section. In
Section 5, the usefulness of the new model is illustrated by means of two real-life data sets. In
Section 6, we define the Bivariate New Kumaraswamy G-family of distributions. In
Section 7, the usefulness of the new bivariate model is illustrated by means of a real-life data set. In fact, we prove empirically that our proposed model outperforms some well-known univariate and bivariate distributions. Finally,
Section 8 offers some concluding remarks.
The important feature of our article is that the proposed model from this new generalized family, NKwW, performs better than some well-known (or well-established) generalized Weibull models selected from the statistical literature. It can be noted from
Section 5 that the NKwW model yields the minimum value of the Kolmogorov-Smirnov GoF statistic along with high p-values of this statistic. Furthermore, it can also be observed from
Section 5 that the values of some other well-established statistics, such as the Akaike Information Criterion, Bayesian Information Criterion, Hannan-Quinn Information Criterion, Anderson-Darling and Cramér-von Mises statistics, are smallest for our proposed model among the generalized Weibull models considered. This reveals that the performance and flexibility of our proposed model are better than those of the competing models when applied to these selected real-life data sets. The same is true of our proposed bivariate model (see
Section 7).
3. Properties of the NKw-G Family
In this section, we obtain some mathematical properties of the NKw-G family.
3.1. Quantile Function
The most common and simplest method for generating random variates is based on the inverse cdf. For an arbitrary cdf $F$, the quantile function (qf) is defined as $Q(u) = F^{-1}(u)$ for $u \in (0,1)$. The qf of the NKw-G family can be determined by inverting (7) and then solving the two non-linear equations numerically. We can use the following procedure:
- (i)
Set ;
- (ii)
Find numerically in using any Newton-Raphson algorithm;
- (iii)
Solving numerically for x in yields the qf of X.
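The numerical inversion described above can be sketched as follows. Two things here are assumptions rather than the paper's own method: bracketing with bisection is used in place of Newton-Raphson because it needs no derivative, and the Kw-Weibull cdf of Equation (3) is used as a stand-in closed-form cdf, so Equation (7) would be substituted for `kw_weibull_cdf` in practice. All function names and parameter values are hypothetical.

```python
import math
import random

def kw_weibull_cdf(x, a, b, shape, scale):
    """Stand-in closed-form cdf (Kw-G with a Weibull baseline)."""
    if x <= 0.0:
        return 0.0
    G = 1.0 - math.exp(-((x / scale) ** shape))
    return 1.0 - (1.0 - G ** a) ** b

def quantile(cdf, u, lo=0.0, hi=1.0, tol=1e-10, max_iter=200):
    """Solve cdf(x) = u on [0, inf) for a continuous increasing cdf."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    while cdf(hi) < u:          # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):   # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical parameter values, used only for illustration.
F = lambda x: kw_weibull_cdf(x, a=2.0, b=3.0, shape=1.5, scale=2.0)

median = quantile(F, 0.5)

# Inverse-transform sampling: feed uniform draws through the qf.
rng = random.Random(1)
sample = [quantile(F, rng.random()) for _ in range(5)]
print(median, sample)
```

Bisection converges more slowly than Newton-Raphson but cannot leave the bracket, which is convenient when the cdf is flat in the tails.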
3.2. Asymptotics
The following asymptotics for the density, distribution function and hrf of X hold.
Corollary 1. The asymptotics of Equations (
7)–(
9)
when or () are
Corollary 2. The asymptotics of Equations (
7)–(
9)
when or () are
3.3. Analytic Shapes of the Density and Hazard Rate Function
The shapes of the density and hrf of
X can be described analytically. The critical points of the density of
X are the roots of the equation:
where
.
The critical points of the hrf of
X are obtained from the equation:
3.4. Linear Representation of the NKw-G Density
Here, we derive useful expansions for Equations (
7) and (
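8) based on the concept of exponentiated distributions. For an arbitrary baseline cdf $G(x)$ with pdf $g(x)$, a rv is said to have the exponentiated-G (exp-G) distribution with power parameter $c > 0$ if its cdf and pdf are
$$H_{c}(x) = G(x)^{c} \quad \text{and} \quad h_{c}(x) = c\,g(x)\,G(x)^{c-1},$$
respectively.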
The properties of the exponentiated distributions have been studied by many authors in recent years. We consider the generalized binomial expansion
$$(1 - z)^{b} = \sum_{k=0}^{\infty} (-1)^{k} \binom{b}{k} z^{k}, \tag{11}$$
which holds for any real non-integer $b$ and $|z| < 1$. Using (
11) twice in the following expression
in Equation (
7), where
, we can write
, where
. Then, we can expand Equation (
7) as
Furthermore, using Mathematica, the power series holds
where
,
,
,
, etc.
By inserting Equation (
13) in Equation (
12) and noting that
, we obtain
where
By differentiating
, the NKw-G density has the form
where
is the exp-G density with power parameter (
). Equation (
16) reveals that the NKw-G density function is a linear combination of exp-G densities. Then, some of its mathematical properties can be determined directly from those of the exp-G distribution.
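The linear representation rests on the generalized binomial expansion of $(1 - z)^{b}$ for real non-integer $b$ and $|z| < 1$. A short numerical check of that expansion, truncated at a finite number of terms, is sketched below; the function names, the truncation point and the test values are illustrative assumptions.

```python
def gen_binom(b, k):
    """Generalized binomial coefficient b (b-1) ... (b-k+1) / k! for real b."""
    out = 1.0
    for j in range(k):
        out *= (b - j) / (j + 1)
    return out

def one_minus_z_pow_series(z, b, terms=200):
    """Truncated series sum_k (-1)^k C(b, k) z^k approximating (1 - z)^b."""
    return sum((-1) ** k * gen_binom(b, k) * z ** k for k in range(terms))

z, b = 0.4, 2.5
series_value = one_minus_z_pow_series(z, b)
closed_form = (1.0 - z) ** b
print(series_value, closed_form)
```

The truncation error shrinks geometrically in $|z|$, so a moderate number of terms is usually enough unless $|z|$ is close to one.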
3.5. Mathematical Properties
The formulae derived throughout the paper can be easily handled in most symbolic computation platforms such as
,
and
which have the ability to deal with analytic expressions of formidable size and complexity. Henceforth, let
be a rv with the exp-G distribution with power parameter
. We obtain some mathematical quantities of the NKw-G family from (
16) and those properties of the exp-G distribution. The exp-G properties are known for at least fifty distributions; see those distributions listed in Tahir and Nadarajah [
16].
First, the
nth ordinary moment of
X, say
, can be expressed from (
16) as
where
, and
is the qf of the baseline G. The quantities
are known for many G distributions, as can be seen in the papers cited in Tahir and Nadarajah [16].
Moments are important in any statistical analysis. Some of the most important features of a distribution can be studied through moments. For instance, the first four moments can be used to describe some characteristics of a distribution. Clearly, the central moments and cumulants of
X can be determined from (
17) using well-known relationships.
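Moments of an exp-G rv can be computed from the baseline qf alone. The sketch below assumes the standard quantile-form identity $E(Y^{n}) = c \int_{0}^{1} Q_{G}(u)^{n}\,u^{c-1}\,du$ for an exp-G rv $Y$ with power parameter $c$ (obtained by substituting $u = G(y)$), and evaluates it with a simple midpoint rule. The function name, the unit-exponential baseline and the grid size are illustrative assumptions.

```python
import math

def exp_g_moment(n, c, Q, m=200_000):
    """Midpoint-rule approximation of c * int_0^1 Q(u)^n u^(c-1) du."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        u = (i + 0.5) * h            # midpoint of the i-th subinterval
        total += Q(u) ** n * u ** (c - 1.0)
    return c * h * total

# Hypothetical baseline: unit exponential, whose qf is -log(1 - u).
Q_exp = lambda u: -math.log1p(-u)

# With c = 2 the exp-G rv is the maximum of two independent unit
# exponentials, whose mean is 1 + 1/2.
mean_c2 = exp_g_moment(1, 2.0, Q_exp)
print(mean_c2)
```

A weighted sum of such terms, with the weights of the linear representation, would then give the corresponding moment of the family member.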
Second, the
nth lower incomplete moment of
X, say
, is
The last two integrals can be evaluated numerically for most G distributions.
The first incomplete moment is used to construct the Bonferroni and Lorenz curves (popular measures in economics, reliability, demography, insurance, and medicine) and to determine the totality of deviations from the mean and median of X (important statistics in statistical applications).
Third, for a given probability
, the Bonferroni and Lorenz curves of
X are given by
and
, respectively, where
can be found from the procedure described at the last paragraph of
Section 2.
Fourth, the total deviations from the mean and median are
and
, where
comes from (
7).
Fifth, the moment generating function (mgf)
of
X follows from (
16) as
where
is the mgf of
and
. Hence, we can obtain the mgfs of many special NKw-G distributions directly from exp-G generating function and Equation (
19).
3.6. Estimation
Here, we consider the estimation of the unknown parameters of the NKw-G family by the maximum likelihood method. The MLEs enjoy desirable properties and deliver simple approximations that work well in finite samples when constructing confidence intervals. The normal approximation for the MLEs can be handled either analytically or numerically.
The log-likelihood function
for the vector of parameters
from
n observations
has the form
The MLEs of the parameters can be evaluated by maximizing the log-likelihood. There are several routines for numerical maximization of the log-likelihood, such as the optim function in R, PROC NLMIXED in SAS and the MaxBFGS sub-routine in Ox, among others.
All distributions belonging to the NKw-G family can be fitted to real data using the
AdequacyModel package for the
R statistical computing environment (
https://www.r-project.org/). An important advantage of this package is that it is not necessary to define the log-likelihood function and that it computes the MLEs, their standard errors and some GoF statistics. We only need to provide the pdf and cdf of the distribution to be fitted to a data set.
Alternatively, we can differentiate the log-likelihood and solve the resulting nonlinear likelihood equations. The score components with respect to
a,
b and
are
where
and
are column vectors of the same dimension of
.
Setting the score components to zero and solving them simultaneously yields the MLEs of the model parameters. The resulting equations cannot be solved analytically, but statistical software can be used to solve them numerically through iterative Newton-Raphson type algorithms.
For interval estimation and hypothesis tests on the model parameters, we can obtain the $p \times p$ observed information matrix $J(\Theta)$ numerically (where $p$ is the dimension of the parameter vector $\Theta$), since the expected information matrix is very complicated and requires numerical integration.
Under standard regularity conditions, we have $(\hat{\Theta} - \Theta) \stackrel{a}{\sim} N_{p}(0, K(\Theta)^{-1})$, where $\stackrel{a}{\sim}$ means approximately distributed and $K(\Theta)$ is the expected information matrix. The asymptotic behavior remains valid if $K(\Theta)$ is replaced by the observed information matrix evaluated at $\hat{\Theta}$, i.e., $J(\hat{\Theta})$. The multivariate normal distribution $N_{p}(0, J(\hat{\Theta})^{-1})$ can be used to construct approximate confidence intervals for the model parameters.
5. Empirical Illustrations of NKwW Model
In this section, we compare the NKwW distribution with some well-known extended (or generalized) Weibull distributions. To assess the potential of the new distribution, we use two real data sets representing different hydrological events, namely precipitation and floods. We compare the NKwW model with the Kumaraswamy-Weibull (KwW) (Cordeiro et al. [
37]), beta-Weibull (BW) (Lee et al. [
38]), exponentiated-generalized Weibull (EGW) (Oguntunde et al. [
39]), McDonald-Weibull (McW) (Cordeiro et al. [
40]), gamma-Weibull (GaW) (Cordeiro et al. [
41]), odd log-logistic Weibull (OLLW) (da-Cruz et al. [
42]), Marshall-Olkin Weibull (MOW) (Ghitany et al. [
43]), transmuted-Weibull (TrW) (Khan et al. [
44]) and Weibull (W) models. The two data sets are described below:
Data Set 1. Precipitation data. The data were taken from Katz et al. [
45] which represent the annual maximum precipitation (inches) for one rain gauge in Fort Collins, Colorado from 1900 through 1999. The data are: 239, 232, 434, 85, 302, 174, 170, 121, 193, 168, 148, 116, 132, 132, 144, 183, 223, 96, 298, 97, 116, 146, 84, 230, 138, 170, 117, 115, 132, 125, 156, 124, 189, 193, 71, 176, 105, 93, 354, 60, 151, 160, 219, 142, 117, 87, 223, 215, 108, 354, 213, 306, 169, 184, 71, 98, 96, 218, 176, 121, 161, 321, 102, 269, 98, 271, 95, 212, 151, 136, 240, 162, 71, 110, 285, 215, 103, 443, 185, 199, 115, 134, 297, 187, 203, 146, 94, 129, 162, 112, 348, 95, 249, 103, 181, 152, 135, 463, 183, 241.
Data set 2. Flood data. The data were taken from Asgharzadeh et al. [
46] which represent the maximum annual flood discharges (in units of 1000 cubic feet per second) of the North Saskatchewan River at Edmonton, over a period of 48 years. The data are: 19.885, 20.940, 21.820, 23.700, 24.888, 25.460, 25.760, 26.720, 27.500, 28.100, 28.600, 30.200, 30.380, 31.500, 32.600, 32.680, 34.400, 35.347, 35.700, 38.100, 39.020, 39.200, 40.000, 40.400, 40.400, 42.250, 44.020, 44.730, 44.900, 46.300, 50.330, 51.442, 57.220, 58.700, 58.800, 61.200, 61.740, 65.440, 65.597, 66.000, 74.100, 75.800, 84.100, 106.600, 109.700, 121.970, 121.970, 185.560.
All the calculations in these two applications are performed using the AdequacyModel package in R. The unknown parameters of the models are estimated by the maximum likelihood method. The log-likelihood function is evaluated at the MLEs ($\hat{\ell}$). Well-known GoF statistics (GoFS), namely the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), Anderson-Darling ($A^{*}$), Cramér–von Mises ($W^{*}$) and Kolmogorov-Smirnov (K-S) statistics, are adopted for model comparisons. Lower values of the GoFS and higher p-values of the K-S statistic indicate better fits.
Table 1 and
Table 2 list the MLEs and their standard errors (SEs) for the NKwW distribution and other competitive models (KwW, BW, EGW, McW, GaW, OLLW, MOW, TrW and W) fitted to the two hydrological data sets. The values of the GoFS in
Table 3 and
Table 4 indicate that the NKwW model attains the smallest values of these statistics and hence provides the best fit among the models compared. The plots in
Figure 6 and
Figure 7 also support our claim.
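The fitting workflow can be sketched for the simplest competitor, the two-parameter Weibull (W) model, on the flood data of Data set 2. This is a plain-Python illustration under stated assumptions, not the R code used for the tables: the shape MLE is found by Newton-Raphson on the profile score equation, the scale MLE then has a closed form, and the AIC, BIC, HQIC and K-S statistic are computed from their usual definitions. All function names are hypothetical, and no fitted values are quoted here.

```python
import math

# Flood data (Data set 2), in units of 1000 cubic feet per second.
flood = [19.885, 20.940, 21.820, 23.700, 24.888, 25.460, 25.760, 26.720,
         27.500, 28.100, 28.600, 30.200, 30.380, 31.500, 32.600, 32.680,
         34.400, 35.347, 35.700, 38.100, 39.020, 39.200, 40.000, 40.400,
         40.400, 42.250, 44.020, 44.730, 44.900, 46.300, 50.330, 51.442,
         57.220, 58.700, 58.800, 61.200, 61.740, 65.440, 65.597, 66.000,
         74.100, 75.800, 84.100, 106.600, 109.700, 121.970, 121.970, 185.560]

def profile_score(x, k):
    """Scaled profile score for the Weibull shape k; its root is the MLE."""
    logs = [math.log(v) for v in x]
    s0 = sum(v ** k for v in x)
    s1 = sum(v ** k * l for v, l in zip(x, logs))
    return s1 / s0 - 1.0 / k - sum(logs) / len(x)

def weibull_mle(x, k=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for the shape, then the closed-form scale estimate."""
    logs = [math.log(v) for v in x]
    for _ in range(max_iter):
        xk = [v ** k for v in x]
        s0 = sum(xk)
        s1 = sum(p * l for p, l in zip(xk, logs))
        s2 = sum(p * l * l for p, l in zip(xk, logs))
        score = s1 / s0 - 1.0 / k - sum(logs) / len(x)
        deriv = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k)
        k_new = k - score / deriv
        if k_new <= 0.0:             # safeguard: keep the shape positive
            k_new = k / 2.0
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    scale = (sum(v ** k for v in x) / len(x)) ** (1.0 / k)
    return k, scale

def weibull_loglik(x, k, scale):
    """Weibull log-likelihood at shape k and the given scale."""
    n = len(x)
    return (n * math.log(k) - n * k * math.log(scale)
            + (k - 1.0) * sum(math.log(v) for v in x)
            - sum((v / scale) ** k for v in x))

def ks_statistic(x, cdf):
    """Kolmogorov-Smirnov distance between the empirical and fitted cdfs."""
    xs, n = sorted(x), len(x)
    return max(max((i + 1) / n - cdf(v), cdf(v) - i / n)
               for i, v in enumerate(xs))

k_hat, scale_hat = weibull_mle(flood)
loglik = weibull_loglik(flood, k_hat, scale_hat)
n, p = len(flood), 2                 # sample size and number of parameters
aic = 2 * p - 2 * loglik
bic = p * math.log(n) - 2 * loglik
hqic = 2 * p * math.log(math.log(n)) - 2 * loglik
ks = ks_statistic(flood, lambda v: 1.0 - math.exp(-((v / scale_hat) ** k_hat)))
print(k_hat, scale_hat, loglik, aic, bic, hqic, ks)
```

For models with more parameters, such as those compared in the tables, the same criteria apply with `p` set to the number of fitted parameters and a general-purpose optimizer in place of the one-dimensional Newton step.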
6. Bivariate New Kumaraswamy G-Family
In this section, we introduce a bivariate extension of the NKw-G family according to the Marshall and Olkin shock model (see Marshall and Olkin [
47]). Several authors used the Marshall and Olkin approach as a method to generate bivariate distributions, see for example Sarhan and Balakrishnan, [
48], Kundu and Dey [
49], El-Gohary et al. [
50], Muhammed [
51], El-Bassiouny et al. [
52], Ghosh and Hamedani [
53], El-Morshedy et al. [
54,
55], Eliwa et al. [
56], among others. The bivariate new Kumaraswamy (BvNKw) G-family is constructed from three independent NKw-G families by using a minimization process. Assume three independent rvs
NKw-G
and defining
, the bivariate random vector
is said to have the BvNKw-G family with parameters vector
(
,
) if its joint reliability function (jrf) is given by
The marginal reliability functions (rfs) corresponding to (
28) can be written as
The corresponding joint pdf (jpdf) to (
28) can be formulated as
where the jpdf in Equation (
30) can be derived from a well-known formula (see Eliwa and El-Morshedy, [
57]). The marginal pdfs corresponding to Equation (
29) can be proposed as
If
have the BvNKw-G family, then the distributions of
and
are
respectively. If
NKw-G
, then the coefficient of correlation between
and
is
The BvNKw-G family has a singular part along the line
with weight
, whereas on
with weight
, the BvNKw-G family has an absolutely continuous part. Assume
where
NKw-G
, the jrf of the proposed family can be derived by using copula of the Marshall-Olkin model as
where
. For more details on copula property, see Gijbels et al. [
58] and Husková and Maciak [
59].
Using Equations (
28) and (
30), the joint hrf (jhrf) can easily be obtained by using
.
Figure 8,
Figure 9 and
Figure 10 show the jpdf, jhrf, and jrf for different values of the BvNKw-Weibull (BvNKwW) parameters.
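The Marshall and Olkin shock construction, and the singular part it produces on the diagonal, can be illustrated by simulation. The sketch below assumes the usual minimization form $X_{1} = \min(U_{1}, U_{3})$, $X_{2} = \min(U_{2}, U_{3})$ with three independent components, and uses exponential components as a stand-in for NKw-G components so that the diagonal weight has the known value $\lambda_{3} / (\lambda_{1} + \lambda_{2} + \lambda_{3})$. The function name, rates, sample size and seed are illustrative assumptions.

```python
import random

def mo_shock_sample(n, rates, seed=0):
    """Draw n pairs (X1, X2) with X1 = min(U1, U3) and X2 = min(U2, U3),
    where U1, U2, U3 are independent exponentials with the given rates."""
    rng = random.Random(seed)
    l1, l2, l3 = rates
    pairs = []
    for _ in range(n):
        u1 = rng.expovariate(l1)
        u2 = rng.expovariate(l2)
        u3 = rng.expovariate(l3)     # common shock shared by both margins
        pairs.append((min(u1, u3), min(u2, u3)))
    return pairs

rates = (1.0, 2.0, 1.5)
pairs = mo_shock_sample(100_000, rates)

# Share of simulated pairs lying exactly on the line x1 = x2.
tie_share = sum(x1 == x2 for x1, x2 in pairs) / len(pairs)
theory = rates[2] / sum(rates)
print(tie_share, theory)
```

Ties occur exactly when the common shock arrives first, which is why the joint distribution has positive mass on the diagonal and is not absolutely continuous there.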
The MLE for the BvNKw-G Family
In this section, the unknown parameters of the BvNKw-G family are estimated by using the maximum likelihood approach. Assuming that
,
is a sample of size
p from the BvNKw-G family where
,
,
,
and
. Using Equation (
30), the likelihood function
can be expressed as
Differentiating the log-likelihood with respect to each of the unknown parameters and equating the resulting expressions to zero yields the non-linear likelihood equations. An iterative procedure such as the Newton–Raphson technique is required to solve them numerically.
8. Concluding Remarks
Proposing new and flexible models through G-classes is an active research area in distribution theory. Recent work has shown that flexible models can be very helpful to researchers and practitioners in investigating data generated from different phenomena. G-classes are one of the basic sources that provide a paradigm for data-related science and its investigation.
The purpose of our article is to contribute a new G-family. A new Kumaraswamy-G family of distributions is introduced from a new generator for the support (0,1), which can serve as an alternative to the well-known Kumaraswamy-G family (pioneered in 2011) and other classes of distributions on the same support. The proposed generator involves a different function of the baseline cdf, whereas the existing generator is based only on $G(x)$. In the literature, the beta-G, Kw-G, Mc-G and TL-G families were introduced from the existing generator for the bounded unit interval. Therefore, similar G-families can be developed from our proposed generator. We obtain some structural properties of this new Kumaraswamy-G family, and also study some properties of the special model called the new Kumaraswamy-Weibull (NKwW) distribution. We compare this distribution with the well-known generalized Weibull models (Kumaraswamy-Weibull, McDonald-Weibull, beta-Weibull, exponentiated-generalized Weibull, gamma-Weibull, odd log-logistic-Weibull, Marshall-Olkin-Weibull, transmuted-Weibull and Weibull) using six popular GoF statistics. We found that the new distribution provides better estimates and the minimum values of the GoF statistics. Thus, the NKwW distribution outperforms the well-established competitive models on the basis of numerical and graphical analysis. Similarly, the BvNKwW distribution is introduced and compared with other well-known bivariate models such as the bivariate generalized power Weibull, bivariate exponentiated Weibull, bivariate Weibull, bivariate generalized exponential, bivariate exponential and bivariate generalized linear failure rate distributions. The results of popular goodness-of-fit statistics showed that our proposed bivariate model fits better than these other well-known bivariate models. We expect that this new family will attract readers and applied statisticians.