Normal-G Class of Probability Distributions: Properties and Applications

In this paper, we propose a novel class of probability distributions called Normal-G. It has the advantage of demanding no additional parameters besides those of the parent distribution, thereby providing parsimonious models. Furthermore, the class is identifiable whenever the baseline is identifiable. We present special Normal-G sub-models, which can fit asymmetrical data with either positive or negative skew. Other important mathematical properties are described, such as the series expansion of the probability density function (pdf), which is used to derive expressions for the moments and the moment generating function (mgf). We conduct Monte Carlo simulation studies to investigate the behavior of the maximum likelihood estimates (MLEs) of two distributions generated by the class, and we also present applications to real datasets to illustrate its usefulness.


Introduction
Statistical distributions are used in a wide range of scientific fields. They are useful tools for describing natural and social phenomena, providing suitable models that help to deal with real problems, such as predicting an event of interest. Recent works have focused on formulating and describing new classes of probability distributions, which are generally defined as extensions of widely known models obtained by adding one or more parameters to the cumulative distribution function (cdf). The aim is that the new models provide more flexibility and a better fit to real data. Some examples are [1,2], where a shape parameter is added to the model by exponentiating the cdf. A general method of introducing a parameter to expand a family of distributions was presented by [3]; they applied the method to create a new two-parameter extension of the exponential distribution and a new three-parameter Weibull distribution.
A natural generalization of the Normal pdf was proposed by [4], and it is perhaps the most widely known generalized Normal distribution. The power 2 appearing in the original pdf was replaced by a shape parameter $s > 0$. The new pdf thus becomes:
$$f(x) = K \exp\left\{-\left|\frac{x - \mu}{\sigma}\right|^{s}\right\},$$
where $K$ is a normalizing constant, which depends on $\sigma$ and $s$. One can see that the Laplace distribution is a particular case of the generalized Normal of Nadarajah [4] when $s = 1$.
Azzalini [5] defined a mathematically tractable class that strictly (not just asymptotically) includes the Normal distribution. The general pdf of the class is $2G(\lambda y) f(y)$ for $-\infty < y < \infty$, where $\lambda \in \mathbb{R}$, $G$ is an absolutely continuous cdf, and $\frac{d}{dy}G$ and $f$ are pdfs symmetric about 0. Taking $G = \Phi$ and $f = \phi$, namely the standard normal cdf and pdf respectively, one obtains the well-known skew-normal distribution, whose pdf is $\phi(y; \lambda) = 2\phi(y)\Phi(\lambda y)$. It is easy to see that $\phi(y; 0) = \phi(y)$, but when $\lambda \neq 0$ the distribution is asymmetric and its coefficient of skewness has the same sign as $\lambda$.
A generalization called the compressed normal distribution was introduced by [6], whose objective was to deal with negatively skewed data (specifically human longevity data). They induced skewness by adding $kx$ to the denominator of the location-scale transformation, that is, $t(x) = \frac{x - \mu}{\sigma + kx}$; when $k < 0$, the curve presents a negative skew, and for $k > 0$, a positive skew occurs.
Classes with one or more additional parameters usually include existing classes as particular cases. The McDonald-Weibull distribution [7] is an important sub-model of the McDonald class; it has three extra parameters and includes the Beta-Weibull [8] and the Kumaraswamy-Weibull [9] as special cases.
A technique to derive families of continuous distributions using a pdf as a generator was introduced by [10], and the models that emerge from this method are called members of the T-X family. In other words, if $r(t)$ is the pdf of a random variable $T$ whose support has lower limit $a$, and $W[G(x)]$ is a suitable function of a cdf $G(x)$, then $F(x) = \int_{a}^{W[G(x)]} r(t)\,dt$ is the cdf of a new family of distributions. An example of a T-X family member is the Gompertz-G class [11]; to define its cdf, the chosen functions were $W[G(x)] = -\log[1 - G(x)]$ and $r(t) = \theta e^{\gamma t} e^{-\frac{\theta}{\gamma}(e^{\gamma t} - 1)}$ for $t > 0$, given that $\theta > 0$, $\gamma > 0$. Varying $G(x)$, one can get different sub-models of the class.
The procedure to define a T-X family member is indeed capable of generalizing a large number of distributions. Even so, it can be regarded as a particular case of the method of generating classes of probability distributions presented in the recent work of [12]. This new method has a high power of generalization. It consists of creating distribution classes by integrating a cdf, such that the limits of integration are special functions that satisfy some conditions. Thus, the cdf of the general class is given by:
$$F(x) = \zeta(x) \sum_{j=1}^{n} \int_{L_j(x)}^{U_j(x)} dH(t) - \nu(x) \sum_{j=1}^{n} \int_{M_j(x)}^{V_j(x)} dH(t), \qquad (1)$$
where $H$ is a cdf, $n \in \mathbb{N}$, $\zeta, \nu : \mathbb{R} \to \mathbb{R}$ and $L_j, U_j, M_j, V_j : \mathbb{R} \to \mathbb{R} \cup \{\pm\infty\}$ are the aforementioned special functions that will be discussed in the next section.
Based on this innovative method, we introduce the Normal-G class of distributions. We expect this extension to yield useful sub-models. This paper aims to investigate some of them and to compare them with other competitive extended probability distributions.

The Normal-G Class and Some Mathematical Properties
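The method established by [12] states that if $H, \zeta, \nu : \mathbb{R} \to \mathbb{R}$ and $L_j, U_j, M_j, V_j : \mathbb{R} \to \mathbb{R} \cup \{\pm\infty\}$, for $j = 1, 2, 3, \ldots, n$, are monotonic and right-continuous functions satisfying conditions (c1) to (c10), among which (c1) $H$ is a cdf and $\zeta$ and $\nu$ are non-negative, and (c2) $\zeta(x)$, $U_j(x)$ and $M_j(x)$ are non-decreasing and $\nu(x)$, $V_j(x)$, $L_j(x)$ are non-increasing $\forall j = 1, 2, 3, \ldots, n$, then the function in Equation (1) is a cdf; the remaining conditions are listed in [12].
Setting $n = 1$, $\zeta(x) = 1$, $\nu(x) = 0$, $L_1(x) = -\infty$, $U_1(x) = \frac{2G(x) - 1}{G(x)[1 - G(x)]}$ and $H(t) = \Phi(t)$, the function in Equation (1) turns into:
$$F_G(x) = \int_{-\infty}^{\frac{2G(x) - 1}{G(x)[1 - G(x)]}} d\Phi(t), \qquad (2)$$
where $G(x)$ is a cdf. Since $\nu(x) = 0$, there is no need to specify $M_1(x)$ and $V_1(x)$. The conditions (c1), (c7), (c8) and (c10) are straightforward; clearly (c4), (c5) and (c9) do not need to be verified in this case. Given that $G(x)$ is non-decreasing, $U_1(x)$ is also non-decreasing, because $\frac{d}{dG}\left\{\frac{2G - 1}{G(1 - G)}\right\} = \frac{1 - 2G + 2G^2}{G^2(1 - G)^2} > 0$ for $0 < G < 1$.
Therefore, according to the method described above, Equation (2) is a cdf and, from now on, we will refer to it as the Normal-G class of probability distributions. The new cdf can be viewed as a composite function of $G(x)$, which will be referred to as the parent distribution or baseline; in agreement with [12], if the baseline is continuous (discrete), then the Normal-G will generate a continuous (discrete) distribution, whose support will be the same as that of $G(x)$. It is worth remarking that the proposed class demands no additional parameters other than those of the parent distribution.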
Although the Normal-G class has been defined as a composite function of a single $G(x)$, it is possible to formulate classes that depend on more than one baseline; see [12] for further details.
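We can rewrite Equation (2) as:
$$F_G(x) = \Phi\left(\frac{2G(x) - 1}{G(x)[1 - G(x)]}\right), \qquad (3)$$
and since $\phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$ and $\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt$, we get:
$$F_G(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{2G(x) - 1}{G(x)[1 - G(x)]}} e^{-t^2/2}\,dt. \qquad (4)$$
If $G(x)$ is continuous, we can take the derivative of Equation (4) with respect to $x$:
$$f_G(x) = \frac{g(x)\,\{1 - 2G(x) + 2G^2(x)\}}{\sqrt{2\pi}\,G^2(x)[1 - G(x)]^2} \exp\left\{-\frac{1}{2}\left(\frac{2G(x) - 1}{G(x)[1 - G(x)]}\right)^2\right\}, \qquad (5)$$
where $g(x) = \frac{d}{dx}G(x)$ is the baseline pdf. The expression in Equation (5) is the pdf of the Normal-G class, whose hazard rate function (hrf) is given by:
$$h_G(x) = \frac{f_G(x)}{1 - F_G(x)}.$$
Many distributions presented in the statistical literature suffer from the problem of non-identifiability. One cannot assume that the parameters of a non-identifiable model will be uniquely determined from a set of observed random variables; in other words, inferences on the parameters may not be reliable. As Theorem 1 states, the Normal-G class is exempt from this problem whenever the parent distribution $G$ satisfies the property of identifiability.
Theorem 1. If the cdf $F_G$ belongs to the Normal-G class and the cdf $G$ is identifiable, then $F_G$ is identifiable.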
Proof of Theorem 1. Given that $0 < G(x|\xi_j) < 1$ for $j = 1, 2$, where $\xi_j$ is a parametric vector, and assuming that $F_G(x|\xi_1) = F_G(x|\xi_2)$, we have: Since the function $\Phi$ is injective, we can write: The left-hand side of Equation (6) is necessarily positive for almost all $x \in \mathbb{R}$, whereas the right-hand side is negative, a contradiction. Thereby,
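The cdf and pdf of the class can be sketched numerically for an arbitrary baseline. The following is a minimal illustration, assuming that the Normal-G cdf takes the form $\Phi\big((2G - 1)/(G[1 - G])\big)$; the function names and the unit exponential baseline are hypothetical choices made only for this example. It checks that a finite-difference derivative of the cdf agrees with the closed-form pdf.

```python
import math


def std_normal_pdf(t):
    """Standard normal pdf, phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(t):
    """Standard normal cdf, Phi(t), written with erfc for lower-tail accuracy."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def upper_limit(G):
    """Assumed integration limit (2G - 1) / (G (1 - G)), valid for 0 < G < 1."""
    return (2.0 * G - 1.0) / (G * (1.0 - G))


def normal_g_cdf(x, base_cdf):
    """Normal-G cdf at x for a baseline cdf (assumed form of the class)."""
    G = base_cdf(x)
    if G <= 0.0:
        return 0.0
    if G >= 1.0:
        return 1.0
    return std_normal_cdf(upper_limit(G))


def normal_g_pdf(x, base_cdf, base_pdf):
    """Normal-G pdf at x: baseline pdf times dU/dG times phi(U)."""
    G = base_cdf(x)
    if G <= 0.0 or G >= 1.0:
        return 0.0
    dens = std_normal_pdf(upper_limit(G))
    if dens == 0.0:  # far tail: skip the division by a tiny G (1 - G)
        return 0.0
    jacobian = (1.0 - 2.0 * G + 2.0 * G * G) / (G * (1.0 - G)) ** 2
    return base_pdf(x) * jacobian * dens


# Illustrative baseline (an assumption of this example): unit exponential.
def exp_cdf(x):
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0


def exp_pdf(x):
    return math.exp(-x) if x > 0.0 else 0.0


x0, h = 0.7, 1e-5
numeric = (normal_g_cdf(x0 + h, exp_cdf) - normal_g_cdf(x0 - h, exp_cdf)) / (2.0 * h)
analytic = normal_g_pdf(x0, exp_cdf, exp_pdf)
print(numeric, analytic)
```

Passing the baseline as a pair of callables mirrors the fact that the class adds no parameters of its own: everything that varies lives in the baseline.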

Special Normal-G Sub-Models
Here we present two distributions from the Normal-G class.

The Normal-Weibull Distribution
The Weibull distribution is one of the most widely used models to describe natural phenomena and the failure of several kinds of components. It is extensively used in survival analysis and reliability. Recently, many authors have focused on new extensions of it, such as [13,14]. The two-parameter Weibull cdf is given by $G_W(x|k, \lambda) = 1 - e^{-(x/\lambda)^k}$ for $x \geq 0$, where $k, \lambda > 0$. Replacing the baseline $G$ in Equation (4) by $G_W$, we get the Normal-Weibull cdf, namely:
$$F_{NW}(x|k, \lambda) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u_W(x)} e^{-t^2/2}\,dt, \qquad u_W(x) = \frac{1 - 2e^{-(x/\lambda)^k}}{e^{-(x/\lambda)^k}\left[1 - e^{-(x/\lambda)^k}\right]}, \qquad (7)$$
for $x \geq 0$. Using Equation (5) to write the corresponding pdf, we have:
$$f_{NW}(x|k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}\, \frac{1 - 2G_W(x) + 2G_W^2(x)}{\sqrt{2\pi}\,G_W^2(x)[1 - G_W(x)]^2} \exp\left\{-\frac{u_W^2(x)}{2}\right\}. \qquad (8)$$
Plots of the pdf and hrf of the Normal-Weibull distribution for different values of the parameters are shown in Figure 1. The different shapes of the hrf curve demonstrate the flexibility of the model. In particular, for $k = 1$, the Weibull distribution is equivalent to an Exponential distribution, so its hrf is constant; in contrast, the Normal-Exponential model has an increasing hrf in some left-bounded interval. In Figure 2, the vertical axis shows the range of values of Pearson's moment coefficient of skewness, which depends on the parameters $k$ and $\lambda$. We can see in the graph that the Normal-Weibull distribution is also able to fit data with either positive or negative skew.
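As a rough numerical illustration of the Normal-Weibull model, the sketch below evaluates its pdf and hrf, again assuming the class cdf $\Phi\big((2G - 1)/(G[1 - G])\big)$. The function names, the integration grid and the parameter values ($k = 1$, $\lambda = 1$, i.e., the Normal-Exponential case) are illustrative assumptions. It checks that the pdf integrates to approximately one and evaluates the hrf at three points.

```python
import math


def weibull_parts(x, k, lam):
    """Weibull cdf G, survival S = 1 - G and pdf g at x > 0."""
    t = (x / lam) ** k
    S = math.exp(-t)
    G = -math.expm1(-t)  # 1 - exp(-t), accurate for small t
    g = (k / lam) * (x / lam) ** (k - 1.0) * S
    return G, S, g


def nw_pdf(x, k, lam):
    """Normal-Weibull pdf under the assumed Normal-G form."""
    if x <= 0.0:
        return 0.0
    G, S, g = weibull_parts(x, k, lam)
    if G <= 0.0 or S <= 0.0:
        return 0.0
    u = (2.0 * G - 1.0) / (G * S)
    dens = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    if dens == 0.0:
        return 0.0
    # 1 - 2G + 2G^2 equals 1 - 2 G S
    return g * (1.0 - 2.0 * G * S) / (G * S) ** 2 * dens


def nw_survival(x, k, lam):
    """1 - F_NW(x), computed as Phi(-u) through erfc."""
    if x <= 0.0:
        return 1.0
    G, S, _ = weibull_parts(x, k, lam)
    if G <= 0.0:
        return 1.0
    if S <= 0.0:
        return 0.0
    u = (2.0 * G - 1.0) / (G * S)
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def nw_hrf(x, k, lam):
    """Hazard rate: pdf divided by survival (inf once survival underflows)."""
    surv = nw_survival(x, k, lam)
    return nw_pdf(x, k, lam) / surv if surv > 0.0 else math.inf


# Midpoint-rule check that the pdf integrates to about 1 on (0, 8).
k, lam = 1.0, 1.0
n_grid, upper = 20000, 8.0
h = upper / n_grid
total = h * sum(nw_pdf((i + 0.5) * h, k, lam) for i in range(n_grid))
print(total)

# Hazard rate of the Normal-Exponential case at a few points.
hazards = [nw_hrf(x, k, lam) for x in (0.5, 1.0, 2.0)]
print(hazards)
```

For these parameter values the three hazard values increase with $x$, in line with the remark that the Normal-Exponential model does not inherit the constant hrf of its baseline.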

The Normal-Log-Logistic Distribution
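The Log-logistic distribution is commonly applied in reliability and it often works well as a lifetime model. Its cdf is given by $G_{LL}(x|\alpha, \beta) = 1 - \left[1 + (x/\alpha)^{\beta}\right]^{-1}$ for $x \geq 0$, where $\alpha, \beta > 0$. The Normal-log-logistic cdf is easily obtained by replacing the parent distribution $G$ in Equation (4) by $G_{LL}$. Thus:
$$F_{NLL}(x|\alpha, \beta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{2G_{LL}(x) - 1}{G_{LL}(x)[1 - G_{LL}(x)]}} e^{-t^2/2}\,dt, \qquad (9)$$
for $x \geq 0$. Taking the derivative of Equation (9) with respect to $x$, we get the pdf:
$$f_{NLL}(x|\alpha, \beta) = \frac{g_{LL}(x)\,\{1 - 2G_{LL}(x) + 2G_{LL}^2(x)\}}{\sqrt{2\pi}\,G_{LL}^2(x)[1 - G_{LL}(x)]^2} \exp\left\{-\frac{1}{2}\left(\frac{2G_{LL}(x) - 1}{G_{LL}(x)[1 - G_{LL}(x)]}\right)^2\right\}, \qquad (10)$$
where $g_{LL}(x) = \frac{d}{dx}G_{LL}(x)$. Figure 3 shows plots of the pdf and hrf for different values of $\alpha$ and $\beta$. It is worth noting that the Normal-log-logistic distribution may have a decreasing hrf, as in early-failure settings. It is also possible for the hrf to be increasing or unimodal. Pearson's moment coefficient of skewness for the Normal-log-logistic distribution is depicted in Figure 4.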

Series Representation
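The normal cdf is related to the error function erf as follows:
$$\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right], \qquad (11)$$
where $\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$. Provided that $\operatorname{erf}(z/\sqrt{2})$ can be linearly represented by:
$$\operatorname{erf}\left(\frac{z}{\sqrt{2}}\right) = \sqrt{\frac{2}{\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{2^n\, n!\,(2n+1)}, \qquad (12)$$
replacing Equation (12) in Equation (11), we obtain:
$$\Phi(z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{2^n\, n!\,(2n+1)}. \qquad (13)$$
Now, considering $|G(x)| < 1$, we can write:
$$\frac{2G(x) - 1}{G(x)[1 - G(x)]} = \frac{2G(x) - 1}{G(x)} \sum_{k=0}^{\infty} G^k(x), \qquad (14)$$
and replacing $z$ in the right member of Equation (13) by the expression in Equation (14), we have:
$$F_G(x) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n\, n!\,(2n+1)}\, G^{-(2n+1)}(x)\, \underbrace{[2G(x) - 1]^{2n+1}}_{A1}\, \underbrace{\left[\sum_{k=0}^{\infty} G^k(x)\right]^{2n+1}}_{A2}. \qquad (15)$$
The right member of Equation (15) has two factors, namely A1 and A2, that can be rewritten as power series. Concerning A1, the binomial theorem allows us to write:
$$A1 = \sum_{j=0}^{2n+1} \binom{2n+1}{j} (-1)^j\, 2^{2n+1-j}\, G^{2n+1-j}(x). \qquad (16)$$
It is a known result related to power series raised to powers that:
$$\left(\sum_{k=0}^{\infty} a_k x^k\right)^{N} = \sum_{k=0}^{\infty} c_k x^k, \qquad (17)$$
where $c_0 = a_0^N$ and $c_m = \frac{1}{m a_0} \sum_{i=1}^{m} (iN - m + i)\, a_i\, c_{m-i}$ for $m \geq 1$. Setting $N = 2n + 1$ and $a_k = 1$ for all $k \geq 0$, we get the expression A2 in Equation (15) and we can use the result in Equation (17) to write:
$$A2 = \sum_{k=0}^{\infty} c_k G^k(x), \qquad c_0 = 1, \quad c_m = \frac{1}{m} \sum_{i=1}^{m} [\,i(2n + 2) - m\,]\, c_{m-i}. \qquad (18)$$
Now replacing A1 and A2 in Equation (15) by the right members of Equations (16) and (18) respectively, we obtain the result below:
$$F_G(x) = \frac{1}{2} + \sum_{n=0}^{\infty} \sum_{j=0}^{2n+1} \sum_{k=0}^{\infty} w_{n,j,k}\, G^{k-j}(x), \qquad w_{n,j,k} = \frac{(-1)^{n+j}\, 2^{n+1-j}}{\sqrt{2\pi}\; n!\,(2n+1)} \binom{2n+1}{j} c_k. \qquad (19)$$
Fubini's theorem on differentiation allows us to write the derivative of Equation (19) with respect to $x$ as follows:
$$f_G(x) = \sum_{n=0}^{\infty} \sum_{j=0}^{2n+1} \sum_{k=0}^{\infty} w_{n,j,k}\, g_{k-j}(x), \qquad g_{k-j}(x) = (k - j)\, g(x)\, G^{k-j-1}(x). \qquad (20)$$
Since $g_{k-j}(x)$ is the pdf of a random variable of the exponentiated family, as described in [15,16], one can say that (20) is the Normal-G pdf (5) expressed as a linear combination of pdfs of exponentiated distributions. Such a useful property is typically found and detailed in works on new classes of distributions; see for instance [17][18][19][20].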
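The series for the normal cdf that underlies this representation can be checked numerically. The sketch below compares the truncated expansion $\Phi(z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{n} \frac{(-1)^n z^{2n+1}}{2^n n! (2n+1)}$ with the value obtained from the error function; the function names and the default truncation point of 30 terms are illustrative assumptions.

```python
import math


def std_normal_cdf_series(z, n_terms=30):
    """Truncated Maclaurin series of the standard normal cdf."""
    total = 0.0
    for n in range(n_terms):
        total += (-1) ** n * z ** (2 * n + 1) / (2 ** n * math.factorial(n) * (2 * n + 1))
    return 0.5 + total / math.sqrt(2.0 * math.pi)


def std_normal_cdf_erf(z):
    """Reference value through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (-3.0, -1.0, 0.0, 0.5, 2.0, 3.0):
    print(z, std_normal_cdf_series(z), std_normal_cdf_erf(z))
```

The series converges for every real $z$, but the number of terms needed grows with $|z|$, so the truncation point should be chosen with the range of the argument in mind.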

Quantile Function
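By inverting Equation (4), the quantile function associated with the Normal-G class is obtained. For simplicity, let us write $v = F_G(x)$. From Equation (4) we have:
$$\Phi^{-1}(v) = \frac{2G(x) - 1}{G(x)[1 - G(x)]},$$
that is, a quadratic equation for $G(x)$,
$$\Phi^{-1}(v)\, G^2(x) + [2 - \Phi^{-1}(v)]\, G(x) - 1 = 0,$$
that admits the following two solutions:
$$G(x) = \frac{\Phi^{-1}(v) - 2 - \sqrt{[\Phi^{-1}(v)]^2 + 4}}{2\Phi^{-1}(v)} \quad \text{and} \quad G(x) = \frac{\Phi^{-1}(v) - 2 + \sqrt{[\Phi^{-1}(v)]^2 + 4}}{2\Phi^{-1}(v)}.$$
If the first solution above is picked, then $G(x)$ might assume values less than 0 (see $v = 0.95$ for example). On the other hand, the second one allows us to verify that $0 < G(x) < 1$ is valid for all $x \in \mathbb{R}$ (for $v = 1/2$, where $\Phi^{-1}(v) = 0$, the equation reduces to $G(x) = 1/2$). Finally, we can write the quantile function of Equation (4) as follows:
$$Q_F(v) = Q_G\left(\frac{\Phi^{-1}(v) - 2 + \sqrt{[\Phi^{-1}(v)]^2 + 4}}{2\Phi^{-1}(v)}\right), \qquad (21)$$
such that $Q_G(\cdot)$ is the quantile function of the baseline $G$. A uniform random number generator and (21) make the simulation of random variables following (3) quite simple. Namely, if $Z \sim U(0, 1)$, then $Q_F(Z)$ follows a Normal-G distribution.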
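Inverse-transform sampling with the quantile function can be sketched as follows, assuming the class cdf $\Phi\big((2G - 1)/(G[1 - G])\big)$ and a Weibull baseline. The admissible root of the quadratic is evaluated in the algebraically equivalent form $2 / \big(2 - q + \sqrt{q^2 + 4}\big)$, with $q = \Phi^{-1}(v)$, which avoids the division by $q$ at $v = 1/2$. Function names, the seed and the parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def baseline_prob(v):
    """Root in (0, 1) of the quadratic for G, given v = F_G(x) in (0, 1)."""
    q = _STD_NORMAL.inv_cdf(v)
    # Same value as (q - 2 + sqrt(q^2 + 4)) / (2 q) for q != 0; equals 1/2 at q = 0.
    return 2.0 / (2.0 - q + math.sqrt(q * q + 4.0))


def weibull_quantile(p, k, lam):
    """Quantile function of the two-parameter Weibull baseline."""
    return lam * (-math.log1p(-p)) ** (1.0 / k)


def nw_quantile(v, k, lam):
    """Normal-Weibull quantile: baseline quantile of the admissible root."""
    return weibull_quantile(baseline_prob(v), k, lam)


def nw_cdf(x, k, lam):
    """Normal-Weibull cdf under the assumed Normal-G form (x > 0)."""
    t = (x / lam) ** k
    G, S = -math.expm1(-t), math.exp(-t)
    u = (2.0 * G - 1.0) / (G * S)
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def nw_sample(n, k, lam, seed=0):
    """Draw n values by applying the quantile function to uniform variates."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        v = rng.random()
        if v > 0.0:  # inv_cdf requires 0 < v < 1
            draws.append(nw_quantile(v, k, lam))
    return draws


k, lam = 2.0, 1.5
for v in (0.05, 0.5, 0.95):
    x = nw_quantile(v, k, lam)
    print(v, x, nw_cdf(x, k, lam))  # the last column should reproduce v

draws = nw_sample(2000, k, lam, seed=1)
```

At $v = 1/2$ the root equals $1/2$, so the median of a Normal-G distribution coincides with the median of its baseline under the assumed form.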

Raw Moments, Incomplete Moments and Moment Generating Function
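Provided that $X$ follows a Normal-G distribution, the $r$th raw moment of $X$ is
$$E(X^r) = \int_{-\infty}^{\infty} x^r f_G(x)\,dx, \qquad (22)$$
where $f_G(x)$ is given by Equation (20) and $r \in \mathbb{Z}^*_+$. Using Fubini's theorem to change the order of integration and summation, we have:
$$E(X^r) = \sum_{n=0}^{\infty} \sum_{j=0}^{2n+1} \sum_{k=0}^{\infty} w_{n,j,k}\, E\left(Y_{k-j}^r\right), \qquad (23)$$
where $w_{n,j,k}$ are the coefficients of the linear combination in Equation (20) and $Y_{k-j}$ follows the exponentiated distribution whose pdf is $g_{k-j}(x)$, shown in Equation (20). Despite the infinite upper limits in the sums, expressions like Equation (23) are not intractable. According to [21], one can get fairly accurate results by truncating each infinite sum at 20; they used numerical routines to compute accurately similar expressions for the moments of some Kumaraswamy generalized distributions.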
The $r$th moment can also be represented in terms of the quantile function of the baseline. Defining $u = G^{k-j}(x)$ and replacing $x$ in Equation (22) by $Q_G\left(u^{1/(k-j)}\right)$, we have: The $r$th incomplete moment of $X$ is given by the following expression: where $T^*_r(z)$ is the $r$th incomplete moment of $Y_{k-j}$. One can also write Equation (24) in terms of the quantile function of $G$:
The mgf is a function associated with a random variable, from which the moments can be straightforwardly derived. It is also useful to check whether two functions of random variables have the same distribution, since there is a bijection between pdfs and mgfs (when they exist). The mgf $M_X(t)$ of $X$ is the expected value of $e^{tX}$, where $t \in (-\iota, \iota)$, $\iota > 0$. Given that $M_{Y_{k-j}}(t)$ is the mgf of $Y_{k-j}$, along the lines of Equation (23), we can write:
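As a numerical cross-check of the moments (not the series expression itself), the raw moments can be approximated by integrating powers of the quantile function of the class over $(0, 1)$, since $E(X^r) = \int_0^1 [Q_F(v)]^r\,dv$. The sketch below does this for a Weibull baseline, assuming the class cdf $\Phi\big((2G - 1)/(G[1 - G])\big)$; the function names, the midpoint grid and the parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def nw_quantile(v, k, lam):
    """Normal-Weibull quantile under the assumed Normal-G form."""
    q = _STD_NORMAL.inv_cdf(v)
    p = 2.0 / (2.0 - q + math.sqrt(q * q + 4.0))  # admissible root for G
    return lam * (-math.log1p(-p)) ** (1.0 / k)


def nw_raw_moment(r, k, lam, n_grid=20000):
    """Midpoint-rule approximation of E(X^r) = int_0^1 Q_F(v)^r dv."""
    h = 1.0 / n_grid
    return h * sum(nw_quantile((i + 0.5) * h, k, lam) ** r for i in range(n_grid))


def nw_skewness(k, lam):
    """Pearson's moment coefficient of skewness from the first three raw moments."""
    m1, m2, m3 = (nw_raw_moment(r, k, lam) for r in (1, 2, 3))
    var = m2 - m1 * m1
    return (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / var ** 1.5


k, lam = 2.0, 1.0
mean = nw_raw_moment(1, k, lam)
variance = nw_raw_moment(2, k, lam) - mean ** 2
print(mean, variance, nw_skewness(k, lam))
```

Evaluating the skewness on a grid of $(k, \lambda)$ values in this way gives a quick view of how the sign of the skewness changes with the parameters.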

Estimation and Inference
Attractive asymptotic properties, such as efficiency and consistency, are some of the reasons that make maximum likelihood the most commonly applied method of parametric point estimation. The MLEs are the points that maximize the likelihood function over the parameter space. Since the logarithmic function is increasing, maximizing the log-likelihood function, besides being a more convenient task, also provides the MLEs.
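Given that $\xi = (\xi_1, \ldots, \xi_r)$ is the $r \times 1$ parametric vector of a random variable $X$ that follows a Normal-G distribution, $G(x|\xi) = G_\xi(x)$ is the baseline, $g(x|\xi) = g_\xi(x)$ is its corresponding pdf and $X = (x_1, \ldots, x_m)$ is a complete random sample of size $m$ from $X$, then the log-likelihood function is:
$$\ell(\xi|X) = -\frac{m}{2}\log(2\pi) + \sum_{i=1}^{m} \log g_\xi(x_i) + \sum_{i=1}^{m} \log\{1 - 2G_\xi(x_i) + 2G_\xi^2(x_i)\} - 2\sum_{i=1}^{m} \log G_\xi(x_i) - 2\sum_{i=1}^{m} \log[1 - G_\xi(x_i)] - \frac{1}{2}\sum_{i=1}^{m} \left\{\frac{2G_\xi(x_i) - 1}{G_\xi(x_i)[1 - G_\xi(x_i)]}\right\}^2. \qquad (25)$$
Thanks to the powerful functions available in software for statistical computing, it is possible to use numerical methods to maximize (25); for this purpose, R [22] provides the function optim in the stats package.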
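The maximization of the log-likelihood can be illustrated with a small sketch for the Normal-Weibull model, assuming the pdf implied by the class cdf $\Phi\big((2G - 1)/(G[1 - G])\big)$. A coarse grid search stands in here for a proper optimizer such as optim, and the data are a deterministic stand-in for a sample (quantiles at evenly spaced probabilities); both choices, like the function names, are assumptions made only to keep the example self-contained.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def nw_quantile(v, k, lam):
    """Normal-Weibull quantile under the assumed Normal-G form."""
    q = _STD_NORMAL.inv_cdf(v)
    p = 2.0 / (2.0 - q + math.sqrt(q * q + 4.0))
    return lam * (-math.log1p(-p)) ** (1.0 / k)


def nw_loglik(params, data):
    """Log-likelihood of (k, lam) for positive observations in data."""
    k, lam = params
    if k <= 0.0 or lam <= 0.0:
        return -math.inf
    total = 0.0
    for x in data:
        t = (x / lam) ** k
        S = math.exp(-t)      # 1 - G
        G = -math.expm1(-t)   # Weibull cdf
        if G <= 0.0 or S <= 0.0:
            return -math.inf  # observation numerically outside the support
        u = (2.0 * G - 1.0) / (G * S)
        log_g = math.log(k / lam) + (k - 1.0) * math.log(x / lam) - t
        total += (log_g
                  + math.log(1.0 - 2.0 * G * S)  # log(1 - 2G + 2G^2)
                  - 2.0 * math.log(G)
                  + 2.0 * t                      # -2 log(1 - G)
                  - 0.5 * math.log(2.0 * math.pi)
                  - 0.5 * u * u)
    return total


# Deterministic pseudo-sample: quantiles at evenly spaced probabilities.
true_k, true_lam, m = 2.0, 1.5, 200
data = [nw_quantile((i + 0.5) / m, true_k, true_lam) for i in range(m)]

# Coarse grid search over (k, lam); a stand-in for a numerical optimizer.
grid = [(0.5 + 0.1 * a, 0.5 + 0.1 * b) for a in range(36) for b in range(26)]
k_hat, lam_hat = max(grid, key=lambda p: nw_loglik(p, data))
print(k_hat, lam_hat)
```

In practice the grid search would be replaced by a derivative-free or gradient-based optimizer, with the grid maximizer used at most as a starting value.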
The MLEs can also be obtained by solving the system of equations $U(\xi|X) = 0_r$, where $U(\xi|X) = \nabla_\xi \ell(\xi|X) = (u_i)_{1 \leq i \leq r}$ is the score vector, such that $u_i = \partial \ell(\xi|X) / \partial \xi_i$, and $0_r$ is an $r \times 1$ vector of zeros.
The information matrix $J(\xi|X)$ is essential to construct confidence intervals and to test hypotheses on $\xi$. The expectation of $J(\xi|X)$ is the expected Fisher information matrix $I_\xi$ and, under certain regularity conditions, $\sqrt{m}(\hat{\xi} - \xi)$ follows approximately a multivariate normal distribution $N_r\left(0_r, I_\xi^{-1}\right)$. The expression for $J(\xi|X)$ is presented in Appendix A.

Simulation Study
We used the free software R version 3.4.4 [22] to carry out the Monte Carlo simulation study; the number of replications was 10,000. The pseudo-random samples were generated via Von Neumann's acceptance-rejection method [23]. This simple procedure requires the corresponding pdf $y = f(x)$, a minorant and a majorant for $x$ and a majorant for $y$; it is not necessary to implement the quantile function in this case. Four sample sizes, namely $n = 50$, 100, 200 and 500, and five different values for the vector of parameters were considered. For each scenario, we calculated the bias and the mean squared error (MSE) as follows:
$$\mathrm{Bias}(\hat{\xi}_i) = \frac{1}{N} \sum_{j=1}^{N} (\hat{\xi}_{ij} - \xi_i), \qquad \mathrm{MSE}(\hat{\xi}_i) = \frac{1}{N} \sum_{j=1}^{N} (\hat{\xi}_{ij} - \xi_i)^2,$$
where $N = 10{,}000$ is the number of replications, $\xi_i$ is the $i$-th element of the vector of parameters $\xi = (\xi_1, \ldots, \xi_r)$ and $\hat{\xi}_{ij}$ is the estimate of $\xi_i$ at the $j$-th replication. The log-likelihood function was maximized using simulated annealing, available through the optim subroutine, for which the user has to pass a vector $\xi_0$ of initial values. At first, we took $\xi_0 = 1_r$, namely an $r \times 1$ vector of ones, and ran one single replication with sample size $n = 50$; the estimates obtained from this procedure were assigned to $\xi_0$ and used in all of the aforementioned scenarios.
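The sampling and summary steps of such a study can be sketched as follows. The acceptance-rejection routine uses uniform proposals on a bounding rectangle, which is one common reading of the description above, and the Normal-Weibull pdf assumes the class cdf $\Phi\big((2G - 1)/(G[1 - G])\big)$. The function names, the bounds, the seed and the toy estimates passed to the bias and MSE helper are all hypothetical.

```python
import math
import random


def nw_pdf(x, k, lam):
    """Normal-Weibull pdf under the assumed Normal-G form."""
    if x <= 0.0:
        return 0.0
    t = (x / lam) ** k
    S = math.exp(-t)
    G = -math.expm1(-t)
    if G <= 0.0 or S <= 0.0:
        return 0.0
    u = (2.0 * G - 1.0) / (G * S)
    dens = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    if dens == 0.0:
        return 0.0
    g = (k / lam) * (x / lam) ** (k - 1.0) * S
    return g * (1.0 - 2.0 * G * S) / (G * S) ** 2 * dens


def accept_reject(pdf, x_min, x_max, y_max, n, rng):
    """Draw n values from pdf using uniform proposals on [x_min, x_max] x [0, y_max]."""
    draws = []
    while len(draws) < n:
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(0.0, y_max)
        if y < pdf(x):  # keep the point if it falls under the curve
            draws.append(x)
    return draws


def bias_and_mse(estimates, true_value):
    """Average error and average squared error of a list of estimates."""
    errors = [e - true_value for e in estimates]
    n = len(errors)
    return sum(errors) / n, sum(d * d for d in errors) / n


k, lam = 2.0, 1.0
x_max = 3.0  # the pdf is numerically zero beyond this point for these parameters
# Majorant for y: 10% above the largest pdf value found on a fine grid.
y_max = 1.1 * max(nw_pdf(0.001 * i, k, lam) for i in range(1, 3001))

rng = random.Random(2024)
sample = accept_reject(lambda x: nw_pdf(x, k, lam), 0.0, x_max, y_max, 1000, rng)
print(sum(sample) / len(sample))

# Toy estimates, only to show the bias and MSE computation.
print(bias_and_mse([1.1, 0.9, 1.2], 1.0))
```

In a full study, each replication would draw a sample in this way, maximize the log-likelihood, and store the estimates that are then passed to the bias and MSE computation.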
The results for both parameters of the Normal-Weibull density (8), shown in Table 1, indicate that the estimates are fairly close to the actual values. Moreover, as expected, the larger the sample size, the smaller the MSEs. The results given in Table 2 suggest that the estimates of the parameters of the Normal-log-logistic model (10) behave similarly to those shown in Table 1; that is to say, the biases are quite small and the MSE decreases as the sample size increases.
Figure 5 shows the histogram of the soil fertility data and the fitted densities with the three lowest values of AIC among the distributions in the first column of Table 4. Although the Normal-Weibull and Exponentiated Weibull curves appear to be very close, the blue one (NW) seems to be closer to the histogram. The modified versions of the Anderson-Darling ($A^*$) and Cramér-von Mises ($W^*$) statistics (more details in [27]) are typically used to investigate the quality of fit of probabilistic models. Table 5 reports these statistics for the models fitted to the soil fertility data. The measures in Table 5 represent the difference between the empirical distribution function and the real underlying cdf; hence we consider that the models with lower values of $A^*$ and $W^*$ fit the data better. Therefore, once again the Normal-Weibull distribution outperforms the competing models.
Figure 6 depicts the histogram of the eruption data and the fitted densities with the three lowest values of AIC among the distributions in the first column of Table 7. By visual comparison, the three curves are apparently good approximations to the histogram, but the Normal-log-logistic curve seems to explain the behavior of the data more accurately. Table 8 provides the values of $A^*$ and $W^*$ for the distributions in the first column of Table 7. These statistics suggest that the GoLL and NLL models fit the eruption dataset very closely. Nonetheless, in order to pick a more parsimonious model, one should prefer the NLL, since it has fewer parameters than GoLL. It is worth mentioning that [17] proposed the new class Exponentiated Kumaraswamy-G and fitted one of its sub-models (with Weibull as baseline) to the same eruption dataset. It presented $A^* = 0.7594$ and $W^* = 0.1037$, whereas NLL presented lower values of these statistics, as one can check in Table 8.

Concluding Remarks
Based on the method of generating classes of probability distributions presented by [12], we introduce a new class called Normal-G. It has the advantage of demanding no additional parameters besides those of the baseline. We demonstrate that the proposed class generates identifiable sub-models as long as the parent distribution is identifiable. The pdf of the class can be written as a linear combination of pdfs of exponentiated distributions, which allows us to easily derive the raw moments, the incomplete moments and the moment generating function.
We conduct Monte Carlo simulation studies to assess the performance of the MLEs of two distributions generated by the class, and applications to real datasets are presented to illustrate its usefulness. The fitted models are compared to other competitive distributions with respect to the Anderson-Darling and Cramér-von Mises statistics, as well as commonly used information criteria as goodness-of-fit measures. The general results indicate that the Normal-G sub-models outperform the other distributions in the comparison. The new class is powerful and provides parsimonious models, which may interest practitioners of statistics, soil science, oceanography and other fields.

Figure 1. Plots of pdf and hrf for the Normal-Weibull distribution.

Figure 3. Plots of pdf and hrf for the Normal-log-logistic distribution.

Figure 5. Histogram of soil fertility dataset and fitted densities.

Figure 6. Histogram of eruption dataset and fitted densities.

Table 1. Bias and MSE of the estimates under the maximum likelihood method for the Normal-Weibull model.

Table 2. Bias and MSE of the estimates under the maximum likelihood method for the Normal-log-logistic model.

Table 3. Descriptive statistics for soil fertility dataset.

Table 4. Fitted distributions to the soil fertility dataset (estimates and information criteria).

Table 6. Descriptive statistics for eruption dataset.

Table 7. Fitted distributions to the eruption dataset (estimates and information criteria).