Article

On the Regression Model for Generalized Normal Distributions

by Ayman Alzaatreh 1,*, Mohammad Aljarrah 2, Ayanna Almagambetova 3 and Nazgul Zakiyeva 4
1 Department of Mathematics and Statistics, American University of Sharjah, Sharjah P.O. Box 26666, United Arab Emirates
2 Department of Mathematics, Tafila Technical University, Tafila 66110, Jordan
3 Department of Mathematics, University of Amsterdam, 1098 XH Amsterdam, The Netherlands
4 Zuse Institute Berlin, 14195 Berlin, Germany
* Author to whom correspondence should be addressed.
Entropy 2021, 23(2), 173; https://doi.org/10.3390/e23020173
Submission received: 2 January 2021 / Revised: 26 January 2021 / Accepted: 26 January 2021 / Published: 30 January 2021
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The traditional linear regression model that assumes normal residuals is applied extensively in engineering and science. However, the normality assumption on the model residuals often does not hold. This drawback can be overcome by using a generalized normal regression model that assumes a non-normal response. In this paper, we propose regression models based on generalizations of the normal distribution. The proposed regression models can be used effectively in modeling data with a highly skewed response. Furthermore, we study in some detail the structural properties of the proposed generalizations of the normal distribution. The maximum likelihood method is used for estimating the parameters of the proposed models. The performance of the maximum likelihood estimators in estimating the distributional parameters is assessed through a small simulation study. Applications to two real datasets are given to illustrate the flexibility and the usefulness of the proposed distributions and their regression models.

1. Introduction

Existing distributions do not always provide an adequate fit. Hence, generalizing distributions and studying their flexibility have been of interest to researchers over recent decades. One of the earliest works on generating distributions was done by [1], who proposed a differential-equation method as a fundamental approach to generating statistical distributions. Ref. [2] also contributed to this category by developing another method based on a differential equation. Later, other methods were developed, such as the method of transformation [3] and the method of quantile functions [4,5]. More recent techniques for generalizing statistical distributions emerged after the 1980s and can be grouped into five major categories [6]: the method of generating skew distributions, the method of adding parameters, the beta-generated method, the transformed-transformer method, and the composite method.
The beta-generated (BG) family introduced by [7] has a cumulative distribution function (CDF) given by
$$G(x) = \int_{0}^{F(x)} b(t)\,dt, \tag{1}$$
where $b(t)$ is the probability density function (PDF) of the beta random variable and $F(x)$ is the CDF of any random variable. The PDF corresponding to (1) is given by
$$g(x) = \frac{1}{B(\alpha,\beta)}\, f(x)\, F^{\alpha-1}(x)\, [1-F(x)]^{\beta-1}, \quad \alpha>0,\ \beta>0;\ x \in \mathrm{Supp}(F), \tag{2}$$
where $\mathrm{Supp}(F)$ is the support of $F$ and $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$.
Since the proposal of the BG family in 2002, several members of the BG family of distributions have been investigated, for example, the beta-normal [7,8,9], beta-Gumbel [10], beta-Frechet [11], beta-Weibull [8,12,13,14], beta-Pareto [15], beta generalized logistic of type IV [16] and beta-Burr XII [17] distributions. Some extensions of the BG family have also appeared in the literature, such as the Kw-G distribution [18,19], the beta type I generalization [20], and the generalized gamma-generated family [21].
The beta-generated family of distributions is formed by using the beta distribution in (1), with support between 0 and 1, as a generator. Ref. [22], in turn, asked whether other distributions with different supports can be used as a generator. They extended the family of BG distributions and defined the so-called T-X family. In the T-X family, the generator $b(t)$ is replaced by a generator $r(t)$, the PDF of a random variable $T$ with support $[a,b]$. The CDF of the T-X family is given by
$$G(x) = \int_{a}^{W[F(x)]} r(t)\,dt, \tag{3}$$
where $W$, defined on $[0,1]$, is a link function that satisfies $W(0) \to a$ and $W(1) \to b$. Ref. [23] studied a special case of the T-X family where the link function $W(\cdot)$ is the quantile function of a random variable $Y$. The proposed CDF is defined as
$$F_X(x) = \int_{a}^{Q_Y[F_R(x)]} f_T(t)\,dt = P\{T \le Q_Y[F_R(x)]\} = F_T\{Q_Y[F_R(x)]\}, \tag{4}$$
where $T$, $R$, and $Y$ are random variables with CDFs $F_T(x) = P(T \le x)$, $F_R(x) = P(R \le x)$, and $F_Y(x) = P(Y \le x)$. The corresponding quantile functions are $Q_T(p)$, $Q_R(p)$, and $Q_Y(p)$, where the quantile function is defined as $Q_Z(p) = \inf\{z : F_Z(z) \ge p\}$, $0 < p < 1$. If densities exist, we denote them by $f_T(x)$, $f_R(x)$, and $f_Y(x)$. Now, if the random variables $T \in (a,b)$ and $Y \in (c,d)$, for $-\infty \le a < b \le \infty$ and $-\infty \le c < d \le \infty$, then the PDF corresponding to (4) is given by
$$f_X(x) = f_R(x) \times \frac{f_T\{Q_Y[F_R(x)]\}}{f_Y\{Q_Y[F_R(x)]\}}. \tag{5}$$
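If $R$ follows the normal distribution $N(\mu,\sigma^2)$, then (5) reduces to the T-normal family of distributions [24] with PDF given by
$$f_X(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right) \times \frac{f_T\!\left\{Q_Y\!\left[\Phi\!\left(\frac{x-\mu}{\sigma}\right)\right]\right\}}{f_Y\!\left\{Q_Y\!\left[\Phi\!\left(\frac{x-\mu}{\sigma}\right)\right]\right\}}, \tag{6}$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ are the PDF and CDF of the standard normal distribution, respectively.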
The T-normal family is a general base for generating many different generalizations of the normal distribution. The distributions generated from the T-normal family can be symmetric, skewed to the right, skewed to the left, or bimodal. Some of the existing generalizations of the normal distribution can be obtained using this framework, in particular the beta-normal [7], Kumaraswamy normal [19] and gamma-normal [25] distributions.
Another generalization of the normal distribution is the skew-normal distribution, first considered by [26], which is defined as
$$f_X(x) = 2\,\phi(x)\,\Phi(\lambda x), \quad x \in \mathbb{R},\ \lambda \in \mathbb{R}.$$
A further generalization of the normal distribution is the power-normal distribution [27] with CDF given by
$$F(x) = [\Phi(x)]^{\alpha}, \quad x \in \mathbb{R},\ \alpha \in \mathbb{R}^{+}.$$
Several properties of the power-normal distribution are studied by [27]. Recently, Ref. [28] proposed a new extension of the normal distribution.
The rest of the paper is organized as follows. In Section 2, we introduce a class of skew-symmetric models by using the logistic kernel and the normal distribution as the baseline distribution. In Section 3, we discuss some structural properties of the logistic-normal (henceforth LN) distribution, including moments, tail behavior, and modes. In Section 4, the maximum likelihood estimation method is considered to estimate the model parameters, and a small simulation study is implemented to evaluate the performance of the method. In Section 5, a generalized normal regression model based on the skew-LN distribution is developed. In Section 6, applications to two real datasets are given to demonstrate the flexibility and the usefulness of the new distribution and its regression model. We conclude the paper with some remarks in Section 7.

2. The Symmetric Logistic-G Family of Distributions

If $T$ follows the logistic distribution with PDF $f_T(x) = \lambda e^{-\lambda x}/(1+e^{-\lambda x})^{2}$, $\lambda > 0$, and $Y$ follows the standard logistic distribution ($\lambda = 1$), then Equation (4) reduces to the Logistic-G family of distributions with CDF given by
$$F_G(x) = \frac{G^{\lambda}(x)}{G^{\lambda}(x) + [1-G(x)]^{\lambda}}, \quad \lambda > 0;\ x \in \mathrm{Supp}(G), \tag{7}$$
where $G(\cdot)$ is the CDF of any baseline distribution. A special case of (7) was studied in some detail in [29]. The PDF corresponding to (7) is given by
$$f_G(x) = \frac{\lambda\, g(x)\, G^{\lambda-1}(x)\, [1-G(x)]^{\lambda-1}}{\left\{G^{\lambda}(x) + [1-G(x)]^{\lambda}\right\}^{2}}, \tag{8}$$
where $g(\cdot)$ is the PDF of $G(\cdot)$.
Remark 1.
The Logistic-G family possesses the following properties:
i. If $g(x)$ in (8) is a PDF symmetric about $\mu$, then the resulting $f_G(x)$ is a PDF symmetric about $\mu$; that is, the Logistic-G family in (7) preserves symmetry.
ii. If a random variable $T$ follows the logistic distribution with scale parameter $\lambda$, then the random variable $X = G^{-1}\!\left(\frac{e^{T}}{1+e^{T}}\right)$ follows the Logistic-G family in (7).
iii. The quantile function of the Logistic-G family can be written as
$$Q_G(p) = G^{-1}\!\left(\left[1 + \left(p^{-1} - 1\right)^{1/\lambda}\right]^{-1}\right), \quad 0 < p < 1. \tag{9}$$
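The CDF (7), PDF (8) and quantile function (9) translate directly into code for any baseline distribution. The following is a minimal sketch; the function names and the unit-exponential baseline are assumptions chosen only for illustration and are not part of the original study.

```python
import math

def logistic_g_cdf(x, G, lam):
    """CDF (7): G^lam / (G^lam + (1 - G)^lam) for a baseline CDF G."""
    u = G(x)
    return u**lam / (u**lam + (1.0 - u)**lam)

def logistic_g_pdf(x, G, g, lam):
    """PDF (8) for a baseline CDF G with density g."""
    u = G(x)
    return lam * g(x) * (u * (1.0 - u))**(lam - 1.0) / (u**lam + (1.0 - u)**lam)**2

def logistic_g_quantile(p, G_inv, lam):
    """Quantile function (9), given the baseline quantile function G_inv."""
    return G_inv(1.0 / (1.0 + (1.0 / p - 1.0)**(1.0 / lam)))

# Illustrative baseline (an assumption for this example): unit exponential.
def G(x):
    return 1.0 - math.exp(-x)

def g(x):
    return math.exp(-x)

def G_inv(u):
    return -math.log(1.0 - u)

# Round trip: the CDF evaluated at the p-quantile should return p.
x30 = logistic_g_quantile(0.3, G_inv, lam=2.0)
print(x30, logistic_g_cdf(x30, G, lam=2.0))
```

Passing the baseline as plain callables keeps the sketch independent of any particular distribution library.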
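Now setting $G(x)$ to be the normal CDF with parameters $\mu$ and $\sigma^2$, say $G(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$, the Logistic-G family reduces to the logistic-normal distribution with CDF given by
$$F_N(x) = \frac{\Phi^{\lambda}\!\left(\frac{x-\mu}{\sigma}\right)}{\Phi^{\lambda}\!\left(\frac{x-\mu}{\sigma}\right) + \left[1-\Phi\!\left(\frac{x-\mu}{\sigma}\right)\right]^{\lambda}}, \quad x \in \mathbb{R}, \tag{10}$$
where $\lambda > 0$, $\sigma > 0$, and $-\infty < \mu < \infty$. The PDF associated with (10) is
$$f_N(x) = \frac{\lambda\, \phi\!\left(\frac{x-\mu}{\sigma}\right) \Phi^{\lambda-1}\!\left(\frac{x-\mu}{\sigma}\right) \left[1-\Phi\!\left(\frac{x-\mu}{\sigma}\right)\right]^{\lambda-1}}{\sigma \left\{\Phi^{\lambda}\!\left(\frac{x-\mu}{\sigma}\right) + \left[1-\Phi\!\left(\frac{x-\mu}{\sigma}\right)\right]^{\lambda}\right\}^{2}}, \quad x \in \mathbb{R}. \tag{11}$$
When $\lambda = 1$, the logistic-normal distribution (henceforth LN$(\mu,\sigma,\lambda)$) in (10) reduces to the normal distribution. Thus, the LN distribution is a generalization of the normal distribution. Furthermore, the LN distribution is a member of the T-normal family proposed by [24]. In Figure 1, graphs of the standard LN distribution (where $\mu = 0$, $\sigma = 1$) for various values of $\lambda$ are provided. Figure 1 shows that the logistic-normal PDF has several advantages: the parameter $\lambda$ introduces flexibility in the kurtosis (see also Figure 2) and controls whether the distribution is unimodal or bimodal. Moreover, it appears that bimodality occurs when $\lambda$ is less than approximately 0.5.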
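As a quick numerical check of (10) and (11), the sketch below evaluates the LN CDF and PDF using only the Python standard library. The function names `ln_cdf` and `ln_pdf` and the chosen parameter values are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi

def ln_cdf(x, mu=0.0, sigma=1.0, lam=1.0):
    """LN CDF, Equation (10)."""
    u = STD.cdf((x - mu) / sigma)
    return u**lam / (u**lam + (1.0 - u)**lam)

def ln_pdf(x, mu=0.0, sigma=1.0, lam=1.0):
    """LN PDF, Equation (11)."""
    z = (x - mu) / sigma
    u = STD.cdf(z)
    num = lam * STD.pdf(z) * (u * (1.0 - u))**(lam - 1.0)
    return num / (sigma * (u**lam + (1.0 - u)**lam)**2)

# lam = 1 should recover the N(mu, sigma^2) density.
print(ln_pdf(1.3, mu=1.0, sigma=2.0, lam=1.0), NormalDist(1.0, 2.0).pdf(1.3))

# The density should be symmetric about mu for any lam.
print(ln_pdf(1.0 + 0.7, 1.0, 2.0, 0.4), ln_pdf(1.0 - 0.7, 1.0, 2.0, 0.4))
```

Each printed pair should agree up to floating-point rounding.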

3. Some Properties of LN Distribution

We begin our discussion by providing some useful remarks as listed below.
Remark 2.
Using (10), (11) and Remark 1, the following useful properties can be obtained:
(i) It is easy to show from (11) that $f_N(x + \mu) = f_N(\mu - x)$, which implies that the LN$(\mu,\sigma,\lambda)$ distribution is symmetric about the location parameter $\mu$.
(ii) The mean and median of the LN distribution are both equal to $\mu$, the location parameter of the normal distribution.
(iii) The quantile function of the LN distribution can be written as
$$Q(p) = \mu + \sigma\,\Phi^{-1}\!\left(\left[1 + \left(p^{-1} - 1\right)^{1/\lambda}\right]^{-1}\right), \quad 0 < p < 1.$$
(iv) In order to generate a random sample from the LN distribution, first simulate a random sample $t_i$, $i = 1, 2, \dots, n$, from the logistic($\lambda$) distribution and then compute $x_i = \mu + \sigma\,\Phi^{-1}\!\left(\frac{e^{t_i}}{1+e^{t_i}}\right)$.
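The sampling recipe in Remark 2 (iv) can be sketched as follows. This is an illustrative implementation, not the authors' code: the function name `ln_sample`, the seed and the parameter values are assumptions. The logistic draw uses the inverse of $F_T(t) = 1/(1+e^{-\lambda t})$, and the normal symmetry $\Phi^{-1}(1-q) = -\Phi^{-1}(q)$ is used so that the argument passed to the normal quantile never rounds to 1.

```python
import math
import random
from statistics import NormalDist, fmean, median

STD = NormalDist()

def ln_sample(n, mu=0.0, sigma=1.0, lam=1.0, rng=random):
    """Draw n LN(mu, sigma, lam) variates following Remark 2 (iv)."""
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:                      # avoid log(0)
            u = rng.random()
        t = math.log(u / (1.0 - u)) / lam    # logistic(lam) draw by inversion
        a = min(abs(t), 700.0)               # cap to keep exp() finite
        # Lower-tail normal quantile of 1 / (1 + e^{|t|}); this value is <= 0.
        z = STD.inv_cdf(1.0 / (1.0 + math.exp(a)))
        # For t > 0, e^t / (1 + e^t) = 1 - 1 / (1 + e^t), so flip the sign.
        draws.append(mu + sigma * (-z if t > 0 else z))
    return draws

xs = ln_sample(4000, mu=2.0, sigma=1.5, lam=0.8, rng=random.Random(1))
print(fmean(xs), median(xs))  # both should be near mu = 2
```

Since the LN distribution is symmetric about $\mu$, the sample mean and median give a simple sanity check on the generator.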
Remark 3.
Using the fact that $\phi'(x) = -x\,\phi(x)$ and setting the derivative of $\log f_N(x)$ in (11) to 0, one can show that the mode(s) of the LN distribution is/are at the point(s) $x^{*} = \mu + \sigma z^{*}$, where $z^{*}$ satisfies the equation
$$z = \frac{\phi(z)}{1-\Phi(z)} \left\{ \frac{(\lambda-1)\,[1-2\Phi(z)]}{\Phi(z)} - \frac{2\lambda\,\Phi^{\lambda-1}(z)}{\Phi^{\lambda}(z) + [1-\Phi(z)]^{\lambda}} + 2\lambda \right\}, \quad z \in \mathbb{R}. \tag{12}$$
It is easy to see that $z = 0$ satisfies Equation (12). Therefore, $f_N(x)$ has a critical point at $x = \mu$. We observed numerically that for $\lambda > 0.5$ the distribution is always unimodal and hence $x = \mu$ is the unique mode in this case. In addition, because the LN distribution is symmetric about $\mu$ for all values of $\lambda$, in the bimodal case, if $x = a < \mu$ is a mode, then the second mode is at $x = 2\mu - a$.
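The mode equation (12) can be explored numerically. The sketch below evaluates the right-hand side of (12) minus $z$ for the standard LN distribution and looks for a positive root by bisection. The helper names, the bracketing interval and the tolerance are assumptions made for illustration.

```python
from statistics import NormalDist

STD = NormalDist()

def mode_residual(z, lam):
    """Right-hand side of (12) minus z; zero at critical points of the standard LN density."""
    Phi, Phi_c, phi = STD.cdf(z), STD.cdf(-z), STD.pdf(z)
    D = Phi**lam + Phi_c**lam
    rhs = phi / Phi_c * ((lam - 1.0) * (1.0 - 2.0 * Phi) / Phi
                         - 2.0 * lam * Phi**(lam - 1.0) / D + 2.0 * lam)
    return rhs - z

def ln_std_pdf(z, lam):
    """Standard LN density, Equation (11) with mu = 0 and sigma = 1."""
    Phi, Phi_c = STD.cdf(z), STD.cdf(-z)
    return lam * STD.pdf(z) * (Phi * Phi_c)**(lam - 1.0) / (Phi**lam + Phi_c**lam)**2

def positive_mode(lam, lo=1e-3, hi=6.0, tol=1e-10):
    """Bisection for a root of (12) in (lo, hi); None if the residual keeps one sign."""
    f_lo, f_hi = mode_residual(lo, lam), mode_residual(hi, lam)
    if f_lo * f_hi > 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = mode_residual(mid, lam)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

for lam in (0.3, 0.8):
    print(lam, mode_residual(0.0, lam), positive_mode(lam))
```

For $\lambda = 0.3$ the search returns a positive mode $z^{*}$ (with a mirror mode at $-z^{*}$ by symmetry), while for $\lambda = 0.8$ no sign change is found on the bracket, which agrees with the unimodal behaviour described above.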
The tail behaviour of the standard LN distribution ($\mu = 0$ and $\sigma = 1$) as $x \to \pm\infty$ is discussed in the following lemma.
Lemma 1.
Let $Z \sim LN(0, 1, \lambda)$. Then, as $z \to \pm\infty$,
$$f_N(z) \sim \frac{\exp\!\left(-\lambda z^{2}/2\right)}{|z|^{\lambda-1}}, \quad \lambda > 0.$$
Proof.
As $z \to \infty$, $\phi(z) \sim \exp(-z^{2}/2)$ and $1 - \Phi(z) \sim \phi(z)/z$ (see [17]), where multiplicative constants are ignored. Consequently, as $z \to \infty$,
$$f_N(z) = \frac{\lambda\,\phi(z)\,\Phi^{\lambda-1}(z)\,[1-\Phi(z)]^{\lambda-1}}{\left\{\Phi^{\lambda}(z) + [1-\Phi(z)]^{\lambda}\right\}^{2}} \sim \phi(z)\left[\frac{\phi(z)}{z}\right]^{\lambda-1} \sim \frac{\exp\!\left(-\lambda z^{2}/2\right)}{z^{\lambda-1}}.$$
Similarly, as $z \to -\infty$, $f_N(z) \sim e^{-\lambda z^{2}/2}/|z|^{\lambda-1}$.
Lemma 1 implies that as $z \to \pm\infty$, the tails of the standard LN distribution behave in a similar way to the right tail of the function $\exp(-\lambda x^{2}/2)/x^{\lambda-1}$. Note that when $0 < \lambda < 1$, the tails of $f_N(x)$ approach 0 slowly, while for $\lambda > 1$, the tails of $f_N(x)$ approach 0 faster, meaning that the tail weight increases for higher values of $\lambda$. A graphical representation of the association between the tail weight of the LN distribution and $\lambda$ can be shown using the measure of kurtosis defined by [30]. Moors' kurtosis is defined as
$$\gamma_M = \frac{\left[Q(7/8) - Q(5/8)\right] + \left[Q(3/8) - Q(1/8)\right]}{Q(6/8) - Q(2/8)}. \tag{13}$$
The values of Moors' kurtosis of LN$(0, 1, \lambda)$ for various values of $\lambda$ are depicted in Figure 2. It shows that as $\lambda$ increases, Moors' kurtosis increases. For $0 < \lambda < 1$, there is a sharp change in the kurtosis, while for $\lambda > 1$ the change is gradual. Figure 1 indicates that for $\lambda < 1$, the tails of the LN distribution are lighter than those of the normal distribution, while for $\lambda > 1$ the tails of the LN distribution are heavier than those of the normal distribution.
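Because the LN quantile function is available in closed form, the quantile-based kurtosis of [30] is cheap to compute. The sketch below is illustrative only; the function names are assumptions, and the symmetry $Q(p) = -Q(1-p)$ of the standard LN distribution is used for $p > 0.5$ to avoid rounding issues near 1.

```python
from statistics import NormalDist

STD = NormalDist()

def ln_std_quantile(p, lam):
    """Quantile function of LN(0, 1, lam) from Remark 2 (iii)."""
    if p > 0.5:
        return -ln_std_quantile(1.0 - p, lam)   # symmetry about 0
    return STD.inv_cdf(1.0 / (1.0 + (1.0 / p - 1.0)**(1.0 / lam)))

def moors_kurtosis(lam):
    """Octile-based kurtosis: [(Q7 - Q5) + (Q3 - Q1)] / (Q6 - Q2)."""
    q = [ln_std_quantile(k / 8.0, lam) for k in range(1, 8)]  # q[k-1] = Q(k/8)
    return ((q[6] - q[4]) + (q[2] - q[0])) / (q[5] - q[1])

for lam in (0.5, 1.0, 2.0):
    print(lam, round(moors_kurtosis(lam), 4))
```

At $\lambda = 1$ the value is that of the standard normal distribution, which gives a reference point when comparing other values of $\lambda$.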

Moments of LN Distribution

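Using Remark 2 (iv), the $r$th moment of the LN distribution can be written as $E(X^{r}) = E\left[\sigma\,\Phi^{-1}\!\left(\frac{e^{T}}{1+e^{T}}\right) + \mu\right]^{r}$, where the random variable $T$ follows the logistic distribution with scale parameter $\lambda$. Therefore,
$$E(X^{r}) = \lambda \int_{-\infty}^{\infty} \left[\sigma\,\Phi^{-1}\!\left(\frac{e^{t}}{1+e^{t}}\right) + \mu\right]^{r} \frac{e^{-\lambda t}}{\left(1+e^{-\lambda t}\right)^{2}}\,dt.$$
Now, $\Phi^{-1}(x) = \sqrt{2}\,\mathrm{erf}^{-1}(2x - 1)$, where $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\,dt$. This implies that
$$E(X^{r}) = E\left[\sigma\sqrt{2}\,\mathrm{erf}^{-1}\!\left(1 - \frac{2}{1+e^{T}}\right) + \mu\right]^{r} = \sum_{j=0}^{r} \binom{r}{j}\, 2^{j/2}\, \sigma^{j} \mu^{r-j}\, \xi_j,$$
where $\xi_j = \lambda \int_{-\infty}^{\infty} \left[\mathrm{erf}^{-1}\!\left(1 - \frac{2}{1+e^{t}}\right)\right]^{j} \frac{e^{-\lambda t}}{\left(1+e^{-\lambda t}\right)^{2}}\,dt$.
The quantities $\xi_j$ can be evaluated by numerical integration in any available software, such as R or SAS.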
Remark 4.
Let $X \sim LN(\mu, \sigma, \lambda)$. Then:
i. From Remark 2 (i), the $r$th central moment $E(X - \mu)^{r} = 0$ for any odd integer $r$.
ii. $X \sim LN(\mu, \sigma, \lambda)$ implies that $X = \sigma Z + \mu$, where $Z \sim LN(0, 1, \lambda)$. Therefore,
$$E(X^{r}) = \sum_{k=0}^{r} \binom{r}{k} \sigma^{k} \mu^{r-k} E(Z^{k}) = \sum_{\text{even } k \le r} \binom{r}{k} \sigma^{k} \mu^{r-k} E(Z^{k}).$$
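The moments in Remark 4 (ii) can be approximated numerically. The sketch below does not use the $\mathrm{erf}^{-1}$ representation; instead it relies on the equivalent quantile identity $E(Z^{k}) = \int_{0}^{1} Q(p)^{k}\,dp$ with a plain midpoint rule. The function names and the grid size are assumptions, and the accuracy is only moderate because the integrand grows near the endpoints.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def ln_std_quantile(p, lam):
    """Quantile function of LN(0, 1, lam); symmetry handles p > 0.5."""
    if p > 0.5:
        return -ln_std_quantile(1.0 - p, lam)
    return STD.inv_cdf(1.0 / (1.0 + (1.0 / p - 1.0)**(1.0 / lam)))

def ln_std_moment(k, lam, n=20000):
    """Midpoint-rule approximation of E(Z^k) = int_0^1 Q(p)^k dp."""
    h = 1.0 / n
    return h * sum(ln_std_quantile((i + 0.5) * h, lam)**k for i in range(n))

def ln_moment(r, mu, sigma, lam):
    """E(X^r) via Remark 4 (ii), summing over even k only."""
    return sum(math.comb(r, k) * sigma**k * mu**(r - k) * ln_std_moment(k, lam)
               for k in range(0, r + 1, 2))

print(ln_std_moment(2, 1.0))          # variance of N(0, 1), close to 1
print(ln_moment(2, 3.0, 2.0, 0.7))    # second raw moment of LN(3, 2, 0.7)
```

A dedicated quadrature routine would be preferable for production use; the midpoint rule is chosen here only to keep the example dependency-free.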

4. Estimation and Simulation

In this section, the maximum likelihood method (MLE) is used to estimate the parameters of LN distribution. Moreover, a small simulation study is performed to assess the performance of the MLE method.

4.1. Parameter Estimation of LN Distribution

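Let $x_1, x_2, \dots, x_n$ be a random sample of size $n$ taken from the LN distribution. Then the log-likelihood function is given by
$$\ell(\lambda,\mu,\sigma) = n \log\frac{\lambda}{\sigma} + \sum_{i=1}^{n} \log \phi\!\left(\tfrac{x_i-\mu}{\sigma}\right) + (\lambda-1) \sum_{i=1}^{n} \log \Phi\!\left(\tfrac{x_i-\mu}{\sigma}\right) + (\lambda-1) \sum_{i=1}^{n} \log\!\left[1-\Phi\!\left(\tfrac{x_i-\mu}{\sigma}\right)\right] - 2 \sum_{i=1}^{n} \log\!\left\{\Phi^{\lambda}\!\left(\tfrac{x_i-\mu}{\sigma}\right) + \left[1-\Phi\!\left(\tfrac{x_i-\mu}{\sigma}\right)\right]^{\lambda}\right\}. \tag{14}$$
The MLEs $\hat{\lambda}$, $\hat{\mu}$, and $\hat{\sigma}$ of the parameters $\lambda$, $\mu$, and $\sigma$ can be obtained by numerically maximizing the log-likelihood function in (14). The initial value of $\mu$ is taken to be the moment estimator $\bar{x}$. The initial value of $\sigma$ is taken to be the sample standard deviation, $s$. To obtain the initial value of the parameter $\lambda$, we use Remark 2 (iv) as follows: assume that the random sample $t_i = \log\!\left[\frac{\Phi\left(\frac{x_i-\bar{x}}{s}\right)}{1-\Phi\left(\frac{x_i-\bar{x}}{s}\right)}\right]$, $i = 1, 2, \dots, n$, is taken from the logistic distribution with parameter $\lambda$. By equating the population variance $\frac{\pi^{2}}{3\lambda^{2}}$ of the logistic distribution with the sample variance $s_T^{2}$ of the random sample $t_i$ and solving for $\lambda$, we obtain $\lambda_0 = \frac{\pi}{\sqrt{3}\, s_T}$.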
The trust-region optimization routine in SAS (PROC IML and CALL NLPTR) is used in order to maximize the likelihood function in (14). The trust-region optimization routine is a powerful technique that can optimize complicated functions. It outputs the iteration details including parameter estimates, their standard errors, and the value of the gradient function at which iteration stops.
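The objective in (14) and the starting values described above can be written down compactly. The sketch below only evaluates the log-likelihood and the initial values; it does not reproduce the SAS trust-region routine, and any numerical optimizer could be applied to `ln_loglik`. The function names and the small data vector are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist, fmean, stdev, variance

STD = NormalDist()

def ln_loglik(params, xs):
    """Log-likelihood (14) for params = (lam, mu, sigma)."""
    lam, mu, sigma = params
    total = len(xs) * math.log(lam / sigma)
    for x in xs:
        z = (x - mu) / sigma
        Phi, Phi_c = STD.cdf(z), STD.cdf(-z)     # Phi_c = 1 - Phi(z)
        total += (math.log(STD.pdf(z))
                  + (lam - 1.0) * (math.log(Phi) + math.log(Phi_c))
                  - 2.0 * math.log(Phi**lam + Phi_c**lam))
    return total

def initial_values(xs):
    """Starting values (lam0, mu0, sigma0) as described in Section 4.1."""
    mu0, s0 = fmean(xs), stdev(xs)
    t = []
    for x in xs:
        z = (x - mu0) / s0
        t.append(math.log(STD.cdf(z) / STD.cdf(-z)))
    lam0 = math.pi / math.sqrt(3.0 * variance(t))   # solves pi^2 / (3 lam^2) = s_T^2
    return lam0, mu0, s0

# Hypothetical data, for illustration only.
data = [4.1, 5.3, 2.2, 6.8, 5.0, 3.9, 4.7, 5.9, 3.1, 6.2]
start = initial_values(data)
print(start, ln_loglik(start, data))
```

Writing $1 - \Phi(z)$ as $\Phi(-z)$ keeps the logarithm finite for observations far in the right tail.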

4.2. Simulation

In order to evaluate the performance of the ML method, a small simulation study is conducted with sample sizes $n = 30, 50, 70$ and with three different parameter combinations. The study involved computing and analyzing the relative bias [(Estimate − Actual)/Actual] and the standard deviation of the estimates. The results of the study are reported in Table 1.
From Table 1, it is observed that the ML method overestimates the parameter $\mu$. Moreover, when $\lambda < 1$, the ML estimates of $\lambda$ and $\sigma$ are overestimated. On the other hand, when $\lambda > 1$, the ML estimates of $\lambda$ and $\sigma$ are underestimated. For small sample sizes and when $\lambda < 1$, the MLE method does not perform well; in fact, the standard deviations are higher than the corresponding estimated values. However, for larger sample sizes and when $\lambda > 1$, the MLE method performs quite well in estimating the model parameters.

5. Skew-LN and Its Generalized Normal Regression Model

In this section, we first propose a skewed type of LN distribution that can be used to fit skewed datasets. In Section 5.2, we propose a location-scale regression model based on the skew-LN distribution.

5.1. Skew Logistic-Normal Distribution

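For skewed data, one can generate a skew-LN distribution in various ways. One way is by exponentiating the CDF of the LN distribution as
$$F(x) = \left\{ \frac{\Phi^{\lambda}\!\left(\frac{x-\mu}{\sigma}\right)}{\Phi^{\lambda}\!\left(\frac{x-\mu}{\sigma}\right) + \left[1-\Phi\!\left(\frac{x-\mu}{\sigma}\right)\right]^{\lambda}} \right\}^{\alpha}, \quad \alpha > 0,\ \lambda > 0,\ x \in \mathbb{R}. \tag{15}$$
Note that when $\alpha = 1$, the skew-LN distribution in (15) reduces to the LN distribution. Moreover, when $\lambda = 1$, the skew-LN distribution reduces to the exponentiated-normal distribution proposed by [27]. Finally, when $\alpha = \lambda = 1$, the skew-LN distribution reduces to the normal distribution.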
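The special cases of (15) can be checked numerically. In the sketch below, the density is obtained by differentiating (15), which gives $f(x) = \alpha\lambda\,\phi(z)\,\Phi^{\lambda\alpha-1}(z)\,[1-\Phi(z)]^{\lambda-1} / \left(\sigma\,\{\Phi^{\lambda}(z) + [1-\Phi(z)]^{\lambda}\}^{\alpha+1}\right)$ with $z = (x-\mu)/\sigma$. The function names and test points are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()

def skew_ln_cdf(x, mu=0.0, sigma=1.0, lam=1.0, alpha=1.0):
    """Skew-LN CDF, Equation (15)."""
    z = (x - mu) / sigma
    Phi, Phi_c = STD.cdf(z), STD.cdf(-z)         # Phi_c = 1 - Phi(z)
    return (Phi**lam / (Phi**lam + Phi_c**lam))**alpha

def skew_ln_pdf(x, mu=0.0, sigma=1.0, lam=1.0, alpha=1.0):
    """Derivative of (15) with respect to x."""
    z = (x - mu) / sigma
    Phi, Phi_c = STD.cdf(z), STD.cdf(-z)
    D = Phi**lam + Phi_c**lam
    return (alpha * lam / sigma * STD.pdf(z)
            * Phi**(lam * alpha - 1.0) * Phi_c**(lam - 1.0) / D**(alpha + 1.0))

# alpha = lam = 1 should recover the normal density.
print(skew_ln_pdf(0.4, 1.0, 2.0, 1.0, 1.0), NormalDist(1.0, 2.0).pdf(0.4))

# lam = 1 should recover the exponentiated-normal CDF Phi(z)^alpha.
print(skew_ln_cdf(0.4, 0.0, 1.0, 1.0, 2.5), STD.cdf(0.4)**2.5)
```

Each printed pair should match up to rounding, mirroring the nesting stated after (15).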
In order to analyze the skewness and kurtosis regions of the skew-LN distribution, Galton's skewness [31] and Moors' kurtosis [30] were plotted against the parameters $\alpha$ and $\lambda$. Figure 3 shows that the distribution is right skewed for $\alpha, \lambda < 1$ and left skewed for $\alpha > 1$, $\lambda < 1$ and for $\alpha < 1$, $\lambda > 1$. The plot of kurtosis in Figure 3 demonstrates the flexibility of the proposed distribution. For $\lambda < 1$, the tails of the skew-LN distribution can be heavier or lighter than the tails of the normal distribution.
The skew-LN distribution has several advantages: the parameter $\alpha$ introduces flexibility in the skewness and the parameter $\lambda$ introduces flexibility in the kurtosis. Furthermore, the main advantage of the skew-LN distribution compared with the Azzalini skew-normal distribution is the flexibility of fitting data with a wider range of skewness and kurtosis. Based on numerical calculations, for the Azzalini skew-normal distribution, Galton's skewness ranges between −0.1443 and 0.1443 and Moors' kurtosis ranges between 1.1746 and 1.2460. However, for the skew-LN distribution, Galton's skewness ranges between −0.3000 and 0.3000 and Moors' kurtosis ranges between 0.8000 and 1.6000. It is also worth mentioning that the skew-LN distribution can be unimodal or bimodal and has a closed-form CDF, which is not the case for the Azzalini skew-normal distribution.
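Quantile-based skewness and kurtosis for the skew-LN distribution can be sketched by inverting (15): since $F(x) = H(x)^{\alpha}$ with $H$ the LN CDF, the $p$-quantile is the LN quantile at $p^{1/\alpha}$. The formula used below for Galton's skewness, $[Q(3/4) - 2Q(1/2) + Q(1/4)]/[Q(3/4) - Q(1/4)]$, is the usual quartile-based form and is an assumption here, as are the function names; the text does not state the formula explicitly.

```python
from statistics import NormalDist

STD = NormalDist()

def ln_std_quantile(u, lam):
    """Quantile function of LN(0, 1, lam); symmetry handles u > 0.5."""
    if u > 0.5:
        return -ln_std_quantile(1.0 - u, lam)
    return STD.inv_cdf(1.0 / (1.0 + (1.0 / u - 1.0)**(1.0 / lam)))

def skew_ln_quantile(p, lam, alpha):
    """Inverts (15) for mu = 0, sigma = 1: x = Q_LN(p^(1/alpha))."""
    return ln_std_quantile(p**(1.0 / alpha), lam)

def galton_skewness(lam, alpha):
    """Quartile-based skewness (assumed form, see lead-in)."""
    q1, q2, q3 = (skew_ln_quantile(k / 4.0, lam, alpha) for k in (1, 2, 3))
    return (q3 - 2.0 * q2 + q1) / (q3 - q1)

def moors_kurtosis(lam, alpha):
    """Octile-based kurtosis of [30]."""
    q = [skew_ln_quantile(k / 8.0, lam, alpha) for k in range(1, 8)]  # q[k-1] = Q(k/8)
    return ((q[6] - q[4]) + (q[2] - q[0])) / (q[5] - q[1])

for lam, alpha in ((1.0, 1.0), (0.7, 1.0), (1.0, 3.0)):
    print(lam, alpha, galton_skewness(lam, alpha), moors_kurtosis(lam, alpha))
```

With $\alpha = 1$ the skewness is zero up to rounding, as expected from the symmetry of the LN distribution; scanning a grid of $(\alpha, \lambda)$ values with these two functions is one way to produce surfaces of the kind summarized in Figure 3.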

5.2. Generalized Normal Regression Model Based on Skew-LN Distribution

The traditional linear regression model that assumes normal residuals is applied extensively in engineering and science. However, the normality assumption on the model residuals often does not hold. This drawback can be overcome by using a generalized normal regression model that assumes a non-normal response $Y$. In this section, $Y$ is assumed to follow the skew-LN distribution. The following location-scale regression model is considered based on the skew-LN distribution:
$$y_i = \underline{x}_i^{T} \underline{\beta} + \sigma Z_i, \quad i = 1, 2, \dots, n, \tag{16}$$
where $y_i$ is the response variable, which follows the skew-LN distribution in (15), and $\underline{\beta} = (\beta_0, \beta_1, \dots, \beta_p)^{T}$ and $\sigma > 0$ are unknown parameters. Every $y_i$ has a covariate vector $\underline{x}_i^{T} = (1, x_{i1}, \dots, x_{ip})$ that models the linear predictor $\mu_i = \underline{x}_i^{T} \underline{\beta}$. The random error $Z_i$ follows the skew-LN$(0, 1, \lambda, \alpha)$ distribution.
Remark 5.
The skew-LN regression model in (16) has several nested regression models. These special cases are enumerated as follows:
1. The regression model in (16) reduces to the traditional normal linear regression model when $\alpha = \lambda = 1$.
2. The exponentiated-normal (Exp-N) regression model is obtained when $\lambda = 1$. This location-scale regression model is based on the power-normal distribution introduced by [27].
3. The LN regression model based on the distribution (10) is obtained when $\alpha = 1$.
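A sample $(y_1, \underline{x}_1), \dots, (y_n, \underline{x}_n)$ of $n$ independent observations is considered, and the log-likelihood function for the parameters $\underline{\theta} = (\lambda, \alpha, \sigma, \underline{\beta}^{T})^{T}$ of model (16) is
$$\ell(\underline{\theta}) = n \log\frac{\alpha\lambda}{\sigma} + \sum_{i=1}^{n} \log \phi(z_i) + (\lambda\alpha - 1) \sum_{i=1}^{n} \log \Phi(z_i) + (\lambda - 1) \sum_{i=1}^{n} \log\left[1 - \Phi(z_i)\right] - (\alpha + 1) \sum_{i=1}^{n} \log\left\{\Phi^{\lambda}(z_i) + \left[1 - \Phi(z_i)\right]^{\lambda}\right\}, \tag{17}$$
where $z_i = \frac{y_i - \underline{x}_i^{T} \underline{\beta}}{\sigma}$. The maximum likelihood estimate $\hat{\underline{\theta}}$ of the parameter vector $\underline{\theta}$ can be obtained by maximizing the log-likelihood function in (17) numerically.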
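The log-likelihood (17) is straightforward to code as an objective for a numerical optimizer. The sketch below only evaluates (17); the function name, the argument layout and the toy data are hypothetical, and the choice of optimizer is left open.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def skew_ln_reg_loglik(theta, X, y):
    """Log-likelihood (17); theta = (lam, alpha, sigma, beta), rows of X start with 1."""
    lam, alpha, sigma, beta = theta
    total = len(y) * math.log(alpha * lam / sigma)
    for xi, yi in zip(X, y):
        mu_i = sum(b * v for b, v in zip(beta, xi))   # linear predictor x_i^T beta
        z = (yi - mu_i) / sigma
        Phi, Phi_c = STD.cdf(z), STD.cdf(-z)          # Phi_c = 1 - Phi(z)
        total += (math.log(STD.pdf(z))
                  + (lam * alpha - 1.0) * math.log(Phi)
                  + (lam - 1.0) * math.log(Phi_c)
                  - (alpha + 1.0) * math.log(Phi**lam + Phi_c**lam))
    return total

# Hypothetical toy data with one covariate, for illustration only.
X = [(1.0, 0.2), (1.0, 1.1), (1.0, 1.9), (1.0, 3.2)]
y = [1.1, 2.9, 5.2, 7.4]
beta, sigma = (0.8, 2.1), 0.9

# With alpha = lam = 1 the value should equal the Gaussian regression log-likelihood.
gauss = sum(math.log(NormalDist(beta[0] + beta[1] * xi[1], sigma).pdf(yi))
            for xi, yi in zip(X, y))
print(skew_ln_reg_loglik((1.0, 1.0, sigma, beta), X, y), gauss)
```

The comparison with the Gaussian log-likelihood reflects the first nested case in Remark 5.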

6. Applications

In this section, we apply the LN distribution and the generalized normal regression model to two real-life datasets. The first dataset possesses a bimodal shape, and the fit of the LN distribution is compared with that of the mixture normal distribution. For the second application, the skew-LN regression model is compared with some nested sub-models and some other generalizations of the normal regression model. The maximum likelihood method is used to estimate the model parameters.

6.1. Fitting LN Distribution to Buoys Data

In this subsection, the LN distribution is fitted to a bimodal dataset using the ML method. The dataset is obtained from the National Data Buoy Center (NDBC). It represents the number of buoys situated in the North East Pacific: Buoy 46005 (46° N, 131° W) for the time period 1 January 1983 to 31 December 2003. The data is available from [1]. The histogram in Figure 4 shows that the distribution of the data is bimodal; for this reason, we fitted both the LN and the mixture normal distributions to the dataset. The maximum likelihood estimates, the log-likelihood value, the AIC (Akaike Information Criterion) and the Kolmogorov-Smirnov (K-S) test statistic for the fitted distributions are reported in Table 2. Figure 4 displays both the empirical and the fitted cumulative distribution functions as well as the probability density functions for the fitted distributions. The results in Table 2 indicate that the LN distribution outperforms the mixture normal distribution. In fact, the fitted CDF in Figure 4 shows that the mixture normal distribution does not provide an adequate fit. The fact that the LN distribution has only three parameters gives it an extra advantage over the mixture normal distribution.

6.2. Modeling Real Estate Valuation Using the Generalized Normal Regression Model

The dataset contains historical data on the real estate market from June 2012 to May 2013. The data is obtained from Sindian District in New Taipei City, Taiwan (for additional details, see [32]). The data consist of $n = 414$ transaction records of real estate property. The data can be used to establish the relationship between housing price (per unit area) and its predictive regressors. The following variables are used (for $i = 1, 2, \dots, 414$). The response variable $y$ is the housing price per unit area (10,000 New Taiwan Dollars/Ping, where 1 Ping = 3.3 m²). The covariates are as follows: $x_{i1}$ is the transaction date (e.g., 2013.250 = 2013 March and 2013.500 = 2013 June), $x_{i2}$ is the house age (in years), $x_{i3}$ is the distance to the nearest MRT station (in meters), $x_{i4}$ is the number of convenience stores in the living circle on foot (integer), and $x_{i5}$ is the geographic coordinate, latitude (in degrees). The data are analyzed on the basis of the following skew-LN regression model:
$$y_i = \beta_0 + \beta_1 x_{i1}^{*} + \beta_2 x_{i2}^{*} + \beta_3 x_{i3}^{*} + \beta_4 x_{i4}^{*} + \beta_5 x_{i5}^{*} + \sigma Z_i, \quad i = 1, \dots, 414, \tag{18}$$
where the error terms $Z_i$ are independent random variables that are assumed to follow the skew-LN$(0, 1, \lambda, \alpha)$ distribution, and $x_{ij}^{*} = (x_{ij} - \bar{x}_j)/s_j$, $j = 1, 2, \dots, 5$, are the standardized covariates, which are used because the covariates are measured on different scales. Additionally, the fit under the skew-LN regression model is compared with several regression models, including the regression model based on the beta-normal (BN) distribution [7], the regression model based on the skew-normal (SN) distribution [26], and the extended normal (EN) regression model [28]. Furthermore, the skew-LN regression model is compared with its nested models, including the LN, Exp-N, and normal regression models. In this application, the model parameters are estimated using the maximum likelihood method, implemented in the SAS programming language. The initial values of $\beta_0, \dots, \beta_5$ and $\sigma$ are obtained from fitting the normal regression model to the data. The initial values of the other parameters are set to 1. Table 3 shows the MLE results of fitting the skew-LN, LN, Exp-N, SN, EN, and normal regression models to the data.
The fitted skew-LN and LN regression models show that the estimates of $\beta_0, \dots, \beta_5$ and $\sigma$ are significant at the 5% level. Table 4 presents the goodness-of-fit statistics, including AIC, consistent AIC (AICC) and the Bayesian information criterion (BIC). The goodness-of-fit statistics show that the skew-LN regression model outperforms the other regression models. We also notice that the LN regression model has the second-lowest values of AIC, AICC, and BIC. Hence, the skew-LN and LN regression models can be used effectively to analyze the real estate valuation data.
The likelihood ratio (LR) statistic is utilized to compare the skew-LN regression model with its sub-models: the normal, LN, and Exp-N regression models. The LR test statistic values and the corresponding p-values are given in Table 5. This table shows that the skew-LN regression model has a better fit when compared with its sub-models. The LN regression model also has a better fit when compared with the normal regression model.
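The LR comparison can be sketched as follows, assuming the usual asymptotic chi-square reference distribution with degrees of freedom equal to the number of restricted parameters (2 for the normal sub-model, 1 for the LN or Exp-N sub-models). The log-likelihood values below are hypothetical placeholders and are not the values behind Table 5; the function names are likewise illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

def lr_test(loglik_full, loglik_sub, df):
    """LR statistic 2 * (l_full - l_sub) and its asymptotic p-value."""
    stat = 2.0 * (loglik_full - loglik_sub)
    return stat, chi2_sf(max(stat, 0.0), df)

# Hypothetical maximized log-likelihoods, for illustration only.
ll_skew_ln, ll_ln, ll_normal = -1480.0, -1484.5, -1510.0

print(lr_test(ll_skew_ln, ll_normal, df=2))   # skew-LN versus normal (alpha = lam = 1)
print(lr_test(ll_skew_ln, ll_ln, df=1))       # skew-LN versus LN (alpha = 1)
```

Small p-values would favour the larger model over the restricted one.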

7. Concluding Remarks

In this paper, two generalizations of the normal distribution, namely the logistic-normal and skew logistic-normal distributions, were investigated. Several mathematical and structural properties, such as shape properties, have been studied. The proposed generalizations of the normal distribution exhibit great flexibility in modeling symmetric as well as skewed datasets. Moreover, new regression models based on both the logistic-normal and skew logistic-normal distributions were developed. Two real datasets were used to illustrate the applicability of the distributions and their regression models.
Future work could be devoted toward investigating other parameter estimation methods for the LN and the skew-LN distributions. The applicability of the skew-LN regression model to other fields could be further explored.

Author Contributions

Conceptualization, A.A. (Ayman Alzaatreh); Data curation, A.A. (Ayman Alzaatreh); Formal analysis, A.A. (Ayman Alzaatreh), M.A., A.A. (Ayanna Almagambetova) and N.Z.; Investigation, A.A. (Ayman Alzaatreh) and A.A. (Ayanna Almagambetova); Methodology, A.A. (Ayman Alzaatreh), M.A. and A.A. (Ayanna Almagambetova); Project administration, A.A. (Ayman Alzaatreh); Software, A.A. (Ayman Alzaatreh), M.A., A.A. (Ayanna Almagambetova) and N.Z.; Supervision, A.A. (Ayman Alzaatreh); Writing—original draft, A.A. (Ayman Alzaatreh), M.A., A.A. (Ayanna Almagambetova) and N.Z.; Writing—review & editing, A.A. (Ayman Alzaatreh), M.A. and A.A. (Ayanna Almagambetova). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by American University of Sharjah: AUS Open Access Fund.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors are grateful for the comments and suggestions by the referees and the handling Editor. Their comments and suggestions have greatly improved the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pearson, K. Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philos. Trans. R. Soc. Lond. Ser. A 1895, 186, 343–414.
  2. Burr, I.W. Cumulative frequency functions. Ann. Math. Stat. 1942, 13, 215–232.
  3. Johnson, N.L. Systems of frequency curves generated by methods of translation. Biometrika 1949, 36, 149–176.
  4. Hastings, J.C.; Mostseller, F.; Tukey, J.W.; Windsor, C. Low moments for small samples: A comparative study of order statistics. Ann. Stat. 1947, 18, 413–426.
  5. Tukey, J.W. The Practical Relationship Between the Common Transformations of Percentages of Counts and Amounts; Technical Report 36; Statistical Techniques Research Group, Princeton University: Princeton, NJ, USA, 1960.
  6. Lee, C.; Famoye, F.; Alzaatreh, A. Methods for generating families of univariate continuous distributions in the recent decades. WIREs Comput. Stat. 2013, 5, 219–238.
  7. Eugene, N.; Lee, C.; Famoye, F. The beta-normal distribution and its applications. Commun. Stat.-Theory Methods 2002, 31, 497–512.
  8. Famoye, F.; Lee, C.; Eugene, N. Beta-normal distribution: Bimodality properties and applications. J. Mod. Appl. Stat. Methods 2004, 3, 85–103.
  9. Rego, L.C.; Cintra, R.J.; Cordeiro, G.M. On some properties of the beta normal distribution. Commun. Stat. Theory Methods 2012, 41, 3722–3738.
  10. Nadarajah, S.; Kotz, S. The beta Gumbel distribution. Math. Probl. Eng. 2004, 4, 323–332.
  11. Barreto-Souza, W.; Cordeiro, G.M.; Simas, A.B. Some results for beta Frechet distribution. Commun. Stat. Theory Methods 2011, 40, 798–811.
  12. Cordeiro, G.M.; Simas, A.B.; Stosic, B.D. Closed form expressions for moments of the beta Weibull distribution. Ann. Braz. Acad. Sci. 2011, 83, 357–373.
  13. Lee, C.; Famoye, F.; Olumolade, O. Beta-Weibull distribution: Some properties and applications to censored data. J. Mod. Appl. Stat. Methods 2007, 6, 173–186.
  14. Wahed, A.S.; Luong, T.M.; Jong-Hyeon, J.J.H. A new generalization of Weibull distribution with application to a breast cancer dataset. Stat. Med. 2009, 28, 2077–2094.
  15. Akinsete, A.; Famoye, F.; Lee, C. The beta-Pareto distribution. Statistics 2008, 42, 547–563.
  16. De Morais, A.L. A Class of Generalized Beta Distributions, Pareto Power Series and Weibull Power Series Dissertation; Universidade Federal de Pernambuco: Recife, Brazil, 2009.
  17. Paranaiba, P.F.; Ortega, E.M.M.; Cordeiro, G.M.; Pescim, R.R. The beta Burr XII distribution with application to lifetime data. Comput. Stat. Data Anal. 2011, 55, 1118–1136.
  18. Jones, M.C. Kumaraswamy’s distribution: A beta type distribution with tractability advantages. Stat. Methodol. 2009, 6, 70–81.
  19. Cordeiro, G.M.; de Castro, M. A new family of generalized distributions. J. Stat. Comput. Simul. 2011, 81, 883–898.
  20. Alexander, C.; Cordeiro, G.M.; Ortega, E.M.M.; Sarabia, J.M. Generalized beta-generated distributions. Comput. Stat. Data Anal. 2012, 56, 1880–1897.
  21. Zografos, K.; Balakrishnan, N. On families of beta- and generalized gamma-generated distributions and associated inference. Stat. Methodol. 2009, 6, 344–362.
  22. Alzaatreh, A.; Lee, C.; Famoye, F. A new method for generating families of continuous distributions. Metron 2013, 71, 63–79.
  23. Aljarrah, M.A.; Lee, C.; Famoye, F. On generating T-X family of distributions using quantile functions. J. Stat. Distrib. Appl. 2014, 1, 2.
  24. Alzaatreh, A.; Famoye, F.; Lee, C. T-normal family of distribution: A new approach to generalize the normal distribution. J. Stat. Distrib. Appl. 2014, 1, 16.
  25. Alzaatreh, A.; Famoye, F.; Lee, C. The gamma-normal distribution: Properties and applications. Comput. Stat. Data Anal. 2014, 69, 67–80.
  26. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
  27. Gupta, R.C.; Gupta, R.D. Analyzing skewed data by power-normal model. Test 2008, 17, 197–210.
  28. Lima, M.C.S.; Cordeiro, G.M.; Ortega, E.M.M.; Nascimento, A.D.C. A new extended normal regression model: Simulations and applications. J. Stat. Distrib. Appl. 2019, 6, 7.
  29. Alzaatreh, A.; Sulieman, H. On fitting cryptocurrency log-return exchange rates. Empir. Econ. 2019, 15, 1–18.
  30. Moors, J.J.A. A quantile alternative for Kurtosis. Statistician 1988, 37, 25–32.
  31. Galton, F. Enquiries into Human Faculty and Its Development; Macmillan: London, UK, 1883.
  32. Yeh, I.C.; Hsu, T.K. Building real estate valuation models with comparative approach through case-based reasoning. Appl. Soft Comput. 2018, 65, 260–271.
Figure 1. The logistic-normal (LN) density for μ = 0, σ = 1, and various values of λ.
Figure 2. Plot of Moors’ kurtosis of the LN distribution for various values of λ. The dashed line represents Moors’ kurtosis of the standard normal distribution.
Figure 3. Three-dimensional plots of Galton’s skewness and Moors’ kurtosis for various values of α and λ.
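The quantile-based measures plotted in Figures 2 and 3 can be illustrated with a short sketch. It assumes the standard octile-based definitions of Galton's skewness and Moors' kurtosis (the exact formulas used in the paper are not shown in this section) and evaluates them for the standard normal distribution, which is the reference level drawn as the dashed line in Figure 2. The function names are illustrative only.

```python
from statistics import NormalDist


def galton_skewness(q):
    """Galton's skewness from a quantile function q (standard quartile form)."""
    return (q(6 / 8) - 2 * q(4 / 8) + q(2 / 8)) / (q(6 / 8) - q(2 / 8))


def moors_kurtosis(q):
    """Moors' kurtosis from a quantile function q (standard octile form)."""
    return (q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)) / (q(6 / 8) - q(2 / 8))


# Quantile function of the standard normal distribution.
q_normal = NormalDist(mu=0.0, sigma=1.0).inv_cdf

print("Galton skewness (standard normal):", galton_skewness(q_normal))
print("Moors kurtosis (standard normal):", round(moors_kurtosis(q_normal), 4))
```

Any other distribution can be plugged in by passing its quantile function in place of `q_normal`. Because both measures depend only on quantiles, they remain defined even when moments do not exist.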
Figure 4. Plots of fitted distributions for the Buoys dataset.
Table 1. Relative bias and standard deviation of the maximum likelihood estimates (MLEs) for the LN distribution.

| n | λ | μ | σ | Relative bias λ̂ | Relative bias μ̂ | Relative bias σ̂ | Std. dev. λ̂ | Std. dev. μ̂ | Std. dev. σ̂ |
|---|---|---|---|---|---|---|---|---|---|
| 30 | 0.5 | 2 | 1 | 1.2698 | 0.0276 | 0.8064 | 1.5928 | 0.2797 | 1.8965 |
| 50 | 0.5 | 2 | 1 | 0.6606 | 0.0256 | 0.4205 | 0.4651 | 0.2749 | 0.5992 |
| 70 | 0.5 | 2 | 1 | 0.3290 | 0.0140 | 0.1422 | 0.4005 | 0.2013 | 0.4485 |
| 30 | 1.5 | 2 | 1 | −0.1422 | 0.0101 | −0.1210 | 0.7309 | 0.1224 | 0.4959 |
| 50 | 1.5 | 2 | 1 | −0.0692 | 0.0339 | −0.1074 | 0.5927 | 0.1005 | 0.3494 |
| 70 | 1.5 | 2 | 1 | −0.0671 | 0.0087 | −0.0898 | 0.3460 | 0.0773 | 0.2153 |
| 30 | 2 | 3 | 1 | −0.3089 | 0.0113 | −0.3005 | 0.8190 | 0.0978 | 0.3418 |
| 50 | 2 | 3 | 1 | −0.3247 | 0.0083 | −0.2915 | 0.8095 | 0.0827 | 0.2212 |
| 70 | 2 | 3 | 1 | −0.3162 | 0.0076 | −0.2990 | 0.8007 | 0.0575 | 0.1379 |
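As a reading aid for Table 1, the following sketch shows how a relative bias and a standard deviation could be computed from a set of simulated estimates. It assumes the usual definition, relative bias = (mean of estimates − true value) / true value, and uses the sample standard deviation; the paper's exact conventions are not shown in this section. The list `lambda_hats` is made-up illustrative data, not output from the paper's simulation.

```python
from statistics import mean, stdev


def relative_bias(estimates, true_value):
    """Assumed definition: (average estimate - true value) / true value."""
    return (mean(estimates) - true_value) / true_value


# Hypothetical estimates of lambda from five replications (illustrative only).
true_lambda = 0.5
lambda_hats = [0.62, 0.41, 0.55, 0.48, 0.70]

print("relative bias:", relative_bias(lambda_hats, true_lambda))
print("standard deviation:", stdev(lambda_hats))
```

Under this definition, a positive relative bias means the parameter is overestimated on average and a negative one means it is underestimated.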
Table 2. Estimates of the parameters and goodness of fit measures for the Buoys data.

| Distribution | LN | Mixture Normal |
|---|---|---|
| Parameter Estimates | λ̂ = 0.2734 (0.3304); μ̂ = 10.57 (0.3145); σ̂ = 0.6507 (0.5051) | λ̂ = 0.5515 (0.2920); μ̂₁ = 8.6051 (0.6836); μ̂₂ = 11.4634 (0.6930); σ̂₁ = 1.4994 (0.7293); σ̂₂ = 1.0750 (0.3770) |
| Log-likelihood | 80.7 | 109.5 |
| AIC | 86.7 | 119.5 |
| K-S | 0.2273 | 0.6901 |
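The AIC values in Table 2 can be checked against the other rows with a small calculation. The sketch below assumes the standard formula AIC = −2 log L + 2k. On that assumption, the reported figures agree only if the row labelled "Log-likelihood" is read as −2 log L, with k = 3 parameters for LN and k = 5 for the mixture normal. Both the reading of that row and the parameter counts are inferences from the table, not statements from the paper.

```python
def aic_from_deviance(minus_two_loglik, n_params):
    """AIC = -2 log L + 2k, with the first argument taken to be -2 log L."""
    return minus_two_loglik + 2 * n_params


# Values transcribed from Table 2; parameter counts are read off the table.
fits = {
    "LN": {"minus_two_loglik": 80.7, "n_params": 3, "reported_aic": 86.7},
    "Mixture Normal": {"minus_two_loglik": 109.5, "n_params": 5, "reported_aic": 119.5},
}

for name, fit in fits.items():
    computed = aic_from_deviance(fit["minus_two_loglik"], fit["n_params"])
    print(f"{name}: computed AIC = {computed:.1f}, reported AIC = {fit['reported_aic']}")
```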
Table 3. MLEs of the parameters (SEs in parentheses) and p-values for the real estate valuation data. Each cell reads: estimate (SE); p-value. A dash marks a parameter that is not part of the model.

| Model | λ | α | a | b | σ | β₀ | β₁ | β₂ | β₃ | β₄ | β₅ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Skew-LN | 201.56 (11.3874); <0.0001 | 2.4560 (0.5421); <0.0001 | - | - | 1710.11 (1.3422); <0.0001 | 31.1581 (1.6737); <0.0001 | 0.7992 (0.3525); 0.0239 | −3.3125 (0.3765); <0.0001 | −5.0717 (0.5341); <0.0001 | 3.7339 (0.4609); <0.0001 | 2.6738 (0.4435); <0.0001 |
| LN | 251.26 (10.1697); <0.0001 | - | - | - | 1742.09 (1.4668); <0.0001 | 37.3380 (0.3675); <0.0001 | 1.0021 (0.3645); 0.0062 | −3.3672 (0.3825); <0.0001 | −5.0580 (0.5408); <0.0001 | 3.6095 (0.4784); <0.0001 | 2.9054 (0.4900); <0.0001 |
| Exp-N | - | - | - | 32.1513 (20.3311); 0.1146 | 16.9758 (1.7343); <0.0001 | 2.9407 (7.8762); 0.7091 | 0.8789 (0.3932); 0.0259 | −3.1964 (0.4112); <0.0001 | −5.0716 (0.5809); <0.0001 | 3.4727 (0.4988); <0.0001 | 2.8686 (0.4746); <0.0001 |
| EN | - | - | 1.3218 (1.7053); 0.4387 | 35.3119 (27.6631); 0.2025 | 19.2211 (11.1653); 0.0859 | 5.8752 (16.5371); 0.7226 | 0.8829 (0.3936); 0.0254 | −3.1970 (0.4110); <0.0001 | −5.0767 (0.5812); <0.0001 | 3.4745 (0.4987); <0.0001 | 2.8644 (0.4751); <0.0001 |
| BN | - | - | 111.13 (237.23); 0.6397 | 1.5116 (0.7873); 0.0556 | 23.9265 (11.4859); 0.0379 | −17.7314 (39.9566); 0.6574 | 0.8863 (0.3943); 0.0251 | −3.1978 (0.4112); <0.0001 | −5.0561 (0.5789); <0.0001 | 3.4673 (0.4993); <0.0001 | 2.8947 (0.4774); <0.0001 |
| SN | 2.3462 (0.2908); <0.0001 | - | - | - | 12.4444 (0.5943); <0.0001 | 29.1849 (0.5737); <0.0001 | 0.9097 (0.3949); 0.0217 | −3.1574 (0.4120); <0.0001 | −5.1793 (0.5934); <0.0001 | 3.5278 (0.5015); <0.0001 | 2.7417 (0.4729); <0.0001 |
| N | - | - | - | - | 8.7832 (0.3052); <0.0001 | 37.9803 (0.4317); <0.0001 | 1.4478 (0.4352); 0.0010 | −3.0689 (0.4350); <0.0001 | −5.4944 (0.6138); <0.0001 | 3.3466 (0.5486); <0.0001 | 2.8156 (0.5442); <0.0001 |
Table 4. Goodness of fit statistics for the real estate valuation data.

| Model | −log L | AIC | AICC | BIC |
|---|---|---|---|---|
| Skew-LN | 1433.5327 | 2885.0654 | 2885.5109 | 2921.2982 |
| LN | 1444.2295 | 2904.4590 | 2904.8146 | 2936.6659 |
| Exp-N | 1454.7811 | 2925.5622 | 2925.9178 | 2957.7691 |
| EN | 1454.7551 | 2927.5102 | 2927.9557 | 2963.7430 |
| BN | 1454.3973 | 2926.7946 | 2927.2401 | 2963.0274 |
| SN | 1458.6946 | 2933.3892 | 2933.7448 | 2965.5961 |
| N | 1486.9953 | 2987.9906 | 2988.2665 | 3016.1717 |
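The columns of Table 4 are tied together by the standard definitions of the information criteria. The sketch below recomputes AIC, AICC and BIC from the first numeric column, read as the negative log-likelihood. Two inputs are assumptions rather than values stated in this section: the parameter counts k, inferred from the non-empty cells of Table 3, and the sample size n = 414, which is the value consistent with the reported BIC figures.

```python
import math


def information_criteria(neg_loglik, k, n):
    """Return (AIC, AICc, BIC) from a negative log-likelihood, k parameters, n observations."""
    deviance = 2.0 * neg_loglik
    aic = deviance + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = deviance + k * math.log(n)
    return aic, aicc, bic


N_OBS = 414  # assumed sample size (not stated in this section)

# model: (negative log-likelihood from Table 4, assumed number of parameters)
models = {
    "Skew-LN": (1433.5327, 9),
    "LN": (1444.2295, 8),
    "Exp-N": (1454.7811, 8),
    "EN": (1454.7551, 9),
    "BN": (1454.3973, 9),
    "SN": (1458.6946, 8),
    "N": (1486.9953, 7),
}

for name, (neg_loglik, k) in models.items():
    aic, aicc, bic = information_criteria(neg_loglik, k, N_OBS)
    print(f"{name:8s} AIC={aic:.4f}  AICc={aicc:.4f}  BIC={bic:.4f}")
```

Lower values indicate a better trade-off between fit and complexity, so under all three criteria the Skew-LN model ranks first and the normal model last.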
Table 5. LR statistics for the real estate valuation data.

| Models | Hypothesis | LR Statistic | p-Value |
|---|---|---|---|
| Skew-LN vs. LN | H₀: α = 1 | 21.3936 | <0.0001 |
| Skew-LN vs. Exp-N | H₀: λ = 1 | 42.4968 | <0.0001 |
| Skew-LN vs. Normal | H₀: α = λ = 1 | 106.9252 | <0.0001 |
| LN vs. Normal | H₀: λ = 1 | 85.5316 | <0.0001 |
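The LR statistics in Table 5 follow from the fitted likelihoods as twice the difference between the negative log-likelihoods of the restricted and unrestricted models. The sketch below reproduces them from the Table 4 values, read as negative log-likelihoods. It assumes the usual chi-square reference distribution, with degrees of freedom equal to the number of restricted parameters, and uses the closed-form chi-square tail probabilities for 1 and 2 degrees of freedom so that only the standard library is needed. The helper name `lr_test` is illustrative.

```python
import math


def lr_test(neg_loglik_null, neg_loglik_alt, df):
    """Likelihood ratio statistic and chi-square p-value (df = 1 or 2 only)."""
    stat = 2.0 * (neg_loglik_null - neg_loglik_alt)
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    elif df == 2:
        p_value = math.exp(-stat / 2.0)  # P(chi2_2 > stat)
    else:
        raise ValueError("closed form implemented only for df = 1 or 2")
    return stat, p_value


# Negative log-likelihoods transcribed from Table 4.
nll = {"Skew-LN": 1433.5327, "LN": 1444.2295, "Exp-N": 1454.7811, "N": 1486.9953}

# (unrestricted model, restricted model, number of restricted parameters)
comparisons = [
    ("Skew-LN", "LN", 1),
    ("Skew-LN", "Exp-N", 1),
    ("Skew-LN", "N", 2),
    ("LN", "N", 1),
]

for full, reduced, df in comparisons:
    stat, p_value = lr_test(nll[reduced], nll[full], df)
    print(f"{full} vs. {reduced}: LR = {stat:.4f}, p = {p_value:.2e}")
```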
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Alzaatreh, A.; Aljarrah, M.; Almagambetova, A.; Zakiyeva, N. On the Regression Model for Generalized Normal Distributions. Entropy 2021, 23, 173. https://doi.org/10.3390/e23020173
