Article

The Log-Bimodal Asymmetric Generalized Gaussian Model with Application to Positive Data

by Guillermo Martínez-Flórez 1,*,†, Roger Tovar-Falón 1,*,† and Heleno Bolfarine 2
1 Departamento de Matemáticas y Estadística, Facultad de Ciencias Básicas, Universidad de Córdoba, Montería 230002, Colombia
2 Departamento de Estatística, Instituto de Matemática e Estatística (IME), Universidade de São Paulo, São Paulo 1010, Brazil
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2023, 11(16), 3587; https://doi.org/10.3390/math11163587
Submission received: 21 June 2023 / Revised: 20 July 2023 / Accepted: 20 July 2023 / Published: 19 August 2023
(This article belongs to the Special Issue Probability, Statistics & Symmetry)

Abstract

One of the most widely used probability distributions for positive data is the log-normal (LN). Although the LN distribution can accommodate many types of data, it does not always model the response of interest adequately, since in some cases the degree of skewness and/or kurtosis of the data is greater or less than what the LN distribution can capture. A further limitation is that the LN distribution only fits unimodal positive data, whereas in many applications, for example the behavior of materials, the data present more than one mode (bimodality). To fill this gap, this paper introduces a new probability distribution capable of fitting unimodal or bimodal positive data with a high or low degree of skewness and/or kurtosis. The new distribution is a generalization of the LN distribution. Its main properties are studied, and the parameters are estimated from a classical perspective using the maximum likelihood method. An important feature of this distribution is the non-singularity of the Fisher information matrix, which guarantees the use of asymptotic theory to study the properties of the parameter estimators. A Monte Carlo simulation study is carried out to evaluate the properties of the estimators and, finally, an illustration is presented with a data set on the concentration of nickel in soil samples, showing that the proposed distribution fits extremely well in certain situations.

1. Introduction

Until a few years ago, the modeling of positive data was limited to a few asymmetric distributions, such as the gamma, Weibull, exponential and log-normal (LN) models. In geochemistry, the fundamental law enunciated by Ahrens [1], “the concentration of a chemical element in a rock has a logarithm-normal distribution”, made the LN distribution one of the most used for modeling the concentration of chemical elements. The LN distribution, denoted by $\mathrm{LN}(\mu, \sigma^2)$, is defined through the transformation $Y = \exp(X)$, where $X \sim N(\mu, \sigma^2)$; equivalently, $\log(Y) \sim N(\mu, \sigma^2)$. The model has wide applicability in studies of the survival time of materials in the engineering sciences and in some economic studies.
Another type of asymmetric distribution used for fitting positive data is the log-skew-normal (LSN) model introduced by Azzalini et al. [2], which is an extension of the original skew-normal (SN) distribution proposed by Azzalini [3]. From the SN model, numerous families of asymmetric distributions have been introduced and studied in detail, for example, the symmetric-asymmetric family of distributions with probability density function (PDF) given by
$$\varphi_A(y; \lambda) = 2 f(y)\, G(\lambda y), \quad y \in \mathbb{R}, \tag{1}$$
where $\lambda \in \mathbb{R}$, $f$ is a PDF symmetric around zero, $G$ is an absolutely continuous symmetric distribution function, and $\lambda$ is a parameter that controls the asymmetry. The SN model is the special case of (1) obtained by letting $f = \phi$ and $G = \Phi$, the PDF and the cumulative distribution function (CDF) of the standard normal distribution. The location-scale version of the SN model follows by applying the linear transformation $Z = \xi + \eta Y$, where $\xi \in \mathbb{R}$ and $\eta > 0$. This is denoted by $\mathrm{SN}(\xi, \eta, \lambda)$, and the standard case by $\mathrm{SN}(\lambda)$. Additional works focused on the study of the SN distribution were carried out by Azzalini and Dalla Valle [4], Henze [5] and Chiogna [6], among others.
An extension of the SN model for fitting positive data, denoted by $\mathrm{LSN}(\xi, \eta, \lambda)$, was introduced by Azzalini et al. [2]. This extension has PDF given by
$$f_{\mathrm{LSN}}(y; \xi, \eta, \lambda) = \frac{2}{\eta y}\, \phi\!\left(\frac{\log(y) - \xi}{\eta}\right) \Phi\!\left(\lambda\, \frac{\log(y) - \xi}{\eta}\right), \quad y \in \mathbb{R}^+, \tag{2}$$
where $\phi$ and $\Phi$ are the PDF and CDF of the standard normal distribution, respectively, and $\lambda$ is a parameter which controls the asymmetry and kurtosis in the model. The LSN distribution is commonly used for modeling data with skewness and kurtosis coefficients greater than those the LN distribution can fit. Notice that, letting $\lambda = 0$ in Equation (2), the LSN reduces to the LN model.
As an alternative to the SN model, Durrans [7] introduced another type of asymmetric distribution, called the fractional order statistics model, which much of the statistical literature refers to as the alpha-power (AP) model. Properties of the AP model were studied by Gupta and Gupta [8] and Pewsey et al. [9]. The AP model has PDF given by
$$\varphi_F(y; \alpha) = \alpha f(y) \{F(y)\}^{\alpha - 1}, \quad y \in \mathbb{R}, \tag{3}$$
where $\alpha \in \mathbb{R}^+$ is a parameter that controls the skewness and kurtosis of the distribution, and $F$ is an absolutely continuous distribution function with PDF $f = dF$. In the particular case of $F = \Phi$ in (3), we say that the random variable $Y$ follows the generalized Gaussian (or power-normal) distribution, denoted by $Y \sim \mathrm{PN}(\alpha)$. The extension of the location-scale version of the PN model, denoted by $\mathrm{PN}(\xi, \eta, \alpha)$, to the case of positive data was proposed by Martínez-Flórez et al. [10], who defined the log-power-normal (LPN) model whose PDF is given by
$$\varphi_{\mathrm{LPN}}(y; \xi, \eta, \alpha) = \frac{\alpha}{\eta y}\, \phi\!\left(\frac{\log(y) - \xi}{\eta}\right) \left\{\Phi\!\left(\frac{\log(y) - \xi}{\eta}\right)\right\}^{\alpha - 1}, \quad y \in \mathbb{R}^+, \tag{4}$$
where $\xi \in \mathbb{R}$ and $\eta > 0$ are parameters of location and scale, respectively. This model is denoted by $Y \sim \mathrm{LPN}(\xi, \eta, \alpha)$. The LPN model contains the LN model as a special case when $\alpha = 1$.
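The reductions to the LN model stated for Equations (2) and (4) can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names (`ln_pdf`, `lsn_pdf`, `lpn_pdf`) and the evaluation points are illustrative assumptions, not part of the original work.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ln_pdf(y, xi, eta):
    """Log-normal density with location xi and scale eta on the log scale."""
    z = (math.log(y) - xi) / eta
    return phi(z) / (eta * y)

def lsn_pdf(y, xi, eta, lam):
    """Log-skew-normal density, Equation (2)."""
    z = (math.log(y) - xi) / eta
    return 2.0 / (eta * y) * phi(z) * Phi(lam * z)

def lpn_pdf(y, xi, eta, alpha):
    """Log-power-normal density, Equation (4)."""
    z = (math.log(y) - xi) / eta
    return alpha / (eta * y) * phi(z) * Phi(z) ** (alpha - 1.0)

# Hypothetical evaluation points and parameter values, chosen for illustration.
points = [0.2, 1.0, 3.5]
gap_lsn = max(abs(lsn_pdf(y, 0.3, 0.8, 0.0) - ln_pdf(y, 0.3, 0.8)) for y in points)
gap_lpn = max(abs(lpn_pdf(y, 0.3, 0.8, 1.0) - ln_pdf(y, 0.3, 0.8)) for y in points)
print(gap_lsn, gap_lpn)
```

Both gaps should be zero up to floating-point rounding, since $\Phi(0) = 1/2$ cancels the factor 2 in (2) and the power term in (4) equals one when $\alpha = 1$.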
Contrary to the LSN model, in which the information matrix is singular for the case $\lambda = 0$, Martínez-Flórez et al. [10] showed that the LPN model has a non-singular information matrix when $\alpha = 1$. Thus, in the LPN model, statistical inference based on large-sample theory can be carried out, and normality (the case $\alpha = 1$) can be tested with the likelihood-ratio statistic evaluated at the maximum likelihood estimators (MLEs) of the model parameters. The LPN model was also studied by Martínez-Flórez et al. [11].
In the statistical literature, it is well known that the SN and PN models are restricted to unimodal data sets; therefore, extensions of these two models to bimodal situations have been considered by some authors. For example, Arnold et al. [12] defined the bimodal model known as “the extended normal-asymmetric two-pieces model” (ETN), whose PDF is given by
$$f_{\mathrm{ETN}}(y; \lambda, \beta) = 2 c_\lambda\, \phi(y)\, \Phi(\lambda |y|)\, \Phi(\beta y), \quad y \in \mathbb{R}, \tag{5}$$
where $\lambda, \beta \in \mathbb{R}$, and $c_\lambda$ is a normalizing constant. For the ETN model, Arnold et al. [12] showed that the information matrix is singular when $\lambda = \beta = 0$.
Concerning bimodal positive data, Bolfarine et al. [13] introduced the log-bimodal-skew-normal model, denoted by $Y \sim \mathrm{LBSN}(\xi, \eta, \lambda, \beta)$. The LBSN model is an extension of the SN distribution and is adequate for modeling bimodal positive data. The PDF of the LBSN model is given by
$$\varphi_{\mathrm{LBSN}}(y; \xi, \eta, \lambda, \beta) = \frac{2}{y \eta} \left(\frac{1 + \beta x^2}{1 + \beta}\right) \phi(x)\, \Phi(\lambda x), \quad y \in \mathbb{R}^+, \tag{6}$$
where $x = (\log(y) - \xi)/\eta$, with $\xi, \lambda \in \mathbb{R}$ and $\beta, \eta \in \mathbb{R}^+$.
Recently, Bolfarine et al. [14] extended the unimodal generalized Gaussian model of Durrans [7] to situations of bimodal asymmetric data by considering the alternative of the ETN model developed by Arnold et al. [12]. The extension of Bolfarine et al. [14], denoted by $\mathrm{ABPN}(\beta, \alpha)$, has density function given by
$$\varphi(y; \beta, \alpha) = 2 \alpha c_\alpha\, \phi(y) \{\Phi(|y|)\}^{\alpha - 1}\, \Phi(\beta y), \quad y \in \mathbb{R}, \tag{7}$$
where $\phi$ and $\Phi$ are the PDF and CDF of the standard normal distribution, respectively, and $c_\alpha = 2^{\alpha - 1}/(2^{\alpha} - 1)$, with $\alpha \in \mathbb{R}^+$ and $\beta \in \mathbb{R}$. Bolfarine et al. [14] showed that the PDF in (7) is bimodal and asymmetric for $\alpha > 1$ and certain values of $\beta$, and unimodal for $\alpha \leq 1$. Therefore, the ABPN model can be used for fitting data with a high degree of asymmetry and bimodality. For the ABPN model, the authors also showed that the information matrix is non-singular in the neighborhood of $\alpha = 1$, contrary to the case of the ETN model of Arnold et al. [12], whose information matrix is singular for the case $\lambda = \beta = 0$. Hence, normality can be tested in the ABPN model by using large-sample theory for the MLEs together with the likelihood-ratio statistic.
In the current literature, there are few distributions for fitting bimodal positive data; therefore, this work proposes a new distribution that is adequate for fitting data with this behavior. The proposed model generalizes the fundamental law of geochemistry of [1] and, in addition, is more flexible than the LN, LSN and LPN models, which are contained as special cases.
The rest of the paper is organized as follows: Section 2 introduces the log-bimodal asymmetric generalized Gaussian model and studies its main properties. The moments, the score functions, and the observed and expected information matrices are obtained. Inference for the parameters of the model is carried out using the maximum likelihood method. The results of a simulation study and their discussion are presented in Section 3. In Section 4, an application to a real data set consisting of samples of nickel concentration in soil is presented to illustrate the use of the new model. Finally, a discussion about the proposed model is presented in Section 5.

2. The Log-Uni-Bi-Modal Asymmetric Generalized Gaussian Model

In this section, we introduce a new model which generalizes some distributions already known in the statistical literature for fitting positive data. This model contains two parameters that make it more flexible than the LN, LSN and LPN models, and is obtained by considering the alternative two-piece skew-normal model for bimodal asymmetric data of Arnold et al. [12].
Definition 1.
If the random variable $Y$ is distributed with density function given by
$$\varphi(y; \xi, \eta, \beta, \alpha) = \frac{k_\alpha}{\eta y}\, \phi\!\left(\frac{\log(y) - \xi}{\eta}\right) \left\{\Phi\!\left(\frac{|\log(y) - \xi|}{\eta}\right)\right\}^{\alpha - 1} \Phi\!\left(\beta\, \frac{\log(y) - \xi}{\eta}\right), \tag{8}$$
for $y \in \mathbb{R}^+$, where $\xi, \beta \in \mathbb{R}$, $\alpha, \eta \in \mathbb{R}^+$, and $k_\alpha = \alpha 2^{\alpha}/(2^{\alpha} - 1)$ is the normalizing constant, then $Y$ follows a log-bimodal asymmetric generalized Gaussian distribution, also called the log-bimodal asymmetric power-normal (LABPN) distribution, with parameter $\theta = (\xi, \eta, \beta, \alpha)$. We use the notation $Y \sim \mathrm{LABPN}(\xi, \eta, \beta, \alpha)$.
In density function (8), $\beta$ is an asymmetry parameter, $\alpha$ is a shape parameter, $\xi$ is a location parameter and $\eta$ is a scale parameter.
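As an illustration of Definition 1, the density can be coded directly and its normalization checked by numerical integration. This is a minimal Python sketch under stated assumptions: the density is taken with the factor $1/(\eta y)$ (consistent with the $-\log(\eta)$ term of the log-likelihood in Section 2.2), the names `labpn_pdf` and `total_mass` are hypothetical, and the parameter values are arbitrary.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def labpn_pdf(y, xi, eta, beta, alpha):
    """LABPN density at y > 0, with k_alpha = alpha * 2^alpha / (2^alpha - 1)."""
    z = (math.log(y) - xi) / eta
    k_alpha = alpha * 2.0 ** alpha / (2.0 ** alpha - 1.0)
    return (k_alpha / (eta * y)) * phi(z) * Phi(abs(z)) ** (alpha - 1.0) * Phi(beta * z)

def total_mass(xi, eta, beta, alpha, z_lo=-10.0, z_hi=10.0, steps=20000):
    """Trapezoidal integral of the density after substituting y = exp(xi + eta*z)."""
    h = (z_hi - z_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = z_lo + i * h
        y = math.exp(xi + eta * z)
        weight = 0.5 if i in (0, steps) else 1.0
        # dy = eta * y * dz under the substitution
        total += weight * labpn_pdf(y, xi, eta, beta, alpha) * eta * y
    return total * h

mass_bimodal = total_mass(xi=0.25, eta=1.0, beta=1.0, alpha=2.5)
mass_unimodal = total_mass(xi=1.25, eta=0.5, beta=-2.0, alpha=0.5)
print(mass_bimodal, mass_unimodal)
```

Both integrals should be close to one for any admissible $\beta$, because $\Phi(\beta z) + \Phi(-\beta z) = 1$ makes the normalizing constant independent of $\beta$.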
Although the new distribution is a little more complicated than the existing methodologies, this does not constitute a limitation in its applicability. On the one hand, the main benefit of this new proposal is the possibility of fitting positive data that present bimodality and a high degree of asymmetry and kurtosis that cannot be captured by existing models in the current literature. On the other hand, the existence of statistical packages today facilitates their implementation in practical terms.
From Definition 1, some special cases of the LABPN model follow by fixing specific values of the parameters. For example, when $\alpha = 1$ and $\beta = 0$ the LN model follows; for $\alpha = 1$, the LSN model is obtained; and, if $\beta = 0$ and $(\log(y) - \xi)/\eta > 0$, the LPN model follows. Finally, when $\alpha = 2$, $\log(Y)$ follows the ETN model with $\lambda = 1$. These results are presented in the following properties:
Property 1.
If $Y \sim \mathrm{LABPN}(\xi, \eta, \beta, \alpha)$, then $\mathrm{LABPN}(\xi, \eta, 0, 1) = \mathrm{LN}(\xi, \eta^2)$, where LN denotes the log-normal distribution.
Property 2.
If $Y \sim \mathrm{LABPN}(\xi, \eta, \beta, \alpha)$, then $\mathrm{LABPN}(\xi, \eta, \beta, 1) = \mathrm{LSN}(\xi, \eta, \beta)$, where LSN denotes the log-skew-normal distribution; see Azzalini et al. [2].
Property 3.
If $Y \sim \mathrm{LABPN}(\xi, \eta, \beta, \alpha)$ and $\log(Y) > \xi$ for all $Y \in \mathbb{R}^+$, then $\mathrm{LABPN}(\xi, \eta, 0, \alpha) = \mathrm{LPN}(\xi, \eta, \alpha)$, where LPN denotes the log-power-normal distribution; see Martínez-Flórez et al. [10].
Property 4.
If $Y \sim \mathrm{LABPN}(\xi, \eta, \beta, 2)$, then $\log(Y)$ follows the location-scale version of the ETN distribution with location $\xi$, scale $\eta$ and $\lambda = 1$, where ETN denotes the extended two-piece model of Arnold et al. [12].
Differentiating the density of the standardized logarithm $z = (\log(y) - \xi)/\eta$, with $z \neq 0$, we find that the derivative is null at
$$\alpha - 1 = \frac{|z|\, \Phi(|z|)}{\phi(z)} \left[1 - \frac{1}{z^2}\, \frac{\beta z\, \phi(\beta z)}{\Phi(\beta z)}\right], \quad y > 0.$$
Now, for $y > 0$ such that $-\infty < z < 0$, we have
$$\lim_{\beta \to \infty} \frac{\beta z\, \phi(\beta z)}{\Phi(\beta z)} = -\infty,$$
and for $y > 0$ such that $0 < z < \infty$, we have
$$\lim_{\beta \to -\infty} \frac{\beta z\, \phi(\beta z)}{\Phi(\beta z)} = -\infty.$$
Therefore, for $\alpha > 1$ and certain values of $-\infty < \beta < \infty$ satisfying
$$0 < 1 - \frac{1}{z^2}\, \frac{\beta z\, \phi(\beta z)}{\Phi(\beta z)} < \infty,$$
we have a log-bimodal model. Figure 1 reveals how the parameters $\alpha$ and $\beta$ control the skewness, kurtosis and shape of the LABPN model.
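The role of $\alpha$ in producing bimodality can be illustrated by counting local maxima of the density of the standardized logarithm $Z = (\log(Y) - \xi)/\eta$ on a grid. The sketch below is only an illustration under assumptions: it works on the logarithmic scale, uses the symmetric case $\beta = 0$, and the helper names and grid settings are hypothetical.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def std_log_density(z, beta, alpha):
    """Density of Z = (log(Y) - xi)/eta when Y follows the LABPN model."""
    k_alpha = alpha * 2.0 ** alpha / (2.0 ** alpha - 1.0)
    return k_alpha * phi(z) * Phi(abs(z)) ** (alpha - 1.0) * Phi(beta * z)

def count_modes(beta, alpha, z_lo=-6.0, z_hi=6.0, steps=2400):
    """Count strict local maxima of the density on an equally spaced grid."""
    h = (z_hi - z_lo) / steps
    values = [std_log_density(z_lo + i * h, beta, alpha) for i in range(steps + 1)]
    return sum(
        1
        for i in range(1, steps)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )

modes_alpha_3 = count_modes(beta=0.0, alpha=3.0)     # alpha > 1
modes_alpha_1 = count_modes(beta=0.0, alpha=1.0)     # normal case on the log scale
modes_alpha_half = count_modes(beta=0.0, alpha=0.5)  # alpha < 1
print(modes_alpha_3, modes_alpha_1, modes_alpha_half)
```

With $\beta = 0$ the density is symmetric around zero, so for $\alpha > 1$ the two modes sit at $\pm z^*$, while for $\alpha \leq 1$ the single mode is at $z = 0$. For $\beta \neq 0$ the same function can be used to explore which combinations of $\alpha$ and $\beta$ retain two modes.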

2.1. Moments

The moments of a random variable $Y$ with LABPN distribution do not have an explicit form; however, the $r$th moment for the standard case of the LABPN model, that is, $\xi = 0$ and $\eta = 1$, denoted by $\mathrm{LABPN}(\beta, \alpha)$, can be obtained from the formula
$$\mu_r = E(Y^r) = E(e^{rZ}) = k_\alpha \int_0^\infty \left\{ e^{rz}\, \Phi(\beta z) + e^{-rz}\, \Phi(-\beta z) \right\} \phi(z) \{\Phi(z)\}^{\alpha - 1}\, dz,$$
where $z = \log(y)$, $y > 0$. Since $Y = \exp(\xi + \eta Z)$ in the location-scale case, the $r$th moment of $Y \sim \mathrm{LABPN}(\xi, \eta, \beta, \alpha)$ can be obtained from the expression
$$E(Y^r) = e^{r \xi}\, \mu_{r \eta},$$
where $\mu_{r\eta}$ is the integral above evaluated at $r\eta$ in place of $r$.
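Because the moments have no closed form, they have to be computed numerically. The following minimal Python sketch integrates $e^{rz}$ against the density of $Z = \log(Y)$ from Definition 1 and uses the identity $E(Y^r) = e^{r\xi} E(e^{r\eta Z})$, which follows from $Y = \exp(\xi + \eta Z)$. The function names and integration limits are assumptions made for illustration; the log-normal case, where $E(Y^r) = \exp(r\xi + r^2\eta^2/2)$, serves as a check.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def standard_moment(r, beta, alpha, z_lo=-12.0, z_hi=14.0, steps=26000):
    """E(exp(r*Z)) for the standard case xi = 0, eta = 1 (trapezoidal rule)."""
    k_alpha = alpha * 2.0 ** alpha / (2.0 ** alpha - 1.0)
    h = (z_hi - z_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = z_lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = k_alpha * phi(z) * Phi(abs(z)) ** (alpha - 1.0) * Phi(beta * z)
        total += weight * math.exp(r * z) * density
    return total * h

def labpn_moment(r, xi, eta, beta, alpha):
    """E(Y^r) = exp(r*xi) * E(exp(r*eta*Z)) for the location-scale case."""
    return math.exp(r * xi) * standard_moment(r * eta, beta, alpha)

# Check against the log-normal special case (alpha = 1, beta = 0).
ln_check = labpn_moment(1, xi=0.3, eta=0.8, beta=0.0, alpha=1.0)
mean_example = labpn_moment(1, xi=0.25, eta=1.0, beta=1.0, alpha=2.5)
print(ln_check, math.exp(0.3 + 0.5 * 0.8 ** 2), mean_example)
```

The upper integration limit is kept moderately large because the factor $e^{rz}$ shifts the mass of the integrand to the right; for large $r\eta$ it would need to be increased.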
The following result is similar to that given for the LN, LSN and LPN models.
Property 5.
For all $\alpha \in \mathbb{R}^+$, the moment-generating function (MGF) of the random variable $Y \sim \mathrm{LABPN}(\xi, \eta, \beta, \alpha)$ does not exist.
Proof.
The result is known for $\alpha = 1$ and $\beta = 0$, since this corresponds to the case of the LN model. For $\alpha = 1$, the LSN model follows, and for $\beta = 0$ and $z > 0$, with $y = \exp(z) \in \mathbb{R}^+$, we have the case of the LPN model.
For $0 < t < a < \infty$ (without loss of generality, we take $\xi = 0$ and $\eta = 1$), we have by definition
$$\begin{aligned} M_Y(t) = E(e^{tY}) &= k_\alpha \int_0^\infty \frac{e^{ty}}{y}\, \phi(\log(y)) \{\Phi(|\log(y)|)\}^{\alpha - 1}\, \Phi(\beta \log(y))\, dy \\ &= k_\alpha \int_0^1 \frac{1}{y}\, e^{ty} \phi(\log(y)) \{1 - \Phi(\log(y))\}^{\alpha - 1}\, \Phi(\beta \log(y))\, dy \\ &\quad + k_\alpha \int_1^\infty \frac{1}{y}\, e^{ty} \phi(\log(y)) \{\Phi(\log(y))\}^{\alpha - 1}\, \Phi(\beta \log(y))\, dy. \end{aligned}$$
Letting $h_\beta(y) = \frac{2}{y}\, e^{ty} \phi(\log(y))\, \Phi(\beta \log(y))$, it follows for $\beta > 0$ that
$$\liminf_{y \to \infty} \Phi(\beta \log(y)) \geq \frac{1}{2},$$
see Lin and Stoyanov [15]. In addition, by letting $\alpha = \alpha_0$, we get
$$0 < 2\alpha_0 \{\Phi(\log(y))\}^{\alpha_0 - 1} < k_{\alpha_0} \{\Phi(\log(y))\}^{\alpha_0 - 1},$$
then, if $y \to \infty$, it follows that $h_\beta(y) \to \infty$ and $0 < k_{\alpha_0} \{\Phi(\log(y))\}^{\alpha_0 - 1} \to k_{\alpha_0}$, where it is obtained that $0 < 2\alpha_0 < k_{\alpha_0}$; therefore, for
$$A(\alpha) = k_\alpha \int_1^\infty \frac{1}{y}\, e^{ty} \phi(\log(y)) \{\Phi(\log(y))\}^{\alpha - 1}\, \Phi(\beta \log(y))\, dy,$$
we conclude that $A(\alpha_0) = \infty$. On the other hand, following Arnold and Lin [16],
$$\lim_{y \to -\infty} \frac{\log(\Phi(y))}{y^2} = -\frac{1}{2},$$
we have for $\beta < 0$ and $y \to \infty$ the approximation
$$\log \Phi(\beta \log(y)) \approx -\frac{1}{2} (\beta \log(y))^2;$$
therefore, when $y \to \infty$, we have for fixed $\alpha$ that
$$\log\left(\tfrac{1}{2}\, k_{\alpha_0} h_\beta(y) \{\Phi(\log(y))\}^{\alpha_0 - 1}\right) \approx -\tfrac{1}{2} \log(2\pi) + \log(k_{\alpha_0}) - \log(y) + t y - \tfrac{1}{2} (1 + \beta^2) (\log(y))^2 + (\alpha_0 - 1) \log \Phi(\log(y)) \to \infty,$$
from which we can conclude that $A(\alpha_0) = \infty$. Thus, for all $\alpha \in \mathbb{R}^+$ and $\beta \in \mathbb{R}$, the random variable $Y$ does not have an MGF when $t > 0$. Therefore, $M_Y(t)$ does not exist, and the proof is completed. □

2.2. Statistical Inference

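Given a random sample of size $n$, $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)$, with $Y_i \sim \mathrm{LABPN}(\xi, \eta, \beta, \alpha)$, the log-likelihood function for the parameter $\theta = (\xi, \eta, \beta, \alpha)$ can be written, up to an additive constant that does not depend on $\theta$, as
$$\ell(\theta; \mathbf{y}) = n \left[\log(\alpha) + \log(c_\alpha) - \log(\eta)\right] + \sum_{i=1}^n \log \phi(z_i) + (\alpha - 1) \sum_{i=1}^n \log \Phi(|z_i|) + \sum_{i=1}^n \log \Phi(\beta z_i),$$
where $z_i = (\log(y_i) - \xi)/\eta$, for $i = 1, \ldots, n$, and $c_\alpha = 2^{\alpha - 1}/(2^{\alpha} - 1)$, so that $k_\alpha = 2 \alpha c_\alpha$. The corresponding score functions, obtained by taking the first derivatives of the log-likelihood function, are given by
$$\begin{aligned} U(\xi) &= \frac{1}{\eta} \sum_{i=1}^n z_i - \frac{\alpha - 1}{\eta} \sum_{i=1}^n \operatorname{sgn}(z_i)\, \frac{\phi(|z_i|)}{\Phi(|z_i|)} - \frac{\beta}{\eta} \sum_{i=1}^n \frac{\phi(\beta z_i)}{\Phi(\beta z_i)} = 0, \\ U(\eta) &= -\frac{n}{\eta} + \frac{1}{\eta} \sum_{i=1}^n z_i^2 - \frac{\alpha - 1}{\eta} \sum_{i=1}^n |z_i|\, \frac{\phi(|z_i|)}{\Phi(|z_i|)} - \frac{\beta}{\eta} \sum_{i=1}^n z_i\, \frac{\phi(\beta z_i)}{\Phi(\beta z_i)} = 0, \\ U(\beta) &= \sum_{i=1}^n z_i\, \frac{\phi(\beta z_i)}{\Phi(\beta z_i)} = 0, \\ U(\alpha) &= \frac{n}{\alpha} - \frac{n \log 2}{1 - 2^{-\alpha}} + \sum_{i=1}^n \log\left(2 \Phi(|z_i|)\right) = 0. \end{aligned}$$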
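The score functions can be checked against numerical derivatives of the log-likelihood. The sketch below is an illustration in plain Python: the data vector and the parameter values are hypothetical, the function names are assumptions, and the log-likelihood omits terms that do not depend on $\theta$.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def loglik(theta, ys):
    """LABPN log-likelihood, up to a constant not depending on theta."""
    xi, eta, beta, alpha = theta
    log_c = (alpha - 1.0) * math.log(2.0) - math.log(2.0 ** alpha - 1.0)
    total = len(ys) * (math.log(alpha) + log_c - math.log(eta))
    for y in ys:
        z = (math.log(y) - xi) / eta
        total += (math.log(phi(z)) + (alpha - 1.0) * math.log(Phi(abs(z)))
                  + math.log(Phi(beta * z)))
    return total

def score(theta, ys):
    """Analytic score vector (U_xi, U_eta, U_beta, U_alpha)."""
    xi, eta, beta, alpha = theta
    n = len(ys)
    u_xi = u_eta = u_beta = 0.0
    u_alpha = n / alpha - n * math.log(2.0) / (1.0 - 2.0 ** (-alpha))
    for y in ys:
        z = (math.log(y) - xi) / eta
        w = phi(z) / Phi(abs(z))             # phi(|z|) = phi(z)
        w1 = phi(beta * z) / Phi(beta * z)
        sgn = (z > 0) - (z < 0)
        u_xi += (z - (alpha - 1.0) * sgn * w - beta * w1) / eta
        u_eta += (-1.0 + z * z - (alpha - 1.0) * abs(z) * w - beta * z * w1) / eta
        u_beta += z * w1
        u_alpha += math.log(2.0 * Phi(abs(z)))
    return [u_xi, u_eta, u_beta, u_alpha]

def numeric_gradient(theta, ys, h=1e-6):
    """Central finite differences of the log-likelihood."""
    grad = []
    for j in range(len(theta)):
        up = list(theta); up[j] += h
        down = list(theta); down[j] -= h
        grad.append((loglik(up, ys) - loglik(down, ys)) / (2.0 * h))
    return grad

data = [3.1, 8.4, 12.0, 15.7, 21.3, 40.2]   # hypothetical positive observations
theta0 = [2.5, 0.7, 0.8, 1.8]               # hypothetical (xi, eta, beta, alpha)
analytic = score(theta0, data)
numeric = numeric_gradient(theta0, data)
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
print(max_gap)
```

In practice the MLEs are obtained by passing a function such as `loglik` to a numerical optimizer; the agreement between `analytic` and `numeric` is only a sanity check of the expressions.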

2.3. Observed Information Matrix

The elements of the observed information matrix are obtained by multiplying by $-1$ the second partial derivatives of the log-likelihood function with respect to the parameters, using the expression
$$k_{\theta_r, \theta_p} = -\frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \theta_r\, \partial \theta_p}, \quad r, p = 1, 2, 3, 4,$$
with $\theta_1 = \xi$, $\theta_2 = \eta$, $\theta_3 = \beta$ and $\theta_4 = \alpha$. These elements are presented in detail in Appendix A.

2.4. Expected Information Matrix

Under the assumption that the regularity conditions are satisfied, the elements of the expected information matrix can be calculated by multiplying by $n^{-1}$ the expected value of the corresponding elements of the observed information matrix, that is,
$$I_{\theta_r \theta_p} = -n^{-1} E\left[\frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \theta_r\, \partial \theta_p}\right], \quad r, p = 1, 2, 3, 4.$$
In Appendix A, the explicit expressions of the elements of the expected information matrix are presented. The expectations involved in the components of the information matrix must be calculated numerically. In the particular case where $\alpha = 1$ and $\beta = 0$, so that
$$\varphi(y; \xi, \eta, 0, 1) = \frac{1}{y \eta}\, \phi\!\left(\frac{\log(y) - \xi}{\eta}\right)$$
is the density function of the location-scale version of the LN model, the information matrix, with the parameters ordered as $(\xi, \eta, \alpha, \beta)$, becomes
$$I(\theta) = \begin{pmatrix} \dfrac{1}{\eta^2} & 0 & \dfrac{E[\operatorname{sgn}(Z) W]}{\eta} & \dfrac{1}{\eta} \sqrt{\dfrac{2}{\pi}} \\ 0 & \dfrac{2}{\eta^2} & \dfrac{E[\operatorname{sgn}(Z) Z W]}{\eta} & 0 \\ \dfrac{E[\operatorname{sgn}(Z) W]}{\eta} & \dfrac{E[\operatorname{sgn}(Z) Z W]}{\eta} & 1 - 2 (\log 2)^2 & 0 \\ \dfrac{1}{\eta} \sqrt{\dfrac{2}{\pi}} & 0 & 0 & \dfrac{2}{\pi} \end{pmatrix},$$
whose determinant satisfies $|I(\theta)| \neq 0$; thus the information matrix is non-singular in the neighborhood of $\alpha = 1$ and $\beta = 0$, that is, for the LN model. This is not the case for the model of Azzalini and Dalla Valle [4], for which the Fisher information matrix is singular in the neighborhood of $\lambda = 0$. Furthermore, the upper sub-matrix of size $2 \times 2$ corresponds to the Fisher information matrix of the LN model. Therefore, for large $n$,
$$\hat{\theta} \xrightarrow{D} N_4\left(\theta, I(\theta)^{-1}\right),$$
so $\hat{\theta}$ is consistent and has an asymptotic normal distribution with covariance matrix $I(\theta)^{-1}$. Inferences based on confidence intervals and hypothesis tests for the location, scale and shape parameters can be carried out using the large-sample properties of the MLE.
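A large-sample (Wald) confidence interval follows directly from the asymptotic normality of the MLE. The short Python sketch below assumes that an estimate and its standard error are already available, for example from the inverse of the observed information matrix; the function name and the numerical values are hypothetical.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Two-sided Wald interval: estimate +/- z_{(1+level)/2} * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical estimate and standard error, for illustration only.
lower, upper = wald_interval(1.0, 0.1)
print(round(lower, 3), round(upper, 3))
```

For strictly positive parameters such as $\eta$ and $\alpha$, an interval computed on the logarithmic scale and then transformed back is a common alternative that respects the parameter space.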

3. Simulation Study

This section presents a Monte Carlo simulation study, which was carried out with the objective of evaluating the behavior of the maximum likelihood estimators for the LABPN distribution.
For this simulation, the maxLik function of the R statistical software [17], version 4.2.3, was used, and data were generated from the LABPN distribution with parameter values $\xi = 1.0$ and $\eta = 0.5$ and different values of the parameters $\beta$ and $\alpha$.
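The text does not describe how the LABPN samples were generated. One possible exact sampler, offered here as an assumption rather than as the authors' procedure, uses two facts that follow from Definition 1: for $Z = (\log(Y) - \xi)/\eta$, the quantity $\Phi(|Z|)^{\alpha}$ is uniform on $(2^{-\alpha}, 1)$, and, given $|Z| = t$, the sign of $Z$ is positive with probability $\Phi(\beta t)$. A minimal Python sketch, with hypothetical function name and seed:

```python
import math
import random
from statistics import NormalDist

def sample_labpn(n, xi, eta, beta, alpha, rng):
    """Draw n values from LABPN(xi, eta, beta, alpha) by inversion plus a random sign."""
    std = NormalDist()
    p0 = 2.0 ** (-alpha)
    upper = 1.0 - 2.0 ** -53   # largest float below 1, keeps inv_cdf finite
    out = []
    for _ in range(n):
        # Phi(|Z|)^alpha is uniform on (2^-alpha, 1)
        q = (p0 + rng.random() * (1.0 - p0)) ** (1.0 / alpha)
        q = min(max(q, 0.5), upper)
        t = std.inv_cdf(q)                      # t = |Z| >= 0
        sign = 1.0 if rng.random() < std.cdf(beta * t) else -1.0
        out.append(math.exp(xi + eta * sign * t))
    return out

rng = random.Random(2023)
ln_like = sample_labpn(20000, xi=1.0, eta=0.5, beta=0.0, alpha=1.0, rng=rng)
mean_log = sum(math.log(y) for y in ln_like) / len(ln_like)
skewed = sample_labpn(5000, xi=1.0, eta=0.5, beta=50.0, alpha=2.0, rng=rng)
share_above = sum(1 for y in skewed if math.log(y) > 1.0) / len(skewed)
print(mean_log, share_above)
```

With $\alpha = 1$ and $\beta = 0$ the sampler reduces to the log-normal case, so the mean of $\log(Y)$ should be close to $\xi$; with a large positive $\beta$ almost all of the mass lies above $e^{\xi}$.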
For each scenario, 5000 random samples from the LABPN distribution were generated with sample sizes $n = 40, 80, 150, 200$ and $500$. The bias and the mean square error (MSE) were used as quality measures to evaluate the behavior of the MLEs. These measures were calculated as
$$\mathrm{Bias}(\hat{\delta}^{(j)}) = \frac{1}{5000} \sum_{i=1}^{5000} \left(\hat{\delta}_i^{(j)} - \delta^{(j)}\right),$$
and
$$\mathrm{MSE}(\hat{\delta}^{(j)}) = \frac{1}{5000} \sum_{i=1}^{5000} \left(\hat{\delta}_i^{(j)} - \delta^{(j)}\right)^2,$$
where $\hat{\delta}_i^{(j)}$ is the estimate of $\delta^{(j)}$ for the $i$th sample. The results of the simulation study are presented in Table 1.
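The two quality measures are simple averages over the Monte Carlo replicates. A minimal Python sketch, using a hypothetical list of three estimates instead of the 5000 replicates of the study:

```python
def bias(estimates, true_value):
    """Average deviation of the replicate estimates from the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def mse(estimates, true_value):
    """Average squared deviation of the replicate estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Hypothetical replicate estimates of a parameter whose true value is 1.0.
replicates = [1.1, 0.9, 1.2]
print(bias(replicates, 1.0), mse(replicates, 1.0))
```

The MSE equals the variance of the replicate estimates plus the squared bias, so a decreasing MSE with a small bias reflects a shrinking sampling variance.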
From the table, it can be seen that, in general, as the sample size increases, the bias and the MSE of all parameter estimators decrease and approach zero. Thus, the MLEs behave as consistent estimators, and large-sample theory can be used to perform interval estimation of the parameters as well as hypothesis testing based on likelihood-ratio statistics.

4. Application to the Nickel Content in the Soil Data

For the illustration, we use a data set which was previously analyzed by Bolfarine et al. [13], who fitted the LBSN model. The data consist of 86 measurements of nickel content (Ni, in $\mu\mathrm{g}\, \mathrm{g}^{-1}$) in soil samples analyzed at the Department of Mines of the University of Atacama, Chile. The descriptive statistics for this data set are: $n = 86$, mean $= 21.337$, standard deviation $= 16.639$, skewness $= 2.440$, and kurtosis $= 12.0443$. For this same data set, the skewness and kurtosis coefficients of the logarithm of the nickel content for the 86 samples were also calculated, which were $0.4490$ and $3.7344$, respectively; therefore, the assumption that the nickel content follows an LN model (that is, that its logarithm is normally distributed) is inadequate.
Bolfarine et al. [13] found that the LBSN model fitted the nickel content data better than the LN model. As an alternative to the LBSN model, we fitted the LABPN model. For comparison with our proposal, we also fitted the LN model and the LSN model of Azzalini et al. [2]. The MLEs of the fitted models, which were obtained numerically using the optim function of R [17], are presented in Table 2 with the respective standard errors in parentheses.
To compare the fitted models, we computed the estimated values of the AIC of Akaike [18], given by $\mathrm{AIC} = -2 \hat{\ell}(\cdot) + 2k$, the BIC of Schwarz [19], given by $\mathrm{BIC} = -2 \hat{\ell}(\cdot) + \log(n)\, k$, and the modified AIC criterion [20], typically called the consistent AIC, namely, $\mathrm{CAIC} = -2 \hat{\ell}(\cdot) + (1 + \log(n))\, k$, where $k$ is the number of parameters of the considered model. The best model is the one with the smallest AIC (or BIC or CAIC). According to the AIC, BIC and CAIC statistics, the best models are the LBSN and LABPN.
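The three criteria are straightforward to compute from a maximized log-likelihood. The Python sketch below uses the rounded log-likelihood values that appear in the likelihood-ratio computation of this section ($-332.5$ for the LSN model and $-328.8$ for the LABPN model) with $n = 86$; because these inputs are rounded, the resulting criteria are approximate and may differ from the values in Table 2. The function name is an assumption.

```python
import math

def information_criteria(loglik, k, n):
    """Return (AIC, BIC, CAIC) for a model with k parameters fitted to n observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + math.log(n) * k
    caic = -2.0 * loglik + (1.0 + math.log(n)) * k
    return aic, bic, caic

n_obs = 86
lsn = information_criteria(-332.5, k=3, n=n_obs)     # LSN has parameters (xi, eta, lambda)
labpn = information_criteria(-328.8, k=4, n=n_obs)   # LABPN has (xi, eta, beta, alpha)
print([round(v, 2) for v in lsn])
print([round(v, 2) for v in labpn])
```

The CAIC always exceeds the BIC by exactly $k$, so the two criteria penalize additional parameters more heavily than the AIC whenever $\log(n) > 2$.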
The bimodality hypothesis can also be formally tested through the system of hypotheses
$$H_0: \alpha = 1 \quad \text{versus} \quad H_1: \alpha \neq 1,$$
which is equivalent to comparing the LSN and LABPN models. Since the Fisher information matrix is non-singular, we used the likelihood-ratio statistic, namely,
$$\Lambda_1 = \frac{L_{\mathrm{LSN}}(\hat{\xi}, \hat{\eta}, \hat{\beta})}{L_{\mathrm{LABPN}}(\hat{\xi}, \hat{\eta}, \hat{\beta}, \hat{\alpha})},$$
where $L(\cdot)$ is the likelihood function.
After substituting the estimated values of the parameters, we have
$$-2 \log(\Lambda_1) = -2 (-332.5 + 328.8) = 7.4,$$
which is greater than the 5% critical value of the chi-squared distribution, $\chi^2_{1, 95\%} = 3.84$. This result leads to the rejection of the null hypothesis; therefore, we conclude that the LABPN model fits the nickel concentration data better than the LSN model.
To compare the LN model with the LABPN model, we considered the system of hypotheses
$$H_0: (\alpha, \beta) = (1, 0) \quad \text{versus} \quad H_1: (\alpha, \beta) \neq (1, 0),$$
which can be tested by using the likelihood-ratio statistic given by
$$\Lambda_2 = \frac{L_{\mathrm{LN}}(\hat{\xi}, \hat{\eta})}{L_{\mathrm{LABPN}}(\hat{\xi}, \hat{\eta}, \hat{\alpha}, \hat{\beta})}.$$
Considering the estimated values, we get $-2 \log(\Lambda_2) = 10$, which is greater than the critical value of the chi-squared distribution with two degrees of freedom, $\chi^2_{2, 95\%} = 5.99$. Again, we reject the null hypothesis $H_0$, and we conclude that the LABPN model fits the nickel content data better than the LN model.
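The two likelihood-ratio tests can be reproduced from the reported quantities. The Python sketch below computes the statistic from the rounded log-likelihoods and the corresponding p-values using closed forms of the chi-squared survival function that hold for one and two degrees of freedom; the function names are assumptions, and only these two cases are handled.

```python
import math

def lr_statistic(loglik_null, loglik_alt):
    """-2 log(Lambda) for nested models."""
    return -2.0 * (loglik_null - loglik_alt)

def chi2_sf(x, df):
    """Chi-squared survival function, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")

# LSN versus LABPN, using the rounded log-likelihoods reported in the text.
stat_1 = lr_statistic(-332.5, -328.8)
p_1 = chi2_sf(stat_1, df=1)

# LN versus LABPN, using the reported value of the statistic.
stat_2 = 10.0
p_2 = chi2_sf(stat_2, df=2)

print(round(stat_1, 2), p_1, p_2)
```

Both p-values fall below 0.05, in agreement with the comparison against the critical values 3.84 and 5.99.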
Now, to compare the LBSN and LABPN models, it is necessary to consider a test for non-nested models. Thus, we suppose that $f(y_i | x_i, \theta)$ and $g(y_i | x_i, \beta)$ are the corresponding non-nested densities to be compared. To test the hypothesis of no differences between these densities, that is,
$$H_0: E\left[\log \frac{f(y_i | x_i, \theta)}{g(y_i | x_i, \beta)}\right] = 0,$$
Vuong [21] proposed the likelihood-ratio statistic given by
$$T_{LR,NN} = \frac{1}{\sqrt{n}}\, \frac{LR(\hat{\theta}, \hat{\beta})}{\sqrt{\hat{\omega}^2}},$$
where
$$\hat{\omega}^2 = \frac{1}{n} \sum_{i=1}^n \left[\log \frac{f(y_i | x_i, \hat{\theta})}{g(y_i | x_i, \hat{\beta})}\right]^2 - \left[\frac{1}{n} \sum_{i=1}^n \log \frac{f(y_i | x_i, \hat{\theta})}{g(y_i | x_i, \hat{\beta})}\right]^2$$
is an estimator for the variance of $\frac{1}{\sqrt{n}} LR(\hat{\theta}, \hat{\beta})$. One can show that, under $H_0$, if $n \to \infty$, then
$$T_{LR,NN} \xrightarrow{D} N(0, 1).$$
The null hypothesis of equivalence of the models is rejected at significance level $\delta$ in favor of the LABPN model having a better fit (or a worse fit) than the LBSN model if $T_{LR,NN} > z_{\delta/2}$ (or $T_{LR,NN} < -z_{\delta/2}$). For the nickel concentration data, we obtained $T_{LR,NN} = 0.54$, which is less than the critical value $z_{0.025} = 1.96$; therefore, there are no statistically significant differences between the LBSN and LABPN models. In this way, the LABPN model is a useful alternative for fitting the nickel concentration data. Figure 2 shows the fitted densities and Q-Q plots for the LSN and LABPN models. These plots also show evidence that the LABPN model fits better than the other considered models. The Q-Q plot in Figure 2c also shows that the LABPN model has a good fit.
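The Vuong statistic only requires the pointwise log-densities of the two fitted models. The following Python sketch assumes that $LR(\hat{\theta}, \hat{\beta})$ is the sum of the pointwise log-density differences, as in Vuong [21]; the function names and the four log-density values are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import NormalDist

def vuong_statistic(logf, logg):
    """T = LR / (sqrt(n) * omega), with LR the sum of pointwise log-density differences."""
    n = len(logf)
    diffs = [a - b for a, b in zip(logf, logg)]
    lr = sum(diffs)
    mean_diff = lr / n
    omega2 = sum(d * d for d in diffs) / n - mean_diff ** 2
    return lr / (math.sqrt(n) * math.sqrt(omega2))

def vuong_decision(t_stat, level=0.05):
    """Compare T with the two-sided standard normal critical value."""
    z_crit = NormalDist().inv_cdf(1.0 - level / 2.0)
    if t_stat > z_crit:
        return "first model preferred"
    if t_stat < -z_crit:
        return "second model preferred"
    return "no significant difference"

# Hypothetical pointwise log-densities of two fitted models at four observations.
logf = [-1.0, -2.0, -1.5, -0.5]
logg = [-1.2, -1.8, -1.6, -0.9]
t_example = vuong_statistic(logf, logg)
print(round(t_example, 4), vuong_decision(t_example), vuong_decision(0.54))
```

Applied to the reported value $T_{LR,NN} = 0.54$, the decision rule returns no significant difference at the 5% level, matching the conclusion in the text.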

5. Concluding Remarks

In this work, a new family of parametric distributions capable of fitting unimodal and bimodal positive data is introduced, and its main properties are studied. The new family is obtained by considering the models of Arnold et al. [12] and Durrans [7], and it extends some existing models in the literature, among them the log-normal, log-skew-normal, log-bimodal-skew-normal and log-bimodal-power-normal models. The new distribution is also very flexible and can fit unimodal and bimodal data with high (or low) degrees of skewness and kurtosis. To obtain the estimates of the parameters in the model, a classical approach was considered, using the maximum likelihood method together with iterative Newton–Raphson algorithms for the optimization of the likelihood function. The score functions were presented and the Fisher information matrix was shown to be non-singular, which allows statistical inference to be carried out through large-sample theory and likelihood-ratio statistics. The applicability of the new family was illustrated with a data set on nickel content in soil samples. The results showed a better fit of the proposed family compared to other existing models in the literature.

Author Contributions

Conceptualization, G.M.-F. and R.T.-F.; methodology, G.M.-F., R.T.-F. and H.B.; software, G.M.-F. and R.T.-F.; validation, G.M.-F., R.T.-F. and H.B.; formal analysis, G.M.-F. and R.T.-F.; investigation, G.M.-F. and R.T.-F.; resources, G.M.-F. and R.T.-F.; data curation, G.M.-F. and R.T.-F.; writing—original draft preparation, G.M.-F. and R.T.-F.; writing—review and editing, G.M.-F., R.T.-F. and H.B.; visualization, G.M.-F. and R.T.-F.; supervision, G.M.-F. and R.T.-F.; project administration, G.M.-F. and R.T.-F.; funding acquisition, G.M.-F. and R.T.-F. All authors have read and agreed to the published version of the manuscript.

Funding

The research of G. Martínez-Flórez and R. Tovar-Falón was supported by Fondo de Investigación de la Vicerrectoría de Investigación, Universidad de Córdoba, Colombia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Details about available data are given in Section 4.

Acknowledgments

G. Martínez-Flórez and R. Tovar-Falón acknowledge the support given by Universidad de Córdoba, Montería, Colombia.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Information Matrix Elements

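If the elements of the observed information matrix are denoted by $k_{\xi\xi}, k_{\eta\xi}, \ldots, k_{\beta\beta}, \ldots, k_{\alpha\alpha}$, then
$$\begin{aligned} k_{\xi\xi} &= \frac{n}{\eta^2} + \frac{\alpha - 1}{\eta^2} \sum_{i=1}^n w_i \left(w_i + |z_i|\right) + \frac{\beta^2}{\eta^2} \sum_{i=1}^n w_{1i} \left(\beta z_i + w_{1i}\right), \\ k_{\eta\xi} &= \frac{2n}{\eta^2}\, \bar{z} + \frac{\alpha - 1}{\eta^2} \sum_{i=1}^n \operatorname{sgn}(z_i)\, w_i \left[-1 + |z_i| \left(w_i + |z_i|\right)\right] - \frac{\beta}{\eta^2} \sum_{i=1}^n w_{1i} \left[1 - \beta z_i \left(\beta z_i + w_{1i}\right)\right], \\ k_{\beta\xi} &= \frac{1}{\eta} \sum_{i=1}^n w_{1i} \left[1 - \beta z_i \left(\beta z_i + w_{1i}\right)\right], \\ k_{\eta\eta} &= -\frac{n}{\eta^2} + \frac{3}{\eta^2} \sum_{i=1}^n z_i^2 + \frac{\alpha - 1}{\eta^2} \sum_{i=1}^n |z_i|\, w_i \left[-2 + |z_i| \left(w_i + |z_i|\right)\right] + \frac{\beta}{\eta^2} \sum_{i=1}^n z_i w_{1i} \left[-2 + \beta z_i \left(\beta z_i + w_{1i}\right)\right], \\ k_{\beta\eta} &= \frac{1}{\eta} \sum_{i=1}^n z_i w_{1i} \left[1 - \beta z_i \left(\beta z_i + w_{1i}\right)\right], \\ k_{\beta\beta} &= \sum_{i=1}^n z_i^2\, w_{1i} \left(\beta z_i + w_{1i}\right), \\ k_{\alpha\xi} &= \frac{1}{\eta} \sum_{i=1}^n \operatorname{sgn}(z_i)\, w_i, \qquad k_{\alpha\eta} = \frac{1}{\eta} \sum_{i=1}^n |z_i|\, w_i, \qquad k_{\alpha\beta} = 0, \\ k_{\alpha\alpha} &= \frac{n}{\alpha^2} - \frac{n\, 2^{\alpha} (\log 2)^2}{(2^{\alpha} - 1)^2}, \end{aligned}$$
where $w_i = \phi(|z_i|)/\Phi(|z_i|)$ and $w_{1i} = \phi(\beta z_i)/\Phi(\beta z_i)$. After making some calculations, the elements of the expected information matrix are given by
$$\begin{aligned} I_{\xi\xi} &= \frac{1}{\eta^2} + \frac{\alpha - 1}{\eta^2} \left[E(|Z| W) + E(W^2)\right] + \frac{\beta^2}{\eta^2} \left[\beta E(Z W_1) + E(W_1^2)\right], \\ I_{\xi\eta} &= \frac{2}{\eta^2} E(Z) + \frac{\alpha - 1}{\eta^2} \left[E(\operatorname{sgn}(Z) Z^2 W) + E(Z W^2) - E(\operatorname{sgn}(Z) W)\right] + \frac{\beta}{\eta^2} \left[\beta^2 E(Z^2 W_1) + \beta E(Z W_1^2) - E(W_1)\right], \\ I_{\xi\alpha} &= \frac{1}{\eta} E\left[\operatorname{sgn}(Z) W\right], \\ I_{\xi\beta} &= -\frac{1}{\eta} \left[\beta^2 E(Z^2 W_1) + \beta E(Z W_1^2) - E(W_1)\right], \\ I_{\eta\eta} &= -\frac{1}{\eta^2} + \frac{3}{\eta^2} E(Z^2) + \frac{\alpha - 1}{\eta^2} \left[-2 E(|Z| W) + E(|Z|^3 W) + E(Z^2 W^2)\right] + \frac{\beta}{\eta^2} \left[\beta^2 E(Z^3 W_1) + \beta E(Z^2 W_1^2) - 2 E(Z W_1)\right], \\ I_{\eta\alpha} &= \frac{1}{\eta} E(|Z| W), \\ I_{\eta\beta} &= -\frac{1}{\eta} \left[\beta^2 E(Z^3 W_1) + \beta E(Z^2 W_1^2) - E(Z W_1)\right], \\ I_{\beta\beta} &= \beta E(Z^3 W_1) + E(Z^2 W_1^2), \\ I_{\beta\alpha} &= 0, \\ I_{\alpha\alpha} &= \frac{1}{\alpha^2} - \frac{2^{\alpha} (\log 2)^2}{(2^{\alpha} - 1)^2}, \end{aligned}$$
where $W = \phi(|Z|)/\Phi(|Z|)$ and $W_1 = \phi(\beta Z)/\Phi(\beta Z)$.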

References

  1. Ahrens, L.H. The lognormal distribution of the elements (A fundamental law of geochemistry and its subsidiary). Geochim. Cosmochim. Acta 1954, 5, 49–73. [Google Scholar] [CrossRef]
  2. Azzalini, A.; Cappello, D.; Kotz, S. Log-skew-normal and log-skew-t distributions as models for family income data. J. Income Distrib. 2002, 11, 12–20. [Google Scholar] [CrossRef]
  3. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. [Google Scholar]
  4. Azzalini, A.; Dalla Valle, A. The multivariate skew-normal distribution. Biometrika 1996, 83, 715–726. [Google Scholar] [CrossRef]
  5. Henze, N. A probabilistic representation of the skew-normal distribution. Scand. J. Stat. 1986, 13, 271–275. [Google Scholar]
  6. Chiogna, M. Some results on the scalar Skew-normal distribution. J. Ital. Stat. Soc. 1998, 1, 1–14. [Google Scholar] [CrossRef]
  7. Durrans, S.R. Distributions of fractional order statistics in hydrology. Water Resour. Res. 1992, 28, 1649–1655. [Google Scholar] [CrossRef]
  8. Gupta, R.C.; Gupta, R.D. Analyzing skewed data by power normal model. Test 2008, 17, 197–210. [Google Scholar] [CrossRef]
  9. Pewsey, A.; Gómez, H.W.; Bolfarine, H. Likelihood-based inference for power distributions. Test 2012, 21, 775–789. [Google Scholar] [CrossRef]
  10. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. The log-power-normal distribution with application to air pollution. Environmetrics 2014, 25, 44–56. [Google Scholar] [CrossRef]
  11. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Asymmetric regression models with limited responses with an application to antibody response to vaccine. Biom. J. 2013, 55, 156–172. [Google Scholar] [CrossRef]
  12. Arnold, B.C.; Gómez, H.W.; Salinas, H.S. On Multiple Constraint Skewed Models. Stat. J. Theor. Appl. Stat. 2009, 43, 279–293. [Google Scholar] [CrossRef]
  13. Bolfarine, H.; Gómez, H.W.; Rivas, L. The log-bimodal-skew-normal model. A geochemical application. J. Chemom. 2011, 25, 329–332. [Google Scholar] [CrossRef]
  14. Bolfarine, H.; Martínez-Flórez, G.; Salinas, H.S. Bimodal symmetric-asymmetric power-normal families. Commun. Stat. Theory Methods 2018, 47, 259–276. [Google Scholar] [CrossRef]
  15. Lin, G.D.; Stoyanov, J. The logarithmic Skew-Normal distributions are Moment-Indeterminate. J. Appl. Probab. Trust. 2009, 46, 909–916. [Google Scholar] [CrossRef]
  16. Arnold, B.C.; Lin, G.D. Characterization of the skew-normal and generalized chi distributions. Sankhyā 2004, 66, 593–606. [Google Scholar]
  17. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: http://www.R-project.org (accessed on 31 May 2023).
  18. Akaike, H. A new look at statistical model identification. IEEE Trans. Autom. Control. 1974, 19, 716–722.
  19. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  20. Bozdogan, H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika 1987, 52, 345–370.
  21. Vuong, Q.H. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 1989, 57, 307–333.
Figure 1. PDF of the LABPN model with parameter θ, for: (a) θ = (0.25, 1.0, 1.0, α) with α = 3.25 (solid line), 2.5 (dashed line), 1.75 (dotted line); (b) θ = (0.25, 1.0, 1.0, α) with α = 0.25 (solid line), 0.5 (dashed line), 0.75 (dotted line); (c) θ = (1.25, 0.5, 1.0, α) with α = 3.25 (solid line), 2.5 (dashed line), 1.75 (dotted line); (d) θ = (1.25, 0.5, 1.0, α) with α = 0.25 (solid line), 0.5 (dashed line), 0.75 (dotted line); (e) θ = (2.25, 1.0, 1.0, α) with α = 3.25 (solid line), 2.5 (dashed line), 1.75 (dotted line) and (f) θ = (2.25, 1.0, 1.0, α) with α = 0.25 (solid line), 0.5 (dashed line), 0.75 (dotted line).
Figure 2. (a) Histogram of the nickel concentration variable with densities fitted by maximum likelihood: LN (dot-dashed line), LSN (dotted line), LBSN (dashed line) and LABPN (solid line); (b) QQ-plot for the LSN model; (c) QQ-plot for the LABPN model.
Table 1. Bias and mean square error (MSE) of maximum likelihood estimates.
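| n | Estimator | Bias (β = 0.5, α = 2.5) | MSE (β = 0.5, α = 2.5) | Bias (β = 0.5, α = 2.5) | MSE (β = 0.5, α = 2.5) | Bias (β = 1.0, α = 4.0) | MSE (β = 1.0, α = 4.0) | Bias (β = 1.0, α = 4.0) | MSE (β = 1.0, α = 4.0) |
|---|---|---|---|---|---|---|---|---|---|
| 40 | ξ̂ | −0.0204 | 0.0097 | 0.0214 | 0.0111 | −0.0454 | 0.0096 | 0.0453 | 0.0105 |
| | η̂ | −0.0553 | 0.0055 | −0.0528 | 0.0057 | −0.0602 | 0.0063 | −0.0607 | 0.0063 |
| | β̂ | 0.0536 | 0.0771 | −0.0286 | 0.0994 | −0.0507 | 0.1639 | 0.0551 | 0.1720 |
| | α̂ | 0.9402 | 2.0237 | 0.9276 | 2.0091 | 0.6232 | 1.8534 | 0.6793 | 2.0535 |
| 80 | ξ̂ | −0.0196 | 0.0060 | 0.0207 | 0.0064 | −0.0440 | 0.0065 | 0.0444 | 0.0064 |
| | η̂ | −0.0458 | 0.0036 | −0.0453 | 0.0036 | −0.0554 | 0.0044 | −0.0556 | 0.0044 |
| | β̂ | 0.0253 | 0.0480 | −0.0278 | 0.0505 | −0.0360 | 0.0710 | 0.0237 | 0.0605 |
| | α̂ | 0.5918 | 0.8999 | 0.5726 | 0.8761 | 0.3439 | 0.8127 | 0.3104 | 0.7778 |
| 150 | ξ̂ | −0.0189 | 0.0046 | 0.0205 | 0.0041 | −0.0438 | 0.0053 | 0.0439 | 0.0051 |
| | η̂ | −0.0435 | 0.0029 | −0.0427 | 0.0029 | −0.0547 | 0.0039 | −0.0533 | 0.0038 |
| | β̂ | 0.0233 | 0.0338 | −0.0246 | 0.0321 | −0.0226 | 0.0452 | 0.0234 | 0.0469 |
| | α̂ | 0.4946 | 0.5760 | 0.4644 | 0.5701 | 0.2630 | 0.5238 | 0.2539 | 0.5615 |
| 200 | ξ̂ | −0.0179 | 0.0036 | 0.0203 | 0.0033 | −0.0421 | 0.0042 | 0.0403 | 0.0041 |
| | η̂ | −0.0414 | 0.0025 | −0.0418 | 0.0026 | −0.0530 | 0.0035 | −0.0521 | 0.0034 |
| | β̂ | 0.0202 | 0.0263 | −0.0224 | 0.0245 | −0.0164 | 0.0339 | 0.0202 | 0.0360 |
| | α̂ | 0.4294 | 0.4353 | 0.4104 | 0.4471 | 0.2315 | 0.4298 | 0.2464 | 0.3938 |
| 500 | ξ̂ | −0.0176 | 0.0019 | 0.0201 | 0.0019 | −0.0398 | 0.0027 | 0.0383 | 0.0024 |
| | η̂ | −0.0401 | 0.0019 | −0.0409 | 0.0020 | −0.0496 | 0.0028 | −0.0492 | 0.0027 |
| | β̂ | 0.0155 | 0.0126 | −0.0180 | 0.0123 | −0.0103 | 0.0151 | 0.0113 | 0.0137 |
| | α̂ | 0.3399 | 0.2192 | 0.3457 | 0.2251 | 0.1605 | 0.1672 | 0.1559 | 0.1709 |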
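The bias and MSE columns of Table 1 are Monte Carlo summaries: for each sample size, the model is simulated repeatedly, refitted by maximum likelihood, and the estimates are compared with the true parameter values. The sketch below illustrates that computation only. It assumes a plain log-normal model as a stand-in, because its ML estimates have a closed form; it does not implement the LABPN density or reproduce the values in Table 1. The function names, the true values `xi=1.0` and `eta=0.5`, the number of replications and the seed are all illustrative assumptions.

```python
import math
import random


def lognormal_mle(sample):
    """Closed-form ML estimates (xi_hat, eta_hat) for a log-normal sample.

    Stand-in model only: the LABPN likelihood has no closed-form maximiser
    and would need a numerical optimiser instead.
    """
    logs = [math.log(x) for x in sample]
    n = len(logs)
    xi_hat = sum(logs) / n
    # The ML estimate of the scale divides by n, not n - 1.
    eta_hat = math.sqrt(sum((v - xi_hat) ** 2 for v in logs) / n)
    return xi_hat, eta_hat


def bias_and_mse(estimates, true_value):
    """Empirical bias and mean square error over Monte Carlo replications."""
    r = len(estimates)
    bias = sum(estimates) / r - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return bias, mse


def simulate(n, xi=1.0, eta=0.5, replications=1000, seed=2023):
    """Return {'xi': (bias, mse), 'eta': (bias, mse)} for samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xi_hats, eta_hats = [], []
    for _ in range(replications):
        sample = [rng.lognormvariate(xi, eta) for _ in range(n)]
        xi_hat, eta_hat = lognormal_mle(sample)
        xi_hats.append(xi_hat)
        eta_hats.append(eta_hat)
    return {"xi": bias_and_mse(xi_hats, xi), "eta": bias_and_mse(eta_hats, eta)}


if __name__ == "__main__":
    # Same sample sizes as Table 1; the model and true values are assumptions.
    for n in (40, 80, 150, 200, 500):
        res = simulate(n)
        print(
            f"n={n:3d}  "
            f"xi: bias={res['xi'][0]:+.4f} mse={res['xi'][1]:.4f}  "
            f"eta: bias={res['eta'][0]:+.4f} mse={res['eta'][1]:.4f}"
        )
```

As in Table 1, the MSE of each estimator in this toy setting should shrink as n grows. For the four-parameter model itself, `lognormal_mle` would have to be replaced by a numerical maximisation of the LABPN log-likelihood.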
Table 2. Estimated parameters (standard error) for the LN, LSN, LBSN and LABPN models.
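| Parameter | LN | LSN | LBSN | LABPN |
|---|---|---|---|---|
| ξ | 2.828 (0.078) | 3.486 (0.154) | 1.784 (0.163) | 1.689 (0.098) |
| η | 0.728 (0.055) | 0.979 (0.128) | 0.778 (0.077) | 0.937 (0.071) |
| α | – | – | 4.958 (3.471) | 6.111 (1.170) |
| β | – | −1.596 (0.587) | 1.253 (0.296) | 1.567 (0.324) |
| AIC | 671.6 | 671.0 | 665.6 | 665.6 |
| BIC | 676.4 | 678.3 | 675.3 | 675.3 |
| CAIC | 678.4 | 681.3 | 679.3 | 679.3 |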
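The three selection criteria reported in Table 2 can be computed from a maximised log-likelihood, the number of free parameters k and the sample size n. The sketch below assumes the standard definitions (AIC = −2ℓ + 2k, BIC = −2ℓ + k ln n, CAIC = −2ℓ + k(ln n + 1)); the function name `information_criteria` is illustrative, not taken from the paper. Under these definitions CAIC − BIC = k, so the tabulated values can be checked against the parameter count of each model.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC and CAIC under the standard definitions (assumed here).

    loglik: maximised log-likelihood, k: free parameters, n: sample size.
    """
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * k,
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1),
    }


# (AIC, BIC, CAIC) as reported in Table 2.
table2 = {
    "LN": (671.6, 676.4, 678.4),
    "LSN": (671.0, 678.3, 681.3),
    "LBSN": (665.6, 675.3, 679.3),
    "LABPN": (665.6, 675.3, 679.3),
}

# CAIC - BIC recovers the number of free parameters implied by the table.
implied_k = {name: round(caic - bic) for name, (_, bic, caic) in table2.items()}

if __name__ == "__main__":
    for name, k in implied_k.items():
        print(f"{name}: implied k = {k}")
    # Hypothetical log-likelihood and sample size, for illustration only.
    print(information_criteria(loglik=-100.0, k=2, n=50))
```

The implied counts (2, 3, 4 and 4 parameters) agree with the number of estimates listed for each model in Table 2.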

MDPI and ACS Style

Martínez-Flórez, G.; Tovar-Falón, R.; Bolfarine, H. The Log-Bimodal Asymmetric Generalized Gaussian Model with Application to Positive Data. Mathematics 2023, 11, 3587. https://doi.org/10.3390/math11163587
