The Log-Bimodal Asymmetric Generalized Gaussian Model with Application to Positive Data

: One of the most widely known probability distributions used to explain the probabilistic behavior of positive data is the log-normal (LN). Although the LN distribution is capable of adjusting data types, it is not always fully true that the model manages to adequately model the behavior of the response of interest since in some cases, the degree of skewness and/or kurtosis of the data are greater or less than those that the LN distribution can capture. Another peculiarity of the LN distribution is that it only ﬁts unimodal positive data, which constitutes a limitation when dealing with data that present more than one mode (bimodality). On the other hand, the log-normal model only ﬁts unimodal positive data and in reality there are multiple applications where the behavior of materials is bimodal. To ﬁll this gap, this paper introduces a new probability distribution that is capable of ﬁtting unimodal or bimodal positive data with a high or low degree of skewness and/or kurtosis. The new distribution is a generalization of the LN distribution. For the new proposal, its main properties are studied and the process of estimation of the parameters involved in the model is carried out from a classical perspective using the maximum likelihood method. An important feature of this distribution is the non-singularity of the Fisher information matrix, which guarantees the use of asymptotic theory to study the properties of the parameter estimators. A Monte Carlo type simulation study is carried out to evaluate the properties of the estimators and ﬁnally, an illustration is presented with a set of data related to the concentration of nickel in soil samples, allowing to show that the proposed distribution ﬁts extremely well in certain situations.


Introduction
Until a few years ago, the modeling of positive data was limited to the use of some distributions which are characterized by being of asymmetric type, such as the gamma, Weibull, exponential and log-normal (LN) models.In geochemistry, the fundamental law enunciated by Ahrens [1], "the concentration of a chemical element in a rock has a logarithm-normal distribution", converted to the LN distribution, one of the most used for modeling the concentration of chemical elements.The LN distribution denoted by LN(µ, σ 2 ), is defined from the transformation of the random variable Y = log(X), where X ∼ N(µ, σ 2 ).The model has wide applicability in studies about survival time materials in engineering sciences and some economic studies.
Another type of asymmetric distribution, used for fitting positive data, is the logskew-normal (LSN) model introduced by Azzalini et al. [2], which is an extension of the original skew-normal (SN) distribution proposed by Azzalini [3].From the SN model, numerous families of asymmetric distributions have been introduced and studied in detail, for example, the symmetric-asymmetric family of distributions with probability density function (PDF) given by ϕ A (y; λ) = 2 f (y)G(λy), y ∈ R, where λ ∈ R, f is a symmetric PDF around zero, G is an absolutely continuous symmetric distribution function, and λ is a parameter that controls the asymmetry.The SN model is a special case of the model in (1), which is obtained by letting f = φ and G = Φ, the PDF and the cumulative distribution function (CDF) of the normal distribution.The extension of the location-scale version of the SN model follows by applying the linear transformation Z = ξ + ηY, where ξ ∈ R and η > 0. This is denoted by SN(ξ, η, λ), and the standard case by SN(λ).Additional works focused on the study of the SN distribution were carried out by Azzalini and Dalla Valle [4], Henze [5] and Chiogna [6], among others.An extension of the SN model for fitting positive data denoted by LSN(ξ, η, λ) was introduced by Azzalini et al. [2].This extension has PDF given by where φ and Φ are the PDF and CDF of the standard normal distribution, respectively, and λ is a parameter which controls the asymmetry and kurtosis in the model.The LSN distribution is commonly used for modeling data with skewness and kurtosis coefficients greater than the LN distribution can fit.Notice that, letting λ = 0 in the Equation ( 2), the LSN reduces to the LN model.As an alternative to the SN model, Durrans [7] introduced another type of asymmetric distribution, called fractional order statistics model, and in much of the statistical literature referred to as alpha-power (AP) model.Properties of the AP model were studied by Gupta and Gupta [8] and Pewsey et al. [9].The AP has PDF given by where α ∈ R + is a parameter that controls the skewness and kurtosis of the distribution, and F is an absolutely continuous distribution function with PDF f = dF.In the particular case of F = Φ in (3), we said that the random variable Y follows the generalized Gaussian (or power-normal) distribution, and is denoted by Y ∼ PN(α).The extension of the location-scale of the PN model, denoted by PN(ξ, η, α), to the case of positive data was proposed by Martínez-Flórez et al. [10], who defined the log-power-normal (LPN) model whose PDF is given by where ξ ∈ R and η > 0 are parameters of location and scale, respectively.This model is denoted by Y ∼ LPN(ξ, η, α).The LPN model contains, as a special case, the LN model when α = 1.
Contrary to the LSN model in which the information matrix is singular for the case λ = 0, Martínez-Flórez et al. [10] showed that the LPN model has information matrix non-singular when α = 1.Thus, in the LPN model, statistical inference based on the theory of large samples can be carried out.The normality of the vector of maximum likelihood estimator (MLE) for the model parameters can be tested by using the likelihood-ratio statistic.The LPN was studied also by Martínez-Flórez et al. [11].
In the statistical literature, it is well known that the SN and PN models are restricted to the case of unimodal data set; so, extensions to the bimodal situations of these two models have been considered by some authors.For example, Arnold et al. [12] defined the bimodal model known as "the extended normal-asymmetric two-pieces model" (ETN), whose PDF is given by where λ, β ∈ R, and c λ is a normalizing constant.For the ETN model, Arnold et al. [12] showed that the information matrix is singular when λ = β = 0. Concerning bimodal positive data, Bolfarine et al.
[13] introduced the log-bimodalskew-normal model, which is denoted by Y ∼ LBSN(ξ, η, λ, β).The LBSN model is an extension of the SN distribution and is adequate for modeling bimodal positive data.The PDF of the LBSN model is given by where x = (log(y Recently, Bolfarine et al. [14] extended the unimodal generalized Gaussian model of Durrans [7] to situations of bimodal asymmetric data by considering the alternative of the ETN model developed by Arnold et al. [12].The extension of Bolfarine et al. [14], which is denoted by ABPN(β, α), has density function given by where φ and Φ are the PDF and CDF of the standard normal distribution, respectively, and, c α = (2 α−1 )/(2 α − 1), with α ∈ R + , and β ∈ R. Bolfarine et al. [14] showed that the PDF in ( 7) is bimodal and asymmetric for values of α > 1, and certain values of β; and unimodal for α ≤ 1.Therefore, the ABPN model can be used for fitting data with a high degree of asymmetry and bimodality.For the ABPN model, the authors also showed that the information matrix is non-singular in the neighborhood of α = 1, contrary to the case of the ETN model of Arnold et al. [12], whose information matrix is singular for the case λ = β = 0. Hence, normality of the MLE for the parameters of the ABPN model can be tested by using the large sample theory together with the likelihood-ratio statistic.
In the current literature, there are few distributions for fitting bimodal positive data, and therefore, this work is focused to propose a new distribution which is adequate to fit data with this behavior.The proposed model generalizes the fundamental law of geochemistry of [1], and in addition, is more flexible than the LN, LSN and LPN models, which are contained as special cases.
The rest of the paper is organized as follows: Section 2 introduces the log-bimodal asymmetric generalized Gaussian model, and its main properties are enunciated and studied.The moment, score function, and the observed and expected information matrix are obtained.The inference for the parameters of the model is realized by using the maximum likelihood method.The results of a simulation study and its respective discussion are presented in Section 3. In Section 4, an application to a real data set consisting of samples of concentration of nickel in soil is presented to illustrate the use of the new model.Finally, a discussion about the proposed model is presented in Section 5.

The Log-Uni-Bi-Modal Asymmetric Generalized Gaussian Model
In this section, we introduce a new model which generalizes some distributions already known in the statistical literature for fitting positive data.This model contains two parameters that make it more flexible than the LN, LSN and LPN models, and is obtained from considering the alternative two-pieces skew-normal model for bimodal asymmetric data of Arnold et al. [12].
In density function (8), β is an asymmetry parameter, α is a shape parameter, ξ is a location parameter and η is a scale parameter.
Although the new distribution is a little more complicated than the existing methodologies, this does not constitute a limitation in its applicability.On the one hand, the main benefit of this new proposal is the possibility of fitting positive data that present bimodality and a high degree of asymmetry and kurtosis that cannot be captured by existing models in the current literature.On the other hand, the existence of statistical packages today facilitates their implementation in practical terms.
From Definition 1, some special cases of the LABPN model are followed by letting specific values of the parameters.For example, when α = 1 and β = 0 the LN model follows; for α = 1, the LSN model is obtained; and, if β = 0 and (log(y) − ξ)/η > 0 the LPN model follows.Finally, when α = 2, then Y ∼ ETN(1, β).These results are presented in the following properties: , where LN denote the log-skew normal distribution, see Arnold et al. [12].
Differentiating the density function regarding y, we have that the derivative of ϕ(y; ξ, η, β, α) is null at and for y > 0, such that 0 < z < ∞, we have we have a log-bimodal model.Figure 1 reveals how the parameters α and β control the skewness, kurtosis and shape of the LABPN model.

Moments
The moments of the random variable Y with LABPN distribution do not have explicit form; however, the rth moment for the standard case of the LABPN model, that is, ξ = 0 and η = 1, which is denoted by LABPN(β, α), can be obtained by using the formula: where z = log(y), y > 0. Thus, the rth moment of Y ∼ LABPN(ξ, η, β, α) can be obtained from the expression: The following result is similar to that given for the LN, LSN and LPN models.
Proof.The result is obvious for α = 1 and β = 0, since this corresponds to the case of the LN model.For α = 1, the LSN model follows and for β = 0 and z > 0 for all y = log(z) ∈ R + , we have the case of the LPN model.For 0 < t < a < ∞ (without loss of generality, we took ξ = 0 and η = 1), we have by definition see Lin and Stoyanov [15].In addition, by letting α = α 0 , we get where it is obtained that 0 < 2α 0 < k α 0 ; therefore, for we conclude that A(α 0 )(y) = ∞.On the other hand, following Arnold and Lin [16], we have for β < 0, and y → ∞, the approximation log Φ(β log(y)) − 1 2 β log(y) 2 ; therefore, when y → ∞, we have for fixed α that log 1 2 k α 0 h β (y) Φ(log(y)) from which we can conclude that A(α 0 )(y) = ∞.Thus, for all α ∈ R + and β ∈ R, the variable random Y does not have MGF when t > 0. Therefore, M Y (t) does not exist, and the proof is completed.

Observed Information Matrix
The elements of the observed information matrix are obtained by multiplying by -1 the second partial derivatives of the log-likelihood function regarding the parameters, by using the expression with θ 1 = ξ, θ 2 = η, θ 3 = β and θ 4 = α.These elements are presented in detail in Appendix A.

Expected Information Matrix
Under the assumption that the regularity conditions are satisfied, the elements of the expected information matrix can be calculated by multiplying by n −1 , the expected value of the corresponding elements of the observed information matrix, that is, In Appendix A, the explicit expressions of the elements of the expected information matrix are presented.The expectations of the expressions involved in the components of the information matrix must be calculated numerically.In the particular case where α = and β = 0, so that is the density function of the LN model of location-scale version, the information matrix becomes , whose determinant |I(θ)| = 0, then the information matrix is non-singular in the neighborhood of α = 1 and β = 0, that is, for the LN model.This is not the case for the model of Azzalini and Dalla Valle [4] for which the Fisher information matrix is singular in the neighborhood of λ = 0. Furthermore, the upper sub-matrix of size 2 × 2 corresponds to the Fisher information matrix of the LN model.Therefore, for n large θ is consistent and has asymptotic normal distribution with covariance matrix I(θ) −1 .Inferences based on confidence intervals and hypothesis testing for the location, scale and shape parameters can be realized by using sampling properties for large samples of the MLE.

Simulation Study
This section presents a Monte Carlo simulation study, which was carried out with the objective of evaluating the behavior of the maximum likelihood estimators for the LABPN distribution.
For this simulation, the maxLik function of the statistical software R Development Core Team [17], version 4.2.3 was used and data from the LABPN distribution were generated, considering the values of the parameters: ξ = 1.0 and η = 0.5, and different values of the parameters β and α.
For each scenario, 5000 random samples from the LABPN distribution were generated with the sample sizes n = 40, 80, 150, 200 and 500.As quality measures to evaluate the behavior of the MLEs; the bias and the mean square error (MSE) were used.These measurements were calculated as and where δ(j) i is the estimate of δ (j) for the ith sample.The results of the simulation study are presented in the Table 1.From the table, it can be seen that in general, as the sample size increases, the bias and the MSE of all parameter estimators tend to decrease and they approach zero.Thus, MLEs are asymptotically consistent and large-sample theory can be used to perform interval estimation of parameters as well as hypothesis testing based on likelihood ratio statistics.

Application to the Nickel Content in the Soil Data
For the illustration, we use a data set which was previously analyzed by Bolfarine et al. [13], who fitted the LBSN model.The data consists of 86 samples of nickel content (in Ni(µg g −1 )) in soil samples analyzed at the Department of Mines of the University of Atacama, Chile.The descriptive statistics for this data set are: n = 86, mean = 21.337,standard deviation = 16.639,skewness = 2.440, and kurtosis = 12.0443.For this same data set, the skewness and kurtosis coefficients of the logarithm of the nickel content for the 86 samples were also calculated, which were −0.4490 and 3.7344, respectively; therefore, the assumption that the logarithm of the nickel content follows an LN model is inadequate.Bolfarine et al. [13] found that the LBSN model fitted the nickel content data better than the LN model.As an alternative to the LBSN model, we fitted the LABPN model.To compare our proposal, we also fitted the LN and LSN Azzalini et al. [2] models.The MLEs of the fitted models, which were obtained numerically using the optimal function of the statistical package [17], are presented in the Table 2 with the respective standard errors in parentheses.For obtaining the parameter estimates, we used the optim function from the [17] package.
To compare the fitted models, we computed the estimated values of the AIC Akaike [18], which is given by AIC = −2 ˆ (•) + 2k, BIC = −2 ˆ (•) + log(n)k Schwarz [19], and the modified AIC criterion [20], typically called the consistent AIC, namely, CAIC = −2 ˆ (•) + (1 + log(n))k, where k is the number of parameters for the considered model.The best model is the one with the smallest AIC (or BIC or CAIC).According to the AIC, BIC and CAIC statistics, the best models are the LBSN and LABPN.The bimodality hypothesis can also be formally tested from the system of hypotheses which is equivalent to compare the LSN and LABPN models.Since the Fisher information matrix is non-singular, we used the likelihood-ratio statistic, namely, where L(•) is the likelihood function.
After substituting the estimated values of the parameters, we have −2 log(Λ 1 ) = −2(−332.5+ 328.8) = 7.4, which is greater than the 5% critical value of the chi-squared distribution, χ 2 1,95% = 3.84.This result leads to the rejection of the null hypothesis; therefore, we conclude that the LABPN model fits best the nickel concentration data.
To compare the LN model with the LABPN model, we considered the system of hypotheses which can be tested by using the likelihood-ratio statistic given by Considering the estimated values, we get −2 log(Λ 2 ) = 10, which is greater than the critical value of the chi-square distribution with two degrees of freedom, χ 2 2,95% = 5.99.Again, we rejected the null hypotheses H 0 , and we concluded that the LABPN model fits the nickel content data better than the LN model.Now, to compare the LBSN and LABPN models, it is necessary to consider a test for non-nested models.Thus, we suppose that f (y i |x i , θ) and g(y i |x i , β) are the corresponding non-nested densities to be compared.To test the hypothesis of no differences between these densities, that is, Vuong [21] proposed the likelihood-ratio statistic given by is an estimator for the variance of 1 The null hypothesis of equivalence of the models is rejected at significance level δ in favor of the LABPN model, that is, better fit (or worse fit) compared to the LBSN model, if T LR,NN > z δ/2 (or T LR,NN < −z δ/2 ).For the nickel concentration data, we obtained T LR,NN = −0.54,which is less than the critical value z 0.025 = 1.96; therefore, there are no statistical differences between the LBSN and LABPN models.In this way, the LABPN model is a useful alternative to fit the nickel concentration data.Figure 2 shows the fitted densities and QQplot plots for the LSN and LABPN models.These plots also show evidence that the LABPN model fits better than the other considered models.The QQplot in Figure 2c also shows that the LABPN model has a good fit.

Concluding Remarks
In this work, a new family of parametric distributions capable of fitting unimodal and bimodal positive data is introduced.The main properties of the new family were studied.This new family is obtained by considering the Arnold et al. [12] and Durrans [7] models and extends some existing models in the literature, among them, the log-normal, log-skew norm, log-bimodal-skew-normal and log-bimodal-power-normal.This new distribution is also very flexible and can fit unimodal and bimodal data with high (or low) degrees of skewness and kurtosis.To obtain the estimates of the parameters in the model, a classical approach was considered by using the maximum likelihood method together with iterative Newton-Raphson algorithms for the optimization of the likelihood function.The score functions were presented and the Fisher information matrix was shown to be non-singular, which allows statistical inference to be carried out through the theory of large samples and the use of likelihood-ratio statistics.The applicability of the new family was illustrated by considering a data set corresponding to nickel content in soil samples.The results showed better fit of the proposed family compared to other existing models in the literature.

Table 1 .
Bias and mean square error (MSE) of maximum likelihood estimates.