Abstract
A bimodal double log-normal distribution on the real line is proposed using the random sign mixture transform. Its associated statistical inferences are derived. Its parameters are estimated by the maximum likelihood method. The performance of the estimators and the corresponding confidence intervals is checked by simulation studies. Application of the proposed distribution to a real data set from a DNA microarray is presented.
Keywords:
bimodality; log-normal distribution; maximum likelihood estimation; Monte Carlo simulations
MSC:
62E15; 62F12
1. Introduction
A log-normal distribution is perhaps the most popular model for skewed data [1]. However, it is defined only on the positive real line, whereas many of its application areas involve data spanning the entire real line. One example is the modeling of stock returns, for which the log-normal distribution is a popular model: positive returns correspond to profits, while negative returns correspond to losses. Other application areas of log-normal distributions involving data spanning the entire real line are discussed later on. Hence, a double log-normal distribution is needed.
We follow the procedure presented in [2] to construct a double log-normal (DLN) distribution. Consider the following two transforms [2]:
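- (i) The random sign transform (RST) given by
$$W = Y X - (1 - Y) X = (2Y - 1) X,$$
- (ii) The random sign mixture transform (RSMT) given by
$$Z = Y X_1 - (1 - Y) X_2,$$
where Y is a Bernoulli random variable (RV) with parameter $p \in (0, 1)$ and X, $X_1$ and $X_2$ are non-negative RVs independent of Y. The probability density function (PDF) of W is
$$f_W(w) = \begin{cases} (1 - p)\, f_X(-w; \theta), & w < 0, \\ p\, f_X(w; \theta), & w > 0, \end{cases}$$
and the PDF of Z is
$$f_Z(z) = \begin{cases} (1 - p)\, f_{X_2}(-z; \theta_2), & z < 0, \\ p\, f_{X_1}(z; \theta_1), & z > 0, \end{cases}$$
where $f_X$, $f_{X_1}$, $f_{X_2}$ are the PDFs of non-negative RVs X, $X_1$ and $X_2$, respectively, with (vector) parameters $\theta$, $\theta_1$ and $\theta_2$, respectively. If X is an RV from a family of distributions $\mathcal{F}$, then W is said to have a double $\mathcal{F}$ distribution. If $X_1$ and $X_2$ are independent RVs from a family of distributions $\mathcal{F}$, then Z is said to have a double $\mathcal{F}$ distribution.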
There are many double continuous distributions on the real line. However, the words ‘double’ or ‘reflection’ are sometimes used to denote the distribution of the absolute value of a random variable. Some double continuous distributions based on the RST are:
- Double exponential distribution (Laplace) [3].
- Double generalized gamma distribution [4].
- Double Weibull distribution [5].
- Double gamma distribution [6].
- Double generalized Pareto distribution [7].
- Double Lomax distribution [8].
- Double Lindley distribution [9].
Some double continuous distributions based on the RSMT are:
- Double half-normal distribution [10,11].
- Double exponential distribution [12].
- Double inverse gamma distribution [13].
- Double Gompertz distribution [14].
- Double Pareto II distribution [15].
- Double inverse Gaussian distribution [16].
We construct the DLN distribution using the RSMT, i.e., the distribution of Z when $X_1$ and $X_2$ independently follow log-normal (LN) distributions.
The remainder of this paper is organized as follows. In Section 2, the statistical properties of the DLN distribution are presented. The maximum likelihood estimates (MLEs) of the parameters and their asymptotic distributions are studied in Section 3. Simulations to check the finite sample performance of the estimators of the parameters and the corresponding confidence intervals are presented in Section 4. An application of the proposed double distribution to a real data set from a DNA microarray is presented in Section 5. Finally, the conclusions and comments are stated in Section 6.
2. Statistical Properties
We present the statistical properties of the DLN distribution in this section.
2.1. Probability Density Function
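The PDF of the DLN distribution is
$$f_Z(z; \theta) = \begin{cases} (1 - p)\, f_{X_2}(-z; \mu_2, \sigma_2), & z < 0, \\ p\, f_{X_1}(z; \mu_1, \sigma_1), & z > 0, \end{cases}$$
where $\theta = (p, \mu_1, \sigma_1, \mu_2, \sigma_2)$ for $0 < p < 1$, $-\infty < \mu_j < \infty$ and $\sigma_j > 0$, $j = 1, 2$, and
$$f_{X_j}(x; \mu_j, \sigma_j) = \frac{1}{x \sigma_j \sqrt{2\pi}} \exp\left\{-\frac{(\ln x - \mu_j)^2}{2\sigma_j^2}\right\}, \quad x > 0, \quad j = 1, 2,$$
are the PDFs of the LN distributions.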
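As a minimal sketch (not the authors' code), the DLN density can be evaluated with the Python standard library. It assumes the parameter order $(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$, with the LN($\mu_1$, $\sigma_1$) component placed on the positive half-line with weight $p$ and the LN($\mu_2$, $\sigma_2$) component reflected onto the negative half-line with weight $1 - p$. The function names `ln_pdf`, `dln_pdf` and `integrate` are illustrative.

```python
import math


def ln_pdf(x, mu, sigma):
    """PDF of the log-normal LN(mu, sigma) distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    t = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * t * t) / (x * sigma * math.sqrt(2.0 * math.pi))


def dln_pdf(z, p, mu1, sigma1, mu2, sigma2):
    """PDF of the DLN distribution (assumed sign convention, see lead-in).

    The density is not defined at z = 0; this sketch returns 0.0 there.
    """
    if z > 0:
        return p * ln_pdf(z, mu1, sigma1)
    if z < 0:
        return (1.0 - p) * ln_pdf(-z, mu2, sigma2)
    return 0.0


def integrate(f, a, b, n=20000):
    """Composite midpoint rule on [a, b], used only as a sanity check."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))
```

Integrating `dln_pdf` numerically over the positive and negative half-lines should return approximately $p$ and $1 - p$, respectively, which is a quick check of the sign convention.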
Figure 1 shows the bimodality of the PDF of the DLN distribution for selected parameter values.
Figure 1.
PDF of the DLN distribution for selected values of $(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$, including (0.3, −0.5, 1, 0.5, 1) and (0.5, −0.5, 1, 0.5, 1).
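The DLN distribution has two modes given by $-M_2$ (on the negative half-line) and $M_1$ (on the positive half-line),
where
$$M_j = \exp(\mu_j - \sigma_j^2), \quad j = 1, 2,$$
are the modes of the LN($\mu_j$, $\sigma_j$) distributions.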
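The two modes follow from the standard log-normal mode $\exp(\mu - \sigma^2)$. The sketch below assumes that the LN($\mu_1$, $\sigma_1$) component sits on the positive side and the LN($\mu_2$, $\sigma_2$) component is reflected onto the negative side; the function names are illustrative.

```python
import math


def ln_pdf(x, mu, sigma):
    """PDF of the log-normal LN(mu, sigma) distribution at x > 0."""
    t = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * t * t) / (x * sigma * math.sqrt(2.0 * math.pi))


def dln_modes(mu1, sigma1, mu2, sigma2):
    """Return (negative mode, positive mode) of the DLN density.

    The mixing weight p rescales each branch of the density but does not
    move the location of its maximum, so it is not needed here.
    """
    positive_mode = math.exp(mu1 - sigma1 ** 2)
    negative_mode = -math.exp(mu2 - sigma2 ** 2)
    return negative_mode, positive_mode
```

Evaluating `ln_pdf` slightly to the left and right of the returned positive mode gives a simple numerical confirmation that it is a local maximum.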
2.2. Cumulative Distribution Function
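The cumulative distribution function (CDF) of the DLN distribution is
$$F_Z(z) = \begin{cases} (1 - p)\,[1 - F_{X_2}(-z)], & z < 0, \\ (1 - p) + p\, F_{X_1}(z), & z \geq 0, \end{cases}$$
where
$$F_{X_j}(x) = \Phi\left(\frac{\ln x - \mu_j}{\sigma_j}\right), \quad x > 0, \quad j = 1, 2,$$
are the CDFs of the LN distributions and
$$\Phi(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$
is the CDF of the standard normal distribution.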
Figure 2 shows the CDF of the DLN distribution for selected parameter values. We can observe that $F_Z(0) = 1 - p$ and hence $F_Z(0)$ decreases as $p$ increases.
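The CDF can be sketched in a few lines using `statistics.NormalDist` for the standard normal CDF. The code assumes that the positive half-line carries the LN($\mu_1$, $\sigma_1$) component with weight $p$, so that the CDF at the origin equals $1 - p$; the function names are illustrative.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # CDF of the standard normal distribution


def ln_cdf(x, mu, sigma):
    """CDF of the log-normal LN(mu, sigma) distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return _PHI((math.log(x) - mu) / sigma)


def dln_cdf(z, p, mu1, sigma1, mu2, sigma2):
    """CDF of the DLN distribution (assumed sign convention, see lead-in)."""
    if z < 0:
        # Mass to the left of z comes only from the reflected component.
        return (1.0 - p) * (1.0 - ln_cdf(-z, mu2, sigma2))
    # All of the negative mass (1 - p) plus part of the positive mass.
    return (1.0 - p) + p * ln_cdf(z, mu1, sigma1)
```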
Figure 2.
CDF of the DLN distribution for selected values of $(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$, including (0.3, −0.5, 1, 0.5, 1) and (0.5, −0.5, 1, 0.5, 1).
2.3. Hazard Rate Function
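The survival function (SF) of the DLN distribution is
$$S_Z(z) = 1 - F_Z(z) = \begin{cases} 1 - (1 - p)\, S_{X_2}(-z), & z < 0, \\ p\, S_{X_1}(z), & z \geq 0, \end{cases}$$
where
$$S_{X_j}(x) = 1 - \Phi\left(\frac{\ln x - \mu_j}{\sigma_j}\right), \quad x > 0, \quad j = 1, 2,$$
are the SFs of the LN distributions.
The hazard rate function (HRF) of the DLN distribution is
$$h_Z(z) = \frac{f_Z(z)}{S_Z(z)} = \begin{cases} \dfrac{(1 - p)\, f_{X_2}(-z)}{1 - (1 - p)\, S_{X_2}(-z)}, & z < 0, \\[2ex] \dfrac{f_{X_1}(z)}{S_{X_1}(z)}, & z > 0. \end{cases}$$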
Figure 3 shows the HRF of the DLN distribution for selected parameter values. This figure shows that the HRF of the DLN distribution can be bimodal with one mode on each side of the origin.
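The hazard rate is the ratio of the density to the survival function, and a short sketch makes the two branches explicit. It assumes weight $p$ on the positive LN($\mu_1$, $\sigma_1$) component and weight $1 - p$ on the reflected LN($\mu_2$, $\sigma_2$) component; the function names are illustrative. Under this convention $p$ cancels for $z > 0$, so the hazard there coincides with the log-normal hazard of the positive component.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # CDF of the standard normal distribution


def ln_pdf(x, mu, sigma):
    """PDF of the log-normal LN(mu, sigma) distribution at x > 0."""
    t = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * t * t) / (x * sigma * math.sqrt(2.0 * math.pi))


def ln_sf(x, mu, sigma):
    """Survival function of LN(mu, sigma) at x > 0, computed as Phi(-t)."""
    return _PHI(-(math.log(x) - mu) / sigma)


def dln_hazard(z, p, mu1, sigma1, mu2, sigma2):
    """Hazard rate f_Z(z) / S_Z(z) of the DLN distribution for z != 0.

    For very large positive z the survival function underflows to zero
    and the ratio can no longer be evaluated in floating point.
    """
    if z > 0:
        return ln_pdf(z, mu1, sigma1) / ln_sf(z, mu1, sigma1)
    if z < 0:
        numerator = (1.0 - p) * ln_pdf(-z, mu2, sigma2)
        denominator = 1.0 - (1.0 - p) * ln_sf(-z, mu2, sigma2)
        return numerator / denominator
    raise ValueError("the DLN density is not defined at z = 0")
```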
Figure 3.
HRF of the DLN distribution for selected values of $(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$, including (0.3, −0.5, 1, 0.5, 0.9) and (0.5, 0.5, 0.9, −0.5, 1).
2.4. Moments and Associated Measures
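The rth raw moment of the DLN distribution is
$$E(Z^r) = p\, E(X_1^r) + (-1)^r (1 - p)\, E(X_2^r), \quad r = 1, 2, \ldots,$$
where
$$E(X_j^r) = \exp\left(r \mu_j + \tfrac{1}{2} r^2 \sigma_j^2\right), \quad j = 1, 2,$$
are the rth moments of the LN distributions.
In particular, the first four raw moments of Z are
$$\begin{aligned} E(Z) &= p\, e^{\mu_1 + \sigma_1^2/2} - (1 - p)\, e^{\mu_2 + \sigma_2^2/2}, \\ E(Z^2) &= p\, e^{2\mu_1 + 2\sigma_1^2} + (1 - p)\, e^{2\mu_2 + 2\sigma_2^2}, \\ E(Z^3) &= p\, e^{3\mu_1 + 9\sigma_1^2/2} - (1 - p)\, e^{3\mu_2 + 9\sigma_2^2/2}, \\ E(Z^4) &= p\, e^{4\mu_1 + 8\sigma_1^2} + (1 - p)\, e^{4\mu_2 + 8\sigma_2^2}. \end{aligned}$$
The variance, skewness and kurtosis of the DLN distribution can be obtained using the well-known expressions:
$$\begin{aligned} \mathrm{Var}(Z) &= E(Z^2) - [E(Z)]^2, \\ \mathrm{Skewness}(Z) &= \frac{E(Z^3) - 3 E(Z) E(Z^2) + 2 [E(Z)]^3}{[\mathrm{Var}(Z)]^{3/2}}, \\ \mathrm{Kurtosis}(Z) &= \frac{E(Z^4) - 4 E(Z) E(Z^3) + 6 [E(Z)]^2 E(Z^2) - 3 [E(Z)]^4}{[\mathrm{Var}(Z)]^2}, \end{aligned}$$
upon substituting for the raw moments.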
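The moment formulas translate directly into code. The sketch below assumes the RSMT sign convention in which the LN($\mu_1$, $\sigma_1$) component is positive with probability $p$ and the LN($\mu_2$, $\sigma_2$) component is negated with probability $1 - p$, so odd moments of the second component enter with a minus sign. Function names are illustrative.

```python
import math


def ln_raw_moment(r, mu, sigma):
    """r-th raw moment of LN(mu, sigma): exp(r*mu + r^2*sigma^2/2)."""
    return math.exp(r * mu + 0.5 * r * r * sigma * sigma)


def dln_raw_moment(r, p, mu1, sigma1, mu2, sigma2):
    """r-th raw moment of the DLN distribution for a positive integer r."""
    sign = -1.0 if r % 2 else 1.0  # (-1)^r from the reflected component
    return (p * ln_raw_moment(r, mu1, sigma1)
            + sign * (1.0 - p) * ln_raw_moment(r, mu2, sigma2))


def dln_summary(p, mu1, sigma1, mu2, sigma2):
    """Return (mean, variance, skewness, kurtosis) from the raw moments."""
    m1, m2, m3, m4 = (dln_raw_moment(r, p, mu1, sigma1, mu2, sigma2)
                      for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt
```

In the symmetric case ($p = 0.5$ with identical components) the odd moments vanish, so the mean and skewness are zero; this is a convenient check of the implementation.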
Figure 4 shows the mean, variance, skewness and kurtosis of the DLN distribution as a function of $p$ for selected values of $(\mu_1, \sigma_1, \mu_2, \sigma_2)$. We can observe that the skewness can be negative or positive, i.e., the DLN distribution can be skewed to the left or skewed to the right.
Figure 4.
Mean, variance, skewness and kurtosis of the DLN distribution as a function of $p$ for selected values of $(\mu_1, \sigma_1, \mu_2, \sigma_2)$.
2.5. Harmonic Mean
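The harmonic mean of an RV V is defined as
$$\mathrm{HM}(V) = \frac{1}{E(1/V)},$$
provided $E(1/V)$ exists.
Proposition 1.
The harmonic mean of the RSMT Z is
$$\mathrm{HM}(Z) = \left[\frac{p}{\mathrm{HM}(X_1)} - \frac{1 - p}{\mathrm{HM}(X_2)}\right]^{-1}.$$
Proof.
Since
$$E\left(\frac{1}{Z}\right) = p\, E\left(\frac{1}{X_1}\right) - (1 - p)\, E\left(\frac{1}{X_2}\right),$$
the proposition follows. □
Corollary 1.
The harmonic mean of the DLN distribution is
$$\mathrm{HM}(Z) = \left[\frac{p}{\mathrm{HM}(X_1)} - \frac{1 - p}{\mathrm{HM}(X_2)}\right]^{-1},$$
where
$$\mathrm{HM}(X_j) = \exp\left(\mu_j - \tfrac{1}{2}\sigma_j^2\right), \quad j = 1, 2,$$
are the harmonic means of the LN distributions.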
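A small numerical sketch of the harmonic mean follows, assuming the RSMT convention $Z = Y X_1 - (1 - Y) X_2$ with $P(Y = 1) = p$ and using the standard log-normal identity $E(1/X) = \exp(-\mu + \sigma^2/2)$. The function names are illustrative. Note that the harmonic mean is undefined when $E(1/Z) = 0$, which happens, for example, in the symmetric case.

```python
import math


def ln_harmonic_mean(mu, sigma):
    """Harmonic mean of LN(mu, sigma): 1 / E(1/X) = exp(mu - sigma^2/2)."""
    return math.exp(mu - 0.5 * sigma ** 2)


def dln_harmonic_mean(p, mu1, sigma1, mu2, sigma2):
    """Harmonic mean 1 / E(1/Z) of the DLN distribution."""
    # E(1/Z) = p * E(1/X1) - (1 - p) * E(1/X2)
    inv_mean = (p / ln_harmonic_mean(mu1, sigma1)
                - (1.0 - p) / ln_harmonic_mean(mu2, sigma2))
    if inv_mean == 0.0:
        raise ZeroDivisionError("E(1/Z) = 0, so the harmonic mean is undefined")
    return 1.0 / inv_mean
```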
Figure 5 shows the harmonic mean of the DLN distribution as a function of $p$ for selected parameter values.
Figure 5.
Harmonic mean of the DLN distribution as a function of $p$ for selected values of $(\mu_1, \sigma_1, \mu_2, \sigma_2)$, including (−0.5, 1, 0.5, 1).
2.6. Entropies
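Entropies are measures of a system’s variation, instability or unpredictability. For an RV V with PDF $f_V$, the following are two well-known entropies:
- 1. Tsallis entropy [17]:
$$T_\alpha(V) = \frac{1}{\alpha - 1}\left[1 - \int_{-\infty}^{\infty} f_V^{\alpha}(v)\, dv\right], \quad \alpha > 0, \ \alpha \neq 1.$$
- 2. Shannon entropy [18]:
$$H(V) = -E[\ln f_V(V)] = -\int_{-\infty}^{\infty} f_V(v) \ln f_V(v)\, dv = \lim_{\alpha \to 1} T_\alpha(V).$$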
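Proposition 2.
The Tsallis entropy of the RSMT Z is
$$T_\alpha(Z) = \frac{1}{\alpha - 1}\left[1 - p^{\alpha} I_\alpha(X_1) - (1 - p)^{\alpha} I_\alpha(X_2)\right]$$
for $\alpha > 0$, $\alpha \neq 1$, where
$$I_\alpha(X_j) = \int_0^{\infty} f_{X_j}^{\alpha}(x)\, dx, \quad j = 1, 2.$$
Proof.
See [16]. □
Corollary 2.
The Shannon entropy of the RSMT Z is
$$H(Z) = -p \ln p - (1 - p) \ln(1 - p) + p\, H(X_1) + (1 - p)\, H(X_2),$$
where
$$H(X_j) = -\int_0^{\infty} f_{X_j}(x) \ln f_{X_j}(x)\, dx, \quad j = 1, 2.$$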
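Proposition 3.
The Tsallis entropy of the LN distribution with parameters $(\mu, \sigma)$ is
$$T_\alpha(X) = \frac{1}{\alpha - 1}\left[1 - \frac{(2\pi\sigma^2)^{(1 - \alpha)/2}}{\sqrt{\alpha}} \exp\left\{(1 - \alpha)\mu + \frac{(1 - \alpha)^2 \sigma^2}{2\alpha}\right\}\right]$$
for $\alpha > 0$, $\alpha \neq 1$.
Proof.
Since
$$\int_0^{\infty} f_X^{\alpha}(x)\, dx = \frac{(2\pi\sigma^2)^{(1 - \alpha)/2}}{\sqrt{\alpha}} \exp\left\{(1 - \alpha)\mu + \frac{(1 - \alpha)^2 \sigma^2}{2\alpha}\right\},$$
the proposition follows. □
Corollary 3.
The Shannon entropy of the LN distribution with parameters $(\mu, \sigma)$ is
$$H(X) = \mu + \frac{1}{2} + \ln\left(\sigma\sqrt{2\pi}\right).$$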
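Proposition 4.
The Tsallis entropy of $Z \sim \mathrm{DLN}(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$ is
$$T_\alpha(Z) = \frac{1}{\alpha - 1}\left[1 - p^{\alpha} I_\alpha(\mu_1, \sigma_1) - (1 - p)^{\alpha} I_\alpha(\mu_2, \sigma_2)\right]$$
for $\alpha > 0$, $\alpha \neq 1$, where
$$I_\alpha(\mu_1, \sigma_1) = \frac{(2\pi\sigma_1^2)^{(1 - \alpha)/2}}{\sqrt{\alpha}} \exp\left\{(1 - \alpha)\mu_1 + \frac{(1 - \alpha)^2 \sigma_1^2}{2\alpha}\right\}$$
and
$$I_\alpha(\mu_2, \sigma_2) = \frac{(2\pi\sigma_2^2)^{(1 - \alpha)/2}}{\sqrt{\alpha}} \exp\left\{(1 - \alpha)\mu_2 + \frac{(1 - \alpha)^2 \sigma_2^2}{2\alpha}\right\}.$$
The proof of Proposition 4 follows directly from Propositions 2 and 3.
Corollary 4.
The Shannon entropy of $Z \sim \mathrm{DLN}(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$ is
$$H(Z) = -p \ln p - (1 - p) \ln(1 - p) + p\, H(X_1) + (1 - p)\, H(X_2),$$
where
$$H(X_1) = \mu_1 + \frac{1}{2} + \ln\left(\sigma_1\sqrt{2\pi}\right)$$
and
$$H(X_2) = \mu_2 + \frac{1}{2} + \ln\left(\sigma_2\sqrt{2\pi}\right).$$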
Figure 6 shows the Tsallis and Shannon entropies of the DLN distribution as a function of $p$ for selected parameter values.
Figure 6.
Tsallis and Shannon entropies of the DLN distribution as a function of $p$ for selected values of $(\mu_1, \sigma_1, \mu_2, \sigma_2)$, including (3, 1, −3, 1) and (0, 1, 0, 1).
Note that the Tsallis and Shannon entropies can be negative for continuous distributions.
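The entropies of this section can be checked numerically with a short sketch. It uses the closed form of $\int_0^\infty f^\alpha(x)\,dx$ for a log-normal PDF and the fact that, for a density split between the two half-lines with weights $p$ and $1 - p$, this integral decomposes as $p^\alpha$ times the first component's integral plus $(1 - p)^\alpha$ times the second's. The parameter order and function names are assumptions made for illustration.

```python
import math


def ln_power_integral(alpha, mu, sigma):
    """Integral over (0, inf) of f(x)**alpha for the LN(mu, sigma) PDF f."""
    scale = (2.0 * math.pi * sigma ** 2) ** ((1.0 - alpha) / 2.0) / math.sqrt(alpha)
    expo = (1.0 - alpha) * mu + (1.0 - alpha) ** 2 * sigma ** 2 / (2.0 * alpha)
    return scale * math.exp(expo)


def ln_shannon(mu, sigma):
    """Shannon entropy of LN(mu, sigma): mu + 1/2 + ln(sigma*sqrt(2*pi))."""
    return mu + 0.5 + math.log(sigma * math.sqrt(2.0 * math.pi))


def dln_tsallis(alpha, p, mu1, sigma1, mu2, sigma2):
    """Tsallis entropy of the DLN distribution for alpha > 0, alpha != 1."""
    total = (p ** alpha * ln_power_integral(alpha, mu1, sigma1)
             + (1.0 - p) ** alpha * ln_power_integral(alpha, mu2, sigma2))
    return (1.0 - total) / (alpha - 1.0)


def dln_shannon(p, mu1, sigma1, mu2, sigma2):
    """Shannon entropy of the DLN distribution."""
    mixing = -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
    return mixing + p * ln_shannon(mu1, sigma1) + (1.0 - p) * ln_shannon(mu2, sigma2)
```

Evaluating `dln_tsallis` at a value of `alpha` very close to 1 should approximately reproduce `dln_shannon`, and a tightly concentrated component (small $\sigma$ with very negative $\mu$) gives a negative Shannon entropy, in line with the remark above.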
3. Maximum Likelihood Estimation
In this section, MLEs of the parameters of the DLN distribution and their asymptotic distributions are derived.
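Let $z_1, \ldots, z_n$ be a random sample from the DLN distribution. The log-likelihood function is
$$\ell(\theta) = \sum_{i=1}^{n} \Big\{ I(z_i > 0)\big[\ln p + \ln f_{X_1}(z_i; \mu_1, \sigma_1)\big] + I(z_i < 0)\big[\ln(1 - p) + \ln f_{X_2}(-z_i; \mu_2, \sigma_2)\big] \Big\},$$
where $I(\cdot)$ denotes the indicator function. The MLEs of the parameters are:
$$\hat{p} = \frac{n_1}{n}, \quad \hat{\mu}_1 = \frac{1}{n_1}\sum_{i=1}^{n} I(z_i > 0) \ln z_i, \quad \hat{\sigma}_1^2 = \frac{1}{n_1}\sum_{i=1}^{n} I(z_i > 0)(\ln z_i - \hat{\mu}_1)^2,$$
$$\hat{\mu}_2 = \frac{1}{n_2}\sum_{i=1}^{n} I(z_i < 0) \ln(-z_i), \quad \hat{\sigma}_2^2 = \frac{1}{n_2}\sum_{i=1}^{n} I(z_i < 0)(\ln(-z_i) - \hat{\mu}_2)^2,$$
where
$$n_1 = \sum_{i=1}^{n} I(z_i > 0), \quad n_2 = n - n_1.$$
The Fisher information matrix about $\theta = (p, \mu_1, \sigma_1, \mu_2, \sigma_2)$ is
$$I(\theta) = \mathrm{diag}\left(\frac{1}{p(1 - p)},\ p\, I_1(\mu_1, \sigma_1),\ (1 - p)\, I_2(\mu_2, \sigma_2)\right),$$
where $I_1(\mu_1, \sigma_1) = \mathrm{diag}(1/\sigma_1^2,\ 2/\sigma_1^2)$ is the Fisher information matrix about $(\mu_1, \sigma_1)$ and $I_2(\mu_2, \sigma_2) = \mathrm{diag}(1/\sigma_2^2,\ 2/\sigma_2^2)$ is the Fisher information matrix about $(\mu_2, \sigma_2)$.
Moreover, the asymptotic distribution of the MLEs as $n \to \infty$ is
$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} \mathrm{MVN}_5(0, \Sigma),$$
where $\xrightarrow{d}$ denotes convergence in distribution, MVN stands for multivariate normal distribution and
$$\Sigma = I^{-1}(\theta) = \mathrm{diag}\left(p(1 - p),\ \frac{\sigma_1^2}{p},\ \frac{\sigma_1^2}{2p},\ \frac{\sigma_2^2}{1 - p},\ \frac{\sigma_2^2}{2(1 - p)}\right).$$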
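Because the likelihood separates into a Bernoulli part (the sign of each observation) and two log-normal parts (the logarithms of the positive observations and of the absolute values of the negative ones), the MLEs have closed forms. The sketch below is one possible implementation under that reading, assuming the parameterization $(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$ with positive observations attributed to component 1. The standard errors are the usual plug-in values from the inverse Fisher information for a $(\mu, \sigma)$ parameterization. The function name `dln_mle` is hypothetical.

```python
import math


def dln_mle(sample):
    """Closed-form MLEs and plug-in standard errors for the DLN model.

    Returns (estimates, std_errors), each ordered as
    (p, mu1, sigma1, mu2, sigma2). Exact zeros have probability zero
    under the model and are ignored by this sketch.
    """
    pos = [math.log(z) for z in sample if z > 0]
    neg = [math.log(-z) for z in sample if z < 0]
    n = len(pos) + len(neg)
    if not pos or not neg:
        raise ValueError("need at least one observation on each side of zero")

    p_hat = len(pos) / n
    estimates = [p_hat]
    std_errors = [math.sqrt(p_hat * (1.0 - p_hat) / n)]
    for logs in (pos, neg):
        m = len(logs)
        mu = sum(logs) / m
        # MLE of sigma uses the divisor m (not m - 1).
        sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / m)
        estimates += [mu, sigma]
        std_errors += [sigma / math.sqrt(m), sigma / math.sqrt(2.0 * m)]
    return estimates, std_errors
```

An approximate 95% confidence interval for each parameter is then the estimate plus or minus 1.96 times its standard error.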
4. Simulations
This section details simulations to check the finite sample performance of the MLEs of the parameters of the DLN distribution. The performance is evaluated in terms of biases, mean squared errors of the MLEs and coverage probabilities of the corresponding 95% confidence intervals.
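The simulation was repeated $M$ = 10,000 times. In each of the M repetitions, a random sample of size $n$ was drawn from the DLN distribution with selected parameter values $\theta = (p, \mu_1, \sigma_1, \mu_2, \sigma_2)$ = (0.3, −2, 1, −1, 2), (0.5, 0, 1, 1, 2), (0.8, 2, 1, −1, 2) and (0.547, −2.812, 1.016, −2.224, 0.764), using the following algorithm:
- Generate $Y_i \sim \mathrm{Bernoulli}(p)$, $i = 1, \ldots, n$.
- Generate $X_{1i} \sim \mathrm{LN}(\mu_1, \sigma_1)$, $i = 1, \ldots, n$.
- Generate $X_{2i} \sim \mathrm{LN}(\mu_2, \sigma_2)$, $i = 1, \ldots, n$.
- Set $Z_i = Y_i X_{1i} - (1 - Y_i) X_{2i}$, $i = 1, \ldots, n$.
The parameter values (0.547, −2.812, 1.016, −2.224, 0.764) are those estimated in the real data application in Section 5.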
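The sampling algorithm above can be sketched with Python's `random` module, whose `lognormvariate(mu, sigma)` draws a log-normal variate whose logarithm has mean `mu` and standard deviation `sigma`. The function name `rdln` and the seeding approach are illustrative choices, and the sign convention (component 1 positive with probability $p$) is an assumption.

```python
import random


def rdln(n, p, mu1, sigma1, mu2, sigma2, rng=None):
    """Draw n variates from the DLN distribution via the RSMT."""
    if rng is None:
        rng = random.Random()
    sample = []
    for _ in range(n):
        y = 1 if rng.random() < p else 0      # Y ~ Bernoulli(p)
        x1 = rng.lognormvariate(mu1, sigma1)  # X1 ~ LN(mu1, sigma1)
        x2 = rng.lognormvariate(mu2, sigma2)  # X2 ~ LN(mu2, sigma2)
        sample.append(y * x1 - (1 - y) * x2)  # Z = Y*X1 - (1 - Y)*X2
    return sample
```

For example, `rdln(1000, 0.5, 0.0, 1.0, 1.0, 2.0, rng=random.Random(1))` gives a reproducible sample of size 1000; the proportion of positive values should be close to `p`.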
The measures examined in this simulation study are:
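- The bias of the MLEs:
$$\mathrm{Bias}(\hat{\theta}_j) = \frac{1}{M}\sum_{i=1}^{M}\left(\hat{\theta}_j^{(i)} - \theta_j\right).$$
- The mean squared error (MSE) of the MLEs:
$$\mathrm{MSE}(\hat{\theta}_j) = \frac{1}{M}\sum_{i=1}^{M}\left(\hat{\theta}_j^{(i)} - \theta_j\right)^2.$$
- The coverage probability (CP) of the 95% confidence interval of each parameter:
$$\mathrm{CP}(\theta_j) = \frac{1}{M}\sum_{i=1}^{M} I\left(\hat{\theta}_j^{(i)} - 1.96\, \mathrm{se}(\hat{\theta}_j^{(i)}) < \theta_j < \hat{\theta}_j^{(i)} + 1.96\, \mathrm{se}(\hat{\theta}_j^{(i)})\right),$$
where $\hat{\theta}_j^{(i)}$ is the MLE of the jth parameter in the ith repetition and $\mathrm{se}(\cdot)$ denotes its estimated standard error.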
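The three measures can be computed with a compact Monte Carlo loop. The following is a sketch under stated assumptions, not the code used in the paper: it uses closed-form MLEs, Wald-type intervals of the form estimate ± 1.96 × standard error, and far fewer repetitions than 10,000 in the usage example to keep the run time short. All function names are hypothetical.

```python
import math
import random


def rdln(n, theta, rng):
    """Draw n DLN variates; only the component that is needed is generated,
    which is equivalent in distribution to the RSMT construction."""
    p, mu1, s1, mu2, s2 = theta
    return [rng.lognormvariate(mu1, s1) if rng.random() < p
            else -rng.lognormvariate(mu2, s2) for _ in range(n)]


def mle_with_se(sample):
    """Closed-form MLEs (p, mu1, sigma1, mu2, sigma2) and plug-in SEs."""
    pos = [math.log(z) for z in sample if z > 0]
    neg = [math.log(-z) for z in sample if z < 0]
    n = len(pos) + len(neg)
    p = len(pos) / n
    est, se = [p], [math.sqrt(p * (1.0 - p) / n)]
    for logs in (pos, neg):  # assumes both sides are non-empty
        m = len(logs)
        mu = sum(logs) / m
        sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / m)
        est += [mu, sigma]
        se += [sigma / math.sqrt(m), sigma / math.sqrt(2.0 * m)]
    return est, se


def simulate(theta, n, M, seed=0):
    """Return per-parameter (bias, mse, cp) over M repetitions of size n."""
    rng = random.Random(seed)
    k = len(theta)
    bias, mse, cp = [0.0] * k, [0.0] * k, [0.0] * k
    for _ in range(M):
        est, se = mle_with_se(rdln(n, theta, rng))
        for j in range(k):
            err = est[j] - theta[j]
            bias[j] += err / M
            mse[j] += err * err / M
            # The Wald interval covers theta_j iff |err| < 1.96 * se.
            cp[j] += (abs(err) < 1.96 * se[j]) / M
    return bias, mse, cp
```

A call such as `simulate((0.5, 0.0, 1.0, 1.0, 2.0), n=200, M=300, seed=1)` returns three lists of length five; increasing `M` reduces the Monte Carlo error of the reported measures.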
Figure 7.
Bias of the MLEs of the parameters of the DLN distribution for the selected values of $(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$, including (0.5, 0, 1, 1, 2) (dashed line).
Figure 8.
MSE of the MLEs of the parameters of the DLN distribution for the selected values of $(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$, including (0.5, 0, 1, 1, 2) (dashed line) and (0.547, −2.812, 1.016, −2.224, 0.764).
Figure 9.
CP of the 95% confidence intervals of the parameters of the DLN distribution for $(p, \mu_1, \sigma_1, \mu_2, \sigma_2)$ = (0.3, −2, 1, −1, 2), (0.5, 0, 1, 1, 2) (dashed line), (0.8, 2, 1, −1, 2) and (0.547, −2.812, 1.016, −2.224, 0.764).
Overall, the simulation results summarized in Figures 7–9 show that the MLEs of the DLN distribution are well behaved for point as well as interval estimation.
5. Application
In this section, we apply the proposed DLN distribution to a real data set from a DNA microarray reported in [19]. According to Wikipedia, “A DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome”. The data labelled as “SID 377353, ESTs [5’:, 3’:AA055048]” consist of the following 118 observations: 0.029, 0.062, 0.011, 0.009, 0.065, −0.128, 0.133, 0.116, 0.184, 0.111, −0.066, −0.049, 0.05, 0.137, 0.162, 0.173, 0.033, 0.107, 0.11, 0.147, 0.118, 0.172, 0.284, −0.137, 0.038, −0.145, −0.181, −0.155, 0.198, 0.024, 0.079, −0.252, 0.062, 0.097, 0.032, 0.026, 0.195, 0.019, 0.138, −0.3, −0.105, −0.11, −0.168, −0.173, −0.15, 0.078, 0.113, −0.047, 0.024, 0.001, −0.075, 0.014, 0.058, −0.083, −0.339, −0.177, −0.073, −0.044, −0.106, −0.159, −0.101, −0.074, −0.126, −0.131, −0.22, −0.184, −0.105, 0.173, 0.151, 0.064, −0.007, −0.005, −0.189, −0.219, −0.301, −0.212, −0.088, 0.157, 0.042, 0.184, 0.114, 0.102, 0.119, −0.064, −0.075, 0.073, 0.038, 0.017, −0.134, −0.118, −0.097, 0.059, 0.025, −0.102, −0.096, −0.035, 0.057, −0.055, 0.015, −0.23, −0.115, 0.255, 0.034, 0.078, 0.129, 0.081, 0.032, 0.047, −0.145, 0.012, −0.224, 0.074, −0.06, −0.137, 0.034, 0.009, −0.139, −0.141.
Figure 10 shows the histogram of the data, which indicates bimodality around the origin.
Figure 10.
Histogram of the microarray data.
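For the sake of comparing the bimodal DLN distribution with other bimodal distributions, we consider the double inverse Gaussian (DIG) distribution proposed in [16]. The PDF of the DIG distribution is
$$f_Z(z) = \begin{cases} (1 - p)\, f_{X_2}(-z; \mu_2, \lambda_2), & z < 0, \\ p\, f_{X_1}(z; \mu_1, \lambda_1), & z > 0, \end{cases}$$
where
$$f_{X_j}(x; \mu_j, \lambda_j) = \sqrt{\frac{\lambda_j}{2\pi x^3}} \exp\left\{-\frac{\lambda_j (x - \mu_j)^2}{2\mu_j^2 x}\right\}, \quad x > 0, \ \mu_j > 0, \ \lambda_j > 0, \quad j = 1, 2,$$
are the PDFs of inverse Gaussian distributions.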
Table 1 gives the MLEs, their standard errors (S.E.s), estimated log-likelihoods and the Kolmogorov–Smirnov (KS), Anderson–Darling (AD) and Cramér–von Mises (CVM) goodness-of-fit tests of the fitted DIG and DLN distributions. This table shows that the MLE of $p$ and its S.E. are the same for both the fitted DIG and DLN distributions, since the Bernoulli parameter is estimated independently of the component distributions in the RSMT. In addition, this table shows that the MLEs of $\mu_1$ and $\mu_2$ in the fitted DLN distribution are both negative.
Table 1.
Summary of the fitted DIG and DLN distributions for DNA microarray data.
Table 1 shows that all three goodness-of-fit test statistics are much smaller for the fitted DLN distribution than for the fitted DIG distribution. This table also shows that the three goodness-of-fit tests reject the DIG distribution but do not reject the DLN distribution for the given data. This conclusion is supported by the diagnostic plots in Figure 11 and Figure 12. In these figures, (i) the PDF and CDF plots indicate, in an informal way, that the fitted DIG distribution may not be suitable for the given data, whereas the fitted DLN distribution may be; (ii) the quantile–quantile (Q–Q) plots show that both the fitted DIG and DLN distributions describe the tails of the distribution inappropriately; (iii) the probability–probability (P–P) plots show that the fitted DIG distribution describes the center of the distribution inappropriately, whereas the fitted DLN distribution describes it appropriately.
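To illustrate how one of the goodness-of-fit statistics discussed above is obtained, the sketch below computes the Kolmogorov–Smirnov distance between a sample and a fitted CDF, using the standard formula $D_n = \max_i \max\{i/n - F(z_{(i)}),\ F(z_{(i)}) - (i - 1)/n\}$. It covers only the KS statistic (no p-value, and not the AD or CVM statistics), and the DLN CDF is written under the assumption that the positive half-line carries the LN($\mu_1$, $\sigma_1$) component with weight $p$. Function names are illustrative.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # CDF of the standard normal distribution


def dln_cdf(z, p, mu1, sigma1, mu2, sigma2):
    """CDF of the DLN distribution (assumed sign convention, see lead-in)."""
    if z < 0:
        return (1.0 - p) * _PHI(-(math.log(-z) - mu2) / sigma2)
    if z == 0:
        return 1.0 - p
    return (1.0 - p) + p * _PHI((math.log(z) - mu1) / sigma1)


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after
        # and just before the i-th order statistic.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d
```

Given fitted values `theta_hat = (p, mu1, sigma1, mu2, sigma2)`, the statistic for a data list `data` is `ks_statistic(data, lambda z: dln_cdf(z, *theta_hat))`.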
Figure 11.
Diagnostic plots of the fitted DIG distribution.
Figure 12.
Diagnostic plots of the fitted DLN distribution.
6. Conclusions and Comments
We have proposed a bimodal distribution on the real line, referred to as the double log-normal distribution. We have derived its statistical properties, including the probability density, cumulative distribution and hazard rate functions, the moments and associated measures and harmonic mean, as well as Tsallis and Shannon entropies. Additionally, maximum likelihood estimates of the parameters and their asymptotic distribution are provided. Simulation studies showed that the maximum likelihood estimation performed well in terms of the bias, mean squared error and coverage probability of confidence intervals. Application to a DNA microarray data set showed that the proposed distribution is flexible and competitive for modeling bimodal data around the origin.
Instead of the log-normal distribution, one can consider the length biased log-normal distribution developed in [20]. It will be interesting to formulate a double length biased log-normal distribution.
Author Contributions
Conceptualization, M.E.G.; Methodology, M.F.A. and M.E.G.; Software, M.F.A. and A.N.A.; Validation, M.F.A., A.N.A. and S.N.; Formal analysis, M.F.A.; Data curation, M.F.A. and A.N.A.; Writing—original draft, M.E.G.; Writing—review & editing, S.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data are given in the paper. The code used can be obtained from the corresponding author.
Acknowledgments
The authors would like to thank the Editor and the three referees for careful reading and comments which greatly improved the paper.
Conflicts of Interest
The authors have no conflict of interest.
References
- Crow, E.L.; Shimizu, K. Lognormal Distributions: Theory and Applications; Statistics Textbooks and Monographs; Routledge: London, UK, 2018.
- Aly, E. A unified approach for developing Laplace-type distributions. J. Indian Soc. Probab. Stat. 2018, 19, 245–269.
- Laplace, P.S. Memoire sur la probabilite des causes par les evenements. Mem. L’Acad. R. Sci. Present. Divers. Savan 1774, 6, 621–656.
- Plucinska, A. On a general form of the probability density function and its application to the investigation of the distribution of rheostat resistence. Zastosow. Mat. 1966, 9, 9–19.
- Balakrishnan, N.; Kocherlakota, S. On the double Weibull distribution: Order statistics and estimation. Sankhya Indian J. Stat. Ser. B 1985, 47, 161–178.
- Kantam, R.R.L.; Narasimham, V.L. Linear estimation in reflected gamma distribution. Sankhya Indian J. Stat. Ser. B 1991, 53, 25–47.
- Nadarajah, S.; Afuecheta, E.; Chan, S. A double generalized Pareto distribution. Stat. Probab. Lett. 2013, 83, 2656–2663.
- Bindu, P.; Sangita, K. Double Lomax distribution and its applications. Statistica 2015, 75, 331–342.
- Kumar, S.C.; Jose, R. On double Lindley distribution and some of its properties. Am. J. Math. Manag. Sci. 2019, 38, 23–43.
- John, S. The three-parameter two-piece normal family of distributions and its fitting. Commun. Stat. Theory Methods 1982, 11, 879–885.
- Kimber, A. Methods for the two-piece normal distribution. Commun. Stat. Theory Methods 1985, 14, 235–245.
- Lingappaiah, G. On two-piece double exponential distribution. J. Korean Stat. Soc. 1988, 17, 46–55.
- Abdulah, E.; Elsalloukh, H. Bimodal Class based on the Inverted Symmetrized Gamma Distribution with Applications. J. Stat. Appl. Probab. 2014, 3, 1–7.
- Hoseinzadeh, A.; Maleki, M.; Khodadadi, Z.; Contreras-Reyes, J.E. The skew-reflected-Gompertz distribution for analyzing symmetric and asymmetric data. J. Comput. Appl. Math. 2019, 349, 132–141.
- Halvarsson, D. Maximum likelihood estimation of asymmetric double type II Pareto distributions. J. Stat. Theory Pract. 2020, 14, 22.
- Almutairi, A.; Ghitany, M.; Alothman, A.; Gupta, R.C. Double Inverse-Gaussian Distributions and Associated Inference. J. Indian Soc. Probab. Stat. 2023, 1–32.
- Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Cankaya, M. Asymmetric Bimodal Exponential Power Distribution on the Real Line. Entropy 2018, 20, 23.
- Ratnaparkhi, M.V.; Naik-Nimbalkar, U.V. The Length-biased Lognormal Distribution and its application in the Analysis of data from oil Field Exploration studies. J. Mod. Appl. Stat. Methods 2012, 11, 225–260.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).