Abstract
We introduce a quantile-based multivariate log-normal distribution, a new multivariate skewed distribution with positive support. Its parameters are interpretable in terms of quantiles of the marginal distributions and associations between pairs of variables, a desirable feature for statistical modeling. We derive statistical properties of the distribution, including results on transformations and closed-form expressions for the mixed moments, expected value, covariance matrix, mode, Shannon entropy, and Kullback–Leibler divergence. We also present results on marginalization, conditioning, and independence. Additionally, we discuss parameter estimation and assess its performance through simulation studies. We evaluate model fit using Mahalanobis-type distances. An application to data on children’s weights and heights is presented.
1. Introduction
Quantile regression modeling has been widely applied in different fields such as economics, environmental science, ecology, and medicine, among many others (Cade and Noon [1], Yu et al. [2]). A number of studies on nonparametric quantile regression and its applications have been developed since the seminal work of Koenker and Bassett [3]. Recently, several parametric quantile models have been studied in the regression literature, which have motivated the study of probability distributions that are useful for this purpose.
In the univariate setting, some distributions suitable for parametric quantile modeling appear in Ferrari and Fumes [4], Gijbels et al. [5], Mazucheli et al. [6], and Smithson and Shou [7]. Multivariate quantile modeling is less frequent in the statistical literature and often uses nonparametric methods. Several studies are based on extensions of the quantile concept to a multivariate setting. Some examples can be found in Breckling and Chambers [8], Kong and Mizera [9], McKeague et al. [10], and Wei [11]. Other multivariate models are based on the univariate quantile notion. For instance, Petrella and Raponi [12], Morán-Vásquez and Ferrari [13], and Morán-Vásquez et al. [14] propose methods for jointly modeling univariate marginal quantiles, taking into account the potential correlation between marginals.
In the present article, we define a quantile-based multivariate log-normal distribution. This distribution has positive support and reduces to the quantile-based log-normal distribution (Saulo et al. [15]) in the univariate setting. Moreover, the usual multivariate log-normal distribution (Morán-Vásquez and Ferrari [13] and Morán-Vásquez et al. [14]) can be expressed as a quantile-based multivariate log-normal distribution. The parameters of the proposed distribution are interpretable in terms of marginal quantiles and associations between pairs of variables, which makes the model attractive for quantile modeling of correlated, positively skewed multivariate data.
In this article, we study some statistical properties of the quantile-based multivariate log-normal family, describe the estimation of its parameters, and show its usefulness through an application to real data. We derive distributional properties obtained through transformations, as well as results related to the mixed moments, expected value, covariance matrix, mode, Shannon entropy, Kullback–Leibler divergence, marginal and conditional distributions, and independence. Some of these results, applied to a particular case, establish new properties of the multivariate log-normal distribution. We compute the maximum likelihood estimates of the parameters of the quantile-based multivariate log-normal distribution from the maximum likelihood estimates of the multivariate log-normal distribution. We evaluate the performance of the proposed estimation procedure through Monte Carlo simulations. The usefulness of the proposed distribution for modeling positively skewed multivariate data is illustrated through an analysis of real data on children’s weights and heights.
The paper is organized as follows. Section 2 presents the quantile-based multivariate log-normal distribution. Section 3 deals with the derivation of various statistical properties of the proposed distribution. Section 4 focuses on maximum likelihood estimation and simulation studies. Also, a graphical method to assess the goodness of fit is described. Section 5 presents an application to real data. Finally, Section 6 closes the paper with concluding remarks.
2. Quantile-Based Multivariate Log-Normal Distribution
We denote vectors with lowercase Greek letters in bold and matrices with capital Greek letters in bold. For vectors and matrices, the components are denoted by the respective Greek letter in normal font. For example, if and are a real matrix, then and . We denote by and the p-dimensional vectors whose components are all zero and one, respectively. We denote by the identity matrix. Let be a square matrix. We denote by and the determinant and trace of , respectively. If is a symmetric matrix, then means that is positive definite. Additionally, is the unique symmetric positive definite square root of . If and are matrices of the same dimension, then denotes the Hadamard product of and . If is a vector, then denotes the diagonal matrix with diagonal elements of , that is, . We define the set as
If and f are a real function, we denote , provided that the components of are in the domain of f. If and , we write . We denote random vectors and their components with capital Roman letters in bold and normal fonts, respectively.
It is well known that the PDF of a multivariate normal vector is given by
where is the square of the Mahalanobis distance between and with respect to . On the other hand, the random vector has a multivariate log-normal distribution with median vector and dispersion matrix , denoted by , if , where the log denotes the natural logarithm function. The PDF of is (Morán-Vásquez et al. [14])
The multivariate log-normal distribution given in (1) has a slightly different parameterization than the one used by Fang et al. ([16], Section 2.8).
Let be fixed values in . Theorem 5 and Corollary 2 of Morán-Vásquez and Ferrari [13] permit us to establish that the -quantile of satisfies , where is the -quantile of a standard normal distribution, . Note that the quantile vector can be expressed as
where and . A reparameterization of the multivariate log-normal distribution in terms of is obtained by replacing in (1). Based on this, we present the quantile-based multivariate log-normal distribution in Definition 1.
Definition 1.
Let and be fixed vectors such that is the -quantile of a standard normal distribution, . The random vector is said to have a quantile-based multivariate log-normal distribution with quantile vector and dispersion matrix , denoted by , if its PDF is
where .
If we choose , then Definition 1 coincides with the definition of a multivariate log-normal distribution (Morán-Vásquez et al. [14]). In this case, is the median vector. Note that if , which establishes the way in which the quantile-based multivariate log-normal and normal distributions are related through the logarithmic transformation.
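The link between the median-based and quantile-based parameterizations can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the names `eta` (median vector), `mu` (quantile vector), `sigma_diag` (diagonal of Σ), and `alpha` (probability levels) are labels chosen here, and the code relies solely on the fact that each log-marginal is normal with variance equal to the corresponding diagonal element of Σ.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, gives z_k = Phi^{-1}(alpha_k)


def quantile_from_median(eta, sigma_diag, alpha):
    """Marginal alpha_k-quantiles: mu_k = eta_k * exp(sqrt(sigma_kk) * z_k)."""
    return [e * exp(sqrt(s) * STD_NORMAL.inv_cdf(a))
            for e, s, a in zip(eta, sigma_diag, alpha)]


def median_from_quantile(mu, sigma_diag, alpha):
    """Inverse map: eta_k = mu_k * exp(-sqrt(sigma_kk) * z_k)."""
    return [m * exp(-sqrt(s) * STD_NORMAL.inv_cdf(a))
            for m, s, a in zip(mu, sigma_diag, alpha)]


def marginal_cdf(y, eta_k, sigma_kk):
    """P(Y_k <= y) when log Y_k ~ N(log eta_k, sigma_kk)."""
    return NormalDist(log(eta_k), sqrt(sigma_kk)).cdf(log(y))
```

With every level equal to 0.5 the two maps are the identity, which matches the remark that the median-based multivariate log-normal distribution is recovered in that case. Evaluating `marginal_cdf` at a value returned by `quantile_from_median` should return the corresponding level.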
Figure 1 displays contour plots (at levels 0.15, 0.1, 0.05, 0.02, 0.01) of the quantile-based bivariate log-normal distribution. The legend indicates the values of , , and all the parameters considered in the first plot and the values that are changed from a plot to the subsequent one (in alphabetical order). The parameters and of the distribution in Figure 1a are the marginal medians of and , respectively. For Figure 1b–f, these parameters are the first quartile of and the median of , respectively. The parameter impacts the scale of the marginal distribution of (Figure 1b,c). The parameter controls the dispersion of the marginal distribution of (Figure 1c,d). The parameter controls the association between the marginal distributions of and , ranging from a negative to positive association (Figure 1d–f).
Figure 1.
Contour plots at levels 0.15, 0.1, 0.05, 0.02, 0.01 of the joint PDF of given in Definition 1, where (a) , , , , , (b) , (c) , (d) , (e) , (f) .
The quantile-based multivariate log-normal distribution is suitable in situations where it is necessary to model quantiles of the marginals, taking into account the correlation between them. Additionally, our model can be useful for regression modeling purposes. For instance, assume that, for fixed k, , where are unknown regression parameters and are fixed covariates. So, is the multiplicative effect of a one unit increase in on the -quantile of . This is a parametric methodology that allows us to jointly analyze marginal quantiles, taking into account the association among the response variables through the dispersion matrix . These types of models can provide more accurate estimates than those that consider univariate models for each marginal assuming independence among them (Morán-Vásquez et al. [14]).
3. Main Properties
Theorems 1–3 state distributional results involving the transformation of quantile-based multivariate log-normal random vectors.
Theorem 1.
Let . If , then .
Proof.
Corollary 1.
Let . If , then .
Proof.
The result follows by applying Theorem 1 to the quantile-based multivariate log-normal distribution generated by . □
The result stated in the above corollary can also be obtained as a particular case of Theorem 3(1) of Morán-Vásquez and Ferrari [13].
Theorem 2.
Let have nonzero components. If , then , where .
Proof.
Corollary 2.
Let with nonzero components. If , then .
Proof.
The result follows by applying Theorem 2 to the quantile-based multivariate log-normal distribution generated by . □
The above corollary can also be obtained as a particular case of Theorem 3(2) of Morán-Vásquez and Ferrari [13].
Theorem 3.
Let . If , then
where .
Proof.
Since , we have
where and . This completes the proof. □
Corollary 3.
Let . If , then
Proof.
Simply apply Theorem 3 to the quantile-based multivariate log-normal distribution generated by . □
In Theorem 4, we give a closed-form expression for the mixed moments of quantile-based multivariate log-normal random vectors.
Theorem 4.
Let . If , then
Proof.
In the following corollary, we derive the expected value and the covariance matrix of a quantile-based multivariate log-normal random vector.
Corollary 4.
Let . Then,
- , where is the vector with elements being the main diagonal elements of Σ.
- , where
Proof.
For each , by choosing with all its components being 0, except the kth which is 1, in (6), we obtain
From the above expression, we get the first assertion. Similarly, for each , by choosing with all components equal to 0, except the jth and kth, which are 1, in (6), we have
The second assertion is obtained from the identity . □
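As a numerical companion to the expected value and covariance matrix, the following Python sketch evaluates the standard log-normal moment formulas after mapping the quantile vector back to the log scale. It is a minimal illustration, not the article's code: the function name `qln_mean_cov` and the argument names `mu`, `Sigma`, and `alpha` are assumptions, and the formulas used are E[Y_k] = exp(ξ_k + σ_kk/2) and Cov(Y_j, Y_k) = E[Y_j] E[Y_k] (exp(σ_jk) − 1), with ξ_k = log μ_k − sqrt(σ_kk) z_k.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def qln_mean_cov(mu, Sigma, alpha):
    """Mean vector and covariance matrix implied by (mu, Sigma, alpha).

    mu    : list of marginal alpha_k-quantiles (positive numbers)
    Sigma : dispersion matrix as a list of lists (symmetric, positive definite)
    alpha : list of probability levels in (0, 1)
    """
    p = len(mu)
    std_normal = NormalDist()
    # Location of log Y_k: xi_k = log(mu_k) - sqrt(sigma_kk) * z_k
    xi = [log(mu[k]) - sqrt(Sigma[k][k]) * std_normal.inv_cdf(alpha[k])
          for k in range(p)]
    mean = [exp(xi[k] + 0.5 * Sigma[k][k]) for k in range(p)]
    cov = [[mean[j] * mean[k] * (exp(Sigma[j][k]) - 1.0) for k in range(p)]
           for j in range(p)]
    return mean, cov
```

Because exp(σ_jk) − 1 has the same sign as σ_jk, the sign of each off-diagonal entry of the returned covariance matrix agrees with the sign of the corresponding entry of Σ.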
In Section 2, we described the behavior of the quantile-based multivariate log-normal distribution in terms of the parameters involved in the matrix . The following corollary establishes an exact interpretation of these parameters in terms of covariance between pairs of variables according to their signs.
Corollary 5.
Let . Then,
- if and only if , .
- if and only if , .
Proof.
The result follows from (8). □
Corollary 6.
Let . If , then
Moreover, , where is the vector with elements being the main diagonal elements of Σ, and , with
Proof.
Apply Theorem 4 and Corollary 4 to the quantile-based multivariate log-normal distribution generated by . □
Theorem 5 gives a closed-form expression for the mode of the quantile-based multivariate log-normal distribution.
Theorem 5.
The mode of is given by . The value of the PDF of at the mode is
Proof.
The mode of is obtained by maximizing (3) with respect to , which is the one that maximizes the function
with respect to . By using results on vector differentiation (Seber ([17], Chapter 17)), we find that the equation is equivalent to
The solution for of the above equation is .
Now, for , we have
which implies that
for all . Hence, . □
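The mode can also be computed directly. On the log scale, maximizing the density amounts to maximizing −1ᵀx − (x − ξ)ᵀΣ⁻¹(x − ξ)/2, whose unique stationary point is x = ξ − Σ1, with ξ_k = log μ_k − sqrt(σ_kk) z_k. The Python sketch below implements this under that assumption; the function and argument names are illustrative choices.

```python
from math import exp, sqrt
from statistics import NormalDist


def qln_mode(mu, Sigma, alpha):
    """Componentwise mode: mu_k * exp(-sqrt(sigma_kk) * z_k - sum_j sigma_kj)."""
    std_normal = NormalDist()
    mode = []
    for k in range(len(mu)):
        omega_k = sqrt(Sigma[k][k]) * std_normal.inv_cdf(alpha[k])
        row_sum = sum(Sigma[k])  # k-th component of Sigma times the vector of ones
        mode.append(mu[k] * exp(-omega_k - row_sum))
    return mode
```

In the univariate case with level 0.5 this reduces to the familiar log-normal mode exp(ξ − σ²).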
Corollary 7.
The mode of is given by . The value of the PDF of at the mode is
Proof.
Apply Theorem 5 to the quantile-based multivariate log-normal distribution generated by . □
Theorem 6 provides the distribution of a Mahalanobis-type distance involving a quantile-based multivariate log-normal random vector.
Theorem 6.
If , then .
Proof.
The result follows by noting that . □
The above result allows us to evaluate the goodness of fit of the quantile-based multivariate log-normal distribution by using quantile–quantile plots to compare empirical Mahalanobis distances with theoretical quantiles obtained from a chi-squared distribution with p degrees of freedom.
The Shannon entropy (also called differential entropy) of a continuous random vector with PDF is defined as
On the other hand, the Kullback–Leibler (KL) divergence between the distributions of two p-dimensional random vectors and is given by
where and denote the PDFs of and , respectively. The above expected value is defined with respect to the PDF . A detailed study about Shannon entropy and KL divergence can be found in Pardo [18].
Lemmas 1 and 2 provide the Shannon entropy and the KL divergence for the multivariate normal distribution, respectively.
Lemma 1.
The Shannon entropy of is given by
Proof.
See Pardo ([18], p. 32). □
Note that in the above lemma can be expressed as
Lemma 2.
The KL divergence between and is given by
Proof.
See Pardo ([18], p. 33). □
In the following theorem, we derive the Shannon entropy of the quantile-based multivariate log-normal distribution.
Theorem 7.
The Shannon entropy of is given by
Proof.
By definition,
By making the change of variables , with Jacobian , in the above integral, we have
where . The result follows by calculating by using Lemma 1 and replacing in the above expression. □
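The change of variables in the proof implies that the entropy equals the entropy of the underlying normal vector plus the sum of its location parameters. The following Python sketch evaluates that expression, H = (p/2)(1 + log 2π) + (1/2) log|Σ| + Σ_k (log μ_k − sqrt(σ_kk) z_k). The names `qln_entropy`, `mu`, `Sigma`, and `alpha` are assumptions made for this illustration, and the log-determinant is obtained from a hand-written Cholesky factorization so that no external library is needed.

```python
from math import log, pi, sqrt
from statistics import NormalDist


def log_det_spd(S):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(log(L[i][i]) for i in range(p))


def qln_entropy(mu, Sigma, alpha):
    """Shannon entropy: normal entropy plus the sum of log-scale locations."""
    p = len(mu)
    std_normal = NormalDist()
    xi_sum = sum(log(mu[k]) - sqrt(Sigma[k][k]) * std_normal.inv_cdf(alpha[k])
                 for k in range(p))
    normal_entropy = 0.5 * p * (1.0 + log(2.0 * pi)) + 0.5 * log_det_spd(Sigma)
    return normal_entropy + xi_sum
```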
Corollary 8.
The Shannon entropy of is given by
Proof.
The result follows by applying Theorem 7 to the quantile-based multivariate log-normal distribution generated by . □
In Theorem 8, we derive the KL divergence between two quantile-based multivariate log-normal distributions.
Theorem 8.
The KL divergence between and is given by
Proof.
By definition,
We substitute , with Jacobian , above to arrive at
where and . By using Lemma 2 to calculate we arrive at the desired result. □
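Since the logarithmic transformation is one-to-one, the KL divergence between two quantile-based log-normal vectors equals the KL divergence between the corresponding normal vectors on the log scale. The Python sketch below evaluates the usual normal expression, (1/2)[tr(Σ₂⁻¹Σ₁) + (ξ₂ − ξ₁)ᵀΣ₂⁻¹(ξ₂ − ξ₁) − p + log(|Σ₂|/|Σ₁|)], for the bivariate case only. All function and argument names are assumptions for illustration, and each distribution is allowed its own vector of probability levels.

```python
from math import log, sqrt
from statistics import NormalDist


def _log_location(mu, Sigma, alpha):
    """xi_k = log(mu_k) - sqrt(sigma_kk) * z_k for k = 1, 2."""
    std_normal = NormalDist()
    return [log(mu[k]) - sqrt(Sigma[k][k]) * std_normal.inv_cdf(alpha[k])
            for k in range(2)]


def qln2_kl(mu1, Sigma1, alpha1, mu2, Sigma2, alpha2):
    """KL divergence from distribution 1 to distribution 2 (bivariate case)."""
    xi1 = _log_location(mu1, Sigma1, alpha1)
    xi2 = _log_location(mu2, Sigma2, alpha2)
    det1 = Sigma1[0][0] * Sigma1[1][1] - Sigma1[0][1] * Sigma1[1][0]
    det2 = Sigma2[0][0] * Sigma2[1][1] - Sigma2[0][1] * Sigma2[1][0]
    # Explicit inverse of the 2 x 2 matrix Sigma2
    inv2 = [[Sigma2[1][1] / det2, -Sigma2[0][1] / det2],
            [-Sigma2[1][0] / det2, Sigma2[0][0] / det2]]
    trace = sum(inv2[i][j] * Sigma1[j][i] for i in range(2) for j in range(2))
    d = [xi2[0] - xi1[0], xi2[1] - xi1[1]]
    quad = sum(d[i] * inv2[i][j] * d[j] for i in range(2) for j in range(2))
    return 0.5 * (trace + quad - 2.0 + log(det2 / det1))
```

The divergence is zero when both arguments describe the same distribution and is not symmetric in its two arguments.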
Corollary 9.
The KL divergence between and is given by
Proof.
Take the quantile-based multivariate log-normal random vector in Theorem 8 generated by . □
Corollary 10.
The KL divergence between and is given by
Proof.
Generate the quantile-based multivariate log-normal random vector in Corollary 9 with . □
To derive results on marginal and conditional distributions and on independence for subvectors of a random vector having a quantile-based multivariate log-normal distribution, we introduce notation for partitions of , , , and as follows:
where , , , , , , , , and and are such that . The Schur complement of the block of is given by . Also, we define , , and . The dimension p is such that .
In Lemma 3, we give a factorization of the PDF of the quantile-based multivariate log-normal distribution.
Lemma 3.
Proof.
It suffices to show that
A straightforward calculation shows that
Now, using the result
we have
which is the desired result. □
In Theorem 9, we show that the quantile-based multivariate log-normal family is preserved under marginalization and conditioning. In this theorem, we also present a characterization of the independence between subvectors of this family.
Theorem 9.
Let . Consider the partitions given in (9). Then,
- .
- .
- and are independent if and only if .
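For intuition about conditioning, the bivariate case can be worked out on the log scale, where the usual normal conditioning formulas apply. The Python sketch below is an assumption-laden illustration, not a restatement of the theorem: it treats log Y₁ given Y₂ = y₂ as normal with mean ξ₁ + (σ₁₂/σ₂₂)(log y₂ − ξ₂) and variance σ₁₁ − σ₁₂²/σ₂₂ (the Schur complement), and returns a conditional quantile of Y₁. The function and argument names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def qln2_conditional_quantile(y2, mu, Sigma, alpha, level):
    """Quantile of Y1 given Y2 = y2 at probability `level` (bivariate case)."""
    std_normal = NormalDist()
    # Log-scale locations implied by the quantile parameterization
    xi1 = log(mu[0]) - sqrt(Sigma[0][0]) * std_normal.inv_cdf(alpha[0])
    xi2 = log(mu[1]) - sqrt(Sigma[1][1]) * std_normal.inv_cdf(alpha[1])
    cond_mean = xi1 + Sigma[0][1] / Sigma[1][1] * (log(y2) - xi2)
    cond_var = Sigma[0][0] - Sigma[0][1] ** 2 / Sigma[1][1]  # Schur complement
    return exp(cond_mean + sqrt(cond_var) * std_normal.inv_cdf(level))
```

When the off-diagonal element of Σ is zero, the conditional quantile at the first probability level does not depend on `y2` and equals the first component of `mu`, in line with the independence characterization.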
4. Parameter Estimation
The reparameterization used in Definition 1 permits us to compute the maximum likelihood estimates of the parameters of the quantile-based multivariate log-normal distribution through the maximum likelihood estimates of the parameters of the multivariate log-normal distribution. Let be the observed values of a random sample of . We denote the maximum likelihood estimators of and by and , respectively. From (2), we have
where , and and are the maximum likelihood estimators of the multivariate log-normal distribution given by (Morán-Vásquez et al. [14])
Note that the maximum likelihood estimator of in the quantile-based multivariate log-normal distribution is the same as in the multivariate log-normal distribution. Furthermore, this estimator is the same for any choice of .
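The estimation procedure can be sketched in a few lines. The Python code below is a minimal illustration, assuming that the log-normal maximum likelihood estimates are the sample mean of the log-observations and their covariance matrix with divisor n, and that the quantile vector estimate follows by invariance as exp(ξ̂_k + sqrt(σ̂_kk) z_k). The name `fit_qln` and its arguments are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def fit_qln(sample, alpha):
    """Maximum likelihood estimates (mu_hat, Sigma_hat) from positive data.

    sample : list of observations, each a list of p positive numbers
    alpha  : list of p probability levels in (0, 1)
    """
    n, p = len(sample), len(sample[0])
    logs = [[log(v) for v in row] for row in sample]
    # Log-scale location: componentwise mean of the log-observations
    xi_hat = [sum(row[k] for row in logs) / n for k in range(p)]
    # Dispersion matrix with divisor n (maximum likelihood, not unbiased)
    Sigma_hat = [[sum((row[j] - xi_hat[j]) * (row[k] - xi_hat[k])
                      for row in logs) / n
                  for k in range(p)] for j in range(p)]
    std_normal = NormalDist()
    mu_hat = [exp(xi_hat[k] + sqrt(Sigma_hat[k][k]) * std_normal.inv_cdf(alpha[k]))
              for k in range(p)]
    return mu_hat, Sigma_hat
```

Changing `alpha` changes only `mu_hat`; `Sigma_hat` is computed from the log-observations alone.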
We assess the goodness of fit of the quantile-based multivariate log-normal distributions by using quantile–quantile plots, comparing the empirical Mahalanobis distances , , with the theoretical quantiles , where , , obtained from a chi-squared distribution with p degrees of freedom. Additionally, we plot simulated envelopes (Atkinson [19]) for the quantile–quantile plots in order to help the comparison between quantiles and judge the adequacy of the models.
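The ingredients of the quantile–quantile plot can be computed as follows for the bivariate case. This Python sketch is illustrative: the plotting positions (i − 0.5)/n are an assumption, the function names are hypothetical, and the chi-squared quantile function is written in closed form, which is available because a chi-squared distribution with 2 degrees of freedom has CDF 1 − exp(−x/2). Simulated envelopes are not included.

```python
from math import log, sqrt
from statistics import NormalDist


def qln2_mahalanobis_sq(sample, mu, Sigma, alpha):
    """Squared Mahalanobis-type distances of log-observations (bivariate case)."""
    std_normal = NormalDist()
    xi = [log(mu[k]) - sqrt(Sigma[k][k]) * std_normal.inv_cdf(alpha[k])
          for k in range(2)]
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] ** 2
    distances = []
    for y1, y2 in sample:
        d1, d2 = log(y1) - xi[0], log(y2) - xi[1]
        # Quadratic form d' Sigma^{-1} d with the explicit 2 x 2 inverse
        q = (Sigma[1][1] * d1 * d1 - 2.0 * Sigma[0][1] * d1 * d2
             + Sigma[0][0] * d2 * d2) / det
        distances.append(q)
    return distances


def chi2_2df_quantiles(n):
    """Chi-squared (2 df) quantiles at the assumed positions (i - 0.5) / n."""
    return [-2.0 * log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
```

Plotting the sorted output of `qln2_mahalanobis_sq`, evaluated at the fitted parameters, against `chi2_2df_quantiles(n)` gives the points of the quantile–quantile plot; points close to the identity line indicate an adequate fit.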
To evaluate the estimation procedure, we conducted simulations with the quantile-based bivariate log-normal distribution. We consider the sample sizes of , and Monte Carlo replicates. The random samples of were generated through the following steps:
- Generate a random sample of of .
- Compute . Then, is a random sample of .
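The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the bivariate normal sample is taken to have mean vector with components log μ_k − sqrt(σ_kk) z_k and covariance matrix Σ, it is generated from independent standard normal draws through a hand-written 2 × 2 Cholesky factor, and the function name `rqln2` and its arguments are hypothetical.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist


def rqln2(n, mu, Sigma, alpha, seed=None):
    """Draw n observations from the quantile-based bivariate log-normal model."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    xi = [log(mu[k]) - sqrt(Sigma[k][k]) * std_normal.inv_cdf(alpha[k])
          for k in range(2)]
    # Lower-triangular Cholesky factor of Sigma
    l11 = sqrt(Sigma[0][0])
    l21 = Sigma[0][1] / l11
    l22 = sqrt(Sigma[1][1] - l21 * l21)
    sample = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Step 1: bivariate normal draw on the log scale
        x1 = xi[0] + l11 * e1
        x2 = xi[1] + l21 * e1 + l22 * e2
        # Step 2: exponentiate componentwise
        sample.append((exp(x1), exp(x2)))
    return sample
```

A quick sanity check is that the proportion of simulated first components not exceeding the first component of `mu` should be close to the first probability level, and likewise for the second component.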
The true parameter values were obtained by fitting the quantile-based bivariate log-normal distribution to the children’s data set considered in Section 5. Table 1 reports the median and the interquartile range of the parameter estimates for the investigated models. The medians approach the true parameter values and the interquartile ranges shrink as the sample size grows, indicating a satisfactory performance of the estimators. All computations were conducted in R [20].
Table 1.
Median (M) and interquartile range (IQR) of the parameter estimates of the quantile-based bivariate log-normal distributions.
5. Application
Anthropometric measures are useful for monitoring growth and identifying childhood developmental problems. The World Health Organization [21,22] provides quantile estimates of several children’s anthropometric characteristics, such as height, weight, and head and arm circumferences, among others. These estimates are obtained by fitting univariate models to each anthropometric measurement separately, ignoring the association between them. We use the quantile-based bivariate log-normal distribution to estimate quantiles of children’s weights (in kilograms) and heights (in centimeters), taking into account the natural correlation between them. We consider a sample of 587 children between 2 and 5 years of age, collected in 2018 in the El Poblado neighborhood of Medellín, Colombia [23].
The bagplot in Figure 2 shows that children’s weights and heights are positively associated with slight joint skewness and highlights two outliers. In order to estimate the third quartile of weight and the first quartile of height, we fitted the quantile-based bivariate log-normal distribution , with and . The maximum likelihood estimates of the parameters are , , , , and . Therefore, the third quartile of the children’s height is estimated to be cm, and the first quartile of the children’s weight is estimated to be kg. Since , the children’s weights and heights are estimated to be positively correlated, which is consistent with the descriptive analysis presented in Figure 2.
Figure 2.
Bagplot of weight vs. height; children’s data.
Figure 3 shows the quantile–quantile plot with simulated envelopes for the Mahalanobis distances for the fitted quantile-based bivariate log-normal distribution. This plot suggests a suitable fit.
Figure 3.
Quantile–quantile plot with simulated envelopes for the Mahalanobis distances for the fitted distribution.
6. Final Remarks
In this article, we have proposed a multivariate distribution with positive support, obtained by reparameterizing the multivariate log-normal distribution in terms of its marginal quantiles. This distribution should be of interest to researchers working on quantile modeling of correlated, positively skewed multivariate data. We derived a number of statistical properties of this distribution, covering transformations, mixed moments, expected value, covariance matrix, mode, Shannon entropy, Kullback–Leibler divergence, marginalization, conditioning, and independence. The quantile-based multivariate log-normal distribution is rich in theoretical properties and mathematically tractable. Parameter estimation was carried out by maximum likelihood, and the satisfactory behavior of the estimation procedure was verified through simulation studies. A graphical diagnostic tool was also employed to assess the quality of the fitted distributions. Finally, an application to real data was presented and discussed as an alternative for estimating quantiles of children’s weights and heights while accounting for the natural association between these variables.
There are several aspects that will be addressed in future articles. Bayesian approaches for the estimation of the parameters of the quantile-based multivariate log-normal distribution will be developed. The study of regression models based on the quantile-based multivariate log-normal distributions together with inferential developments and applications to real data will also be undertaken. These models will allow us to analyze the relationship between marginal quantiles of response vectors and a set of explanatory variables, taking into account the potential association among the marginal response variables. Additionally, a comparative analysis of this methodology with the model proposed by Petrella and Raponi [12] will be included in a forthcoming article.
Author Contributions
Conceptualization, R.A.M.-V., A.R.-C. and D.K.N.; methodology, R.A.M.-V., A.R.-C. and D.K.N.; investigation, R.A.M.-V., A.R.-C. and D.K.N.; writing-original draft preparation, R.A.M.-V., A.R.-C. and D.K.N.; writing-review and editing, R.A.M.-V., A.R.-C. and D.K.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
References
- Cade, B.S.; Noon, B.R. A gentle introduction to quantile regression for ecologists. Front. Ecol. Environ. 2003, 1, 412–420.
- Yu, K.; Lu, Z.; Stander, J. Quantile regression: Applications and current research areas. J. R. Stat. Soc. Ser. D 2003, 52, 331–350.
- Koenker, R.; Bassett, G., Jr. Regression quantiles. Econometrica 1978, 46, 33–50.
- Ferrari, S.L.P.; Fumes, G. Box–Cox symmetric distributions and applications to nutritional data. Adv. Stat. Anal. 2017, 101, 321–344.
- Gijbels, I.; Karim, R.; Verhasselt, A. Semiparametric quantile regression using family of quantile-based asymmetric densities. Comput. Stat. Data Anal. 2021, 157, 107–129.
- Mazucheli, J.; Alves, B.; Menezes, A.; Leiva, V. An overview on parametric quantile regression models and their computational implementation with applications to biomedical problems including COVID-19 data. Comput. Methods Programs Biomed. 2022, 221, 106816.
- Smithson, M.; Shou, Y. CDF-quantile distributions for modelling random variables on the unit interval. Br. J. Math. Stat. Psychol. 2017, 70, 412–438.
- Breckling, J.; Chambers, R. M-Quantiles. Biometrika 1988, 75, 761–771.
- Kong, L.; Mizera, I. Quantile tomography: Using quantiles with multivariate data. Stat. Sin. 2012, 22, 1589–1610.
- McKeague, I.W.; López-Pintado, S.; Hallin, M.; Šiman, M. Analyzing growth trajectories. J. Dev. Orig. Health Dis. 2011, 2, 322–329.
- Wei, Y. An approach to multivariate covariate-dependent quantile contours with application to bivariate conditional growth charts. J. Am. Stat. Assoc. 2008, 103, 397–409.
- Petrella, L.; Raponi, V. Joint estimation of conditional quantiles in multivariate linear regression models with an application to financial distress. J. Multivar. Anal. 2019, 173, 70–84.
- Morán-Vásquez, R.A.; Ferrari, S.L.P. Box-Cox elliptical distributions with application. Metrika 2018, 82, 547–571.
- Morán-Vásquez, R.A.; Mazo-Lopera, M.A.; Ferrari, S.L.P. Quantile modeling through multivariate log-normal/independent linear regression models with application to newborn data. Biom. J. 2021, 63, 1290–1308.
- Saulo, H.; Dasilva, A.; Leiva, V.; Sánchez, L.; de la Fuente-Mella, H. Log-symmetric quantile regression models. Stat. Neerl. 2022, 76, 124–163.
- Fang, K.T.; Kotz, S.; Ng, K.W. Symmetric Multivariate and Related Distributions; Chapman and Hall: London, UK, 1990.
- Seber, G.A.F. A Matrix Handbook for Statisticians; John Wiley & Sons: Hoboken, NJ, USA, 2008.
- Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006.
- Atkinson, A.C. Two graphical displays for outlying and influential observations in regression. Biometrika 1981, 68, 13–20.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 21 July 2023).
- World Health Organization. WHO Child Growth Standards: Length/Height-for-Age, Weight-for-Age, Weight-for-Length, Weight-for-Height and Body Mass Index-for-Age: Methods and Development; World Health Organization: Geneva, Switzerland, 2006. Available online: https://apps.who.int/iris/handle/10665/43413 (accessed on 25 May 2023).
- World Health Organization. WHO Child Growth Standards: Head Circumference-for-Age, Arm Circumference-for-Age, Triceps Skinfold-for-Age and Subscapular Skinfold-for-Age: Methods and Development; World Health Organization: Geneva, Switzerland, 2007. Available online: https://apps.who.int/iris/handle/10665/43706 (accessed on 25 May 2023).
- MEData: Portal de datos de Medellín. Estado Nutricional de Menores de 6 años Programa de Crecimiento y Desarrollo. 2022. Available online: http://medata.gov.co/dataset/estado-nutricional-de-menores-de-6-anos-programa-de-crecimiento-y-desarrollo (accessed on 21 July 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).