Article

The Sampling Distribution of the Total Correlation for Multivariate Gaussian Random Variables

Department of Mathematics and Statistics, Queen’s University, Kingston, ON K7L 3N6, Canada
* Author to whom correspondence should be addressed.
Entropy 2019, 21(10), 921; https://doi.org/10.3390/e21100921
Submission received: 19 July 2019 / Revised: 9 September 2019 / Accepted: 16 September 2019 / Published: 22 September 2019
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The sampling distribution of the total correlation (TC) for a d-dimensional standardized multivariate Gaussian random variable with an identity covariance matrix is derived. It is shown to be the distribution of a sum of generalized beta random variables. It is also shown that, for large dimension and sample size, a central limit theorem holds, providing a Gaussian approximation to the sampling distribution for high dimensional data.

1. Introduction

Mutual information quantifies the information shared between two random variables [1,2,3]. This concept can be generalized to d variables in a variety of ways [4,5,6,7], with the most direct generalization being Watanabe’s total correlation (TC),
$$T(\mathbf{X}) \equiv \sum_{i=1}^{d} h(X_i) - h(\mathbf{X}) \qquad (1)$$
where $\mathbf{X}$ is a vector whose components are the $d$ random variables $X_1, \dots, X_d$, and, for continuous random variables, $h(X_i)$ is the differential entropy of $X_i$ and $h(\mathbf{X})$ is the joint differential entropy of $\mathbf{X}$.
Total correlation is also sometimes called multivariate mutual information, and it is the Kullback–Leibler divergence between the joint density of $\mathbf{X}$ and the density obtained by taking the product of the marginal densities of the $X_i$. Thus, the total correlation $T(\mathbf{X})$ quantifies, in a quite general sense, the information shared among all the $d$ random variables. The total correlation is non-negative, and in the case where all $d$ random variables are mutually independent we have $T(\mathbf{X}) = 0$ [7,8]. For the special case where $\mathbf{X}$ is multivariate Gaussian with arbitrary mean and covariance matrix $\Sigma$, the total correlation can be written explicitly as
$$T(\mathbf{X}) = \frac{1}{2}\sum_{i=1}^{d}\log \sigma_{ii}^{2} \;-\; \frac{1}{2}\log|\Sigma| \qquad (2)$$
where $\sigma_{ij}^{2}$ is the $ij$th entry of $\Sigma$. When the $X_i$ are independent we have $\sigma_{ij}^{2} = 0$ for all $i \neq j$, and so $\log|\Sigma| = \log \sigma_{11}^{2}\sigma_{22}^{2}\cdots\sigma_{dd}^{2}$, giving $T(\mathbf{X}) = 0$ in Equation (2) as expected.
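For readers who want to evaluate Equation (2) numerically, the following short Python sketch (an illustration added here, not part of the original analysis; the function name total_correlation_gaussian and the use of NumPy are our own choices) computes $T(\mathbf{X})$ in nats from a given covariance matrix.

```python
import numpy as np

def total_correlation_gaussian(Sigma):
    """Total correlation of a multivariate Gaussian with covariance Sigma (Equation (2)), in nats."""
    Sigma = np.asarray(Sigma, dtype=float)
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # Sum of log marginal variances minus log-determinant of the joint covariance.
    return 0.5 * np.sum(np.log(np.diag(Sigma))) - 0.5 * logdet

# Independent components give T(X) = 0; correlated components give T(X) > 0.
print(total_correlation_gaussian(np.eye(3)))                  # 0.0
print(total_correlation_gaussian([[1.0, 0.5], [0.5, 1.0]]))   # -0.5*log(0.75), about 0.144
```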
The total correlation provides a natural way to quantify dependencies among a set of random variables. For example, often we seek to determine if a set of random variables are mutually independent because dependency among variables can indicate interesting and meaningful relationships in nature. To do so one can take a sample from the unknown distribution and compute the total correlation from this sample. Even if the random variables are mutually independent, however, the total correlation measured using such a finite sample will typically be positive (rather than zero) simply because of sampling variation. Therefore, it is of interest to know the sampling distribution of the total correlation under independence. Once we have the sampling distribution we can then perform statistical tests of independence. Here we derive the sampling distribution of (2) in the case where the X i are standardized (i.e., zero mean, unit variance), independent, Gaussian random variables.
Previous authors have proposed exact expressions for the mean and variance of the sample total correlation [9,10]. In fact, Guerrero (Section 2.1 of [9]) derived a moment generating function for the sample total correlation using the distribution of the log-determinant of a Wishart matrix (see Wilks [11,12]). Unfortunately, the asymptotic approximation of Guerrero’s result does not match the results of Marrelec [10], suggesting that one of the two is incorrect. We will resolve this discrepancy by deriving the moment generating function directly from our expression for the probability density function of the sample total correlation. In the limit of large sample size our results match those presented in Section 4.1 of Marrelec [10], suggesting that the moment generating function of [9] is incorrect.

2. Definitions and Preliminaries

Let $\mathbf{X}$ represent a $d$-variate zero-mean Gaussian random variable with covariance matrix $\Sigma = I_d$, where $I_d$ is the $d$-dimensional identity matrix. Let $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ denote a sample of $n$ draws from the distribution of $\mathbf{X}$. We focus on the case where $n \ge d$. The sample covariance matrix is $\hat{\Sigma} = (1/n)\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^{\top} = \{\hat{\sigma}_{ij}^{2}\}$, and $n\hat{\Sigma}$ is Wishart distributed with $n$ degrees of freedom, which we denote as $n\hat{\Sigma} \sim W(\Sigma, d, n)$. From Equation (2) the sample total correlation is then also a random variable and is computed as
$$\hat{T}_{d,n}(\mathbf{X}) = \frac{1}{2}\sum_{i=1}^{d}\log \hat{\sigma}_{ii}^{2} \;-\; \frac{1}{2}\log|\hat{\Sigma}| \qquad (3)$$
where the subscripts $d$ and $n$ indicate that $\hat{T}$ is a family of random variables indexed by dimension and sample size.
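As an illustrative sketch (ours, not taken from the paper; the helper name sample_total_correlation is hypothetical), Equation (3) can be computed directly from an $n$-by-$d$ matrix of draws. Even when the columns are independent the result is typically positive, which is exactly the sampling variation the remainder of the paper characterizes.

```python
import numpy as np

def sample_total_correlation(X):
    """Sample total correlation (Equation (3)) from an n-by-d matrix of zero-mean draws.

    Sigma_hat = (1/n) X'X as in the text; the 1/n factor cancels between the two terms,
    so the value is unaffected by that scaling.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Sigma_hat = X.T @ X / n
    sign, logdet = np.linalg.slogdet(Sigma_hat)
    return 0.5 * np.sum(np.log(np.diag(Sigma_hat))) - 0.5 * logdet

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))    # independent standard Gaussian columns
print(sample_total_correlation(X))   # positive despite independence, purely from sampling variation
```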
Odell and Feiveson’s 1966 result [13] provides a convenient way to characterize a Wishart-distributed matrix. Suppose that $V_i^{(n)}$ ($1 \le i \le d$) are independent chi-square random variables with $n - i + 1$ degrees of freedom. Suppose that $N_{ij}$ are independent standardized normal random variables for $1 \le i < j \le d$, also independent of every $V_i^{(n)}$. Now construct the random variables
$$\begin{aligned}
b_{11} &= V_1^{(n)} \\
b_{jj} &= V_j^{(n)} + \sum_{i=1}^{j-1} N_{ij}^{2}, & 2 \le j \le d \\
b_{1j} &= N_{1j}\sqrt{V_1^{(n)}}, & 2 \le j \le d \\
b_{ij} &= N_{ij}\sqrt{V_i^{(n)}} + \sum_{k=1}^{i-1} N_{ki}N_{kj}, & 2 \le i < j \le d.
\end{aligned} \qquad (4)$$
Then the matrix $\mathbf{B} = \{b_{ij}\}$ (with $b_{ij} = b_{ji}$) is Wishart-distributed $W(I_d, d, n)$, and thus we have
$$n\hat{\sigma}_{ii}^{2} \;\sim\; b_{ii} \;\sim\; V_i^{(n)} + A_i, \qquad 1 \le i \le d \qquad (5)$$
where the $A_i$ are independent chi-square random variables with $i - 1$ degrees of freedom and we define $A_1 = 0$. Now, following [14], we can also define the lower-triangular matrix $\mathbf{T} = \{t_{ij}\}$ as
$$t_{ii} = \sqrt{V_i^{(n)}}, \qquad t_{ij} = N_{ji} \ \ (1 \le j < i \le d), \qquad t_{ij} = 0 \ \ (i < j \le d) \qquad (6)$$
and thus $\mathbf{B} = \mathbf{T}\mathbf{T}^{\top}$. Furthermore, $|\mathbf{B}| = |\mathbf{T}\mathbf{T}^{\top}| = |\mathbf{T}|^{2} = \prod_{i=1}^{d} t_{ii}^{2} = \prod_{i=1}^{d} V_i^{(n)}$, revealing that
$$n^{d}\,|\hat{\Sigma}| \;\sim\; \prod_{i=1}^{d} V_i^{(n)}. \qquad (7)$$
Result (7) is a special case of results found in Wilks [11]. For analogous results involving complex matrices see Goodman [15].
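The construction in (4)–(6) is easy to simulate. The sketch below (our illustration; the function name and the use of NumPy's Generator are assumptions, not part of the cited results) builds the lower-triangular matrix $\mathbf{T}$ of (6) and returns $\mathbf{B} = \mathbf{T}\mathbf{T}^{\top}$, so that the identities (5) and (7) can be checked empirically.

```python
import numpy as np

def wishart_identity_bartlett(d, n, rng):
    """Draw B ~ W(I_d, d, n) via the triangular construction (6), B = T T'.

    Diagonal: t_ii = sqrt(V_i) with V_i chi-square with n - i + 1 degrees of freedom
    (paper indexing); below-diagonal entries are independent standard normals.
    """
    T = np.zeros((d, d))
    for i in range(d):                              # Python index i corresponds to paper index i + 1
        T[i, i] = np.sqrt(rng.chisquare(n - i))     # n - (i + 1) + 1 = n - i degrees of freedom
        T[i, :i] = rng.standard_normal(i)
    return T @ T.T

rng = np.random.default_rng(1)
B = wishart_identity_bartlett(4, 50, rng)
# |B| equals the product of the diagonal chi-square variables (cf. result (7) after scaling by n^d).
print(np.linalg.det(B))
```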

3. The Sampling Distribution of the Total Correlation

With the above preliminaries in place, we can now state the following theorem.
Theorem 1
(The Sampling Distribution of TC). Consider a sample of size n from a set of d independent, standardized, Gaussian random variables, with $n \ge d$. The total correlation (TC) is distributed as
$$\hat{T}_{d,n}(\mathbf{X}) \;\sim\; \frac{1}{2}\sum_{i=1}^{d-1}\log\!\left(1 + \frac{i}{n-i}\,F_{i,n-i}\right) \qquad (8)$$
where the $F_{i,n-i}$ are independent F-distributed random variables with $i$ and $n-i$ degrees of freedom. Equivalently, (8) can be written as
$$\hat{T}_{d,n}(\mathbf{X}) \;\sim\; \sum_{i=1}^{d-1} Y_{i,n} \qquad (9)$$
where $Y_{i,n}$ is a beta-exponential random variable with probability density
$$f_{Y_{i,n}}(y) = \frac{\lambda\,(1 - e^{-\lambda y})^{\frac{i}{2}-1}\,(e^{-\lambda y})^{\frac{n-i}{2}}}{B\!\left(\frac{i}{2}, \frac{n-i}{2}\right)}, \qquad y > 0 \qquad (10)$$
having parameter $\lambda = 2$.
Proof. 
Writing Equation (3) as
$$\hat{T}_{d,n}(\mathbf{X}) = \frac{1}{2}\log\frac{\prod_{i=1}^{d}\hat{\sigma}_{ii}^{2}}{|\hat{\Sigma}|}$$
and using results (5) and (7), one obtains
$$\hat{T}_{d,n}(\mathbf{X}) \;\sim\; \frac{1}{2}\log\frac{\prod_{i=1}^{d}\bigl(V_i^{(n)} + A_i\bigr)}{\prod_{i=1}^{d} V_i^{(n)}} \;\sim\; \frac{1}{2}\log\prod_{i=1}^{d}\left(1 + \frac{A_i}{V_i^{(n)}}\right) \;\sim\; \frac{1}{2}\sum_{i=1}^{d}\log\!\left(1 + \frac{A_i}{V_i^{(n)}}\right).$$
Scaling each chi-square random variable by its corresponding degrees of freedom and re-indexing yields (8). Equivalently, if we define $Y_{i,n} = \frac{1}{2}\log\!\left(1 + \frac{i}{n-i}F_{i,n-i}\right)$ then $\hat{T}_{d,n}(\mathbf{X}) \sim \sum_{i=1}^{d-1} Y_{i,n}$, and using standard techniques it can be shown that the random variable $Y_{i,n}$ has probability density
$$f_{Y_{i,n}}(y) = \frac{2\,(1 - e^{-2y})^{\frac{i}{2}-1}\,(e^{-2y})^{\frac{n-i}{2}}}{B\!\left(\frac{i}{2}, \frac{n-i}{2}\right)}, \qquad y > 0$$
where B ( x , y ) is the beta function. ☐
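Theorem 1 can be checked by simulation. The sketch below (our illustration; variable names are arbitrary and SciPy supplies the F and Kolmogorov–Smirnov routines) compares the empirical distribution of the sample total correlation under independence with draws from the F-representation in (8); the two samples should be statistically indistinguishable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
d, n, reps = 5, 40, 20000

def t_hat(X):
    """Sample total correlation (Equation (3)) of an n-by-d matrix of zero-mean draws."""
    S = X.T @ X / X.shape[0]
    return 0.5 * np.sum(np.log(np.diag(S))) - 0.5 * np.linalg.slogdet(S)[1]

# Empirical sampling distribution under independence.
empirical = np.array([t_hat(rng.standard_normal((n, d))) for _ in range(reps)])

# Theorem 1: one half the sum of log(1 + i/(n-i) * F_{i,n-i}) over i = 1, ..., d-1.
i = np.arange(1, d)
F = stats.f.rvs(i, n - i, size=(reps, d - 1), random_state=rng)
theoretical = 0.5 * np.log(1.0 + (i / (n - i)) * F).sum(axis=1)

# A two-sample Kolmogorov-Smirnov test should not reject equality of the two distributions.
print(stats.ks_2samp(empirical, theoretical))
```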
Corollary 1.
The moment generating function for $\hat{T}_{d,n}(\mathbf{X})$ is
$$M_{d,n}(t) = \left[\frac{\Gamma\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{n-t}{2}\right)}\right]^{d-1} \prod_{i=1}^{d-1}\frac{\Gamma\!\left(\frac{n-i-t}{2}\right)}{\Gamma\!\left(\frac{n-i}{2}\right)} \qquad (11)$$
where $\Gamma(x)$ is the gamma function. The mean and variance of $\hat{T}_{d,n}(\mathbf{X})$ are therefore
$$\mu_{d,n} = \frac{d-1}{2}\,\psi(n/2) - \frac{1}{2}\sum_{i=1}^{d-1}\psi\!\left(\frac{n-i}{2}\right), \qquad \sigma_{d,n}^{2} = -\frac{d-1}{4}\,\psi^{(1)}(n/2) + \frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\frac{n-i}{2}\right) \qquad (12)$$
where $\psi(x) = \Gamma'(x)/\Gamma(x)$ is the digamma function and $\psi^{(k)}(x)$ denotes its $k$th derivative.
Proof. 
Taking $Y_{i,n} = \frac{1}{2}\log\!\left(1 + \frac{i}{n-i}F_{i,n-i}\right)$, the moment generating function for $Y_{i,n}$ is
$$\phi_{i,n}(t) = \mathbb{E}\!\left[e^{tY_{i,n}}\right] = \frac{\Gamma\!\left(\frac{n}{2}\right)\,\Gamma\!\left(\frac{n-i-t}{2}\right)}{\Gamma\!\left(\frac{n-i}{2}\right)\,\Gamma\!\left(\frac{n-t}{2}\right)}.$$
The random variables in the sum $\sum_{i=1}^{d-1} Y_{i,n}$ are independent, and therefore the moment generating function $M_{d,n}(t)$ for $\hat{T}_{d,n}(\mathbf{X})$ is the appropriate product of the functions $\phi_{i,n}(t)$. Equations (12) then follow directly from the properties of moment generating functions. ☐
Guerrero [9] obtained a formula for the mean and variance of $\hat{T}_{d,n}(\mathbf{X})$ (except for a typo in the variance) using Wilks’ [12] moment generating function for the generalized variance. These are remarkably close to (12), but the moment generating function for the sample total correlation proposed in Guerrero [9] appears to be incorrect.
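As a sketch of how Corollary 1 can be evaluated in practice (our illustration; SciPy's digamma and polygamma routines supply $\psi$ and $\psi^{(1)}$), the exact mean and variance in (12) reduce to a few lines:

```python
import numpy as np
from scipy.special import digamma, polygamma

def tc_mean_variance(d, n):
    """Exact mean and variance (Equations (12)) of the sample total correlation under independence."""
    i = np.arange(1, d)  # i = 1, ..., d-1
    mean = 0.5 * (d - 1) * digamma(n / 2) - 0.5 * np.sum(digamma((n - i) / 2))
    var = -0.25 * (d - 1) * polygamma(1, n / 2) + 0.25 * np.sum(polygamma(1, (n - i) / 2))
    return mean, var

print(tc_mean_variance(d=5, n=40))   # both quantities shrink toward zero as n grows for fixed d
```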

4. A Central Limit Theorem for the Total Correlation

Girko’s central limit theorem [16] implies asymptotic normality of the sample log-determinant, as seen in the work of Bao et al. [17] and Cai et al. [18]. This suggests the existence of a central limit theorem for $\hat{T}_{d,n}(\mathbf{X})$ when the dimension d and sample size n are large. Here we provide such a result.
Define the mean and variance of $Y_{i,n}$ as $m_{i,n} = \mathbb{E}[Y_{i,n}]$ and $s_{i,n}^{2} = \mathbb{E}[(Y_{i,n} - m_{i,n})^{2}]$, and define the mean-centered random variables $Y_{i,n}^{*} = Y_{i,n} - m_{i,n}$. Note that $\sigma_{d,n}^{2} = \sum_{i=1}^{d-1} s_{i,n}^{2}$.
Theorem 2
(Asymptotic normality of TC). Suppose $n \to \infty$ and $d \to \infty$ in such a way that $n/d \to k$, where $1 \le k < \infty$. Then
$$\frac{1}{\sqrt{\sigma_{d,n}^{2}}}\sum_{i=1}^{d-1} Y_{i,n}^{*} \;\longrightarrow\; N(0,1) \qquad (13)$$
where convergence is in distribution. Thus, for large n and d (with $n \ge d$) the total correlation $\hat{T}_{d,n}(\mathbf{X})$ is approximately normally distributed with mean and variance given by $\mu_{d,n}$ and $\sigma_{d,n}^{2}$ in Equations (12).
Proof. 
The $Y_{i,n}^{*}$ are a triangular array of random variables such that, for any fixed n, the $Y_{i,n}^{*}$ ($1 \le i \le d-1$) are independent. Thus, (13) will hold provided that the Lyapunov condition is satisfied [19]; namely, that there exists a $\delta > 0$ such that
$$\lim_{d,n\to\infty} \frac{1}{\sigma_{d,n}^{2+\delta}} \sum_{i=1}^{d-1} \mathbb{E}\!\left[\,\bigl|Y_{i,n}^{*}\bigr|^{2+\delta}\right] = 0.$$
For $\delta = 2$ the entries in Lyapunov’s summation represent each $Y_{i,n}$’s fourth central moment, for which the generating function is $C_{i,n}(t) = e^{-m_{i,n}t}\,\phi_{i,n}(t)$. The summation therefore becomes
$$\begin{aligned}
\sum_{i=1}^{d-1}\mathbb{E}\!\left[(Y_{i,n}^{*})^{4}\right] &= \sum_{i=1}^{d-1}\frac{1}{16}\left\{3\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(3)}(n/2)\right\} \\
&= \frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{16}\,\psi^{(3)}(n/2)
\end{aligned}$$
while the denominator in Lyapunov’s condition is
$$\sigma_{d,n}^{4} = \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}.$$
In Appendix A we show that
$$0 \;\le\; \frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{16}\,\psi^{(3)}(n/2) \;\le\; \frac{48}{n-d+1}$$
and, for any fixed $1 \le k < \infty$, and for sufficiently large d and n with $n/d$ sufficiently close to k,
$$\frac{1}{4}\left[\ln\frac{n}{n-d+1} + \frac{d-1}{n(n-d+1)} - \frac{d-1}{2}\left(\frac{2}{n}+\frac{4}{n^{2}}\right)\right]^{2} \;\le\; \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}.$$
Therefore, for any fixed $1 \le k < \infty$, and for sufficiently large d and n with $n/d$ sufficiently close to k, we have
$$0 \;\le\; \frac{1}{\sigma_{d,n}^{4}}\sum_{i=1}^{d-1}\mathbb{E}\!\left[\,\bigl|Y_{i,n}^{*}\bigr|^{4}\right] \;\le\; \frac{\dfrac{48}{n-d+1}}{\dfrac{1}{4}\left[\ln\dfrac{n}{n-d+1} + \dfrac{d-1}{n(n-d+1)} - \dfrac{d-1}{2}\left(\dfrac{2}{n}+\dfrac{4}{n^{2}}\right)\right]^{2}}. \qquad (14)$$
Now first consider the case where n = d (and therefore k = 1 ). Then (14) simplifies to
$$0 \;\le\; \frac{1}{\sigma_{d,n}^{4}}\sum_{i=1}^{d-1}\mathbb{E}\!\left[\,\bigl|Y_{i,n}^{*}\bigr|^{4}\right] \;\le\; \frac{48}{\dfrac{1}{4}\left[\ln n + \dfrac{n-1}{n} - \dfrac{n-1}{2}\left(\dfrac{2}{n}+\dfrac{4}{n^{2}}\right)\right]^{2}}.$$
Taking the limit $n \to \infty$ yields zero on the right-hand side, verifying Lyapunov’s condition for $k = 1$. Next, consider the case where $n > d$. Taking the limit in (14) as $n \to \infty$ and $d \to \infty$ in such a way that $n/d \to k$, where $1 < k < \infty$, we again see that the right-hand side tends to zero. This verifies Lyapunov’s condition in the case where $k > 1$, thereby completing the proof. ☐
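To illustrate how Theorem 2 might be used (this sketch is ours; the function name tc_independence_test is hypothetical, and the one-sided p-value is a pragmatic choice since dependence inflates the total correlation), the Gaussian approximation with moments (12) gives an approximate z-test of mutual independence for large n and d:

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import norm

def tc_independence_test(X):
    """Approximate z-test of mutual independence based on Theorem 2 (illustrative sketch).

    Assumes the columns of X are already standardized (zero mean, unit variance),
    matching the setting of the paper, and that n and d are large enough for the
    normal approximation to be reasonable.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    S = X.T @ X / n
    t_hat = 0.5 * np.sum(np.log(np.diag(S))) - 0.5 * np.linalg.slogdet(S)[1]
    i = np.arange(1, d)
    mu = 0.5 * (d - 1) * digamma(n / 2) - 0.5 * np.sum(digamma((n - i) / 2))
    var = -0.25 * (d - 1) * polygamma(1, n / 2) + 0.25 * np.sum(polygamma(1, (n - i) / 2))
    z = (t_hat - mu) / np.sqrt(var)
    return t_hat, z, norm.sf(z)       # one-sided p-value: large TC indicates dependence

rng = np.random.default_rng(3)
print(tc_independence_test(rng.standard_normal((500, 20))))   # independent columns: p typically not small
```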

5. Conclusions

The total correlation of a multivariate random variable (sometimes called multivariate mutual information) is the Kullback–Leibler divergence between the joint density of the random variable and the product of its marginal densities. It therefore provides a natural measure of the degree of independence of a set of random variables. In this paper we derived the sampling distribution of the total correlation for a d-dimensional standardized multivariate Gaussian random variable with identity covariance matrix, and showed that it is the distribution of a sum of generalized beta random variables. We also proved that, for large dimension and sample size, a central limit theorem holds, providing a Gaussian approximation to the sampling distribution for high dimensional data.

Author Contributions

Conceptualization, T.R. and T.D.; methodology, T.R. and T.D.; formal analysis, T.R. and T.D.; investigation, T.R. and T.D.; writing–original draft preparation, T.R.; writing–review and editing, T.D.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The proof of the central limit theorem result makes use of the following two lemmas. Both are based on an inequality for the derivatives of the digamma function found in [20] (where $m \ge 1$ is an integer):
$$\frac{(m-1)!}{x^{m}} + \frac{m!}{2x^{m+1}} \;\le\; (-1)^{m+1}\psi^{(m)}(x) \;\le\; \frac{(m-1)!}{x^{m}} + \frac{m!}{x^{m+1}}. \qquad (A1)$$
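As a quick sanity check of (A1) (our illustration only; it is not part of the proof), the bounds can be verified numerically at a few points using SciPy's polygamma:

```python
from math import factorial
from scipy.special import polygamma

# Spot-check inequality (A1) for a few (m, x) pairs.
for m in (1, 2, 3):
    for x in (0.5, 2.0, 10.0):
        lower = factorial(m - 1) / x**m + factorial(m) / (2 * x**(m + 1))
        upper = factorial(m - 1) / x**m + factorial(m) / x**(m + 1)
        middle = (-1)**(m + 1) * polygamma(m, x)
        assert lower <= middle <= upper, (m, x)
print("inequality (A1) holds at all checked points")
```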
Lemma A1.
Suppose $d \le n$. Then the following inequality holds:
$$0 \;\le\; \frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{16}\,\psi^{(3)}(n/2) \;\le\; \frac{48}{n-d+1}.$$
Proof. 
The left-hand inequality follows from the fact that $\psi^{(1)}(x)$ and $\psi^{(3)}(x)$ are both monotonically decreasing functions, and so $\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) \ge \psi^{(1)}\!\left(\tfrac{n}{2}\right)$ and $\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) \ge \psi^{(3)}\!\left(\tfrac{n}{2}\right)$ for all $1 \le i \le d-1$. For the right-hand inequality we have
$$\begin{aligned}
&\frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{16}\,\psi^{(3)}(n/2) \\
&\qquad\le \frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) \\
&\qquad\le \frac{3}{16}\sum_{i=1}^{d-1}\left(\frac{2}{n-i} + \frac{4}{(n-i)^{2}}\right)^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\left(\frac{16}{(n-i)^{3}} + \frac{96}{(n-i)^{4}}\right) \\
&\qquad\le \frac{3}{16}\sum_{i=1}^{d-1}\left(\frac{8}{n-i}\right)^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\frac{192}{(n-i)^{3}} \\
&\qquad= 12\sum_{j=n-d+1}^{n-1}\left(\frac{1}{j^{2}} + \frac{1}{j^{3}}\right) \\
&\qquad= 12\left[\frac{1}{(n-d+1)^{2}} + \frac{1}{(n-d+1)^{3}} + \sum_{j=n-d+2}^{n-1}\left(\frac{1}{j^{2}} + \frac{1}{j^{3}}\right)\right] \\
&\qquad\le 12\left[\frac{1}{(n-d+1)^{2}} + \frac{1}{(n-d+1)^{3}} + \int_{n-d+1}^{n-1}\left(\frac{1}{x^{2}} + \frac{1}{x^{3}}\right)dx\right] \\
&\qquad= 12\left[\frac{1}{(n-d+1)^{2}} + \frac{1}{(n-d+1)^{3}} + \frac{1}{n-d+1} - \frac{1}{n-1} + \frac{1}{2(n-d+1)^{2}} - \frac{1}{2(n-1)^{2}}\right] \\
&\qquad\le 12\left[\frac{1}{(n-d+1)^{2}} + \frac{1}{(n-d+1)^{3}} + \frac{1}{n-d+1} + \frac{1}{2(n-d+1)}\right] \\
&\qquad\le \frac{48}{n-d+1}.
\end{aligned}$$
 ☐
Lemma A2.
Suppose $d \le n$. Then, for any fixed $1 \le k < \infty$, and for sufficiently large d and n with $n/d$ sufficiently close to k, the following inequality holds:
$$\frac{1}{4}\left[\ln\frac{n}{n-d+1} + \frac{d-1}{n(n-d+1)} - \frac{d-1}{2}\left(\frac{2}{n}+\frac{4}{n^{2}}\right)\right]^{2} \;\le\; \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}. \qquad (A2)$$
Proof. 
First note that the quantity in the parentheses on the right-hand side is positive because $\psi^{(1)}(x)$ is a monotonically decreasing function and so $\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) \ge \psi^{(1)}\!\left(\tfrac{n}{2}\right)$ for all $1 \le i \le d-1$. Thus, if for some quantity A we have $0 \le A \le \frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)$, then $A^{2} \le \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}$. We construct such a quantity A as follows. First consider the summation term on the right-hand side of (A2). Using (A1) we have
$$\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) \;\ge\; \frac{1}{4}\sum_{i=1}^{d-1}\left(\frac{2}{n-i} + \frac{2}{(n-i)^{2}}\right) = \frac{1}{2}\sum_{j=n-d+1}^{n-1}\left(\frac{1}{j} + \frac{1}{j^{2}}\right) \;\ge\; \frac{1}{2}\int_{n-d+1}^{n}\left(\frac{1}{x} + \frac{1}{x^{2}}\right)dx = \frac{1}{2}\,\frac{d-1}{n(n-d+1)} + \frac{1}{2}\ln\frac{n}{n-d+1}.$$
Using (A1) for the second term in parentheses on the right-hand side of (A2) gives
$$\frac{d-1}{4}\,\psi^{(1)}(n/2) \;\le\; \frac{d-1}{4}\left(\frac{2}{n} + \frac{4}{n^{2}}\right).$$
Thus we have
$$\frac{1}{2}\ln\frac{n}{n-d+1} + \frac{1}{2}\,\frac{d-1}{n(n-d+1)} - \frac{d-1}{4}\left(\frac{2}{n}+\frac{4}{n^{2}}\right) \;\le\; \frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2). \qquad (A3)$$
It remains to be shown that the left-hand side of (A3) is non-negative. Taking the limit of the left-hand side of (A3) as d and n get large, and assuming $n/d \to k$ where $1 \le k < \infty$, we obtain
$$\frac{1}{2}\ln\frac{k}{k-1} - \frac{1}{2k}$$
which is strictly positive for any fixed k. Thus, for any fixed $1 \le k < \infty$ there exist values $d^{*}$ and $n^{*}$ such that for all $d > d^{*}$ and $n > n^{*}$ with $n/d$ sufficiently close to k we have
$$0 \;\le\; \frac{1}{2}\ln\frac{n}{n-d+1} + \frac{1}{2}\,\frac{d-1}{n(n-d+1)} - \frac{d-1}{4}\left(\frac{2}{n}+\frac{4}{n^{2}}\right).$$
As a result, for any fixed $1 \le k < \infty$, and for all $d > d^{*}$ and $n > n^{*}$ with $n/d$ sufficiently close to k, we have
$$\frac{1}{4}\left[\ln\frac{n}{n-d+1} + \frac{d-1}{n(n-d+1)} - \frac{d-1}{2}\left(\frac{2}{n}+\frac{4}{n^{2}}\right)\right]^{2} \;\le\; \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}.$$
 ☐

References

1. Linfoot, E.H. An informational measure of correlation. Inform. Contr. 1957, 1, 85–89.
2. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Champaign, IL, USA, 1949.
3. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2012.
4. Watanabe, S. Information theoretical analysis of multivariate correlation. IBM J. Res. Dev. 1960, 4, 66–82.
5. Garner, W.R. Uncertainty and Structure as Psychological Concepts; John Wiley & Sons: New York, NY, USA, 1962.
6. Studený, M.; Vejnarová, J. The multiinformation function as a tool for measuring stochastic dependence. In Learning in Graphical Models; Springer: Berlin, Germany, 1998; pp. 261–297.
7. Joe, H. Relative entropy measures of multivariate dependence. J. Am. Stat. Assoc. 1989, 84, 157–164.
8. Kullback, S. Information Theory and Statistics; Dover: New York, NY, USA, 1968.
9. Guerrero, J.L. Multivariate mutual information: Sampling distribution with applications. Commun. Stat. Theory Methods 1994, 23, 1319–1339.
10. Marrelec, G.; Benali, H. Large-sample asymptotic approximations for the sampling and posterior distributions of differential entropy for multivariate normal distributions. Entropy 2011, 13, 805–819.
11. Wilks, S.S. Moment-generating operators for determinants of product moments in samples from a normal system. Ann. Math. 1934, 35, 312–340.
12. Wilks, S.S. Certain generalizations in the analysis of variance. Biometrika 1932, 24, 471–494.
13. Odell, P.L.; Feiveson, A.H. A numerical procedure to generate a sample covariance matrix. J. Am. Stat. Assoc. 1966, 61, 199–203.
14. Anderson, T.W. An Introduction to Multivariate Statistical Analysis, 3rd ed.; Wiley: New York, NY, USA, 2003.
15. Goodman, N.R. The distribution of the determinant of a complex Wishart distributed matrix. Ann. Math. Stat. 1963, 34, 178–180.
16. Girko, V.L. A refinement of the central limit theorem for random determinants. Theor. Probab. Appl. 1998, 42, 121–129.
17. Bao, Z.; Pan, G.; Zhou, W. The logarithmic law of random determinant. Bernoulli 2015, 21, 1600–1628.
18. Cai, T.T.; Liang, T.; Zhou, H.H. Law of log determinant of sample covariance matrix and optimal estimation of differential entropy for high-dimensional Gaussian distributions. J. Multivariate Anal. 2015, 137, 161–172.
19. Billingsley, P. Probability and Measure, 3rd ed.; John Wiley & Sons: New York, NY, USA, 1995.
20. Guo, B.-N.; Qi, F. An extension of an inequality for ratios of gamma functions. J. Approx. Theor. 2011, 163, 1208–1216.
