Article

The Sampling Distribution of the Total Correlation for Multivariate Gaussian Random Variables

Department of Mathematics and Statistics, Queen’s University, Kingston, ON K7L 3N6, Canada
* Author to whom correspondence should be addressed.
Entropy 2019, 21(10), 921; https://doi.org/10.3390/e21100921
Submission received: 19 July 2019 / Revised: 9 September 2019 / Accepted: 16 September 2019 / Published: 22 September 2019
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

The sampling distribution of the total correlation (TC) for a d-dimensional standardized multivariate Gaussian random variable with an identity covariance matrix is derived. It is shown to be the distribution of a sum of generalized beta random variables. It is also shown that, for large dimension and sample size, a central limit theorem holds, providing a Gaussian approximation to the sampling distribution for high dimensional data.

1. Introduction

Mutual information quantifies the information shared between two random variables [1,2,3]. This concept can be generalized to d variables in a variety of ways [4,5,6,7], with the most direct generalization being Watanabe’s total correlation (TC),
$$T(\mathbf{X}) \equiv \sum_{i=1}^{d} h(X_i) - h(\mathbf{X}) \qquad (1)$$
where $\mathbf{X}$ is a vector whose components are the $d$ random variables $X_1, \dots, X_d$, and, for continuous random variables, $h(X_i)$ is the differential entropy of $X_i$ and $h(\mathbf{X})$ is the joint differential entropy of $\mathbf{X}$.
Total correlation is also sometimes called multivariate mutual information, and it is the Kullback–Leibler divergence between the joint density of $\mathbf{X}$ and the density obtained by taking the product of the marginal densities of the $X_i$. Thus, the total correlation $T(\mathbf{X})$ quantifies, in a quite general sense, the information shared among all the $d$ random variables. The total correlation is non-negative, and in the case where all $d$ random variables are mutually independent we have $T(\mathbf{X}) = 0$ [7,8]. For the special case where $\mathbf{X}$ is multivariate Gaussian with arbitrary mean and covariance matrix $\Sigma$, the total correlation can be written explicitly as
$$T(\mathbf{X}) = \frac{1}{2}\sum_{i=1}^{d}\log \sigma_{ii}^{2} \;-\; \frac{1}{2}\log|\Sigma| \qquad (2)$$
where $\sigma_{ij}^{2}$ is the $ij$th entry of $\Sigma$. When the $X_i$ are independent we have $\sigma_{ij}^{2} = 0$ for all $i \neq j$, and so $\log|\Sigma| = \log \sigma_{11}^{2}\sigma_{22}^{2}\cdots\sigma_{dd}^{2}$, giving $T(\mathbf{X}) = 0$ in Equation (2) as expected.
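For readers who want to evaluate Equation (2) numerically, the following short Python sketch (an illustration added here, not part of the original analysis; the function name total_correlation_gaussian and the use of NumPy are our own choices) computes $T(\mathbf{X})$ in nats from a given covariance matrix.

```python
import numpy as np

def total_correlation_gaussian(Sigma):
    """Total correlation of a multivariate Gaussian with covariance Sigma (Equation (2)), in nats."""
    Sigma = np.asarray(Sigma, dtype=float)
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # Sum of log marginal variances minus log-determinant of the joint covariance.
    return 0.5 * np.sum(np.log(np.diag(Sigma))) - 0.5 * logdet

# Independent components give T(X) = 0; correlated components give T(X) > 0.
print(total_correlation_gaussian(np.eye(3)))                  # 0.0
print(total_correlation_gaussian([[1.0, 0.5], [0.5, 1.0]]))   # -0.5*log(0.75), about 0.144
```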
The total correlation provides a natural way to quantify dependencies among a set of random variables. For example, often we seek to determine if a set of random variables are mutually independent because dependency among variables can indicate interesting and meaningful relationships in nature. To do so one can take a sample from the unknown distribution and compute the total correlation from this sample. Even if the random variables are mutually independent, however, the total correlation measured using such a finite sample will typically be positive (rather than zero) simply because of sampling variation. Therefore, it is of interest to know the sampling distribution of the total correlation under independence. Once we have the sampling distribution we can then perform statistical tests of independence. Here we derive the sampling distribution of (2) in the case where the X i are standardized (i.e., zero mean, unit variance), independent, Gaussian random variables.
Previous authors have proposed exact expressions for the mean and variance of the sample total correlation [9,10]. In fact, Guerrero (Section 2.1 of [9]) derived a moment generating function for the sample total correlation using the distribution of the log-determinant of a Wishart matrix (see Wilks [11,12]). Unfortunately, the asymptotic approximation of Guerrero’s result does not match the results of Marrelec [10], suggesting that one of the two is incorrect. We will resolve this discrepancy by deriving the moment generating function directly from our expression for the probability density function of the sample total correlation. In the limit of large sample size our results match those presented in Section 4.1 of Marrelec [10], suggesting that the moment generating function of [9] is incorrect.

2. Definitions and Preliminaries

Let $\mathbf{X}$ represent a $d$-variate zero-mean Gaussian random variable with covariance matrix $\Sigma = I_d$, where $I_d$ is the $d$-dimensional identity matrix. Let $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ denote a sample of $n$ draws from the distribution of $\mathbf{X}$. We focus on the case where $n \ge d$. The sample covariance matrix is $\hat{\Sigma} = (1/n)\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^{\top} = \{\hat{\sigma}_{ij}^{2}\}$, and $n\hat{\Sigma}$ is Wishart distributed with $n$ degrees of freedom, which we denote as $n\hat{\Sigma} \sim W(\Sigma, d, n)$. From Equation (2) the sample total correlation is then also a random variable and is computed as
$$\hat{T}_{d,n}(\mathbf{X}) = \frac{1}{2}\sum_{i=1}^{d}\log \hat{\sigma}_{ii}^{2} \;-\; \frac{1}{2}\log|\hat{\Sigma}| \qquad (3)$$
where the subscripts $d$ and $n$ indicate that $\hat{T}$ is a family of random variables indexed by dimension and sample size.
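As an illustrative sketch (ours, not taken from the paper; the helper name sample_total_correlation is hypothetical), Equation (3) can be computed directly from an $n$-by-$d$ matrix of draws. Even when the columns are independent the result is typically positive, which is exactly the sampling variation the remainder of the paper characterizes.

```python
import numpy as np

def sample_total_correlation(X):
    """Sample total correlation (Equation (3)) from an n-by-d matrix of zero-mean draws.

    Sigma_hat = (1/n) X'X as in the text; the 1/n factor cancels between the two terms,
    so the value is unaffected by that scaling.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Sigma_hat = X.T @ X / n
    sign, logdet = np.linalg.slogdet(Sigma_hat)
    return 0.5 * np.sum(np.log(np.diag(Sigma_hat))) - 0.5 * logdet

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))    # independent standard Gaussian columns
print(sample_total_correlation(X))   # positive despite independence, purely from sampling variation
```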
Odell and Feiveson’s 1966 result [13] provides a convenient way to characterize a Wishart-distributed matrix. Suppose that $V_i^{(n)}$ ($1 \le i \le d$) are independent chi-square random variables with $n - i + 1$ degrees of freedom. Suppose that $N_{ij}$ are independent standardized normal random variables for $1 \le i < j \le d$, also independent of every $V_i^{(n)}$. Now construct the random variables
$$\begin{aligned}
b_{11} &= V_1^{(n)} \\
b_{jj} &= V_j^{(n)} + \sum_{i=1}^{j-1} N_{ij}^{2}, & 2 \le j \le d \\
b_{1j} &= N_{1j}\sqrt{V_1^{(n)}}, & 2 \le j \le d \\
b_{ij} &= N_{ij}\sqrt{V_i^{(n)}} + \sum_{k=1}^{i-1} N_{ki}N_{kj}, & 2 \le i < j \le d.
\end{aligned} \qquad (4)$$
Then the matrix $\mathbf{B} = \{b_{ij}\}$ (with $b_{ij} = b_{ji}$) is Wishart-distributed $W(I_d, d, n)$, and thus we have
$$n\hat{\sigma}_{ii}^{2} \;\sim\; b_{ii} \;\sim\; V_i^{(n)} + A_i, \qquad 1 \le i \le d \qquad (5)$$
where the $A_i$ are independent chi-square random variables with $i - 1$ degrees of freedom and we define $A_1 = 0$. Now, following [14], we can also define the lower-triangular matrix $\mathbf{T} = \{t_{ij}\}$ as
$$t_{ii} = \sqrt{V_i^{(n)}}, \qquad t_{ij} = N_{ji} \ \ (1 \le j < i \le d), \qquad t_{ij} = 0 \ \ (i < j \le d) \qquad (6)$$
and thus $\mathbf{B} = \mathbf{T}\mathbf{T}^{\top}$. Furthermore, $|\mathbf{B}| = |\mathbf{T}\mathbf{T}^{\top}| = |\mathbf{T}|^{2} = \prod_{i=1}^{d} t_{ii}^{2} = \prod_{i=1}^{d} V_i^{(n)}$, revealing that
$$n^{d}\,|\hat{\Sigma}| \;\sim\; \prod_{i=1}^{d} V_i^{(n)}. \qquad (7)$$
Result (7) is a special case of results found in Wilks [11]. For analogous results involving complex matrices see Goodman [15].
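The construction in (4)–(6) is easy to simulate. The sketch below (our illustration; the function name and the use of NumPy's Generator are assumptions, not part of the cited results) builds the lower-triangular matrix $\mathbf{T}$ of (6) and returns $\mathbf{B} = \mathbf{T}\mathbf{T}^{\top}$, so that the identities (5) and (7) can be checked empirically.

```python
import numpy as np

def wishart_identity_bartlett(d, n, rng):
    """Draw B ~ W(I_d, d, n) via the triangular construction (6), B = T T'.

    Diagonal: t_ii = sqrt(V_i) with V_i chi-square with n - i + 1 degrees of freedom
    (paper indexing); below-diagonal entries are independent standard normals.
    """
    T = np.zeros((d, d))
    for i in range(d):                              # Python index i corresponds to paper index i + 1
        T[i, i] = np.sqrt(rng.chisquare(n - i))     # n - (i + 1) + 1 = n - i degrees of freedom
        T[i, :i] = rng.standard_normal(i)
    return T @ T.T

rng = np.random.default_rng(1)
B = wishart_identity_bartlett(4, 50, rng)
# |B| equals the product of the diagonal chi-square variables (cf. result (7) after scaling by n^d).
print(np.linalg.det(B))
```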

3. The Sampling Distribution of the Total Correlation

With the above preliminaries in place, we can now state the following theorem.
Theorem 1
(The Sampling Distribution of TC). Consider a sample of size n from a set of d independent, standardized, Gaussian random variables, with $n \ge d$. The total correlation (TC) is distributed as
$$\hat{T}_{d,n}(\mathbf{X}) \;\sim\; \frac{1}{2}\sum_{i=1}^{d-1}\log\!\left(1 + \frac{i}{n-i}\,F_{i,n-i}\right) \qquad (8)$$
where the $F_{i,n-i}$ are independent F-distributed random variables with $i$ and $n-i$ degrees of freedom. Equivalently, (8) can be written as
$$\hat{T}_{d,n}(\mathbf{X}) \;\sim\; \sum_{i=1}^{d-1} Y_{i,n} \qquad (9)$$
where $Y_{i,n}$ is a beta-exponential random variable with probability density
$$f_{Y_{i,n}}(y) = \frac{\lambda\,(1 - e^{-\lambda y})^{\frac{i}{2}-1}\,(e^{-\lambda y})^{\frac{n-i}{2}}}{B\!\left(\frac{i}{2}, \frac{n-i}{2}\right)}, \qquad y > 0 \qquad (10)$$
having parameter $\lambda = 2$.
Proof. 
Writing Equation (3) as
$$\hat{T}_{d,n}(\mathbf{X}) = \frac{1}{2}\log\frac{\prod_{i=1}^{d}\hat{\sigma}_{ii}^{2}}{|\hat{\Sigma}|}$$
and using results (5) and (7), one obtains
$$\hat{T}_{d,n}(\mathbf{X}) \;\sim\; \frac{1}{2}\log\frac{\prod_{i=1}^{d}\bigl(V_i^{(n)} + A_i\bigr)}{\prod_{i=1}^{d} V_i^{(n)}} \;\sim\; \frac{1}{2}\log\prod_{i=1}^{d}\left(1 + \frac{A_i}{V_i^{(n)}}\right) \;\sim\; \frac{1}{2}\sum_{i=1}^{d}\log\!\left(1 + \frac{A_i}{V_i^{(n)}}\right).$$
Scaling each chi-square random variable by its corresponding degrees of freedom and re-indexing yields (8). Equivalently, if we define $Y_{i,n} = \frac{1}{2}\log\!\left(1 + \frac{i}{n-i}F_{i,n-i}\right)$ then $\hat{T}_{d,n}(\mathbf{X}) \sim \sum_{i=1}^{d-1} Y_{i,n}$, and using standard techniques it can be shown that the random variable $Y_{i,n}$ has probability density
$$f_{Y_{i,n}}(y) = \frac{2\,(1 - e^{-2y})^{\frac{i}{2}-1}\,(e^{-2y})^{\frac{n-i}{2}}}{B\!\left(\frac{i}{2}, \frac{n-i}{2}\right)}, \qquad y > 0$$
where B ( x , y ) is the beta function. ☐
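Theorem 1 can be checked by simulation. The sketch below (our illustration; variable names are arbitrary and SciPy supplies the F and Kolmogorov–Smirnov routines) compares the empirical distribution of the sample total correlation under independence with draws from the F-representation in (8); the two samples should be statistically indistinguishable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
d, n, reps = 5, 40, 20000

def t_hat(X):
    """Sample total correlation (Equation (3)) of an n-by-d matrix of zero-mean draws."""
    S = X.T @ X / X.shape[0]
    return 0.5 * np.sum(np.log(np.diag(S))) - 0.5 * np.linalg.slogdet(S)[1]

# Empirical sampling distribution under independence.
empirical = np.array([t_hat(rng.standard_normal((n, d))) for _ in range(reps)])

# Theorem 1: one half the sum of log(1 + i/(n-i) * F_{i,n-i}) over i = 1, ..., d-1.
i = np.arange(1, d)
F = stats.f.rvs(i, n - i, size=(reps, d - 1), random_state=rng)
theoretical = 0.5 * np.log(1.0 + (i / (n - i)) * F).sum(axis=1)

# A two-sample Kolmogorov-Smirnov test should not reject equality of the two distributions.
print(stats.ks_2samp(empirical, theoretical))
```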
Corollary 1.
The moment generating function for $\hat{T}_{d,n}(\mathbf{X})$ is
$$M_{d,n}(t) = \left[\frac{\Gamma\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{n-t}{2}\right)}\right]^{d-1} \prod_{i=1}^{d-1}\frac{\Gamma\!\left(\frac{n-i-t}{2}\right)}{\Gamma\!\left(\frac{n-i}{2}\right)} \qquad (11)$$
where $\Gamma(x)$ is the gamma function. The mean and variance of $\hat{T}_{d,n}(\mathbf{X})$ are therefore
$$\mu_{d,n} = \frac{d-1}{2}\,\psi(n/2) - \frac{1}{2}\sum_{i=1}^{d-1}\psi\!\left(\frac{n-i}{2}\right), \qquad \sigma_{d,n}^{2} = -\frac{d-1}{4}\,\psi^{(1)}(n/2) + \frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\frac{n-i}{2}\right) \qquad (12)$$
where $\psi(x) = \Gamma'(x)/\Gamma(x)$ is the digamma function and $\psi^{(k)}(x)$ denotes its $k$th derivative.
Proof. 
Taking $Y_{i,n} = \frac{1}{2}\log\!\left(1 + \frac{i}{n-i}F_{i,n-i}\right)$, the moment generating function for $Y_{i,n}$ is
$$\phi_{i,n}(t) = \mathbb{E}\!\left[e^{tY_{i,n}}\right] = \frac{\Gamma\!\left(\frac{n}{2}\right)\,\Gamma\!\left(\frac{n-i-t}{2}\right)}{\Gamma\!\left(\frac{n-i}{2}\right)\,\Gamma\!\left(\frac{n-t}{2}\right)}.$$
The random variables in the sum $\sum_{i=1}^{d-1} Y_{i,n}$ are independent, and therefore the moment generating function $M_{d,n}(t)$ for $\hat{T}_{d,n}(\mathbf{X})$ is the appropriate product of the functions $\phi_{i,n}(t)$. Equations (12) then follow directly from the properties of moment generating functions. ☐
Guerrero [9] obtained a formula for the mean and variance of $\hat{T}_{d,n}(\mathbf{X})$ (except for a typo in the variance) using Wilks’ [12] moment generating function for the generalized variance. These are remarkably close to (12), but the moment generating function for the sample total correlation proposed in Guerrero [9] appears to be incorrect.
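As a sketch of how Corollary 1 can be evaluated in practice (our illustration; SciPy's digamma and polygamma routines supply $\psi$ and $\psi^{(1)}$), the exact mean and variance in (12) reduce to a few lines:

```python
import numpy as np
from scipy.special import digamma, polygamma

def tc_mean_variance(d, n):
    """Exact mean and variance (Equations (12)) of the sample total correlation under independence."""
    i = np.arange(1, d)  # i = 1, ..., d-1
    mean = 0.5 * (d - 1) * digamma(n / 2) - 0.5 * np.sum(digamma((n - i) / 2))
    var = -0.25 * (d - 1) * polygamma(1, n / 2) + 0.25 * np.sum(polygamma(1, (n - i) / 2))
    return mean, var

print(tc_mean_variance(d=5, n=40))   # both quantities shrink toward zero as n grows for fixed d
```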

4. A Central Limit Theorem for the Total Correlation

Girko’s central limit theorem [16] implies asymptotic normality of the sample log-determinant, as seen in the work of Bao et al. [17] and Cai et al. [18]. This suggests the existence of a central limit theorem for $\hat{T}_{d,n}(\mathbf{X})$ when the dimension d and sample size n are large. Here we provide such a result.
Define the mean and variance of $Y_{i,n}$ as $m_{i,n} = \mathbb{E}[Y_{i,n}]$ and $s_{i,n}^{2} = \mathbb{E}[(Y_{i,n} - m_{i,n})^{2}]$, and define the mean-centered random variables $Y_{i,n}^{*} = Y_{i,n} - m_{i,n}$. Note that $\sigma_{d,n}^{2} = \sum_{i=1}^{d-1} s_{i,n}^{2}$.
Theorem 2
(Asymptotic normality of TC). Suppose $n \to \infty$ and $d \to \infty$ in such a way that $n/d \to k$, where $1 \le k < \infty$. Then
$$\frac{1}{\sqrt{\sigma_{d,n}^{2}}}\sum_{i=1}^{d-1} Y_{i,n}^{*} \;\longrightarrow\; N(0,1) \qquad (13)$$
where convergence is in distribution. Thus, for large n and d (with $n \ge d$) the total correlation $\hat{T}_{d,n}(\mathbf{X})$ is approximately normally distributed with mean and variance given by $\mu_{d,n}$ and $\sigma_{d,n}^{2}$ in Equations (12).
Proof. 
The $Y_{i,n}^{*}$ are a triangular array of random variables such that, for any fixed n, the $Y_{i,n}^{*}$ ($1 \le i \le d-1$) are independent. Thus, (13) will hold provided that the Lyapunov condition is satisfied [19]; namely, that there exists a $\delta > 0$ such that
$$\lim_{d,n\to\infty} \frac{1}{\sigma_{d,n}^{2+\delta}} \sum_{i=1}^{d-1} \mathbb{E}\!\left[\,\bigl|Y_{i,n}^{*}\bigr|^{2+\delta}\right] = 0.$$
For $\delta = 2$ the entries in Lyapunov’s summation represent each $Y_{i,n}$’s fourth central moment, for which the generating function is $C_{i,n}(t) = e^{-m_{i,n}t}\,\phi_{i,n}(t)$. The summation therefore becomes
$$\begin{aligned}
\sum_{i=1}^{d-1}\mathbb{E}\!\left[(Y_{i,n}^{*})^{4}\right] &= \sum_{i=1}^{d-1}\frac{1}{16}\left\{3\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(3)}(n/2)\right\} \\
&= \frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{16}\,\psi^{(3)}(n/2)
\end{aligned}$$
while the denominator in Lyapunov’s condition is
$$\sigma_{d,n}^{4} = \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}.$$
In Appendix A we show that
$$0 \;\le\; \frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{16}\,\psi^{(3)}(n/2) \;\le\; \frac{48}{n-d+1}$$
and, for any fixed $1 \le k < \infty$, and for sufficiently large d and n with $n/d$ sufficiently close to k,
$$\frac{1}{4}\left[\ln\frac{n}{n-d+1} + \frac{d-1}{n(n-d+1)} - \frac{d-1}{2}\left(\frac{2}{n}+\frac{4}{n^{2}}\right)\right]^{2} \;\le\; \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}.$$
Therefore, for any fixed $1 \le k < \infty$, and for sufficiently large d and n with $n/d$ sufficiently close to k, we have
$$0 \;\le\; \frac{1}{\sigma_{d,n}^{4}}\sum_{i=1}^{d-1}\mathbb{E}\!\left[\,\bigl|Y_{i,n}^{*}\bigr|^{4}\right] \;\le\; \frac{\dfrac{48}{n-d+1}}{\dfrac{1}{4}\left[\ln\dfrac{n}{n-d+1} + \dfrac{d-1}{n(n-d+1)} - \dfrac{d-1}{2}\left(\dfrac{2}{n}+\dfrac{4}{n^{2}}\right)\right]^{2}}. \qquad (14)$$
Now first consider the case where n = d (and therefore k = 1 ). Then (14) simplifies to
$$0 \;\le\; \frac{1}{\sigma_{d,n}^{4}}\sum_{i=1}^{d-1}\mathbb{E}\!\left[\,\bigl|Y_{i,n}^{*}\bigr|^{4}\right] \;\le\; \frac{48}{\dfrac{1}{4}\left[\ln n + \dfrac{n-1}{n} - \dfrac{n-1}{2}\left(\dfrac{2}{n}+\dfrac{4}{n^{2}}\right)\right]^{2}}.$$
Taking the limit $n \to \infty$ yields zero on the right-hand side, verifying Lyapunov’s condition for $k = 1$. Next, consider the case where $n > d$. Taking the limit in (14) as $n \to \infty$ and $d \to \infty$ in such a way that $n/d \to k$, where $1 < k < \infty$, we again see that the right-hand side tends to zero. This verifies Lyapunov’s condition in the case where $k > 1$, thereby completing the proof. ☐
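To illustrate how Theorem 2 might be used (this sketch is ours; the function name tc_independence_test is hypothetical, and the one-sided p-value is a pragmatic choice since dependence inflates the total correlation), the Gaussian approximation with moments (12) gives an approximate z-test of mutual independence for large n and d:

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import norm

def tc_independence_test(X):
    """Approximate z-test of mutual independence based on Theorem 2 (illustrative sketch).

    Assumes the columns of X are already standardized (zero mean, unit variance),
    matching the setting of the paper, and that n and d are large enough for the
    normal approximation to be reasonable.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    S = X.T @ X / n
    t_hat = 0.5 * np.sum(np.log(np.diag(S))) - 0.5 * np.linalg.slogdet(S)[1]
    i = np.arange(1, d)
    mu = 0.5 * (d - 1) * digamma(n / 2) - 0.5 * np.sum(digamma((n - i) / 2))
    var = -0.25 * (d - 1) * polygamma(1, n / 2) + 0.25 * np.sum(polygamma(1, (n - i) / 2))
    z = (t_hat - mu) / np.sqrt(var)
    return t_hat, z, norm.sf(z)       # one-sided p-value: large TC indicates dependence

rng = np.random.default_rng(3)
print(tc_independence_test(rng.standard_normal((500, 20))))   # independent columns: p typically not small
```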

5. Conclusions

The total correlation of a multivariate random variable (sometimes called multivariate mutual information) is the Kullback–Leibler divergence between the joint density of the random variable and the product of its marginal densities. It therefore provides a natural measure of the degree of independence of a set of random variables. In this paper we derived the sampling distribution of the total correlation for a d-dimensional standardized multivariate Gaussian random variable with identity covariance matrix, and showed that it is the distribution of a sum of generalized beta random variables. We also proved that, for large dimension and sample size, a central limit theorem holds, providing a Gaussian approximation to the sampling distribution for high dimensional data.

Author Contributions

Conceptualization, T.R. and T.D.; methodology, T.R. and T.D.; formal analysis, T.R. and T.D.; investigation, T.R. and T.D.; writing–original draft preparation, T.R.; writing–review and editing, T.D.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The proof of the central limit theorem result makes use of the following two lemmas. Both are based on an inequality for the derivatives of the digamma function found in [20] (where $m \ge 1$ is an integer):
$$\frac{(m-1)!}{x^{m}} + \frac{m!}{2x^{m+1}} \;\le\; (-1)^{m+1}\psi^{(m)}(x) \;\le\; \frac{(m-1)!}{x^{m}} + \frac{m!}{x^{m+1}}. \qquad (A1)$$
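As a quick sanity check of (A1) (our illustration only; it is not part of the proof), the bounds can be verified numerically at a few points using SciPy's polygamma:

```python
from math import factorial
from scipy.special import polygamma

# Spot-check inequality (A1) for a few (m, x) pairs.
for m in (1, 2, 3):
    for x in (0.5, 2.0, 10.0):
        lower = factorial(m - 1) / x**m + factorial(m) / (2 * x**(m + 1))
        upper = factorial(m - 1) / x**m + factorial(m) / x**(m + 1)
        middle = (-1)**(m + 1) * polygamma(m, x)
        assert lower <= middle <= upper, (m, x)
print("inequality (A1) holds at all checked points")
```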
Lemma A1.
Suppose $d \le n$. Then the following inequality holds:
$$0 \;\le\; \frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{16}\,\psi^{(3)}(n/2) \;\le\; \frac{48}{n-d+1}.$$
Proof. 
The left-hand inequality follows from the fact that $\psi^{(1)}(x)$ and $\psi^{(3)}(x)$ are both monotonically decreasing functions, and so $\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) \ge \psi^{(1)}\!\left(\tfrac{n}{2}\right)$ and $\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) \ge \psi^{(3)}\!\left(\tfrac{n}{2}\right)$ for all $1 \le i \le d-1$. For the right-hand inequality we have
$$\begin{aligned}
&\frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \psi^{(1)}(n/2)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{16}\,\psi^{(3)}(n/2) \\
&\qquad\le \frac{3}{16}\sum_{i=1}^{d-1}\left[\psi^{(1)}\!\left(\tfrac{n-i}{2}\right)\right]^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\psi^{(3)}\!\left(\tfrac{n-i}{2}\right) \\
&\qquad\le \frac{3}{16}\sum_{i=1}^{d-1}\left(\frac{2}{n-i} + \frac{4}{(n-i)^{2}}\right)^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\left(\frac{16}{(n-i)^{3}} + \frac{96}{(n-i)^{4}}\right) \\
&\qquad\le \frac{3}{16}\sum_{i=1}^{d-1}\left(\frac{8}{n-i}\right)^{2} + \frac{1}{16}\sum_{i=1}^{d-1}\frac{192}{(n-i)^{3}} \\
&\qquad= 12\sum_{j=n-d+1}^{n-1}\left(\frac{1}{j^{2}} + \frac{1}{j^{3}}\right) \\
&\qquad= 12\left[\frac{1}{(n-d+1)^{2}} + \frac{1}{(n-d+1)^{3}} + \sum_{j=n-d+2}^{n-1}\left(\frac{1}{j^{2}} + \frac{1}{j^{3}}\right)\right] \\
&\qquad\le 12\left[\frac{1}{(n-d+1)^{2}} + \frac{1}{(n-d+1)^{3}} + \int_{n-d+1}^{n-1}\left(\frac{1}{x^{2}} + \frac{1}{x^{3}}\right)dx\right] \\
&\qquad= 12\left[\frac{1}{(n-d+1)^{2}} + \frac{1}{(n-d+1)^{3}} + \frac{1}{n-d+1} - \frac{1}{n-1} + \frac{1}{2(n-d+1)^{2}} - \frac{1}{2(n-1)^{2}}\right] \\
&\qquad\le 12\left[\frac{1}{(n-d+1)^{2}} + \frac{1}{(n-d+1)^{3}} + \frac{1}{n-d+1} + \frac{1}{2(n-d+1)}\right] \\
&\qquad\le \frac{48}{n-d+1}.
\end{aligned}$$
 ☐
Lemma A2.
Suppose $d \le n$. Then, for any fixed $1 \le k < \infty$, and for sufficiently large d and n with $n/d$ sufficiently close to k, the following inequality holds:
$$\frac{1}{4}\left[\ln\frac{n}{n-d+1} + \frac{d-1}{n(n-d+1)} - \frac{d-1}{2}\left(\frac{2}{n}+\frac{4}{n^{2}}\right)\right]^{2} \;\le\; \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}. \qquad (A2)$$
Proof. 
First note that the quantity in the parentheses on the right-hand side is positive because $\psi^{(1)}(x)$ is a monotonically decreasing function and so $\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) \ge \psi^{(1)}\!\left(\tfrac{n}{2}\right)$ for all $1 \le i \le d-1$. Thus, if for some quantity A we have $0 \le A \le \frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)$, then $A^{2} \le \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}$. We construct such a quantity A as follows. First consider the summation term on the right-hand side of (A2). Using (A1) we have
$$\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) \;\ge\; \frac{1}{4}\sum_{i=1}^{d-1}\left(\frac{2}{n-i} + \frac{2}{(n-i)^{2}}\right) = \frac{1}{2}\sum_{j=n-d+1}^{n-1}\left(\frac{1}{j} + \frac{1}{j^{2}}\right) \;\ge\; \frac{1}{2}\int_{n-d+1}^{n}\left(\frac{1}{x} + \frac{1}{x^{2}}\right)dx = \frac{1}{2}\,\frac{d-1}{n(n-d+1)} + \frac{1}{2}\ln\frac{n}{n-d+1}.$$
Using (A1) for the second term in parentheses on the right-hand side of (A2) gives
$$\frac{d-1}{4}\,\psi^{(1)}(n/2) \;\le\; \frac{d-1}{4}\left(\frac{2}{n} + \frac{4}{n^{2}}\right).$$
Thus we have
$$\frac{1}{2}\ln\frac{n}{n-d+1} + \frac{1}{2}\,\frac{d-1}{n(n-d+1)} - \frac{d-1}{4}\left(\frac{2}{n}+\frac{4}{n^{2}}\right) \;\le\; \frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2). \qquad (A3)$$
It remains to be shown that the left-hand side of (A3) is non-negative. Taking the limit of the left-hand side of (A3) as d and n get large, and assuming $n/d \to k$ where $1 \le k < \infty$, we obtain
$$\frac{1}{2}\ln\frac{k}{k-1} - \frac{1}{2k}$$
which is strictly positive for any fixed k. Thus, for any fixed $1 \le k < \infty$ there exist values $d^{*}$ and $n^{*}$ such that for all $d > d^{*}$ and $n > n^{*}$ with $n/d$ sufficiently close to k we have
$$0 \;\le\; \frac{1}{2}\ln\frac{n}{n-d+1} + \frac{1}{2}\,\frac{d-1}{n(n-d+1)} - \frac{d-1}{4}\left(\frac{2}{n}+\frac{4}{n^{2}}\right).$$
As a result, for any fixed $1 \le k < \infty$, and for all $d > d^{*}$ and $n > n^{*}$ with $n/d$ sufficiently close to k, we have
$$\frac{1}{4}\left[\ln\frac{n}{n-d+1} + \frac{d-1}{n(n-d+1)} - \frac{d-1}{2}\left(\frac{2}{n}+\frac{4}{n^{2}}\right)\right]^{2} \;\le\; \left[\frac{1}{4}\sum_{i=1}^{d-1}\psi^{(1)}\!\left(\tfrac{n-i}{2}\right) - \frac{d-1}{4}\,\psi^{(1)}(n/2)\right]^{2}.$$
 ☐

References

1. Linfoot, E.H. An informational measure of correlation. Inform. Contr. 1957, 1, 85–89.
2. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Champaign, IL, USA, 1949.
3. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2012.
4. Watanabe, S. Information theoretical analysis of multivariate correlation. IBM J. Res. Dev. 1960, 4, 66–82.
5. Garner, W.R. Uncertainty and Structure as Psychological Concepts; John Wiley & Sons: New York, NY, USA, 1962.
6. Studený, M.; Vejnarová, J. The multiinformation function as a tool for measuring stochastic dependence. In Learning in Graphical Models; Springer: Berlin, Germany, 1998; pp. 261–297.
7. Joe, H. Relative entropy measures of multivariate dependence. J. Am. Stat. Assoc. 1989, 84, 157–164.
8. Kullback, S. Information Theory and Statistics; Dover: New York, NY, USA, 1968.
9. Guerrero, J.L. Multivariate mutual information: Sampling distribution with applications. Commun. Stat. Theory Methods 1994, 23, 1319–1339.
10. Marrelec, G.; Benali, H. Large-sample asymptotic approximations for the sampling and posterior distributions of differential entropy for multivariate normal distributions. Entropy 2011, 13, 805–819.
11. Wilks, S.S. Moment-generating operators for determinants of product moments in samples from a normal system. Ann. Math. 1934, 35, 312–340.
12. Wilks, S.S. Certain generalizations in the analysis of variance. Biometrika 1932, 24, 471–494.
13. Odell, P.L.; Feiveson, A.H. A numerical procedure to generate a sample covariance matrix. J. Am. Stat. Assoc. 1966, 61, 199–203.
14. Anderson, T.W. An Introduction to Multivariate Statistical Analysis, 3rd ed.; Wiley: New York, NY, USA, 2003.
15. Goodman, N.R. The distribution of the determinant of a complex Wishart distributed matrix. Ann. Math. Stat. 1963, 34, 178–180.
16. Girko, V.L. A refinement of the central limit theorem for random determinants. Theor. Probab. Appl. 1998, 42, 121–129.
17. Bao, Z.; Pan, G.; Zhou, W. The logarithmic law of random determinant. Bernoulli 2015, 21, 1600–1628.
18. Cai, T.T.; Liang, T.; Zhou, H.H. Law of log determinant of sample covariance matrix and optimal estimation of differential entropy for high-dimensional Gaussian distributions. J. Multivariate Anal. 2015, 137, 161–172.
19. Billingsley, P. Probability and Measure, 3rd ed.; John Wiley & Sons: New York, NY, USA, 1995.
20. Guo, B.-N.; Qi, F. An extension of an inequality for ratios of gamma functions. J. Approx. Theor. 2011, 163, 1208–1216.
