Abstract
Based on the canonical correlation analysis, we derive series representations of the probability density function (PDF) and the cumulative distribution function (CDF) of the information density of arbitrary Gaussian random vectors as well as a general formula to calculate the central moments. Using the general results, we give closed-form expressions of the PDF and CDF and explicit formulas of the central moments for important special cases. Furthermore, we derive recurrence formulas and tight approximations of the general series representations, which allow efficient numerical calculations with an arbitrarily high accuracy as demonstrated with an implementation in Python publicly available on GitLab. Finally, we discuss the (in)validity of Gaussian approximations of the information density.
1. Introduction and Main Theorems
Let ξ and η be arbitrary random variables on an abstract probability space such that the joint distribution P_ξη is absolutely continuous w. r. t. the product P_ξ ⊗ P_η of the marginal distributions P_ξ and P_η. If dP_ξη/d(P_ξ ⊗ P_η) denotes the Radon–Nikodym derivative of P_ξη w. r. t. P_ξ ⊗ P_η, then
is called the information density of ξ and η. The expectation of the information density, called mutual information, plays a key role in characterizing the asymptotic channel coding performance in terms of channel capacity. The non-asymptotic performance, however, is determined by the higher-order moments of the information density and its probability distribution. Achievability and converse bounds that allow a finite blocklength analysis of the optimum channel coding rate are closely related to the distribution function of the information density, also called information spectrum by Han and Verdú [1,2]. Moreover, based on the variance of the information density, tight second-order finite blocklength approximations of the optimum code rate can be derived for various important channel models. The first works on non-asymptotic information theoretic analysis were published in the early years of information theory by Shannon [3], Dobrushin [4], and Strassen [5], among others. Due to the seminal work of Polyanskiy et al. [6], considerable progress has been made in this area. The results of [6] on the one hand and the requirements of current and future wireless networks regarding latency and reliability on the other hand stimulated a significant new interest in this type of analysis (Durisi et al. [7]).
The information density in the case when ξ and η are jointly Gaussian is of special interest due to the prominent role of the Gaussian distribution. Let ξ and η be real-valued random vectors of dimensions p and q, respectively, with nonsingular covariance matrices Σ_ξ and Σ_η and cross-covariance matrix Σ_ξη with rank r. (For notational convenience, we write vectors as row vectors. However, in expressions where matrix or vector multiplications occur, we consider all vectors as column vectors.) Without loss of generality for the subsequent results, we assume the expectation of all random variables to be zero. If (ξ, η) is a Gaussian random vector, then Pinsker [8], Ch. 9.6 has shown that the distribution of the information density coincides with the distribution of the random variable
In this representation … are independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and unit variance and the mutual information in (1) has the form
Moreover, ρ_1 ≥ ρ_2 ≥ ⋯ ≥ ρ_r > 0 denote the positive canonical correlations of ξ and η in descending order, which are obtained by a linear method called canonical correlation analysis that yields the maximum correlations between two sets of random variables (see Section 3). The rank r of the cross-covariance matrix Σ_ξη satisfies 0 ≤ r ≤ min(p, q), and for r = 0 the information density is zero almost surely and the mutual information vanishes. This corresponds to Σ_ξη = 0 and the independence of ξ and η such that the resulting information density is deterministic. Throughout the rest of the paper, we exclude this degenerate case when the information density is considered and assume subsequently the setting and notation introduced above with r ≥ 1. As customary notation, we further write ℝ, ℕ_0, and ℕ to denote the set of real numbers, non-negative integers, and positive integers.
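To make the representation in (1) concrete for readers who want to experiment with it, the following minimal Python sketch draws samples with the distribution of the information density. It assumes that (1) has the form I(ξ; η) + (1/2) Σ_j ρ_j (N_j² − Ñ_j²) with i.i.d. standard Gaussian N_j, Ñ_j, which is consistent with Lemma 1 below, and that the mutual information (2) equals −(1/2) Σ_j ln(1 − ρ_j²); the function name and the example values are our own and are not taken from the implementation in [26].

```python
import numpy as np

def sample_information_density(rho, size, seed=None):
    """Draw samples distributed like the information density, assuming the
    representation I(xi;eta) + 0.5 * sum_j rho_j * (N_j^2 - M_j^2) with
    i.i.d. standard Gaussian N_j, M_j (consistent with Lemma 1)."""
    rng = np.random.default_rng(seed)
    rho = np.asarray(rho, dtype=float)
    # Mutual information in terms of the canonical correlations, cf. (2).
    mi = -0.5 * np.sum(np.log1p(-rho**2))
    n = rng.standard_normal((size, rho.size))
    m = rng.standard_normal((size, rho.size))
    return mi + 0.5 * (n**2 - m**2) @ rho

rho = [0.9, 0.7, 0.3]                                    # example canonical correlations
samples = sample_information_density(rho, size=200_000, seed=0)
print(samples.mean())                                    # close to the mutual information
print(samples.var(), sum(r**2 for r in rho))             # variance equals the sum of squared correlations, cf. (11)
```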
Main contributions. Based on (1), we derive in Section 4 series representations of the probability density function (PDF) and the cumulative distribution function (CDF) as well as explicit general formulas for the central moments of the information density given subsequently in Theorems 1 to 3. The series representations are useful as they allow tight approximations with errors as low as desired by finite sums as shown in Section 5.2. Moreover, we derive recurrence formulas in Section 5.1 that allow efficient numerical calculations of the series representations in Theorems 1 and 2.
Theorem 1
(PDF of information density). The PDF of the information density is given by
where Γ denotes the gamma function [9], Sec. 5.2.1, and K_α denotes the modified Bessel function of the second kind and order α [9], Sec. 10.25(ii). If r ≥ 2, then the PDF is also well defined at the mutual information I(ξ; η).
Theorem 2
(CDF of information density). The CDF of the information density is given by
with defined by
where L_α denotes the modified Struve function of order α [9], Sec. 11.2.
The method to obtain the result in Theorem 1 is adopted from Mathai [10], where a series representation of the PDF of the sum of independent gamma distributed random variables is derived. Previous work by Grad and Solomon [11] and Kotz et al. [12] goes in a similar direction as Mathai [10]; however, it is not directly applicable since only positive series coefficients are considered there. Using Theorem 1, the series representation of the CDF of the information density in Theorem 2 is obtained. The details of the derivations of Theorems 1 and 2 are provided in Section 4.
Theorem 3
(Central moments of information density). The m-th central moment of the information density is given by
for all , where .
Pinsker [8], Eq. (9.6.17) provided a formula for , which he called “derived m-th central moment” of the information density, where and are given as in (1). These special moments coincide for with the usual central moments considered in Theorem 3.
The rest of the paper is organized as follows: In Section 2, we discuss important special cases which allow simplified and explicit formulas. In Section 3, we provide some background on the canonical correlation analysis and its application to the calculation of the information density and mutual information for Gaussian random vectors. The proofs of the main Theorems 1 to 3 are given in Section 4. Recurrence formulas, finite sum approximations, and uniform bounds of the approximation error are derived in Section 5, which allow efficient and accurate numerical calculations of the PDF and CDF of the information density. Some examples and illustrations are provided in Section 6, where also the (in)validity of Gaussian approximations is discussed. Finally, Section 7 summarizes the paper. Note that a first version of this paper was published on arXiv as preprint [13].
2. Special Cases
2.1. Equal Canonical Correlations
A simple but important special case for which the series representations in Theorems 1 and 2 simplify to a single summand and the sum of products in Theorem 3 simplifies to a single product is considered in the following corollary.
Corollary 1
(PDF, CDF, and central moments of information density for equal canonical correlations). If all canonical correlations are equal, i.e., ρ_1 = ρ_2 = ⋯ = ρ_r,
then we have the following simplifications.
(i) The PDF of the information density simplifies to
where is given by
If r ≥ 2, then the PDF is also well defined at the mutual information I(ξ; η).
(ii) The CDF of the information density is given by
with defined by
(iii) The m-th central moment of the information density has the form
for all m ∈ ℕ.
Clearly, if all canonical correlations are equal, then only a single summand in the series (3) and (4) is nonzero. For this single summand, the product in squared brackets in (3) and (4) is equal to 1, which yields the results of parts (i) and (ii) of Corollary 1. Details of the derivation of part (iii) of the corollary are provided in Section 4.
Note that if all canonical correlations are equal, then we can rewrite (1) as follows:
This implies that the distribution of the information density coincides with the distribution of the random variable
where the two random variables appearing in this representation are i.i.d. χ²-distributed with r degrees of freedom. With this representation, we can obtain the expression of the PDF given in (6) also from [14], Sec. 4.A.4.
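Since the display formula (6) is not reproduced above, the following sketch evaluates the centered PDF implied by the χ²-difference representation just described: if all r canonical correlations equal ρ, the centered information density is distributed as (ρ/2)(X − Y) with independent X, Y ∼ χ²_r, whose density is the symmetric Bessel-K (variance-gamma) form coded below. This is our own reconstruction and should agree with (6); the Monte Carlo comparison is only a sanity check.

```python
import numpy as np
from scipy.special import kv, gammaln

def centered_pdf_equal_rho(t, r, rho):
    """PDF of (rho/2)*(X - Y) with independent X, Y ~ chi^2_r, i.e., of the
    centered information density when all r canonical correlations equal rho."""
    t = np.maximum(np.abs(np.asarray(t, dtype=float)), 1e-12)  # avoid evaluating exactly at zero
    nu = (r - 1) / 2
    log_norm = -(0.5 * np.log(np.pi) + gammaln(r / 2) + nu * np.log(2) + np.log(rho))
    return np.exp(log_norm + nu * np.log(t / rho)) * kv(nu, t / rho)

# Sanity check against samples drawn from the chi-squared difference representation
rng = np.random.default_rng(0)
r, rho = 4, 0.8
samples = 0.5 * rho * (rng.chisquare(r, 500_000) - rng.chisquare(r, 500_000))
hist, edges = np.histogram(samples, bins=200, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - centered_pdf_equal_rho(mid, r, rho))))  # small, up to histogram noise
```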
Special cases of Corollary 1. The case when all canonical correlations are equal is important because it occurs in various situations. The subsequent cases follow from the properties of canonical correlations given in Section 3.
(i) Assume that the random variables are pairwise uncorrelated with the exception of the pairs for which we have , where denotes the Pearson correlation coefficient. Then, and for all . Note that if , then for the previous conditions to hold, it is sufficient that the two-dimensional random vectors are i.i.d. However, the identical distribution of the ’s is not necessary. In Laneman [15], the distribution of the information density for an additive white Gaussian noise channel with i.i.d. Gaussian input is determined. This is a special case of the setting with i.i.d. random vectors just mentioned. In Wu and Jindal [16] and in Buckingham and Valenti [17], an approximation of the information density by a Gaussian random variable is considered for the setting in [15]. A special case very similar to that in [15] is also considered in Polyanskiy et al. [6], Sec. III.J. To the best of the authors’ knowledge, explicit formulas for the general case as considered in this paper are not available yet in the literature.
(ii) Assume that the conditions of part (i) are satisfied. Furthermore, assume that is a real nonsingular matrix of dimension and is a real nonsingular matrix of dimension . Then, the random vectors
have the same canonical correlations as the random vectors and , i.e., for all .
(iii) If r = 1, i.e., if the cross-covariance matrix Σ_ξη has rank 1, then Corollary 1 obviously applies. Clearly, the simplest special case with r = 1 occurs for p = q = 1, where ρ_1 equals the absolute value of the Pearson correlation coefficient of ξ and η.
As a simple multivariate example, let the covariance matrix of the random vector be given by the Kac-Murdock–Szegö matrix
which is related to the covariance function of a first-order autoregressive process, where . Then, and . A numerical check of this example is sketched after case (iv) below.
(iv) As yet another example, assume and for some . Then, for . Here, denotes the square root of the real-valued positive semidefinite matrix A, i.e., the unique positive semidefinite matrix B such that .
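For case (iii), a quick numerical check of the Kac-Murdock–Szegö example is easy to set up. The sketch below assumes the KMS matrix has entries a^{|i−j|} for a parameter a with 0 < |a| < 1 and that it is the joint covariance matrix of (ξ, η); under these assumptions the canonical correlations, computed as described in Section 3, consist of a single non-zero value equal to |a|. The helper functions are our own.

```python
import numpy as np

def inv_sqrt(cov):
    """Inverse square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(cov)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

def canonical_correlations(cov_xi, cov_eta, cov_cross):
    """Singular values of cov_xi^{-1/2} @ cov_cross @ cov_eta^{-1/2}, cf. Section 3."""
    m = inv_sqrt(cov_xi) @ cov_cross @ inv_sqrt(cov_eta)
    return np.linalg.svd(m, compute_uv=False)

p, q, a = 3, 4, 0.6
idx = np.arange(p + q)
joint = a ** np.abs(idx[:, None] - idx[None, :])   # assumed KMS joint covariance, entries a^{|i-j|}
rho = canonical_correlations(joint[:p, :p], joint[p:, p:], joint[:p, p:])
print(np.round(rho, 8))   # single non-zero canonical correlation equal to |a| = 0.6
```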
2.2. More on Special Cases with Simplified Formulas
Let us further evaluate the formulas given in Corollary 1 and Theorem 3 for some relevant parameter values.
(i) Single canonical correlation coefficient. In the simplest case, there is only a single non-zero canonical correlation coefficient, i.e., r = 1. (Recall that at the beginning of the paper we excluded the degenerate case in which all canonical correlations are zero.) Then, the formulas of the PDF and the m-th central moment in Corollary 1 simplify to the form
and
for all m ∈ ℕ. A formula equivalent to (10) is also provided by Pinsker [8], Lemma 9.6.1, who considered the special case p = q = 1, which implies r = 1.
(ii) Second and fourth central moment. To demonstrate how the general formula given in Theorem 3 is used, we first consider m = 2. In this case, exactly one of the summation indices equals 2, whereas the remaining indices have to be zero. Thus, (5) evaluates for m = 2 to
As a slightly more complex example, let m = 4. In this case, either exactly one summation index equals 4, whereas the remaining indices are zero, or exactly two summation indices equal 2, whereas the remaining indices have to be zero. Thus, (5) evaluates for m = 4 to
(A Monte Carlo cross-check of these two evaluations is sketched at the end of this subsection.)
(iii) Even number of equal canonical correlations. As in Corollary 1, assume that all canonical correlations are equal and additionally assume that the number r of canonical correlations is even, i.e., for some . Then, we can use [9], Secs. 10.47.9, 10.49.1, and 10.49.12 to obtain the following relation for the modified Bessel function of the second kind and order
Plugging (12) into (6) and rearranging terms yields the following expression for the PDF of the information density:
By integration, we obtain for the function in (8) the expression
Note that these special formulas can also be obtained directly from the results given in [14], Sec. 4.A.3.
To illustrate the principal behavior of the PDF and CDF of the information density for equal canonical correlations, it is instructive to consider the specific value in the above formulas, which yields
and , for which we obtain
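As announced in case (ii) above, the two evaluations of (5) can be cross-checked numerically. Under the distributional representation assumed in the earlier sampling sketch, the second central moment equals Σ_j ρ_j², in agreement with the variance statement recalled in Section 6, and the fourth central moment works out to 3(Σ_j ρ_j²)² + 6 Σ_j ρ_j⁴; both expressions are our own evaluations of that representation, not quotations of the omitted displays.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = np.array([0.9, 0.6, 0.6, 0.2])                       # example canonical correlations
n, m = rng.standard_normal((2, 1_000_000, rho.size))
centered = 0.5 * (n**2 - m**2) @ rho                       # centered information density (assumed representation)

var_formula = np.sum(rho**2)                               # second central moment, cf. (11)
mu4_formula = 3 * np.sum(rho**2)**2 + 6 * np.sum(rho**4)   # fourth central moment implied by the representation
print(centered.var(), var_formula)
print(np.mean(centered**4), mu4_formula)
```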
3. Mutual Information and Information Density in Terms of Canonical Correlations
First introduced by Hotelling [18], the canonical correlation analysis is a widely used linear method in multivariate statistics to determine the maximum correlations between two sets of random variables. It allows a particularly simple and useful representation of the mutual information and the information density of Gaussian random vectors in terms of the so-called canonical correlations. This representation was first obtained by Gelfand and Yaglom [19] and further extended by Pinsker [8], Ch. 9. For the convenience of the reader, we summarize in this section the essence of the canonical correlation analysis and demonstrate how it is applied to derive the representations in (1) and (2).
The formulation of the canonical correlation analysis given below is particularly suitable for implementations. The corresponding results are given without proof. Details and thorough discussions can be found, e.g., in Härdle and Simar [20], Koch [21], or Timm [22].
Based on the nonsingular covariance matrices Σ_ξ and Σ_η of the random vectors ξ and η, and the cross-covariance matrix Σ_ξη with rank r satisfying 1 ≤ r ≤ min(p, q), define the matrix M = Σ_ξ^(-1/2) Σ_ξη Σ_η^(-1/2),
where the inverse matrices Σ_ξ^(-1/2) and Σ_η^(-1/2) can be obtained from diagonalizing Σ_ξ and Σ_η. Then, the matrix M has a singular value decomposition M = U D V^T,
where V^T denotes the transpose of V. The only non-zero entries of the matrix D are called the canonical correlations of ξ and η, denoted by ρ_1, …, ρ_r. The singular value decomposition can be chosen such that ρ_1 ≥ ρ_2 ≥ ⋯ ≥ ρ_r holds, which is assumed throughout the paper.
Define the random vectors Aξ and Bη, where the nonsingular matrices A and B are given by A = U^T Σ_ξ^(-1/2) and B = V^T Σ_η^(-1/2). Then, the components of Aξ and Bη have unit variance and they are pairwise uncorrelated, with the exception of the pairs formed by the j-th components of Aξ and Bη, j = 1, …, r, for which the correlation equals ρ_j.
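The construction just described amounts to a few lines of linear algebra. The following sketch (our own minimal implementation, not the code from [26]) computes M, its singular value decomposition, and the transforms A and B, and then verifies that the transformed vectors have unit-variance, pairwise uncorrelated components with the canonical correlations appearing exactly at the paired positions; it also evaluates the mutual information assuming the form −(1/2) Σ_j ln(1 − ρ_j²) for (2).

```python
import numpy as np

def inv_sqrt(cov):
    """Inverse square root of a symmetric positive definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(cov)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

def cca(cov_xi, cov_eta, cov_cross):
    """Canonical correlation analysis from the covariance blocks: returns the canonical
    correlations and the transforms A = U^T cov_xi^{-1/2}, B = V^T cov_eta^{-1/2}."""
    s_xi, s_eta = inv_sqrt(cov_xi), inv_sqrt(cov_eta)
    u, d, vt = np.linalg.svd(s_xi @ cov_cross @ s_eta)
    return d, u.T @ s_xi, vt @ s_eta

# Example with a randomly generated, positive definite joint covariance of (xi, eta)
rng = np.random.default_rng(0)
p, q = 3, 5
root = rng.standard_normal((p + q, p + q + 2))
joint = root @ root.T
rho, a_mat, b_mat = cca(joint[:p, :p], joint[p:, p:], joint[:p, p:])

d_full = np.zeros((p, q))
np.fill_diagonal(d_full, rho)
print(np.allclose(a_mat @ joint[:p, :p] @ a_mat.T, np.eye(p)))   # unit variance, uncorrelated components
print(np.allclose(a_mat @ joint[:p, p:] @ b_mat.T, d_full))      # cross-covariance carries the canonical correlations
print(-0.5 * np.sum(np.log1p(-rho**2)))                          # mutual information, assuming the form of (2)
```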
Using these results, we obtain for the mutual information and the information density
The first equality in (13) and (14) holds because A and B are nonsingular matrices, which follows, e.g., from Pinsker [8], Th. 3.7.1. Since we consider the case where and are jointly Gaussian, and are jointly Gaussian as well. Therefore, the correlation properties of and imply that all random variables are independent except for the pairs , . This implies the last equality in (13) and (14), where are independent. The sum representations follow from the chain rules of mutual information and information density and the equivalence between independence and vanishing mutual information and information density.
Since and are jointly Gaussian with correlation , we obtain from (13) and the formula of mutual information for the bivariate Gaussian case the identity (2). Additionally, with and having zero mean and unit variance, the information density is further given by
Now assume … are i.i.d. Gaussian random variables with zero mean and unit variance. Then, the distribution of the random vector
coincides with the distribution of the random vector for all . Plugging this into (15), we obtain together with (14) that the distribution of the information density coincides with the distribution of (1).
4. Proof of Main Results
4.1. Auxiliary Results
To prove Theorem 1, the following lemma regarding the characteristic function of the information density is utilized. The results of the lemma are also used in Ibragimov and Rozanov [23] but without proof. Therefore, the proof is given below for completeness.
Lemma 1
(Characteristic function of (shifted) information density). The characteristic function of the shifted information density is equal to the characteristic function of the random variable
where … are i.i.d. Gaussian random variables with zero mean and unit variance, and are the canonical correlations of ξ and η. The characteristic function of is given by
Proof.
Due to (1), the distribution of the shifted information density coincides with the distribution of the random variable in (16) such that the characteristic functions of and are equal.
It is a well-known fact that the squared Gaussian random variables in (16) are chi-squared distributed with one degree of freedom, from which we obtain that the weighted random variables in (16) are gamma distributed with scale parameter ρ_j and shape parameter 1/2. The characteristic function of these random variables therefore admits the form
Further, from the identity for the characteristic function and from the independence of and , we obtain the characteristic function of to be given by
Finally, because in (16) is given by the sum of the independent random variables , the characteristic function of results from multiplying the individual characteristic functions of the random variables . By doing so, we obtain (17). □
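Reading (17) as the product Π_j (1 + ρ_j² t²)^(−1/2), which is what the gamma factors in the proof multiply out to, the result of Lemma 1 can be checked empirically. The snippet below is such a check under that reading and is not part of the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = np.array([0.95, 0.5, 0.25])
n, m = rng.standard_normal((2, 400_000, rho.size))
t_samples = 0.5 * (n**2 - m**2) @ rho                    # samples of the random variable in (16) (assumed form)

t = np.linspace(0.1, 5.0, 8)
empirical = np.mean(np.exp(1j * np.outer(t, t_samples)), axis=1)                # empirical characteristic function
closed_form = np.prod(1.0 + (rho[None, :] * t[:, None])**2, axis=1) ** -0.5     # product form read from (17)
print(np.max(np.abs(empirical - closed_form)))            # small, up to Monte Carlo error
```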
As a further auxiliary result, the subsequent proposition, which provides properties of the modified Bessel function of the second kind, will be used to prove the main results.
Proposition 1
(Properties related to the function ). For all , the function
where K_α denotes the modified Bessel function of the second kind and order α [9], Sec. 10.25(ii), is strictly positive and strictly monotonically decreasing. Furthermore, if , then we have
Proof.
If is fixed, then is strictly positive and strictly monotonically decreasing w. r. t. due to [9], Secs. 10.27.3 and 10.37. Furthermore, we obtain
by applying the rules to calculate derivatives of Bessel functions given in [9], Sec. 10.29(ii). It follows that is strictly positive and strictly monotonically decreasing w. r. t. for all fixed .
Consider now the Basset integral formula as given in [9], Sec. 10.32.11
for and the integral
for , where the equality holds due to [24], Secs. 3.251.2 and 8.384.1. Using (19) and (20), we obtain
for all , where we also applied the dominated convergence theorem, which is possible due to . Using the previously derived monotonicity, we obtain (18). □
4.2. Proof of Theorem 1
To prove Theorem 1, we calculate the PDF of the random variable introduced in Lemma 1 by inverting the characteristic function given in (17) via the integral
Shifting the PDF of by , we obtain the PDF , , of the information density .
The method used subsequently is based on the work of Mathai [10]. To invert the characteristic function , we expand the factors in (17) as
In (23), we have used the binomial series
where . The series is absolutely convergent for and
denotes the generalized binomial coefficient with . Since
holds for all , the series in (23) is absolutely convergent for all . Using the expansion in (23) and the absolute convergence together with the identity
we can rewrite the characteristic function as
To obtain the PDF , we evaluate the inversion integral (21) based on the series representation in (28). Since every series in (28) is absolutely convergent, we can exchange summation and integration. Let . Then, by symmetry, we have for the integral of a summand
where the second equality is a result of the substitution . By setting , and in the Basset integral formula given in (19) in the proof of Proposition 1 and using the symmetry with respect to v, we can evaluate (29) to the following form:
Combining (21), (28), and (30) yields
Slightly rearranging terms and shifting by yields (3).
It remains to show that is also well defined for if . Indeed, if , then we can use Proposition 1 to obtain
where we used the exchangeability of the limit and the summation due to the absolute convergence of the series. Since is decreasing w. r. t. , we have
Then, with (69) in the proof of Theorem 4, it follows that exists and is finite. □
4.3. Proof of Theorem 2
To prove Theorem 2, we calculate the CDF of the random variable introduced in Lemma 1 by integrating the PDF given in (31). Shifting the CDF of by , we obtain the CDF , of the information density . Using the symmetry of , we can write
It is therefore sufficient to evaluate the integral
for . To calculate the integral (32), we plug (31) into (32) and exchange integration and summation, which is justified by the monotone convergence theorem. To evaluate the integral of a summand, consider the following identity
for given in [25], Sec. 1.12.1.3, where L_α denotes the modified Struve function of order α [9], Sec. 11.2. Using (33) with , we obtain (4). □
4.4. Proof of Theorem 3
Using the random variable
introduced in Lemma 1 and the well-known multinomial theorem [9], Sec. 26.4.9
where , we can write the m-th central moment of the information density as
To obtain the second equality in (34), we have exchanged expectation and summation and additionally used the identity , which holds due to the independence of the random variables .
Based on the relation between the ℓ-th central moment of a random variable and the ℓ-th derivative of its characteristic function at 0, we further have
where is the characteristic function of the random variable derived in the proof of Lemma 1. As in the proof of Theorem 1, consider now the binomial series expansion using (24)
The series is absolutely convergent for all . Furthermore, consider the Taylor series expansion of the characteristic function at the point 0
Both series expansions must be identical in an open interval around 0 such that we obtain by comparing the series coefficients
for all . With this result, (35) evaluates to
for all , where we have additionally used the identity (27).
4.5. Proof of Part (iii) of Corollary 1
Using the random variable as in the proof of Theorem 3, we can write the m-th central moment of the information density as
where the characteristic function of is given by , due to Lemma 1 and the equality of all canonical correlations. Using the binomial series and the Taylor series expansion as in the proof of Theorem 3, we obtain
for all . Collecting terms and additionally using the definition of the generalized binomial coefficient given in (25) in the proof of Theorem 1 yields (9). □
5. Recurrence Formulas and Finite Sum Approximations
If there are at least two distinct canonical correlations, then the PDF and CDF of the information density are given by the infinite series in Theorems 1 and 2. If we consider only a finite number of summands in these representations, then we obtain approximations amenable in particular for numerical calculations. However, a direct finite sum approximation of the series in (3) and (4) is rather inefficient since modified Bessel and Struve functions have to be evaluated for every summand. Therefore, we derive in this section recursive representations, which allow efficient numerical calculations. Furthermore, we derive uniform bounds of the approximation error. Based on the recurrence relations and the error bounds, an implementation in the programming language Python has been developed, which provides an efficient tool to numerically calculate the PDF and CDF of the information density with a predefined accuracy as high as desired. The developed source code as well as illustrating examples are made publicly available in an open access repository on GitLab [26].
Subsequently, we adopt all the previous notation and assume r ≥ 2 and at least two distinct canonical correlations (since otherwise we have the case of Corollary 1, where the series reduce to a single summand).
5.1. Recurrence Formulas
The recursive approach developed below is based on the work of Moschopoulos [27], which extended the work of Mathai [10]. First, we rewrite the series representations of the PDF and CDF of the information density given in Theorem 1 and Theorem 2 in a form, which is suitable for recursive calculations. To begin with, we define two functions appearing in the series representations (3) and (4), which involve the modified Bessel function of second kind and order and the modified Struve function of order . Let us define for all the functions and by
and
Furthermore, we define for all the coefficient by
where . With these definitions, we obtain the following alternative series representations of (3) and (4) by observing that the multiple summations over the indices can be shortened to one summation over the index .
Proposition 2
(Alternative representation of PDF and CDF of the information density). The PDF of the information density given in Theorem 1 has the alternative series representation
The function specifying the CDF of the information density as given in Theorem 2 has the alternative series representation
Based on the representations in Proposition 2 and with recursive formulas for , and , we are in the position to calculate the PDF and CDF of the information density by a single summation over completely recursively defined terms. In the following, we will derive recurrence relations for , and , which allow the desired efficient calculations.
Lemma 2
(Recurrence formula of the function ). If for all the function is defined by (37), then satisfies for all and the recurrence formula
Proof.
Lemma 3
Proof.
First, assume . We have for all and from the proof of Lemma 2 we have for all . Thus, the left-hand side and the right-hand side of (46) are both zero, which shows that (46) holds for and .
Now, assume and consider the recurrence formula
for the modified Struve function of order [9], Sec. 11.4.25. Together with the recurrence formula (44) for the modified Bessel function of the second kind and order , we obtain
Plugging (47) and (48) into (38) for yields for
Together with (38), the identity , and the definition of the function in (37), we obtain the recurrence formula (46) for if and . □
Lemma 4
(Recursive formula of the coefficient ). The coefficient defined by (39) satisfies for all the recurrence formula
where and
For the derivation of Lemma 4, we use an adapted version of the method of Moschopoulos [27] and the following auxiliary result.
Lemma 5.
For k ∈ ℕ, let g be a real univariate k-times differentiable function. Then, we have the following recurrence relation for the k-th derivative of the composite function f = exp ∘ g:
where f^(i) denotes the i-th derivative of the function f, with f^(0) = f.
Proof.
We prove the assertion of Lemma 5 by induction over k. First, consider the base case for . In this case, formula (51) gives
which is easily seen to be true.
Assuming formula (51) holds for , we continue with the case . Application of the product rule leads to
Substitution of in the first term gives
With this representation and the identity,
we finally have
This completes the proof of Lemma 5. □
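Lemma 5 is easy to check symbolically. The snippet below assumes that the recurrence (51) reads f^(k) = Σ_{i=1}^{k} binom(k−1, i−1) g^(i) f^(k−i) for f = exp ∘ g, which is the standard form of this identity; the check is ours and only confirms that reading for small k.

```python
import sympy as sp

x = sp.symbols('x')
g = sp.Function('g')
f = sp.exp(g(x))   # composite function f = exp(g(x))

for k in range(1, 6):
    lhs = sp.diff(f, x, k)
    rhs = sum(sp.binomial(k - 1, i - 1) * sp.diff(g(x), x, i) * sp.diff(f, x, k - i)
              for i in range(1, k + 1))
    print(k, sp.expand(lhs - rhs) == 0)   # True for each k
```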
Proof of Lemma 4.
To prove the recurrence formula (49), we consider the characteristic function
of the random variable introduced in Lemma 1. On the one hand, the series representation of given in (28) in the proof of Theorem 1 can be rewritten as follows using the coefficient defined in (39):
On the other hand, recall the expansion of given in (22), which yields together with (52) and the application of the natural logarithm the identity
Now consider the power series
which is absolutely convergent for . With the same arguments as in the proof of Theorem 1, in particular due to (26), we can apply the series expansion (55) to the second term on the right-hand side of (54) to obtain the absolutely convergent series representation
where we have further used the definition of given in (50). Applying the exponential function to both sides of (56) then yields the following expression for the characteristic function .
Comparing (53) and (57) yields the identity
We now define and take the -th derivative w. r. t. x on both sides of (58) using the identity
for the m-th derivative of a power series . For the left-hand side of (58), we obtain
For the right-hand side of (58), we obtain
where we used Lemma 5 and the identities (58) and (59). From the equality
and the evaluation of the right-hand side of (60) and (61), we obtain
Comparing the coefficients for finally yields
This completes the proof of Lemma 4. □
5.2. Finite Sum Approximations
The results in the previous Section 5.1 can be used in the following way for efficient numerical calculations. Consider
for , i.e., the finite sum approximation of the PDF given in (40). To calculate , first calculate and using (37). Then, use the recurrence formulas (42) and (49) to calculate the remaining summands in (62). The great advantage of this approach is that only two evaluations of the modified Bessel function are required; all remaining summands are obtained from the recursive formulas, which makes the numerical computations efficient.
Similarly, consider
for , i.e., the finite sum approximation of the alternative representation of the CDF of the information density, where is the finite sum approximation of the function given in (41). To calculate , first calculate , , and for or using (37) and (38). Then, use the recurrence formulas (42), (46), and (49) to calculate the remaining summands in (64). This approach requires only three evaluations of the modified Bessel and Struve functions, resulting in efficient numerical calculations also for the CDF of the information density.
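Independent of the recurrence formulas, the finite sums (62) and (64) can be cross-checked by numerically inverting the characteristic function from Lemma 1, again read as Π_j (1 + ρ_j² t²)^(−1/2). The sketch below is only such a cross-check, not the method of this section: it uses the standard cosine-transform inversion for the PDF and the Gil-Pelaez formula for the CDF of the centered information density.

```python
import numpy as np
from scipy.integrate import quad

def cf(t, rho):
    """Characteristic function of the centered information density (product form read from (17))."""
    return np.prod(1.0 + (np.asarray(rho) * t)**2) ** -0.5

def pdf_numeric(x, rho):
    """PDF via cosine-transform inversion (the characteristic function is real and even)."""
    return quad(lambda t: np.cos(t * x) * cf(t, rho), 0, np.inf, limit=200)[0] / np.pi

def cdf_numeric(x, rho):
    """CDF via the Gil-Pelaez inversion formula."""
    return 0.5 + quad(lambda t: np.sin(t * x) * cf(t, rho) / t, 0, np.inf, limit=200)[0] / np.pi

rho = [0.9, 0.6, 0.3]
print(pdf_numeric(0.5, rho), cdf_numeric(0.5, rho))
# Values obtained this way can be compared against the finite sums (62) and (64),
# or against the GitLab implementation [26], to validate a chosen error bound.
```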
The following theorem provides suitable bounds to evaluate and control the error related to the introduced finite sum approximations.
Theorem 4
Proof.
From the special case where all canonical correlations are equal, we can conclude from the CDF given in Corollary 1 that the function
is monotonically increasing for all , and that further
holds. Using (68), we obtain from (4)
by exchanging the limit and the summation, which is justified by the monotone convergence theorem. Due to the properties of the CDF, we have , which implies
where the first equality follows from the definition of the coefficient in (39).
Remark 1.
Note that the bound in (65) can be further simplified using the inequality. Further note that the derived error bounds are uniform in the sense that they only depend on the parameters of the given Gaussian distribution and on the number of summands considered. As can be seen from (69), the bounds converge to zero as the number of summands increases.
Remark 2
(Relation to Bell polynomials). Interestingly, the coefficient can be expressed for all in the following form
where is defined in (50), and denotes the complete Bell polynomial of order k [28], Sec. 3.3. Even though this is an interesting connection to the Bell polynomials, which provides an explicit formula of , the recursive formula given in Lemma 4 is more efficient for numerical calculations.
6. Numerical Examples and Illustrations
We illustrate the results of this paper with some examples, all of which can be verified with the Python implementation publicly available on GitLab [26].
Equal canonical correlations. First, we consider the special case of Corollary 1 when all canonical correlations are equal. The PDF and CDF given by (6) and (7) are illustrated in Figure 1 and Figure 2 in centered form, i.e., shifted by the mutual information, for and equal canonical correlations . In Figure 3 and Figure 4, a fixed number of equal canonical correlations is considered. When all canonical correlations are equal, then, due to the central limit theorem, the distribution of the information density converges to a Gaussian distribution as r → ∞. Figure 5 and Figure 6 show for and equal canonical correlations the PDF and CDF of the information density together with corresponding Gaussian approximations. The approximations are obtained by considering Gaussian distributions that have the same variance as the information density. Recall that the variance of the information density is given by (11), i.e., by the sum of the squared canonical correlations. The illustrations show that the distribution of the information density becomes approximately Gaussian only for a large number of equal canonical correlations.
Figure 1.
PDF for equal canonical correlations .
Figure 2.
CDF for equal canonical correlations .
Figure 3.
PDF for equal canonical correlations .
Figure 4.
CDF for equal canonical correlations .
Figure 5.
PDF for equal canonical correlations vs. Gaussian approximation.
Figure 6.
CDF for equal canonical correlations vs. Gaussian approximation.
Different canonical correlations. To illustrate the case with different canonical correlations, let us consider two more examples.
(i) First, assume that the random vectors and have equal dimensions, i.e., , and are related by
where and are zero mean Gaussian random vectors, independent of each other and with covariance matrices
for parameters and , where denotes the identity matrix of dimension . The covariance matrix of the Gaussian random vector is the basis of the canonical correlation analysis and is given by
The specified situation corresponds to a discrete-time additive noise channel, where a stationary first-order Markov-Gaussian input process is corrupted by a stationary additive white Gaussian noise process. In this setting, a block of p consecutive input and output symbols is considered.
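The covariance blocks entering the canonical correlation analysis for this example are straightforward to set up numerically. In the sketch below we assume that the input covariance is the Kac-Murdock–Szegö matrix with entries a^{|i−j|}, that the noise covariance is σ² times the identity, and hence that the output covariance is their sum and the cross-covariance equals the input covariance; the specific parameter values are illustrative and not those used for Figures 7 and 8. The canonical correlations are obtained with the method of Section 3 rather than via the closed-form expression (70).

```python
import numpy as np

def inv_sqrt(cov):
    w, v = np.linalg.eigh(cov)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

p, a, sigma2 = 20, 0.8, 0.5                           # illustrative parameter values
idx = np.arange(p)
cov_xi = a ** np.abs(idx[:, None] - idx[None, :])     # KMS input covariance (assumed entries a^{|i-j|})
cov_eta = cov_xi + sigma2 * np.eye(p)                 # output covariance for output = input + white noise
cov_cross = cov_xi                                    # cross-covariance of input and output

m = inv_sqrt(cov_xi) @ cov_cross @ inv_sqrt(cov_eta)
rho = np.linalg.svd(m, compute_uv=False)              # canonical correlations, cf. Section 3
print(rho[:5])                                        # largest canonical correlations
print(np.sum(rho**2))                                 # variance of the information density, cf. (11)
print(-0.5 * np.sum(np.log1p(-rho**2)))               # mutual information, assuming the form of (2)
```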
For given parameter values and , the canonical correlations can be calculated numerically with the method described in Section 3. However, the example at hand even allows the derivation of explicit formulas for the canonical correlations. Evaluating the approach in Section 3 analytically yields
where are the zeros of the function
In this representation, denote the eigenvalues of the covariance matrix derived in [29], Sec. 5.3.
As numerical examples, Figure 7 and Figure 8 show the approximated PDF and CDF for and the parameter values and using the finite sums (62) and (64). The bounds of the approximation error given in Theorem 4 are chosen to obtain a high precision of the plotted curves. The number n of summands required in (62) and (64) to achieve these error bounds for is equal to for the PDF and for the CDF. For this example, the distribution of the information density converges to a Gaussian distribution as p → ∞. However, Figure 7 and Figure 8 show that, even for , there is still a significant gap between the exact distribution and the corresponding Gaussian approximation.
Figure 7.
Approximated PDF for canonical correlations given in (70) for and (approximation error ) vs. Gaussian approximation ().
Figure 8.
Approximated CDF for canonical correlations given in (70) for and (approximation error ) vs. Gaussian approximation ().
(ii) As a second example with different canonical correlations, let us consider the sequence with
These canonical correlations are related to the information density of a continuous-time additive white Gaussian noise channel confined to a finite time interval with a Brownian motion as input signal (see, e.g., Huffmann [30], Sec. 8.1 for more details). Figure 9 and Figure 10 show the approximated PDF and CDF for and using the finite sums (62) and (64). The bounds of the approximation error given in Theorem 4 are chosen such that there are no differences visible in the plotted curves by further lowering the approximation error. The number n of summands required in (62) and (64) to achieve these error bounds for is equal to for the PDF and for the CDF. Choosing r larger than 15 for the canonical correlations (71) with does not result in visible changes of the PDF and CDF compared to . This demonstrates, together with Figure 9 and Figure 10, that a Gaussian approximation is not valid for this example, even if .
Figure 9.
Approximated PDF for canonical correlations given in (71) for (approximation error ) vs. Gaussian approximation ().
Figure 10.
Approximated CDF for canonical correlations given in (71) for (approximation error ) vs. Gaussian approximation ().
Indeed, from [8], Th. 9.6.1 and the comment above Eq. (9.6.45) in [8], one can conclude that, whenever the canonical correlations satisfy
then the distribution of the information density is not Gaussian.
7. Summary of Contributions
We derived series representations of the PDF and CDF of the information density for arbitrary Gaussian random vectors as well as a general formula for the central moments using canonical correlation analysis. We provided simplified and closed-form expressions for important special cases, in particular when all canonical correlations are equal, and derived recurrence formulas and uniform error bounds for finite sum approximations of the general series representations. These approximations and recurrence formulas are suitable for efficient and arbitrarily accurate numerical calculations, where the approximation error can be easily controlled with the derived error bounds. Moreover, we provided examples showing the (in)validity of approximating the information density with a Gaussian random variable.
Author Contributions
J.E.W.H. and M.M. conceived this work, performed the analysis, validated the results, and wrote the manuscript. All authors have read and agreed to this version of the manuscript.
Funding
The work of M.M. was supported in part by the German Research Foundation (Deutsche Forschungsgemeinschaft) as part of Germany’s Excellence Strategy—EXC 2050/1—Project ID 390696704—Cluster of Excellence “Centre for Tactile Internet with Human-in-the-Loop” (CeTI) of Technische Universität Dresden. We acknowledge the open access publication funding granted by CeTI.
Data Availability Statement
An implementation in Python allowing efficient numerical calculations related to the main results of the paper is publicly available on GitLab: https://gitlab.com/infth/information-density (accessed on 24 June 2022).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Han, T.S.; Verdú, S. Approximation Theory of Output Statistics. IEEE Trans. Inf. Theory 1993, 39, 752–772.
- Han, T.S. Information-Spectrum Methods in Information Theory; Springer: Berlin/Heidelberg, Germany, 2003.
- Shannon, C.E. Probability of Error for Optimal Codes in a Gaussian Channel. Bell Syst. Tech. J. 1959, 38, 611–659.
- Dobrushin, R.L. Mathematical Problems in the Shannon Theory of Optimal Coding of Information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability; Volume 1: Contributions to the Theory of Statistics; University of California Press: Berkeley, CA, USA, 1961; pp. 211–252.
- Strassen, V. Asymptotische Abschätzungen in Shannons Informationstheorie. In Transactions of the Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Held 1962); Czechoslovak Academy of Sciences: Prague, Czech Republic, 1964; pp. 689–723.
- Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel Coding Rate in the Finite Blocklength Regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
- Durisi, G.; Koch, T.; Popovski, P. Toward Massive, Ultrareliable, and Low-Latency Wireless Communication With Short Packets. Proc. IEEE 2016, 104, 1711–1726.
- Pinsker, M.S. Information and Information Stability of Random Variables and Processes; Holden-Day: San Francisco, CA, USA, 1964.
- Olver, F.W.J.; Lozier, D.W.; Boisvert, R.F.; Clark, C.W. (Eds.) NIST Handbook of Mathematical Functions; Cambridge University Press: Cambridge, UK, 2010.
- Mathai, A.M. Storage Capacity of a Dam With Gamma Type Inputs. Ann. Inst. Stat. Math. 1982, 34, 591–597.
- Grad, A.; Solomon, H. Distribution of Quadratic Forms and Some Applications. Ann. Math. Stat. 1955, 26, 464–477.
- Kotz, S.; Johnson, N.L.; Boyd, D.W. Series Representations of Distributions of Quadratic Forms in Normal Variables. I. Central Case. Ann. Math. Stat. 1967, 38, 823–837.
- Huffmann, J.E.W.; Mittelbach, M. On the Distribution of the Information Density of Gaussian Random Vectors: Explicit Formulas and Tight Approximations. Entropy 2022, 24, 924.
- Simon, M.K. Probability Distributions Involving Gaussian Random Variables: A Handbook for Engineers and Scientists; Springer: Berlin/Heidelberg, Germany, 2006.
- Laneman, J.N. On the Distribution of Mutual Information. In Proceedings of the Workshop Information Theory and Its Applications (ITA), San Diego, CA, USA, 13 February 2006.
- Wu, P.; Jindal, N. Coding Versus ARQ in Fading Channels: How Reliable Should the PHY Be? IEEE Trans. Commun. 2011, 59, 3363–3374.
- Buckingham, D.; Valenti, M.C. The Information-Outage Probability of Finite-Length Codes Over AWGN Channels. In Proceedings of the 42nd Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 19–21 March 2008.
- Hotelling, H. Relations Between Two Sets of Variates. Biometrika 1936, 28, 321–377.
- Gelfand, I.M.; Yaglom, A.M. Calculation of the Amount of Information About a Random Function Contained in Another Such Function. In AMS Translations, Series 2; AMS: Providence, RI, USA, 1959; Volume 12, pp. 199–246.
- Härdle, W.K.; Simar, L. Applied Multivariate Statistical Analysis, 4th ed.; Springer: Berlin/Heidelberg, Germany, 2015.
- Koch, I. Analysis of Multivariate and High-Dimensional Data; Cambridge University Press: Cambridge, UK, 2014.
- Timm, N.H. Applied Multivariate Analysis; Springer: Berlin/Heidelberg, Germany, 2002.
- Ibragimov, I.A.; Rozanov, Y.A. On the Connection Between Two Characteristics of Dependence of Gaussian Random Vectors. Theory Probab. Appl. 1970, 15, 295–299.
- Gradshteyn, I.S.; Ryzhik, I.M. Table of Integrals, Series, and Products, 7th ed.; Elsevier: Amsterdam, The Netherlands, 2007.
- Prudnikov, A.P.; Brychov, Y.A.; Marichev, O.I. Integrals and Series, Volume 2: Special Functions; Gordon and Breach Science: New York, NY, USA, 1986.
- Huffmann, J.E.W.; Mittelbach, M. Efficient Python Implementation to Numerically Calculate PDF, CDF, and Moments of the Information Density of Gaussian Random Vectors. Source Code Provided on GitLab. 2021. Available online: https://gitlab.com/infth/information-density (accessed on 24 June 2022).
- Moschopoulos, P.G. The Distribution of the Sum of Independent Gamma Random Variables. Ann. Inst. Stat. Math. 1985, 37, 541–544.
- Comtet, L. Advanced Combinatorics: The Art of Finite and Infinite Expansions, Revised and Enlarged ed.; D. Reidel Publishing Company: Dordrecht, The Netherlands, 1974.
- Grenander, U.; Szegö, G. Toeplitz Forms and Their Applications; University of California Press: Berkeley, CA, USA, 1958.
- Huffmann, J.E.W. Canonical Correlation and the Calculation of Information Measures for Infinite-Dimensional Distributions. Diploma Thesis, Department of Electrical Engineering and Information Technology, Technische Universität Dresden, Dresden, Germany, 2021. Available online: https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-742541 (accessed on 24 June 2022).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).