On the Distribution of the Information Density of Gaussian Random Vectors: Explicit Formulas and Tight Approximations

Based on canonical correlation analysis, we derive series representations of the probability density function (PDF) and the cumulative distribution function (CDF) of the information density of arbitrary Gaussian random vectors as well as a general formula to calculate the central moments. Using the general results, we give closed-form expressions of the PDF and CDF and explicit formulas of the central moments for important special cases. Furthermore, we derive recurrence formulas and tight approximations of the general series representations, which allow efficient numerical calculations with an arbitrarily high accuracy as demonstrated with an implementation in Python publicly available on GitLab. Finally, we discuss the (in)validity of Gaussian approximations of the information density.


Introduction and Main Theorems
Let ξ and η be arbitrary random variables on an abstract probability space (Ω, F, P) such that the joint distribution P_ξη is absolutely continuous w. r. t. the product P_ξ ⊗ P_η of the marginal distributions P_ξ and P_η. If dP_ξη/d(P_ξ ⊗ P_η) denotes the Radon-Nikodym derivative of P_ξη w. r. t. P_ξ ⊗ P_η, then

i(ξ; η) = log (dP_ξη/d(P_ξ ⊗ P_η))(ξ, η)

is called the information density of ξ and η. The expectation E(i(ξ; η)) = I(ξ; η) of the information density, called mutual information, plays a key role in characterizing the asymptotic channel coding performance in terms of channel capacity. The non-asymptotic performance, however, is determined by the higher-order moments of the information density and its probability distribution. Achievability and converse bounds that allow a finite blocklength analysis of the optimum channel coding rate are closely related to the distribution function of the information density, also called the information spectrum by Han and Verdú [1,2]. Moreover, based on the variance of the information density, tight second-order finite blocklength approximations of the optimum code rate can be derived for various important channel models. First work on a non-asymptotic information theoretic analysis was published already in the early years of information theory by Shannon [3], Dobrushin [4], and Strassen [5], among others. Due to the seminal work of Polyanskiy et al. [6], considerable progress has been made in this area. The results of [6] on the one hand and the requirements of current and future wireless networks regarding latency and reliability on the other hand have stimulated significant new interest in this type of analysis (Durisi et al. [7]).
The information density i(ξ; η) in the case when ξ and η are jointly Gaussian is of special interest due to the prominent role of the Gaussian distribution. Let ξ = (ξ_1, ξ_2, …, ξ_p) and η = (η_1, η_2, …, η_q) be real-valued random vectors with nonsingular covariance matrices R_ξ and R_η and cross-covariance matrix R_ξη with rank r = rank(R_ξη). (For notational convenience, we write vectors as row vectors. However, in expressions where matrix or vector multiplications occur, we consider all vectors as column vectors.) Without loss of generality for the subsequent results, we assume the expectation of all random variables to be zero. If (ξ_1, ξ_2, …, ξ_p, η_1, η_2, …, η_q) is a Gaussian random vector, then Pinsker [8], Ch. 9.6 has shown that the distribution of the information density i(ξ; η) coincides with the distribution of the random variable

I(ξ; η) + (1/2) ∑_{i=1}^r ρ_i (ξ̃_i² − η̃_i²).   (1)

In this representation, ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and unit variance, and the mutual information I(ξ; η) in (1) has the form

I(ξ; η) = −(1/2) ∑_{i=1}^r log(1 − ρ_i²).   (2)

Moreover, ρ_1 ≥ ρ_2 ≥ … ≥ ρ_r > 0 denote the positive canonical correlations of ξ and η in descending order, which are obtained by a linear method called canonical correlation analysis that yields the maximum correlations between two sets of random variables (see Section 3). The rank r of the cross-covariance matrix R_ξη satisfies 0 ≤ r ≤ min{p, q}, and for r = 0 we have i(ξ; η) ≡ 0 almost surely and I(ξ; η) = 0. This corresponds to P_ξη = P_ξ ⊗ P_η and the independence of ξ and η such that the resulting information density is deterministic. Throughout the rest of the paper, we exclude this degenerate case when the information density is considered and assume subsequently the setting and notation introduced above with r ≥ 1. As customary notation, we further write R, N_0, and N to denote the sets of real numbers, non-negative integers, and positive integers.
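The distributional representation (1) lends itself to a direct Monte Carlo illustration. The following sketch (our own snippet with hypothetical helper names, independent of the GitLab implementation [26]) samples the information density for given canonical correlations and compares the empirical mean with (2) and the empirical variance with ∑_i ρ_i² (see Section 2):

```python
import numpy as np

def sample_information_density(rho, n, rng):
    """Draw n samples distributed like the information density i(xi; eta),
    using the representation (1): I + (1/2) * sum_i rho_i * (X_i^2 - Y_i^2)
    with i.i.d. standard Gaussian X_i, Y_i."""
    rho = np.asarray(rho, dtype=float)
    I = -0.5 * np.log1p(-rho**2).sum()        # mutual information, Eq. (2)
    X = rng.standard_normal((n, rho.size))
    Y = rng.standard_normal((n, rho.size))
    return I + 0.5 * ((X**2 - Y**2) * rho).sum(axis=1)

rng = np.random.default_rng(0)
s = sample_information_density([0.9, 0.5, 0.3], 500_000, rng)
# Empirical mean should be close to I(xi; eta) = 1.0214...,
# the empirical variance close to sum_i rho_i^2 = 1.15.
```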
Main contributions. Based on (1), we derive in Section 4 series representations of the probability density function (PDF) and the cumulative distribution function (CDF) as well as explicit general formulas for the central moments of the information density i(ξ; η) given subsequently in Theorems 1 to 3. The series representations are useful as they allow tight approximations with errors as low as desired by finite sums as shown in Section 5.2. Moreover, we derive recurrence formulas in Section 5.1 that allow efficient numerical calculations of the series representations in Theorems 1 and 2.
The method to obtain the result in Theorem 1 is adopted from Mathai [10], where a series representation of the PDF of the sum of independent gamma distributed random variables is derived. Previous work of Grad and Solomon [11] and Kotz et al. [12] goes in a similar direction as Mathai [10]; however, it is not directly applicable since only the restriction to positive series coefficients is considered there. Using Theorem 1, the series representation of the CDF of the information density in Theorem 2 is obtained. The details of the derivations of Theorems 1 and 2 are provided in Section 4.

Theorem 3 (Central moments of information density). The m-th central moment E([i(ξ; η) − I(ξ; η)]^m) of the information density i(ξ; η) is given by

E([i(ξ; η) − I(ξ; η)]^m) = m! ∑_{(m_1, m_2, …, m_r) ∈ K^[2]_{m,r}} ∏_{i=1}^r (2m_i choose m_i) (ρ_i/2)^{2m_i}   (5)

for all m ∈ N, where K^[2]_{m,r} = {(m_1, m_2, …, m_r) ∈ N_0^r : 2m_1 + 2m_2 + ⋯ + 2m_r = m}. In particular, all central moments of odd order vanish since K^[2]_{m,r} is empty for odd m.
Pinsker [8], Eq. (9.6.17) provided a formula for the so-called "derived m-th central moment" of the information density, where ξ̃_i and η̃_i are given as in (1). These special moments coincide for m = 2 with the usual central moments considered in Theorem 3. The rest of the paper is organized as follows: In Section 2, we discuss important special cases which allow simplified and explicit formulas. In Section 3, we provide some background on canonical correlation analysis and its application to the calculation of the information density and mutual information for Gaussian random vectors. The proofs of the main Theorems 1 to 3 are given in Section 4. Recurrence formulas, finite sum approximations, and uniform bounds of the approximation error are derived in Section 5, which allow efficient and accurate numerical calculations of the PDF and CDF of the information density. Some examples and illustrations are provided in Section 6, where also the (in)validity of Gaussian approximations is discussed. Finally, Section 7 summarizes the paper. Note that a first version of this paper was published on arXiv as a preprint [13].

Equal Canonical Correlations
A simple but important special case for which the series representations in Theorems 1 and 2 simplify to a single summand and the sum of products in Theorem 3 simplifies to a single product is considered in the following corollary.
Corollary 1 (PDF, CDF, and central moments of information density for equal canonical correlations). If all canonical correlations are equal, i.e., ρ_1 = ρ_2 = ⋯ = ρ_r, then we have the following simplifications.
(i) The PDF f_{i(ξ;η)} of the information density i(ξ; η) simplifies to

f_{i(ξ;η)}(x) = (|x − I(ξ; η)|/(2ρ_r))^{(r−1)/2} K_{(r−1)/2}(|x − I(ξ; η)|/ρ_r) / (√π Γ(r/2) ρ_r),   (6)

where I(ξ; η) is given by

I(ξ; η) = −(r/2) log(1 − ρ_r²).   (7)

(ii) The CDF F_{i(ξ;η)} of the information density i(ξ; η) simplifies to

F_{i(ξ;η)}(x) = 1/2 + sgn(x − I(ξ; η)) V(|x − I(ξ; η)|),

with V(z) defined by (8).

(iii) The m-th central moment E([i(ξ; η) − I(ξ; η)]^m) of the information density i(ξ; η) has the form

E([i(ξ; η) − I(ξ; η)]^m) = m! Γ((r + m)/2) ρ_r^m / ((m/2)! Γ(r/2))   (9)

for all even m ∈ N, and it vanishes for all odd m ∈ N.
Clearly, if all canonical correlations are equal, then the only nonzero term in the series (3) and (4) occurs for k_1 = k_2 = ⋯ = k_{r−1} = 0. For this single summand, the product in squared brackets in (3) and (4) is equal to 1 by applying the convention 0^0 = 1, which yields the results of parts (i) and (ii) in Corollary 1. Details of the derivation of part (iii) of the corollary are provided in Section 4.
Note, if all canonical correlations are equal, then we can rewrite (1) as follows:

I(ξ; η) + (ρ_r/2) (∑_{i=1}^r ξ̃_i² − ∑_{i=1}^r η̃_i²).

This implies that the distribution of the shifted information density i(ξ; η) − I(ξ; η) coincides with the distribution of the random variable (ρ_r/2)(ζ_1 − ζ_2), where ζ_1 and ζ_2 are i.i.d. χ²-distributed random variables with r degrees of freedom.
With this representation, we can obtain the expression of the PDF given in (6) also from [14], Sec. 4.A.4. Special cases of Corollary 1. The case when all canonical correlations are equal is important because it occurs in various situations. The subsequent cases follow from the properties of canonical correlations given in Section 3.
(i) Assume that the random variables ξ_1, ξ_2, …, ξ_p, η_1, η_2, …, η_q are pairwise uncorrelated with the exception of the pairs (ξ_i, η_i), i = 1, 2, …, k ≤ min{p, q}, for which we have cor(ξ_i, η_i) = ρ ≠ 0, where cor(·, ·) denotes the Pearson correlation coefficient. Then, r = k and ρ_i = |ρ| for all i = 1, 2, …, r. Note, if p = q = k, then for the previous conditions to hold, it is sufficient that the two-dimensional random vectors (ξ_i, η_i) are i.i.d. However, the identical distribution of the (ξ_i, η_i)'s is not necessary. In Laneman [15], the distribution of the information density for an additive white Gaussian noise channel with i.i.d. Gaussian input is determined. This is a special case of the setting with i.i.d. random vectors (ξ_i, η_i) just mentioned. In Wu and Jindal [16] and in Buckingham and Valenti [17], an approximation of the information density by a Gaussian random variable is considered for the setting in [15]. A special case very similar to that in [15] is also considered in Polyanskiy et al. [6], Sec. III.J. To the best of the authors' knowledge, explicit formulas for the general case as considered in this paper are not yet available in the literature.
(ii) Assume that the conditions of part (i) are satisfied. Furthermore, assume that Â is a real nonsingular matrix of dimension p × p and B̂ is a real nonsingular matrix of dimension q × q. Then, the random vectors ξ̂ = Âξ and η̂ = B̂η have the same canonical correlations as the random vectors ξ and η, i.e., ρ_i = |ρ| for all i = 1, 2, …, k ≤ min{p, q}.

More on Special Cases with Simplified Formulas
Let us further evaluate the formulas given in Corollary 1 and Theorem 3 for some relevant parameter values.
(i) Single canonical correlation coefficient. In the simplest case, there is only a single non-zero canonical correlation coefficient, i.e., r = 1. (Recall, at the beginning of the paper, we have excluded the degenerate case when all canonical correlations are zero.) Then, the formulas of the PDF and the m-th central moment in Corollary 1 simplify to the form

f_{i(ξ;η)}(x) = K_0(|x − I(ξ; η)|/ρ_1) / (π ρ_1)   (10)

and

E([i(ξ; η) − I(ξ; η)]^m) = m! (m choose m/2) (ρ_1/2)^m

for all even m ∈ N, with vanishing central moments of odd order. A formula equivalent to (10) is also provided by Pinsker [8], Lemma 9.6.1, who considered the special case p = q = 1, which implies r = 1.
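The closed-form moments for r = 1 can be cross-checked directly against the representation (1): writing the shifted information density as ρ_1(X² − Y²)/2 with independent standard Gaussian X and Y, its m-th moment follows from the binomial theorem and the raw moments E(X^{2j}) = (2j − 1)!! of the standard normal. A minimal, self-contained sketch (function names are ours):

```python
from math import comb, factorial

def chi2_moment(j):
    """j-th raw moment of a chi-squared variable with one degree of freedom,
    i.e., E[X^(2j)] = (2j - 1)!! for X standard normal."""
    m = 1
    for i in range(1, 2 * j, 2):
        m *= i
    return m

def central_moment_direct(rho, m):
    """E[(rho/2)^m * (A - B)^m] for independent A, B ~ chi^2_1, expanded with
    the binomial theorem (this is zero for odd m by symmetry)."""
    s = sum((-1)**j * comb(m, j) * chi2_moment(m - j) * chi2_moment(j)
            for j in range(m + 1))
    return (rho / 2)**m * s

def central_moment_formula(rho, m):
    """Closed form m! * C(m, m/2) * (rho/2)^m for even m, zero for odd m."""
    return 0 if m % 2 else factorial(m) * comb(m, m // 2) * (rho / 2)**m

rho = 0.7
for m in range(1, 9):
    assert abs(central_moment_direct(rho, m) - central_moment_formula(rho, m)) < 1e-12
```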
(ii) Second and fourth central moment. To demonstrate how the general formula given in Theorem 3 is used, we first consider m = 2. In this case, the summation indices m_1, m_2, …, m_r have to satisfy m_i = 1 for a single i ∈ {1, 2, …, r}, whereas the remaining m_i's have to be zero. Thus, (5) evaluates for m = 2 to

E([i(ξ; η) − I(ξ; η)]²) = ∑_{i=1}^r ρ_i².   (11)

As a slightly more complex example, let m = 4. In this case, either we have m_i = 2 for a single i ∈ {1, 2, …, r}, whereas the remaining m_i's are zero, or we have m_{i_1} = m_{i_2} = 1 for two indices i_1 ≠ i_2 ∈ {1, 2, …, r}, whereas the remaining m_i's have to be zero. Thus, (5) evaluates for m = 4 to

E([i(ξ; η) − I(ξ; η)]⁴) = 9 ∑_{i=1}^r ρ_i⁴ + 6 ∑_{i_1 < i_2} ρ_{i_1}² ρ_{i_2}².

(iii) Even number of equal canonical correlations. As in Corollary 1, assume that all canonical correlations are equal and additionally assume that the number r of canonical correlations is even, i.e., r = 2r̃ for some r̃ ∈ N. Then, we can use [9], Secs. 10.47.9, 10.49.1, and 10.49.12 to obtain the following relation for the modified Bessel function K_α(·) of the second kind and order α = r̃ − 1/2:

K_{r̃−1/2}(y) = √(π/(2y)) e^{−y} ∑_{j=0}^{r̃−1} (r̃ − 1 + j)! / (j! (r̃ − 1 − j)! (2y)^j).   (12)

Plugging (12) into (6) and rearranging terms yields the following expression for the PDF of the information density:

f_{i(ξ;η)}(x) = (e^{−|x − I(ξ;η)|/ρ_r} / (2^{r̃} ρ_r (r̃ − 1)!)) ∑_{j=0}^{r̃−1} ((r̃ − 1 + j)! / (j! (r̃ − 1 − j)! 2^j)) (|x − I(ξ; η)|/ρ_r)^{r̃−1−j}.

By integration of this expression, the function V(·) in (8) is obtained in closed form as well. Note that these special formulas can also be obtained directly from the results given in [14], Sec. 4.A.3.
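The two evaluations above can be cross-checked against the classical fourth-moment decomposition for sums of independent centered random variables, E(S⁴) = ∑_i μ_{4,i} + 3 ∑_{i≠j} σ_i² σ_j², using the per-component values μ_{4,i} = 9ρ_i⁴ and σ_i² = ρ_i². A small sketch (ours, not from [26]):

```python
def mu2(rho):
    """Second central moment (variance) of the information density: sum_i rho_i^2."""
    return sum(x**2 for x in rho)

def mu4_theorem(rho):
    """Fourth central moment as evaluated from Theorem 3:
    9 * sum_i rho_i^4 + 6 * sum_{i1 < i2} rho_i1^2 * rho_i2^2."""
    r = len(rho)
    return (9 * sum(rho[i]**4 for i in range(r))
            + 6 * sum(rho[i]**2 * rho[j]**2
                      for i in range(r) for j in range(i + 1, r)))

def mu4_independent_sum(rho):
    """Same moment via the decomposition for sums of independent centered
    variables: sum_i mu4_i + 3 * sum_{i != j} var_i * var_j."""
    r = len(rho)
    return (sum(9 * rho[i]**4 for i in range(r))
            + 3 * sum(rho[i]**2 * rho[j]**2
                      for i in range(r) for j in range(r) if i != j))

rho = [0.9, 0.5, 0.3]
assert abs(mu4_theorem(rho) - mu4_independent_sum(rho)) < 1e-12
```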
To illustrate the principal behavior of the PDF and CDF of the information density for equal canonical correlations, it is instructive to consider the specific value r = 2 in the above formulas, which yields the Laplace-type expressions

f_{i(ξ;η)}(x) = e^{−|x − I(ξ;η)|/ρ_r} / (2ρ_r),   V(z) = (1 − e^{−z/ρ_r})/2,

and r = 4, for which we obtain

f_{i(ξ;η)}(x) = (1 + |x − I(ξ;η)|/ρ_r) e^{−|x − I(ξ;η)|/ρ_r} / (4ρ_r),   V(z) = 1/2 − (2 + z/ρ_r) e^{−z/ρ_r}/4.
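For r = 2, the chi-squared representation above makes the Laplace shape of the PDF easy to check by simulation; the snippet below (an illustrative sketch of ours) verifies that the sampled shifted information density has mean absolute deviation ρ_r and median 0, as a Laplace(0, ρ_r) distribution requires:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, r, n = 0.6, 2, 400_000

# Sample the shifted information density via (rho/2) * (zeta1 - zeta2),
# where zeta1, zeta2 are i.i.d. chi-squared with r = 2 degrees of freedom.
z1 = rng.chisquare(r, n)
z2 = rng.chisquare(r, n)
nu = 0.5 * rho * (z1 - z2)

# For a Laplace(0, rho) distribution: E|nu| = rho and the median is 0.
```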

Mutual Information and Information Density in Terms of Canonical Correlations
First introduced by Hotelling [18], the canonical correlation analysis is a widely used linear method in multivariate statistics to determine the maximum correlations between two sets of random variables. It allows a particularly simple and useful representation of the mutual information and the information density of Gaussian random vectors in terms of the so-called canonical correlations. This representation was first obtained by Gelfand and Yaglom [19] and further extended by Pinsker [8], Ch. 9. For the convenience of the reader, we summarize in this section the essence of the canonical correlation analysis and demonstrate how it is applied to derive the representations in (1) and (2).
The formulation of the canonical correlation analysis given below is particularly suitable for implementations. The corresponding results are given without proof. Details and thorough discussions can be found, e.g., in Härdle and Simar [20], Koch [21], or Timm [22].
Define the random vectors ξ̂ = Aξ and η̂ = Bη, where the nonsingular matrices A and B are given by A = U^T R_ξ^{−1/2} and B = V^T R_η^{−1/2}. Here, U and V are the orthogonal matrices of a singular value decomposition U Σ V^T = R_ξ^{−1/2} R_ξη R_η^{−1/2}, where the diagonal matrix Σ contains the canonical correlations ρ_1, ρ_2, …, ρ_r as its nonzero diagonal entries. Then, the random variables ξ̂_1, ξ̂_2, …, ξ̂_p, η̂_1, η̂_2, …, η̂_q have unit variance and they are pairwise uncorrelated with the exception of the pairs (ξ̂_i, η̂_i), i = 1, 2, …, r, for which we have cor(ξ̂_i, η̂_i) = ρ_i. Using these results, we obtain for the mutual information and the information density

I(ξ; η) = I(ξ̂; η̂) = ∑_{i=1}^r I(ξ̂_i; η̂_i)   (13)

and

i(ξ; η) = i(ξ̂; η̂) = ∑_{i=1}^r i(ξ̂_i; η̂_i).   (14)

The first equality in (13) and (14) holds because A and B are nonsingular matrices, which follows, e.g., from Pinsker [8], Th. 3.7.1. Since we consider the case where ξ and η are jointly Gaussian, ξ̂ and η̂ are jointly Gaussian as well. Therefore, the correlation properties of ξ̂ and η̂ imply that all random variables ξ̂_i, η̂_j are independent except for the pairs (ξ̂_i, η̂_i), i = 1, 2, …, r. This implies the last equality in (13) and (14), where i(ξ̂_1; η̂_1), i(ξ̂_2; η̂_2), …, i(ξ̂_r; η̂_r) are independent. The sum representations follow from the chain rules of mutual information and information density and the equivalence between independence and vanishing mutual information and information density. Since ξ̂_i and η̂_i are jointly Gaussian with correlation cor(ξ̂_i, η̂_i) = ρ_i, we obtain from (13) and the formula of mutual information for the bivariate Gaussian case the identity (2). Additionally, with ξ̂_i and η̂_i having zero mean and unit variance, the information density i(ξ̂_i; η̂_i) is further given by

i(ξ̂_i; η̂_i) = −(1/2) log(1 − ρ_i²) + ρ_i (2 ξ̂_i η̂_i − ρ_i (ξ̂_i² + η̂_i²)) / (2 (1 − ρ_i²)).   (15)

Now assume ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are i.i.d. Gaussian random variables with zero mean and unit variance. Then, the distribution of the random vector

( √((1 + ρ_i)/2) ξ̃_i + √((1 − ρ_i)/2) η̃_i , √((1 + ρ_i)/2) ξ̃_i − √((1 − ρ_i)/2) η̃_i )

coincides with the distribution of the random vector (ξ̂_i, η̂_i) for all i = 1, 2, …, r. Plugging this into (15), we obtain together with (14) that the distribution of the information density i(ξ; η) coincides with the distribution of (1).
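Since the formulation above is intended for implementations, a compact numerical sketch may help; the following code (ours, independent of [26]) computes the canonical correlations as the singular values of R_ξ^{−1/2} R_ξη R_η^{−1/2} and checks the invariance under nonsingular linear transforms discussed in Section 2:

```python
import numpy as np

def inv_sqrt(M):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(w**-0.5) @ Q.T

def canonical_correlations(R_xi, R_eta, R_cross):
    """Singular values of R_xi^{-1/2} R_cross R_eta^{-1/2} (descending order)."""
    return np.linalg.svd(inv_sqrt(R_xi) @ R_cross @ inv_sqrt(R_eta),
                         compute_uv=False)

# Section 2, case (i): unit variances, correlation rho on matched pairs only;
# all canonical correlations then equal |rho|.
p = q = 3
rho = -0.6
R_xi, R_eta = np.eye(p), np.eye(q)
R_cross = rho * np.eye(p, q)
cc = canonical_correlations(R_xi, R_eta, R_cross)   # all entries 0.6

# Section 2, case (ii): nonsingular transforms leave them unchanged.
rng = np.random.default_rng(2)
A = rng.standard_normal((p, p)) + 3 * np.eye(p)     # nonsingular for this seed
B = rng.standard_normal((q, q)) + 3 * np.eye(q)
cc_t = canonical_correlations(A @ R_xi @ A.T, B @ R_eta @ B.T, A @ R_cross @ B.T)
```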

Auxiliary Results
To prove Theorem 1, the following lemma regarding the characteristic function of the information density is utilized. The results of the lemma are also used in Ibragimov and Rozanov [23] but without proof. Therefore, the proof is given below for completeness.

Lemma 1 (Characteristic function of (shifted) information density). The characteristic function of the shifted information density i(ξ; η) − I(ξ; η) is equal to the characteristic function of the random variable

ν̃ = (1/2) ∑_{i=1}^r ρ_i (ξ̃_i² − η̃_i²),   (16)

where ξ̃_1, ξ̃_2, …, ξ̃_r, η̃_1, η̃_2, …, η̃_r are i.i.d. Gaussian random variables with zero mean and unit variance, and ρ_1, ρ_2, …, ρ_r are the canonical correlations of ξ and η. The characteristic function of ν̃ is given by

φ_ν̃(t) = ∏_{i=1}^r (1 + ρ_i² t²)^{−1/2}, t ∈ R.   (17)

Proof. Due to (1), the distribution of the shifted information density i(ξ; η) − I(ξ; η) coincides with the distribution of the random variable ν̃ in (16) such that the characteristic functions of i(ξ; η) − I(ξ; η) and ν̃ are equal. It is a well-known fact that ξ̃_i² and η̃_i² in (16) are chi-squared distributed random variables with one degree of freedom, from which we obtain that the weighted random variables ρ_i ξ̃_i²/2 and ρ_i η̃_i²/2 are gamma distributed with shape parameter 1/2 and scale parameter ρ_i. The characteristic function of these random variables therefore admits the form

t ↦ (1 − i ρ_i t)^{−1/2}.

Further, from the identity φ_{−X}(t) = φ_X(−t) for the characteristic function and from the independence of ξ̃_i and η̃_i, we obtain the characteristic function of ν̃_i = ρ_i (ξ̃_i² − η̃_i²)/2 to be given by

φ_ν̃_i(t) = (1 − i ρ_i t)^{−1/2} (1 + i ρ_i t)^{−1/2} = (1 + ρ_i² t²)^{−1/2}.

Finally, because ν̃ in (16) is given by the sum of the independent random variables ν̃_i, the characteristic function of ν̃ results from multiplying the individual characteristic functions of the random variables ν̃_i. By doing so, we obtain (17).
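The factorization of the characteristic function used in the proof can be verified numerically; the following standalone snippet (ours) checks that the product of the two conjugate gamma characteristic functions collapses to (1 + ρ_i² t²)^{−1/2} on a grid of t values:

```python
import numpy as np

# Characteristic function of nu_i = rho * (X^2 - Y^2) / 2 as the product of
# the two gamma characteristic functions, against the real closed form.
rho = 0.8
t = np.linspace(-20.0, 20.0, 2001)
phi_product = (1 - 1j * rho * t)**-0.5 * (1 + 1j * rho * t)**-0.5
phi_closed = (1 + rho**2 * t**2)**-0.5
# The product is real (conjugate factors) and equals (1 + rho^2 t^2)^(-1/2).
```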
As a further auxiliary result, the subsequent proposition, which provides properties of the modified Bessel function K_α of the second kind and order α, will be used to prove the main results.
Proposition 1 (Properties related to the function K_α). For all α ∈ R, the function

y ↦ y^α K_α(y), y ∈ (0, ∞),

where K_α(·) denotes the modified Bessel function of the second kind and order α [9], Sec. 10.25(ii), is strictly positive and strictly monotonically decreasing. Furthermore, if α > 0, then we have

lim_{y→0+} y^α K_α(y) = 2^{α−1} Γ(α).

Proof. If α ∈ R is fixed, then K_α(y) is strictly positive and strictly monotonically decreasing w. r. t. y ∈ (0, ∞) due to [9], Secs. 10.27.3 and 10.37. Furthermore, we obtain

(d/dy) (y^α K_α(y)) = −y^α K_{α−1}(y) < 0

by applying the rules to calculate derivatives of Bessel functions given in [9], Sec. 10.29(ii). It follows that y^α K_α(y) is strictly positive and strictly monotonically decreasing w. r. t. y ∈ (0, ∞). The stated limit for α > 0 follows from the limiting form K_α(y) ∼ (Γ(α)/2)(2/y)^α as y → 0+ given in [9], Sec. 10.30.2. For later use, we also note Basset's integral formula [9], Sec. 10.32.11, which for y > 0, z > 0, and α > −1/2 reads

K_α(yz) = (Γ(α + 1/2)(2z)^α/(√π y^α)) ∫_0^∞ cos(yt)/(t² + z²)^{α+1/2} dt.   (19)

Proof of Theorem 1
To prove Theorem 1, we calculate the PDF f_ν̃ of the random variable ν̃ introduced in Lemma 1 by inverting the characteristic function φ_ν̃ given in (17) via the integral

f_ν̃(x) = (1/(2π)) ∫_{−∞}^{∞} e^{−itx} φ_ν̃(t) dt.   (21)

Shifting the PDF of ν̃ by I(ξ; η), we obtain the PDF f_{i(ξ;η)}(x) = f_ν̃(x − I(ξ; η)), x ∈ R, of the information density i(ξ; η). The method used subsequently is based on the work of Mathai [10]. To invert the characteristic function φ_ν̃, we expand the factors in (17) as in (22) and (23). In (23), we have used the binomial series

(1 + y)^a = ∑_{k=0}^∞ (a choose k) y^k,   (24)

where a ∈ R. The series is absolutely convergent for |y| < 1, and

(a choose k) = a(a − 1)⋯(a − k + 1)/k!   (25)

denotes the generalized binomial coefficient with (a choose 0) = 1. Since (26) holds for all t ∈ R, the series in (23) is absolutely convergent for all t ∈ R. Using the expansion in (23) and the absolute convergence together with the identity (27), we can rewrite the characteristic function φ_ν̃ as in (28). To obtain the PDF f_ν̃, we evaluate the inversion integral (21) based on the series representation in (28). Since every series in (28) is absolutely convergent, we can exchange summation and integration. Let β = r/2 + k_1 + k_2 + ⋯ + k_{r−1}. Then, by symmetry, we have for the integral of a summand the representation (29), where the second equality is a result of the substitution t = u/ρ_r. By setting z = 1, α = β − 1/2 ≥ 0, and y = v/ρ_r in the Basset integral formula given in (19) in the proof of Proposition 1 and using the symmetry with respect to v, we can evaluate (29) to the form (30). Combining (21), (28), and (30) yields (31). Slightly rearranging terms and shifting f_ν̃(·) by I(ξ; η) yields (3). It remains to show that f_{i(ξ;η)}(x) is also well defined for x = I(ξ; η) if r ≥ 2. Indeed, if r ≥ 2, then we can use Proposition 1 to evaluate lim_{x→I(ξ;η)} f_{i(ξ;η)}(x) term by term, where we used the exchangeability of the limit and the summation due to the absolute convergence of the series. Since Γ(α)/Γ(α + 1/2) is decreasing w. r. t. α ≥ 1/2, the resulting coefficients are uniformly bounded. Then, with (69) in the proof of Theorem 4, it follows that lim_{x→I(ξ;η)} f_{i(ξ;η)}(x) exists and is finite.

Proof of Theorem 3
Using the random variable ν̃ introduced in Lemma 1 and the well-known multinomial theorem [9], Sec. 26.4.9

(x_1 + x_2 + ⋯ + x_r)^m = ∑_{(ℓ_1, …, ℓ_r) ∈ K_{m,r}} (m!/(ℓ_1! ℓ_2! ⋯ ℓ_r!)) ∏_{i=1}^r x_i^{ℓ_i},

where K_{m,r} = {(ℓ_1, ℓ_2, …, ℓ_r) ∈ N_0^r : ℓ_1 + ℓ_2 + ⋯ + ℓ_r = m}, we can write the m-th central moment of the information density i(ξ; η) as

E([i(ξ; η) − I(ξ; η)]^m) = E(ν̃^m) = ∑_{(ℓ_1, …, ℓ_r) ∈ K_{m,r}} (m!/(ℓ_1! ⋯ ℓ_r!)) ∏_{i=1}^r E(ν̃_i^{ℓ_i}).   (34)

To obtain the second equality in (34), we have exchanged expectation and summation and additionally used the identity E(∏_{i=1}^r ν̃_i^{ℓ_i}) = ∏_{i=1}^r E(ν̃_i^{ℓ_i}), which holds due to the independence of the random variables ν̃_1, ν̃_2, …, ν̃_r.
Based on the relation between the ℓ-th central moment of a random variable and the ℓ-th derivative of its characteristic function at 0, we further have

E(ν̃_i^ℓ) = i^{−ℓ} φ_ν̃_i^{(ℓ)}(0),

where φ_ν̃_i(t) = (1 + ρ_i² t²)^{−1/2}, t ∈ R, is the characteristic function of the random variable ν̃_i derived in the proof of Lemma 1. As in the proof of Theorem 1, consider now the binomial series expansion using (24)

(1 + ρ_i² t²)^{−1/2} = ∑_{k=0}^∞ (−1/2 choose k) ρ_i^{2k} t^{2k}.

The series is absolutely convergent for all |t| < ρ_i^{−1}. Furthermore, consider the Taylor series expansion of the characteristic function φ_ν̃_i at the point 0. Both series expansions must be identical in an open interval around 0 such that we obtain by comparing the series coefficients

E(ν̃_i^{2m_i}) = (2m_i)! (2m_i choose m_i) (ρ_i/2)^{2m_i} and E(ν̃_i^{2m_i − 1}) = 0

for all m_i ∈ N, where we have additionally used the identity (27). Plugging these moments into (34) and keeping only the nonvanishing products yields (5).

Proof of Part (iii) of Corollary 1
Using the random variable ν̃ as in the proof of Theorem 3, we can write the m-th central moment of the information density i(ξ; η) as

E([i(ξ; η) − I(ξ; η)]^m) = E(ν̃^m) = i^{−m} φ_ν̃^{(m)}(0),

where the characteristic function φ_ν̃ of ν̃ is given by φ_ν̃(t) = (1 + ρ_r² t²)^{−r/2}, t ∈ R, due to Lemma 1 and the equality of all canonical correlations. Using the binomial series and the Taylor series expansion as in the proof of Theorem 3, we obtain

E(ν̃^m) = m! (−1)^{m/2} (−r/2 choose m/2) ρ_r^m

for all even m ∈ N and E(ν̃^m) = 0 for all odd m ∈ N. Collecting terms and additionally using the definition of the generalized binomial coefficient given in (25) in the proof of Theorem 1 yields (9).

Recurrence Formulas and Finite Sum Approximations
If there are at least two distinct canonical correlations, then the PDF f_{i(ξ;η)} and CDF F_{i(ξ;η)} of the information density i(ξ; η) are given by the infinite series in Theorems 1 and 2. If we consider only a finite number of summands in these representations, then we obtain approximations amenable in particular to numerical calculations. However, a direct finite sum approximation of the series in (3) and (4) is rather inefficient since modified Bessel and Struve L functions have to be evaluated for every summand. Therefore, we derive in this section recursive representations, which allow efficient numerical calculations. Furthermore, we derive uniform bounds of the approximation error. Based on the recurrence relations and the error bounds, an implementation in the programming language Python has been developed, which provides an efficient tool to numerically calculate the PDF and CDF of the information density with a predefined accuracy as high as desired. The developed source code as well as illustrating examples are made publicly available in an open access repository on GitLab [26].
Subsequently, we adopt all the previous notation and assume r ≥ 2 and at least two distinct canonical correlations (since otherwise we have the case of Corollary 1, where the series reduce to a single summand).

Recurrence Formulas
The recursive approach developed below is based on the work of Moschopoulos [27], which extended the work of Mathai [10]. First, we rewrite the series representations of the PDF and CDF of the information density given in Theorem 1 and Theorem 2 in a form which is suitable for recursive calculations. To begin with, we define two functions appearing in the series representations (3) and (4), which involve the modified Bessel function K_α of the second kind and order α and the modified Struve function L_α of order α. Let us define for all k ∈ N_0 the functions U_k and D_k by (37) and (38), respectively. Furthermore, we define for all k ∈ N_0 the coefficient δ_k by (39), where K_{k,r−1} = {(k_1, k_2, …, k_{r−1}) ∈ N_0^{r−1} : k_1 + k_2 + ⋯ + k_{r−1} = k}. With these definitions, we obtain the following alternative series representations of (3) and (4) by observing that the multiple summations over the indices k_1, k_2, …, k_{r−1} can be shortened to one summation over the index k = k_1 + k_2 + ⋯ + k_{r−1}.
Proposition 2 (Alternative representation of PDF and CDF of the information density). The PDF f_{i(ξ;η)} of the information density i(ξ; η) given in Theorem 1 has the alternative series representation (40). The function V(·) specifying the CDF F_{i(ξ;η)} of the information density i(ξ; η) as given in Theorem 2 has the alternative series representation (41).

Based on the representations in Proposition 2 and with recursive formulas for U_k(·), D_k(·), and δ_k, we are in the position to calculate the PDF and CDF of the information density by a single summation over completely recursively defined terms. In the following, we derive recurrence relations for U_k(·), D_k(·), and δ_k, which allow the desired efficient calculations.
Lemma 2 (Recurrence formula of the function U_k). If for all k ∈ N_0 the function U_k is defined by (37), then U_k(z) satisfies for all k ≥ 2 and z ≥ 0 the recurrence formula (42).

Proof. First, assume z = 0. Based on Proposition 1, we obtain (43) for all k ∈ N_0 such that U_k(0) is well defined and finite. Using the recurrence relation Γ(y + 1) = yΓ(y) for the Gamma function [24], Sec. 8.331.1, we see together with (43) that the recurrence formula (42) holds for U_k(0) and k ≥ 2. Now, assume z > 0 and consider the recurrence formula

K_α(z) = K_{α−2}(z) + (2(α − 1)/z) K_{α−1}(z)   (44)

for the modified Bessel function of the second kind and order α [24], Sec. 8.486.10. Plugging (44) into (37) for α = (r − 1)/2 + k yields for k ≥ 2 the representation (45). Using again the relation Γ(y + 1) = yΓ(y), we obtain together with (45) and (37) the recurrence formula (42) for U_k(z) if z > 0 and k ≥ 2. Together with (38), the identity Γ(r/2 + k) = (r/2 + k − 1) Γ(r/2 + k − 1), and the definition of the function U_k(·) in (37), we obtain the recurrence formula (46) for D_k(z) if z > 0 and k ≥ 1.
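The three-term Bessel recurrence driving Lemma 2 can be validated numerically from the integral representation K_α(z) = ∫_0^∞ e^{−z cosh t} cosh(αt) dt ([9], Sec. 10.32.9). The snippet below (our own illustration) uses plain trapezoidal quadrature instead of a library Bessel routine so that it stays self-contained:

```python
import numpy as np

def bessel_k(alpha, z, t_max=30.0, n=60_001):
    """Modified Bessel function of the second kind via the integral
    representation K_alpha(z) = int_0^inf exp(-z*cosh(t)) * cosh(alpha*t) dt,
    evaluated with trapezoidal quadrature (adequate for moderate alpha, z)."""
    t = np.linspace(0.0, t_max, n)
    f = np.exp(-z * np.cosh(t)) * np.cosh(alpha * t)
    h = t[1] - t[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

# Three-term recurrence used in the proof of Lemma 2:
# K_alpha(z) = K_{alpha-2}(z) + 2*(alpha-1)/z * K_{alpha-1}(z)
alpha, z = 3.5, 1.25
lhs = bessel_k(alpha, z)
rhs = bessel_k(alpha - 2, z) + 2 * (alpha - 1) / z * bessel_k(alpha - 1, z)
```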
Lemma 4 (Recursive formula of the coefficient δ_k). The coefficient δ_k defined by (39) satisfies for all k ∈ N_0 the recurrence formula

δ_{k+1} = (1/(k + 1)) ∑_{j=0}^{k} (j + 1) γ_{j+1} δ_{k−j},   (49)

where δ_0 = 1 and

γ_j = (1/(2j)) ∑_{i=1}^r (1 − ρ_r²/ρ_i²)^j, j ∈ N.   (50)

For the derivation of Lemma 4, we use an adapted version of the method of Moschopoulos [27] and the following auxiliary result.

Lemma 5.
For k ∈ N_0, let g be a real univariate (k + 1)-times differentiable function. Then, we have the following recurrence relation for the (k + 1)-th derivative of the composite function h = exp(g):

h^{(k+1)} = ∑_{j=0}^{k} (k choose j) g^{(j+1)} h^{(k−j)},   (51)

where f^{(i)} denotes the i-th derivative of a function f, with f^{(0)} = f.

Proof.
We prove the assertion of Lemma 5 by induction over k. First, consider the base case k = 0. In this case, formula (51) gives h' = g'h, which is easily seen to be true by the chain rule. Assuming formula (51) holds for some k ∈ N_0, we continue with the case k + 1. Application of the product rule leads to

h^{(k+2)} = ∑_{j=0}^{k} (k choose j) (g^{(j+2)} h^{(k−j)} + g^{(j+1)} h^{(k+1−j)}).

Substitution of j' = j + 1 in the first term gives

h^{(k+2)} = ∑_{j'=1}^{k+1} (k choose j'−1) g^{(j'+1)} h^{(k+1−j')} + ∑_{j=0}^{k} (k choose j) g^{(j+1)} h^{(k+1−j)}.

With this representation and the identity (k choose j−1) + (k choose j) = (k+1 choose j), we obtain

h^{(k+2)} = ∑_{j=0}^{k+1} (k+1 choose j) g^{(j+1)} h^{(k+1−j)}.

This completes the proof of Lemma 5.

Proof of Lemma 4.
To prove the recurrence formula (49), we consider the characteristic function of the random variable ν̃ introduced in Lemma 1. On the one hand, the series representation of φ_ν̃ given in (28) in the proof of Theorem 1 can be rewritten as follows using the coefficient δ_k defined in (39):

φ_ν̃(t) = (∏_{i=1}^r ρ_r/ρ_i) (1 + ρ_r² t²)^{−r/2} ∑_{k=0}^∞ δ_k (1 + ρ_r² t²)^{−k}.   (53)

On the other hand, recall the expansion of (1 + ρ_i² t²)^{−1/2} given in (22), which yields together with (52) and the application of the natural logarithm the identity

log(φ_ν̃(t)) = log((∏_{i=1}^r ρ_r/ρ_i)(1 + ρ_r² t²)^{−r/2}) − (1/2) ∑_{i=1}^r log(1 − (1 − ρ_r²/ρ_i²)(1 + ρ_r² t²)^{−1}).   (54)

Now consider the power series

−log(1 − y) = ∑_{j=1}^∞ y^j/j,   (55)

which is absolutely convergent for |y| < 1. With the same arguments as in the proof of Theorem 1, in particular due to (26), we can apply the series expansion (55) to the second term on the right-hand side of (54) to obtain the absolutely convergent series representation

log(φ_ν̃(t)) = log((∏_{i=1}^r ρ_r/ρ_i)(1 + ρ_r² t²)^{−r/2}) + ∑_{j=1}^∞ γ_j (1 + ρ_r² t²)^{−j},   (56)

where we have further used the definition of γ_j given in (50). Applying the exponential function to both sides of (56) then yields the following expression for the characteristic function φ_ν̃:

φ_ν̃(t) = (∏_{i=1}^r ρ_r/ρ_i) (1 + ρ_r² t²)^{−r/2} exp(∑_{j=1}^∞ γ_j (1 + ρ_r² t²)^{−j}).   (57)
Comparing (53) and (57) yields the identity

∑_{k=0}^∞ δ_k x^k = exp(∑_{j=1}^∞ γ_j x^j)   (58)

in the variable x = (1 + ρ_r² t²)^{−1}. We now take the (k + 1)-th derivative w. r. t. x on both sides of (58), using the identity for the m-th derivative of a power series ∑_{ℓ=0}^∞ a_ℓ x^ℓ. For the left-hand side of (58), we obtain

∑_{ℓ=0}^∞ ((ℓ + k + 1)!/ℓ!) δ_{ℓ+k+1} x^ℓ,

and for the right-hand side of (58), we obtain the corresponding expression by means of Lemma 5 together with the identities (58) and (59). Evaluating both sides at x = 0, where g^{(j+1)}(0) = (j + 1)! γ_{j+1} and h^{(k−j)}(0) = (k − j)! δ_{k−j} for h = exp(g) and g(x) = ∑_{j=1}^∞ γ_j x^j, we obtain

(k + 1)! δ_{k+1} = ∑_{j=0}^{k} (k choose j) (j + 1)! γ_{j+1} (k − j)! δ_{k−j}.

Comparing the coefficients for x^0 and collecting terms finally yields

δ_{k+1} = (1/(k + 1)) ∑_{j=0}^{k} (j + 1) γ_{j+1} δ_{k−j}.

This completes the proof of Lemma 4.

Finite Sum Approximations
The results in the previous Section 5.1 can be used in the following way for efficient numerical calculations. Consider for n ∈ N_0 the finite sum approximation f̂_{i(ξ;η)}(x, n) of the PDF given in (40), i.e., the series truncated after the first n + 1 summands. To calculate f̂_{i(ξ;η)}(x, n), first calculate U_0(|x − I(ξ; η)|/ρ_r) and U_1(|x − I(ξ; η)|/ρ_r) using (37). Then, use the recurrence formulas (42) and (49) to calculate the remaining summands in (62). The great advantage of this approach is that only two evaluations of the modified Bessel function are required; the rest of the calculation relies on efficient recursive formulas, making the numerical computations fast.
Similarly, consider for n ∈ N_0 the finite sum approximation F̂_{i(ξ;η)}(x, n) of the alternative representation of the CDF of the information density, where V̂(z, n) is the finite sum approximation of the function V(·) given in (41). To calculate F̂_{i(ξ;η)}(x, n), first calculate D_0(z), U_0(z/ρ_r), and U_1(z/ρ_r) for z = I(ξ; η) − x or z = x − I(ξ; η) using (37) and (38). Then, use the recurrence formulas (42), (46), and (49) to calculate the remaining summands in (64). This approach requires only three evaluations of the modified Bessel and Struve L functions, resulting in efficient numerical calculations also for the CDF of the information density.
The following theorem provides suitable bounds to evaluate and control the error related to the introduced finite sum approximations.
Theorem 4 (Bounds of the approximation error for the alternative representation of PDF and CDF). For the finite sum approximations in (62)-(64) of the alternative representation of the PDF and CDF of the information density as given in Proposition 2, we have for n ∈ N summands the error bounds and Proof. From the special case where all canonical correlations are equal, we can conclude from the CDF given in Corollary 1 that the function is monotonically increasing for all α = (j − 1)/2, j ∈ N and that further holds. Using (68), we obtain from (4) by exchanging the limit and the summation, which is justified by the monotone convergence theorem. Due to the properties of the CDF, we have lim z→∞ 2V(z) = 1, which implies where the first equality follows from the definition of the coefficient δ k in (39).

Remark 1.
Note that the bound in (65) can be further simplified using the inequality Γ(α)/Γ(α + 1/2) ≤ √π for α ≥ 1/2. Further note that the derived error bounds are uniform in the sense that they only depend on the parameters of the given Gaussian distribution and the number of summands considered. As can be seen from (69), the bounds converge to zero as the number n of summands increases.
Remark 2 (Relation to Bell polynomials). Interestingly, the coefficient δ_k can be expressed for all k ∈ N in the form

δ_k = B_k(1! γ_1, 2! γ_2, …, k! γ_k)/k!,

where γ_j is defined in (50), and B_k denotes the complete Bell polynomial of order k [28], Sec. 3.3. Even though this is an interesting connection to the Bell polynomials, which provides an explicit formula for δ_k, the recursive formula given in Lemma 4 is more efficient for numerical calculations.
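The Bell-polynomial identity of Remark 2 and the exponential-series recursion of Lemma 4 can be checked against each other for arbitrary coefficients, independently of the specific γ_j defined in (50). The following sketch (ours) computes the coefficients of exp(∑_j γ_j x^j) once via the first-order recursion and once via complete Bell polynomials:

```python
from math import comb, factorial, isclose

def delta_recursive(gamma, n):
    """Coefficients delta_k of exp(sum_j gamma[j-1] * x^j) up to order n via
    delta_{k+1} = 1/(k+1) * sum_{j=0}^{k} (j+1)*gamma_{j+1}*delta_{k-j}."""
    g = lambda j: gamma[j - 1] if j <= len(gamma) else 0.0
    d = [1.0]
    for k in range(n):
        d.append(sum((j + 1) * g(j + 1) * d[k - j] for j in range(k + 1)) / (k + 1))
    return d

def delta_bell(gamma, n):
    """Same coefficients via delta_k = B_k(1!*g1, ..., k!*gk) / k!, where the
    complete Bell polynomials obey B_{k+1} = sum_i C(k, i) * B_{k-i} * x_{i+1}."""
    x = lambda j: factorial(j) * (gamma[j - 1] if j <= len(gamma) else 0.0)
    B = [1.0]
    for k in range(n):
        B.append(sum(comb(k, i) * B[k - i] * x(i + 1) for i in range(k + 1)))
    return [B[k] / factorial(k) for k in range(n + 1)]

gamma = [0.4, -0.15, 0.02]
d1, d2 = delta_recursive(gamma, 8), delta_bell(gamma, 8)
assert all(isclose(a, b, rel_tol=1e-12, abs_tol=1e-12) for a, b in zip(d1, d2))
```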

Numerical Examples and Illustrations
We illustrate the results of this paper with some examples, which can all be verified with the Python implementation publicly available on GitLab [26].
These canonical correlations are related to the information density of a continuous-time additive white Gaussian noise channel confined to a finite time interval [0, T] with a Brownian motion as input signal (see, e.g., Huffmann [30], Sec. 8.1 for more details). Figures 9 and 10 show the approximated PDF f̂_{i(ξ;η)−I(ξ;η)}(·, n) and CDF F̂_{i(ξ;η)−I(ξ;η)}(·, n) for r ∈ {2, 5, 10, 15} and T = 1 using the finite sums (62) and (64). The bounds of the approximation error given in Theorem 4 are chosen smaller than 10^{−2} such that no differences are visible in the plotted curves when the approximation error is lowered further. The number n of summands required in (62) and (64) to achieve these error bounds for r ∈ {2, 5, 10, 15} is n ∈ {15, 141, 638, 1688} for the PDF and n ∈ {20, 196, 886, 2071} for the CDF. Choosing r larger than 15 for the canonical correlations (71) with T = 1 does not result in visible changes of the PDF and CDF compared to r = 15. This demonstrates, together with Figures 9 and 10, that a Gaussian approximation is not valid for this example, even if r → ∞. Indeed, from [8], Th. 9.6.1 and the comment above Eq. (9.6.45) in [8], one can conclude that, whenever the canonical correlations satisfy the condition stated there, the distribution of the information density is not Gaussian.

Summary of Contributions
We derived series representations of the PDF and CDF of the information density for arbitrary Gaussian random vectors as well as a general formula for the central moments using canonical correlation analysis. We provided simplified and closed-form expressions for important special cases, in particular when all canonical correlations are equal, and derived recurrence formulas and uniform error bounds for finite sum approximations of the general series representations. These approximations and recurrence formulas are suitable for efficient and arbitrarily accurate numerical calculations, where the approximation error can be easily controlled with the derived error bounds. Moreover, we provided examples showing the (in)validity of approximating the information density with a Gaussian random variable.

Data Availability Statement: An implementation in Python allowing efficient numerical calculations related to the main results of the paper is publicly available on GitLab: https://gitlab.com/infth/information-density (accessed on 24 June 2022).

Conflicts of Interest:
The authors declare no conflict of interest.