Expected Logarithm of Central Quadratic Form and Its Use in KL-Divergence of Some Distributions

Abstract: In this paper, we develop three different methods for computing the expected logarithm of central quadratic forms: a series method, an integral method and a fast (but inexact) set of methods. The approach used for deriving the integral method is novel and can be used for computing the expected logarithm of other random variables. Furthermore, we derive expressions for the Kullback–Leibler (KL) divergence of elliptical gamma distributions and angular central Gaussian distributions, which turn out to be functions dependent on the expected logarithm of a central quadratic form. Through several experimental studies, we compare the performance of these methods.


Introduction
The expected logarithm of a random variable often appears in the expressions of important quantities such as entropy and the Kullback-Leibler (KL) divergence [1][2][3]. Second-kind moments, which are built on the logarithm of the random variable, are an important statistical tool in estimation problems [4,5]. The expected logarithm also appears in an important class of inference algorithms called variational Bayesian inference [6,7]. Furthermore, the geometric mean of a random variable, which has been used in economics [8,9], is equal to the exponential of the expected logarithm of that random variable.
Central quadratic forms (CQFs) have many applications, and most of them stem from the fact that they are asymptotically equivalent to many statistics for testing null hypotheses. They are used for finding the number of components in mixtures of Gaussians [10], to test goodness-of-fit for some distributions [11] and as test statistics for dimensionality reduction in inverse regression [12].
In this paper, we develop three algorithms for computing the expected logarithm of CQFs. Special algorithms are needed because CQFs do not have a closed-form probability density function, which makes the computation of their expected logarithms difficult. Although there is a vast literature on many different ways of calculating probability distributions of CQFs (see [13][14][15][16]), we have not found any work on calculating their expected logarithms. It is worth noting that one of our three algorithms is based upon works for computing the probability density function of CQFs using a series of gamma random variables [13,14]. We also derive expressions for the KL-divergence of two distributions that are subclasses of generalized elliptical distributions. These are the zero-mean elliptical gamma (ZEG) distribution and the angular central Gaussian (ACG) distribution. The only term in their KL-divergences that cannot be computed in terms of elementary functions is an expected logarithm of a CQF, which can be computed by one of our developed algorithms.
The KL-divergence, or relative entropy, was first introduced in [17] as a generalization of Shannon's definition of information [18]. This divergence has been used extensively by statisticians and engineers. Many popular divergence classes like the f-divergence and the alpha-divergence have been introduced as generalizations of this divergence [19]. This divergence has several invariance properties, like scale invariance, that make it an interesting dissimilarity measure in statistical inference problems [20]. The KL-divergence is also used as a criterion for model selection [21], hypothesis testing [22], and merging in mixture models [23,24]. Additionally, it can be used as a measure of dissimilarity in classification problems, for example, text classification [25], speech recognition [26], and texture classification [27,28].
The wide applicability of the KL-divergence as a useful dissimilarity measure persuaded us to derive the KL-divergence for two important distributions. One of them is the ZEG distribution [29], which has rich modeling power and allows heavy and light tails and different peak behaviors [30,31]. The other is the ACG distribution [32], a distribution on the unit sphere that has been used in many applications [33][34][35][36]. This distribution has many nice features; for example, its maximum likelihood estimator is asymptotically the most robust estimator of the scatter matrix of an elliptical distribution in the sense of minimizing the maximum asymptotic variance [37].

Contributions
To summarize, the key contributions of our paper are the following:
- Introducing three methods for computing the expected logarithm of a CQF.
- Proposing a procedure for computing the expected logarithm of an arbitrary positive random variable.
- Deriving expressions for the entropy and the KL-divergence of ZEG and ACG distributions (the form of the KL-divergence between ZEG distributions appeared in [38] but without its derivation).
The methods for computing the expected logarithm of a CQF differ in running time and accuracy. Two of these, namely the integral and series methods, are exact. The third is a fast but inexact set of methods. The integral method is a direct application of our proposed procedure for computing the expected logarithm of positive random variables. We propose two fast methods that are based on approximating the CQF with a gamma random variable. We show that these fast methods give upper and lower bounds to the true expected logarithm. This leads us to develop another fast method based on a convex combination of the other two fast methods. Whenever the weights of the CQF are eigenvalues of a matrix, as in the case of KL-divergences, the fast methods can be very efficient because they do not need eigenvalue computation.

Outline
The remainder of this paper is organized as follows. Section 2 proposes three different methods for computing the expected logarithm of a CQF. A theorem that has a pivotal role in the first two methods is stated at the beginning of that section. Then, we derive expressions for the KL-divergence and entropy of ZEG and ACG distributions in Section 3. Afterwards, in Section 4, multiple experiments are conducted to examine the performance of the three methods for computing the aforementioned expected logarithm in terms of accuracy and computational time. Finally, Section 5 presents our conclusions. To improve the readability of the manuscript, the proofs of some theorems are presented in appendices.

Calculating the Expected Logarithm of a Central Quadratic Form
Suppose $N_i$ is the $i$-th random variable in a sequence of $d$ independent standard normal random variables, i.e., normal random variables with zero mean and unit variance. Then the central (Gaussian) quadratic form is the following random variable:

$$U = \sum_{i=1}^{d} \lambda_i N_i^2, \tag{1}$$

where the $\lambda_i$ are non-negative real numbers. Note that the $N_i^2$ are chi-square random variables with one degree of freedom; therefore, this random variable is also called a weighted sum of chi-square random variables. To the best of our knowledge, the expected logarithm of the random variable $U$ does not have a closed-form expression in terms of elementary mathematical functions. For its calculation, we propose three different approaches, namely an integral method, a series method and a set of fast methods. Each of them has its specific properties and does well in certain situations.
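Before turning to the three methods, a brute-force baseline can help to make the target quantity concrete. The following is a minimal sketch, not part of the paper: it estimates the expected logarithm of a CQF by plain Monte Carlo sampling. The function names, sample size and seed are illustrative assumptions. The closed form used for comparison holds only for d = 2 and follows from a classical trigonometric integral; it is an outside fact, not a result of this paper.

```python
import math
import random


def mc_expected_log_cqf(weights, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[log(sum_i w_i * N_i^2)], N_i i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # One draw of the quadratic form U = sum_i w_i * N_i^2.
        u = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        total += math.log(u)
    return total / n_samples


def exact_two_weights(a, b):
    """Closed form for d = 2 only (assumption: classical identity, not from the paper)."""
    euler_gamma = 0.5772156649015329
    return math.log(2.0) - euler_gamma + 2.0 * math.log((math.sqrt(a) + math.sqrt(b)) / 2.0)
```

The estimate converges slowly (the error shrinks like the inverse square root of the sample size), which is one motivation for the deterministic methods developed in this section.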
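In the following theorem, a relation between the expected logarithms of two positive random variables distributed according to arbitrary densities and the Laplace transforms of these two densities is given. This theorem is used in the integral method and the fast method. Note that the assumptions of the following theorem are unrestrictive. Therefore, it can also be used for computing the expected logarithm of other positive random variables.

Theorem 1. Let $X$ and $Y$ be two positive random variables, $F$ and $G$ be their cumulative distribution functions, and $f$ and $g$ be their probability density functions. Furthermore, suppose that $\mathcal{F}$ and $\mathcal{G}$ are the Laplace transforms of $f$ and $g$, respectively. If

$$\lim_{x\to\infty} \log(x)\,(G(x) - F(x)) = 0, \qquad \lim_{x\to 0^+} \log(x)\,(G(x) - F(x)) = 0,$$

and the integral $\int_0^\infty \frac{\mathcal{G}(\sigma) - \mathcal{F}(\sigma)}{\sigma}\,d\sigma$ is convergent, then

$$E[\log X] - E[\log Y] = \int_0^\infty \frac{\mathcal{G}(\sigma) - \mathcal{F}(\sigma)}{\sigma}\,d\sigma. \tag{2}$$

Proof. Using the definition of the Laplace transform, we have

$$\mathcal{G}(s) - \mathcal{F}(s) = \int_0^\infty e^{-sx}\,(g(x) - f(x))\,dx.$$

Using the integration property of the Laplace transform, we have

$$\frac{\mathcal{G}(s) - \mathcal{F}(s)}{s} = \int_0^\infty e^{-sx}\,(G(x) - F(x))\,dx.$$

Using the frequency integration property of the Laplace transform, we have

$$\int_s^\infty \frac{\mathcal{G}(\sigma) - \mathcal{F}(\sigma)}{\sigma}\,d\sigma = \int_0^\infty e^{-sx}\,\frac{G(x) - F(x)}{x}\,dx.$$

Letting $s$ in the above equation go to zero, we obtain

$$\int_0^\infty \frac{\mathcal{G}(\sigma) - \mathcal{F}(\sigma)}{\sigma}\,d\sigma = \int_0^\infty \frac{G(x) - F(x)}{x}\,dx. \tag{4}$$

Using the integration by parts formula with $\log(x)$ and $G(x) - F(x)$ as its parts, we have

$$\int_0^\infty \frac{G(x) - F(x)}{x}\,dx = \Big[\log(x)\,(G(x) - F(x))\Big]_0^\infty - \int_0^\infty \log(x)\,(g(x) - f(x))\,dx.$$

Since $\lim_{x\to\infty} \log(x)(G(x) - F(x)) = 0$ and $\lim_{x\to 0^+} \log(x)(G(x) - F(x)) = 0$, it follows that

$$\int_0^\infty \frac{G(x) - F(x)}{x}\,dx = \int_0^\infty \log(x)\,(f(x) - g(x))\,dx. \tag{5}$$

Hence, by using (5) in (4), we have

$$\int_0^\infty \frac{\mathcal{G}(\sigma) - \mathcal{F}(\sigma)}{\sigma}\,d\sigma = \int_0^\infty \log(x)\,f(x)\,dx - \int_0^\infty \log(x)\,g(x)\,dx.$$

From the definition of expectation, relation (2) is obtained.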

Integral Method
In this part, we use Theorem 1 for computing the expected logarithm of a CQF. To this end, we choose a random variable $Y$ that has a closed-form formula for its expected logarithm and for the Laplace transform of its density. A possible candidate is the gamma random variable. The density of a gamma random variable has the following Laplace transform:

$$\mathcal{G}(\sigma) = (1 + \theta\sigma)^{-k},$$

where $k$ and $\theta$ are its shape and scale parameters, respectively. Also, the expected logarithm of this random variable is $\Psi(k) + \log(\theta)$, where $\Psi(\cdot)$ is the digamma function.

Using the convolution property of the Laplace transform, it is easy to see that the density function of the CQF given in (1) has the following closed-form Laplace transform:

$$\mathcal{F}(\sigma) = \prod_{i=1}^{d} (1 + 2\lambda_i\sigma)^{-1/2}.$$

Lemmas 2 and 3 show that a CQF and a gamma random variable satisfy the conditions of Theorem 1. For proving Lemma 2, we need Lemma 1. First of all, let us state the following trivial proposition.

Proposition 1. Let $X_1, \ldots, X_n$ be arbitrary real random variables. Suppose we have two many-to-one transformations $Y = h(X_1, \ldots, X_n)$ and $Z = g(X_1, \ldots, X_n)$. If the following inequality holds for any $x_i$ in the support of the random variables $X_i$:

$$h(x_1, \ldots, x_n) \le g(x_1, \ldots, x_n),$$

then we have the following inequality between the cumulative distribution functions of the random variables $Y$ and $Z$:

$$F_Y(x) \ge F_Z(x).$$

Lemma 1. Let $F$ be the cumulative distribution function of a CQF, that is $\sum_{i=1}^{d} \lambda_i N_i^2$, where the $\lambda_i$ are positive real numbers and the $N_i$ are independent standard normal random variables. Also, let $G(x; k, \theta)$ be the cumulative distribution function of a gamma random variable with parameters $k$ and $\theta$. Then the following inequalities hold:

$$G(x; d/2, 2\lambda_{\max}) \le F(x) \le G(x; d/2, 2\lambda_{\min}),$$

where $\lambda_{\max} = \max\{\lambda_i\}_{i=1}^{d}$ and $\lambda_{\min} = \min\{\lambda_i\}_{i=1}^{d}$.

Proof. This lemma is an immediate consequence of Proposition 1 and the following relation, knowing that $\lambda \sum_{i=1}^{d} N_i^2$ is a gamma random variable with shape parameter $d/2$ and scale parameter $2\lambda$:

$$\lambda_{\min} \sum_{i=1}^{d} N_i^2 \le \sum_{i=1}^{d} \lambda_i N_i^2 \le \lambda_{\max} \sum_{i=1}^{d} N_i^2.$$

Lemma 2. Let $G$ be the cumulative distribution function of an arbitrary gamma random variable and $F$ be the cumulative distribution function of the random variable $\sum_{i=1}^{d} \lambda_i N_i^2$, where the $\lambda_i$ are positive real numbers and the $N_i$ are independent standard normal random variables. Then

$$\lim_{x\to\infty} \log(x)\,(G(x) - F(x)) = 0, \qquad \lim_{x\to 0^+} \log(x)\,(G(x) - F(x)) = 0.$$

The proof of this lemma can be found in Appendix A.

Lemma 3. Let $\mathcal{G}$ be the Laplace transform of the probability density function of an arbitrary gamma random variable and $\mathcal{F}$ be the Laplace transform of the probability density function of $\sum_{i=1}^{d} \lambda_i N_i^2$, where the $\lambda_i$ are positive real numbers and the $N_i$ are independent standard normal random variables. Then the integral $\int_0^\infty \frac{\mathcal{G}(\sigma) - \mathcal{F}(\sigma)}{\sigma}\,d\sigma$ is convergent.

The proof of this lemma can be found in Appendix B.

According to Lemmas 2 and 3, the conditions of Theorem 1 hold by choosing $X$ to be a CQF given in (1), and $Y$ to be an arbitrary gamma random variable. Therefore, we can use (2) for calculating the expected logarithm of a CQF, and it is given by

$$E[\log U] = \Psi(k) + \log(\theta) + \int_0^\infty \frac{(1 + \theta\sigma)^{-k} - \prod_{i=1}^{d} (1 + 2\lambda_i\sigma)^{-1/2}}{\sigma}\,d\sigma. \tag{13}$$

The above equation holds for any choice of positive scalars $k$ and $\theta$. To the best of our knowledge, the above integral does not have a closed-form solution, so it must be computed numerically. This can be done using the variety of techniques available for one-dimensional integrals (see for example [39]).
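The integral method can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: it evaluates the integral of the difference of the gamma and CQF Laplace transforms divided by σ, after the substitution σ = exp(x), with a plain trapezoidal rule (the paper's experiments use a Gauss-Kronrod rule instead). The `digamma` helper, the step size, the integration window and the default choice k = d/2, θ = 2·sum(weights)/d are all assumptions made for this sketch.

```python
import math


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic expansion."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series


def expected_log_cqf_integral(weights, k=None, theta=None, step=0.05, half_width=60.0):
    """E[log sum_i w_i N_i^2] = digamma(k) + log(theta) + integral term."""
    d = len(weights)
    if k is None:
        k = d / 2.0
    if theta is None:
        theta = 2.0 * sum(weights) / d

    def difference(sigma):
        # Laplace transform of the gamma density minus that of the CQF density.
        gamma_lt = (1.0 + theta * sigma) ** (-k)
        cqf_lt = 1.0
        for w in weights:
            cqf_lt /= math.sqrt(1.0 + 2.0 * w * sigma)
        return gamma_lt - cqf_lt

    # With sigma = exp(x), d(sigma)/sigma = dx, so the 1/sigma factor disappears.
    n = int(round(2.0 * half_width / step))
    integral = 0.0
    for i in range(n + 1):
        x = -half_width + i * step
        trapezoid_weight = 0.5 if i in (0, n) else 1.0
        integral += trapezoid_weight * difference(math.exp(x))
    return digamma(k) + math.log(theta) + integral * step
```

Because the result should not depend on k and θ, calling the function with two different gamma references on the same weights is a cheap consistency check of the quadrature.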

Fast Methods
The integral method explained in the previous part can be computationally expensive for some applications. For such cases, we derive three approximations that can be calculated analytically and, therefore, are much faster.
Using a first or higher order Taylor expansion around $E[U]$ to approximate the expected logarithm of $U$ has already been proposed in the literature [6,40]. However, we observed that low-order Taylor expansions do not give a very accurate approximation. Therefore, we use two different approximations, for which we can show that they provide a lower and an upper bound for the true expected logarithm. Finally, a convex combination of these two is used to get the final approximation.
Two different gamma distributions have been used in [15,41,42] to approximate a CQF. Since the expected logarithm of a gamma random variable has a closed-form solution, we use the expected logarithm of these gamma random variables to approximate the expected logarithm of a CQF. A further justification for this idea can be given based on (13), by choosing the shape and the scale parameters of the gamma distribution such that the magnitude of the integral part in (13) becomes smaller.
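Since the weights of the CQF in the KL-divergence formulas in Section 3 are eigenvalues of a positive definite matrix $\Sigma$, we express the approximations based on this matrix. This way of expressing the approximations also makes clear that the eigenvalues do not need to be calculated, which is a further computational benefit of these approximations. The shape and scale parameters of the first approximating gamma random variable are $d/2$ and $2\operatorname{tr}(\Sigma)/d$, respectively. Therefore, for the first fast approximation we have

$$E[\log U] \approx \Psi\!\left(\frac{d}{2}\right) + \log\!\left(\frac{2\operatorname{tr}(\Sigma)}{d}\right). \tag{14}$$

The shape and scale parameters of the gamma random variable for the second approximation are $\operatorname{tr}(\Sigma)^2 / (2\operatorname{tr}(\Sigma^2))$ and $2\operatorname{tr}(\Sigma^2)/\operatorname{tr}(\Sigma)$, respectively. Then, we obtain the following formula for the second fast approximation:

$$E[\log U] \approx \Psi\!\left(\frac{\operatorname{tr}(\Sigma)^2}{2\operatorname{tr}(\Sigma^2)}\right) + \log\!\left(\frac{2\operatorname{tr}(\Sigma^2)}{\operatorname{tr}(\Sigma)}\right). \tag{15}$$

The following theorem shows that these approximations are lower and upper bounds to the true expected logarithm.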
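Theorem 2. Let $U = \sum_{i=1}^{d} \lambda_i N_i^2$, where the $\lambda_i$ are eigenvalues of a positive definite matrix $\Sigma_{d\times d}$ and the $N_i$ are independent standard normal random variables. Then

$$\Psi\!\left(\frac{\operatorname{tr}(\Sigma)^2}{2\operatorname{tr}(\Sigma^2)}\right) + \log\!\left(\frac{2\operatorname{tr}(\Sigma^2)}{\operatorname{tr}(\Sigma)}\right) \le E[\log U] \le \Psi\!\left(\frac{d}{2}\right) + \log\!\left(\frac{2\operatorname{tr}(\Sigma)}{d}\right).$$

The proof of this theorem can be found in Appendix C.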
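The two gamma-based approximations only need the traces of Σ and Σ², which the following sketch makes explicit. It is an illustration, not the authors' code: the function names and the small `digamma` helper are assumptions, the matrix is passed as a list of rows and is assumed symmetric, and tr(Σ²) is computed as the sum of squared entries, which is valid only under that symmetry assumption.

```python
import math


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic expansion."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series


def fast_approximations(sigma):
    """Return (first, second) fast approximations of E[log U] for symmetric sigma."""
    d = len(sigma)
    tr1 = sum(sigma[i][i] for i in range(d))
    # For a symmetric matrix, tr(sigma^2) equals the sum of squared entries.
    tr2 = sum(v * v for row in sigma for v in row)
    # First approximation: gamma with shape d/2 and scale 2*tr(sigma)/d.
    first = digamma(d / 2.0) + math.log(2.0 * tr1 / d)
    # Second approximation: gamma with shape tr^2/(2*tr2) and scale 2*tr2/tr.
    second = digamma(tr1 * tr1 / (2.0 * tr2)) + math.log(2.0 * tr2 / tr1)
    return first, second
```

For a matrix with equal eigenvalues the two values coincide, and for Σ = diag(1, 0.25) they bracket the exact value, which for d = 2 is available in closed form from a classical identity (an outside fact, not a result of this paper).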
From this theorem, we can conclude that there exist convex combinations of the two previously mentioned approximations that perform as well as or better than each of them, in the sense that they are closer to the true expected logarithm. Therefore, we define the third fast approximation to be a convex combination of the first two, with weights $l$ and $1 - l$. To determine the parameter $l \in [0, 1]$, we used least squares fitting on thousands of positive definite matrices with different dimensions and unit trace, sampled uniformly according to an algorithm given in [43]. We observed that the fitted value is roughly equal to $l = 0.7$ and that dimensionality has a negligible effect on the best value of $l$. For the case of $d = 20$, the mean squared error for various values of $l$ can be seen in Figure 1.

Series Method
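One can represent the probability density function of a CQF given by (1) as an infinite weighted sum of gamma densities [13,14],

$$f(u) = \sum_{j=0}^{\infty} c_j\, g(u; d/2 + j, 2\beta),$$

where $g(u; d/2 + j, 2\beta)$ is the probability density function of a gamma random variable with parameters $d/2 + j$ and $2\beta$, and

$$c_0 = \prod_{i=1}^{d} \sqrt{\beta/\lambda_i}, \qquad c_j = \frac{1}{2j} \sum_{r=0}^{j-1} h_{j-r}\, c_r, \qquad h_m = \sum_{i=1}^{d} \left(1 - \beta/\lambda_i\right)^m.$$

This result can be used for deriving a series formula for the expected logarithm of $U$. Thus,

$$E[\log U] = \sum_{j=0}^{\infty} c_j \left(\Psi(d/2 + j) + \log(2\beta)\right).$$

Ruben [13] analyzed the effect of various values of $\beta$ on the behavior of the series expansion and proposed the following $\beta$ as an appropriate one:

$$\beta = \frac{2\lambda_{\max}\lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}.$$

By using this $\beta$, $\sum_{j=0}^{\infty} c_j = 1$ holds [13]. Knowing also that the digamma function satisfies $\Psi(x + 1) = \Psi(x) + 1/x$, the series can be simplified to

$$E[\log U] = \Psi(d/2) + \log(2\beta) + \sum_{j=1}^{\infty} c_j \sum_{i=0}^{j-1} \frac{1}{d/2 + i}.$$

To approximate this formula, we truncate the series, which means we only use a finite number of terms to evaluate the expectation:

$$E[\log U] \approx \Psi(d/2) + \log(2\beta) + \sum_{j=1}^{L} c_j \sum_{i=0}^{j-1} \frac{1}{d/2 + i}. \tag{25}$$

For this approximation, it is possible to compute an error bound, which is expressed by the following lemma.

Lemma 4. The bound for the error of the approximation (25) is given by (26), where $\Gamma(\cdot)$ is the gamma function, and $\epsilon = (\lambda_{\max} - \lambda_{\min})/(\lambda_{\max} + \lambda_{\min})$.

The proof of this lemma is in Appendix D. By using this bound, it is possible to calculate the expected logarithm with a given accuracy by selecting an appropriate $L$. Note that the upper bound given by (26) grows with $\epsilon = (\lambda_{\max} - \lambda_{\min})/(\lambda_{\max} + \lambda_{\min})$, and therefore also increases with $\lambda_{\max}/\lambda_{\min}$. As we will see in the simulation studies, when the ratio $\lambda_{\max}/\lambda_{\min}$ as well as the dimensionality $d$ are small, this method performs better than the integral method.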
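The series method can be sketched as follows. This is an illustration under assumptions, not the authors' MEX implementation: the coefficient recursion used here is the standard one obtained from the generating function of the gamma-mixture coefficients, with β = 2·λmax·λmin/(λmax + λmin), and the names `expected_log_cqf_series`, `n_terms` and the `digamma` helper are hypothetical. All weights are assumed strictly positive.

```python
import math


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic expansion."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series


def expected_log_cqf_series(weights, n_terms=100):
    """Truncated gamma-mixture series for E[log sum_i w_i N_i^2]."""
    d = len(weights)
    lam_max, lam_min = max(weights), min(weights)
    beta = 2.0 * lam_max * lam_min / (lam_max + lam_min)
    ratios = [1.0 - beta / w for w in weights]
    # power_sums[m] = sum_i (1 - beta / w_i)^m for m = 1..n_terms (index 0 unused).
    power_sums = [0.0] + [sum(r ** m for r in ratios) for m in range(1, n_terms + 1)]
    # Mixture coefficients: c_0 = prod sqrt(beta / w_i), then the recursion in j.
    coeffs = [math.prod(math.sqrt(beta / w) for w in weights)]
    for j in range(1, n_terms + 1):
        acc = sum(power_sums[j - r] * coeffs[r] for r in range(j))
        coeffs.append(acc / (2.0 * j))
    # digamma(d/2 + j) - digamma(d/2) = sum_{i < j} 1 / (d/2 + i).
    result = digamma(d / 2.0) + math.log(2.0 * beta)
    partial = 0.0
    for j in range(1, n_terms + 1):
        partial += 1.0 / (d / 2.0 + j - 1)
        result += coeffs[j] * partial
    return result
```

The recursion costs a number of operations quadratic in `n_terms`, so the method is attractive mainly when the weights are not too spread out and few terms are needed.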

KL-Divergence of Two Generalized Elliptical Distributions
In this section, we derive expressions for the KL-divergence and the entropy of two subclasses of generalized elliptical distributions, namely ZEG and ACG distributions [44]. We start by reviewing some related material.

Some Background on the Elliptical Distributions
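Suppose the $d$-dimensional random vector $X$ is distributed according to a zero-mean elliptically contoured (ZEC) distribution with a positive definite scatter matrix $\Sigma_{d\times d}$, that is $X \sim ZEC(\Sigma, \phi)$. The probability density function of $X$ is given by

$$f_X(x) = |\Sigma|^{-1/2}\, \phi(x^\top \Sigma^{-1} x), \tag{27}$$

for some density generator function $\phi : \mathbb{R}^+ \to \mathbb{R}$. We know that we can decompose the vector $X$ into a uniform hyper-spherical component and a scaled-radial component, so that $X = \Sigma^{1/2} R\, U$, where $U$ is uniformly distributed over the unit sphere $S^{d-1}$ and $R$ is a univariate random variable given by $R = \|\Sigma^{-1/2} X\|_2$ [45]. Then, the random variable $R$ has the density

$$f_R(r) = \frac{2\pi^{d/2}}{\Gamma(d/2)}\, r^{d-1}\, \phi(r^2). \tag{28}$$

Therefore, the square radial component $\Upsilon = R^2$ has the following density:

$$f_\Upsilon(\upsilon) = \frac{\pi^{d/2}}{\Gamma(d/2)}\, \upsilon^{d/2-1}\, \phi(\upsilon). \tag{29}$$

A ZEG is a ZEC whose square radial component is distributed according to a gamma distribution, $\Upsilon \sim \mathrm{Gamma}(a, b)$. A gamma-distributed random variable has the density

$$f_\Upsilon(\upsilon) = \frac{\upsilon^{a-1}\, e^{-\upsilon/b}}{\Gamma(a)\, b^a}, \tag{30}$$

where $a$ is a shape parameter and $b$ is a scale parameter. So the probability density function of a $d$-dimensional random variable $X \sim ZEG(\Sigma, a, b)$ is given by

$$f_X(x) = \frac{\Gamma(d/2)}{\pi^{d/2}\, \Gamma(a)\, b^a\, |\Sigma|^{1/2}}\, (x^\top \Sigma^{-1} x)^{a - d/2} \exp\!\left(-\frac{x^\top \Sigma^{-1} x}{b}\right), \tag{31}$$

where $x \in \mathbb{R}^d$ and $\Sigma \succ 0$ is its scatter matrix, and $a, b > 0$ are its shape and scale parameters [31].

When a ZEC random variable is projected onto the unit sphere, the resulting random variable is called ACG and denoted by $X \sim ACG(\Sigma)$. Unlike many other distributions on the unit sphere, this distribution has a nice closed-form density given by

$$f_X(x) = \frac{\Gamma(d/2)}{2\pi^{d/2}}\, |\Sigma|^{-1/2}\, (x^\top \Sigma^{-1} x)^{-d/2}, \tag{32}$$

where $x \in S^{d-1}$ and $\Sigma \succ 0$ is its scatter matrix.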

KL-Divergence between ZEG Distributions
Suppose we have two probability distributions $P$ and $Q$ with probability density functions $p$ and $q$. The KL-divergence between these two distributions is defined by

$$\mathrm{KL}(P \,\|\, Q) = \int \log\!\left(\frac{p(x)}{q(x)}\right) p(x)\,dx = \int \log(p(x))\,p(x)\,dx - \int \log(q(x))\,p(x)\,dx. \tag{33}$$

The negative of the first part, $H(X) = -\int \log(p(x))\,p(x)\,dx$, is the entropy, and the second part, $E[-\log(q(X))] = -\int \log(q(x))\,p(x)\,dx$, is the averaged log-loss term, where $X$ is a random variable distributed according to $P$.
The following lemma gives a general expression for the KL-divergence between two ZEC distributions. It is then used for deriving the KL-divergence between two ZEG distributions.

Lemma 5. Suppose we have two probability distributions on random variable $Y$, $P_Y = ZEC(\Sigma_1, \phi)$ and $Q_Y = ZEC(\Sigma_2, \phi')$. Then the KL-divergence between these two distributions is given by expression (34), where $f_\Upsilon$ and $f'_\Upsilon$ are the densities of the square radial components of distributions $P$ and $Q$, respectively. Also, $f_{wd}$ is the density of $\Lambda = \sum_{i=1}^{d} \lambda_i N_i^2 / \sum_{i=1}^{d} N_i^2$, where the $N_i$ are independent standard normal random variables, and $\lambda_1, \ldots, \lambda_d$ are eigenvalues of the matrix $\Sigma = \Sigma_2^{-1/2} \Sigma_1 \Sigma_2^{-1/2}$.

Proof. The KL-divergence is known to be invariant under invertible transformations of the random variable $Y$ [46]. To simplify the derivations, we apply a linear transformation $X = \Sigma_2^{-1/2} Y$ that makes the scatter matrix of the second distribution the identity. By using this change of variable, the problem becomes that of computing the KL-divergence between $P_X = ZEC(\Sigma, \phi)$ and $Q_X = ZEC(I, \phi')$. As expressed in (33), the KL-divergence is the averaged log-loss minus the entropy. Firstly, let us derive the entropy of $X$ having distribution $P_X$, that is $H(X) = -\int \log(p(x))\,p(x)\,dx$.
Let r = y 2 and recall that the area of a sphere in dimension d with radius r equals 2r d−1 π d/2 /Γ(d/2), thus Using the change of variable υ = r 2 and replacing ϕ by square radial density f Υ as expressed in (29), we obtain Now, we are deriving an expression for the averaged log-loss given by E[− log(q(X))] = −E log ϕ (X X) .The argument of function ϕ is X X; therefore, it is enough to compute the expectation of the function over the new random variable Z = X 2  2 : It is easy to see that the random variable Z can equally be written as Z = X Σ X where X ∼ Z E C(I, ϕ).The density of Z with this representation has already been reported in [47]: where f Υ is the square radial density of p Y , and f wd is the density of a linear combination of Dirichlet random variable components, where D = (D 1 , . . ., D s ) is a Dirichlet random variable with parameters (r 1 /2, . . ., r s /2), and l j s are s distinct eigenvalues of the positive definite matrix Σ with respective multiplicities r j , for j = 1, . . ., s.
It is known that if $C_1, \ldots, C_s$ are independent chi-square random variables having $r_1, \ldots, r_s$ degrees of freedom, and $C = \sum_{j=1}^{s} C_j$, then $(C_1/C, \ldots, C_s/C)$ is a Dirichlet random variable with the parameters $(r_1/2, \ldots, r_s/2)$ [48]. Hence, the random variable $\Lambda$ in (38) can be expressed as $\Lambda = \sum_{j=1}^{s} l_j C_j / C$. Equivalently, if $N_1, \ldots, N_d$ are independent standard normal random variables, then $\Lambda$ can be written as

$$\Lambda = \frac{\sum_{i=1}^{d} \lambda_i N_i^2}{\sum_{i=1}^{d} N_i^2}. \tag{39}$$

Using (37) in (36) and replacing the density generator by the square radial density as expressed in (29), we obtain the expression (40) for the averaged log-loss. Subtracting (35) from (40), we obtain (34).
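Until now, we derived an expression for the KL-divergence between two ZEC distributions. For the case of ZEG distributions, the KL-divergence can be simplified further so that no double integration is needed, as the following theorem shows.

Theorem 3. Suppose we have two distributions $P_Y = ZEG(\Sigma_1, a_p, b_p)$ and $Q_Y = ZEG(\Sigma_2, a_q, b_q)$. Then the entropy of the random variable $Y$ distributed according to $P_Y$ and the KL-divergence between these two distributions are given by the following expressions:

$$H(Y) = \log\!\left(\frac{\pi^{d/2}\,\Gamma(a_p)}{\Gamma(d/2)}\right) + \frac{1}{2}\log|\Sigma_1| + \frac{d}{2}\log(b_p) + \left(\frac{d}{2} - a_p\right)\Psi(a_p) + a_p, \tag{41}$$

$$\mathrm{KL}(P_Y \,\|\, Q_Y) = \log\!\left(\frac{\Gamma(a_q)}{\Gamma(a_p)}\right) + a_q \log\!\left(\frac{b_q}{b_p}\right) + (a_p - a_q)\,\Psi(a_p) - a_p + \frac{a_p b_p}{d\, b_q}\operatorname{tr}(\Sigma) - \frac{1}{2}\log|\Sigma| + \left(\frac{d}{2} - a_q\right)\left(E\!\left[\log \sum_{i=1}^{d} \lambda_i N_i^2\right] - \Psi\!\left(\frac{d}{2}\right) - \log(2)\right), \tag{42}$$

where $\Psi(\cdot)$ is the digamma function, and $\operatorname{tr}(\cdot)$ is the trace of a matrix. Also, the $N_i$ are independent standard normal random variables, and $\lambda_1, \ldots, \lambda_d$ are eigenvalues of the matrix $\Sigma = \Sigma_2^{-1/2} \Sigma_1 \Sigma_2^{-1/2}$.

Proof. As in the previous lemma, we apply the change of variable $X = \Sigma_2^{-1/2} Y$ and compute the KL-divergence between the transformed distributions. The expression for the entropy (35) in the case of ZEG distributions becomes the integral (43). Next, recall the following gamma function identities [49]:

$$\int_0^\infty x^{a-1} e^{-x/b}\,dx = \Gamma(a)\, b^a, \tag{44}$$

$$\int_0^\infty x^{a-1} e^{-x/b} \log(x)\,dx = \Gamma(a)\, b^a \left(\Psi(a) + \log(b)\right). \tag{45}$$

Using (44) and (45), we can simplify (43) to obtain

$$H(X) = \log\!\left(\frac{\pi^{d/2}\,\Gamma(a_p)}{\Gamma(d/2)}\right) + \frac{1}{2}\log|\Sigma| + \frac{d}{2}\log(b_p) + \left(\frac{d}{2} - a_p\right)\Psi(a_p) + a_p.$$

Since $Y = \Sigma_2^{1/2} X$, we can trivially derive the expression of $H(Y)$ given in (41). For the averaged log-loss term, we put the gamma square radial component (30) into (40), apply the change of variable $\mu = \upsilon/r$, and express the integrals in terms of the new variables $\mu$ and $r$. Using the equalities (44) and (45), we obtain

$$E[-\log(q(X))] = \log\!\left(\frac{\pi^{d/2}\,\Gamma(a_q)\, b_q^{a_q}}{\Gamma(d/2)}\right) + \left(\frac{d}{2} - a_q\right)\left(\Psi(a_p) + \log(b_p) + \int \log(r)\, f_{wd}(r)\,dr\right) + \frac{a_p b_p}{b_q} \int r\, f_{wd}(r)\,dr,$$

where, as in the previous lemma, $f_{wd}$ is the density of the random variable $\Lambda = \sum_{i=1}^{d} \lambda_i N_i^2 / \sum_{i=1}^{d} N_i^2$, the $N_i$ are independent standard normal random variables, and $\lambda_1, \ldots, \lambda_d$ are the eigenvalues of the matrix $\Sigma$.

Subtracting the entropy from the averaged log-loss gives the KL-divergence (46) in terms of $E[\Lambda]$ and $E[\log(\Lambda)]$. The moments of $\Lambda$ were computed in [47], but we give a simple derivation of the first moment below. It is known that the random variable $V_i = N_i^2 / \sum_{j=1}^{d} N_j^2$ has the following beta distribution:

$$V_i \sim \mathrm{Beta}\!\left(\frac{1}{2}, \frac{d-1}{2}\right),$$

so that $E[V_i] = 1/d$ and

$$E[\Lambda] = \sum_{i=1}^{d} \lambda_i\, E[V_i] = \frac{\operatorname{tr}(\Sigma)}{d}. \tag{48}$$

The expected logarithm of $\Lambda$ can be expressed as a difference of two expectations:

$$E[\log(\Lambda)] = E\!\left[\log \sum_{i=1}^{d} \lambda_i N_i^2\right] - E\!\left[\log \sum_{i=1}^{d} N_i^2\right].$$

Using the fact that the expected logarithm of a chi-square random variable with $d$ degrees of freedom is equal to $\Psi(d/2) + \log(2)$, $E[\log(\Lambda)]$ can be computed by the following equation:

$$E[\log(\Lambda)] = E\!\left[\log \sum_{i=1}^{d} \lambda_i N_i^2\right] - \Psi\!\left(\frac{d}{2}\right) - \log(2). \tag{50}$$

Substituting (48) and (50) into (46), we get (42).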

KL-Divergence between ACG Distributions
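The following theorem gives expressions for the KL-divergence between ACG distributions and the entropy of a single ACG distribution.

Theorem 4. Suppose we have two probability distributions $G_Y = ACG(\Sigma_1)$ and $J_Y = ACG(\Sigma_2)$. Then the entropy of the random variable $Y$ distributed according to $G_Y$ and the KL-divergence between these two distributions are given by the following expressions:

$$H(Y) = \log\!\left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right) + \frac{1}{2}\log|\Sigma_1| - \frac{d}{2}\left(E\!\left[\log \sum_{i=1}^{d} \sigma_i N_i^2\right] - \Psi\!\left(\frac{d}{2}\right) - \log(2)\right), \tag{51}$$

$$\mathrm{KL}(G_Y \,\|\, J_Y) = \frac{d}{2}\left(E\!\left[\log \sum_{i=1}^{d} \lambda_i N_i^2\right] - \Psi\!\left(\frac{d}{2}\right) - \log(2)\right) - \frac{1}{2}\log|\Sigma|, \tag{52}$$

where the $N_i$ are independent standard normal random variables, $\lambda_1, \ldots, \lambda_d$ are eigenvalues of the matrix $\Sigma = \Sigma_2^{-1/2} \Sigma_1 \Sigma_2^{-1/2}$, and $\sigma_1, \ldots, \sigma_d$ are eigenvalues of the matrix $\Sigma_1$.

Proof. Due to the invariance property of the KL-divergence under invertible changes of variables, we use the change of variable $\Omega = \Sigma_1^{-1/2} Y$ and define $\tilde{\Sigma} = \Sigma_1^{-1/2} \Sigma_2 \Sigma_1^{-1/2}$. It is easy to verify that $\Omega$ is distributed according to a zero-mean generalized elliptical distribution with identity covariance [44]. From the definition of the KL-divergence given by (33) and some simplifications, it is immediate that

$$\mathrm{KL}(G_Y \,\|\, J_Y) = \frac{1}{2}\log|\tilde{\Sigma}| + \frac{d}{2}\, E\!\left[\log\!\left(\frac{\Omega^\top \tilde{\Sigma}^{-1} \Omega}{\Omega^\top \Omega}\right)\right].$$

Since projecting any zero-mean generalized elliptical distribution (with identity covariance) on the unit sphere gives an ACG random variable (with identity covariance) [50], we can substitute $E[\log(\Omega^\top \tilde{\Sigma}^{-1} \Omega / \Omega^\top \Omega)]$ with $E[\log(X^\top \tilde{\Sigma}^{-1} X / X^\top X)]$, where the random vector $X$ is distributed according to a multivariate normal distribution with identity covariance and zero mean. Because $X^\top \tilde{\Sigma}^{-1} X$ is a CQF and $X^\top X$ is a chi-square random variable, we have

$$E\!\left[\log\!\left(\frac{X^\top \tilde{\Sigma}^{-1} X}{X^\top X}\right)\right] = E\!\left[\log \sum_{i=1}^{d} \tilde{\lambda}_i^{-1} N_i^2\right] - \Psi\!\left(\frac{d}{2}\right) - \log(2),$$

where the $\tilde{\lambda}_i$ are the eigenvalues of $\tilde{\Sigma}$. Additionally, it is easy to verify that $|\Sigma| = |\tilde{\Sigma}|^{-1}$ and $\lambda_i = \tilde{\lambda}_i^{-1}$; therefore (52) holds.

Since one of the terms in the KL-divergence is equal to minus the entropy, we use our derived expression for the KL-divergence between ACG distributions to find a formula for the entropy of an ACG distribution. Define $S_Y = ACG(I)$. Then the KL-divergence between $G_Y$ and $S_Y$ can be easily derived from the main definition (33):

$$\mathrm{KL}(G_Y \,\|\, S_Y) = -H(Y) + \log\!\left(\frac{2\pi^{d/2}}{\Gamma(d/2)}\right), \tag{55}$$

where $H(Y)$ is the entropy of the random variable $Y$. Now, we compute the above KL-divergence using (52), which gives

$$\mathrm{KL}(G_Y \,\|\, S_Y) = \frac{d}{2}\left(E\!\left[\log \sum_{i=1}^{d} \sigma_i N_i^2\right] - \Psi\!\left(\frac{d}{2}\right) - \log(2)\right) - \frac{1}{2}\log|\Sigma_1|. \tag{56}$$

Equating the right-hand sides of (55) and (56) gives (51).

The following corollary shows a relation between the KL-divergence of ACG distributions and the KL-divergence of ZEG distributions. It is an immediate consequence of Theorems 3 and 4.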
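As a small sanity check of the ACG result, consider the case d = 2. The worked equation below is a sketch under two assumptions that are not taken from the text above: first, that the KL-divergence formula for ACG distributions has the form (d/2)(E[log Σ λ_i N_i²] − Ψ(d/2) − log 2) − ½ log|Σ|, with λ_i the eigenvalues of Σ; second, the classical closed form for the expected logarithm of a two-term quadratic form, which follows from a standard trigonometric integral.

```latex
% Assumed closed form for d = 2 (classical identity, not from the paper):
%   E[log(l1 N1^2 + l2 N2^2)] = log 2 + Psi(1) + 2 log((sqrt(l1) + sqrt(l2)) / 2)
\begin{align*}
\mathrm{KL}
  &= \Big( E\big[\log(\lambda_1 N_1^2 + \lambda_2 N_2^2)\big] - \Psi(1) - \log 2 \Big)
     - \tfrac{1}{2}\log(\lambda_1 \lambda_2) \\
  &= 2\log\!\Big(\frac{\sqrt{\lambda_1} + \sqrt{\lambda_2}}{2}\Big)
     - \tfrac{1}{2}\log(\lambda_1 \lambda_2) \\
  &= 2\log\!\Big(\frac{\sqrt{\lambda_1} + \sqrt{\lambda_2}}
                      {2\,(\lambda_1 \lambda_2)^{1/4}}\Big) \;\ge\; 0 .
\end{align*}
```

The last inequality is the arithmetic-geometric mean inequality applied to the square roots of the eigenvalues, with equality exactly when λ1 = λ2. This matches what one expects from a divergence between ACG distributions, which are insensitive to the overall scale of the scatter matrix.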

Simulation Study
In Section 2, we proposed three different methods for computing the expected logarithm of the CQF given in (1). We assume the weights of the CQFs that are used in the simulations are eigenvalues of random positive definite matrices. These random matrices are generated uniformly from the space of positive definite matrices with unit trace according to the procedure proposed in [43]. In this section, we numerically investigate the running time and accuracy of these approaches. All methods were implemented in MATLAB (Version R2014a, 64-bit), and the simulations were run on a personal laptop with an Intel Core i5 (2.5 GHz) processor under the OS X Yosemite 10.10.3 operating system. Since the series method depends heavily on loops that are slow in MATLAB, we implemented this method in a MATLAB MEX-file. For the integral method, the integral is numerically evaluated using the Gauss-Kronrod 7-15 rule [51,52]. The absolute error tolerance is given as an input parameter of the numerical integration. In the integral method, the value can be computed with any given accuracy by choosing the absolute error tolerance; therefore, we do not analyze the calculation error of the integral method.
Figure 2 investigates the effects of dimensionality on the average running time of the different methods for computing the expected logarithm of the CQF explained in Section 2. For the integral method (upper-left plot), two curves for two different absolute error tolerances are shown. The integral formula (13) has the parameters $k$ and $\theta$ that can be chosen freely, and we choose those given in (14). Different curves for the series method (upper-right plot) correspond to different values of $L$, which is the truncation length of the series. The curve of the fast method (lower plot) corresponds to the computation time of the third fast method explained in Section 2. One reason for the lower computation time of the fast method is that it does not need any eigenvalue computation. There is a curve in the upper-right plot showing the computational time of the eigenvalue computation.
The approximation error of all three fast methods for different dimensions can be seen in Figure 3. The plot on the right-hand side of this figure magnifies the curve for the mean error of the third fast method (the blue curve with dots). As can be observed in Figure 3, changing the dimensionality has a negligible effect on the mean and the standard deviation (SD) of the absolute approximation error for the fast methods. The small mean error and SD of the third method indicate its distinct advantage over the other two methods. This method uses a convex combination of the values of the other two approximations, as explained in Section 2.
Approximating the expected logarithm of the CQF using the fast methods induces an error on the KL-divergence between ACG distributions given by (52). Figure 4 shows the mean percentage of relative error and its standard deviation as a function of dimensionality. It can be observed that the relative error decreases as the dimensionality increases. The third fast method is clearly superior to the other two fast methods. The reason for such a small percentage of relative error is the observation that whenever the error is large, the KL-divergence is large too. We are not showing the results for dimensions less than ten because the error percentage is quite large in that regime.

[Figure caption: The red curve in the upper-left plot shows the computational time for computing the eigenvalues of random positive-definite matrices (using the eig function in MATLAB) needed before applying the integral method or the series method. The curves in the upper-left plot correspond to the computational time of the integral method for different absolute error tolerances, including the time needed for computing the eigenvalues. The curves for the series method correspond to the computational time for various values of the truncation length of the series.]

[Figure legend: error SD and error mean of the first, second and third fast methods.]

Figure 6 shows how increasing the dimension affects the performance of the series method. The parameter $\epsilon = (\lambda_{\max} - \lambda_{\min})/(\lambda_{\max} + \lambda_{\min})$ is set to 0.9 by choosing the maximum and minimum weights in the CQF to be 1 and 1/19, respectively. The other weights of the CQF are sampled uniformly between the maximum and minimum weights. It can be seen that the dimensionality has a negligible effect on the slope of the curves. This can be predicted from the formula of the upper bound in (26), because the exponential term $\epsilon^L$ dominates the other terms in the equation and the slopes of the curves are determined mainly by the parameter $\epsilon$. In this figure, the standard deviations are due to the different distributions of the weights between the maximum and minimum weights. Figures 5 and 6 demonstrate that the error upper bound is a relatively tight bound for the actual error.
In Figure 7, we investigate the effect of $\epsilon = (\lambda_{\max} - \lambda_{\min})/(\lambda_{\max} + \lambda_{\min})$ and $d$ on the average $L$ needed to achieve an acceptable upper bound on the error (here $10^{-8}$). We can see that as $\epsilon$ increases, the slopes of the curves increase, and in the limit $\epsilon \to 1$ they go to infinity. This figure justifies our previous claim that when $\epsilon$ and the dimensionality are small, the series method is very efficient due to the relatively small $L$ needed to achieve an acceptable error.

Conclusions
In this paper, we developed three methods for calculating the expected logarithm of a central quadratic form. The integral method was a direct application of a more general result applicable to positive random variables. We then introduced three fast methods for approximating the expected logarithm. Finally, using an infinite series representation of central quadratic forms, we proposed a series method for computing the expected logarithm. By proving a bound for the approximation error, we investigated the performance of this method.
We also derived expressions for the entropy and the KL-divergence of zero-mean elliptical gamma and angular central Gaussian distributions. The expected logarithm of the central quadratic form appears in the formulas for these KL-divergences and for the entropy of the angular central Gaussian distribution.
By conducting multiple experiments, we observed that the three methods for computing the expected logarithm of a central quadratic form differ in running time and accuracy. Users can choose the most appropriate method based on their requirements.
The methodologies developed in this paper can be used in many applications. For example, one can use the result of Theorem 1 for computing the expected logarithm of other positive random variables like a non-central quadratic form. Another line of research would be to use the KL-divergence between angular central Gaussian distributions with the fast approximations in learning problems that have a divergence measure in their cost functions.
where $\lambda_{\max} = \max\{\lambda_i\}_{i=1}^{d}$, $\lambda_{\min} = \min\{\lambda_i\}_{i=1}^{d}$, and $\gamma(\cdot, \cdot)$ is the lower incomplete gamma function. Adding G(x) to all sides of the above inequality, we get Since log(x) is positive for x > 1, multiplying all sides of the above inequality by log(x), we obtain (A3), which holds for all x > 1. To prove the first part of this lemma, namely (A4), we show that (A5) holds for any positive choices of k, k, θ, and θ, and then invoke the squeeze theorem by taking the limits of all sides of (A3). From the definition of the lower incomplete gamma function, the left-hand side of (A5) can be rewritten as Using L'Hôpital's rule, it can be seen that the above limit is equivalent to $\lim_{x\to\infty} x \log(x)^{2} \cdots$ It is easy to see that (A6) is equal to zero and consequently (A5) and (A4) hold.
Now, we want to prove the second statement in the lemma, that is, (A7). If we multiply all sides of (A2) by log(x), then for 0 < x < 1 we have (A8). Using the same strategy as above, we want to show that for any positive choices of k, k, θ, and θ, the following limit holds: Using L'Hôpital's rule, it can be seen that $\lim x \log(x)^{2} \cdots$ Therefore (A9) holds, and from (A8), we have By the squeeze theorem, we can conclude that (A7) holds.
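The L'Hôpital step can be illustrated on a single gamma distribution. The sketch below assumes a gamma density p with generic shape k > 0 and scale θ > 0 and distribution function P. The symbols p and P are introduced here for illustration and are not taken from the proof above. It indicates why limits of this type vanish at both ends.

```latex
% Assumed notation: p(x) = x^{k-1} e^{-x/\theta} / (\Gamma(k)\theta^{k}),
%                   P(x) = \gamma(k, x/\theta) / \Gamma(k).
\begin{align*}
\lim_{x\to\infty} \log(x)\,\bigl(1 - P(x)\bigr)
  &= \lim_{x\to\infty} \frac{1 - P(x)}{1/\log(x)}
   = \lim_{x\to\infty} \frac{-p(x)}{-1/\bigl(x\log(x)^{2}\bigr)} \\
  &= \lim_{x\to\infty} x\log(x)^{2}\,
     \frac{x^{k-1}e^{-x/\theta}}{\Gamma(k)\,\theta^{k}} = 0, \\
\lim_{x\to 0^{+}} \log(x)\,P(x)
  &= \lim_{x\to 0^{+}} \frac{P(x)}{1/\log(x)}
   = \lim_{x\to 0^{+}} -\,x\log(x)^{2}\,
     \frac{x^{k-1}e^{-x/\theta}}{\Gamma(k)\,\theta^{k}} = 0 .
\end{align*}
```

In the first limit the exponential factor dominates every power of x and of log(x). In the second, $x^{k}\log(x)^{2} \to 0$ as $x \to 0^{+}$ for any k > 0.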

Appendix B. Proof of Lemma 3
From the expressions of F and G, we have In this proof, for simplicity of notation, we define L We give separate proofs for the cases d > 2k, d < 2k and d = 2k. For the first case, d > 2k, we have Consequently, there exists a number a > 0 such that for all σ ≥ a, the function V(σ) is positive.
Therefore, the integrand of $\int_a^\infty L(\sigma)\,d\sigma$ is positive in its domain of integration. If we choose $\int_a^\infty \frac{1}{\sigma^{p}}\,d\sigma$ is convergent and its integrand is positive in its domain, from the limit comparison test, it follows that the integral $\int_a^\infty L(\sigma)\,d\sigma$ is convergent. Now, we want to show that the integral Therefore, there exists a number a > 0 such that for all σ ≥ a, the function −V(σ) is positive. Therefore, the integrand of $\int_a^\infty -L(\sigma)\,d\sigma$ is positive in its domain of integration. If we choose 1 < p < 1 + d/2, then $\lim_{\sigma\to\infty} -L(\sigma)$ Knowing that $\int_a^\infty \frac{1}{\sigma^{p}}\,d\sigma$ is bounded, using the limit comparison test, we can conclude that $\int_a^\infty -L(\sigma)\,d\sigma$ is convergent. Now, with the same strategy as in the previous case, we can show that the integral $\int_0^a -L(\sigma)\,d\sigma$ is convergent, and it is easy to see that $\int_0^\infty L(\sigma)\,d\sigma$ is also convergent. For 2k = d, excluding the obvious case G(σ) = F(σ), there exists a number a > 0 such that for all σ ≥ a, the function V(σ) is either positive or negative. If it is positive, then we use the proof strategy for the case d > 2k. Otherwise, we use the strategy for the case d < 2k.
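The limit comparison argument for the case d < 2k can be made concrete under explicit assumptions. The definitions of F, G and L are not reproduced in this appendix, so the sketch below assumes that F and G are the Laplace transforms of the central quadratic form and of a gamma variable with shape k and scale θ, and that L(σ) = (G(σ) − F(σ))/σ. The constant $C_F$ is a symbol introduced only for this illustration.

```latex
% Assumptions (not shown in this appendix):
%   F(\sigma) = \prod_{i=1}^{d} (1 + 2\lambda_i\sigma)^{-1/2},
%   G(\sigma) = (1 + \theta\sigma)^{-k},
%   L(\sigma) = (G(\sigma) - F(\sigma)) / \sigma .
\begin{align*}
\lim_{\sigma\to\infty} \sigma^{d/2} F(\sigma)
  &= \prod_{i=1}^{d} (2\lambda_i)^{-1/2} =: C_F > 0,
\qquad
\lim_{\sigma\to\infty} \sigma^{d/2} G(\sigma) = 0 \quad (d < 2k), \\
\lim_{\sigma\to\infty} \frac{-L(\sigma)}{1/\sigma^{p}}
  &= \lim_{\sigma\to\infty} \sigma^{\,p-1-d/2}\,
     \sigma^{d/2}\bigl(F(\sigma) - G(\sigma)\bigr)
   = 0 \cdot C_F = 0 \quad (1 < p < 1 + d/2).
\end{align*}
```

Since $\int_a^\infty \sigma^{-p}\,d\sigma$ converges for p > 1 and the limit of the ratio is zero, the limit comparison test gives convergence of the tail integral of −L under these assumptions.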
We want to show that $P'(\sigma) \geq 0$ for all positive σ, which is equivalent to saying that (C8) holds for all $\{x_i, y_i\} \in \mathbb{R}^{+}$; (C8) is the Cauchy-Schwarz inequality. So the function P(σ) is increasing and, consequently, the second inequality holds.

Appendix D. Proof of Lemma 4
As we can see in [14], the following bound exists for $c_i$: which is true if i is large enough such that i < 1. Since L > d /(2 − 2 ), it can be observed that i < 1 for i ≥ L; hence, for the total approximation error, we obtain

Figure 1. The mean squared error of the third fast method for approximating the expected logarithm of a CQF as a function of parameter l.

Figure 2. The average running time (in milliseconds) of the integral method (a), the series method (b) and the third fast method (c) in different dimensions for computing the expected logarithm of the CQF. The red curve in the upper-left plot shows the time needed to compute the eigenvalues of random positive-definite matrices (using the eig function in MATLAB), which is required before applying the integral method or the series method. The different curves in the upper-left plot correspond to the computational time of the integral method for different absolute error tolerances, including the time needed for computing the eigenvalues. The different curves for the series method correspond to the computational time for various truncation lengths of the series.

Figure 3. The absolute error for the approximation of the expected logarithm of the CQF by the fast methods explained in Section 2 for different dimensions. The third method uses a convex combination of the first two methods. The plot on the right shows a zoomed version of the mean error of the third method.

Figure 6.The relation between L and the error in the series method for = 0.9.