Computing the Moments of the Complex Gaussian: Full and Sparse Covariance Matrix

Abstract: Given a multivariate complex centered Gaussian vector Z = (Z_1, . . . , Z_p) with non-singular covariance matrix Σ, we derive sufficient conditions for the nullity of the complex moments and we give a closed-form expression for the non-null complex moments. We also present conditions for the factorisation of the complex moments. Computational consequences of these results are discussed.


Introduction
The p-variate Complex Gaussian Distribution (CGD) is defined by [1] to be the image under the complex affine transformation z → µ + Az of a standard CGD. In the Cartesian representation, this class of distributions is a proper subset of the general 2p-variate Gaussian distributions, and its elements are also called rotationally invariant Gaussian distributions. Statistical properties and applications are discussed in [2].
As in the real case, CGD's are characterised by a complex covariance matrix Σ = E(ZZ*), which is a Hermitian operator, Σ* = Σ. In some cases, we assume Σ to be positive definite, z*Σz > 0 if z ≠ 0. This assumption is equivalent to the existence of a density. We assume zero mean everywhere and we use the notation CN_p(Σ).
When the complex field C is identified with R^2, it becomes an R-vector space, and the real monomials must be replaced by complex monomials. The object of this paper is the computation of the moments of a CGD, i.e., the expected values of the complex monomials ∏_{j=1}^p Z_j^{n_j} Z̄_j^{m_j}. The computation of Gaussian moments is a classical subject that relies on a result usually called Wick's (or Isserlis') theorem, see ([3], Ch. 1). The real and the complex cases are similar, but the complex case has a peculiar combinatorics. Actually, from many perspectives, the complex case is simpler, as observed in Section 2.3.
The paper is organised as follows. In Section 2 we offer a concise but complete overview of the basic results concerning the CGD. In particular, we give a proof of Wick's theorem based on a version of the Faà di Bruno formula. In Section 3 we present recurrence relations for the moments and apply them to derive a new closed-form equation for the moments. Other results are sufficient conditions for the nullity of a moment, which is a feature where the complex case differs from the real case. The presentation is supported by a small running example. In Section 4 we present conditions on the moment of interest and on the sparsity of the correlation matrix that ensure the factorisation of that moment into a product of lower-degree moments. Both the results on the closed form of the moment formula and the factorisation are expected to produce computational algorithms of interest. This issue is discussed in the final Section 5. The standard version of Wick's theorem reduces the computation of the moments to the computation of the permanent of a matrix, for which optimised algorithms have been designed. We have not performed this optimisation for the algorithms suggested by our results, so that a full comparison is not possible at present. Some technical combinatorial computations are presented in Appendix A.

The Multivariate Complex Gaussian Distribution and Its Moments
The identification C ∋ z = x + iy ↔ (x, y) ∈ R^2 turns C into a 2-dimensional real vector space with inner product ⟨z_1, z_2⟩ = z_1 z̄_2. The image of the Lebesgue measure dx dy is denoted dz.
If C is seen as a real vector space of dimension 2, then the space of R-linear operators has dimension 4. It is easy to verify that a generic linear operator A has the form

A : z → αz + βz̄, α, β ∈ C. (1)

Notice that the C-linear operator z → αz is just a special case. A generic R-multilinear operator is of the form

(z_1, . . . , z_k) → ∑_{L⊆{1,...,k}} α_L ∏_{j∈L} z_j ∏_{j∉L} z̄_j, α_L ∈ C.

A general complex monomial ∏_{j=1}^p z_j^{n_j} z̄_j^{m_j} is obtained from a suitable multilinear form by identifying some of the variables, e.g., z_1 z̄_1 z_2 = t_1 t̄_2 t_3, with t_1 = t_2 = z_1, t_3 = z_2 and L = {1, 3}.
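The decomposition of a generic R-linear operator as in Equation (1) can be checked numerically. The following sketch (the helper names `decompose` and `apply_matrix` are ours, not from the paper) recovers α and β from the 2 × 2 real matrix of the operator and verifies the identity A(z) = αz + βz̄:

```python
def decompose(a, b, c, d):
    """Given the real matrix [[a, b], [c, d]] acting on (x, y),
    return (alpha, beta) with A(z) = alpha*z + beta*conj(z)."""
    alpha = complex(a + d, c - b) / 2
    beta = complex(a - d, c + b) / 2
    return alpha, beta

def apply_matrix(a, b, c, d, z):
    """Apply the real matrix [[a, b], [c, d]] to z = x + iy."""
    x, y = z.real, z.imag
    return complex(a * x + b * y, c * x + d * y)

a, b, c, d = 1.0, 2.0, -0.5, 3.0
alpha, beta = decompose(a, b, c, d)
for z in (1 + 1j, 2 - 0.3j, -1.5 + 0.7j):
    assert abs(apply_matrix(a, b, c, d, z) - (alpha * z + beta * z.conjugate())) < 1e-12
```

A C-linear operator corresponds to β = 0, i.e., to a matrix of the form [[a, −c], [c, a]].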
The set of p-variate complex monomials characterise probability distributions on C p .

Calculus on C
If f : C → C is differentiable, the derivative at z in the direction h can be expressed in the form

df(z)[h] = D^− f(z) h + D^+ f(z) h̄,

and the derivative operators D^− and D^+ are related to the Cartesian derivatives by the equations

D^− = (1/2)(∂/∂x − i ∂/∂y), D^+ = (1/2)(∂/∂x + i ∂/∂y).

The operators appearing in the equations above are sometimes called Wirtinger derivatives and denoted by D^− = ∂/∂z and D^+ = ∂/∂z̄. The Wirtinger derivatives act on complex monomials as follows:

D^− z^n z̄^m = n z^{n−1} z̄^m, D^+ z^n z̄^m = m z^n z̄^{m−1}.
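The action of the Wirtinger derivatives on monomials can be checked numerically against the Cartesian definition D^− = (∂/∂x − i ∂/∂y)/2 via central finite differences. A small sketch (the function `wirtinger_minus`, the test monomial and the test point are our arbitrary choices):

```python
def f(z, n=3, m=2):
    # Test monomial z^n * conj(z)^m
    return z**n * z.conjugate()**m

def wirtinger_minus(f, z, h=1e-6):
    """Approximate D^- f = (df/dx - i df/dy)/2 by central differences."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dfdx - 1j * dfdy)

z0 = 1.3 + 0.7j
exact = 3 * z0**2 * z0.conjugate()**2     # D^- z^3 conj(z)^2 = 3 z^2 conj(z)^2
assert abs(wirtinger_minus(f, z0) - exact) < 1e-6
```

The analogous check for D^+ = (∂/∂x + i ∂/∂y)/2 gives m z^n z̄^{m−1}.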
If f is k-times differentiable, the k-th derivative at z in the directions h_1, . . . , h_k is

d^k f(z)[h_1, . . . , h_k] = ∑_{ε∈{−,+}^k} D^{ε_1} · · · D^{ε_k} f(z) h_1^{ε_1} · · · h_k^{ε_k}, (4)

where h_j^− = h_j and h_j^+ = h̄_j. We are going to use the following form of the Faà di Bruno formula, see [4].
Proposition 1. Let f : C → C and g : R^h → C. For each set of commuting derivative operators D_1, . . . , D_k,

D_1 · · · D_k ( f ∘ g) = ∑_{π∈Π(k)} d^{#π} f (g) [D_B g : B ∈ π],

where Π(k) is the set of partitions of {1, . . . , k}, d^{#π} is defined in Equation (4) and D_B = ∏_{b∈B} D_b.
Proof. The proof easily follows by induction on k, using the fact that each partition of {1, . . . , k} is derived from a partition of {1, . . . , (k − 1)} either by adding the singleton {k} as a new block, or by adding k to one of the blocks of the (k − 1)-partition.
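The induction in the proof is itself a generation procedure for Π(k). A minimal sketch (the function name `partitions` is ours): every partition of {1, . . . , k} is produced exactly once, either by appending {k} as a new singleton block or by inserting k into an existing block of a partition of {1, . . . , k − 1}.

```python
def partitions(k):
    """All set partitions of {1, ..., k}, built by the induction of Proposition 1."""
    if k == 0:
        return [[]]                               # the empty partition
    result = []
    for p in partitions(k - 1):
        result.append(p + [[k]])                  # {k} as a new singleton block
        for i in range(len(p)):                   # k joined to an existing block
            result.append(p[:i] + [p[i] + [k]] + p[i + 1:])
    return result

# The counts #Pi(k) are the Bell numbers: 1, 1, 2, 5, 15, 52, ...
assert [len(partitions(k)) for k in range(6)] == [1, 1, 2, 5, 15, 52]
```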

Remark 1.
The formula applies either when each derivation is a different partial derivative D_j = ∂/∂x_j or when D_i = D_j for some i, j. In the case of repeated derivations, some terms in the RHS appear more than once. If equal factors are collected, combinatorial counts appear. In the following, we first use the basic formula; then, in the general case, we switch to a different approach, based on the explicit use of recurrence relations instead of closed-form equations.

Complex Gaussian
The p-variate Complex Gaussian Distribution (CGD) is, when expressed in the real space R 2p , a special case of a 2p-variate Gaussian Distribution. The definition is given below.
The univariate CGD is the distribution of a complex random variable Z = X + iY where the real and imaginary parts form a couple of independent identically distributed centered Gaussians X and Y. As x^2 + y^2 = z z̄, the density of Z is

φ(z) = (1/(2πσ^2)) exp(−(1/(2σ^2)) z z̄).

The complex variance is γ = E[Z Z̄] = 2σ^2 and we write Z ∼ CN_1(γ). Notice that the standard Complex Gaussian has γ = 1. If Z ∼ CN_1(γ) and α ∈ C, then it is easy to prove that αZ ∼ CN_1(αᾱγ). The univariate Complex Gaussian is sometimes called "circularly symmetric" because e^{iθ} Z ∼ Z. Moreover, Z̄ ∼ Z.
Consider d independent standard Complex Gaussian random variables U_j, j = 1, . . . , d, and let C = [c_ij] be a p × d complex matrix. As in the real case, the distribution of Z = CU, U = (U_1, . . . , U_d), is a multivariate Complex Gaussian, Z ∼ CN_p(Σ), with covariance matrix Σ = CC*. In the special case of a non-singular covariance matrix Σ, the density exists and is obtained by performing a change of variable, see [2]. Our aim is to discuss various methods to compute the value of a given complex moment of a normal random vector Z ∼ CN_p(Σ), namely

ν(n_1, . . . , n_p; m_1, . . . , m_p) = E[∏_{j=1}^p Z_j^{n_j} Z̄_j^{m_j}],

where n_1, . . . , m_p are non-negative integers. In the case of independent standard CG's, i.e., identity covariance matrix, we have

E[∏_{j=1}^p U_j^{n_j} Ū_j^{m_j}] = ∏_{j=1}^p n_j! δ_{n_j m_j},

which is zero unless m_j = n_j for all j = 1, . . . , p. In the general case, each component Z_j is a C-linear combination of independent standard U_j's, so that each moment is the result of C-multilinear and C-anti-multilinear computations. The result will be a sum of products, hence one should expect numerically hard computations. Various approaches are available and their respective merits depend largely on the setting of the problem: number of variables, total degree of the complex monomial, sparsity of the covariance matrix. All these issues will be discussed in the following.

Wick's Theorem
Let us first consider the case where all the exponents in the complex monomial are 1. The general case is a special case of that one, where some of the variables are equal. The unit-exponent case is solved by the classical Wick's theorem (or Isserlis' theorem). In the real case, for example in ([3], Th. 1.28), if X_1, . . . , X_q is a centered (real) Gaussian vector with covariance matrix Σ = [σ_ij], i, j = 1, . . . , q, then, if q is even,

E[X_1 · · · X_q] = ∑_{π∈P(q)} ∏_{{i,j}∈π} σ_ij,

where P(q) is the set of all partitions of {1, . . . , q} into subsets of two elements. If q is odd, then E[X_1 · · · X_q] = 0.
In the complex case, if (Z_1, . . . , Z_q, T_1, . . . , T_q) is jointly Complex Gaussian with cross-covariances γ_ij = E[Z_i T̄_j], then

E[Z_1 · · · Z_q T̄_1 · · · T̄_q] = per(Γ) = ∑_{σ∈S(q)} ∏_{i=1}^q γ_{i σ(i)},

where S(q) is the symmetric group of permutations on {1, . . . , q}. The moment can be zero even in special cases depending on the sparsity of the covariance matrix. The properties of the permanent are discussed in [6].
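The permanent over the symmetric group S(q) can be written down directly, at factorial cost. A minimal sketch (the function name `permanent` is ours), useful as a reference implementation for the small examples in this paper:

```python
from itertools import permutations
from math import prod

def permanent(mat):
    """Naive permanent: sum over all permutations sigma of prod_i mat[i][sigma(i)]."""
    q = len(mat)
    return sum(prod(mat[i][sigma[i]] for i in range(q))
               for sigma in permutations(range(q)))

assert permanent([[1, 2], [3, 4]]) == 10   # 1*4 + 2*3
assert permanent([[1, 0], [0, 1]]) == 1
```

The cost is q · q! products, which motivates the optimised algorithms discussed in Section 5.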

Remark 2.
We note that the same argument shows that a moment with unequal numbers of plain and conjugated factors is zero. In the case of repeated components, i.e., non-unit exponents, the condition for nullity is that the sums of the two blocks of exponents are different.

Remark 3.
We observe that the complex case can be considered, from some perspectives, simpler than the real one. For example, when summing over Wick pairings, the complex case considers only pairings matching Z_i to T_j variables, while the real case considers all pairings (and thus the sums involved are more complicated).

Computation of the Moments via Recurrence Relations
In the previous section we have offered a compact review of Wick's theorem which is a tool for the computation of the moments of the CGD.
The case where the exponents in the complex moment are not all equal to 1 is reduced to the case of the theorem by considering repeated components. However, repeated components lead to identities between terms in the expansion of the permanent, which is always a homogeneous polynomial in the covariances. In this section we derive expressions of the permanent where each monomial appears once, presented as a system of recurrence relations among the moments of a Z ∼ CN_p(Σ). In conclusion, we present an explicit formula for the moments, which is a homogeneous polynomial in the elements of Σ in standard form.
Let us first introduce some definitions.
Definition 1. Let α = (n_1, . . . , n_p; m_1, . . . , m_p) ∈ Z^{2p}_{≥0} be a multi-index, let Z ∼ CN_p(Σ), and let

ν(α) = E[∏_{j=1}^p Z_j^{n_j} Z̄_j^{m_j}]

be the corresponding α-moment. The supporting sets of α are N = {h : n_h > 0} and M = {k : m_k > 0}.

Lemma 1.
Let us assume that the covariance matrix Σ is non-degenerate and let φ be the associated density of Equation (7). For each bounded function f : C^p → C with Wirtinger derivatives bounded by a polynomial, the following relations hold:

E[Z_j f(Z)] = ∑_{k=1}^p σ_jk E[D_k^+ f(Z)], E[Z̄_j f(Z)] = ∑_{k=1}^p σ_kj E[D_k^− f(Z)], j = 1, . . . , p.

Proof of 1.
We prove the thesis component-wise, dropping the index j.
We prove that z = Σ^t D^− g and z̄ = Σ D^+ g, from which the thesis follows. We have ∂z/∂x_k = e_k and ∂z/∂y_k = i e_k, with {e_1, . . . , e_p} the canonical basis. As the directional derivative of g in the direction r is d_r g, the stated identities follow for each k = 1, . . . , p.

Recurrence Relations
We prove in the following proposition recurrence relations in which a moment is expressed as a linear combination of moments with total degree reduced by 2. The proof is based on Lemma 1, hence it assumes that the covariance matrix is non-degenerate.
Proposition 2 (Recurrence relations for the moments). Given the multi-index α with supporting sets N and M, there are #N + #M ≤ 2p recurrence relations for the moment ν(α): for each h ∈ N,

ν(α) = ∑_{k∈M} m_k σ_hk ν(α − β_hk),

and, for each k ∈ M,

ν(α) = ∑_{h∈N} n_h σ_hk ν(α − β_hk),

where β_hk = e_h + e_{p+k}. As a consequence, the moment is zero unless ∑_{h=1}^p n_h = ∑_{k=1}^p m_k.
Proof. Let f(z, α) = ∏_{j=1}^p z_j^{n_j} z̄_j^{m_j} be the complex monomial with exponent α, and let h ∈ N. We denote by {e_1, . . . , e_2p} the canonical basis of R^{2p}.
Notice that the proof holds without requiring any condition on ∑_{h=1}^p n_h and ∑_{k=1}^p m_k. The stated sufficient condition for the nullity of the moment, namely that ν(α) = 0 unless ∑_{h∈N} n_h = ∑_{k∈M} m_k, is derived by considering a linear combination of the recurrence relations.
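The recurrence relations of Proposition 2, read as ν(α) = ∑_{k∈M} m_k σ_hk ν(α − β_hk) for any fixed h ∈ N, translate directly into a memoised algorithm. A sketch under that reading (the encoding of α as two tuples, and the fixed example covariance `SIGMA`, are ours):

```python
from functools import lru_cache

SIGMA = ((1.0,),)          # p = 1, standard complex Gaussian, gamma = 1

@lru_cache(maxsize=None)
def nu(n, m):
    """Moment nu(n; m) of CN_p(SIGMA) via the recurrence of Proposition 2."""
    if sum(n) != sum(m):
        return 0.0                               # nullity condition
    if sum(n) == 0:
        return 1.0                               # empty monomial
    h = next(i for i, v in enumerate(n) if v > 0)
    n2 = n[:h] + (n[h] - 1,) + n[h + 1:]
    total = 0.0
    for k, mk in enumerate(m):
        if mk > 0 and SIGMA[h][k] != 0:
            m2 = m[:k] + (mk - 1,) + m[k + 1:]
            total += mk * SIGMA[h][k] * nu(n2, m2)
    return total

# E|Z|^(2q) = q! * gamma^q for Z ~ CN_1(gamma); with gamma = 1:
assert [nu((q,), (q,)) for q in range(5)] == [1.0, 1.0, 2.0, 6.0, 24.0]
```

Memoisation matters here: without it, the recursion re-derives the same lower-degree moments exponentially many times.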

Remark 4.
If ∑_{h=1}^p n_h = ∑_{k=1}^p m_k = q, the recurrence relations in Proposition 2 coincide with the recurrence formula for computing the permanent of the q × q matrix Γ obtained from Σ by splitting the h-th row into n_h copies and the k-th column into m_k copies.
The recursive formula for the permanent of a q × q matrix Γ, developed with respect to the r-th row, is, see [6],

per(Γ) = ∑_{j=1}^q γ_rj per(Γ−rj),

where Γ−rj is obtained from Γ by deleting the r-th row and the j-th column. Suppose that the first n_1 rows and the first m_1 columns of Γ are obtained by repeating n_1 times the first row of Σ and m_1 times the first column of Σ, respectively. If 1 ≤ r ≤ n_1 and 1 ≤ j ≤ m_1, then γ_rj = σ_11, and the matrices Γ−1j are all identical for 1 ≤ j ≤ m_1. The matrix Γ−11 is associated to the multi-index α − 11 = (n_1 − 1, n_2, . . . , n_p; m_1 − 1, m_2, . . . , m_p). Then per(Γ−11) = ν(α − 11), and so ∑_{j=1}^{m_1} γ_1j per(Γ−1j) = m_1 σ_11 ν(α − 11). The thesis follows by considering the sums associated to the successive blocks of repeated columns.
The nullity of a moment depends also on the sparsity of the covariance matrix. For example, here is a simple sufficient condition.

Corollary 1.
If there exists h ∈ N such that σ hk = 0 for all k ∈ M or if there exists k ∈ M such that σ hk = 0 for all h ∈ N then ν(α) = 0.
Proof. In such cases the recurrence relations in Proposition 2 consist of null addends only.

Closed Form
The recurrence relations in Proposition 2 allow us to compute a complex moment as a linear function of lower-degree moments. Hence, each complex moment is a polynomial in the elementary moments σ_hk, h ∈ N, k ∈ M. For example, in the simple case where α is proportional to the multi-index of an elementary moment, α = n β_hk, n ∈ Z_{≥0}, β_hk = e_h + e_{p+k}, h ∈ N, k ∈ M, each recurrence relation contains one term only, so that ν(n β_hk) = n! ν(β_hk)^n = n! σ_hk^n.

1.
The free elements of the vector a are the (p − 1)^2 entries a_hk, h, k = 1, . . . , p − 1. In fact, for each h = 1, . . . , p − 1, the elements a_hp, a_ph and a_pp are determined by the remaining a_ij:

a_hp = n_h − ∑_{k=1}^{p−1} a_hk, a_ph = m_h − ∑_{k=1}^{p−1} a_kh, a_pp = n_p − ∑_{k=1}^{p−1} a_pk.

2.
If there exists an index r such that n_r = 0, then a_rk = 0 for k = 1, . . . , p. Analogously, if there exists an index s such that m_s = 0, then a_hs = 0 for h = 1, . . . , p.
Proof. For simplicity, the dependence on α and a is omitted. This proof is based on Proposition A1 in Appendix A, which states 0 ≤ l_ij ≤ L_ij.
Proof of 1. We show the thesis for a_ip by induction. The proof for a_pj is analogous. Finally, from a_ip and a_pj, we obtain a_pp by direct computation.

Proof of 2. We have
and so a_rk = 0, k = 1, . . . , p − 1. Since a_rp = n_r − ∑_{k=1}^{p−1} a_rk, we have a_rp = 0. Analogously for a_hs.

Example 1. If p = 3, we have I(α) ∋ a = (a_11, a_12, a_21, a_22) ∈ Z^4_{≥0}.

The following theorem gives an explicit expression of the α-moment of CN_p(Σ). The proof is based on several propositions given in Appendix A.

Theorem 2.
Let α be a multi-index with supporting sets N and M. Assume that ∑_{h∈N} n_h = ∑_{k∈M} m_k. Then the α-moment of CN_p(Σ) is

ν(α) = ∏_{h=1}^p n_h! m_h! ∑_{a∈I(α)} P(a), P(a) = ∏_{r,s=1}^p σ_rs^{a_rs} / a_rs!. (11)

Proof. First, we show that ∏_{h=1}^p n_h! m_h! ∑_{a∈I(α)} P(a) is the elementary moment σ_hk when α = β_hk. Let α = β_hk. Since n_i = m_j = 0 for each i ≠ h and j ≠ k, Item 2 of Proposition 3 implies that a_ir = a_rj = 0 for i ≠ h, j ≠ k and r ≤ p, that is, a_hk is the unique element of the vector a different from zero. Furthermore, a_hk = 1 since l_hk = 1 and L_hk = 1, and the thesis follows. Next, we show that ∏_{h=1}^p n_h! m_h! ∑_{a∈I(α)} P(a) satisfies the recurrence relations for a general α. The key identity is

σ_hk P(a) = (a_hk + 1) P(a^{+hk}),

where a^{+hk} denotes the vector a with the entry a_hk increased by 1.
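Reading I(α) as the set of nonnegative integer matrices a = (a_rs) with row sums n and column sums m, the closed form can be evaluated by brute-force enumeration. A sketch (the helpers `tables` and `nu_closed` are ours; the bounds l_ij, L_ij of Definition 2 only prune this enumeration, so enumerating all margined tables gives the same sum):

```python
from math import factorial, prod

def tables(rows, cols):
    """All nonnegative integer matrices with the given row and column sums."""
    if len(rows) == 1:
        if sum(cols) == rows[0]:
            yield [list(cols)]
        return
    def first_rows(r, cols):
        if len(cols) == 1:
            if r <= cols[0]:
                yield [r]
            return
        for v in range(min(r, cols[0]) + 1):
            for rest in first_rows(r - v, cols[1:]):
                yield [v] + rest
    for row in first_rows(rows[0], tuple(cols)):
        new_cols = tuple(c - v for c, v in zip(cols, row))
        for sub in tables(rows[1:], new_cols):
            yield [row] + sub

def nu_closed(n, m, sigma):
    """nu(alpha) = prod_h n_h! m_h! * sum_{a in I(alpha)} prod_{r,s} sigma_rs^a_rs / a_rs!."""
    if sum(n) != sum(m):
        return 0
    coef = prod(factorial(v) for v in n) * prod(factorial(v) for v in m)
    total = 0
    for a in tables(tuple(n), tuple(m)):
        term = 1
        for r, row in enumerate(a):
            for s, ars in enumerate(row):
                term *= sigma[r][s] ** ars / factorial(ars)
        total += term
    return coef * total

sigma = [[2, 1], [1, 3]]
# alpha = (1,1;1,1): nu = per(Sigma) = 2*3 + 1*1 = 7
assert nu_closed((1, 1), (1, 1), sigma) == 7
```

The elementary case checks out as well: nu_closed((n,), (n,), [[s]]) returns n! s^n, as in the example ν(n β_hk) = n! σ_hk^n above.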

Remark 5.
For each null elementary moment σ_hk = 0, if a_hk ≠ 0, the corresponding addend of the sum over I(α) is null. Hence, the sum over I(α) contains only the addends such that a_hk = 0, since we assume 0^0 = 1.

Remark 6.
It should be noted that Equation (11) of Theorem 2 contains the multinomial coefficients

∏_{h=1}^p (n_h choose a_h1, . . . , a_hp) and ∏_{h=1}^p (m_h choose a_1h, . . . , a_ph),

which are related to the number of permutations with equal terms in the permanent. This remark suggests a purely combinatorial proof of the equation for the moments. However, it should be noted that the specific value of α induces on each a_hk the constraints provided by the limits l_hk and L_hk stated in Definition 2 and Proposition 3. We do not follow this line of investigation here. We thank an anonymous referee for suggesting this remark.

Example 2.
Let us consider the cases p = 2 and p = 3.

Factorisation
In this section we present a factorisation of the complex moments which depends on the null elements of Σ and on the supporting sets N and M of the given moment. The argument we use is not based on independence assumptions. Such a factorisation is possible when the bipartite graph G on N ∪ M, with an edge between h ∈ N and k ∈ M whenever σ_hk ≠ 0, has more than one connected component; the components induce partitions of the supporting sets. Notice that the partitions of N and M are uniquely defined, and they can be the trivial partition.

Theorem 3. Let {N_r}, r = 1, . . . , Q, and {M_r}, r = 1, . . . , Q, be the partitions induced by Σ. Let α_r be the multi-index α restricted to N_r ∪ M_r, α_r = (n_h, m_h), h ∈ N_r ∪ M_r. The moment ν(α) is given by

ν(α) = ∏_{r=1}^Q ν(α_r).

Proof. We use the notations of Theorem 2. Let A_r = N_r × M_r. There are no edges in G outside each connected component, that is, σ_hk = 0 if (h, k) ∉ ∪_r A_r. According to the argument in Remark 5, each contributing a ∈ I(α) has a_hk = 0 unless (h, k) ∈ ∪_r A_r. By cancelling trivial factors, the sum can be restricted to

I*(α) = {a ∈ I(α) | a_hk = 0 unless (h, k) ∈ ∪_r A_r},

so that the sum in Equation (11) factors over the connected components.

The factorisation of the previous theorem reduces the computational complexity of computing ν(α), since each factor is a moment of lower order. The computation of the connected components of a graph requires linear time, in terms of the number of its nodes and edges, see [7]. In the presence of a non-trivial induced partition of the supporting sets, the following corollary gives sufficient conditions for the nullity of the moment ν(α).

Corollary 2. If there exists r ∈ {1, . . . , Q} such that ∑_{h∈N_r∪M_r} n_h ≠ ∑_{k∈N_r∪M_r} m_k, then ν(α) = 0.

Proof. If there exists r ∈ {1, . . . , Q} such that ∑_{h∈N_r∪M_r} n_h ≠ ∑_{k∈N_r∪M_r} m_k, then ν(α_r) is null and the thesis follows.
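The connected components driving the factorisation can be computed by a breadth-first search on the bipartite graph defined by the nonzero pattern of Σ. A sketch (helper names `components` and `permanent` are ours); for unit exponents, ν(α) is the permanent of Σ, so the factorisation can be verified directly on a block-structured covariance:

```python
from collections import deque
from itertools import permutations
from math import prod

def permanent(mat):
    q = len(mat)
    return sum(prod(mat[i][s[i]] for i in range(q))
               for s in permutations(range(q)))

def components(sigma):
    """Connected components of the bipartite graph with an edge (h, k)
    whenever sigma[h][k] != 0; rows and columns are separate node sets."""
    p = len(sigma)
    seen_rows, comps = set(), []
    for start in range(p):
        if start in seen_rows:
            continue
        rows, cols, queue = set(), set(), deque([("r", start)])
        while queue:
            side, i = queue.popleft()
            if side == "r":
                if i in rows:
                    continue
                rows.add(i); seen_rows.add(i)
                queue.extend(("c", k) for k in range(p) if sigma[i][k] != 0)
            else:
                if i in cols:
                    continue
                cols.add(i)
                queue.extend(("r", h) for h in range(p) if sigma[h][i] != 0)
        comps.append((sorted(rows), sorted(cols)))
    return comps

# Block-structured Sigma: indices {1, 2} and {3} do not interact.
sigma = [[2, 1, 0], [1, 3, 0], [0, 0, 4]]
comps = components(sigma)
assert comps == [([0, 1], [0, 1]), ([2], [2])]
# For alpha = (1,1,1;1,1,1), nu = per(Sigma) factors over the components:
blocks = [[[sigma[h][k] for k in cs] for h in rs] for rs, cs in comps]
assert permanent(sigma) == prod(permanent(b) for b in blocks)
```

Note that a row (or column) with no nonzero entry produces a degenerate component with an empty side, the situation of Corollary 1.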

Remark 7.
If there exists an h ∈ N such that σ_hk = 0 for each k ∈ M, or if there exists a k ∈ M such that σ_hk = 0 for each h ∈ N, then Corollary 1 shows that ν(α) = 0. This is a degenerate case, where the bipartite graph G has a connected component (∅, M_r) or (N_r, ∅), for some r. It follows, for example, that if there exists a connected component (∅, M_r), then ∑_{k∈N_r∪M_r} m_k ≠ 0 while ∑_{h∈N_r∪M_r} n_h = 0, since the indices in M_r do not belong to N. In such a case, we have ν(α) = 0.
The permuted matrix highlights the induced partitions. The conditions of Corollary 2 for the nullity of ν(α) are n_1 + n_2 + n_5 ≠ m_1 + m_5 or n_3 + n_4 ≠ m_4.
We compute the moments with two different α.

Discussion
This piece of research about the moments of the CGD was originally motivated by the desire to evaluate the approximation error of a cubature formula with nodes on a suitable subset of the complex roots of unity, first introduced in [8].
In this paper, we discussed particular cases of Wick's theorem for the CGD. When the exponent α of the complex moment is not 0-1 valued, the permanent has repeated terms in the sum. By collecting them, one obtains the form of Theorem 2, which is a homogeneous polynomial in the covariances. By means of this expression, cases of factorisation of the moments have been derived in Theorem 3.
The relevance of Theorem 2 is mainly theoretical. On one side, it allows for the analysis of the moment's behaviour as a function of the elementary moments σ_ij. On the other side, our proof of the factorisation depends on it. The factorisation, when it exists, reduces the computational complexity of ν(α).
Let us discuss briefly the complexity of the computation of moments that do not factorise. We can compare the method presented in this paper, based on Equation (11), with the method based on the permanent, see Equation (9). For simplicity, we consider Z ∼ CN p (Σ), with Σ without null elements, and all the exponents of the moment equal to n, i.e., α = (n, . . . , n; n, . . . , n).
The computation of the moment as the permanent of an np × np matrix requires (np)! products using a raw algorithm, and (np) 2^{np−1} products using the Ryser algorithm, see [6].
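Ryser's inclusion-exclusion formula, per(A) = (−1)^q ∑_{S⊆{1,...,q}, S≠∅} (−1)^{#S} ∏_i ∑_{j∈S} a_ij, reduces the cost from q! to O(q 2^q) products. A sketch (function names are ours), cross-checked against the naive permanent:

```python
from itertools import permutations
from math import prod

def permanent_naive(mat):
    q = len(mat)
    return sum(prod(mat[i][s[i]] for i in range(q))
               for s in permutations(range(q)))

def permanent_ryser(mat):
    """Ryser's inclusion-exclusion formula, O(q * 2^q) products."""
    q = len(mat)
    total = 0
    for mask in range(1, 1 << q):                 # nonempty column subsets S
        row_prod = prod(sum(mat[i][j] for j in range(q) if mask >> j & 1)
                        for i in range(q))
        total += (-1) ** bin(mask).count("1") * row_prod
    return (-1) ** q * total

mat = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
assert permanent_ryser(mat) == permanent_naive(mat)
```

Iterating the subsets in Gray-code order would further reduce the cost of updating the row sums; the version above keeps the formula explicit instead.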
The optimisation of the algorithm for Equation (11) is outside the scope of this paper. A raw implementation of the formula requires first computing, only once, j! for j = 1, . . . , p. Then, for each a ∈ I(α), the computation of P(a) requires 2np products, since ∑_{h,k=1}^p a_hk = np. A conservative upper bound for #I(α) is obtained by assuming that each a_ij ∈ [0, n]. It follows that #I(α) < n^{(p−1)^2}, and so the complexity of Equation (11) is less than 2p n^{1+(p−1)^2}. Actually, #I(α) is much smaller in most cases. For instance, if p = 5 and n = 6, then the effective #I(α) = 1.6 × 10^8, while n^{(p−1)^2} = 2.8 × 10^{12}.
Notice that the complexity of our formula, 2p n^{1+(p−1)^2}, is much smaller than the complexity of the raw version of the permanent, (np)!. Furthermore, comparing the complexity of the Ryser algorithm with the upper bound of our algorithm, we find, by direct computation, that np 2^{np−1} > 2p n^{1+(p−1)^2} when p is small and n is large. For example, our algorithm is less expensive for p = 5 and n ≥ 12. We expect that the Ryser optimisation techniques applied to Equation (11) will lead to a further reduction in complexity.
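The crossover claimed above can be checked by direct computation of the two operation counts (a sketch; the function names are ours, and the closed-form count is the conservative upper bound, not the effective #I(α)):

```python
def ryser_cost(p, n):
    """Products used by Ryser on the (np) x (np) expanded matrix."""
    q = n * p
    return q * 2 ** (q - 1)

def closed_form_bound(p, n):
    """Upper bound 2p * n^(1 + (p-1)^2) for the raw Equation (11)."""
    return 2 * p * n ** (1 + (p - 1) ** 2)

p = 5
assert closed_form_bound(p, 11) > ryser_cost(p, 11)   # Ryser still cheaper at n = 11
assert closed_form_bound(p, 12) < ryser_cost(p, 12)   # closed form cheaper from n = 12
```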

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Properties of the bounds l ij and L ij
The following proposition states that the definition of I(α) in Definition 2 is consistent.

Proposition A1. Let α, a, l_ij(α, a) and L_ij(α, a) be as in Definition 2. If the entries a_hk of the vector a, h ≤ i, k ≤ j, (h, k) ≠ (i, j), satisfy the bounds l_hk(α, a) ≤ a_hk ≤ L_hk(α, a), then 0 ≤ l_ij(α, a) ≤ L_ij(α, a).

Proof. For simplicity, α and a are omitted. Obviously, for each i, j, we have l_ij ≥ 0 and so a_ij ≥ 0. If (i, j) = (1, 1), then l_11 = 0 ∨ (n_1 − ∑_{k=2}^p m_k) and L_11 = n_1 ∧ m_1, and the thesis follows straightforwardly. The case (i, j) ≠ (1, 1) is proved by induction.
Base case. The case L_1j = m_j implies l_1j ≤ L_1j; in fact, the inequality follows from the inductive hypothesis on a_11, . . . , a_{1,j−1}. Analogously, the relation between l_{i,1} and L_{i,1} can be shown.

•
Induction step. We show that l_ij ≤ L_ij, by assuming that l_hk ≤ a_hk ≤ L_hk for h < i, k ≤ p, so that l_{i−1,j} ≤ a_{i−1,j} ≤ L_{i−1,j}, and for h ≤ p, k < j, so that l_{i,j−1} ≤ a_{i,j−1} ≤ L_{i,j−1}. From these inequalities it follows that L_ij ≥ 0. Furthermore, from l_{i,j−1} ≤ a_{i,j−1} and from l_{i−1,j} ≤ a_{i−1,j}, it follows that l_ij ≤ L_ij, since L_ij ≥ 0.

Proposition A1 also holds when some of the values n_r and m_s are equal to zero.
Proof. For simplicity α and a are omitted.

1.
Let a_ij = L_ij = m_j − ∑_{h=1}^{i−1} a_hj. Then m_j = ∑_{h=1}^i a_hj. We show, by induction, that a_qj = 0 for q > i.

2.
Analogous to the previous item.

3.
Since a_ij = ∑_{h=1}^i n_h − ∑_{k=j+1}^p m_k − ∑_{h≤i, k≤j, (h,k)≠(i,j)} a_hk, then ∑_{h=1}^i n_h − ∑_{k=j+1}^p m_k = ∑_{h≤i, k≤j} a_hk, and so, for each t > j, Equation (A1) holds. We show by induction that a_it = m_t − ∑_{h=1}^{i−1} a_ht for each t > j.
• Induction step: let a_ik = m_k − ∑_{h=1}^{i−1} a_hk for j + 1 ≤ k < t. By Equation (A1), a_it ≥ L_it ≥ l_it, and so l_it = L_it; the thesis follows since l_it ≤ a_it ≤ L_it.
Analogously, we can show that a_qj = n_q − ∑_{k=1}^{j−1} a_qk for q > i.
Proof. We define the set I*_hk = {b | b_ij = a_ij for (i, j) ≠ (h, k); b_hk = a_hk + 1; a ∈ I(α − hk)}. We consider the bounds for b ∈ I*_hk. Using n_h − 1 and m_k − 1 instead of n_h and m_k, by direct computation we obtain the following conditions.