Capacity-Achieving Input Distributions of Additive Vector Gaussian Noise Channels: Even-Moment Constraints and Unbounded or Compact Support

We investigate the support of a capacity-achieving input to a vector-valued Gaussian noise channel. The input is subjected to a radial even-moment constraint and is either allowed to take any value in Rn or is restricted to a given compact subset of Rn. It is shown that the support of the capacity-achieving distribution is composed of a countable union of submanifolds, each with a dimension of n−1 or less. When the input is restricted to a compact subset of Rn, this union is finite. Finally, the support of the capacity-achieving distribution is shown to have Lebesgue measure 0 and to be nowhere dense in Rn.


Introduction
In this paper, we consider the support of the capacity-achieving input to a vector-valued channel that is subject to additive non-degenerate Gaussian noise. Vector-valued channels arise in a variety of applications, including the complex-valued inputs and outputs of quadrature channels, which have alternate representations as two-dimensional real vectors. Larger antenna arrays enable Multiple-Input Multiple-Output (MIMO) channels, whose inputs consist of n ≥ 1 complex components. Additionally, noise with memory can be expressed through correlated noise components in a vector-valued channel.
Throughout the paper, the average input power is bounded to limit the consumption of environmental, battery, and monetary resources. Since the output of the amplifiers used in transmitters is severely distorted when the input is too large [1][2][3] and signals that are too small can be challenging to produce, it is also of practical interest to restrict the input to an arbitrary compact set.
There has been appreciable prior effort dedicated to understanding the capacity-achieving input to vector-valued channels subject to average power constraints and restrictions to compact sets. Nevertheless, there are significant technical challenges in working with vector-valued inputs. Therefore, much of the work to this point has been limited to either one-dimensional channels [4] or spherically symmetric channels [5][6][7][8], where the latter case ensures that the capacity-achieving distribution can be expressed as a univariate function of the radius. However, this restriction limits the scope of study to channels in which the input is constrained to a ball and the noise components are independent and identically distributed. In this paper, the only assumption made on the Gaussian noise distribution is that it is non-degenerate. We consider both cases, those in which inputs are restricted to arbitrary compact sets and those in which inputs are allowed to take any value in R n .
The power of a vector-valued signal is equivalent to the second moment of its Euclidean norm. A constraint on the fourth moment then has the practical interpretation of limiting the second moment of the instantaneous power. Furthermore, imposing a moment constraint of order 2k ensures that the tails of the input distribution decay at least as quickly as a degree 2k monomial. Therefore, increasing the even-moment constraint penalizes large inputs without imposing a strict cutoff. This motivates us to generalize the average power constraint by limiting some even moment of the input's Euclidean norm.
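As a numerical illustration (the Gaussian example input, dimension, and sample size below are arbitrary choices, not part of the formal development), the sketch estimates E[||X||^{2k}] for several k; for a fixed input distribution, the even moment grows rapidly with k, so a fixed budget a bears mainly on the tail of ||X||.

```python
import numpy as np

# Illustration of the even-moment constraint E[||X||^{2k}] <= a: for a fixed
# input distribution, larger k weights the tail of ||X|| more heavily.
rng = np.random.default_rng(0)
n = 2
X = rng.standard_normal((100_000, n))   # X ~ N(0, I_2), so ||X||^2 is chi-squared(2)
r = np.linalg.norm(X, axis=1)           # Euclidean norms ||X||

# Empirical even moments E[||X||^{2k}] for k = 1, 2, 3; for a chi-squared
# variable with 2 degrees of freedom these are exactly 2^k * k! = 2, 8, 48.
moments = {k: float(np.mean(r ** (2 * k))) for k in (1, 2, 3)}
```

The rapid growth 2, 8, 48 shows why, under a fixed budget, increasing k forces faster tail decay without imposing a hard amplitude cutoff.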
The results in this paper apply to any combination of input constraints described above, except for the special case where the input is allowed to take any value in R n and is subject to a second-moment constraint. This case reduces to a classical result in which the capacity-achieving distribution is known to be Gaussian. For all other cases, we show that the support of the capacity-achieving distribution is contained in a countable union of i-dimensional submanifolds, where i ranges over {0, . . . , n − 1}. Furthermore, this union is finite when the input is restricted to a compact set. We then show that the support of the capacity-achieving distribution is a nowhere dense set with Lebesgue measure 0.
The paper is organized as follows. We first give a review of prior work in Section 2. Sections 3.1-3.3 provide intermediary results prior to the main results in Section 3.4. Section 4 concludes the paper.

Prior Work
Dating back to Shannon's work in [9], much of the research on continuous channels has focused on average power (equivalently, second-moment) constraints on the input. A transmitter's inability to produce arbitrarily large powers then led to the consideration of additional peak power constraints, modeled by restricting the input almost surely to compact sets.
The first major result on amplitude-constrained channels considers a scalar Additive White Gaussian Noise (AWGN) model, both with and without a variance constraint [4]. In each case, the support of the capacity-achieving distribution has a finite number of points.
The use of the Identity Theorem for functions of a single complex variable is key to the argument of [4] and many papers that follow. The theorem can be applied to any univariate analytic function that has an accumulation point of zeros. By contrast, the Identity Theorem in n complex dimensions requires an analytic function with an open set of zeros in C n . Therefore, to apply the Identity Theorem directly for n > 1, a random vector with support containing an open subset of C n must be considered. It was suspected by some authors that, since R n is not open in C n , no topological assumption on the support of the capacity-achieving distribution would be sufficient for this purpose [5,10,11]. Therefore, many papers restrict their models to ones that maintain spherical symmetry so that the capacity-achieving distribution can be expressed as a one-dimensional function of radius (e.g., Refs. [5][6][7][8]).
In [5] and [6], the results of [4] are extended to multivariate spherically symmetric channels, in n dimensions and 2 dimensions, respectively. In both papers, the inputs are subject to average and peak radial constraints. It is shown that it is optimal to concentrate the input on a finite number of concentric shells.
In [7] and [8], the number and positioning of optimal concentric shells under a peak radial constraint is studied. In [7], the least restrictive amplitude constraint for which the optimal distribution is concentrated on a single sphere is found. In [8], it is shown that the number of shells grows at most quadratically in the amplitude constraint. A similar result is found for n = 1 under an additional average power constraint.
While much of the prior work has focused on spherically symmetric channels, some research has considered spherically asymmetric channels. The case of inputs constrained to arbitrary compact sets and subject to a finite number of quadratic cost constraints as well as non-degenerate multivariate Gaussian noise is considered in [12]. It is concluded that the support of the capacity-achieving distribution must be "sparse"-that is, there must exist a not identically zero analytic function that is 0 on the support of the capacity-achieving distribution. Assuming otherwise leads to a contradiction by the n-dimensional Identity Theorem and Fourier analysis. These results, while quite general, do not consider either inputs of unbounded support or inputs subject to higher-moment constraints. Furthermore, outside of the special cases of n = 1 and spherically symmetric channels, they do not explore a characterization of sparse sets in R n .
In [13], MIMO channels with inputs that are restricted to compact sets, yet have no average power constraints, are considered. Using the Real-Analytic Identity Theorem and steps similar to [4], it is determined that the support of the optimal input distribution is nowhere dense in R n and has Lebesgue measure 0. For the case considered in [12] that coincides with this setup, [13] gives an instance of sparsity in terms of subsets of R n , rather than analytic functions.
There has also been work dedicated to generalizing the classic quadratic average cost constraint. In [14], a scalar channel with the input subject to a combination of even-moment constraints and restrictions to compact or non-negative subsets of R is studied. It is shown that, in most of the cases considered, the support has a finite number of points.
In [15], a complex-valued non-dispersive optical channel is considered, where the input is subject to an average cost that grows super-quadratically in radius, a peak constraint, or both. The noise is taken to be circularly symmetric and, under these conditions, so is the optimal input. The number of concentric circles composing the support of the distribution is shown to be finite.
In this paper, we study an n-dimensional channel subject to non-degenerate Gaussian noise. The input can either take any value in R n or is restricted to a compact subset of R n , and its norm is subject to even-moment constraints. The noise need not be spherically symmetric.
This paper gives a characterization of the capacity-achieving distribution to spherically asymmetric channels under peak and average power constraints that improves on prior work in three respects. Firstly, when our cases overlap with [12], our characterization of the capacity-achieving distribution is more detailed than the notion of sparsity used there. Secondly, our results apply to multivariate channels with inputs subject to even-moment constraints greater than 2. Thirdly, we consider both inputs that are restricted to compact sets and those that are allowed to take any value in R n .

Results
In this section, we consider R n -valued inputs subject to additive non-degenerate multivariate Gaussian noise. In Section 3.1, the capacity-achieving distribution, F * , is formulated as the objective of an optimization problem; its support is then framed in terms of the zero set of a certain real-analytic function, which is dependent on F * and referred to as s(·; F * ). Section 3.2 finds an equivalent expression for s(·; F * ), which is an intermediary step to showing in Section 3.3 that s(·; F * ) is non-constant. Section 3.4 uses the result that s(·; F * ) is non-constant to show that the support of the capacity-achieving distribution is contained in a countable union of submanifolds of dimensions in the range {0, . . . , n − 1}. This union is finite when the input is constrained to a compact subset of R n . It is then shown that the support of the capacity-achieving input has Lebesgue measure 0 and is nowhere dense in R n .
Appendix A is dedicated to showing the convexity and compactness of the optimization space used in Section 3.1. Appendix B establishes a pointwise characterization of supp(F*), which justifies the definition of s(·; F*). Appendix C provides integrability results, which are used throughout the paper. Appendix D shows that the objective functional is weakly continuous, strictly concave, and weakly differentiable on the optimization space. Appendix E shows that s(·; F*) has an analytic extension to C n . Finally, Appendix F supports Section 3.3 by finding bounds for certain functions.
As a first step towards defining the set of feasible input distributions, let F (R n ) be the set of finite Borel measures on R n . Note that F (R n ) is contained in the set of finite signed Borel measures on R n , which has an intrinsic vector space structure and can be equipped with a norm [16]. Since F (R n ) lies within a normed vector space, the convexity and compactness of its subsets can be discussed.
The possibility that the transmitter is unable to produce arbitrary signals in R n is modeled by restricting the input to an alphabet A ⊆ R n . Denote the set of distributions for which the associated random variable is almost surely in A by F n (A). Two cases for A are considered: either A = R n , or A ⊊ R n is compact.
In addition to the restriction to A, a radial even-moment constraint is associated with the input. For k ∈ Z_{>0} and a > 0, the input must belong to the set

P_n(A, k, a) = { F ∈ F n (A) | E_{X∼F}[ ||X||^{2k} ] ≤ a }.    (2)

The resulting channel model, with input X ∼ F ∈ P_n(A, k, a), is

Y = AX + N,    (3)

where Y and N ∼ N(0, Σ) are the output and noise, respectively, and A is an invertible matrix known to the transmitter and receiver. It is assumed that the noise covariance matrix Σ is positive-definite. We will simplify the analysis of (3) by showing that no generality is lost in assuming that A = I_n, the n-dimensional identity matrix, and that Σ is diagonal. Since Σ is positive-definite and A is invertible, the positive-definite matrix A^{−1} Σ (A^{−1})^T can be diagonalized by an orthogonal matrix Q. Now, multiplying the output Y in (3) by Q A^{−1}, the receiver obtains

Ỹ = Q A^{−1} Y = X̃ + Ñ,

where X̃ = QX and the covariance matrix of Ñ = Q A^{−1} N is diagonal. Since Q A^{−1} is invertible, I(X; Y) = I(X̃; Ỹ). Furthermore, since Q is orthogonal, ||X̃|| = ||X||, and the set {Qx | x ∈ A} is merely a rotated version of A. Hence, no generality is lost by dropping the "~" and adopting the following channel model for the remainder of the paper:

Y = X + N,    (7)

where N ∼ N(0, Σ) and Σ is diagonal with entries 0 < σ²_1 ≤ σ²_2 ≤ . . . ≤ σ²_n. The density of N is denoted by p_N(·).
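The normalization step can be checked numerically. The sketch below uses an arbitrary invertible channel matrix A and positive-definite Σ (both example values), diagonalizes A^{−1} Σ (A^{−1})^T with an orthogonal Q obtained from NumPy's symmetric eigendecomposition, and verifies that the transformed noise Q A^{−1} N has diagonal covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n)) + 3.0 * np.eye(n)   # an invertible channel matrix (example)
B = rng.standard_normal((n, n))
Sigma = B @ B.T + np.eye(n)                          # a positive-definite noise covariance

Ainv = np.linalg.inv(A)
M = Ainv @ Sigma @ Ainv.T                            # A^{-1} Sigma (A^{-1})^T, positive-definite
eigvals, eigvecs = np.linalg.eigh(M)                 # M = eigvecs @ diag(eigvals) @ eigvecs.T
Q = eigvecs.T                                        # orthogonal rotation

# Covariance of the transformed noise Q A^{-1} N is Q M Q^T, which is diagonal:
Sigma_tilde = Q @ M @ Q.T
off_diag = Sigma_tilde - np.diag(np.diag(Sigma_tilde))
```

Since Q A^{−1} is invertible, the receiver loses no information by applying it, which is why the simplified model is without loss of generality.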

Optimization Problem
By Theorem 3.6.2 of [17], the capacity of the channel in (7) is given by the optimization problem

C = sup_{F ∈ P_n(A,k,a)} I(X; Y).    (8)

Since the relationship between Y, X, and N is known, the mutual information is a function of the distribution of X alone. Thus, the mutual information induced between X ∼ F and Y will be denoted by I(F). Similarly, we express the even-moment constraint in terms of a functional g_k : F n (A) → R ∪ {∞} given by

g_k(F) = E_{X∼F}[ ||X||^{2k} ] − a,    (9)

where g_k(F) ≤ 0 is equivalent to E_{X∼F}[ ||X||^{2k} ] ≤ a. Rewriting (8) in terms of I(·) and g_k(·) yields

C = sup_{F ∈ P_n(A,k,a)} I(F).    (10)

Much of the appendix is dedicated to understanding properties of the problem presented in (10). It is shown in Theorem A1 that P_n(A, k, a) is convex and compact. Furthermore, by Theorems A3 and A4, I(·) is a weakly continuous and strictly concave function on P_n(A, k, a). Therefore, the supremum is achieved by a unique input distribution F* ∈ P_n(A, k, a) (see, e.g., Appendix C of [14]); that is,

F* = arg max_{F ∈ P_n(A,k,a)} I(F).    (11)

We use the notation X* to describe a capacity-achieving input directly (i.e., X* ∼ F*). Before proceeding, we require some definitions and notations. In the first definition, and throughout the paper, for x ∈ R n and r > 0, we denote the open ball of radius r centered at x by B_r(x) ⊆ R n . We will denote the closure of B_r(x) by B̄_r(x). The output density induced by an input X ∼ F is denoted p(·; F).

Definition 1.
Let V be a random variable with alphabet A ⊆ R n . Then, the support of V is the set given by

supp(V) = { x ∈ R n | P(V ∈ B_r(x)) > 0 for all r > 0 }.    (12)

If V has distribution F_V, we may alternatively refer to supp(F_V) ≜ supp(V).

Definition 2.
For F ∈ F n (A), the output differential entropy is given by

h(F) = − ∫_{R^n} p(y; F) ln p(y; F) dy    (13)

and the marginal entropy density at x ∈ R n is given by

h(x; F) = − ∫_{R^n} p_N(y − x) ln p(y; F) dy,    (14)

whenever the integrals exist.
The relationship between the differential entropy and the marginal entropy density can be seen as follows. For any b > 0 and F_1, F_2 ∈ P_n(A, k, b), since p(y; F_2) = ∫_{R^n} p_N(y − x) dF_2(x), exchanging the order of integration gives

∫_{R^n} h(x; F_1) dF_2(x) = − ∫_{R^n} p(y; F_2) ln p(y; F_1) dy.

In particular, taking F_2 = F_1 recovers h(F_1).
Lastly, define

Q_n(A, k, a) = ∪_{b ≥ a} P_n(A, k, b)

and let s(·; F*) : R n → R be given by

s(x; F*) = h(x; F*) − h(F*) − γ ( ||x||^{2k} − a ),

where γ ≥ 0 is the Lagrange multiplier introduced in (22) below. Since F* ∈ P_n(A, k, a), we have that h(x; F*) is finite for all x ∈ R n and conclude that s(x; F*) is also finite for all x ∈ R n . Furthermore, by Lemma A8, s(·; F*) can be extended to a complex analytic function on all of C n ; hence, it is continuous. The remainder of Section 3.1 consists of two steps: 1.
We show that F* solves (11) if and only if there exists γ ≥ 0 such that

∫_{R^n} s(x; F*) dF(x) ≤ 0    (22)

for all F ∈ Q_n(A, k, a).

2.
We show that, for the choice of γ in (22), the inequality in (22) is equivalent to the condition that

h(x; F*) ≤ h(F*) + γ ( ||x||^{2k} − a ) for all x ∈ A,    (23)

and, if x ∈ supp(F*), then

h(x; F*) = h(F*) + γ ( ||x||^{2k} − a ).    (24)

For x ∈ A, (24) is satisfied if and only if s(x; F*) = 0. To establish (22), we will first use a Lagrange multiplier to reformulate (11) as an unconstrained problem over Q_n(A, k, a). We will then obtain (22) by taking the weak derivative of the resulting objective functional and applying a Karush-Kuhn-Tucker condition for optimality. We choose to work in the space Q_n(A, k, a) since, when A = R n , the functionals I(·) and g_k(·) are not weakly differentiable on the larger space F n (A) = F n (R n ). Problem (11) can equivalently be written as

F* = arg max_{F ∈ Q_n(A,k,a) : g_k(F) ≤ 0} I(F),

where F* is the same as in (11). By Theorem A5, g_k(·) is convex. Moreover, letting F_s ∈ Q_n(A, k, a) be a Heaviside step function at 0 ∈ R n , F_s is an interior point of the feasible region. Therefore, there exists γ ≥ 0 such that

F* = arg max_{F ∈ Q_n(A,k,a)} J_γ(F), where J_γ(F) = I(F) − γ g_k(F),

and γ g_k(F*) = 0 (see, e.g., Appendix C of [14]). Furthermore, for an arbitrary b ≥ a, F* ∈ P_n(A, k, b) ⊆ Q_n(A, k, a). Therefore, for this choice of γ, we also have

F* = arg max_{F ∈ P_n(A,k,b)} J_γ(F).

By Theorems A5 and A6, J_γ(·) has a weak derivative at F* in the direction of any F ∈ P_n(A, k, b), given by

∂J_γ(F*; F) = ∫_{R^n} h(x; F*) dF(x) − h(F*) − γ ( g_k(F) − g_k(F*) ),

where we have used I(F) = h(F) − h(N) and the fact that the differential entropy of the noise, h(N) = (1/2) ln |2πeΣ|, is finite since Σ is positive-definite. Since γ g_k(F*) = 0, the weak derivative can be written as ∂J_γ(F*; F) = ∫_{R^n} s(x; F*) dF(x). Now, J_γ(·) is the difference between a strictly concave function (see Theorem A4) and a convex function (due to Theorem A5 and the non-negativity of γ). Therefore, J_γ(·) is strictly concave and F* is optimal if and only if, for all F ∈ P_n(A, k, b), ∂J_γ(F*; F) ≤ 0. However, b ≥ a is arbitrary and each F ∈ Q_n(A, k, a) satisfies F ∈ P_n(A, k, b) for some b ≥ a. Therefore, F* is optimal if and only if, for all F ∈ Q_n(A, k, a),

∫_{R^n} s(x; F*) dF(x) ≤ 0,    (34)

which is the statement we sought to show in (22). The condition (34) is on the capacity-achieving distribution F* itself. Since our objective is to characterize supp(F*), we find an equivalent condition to (34) to describe the behavior of F* at individual points in the input alphabet A.
Thus, by (34) and Theorem A2, for all x ∈ A,

s(x; F*) ≤ 0,    (35)

and, if x ∈ supp(F*), then

s(x; F*) = 0.    (36)

The rest of Section 3 is dedicated to exploiting the relationship

supp(F*) ⊆ Z(s; F*) ∩ A,    (37)

where Z(s; F*) is the zero set of s(·; F*).

Hilbert Space and Hermite Polynomial Representation
In this subsection, an equivalent expression for (36) is found by viewing the integral as an inner product in a Hilbert space and writing ln p(·; F * ) in terms of a Hermite polynomial basis for that space. Hermite polynomial bases are well-suited to analysis of Gaussian noise channels and they have been used in a number of information-theoretic papers (see, e.g., Refs. [14,19]).
Consider the Hilbert space

L²_{p_N}(R^n) = { f : R^n → R | ∫_{R^n} f(y)² p_N(y) dy < ∞ },

equipped with inner product

⟨ f_1, f_2 ⟩_{L²_{p_N}(R^n)} = ∫_{R^n} f_1(y) f_2(y) p_N(y) dy.

The inner product's subscript is omitted when the space can be inferred. Since the components of N are independent, with N_i having variance σ²_i > 0, the density of N factors into

p_N(y) = ∏_{i=1}^{n} p_{N_i}(y_i).

We will construct an orthogonal basis for L²_{p_N}(R^n) from orthogonal bases for the spaces L²_{p_{N_i}}(R). These bases are built from the Hermite polynomials H_m(·) [20], which are defined through the generating function

e^{xt − t²/2} = Σ_{m=0}^{∞} H_m(x) t^m / m!.

For any m ∈ Z_{≥0}, the mth Hermite polynomial has degree m and a positive leading coefficient. Next, for each i ∈ {1, . . . , n} and m_i ∈ Z_{≥0}, define the stretched Hermite polynomials H_{m_i}(y_i; σ²_i), obtained from H_{m_i}(y_i / σ_i) by normalizing to unit norm in L²_{p_{N_i}}(R). The stretched Hermite polynomials are pairwise orthogonal with respect to the inner product of L²_{p_{N_i}}(R); consequently, {H_m(·; σ²) | m ∈ Z^n_{≥0}} forms an orthonormal basis for L²_{p_N}(R^n) [21], where

H_m(y; σ²) = ∏_{i=1}^{n} H_{m_i}(y_i; σ²_i).

Since, by Lemma A2, ln p(·; F*) ∈ L²_{p_N}(R^n), it can be expanded in this basis as

ln p(y; F*) = Σ_{m ∈ Z^n_{≥0}} c_m H_m(y; σ²),    (47)

where equality is in an L²_{p_N}(R^n) sense. Then, substituting (47) and using the shorthand notations introduced for m ∈ Z^n_{≥0} and x ∈ R^n (in particular, the constants ρ_m), we write ln p(·; F*) in the form (49) and obtain the expansion (55). Substituting (49) and (55) into the integral term in (36) yields the polynomial expression (59). This simplification to a polynomial will be helpful since the cost function associated with the even-moment constraint is also a polynomial. This relationship is exploited in Section 3.3.
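The orthonormality claim can be checked numerically with NumPy's probabilists' Hermite polynomials He_m. The stretched, normalized polynomials He_m(y/σ)/√(m!) are used below as a stand-in for H_m(·; σ²); the exact normalization convention of the text is an assumption here.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from numpy.polynomial.hermite_e import hermeval
from math import factorial, sqrt, pi

sigma = 1.7                          # arbitrary noise standard deviation
t, w = hermgauss(60)                 # Gauss-Hermite nodes/weights for weight e^{-t^2}
y = sigma * sqrt(2.0) * t            # substitution y = sigma * sqrt(2) * t

def He(m, x):
    """Probabilists' Hermite polynomial He_m evaluated at x."""
    c = np.zeros(m + 1)
    c[m] = 1.0
    return hermeval(x, c)

def inner(a, b):
    """<He_a(./sigma), He_b(./sigma)> under the N(0, sigma^2) density."""
    return np.sum(w * He(a, y / sigma) * He(b, y / sigma)) / sqrt(pi)

# Gram matrix of the normalized polynomials He_m(./sigma)/sqrt(m!):
G = np.array([[inner(a, b) / sqrt(factorial(a) * factorial(b))
               for b in range(5)] for a in range(5)])
# G is numerically the 5x5 identity matrix.
```

The substitution y = σ√2 t maps the Gaussian weight onto the Gauss-Hermite weight e^{-t²}, so the 60-node quadrature is exact for these polynomial integrands.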

Non-Constancy of s(·; F * )
Recall the relationship supp(F*) ⊆ Z(s; F*) ∩ A from (37). Since supp(F*) ≠ ∅, s(·; F*) has at least one zero; hence, it is constant if and only if Z(s; F*) = R n . This subsection is dedicated to showing that Z(s; F*) ≠ R n . The immediate implication is that supp(F*) is a strict subset of R n ; however, it is the fact that s(·; F*) is a non-zero real-analytic function that will be used in Section 3.4 to prove the main results.
By way of contradiction, suppose that s(x; F*) = 0 for all x ∈ R n . Substituting (59) into (36), this assumption is equivalent to the identity (61) holding for all x ∈ R n . The discussion proceeds in two cases: k = 1 and k > 1.
Case k = 1: In the case that k = 1 and A = R n , X * is known to be Gaussian [22] and there is no contradiction with (61). Therefore, for k = 1, we focus only on compact input alphabets A R n .
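The exclusion of A = R n with k = 1 can be seen numerically in the scalar case n = 1: with X* ~ N(0, a), the function s(·; F*) vanishes identically, so its zero set is all of R and no sparsity conclusion is possible. The sketch below assumes the Smith-type form s(x; F*) = h(x; F*) − h(F*) − γ(x² − a) with γ = 1/(2(a + σ²)); the explicit form and the multiplier value are assumptions for illustration, not quotations from the text.

```python
import numpy as np

a, sig2 = 1.5, 0.7                  # moment budget and noise variance (arbitrary)
v = a + sig2                        # output variance: Y* ~ N(0, a + sig2)

y = np.linspace(-30.0, 30.0, 400_001)
dy = y[1] - y[0]
p_out = np.exp(-y**2 / (2 * v)) / np.sqrt(2 * np.pi * v)   # output density p(y; F*)
h_out = -np.sum(p_out * np.log(p_out)) * dy                # output entropy h(F*)

def h_marginal(x):
    """Marginal entropy density h(x; F*) = -int p_N(y - x) ln p(y; F*) dy."""
    pN = np.exp(-(y - x)**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)
    return -np.sum(pN * np.log(p_out)) * dy

gamma = 1.0 / (2.0 * v)
s = np.array([h_marginal(x) - h_out - gamma * (x**2 - a)
              for x in (-2.0, 0.0, 1.0, 3.0)])
# s is numerically ~0 at every test point: the zero set is all of R.
```

The quadratic terms in h(x; F*) are exactly canceled by γx², which is the mechanism behind the "no contradiction" observation for k = 1 on the whole space.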
With k = 1 and A ⊊ R n , (61) reduces to (62), which holds for all x ∈ R n . Let e_i be the ith row of the n × n identity matrix and let 0 ∈ Z^n_{≥0} be the all-zero vector. Since (62) holds for all x ∈ R n , the coefficients on each side can be matched. Since, for each i ∈ {1, . . . , n} and m ∈ Z^n_{≥0}, H_{m_i}(y_i; σ²_i) has degree m_i and a positive leading coefficient, H_m(y; σ²) also has degree m_i in y_i, and its unique term of total degree d(m) = m_1 + . . . + m_n is a positive multiple of y_1^{m_1} · · · y_n^{m_n}. Therefore, the polynomials present in the sum are of the form H_0(y; σ²) = κ_0 and, for i ∈ {1, . . . , n},

H_{2e_i}(y; σ²) = κ_{2e_i} y_i² + α_{2e_i} y_i + β_{2e_i}.

The constants κ_0 and κ_{2e_i} are positive, while α_{2e_i} and β_{2e_i} are real. Substituting this and the identity c̄_m = c_m ρ_m into (49) shows that ln p(y; F*) is a quadratic polynomial that separates across the components y_1, . . . , y_n; equivalently, p(y; F*) is a product of univariate Gaussian densities. By definition, γ ≥ 0; however, γ = 0 would result in a constant density on R n , which is invalid, so it must be the case that γ > 0. Thus, the output achieved by X*, namely Y* = X* + N, has independent Gaussian components. Since X* and N are independent and N is an n-variate Gaussian random variable, X* must either be an n-variate Gaussian random variable or be almost surely equal to some x_0 ∈ R n . In the former case, X* violates the stipulation that the input alphabet A is compact, contradicting the assumption that s(·; F*) is identically 0. In the latter case, supp(F*) = {x_0} is trivial and satisfies the main results of the paper.
Case k > 1: For the case k > 1, we derive a contradiction to (61) using results on the rate of decay of a function compared with that of its Fourier transform to conclude that s(·; F * ) is not identically 0.

Lemma 1.
Let U ∈ R^n have, for some β > 0, a characteristic function satisfying

|φ_U(ω)| ≤ e^{−β ||ω||² / 2}

for all ω ∈ R^n. Let V be a random variable independent of U. Then, the characteristic function of W = U + V satisfies

|φ_W(ω)| ≤ e^{−β ||ω||² / 2}    (71)

for all ω ∈ R^n.

Proof. By the independence of U and V, |φ_W(ω)| = |φ_U(ω)| |φ_V(ω)| ≤ |φ_U(ω)|, using the fact that characteristic functions have pointwise moduli upper-bounded by 1.

Lemma 2. Let U ∈ R^n have, for some constant β > 0, a characteristic function satisfying

|φ_U(ω)| ≤ e^{−β ||ω||² / 2}    (75)

for all ω ∈ R^n. Let V be a random variable independent of U and let W = U + V have density p_W(·). If there exist positive constants α and K such that, for all x ∈ R^n,

p_W(x) ≤ K e^{−α ||x||²},    (76)

then αβ ≤ 0.5.
Proof. Apply Lemma 1 and Theorem 4 of [23], noting that an identically 0 function cannot be a density.
We make use of Lemma 2 by setting U = N, V = X*, and W = Y*, and deriving a contradiction to the assumption that s(·; F*) is identically 0. Note that, using Rayleigh quotients, the modulus of the characteristic function of N can be upper-bounded for any ω ∈ R^n as

|φ_N(ω)| = e^{−ω^T Σ ω / 2} ≤ e^{−σ²_1 ||ω||² / 2}.    (77)

That is, the characteristic function of N satisfies (75) with β = σ²_1. To complete the contradiction, we show that there exist α > 0.5/β and K_α > 0 such that p(·; F*) satisfies the bound in (76). The assumption that s(·; F*) is identically 0 yields (61); substituting the Multinomial Theorem expansion of ||x||^{2k} = (x_1² + . . . + x_n²)^k yields (78). By coefficient matching in (78), the set of non-zero coefficients, other than c_0, is indexed by the set M given in (79). Furthermore, for m ∈ M, the coefficients satisfy (80). Therefore, substituting (80) into (49) yields (81) for some positive constant κ_0 = H_0(y; σ²). As with the case k = 1, γ = 0 results in a constant output density over R n and can be disregarded as a possibility. Thus, for each m ∈ M, we have c_m < 0.
With W = Y* and α > 0, showing that there exists K_α for which (76) holds is equivalent to showing that p(y; F*) e^{α ||y||²} is bounded. This, in turn, is equivalent to showing that the polynomial in the exponent,

q_α(y) = ln p(y; F*) + α ||y||²,    (84)

is upper-bounded. We proceed by considering the degrees of the terms of q_α(y) to determine the behavior of (84) as ||y|| increases. For each m ∈ Z^n_{≥0} and i ∈ {1, . . . , n}, H_m(y; σ²) has degree m_i in y_i. Furthermore, H_m(y; σ²) has total degree d(m) and the unique highest-degree term, y_1^{m_1} · · · y_n^{m_n}, has coefficient κ_m > 0. Note that, since k > 1 and by the definition of M, q_α(·) has total degree 2k ≥ 4. Hence, (84) can be rewritten as

q_α(y) = q_α^{(0)}(y) + q_α^{(2)}(y),

where q_α^{(0)}(y) collects the terms of total degree 2k and q_α^{(2)}(y) is the sum of the remaining terms, each with a total degree of at most 2k − 2. Note the following:

• For each y ∈ R n , q_α^{(0)}(y) ≤ 0 and, by Lemma A9, the minimal value of |q_α^{(0)}(y)|, evaluated on a sphere ||y|| = L of radius L ≥ 0, is at least γ min_{i∈{1,...,n}} {ρ_{2ke_i} κ_{2ke_i}} L^{2k} / n^k.

• The maximum value of |q_α^{(2)}(y)|, evaluated on a sphere ||y|| = L of radius L ≥ 1, is at most A L^{2k−2} for some A > 0; that is, each term of q_α^{(2)}(y) is either of the form α y_i² or c_m ρ_m ν_{m,l} y^l for some m ∈ M, l ∈ Z^n_{≥0}, and ν_{m,l} ∈ R, where d(l) ≤ 2k − 2. Lemma A10 shows that the moduli of these terms are no more than α L² or |c_m ρ_m ν_{m,l}| L^{d(l)}, respectively.
We conclude that, since q_α^{(0)}(·) dominates q_α^{(2)}(·) for large ||y||,

lim_{||y||→∞} q_α(y) = −∞.    (90)

Thus, since q_α(y) is a continuous function that satisfies (90), it is bounded from above. Let M_{q,α} = sup_{y∈R^n} q_α(y) and

K_α = e^{M_{q,α}}.    (91)

Then, for all y ∈ R n ,

p(y; F*) ≤ K_α e^{−α ||y||²}.    (92)
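The domination argument can be illustrated with a toy one-variable polynomial (the coefficients are arbitrary): a negative leading term of total degree 2k = 4 outweighs the remaining terms of degree at most 2k − 2 = 2, so the polynomial is bounded above and tends to minus infinity at large radius.

```python
import numpy as np

# Toy instance of the degree-domination argument: q(y) = -y^4 + 5y^2 + 3y.
# The leading term -y^4 plays the role of q_alpha^{(0)} and the rest plays
# the role of q_alpha^{(2)}.
def q(y):
    return -y**4 + 5.0 * y**2 + 3.0 * y

y = np.linspace(-50.0, 50.0, 1_000_001)
M_q = np.max(q(y))            # finite supremum, attained on a compact region

# Far from the origin, the polynomial is far below its maximum:
far = np.abs(y) >= 10.0
gap = M_q - np.max(q(y[far]))
```

The same mechanism in n variables gives the upper bound M_{q,α} and hence the Gaussian-type density bound above.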
Recall that, with σ 2 1 > 0, the smallest eigenvalue of Σ, and β = σ 2 1 , the characteristic function of N satisfies (75). Let α = 1/σ 2 1 and choose K α according to (91). Then, p(y; F * ) satisfies (92), yet αβ = 1 > 0.5. Hence, the bound on the characteristic function of N given by (77) and the bound on the density of Y * given in (92) contradict Lemma 2. Therefore, the coefficient matching equation (78) cannot hold for all x ∈ R n and we conclude that, for k > 1, s(·; F * ) cannot be identically 0 on R n .
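The Rayleigh-quotient step can be checked numerically. The sketch below uses an arbitrary diagonal Σ and the standard Gaussian characteristic function exp(−ω^T Σ ω / 2); the bound with β equal to the smallest diagonal entry holds at every sampled frequency.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmas2 = np.array([0.5, 1.0, 2.0])        # sigma_1^2 <= sigma_2^2 <= sigma_3^2
Sigma = np.diag(sigmas2)
beta = sigmas2[0]                          # smallest eigenvalue sigma_1^2

omegas = rng.standard_normal((1000, 3)) * 3.0
quad = np.einsum('ij,jk,ik->i', omegas, Sigma, omegas)   # omega^T Sigma omega
lhs = np.exp(-0.5 * quad)                                # |phi_N(omega)|
rhs = np.exp(-0.5 * beta * np.sum(omegas**2, axis=1))    # bound with beta = sigma_1^2
# Since omega^T Sigma omega >= beta * ||omega||^2, lhs <= rhs everywhere.
```

Because exp is decreasing in its (negated) argument, the quadratic-form inequality transfers directly to the moduli of the characteristic functions.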
We summarize the results of the two cases, k = 1 and k > 1, in a theorem.

Theorem 1. Suppose that either
1. A ⊊ R n is compact, or
2. A = R n , with k > 1.
Then, either supp(F*) = {x_0} for some x_0 ∈ R n or s(·; F*) is not identically 0 on R n .
An immediate consequence of Theorem 1 is that supp(F * ) is a strict subset of R n . Recall from Section 3.1 that s(·; F * ) has an analytic extension to C n . Therefore, Theorem 1 shows that supp(F * ) is "sparse" in the sense used by [12]-that is, there exists a non-zero function with an analytic extension to C n that is zero on supp(F * ). However, the primary importance of Theorem 1 is as an intermediary result that is used in Section 3.4 to obtain a better understanding of the structure of supp(F * ).

Main Results
In this section, we use geometry to show that supp(F*) is contained in a countable disjoint union of submanifolds of dimensions ranging over 0, . . . , n − 1. Furthermore, this union is finite when A is compact. We then show that supp(F*) has Lebesgue measure 0 and is nowhere dense in R n .
The discussions in this section consider subsets of a vector's components; so, for x ∈ R n and i ∈ {1, . . . , n}, we introduce the notation

x^i = (x_1, . . . , x_i).

Recall from (37) in Section 3.1 that

supp(F*) ⊆ Z(s; F*) ∩ A.

Since, by Lemma A8, s(·; F*) has an analytic extension to C n , it is real-analytic, which motivates us to study the geometry of zero sets of real-analytic functions. We start by restating Theorem 6.3.3 of [24] at the level that is needed in this paper.
Theorem 2 (Structure Theorem). Let ψ(·) : R^n → R be a real-analytic function, where ψ(0, . . . , 0, x_n) is not identically 0 in x_n. After a rotation of the coordinates x_1, . . . , x_{n−1}, there exist constants δ_m, m ∈ {1, . . . , n}, defining the neighborhoods

Q_i = (−δ_1, δ_1) × . . . × (−δ_i, δ_i), i ∈ {1, . . . , n},

such that, with Z(ψ) denoting the zero set of ψ(·), we have

Z(ψ) ∩ Q_n = V_0 ∪ V_1 ∪ . . . ∪ V_{n−1},    (97)

where V_0 is either empty or contains only the origin and V_i, i ∈ {1, . . . , n − 1}, is a finite disjoint union of i-dimensional submanifolds; that is, for each i ∈ {1, . . . , n − 1}, there exists n_i for which

V_i = Γ^1_i ∪ . . . ∪ Γ^{n_i}_i,

where each Γ^j_i is an i-dimensional submanifold. Furthermore, there exists an open set Ω^j_i ⊆ Q_i and real-analytic functions α^{j,i+1}_i(·), . . . , α^{j,n}_i(·) on Ω^j_i such that each Γ^j_i is the graph

Γ^j_i = { (x^i, α^{j,i+1}_i(x^i), . . . , α^{j,n}_i(x^i)) | x^i ∈ Ω^j_i }.    (100)

We apply Theorem 2 to characterize the zero set of s(·; F*) in the form of (97) and obtain the following result.

Theorem 3. Suppose that either
1. A ⊊ R n is compact, or
2. A = R n , with k > 1.
Then,

supp(F*) ⊆ T_0 ∪ T_1 ∪ . . . ∪ T_{n−1},

where T_0 is a countable union of isolated points and T_i, i ∈ {1, . . . , n − 1}, is a countable disjoint union of i-dimensional submanifolds. Furthermore, if A is compact, these unions are finite.
Proof. First, note that, by Theorem 1, either supp(F*) = {x_0} for some x_0 ∈ R n or s(·; F*) is not identically 0 on R n . In the former case, the result is trivially true; so, assume that s(·; F*) is not identically 0 on R n . Then, for any q ∈ Q^n (the points with rational coordinates), we can translate s(·; F*) by q and rotate its coordinate system so that Theorem 2 applies; that is, there exists a sufficiently small open set Q_q around q such that

Z(s; F*) ∩ Q_q = V^q_0 ∪ V^q_1 ∪ . . . ∪ V^q_{n−1},

where the V^q_i are as in Theorem 2.
Since the open sets {Q_q | q ∈ Q^n} cover R n , there is a countable index set M ⊆ Q^n for which {Q_q | q ∈ M} covers supp(F*); when A is compact, M can be taken to be finite. We obtain

supp(F*) ⊆ ∪_{q ∈ M} ( V^q_0 ∪ V^q_1 ∪ . . . ∪ V^q_{n−1} ).

Since, for each q ∈ M, V^q_0 is either empty or a single point,

T_0 = ∪_{q ∈ M} V^q_0

is a countable set of points and is finite when A is compact. Furthermore, each V^q_i, where i ∈ {1, . . . , n − 1}, is itself a finite union of i-dimensional submanifolds. Hence,

T_i = ∪_{q ∈ M} V^q_i

is a countable union of i-dimensional submanifolds. When A is compact, this union is also finite.
Note that Theorem 3 agrees with the results of [5] when the cases overlap. Indeed, consider the case in which A is a ball centered at the origin, k = 1, and the noise covariance matrix is Σ = tI_n, where I_n is the n × n identity matrix and t > 0. Then, [5] shows that the capacity-achieving distribution is supported on a finite number of concentric (n − 1)-spheres. Each (n − 1)-sphere is an (n − 1)-dimensional submanifold.
In the next two theorems, we show that supp(F*) has Lebesgue measure 0 and is nowhere dense in R n .

Theorem 4. Suppose that either
1. A ⊊ R n is compact, or
2. A = R n , with k > 1.
Let µ(·) denote the n-dimensional Lebesgue measure. Then, µ(supp(F*)) = 0.

Proof. By Theorem 3, we have

supp(F*) ⊆ ∪_{q ∈ M} ( V^q_0 ∪ V^q_1 ∪ . . . ∪ V^q_{n−1} ),

where M is countable. Note that, for each q ∈ M, V^q_0 is either empty or a single point; so, µ(V^q_0) = 0. Furthermore, for each q ∈ M and i ∈ {1, . . . , n − 1}, V^q_i is a finite disjoint union of n^q_i i-dimensional submanifolds and, for i ≤ n − 1, each such submanifold has Lebesgue measure 0. Therefore, µ(supp(F*)) = 0.
We will now define the notion of a subset being nowhere dense in its superset and show that supp(F*) is nowhere dense in R n . A set S ⊆ R n is nowhere dense in R n if, for every non-empty open set U ⊆ R n , S ∩ U is not dense in U.

Theorem 5. Suppose that either
1. A ⊊ R n is compact, or
2. A = R n , with k > 1.
Then, supp(F * ) is nowhere dense in R n .

Proof. By Theorem 1, either supp(F*) = {x_0} for some x_0 ∈ R n or s(·; F*) is not identically 0 on R n . Since, in the former case, supp(F*) is nowhere dense in R n , assume the latter and let (115) hold. Let U ⊆ R n be a non-empty open set; we will show the result by proving that Z(s; F*) ∩ U is not dense in U.
Fix x ∈ U. Translating s(·; F*) by x, rotating the coordinate system, and applying Theorem 2 shows that there exists a sufficiently small open set Q containing x on which the decomposition of Theorem 2 holds. It suffices to show the existence of a point of the form (x^{n−1}, u_n) ∈ U ∩ Q that is not the limit of any sequence in Z(s; F*) ∩ U ∩ Q. Let (y^{n−1}_m, y_{n,m}) be a convergent sequence in Z(s; F*) ∩ U ∩ Q, indexed by m, for which

lim_{m→∞} y^{n−1}_m = x^{n−1}.    (118)

Using the parameterization from (100), the nth component of the sequence at index m either corresponds to a point of V_0 or satisfies, for some i_m ∈ {1, . . . , n − 1} and j_m ∈ {1, . . . , n_{i_m}},

y_{n,m} = α^{j_m,n}_{i_m}(y^{i_m}_m).    (119)

Since each α^{j_m,n}_{i_m}(·) is real-analytic, it is continuous. Then, for y_{n,m} satisfying (119), if lim_{m→∞} y_{n,m} exists, we have lim_{m→∞} y_{n,m} = α^{j,n}_i(x^i) for some i ∈ {1, . . . , n − 1} and j ∈ {1, . . . , n_i}. Since V_0 is either empty or a single point, the number of possible values for lim_{m→∞} y_{n,m} is at most 1 + n_1 + . . . + n_{n−1}, which is finite. However, since x ∈ U ∩ Q, where U ∩ Q is open, the set {t ∈ R | (x^{n−1}, x_n + t) ∈ U ∩ Q} is uncountable. Thus, there exists t such that (x^{n−1}, x_n + t) ∈ U ∩ Q is not the limit of any sequence in Z(s; F*) ∩ U ∩ Q.

Discussion
This paper has considered vector-valued channels with additive Gaussian noise. Unlike much of the prior work in this area, the noise was not limited to having independent and identically distributed components. The support of the capacity-achieving input distribution was discussed when inputs were subjected to an even-moment radial constraint of order 2k. Furthermore, the inputs were either allowed to take any value in R n or restricted to a compact set. When the input alphabet was the entire space, R n , only the case k ≥ 2 was considered since, for k = 1, the optimal input distribution is well-known to be Gaussian.
The problem was framed as a convex optimization problem that was shown to be solved by a unique input distribution F*. The conditions for optimality yielded a real-analytic function s(·; F*) whose zero set contained supp(F*), the support of F*. Using the framework of an L 2 space that was weighted by the noise density, s(·; F*) was simplified and shown to be non-constant on R n . Through geometric analysis of the zero set of s(·; F*), supp(F*) was shown to be contained in a countable union of single points and submanifolds of dimensions ranging over 1, . . . , n − 1. When the input alphabet was compact, this union was further shown to be finite. Finally, it was determined that supp(F*) has Lebesgue measure 0 and is nowhere dense in R n . This paper is an expansion of the work concerning even-moment input constraints in [14] to vector-valued channels that are not necessarily spherically symmetric. Viewed as a generalization of [12], it considers order 2k rather than second-moment radial constraints and includes R n as a possible input alphabet. Unlike prior work, it also provides geometric results on the supports of capacity-achieving inputs to spherically asymmetric channels.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Convexity and Compactness of Optimization Space
Theorem A1. The following properties hold for the sets defined in Section 3.1: Proof. We first show the convexity of F n (A). Let F 1 , F 2 ∈ F n (A), t ∈ [0, 1], and F t = tF 1 + (1 − t)F 2 . Then F t is a probability distribution supported on A, so F t ∈ F n (A). To show convexity of P n (A, k, b) for b > 0, let F 1 , F 2 ∈ P n (A, k, b), t ∈ [0, 1], and F t = tF 1 + (1 − t)F 2 . Since P n (A, k, b) ⊆ F n (A), it suffices to show that F t satisfies the radial even-moment constraint: ∫ ‖x‖^{2k} dF t (x) = t ∫ ‖x‖^{2k} dF 1 (x) + (1 − t) ∫ ‖x‖^{2k} dF 2 (x) ≤ tb + (1 − t)b = b. We now show the convexity of Q n (A, k, a). For any F 1 , F 2 ∈ Q n (A, k, a) and t ∈ [0, 1], the same mixture argument shows that tF 1 + (1 − t)F 2 ∈ Q n (A, k, a). Hence, Q n (A, k, a) is convex.
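As a numerical sanity check (a sketch only; the distributions, sample sizes, and constants below are illustrative choices, not taken from the paper), the even-moment constraint is preserved under mixtures because expectation is linear in the mixing distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, b, t = 2, 2, 3.0, 0.3

def radial_moment(samples, k):
    # Empirical E[||X||^{2k}] for samples of shape (m, n).
    return np.mean(np.linalg.norm(samples, axis=1) ** (2 * k))

# Two illustrative input distributions whose 2k-th radial moments are <= b.
x1 = rng.normal(scale=0.5, size=(200_000, n))
x2 = rng.uniform(-0.8, 0.8, size=(200_000, n))

# Draw from the mixture t*F1 + (1 - t)*F2 by picking a component per sample.
pick = rng.random(200_000) < t
xt = np.where(pick[:, None], x1, x2)

m1, m2, mt = radial_moment(x1, k), radial_moment(x2, k), radial_moment(xt, k)
# Linearity of expectation under mixtures: mt ~= t*m1 + (1 - t)*m2 <= b.
assert abs(mt - (t * m1 + (1 - t) * m2)) < 0.05
assert mt <= b
```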
It remains to show the compactness of P n (A, k, b) for any b > 0. Note that the Lévy-Prokhorov metric metrizes weak convergence in F (R n ) [25]; so, sequential compactness is equivalent to compactness. To prove compactness of P n (A, k, b), we first show relative compactness, which allows us to conclude that any sequence in P n (A, k, b) has a subsequence that converges to some F ∈ F n (A). Further, showing that F ∈ P n (A, k, b) will complete the proof.
Observe that each F ∈ F (R n ) is defined on the complete separable metric space R n equipped with Euclidean distance. By Prokhorov's Theorem (Theorem 3.2.1 of [25]), the relative compactness of P n (A, k, b) is equivalent to the tightness of P n (A, k, b); so, we will prove the latter.
To show the tightness of P n (A, k, b), let X ∼ F ∈ P n (A, k, b), ε > 0, and D = (b/ε)^{1/2k} . Then, applying Markov's inequality, P{ ‖X‖ > D } ≤ E[‖X‖^{2k}]/D^{2k} ≤ b/D^{2k} = ε. This is a uniform upper bound over F ∈ P n (A, k, b); so, P n (A, k, b) is tight and, thus, relatively compact. By the relative compactness of P n (A, k, b), any sequence {F m } ∞ m=0 ⊆ P n (A, k, b) has a subsequence {F m j } ∞ j=0 that converges weakly to some F ∈ F (R n ). To show compactness, we must show that F ∈ P n (A, k, b).
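The uniform tail bound can be illustrated numerically (a hedged sketch; the Gaussian input and all constants are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
k, b, eps = 2, 2.0, 0.05
D = (b / eps) ** (1 / (2 * k))   # D chosen so that b / D^{2k} = eps

# Any F with E[||X||^{2k}] <= b puts mass at most eps outside the ball of
# radius D, by Markov's inequality, uniformly over the family.
x = rng.normal(scale=0.6, size=(500_000, 2))
r = np.linalg.norm(x, axis=1)
moment = np.mean(r ** (2 * k))   # empirical E[||X||^{2k}], well below b
tail = np.mean(r > D)            # empirical P{||X|| > D}

assert moment <= b
assert tail <= moment / D ** (2 * k) + 1e-3   # Markov bound (sampling slack)
assert tail <= eps
```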
Since each F m j ∈ P n (A, k, b), it follows that ∫ ‖x‖^{2k} dF m j (x) ≤ b. By Theorem A.3.12 of [26], since ‖x‖^{2k} is non-negative and lower semicontinuous, ∫ ‖x‖^{2k} dF(x) ≤ lim inf j→∞ ∫ ‖x‖^{2k} dF m j (x) ≤ b. Therefore, the limiting distribution F satisfies the radial even-moment constraint imposed by P n (A, k, b). When A = R n , we conclude that F ∈ P n (A, k, b). However, for the case that A is compact, we must also show that X ∈ A almost surely. For any index m j of the subsequence, ∫ A dF m j (x) = 1. (A11) By the Portmanteau Theorem [27], since A is closed, F(A) ≥ lim sup j→∞ F m j (A) = 1, so X ∈ A almost surely and F ∈ P n (A, k, b).

Appendix B. Necessary Conditions for the Capacity-Achieving Distribution
Theorem A2. Recall that Suppose that F * solves the optimization problem in (26), where γ ≥ 0 is the Lagrange multiplier corresponding to the problem in (11). Then, the following are equivalent:

P.1
For every F ∈ Q n (A, k, a), (A14) holds.
P.2
For every x ∈ A, (A15) holds, and if x ∈ supp(F * ), then (A15) holds with equality.
Proof. For any F ∈ Q n (A, k, a), integrating both sides of (A15) with respect to dF(·) yields that (P.2) implies (P.1). It remains to be shown that (P.1) implies (P.2). Suppose this implication is false; that is, (P.1) holds but either there exists v ∈ A satisfying (A17) or there exists w ∈ supp(F * ) satisfying (A18). The first case contradicts (A14). Therefore, (A17) cannot be satisfied for any x ∈ A, and we are left with the alternative that there exists w ∈ supp(F * ) ⊆ A for which (A18) holds. By Lemma A7, the extension of h(·; F) to C n is continuous; hence, h(·; F) is continuous on R n . Since ‖·‖^{2k} is continuous as well, there exists δ > 0 such that (A22) holds for every x ∈ B δ (w). Furthermore, since w ∈ supp(F * ), there exists ε > 0 such that P{X * ∈ B δ (w)} = ε. Recall (A23) from (18), and since γg k (F * ) = 0 (see (27)), (A24) follows. Substituting (A23) and (A24), and noting that F * (B δ (w) ∩ A C ) = 0, yields (A29), where the first term of (A29) is due to (A22) and the second is due to the contradiction derived from (A17). The above is a contradiction, which completes the proof.
Lemma A3. Let β ≥ α ≥ 1 and Z be a random variable taking values in R n .
Lemma A4. For any b > 0 and F 0 , F 1 ∈ P n (R n , k, b), (A42) holds. Proof. We proceed by first proving (A43). Then, since the integrand is non-negative, the Fubini-Tonelli Theorem [18] justifies interchanging the order of integration in (A44) to conclude (A43) and, hence, that the left side of (A42) is finite. By Lemma A1, for any y ∈ R n , the inner integrand is bounded. Initially considering only the inner integral in (A44), the substitution u = y − x yields (A48), which is due to the triangle inequality. Since E[‖N‖ 2 ] = tr(Σ) is finite, there are positive constants for which the required bound holds. Substituting this and (A53) into (A44), we obtain the desired conclusion.

Appendix D. Properties of the Objective Functional
The aim of this section is to discuss the weak continuity, strict concavity, and weak differentiability of the objective functional for the optimization problem posed in (26). These properties are instrumental in the establishment and subsequent analysis of the convex optimization problem considered in Section 3. To support the proof of Theorem A2, we show that, for arbitrary b > 0, the required properties hold on P n (A, k, b).
Theorem A3. I(·) is weakly continuous on P n (A, k, a).
Proof. For any F ∈ P n (A, k, a), we write I(F) = h Y (F) − h(N), where h Y (F) is finite by Lemma A4. Therefore, weak continuity of I(·) on P n (A, k, a) is equivalent to weak continuity of h Y (·) on P n (A, k, a). Let {F m } ∞ m=0 be a sequence in P n (A, k, a) converging weakly to F ∈ P n (A, k, a). By the Helly-Bray Theorem, since p N (·) is bounded and continuous, lim m→∞ p(y; F m ) = p(y; F) for any y ∈ R n . Therefore, by Scheffé's Lemma, the sequence {p(y; F m )} ∞ m=0 converges in total variation to p(y; F). It suffices to show that differential entropy is uniformly continuous over {p(·; F) | F ∈ P n (A, k, a)} with respect to the total variation metric.
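The Helly-Bray/Scheffé step can be illustrated with a minimal one-dimensional sketch (the point-mass inputs and unit-variance noise are illustrative assumptions, not the paper's setting):

```python
import numpy as np

def phi(u):
    # Standard Gaussian noise density (n = 1, unit variance).
    return np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)

y = np.linspace(-10, 10, 4001)
dy = y[1] - y[0]

def tv_to_limit(m):
    # F_m = point mass at 1/m converges weakly to a point mass at 0, so the
    # output density p(y; F_m) = phi(y - 1/m) converges pointwise to phi(y)
    # (Helly-Bray) and hence in total variation (Scheffe's Lemma).
    return 0.5 * np.sum(np.abs(phi(y - 1 / m) - phi(y))) * dy

tvs = [tv_to_limit(m) for m in (1, 10, 100)]
assert tvs[0] > tvs[1] > tvs[2]   # total variation shrinks along the sequence
assert tvs[2] < 1e-2
```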
The family of densities {p(·; F) | F ∈ P n (A, k, a)} is uniformly upper-bounded. Furthermore, the corresponding random vectors, Y = X + N for some X ∼ F ∈ P n (A, k, a), uniformly satisfy the bound given by Lemma A3. The result follows by Theorem 1 of [28].
Theorem A4. For any b > 0, I(·) is strictly concave on P n (A, k, b).
Theorem A5. g k (·) is convex on Q n (A, k, a).
Proof. Let t ∈ [0, 1] and F 0 , F 1 ∈ Q n (A, k, a). Then, g k (F 0 ) and g k (F 1 ) are finite, and since g k (·) is affine in its distribution argument, g k (tF 0 + (1 − t)F 1 ) = t g k (F 0 ) + (1 − t) g k (F 1 ), which establishes convexity. We make use of the following notion of a derivative of a function defined on a convex set Ω [14].
Definition A1. Define the weak derivative of L : Ω → R at F 0 in the direction F by L′ F 0 (F) = lim t→0 + [L((1 − t)F 0 + tF) − L(F 0 )]/t, whenever it exists.
Lemma A5. g k (·) is weakly differentiable on Q n (A, k, a). For any F 0 , F ∈ Q n (A, k, a), the weak derivative is finite and given by g k (F) − g k (F 0 ). (A69) Proof. Let t ∈ [0, 1] and F 0 , F ∈ Q n (A, k, a). Then, noting that g k (F 0 ) and g k (F) are finite, g k ((1 − t)F 0 + tF) − g k (F 0 ) = t (g k (F) − g k (F 0 )). Dividing by t and taking the limit as t → 0 + gives (A69).
Lemma A6. I(·) is weakly differentiable on Q n (A, k, a). For any F 0 , F ∈ Q n (A, k, a), the weak derivative at F 0 in the direction of F is given by Proof. The proof largely follows Appendix E from [14]. The step that requires special attention is the application of the Dominated Convergence Theorem in (27) of [14]; that is, we would like to show the integrability of (p(y; F) − p(y; F 0 )) ln( (1/2) p(y; F 0 )), which satisfies |(p(y; F) − p(y; F 0 )) ln( (1/2) p(y; F 0 ))| ≤ |p(y; F) ln p(y; F 0 )| + |p(y; F 0 ) ln p(y; F 0 )| + (p(y; F) + p(y; F 0 )) ln 2. (A76)
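The inequality in (A76) follows from the triangle inequality; writing p = p(y; F) and p 0 = p(y; F 0 ), a short derivation reads:

```latex
\begin{aligned}
\bigl|(p - p_0)\ln\bigl(\tfrac{1}{2} p_0\bigr)\bigr|
  &= \bigl|(p - p_0)(\ln p_0 - \ln 2)\bigr| \\
  &\le |p \ln p_0| + |p_0 \ln p_0| + |p - p_0|\,\ln 2 \\
  &\le |p \ln p_0| + |p_0 \ln p_0| + (p + p_0)\ln 2,
\end{aligned}
```

where the last step uses |p − p 0 | ≤ p + p 0 , valid since densities are non-negative.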

Appendix E. Analyticity of Marginal Entropy Density
Lemma A7. For any b > 0 and F ∈ P n (A, k, b), the extension of h(x; F) to z ∈ C n given by h(z; F) = − ∫ R n p N (y − z) ln p(y; F) dy is continuous in z.

Proof.
Let z ∈ C n . Fix ε > 0 and consider B C n ,ε (z), the ball of radius ε around z in C n . For any sequence {z m } ∞ m=0 ⊆ C n converging to z, there exists M ≥ 0 such that z m ∈ B C n ,ε (z) for each m ≥ M. Therefore, it suffices to show that lim m→∞ h(z m ; F) = h(z; F) for each sequence {z m } ∞ m=0 ⊆ B C n ,ε (z) converging to z. Since the extension of p N (u) to C n is continuous, the limit may be passed inside the integral, where (A80) is due to the Dominated Convergence Theorem, which will be justified next. Let y ∈ R n and z m = α m + iβ m ∈ B C n ,ε (z). Prior to finding a dominating function for the entire integrand in (A80), we establish an upper bound on |p N (y − z m )| that is integrable with respect to y.
Lemma A8. For any F ∈ P n (A, k, a), h(x; F) has an analytic extension to an entire function on C n .
Proof. For convenience of notation, we will prove the case of n = 2 here. Consider the extension of h(x; F) to z ∈ C 2 : h(z; F) = − ∫ R 2 p N 1 (y 1 − z 1 ) p N 2 (y 2 − z 2 ) ln p((y 1 , y 2 ); F) dy 1 dy 2 , where, for i ∈ {1, 2}, p N i denotes the density of the i'th noise component. We will show that h((z 1 , z 2 ); F) is an entire function in z 1 ∈ C for fixed z 2 . By the symmetry of the problem, h((z 1 , z 2 ); F) is likewise an entire function in z 2 ∈ C for fixed z 1 . We finally conclude, by Hartogs's Theorem [29], that h((z 1 , z 2 ); F) is entire on C 2 . Therefore, it suffices to show that h((·, z 2 ); F) is entire, for which we use Morera's Theorem. Morera's Theorem requires that the function under consideration, in this case h((·, z 2 ); F), be continuous, which holds by Lemma A7. If, for any closed smooth curve θ(t), defined for 0 ≤ t ≤ 1, and any fixed z 2 ∈ C, ∫ 0 1 ∫ R 2 |p N 1 (y 1 − θ(t))| |p N 2 (y 2 − z 2 )| |ln p((y 1 , y 2 ); F)| dy 1 dy 2 |θ′(t)| dt < ∞, (A97) then the order of integration in (A95) can be interchanged such that integration with respect to t is performed first. Under this condition, since the extension of p N 1 (u 1 ) = κ 1 e^{−u 1 ²/(2σ 1 ²)} (A98) to z 1 ∈ C is analytic, we obtain ∫ 0 1 κ 1 e^{−(y 1 −θ(t))²/(2σ 1 ²)} p N 2 (y 2 − z 2 ) ln p((y 1 , y 2 ); F) θ′(t) dt = 0, thereby fulfilling the condition for Morera's Theorem in (A96). It remains only to justify the application of the Fubini-Tonelli Theorem by showing (A97).
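The vanishing contour integral invoked via Morera's Theorem can be checked numerically (a sketch; the curve, variance, grid, and tolerance below are illustrative choices, not from the paper):

```python
import numpy as np

# f is the analytic extension of a one-dimensional Gaussian density, as in
# (A98); sigma and kappa are illustrative values.
sigma = 1.3
kappa = 1 / (sigma * np.sqrt(2 * np.pi))

def f(z):
    return kappa * np.exp(-z ** 2 / (2 * sigma ** 2))

# A closed smooth curve theta(t), 0 <= t <= 1: a circle in the complex plane.
t = np.linspace(0.0, 1.0, 20001)
theta = 0.5 + 0.25j + 2.0 * np.exp(2j * np.pi * t)
dtheta = 2.0 * 2j * np.pi * np.exp(2j * np.pi * t)   # theta'(t)

# Trapezoidal approximation of the contour integral of f over theta; it
# vanishes because f is entire (Cauchy's integral theorem).
g = f(theta) * dtheta
integral = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t))
assert abs(integral) < 1e-8
```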