Abstract
We investigate the support of a capacity-achieving input to a vector-valued Gaussian noise channel. The input is subjected to a radial even-moment constraint and is either allowed to take any value in $\mathbb{R}^n$ or is restricted to a given compact subset of $\mathbb{R}^n$. It is shown that the support of the capacity-achieving distribution is composed of a countable union of submanifolds, each with a dimension of $n-1$ or less. When the input is restricted to a compact subset of $\mathbb{R}^n$, this union is finite. Finally, the support of the capacity-achieving distribution is shown to have Lebesgue measure 0 and to be nowhere dense in $\mathbb{R}^n$.
1. Introduction
In this paper, we consider the support of the capacity-achieving input to a vector-valued channel that is subject to additive non-degenerate Gaussian noise. Vector-valued channels are used in a variety of applications, including the complex-valued inputs and outputs of quadrature channels, which have alternate representations as two-dimensional real vectors. Larger antenna arrays enable Multiple-Input Multiple-Output (MIMO) channels, which have inputs of complex components. Additionally, noise with memory can be expressed by correlated noise components in a vector-valued channel.
Throughout the paper, the average input power is bounded to limit the consumption of environmental, battery, and monetary resources. Since the output of the amplifiers used in transmitters is severely distorted when the input is too large [,,] and signals that are too small can be challenging to produce, it is also of practical interest to restrict the input to an arbitrary compact set.
There has been appreciable prior effort dedicated to understanding the capacity-achieving input to vector-valued channels subject to average power constraints and restrictions to compact sets. Nevertheless, there are significant technical challenges in working with vector-valued inputs. Therefore, much of the work to this point has been limited to either one-dimensional channels [] or spherically symmetric channels [,,,], where the latter case ensures that the capacity-achieving distribution can be expressed as a univariate function of the radius. However, this restriction limits the scope of study to channels in which the input is constrained to a ball and the noise components are independent and identically distributed. In this paper, the only assumption made on the Gaussian noise distribution is that it is non-degenerate. We consider both cases: those in which inputs are restricted to arbitrary compact sets and those in which inputs are allowed to take any value in $\mathbb{R}^n$.
The power of a vector-valued signal is equivalent to the second moment of its Euclidean norm. A constraint on the fourth moment then has the practical interpretation of limiting the second moment of the instantaneous power. Furthermore, imposing a moment constraint of order ensures that the tails of the input distribution decay at least as quickly as a degree monomial. Therefore, increasing the even-moment constraint penalizes large inputs without imposing a strict cutoff. This motivates us to generalize the average power constraint by limiting some even moment of the input’s Euclidean norm.
The results in this paper apply to any combination of input constraints described above, except for the special case where the input is allowed to take any value in $\mathbb{R}^n$ and is subject to a second-moment constraint. This case reduces to a classical result in which the capacity-achieving distribution is known to be Gaussian. For all other cases, we show that the support of the capacity-achieving distribution is contained in a countable union of i-dimensional submanifolds, where i ranges from 0 to $n-1$. Furthermore, this union is finite when the input is restricted to a compact set. We then show that the support of the capacity-achieving distribution is a nowhere dense set with Lebesgue measure 0.
The paper is organized as follows. We first give a review of prior work in Section 2. Sections 3.1–3.3 provide intermediary results that lead to the main results in Section 3.4. Section 4 concludes the paper.
2. Prior Work
Dating back to Shannon’s work in [], much of the research on continuous channels has focused on average power (equivalently, second-moment) constraints on the input. A transmitter’s inability to produce arbitrarily large powers then led to the consideration of additional peak power constraints, modeled by restricting the input almost surely to compact sets.
The first major result on amplitude-constrained channels considers a scalar Additive White Gaussian Noise (AWGN) model, both with and without a variance constraint []. In each case, the support of the capacity-achieving distribution has a finite number of points.
The use of the Identity Theorem for functions of a single complex variable is key to the argument of [] and many papers that follow. The theorem can be applied to any univariate analytic function that has an accumulation point of zeros. By contrast, the Identity Theorem in n complex dimensions requires an analytic function with an open set of zeros in $\mathbb{C}^n$. Therefore, to apply the Identity Theorem directly for $n > 1$, a random vector with support containing an open subset of $\mathbb{C}^n$ must be considered. It was suspected by some authors that, since $\mathbb{R}^n$ is not open in $\mathbb{C}^n$, no topological assumption on the support of the capacity-achieving distribution would be sufficient for this purpose [,,]. Therefore, many papers restrict their models to ones that maintain spherical symmetry so that the capacity-achieving distribution can be expressed as a one-dimensional function of radius (e.g., Refs. [,,,]).
In [] and [], the results of [] are extended to multivariate spherically symmetric channels, in n dimensions and 2 dimensions, respectively. In both papers, the inputs are subject to average and peak radial constraints. It is shown that it is optimal to concentrate the input on a finite number of concentric shells.
In [] and [], the number and positioning of optimal concentric shells under a peak radial constraint is studied. In [], the least restrictive amplitude constraint for which the optimal distribution is concentrated on a single sphere is found. In [], it is shown that the number of shells grows at most quadratically in the amplitude constraint. A similar result is found for under an additional average power constraint.
While much of the prior work has focused on spherically symmetric channels, some research has considered spherically asymmetric channels. The case of inputs constrained to arbitrary compact sets and subject to a finite number of quadratic cost constraints as well as non-degenerate multivariate Gaussian noise is considered in []. It is concluded that the support of the capacity-achieving distribution must be “sparse”—that is, there must exist a not identically zero analytic function that is 0 on the support of the capacity-achieving distribution. Assuming otherwise leads to a contradiction by the n-dimensional Identity Theorem and Fourier analysis. These results, while quite general, do not consider either inputs of unbounded support or inputs subject to higher-moment constraints. Furthermore, outside of the special cases of and spherically symmetric channels, they do not explore a characterization of sparse sets in .
In [], MIMO channels with inputs that are restricted to compact sets, yet have no average power constraints, are considered. Using the Real-Analytic Identity Theorem and steps similar to [], it is determined that the support of the optimal input distribution is nowhere dense in $\mathbb{R}^n$ and has Lebesgue measure 0. For the case considered in [] that coincides with this setup, [] gives an instance of sparsity in terms of subsets of $\mathbb{R}^n$, rather than analytic functions.
There has also been work dedicated to generalizing the classic quadratic average cost constraint. In [], a scalar channel with the input subject to a combination of even-moment constraints and restrictions to compact or non-negative subsets of is studied. It is shown that, in most of the cases considered, the support has a finite number of points.
In [], a complex-valued non-dispersive optical channel is considered, where the input is subject to an average cost that grows super-quadratically in radius, a peak constraint, or both. The noise is taken to be circularly symmetric and, under these conditions, so is the optimal input. The number of concentric circles composing the support of the distribution is shown to be finite.
In this paper, we study an n-dimensional channel subject to non-degenerate Gaussian noise. The input either can take any value in $\mathbb{R}^n$ or is restricted to a compact subset of $\mathbb{R}^n$, and its norm is subject to even-moment constraints. The noise need not be spherically symmetric.
This paper gives a characterization of the capacity-achieving distribution to spherically asymmetric channels under peak and average power constraints that improves on prior work in three respects. Firstly, when our cases overlap with [], our characterization of the capacity-achieving distribution is more detailed than the notion of sparsity used there. Secondly, our results apply to multivariate channels with inputs subject to even-moment constraints greater than 2. Thirdly, we consider both inputs that are restricted to compact sets and those that are allowed to take any value in $\mathbb{R}^n$.
3. Results
In this section, we consider -valued inputs subject to additive non-degenerate multivariate Gaussian noise. In Section 3.1, the capacity-achieving distribution, , is formulated as the objective of an optimization problem; its support is then framed in terms of the zero set of a certain real-analytic function, which is dependent on and referred to as . Section 3.2 finds an equivalent expression for , which is an intermediary step to showing in Section 3.3 that is non-constant. Section 3.4 uses the result that is non-constant to show that the support of the capacity-achieving distribution is contained in a countable union of submanifolds of dimensions in the range . This union is finite when the input is constrained to a compact subset of . It is then shown that the support of the capacity-achieving input has Lebesgue measure 0 and is nowhere dense in .
Appendix A is dedicated to showing the convexity and compactness of the optimization space used in Section 3.1. Appendix B establishes a pointwise characterization of , which justifies the definition of . Appendix C provides integrability results, which are used throughout the paper. Appendix D shows that the objective functional is weakly continuous, strictly concave, and weakly differentiable on the optimization space. Appendix E shows that has an analytic extension to $\mathbb{C}^n$. Finally, Appendix F supports Section 3.3 by finding bounds for certain functions.
As a first step towards defining the set of feasible input distributions, let be the set of finite Borel measures on . Note that is contained in the set of finite signed Borel measures on , which has an intrinsic vector space structure and can be equipped with a norm []. Since lies within a normed vector space, the convexity and compactness of its subsets can be discussed.
The possibility that the transmitter is unable to produce arbitrary signals in is modeled by restricting the input to an alphabet . Denote the set of distributions for which the associated random variable is almost surely in by
Two cases for are considered:
- ;
- is compact.
In addition to the restriction to , a radial even-moment constraint is associated with the input. For , the input must belong to the set
The resulting channel model, with input , is
where and are output and noise, respectively, and is an invertible matrix known to the transmitter and receiver. It is assumed that the noise covariance matrix is positive-definite.
We will simplify the analysis of (3) by showing that no generality is lost in assuming that , the n-dimensional identity matrix, and is diagonal. Since is positive-definite and is invertible, the positive-definite matrix can be diagonalized by the orthogonal matrix .
Now, multiplying the output in (3) by , the receiver obtains
where and the covariance matrix of is diagonal. Since is invertible, . Furthermore, since is orthogonal,
and the set is merely a rotated version of . Hence, no generality is lost by dropping the and adopting the following channel model for the remainder of the paper:
where and is diagonal with entries . The density of is denoted by .
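The reduction above can be checked numerically on a small example. The following is a minimal sketch, assuming the channel in (3) has the form "output = matrix times input, plus noise" and that the receiver first inverts the channel matrix and then applies an orthogonal rotation; the names `H`, `Sigma`, `Q` and all numerical values are hypothetical and chosen only for illustration in two dimensions.

```python
import math

def matmul(a, b):
    """Multiply two small dense matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a nested-list matrix."""
    return [list(col) for col in zip(*a)]

def inverse_2x2(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def diagonalizing_rotation(m):
    """Rotation Q for which Q^T m Q is diagonal (m symmetric, 2x2)."""
    theta = 0.5 * math.atan2(2.0 * m[0][1], m[0][0] - m[1][1])
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Hypothetical example values (not taken from the paper).
H = [[2.0, 1.0], [0.0, 1.0]]        # invertible channel matrix
Sigma = [[1.0, 0.3], [0.3, 0.5]]    # positive-definite noise covariance

H_inv = inverse_2x2(H)
# Covariance of the noise after the receiver inverts the channel matrix.
M = matmul(matmul(H_inv, Sigma), transpose(H_inv))
Q = diagonalizing_rotation(M)
# Covariance of the noise after the additional rotation by Q^T.
D = matmul(matmul(transpose(Q), M), Q)

# The same rotation acts on the input but preserves its Euclidean norm,
# so a radial moment constraint is unaffected.
x = [0.7, -1.2]
x_rot = [sum(Q[k][i] * x[k] for k in range(2)) for i in range(2)]

print("rotated noise covariance:", D)
print("norm before/after:", math.hypot(*x), math.hypot(*x_rot))
```

The off-diagonal entries of `D` vanish up to rounding, and the two printed norms agree, which is the sense in which the rotated alphabet is "merely a rotated version" of the original one.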
3.1. Optimization Problem
By Theorem 3.6.2 of [], the capacity of the channel in (7) is given by the optimization problem
Since the relationship between , , and is known, the mutual information is a function of the distribution of alone. Thus, the mutual information induced between and will be denoted by . Similarly, we express the even-moment constraint in terms of a functional given by
where is equivalent to . Rewriting (8) in terms of and yields
Much of the appendix is dedicated to understanding properties of the problem presented in (10). It is shown in Theorem A1 that is convex and compact. Furthermore, by Theorems A3 and A4, is a weakly continuous and strictly concave function on . Therefore, the supremum is achieved by a unique input distribution (see, e.g., Appendix C of [])—that is,
We use the notation to describe a capacity-achieving input directly (i.e., ).
Before proceeding, we require some definitions and notations. In the first definition, and throughout the paper, for and , we denote the ball of radius r centered at by . We will denote the closure of by . The output density induced by an input is denoted .
Definition 1.
Let be a random variable with alphabet . Then, the support of is the set given by
If has distribution , we may alternatively refer to .
Definition 2.
For , the output differential entropy is given by
and the marginal entropy density at is given by
whenever the integrals exist.
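Definition 2 can be illustrated on a scalar toy case. The sketch below assumes that the marginal entropy density at a point is the negative expectation of the log output density when the input is fixed at that point, and that the output density is the input distribution convolved with the noise density. The two-point input, the unit noise variance, and all function names are hypothetical choices made for illustration.

```python
import math

def trapezoid(f, lo=-12.0, hi=12.0, steps=4800):
    """Composite trapezoid rule for f on [lo, hi]."""
    step = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * step)
    return total * step

def noise_density(y, sigma=1.0):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-y * y / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def output_density(y, points, probs):
    """Output density induced by an input with finitely many mass points."""
    return sum(p * noise_density(y - x) for x, p in zip(points, probs))

def output_entropy(points, probs):
    """Differential entropy of the output, in nats."""
    def integrand(y):
        p = output_density(y, points, probs)
        return -p * math.log(p)
    return trapezoid(integrand)

def marginal_entropy_density(x, points, probs):
    """Assumed form: minus the integral of p_N(y - x) * log p_Y(y) over y."""
    return trapezoid(
        lambda y: -noise_density(y - x) * math.log(output_density(y, points, probs)))

# Hypothetical input: two equiprobable mass points at -1 and +1.
points, probs = [-1.0, 1.0], [0.5, 0.5]
h_output = output_entropy(points, probs)
h_averaged = sum(p * marginal_entropy_density(x, points, probs)
                 for x, p in zip(points, probs))
h_noise = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of unit-variance noise
mutual_information = h_output - h_noise

print(h_output, h_averaged, mutual_information)
```

Averaging the marginal entropy density over the input distribution reproduces the output differential entropy, which is the relationship the text establishes next; the mutual information is then the output entropy minus the noise entropy.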
The relationship between the differential entropy and marginal entropy density can be seen as follows. For any and , we have that
The equivalence of (15)–(17) is due to Fubini–Tonelli [] and Lemma A4. If , then
Lastly, define
and let be given by
Since , we have that is finite for all and conclude that is also finite for all . Furthermore, by Lemma A8, can be extended to a complex analytic function for all ; hence, it is continuous.
The remainder of Section 3.1 consists of two steps:
To establish (22), we will first use a Lagrange multiplier to reformulate (11) as an unconstrained problem over . We will then obtain (22) by taking the weak derivative of the resulting objective functional and applying a Karush–Kuhn–Tucker condition for optimality. We choose to work in the space since, when , the functionals and are not weakly differentiable on the larger space .
Since , (11) can equivalently be written as
where is the same as in (11). By Theorem A5, is convex. Moreover, letting be a Heaviside step function at , is an interior point of the feasible region since . Since is convex by Lemma A1, there exists such that
where
and (see e.g., Appendix C of []). Furthermore, for an arbitrary , . Therefore, for this choice of , we also have
By Theorems A5 and A6, has a weak derivative at in the direction of any given by
where (31) is due to . Substituting gives
where the differential entropy of the noise is finite since is positive definite.
Now, is the difference between a strictly concave function (see Theorem A4) and a convex function (due to Theorem A5 and the non-negativity of ). Therefore, is strictly concave and is optimal if and only if, for all ,
However, is arbitrary and each satisfies for some . Therefore, is optimal if and only if, for all ,
which is the statement we sought to show in (22).
The condition (34) is on the capacity-achieving distribution itself. Since our objective is to characterize , we find an equivalent condition to (34) to describe the behavior of at individual points in the input alphabet . Thus, by (34) and Theorem A2, for all ,
and if , then
The rest of Section 3 is dedicated to exploiting the relationship
where is the zero set of .
3.2. Hilbert Space and Hermite Polynomial Representation
In this subsection, an equivalent expression for (36) is found by viewing the integral as an inner product in a Hilbert space and writing in terms of a Hermite polynomial basis for that space. Hermite polynomial bases are well-suited to analysis of Gaussian noise channels and they have been used in a number of information-theoretic papers (see, e.g., Refs. [,]).
Consider the Hilbert space
equipped with inner product
The inner product’s subscript is omitted when the space can be inferred.
Since the components of are independent, with having variance , the density of factors into
We will construct an orthogonal basis for from orthogonal bases for the spaces .
First, with , an orthogonal basis for is given by the Hermite polynomials [], which are defined through the generating function
For any , the mth Hermite polynomial has degree m and a positive leading coefficient. Next, for each and , define the stretched Hermite polynomials
The inner product of is related to that of by
where and are versions of and that have been stretched horizontally by a factor of . Substituting any and into (44) shows that the set is orthogonal. Furthermore, if there was a non-zero function (in an sense) that had a zero inner product with for each , then a stretched version of this function would also have a zero inner product with for each m. This would contradict the completeness of the Hermite polynomials in ; hence, forms a basis for . Lastly, the stretched Hermite polynomials have the generating function
Now, is isomorphic to the tensor product of the spaces, ; consequently, forms an orthonormal basis for [], where
Since, by Lemma A2, , there exist constants for which
where equality is in an sense. Then, substituting (47), and using the notations
and
for and , we write
Substituting (49) and (55) into the integral term in (36) yields
where , with
This simplification to a polynomial will be helpful since the cost function associated with the even-moment constraint is also a polynomial. This relationship is exploited in Section 3.3.
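The orthogonality of the stretched Hermite polynomials used in this subsection can be checked numerically in one dimension. The sketch below assumes the probabilists' convention for the Hermite polynomials (the generating function in the text may use a different normalization) and a hypothetical noise standard deviation `sigma`; under this convention, the squared norm of the mth polynomial is m!.

```python
import math

def hermite(m, x):
    """Probabilists' Hermite polynomial He_m(x) via the three-term recurrence."""
    prev, curr = 1.0, x
    if m == 0:
        return prev
    for j in range(1, m):
        # He_{j+1}(x) = x * He_j(x) - j * He_{j-1}(x)
        prev, curr = curr, x * curr - j * prev
    return curr

def gaussian_inner_product(m, l, sigma, half_width=12.0, steps=4800):
    """Trapezoid estimate of E[He_m(N / sigma) * He_l(N / sigma)], N ~ N(0, sigma^2)."""
    lo = -half_width * sigma
    step = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * hermite(m, x / sigma) * hermite(l, x / sigma) * density
    return total * step

sigma = 0.7  # hypothetical noise standard deviation
gram = [[gaussian_inner_product(m, l, sigma) for l in range(4)] for m in range(4)]
for row in gram:
    print([round(value, 6) for value in row])
```

The resulting Gram matrix is diagonal up to quadrature error. Stretching the argument by `sigma` is what makes the polynomials orthogonal under a Gaussian weight with that standard deviation rather than under the standard Gaussian.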
3.3. Non-Constancy of
Recall the relationship from (37). Since , has at least one zero, so it is non-constant if and only if it is not identically 0. This subsection is dedicated to showing the latter condition. The immediate implication is that is a strict subset of ; however, the fact that is a non-zero real-analytic function will be used in Section 3.4 to prove the main results.
By way of contradiction, suppose that, for all , . Substituting (59) into (36), this is equivalent to
for all . The discussion proceeds in two cases: and .
- Case :
In the case that and , is known to be Gaussian [] and there is no contradiction with (61). Therefore, for , we focus only on compact input alphabets .
With and , (61) reduces to
for all . Let be the ith row of the identity matrix and let be the all-zero vector. Since (62) holds for all , matching coefficients gives
Since, for each and , has degree and a positive leading coefficient,
also has degree in and the unique term with total degree is , which has a positive coefficient. Therefore, the polynomials present in the sum are of the form
and, for ,
The constants and are positive, while and are real. Substituting this and the identity into (49) yields
or equivalently,
By definition, ; however, results in a constant density on , which is invalid. Then, it must be the case that . Thus, the output achieved by , has independent Gaussian components. Since and are independent and is an n-variate Gaussian random variable, must either be an n-variate Gaussian random variable or be almost surely equal to some . In the former case, violates the stipulation that the input alphabet is compact and contradicts the assumption that is identically 0. In the latter case, is trivial and satisfies the main results of the paper.
- Case :
For the case , we derive a contradiction to (61) using results on the rate of decay of a function compared with that of its Fourier transform to conclude that is not identically 0.
Lemma 1.
Let have, for some , a characteristic function satisfying
for all . Let be a random variable independent of . Then, the characteristic function of satisfies, for all ,
Proof.
By the independence of and , and the fact that characteristic functions have pointwise moduli upper-bounded by 1,
□
Lemma 2.
Let have, for some constant , a characteristic function satisfying
for all . Let be a random variable independent of and have density . If there exist positive constants α and K such that, for all ,
then .
Proof.
Apply Lemma 1 and Theorem 4 of [], noting that an identically 0 function cannot be a density. □
We make use of Lemma 2 by setting , , and and deriving a contradiction to the assumption that is identically 0. Note that, using Rayleigh quotients, the modulus of the characteristic function of can be upper-bounded for any by
That is, the characteristic function of satisfies (75) with .
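The two facts used here, the Rayleigh-quotient bound on the Gaussian characteristic function and the product rule from Lemma 1, can be checked on a grid. This is a minimal sketch, assuming the bound in (75) is a Gaussian envelope governed by the smallest noise variance; the diagonal covariance, the three-point input, and the function names are hypothetical.

```python
import cmath
import math

def gaussian_cf_modulus(omega, variances):
    """|E exp(i <omega, N>)| for N with independent N(0, variances[i]) components."""
    return math.exp(-0.5 * sum(v * w * w for v, w in zip(variances, omega)))

def discrete_cf(omega, points, probs):
    """Characteristic function of an input with finitely many mass points."""
    return sum(p * cmath.exp(1j * sum(w * xi for w, xi in zip(omega, x)))
               for x, p in zip(points, probs))

variances = [0.5, 2.0]     # hypothetical diagonal noise covariance
lam_min = min(variances)   # smallest eigenvalue of a diagonal matrix
points = [(1.0, 0.0), (-0.5, 2.0), (0.0, -1.0)]
probs = [0.2, 0.5, 0.3]

checks = []
for i in range(-10, 11):
    for j in range(-10, 11):
        omega = (0.4 * i, 0.4 * j)
        envelope = math.exp(-0.5 * lam_min * (omega[0] ** 2 + omega[1] ** 2))
        noise_mod = gaussian_cf_modulus(omega, variances)
        # Independence: the output characteristic function is the product.
        output_mod = abs(discrete_cf(omega, points, probs)) * noise_mod
        checks.append(output_mod <= noise_mod + 1e-12
                      and noise_mod <= envelope + 1e-12)

print(all(checks))
```

Because the modulus of any characteristic function is at most 1, adding an independent input can only shrink the modulus, so the output inherits the Gaussian decay of the noise.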
To complete the contradiction, we show that there exists and such that satisfies the bound in (76). The assumption that is identically 0 yields (61); substituting the Multinomial Theorem,
By coefficient matching in (78), the set of non-zero coefficients, other than , is indexed by the set
Furthermore, for ,
Therefore, substituting (80) into (49),
for some positive constant . As with the case , results in a constant output density over and can be disregarded as a possibility. Thus, for each , we have
With and , showing that there exists for which (76) holds, is equivalent to showing that
is bounded. This, in turn, is equivalent to showing that the polynomial in the exponent,
is upper bounded. We proceed by considering the degrees of the terms of to determine the behavior of (84) as increases.
For each and , has degree in . Furthermore, has total degree
and the unique highest-degree term, , has coefficient . Note that, since , and by the definition of , has total degree . Hence, (84) can be rewritten as , where
and is the sum of the remaining terms, each with a total degree of at most .
Note the following:
- For each , and, by Lemma A9, the minimal value of —evaluated on a sphere of radius —is at least .
- For each , we have that . Indeed, for each , by (82), , and ; further, for each , is even.
- The maximum value of , evaluated on a sphere of radius , is at most for some —that is, each term of is either of the form or for some , and , where . Lemma A10 shows that these are no more than or .
We conclude that, since and for all ,
Thus, since is a continuous function that satisfies (90), it is bounded from above. Let and
Then, for all ,
Recall that, with , the smallest eigenvalue of , and , the characteristic function of satisfies (75). Let and choose according to (91). Then, satisfies (92), yet . Hence, the bound on the characteristic function of given by (77) and the bound on the density of given in (92) contradict Lemma 2. Therefore, the coefficient matching equation (78) cannot hold for all and we conclude that, for , cannot be identically 0 on .
We summarize the results of the two cases, and , in a theorem.
Theorem 1.
Suppose that either
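1. the input alphabet is compact, or
2. the input alphabet is $\mathbb{R}^n$ and the radial even-moment constraint has order greater than 2.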
Then, either for some or
An immediate consequence of Theorem 1 is that is a strict subset of . Recall from Section 3.1 that has an analytic extension to $\mathbb{C}^n$. Therefore, Theorem 1 shows that is “sparse” in the sense used by []; that is, there exists a non-zero function with an analytic extension to $\mathbb{C}^n$ that is zero on . However, the primary importance of Theorem 1 is as an intermediary result that is used in Section 3.4 to obtain a better understanding of the structure of .
3.4. Main Results
In this section, we use geometry to show that is contained in a countable disjoint union of submanifolds of dimensions ranging . Furthermore, this union is finite when is compact. We then show that has Lebesgue measure 0 and is nowhere dense in .
The discussions in this section consider subsets of a vector’s components; so, for and , we introduce the notation
Recall from (37) in Section 3.1 that
Since, by Lemma A8, has an analytic extension to , it is real-analytic, which motivates us to study the geometry of zero sets of real-analytic functions. We start by restating Theorem 6.3.3 of [] to the level that is needed in this paper.
Theorem 2
(Structure Theorem). Let be a real-analytic function, where is not identically 0 in . After a rotation of the coordinates , there exist constants , such that with
we have
where is either empty or contains only the origin and , , is a finite disjoint union of i-dimensional submanifolds—that is, for each , there exists for which
where each is an i-dimensional submanifold. Furthermore, letting
there exists an open set and real-analytic functions , , on for which
We apply Theorem 2 to characterize the zero set of in the form of (97) and obtain the following result.
Theorem 3.
Suppose that either
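1. the input alphabet is compact, or
2. the input alphabet is $\mathbb{R}^n$ and the radial even-moment constraint has order greater than 2.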
Then,
where is a countable union of isolated points and , , is a countable disjoint union of i-dimensional submanifolds. Furthermore, if is compact, these unions are finite.
Proof.
First, note that, by Theorem 1, either for some or is not identically 0 on . In the former case, the result is trivially true; so, assume that is not identically 0 on . Therefore, for any , we can translate by and rotate its coordinate system to apply Theorem 2—that is, there exists a sufficiently small open set around such that
where the values are as in Theorem 2.
Since is dense in ,
Furthermore, if is compact, the open cover has a finite subcover —that is,
Defining the index set
we obtain
Since, for each , is either empty or a single point,
is a countable set of points and is finite when is compact. Furthermore, each , where , is itself a finite union of i-dimensional submanifolds. Hence,
is a countable union of i-dimensional submanifolds. When is compact, this union is also finite. □
Note that Theorem 3 agrees with the results of [] when the cases overlap. Indeed, consider the case in which is a ball centered at the origin, , and the noise covariance matrix , where is the identity matrix and . Then, [] shows that the capacity-achieving distribution is supported on a finite number of concentric $(n-1)$-spheres. Each $(n-1)$-sphere is an $(n-1)$-dimensional submanifold.
In the next two theorems, we show that has Lebesgue measure 0 and is nowhere dense in .
Theorem 4.
Suppose that either
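1. the input alphabet is compact, or
2. the input alphabet is $\mathbb{R}^n$ and the radial even-moment constraint has order greater than 2.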
Let denote the n-dimensional Lebesgue measure. Then,
Proof.
By Theorem 3, we have
where is countable. Note that for each , is either empty or a single point; so, . Furthermore, for each and , is a finite disjoint union of i-dimensional submanifolds, and for , each submanifold has Lebesgue measure 0. Therefore, . □
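The measure-zero conclusion can be visualized with a covering argument on the simplest example of a lower-dimensional submanifold, a circle in the plane (the shape of a shell in the spherically symmetric case). The sketch below is an illustration only, not part of the proof: it covers the unit circle with grid cells and reports the total area of the cover as the grid is refined. The radius, window, and grid sizes are hypothetical.

```python
import math

def covering_area(radius, half_width, cells_per_side):
    """Total area of grid cells in [-half_width, half_width]^2 that meet the circle."""
    h = 2.0 * half_width / cells_per_side
    count = 0
    for i in range(cells_per_side):
        x_lo, x_hi = -half_width + i * h, -half_width + (i + 1) * h
        # Smallest and largest |x| attained on this column of cells.
        x_near = 0.0 if x_lo <= 0.0 <= x_hi else min(abs(x_lo), abs(x_hi))
        x_far = max(abs(x_lo), abs(x_hi))
        for j in range(cells_per_side):
            y_lo, y_hi = -half_width + j * h, -half_width + (j + 1) * h
            y_near = 0.0 if y_lo <= 0.0 <= y_hi else min(abs(y_lo), abs(y_hi))
            y_far = max(abs(y_lo), abs(y_hi))
            # A closed cell meets the circle exactly when the radius lies between
            # the cell's smallest and largest distance from the origin.
            if math.hypot(x_near, y_near) <= radius <= math.hypot(x_far, y_far):
                count += 1
    return count * h * h

areas = [covering_area(1.0, 2.0, m) for m in (16, 64, 256)]
print(areas)
```

The covering area shrinks roughly in proportion to the cell width, which is the behavior expected of a one-dimensional set embedded in the plane.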
We will now define the notion of a subset being nowhere dense in its superset and show that is nowhere dense in .
Definition 3.
Let . A set is said to be dense in B if, for every , there exists a sequence such that
Definition 4.
Let . A set is called nowhere dense in B if, for every open set , is not dense in U.
Theorem 5.
Suppose that either
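1. the input alphabet is compact, or
2. the input alphabet is $\mathbb{R}^n$ and the radial even-moment constraint has order greater than 2.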
Then, is nowhere dense in .
Proof.
By Theorem 1, either for some or
Since, in the former case, is nowhere dense in , assume the latter and let (115) hold. Let be a non-empty open set; we will show the result by proving that is not dense in U.
Fix . Translating by , rotating the coordinate system and applying Theorem 2 shows that there exists a sufficiently small open set Q containing on which
It suffices to show the existence of a point of the form that is not the limit of any sequence in . Let be a convergent sequence in , indexed by m, for which
Using the parameterization from (100), the nth component of sequence index m satisfies one of the following:
- , or
- for some and ,
However, since , where is open, the set is uncountable. Thus, there exists t such that is not the limit of any sequence in . □
4. Discussion
This paper has considered vector-valued channels with additive Gaussian noise. Unlike much of the prior work in this area, the noise was not limited to having independent and identically distributed components. The support of the capacity-achieving input distribution was discussed when inputs were subjected to an even-moment radial constraint of order . Furthermore, the inputs were either allowed to take any value in or restricted to a compact set. When the input alphabet was the entire space, , only the case was considered since, for , the optimal input distribution is well-known to be Gaussian.
The problem was framed as a convex optimization problem that was shown to be solved by a unique input distribution . The conditions for optimality yielded a real-analytic function whose zero set contained , the support of . Using the framework of an space that was weighted by the noise density, was simplified and shown to be non-constant on . Through geometric analysis of the zero set of , was shown to be contained in a countable union of single points and submanifolds of dimensions ranging . When the input alphabet was compact, this union was further shown to be finite. Finally, it was determined that has Lebesgue measure 0 and is nowhere dense in .
This paper is an expansion of the work concerning even-moment input constraints in [] to vector-valued channels that are not necessarily spherically symmetric. Viewed as a generalization of [], it considers order rather than second-moment radial constraints and includes as a possible input alphabet. Unlike prior work, it also provides geometric results on the supports of capacity-achieving inputs to spherically asymmetric channels.
Author Contributions
Conceptualization, J.E., R.R.M. and P.M.; methodology, J.E., R.R.M. and P.M.; validation, J.E., R.R.M. and P.M.; formal analysis, J.E., R.R.M. and P.M.; investigation, J.E.; resources, J.E., R.R.M. and P.M.; writing—original draft preparation, J.E.; writing—review and editing, J.E., R.R.M. and P.M.; supervision, R.R.M. and P.M.; project administration, R.R.M. and P.M.; funding acquisition, J.E., R.R.M. and P.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported in part by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) through a CGS-M Grant (J.E.) and Discovery Grants (R.R.M. and P.M.).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Convexity and Compactness of Optimization Space
Theorem A1.
The following properties hold for sets defined in Section 3.1:
- is convex;
- for any , is convex and compact;
- is convex.
Proof.
We first show the convexity of . Let , , and . Then, since
is convex.
To show convexity of for , let , and . Since , it suffices to show that satisfies the radial even-moment constraint:
We now show the convexity of . For any , there exists for which . Since is convex, . Hence, is convex.
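The convexity argument for the moment-constrained set rests on the linearity of the radial moment in the distribution. A minimal sketch with finitely supported distributions follows; the symbols `k` (half the moment order) and `a` (the moment bound), as well as the mass points and weights, are assumptions introduced for illustration.

```python
def radial_moment(points, probs, k):
    """E[||X||^(2k)] for a distribution with finitely many mass points."""
    return sum(p * sum(c * c for c in x) ** k for x, p in zip(points, probs))

def mixture(dist1, dist2, lam):
    """Mass points and weights of lam * F1 + (1 - lam) * F2."""
    (pts1, pr1), (pts2, pr2) = dist1, dist2
    return pts1 + pts2, [lam * p for p in pr1] + [(1.0 - lam) * p for p in pr2]

k, a = 2, 20.0  # hypothetical: fourth-moment constraint with bound 20
F1 = ([(1.0, 0.0), (0.0, 2.0)], [0.5, 0.5])
F2 = ([(2.0, 0.0), (-1.0, 1.0)], [0.75, 0.25])
lam = 0.3

m1 = radial_moment(*F1, k)
m2 = radial_moment(*F2, k)
mixed_points, mixed_probs = mixture(F1, F2, lam)
m_mixed = radial_moment(mixed_points, mixed_probs, k)

print(m1, m2, m_mixed)
```

The moment of the mixture equals the same convex combination of the individual moments, so if both distributions satisfy the bound, the mixture does as well.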
It remains to show the compactness of for any . Note that the Lévy–Prokhorov metric metrizes weak convergence in []; so, sequential compactness is equivalent to compactness. To prove compactness of , we first show relative compactness, which allows us to conclude that any sequence in has a subsequence that converges to some . Further, showing that will complete the proof.
Observe that each is defined on the complete separable metric space equipped with Euclidean distance. By Prokhorov’s Theorem (Theorem 3.2.1 of []), the relative compactness of is equivalent to the tightness of ; so, we will prove the latter.
To show the tightness of , let , , and . Then, applying Markov’s inequality,
This is a uniform upper-bound for ; so, is tight and, thus, relatively compact.
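The tightness step can be made concrete: Markov's inequality applied to the 2k-th power of the norm gives a tail bound that depends only on the moment bound, so a single radius works for every feasible distribution. The sketch below assumes the constraint has the form "2k-th radial moment at most `a`"; the symbols `a`, `k`, `eps` and the sample distribution of norms are hypothetical.

```python
def tightness_radius(a, k, eps):
    """Radius r with a / r^(2k) = eps, so Markov's inequality gives P(||X|| > r) <= eps."""
    return (a / eps) ** (1.0 / (2 * k))

def tail_probability(norms, probs, r):
    """P(||X|| > r) for a distribution described by its possible norms."""
    return sum(p for value, p in zip(norms, probs) if value > r)

k, a = 2, 9.0                 # hypothetical fourth-moment bound
norms = [0.5, 1.0, 3.0]       # possible values of ||X||
probs = [0.6, 0.3, 0.1]
moment = sum(p * value ** (2 * k) for value, p in zip(norms, probs))

results = []
for eps in (0.5, 0.1, 0.01):
    r = tightness_radius(a, k, eps)
    results.append((eps, r, tail_probability(norms, probs, r)))

print(moment, results)
```

For each tolerance `eps`, the closed ball of radius `r` carries probability at least `1 - eps` under any distribution meeting the moment bound, which is the uniform compact containment that tightness requires.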
By the relative compactness of , any sequence has a subsequence that converges weakly to some . To show compactness, we must show that .
Since each , it follows that
By Theorem A.3.12 of [], since is non-negative and lower semicontinuous,
Therefore, the limiting distribution F satisfies the radial even-moment constraint imposed by .
When , we conclude that . However, for the case that is compact, we must also show almost surely that . For any index of the subsequence ,
By the Portmanteau Theorem [], since is closed,
□
Appendix B. Necessary Conditions for the Capacity-Achieving Distribution
Theorem A2.
Recall that
Suppose that solves the optimization problem in (26), where is the Lagrange multiplier corresponding to the problem in (11). Then, the following are equivalent:
- P.1
- For every ,
- P.2
- For all ,
Proof.
For any , integrating both sides of (A15) with respect to yields that (P.2) implies (P.1).
It remains to be shown that (P.1) implies (P.2). Suppose this implication is false—that is, (P.1) holds but there either exists for which
or there exists such that
If (A17) holds, let and let , where is the Heaviside step function. Then, and
contradicting (A14). Therefore, (A17) cannot be satisfied for any and we are left with the alternative that there exists for which (A18) holds; that is,
By Lemma A7, the extension of to is continuous; hence, is continuous on . Since is continuous as well, there exists such that
for every . Furthermore, since , there exists such that .
Appendix C. Integrability Results
Lemma A1.
Let and . Then, there exist positive constants η and κ for which
for any .
Proof.
With , , and
let
Then, for any , . Since is continuous and has no local maxima,
□
Lemma A2.
For any ,
Proof.
The result follows by Lemma A1. □
Lemma A3.
Let and be a random variable taking values in . If , then +1.
Proof.
Since , we have
□
Lemma A4.
For any and ,
Proof.
Note that
We proceed by first proving that
Then, since the integrand is non-negative, the Fubini–Tonelli Theorem [] justifies interchanging the order of integration in (A44), from which we conclude that (A43) and, hence, the left side of (A42) is finite.
By Lemma A1, for any ,
Initially considering only the inner integral in (A44), with the substitution , yields
where (A48) is due to the triangle inequality. Since is finite, there are positive constants
for which
Furthermore, since and , then by Lemma A3, . Substituting this and (A53) into (A44), we obtain
□
Appendix D. Properties of the Objective Functional
The aim of this section is to discuss the weak continuity, strict concavity, and weak differentiability of the objective function,
for the optimization problem posed in (26). These properties are instrumental in the establishment and subsequent analysis of the convex optimization problem considered in Section 3. To support the proof of Theorem A2, we show that, for arbitrary , the required properties hold on .
Theorem A3.
is weakly continuous on .
Proof.
For any , we write
where is finite by Lemma A4. Therefore, weak continuity of on is equivalent to weak continuity of on .
Let be a sequence in converging weakly to . By the Helly–Bray Theorem, since is bounded and continuous,
for any . Therefore, by Scheffé’s Lemma, the sequence converges in total variation to . It suffices to show that differential entropy is uniformly continuous over with respect to the total variation metric.
The family of densities is uniformly upper bounded. Furthermore, the corresponding random vectors, for some , uniformly satisfy the bound
given by Lemma A3. The result follows by Theorem 1 of []. □
Theorem A4.
For any , is strictly concave on .
Proof.
See Appendix E of []. □
Theorem A5.
is convex on .
Proof.
Let and . Then, and are finite and
□
We make use of the following notion of a derivative of a function defined on a convex set [].
Definition A1.
Define the weak derivative of at in the direction F by
whenever it exists.
Lemma A5.
is weakly differentiable on . For any , the weak derivative is finite and given by
Proof.
Let and . Then, noting that and are finite,
Dividing by t and taking the limit as t goes to 0 gives (A69). □
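The limit in Definition A1 can be evaluated numerically for a functional that is linear in the distribution. The sketch below assumes the standard form of the weak derivative, the limit as t decreases to 0 of [g((1 - t)F0 + tF) - g(F0)] / t, and takes g to be a radial even moment, which is an assumption made for illustration rather than the exact functional of Lemma A5. For any linear g the difference quotient equals g(F) - g(F0) for every t, so the limit is immediate. All names and values are hypothetical.

```python
import math

def moment(dist, k):
    """Return E[||X||^(2k)] for a discrete distribution given as (points, weights)."""
    points, weights = dist
    return sum(w * math.hypot(*p) ** (2 * k) for p, w in zip(points, weights))

def mix(dist0, dist, t):
    """Return (1 - t) * dist0 + t * dist as a (points, weights) pair."""
    pts0, w0 = dist0
    pts1, w1 = dist
    points = list(pts0) + list(pts1)
    weights = [(1.0 - t) * w for w in w0] + [t * w for w in w1]
    return points, weights

def difference_quotient(functional, dist0, dist, t):
    """[functional((1 - t) * dist0 + t * dist) - functional(dist0)] / t."""
    return (functional(mix(dist0, dist, t)) - functional(dist0)) / t

k = 1
F0 = ([(1.0, 0.0), (0.0, 3.0)], [0.5, 0.5])  # hypothetical base distribution
F = ([(2.0, 2.0)], [1.0])                    # hypothetical direction

def g(dist):
    return moment(dist, k)

quotients = [difference_quotient(g, F0, F, t) for t in (1e-1, 1e-2, 1e-3)]
expected = g(F) - g(F0)
print("difference quotients:", quotients, "g(F) - g(F0):", expected)
```

The quotients do not depend on t up to rounding, which reflects linearity of the functional.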
Lemma A6.
is weakly differentiable on . For any , the weak derivative at in the direction of F is given by
Proof.
The proof largely follows Appendix E from []. The step that requires special attention is the application of the Dominated Convergence Theorem in (27) of []—that is, we would like to show the integrability of
Since , there exists for which . Then, (A77) follows by Lemma A4. □
Appendix E. Analyticity of Marginal Entropy Density
Lemma A7.
For any and , the extension of to given by
is continuous in .
Proof.
Let . Fix and consider , the ball of radius around in . For any sequence converging to , there exists such that for each . Therefore, it suffices to show that for each sequence .
Since the extension of to is continuous,
where (A80) is due to the Dominated Convergence Theorem, which will be justified next.
Let and . Prior to finding a dominating function for the entire integrand in (A80), we establish the following upper bound on :
Now, by Lemma A4,
which is integrable with respect to . □
Lemma A8.
For any , has an analytic extension to an entire function on .
Proof.
For convenience of notation, we will prove the case of here.
Consider the extension of to :
where, for ,
We will show that is an entire function in for fixed . Similarly, by the symmetry of the problem, is an entire function in for fixed . We finally conclude, by Hartogs’ Theorem [], that is entire on . Therefore, it suffices to show that is entire, for which we use Morera’s Theorem.
Morera’s Theorem requires that the function under consideration, in this case , be continuous, which holds by Lemma A7. If, for any closed smooth curve , defined for , and any fixed ,
then, by Morera’s Theorem, is entire. Furthermore, by the Fubini–Tonelli Theorem [], if
then the order of integration in (A95) can be interchanged such that integration with respect to t is performed first. Under this condition, since the extension of
to is analytic, we obtain
thereby fulfilling the condition for Morera’s Theorem in (A96). It remains only to justify the application of the Fubini–Tonelli Theorem by showing (A97).
To upper bound the integrand on the left side of (A97), let be sufficiently large such that
By Lemma A1, there exists such that, for all ,
We proceed by splitting the integral with respect to into two intervals:
For , let . Let and note that, since ,
From (A106), we obtain the upper bound
Therefore, substituting (A105), for any ,
for some constants . Applying similar reasoning when shows that there are constants for which
We now integrate with respect to . For any and ,
Therefore, is uniformly bounded over and
□
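The contour-integral condition used with Morera’s Theorem in the proof above can be illustrated numerically. The sketch below is not part of the proof: it approximates the integral of a function around a closed circle with the trapezoidal rule, once for the entire function exp(-z^2) and once for the non-holomorphic function conj(z). The function `contour_integral`, the chosen circle, and both integrands are hypothetical choices made for illustration.

```python
import cmath
import math

def contour_integral(f, center, radius, n=400):
    """Trapezoidal approximation of the integral of f over a circle, counterclockwise."""
    total = 0j
    for j in range(n):
        theta = 2.0 * math.pi * j / n
        e = cmath.exp(1j * theta)
        z = center + radius * e                       # point on the circle
        dz = 1j * radius * e * (2.0 * math.pi / n)    # z'(theta) * d(theta)
        total += f(z) * dz
    return total

center, radius = 0.5 + 0.2j, 1.5

# Entire integrand: the closed-contour integral vanishes.
entire_case = contour_integral(lambda z: cmath.exp(-z * z), center, radius)

# Non-holomorphic integrand: the integral equals 2*pi*i*radius**2, not zero.
non_analytic_case = contour_integral(lambda z: z.conjugate(), center, radius)

print("exp(-z^2):", abs(entire_case))
print("conj(z):  ", non_analytic_case)
```

The first integral is zero up to rounding, while the second is not, which is the distinction Morera’s Theorem exploits: a continuous function whose integral vanishes over every closed curve is holomorphic.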
Appendix F. Polynomial Bounds
Lemma A9.
The function
satisfies .
Proof.
First, note that, for any and satisfying , we have
Then, for any , we obtain the lower bound
□
Lemma A10.
For any , the function
satisfies , where .
Proof.
For ,
□
References
- Alireza Banani, S.; Vaughan, R.G. Compensating for Non-Linear Amplifiers in MIMO Communications Systems. IEEE Trans. Antennas Propag. 2012, 60, 700–714.
- Liang, C.P.; Jong, J.H.; Stark, W.E.; East, J.R. Nonlinear amplifier effects in communications systems. IEEE Trans. Microw. Theory Tech. 1999, 47, 1461–1466.
- Raich, R.; Zhou, G.T. On the modeling of memory nonlinear effects of power amplifiers for communication applications. In Proceedings of the 2002 IEEE 10th Digital Signal Processing Workshop, 2002 and the 2nd Signal Processing Education Workshop, Pine Mountain, GA, USA, 16 October 2002; pp. 7–10.
- Smith, J.G. The information capacity of amplitude- and variance-constrained scalar Gaussian channels. Inf. Control. 1971, 18, 203–219.
- Rassouli, B.; Clerckx, B. On the Capacity of Vector Gaussian Channels With Bounded Inputs. IEEE Trans. Inf. Theory 2016, 62, 6884–6903.
- Shamai, S.; Bar-David, I. The capacity of average and peak-power-limited quadrature Gaussian channels. IEEE Trans. Inf. Theory 1995, 41, 1060–1071.
- Dytso, A.; Al, M.; Poor, H.V.; Shamai Shitz, S. On the Capacity of the Peak Power Constrained Vector Gaussian Channel: An Estimation Theoretic Perspective. IEEE Trans. Inf. Theory 2019, 65, 3907–3921.
- Dytso, A.; Yagli, S.; Poor, H.V.; Shamai Shitz, S. The Capacity Achieving Distribution for the Amplitude Constrained Additive Gaussian Channel: An Upper Bound on the Number of Mass Points. IEEE Trans. Inf. Theory 2020, 66, 2006–2022.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Sommerfeld, J.; Bjelakovic, I.; Boche, H. On the boundedness of the support of optimal input measures for Rayleigh fading channels. In Proceedings of the 2008 IEEE International Symposium on Information Theory, Toronto, ON, Canada, 6–11 July 2008; pp. 1208–1212.
- Dytso, A.; Goldenbaum, M.; Shamai, S.; Poor, H.V. Upper and Lower Bounds on the Capacity of Amplitude-Constrained MIMO Channels. In Proceedings of the GLOBECOM 2017—2017 IEEE Global Communications Conference, Singapore, 4–8 December 2017; pp. 1–6.
- Chan, T.H.; Hranilovic, S.; Kschischang, F.R. Capacity-achieving probability measure for conditionally Gaussian channels with bounded inputs. IEEE Trans. Inf. Theory 2005, 51, 2073–2088.
- Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai (Shitz), S. Amplitude Constrained MIMO Channels: Properties of Optimal Input Distributions and Bounds on the Capacity. Entropy 2019, 21, 200.
- Fahs, J.J.; Abou-Faycal, I.C. Using Hermite Bases in Studying Capacity-Achieving Distributions Over AWGN Channels. IEEE Trans. Inf. Theory 2012, 58, 5302–5322.
- Fahs, J.; Tchamkerten, A.; Yousefi, M.I. On the Optimal Input of the Nondispersive Optical Fiber. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 131–135.
- Bhaskara Rao, K.; Bhaskara Rao, M. Theory of Charges; Pure and Applied Mathematics; Academic Press: New York, NY, USA, 1983.
- Han, T.S. Information-Spectrum Methods in Information Theory; Springer: Berlin/Heidelberg, Germany, 2003.
- Dudley, R.M. Integration. In Real Analysis and Probability, 2nd ed.; Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 2002; pp. 114–151.
- Rose, K. A mapping approach to rate-distortion computation and analysis. IEEE Trans. Inf. Theory 1994, 40, 1939–1952.
- Johnston, W. The Weighted Hermite Polynomials Form a Basis for L2(R). Am. Math. Mon. 2014, 121, 249–253.
- Reed, M.; Simon, B. II—Hilbert Spaces. In Methods of Modern Mathematical Physics; Reed, M., Simon, B., Eds.; Academic Press: New York, NY, USA, 1972; pp. 36–66.
- Alajaji, F.; Chen, P. An Introduction to Single-User Information Theory; Springer Undergraduate Texts in Mathematics and Technology; Springer: Singapore, 2018.
- Sitaram, A.; Sundari, M.; Thangavelu, S. Uncertainty principles on certain Lie groups. Proc. Indian Acad. Sci. Math. Sci. 1995, 105, 135–151.
- Krantz, S.; Parks, H. A Primer of Real Analytic Functions; Advanced Texts Series; Birkhäuser: Boston, MA, USA, 2002.
- Shiryaev, A.; Chibisov, D. Probability-1; Graduate Texts in Mathematics; Springer: New York, NY, USA, 2016.
- Dupuis, P.; Ellis, R. A Weak Convergence Approach to the Theory of Large Deviations; Wiley Series in Probability and Statistics; Wiley: New York, NY, USA, 1997.
- Billingsley, P. Convergence of Probability Measures; Wiley Series in Probability and Statistics; Wiley: New York, NY, USA, 2013.
- Ghourchian, H.; Gohari, A.; Amini, A. Existence and Continuity of Differential Entropy for a Class of Distributions. IEEE Commun. Lett. 2017, 21, 1469–1472.
- Gunning, R.; Rossi, H. Analytic Functions of Several Complex Variables; AMS Chelsea Publishing, AMS Chelsea Pub.: Englewood Cliffs, NJ, USA, 2009.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).