1. Introduction
We consider a discrete-time additive real-valued white Gaussian noise (AWGN) channel subject to a peak-power constraint. The channel output is given by

$$Y = X + Z,$$

where the input random variable $X$ satisfies the constraint $|X| \le A$ almost surely (a.s.), and $Z$ is a standard normal random variable independent of $X$. The capacity under the amplitude constraint is

$$C(A) = \max_{P_X:\ |X| \le A \text{ a.s.}} I(X;Y).$$

We denote by $X^\star$ a capacity-achieving input random variable and by $Y^\star = X^\star + Z$ the corresponding induced output random variable. In general, both the exact value of the capacity $C(A)$ and the precise structure of the capacity-achieving distribution $P_{X^\star}$ remain unknown.
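To make the objective concrete, the following minimal Python sketch evaluates $I(X;Y)$ for a discrete input on $[-A, A]$ through $I(X;Y) = h(Y) - h(Z)$, with $h(Z) = \tfrac{1}{2}\log(2\pi e)$ and $h(Y)$ obtained by the trapezoidal rule. The helper names (`mutual_information`, `pam_input`), the integration window (atoms $\pm 8$) and the grid size are illustrative assumptions, not part of the analysis.

```python
import math


def gaussian_pdf(y):
    """Standard normal density phi(y)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def output_pdf(y, atoms, probs):
    """Density of Y = X + Z for a discrete X with the given atoms and probabilities."""
    return sum(p * gaussian_pdf(y - x) for x, p in zip(atoms, probs))


def mutual_information(atoms, probs, num_points=4001, pad=8.0):
    """Approximate I(X;Y) in nats as h(Y) - h(Z).

    h(Z) = 0.5*log(2*pi*e) is exact; h(Y) uses the trapezoidal rule on
    [min(atoms) - pad, max(atoms) + pad] (the tails beyond are negligible).
    """
    lo, hi = min(atoms) - pad, max(atoms) + pad
    step = (hi - lo) / (num_points - 1)
    h_y = 0.0
    for i in range(num_points):
        f = output_pdf(lo + i * step, atoms, probs)
        integrand = -f * math.log(f) if f > 0.0 else 0.0
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        h_y += weight * integrand * step
    return h_y - 0.5 * math.log(2.0 * math.pi * math.e)


def pam_input(amplitude, num_atoms):
    """Equally spaced, equiprobable atoms on [-A, A] (a PAM constellation)."""
    if num_atoms == 1:
        return [0.0], [1.0]
    spacing = 2.0 * amplitude / (num_atoms - 1)
    atoms = [-amplitude + k * spacing for k in range(num_atoms)]
    return atoms, [1.0 / num_atoms] * num_atoms


if __name__ == "__main__":
    atoms, probs = pam_input(1.0, 2)  # binary input {-1, +1}, i.e., A = 1
    print(f"I(X;Y) = {mutual_information(atoms, probs):.4f} nats")
```

For the binary input at $A = 1$ the value lies slightly below $\tfrac{1}{2}\log 2$, the Gaussian-input value at the same power, as it must.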
1.1. Problem Formulation
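In contrast to prior work focusing on exact optimality, we study the minimal support size required to achieve capacity up to an $\varepsilon$-gap. Specifically, for $\varepsilon > 0$, define

$$N_\varepsilon(A) = \min\left\{ |\mathrm{supp}(P_X)| : |X| \le A \text{ a.s.},\ I(X;Y) \ge C(A) - \varepsilon \right\}.$$

While the precise scaling of $|\mathrm{supp}(P_{X^\star})|$ remains unknown, this relaxed formulation turns out to be significantly more tractable. In particular, we are able to characterize the exact scaling of $N_\varepsilon(A)$ for a range of regimes. For example, when the gap decays polynomially with $A$, i.e., $\varepsilon = A^{-\alpha}$ for some $\alpha > 0$, we obtain a sharp characterization of $N_\varepsilon(A)$.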
1.2. Literature Review
The literature on the amplitude-constrained channel is large; we do not attempt to survey it fully and mention only the most relevant results. For a comprehensive review, interested readers are referred to [1,2,3] and the references therein.
Studying the capacity of the amplitude-constrained AWGN channel is a classical problem in information theory, originating in the work of Shannon [4]. A fundamental result by Smith [5,6] shows that the capacity-achieving input distribution is discrete with finitely many mass points, in contrast to the average-power-constrained setting, where the capacity-achieving distribution is Gaussian. Subsequent work further characterized the structure of the optimal input, including the transition thresholds at which binary and ternary constellations are optimal [7]. However, the precise scaling of the support size with the amplitude constraint remains unresolved: the best-known non-asymptotic bounds, [8] and [1], place it between order $A\sqrt{\log A}$ and order $A^2$, respectively.
Somewhat intriguingly, numerical investigations have suggested a range of alternative asymptotic behaviors. In particular, Mattingly et al. [
9] reported an empirical scaling of order
as
, based on experiments involving not only the AWGN channel but also several non-Gaussian models, including the binomial channel and certain two-dimensional channels. A follow-up work by Abbott and Machta [
10] provided a heuristic, physics-inspired justification for this scaling. These findings coexist with Zhang’s spacing-based heuristic, which suggests growth of order
([
11] pp. 91, 95, 96), while earlier work conjectured linear scaling [
1], a claim that has since been disproved [
8].
We argue that this apparent diversity of scaling laws can be naturally interpreted through the lens of $\varepsilon$-optimality. Indeed, all numerical procedures implicitly operate with a finite tolerance, effectively computing inputs that are only $\varepsilon$-capacity-achieving for some algorithm-dependent $\varepsilon$. From this perspective, different observed scalings correspond to different regimes of $\varepsilon$, rather than to intrinsic properties of the exact optimizer. This viewpoint also explains the well-known numerical sensitivity of the problem: in the large-$A$ regime, the optimal output distribution becomes nearly uniform in the interior, and the deviations that distinguish competing inputs occur at a scale comparable to numerical precision. Consequently, implementations based on, e.g., the Blahut–Arimoto algorithm [12,13], which involve repeated numerical integrations of log-densities, are particularly susceptible to bias and instability. As a result, different numerical tolerances and methodologies can lead to markedly different empirical scaling laws.
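The sensitivity described above can be explored with a bare-bones Blahut–Arimoto iteration on a discretized version of the channel. The sketch below is an illustration under stated assumptions (a uniform input grid on $[-A, A]$, sampled and renormalized Gaussian transition rows, and an arbitrary tolerance and iteration cap); it is not the procedure used in [9,10,11]. It returns $I(p) \le C_{\text{grid}} \le \max_i D(W_i \| q)$ together with the input weights, from which a numerical "support size" can be read off by counting grid points above a weight threshold.

```python
import math


def blahut_arimoto(amplitude, num_inputs=21, num_outputs=201, pad=7.0,
                   tol=1e-4, max_iter=300):
    """Blahut-Arimoto on a discretized Y = X + Z, |X| <= A (illustrative sketch).

    Returns (lower, upper, weights, iterations) with lower = I(p) and
    upper = max_i D(W_i || q), which sandwich the grid channel's capacity.
    """
    xs = [-amplitude + 2.0 * amplitude * i / (num_inputs - 1) for i in range(num_inputs)]
    ys = [-(amplitude + pad) + 2.0 * (amplitude + pad) * j / (num_outputs - 1)
          for j in range(num_outputs)]
    W = []
    for x in xs:
        row = [math.exp(-0.5 * (y - x) ** 2) for y in ys]
        total = sum(row)
        W.append([w / total for w in row])  # sampled Gaussian row, renormalized
    # sum_j W_ij log W_ij does not change across iterations, so precompute it.
    neg_entropy = [sum(w * math.log(w) for w in row if w > 0.0) for row in W]

    p = [1.0 / num_inputs] * num_inputs
    lower = upper = 0.0
    iterations = 0
    for iterations in range(1, max_iter + 1):
        q = [sum(p[i] * W[i][j] for i in range(num_inputs)) for j in range(num_outputs)]
        log_q = [math.log(v) for v in q]
        # div[i] = D(W_i || q) for every input grid point
        div = [neg_entropy[i] - sum(w * lq for w, lq in zip(W[i], log_q))
               for i in range(num_inputs)]
        lower = sum(pi * di for pi, di in zip(p, div))
        upper = max(div)
        if upper - lower < tol:
            break
        # Multiplicative update p_i <- p_i exp(D_i) / normalizer (shifted for stability).
        weights = [pi * math.exp(di - upper) for pi, di in zip(p, div)]
        total = sum(weights)
        p = [w / total for w in weights]
    return lower, upper, p, iterations


if __name__ == "__main__":
    lower, upper, p, iters = blahut_arimoto(1.0)
    print(f"iterations={iters}, I(p)={lower:.5f}, upper={upper:.5f}")
    for threshold in (1e-2, 1e-3, 1e-4):
        print(f"grid points with weight > {threshold}: {sum(w > threshold for w in p)}")
```

Changing `tol`, `max_iter`, the grid sizes, or the weight threshold changes the reported count, which is one concrete form of the algorithm-dependent $\varepsilon$ discussed above.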
In parallel, a large body of work has developed capacity bounds via entropy methods, duality, and estimation-theoretic representations; see [14,15,16] and references therein. There is also a large body of work that focuses on showing discreteness of capacity-achieving inputs for non-Gaussian channels [17,18,19,20,21,22,23,24,25].
1.3. Outline and Contributions
Section 2 collects the main technical tools used throughout the paper, including stability bounds for entropy, approximation results for Gaussian mixtures, and properties of the wrapping operation. These ingredients form the backbone of both the achievability and converse arguments.
Section 3 presents the main results together with their derivations. In particular, we obtain sharp bounds on $N_\varepsilon(A)$ and provide a complete characterization in the regime where $\varepsilon$ decays at most polynomially in $A$. We also discuss the behavior in the exponential regime and highlight the transition in scaling.
Section 4 concludes the paper with a summary of the main findings and a discussion of open problems.
We conclude this section by presenting relevant notation.
1.4. Notation
Throughout the paper, deterministic scalar quantities are denoted by lowercase letters and random variables by uppercase letters.
We denote the distribution of a random variable $X$ by $P_X$. The support set of $P_X$ is denoted and defined as

$$\mathrm{supp}(P_X) = \left\{x \in \mathbb{R} : P_X\big((x - r, x + r)\big) > 0 \text{ for all } r > 0\right\}.$$

The notation $|\cdot|$, depending on the context, denotes either the absolute value or the cardinality of a set. For example, $|\mathrm{supp}(P_X)|$ denotes the size of the support of $P_X$. All logarithms are taken with base $e$. The density of a standard normal will be denoted by $\phi$.
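We denote the differential entropy of a continuous random variable $X$ by $h(X)$. Given two probability distributions $P$ and $Q$ with probability density functions (pdfs) $p$ and $q$, respectively, we will require the following distances:

$$\mathrm{TV}(P,Q) = \frac{1}{2}\int_{\mathbb{R}} |p(y) - q(y)|\,\mathrm{d}y, \qquad D(P\|Q) = \int_{\mathbb{R}} p(y)\log\frac{p(y)}{q(y)}\,\mathrm{d}y, \qquad \chi^2(P\|Q) = \int_{\mathbb{R}} \frac{\big(p(y) - q(y)\big)^2}{q(y)}\,\mathrm{d}y,$$

with the understanding that the relative entropy and $\chi^2(P\|Q)$ are equal to infinity if $P$ is not absolutely continuous with respect to $Q$. Finally,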
.
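For concreteness, the three divergences can be approximated for densities sampled on a uniform grid as sketched below. The grid, the trapezoidal rule, and the factor $\tfrac{1}{2}$ in the total variation (the convention under which Pinsker's inequality reads $\mathrm{TV} \le \sqrt{D/2}$) are assumptions of this sketch.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for samples on a uniform grid with spacing `step`."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def total_variation(p, q, step):
    """TV(P, Q) = (1/2) * integral |p - q|."""
    return 0.5 * trapezoid([abs(a - b) for a, b in zip(p, q)], step)


def relative_entropy(p, q, step):
    """D(P || Q) = integral p log(p / q) in nats; +inf if P is not << Q on the grid."""
    values = []
    for a, b in zip(p, q):
        if a == 0.0:
            values.append(0.0)
        elif b == 0.0:
            return math.inf
        else:
            values.append(a * math.log(a / b))
    return trapezoid(values, step)


def chi_square(p, q, step):
    """chi^2(P || Q) = integral (p - q)^2 / q; +inf if P is not << Q on the grid."""
    values = []
    for a, b in zip(p, q):
        if b == 0.0:
            if a != 0.0:
                return math.inf
            values.append(0.0)
        else:
            values.append((a - b) ** 2 / b)
    return trapezoid(values, step)


if __name__ == "__main__":
    lo, hi, n = -10.0, 10.5, 8001
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    normal = lambda y, m: math.exp(-0.5 * (y - m) ** 2) / math.sqrt(2.0 * math.pi)
    p = [normal(y, 0.0) for y in grid]
    q = [normal(y, 0.5) for y in grid]
    tv = total_variation(p, q, step)
    kl = relative_entropy(p, q, step)
    chi2 = chi_square(p, q, step)
    # Closed forms for N(0,1) vs N(0.5,1): D = 0.125, chi^2 = exp(0.25) - 1.
    print(f"TV={tv:.5f}, D={kl:.5f}, chi2={chi2:.5f}")
    print("Pinsker:", tv <= math.sqrt(kl / 2.0), "| D <= log(1 + chi2):", kl <= math.log1p(chi2))
```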
2. Tools and Preliminaries
In this section, we collect several technical ingredients that underlie our analysis. At a high level, our approach approximates the output distribution induced by the capacity-achieving input by finite Gaussian mixtures and then quantifies the resulting loss in mutual information. This leads to three main components: (i) a stability bound for entropy in terms of $\chi^2$-divergence, (ii) approximation guarantees for Gaussian mixtures, and (iii) a wrapping argument that allows us to compare distributions to the uniform law on a circle.
2.1. Entropy Loss via $\chi^2$-Divergence
The first ingredient is a quantitative stability bound for differential entropy under $\chi^2$-perturbations. This allows us to convert approximation guarantees at the level of densities into bounds on mutual information.
Lemma 1
(Entropy loss controlled by $\chi^2$-divergence).
Let be densities on such that and . Then,where . Proof. Write
so that
and
Moreover,
Then
Therefore,
By Cauchy–Schwarz,
For the second term, using
for all
, we get
Hence,
Combining the two bounds gives
which proves the claim. □
The key feature of Lemma 1 is that it controls the entropy loss in terms of the $\chi^2$-divergence. In particular, a small $\chi^2$-error directly translates into a small loss in mutual information, up to a multiplicative prefactor, which is bounded uniformly in Lemma 2.
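As we read the proof sketch of Lemma 1, writing $f = g(1+\Delta)$ gives the identity $h(f) - h(g) = -\int g\Delta\log g - D(f\|g)$. Cauchy–Schwarz then yields $h(f) - h(g) \le \sqrt{\chi^2(f\|g)\,K_g}$ with $K_g = \int g\log^2 g$, and, since $D(f\|g) \le \log(1+\chi^2(f\|g)) \le \chi^2(f\|g)$, also $|h(f) - h(g)| \le \sqrt{\chi^2(f\|g)\,K_g} + \chi^2(f\|g)$. This is our reconstruction and the lemma's exact constants may differ. The sketch checks both inequalities on two Gaussian mixtures chosen purely for illustration.

```python
import math


def gaussian_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def mixture(atoms, probs):
    """Gaussian location mixture y -> sum_i probs[i] * phi(y - atoms[i])."""
    return lambda y: sum(p * gaussian_pdf(y - x) for x, p in zip(atoms, probs))


def entropy_gap_terms(f, g, lo=-12.0, hi=12.0, n=6001):
    """Trapezoidal estimates of h(f) - h(g), chi^2(f||g) and K_g = int g log^2 g."""
    step = (hi - lo) / (n - 1)
    h_f = h_g = chi2 = k_g = 0.0
    for i in range(n):
        y = lo + i * step
        w = step * (0.5 if i in (0, n - 1) else 1.0)
        fy, gy = f(y), g(y)
        h_f -= w * fy * math.log(fy)
        h_g -= w * gy * math.log(gy)
        chi2 += w * (fy - gy) ** 2 / gy
        k_g += w * gy * math.log(gy) ** 2
    return h_f - h_g, chi2, k_g


if __name__ == "__main__":
    f = mixture([-2.0, -1.0, 0.0, 1.0, 2.0], [0.2] * 5)  # plays the role of the target output
    g = mixture([-2.0, 0.0, 2.0], [0.3, 0.4, 0.3])       # coarser approximating mixture
    gap, chi2, k_g = entropy_gap_terms(f, g)
    one_sided = math.sqrt(chi2 * k_g)
    print(f"h(f)-h(g)={gap:.5f}, sqrt(chi2*K_g)={one_sided:.5f}, chi2={chi2:.5f}")
    print("one-sided bound holds:", gap <= one_sided)
    print("two-sided bound holds:", abs(gap) <= one_sided + chi2)
```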
We will apply this lemma in a setting where f corresponds to the output density induced by the capacity-achieving input, and g corresponds to an approximating Gaussian mixture. The next result provides a uniform bound on the prefactor in this setting.
Lemma 2.
Fix . Let where and be supported on . Then, Proof. Fix
. First, notice that
since
. Since
is strictly decreasing in
, for
we have
; hence
. Averaging gives
Thus,
Consequently,
where (
26) follows from the bound in (
24) and from
thanks to (
22), and (
27) follows from sequential application of the inequality
, the bound
, and the evaluation of the moments of
Z. □
Lemma 2 shows that the entropy sensitivity grows at most quadratically in $A$. Combined with Lemma 1, this implies that a small $\chi^2$-approximation error suffices to ensure near-optimality in mutual information.
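As we read the proof of Lemma 2, it rests on the pointwise sandwich $\phi(|y| + A) \le g(y) \le 1/\sqrt{2\pi}$ for any mixture $g = P * \phi$ with $P$ supported on $[-A, A]$ (the upper bound because $\phi \le 1/\sqrt{2\pi}$; the lower bound because $|y - x| \le |y| + A$ and $\phi$ decreases in $|t|$). Hence $|\log g(y)| \le \tfrac{1}{2}\log(2\pi) + (|y| + A)^2/2$, whose second moment under $g$ grows like $A^4$, so its square root grows like $A^2$. The sketch checks the sandwich and compares $K_g = \int g\log^2 g$ with this crude bound for a few PAM mixtures; the PAM choice and the grid are illustrative assumptions.

```python
import math


def gaussian_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def pam_mixture(amplitude, num_atoms):
    """Output density of an equiprobable PAM input on [-A, A] (num_atoms >= 2)."""
    atoms = [-amplitude + 2.0 * amplitude * k / (num_atoms - 1) for k in range(num_atoms)]
    return lambda y: sum(gaussian_pdf(y - x) for x in atoms) / num_atoms


def log_square_moment(g, amplitude, pad=10.0, n=8001):
    """Return (K_g, crude bound, sandwich flag) on the grid [-A - pad, A + pad].

    K_g = int g log^2 g; crude = int g (0.5 log(2 pi) + (|y| + A)^2 / 2)^2;
    the flag records whether phi(|y| + A) <= g(y) <= 1/sqrt(2 pi) held at every grid point.
    """
    lo, hi = -amplitude - pad, amplitude + pad
    step = (hi - lo) / (n - 1)
    k_g = crude = 0.0
    sandwich_ok = True
    for i in range(n):
        y = lo + i * step
        w = step * (0.5 if i in (0, n - 1) else 1.0)
        gy = g(y)
        sandwich_ok = sandwich_ok and (
            gaussian_pdf(abs(y) + amplitude) <= gy <= 1.0 / math.sqrt(2.0 * math.pi))
        k_g += w * gy * math.log(gy) ** 2
        crude += w * gy * (0.5 * math.log(2.0 * math.pi) + 0.5 * (abs(y) + amplitude) ** 2) ** 2
    return k_g, crude, sandwich_ok


if __name__ == "__main__":
    for A in (1.0, 2.0, 4.0, 8.0):
        k_g, crude, ok = log_square_moment(pam_mixture(A, int(2 * A) + 1), A)
        print(f"A={A}: sqrt(K_g)={math.sqrt(k_g):.3f}, sqrt(crude)={math.sqrt(crude):.3f}, sandwich={ok}")
```

The crude bound is loose (its square root grows like $A^2$), which is all that a uniform prefactor bound of the kind used here requires.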
2.2. Finite-Mixture Approximations
The second ingredient is a sharp approximation result for Gaussian mixtures. Recall that for any probability measure $P$ supported on $[-A, A]$, the induced output density takes the form

$$f_P(y) = \int \phi(y - x)\,\mathrm{d}P(x), \qquad y \in \mathbb{R},$$

i.e., a Gaussian location mixture.
Our goal is to approximate such densities using mixtures supported on finitely many points. The following result, by Ma, Wu, and Yang, provides near-optimal bounds on this approximation error.
Lemma 3
(Ma–Wu–Yang approximation theorem [26]).
Let and let denote the set of probability measures supported on . Let denote the set of probability measures supported on at most m points. Define the worst-case best approximation error where is the mixture density (29). Then there exists a universal constant such that for all and ,
The key takeaway from Lemma 3 is that Gaussian mixtures supported on m points can approximate arbitrary mixtures supported on $[-A, A]$ with exponentially small $\chi^2$-error. Moreover, the approximation exhibits two distinct regimes:
A quadratic regime, where the error behaves like $\exp(-\Theta(m^2/A^2))$;
A large-$m$ regime, where the error behaves like $\exp(-\Theta(m\log(m/A^2)))$.
The dichotomy in Lemma 3 comes from the two approximation mechanisms used in the proof of [26]. When $m$ is large compared to $A^2$, one can approximate the mixing distribution globally by matching a large number of moments, for example via Gauss quadrature. This gives the large-$m$ exponent $m\log(m/A^2)$. In contrast, when $m$ is below the $A^2$ scale, global moment matching is no longer efficient over the whole interval $[-A, A]$. The remedy is to partition $[-A, A]$ into smaller intervals and apply moment matching locally to the conditional distribution on each subinterval. Optimizing the number of intervals and the number of atoms per interval leads to the quadratic exponent $m^2/A^2$. Thus, the two rates reflect the transition between a global moment-matching regime and a local moment-matching regime.
This dichotomy translates directly into the two regimes of our main results. In particular, it is precisely this approximation behavior that determines the scaling of $N_\varepsilon(A)$.
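The two mechanisms can be contrasted numerically in a toy case. The sketch takes the uniform mixing distribution $P = \mathrm{Unif}[-A, A]$ (an illustrative choice) and builds $m$-atom approximations $Q$ either from one $m$-point Gauss–Legendre rule on $[-A, A]$ (global moment matching) or from $k$ equal subintervals carrying $\lfloor m/k \rfloor$ nodes each (local moment matching), then reports $\chi^2(f_P \| f_Q)$ on a grid. Gauss–Legendre is used only as a convenient moment-matching rule; we do not claim it is the construction of [26], and such small sizes need not reflect the asymptotic regimes of Lemma 3.

```python
import math


def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1] (Newton's method)."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # standard initial guess
        for _ in range(100):
            p_prev, p = 1.0, x  # P_0(x), P_1(x)
            for k in range(2, n + 1):
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # P_n'(x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def matched_atoms(amplitude, m, k):
    """Atoms/probabilities approximating Unif[-A, A]: k equal subintervals, each with an
    (m // k)-point Gauss-Legendre rule. k = 1 is global matching; k > 1 is local."""
    nodes, weights = gauss_legendre(m // k)
    width = 2.0 * amplitude / k
    atoms, probs = [], []
    for j in range(k):
        left = -amplitude + j * width
        for x, w in zip(nodes, weights):
            atoms.append(left + 0.5 * width * (x + 1.0))
            probs.append(0.5 * w / k)  # rule weights sum to 2 on [-1, 1]
    return atoms, probs


def uniform_output_pdf(y, amplitude):
    """Density of U + Z, U ~ Unif[-A, A], Z ~ N(0, 1), computed via erfc for accuracy."""
    tail = lambda s: 0.5 * math.erfc(s / math.sqrt(2.0))
    t = abs(y)
    return (tail(t - amplitude) - tail(t + amplitude)) / (2.0 * amplitude)


def chi_square_to_mixture(amplitude, atoms, probs, pad=10.0, n=4001):
    """chi^2(f_P || f_Q), P = Unif[-A, A], Q = sum_i probs[i] * delta_{atoms[i]}."""
    lo, hi = -amplitude - pad, amplitude + pad
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        y = lo + i * step
        f_q = sum(p * math.exp(-0.5 * (y - x) ** 2) for x, p in zip(atoms, probs))
        f_q /= math.sqrt(2.0 * math.pi)
        diff = uniform_output_pdf(y, amplitude) - f_q
        total += step * (0.5 if i in (0, n - 1) else 1.0) * diff * diff / f_q
    return total


if __name__ == "__main__":
    A = 3.0
    for m in (6, 12, 18):
        global_err = chi_square_to_mixture(A, *matched_atoms(A, m, 1))
        local_err = chi_square_to_mixture(A, *matched_atoms(A, m, 3))
        print(f"m={m}: global chi2={global_err:.3e}, local (k=3) chi2={local_err:.3e}")
```

Comparing the two columns as $m$ and $A$ vary is one way to see when local matching starts to pay off; we make no claim about where the crossover falls at these small sizes.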
2.3. Wrapped Random Variables
The final ingredient is a wrapping argument, introduced in [8], that allows us to compare output distributions to the uniform distribution on a circle. This plays a key role in the converse, where we lower bound the support size by quantifying how far the induced output can be from uniformity.
For
, define the wrapping map
by
where
.
The wrapping operation has several useful properties summarized below.
Proposition 1.
- 1. (Wrapped density formula) If W has density , then has density
- 2. (Uniformity after wrapping) Let be independent of . Then is uniform on :
- 3. (Wrapped-mixture lower bound) Let be discrete with , and let . Then
- 4. (Uniform bound for the wrapped density) Let and assume and with . Then there is an explicit absolute constant such that
- 5. Let and assume and with . Then, where .
Proof. The proof of the first three statements can be found in [8]. The last two statements are proved in Appendix A. □
Conceptually, the wrapping argument allows us to reduce the problem to approximating the uniform distribution on a circle using wrapped Gaussian mixtures. Since uniformity is highly structured, this provides a robust way to obtain lower bounds on the number of support points.
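A small numerical illustration of wrapping is sketched below. It assumes the common wrapping map $x \mapsto x - T\lfloor x/T \rfloor$ onto $[0, T)$ (an assumption; the normalization used in the paper may differ) and checks two facts of the kind listed in Proposition 1: the wrapped density $\sum_{k} f(t + kT)$ is a probability density on $[0, T)$ that matches Monte Carlo samples, and adding an independent $U \sim \mathrm{Unif}[0, T)$ to any discrete $X$ before wrapping gives the uniform density $1/T$ (our reading of Property 2).

```python
import math
import random


def wrap(x, period):
    """Assumed wrapping map onto [0, period): x - period * floor(x / period)."""
    return x - period * math.floor(x / period)


def wrapped_density(density, t, period, terms=60):
    """Wrapped density at t in [0, period): sum_k density(t + k * period), |k| <= terms."""
    return sum(density(t + k * period) for k in range(-terms, terms + 1))


def gaussian_pdf(y, mean=0.0):
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def shifted_by_uniform(atoms, probs, period):
    """Density of X + U for discrete X and an independent U ~ Unif[0, period)."""
    def density(y):
        # X + U = y requires U = y - x in [0, period), i.e., y - period < x <= y.
        return sum(p for x, p in zip(atoms, probs) if y - period < x <= y) / period
    return density


if __name__ == "__main__":
    T = 2.0
    shifted_normal = lambda y: gaussian_pdf(y, 0.3)
    # Total mass of the wrapped density over one period (rectangle rule, periodic integrand).
    n = 2000
    mass = sum(wrapped_density(shifted_normal, i * T / n, T) for i in range(n)) * T / n
    # Monte Carlo estimate of P(wrap(Z + 0.3) < T / 2) versus the wrapped-density integral.
    rng = random.Random(0)
    frac = sum(wrap(rng.gauss(0.3, 1.0), T) < T / 2 for _ in range(100_000)) / 100_000
    half = sum(wrapped_density(shifted_normal, (i + 0.5) * 0.001, T) for i in range(1000)) * 0.001
    print(f"mass={mass:.9f}, Monte Carlo={frac:.4f}, integral={half:.4f}")
    # Uniformity after adding an independent uniform dither and wrapping.
    dens = shifted_by_uniform([-1.7, 0.2, 3.9], [0.2, 0.5, 0.3], T)
    print([wrapped_density(dens, t, T) for t in (0.1, 0.77, 1.95)])  # each is 1/T up to rounding
```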
3. Main Result
3.1. Some Basic Properties of $N_\varepsilon(A)$
In this section, we collect several basic but important structural properties of $N_\varepsilon(A)$. In particular, we show that $N_\varepsilon(A)$ behaves monotonically in $\varepsilon$ and recovers the support size of the capacity-achieving input in the limit $\varepsilon \to 0$.
Theorem 1
(Basic properties of $N_\varepsilon(A)$). Fix $A > 0$. Then the following statements hold.
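- 1. For every $\varepsilon > 0$, the quantity $N_\varepsilon(A)$ is well-defined and finite. In particular, $N_\varepsilon(A) \le |\mathrm{supp}(P_{X^\star})|$, where $P_{X^\star}$ is the capacity-achieving input distribution.
- 2. The map $\varepsilon \mapsto N_\varepsilon(A)$ is non-increasing on $(0, \infty)$.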
- 3.
Choose an integer and let Then, for every Consequently, by choosing
Proof. We only show the last statement. Let
. By definition of
, no input supported on fewer than
points can achieve capacity. In particular, no input supported on at most
points can achieve capacity. Hence
Set
Therefore, for any
, we have
On the other hand, by definition of
, every discrete input
with
satisfies
Thus, no such input can be
-capacity-achieving. It follows that any
-capacity-achieving input must satisfy
By the definition of
, this implies that
This proves that for every
,
This concludes the proof. □
Remark 1.
Theorem 1 shows that $N_\varepsilon(A)$ interpolates between two regimes. For large $\varepsilon$, small support sizes suffice, while as $\varepsilon \to 0$, the quantity recovers the exact support size of the capacity-achieving input. Moreover, the quantity quantifies how much capacity is lost when restricting to inputs with fewer than points.
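Writing $C_k(A)$ for the largest mutual information achievable by inputs on $[-A, A]$ with at most $k$ atoms (the maximum is attained by compactness), the minimal support size $N_\varepsilon(A)$ of Section 1.1 can be restated as $N_\varepsilon(A) = \min\{k : C_k(A) \ge C(A) - \varepsilon\}$. The sketch below evaluates this formula on a hypothetical table of $C_k$ values (made up for illustration, not computed from the channel). It shows the behavior described in Theorem 1: a non-increasing step function of $\varepsilon$ that freezes at $N^\star$ once $\varepsilon < C(A) - C_{N^\star - 1}(A)$.

```python
def n_epsilon(capacity, best_rates, eps):
    """Smallest k with best_rates[k - 1] >= capacity - eps.

    best_rates[k - 1] plays the role of C_k(A), the best mutual information with
    at most k atoms; the list is assumed non-decreasing and to reach `capacity`
    at k = N* (the support size of the optimal input).
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    for k, rate in enumerate(best_rates, start=1):
        if rate >= capacity - eps:
            return k
    raise ValueError("no listed support size is eps-capacity-achieving")


if __name__ == "__main__":
    # Hypothetical values for illustration only: C = 1.0, N* = 5, C - C_4 = 0.01.
    capacity = 1.0
    best_rates = [0.50, 0.80, 0.95, 0.99, 1.00]
    for eps in (0.6, 0.3, 0.1, 0.02, 0.005, 0.0):
        print(f"eps={eps}: N_eps={n_epsilon(capacity, best_rates, eps)}")
```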
3.2. Bounds
We now state the main results of this work, which characterize the scaling of $N_\varepsilon(A)$ in different regimes of the capacity gap.
Theorem 2.
Suppose that . Then the following bounds hold.
Polynomial capacity gap. For , Exponential capacity gap.
The constants are given by We make the following remarks:
For polynomially decaying gaps, the upper and lower bounds match up to constants, yielding the characterization

$$N_\varepsilon(A) = \Theta\big(A\sqrt{\log A}\big).$$

In particular, the scaling is independent of $\alpha$ at the level of first-order asymptotics.
The lower bound is universal across both regimes and reflects an intrinsic limitation: even moderately accurate approximation of the optimal output requires at least order $A\sqrt{\log A}$ mass points.
In contrast, the upper bounds reveal a phase transition: while polynomial accuracy can be achieved with $O(A\sqrt{\log A})$ points, exponentially small gaps might require significantly larger support, up to order .
Taken together, these results suggest that different apparent scaling laws may reflect different accuracy regimes. We emphasize that the bounds proved in this paper are asymptotic and apply in the large-amplitude regime. Thus, they should not be interpreted as a direct finite-$A$ explanation of the numerical observations in [9,10,11]. Rather, they provide an asymptotic mechanism by which different scalings can emerge from different implicit choices of the accuracy level $\varepsilon$.
Note that our analysis has focused primarily on the regime in which $\varepsilon$ decays with $A$. For many practical purposes, however, it is also natural to consider the case where $\varepsilon$ is fixed. Obtaining sharp bounds in this regime for arbitrary choices of $\varepsilon$ appears to be more delicate. Nevertheless, one can obtain a simple consequence from an Ozarow–Wyner-type bound [27,28,29]. In particular, if $X_{\mathrm{PAM}}$ is a PAM input with the number of mass points chosen proportional to $A$, then, as $A \to \infty$,
Consequently, this implies the fixed-gap upper bound
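A numerical sketch of the fixed-gap statement compares an equiprobable PAM input whose number of atoms grows linearly in $A$ (here $\lceil A \rceil + 1$ atoms, i.e., spacing about 2, an illustrative choice) with the simple upper bound $C(A) \le \log\big(1 + 2A/\sqrt{2\pi e}\big)$ nats. We take this bound as an assumption; we believe it is the McKellips bound of [14]. Because the true $C(A)$ lies below the bound, the printed difference overestimates the actual gap $C(A) - I(X_{\mathrm{PAM}};Y)$.

```python
import math


def gaussian_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def pam_mutual_information(amplitude, num_atoms, pad=8.0, points_per_unit=100):
    """I(X;Y) in nats for an equiprobable PAM input on [-A, A] (num_atoms >= 2)."""
    atoms = [-amplitude + 2.0 * amplitude * k / (num_atoms - 1) for k in range(num_atoms)]
    lo, hi = -amplitude - pad, amplitude + pad
    n = int((hi - lo) * points_per_unit) + 1
    step = (hi - lo) / (n - 1)
    h_y = 0.0
    for i in range(n):
        f = sum(gaussian_pdf(lo + i * step - x) for x in atoms) / num_atoms
        h_y -= step * (0.5 if i in (0, n - 1) else 1.0) * f * math.log(f)
    return h_y - 0.5 * math.log(2.0 * math.pi * math.e)  # I = h(Y) - h(Z)


def volume_upper_bound(amplitude):
    """Assumed upper bound C(A) <= log(1 + 2A / sqrt(2 pi e)), in nats."""
    return math.log1p(2.0 * amplitude / math.sqrt(2.0 * math.pi * math.e))


if __name__ == "__main__":
    for A in (4.0, 8.0, 16.0):
        m = math.ceil(A) + 1  # number of atoms grows linearly in A
        gap = volume_upper_bound(A) - pam_mutual_information(A, m)
        print(f"A={A}: m={m}, upper bound minus I(PAM) = {gap:.4f} nats")
```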
3.3. Achievability
In this section, we demonstrate the achievability part of the main result. We begin by presenting a general achievability bound.
Theorem 3
(Achievability bound).
Fix and . Then,whereandwhere and κ is defined in Lemma 3. Proof. Let
be capacity-achieving on
and write
. We seek to apply Lemma 3, targeting
and
m chosen as in (
57).
Case : From (
57),
. Moreover, from (
58), we have that
, and Lemma 3 (quadratic regime) yields a
Q supported on at most
m points with
Since by (
58) we also have
, we get that
Case : From (
57),
. Moreover, from (
59), we have that
; in particular
and therefore
. Lemma 3 (large-
m regime) yields a
Q supported on at most
m points such that
Since, by (
59)
, we have that
In both cases, we have produced a discrete
Q supported on at most
m points such that
To couple
and
, let
be the distribution achieving the bound in (
64). Also, let
, with density
, and note that
where (
66) follows from Lemma 1; (
67) follows from Lemma 2; and (
69) follows from the choice of
in (
60).
Therefore,
is feasible in (
3) with
. This concludes the proof. □
With Theorem 3 at our disposal, we now derive the upper bounds of Theorem 2 in its two regimes.
In Theorem 3, let
and note that for
Hence
. Substituting into the expression for
in (
58) and using
where in the last inequality we have used that
for all
and let
(the last inequality holds because
dominates
and 1 for
). The proof is concluded by noting that
, which from Theorem 3 implies that
.
In Theorem 3 let
. To bound
, use
for
:
Hence
. Substituting into the expression for
in (
58) and using
where in the last inequality we have used that
for all
with
.
3.4. Converse
We now show the converse bound.
Theorem 4.
Let and let . Thenwhere , is defined in (
36)
and Proof. Assume
X is such that
, which, by [30], implies that
We also need the following bound [8]:
Now,
where (80) and (81) follow from Proposition 1; (82) follows from the triangle inequality; (83) follows from Pinsker’s inequality; (84) follows from the data-processing inequality; and the first bound in (85) follows from (78) and the second bound follows from (79). Rearrange and take logarithms of (85) (using
):
By rearranging the terms, we conclude the proof. □
As a consequence of Theorem 2 note that since
,
and hence
. Plugging this bound into (
86) yields
which gives the explicit scaling form
Consequently, for all
A large enough (so that
),
4. Conclusions
In this work, we studied the amplitude-constrained AWGN channel from the perspective of near-optimal input distributions. Rather than focusing on the exact capacity-achieving input, whose support size remains poorly understood, we introduced the quantity $N_\varepsilon(A)$, which captures the minimal support size required to achieve capacity up to an $\varepsilon$-gap.
We showed that this relaxed formulation is significantly more tractable and admits sharp characterizations across different regimes. In particular, for polynomially decaying gaps, we established that $N_\varepsilon(A) = \Theta(A\sqrt{\log A})$, while for exponentially small gaps, the required support size increases to at most order .
Beyond the technical results, our approach provides a conceptual explanation for the variety of scaling laws observed in prior numerical studies. Namely, different empirical scalings can be interpreted as arising from different implicit choices of $\varepsilon$.
Several open problems remain. In particular, it would be of interest to obtain tighter bounds in the exponential regime, as well as to better understand the behavior of the exact optimizer and its relation to $N_\varepsilon(A)$ as $\varepsilon \to 0$. More broadly, the $\varepsilon$-capacity perspective may prove useful in other settings where exact structural characterization is difficult but near-optimal behavior is more accessible.
Author Contributions
Methodology, L.B. and A.D.; Validation, L.B. and A.D.; Formal analysis, L.B. and A.D.; Investigation, L.B. and A.D.; Writing–original draft, L.B. and A.D.; Writing–review & editing, L.B. and A.D. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
No new data were created or analyzed in this study.
Acknowledgments
This paper is dedicated to H. Vincent Poor, whose profound contributions to estimation theory and generous mentorship have been a lasting source of inspiration.
Conflicts of Interest
Author Alex Dytso was employed by the company Qualcomm Flarion Technology, Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abbreviations
The following abbreviation is used in this manuscript:
| AWGN | Additive White Gaussian Noise |
Appendix A. Proof of Proposition 1
Appendix A.1. Proof of Property 4
We start with some helper lemmas.
Lemma A1.
For any , Proof. The proof follows from the inequality: for
,
□
Lemma A2.
Let and , where X and are supported on . Assume . Let and defineThen,Moreover, for , as shown in ([8] Prop. 1). Proof. Let
maximize
, so
. By ([8] Equation (56)), for
,
Hence, with
,
Also
(since
I has length 2). By data processing (coarsening to the event
I),
; hence
By Lemma A1,
. Consider two cases.
Case 1: . Then
and thus
Therefore
, and since
, we get
.
Case 2: . Then
, and since
,
Combining the two cases yields (A4). □
We now prove our final claim, which is a uniform bound on the wrapped density.
Lemma A3.
Let and assume and with . Then there is an explicit absolute constant such thatOne may take Proof. Let
. By Lemma A2,
Under
, we have
; hence for all
,
Now apply the wrapping formula (32) with
:
For
, the argument ranges over
, so each term is at most
M; hence
For
, note that for
,
so
with
. By ([8] Equation (75)),
Therefore,
where we used the integral comparison
for decreasing
g and
. Combining the pieces,
which is exactly (A11). □
Appendix A.2. Proof of Property 5
We start with the following lemma.
Lemma A4.
Let P be a distribution on with density f satisfying , and let Q be uniform on . Then Proof. Since
Q has density
,
which rearranges to (A20). □
The proof is completed by noting that, according to Property 2, is uniform on .
References
- Dytso, A.; Yagli, S.; Poor, H.V.; Shamai, S. The Capacity Achieving Distribution for the Amplitude Constrained Additive Gaussian Channel: An Upper Bound on the Number of Mass Points. IEEE Trans. Inf. Theory 2020, 66, 2006–2022.
- Dytso, A.; Goldenbaum, M.; Shamai, S.; Poor, H.V. Upper and lower bounds on the capacity of amplitude-constrained MIMO channels. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Singapore, 4–8 December 2017; pp. 1–6.
- Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai, S. When Are Discrete Channel Inputs Optimal?—Optimization Techniques and Some New Results. In Proceedings of the Conference on Information Sciences and Systems, Princeton, NJ, USA, 21–23 March 2018; pp. 1–6.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Smith, J.G. On the Information Capacity of Peak and Average Power Constrained Gaussian Channels. Ph.D. Dissertation, University of California, Berkeley, CA, USA, 1969.
- Smith, J.G. The information capacity of amplitude-and variance-constrained scalar Gaussian channels. Inform. Control 1971, 18, 203–219.
- Sharma, N.; Shamai, S. Transition points in the capacity-achieving distribution for the peak-power limited AWGN and free-space optical intensity channels. Probl. Inf. Transm. 2010, 46, 283–299.
- Wang, H.; Barletta, L.; Dytso, A. An Improved Lower Bound on Cardinality of Support of the Amplitude-Constrained AWGN Channel. arXiv 2025, arXiv:2512.22691.
- Mattingly, H.H.; Transtrum, M.K.; Abbott, M.C.; Machta, B.B. Maximizing the information learned from finite data selects a simple model. Proc. Natl. Acad. Sci. USA 2018, 115, 1760–1765.
- Abbott, M.C.; Machta, B.B. A scaling law from discrete to continuous solutions of channel capacity problems in the low-noise limit. J. Stat. Phys. 2019, 176, 214–227.
- Zhang, Z. Discrete Noninformative Priors. Ph.D. Thesis, Yale University, New Haven, CT, USA, 1994.
- Blahut, R. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory 1972, 18, 460–473.
- Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inf. Theory 1972, 18, 14–20.
- McKellips, A.L. Simple tight bounds on capacity for the peak-limited discrete-time channel. In Proceedings of the IEEE International Symposium on Information Theory, Chicago, IL, USA, 27 June–2 July 2004; p. 348.
- Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai, S. Amplitude constrained MIMO channels: Properties of optimal input distributions and bounds on the capacity. Entropy 2019, 21, 200.
- Thangaraj, A.; Kramer, G.; Böcherer, G. Capacity Bounds for Discrete-Time, Amplitude-Constrained, Additive White Gaussian Noise Channels. IEEE Trans. Inf. Theory 2017, 63, 4172–4182.
- Abou-Faycal, I.C.; Trott, M.D.; Shamai, S. The capacity of discrete-time memoryless Rayleigh-fading channels. IEEE Trans. Inf. Theory 2001, 47, 1290–1301.
- Katz, M.; Shamai, S. On the capacity-achieving distribution of the discrete-time noncoherent and partially coherent AWGN channels. IEEE Trans. Inf. Theory 2004, 50, 2257–2270.
- Shamai, S. Capacity of a pulse amplitude modulated direct detection photon channel. IEE Proc. I (Commun. Speech Vis.) 1990, 137, 424–430.
- Dytso, A.; Barletta, L.; Shamai, S. Properties of the Support of the Capacity-Achieving Distribution of the Amplitude-Constrained Poisson Noise Channel. IEEE Trans. Inf. Theory 2021, 67, 7050–7066.
- Fahs, J.; Abou-Faycal, I. On properties of the support of capacity-achieving distributions for additive noise channel models with input cost constraints. IEEE Trans. Inf. Theory 2017, 64, 1178–1198.
- Tchamkerten, A. On the discreteness of capacity-achieving distributions. IEEE Trans. Inf. Theory 2004, 50, 2773–2778.
- Chan, T.H.; Hranilovic, S.; Kschischang, F.R. Capacity-achieving probability measure for conditionally Gaussian channels with bounded inputs. IEEE Trans. Inf. Theory 2005, 51, 2073–2088.
- Abou El Hessen, T.; Tuninetti, D.; Belkhadir, A.; Banerjee, A. Channel Capacity Analysis with Nonlinear Effects of RF Power Amplifiers. In Proceedings of the 2025 IEEE International Symposium on Information Theory (ISIT), Ann Arbor, MI, USA, 22–27 June 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–6.
- Stapmanns, J.; Dias, C.; Eilers, L.; Kühn, T.; Pfister, J.P. Phase Transitions of the Additive Uniform Noise Channel with Peak Amplitude and Cost Constraint. arXiv 2025, arXiv:2510.12427.
- Ma, Y.; Wu, Y.; Yang, P. On the Best Approximation by Finite Gaussian Mixtures. IEEE Trans. Inf. Theory 2025, 71, 5469–5492.
- Ungerboeck, G. Channel coding with multilevel/phase signals. IEEE Trans. Inf. Theory 1982, 28, 55–67.
- Ozarow, L.H.; Wyner, A.D. On the capacity of the Gaussian channel with a finite number of input levels. IEEE Trans. Inf. Theory 1990, 36, 1426–1428.
- Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shitz, S.S. A generalized Ozarow-Wyner capacity bound with applications. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1058–1062.
- Topsøe, F. An information theoretical identity and a problem involving capacity. Stud. Sci. Math. Hung. 1967, 2, 246.
| Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |