Article

Support Size of ε-Capacity-Achieving Inputs for the Amplitude-Constrained AWGN Channel

¹ Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milano, Italy
² Qualcomm Flarion Technology, Inc., Bridgewater, NJ 08807, USA
* Author to whom correspondence should be addressed.
Entropy 2026, 28(5), 500; https://doi.org/10.3390/e28050500
Submission received: 15 April 2026 / Revised: 24 April 2026 / Accepted: 25 April 2026 / Published: 28 April 2026

Abstract

We study the discrete-time amplitude-constrained additive white Gaussian noise (AWGN) channel from the perspective of near-optimal input distributions in the high-SNR, or equivalently large-amplitude, regime. While it is known that the capacity-achieving input is discrete with finitely many mass points, the precise scaling of its support size as a function of the amplitude constraint remains an open problem. In this work, we instead consider the minimal support size required to achieve capacity up to an $\varepsilon$-gap. We introduce the quantity $K_\varepsilon(A)$, defined as the smallest support size among discrete inputs supported on $[-A, A]$ that achieve mutual information within $\varepsilon$ of capacity. We show that this relaxed formulation is significantly more tractable and admits sharp characterizations in several vanishing-gap regimes. In particular, for polynomially decaying gaps, $\varepsilon = A^{-\beta}$ with $\beta \ge 1$, we establish that $K_\varepsilon(A) = \Theta(A\sqrt{\log A})$ as $A \to \infty$. For exponentially small gaps, we obtain bounds between order $A\sqrt{\log A}$ and order $A^{3/2}$. Our approach combines approximation-theoretic bounds for Gaussian mixtures with information-theoretic control of entropy via the $\chi^2$-divergence, together with a wrapping argument that relates the problem to approximating the uniform distribution on a circle. Beyond the technical results, our framework provides a conceptual explanation for the variety of scaling laws observed in prior numerical studies, suggesting that these may correspond to different regimes of $\varepsilon$-optimality rather than intrinsic properties of the exact optimizer.

1. Introduction

We consider a discrete-time additive real-valued white Gaussian noise (AWGN) channel subject to a peak-power constraint. The channel output is given by
$$Y = X + Z,$$
where the input random variable $X$ satisfies the constraint $|X| \le A$ almost surely (a.s.), and $Z$ is a standard normal random variable independent of $X$. The capacity under the amplitude constraint is
$$C(A) := \sup_{P_X:\ \mathrm{supp}(P_X) \subseteq [-A, A]} I(X; Y).$$
We denote by $X^\star$ a capacity-achieving input random variable and by $Y^\star$ the corresponding induced output random variable. In general, both the exact value of the capacity $C(A)$ and the precise structure of the capacity-achieving distribution $P_{X^\star}$ remain unknown.

1.1. Problem Formulation

In contrast to prior work focusing on exact optimality, we study the minimal support size required to achieve capacity up to an $\varepsilon$-gap. Specifically, for $\varepsilon > 0$, define
$$K_\varepsilon(A) := \min\left\{ |\mathrm{supp}(P_X)| : \mathrm{supp}(P_X) \subseteq [-A, A],\ P_X \text{ discrete},\ I(X;Y) \ge C(A) - \varepsilon \right\}.$$
While the precise scaling of $|\mathrm{supp}(P_{X^\star})|$ remains unknown, this relaxed formulation turns out to be significantly more tractable. In particular, we are able to characterize the exact scaling of $K_\varepsilon(A)$ in a range of regimes. For example, when the gap decays polynomially in $A$, i.e., $\varepsilon = A^{-\beta}$ for some $\beta \ge 1$, we obtain a sharp characterization of $K_\varepsilon(A)$.

1.2. Literature Review

The literature on the amplitude-constrained channel is large; we do not attempt a full survey and mention only the most relevant results. For a comprehensive review, interested readers are referred to [1,2,3] and the references therein.
Studying the capacity of the amplitude-constrained AWGN channel is a classical problem in information theory, originating in the work of Shannon [4]. A fundamental result by Smith [5,6] shows that the capacity-achieving input distribution is discrete with finitely many mass points, in contrast to the average-power-constrained setting, where the capacity-achieving distribution is Gaussian. Subsequent work further characterized the structure of the optimal input, including the transition thresholds at which binary and ternary constellations are optimal [7]. However, the precise scaling of the support size with the amplitude constraint remains unresolved: the best-known non-asymptotic bounds, [8] and [1], place it between $A\sqrt{\log A}$ and $A^2$, respectively.
Somewhat intriguingly, numerical investigations have suggested a range of alternative asymptotic behaviors. In particular, Mattingly et al. [9] reported an empirical scaling of order $A^{4/3}$ as $A \to \infty$, based on experiments involving not only the AWGN channel but also several non-Gaussian models, including the binomial channel and certain two-dimensional channels. A follow-up work by Abbott and Machta [10] provided a heuristic, physics-inspired justification for this scaling. These findings coexist with Zhang's spacing-based heuristic, which suggests growth of order $A\sqrt{\log A}$ ([11], pp. 91, 95, 96), while earlier work conjectured linear scaling [1], a claim that has since been disproved [8].
We argue that this apparent diversity of scaling laws can be naturally interpreted through the lens of $\varepsilon$-optimality. Indeed, all numerical procedures implicitly operate with a finite tolerance, effectively computing inputs that are only $\varepsilon$-capacity-achieving for some algorithm-dependent $\varepsilon$. From this perspective, different observed scalings correspond to different regimes of $\varepsilon = \varepsilon(A)$, rather than to intrinsic properties of the exact optimizer. This viewpoint also explains the well-known numerical sensitivity of the problem: in the large-$A$ regime, the optimal output distribution becomes nearly uniform in the interior, and the deviations that distinguish competing inputs occur at a scale comparable to numerical precision. Consequently, implementations based on, e.g., the Blahut–Arimoto algorithm [12,13], which involve repeated numerical integrations of log-densities, are particularly susceptible to bias and instability. As a result, different numerical tolerances and methodologies can lead to markedly different empirical scaling laws.
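The role of the numerical tolerance can be made concrete. The sketch below runs a textbook Blahut–Arimoto iteration on a discretized version of the channel. It stops once Blahut's lower and upper capacity bounds are within a user-chosen tolerance `tol`, which plays the role of an algorithm-dependent $\varepsilon$. The input grid, the truncated output grid, and all function names are illustrative assumptions. This is not the procedure used in [9,10,11].

```python
import numpy as np

def gaussian_channel_matrix(x, y):
    """Row-stochastic W[i, j] approximating P(Y in bin j | X = x[i]) on a uniform y-grid."""
    w = np.exp(-0.5 * (y[None, :] - x[:, None]) ** 2)
    return w / w.sum(axis=1, keepdims=True)  # renormalize after truncating the Gaussian tails

def blahut_arimoto(w, tol=1e-6, max_iter=100_000):
    """Return (p, lower, upper) with lower <= C_discrete <= upper; stop when upper - lower <= tol."""
    row_negent = np.sum(w * np.log(w), axis=1)  # sum_j W_ij log W_ij (all entries are > 0 here)
    p = np.full(w.shape[0], 1.0 / w.shape[0])   # start from the uniform input
    for _ in range(max_iter):
        q = p @ w                               # induced output distribution
        d = row_negent - w @ np.log(q)          # d_i = D(W_i || q) in nats
        lower = np.log(np.dot(p, np.exp(d)))    # Blahut lower bound on capacity
        upper = d.max()                         # Blahut upper bound on capacity
        if upper - lower <= tol:
            break
        p = p * np.exp(d)                       # multiplicative Blahut-Arimoto update
        p /= p.sum()
    return p, lower, upper

def mutual_information(p, w):
    q = p @ w
    d = np.sum(w * np.log(w), axis=1) - w @ np.log(q)
    return float(np.dot(p, d))

A = 1.0
x = np.linspace(-A, A, 21)               # candidate mass points (illustrative grid)
y = np.linspace(-A - 8.0, A + 8.0, 801)  # truncated output grid (illustrative)
w = gaussian_channel_matrix(x, y)
p, lower, upper = blahut_arimoto(w, tol=1e-6)

p_binary = np.zeros_like(x)
p_binary[[0, -1]] = 0.5                  # equiprobable input on {-A, A}
i_binary = mutual_information(p_binary, w)
print(f"C in [{lower:.6f}, {upper:.6f}] nats; binary input gives {i_binary:.6f} nats")
```

Changing `tol` changes which inputs the procedure accepts as optimal. That is the sense in which a numerical study fixes an implicit $\varepsilon$.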
In parallel, a large body of work has developed capacity bounds via entropy methods, duality, and estimation-theoretic representations; see [14,15,16] and references therein. There is also a large body of work that focuses on showing discreteness of capacity-achieving inputs for non-Gaussian channels [17,18,19,20,21,22,23,24,25].

1.3. Outline and Contributions

Section 2 collects the main technical tools used throughout the paper, including stability bounds for entropy, approximation results for Gaussian mixtures, and properties of the wrapping operation. These ingredients form the backbone of both the achievability and converse arguments.
Section 3 presents the main results together with their derivations. In particular, we obtain sharp bounds on $K_\varepsilon(A)$ and provide a complete characterization in the regime where $\varepsilon$ decays at most polynomially in $A$. We also discuss the behavior in the exponential regime and highlight the transition in scaling.
Section 4 concludes the paper with a summary of the main findings and a discussion of open problems.
We conclude this section by presenting relevant notation.

1.4. Notation

Throughout the paper, the deterministic scalar quantities are denoted by lowercase letters and random variables are denoted by uppercase letters.
We denote the distribution of a random variable $X$ by $P_X$. The support of $P_X$ is defined as
$$\mathrm{supp}(P_X) = \{x : P_X(D) > 0 \text{ for every open set } D \ni x\}.$$
Depending on the context, the notation $|\cdot|$ denotes either the absolute value or the cardinality of a set. For example, $|\mathrm{supp}(P_X)|$ denotes the size of the support of $P_X$. All logarithms are natural (base $e$). The density of a standard normal random variable is denoted by $\phi$.
We denote the differential entropy of a continuous random variable $X$ by $h(X)$. Given two probability distributions $P$ and $Q$ with probability density functions (pdfs) $p$ and $q$, respectively, we will use the following divergences:
$$\text{Total variation:}\quad \mathrm{TV}(P, Q) = \frac{1}{2}\int |p(x) - q(x)|\,dx,$$
$$\text{Relative entropy:}\quad D(P\|Q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx,$$
$$\chi^2\text{-divergence:}\quad \chi^2(P\|Q) = \int \frac{(p(x) - q(x))^2}{q(x)}\,dx,$$
with the understanding that the relative entropy and the $\chi^2$-divergence are equal to infinity if $P$ is not absolutely continuous with respect to $Q$. Finally, $\log^+(x) = \max\{\log(x), 0\}$.
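The three divergences can be evaluated numerically for densities sampled on a grid. The sketch below uses plain Riemann sums, an assumption that is adequate for smooth, rapidly decaying densities. It applies them to $P = \mathcal{N}(0,1)$ and $Q = \mathcal{N}(1,1)$, for which $D(P\|Q) = 1/2$ and $\chi^2(P\|Q) = e - 1$. It also checks two standard relations: Pinsker's inequality $\mathrm{TV} \le \sqrt{D/2}$ and $D \le \log(1 + \chi^2)$.

```python
import numpy as np

def divergences(p, q, dx):
    """Return (TV, KL in nats, chi^2) for densities p, q sampled on a uniform grid of spacing dx.
    Assumes q > 0 on the grid, so that P is absolutely continuous w.r.t. Q there."""
    tv = 0.5 * np.sum(np.abs(p - q)) * dx
    mask = p > 0                                   # convention 0 log 0 = 0
    kl = np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx
    chi2 = np.sum((p - q) ** 2 / q) * dx
    return tv, kl, chi2

x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)          # N(0, 1)
q = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)  # N(1, 1)
tv, kl, chi2 = divergences(p, q, dx)
print(f"TV = {tv:.4f}, D = {kl:.4f}, chi2 = {chi2:.4f}")
```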

2. Tools and Preliminaries

In this section, we collect several technical ingredients that underlie our analysis. At a high level, our approach approximates the output distribution induced by the capacity-achieving input using finite Gaussian mixtures and then quantifies the resulting loss in mutual information. This leads to three main components: (i) a stability bound for entropy in terms of the $\chi^2$-divergence, (ii) approximation guarantees for Gaussian mixtures, and (iii) a wrapping argument that allows us to compare distributions to the uniform law on a circle.

2.1. Entropy Loss via $\chi^2$

The first ingredient is a quantitative stability bound for differential entropy under $\chi^2$-perturbations. This allows us to convert approximation guarantees at the level of densities into bounds on mutual information.
Lemma 1 
(Entropy loss controlled by $\chi^2$). Let $f, g$ be densities on $\mathbb{R}$ such that $\chi^2(g\|f) < \infty$ and $\int (\log f)^2 f < \infty$. Then,
$$h(f) - h(g) \le \|\log f\|_{L^2(f)}\sqrt{\chi^2(g\|f)} + \chi^2(g\|f),$$
where $\|\log f\|_{L^2(f)} := \left(\int (\log f)^2 f\right)^{1/2}$.
Proof. 
Write
$$u(x) := \frac{g(x)}{f(x)} - 1,$$
so that $g(x) = (1 + u(x)) f(x)$ and
$$\int u(x) f(x)\,dx = \int g(x)\,dx - \int f(x)\,dx = 0.$$
Moreover,
$$\chi^2(g\|f) = \int u(x)^2 f(x)\,dx.$$
Then
$$\begin{aligned}
h(g) &= -\int g(x)\log g(x)\,dx \\
&= -\int (1 + u(x)) f(x) \log\big((1 + u(x)) f(x)\big)\,dx \\
&= -\int (1 + u(x)) f(x)\log f(x)\,dx - \int (1 + u(x)) f(x)\log(1 + u(x))\,dx.
\end{aligned}$$
Therefore,
$$\begin{aligned}
h(f) - h(g) &= -\int f(x)\log f(x)\,dx + \int (1 + u(x)) f(x)\log f(x)\,dx + \int (1 + u(x)) f(x)\log(1 + u(x))\,dx \\
&= \int u(x) f(x)\log f(x)\,dx + \int (1 + u(x)) f(x)\log(1 + u(x))\,dx.
\end{aligned}$$
By the Cauchy–Schwarz inequality,
$$\int u(x) f(x)\log f(x)\,dx \le \left(\int u(x)^2 f(x)\,dx\right)^{1/2}\left(\int (\log f(x))^2 f(x)\,dx\right)^{1/2} = \|\log f\|_{L^2(f)}\sqrt{\chi^2(g\|f)}.$$
For the second term, using $\log(1 + u) \le u$ for all $u > -1$, we get
$$(1 + u(x))\log(1 + u(x)) \le (1 + u(x))\,u(x) = u(x) + u(x)^2.$$
Hence,
$$\int (1 + u(x)) f(x)\log(1 + u(x))\,dx \le \int \big(u(x) + u(x)^2\big) f(x)\,dx = \int u(x) f(x)\,dx + \int u(x)^2 f(x)\,dx = \chi^2(g\|f).$$
Combining the two bounds gives
$$h(f) - h(g) \le \|\log f\|_{L^2(f)}\sqrt{\chi^2(g\|f)} + \chi^2(g\|f),$$
which proves the claim. □
The key feature of Lemma 1 is that it controls the entropy loss in terms of the $\chi^2$-divergence. In particular, a small $\chi^2$-error directly translates into a small loss in mutual information, up to a multiplicative factor depending on $\|\log f\|_{L^2(f)}$.
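Lemma 1 can be sanity-checked numerically. The sketch below picks two illustrative Gaussian mixtures. These are hypothetical choices, not objects from the paper. It evaluates both sides of the bound by Riemann sums on a truncated grid and prints them.

```python
import numpy as np

y = np.linspace(-15.0, 15.0, 6001)
dy = y[1] - y[0]

def phi(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

def mixture(atoms, probs):
    """Gaussian location mixture sum_i probs[i] * phi(y - atoms[i]) on the grid y."""
    return sum(w * phi(y - x) for x, w in zip(atoms, probs))

def entropy(f):
    """Differential entropy -int f log f, by a Riemann sum."""
    return -np.sum(f * np.log(f)) * dy

f = mixture([-2.0, 0.0, 2.0], [0.4, 0.2, 0.4])  # plays the role of f in Lemma 1 (illustrative)
g = mixture([-2.0, 2.0], [0.5, 0.5])           # plays the role of g in Lemma 1 (illustrative)

chi2 = np.sum((g - f) ** 2 / f) * dy
log_norm = np.sqrt(np.sum(np.log(f) ** 2 * f) * dy)  # ||log f||_{L^2(f)}
lhs = entropy(f) - entropy(g)
rhs = log_norm * np.sqrt(chi2) + chi2
print(f"h(f) - h(g) = {lhs:.4f} <= {rhs:.4f}")
```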
We will apply this lemma in a setting where $f$ is the output density induced by the capacity-achieving input and $g$ is an approximating Gaussian mixture. The next result provides a uniform bound on the prefactor $\|\log f\|_{L^2(f)}$ in this setting.
Lemma 2. 
Fix $A > 0$. Let $Y = X + Z$, where $Z \sim \mathcal{N}(0,1)$ and $X \sim P_X$ is supported on $[-A, A]$. Then,
$$\|\log f_Y\|_{L^2(f_Y)} \le \sqrt{10}\,(1 + A^2).$$
Proof. 
Fix $y \in \mathbb{R}$. First, notice that
$$f_Y(y) = \mathbb{E}[\phi(y - X)] \le \frac{1}{\sqrt{2\pi}} < 1,$$
since $\phi(y) \le \frac{1}{\sqrt{2\pi}}$. Since $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is strictly decreasing in $|x|$, for $\theta \in [-A, A]$ we have $|y - \theta| \le |y| + A$; hence $\phi(y - \theta) \ge \phi(|y| + A)$. Averaging gives
$$f_Y(y) = \int \phi(y - \theta)\,dP_X(\theta) \ge \phi(|y| + A) = (2\pi)^{-1/2}\exp\left(-\frac{(|y| + A)^2}{2}\right).$$
Thus,
$$-\log f_Y(y) \le \frac{(|y| + A)^2}{2} + \frac{1}{2}\log(2\pi) =: \frac{(|y| + A)^2}{2} + c_0.$$
Consequently,
$$\begin{aligned}
\|\log f_Y\|_{L^2(f_Y)}^2 &= \mathbb{E}\left[(\log f_Y(Y))^2\right] \\
&\overset{(i)}{\le} \mathbb{E}\left[\left(\frac{(|X + Z| + A)^2}{2} + c_0\right)^2\right] \\
&\overset{(ii)}{\le} 8(1 + A^2)^2 + 2c_0^2 \\
&\le 10(1 + A^2)^2,
\end{aligned}$$
where (i) follows from the lower bound on $f_Y$ above together with $-\log f_Y(y) > 0$ (since $f_Y(y) < 1$), and (ii) follows from repeated application of the inequality $(a + b)^2 \le 2(a^2 + b^2)$, the bound $|X| \le A$, and the evaluation of the moments of $Z$. □
Lemma 2 shows that the entropy sensitivity grows at most quadratically in $A$. Combined with Lemma 1, this implies that a small $\chi^2$-approximation error suffices to ensure near-optimality in mutual information.
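The bound of Lemma 2 can be compared with actual values of $\|\log f_Y\|_{L^2(f_Y)}$. The sketch below evaluates the norm by a Riemann sum for a few discrete inputs on $[-A, A]$: a binary input, an equally spaced input, and a three-point input. These inputs are our own illustrative choices. For each case the code prints the norm next to $\sqrt{10}(1 + A^2)$.

```python
import numpy as np

def log_density_l2_norm(atoms, probs, margin=12.0, n=20001):
    """||log f_Y||_{L^2(f_Y)} for Y = X + Z with X discrete on `atoms`, via a Riemann sum."""
    atoms = np.asarray(atoms, dtype=float)
    a = np.max(np.abs(atoms))
    y = np.linspace(-a - margin, a + margin, n)
    dy = y[1] - y[0]
    f = np.asarray(probs) @ np.exp(-0.5 * (y[None, :] - atoms[:, None]) ** 2) / np.sqrt(2 * np.pi)
    return np.sqrt(np.sum(np.log(f) ** 2 * f) * dy)

results = []
for A in (1.0, 2.0, 5.0, 10.0):
    K = int(2 * A) + 1
    inputs = {
        "binary": (np.array([-A, A]), np.array([0.5, 0.5])),
        "pam": (np.linspace(-A, A, K), np.full(K, 1.0 / K)),
        "three-point": (np.array([-A, 0.0, A]), np.array([0.45, 0.1, 0.45])),
    }
    for name, (atoms, probs) in inputs.items():
        norm = log_density_l2_norm(atoms, probs)
        bound = np.sqrt(10) * (1 + A**2)
        results.append((A, name, norm, bound))
        print(f"A={A:5.1f} {name:11s} ||log f||={norm:7.3f}  bound={bound:8.3f}")
```

For these inputs the bound is far from tight. It is only meant to be uniform over all inputs supported on $[-A, A]$.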

2.2. Finite-Mixture Approximations

The second ingredient is a sharp approximation result for Gaussian mixtures. Recall that for any probability measure $P$ supported on $[-A, A]$, the induced output density takes the form
$$f_P(y) := \int_{-A}^{A} \phi(y - \theta)\,dP(\theta),$$
i.e., a Gaussian location mixture.
Our goal is to approximate such densities using mixtures supported on finitely many points. The following result, by Ma, Wu, and Yang, provides near-optimal bounds on this approximation error.
Lemma 3 
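(Ma–Wu–Yang approximation theorem [26]). Let $A > 0$ and let $\mathcal{P}_A^{\mathrm{Bdd}}$ denote the set of probability measures supported on $[-A, A]$. Let $\mathcal{P}_m$ denote the set of probability measures supported on at most $m$ points. Define the worst-case best approximation error
$$\mathcal{E}(m, \mathcal{P}_A^{\mathrm{Bdd}}, \chi^2) := \sup_{P \in \mathcal{P}_A^{\mathrm{Bdd}}}\ \inf_{Q \in \mathcal{P}_A^{\mathrm{Bdd}} \cap \mathcal{P}_m} \chi^2(f_Q \| f_P),$$
where $f_P$ is the mixture density defined above. Then there exists a universal constant $\kappa$ (one may take $\kappa = 16e^3$) such that for all $m \in \mathbb{N}$ and $A > 0$,
$$\mathcal{E}(m, \mathcal{P}_A^{\mathrm{Bdd}}, \chi^2) \le \begin{cases} \exp\left(-\dfrac{m\log m}{A^2}\right), & m \ge \kappa A^2, \\[2mm] \exp\left(-\dfrac{\log\kappa}{4\kappa}\,\dfrac{m^2}{A^2}\right), & 3\kappa A \le m \le \kappa A^2. \end{cases}$$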
The key takeaway from Lemma 3 is that Gaussian mixtures supported on $m$ points can approximate arbitrary mixtures supported on $[-A, A]$ with exponentially small $\chi^2$-error. Moreover, the approximation exhibits two distinct regimes:
  • A quadratic regime ($3\kappa A \le m \le \kappa A^2$), where the error behaves like $\exp(-c\,m^2/A^2)$ with $c = \log\kappa/(4\kappa)$;
  • A large-$m$ regime ($m \ge \kappa A^2$), where the error behaves like $\exp(-m\log m/A^2)$.
The dichotomy in Lemma 3 comes from the two approximation mechanisms used in the proof of [26]. When $m$ is large compared to $A^2$, one can approximate the mixing distribution globally by matching a large number of moments, for example via Gauss quadrature. This gives the large-$m$ exponent $\exp(-m\log m/A^2)$. In contrast, when $m$ is below the $A^2$ scale, global moment matching is no longer efficient over the whole interval $[-A, A]$. The remedy is to partition $[-A, A]$ into smaller intervals and apply moment matching locally to the conditional distribution on each subinterval. Optimizing the number of intervals and the number of atoms per interval leads to the quadratic exponent $\exp(-c\,m^2/A^2)$. Thus, the two rates reflect the transition between a global moment-matching regime and a local moment-matching regime.
This dichotomy will directly translate into the two regimes in our main results. In particular, it is precisely this approximation behavior that determines the scaling of $K_\varepsilon(A)$.
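The global moment-matching mechanism described above can be illustrated with Gauss–Legendre quadrature. With $m$ atoms, this rule matches the first $2m - 1$ moments of the uniform distribution on $[-A, A]$. The sketch below computes $\chi^2(f_Q\|f_P)$ for a few values of $m$. It is an illustration under our own choices ($P = \mathrm{Unif}[-A, A]$, $A = 3$), not the construction used in [26]. The error should drop quickly as $m$ grows.

```python
import numpy as np
from math import erfc, sqrt

A = 3.0
y = np.linspace(-A - 12.0, A + 12.0, 8001)
dy = y[1] - y[0]

def phi(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

# f_P for P = Unif[-A, A] equals (Phi(y + A) - Phi(y - A)) / (2A). It is written with erfc of |y|
# to avoid cancellation in the tails (the density is symmetric in y).
_erfc = np.vectorize(erfc)
f_uniform = (_erfc((np.abs(y) - A) / sqrt(2)) - _erfc((np.abs(y) + A) / sqrt(2))) / (4 * A)

def gauss_legendre_chi2(m):
    """chi^2(f_Q || f_P) when Q is the m-point Gauss-Legendre rule rescaled to Unif[-A, A]."""
    nodes, weights = np.polynomial.legendre.leggauss(m)
    atoms, probs = A * nodes, weights / 2.0     # weights on [-1, 1] sum to 2
    f_q = probs @ phi(y[None, :] - atoms[:, None])
    return np.sum((f_q - f_uniform) ** 2 / f_uniform) * dy

chi2 = {m: gauss_legendre_chi2(m) for m in (2, 4, 6, 8)}
for m, value in chi2.items():
    print(f"m = {m}: chi^2 = {value:.3e}")
```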

2.3. Wrapped Random Variables

The final ingredient is a wrapping argument, which was introduced in [8], that allows us to compare output distributions to the uniform distribution on a circle. This plays a key role in the converse, where we lower bound the support size by quantifying how far the induced output can be from uniformity.
For $B > 0$, define the wrapping map $[\cdot]_B : \mathbb{R} \to [-\pi, \pi)$ by
$$[W]_B := \frac{\pi}{B}\,(W \bmod 2B),$$
where $W \bmod 2B := W - 2B\left\lfloor \frac{W + B}{2B} \right\rfloor$.
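The wrapping map, together with the wrapped density formula and the uniformity property stated in Proposition 1 below, is easy to check numerically. In the sketch below, whose function names are ours, `wrap` implements $[\cdot]_B$ exactly as defined above. `wrapped_density` truncates the sum over $k$ at $|k| \le 50$. That truncation is an assumption, but a harmless one, since the summands decay like Gaussian tails. For $W = U + Z$ with $U \sim \mathrm{Unif}([-B, B])$ the truncated sum telescopes, so the result should equal $1/(2\pi)$ up to rounding.

```python
import numpy as np
from math import erfc, pi, sqrt

def wrap(w, B):
    """[w]_B = (pi / B) * (w mod 2B), where w mod 2B := w - 2B * floor((w + B) / (2B))."""
    w = np.asarray(w, dtype=float)
    return (pi / B) * (w - 2 * B * np.floor((w + B) / (2 * B)))

_erfc = np.vectorize(erfc)

def density_u_plus_z(w, B):
    """Density of U + Z with U ~ Unif[-B, B] independent of Z ~ N(0, 1)."""
    return (_erfc((w - B) / sqrt(2)) - _erfc((w + B) / sqrt(2))) / (4 * B)

def wrapped_density(theta, B, k_max=50):
    """(B / pi) * sum_k f_W((B / pi) * (theta + 2 pi k)), truncated at |k| <= k_max."""
    ks = np.arange(-k_max, k_max + 1)
    args = (B / pi) * (np.asarray(theta)[:, None] + 2 * pi * ks[None, :])
    return (B / pi) * density_u_plus_z(args, B).sum(axis=1)

B = 3.0
theta = np.linspace(-pi, pi, 201, endpoint=False)
f_wrapped = wrapped_density(theta, B)
print(np.max(np.abs(f_wrapped - 1 / (2 * pi))))   # uniformity after wrapping

samples = np.random.default_rng(0).normal(size=10_000) * 5.0
wrapped = wrap(samples, B)                          # values should lie in [-pi, pi)
```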
The wrapping operation has several useful properties summarized below.
Proposition 1. 
1.
(Wrapped density formula) If $W$ has density $f_W$, then $[W]_B$ has density
$$f_{[W]_B}(\theta) = \frac{B}{\pi}\sum_{k \in \mathbb{Z}} f_W\left(\frac{B}{\pi}(\theta + 2\pi k)\right), \quad \theta \in (-\pi, \pi).$$
2.
(Uniformity after wrapping) Let $U \sim \mathrm{Unif}([-B, B])$ be independent of $Z \sim \mathcal{N}(0,1)$. Then $[U + Z]_B$ is uniform on $(-\pi, \pi)$:
$$f_{[U+Z]_B}(\theta) = \frac{1}{2\pi}, \quad \theta \in (-\pi, \pi).$$
3.
(Wrapped-mixture lower bound) Let $X \in [-A, A]$ be discrete with $K := |\mathrm{supp}(P_X)| \ge 2$, and let $U \sim \mathrm{Unif}([-A, A])$. Then
$$\chi^2\left(P_{[X+Z]_A} \,\|\, P_{[U+Z]_A}\right) \ge \frac{1}{2}\exp\left(-\frac{4\pi^2 K^2}{A^2}\right).$$
4.
(Uniform $L^\infty$ bound for the wrapped density) Let $Y = X + Z$ and assume $A \ge 1$ and $D(P_Y \| P_{Y^\star}) \le \varepsilon$ with $\varepsilon \le 1/A$. Then there is an explicit absolute constant $M_0$ such that
$$\|f_{[X+Z]_A}\|_\infty \le M_0.$$
One may take
$$M_0 := \left(\frac{3}{\pi} + \frac{1}{\sqrt{2\pi}}\right)\frac{2e^4}{\alpha_0}, \qquad \alpha_0 = \int_{-1}^{1} e^{-t^2/2}\,dt.$$
5.
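Let $Y = X + Z$, let $U \sim \mathrm{Unif}([-A, A])$ be independent of $Z$, and assume $A \ge 1$ and $D(P_Y \| P_{Y^\star}) \le \varepsilon$ with $\varepsilon \le 1/A$. Then,
$$C_0\,\chi^2\left(P_{[X+Z]_A} \,\|\, P_{[U+Z]_A}\right) \le \mathrm{TV}\left(P_{[X+Z]_A}, P_{[U+Z]_A}\right),$$
where $C_0 = \frac{1}{4(2\pi M_0 + 1)}$.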
Proof. 
The proof of the first three statements can be found in [8]. The last two statements are shown in Appendix A. □
Conceptually, the wrapping argument allows us to reduce the problem to approximating the uniform distribution on a circle using wrapped Gaussian mixtures. Since uniformity is highly structured, this provides a robust way to obtain lower bounds on the number of support points.
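Property 3 can be illustrated through Fourier analysis on the circle. Since $[W]_A$ is $W$ rescaled by $\pi/A$ modulo $2\pi$, the $n$-th Fourier coefficient of $P_{[X+Z]_A}$ is $\mathbb{E}[e^{\mathrm{i}n\pi X/A}]\,e^{-n^2\pi^2/(2A^2)}$. Because $P_{[U+Z]_A}$ is uniform (Property 2), Parseval's identity gives $\chi^2(P_{[X+Z]_A}\|P_{[U+Z]_A}) = \sum_{n \ne 0} |\mathbb{E}[e^{\mathrm{i}n\pi X/A}]|^2 e^{-n^2\pi^2/A^2}$. The sketch below evaluates this series and compares it with the lower bound of Property 3. It is our own illustration and is restricted to equally spaced inputs.

```python
import numpy as np

def wrapped_chi2_to_uniform(atoms, probs, A, n_max=200):
    """chi^2(P_[X+Z]_A || Unif(-pi, pi)) via Parseval on the circle:
    sum over n != 0 of |E exp(i n pi X / A)|^2 * exp(-n^2 pi^2 / A^2)."""
    n = np.arange(1, n_max + 1)
    char = np.exp(1j * np.pi / A * np.outer(n, atoms)) @ probs            # E[exp(i n pi X / A)]
    return 2.0 * np.sum(np.abs(char) ** 2 * np.exp(-(n * np.pi / A) ** 2))  # n and -n contribute equally

rows = []
for A in (5.0, 10.0):
    for K in (2, 4, 8, 16):
        atoms = np.linspace(-A, A, K)        # equally spaced K-point input (illustrative)
        probs = np.full(K, 1.0 / K)
        chi2 = wrapped_chi2_to_uniform(atoms, probs, A)
        lower = 0.5 * np.exp(-4 * np.pi**2 * K**2 / A**2)
        rows.append((A, K, chi2, lower))
        print(f"A={A:4.1f} K={K:2d} chi2={chi2:.3e} Property-3 bound={lower:.3e}")
```

For fixed $A$, the printed $\chi^2$ values shrink as $K$ grows. This is the mechanism the converse exploits: only inputs with many points can make the wrapped output close to uniform.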

3. Main Result

3.1. Some Basic Properties of $K_\varepsilon(A)$

In this section, we collect several basic but important structural properties of $K_\varepsilon(A)$. In particular, we show that $K_\varepsilon(A)$ is monotone in $\varepsilon$ and recovers the support size of the capacity-achieving input in the limit $\varepsilon \to 0$.
Theorem 1 
(Basic properties of $K_\varepsilon(A)$). Fix $A > 0$. Then the following statements hold.
1.
For every $\varepsilon > 0$, the quantity $K_\varepsilon(A)$ is well defined and finite. In particular,
$$1 \le K_\varepsilon(A) \le |\mathrm{supp}(P_{X^\star})| < \infty,$$
where $P_{X^\star}$ is the capacity-achieving input distribution.
2.
The map $\varepsilon \mapsto K_\varepsilon(A)$ is non-increasing on $(0, \infty)$.
3.
Choose an integer $1 \le m < |\mathrm{supp}(P_{X^\star})|$ and let
$$C_m(A) := \sup\left\{ I(X;Y) : \mathrm{supp}(P_X) \subseteq [-A, A],\ P_X \text{ discrete},\ |\mathrm{supp}(P_X)| \le |\mathrm{supp}(P_{X^\star})| - m \right\},$$
and let
$$\delta_m(A) = C(A) - C_m(A).$$
Then, for every $0 < \varepsilon < \delta_m(A)$,
$$K_\varepsilon(A) \ge |\mathrm{supp}(P_{X^\star})| - m + 1.$$
Consequently, by choosing $m = 1$,
$$\lim_{\varepsilon \to 0} K_\varepsilon(A) = |\mathrm{supp}(P_{X^\star})|.$$
Proof. 
We only show the last statement. Let $K(A) = |\mathrm{supp}(P_{X^\star})|$. By definition of $K(A)$, no input supported on fewer than $K(A)$ points can achieve capacity. In particular, no input supported on at most $K(A) - m$ points can achieve capacity. Hence
$$C_m(A) < C(A).$$
Set
$$\delta_m(A) := C(A) - C_m(A) > 0.$$
Therefore, for any $0 < \varepsilon < \delta_m(A)$, we have
$$C(A) - \varepsilon > C(A) - \delta_m(A) = C_m(A).$$
On the other hand, by definition of $C_m(A)$, every discrete input $P_X$ with
$$|\mathrm{supp}(P_X)| \le K(A) - m$$
satisfies
$$I(X;Y) \le C_m(A) < C(A) - \varepsilon.$$
Thus, no such input can be $\varepsilon$-capacity-achieving. It follows that any $\varepsilon$-capacity-achieving input must satisfy
$$|\mathrm{supp}(P_X)| \ge K(A) - m + 1.$$
By the definition of $K_\varepsilon(A)$, this implies that
$$K_\varepsilon(A) \ge K(A) - m + 1 = |\mathrm{supp}(P_{X^\star})| - m + 1.$$
This proves that for every $0 < \varepsilon < \delta_m(A)$,
$$K_\varepsilon(A) \ge |\mathrm{supp}(P_{X^\star})| - m + 1.$$
Taking $m = 1$ and combining with the first statement yields $K_\varepsilon(A) = |\mathrm{supp}(P_{X^\star})|$ for all $0 < \varepsilon < \delta_1(A)$, and hence the claimed limit.
This concludes the proof. □
Remark 1. 
Theorem 1 shows that $K_\varepsilon(A)$ interpolates between two regimes. For large $\varepsilon$, small support sizes suffice, while as $\varepsilon \to 0$, the quantity $K_\varepsilon(A)$ recovers the exact support size of the capacity-achieving input. Moreover, $\delta_m(A)$ quantifies how much capacity is lost when restricting to inputs with at most $|\mathrm{supp}(P_{X^\star})| - m$ points.

3.2. Bounds

We now state the main results of this work, which characterize the scaling of $K_\varepsilon(A)$ in different regimes of the capacity gap.
Theorem 2. 
Suppose that $A > 1600$. Then the following bounds hold.
  • Polynomial capacity gap. For $\beta \ge 1$,
    $$\frac{A}{2\sqrt{2}\,\pi}\sqrt{\log^+(c_1 A)} \le K_{A^{-\beta}}(A) \le 32e\,A\sqrt{c_2\log A}.$$
  • Exponential capacity gap.
    $$\frac{A}{2\sqrt{2}\,\pi}\sqrt{\log^+(c_1 A)} \le K_{e^{-A}}(A) \le c_3 A^{3/2}.$$
The constants are given by
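$$c_1 = \frac{2c_L^2}{(1 + \sqrt{b_0})^2}, \qquad c_L = \frac{1}{8(2\pi M_0 + 1)},$$
where $M_0$ is given in Proposition 1 and $b_0$ in Theorem 4, and
$$c_2 = \frac{(2\beta + 5)\,e}{4\log 2 + 3}, \qquad c_3 = 8e\sqrt{\frac{2e}{4\log 2 + 3}}.$$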
We make the following remarks:
  • For polynomially decaying gaps, the upper and lower bounds match up to constants, yielding the characterization
    $$K_{A^{-\beta}}(A) = \Theta\left(A\sqrt{\log A}\right).$$
    In particular, the scaling is independent of $\beta \ge 1$ at the level of first-order asymptotics.
  • The lower bound is universal across both regimes and reflects an intrinsic limitation: even moderately accurate approximation of the optimal output requires on the order of $A\sqrt{\log A}$ mass points.
  • In contrast, the upper bounds point to a possible phase transition: while polynomial accuracy can be achieved with $O(A\sqrt{\log A})$ points, exponentially small gaps might require a significantly larger support, up to order $A^{3/2}$.
  • Taken together, these results suggest that different apparent scaling laws may reflect different accuracy regimes. We emphasize that the bounds proved in this paper are asymptotic and apply in the large-amplitude regime. Thus, they should not be interpreted as a direct finite-$A$ explanation of the numerical observations in [9,10,11]. Rather, they provide an asymptotic mechanism by which different scalings can emerge from different implicit choices of the accuracy level $\varepsilon = \varepsilon(A)$.
  • Note that our analysis has focused primarily on the regime in which $\varepsilon$ decays with $A$. For many practical purposes, however, it is also natural to consider the case where $\varepsilon$ is fixed. Obtaining sharp bounds in this regime for arbitrary choices of $\varepsilon$ appears to be more delicate. Nevertheless, one can obtain a simple consequence from an Ozarow–Wyner-type bound [27,28,29]. In particular, if $X_D$ is a PAM input with the number of mass points chosen proportional to $A$, then, as $A \to \infty$,
    $$C(A) - I(X_D; Y) \le \frac{1}{2}\log\frac{\pi e}{3}.$$
    Consequently, this implies the fixed-gap upper bound
    $$K_{\frac{1}{2}\log\frac{\pi e}{3}}(A) = O(A).$$
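For orientation, the sketch below evaluates $I(X_D;Y)$ for an equiprobable PAM input whose number of points grows linearly in $A$. The choice $K = \lceil A \rceil + 1$ is a hypothetical instance of "proportional to $A$". The code does not compute $C(A)$, so it does not verify the remark. It only shows the achievable rates next to the gap constant $\frac{1}{2}\log\frac{\pi e}{3} \approx 0.523$ nats.

```python
import numpy as np

def pam_mutual_information(A, K, margin=12.0, n=40001):
    """I(X_D; Y) in nats for equiprobable K-PAM on [-A, A] over Y = X + Z, via I = h(Y) - h(Z)."""
    atoms = np.linspace(-A, A, K)
    y = np.linspace(-A - margin, A + margin, n)
    dy = y[1] - y[0]
    f = np.mean(np.exp(-0.5 * (y[None, :] - atoms[:, None]) ** 2), axis=0) / np.sqrt(2 * np.pi)
    h_y = -np.sum(f * np.log(f)) * dy              # Riemann sum for h(Y)
    h_z = 0.5 * np.log(2 * np.pi * np.e)           # h(Z) for Z ~ N(0, 1)
    return h_y - h_z

gap_constant = 0.5 * np.log(np.pi * np.e / 3)      # fixed gap in the Ozarow-Wyner-type remark
table = []
for A in (4.0, 8.0, 16.0, 32.0):
    K = int(np.ceil(A)) + 1                        # hypothetical choice of "K proportional to A"
    table.append((A, K, pam_mutual_information(A, K)))
    print(f"A={A:5.1f} K={K:3d} I(X_D;Y)={table[-1][2]:.4f} nats")
print(f"(1/2) log(pi e / 3) = {gap_constant:.4f} nats")
```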

3.3. Achievability

In this section, we demonstrate the achievability part of the main result. We begin by presenting a general achievability bound.
Theorem 3 
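(Achievability bound). Fix $A \ge 1$ and $0 < \varepsilon \le 1$. Then,
$$K_\varepsilon(A) \le m,$$
where
$$m := \begin{cases} m_1, & m_1 \le \kappa A^2, \\ m_2, & m_1 > \kappa A^2, \end{cases}$$
and
$$m_1 = \left\lceil 3\kappa A + A\sqrt{\frac{1}{c}\log\frac{1}{\delta_A}}\,\right\rceil,$$
$$m_2 = \left\lceil \max\{3, \kappa A^2\} + A^2\log\frac{1}{\delta_A} \right\rceil,$$
$$\delta_A = \min\left\{\frac{\varepsilon}{2},\ \frac{\varepsilon^2}{40(1 + A^2)^2}\right\},$$
where $c = \frac{\log\kappa}{4\kappa}$ and $\kappa$ is defined in Lemma 3.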
Proof. 
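Let $P^\star$ be capacity-achieving on $[-A, A]$ and write $f := f_{P^\star}$. We apply Lemma 3 with $P = P^\star$ and $m$ chosen as in the statement of Theorem 3.
Case $m_1 \le \kappa A^2$: By the definition of $m$, $m = m_1$. Moreover, by the definition of $m_1$, we have $m = m_1 \ge 3\kappa A$, so Lemma 3 (quadratic regime) yields a $Q$ supported on at most $m$ points with
$$\chi^2(f_Q \| f) \le \exp\left(-c\,\frac{m^2}{A^2}\right).$$
Since, by the definition of $m_1$, we also have $m = m_1 \ge A\sqrt{\frac{1}{c}\log(1/\delta_A)}$, we get
$$\chi^2(f_Q \| f) \le \exp\left(-c\,\frac{m_1^2}{A^2}\right) \le \delta_A.$$
Case $m_1 > \kappa A^2$: By the definition of $m$, $m = m_2$. Moreover, by the definition of $m_2$, we have $m = m_2 \ge \max\{3, \kappa A^2\}$; in particular, $m \ge 3$ and therefore $\log m \ge \log 3 \ge 1$. Lemma 3 (large-$m$ regime) yields a $Q$ supported on at most $m$ points such that
$$\chi^2(f_Q \| f) \le \exp\left(-\frac{m\log m}{A^2}\right) \le \exp\left(-\frac{m}{A^2}\right).$$
Since, by the definition of $m_2$, $m = m_2 \ge A^2\log(1/\delta_A)$, we have
$$\chi^2(f_Q \| f) \le \exp\left(-\frac{m}{A^2}\right) \le \delta_A.$$
In both cases, we have produced a discrete $Q$ supported on at most $m$ points such that
$$\chi^2(f_Q \| f) \le \delta_A.$$
To relate $\delta_A$ to $\varepsilon$, let $X_Q \sim Q$, let $Y_Q = X_Q + Z$ with density $f_Q$, and note that
$$\begin{aligned}
C(A) - I(X_Q; Y_Q) &= h(f) - h(f_Q) \\
&\overset{(i)}{\le} \|\log f\|_{L^2(f)}\sqrt{\chi^2(f_Q\|f)} + \chi^2(f_Q\|f) \\
&\overset{(ii)}{\le} \sqrt{10}\,(1 + A^2)\sqrt{\chi^2(f_Q\|f)} + \chi^2(f_Q\|f) \\
&\le \sqrt{10}\,(1 + A^2)\sqrt{\delta_A} + \delta_A \\
&\overset{(iii)}{\le} \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,
\end{aligned}$$
where the equality uses $I(X;Y) = h(Y) - h(Z)$ for every input and $I(X^\star;Y^\star) = C(A)$; (i) follows from Lemma 1 with $g = f_Q$; (ii) follows from Lemma 2; and (iii) follows from the choice of $\delta_A$.
Therefore, $X_Q$ is feasible in the definition of $K_\varepsilon(A)$ with $|\mathrm{supp}(P_{X_Q})| \le m$. This concludes the proof. □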
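The support size prescribed by Theorem 3 is easy to tabulate. The sketch below assumes $\kappa = 16e^3$, the value that reproduces the constants $c_2$ and $c_3$ in Theorem 2. It works with $\log(1/\varepsilon)$ to avoid underflow when $\varepsilon = e^{-A}$. It also evaluates the last chain of the proof to check that the choice of $\delta_A$ keeps the entropy loss below $\varepsilon$. The function names are ours.

```python
import math

KAPPA = 16 * math.e**3                  # assumption: kappa = 16 e^3
C_EXP = math.log(KAPPA) / (4 * KAPPA)   # c = log(kappa) / (4 kappa), i.e. 1/c = 4 kappa / log(kappa)

def log_inv_delta(A, log_inv_eps):
    """log(1 / delta_A) with delta_A = min(eps/2, eps^2 / (40 (1 + A^2)^2)), in the log domain."""
    return max(math.log(2) + log_inv_eps,
               2 * log_inv_eps + math.log(40) + 2 * math.log(1 + A**2))

def theorem3_m(A, log_inv_eps):
    """Support size m of Theorem 3; the ceilings (an assumption) make m an integer."""
    L = log_inv_delta(A, log_inv_eps)
    m1 = math.ceil(3 * KAPPA * A + A * math.sqrt(L / C_EXP))
    if m1 <= KAPPA * A**2:
        return m1
    return math.ceil(max(3.0, KAPPA * A**2) + A**2 * L)

def entropy_loss_bound(A, eps):
    """sqrt(10) (1 + A^2) sqrt(delta_A) + delta_A, the end of the chain in the proof."""
    delta = min(eps / 2, eps**2 / (40 * (1 + A**2) ** 2))
    return math.sqrt(10) * (1 + A**2) * math.sqrt(delta) + delta

for A in (10.0, 100.0, 1000.0):
    print(A, theorem3_m(A, math.log(A)), theorem3_m(A, A))   # eps = 1/A and eps = e^{-A}
```

Because $m$ is non-decreasing in $\log(1/\delta_A)$, a smaller gap never yields a smaller prescribed support.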
With Theorem 3 at our disposal, we now establish the upper bounds in the two regimes of Theorem 2.
  • Achievability for $K_{A^{-\beta}}(A)$:
In Theorem 3, let $\varepsilon = A^{-\beta}$. Since $\varepsilon \le 1$, the minimum defining $\delta_A$ is attained by its second term. Using $1 + A^2 \le 2A^2$ for $A \ge 1$,
$$\frac{1}{\delta_A} = 40A^{2\beta}(1 + A^2)^2 \le 40A^{2\beta}(2A^2)^2 = 160A^{2\beta + 4}.$$
Hence $\log(1/\delta_A) \le \log 160 + (2\beta + 4)\log A$. Substituting into the expression for $m_1$ in Theorem 3 and using $1/c = 4\kappa/\log\kappa$,
$$\begin{aligned}
m_1 &\le 3\kappa A + A\sqrt{\frac{4\kappa}{\log\kappa}\big(\log 160 + (2\beta + 4)\log A\big)} + 1 \\
&\le C A\sqrt{\log A},
\end{aligned}$$
where in the last inequality we have used that $\log 160 + (2\beta + 4)\log A \le (2\beta + 5)\log A$ for all $A \ge 160$, and set $C := 4\sqrt{4(2\beta + 5)\kappa/\log\kappa}$ (the last inequality holds because $A\sqrt{\log A}$ dominates $A$ and $1$ for $A \ge e$). The proof is concluded by noting that $C A\sqrt{\log A} \le \kappa A^2$, so that $m_1 \le \kappa A^2$ and Theorem 3 gives $K_{A^{-\beta}}(A) \le m_1 \le C A\sqrt{\log A}$.
  • Achievability for $K_{e^{-A}}(A)$:
In Theorem 3, let $\varepsilon = e^{-A}$. To bound $\log(1/\delta_A)$, use $1 + A^2 \le 2A^2$ for $A \ge 1$:
$$\frac{1}{\delta_A} = 40e^{2A}(1 + A^2)^2 \le 40e^{2A}(2A^2)^2 = 160\,e^{2A}A^4.$$
Hence $\log(1/\delta_A) \le \log 160 + 4\log A + 2A$. Substituting into the expression for $m_1$ in Theorem 3 and using $1/c = 4\kappa/\log\kappa$,
$$\begin{aligned}
m_1 &\le 3\kappa A + A\sqrt{\frac{4\kappa}{\log\kappa}\big(\log 160 + 4\log A + 2A\big)} + 1 \\
&\le C A^{3/2},
\end{aligned}$$
where in the last inequality we have used that $\log 160 + 4\log A + 2A \le 5\log A + 2A$ for all $A \ge 160$, with $C := 2\sqrt{2\kappa/\log\kappa}$.
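The elementary inequalities used in both bullets can be checked numerically over a range of amplitudes. The sketch below does this and also prints the ratios $m_1/(A\sqrt{\log A})$ and $m_1/A^{3/2}$ for the two gap choices. It reuses the assumption $\kappa = 16e^3$ and computes in the log domain so that $\varepsilon = e^{-A}$ does not underflow.

```python
import math

KAPPA = 16 * math.e**3   # assumption: kappa = 16 e^3

def log_inv_delta(A, log_inv_eps):
    """log(1 / delta_A), computed in the log domain."""
    return max(math.log(2) + log_inv_eps,
               2 * log_inv_eps + math.log(40) + 2 * math.log(1 + A**2))

def m1_real(A, log_inv_eps):
    """3 kappa A + A sqrt((4 kappa / log kappa) log(1 / delta_A)), without the ceiling."""
    return 3 * KAPPA * A + A * math.sqrt(4 * KAPPA / math.log(KAPPA) * log_inv_delta(A, log_inv_eps))

beta = 1.0
checks = []
for A in (200.0, 1e3, 1e4, 1e6):
    poly = log_inv_delta(A, beta * math.log(A))   # eps = A^{-beta}
    expo = log_inv_delta(A, A)                    # eps = e^{-A}
    checks.append(poly <= math.log(160) + (2 * beta + 4) * math.log(A) <= (2 * beta + 5) * math.log(A))
    checks.append(expo <= math.log(160) + 4 * math.log(A) + 2 * A <= 5 * math.log(A) + 2 * A)
    ratio_poly = m1_real(A, beta * math.log(A)) / (A * math.sqrt(math.log(A)))
    ratio_exp = m1_real(A, A) / A**1.5
    print(f"A={A:9.0f}  m1/(A sqrt(log A)) = {ratio_poly:8.1f}  m1/A^1.5 = {ratio_exp:8.1f}")
```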

3.4. Converse

We now show the converse bound.
Theorem 4. 
Let $A \ge 1$ and let $0 < \varepsilon \le 1/A$. Then
$$K_\varepsilon(A) \ge \frac{A}{2\pi}\sqrt{\log^+\left(\frac{c_L}{\sqrt{\varepsilon/2} + \sqrt{b_0/(2A)}}\right)},$$
where $b_0 = \pi e^2$, $M_0$ is defined in Proposition 1, and
$$c_L := \frac{1}{8(2\pi M_0 + 1)}.$$
Proof. 
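Assume $X$ is such that $C(A) - I(X;Y) \le \varepsilon$, which, by [30], implies that
$$D(P_Y \| P_{Y^\star}) \le C(A) - I(X;Y) \le \varepsilon.$$
We also need the following bound from [8], where $U \sim \mathrm{Unif}([-A, A])$ is independent of $Z$:
$$D\left(P_{[U+Z]_A} \,\|\, P_{[Y^\star]_A}\right) \le \frac{b_0}{A}.$$
Now, with $K := |\mathrm{supp}(P_X)|$,
$$\begin{aligned}
\frac{C_0}{2}\exp\left(-\frac{4\pi^2K^2}{A^2}\right) &\overset{(i)}{\le} C_0\,\chi^2\left(P_{[X+Z]_A} \| P_{[U+Z]_A}\right) \\
&\overset{(ii)}{\le} \mathrm{TV}\left(P_{[X+Z]_A}, P_{[U+Z]_A}\right) \\
&\overset{(iii)}{\le} \mathrm{TV}\left(P_{[X+Z]_A}, P_{[Y^\star]_A}\right) + \mathrm{TV}\left(P_{[U+Z]_A}, P_{[Y^\star]_A}\right) \\
&\overset{(iv)}{\le} \sqrt{\tfrac{1}{2}D\left(P_{[X+Z]_A} \| P_{[Y^\star]_A}\right)} + \sqrt{\tfrac{1}{2}D\left(P_{[U+Z]_A} \| P_{[Y^\star]_A}\right)} \\
&\overset{(v)}{\le} \sqrt{\tfrac{1}{2}D\left(P_Y \| P_{Y^\star}\right)} + \sqrt{\tfrac{1}{2}D\left(P_{[U+Z]_A} \| P_{[Y^\star]_A}\right)} \\
&\overset{(vi)}{\le} \sqrt{\frac{\varepsilon}{2}} + \sqrt{\frac{b_0}{2A}},
\end{aligned}$$
where (i) and (ii) follow from Properties 3 and 5 of Proposition 1; (iii) follows from the triangle inequality; (iv) follows from Pinsker's inequality; (v) follows from the data processing inequality applied to the first term; and (vi) follows from the two divergence bounds above. Rearranging, using $C_0/2 = c_L$, and taking logarithms (with $\log^+(x) := \max\{\log x, 0\}$ and $4\pi^2K^2/A^2 \ge 0$) gives
$$\frac{4\pi^2K^2}{A^2} \ge \log^+\left(\frac{c_L}{\sqrt{\varepsilon/2} + \sqrt{b_0/(2A)}}\right).$$
Since this holds for every input $X$ with $C(A) - I(X;Y) \le \varepsilon$, rearranging the terms concludes the proof. □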
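As a consequence of Theorem 4, note that since $\varepsilon \le 1/A$, we have $\varepsilon/2 \le 1/(2A)$ and hence $\sqrt{\varepsilon/2} + \sqrt{b_0/(2A)} \le (1 + \sqrt{b_0})/\sqrt{2A}$. Plugging this bound into the converse bound yields
$$K_\varepsilon(A) \ge \frac{A}{2\pi}\sqrt{\log^+\left(\frac{c_L\sqrt{2A}}{1 + \sqrt{b_0}}\right)} = \frac{A}{2\sqrt{2}\,\pi}\sqrt{\log^+\left(\frac{2c_L^2}{(1 + \sqrt{b_0})^2}A\right)},$$
which gives the explicit scaling form
$$K_\varepsilon(A) \ge \frac{A}{2\sqrt{2}\,\pi}\sqrt{\log^+(cA)}, \qquad c := \frac{2c_L^2}{(1 + \sqrt{b_0})^2}.$$
Consequently, for all $A$ large enough (so that $cA > 1$),
$$K_\varepsilon(A) \ge \frac{A}{2\sqrt{2}\,\pi}\sqrt{\log(cA)}.$$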
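The converse bound and its simplified form can be evaluated directly. The sketch below plugs in $b_0 = \pi e^2$ and $M_0 = (3/\pi + 1/\sqrt{2\pi})\,2e^4/\alpha_0$. Both values are our reading of the constants and should be treated as assumptions. The code checks that the simplified bound never exceeds the bound of Theorem 4 when $\varepsilon \le 1/A$, and that the bound does not decrease as $\varepsilon$ shrinks. With these constants the $\log^+$ term vanishes unless $A$ is very large, so the bound is informative only asymptotically.

```python
import math

ALPHA0 = math.sqrt(2 * math.pi) * math.erf(1 / math.sqrt(2))              # int_{-1}^{1} exp(-t^2/2) dt
M0 = (3 / math.pi + 1 / math.sqrt(2 * math.pi)) * 2 * math.e**4 / ALPHA0  # assumed value of M_0
B0 = math.pi * math.e**2                                                  # assumed value of b_0
C_L = 1 / (8 * (2 * math.pi * M0 + 1))

def converse_bound(A, eps):
    """Theorem 4: (A / (2 pi)) sqrt(log^+(c_L / (sqrt(eps/2) + sqrt(b_0/(2A)))))."""
    arg = C_L / (math.sqrt(eps / 2) + math.sqrt(B0 / (2 * A)))
    return A / (2 * math.pi) * math.sqrt(max(math.log(arg), 0.0))

def simplified_bound(A):
    """(A / (2 sqrt(2) pi)) sqrt(log^+(c A)) with c = 2 c_L^2 / (1 + sqrt(b_0))^2."""
    c = 2 * C_L**2 / (1 + math.sqrt(B0)) ** 2
    return A / (2 * math.sqrt(2) * math.pi) * math.sqrt(max(math.log(c * A), 0.0))

for A in (1e6, 1e9, 1e12):
    print(f"A={A:.0e}: Theorem 4 (eps=1/A) {converse_bound(A, 1 / A):.3e}, "
          f"simplified {simplified_bound(A):.3e}")
```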

4. Conclusions

In this work, we studied the amplitude-constrained AWGN channel from the perspective of near-optimal input distributions. Rather than focusing on the exact capacity-achieving input, whose support size remains poorly understood, we introduced the quantity $K_\varepsilon(A)$, which captures the minimal support size required to achieve capacity up to an $\varepsilon$-gap.
We showed that this relaxed formulation is significantly more tractable and admits sharp characterizations across different regimes. In particular, for polynomially decaying gaps, we established that $K_\varepsilon(A) = \Theta(A\sqrt{\log A})$, while for exponentially small gaps, our upper bound on the required support size grows to order $A^{3/2}$.
Beyond the technical results, our approach provides a conceptual explanation for the variety of scaling laws observed in prior numerical studies. Namely, different empirical scalings can be interpreted as arising from different implicit choices of ε .
Several open problems remain. In particular, it would be of interest to obtain tighter bounds in the exponential regime, as well as to better understand the behavior of the exact optimizer and its relation to $K_\varepsilon(A)$ as $\varepsilon \to 0$. More broadly, the $\varepsilon$-capacity perspective may prove useful in other settings where exact structural characterization is difficult but near-optimal behavior is more accessible.

Author Contributions

Methodology, L.B. and A.D.; Validation, L.B. and A.D.; Formal analysis, L.B. and A.D.; Investigation, L.B. and A.D.; Writing–original draft, L.B. and A.D.; Writing–review & editing, L.B. and A.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

This paper is dedicated to H. Vincent Poor, whose profound contributions to estimation theory and generous mentorship have been a lasting source of inspiration.

Conflicts of Interest

Author Alex Dytso was employed by the company Qualcomm Flarion Technology, Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviation is used in this manuscript:
AWGN: Additive White Gaussian Noise

Appendix A. Proof of Proposition 1

Appendix A.1. Proof of Property 4

We start with some helper lemmas.
Lemma A1. 
For any $p, q \in (0, 1)$,
$$d(p\|q) := p\log\frac{p}{q} + (1 - p)\log\frac{1 - p}{1 - q} \ge p\log\frac{p}{q} + q - p.$$
Proof. 
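The proof follows from the inequality $\log(1 - u) \le -u$, valid for all $u < 1$. Applying it with $u := (q - p)/(1 - p) < 1$, for which $1 - u = (1 - q)/(1 - p)$, gives
$$(1 - p)\log\frac{1 - p}{1 - q} = -(1 - p)\log(1 - u) \ge (1 - p)\,u = q - p. \qquad \square$$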
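A quick grid check of Lemma A1 follows. It is an illustration, not a proof. The difference between $d(p\|q)$ and the lower bound equals $(1-p)\log\frac{1-p}{1-q} - (q - p)$. That difference should be nonnegative and vanish only when $p = q$.

```python
import numpy as np

def binary_kl(p, q):
    """d(p || q) for Bernoulli distributions, in nats."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

grid = np.linspace(0.001, 0.999, 999)
P, Q = np.meshgrid(grid, grid)
slack = binary_kl(P, Q) - (P * np.log(P / Q) + Q - P)   # should be >= 0 everywhere
print(f"min slack = {slack.min():.2e}")
```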
Lemma A2. 
Let $Y = X + Z$ and $Y^\star = X^\star + Z$, where $X$ and $X^\star$ are supported on $[-A, A]$. Assume $D(P_Y \| P_{Y^\star}) \le \varepsilon$. Let $M := \|f_Y\|_\infty$ and define
$$\alpha_0 := \int_{-1}^{1} e^{-t^2/2}\,dt.$$
Then,
$$M \le \max\left\{\frac{\varepsilon}{\alpha_0},\ \frac{2e^2}{\alpha_0}\,\|f_{Y^\star}\|_\infty\right\}.$$
Moreover, for $A \ge 1$, $\|f_{Y^\star}\|_\infty \le e^2/A$, as shown in ([8], Prop. 1).
Proof. 
Let $y_0$ maximize $f_Y$, so that $f_Y(y_0) = M$. By ([8], Equation (56)), for $y \in [y_0 - 1, y_0 + 1]$,
$$f_Y(y) \ge M e^{-(y - y_0)^2/2}.$$
Hence, with $I := [y_0 - 1, y_0 + 1]$,
$$p := P_Y(I) = \int_I f_Y(y)\,dy \ge M\int_{-1}^{1} e^{-t^2/2}\,dt = \alpha_0 M.$$
Also, $q := P_{Y^\star}(I) \le 2\|f_{Y^\star}\|_\infty$ (since $I$ has length 2). By the data processing inequality (coarsening to the event $I$), $D(P_Y \| P_{Y^\star}) \ge d(p\|q)$; hence
$$\varepsilon \ge d(p\|q).$$
By Lemma A1, $d(p\|q) \ge p\log(p/q) + q - p$. Consider two cases.
Case 1: $p/q \ge e^2$. Then $\log(p/q) \ge 2$ and thus
$$\varepsilon \ge p\log(p/q) + q - p \ge 2p - p = p.$$
Therefore $p \le \varepsilon$, and since $p \ge \alpha_0 M$, we get $M \le \varepsilon/\alpha_0$.
Case 2: $p/q < e^2$. Then $p < e^2 q \le 2e^2\|f_{Y^\star}\|_\infty$, and since $p \ge \alpha_0 M$,
$$M \le \frac{2e^2}{\alpha_0}\,\|f_{Y^\star}\|_\infty.$$
Combining the two cases yields the claim. □
We now prove the final auxiliary claim, a uniform $L^\infty$ bound for the wrapped density.
Lemma A3. 
Let $Y = X + Z$ and assume $A \ge 1$ and $D(P_Y \| P_{Y^\star}) \le \varepsilon$ with $\varepsilon \le 1/A$. Then there is an explicit absolute constant $M_0$ such that
$$\|f_{[X+Z]_A}\|_\infty \le M_0.$$
One may take
$$M_0 := \left(\frac{3}{\pi} + \frac{1}{\sqrt{2\pi}}\right)\frac{2e^4}{\alpha_0}, \qquad \alpha_0 = \int_{-1}^{1} e^{-t^2/2}\,dt.$$
Proof. 
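Let $M := \|f_Y\|_\infty$. By Lemma A2,
$$M \le \max\left\{\frac{\varepsilon}{\alpha_0},\ \frac{2e^2}{\alpha_0}\cdot\frac{e^2}{A}\right\}.$$
Under $\varepsilon \le 1/A$, we have $\varepsilon/\alpha_0 \le \frac{1}{\alpha_0 A}$; hence, for all $A \ge 1$,
$$M \le \frac{2e^2}{\alpha_0}\cdot\frac{e^2}{A}.$$
Now apply the wrapped density formula (Property 1 of Proposition 1) with $B = A$:
$$f_{[X+Z]_A}(\theta) = \frac{A}{\pi}\sum_{k \in \mathbb{Z}} f_Y\left(\frac{A}{\pi}(\theta + 2\pi k)\right).$$
For $k \in \{-1, 0, 1\}$, the argument ranges over $[-3A, 3A]$, and each term is at most $M$; hence
$$\frac{A}{\pi}\sum_{k \in \{-1, 0, 1\}} f_Y\left(\frac{A}{\pi}(\theta + 2\pi k)\right) \le \frac{3A}{\pi}M.$$
For $|k| \ge 2$, note that for $\theta \in (-\pi, \pi)$,
$$\left|\frac{A}{\pi}(\theta + 2\pi k)\right| \ge A(2|k| - 1) \ge 3A,$$
so $|y| - A \ge 2A(|k| - 1)$ with $y = \frac{A}{\pi}(\theta + 2\pi k)$. By ([8], Equation (75)),
$$f_Y(y) \le f_Y(\operatorname{sgn}(y)\,A)\exp\left(-\frac{(|y| - A)^2}{2}\right) \le M\exp\left(-2A^2(|k| - 1)^2\right).$$
Therefore,
$$\frac{A}{\pi}\sum_{|k| \ge 2} f_Y\left(\frac{A}{\pi}(\theta + 2\pi k)\right) \le \frac{A}{\pi}M\cdot 2\sum_{m=1}^{\infty} e^{-2A^2m^2} \le \frac{A}{\pi}M\cdot 2\int_0^\infty e^{-2A^2t^2}\,dt = \frac{M}{\sqrt{2\pi}},$$
where we used the integral comparison $\sum_{m \ge 1} g(m) \le \int_0^\infty g(t)\,dt$ for decreasing $g$ and $\int_0^\infty e^{-2A^2t^2}\,dt = \frac{1}{2A}\sqrt{\frac{\pi}{2}}$. Combining the pieces,
$$\|f_{[X+Z]_A}\|_\infty \le \frac{3A}{\pi}M + \frac{M}{\sqrt{2\pi}} \le \left(\frac{3}{\pi} + \frac{1}{\sqrt{2\pi}}\right)\cdot\frac{2e^2}{\alpha_0}\,e^2,$$
which is exactly the stated value of $M_0$ (the second term uses $A \ge 1$). □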
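The numerical ingredients of this proof can be checked quickly. The sketch below evaluates $\alpha_0$ and the value of $M_0$ as written here. It also checks the integral comparison for the Gaussian tail sum and the final combination step. It assumes the bound $\|f_{Y^\star}\|_\infty \le e^2/A$ quoted from [8].

```python
import math

alpha0 = math.sqrt(2 * math.pi) * math.erf(1 / math.sqrt(2))   # int_{-1}^{1} exp(-t^2/2) dt
M0 = (3 / math.pi + 1 / math.sqrt(2 * math.pi)) * 2 * math.e**4 / alpha0

def tail_sum(A, terms=200):
    """sum_{m >= 1} exp(-2 A^2 m^2), truncated (the terms decay super-exponentially)."""
    return sum(math.exp(-2 * A**2 * m**2) for m in range(1, terms + 1))

def tail_integral(A):
    """int_0^infty exp(-2 A^2 t^2) dt = sqrt(pi / 2) / (2 A)."""
    return math.sqrt(math.pi / 2) / (2 * A)

def wrapped_sup_bound(A):
    """3 A M / pi + M / sqrt(2 pi) with M = (2 e^2 / alpha0) * (e^2 / A), as in the proof."""
    M = 2 * math.e**2 / alpha0 * math.e**2 / A
    return 3 * A * M / math.pi + M / math.sqrt(2 * math.pi)

print(f"alpha0 = {alpha0:.5f}, M0 = {M0:.2f}")
```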

Appendix A.2. Proof of Property 5

We start with the following lemma.
Lemma A4. 
Let $P$ be a distribution on $(-\pi, \pi)$ with density $f$ satisfying $\|f\|_\infty \le M$, and let $Q$ be uniform on $(-\pi, \pi)$. Then
$$\mathrm{TV}(P, Q) \ge \frac{1}{2(2\pi M + 1)}\,\chi^2(P\|Q).$$
Proof. 
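Since $Q$ has density $1/(2\pi)$,
$$\chi^2(P\|Q) = \int_{-\pi}^{\pi}\frac{\left(f - \frac{1}{2\pi}\right)^2}{\frac{1}{2\pi}}\,d\theta = \int_{-\pi}^{\pi}|2\pi f - 1|\cdot\left|f - \frac{1}{2\pi}\right|d\theta \le (2\pi M + 1)\int_{-\pi}^{\pi}\left|f - \frac{1}{2\pi}\right|d\theta = 2(2\pi M + 1)\,\mathrm{TV}(P, Q),$$
which rearranges to the claim. □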
The proof is completed by noting that, according to Property 2, $P_{\langle U+Z\rangle_A}$ is uniform on $(-\pi, \pi)$.
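Lemma A4 can be checked on a simple worked example. For the density $f(\theta) = (1 + a\cos\theta)/(2\pi)$ with $|a| < 1$, one has $\|f\|_\infty = (1 + |a|)/(2\pi)$, $\chi^2(P\|Q) = a^2/2$ and $\mathrm{TV}(P, Q) = |a|/\pi$, so the lemma reduces to $|a|/\pi \ge a^2/(4(2 + |a|))$, which always holds. The sketch below recomputes both divergences by midpoint-rule quadrature and tests the inequality; the test density is an illustrative choice, not one used in the paper.

```python
import math

def chi2_and_tv(f, n=20000):
    """Midpoint-rule estimates of chi^2(P||Q) and TV(P, Q) for a density f on
    (-pi, pi), against the uniform density Q = 1/(2 pi)."""
    h = 2 * math.pi / n
    q = 1 / (2 * math.pi)
    chi2 = tv = 0.0
    for i in range(n):
        theta = -math.pi + (i + 0.5) * h
        diff = f(theta) - q
        chi2 += diff * diff / q * h
        tv += 0.5 * abs(diff) * h
    return chi2, tv

def lemma_a4_check(a):
    """Test TV >= chi^2 / (2 (2 pi M + 1)) for f = (1 + a cos theta)/(2 pi), |a| < 1.
    Returns (inequality holds, chi^2 estimate, TV estimate)."""
    def f(theta):
        return (1 + a * math.cos(theta)) / (2 * math.pi)
    M = (1 + abs(a)) / (2 * math.pi)  # sup of f over the circle
    chi2, tv = chi2_and_tv(f)
    return tv >= chi2 / (2 * (2 * math.pi * M + 1)), chi2, tv
```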

References

  1. Dytso, A.; Yagli, S.; Poor, H.V.; Shamai, S. The Capacity Achieving Distribution for the Amplitude Constrained Additive Gaussian Channel: An Upper Bound on the Number of Mass Points. IEEE Trans. Inf. Theory 2020, 66, 2006–2022.
  2. Dytso, A.; Goldenbaum, M.; Shamai, S.; Poor, H.V. Upper and lower bounds on the capacity of amplitude-constrained MIMO channels. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Singapore, 4–8 December 2017; pp. 1–6.
  3. Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai, S. When Are Discrete Channel Inputs Optimal?—Optimization Techniques and Some New Results. In Proceedings of the Conference on Information Sciences and Systems, Princeton, NJ, USA, 21–23 March 2018; pp. 1–6.
  4. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  5. Smith, J.G. On the Information Capacity of Peak and Average Power Constrained Gaussian Channels. Ph.D. Dissertation, University of California, Berkeley, CA, USA, 1969.
  6. Smith, J.G. The information capacity of amplitude-and variance-constrained scalar Gaussian channels. Inform. Control 1971, 18, 203–219.
  7. Sharma, N.; Shamai, S. Transition points in the capacity-achieving distribution for the peak-power limited AWGN and free-space optical intensity channels. Probl. Inf. Transm. 2010, 46, 283–299.
  8. Wang, H.; Barletta, L.; Dytso, A. An Improved Lower Bound on Cardinality of Support of the Amplitude-Constrained AWGN Channel. arXiv 2025, arXiv:2512.22691.
  9. Mattingly, H.H.; Transtrum, M.K.; Abbott, M.C.; Machta, B.B. Maximizing the information learned from finite data selects a simple model. Proc. Natl. Acad. Sci. USA 2018, 115, 1760–1765.
  10. Abbott, M.C.; Machta, B.B. A scaling law from discrete to continuous solutions of channel capacity problems in the low-noise limit. J. Stat. Phys. 2019, 176, 214–227.
  11. Zhang, Z. Discrete Noninformative Priors. Ph.D. Thesis, Yale University, New Haven, CT, USA, 1994.
  12. Blahut, R. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory 1972, 18, 460–473.
  13. Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inf. Theory 1972, 18, 14–20.
  14. McKellips, A.L. Simple tight bounds on capacity for the peak-limited discrete-time channel. In Proceedings of the IEEE International Symposium on Information Theory, Chicago, IL, USA, 27 June–2 July 2004; p. 348.
  15. Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai, S. Amplitude constrained MIMO channels: Properties of optimal input distributions and bounds on the capacity. Entropy 2019, 21, 200.
  16. Thangaraj, A.; Kramer, G.; Böcherer, G. Capacity Bounds for Discrete-Time, Amplitude-Constrained, Additive White Gaussian Noise Channels. IEEE Trans. Inf. Theory 2017, 63, 4172–4182.
  17. Abou-Faycal, I.C.; Trott, M.D.; Shamai, S. The capacity of discrete-time memoryless Rayleigh-fading channels. IEEE Trans. Inf. Theory 2001, 47, 1290–1301.
  18. Katz, M.; Shamai, S. On the capacity-achieving distribution of the discrete-time noncoherent and partially coherent AWGN channels. IEEE Trans. Inf. Theory 2004, 50, 2257–2270.
  19. Shamai, S. Capacity of a pulse amplitude modulated direct detection photon channel. IEE Proc. I (Commun. Speech Vis.) 1990, 137, 424–430.
  20. Dytso, A.; Barletta, L.; Shamai, S. Properties of the Support of the Capacity-Achieving Distribution of the Amplitude-Constrained Poisson Noise Channel. IEEE Trans. Inf. Theory 2021, 67, 7050–7066.
  21. Fahs, J.; Abou-Faycal, I. On properties of the support of capacity-achieving distributions for additive noise channel models with input cost constraints. IEEE Trans. Inf. Theory 2017, 64, 1178–1198.
  22. Tchamkerten, A. On the discreteness of capacity-achieving distributions. IEEE Trans. Inf. Theory 2004, 50, 2773–2778.
  23. Chan, T.H.; Hranilovic, S.; Kschischang, F.R. Capacity-achieving probability measure for conditionally Gaussian channels with bounded inputs. IEEE Trans. Inf. Theory 2005, 51, 2073–2088.
  24. Abou El Hessen, T.; Tuninetti, D.; Belkhadir, A.; Banerjee, A. Channel Capacity Analysis with Nonlinear Effects of RF Power Amplifiers. In Proceedings of the 2025 IEEE International Symposium on Information Theory (ISIT), Ann Arbor, MI, USA, 22–27 June 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–6.
  25. Stapmanns, J.; Dias, C.; Eilers, L.; Kühn, T.; Pfister, J.P. Phase Transitions of the Additive Uniform Noise Channel with Peak Amplitude and Cost Constraint. arXiv 2025, arXiv:2510.12427.
  26. Ma, Y.; Wu, Y.; Yang, P. On the Best Approximation by Finite Gaussian Mixtures. IEEE Trans. Inf. Theory 2025, 71, 5469–5492.
  27. Ungerboeck, G. Channel coding with multilevel/phase signals. IEEE Trans. Inf. Theory 1982, 28, 55–67.
  28. Ozarow, L.H.; Wyner, A.D. On the capacity of the Gaussian channel with a finite number of input levels. IEEE Trans. Inf. Theory 1990, 36, 1426–1428.
  29. Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shitz, S.S. A generalized Ozarow-Wyner capacity bound with applications. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1058–1062.
  30. Topsøe, F. An information theoretical identity and a problem involving capacity. Stud. Sci. Math. Hung. 1967, 2, 246.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Barletta, L.; Dytso, A. Support Size of ε-Capacity-Achieving Inputs for the Amplitude-Constrained AWGN Channel. Entropy 2026, 28, 500. https://doi.org/10.3390/e28050500

AMA Style

Barletta L, Dytso A. Support Size of ε-Capacity-Achieving Inputs for the Amplitude-Constrained AWGN Channel. Entropy. 2026; 28(5):500. https://doi.org/10.3390/e28050500

Chicago/Turabian Style

Barletta, Luca, and Alex Dytso. 2026. "Support Size of ε-Capacity-Achieving Inputs for the Amplitude-Constrained AWGN Channel" Entropy 28, no. 5: 500. https://doi.org/10.3390/e28050500

APA Style

Barletta, L., & Dytso, A. (2026). Support Size of ε-Capacity-Achieving Inputs for the Amplitude-Constrained AWGN Channel. Entropy, 28(5), 500. https://doi.org/10.3390/e28050500

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.