Article

Smoothing of Binary Codes, Uniform Distributions, and Applications

Department of ECE and Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
* Author to whom correspondence should be addressed.
Entropy 2023, 25(11), 1515; https://doi.org/10.3390/e25111515
Submission received: 26 August 2023 / Revised: 30 October 2023 / Accepted: 1 November 2023 / Published: 5 November 2023
(This article belongs to the Special Issue Extremal and Additive Combinatorial Aspects in Information Theory)

Abstract
The action of a noise operator on a code transforms it into a distribution on the respective space. Some common examples from information theory include Bernoulli noise acting on a code in the Hamming space and Gaussian noise acting on a lattice in the Euclidean space. We aim to characterize the cases when the output distribution is close to the uniform distribution on the space, as measured by the Rényi divergence of order $\alpha \in (1, \infty]$. A version of this question is known as the channel resolvability problem in information theory, and it has implications for security guarantees in wiretap channels, error correction, discrepancy, worst-to-average case complexity reductions, and many other problems. Our work quantifies the requirements for asymptotic uniformity (perfect smoothing) and identifies explicit code families that achieve it under the action of the Bernoulli and ball noise operators on the code. We derive expressions for the minimum rate of codes required to attain asymptotically perfect smoothing. In proving our results, we leverage recent results from harmonic analysis of functions on the Hamming space. Another result pertains to the use of code families in Wyner’s transmission scheme on the binary wiretap channel. We identify explicit families that guarantee strong secrecy when applied in this scheme, showing that nested Reed–Muller codes can transmit messages reliably and securely over a binary symmetric wiretap channel with a positive rate. Finally, we establish a connection between smoothing and error correction in the binary symmetric channel.

1. Introduction

Many problems of information theory involve the action of a noise operator on a code distribution, transforming it into some other distribution. For instance, one can think of Bernoulli noise acting on a code in the Hamming space or Gaussian noise acting on a lattice in the Euclidean space. We are interested in characterizing the cases when the output distribution is close to the uniform distribution on the space. Versions of this problem have been considered under different names, including resolvability [1,2,3], smoothing [4,5], discrepancy [6,7], and the entropy of noisy functions [8,9,10]. Direct applications of smoothing include secrecy guarantees in both the binary symmetric wiretap channel [2,3,11] and the Gaussian wiretap channel [12,13], error correction in the binary symmetric channel (BSC) [14,15], converse coding theorems of information theory [1,16,17,18], strong coordination [11,19,20,21,22], secret key generation [13,23], and worst-to-average case reductions in cryptography [5,24]. Some aspects of this problem also touch upon approximation problems in statistics and machine learning [25,26,27].
Our main results are formulated for smoothing in the binary Hamming space $H_n$. For $r : H_n \to \mathbb{R}_{\geq 0}$ and $f : H_n \to \mathbb{R}$, define
$$T_r f(x) = (r * f)(x) := \sum_{z \in H_n} r(z) f(x + z)$$
as the action of $r$ on the functions on the space. We set $r$ to be a probability mass function (pmf) and call the function $T_r f$ the noisy version of $f$ with respect to $r$, and refer to $r$ and $T_r$ as a noise kernel and a noise operator, respectively. By smoothing $f$ with respect to $r$, we mean applying the noise kernel $r$ to $f$. We often assume that $r(x)$ is a radial kernel, i.e., its value on the argument $x \in H_n$ depends only on the Hamming weight of $x$.
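As an illustration of this definition, the following Python sketch (our own, not part of the paper) applies a noise kernel to a function on $H_n$ by brute-force convolution, representing points of $H_n$ as integer bitmasks; all names and toy parameters are ours.

```python
import numpy as np

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def apply_noise(r: np.ndarray, f: np.ndarray, n: int) -> np.ndarray:
    """Compute T_r f(x) = sum_z r(z) f(x + z) on H_n (brute force, O(4^n))."""
    N = 1 << n
    out = np.zeros(N)
    for x in range(N):
        out[x] = sum(r[z] * f[x ^ z] for z in range(N))  # x + z is XOR in H_n
    return out

# Example: Bernoulli kernel beta_delta on H_3 acting on a code distribution.
n, delta = 3, 0.1
N = 1 << n
beta = np.array([delta**hamming_weight(z) * (1 - delta)**(n - hamming_weight(z))
                 for z in range(N)])
code = [0b000, 0b111]                      # repetition code
f_C = np.zeros(N)
f_C[code] = 1.0 / len(code)                # code distribution f_C = 1_C / |C|
g = apply_noise(beta, f_C, n)
assert abs(g.sum() - 1.0) < 1e-12          # T_r f_C is again a pmf
```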
There are several ways to view the smoothing operation. Interpreting it as a shift-invariant linear operator, we note that, from Young’s inequality, $\|T_r f\|_\alpha = \|f * r\|_\alpha \le \|f\|_\alpha$ for $1 \le \alpha \le \infty$, so smoothing contracts the $\alpha$-norm. Upon applying $T_r$, the noisy version of $f$ becomes “flatter”; hence, the designation “smoothing”. Note that if $f$ is a pmf, then $T_r f$ is also a pmf, and so this view allows us to model the effect of communication channels with additive noise.
The class of functions that we consider are (normalized) indicators of subsets (codes) in $H_n$. A code $\mathcal C \subseteq H_n$ defines a pmf $f_{\mathcal C} = 1_{\mathcal C}/|\mathcal C|$, and, thus, $T_r f_{\mathcal C}$ can be viewed as a noisy version of the code (we also sometimes call it a noisy distribution) with respect to the kernel $r$. The main question of interest for us is the proximity of this distribution to $U_n$, or the “smoothness” of the noisy code distributions. To quantify closeness to $U_n$, we use the Kullback–Leibler (KL) and Rényi divergences (equivalently, $L_\alpha$ norms), and the smoothness measured in $D_\alpha(\cdot \| \cdot)$ (resp., in the $L_\alpha$ norm) is termed the $D_\alpha$-smoothness (resp., the $L_\alpha$-smoothness).
We say that a code is perfectly smoothable with respect to the noise kernel $r$ if the resultant noisy distribution becomes uniform. Our main emphasis is on the asymptotic version of perfect smoothing and its implications for some of the basic information-theoretic problems. A sequence of codes $(\mathcal C_n)_n$ is asymptotically smoothed by the kernel sequence $(r_n)_n$ if the distance between $T_{r_n} f_{\mathcal C_n}$ and $U_n$ approaches 0 as $n$ increases. This property is closely related to the more general problem of channel resolvability introduced by Han and Verdú in [1]. Given a discrete memoryless channel $W(Y|X)$ and a distribution $P_X$, we observe a distribution $P_Y$ on the output of the channel. The task of channel resolvability is to find $P_X$ supported on a subset $\mathcal C \subseteq H_n$ that approximates $P_Y$ with respect to the KL divergence. As shown in [1], there exists a threshold value of the rate such that it is impossible to approximate $P_Y$ using codes of lower rate, while any output process can be approximated by a well-chosen code of rate larger than the threshold. Other proximity measures between distributions were considered for this problem in [3,28,29]. Following the setting in [3], we consider Rényi divergences for measuring the closeness to uniformity. We call the minimum rate required to achieve perfect asymptotic smoothing the $D_\alpha$-smoothing capacity of the noise kernels $(r_n)_n$, where the proximity to uniformity is measured by the $\alpha$-Rényi divergence. In this work, we characterize the $D_\alpha$-smoothing capacity of the sequence $(r_n)_n$ using its Rényi entropy rate.
Asymptotic smoothing. We will limit ourselves to studying smoothing bounds under the action of the Bernoulli noise or ball noise kernels, defined formally below. A common approach to deriving bounds on the norm of a noisy function is through hypercontractivity inequalities [30,31,32]. In its basic version, given a code $\mathcal C$ of size $M$, it yields the estimate
$$\|T_\delta f_{\mathcal C}\|_\alpha \le \|f_{\mathcal C}\|_{\alpha'} = M^{\frac{1-\alpha'}{\alpha'}} 2^{-\frac{n}{\alpha'}},$$
where $T_\delta$ is the Bernoulli kernel (see Section 2 for formal definitions) and $\alpha' = 1 + (1-2\delta)^2(\alpha - 1)$. This upper bound does not differentiate codes yielding higher or lower smoothness, which in many situations may not be sufficiently informative. Note that other tools, such as “Mrs. Gerber’s lemma” [30,33] or strong data-processing inequalities, also suffer from the same limitation.
A new perspective on bounds for smoothing has recently been introduced in the works of Samorodnitsky [8,9,10]. Essentially, his results imply that codes satisfying certain regularity conditions have good smoothing properties. Their efficiency is highlighted in the recent papers [14,34], which leveraged results for code performance on the binary erasure channel (BEC) to prove strong claims about the error correction capabilities of the codes when used on the BSC. Using Samorodnitsky’s inequalities, we show that the duals of some BEC capacity-achieving codes achieve the $D_\alpha$-smoothing capacity for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$ with respect to the Bernoulli noise. This includes the duals of polar codes and doubly transitive codes, such as the Reed–Muller (RM) codes.
Smoothing and the wiretap channel. Wyner’s wiretap channel [35] models communication in the presence of an eavesdropper. Code design for this channel pursues reliable communication between the legitimate parties, while at the same time leaking as little information as possible about the transmitted messages to the eavesdropper. The connection between secrecy in wiretap channels and resolvability was first mentioned by Csiszár [36] and later developed by Hayashi [2]. It rests on the observation that to achieve secrecy it suffices to make the distribution of an eavesdropper’s observations conditioned on the transmitted message nearly independent of the message. The idea of characterizing secrecy based on smoothness works irrespective of the measure of secrecy [2,3,11], and it was also employed for nested lattice codes used over the Gaussian wiretap channel in [12].
Secrecy on the wiretap channel can be defined in two ways, measured by the information gained by the eavesdropper, and it depends on whether this quantity is normalized to the number of channel uses (weak secrecy) or not (strong secrecy). This distinction was first highlighted by Maurer [37], and it has been adopted widely in the recent literature. Early papers devoted to code design for the wiretap channel relied on random codes, but, for simple channel models such as BSC or BEC, this has changed with the advent of explicit capacity-approaching code families. Weak secrecy results based on LDPC codes were presented in [38], but initial attempts to attain strong secrecy encountered some obstacles. To circumvent this, the first works on code construction [39,40] had to assume that the main channel is noiseless. The problem of combining strong secrecy and reliability for general wiretap channels was resolved in [41], but that work had to assume that the two communicating parties share a small number of random bits unavailable to the eavesdropper. Apart from the polar coding scheme of [41], explicit code families that support reliable communication with positive rate and strong secrecy have not previously appeared in the literature. In this work, we show that nested RM codes perform well in binary symmetric wiretap channels based on their smoothing properties. While our work falls short of proving that nested RM codes achieve capacity, we show that they can transmit messages reliably and secretly at rates close to capacity.
Ball noise and decoding error. Ball-noise smoothing provides a tool for estimating the error probability of decoding on the BSC. We derive impossibility and achievability bounds for the $D_\alpha$-smoothness of noisy distributions with respect to the ball noise. Smoothing of a code with respect to the $L_2$ norm plays a special role because, in this case, the second norm (the variance) of the resulting distribution can be expressed via the pairwise distances between codewords, enabling one to rely on tools from Fourier analysis. The recent paper by Debris-Alazard et al. [4] established universal bounds for the smoothing of codes and lattices, with cryptographic reductions in mind. The paper by Sprumont and Rao [15] addressed bounds for the error probability of list decoding at rates above the BSC capacity. A paper by one of the present authors [42] studied the variance of the number of codewords in balls of different radii (a quantity known as the quadratic discrepancy [43,44]).
The main contributions of this paper are the following:
  • Characterizing the $D_\alpha$-smoothing capacities of noise operators on the Hamming space for $\alpha \in (1, \infty]$;
  • Identifying some explicit code families that attain the smoothing capacity of the Bernoulli noise for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$;
  • Obtaining rate estimates for the RM codes used on the BSC wiretap channel under the strong secrecy condition;
  • Showing that codes possessing sufficiently good smoothing properties are suitable for error correction.
In Section 2, we set up the notation and introduce the relevant basic concepts. Then, in Section 3, we derive expressions for the D α -smoothing capacities for α ( 1 , ] , and in Section 4, we use these results to analyze the smoothing of code families under the action of the Bernoulli noise. Section 5 is devoted to the application of these results for the binary symmetric wiretap channel. In particular, we show that RM codes can achieve rates close to the capacity of the BSC wiretap channel, while at the same time guaranteeing strong secrecy. In Section 6, we establish threshold rates for smoothing under ball noise, and derive bounds for the error probability of decoding on the BSC, including the list case, based on the distance distribution. Concluding the paper, Section 7 briefly points out that the well-known class of uniformly packed codes are perfectly smoothable with respect to “small” noise kernels.

2. Preliminaries

2.1. Notation

Throughout this paper, $H_n$ is the binary $n$-dimensional Hamming space $\{0,1\}^n$.
Balls and spheres. Denote by $B(x,t) := \{y \in H_n : |y - x| \le t\}$ the metric ball of radius $t$ in $H_n$ with center at $x$, and denote by $S(x,t) := \{y \in H_n : |y - x| = t\}$ the sphere of radius $t$. Let $V_t = |B(x,t)|$ be the volume of the ball, and let $\mu_t(i)$ be the intersection volume of two balls of radius $t$ whose centers are distance $i$ apart:
$$\mu_t(i) = |B(0,t) \cap B(x,t)|, \quad \text{where } |x| = i. \tag{1}$$
Codes and distributions. A code $\mathcal C$ is a subset of $H_n$. The rate and distance of the code are denoted by $R(\mathcal C) := \log|\mathcal C|/n$ and $d(\mathcal C)$, respectively. Let
$$A_i = \frac{1}{|\mathcal C|} \left| \{ (x, y) \in \mathcal C^2 : d(x, y) = i \} \right|$$
and let $(A_i,\, i = 0, \ldots, n)$ be the distance distribution of the code. If the code $\mathcal C$ forms an $\mathbb F_2$-linear subspace of $H_n$, we denote by $\mathcal C^\perp := \{ y \in H_n : \sum_i x_i y_i = 0 \text{ for all } x \in \mathcal C \}$ its dual code.
The function $1_{\mathcal C}$ denotes the indicator of a subset $\mathcal C \subseteq H_n$, and $f_{\mathcal C} = 1_{\mathcal C}/|\mathcal C|$ is the corresponding pmf, the uniform distribution over the set, which we call a code distribution. Let $b_t$ denote the uniform distribution on the ball $B(0,t)$, given by $b_t(x) = 1_{B(0,t)}(x)/V_t$. In the context of noise operators, we refer to $T_{b_t}$ as the ball noise. Finally, $\beta_\delta$ is the binomial distribution on $H_n$, given by
$$\beta_\delta(x) = \beta_\delta^{(n)}(x) = \delta^{|x|} (1 - \delta)^{n - |x|},$$
and $U_n$ is the uniform distribution, given by $U_n(x) = 2^{-n}$ for all $x$.
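A minimal sketch of these kernels in Python (assuming the integer-bitmask representation of $H_n$ from the earlier sketch; all function names are ours):

```python
import numpy as np
from math import comb

def hw(x: int) -> int:
    return bin(x).count("1")

def bernoulli_kernel(n: int, delta: float) -> np.ndarray:
    """beta_delta(x) = delta^{|x|} (1-delta)^{n-|x|}."""
    return np.array([delta**hw(x) * (1 - delta)**(n - hw(x)) for x in range(1 << n)])

def ball_kernel(n: int, t: int) -> np.ndarray:
    """b_t = uniform distribution on the ball B(0, t)."""
    V_t = sum(comb(n, i) for i in range(t + 1))        # ball volume
    return np.array([1.0 / V_t if hw(x) <= t else 0.0 for x in range(1 << n)])

n = 4
for ker in (bernoulli_kernel(n, 0.2), ball_kernel(n, 1), np.full(1 << n, 2.0**-n)):
    assert abs(ker.sum() - 1.0) < 1e-12                # each kernel is a pmf
```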
Entropies and norms. For a function $f : H_n \to \mathbb R$, we define its $\alpha$-norm as follows:
$$\|f\|_\alpha = \left( \frac{1}{2^n} \sum_{x \in H_n} |f(x)|^\alpha \right)^{1/\alpha} \text{ for } \alpha \in (0, \infty), \qquad \|f\|_\infty = \max_{x \in H_n} |f(x)|.$$
Given a pmf $P$, let
$$H(P) = -\sum_i P_i \log P_i,$$
$$H_\alpha(P) = \frac{1}{1-\alpha} \log \sum_i P_i^\alpha$$
denote its Shannon entropy and Rényi entropy of order $\alpha$, respectively. If $P$ is supported on two points, we write $h(P)$ and $h_\alpha(P)$ instead (all logarithms are to the base 2). The limiting cases of $\alpha = 0, 1, \infty$ are well-defined; in particular, for $\alpha = 1$, $H_\alpha(P)$ reduces to $H(P)$.
For two discrete probability distributions $P$ and $Q$, the $\alpha$-Rényi divergence (or simply the $\alpha$-divergence) is defined as follows:
$$D_\alpha(P \| Q) = \begin{cases} -\log Q(\{i : P_i > 0\}) & \text{if } \alpha = 0 \\ \frac{1}{\alpha - 1} \log \sum_i P_i^\alpha Q_i^{-(\alpha - 1)} & \text{if } \alpha \in (0,1) \cup (1, \infty) \\ \sum_i P_i \log \frac{P_i}{Q_i} & \text{if } \alpha = 1 \\ \max_i \log \frac{P_i}{Q_i} & \text{if } \alpha = \infty. \end{cases} \tag{6}$$
The divergence $D_\alpha(P \| Q)$ is a continuous function of $\alpha$ for $\alpha \in [0, \infty]$. For a pmf $f$ on $H_n$,
$$D_\alpha(f \| U_n) = \frac{\alpha}{\alpha - 1} \log \left( 2^n \|f\|_\alpha \right), \quad \alpha \in (0,1) \cup (1, \infty),$$
$$D_\infty(f \| U_n) = \log \left( 2^n \|f\|_\infty \right).$$
Note that $D_\alpha(f \| U_n) = n - H_\alpha(f)$ for all $0 \le \alpha \le \infty$.
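The identity $D_\alpha(f \| U_n) = n - H_\alpha(f)$ is easy to check numerically; the following Python sketch (ours) computes both sides for a random pmf on $H_3$.

```python
import numpy as np

def renyi_entropy(p: np.ndarray, alpha: float) -> float:
    """H_alpha(P); alpha = 1 and alpha = inf handled as limits."""
    p = p[p > 0]
    if alpha == 1:
        return float(-(p * np.log2(p)).sum())
    if np.isinf(alpha):
        return float(-np.log2(p.max()))
    return float(np.log2((p**alpha).sum()) / (1 - alpha))

def D_uniform(f: np.ndarray, alpha: float) -> float:
    """D_alpha(f || U_n) for a pmf f on H_n, via the identity D = n - H_alpha(f)."""
    n = int(np.log2(len(f)))
    return n - renyi_entropy(f, alpha)

# Sanity check of the identity against the defining formula for alpha = 2.
f = np.random.dirichlet(np.ones(8))          # a random pmf on H_3
direct = np.log2((f**2).sum() * 8)           # log sum_i f_i^2 * 2^{n(alpha-1)}, n = 3
assert abs(direct - D_uniform(f, 2)) < 1e-9
```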
Channels. In this paper, a channel is a conditional probability distribution $W : \{0,1\} \to \mathcal Y$, where $\mathcal Y$ is a finite set, so that $W(y|x)$ is the conditional probability of the output $y$ for the input $x$. We frequently consider the binary symmetric channel with crossover probability $\delta$ and the binary erasure channel with erasure probability $\lambda$, abbreviating them as BSC($\delta$) and BEC($\lambda$), respectively. We are often interested in the $n$-fold channel $W^{(n)}$, i.e., the conditional probability distribution corresponding to $n$ uses of the channel. For the input $X$, let $Y(X, W)$ be the random output of the channel $W^{(n)}$. If the input sequences are chosen from a uniform distribution on a code $\mathcal C$, we denote the input by $X_{\mathcal C}$. Since the number of uses of the channel is usually clear from the context, we suppress the dependency on $n$ from the notation for channels and sequences.
Let $\mathcal C$ be a code of length $n$. For a channel $W$ and input $X_{\mathcal C}$, the block-MAP decoder is defined as
$$\hat x(y) = \arg\max_{x \in \mathcal C} \Pr(x \,|\, y).$$
For a given code and channel, denote the error probability of block-MAP decoding by
$$P_B(W, \mathcal C) = \Pr\left( X_{\mathcal C} \ne \hat X(Y(X_{\mathcal C}, W)) \right).$$
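For a BSC, the block-MAP rule with a uniform prior on the code reduces to minimum-distance decoding; a small Python illustration (our own, with a toy repetition code):

```python
def hw(x: int) -> int:
    return bin(x).count("1")

def map_decode(y: int, code: list[int]) -> int:
    """Block-MAP decoding over a BSC(delta), delta < 1/2, with a uniform prior
    on the code: maximizing Pr(x | y) is the same as maximizing the likelihood
    delta^{d(x,y)} (1-delta)^{n-d(x,y)}, i.e., minimizing the Hamming distance."""
    return min(code, key=lambda x: hw(x ^ y))

code = [0b00000, 0b11111]                  # length-5 repetition code
assert map_decode(0b00101, code) == 0b00000
assert map_decode(0b11011, code) == 0b11111
```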

2.2. $D_\alpha$- and $L_\alpha$-Smoothness

Recall that in the introduction, we expressed the smoothness of a distribution as its proximity to uniformity. Here, we formalize this notion based on two (equivalent) proximity measures.
Let $g$ be a pmf on $H_n$. A natural measure of the uniformity of $g$ is $D_\alpha(g \| U_n)$ ($\alpha \in [0, \infty]$). We call this the $D_\alpha$-smoothness of $g$. Observe that
$$2^n \|g\|_\alpha = \frac{\|g\|_\alpha}{\|g\|_1} \ge 1 \quad \text{for } \alpha \in (1, \infty], \text{ and} \tag{7}$$
$$2^n \|g\|_\alpha = \frac{\|g\|_\alpha}{\|g\|_1} \le 1 \quad \text{for } \alpha \in (0, 1), \tag{8}$$
with equality iff $g = U_n$. Thus, the better the pmf $g$ approximates uniformity, the closer $2^n \|g\|_\alpha$ is to 1 (the denominator is simply a normalization quantity that allows dimension-agnostic analysis). Therefore, $2^n \|g\|_\alpha$ ($\alpha \in (0,1) \cup (1, \infty]$) can be considered as another measure of proximity. We call $2^n \|g\|_\alpha$ the $L_\alpha$-smoothness of $g$. From (7) and (8), it follows that the $D_\alpha$-smoothness and $L_\alpha$-smoothness are equivalent.
Remark 1. 
It is easily seen that $D_\alpha(g \| U_n) = n - H_\alpha(g)$; hence, $D_\alpha(g \| U_n)$ is an increasing function of $\alpha$.
Recall that, for a given code $\mathcal C$ and a noise kernel $r$, $T_r f_{\mathcal C} = r * f_{\mathcal C}$ is the noisy distribution of the code $\mathcal C$ with respect to $r$. We intend to study the smoothing properties of such noisy distributions of codes. In particular, we characterize the necessary conditions for $D_\alpha(T_r f_{\mathcal C} \| U_n)$ to be close to zero (equivalently, for $2^n \|T_r f_{\mathcal C}\|_\alpha$ to be close to one). In Section 3, we quantify these requirements in the asymptotic setting.

2.3. Resolvability

The problem of channel resolvability was introduced by Han and Verdú [1] under the name of approximating the output statistics of the channel. The objective of channel resolvability is to approximate the output distribution of a given input by the output distribution of a code with a smaller support size. In this work, we are interested in code families whose noisy distributions approximate uniformity. Resolvability characterizes the necessary conditions for this to happen in terms of the rate of the code.
Let $W$ be a (discrete memoryless) channel whose input alphabet is $\mathcal X$ and whose output alphabet is $\mathcal Y$. Let $\mathbf X = \{X_n\}_{n=1}^{\infty}$ be a discrete-time random process where the RVs $X_n$ take values in $\mathcal X$. Denote by $Y_n$ the random output of $W$ with input $X_n$ and let $\mathbf Y = \{Y_n\}_{n=1}^{\infty}$. Denote by $P_{\mathbf Y}$ the distribution of $\mathbf Y$ and let $P_{Y^{(n)}}$ be the pmf of the $n$-tuple $Y^{(n)} := (Y_1, Y_2, \ldots, Y_n)$.
For a legitimate (realizable) output process $\mathbf Y$, define
$$J^{(\Delta)}(W, P_{\mathbf Y}) = \inf_{\mathcal C_n \subseteq \mathcal X^n} \left\{ \liminf_{n \to \infty} R(\mathcal C_n) : \Delta(f_{\mathcal C_n}, P_{Y^{(n)}}) \to 0 \right\},$$
where $\Delta$ is a measure of closeness of a pair of probability distributions. In words, we look for sequences of distributions $(f_{\mathcal C_n})_n$ of the smallest possible rate that approximate $P_{\mathbf Y}$ on the output of $W$.
The original problem as formulated by Han and Verdú in [1] seeks to find the resolvability of the channel, defined as
$$C_r^{(\Delta)}(W) = \inf_{P_{\mathbf Y}} \left\{ J^{(\Delta)}(W, P_{\mathbf Y}) : \mathbf Y \text{ is an output process over } W \right\}, \tag{12}$$
where $\Delta$ is either the variational distance or the normalized KL divergence $\frac{1}{n} D(\cdot \| \cdot)$. Hayashi [2] considered the same problem where the proximity was measured by the unnormalized KL divergence. In each case, the resolvability equals the Shannon capacity of the channel $W$.
Theorem 1 
([1,2]). Let W be a discrete memoryless channel. Suppose that Δ is either the KL divergence (normalized or not) or the variational distance; then, the resolvability is given by
$$C_r^{(\Delta)}(W) = C(W),$$
where $C(W)$ is the Shannon capacity of the channel.
The authors of [1] proved this result under the additional assumption that the channel $W$ satisfies the strong converse; Hayashi [2] later showed that this assumption is unessential.
In addition to the proximity measures considered in Theorem 1, the papers [3,28,29] considered other possibilities. In particular, Yu and Tan [3] studied the resolvability problem for a specific target distribution $P_Y$ and for the Rényi divergence $\Delta = D_\alpha$ defined in (6). Their main result is as follows.
Theorem 2 
([3], Theorem 2). Let $W$ be a channel and $P_Y$ be an output distribution. Then,
$$J^{(D_\alpha)}(W, P_Y) = \begin{cases} \min_{P_X \in \mathcal P(W, P_Y)} \sum_x P_X(x) D_\alpha(W(\cdot|x) \| P_Y) & \text{if } \alpha \in (1, 2] \cup \{\infty\} \\ \min_{P_X \in \mathcal P(W, P_Y)} D(W \| P_Y | P_X) & \text{if } \alpha \in (0, 1] \\ 0 & \text{if } \alpha = 0, \end{cases}$$
where $\mathcal P(W, P_Y)$ is the set of input distributions $P_X$ consistent with the output $P_Y$.
A direct corollary of Theorem 2 is the following:
Corollary 1 
([3], Equation (55)). Let $\mathbf Y$ be the output process where, for each $n$, $Y_n \sim \mathrm{Ber}(1/2)$. Then,
$$J^{(D_\alpha)}(\mathrm{BSC}(\delta), P_Y) = \begin{cases} 1 - h_\alpha(\delta) & \text{if } \alpha \in (1, 2] \cup \{\infty\} \\ 1 - h(\delta) & \text{if } \alpha \in (0, 1] \\ 0 & \text{if } \alpha = 0. \end{cases}$$
This corollary gives necessary conditions for the rate of codes that can approximate the uniform distribution via smoothing. We will connect this result to the problem of finding smoothing thresholds in Section 4.
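For concreteness, the following Python snippet (ours) evaluates the thresholds of Corollary 1 at a sample crossover probability; since $h_\alpha(\delta)$ decreases in $\alpha$, the required rate $1 - h_\alpha(\delta)$ grows with $\alpha$.

```python
from math import log2

def h(d: float) -> float:
    return -d * log2(d) - (1 - d) * log2(1 - d)

def h_alpha(d: float, alpha: float) -> float:
    """Binary Renyi entropy h_alpha(delta)."""
    if alpha == float("inf"):
        return -log2(max(d, 1 - d))
    return log2(d**alpha + (1 - d)**alpha) / (1 - alpha)

delta = 0.11
for a in (2, 3, float("inf")):
    print(f"alpha={a}: smoothing threshold 1 - h_alpha = {1 - h_alpha(delta, a):.4f}")
print(f"alpha<=1 : threshold 1 - h = {1 - h(delta):.4f}")
# h_alpha decreases in alpha, so stricter (larger alpha) divergences need higher rate.
```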

3. Perfect Smoothing—The Asymptotic Case

For a given family of noise kernels $(T_{r_n})_n$, there exists a threshold rate such that it is impossible to approximate uniformity with codes of rate below the threshold irrespective of the chosen code, while at the same time, there exist families of codes with rate above the threshold that allow perfect approximation in the limit of infinite length. For instance, for the Bernoulli($\delta$) noise applied to a code $\mathcal C$, the smoothed distribution is nonuniform unless $\mathcal C = H_n$ or $\delta = 1/2$. At the same time, it is possible to approach the uniform distribution asymptotically for large $n$ once the code sequence satisfies certain conditions. Intuitively, it is clear that, for a fixed noise kernel, it is easier to approximate uniformity if the code rate is sufficiently high. In this section, we characterize the threshold rate for (asymptotically) perfect smoothing. Of course, the threshold also depends on the proximity measure $\Delta$ that we are using. In this section, we use perfect smoothing to mean “asymptotically perfect”. If the proximity measure $\Delta$ for smoothing is not specified, this means that we are using the KL divergence. We obtain the threshold rates for perfect smoothing measured with respect to the $\alpha$-divergence for several values of $\alpha$. In the subsequent sections, we work out the details for the Bernoulli and ball noise operators, which also have some implications for communication problems.
Definition 1. 
Let $(\mathcal C_n)_n$ be a sequence of codes of increasing length $n$ and let $0 \le \alpha \le \infty$. We say that the sequence $\mathcal C_n$ is asymptotically perfectly $D_\alpha$-smoothable with respect to the noise kernels $r_n$ if
$$\lim_{n \to \infty} D_\alpha(T_{r_n} f_{\mathcal C_n} \| U_n) = 0,$$
or, equivalently by (7) and (8), if
$$\lim_{n \to \infty} 2^n \|T_{r_n} f_{\mathcal C_n}\|_\alpha = 1 \quad (\alpha \ne 0, 1).$$
One can also define a dimensionless measure for perfect asymptotic smoothing by considering the limiting process
$$\frac{\|T_{r_n} f_{\mathcal C_n} - U_n\|_\alpha}{\|T_{r_n} f_{\mathcal C_n}\|_1} = 2^n \|T_{r_n} f_{\mathcal C_n} - U_n\|_\alpha \to 0. \tag{13}$$
Proposition 1. 
Convergence in (13) implies perfect smoothing for all $1 < \alpha \le \infty$ and is equivalent to it for $\alpha < \infty$.
Proof. 
Let $\mathcal C = \mathcal C_n \subseteq H_n$ for some fixed $n$. Since, by the triangle inequality,
$$\left| 2^n \|T_r f_{\mathcal C}\|_\alpha - 1 \right| \le 2^n \|T_r f_{\mathcal C} - U_n\|_\alpha,$$
(13) is not weaker than the mode of convergence in Definition 1 for all $\alpha \in [1, \infty]$. For $\alpha \ne 1, \infty$, we use Clarkson’s inequalities ([45], p. 388). Their form depends on $\alpha$; namely, for $2 \le \alpha < \infty$, we have
$$\left\| \frac{2^n T_r f_{\mathcal C} + 1}{2} \right\|_\alpha^\alpha + \left\| \frac{2^n T_r f_{\mathcal C} - 1}{2} \right\|_\alpha^\alpha \le \frac{1}{2} \left( \|2^n T_r f_{\mathcal C}\|_\alpha^\alpha + 1 \right).$$
For $1 < \alpha < 2$, the inequality has the form
$$\left\| \frac{2^n T_r f_{\mathcal C} + 1}{2} \right\|_\alpha^{\alpha'} + \left\| \frac{2^n T_r f_{\mathcal C} - 1}{2} \right\|_\alpha^{\alpha'} \le \left( \frac{1}{2} \left( \|2^n T_r f_{\mathcal C}\|_\alpha^\alpha + 1 \right) \right)^{\alpha'/\alpha},$$
where $\alpha' = \frac{\alpha}{\alpha - 1}$ is the Hölder conjugate. These inequalities show that, for $\alpha \in (1, \infty)$, $2^n \|T_{r_n} f_{\mathcal C_n}\|_\alpha \to 1$ implies $2^n \|T_{r_n} f_{\mathcal C_n} - U_n\|_\alpha \to 0$, establishing the claimed equivalence. □
Definition 2. 
Let $(r_n)_n$ be a sequence of noise kernels. We say that the rate $R$ is achievable for perfect $D_\alpha$-smoothing if there exists a sequence of codes $(\mathcal C_n)_n$ such that $R(\mathcal C_n) \to R$ as $n \to \infty$ and $(\mathcal C_n)_n$ is perfectly $D_\alpha$-smoothable.
Note that if $R_1$ is achievable, then any rate $R_2$ with $R_1 < R_2 \le 1$ is also achievable. Indeed, consider a (linear) code $\mathcal C_1$ of rate $R_1$ that has good smoothing properties. Construct $\mathcal C_2$ by taking the union of $2^{n(R_2 - R_1)}$ non-overlapping shifts of $\mathcal C_1$. Then the rate of $\mathcal C_2$ is $R_2$, and since each shift has good smoothing properties, the same is true for $\mathcal C_2$. Therefore, let us define the main concept of this section.
Definition 3. 
Given a sequence of kernels $\mathbf r = (r_n)_n$, define the $D_\alpha$-smoothing capacity as
$$S_\alpha^{\mathbf r} := \inf_{(\mathcal C_n)_n} \left\{ \liminf_{n \to \infty} R(\mathcal C_n) : \lim_{n \to \infty} D_\alpha(T_{r_n} f_{\mathcal C_n} \| U_n) = 0 \right\}.$$
Note that this quantity is closely related to the resolvability: if, rather than optimizing over the output process in (12), we set the output distribution to uniform and take $\Delta = D_\alpha$, then $S_\alpha^{\mathbf r}$ equals $J^{(D_\alpha)}(W, P_Y)$ for the channel $W$ given by the noise kernel $\mathbf r$. To avoid future confusion, we refer to the capacity of reliable transmission as Shannon’s capacity.
The following lemma provides a lower bound for the $D_\alpha$-smoothness. It follows from Lemma 2 in [3], and we give a direct proof for completeness.
Lemma 1. 
Let $\mathcal C \subseteq H_n$ be a code of size $M = 2^{nR}$ and let $r$ be a noise kernel. Then, for $\alpha \in [0, \infty]$,
$$D_\alpha(T_r f_{\mathcal C} \| U_n) \ge n(1 - R) - H_\alpha(r).$$
Proof. 
We will first prove that $(2^n \|T_r f_{\mathcal C}\|_\alpha)^\alpha \ge 2^{(\alpha - 1)[n(1 - R) - H_\alpha(r)]}$ for $\alpha \in (1, \infty)$:
$$(2^n \|T_r f_{\mathcal C}\|_\alpha)^\alpha = \frac{2^{n\alpha}}{2^n} \sum_{x \in H_n} T_r f_{\mathcal C}(x)^\alpha = 2^{n(\alpha - 1)} \sum_{x \in H_n} \Big[ \sum_{y \in H_n} r(y) f_{\mathcal C}(x + y) \Big]^\alpha \ge 2^{n(\alpha - 1)} \sum_{x \in H_n} \sum_{y \in H_n} [r(y) f_{\mathcal C}(x + y)]^\alpha = 2^{n(\alpha - 1)} \sum_{y \in H_n} r(y)^\alpha \sum_{x \in H_n} f_{\mathcal C}(x + y)^\alpha = 2^{n\alpha} |\mathcal C|^{-(\alpha - 1)} \|r\|_\alpha^\alpha = 2^{(\alpha - 1)[n(1 - R) - H_\alpha(r)]}.$$
Together with (7), this implies that the claimed inequality holds for α ( 1 , ) .
A similar calculation shows that, for $\alpha \in (0, 1)$, $(2^n \|T_r f_{\mathcal C}\|_\alpha)^\alpha \le 2^{(\alpha - 1)[n(1 - R) - H_\alpha(r)]}$, yielding the claim for $\alpha \in (0, 1)$. The limiting cases $\alpha = 0$, $\alpha = 1$, and $\alpha = \infty$ follow by continuity of $D_\alpha$ and $H_\alpha$ for all $\alpha \ge 0$. □
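The bound of Lemma 1 can be checked numerically on a toy example; in the following Python sketch (code and parameters ours), we verify it for a small linear code under Bernoulli noise with $\alpha = 2$.

```python
import numpy as np

def hw(x): return bin(x).count("1")

def renyi(p, alpha):
    p = p[p > 0]
    return float(np.log2((p**alpha).sum()) / (1 - alpha))

n, delta, alpha = 4, 0.15, 2.0
N = 1 << n
r = np.array([delta**hw(z) * (1 - delta)**(n - hw(z)) for z in range(N)])
code = [0b0000, 0b0111, 0b1011, 0b1100]        # a [4, 2] linear code, R = 1/2
f_C = np.zeros(N); f_C[code] = 1 / len(code)
g = np.array([sum(r[z] * f_C[x ^ z] for z in range(N)) for x in range(N)])
D = n - renyi(g, alpha)                        # D_alpha(T_r f_C || U_n)
bound = n * (1 - 0.5) - renyi(r, alpha)        # n(1 - R) - H_alpha(r)
assert D >= bound - 1e-9
print(f"D_2 = {D:.4f} >= lower bound {bound:.4f}")
```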
Define
$$\pi(\alpha) = \liminf_{n \to \infty} \frac{H_\alpha(r_n)}{n}.$$
Lemma 1 shows that it is impossible to achieve perfect $D_\alpha$-smoothing if $R < 1 - \pi(\alpha)$. A question of interest is whether there exist sequences of codes of rate $R > 1 - \pi(\alpha)$ that achieve perfect $D_\alpha$-smoothing. The next theorem shows that this is the case for $\alpha \in (1, \infty]$.
Theorem 3. 
Let $\mathbf r = (r_n)_n$ be a sequence of noise kernels and let $\alpha \in (1, \infty]$. Then,
$$S_\alpha^{\mathbf r} = 1 - \pi(\alpha). \tag{16}$$
The proof relies on a random coding argument and is given in Appendix B. This result will be used below to characterize the smoothing capacity of the Bernoulli and ball noise operators.
Remark 2. 
Equality (16) does not hold in the case $\alpha \in [0, 1]$. From Theorem 4 below, the Bernoulli noise does not satisfy (16) for $\alpha \in [0, 1)$. To construct a counterexample for $\alpha = 1$, consider the noise kernel that is almost uniform except for one distinguished point, for instance, $r_n(x) = 2^{-(n+1)}$ for $x \ne 0$ and $r_n(0) = \frac{1}{2} + \frac{1}{2^{n+1}}$. Performing the calculations, we then obtain that $S_1^{\mathbf r} = 1$ while $\pi(1) = \frac{1}{2}$.
Remark 3. 
It is worth noting that $\pi(\alpha)$ is a decreasing function of $\alpha$ for $0 \le \alpha \le \infty$.

4. Bernoulli Noise

In this section, we characterize the value $S_\alpha^{\beta_\delta}$ for a range of values of $\alpha$. Then, we provide explicit code families that attain the $D_\alpha$-smoothing capacities.
As already mentioned, the resolvability for $\beta_\delta$ with respect to the $\alpha$-divergence was considered by Yu and Tan [3]. Their results, stated in Corollary 1, yield an expression for $S_\alpha^{\beta_\delta}$ for $\alpha \in [0, 2] \cup \{\infty\}$. The next theorem summarizes the current knowledge about $S_\alpha^{\beta_\delta}$, where the claims for $2 < \alpha < \infty$ are new results.
Theorem 4. 
$$S_\alpha^{\beta_\delta} = \begin{cases} 0 & \text{if } \alpha = 0 \\ 1 - h(\delta) & \text{if } \alpha \in (0, 1] \\ 1 - h_\alpha(\delta) & \text{if } \alpha \in (1, \infty]. \end{cases}$$
Proof. 
The claims for $\alpha \in [0, 1]$ follow from Corollary 1. The results for $\alpha \in (1, \infty]$ follow from Theorem 3 since $\frac{H_\alpha(\beta_\delta)}{n} = h_\alpha(\delta)$. □
Having quantified the smoothing capacities, let us examine code families with strong smoothing properties. Since the $D_1$-smoothing capacity and the Shannon capacity coincide, it is natural to speculate that codes that achieve the Shannon capacity when used on the BSC($\delta$) would also attain the $D_1$-smoothing capacity. However, the following result demonstrates that capacity-achieving codes do not yield perfect smoothing. For typographical reasons, we abbreviate $T_{\beta_\delta}$ by $T_\delta$ from this section onward.
Proposition 2. 
Let $\mathcal C_n$ be a sequence of codes achieving the capacity of the BSC($\delta$). Then,
$$D(T_\delta f_{\mathcal C_n} \| U_n) \to \infty, \qquad D(T_\delta f_{\mathcal C_n} \| U_n) = o(n).$$
Proof. 
The second part of the statement is Theorem 2 in [46]. The first part is obtained as follows: Let $\mathcal C_n$ be a capacity-achieving sequence of codes in the BSC($\delta$). Then, from [47] (Theorem 49), there exists a constant $K > 0$ such that $nR(\mathcal C_n) \le n(1 - h(\delta)) - K\sqrt{n}$ for large $n$. Therefore,
$$0 \le H(X_{\mathcal C_n} \,|\, Y(X_{\mathcal C_n}, \mathrm{BSC}(\delta))) = n(R(\mathcal C_n) + h(\delta) - 1) + D(T_\delta f_{\mathcal C_n} \| U_n),$$
which implies $D(T_\delta f_{\mathcal C_n} \| U_n) \ge K\sqrt{n}$. □
Apart from random codes, only polar codes are known to achieve the $D_1$-smoothing capacity. Before stating the formal result, recall that polar codes are formed by applying several iterations of a linear transformation to the input, which results in creating virtual channels for individual bits with Shannon capacity close to zero or to one, plus a vanishing proportion of intermediate-capacity channels. While, by Proposition 2, polar codes that achieve the BSC capacity cannot achieve the $D_1$-smoothing capacity, adding some intermediate-bit channels to the set of data bits makes this possible. This idea was first introduced in [39] and expressed in terms of resolvability in [48].
Theorem 5 
([48], Proposition 1). Let $W$ be the BSC($\delta$) channel and $W_n^{(i)}$ be the virtual channels formed after applying $n$ steps of the polarization procedure. For $\gamma \in (0, 1/2)$, define $G_n = \{ i \in \{1, \ldots, n\} : C(W_n^{(i)}) \ge 2^{-n^\gamma} \}$. Let $\mathcal C_n$ be the polar code corresponding to the virtual channels $G_n$. Then, $D(T_\delta f_{\mathcal C_n} \| U_n) \to 0$.
Note that $\lim_{n \to \infty} R(\mathcal C_n) = \lim_{n \to \infty} \frac{|G_n|}{n} = 1 - h(\delta)$. Hence, the polar code construction presented above achieves the perfect smoothing threshold with respect to the KL divergence. Furthermore, since convergence in the $\alpha$-divergence for $\alpha < 1$ is weaker than convergence for $\alpha = 1$, the same polar code sequence is perfectly $D_\alpha$-smoothable for $\alpha < 1$. Noting that the smoothing threshold for $\alpha < 1$ is $1 - h(\delta)$ by Theorem 4, we conclude that the above polar code sequence achieves the smoothing capacity in the $\alpha$-divergence for $\alpha < 1$.
As mentioned earlier, the smoothing properties of code families other than random codes and polar codes have not been extensively studied. We show that the duals of capacity-achieving codes in the BEC exhibit good smoothing properties using the tools developed in [10]. As the first step, we establish a connection between the smoothing of a generic linear code and the erasure correction performance of its dual code.
Lemma 2. 
Let $\mathcal C$ be a linear code and let $X_{\mathcal C^\perp}$ be a random uniform codeword of $\mathcal C^\perp$. Let $Y(X_{\mathcal C^\perp}, \mathrm{BEC}(\lambda))$ be the output of the erasure channel BEC($\lambda$) for the input $X_{\mathcal C^\perp}$. Then,
$$D_\alpha(T_\delta f_{\mathcal C} \| U_n) \le H(X_{\mathcal C^\perp} \,|\, Y(X_{\mathcal C^\perp}, \mathrm{BEC}(\lambda))),$$
where $\lambda = (1 - 2\delta)^2$ for $\alpha = 1$ and $\lambda = 1 - h_\alpha(\delta)$ for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$.
The proof is given in Appendix D.
Using this lemma, we show that the duals of BEC capacity-achieving codes (with growing distance) exhibit good smoothing properties. In particular, they achieve the $D_\alpha$-smoothing capacities for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$.
Theorem 6. 
Let $(\mathcal C_n)_n$ be a sequence of linear codes with rates $R_n \to R$. Suppose that the dual sequence $(\mathcal C_n^\perp)_n$ achieves Shannon’s capacity of the BEC($\lambda$) with $\lambda = R$, and assume that $d(\mathcal C_n^\perp) = \omega(\log n)$. If $R > (1 - 2\delta)^2$, then
$$\lim_{n \to \infty} D(T_\delta f_{\mathcal C_n} \| U_n) = 0.$$
Additionally, for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$, if $R > 1 - h_\alpha(\delta)$, then
$$\lim_{n \to \infty} D_\alpha(T_\delta f_{\mathcal C_n} \| U_n) = 0.$$
In particular, the sequence $\mathcal C_n$ achieves the $D_\alpha$-smoothing capacity $S_\alpha^{\beta_\delta}$ for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$.
Proof. 
Since the dual codes achieve the capacity of the BEC, it follows from ([49], Theorem 5.2) that, if their distance grows with $n$, then their decoding error probability vanishes. In particular, if $d(\mathcal C_n^\perp) = \omega(\log n)$, then $P_B(\mathrm{BEC}(R - \epsilon), \mathcal C_n^\perp) = o(\frac{1}{n})$ for all $\epsilon \in (0, R]$. Hence, from Fano’s inequality,
$$\lim_{n \to \infty} H(X_{\mathcal C_n^\perp} \,|\, Y(X_{\mathcal C_n^\perp}, \mathrm{BEC}(R - \epsilon))) = 0.$$
Now, if $R > (1 - 2\delta)^2$, then there exists $\epsilon_0$ such that $R - \epsilon_0 = (1 - 2\delta)^2$. Therefore, from Lemma 2,
$$\lim_{n \to \infty} D(T_\delta f_{\mathcal C_n} \| U_n) \le \lim_{n \to \infty} H(X_{\mathcal C_n^\perp} \,|\, Y(X_{\mathcal C_n^\perp}, \mathrm{BEC}(R - \epsilon_0))) = 0.$$
Similarly, if $R > 1 - h_\alpha(\delta)$ for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$, then $\lim_{n \to \infty} D_\alpha(T_\delta f_{\mathcal C_n} \| U_n) = 0$.
Together with Theorem 4, we have now proved the final claim. □
The known code families that achieve the capacity of the BEC include polar codes, LDPC codes, and doubly transitive codes, such as constant-rate RM codes. LDPC codes do not fit the assumptions because of low dual distance, but the other codes do. This yields explicit families of codes that achieve the D α -smoothing capacity.
We illustrate the results of this section in Figure 1, where the curves show the achievability and impossibility rates for perfect smoothing with respect to the Bernoulli noise. For a code (sequence) of rate $R$, perfect smoothing under the noise $\beta_\delta$ is impossible below the Shannon capacity curve. The sequence of polar codes from [39], cited in Theorem 5, is smoothable at rates equal to the Shannon capacity, although these codes do not provide a decoding guarantee at that noise level. At the second curve from the bottom, the duals of the codes that achieve Shannon’s capacity of the BEC achieve perfect $D_1$-smoothing; at the third (fourth) curve, these codes are perfectly $D_2$- (resp., $D_\infty$-) smoothable, and they achieve the corresponding smoothing capacity.
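The four threshold curves of Figure 1 can be reproduced pointwise; a small Python sketch (ours), evaluated at a sample $\delta$:

```python
from math import log2

def h(d): return -d*log2(d) - (1-d)*log2(1-d)
def h2(d): return -log2(d*d + (1-d)*(1-d))     # binary Renyi entropy, alpha = 2
def h_inf(d): return -log2(1-d)                # alpha = inf, for delta <= 1/2

delta = 0.11
curves = {
    "Shannon capacity 1 - h(delta)":           1 - h(delta),
    "duals of BEC codes, D_1: (1-2 delta)^2":  (1 - 2*delta)**2,
    "duals of BEC codes, D_2: 1 - h_2":        1 - h2(delta),
    "duals of BEC codes, D_inf: 1 - h_inf":    1 - h_inf(delta),
}
for name, rate in sorted(curves.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rate:.4f}")   # printed bottom curve first
```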
Remark 4. 
Observe that the strong converse of the channel coding theorem does not imply perfect smoothing. To give a quick example, consider a code $\mathcal C_n = B(0, \delta' n)$ formed of all the vectors in the ball. Let $0 < \delta' < 1/2$ and let us use this code on a BSC($\delta$), where $h(\delta') + h(\delta) > 1$ and $\delta < 1/2$. From the choice of the parameters, the rate of $\mathcal C_n$ is above capacity, and, therefore, $P_B(\mathrm{BSC}(\delta), \mathcal C_n) \to 1$ from the strong converse. At the same time,
$$D(T_\delta f_{\mathcal C_n} \| U_n) = n - H(b_{\delta' n} * \beta_\delta) = n - H(\beta_{\delta'} * \beta_\delta) + O(\sqrt{n}) = n\big(1 - h(\delta'(1-\delta) + \delta(1-\delta'))\big) + O(\sqrt{n}),$$
where the transition from the ball noise to the Bernoulli noise (the second equality) is shown in [30]. Since $\delta'(1-\delta) + \delta(1-\delta') < 1/2$ for all $\delta < 1/2$, $\delta' < 1/2$, we conclude that $D(T_\delta f_{\mathcal C_n} \| U_n) \not\to 0$.
Remark 5. 
In this paper, we mostly study the trade-off between the rate of codes and the level of the noise needed to achieve perfect smoothing. A recent work of Debris-Alazard et al. [4] considered guarantees for smoothing derived from the distance distribution of codes and their dual distance (earlier, similar calculations were performed in [42,50]). Our approach enables us to find the conditions for perfect smoothing similar to [4] but relying on fewer assumptions.
Proposition 3. 
Let $(\mathcal C_n)_n$ be a sequence of codes whose dual distance satisfies $d(\mathcal C_n^\perp) \ge \epsilon n$, where $\epsilon \in (0, 1)$. If $\epsilon > (1 - 2\delta)^2$, then
$$\lim_{n \to \infty} D(T_\delta f_{\mathcal C_n} \| U_n) = 0.$$
Proof. 
Notice that $\lim_{n \to \infty} H(X_{\mathcal C_n^\perp} \,|\, Y(X_{\mathcal C_n^\perp}, \mathrm{BEC}(\lambda))) = 0$ if $\epsilon > \lambda$. With this, the proof is a straightforward application of Lemma 2. □
Compared to [4], this claim removes the restrictions on the support of the dual distance distribution of the codes C n .

5. Binary Symmetric Wiretap Channels

In this section, we discuss applications of perfect smoothing to the BSC wiretap channel. Wyner’s wiretap channel model $\mathcal V$ [35] for the case of BSCs is defined as follows: The system is formed of three terminals, A, B, and E. Terminal A communicates with B by sending messages $M$ chosen from a finite set $\mathcal M$. Communication from A to B occurs over a BSC $W_b$ with crossover probability $\delta_b$, and it is observed by the eavesdropper E via another BSC $W_e$ with crossover probability $\delta_e > \delta_b$. A message $M \in \mathcal M$ is encoded into a bit sequence $X \in H_n$ and sent from A to B in $n$ uses of the channel $W_b$. Terminal B observes the sequence $Y = X + W_b$, where $W_b \sim \mathrm{Bin}(n, \delta_b)$ is the noise vector, while terminal E observes the sequence $Z = X + W_e$ with $W_e \sim \mathrm{Bin}(n, \delta_e)$. We assume that the messages are encoded into a subset of $H_n$, which imposes some probability distribution on the input of the channels. The goal of the encoding is to ensure reliability and secrecy of communication. The reliability requirement amounts to the condition $\Pr(M \ne \hat M) \to 0$ as $n \to \infty$, where $\hat M$ is the estimate of $M$ made by B. To ensure secrecy, we require the strong secrecy condition $I(M; Z) \to 0$. This is in contrast to the condition $\frac{1}{n} I(M; Z) \to 0$ studied in the early works on the wiretap channel, which is now called weak secrecy. Denote by $R = \frac{1}{n} \log |\mathcal M|$ the transmission rate. The secrecy capacity $C_s(\mathcal V)$ is defined as the supremum of the rates that permit reliable transmission while also conforming to the secrecy condition.
The nested coding scheme, proposed by Wyner [35], has been the principal tool for constructing well-performing transmission protocols for the wiretap channel [38,39,41]. To describe it, let $\mathcal C_e$ and $\mathcal C_b$ be two linear codes such that $\mathcal C_e \subset \mathcal C_b$ and $|\mathcal M| = |\mathcal C_b|/|\mathcal C_e|$. We assign each message $m$ to a unique coset of $\mathcal C_e$ in $\mathcal C_b$. The sequence transmitted by A is a uniform random vector from the coset. As long as the rate of the code $\mathcal C_b$ is below the capacity of $W_b$, we can ensure the reliability of communication from A to B.
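A toy Python sketch of the nested (coset) encoder may help fix ideas; the specific codes below are our own small example, not the RM pairs used later.

```python
import random

# Toy nested pair C_e in C_b on H_4 (both linear; our own example).
C_b = [a ^ b for a in (0b0000, 0b0111) for b in (0b0000, 0b1011)]  # [4,2] code
C_e = [0b0000, 0b0111]                                             # [4,1] subcode
cosets = {}
for m, leader in enumerate(sorted(set(min(c ^ e for e in C_e) for c in C_b))):
    cosets[m] = [leader ^ e for e in C_e]     # message m <-> coset c_m + C_e

def encode(m: int) -> int:
    """Transmit a uniformly random vector from the coset of message m."""
    return random.choice(cosets[m])

x = encode(1)
assert x in C_b and x not in cosets[0]        # |M| = |C_b| / |C_e| = 2 messages
```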
Strong secrecy can be achieved relying on perfect smoothing. Denote by $c_m$ a leader of the coset that corresponds to the message $m$. The basic idea is that if $P_{Z|M=m} = (T_{\delta_e} f_{\mathcal C_e})(\cdot + c_m)$ is close to the uniform distribution $U_n$ for all $m$, these conditional pmfs are almost indistinguishable from each other, and terminal E has no means of inferring the transmitted message from the observed bit string $Z$.
As mentioned earlier, the weak secrecy results for the wiretap channel based on LDPC codes and on polar codes were presented in [38,39], respectively. The problem that these schemes faced, highlighted in Theorems 2 and 5, is that code sequences that achieve the BSC capacity have a rate gap of order $1/\sqrt{n}$ to the capacity value. At the same time, the rate of perfectly smoothable codes must exceed the capacity by a similar quantity [51]. For this reason, the authors of [39] included the intermediate virtual channels in their polar coding scheme, which gave them strong secrecy but interfered with transmission reliability. A similar general issue arose earlier in attempting to use LDPC codes for the wiretap channel [40].
Contributing to the line of work connecting smoothing and the wiretap channel [2,3,11], we show that nested coding schemes $\mathcal C_e \subset \mathcal C_b$, where the code $\mathcal C_b$ is good for error correction on the BSC($\delta_b$) and $\mathcal C_e$ is perfectly smoothable with respect to $\beta_{\delta_e}$, attain strong secrecy and reliability for a BSC wiretap channel $(\delta_b, \delta_e)$. As observed in Lemma 2, the duals of good erasure-correcting codes are perfectly smoothable for certain noise levels and, hence, they form a good choice for $\mathcal C_e$ in this scenario.
The following lemma establishes a connection between the smoothness of a noisy distribution of a code and strong secrecy.
Lemma 3. 
Consider the nested coding scheme for the BSC wiretap channel introduced above. If $D(T_{\delta_e} f_{\mathcal C_e} \| U_n) < \epsilon$, then $I(M; Z) < \epsilon$.
Proof. 
We have
$$D(P_{Z|M} \| U_n | P_M) = \sum_{m \in \mathcal M} \sum_{z \in H_n} P_{MZ}(m, z) \log \frac{P_{Z|M}(z|m)}{U_n(z)} = I(M; Z) + D(P_Z \| U_n).$$
Now, note that $P_{Z|M=m}(z) = (T_{\delta_e} f_{\mathcal C_e})(z + c_m) = P_{Z|M=m'}(z + c_m + c_{m'})$, so $D(P_{Z|M=m} \| U_n)$ is independent of $m$. Therefore, for all $m \in \mathcal M$,
$$D(P_{Z|M=m} \| U_n) = D(P_{Z|M} \| U_n | P_M) = I(M; Z) + D(P_Z \| U_n) \ge I(M; Z). \;\square$$
This lemma enables us to formulate conditions for reliable communication while guaranteeing the strong secrecy condition. Namely, it suffices to take a pair (a sequence of pairs) of nested codes $\mathcal C_e \subset \mathcal C_b$ such that $D(T_{\delta_e} f_{\mathcal C_e} \| U_n) \to 0$ as $n \to \infty$. If at the same time the code $\mathcal C_b$ corrects errors on a BSC($\delta_b$), then the scheme fulfills both the reliability and strong secrecy requirements under noise levels $\delta_b$ and $\delta_e$ for channels $W_b$ and $W_e$, respectively, supporting transmission from A to B at rate $R_b - R_e$. Together with the results established earlier, we can now make this claim more specific.
Theorem 7. 
Let $((\mathcal C_e^n)^\perp)_n$ and $(\mathcal C_b^n)_n$ be sequences of linear codes that achieve the capacity of the BEC for their respective rates. Suppose that $\mathcal C_e^n \subset \mathcal C_b^n$ and
1.
$d((\mathcal C_e^n)^\perp) = \omega(\log n)$, $R(\mathcal C_e^n) \to R_e$;
2.
$d(\mathcal C_b^n) = \omega(\log n)$, $R(\mathcal C_b^n) \to R_b$.
If $R_b < 1 - \log(1 + 2\sqrt{\delta_b(1-\delta_b)})$ and $R_e > 1 - 4\delta_e(1-\delta_e)$, then the nested coding scheme based on $\mathcal C_e^n$ and $\mathcal C_b^n$ can transmit messages at rate $R_b - R_e$ from A to B, satisfying the reliability and strong secrecy conditions.
Proof. 
From Corollary A1, the conditions $d(\mathcal C_b^n) = \omega(\log n)$ and $R_b < 1 - \log(1 + 2\sqrt{\delta_b(1-\delta_b)})$ guarantee transmission reliability. Furthermore, by Theorem 6, the conditions $d((\mathcal C_e^n)^\perp) = \omega(\log n)$ and $R_e > 1 - 4\delta_e(1-\delta_e)$ imply that $D(T_{\delta_e} f_{\mathcal C_e^n} \| U_n) \to 0$, which in its turn implies strong secrecy by Lemma 3. □
To give an example of a code family that satisfies the assumptions of this theorem, consider RM codes of constant rate. Namely, let $\mathcal C_e^n \subset \mathcal C_b^n$ be two sequences of RM codes whose rates converge to $R_e$ and $R_b$, respectively. Note that the duals of RM codes are themselves RM codes. By a well-known result [52], RM codes achieve the capacity of the BEC, and for any sequence of constant-rate RM codes, the distance scales as $n^{1/2 - o(1)}$, which is $\omega(\log n)$. Therefore, the RM codes satisfy the assumptions of Theorem 7.
Note that for the RM codes, we can obtain a stronger result, based on their error correction properties on the BSC. Involving this additional argument brings them closer to the secrecy capacity under the strong secrecy assumption.
Theorem 8. 
Let $\mathcal C_e^n$ and $\mathcal C_b^n$ be two sequences of RM codes satisfying $\mathcal C_e^n \subset \mathcal C_b^n$ whose rates approach $R_e > 0$ and $R_b > 0$, respectively. If $R_b < 1 - h(\delta_b)$ and $R_e > 1 - 4\delta_e(1-\delta_e)$, then the nested coding scheme based on $\mathcal C_e^n$ and $\mathcal C_b^n$ supports transmission on a BSC wiretap channel $(\delta_b, \delta_e)$ at rate $R_b - R_e$, guaranteeing communication reliability and strong secrecy.
Proof. 
Very recently, Abbe and Sandon [53], building upon the work of Reeves and Pfister [54], proved that RM codes achieve capacity in symmetric channels. Therefore, the condition $R_b < 1 - h(\delta_b)$ guarantees reliability. The rest of the proof is similar to that of Theorem 7. □
Theorems 7 and 8 stop short of constructing codes that attain the secrecy capacity of the channel (this is similar to the results of [14] for the transmission problem over the BSC). To quantify the gap to capacity, we plot the smoothing and decodability rate bounds in Figure 2.
As an example, let us set the noise parameters $\delta_b = 0.05$ and $\delta_e = 0.3$ and denote the corresponding secrecy capacity by $C_s$. Suppose that we use a BEC capacity-achieving code as the code $\mathcal C_b$ and a dual of a BEC capacity-achieving code as the code $\mathcal C_e$ in the nested scheme. The value $R^*$ is the largest rate at which we can guarantee both reliability and strong secrecy. In the example in Figure 2, $C_s = R_b^{(1)} - R_e^{(1)} = 0.5949$ and $R^* = R_b^{(2)} - R_e^{(2)} = 0.3181$. The only assumption required here is that the codes $\mathcal C_e$ and $\mathcal C_b$ have good erasure correction properties.
As noted, generally, the RM codes support a higher communication rate than $R^*$. Let $R^{**}$ be their achievable rate. For the same noise parameters as above, we obtain $R^{**} = R_b^{(1)} - R_e^{(2)} = 0.5536$, which is closer to $C_s$ than $R^*$.
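The numbers in this example follow from the threshold formulas established above; a short Python check (ours):

```python
from math import log2, sqrt

def h(d): return -d*log2(d) - (1-d)*log2(1-d)

db, de = 0.05, 0.3
Rb1 = 1 - h(db)                              # BSC capacity of the main channel
Re1 = 1 - h(de)                              # smoothing threshold at alpha = 1
Rb2 = 1 - log2(1 + 2*sqrt(db*(1-db)))        # decodability bound from [14]
Re2 = 1 - 4*de*(1-de)                        # = (1 - 2*de)^2, Theorem 6 threshold
print(f"C_s  = {Rb1 - Re1:.4f}")             # 0.5949
print(f"R*   = {Rb2 - Re2:.4f}")             # ~0.3181 (BEC-based nested scheme)
print(f"R**  = {Rb1 - Re2:.4f}")             # 0.5536 (RM codes, Theorem 8)
```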
Remark 6. 
The fact that the RM codes achieve capacity in symmetric channels immediately implies that nested RM codes achieve the secrecy capacity in the BSC wiretap channel under weak secrecy. While it is tempting to assume that, coupled with the channel duality theorems of [55,56], this result also implies that RM codes fulfil the strong secrecy requirement on the BSC wiretap channel, an immediate proof looks out of reach [57].

Secrecy from $\alpha$-Divergence

Classically, the (strong) secrecy in the wiretap channel is measured by $I(M; Z)$. In [11], slightly weaker secrecy measures were considered besides the mutual information. However, more stringent secrecy measures may be required in certain scenarios; $\alpha$-divergence-based secrecy measures were introduced by Yu and Tan [3] as a solution to this problem.
Observe that the secrecy measured by $D_\alpha(P_{Z|M} \| U_n | P_M)$ for $\alpha \ge 1$ is stronger than the mutual-information-based secrecy. This is because, for $\alpha \ge 1$,
$$I(M; Z) \le D(P_{Z|M} \| U_n | P_M) \le D_\alpha(P_{Z|M} \| U_n | P_M).$$
Given a wiretap channel with an encoding-decoding scheme, we say that $\alpha$-secrecy is satisfied if
$$\lim_{n \to \infty} D_\alpha(P_{Z|M} \| U_n | P_M) = 0.$$
The following theorem establishes that it is possible to achieve the rate $C(\delta_b) - S_\alpha^{\beta_{\delta_e}} = h_\alpha(\delta_e) - h(\delta_b)$ with RM codes for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$.
Theorem 9. 
Let $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$. Let $\mathcal C_e^n$ and $\mathcal C_b^n$ be two sequences of RM codes satisfying $\mathcal C_e^n \subset \mathcal C_b^n$ whose rates approach $R_e > 0$ and $R_b > 0$, respectively. If $R_b < 1 - h(\delta_b)$ and $R_e > 1 - h_\alpha(\delta_e)$, then the nested coding scheme based on $\mathcal C_e^n$ and $\mathcal C_b^n$ supports transmission on a BSC wiretap channel $(\delta_b, \delta_e)$ guaranteeing $\alpha$-secrecy at rate $R_b - R_e$, provided that $h_\alpha(\delta_e) - h(\delta_b) > 0$.
Evidently, to achieve a more stringent version of secrecy, it is necessary to reduce the rate of the message. The capacity of the $(\delta_b, \delta_e)$-wiretap channel is $h(\delta_e) - h(\delta_b)$, while the highest known rate that assures $\alpha$-secrecy and reliability is $h_\alpha(\delta_e) - h(\delta_b)$. Hence, to achieve $\alpha$-secrecy, we must give up $h(\delta_e) - h_\alpha(\delta_e)$ of the attainable rate.

6. Ball Noise and Error Probability of Decoding

This section focuses on achieving the best possible smoothing with respect to the ball noise. As an application, we show that codes that possess good smoothing properties with respect to the ball noise are suitable for error correction in the BSC.

6.1. Ball Noise

Recall that perfect smoothing of a sequence of codes is only possible if the rate is greater than the corresponding $D_\alpha$-smoothing capacity. In addition to characterizing the $D_\alpha$-smoothing capacities of the ball noise, we quantify the best smoothing one can expect at rates below the $D_\alpha$-smoothing capacity. We will use these results in the upcoming subsection when we derive upper bounds for the decoding error probability on a BSC. The next theorem summarizes our main result on smoothing with respect to the ball noise.
Theorem 10. 
Let $(b_{\delta n})_n$ be the sequence of ball noise operators, where $\delta n$ is the radius of the ball. Let $\delta \in [0, 1/2]$, $\alpha \in [0, \infty]$. Let $\mathcal C_n$ be a code of length $n$ and rate $R_n$. Then, we have the following bounds:
$$D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \ge 0, \tag{19}$$
$$\frac{1}{n} D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \ge 1 - R_n - h(\delta). \tag{20}$$
There exist sequences of codes of rate $R_n \to R$ that achieve asymptotic equality in (19) for all $R > 1 - h(\delta)$. At the same time, if $R < 1 - h(\delta)$, then there exist sequences of codes achieving asymptotic equality in (20).
Proof. 
The inequality in (19) is trivial. Let us prove that asymptotically it can be achieved with equality. From Theorem 3, there exists a sequence of codes $(\mathcal C_n)_n$ such that $D_\infty(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = o(1)$ given that $R > 1 - h(\delta)$. Hence, for $\alpha \in [0, \infty]$,
$$0 \le D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \le D_\infty(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = o(1).$$
Hence, the equality case in (19) is achievable for all $\alpha \in [0, \infty]$.
Let us prove (20). From Lemma 1, we have
$$D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \ge n(1 - R_n) - H_\alpha(b_{\delta n}) \ge n(1 - R_n - h(\delta))$$
because $\frac{1}{n} H_\alpha(b_{\delta n}) = \frac{1}{n} \log V_{\delta n} \le h(\delta)$.
We are left to show that, for $R < 1 - h(\delta)$, (20) can be achieved with equality in the limit of large $n$. We use a random coding argument to prove this. Let $\mathcal C_n$ be an $(n, 2^{nR_n})$ code whose codewords are chosen independently and uniformly. In Equation (A6), Appendix B, we define the expected norm of the noisy function. Here, we use this quantity for the ball noise kernel. For $\alpha \in [0, \infty)$, define
$$Q_n(\alpha) = E_{\mathcal C_n} 2^{(\alpha - 1) D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n)}.$$
From Lemma A2, for any rational $\alpha > 1$,
$$Q_n(\alpha) \le \sum_{k=0}^{p} \binom{p}{k} 2^{\frac{nk}{q}\left(1 - R_n - \frac{\log V_{\delta n}}{n}\right)} Q_n\!\left(\frac{p-k}{q}\right), \tag{21}$$
for $p, q \in \mathbb Z_{\ge 0}^+$ such that $\alpha = 1 + \frac{p}{q}$.
for p , q Z 0 + such that α = 1 + p q .
Assume that R < 1 h ( δ ) . Let us prove that Q n ( α ) 2 n ( α 1 ) ( 1 R h ( δ ) + o ( 1 ) ) for rational values of α using induction. Let α [ 1 , 2 ] be rational and note that p q . Since Q n ( · ) 1 when the argument is less than 1, we can write (21) as follows:
Q n ( α ) k = 0 p p k 2 n k q ( 1 R n log V δ n n ) = 2 n ( α 1 ) ( 1 R h ( δ ) + o ( 1 ) ) .
Now, assume that the claimed bound holds for all rational $\alpha \in [1, m]$ for some integer $m \ge 2$, and let us prove that, in this case, it also holds for $\alpha \in (m, m+1]$. By the induction hypothesis,
$$Q_n(\alpha) \le \sum_{0 \le k \le p - q} \binom{p}{k} 2^{\frac{nk}{q}\left(1 - R_n - \frac{\log V_{\delta n}}{n}\right)} 2^{n \frac{p-k-q}{q}(1 - R - h(\delta) + o(1))} + \sum_{k = p - q}^{p} \binom{p}{k} 2^{\frac{nk}{q}\left(1 - R_n - \frac{\log V_{\delta n}}{n}\right)} \le \sum_{0 \le k \le p - q} \binom{p}{k} 2^{n(\alpha - 2)(1 - R - h(\delta) + o(1))} + \sum_{k = p - q}^{p} \binom{p}{k} 2^{n(\alpha - 1)(1 - R - h(\delta) + o(1))} = 2^{n(\alpha - 1)(1 - R - h(\delta) + o(1))}.$$
Therefore, for every rational $\alpha \in (1, \infty)$, there exists a sequence of codes satisfying
$$D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = n(1 - R - h(\delta) + o(1)),$$
which is equivalent to the equality in (20).
Let us extend this result to non-negative reals. Let $\alpha \in [0, \infty)$ and let us choose a rational $\alpha' \in (1, \infty)$ such that $\alpha < \alpha'$. We know that there exists a sequence of codes satisfying
$$D_{\alpha'}(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = n(1 - R - h(\delta) + o(1)).$$
From (20) and from Remark 1,
$$n(1 - R_n - h(\delta)) \le D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \le D_{\alpha'}(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = n(1 - R - h(\delta) + o(1)).$$
Hence, the asymptotic equality in (20) is achievable for all $\alpha \in [0, \infty)$. □
The above theorem characterizes the $D_\alpha$-smoothing capacities with respect to the ball noise.
Corollary 2. 
Let $\delta \in [0, 1/2]$. Let $\mathbf b(\delta) = (b_{\delta n})_n$ be a sequence of ball noise operators, where $\delta n$ is the radius corresponding to the $n$-th kernel. Then,
$$S_\alpha^{\mathbf b(\delta)} = 1 - h(\delta) \quad \text{for } \alpha \in [0, \infty].$$
The norms of $T_{b_t} f_{\mathcal C}$ can be used to bound the decoding error probability on a BSC. While estimating these norms for a given code is generally complicated, the second norm affords a compact expression based on the distance distribution of the code. In the next section, we bound the decoding error probability using the second norm of $T_{b_t} f_{\mathcal C}$. The following proposition provides closed-form expressions for $(2^n \|T_{b_t} f_{\mathcal C}\|_2)^2$.
Proposition 4. 
$$(2^n \|T_{b_t} f_{\mathcal C}\|_2)^2 = \frac{2^n}{|\mathcal C| V_t^2} \sum_{i=0}^{n} \mu_t(i) A_i = \frac{1}{V_t^2} \sum_{k=0}^{n} L_t(k)^2 A_k^\perp,$$
where $\mu_t(i)$ is defined in (1) and $L_t$ is the Lloyd polynomial of degree $t$ (A2).
The proof is immediate from Proposition A1 in combination with (A2) and (A4).
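Proposition 4 can be verified directly on a small code by comparing the convolution-based norm with the distance-distribution expression; the following Python sketch (code and parameters ours) does this for the first equality.

```python
import numpy as np
from math import comb

def hw(x): return bin(x).count("1")

n, t = 4, 1
N = 1 << n
V_t = sum(comb(n, i) for i in range(t + 1))
code = [0b0000, 0b0111, 0b1011, 0b1100]            # [4,2] linear code
ball = {x for x in range(N) if hw(x) <= t}

# Distance distribution A_i and intersection volumes mu_t(i).
A = np.zeros(n + 1)
for c in code:
    for cp in code:
        A[hw(c ^ cp)] += 1.0 / len(code)
mu = [len(ball & {((1 << i) - 1) ^ b for b in ball}) for i in range(n + 1)]

# Left side: (2^n ||T_{b_t} f_C||_2)^2 computed directly by convolution.
f_C = np.zeros(N); f_C[code] = 1.0 / len(code)
b = np.array([1.0 / V_t if x in ball else 0.0 for x in range(N)])
g = np.array([sum(b[z] * f_C[x ^ z] for z in range(N)) for x in range(N)])
lhs = 2**n * (g**2).sum()
rhs = 2**n / (len(code) * V_t**2) * sum(mu[i] * A[i] for i in range(n + 1))
assert abs(lhs - rhs) < 1e-9
```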

6.2. Probability of Decoding Error on a BSC($\delta$)

The idea that the smoothing of codes under some conditions implies good decoding performance has appeared in a number of papers using different language. The smoothing of capacity-achieving codes was considered in [18,46]. Hązła et al. [14] showed that if a code (sequence) is perfectly smoothable with respect to the Bernoulli noise, then the dual code is good for decoding (see Theorem A4, Corollary A1). Going from smoothing to decodability involves representing the $D_2$-smoothness of codes with respect to the Bernoulli noise as a potential energy form and comparing it to the Bhattacharyya bound for the dual codes. One limitation of this approach is that it cannot infer decodability for rates $R > 1 - \log(1 + 2\sqrt{\delta(1-\delta)})$ (this is the region above the blue solid curve in Figure 2). Rao and Sprumont [15] and Hązła [34] proved that sufficient smoothing of codes implies the decodability of the codes themselves rather than of their duals. However, these results are concerned with list decoding for rates above the Shannon capacity, resulting in an exponential list size, which is arguably less relevant from the perspective of communication.
Except for [15], the cited papers utilize perfect or near-perfect smoothing to infer decodability. For codes whose rates are below the capacity, perfect smoothing is impossible. At the same time, codes that possess sufficiently good smoothing properties are good for decoding. This property is at the root of the results for list decoding in [15]; however, their bounds were insufficient to make conclusions about list decoding below capacity.
Consider a channel where, for the input $X \sim f_{\mathcal C}$, the output $Y$ is given by $Y = X + W$ with $W \sim b_t$. Define $F_t(y) = |\mathcal C \cap B(y, t)|$ as the number of codewords in the ball $B(y, t)$. Hence, for a received vector $y$, the possible number of codewords that can yield $y$ is given by $F_t(y)$. Intuitively, the decoding error is small if $F_t(y) = 1$ for typical errors. Therefore, $F_t$ is of paramount interest in decoding problems. Since the typical errors for the ball noise and the Bernoulli noise are almost the same, this allows us to obtain a bound for decodability in the BSC. Using this approach, we show that the error probability of decoding on a BSC($\delta$) can be expressed via the second moment of the number of codewords in a ball of radius $t \approx \delta n$.
Assume, without loss of generality, that $\mathcal C$ is a linear code and $0^n$ is used for transmission. Let $Y$ be the random Bernoulli vector of errors, and note that $Y \sim \beta_\delta$. The calculation below does not depend on whether we rely on unique or list decoding within a ball of radius $t$, so let us assume that the decoder outputs $L \ge 1$ candidate codewords conditioned on the received vector $y$, which is a realization of $Y$.
In this case, the list decoding error probability can be written as
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) = \Pr\{ \{F_t(Y) \ge L + 1\} \cup \{|Y| > t\} \}.$$
Theorem 11. 
Let $t'$ and $t$ be integers such that $0 < t' < t < n$. Then, for any $L \ge 1$,
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) \le \frac{\beta_\delta(t')}{L} \sum_{w=1}^{n} \mu_t(w) A_w + \Pr(|Y| \le t' \cup |Y| > t). \tag{24}$$
Proof. 
Define $S_{t', t} = B(0, t) \setminus B(0, t')$. Clearly,
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) = \Pr\{ \{F_t(Y) \ge L+1\} \cup \{|Y| > t\} \} \le \Pr\{ (F_t(Y) \ge L+1) \cap (Y \in S_{t',t}) \} + \Pr(Y \notin S_{t',t}).$$
Let us estimate the first of these probabilities.
$$\Pr\{ (F_t(Y) \ge L+1) \cap (Y \in S_{t',t}) \} = \sum_{y \in S_{t',t}} 1\{F_t(y) \ge L+1\}\, \beta_\delta(y) \le \sum_{y \in S_{t',t}} \frac{F_t(y) - 1}{L}\, \beta_\delta(y) \le \frac{\beta_\delta(t')}{L} \sum_{y \in S_{t',t}} (F_t(y) - 1) \le \frac{\beta_\delta(t')}{L} \sum_{y \in B(0,t)} (F_t(y) - 1) \quad (\text{because } F_t(y) \ge 1 \text{ for all } y \in B(0,t))$$
$$= \frac{\beta_\delta(t')}{L} \Big( \sum_{y \in H_n} (1_{\mathcal C} * 1_{B(0,t)})(y)\, 1_{B(0,t)}(y) - V_t \Big) = \frac{\beta_\delta(t')}{L} \Big( \sum_{c \in \mathcal C} (1_{B(0,t)} * 1_{B(0,t)})(c) - V_t \Big) = \frac{\beta_\delta(t')}{L} \sum_{i=1}^{n} \mu_t(i) A_i. \;\square$$
Remark 7. 
In the case of L = 1 , the bound in (24) can be considered a slightly weaker version of Poltyrev’s bound [58], Lemma 1. By allowing this weakening, we obtain a bound in a somewhat more closed form, also connecting the decodability with smoothing. We also prove a simple bound for the error probability of list decoding expressed in terms of the code’s distance distribution (and, from (A4), also in terms of the dual distance distribution). The latter result seems not to have appeared in earlier literature.
The following version of this lemma provides an error bound, which is useful in the asymptotic setting.
Proposition 5. 
Let $t = \delta n + n^\theta$, where $\theta \in (1/2, 1)$. Then,
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) \le \frac{2\sqrt{n}}{L V_t} \left( \frac{1-\delta}{\delta} \right)^{2 n^\theta} \sum_{w=1}^{n} \mu_t(w) A_w + 2 e^{-n^{2\theta - 1}}.$$
In particular,
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) \le \frac{2\sqrt{n}}{V_t} \left( \frac{1-\delta}{\delta} \right)^{2 n^\theta} \sum_{w=1}^{n} \mu_t(w) A_w + 2 e^{-n^{2\theta - 1}}.$$
Proof. 
Set $t' = \delta n - n^\theta$. A direct calculation shows that
$$\beta_\delta(t') V_t < 2\sqrt{n} \left( \frac{1-\delta}{\delta} \right)^{2 n^\theta}.$$
By the Hoeffding bound,
$$\Pr(|Y| \le t' \cup |Y| > t) \le 2 e^{-n^{2\theta - 1}}.$$
Together with Theorem 11, this implies our statements. □
A question of prime importance is whether the right-hand side quantities in Proposition 5 converge to 0. For $R < 1 - h(\delta)$, one can easily see that, for random codes, $\sum_{w=1}^{n} \frac{\mu_t(w)}{V_t} A_w = 2^{-\Theta(n)}$, where $t = \delta n + n^\theta$, showing that this is, in fact, the case.
From Proposition 4, it is clear that the potential energy $\sum_{w=1}^{n} \mu_t(w) A_w$ is a measure of the smoothness of $T_{b_t} f_{\mathcal C}$. This implies that codes that are sufficiently smoothable with respect to $b_t$ are decodable on the BSC with vanishing error probability. In other words, Proposition 5 establishes a connection between smoothing and the decoding error probability.

7. Perfect Smoothing—The Finite Case

In this section, we briefly overview another form of perfect smoothing, which is historically the earliest application of these ideas in coding theory. It is not immediately related to the information-theoretic problems considered in the other parts.
We are interested in radial kernels that yield perfect smoothing for a given code. We often write $r(i)$ instead of $r(x)$ if $|x| = i$, and call $\rho(r) := \max\{i : r(i) \ne 0\}$ the radius of $r$. Note that the logarithm of the support size of $r$ (as a function on the space $H_n$) is exactly the Rényi entropy of order 0 of $r$. Therefore, kernels with smaller radii can be perceived as less random, supporting the view of the radius $\rho(r)$ as a general measure of randomness.
Definition 4. 
We say a code $\mathcal{C}$ is perfectly smoothable with respect to $r$ if $T_r f_{\mathcal{C}}(x) = \frac{1}{2^n}$ for all $x \in H_n$, and, in this case, we say that $r$ is a perfectly smoothing kernel for $\mathcal{C}$.
Intuitively, such a kernel should have a sufficiently large radius. In particular, it should be at least as large as the covering radius $\rho(\mathcal{C})$ of the code; otherwise, smoothing does not affect the vectors that are more than $\rho(r)$ away from the code. To obtain a stronger condition, recall that the external distance of the code $\mathcal{C}$ is $\bar{d}(\mathcal{C}) = |\{i \ge 1 : A_i^\perp \ne 0\}|$.
Proposition 6. 
Let $r$ be a perfectly smoothing kernel of the code $\mathcal{C}$. Then, $\rho(r) \ge \bar{d}(\mathcal{C})$.
Proof. 
Note that perfect smoothing of $\mathcal{C}$ with respect to $r$ is equivalent to
$$\|2^n T_r f_{\mathcal{C}}\|_2^2 = 1,$$
which by Proposition A1 is equivalent to the following condition:
$$\sum_{i=1}^{n} \hat{r}(i)^2 A_i^\perp = 0.$$
Therefore,
$$\bar{d}(\mathcal{C}) = |\{i \ge 1 : A_i^\perp \ne 0\}| \le n - |\{i \ge 1 : \hat{r}(i) \ne 0\}|.$$
By definition,
$$\hat{r} = \frac{1}{2^n} K r,$$
where $K = (K_i(j))_{i,j=0}^{n}$ is the Krawtchouk matrix. Define $I_1 = \{j \in \{1,2,\dots,n\} : \hat{r}(j) = 0\}$ and $I_2 = \{i \in \{0,1,\dots,n\} : r(i) \ne 0\}$; then,
$$0 = \hat{r}\,|_{I_1} = \frac{1}{2^n}\, K|_{(I_1,\,:)}\; r = \frac{1}{2^n}\, K|_{(I_1,\,I_2)}\; r|_{I_2}.$$
This relation implies that there exists a nonzero linear combination of Krawtchouk polynomials of degree at most $\rho(r)$ with $|I_1|$ roots. Therefore, $\bar{d}(\mathcal{C}) \le n - |\mathrm{supp}(\{\hat{r}(i)\}_{i=1}^{n})| = |I_1| \le \rho(r)$. □
Since $\rho(\mathcal{C}) \le \bar{d}(\mathcal{C})$, this inequality strengthens the obvious condition $\rho(r) \ge \rho(\mathcal{C})$. At the same time, there are codes that are perfectly smoothable by a radial kernel $r$ such that $\rho(r) = \rho(\mathcal{C})$.
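As a concrete data point (our own sketch, not an example from the paper), the external distance of the $[7,4]$ Hamming code can be read off the weight distribution of its dual, a $[7,3]$ simplex code; here $\bar{d}(\mathcal{C}) = 1$ coincides with the covering radius.

```python
# External distance of the [7,4] Hamming code via its dual (the simplex code).
import itertools

import numpy as np

# Generator matrix of a [7,3] simplex code (columns of the Hamming check matrix).
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
dual = np.array([(m @ H) % 2 for m in itertools.product([0, 1], repeat=3)])
Aperp = np.bincount(dual.sum(axis=1), minlength=8)  # all nonzero weights equal 4
ext_distance = np.count_nonzero(Aperp[1:])
print(ext_distance)  # 1, matching the covering radius of the Hamming code
```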
Definition 5 
([59]). A code $\mathcal{C}$ is uniformly packed in the wide sense if there exist rational numbers $\{\alpha_i\}_{i=0}^{\rho(\mathcal{C})}$ such that
$$\sum_{i=0}^{\rho(\mathcal{C})} \alpha_i A_i(x) = 1 \quad\text{for all } x \in H_n,$$
where $A_i(x) = |\{c \in \mathcal{C} : d(c,x) = i\}|$ is the weight distribution of the code $\mathcal{C} - x$.
Our main observation here is that some uniformly packed codes are perfectly smoothable with respect to noise kernels that are minimal in a sense. The following proposition states this more precisely.
Proposition 7. 
Let $\mathcal{C}$ be a code that is perfectly smoothable by a radial kernel of radius $\rho(r) = \rho(\mathcal{C})$. Then, $\mathcal{C}$ is uniformly packed in the wide sense with $\alpha_i \ge 0$ for all $i$.
Proof. 
By definition, if $\mathcal{C}$ is perfectly smoothable with respect to $r$, then $2^n T_r f_{\mathcal{C}} = 1$, which is tantamount to $\sum_{y \in H_n} \frac{2^n}{|\mathcal{C}|}\, r(y)\, 1_{\mathcal{C}}(x - y) = 1$ for all $x \in H_n$. This condition can be written as $\sum_{i=0}^{\rho(\mathcal{C})} \frac{2^n}{|\mathcal{C}|}\, r(i) A_i(x) = 1$ for all $x \in H_n$, completing the proof (with $\alpha_i = 2^n r(i)/|\mathcal{C}| \ge 0$). □
To illustrate this claim, we list several families of uniformly packed codes ([59,60,61]) that are perfectly smoothable by a kernel of radius equal to the covering radius of the code.
(i)
Perfect codes: $r = b_\rho$, where $\rho = \rho(\mathcal{C})$ is the covering radius.
(ii)
2-error-correcting BCH codes of length $2^{2m+1} - 1$, $m \ge 2$. The smoothing kernel $r$ is given by
$$r(0) = r(1) = L, \quad r(2) = r(3) = \frac{3L}{n}, \quad r(i) = 0, \; i \ge 4.$$
(iii)
Preparata codes. The smoothing kernel $r$ is given by
$$r(0) = r(1) = L, \quad r(2) = r(3) = \frac{6L}{n-1}, \quad r(i) = 0, \; i \ge 4.$$
(iv)
Binary $(2^m - 1,\, 2^{2^m - 3m + 2},\, 7)$ Goethals-like codes [60]. The smoothing kernel $r$ is given by
$$r(0) = r(1) = L, \quad r(2) = r(3) = \frac{65L}{2n}, \quad r(4) = r(5) = \frac{30L}{n(n-3)}, \quad r(i) = 0, \; i \ge 6.$$
Here, L is a generic notation for the normalizing factor. More examples are found in a related class of completely regular codes [62].
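Item (i) in the list above is easy to verify numerically. The sketch below is our own check: it smooths the $[7,4]$ Hamming code, a perfect code with $\rho(\mathcal{C}) = 1$, with the ball kernel $b_1$ and confirms that the output is exactly uniform.

```python
# Perfect smoothing of a perfect code: T_{b_1} f_C is uniform on H_7.
import itertools

import numpy as np

n = 7
G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
code = np.array([(m @ G) % 2 for m in itertools.product([0, 1], repeat=4)])
pts = np.array(list(itertools.product([0, 1], repeat=n)))

# T_r f_C(x) = (1/|C|) sum_{c in C} r(x - c), with r = b_1 uniform on a
# ball of volume 1 + 7 = 8.
dist = (pts[:, None, :] != code[None, :, :]).sum(axis=2)   # 128 x 16 distances
smoothed = (dist <= 1).sum(axis=1) / (len(code) * 8.0)
print(np.allclose(smoothed, 1 / 2.0**n))  # True: perfectly smoothed
```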
Definition 5 does not include the condition that $\alpha_i \ge 0$, and, in fact, there are codes that are uniformly packed in the wide sense for which some of the $\alpha_i$'s are negative; thus, they are not smoothable by a noise kernel of radius $\rho(\mathcal{C})$. One such family is the 3-error-correcting binary BCH codes of length $2^{2m+1} - 1$, $m \ge 2$ [60].

Author Contributions

Conceptualization, A.B.; Formal analysis, M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation, USA, grant numbers CCF-2104489 and CCF-2110113.

Data Availability Statement

Not applicable.

Acknowledgments

We are grateful to the reviewers for their feedback on our manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. L2 Smoothing

The Fourier transform of a function $f: H_n \to \mathbb{R}$ is a function on the dual group $\hat{H}_n$, which we identify with $H_n$:
$$\hat{f}(y) = \frac{1}{2^n} \sum_{x \in H_n} f(x) (-1)^{x \cdot y}, \quad y \in H_n.$$
The Fourier transform of the indicator function of the sphere is given by $\hat{1}_{S(0,t)} = \frac{1}{2^n} K_t$, where $K_t(x) = K_t^{(n)}(x) = \sum_{j=0}^{t} (-1)^j \binom{x}{j} \binom{n-x}{t-j}$ is a Krawtchouk polynomial of degree $t$. Then, clearly, the Fourier transform of the indicator of the ball is
$$\hat{1}_{B(0,t)} = \frac{1}{2^n} L_t,$$
where $L_t(x) := \sum_{i=0}^{t} K_i(x)$ is called the Lloyd polynomial ([63], p. 64). The intersection of balls in (1) can be written as $1_{B(0,t)} * 1_{B(x,t)}$, which implies the expression ([42], Lemma 4.1)
$$\mu_t(i) = 2^{-n} \sum_{k=0}^{n} L_t(k)^2 K_k(i). \tag{A3}$$
Given a code $\mathcal{C} \subset H_n$, we define the dual distance distribution of $\mathcal{C}$ as the set of numbers $A_j^\perp := \frac{1}{|\mathcal{C}|} \sum_{i=0}^{n} A_i K_j(i)$, where $(A_i)_{i=0}^{n}$ is the distance distribution of $\mathcal{C}$ (2). Note that when $\mathcal{C}$ is linear, the set $(A_j^\perp)_{j=0}^{n}$ coincides with the distance distribution of its dual code $\mathcal{C}^\perp$. For a radial potential $V$ on $H_n$ and a code $\mathcal{C}$, we have
$$\sum_{i=0}^{n} V(i) A_i = |\mathcal{C}| \sum_{k=0}^{n} \hat{V}(k) A_k^\perp. \tag{A4}$$
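The following sketch (ours, again on the $[7,4]$ Hamming code) computes the Krawtchouk polynomials and the dual distance distribution; as expected from the remark above, $A^\perp$ matches the weight distribution of the simplex code.

```python
# Krawtchouk polynomials and the dual distance distribution A_j^perp.
import itertools
from math import comb

import numpy as np

n = 7

def K(k, x):
    """Krawtchouk polynomial K_k(x) for the binary Hamming space of length n."""
    return sum((-1)**j * comb(x, j) * comb(n - x, k - j) for j in range(k + 1))

G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
code = np.array([(m @ G) % 2 for m in itertools.product([0, 1], repeat=4)])
A = np.bincount(code.sum(axis=1), minlength=n + 1)   # A = [1,0,0,7,7,0,0,1]

# A_j^perp = (1/|C|) sum_i A_i K_j(i): should be [1, 0, 0, 0, 7, 0, 0, 0].
Aperp = [sum(A[i] * K(j, i) for i in range(n + 1)) / len(code)
         for j in range(n + 1)]
print(np.round(Aperp, 6))
```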
The L 2 -smoothness of a noisy code distribution can be written in terms of the distance distribution or of the dual distance distribution.
Proposition A1. 
Let $\mathcal{C}$ be a code and $r$ be a noise kernel. Then,
$$\|2^n T_r f_{\mathcal{C}}\|_2^2 = \frac{2^n}{|\mathcal{C}|} \sum_{i=0}^{n} (r * r)(i) A_i = 4^n \sum_{k=0}^{n} \hat{r}(k)^2 A_k^\perp.$$
Proof. 
Let us prove the first equality:
$$\begin{aligned}
\|2^n T_r f_{\mathcal{C}}\|_2^2 &= \frac{1}{2^n} \sum_{x \in H_n} \big(2^n T_r f_{\mathcal{C}}(x)\big)^2 = \frac{2^n}{|\mathcal{C}|^2} \sum_{x \in H_n} (r * 1_{\mathcal{C}})(x)^2 \\
&= \frac{2^n}{|\mathcal{C}|^2} \sum_{x \in H_n} \Big(\sum_{y \in H_n} r(x-y)\, 1_{\mathcal{C}}(y)\Big) \Big(\sum_{z \in H_n} r(x-z)\, 1_{\mathcal{C}}(z)\Big) \\
&= \frac{2^n}{|\mathcal{C}|^2} \sum_{y \in \mathcal{C}} \sum_{z \in \mathcal{C}} \sum_{x \in H_n} r(x-y)\, r(x-z) = \frac{2^n}{|\mathcal{C}|^2} \sum_{y \in \mathcal{C}} \sum_{z \in \mathcal{C}} (r * r)(y - z) = \frac{2^n}{|\mathcal{C}|} \sum_{i=0}^{n} (r * r)(i) A_i.
\end{aligned}$$
The second equality is immediate by noticing that $\widehat{r * r} = 2^n \hat{r}^2$ and using (A4). □

Appendix B. Proof of Theorem 3

We will first establish Theorem 3 when $\alpha$ is rational and then use a density argument to extend the proof to all real numbers. The case $\alpha = \infty$ is handled separately at the end of this appendix.
We will use the following technical claim:
Lemma A1. 
Let $x$ and $y$ be two non-negative reals. Further, let $p$ and $q$ be positive integers. Then,
$$(x+y)^{p/q} \le \sum_{k=0}^{p} \binom{p}{k}\, x^{k/q}\, y^{(p-k)/q}.$$
Proof. 
Clearly, $(x+y)^{1/q} \le x^{1/q} + y^{1/q}$. Therefore,
$$(x+y)^{p/q} \le \big(x^{1/q} + y^{1/q}\big)^p = \sum_{k=0}^{p} \binom{p}{k}\, x^{k/q}\, y^{(p-k)/q}. \qquad \square$$
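A one-line numeric sanity check of Lemma A1 (our own, with the arbitrary values $x = 2$, $y = 3$, $p = 3$, $q = 2$):

```python
# Sanity check of the fractional-power binomial inequality of Lemma A1.
from math import comb

x, y, p, q = 2.0, 3.0, 3, 2
lhs = (x + y) ** (p / q)
rhs = sum(comb(p, k) * x ** (k / q) * y ** ((p - k) / q) for k in range(p + 1))
print(lhs <= rhs, round(lhs, 3), round(rhs, 3))  # True 11.18 31.145
```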
For $M \ge 1$, let $\mathcal{C} = (c_0, c_1, \dots, c_{M-1})$ be a code whose codewords are chosen randomly and independently from $H_n$. For $\alpha \in [0, \infty)$, define
$$Q_n(\alpha) = \mathbb{E}_{\mathcal{C}}\, 2^{(\alpha - 1) D_\alpha(T_r f_{\mathcal{C}} \,\|\, U_n)}. \tag{A6}$$
For $\alpha > 0$, $Q_n(\alpha) = \mathbb{E}_{\mathcal{C}} \|2^n T_r f_{\mathcal{C}}\|_\alpha^\alpha$. Clearly, $Q_n(1) = 1$, $Q_n(\alpha) \le 1$ for $\alpha \in [0,1)$, and $Q_n(\alpha) \ge 1$ for $\alpha > 1$.
In the next lemma, we obtain a recursive bound for Q n . We will then use an induction argument to show the full result.
Lemma A2. 
Let $\alpha = \frac{p}{q} + 1$ and let $\mathcal{C} \subset H_n$ be a random code of size $M = 2^{nR}$. Then,
$$Q_n(\alpha) \le \sum_{k=0}^{p} \binom{p}{k}\, 2^{\frac{nk}{q}\left(1 - R - \frac{1}{n} H_{1+k/q}(r)\right)}\, Q_n\!\Big(\frac{p-k}{q}\Big).$$
Proof. 
In the calculation below, we write $\mathbb{E}$ for $\mathbb{E}_{\mathcal{C}}$. Starting with (A6), we obtain
$$\begin{aligned}
Q_n(\alpha) &= \mathbb{E}\, \frac{1}{2^n} \sum_{x \in H_n} \big[2^n (r * f_{\mathcal{C}})(x)\big]^\alpha = 2^{n(\alpha-1)}\, \mathbb{E} \sum_{x \in H_n} \Big[\sum_{z \in \mathcal{C}} \frac{r(x-z)}{M}\Big]^\alpha = \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{x \in H_n} \mathbb{E} \Big[\sum_{i=0}^{M-1} r(x - c_i)\Big]^\alpha \\
&= \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{x \in H_n} \mathbb{E} \sum_{i=0}^{M-1} r(x - c_i) \Big[\sum_{j=0}^{M-1} r(x - c_j)\Big]^{\alpha - 1} = \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{x \in H_n} \mathbb{E} \sum_{i=0}^{M-1} r(x - c_i) \Big[r(x - c_i) + \sum_{j=0,\, j \ne i}^{M-1} r(x - c_j)\Big]^{p/q} \\
&\le \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{x \in H_n} \mathbb{E} \sum_{i=0}^{M-1} r(x - c_i) \sum_{k=0}^{p} \binom{p}{k}\, r(x - c_i)^{k/q} \Big[\sum_{j=0,\, j \ne i}^{M-1} r(x - c_j)\Big]^{(p-k)/q} \\
&= \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{k=0}^{p} \binom{p}{k} \sum_{x \in H_n} \sum_{i=0}^{M-1} \mathbb{E}\Big[r(x - c_i)^{1 + k/q}\Big]\, \mathbb{E}\Big[\sum_{j=0,\, j \ne i}^{M-1} r(x - c_j)\Big]^{(p-k)/q},
\end{aligned}$$
where $c_i$, $i = 0, \dots, M-1$, are random codewords in the code $\mathcal{C}$, and the last step uses the independence of $c_i$ and $(c_j)_{j \ne i}$. Recalling that $\mathbb{E}\, r(x - c_i)^a = \|r\|_a^a$ for any $a > 0$, we continue as follows:
$$\begin{aligned}
&\le \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{k=0}^{p} \binom{p}{k} \sum_{x \in H_n} M\, \|r\|_{1+k/q}^{1+k/q}\, \mathbb{E}\Big[\sum_{j=0}^{M-1} r(x - c_j)\Big]^{(p-k)/q} = \frac{2^{n(\alpha-1)}}{M^{\alpha-1}} \sum_{k=0}^{p} \binom{p}{k}\, \|r\|_{1+k/q}^{1+k/q}\, \mathbb{E} \sum_{x \in H_n} \Big[\sum_{j=0}^{M-1} r(x - c_j)\Big]^{(p-k)/q} \\
&= \frac{2^{np/q}}{M^{p/q}} \sum_{k=0}^{p} \binom{p}{k}\, \|r\|_{1+k/q}^{1+k/q}\, Q_n\!\Big(\frac{p-k}{q}\Big)\, M^{(p-k)/q}\, 2^{-n\left(\frac{p-k}{q} - 1\right)} = \sum_{k=0}^{p} \binom{p}{k}\, 2^{n(1 + k/q)}\, M^{-k/q}\, \|r\|_{1+k/q}^{1+k/q}\, Q_n\!\Big(\frac{p-k}{q}\Big) \\
&= \sum_{k=0}^{p} \binom{p}{k}\, 2^{\frac{nk}{q}\left(1 - R - \frac{H_{1+k/q}(r)}{n}\right)}\, Q_n\!\Big(\frac{p-k}{q}\Big),
\end{aligned}$$
where we used (5) and the fact that r is a pmf. □
On account of (14), (A6), and Lemma 1, to prove Theorem 3, we need to prove the following:
Theorem A1. 
Consider a sequence of ensembles of random codes of increasing length $n$ and rate $R_n \to R$. If $R > 1 - \pi(\alpha)$, where $\pi(\alpha)$ is given by (15), then
$$\lim_{n \to \infty} Q_n(\alpha) = 1 \tag{A8}$$
for all $\alpha \in (1, \infty)$.
We start with the case of rational α .
Proposition A2. 
Let $\alpha \ge 0$ be rational. If $R > 1 - \pi(\alpha)$, then $\limsup_{n} Q_n(\alpha) \le 1$.
Proof. 
This statement is true for all $0 \le \alpha < 1$, and so, in particular, for all rational $\alpha$ in $[0,1)$.
Assume that it holds for all rational $\alpha$ in $[0, m)$, where $m \in \mathbb{Z}^+$. Let $\alpha \in [m, m+1)$ and choose integers $p \ge 0$ and $q \ge 1$ such that $\alpha = 1 + \frac{p}{q}$. By Lemma A2,
$$\limsup_{n} Q_n(\alpha) \le \limsup_{n} \sum_{k=0}^{p} \binom{p}{k}\, 2^{\frac{nk}{q}\left(1 - R_n - \frac{H_{1+k/q}(r_n)}{n}\right)}\, Q_n\!\Big(\frac{p-k}{q}\Big) \le \sum_{k=0}^{p} \binom{p}{k}\, \limsup_{n}\, 2^{\frac{nk}{q}\left(1 - R_n - \frac{H_{1+k/q}(r_n)}{n}\right)}\, \limsup_{n}\, Q_n\!\Big(\frac{p-k}{q}\Big).$$
If $R > 1 - \pi(\alpha)$, then evidently, $R > 1 - \pi(1 + k/q)$ for all $k \le p$. Therefore,
$$\limsup_{n}\, 2^{\frac{nk}{q}\left(1 - R_n - \frac{H_{1+k/q}(r_n)}{n}\right)} = 0$$
for all $k > 0$. Since $\frac{p}{q} < m$, by the induction hypothesis, we have $\limsup_{n} Q_n\big(\frac{p-k}{q}\big) \le 1$ for $k = 0, 1, \dots, p$. Therefore, all the terms except the one with $k = 0$ vanish, yielding $\limsup_{n} Q_n(\alpha) \le 1$. □
Since $Q_n(\alpha) \ge 1$ for $\alpha > 1$, this proves Theorem A1 for all rational $\alpha \in (1, \infty)$.
Finally, let us extend this result to all real α > 1 . As a first step, let us show that π ( α ) is continuous.
Lemma A3. 
$\pi(\alpha)$ is continuous for $1 < \alpha < \infty$.
Proof. 
From the monotonicity of the Rényi entropies, for $\alpha' > \alpha > 1$,
$$0 \le \pi(\alpha) - \pi(\alpha') = \liminf_{n} \frac{1}{n} H_\alpha(r_n) - \liminf_{n} \frac{1}{n} H_{\alpha'}(r_n).$$
Now, let us choose a subsequence $(r_{n_k})_k$ such that
$$\lim_{k} \frac{1}{n_k} H_{\alpha'}(r_{n_k}) = \liminf_{n} \frac{1}{n} H_{\alpha'}(r_n).$$
Therefore,
$$\pi(\alpha) - \pi(\alpha') = \liminf_{n} \frac{1}{n} H_\alpha(r_n) - \lim_{k} \frac{1}{n_k} H_{\alpha'}(r_{n_k}) \le \liminf_{k} \frac{1}{n_k}\big(H_\alpha(r_{n_k}) - H_{\alpha'}(r_{n_k})\big).$$
Note that $H_\alpha$ is a continuous function of the order $\alpha$ for $\alpha > 1$. We use the mean value theorem to claim that there is a value $\gamma_k \in (\alpha, \alpha')$ such that $H_\alpha(r_{n_k}) - H_{\alpha'}(r_{n_k}) = -(\alpha' - \alpha)\, \frac{d}{d\gamma} H_\gamma(r_{n_k})\big|_{\gamma = \gamma_k}$. Next, for any probability vector $P$,
$$\frac{d H_\alpha(P)}{d\alpha} = -\frac{1}{(1-\alpha)^2}\, D(Z \| P) \ge -\frac{\log |\mathrm{supp}(P)|}{(1-\alpha)^2},$$
where $Z_i = \frac{P_i^\alpha}{\sum_j P_j^\alpha}$. Taking these remarks together and using $|\mathrm{supp}(r_{n_k})| \le 2^{n_k}$, we obtain
$$\pi(\alpha) - \pi(\alpha') \le \liminf_{k} \frac{\alpha' - \alpha}{n_k} \Big|\frac{d}{d\gamma} H_\gamma(r_{n_k})\big|_{\gamma = \gamma_k}\Big| \le \liminf_{k} \frac{(\alpha' - \alpha)\, n_k}{n_k (\gamma_k - 1)^2} \le \frac{\alpha' - \alpha}{(\alpha - 1)^2}.$$
Therefore, $\pi(\alpha)$ is continuous on $(1, \infty)$. □
Now, let $\alpha \in (1, \infty)$ and assume $R > 1 - \pi(\alpha)$. Choose a rational $\alpha' > \alpha$ such that $R > 1 - \pi(\alpha')$; this is possible by the continuity of $\pi$. Therefore,
$$1 \le \limsup_{n} Q_n(\alpha) \le \limsup_{n} Q_n(\alpha') = 1,$$
where we used that $Q_n(\alpha)$ is nondecreasing in $\alpha$. This proves that (A8) and Theorem 3 hold for all $\alpha \in [1, \infty)$.
It remains to address the case $\alpha = \infty$. We obtain the following upper bound, whose proof closely follows an argument in Appendix E of [3].
Lemma A4. 
Let $\epsilon > 0$. We have
$$\mathbb{E}_{\mathcal{C}}\, \|2^n T_r f_{\mathcal{C}}\|_\infty \le 1 + \epsilon + 2^{2n - H_\infty(r)}\, \exp\Big(-\frac{3\epsilon^2}{2(3+\epsilon)}\, 2^{-[n(1-R) - H_\infty(r)]}\Big).$$
Proof. 
Let $\epsilon > 0$; then,
$$\begin{aligned}
\mathbb{E}_{\mathcal{C}}\, \|2^n T_r f_{\mathcal{C}}\|_\infty &= \mathbb{E}_{\mathcal{C}}\big[\|2^n T_r f_{\mathcal{C}}\|_\infty\, 1\{\|2^n T_r f_{\mathcal{C}}\|_\infty \ge 1 + \epsilon\}\big] + \mathbb{E}_{\mathcal{C}}\big[\|2^n T_r f_{\mathcal{C}}\|_\infty\, 1\{\|2^n T_r f_{\mathcal{C}}\|_\infty < 1 + \epsilon\}\big] \\
&\le \mathbb{E}_{\mathcal{C}}\big[\|2^n T_r f_{\mathcal{C}}\|_\infty\, 1\{\|2^n T_r f_{\mathcal{C}}\|_\infty \ge 1 + \epsilon\}\big] + 1 + \epsilon \\
&\le \|2^n r\|_\infty\, \mathbb{E}_{\mathcal{C}}\big[1\{\|2^n T_r f_{\mathcal{C}}\|_\infty \ge 1 + \epsilon\}\big] + 1 + \epsilon \\
&= \|2^n r\|_\infty\, \Pr_{\mathcal{C}}\Big(\max_{y \in H_n} 2^n T_r f_{\mathcal{C}}(y) \ge 1 + \epsilon\Big) + 1 + \epsilon \\
&\le \|2^n r\|_\infty\, 2^n \max_{y \in H_n} \Pr_{\mathcal{C}}\big(2^n T_r f_{\mathcal{C}}(y) \ge 1 + \epsilon\big) + 1 + \epsilon. \tag{A9}
\end{aligned}$$
For any $y \in H_n$,
$$\begin{aligned}
\Pr_{\mathcal{C}}\big(2^n T_r f_{\mathcal{C}}(y) \ge 1 + \epsilon\big) &= \Pr_{\mathcal{C}}\Big(\frac{2^n}{M} \sum_{z \in \mathcal{C}} r(y - z) \ge 1 + \epsilon\Big) = \Pr_{c_i \sim U_n,\, \mathrm{iid}}\Big(\frac{2^n}{M} \sum_{i=1}^{M} r(y - c_i) \ge 1 + \epsilon\Big) \\
&= \Pr\Big(\sum_{i=1}^{M} \big(2^n r(y - c_i) - 1\big) \ge M\epsilon\Big). \tag{A10}
\end{aligned}$$
To bound the last line from above, we use Bernstein's inequality: for independent, zero-mean random variables $X_i$, $i = 1, \dots, N$, such that $|X_i| \le a$ for all $i$,
$$\Pr\Big(\sum_{i} X_i \ge t\Big) \le \exp\Big(-\frac{t^2/2}{\sum_{i=1}^{N} \mathbb{E} X_i^2 + \frac{1}{3}\, a\, t}\Big).$$
Note that for a random uniform vector $c_i$, the expectation $\mathbb{E}[r(y - c_i)] = 2^{-n}$ since $r(\cdot)$ satisfies $\sum_{x \in H_n} r(x) = 1$, so this inequality applies to (A10). We obtain
$$\begin{aligned}
\Pr_{\mathcal{C}}\big(2^n T_r f_{\mathcal{C}}(y) \ge 1 + \epsilon\big) &\le \exp\Big(-\frac{\frac{1}{2} M^2 \epsilon^2}{\sum_{i=1}^{M} \mathrm{Var}\big(2^n r(y - c_i)\big) + \frac{1}{3} \|2^n r\|_\infty M \epsilon}\Big) \le \exp\Big(-\frac{\frac{1}{2} M^2 \epsilon^2}{\sum_{i=1}^{M} \|2^n r\|_2^2 + \frac{1}{3} \|2^n r\|_\infty M \epsilon}\Big) \\
&\le \exp\Big(-\frac{\frac{1}{2} M^2 \epsilon^2}{M \|2^n r\|_1 \|2^n r\|_\infty + \frac{1}{3} \|2^n r\|_\infty M \epsilon}\Big) = \exp\Big(-\frac{3\epsilon^2}{2(3+\epsilon)}\, 2^{-[n(1-R) - H_\infty(r)]}\Big),
\end{aligned}$$
where on the last line, we use the equalities $\|2^n r\|_1 = 1$ and $\|2^n r\|_\infty = 2^{D_\infty(r \| U_n)} = 2^{n - H_\infty(r)}$. The proof is concluded by substituting this inequality into (A9). □
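The following Monte Carlo sketch (our own) illustrates Bernstein's inequality as used in the proof, with Rademacher variables standing in for the centered terms $2^n r(y - c_i) - 1$; the parameters are arbitrary.

```python
# Monte Carlo sanity check of Bernstein's inequality for bounded, zero-mean X_i.
import numpy as np

rng = np.random.default_rng(0)
N, t, a, trials = 100, 20.0, 1.0, 200_000
S = rng.choice([-1.0, 1.0], size=(trials, N)).sum(axis=1)   # sums of Rademachers
empirical = np.mean(S >= t)
bound = np.exp(-(t**2 / 2) / (N * 1.0 + a * t / 3))          # sum E X_i^2 = N
print(empirical, bound)  # empirical ≈ 0.03 sits below the bound ≈ 0.15
```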
Now, let us consider a sequence of (ensembles of) random codes of increasing length $n$ and rate $R_n \to R$. Recalling the definition of $\pi(\cdot)$ in (15), for $n \to \infty$, we obtain
$$\limsup_{n} \mathbb{E}_{\mathcal{C}_n} \|2^n T_{r_n} f_{\mathcal{C}_n}\|_\infty \le 1 + \epsilon \tag{A11}$$
once $R > 1 - \pi(\infty)$. Since $\epsilon$ is arbitrarily small, the left-hand side of (A11) approaches one, and together with (14), this completes the proof of Theorem 3.

Appendix C. Samorodnitsky’s Inequalities and Their Implications

Samorodnitsky [8,10] recently proved certain powerful inequalities for the $\alpha$-norms of noisy functions, which permit us to estimate the proximity to uniformity under the action of Bernoulli noise kernels. We state some of them in this appendix after introducing a few more elements of notation. These results are used in Theorem 7 and in Appendix D, where we prove Lemma 2.
In this appendix, we write $[n]$ for $\{1, \dots, n\}$. For a subset $\Gamma \subset [n]$, write $x|_\Gamma$ to denote the coordinate projection of a vector $x \in H_n$ on $\Gamma$. If the subset $\Gamma$ is formed by random choice with $\Pr(i \in \Gamma) = \lambda$ independently for all $i \in [n]$, we write $\Gamma \sim \lambda$. For a function $f$ on $H_n$, let
$$E(f|\Gamma)(x) = \frac{1}{2^{n - |\Gamma|}} \sum_{y:\, y|_\Gamma = x|_\Gamma} f(y). \tag{A12}$$
Observe that $E(f|\Gamma) = f * f_{H_{[n] \setminus \Gamma}}$, where $H_S = \{x \in H_n : x|_{[n] \setminus S} = 0\}$. Therefore, $E(f|\Gamma)(x)$ is the noisy version of $f$ with respect to the pmf given by the normalized indicator function of the subcube $H_{[n] \setminus \Gamma}$.
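A small sketch (ours, with an arbitrary test function on $H_4$) of the conditional averaging in (A12): $E(f|\Gamma)$ averages $f$ over the coordinates outside $\Gamma$.

```python
# Conditional averaging E(f|Gamma) over a random subcube, computed directly.
import itertools

import numpy as np

n = 4
pts = np.array(list(itertools.product([0, 1], repeat=n)))
f = pts.sum(axis=1).astype(float)          # test function: Hamming weight
Gamma = [0, 2]                             # revealed coordinates (our choice)

def E_cond(f, Gamma):
    out = np.empty_like(f)
    for i, x in enumerate(pts):
        mask = np.all(pts[:, Gamma] == x[Gamma], axis=1)
        out[i] = f[mask].mean()            # average over y with y|_Gamma = x|_Gamma
    return out

print(E_cond(f, Gamma))  # each value: (# of ones on Gamma) + 1
```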
The entropy of a non-negative function $f$ on $H_n$ is defined as
$$\mathrm{Ent}[f] = \|f \log f\|_1 - \|f\|_1 \log(\|f\|_1) = \Big\|f \log \frac{f}{\|f\|_1}\Big\|_1.$$
This quantity can be thought of as the KL divergence between the distribution induced by $f$ on $H_n$ and the uniform distribution:
$$\mathrm{Ent}[f] = \|f\|_1\, D(P_f \,\|\, U_n), \quad\text{where } P_f := \frac{f}{2^n \|f\|_1}.$$
If $f$ itself is a pmf, then $D(f \,\|\, U_n) = 2^n\, \mathrm{Ent}[f] = \mathrm{Ent}[2^n f]$.
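The identity $D(f \,\|\, U_n) = \mathrm{Ent}[2^n f]$ for a pmf $f$ is easy to confirm numerically; here is a sketch of ours on a random pmf over $H_4$.

```python
# Numeric check of D(f || U_n) = Ent[2^n f] for a pmf f on H_4 (in nats).
import numpy as np

n = 4
rng = np.random.default_rng(1)
f = rng.random(2**n)
f /= f.sum()                                   # a random pmf on H_n
g = 2**n * f                                   # its normalized density
D = np.sum(f * np.log(f * 2**n))               # KL divergence to uniform
# Ent[g] with respect to the normalized counting measure: ||g||_1 = mean(g) = 1.
Ent = np.mean(g * np.log(g)) - np.mean(g) * np.log(np.mean(g))
print(np.isclose(D, Ent))  # True
```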
Theorem A2 
([8], Corollary 9). Let $f$ be a non-negative function on $H_n$. Then,
$$\mathrm{Ent}[T_\delta f] \le \mathbb{E}_{\Gamma \sim \lambda}\, \mathrm{Ent}\big[E(f|\Gamma)\big],$$
where $\lambda = (1 - 2\delta)^2$.
Theorem A3 
([10], Theorem 1.1). Let $f$ be a non-negative function on $H_n$ and let $\alpha \ge 2$ be an integer. Then,
$$\log \|T_\delta f\|_\alpha \le \mathbb{E}_{\Gamma \sim \lambda} \log \|E(f|\Gamma)\|_\alpha, \tag{A15}$$
where $\lambda = \lambda(\alpha, \delta) = 1 + \frac{1}{\alpha - 1} \log\big(\delta^\alpha + (1-\delta)^\alpha\big) = 1 - h_\alpha(\delta)$. Furthermore,
$$\log \|T_\delta f\|_\infty \le \mathbb{E}_{\Gamma \sim \lambda} \log \|E(f|\Gamma)\|_\infty, \tag{A16}$$
where $\lambda = \lambda(\infty, \delta) = 1 + \log(1 - \delta) = 1 - h_\infty(\delta)$.
To interpret the inequalities (A15) and (A16), we note that their left-hand sides measure the smoothness of the noisy version of $f$ with respect to the noise $\beta_\delta$. At the same time, the right-hand sides give the average smoothness of the noisy versions of $f$ with respect to the sub-cube pmfs.
Hązła et al. [14] used Theorem A3 to great effect, showing that if a code corrects erasures up to a certain noise level on a BEC, then it also corrects errors on a BSC up to a corresponding noise level.
Theorem A4 
([14], Corollary 3.4). Let $(\mathcal{C}_n)_n$ be a sequence of codes whose rate approaches $R$. Assume that for some $\lambda \in (0, 1-R]$, $P_B(\mathrm{BEC}(\lambda), \mathcal{C}_n) = o\big(\frac{1}{n}\big)$. Then, $(\mathcal{C}_n)_n$ decodes errors on a $\mathrm{BSC}(\delta)$ for any $\delta$ that satisfies $2\sqrt{\delta(1-\delta)} < 2^{\lambda - 1}$.
This theorem implies the following corollary:
Corollary A1 
([14]). Let $(\mathcal{C}_n)_n$ be a sequence of codes with rate $R_n \to R$ that recover transmitted messages with high probability on a $\mathrm{BEC}(1-R)$ (i.e., $(\mathcal{C}_n)_n$ is a capacity-achieving sequence for the $\mathrm{BEC}(1-R)$). Furthermore, assume that $d(\mathcal{C}_n) = \omega(\log n)$. If $2\sqrt{\delta(1-\delta)} < 2^{-R}$, then, with high probability, the codes $\mathcal{C}_n$ correct errors when used on a $\mathrm{BSC}(\delta)$ channel.
The authors of [14] then used this result to show that RM codes of constant rate correct a non-vanishing fraction of errors on the BSC.

Appendix D. Proof of Lemma 2

We present the proof as a sequence of lemmas.
Let $\Gamma \subset \{1, \dots, n\}$ be a subset of coordinates, and for $z \in \{0,1\}^n$, let $\mathcal{C}(\Gamma, z) := \{c \in \mathcal{C} : c|_\Gamma = z|_\Gamma\}$ be the set of codewords that agree with $z$ in the positions of $\Gamma$. In particular, $\mathcal{C}_{\Gamma^c} = \mathcal{C}(\Gamma, 0)|_{\Gamma^c}$ is the shortened code of $\mathcal{C}$, i.e., the subcode with zeros in the positions of $\Gamma$, projected onto $\Gamma^c$. Let $F^{(\mathcal{C})}(\Gamma, z) := |\mathcal{C}(\Gamma, z)|$.
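The quantity $F^{(\mathcal{C})}(\Gamma, z)$ is the decoder's residual ambiguity after the BEC reveals only the coordinates in $\Gamma$. A small sketch of ours, on the $[7,4]$ Hamming code with an arbitrary revealed set:

```python
# F^{(C)}(Gamma, z): codewords consistent with z on the revealed coordinates.
import itertools

import numpy as np

G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
code = np.array([(m @ G) % 2 for m in itertools.product([0, 1], repeat=4)])

def F(Gamma, z):
    """Count codewords agreeing with z on the positions of Gamma."""
    return int(np.all(code[:, Gamma] == np.asarray(z)[Gamma], axis=1).sum())

Gamma = [0, 1, 2]            # positions 4..7 erased by the channel
print(F(Gamma, code[3]))     # 2: two codewords remain consistent
```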
Let us obtain expressions for the norms and the entropy of F ( C ) ( Γ , z ) .
Lemma A5. 
Let $\mathcal{C}$ be a linear code and let $\Gamma \subset \{1, \dots, n\}$. Then,
$$\|F^{(\mathcal{C})}(\Gamma, \cdot)\|_\alpha = \Big(\frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C})}(\Gamma, 0)^{\alpha - 1}\Big)^{1/\alpha}.$$
Proof. 
From the linearity of the code,
$$F^{(\mathcal{C})}(\Gamma, z) = \begin{cases} F^{(\mathcal{C})}(\Gamma, 0) & \text{if } z|_\Gamma = c|_\Gamma \text{ for some } c \in \mathcal{C}, \\ 0 & \text{otherwise.} \end{cases}$$
Furthermore, the number of distinct $z \in H_n$ for which $\mathcal{C}(\Gamma, z)$ is nonempty equals $2^{n - |\Gamma|}\, |\mathcal{C}/\mathcal{C}(\Gamma, 0)|$. Hence,
$$\|F^{(\mathcal{C})}(\Gamma, \cdot)\|_\alpha = \Big(\frac{1}{2^n} \sum_{x \in H_n} F^{(\mathcal{C})}(\Gamma, x|_\Gamma)^\alpha\Big)^{1/\alpha} = \Big(\frac{1}{2^n}\, \frac{2^n |\mathcal{C}|}{2^{|\Gamma|}\, F^{(\mathcal{C})}(\Gamma, 0)}\, F^{(\mathcal{C})}(\Gamma, 0)^\alpha\Big)^{1/\alpha} = \Big(\frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C})}(\Gamma, 0)^{\alpha - 1}\Big)^{1/\alpha}. \qquad \square$$
Lemma A6. 
Let $E(f|\Gamma)$ be defined as in (A12). Then,
$$\|E(2^n f_{\mathcal{C}}|\Gamma)\|_\alpha = \Big(\frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, 0)\Big)^{(\alpha - 1)/\alpha}.$$
Proof. 
Using $f = 2^n f_{\mathcal{C}}$ in (A12), we obtain
$$E(2^n f_{\mathcal{C}}|\Gamma)(x) = \frac{2^n}{2^{n - |\Gamma|}\, |\mathcal{C}|} \sum_{y \in \mathcal{C}:\, y|_\Gamma = x|_\Gamma} 1 = \frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, x|_\Gamma),$$
and, thus, from Lemma A5,
$$\|E(2^n f_{\mathcal{C}}|\Gamma)\|_\alpha = \frac{2^{|\Gamma|}}{|\mathcal{C}|}\, \|F^{(\mathcal{C})}(\Gamma, \cdot)\|_\alpha = \frac{2^{|\Gamma|}}{|\mathcal{C}|} \Big(\frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C})}(\Gamma, 0)^{\alpha - 1}\Big)^{1/\alpha} = \Big(\frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, 0)\Big)^{(\alpha - 1)/\alpha}. \qquad \square$$
Lemma A7. 
Let $\mathcal{C}$ be a linear code. For $X = X_{\mathcal{C}^\perp}$ and $Y = Y(X, \mathrm{BEC}(\lambda))$,
$$H(X|Y) = \mathbb{E}_{\Gamma \sim \lambda} \log\Big(\frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, 0)\Big). \tag{A17}$$
Proof. 
Start by taking $X = X_{\mathcal{C}}$ and $Y = Y(\mathrm{BEC}(\lambda), X_{\mathcal{C}})$; then,
$$H(X | Y = y) = \log\big[F^{(\mathcal{C})}(y)\big].$$
Therefore, with $\Gamma$ denoting the (random) set of non-erased coordinates,
$$H(X|Y) = \mathbb{E}_Y\big[\log\big(F^{(\mathcal{C})}(Y)\big)\big] = \mathbb{E}_\Gamma\, \mathbb{E}_{Z|\Gamma}\big[\log\big(F^{(\mathcal{C})}(\Gamma, Z)\big) \,\big|\, \Gamma\big] = \mathbb{E}_{\Gamma \sim 1-\lambda}\big[\log F^{(\mathcal{C})}(\Gamma, 0)\big].$$
By a standard identity about dual matroids ([64], p. 72),
$$\dim(\mathcal{C}_{\Gamma^c}) = \dim(\mathcal{C}) - |\Gamma| + \dim\big((\mathcal{C}^\perp)_\Gamma\big),$$
or
$$F^{(\mathcal{C})}(\Gamma, 0) = \frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C}^\perp)}(\Gamma^c, 0),$$
and, thus, we continue as follows:
$$H(X|Y) = \mathbb{E}_{\Gamma \sim 1-\lambda} \log\Big(\frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C}^\perp)}(\Gamma^c, 0)\Big) = \mathbb{E}_{\Gamma^c \sim \lambda} \log\Big(\frac{2^{|\Gamma^c|}}{|\mathcal{C}^\perp|}\, F^{(\mathcal{C}^\perp)}(\Gamma^c, 0)\Big) = \mathbb{E}_{\Gamma \sim \lambda} \log\Big(\frac{2^{|\Gamma|}}{|\mathcal{C}^\perp|}\, F^{(\mathcal{C}^\perp)}(\Gamma, 0)\Big).$$
Switching to the dual code, i.e., taking $X = X_{\mathcal{C}^\perp}$ and $Y = Y(\mathrm{BEC}(\lambda), X_{\mathcal{C}^\perp})$, now yields (A17). □
Lemma A8. 
$$\frac{\alpha}{\alpha - 1}\, \mathbb{E}_{\Gamma \sim \lambda} \log \|E(2^n f_{\mathcal{C}}|\Gamma)\|_\alpha = \mathbb{E}_{\Gamma \sim \lambda}\, \mathrm{Ent}\big[E(2^n f_{\mathcal{C}}|\Gamma)\big] = H\big(X_{\mathcal{C}^\perp} \,\big|\, Y(\mathrm{BEC}(\lambda), X_{\mathcal{C}^\perp})\big).$$
Proof. 
From Lemmas A6 and A7,
$$\frac{\alpha}{\alpha - 1}\, \mathbb{E}_{\Gamma \sim \lambda} \log \|E(2^n f_{\mathcal{C}}|\Gamma)\|_\alpha = \mathbb{E}_{\Gamma \sim \lambda} \log\Big(\frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, 0)\Big) = H\big(X_{\mathcal{C}^\perp} \,\big|\, Y(\mathrm{BEC}(\lambda), X_{\mathcal{C}^\perp})\big),$$
which establishes the equality between the first and the third quantities. Since the second quantity is a limiting case of the first (as $\alpha \to 1$) and the value of the first quantity is independent of $\alpha$, we also have equality between the first and the second quantities. □
Now, Lemma 2 follows by combining Lemma A8 with Theorems A2 and A3.

References

1. Han, T.S.; Verdú, S. Approximation theory of output statistics. IEEE Trans. Inf. Theory 1993, 39, 752–772.
2. Hayashi, M. General nonasymptotic and asymptotic formulas in channel resolvability and identification capacity and their application to the wiretap channel. IEEE Trans. Inf. Theory 2006, 52, 1562–1575.
3. Yu, L.; Tan, V.Y. Rényi resolvability and its applications to the wiretap channel. IEEE Trans. Inf. Theory 2019, 65, 1862–1897.
4. Debris-Alazard, T.; Ducas, L.; Resch, N.; Tillich, J.-P. Smoothing codes and lattices: Systematic study and new bounds. IEEE Trans. Inf. Theory 2023, 69, 6006–6027.
5. Micciancio, D.; Regev, O. Worst-case to average-case reductions based on Gaussian measures. SIAM J. Comput. 2007, 37, 267–302.
6. Chen, W.W.L.; Skriganov, M.M. Explicit constructions in the classical mean squares problem in irregularities of point distribution. J. Reine Angew. Math. 2002, 545, 67–95.
7. Skriganov, M.M. Coding theory and uniform distributions. Algebra Anal. 2001, 13, 191–239; Translation in St. Petersburg Math. J. 2002, 13, 301–337.
8. Samorodnitsky, A. On the entropy of a noisy function. IEEE Trans. Inf. Theory 2016, 62, 5446–5464.
9. Samorodnitsky, A. An upper bound on ℓq norms of noisy functions. IEEE Trans. Inf. Theory 2019, 66, 742–748.
10. Samorodnitsky, A. An improved bound on ℓq norms of noisy functions. arXiv 2020, arXiv:2010.02721.
11. Bloch, M.R.; Laneman, J.N. Strong secrecy from channel resolvability. IEEE Trans. Inf. Theory 2013, 59, 8077–8098.
12. Belfiore, J.-C.; Oggier, F. Secrecy gain: A wiretap lattice code design. In Proceedings of the 2010 International Symposium on Information Theory & Its Applications, Taichung, Taiwan, 17–20 October 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 174–178.
13. Luzzi, L.; Ling, C.; Bloch, M.R. Optimal rate-limited secret key generation from Gaussian sources using lattices. IEEE Trans. Inf. Theory 2023, 69, 4944–4960.
14. Hązła, J.H.; Samorodnitsky, A.; Sberlo, O. On codes decoding a constant fraction of errors on the BSC. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual, 21–25 June 2021; pp. 1479–1488.
15. Rao, A.; Sprumont, O. A criterion for decoding on the BSC. arXiv 2022, arXiv:2202.00240.
16. Arimoto, S. On the converse to the coding theorem for discrete memoryless channels (Corresp.). IEEE Trans. Inf. Theory 1973, 19, 357–359.
17. Polyanskiy, Y.; Verdú, S. Arimoto channel coding converse and Rényi divergence. In Proceedings of the 2010 48th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 29 September–1 October 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1327–1333.
18. Polyanskiy, Y.; Verdú, S. Empirical distribution of good channel codes with nonvanishing error probability. IEEE Trans. Inf. Theory 2013, 60, 5–21.
19. Chou, R.A.; Bloch, M.R.; Kliewer, J. Empirical and strong coordination via soft covering with polar codes. IEEE Trans. Inf. Theory 2018, 64, 5087–5100.
20. Cover, T.M.; Permuter, H.H. Capacity of coordinated actions. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 2701–2705.
21. Cuff, P. Distributed channel synthesis. IEEE Trans. Inf. Theory 2013, 59, 7071–7096.
22. Cuff, P.W.; Permuter, H.H.; Cover, T.M. Coordination capacity. IEEE Trans. Inf. Theory 2010, 56, 4181–4206.
23. Chou, R.A.; Bloch, M.R.; Abbe, E. Polar coding for secret-key generation. IEEE Trans. Inf. Theory 2015, 61, 6213–6237.
24. Brakerski, Z.; Lyubashevsky, V.; Vaikuntanathan, V.; Wichs, D. Worst-case hardness for LPN and cryptographic hashing via code smoothing. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Paris, France, 30 April–4 May 2017; Springer: Berlin/Heidelberg, Germany, 2019; pp. 619–635.
25. Goldfeld, Z.; Kato, K.; Nietert, S.; Rioux, G. Limit distribution theory for smooth p-Wasserstein distances. arXiv 2022, arXiv:2203.00159.
26. Goldfeld, Z.; Kato, K.; Rioux, G.; Sadhu, R. Statistical inference with regularized optimal transport. arXiv 2022, arXiv:2205.04283.
27. Nietert, S.; Goldfeld, Z.; Kato, K. Smooth p-Wasserstein distance: Structure, empirical approximation, and statistical applications. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8172–8183.
28. Liu, J.; Cuff, P.; Verdú, S. Eγ-resolvability. IEEE Trans. Inf. Theory 2016, 63, 2629–2658.
29. Steinberg, Y.; Verdú, S. Simulation of random processes and rate-distortion theory. IEEE Trans. Inf. Theory 1996, 42, 63–86.
30. Ordentlich, O.; Polyanskiy, Y. Entropy under additive Bernoulli and spherical noises. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 521–525.
31. Polyanskiy, Y. Hypercontractivity of spherical averages in Hamming space. SIAM J. Discrete Math. 2019, 33, 731–754.
32. Yu, L. Edge-isoperimetric inequalities and ball-noise stability: Linear programming and probabilistic approaches. J. Comb. Theory Ser. A 2022, 188, 105583.
33. Wyner, A.; Ziv, J. A theorem on the entropy of certain binary sequences and applications–I. IEEE Trans. Inf. Theory 1973, 19, 769–772.
34. Hązła, J.H. Optimal list decoding from noisy entropy inequality. arXiv 2022, arXiv:2212.01443.
35. Wyner, A.D. The wire-tap channel. Bell Syst. Tech. J. 1975, 54, 1355–1387.
36. Csiszár, I. Almost independence and secrecy capacity. Probl. Peredachi Informatsii 1996, 32, 48–57; English translation in Probl. Inform. Transm. 1996, 32, 40–47.
37. Maurer, U.M. Secret key agreement by public discussion from common information. IEEE Trans. Inf. Theory 1993, 39, 733–742.
38. Thangaraj, A.; Dihidar, S.; Calderbank, A.R.; McLaughlin, S.W.; Merolla, J.-M. Applications of LDPC codes to the wiretap channel. IEEE Trans. Inf. Theory 2007, 53, 2933–2945.
39. Mahdavifar, H.; Vardy, A. Achieving the secrecy capacity of wiretap channels using polar codes. IEEE Trans. Inf. Theory 2011, 57, 6428–6443.
40. Subramanian, A.; Suresh, A.T.; Raj, S.; Thangaraj, A.; Bloch, M.; McLaughlin, S. Strong and weak secrecy in wiretap channels. In Proceedings of the 2010 6th International Symposium on Turbo Codes & Iterative Information Processing, Brest, France, 6–10 September 2010; pp. 30–34.
41. Gulcu, T.C.; Barg, A. Achieving secrecy capacity of the wiretap channel and broadcast channel with a confidential component. IEEE Trans. Inf. Theory 2016, 63, 1311–1324.
42. Barg, A. Stolarsky's invariance principle for finite metric spaces. Mathematika 2021, 67, 158–186.
43. Bilyk, D.; Dai, F.; Matzke, R. The Stolarsky principle and energy optimization on the sphere. Constr. Approx. 2018, 48, 31–60.
44. Skriganov, M.M. Point distributions in two-point homogeneous spaces. Mathematika 2019, 65, 557–587.
45. Simon, B. Real Analysis: A Comprehensive Course in Analysis, Part 1; American Mathematical Society: Providence, RI, USA, 2015.
46. Shamai, S.; Verdú, S. The empirical distribution of good codes. IEEE Trans. Inf. Theory 1997, 43, 836–846.
47. Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
48. Bloch, M.R.; Luzzi, L.; Kliewer, J. Strong coordination with polar codes. In Proceedings of the 2012 50th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 1–5 October 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 565–571.
49. Tillich, J.-P.; Zémor, G. Discrete isoperimetric inequalities and the probability of a decoding error. Comb. Probab. Comput. 2000, 9, 465–479.
50. Ashikhmin, A.; Barg, A. Bounds on the covering radius of linear codes. Des. Codes Cryptogr. 2002, 27, 261–269.
51. Watanabe, S.; Hayashi, M. Strong converse and second-order asymptotics of channel resolvability. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 1882–1886.
52. Kudekar, S.; Kumar, S.; Mondelli, M.; Pfister, H.D.; Şaşoğlu, E.; Urbanke, R. Reed-Muller codes achieve capacity on erasure channels. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, Cambridge, MA, USA, 19–21 June 2016; pp. 658–669.
53. Abbe, E.; Sandon, C. A proof that Reed-Muller codes achieve Shannon capacity on symmetric channels. arXiv 2023, arXiv:2304.02509.
54. Reeves, G.; Pfister, H.D. Reed-Muller codes achieve capacity on BMS channels. arXiv 2021, arXiv:2110.14631.
55. Renes, J.M. Duality of channels and codes. IEEE Trans. Inf. Theory 2018, 64, 577–592.
56. Rengaswamy, N.; Pfister, H.D. On the duality between the BSC and quantum PSC. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2232–2237.
57. Pfister, H.; Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA; Rengaswamy, N.; Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA. Personal communication, 2023.
58. Poltyrev, G. Bounds on the decoding error probability of binary linear codes via their spectra. IEEE Trans. Inf. Theory 1994, 40, 1284–1292.
59. Semakov, N.; Zinov'ev, V.A.; Zaitsev, G. Uniformly packed codes. Probl. Peredachi Informatsii 1971, 7, 38–50.
60. Goethals, J.-M.; van Tilborg, H.C.A. Uniformly packed codes. Philips Res. Rep. 1975, 30, 9–36.
61. Tokareva, N. An upper bound for the number of uniformly packed codes. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 346–349.
62. Borges, J.; Rifà, J.; Zinoviev, V.A. On completely regular codes. Probl. Inf. Transm. 2019, 55, 1–45.
63. Delsarte, P. An algebraic approach to the association schemes of coding theory. Philips Res. Rep. Suppl. 1973, 10.
64. Oxley, J. Matroid Theory; Oxford University Press: Oxford, UK, 1992.
Figure 1. Capacities and achievable rates for perfect smoothing. The lowermost curve gives the Shannon capacity of the $\mathrm{BSC}(\delta)$; the second curve from the bottom is the smoothing threshold for the duals of the BEC capacity-achieving codes; the third one is $S_{2,\beta_\delta}$, and the top one is $S_{\infty,\beta_\delta}$.
Figure 2. Achievable rates in the BSC wiretap channel with BEC capacity-achieving codes. The bottom curve is the lower bound on the code rate that guarantees decodability on a $\mathrm{BSC}(\delta)$. The middle curve shows Shannon's capacity, and the top one is the $D_1$-smoothing threshold for the Bernoulli noise $T_\delta$.