Article

Smoothing of Binary Codes, Uniform Distributions, and Applications

Department of ECE and Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
* Author to whom correspondence should be addressed.
Entropy 2023, 25(11), 1515; https://doi.org/10.3390/e25111515
Submission received: 26 August 2023 / Revised: 30 October 2023 / Accepted: 1 November 2023 / Published: 5 November 2023
(This article belongs to the Special Issue Extremal and Additive Combinatorial Aspects in Information Theory)

Abstract
The action of a noise operator on a code transforms it into a distribution on the respective space. Some common examples from information theory include Bernoulli noise acting on a code in the Hamming space and Gaussian noise acting on a lattice in the Euclidean space. We aim to characterize the cases when the output distribution is close to the uniform distribution on the space, as measured by the Rényi divergence of order $\alpha \in (1, \infty]$. A version of this question is known as the channel resolvability problem in information theory, and it has implications for security guarantees in wiretap channels, error correction, discrepancy, worst-to-average case complexity reductions, and many other problems. Our work quantifies the requirements for asymptotic uniformity (perfect smoothing) and identifies explicit code families that achieve it under the action of the Bernoulli and ball noise operators on the code. We derive expressions for the minimum rate of codes required to attain asymptotically perfect smoothing. In proving our results, we leverage recent results from harmonic analysis of functions on the Hamming space. Another result pertains to the use of code families in Wyner’s transmission scheme on the binary wiretap channel. We identify explicit families that guarantee strong secrecy when applied in this scheme, showing that nested Reed–Muller codes can transmit messages reliably and securely over a binary symmetric wiretap channel with a positive rate. Finally, we establish a connection between smoothing and error correction in the binary symmetric channel.

1. Introduction

Many problems of information theory involve the action of a noise operator on a code distribution, transforming it into some other distribution. For instance, one can think of Bernoulli noise acting on a code in the Hamming space or Gaussian noise acting on a lattice in the Euclidean space. We are interested in characterizing the cases when the output distribution is close to the uniform distribution on the space. Versions of this problem have been considered under different names, including resolvability [1,2,3], smoothing [4,5], discrepancy [6,7], and the entropy of noisy functions [8,9,10]. Direct applications of smoothing include secrecy guarantees in both the binary symmetric wiretap channel [2,3,11] and the Gaussian wiretap channel [12,13], error correction in the binary symmetric channel (BSC) [14,15], converse coding theorems of information theory [1,16,17,18], strong coordination [11,19,20,21,22], secret key generation [13,23], and worst-to-average case reductions in cryptography [5,24]. Some aspects of this problem also touch upon approximation problems in statistics and machine learning [25,26,27].
Our main results are formulated for smoothing in the binary Hamming space $H_n$. For $r : H_n \to \mathbb{R}_{\geq 0}$ and $f : H_n \to \mathbb{R}$, define
$$T_r f(x) = (r * f)(x) := \sum_{z \in H_n} r(z) f(x + z)$$
as the action of $r$ on the functions on the space. We set $r$ to be a probability mass function (pmf) and call the function $T_r f$ the noisy version of $f$ with respect to $r$, and refer to $r$ and $T_r$ as a noise kernel and a noise operator, respectively. By smoothing $f$ with respect to $r$, we mean applying the noise kernel $r$ to $f$. We often assume that $r(x)$ is a radial kernel, i.e., its value on the argument $x \in H_n$ depends only on the Hamming weight of $x$.
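As an illustration of this definition, the following Python sketch (our own, not part of the paper) applies a noise kernel to a function on $H_n$ by brute-force convolution, representing points of $H_n$ as integer bitmasks; all names and toy parameters are ours.

```python
import numpy as np

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def apply_noise(r: np.ndarray, f: np.ndarray, n: int) -> np.ndarray:
    """Compute T_r f(x) = sum_z r(z) f(x + z) on H_n (brute force, O(4^n))."""
    N = 1 << n
    out = np.zeros(N)
    for x in range(N):
        out[x] = sum(r[z] * f[x ^ z] for z in range(N))  # x + z is XOR in H_n
    return out

# Example: Bernoulli kernel beta_delta on H_3 acting on a code distribution.
n, delta = 3, 0.1
N = 1 << n
beta = np.array([delta**hamming_weight(z) * (1 - delta)**(n - hamming_weight(z))
                 for z in range(N)])
code = [0b000, 0b111]                      # repetition code
f_C = np.zeros(N)
f_C[code] = 1.0 / len(code)                # code distribution f_C = 1_C / |C|
g = apply_noise(beta, f_C, n)
assert abs(g.sum() - 1.0) < 1e-12          # T_r f_C is again a pmf
```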
There are several ways to view the smoothing operation. Interpreting it as a shift-invariant linear operator, we note that, from Young’s inequality, $\|T_r f\|_\alpha = \|f * r\|_\alpha \le \|f\|_\alpha$ for $1 \le \alpha \le \infty$, so smoothing contracts the $\alpha$-norm. Upon applying $T_r$, the noisy version of $f$ becomes “flatter”; hence, the designation “smoothing”. Note that if $f$ is a pmf, then $T_r f$ is also a pmf, and so this view allows us to model the effect of communication channels with additive noise.
The class of functions that we consider are (normalized) indicators of subsets (codes) in $H_n$. A code $\mathcal C \subseteq H_n$ defines a pmf $f_{\mathcal C} = 1_{\mathcal C}/|\mathcal C|$, and, thus, $T_r f_{\mathcal C}$ can be viewed as a noisy version of the code (we also sometimes call it a noisy distribution) with respect to the kernel $r$. The main question of interest for us is the proximity of this distribution to $U_n$, or the “smoothness” of the noisy code distributions. To quantify closeness to $U_n$, we use the Kullback–Leibler (KL) and Rényi divergences (equivalently, $L_\alpha$ norms), and the smoothness measured in $D_\alpha(\cdot \| \cdot)$ (resp., in the $L_\alpha$ norm) is termed the $D_\alpha$-smoothness (resp., the $L_\alpha$-smoothness).
We say that a code is perfectly smoothable with respect to the noise kernel $r$ if the resultant noisy distribution becomes uniform. Our main emphasis is on the asymptotic version of perfect smoothing and its implications for some of the basic information-theoretic problems. A sequence of codes $(\mathcal C_n)_n$ is asymptotically smoothed by the kernel sequence $(r_n)_n$ if the distance between $T_{r_n} f_{\mathcal C_n}$ and $U_n$ approaches 0 as $n$ increases. This property is closely related to the more general problem of channel resolvability introduced by Han and Verdú in [1]. Given a discrete memoryless channel $W(Y|X)$ and a distribution $P_X$, we observe a distribution $P_Y$ on the output of the channel. The task of channel resolvability is to find $P_X$ supported on a subset $\mathcal C \subseteq H_n$ that approximates $P_Y$ with respect to the KL divergence. As shown in [1], there exists a threshold value of the rate such that it is impossible to approximate $P_Y$ using codes of lower rate, while any output process can be approximated by a well-chosen code of rate larger than the threshold. Other proximity measures between distributions were considered for this problem in [3,28,29]. Following the setting in [3], we consider Rényi divergences for measuring the closeness to uniformity. We call the minimum rate required to achieve perfect asymptotic smoothing the $D_\alpha$-smoothing capacity of the noise kernels $(r_n)_n$, where the proximity to uniformity is measured by the $\alpha$-Rényi divergence. In this work, we characterize the $D_\alpha$-smoothing capacity of the sequence $(r_n)_n$ using its Rényi entropy rate.
Asymptotic smoothing. We will limit ourselves to studying smoothing bounds under the action of the Bernoulli noise or ball noise kernels, defined formally below. A common approach to deriving bounds on the norm of a noisy function is through hypercontractivity inequalities [30,31,32]. In its basic version, given a code $\mathcal C$ of size $M$, it yields the estimate
$$\|T_\delta f_{\mathcal C}\|_\alpha \le \|f_{\mathcal C}\|_{\alpha'} = M^{\frac{1-\alpha'}{\alpha'}} 2^{-\frac{n}{\alpha'}},$$
where $T_\delta$ is the Bernoulli kernel (see Section 2 for formal definitions) and $\alpha' = 1 + (1-2\delta)^2(\alpha - 1)$. This upper bound does not differentiate codes yielding higher or lower smoothness, which in many situations may not be sufficiently informative. Note that other tools, such as “Mrs. Gerber’s lemma” [30,33] or strong data-processing inequalities, also suffer from the same limitation.
A new perspective on bounds for smoothing has recently been introduced in the works of Samorodnitsky [8,9,10]. Essentially, his results imply that codes satisfying certain regularity conditions have good smoothing properties. Their efficiency is highlighted in the recent papers [14,34], which leveraged results for code performance on the binary erasure channel (BEC) to prove strong claims about the error correction capabilities of the codes when used on the BSC. Using Samorodnitsky’s inequalities, we show that the duals of some BEC capacity-achieving codes achieve the $D_\alpha$-smoothing capacity for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$ with respect to the Bernoulli noise. This includes the duals of polar codes and doubly transitive codes, such as the Reed–Muller (RM) codes.
Smoothing and the wiretap channel. Wyner’s wiretap channel [35] models communication in the presence of an eavesdropper. Code design for this channel pursues reliable communication between the legitimate parties, while at the same time leaking as little information as possible about the transmitted messages to the eavesdropper. The connection between secrecy in wiretap channels and resolvability was first mentioned by Csiszár [36] and later developed by Hayashi [2]. It rests on the observation that to achieve secrecy it suffices to make the distribution of an eavesdropper’s observations conditioned on the transmitted message nearly independent of the message. The idea of characterizing secrecy based on smoothness works irrespective of the measure of secrecy [2,3,11], and it was also employed for nested lattice codes used over the Gaussian wiretap channel in [12].
Secrecy on the wiretap channel can be defined in two ways, measured by the information gained by the eavesdropper, and it depends on whether this quantity is normalized to the number of channel uses (weak secrecy) or not (strong secrecy). This distinction was first highlighted by Maurer [37], and it has been adopted widely in the recent literature. Early papers devoted to code design for the wiretap channel relied on random codes, but, for simple channel models such as BSC or BEC, this has changed with the advent of explicit capacity-approaching code families. Weak secrecy results based on LDPC codes were presented in [38], but initial attempts to attain strong secrecy encountered some obstacles. To circumvent this, the first works on code construction [39,40] had to assume that the main channel is noiseless. The problem of combining strong secrecy and reliability for general wiretap channels was resolved in [41], but that work had to assume that the two communicating parties share a small number of random bits unavailable to the eavesdropper. Apart from the polar coding scheme of [41], explicit code families that support reliable communication with positive rate and strong secrecy have not previously appeared in the literature. In this work, we show that nested RM codes perform well in binary symmetric wiretap channels based on their smoothing properties. While our work falls short of proving that nested RM codes achieve capacity, we show that they can transmit messages reliably and secretly at rates close to capacity.
Ball noise and decoding error. Ball-noise smoothing provides a tool for estimating the error probability of decoding on the BSC. We derive impossibility and achievability bounds for the $D_\alpha$-smoothness of noisy distributions with respect to the ball noise. Smoothing of a code with respect to the $L_2$ norm plays a special role because, in this case, the second norm (the variance) of the resulting distribution can be expressed via the pairwise distances between codewords, enabling one to rely on tools from Fourier analysis. The recent paper by Debris-Alazard et al. [4] established universal bounds for the smoothing of codes and lattices, with cryptographic reductions in mind. The paper by Sprumont and Rao [15] addressed bounds for the error probability of list decoding at rates above the BSC capacity. A paper by one of the present authors [42] studied the variance of the number of codewords in balls of different radii (a quantity known as the quadratic discrepancy [43,44]).
The main contributions of this paper are the following:
  • Characterizing the $D_\alpha$-smoothing capacities of noise operators on the Hamming space for $\alpha \in (1, \infty]$;
  • Identifying some explicit code families that attain the smoothing capacity of the Bernoulli noise for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$;
  • Obtaining rate estimates for the RM codes used on the BSC wiretap channel under the strong secrecy condition;
  • Showing that codes possessing sufficiently good smoothing properties are suitable for error correction.
In Section 2, we set up the notation and introduce the relevant basic concepts. Then, in Section 3, we derive expressions for the D α -smoothing capacities for α ( 1 , ] , and in Section 4, we use these results to analyze the smoothing of code families under the action of the Bernoulli noise. Section 5 is devoted to the application of these results for the binary symmetric wiretap channel. In particular, we show that RM codes can achieve rates close to the capacity of the BSC wiretap channel, while at the same time guaranteeing strong secrecy. In Section 6, we establish threshold rates for smoothing under ball noise, and derive bounds for the error probability of decoding on the BSC, including the list case, based on the distance distribution. Concluding the paper, Section 7 briefly points out that the well-known class of uniformly packed codes are perfectly smoothable with respect to “small” noise kernels.

2. Preliminaries

2.1. Notation

Throughout this paper, $H_n$ is the binary $n$-dimensional Hamming space $\{0,1\}^n$.
Balls and spheres. Denote by $B(x,t) := \{y \in H_n : |y - x| \le t\}$ the metric ball of radius $t$ in $H_n$ with center at $x$, and denote by $S(x,t) := \{y \in H_n : |y - x| = t\}$ the sphere of radius $t$. Let $V_t = |B(x,t)|$ be the volume of the ball, and let $\mu_t(i)$ be the intersection volume of two balls of radius $t$ whose centers are distance $i$ apart:
$$\mu_t(i) = |B(0,t) \cap B(x,t)|, \quad \text{where } |x| = i. \tag{1}$$
Codes and distributions. A code $\mathcal C$ is a subset of $H_n$. The rate and distance of the code are denoted by $R(\mathcal C) := \log|\mathcal C|/n$ and $d(\mathcal C)$, respectively. Let
$$A_i = \frac{1}{|\mathcal C|} \left| \{ (x, y) \in \mathcal C^2 : d(x, y) = i \} \right|$$
and let $(A_i,\, i = 0, \ldots, n)$ be the distance distribution of the code. If the code $\mathcal C$ forms an $\mathbb F_2$-linear subspace of $H_n$, we denote by $\mathcal C^\perp := \{ y \in H_n : \sum_i x_i y_i = 0 \text{ for all } x \in \mathcal C \}$ its dual code.
The function $1_{\mathcal C}$ denotes the indicator of a subset $\mathcal C \subseteq H_n$, and $f_{\mathcal C} = 1_{\mathcal C}/|\mathcal C|$ is the corresponding pmf, the uniform distribution over the set, which we call a code distribution. Let $b_t$ denote the uniform distribution on the ball $B(0,t)$, given by $b_t(x) = 1_{B(0,t)}(x)/V_t$. In the context of noise operators, we refer to $T_{b_t}$ as the ball noise. Finally, $\beta_\delta$ is the binomial distribution on $H_n$, given by
$$\beta_\delta(x) = \beta_\delta^{(n)}(x) = \delta^{|x|} (1 - \delta)^{n - |x|},$$
and $U_n$ is the uniform distribution, given by $U_n(x) = 2^{-n}$ for all $x$.
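A minimal sketch of these kernels in Python (assuming the integer-bitmask representation of $H_n$ from the earlier sketch; all function names are ours):

```python
import numpy as np
from math import comb

def hw(x: int) -> int:
    return bin(x).count("1")

def bernoulli_kernel(n: int, delta: float) -> np.ndarray:
    """beta_delta(x) = delta^{|x|} (1-delta)^{n-|x|}."""
    return np.array([delta**hw(x) * (1 - delta)**(n - hw(x)) for x in range(1 << n)])

def ball_kernel(n: int, t: int) -> np.ndarray:
    """b_t = uniform distribution on the ball B(0, t)."""
    V_t = sum(comb(n, i) for i in range(t + 1))        # ball volume
    return np.array([1.0 / V_t if hw(x) <= t else 0.0 for x in range(1 << n)])

n = 4
for ker in (bernoulli_kernel(n, 0.2), ball_kernel(n, 1), np.full(1 << n, 2.0**-n)):
    assert abs(ker.sum() - 1.0) < 1e-12                # each kernel is a pmf
```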
Entropies and norms. For a function $f : H_n \to \mathbb R$, we define its $\alpha$-norm as follows:
$$\|f\|_\alpha = \left( \frac{1}{2^n} \sum_{x \in H_n} |f(x)|^\alpha \right)^{1/\alpha} \text{ for } \alpha \in (0, \infty), \qquad \|f\|_\infty = \max_{x \in H_n} |f(x)|.$$
Given a pmf $P$, let
$$H(P) = -\sum_i P_i \log P_i,$$
$$H_\alpha(P) = \frac{1}{1-\alpha} \log \sum_i P_i^\alpha$$
denote its Shannon entropy and Rényi entropy of order $\alpha$, respectively. If $P$ is supported on two points, we write $h(P)$ and $h_\alpha(P)$ instead (all logarithms are to the base 2). The limiting cases of $\alpha = 0, 1, \infty$ are well-defined; in particular, for $\alpha = 1$, $H_\alpha(P)$ reduces to $H(P)$.
For two discrete probability distributions $P$ and $Q$, the $\alpha$-Rényi divergence (or simply the $\alpha$-divergence) is defined as follows:
$$D_\alpha(P \| Q) = \begin{cases} -\log Q(\{i : P_i > 0\}) & \text{if } \alpha = 0 \\ \frac{1}{\alpha - 1} \log \sum_i P_i^\alpha Q_i^{-(\alpha - 1)} & \text{if } \alpha \in (0,1) \cup (1, \infty) \\ \sum_i P_i \log \frac{P_i}{Q_i} & \text{if } \alpha = 1 \\ \max_i \log \frac{P_i}{Q_i} & \text{if } \alpha = \infty. \end{cases} \tag{6}$$
The divergence $D_\alpha(P \| Q)$ is a continuous function of $\alpha$ for $\alpha \in [0, \infty]$. For a pmf $f$ on $H_n$,
$$D_\alpha(f \| U_n) = \frac{\alpha}{\alpha - 1} \log \left( 2^n \|f\|_\alpha \right), \quad \alpha \in (0,1) \cup (1, \infty),$$
$$D_\infty(f \| U_n) = \log \left( 2^n \|f\|_\infty \right).$$
Note that $D_\alpha(f \| U_n) = n - H_\alpha(f)$ for all $0 \le \alpha \le \infty$.
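The identity $D_\alpha(f \| U_n) = n - H_\alpha(f)$ is easy to check numerically; the following Python sketch (ours) computes both sides for a random pmf on $H_3$.

```python
import numpy as np

def renyi_entropy(p: np.ndarray, alpha: float) -> float:
    """H_alpha(P); alpha = 1 and alpha = inf handled as limits."""
    p = p[p > 0]
    if alpha == 1:
        return float(-(p * np.log2(p)).sum())
    if np.isinf(alpha):
        return float(-np.log2(p.max()))
    return float(np.log2((p**alpha).sum()) / (1 - alpha))

def D_uniform(f: np.ndarray, alpha: float) -> float:
    """D_alpha(f || U_n) for a pmf f on H_n, via the identity D = n - H_alpha(f)."""
    n = int(np.log2(len(f)))
    return n - renyi_entropy(f, alpha)

# Sanity check of the identity against the defining formula for alpha = 2.
f = np.random.dirichlet(np.ones(8))          # a random pmf on H_3
direct = np.log2((f**2).sum() * 8)           # log sum_i f_i^2 * 2^{n(alpha-1)}, n = 3
assert abs(direct - D_uniform(f, 2)) < 1e-9
```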
Channels. In this paper, a channel is a conditional probability distribution $W : \{0,1\} \to \mathcal Y$, where $\mathcal Y$ is a finite set, so that $W(y|x)$ is the conditional probability of the output $y$ for the input $x$. We frequently consider the binary symmetric channel with crossover probability $\delta$ and the binary erasure channel with erasure probability $\lambda$, abbreviating them as BSC($\delta$) and BEC($\lambda$), respectively. We are often interested in the $n$-fold channel $W^{(n)}$, i.e., the conditional probability distribution corresponding to $n$ uses of the channel. For the input $X$, let $Y(X, W)$ be the random output of the channel $W^{(n)}$. If the input sequences are chosen from a uniform distribution on a code $\mathcal C$, we denote the input by $X_{\mathcal C}$. Since the number of uses of the channel is usually clear from the context, we suppress the dependency on $n$ from the notation for channels and sequences.
Let $\mathcal C$ be a code of length $n$. For a channel $W$ and input $X_{\mathcal C}$, the block-MAP decoder is defined as
$$\hat x(y) = \arg\max_{x \in \mathcal C} \Pr(x \,|\, y).$$
For a given code and channel, denote the error probability of block-MAP decoding by
$$P_B(W, \mathcal C) = \Pr\left( X_{\mathcal C} \ne \hat X(Y(X_{\mathcal C}, W)) \right).$$
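For a BSC, the block-MAP rule with a uniform prior on the code reduces to minimum-distance decoding; a small Python illustration (our own, with a toy repetition code):

```python
def hw(x: int) -> int:
    return bin(x).count("1")

def map_decode(y: int, code: list[int]) -> int:
    """Block-MAP decoding over a BSC(delta), delta < 1/2, with a uniform prior
    on the code: maximizing Pr(x | y) is the same as maximizing the likelihood
    delta^{d(x,y)} (1-delta)^{n-d(x,y)}, i.e., minimizing the Hamming distance."""
    return min(code, key=lambda x: hw(x ^ y))

code = [0b00000, 0b11111]                  # length-5 repetition code
assert map_decode(0b00101, code) == 0b00000
assert map_decode(0b11011, code) == 0b11111
```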

2.2. $D_\alpha$- and $L_\alpha$-Smoothness

Recall that in the introduction, we expressed the smoothness of a distribution as its proximity to uniformity. Here, we formalize this notion based on two (equivalent) proximity measures.
Let $g$ be a pmf on $H_n$. A natural measure of the uniformity of $g$ is $D_\alpha(g \| U_n)$ ($\alpha \in [0, \infty]$). We call this the $D_\alpha$-smoothness of $g$. Observe that
$$2^n \|g\|_\alpha = \frac{\|g\|_\alpha}{\|g\|_1} \ge 1 \quad \text{for } \alpha \in (1, \infty], \text{ and} \tag{7}$$
$$2^n \|g\|_\alpha = \frac{\|g\|_\alpha}{\|g\|_1} \le 1 \quad \text{for } \alpha \in (0, 1), \tag{8}$$
with equality iff $g = U_n$. Thus, the better the pmf $g$ approximates uniformity, the closer $2^n \|g\|_\alpha$ is to 1 (the denominator is simply a normalization quantity that allows dimension-agnostic analysis). Therefore, $2^n \|g\|_\alpha$ ($\alpha \in (0,1) \cup (1, \infty]$) can be considered as another measure of proximity. We call $2^n \|g\|_\alpha$ the $L_\alpha$-smoothness of $g$. From (7) and (8), it follows that the $D_\alpha$-smoothness and $L_\alpha$-smoothness are equivalent.
Remark 1. 
It is easily seen that $D_\alpha(g \| U_n) = n - H_\alpha(g)$; hence, $D_\alpha(g \| U_n)$ is an increasing function of $\alpha$.
Recall that, for a given code $\mathcal C$ and a noise kernel $r$, $T_r f_{\mathcal C} = r * f_{\mathcal C}$ is the noisy distribution of the code $\mathcal C$ with respect to $r$. We intend to study the smoothing properties of such noisy distributions of codes. In particular, we characterize the necessary conditions for $D_\alpha(T_r f_{\mathcal C} \| U_n)$ to be close to zero (equivalently, for $2^n \|T_r f_{\mathcal C}\|_\alpha$ to be close to one). In Section 3, we quantify these requirements in the asymptotic setting.

2.3. Resolvability

The problem of channel resolvability was introduced by Han and Verdú [1] under the name of approximating the output statistics of the channel. The objective of channel resolvability is to approximate the output distribution of a given input by the output distribution of a code with a smaller support size. In this work, we are interested in code families whose noisy distributions approximate uniformity. Resolvability characterizes the necessary conditions for this to happen in terms of the rate of the code.
Let $W$ be a (discrete memoryless) channel whose input alphabet is $\mathcal X$ and whose output alphabet is $\mathcal Y$. Let $\mathbf X = \{X_n\}_{n=1}^{\infty}$ be a discrete-time random process where the RVs $X_n$ take values in $\mathcal X$. Denote by $Y_n$ the random output of $W$ with input $X_n$ and let $\mathbf Y = \{Y_n\}_{n=1}^{\infty}$. Denote by $P_{\mathbf Y}$ the distribution of $\mathbf Y$ and let $P_{Y^{(n)}}$ be the pmf of the $n$-tuple $Y^{(n)} := (Y_1, Y_2, \ldots, Y_n)$.
For a legitimate (realizable) output process $\mathbf Y$, define
$$J^{(\Delta)}(W, P_{\mathbf Y}) = \inf_{\mathcal C_n \subseteq \mathcal X^n} \left\{ \liminf_{n \to \infty} R(\mathcal C_n) : \Delta(f_{\mathcal C_n}, P_{Y^{(n)}}) \to 0 \right\},$$
where $\Delta$ is a measure of closeness of a pair of probability distributions. In words, we look for sequences of distributions $(f_{\mathcal C_n})_n$ of the smallest possible rate that approximate $P_{\mathbf Y}$ on the output of $W$.
The original problem as formulated by Han and Verdú in [1] seeks to find the resolvability of the channel, defined as
$$C_r^{(\Delta)}(W) = \inf_{P_{\mathbf Y}} \left\{ J^{(\Delta)}(W, P_{\mathbf Y}) : \mathbf Y \text{ is an output process over } W \right\}, \tag{12}$$
where $\Delta$ is either the variational distance or the normalized KL divergence $\frac{1}{n} D(\cdot \| \cdot)$. Hayashi [2] considered the same problem where the proximity was measured by the unnormalized KL divergence. In each case, the resolvability equals the Shannon capacity of the channel $W$.
Theorem 1 
([1,2]). Let W be a discrete memoryless channel. Suppose that Δ is either the KL divergence (normalized or not) or the variational distance; then, the resolvability is given by
$$C_r^{(\Delta)}(W) = C(W),$$
where $C(W)$ is the Shannon capacity of the channel.
The authors of [1] proved this result under the additional assumption that the channel $W$ satisfies the strong converse; Hayashi [2] later showed that this assumption is unessential.
In addition to the proximity measures considered in Theorem 1, the papers [3,28,29] considered other possibilities. In particular, Yu and Tan [3] studied the resolvability problem for a specific target distribution $P_Y$ and for the Rényi divergence $\Delta = D_\alpha$ defined in (6). Their main result is as follows.
Theorem 2 
([3], Theorem 2). Let $W$ be a channel and $P_Y$ be an output distribution. Then,
$$J^{(D_\alpha)}(W, P_Y) = \begin{cases} \min_{P_X \in \mathcal P(W, P_Y)} \sum_x P_X(x) D_\alpha(W(\cdot|x) \| P_Y) & \text{if } \alpha \in (1, 2] \cup \{\infty\} \\ \min_{P_X \in \mathcal P(W, P_Y)} D(W \| P_Y | P_X) & \text{if } \alpha \in (0, 1] \\ 0 & \text{if } \alpha = 0, \end{cases}$$
where $\mathcal P(W, P_Y)$ is the set of input distributions $P_X$ consistent with the output $P_Y$.
A direct corollary of Theorem 2 is the following:
Corollary 1 
([3], Equation (55)). Let $\mathbf Y$ be the output process where, for each $n$, $Y_n \sim \mathrm{Ber}(1/2)$. Then,
$$J^{(D_\alpha)}(\mathrm{BSC}(\delta), P_Y) = \begin{cases} 1 - h_\alpha(\delta) & \text{if } \alpha \in (1, 2] \cup \{\infty\} \\ 1 - h(\delta) & \text{if } \alpha \in (0, 1] \\ 0 & \text{if } \alpha = 0. \end{cases}$$
This corollary gives necessary conditions for the rate of codes that can approximate the uniform distribution via smoothing. We will connect this result to the problem of finding smoothing thresholds in Section 4.
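For concreteness, the following Python snippet (ours) evaluates the thresholds of Corollary 1 at a sample crossover probability; since $h_\alpha(\delta)$ decreases in $\alpha$, the required rate $1 - h_\alpha(\delta)$ grows with $\alpha$.

```python
from math import log2

def h(d: float) -> float:
    return -d * log2(d) - (1 - d) * log2(1 - d)

def h_alpha(d: float, alpha: float) -> float:
    """Binary Renyi entropy h_alpha(delta)."""
    if alpha == float("inf"):
        return -log2(max(d, 1 - d))
    return log2(d**alpha + (1 - d)**alpha) / (1 - alpha)

delta = 0.11
for a in (2, 3, float("inf")):
    print(f"alpha={a}: smoothing threshold 1 - h_alpha = {1 - h_alpha(delta, a):.4f}")
print(f"alpha<=1 : threshold 1 - h = {1 - h(delta):.4f}")
# h_alpha decreases in alpha, so stricter (larger alpha) divergences need higher rate.
```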

3. Perfect Smoothing—The Asymptotic Case

For a given family of noise kernels $(T_{r_n})_n$, there exists a threshold rate such that it is impossible to approximate uniformity with codes of rate below the threshold irrespective of the chosen code, while at the same time, there exist families of codes with rate above the threshold that allow perfect approximation in the limit of infinite length. For instance, for the Bernoulli($\delta$) noise applied to a code $\mathcal C$, the smoothed distribution is nonuniform unless $\mathcal C = H_n$ or $\delta = 1/2$. At the same time, it is possible to approach the uniform distribution asymptotically for large $n$ once the code sequence satisfies certain conditions. Intuitively, it is clear that, for a fixed noise kernel, it is easier to approximate uniformity if the code rate is sufficiently high. In this section, we characterize the threshold rate for (asymptotically) perfect smoothing. Of course, the threshold also depends on the proximity measure $\Delta$ that we are using. In this section, we use perfect smoothing to mean “asymptotically perfect”. If the proximity measure $\Delta$ for smoothing is not specified, this means that we are using the KL divergence. We obtain the threshold rates for perfect smoothing measured with respect to the $\alpha$-divergence for several values of $\alpha$. In the subsequent sections, we work out the details for the Bernoulli and ball noise operators, which also have some implications for communication problems.
Definition 1. 
Let $(\mathcal C_n)_n$ be a sequence of codes of increasing length $n$ and let $0 \le \alpha \le \infty$. We say that the sequence $\mathcal C_n$ is asymptotically perfectly $D_\alpha$-smoothable with respect to the noise kernels $r_n$ if
$$\lim_{n \to \infty} D_\alpha(T_{r_n} f_{\mathcal C_n} \| U_n) = 0,$$
or, equivalently by (7) and (8), if
$$\lim_{n \to \infty} 2^n \|T_{r_n} f_{\mathcal C_n}\|_\alpha = 1 \quad (\alpha \ne 0, 1).$$
One can also define a dimensionless measure for perfect asymptotic smoothing by considering the limiting process
$$\frac{\|T_{r_n} f_{\mathcal C_n} - U_n\|_\alpha}{\|T_{r_n} f_{\mathcal C_n}\|_1} = 2^n \|T_{r_n} f_{\mathcal C_n} - U_n\|_\alpha \to 0. \tag{13}$$
Proposition 1. 
Convergence in (13) implies perfect smoothing for all $1 < \alpha \le \infty$ and is equivalent to it for $\alpha < \infty$.
Proof. 
Let $\mathcal C = \mathcal C_n \subseteq H_n$ for some fixed $n$. Since, by the triangle inequality,
$$\left| 2^n \|T_r f_{\mathcal C}\|_\alpha - 1 \right| \le 2^n \|T_r f_{\mathcal C} - U_n\|_\alpha,$$
(13) is not weaker than the mode of convergence in Definition 1 for all $\alpha \in [1, \infty]$. For $\alpha \ne 1, \infty$, we use Clarkson’s inequalities ([45], p. 388). Their form depends on $\alpha$; namely, for $2 \le \alpha < \infty$, we have
$$\left\| \frac{2^n T_r f_{\mathcal C} + 1}{2} \right\|_\alpha^\alpha + \left\| \frac{2^n T_r f_{\mathcal C} - 1}{2} \right\|_\alpha^\alpha \le \frac{1}{2} \left( \|2^n T_r f_{\mathcal C}\|_\alpha^\alpha + 1 \right).$$
For $1 < \alpha < 2$, the inequality has the form
$$\left\| \frac{2^n T_r f_{\mathcal C} + 1}{2} \right\|_\alpha^{\alpha'} + \left\| \frac{2^n T_r f_{\mathcal C} - 1}{2} \right\|_\alpha^{\alpha'} \le \left( \frac{1}{2} \left( \|2^n T_r f_{\mathcal C}\|_\alpha^\alpha + 1 \right) \right)^{\alpha'/\alpha},$$
where $\alpha' = \frac{\alpha}{\alpha - 1}$ is the Hölder conjugate. These inequalities show that, for $\alpha \in (1, \infty)$, $2^n \|T_{r_n} f_{\mathcal C_n}\|_\alpha \to 1$ implies $2^n \|T_{r_n} f_{\mathcal C_n} - U_n\|_\alpha \to 0$, establishing the claimed equivalence. □
Definition 2. 
Let $(r_n)_n$ be a sequence of noise kernels. We say that the rate $R$ is achievable for perfect $D_\alpha$-smoothing if there exists a sequence of codes $(\mathcal C_n)_n$ such that $R(\mathcal C_n) \to R$ as $n \to \infty$ and $(\mathcal C_n)_n$ is perfectly $D_\alpha$-smoothable.
Note that if $R_1$ is achievable, then any rate $R_2$ with $R_1 < R_2 \le 1$ is also achievable. Indeed, consider a (linear) code $\mathcal C_1$ of rate $R_1$ that has good smoothing properties. Construct $\mathcal C_2$ by taking the union of $2^{n(R_2 - R_1)}$ non-overlapping shifts of $\mathcal C_1$. Then the rate of $\mathcal C_2$ is $R_2$, and since each shift has good smoothing properties, the same is true for $\mathcal C_2$. Therefore, let us define the main concept of this section.
Definition 3. 
Given a sequence of kernels $\mathbf r = (r_n)_n$, define the $D_\alpha$-smoothing capacity as
$$S_\alpha^{\mathbf r} := \inf_{(\mathcal C_n)_n} \left\{ \liminf_{n \to \infty} R(\mathcal C_n) : \lim_{n \to \infty} D_\alpha(T_{r_n} f_{\mathcal C_n} \| U_n) = 0 \right\}.$$
Note that this quantity is closely related to the resolvability: if, rather than optimizing over the output process in (12), we set the output distribution to uniform and take $\Delta = D_\alpha$, then $S_\alpha^{\mathbf r}$ equals $J^{(D_\alpha)}(W, P_Y)$ for the channel $W$ given by the noise kernel $\mathbf r$. To avoid future confusion, we refer to the capacity of reliable transmission as Shannon’s capacity.
The following lemma provides a lower bound for the $D_\alpha$-smoothness. It follows from Lemma 2 in [3], and we give a direct proof for completeness.
Lemma 1. 
Let $\mathcal C \subseteq H_n$ be a code of size $M = 2^{nR}$ and let $r$ be a noise kernel. Then, for $\alpha \in [0, \infty]$,
$$D_\alpha(T_r f_{\mathcal C} \| U_n) \ge n(1 - R) - H_\alpha(r).$$
Proof. 
We will first prove that $(2^n \|T_r f_{\mathcal C}\|_\alpha)^\alpha \ge 2^{(\alpha - 1)[n(1 - R) - H_\alpha(r)]}$ for $\alpha \in (1, \infty)$:
$$(2^n \|T_r f_{\mathcal C}\|_\alpha)^\alpha = \frac{2^{n\alpha}}{2^n} \sum_{x \in H_n} T_r f_{\mathcal C}(x)^\alpha = 2^{n(\alpha - 1)} \sum_{x \in H_n} \Big[ \sum_{y \in H_n} r(y) f_{\mathcal C}(x + y) \Big]^\alpha \ge 2^{n(\alpha - 1)} \sum_{x \in H_n} \sum_{y \in H_n} [r(y) f_{\mathcal C}(x + y)]^\alpha = 2^{n(\alpha - 1)} \sum_{y \in H_n} r(y)^\alpha \sum_{x \in H_n} f_{\mathcal C}(x + y)^\alpha = 2^{n\alpha} |\mathcal C|^{-(\alpha - 1)} \|r\|_\alpha^\alpha = 2^{(\alpha - 1)[n(1 - R) - H_\alpha(r)]}.$$
Together with (7), this implies that the claimed inequality holds for α ( 1 , ) .
A similar calculation shows that, for $\alpha \in (0, 1)$, $(2^n \|T_r f_{\mathcal C}\|_\alpha)^\alpha \le 2^{(\alpha - 1)[n(1 - R) - H_\alpha(r)]}$, yielding the claim for $\alpha \in (0, 1)$. The limiting cases $\alpha = 0$, $\alpha = 1$, and $\alpha = \infty$ follow by continuity of $D_\alpha$ and $H_\alpha$ for all $\alpha \ge 0$. □
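The bound of Lemma 1 can be checked numerically on a toy example; in the following Python sketch (code and parameters ours), we verify it for a small linear code under Bernoulli noise with $\alpha = 2$.

```python
import numpy as np

def hw(x): return bin(x).count("1")

def renyi(p, alpha):
    p = p[p > 0]
    return float(np.log2((p**alpha).sum()) / (1 - alpha))

n, delta, alpha = 4, 0.15, 2.0
N = 1 << n
r = np.array([delta**hw(z) * (1 - delta)**(n - hw(z)) for z in range(N)])
code = [0b0000, 0b0111, 0b1011, 0b1100]        # a [4, 2] linear code, R = 1/2
f_C = np.zeros(N); f_C[code] = 1 / len(code)
g = np.array([sum(r[z] * f_C[x ^ z] for z in range(N)) for x in range(N)])
D = n - renyi(g, alpha)                        # D_alpha(T_r f_C || U_n)
bound = n * (1 - 0.5) - renyi(r, alpha)        # n(1 - R) - H_alpha(r)
assert D >= bound - 1e-9
print(f"D_2 = {D:.4f} >= lower bound {bound:.4f}")
```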
Define
$$\pi(\alpha) = \liminf_{n \to \infty} \frac{H_\alpha(r_n)}{n}.$$
Lemma 1 shows that it is impossible to achieve perfect $D_\alpha$-smoothing if $R < 1 - \pi(\alpha)$. A question of interest is whether there exist sequences of codes of rate $R > 1 - \pi(\alpha)$ that achieve perfect $D_\alpha$-smoothing. The next theorem shows that this is the case for $\alpha \in (1, \infty]$.
Theorem 3. 
Let $\mathbf r = (r_n)_n$ be a sequence of noise kernels and let $\alpha \in (1, \infty]$. Then,
$$S_\alpha^{\mathbf r} = 1 - \pi(\alpha). \tag{16}$$
The proof relies on a random coding argument and is given in Appendix B. This result will be used below to characterize the smoothing capacity of the Bernoulli and ball noise operators.
Remark 2. 
Equality (16) does not hold in the case $\alpha \in [0, 1]$. From Theorem 4 below, the Bernoulli noise does not satisfy (16) for $\alpha \in [0, 1)$. To construct a counterexample for $\alpha = 1$, consider the noise kernel that is almost uniform except for one distinguished point, for instance, $r_n(x) = 2^{-(n+1)}$ for $x \ne 0$ and $r_n(0) = \frac{1}{2} + \frac{1}{2^{n+1}}$. Performing the calculations, we then obtain that $S_1^{\mathbf r} = 1$ while $\pi(1) = \frac{1}{2}$.
Remark 3. 
It is worth noting that $\pi(\alpha)$ is a decreasing function of $\alpha$ for $0 \le \alpha \le \infty$.

4. Bernoulli Noise

In this section, we characterize the value $S_\alpha^{\beta_\delta}$ for a range of values of $\alpha$. Then, we provide explicit code families that attain the $D_\alpha$-smoothing capacities.
As already mentioned, the resolvability for $\beta_\delta$ with respect to the $\alpha$-divergence was considered by Yu and Tan [3]. Their results, stated in Corollary 1, yield an expression for $S_\alpha^{\beta_\delta}$ for $\alpha \in [0, 2] \cup \{\infty\}$. The next theorem summarizes the current knowledge about $S_\alpha^{\beta_\delta}$, where the claims for $2 < \alpha < \infty$ are new results.
Theorem 4. 
$$S_\alpha^{\beta_\delta} = \begin{cases} 0 & \text{if } \alpha = 0 \\ 1 - h(\delta) & \text{if } \alpha \in (0, 1] \\ 1 - h_\alpha(\delta) & \text{if } \alpha \in (1, \infty]. \end{cases}$$
Proof. 
The claims for $\alpha \in [0, 1]$ follow from Corollary 1. The results for $\alpha \in (1, \infty]$ follow from Theorem 3 since $\frac{H_\alpha(\beta_\delta)}{n} = h_\alpha(\delta)$. □
Having quantified the smoothing capacities, let us examine code families with strong smoothing properties. Since the $D_1$-smoothing capacity and the Shannon capacity coincide, it is natural to speculate that codes that achieve the Shannon capacity when used on the BSC($\delta$) would also attain the $D_1$-smoothing capacity. However, the following result demonstrates that capacity-achieving codes do not yield perfect smoothing. For typographical reasons, we abbreviate $T_{\beta_\delta}$ by $T_\delta$ from this section onward.
Proposition 2. 
Let $\mathcal C_n$ be a sequence of codes achieving the capacity of the BSC($\delta$). Then,
$$D(T_\delta f_{\mathcal C_n} \| U_n) \to \infty, \qquad D(T_\delta f_{\mathcal C_n} \| U_n) = o(n).$$
Proof. 
The second part of the statement is Theorem 2 in [46]. The first part is obtained as follows: Let $\mathcal C_n$ be a capacity-achieving sequence of codes in the BSC($\delta$). Then, from [47] (Theorem 49), there exists a constant $K > 0$ such that $nR(\mathcal C_n) \le n(1 - h(\delta)) - K\sqrt{n}$ for large $n$. Therefore,
$$0 \le H(X_{\mathcal C_n} \,|\, Y(X_{\mathcal C_n}, \mathrm{BSC}(\delta))) = n(R(\mathcal C_n) + h(\delta) - 1) + D(T_\delta f_{\mathcal C_n} \| U_n),$$
which implies $D(T_\delta f_{\mathcal C_n} \| U_n) \ge K\sqrt{n}$. □
Apart from random codes, only polar codes are known to achieve the $D_1$-smoothing capacity. Before stating the formal result, recall that polar codes are formed by applying several iterations of a linear transformation to the input, which results in creating virtual channels for individual bits with Shannon capacity close to zero or to one, plus a vanishing proportion of intermediate-capacity channels. While, by Proposition 2, polar codes that achieve the BSC capacity cannot achieve the $D_1$-smoothing capacity, adding some intermediate-bit channels to the set of data bits makes this possible. This idea was first introduced in [39] and expressed in terms of resolvability in [48].
Theorem 5 
([48], Proposition 1). Let $W$ be the BSC($\delta$) channel and $W_n^{(i)}$ be the virtual channels formed after applying $n$ steps of the polarization procedure. For $\gamma \in (0, 1/2)$, define $G_n = \{ i \in \{1, \ldots, n\} : C(W_n^{(i)}) \ge 2^{-n^\gamma} \}$. Let $\mathcal C_n$ be the polar code corresponding to the virtual channels $G_n$. Then, $D(T_\delta f_{\mathcal C_n} \| U_n) \to 0$.
Note that $\lim_{n \to \infty} R(\mathcal C_n) = \lim_{n \to \infty} \frac{|G_n|}{n} = 1 - h(\delta)$. Hence, the polar code construction presented above achieves the perfect smoothing threshold with respect to the KL divergence. Furthermore, since convergence in the $\alpha$-divergence for $\alpha < 1$ is weaker than convergence for $\alpha = 1$, the same polar code sequence is perfectly $D_\alpha$-smoothable for $\alpha < 1$. Noting that the smoothing threshold for $\alpha < 1$ is $1 - h(\delta)$ by Theorem 4, we conclude that the above polar code sequence achieves the smoothing capacity in the $\alpha$-divergence for $\alpha < 1$.
As mentioned earlier, the smoothing properties of code families other than random codes and polar codes have not been extensively studied. We show that the duals of capacity-achieving codes in the BEC exhibit good smoothing properties using the tools developed in [10]. As the first step, we establish a connection between the smoothing of a generic linear code and the erasure correction performance of its dual code.
Lemma 2. 
Let $\mathcal C$ be a linear code and let $X_{\mathcal C^\perp}$ be a random uniform codeword of $\mathcal C^\perp$. Let $Y(X_{\mathcal C^\perp}, \mathrm{BEC}(\lambda))$ be the output of the erasure channel BEC($\lambda$) for the input $X_{\mathcal C^\perp}$. Then,
$$D_\alpha(T_\delta f_{\mathcal C} \| U_n) \le H(X_{\mathcal C^\perp} \,|\, Y(X_{\mathcal C^\perp}, \mathrm{BEC}(\lambda))),$$
where $\lambda = (1 - 2\delta)^2$ for $\alpha = 1$ and $\lambda = 1 - h_\alpha(\delta)$ for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$.
The proof is given in Appendix D.
Using this lemma, we show that the duals of BEC capacity-achieving codes (with growing distance) exhibit good smoothing properties. In particular, they achieve the $D_\alpha$-smoothing capacities for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$.
Theorem 6. 
Let $(\mathcal C_n)_n$ be a sequence of linear codes with rates $R_n \to R$. Suppose that the dual sequence $(\mathcal C_n^\perp)_n$ achieves Shannon’s capacity of the BEC($\lambda$) with $\lambda = R$, and assume that $d(\mathcal C_n^\perp) = \omega(\log n)$. If $R > (1 - 2\delta)^2$, then
$$\lim_{n \to \infty} D(T_\delta f_{\mathcal C_n} \| U_n) = 0.$$
Additionally, for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$, if $R > 1 - h_\alpha(\delta)$, then
$$\lim_{n \to \infty} D_\alpha(T_\delta f_{\mathcal C_n} \| U_n) = 0.$$
In particular, the sequence $\mathcal C_n$ achieves the $D_\alpha$-smoothing capacity $S_\alpha^{\beta_\delta}$ for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$.
Proof. 
Since the dual codes achieve the capacity of the BEC, it follows from ([49], Theorem 5.2) that, if their distance grows with $n$, then their decoding error probability vanishes. In particular, if $d(\mathcal C_n^\perp) = \omega(\log n)$, then $P_B(\mathrm{BEC}(R - \epsilon), \mathcal C_n^\perp) = o(\frac{1}{n})$ for all $\epsilon \in (0, R]$. Hence, from Fano’s inequality,
$$\lim_{n \to \infty} H(X_{\mathcal C_n^\perp} \,|\, Y(X_{\mathcal C_n^\perp}, \mathrm{BEC}(R - \epsilon))) = 0.$$
Now, if $R > (1 - 2\delta)^2$, then there exists $\epsilon_0$ such that $R - \epsilon_0 = (1 - 2\delta)^2$. Therefore, from Lemma 2,
$$\lim_{n \to \infty} D(T_\delta f_{\mathcal C_n} \| U_n) \le \lim_{n \to \infty} H(X_{\mathcal C_n^\perp} \,|\, Y(X_{\mathcal C_n^\perp}, \mathrm{BEC}(R - \epsilon_0))) = 0.$$
Similarly, if $R > 1 - h_\alpha(\delta)$ for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$, then $\lim_{n \to \infty} D_\alpha(T_\delta f_{\mathcal C_n} \| U_n) = 0$.
Together with Theorem 4, we have now proved the final claim. □
The known code families that achieve the capacity of the BEC include polar codes, LDPC codes, and doubly transitive codes, such as constant-rate RM codes. LDPC codes do not fit the assumptions because of low dual distance, but the other codes do. This yields explicit families of codes that achieve the D α -smoothing capacity.
We illustrate the results of this section in Figure 1, where the curves show the achievability and impossibility rates for perfect smoothing with respect to the Bernoulli noise. For a code (sequence) of rate $R$, perfect smoothing under the noise $\beta_\delta$ is impossible below the Shannon capacity curve. The sequence of polar codes from [39], cited in Theorem 5, is smoothable at rates equal to the Shannon capacity, although these codes do not provide a decoding guarantee at that noise level. At the second curve from the bottom, the duals of the codes that achieve Shannon’s capacity of the BEC achieve perfect $D_1$-smoothing; at the third (fourth) curve, these codes are perfectly $D_2$- (resp., $D_\infty$-) smoothable, and they achieve the corresponding smoothing capacity.
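The four threshold curves of Figure 1 can be reproduced pointwise; a small Python sketch (ours), evaluated at a sample $\delta$:

```python
from math import log2

def h(d): return -d*log2(d) - (1-d)*log2(1-d)
def h2(d): return -log2(d*d + (1-d)*(1-d))     # binary Renyi entropy, alpha = 2
def h_inf(d): return -log2(1-d)                # alpha = inf, for delta <= 1/2

delta = 0.11
curves = {
    "Shannon capacity 1 - h(delta)":           1 - h(delta),
    "duals of BEC codes, D_1: (1-2 delta)^2":  (1 - 2*delta)**2,
    "duals of BEC codes, D_2: 1 - h_2":        1 - h2(delta),
    "duals of BEC codes, D_inf: 1 - h_inf":    1 - h_inf(delta),
}
for name, rate in sorted(curves.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rate:.4f}")   # printed bottom curve first
```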
Remark 4. 
Observe that the strong converse of the channel coding theorem does not imply perfect smoothing. To give a quick example, consider a code $\mathcal C_n = B(0, \delta' n)$ formed of all the vectors in the ball. Let $0 < \delta' < 1/2$ and let us use this code on a BSC($\delta$), where $h(\delta') + h(\delta) > 1$ and $\delta < 1/2$. From the choice of the parameters, the rate of $\mathcal C_n$ is above capacity, and, therefore, $P_B(\mathrm{BSC}(\delta), \mathcal C_n) \to 1$ from the strong converse. At the same time,
$$D(T_\delta f_{\mathcal C_n} \| U_n) = n - H(b_{\delta' n} * \beta_\delta) = n - H(\beta_{\delta'} * \beta_\delta) + O(\sqrt{n}) = n\big(1 - h(\delta'(1-\delta) + \delta(1-\delta'))\big) + O(\sqrt{n}),$$
where the transition from the ball noise to the Bernoulli noise (the second equality) is shown in [30]. Since $\delta'(1-\delta) + \delta(1-\delta') < 1/2$ for all $\delta < 1/2$, $\delta' < 1/2$, we conclude that $D(T_\delta f_{\mathcal C_n} \| U_n) \not\to 0$.
Remark 5. 
In this paper, we mostly study the trade-off between the rate of codes and the level of the noise needed to achieve perfect smoothing. A recent work of Debris-Alazard et al. [4] considered guarantees for smoothing derived from the distance distribution of codes and their dual distance (earlier, similar calculations were performed in [42,50]). Our approach enables us to find the conditions for perfect smoothing similar to [4] but relying on fewer assumptions.
Proposition 3. 
Let $(\mathcal C_n)_n$ be a sequence of codes whose dual distance satisfies $d(\mathcal C_n^\perp) \ge \epsilon n$, where $\epsilon \in (0, 1)$. If $\epsilon > (1 - 2\delta)^2$, then
$$\lim_{n \to \infty} D(T_\delta f_{\mathcal C_n} \| U_n) = 0.$$
Proof. 
Notice that $\lim_{n \to \infty} H(X_{\mathcal C_n^\perp} \,|\, Y(X_{\mathcal C_n^\perp}, \mathrm{BEC}(\lambda))) = 0$ if $\epsilon > \lambda$. With this, the proof is a straightforward application of Lemma 2. □
Compared to [4], this claim removes the restrictions on the support of the dual distance distribution of the codes C n .

5. Binary Symmetric Wiretap Channels

In this section, we discuss applications of perfect smoothing to the BSC wiretap channel. Wyner’s wiretap channel model $\mathcal V$ [35] for the case of BSCs is defined as follows: The system is formed of three terminals, A, B, and E. Terminal A communicates with B by sending messages $M$ chosen from a finite set $\mathcal M$. Communication from A to B occurs over a BSC $W_b$ with crossover probability $\delta_b$, and it is observed by the eavesdropper E via another BSC $W_e$ with crossover probability $\delta_e > \delta_b$. A message $M \in \mathcal M$ is encoded into a bit sequence $X \in H_n$ and sent from A to B in $n$ uses of the channel $W_b$. Terminal B observes the sequence $Y = X + W_b$, where $W_b \sim \mathrm{Bin}(n, \delta_b)$ is the noise vector, while terminal E observes the sequence $Z = X + W_e$ with $W_e \sim \mathrm{Bin}(n, \delta_e)$. We assume that the messages are encoded into a subset of $H_n$, which imposes some probability distribution on the input of the channels. The goal of the encoding is to ensure reliability and secrecy of communication. The reliability requirement amounts to the condition $\Pr(M \ne \hat M) \to 0$ as $n \to \infty$, where $\hat M$ is the estimate of $M$ made by B. To ensure secrecy, we require the strong secrecy condition $I(M; Z) \to 0$. This is in contrast to the condition $\frac{1}{n} I(M; Z) \to 0$ studied in the early works on the wiretap channel, which is now called weak secrecy. Denote by $R = \frac{1}{n} \log |\mathcal M|$ the transmission rate. The secrecy capacity $C_s(\mathcal V)$ is defined as the supremum of the rates that permit reliable transmission while also conforming to the secrecy condition.
The nested coding scheme, proposed by Wyner [35], has been the principal tool for constructing well-performing transmission protocols for the wiretap channel [38,39,41]. To describe it, let $\mathcal C_e$ and $\mathcal C_b$ be two linear codes such that $\mathcal C_e \subset \mathcal C_b$ and $|\mathcal M| = |\mathcal C_b|/|\mathcal C_e|$. We assign each message $m$ to a unique coset of $\mathcal C_e$ in $\mathcal C_b$. The sequence transmitted by A is a uniform random vector from the coset. As long as the rate of the code $\mathcal C_b$ is below the capacity of $W_b$, we can ensure the reliability of communication from A to B.
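A toy Python sketch of the nested (coset) encoder may help fix ideas; the specific codes below are our own small example, not the RM pairs used later.

```python
import random

# Toy nested pair C_e in C_b on H_4 (both linear; our own example).
C_b = [a ^ b for a in (0b0000, 0b0111) for b in (0b0000, 0b1011)]  # [4,2] code
C_e = [0b0000, 0b0111]                                             # [4,1] subcode
cosets = {}
for m, leader in enumerate(sorted(set(min(c ^ e for e in C_e) for c in C_b))):
    cosets[m] = [leader ^ e for e in C_e]     # message m <-> coset c_m + C_e

def encode(m: int) -> int:
    """Transmit a uniformly random vector from the coset of message m."""
    return random.choice(cosets[m])

x = encode(1)
assert x in C_b and x not in cosets[0]        # |M| = |C_b| / |C_e| = 2 messages
```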
Strong secrecy can be achieved relying on perfect smoothing. Denote by $c_m$ a leader of the coset that corresponds to the message $m$. The basic idea is that if $P_{Z|M=m} = (T_{\delta_e} f_{\mathcal C_e})(\cdot + c_m)$ is close to the uniform distribution $U_n$ for all $m$, these conditional pmfs are almost indistinguishable from each other, and terminal E has no means of inferring the transmitted message from the observed bit string $Z$.
As mentioned earlier, the weak secrecy results for the wiretap channel based on LDPC codes and on polar codes were presented in [38,39], respectively. The problem that these schemes faced, highlighted in Theorems 2 and 5, is that code sequences that achieve the BSC capacity have a rate gap of order $1/\sqrt{n}$ to the capacity value. At the same time, the rate of perfectly smoothable codes must exceed the capacity by a similar quantity [51]. For this reason, the authors of [39] included the intermediate virtual channels in their polar coding scheme, which gave them strong secrecy but interfered with transmission reliability. A similar general issue arose earlier in attempting to use LDPC codes for the wiretap channel [40].
Contributing to the line of work connecting smoothing and the wiretap channel [2,3,11], we show that nested coding schemes $\mathcal C_e \subset \mathcal C_b$, where the code $\mathcal C_b$ is good for error correction on the BSC($\delta_b$) and $\mathcal C_e$ is perfectly smoothable with respect to $\beta_{\delta_e}$, attain strong secrecy and reliability for a BSC wiretap channel $(\delta_b, \delta_e)$. As observed in Lemma 2, the duals of good erasure-correcting codes are perfectly smoothable for certain noise levels and, hence, they form a good choice for $\mathcal C_e$ in this scenario.
The following lemma establishes a connection between the smoothness of a noisy distribution of a code and strong secrecy.
Lemma 3. 
Consider the nested coding scheme for the BSC wiretap channel introduced above. If $D(T_{\delta_e} f_{\mathcal C_e} \| U_n) < \epsilon$, then $I(M; Z) < \epsilon$.
Proof. 
We have
$$D(P_{Z|M} \| U_n | P_M) = \sum_{m \in \mathcal M} \sum_{z \in H_n} P_{MZ}(m, z) \log \frac{P_{Z|M}(z|m)}{U_n(z)} = I(M; Z) + D(P_Z \| U_n).$$
Now, note that $P_{Z|M=m}(z) = (T_{\delta_e} f_{\mathcal C_e})(z + c_m) = P_{Z|M=m'}(z + c_m + c_{m'})$, so $D(P_{Z|M=m} \| U_n)$ is independent of $m$. Therefore, for all $m \in \mathcal M$,
$$D(P_{Z|M=m} \| U_n) = D(P_{Z|M} \| U_n | P_M) = I(M; Z) + D(P_Z \| U_n) \ge I(M; Z). \;\square$$
This lemma enables us to formulate conditions for reliable communication while guaranteeing the strong secrecy condition. Namely, it suffices to take a pair (a sequence of pairs) of nested codes $\mathcal C_e \subset \mathcal C_b$ such that $D(T_{\delta_e} f_{\mathcal C_e} \| U_n) \to 0$ as $n \to \infty$. If at the same time the code $\mathcal C_b$ corrects errors on a BSC($\delta_b$), then the scheme fulfills both the reliability and strong secrecy requirements under noise levels $\delta_b$ and $\delta_e$ for channels $W_b$ and $W_e$, respectively, supporting transmission from A to B at rate $R_b - R_e$. Together with the results established earlier, we can now make this claim more specific.
Theorem 7. 
Let $((\mathcal C_e^n)^\perp)_n$ and $(\mathcal C_b^n)_n$ be sequences of linear codes that achieve the capacity of the BEC for their respective rates. Suppose that $\mathcal C_e^n \subset \mathcal C_b^n$ and
1.
$d((\mathcal C_e^n)^\perp) = \omega(\log n)$, $R(\mathcal C_e^n) \to R_e$;
2.
$d(\mathcal C_b^n) = \omega(\log n)$, $R(\mathcal C_b^n) \to R_b$.
If $R_b < 1 - \log(1 + 2\sqrt{\delta_b(1-\delta_b)})$ and $R_e > 1 - 4\delta_e(1-\delta_e)$, then the nested coding scheme based on $\mathcal C_e^n$ and $\mathcal C_b^n$ can transmit messages at rate $R_b - R_e$ from A to B, satisfying the reliability and strong secrecy conditions.
Proof. 
From Corollary A1, the conditions $d(\mathcal C_b^n) = \omega(\log n)$ and $R_b < 1 - \log(1 + 2\sqrt{\delta_b(1-\delta_b)})$ guarantee transmission reliability. Furthermore, by Theorem 6, the conditions $d((\mathcal C_e^n)^\perp) = \omega(\log n)$ and $R_e > 1 - 4\delta_e(1-\delta_e)$ imply that $D(T_{\delta_e} f_{\mathcal C_e^n} \| U_n) \to 0$, which in its turn implies strong secrecy by Lemma 3. □
To give an example of a code family that satisfies the assumptions of this theorem, consider RM codes of constant rate. Namely, let $\mathcal C_e^n \subset \mathcal C_b^n$ be two sequences of RM codes whose rates converge to $R_e$ and $R_b$, respectively. Note that the duals of RM codes are themselves RM codes. By a well-known result [52], RM codes achieve the capacity of the BEC, and for any sequence of constant-rate RM codes, the distance scales as $n^{1/2 - o(1)}$, which is $\omega(\log n)$. Therefore, the RM codes satisfy the assumptions of Theorem 7.
Note that for the RM codes, we can obtain a stronger result, based on their error correction properties on the BSC. Involving this additional argument brings them closer to the secrecy capacity under the strong secrecy assumption.
Theorem 8. 
Let $\mathcal C_e^n$ and $\mathcal C_b^n$ be two sequences of RM codes satisfying $\mathcal C_e^n \subset \mathcal C_b^n$ whose rates approach $R_e > 0$ and $R_b > 0$, respectively. If $R_b < 1 - h(\delta_b)$ and $R_e > 1 - 4\delta_e(1-\delta_e)$, then the nested coding scheme based on $\mathcal C_e^n$ and $\mathcal C_b^n$ supports transmission on a BSC wiretap channel $(\delta_b, \delta_e)$ at rate $R_b - R_e$, guaranteeing communication reliability and strong secrecy.
Proof. 
Very recently, Abbe and Sandon [53], building upon the work of Reeves and Pfister [54], proved that RM codes achieve capacity in symmetric channels. Therefore, the condition $R_b < 1 - h(\delta_b)$ guarantees reliability. The rest of the proof is similar to that of Theorem 7. □
Theorems 7 and 8 stop short of constructing codes that attain the secrecy capacity of the channel (this is similar to the results of [14] for the transmission problem over the BSC). To quantify the gap to capacity, we plot the smoothing and decodability rate bounds in Figure 2.
As an example, let us set the noise parameters $\delta_b = 0.05$ and $\delta_e = 0.3$ and denote the corresponding secrecy capacity by $C_s$. Suppose that we use a BEC capacity-achieving code as the code $\mathcal C_b$ and a dual of a BEC capacity-achieving code as the code $\mathcal C_e$ in the nested scheme. The value $R^*$ is the largest rate at which we can guarantee both reliability and strong secrecy. In the example in Figure 2, $C_s = R_b^{(1)} - R_e^{(1)} = 0.5949$ and $R^* = R_b^{(2)} - R_e^{(2)} = 0.3181$. The only assumption required here is that the codes $\mathcal C_e$ and $\mathcal C_b$ have good erasure correction properties.
As noted, generally, the RM codes support a higher communication rate than $R^*$. Let $R^{**}$ be their achievable rate. For the same noise parameters as above, we obtain $R^{**} = R_b^{(1)} - R_e^{(2)} = 0.5536$, which is closer to $C_s$ than $R^*$.
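The numbers in this example follow from the threshold formulas established above; a short Python check (ours):

```python
from math import log2, sqrt

def h(d): return -d*log2(d) - (1-d)*log2(1-d)

db, de = 0.05, 0.3
Rb1 = 1 - h(db)                              # BSC capacity of the main channel
Re1 = 1 - h(de)                              # smoothing threshold at alpha = 1
Rb2 = 1 - log2(1 + 2*sqrt(db*(1-db)))        # decodability bound from [14]
Re2 = 1 - 4*de*(1-de)                        # = (1 - 2*de)^2, Theorem 6 threshold
print(f"C_s  = {Rb1 - Re1:.4f}")             # 0.5949
print(f"R*   = {Rb2 - Re2:.4f}")             # ~0.3181 (BEC-based nested scheme)
print(f"R**  = {Rb1 - Re2:.4f}")             # 0.5536 (RM codes, Theorem 8)
```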
Remark 6. 
The fact that the RM codes achieve capacity in symmetric channels immediately implies that nested RM codes achieve the secrecy capacity in the BSC wiretap channel under weak secrecy. While it is tempting to assume that, coupled with the channel duality theorems of [55,56], this result also implies that RM codes fulfil the strong secrecy requirement on the BSC wiretap channel, an immediate proof looks out of reach [57].

Secrecy from $\alpha$-Divergence

Classically, the (strong) secrecy in the wiretap channel is measured by $I(M; Z)$. In [11], slightly weaker secrecy measures were considered besides the mutual information. However, more stringent secrecy measures may be required in certain scenarios; $\alpha$-divergence-based secrecy measures were introduced by Yu and Tan [3] as a solution to this problem.
Observe that the secrecy measured by $D_\alpha(P_{Z|M} \| U_n | P_M)$ for $\alpha \ge 1$ is stronger than the mutual-information-based secrecy. This is because, for $\alpha \ge 1$,
$$I(M; Z) \le D(P_{Z|M} \| U_n | P_M) \le D_\alpha(P_{Z|M} \| U_n | P_M).$$
Given a wiretap channel with an encoding-decoding scheme, we say that $\alpha$-secrecy is satisfied if
$$\lim_{n \to \infty} D_\alpha(P_{Z|M} \| U_n | P_M) = 0.$$
The following theorem establishes that it is possible to achieve the rate $C(\delta_b) - S_\alpha^{\beta_{\delta_e}} = h_\alpha(\delta_e) - h(\delta_b)$ with RM codes for $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$.
Theorem 9. 
Let $\alpha \in \{2, 3, \ldots\} \cup \{\infty\}$. Let $\mathcal C_e^n$ and $\mathcal C_b^n$ be two sequences of RM codes satisfying $\mathcal C_e^n \subset \mathcal C_b^n$ whose rates approach $R_e > 0$ and $R_b > 0$, respectively. If $R_b < 1 - h(\delta_b)$ and $R_e > 1 - h_\alpha(\delta_e)$, then the nested coding scheme based on $\mathcal C_e^n$ and $\mathcal C_b^n$ supports transmission on a BSC wiretap channel $(\delta_b, \delta_e)$ guaranteeing $\alpha$-secrecy at rate $R_b - R_e$, provided that $h_\alpha(\delta_e) - h(\delta_b) > 0$.
Evidently, to achieve a more stringent version of secrecy, it is necessary to reduce the rate of the message. The capacity of the $(\delta_b, \delta_e)$-wiretap channel is $h(\delta_e) - h(\delta_b)$, while the highest known rate that assures $\alpha$-secrecy and reliability is $h_\alpha(\delta_e) - h(\delta_b)$. Hence, to achieve $\alpha$-secrecy, we must give up $h(\delta_e) - h_\alpha(\delta_e)$ of the attainable rate.

6. Ball Noise and Error Probability of Decoding

This section focuses on achieving the best possible smoothing with respect to the ball noise. As an application, we show that codes that possess good smoothing properties with respect to the ball noise are suitable for error correction in the BSC.

6.1. Ball Noise

Recall that perfect smoothing of a sequence of codes is only possible if the rate is greater than the corresponding $D_\alpha$-smoothing capacity. In addition to characterizing the $D_\alpha$-smoothing capacities of the ball noise, we quantify the best smoothing one can expect at rates below the $D_\alpha$-smoothing capacity. We will use these results in the upcoming subsection when we derive upper bounds for the decoding error probability on a BSC. The next theorem summarizes our main result on smoothing with respect to the ball noise.
Theorem 10. 
Let $(b_{\delta n})_n$ be the sequence of ball noise operators, where $\delta n$ is the radius of the ball. Let $\delta \in [0, 1/2]$, $\alpha \in [0, \infty]$. Let $\mathcal C_n$ be a code of length $n$ and rate $R_n$. Then, we have the following bounds:
$$D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \ge 0, \tag{19}$$
$$\frac{1}{n} D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \ge 1 - R_n - h(\delta). \tag{20}$$
There exist sequences of codes of rate $R_n \to R$ that achieve asymptotic equality in (19) for all $R > 1 - h(\delta)$. At the same time, if $R < 1 - h(\delta)$, then there exist sequences of codes achieving asymptotic equality in (20).
Proof. 
The inequality in (19) is trivial. Let us prove that asymptotically it can be achieved with equality. From Theorem 3, there exists a sequence of codes $(\mathcal C_n)_n$ such that $D_\infty(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = o(1)$ given that $R > 1 - h(\delta)$. Hence, for $\alpha \in [0, \infty]$,
$$0 \le D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \le D_\infty(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = o(1).$$
Hence, the equality case in (19) is achievable for all $\alpha \in [0, \infty]$.
Let us prove (20). From Lemma 1, we have
$$D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \ge n(1 - R_n) - H_\alpha(b_{\delta n}) \ge n(1 - R_n - h(\delta))$$
because $\frac{1}{n} H_\alpha(b_{\delta n}) = \frac{1}{n} \log V_{\delta n} \le h(\delta)$.
We are left to show that, for $R < 1 - h(\delta)$, (20) can be achieved with equality in the limit of large $n$. We use a random coding argument to prove this. Let $\mathcal C_n$ be an $(n, 2^{nR_n})$ code whose codewords are chosen independently and uniformly. In Equation (A6), Appendix B, we define the expected norm of the noisy function. Here, we use this quantity for the ball noise kernel. For $\alpha \in [0, \infty)$, define
$$Q_n(\alpha) = E_{\mathcal C_n} 2^{(\alpha - 1) D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n)}.$$
From Lemma A2, for any rational $\alpha > 1$,
$$Q_n(\alpha) \le \sum_{k=0}^{p} \binom{p}{k} 2^{\frac{nk}{q}\left(1 - R_n - \frac{\log V_{\delta n}}{n}\right)} Q_n\!\left(\frac{p-k}{q}\right), \tag{21}$$
for $p, q \in \mathbb Z_{\ge 0}^+$ such that $\alpha = 1 + \frac{p}{q}$.
for p , q Z 0 + such that α = 1 + p q .
Assume that R < 1 h ( δ ) . Let us prove that Q n ( α ) 2 n ( α 1 ) ( 1 R h ( δ ) + o ( 1 ) ) for rational values of α using induction. Let α [ 1 , 2 ] be rational and note that p q . Since Q n ( · ) 1 when the argument is less than 1, we can write (21) as follows:
Q n ( α ) k = 0 p p k 2 n k q ( 1 R n log V δ n n ) = 2 n ( α 1 ) ( 1 R h ( δ ) + o ( 1 ) ) .
Now, assume that the claimed bound holds for all rational $\alpha \in [1, m]$ for some integer $m \ge 2$, and let us prove that, in this case, it also holds for $\alpha \in (m, m+1]$. By the induction hypothesis,
$$Q_n(\alpha) \le \sum_{0 \le k \le p - q} \binom{p}{k} 2^{\frac{nk}{q}\left(1 - R_n - \frac{\log V_{\delta n}}{n}\right)} 2^{n \frac{p-k-q}{q}(1 - R - h(\delta) + o(1))} + \sum_{k = p - q}^{p} \binom{p}{k} 2^{\frac{nk}{q}\left(1 - R_n - \frac{\log V_{\delta n}}{n}\right)} \le \sum_{0 \le k \le p - q} \binom{p}{k} 2^{n(\alpha - 2)(1 - R - h(\delta) + o(1))} + \sum_{k = p - q}^{p} \binom{p}{k} 2^{n(\alpha - 1)(1 - R - h(\delta) + o(1))} = 2^{n(\alpha - 1)(1 - R - h(\delta) + o(1))}.$$
Therefore, for every rational $\alpha \in (1, \infty)$, there exists a sequence of codes satisfying
$$D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = n(1 - R - h(\delta) + o(1)),$$
which is equivalent to the equality in (20).
Let us extend this result to non-negative reals. Let $\alpha \in [0, \infty)$ and let us choose a rational $\alpha' \in (1, \infty)$ such that $\alpha < \alpha'$. We know that there exists a sequence of codes satisfying
$$D_{\alpha'}(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = n(1 - R - h(\delta) + o(1)).$$
From (20) and from Remark 1,
$$n(1 - R_n - h(\delta)) \le D_\alpha(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) \le D_{\alpha'}(T_{b_{\delta n}} f_{\mathcal C_n} \| U_n) = n(1 - R - h(\delta) + o(1)).$$
Hence, the asymptotic equality in (20) is achievable for all $\alpha \in [0, \infty)$. □
The above theorem characterizes the $D_\alpha$-smoothing capacities with respect to the ball noise.
Corollary 2. 
Let $\delta \in [0, 1/2]$. Let $\mathbf b(\delta) = (b_{\delta n})_n$ be a sequence of ball noise operators, where $\delta n$ is the radius corresponding to the $n$-th kernel. Then,
$$S_\alpha^{\mathbf b(\delta)} = 1 - h(\delta) \quad \text{for } \alpha \in [0, \infty].$$
The norms of $T_{b_t} f_{\mathcal C}$ can be used to bound the decoding error probability on a BSC. While estimating these norms for a given code is generally complicated, the second norm affords a compact expression based on the distance distribution of the code. In the next section, we bound the decoding error probability using the second norm of $T_{b_t} f_{\mathcal C}$. The following proposition provides closed-form expressions for $(2^n \|T_{b_t} f_{\mathcal C}\|_2)^2$.
Proposition 4. 
$$(2^n \|T_{b_t} f_{\mathcal C}\|_2)^2 = \frac{2^n}{|\mathcal C| V_t^2} \sum_{i=0}^{n} \mu_t(i) A_i = \frac{1}{V_t^2} \sum_{k=0}^{n} L_t(k)^2 A_k^\perp,$$
where $\mu_t(i)$ is defined in (1) and $L_t$ is the Lloyd polynomial of degree $t$ (A2).
The proof is immediate from Proposition A1 in combination with (A2) and (A4).
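Proposition 4 can be verified directly on a small code by comparing the convolution-based norm with the distance-distribution expression; the following Python sketch (code and parameters ours) does this for the first equality.

```python
import numpy as np
from math import comb

def hw(x): return bin(x).count("1")

n, t = 4, 1
N = 1 << n
V_t = sum(comb(n, i) for i in range(t + 1))
code = [0b0000, 0b0111, 0b1011, 0b1100]            # [4,2] linear code
ball = {x for x in range(N) if hw(x) <= t}

# Distance distribution A_i and intersection volumes mu_t(i).
A = np.zeros(n + 1)
for c in code:
    for cp in code:
        A[hw(c ^ cp)] += 1.0 / len(code)
mu = [len(ball & {((1 << i) - 1) ^ b for b in ball}) for i in range(n + 1)]

# Left side: (2^n ||T_{b_t} f_C||_2)^2 computed directly by convolution.
f_C = np.zeros(N); f_C[code] = 1.0 / len(code)
b = np.array([1.0 / V_t if x in ball else 0.0 for x in range(N)])
g = np.array([sum(b[z] * f_C[x ^ z] for z in range(N)) for x in range(N)])
lhs = 2**n * (g**2).sum()
rhs = 2**n / (len(code) * V_t**2) * sum(mu[i] * A[i] for i in range(n + 1))
assert abs(lhs - rhs) < 1e-9
```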

6.2. Probability of Decoding Error on a BSC($\delta$)

The idea that the smoothing of codes under some conditions implies good decoding performance has appeared in a number of papers using different language. The smoothing of capacity-achieving codes was considered in [18,46]. Hązła et al. [14] showed that if a code (sequence) is perfectly smoothable with respect to the Bernoulli noise, then the dual code is good for decoding (see Theorem A4, Corollary A1). Going from smoothing to decodability involves representing the $D_2$-smoothness of codes with respect to the Bernoulli noise as a potential energy form and comparing it to the Bhattacharyya bound for the dual codes. One limitation of this approach is that it cannot infer decodability for rates $R > 1 - \log(1 + 2\sqrt{\delta(1-\delta)})$ (this is the region above the blue solid curve in Figure 2). Rao and Sprumont [15] and Hązła [34] proved that sufficient smoothing of codes implies the decodability of the codes themselves rather than of their duals. However, these results are concerned with list decoding for rates above the Shannon capacity, resulting in an exponential list size, which is arguably less relevant from the perspective of communication.
Except for [15], the cited papers utilize perfect or near-perfect smoothing to infer decodability. For codes whose rates are below the capacity, perfect smoothing is impossible. At the same time, codes that possess sufficiently good smoothing properties are good for decoding. This property is at the root of the results for list decoding in [15]; however, their bounds were insufficient to make conclusions about list decoding below capacity.
Consider a channel where, for the input $X \sim f_{\mathcal C}$, the output $Y$ is given by $Y = X + W$ with $W \sim b_t$. Define $F_t(y) = |\mathcal C \cap B(y, t)|$ as the number of codewords in the ball $B(y, t)$. Hence, for a received vector $y$, the possible number of codewords that can yield $y$ is given by $F_t(y)$. Intuitively, the decoding error is small if $F_t(y) = 1$ for typical errors. Therefore, $F_t$ is of paramount interest in decoding problems. Since the typical errors for the ball noise and the Bernoulli noise are almost the same, this allows us to obtain a bound for decodability in the BSC. Using this approach, we show that the error probability of decoding on a BSC($\delta$) can be expressed via the second moment of the number of codewords in a ball of radius $t \approx \delta n$.
Assume, without loss of generality, that $\mathcal C$ is a linear code and $0^n$ is used for transmission. Let $Y$ be the random Bernoulli vector of errors, and note that $Y \sim \beta_\delta$. The calculation below does not depend on whether we rely on unique or list decoding within a ball of radius $t$, so let us assume that the decoder outputs $L \ge 1$ candidate codewords conditioned on the received vector $y$, which is a realization of $Y$.
In this case, the list decoding error probability can be written as
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) = \Pr\{ \{F_t(Y) \ge L + 1\} \cup \{|Y| > t\} \}.$$
Theorem 11. 
Let $t'$ and $t$ be integers such that $0 < t' < t < n$. Then, for any $L \ge 1$,
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) \le \frac{\beta_\delta(t')}{L} \sum_{w=1}^{n} \mu_t(w) A_w + \Pr(|Y| \le t' \cup |Y| > t). \tag{24}$$
Proof. 
Define $S_{t', t} = B(0, t) \setminus B(0, t')$. Clearly,
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) = \Pr\{ \{F_t(Y) \ge L+1\} \cup \{|Y| > t\} \} \le \Pr\{ (F_t(Y) \ge L+1) \cap (Y \in S_{t',t}) \} + \Pr(Y \notin S_{t',t}).$$
Let us estimate the first of these probabilities.
$$\Pr\{ (F_t(Y) \ge L+1) \cap (Y \in S_{t',t}) \} = \sum_{y \in S_{t',t}} 1\{F_t(y) \ge L+1\}\, \beta_\delta(y) \le \sum_{y \in S_{t',t}} \frac{F_t(y) - 1}{L}\, \beta_\delta(y) \le \frac{\beta_\delta(t')}{L} \sum_{y \in S_{t',t}} (F_t(y) - 1) \le \frac{\beta_\delta(t')}{L} \sum_{y \in B(0,t)} (F_t(y) - 1) \quad (\text{because } F_t(y) \ge 1 \text{ for all } y \in B(0,t))$$
$$= \frac{\beta_\delta(t')}{L} \Big( \sum_{y \in H_n} (1_{\mathcal C} * 1_{B(0,t)})(y)\, 1_{B(0,t)}(y) - V_t \Big) = \frac{\beta_\delta(t')}{L} \Big( \sum_{c \in \mathcal C} (1_{B(0,t)} * 1_{B(0,t)})(c) - V_t \Big) = \frac{\beta_\delta(t')}{L} \sum_{i=1}^{n} \mu_t(i) A_i. \;\square$$
Remark 7. 
In the case of L = 1 , the bound in (24) can be considered a slightly weaker version of Poltyrev’s bound [58], Lemma 1. By allowing this weakening, we obtain a bound in a somewhat more closed form, also connecting the decodability with smoothing. We also prove a simple bound for the error probability of list decoding expressed in terms of the code’s distance distribution (and, from (A4), also in terms of the dual distance distribution). The latter result seems not to have appeared in earlier literature.
The following version of this lemma provides an error bound, which is useful in the asymptotic setting.
Proposition 5. 
Let $t = \delta n + n^\theta$, where $\theta \in (1/2, 1)$. Then,
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) \le \frac{2\sqrt{n}}{L V_t} \left( \frac{1-\delta}{\delta} \right)^{2 n^\theta} \sum_{w=1}^{n} \mu_t(w) A_w + 2 e^{-n^{2\theta - 1}}.$$
In particular,
$$P_{L,t}(\mathcal C, \mathrm{BSC}(\delta)) \le \frac{2\sqrt{n}}{V_t} \left( \frac{1-\delta}{\delta} \right)^{2 n^\theta} \sum_{w=1}^{n} \mu_t(w) A_w + 2 e^{-n^{2\theta - 1}}.$$
Proof. 
Set $t' = \delta n - n^\theta$. A direct calculation shows that
$$\beta_\delta(t') V_t < 2\sqrt{n} \left( \frac{1-\delta}{\delta} \right)^{2 n^\theta}.$$
By the Hoeffding bound,
$$\Pr(|Y| \le t' \cup |Y| > t) \le 2 e^{-n^{2\theta - 1}}.$$
Together with Theorem 11, this implies our statements. □
A question of prime importance is whether the right-hand side quantities in Proposition 5 converge to 0. For $R < 1 - h(\delta)$, one can easily see that, for random codes, $\sum_{w=1}^{n} \frac{\mu_t(w)}{V_t} A_w = 2^{-\Theta(n)}$, where $t = \delta n + n^\theta$, showing that this is, in fact, the case.
From Proposition 4, it is clear that the potential energy $\sum_{w=1}^{n} \mu_t(w) A_w$ is a measure of the smoothness of $T_{b_t} f_{\mathcal C}$. This implies that codes that are sufficiently smoothable with respect to $b_t$ are decodable on the BSC with vanishing error probability. In other words, Proposition 5 establishes a connection between smoothing and the decoding error probability.

7. Perfect Smoothing—The Finite Case

In this section, we briefly overview another form of perfect smoothing, which is historically the earliest application of these ideas in coding theory. It is not immediately related to the information-theoretic problems considered in the other parts.
We are interested in radial kernels that yield perfect smoothing for a given code. We often write $r(i)$ instead of $r(x)$ if $|x| = i$, and call $\rho(r) := \max\{i : r(i) \ne 0\}$ the radius of $r$. Note that the logarithm of the support size of $r$ (as a function on the space $H_n$) is exactly the Rényi entropy of order 0 of $r$. Therefore, kernels with smaller radii can be perceived as less random, supporting the view of the radius $\rho(r)$ as a general measure of randomness.
Definition 4. 
We say a code $\mathcal{C}$ is perfectly smoothable with respect to $r$ if $T_r f_{\mathcal{C}}(x) = \frac{1}{2^n}$ for all $x \in H_n$, and, in this case, we say that $r$ is a perfectly smoothing kernel for $\mathcal{C}$.
Intuitively, such a kernel should have a sufficiently large radius. In particular, it should be at least as large as the covering radius $\rho(\mathcal{C})$ of the code; otherwise, smoothing does not affect the vectors that are more than $\rho(r)$ away from the code. To obtain a stronger condition, recall that the external distance of the code $\mathcal{C}$ is $\bar{d}(\mathcal{C}) = |\{i \ge 1 : A_i^\perp \ne 0\}|$.
Proposition 6. 
Let $r$ be a perfectly smoothing kernel of the code $\mathcal{C}$. Then, $\rho(r) \ge \bar{d}(\mathcal{C})$.
Proof. 
Note that perfect smoothing of $\mathcal{C}$ with respect to $r$ is equivalent to
$$\|2^n T_r f_{\mathcal{C}}\|_2^2 = 1,$$
which by Proposition A1 is equivalent to the following condition:
$$\sum_{i=1}^{n} \hat{r}(i)^2 A_i^\perp = 0.$$
Therefore,
$$\bar{d}(\mathcal{C}) = |\{i \ge 1 : A_i^\perp \ne 0\}| \le n - |\{i \ge 1 : \hat{r}(i) \ne 0\}|.$$
By definition,
$$\hat{r} = \frac{1}{2^n} K r,$$
where $K = (K_i(j))_{i,j=0}^{n}$ is the Krawtchouk matrix. Define $I_1 = \{j \in \{1,2,\dots,n\} : \hat{r}(j) = 0\}$ and $I_2 = \{i \in \{0,1,\dots,n\} : r(i) \ne 0\}$; then,
$$0 = \hat{r}\,|_{I_1} = \frac{1}{2^n}\, K|_{(I_1,\,:)}\; r = \frac{1}{2^n}\, K|_{(I_1,\,I_2)}\; r|_{I_2}.$$
This relation implies that there exists a nonzero linear combination of Krawtchouk polynomials of degree at most $\rho(r)$ with $|I_1|$ roots. Therefore, $\bar{d}(\mathcal{C}) \le n - |\mathrm{supp}(\{\hat{r}(i)\}_{i=1}^{n})| = |I_1| \le \rho(r)$. □
Since $\rho(\mathcal{C}) \le \bar{d}(\mathcal{C})$, this inequality strengthens the obvious condition $\rho(r) \ge \rho(\mathcal{C})$. At the same time, there are codes that are perfectly smoothable by a radial kernel $r$ such that $\rho(r) = \rho(\mathcal{C})$.
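As a concrete data point (our own sketch, not an example from the paper), the external distance of the $[7,4]$ Hamming code can be read off the weight distribution of its dual, a $[7,3]$ simplex code; here $\bar{d}(\mathcal{C}) = 1$ coincides with the covering radius.

```python
# External distance of the [7,4] Hamming code via its dual (the simplex code).
import itertools

import numpy as np

# Generator matrix of a [7,3] simplex code (columns of the Hamming check matrix).
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
dual = np.array([(m @ H) % 2 for m in itertools.product([0, 1], repeat=3)])
Aperp = np.bincount(dual.sum(axis=1), minlength=8)  # all nonzero weights equal 4
ext_distance = np.count_nonzero(Aperp[1:])
print(ext_distance)  # 1, matching the covering radius of the Hamming code
```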
Definition 5 
([59]). A code $\mathcal{C}$ is uniformly packed in the wide sense if there exist rational numbers $\{\alpha_i\}_{i=0}^{\rho(\mathcal{C})}$ such that
$$\sum_{i=0}^{\rho(\mathcal{C})} \alpha_i A_i(x) = 1 \quad\text{for all } x \in H_n,$$
where $A_i(x) = |\{c \in \mathcal{C} : d(c,x) = i\}|$ is the weight distribution of the code $\mathcal{C} - x$.
Our main observation here is that some uniformly packed codes are perfectly smoothable with respect to noise kernels that are minimal in a sense. The following proposition states this more precisely.
Proposition 7. 
Let $\mathcal{C}$ be a code that is perfectly smoothable by a radial kernel of radius $\rho(r) = \rho(\mathcal{C})$. Then, $\mathcal{C}$ is uniformly packed in the wide sense with $\alpha_i \ge 0$ for all $i$.
Proof. 
By definition, if $\mathcal{C}$ is perfectly smoothable with respect to $r$, then $2^n T_r f_{\mathcal{C}} = 1$, which is tantamount to $\sum_{y \in H_n} \frac{2^n}{|\mathcal{C}|}\, r(y)\, 1_{\mathcal{C}}(x - y) = 1$ for all $x \in H_n$. This condition can be written as $\sum_{i=0}^{\rho(\mathcal{C})} \frac{2^n}{|\mathcal{C}|}\, r(i) A_i(x) = 1$ for all $x \in H_n$, completing the proof (with $\alpha_i = 2^n r(i)/|\mathcal{C}| \ge 0$). □
To illustrate this claim, we list several families of uniformly packed codes ([59,60,61]) that are perfectly smoothable by a kernel of radius equal to the covering radius of the code.
(i)
Perfect codes: $r = b_\rho$, where $\rho = \rho(\mathcal{C})$ is the covering radius.
(ii)
2-error-correcting BCH codes of length $2^{2m+1} - 1$, $m \ge 2$. The smoothing kernel $r$ is given by
$$r(0) = r(1) = L, \quad r(2) = r(3) = \frac{3L}{n}, \quad r(i) = 0, \; i \ge 4.$$
(iii)
Preparata codes. The smoothing kernel $r$ is given by
$$r(0) = r(1) = L, \quad r(2) = r(3) = \frac{6L}{n-1}, \quad r(i) = 0, \; i \ge 4.$$
(iv)
Binary $(2^m - 1,\, 2^{2^m - 3m + 2},\, 7)$ Goethals-like codes [60]. The smoothing kernel $r$ is given by
$$r(0) = r(1) = L, \quad r(2) = r(3) = \frac{65L}{2n}, \quad r(4) = r(5) = \frac{30L}{n(n-3)}, \quad r(i) = 0, \; i \ge 6.$$
Here, L is a generic notation for the normalizing factor. More examples are found in a related class of completely regular codes [62].
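Item (i) in the list above is easy to verify numerically. The sketch below is our own check: it smooths the $[7,4]$ Hamming code, a perfect code with $\rho(\mathcal{C}) = 1$, with the ball kernel $b_1$ and confirms that the output is exactly uniform.

```python
# Perfect smoothing of a perfect code: T_{b_1} f_C is uniform on H_7.
import itertools

import numpy as np

n = 7
G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
code = np.array([(m @ G) % 2 for m in itertools.product([0, 1], repeat=4)])
pts = np.array(list(itertools.product([0, 1], repeat=n)))

# T_r f_C(x) = (1/|C|) sum_{c in C} r(x - c), with r = b_1 uniform on a
# ball of volume 1 + 7 = 8.
dist = (pts[:, None, :] != code[None, :, :]).sum(axis=2)   # 128 x 16 distances
smoothed = (dist <= 1).sum(axis=1) / (len(code) * 8.0)
print(np.allclose(smoothed, 1 / 2.0**n))  # True: perfectly smoothed
```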
Definition 5 does not include the condition that $\alpha_i \ge 0$, and, in fact, there are codes that are uniformly packed in the wide sense for which some of the $\alpha_i$'s are negative; thus, they are not smoothable by a noise kernel of radius $\rho(\mathcal{C})$. One such family is the 3-error-correcting binary BCH codes of length $2^{2m+1} - 1$, $m \ge 2$ [60].

Author Contributions

Conceptualization, A.B.; Formal analysis, M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation, USA, grant numbers CCF-2104489 and CCF-2110113.

Data Availability Statement

Not applicable.

Acknowledgments

We are grateful to the reviewers for their feedback on our manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. L2 Smoothing

The Fourier transform of a function $f: H_n \to \mathbb{R}$ is a function on the dual group $\hat{H}_n$, which we identify with $H_n$:
$$\hat{f}(y) = \frac{1}{2^n} \sum_{x \in H_n} f(x) (-1)^{x \cdot y}, \quad y \in H_n.$$
The Fourier transform of the indicator function of the sphere is given by $\hat{1}_{S(0,t)} = \frac{1}{2^n} K_t$, where $K_t(x) = K_t^{(n)}(x) = \sum_{j=0}^{t} (-1)^j \binom{x}{j} \binom{n-x}{t-j}$ is a Krawtchouk polynomial of degree $t$. Then, clearly, the Fourier transform of the indicator of the ball is
$$\hat{1}_{B(0,t)} = \frac{1}{2^n} L_t,$$
where $L_t(x) := \sum_{i=0}^{t} K_i(x)$ is called the Lloyd polynomial ([63], p. 64). The intersection of balls in (1) can be written as $1_{B(0,t)} * 1_{B(x,t)}$, which implies the expression ([42], Lemma 4.1)
$$\mu_t(i) = 2^{-n} \sum_{k=0}^{n} L_t(k)^2 K_k(i). \tag{A3}$$
Given a code $\mathcal{C} \subset H_n$, we define the dual distance distribution of $\mathcal{C}$ as the set of numbers $A_j^\perp := \frac{1}{|\mathcal{C}|} \sum_{i=0}^{n} A_i K_j(i)$, where $(A_i)_{i=0}^{n}$ is the distance distribution of $\mathcal{C}$ (2). Note that when $\mathcal{C}$ is linear, the set $(A_j^\perp)_{j=0}^{n}$ coincides with the distance distribution of its dual code $\mathcal{C}^\perp$. For a radial potential $V$ on $H_n$ and a code $\mathcal{C}$, we have
$$\sum_{i=0}^{n} V(i) A_i = |\mathcal{C}| \sum_{k=0}^{n} \hat{V}(k) A_k^\perp. \tag{A4}$$
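The following sketch (ours, again on the $[7,4]$ Hamming code) computes the Krawtchouk polynomials and the dual distance distribution; as expected from the remark above, $A^\perp$ matches the weight distribution of the simplex code.

```python
# Krawtchouk polynomials and the dual distance distribution A_j^perp.
import itertools
from math import comb

import numpy as np

n = 7

def K(k, x):
    """Krawtchouk polynomial K_k(x) for the binary Hamming space of length n."""
    return sum((-1)**j * comb(x, j) * comb(n - x, k - j) for j in range(k + 1))

G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
code = np.array([(m @ G) % 2 for m in itertools.product([0, 1], repeat=4)])
A = np.bincount(code.sum(axis=1), minlength=n + 1)   # A = [1,0,0,7,7,0,0,1]

# A_j^perp = (1/|C|) sum_i A_i K_j(i): should be [1, 0, 0, 0, 7, 0, 0, 0].
Aperp = [sum(A[i] * K(j, i) for i in range(n + 1)) / len(code)
         for j in range(n + 1)]
print(np.round(Aperp, 6))
```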
The L 2 -smoothness of a noisy code distribution can be written in terms of the distance distribution or of the dual distance distribution.
Proposition A1. 
Let $\mathcal{C}$ be a code and $r$ be a noise kernel. Then,
$$\|2^n T_r f_{\mathcal{C}}\|_2^2 = \frac{2^n}{|\mathcal{C}|} \sum_{i=0}^{n} (r * r)(i) A_i = 4^n \sum_{k=0}^{n} \hat{r}(k)^2 A_k^\perp.$$
Proof. 
Let us prove the first equality:
$$\begin{aligned}
\|2^n T_r f_{\mathcal{C}}\|_2^2 &= \frac{1}{2^n} \sum_{x \in H_n} \big(2^n T_r f_{\mathcal{C}}(x)\big)^2 = \frac{2^n}{|\mathcal{C}|^2} \sum_{x \in H_n} (r * 1_{\mathcal{C}})(x)^2 \\
&= \frac{2^n}{|\mathcal{C}|^2} \sum_{x \in H_n} \Big(\sum_{y \in H_n} r(x-y)\, 1_{\mathcal{C}}(y)\Big) \Big(\sum_{z \in H_n} r(x-z)\, 1_{\mathcal{C}}(z)\Big) \\
&= \frac{2^n}{|\mathcal{C}|^2} \sum_{y \in \mathcal{C}} \sum_{z \in \mathcal{C}} \sum_{x \in H_n} r(x-y)\, r(x-z) = \frac{2^n}{|\mathcal{C}|^2} \sum_{y \in \mathcal{C}} \sum_{z \in \mathcal{C}} (r * r)(y - z) = \frac{2^n}{|\mathcal{C}|} \sum_{i=0}^{n} (r * r)(i) A_i.
\end{aligned}$$
The second equality is immediate by noticing that $\widehat{r * r} = 2^n \hat{r}^2$ and using (A4). □

Appendix B. Proof of Theorem 3

We will first establish Theorem 3 when $\alpha$ is rational and then use a density argument to extend the proof to all real numbers. The case $\alpha = \infty$ is handled separately at the end of this appendix.
We will use the following technical claim:
Lemma A1. 
Let $x$ and $y$ be two non-negative reals. Further, let $p$ and $q$ be positive integers. Then,
$$(x+y)^{p/q} \le \sum_{k=0}^{p} \binom{p}{k}\, x^{k/q}\, y^{(p-k)/q}.$$
Proof. 
Clearly, $(x+y)^{1/q} \le x^{1/q} + y^{1/q}$. Therefore,
$$(x+y)^{p/q} \le \big(x^{1/q} + y^{1/q}\big)^p = \sum_{k=0}^{p} \binom{p}{k}\, x^{k/q}\, y^{(p-k)/q}. \qquad \square$$
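A one-line numeric sanity check of Lemma A1 (our own, with the arbitrary values $x = 2$, $y = 3$, $p = 3$, $q = 2$):

```python
# Sanity check of the fractional-power binomial inequality of Lemma A1.
from math import comb

x, y, p, q = 2.0, 3.0, 3, 2
lhs = (x + y) ** (p / q)
rhs = sum(comb(p, k) * x ** (k / q) * y ** ((p - k) / q) for k in range(p + 1))
print(lhs <= rhs, round(lhs, 3), round(rhs, 3))  # True 11.18 31.145
```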
For $M \ge 1$, let $\mathcal{C} = (c_0, c_1, \dots, c_{M-1})$ be a code whose codewords are chosen randomly and independently from $H_n$. For $\alpha \in [0, \infty)$, define
$$Q_n(\alpha) = \mathbb{E}_{\mathcal{C}}\, 2^{(\alpha - 1) D_\alpha(T_r f_{\mathcal{C}} \,\|\, U_n)}. \tag{A6}$$
For $\alpha > 0$, $Q_n(\alpha) = \mathbb{E}_{\mathcal{C}} \|2^n T_r f_{\mathcal{C}}\|_\alpha^\alpha$. Clearly, $Q_n(1) = 1$, $Q_n(\alpha) \le 1$ for $\alpha \in [0,1)$, and $Q_n(\alpha) \ge 1$ for $\alpha > 1$.
In the next lemma, we obtain a recursive bound for Q n . We will then use an induction argument to show the full result.
Lemma A2. 
Let $\alpha = \frac{p}{q} + 1$ and let $\mathcal{C} \subset H_n$ be a random code of size $M = 2^{nR}$. Then,
$$Q_n(\alpha) \le \sum_{k=0}^{p} \binom{p}{k}\, 2^{\frac{nk}{q}\left(1 - R - \frac{1}{n} H_{1+k/q}(r)\right)}\, Q_n\!\Big(\frac{p-k}{q}\Big).$$
Proof. 
In the calculation below, we write $\mathbb{E}$ for $\mathbb{E}_{\mathcal{C}}$. Starting with (A6), we obtain
$$\begin{aligned}
Q_n(\alpha) &= \mathbb{E}\, \frac{1}{2^n} \sum_{x \in H_n} \big[2^n (r * f_{\mathcal{C}})(x)\big]^\alpha = 2^{n(\alpha-1)}\, \mathbb{E} \sum_{x \in H_n} \Big[\sum_{z \in \mathcal{C}} \frac{r(x-z)}{M}\Big]^\alpha = \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{x \in H_n} \mathbb{E} \Big[\sum_{i=0}^{M-1} r(x - c_i)\Big]^\alpha \\
&= \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{x \in H_n} \mathbb{E} \sum_{i=0}^{M-1} r(x - c_i) \Big[\sum_{j=0}^{M-1} r(x - c_j)\Big]^{\alpha - 1} = \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{x \in H_n} \mathbb{E} \sum_{i=0}^{M-1} r(x - c_i) \Big[r(x - c_i) + \sum_{j=0,\, j \ne i}^{M-1} r(x - c_j)\Big]^{p/q} \\
&\le \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{x \in H_n} \mathbb{E} \sum_{i=0}^{M-1} r(x - c_i) \sum_{k=0}^{p} \binom{p}{k}\, r(x - c_i)^{k/q} \Big[\sum_{j=0,\, j \ne i}^{M-1} r(x - c_j)\Big]^{(p-k)/q} \\
&= \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{k=0}^{p} \binom{p}{k} \sum_{x \in H_n} \sum_{i=0}^{M-1} \mathbb{E}\Big[r(x - c_i)^{1 + k/q}\Big]\, \mathbb{E}\Big[\sum_{j=0,\, j \ne i}^{M-1} r(x - c_j)\Big]^{(p-k)/q},
\end{aligned}$$
where $c_i$, $i = 0, \dots, M-1$, are random codewords in the code $\mathcal{C}$, and the last step uses the independence of $c_i$ and $(c_j)_{j \ne i}$. Recalling that $\mathbb{E}\, r(x - c_i)^a = \|r\|_a^a$ for any $a > 0$, we continue as follows:
$$\begin{aligned}
&\le \frac{2^{n(\alpha-1)}}{M^\alpha} \sum_{k=0}^{p} \binom{p}{k} \sum_{x \in H_n} M\, \|r\|_{1+k/q}^{1+k/q}\, \mathbb{E}\Big[\sum_{j=0}^{M-1} r(x - c_j)\Big]^{(p-k)/q} = \frac{2^{n(\alpha-1)}}{M^{\alpha-1}} \sum_{k=0}^{p} \binom{p}{k}\, \|r\|_{1+k/q}^{1+k/q}\, \mathbb{E} \sum_{x \in H_n} \Big[\sum_{j=0}^{M-1} r(x - c_j)\Big]^{(p-k)/q} \\
&= \frac{2^{np/q}}{M^{p/q}} \sum_{k=0}^{p} \binom{p}{k}\, \|r\|_{1+k/q}^{1+k/q}\, Q_n\!\Big(\frac{p-k}{q}\Big)\, M^{(p-k)/q}\, 2^{-n\left(\frac{p-k}{q} - 1\right)} = \sum_{k=0}^{p} \binom{p}{k}\, 2^{n(1 + k/q)}\, M^{-k/q}\, \|r\|_{1+k/q}^{1+k/q}\, Q_n\!\Big(\frac{p-k}{q}\Big) \\
&= \sum_{k=0}^{p} \binom{p}{k}\, 2^{\frac{nk}{q}\left(1 - R - \frac{H_{1+k/q}(r)}{n}\right)}\, Q_n\!\Big(\frac{p-k}{q}\Big),
\end{aligned}$$
where we used (5) and the fact that r is a pmf. □
On account of (14), (A6), and Lemma 1, to prove Theorem 3, we need to prove the following:
Theorem A1. 
Consider a sequence of ensembles of random codes of increasing length $n$ and rate $R_n \to R$. If $R > 1 - \pi(\alpha)$, where $\pi(\alpha)$ is given by (15), then
$$\lim_{n \to \infty} Q_n(\alpha) = 1 \tag{A8}$$
for all $\alpha \in (1, \infty)$.
We start with the case of rational α .
Proposition A2. 
Let $\alpha \ge 0$ be rational. If $R > 1 - \pi(\alpha)$, then $\limsup_{n} Q_n(\alpha) \le 1$.
Proof. 
This statement is true for all $0 \le \alpha < 1$, and so, in particular, for all rational $\alpha$ in $[0,1)$.
Assume that it holds for all rational $\alpha$ in $[0, m)$, where $m \in \mathbb{Z}^+$. Let $\alpha \in [m, m+1)$ and choose integers $p \ge 0$ and $q \ge 1$ such that $\alpha = 1 + \frac{p}{q}$. By Lemma A2,
$$\limsup_{n} Q_n(\alpha) \le \limsup_{n} \sum_{k=0}^{p} \binom{p}{k}\, 2^{\frac{nk}{q}\left(1 - R_n - \frac{H_{1+k/q}(r_n)}{n}\right)}\, Q_n\!\Big(\frac{p-k}{q}\Big) \le \sum_{k=0}^{p} \binom{p}{k}\, \limsup_{n}\, 2^{\frac{nk}{q}\left(1 - R_n - \frac{H_{1+k/q}(r_n)}{n}\right)}\, \limsup_{n}\, Q_n\!\Big(\frac{p-k}{q}\Big).$$
If $R > 1 - \pi(\alpha)$, then evidently, $R > 1 - \pi(1 + k/q)$ for all $k \le p$. Therefore,
$$\limsup_{n}\, 2^{\frac{nk}{q}\left(1 - R_n - \frac{H_{1+k/q}(r_n)}{n}\right)} = 0$$
for all $k > 0$. Since $\frac{p}{q} < m$, by the induction hypothesis, we have $\limsup_{n} Q_n\big(\frac{p-k}{q}\big) \le 1$ for $k = 0, 1, \dots, p$. Therefore, all the terms except the one with $k = 0$ vanish, yielding $\limsup_{n} Q_n(\alpha) \le 1$. □
Since $Q_n(\alpha) \ge 1$ for $\alpha > 1$, this proves Theorem A1 for all rational $\alpha \in (1, \infty)$.
Finally, let us extend this result to all real α > 1 . As a first step, let us show that π ( α ) is continuous.
Lemma A3. 
$\pi(\alpha)$ is continuous for $1 < \alpha < \infty$.
Proof. 
From the monotonicity of the Rényi entropies, for $\alpha' > \alpha > 1$,
$$0 \le \pi(\alpha) - \pi(\alpha') = \liminf_{n} \frac{1}{n} H_\alpha(r_n) - \liminf_{n} \frac{1}{n} H_{\alpha'}(r_n).$$
Now, let us choose a subsequence $(r_{n_k})_k$ such that
$$\lim_{k} \frac{1}{n_k} H_{\alpha'}(r_{n_k}) = \liminf_{n} \frac{1}{n} H_{\alpha'}(r_n).$$
Therefore,
$$\pi(\alpha) - \pi(\alpha') = \liminf_{n} \frac{1}{n} H_\alpha(r_n) - \lim_{k} \frac{1}{n_k} H_{\alpha'}(r_{n_k}) \le \liminf_{k} \frac{1}{n_k}\big(H_\alpha(r_{n_k}) - H_{\alpha'}(r_{n_k})\big).$$
Note that $H_\alpha$ is a continuous function of the order $\alpha$ for $\alpha > 1$. We use the mean value theorem to claim that there is a value $\gamma_k \in (\alpha, \alpha')$ such that $H_\alpha(r_{n_k}) - H_{\alpha'}(r_{n_k}) = -(\alpha' - \alpha)\, \frac{d}{d\gamma} H_\gamma(r_{n_k})\big|_{\gamma = \gamma_k}$. Next, for any probability vector $P$,
$$\frac{d H_\alpha(P)}{d\alpha} = -\frac{1}{(1-\alpha)^2}\, D(Z \| P) \ge -\frac{\log |\mathrm{supp}(P)|}{(1-\alpha)^2},$$
where $Z_i = \frac{P_i^\alpha}{\sum_j P_j^\alpha}$. Taking these remarks together and using $|\mathrm{supp}(r_{n_k})| \le 2^{n_k}$, we obtain
$$\pi(\alpha) - \pi(\alpha') \le \liminf_{k} \frac{\alpha' - \alpha}{n_k} \Big|\frac{d}{d\gamma} H_\gamma(r_{n_k})\big|_{\gamma = \gamma_k}\Big| \le \liminf_{k} \frac{(\alpha' - \alpha)\, n_k}{n_k (\gamma_k - 1)^2} \le \frac{\alpha' - \alpha}{(\alpha - 1)^2}.$$
Therefore, $\pi(\alpha)$ is continuous on $(1, \infty)$. □
Now, let $\alpha \in (1, \infty)$ and assume $R > 1 - \pi(\alpha)$. Choose a rational $\alpha' > \alpha$ such that $R > 1 - \pi(\alpha')$; this is possible by the continuity of $\pi$. Therefore,
$$1 \le \limsup_{n} Q_n(\alpha) \le \limsup_{n} Q_n(\alpha') = 1,$$
where we used that $Q_n(\alpha)$ is nondecreasing in $\alpha$. This proves that (A8) and Theorem 3 hold for all $\alpha \in [1, \infty)$.
It remains to address the case $\alpha = \infty$. We obtain the following upper bound, whose proof closely follows an argument in Appendix E of [3].
Lemma A4. 
Let $\epsilon > 0$. We have
$$\mathbb{E}_{\mathcal{C}}\, \|2^n T_r f_{\mathcal{C}}\|_\infty \le 1 + \epsilon + 2^{2n - H_\infty(r)}\, \exp\Big(-\frac{3\epsilon^2}{2(3+\epsilon)}\, 2^{-[n(1-R) - H_\infty(r)]}\Big).$$
Proof. 
Let $\epsilon > 0$; then,
$$\begin{aligned}
\mathbb{E}_{\mathcal{C}}\, \|2^n T_r f_{\mathcal{C}}\|_\infty &= \mathbb{E}_{\mathcal{C}}\big[\|2^n T_r f_{\mathcal{C}}\|_\infty\, 1\{\|2^n T_r f_{\mathcal{C}}\|_\infty \ge 1 + \epsilon\}\big] + \mathbb{E}_{\mathcal{C}}\big[\|2^n T_r f_{\mathcal{C}}\|_\infty\, 1\{\|2^n T_r f_{\mathcal{C}}\|_\infty < 1 + \epsilon\}\big] \\
&\le \mathbb{E}_{\mathcal{C}}\big[\|2^n T_r f_{\mathcal{C}}\|_\infty\, 1\{\|2^n T_r f_{\mathcal{C}}\|_\infty \ge 1 + \epsilon\}\big] + 1 + \epsilon \\
&\le \|2^n r\|_\infty\, \mathbb{E}_{\mathcal{C}}\big[1\{\|2^n T_r f_{\mathcal{C}}\|_\infty \ge 1 + \epsilon\}\big] + 1 + \epsilon \\
&= \|2^n r\|_\infty\, \Pr_{\mathcal{C}}\Big(\max_{y \in H_n} 2^n T_r f_{\mathcal{C}}(y) \ge 1 + \epsilon\Big) + 1 + \epsilon \\
&\le \|2^n r\|_\infty\, 2^n \max_{y \in H_n} \Pr_{\mathcal{C}}\big(2^n T_r f_{\mathcal{C}}(y) \ge 1 + \epsilon\big) + 1 + \epsilon. \tag{A9}
\end{aligned}$$
For any $y \in H_n$,
$$\begin{aligned}
\Pr_{\mathcal{C}}\big(2^n T_r f_{\mathcal{C}}(y) \ge 1 + \epsilon\big) &= \Pr_{\mathcal{C}}\Big(\frac{2^n}{M} \sum_{z \in \mathcal{C}} r(y - z) \ge 1 + \epsilon\Big) = \Pr_{c_i \sim U_n,\, \mathrm{iid}}\Big(\frac{2^n}{M} \sum_{i=1}^{M} r(y - c_i) \ge 1 + \epsilon\Big) \\
&= \Pr\Big(\sum_{i=1}^{M} \big(2^n r(y - c_i) - 1\big) \ge M\epsilon\Big). \tag{A10}
\end{aligned}$$
To bound the last line from above, we use Bernstein's inequality: for independent, zero-mean random variables $X_i$, $i = 1, \dots, N$, such that $|X_i| \le a$ for all $i$,
$$\Pr\Big(\sum_{i} X_i \ge t\Big) \le \exp\Big(-\frac{t^2/2}{\sum_{i=1}^{N} \mathbb{E} X_i^2 + \frac{1}{3}\, a\, t}\Big).$$
Note that for a random uniform vector $c_i$, the expectation $\mathbb{E}[r(y - c_i)] = 2^{-n}$ since $r(\cdot)$ satisfies $\sum_{x \in H_n} r(x) = 1$, so this inequality applies to (A10). We obtain
$$\begin{aligned}
\Pr_{\mathcal{C}}\big(2^n T_r f_{\mathcal{C}}(y) \ge 1 + \epsilon\big) &\le \exp\Big(-\frac{\frac{1}{2} M^2 \epsilon^2}{\sum_{i=1}^{M} \mathrm{Var}\big(2^n r(y - c_i)\big) + \frac{1}{3} \|2^n r\|_\infty M \epsilon}\Big) \le \exp\Big(-\frac{\frac{1}{2} M^2 \epsilon^2}{\sum_{i=1}^{M} \|2^n r\|_2^2 + \frac{1}{3} \|2^n r\|_\infty M \epsilon}\Big) \\
&\le \exp\Big(-\frac{\frac{1}{2} M^2 \epsilon^2}{M \|2^n r\|_1 \|2^n r\|_\infty + \frac{1}{3} \|2^n r\|_\infty M \epsilon}\Big) = \exp\Big(-\frac{3\epsilon^2}{2(3+\epsilon)}\, 2^{-[n(1-R) - H_\infty(r)]}\Big),
\end{aligned}$$
where on the last line, we use the equalities $\|2^n r\|_1 = 1$ and $\|2^n r\|_\infty = 2^{D_\infty(r \| U_n)} = 2^{n - H_\infty(r)}$. The proof is concluded by substituting this inequality into (A9). □
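The following Monte Carlo sketch (our own) illustrates Bernstein's inequality as used in the proof, with Rademacher variables standing in for the centered terms $2^n r(y - c_i) - 1$; the parameters are arbitrary.

```python
# Monte Carlo sanity check of Bernstein's inequality for bounded, zero-mean X_i.
import numpy as np

rng = np.random.default_rng(0)
N, t, a, trials = 100, 20.0, 1.0, 200_000
S = rng.choice([-1.0, 1.0], size=(trials, N)).sum(axis=1)   # sums of Rademachers
empirical = np.mean(S >= t)
bound = np.exp(-(t**2 / 2) / (N * 1.0 + a * t / 3))          # sum E X_i^2 = N
print(empirical, bound)  # empirical ≈ 0.03 sits below the bound ≈ 0.15
```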
Now, let us consider a sequence of (ensembles of) random codes of increasing length $n$ and rate $R_n \to R$. Recalling the definition of $\pi(\cdot)$ in (15), for $n \to \infty$, we obtain
$$\limsup_{n} \mathbb{E}_{\mathcal{C}_n} \|2^n T_{r_n} f_{\mathcal{C}_n}\|_\infty \le 1 + \epsilon \tag{A11}$$
once $R > 1 - \pi(\infty)$. Since $\epsilon$ is arbitrarily small, the left-hand side of (A11) approaches one, and together with (14), this completes the proof of Theorem 3.

Appendix C. Samorodnitsky’s Inequalities and Their Implications

Samorodnitsky [8,10] recently proved certain powerful inequalities for the $\alpha$-norms of noisy functions, which permit us to estimate the proximity to uniformity under the action of Bernoulli noise kernels. We state some of them in this appendix after introducing a few more elements of notation. These results are used in Theorem 7 and in Appendix D, where we prove Lemma 2.
In this appendix, we write $[n]$ for $\{1, \dots, n\}$. For a subset $\Gamma \subset [n]$, write $x|_\Gamma$ to denote the coordinate projection of a vector $x \in H_n$ on $\Gamma$. If the subset $\Gamma$ is formed by random choice with $\Pr(i \in \Gamma) = \lambda$ independently for all $i \in [n]$, we write $\Gamma \sim \lambda$. For a function $f$ on $H_n$, let
$$E(f|\Gamma)(x) = \frac{1}{2^{n - |\Gamma|}} \sum_{y:\, y|_\Gamma = x|_\Gamma} f(y). \tag{A12}$$
Observe that $E(f|\Gamma) = f * f_{H_{[n] \setminus \Gamma}}$, where $H_S = \{x \in H_n : x|_{[n] \setminus S} = 0\}$. Therefore, $E(f|\Gamma)(x)$ is the noisy version of $f$ with respect to the pmf given by the normalized indicator function of the subcube $H_{[n] \setminus \Gamma}$.
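A small sketch (ours, with an arbitrary test function on $H_4$) of the conditional averaging in (A12): $E(f|\Gamma)$ averages $f$ over the coordinates outside $\Gamma$.

```python
# Conditional averaging E(f|Gamma) over a random subcube, computed directly.
import itertools

import numpy as np

n = 4
pts = np.array(list(itertools.product([0, 1], repeat=n)))
f = pts.sum(axis=1).astype(float)          # test function: Hamming weight
Gamma = [0, 2]                             # revealed coordinates (our choice)

def E_cond(f, Gamma):
    out = np.empty_like(f)
    for i, x in enumerate(pts):
        mask = np.all(pts[:, Gamma] == x[Gamma], axis=1)
        out[i] = f[mask].mean()            # average over y with y|_Gamma = x|_Gamma
    return out

print(E_cond(f, Gamma))  # each value: (# of ones on Gamma) + 1
```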
The entropy of a non-negative function $f$ on $H_n$ is defined as
$$\mathrm{Ent}[f] = \|f \log f\|_1 - \|f\|_1 \log(\|f\|_1) = \Big\|f \log \frac{f}{\|f\|_1}\Big\|_1.$$
This quantity can be thought of as the KL divergence between the distribution induced by $f$ on $H_n$ and the uniform distribution:
$$\mathrm{Ent}[f] = \|f\|_1\, D(P_f \,\|\, U_n), \quad\text{where } P_f := \frac{f}{2^n \|f\|_1}.$$
If $f$ itself is a pmf, then $D(f \,\|\, U_n) = 2^n\, \mathrm{Ent}[f] = \mathrm{Ent}[2^n f]$.
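The identity $D(f \,\|\, U_n) = \mathrm{Ent}[2^n f]$ for a pmf $f$ is easy to confirm numerically; here is a sketch of ours on a random pmf over $H_4$.

```python
# Numeric check of D(f || U_n) = Ent[2^n f] for a pmf f on H_4 (in nats).
import numpy as np

n = 4
rng = np.random.default_rng(1)
f = rng.random(2**n)
f /= f.sum()                                   # a random pmf on H_n
g = 2**n * f                                   # its normalized density
D = np.sum(f * np.log(f * 2**n))               # KL divergence to uniform
# Ent[g] with respect to the normalized counting measure: ||g||_1 = mean(g) = 1.
Ent = np.mean(g * np.log(g)) - np.mean(g) * np.log(np.mean(g))
print(np.isclose(D, Ent))  # True
```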
Theorem A2 
([8], Corollary 9). Let $f$ be a non-negative function on $H_n$. Then,
$$\mathrm{Ent}[T_\delta f] \le \mathbb{E}_{\Gamma \sim \lambda}\, \mathrm{Ent}\big[E(f|\Gamma)\big],$$
where $\lambda = (1 - 2\delta)^2$.
Theorem A3 
([10], Theorem 1.1). Let $f$ be a non-negative function on $H_n$ and let $\alpha \ge 2$ be an integer. Then,
$$\log \|T_\delta f\|_\alpha \le \mathbb{E}_{\Gamma \sim \lambda} \log \|E(f|\Gamma)\|_\alpha, \tag{A15}$$
where $\lambda = \lambda(\alpha, \delta) = 1 + \frac{1}{\alpha - 1} \log\big(\delta^\alpha + (1-\delta)^\alpha\big) = 1 - h_\alpha(\delta)$. Furthermore,
$$\log \|T_\delta f\|_\infty \le \mathbb{E}_{\Gamma \sim \lambda} \log \|E(f|\Gamma)\|_\infty, \tag{A16}$$
where $\lambda = \lambda(\infty, \delta) = 1 + \log(1 - \delta) = 1 - h_\infty(\delta)$.
To interpret the inequalities (A15) and (A16), we note that their left-hand sides measure the smoothness of the noisy version of $f$ with respect to the noise $\beta_\delta$. At the same time, the right-hand sides give the average smoothness of the noisy versions of $f$ with respect to the sub-cube pmfs.
Hązła et al. [14] used Theorem A3 to great effect, showing that if a code corrects erasures up to a certain noise level on a BEC, then it also corrects errors on a BSC up to a corresponding noise level.
Theorem A4 
([14], Corollary 3.4). Let $(\mathcal{C}_n)_n$ be a sequence of codes whose rate approaches $R$. Assume that for some $\lambda \in (0, 1-R]$, $P_B(\mathrm{BEC}(\lambda), \mathcal{C}_n) = o\big(\frac{1}{n}\big)$. Then, $(\mathcal{C}_n)_n$ decodes errors on a $\mathrm{BSC}(\delta)$ for any $\delta$ that satisfies $2\sqrt{\delta(1-\delta)} < 2^{\lambda - 1}$.
This theorem implies the following corollary:
Corollary A1 
([14]). Let $(\mathcal{C}_n)_n$ be a sequence of codes with rate $R_n \to R$ that recover transmitted messages with high probability on a $\mathrm{BEC}(1-R)$ (i.e., $(\mathcal{C}_n)_n$ is a capacity-achieving sequence for the $\mathrm{BEC}(1-R)$). Furthermore, assume that $d(\mathcal{C}_n) = \omega(\log n)$. If $2\sqrt{\delta(1-\delta)} < 2^{-R}$, then, with high probability, the codes $\mathcal{C}_n$ correct errors when used on a $\mathrm{BSC}(\delta)$ channel.
The authors of [14] then used this result to show that RM codes of constant rate correct a non-vanishing fraction of errors on the BSC.

Appendix D. Proof of Lemma 2

We present the proof as a sequence of lemmas.
Let $\Gamma \subset \{1, \dots, n\}$ be a subset of coordinates, and for $z \in \{0,1\}^n$, let $\mathcal{C}(\Gamma, z) := \{c \in \mathcal{C} : c|_\Gamma = z|_\Gamma\}$ be the set of codewords that agree with $z$ in the positions of $\Gamma$. In particular, $\mathcal{C}_{\Gamma^c} = \mathcal{C}(\Gamma, 0)|_{\Gamma^c}$ is the shortened code of $\mathcal{C}$, i.e., the subcode with zeros in the positions of $\Gamma$, projected onto $\Gamma^c$. Let $F^{(\mathcal{C})}(\Gamma, z) := |\mathcal{C}(\Gamma, z)|$.
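The quantity $F^{(\mathcal{C})}(\Gamma, z)$ is the decoder's residual ambiguity after the BEC reveals only the coordinates in $\Gamma$. A small sketch of ours, on the $[7,4]$ Hamming code with an arbitrary revealed set:

```python
# F^{(C)}(Gamma, z): codewords consistent with z on the revealed coordinates.
import itertools

import numpy as np

G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
code = np.array([(m @ G) % 2 for m in itertools.product([0, 1], repeat=4)])

def F(Gamma, z):
    """Count codewords agreeing with z on the positions of Gamma."""
    return int(np.all(code[:, Gamma] == np.asarray(z)[Gamma], axis=1).sum())

Gamma = [0, 1, 2]            # positions 4..7 erased by the channel
print(F(Gamma, code[3]))     # 2: two codewords remain consistent
```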
Let us obtain expressions for the norms and the entropy of F ( C ) ( Γ , z ) .
Lemma A5. 
Let $\mathcal{C}$ be a linear code and let $\Gamma \subset \{1, \dots, n\}$. Then,
$$\|F^{(\mathcal{C})}(\Gamma, \cdot)\|_\alpha = \Big(\frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C})}(\Gamma, 0)^{\alpha - 1}\Big)^{1/\alpha}.$$
Proof. 
From the linearity of the code,
$$F^{(\mathcal{C})}(\Gamma, z) = \begin{cases} F^{(\mathcal{C})}(\Gamma, 0) & \text{if } z|_\Gamma = c|_\Gamma \text{ for some } c \in \mathcal{C}, \\ 0 & \text{otherwise.} \end{cases}$$
Furthermore, the number of distinct $z \in H_n$ for which $\mathcal{C}(\Gamma, z)$ is nonempty equals $2^{n - |\Gamma|}\, |\mathcal{C}/\mathcal{C}(\Gamma, 0)|$. Hence,
$$\|F^{(\mathcal{C})}(\Gamma, \cdot)\|_\alpha = \Big(\frac{1}{2^n} \sum_{x \in H_n} F^{(\mathcal{C})}(\Gamma, x|_\Gamma)^\alpha\Big)^{1/\alpha} = \Big(\frac{1}{2^n}\, \frac{2^n |\mathcal{C}|}{2^{|\Gamma|}\, F^{(\mathcal{C})}(\Gamma, 0)}\, F^{(\mathcal{C})}(\Gamma, 0)^\alpha\Big)^{1/\alpha} = \Big(\frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C})}(\Gamma, 0)^{\alpha - 1}\Big)^{1/\alpha}. \qquad \square$$
Lemma A6. 
Let $E(f|\Gamma)$ be defined as in (A12). Then,
$$\|E(2^n f_{\mathcal{C}}|\Gamma)\|_\alpha = \Big(\frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, 0)\Big)^{(\alpha - 1)/\alpha}.$$
Proof. 
Using $f = 2^n f_{\mathcal{C}}$ in (A12), we obtain
$$E(2^n f_{\mathcal{C}}|\Gamma)(x) = \frac{2^n}{2^{n - |\Gamma|}\, |\mathcal{C}|} \sum_{y \in \mathcal{C}:\, y|_\Gamma = x|_\Gamma} 1 = \frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, x|_\Gamma),$$
and, thus, from Lemma A5,
$$\|E(2^n f_{\mathcal{C}}|\Gamma)\|_\alpha = \frac{2^{|\Gamma|}}{|\mathcal{C}|}\, \|F^{(\mathcal{C})}(\Gamma, \cdot)\|_\alpha = \frac{2^{|\Gamma|}}{|\mathcal{C}|} \Big(\frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C})}(\Gamma, 0)^{\alpha - 1}\Big)^{1/\alpha} = \Big(\frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, 0)\Big)^{(\alpha - 1)/\alpha}. \qquad \square$$
Lemma A7. 
Let $\mathcal{C}$ be a linear code. For $X = X_{\mathcal{C}^\perp}$ and $Y = Y(X, \mathrm{BEC}(\lambda))$,
$$H(X|Y) = \mathbb{E}_{\Gamma \sim \lambda} \log\Big(\frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, 0)\Big). \tag{A17}$$
Proof. 
Start by taking $X = X_{\mathcal{C}}$ and $Y = Y(\mathrm{BEC}(\lambda), X_{\mathcal{C}})$; then,
$$H(X | Y = y) = \log\big[F^{(\mathcal{C})}(y)\big].$$
Therefore, with $\Gamma$ denoting the (random) set of non-erased coordinates,
$$H(X|Y) = \mathbb{E}_Y\big[\log\big(F^{(\mathcal{C})}(Y)\big)\big] = \mathbb{E}_\Gamma\, \mathbb{E}_{Z|\Gamma}\big[\log\big(F^{(\mathcal{C})}(\Gamma, Z)\big) \,\big|\, \Gamma\big] = \mathbb{E}_{\Gamma \sim 1-\lambda}\big[\log F^{(\mathcal{C})}(\Gamma, 0)\big].$$
By a standard identity about dual matroids ([64], p. 72),
$$\dim(\mathcal{C}_{\Gamma^c}) = \dim(\mathcal{C}) - |\Gamma| + \dim\big((\mathcal{C}^\perp)_\Gamma\big),$$
or
$$F^{(\mathcal{C})}(\Gamma, 0) = \frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C}^\perp)}(\Gamma^c, 0),$$
and, thus, we continue as follows:
$$H(X|Y) = \mathbb{E}_{\Gamma \sim 1-\lambda} \log\Big(\frac{|\mathcal{C}|}{2^{|\Gamma|}}\, F^{(\mathcal{C}^\perp)}(\Gamma^c, 0)\Big) = \mathbb{E}_{\Gamma^c \sim \lambda} \log\Big(\frac{2^{|\Gamma^c|}}{|\mathcal{C}^\perp|}\, F^{(\mathcal{C}^\perp)}(\Gamma^c, 0)\Big) = \mathbb{E}_{\Gamma \sim \lambda} \log\Big(\frac{2^{|\Gamma|}}{|\mathcal{C}^\perp|}\, F^{(\mathcal{C}^\perp)}(\Gamma, 0)\Big).$$
Switching to the dual code, i.e., taking $X = X_{\mathcal{C}^\perp}$ and $Y = Y(\mathrm{BEC}(\lambda), X_{\mathcal{C}^\perp})$, now yields (A17). □
Lemma A8. 
$$\frac{\alpha}{\alpha - 1}\, \mathbb{E}_{\Gamma \sim \lambda} \log \|E(2^n f_{\mathcal{C}}|\Gamma)\|_\alpha = \mathbb{E}_{\Gamma \sim \lambda}\, \mathrm{Ent}\big[E(2^n f_{\mathcal{C}}|\Gamma)\big] = H\big(X_{\mathcal{C}^\perp} \,\big|\, Y(\mathrm{BEC}(\lambda), X_{\mathcal{C}^\perp})\big).$$
Proof. 
From Lemmas A6 and A7,
$$\frac{\alpha}{\alpha - 1}\, \mathbb{E}_{\Gamma \sim \lambda} \log \|E(2^n f_{\mathcal{C}}|\Gamma)\|_\alpha = \mathbb{E}_{\Gamma \sim \lambda} \log\Big(\frac{2^{|\Gamma|}}{|\mathcal{C}|}\, F^{(\mathcal{C})}(\Gamma, 0)\Big) = H\big(X_{\mathcal{C}^\perp} \,\big|\, Y(\mathrm{BEC}(\lambda), X_{\mathcal{C}^\perp})\big),$$
which establishes the equality between the first and the third quantities. Since the second quantity is a limiting case of the first (as $\alpha \to 1$) and the value of the first quantity is independent of $\alpha$, we also have equality between the first and the second quantities. □
Now, Lemma 2 follows by combining Lemma A8 with Theorems A2 and A3.

References

1. Han, T.S.; Verdú, S. Approximation theory of output statistics. IEEE Trans. Inf. Theory 1993, 39, 752–772.
2. Hayashi, M. General nonasymptotic and asymptotic formulas in channel resolvability and identification capacity and their application to the wiretap channel. IEEE Trans. Inf. Theory 2006, 52, 1562–1575.
3. Yu, L.; Tan, V.Y. Rényi resolvability and its applications to the wiretap channel. IEEE Trans. Inf. Theory 2019, 65, 1862–1897.
4. Debris-Alazard, T.; Ducas, L.; Resch, N.; Tillich, J.-P. Smoothing codes and lattices: Systematic study and new bounds. IEEE Trans. Inf. Theory 2023, 69, 6006–6027.
5. Micciancio, D.; Regev, O. Worst-case to average-case reductions based on Gaussian measures. SIAM J. Comput. 2007, 37, 267–302.
6. Chen, W.W.L.; Skriganov, M.M. Explicit constructions in the classical mean squares problem in irregularities of point distribution. J. Reine Angew. Math. 2002, 545, 67–95.
7. Skriganov, M.M. Coding theory and uniform distributions. Algebra Anal. 2001, 13, 191–239; Translation in St. Petersburg Math. J. 2002, 13, 301–337.
8. Samorodnitsky, A. On the entropy of a noisy function. IEEE Trans. Inf. Theory 2016, 62, 5446–5464.
9. Samorodnitsky, A. An upper bound on ℓq norms of noisy functions. IEEE Trans. Inf. Theory 2019, 66, 742–748.
10. Samorodnitsky, A. An improved bound on ℓq norms of noisy functions. arXiv 2020, arXiv:2010.02721.
11. Bloch, M.R.; Laneman, J.N. Strong secrecy from channel resolvability. IEEE Trans. Inf. Theory 2013, 59, 8077–8098.
12. Belfiore, J.-C.; Oggier, F. Secrecy gain: A wiretap lattice code design. In Proceedings of the 2010 International Symposium on Information Theory & Its Applications, Taichung, Taiwan, 17–20 October 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 174–178.
13. Luzzi, L.; Ling, C.; Bloch, M.R. Optimal rate-limited secret key generation from Gaussian sources using lattices. IEEE Trans. Inf. Theory 2023, 69, 4944–4960.
14. Hązła, J.H.; Samorodnitsky, A.; Sberlo, O. On codes decoding a constant fraction of errors on the BSC. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual, 21–25 June 2021; pp. 1479–1488.
15. Rao, A.; Sprumont, O. A criterion for decoding on the BSC. arXiv 2022, arXiv:2202.00240.
16. Arimoto, S. On the converse to the coding theorem for discrete memoryless channels (Corresp.). IEEE Trans. Inf. Theory 1973, 19, 357–359.
17. Polyanskiy, Y.; Verdú, S. Arimoto channel coding converse and Rényi divergence. In Proceedings of the 2010 48th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 29 September–1 October 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1327–1333.
18. Polyanskiy, Y.; Verdú, S. Empirical distribution of good channel codes with nonvanishing error probability. IEEE Trans. Inf. Theory 2013, 60, 5–21.
19. Chou, R.A.; Bloch, M.R.; Kliewer, J. Empirical and strong coordination via soft covering with polar codes. IEEE Trans. Inf. Theory 2018, 64, 5087–5100.
20. Cover, T.M.; Permuter, H.H. Capacity of coordinated actions. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 2701–2705.
21. Cuff, P. Distributed channel synthesis. IEEE Trans. Inf. Theory 2013, 59, 7071–7096.
22. Cuff, P.W.; Permuter, H.H.; Cover, T.M. Coordination capacity. IEEE Trans. Inf. Theory 2010, 56, 4181–4206.
23. Chou, R.A.; Bloch, M.R.; Abbe, E. Polar coding for secret-key generation. IEEE Trans. Inf. Theory 2015, 61, 6213–6237.
24. Brakerski, Z.; Lyubashevsky, V.; Vaikuntanathan, V.; Wichs, D. Worst-case hardness for LPN and cryptographic hashing via code smoothing. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Paris, France, 30 April–4 May 2017; Springer: Berlin/Heidelberg, Germany, 2019; pp. 619–635.
25. Goldfeld, Z.; Kato, K.; Nietert, S.; Rioux, G. Limit distribution theory for smooth p-Wasserstein distances. arXiv 2022, arXiv:2203.00159.
26. Goldfeld, Z.; Kato, K.; Rioux, G.; Sadhu, R. Statistical inference with regularized optimal transport. arXiv 2022, arXiv:2205.04283.
27. Nietert, S.; Goldfeld, Z.; Kato, K. Smooth p-Wasserstein distance: Structure, empirical approximation, and statistical applications. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8172–8183.
28. Liu, J.; Cuff, P.; Verdú, S. Eγ-resolvability. IEEE Trans. Inf. Theory 2016, 63, 2629–2658.
29. Steinberg, Y.; Verdú, S. Simulation of random processes and rate-distortion theory. IEEE Trans. Inf. Theory 1996, 42, 63–86.
30. Ordentlich, O.; Polyanskiy, Y. Entropy under additive Bernoulli and spherical noises. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 521–525.
31. Polyanskiy, Y. Hypercontractivity of spherical averages in Hamming space. SIAM J. Discrete Math. 2019, 33, 731–754.
32. Yu, L. Edge-isoperimetric inequalities and ball-noise stability: Linear programming and probabilistic approaches. J. Comb. Theory Ser. A 2022, 188, 105583.
33. Wyner, A.; Ziv, J. A theorem on the entropy of certain binary sequences and applications–I. IEEE Trans. Inf. Theory 1973, 19, 769–772.
34. Hązła, J.H. Optimal list decoding from noisy entropy inequality. arXiv 2022, arXiv:2212.01443.
35. Wyner, A.D. The wire-tap channel. Bell Syst. Tech. J. 1975, 54, 1355–1387.
36. Csiszár, I. Almost independence and secrecy capacity. Probl. Peredachi Informatsii 1996, 32, 48–57; English translation in Probl. Inform. Transm. 1996, 32, 40–47.
37. Maurer, U.M. Secret key agreement by public discussion from common information. IEEE Trans. Inf. Theory 1993, 39, 733–742.
38. Thangaraj, A.; Dihidar, S.; Calderbank, A.R.; McLaughlin, S.W.; Merolla, J.-M. Applications of LDPC codes to the wiretap channel. IEEE Trans. Inf. Theory 2007, 53, 2933–2945.
39. Mahdavifar, H.; Vardy, A. Achieving the secrecy capacity of wiretap channels using polar codes. IEEE Trans. Inf. Theory 2011, 57, 6428–6443.
40. Subramanian, A.; Suresh, A.T.; Raj, S.; Thangaraj, A.; Bloch, M.; McLaughlin, S. Strong and weak secrecy in wiretap channels. In Proceedings of the 2010 6th International Symposium on Turbo Codes & Iterative Information Processing, Brest, France, 6–10 September 2010; pp. 30–34.
41. Gulcu, T.C.; Barg, A. Achieving secrecy capacity of the wiretap channel and broadcast channel with a confidential component. IEEE Trans. Inf. Theory 2016, 63, 1311–1324.
42. Barg, A. Stolarsky's invariance principle for finite metric spaces. Mathematika 2021, 67, 158–186.
43. Bilyk, D.; Dai, F.; Matzke, R. The Stolarsky principle and energy optimization on the sphere. Constr. Approx. 2018, 48, 31–60.
44. Skriganov, M.M. Point distributions in two-point homogeneous spaces. Mathematika 2019, 65, 557–587.
45. Simon, B. Real Analysis: A Comprehensive Course in Analysis, Part 1; American Mathematical Society: Providence, RI, USA, 2015.
46. Shamai, S.; Verdú, S. The empirical distribution of good codes. IEEE Trans. Inf. Theory 1997, 43, 836–846.
47. Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
48. Bloch, M.R.; Luzzi, L.; Kliewer, J. Strong coordination with polar codes. In Proceedings of the 2012 50th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 1–5 October 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 565–571.
49. Tillich, J.-P.; Zémor, G. Discrete isoperimetric inequalities and the probability of a decoding error. Comb. Probab. Comput. 2000, 9, 465–479.
50. Ashikhmin, A.; Barg, A. Bounds on the covering radius of linear codes. Des. Codes Cryptogr. 2002, 27, 261–269.
51. Watanabe, S.; Hayashi, M. Strong converse and second-order asymptotics of channel resolvability. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 1882–1886.
52. Kudekar, S.; Kumar, S.; Mondelli, M.; Pfister, H.D.; Şaşoğlu, E.; Urbanke, R. Reed-Muller codes achieve capacity on erasure channels. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, Cambridge, MA, USA, 19–21 June 2016; pp. 658–669.
53. Abbe, E.; Sandon, C. A proof that Reed-Muller codes achieve Shannon capacity on symmetric channels. arXiv 2023, arXiv:2304.02509.
54. Reeves, G.; Pfister, H.D. Reed-Muller codes achieve capacity on BMS channels. arXiv 2021, arXiv:2110.14631.
55. Renes, J.M. Duality of channels and codes. IEEE Trans. Inf. Theory 2018, 64, 577–592.
56. Rengaswamy, N.; Pfister, H.D. On the duality between the BSC and quantum PSC. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2232–2237.
57. Pfister, H.; Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA; Rengaswamy, N.; Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA. Personal communication, 2023.
58. Poltyrev, G. Bounds on the decoding error probability of binary linear codes via their spectra. IEEE Trans. Inf. Theory 1994, 40, 1284–1292.
59. Semakov, N.; Zinov'ev, V.A.; Zaitsev, G. Uniformly packed codes. Probl. Peredachi Informatsii 1971, 7, 38–50.
60. Goethals, J.-M.; van Tilborg, H.C.A. Uniformly packed codes. Philips Res. Rep. 1975, 30, 9–36.
61. Tokareva, N. An upper bound for the number of uniformly packed codes. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 346–349.
62. Borges, J.; Rifà, J.; Zinoviev, V.A. On completely regular codes. Probl. Inf. Transm. 2019, 55, 1–45.
63. Delsarte, P. An algebraic approach to the association schemes of coding theory. Philips Res. Rep. Suppl. 1973, 10.
64. Oxley, J. Matroid Theory; Oxford University Press: Oxford, UK, 1992.
Figure 1. Capacities and achievable rates for perfect smoothing. The lowermost curve gives the Shannon capacity of the $\mathrm{BSC}(\delta)$; the second curve from the bottom is the smoothing threshold for the duals of the BEC capacity-achieving codes; the third one is $S_{2,\beta_\delta}$, and the top one is $S_{\infty,\beta_\delta}$.
Figure 2. Achievable rates in the BSC wiretap channel with BEC capacity-achieving codes. The bottom curve is the lower bound on the code rate that guarantees decodability on a $\mathrm{BSC}(\delta)$. The middle curve shows Shannon's capacity, and the top one is the $D_1$-smoothing threshold for the Bernoulli noise $T_\delta$.