Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel

Fong, Silas L.; Tan, Vincent Y. F.

doi:10.3390/e19070364

Open AccessArticle

Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel

by

Silas L. Fong

^* and

Vincent Y. F. Tan

Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, Singapore

^*

Author to whom correspondence should be addressed.

Entropy 2017, 19(7), 364; https://doi.org/10.3390/e19070364

Submission received: 8 June 2017 / Revised: 8 July 2017 / Accepted: 13 July 2017 / Published: 15 July 2017

(This article belongs to the Special Issue Multiuser Information Theory)

Download Versions Notes

Abstract

:

This paper investigates polar codes for the additive white Gaussian noise (AWGN) channel. The scaling exponent

μ

of polar codes for a memoryless channel

q_{Y | X}

with capacity

I (q_{Y | X})

characterizes the closest gap between the capacity and non-asymptotic achievable rates as follows: For a fixed

ε \in (0, 1)

, the gap between the capacity

I (q_{Y | X})

and the maximum non-asymptotic rate

R_{n}^{*}

achieved by a length-n polar code with average error probability

ε

scales as

n^{- 1 / μ}

, i.e.,

I (q_{Y | X}) - R_{n}^{*} = Θ (n^{- 1 / μ})

. It is well known that the scaling exponent

μ

for any binary-input memoryless channel (BMC) with

I (q_{Y | X}) \in (0, 1)

is bounded above by

4.714

. Our main result shows that

4.714

remains a valid upper bound on the scaling exponent for the AWGN channel. Our proof technique involves the following two ideas: (i) The capacity of the AWGN channel can be achieved within a gap of

O (n^{- 1 / μ} \sqrt{\log n})

by using an input alphabet consisting of n constellations and restricting the input distribution to be uniform; (ii) The capacity of a multiple access channel (MAC) with an input alphabet consisting of n constellations can be achieved within a gap of

O (n^{- 1 / μ} \log n)

by using a superposition of

\log n

binary-input polar codes. In addition, we investigate the performance of polar codes in the moderate deviations regime where both the gap to capacity and the error probability vanish as n grows. An explicit construction of polar codes is proposed to obey a certain tradeoff between the gap to capacity and the decay rate of the error probability for the AWGN channel.

Keywords:

AWGN channel; moderate deviations; multiple access channel; polar codes; scaling exponent

1. Introduction

1.1. The Additive White Gaussian Noise Channel

This paper investigates low-complexity codes over the classical additive white Gaussian noise (AWGN) channel ([1], Chapter 9), where a source wants to transmit information to a destination and each received symbol is the sum of the transmitted symbol and an independent Gaussian random variable. More specifically, if

X_{k}

denotes the symbol transmitted by the source in the kth time slot, then the corresponding symbol received by the destination is

Y_{k} = X_{k} + Z_{k}

(1)

where

Z_{k}

is the standard normal random variable. When the transmission lasts for n time slots, i.e., each transmitted codeword consists of n symbols, it is assumed that

Z_{1}, Z_{2}, \dots, Z_{n}

are independent and each transmitted codeword

x^{n} ≜ (x_{1}, x_{2}, \dots, x_{n})

must satisfy the peak power constraint

\frac{1}{n} \sum_{k = 1}^{n} x_{k}^{2} \leq P

(2)

where

P > 0

is a constant which denotes the permissible power. For transmitting a uniformly distributed message

W \in {1, 2, \dots, ⌈ 2^{n R} ⌉}

across this channel, Shannon [2] shows that the limit of the maximum coding rate R as n approaches infinity (i.e., capacity) is

C (P) ≜ \frac{1}{2} \log (1 + P) .

(3)

1.2. Polar Codes

In 1948, in his groundbreaking paper Shannon [2] proposed a systematic framework for studying the fundamental limits for transmitting information over noisy channels and provided a single-letter formula for the capacity of a memoryless channel. For decades, information and coding theorists sought to achieve these fundamental limits via low-complexity and capacity-achieving codes. In 2009, in his breakthrough paper Arıkan [3] proposed a class of codes—known as polar codes—whose encoding and decoding complexities are

O (n \log n)

and provably achieve the capacity of any binary-input memoryless symmetric channel (BMSC).

The scaling exponent

μ

of polar codes for a memoryless channel

q_{Y | X}

with capacity

\begin{matrix} I (q_{Y | X}) ≜ max_{p_{X}} I (X; Y) \end{matrix}

(4)

characterizes the closest gap between the channel capacity and non-asymptotic achievable rates as follows: For a fixed

ε \in (0, 1)

, the gap between the capacity

I (q_{Y | X})

and the maximum non-asymptotic rate

R_{n}^{*}

achieved by a length-n polar code with average error probability

ε

scales as

n^{- 1 / μ}

, i.e.,

I (q_{Y | X}) - R_{n}^{*} = Θ (n^{- 1 / μ})

. It has been shown in [4,5,6] that the scaling exponent

μ

for any BMSC with

I (q_{Y | X}) \in (0, 1)

lies between

3.579

and

4.714

. Indeed, the upper bound

4.714

remains valid for any general binary-input memoryless channel (BMC) ([7], Lemma 4). The scaling exponent of polar codes for a non-stationary channel has been recently studied in [8].

It is well known that polar codes are capacity-achieving for BMCs [9,10,11,12], and appropriately chosen ones are also capacity-achieving for the AWGN channel [13]. In particular, for any

R < C (P)

and any

β < 1 / 2

, polar codes operated at rate R can be constructed for the AWGN channel such that the decay rate of the error probability is

O (2^{- n^{β}})

[13] and the encoding and decoding complexities are

O (n \log n)

. However, the scaling exponent of polar codes for the AWGN channel has not been investigated yet.

In this paper, we construct polar codes for the AWGN channel and show that

4.714

remains a valid upper bound on the scaling exponent. Our construction of polar codes involves the following two ideas:(i) By using an input alphabet consisting of n constellations and restricting the input distribution to be uniform as suggested in [13], we can achieve the capacity of the AWGN channel within a gap of

O (n^{- 1 / μ} \sqrt{\log n})

; (ii) By using a superposition of

\log n

binary-input polar codes (in this paper, n is always a power of 2) as suggested in [14], we can achieve the capacity of the corresponding multiple access channel (MAC) within a gap of

O (n^{- 1 / μ} \log n)

where the input alphabet of the MAC has n constellations (i.e., the size of the Cartesian product of the input alphabets corresponding to the

\log n

input terminals is n). The encoding and decoding complexities of our constructed polar codes are

O (n \log^{2} n)

. On the other hand, the lower bound

3.579

holds trivially for the constructed polar codes because the polar codes are constructed by superposing

\log n

binary-input polar codes whose scaling exponents are bounded below by

3.579

[5].

In addition, Mondelli et al. ([4], Section IV) provided an explicit construction of polar codes for any BMSC which obeys a certain tradeoff between the gap to capacity and the decay rate of the error probability. More specifically, if the gap to capacity is set to vanish at a rate of

Θ (n^{- \frac{1 - γ}{μ}})

for some

γ \in (\frac{1}{1 + μ}, 1)

, then a length-n polar code can be constructed such that the error probability is

O (n \cdot 2^{- n^{γ h_{2}^{- 1} (\frac{γ μ + γ - 1}{γ μ})}})

where

h_{2} : [0, 1 / 2] \to [0, 1]

denotes the binary entropy function. This tradeoff was developed under the moderate deviations regime [15] where both the gap to capacity and the error probability vanish as n grows. For the AWGN channel, we develop a similar tradeoff under the moderate deviations regime by using our constructed polar codes described above.

1.3. Paper Outline

This paper is organized as follows. The notation used in this paper is described in the next subsection. Section 2 presents the background of this work, which includes existing polarization results for the BMC which are used in this work. Section 3.1, Section 3.2 and Section 3.3 state the formulation of the binary-input MAC and present new polarization results for the binary-input MAC. Section 4.1–Section 4.2 state the formulation of the AWGN channel and present new polarization results for the AWGN channel. Section 5 establishes the definition of the scaling exponent for the AWGN channel and establishes the main result—4.714 is an upper bound on the scaling exponent of polar codes for the AWGN channel. Section 6 presents an explicit construction of polar codes for the AWGN channel which obey a certain tradeoff between the gap to capacity and the decay rate of the error probability under the moderate deviations regime. Concluding remarks are provided in Section 7.

1.4. Notation

The set of natural numbers, real numbers and non-negative real numbers are denoted by

N

,

R

and

R_{+}

respectively. For any sets

A

and

B

and any mapping

f : A \to B

, we let

f^{- 1} (D)

denote the set

{a \in A | f (a) \in D}

for any

D \subseteq B

. We let

1 {E}

be the indicator function of the set

E

. An arbitrary (discrete or continuous) random variable is denoted by an upper-case letter (e.g., X), and the realization and the alphabet of the random variable are denoted by the corresponding lower-case letter (e.g., x) and calligraphic letter (e.g.,

X

) respectively. We use

X^{n}

to denote the random tuple

(X_{1}, X_{2}, \dots, X_{n})

where each

X_{k}

has the same alphabet

X

. We will take all logarithms to base 2 throughout this paper.

The following notations are used for any of the arbitrary X and Y random variables and any real-valued function g with domain

X

. We let

p_{Y | X}

and

p_{X, Y} = p_{X} p_{Y | X}

denote the conditional probability distribution of Y given X and the probability distribution of

(X, Y)

respectively. We let

p_{X, Y} (x, y)

and

p_{Y | X} (y | x)

be the evaluations of

p_{X, Y}

and

p_{Y | X}

respectively at

(X, Y) = (x, y)

. To make the dependence on the distribution explicit, we let

P_{p_{X}} {g (X) \in A}

denote

\int_{X} p_{X} (x) 1 {g (x) \in A} d x

for any set

A \subseteq R

. The expectation of

g (X)

is denoted as

E_{p_{X}} [g (X)]

. For any

(X, Y, Z)

distributed according to some

p_{X, Y, Z}

, the entropy of X and the conditional mutual information between X and Y given Z are denoted by

H_{p_{X}} (X)

and

I_{p_{X, Y, Z}} (X; Y | Z)

respectively. For simplicity, we sometimes omit the subscript of a notation if it causes no confusion. The relative entropy between

p_{X}

and

q_{X}

is denoted by

D (p_{X} ∥ q_{X}) ≜ \int_{X} p_{X} (x) \log (\frac{p_{X} (x)}{q_{X} (x)}) d x .

(5)

The 2-Wasserstein distance between

p_{X}

and

p_{Y}

is denoted by

W_{2} (p_{X}, p_{Y}) ≜ \inf_{\begin{matrix} s_{X, Y} : \\ s_{X} = p_{X}, \\ s_{Y} = p_{Y} \end{matrix}} \sqrt{\int_{X} \int_{Y} s_{X, Y} (x, y) {(x - y)}^{2} d y d x} .

(6)

We let

N (\cdot; μ, σ^{2}) : R \to [0, \infty)

denote the probability density function of a Gaussian random variable whose mean and variance are

μ

and

σ^{2}

respectively, i.e.,

N (z; μ, σ^{2}) ≜ \frac{1}{\sqrt{2 π σ^{2}}} e^{- \frac{{(z - μ)}^{2}}{2 σ^{2}}} .

(7)

2. Background: Point-to-Point Channels and Existing Polarization Results

In this section, we will review important polarization results related to the scaling exponent of polar codes for binary-input memoryless channels (BMCs).

2.1. Point-to-Point Memoryless Channels

Consider a point-to-point channel which consists of one source and one destination, denoted by

s

and

d

respectively. Suppose node

s

transmits information to node

d

in n time slots. Before any transmission begins, node

s

chooses message W destined for node

d

, where W is uniformly distributed over the alphabet

W ≜ {1, 2, \dots, M}

(8)

which consists of M elements. For each

k \in {1, 2, \dots, n}

, node

s

transmits

X_{k} \in X

based on W and node

d

receives

Y_{k} \in Y

in time slot k where

X

and

Y

denote respectively the input and output alphabets of the channel. After n time slots, node

d

declares

\hat{W}

to be the transmitted W based on

Y^{n}

. Formally, we define a length-n code as follows.

Definition 1.

An

(n, M)

code consists of the following:

A message set $W$ as defined in (8). Message W is uniform on $W$ .
An encoding function $f_{k} : W \to X$ for each $k \in {1, 2, \dots, n}$ , where $f_{k}$ is used by node $s$ for encoding $X_{k}$ such that $X_{k} = f_{k} (W)$ .
A decoding function $φ : Y^{n} \to W$ used by node $d$ for producing the message estimate $\hat{W} = φ (Y^{n})$ .

Definition 2.

The point-to-point memoryless channel is characterized by an input alphabet

X

, an output alphabet

Y

and a conditional distribution

q_{Y | X}

such that the following holds for any

(n, M)

code: For each

k \in {1, 2, \dots, n}

,

p_{W, X^{n}, Y^{n}} = p_{W, X^{n}} \prod_{k = 1}^{n} p_{Y_{k} | X_{k}}

where

p_{Y_{k} | X_{k}} (y_{k} | x_{k}) = q_{Y | X} (y_{k} | x_{k})

for all

x_{k} \in X

and

y_{k} \in Y

.

For any

(n, M)

code defined on the point-to-point memoryless channel, let

p_{W, X^{n}, Y^{n}, \hat{W}}

be the joint distribution induced by the code. By Definitions 1 and 2, we can factorize

p_{W, X^{n}, Y^{n}, \hat{W}}

as,

p_{W, X^{n}, Y^{n}, \hat{W}} = p_{W} p_{X^{n} | W} (\prod_{k = 1}^{n} p_{Y_{k} | X_{k}}) p_{\hat{W} | Y^{n}} .

(9)

2.2. Polarization for Binary-Input Memoryless Channels

Definition 3.

A point-to-point memoryless channel characterized by

q_{Y | X}

is called a binary-input memoryless channel (BMC) if

X = {0, 1}

.

We follow the formulation of polar coding in [11]. Consider any BMC characterized by

q_{Y | X}

. Let

p_{X}

be the probability distribution of a Bernoulli random variable X, and let

p_{X^{n}}

be the distribution of n independent copies of

X \sim p_{X}

, i.e.,

p_{X^{n}} (x^{n}) = \prod_{k = 1}^{n} p_{X} (x_{k})

for all

x^{n} \in X^{n}

. For each

n = 2^{m}

where

m \in N

, the polarization mapping of a length-n polar code is given by

G_{n} ≜ {[\begin{matrix} 1 & 0 \\ 1 & 1 \end{matrix}]}^{\otimes m} = G_{n}^{- 1}

(10)

where ⊗ denotes the Kronecker power. Define

p_{U^{n} | X^{n}}

such that

[U_{1} U_{2} \dots U_{n}] = [X_{1} X_{2} \dots X_{n}] G_{n}

(11)

where the addition and product operations are performed over GF(2), define

p_{Y_{k} | X_{k}} (y_{k} | x_{k}) ≜ q_{Y | X} (y_{k} | x_{k})

(12)

for each

k \in {1, 2, \dots, n}

and each

(x_{k}, y_{k}) \in X \times Y

where

q_{Y | X}

characterizes the BMC (cf. (2)), and define

p_{U^{n}, X^{n}, Y^{n}} ≜ p_{X^{n}} p_{U^{n} | X^{n}} \prod_{k = 1}^{n} p_{Y_{k} | X_{k}} .

(13)

In addition, for each

k \in {1, 2, \dots, n}

, define the Bhattacharyya parameter associated with time k as

\begin{matrix} Z^{[p_{X}; q_{Y | X}]} (U_{k} | U^{k - 1}, Y^{n}) \\ (14) & ≜ 2 \sum_{u^{k - 1} \in U^{k - 1}} \int_{Y^{n}} p_{U^{k - 1}, Y^{n}} (u^{k - 1}, y^{n}) \sqrt{p_{U_{k} | U^{k - 1}, Y^{n}} (0 | u^{k - 1}, y^{n}) p_{U_{k} | U^{k - 1}, Y^{n}} (1 | u^{k - 1}, y^{n})} d y^{n} \\ (15) & = 2 \sum_{u^{k - 1} \in U^{k - 1}} \int_{Y^{n}} \sqrt{p_{U_{k}, U^{k - 1}, Y^{n}} (0, u^{k - 1}, y^{n}) p_{U_{k}, U^{k - 1}, Y^{n}} (1, u^{k - 1}, y^{n})} d y^{n} \end{matrix}

where the distributions in (14) and (15) are marginal distributions of

p_{U^{n}, X^{n}, Y^{n}}

defined in (13). To simplify notation, let

β ≜ 4.714

(16)

in the rest of this paper. The following lemma is based on Section III of [4] and it is a restatement of Lemma 2 of [7], which has been used in [7] to show that

4.714

is an upper bound on the scaling exponent for any BMC (including non-symmetric ones).

Lemma 1.

([7], Lemma 2) There exists a universal constant

t > 0

such that the following holds. Fix any BMC characterized by

q_{Y | X}

and any

p_{X}

. Then for any

m \in N

and

n ≜ 2^{m}

, we have

\begin{matrix} \frac{1}{n} |\{k \in {1, 2, \dots, n} |Z^{[p_{X}; q_{Y | X}]} (U_{k} | U^{k - 1}, Y^{n}) \leq \frac{1}{n^{4}}\}| \geq I_{p_{X} q_{Y | X}} (X; Y) - \frac{t}{n^{1 / β}} . \end{matrix}

(17)

This lemma continues to hold if the quantity

\frac{1}{n^{4}}

is replaced by

\frac{1}{n^{ν}}

for any

ν > 0

. The main result of this paper continues to hold if the quantity

\frac{1}{n^{4}}

in this lemma is replaced by

\frac{1}{n^{ν}}

for any

ν > 2

.

3. Problem Formulation of Binary-Input MACs and New Polarization Results

Polar codes have been proposed and investigated for achieving any rate tuple inside the capacity region of a binary-input multiple access channel (MAC) [14,16]. The goal of this section is to use the polar codes proposed in [14] to achieve the symmetric sum-capacity of a binary-input MAC.

3.1. Binary-Input Multiple Access Channels

Consider a MAC ([1], Section 15.3) which consists of N sources and one destination. Let

I ≜ {1, 2, \dots, N}

be the index set of the N sources and let

d

denote the destination. Suppose the sources transmit information to node

d

in n time slots. Before any transmission begins, node i chooses message

W_{i}

destined for node

d

for each

i \in I

, where

W_{i}

is uniformly distributed over

W_{i} ≜ {1, 2, \dots, M_{i}}

(18)

which consists of

M_{i}

elements. For each

k \in {1, 2, \dots, n}

, node i transmits

X_{i, k} \in X_{i}

based on

W_{i}

for each

i \in I

and node

d

receives

Y_{k} \in Y

in time slot k where

X_{i}

denotes the input alphabet for node i and

Y

denotes the output alphabet. After n time slots, node

d

declares

{\hat{W}}_{i}

to be the transmitted

W_{i}

based on

Y^{n}

for each

i \in I

.

To simplify notation, we use the following convention for any

T \subseteq I

. For any random tuple

(X_{1}, X_{2}, \dots, X_{N})

, we let

X_{T} ≜ (X_{i} : i \in T)

be the corresponding subtuple, whose realization and alphabet are denoted by

x_{T}

and

X_{T}

respectively. Similarly, for each

k \in {1, 2, \dots, n}

and each random tuple

(X_{1, k}, X_{2, k}, \dots, X_{N, k}) \in X_{I}

, we let

X_{T, k} ≜ (X_{i, k} : i \in T)

denote the corresponding random subtuple, and let

x_{T, k}

and

X_{T, k}

denote respectively the realization and the alphabet of

X_{T, k}

. Formally, we define a length-n code for the binary-input MAC as follows.

Definition 4.

An

(n, M_{I})

-code, where

M_{I} ≜ (M_{1}, M_{2}, \dots, M_{N})

, consists of the following:

A message set $W_{i}$ for each $i \in I$ as defined in (18), where message $W_{i}$ is uniform on $W_{i}$ .
An encoding function $f_{i, k}^{MAC} : W_{i} \to X_{i}$ for each $i \in I$ and each $k \in {1, 2, \dots, n}$ , where $f_{i, k}^{MAC}$ is used by node i for encoding $X_{i, k}$ such that $X_{i, k} = f_{i, k}^{MAC} (W_{i})$ .
A decoding function $φ^{MAC} : Y^{n} \to W_{I}$ used by node $d$ for producing the message estimate ${\hat{W}}_{I} = φ^{MAC} (Y^{n})$ .

Definition 5.

The multiple access channel (MAC) is characterized by N input alphabets specified by

X_{I}

, an output alphabet specified by

Y

and a conditional distribution

q_{Y | X_{I}}

such that the following holds for any

(n, M_{I})

code: For each

k \in {1, 2, \dots, n}

,

\begin{matrix} p_{W_{I}, X_{I}^{n}, Y^{n}} = (\prod_{i \in I} p_{W_{i}, X_{i}^{n}}) \prod_{k = 1}^{n} p_{Y_{k} | X_{I, k}} \end{matrix}

(19)

where

p_{Y_{k} | X_{I, k}} (y_{k} | x_{I, k}) = q_{Y | X_{I}} (y_{k} | x_{I, k})

for all

x_{I, k} \in X_{I}

and

y_{k} \in Y

.

3.2. Polarization for Binary-Input MACs

Definition 6.

A MAC characterized by

q_{Y | X_{I}}

is called a binary-input MAC if

X_{I} = {0, 1}^{N}

.

Consider any binary-input MAC characterized by

q_{Y | X_{I}}

. For each

i \in I

, let

p_{X_{i}}

be the probability distribution of a Bernoulli random variable

X_{i}

, and let

p_{X_{i}^{n}}

be the distribution of n independent copies of

X_{i} \sim p_{X_{i}}

, i.e.,

p_{X_{i}^{n}} (x_{i}^{n}) = \prod_{k = 1}^{n} p_{X_{i}} (x_{i, k})

for all

x_{i}^{n} \in X_{i}^{n}

. Recall the polarization mapping

G_{n}

defined in (10). For each

i \in I

, define

p_{U_{i}^{n} | X_{i}^{n}}

such that

[U_{i, 1} U_{i, 2} \dots U_{i, n}] = [X_{i, 1} X_{i, 2} \dots X_{i, n}] G_{n}

(20)

where the addition and product operations are performed over GF(2), and define

p_{U_{I}^{n}, X_{I}^{n}, Y^{n}} ≜ (\prod_{i \in I} p_{X_{i}^{n}} p_{U_{i}^{n} | X_{i}^{n}}) \prod_{k = 1}^{n} p_{Y_{k} | X_{I, k}} .

(21)

In addition, for each

i \in I

and each

k \in {1, 2, \dots, n}

, define

[i - 1] ≜ {1, 2, \dots, i - 1}

and define the Bhattacharyya parameter associated with node i and time k as

\begin{matrix} Z^{[p_{X_{I}}; q_{Y | X_{I}}]} (U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}) ≜ \\ 2 \sum_{\begin{matrix} u_{i}^{k - 1} \in U_{i}^{k - 1} \end{matrix}} \sum_{\begin{matrix} x_{[i - 1]}^{n} \in {0, 1}^{(i - 1) n} \end{matrix}} \int_{Y^{n}} \sqrt{p_{U_{i, k}, U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (0, u_{i}^{k - 1}, x_{[i - 1]}^{n}, y^{n}) p_{U_{i, k}, U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (1, u_{i}^{k - 1}, x_{[i - 1]}^{n}, y^{n})} d y^{n} \end{matrix}

(22)

where the distributions in (22) are marginal distributions of

p_{U_{I}^{n}, X_{I}^{n}, Y^{n}}

defined in (21). The following lemma is a direct consequence of Lemma 1.

Lemma 2.

There exists a universal constant

t > 0

such that the following holds. Fix any binary-input MAC characterized by

q_{Y | X_{I}}

and any

p_{X_{I}}

. Then for any

m \in N

and

n ≜ 2^{m}

, we have

\frac{1}{n} |\{k \in {1, 2, \dots, n} |Z^{[p_{X_{I}}; q_{Y | X_{I}}]} (U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}) \leq \frac{1}{n^{4}}\}| \geq I_{p_{X_{I}} q_{Y | X_{I}}} (X_{i}; X_{[i - 1]}, Y) - \frac{t}{n^{1 / β}}

(23)

for each

i \in I

. This lemma continues to hold if the quantity

\frac{1}{n^{4}}

is replaced by

\frac{1}{n^{ν}}

for any

ν > 0

. The main result of this paper continues to hold if the quantity

\frac{1}{n^{4}}

in this lemma is replaced by

\frac{1}{n^{ν}}

for any

ν > 2

.

Proof.

Fix any

i \in I

. Construct

p_{X_{[i - 1]}, Y | X_{i}}

by marginalizing

p_{X_{I}} q_{Y | X_{I}}

and view

p_{X_{[i - 1]}, Y | X_{i}}

as the conditional distribution that characterizes a BMC. The lemma then follows directly from Lemma 1. ☐

3.3. Polar Codes That Achieve the Symmetric Sum-Capacity of a Binary-Input MAC

Throughout this paper, let

p_{X_{i}}^{*}

denote the uniform distribution on

{0, 1}

for each

i \in I

and define

p_{X_{I}}^{*} ≜ \prod_{i \in I} p_{X_{i}}^{*}

, i.e.,

\begin{matrix} p_{X_{I}}^{*} (x_{I}) = \frac{1}{2^{N}} \end{matrix}

(24)

for any

x_{I} \in {0, 1}^{N}

.

Definition 7.

For a binary-input MAC characterized by

q_{Y | X_{I}}

, the symmetric sum-capacity is defined to be

C_{sum} ≜ I_{p_{X_{I}}^{*} q_{Y | X_{I}}} (X_{I}; Y)

.

The following definition summarizes the polar codes for the binary-input MAC proposed in Section IV of [14].

Definition 8.

([14], Section IV) Fix an

n = 2^{m}

where

m \in N

. For each

i \in I

, let

J_{i}

be a subset of

{1, 2, \dots, n}

, define

J_{i}^{c} ≜ {1, 2, \dots, n} ∖ J_{i}

, and let

b_{i, J_{i}^{c}} ≜ (b_{i, k} \in {0, 1} : k \in J_{i}^{c})

be a binary tuple. An

(n, J_{I}, b_{I, J_{I}^{c}})

polar code, where

J_{I} ≜ (J_{i} : i \in I)

and

b_{I, J_{I}^{c}} ≜ (b_{i, J_{i}^{c}} : i \in I)

, consists of the following:

An index set for information bits transmitted by node i denoted by $J_{i}$ for each $i \in I$ . The set $J_{i}^{c}$ is referred to as the index set for frozen bits transmitted by node i.
A message set $W_{i} ≜ {1, 2, \dots, 2^{| J_{i} |}}$ for each $i \in I$ , where $W_{i}$ is uniform on $W_{i}$ .
An encoding bijection $f_{i}^{M A C} : W_{i} \to U_{i, J_{i}}$ for encoding $W_{i}$ into $| J_{i} |$ information bits denoted by $U_{i, J_{i}}$ for each $i \in I$ such that

$U_{i, J_{i}} = f_{i}^{MAC} (W_{i})$

(25)

where $U_{i, J_{i}}$ and $U_{i, J_{i}}$ are defined as $U_{i, J_{i}} ≜ \prod_{k \in J_{i}} U_{k}$ and $U_{i, J_{i}} ≜ (U_{i, k} : k \in J_{i})$ respectively. Since message $W_{i}$ is uniform on $W_{i}$ , $f_{i}^{MAC} (W_{i})$ is a sequence of independent and identically distributed (i.i.d.) uniform bits such that

$P {U_{i, J_{i}} = u_{i, J_{i}}} = \frac{1}{2^{| J_{i} |}}$

(26)

for all $u_{i, J_{i}} \in {0, 1}^{| J_{i} |}$ , where the bits are transmitted through the polarized channels indexed by $J_{i}$ . For each $i \in I$ and each $k \in J_{i}^{c}$ , let

$U_{i, k} = b_{i, k}$

(27)

be the frozen bit to be transmitted by node i in time slot k. After $U_{i}^{n}$ has been determined, node i transmits $X_{i}^{n}$ where

$[X_{i, 1} X_{i, 2} \dots X_{i, n}] ≜ [U_{i, 1} U_{i, 2} \dots U_{i, n}] G_{n}^{- 1} .$

(28)
A sequence of successive cancellation decoding functions $φ_{i, k}^{MAC} : {0, 1}^{k - 1} \times {0, 1}^{(i - 1) n} \times Y^{n} \to {0, 1}$ for each $i \in I$ and each $k \in {1, 2, \dots, n}$ such that the recursively generated $({\hat{U}}_{1, 1}, \dots, {\hat{U}}_{1, n})$ , $({\hat{U}}_{2, 1}, \dots, {\hat{U}}_{2, n}), \dots, ({\hat{U}}_{N, 1}, \dots, {\hat{U}}_{N, n})$ and $({\hat{X}}_{1, 1}, \dots, {\hat{X}}_{1, n}), ({\hat{X}}_{2, 1}, \dots, {\hat{X}}_{2, n})$ ,…, $({\hat{X}}_{N, 1}, \dots, {\hat{X}}_{N, n})$ are produced as follows. For each $i \in I$ and each $k = 1, 2, \dots, n$ , given that ${\hat{U}}_{i}^{k - 1}$ , ${\hat{U}}_{[i - 1]}^{n}$ and ${\hat{X}}_{[i - 1]}^{n}$ have been constructed before the construction of ${\hat{U}}_{i, k}$ , node $d$ constructs the estimate of $U_{i, k}$ through computing

${\hat{U}}_{i, k} ≜ φ_{i, k}^{MAC} ({\hat{U}}_{i}^{k - 1}, {\hat{X}}_{[i - 1]}^{n}, Y^{n})$

(29)

where

$\begin{matrix} {\hat{u}}_{i, k} & ≜ φ_{i, k}^{MAC} ({\hat{u}}_{i}^{k - 1}, {\hat{x}}_{[i - 1]}^{n}, y^{n}) \\ = \{\begin{matrix} 0 & if k \in J_{i} and p_{U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (0 | {\hat{u}}_{i}^{k - 1}, {\hat{x}}_{[i - 1]}^{n}, y^{n}) \geq p_{U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (1 | {\hat{u}}_{i}^{k - 1}, {\hat{x}}_{[i - 1]}^{n}, y^{n}), \\ 1 & if k \in J_{i} and p_{U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (0 | {\hat{u}}_{i}^{k - 1}, {\hat{x}}_{[i - 1]}^{n}, y^{n}) < p_{U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (1 | {\hat{u}}_{i}^{k - 1}, {\hat{x}}_{[i - 1]}^{n}, y^{n}), \\ b_{i, k} & if k \in J_{i}^{c} . \end{matrix} \end{matrix}$

(30)

After obtaining ${\hat{U}}_{i}^{n}$ , node $d$ constructs the estimate of $X_{i}^{n}$ through computing

$\begin{matrix} [{\hat{X}}_{i, 1} {\hat{X}}_{i, 2} \dots {\hat{X}}_{i, n}] ≜ [{\hat{U}}_{i, 1} {\hat{U}}_{i, 2} \dots {\hat{U}}_{i, n}] G_{n}^{- 1} \end{matrix}$

(31)

and declares that

${\hat{W}}_{i} ≜ {(f_{i}^{MAC})}^{- 1} ({\hat{U}}_{i, J_{i}})$

(32)

is the transmitted $W_{i}$ where ${(f_{i}^{MAC})}^{- 1}$ denotes the inverse function of $f_{i}^{MAC}$ .

Remark 1.

By inspecting Definition 4 and Definition 8, we see that every

(n, J_{I}, b_{I, J_{I}^{c}})

polar code is also an

(n, (2^{| J_{1} |}, 2^{| J_{2} |}, \dots, 2^{| J_{N} |}))

code.

Definition 9.

The uniform-input

(n, J_{I})

polar code is defined as an

(n, J_{I}, B_{I, J_{I}^{c}})

polar code where

B_{I, J_{I}^{c}}

consists of i.i.d. uniform bits that are independent of the message

W_{I}

.

Definition 10.

For the uniform-input

(n, J_{I})

polar code defined for the MAC, the probability of decoding error is defined as

P {{\hat{W}}_{I} \neq W_{I}} = P \{⋃_{i \in I} {U_{i, J_{i}} \neq {\hat{U}}_{i, J_{i}}}\}

where the error is averaged over the random messages and the frozen bits. The code is also called a uniform-input

(n, J_{I}, ε)

polar code if the probability of decoding error is no larger than ε.

The following proposition bounds the error probability in terms of Bhattacharyya parameters, and it is a generalization of the well-known result for the special case

N = 1

(e.g., see [3], Proposition 2). The proof of Proposition 1 can be deduced from Section IV of [14], and is contained in Appendix A for completeness.

Proposition 1.

For the uniform-input

(n, J_{I})

polar code defined for the MAC

q_{Y | X_{I}}

, we have

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq \sum_{i = 1}^{N} \sum_{k \in J_{i}} Z^{[p_{X_{I}}; q_{Y | X_{I}}]} (U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}) . \end{matrix}

(33)

The following proposition follows from combining Lemma 2, Definition 7 and Proposition 1.

Proposition 2.

There exists a universal constant

t > 0

such that the following holds. Fix any N-source binary-input MAC characterized by

q_{Y | X_{I}}

. Fix any

m \in N

, let

n = 2^{m}

and define

J_{i}^{SE} ≜ \{k \in {1, 2, \dots, n} |Z^{[p_{X_{I}}^{*}; q_{Y | X_{I}}]} (U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}) \leq \frac{1}{n^{4}}\}

(34)

for each

i \in I

, where

p_{X_{I}}^{*}

is the uniform distribution as defined in (24) and the superscript “SE" stands for “scaling exponent". Then, the corresponding uniform-input

(n, J_{I}^{SE})

polar code satisfies

\begin{matrix} \frac{\sum_{i = 1}^{N} |J_{i}^{SE}|}{n} \geq C_{sum} - \frac{t N}{n^{1 / β}} \end{matrix}

(35)

and

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq \frac{N}{n^{3}} . \end{matrix}

(36)

Proof.

Let

t > 0

be the universal constant specified in Lemma 2 and fix an n. For each

i \in I

, it follows from Lemma 2 and Proposition 1 that

\begin{matrix} \frac{|J_{i}^{SE}|}{n} \geq I_{p_{X_{I}}^{*} q_{Y | X_{I}}} (X_{i}; X_{[i - 1]}, Y) - \frac{t}{n^{1 / β}} \end{matrix}

(37)

and

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq \frac{N}{n^{3}} \end{matrix}

(38)

for the uniform-input

(n, J_{I}^{SE})

polar code. Since

p_{X_{I}}^{*} = \prod_{i = 1}^{N} p_{X_{i}}^{*}

, it follows that

I_{p_{X_{I}}^{*} q_{Y | X_{I}}} (X_{i}; X_{[i - 1]}, Y) = I_{p_{X_{I}}^{*} q_{Y | X_{I}}} (X_{i}; Y | X_{[i - 1]})

(39)

holds for each

i \in I

, which implies that

\sum_{i = 1}^{N} I_{p_{X_{I}}^{*} q_{Y | X_{I}}} (X_{i}; X_{[i - 1]}, Y) = I_{p_{X_{I}}^{*} q_{Y | X_{I}}} (X_{I}; Y) .

(40)

Consequently, (35) follows from (37), (40) and Definition 7, and (36) follows from (38). ☐

Remark 2.

Proposition 2 shows that the sum-capacity of a binary-input MAC with N sources can be achieved within a gap of

O (N n^{- 1 / β})

by using a superposition of N binary-input polar codes.

4. Problem Formulation of the AWGN Channel and New Polarization Results

4.1. The AWGN Channel

It is well known that appropriately designed polar codes are capacity-achieving for the AWGN channel [13]. The main contribution of this paper is proving an upper bound on the scaling exponent of polar codes for the AWGN channel by using uniform-input polar codes for binary-input MACs described in Definition 8. The following two definitions formally define the AWGN channel and length-n codes for the channel.

Definition 11.

An

(n, M, P)

code is an

(n, M)

code described in Definition 1 subject to the additional assumptions that

X = R

and the peak power constraint

P \{\frac{1}{n} \sum_{k = 1}^{n} X_{k}^{2} \leq P\} = 1

(41)

is satisfied.

Definition 12.

The AWGN channel is a point-to-point memoryless channel described in Definition 2 subject to the additional assumption that

Y = R

and

q_{Y | X} (y | x) = N (y; x, 1)

for all

x \in R

and

y \in R

.

Definition 13.

For an

(n, M, P)

code defined on the AWGN channel, we can calculate according to (9) the average probability of error defined as

P \{\hat{W} \neq W\}

. We call an

(n, M, P)

code with average probability of error no larger than ε an

(n, M, P, ε)

code.

4.2. Uniform-Input Polar Codes for the AWGN Channel

Recall that we would like to use uniform-input polar codes for binary-input MACs described in Definition 8 to achieve the capacity of the AWGN channel, i.e.,

C (P)

in (3). The following definition describes the basic structure of such uniform-input polar codes.

Definition 14.

Fix an

n = 2^{m}

where

m \in N

. An

{(n, J_{I}, P, A)}_{avg}

polar code with average power P and input alphabet

A

consists of the following:

An input alphabet $A \subset R \cup {0^{-}}$ with $| A | = n$ such that

$\frac{1}{n} \sum_{a \in A ∖ {0^{-}}} a^{2} \leq P,$

(42)

where $R \cup {0^{-}}$ can be viewed as a line with 2 origins. Introducing the symbol $0^{-}$ allows us to create a set of cardinality n which consists of $n - 2$ non-zero real numbers and 2 origins 0 and $0^{-}$ . We index each element of $A$ by a unique length-m binary tuple

$x_{I}^{MAC} ≜ (x_{1}^{MAC}, x_{2}^{MAC}, \dots, x_{m}^{MAC}),$

(43)

and let $ρ : {0, 1}^{m} \to A$ be the bijection that maps the indices to the elements of $A$ such that $ρ (x_{I}^{MAC})$ denotes the element in $A$ indexed by $x_{I}^{MAC}$ .
A binary-input MAC $q_{Y | X_{I}^{MAC}}$ induced by $A$ as defined through Definitions 5 and 6 with the identifications $N = m$ and $I = {1, 2, \dots, m}$ .
A message set $W_{i} ≜ {1, 2, \dots, 2^{| J_{i} |}}$ for each $i \in {1, 2, \dots, m}$ , where $W_{I}$ is the message alphabet of the uniform-input $(n, J_{I})$ polar code for the binary-input MAC $q_{Y | X_{I}^{MAC}}$ as defined through Definitions 8 and 9 such that

$| W_{I} | = \prod_{i = 1}^{m} | W_{i} | = 2^{\sum_{i = 1}^{m} | J_{i} |} .$

(44)

In addition, $W_{I}$ is uniform on $W_{I}$ . We view the uniform-input $(n, J_{I})$ polar code as an $(n, (2^{| J_{1} |}, 2^{| J_{2} |}, \dots, 2^{| J_{N} |}))$ code (cf. Remark 1) and let ${f_{i, k}^{MAC} | i \in I, k \in {1, 2, \dots, n}}$ and $φ^{MAC}$ denote the corresponding set of encoding functions and the decoding function respectively (cf. Definition 4).
An encoding function $f_{k} : W_{I} \to A$ defined as

$f_{k} (W_{I}) ≜ ρ (f_{1, k}^{MAC} (W_{1}), f_{2, k}^{MAC} (W_{2}), \dots, f_{m, k}^{MAC} (W_{m}))$

(45)

for each $k \in {1, 2, \dots, n}$ , where $f_{k}$ is used for encoding $W_{I}$ into $X_{k}$ such that

$X_{k} = \{\begin{matrix} f_{k} (W_{I}) & if f_{k} (W_{I}) \neq 0^{-}, \\ 0 & if f_{k} (W_{I}) = 0^{-} . \end{matrix}$

(46)

Note that both the encoded symbols 0 and $0^{-}$ in $A$ result in the same transmitted symbol $0 \in R$ according to (46). By construction, $f_{1} (W_{I})$ , $f_{2} (W_{I})$ , …, $f_{n} (W_{I})$ are i.i.d. random variables that are uniformly distributed on $A$ and hence $X_{1}$ , $X_{2}$ , …, $X_{n}$ are i.i.d. real-valued random variables (but not necessarily uniform).
A decoding function $φ : R^{n} \to W_{I}$ defined as

$φ ≜ φ^{MAC}$

(47)

such that

${\hat{W}}_{I} = φ (Y^{n}) .$

(48)

Remark 3.

For an

{(n, J_{I}, P, A)}_{avg}

polar code, the flexibility of allowing

A

to contain 2 origins is crucial to proving the main result of this paper. This is because the input distribution which we will use to establish scaling results for the AWGN channel in Theorem 1 can be viewed as the uniform distribution over some set that contains 2 origins, although the input distribution in the real domain as specified in (52) to follow is not uniform.

Proposition 3.

There exists a universal constant

t > 0

such that the following holds. Suppose we are given an

{(n, J_{I}^{SE}, P, A)}_{avg}

polar code defined for the AWGN channel

q_{Y | X}

with a 2-origin

A

(i.e.,

A \supseteq {0, 0^{-}})

. Define

X ≜ A ∖ {0^{-}} \subset R

where

X

contains 1 origin and

n - 2

non-zero real numbers. Then, the

{(n, J_{I}^{SE}, P, A)}_{avg}

polar code is an

(n, M)

-code (cf. Definition 1) which satisfies

\begin{matrix} \frac{1}{n} \log M \geq I_{p_{X}^{'} q_{Y | X}} (X; Y) - \frac{t \log n}{n^{1 / β}}, \end{matrix}

(49)

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq \frac{\log n}{n^{3}}, \end{matrix}

(50)

and

\begin{matrix} P \{X^{n} = x^{n}\} = \prod_{k = 1}^{n} P \{X_{k} = x_{k}\} = \prod_{k = 1}^{n} p_{X}^{'} (x_{k}) \end{matrix}

(51)

for all

x^{n} \in X^{n}

where

p_{X}^{'}

is the distribution on

X

defined as

p_{X}^{'} (a) = \{\begin{matrix} \frac{1}{n} & if a \neq 0, \\ \frac{2}{n} & if a = 0 . \end{matrix}

(52)

Proof.

The proposition follows from inspecting Proposition 2 and Definition 14 with the identifications

N = \log n

and

\log M = \sum_{i = 1}^{m} | J_{i} |

. ☐

The following lemma, a strengthened version of Theorem 6 of [13], provides a construction of a good

A

which leads to a controllable gap between

C (P)

and

I_{p_{X}^{'} q_{Y | X}} (X; Y)

for the corresponding

{(n, J_{I}^{SE}, P, A)}_{avg}

polar code. Although the following lemma is intuitive, the proof is technical and hence relegated to Appendix B.

Lemma 3.

Let

q_{Y | X}

be the conditional distribution that characterizes the AWGN channel, and fix any

γ \in [0, 1)

. For each

n = 2^{m}

where

m \in N

, define

D_{n} ≜ \{\frac{ℓ}{n}| ℓ \in \{1, 2, \dots, \frac{n}{2}, \dots n - 1\}\},

(53)

define

s_{X} (x) ≜ N (x; 0, P (1 - \frac{1}{n^{(1 - γ) / β}}))

(54)

for all

x \in R

, define

Φ_{X}

to be the cumulative distribution function (cdf) of

s_{X}

, and define

X ≜ Φ_{X}^{- 1} (D_{n}) .

(55)

Note that

X

contains 1 origin and

n - 2

non-zero real numbers, and we let

p_{X}^{'}

be the distribution on

X

as defined in (52). In addition, define the distribution

p_{X^{n}}^{'} (x^{n}) ≜ \prod_{k = 1}^{n} p_{X}^{'} (x_{k})

. Then, there exists a constant

t^{'} > 0

that depends on P and γ but not n such that the following statements hold for each

n \in N

:

\begin{matrix} E_{p_{X}^{'}} [X^{2}] \leq P (1 - \frac{1}{n^{(1 - γ) / β}}), \end{matrix}

(56)

\begin{matrix} P_{p_{X^{n}}^{'}} \{\frac{1}{n} \sum_{k = 1}^{n} X_{k}^{2} > P\} \leq \frac{e^{3}}{e^{n^{\frac{1}{2} - \frac{1 - γ}{β}}}}, \end{matrix}

(57)

and

\begin{matrix} C (P) - I_{p_{X}^{'} q_{Y | X}} (X; Y) \leq \frac{t^{'} \sqrt{\log n}}{n^{(1 - γ) / β}} . \end{matrix}

(58)

A shortcoming of Proposition 3 is that the

{(n, J_{I}^{SE}, P, A)}_{avg}

polar code may not satisfy the peak power constraint (41) and hence it may not qualify as an

(n, 2^{\sum_{i = 1}^{N} | J_{i}^{SE} |}, P)

code (cf. Definition 11). Therefore, we describe in the following definition a slight modification of an

{(n, J_{I}^{SE}, P, A)}_{avg}

polar code so that the modified polar code always satisfies the peak power constraint (41).

Definition 15.

The 0-power-outage version of an

{(n, J_{I}, P, A)}_{avg}

polar code is an

(n, 2^{\sum_{i = 1}^{N} | J_{i} |})

code which follows identical encoding and decoding operations of the

{(n, J_{I}, P, A)}_{avg}

polar code except that the source will modify the input symbol in a time slot k if the following scenario occurs: Let

X^{n}

be the desired codeword generated by the source according to the encoding operation of the

{(n, J_{I}, P, A)}_{avg}

polar code, where the randomness of

X^{n}

originates from the information bits

(U_{i, J_{i}} : i \in I)

and the frozen bits

B_{I, J_{I}^{c}}

. If transmitting the desired symbol

X_{k}

at time k results in violating the power constraint

\frac{1}{n} \sum_{ℓ = 1}^{k} X_{ℓ}^{2} > P

, the source will transmit the symbol 0 at time k instead. An

(n, 2^{\sum_{i = 1}^{N} | J_{i} |}, ε)

code is called an

{(n, J_{I}, P, A, ε)}_{peak}

polar code if it is the 0-power-outage version of some

{(n, J_{I}, P, A)}_{avg}

polar code.

By Definition 15, every

{(n, J_{I}, P, A, ε)}_{peak}

polar code satisfies the peak power constraint (41) and hence achieves zero power outage, i.e.,

P {\frac{1}{n} \sum_{k = 1}^{n} X_{k}^{2} > P} = 0

. Using Definition 15, we obtain the following corollary which states the obvious fact that the probability of power outage of an

{(n, J_{I}, P, A)}_{avg}

polar code can be viewed as part of the probability of error of the 0-power-outage version of the code.

Corollary 1.

Given an

{(n, J_{I}, P, A)}_{avg}

-polar code, define

ε_{1} ≜ P {{\hat{W}}_{I} \neq W_{I}}

(59)

and

ε_{2} ≜ P \{\frac{1}{n} \sum_{k = 1}^{n} X_{k}^{2} > P\} .

(60)

Then, the 0-power-outage version of the

{(n, J_{I}, P, A)}_{avg}

polar code is an

{(n, J_{I}, P, A, ε_{1} + ε_{2})}_{peak}

polar code that satisfies the peak power constraint (41).

5. Scaling Exponents and Main Result

5.1. Scaling Exponent of Uniform-Input Polar Codes for MACs

We define scaling exponent of uniform-input polar codes for the binary-input MAC as follows.

Definition 16.

Fix an

ε \in (0, 1)

and an N-source binary-input MAC

q_{Y | X_{I}}

with symmetric sum-capacity

C_{sum}

(cf. Definition 7). The scaling exponent of uniform-input polar codes for the MAC is defined as

\begin{matrix} μ_{ε}^{PC - MAC} ≜ \\ \underset{m \to \infty}{\lim \inf} \inf_{J_{I}} \{\frac{- \log n}{\log |C_{sum} - \frac{\sum_{i = 1}^{N} | J_{i} |}{n}|}| n = 2^{m}, there exists a uniform - input (n, J_{I}, ε) - polar code on q_{Y | X_{I}}\} . \end{matrix}

(61)

Definition 16 formalizes the notion that we are seeking the smallest

μ \geq 0

such that

| C_{sum} - R_{n}^{MAC} | = O (n^{- \frac{1}{μ}})

holds where

R_{n}^{MAC} ≜ \frac{| J |}{n}

denotes the rate of an

(n, J_{I}, ε)

polar code. Using the existing results in Section IV-C of [5] and Theorem 2 of [4], we know that

3.579 \leq μ_{ε}^{PC - BMSC} \leq β = 4.714 \forall ε \in (0, 1)

(62)

for the special case

N = 1

where the binary-input MAC is a BMSC. We note from Theorem 48 of [17] (and also [18,19]) that the optimal scaling exponent (optimized over all codes) for any non-degenerate discrete memoryless channel (DMC) as well as BMC is equal to 2 for all

ε \in (0, 1 / 2)

.

Using Proposition 1 and Definition 16, we obtain the following corollary, which shows that 4.714, the upper bound on

μ_{ε}^{PC - BMSC}

in (62) for BMSCs, remains a valid upper bound on the scaling exponent for binary-input MACs.

Corollary 2.

Fix any

ε \in (0, 1)

and any binary-input MAC

q_{Y | X_{I}}

. Then,

μ_{ε}^{PC - MAC} \leq β = 4.714 .

5.2. Scaling Exponent of Uniform-Input Polar Codes for the AWGN Channel

Definition 17.

Fix a

P > 0

and an

ε \in (0, 1)

. The scaling exponent of uniform-input polar codes for the AWGN channel is defined as

\begin{matrix} μ_{P, ε}^{PC - AWGN} ≜ \\ \underset{m \to \infty}{\lim \inf} \inf_{J_{I}, A} \{\frac{- \log n}{\log |C (P) - \frac{\sum_{i = 1}^{N} | J_{i} |}{n}|}| n = 2^{m}, there exists a uniform - input {(n, J_{I}, P, A, ε)}_{peak} - polar code\} . \end{matrix}

(63)

Definition 17 formalizes the notion that we are seeking the smallest

μ \geq 0

such that

|C (P) - R_{n}^{AWGN}| = O (n^{- \frac{1}{μ}})

holds where

R_{n}^{AWGN} ≜ \frac{\sum_{i = 1}^{N} | J_{i} |}{n}

denotes the rate of an

{(n, J_{I}, P, ε)}_{peak}

polar code. We note from Theorem 54 of [17] and Theorem 5 of [19] that that the optimal scaling exponent of the optimal code for the AWGN channel is equal to 2 for any

ε \in (0, 1 / 2)

. The following theorem is the main result of this paper, which shows that

4.714

is a valid upper bound on the scaling exponent of polar codes for the AWGN channel.

Theorem 1.

Fix any

P > 0

and any

ε \in (0, 1)

. There exists a constant

t^{*} > 0

that does not depend on n such that the following holds. For any

n = 2^{m}

where

m \in N

, there exists an

A

such that the corresponding

{(n, J_{I}^{SE}, P, A)}_{peak}

polar code defined for the AWGN channel

q_{Y | X}

satisfies

\begin{matrix} \frac{1}{n} \sum_{i = 1}^{N} | J_{i}^{SE} | \geq C (P) - \frac{t^{*} \log n}{n^{1 / β}} \end{matrix}

(64)

and

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq \frac{\log n}{n^{3}} + \frac{e^{3}}{e^{n^{\frac{1}{2} - \frac{1}{β}}}} . \end{matrix}

(65)

In particular, we have,

μ_{P, ε}^{PC - AWGN} \leq β = 4.714 .

(66)

Proof.

Fix a

P > 0

, an

ε \in (0, 1)

and an

n = 2^{m}

where

m \in N

. Combining Proposition 3 and Lemma 3, we conclude that there exists a constant

t^{*} > 0

that does not depend on n and an

A

such that the corresponding

{(n, J_{I}^{SE}, P, A)}_{avg}

polar code defined for the AWGN channel

q_{Y | X}

satisfies (64),

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq \frac{\log n}{n^{3}}, \end{matrix}

(67)

and

P \{\frac{1}{n} \sum_{k = 1}^{n} X_{k}^{2} > P\} \leq \frac{e^{3}}{e^{n^{\frac{1}{2} - \frac{1}{β}}}} .

(68)

Using (67), (68) and Corollary 1, we conclude that the

{(n, J_{I}^{SE}, P, A)}_{avg}

polar code is an

{(n, J_{I}^{SE}, P, A)}_{peak}

polar code that satisfies (64) and (65). Since

\begin{matrix} \frac{\log n}{n^{3}} + \frac{e^{3}}{e^{n^{\frac{1}{2} - \frac{1}{β}}}} \leq ε \end{matrix}

(69)

for all sufficiently large n, it follows from (64), (65) and Definition 17 that (66) holds. ☐

6. Moderate Deviations Regime

6.1. Polar Codes That Achieve the Symmetric Capacity of a BMC

The following result is based on Section IV of [4], which developed a tradeoff between the gap to capacity and the decay rate of the error probability for a BMC under the moderate deviations regime [15] where both the gap to capacity and the error probability vanish as n grows.

Lemma 4.

([4], Section IV) There exists a universal constant

t_{MD} > 0

such that the following holds. Fix any

γ \in (\frac{1}{1 + β}, 1)

and any BMC characterized by

q_{Y | X}

. Recall that

p_{X}^{*}

denotes the uniform distribution on

{0, 1}

. Then for any

n = 2^{m}

where

m \in N

, we have

\begin{matrix} \frac{1}{n} |\{k \in {1, 2, \dots, n} |Z^{[p_{X}^{*}; q_{Y | X}]} (U_{k} | U^{k - 1}, Y^{n}) \leq 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}}\}| \geq I_{p_{X}^{*} q_{Y | X}} (X; Y) - \frac{t_{MD}}{n^{(1 - γ) / β}} \end{matrix}

(70)

where

h_{2} : [0, 1 / 2] \to [0, 1]

denotes the binary entropy function.

6.2. Polar Codes that Achieve the Symmetric Sum-Capacity of a Binary-Input MAC

The following lemma, whose proof is omitted because it is analogous to the proof of Lemma 2, is a direct consequence of Lemma 4.

Lemma 5.

There exists a universal constant

t_{MD} > 0

such that the following holds. Fix any

γ \in (\frac{1}{1 + β}, 1)

and any binary-input MAC characterized by

q_{Y | X_{I}}

. Recall that

p_{X_{I}}^{*} = \prod_{i \in I} p_{X_{i}}^{*}

. Then for any

n = 2^{m}

where

m \in N

, we have

\begin{matrix} \frac{1}{n} |\{k \in {1, 2, \dots, n} |Z^{[p_{X_{I}}^{*}; q_{Y | X_{I}}]} (U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}) \leq 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}}\}| \\ \geq I_{p_{X_{I}}^{*} q_{Y | X_{I}}} (X_{i}; X_{[i - 1]}, Y) - \frac{t_{MD}}{n^{(1 - γ) / β}} \end{matrix}

(71)

for each

i \in I

.

Combining Lemma 5, Definition 7 and Proposition 1, we obtain the following proposition, whose proof is analogous to the proof of Proposition 2 and hence omitted.

Proposition 4.

There exists a universal constant

t_{MD} > 0

such that the following holds. Fix any

γ \in (\frac{1}{1 + β}, 1)

and any N-source binary-input MAC characterized by

q_{Y | X_{I}}

. In addition, fix any

m \in N

, let

n = 2^{m}

and define

J_{i}^{MD} ≜ \{k \in {1, 2, \dots, n} |Z^{[p_{X_{I}}; q_{Y | X_{I}}]} (U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}) \leq 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}}\}

(72)

for each

i \in I

where the superscript “MD” stands for “moderate deviations”. Then, the corresponding uniform-input

(n, J_{I}^{MD})

polar code described in Definition 9 satisfies

\begin{matrix} \frac{\sum_{i = 1}^{N} |J_{i}^{MD}|}{n} \geq C_{sum} - \frac{t_{MD} N}{n^{(1 - γ) / β}} \end{matrix}

(73)

and

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq N n 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}} . \end{matrix}

(74)

6.3. Uniform-Input Polar Codes for the AWGN Channel

Proposition 5.

There exists a universal constant

t_{MD} > 0

such that the following holds. Fix any

γ \in (\frac{1}{1 + β}, 1)

. Suppose we are given an

{(n, J_{I}^{MD}, P, A)}_{avg}

polar code (cf. Definition 14) defined for the AWGN channel

q_{Y | X}

with a 2-origin

A

(i.e.,

A \supseteq {0, 0^{-}})

. Define

X ≜ A ∖ {0^{-}} \subset R

where

X

contains 1 origin and

n - 2

non-zero real numbers. Then, the

{(n, J_{I}^{MD}, P, A)}_{avg}

polar code is an

(n, M)

-code (cf. Definition 1) which satisfies

\begin{matrix} \frac{1}{n} \log M \geq I_{p_{X}^{'} q_{Y | X}} (X; Y) - \frac{t_{MD} \log n}{n^{(1 - γ) / β}}, \end{matrix}

(75)

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq (\log n) n 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}}, \end{matrix}

(76)

and

\begin{matrix} P \{X^{n} = x^{n}\} = \prod_{k = 1}^{n} P \{X_{k} = x_{k}\} = \prod_{k = 1}^{n} p_{X}^{'} (x_{k}) \end{matrix}

(77)

for all

x^{n} \in X^{n}

where

p_{X}^{'}

is the distribution on

X

as defined in (52).

Proof.

The proposition follows from inspecting Proposition 4 and Definition 14 with the identifications

N = \log n

and

\log M = \sum_{i = 1}^{m} | J_{i}^{MD} |

. ☐

The following theorem develops the tradeoff between the gap to capacity and the decay rate of the error probability for

{(n, J_{I}^{MD}, P, A)}_{peak}

polar codes defined for the AWGN channel.

Theorem 2.

Fix a

γ \in (\frac{1}{1 + β}, 1)

. There exists a constant

t_{MD}^{*} > 0

that depends on P and γ but not n such that the following holds for any

n = 2^{m}

where

m \in N

. There exists an

{(n, J_{I}^{MD}, P, A, ε)}_{peak}

polar code defined for the AWGN channel

q_{Y | X}

that satisfies,

\begin{matrix} \frac{1}{n} \sum_{i = 1}^{N} | J_{i}^{MD} | \geq C (P) - \frac{t_{MD}^{*} \log n}{n^{(1 - γ) / β}}, \end{matrix}

(78)

and,

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq (n \log n + e^{3}) 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}} . \end{matrix}

(79)

Proof.

By Proposition 5 and Lemma 3, there exists a constant

t_{MD}^{*} > 0

that depends on P and γ but not n such that for any

n = 2^{m}

where

m \in N

, there exists an

A

and the corresponding

{(n, J_{I}^{MD}, P, A)}_{avg}

polar code that satisfies (78),

\begin{matrix} P {{\hat{W}}_{I} \neq W_{I}} \leq (\log n) n 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}}, \end{matrix}

(80)

and

\begin{matrix} P \{\frac{1}{n} \sum_{k = 1}^{n} X_{k}^{2} > P\} \leq \frac{e^{3}}{e^{n^{\frac{1}{2} - \frac{1 - γ}{β}}}} . \end{matrix}

(81)

Equation (79) remains to be shown. Using (80), (81) and Corollary 1, we conclude that the

{(n, J_{I}^{MD}, P, A)}_{avg}

polar code is an

{(n, J_{I}^{MD}, P, A, ε)}_{peak}

polar code that satisfies,

\begin{matrix} ε & ≜ (n \log n) 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}} + \frac{e^{3}}{e^{n^{\frac{1}{2} - \frac{1 - γ}{β}}}} \end{matrix}

(82)

\begin{matrix} \leq (n \log n + e^{3}) 2^{- n^{γ h_{2}^{- 1} (\frac{γ β + γ - 1}{γ β})}} \end{matrix}

(83)

where the inequality follows from the fact that

h_{2} (x) \geq 2 x

for all

x \in [0, 1 / 2]

. This concludes the proof. ☐

Remark 4.

A candidate of

A

in Theorem 2 can been explicitly constructed according to Lemma 3 with the identification

A \equiv X \cup {0^{-}}

.

7. Concluding Remarks

In this paper, we provided an upper bound on the scaling exponent of polar codes for the AWGN channel (Theorem 1). In addition, in Theorem 2 we have shown a moderate deviations result—namely, the existence of polar codes which obey a certain tradeoff between the gap to capacity and the decay rate of the error probability for the AWGN channel.

Since the encoding and decoding complexities of the binary-input polar code for a BMC are

O (n \log n)

as long as we allow pseudorandom numbers to be shared between the encoder and the decoder for encoding and decoding the randomized frozen bits (e.g., see [3], Section IX), the encoding and decoding complexities of the polar codes for the AWGN channel defined in Definition 14 and Definition 15 are

O (n \log n) \times \log n = O (n \log^{2} n)

. By a standard probabilistic argument, there must exist a deterministic encoder for the frozen bits such that the decoding error of the polar code for the AWGN channel with the deterministic encoder is no worse than the polar code with randomized frozen bits. In the future, it may be fruitful to develop low-complexity algorithms for finding a good deterministic encoder for encoding the frozen bits. Another interesting direction for future research is to compare the empirical performance between our polar codes in Definitions 14 and 15 and the state-of-the-art polar codes. One may also explore various techniques (e.g., list decoding, cyclic redundancy check (CRC), etc.) to improve the empirical performance of the polar codes constructed herein.

Author Contributions

S. L. Fong carried out the research and wrote the paper. V. Y. F. Tan proposed the research topic, suggested and improved the flow of the presentation in the paper, and verified the correctness of the results.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Proposition 1

Unless specified otherwise, all the probabilities in this proof are evaluated according to the distribution induced by the uniform-input

(n, J_{I})

polar code. Consider

\begin{matrix} (A 1) & P {{\hat{W}}_{I} \neq W_{I}} & = \sum_{i = 1}^{N} P \{{{\hat{W}}_{i} \neq W_{i}} \cap {{\hat{W}}_{[i - 1]} = W_{[i - 1]}}\} \\ (A 2) & = \sum_{i = 1}^{N} P \{{{\hat{U}}_{i}^{n} \neq U_{i}^{n}} \cap {{\hat{X}}_{[i - 1]}^{n} = X_{[i - 1]}^{n}}\} \\ (A 3) & = \sum_{i = 1}^{N} \sum_{k = 1}^{n} P \{{{\hat{U}}_{i, k} \neq U_{i, k}} \cap {{\hat{U}}_{i}^{k - 1} = U_{i}^{k - 1}} \cap {{\hat{X}}_{[i - 1]}^{n} = X_{[i - 1]}^{n}}\} \end{matrix}

where (A2) is due to Definition 8 where

{{\hat{W}}_{i} = W_{i}} = {{\hat{U}}_{i}^{n} = U_{i}^{n}}

and

{{\hat{W}}_{[i - 1]} = W_{[i - 1]}} = {{\hat{X}}_{[i - 1]}^{n} = X_{[i - 1]}^{n}}

. For each

i \in I

and each

k \in J_{i}

, we have

\begin{matrix} P \{{{\hat{U}}_{i, k} \neq U_{i, k}} \cap {{\hat{U}}_{i}^{k - 1} = U_{i}^{k - 1}} \cap {{\hat{X}}_{[i - 1]}^{n} = X_{[i - 1]}^{n}}\} \\ \leq \sum_{u_{i, k} \in {0, 1}} \sum_{u_{i}^{k - 1} \in U_{i}^{k - 1}} \sum_{x_{[i - 1]}^{n} \in X_{[i - 1]}^{n}} \int_{Y^{n}} p_{U_{i}^{k}, X_{[i - 1]}^{n}, Y^{n}} (u_{i}^{k}, x_{[i - 1]}^{n}, y^{n}) \\ (A 4) & \times 1 \{p_{U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (u_{i, k} | u_{i}^{k - 1}, x_{[i - 1]}^{n}, y^{n}) \leq p_{U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (1 - u_{i, k} | u_{i}^{k - 1}, x_{[i - 1]}^{n}, y^{n})\} d y^{n} \\ \leq 2 \sum_{u_{i}^{k - 1} \in U_{i}^{k - 1}} \sum_{x_{[i - 1]}^{n} \in X_{[i - 1]}^{n}} \int_{Y^{n}} p_{U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (u_{i}^{k - 1}, x_{[i - 1]}^{n}, y^{n}) \\ (A 5) & \times \sqrt{p_{U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (0 | u_{i}^{k - 1}, x_{[i - 1]}^{n}, y^{n}) p_{U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}} (1 | u_{i}^{k - 1}, x_{[i - 1]}^{n}, y^{n})} d y^{n} \\ (A 6) & = Z^{[p_{X_{I}}; q_{Y | X_{I}}]} (U_{i, k} | U_{i}^{k - 1}, X_{[i - 1]}^{n}, Y^{n}) \end{matrix}

where (A4) follows from (30). In addition, it follows from (27) and (30) that

\begin{matrix} P \{{{\hat{U}}_{i, k} \neq U_{i, k}} \cap {{\hat{U}}_{i}^{k - 1} = U_{i}^{k - 1}} \cap {{\hat{X}}_{[i - 1]}^{n} = X_{[i - 1]}^{n}}\} = 0 \end{matrix}

(A7)

for each

i \in I

and each

k \in J_{i}^{c}

. Combining (A3), (A6) and (A7), we obtain (33).

Appendix B. Proof of Lemma 3

Let

q_{Y | X}

be the conditional distribution that characterizes the AWGN channel and fix a

γ \in [0, 1)

. Recall the definitions of

p_{X}^{'}

and

s_{X}

in (52) and (54) respectively and recall that

Φ_{X}

is the cdf of

s_{X}

. Fix a sufficiently large

n \geq 36

that satisfies

\frac{1}{n^{(1 - γ) / β}} \in (0, \frac{1 + P}{2}]

(A8)

and

Φ_{X}^{- 1} (\frac{1}{n^{1 - (1 - γ) / β}}) \leq - 1 .

(A9)

In addition, recall the definition of

X

in (55) and let

g : R \to X

be a quantization function such that,

g (t) ≜ Φ_{X}^{- 1} (\frac{ℓ}{n})

(A10)

where

ℓ \in {1, 2, \dots, \frac{n}{2}, \dots n - 1}

is the unique integer that satisfies

\begin{matrix} \{\begin{matrix} Φ_{X}^{- 1} (\frac{ℓ}{n}) \leq t < Φ_{X}^{- 1} (\frac{ℓ + 1}{n}) & if t \geq 0, \\ Φ_{X}^{- 1} (\frac{ℓ - 1}{n}) < t \leq Φ_{X}^{- 1} (\frac{ℓ}{n}) & if t < 0 . \end{matrix} \end{matrix}

(A11)

In words, g quantizes every

a \in R

to its nearest point in

X

whose magnitude is smaller than

| a |

. Let

\hat{X} ≜ g (X)

(A12)

be the quantized version of X. By construction,

P_{s_{X}} \{| \hat{X} | \leq | X |\} = 1,

(A13)

P_{s_{X}} \{|Φ_{X} (\hat{X}) - Φ_{X} (X)| \leq \frac{1}{n}\} = 1

(A14)

and

P_{s_{X}} \{X \in (Φ_{X}^{- 1} (\frac{ℓ}{n}), Φ_{X}^{- 1} (\frac{ℓ + 1}{n}))\} = \frac{1}{n}

(A15)

for all

ℓ \in {0, 1, \dots, n - 1}

. It follows from (A15) and the definition of

p_{X}^{'}

in (52) that

\begin{matrix} E_{p_{X}^{'}} [X^{2}] = E_{s_{X}} [{\hat{X}}^{2}] \end{matrix}

(A16)

and

\begin{matrix} P_{p_{X^{n}}^{'}} \{\frac{1}{n} \sum_{k = 1}^{n} X_{k}^{2} > P\} = P_{s_{X^{n}}} \{\frac{1}{n} \sum_{k = 1}^{n} {\hat{X}}_{k}^{2} > P\} \end{matrix}

(A17)

where

s_{X^{n}} (x^{n}) ≜ \prod_{k = 1}^{n} s_{X} (x_{k})

. Consequently, in order to show (56) and (57), it suffices to show

E_{s_{X}} [{\hat{X}}^{2}] \leq P (1 - \frac{1}{n^{(1 - γ) / β}})

(A18)

and

\begin{matrix} P_{s_{X^{n}}} \{\frac{1}{n} \sum_{k = 1}^{n} {\hat{X}}_{k}^{2} > P\} \leq \frac{e^{3}}{e^{n^{\frac{1}{2} - \frac{1 - γ}{β}}}} \end{matrix}

(A19)

respectively. Using (A13) and the definition of

s_{X}

in (54), we obtain (A18). In order to show (A19), we consider the following chain of inequalities:

\begin{matrix} P_{s_{X^{n}}} \{\frac{1}{n} \sum_{k = 1}^{n} {\hat{X}}_{k}^{2} > P\} \\ (A 20) & \leq P_{s_{X^{n}}} \{\frac{1}{n} \sum_{k = 1}^{n} X_{k}^{2} > P\} \\ (A 21) & = P_{s_{X^{n}}} \{\frac{\sum_{k = 1}^{n} X_{k}^{2}}{\sqrt{n} P (1 - \frac{1}{n^{(1 - γ) / β}})} > \sqrt{n} + \frac{n^{\frac{1}{2} - \frac{1 - γ}{β}}}{1 - \frac{1}{n^{(1 - γ) / β}}}\} \\ (A 22) & \leq P_{s_{X^{n}}} \{e^{\frac{\sum_{k = 1}^{n} X_{k}^{2}}{\sqrt{n} P (1 - \frac{1}{n^{(1 - γ) / β}})}} > e^{(\sqrt{n} + n^{\frac{1}{2} - \frac{1 - γ}{β}})}\} \\ (A 23) & \leq {(1 - \frac{1}{\sqrt{n} / 2})}^{- n / 2} e^{- (\sqrt{n} + n^{\frac{1}{2} - \frac{1 - γ}{β}})} \\ (A 24) & = {(1 + \frac{1}{\sqrt{n} / 2 - 1})}^{n / 2} e^{- (\sqrt{n} + n^{\frac{1}{2} - \frac{1 - γ}{β}})} \\ (A 25) & \leq e^{\frac{n}{\sqrt{n} - 2}} \cdot e^{- (\sqrt{n} + n^{\frac{1}{2} - \frac{1 - γ}{β}})} \\ (A 26) & = e^{\frac{2 \sqrt{n}}{\sqrt{n} - 2}} \cdot e^{- n^{\frac{1}{2} - \frac{1 - γ}{β}}} \\ (A 27) & \leq e^{3} \cdot e^{- n^{\frac{1}{2} - \frac{1 - γ}{β}}} \end{matrix}

where

(A20) is due to (A13).
(A23) is due to Markov’s inequality.
(A25) is due to the fact that ${(1 + \frac{1}{t})}^{t} \leq e$ for all $t > 0$ .
(A27) is due to the assumption that $\sqrt{n} \geq 6$ .

Equation (58) remains to be shown. To this end, we let

q_{Z}

denote the distribution of the standard normal random variable (cf. (1)) and consider

\begin{matrix} C (P - \frac{1}{n^{(1 - γ) / β}}) - I_{p_{X}^{'} q_{Y | X}} (X; Y) \\ (A 28) & = C (P - \frac{1}{n^{(1 - γ) / β}}) - I_{p_{X}^{'} q_{Z}} (X; X + Z) \\ (A 29) & = I_{s_{X} q_{Z}} (X; X + Z) - I_{p_{X}^{'} q_{Z}} (X; X + Z) \end{matrix}

where (A29) is due to the definition of

s_{X}

in (54). In order to simplify the right hand side of (A29), we invoke Corollary 4 of [20] and obtain

\begin{matrix} I_{s_{X} q_{Z}} (X; X + Z) - I_{p_{X}^{'} q_{Z}} (X; X + Z) \leq (\log e) (3 \sqrt{1 + P} + 4 \sqrt{P}) W_{2} (s_{Y}, p_{Y}^{'}) . \end{matrix}

(A30)

After some tedious calculations which will be elaborated after this proof, it can be shown that the Wasserstein distance in (A30) satisfies

W_{2} (s_{Y}, p_{Y}^{'}) \leq \sqrt{\frac{κ \log n}{n^{\frac{2 (1 - γ)}{β}}}},

(A31)

where

κ ≜ P^{2} + 4 P + \frac{4 P}{\log e} (1 + \log \sqrt{\frac{P}{2 π}}) .

(A32)

On the other hand, since

\frac{\log (1 + P - ξ)}{\log e} \geq \frac{\log (1 + P)}{\log e} - \frac{ξ}{1 + P} - \frac{2 ξ^{2}}{{(1 + P)}^{2}}

(A33)

for each

ξ \in (0, \frac{1 + P}{2}]

by Taylor’s theorem and

\frac{1}{n^{(1 - γ) / β}} \in (0, \frac{1 + P}{2}]

by (A8), we have

C (P - \frac{1}{n^{\frac{1 - γ}{β}}}) \geq C (P) - \frac{\log e}{2} (\frac{1}{n^{\frac{1 - γ}{β}} (1 + P)} + \frac{2}{n^{\frac{2 (1 - γ)}{β}} {(1 + P)}^{2}}) .

(A34)

Using (A29), (A30), (A31) and (A34), we obtain

\begin{matrix} C (P) - I_{p_{X}^{'} q_{Y | X}} (X; Y) \\ \leq (\log e) (3 \sqrt{1 + P} + 4 \sqrt{P}) \sqrt{\frac{κ \log n}{n^{2 (1 - γ) / β}}} + \frac{\log e}{2} (\frac{1}{n^{\frac{1 - γ}{β}} (1 + P)} + \frac{2}{n^{\frac{2 (1 - γ)}{β}} {(1 + P)}^{2}}) . \end{matrix}

(A35)

Consequently, (58) holds for some constant

t^{'} > 0

that does not depend on n.

Derivation of (A31)

Consider the distribution (coupling)

r_{X, \hat{X}, Y, \hat{Y}}

defined as

r_{X, \hat{X}, Y, \hat{Y}} (x, \hat{x}, y, \hat{y}) = s_{X} (x) q_{Z} (y - x) 1 {\hat{x} = g (x)} 1 {\hat{y} = g (x) + y - x}

(A36)

and simplify the Wasserstein distance in (A30) as follows:

\begin{matrix} (A 37) & {(W_{2} (s_{Y}, p_{Y}^{'}))}^{2} & \leq \int_{R} \int_{R} r_{Y, \hat{Y}} (y, \hat{y}) {(y - \hat{y})}^{2} d \hat{y} d y \\ (A 38) & = \int_{R} \int_{R} r_{X, \hat{X}} (x, \hat{x}) {(x - \hat{x})}^{2} d \hat{x} d x \\ (A 39) & = \int_{R} s_{X} (x) {(x - g (x))}^{2} d x \end{matrix}

where

(A37) follows from the definition of $W_{2}$ in (6) and the fact due to (A36) that $r_{Y} = s_{Y}$ and $r_{\hat{Y}} = p_{Y}^{'}$ .
(A38) follows from the fact due to (A36) that $P_{r_{X, \hat{X}, Y, \hat{Y}}} {Y - \hat{Y} = X - \hat{X}} = 1$ .
(A39) is due to (A36).

Following (A39), we define

ξ_{n}

to be the positive number that satisfies

\begin{matrix} \int_{ξ_{n}}^{\infty} s_{X} (x) d x = \frac{1}{n^{1 - (1 - γ) / β}} \end{matrix}

(A40)

and consider

\begin{matrix} \int_{R} s_{X} (x) {(x - g (x))}^{2} d x \\ (A 41) & = \int_{- \infty}^{- ξ_{n}} s_{X} (x) {(x - g (x))}^{2} d x + \int_{- ξ_{n}}^{ξ_{n}} s_{X} (x) {(x - g (x))}^{2} d x + \int_{ξ_{n}}^{\infty} s_{X} (x) {(x - g (x))}^{2} d x \\ (A 42) & \leq \int_{- \infty}^{- ξ_{n}} s_{X} (x) x^{2} d x + \int_{- ξ_{n}}^{ξ_{n}} s_{X} (x) {(x - g (x))}^{2} d x + \int_{ξ_{n}}^{\infty} s_{X} (x) x^{2} d x \\ (A 43) & = 2 \int_{ξ_{n}}^{\infty} s_{X} (x) x^{2} d x + \int_{- ξ_{n}}^{ξ_{n}} s_{X} (x) {(x - g (x))}^{2} d x \end{matrix}

where (A42) follows from the fact due to (A10) that

x - g (x) \geq 0

for all

x \in R

. In order to bound the first term in (A43), we let

P_{n} ≜ P (1 - \frac{1}{n^{(1 - γ) / β}})

(A44)

and consider

\begin{matrix} (A 45) & \int_{ξ_{n}}^{\infty} s_{X} (x) x^{2} d x & = P_{n} (\int_{ξ_{n}}^{\infty} s_{X} (x) d x + ξ_{n} s_{X} (ξ_{n})) \\ (A 46) & < \int_{ξ_{n}}^{\infty} s_{X} (x) d x (2 P_{n} + ξ_{n}^{2}) \\ (A 47) & < \frac{1}{n^{1 - (1 - γ) / β}} (2 P + ξ_{n}^{2}) \end{matrix}

where

(A45) follows from integration by parts.
(A46) is due to the simple fact that

$\begin{matrix} (A 48) & \int_{ξ_{n}}^{\infty} s_{X} (x) d x & > \frac{ξ_{n}^{2}}{P_{n} + ξ_{n}^{2}} \int_{ξ_{n}}^{\infty} (1 + \frac{P_{n}}{x^{2}}) s_{X} (x) d x \\ (A 49) & = (\frac{P_{n}}{P_{n} + ξ_{n}^{2}}) ξ_{n} s_{X} (ξ_{n}) . \end{matrix}$
(A47) is due to (A40) and (A44).

In order to bound the term in (A47), we note that

ξ_{n} \geq 1

(A50)

by (A9) and (A40) and would like to obtain an upper bound on

ξ_{n}

through the following chain of inequalities:

\begin{matrix} (A 51) & \frac{1}{n^{1 - (1 - γ) / β}} & = \int_{ξ_{n}}^{\infty} s_{X} (x) d x \\ (A 52) & \leq \int_{ξ_{n}}^{\infty} x s_{X} (x) d x \\ (A 53) & = P_{n} s_{X} (ξ_{n}) \\ (A 54) & = \sqrt{\frac{P_{n}}{2 π}} e^{- \frac{ξ_{n}^{2}}{2 P_{n}}} \\ (A 55) & < \sqrt{\frac{P}{2 π}} e^{- \frac{ξ_{n}^{2}}{2 P}} \end{matrix}

where

(A51) is due to (A40).
(A52) is due to (A50).

Since

ξ_{n}^{2} \leq \frac{2 P}{\log e} ((1 - \frac{1 - γ}{β}) \log n + \log \sqrt{\frac{P}{2 π}})

(A56)

by (A55) and

β = 4.714 > 3

, it follows from (A47) that

\begin{matrix} (A 57) & \int_{ξ_{n}}^{\infty} s_{X} (x) x^{2} d x & < \frac{1}{n^{1 - \frac{1 - γ}{β}}} (2 P + \frac{2 P}{\log e} ((1 - \frac{1 - γ}{β}) \log n + \log \sqrt{\frac{P}{2 π}})) \\ (A 58) & < \frac{\log n}{n^{\frac{2 (1 - γ)}{β}}} (2 P + \frac{2 P}{\log e} (1 + \log \sqrt{\frac{P}{2 π}})) . \end{matrix}

In order to bound the second term in (A43), we consider

\begin{matrix} \int_{- ξ_{n}}^{ξ_{n}} s_{X} (x) {(x - g (x))}^{2} d x \\ (A 59) & = \int_{- ξ_{n}}^{ξ_{n}} s_{X} (x) {(Φ^{- 1} (Φ (x)) - Φ^{- 1} (Φ (g (x))))}^{2} d x \\ (A 60) & \leq \int_{- ξ_{n}}^{ξ_{n}} s_{X} (x) {(\frac{1}{n} \times \frac{1}{s_{X} (ξ_{n})})}^{2} d x \\ (A 61) & \leq \int_{- ξ_{n}}^{ξ_{n}} s_{X} (x) (\frac{P_{n}^{2}}{n^{\frac{2 (1 - γ)}{β}}}) d x \\ (A 62) & < \frac{P^{2}}{n^{\frac{2 (1 - γ)}{β}}} \end{matrix}

where

(A60) is due to (A14), the mean value theorem and the fact that the derivative of Φ is always positive and uniformly bounded below by $s_{X} (ξ_{n})$ on the interval $[- ξ_{n}, ξ_{n}]$ .
(A61) is due to (A53).
Combining (A39), (A43), (A58) and (A62) and recalling the definition of κ in (A32), we obtain (A31).

References

Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley and Sons Inc.: Hoboken, NJ, USA, 2006. [Google Scholar]
Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
Arıkan, E. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inf. Theory 2009, 55, 1–23. [Google Scholar]
Mondelli, M.; Hassani, S.H.; Urbanke, R. Unified scaling of polar codes: Error exponent, scaling exponent, moderate deviations, and error floors. IEEE Trans. Inf. Theory 2016, 62, 6698–6712. [Google Scholar] [CrossRef]
Hassani, S.H.; Alishahi, K.; Urbanke, R. Finite-length scaling for polar codes. IEEE Trans. Inf. Theory 2014, 60, 5875–5898. [Google Scholar] [CrossRef]
Goldin, D.; Burshtein, D. Improved Bounds on the Finite Length Scaling of Polar Codes. IEEE Trans. Inf. Theory 2014, 60, 6966–6978. [Google Scholar] [CrossRef]
Fong, S.L.; Tan, V.Y.F. On the Scaling Exponent of Polar Codes for Binary-Input Energy-Harvesting Channels. IEEE J. Sel. Areas Commun. 2016, 34, 3540–3551. [Google Scholar] [CrossRef]
Mahdavifar, H. Fast Polarization and Finite-Length Scaling for Non-Stationary Channels. arXiv, 2016; arXiv:1611.04203. [Google Scholar]
Şaşoğlu, E.; Telatar, I.; Arıkan, E. Polarization of arbitrary discrete memoryless channels. In Proceedings of the 2009 IEEE Information Theory Workshop, Taormina, Italy, 11–16 October 2009; pp. 114–118. [Google Scholar]
Sutter, D.; Renes, J.M.; Dupuis, F.; Renner, R. Achieving the capacity of any DMC using only polar codes. In Proceedings of the 2012 IEEE Information Theory Workshop, Lausanne, Switzerland, 3–7 September 2012; pp. 114–118. [Google Scholar]
Honda, J.; Yamamoto, H. Polar coding without alphabet extension for asymmetric models. IEEE Trans. Inf. Theory 2013, 59, 7829–7838. [Google Scholar] [CrossRef]
Mondelli, M.; Urbanke, R.; Hassani, S.H. How to Achieve the Capacity of Asymmetric Channels. In Proceedings of the 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 30 September–3 October 2014; pp. 789–796. [Google Scholar]
Abbe, E.; Barron, A. Polar coding schemes for the AWGN channel. In Proceedings of the 2011 IEEE International Symposium on Information Theory Proceedings, St. Petersburg, Russia, 31 July–5 August 2011; pp. 194–198. [Google Scholar]
Abbe, E.; Telatar, E. Polar Codes for the m-User Multiple Access Channel. IEEE Trans. Inf. Theory 2012, 58, 5437–5448. [Google Scholar] [CrossRef]
Altuğ, Y.; Wagner, A.B. Moderate Deviations in Channel Coding. IEEE Trans. Inf. Theory 2014, 20, 4417–4426. [Google Scholar]
Mahdavifar, H.; El-Khamy, M.; Lee, J.; Kang, I. Achieving the Uniform Rate Region of General Multiple Access Channels by Polar Coding. IEEE Trans. Commun. 2016, 64, 467–478. [Google Scholar] [CrossRef]
Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359. [Google Scholar] [CrossRef]
Strassen, V. Asymptotische Abschätzungen in Shannons Informationstheorie. Trans. Third Prague Conf. Inf. Theory 1962, 689–723. [Google Scholar]
Hayashi, M. Information spectrum approach to second-order coding rate in channel coding. IEEE Trans. Inf. Theory 2009, 55, 4947–4966. [Google Scholar] [CrossRef]
Polyanskiy, Y.; Wu, Y. Wasserstein Continuity of Entropy and Outer Bounds for Interference Channels. IEEE Trans. Inf. Theory 2016, 62, 3992–4002. [Google Scholar] [CrossRef]

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fong, S.L.; Tan, V.Y.F. Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel. Entropy 2017, 19, 364. https://doi.org/10.3390/e19070364

AMA Style

Fong SL, Tan VYF. Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel. Entropy. 2017; 19(7):364. https://doi.org/10.3390/e19070364

Chicago/Turabian Style

Fong, Silas L., and Vincent Y. F. Tan. 2017. "Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel" Entropy 19, no. 7: 364. https://doi.org/10.3390/e19070364

APA Style

Fong, S. L., & Tan, V. Y. F. (2017). Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel. Entropy, 19(7), 364. https://doi.org/10.3390/e19070364

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel

Abstract

1. Introduction

1.1. The Additive White Gaussian Noise Channel

1.2. Polar Codes

1.3. Paper Outline

1.4. Notation

2. Background: Point-to-Point Channels and Existing Polarization Results

2.1. Point-to-Point Memoryless Channels

2.2. Polarization for Binary-Input Memoryless Channels

3. Problem Formulation of Binary-Input MACs and New Polarization Results

3.1. Binary-Input Multiple Access Channels

3.2. Polarization for Binary-Input MACs

3.3. Polar Codes That Achieve the Symmetric Sum-Capacity of a Binary-Input MAC

4. Problem Formulation of the AWGN Channel and New Polarization Results

4.1. The AWGN Channel

4.2. Uniform-Input Polar Codes for the AWGN Channel

5. Scaling Exponents and Main Result

5.1. Scaling Exponent of Uniform-Input Polar Codes for MACs

5.2. Scaling Exponent of Uniform-Input Polar Codes for the AWGN Channel

6. Moderate Deviations Regime

6.1. Polar Codes That Achieve the Symmetric Capacity of a BMC

6.2. Polar Codes that Achieve the Symmetric Sum-Capacity of a Binary-Input MAC

6.3. Uniform-Input Polar Codes for the AWGN Channel

7. Concluding Remarks

Author Contributions

Conflicts of Interest

Appendix A. Proof of Proposition 1

Appendix B. Proof of Lemma 3

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI