Scaling Exponent and Moderate Deviations Asymptotics of Polar Codes for the AWGN Channel

This paper investigates polar codes for the additive white Gaussian noise (AWGN) channel. The scaling exponent $\mu$ of polar codes for a memoryless channel $q_{Y|X}$ with capacity $I(q_{Y|X})$ characterizes the closest gap between the capacity and non-asymptotic achievable rates in the following way: For a fixed $\varepsilon \in (0, 1)$, the gap between the capacity $I(q_{Y|X})$ and the maximum non-asymptotic rate $R_n^*$ achieved by a length-$n$ polar code with average error probability $\varepsilon$ scales as $n^{-1/\mu}$, i.e., $I(q_{Y|X})-R_n^* = \Theta(n^{-1/\mu})$. It is well known that the scaling exponent $\mu$ for any binary-input memoryless channel (BMC) with $I(q_{Y|X})\in(0,1)$ is bounded above by $4.714$, which was shown by an explicit construction of polar codes. Our main result shows that $4.714$ remains a valid upper bound on the scaling exponent for the AWGN channel. Our proof technique involves the following two ideas: (i) The capacity of the AWGN channel can be achieved within a gap of $O(n^{-1/\mu}\sqrt{\log n})$ by using an input alphabet consisting of $n$ constellation points and restricting the input distribution to be uniform; (ii) The capacity of a multiple access channel (MAC) with an input alphabet consisting of $n$ constellation points can be achieved within a gap of $O(n^{-1/\mu}\log n)$ by using a superposition of $\log n$ binary-input polar codes. In addition, we investigate the performance of polar codes in the moderate deviations regime, where both the gap to capacity and the error probability vanish as $n$ grows. An explicit construction of polar codes is proposed to obey a certain tradeoff between the gap to capacity and the decay rate of the error probability for the AWGN channel.


A. The Additive White Gaussian Noise Channel
This paper investigates low-complexity codes over the classical additive white Gaussian noise (AWGN) channel [1, Ch. 9], where a source wants to transmit information to a destination and each received symbol is the sum of the transmitted symbol and an independent Gaussian random variable. More specifically, if $X_k$ denotes the symbol transmitted by the source in the $k^{\text{th}}$ time slot, then the corresponding symbol received by the destination is
$$Y_k = X_k + Z_k, \tag{1}$$
where $Z_k$ is a standard normal random variable. When the transmission lasts for $n$ time slots, i.e., each transmitted codeword consists of $n$ symbols, it is assumed that $Z_1, Z_2, \ldots, Z_n$ are independent and each transmitted codeword $x^n \triangleq (x_1, x_2, \ldots, x_n)$ must satisfy the peak power constraint
$$\frac{1}{n}\sum_{k=1}^{n} x_k^2 \le P, \tag{2}$$
where $P > 0$ is a constant which denotes the permissible power. If we would like to transmit a uniformly distributed message $W \in \{1, 2, \ldots, \lceil 2^{nR} \rceil\}$ across this channel, it was shown by Shannon [2] that the limit of the maximum coding rate $R$ as $n$ approaches infinity (i.e., the capacity) is
$$C(P) \triangleq \frac{1}{2}\log(1+P). \tag{3}$$
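As a concrete reading of the channel model (1), the peak power constraint (2) and the capacity formula (3), the following Python sketch evaluates $C(P)$ in bits per channel use, checks the per-codeword power constraint and passes a codeword through $Y_k = X_k + Z_k$. It is a minimal illustration: the helper names (`awgn_capacity`, `satisfies_power_constraint`, `awgn_channel`) and the seed are our own choices, not notation from this paper.

```python
import math
import random


def awgn_capacity(P: float) -> float:
    """C(P) = (1/2) * log2(1 + P): capacity of the unit-noise-variance AWGN channel in bits/use."""
    if P <= 0:
        raise ValueError("the permissible power P must be positive")
    return 0.5 * math.log2(1.0 + P)


def satisfies_power_constraint(codeword, P: float) -> bool:
    """Per-codeword constraint (1/n) * sum_k x_k^2 <= P."""
    return sum(x * x for x in codeword) / len(codeword) <= P


def awgn_channel(codeword, rng: random.Random):
    """Return Y_k = X_k + Z_k, where Z_1, ..., Z_n are i.i.d. standard normal."""
    return [x + rng.gauss(0.0, 1.0) for x in codeword]


if __name__ == "__main__":
    rng = random.Random(1)
    x = [2.0, -2.0, 2.0, -2.0]                 # average energy exactly 4 per symbol
    print(satisfies_power_constraint(x, 4.0))  # True
    print(awgn_capacity(3.0))                  # 1.0
    print(awgn_channel(x, rng))                # noisy observations (values depend on the seed)
```

Note that (2) must hold for every codeword, not merely on average over the codebook; this distinction is what later forces the power-outage modification of the polar codes constructed in this paper.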

B. Polar Codes
Although the capacity of a memoryless channel was established by Shannon [2] in 1948, low-complexity channel codes that achieve the capacity were not found until Arıkan [3] proposed polar codes, whose encoding and decoding complexities are $O(n\log n)$, for achieving the capacity of a binary-input memoryless symmetric channel (BMSC). This paper investigates the scaling exponent of polar codes [4] for the AWGN channel, a ubiquitous channel model in wireless communications.
The scaling exponent $\mu$ of polar codes for a memoryless channel $q_{Y|X}$ with capacity $I(q_{Y|X})$ characterizes the closest gap between the channel capacity and non-asymptotic achievable rates in the following way: For a fixed $\varepsilon \in (0, 1)$, the gap between the capacity $I(q_{Y|X})$ and the maximum non-asymptotic rate $R_n^*$ achieved by a length-$n$ polar code with average error probability $\varepsilon$ scales as $n^{-1/\mu}$, i.e., $I(q_{Y|X}) - R_n^* = \Theta(n^{-1/\mu})$. It has been shown in [4]-[6] that the scaling exponent $\mu$ for any BMSC with $I(q_{Y|X}) \in (0, 1)$ lies between 3.579 and 4.714, where the upper bound 4.714 was shown by an explicit construction of polar codes. Indeed, the upper bound 4.714 remains valid for any general binary-input memoryless channel (BMC) [7, Lemma 4]. It is well known that polar codes are capacity-achieving for BMCs [8]-[11], and appropriately chosen polar codes are also capacity-achieving for the AWGN channel [12]. In particular, for any $R < C(P)$ and any $\beta < 1/2$, polar codes operated at rate $R$ can be constructed for the AWGN channel such that the error probability is $O(2^{-n^{\beta}})$ [12] and the encoding and decoding complexities are $O(n \log n)$. However, the scaling exponent of polar codes for the AWGN channel has not been investigated yet.
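Polarization, and the gap to capacity that the scaling exponent quantifies, can be observed numerically in the one case where the Bhattacharyya parameters of the synthetic channels are available in closed form: the binary erasure channel (BEC), for which one polarization step maps a Bhattacharyya parameter $Z$ exactly to $2Z - Z^2$ and $Z^2$ [3]. The Python sketch below is restricted to this BEC special case; the threshold `delta` and the function names are illustrative assumptions, and nothing here reproduces the constructions of [4]-[7].

```python
def bec_bhattacharyya(m: int, erasure_prob: float) -> list:
    """Bhattacharyya parameters of the n = 2^m synthetic channels of a BEC.

    For the BEC the recursion Z -> (2Z - Z^2, Z^2) is exact, and each synthetic
    channel is again a BEC whose erasure probability equals its Bhattacharyya parameter.
    """
    z = [erasure_prob]
    for _ in range(m):
        z = [w for zi in z for w in (2 * zi - zi * zi, zi * zi)]
    return z


def good_fraction(m: int, erasure_prob: float, delta: float) -> float:
    """Fraction of synthetic channels whose Bhattacharyya parameter is at most delta."""
    z = bec_bhattacharyya(m, erasure_prob)
    return sum(zi <= delta for zi in z) / len(z)


if __name__ == "__main__":
    capacity = 1 - 0.5                    # BEC with erasure probability 0.5 has capacity 1/2
    for m in (4, 8, 12, 16):
        frac = good_fraction(m, 0.5, 1e-3)
        print(m, frac, capacity - frac)   # gap between capacity and the fraction of good channels
```

Each step preserves the mean of the Bhattacharyya parameters, so the fraction of indices with $Z \le$ `delta` can never exceed $(1-\text{erasure\_prob})/(1-\texttt{delta})$; the shortfall printed above is the kind of finite-$n$ gap whose decay rate the scaling exponent describes.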
In this paper, we construct polar codes for the AWGN channel and show that 4.714 remains a valid upper bound on the scaling exponent. Our construction of polar codes involves the following two ideas: (i) By using an input alphabet consisting of $n$ constellation points and restricting the input distribution to be uniform as suggested in [12], we can achieve the capacity of the AWGN channel within a gap of $O(n^{-1/\mu}\sqrt{\log n})$; (ii) By using a superposition of $\log n$ binary-input polar codes as suggested in [13], we can achieve the capacity of the corresponding multiple access channel (MAC) within a gap of $O(n^{-1/\mu}\log n)$, where the input alphabet of the MAC has $n$ constellation points (i.e., the size of the Cartesian product of the input alphabets corresponding to the $\log n$ input terminals is $n$). The encoding and decoding complexities of our constructed polar codes are $O(n\log^2 n)$. On the other hand, the lower bound 3.579 holds trivially for the constructed polar codes because they are constructed by superposing $\log n$ binary-input polar codes whose scaling exponents are bounded below by 3.579 [5].
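The Cartesian-product view in idea (ii) can be illustrated as follows: each of the $\log n$ binary-input terminals contributes one bit, and the resulting length-$\log n$ tuple selects one of the $n$ constellation points. The Python sketch below assumes a most-significant-bit-first labelling and a hypothetical 4-point constellation (the labelling used in this paper is fixed later, in Definition 14), and checks that independent uniform bits induce the uniform distribution over the $n$ labels.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def bits_to_index(bits) -> int:
    """Read a length-m bit tuple as a binary number, most significant bit first (assumed labelling)."""
    index = 0
    for b in bits:
        index = 2 * index + b
    return index


def induced_input_distribution(points):
    """Distribution of the selected real symbol when each of the m = log2(n) terminals sends a uniform bit."""
    n = len(points)
    m = n.bit_length() - 1
    if n != 2 ** m:
        raise ValueError("the constellation must contain n = 2^m points")
    counts = Counter(points[bits_to_index(bits)] for bits in product((0, 1), repeat=m))
    # Each of the 2^m equally likely bit tuples has probability 1/n.
    return {x: Fraction(c, n) for x, c in counts.items()}


if __name__ == "__main__":
    constellation = [-3.0, -1.0, 1.0, 3.0]  # hypothetical 4-point alphabet (n = 4, two terminals)
    print(induced_input_distribution(constellation))
```

If two labels carry the same real value, `Counter` merges them and that value receives probability $2/n$; this is exactly the situation created later by allowing two origins $0$ and $0^-$.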
In addition, Mondelli et al. [4, Sec. IV] provided an explicit construction of polar codes for any BMSC which obey a certain tradeoff between the gap to capacity and the decay rate of the error probability. More specifically, if the gap to capacity is set to vanish at a rate of $\Theta\big(n^{-\frac{1-\gamma}{\mu}}\big)$ for some $\gamma \in \big(\frac{1}{1+\mu}, 1\big)$, then a length-$n$ polar code can be constructed such that the error probability vanishes at a rate governed by $\gamma$, $\mu$ and the binary entropy function $h_2: [0, 1/2] \to [0, 1]$. This tradeoff was developed under the moderate deviations regime [14], where both the gap to capacity and the error probability vanish as $n$ grows.
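The tradeoff above is expressed through the binary entropy function. For reference, the following Python sketch evaluates $h_2(x) = -x\log x - (1-x)\log(1-x)$ on $[0, 1/2]$ (base-2 logarithms, as throughout this paper) and inverts it by bisection, which is well defined because $h_2$ is continuous and strictly increasing on $[0, 1/2]$. The function names and the default tolerance are our own choices.

```python
import math


def h2(x: float) -> float:
    """Binary entropy function (base-2 logarithms) restricted to [0, 1/2]."""
    if not 0.0 <= x <= 0.5:
        raise ValueError("h2 is considered on [0, 1/2] here")
    if x == 0.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def h2_inv(y: float, tol: float = 1e-12) -> float:
    """Inverse of h2 on [0, 1/2], computed by bisection (h2 is increasing there)."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("h2 maps [0, 1/2] onto [0, 1]")
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(h2(0.11), h2_inv(0.5))
```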
For the AWGN channel, we develop a similar tradeoff under the moderate deviations regime by using our constructed polar codes described above.

C. Paper Outline
This paper is organized as follows. The notation used in this paper is described in the next subsection.

D. Notation
The sets of natural numbers, real numbers and non-negative real numbers are denoted by $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{R}_+$ respectively. For any sets $\mathcal{A}$ and $\mathcal{B}$ and any mapping $f: \mathcal{A} \to \mathcal{B}$, we let $f^{-1}(\mathcal{D})$ denote the set $\{a \in \mathcal{A} \mid f(a) \in \mathcal{D}\}$ for any $\mathcal{D} \subseteq \mathcal{B}$. We let $\mathbf{1}\{\mathcal{E}\}$ be the indicator function of the set $\mathcal{E}$. An arbitrary (discrete or continuous) random variable is denoted by an upper-case letter (e.g., $X$), and the realization and the alphabet of the random variable are denoted by the corresponding lower-case letter (e.g., $x$) and calligraphic letter (e.g., $\mathcal{X}$) respectively. We use $X^n$ to denote the random tuple $(X_1, X_2, \ldots, X_n)$ where each $X_k$ has the same alphabet $\mathcal{X}$. We will take all logarithms to base 2 throughout this paper.
The following notation is used for any random variables $X$ and $Y$ and any real-valued function $g$ with domain $\mathcal{X}$.
We let $p_{Y|X}$ and $p_{X,Y} = p_X p_{Y|X}$ denote the conditional probability distribution of $Y$ given $X$ and the probability distribution of $(X, Y)$ respectively. We let $p_{X,Y}(x, y)$ and $p_{Y|X}(y|x)$ be the evaluations of $p_{X,Y}$ and $p_{Y|X}$ respectively at $(X, Y) = (x, y)$. To make the dependence on the distribution explicit, we let $\mathbb{P}_{p_X}\{g(X) \in \mathcal{A}\}$ denote $\int_{\mathcal{X}} p_X(x)\,\mathbf{1}\{g(x) \in \mathcal{A}\}\,dx$ for any set $\mathcal{A} \subseteq \mathbb{R}$.
The expectation of $g(X)$ is denoted as $\mathbb{E}_{p_X}[g(X)]$. For any $(X, Y, Z)$ distributed according to some $p_{X,Y,Z}$, the entropy of $X$ and the conditional mutual information between $X$ and $Y$ given $Z$ are denoted by $H_{p_X}(X)$ and $I_{p_{X,Y,Z}}(X; Y|Z)$ respectively.
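For simplicity, we sometimes omit the subscript of a notation if it causes no confusion. The relative entropy between $p_X$ and $q_X$ is denoted by
$$D(p_X \| q_X) \triangleq \int_{\mathcal{X}} p_X(x) \log\frac{p_X(x)}{q_X(x)}\,dx.$$
The 2-Wasserstein distance between $p_X$ and $p_Y$ is denoted by
$$W_2(p_X, p_Y) \triangleq \inf_{r_{X,Y}} \sqrt{\mathbb{E}_{r_{X,Y}}\big[(X - Y)^2\big]},$$
where the infimum is taken over all joint distributions $r_{X,Y}$ whose marginals are $p_X$ and $p_Y$. We let $\mathcal{N}(\,\cdot\,; \mu, \sigma^2): \mathbb{R} \to [0, \infty)$ denote the probability density function of a Gaussian random variable whose mean and variance are $\mu$ and $\sigma^2$ respectively, i.e.,
$$\mathcal{N}(z; \mu, \sigma^2) \triangleq \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(z-\mu)^2}{2\sigma^2}}.$$

II. BACKGROUND: POINT-TO-POINT CHANNELS AND EXISTING POLARIZATION RESULTS

In this section, we will review important polarization results related to the scaling exponent of polar codes for binary-input memoryless channels (BMCs).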

A. Point-to-Point Memoryless Channels
Consider a point-to-point channel which consists of one source and one destination, denoted by $s$ and $d$ respectively. Suppose node $s$ transmits information to node $d$ in $n$ time slots. Before any transmission begins, node $s$ chooses message $W$ destined for node $d$, where $W$ is uniformly distributed over the alphabet
$$\mathcal{W} \triangleq \{1, 2, \ldots, M\}, \tag{8}$$
which consists of $M$ elements. For each $k \in \{1, 2, \ldots, n\}$, node $s$ transmits $X_k \in \mathcal{X}$ based on $W$ and node $d$ receives $Y_k \in \mathcal{Y}$ in time slot $k$, where $\mathcal{X}$ and $\mathcal{Y}$ denote respectively the input and output alphabets of the channel. After $n$ time slots, node $d$ declares $\hat{W}$ to be the transmitted $W$ based on $Y^n$. Formally, we define a length-$n$ code as follows.
Definition 1: An $(n, M)$-code consists of the following:
1) A message set $\mathcal{W}$ as defined in (8). Message $W$ is uniform on $\mathcal{W}$.
2) An encoding function $f_k: \mathcal{W} \to \mathcal{X}$ for each $k \in \{1, 2, \ldots, n\}$, where $f_k$ is used by node $s$ for encoding $X_k$ such that $X_k = f_k(W)$.
3) A decoding function $\varphi: \mathcal{Y}^n \to \mathcal{W}$ used by node $d$ for producing the message estimate $\hat{W} = \varphi(Y^n)$.

Definition 2:
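The point-to-point memoryless channel is characterized by an input alphabet $\mathcal{X}$, an output alphabet $\mathcal{Y}$ and a conditional distribution $q_{Y|X}$ such that the following holds for any $(n, M)$-code: For each $k \in \{1, 2, \ldots, n\}$,
$$p_{W, X^n, Y^k} = p_{W, X^n, Y^{k-1}}\, p_{Y_k|X_k},$$
where $p_{Y_k|X_k}(y_k|x_k) = q_{Y|X}(y_k|x_k)$ for all $x_k \in \mathcal{X}$ and $y_k \in \mathcal{Y}$.

For any $(n, M)$-code defined on the point-to-point memoryless channel, let $p_{W, X^n, Y^n, \hat{W}}$ be the joint distribution induced by the code. By Definitions 1 and 2, we can factorize $p_{W, X^n, Y^n, \hat{W}}$ as
$$p_{W, X^n, Y^n, \hat{W}} = p_W \left(\prod_{k=1}^{n} p_{X_k|W}\right)\left(\prod_{k=1}^{n} p_{Y_k|X_k}\right) p_{\hat{W}|Y^n}. \tag{9}$$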

B. Polarization for Binary-Input Memoryless Channels
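Definition 3: A point-to-point memoryless channel characterized by $q_{Y|X}$ is called a binary-input memoryless channel (BMC) if $\mathcal{X} = \{0, 1\}$.

We follow the formulation of polar coding in [10]. Consider any BMC characterized by $q_{Y|X}$. Let $p_X$ be the probability distribution of a Bernoulli random variable $X$, and let $p_{X^n}$ be the distribution of $n$ independent copies of $X \sim p_X$, i.e., $p_{X^n}(x^n) = \prod_{k=1}^{n} p_X(x_k)$ for all $x^n \in \mathcal{X}^n$. For each $n = 2^m$ where $m \in \mathbb{N}$, the polarization mapping of a length-$n$ polar code is given by
$$G_n \triangleq \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes m}, \tag{10}$$
where $\otimes$ denotes the Kronecker power. Define $p_{U^n|X^n}$ such that
$$p_{U^n|X^n}(u^n|x^n) \triangleq \mathbf{1}\{u^n = x^n G_n\},$$
where the addition and product operations are performed over GF(2), define $p_{Y_k|X_k}(y_k|x_k) \triangleq q_{Y|X}(y_k|x_k)$ for each $k \in \{1, 2, \ldots, n\}$ and each $(x_k, y_k) \in \mathcal{X} \times \mathcal{Y}$, where $q_{Y|X}$ characterizes the BMC (cf. Definition 2), and define
$$p_{U^n, X^n, Y^n} \triangleq p_{X^n}\, p_{U^n|X^n} \prod_{k=1}^{n} p_{Y_k|X_k}. \tag{12}$$
In addition, for each $k \in \{1, 2, \ldots, n\}$, define the Bhattacharyya parameter associated with time $k$ as
$$Z(U_k|U^{k-1}, Y^n) \triangleq 2 \sum_{u^{k-1} \in \{0,1\}^{k-1}} \int_{\mathcal{Y}^n} \sqrt{p_{U^k, Y^n}(u^{k-1}, 0, y^n)\, p_{U^k, Y^n}(u^{k-1}, 1, y^n)}\, dy^n, \tag{13}$$
where the distributions in (13) and (14) are marginal distributions of $p_{U^n, X^n, Y^n}$ defined in (12). The following result is based on [4, Sec. III] and has been used in [7] to show that 4.714 is an upper bound on the scaling exponent for any BMC. To simplify notation, let
$$\beta \triangleq 4.714 \tag{15}$$
in the rest of this paper.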
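The polarization mapping (10) never needs to be stored as an $n \times n$ matrix. The Python sketch below computes $x^n G_n$ over GF(2) with the standard butterfly recursion, which uses $O(n\log n)$ XOR operations, and checks it against the explicit Kronecker power for small $n$. It assumes, as in (10), that $G_n$ is the plain Kronecker power without a bit-reversal permutation; the function names are illustrative.

```python
from itertools import product

# Arikan's 2x2 kernel; G_n is its m-fold Kronecker power.
F = [[1, 0], [1, 1]]


def kron(A, B):
    """Kronecker product of two 0/1 matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def kron_power(m: int):
    """Explicit n x n matrix G_n for n = 2^m (only used for small sanity checks)."""
    G = [[1]]
    for _ in range(m):
        G = kron(G, F)
    return G


def encode_matrix(x, G):
    """Row vector times matrix over GF(2): O(n^2) reference implementation."""
    n = len(x)
    return [sum(x[i] & G[i][j] for i in range(n)) % 2 for j in range(n)]


def encode_butterfly(x):
    """x G_n over GF(2) with log2(n) butterfly stages, i.e. O(n log n) XORs."""
    u = list(x)
    n = len(u)
    h = n // 2
    while h >= 1:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                u[j] ^= u[j + h]   # (a, b) -> (a XOR b, b), the action of F on a pair
        h //= 2
    return u


def check_against_matrix(m: int) -> bool:
    """Exhaustively compare both implementations for n = 2^m."""
    G, n = kron_power(m), 2 ** m
    return all(encode_butterfly(list(x)) == encode_matrix(list(x), G)
               for x in product((0, 1), repeat=n))


if __name__ == "__main__":
    print(all(check_against_matrix(m) for m in (1, 2, 3)))  # True
    x = [1, 0, 1, 1, 0, 0, 1, 0]
    print(encode_butterfly(encode_butterfly(x)) == x)      # True: G_n is its own inverse over GF(2)
```

Because the kernel squares to the identity over GF(2), so does $G_n$; hence $u^n = x^n G_n$ is equivalent to $x^n = u^n G_n$, which is how a transmitter recovers the channel input from the bits placed on the polarized channels.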

Lemma 1 ( [4, Sec. III], [7, Lemma 2]):
There exists a universal constant $t > 0$ such that the following holds. Fix any BMC characterized by $q_{Y|X}$ and any $p_X$. Then for any $m \in \mathbb{N}$ and $n \triangleq 2^m$, we have²

III. PROBLEM FORMULATION OF BINARY-INPUT MACS AND NEW POLARIZATION RESULTS
Polar codes have been proposed and investigated for achieving any rate tuple inside the capacity region of a binary-input multiple access channel (MAC) [13, 15]. The goal of this section is to use the polar codes proposed in [13] to achieve the symmetric sum-capacity of a binary-input MAC.

²This lemma continues to hold if the quantities $\frac{1}{n^4}$ are replaced by $\frac{1}{n^{\nu}}$ for any $\nu > 0$. The main result of this paper continues to hold if the quantities $\frac{1}{n^4}$ in this lemma are replaced by $\frac{1}{n^{\nu}}$ for any $\nu > 2$.

A. Binary-Input Multiple Access Channels
Consider a MAC [1, Sec. 15.3] which consists of $N$ sources and one destination. Let $\mathcal{I} \triangleq \{1, 2, \ldots, N\}$ be the index set of the $N$ sources and let $d$ denote the destination. Suppose the sources transmit information to node $d$ in $n$ time slots. Before any transmission begins, node $i$ chooses message $W_i$ destined for node $d$ for each $i \in \mathcal{I}$, where $W_i$ is uniformly distributed over
$$\mathcal{W}_i \triangleq \{1, 2, \ldots, M_i\},$$
which consists of $M_i$ elements. For each $k \in \{1, 2, \ldots, n\}$, node $i$ transmits $X_{i,k} \in \mathcal{X}_i$ based on $W_i$ for each $i \in \mathcal{I}$ and node $d$ receives $Y_k \in \mathcal{Y}$ in time slot $k$, where $\mathcal{X}_i$ denotes the input alphabet for node $i$ and $\mathcal{Y}$ denotes the output alphabet. After $n$ time slots, node $d$ declares $\hat{W}_i$ to be the transmitted $W_i$ based on $Y^n$ for each $i \in \mathcal{I}$.
To simplify notation, we use the following convention for any $\mathcal{T} \subseteq \mathcal{I}$. For any random tuple $(X_1, X_2, \ldots, X_N)$, we let $X_{\mathcal{T}} \triangleq (X_i : i \in \mathcal{T})$ be the corresponding subtuple, whose realization and alphabet are denoted by $x_{\mathcal{T}}$ and $\mathcal{X}_{\mathcal{T}}$ respectively.
Similarly, for each $k \in \{1, 2, \ldots, n\}$ and each random tuple $(X_{1,k}, X_{2,k}, \ldots, X_{N,k}) \in \mathcal{X}_{\mathcal{I}}$, we let $X_{\mathcal{T},k} \triangleq (X_{i,k} : i \in \mathcal{T})$ denote the corresponding random subtuple, and let $x_{\mathcal{T},k}$ and $\mathcal{X}_{\mathcal{T},k}$ denote respectively the realization and the alphabet of $X_{\mathcal{T},k}$. Formally, we define a length-$n$ code for the binary-input MAC as follows.
3) A decoding function $\varphi^{\mathrm{MAC}}: \mathcal{Y}^n \to \mathcal{W}_{\mathcal{I}}$ used by node $d$ for producing the message estimate $\hat{W}_{\mathcal{I}} = \varphi^{\mathrm{MAC}}(Y^n)$.

Definition 5:
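The multiple access channel (MAC) is characterized by $N$ input alphabets specified by $\mathcal{X}_{\mathcal{I}}$, an output alphabet specified by $\mathcal{Y}$ and a conditional distribution $q_{Y|X_{\mathcal{I}}}$ such that the following holds for any $(n, M_{\mathcal{I}})$-code: For each $k \in \{1, 2, \ldots, n\}$,
$$p_{W_{\mathcal{I}}, X_{\mathcal{I}}^n, Y^k} = p_{W_{\mathcal{I}}, X_{\mathcal{I}}^n, Y^{k-1}}\, p_{Y_k|X_{\mathcal{I},k}},$$
where $p_{Y_k|X_{\mathcal{I},k}}(y_k|x_{\mathcal{I},k}) = q_{Y|X_{\mathcal{I}}}(y_k|x_{\mathcal{I},k})$ for all $x_{\mathcal{I},k} \in \mathcal{X}_{\mathcal{I}}$ and $y_k \in \mathcal{Y}$.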

B. Polarization for Binary-Input MACs
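Consider any binary-input MAC characterized by $q_{Y|X_{\mathcal{I}}}$. For each $i \in \mathcal{I}$, let $p_{X_i}$ be the probability distribution of a Bernoulli random variable $X_i$, and let $p_{X_i^n}$ be the distribution of $n$ independent copies of $X_i \sim p_{X_i}$, i.e., $p_{X_i^n}(x_i^n) = \prod_{k=1}^{n} p_{X_i}(x_{i,k})$ for all $x_i^n \in \mathcal{X}_i^n$. Recall the polarization mapping $G_n$ defined in (10). For each $i \in \mathcal{I}$, define $p_{U_i^n|X_i^n}$ such that
$$p_{U_i^n|X_i^n}(u_i^n|x_i^n) \triangleq \mathbf{1}\{u_i^n = x_i^n G_n\},$$
where the addition and product operations are performed over GF(2), and define
$$p_{U_{\mathcal{I}}^n, X_{\mathcal{I}}^n, Y^n}(u_{\mathcal{I}}^n, x_{\mathcal{I}}^n, y^n) \triangleq \prod_{i \in \mathcal{I}} \Big(p_{X_i^n}(x_i^n)\, p_{U_i^n|X_i^n}(u_i^n|x_i^n)\Big) \prod_{k=1}^{n} q_{Y|X_{\mathcal{I}}}(y_k|x_{\mathcal{I},k}). \tag{20}$$
In addition, for each $i \in \mathcal{I}$ and each $k \in \{1, 2, \ldots, n\}$, define $[i-1] \triangleq \{1, 2, \ldots, i-1\}$ and define the Bhattacharyya parameter associated with node $i$ and time $k$ by (21), where the distributions in (21) are marginal distributions of $p_{U_{\mathcal{I}}^n, X_{\mathcal{I}}^n, Y^n}$ defined in (20). The following lemma is a direct consequence of Lemma 1.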

Lemma 2:
There exists a universal constant $t > 0$ such that the following holds. Fix any binary-input MAC characterized by $q_{Y|X_{\mathcal{I}}}$ and any $p_{X_{\mathcal{I}}}$. Then for any $m \in \mathbb{N}$ and $n \triangleq 2^m$, we have for each $i \in \mathcal{I}$.
,Y |Xi as the conditional distribution that characterizes a BMC. The lemma then follows directly from Lemma 1.

C. Polar Codes That Achieve the Symmetric Sum-Capacity of a Binary-Input MAC
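Throughout this paper, let $p^*_{X_i}$ denote the uniform distribution on $\{0, 1\}$ for each $i \in \mathcal{I}$ and define $p^*_{X_{\mathcal{I}}} \triangleq \prod_{i \in \mathcal{I}} p^*_{X_i}$, i.e.,
$$p^*_{X_{\mathcal{I}}}(x_{\mathcal{I}}) = 2^{-N} \tag{23}$$
for any $x_{\mathcal{I}} \in \{0, 1\}^N$.
Definition 7: For a binary-input MAC characterized by $q_{Y|X_{\mathcal{I}}}$, the symmetric sum-capacity is defined to be $I_{p^*_{X_{\mathcal{I}}} q_{Y|X_{\mathcal{I}}}}(X_{\mathcal{I}}; Y)$. The following definition summarizes the polar codes for the binary-input MAC proposed in [13, Sec. IV].
Definition 8: An $(n, \mathcal{J}_{\mathcal{I}}, b_{\mathcal{I}, \mathcal{J}^c_{\mathcal{I}}})$-polar code consists of the following: 1) An index set for information bits transmitted by node $i$, denoted by $\mathcal{J}_i$, for each $i \in \mathcal{I}$. The set $\mathcal{J}^c_i$ is referred to as the index set for frozen bits transmitted by node $i$.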
for all $u_{i,\mathcal{J}_i} \in \{0, 1\}^{|\mathcal{J}_i|}$, where the bits are transmitted through the polarized channels indexed by $\mathcal{J}_i$. For each $i \in \mathcal{I}$ and each $k \in \mathcal{J}^c_i$, let $b_{i,k}$ be the frozen bit to be transmitted by node $i$ in time slot $k$. After $U_i^n$ has been determined, node $i$ transmits $X_i^n$, where $X_i^n = U_i^n G_n$, and $(X_{1,1}, \ldots, X_{1,n}), (X_{2,1}, \ldots, X_{2,n}), \ldots, (X_{N,1}, \ldots, X_{N,n})$ are produced as follows. For each $i \in \mathcal{I}$ and each $k =$ After obtaining $\hat{U}_i^n$, node $d$ constructs the estimate of $X_i^n$ through computing $\hat{X}_i^n = \hat{U}_i^n G_n$ and declares that $\hat{W}$

Remark 1: By inspecting Definition 4 and Definition 8, we see that every $(n, \mathcal{J}_{\mathcal{I}}, b_{\mathcal{I}, \mathcal{J}^c_{\mathcal{I}}})$-polar code is also an $\big(n, (2^{|\mathcal{J}_1|}, 2^{|\mathcal{J}_2|}, \ldots, 2^{|\mathcal{J}_N|})\big)$-code.

Definition 9:
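The uniform-input $(n, \mathcal{J}_{\mathcal{I}})$-polar code is defined as an $(n, \mathcal{J}_{\mathcal{I}}, B_{\mathcal{I}, \mathcal{J}^c_{\mathcal{I}}})$-polar code where $B_{\mathcal{I}, \mathcal{J}^c_{\mathcal{I}}}$ consists of i.i.d. uniform bits that are independent of the message $W_{\mathcal{I}}$.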
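One way to realize the i.i.d. uniform frozen bits of Definition 9 at low cost is to let the encoder and the decoder share a pseudorandom seed, in line with the remark on shared pseudorandom numbers in Section VII. The single-node Python sketch below fills the frozen positions from a seeded generator, places the message bits on the information indices, and transmits $x^n = u^n G_n$. The seed-sharing mechanism, the function names and the toy index set are hypothetical, and successive cancellation decoding is not shown.

```python
import random


def polar_transform(u):
    """Compute u G_n over GF(2), where G_n is the m-fold Kronecker power of [[1, 0], [1, 1]]."""
    x = list(u)
    h = len(x) // 2
    while h >= 1:
        for start in range(0, len(x), 2 * h):
            for j in range(start, start + h):
                x[j] ^= x[j + h]
        h //= 2
    return x


def frozen_bits(n: int, seed: int):
    """Pseudorandom bits that encoder and decoder can both regenerate from a shared seed."""
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(n)]


def encode(message_bits, info_set, n: int, seed: int):
    """Hypothetical single-node uniform-input polar encoder."""
    info = sorted(info_set)
    if len(message_bits) != len(info):
        raise ValueError("need exactly one message bit per information index")
    u = frozen_bits(n, seed)           # positions outside info_set keep these frozen values
    for k, bit in zip(info, message_bits):
        u[k] = bit                     # message bits occupy the information indices
    return polar_transform(u)          # x^n = u^n G_n


if __name__ == "__main__":
    n, seed, info_set = 8, 2017, {3, 5, 6, 7}   # toy index set, not an optimized polar construction
    x = encode([1, 0, 1, 1], info_set, n, seed)
    u = polar_transform(x)                      # G_n is an involution, so u^n = x^n G_n
    frozen = frozen_bits(n, seed)
    assert all(u[k] == frozen[k] for k in range(n) if k not in info_set)
    print([u[k] for k in sorted(info_set)])     # [1, 0, 1, 1]
```

Since $G_n$ is a bijection on $\{0,1\}^n$, uniform message bits together with uniform frozen bits make $X_i^n$ uniform on $\{0,1\}^n$, which matches the uniform input distribution $p^*_{X_i}$ used throughout this section.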
Definition 10: For the uniform-input $(n, \mathcal{J}_{\mathcal{I}})$-polar code defined for the MAC, the probability of decoding error is defined as $\mathbb{P}\{\hat{W}_{\mathcal{I}} \neq W_{\mathcal{I}}\}$, where the error is averaged over the random messages and the frozen bits. The code is also called a uniform-input $(n, \mathcal{J}_{\mathcal{I}}, \varepsilon)$-polar code if the probability of decoding error is no larger than $\varepsilon$.
The following proposition bounds the error probability in terms of Bhattacharyya parameters, and it is a generalization of the well-known result for the special case N = 1 (e.g., see [3,Proposition 2]). The proof of Proposition 3 can be deduced from [13, Sec. IV], and is contained in Appendix A for completeness.

Proposition 3:
For the uniform-input $(n, \mathcal{J}_{\mathcal{I}})$-polar code defined for the MAC $q_{Y|X_{\mathcal{I}}}$, we have (29).

The following proposition follows from combining Lemma 2, Definition 7 and Proposition 3.

Proposition 4:
There exists a universal constant $t > 0$ such that the following holds. Fix any $N$-source binary-input MAC characterized by $q_{Y|X_{\mathcal{I}}}$. Fix any $m \in \mathbb{N}$, let $n = 2^m$ and define $\mathcal{J}^{\mathrm{SE}}_i$ by (30) for each $i \in \mathcal{I}$, where $p^*_{X_{\mathcal{I}}}$ is the uniform distribution as defined in (23) and the superscript "SE" stands for "scaling exponent". Then, the corresponding uniform-input $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}})$-polar code satisfies (31) and (32).

Proof: Let $t > 0$ be the universal constant specified in Lemma 2 and fix an $n$. For each $i \in \mathcal{I}$, it follows from Lemma 2 and Proposition 3 that (33) and (34) hold for the uniform-input $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}})$-polar code. Since $p^*_{X_{\mathcal{I}}} = \prod_{i=1}^{N} p^*_{X_i}$, it follows that (35) holds for each $i \in \mathcal{I}$, which implies (36). Consequently, (31) follows from (33), (36) and Definition 7, and (32) follows from (34).

June 9, 2017 DRAFT
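To see how an index set of this kind translates into the error bound of Proposition 3, the following Python sketch works out a single-source BEC example ($N = 1$, with exact Bhattacharyya parameters). As an assumption for illustration, it keeps the indices whose Bhattacharyya parameter is at most $1/n^4$, the quantity that footnote 2 attaches to Lemma 1, and sums those parameters. Each selected term is at most $n^{-4}$ and there are at most $n$ of them, so the bound is at most $n^{-3}$ per source, and at most $N n^{-3}$ for $N$ sources.

```python
def bec_bhattacharyya(m: int, erasure_prob: float):
    """Exact Bhattacharyya parameters of the 2^m synthetic channels of a BEC."""
    z = [erasure_prob]
    for _ in range(m):
        z = [w for zi in z for w in (2 * zi - zi * zi, zi * zi)]
    return z


def select_information_set(z):
    """Indices k with Z_k <= 1/n^4 (assumed selection rule, see the lead-in)."""
    threshold = float(len(z)) ** -4
    return [k for k, zk in enumerate(z) if zk <= threshold]


def union_bound(z, info_set) -> float:
    """Sum of the Bhattacharyya parameters over the information indices."""
    return sum(z[k] for k in info_set)


if __name__ == "__main__":
    m, erasure_prob = 12, 0.5
    n = 2 ** m
    z = bec_bhattacharyya(m, erasure_prob)
    info = select_information_set(z)
    print("rate", len(info) / n, "capacity", 1 - erasure_prob)
    print("error bound", union_bound(z, info), "<= n^-3 =", n ** -3.0)
    N = m  # with N = log2(n) sources the total bound is at most log2(n) / n^3
    print("N-source bound", N * n ** -3.0)
```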
Remark 2: Proposition 4 shows that the symmetric sum-capacity of a binary-input MAC with $N$ sources can be achieved within a gap of $O(N n^{-1/\beta})$ by using a superposition of $N$ binary-input polar codes.
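The superposition in Remark 2 rests on the chain rule: when the inputs are independent and uniform, $I(X_{\mathcal{I}}; Y) = \sum_{i \in \mathcal{I}} I(X_i; Y, X_{[i-1]})$, so decoding the sources one after another, each with a binary-input code, can add up to the symmetric sum-capacity. The Python sketch below checks this identity by enumeration on a hypothetical two-source example, the noiseless binary adder MAC $Y = X_1 + X_2$ (real addition), which is not used elsewhere in this paper.

```python
import math
from collections import defaultdict
from itertools import product


def entropy(pmf) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def adder_mac_joint():
    """Joint pmf of (X1, X2, Y) for Y = X1 + X2 with independent uniform binary inputs."""
    return {(x1, x2, x1 + x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}


def marginal(joint, coords):
    """Marginal pmf of the coordinates listed in `coords` (0 = X1, 1 = X2, 2 = Y)."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[c] for c in coords)] += p
    return dict(out)


def mutual_information(joint, a, b) -> float:
    """I(A; B) = H(A) + H(B) - H(A, B) for disjoint coordinate groups a and b."""
    return entropy(marginal(joint, a)) + entropy(marginal(joint, b)) - entropy(marginal(joint, a + b))


if __name__ == "__main__":
    joint = adder_mac_joint()
    sum_rate = mutual_information(joint, (0, 1), (2,))   # I(X1, X2; Y)
    first = mutual_information(joint, (0,), (2,))        # I(X1; Y): source 1 decoded first
    second = mutual_information(joint, (1,), (0, 2))     # I(X2; Y, X1): then source 2
    print(sum_rate, first + second)                      # both equal 1.5 bits
```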

A. The AWGN Channel
It is well known that appropriately designed polar codes are capacity-achieving for the AWGN channel [12]. The main contribution of this paper is proving an upper bound on the scaling exponent of polar codes for the AWGN channel by using uniform-input polar codes for binary-input MACs described in Definition 8. The following two definitions formally define the AWGN channel and length-n codes for the channel.
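Definition 11: An $(n, M, P)$-code is an $(n, M)$-code described in Definition 1 subject to the additional assumptions that $\mathcal{X} = \mathbb{R}$ and the peak power constraint
$$\frac{1}{n}\sum_{k=1}^{n} \big(f_k(w)\big)^2 \le P \quad \text{for all } w \in \mathcal{W} \tag{37}$$
is satisfied.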

Definition 12:
The AWGN channel is a point-to-point memoryless channel described in Definition 2 subject to the additional assumptions that $\mathcal{Y} = \mathbb{R}$ and $q_{Y|X}(y|x) = \mathcal{N}(y; x, 1)$ for all $x \in \mathbb{R}$ and $y \in \mathbb{R}$.
Definition 13: For an $(n, M, P)$-code defined on the AWGN channel, we can calculate according to (9) the average probability of error, defined as $\mathbb{P}\{\hat{W} \neq W\}$. We call an $(n, M, P)$-code with average probability of error no larger than $\varepsilon$ an $(n, M, P, \varepsilon)$-code.

B. Uniform-Input Polar Codes for the AWGN Channel
Recall that we would like to use uniform-input polar codes for binary-input MACs described in Definition 8 to achieve the capacity of the AWGN channel, i.e., $C(P)$ in (3). The following definition describes the basic structure of such uniform-input polar codes.
where $\mathbb{R} \cup \{0^-\}$ can be viewed as a line with 2 origins.⁴ We index each element of $\mathcal{A}$ by a unique length-$m$ binary tuple. In addition, $W_{\mathcal{I}}$ is uniform on $\mathcal{W}_{\mathcal{I}}$. We view the uniform-input $(n, \mathcal{J}_{\mathcal{I}})$-polar code as an $\big(n, (2^{|\mathcal{J}_1|}, 2^{|\mathcal{J}_2|}, \ldots, 2^{|\mathcal{J}_N|})\big)$-code (cf. Remark 1) and let $\{f^{\mathrm{MAC}}_{i,k} \mid i \in \mathcal{I}, k \in \{1, 2, \ldots, n\}\}$ and $\varphi^{\mathrm{MAC}}$ denote the corresponding set of encoding functions and the decoding function respectively (cf. Definition 4).

4) An encoding function for each $k \in \{1, 2, \ldots, n\}$, where $f_k$ is used for encoding $W_{\mathcal{I}}$ into $X_k$ such that (42) holds. Note that both the encoded symbols $0$ and $0^-$ in $\mathcal{A}$ result in the same transmitted symbol $0 \in \mathbb{R}$ according to (42). By construction, $f_1(W_{\mathcal{I}}), f_2(W_{\mathcal{I}}), \ldots, f_n(W_{\mathcal{I}})$ are i.i.d. random variables that are uniformly distributed on $\mathcal{A}$, and hence $X_1, X_2, \ldots, X_n$ are i.i.d. real-valued random variables (but not necessarily uniform).

5) A decoding function such that $\hat{W}$

Remark 3: For an $(n, \mathcal{J}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code, the flexibility of allowing $\mathcal{A}$ to contain 2 origins is crucial to proving the main result of this paper. This is because the input distribution which we will use to establish scaling results for the AWGN channel in Theorem 1 can be viewed as the uniform distribution over some set that contains 2 origins, although the input distribution in the real domain, as specified in (48), is not uniform.

⁴Introducing the symbol $0^-$ allows us to create a set of cardinality $n$ which consists of $n - 2$ non-zero real numbers and 2 origins, $0$ and $0^-$.

Proposition 5: There exists a universal constant $t > 0$ such that the following holds. Suppose we are given an $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code defined for the AWGN channel $q_{Y|X}$ with a 2-origin $\mathcal{A}$ (i.e., $\mathcal{A} \supseteq \{0, 0^-\}$). Define $\mathcal{X} \triangleq \mathcal{A} \setminus \{0^-\} \subset \mathbb{R}$, where $\mathcal{X}$ contains 1 origin and $n - 2$ non-zero real numbers. Then, the $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code is an $(n, M)$-code (cf. Definition 1) which satisfies and for all $x^n \in \mathcal{X}^n$, where $p'_X$ is the distribution on $\mathcal{X}$ defined as
$$p'_X(x) \triangleq \begin{cases} \frac{2}{n} & \text{if } x = 0, \\ \frac{1}{n} & \text{if } x \in \mathcal{X} \setminus \{0\}. \end{cases} \tag{48}$$

Proof: The proposition follows from inspecting Proposition 4 and Definition 14 with the identifications $N = \log n$ and

The following lemma, a strengthened version of [12, Th. 6], provides a construction of a good $\mathcal{A}$ which leads to a controllable gap between $C(P)$ and $I_{p'_X q_{Y|X}}(X; Y)$ for the corresponding $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code. Although the following lemma is intuitive, its proof is technical and hence relegated to Appendix B.

Lemma 6: … define $s_X$ by (50) for all $x \in \mathbb{R}$, define $\Phi_X$ to be the cdf of $s_X$, and define $\mathcal{X}$ by (51). Note that $\mathcal{X}$ contains 1 origin and $n - 2$ non-zero real numbers, and we let $p'_X$ be the distribution on $\mathcal{X}$ as defined in (48). In addition, define the distribution $p'_{X^n}(x^n) \triangleq \prod_{k=1}^{n} p'_X(x_k)$.
Then, there exists a constant $t' > 0$ that depends on $P$ and $\gamma$ but not $n$ such that the following statements hold for each $n \in \mathbb{N}$: (52), (53) and (54).

A shortcoming of Proposition 5 is that the $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code may not satisfy the peak power constraint (37), and hence it may not qualify as an $\big(n, 2^{\sum_{i \in \mathcal{I}} |\mathcal{J}^{\mathrm{SE}}_i|}, P\big)$-code (cf. Definition 11). Therefore, we describe in the following definition a slight modification of an $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code so that the modified polar code always satisfies the peak power constraint (37).

Then, the 0-power-outage version of the $(n, \mathcal{J}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code is an $(n, \mathcal{J}_{\mathcal{I}}, P, \mathcal{A}, \varepsilon_1 + \varepsilon_2)_{\mathrm{peak}}$-polar code that satisfies the peak power constraint (37).
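The structure of $\mathcal{X}$ and $p'_X$ can be sketched numerically. The Python code below assumes, for illustration only, that $s_X$ is a zero-mean Gaussian density with a user-chosen variance `sigma2` (the exact $s_X$ of (50) is not reproduced here) and that the $n - 1$ points of $\mathcal{X}$ are the quantiles $\Phi_X^{-1}(\ell/n)$, $\ell = 1, \ldots, n-1$. With $n$ even, $\ell = n/2$ gives the origin, which carries mass $2/n$ under $p'_X$ because it stands for both $0$ and $0^-$. The code then estimates by simulation how often an i.i.d. codeword drawn from $p'_X$ violates the peak power constraint, the shortcoming that the power-outage modification addresses.

```python
import random
from statistics import NormalDist


def quantile_constellation(n: int, sigma2: float):
    """Assumed construction: the quantiles Phi^{-1}(l / n), l = 1, ..., n - 1, of N(0, sigma2).

    For even n the middle quantile (l = n/2) is exactly the origin.
    """
    if n < 2 or n % 2:
        raise ValueError("n must be an even integer >= 2")
    dist = NormalDist(mu=0.0, sigma=sigma2 ** 0.5)
    return [dist.inv_cdf(l / n) for l in range(1, n)]


def p_prime(points, n: int):
    """Mass 2/n on the origin (it stands for both 0 and 0^-), mass 1/n on every other point."""
    return {x: (2.0 if x == 0.0 else 1.0) / n for x in points}


def second_moment(pmf) -> float:
    """E[X^2] under the given pmf."""
    return sum(p * x * x for x, p in pmf.items())


def peak_power_violation_rate(pmf, n: int, P: float, trials: int, rng: random.Random) -> float:
    """Fraction of i.i.d. length-n codewords drawn from pmf with (1/n) * sum x_k^2 > P."""
    points, weights = list(pmf), list(pmf.values())
    violations = 0
    for _ in range(trials):
        codeword = rng.choices(points, weights=weights, k=n)
        if sum(x * x for x in codeword) / n > P:
            violations += 1
    return violations / trials


if __name__ == "__main__":
    n, P = 64, 1.0
    sigma2 = 0.9 * P  # hypothetical power back-off, not the choice made in (50)
    pts = quantile_constellation(n, sigma2)
    pmf = p_prime(pts, n)
    print(len(pts), sum(pmf.values()), second_moment(pmf))
    print(peak_power_violation_rate(pmf, n, P, trials=2000, rng=random.Random(0)))
```

Under these assumptions the second moment of $p'_X$ never exceeds `sigma2`, since on each side of the origin the quantile sum is a left Riemann sum of an increasing function; even so, individual codewords can exceed $P$, which is why the average-power guarantee alone does not yield a code satisfying (37).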

A. Scaling Exponent of Uniform-Input Polar Codes for MACs
We define the scaling exponent of uniform-input polar codes for the binary-input MAC as follows.

[17] and [18]) that the optimal scaling exponent (optimized over all codes) for any non-degenerate DMC (as well as BMC) is equal to 2 for all $\varepsilon \in (0, 1/2)$.
Using Proposition 3 and Definition 16, we obtain the following corollary, which shows that 4.714, the upper bound on $\mu^{\text{PC-BMC}}_{\varepsilon}$ in (57) for BMCs, remains a valid upper bound on the scaling exponent for binary-input MACs.

B. Scaling Exponent of Uniform-Input Polar Codes for the AWGN channel
Definition 17: Fix a $P > 0$ and an $\varepsilon \in (0, 1)$. The scaling exponent of uniform-input polar codes for the AWGN channel is defined as … for all $n = 2^m$, there exists a uniform-input $(n, \mathcal{J}_{\mathcal{I}}, P, \mathcal{A}, \varepsilon)_{\mathrm{peak}}$-polar code …

Definition 17 formalizes the notion that we are seeking the smallest $\mu \ge 0$ such that $C(P) - \frac{1}{n}\sum_{i \in \mathcal{I}} |\mathcal{J}_i| = O(n^{-1/\mu})$, where $\frac{1}{n}\sum_{i \in \mathcal{I}} |\mathcal{J}_i|$ denotes the rate of an $(n, \mathcal{J}_{\mathcal{I}}, P, \mathcal{A}, \varepsilon)_{\mathrm{peak}}$-polar code. We note from [16, Th. 54] and [18, Th. 5] that the optimal scaling exponent (optimized over all codes) for the AWGN channel is equal to 2 for any $\varepsilon \in (0, 1/2)$. The following theorem is the main result of this paper, which shows that 4.714 is a valid upper bound on the scaling exponent of polar codes for the AWGN channel.
Theorem 1: Fix any $P > 0$ and any $\varepsilon \in (0, 1)$. There exists a constant $t^* > 0$ that does not depend on $n$ such that the following holds. For any $n = 2^m$ where $m \in \mathbb{N}$, there exists an $\mathcal{A}$ such that the corresponding $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{peak}}$-polar code defined for the AWGN channel $q_{Y|X}$ satisfies (58) and (59). In particular, (60) holds.

Proof: Fix a $P > 0$, an $\varepsilon \in (0, 1)$ and an $n = 2^m$ where $m \in \mathbb{N}$. Combining Proposition 5 and Lemma 6, we conclude that there exist a constant $t^* > 0$ that does not depend on $n$ and an $\mathcal{A}$ such that the corresponding $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code defined for the AWGN channel $q_{Y|X}$ satisfies (58), (61) and (62). Using (61), (62) and Corollary 7, we conclude that the $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code is an $(n, \mathcal{J}^{\mathrm{SE}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{peak}}$-polar code that satisfies (58) and (59). Since log n n 3 + e 3 e n for all sufficiently large $n$, it follows from (58), (59) and Definition 17 that (60) holds.

A. Polar Codes That Achieve the Symmetric Capacity of a BMC
The following result is based on [4, Sec. IV], which developed a tradeoff between the gap to capacity and the decay rate of the error probability for a BMC under the moderate deviations regime [14] where both the gap to capacity and the error probability vanish as n grows.

B. Polar Codes that Achieve the Symmetric Sum-Capacity of a Binary-Input MAC
The following lemma, whose proof is omitted because it is analogous to the proof of Lemma 2, is a direct consequence of Lemma 9.
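Lemma 10: There exists a universal constant $t_{\mathrm{MD}} > 0$ such that the following holds. Fix any $\gamma \in \big(\frac{1}{1+\beta}, 1\big)$ and any binary-input MAC characterized by $q_{Y|X_{\mathcal{I}}}$. Recall that $p^*_{X_{\mathcal{I}}} = \prod_{i \in \mathcal{I}} p^*_{X_i}$. Then for any $n = 2^m$ where $m \in \mathbb{N}$, we have for each $i \in \mathcal{I}$.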
Combining Lemma 10, Definition 7 and Proposition 3, we obtain the following proposition, whose proof is analogous to the proof of Proposition 4 and hence omitted.
Proposition 11: There exists a universal constant $t_{\mathrm{MD}} > 0$ such that the following holds. Fix any $\gamma \in \big(\frac{1}{1+\beta}, 1\big)$ and any $N$-source binary-input MAC characterized by $q_{Y|X_{\mathcal{I}}}$. In addition, fix any $m \in \mathbb{N}$, let $n = 2^m$ and define for each $i \in \mathcal{I}$, where the superscript "MD" stands for "moderate deviations". Then, the corresponding uniform-input $(n, \mathcal{J}^{\mathrm{MD}}_{\mathcal{I}})$-polar code described in Definition 9 satisfies and

C. Uniform-Input Polar Codes for the AWGN Channel
Proposition 12: There exists a universal constant $t_{\mathrm{MD}} > 0$ such that the following holds. Fix any $\gamma \in$ and for all $x^n \in \mathcal{X}^n$, where $p'_X$ is the distribution on $\mathcal{X}$ as defined in (48).
Proof: The proposition follows from inspecting Proposition 11 and Definition 14 with the identifications $N = \log n$ and

The following theorem develops the tradeoff between the gap to capacity and the decay rate of the error probability for $(n, \mathcal{J}^{\mathrm{MD}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{peak}}$-polar codes defined for the AWGN channel.
Theorem 2: Fix a $\gamma \in \big(\frac{1}{1+\beta}, 1\big)$. There exists a constant $t^*_{\mathrm{MD}} > 0$ that depends on $P$ and $\gamma$ but not $n$ such that the following holds for any $n = 2^m$ where $m \in \mathbb{N}$: There exists an $(n, \mathcal{J}^{\mathrm{MD}}_{\mathcal{I}}, P, \mathcal{A}, \varepsilon)_{\mathrm{peak}}$-polar code defined for the AWGN channel $q_{Y|X}$ that satisfies (72) and (73).

Proof: By Proposition 12 and Lemma 6, there exists a constant $t^*_{\mathrm{MD}} > 0$ that depends on $P$ and $\gamma$ but not $n$ such that for any $n = 2^m$ where $m \in \mathbb{N}$, there exist an $\mathcal{A}$ and the corresponding $(n, \mathcal{J}^{\mathrm{MD}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code that satisfies (72), (74) and (75). It remains to show (73). Using (74), (75) and Corollary 7, we conclude that the $(n, \mathcal{J}^{\mathrm{MD}}_{\mathcal{I}}, P, \mathcal{A})_{\mathrm{avg}}$-polar code is an $(n, \mathcal{J}^{\mathrm{MD}}_{\mathcal{I}}, P, \mathcal{A}, \varepsilon)_{\mathrm{peak}}$-polar code that satisfies where the inequality follows from the fact that $h_2(x) \ge 2x$ for all $x \in [0, 1/2]$. This concludes the proof.
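The elementary inequality $h_2(x) \ge 2x$ used in the last step follows from the concavity of $h_2$ together with $h_2(0) = 0$ and $h_2(1/2) = 1$. Writing $x$ as a convex combination of $0$ and $1/2$ gives, for every $x \in [0, 1/2]$,

```latex
h_2(x) = h_2\!\Big((1-2x)\cdot 0 + 2x\cdot\tfrac{1}{2}\Big)
\;\ge\; (1-2x)\,h_2(0) + 2x\,h_2\!\big(\tfrac{1}{2}\big)
\;=\; 2x .
```

In words, on $[0, 1/2]$ the graph of $h_2$ lies above the chord joining $(0, 0)$ and $(1/2, 1)$.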

Remark 4:
A candidate for $\mathcal{A}$ in Theorem 2 can be explicitly constructed according to Lemma 6 by the identification

VII. CONCLUDING REMARKS
In this paper, we provided an upper bound on the scaling exponent of polar codes for the AWGN channel (Theorem 1). In addition, we have shown in Theorem 2 a moderate deviations result, namely, the existence of polar codes which obey a certain tradeoff between the gap to capacity and the decay rate of the error probability for the AWGN channel.
Since the encoding and decoding complexities of the binary-input polar code for a BMC are $O(n \log n)$ as long as we allow pseudorandom numbers to be shared between the encoder and the decoder for encoding and decoding the randomized frozen bits (e.g., see

APPENDIX A
PROOF OF PROPOSITION 3

Unless specified otherwise, all the probabilities in this proof are evaluated according to the distribution induced by the uniform-input $(n, \mathcal{J}_{\mathcal{I}})$-polar code. Consider where (79) is due to the fact by Definition 8 that For each $i \in \mathcal{I}$ and each $k \in \mathcal{J}_i$, we have where (81) follows from (27). In addition, it follows from (25) and (27) that for each $i \in \mathcal{I}$ and each $k \in \mathcal{J}^c_i$. Combining (80), (83) and (84), we obtain (29).

APPENDIX B
PROOF OF LEMMA 6

Let $q_{Y|X}$ be the conditional distribution that characterizes the AWGN channel and fix a $\gamma \in [0, 1)$. Recall the definitions of $p'_X$ and $s_X$ in (48) and (50) respectively, and recall that $\Phi_X$ is the cdf of $s_X$. Fix a sufficiently large $n \ge 36$ that satisfies and In addition, recall the definition of $\mathcal{X}$ in (51) and let $g: \mathbb{R} \to \mathcal{X}$ be a quantization function such that where $\ell \in \{1, 2, \ldots, \frac{n}{2}, \ldots, n-1\}$ is the unique integer that satisfies In words, $g$ quantizes every $a \in \mathbb{R}$ to its nearest point in $\mathcal{X}$ whose magnitude is smaller than $|a|$. Let be the quantized version of $X$. By construction, and for all $\ell \in \{0, 1, \ldots, n-1\}$. It follows from (92) and the definition of $p'_X$ in (48) that and where $s_{X^n}(x^n) \triangleq \prod_{k=1}^{n} s_X(x_k)$. Consequently, in order to show (52) and (53), it suffices to show (95) and (96) respectively. Using (90) and the definition of $s_X$ in (50), we obtain (95). In order to show (96), we consider the following chain of inequalities: It remains to show (54). To this end, we let $q_Z$ denote the distribution of the standard normal random variable (cf. (1)) and consider $I_{s_X q_Z}(X; X+Z) - I_{p'_X q_Z}(X; X+Z)$, where (106) is due to the definition of $s_X$ in (50). In order to simplify the right-hand side of (106), we invoke [19, Corollary 4] and obtain $I_{s_X q_Z}(X; X+Z) - I_{p'_X q_Z}(X; X+Z) \le (\log e)(3$ After some tedious calculations, which will be elaborated after this proof, it can be shown that the Wasserstein distance in (107) satisfies where κ P 2 + 4P + 4P log e 1 + log P 2π.
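The quantizer $g$ described above can be sketched as follows, under illustrative assumptions: $s_X$ is taken to be a zero-mean Gaussian density with variance `sigma2`, and the points of $\mathcal{X}$ are the quantiles $\Phi_X^{-1}(\ell/n)$, $\ell = 1, \ldots, n-1$ (neither is a transcription of (50) or (51)). Under these assumptions the code confirms, from the Gaussian cdf, that $g(X)$ puts mass $1/n$ on every non-zero point and $2/n$ on the origin. Because $g$ is nondecreasing, $(X, g(X))$ is the monotone coupling, which is optimal on the real line, so $\sqrt{\mathbb{E}[(X - g(X))^2]}$ equals the 2-Wasserstein distance between $s_X$ and the law of $g(X)$; the last function estimates this quantity by Monte Carlo.

```python
import bisect
import random
from statistics import NormalDist


def quantile_constellation(n: int, sigma2: float):
    """Assumed points of X: Phi^{-1}(l / n), l = 1, ..., n - 1, for N(0, sigma2); sorted, 0.0 in the middle."""
    dist = NormalDist(0.0, sigma2 ** 0.5)
    return [dist.inv_cdf(l / n) for l in range(1, n)]


def quantize_toward_zero(a: float, points) -> float:
    """g(a): the point nearest to a among the sorted points whose magnitude does not exceed |a|."""
    if a >= 0:
        return points[bisect.bisect_right(points, a) - 1]  # largest point in [0, a]
    return points[bisect.bisect_left(points, a)]           # smallest point in [a, 0]


def quantized_pmf(points, sigma2: float):
    """Exact law of g(X) for X ~ N(0, sigma2), obtained from differences of the Gaussian cdf."""
    cdf = NormalDist(0.0, sigma2 ** 0.5).cdf
    edges = [float("-inf")] + list(points) + [float("inf")]
    pmf = {}
    for i, x in enumerate(points, start=1):   # edges[i] == x
        if x > 0:       # g(X) = x  iff  x <= X < next point
            lo, hi = x, edges[i + 1]
        elif x < 0:     # g(X) = x  iff  previous point < X <= x
            lo, hi = edges[i - 1], x
        else:           # g(X) = 0  iff  X lies strictly between the two neighbours of 0
            lo, hi = edges[i - 1], edges[i + 1]
        pmf[x] = cdf(hi) - cdf(lo)
    return pmf


def w2_monte_carlo(points, sigma2: float, samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of sqrt(E[(X - g(X))^2]) for X ~ N(0, sigma2)."""
    sigma = sigma2 ** 0.5
    total = 0.0
    for _ in range(samples):
        a = rng.gauss(0.0, sigma)
        total += (a - quantize_toward_zero(a, points)) ** 2
    return (total / samples) ** 0.5


if __name__ == "__main__":
    pts = quantile_constellation(16, 2.0)
    print(quantized_pmf(pts, 2.0))
    print(w2_monte_carlo(pts, 2.0, 20000, random.Random(0)))
```

Since $|g(a)| \le |a|$ for every $a$, quantizing never increases the power of a sample, which is the property used above to pass from $s_X$ to $p'_X$.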
In order to bound the term in (124), we note that by (86) and (117) and would like to obtain an upper bound on $\xi_n$ through the following chain of inequalities: In order to bound the second term in (120), we consider an integral over $[-\xi_n, \xi_n]$, where (137) is due to (91), the mean value theorem and the fact that the derivative of $\Phi_X$ is always positive and uniformly bounded below by $s_X(\xi_n)$ on the interval $[-\xi_n, \xi_n]$.