How to Construct Polar Codes for Ring-LWE-Based Public Key Encryption

There exists a natural trade-off in public key encryption (PKE) schemes based on ring learning with errors (RLWE): we would like a wider error distribution to increase the security, but this comes at the cost of an increased decryption failure rate (DFR). A straightforward remedy is an error-correcting code, which is commonly used in communication systems and already appears in some RLWE-based proposals. However, applying error-correcting codes to these cryptographic schemes is far from simply installing an add-on. Firstly, the residue error term derived by decryption has correlated coefficients, whereas most prevalent error-correcting codes with remarkable error tolerance assume the channel noise to be independent and memoryless. This explains why only simple error-correcting methods are used in existing RLWE-based PKE schemes. Secondly, the correlation among the coefficients of the residue error term makes accurate DFR estimation challenging even for uncoded plaintext. It can be found in the literature that a tighter DFR estimation can effectively create a DFR margin. Thirdly, most error-correcting codes are not designed with security in mind, e.g., syndrome decoding is inherently non-constant-time. A code that is good at error correction might be weak under a variety of attacks. In this work, we propose a polar coding scheme for RLWE-based PKE. A relaxed “independence” assumption is used to derive an uncorrelated residue noise term, and a wireless communication strategy, outage, is used to construct polar codes. Furthermore, some knowledge about the residue noise is exploited to improve the decoding performance. With the parameterization of NewHope Round 2, the proposed scheme creates a considerable DFR margin, which gives a competitive security improvement compared to state-of-the-art benchmarks. Specifically, the security is improved by 28.8%, while a DFR of $2^{-149}$ is achieved for a code rate of 0.25, n = 1024, q = 12,289, and binomial parameter k = 55. Moreover, polar encoding and decoding have a quasilinear complexity $O(N \log_2 N)$ and intrinsically support constant-time implementations.


Error-Correcting for Ring-LWE-Based Public Key Encryption
The ring LWE (RLWE) problem was first introduced in 2010 [1], expanding on the classical version of the problem (i.e., LWE) introduced by Regev in [2]. Key establishment mechanisms based on RLWE, for example NewHope [3], are among the most attractive postquantum proposals. Their quantum security relies on the worst-case approximate shortest independent vectors problem (SIVP), and they are more efficient than plain LWE because of the ring structure. One topic of pressing importance is to refine such schemes for better efficiency and security. In this work, we focus on the issue of error correction for RLWE-based public key encryption.
The key establishment based on RLWE is differentiated into two approaches regarding how to share the secret information. One is the "reconciliation-based approach" proposed given based on an "independence" assumption claiming that the correlation between the coefficients of the residue noise $e \cdot t - s \cdot e' + e''$ is negligible. However, the coefficients are in fact correlated, and the soft decision decoding of LDPC assumes i.i.d. channels. The dependency among the noise coefficients is obvious in the vector representation of $e \cdot t - s \cdot e' + e''$. We have to be careful about the "independence" assumption: the assumption will overestimate (underestimate) the DFR for schemes without (with) error correction, and therefore underestimate (overestimate) the security. This "independence" assumption was relaxed by D'Anvers et al. in [16]. Specifically, the i-th coefficient of the noise term $e \cdot t - s \cdot e' + e''$ is refined in the form of $\mathbf{c}^T \mathbf{s} + g$, where the vector $\mathbf{c}$ is essentially determined by the polynomials $e, e'$, the vector $\mathbf{s}$ by $s, t$, and the scalar $g$ by the i-th coefficient of $e''$. They assumed $\mathbf{c}^T \mathbf{s} + g$ to be i.i.d. conditional on the $\ell_2$-norms of $\mathbf{c}$ and $\mathbf{s}$. The DFR of LAC is interpreted as a weighted DFR averaged over all possible values of $\|\mathbf{s}\|, \|\mathbf{c}\|$. The ternary error terms in LAC make the calculation tractable. However, for a more general ring-(module)-LWE-based encryption with error terms drawn over $\mathbb{Z}$, calculating the marginal distributions $\Pr\{\|\mathbf{s}\|\}$ and $\Pr\{\|\mathbf{c}\|\}$ is no longer trivial. In their prior work [17], they gave another assumption, namely the "Gaussian" assumption, to ease the calculation.
Song et al. interpreted NewHope as a digital communication system in [18]. At the transmitter's end, the binary message $m \in \{0, 1\}^{256}$ is encoded as a codeword enc(m) by repeating m n/256 times. Then, enc(m) is modulated as a vector in $\{0, \lfloor q/2 \rfloor\}^n$. At the receiver's end, upon receiving $v = e \cdot t - s \cdot e' + e'' + \lfloor q/2 \rfloor \cdot \mathrm{enc}(m)$, the additive threshold decoder calculates $v'_i = \sum_{l=0}^{n/256-1} v_{i+256l}$ for $i = 0, 1, \cdots, 255$ and recovers m by hard decision decoding. To analyze the DFR, one needs to take into account two types of dependencies in the noise term: (a) the dependency between the coefficients of $v$ conveying the same message bit of m, i.e., $v_{i+256l}$ for $l = 0, 1, \cdots, n/256 - 1$; (b) the dependency between the n/4 coefficients of $v'$. In [18], $v'_i$ was elegantly written in the form $v'_i = \sum_{j=0}^{511} W_{i,j} + \sum_{l=0}^{n/256-1} n_{i+256l}$, i.e., as the sum of 512 i.i.d. random variables $W_{i,j}$ and n/256 i.i.d. random variables $n_{i+256l}$ for any fixed i. Therefore, the first type of dependency was addressed. As for the second type, Song et al. proved the error term $v'_i$ to be identically distributed for any $i = 0, 1, \cdots, n/4$, and therefore gave a union bound on the DFR. Consequently, a tighter upper bound on the DFR is derived, which is less than $2^{-418}$ for n = 1024 and $2^{-399}$ for n = 512 (the NewHope submission claims an upper bound on the DFR of $2^{-216}$ for n = 1024 and $2^{-213}$ for n = 512). The improved DFR margin enhances the security level without any changes to the original protocol.
The motivation of this work was to investigate how to handle the dependency of RLWE-based PKE and how to adapt modern error-correcting codes to it. We sought a security improvement using the derived DFR margin. A concurrent work can be found in [19], where canonical embedding was employed to derive i.i.d. fading channels with channel state information (CSI) available to the recipient and polar codes were constructed. However, in practice, we prefer not to engage in canonical embedding, because avoiding it lets us: (a) spare ourselves the trouble of switching between the canonical and polynomial representations; (b) avoid the error tolerance loss due to the tailored constellation diagram illustrated in [19]; (c) make the overall scheme comply with the most popular and practical RLWE-based PKE framework, where we only deal with integers on the interval [0, q).

Contribution
The contribution of this paper is as follows.

1. We formulated the RLWE-based PKE as an i.i.d. mod 2Z additive Gaussian noise channel with channel state information (CSI) available to the receiver under a relaxed "independence" assumption: (a) given the residue noise term $e \cdot t - s \cdot e' + e''$, we formulated the RLWE-based PKE as a mod 2Z additive Gaussian noise channel within exactly one code block. We assumed the mod 2Z additive Gaussian channel to be independent under a relaxed assumption compared to the one in [15]; (b) Alice, the decoder, can considerably improve the DFR by exploiting the advantage that the polynomials e and s are generated on her side, so that she can figure out the precise distribution of the Gaussian noise;

2. We employed a telecommunication-engineering strategy, namely outage, to construct polar codes for RLWE-based PKE. The encoding and decoding routines allow quasilinear (i.e., $O(N \log_2 N)$) and constant-time implementations. Experimental results and theoretical estimation of the DFR are also given. Specifically, we derived a new DFR of $2^{-149}$ by SC decoding for the NewHope parameters q = 12,289, n = 1024, a code rate of 0.25, and a larger central binomial parameter k = 55. The DFR margin enabled us to improve the security by 28.8% while keeping the target DFR of $2^{-140}$ (the benchmark in the work of [15,18]) achievable.

Roadmap
This paper is organized as follows. A review of the ring-LWE-based public key encryption and some basics of channel models and polar codes can be found in Section 2. The problem formulation and methodology are introduced in Section 3. In Section 3.1, we explain how to formulate a typical RLWE-based PKE scheme as a mod 2Z channel with additive Gaussian noise. A relaxed "independence" assumption is used to derive i.i.d. channels. We explain the soundness of the proposed scheme in Section 3.2 and demonstrate how to construct and decode polar codes explicitly in Section 3.3. In Section 4, we analyze the DFR theoretically and experimentally when polar decoding (SC decoding) is applied. We, in Section 5, discuss the security improvement, the constant-time implementation, and communication overhead increase by polar encoding and decoding. We conclude this paper in Section 6.

Ring-LWE Public Key Encryption Scheme
The public key encryption scheme based on ring-LWE was first described in [20] and formally defined in a subsequent work [21]. We use the "informal" definition of ring-LWE given in [20], as it then became the most prevalent version in implementations, e.g., NewHope [22] and Peikert's KEM [5]. The scheme is parameterized by an integer modulus q, a dimension n that is a power of two, the ring of integers $R := \mathbb{Z}[X]/(X^n + 1)$, and its quotient ring $R_q := R/qR$. We define an error distribution χ over R. We take the example of NewHope and define sampling from χ to be sampling each coefficient of a polynomial in R from a discrete Gaussian over $\mathbb{Z}$. The scheme proceeds as follows:
• Alice first samples $a \in R_q$ uniformly at random, then she samples a secret key s together with an error e according to χ. She publishes as the public key a ring-LWE sample $(a, b) = (a, a \cdot s + e \bmod q) \in R_q \times R_q$;
• Bob encrypts a message $m \in \{0, 1\}^n$ as $(c_1, c_2) = (a \cdot t + e' \bmod q,\ b \cdot t + e'' + \lfloor q/2 \rfloor \cdot m \bmod q)$, where $e', e'', t$ are sampled independently from χ;
• Alice decrypts using s by computing $d := c_2 - c_1 \cdot s = \lfloor q/2 \rfloor \cdot m + e \cdot t - s \cdot e' + e''$. Alice then recovers the message m by decoding: if the i-th coordinate of d is closer to zero than to $\lfloor q/2 \rfloor$, Alice assumes the i-th coordinate of m was zero; otherwise, she assumes it was one.
We observe a few key facts about this scheme that we need for our work. Firstly, although its formal security proof may be found in [21], the main idea is that $b$, $c_1$, and $c_2$ leak no information about the secret s and the plaintext m because they are ring-LWE samples, which are assumed to be pseudorandom by the hardness of the ring-LWE decision problem. Therefore, one could alter the encoding term $\lfloor q/2 \rfloor \cdot m$ without affecting security, as long as the encoding is independent of the realization of the variables $s, e, e', e'', t$. We use this fact implicitly while constructing polar codes in the sequel. Secondly, we observe that Alice knows the realization of s and e, and so may use these for decoding.
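The three steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the dimension N = 16, the centered binomial sampler with parameter K = 2 (standing in for χ), the schoolbook negacyclic multiplication, and all function names are choices made here for readability, not parameters or routines taken from NewHope or from this paper. With such small errors the residue term never exceeds q/4, so the threshold decoder always succeeds.

```python
import random

Q = 12289   # modulus used in the text
N = 16      # toy dimension (assumption; the text uses n = 512 or 1024)
K = 2       # toy centered-binomial parameter (assumption)

def sample_error(rng):
    """Polynomial with centered-binomial coefficients in [-K, K]."""
    return [sum(rng.getrandbits(1) - rng.getrandbits(1) for _ in range(K))
            for _ in range(N)]

def poly_mul(f, g):
    """Schoolbook product in Z_q[X]/(X^N + 1): X^N wraps around as -1."""
    out = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < N:
                out[i + j] += fi * gj
            else:
                out[i + j - N] -= fi * gj
    return [c % Q for c in out]

def poly_add(f, g):
    return [(x + y) % Q for x, y in zip(f, g)]

def poly_sub(f, g):
    return [(x - y) % Q for x, y in zip(f, g)]

def keygen(rng):
    a = [rng.randrange(Q) for _ in range(N)]
    s, e = sample_error(rng), sample_error(rng)
    return (a, poly_add(poly_mul(a, s), e)), s      # pk = (a, b), sk = s

def encrypt(pk, m, rng):
    a, b = pk
    t, e1, e2 = sample_error(rng), sample_error(rng), sample_error(rng)
    c1 = poly_add(poly_mul(a, t), e1)                # a*t + e'
    c2 = poly_add(poly_add(poly_mul(b, t), e2),      # b*t + e'' + floor(q/2)*m
                  [(Q // 2) * bit for bit in m])
    return c1, c2

def decrypt(sk, ct):
    c1, c2 = ct
    d = poly_sub(c2, poly_mul(c1, sk))               # floor(q/2)*m + residue noise
    # A coefficient closer to 0 (mod q) than to floor(q/2) decodes to 0.
    return [0 if min(x, Q - x) < abs(x - Q // 2) else 1 for x in d]
```

A round trip such as `decrypt(sk, encrypt(pk, m, rng))` returns `m` here because the uncoded threshold decoder tolerates any residue of magnitude below roughly q/4; the coding question studied in this paper arises once the error distribution is widened.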

Channel Models
In wireless communications, the additive white Gaussian noise (AWGN) channel is the most basic and frequently used model to characterize how noise interferes with the channel input. A typical discrete-time AWGN channel is defined as:
$$y_i = x_i + z_i, \quad i = 1, \cdots, N,$$
where $x_i \in \mathbb{R}$ is the channel input, $y_i \in \mathbb{R}$ is the channel output, $z_i$ is additive white Gaussian noise, and there are N time slots in total. Ideally, these variables are independent across the time slots indicated by the subscript i. A fading channel arises due to a time-varying attenuation of signal quality caused by either the propagation environment or the movement of the transmitter/receiver. We consider a fading channel model W as:
$$y_i = h_i x_i + z_i, \quad i = 1, \cdots, N,$$
where $h_i$ is the channel gain and $z_i$ is additive white Gaussian noise. Denote by $T_c$ the coherence interval of a fading channel W. In the context of a fading channel with memory, the channel gain $h_i$ is assumed to be constant within one coherence interval and to vary independently from one coherence interval to the next. The realization of $h_i$ is called channel state information (CSI), and the distribution of $h_i$ is called channel distribution information (CDI). In the special case of $T_c = 1$, channel W is referred to as an independent and identically distributed (i.i.d.) fading channel. The design and performance of error-correcting codes for i.i.d. fading channels with/without CSI are well studied [23][24][25][26][27].
How to design $x_i$ to reliably transmit information at the highest rate via a specific channel has been widely and comprehensively studied over the past decades. A branch of this study is to construct capacity-achieving lattice codes for an AWGN channel and its fading variants [28][29][30][31]. At the transmitter's end, lattice coding maps binary codes to a constellation diagram in Euclidean space, which is called lattice modulation. At the recipient's end, the decoder recovers the binary codes by bounded distance decoding or, preferably, maximum likelihood decoding for better performance. This leads to the definition of the mod Λ channel and the Λ/Λ′ channel, where Λ is a lattice and Λ′ is a sublattice of Λ. We omit the formal definition here, but give an example of a mod Z channel and a Z/2Z channel, which will be used in Section 3.1.

Example 1. A mod Z channel is an additive white Gaussian noise (AWGN) channel with input restricted to $a \in V(\mathbb{Z})$, where $V(\mathbb{Z})$ is the fundamental region of $\mathbb{Z}$. (A fundamental region of a lattice Λ is a region that includes one and only one point from each coset of Λ in $\mathbb{R}^n$. Algebraically, $V(\Lambda)$ is a set of coset representatives for all the cosets of Λ in $\mathbb{R}^n$, e.g., we can define $V(\mathbb{Z})$ to be [0, 1), but not necessarily to be the fundamental Voronoi cell [−0.5, 0.5).) At the receiver's end, there is a mod $V(\mathbb{Z})$ operation giving the equivalent channel output as:
$$y = (a + n) \bmod \mathbb{Z} = (a + n') \bmod \mathbb{Z},$$
where n is the AWGN noise and $n' = n \bmod \mathbb{Z}$.
Example 2. A Z/2Z channel is an AWGN channel with input restricted to $r \in (\mathbb{Z} + a) \cap V(2\mathbb{Z})$ for some offset $a \in \mathbb{R}$. At the receiver's end, the equivalent channel output is:
$$y = (r + n) \bmod 2\mathbb{Z}.$$
It can be viewed as a mod 2Z channel with input restricted to the set of elements of $\mathbb{Z} + a$ that fall in $V(2\mathbb{Z})$.
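To make Example 2 concrete, the following minimal sketch simulates a Z/2Z channel with offset a = 0, so that the input is a bit in {0, 1} and the receiver reduces the noisy value into the fundamental region [0, 2). The function names and the nearest-coset hard decision are illustrative assumptions, not part of the formal definition.

```python
import random

def mod_lattice(y, period):
    """Reduce y into the fundamental region [0, period) of the lattice period*Z."""
    return y % period

def z_over_2z_channel(bit, sigma, rng):
    """Send bit in {0, 1} over AWGN with standard deviation sigma, then apply mod 2Z."""
    return mod_lattice(bit + rng.gauss(0.0, sigma), 2.0)

def hard_decision(y):
    """Pick the input in {0, 1} closest to y in the circular (mod 2) distance."""
    dist_to_zero = min(y, 2.0 - y)   # 0 and 2 are the same point modulo 2Z
    dist_to_one = abs(y - 1.0)
    return 0 if dist_to_zero < dist_to_one else 1
```

Because of the wrap-around, an output just below 2 is decoded as 0 rather than 1; this is the only difference from ordinary threshold detection on the real line.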

Polar Codes for BDMS Channels
Polar codes, introduced by Arıkan in [32], are linear block codes of length $N = 2^n$ for a positive integer n that achieve the capacity of any binary-input discrete memoryless symmetric (BDMS) channel asymptotically (in fact, polar codes have since been generalized to arbitrary code lengths and a large class of channels). We first recall some basics of polar coding for a BDMS channel. Given a BDMS channel W, there are two commonly used metrics in information theory to measure the quality of W: the mutual information (the maximum mutual information over all possible channel input distributions is the channel capacity) and the reliability.
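Definition 1 (Mutual information of BDMS channels). The mutual information I(W) of a channel W is the maximum rate at which information can be successfully transmitted from the transmitter to the receiver. For a BDMS channel $W : \mathcal{X} \to \mathcal{Y}$, $I(W) \in [0, 1]$ is defined as:
$$I(W) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} \frac{1}{2} W(y|x) \log_2 \frac{W(y|x)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)}.$$
Here, we use the definition of symmetric mutual information assuming a uniform channel input, which is also the capacity of the BDMS channel. We use the notations I(W) and I(Y; X) interchangeably to denote the mutual information of W.
Definition 2 (Bhattacharyya parameter of BDMS channels). For a BDMS channel $W : \mathcal{X} \to \mathcal{Y}$, the reliability is measured by the Bhattacharyya parameter $Z(W) \in [0, 1]$, defined as:
$$Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|0) W(y|1)}.$$
A small Z(W) indicates a more reliable channel, while a large Z(W) implies a channel with more interference.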
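The capacity-achieving nature of polar codes arises from the so-called channel polarization phenomenon as a result of recursive applications of Arıkan's transform to identical copies of W and their synthesized derivatives. The overall recursive transform can be performed in a channel-combining phase and a channel-splitting phase. In the channel-combining phase, a linear transformation defined as $X^{1:N} = U^{1:N} G_N$ is applied to the message vector $U^{1:N}$, where $G_N$ is the generator matrix of [32]. By taking $X^{1:N}$ as the raw input of N copies of W, one derives a combined channel $W_N : X^{1:N} \to Y^{1:N}$ with a transition probability of:
$$W_N(y^{1:N} | u^{1:N}) = \prod_{i=1}^{N} W\big(y^{(i)} \,\big|\, (u^{1:N} G_N)_i\big),$$
where $(\cdot)_i$ denotes the i-th coordinate. Since $G_N$ induces a one-to-one mapping between $U^{1:N}$ and $X^{1:N}$, the mutual information of $W_N$ is:
$$I(W_N) = I(U^{1:N}; Y^{1:N}) = I(X^{1:N}; Y^{1:N}) = N \cdot I(W).$$
In the channel-splitting phase, $W_N$ is further split back into N synthesized channels $W_N^{(i)} : \mathcal{X} \to \mathcal{Y}^N \times \mathcal{X}^{i-1}$ whose transition probability is defined by:
$$W_N^{(i)}(y^{1:N}, u^{1:i-1} | u^{(i)}) = \sum_{u^{i+1:N} \in \mathcal{X}^{N-i}} \frac{1}{2^{N-1}} W_N(y^{1:N} | u^{1:N}).$$
We now demonstrate how to perform Arıkan's transform. We begin with the transform on two i.i.d. BDMS channels $W : \{0, 1\} \to \mathcal{Y}$ as shown in Figure 1. Let $X^{1:2} = (X^{(1)}, X^{(2)}) \in \{0, 1\}^2$ be the raw input vector of the two copies of W and $Y^{1:2} = (Y^{(1)}, Y^{(2)}) \in \mathcal{Y}^2$ be the raw channel output vector. Denote by $U^{1:2} = (U^{(1)}, U^{(2)}) \in \{0, 1\}^2$ the message vector. The symbol ⊕ indicates mod-2 addition.
At the channel-combining stage, the message vector $U^{1:2}$ is transformed into $X^{1:2} = (U^{(1)} \oplus U^{(2)}, U^{(2)})$. The two parallel copies of W are seen as a combined channel $W_2 : \{0, 1\}^2 \to \mathcal{Y}^2$. Since there exists a bijection between $U^{1:2}$ and $X^{1:2}$, the transition probability of $W_2$ is:
$$W_2(y^{(1)}, y^{(2)} | u^{(1)}, u^{(2)}) = W(y^{(1)} | u^{(1)} \oplus u^{(2)}) \, W(y^{(2)} | u^{(2)}).$$
The channel capacities of $W_2$ and W satisfy:
$$I(W_2) = 2 I(W).$$
At the channel-splitting stage, the combined channel $W_2$ is split into two synthesized channels $W_2^{(1)}$ and $W_2^{(2)}$. The channel $W_2^{(1)}$ takes $U^{(1)}$ as the only input and gives $Y^{(1)}, Y^{(2)}$ as the output. The channel $W_2^{(2)}$ takes $U^{(2)}$ as the only channel input and gives $Y^{(1)}, Y^{(2)}$ and $U^{(1)}$ as the channel output.
The channel transition probabilities of $W_2^{(1)}$ and $W_2^{(2)}$ are:
$$W_2^{(1)}(y^{(1)}, y^{(2)} | u^{(1)}) = \sum_{u^{(2)}} \Pr(u^{(2)}) \, W_2(y^{(1)}, y^{(2)} | u^{(1)}, u^{(2)}) \overset{(a)}{=} \frac{1}{2} \sum_{u^{(2)} \in \{0,1\}} W(y^{(1)} | u^{(1)} \oplus u^{(2)}) \, W(y^{(2)} | u^{(2)}),$$
$$W_2^{(2)}(y^{(1)}, y^{(2)}, u^{(1)} | u^{(2)}) = \Pr(u^{(1)}) \, W_2(y^{(1)}, y^{(2)} | u^{(1)}, u^{(2)}) \overset{(b)}{=} \frac{1}{2} W(y^{(1)} | u^{(1)} \oplus u^{(2)}) \, W(y^{(2)} | u^{(2)}).$$
Note that the equalities (a) and (b) are derived because $U^{(1)}, U^{(2)}$ are i.i.d. and uniformly distributed over {0, 1}. More generally, the following proposition shows the relation between $(W_N^{(i)}, W_N^{(i)})$ and $(W_{2N}^{(2i-1)}, W_{2N}^{(2i)})$.
Proposition 1 ([32]). For any $N = 2^n$, $n \geq 0$, and $1 \leq i \leq N$:
$$W_{2N}^{(2i-1)}(y^{1:2N}, u^{1:2i-2} | u^{(2i-1)}) = \sum_{u^{(2i)}} \frac{1}{2} W_N^{(i)}(y^{1:N}, u_o^{1:2i-2} \oplus u_e^{1:2i-2} | u^{(2i-1)} \oplus u^{(2i)}) \, W_N^{(i)}(y^{N+1:2N}, u_e^{1:2i-2} | u^{(2i)}),$$
$$W_{2N}^{(2i)}(y^{1:2N}, u^{1:2i-1} | u^{(2i)}) = \frac{1}{2} W_N^{(i)}(y^{1:N}, u_o^{1:2i-2} \oplus u_e^{1:2i-2} | u^{(2i-1)} \oplus u^{(2i)}) \, W_N^{(i)}(y^{N+1:2N}, u_e^{1:2i-2} | u^{(2i)}),$$
where $u_o^{1:2i-2}$ and $u_e^{1:2i-2}$ denote the subvectors of $u^{1:2i-2}$ with odd and even indices, respectively.
It was proven in [32] that Arıkan's transform preserves the mutual information in the sense that:
$$I(W_{2N}^{(2i-1)}) + I(W_{2N}^{(2i)}) = 2 I(W_N^{(i)}).$$
More importantly, the quality of the synthesized channels polarizes asymptotically as the recursion proceeds.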
Theorem 1 (Channel polarization of mutual information [32]). For any BDMS channel W, the synthesized channels $W_N^{(i)}$ polarize in the sense that, for any fixed $\delta \in (0, 1)$, as N goes to infinity through powers of two, the fraction of indices $i \in \{1, \cdots, N\}$ for which $I(W_N^{(i)}) \in (1 - \delta, 1]$ goes to I(W) and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$.
The channel polarization theorem above can also be stated in the metric of the Bhattacharyya parameter by replacing $I(W_N^{(i)}) \in (1 - \delta, 1]$ with $Z(W_N^{(i)}) \in [0, \delta)$ and $I(W_N^{(i)}) \in [0, \delta)$ with $Z(W_N^{(i)}) \in (1 - \delta, 1]$. For any desired transmission rate $R < I(W)$, we can partition $\{1, \cdots, N\}$ into a subset A and its complement $A^C$ such that (i) $|A| = NR$ and (ii) for any $i \in A$ and $j \in A^C$, $Z(W_N^{(i)}) \leq Z(W_N^{(j)})$. Given the most reliable NR channels indexed by A, one can construct polar codes following the encoding rule:
$$X^{1:N} = U_A G_N(A) \oplus U_{A^C} G_N(A^C), \quad (10)$$
where $G_N(A)$ denotes the submatrix of $G_N$ formed by the rows indexed by A, $U_A$ is the useful information vector of length NR, and $U_{A^C}$ is a predetermined vector, named frozen bits, known to both the encoder and decoder, e.g., $U_{A^C} = 0$. In this manner, the useful information is transmitted via the most reliable synthesized channels.
A question may arise about how to efficiently calculate $Z(W_N^{(i)})$. A brief review can be found in Sections 2.4 and 3.3. As a high-level description, calculating $Z(W_N^{(i)})$ according to Definition 2 for a BDMS channel with a large or even continuous output alphabet is not easy because the output alphabet of the synthesized channel $W_N^{(i)}$ grows exponentially over the $\log_2 N$ recursions. One solution to this problem is to first construct an approximate channel W′ of W using a degrading/upgrading technique such that W′ has a countable output alphabet of a size no greater than µ and only a minor and traceable capacity loss [33]. Then, one applies Arıkan's transform recursively to W′, deriving synthesized channels as Proposition 1 indicates. At each recursion, one applies a merging technique to approximate the synthesized channels such that the approximation is stochastically degraded with respect to the original one and has an output alphabet no greater than a predetermined value (e.g., ν) [34].
In this way, one can finally derive an approximation of $W_N^{(i)}$ with an output alphabet no larger than ν and a negligible capacity difference. Now, one is able to carry out the encoding as in Formula (10).
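The construction and encoding steps can be illustrated on the one channel family for which no approximation is needed: for a binary erasure channel, the Bhattacharyya parameters of the synthesized channels follow the exact recursion $Z(W_{2N}^{(2i-1)}) = 2Z(W_N^{(i)}) - Z(W_N^{(i)})^2$ and $Z(W_{2N}^{(2i)}) = Z(W_N^{(i)})^2$ [32]. The sketch below is a hedged illustration rather than the construction used later for the RLWE channel: it assumes a BEC, zero-valued frozen bits, and an encoder based on the Kronecker power of the 2×2 kernel without the bit-reversal permutation of [32], which only reorders the codeword symbols. All function names are hypothetical.

```python
def bhattacharyya_bec(n, erasure_prob):
    """Z(W_N^(i)) for i = 1..N = 2^n over a BEC, listed in the natural index order."""
    z = [erasure_prob]
    for _ in range(n):
        # Each channel splits into a worse one (2z - z^2) and a better one (z^2).
        z = [w for zi in z for w in (2 * zi - zi * zi, zi * zi)]
    return z

def information_set(z, k):
    """Indices (0-based) of the k synthesized channels with the smallest Z."""
    return sorted(sorted(range(len(z)), key=lambda i: z[i])[:k])

def polar_encode(u):
    """x = u * F^{(tensor) n} over GF(2), with F = [[1, 0], [1, 1]]."""
    n = len(u)
    if n == 1:
        return list(u)
    xa = polar_encode(u[: n // 2])
    xb = polar_encode(u[n // 2:])
    return [a ^ b for a, b in zip(xa, xb)] + xb

def build_codeword(message, info_set, block_len):
    """Place the message on the information set, freeze the rest to 0, and encode."""
    u = [0] * block_len
    for bit, i in zip(message, info_set):
        u[i] = bit
    return polar_encode(u)
```

For N = 8 and erasure probability 0.5, the four most reliable indices are 3, 5, 6, 7 (0-based), and the sum of all Z values stays at N times the erasure probability, which reflects the conservation of mutual information on a BEC.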
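The successive cancellation (SC) decoder is the original decoding algorithm for polar codes. It gives an estimate of $u^{(i)}$, the i-th coordinate of $U^{1:N}$, in the natural order of i. Given a polar code parameterized by the code length N, information set A, and frozen bits $U_{A^C}$, one can derive the recovered message $\bar{u}^{(i)}$ of $u^{(i)}$ in sequential order of the index i according to the decoding rule:
$$\bar{u}^{(i)} = \begin{cases} u^{(i)}, & i \in A^C, \\ 0, & i \in A \text{ and } L_N^{(i)}(y^{1:N}, \bar{u}^{1:i-1}) \geq 1, \\ 1, & \text{otherwise}, \end{cases}$$
where $\bar{u}^{1:i-1}$ is the estimate of $u^{1:i-1}$ recovered before $\bar{u}^{(i)}$ and $L_N^{(i)}(y^{1:N}, \bar{u}^{1:i-1})$ is the likelihood ratio function defined as:
$$L_N^{(i)}(y^{1:N}, \bar{u}^{1:i-1}) = \frac{W_N^{(i)}(y^{1:N}, \bar{u}^{1:i-1} | 0)}{W_N^{(i)}(y^{1:N}, \bar{u}^{1:i-1} | 1)}.$$
The computational complexity of SC decoding, which is dominated by the recursive calculation of the likelihood ratios, is $O(N \log_2 N)$. Denote by $P_e$ the average probability of block decoding error. As a result of polar encoding and SC decoding, it was proven in [32] that $P_e$ is upper bounded as follows.
Theorem 2 (Decoding performance [32]). For any BDMS channel W and any choice of parameters (N, R, A),
$$P_e \leq \sum_{i \in A} Z(W_N^{(i)}).$$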
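The decoding rule can be sketched recursively. The version below is an illustrative sketch, not the implementation evaluated in this paper: it works with log-likelihood ratios (so the test $L \geq 1$ becomes $\log L \geq 0$), assumes all frozen bits are zero, and uses the Kronecker-power encoder without the bit-reversal permutation of [32], for which decoding the first half of $u$ before the second half is exactly the natural SC order. The names `f_llr` and `sc_decode` are hypothetical.

```python
import math

def polar_encode(u):
    """x = u * F^{(tensor) n} over GF(2), with F = [[1, 0], [1, 1]]."""
    n = len(u)
    if n == 1:
        return list(u)
    xa = polar_encode(u[: n // 2])
    xb = polar_encode(u[n // 2:])
    return [a ^ b for a, b in zip(xa, xb)] + xb

def f_llr(a, b):
    """LLR of the XOR of two bits with LLRs a and b, in a numerically stable form."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return (sign * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))

def sc_decode(llr, frozen):
    """Return (u_hat, x_hat) from channel LLRs log(W(y|0)/W(y|1)).

    frozen[i] is True when u_i is a frozen bit (assumed to be 0).
    """
    n = len(llr)
    if n == 1:
        bit = 0 if (frozen[0] or llr[0] >= 0.0) else 1
        return [bit], [bit]
    half = n // 2
    left, right = llr[:half], llr[half:]
    # First half of u: seen through the XOR of the two halves of x.
    ua, xa = sc_decode([f_llr(a, b) for a, b in zip(left, right)], frozen[:half])
    # Second half of u: both observations combined, given the decided first half.
    ub, xb = sc_decode([b + (1 - 2 * x) * a for a, b, x in zip(left, right, xa)],
                       frozen[half:])
    return ua + ub, [p ^ q for p, q in zip(xa, xb)] + xb
```

Each recursion level processes N values and there are $\log_2 N$ levels, which matches the quasilinear complexity stated above. The sequence of operations depends only on N and the frozen set, not on the received values, which is why SC decoding lends itself to constant-time implementations.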

Channel Degradation and Upgradation
The construction of polar codes can be addressed if all the Bhattacharyya parameters of the synthesized channels can be efficiently calculated. In [32], an efficient solution to compute $Z(W_N^{(i)})$ for binary erasure channels (BEC) was given, while it was suggested to use the Monte Carlo method to deal with more general BDMS channels. R. Mori and T. Tanaka made an attempt to solve this problem for arbitrary binary-input memoryless symmetric (BMS) channels using the density evolution [35][36][37] of belief propagation (BP) decoding. However, they also mentioned that it was unclear how to handle the computational efficiency when the code length N was large and the requirement for precision was high. In [33], a quantization method was proposed to construct a degraded and an upgraded approximation of a general BMS channel.
Definition 3 (Degraded and upgraded channel [33]). A channel $Q : \mathcal{X} \to \mathcal{Z}$ is (stochastically) degraded with respect to a channel $W : \mathcal{X} \to \mathcal{Y}$ if there exists a channel $P : \mathcal{Y} \to \mathcal{Z}$ such that:
$$Q(z|x) = \sum_{y \in \mathcal{Y}} W(y|x) P(z|y)$$
for all $z \in \mathcal{Z}$ and $x \in \mathcal{X}$. We denote by $Q \preceq W$ the relation that Q is degraded with respect to W. Conversely, we denote by $Q' \succeq W$ the relation that Q′ is upgraded with respect to W if there exists a channel $Q' : \mathcal{X} \to \mathcal{Z}'$ and a channel $P : \mathcal{Z}' \to \mathcal{Y}$ such that:
$$W(y|x) = \sum_{z' \in \mathcal{Z}'} Q'(z'|x) P(y|z')$$
for all $y \in \mathcal{Y}$ and $x \in \mathcal{X}$.
Moreover, the synthesized channels of Q, W, Q′ under Arıkan's transform also fulfill the channel degradation and upgradation relation. If the channel degradation or upgradation relation holds, the channel capacity, reliability, and error probability are related as follows.

Lemma 2 ([33]). Let W be a BMS channel, and suppose there exists another channel Q such that $Q \preceq W$. Then:
$$I(Q) \leq I(W), \quad Z(Q) \geq Z(W), \quad P_e(Q) \geq P_e(W).$$
The inequalities reverse if we replace "degraded" by "upgraded".
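A quick numerical sanity check of Lemma 2 is possible with binary symmetric channels, for which all three quantities have closed forms. In the sketch below, cascading BSC(p) with a second BSC(q) plays the role of the intermediate channel P in Definition 3, so the cascade is degraded with respect to BSC(p). The crossover values and function names are arbitrary choices made for this illustration.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_metrics(p):
    """(I, Z, P_e) of a BSC with crossover probability p and uniform input."""
    return 1.0 - h2(p), 2.0 * math.sqrt(p * (1.0 - p)), min(p, 1.0 - p)

def cascade(p, q):
    """Crossover probability of BSC(p) followed by BSC(q)."""
    return p * (1.0 - q) + q * (1.0 - p)

i_w, z_w, pe_w = bsc_metrics(0.05)                 # original channel W
i_q, z_q, pe_q = bsc_metrics(cascade(0.05, 0.1))   # degraded channel Q
print(i_q <= i_w, z_q >= z_w, pe_q >= pe_w)
```

All three comparisons hold, in line with the lemma: degrading a channel can only reduce its capacity and increase its Bhattacharyya parameter and error probability.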

RLWE-Based PKE Channel Model with Outage
In the field of telecommunication, a signal outage occurs if the signal power at the receiver's end falls below a threshold, which is related to the minimum signal-to-noise ratio (SNR) acceptable to the communication performance. The outage probability is defined as the probability with which signal outage occurs. The analysis of outage probability is of great importance to estimate fading capacities in a fading environment. A typical example is the outage estimation for fading multiple-input and multiple-output (MIMO) channels [39,40].
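We already gave an RLWE-based PKE instance in Section 2. We now consider the problem of decoding the message m given the polynomial:
$$d = \lfloor q/2 \rfloor \cdot m + e \cdot t - s \cdot e' + e'',$$
where $e \cdot t$ and $s \cdot e'$ are polynomial multiplications in $\mathbb{Z}[x]/(1 + x^n)$. It can be written in vector form as:
$$\mathbf{d} = \lfloor q/2 \rfloor \mathbf{m} + E \mathbf{t} - S \mathbf{e}' + \mathbf{e}'', \quad (13)$$
where E and S are the $n \times n$ negacyclic matrices generated by the coefficient vectors of e and s, respectively. Since the receiver knows the matrices E, S and we observe that the norm of each row of E, S stays the same within one code block, the channel model of RLWE-based PKE can be described in a fading channel form as:
$$Y_i = \lfloor q/2 \rfloor m_i + H Z_i, \quad i \in [n], \quad (14)$$
where $m_i \in \{0, 1\}$, $Z_i \leftarrow \mathcal{N}(0, r^2)$, and the channel gain is
$$H = \sqrt{1 + \sum_{i=1}^{n} e_i^2 + \sum_{i=1}^{n} s_i^2},$$
where $e_i$ and $s_i$ are the coefficients of the polynomials e and s for $i \in [n]$, respectively. Note that we assume the error distribution χ to be a normal distribution $\mathcal{N}(0, r^2)$ for the convenience of analysis. A similar setting can be found in [41], where χ is defined on $\mathbb{R}/[0, q)$.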
Independence assumption: Taking a close look at the channel model in Formula (14), we derive a group of n identically distributed channels rather than i.i.d. channels because every $Z_i$ is related to every coordinate of $t$ and $e'$. To apply polar codes to the encoding and decoding step, we assume that the correlation between the $Z_i$ is negligible and will not affect the decoding performance, as is a common assumption when applying modern error-correcting codes to RLWE-based PKE [15]. Now, we denote by $\epsilon \in (0, 1)$ the outage probability and denote by $H_\epsilon$ the threshold such that $\Pr\{H > H_\epsilon\} = \epsilon$. Unlike in a telecommunication system, where the uncertainty of the channel gain would introduce difficulties in estimating the outage probability, in our RLWE channel, how the fading behaves is clearly known to the receiver. In the RLWE-based PKE instance in Section 2, both participants of the PKE process know the distribution of H. Moreover, Alice, who plays the role of the receiver in telecommunication, precisely knows the value of H, i.e., the channel state information. Examples of how $H_\epsilon$ is defined can be seen in Figure 2, where $\epsilon = 0.01$ and r is the parameter of the normal distribution $\mathcal{N}(0, r^2)$.
The revised public key encryption proceeds as follows:
• The key generation step is the same as the RLWE-based PKE instance in Section 2;
• At the encryption step, Bob takes the RLWE channel as a mod 2Z additive Gaussian channel with the Gaussian distribution $\mathcal{N}(0, r^2 H_\epsilon^2)$, where $H_\epsilon$ is the outage threshold. (To be precise, it is a $\frac{q}{2}\mathbb{Z}/q\mathbb{Z}$ channel with additive Gaussian noise $\mathcal{N}(0, r^2 H_\epsilon^2)$ or, equivalently, a Z/2Z channel. To ease the notation, we instead use the mod 2Z channel with input restricted to {0, 1}. The two channels are statistically equivalent.) Then, he constructs polar codes of code length N = n for this channel as described in Section 2.3 and carries out encryption as normal;
• At the decryption step, Alice first calculates $H = \sqrt{1 + \sum_{i=1}^{n} e_i^2 + \sum_{i=1}^{n} s_i^2}$. If $H > H_\epsilon$, Alice goes back to the key generation step, and the whole process is restarted; otherwise, she decrypts and carries out SC decoding for the mod 2Z channel with additive Gaussian noise $\mathcal{N}(0, r^2 H^2)$.
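The outage test in the last step can be sketched as follows. This is a hedged illustration only: it takes the channel gain to be the square root of $1 + \sum e_i^2 + \sum s_i^2$, so that the noise variance is $r^2 H^2$, it estimates the threshold by Monte Carlo sampling rather than from the exact distribution of H, and it uses continuous Gaussian coefficients as in the analysis above. The function names, the toy dimension, and the number of trials are assumptions made here.

```python
import math
import random

def channel_gain(e, s):
    """H for secret polynomials e and s given as coefficient lists."""
    return math.sqrt(1.0 + sum(x * x for x in e) + sum(x * x for x in s))

def sample_secret_pair(n, r, rng):
    """Draw the coefficients of e and s i.i.d. from N(0, r^2)."""
    e = [rng.gauss(0.0, r) for _ in range(n)]
    s = [rng.gauss(0.0, r) for _ in range(n)]
    return e, s

def outage_threshold(n, r, eps, trials, rng):
    """Empirical threshold exceeded by at most roughly a fraction eps of sampled gains."""
    gains = sorted(channel_gain(*sample_secret_pair(n, r, rng)) for _ in range(trials))
    index = min(trials - 1, math.ceil((1.0 - eps) * trials))
    return gains[index]

def keygen_with_outage(n, r, threshold, rng):
    """Resample (e, s) until the channel gain does not exceed the threshold."""
    while True:
        e, s = sample_secret_pair(n, r, rng)
        if channel_gain(e, s) <= threshold:
            return e, s
```

With $\epsilon = 0.01$, about one key pair in a hundred is rejected; in exchange, the code only has to be designed for the worst accepted noise level, while Alice decodes with the actual, smaller or equal gain.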

The Soundness and Security of the Proposed Scheme
In the above revised RLWE-based PKE scheme, we construct polar codes for a mod 2Z channel with additive noise $\mathcal{N}(0, r^2 H_\epsilon^2)$, where $H_\epsilon$ is the outage threshold, then apply the codes to a mod 2Z channel with additive noise $\mathcal{N}(0, r^2 H^2)$ where $H \leq H_\epsilon$. The soundness is guaranteed by the channel degradation relationship between the two channels.
Lemma 3. If $\sigma_1 < \sigma_2$, the $\mathcal{N}(0, \sigma_2^2)$ mod 2Z channel is degraded with respect to the $\mathcal{N}(0, \sigma_1^2)$ mod 2Z channel.
Proof. Suppose the channel input is X, and let $N_1 \leftarrow \mathcal{N}(0, \sigma_1^2)$ and $N_2 \leftarrow \mathcal{N}(0, \sigma_2^2)$ be additive noises. We also define an independent auxiliary additive noise $N_{aux}$, which is drawn from $\mathcal{N}(0, \sigma_2^2 - \sigma_1^2)$, so that $N_1 + N_{aux}$ has the same distribution as $N_2$. At the recipient's end, the channel output after the mod 2Z operation is $Y = (X + N_2) \bmod 2\mathbb{Z} = ((X + N_1) \bmod 2\mathbb{Z} + N_{aux}) \bmod 2\mathbb{Z}$ in distribution. As a result, the $N_2$ mod 2Z channel can be interpreted as a concatenation of the $N_1$ mod 2Z channel and the $N_{aux}$ mod 2Z channel. The proof is complete according to the definition of channel degradation in Section 2.4.
We now have the degradation relation between the channel models Bob and Alice have access to, i.e., $\mathcal{N}(0, H_\epsilon^2 r^2) \bmod 2\mathbb{Z} \preceq \mathcal{N}(0, H^2 r^2) \bmod 2\mathbb{Z}$. Recall that Lemma 2 quantitatively shows in what respects one channel is degraded with respect to the other and Lemma 1 shows that Arıkan's transform preserves the channel degradation relation. Meanwhile, constructing polar codes is performed by selecting the most reliable synthesized channels to convey the message. As a result, the polar code customized for $\mathcal{N}(0, H_\epsilon^2 r^2)$ mod 2Z is a subcode of the polar codes customized for the channel $\mathcal{N}(0, H^2 r^2)$ mod 2Z. A similar technique, by which one constructs a polar code for a degraded channel and applies it to the actual channel, can be found in [30]. The explicit polar encoding and SC decoding processes are given in Section 3.3.

Definition 4 (CPA indistinguishability experiment [42]). Consider a public key encryption scheme Π = (Gen, Enc, Dec) and an adversary A; the chosen plaintext attack (CPA) indistinguishability experiment $\mathrm{PubK}^{\mathrm{cpa}}_{A,\Pi}(n)$ is defined as follows:
1. $\mathrm{Gen}(1^n)$ is run to obtain keys (pk, sk);
2. Adversary A is given pk, as well as oracle access to $\mathrm{Enc}_{pk}(\cdot)$. The adversary outputs a pair of messages $m_0, m_1$ of the same length (these messages must be in the plaintext space associated with pk);
3. A random bit $b \leftarrow \{0, 1\}$ is chosen, and then, a ciphertext $c \leftarrow \mathrm{Enc}_{pk}(m_b)$ is computed and given to A. We call c the challenge ciphertext;
4. A continues to have access to $\mathrm{Enc}_{pk}(\cdot)$ and outputs a bit $b'$;
5. The output of the experiment is defined to be 1 if $b' = b$, and 0 otherwise.
For properly chosen parameters n, q and error distribution χ (e.g., in the NewHope setting n = 512, 1024, q = 12,289; χ is the central binomial of parameter k = 8), RLWE-based PKE is CPA secure assuming the hardness of the ring-LWE decision problem, and a concrete CPA-secure protocol was described in [43].

Proposition 2.
The revised RLWE-based PKE in Section 3.1 preserves the CPA security assuming that the standard RLWE-based PKE with properly chosen parameters n, q and χ is CPA secure.

Proof. A standard RLWE-based PKE scheme Π is CPA secure assuming the hardness of the ring-LWE decision problem, i.e., $\Pr[\mathrm{PubK}^{\mathrm{cpa}}_{A,\Pi}(n) = 1] \leq \frac{1}{2} + \mathrm{negl}(n)$. Denote by Π′ the revised scheme. There are two modifications we made to the standard RLWE-based PKE. Firstly, at the encryption stage, Bob uses polar codes instead of uncoded plaintext. This operation has no influence on the distribution of the ciphertext and therefore preserves the security. Secondly, at the decryption step, Alice first calculates $H = \sqrt{1 + \sum_{i=1}^{n} e_i^2 + \sum_{i=1}^{n} s_i^2}$; then, she decides to repeat the key generation step if and only if $H > H_\epsilon$. Since the adversary is passive and has no idea whether $H > H_\epsilon$ or not, he/she cannot determine whether the ciphertext given to him/her is a valid one or not. Therefore, a polynomial-time adversary in the experiment $\mathrm{PubK}^{\mathrm{cpa}}_{A,\Pi'}(n)$ behaves no better than in the experiment $\mathrm{PubK}^{\mathrm{cpa}}_{A,\Pi}(n)$, i.e.,
$$\Pr[\mathrm{PubK}^{\mathrm{cpa}}_{A,\Pi'}(n) = 1] \leq \frac{1}{2} + \mathrm{negl}(n).$$

Polar Encoding and SC Decoding for RLWE Channel Using Outage
In this section, we show how Bob constructs polar codes using outage at the encryption step and how Alice performs decoding at the decryption step. Denote by $W : \mathcal{X} \to \mathcal{Y}$ the $\mathcal{N}(0, H^2 r^2)$ mod 2Z channel and by $W' : \mathcal{X} \to \mathcal{Y}$ its degradation, the $\mathcal{N}(0, H_\epsilon^2 r^2)$ mod 2Z channel, where $H_\epsilon$ is the outage threshold. Given the channel degradation relationship, one is able to construct polar codes for W′ and apply them to the actual channel W. Recall from Section 2.3 that the first step to construct polar codes is to calculate the Bhattacharyya parameters for every synthesized channel $W_N'^{(i)}$ for $i = 1, \cdots, N$. As mentioned in Section 2.3, a practical solution to calculate $Z(W_N'^{(i)})$ is to first quantize the continuous output alphabet of W′, then construct an approximate channel of the synthesized channel at each recursion of Arıkan's transform [33,34]. This solution proceeds as follows.
We define the likelihood ratio of W as: , y ∈ [0, q). (15) Since N (0, h 2 r 2 ) mod 2Z is stochastically equivalent to 2Z-periodic additive Gaussian noise with variance h 2 r 2 , the transition probability W Y|X is defined as: where g a,b 2 (x) is the density function of the Gaussian noise with mean a and variance b 2 . The channel W is symmetric because there exists a permutation π(y) = ( q 2 − y) mod q such that W (y|0) = W (π(y)| q 2 ). Intuitively, a symmetric channel with binary input and continuous output can be seen as a combination of infinite binary symmetric channels (BSCs). If we focus on the likelihood ratio λ(y) ≥ 1, the crossover probability of any one of these BSCs is 1 λ(y) . The capacity of this BSC is: If we ignore the minor geometrical error introduced by rounding operation · , we observe that the intervals satisfying λ(y) ≥ 1 is: Because C[λ(y)] is a strict monotonic function of λ(y), we divide A into ν segments such that for j ∈ {1, · · · ν}: where h 2 (·) is the entropy function of a Bernoulli random variable. Each A j corresponds to a BSC channel with crossover probability: where: If we define z j and its conjugatez j to be the channel output of the BSC associated with A j , we obtain the quantized output alphabet of W as: Z := {z 1 ,z 1 , z 2 ,z 2 , · · · , z ν ,z ν }.
If we denote by $W_Q$ the quantized version of the channel $W$, the output alphabet of $W_Q$ is $\mathcal{Z} := \{z_1, \bar{z}_1, \dots, z_\nu, \bar{z}_\nu\}$. The following lemma states that $W_Q$ is degraded with respect to $W$.

Lemma 4.
The channel $W_Q : \mathcal{X} \to \mathcal{Z}$ is degraded with respect to $W$.
Proof. We supply an intermediate channel $W_P : \mathcal{Y} \to \mathcal{Z}$ that maps each output $y$ of $W$ to the symbol ($z_j$ or $\bar{z}_j$) of the BSC to which it belongs. Concatenating $W$ with $W_P$ then yields $W_Q$, which is the channel degradation relationship claimed.

As the channel-combining and -splitting processes continue, the alphabet size of the synthesized channels $W_{Q,N}^{(i)}$ grows exponentially with the recursion depth. To handle this problem, we employ the merging technique proposed in [34], which reduces the alphabet size of a BDMS channel with negligible and traceable loss of performance. Specifically, a BDMS channel $W_Q$ gives rise to BDMS synthesized channels under Arıkan's transform [32], and any BDMS channel can be seen as a combination of BSCs. The merging technique approximates a BDMS channel by combining some of the BSCs of which it is composed. In other words, merging approximates a BDMS channel with fewer BSCs and therefore a smaller output alphabet. Applying merging to the synthesized channels obtained after every recursion of Arıkan's transform effectively restricts the output alphabet. In this manner, we can approximate the synthesized channels $W_{Q,N}^{(i)}$ and, based on them, define the information set $\mathcal{A}$ and the frozen set $\mathcal{A}^c$, from which the polar codewords are constructed.

Upon observing the channel output $y_{1:N}$, the recipient, Alice, invokes her knowledge of the CSI $h$ and decides whether to apply the decoding or to restart the protocol. The successive cancellation (SC) decoder calculates the likelihood ratio of every synthesized channel and estimates $u_{\mathcal{A}}$ according to the decision function in Formula (19), where the likelihood ratio $L$ can be calculated recursively by the SC decoding algorithm in [32]. The input of the SC decoder is the channel likelihood ratio $\lambda(y)$, computed from the transition probability $W_{Y|X}$. A block-decoding error occurs if $\hat{u}_{1:N} \ne u_{1:N}$; we use the block error probability and the DFR interchangeably in this work. The complexity of both polar encoding and SC decoding is $O(N \log_2 N)$.
Additionally, both algorithms require a fixed number of operations for fixed choices of $K$, $N$, and $\mathcal{A}$, which makes constant-time implementations plausible. According to Theorem 2, the block error probability $P_e(N, K, \mathcal{A})$

Results: Decoding Performance Analysis
Theorem 2 gives an upper bound on the decoding error probability (equivalently, the DFR of the PKE) of polar codes that are constructed for the degraded channel and applied to the real $\mathcal{N}(0, H^2 r^2)$ mod 2Z channel. Figure 3 depicts this upper bound on the DFR when polar codes constructed as above are used in our revised RLWE-based PKE. In the PQC standardization process initiated by NIST, the target DFR at code rate 1/4 is $2^{-128}$. We targeted the more conservative benchmark DFR $= 2^{-140}$ used in [15,18]. Similar to NewHope, which employs a central binomial distribution with parameter $k$ to approximate the discrete Gaussian distribution, we use the parameter $k = 2r^2$ to index the distributions $\chi$ from which $e$, $t$, $s$, $e'$, $e''$ are drawn. (The variance of the central binomial distribution is $k/2$, and the variance of the discrete Gaussian distribution is $r^2$. When calculating the upper bound on the DFR, we used a continuous Gaussian distribution instead of its discrete version to ease the analysis, whereas the experiments in Figure 4 used the central binomial distribution of the same variance.) We observed that our polar coding scheme achieves the target DFR of $2^{-140}$ for $k$ as large as 55, which is significantly larger than the current choice $k = 8$ in NewHope. A larger $k$ benefits the security level of the overall scheme. Note that schemes such as NewHope compress the ciphertext before sending it, which introduces additional compression noise. In this work, however, we focus only on the additive noise in the channel model.
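The relation $k = 2r^2$ between the central binomial parameter and the Gaussian variance can be checked exactly. The sketch below assumes the NewHope-style definition of the central binomial distribution as the sum of $k$ differences of independent fair bits; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt

def central_binomial_pmf(k):
    """Exact pmf of sum_{i<k} (b_i - b'_i) for independent fair bits b_i, b'_i.

    The sum is distributed as Binomial(2k, 1/2) shifted by -k.
    """
    return {j - k: Fraction(comb(2 * k, j), 4 ** k) for j in range(2 * k + 1)}

def variance(pmf):
    """Exact variance of a finite pmf given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    return sum((v - mean) ** 2 * p for v, p in pmf.items())

if __name__ == "__main__":
    for k in (8, 55):
        var = variance(central_binomial_pmf(k))
        # var equals k/2, so the matching Gaussian parameter is r = sqrt(k/2)
        print(k, var, sqrt(var))
```

For $k = 55$ the variance is $55/2$, which corresponds to $r \approx 5.24$.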
The advantages of the RLWE channel model with outage can be summarized as follows. First, we employ an "independence" assumption to obtain a group of i.i.d. channels. This assumption is relaxed compared to the one in [15]. For example, the polynomial product $e \cdot t$ has correlated coefficients because of the polynomial convolution. We remove the correlation contributed by $e$ by treating it as a constant fading coefficient $H$ over exactly one code block, so the correlation left in our channel model comes only from $t$. Second, the decoder is able to exploit the CSI, while the encoder makes use of the CDI. This benefits the decoding performance significantly compared to coding schemes that treat the residue additive error term as a whole. Third, the channel degradation relation ensures that polar codes constructed for the degraded channel also work for the real channel. We verify our polar coding scheme in RLWE-based PKE by simulation in Figure 4. The dotted lines are the experimental DFR results, and the solid lines are the DFR upper bounds. For reasonably large code rates, the simulation results confirm our upper bounds, whereas the performance at the target code rate 1/4 could not be checked experimentally.

Security Improvement
The new DFR margin can be exploited to increase the Gaussian noise parameter $r$ (or the central binomial parameter $k = 2r^2$) so that the security level increases while the DFR requirement is still satisfied. Table 1 illustrates to what extent the security of RLWE-based PKE is improved for $n = 1024$, $q = 12{,}289$ compared to NewHope Round 2 when different error-correcting codes and schemes are employed. As in [15,18], a conservative target DFR of $2^{-140}$ was selected. The concrete security analysis of RLWE-based PKE has so far been based on the hardness of LWE [22]. The security level was estimated by the cost of the primal attack and the dual attack (the security estimator is available at https://github.com/tpoeppelmann/newhope (accessed on 3 March 2021)). The polar coding scheme described in this work gives a significant security improvement over the concurrent work using polar codes [19]. We obtain this gain because we use the original constellation diagram $\{0, q/2\}$ rather than the closer, tailored one in [19]. Furthermore, our polar coding scheme gives a security improvement comparable to the state-of-the-art record of 31.76% in [15], which employed nonconstant-time BCH and LDPC codes.

Constant-Time Implementation
When applying modern error-correcting codes to RLWE-based PKE, we should always check whether the encoding and decoding enable constant-time implementations. The BCH code has good error correction capability, but its decoding proceeds in two steps: (a) locate the errors by calculating the syndrome, and (b) correct the errors if there are t/2 or fewer errors, where t is the code distance. This is obviously not a constant-time design. LDPC decoding is also nonconstant-time because the decoding procedure is iterative and ends when either a correct codeword is found or the maximum number of iterations is reached. Unlike the error-correcting codes (e.g., BCH, LDPC) adopted by RLWE-based PKE in the literature [15], the encoding and decoding of polar codes intrinsically enable constant-time implementations.
As for the encoding, one calculates the Bhattacharyya parameters $Z(W_N^{(i)})$ in advance, once the channel model in Formula (14) is known (i.e., the RLWE PKE parameters $n$, $q$, the error distribution $\chi$, and the code rate are known). The encoding step is carried out online, and it consists of exactly $(N/2) \log_2 N$ XOR gates. An example of polar encoding for code length $N = 8$ is given in Figure 5. The mod-2 additions of polar encoding depend only on the code length $N$, and therefore a constant-time implementation is feasible for any fixed $N$.

As for polar decoding, the running time does not vary with the realization of the message $m$ or of the error term drawn from $\chi$, unlike BCH and LDPC decoding. Given the RLWE channel output $y_{1:N}$ derived from decryption, the SC decoder recursively calculates $L_N^{(i)}(y_{1:N}, \hat{u}_{1:i-1})$ and recovers the message $u_{1:N}$ according to Formula (19). The LR calculations dominate the overall complexity of decoding, which is described in Appendix A together with an example for code length $N = 8$. For any fixed code length $N$ (where $N$ equals the parameter $n$ of RLWE-based PKE), SC decoding requires exactly $N \log_2 N$ steps of LR calculations as in Formulas (A1) and (A2), regardless of the other parameters $q$, $\chi$, and the code rate. In addition, the decision-making step in Formula (19) is also constant-time because the information set $\mathcal{A}$ is uniquely determined by the channel model in Formula (14) and the parameters $n$, $q$, $\chi$, and the code rate.
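The fixed XOR count of polar encoding can be seen in a minimal Python sketch of the butterfly structure. It computes $x = u F^{\otimes n}$ with $F = [[1, 0], [1, 1]]$ in natural bit order (no bit-reversal permutation, which is an assumption of this sketch), and the function name is hypothetical. A data-independent operation count is necessary for, but not by itself a proof of, a constant-time implementation.

```python
def polar_encode(u):
    """Encode a list of bits whose length N is a power of two.

    Returns (codeword, number_of_xors). The loop bounds depend only on N,
    never on the bit values, so the XOR count is (N/2) * log2(N) for any input.
    """
    n = len(u)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    x = list(u)
    xors = 0
    step = 1
    while step < n:                       # log2(N) stages
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]       # one mod-2 addition
                xors += 1
        step *= 2
    return x, xors

if __name__ == "__main__":
    codeword, count = polar_encode([0, 0, 0, 1, 0, 1, 1, 0])
    print(codeword, count)
```

For $N = 8$ the count is 12 regardless of the message, matching $(N/2) \log_2 N$.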

Complexity and Communication Overhead
Compared with the repetition codes in NewHope Round 2 [43], the proposed polar encoding and decoding scheme will certainly increase the complexity significantly. In this paper, we focus mainly on DFR performance and security improvement and do not provide benchmarks of the proposed scheme. Nonetheless, since LDPC codes have much higher complexity than polar codes at relatively low code rates, as explained in Appendix B (see also [44]), polar encoding and SC decoding should incur a much smaller complexity increase than the 650% reported for LDPC in [15].
Since Alice, the recipient, calculates $H = 1 + \sum_{i=1}^{n} e_i^2 + \sum_{i=1}^{n} s_i^2$ and goes back to the public key generation step if $H$ exceeds the outage threshold, the average communication overhead increases by a fraction approximately equal to the outage probability when that probability is small. In this work, we set the outage probability to 0.01, which increases the communication overhead by approximately 1% on average, so the overhead is almost preserved. In addition, the proposed polar coding scheme was designed to address the additive residue noise after decryption rather than the compression noise, so we did not improve the bandwidth efficiency, in contrast to the improvements of 5.9% and 12.8% in [18] and [15], respectively.
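The 1% figure can be recovered with a short calculation, under the assumption (ours, for illustration) that each key generation attempt falls into outage independently with probability $\epsilon$, so that the number of attempts $T$ is geometrically distributed. The symbols $\epsilon$ and $T$ are introduced here only for this sketch.

$$\mathbb{E}[T] = \sum_{t \ge 1} t\,\epsilon^{t-1}(1-\epsilon) = \frac{1}{1-\epsilon} = 1 + \epsilon + O(\epsilon^2), \qquad \epsilon = 0.01 \;\Rightarrow\; \mathbb{E}[T] \approx 1.0101.$$

The expected number of attempts thus exceeds one by roughly $\epsilon$, which is consistent with an average overhead increase of about 1%.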

Conclusions
In this work, we demonstrated how to construct polar codes for RLWE-based PKE. Theoretical and numerical results were given to verify the proposed coding scheme. The motivation for doing so was to give constructive guidance on how to at least relax the "dependency" and on how to design practical and efficient error-correcting codes to lower the DFR and increase the security of RLWE-based PKE.
The pros and cons of the polar coding scheme using outage are as follows:
• The polar coding scheme using outage considerably improves the error tolerance. It improves the security level (measured in bits of security) of RLWE-based PKE in the NewHope setting by 28.8%, which is comparable to the highest record in [15];
• The proposed polar coding scheme has lower encoding and decoding complexity at a low code rate than other error-correcting schemes in the literature [15]. Furthermore, it intrinsically supports constant-time implementations;
• Compared with the polar coding scheme in [19], this scheme is carried out in polynomial representation and uses the original modulation constellation diagram rather than the shrunk one. This avoids switching between the polynomial and canonical representations, and the modulation space is not compromised;
• Because the standard procedure of RLWE-based PKE is amended, its behavior under a variety of attacks is left for future work; nonetheless, we proved it to be at least CPA secure.
In conclusion, using the polar coding scheme proposed in this work, one can derive a new DFR margin and therefore improve the security of a typical RLWE-based PKE scheme (e.g., NewHope). The polar coding scheme almost preserves the communication overhead, with an average increase of approximately 1%. For a relatively low code rate (e.g., 0.25), polar encoding and decoding are efficient compared to other modern error-correcting codes such as LDPC. Moreover, polar codes support constant-time implementations, whereas other error-correcting codes such as LDPC and BCH do not. Future work will include a solid implementation of the proposed scheme, as well as specific benchmarking. In addition, the hidden vulnerabilities of the proposed scheme under a variety of attacks will be investigated.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Computational Complexity of SC Decoding
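Consider SC decoding for an arbitrary polar code of length $N$. To recover $u_{1:N}$ according to the rules in Formula (11), one needs to calculate the full set of LRs. Let $W : \mathcal{X} \to \mathcal{Y}$ denote a BDMS channel with input $X$ and output $Y$ and transition probability $W(Y|X) = P(Y|X)$. As shown in Proposition 1, the channel polarization transform combines $N$ i.i.d. copies of $W$ in a recursive manner such that for any $0 \le m \le n$, $M = 2^m$, $N = 2^n$, $1 \le \kappa \le M/2$, the decoder calculates the LRs at the $m$-th layer of recursion as (Section VIII, [32]):

$$L_M^{(2\kappa-1)}\big(y_1^M, \hat{u}_1^{2\kappa-2}\big) = \frac{L_{M/2}^{(\kappa)}\big(y_1^{M/2}, \hat{u}_{1,o}^{2\kappa-2} \oplus \hat{u}_{1,e}^{2\kappa-2}\big)\, L_{M/2}^{(\kappa)}\big(y_{M/2+1}^{M}, \hat{u}_{1,e}^{2\kappa-2}\big) + 1}{L_{M/2}^{(\kappa)}\big(y_1^{M/2}, \hat{u}_{1,o}^{2\kappa-2} \oplus \hat{u}_{1,e}^{2\kappa-2}\big) + L_{M/2}^{(\kappa)}\big(y_{M/2+1}^{M}, \hat{u}_{1,e}^{2\kappa-2}\big)}, \tag{A1}$$

$$L_M^{(2\kappa)}\big(y_1^M, \hat{u}_1^{2\kappa-1}\big) = \Big[L_{M/2}^{(\kappa)}\big(y_1^{M/2}, \hat{u}_{1,o}^{2\kappa-2} \oplus \hat{u}_{1,e}^{2\kappa-2}\big)\Big]^{1 - 2\hat{u}_{2\kappa-1}}\, L_{M/2}^{(\kappa)}\big(y_{M/2+1}^{M}, \hat{u}_{1,e}^{2\kappa-2}\big), \tag{A2}$$

where $\hat{u}_{1,o}^{2\kappa-2}$ (resp. $\hat{u}_{1,e}^{2\kappa-2}$) represents the subvector of $\{\hat{u}^{(1)}, \dots, \hat{u}^{(2\kappa-2)}\}$ with odd (resp. even) indexes. Both LRs are assembled from the pair $L_{M/2}^{(\kappa)}\big(y_1^{M/2}, \hat{u}_{1,o}^{2\kappa-2} \oplus \hat{u}_{1,e}^{2\kappa-2}\big)$ and $L_{M/2}^{(\kappa)}\big(y_{M/2+1}^{M}, \hat{u}_{1,e}^{2\kappa-2}\big)$ at the $(m-1)$-th layer. The calculation of the $N$ LRs at layer $m$ requires exactly $N$ LR assemblies at layer $m - 1$. One can compute the LRs backward, layer by layer, until reaching the zeroth layer, which is exactly the LR of the raw channel $W$. Suppose that assembling an LR pair of the $(m-1)$-th layer into one LR of the $m$-th layer takes one complexity unit; then computing all the $N$ LRs of the $n$-th layer requires $N(1 + \log_2 N)$ units in total.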
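The layer-by-layer LR assembly can be sketched as a recursive SC decoder that counts its assembling operations. This is an illustrative sketch, not the authors' implementation: it works with log-likelihood ratios (the log-domain counterparts of (A1) and (A2)) instead of plain LRs, uses natural bit order without bit reversal, and sets frozen bits to zero. All function names and the frozen pattern in the example are hypothetical.

```python
import math

def polar_transform(u):
    """x = u * F^{(tensor) n} over GF(2), natural bit order."""
    x = list(u)
    step = 1
    while step < len(x):
        for start in range(0, len(x), 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]
        step *= 2
    return x

def f(a, b):
    """Log-domain counterpart of (A1): LLR of the XOR of two bits."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return (sign * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))

def g(a, b, bit):
    """Log-domain counterpart of (A2), given the already decided partial sum."""
    return b + (1 - 2 * bit) * a

def sc_decode(llr, frozen, counter):
    """Return (u_hat, x_hat); counter[0] accumulates LLR assembling operations."""
    n = len(llr)
    if n == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1
        return [bit], [bit]
    half = n // 2
    left, right = llr[:half], llr[half:]
    upper = [f(a, b) for a, b in zip(left, right)]
    counter[0] += half
    u_a, v_a = sc_decode(upper, frozen[:half], counter)
    lower = [g(a, b, v) for a, b, v in zip(left, right, v_a)]
    counter[0] += half
    u_b, v_b = sc_decode(lower, frozen[half:], counter)
    return u_a + u_b, [p ^ q for p, q in zip(v_a, v_b)] + v_b

if __name__ == "__main__":
    frozen = [True, True, True, False, True, False, False, False]
    u = [0, 0, 0, 1, 0, 1, 1, 0]
    llr = [4.0 if bit == 0 else -4.0 for bit in polar_transform(u)]
    ops = [0]
    print(sc_decode(llr, frozen, ops)[0], ops[0])
```

The counter reaches $N \log_2 N$ for any input of length $N$; adding the $N$ raw channel LRs of the zeroth layer gives the $N(1 + \log_2 N)$ units stated above.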
An example of SC decoding for code length N = 8 is given in Figure A1. The SC decoder recursively calculates L
To give a relatively fair comparison of decoding, the complexity can be evaluated by counting the number of addition/subtraction, multiplication, division, comparison, max/min, and table look-up operations. In general, most of these operations correspond to one equivalent addition; e.g., the product of two LRs can be converted to the sum of two logarithms, as is commonly done in the decoding of both LDPC and polar codes. A comparison operation in most cases corresponds to two equivalent additions, and a look-up operation takes six equivalent additions [44][45][46].
Normally, LDPC has larger decoding complexity than polar codes for small code rates. For both LDPC and polar codes, the basic operation at the core of decoding is the likelihood ratio (LR) calculation or, equivalently, the LR calculation in the log domain (LLR). Their complexity units, LR/LLR, are therefore real numbers, which are normally represented in floating point in software implementations and in fixed point in hardware. Table A1 gives the decoding complexity of LDPC and polar codes, where $N$, $R$, $M$ denote the code length, code rate, and number of parity bits, respectively. Let $L$ be the list size of polar SCL decoding. Denote by $I_{\max}$ the maximum number of iterations of LDPC decoding (sum-product/min-sum algorithm), by $d_v$ the average variable degree of LDPC, and by $d_c$ the average check degree of LDPC. When analyzing the decoding complexity, we count multiplications as additions by considering log-domain processing. Generally speaking, for a small code rate, a regular LDPC code has a relatively large parity check matrix with relatively more nonzero elements because the code rate is $R = 1 - d_v/d_c$. This increases the message-passing complexity because there are more edges between check nodes and variable nodes.

Table A1. Complexity of LDPC and polar decoding (complexity unit: fixed/floating-point numbers) [44].

| Coding Scheme | Additions | max(min)/Comparison | Look-Up Table Operations |
|---|---|---|---|
| LDPC (min-sum) | $I_{\max} \cdot (2N d_v + 2M)$ | $I_{\max} \cdot (2d_c - 1) \cdot M$ | - |
| LDPC (sum-product) | $I_{\max} \cdot (2N d_v + M \cdot (2d_c - 1))$ | - | $I_{\max} \cdot M \cdot d_c$ |
| Polar (SC) [47] | $N/2 \log_2 N$ | $N/2 \log_2 N$ | - |
| Polar (SCL) [47,48] | $L \cdot N/2 \log_2 N$ | $L \cdot N/2 \log_2 N$ | - |

In Table A2, a specific complexity evaluation is given. Practically, the maximum number of iterations of LDPC decoding ranges from 20 to 50. The values of $I_{\max}$ and the list size $L$ are selected such that the min-sum, sum-product, and SCL have comparable complexity. However, in reality, an $L$ as large as 20 suffices in most scenarios. It can be concluded that, at least for a small code rate, polar codes have lower decoding complexity than LDPC.
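The formulas of Table A1 can be evaluated numerically with a short Python sketch. The parameter values below ($N = 1024$, $d_v = 3$, $d_c = 4$, $I_{\max} = 20$, $L = 8$) are hypothetical choices for illustration and are not the entries of Table A2. The weights of two equivalent additions per comparison and six per look-up follow the convention quoted earlier in this appendix.

```python
def ldpc_min_sum(n, m, dv, dc, imax):
    """Operation counts for LDPC min-sum decoding (Table A1)."""
    return {"additions": imax * (2 * n * dv + 2 * m),
            "comparisons": imax * (2 * dc - 1) * m,
            "lookups": 0}

def ldpc_sum_product(n, m, dv, dc, imax):
    """Operation counts for LDPC sum-product decoding (Table A1)."""
    return {"additions": imax * (2 * n * dv + m * (2 * dc - 1)),
            "comparisons": 0,
            "lookups": imax * m * dc}

def polar_scl(n, list_size=1):
    """Operation counts for polar SCL decoding; list_size=1 gives plain SC."""
    log_n = n.bit_length() - 1            # exact log2 for a power of two
    ops = list_size * (n // 2) * log_n
    return {"additions": ops, "comparisons": ops, "lookups": 0}

def equivalent_additions(counts):
    """Weight comparisons by 2 and look-ups by 6 equivalent additions."""
    return counts["additions"] + 2 * counts["comparisons"] + 6 * counts["lookups"]

if __name__ == "__main__":
    n, dv, dc, imax = 1024, 3, 4, 20      # rate R = 1 - dv/dc = 0.25
    m = n * dv // dc                      # number of parity bits
    print("LDPC min-sum    ", equivalent_additions(ldpc_min_sum(n, m, dv, dc, imax)))
    print("LDPC sum-product", equivalent_additions(ldpc_sum_product(n, m, dv, dc, imax)))
    print("Polar SC        ", equivalent_additions(polar_scl(n)))
    print("Polar SCL (L=8) ", equivalent_additions(polar_scl(n, 8)))
```

With these illustrative parameters, SC decoding costs well over an order of magnitude fewer equivalent additions than either LDPC decoder, in line with the conclusion above.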