The Fractality of Polar and Reed–Muller Codes

The generator matrices of polar codes and Reed–Muller codes are submatrices of the Kronecker product of a lower-triangular binary square matrix. For polar codes, the submatrix is generated by selecting rows according to their Bhattacharyya parameter, which is related to the error probability of sequential decoding. For Reed–Muller codes, the submatrix is generated by selecting rows according to their Hamming weight. In this work, we investigate the properties of the index sets selecting those rows, in the limit as the blocklength tends to infinity. We compute the Lebesgue measure and the Hausdorff dimension of these sets. We furthermore show that these sets are finely structured and self-similar in a well-defined sense, i.e., they have properties that are common to fractals.


Introduction
In his book on fractal geometry, Falconer characterizes a set F as a fractal if it has some of the following properties [1] (p. xxviii):
• F has a fine structure, i.e., there is detail on arbitrarily small scales;
• F does not admit a description in traditional geometrical language, neither locally nor globally; it is irregular in some sense;
• F has some form of self-similarity, at least approximate or statistical;
• The fractal dimension of F exceeds its topological dimension;
• F is defined in a simple, often recursive way.
In this work, we investigate whether polar codes and Reed-Muller codes are fractal in the above sense. For a blocklength of $2^n$, these codes are based on the n-fold Kronecker product $G(n) := F^{\otimes n}$, where $F := \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, i.e., on a simple, recursive operation. Based on this, it has long been suspected that Kronecker product-based codes possess a fractal nature. For example, the authors of [2] observed that G(n), when converted to a picture, resembles the Sierpinski triangle. In a personal communication [3], Abbe expressed his suspicion that the set of "good" polarized channels is fractal. Nevertheless, to the best of the author's knowledge, a definite statement regarding this fractal nature has not been presented yet.
A rate-$K/2^n$ Kronecker product-based code is uniquely defined by a set F of K indices: Its generator matrix is the submatrix of G(n) consisting of the rows indexed by F. Letting F index the K rows of G(n) with the largest Hamming weight defines a Reed-Muller code. Alternatively, one can
For larger blocklengths 2 n , n > 1, we apply the polarization procedure recursively and obtain, for where b n 0 and b n 1 denote the sequences of zeros and ones obtained by appending 0 and 1 to b n , respectively. Note that the functions g 1 , g 0 , and h 0 from Lemma 2 are non-negative and non-decreasing and map the unit interval onto itself. Hence, the inequalities in (7) are preserved under composition: where h 1 ≡ g 1 .
If we stop the polarization procedure at a finite blocklength $2^n$ for n large enough, then still most of the resulting $2^n$ channels are either almost perfect or almost useless (i.e., the channel capacities are close to one or to zero). The idea of polar coding is to transmit data only on those channels that are almost perfect. The generator matrix of a blocklength-$2^n$ polar code is thus the submatrix of G(n) consisting of rows indexed by F, where F contains the indices corresponding to the K virtual channels with the largest capacities. Determining this set F is inherently difficult, since (whenever W is not a BEC) the cardinality of the output alphabet increases exponentially in $2^n$ [15] (Chapter 3.3), [16] (p. 36). Tal and Vardy proposed an approximate construction method based on reducing the output alphabet and showing that the resulting channels are either upgraded or degraded w.r.t. the channel of interest [17] (see also Korada's PhD thesis [16] (Definition 1.7 & Lemma 1.8)). These upgrading/degrading properties are important tools in our proofs.
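For a BEC, the index set can be computed exactly, which makes the selection rule above easy to illustrate. The following is a minimal sketch, not the construction of [17]. It assumes the standard BEC recursion for the Bhattacharyya parameter (z ↦ 2z − z² when a 0 is appended, z ↦ z² when a 1 is appended), which is not stated in this section, and it assumes that row indices are read as binary numbers with the first polarization step as the most significant bit. The function names are hypothetical.

```python
def bec_bhattacharyya(eps, n):
    """Bhattacharyya parameters of the 2**n channels polarized from a BEC(eps).

    Assumed recursion (BEC only): appending bit 0 maps z to 2z - z^2,
    appending bit 1 maps z to z^2. Entry i of the result belongs to the
    bit sequence b_1 ... b_n whose binary value is i (b_1 most significant).
    """
    z = [eps]
    for _ in range(n):
        # Each channel splits into a worse child (bit 0) and a better child (bit 1).
        z = [child for v in z for child in (2 * v - v * v, v * v)]
    return z


def polar_index_set(eps, n, k):
    """Indices of the k channels with the smallest Bhattacharyya parameter.

    For a BEC the capacity equals 1 - z, so these are also the k channels
    with the largest capacities.
    """
    z = bec_bhattacharyya(eps, n)
    order = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(order[:k])


if __name__ == "__main__":
    print(polar_index_set(0.5, 3, 4))
```

For channels other than the BEC no such closed-form recursion is available, which is why approximate constructions are needed there.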

Fractal Properties of the Sets of Good and Bad Channels
We next investigate the behavior of the set F as we let the blocklength tend to infinity, i.e., as n → ∞. This set indexes all sequences b for which we obtain $I(W_{\infty}^{b}) = 1$. With the help of (2), we map these sequences to a subset of the unit interval, which we will call the set of good channels.
Definition 2 (The Good and the Bad Channels). Let G denote the set of good channels, i.e., $G := f(\{b \in \Omega : I(W_{\infty}^{b}) = 1\})$. Let B denote the set of bad channels, i.e., $B := f(\{b \in \Omega : I(W_{\infty}^{b}) = 0\})$.
If I(W) = 0, then all polarized channels are useless and we have B = [0, 1]. Similarly, if I(W) = 1, then all polarized channels are perfect and we have G = [0, 1]. We hence assume throughout this section that the channel W is nontrivial, i.e., that 0 < I(W) < 1.

Proof. See Appendix A.
It is not really surprising that G and B are not disjoint; this is a direct consequence of the fact that f is not injective. It is not obvious, however, that the intersection exhausts the set on which f is non-injective. A consequence of this proposition is that there is no interval that contains only good channels. This has implications for code construction techniques. Indeed, the authors of [18,19] suggest that, for a polar code of a given blocklength, one may stop polarizing channels at shorter blocklengths and use copies of these channels rather than their polarization. For example, they suggest using the channels $(W_{2^n}^{b^n}, W_{2^n}^{b^n})$ rather than the channels $(W_{2^{n+1}}^{b^n 0}, W_{2^{n+1}}^{b^n 1})$ if $I(W_{2^n}^{b^n})$ is sufficiently large. Such a procedure can be justified if further polarizing $W_{2^n}^{b^n}$ to the desired blocklength leads to including all channels polarized from $W_{2^n}^{b^n}$ in the code. Such a justification can never hold for polar codes with unbounded blocklength: Stopping the polarization at a given blocklength $2^n$ for a given polarization sequence and using copies of the resulting channel $W_{2^n}^{b^n}$ is equivalent to including a dyadic interval in the index set. By Proposition 2, this dyadic interval contains bad channels, which shows that this choice is suboptimal.

Proposition 3 (Symmetry).
There exists a function ϑ, defined for almost all values in [0, 1], that is independent of W and satisfies 0 ≤ ϑ(x) ≤ 1 and ϑ (1
Proposition 3 has two implications. The first implication concerns the alignment of the sets G and G′ for two different channels W and W′. Specifically, it is connected to the question whether Z(W) ≥ Z(W′) implies G ⊆ G′. In general, the answer is negative [5]. Indeed, it may happen that for some b ∈ Ω, we have , that the polarized channel turns out to be good even though the sufficient condition from Proposition 3 is not fulfilled. Such a situation cannot occur for BECs, as Proposition 3 shows. Hence, the set of good channels for a BEC is also good for any binary-input memoryless channel with a smaller Bhattacharyya parameter [20].
The second implication is that, at least for BECs, the sets G and B are symmetric. Indeed, if ϑ(x) = Z(W), then x ∈ G implies 1 − x ∈ B. This symmetry is visible in the polar fractal that we display in Figure 1.
It is possible to define ϑ for x ∈ D. We know from Proposition 2 that dyadic rationals are both good and bad, hence setting ϑ(x) = 1 for every x ∈ D leads to D ⊆ G. (The fact that also D ⊆ B is not captured by nor in conflict with this setting.) The question whether the function ϑ can be defined for x ∈ Q \ D is more interesting. In this case, the binary expansion is unique and recurring, i.e., there is a length-k sequence $a^k \in \{0, 1\}^k$ such that $f(b^n a^k a^k a^k \cdots) = x$ for some $b^n \in \{0, 1\}^n$. It is straightforward to show that for every non-trivial sequence $a^k$ (i.e., $a^k$ contains both zeros and ones), $p_{a^k}$ maps [0, 1] to [0, 1] and is non-negative and non-decreasing with vanishing derivatives at 0 and 1. Since this ensures that $p_{a^k}(z) < z$ for z close to zero and $p_{a^k}(z) > z$ for z close to one, the operation $z_{i+1} = p_{a^k}(z_i)$ constitutes an iterated function system with attracting fixed points at z = 0 and z = 1. Note further that, since $p_{a^k}$ corresponds to the recurring part of the binary expansion of x, $Z(W_{\infty}^{b^n a^k a^k \cdots})$ will be bounded from above by the value to which this iterated function system converges after being initialized with $Z(W_{2^n}^{b^n})$. To show that Proposition 3 holds for x ∈ Q \ D requires showing that $p_{a^k}$ intersects the identity function only once on (0, 1), i.e., that there is no attracting fixed point on this open interval. We leave this problem for future investigation.
Example 2. Let x = 2/3, hence $f^{-1}(x) = 101010101\cdots$. We determine the fixed points of the iterated function system corresponding to one period of the recurring sequence, i.e., the fixed points of $p_{10}(z) = 2z^2 - z^4$. These are given by the roots of $p_{10}(z) - z = -z(z - 1)(z^2 + z - 1)$, which are z = 0, z = 1, and $z = (\pm\sqrt{5} - 1)/2$. One of these latter nontrivial roots lies outside [0, 1] and is hence irrelevant. The remaining root, $(\sqrt{5} - 1)/2 \approx 0.618$, determines the threshold.
Proof. See Appendix C.
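The fixed points in Example 2 can be checked numerically. The sketch below only uses the polynomial $p_{10}(z) = 2z^2 - z^4$ from the example; the starting values 0.60 and 0.63 and the helper names are illustrative choices. It confirms that the nontrivial root in [0, 1] is a fixed point and that the iteration is driven to 0 from below this root and to 1 from above it.

```python
import math


def p10(z):
    """One period of the recurring sequence 10 from Example 2."""
    return 2 * z**2 - z**4


def iterate_p10(z0, steps=200):
    """Apply p10 repeatedly, starting from z0."""
    z = z0
    for _ in range(steps):
        z = p10(z)
    return z


# Nontrivial roots of p10(z) - z = -z (z - 1) (z^2 + z - 1).
root_inside = (math.sqrt(5) - 1) / 2    # lies in [0, 1]
root_outside = (-math.sqrt(5) - 1) / 2  # lies outside [0, 1]

if __name__ == "__main__":
    print(root_inside, p10(root_inside))
    print(iterate_p10(0.60), iterate_p10(0.63))
```

The root inside the unit interval is repelling, so it separates the basins of the two attracting fixed points 0 and 1.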
Loosely speaking, the Lebesgue measure of G is the asymptotic equivalent of the rate of the "infinite-blocklength" polar code for the channel W. The fact that λ(G) = I(W) states that the rate approaches the symmetric capacity of W. A positive Lebesgue measure and a Hausdorff dimension equal to one are not indicators of fractality.
The last fractal property we consider is self-similarity. As Falconer notes [1] (p. xxviii), self-similarity often occurs only approximately. What we show in the following proposition is that the set G is quasi self-similar. Along the same lines, the quasi self-similarity of B can be shown.
Proof. See Appendix D.
In other words, at least for a symmetric channel, G is composed of two similar copies of itself (see Figure 1). The self-similarity is closely related to the fact that polar codes are decreasing monomial codes [6] (Theorem 1).
Example 3. We want to determine whether 1/3 ∈ G for a given BEC W. This question translates to the questions whether $1/6 \in G_1(1)$ and whether $2/3 \in G_1(2)$. Along the lines of Example 2, we obtain ϑ(1/6) ≈ 0.214, Since W is a BEC, we can connect this with Proposition 3 and thus obtain the inclusion indicated in Proposition 5.
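The value ϑ(1/6) ≈ 0.214 in Example 3 can be reproduced numerically. The following sketch rests on two assumptions that are not spelled out in this section: the BEC recursion (z ↦ 2z − z² for a 0, z ↦ z² for a 1, applied in the order of the binary expansion), and the reading of ϑ(x) as the largest Bhattacharyya parameter for which the iteration along the expansion of x still converges to zero. The expansion is passed as a prefix plus a recurring period, and the threshold is located by bisection; all names are hypothetical.

```python
def bec_step(z, bit):
    """Assumed BEC recursion for the Bhattacharyya parameter."""
    return z * z if bit == "1" else 2 * z - z * z


def limit_z(z0, prefix, period, reps=200):
    """Follow an eventually periodic binary expansion, starting from z0."""
    z = z0
    for bit in prefix:
        z = bec_step(z, bit)
    for _ in range(reps):
        for bit in period:
            z = bec_step(z, bit)
    return z


def threshold(prefix, period, iters=60):
    """Bisection for the starting value that separates the limits 0 and 1.

    All maps are non-decreasing, so the limit is monotone in the start value.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if limit_z(mid, prefix, period) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # 1/6 = 0.00 10 10 ..., 1/3 = 0.0 10 10 ..., 2/3 = 0.10 10 ...
    for name, prefix in (("1/6", "00"), ("1/3", "0"), ("2/3", "")):
        print(name, round(threshold(prefix, "10"), 3))
```

Under these assumptions, the three thresholds are linked by the prefix maps: prepending a 0 to the expansion lowers the admissible Bhattacharyya parameter.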

Preliminaries for Reed-Muller Codes
An order-r, length-$2^n$ Reed-Muller code is defined by a generator matrix $G_{RM}(r, n)$ composed of all rows of G(n) with a Hamming weight of at least $2^{n-r}$. For example, we have $G_{RM}(n, n) = G(n)$, while $G_{RM}(0, n)$ is a single row vector containing only ones (length-$2^n$ repetition code). To make this more precise, let $w(b^n) = \sum_{i=1}^{n} b_i$ be the Hamming weight of $b^n \in \{0, 1\}^n$ and let $s_i(n)$ be the i-th row of G(n). Then, the generator matrix $G_{RM}(r, n)$ of an order-r, length-$2^n$ Reed-Muller code consists of the rows of G(n) indexed by [4] $F = \{i : w(s_i(n)) \ge 2^{n-r}\}$.
To analyze the effect of doubling the block length, note that
$G(n+1) = \begin{bmatrix} G(n) & 0 \\ G(n) & G(n) \end{bmatrix}$. (21)
Assume that we index the rows of G(n) by a sequence of binary numbers, i.e., let the i-th row be indexed by $h_n(b^n) := 2^n \sum_{l=1}^{n} b_l 2^{-l}$. Furthermore, let $0b^n$ and $1b^n$ denote the sequences of zeros and ones obtained by prepending 0 and 1 to $b^n$, respectively. Clearly, $h_{n+1}(0b^n) = h_n(b^n)$ and $h_{n+1}(1b^n) = h_n(b^n) + 2^n$. Combining this with (21) yields
$w(s_{h_{n+1}(0b^n)}(n+1)) = w(s_{h_n(b^n)}(n))$,
$w(s_{h_{n+1}(1b^n)}(n+1)) = 2\,w(s_{h_n(b^n)}(n))$.
Defining G(0) := 1, we thus get $w(s_{h_n(b^n)}(n)) = 2^{w(b^n)}$ and
$F = \{h_n(b^n) : b^n \in \{0, 1\}^n,\ w(b^n) \ge n - r\}$. (23)
In Section 5, we will analyze the properties of F in the limit as n tends to infinity. An important ingredient in our proofs is the concept of normal numbers.
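The relation between the Hamming weight of a row of G(n) and the number of ones in the binary expansion of its index is easy to verify for small n. The sketch below builds G(n) by the block recursion G(n+1) = [[G(n), 0], [G(n), G(n)]] starting from G(0) = 1 and selects the rows of an order-r Reed-Muller code. Rows are indexed from 0, matching the range of $h_n$; the function names are hypothetical.

```python
def kron_power(n):
    """G(n) as a list of rows, via G(n+1) = [[G(n), 0], [G(n), G(n)]]."""
    g = [[1]]  # G(0) := 1
    for _ in range(n):
        top = [row + [0] * len(row) for row in g]
        bottom = [row + row for row in g]
        g = top + bottom
    return g


def popcount(i):
    """Number of ones in the binary expansion of the row index i."""
    return bin(i).count("1")


def reed_muller_rows(r, n):
    """Row indices (0-based) of G(n) with Hamming weight at least 2**(n - r)."""
    return [i for i in range(2**n) if popcount(i) >= n - r]


if __name__ == "__main__":
    g = kron_power(3)
    for i, row in enumerate(g):
        print(i, row, sum(row), 2 ** popcount(i))
    print(reed_muller_rows(1, 3))
```

Selecting rows by the weight of the index, rather than by the weight of the row itself, is what allows the index set to be studied through binary expansions of real numbers.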

Definition 3 (Normal Numbers).
In general, a number is simply normal to base M if each of the M digits occurs in its M-ary expansion with asymptotic relative frequency 1/M. A number is called normal in base M if this property holds not only for digits, but also for subsequences, i.e., if, for each k ≥ 1, each length-k sequence occurs in its M-ary expansion with asymptotic relative frequency $1/M^k$. It immediately follows that a normal number is simply normal. The converse is in general not true:
Example 4. Let x = 1/3, hence b = 010101 · · · . x is simply normal to base 2, but not normal (since the sequences 00 and 11 never occur). Let x = 1/7, hence b = 001001001 · · · . x is neither normal nor simply normal. Let x ∈ D, hence b is either terminating ($\lim_{n\to\infty} w(b^n)/n = 0$) or non-terminating ($\lim_{n\to\infty} w(b^n)/n = 1$). Hence, dyadic rationals are not simply normal.
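The frequencies in Example 4 can be checked on finite prefixes of the binary expansions. The sketch below uses exact rational arithmetic to generate the first n binary digits and counts how often a given block occurs among all sliding windows. Finite prefixes can only illustrate, not prove, statements about asymptotic frequencies; the prefix length 600 and the function names are arbitrary choices.

```python
from fractions import Fraction


def binary_digits(x, n):
    """First n binary digits of x in [0, 1), computed exactly."""
    digits = []
    for _ in range(n):
        x *= 2
        d = int(x >= 1)
        digits.append(d)
        x -= d
    return digits


def block_frequency(digits, block):
    """Relative frequency of a block among all sliding windows."""
    k = len(block)
    windows = len(digits) - k + 1
    hits = sum(1 for i in range(windows) if digits[i:i + k] == list(block))
    return hits / windows


if __name__ == "__main__":
    third = binary_digits(Fraction(1, 3), 600)
    seventh = binary_digits(Fraction(1, 7), 600)
    print(block_frequency(third, (1,)), block_frequency(third, (0, 0)))
    print(block_frequency(seventh, (1,)))
```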
Despite this result, there are uncountably many numbers in the unit interval which are not normal. Moreover, the set of numbers that are not normal is superfractal, i.e., it has a Hausdorff dimension equal to one although it has zero Lebesgue measure [22].

Fractal Properties of the Set of Heavy Codewords
If we let n tend to infinity, the definition of F in (23) becomes problematic. Rather than looking at order-r, length-$2^n$ Reed-Muller codes, we investigate order-(1 − ρ)n, length-$2^n$ codes, where we assume that ρn is integer. In other words, we assume that the threshold for the Hamming weight of $b^n$ increases linearly with n. This gives rise to the definition of heavy codewords:
Definition 4 (The Heavy Codewords). Let H(ρ) denote the set of ρ-heavy codewords, i.e.,
Loosely speaking, the set of heavy codewords corresponds to those rows of G(n) that asymptotically have a fractional Hamming weight larger than a given threshold.
As for polar codes, no interval is contained in either H(ρ) or its complement (except in the trivial cases H(0) and H(1)). This is again in contrast with the intuition one obtains for Reed-Muller codes with finite blocklength. Suppose we fix n to be even and set r = n/2, i.e., we require that at least one half of the bits in $b^n$ are one. The matrix G(n) resembles a Sierpinski triangle, as depicted in [2] (Figure 2). In our notation, the set F indexes none of the first $2^{n/2} - 1$ rows of G(n), since they cannot have sufficient Hamming weight. Consequently, the transition as n → ∞ creates complications that are not present for finite n, and one needs to depart from intuition based on these finite-blocklength considerations.

Proposition 7 (Lebesgue Measure & Hausdorff Dimension). H(ρ) is Lebesgue measurable and has Lebesgue measure
$\lambda(H(\rho)) = \begin{cases} 1, & \rho < 1/2 \\ 0, & \rho \ge 1/2. \end{cases}$
The Hausdorff dimension satisfies

Proof. See Appendix F.
Loosely speaking, the Lebesgue measure of H(ρ) is the asymptotic equivalent of the rate of the fractional order-ρ Reed-Muller code. As we showed in Proposition 4, the Lebesgue measure of G is equal to the symmetric capacity of W. In contrast, the set H(ρ) does not depend on W. Rather, Proposition 7 suggests that the order parameter ρ induces a phase transition for the rate of Reed-Muller codes: If ρ < 1/2, the "infinite-blocklength" Reed-Muller code consists of almost all (in the sense of Lebesgue measure) possible binary sequences. In contrast, if ρ ≥ 1/2, the "infinite-blocklength" Reed-Muller code consists of almost no codewords (again, in the sense of Lebesgue measure).
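The phase transition can be made plausible with a finite-blocklength computation. The sketch below evaluates the rate of an order-(1 − ρ)n, length-$2^n$ Reed-Muller code as the fraction of sequences $b^n$ with Hamming weight at least ρn, i.e., as a binomial tail. The weight threshold is passed as an integer to avoid rounding issues, and the choice n = 200 is arbitrary. This is a finite-n illustration only: the boundary case ρ = 1/2 concerns the limit set H(1/2) and is not captured by this computation.

```python
from math import comb


def fractional_rm_rate(n, min_weight):
    """Fraction of length-n binary sequences with Hamming weight >= min_weight.

    With min_weight = rho * n this is the rate of an order-(1 - rho)n,
    length-2**n Reed-Muller code.
    """
    count = sum(comb(n, k) for k in range(min_weight, n + 1))
    return count / 2**n


if __name__ == "__main__":
    n = 200
    for min_weight in (80, 90, 110, 120):  # rho = 0.40, 0.45, 0.55, 0.60
        print(min_weight / n, fractional_rm_rate(n, min_weight))
```

As n grows, the rate for any fixed ρ < 1/2 approaches one and the rate for any fixed ρ > 1/2 approaches zero, in line with the two regimes discussed above.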
Let us briefly consider the case ρ = 1/2. For this case, Proposition 7 states that H(ρ) is a Lebesgue null set that has a Hausdorff dimension equal to 1. Thus, the set H(1/2) is a superfractal. Unfortunately, we were not able to give an exact expression for the Hausdorff dimension of H(ρ) for ρ > 1/2. While the set of all non-normal numbers is superfractal, we are not sure if this holds also for the specific proper subset H(ρ).

Discussion and Outlook
That Kronecker product-based codes possess fractal properties has long been suspected. The present manuscript contains several results that back this suspicion with solid mathematical analyses. Specifically, we assumed that the blocklength tends to infinity and investigated the properties of the set G of virtual channels that are perfect and the set H(ρ) of codewords that have a fractional Hamming weight no less than ρ. Since both polar codes and Reed-Muller codes are obtained by a simple, recursive procedure, it remains to investigate whether the sets G and H(ρ) satisfy any of the following properties [1] (p. xxviii):
• The set has a fine structure, i.e., there is detail on arbitrarily small scales;
• It does not admit a description in traditional geometrical language, neither locally nor globally; it is irregular in some sense;
• It has some form of self-similarity, at least approximate or statistical;
• The fractal dimension of the set exceeds its topological dimension.
Indeed, the sets G and H(ρ) possess a fine structure in the sense that they are dense in the unit interval, but that also their complements are dense in the unit interval (cf. Propositions 2 and 6). Therefore, at an arbitrarily small scale, the sets G and H(ρ) admit no simple description in geometrical language. Both of these sets are self-similar in a specific sense, as we outlined in Propositions 5 and 8. Finally, while G has a fractal dimension of one (cf. Proposition 4), the set H(ρ) has, for a certain range of ρ, a positive (fractional?) Hausdorff dimension despite being a Lebesgue null set. This result, which we proved in Proposition 7, is one of the defining properties of a fractal set.
One reviewer pointed out that our definition of H(ρ) can be complemented by a different one. Specifically, while H(ρ) indexes the codewords with a fractional Hamming weight not smaller than ρ, one could define a set H (R) indexing the codewords of a Reed-Muller code with rate R. In other words, while H(ρ) is parameterized via the fractional order of the code, H (R) is parameterized via its rate. We expect that the Lebesgue measure of the (adequately defined) set H (R) should be R and that, thus, its Hausdorff dimension equals one. An appropriate definition of H (R) is tied to the set F of a rate-R, length-2 n Reed-Muller code (such as is our Definition 4). Since finding such a definition has so far eluded us, we postpone this investigation to future work.
Another obvious extension of our work is to non-binary polar and Reed-Muller codes. For example, consider an $\ell \times \ell$ matrix with entries from {0, . . . , q − 1}, where q is prime. One can show that this matrix is polarizing as long as it is not upper-triangular [15] (Theorem 5.2). We believe that our analysis can be replicated by considering the $\ell$-ary expansion of real numbers in [0, 1]. Along the same lines, it would be interesting to examine the properties of q-ary Reed-Muller codes, e.g., [23,24].

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Proof of Proposition 2
That G ∩ B ⊆ D follows from the fact that only dyadic rationals have a non-unique binary expansion. In particular, the preimage of x ∈ D consists of two elements, namely b := (b n 0000000 · · · ) (A1) and b := (b n−1 b n 1111111 · · · ).
By the properties of polarization and with the assumption that 0 < I(W) < 1, we have that 0 < I(W b n−1 b n 2 n ), I(W b n 2 n ) < 1, and hence also 0 < Z(W b n−1 b n 2 n ), Z(W b n 2 n ) < 1. Moreover, it follows from Lemma 2 and (9) that from which we obtain I(W b ∞ ) = 1 and x ∈ G. Similarly, with Lemma 2 and (10) we obtain Hence, I(W b ∞ ) = 0 and x ∈ B. Since this holds for every x ∈ D, we have that G ∩ B = D.
The proof that G \ D and B \ D are dense in [0, 1] follows along similar lines. Specifically, we show that between every two dyadic rationals we can find rational numbers x, x′ ∈ Q \ D such that x ∈ G and x′ ∈ B. To this end, fix $x_1 = p/2^n$ and $x_2 = (p + 1)/2^n$. Let further $b^n$ be the terminating binary expansion of $x_1$, i.e., $f(b^n 000 \cdots) = x_1$.
Let a k be such that a 1 = · · · = a k−1 = 1 and a k = 0. We can bound the polynomial p a k from above: The bound crosses z at z = 0 and at z * = 2 −1/(2 k−1 −1) , where z * can be made arbitrarily close to one for k sufficiently large. Now let z i+1 = p a k (z i ), where z 0 = Z(W b n 2 n ). It follows that Z(W b n a k a k a k ··· ∞ ) ≤ lim i→∞ z i . However, z i → 0 if z 0 < z * , hence if k is sufficiently large such that this holds, then Z(W b n a k a k a k ··· ∞ ) ≤ 0. Thus, I(W b n a k a k a k ··· ∞ ) = 1 and x = f (b n a k a k a k · · · ) ∈ G. Recall that a k is such that a 1 = · · · = a k−1 = 0 and a k = 1. We next bound the polynomial q a k from below: which intersects z at z = 0, at some root z * that can be made arbitrarily close to zero for k sufficiently large, and at some root z † < 1 that tends to one if k becomes large. Note further that the slope of q a k equals 2 k z(1 − z 2 ) 2 k−1 −1 . By setting k sufficiently large, one can guarantee that this slope is smaller than one on the interval [z † , 1]. Now let z i+1 = q a k (z i ), where z 0 = Z(W b n 2 n ). Suppose further that we have chosen k sufficiently large such that z 0 > z * and that the slope of q a k is smaller than one on the interval [z † , 1]. We know that q a k (z) > z on the interval (z * , z † ), since this is the case for the lower bound in (A5). However, since the slope of q a k is smaller than one on the interval [z † , 1] and since q a k intersects z at one, there can be no further intersection between q a k and z on the interval [z † , 1]. Hence, q a k (z) > z on the interval (z * , 1], and z i → 1 since z 0 > z * . Since furthermore Z(W b n a k a k a k ··· ∞ ) ≥ lim i→∞ z i , we obtain Z(W b n a k a k a k ··· ∞ ) ≥ 1. Thus, I(W b n a k a k a k ··· ∞ ) = 0 and x = f (b n a k a k a k · · · ) ∈ B.
Since both numbers constructed above, i.e., the images under f of $b^n$ followed by the two recurring sequences, lie in the interval $(x_1, x_2)$, this shows that between every two dyadic rationals there is a number in Q \ D that is good and a number in Q \ D that is bad. This proves that G \ D and B \ D are dense in [0, 1].
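The two iterations used in this proof can be illustrated numerically. The sketch below makes the following assumptions: for the recurring sequence 1···10, the polynomial is taken to be $p_{a^k}(z) = 2z^{2^{k-1}} - z^{2^k}$, which is consistent with the crossing point $z^* = 2^{-1/(2^{k-1}-1)}$ of its upper bound $2z^{2^{k-1}}$; for the recurring sequence 0···01, the function is taken to be $q_{a^k}(z) = 1 - (1 - z^2)^{2^{k-1}}$, which is the function with $q_{a^k}(1) = 1$ whose slope is $2^k z (1 - z^2)^{2^{k-1}-1}$. The starting value 0.5, the choice k = 6, and the function names are arbitrary.

```python
def z_star(k):
    """Nonzero crossing of the bound 2 z^(2^(k-1)) with the identity."""
    return 2.0 ** (-1.0 / (2 ** (k - 1) - 1))


def p_upper(z, k):
    """Assumed polynomial for the period 1...10 (k - 1 ones, then one zero)."""
    y = z ** (2 ** (k - 1))
    return 2 * y - y * y


def q_lower(z, k):
    """Assumed function for the period 0...01, matching the stated slope."""
    return 1.0 - (1.0 - z * z) ** (2 ** (k - 1))


def iterate(fn, z0, k, steps=50):
    """Apply fn(., k) repeatedly, starting from z0."""
    z = z0
    for _ in range(steps):
        z = fn(z, k)
    return z


if __name__ == "__main__":
    k = 6
    print(z_star(k))                # close to one for large k
    print(iterate(p_upper, 0.5, k)) # tends to 0: good channel
    print(iterate(q_lower, 0.5, k)) # tends to 1: bad channel
```

With the same starting value, the two recurring sequences thus lead to opposite limits, which is the mechanism behind the density of both G \ D and B \ D.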

Appendix C. Proof of Proposition 4
For the proof we utilize properties of f derived in the proof of [26] (Theorem 2.1, p. 7). Specifically, let E ⊂ Ω be the set of all binary sequences with infinitely many zeros, i.e., E contains the binary expansion of all numbers x ∈ [0, 1] \ D and the terminating binary expansions of all numbers x ∈ D.
Since Ω \ E is countable, E is a Borel set. The function f restricted to E, f : E → [0, 1], is bijective and measurable (since f is measurable by Lemma 1). Finally, the inverse function f −1 : The set E contains all binary sequences except for non-terminating expansions of dyadic rationals, which lead to good channels by the proof of Proposition 2. Thus, we have Since this set has probability 1 − I(W) by Proposition 1, it is a Borel set (otherwise, it would not be measurable). However, since f ({b ∈ E: I(W b ∞ ) = 0}) = B, it follows that B is a Borel set of [0, 1]. That G is a Borel set can be shown along similar lines, with E containing all binary sequences with infinitely many ones.
Every Borel set is Lebesgue measurable. To evaluate the Lebesgue measure of B, note that Since B ∈ B [0,1] , we get from Lemma 1 that λ(B) = P( f −1 (B)). By the monotonicity and countable subadditivity of measures, we have Hence, by Proposition 1, λ(B) = P({b ∈ Ω: The proof for the set of good channels follows along the same lines. Since the one-dimensional Hausdorff measure of a Borel set equals its Lebesgue measure [1] (Equation (3.4), p. 45), it immediately follows that G and B have a Hausdorff dimension equal to one.
We now turn to the proof of the Hausdorff dimension. For ρ < 1/2, H(ρ) has full Lebesgue measure. Since every Lebesgue measurable set has a Borel subset with the same Lebesgue measure, and since Hausdorff dimension is monotonic, we have d(H(ρ)) = 1 for ρ < 1/2. for some ξ ∈ (0, 1). Note that $\tilde{N}_{1/2} = N$. By [30] (cf. [21] (Chapter 8) for further notes), the Hausdorff dimension of this set is given by $d(\tilde{N}_{\xi}) = h_2(\xi)$. (Interestingly, in Eggleston's paper, the dimension was not connected to entropy; it was submitted earlier in the same year as Shannon's Mathematical Theory of Communication was published.)

Appendix G. Proof of Proposition 8
The proof follows along the lines of the proof of Proposition 5. Let again b n k be the terminating expansion of (k − 1)2 −n and let a ∈ Ω. The connections between the sequences b := b n k a, b − := b n k 0a, and b + := b n k 1a have been established above. To prove the theorem, we have to show that