An Enhanced Belief Propagation Flipping Decoder for Polar Codes with Stepping Strategy

The Belief Propagation (BP) algorithm has the advantages of high-speed decoding and low latency. To improve the block error rate (BLER) performance of BP-based decoding, the BP flipping algorithm was proposed. However, the BP flipping algorithm makes numerous useless flipping attempts in pursuit of that BLER improvement. To reduce the number of decoding attempts without any loss of BLER performance, this paper presents a metric that evaluates the likelihood that flipping a given bit will correct the BP decoding. Based on this metric, a BP-Step-Flipping (BPSF) algorithm is proposed, which flips only the unreliable bits in the flip set (FS) and skips over the reliable ones. In addition, a threshold β is applied when the magnitude of the log-likelihood ratio (LLR) is small, yielding an enhanced BPSF (EBPSF) algorithm with a lower BLER. With the same FS, the proposed algorithm efficiently reduces the average number of iterations. Numerical results show that, for N = 256 and Eb/N0 = 1.5 dB, the average number of iterations of EBPSF-1 is 77.5% lower than that of the BP bit-flip-1 (BPF-1) algorithm.


Introduction
Polar codes [1] are the first error-correcting codes shown to achieve the Shannon limit. They have been adopted in the fifth generation (5G) wireless communications standard [2]. Not only do they have a strong error-correction capability, but their encoding and decoding complexity is also affordable compared with that of Low Density Parity Check (LDPC) [3] and Turbo codes [4].
The Successive Cancellation (SC) algorithm, proposed by Arıkan [5], is one of the most common decoding methods for polar codes and has attracted widespread attention. The original SC algorithm has since been optimized, producing variants such as the SC list (SCL) [6], SC flip (SCF) [7], and dynamic SC flipping (DSCF) [8] algorithms. The cyclic redundancy check (CRC)-aided SC list (CA-SCL) [9] decoder was introduced to improve the BLER of polar codes and has become a baseline algorithm in the standardization process. Compared with SC and its optimized variants, the Belief Propagation (BP) algorithm [10], which decodes in parallel, has great advantages in throughput and decoding latency. In addition, BP decoding is expected to support the polar codes of the 5G standard in practical applications with a set of hardware units. However, the BP decoder does not terminate until the maximum number of iterations is reached; this termination scheme lacks flexibility and introduces great computational complexity. Moreover, the BLER performance of BP is uncompetitive.
To improve the original BP decoding performance, many BP-based methods have been proposed, for example the BP list (BPL) [11] decoder. When using the standard polar code decoding factor graph (DFG), the original BP algorithm may generate incorrect decoding results because the transmission process of the messages is
• A stepping strategy is proposed. We first analyze the behavior of the FS and find that only a few bits in the FS can correct the error frame. Therefore, a metric that evaluates the likelihood of a bit correcting the trajectory of BP decoding is presented to judge whether each bit in the FS should be flipped. This judgement condition determines whether flipping a bit in the FS helps to correct error frames.
• Based on the stepping strategy, an optimization of the BP flipping algorithm, the BP step-flipping (BPSF) algorithm, is proposed. The algorithm flips only the unreliable bits in the FS and steps over the reliable bits to reduce the number of flipping attempts. Similarly, the stepping strategy is added to the GBPF-Ω algorithm [17,18] to reduce the number of flipping attempts.
• In addition, we notice that some effective flipping bits may be skipped over when the LLR magnitude is small. We further propose the enhanced BPSF-Ω (EBPSF-Ω) algorithm, which adopts a threshold to identify the unreliable bits and lowers the block error rate (BLER). The numerical results indicate that, for a code length of 256, the EBPSF-1 and EBPSF-2 algorithms significantly reduce the average number of iterations at low Eb/N0 compared with the BPF-1 and BPF-2 flipping algorithms.
The remainder of this paper is organized as follows. Section 2 reviews the polar code, the original BP algorithm, and the BP flipping algorithm. The BPSF-Ω algorithm with a threshold is proposed in Section 3. Section 4 analyzes the decoding performance. Conclusions are drawn in Section 5.

Preliminary
In this paper, we use calligraphic characters, such as $\mathcal{R}$, to denote sets. We write $r$, $\mathbf{r}$, and $\mathbf{R}$ to denote a scalar, a vector, and a matrix, respectively. In this section, we first describe the polar codes. Then, we briefly introduce the original BP algorithm. Finally, the BP flipping algorithm is presented.

Polar Code
After channel combining and channel splitting, N independent copies of a binary-input discrete memoryless channel are converted into N split channels with different capacities [18]. Some of these have a high capacity, which means that they are more reliable for transmitting information, and some have a low capacity. Polar codes use the high-capacity channels to transmit information bits and CRC bits, and the remaining channels to transmit frozen bits. In this paper, the frozen bits are fixed to zero.
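To make the polarization effect concrete, the following minimal sketch tracks the Bhattacharyya parameter Z of each split channel (a smaller Z means a more reliable channel). It assumes a binary erasure channel, for which the recursion Z⁻ = 2Z − Z², Z⁺ = Z² is exact; this is an illustration only and is not claimed to be the construction used in this paper. The function names and the erasure probability 0.5 are illustrative assumptions.

```python
def bec_bhattacharyya(n, eps=0.5):
    """Bhattacharyya parameters of the N = 2**n split channels of a BEC(eps).

    Assumption: BEC recursion only (Z- = 2Z - Z^2, Z+ = Z^2); other channels
    need a different construction method.
    """
    z = [eps]
    for _ in range(n):
        # Each channel splits into a worse ("minus") and a better ("plus") one.
        z = [v for zi in z for v in (2 * zi - zi * zi, zi * zi)]
    return z


def most_reliable(z, k):
    """1-based indices of the k split channels with the smallest Z."""
    ranked = sorted(range(1, len(z) + 1), key=lambda i: z[i - 1])
    return sorted(ranked[:k])


if __name__ == "__main__":
    z = bec_bhattacharyya(2)
    print(z)                    # Z of the four split channels for N = 4
    print(most_reliable(z, 2))  # positions that would carry information bits
```

Under these assumptions, the positions returned by `most_reliable` would carry the information and CRC bits, and the remaining positions would be frozen.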
A polar code can be represented as P(N, K), where N is the code length and K is the number of information bits, so that (N − K) is the number of frozen bits. The code rate is R = K/N. The K information bits comprise (K − r) data bits and r CRC bits. The sets of information bits and frozen bits are denoted as $\mathcal{A}$ and $\mathcal{A}^c$, respectively. The encoding process can be expressed as

$$x_1^N = u_1^N G_N,$$

where $x_1^N = \{x_1, x_2, \ldots, x_N\}$ represents the codeword and $u_1^N = \{u_1, u_2, \ldots, u_N\}$ denotes the source vector, which consists of the information bits $u_{\mathcal{A}}$ and the frozen bits $u_{\mathcal{A}^c}$. The generator matrix is $G_N = B_N F^{\otimes n}$, where $B_N$ is the bit-reversal permutation matrix [20] and $F^{\otimes n}$ denotes the n-th Kronecker power of $F$, with $n = \log_2 N$ and

$$F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$$

The BP decoding is initiated from the received vector $y_1^N = \{y_1, y_2, \ldots, y_N\}$. The decoder generates an estimate $\hat{u}_i$ of $u_i$ based on the received $y_1^N$. A binary-input memoryless channel W generates N sub-channels by channel splitting.
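The encoding relation $x_1^N = u_1^N B_N F^{\otimes n}$ can be illustrated with a short pure-Python sketch over GF(2). It builds the full generator matrix, which is only practical for small N; the function names are illustrative and not taken from the paper.

```python
def kron_power_f(n):
    """Return F^{(x)n} as a list of rows over GF(2), with F = [[1, 0], [1, 1]]."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        top = [row + [0] * size for row in g]   # [G, 0]
        bottom = [row + row for row in g]       # [G, G]
        g = top + bottom
    return g


def bit_reverse(i, n):
    """Reverse the n-bit binary representation of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def polar_encode(u):
    """Compute x = u * B_N * F^{(x)n} over GF(2) for a 0/1 list u of length 2**n."""
    N = len(u)
    n = N.bit_length() - 1
    if N != 1 << n:
        raise ValueError("code length must be a power of two")
    f_n = kron_power_f(n)
    # Multiplying by B_N permutes the entries of u in bit-reversed order.
    v = [u[bit_reverse(j, n)] for j in range(N)]
    return [sum(v[k] & f_n[k][j] for k in range(N)) % 2 for j in range(N)]


if __name__ == "__main__":
    print(polar_encode([0, 1, 0, 0]))
```

Because $B_N$ commutes with $F^{\otimes n}$ and both are involutions over GF(2), encoding a codeword a second time returns the original source vector, which gives a simple self-check for an implementation like this.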

Original BP Decoding Algorithm
The BP algorithm for polar codes is based on a DFG. A polar code with code length N is represented by an n-stage DFG. We use (i, j) to indicate the nodes of the DFG, where i is the node index and j is the column index. The leftmost nodes in the DFG correspond to j = 0, such as the blue and black nodes in Figure 1. Similarly, the rightmost nodes correspond to j = n, as shown in the grey node column.
The classic BP DFG is depicted in Figure 1. Each stage consists of N/2 processing elements (PEs); a fundamental PE is shown in Figure 2. The DFG in Figure 1 consists of three stages, each with four PEs. One PE has four nodes, and each node is associated with two types of messages, a right-to-left message $L_{j,i}$ and a left-to-right message $R_{j,i}$, both in the form of LLRs. The message propagation rules are built on the function $g(x, y) = \alpha \cdot \mathrm{sign}(x) \cdot \mathrm{sign}(y) \cdot \min(|x|, |y|)$, where the scaling factor α = 0.9375 follows [21]. $L_{j,i}$ and $R_{j,i}$ need to be initialized before decoding: $L_n^{(j)}$ denotes the LLR of the j-th received bit, and the +∞ in the first column, $R_{j,1}$, indicates the prior knowledge carried by the frozen bits.
The maximum number of iterations $I_{\max}$ is preset, and decoding terminates when the number of iterations reaches $I_{\max}$ or when the CRC check is satisfied. The hard decisions $\hat{u}_i$ and $\hat{x}_i$ are then estimated from the resulting LLRs.
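As an illustration of how one PE could combine its four incoming messages, the sketch below uses the scaled min-sum function g with α = 0.9375 and the update form commonly found in the BP polar decoding literature. The exact index convention of this paper's equations is not reproduced; the argument names (`r_a`, `r_b` for the left-to-right messages entering the upper and lower left nodes, `l_c`, `l_d` for the right-to-left messages entering the upper and lower right nodes) are assumptions made for this example.

```python
import math

ALPHA = 0.9375  # scaling factor quoted in the text


def g(x, y, alpha=ALPHA):
    """Scaled min-sum approximation: alpha * sign(x) * sign(y) * min(|x|, |y|)."""
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    return alpha * sign * min(abs(x), abs(y))


def pe_update(r_a, r_b, l_c, l_d):
    """One processing-element update (commonly used form, assumed here).

    Returns (l_a, l_b, r_c, r_d): the right-to-left messages leaving the two
    left nodes and the left-to-right messages leaving the two right nodes.
    """
    l_a = g(l_c, l_d + r_b)
    l_b = g(r_a, l_c) + l_d
    r_c = g(r_a, l_d + r_b)
    r_d = g(r_a, l_c) + r_b
    return l_a, l_b, r_c, r_d


if __name__ == "__main__":
    # Upper-left node frozen (prior +inf), lower-left node carries no prior.
    print(pe_update(math.inf, 0.0, 1.0, -2.0))
```

A full decoder would apply this update to all N/2 PEs of every stage, sweeping the graph in both directions once per iteration.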

BP Flipping Decoding Algorithm
The bit-flipping strategy is a feasible way to improve the performance of BP-based algorithms. Owing to the parallelism of BP decoding, more than one bit may be able to correct the error frame when flipped. Besides the bits that are actually in error, other bits can also correct the error frame through the iterative computation on the DFG [18]. However, many more bits provide no help in correcting the error frame, and flipping them in vain causes considerable latency. Therefore, it is essential to study strategies for locating the flipping bits that can effectively correct the error frames.
Initially, the critical set (CS), which is composed of the first bit index of each Rate-1 node [15], was used to identify unreliable bits in the SC flipping decoder. As shown in Figure 3, blue nodes are referred to as Rate-1 nodes because all their leaf nodes are information bits, a white node means that all its leaf nodes are frozen bits, and a grey node means that its leaf nodes include both information and frozen bits. In Figure 3, CS = {8, 10, 11, 13} is shown as striped blue nodes. The size and elements of the CS are fixed for a given polar code, which means that the CS is static during decoding. Because of this characteristic, FS construction causes no latency. The BPF algorithm [15] employs the CS for bit-flipping.
For the GBPF decoding algorithm [17,18], the FS is constructed dynamically from the bits with the smallest LLR magnitudes. A sorting network is required to select the information bits that constitute the FS, $\mathcal{F}_1^{\Gamma} = \{F_1, F_2, \ldots, F_{\Gamma}\}$, where Γ denotes the length of the FS. Before the next round of BP decoding attempts, the FS is generated from the smallest LLR magnitudes and the bit-flipping operation is performed with the FS.
Figure 3. White nodes: all leaf nodes are frozen bits. Blue nodes: all leaf nodes are information bits. Striped blue nodes: the first information bit of a blue node. Grey nodes: leaf nodes include both information and frozen bits.
The oracle-assisted BP (OA-BP) decoder [18] knows which bit estimates cause the frame error after the original BP decoding, and re-decodes the incorrect frame by flipping each erroneous bit in turn to its correct value; τ denotes the number of erroneous bits.
The BPF algorithms differ in how they choose the flip set. The BPF algorithm generates the CS from the Rate-1 nodes before decoding, and the CS stays static during decoding. The GBPF algorithm generates the FS dynamically during decoding, which incurs the extra cost of sorting the LLR magnitudes. However, the GBPF algorithm also offers a greater chance of error correction by allowing larger FS sizes. The generalized procedure of the BPF-1 algorithm [15] and the GBPF-1 algorithm [17] is summarized in Algorithm 1.
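The two flip-set constructions described above can be sketched as follows. The first function collects the first index of every maximal Rate-1 node of the code tree (a static CS); the second picks the Γ information positions with the smallest LLR magnitudes (a dynamic FS). Both function names are illustrative, positions are 1-based to match the text, and the 16-bit information set used in the example is an assumption chosen because it reproduces CS = {8, 10, 11, 13}; the actual code of Figure 3 is not given in the text.

```python
def critical_set(is_info):
    """First (1-based) index of every maximal Rate-1 node.

    is_info[i] is True when position i + 1 carries an information bit.
    """
    cs = []

    def visit(lo, hi):  # subtree covering the half-open range [lo, hi)
        block = is_info[lo:hi]
        if all(block):                      # Rate-1 node: keep its first index
            cs.append(lo + 1)
        elif any(block) and hi - lo > 1:    # mixed node: descend into children
            mid = (lo + hi) // 2
            visit(lo, mid)
            visit(mid, hi)
        # otherwise the node is all frozen and contributes nothing

    visit(0, len(is_info))
    return cs


def smallest_llr_flip_set(llr, info_positions, gamma):
    """The gamma information positions (1-based) with the smallest |LLR|."""
    ranked = sorted(info_positions, key=lambda i: abs(llr[i - 1]))
    return ranked[:gamma]


if __name__ == "__main__":
    info = {8, 10, 11, 12, 13, 14, 15, 16}  # assumed information set, N = 16
    mask = [(i + 1) in info for i in range(16)]
    print(critical_set(mask))
```

The static set can be computed once per code, whereas the dynamic set has to be rebuilt from fresh LLRs, which is the sorting cost mentioned in the text.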
Algorithm 1. BP flipping algorithm.

The bits needing to be flipped by the BP flipping algorithms are listed in Table 1.
The BPF algorithm has a significantly higher number of flipping bits than the GBPF algorithm. The BLER performance and the average number of iterations of the existing algorithms are shown in Figure 4. The BLER performance of the BPF-Ω and GBPF-Ω algorithms is competitive. However, the average number of iterations of the BPF and GBPF algorithms increases exponentially with Ω. The average number of iterations of the BPF-2 algorithm is more than two thousand at Eb/N0 = 1. Therefore, we propose a step-flipping strategy in Section 3 to reduce the average number of iterations.

Table 1. Statistics for the flipping bits. Columns: Algorithm; One-Bit-Flipping Digits; Two-Bit-Flipping Digits.
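The generic one-bit flipping loop shared by the BPF-1 and GBPF-1 decoders can be sketched as below. This is a hedged outline of the control flow only, not a reproduction of Algorithm 1: `bp_decode` and `crc_ok` are hypothetical callbacks standing in for a full BP decoder (optionally with one bit flipped) and the CRC check.

```python
def bp_flip_1(bp_decode, crc_ok, flip_set):
    """One-bit BP flipping: try plain BP first, then one flip per FS entry.

    bp_decode(pos) -- hypothetical callback; runs BP decoding with the bit at
                      position `pos` flipped (pos=None means no flip) and
                      returns the hard-decision estimate.
    crc_ok(u_hat)  -- hypothetical callback; True when the CRC is satisfied.
    Returns (estimate, number_of_decoding_attempts).
    """
    u_hat = bp_decode(None)
    attempts = 1
    if crc_ok(u_hat):
        return u_hat, attempts
    for pos in flip_set:
        candidate = bp_decode(pos)
        attempts += 1
        if crc_ok(candidate):
            return candidate, attempts
    return u_hat, attempts  # every flip failed; keep the original estimate


if __name__ == "__main__":
    truth = [1, 0, 1, 1]

    def toy_decode(pos):
        # Toy stand-in: plain decoding gets bit 2 wrong; a flip toggles one bit.
        est = [1, 0, 0, 1]
        if pos is not None:
            est[pos] ^= 1
        return est

    print(bp_flip_1(toy_decode, lambda u: u == truth, [0, 2, 3]))
```

Every entry of the flip set that does not pass the CRC costs one full BP decoding attempt, which is why the number of useless flips drives the average number of iterations.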

The Proposed BPSF Algorithm
In this section, we first analyze the behavior of the CS. Then, the BPSF algorithm is proposed to reduce the average number of iterations. Finally, the threshold factor is applied to the BPSF algorithm to lower the BLER, and the pseudocode of the two-bit-flipping algorithm is presented.

Analysis of Critical Set
According to Section 2.3, the size of the critical set is determined by the Rate-1 nodes. For N = 256, the size of the CS is T = 39. Similarly, for N = 512 and N = 1024, T equals 60 and 116, respectively. Several elements of the CS can correct an error frame when flipped, and each flip counts as one attempt. The BPF-1 and GBPF-1 decoders are analyzed in Figure 5 for N = 256 and N = 1024. The figure illustrates the number of flipping attempts within the CS-T that successfully correct the error frames at 2 dB.
As shown in Figure 5a, nine BPF-1 decoding attempts and three GBPF-1 decoding attempts can correct error frame 10, which is marked with a blue rectangle, whereas the other decoding attempts fail to correct this frame. The successful decoding attempts can be expressed as $r_{\Omega}^{M} = \{r_{\Omega}^{1}, r_{\Omega}^{2}, \cdots, r_{\Omega}^{M}\}$, where Ω is the maximum bit-flipping order and M is the number of successful BP-flipping decoding attempts. When Ω = 1, M is less than or equal to the number of BP-flipping decoding attempts ($C_T^1 = 39$). When Ω = 2, M is less than or equal to the number of BP-flipping decoding attempts ($C_T^1 + C_T^2 = 780$). The decoding attempts are similar when N = 1024, as shown in Figure 5b. The distribution of M is listed in Table 2. In the BPF-1 and GBPF-1 decoders, 87.9% and 79.3% of the error frames, respectively, can be corrected with M ≤ 9; that is, for most error frames the number of successful decoding attempts is no larger than 9. Likewise, 79.1% and 78.3% of the error frames can be corrected with M ≤ 99 in the BPF-2 and GBPF-2 decoders. Only a few BP-flipping decoding attempts can therefore correct the error frames, while the others are not helpful. In addition, the successful BP-flipping decoding attempts differ from one error frame to another. Thus, we propose a method to detect ineffective decoding attempts and step over them to reduce the average number of decoding attempts.
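The upper bounds on the number of flipping attempts quoted above follow directly from binomial coefficients. The small helper below (an illustrative name, not from the paper) reproduces them with the standard-library `math.comb`.

```python
from math import comb


def max_flip_attempts(t, omega):
    """Number of distinct flips of 1..omega bits chosen from a flip set of size t."""
    return sum(comb(t, k) for k in range(1, omega + 1))


if __name__ == "__main__":
    print(max_flip_attempts(39, 1))  # one-bit flipping, T = 39
    print(max_flip_attempts(39, 2))  # one- and two-bit flipping, T = 39
```

For T = 39 this gives 39 attempts for Ω = 1 and 39 + 741 = 780 for Ω = 2, matching the figures in the text; the same formula gives 6786 for T = 116 and Ω = 2, which shows how quickly the search space grows with the flipping order.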

Proposed BPSF Algorithm
A flipping set FS-T can be expressed as $\varepsilon_T = \{\omega_1, \cdots, \omega_T\}$, where $\omega_q$ (1 ≤ q ≤ T) indicates the q-th flipping position, and $\hat{u}[\omega_q]_j$ denotes the hard-decision estimate of the bit $u_j$ after flipping the $\omega_q$-th bit. Let $P(\omega_q)$ be the probability that flipping $\omega_q$ corrects the trajectory of BP decoding. Experiments show that only a few bits in the FS-T can correct the error frames, and flipping the rest does not facilitate successful decoding. Thus, this paper proposes a stepping scheme that steps over the flipping bits with low $P(\omega_q)$. Let $L^{(\omega_q)}_{j,0}$ denote the LLR of the bit $u_j$ after flipping the $\omega_q$-th bit. The LLR magnitude can be used intuitively as a reliability metric [21]: the smaller the magnitude of $L^{(\omega_q)}_{\omega_q,0}$, the less reliable the bit is considered and the higher its $P(\omega_q)$ is deemed to be. Thus, if the magnitude of $L^{(\omega_q)}_{\omega_q,0}$ is smaller than that of $L^{(\omega_q)}_{\omega_{q+1},0}$, $P(\omega_q)$ is deemed higher than $P(\omega_{q+1})$, and there is no need to perform a useless flip of the reliable bit $\omega_{q+1}$. Let $\rho = \{\rho_1, \rho_2, \cdots, \rho_T\}$ denote the indices of the FS. Assume that one decoding trial flipping $\omega_{\rho_i}$ has failed and that the new LLRs have been obtained from it. The stepping decision is then given by (15). Using the stepping decision, the flipping sequence of the original BPF is modified, as shown in Figure 6.
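The stepping idea described in this paragraph can be sketched as follows. This is one possible reading of the prose, not a reproduction of (15): after a failed trial, later FS positions whose LLR magnitude exceeds that of the bit just flipped are treated as reliable and stepped over. The function name and the list-based interface are assumptions.

```python
def next_flip_index(magnitudes, current):
    """Index of the next FS entry to flip after a failed flip of `current`.

    magnitudes[k] -- |LLR| of the k-th FS position, measured after the failed
                     trial that flipped position `current` (assumed input).
    Later entries with a larger magnitude than the flipped bit are treated as
    reliable and skipped. Returns None when no candidate remains.
    """
    reference = magnitudes[current]
    for k in range(current + 1, len(magnitudes)):
        if magnitudes[k] <= reference:
            return k
    return None


if __name__ == "__main__":
    mags = [1.0, 3.0, 2.5, 0.4, 5.0]
    print(next_flip_index(mags, 0))  # steps over the two more reliable entries
```

In a full decoder the magnitudes would be refreshed after every failed trial, so the set of skipped positions can change from one attempt to the next.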
Figure 6. Flipping sequences. Original BPF: the flipping position moves through the bits in turn. Stepping strategy: the flipping position skips from one bit to another according to the stepping decision.

The original flipping strategy used in BPF [15] is to flip the bits within CS-T in turn. We propose using the stepping decision in the BPSF algorithm to decide which bits can be skipped. The flipping operation in the proposed algorithm is given by (16), which flips the left message $L_{j,n+1}$ of the rightmost nodes.
Similarly, inspired by the GBPF-Ω algorithm [17,18], we design a generalized bit step-flipping (GBPSF-Ω) algorithm. There are two differences between the GBPF-Ω and GBPSF-Ω. Firstly, GBPSF-Ω flips the left message of the rightmost nodes according to (16). Secondly, the stepping decision is used to step over bits, as shown in Figure 6, after the construction of FS.

Enhanced BPSF Algorithm
The BPSF-Ω algorithm can significantly decrease the average number of iterations, but some flipping bits in the FS that could correct the frame are skipped when their LLR magnitude is small, which causes performance degradation. In Figure 7a,c, the blue line denotes the LLR magnitude of $\omega_j$ (1 ≤ j ≤ M), which cannot correct the error frame after one BP decoding attempt. The purple line denotes the LLR magnitude of $\omega_k$ (j < k ≤ M), the first bit after $\omega_j$ that can correct the error frame. The LLR magnitude gap between $\omega_k$ and $\omega_j$, called β, is shown in Figure 7b,d. It indicates that $\omega_k$ can still be an unreliable bit even if it satisfies (15). Using β, the stepping decision of (15) is modified to (17) so that the reliable bits are skipped more accurately. The enhanced BPSF-Ω algorithm (EBPSF-Ω) and the enhanced GBPSF-Ω algorithm (EGBPSF-Ω) with the threshold β are then developed; both use the stepping decision (17). The value of β can be determined by Monte-Carlo simulation. We detail the EBPSF-2 algorithm in Algorithm 2; the EBPSF-Ω algorithm for Ω > 2 is similar.
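A minimal sketch of how a threshold β could relax the skipping rule is given below. It assumes that a candidate bit is still flipped when its LLR magnitude exceeds that of the last failed bit by no more than β; this is an interpretation of the prose, and the exact form of (17) is not reproduced. The function name is illustrative.

```python
def should_flip(mag_failed, mag_candidate, beta=0.0):
    """Decide whether a candidate FS bit is flipped or stepped over.

    mag_failed    -- |LLR| of the bit whose flip has just failed
    mag_candidate -- |LLR| of the next candidate bit
    beta          -- tolerance on the magnitude gap (assumed form of the
                     threshold; beta = 0 gives the plain stepping rule)
    """
    return mag_candidate <= mag_failed + beta


if __name__ == "__main__":
    # A candidate only slightly more reliable than the failed bit:
    print(should_flip(1.0, 1.3))            # stepped over without a threshold
    print(should_flip(1.0, 1.3, beta=0.5))  # kept as a flipping candidate
```

A larger β keeps more candidates, which is consistent with the trade-off reported in the numerical results: the BLER improves while the average number of iterations grows.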

Numerical Results
In this section, we compare the proposed step-flipping algorithm and the existing flipping algorithm in terms of β, the average number of iterations, and the BLER for different code lengths. Simulations are performed over an additive white Gaussian noise (AWGN) channel with binary phase-shift keying (BPSK) modulation. The additional simulation parameters are listed in Table 4. In the simulations, the (256, 128), (512, 256), and (1024, 512) polar codes are concatenated with a 24-bit CRC and the (64, 32) polar code with an 11-bit CRC. The m CRC bits are attached to the information block, where m is the CRC remainder length, and all K bits are sent into the error-correcting encoder.

Table 4. Additional simulation parameters.
CRC polynomial [22], 11-bit: $g(x) = x^{11} + x^{10} + x^{9} + x^{5} + 1$
CRC polynomial [22], 24-bit: $g(x) = x^{24} + x^{23} + x^{18} + x^{17} + x^{14} + x^{11} + x^{10} + x^{7} + x^{6} + x^{5} + x^{4} + x^{3} + x + 1$
Horizontal coordinate: Eb/N0
Design SNR of construction: 1 dB
Efficient information bits: K − m
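The CRC attachment step can be illustrated with a bitwise polynomial division over GF(2), using the 11-bit generator polynomial listed among the simulation parameters. The all-zero initial register, the most-significant-bit-first ordering, and the function names are assumptions of this sketch rather than details confirmed by the text.

```python
def poly_bits(exponents):
    """Coefficient list of a GF(2) polynomial, highest degree first."""
    degree = max(exponents)
    return [1 if (degree - k) in exponents else 0 for k in range(degree + 1)]


CRC11 = poly_bits({11, 10, 9, 5, 0})  # x^11 + x^10 + x^9 + x^5 + 1


def crc_remainder(bits, poly):
    """Remainder of bits(x) * x^m divided by poly(x), with m = deg(poly)."""
    m = len(poly) - 1
    reg = list(bits) + [0] * m
    for i in range(len(bits)):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[-m:]


def attach_crc(data, poly):
    """Append the m CRC bits to the data block."""
    return list(data) + crc_remainder(data, poly)


def crc_ok(block, poly):
    """Check a block produced by attach_crc by recomputing its CRC."""
    m = len(poly) - 1
    return crc_remainder(block[:-m], poly) == block[-m:]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
    block = attach_crc(data, CRC11)
    print(len(block), crc_ok(block, CRC11))
```

In a flipping decoder, a check of this kind is what decides after each attempt whether decoding can stop, so its cost is paid once per flipping attempt.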

Analysis of the Threshold β
To verify the effectiveness of the threshold β, the BLERs of the BPSF-1, EBPSF-1, GBPSF-1, and EGBPSF-1 algorithms with different values of β are compared in Figure 8. The GBPF-1 and BPF-1 algorithms are provided as references. With the assistance of β, the EBPSF-1 algorithm outperforms the BPSF-1 algorithm in BLER under the same parameters, both for N = 256 in Figure 8a and for N = 1024 in Figure 8b. For N = 256 and T = 128, the EBPSF-1 algorithm with β = 0.5 and β = 1 achieves gains of 0.09 dB and 0.13 dB, respectively, over the BPSF-1 algorithm at BLER = $1 \times 10^{-2}$. Furthermore, for N = 1024 and T = 116, the EBPSF-1 algorithm with β = 1 and β = 10 obtains gains of 0.04 dB and 0.23 dB, respectively, over the BPSF-1 algorithm at BLER = $10^{-3}$. Therefore, the EBPSF-1 algorithm locates the reliable bits in the CS-T more accurately. As β increases, the BLER performance of the EBPSF algorithm continues to improve, but the average number of iterations also increases. Choosing a proper β is therefore essential: it optimizes the BLER performance while incurring only a negligible increase in the average number of iterations.

Analysis of the Average Number of Iterations
The stepping strategy is used to skip some bits in the FS. Thus, the number of flips is smaller than in the original flipping algorithm without the stepping strategy. To verify this point, simulations are performed. Figures 9 and 10 compare the average number of iterations of the BPF-Ω, GBPF-Ω, EBPSF-Ω, and EGBPSF-Ω algorithms. The average number of iterations of the EBPSF-Ω algorithm is lower than that of the BPF-Ω algorithm, and that of the EGBPSF-Ω algorithm is lower than that of the GBPF-Ω algorithm. This is because the step-flipping strategy applied in the EBPSF-Ω and EGBPSF-Ω algorithms reduces the number of flipping attempts by skipping the reliable flipping bits.
For two-bit flipping, the average number of iterations is shown in Figure 10. Under the same T, the average number of iterations of the EBPSF-2 algorithm is significantly lower than that of the BPF-2 and GBPF-2 algorithms. At 0 dB and T = 12, the average number of iterations of the EBPSF-2 algorithm is 40.54% and 47.62% lower than that of the BPF-2 and GBPF-2 algorithms, respectively, as shown in Figure 10a. In Figure 10b, with T = 128 and T = 256, the EBPSF-2 algorithm reduces the average number of iterations by 77.4% and 95.9%, respectively, at 1.5 dB compared with the BPF-2 algorithm. Consequently, the proposed step-flipping strategy is effective in reducing the average number of iterations for both the one-bit and the multi-bit BP flipping algorithms.

Analysis of the BLER Performance
Using the stepping strategy, some bits in the FS are skipped during flipping with negligible performance loss. The BLER performance of the EBPSF-Ω and EGBPSF-Ω algorithms is depicted in Figures 11 and 12. The algorithms that apply the proposed step-flipping strategy perform better in terms of BLER. Unlike the BPF-Ω and GBPF-Ω algorithms, which flip the right information of the leftmost nodes in the DFG, we flip the left information of the rightmost nodes by (16). Compared with CA-SCL decoding, the EBPSF-Ω and EGBPSF-Ω algorithms achieve comparable decoding performance.
When T = 39, Figure 11a indicates that the EBPSF-1 (β = 1) algorithm achieves a BLER similar to that of the GBPF-1 algorithm. Additionally, the EBPSF-1 algorithm has a 0.23 dB gain over the OA-BP decoder at BLER = $4 \times 10^{-3}$ when T = 256. The BPF-Ω and GBPF-Ω algorithms flip the right information of the leftmost nodes in the DFG; the proposed algorithm instead flips the left information of the rightmost nodes, and the EBPSF algorithm obtains a gain over the OA-BP decoder. With the same parameters, the EBPSF-1 (T = 256) algorithm outperforms the CA-SCL (L = 4) decoder by 0.13 dB at BLER = $1 \times 10^{-2}$. The one-bit flipping BLER performance for N = 512 is depicted in Figure 11b. When T = 60 and β = 0.5, the EBPSF-1 algorithm shows a gain of 0.09 dB over the BPF-1 algorithm at BLER = $2 \times 10^{-3}$. At T = 60, the BLER of the EGBPSF-1 (Γ = 60, β = 0.5) algorithm approaches that of the CA-SCL (L = 2) decoder.

Conclusions
To reduce the average number of iterations of the BP flipping algorithm, this paper proposes a BP flipping algorithm with a stepping strategy. To narrow the search space, the stepping strategy skips the reliable bits, which improves the accuracy with which flipping bits are selected. The LLR magnitudes are used to determine whether bits are reliable; the reliable bits in the FS are skipped and the unreliable bits are flipped, which reduces the average number of iterations. Furthermore, to make the algorithm more robust, we introduce the threshold β to reduce the BLER. Simulation results show that the proposed algorithm with one-bit flipping and two-bit flipping can achieve the BLER of the CA-SCL decoder with list sizes 4 and 8, respectively. Compared with the BPF and GBPF decoders, the EBPSF-Ω algorithm significantly reduces the average number of iterations.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
