1. Introduction
Polar codes are the first family of error correcting codes that provably achieve the capacity for symmetric binaryinput discrete memoryless channels (BDMCs) with low complexity successive cancellation (SC) decoding [
1]. Arikan introduced the channel polarization that splits the capacities of information channels to one or zero as block length becomes longer [
1]. As a dual phenomenon of the channel polarization, the conditional entropies in source coding context are also polarized when the joint entropy is broken down by the chain rule (this phenomenon is called the “source polarization”) [
2]. The source coding using polar codes based on the source polarization is also proven to be optimal for both lossless [
3] and lossy source coding [
4]. In addition, polar codes have been extended to multiterminal problems such as multiple access and broadcast channels in channel coding [
5,
6,
7,
8], and Slepian–Wolf and Gelfand–Pinsker problems in source coding [
9,
10,
11].
From the channel coding perspective, even though the SC decoding of polar codes achieves the capacity, it was not as competitive in finite length performance as lowdensity paritycheck (LDPC) codes and turbo codes [
12,
13]. However, when both the concatenation of polar codes with cyclic redundancy check (CRC) codes and the SC list (SCL) decoding that keeps the best multiple decoding paths are used, the finitelength polar coding performance becomes comparable to the LDPC coding [
12]. Furthermore, a paritycheck concatenation was proposed to improve the performance of polar codes based on CRCaided SCL decoding [
14]. Recently, thanks to their excellent error performance and fine rate flexibility [
15], polar codes have been adopted as a channel code for the control channel of 5G new radio of the 3rd generation partnership project [
16].
In order for polar codes to show comparable error performance to the LDPC codes in moderate block lengths, the SCL decoding requires a rather large list size, e.g.,
$L=32$. The list size
L determines the maximum number of decoding paths and affects the complexity of loglikelihood ratio (LLR) calculation and sorting of path metrics (PMs). It is a linear factor in the decoding complexity. According to [
17,
18], even for a list size smaller than 10, the SCL decoder has larger complexity than the LDPC decoder, which leads to a considerable gap in power consumption between those decoders. Therefore, a low complexity realization of the SCL decoding is important from a practical perspective, and some recent research has reduced the decoding complexity by efficiently managing decoding paths. In [
19], decoding paths do not split when one of two childpaths is dominant. In addition, the decoding paths that have low reliability are eliminated during decoding in [
20]. The operational decoding complexity of those schemes was lowered by the reduced average number of decoding paths.
Lately, some studies have used multiple CRC codes to address different complexity issues of the decoding [
21,
22,
23]. The scheme in [
21] aims to reduce the memory complexity by selecting only the most likely path after each CRC operation where more than one CRC is applied. In [
22], the complexity reduction due to CRCbased early stopping of decoding was reported. Finally, the method in [
23] reduces the worst decoding latency and space complexity for storing intermediate LLRs.
This paper addresses polar coding with multiple CRC codes similarly to [
21,
22,
23], but we focus more on optimization of the scheme in terms of decoding complexity. First, we define the operational complexity, and then minimize it by optimizing CRC positions. In addition, the optimization of CRC positions is extended to a modified decoding that uses a complexity reduction trick called “instant decision” (first proposed in [
24]). It is shown that the optimized CRC placement considerably lowers the decoding complexity of the multiCRC scheme in a wide range of signaltonoise ratio (SNR).
The rest of this paper is organized as follows. In
Section 2, the SCL decoding and its operational complexity are introduced, and in
Section 3, the encoding and decoding methods for the multiCRC scheme, and the optimization algorithm for CRC positions are explained.
Section 4 provides the performance and complexity of the proposed and conventional schemes, and
Section 5 concludes this paper.
3. Low Complexity SCL Decoding with Multiple CRC Codes
As the list size
L increases, the performance of the SCL decoding is improved, but the operational complexity also increases. Actually, in the LLRbased SCL decoding under consideration, the total operational complexity is the sum of
${C}_{S}^{i}$ as in Equation (
5), where each contributing term at
${C}_{S}^{i}$ is linearly or slightly superlinearly related with
${l}_{i}$. Therefore, the goal of this work is to reduce the operational complexity of the SCL decoding by managing the temporary list size lower by the aid of multiple CRC codes.
Previous research utilized multiple CRC codes to reduce the memory and time complexity of the corresponding SCL decoding [
21,
23]. Although multiple CRC codes were used to reduce the operational complexity in [
22], in the complexity evaluation, the temporary list size was not counted and only uniform (equallength) message partitioning was considered.
However, we consider the operational complexity reduction via nonuniform message partitioning and temporary list size management. If we can keep the temporary list size small, the overall complexity can be reduced. In this paper, in order to minimize the complexity effectively, the positions of CRC codes are optimized and a modified SCL decoding is employed.
3.1. Encoding and Decoding of the JTuple CRC Scheme
Let us consider a precoding of a partitioned message block with multiple CRC codes, followed by polar encoding. Before the polar encoding, the message vector
$\mathbf{m}$ is partitioned into
J subblocks
$({\mathbf{m}}_{1},{\mathbf{m}}_{2},...,{\mathbf{m}}_{J})$, and they are encoded serially with
Jtuple CRC codes as shown in
Figure 3. A CRC encoding is carried out recursively to the concatenation of a CRC encoded block and a message subblock. CRC code lengths are defined by
$({p}_{1},{p}_{2},...,{p}_{J})$, and the CRC parities are denoted by
$({\mathbf{r}}_{1},{\mathbf{r}}_{2},...,{\mathbf{r}}_{J})$. Through this nested encoding structure, undetected error by previous CRCs can be detected by following CRCs. The length of
jth message subblock is denoted by
${k}_{j}$ and the total sum of
${k}_{j}$ is
K. Likewise, the total sum of
${p}_{j}$ is
P. The CRC encoded message is encoded by a polar code. This scheme is termed the “
Jtuple CRC scheme”. In this scheme, the lengths of
J message subblocks can be determined by optimization of CRC positions that will be explained in
Section 3.2.
We introduce a decoding scheme for the Jtuple CRC scheme that is described in Algorithm 2. The temporary list size update is included in Algorithm 2 for a better presentation of the decoding procedure. The decoding is based on the conventional SCL decoding. However, the decoding is different in two aspects. First, $J1$ CRC operations are additionally conducted to discard incorrect paths (lines 12–13 in Algorithm 2). Thus, due to the added $J1$ CRCs, the temporary list size vector $\mathbf{l}=({l}_{1},{l}_{2},...,{l}_{N})$ varies depending on the decoding instance. Note, however, $\mathbf{l}$ of the SCRC scheme is invariant because decoding path validity is checked only once at the bottom of the decoding tree. If any of the CRCs fail in all decoding paths during the decoding, then the decoding is terminated and a block error is declared. This event is called “early stopping” that is rare at a high SNR but frequent at a low SNR. If the decoder finds paths with valid CRC, all such paths proceed to the next step. Note that we do not assume that best effort decoding is required.
Algorithm 2 The IDSCL decoding for the Jtuple CRC scheme 
Require: Received vector, $\mathcal{A}$, ${\mathcal{A}}_{g}$, $\mathbf{k}$, $\mathbf{p}$ Ensure: Estimated codeword
 1:
${l}_{1}\leftarrow 1$;  2:
for $i=1:N$ do  3:
if $i\in \mathcal{A}$ then  4:
if $i\in {\mathcal{A}}_{g}$ then  5:
SCL decoding of ${u}_{i}$ maintaining number of survival paths and ${l}_{i+1}\leftarrow {l}_{i}$;  6:
else  7:
SCL decoding of ${u}_{i}$ with list size of $\mathrm{min}(2{l}_{i},L)$ and ${l}_{i+1}\leftarrow \mathrm{min}(2{l}_{i},L)$;  8:
end if  9:
else  10:
Frozen bit decoding and ${l}_{i+1}\leftarrow {l}_{i}$;  11:
end if  12:
if i indicates the last parity bit of the jth CRC then  13:
Discard jthCRCinvalid paths and update ${l}_{i+1}$;  14:
end if  15:
end for  16:
return ML codeword that passes the Jth CRC;

At each CRC operation, only valid paths—from a CRC perspective—survive so that the temporary list size is spontaneously reduced to one or just a few. The temporary list grows as the decoding proceeds, but if frozen bits follow consecutively, then the growth can progress slowly. This sporadic reduction of temporary list due to multiple CRC operations induces nonnegligible complexity reduction.
Second, we consider a simple complexity reduction criterion for list management that was first proposed in [
24] for reduction of sorting complexity. When a highly reliable bit is processed, the list is forced not to grow (lines 4–5 in Algorithm 2) and the sorting operation is skipped. Let
${\mathcal{A}}_{g}$ be the set of indices where the capacity of corresponding split channels exceeds a certain threshold. (
${\mathcal{A}}_{g}$ may alternatively be defined in terms of bit error rate.(period position)) When
${u}_{i}$ is processed, if
$i\in {\mathcal{A}}_{g}$, we run the SC decoding instead of SCL decoding. The SC decoding decides the value of
${u}_{i}$ so that the temporary list size is maintained as
${l}_{i+1}={l}_{i}$ even if
${l}_{i}<L$. Because
${u}_{i}$ for
$i\in {\mathcal{A}}_{g}$ is highly reliable, the correct path may not be ruled out by the value decision of
${u}_{i}$. The list decoding with this criterion is termed the “instantdecisionaided SCL (IDSCL) decoding”. (Here, we view the conventional SCL decoding operation as a delayed decision.)
In fact, the instant decision significantly reduces the complexity even though multiple CRC codes are not applied because it skips sorting operations that account for a large portion of the total complexity. In addition, the instant decision also has an effect of maintaining the temporary list size lower. Therefore, it is expected that the multiCRC scheme including its optimization can also reduce the complexity under the IDSCL decoding.
In the next subsection, the method of how to optimize CRC positions and determine ${\mathcal{A}}_{g}$ is explained based on the aforementioned encoding and decoding methods. Note, however, that the Jtuple CRC scheme can be decoded with or without the instant bit decision. Obviously, CRC positions can be also optimized without the instant decision. The decoding without the instant decision is called the basic SCL decoding for the Jtuple CRC scheme.
3.2. Optimization of JTuple CRC Positions with Respect to Complexity
We can optimize the positions of CRCs to minimize the operational decoding complexity of the Jtuple CRC scheme. However, as mentioned above, since the decoding complexity of the Jtuple CRC scheme is variable depending on the decoding instance, we perform the optimization considering the average complexity based on the assumption that errors occur randomly in wrong paths. In order to simplify the problem, we input predetermined lengths of the Jtuple CRC codes $({p}_{1},{p}_{2},...,{p}_{J})$ into the optimization. Through decoding experiments, we determine $({p}_{1},{p}_{2},...,{p}_{J})$ empirically with regard to performance and complexity such that $\sum {p}_{j}=P$, where P is the CRC length of the compared SCRC scheme. In addition, because the minimum distance of the CRC concatenated polar code is affected by the length of the last CRC, we allocate more bits to the last CRC code than others.
Now let us first consider optimization of the positions of
J CRCs under the basic SCL decoding. Note that the CRC positions are determined by the lengths of message subblocks. As mentioned, the basic SCL decoding for the
Jtuple CRC scheme follows Algorithm 2 without instant bit decision. Therefore, for the
Jtuple CRC scheme with a predetermined
$\mathbf{p}$, the optimization follows as:
where
$\mathbb{Z}$ is the set of integers. The average temporary list
$\overline{\mathbf{l}}$ is obtained by Algorithm 3 with the input
$\mathcal{A}$,
$\mathbf{p}$ and
$\mathbf{k}$. Algorithm 3 determines the average temporary list size
${\overline{l}}_{i}$ by
$f(\xb7)$ first, and when the
jth CRC is possible, reduces the
${\overline{l}}_{i}$ according to the function of
${p}_{j}$ as shown at line 5. Here, we assume that channel quality is high enough so that the correct path always exists in the list, and many errors randomly occur in wrong paths. Then,
${\overline{l}}_{i}1$ paths survive with probability of
$1/{2}^{{p}_{j}}$ by the
jth CRC. Through the Monte Carlo simulation introduced in
Section 4, we confirmed that the estimated complexity based on this assumption is very close to the experimental one. The average sorting complexity vector
$\overline{\mathbf{s}}=({\overline{s}}_{1},{\overline{s}}_{2},...,{\overline{s}}_{N})$ is generated by Equation (
3) with
$\overline{\mathbf{l}}$ obtained by Algorithm 3. Note that the optimization is appropriate for a small
J because it is a joint optimization of
J variables. In order to avoid high complexity search for a large
J, one can apply a greedy search that determines CRC positions, one after another.
For a lower complexity implementation, the IDSCL decoding is combined with the optimization of the
Jtuple CRC positions of Equation (
6). As mentioned in
Section 3.1, the IDSCL decoding instantly decides the information bits
$\in {\mathcal{A}}_{g}$, maintaining the temporary list size. Assume the good information set
${\mathcal{A}}_{g}$ is defined by
${\mathcal{A}}_{g}=\{i:{P}_{e}^{i}\le {\sigma}_{\mathrm{Th}}\}\subset \mathcal{A}$, where
${P}_{e}^{i}$ is the bit error probability of the
ith split channel, and
${\sigma}_{\mathrm{Th}}$ is a certain threshold. Let us consider the effect of the IDSCL decoding on the CRC position optimization. For example, assume there exists a burst of good split channels defined by
${\mathcal{A}}_{g}$, and a CRC is located before the burst of good split channels. Then, the CRC reduces the number of decoding paths, and the number of survival paths remains until the end of the burst because of the IDSCL decoding. Eventually, the interworking of the IDSCL decoding and CRC position optimization will give obvious complexity reduction. A more detailed example about the temporary list size is presented in
Section 4.
Algorithm 3 Calculation of ${\overline{l}}_{i}$’s for the basic SCL decoding of the Jtuple CRC scheme 
Require: $\mathcal{A}$, $\mathbf{p}$, $\mathbf{k}$ Ensure: $\overline{\mathbf{l}}=\left(\right)open="("\; close=")">{\overline{l}}_{1},{\overline{l}}_{2},\cdots ,{\overline{l}}_{N}$
 1:
${\overline{l}}_{1}\leftarrow 1$  2:
for $i=1:N1$ do  3:
${\overline{l}}_{i+1}\leftarrow f(\mathcal{A},{\overline{l}}_{i},i)$;  4:
if i indicates the last parity bit of the jth CRC then  5:
${\overline{l}}_{i+1}\leftarrow 1+({\overline{l}}_{i+1}1)/{2}^{{p}_{j}}$;  6:
end if  7:
end for  8:
return $\overline{\mathbf{l}}$;

When we employ the IDSCL decoding, the average temporary list size
${\overline{l}}_{i}$ is computed by
$g(\xb7)$ of Algorithm 4 instead of
$f(\xb7)$. Similarly, let
${\overline{l}}_{1}\triangleq 1$ in Algorithm 4. Entire
$\overline{\mathbf{l}}$ is obtained by the modified version of Algorithm 3 in which
$f(\xb7)$ at line 3 is replaced with
g(·), and the input of Algorithm 3 is changed to
$\mathbf{p}$,
$\mathbf{k}$,
$\mathcal{A}$ and
${\mathcal{A}}_{g}$. In addition, the average complexity of the sorting operation is modified as
where the bitonic sorter is assumed. Eventually, the optimization of CRC positions combined with the IDSCL decoding is performed by Equation (
6) with new
$\overline{\mathbf{l}}$ and
$\overline{\mathbf{s}}$ calculated with Algorithm 4 and Equation (
7).
Algorithm 4 Calculation of ${\overline{l}}_{i+1}$ with ${\mathcal{A}}_{g}$ for the IDSCL decoding, g($\mathcal{A}$,${\mathcal{A}}_{g}$,${\overline{l}}_{i}$,i) 
Require: $\mathcal{A}$, ${\mathcal{A}}_{g}$, ${\overline{l}}_{i}$, i Ensure: ${\overline{l}}_{i+1}$
 1:
if $i\in (\mathcal{A}\setminus {\mathcal{A}}_{g})$ then  2:
if $2{\overline{l}}_{i}\le L$ then  3:
${\overline{l}}_{i+1}\leftarrow 2{\overline{l}}_{i}$;  4:
else  5:
${\overline{l}}_{i+1}\leftarrow L$;  6:
end if  7:
else  8:
${\overline{l}}_{i+1}\leftarrow {\overline{l}}_{i}$;  9:
end if  10:
return ${\overline{l}}_{i+1}$;

In the next section, we compare the proposed and conventional schemes with mixed criteria of complexity and performance. That is, we assess the complexity reduction with only negligible performance degradation.
4. Simulation Results and Analysis
This section provides performance and complexity of the proposed and conventional schemes. In this paper, a Monte Carlo simulation has been carried out to evaluate the decoding performance and complexity. The block lengths for simulation are 1024, 4096 and only rate onehalf is considered. We use a binaryinput additive white Gaussian noise channel. The number of total CRC parities
$P=16$ and the list size
$L=32$ for all schemes. For
Jtuple CRC schemes,
$J=2,3$ are considered, and
$\mathbf{p}=(8,8),(4,4,8)$ are used, respectively. All codes are optimized at
${E}_{s}/{N}_{0}=1\phantom{\rule{0.222222em}{0ex}}\mathrm{dB}$ by the density evolution [
28]. As the conventional schemes, the SCRC scheme and the
Jtuple CRC scheme that has
J uniform message subblocks [
22] are evaluated for comparison. The former is abbreviated as “SCRC” and the latter as “UJCRC”. The
Jtuple CRC scheme is optimized in terms of the basic SCL decoding or the IDSCL decoding. They are all abbreviated as “OJCRC”. For all the schemes, simulation is performed under the basic and the IDSCL decoding.
Table 1 shows the message vector
$\mathbf{k}$ of OJCRC and UJCRC. In general, the optimal
$\mathbf{k}$,
${\mathbf{k}}^{\ast}$, differs depending on whether the IDSCL decoding is applied or not. The threshold
${\sigma}_{\mathrm{Th}}$ for the IDSCL decoding also differs according to the block length and
J because it is determined empirically to minimize the performance loss, and maximize the complexity reduction for the SCRC scheme. The ratio
${\mathcal{A}}_{g}/\mathcal{A}$ is about 0.746 and 0.806 for
$N=1024,4096$, respectively.
The comparison of the average temporary list sizes between the proposed and conventional 3CRC schemes for
$N=1024$,
$L=32$ and
$P=16$ is shown in
Figure 4.
Figure 4a is under the basic SCL decoding, and
Figure 4b under the IDSCL decoding. For SCRC in
Figure 4a, the temporary list size slowly increases up to
L until the bit index reaches about 220. In the case of U3CRC, the average temporary list size decreases from
L to
$1+31/\left({2}^{4}\right)=2.94$ two times due to use of two 4bit CRC codes. Its temporary list size, however, quickly increases to
L again. For the proposed scheme, the temporary list size of O3CRC is also dropped two times, but the list size is maintained small for a burst of consecutive frozen bits. In
Figure 4b, the trend of the temporary list size is similar to that of
Figure 4a. However, for O3CRC, the list size stays small longer due to instant decisions of highly reliable bits on top of frozen operations.
Table 2 and
Table 3 show the complexity ratio of the
Jtuple CRC schemes to the SCRC scheme with the basic SCL and the IDSCL decoding. The complexity ratios are estimated by the analysis in
Section 3, and the estimation is validated by simulation results. Here, O2CRC and O3CRC schemes are optimized for serial and parallel sorters separately. As mentioned, the analysis is done under the assumption that there is no early stopping of decoding that indicates a detected codeword error. This assumption is consistent with a high SNR environment. Thus, the experimental complexity is measured at the high SNR, e.g.,
${E}_{S}/{N}_{0}=1.0,1.5$ dB for
$N=1024,4096$, respectively, by averaging the number of additionequivalent operations. The complexity ratio from the experiment is in the parentheses of
Table 2 and
Table 3. It is observed that the experimental values are very close to the estimated ones. Note that the optimization of
JCRC scheme is performed based on the estimated complexity and the estimation turned out to be pretty accurate. By the way, a large complexity reduction of the IDSCL decoding in
Table 3 is obtained due to the skipping of sorting operations.
Figure 5,
Figure 6 and
Figure 7 give the performance and complexity comparison for the optimized 2CRC and 3CRC schemes in a wide range of SNR. The complexity ratio includes the reduction from the early stopping in the low SNR region, and it is calculated based on the complexity of SCRC (under the basic SCL decoding). Only the parallel bitonic sorting is assumed in the scheme optimization and complexity evaluation. The average complexity is obtained via simulation.
Figure 5 and
Figure 6 show the performance and complexity ratio of the proposed and conventional 2CRC schemes. In
Figure 5a and
Figure 6a, the negligible performance gap between the proposed and conventional schemes is observed, and the error performance loss is within 0.05 dB. However,
Figure 5b and
Figure 6b display obvious visible difference in complexity. Especially, when the IDSCL decoding is applied to the SCRC scheme, about 40% of the complexity is reduced. For
$N=1024$, the complexity reduction of O2CRC under the basic SCL decoding is about 7%
points compared to U2CRC at
${E}_{S}/{N}_{0}=1.0$ dB, but when the IDSCL decoding is used, about 4%
points are reduced compared to “U2CRC with ID”. In addition, at the low SNR, more complexity reduction is acquired thanks to the early stopping. For
$N=4096$, the complexity reduction of O2CRC under the basic SCL decoding is about 22%
points compared to U2CRC at
${E}_{S}/{N}_{0}=3.0$ dB, and about 18%
points are reduced under the IDSCL decoding compared to “U2CRC with ID”.
Figure 7 depicts the complexity ratio of the proposed and conventional 3CRC schemes. The performances for the 3CRC schemes are not included in this paper because they are virtually the same as for the 2CRC scheme. For the 3CRC scheme, the complexity reduction from the early stopping is smaller than the 2CRC scheme because the CRC of four bits located in the middle of vector
$\mathbf{u}$ can not eliminate most incorrect paths. Especially, if
$L=32$ and a CRC of four bits is used, then
$32/\left({2}^{4}\right)=2$ paths survive on average, which implies a lower probability of early decoding termination than the 2CRC scheme. However, if the IDSCL decoding is applied, there is more complexity reduction at the high SNR. For
$N=1024$, about 12%
points (about 9%
points) are reduced compared to U3CRC (U3CRC with ID), and about 11%
points (about 9%
points) for
$N=4096$ compared to U3CRC (U3CRC with ID).
Note that, for the proposed Jtuple CRC scheme, the complexity reduction can be increased, while keeping the performance level by selecting ${\mathcal{A}}_{g}$ rigorously, not based on a threshold ${\sigma}_{\mathrm{Th}}$. This paper provides a simple and systematic algorithm that minimizes the complexity and performance loss by adjusting the single parameter ${\sigma}_{\mathrm{Th}}$.