Article

Adaptive Learned Belief Propagation for Decoding Error-Correcting Codes

Telecom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(8), 795; https://doi.org/10.3390/e27080795
Submission received: 12 June 2025 / Revised: 16 July 2025 / Accepted: 22 July 2025 / Published: 25 July 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Weighted belief propagation (WBP) for the decoding of linear block codes is considered. In WBP, the Tanner graph of the code is unrolled with respect to the iterations of the belief propagation decoder. Then, weights are assigned to the edges of the resulting recurrent network and optimized offline using a training dataset. The main contribution of this paper is an adaptive WBP where the weights of the decoder are determined for each received word. Two variants of this decoder are investigated. In the parallel WBP decoders, the weights take values in a discrete set. A number of WBP decoders are run in parallel to search for the best sequence of weights in real time. In the two-stage decoder, a small neural network is used to dynamically determine the weights of the WBP decoder for each received word. The proposed adaptive decoders demonstrate significant improvements over their static counterparts in two applications. In the first application, Bose–Chaudhuri–Hocquenghem, polar and quasi-cyclic low-density parity-check (QC-LDPC) codes are used over an additive white Gaussian noise channel. The results indicate that the adaptive WBP achieves bit error rates (BERs) up to an order of magnitude less than the BERs of the static WBP at about the same decoding complexity, depending on the code, its rate, and the signal-to-noise ratio. The second application is a concatenated code designed for a long-haul nonlinear optical fiber channel where the inner code is a QC-LDPC code and the outer code is a spatially coupled LDPC code. In this case, the inner code is decoded using an adaptive WBP, while the outer code is decoded using the sliding window decoder and static belief propagation. The results show that the adaptive WBP provides a coding gain of 0.8 dB compared to the neural normalized min-sum decoder, with about the same computational complexity and decoding latency.

1. Introduction

Neural networks (NNs) have been widely studied to improve communication systems. The ability of NNs to learn from data and model complex relationships makes them indispensable tools for tasks such as equalization, monitoring, modulation classification, and beamforming [1]. While NNs have also been considered for decoding error-correcting codes for quite some time [2,3,4,5,6,7,8,9,10], interest in this area has surged significantly in recent years due to advances in NNs and their widespread commercialization [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46].
Two categories of neural decoders may be considered. In model-agnostic decoders, the NN has a general architecture independent of the conventional decoders in coding theory [12,34,42]. Many of the common architectures have been studied for decoding, including multi-layer perceptrons [5,12,47], convolutional NNs (CNNs) [48], recurrent neural networks (RNNs) [13], autoencoders [19,35,40], convolutional decoders [6], graph NNs [33,43], and transformers [34,41]. These models have been used to decode linear block codes [4,12], Reed–Solomon codes [49], convolutional codes [4,48,50], Bose–Chaudhuri–Hocquenghem (BCH) codes [11,16,18,31], Reed–Muller codes [5,25], turbo codes [13], low-density parity-check (LDPC) codes [37,51], and polar codes [52].
Training neural decoders is challenging because the number of codewords to classify depends exponentially on the number of information bits. Furthermore, the sample complexity of the NN is high for the small bit error rates (BERs) and long block lengths required in some applications. As a consequence, model-agnostic decoders often require a large number of parameters and may overfit, which makes them impractical unless the block length is short.
In model-based neural decoders, the architecture of the NN is based on the structure of a conventional decoder [11,14,25,28,31,53]. An example is weighted belief propagation (WBP), where the messages exchanged across the edges of the Tanner graph of the code are weighted and optimized [11,31,54]. This gives rise to a decoder in the form of a recurrent network obtained by unfolding the update equations of the belief propagation (BP) over the iterations. Since the WBP is a biased model, it has fewer parameters than the model-agnostic NNs at the same accuracy.
Prior work has demonstrated that the WBP outperforms BP for block lengths up to around 1000, particularly with structured codes, low-to-moderate code rates, and high signal-to-noise ratios (SNRs) [17,28,31,37,38,39,54,55,56]. It is believed that the improvement is achieved by altering the log-likelihood ratios (LLRs) that are passed along short cycles. For example, for BCH and LDPC codes with block lengths under 200, WBP provides frame error rate (FER) improvements of up to 0.4 dB in the waterfall region and up to 1.5 dB in the error-floor region [23,57,58]. Protograph-based (PB) QC-LDPC codes have been similarly decoded using the learned weighted min-sum (WMS) decoder [28].
The WBP does not generalize well at low BERs due to the requirement of long block lengths and the resulting high sample and training complexity [44]. For example, in optical fiber communication, the block length can be up to tens of thousands of bits to achieve a BER of $10^{-15}$. In this case, the sample complexity of the WBP is high, and the model does not generalize well when trained with a practically manageable number of examples.
The training complexity and storage requirements of the WBP can be reduced through parameter sharing. Lian et al. introduced a WBP decoder wherein the parameters are shared across or within the layers of the NN [18,39]. A number of parameter-sharing schemes in WBP are studied in [28,39]. Despite intensive research in recent years, WBP remains impractical in most real-world applications.
In this work, we improve the generalization of WBP to enhance its practical applicability. The WBP is a static NN, trained offline based on a dataset. The main contribution of this paper is the proposal of adaptive learned message-passing algorithms, where the weights assigned to messages are determined for each received word. In this case, the decoder is dynamic, changing its parameters for each transmission in real time.
Two variants of this decoder are proposed. In the parallel decoder architecture, the weights take values in a discrete set. A number of WMS decoders are run in parallel to find the best sequence of weights based on the Hamming weight of the syndrome of the received word. In the two-stage decoder, a secondary NN is trained to compute the weights to be used in the primary NN decoder. The secondary NN is a CNN that takes the LLRs of the received word and is optimized offline.
The performance and computational complexity of the static and adaptive decoders are compared in two applications. In the first application, a number of regular and irregular quasi-cyclic low-density parity-check (QC-LDPC) codes, along with a BCH and a polar code, are evaluated over an additive white Gaussian noise (AWGN) channel in both low- and high-rate regimes. The results indicate that the adaptive WMS decoders achieve decoding BERs up to an order of magnitude less than the BERs of the static WMS decoders, at about the same decoding complexity, depending on the code, its rate, and the SNR. The coding gain is 0.32 dB at a BER of $10^{-4}$ in one example.
The second application is coding over a nonlinear optical fiber link with wavelength division multiplexing (WDM). The data rates in today’s optical fiber communication system approach terabits/s per wavelength. Here, the complexity, power consumption, and latency of the decoder are important considerations. We apply concatenated coding by combining a low-complexity short-block-length soft-decision inner code with a long-block-length hard-decision outer code. This approach allows the component codes to have much shorter block lengths and higher BERs than the combined code. As a result, it becomes feasible to train the WBP for decoding the inner code, addressing the curse of dimensionality and sample complexity issues. For PB QC-LDPC inner codes and a spatially coupled (SC) QC-LDPC outer code, the results indicate that the adaptive WBP outperforms the static WBP by 0.8 dB at about the same complexity and decoding latency in a 16-QAM 8 × 80 km 32 GBaud WDM system with five channels.
The remainder of this paper is organized as follows. Section 2 introduces the notation, followed by the channel models in Section 3. In Section 4, we introduce the WBP, and in Section 5, we present two adaptive learned message-passing algorithms. In Section 6, we compare the performance and complexity of the static and adaptive decoders, and in Section 7, we conclude the paper. Appendix A and Appendix B provide supplementary information, and Appendix C presents the parameters of the codes.

2. Notation

Natural, real, complex, and non-negative real numbers are denoted by $\mathbb{N}$, $\mathbb{R}$, $\mathbb{C}$, and $\mathbb{R}^+$, respectively. The set of integers from $m$ to $n$ is written as $[m:n] = \{m, m+1, \ldots, n\}$. The special case $m = 1$ is shortened to $[n] \stackrel{\Delta}{=} [1:n]$. $\lfloor x \rfloor$ and $\lceil x \rceil$ denote, respectively, the floor and ceiling of $x \in \mathbb{R}$. The Galois field GF($q$) with $q \in \mathbb{N}$, $q \geq 2$, is $\mathbb{F}_q$. The set of matrices with $m$ rows, $n$ columns, and elements in $[0:q-1]$ is $\mathbb{F}_q^{m \times n}$.
A sequence of length $n$ is denoted by $x^n = (x_1, x_2, \ldots, x_n)$. Deterministic vectors are denoted by boldface font, e.g., $\mathbf{x} \in \mathbb{R}^n$. The $i$-th entry of $\mathbf{x}$ is $[\mathbf{x}]_i$. Deterministic matrices are shown by upper-case letters in mathrm font, e.g., $\mathsf{A}$.
The probability density function (PDF) of a random variable $X$ is denoted by $\Pr_X(x)$, shortened to $\Pr(x)$ if there is no ambiguity. The conditional PDF of $Y$ given $X$ is $\Pr_{Y|X}(y|x)$. The expected value of a random variable $X$ is denoted by $\mathbb{E}(X)$. The real Gaussian PDF with mean $\theta$ and standard deviation $\sigma$ is denoted by $\mathcal{N}(\theta, \sigma^2)$. The Q-function is $Q(x) = \frac{1}{2}\mathrm{erfc}\left(\frac{x}{\sqrt{2}}\right)$, where $\mathrm{erfc}(x)$ is the complementary error function. The binary entropy function is $H_b(x) = -\left(x\log_2(x) + (1-x)\log_2(1-x)\right)$, $x \in (0,1)$.

3. Channel Models

3.1. AWGN Channel

Encoder: We consider an $(n, k)$ binary linear code $\mathcal{C}$ with the parity-check matrix (PCM) $\mathsf{H} \in \mathbb{F}_2^{m \times n}$, where $n$ is the code length, $k$ is the code dimension, and $m \geq n - k$, $m \in \mathbb{N}$. The rate of the code is $r = k/n \geq 1 - m/n$. A PB QC-LDPC code is characterized by a lifting factor $M \in \mathbb{N}$, a base matrix $\mathsf{B} \in \{0, 1\}^{\lambda \times \omega}$, and an exponent matrix $\mathsf{P} \in \{-1, 0, 1, \ldots, M-1\}^{\lambda \times \omega}$, where $\lambda, \omega \in \mathbb{N}$, $\lambda < \omega$. Given $(\lambda, \omega, M, \mathsf{P})$, the PCM is obtained according to the procedure in Appendix A.
We evaluate a BCH code, seven regular and irregular QC-LDPC codes, and a polar code in the low- and high-rate regimes. These codes are summarized in Table 1 and described in Section 6.1. The parameters of the QC-LDPC codes are given in Appendix C.
The encoder maps a sequence of information bits $\mathbf{b} = (b_1, b_2, \ldots, b_k)$, $b_i \in \{0,1\}$, $i \in [k]$, to a codeword $\mathbf{c} = (c_1, c_2, \ldots, c_n)$ as $\mathbf{c} = \mathbf{b}\mathsf{G}$, where $\mathsf{G} \in \mathbb{F}_2^{k \times n}$ is the generator matrix of the code.
Channel model: The codeword $\mathbf{c}$ is modulated with binary phase shift keying with symbols $\pm A$, $A \in \mathbb{R}$, and transmitted over an AWGN channel. The vector of received symbols is $\mathbf{y} = (y_1, \ldots, y_n)$, where
$$y_i = (-1)^{c_i} A + z_i, \qquad i \in [n], \qquad (1)$$
and $z_i \stackrel{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$. If $\rho$ is the SNR, the channel can be normalized so that $A = 1$ and $\sigma^2 = (2 r \rho)^{-1}$.
The LLR function $\mathbf{L} : \mathbb{R}^n \to \mathbb{R}^n$ of $\mathbf{c}$ conditioned on $\mathbf{y}$ is
$$[\mathbf{L}(\mathbf{y})]_i \stackrel{\Delta}{=} \log\frac{\Pr(c_i = 0 \mid \mathbf{y})}{\Pr(c_i = 1 \mid \mathbf{y})} = \log\frac{\Pr(y_i \mid c_i = 0)}{\Pr(y_i \mid c_i = 1)} \qquad (2)$$
$$= 4 r \rho\, y_i, \qquad (3)$$
for each $i \in [n]$. Equation (2) holds under the assumption that the $c_i$ are independent and uniformly distributed, and (3) is obtained by substituting the Gaussian $\Pr(y_i \mid c_i)$ from (1).
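As a quick illustration of (1)–(3), the following sketch (our own, using numpy; it assumes $\rho$ is given in linear scale rather than dB and a rate-$r$ code) simulates the BPSK-AWGN channel and computes the channel LLRs.

```python
import numpy as np

def awgn_llrs(c, r, rho, rng=np.random.default_rng(0)):
    """Transmit codeword bits c over BPSK/AWGN and return (y, channel LLRs).

    c   : binary codeword (numpy array of 0/1)
    r   : code rate
    rho : SNR per information bit, in linear scale (not dB)
    """
    sigma2 = 1.0 / (2.0 * r * rho)            # noise variance, per the normalization above
    y = (-1.0) ** c + rng.normal(0.0, np.sqrt(sigma2), size=c.shape)   # per (1)
    llr = 4.0 * r * rho * y                   # per (3)
    return y, llr
```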
Decoder: We compare the performance and complexity of the static and adaptive belief propagation. The static decoders are tanh-based BP, the auto-regressive BP and WBP with different levels of parameter sharing, including BP with simple scaling and parameter-adapter networks (SS-PAN) [18]. Additionally, to assess the achievable performance with a large number of parameters in the decoder, we include a comparison with two model-agnostic neural decoders based on transformers [41] and graph NNs [33,43].

3.2. Optical Fiber Channel

In this application, we consider a multi-user fiber-optic transmission system using WDM with N c users, each of bandwidth B 0 Hz, as shown in Figure 1.
Transmitter (TX): A binary source generates a pseudo-random bit sequence of $n_b$ information bits $\mathbf{b}_u = (b_{u,1}, b_{u,2}, \ldots, b_{u,n_b})$, $b_{u,j} \in \{0,1\}$, for WDM channel $u \in [u_1 : u_2]$, $u_1 = -\lfloor N_c/2 \rfloor$, $u_2 = \lceil N_c/2 \rceil - 1$, $j \in [1 : n_b]$. Bit-interleaved coded modulation (BICM) with concatenated coding is applied in each WDM channel independently. The BICM comprises an outer encoder for the code $\mathcal{C}_o(n_o, k_o)$ with rate $r_o$, an inner encoder for $\mathcal{C}_i(n_i, k_i)$ with rate $r_i$, a permuter $\pi$, and a mapper $\mu$, where $n_o/k_i$ is assumed to be an integer. The concatenated code $\mathcal{C} = \mathcal{C}_i \circ \mathcal{C}_o$ has parameters $k = k_o$, $n = n_o/r_i$, and $r_{\mathrm{total}} = r_i r_o$. Each consecutive subsequence of $\mathbf{b}_u$ of length $k_o$ is mapped to $\tilde{\mathbf{c}}_u \in \mathcal{C}_o \subseteq \{0,1\}^{n_o}$ by the outer encoder and subsequently to $\mathbf{c}_u \in \mathcal{C} \subseteq \{0,1\}^{n}$ by the inner encoder. Next, $\mathbf{c}_u$ is mapped to $\bar{\mathbf{c}}_u = \pi(\mathbf{c}_u)$ by a random uniform permuter $\pi : \mathbb{F}_2^n \to \mathbb{F}_2^n$. The mapper $\mu : \mathbb{F}_2^m \to \mathcal{A}$ maps consecutive sub-sequences of $\bar{\mathbf{c}}_u$ of length $m$ to symbols in a constellation $\mathcal{A}$ of size $M = 2^m$. Thus, the BICM maps $\mathbf{b}_u$ to a sequence of symbols $\mathbf{s}_u = (s_{u,l_1}, s_{u,l_1+1}, \ldots, s_{u,l_2})$, where $s_{u,l} \in \mathcal{A}$, $l \in [l_1 : l_2]$, $l_1 = -\lfloor n_s/2 \rfloor$, $l_2 = \lceil n_s/2 \rceil - 1$, and $n_s = n_b/m$.
The symbols s u , l are modulated with a root raised cosine (RRC) pulse shape p ( t ) at symbol rate R s , where t is the time. The resulting electrical signal of each channel x u ( t ) is converted to an optical signal and subsequently multiplexed by a WDM multiplexer. The baseband representation of the transmitted signal is
$$x(t) = \sum_{u=u_1}^{u_2}\sum_{l=l_1}^{l_2} s_{u,l}\, p(t - l T_s)\, e^{j 2\pi u B_0 t}, \qquad (4)$$
where $T_s = 1/R_s$ and the $s_{u,l}$ are i.i.d. random variables. The average power of the transmitted signal is $P$; thus, $\mathbb{E}|s_{u,l}|^2 = P$ for all $u, l$.
Encoder: SC LDPC codes are attractive options for optical communications [59]. These codes approach the capacity of the canonical communication channels [60,61] and have a flexible performance–complexity trade-off. They are decoded with the BP and the sliding window decoder (SWD). Another class of codes in optical communication is the SC product-like codes, braided block codes [62], SC turbo product codes [63], staircase codes [64,65,66] and their generalizations [67,68]. These codes are decoded with iterative, algebraic hard decision algorithms and prioritize low-complexity, hardware-friendly decoding over coding gain.
In this paper, the encoding in BICM combines an inner (binary or non-binary) QC-LDPC code C i with an outer SC QC-LDPC code C o whose component code is a multi-edge QC-LDPC code, as outlined in Table 1. The construction and parameters of the codes are given in Appendix B.1 and Appendix C, respectively.
The choice of the inner code is due to the decoder complexity. Other options have been considered in the literature, for instance, algebraic codes, e.g., the BCH (Section 3.3 [69]) or Reed-Solomon codes (Section 3.4 [69]), or polar codes [70]. However, the QC-LDPC codes are simpler to decode, especially at high rates. The outer code can be an LDPC code [71,72], a staircase code [71,73], or a SC-LDPC code [72].
Fiber-optic link: The channel is an optical fiber link with $N_{sp}$ spans of length $L_{sp}$ of standard single-mode fiber, with the parameters in Table 2.
Let $q(t, z) : \mathbb{R} \times \mathbb{R}^+ \to \mathbb{C}$ be the complex envelope of the signal as a function of time $t$ and distance $z$ along the fiber. The propagation of the signal in one polarization over one span of optical fiber is modeled by the nonlinear Schrödinger equation [74]
$$\frac{\partial q(t,z)}{\partial z} = -\frac{\alpha}{2}\, q(t,z) - j\frac{\beta_2}{2}\frac{\partial^2 q(t,z)}{\partial t^2} + j\gamma\, |q(t,z)|^2\, q(t,z), \qquad (5)$$
where $\alpha$ is the loss constant, $\beta_2$ is the chromatic dispersion coefficient, $\gamma$ is the Kerr nonlinearity parameter, and $j = \sqrt{-1}$. The transmitter is located at $z = 0$ and the receiver at $z = L$. The continuous-time model (5) can be discretized to a discrete-time, discrete-space model using the split-step Fourier method (Section III.B [75]). The optical fiber channel described by the partial differential Equation (5) differs significantly from the AWGN channel due to the presence of nonlinearity.
An erbium-doped fiber amplifier (EDFA) is placed at the end of each span, which compensates for the fiber loss and introduces amplified spontaneous emission noise. The input $x_i(t)$–output $x_o(t)$ relation of the EDFA is given by $x_o(t) = \sqrt{G}\, x_i(t) + n(t)$, where $G = e^{\alpha L_{sp}}$ is the amplifier gain and $n(t)$ is a zero-mean circularly symmetric complex Gaussian noise process with the power spectral density
$$\sigma^2 = \frac{1}{2}(G - 1)\, h\, f_0\, \mathrm{NF},$$
where $\mathrm{NF}$ is the noise figure, $h$ is the Planck constant, and $f_0$ is the carrier frequency at 1550 nm.
Receiver: The advent of coherent detection paved the way for the compensation of transmission effects in optical fiber using digital signal processing (DSP). As a result, the linear effects in the channel, such as chromatic dispersion and polarization-induced impairments, as well as some of the nonlinear effects, can be compensated with DSP.
At the receiver, a demultiplexer filters the signal of each WDM channel. The optical signal for each channel is converted to an electrical signal by a coherent receiver. Next, DSP followed by bit-interleaved coded demodulation (BICD) is applied. The continuous-time electrical signal is converted to the discrete-time signals by analogue-to-digital converters, down-sampled, and passed to a digital signal processing unit for the mitigation of the channel impairments. For equalization, digital back-propagation (DBP) based on the symmetric split-step Fourier method is applied to compensate for most of the linear and nonlinear fiber impairments [76].
After DSP, the symbols are still subject to signal-dependent noise, which is mitigated by the bit-interleaved coded demodulator (BICD). Let $\mathbf{y} \in \mathbb{C}^{n_s}$ denote the equalized signal samples for the transmitted symbols $\mathbf{s} \in \mathcal{A}^{n_s}$ in the WDM channel of interest. Given that the deterministic effects have been equalized, we assume that the channel $\mathbf{s} \to \mathbf{y}$ is memoryless, so that $\Pr(\mathbf{y}|\mathbf{s}) = \prod_{l=1}^{n_s}\Pr(y_l|s_l)$. For $s \in \mathcal{A}$, let $\mu^{-1}(s) = (b_1(s), \ldots, b_m(s))$. From the symbol-to-symbol channel $\Pr(y|s)$, $s \in \mathcal{A}$, $y \in \mathbb{C}$, we obtain $m$ bit-to-symbol channels
$$\Pr_j(y|b) = \sum_{s \in \mathcal{A}:\, b_j(s) = b} \Pr(y|s), \qquad (6)$$
where $b \in \mathbb{F}_2$ and $j \in [m]$.
Let $\bar{\mathbf{c}} = (b_1(s_1), \ldots, b_m(s_1), \ldots, b_1(s_{n_s}), \ldots, b_m(s_{n_s}))$, $n = m n_s$. The LLR function $\mathbf{L} : \mathbb{C}^{n_s} \to \mathbb{R}^n$ of $\mathbf{c}$ conditioned on $\mathbf{y}$ is, for each $i \in [n]$,
$$[\mathbf{L}(\mathbf{y})]_i \stackrel{\Delta}{=} \log\frac{\Pr(c_i = 0 \mid \mathbf{y})}{\Pr(c_i = 1 \mid \mathbf{y})} = \log\frac{\Pr(\bar{c}_{i'} = 0 \mid \mathbf{y})}{\Pr(\bar{c}_{i'} = 1 \mid \mathbf{y})} = \log\frac{\Pr_j(y \mid b = 0)}{\Pr_j(y \mid b = 1)}, \qquad (7)$$
where $i'$ is obtained from $i$ according to $\pi$, $j = i' \bmod m$, and $\Pr_j(y|b)$ is defined in (6).
Decoder: The decoding of C i C o consists of two steps. First, C i is decoded using an adaptive WBP in Appendix B, which takes the soft information L ( y ) R n and corrects some errors. Second, C o is decoded using the min-sum (MS) decoder with SWD in Appendix B, which further lowers the BER, and outputs the decoded information bits b ^ . The LLRs in the inner decoder are represented with 32 bits, and in the outer decoder are quantized at 4 bits with per-window configuration.
In optical communication, a forward error correction (FEC) overhead of 6–25% is common [77]. Thus, the inner code typically has a high rate of $\geq 0.9$ and a block length of several thousand, achieving a BER of $10^{-6}$–$10^{-2}$. The outer code has a length of up to tens of thousands, lowering the BER to an error floor of ${\sim}10^{-15}$.

3.3. Performance Metrics

Q-factor: The SNR per bit in the optical fiber channel is $E_b/N_0$, where $E_b = P/m$ is the bit energy and $N_0 = \sigma^2 B N_{sp}$ is the total noise power in the link, with $B = B_0 N_c$. The performance of the uncoded communication system is often measured by the BER. The Q-factor for a given BER is the corresponding SNR in an additive white Gaussian noise channel with binary phase-shift keying modulation:
$$\mathrm{QF} = 20\log_{10}\left(\sqrt{2}\,\mathrm{erfc}^{-1}(2\,\mathrm{BER})\right)\ \mathrm{dB}.$$
Coding gain: Let $\mathrm{BER}_i$ and $\mathrm{QF}_i$ (respectively, $\mathrm{BER}_o$ and $\mathrm{QF}_o$) denote the BER and Q-factor at the input (respectively, output) of the decoder. The coding gain (CG) in dB is the difference in the Q-factor
$$\mathrm{CG} = \mathrm{QF}_o - \mathrm{QF}_i = 20\log_{10}\mathrm{erfc}^{-1}(2\,\mathrm{BER}_o) - 20\log_{10}\mathrm{erfc}^{-1}(2\,\mathrm{BER}_i).$$
The corresponding net CG (NCG) is
$$\mathrm{NCG} = \mathrm{CG} + 10\log_{10} r_{\mathrm{total}}. \qquad (8)$$
Finite block-length NCG: If $n$ is finite, the rate $r_{\mathrm{total}}$ in (8) may be replaced with the information rate in the finite block-length regime [78],
$$C_f \approx C - \log_2(e)\sqrt{\frac{\mathrm{BER}_i(1 - \mathrm{BER}_i)}{n}}\; Q^{-1}(\mathrm{BER}_o),$$
where $Q(x) = \frac{1}{2}\mathrm{erfc}\left(\frac{x}{\sqrt{2}}\right)$.
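The metrics above map directly to a few lines of code. The sketch below (a minimal illustration of our own using scipy's erfcinv; function names and the example numbers are ours) computes the Q-factor, CG, and NCG from pre- and post-FEC BERs.

```python
import numpy as np
from scipy.special import erfcinv

def q_factor_db(ber):
    # QF = 20 log10( sqrt(2) * erfcinv(2 BER) )  [dB]
    return 20.0 * np.log10(np.sqrt(2.0) * erfcinv(2.0 * ber))

def coding_gain_db(ber_in, ber_out):
    # CG = QF_out - QF_in
    return q_factor_db(ber_out) - q_factor_db(ber_in)

def net_coding_gain_db(ber_in, ber_out, r_total):
    # NCG = CG + 10 log10(r_total), per (8)
    return coding_gain_db(ber_in, ber_out) + 10.0 * np.log10(r_total)

# Example: a pre-FEC BER of 2e-2 corrected to 1e-12 by a rate-0.88 concatenated code
print(net_coding_gain_db(2e-2, 1e-12, 0.88))
```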

4. Weighted Belief Propagation

Given a code $\mathcal{C}$, one can construct a bipartite Tanner graph $\mathcal{T}_{\mathcal{C}} = (\mathcal{C}, \mathcal{V}, \mathcal{E})$, where $\mathcal{C} = [m]$, $\mathcal{V} = [n]$, and $\mathcal{E} = \{(c, v) \in \mathcal{C} \times \mathcal{V} : \mathsf{H}_{c,v} \neq 0\}$ are, respectively, the set of check nodes, the set of variable nodes, and the edges connecting them. Let $\mathcal{V}_c = \{v \in \mathcal{V} : (c, v) \in \mathcal{E}\}$, $\mathcal{C}_v = \{c \in \mathcal{C} : (c, v) \in \mathcal{E}\}$, and let $d_c$ and $d_v$ be the degrees of $c$ and $v$ in $\mathcal{T}_{\mathcal{C}}$, respectively.
The WBP is an iterative decoder based on the exchange of weighted LLRs between the variable nodes and the check nodes in $\mathcal{T}_{\mathcal{C}}$ [11,79]. Let $L_{c2v}^{(t)}$ denote the extrinsic LLR from check node $c$ to variable node $v$ at iteration $t$; $L_{v2c}^{(t)}$ is defined similarly.
The decoder is initialized at $t = 1$ with $L_{v2c}^{(0)} = L(y_j)$, where $v$ is the $j$-th variable node and $L(y_j)$ is obtained from (3) or (7). For each iteration $t \in [T]$, the LLRs are updated in two steps.
The check node update:
$$L_{c2v}^{(t)} \stackrel{(a)}{=} 2\tanh^{-1}\left(\prod_{v'\in\mathcal{V}_c\setminus\{v\}} \tanh\left(\frac{\gamma_{v',c}^{(t)}}{2}\, L_{v'2c}^{(t-1)}\right)\right) \stackrel{(b)}{\approx} \bar{L}_{c,v} \prod_{v'\in\mathcal{V}_c\setminus\{v\}} \gamma_{v',c}^{(t)}\, \mathrm{sign}\left(L_{v'2c}^{(t-1)}\right), \qquad (9)$$
where
$$\bar{L}_{c,v} = \min_{v'\in\mathcal{V}_c\setminus\{v\}} \left|L_{v'2c}^{(t-1)}\right|.$$
The equation in $(a)$ is the update relation of the BP [69], in which the LLR messages are scaled by the non-negative weights $\{\gamma_{v,c}^{(t)} : v \in \mathcal{V},\ c \in \mathcal{C}_v,\ t \in [T]\}$. Further, $(b)$ is obtained from $(a)$ through an approximation that lowers the computational cost. The WBP and WMS decoders use $(a)$ and $(b)$, respectively.
The variable-node update:
$$L_{v2c}^{(t)} = \alpha_v^{(t)}\, L(y_j) + \sum_{c'\in\mathcal{C}_v\setminus\{c\}} \beta_{c',v}^{(t)}\, L_{c'2v}^{(t-1)}. \qquad (10)$$
This is the update relation of the BP, to which the sets of non-negative weights $\{\alpha_v^{(t)} : v \in \mathcal{V}, t \in [T]\}$ and $\{\beta_{c,v}^{(t)} : c \in \mathcal{C}, v \in \mathcal{V}_c, t \in [T]\}$ are added.
At the end of each iteration t, a hard decision is made
$$\bar{y}_j = \begin{cases} 1, & \text{if } L_v^{(t)} < 0, \\ 0, & \text{if } L_v^{(t)} \geq 0, \end{cases} \qquad (11)$$
where
$$L_v^{(t)} = L(y_j) + \sum_{c\in\mathcal{C}_v} L_{c2v}^{(t)}. \qquad (12)$$
Let $\bar{\mathbf{y}} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n) \in \mathbb{F}_2^n$, and let $\mathbf{s} = \bar{\mathbf{y}}\mathsf{H}^T \in \mathbb{F}_2^m$ be the syndrome. The algorithm stops if $\mathbf{s} = \mathbf{0}$ or $t = T$.
The computation in (9) and (10) can be expressed with an NN. The Tanner graph $\mathcal{T}_{\mathcal{C}}$ is unrolled over the iterations to obtain a recurrent network with $2T$ layers (see Figure 2), in which the weights $\gamma_{v,c}^{(t)}$ and $\beta_{c,v}^{(t)}$ are assigned to the edges of $\mathcal{T}_{\mathcal{C}}$, and the weights $\alpha_v^{(t)}$ to the outputs [16]. The weights are obtained by minimizing a loss function evaluated over a training dataset using standard optimizers for NNs.
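To make the update rules concrete, the sketch below (our own simplified flooding-schedule WMS iteration with one shared weight per iteration, applied once to the extrinsic magnitude as in the normalized min-sum variant of (9b); the dense-matrix message layout and function name are ours, not the paper's implementation) applies (9b) and (10), takes the hard decision (11)–(12), and checks the syndrome.

```python
import numpy as np

def wms_decode(H, llr, gamma, T):
    """Weighted min-sum decoding with one weight gamma[t] per iteration.

    H     : (m, n) binary parity-check matrix (numpy array)
    llr   : (n,) channel LLRs L(y)
    gamma : (T,) non-negative weights, one per iteration
    """
    m, n = H.shape
    mask = H.astype(bool)
    L_v2c = np.where(mask, llr, 0.0)                     # initialize with channel LLRs
    y_hat = (llr < 0).astype(int)
    for t in range(T):
        # Check-node update, per the min-sum approximation (9b)
        mag = np.where(mask, np.abs(L_v2c), np.inf)
        sgn = np.where(L_v2c >= 0, 1.0, -1.0)
        ext_sgn = np.prod(np.where(mask, sgn, 1.0), axis=1, keepdims=True) * sgn
        min1 = np.min(mag, axis=1, keepdims=True)
        mag2 = mag.copy()
        mag2[np.arange(m), np.argmin(mag, axis=1)] = np.inf
        min2 = np.min(mag2, axis=1, keepdims=True)
        ext_min = np.where(mag == min1, min2, min1)      # extrinsic minimum magnitude
        L_c2v = np.where(mask, gamma[t] * ext_sgn * ext_min, 0.0)
        # Variable-node update (10) and posterior LLR (12)
        total = llr + L_c2v.sum(axis=0)
        L_v2c = np.where(mask, total[None, :] - L_c2v, 0.0)
        # Hard decision (11) and syndrome check
        y_hat = (total < 0).astype(int)
        if not np.any((H @ y_hat) % 2):
            break
    return y_hat
```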

4.1. Parameter Sharing Schemes

The training complexity of the WBP can be reduced through parameter sharing at the cost of a performance loss. We consider the dimensions $(t, v, c)$ of the ragged arrays $\gamma_{v,c}^{(t)}$ and $\beta_{c,v}^{(t)}$. In Type T parameter sharing, parameters are shared with respect to the iteration $t$. In the Type $T_a$ scheme, $\gamma_{v,c}^{(t)} = \beta_{c,v}^{(t)} = \gamma_{v,c}$ for all $(c, v) \in \mathcal{E}$ and $t \in [T]$. In this case, there is a single ragged array with $|\mathcal{E}|$ trainable parameters $\{\gamma_{v,c}\}_{c\in\mathcal{C}_v, v\in\mathcal{V}_c}$. For a regular LDPC code, $|\mathcal{E}| = n d_v = m d_c$. It has been observed that, for typical block lengths, the weights indeed do not change significantly with the iterations [28]. In Type $T_b$, there are $T$ arrays $\gamma_{v,c}^{(t)} = \beta_{c,v}^{(t)}$, while in Type $T_c$, there are two arrays $\gamma_{v,c}^{(t)} = \gamma_{v,c}$ and $\beta_{c,v}^{(t)} = \beta_{c,v}$. Type $T_a$ and $T_c$ decoders can be referred to as BP-RNN decoders and Type $T_b$ as feedforward BP. In Type V sharing, $\gamma_{v,c}^{(t)} = \gamma_c^{(t)}$ is independent of $v$; this corresponds to one weight per check node. Likewise, in Type C sharing, there is one weight per variable node, and $\gamma_{v,c}^{(t)} = \gamma_v^{(t)}$.
These schemes can be combined. For instance, in Type $T_a VC$ parameter sharing, $\beta_{c,v}^{(t)} = \gamma_{v,c}^{(t)} = \gamma$. Thus, a single parameter $\gamma$ is shared across all layers of the NN. This decoder is referred to as the neural normalized BP, e.g., the neural normalized min-sum (NNMS) decoder when the BP is based on the MS algorithm. The latter is similar to the normalized MS decoder, except that the parameter $\gamma$ is determined empirically there. In the Type $T_b VC$ scheme, $\beta_{c,v}^{(t)} = \gamma_{v,c}^{(t)} = \gamma^{(t)}$; here, there is one weight per iteration. In this paper, $\alpha_v^{(t)} = 1$ for all $v$ and $t \in [T]$.

4.2. WBP over F q

The construction and decoding of the binary PB QC-LDPC codes can be extended to codes over a finite field $\mathbb{F}_q$ [80,81]. Here, $q-1$ LLR messages are sent from each node, defined in Equation (1) of [81]. The update equations of the BP are similar to (9) and (10), and are presented in [81] for the extended min-sum (EMS) decoder and in [82] for the weighted EMS (WEMS) decoder.
The parameter sharing for the four-dimensional ragged array $\{\gamma_{v,c,q}^{(t)}\}_{t\in[T],\, v\in\mathcal{V},\, c\in\mathcal{C}_v,\, q\in\mathbb{F}_q}$ is defined as in Section 4.1. In the check-node update of the WEMS algorithm, it is possible to assign a distinct weight to each coefficient $q \in \mathbb{F}_q$ for every variable node. For instance, in the Type $T_c CQ$ scheme, $\gamma_{v,c,q}^{(t)} = \gamma_v$ and $\beta_{c,v,q}^{(t)} = \beta_v$, so there is one weight per variable node in the check-node update and one per variable node in the variable-node update, for all $t \in [T]$. In the case of Type $T_b VCQ$, there is only one weight per iteration, shared across the variable nodes, check nodes, and coefficients. In this case, if the BP is based on the non-binary EMS algorithm, the decoder is called the neural normalized EMS (NNEMS).
Remark 1. 
The EMS decoder has a truncation factor in { 1 , 2 , , q } that provides a trade-off between complexity and accuracy. In this paper, it is set to q to investigate the maximum performance.

5. Adaptive Learned Message Passing Algorithms

The weights of the static WBP are obtained by training the network offline using a dataset. A WBP where the weights are determined for each received word $\mathbf{y}$ is an adaptive WBP. The weights must therefore be found by online optimization. To manage the complexity, we consider a WMS decoder with Type $T_a$ parameter sharing. Thus, the decoder has one weight per iteration, i.e., $T$ weights in total, which must be determined for each received $\mathbf{y}$.
Let $\mathbf{c} \in \mathbb{F}_2^n$ be a codeword and $\mathbf{y} \in \mathcal{U}$ be the corresponding received word, where $\mathcal{U} = \mathbb{R}^n$ for the AWGN channel and $\mathcal{U} = \mathbb{C}^{n_s}$ for the optical fiber channel. Let $\bar{\mathbf{y}} = D_{\boldsymbol{\gamma}}(\mathbf{y})$ be the word decoded by a Type $T_a$ WBP decoder with weight $\gamma^{(t)} \in \mathbb{R}^+$ in iteration $t \in [T]$, where $\boldsymbol{\gamma} = (\gamma^{(1)}, \gamma^{(2)}, \ldots, \gamma^{(T)})$. In the adaptive decoder, we wish to find a function $g : \mathcal{U} \to \mathcal{X}$, $\mathcal{X} \subseteq (\mathbb{R}^+)^T$, $\boldsymbol{\gamma} = g(\mathbf{y})$, that minimizes the probability that $D_{g(\mathbf{y})}(\mathbf{y})$ makes an error:
$$\min_{g\in\mathcal{H}}\ \Pr\left(\bar{\mathbf{y}} \neq \mathbf{c}\right), \qquad (13)$$
where H is a functional class. The static decoder is a special case where g ( . ) is a constant function. Two variants of this decoder are proposed, illustrated in Figure 3.

5.1. Parallel Decoders

Architecture: In the parallel decoders, $g(\cdot)$ is found by search. Here, $\gamma^{(t)}$ takes values in a discrete set $\mathcal{X}_t = \{x_1^{(t)}, x_2^{(t)}, \ldots, x_{K_t}^{(t)}\}$, $K_t \in \mathbb{N}$, and thus $\boldsymbol{\gamma} \in \mathcal{X} := \prod_{t=1}^{T}\mathcal{X}_t = \{\boldsymbol{\gamma}_1, \ldots, \boldsymbol{\gamma}_\nu\}$, where $\nu = \prod_t K_t$. The parallel decoders consist of $\nu$ independent decoders $\bar{\mathbf{y}}_i = D_{\boldsymbol{\gamma}_i}(\mathbf{y})$, $i \in [\nu]$, running concurrently. Since $\Pr(\bar{\mathbf{y}}_i \neq \mathbf{c})$ in (13) is generally intractable, a sub-optimal $g(\cdot)$ is selected as follows. At the end of decoding by $D_{\boldsymbol{\gamma}_i}$, the syndrome $\mathbf{s}_i = \bar{\mathbf{y}}_i\mathsf{H}^T$ is computed. Let
$$i^* = \operatorname*{argmin}_{i\in[\nu]}\ \|\mathbf{s}_i\|_H, \qquad \mathbf{s}_i = \bar{\mathbf{y}}_i\mathsf{H}^T, \qquad (14)$$
be the index of the decoder whose syndrome has the smallest Hamming weight. Then, $g(\mathbf{y}) = \boldsymbol{\gamma}_{i^*}$ and $\bar{\mathbf{y}} = D_{\boldsymbol{\gamma}_{i^*}}(\mathbf{y})$.
In practice, the search can be performed up to a depth of $T_1 = 5$ iterations. However, the BP decoder often has to run for more iterations. Thus, a WBP decoder with weights $\boldsymbol{\gamma}_{i^*}$ can continue decoding from this output for $T_2$ additional iterations.
Remark 2. 
The decoder obtained via (14) is generally sub-optimal. Minimizing $\|\mathbf{s}_i\|_H$ does not necessarily minimize the number of errors. However, for random codes, the decoder obtained from (14) outperforms the static decoder.
Remark 3. 
If $D_{\boldsymbol{\gamma}_1}$ and $D_{\boldsymbol{\gamma}_2}$ yield the same number of errors, the decoder with the smaller weight vector is selected, which tends to output smaller LLRs.
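A minimal sketch of the selection rule (14), reusing the wms_decode helper sketched in Section 4 (an illustration of ours, not the authors' implementation): one WMS decoder is run per candidate weight vector and the output whose syndrome has the smallest Hamming weight is kept. The loop is sequential for clarity; in practice the $\nu$ decoders run concurrently.

```python
import numpy as np
from itertools import product

def parallel_wms(H, llr, weight_sets, T1):
    """Parallel WMS decoders over a grid of per-iteration weights.

    weight_sets : list of T1 lists; weight_sets[t] holds the candidates x_k(t)
    Returns the decoded word with the smallest syndrome Hamming weight, per (14).
    """
    best_word, best_weights, best_syn = None, None, np.inf
    for gamma in product(*weight_sets):              # nu = prod_t K_t candidates
        y_hat = wms_decode(H, llr, np.array(gamma), T1)
        syn_weight = int(((H @ y_hat) % 2).sum())    # ||s_i||_H
        if syn_weight < best_syn:
            best_syn, best_word, best_weights = syn_weight, y_hat, gamma
        if syn_weight == 0:                          # a valid codeword was found
            break
    return best_word, best_weights
```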
Obtaining $x_k^{(t)}$ from the distribution of the weights: The values of $x_k^{(t)}$ can be determined by dividing a sub-interval of $[0, 1]$ uniformly. The resulting parallel WMS decoder outperforms the WMS; however, the performance can be improved by choosing $x_k^{(t)}$ based on the probability distribution of the weights.
The probability distribution of the channel noise induces a distribution on $\mathbf{y}$ and, consequently, on $\boldsymbol{\gamma} = g(\mathbf{y})$. Let $\Gamma^{(t)}$ be a random variable representing $\gamma^{(t)}$. Denote the corresponding mean by $\theta^{(t)}$, the standard deviation by $\sigma^{(t)}$, and the cumulative distribution function by $C_t(\cdot) \stackrel{\Delta}{=} C_{\Gamma^{(t)}}(\cdot)$. For $\epsilon > 0$, set
$$x_k^{(t)} = \inf\left\{ x : C_t(x) > \frac{\epsilon}{2} + (k-1)\frac{1-\epsilon}{K_t} \right\}. \qquad (15)$$
The numbers $x_1^{(t)} < x_2^{(t)} < \cdots < x_{K_t}^{(t)}$ partition the real line into intervals such that $\Pr\left(\Gamma^{(t)} \in [x_1^{(t)}, x_{K_t}^{(t)}]\right) = 1-\epsilon$ and $\Pr\left(\Gamma^{(t)} \in [x_k^{(t)}, x_{k+1}^{(t)}] \mid \Gamma^{(t)} \in [x_1^{(t)}, x_{K_t}^{(t)}]\right) = \frac{1}{K_t}$. In practice, $\Gamma^{(t)}$ has a distribution close to Gaussian, in which case the $x_k^{(t)}$ are given by the explicit formulas in Lemma 1.
Lemma 1. 
Let $\Gamma^{(t)}$ have a cumulative distribution function $C_t(\cdot)$ that is continuous and strictly monotonic. For $k \in [K_t]$,
$$x_k^{(t)} = C_t^{-1}\left(\frac{\epsilon}{2} + (k-1)\frac{1-\epsilon}{K_t}\right).$$
If $\Gamma^{(t)}$ has a Gaussian distribution with mean $\theta^{(t)}$ and standard deviation $\sigma^{(t)}$, then
$$x_k^{(t)} = \theta^{(t)} + \sqrt{2}\,\sigma^{(t)}\,\mathrm{erfc}^{-1}\left(2 - \epsilon - \frac{2(k-1)(1-\epsilon)}{K_t}\right), \qquad (16)$$
where $\mathrm{erfc}$ is the complementary error function.
Proof. 
The proof is based on elementary calculus. □
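Under the Gaussian assumption, (16) gives the candidate weights in closed form. A small sketch of ours (scipy's erfcinv; the value of $\epsilon$ is an assumption for illustration, and $\theta^{(t)}, \sigma^{(t)}$ would come from the offline-trained WBP described next):

```python
import numpy as np
from scipy.special import erfcinv

def candidate_weights(theta, sigma, K, eps=0.05):
    """Candidate weights x_k(t), k = 1..K, from (16) for Gamma(t) ~ N(theta, sigma^2)."""
    k = np.arange(1, K + 1)
    return theta + np.sqrt(2.0) * sigma * erfcinv(2.0 - eps - 2.0 * (k - 1) * (1.0 - eps) / K)

# Example: K_t = 4 candidate weights per iteration
print(candidate_weights(theta=0.8, sigma=0.1, K=4))
```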
Obtaining the distribution of the weights: To apply (15) or (16), $C_t(\cdot)$ is required. To this end, a static WBP (with no parameter sharing) is trained offline given a dataset $\{(\mathbf{y}^{(i)}, \mathbf{c}^{(i)})\}_i$. The empirical cumulative distribution of the weights in each iteration is computed as an approximation to $C_t(\cdot)$. However, if the BER is low, it can be difficult to obtain a dataset that contains a sufficient number of examples corresponding to incorrectly decoded words, which are required to obtain good generalization.
To address this issue, we apply active learning [23,83]. This approach is based on the fact that the training examples near the decision boundary of the optimal classifier determine the classifier the most. Hence, input examples are sampled from a probability distribution with a support near the decision boundary.
The following approach to active learning is considered. At epoch $e$ in the training of the WBP, random codewords $\mathbf{c}$ and the corresponding channel outputs $\mathbf{y}$ are generated. The decoder from epoch $e-1$ is applied to decode $\mathbf{y}$ to $\hat{\mathbf{c}} = \mathrm{WBP}(\boldsymbol{\gamma}, \mathbf{L}(\mathbf{y}))$. An acquisition function $A_f(\mathbf{c}, \hat{\mathbf{c}}) : \mathcal{C} \times \mathbb{F}_2^n \to \mathbb{R}^+$ evaluates whether the example pair $(\mathbf{y}, \mathbf{c})$ should be retained. A candidate example is retained if $A_f(\mathbf{c}, \hat{\mathbf{c}})$ is in a given range.
The choice of the acquisition function depends on the specific problem being solved, the architecture of the NN, and the availability of the labeled data [83]. In the context of training NN decoders for channel coding, the authors of [23] use distance parameters and reliability parameters. Inspired by [23], the authors of [84] define the acquisition function using importance sampling. In this paper, the acquisition function is the number of errors, $A_f(\mathbf{c}, \hat{\mathbf{c}}) = \|\mathbf{c}\oplus\hat{\mathbf{c}}\|_H = d_H(\mathbf{c}, \hat{\mathbf{c}})$, where $d_H$ is the Hamming distance.
The dataset is incrementally generated and pruned as follows. At each epoch $e$, a subset $\mathcal{S}_e = \{(\mathbf{y}^{(i)}, \mathbf{c}^{(i)})\}_{i=1}^{b_1}$ of $b_1$ examples, filtered by the acquisition function, is selected. The entire dataset at epoch $e$ is $\bar{\mathcal{S}}_e = \mathrm{Prune}\left(\bigcup_{e'=1}^{e}\mathcal{S}_{e'}\right)$ and has size $b_2 > b_1$. The operator $\mathrm{Prune}$ removes the subsets $\mathcal{S}_{e'}$ introduced in old epochs $e' \in [e - e_0]$ if $e > e_0 \stackrel{\Delta}{=} b_2/b_1$, and otherwise leaves its input intact. At each epoch $e$, the loss function is averaged over a batch of size $b_s$ obtained by randomly sampling from $\bar{\mathcal{S}}_e$.
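A sketch of this epoch-wise dataset construction (the channel simulator, decoder, and the retention threshold are placeholders of ours; the rule shown keeps examples whose Hamming-distance acquisition value is at most max_err, mirroring the description above):

```python
import numpy as np
from collections import deque

def build_active_dataset(simulate, decode, epochs, b1, b2, max_err=10):
    """Incrementally build a training set filtered by the acquisition function.

    simulate(): returns one example (y, c) from the channel
    decode(y) : decoding with the weights from the previous epoch
    b1, b2    : examples added per epoch / total dataset size
    """
    e0 = b2 // b1                          # number of epochs kept before pruning
    subsets = deque(maxlen=e0)             # Prune(): drop subsets older than e0 epochs
    dataset = []
    for epoch in range(epochs):
        new = []
        while len(new) < b1:
            y, c = simulate()
            c_hat = decode(y)
            if np.sum(c != c_hat) <= max_err:      # acquisition: d_H(c, c_hat) <= max_err
                new.append((y, c))
        subsets.append(new)
        dataset = [ex for s in subsets for ex in s]
        # ...train the WBP for one epoch on random batches of size b_s from `dataset`...
    return dataset
```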
Complexity of the parallel decoders: The computational complexity of the decoder is measured in real multiplications (RMs). For instance, the complexity of the WMS with $T$ iterations, $\alpha_v = 1$, without parameter sharing, or with Type $T_b$ parameter sharing, is $\mathrm{RM} = 2T|\mathcal{E}|$, where $|\mathcal{E}| = n d_v = m d_c$ is the number of edges of the Tanner graph of the code. For the WMS decoder with Type $T_a$ or Type $T_b VC$ parameter sharing, $\mathrm{RM} = T(m+n)$. The latter arises from the fact that equal weights factor out of the sum and the min terms in the BP and are applied once. Thus, the complexity of $\nu$ parallel WMS decoders with Type $T_a$ parameter sharing is $\mathrm{RM} = \nu T(m+n)$. If $\alpha_v \neq 1$, $nT$ is added to the above formulas. Finally, the complexity of the Type $T_a VC$ decoder is $\mathrm{RM} = n$ per iteration. These expressions neglect the cost of the syndrome check.
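For concreteness, these RM counts can be tabulated for a given code. The helper below is our own bookkeeping of the formulas above (syndrome checks neglected, as in the text); the example parameters correspond to the $(4,8)$-regular code of Section 6.

```python
def rm_counts(n, m, E, T, nu):
    """Real-multiplication counts for the WMS variants discussed above."""
    return {
        "no sharing / Type T_b":   2 * T * E,          # 2 T |E|
        "Type T_a or T_b VC":      T * (m + n),
        "parallel (nu decoders)":  nu * T * (m + n),
        "Type T_a VC (per iter.)": n,
    }

# Example: n = 3224, m = 1612, |E| = 4 n (variable degree 4), T = 4, nu = 16
print(rm_counts(3224, 1612, 4 * 3224, 4, 16))
```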

5.2. Two-Stage Decoder

In the parallel decoders, the weights are restricted to a discrete set, and the number of parallel decoders grows exponentially with the number of iterations. The two-stage decoder predicts arbitrary non-negative weights, without the exponential complexity of the parallel decoders. Further, since the weights are arbitrary, the two-stage decoder can improve upon the performance of the parallel decoders when the output LLRs are sensitive to the weights.
Architecture: Recall that we wish to find a function $g(\mathbf{y})$ that minimizes the BER in (13). In the two-stage decoder, this function is expressed by an NN $\bar{\boldsymbol{\gamma}} = g_{\boldsymbol{\theta}}(\mathbf{y})$ parameterized by the vector $\boldsymbol{\theta}$. Thus, the two-stage decoder is a combination of an NN and a WBP. First, the NN takes as input either the LLRs at the channel output $\mathbf{L}(\mathbf{y})$ or $(\mathbf{L}(\mathbf{y}), \mathbf{y})$ and outputs the vector of weights $\bar{\boldsymbol{\gamma}}$. Then, the WBP decoder takes the channel LLRs $\mathbf{L}(\mathbf{y})$ and the weights $\bar{\boldsymbol{\gamma}}$ and outputs the decoded word $\bar{\mathbf{y}}$.
The parameters $\boldsymbol{\theta}$ are found using a dataset of examples $\{(\mathbf{y}^{(i)}, \boldsymbol{\gamma}^{(i)})\}_i$, where $\boldsymbol{\gamma}^{(i)}$ is the target weight vector. This dataset can be obtained through simulation, i.e., transmitting a codeword $\mathbf{c}^{(i)}$, receiving $\mathbf{y}^{(i)}$, and using, e.g., an offline parallel decoder to determine the target weights $\boldsymbol{\gamma}^{(i)}$. In this manner, $g_{\boldsymbol{\theta}}(\mathbf{y})$ is expressed in a functional form instead of being determined by real-time search, which may be more expensive.
In this paper, the NN is a CNN consisting of a cascade of two one-dimensional convolutional layers $\mathrm{Conv}_1$ and $\mathrm{Conv}_2$, followed by a dense layer $\mathrm{Dense}$. $\mathrm{Conv}_i$ applies $F_i$ filters of size $S_i$ with stride 1, followed by the rectified linear unit (ReLU) activation, $i = 1, 2$. The output of $\mathrm{Conv}_2$ is flattened and passed to $\mathrm{Dense}$, which produces the vector of weights $\bar{\boldsymbol{\gamma}}$ of length $T$. This final layer is a linear transformation with ReLU activation so that the weights are non-negative.
The model is trained by minimizing the quantile loss function
$$l_\xi(\boldsymbol{\gamma}, \bar{\boldsymbol{\gamma}}) = \mathrm{mean}\Big(\max\big(\xi(\boldsymbol{\gamma} - \bar{\boldsymbol{\gamma}}),\ (\xi - 1)(\boldsymbol{\gamma} - \bar{\boldsymbol{\gamma}})\big)\Big),$$
where $\xi \in (0, 1)$ is the quantile parameter, $\max$ is applied per entry, and $\mathrm{mean}$ is taken over the vector entries. The choice of loss was made by cross-validating the validation error over a number of candidate functions. This is an asymmetric absolute-value-like loss which, for $\xi > 1/2$ as in Section 6, encourages the entries of $\bar{\boldsymbol{\gamma}}$ to approach the entries of $\boldsymbol{\gamma}$ from above rather than from below.
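A compact PyTorch sketch of the weight predictor and the quantile loss (the layer sizes follow the values reported in Section 6, $F_1{=}5$, $S_1{=}3$, $F_2{=}8$, $S_2{=}2$; the module name and everything else are our own illustration, not the authors' code):

```python
import torch
import torch.nn as nn

class WeightPredictor(nn.Module):
    """CNN g_theta: channel LLRs -> T non-negative WBP weights."""
    def __init__(self, n, T, F1=5, S1=3, F2=8, S2=2):
        super().__init__()
        self.conv1 = nn.Conv1d(1, F1, S1)          # Conv_1: F1 filters of size S1, stride 1
        self.conv2 = nn.Conv1d(F1, F2, S2)         # Conv_2: F2 filters of size S2, stride 1
        self.dense = nn.Linear(F2 * (n - S1 - S2 + 2), T)
        self.relu = nn.ReLU()

    def forward(self, llr):                        # llr: (batch, n)
        h = self.relu(self.conv1(llr.unsqueeze(1)))
        h = self.relu(self.conv2(h))
        return self.relu(self.dense(h.flatten(1)))    # non-negative weights gamma_bar

def quantile_loss(gamma, gamma_bar, xi=0.75):
    """Quantile loss of the text: mean over entries of max(xi*d, (xi-1)*d), d = gamma - gamma_bar."""
    d = gamma - gamma_bar
    return torch.mean(torch.maximum(xi * d, (xi - 1.0) * d))
```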
Complexity of the two-stage decoder: The computational complexity of the two-stage decoder in the inference mode is the sum of the complexity of the CNN and WMS decoder
$$\mathrm{RM} = T(m+n) + \mathrm{RM}(\mathrm{CNN}),$$
where the computational complexity of the CNN is
$$\mathrm{RM}(\mathrm{CNN}) = \mathrm{RM}(\mathrm{Conv}_1) + \mathrm{RM}(\mathrm{Conv}_2) + \mathrm{RM}(\mathrm{Dense}) = F_1(n - S_1 + 1)S_1 + F_1 F_2 (n - S_1 - S_2 + 2) S_2 + T F_2 (n - S_1 - S_2 + 2). \qquad (17)$$
The complexity can be significantly reduced by pruning the weights, for example, by setting to zero the weights below a threshold $\tau_{\mathrm{prun}}$.
Remark 4. 
Neural decoders are sensitive to distribution shifts and often require retraining when the input distribution or channel conditions change. To lower the training complexity, [18] proposed a decoder that learns a mapping from the input SNR to the weights in WBP, enabling the decoder to operate across a range of SNRs. However, the WBP decoder in [18] is static, since the weights remain fixed throughout the transmission once chosen, despite being referred to as dynamic WBP. We do not address the problem of distribution shift in this paper.

6. Performance and Complexity Comparison

In this section, we study the performance and complexity trade-off of the static and adaptive decoders for the AWGN in Section 3.1 and optical fiber channel in Section 3.2.

6.1. AWGN Channel

Low-rate regime: To investigate the error correction performance of the decoders at low rates, we consider a BCH code $\mathcal{C}_1(63, 36)$ of rate 0.57 with the cycle-reduced parity-check matrix $\mathsf{H}_{\mathrm{cr}}$ in [85] and two QC-LDPC codes $\mathcal{C}_2(3224, 1612)$ and $\mathcal{C}_3(4016, 2761)$, which are, respectively, $(4,8)$- and $(5,16)$-regular with rates of 0.5 and 0.69. The parity-check matrix of each QC-LDPC code is constructed using an exponent matrix obtained from the random progressive edge growth (PEG) algorithm [86], with a girth-search depth of two, which is subsequently refined manually to remove short cycles in the Tanner graph. The parameters of the QC-LDPC codes, including the exponent matrices $\mathsf{P}_2$ and $\mathsf{P}_3$, are given in Appendix C. In addition, we consider the irregular LDPC codes $\mathcal{C}_4(420, 180)$ specified in the 5G New Radio (NR) standard and $\mathcal{C}_5(128, 64)$ in the Consultative Committee for Space Data Systems (CCSDS) standard [85]. The code parameters, such as the exponent matrices, are also available in the public repository ([87] v0.1).
We compare our adaptive decoders with the tanh-based BP, the auto-regressive BP, and several static WMS decoders with different levels of parameter sharing, such as BP with SS-PAN [18]. The latter is a Type $T_a VC$ WBP with $\alpha_v = \alpha$, i.e., a BP with two parameters. Additionally, to assess the achievable performance with a large number of parameters in the decoder, we include a comparison with two model-agnostic neural decoders based on the transformer [41] and graph NNs [33,43].
The number of iterations in the WMS decoders of the parallel decoders is chosen so that the total computational complexities of the parallel decoders and the static WMS decoder are about the same. In Figure 4a, this value is $T = 5$ for $\mathcal{C}_1$, where $T = T_1 + T_2$ is the total number of iterations; in Figure 4b–d, $T = 4$ for $\mathcal{C}_2$ and $\mathcal{C}_3$, $T = 6$ for $\mathcal{C}_4$, and $T = 10$ for $\mathcal{C}_5$; in Figure 4f, $T = 10$ for $\mathcal{C}_1$. Furthermore, $K_t = 4$ for all $t \in [T]$.
To compute the values of the weights $x_k^{(t)}$, the probability distribution of $\Gamma^{(t)}$ is required. For this purpose, a WMS decoder is trained offline. The training dataset is a collection of examples obtained using the AWGN channel with a range of SNRs, $\rho \in \{5.8, 6.0, 6.2\}$ dB for $\mathcal{C}_1$ or $\rho \in \{3.8, 3.9, 4.0, 4.1, 4.2\}$ dB for $\mathcal{C}_2$ and $\mathcal{C}_3$. The datasets for $\mathcal{C}_4$ and $\mathcal{C}_5$ are obtained similarly, with different sets of SNRs. The acquisition function $A_f(\cdot,\cdot)$ in active learning is the Hamming distance. A candidate example for the training dataset is retained if $A_f \leq 10$. The parameters of the active learning are $b_1 = b_s = 2000$ and $b_2 = 40{,}000$. The loss function is the binary cross-entropy. The models are trained using the Adam optimizer with a learning rate of 0.0005. It is observed that the distribution of $\Gamma^{(t)}$ is nearly Gaussian. Thus, we obtain $x_k^{(t)}$ from (16). Table 3 presents the mean and variance of this distribution, and the resulting $x_k^{(t)}$, for the three codes considered.
For the two-stage decoder, we use a CNN with $F_1 = 5$, $S_1 = 3$, $F_2 = 8$, and $S_2 = 2$, determined by cross-validation. The CNN is trained with a dataset of size 80,000, a batch size of 300, and the quantile loss function with $\xi = 0.75$. The number of iterations of the WMS decoder for each code is the same as above.
Figure 4 illustrates the BER vs. SNR for different codes, and for different decoders applied to the same code. In each of Figure 4a–d, one can compare different decoders at about the same complexity (except for the parallel decoder with the largest $\nu$, which shows the smallest achievable BER). For instance, it can be seen in Figure 4a that the two-stage decoder achieves half the BER of the WMS with SS-PAN decoder at an SNR of 6.4 dB for the short-length code $\mathcal{C}_1$, or approximately a 0.32 dB gain in SNR at a BER of $10^{-4}$. For this code, the parallel WMS decoders with 3 iterations and $\nu = 9$ outperform the tanh-based BP with nearly the same complexity. Figure 4b,c show that the two-stage decoder offers about an order-of-magnitude improvement in the BER compared to the Type $T_a$ WMS decoder at 4.2 dB for the moderate-length codes $\mathcal{C}_2$ and $\mathcal{C}_3$, or over a 0.1 dB gain at a BER of $10^{-6}$. The performance gains vary with the code, its parameters, and the SNR.
Figure 4d–f compare decoders with different complexities at about the same performance. The proposed adaptive model-based decoders achieve the same performance as the model-agnostic static decoders, with far fewer parameters.
The computational complexities of the decoders are presented in Table 4. For the CNN, from (17), $\mathrm{RM} = n(8T + 95) - 24T - 310$, which is further reduced by a factor of 4 upon pruning at the threshold $\tau_{\mathrm{prun}} = 0.001$, with minimal impact on the BER. Thus, the two-stage decoder requires less than half of the RMs of the WMS decoder with no or Type $T_a$ parameter sharing. Moreover, the two-stage decoder requires approximately one-fifth of the RMs of the parallel decoders with $\nu = 16$. Compared to the WMS decoder with SS-PAN [18], the two-stage decoder has nearly double the complexity, albeit with a much lower BER, as seen in Figure 4a.
High-rate regime: To further investigate the error correction performance of the decoders at high rates, we consider three single-edge QC-LDPC codes, $\mathcal{C}_6(1050, 875)$, $\mathcal{C}_7(1050, 850)$, and $\mathcal{C}_8(4260, 3834)$, associated, respectively, with the PCMs $\mathsf{H}_6(\lambda{=}7, \omega{=}42, M{=}25, \mathsf{P}_6)$, $\mathsf{H}_7(8, 42, 25, \mathsf{P}_7)$, and $\mathsf{H}_8(6, 60, 71, \mathsf{P}_8)$. These codes have rates $r = 0.84$, $0.81$, and $0.9$, respectively, and are constructed using the PEG algorithm. The PEG algorithm requires the degree distributions of the Tanner graph, which are optimized using the stochastic extrinsic information transfer (EXIT) chart described in Appendix C. Additionally, we include the polar code $\mathcal{C}_9(1024, 854)$ with $r = 0.84$ from the 5G-NR standard as a state-of-the-art benchmark. The code parameters, including the degree distribution polynomials and the exponent matrices, are given in Appendix C.
The acquisition function used with active learning in the parallel decoders is based on Figure 5. The figure shows the scatter plot of the pre-FEC error $e_1 = \|\mathbf{c}\oplus\bar{\mathbf{y}}\|_H$ versus the post-FEC error $e_2 = \|\mathbf{c}\oplus\hat{\mathbf{c}}\|_H$, for 340 examples $(\mathbf{c}, \mathbf{y})$ for $\mathcal{C}_6$ at $E_b/N_0 = 4.25$ dB. Here, $\bar{\mathbf{y}}$ is the hard decision of the LLRs at the channel output defined in (11), and $\hat{\mathbf{c}}$ is decoded with the best decoder at epoch $e$, i.e., the WMS with the weights from epoch $e-1$. The acquisition function retains $(\mathbf{c}, \mathbf{y})$ if $e_1 = 0$ (no error) or if $(e_1, e_2)$ falls in the rectangle $\mathcal{S}$ in Figure 5 (with error). The rectangle is defined such that $\Pr\left((e_1, e_2)\in\mathcal{S}\right) \approx 0.95$. It is ensured that 70% of the retained examples satisfy $e_1 = 0$ and 30% have $(e_1, e_2)$ in the rectangle $\mathcal{S}$. In this example, $e_1 \in \{80, 81, \ldots, 100\}$ and $e_2 \in [\mu - 2\sigma, \mu + 2\sigma]$, $\mu = 149.97$, $\sigma = 12.2$. We use $b_1 = 2000$, $b_2 = 20{,}000$, $b_s = 500$, and a learning rate of 0.001.
For the adaptive decoder, we consider five parallel decoders with $T_1 = 4$ iterations. The decoder for the binary codes $\mathcal{C}_6$, $\mathcal{C}_7$, and $\mathcal{C}_8$ is the WMS with Type $T_a VC$ sharing. The output of the decoder with the smallest syndrome weight is continued with an MS decoder for $T_2 = 4$ iterations. The polar code, however, is decoded with either a cyclic redundancy check (CRC) and successive cancellation list (SCL) decoder with list size $L$ [88] or the optimized successive cancellation (OSC) decoder [89].
Figure 6 shows the performance of the adaptive and static MS decoders for $\mathcal{C}_6$, $\mathcal{C}_7$, and $\mathcal{C}_9$. The polar code $\mathcal{C}_9$ with 24 CRC bits is simulated using the AFF3CT software toolbox ([90] v3.0.2). It can be seen that at high SNRs, $E_b/N_0 \geq 4.6$ dB, $\mathcal{C}_6$ and $\mathcal{C}_7$ decoded with the adaptive parallel decoders outperform $\mathcal{C}_9$. Given this, and the higher complexity of decoding the polar code with either SCL or OSC [88], the choice of QC-LDPC codes for the inner code for the optical fiber channel in Section 6.2 is justified.
Figure 7 shows the performance of $\mathcal{C}_8(4260, 3834)$ with rate 0.9. The adaptive WMS decoder with $T_1 + T_2 = 8$ iterations outperforms the static MS decoder with $T = 8$ iterations at $E_b/N_0 = 5$ dB by an order of magnitude in BER.
The gains of WBP depend on parameters such as the block length or SNR [91] (Section IV. d [44]). In general, the gain is decreased when the block length is increased, with other parameters remaining fixed.

6.2. Optical Fiber Channel

We simulate the 16-QAM WDM optical communication system described in Section 3.2, with the parameters in Table 2. The continuous-time model (5) is simulated with the split-step Fourier method with a spatial step size of 100 m and a simulation bandwidth of 200 GHz. DBP with two samples/symbol is applied to compensate for the physical effects and to obtain the per-symbol channel law $\Pr(y|s)$, $s\in\mathcal{A}$, $y\in\mathbb{C}$. For the inner code in the concatenated code, we consider two QC-LDPC codes of rate $r_i = 0.92$: the binary single-edge code $\mathcal{C}_{10}(4000, 3680)$ and the non-binary multi-edge code $\mathcal{C}_{11}(800, 32)$ over $\mathbb{F}_{32}$, with the PCMs $\mathsf{H}_{10}(4, 50, 80, \mathsf{P}_{10})$ and $\mathsf{H}_{11}(2, 25, 32, \mathsf{P}_{11})$, respectively, given in Appendix C. For the component code used in the outer spatially coupled code, we consider the multi-edge QC-LDPC code $\mathcal{C}_{12}(3680, 3520)$ with the PCM $\mathsf{H}_{12}(1, 23, 160, \mathsf{P}_{12})$. For $m_s = 2$ and $L = 100$, the resulting SC-QC-LDPC code has the PCM $\mathsf{H}_{\mathrm{SC}}(1, 23, 160, \mathsf{P}_{12}, 2, 100, \bar{\mathsf{B}}_{12})$, where $\bar{\mathsf{B}}_{12}$ is the spreading matrix. The outer SC-QC-LDPC code is encoded with the sequential encoder [92]. This requires that the top-left $\lambda M \times \omega M$ block $\mathsf{H}_0(0)$ of $\mathsf{H}_{\mathrm{SC}}$ in Equation (A2) is of full rank; thus, $\bar{\mathsf{B}}_{12}$ is designed to fulfill this condition. In Equation (A2), we have $\mathsf{H}_t(\lambda{=}1, \omega{=}23, M{=}160) \stackrel{\Delta}{=} \mathsf{H} \in \mathbb{F}_2^{160\times 3680}$, $t = 0, 1, 2$, $\mathsf{H}_0$ is of full rank, and $\mathsf{H}_{\mathrm{SC}} \in \mathbb{F}_2^{16320\times 368000}$. The rate of the component code is $r_{\mathrm{QC}} = 1 - 1/23 \approx 0.956$, and the rate of the outer SC code is $r_o = r_{\mathrm{QC}} - \frac{m_s\lambda}{L\omega} \approx 0.955$, so $r_{\mathrm{total}} = r_i \cdot r_o \approx 0.88$. The $\mathsf{P}_{12}$ and $\bar{\mathsf{B}}_{12}$ matrices are constructed heuristically and are given in Appendix C.
The inner code is decoded with the parallel decoder, using five decoders with four iterations each. The decoder for the binary code $\mathcal{C}_{10}$ is the WMS with Type $T_a VC$ parameter sharing, while the non-binary code $\mathcal{C}_{11}$ is decoded with the WEMS with Type $T_a VCQ$ sharing and $\beta_{c,v,q}^{(t)} = 1$. The static EMS algorithm [93] is parameterized as in Section 4 and initialized with the LLRs computed from Equation (1) of [81]. The outer code is decoded with the SWD, with static MS decoding and a maximum of 26 iterations per window.
Table 5 and Table 6 contain a summary of the numerical results. $\mathrm{BER}_i$ is the pre-FEC BER, and the reference BER for the coding gain is $\mathrm{BER}_o = 10^{-12}$. At $P = 10$ dBm, the total gap to the finite block-length NCG for the adaptive weighted min-sum (AWMS) (resp., adaptive WEMS) decoder is 2.51 dB (resp., 1.75 dB), while this value is 3.31 dB (resp., 2.29 dB) and 3.44 dB (resp., 2.69 dB), respectively, for the NNMS (resp., NNEMS) and MS (resp., EMS) decoders. Thus, the adaptive WBP provides a coding gain of 0.8 dB compared to the static NNMS decoder, with about the same computational complexity and decoding latency.

7. Conclusions

Adaptive decoders are proposed for codes on graphs that can be decoded with message-passing algorithms. Two variants, the parallel WBP decoders and the two-stage decoder, are studied. The parallel decoders search for the best sequence of weights in real time using multiple instances of the WBP decoder running concurrently, while the two-stage neural decoder employs an NN to dynamically determine the weights of the WBP for each received word. The performance and complexity of the adaptive and several static decoders are compared for a number of codes over AWGN and optical fiber channels. The simulations show that significant improvements in BER can be obtained using adaptive decoders, depending on the channel, the SNR, and the code and its parameters. Future work could explore further reducing the computational complexity of the online learning and applying adaptive decoders to other types of codes and wireless channels.

Author Contributions

Conceptualization, A.T. and M.Y.; Methodology, A.T. and M.Y.; Software, A.T. and M.Y.; Formal analysis, A.T. and M.Y.; Investigation, A.T. and M.Y.; Writing—original draft, A.T. and M.Y.; Project administration, M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work has received funding from the European Research Council (ERC) research and innovation program, under the COMNFT project, Grant Agreement no. 805195.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript.
AWGN: Additive White Gaussian Noise
AWEMS: Adaptive Weighted Extended Min-sum
AWMS: Adaptive Weighted Min-sum
BCH: Bose–Chaudhuri–Hocquenghem
BER: Bit Error Rate
BICD: Bit-interleaved Coded Demodulation
BICM: Bit-interleaved Coded Modulation
BP: Belief Propagation
CG: Coding Gain
CNN: Convolutional Neural Network
CRC: Cyclic Redundancy Check
DSP: Digital Signal Processing
EDFA: Erbium Doped Fiber Amplifier
EMS: Extended Min-sum
EXIT: Extrinsic Information Transfer
FEC: Forward Error Correction
FER: Frame Error Rate
GF: Galois Field
LDPC: Low-density Parity-check
LLR: Log-likelihood Ratio
MS: Min-sum
NCG: Net Coding Gain
NF: Noise Figure
NN: Neural Network
NR: New Radio
NNEMS: Neural Normalized Extended Min-sum
NNMS: Neural Normalized Min-sum
OSC: Optimized Successive Cancellation
PB: Protograph-based
PCM: Parity-check Matrix
PDF: Probability Density Function
PEG: Progressive-edge Growth
QAM: Quadrature Amplitude Modulation
QC: Quasi-cyclic
ReLU: Rectified Linear Unit
RRC: Root Raised Cosine
RM: Real Multiplication
RNN: Recurrent Neural Network
SC: Spatially coupled
SS-PAN: Simple Scaling and Parameter-adapter Networks
SCL: Successive Cancellation List
SWD: Sliding Window Decoder
WBP: Weighted Belief Propagation
WDM: Wavelength Division Multiplexing
WEMS: Weighted Extended Min-sum
WMS: Weighted Min-sum

Appendix A. Protograph-Based QC-LDPC Codes

In this appendix and the next, we provide the supplementary information necessary to reproduce the results presented in this paper. The presentation in Appendix B may be of independent interest, as it provides an accessible exposition of the construction and decoding of the SC codes.

Appendix A.1. Construction for the Single-Edge Case

A single-edge PB QC-LDPC code $\mathcal{C}$ is constructed in two steps. First, a base matrix $\mathsf{B} \in \mathbb{F}_2^{\lambda\times\omega}$ is constructed, where $\lambda, \omega \in \mathbb{N}$, $\lambda \leq \omega$. Then, $\mathsf{B}$ is expanded to the PCM of $\mathcal{C}$ by replacing each zero in $\mathsf{B}$ with the all-zero matrix $\mathbf{0} \in \mathbb{F}_2^{M\times M}$, where $M \in \mathbb{N}$ is the lifting factor, and each one in row $i$ and column $j$ with a sparse circulant matrix $\mathsf{H}_{i,j} \in \mathbb{F}_2^{M\times M}$.
Let $\mathsf{P} \in [-1 : M-1]^{\lambda\times\omega}$ be the exponent matrix of the code, with entries denoted by $p_{i,j}$. The matrices $\mathsf{H}_{i,j}$ (and $\mathsf{B}$) can be obtained from $\mathsf{P}$ as follows. Denote by $\mathsf{I}_n \in \mathbb{F}_2^{M\times M}$, $n \in [-1 : M-1]$, the circulant permutation matrix obtained by cyclically shifting each row of the $M\times M$ identity matrix $n$ positions to the right, with the convention that $\mathsf{I}_{-1}$ is the all-zero matrix. Then, $\mathsf{H}_{i,j} = \mathsf{I}_{p_{i,j}}$. The PCM $\mathsf{H} := \mathsf{H}(\lambda, \omega, M, \mathsf{P}) \in \mathbb{F}_2^{\lambda M\times\omega M}$ of this QC-LDPC code is
$$\mathsf{H} = \begin{bmatrix} \mathsf{I}_{p_{1,1}} & \mathsf{I}_{p_{1,2}} & \cdots & \mathsf{I}_{p_{1,\omega}} \\ \mathsf{I}_{p_{2,1}} & \mathsf{I}_{p_{2,2}} & \cdots & \mathsf{I}_{p_{2,\omega}} \\ \vdots & \vdots & \ddots & \vdots \\ \mathsf{I}_{p_{\lambda,1}} & \mathsf{I}_{p_{\lambda,2}} & \cdots & \mathsf{I}_{p_{\lambda,\omega}} \end{bmatrix}. \qquad \mathrm{(A1)}$$
This code has length $n = \omega M$, dimension $k = \omega M - r_{\mathsf{H}}$, and rate $r = 1 - r_{\mathsf{H}}/(\omega M)$, where $r_{\mathsf{H}} \leq \lambda M$ is the rank of $\mathsf{H}$. If $\mathsf{H}$ is of full rank, $r = 1 - \lambda/\omega$. The base matrix is also obtained from $\mathsf{P}$, as $\mathsf{B}_{i,j} = 0$ if $p_{i,j} = -1$ and $\mathsf{B}_{i,j} = 1$ if $p_{i,j} \neq -1$.
Denote the Tanner graph of $\mathcal{C}$ by $\mathcal{T}_{\mathcal{C}}$, and let $d_c$ and $d_v$ be, respectively, the degrees of the check node $c$ and the variable node $v$ in $\mathcal{T}_{\mathcal{C}}$. If $\mathsf{B}$ is regular (i.e., its rows have the same Hamming weight), then $\mathsf{H}$ is regular, and the check and variable nodes of $\mathcal{C}$ have the same degrees $d_c$ and $d_v$, respectively. In this case, $\mathcal{C}$ is said to be $(d_c, d_v)$-regular. More generally, the variable node degree distribution polynomial can be defined as $\Upsilon(x) = \sum_d \Upsilon_d x^{d-1}$, where $\Upsilon_d$ is the fraction of variable nodes of degree $d$, and the check node degree distribution as $\Lambda(x) = \sum_d \Lambda_d x^{d-1}$, where $\Lambda_d$ is the fraction of check nodes of degree $d$.
The parameter matrices B and P can be obtained so as to maximize the girth of T C using search-based methods such as the PEG algorithm [86,94], algebraic methods ([69] Section 10), or a combination of them [95].
Example A1. 
Consider $\lambda = 3$, $\omega = 5$, and the base matrix $\mathsf{B} \in \mathbb{F}_2^{3\times 5}$
$$\mathsf{B} = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 \end{bmatrix}.$$
For any exponent matrix $\mathsf{P}$, $\mathsf{H} \in \mathbb{F}_2^{3M\times 5M}$ is
$$\mathsf{H} = \begin{bmatrix} \mathsf{I}_{-1} & \mathsf{I}_{p_{1,2}} & \mathsf{I}_{p_{1,3}} & \mathsf{I}_{p_{1,4}} & \mathsf{I}_{-1} \\ \mathsf{I}_{p_{2,1}} & \mathsf{I}_{p_{2,2}} & \mathsf{I}_{-1} & \mathsf{I}_{p_{2,4}} & \mathsf{I}_{p_{2,5}} \\ \mathsf{I}_{p_{3,1}} & \mathsf{I}_{-1} & \mathsf{I}_{p_{3,3}} & \mathsf{I}_{p_{3,4}} & \mathsf{I}_{p_{3,5}} \end{bmatrix}.$$
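The lifting in (A1) is straightforward to implement. A small sketch of ours (numpy; the convention $p = -1 \to$ all-zero block follows the text, and the example exponents are arbitrary values matching the zero pattern of Example A1):

```python
import numpy as np

def circulant(M, p):
    """M x M circulant permutation matrix I_p; p = -1 gives the all-zero matrix."""
    if p < 0:
        return np.zeros((M, M), dtype=np.uint8)
    return np.roll(np.eye(M, dtype=np.uint8), p, axis=1)

def lift(P, M):
    """Expand an exponent matrix P (lambda x omega) into the QC-LDPC PCM H, per (A1)."""
    return np.block([[circulant(M, p) for p in row] for row in P])

# Example: the 3 x 5 pattern of Example A1 with M = 4 and arbitrary exponents
P = [[-1, 1, 2, 0, -1],
     [ 3, 0, -1, 2, 1],
     [ 1, -1, 0, 3, 2]]
H = lift(P, 4)
print(H.shape)        # (12, 20) = (lambda*M, omega*M)
```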

Appendix A.2. Construction for the Multi-Edge Case

The above construction can be extended to multi-edge PB QC-LDPC codes. Here, $\mathsf{B}_{i,j}$, instead of a binary number, is a sequence of length $D_{i,j}$ with entries in $\mathbb{F}_2$. Likewise, $p_{i,j}$ is a sequence of length $D_{i,j}$ with entries $p_{i,j}^d \in [-1 : M-1]$, $d \in [D_{i,j}]$. Then, $\mathsf{H}_{i,j} = \sum_{d=1}^{D_{i,j}} \mathsf{I}_{p_{i,j}^d}$. This code is represented by a Tanner graph in which there can be multiple edges of different types between the variable and check nodes. We say that this code has type $D = \max_{i,j} D_{i,j} \geq 1$. A single-edge PB QC-LDPC code is of type 1.
Example A2. 
Consider $\lambda = 2$, $\omega = 3$,
$$\mathsf{B} = \begin{bmatrix} 0 & (1,1) & 1 \\ 1 & 0 & (1,1,1) \end{bmatrix} \quad \text{and} \quad \mathsf{P} = \begin{bmatrix} -1 & (p_{1,2}^1, p_{1,2}^2) & p_{1,3} \\ p_{2,1} & -1 & (p_{2,3}^1, p_{2,3}^2, p_{2,3}^3) \end{bmatrix}.$$
Then
$$\mathsf{H} = \begin{bmatrix} \mathsf{I}_{-1} & \mathsf{I}_{p_{1,2}^1} + \mathsf{I}_{p_{1,2}^2} & \mathsf{I}_{p_{1,3}} \\ \mathsf{I}_{p_{2,1}} & \mathsf{I}_{-1} & \mathsf{I}_{p_{2,3}^1} + \mathsf{I}_{p_{2,3}^2} + \mathsf{I}_{p_{2,3}^3} \end{bmatrix}.$$

Appendix A.3. Construction for Non-Binary Codes

In non-binary codes, the entries of the codewords, parity-check, and generator matrices are in a finite field $\mathbb{F}_q$, where $q$ is a power of a prime. There are several ways to construct non-binary PB QC-LDPC codes. The base matrix $\mathsf{B} \in \mathbb{F}_2^{\lambda\times\omega}$ typically remains binary, as defined in Appendix A.1. We use the unconstrained and random assignment strategy in [80] to extend $\mathsf{B}$ to a PCM.
There is significant flexibility in selecting the edge weights for constructing non-binary QC-LDPC codes, which can be classified as constrained or unconstrained ([80] Section II). In this work, we focus on an unconstrained and random assignment strategy, where each edge in T C corresponding to a 1 in the binary matrix B is replaced with a coefficient h i , j F q . Alternatively, these coefficients could be selected based on predefined rules to ensure appropriate edge-weight diversity, which can lead to enhanced performance. This methodology allows non-binary codes to retain the structural benefits of binary QC-LDPC codes while extending their functionality to finite fields, offering improved error-correcting capabilities for larger q values.

Appendix A.4. Encoder and Decoder

The generator matrix of the code is obtained by applying Gaussian elimination in the binary field to (A1). The encoder is then implemented efficiently using shift registers [96].
The QC-LDPC codes are typically decoded using belief propagation (BP), as described in Section 4.

Appendix B. Spatially Coupled LDPC Codes

Appendix B.1. Construction

The SC-QC-LDPC codes in this paper are constructed based on the edge spreading process [97]. Denote the PCM of the constituent PB QC-LDPC code by H(λ, ω, M, P), and the PCM of the corresponding SC-QC-LDPC code by H_SC := H_SC(λ, ω, M, P, m_s, L, B̄), with the additional parameters of the syndrome memory m_s ∈ ℕ, the coupling length L ∈ ℕ, and the spreading matrix B̄ ∈ [−1 : m_s]^{λ×ω}. Then, H_SC ∈ F_2^{λM(m_s+L+1) × ωML} is given by
Entropy 27 00795 i001
in which H_t(ℓ) ∈ F_2^{λM×ωM}, t ∈ [0 : m_s], ℓ ∈ [0 : L−1], are λ × ω block matrices, and H_t(ℓ) is obtained from B̄ as
\[
\bigl[H_t(\ell)\bigr]_{i,j} =
\begin{cases}
I_{p_{i,j}}, & \text{if } \bar{B}_{i,j} = t, \\
0, & \text{otherwise},
\end{cases}
\]
where [H_t(ℓ)]_{i,j} denotes the block at row-block i and column-block j, and I_{p_{i,j}}, 0 ∈ F_2^{M×M}.
If H_t(ℓ_1) = H_t(ℓ_2) for all t ∈ [0 : m_s] and all ℓ_1, ℓ_2 ∈ [0 : L−1], the code is time-invariant. If H and H_SC are full-rank, then the rate of the SC-QC-LDPC code is r = r_QC − m_s λ/(Lω), where r_QC = 1 − λ/ω is the rate of the component QC-LDPC code. For a fixed m_s, r → r_QC as L → ∞. Thus, the rate loss in SC-LDPC codes can be reduced by increasing the coupling length.
Example A3. 
Consider any QC-LDPC code, m s = 2 , L = 4 and the spreading matrix
\[
\bar{B} = \begin{pmatrix}
-1 & 1 & 0 & 2 & -1 \\
1 & -1 & 1 & 0 & -1 \\
1 & 2 & -1 & 0 & 2
\end{pmatrix},
\]
where the entries equal to −1 mark the positions with no edge (the zeros of the base matrix B).
Then, H_2(ℓ) is obtained by replacing each entry of B̄ equal to 2, at row i and column j, with I_{p_{i,j}}, and every other entry with 0. Thus, for all ℓ ∈ [0 : 3],
\[
H_2(\ell) = \begin{pmatrix}
0 & 0 & 0 & I_{p_{1,4}} & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & I_{p_{3,2}} & 0 & 0 & I_{p_{3,5}}
\end{pmatrix}.
\]
In a similar manner,
\[
H_0(\ell) = \begin{pmatrix}
0 & 0 & I_{p_{1,3}} & 0 & 0 \\
0 & 0 & 0 & I_{p_{2,4}} & 0 \\
0 & 0 & 0 & I_{p_{3,4}} & 0
\end{pmatrix},
\qquad
H_1(\ell) = \begin{pmatrix}
0 & I_{p_{1,2}} & 0 & 0 & 0 \\
I_{p_{2,1}} & 0 & I_{p_{2,3}} & 0 & 0 \\
I_{p_{3,1}} & 0 & 0 & 0 & 0
\end{pmatrix}.
\]
If H_SC is full-rank, then r = r_QC − (2 · 3)/(4 · 5). If the component PCM H is also full-rank, then r_QC = 0.4 and r = 0.1. □
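The edge spreading of Example A3 can be sketched as follows (the exponents are hypothetical; B̄ entries equal to −1 mark positions with no edge, and circulant() is the helper from the sketch in Appendix A.1):

```python
import numpy as np

def circulant(M, p):
    if p < 0:
        return np.zeros((M, M), dtype=np.uint8)
    return np.roll(np.eye(M, dtype=np.uint8), p, axis=1)

def component(P, B_bar, M, t):
    # H_t: keep block (i, j) = I_{p_{i,j}} only where B_bar[i][j] == t, zero elsewhere
    rows = [[circulant(M, p) if b == t else np.zeros((M, M), dtype=np.uint8)
             for p, b in zip(Prow, Brow)]
            for Prow, Brow in zip(P, B_bar)]
    return np.block(rows)

B_bar = [[-1,  1,  0,  2, -1],
         [ 1, -1,  1,  0, -1],
         [ 1,  2, -1,  0,  2]]
P     = [[-1,  2,  4,  1, -1],   # hypothetical exponents with M = 5
         [ 0, -1,  3,  2, -1],
         [ 4,  1, -1,  0,  3]]
H0, H1, H2 = (component(P, B_bar, 5, t) for t in range(3))
```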
The SC-LDPC code is encoded efficiently and sequentially [92], so that at each spatial position ℓ, (ω − λ)M information bits are encoded out of the total of (ω − λ)ML.
If an entry of B corresponding to an H_i(ℓ) is a sequence, the corresponding entry in B̄ is also a sequence. Thus, if H_i(ℓ) represents a multi-edge code for some i or ℓ, so does H_SC.
Figure A1. (a) A window in the SWD is an m_w × m_w array of blocks, each block of size λM × ωM. The window slides diagonally by one block. The variable nodes in the block at block-position i × j are denoted by A(i), and the check nodes by B(j). Here, m_s = 2 and m_w = 3. (b) The message update for the current window ℓ is denoted by the solid rectangle. The variable nodes in the previous windows ℓ′ < ℓ and not in the current window ℓ, shown in green, send the fixed messages L_{v2c}^{(T_{ℓ−1}, ℓ−1)} to the check nodes in the current window in all iterations t. The blue variable nodes, in the current and a previous window, send L_{v2c}^{(T_{ℓ−1}, ℓ−1)} to the check nodes in the current window at iteration t = 1; these messages are updated according to (9) and (10) for t ≥ 2. The red variable nodes, in the current window but not in any previous window, send L(y) to the check nodes in the current window at t = 1, which is likewise updated for t ≥ 2. The edges from the gray check nodes in windows ℓ′ > ℓ are discarded for the decoding at position ℓ.
Entropy 27 00795 g0a1

Appendix B.2. Encoder

The SC-LDPC codes are encoded with the sequential encoder [92].

Appendix B.3. Sliding Window Decoder

Consider the SC-LDPC code H_SC(λ, ω, M, P, m_s, L, B̄) in Appendix B.1. Note that any two variable nodes in the Tanner graph of the code whose corresponding columns in H_SC are at least (m_s + 1)ωM columns apart do not share any common check nodes, and thus are not involved in the same parity-check equation. The SWD uses this property and runs a local BP decoder on the windows of H_SC shown in Figure A1.
The SWD works through a sequence of spatial iterations ℓ, in which a rectangular window slides from the top-left to the bottom-right of H_SC. In general, a window matrix H_w of size m_w consists of m_w λM consecutive rows and m_w ωM consecutive columns of H_SC. At each iteration ℓ, the window moves λM rows down and ωM columns to the right in H_SC; thus, the window is an m_w × m_w array of blocks, starting at the top left and moving diagonally by one block per iteration. A special case arises when the window reaches the boundary, and the way the windows near the boundary are terminated impacts the performance [98,99]. Our setup for window termination at the boundary is early termination, discussed in Section III-B1 of [99].
Denote the variable nodes in window ℓ by
\[
V(\ell) = \bigl\{ v_i \in V : i = (\ell - 1)\omega M, \ldots, (m_w + \ell - 1)\omega M - 1 \bigr\}.
\]
The check nodes directly connected to V(ℓ) are C(ℓ). Define Ṽ(ℓ) = ∪_{ℓ′<ℓ} {v ∈ V(ℓ′) \ V(ℓ) : v connected to C(ℓ)} (the variable nodes in the previous windows and not in the current window ℓ, shown in green in Figure A1), V̄(ℓ) = ∪_{ℓ′<ℓ} {v ∈ V(ℓ′) ∩ V(ℓ) : v connected to C(ℓ)} (the variable nodes in the current window and in a previous window, shown in blue in Figure A1), and V̂(ℓ) = {v ∈ V(ℓ) \ ∪_{ℓ′<ℓ} V(ℓ′) : v connected to C(ℓ)} (the variable nodes in the current window and not in any previous window, shown in red in Figure A1). Let L_{v2c}^{(t,ℓ)} and L_{c2v}^{(t,ℓ)} be the LLRs in window ℓ and iteration t ∈ [T_ℓ] of BP. At t = 1, the BP is initialized as
\[
L^{(1,\ell)}(v) =
\begin{cases}
L^{(T_{\ell-1},\, \ell-1)}(v), & v \in \tilde{V}(\ell) \cup \bar{V}(\ell), \\
L(y), & v \in \hat{V}(\ell).
\end{cases}
\]
The update equation for the variable nodes is
\[
L_{v2c}^{(t,\ell)} =
\begin{cases}
L_{v2c}^{(T_{\ell-1},\, \ell-1)}, & v \in \tilde{V}(\ell), \\
L(y) + \sum_{c' \in C_v \setminus \{c\}} L_{c'2v}^{(t-1,\, \ell)}, & v \in \bar{V}(\ell), \\
L(y), & v \in \hat{V}(\ell).
\end{cases}
\]
The update relation for L_{c2v}^{(t,ℓ)} is given by (9), with no weights, applied for c ∈ C(ℓ) and v ∈ V(ℓ). After T_ℓ iterations, the variables in the window ℓ, called the target symbols, are decoded. The SWD is illustrated in Figure A1.
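The windowing schedule can be sketched in Python as follows. This is a simplified illustration under our own naming: instead of freezing the per-edge messages L_{v2c}^{(T_{ℓ−1}, ℓ−1)} as in the equations above, it passes the posterior LLRs of earlier windows to later windows as channel values, and it runs a plain (unweighted) min-sum update inside each window:

```python
import numpy as np

def minsum(H, llr, iters):
    # Plain min-sum BP on the window submatrix H with input LLRs llr; returns posterior LLRs
    m, n = H.shape
    rows = [np.flatnonzero(H[i]) for i in range(m)]
    c2v = np.zeros((m, n))
    for _ in range(iters):
        total = llr + c2v.sum(axis=0)
        for i, idx in enumerate(rows):
            if idx.size < 2:
                continue
            v2c = total[idx] - c2v[i, idx]
            sgn = np.prod(np.sign(v2c)) * np.sign(v2c)    # product of the other edges' signs
            mag = np.abs(v2c)
            k = np.argsort(mag)[:2]                       # smallest and second-smallest magnitudes
            c2v[i, idx] = sgn * np.where(np.arange(idx.size) == k[0], mag[k[1]], mag[k[0]])
    return llr + c2v.sum(axis=0)

def sliding_window_decode(H_sc, llr, lamM, omgM, m_w, L, iters=10):
    # Window l covers m_w block-rows and block-columns starting at block position l
    llr = llr.astype(float).copy()
    bits = np.zeros(L * omgM, dtype=np.uint8)
    for l in range(L):
        r0, r1 = l * lamM, min((l + m_w) * lamM, H_sc.shape[0])
        c0, c1 = l * omgM, min((l + m_w) * omgM, H_sc.shape[1])
        post = minsum(H_sc[r0:r1, c0:c1], llr[c0:c1], iters)
        llr[c0:c1] = post                                 # reused by the later windows
        bits[c0:c0 + omgM] = post[:omgM] < 0              # decide the target symbols
    return bits
```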

Appendix C. Parameters of Codes

For the low-rate codes, the degree distributions of the Tanner graph are first determined using the extrinsic information transfer (EXIT) chart [100]. The EXIT chart produces accurate results as n → ∞ [100]. For the high-rate codes, we apply the stochastic EXIT chart, which, in the short block length regime, yields better coding gains than the deterministic variant [101]. For instance, while the EXIT chart suggests that the degree distributions of C_6 are optimal near E_b/N_0 = 3 dB, the stochastic EXIT chart in Figure A2 suggests E_b/N_0 = 3.5 dB. Indeed, at E_b/N_0 = 3 dB, the check-node extrinsic information I_C intersects the variable-node extrinsic information I_V in the deterministic EXIT chart. The exponent matrices are obtained using the PEG algorithm, which takes the optimized degree distribution polynomials as input.
Figure A2. Stochastic EXIT chart for the high-rate code C_6 at E_b/N_0 = 3.5 dB, for the AWGN channel.
Entropy 27 00795 g0a2
The matrices below are vectorized row-wise. They can be unvectorized considering their dimensions.
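For instance, a row-vectorized matrix can be restored with a single reshape (the values below are placeholders):

```python
import numpy as np

flat = [3, 1, 4, 1, 5, 9, 2, 6]      # an illustrative slice of a row-vectorized matrix
P = np.array(flat).reshape(2, 4)     # reshape(rows, cols), e.g., (lambda, omega)
```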
C 2 : λ = 4 , ω = 8 , M = 403 , Λ ( x ) = x 3 , Υ ( x ) = x 7 , P 2 below
[ 345 152 72 376 377 197 4 144 187 398 320 225 330 198 79 289 271 165 259 105 288 254 51 236 111 233 380 332 47 76 222 247 ]
C 3 : λ = 5 , ω = 16 , M = 251 , Λ ( x ) = x 4 , Υ ( x ) = x 15 , P 3 below
[ 6 98 208 177 76 76 76 48 111 76 76 34 76 76 64 85 198 42 155 127 29 32 35 10 76 44 47 8 53 56 47 71 31 211 158 0 238 111 199 8 195 248 121 167 46 170 246 140 117 51 3 65 57 150 243 57 213 20 113 164 48 141 222 85 181 142 121 210 229 98 218 59 242 76 196 23 185 54 162 52 ]
C 6 : λ = 7 , ω = 42 , M = 25 , Λ ( x ) = 0.714 x 2 + 0.286 x 3 , Υ ( x ) = 0.857 x 18 + 0.143 x 23 , P 6 below
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 8 18 7 17 22 24 1 1 1 1 1 16 1 1 1 1 1 1 14 1 1 1 1 19 4 1 1 22 1 8 1 1 5 1 1 9 0 14 1 15 3 12 7 11 14 18 13 22 1 1 1 1 1 1 1 17 1 1 1 1 1 1 15 1 1 3 1 18 1 1 1 9 8 1 1 19 9 20 1 21 5 1 13 1 1 1 1 1 1 1 2 6 16 11 14 9 19 23 12 1 1 1 1 1 1 1 20 1 19 1 1 1 17 1 1 1 8 16 18 4 3 1 17 1 1 1 15 1 1 1 1 1 6 24 14 22 3 11 1 19 1 1 13 1 1 1 1 1 18 1 18 1 15 1 1 1 1 13 11 19 1 1 20 17 1 21 1 1 1 1 17 1 1 1 1 13 1 1 1 1 1 1 3 9 24 4 21 1 16 22 14 1 1 14 23 24 11 1 16 23 1 1 1 1 1 1 7 23 1 1 1 1 1 1 18 1 1 1 1 15 1 1 1 1 9 11 21 8 17 4 14 16 1 11 1 6 11 13 13 20 6 13 1 1 16 1 1 1 1 1 ]
C 7 : λ = 8, ω = 42, M = 25, Λ(x) = 0.596x^2 + 0.404x^3, Υ(x) = 0.125x^9 + 0.125x^16 + 0.5x^17 + 0.125x^19 + 0.125x^23, P 7 below
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 8 18 7 17 22 24 1 1 1 1 1 16 1 1 1 1 1 1 14 1 1 1 1 20 1 16 1 1 1 1 4 1 10 1 1 20 19 1 2 9 3 12 7 11 14 18 13 22 1 1 1 1 1 1 1 17 1 1 1 1 1 1 15 1 1 1 1 22 6 1 1 13 1 21 4 1 1 1 1 14 1 17 13 1 1 1 1 1 1 1 2 6 16 11 14 9 19 23 12 1 1 1 1 1 1 1 1 1 1 5 1 1 3 1 10 1 23 1 1 8 4 1 2 1 1 1 15 1 1 1 1 1 6 24 14 22 3 11 1 19 1 1 13 1 1 1 1 1 20 1 1 1 1 13 1 1 1 1 0 3 3 1 12 3 7 1 1 1 1 1 17 1 1 1 1 13 1 1 1 1 1 1 3 9 24 4 21 1 16 22 1 1 12 1 6 1 1 6 1 1 1 1 0 6 18 19 1 6 1 1 1 1 1 1 18 1 1 1 1 15 1 1 1 1 9 11 21 8 17 4 14 16 23 1 3 1 24 10 0 1 1 23 1 0 1 1 1 23 1 19 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 8 5 20 1 1 7 10 24 21 10 1 11 22 1 1 1 1 1 ]
C 8 : λ = 6, ω = 60, M = 71, Λ(x) = 0.1x + 0.634x^2 + 0.266x^3, Υ(x) = 0.166x^27 + 0.668x^30 + 0.166x^37, P 8 below
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 61 1 1 1 1 1 1 1 1 1 1 1 1 1 1 62 1 1 20 60 24 37 1 1 17 45 52 57 62 63 68 70 54 26 19 14 9 8 3 0 1 1 1 1 1 1 28 1 1 1 1 1 1 1 1 1 46 1 1 1 1 34 51 1 1 59 1 18 1 1 1 61 1 52 46 21 48 1 1 44 34 1 2 48 6 1 17 40 29 11 67 23 65 70 54 31 42 60 4 1 0 1 1 1 1 2 1 1 1 1 1 1 1 1 1 46 1 1 1 1 1 21 1 23 1 1 7 1 1 53 1 1 62 1 32 27 27 39 1 1 15 1 16 0 1 1 1 1 1 1 16 1 1 1 1 1 1 1 1 3 51 64 14 29 44 47 62 68 20 7 57 42 27 24 9 1 1 1 32 29 1 1 1 1 38 1 34 1 45 1 67 1 1 60 40 1 37 1 9 20 1 69 20 1 0 1 1 1 1 21 1 1 1 1 1 1 1 1 1 2 18 3 51 49 16 33 59 69 53 68 20 22 55 38 12 1 1 62 1 1 1 1 48 0 37 48 1 26 19 59 60 49 38 1 1 1 1 67 1 1 1 24 1 1 1 0 1 40 42 1 1 1 1 1 1 1 1 1 1 1 1 67 14 48 55 1 1 1 1 1 1 1 1 1 1 25 67 16 41 64 0 68 66 50 20 63 67 34 45 0 7 29 11 11 1 1 1 61 1 1 1 1 1 ]
C 10 : λ = 4, ω = 50, M = 80, Λ(x) = 0.02 + 0.18x + 0.64x^2 + 0.16x^3, Υ(x) = 0.25x^29 + 0.25x^33 + 0.25x^34 + 0.25x^47, P 10 below
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 8 1 53 70 1 78 72 1 27 1 3 6 1 1 79 50 77 1 1 1 1 30 7 14 1 1 51 1 73 66 1 1 29 1 18 36 64 1 74 1 62 44 16 60 1 20 37 1 3 67 78 41 31 26 77 13 1 39 49 54 9 1 1 43 1 1 71 1 6 37 1 1 21 69 66 47 57 22 59 11 14 33 23 58 1 1 44 18 1 68 1 1 36 62 1 12 1 79 42 1 1 23 1 1 38 79 1 57 1 1 1 3 70 69 54 1 1 77 10 11 26 68 1 7 30 1 1 28 1 73 50 1 1 52 36 18 20 14 4 72 44 62 60 66 76 8 34 1 ]
C 11 : λ = 2 , ω = 25 , M = 32 , Λ ( x ) = x , Υ ( x ) = x 24 , P 11 below
[ ( 0 , 17 ) ( 1 ) ( 0 , 20 ) ( 1 ) ( 0 , 21 ) ( 1 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 0 ) ( 1 ) ( 0 , 17 ) ( 1 ) ( 0 , 20 ) ( 1 ) ( 0 , 21 ) ( 0 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 ) ( 7 ) ( 8 ) ( 9 ) ( 10 ) ( 16 ) ( 25 ) ( 26 ) ( 27 ) ( 28 ) ( 29 ) ( 30 ) ( 31 ) ]
C 12 : λ = 1, ω = 23, M = 160, Λ(x) = 0.480x + 0.130x^2 + 0.217x^3 + 0.130x^4 + 0.043x^5, Υ(x) = x^71, P 12 below
[ ( 1 , 151 , 151 , 55 , 127 , 151 ) ( 138 , 27 , 139 , 144 ) ( 88 , 57 , 1 ) ( 111 , 1 , 47 ) ( 130 , 15 , 33 ) ( 11 , 47 , 118 , 15 , 108 ) ( 109 , 5 , 1 ) ( 143 , 1 , 140 ) ( 100 , 14 , 14 , 141 , 1 ) ( 12 , 1 , 20 ) ( 91 , 42 , 96 ) ( 74 , 1 , 72 ) ( 54 , 29 , 155 , 157 , 159 ) ( 83 , 82 , 1 ) ( 0 , 77 , 141 , 78 , 13 ) ( 112 , 1 , 59 ) ( 119 , 74 , 56 ) ( 48 , 6 , 55 , 157 , 1 ) ( 85 , 1 , 9 ) ( 41 , 80 , 121 , 2 , 1 ) ( 103 , 1 , 45 ) ( 60 , 117 , 52 , 87 ) ( 99 , 148 , 1 ) ]
m s = 2 , B ¯ 12 below
[ ( 0 , 0 , 1 , 2 , 2 , 2 ) ( 0 , 1 , 1 , 2 ) ( 0 , 1 , 1 ) ( 0 , 1 , 2 ) ( 0 , 1 , 2 ) ( 0 , 0 , 0 , 1 , 2 ) ( 0 , 1 , 1 ) ( 0 , 1 , 2 ) ( 0 , 1 , 1 , 1 , 1 ) ( 0 , 1 , 2 ) ( 0 , 1 , 2 ) ( 0 , 1 , 2 ) ( 0 , 1 , 2 , 2 , 2 ) ( 0 , 1 , 1 ) ( 0 , 0 , 0 , 1 , 2 ) ( 0 , 1 , 2 ) ( 0 , 1 , 2 ) ( 0 , 1 , 1 , 1 , 1 ) ( 0 , 1 , 2 ) ( 0 , 0 , 0 , 1 , 1 ) ( 0 , 1 , 2 ) ( 0 , 1 , 2 , 2 ) ( 0 , 1 , 1 ) ]

References

  1. Pham, Q.V.; Nguyen, N.T.; Huynh-The, T.; Bao Le, L.; Lee, K.; Hwang, W.J. Intelligent radio signal processing: A survey. IEEE Access 2021, 9, 83818–83850. [Google Scholar] [CrossRef]
  2. Bruck, J.; Blaum, M. Neural networks, error-correcting codes, and polynomials over the binary n-cube. IEEE Trans. Inf. Theory 1989, 35, 976–987. [Google Scholar] [CrossRef]
  3. Zeng, G.; Hush, D.; Ahmed, N. An application of neural net in decoding error-correcting codes. In Proceedings of the 1989 IEEE International Symposium on Circuits and Systems (ISCAS), Portland, OR, USA, 8–11 May 1989; Volume 2, pp. 782–785. [Google Scholar] [CrossRef]
  4. Caid, W.; Means, R. Neural network error correcting decoders for block and convolutional codes. In Proceedings of the GLOBECOM ’90: IEEE Global Telecommunications Conference and Exhibition, San Diego, CA, USA, 2–5 December 1990; Volume 2, pp. 1028–1031. [Google Scholar]
  5. Tseng, Y.H.; Wu, J.L. Decoding Reed-Muller codes by multi-layer perceptrons. Int. J. Electron. Theor. Exp. 1993, 75, 589–594. [Google Scholar] [CrossRef]
  6. Marcone, G.; Zincolini, E.; Orlandi, G. An efficient neural decoder for convolutional codes. Eur. Trans. Telecommun. Relat. Technol. 1995, 6, 439–445. [Google Scholar]
  7. Wang, X.A.; Wicker, S. An artificial neural net Viterbi decoder. IEEE Trans. Commun. 1996, 44, 165–171. [Google Scholar] [CrossRef]
  8. Tallini, L.G.; Cull, P. Neural nets for decoding error-correcting codes. In Proceedings of the IEEE Technical Applications Conference and Workshops. Northcon/95. Conference Record, Portland, OR, USA, 10–12 October 1995; pp. 89–94. [Google Scholar] [CrossRef]
  9. Ibnkahla, M. Applications of neural networks to digital communications—A survey. Signal Process. 2000, 80, 1185–1215. [Google Scholar] [CrossRef]
  10. Haroon, A. Decoding of Error Correcting Codes Using Neural Networks. Ph.D. Thesis, Blekinge Institute of Technology, Blekinge, Sweden, 2012. Available online: https://www.diva-portal.org/smash/get/diva2:832503/FULLTEXT01.pdf (accessed on 11 June 2025).
  11. Nachmani, E.; Be’ery, Y.; Burshtein, D. Learning to decode linear codes using deep learning. In Proceedings of the 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 27–30 September 2016; pp. 341–346. [Google Scholar] [CrossRef]
  12. Gruber, T.; Cammerer, S.; Hoydis, J.; Brink, S.T. On deep learning-based channel decoding. In Proceedings of the 2017 51st Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 22–24 March 2017; pp. 1–6. [Google Scholar] [CrossRef]
  13. Kim, H.; Jiang, Y.; Rana, R.B.; Kannan, S.; Oh, S.; Viswanath, P. Communication algorithms via deep learning. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018; Available online: https://openreview.net/forum?id=ryazCMbR- (accessed on 11 June 2025).
  14. Vasić, B.; Xiao, X.; Lin, S. Learning to decode LDPC codes with finite-alphabet message passing. In Proceedings of the 2018 Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 11–16 February 2018; pp. 1–9. [Google Scholar]
  15. Bennatan, A.; Choukroun, Y.; Kisilev, P. Deep learning for decoding of linear codes—A syndrome-based approach. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 1595–1599. [Google Scholar] [CrossRef]
  16. Nachmani, E.; Marciano, E.; Lugosch, L.; Gross, W.J.; Burshtein, D.; Be’ery, Y. Deep learning methods for improved decoding of linear codes. IEEE J. Sel. Top. Signal Process. 2018, 12, 119–131. [Google Scholar] [CrossRef]
  17. Lugosch, L.P. Learning Algorithms for Error Correction. Master’s Thesis, McGill University, Montreal, QC, Canada, 2018. Available online: https://escholarship.mcgill.ca/concern/theses/c247dv63d (accessed on 11 June 2025).
  18. Lian, M.; Carpi, F.; Häger, C.; Pfister, H.D. Learned belief-propagation decoding with simple scaling and SNR adaptation. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 161–165. [Google Scholar] [CrossRef]
  19. Jiang, Y.; Kannan, S.; Kim, H.; Oh, S.; Asnani, H.; Viswanath, P. DEEPTURBO: Deep Turbo decoder. In Proceedings of the 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Cannes, France, 2–5 July 2019; pp. 1–5. [Google Scholar] [CrossRef]
  20. Carpi, F.; Häger, C.; Martalò, M.; Raheli, R.; Pfister, H.D. Reinforcement learning for channel coding: Learned bit-flipping decoding. In Proceedings of the 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 24–27 September 2019; pp. 922–929. [Google Scholar] [CrossRef]
  21. Wang, Q.; Wang, S.; Fang, H.; Chen, L.; Chen, L.; Guo, Y. A model-driven deep learning method for normalized Min-Sum LDPC decoding. In Proceedings of the 2020 IEEE International Conference on Communications Workshops (ICC Workshops), Dublin, Ireland, 7–11 June 2020; pp. 1–6. [Google Scholar] [CrossRef]
  22. Huang, L.; Zhang, H.; Li, R.; Ge, Y.; Wang, J. AI coding: Learning to construct error correction codes. IEEE Trans. Commun. 2020, 68, 26–39. [Google Scholar] [CrossRef]
  23. Be’Ery, I.; Raviv, N.; Raviv, T.; Be’Ery, Y. Active deep decoding of linear codes. IEEE Trans. Commun. 2020, 68, 728–736. [Google Scholar] [CrossRef]
  24. Xu, W.; Tan, X.; Be’ery, Y.; Ueng, Y.L.; Huang, Y.; You, X.; Zhang, C. Deep learning-aided belief propagation decoder for Polar codes. IEEE J. Emerg. Sel. Top. Circuits Syst. 2020, 10, 189–203. [Google Scholar] [CrossRef]
  25. Buchberger, A.; Häger, C.; Pfister, H.D.; Schmalen, L.; i Amat, A.G. Pruning and quantizing neural belief propagation decoders. IEEE J. Sel. Areas Commun. 2021, 39, 1957–1966. [Google Scholar] [CrossRef]
  26. Dai, J.; Tan, K.; Si, Z.; Niu, K.; Chen, M.; Poor, H.V.; Cui, S. Learning to decode protograph LDPC codes. IEEE J. Sel. Areas Commun. 2021, 39, 1983–1999. [Google Scholar] [CrossRef]
  27. Tonnellier, T.; Hashemipour, M.; Doan, N.; Gross, W.J.; Balatsoukas-Stimming, A. Towards practical near-maximum-likelihood decoding of error-correcting codes: An overview. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 8283–8287. [Google Scholar] [CrossRef]
  28. Wang, L.; Chen, S.; Nguyen, J.; Dariush, D.; Wesel, R. Neural-network-optimized degree-specific weights for LDPC MinSum decoding. In Proceedings of the 2021 11th International Symposium on Topics in Coding (ISTC), Montreal, QC, Canada, 30 August–3 September 2021; pp. 1–5. [Google Scholar] [CrossRef]
  29. Habib, S.; Beemer, A.; Kliewer, J. Belief propagation decoding of short graph-based channel codes via reinforcement learning. IEEE J. Sel. Areas Inf. Theory 2021, 2, 627–640. [Google Scholar] [CrossRef]
  30. Nachmani, E.; Wolf, L. Autoregressive belief propagation for decoding block codes. arXiv 2021, arXiv:2103.11780. [Google Scholar]
  31. Nachmani, E.; Be’ery, Y. Neural decoding with optimization of node activations. IEEE Commun. Lett. 2022, 26, 2527–2531. [Google Scholar] [CrossRef]
  32. Cammerer, S.; Ait Aoudia, F.; Dörner, S.; Stark, M.; Hoydis, J.; Ten Brink, S. Trainable communication systems: Concepts and prototype. IEEE Trans. Commun. 2020, 68, 5489–5503. [Google Scholar] [CrossRef]
  33. Cammerer, S.; Hoydis, J.; Aoudia, F.A.; Keller, A. Graph neural networks for channel decoding. In Proceedings of the 2022 IEEE Globecom Workshops (GC Wkshps), Rio de Janeiro, Brazil, 4–8 December 2022; pp. 486–491. [Google Scholar] [CrossRef]
  34. Choukroun, Y.; Wolf, L. Error correction code transformer. Conf. Neural Inf. Proc. Syst. 2022, 35, 38695–38705. Available online: https://proceedings.neurips.cc/paper_files/paper/2022/file/fcd3909db30887ce1da519c4468db668-Paper-Conference.pdf (accessed on 11 June 2025).
  35. Jamali, M.V.; Saber, H.; Hatami, H.; Bae, J.H. ProductAE: Toward training larger channel codes based on neural product codes. In Proceedings of the ICC 2022—IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 3898–3903. [Google Scholar] [CrossRef]
  36. Dörner, S.; Clausius, J.; Cammerer, S.; ten Brink, S. Learning joint detection, equalization and decoding for short-packet communications. IEEE Trans. Commun. 2022, 71, 837–850. [Google Scholar] [CrossRef]
  37. Li, G.; Yu, X.; Luo, Y.; Wei, G. A bottom-up design methodology of neural Min-Sum decoders for LDPC codes. IET Commun. 2023, 17, 377–386. [Google Scholar] [CrossRef]
  38. Wang, Q.; Liu, Q.; Wang, S.; Chen, L.; Fang, H.; Chen, L.; Guo, Y.; Wu, Z. Normalized Min-Sum neural network for LDPC decoding. IEEE Trans. Cogn. Commun. Netw. 2023, 9, 70–81. [Google Scholar] [CrossRef]
  39. Wang, L.; Terrill, C.; Divsalar, D.; Wesel, R. LDPC decoding with degree-specific neural message weights and RCQ decoding. IEEE Trans. Commun. 2023, 72, 1912–1924. [Google Scholar] [CrossRef]
  40. Clausius, J.; Geiselhart, M.; Ten Brink, S. Component training of Turbo Autoencoders. In Proceedings of the 2023 12th International Symposium on Topics in Coding (ISTC), Brest, France, 4–8 September 2023; pp. 1–5. [Google Scholar] [CrossRef]
  41. Choukroun, Y.; Wolf, L. A foundation model for error correction codes. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024; Available online: https://openreview.net/forum?id=7KDuQPrAF3 (accessed on 15 March 2012).
  42. Choukroun, Y.; Wolf, L. Learning linear block error correction codes. arXiv 2024, arXiv:2405.04050. [Google Scholar]
  43. Clausius, J.; Geiselhart, M.; Tandler, D.; Brink, S.T. Graph neural network-based joint equalization and decoding. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 1203–1208. [Google Scholar] [CrossRef]
  44. Adiga, S.; Xiao, X.; Tandon, R.; Vasić, B.; Bose, T. Generalization bounds for neural belief propagation decoders. IEEE Trans. Inf. Theory 2024, 70, 4280–4296. [Google Scholar] [CrossRef]
  45. Kim, T.; Sung Park, J. Neural self-corrected Min-Sum decoder for NR LDPC codes. IEEE Commun. Lett. 2024, 28, 1504–1508. [Google Scholar] [CrossRef]
  46. Ninkovic, V.; Kundacina, O.; Vukobratovic, D.; Häger, C.; i Amat, A.G. Decoding Quantum LDPC Codes Using Graph Neural Networks. In Proceedings of the GLOBECOM 2024—2024 IEEE Global Communications Conference, Cape Town, South Africa, 8–12 December 2024; pp. 3479–3484. [Google Scholar]
  47. Cammerer, S.; Gruber, T.; Hoydis, J.; Ten Brink, S. Scaling deep learning-based decoding of Polar codes via partitioning. In Proceedings of the GLOBECOM 2017—2017 IEEE Global Communications Conference, Singapore, 4–8 December 2017; pp. 1–6. [Google Scholar] [CrossRef]
  48. Sagar, V.; Jacyna, G.M.; Szu, H. Block-parallel decoding of convolutional codes using neural network decoders. Neurocomputing 1994, 6, 455–471. [Google Scholar] [CrossRef]
  49. Hussain, M.; Bedi, J.S. Reed-Solomon encoder/decoder application using a neural network. Proc. SPIE 1991, 1469, 463–471. [Google Scholar] [CrossRef]
  50. Alston, M.D.; Chau, P.M. A neural network architecture for the decoding of long constraint length convolutional codes. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; pp. 121–126. [Google Scholar] [CrossRef]
  51. Wu, X.; Jiang, M.; Zhao, C. Decoding optimization for 5G LDPC codes by machine learning. IEEE Access 2018, 6, 50179–50186. [Google Scholar] [CrossRef]
  52. Miloslavskaya, V.; Li, Y.; Vucetic, B. Neural network-based adaptive Polar coding. IEEE Trans. Commun. 2024, 72, 1881–1894. [Google Scholar] [CrossRef]
  53. Doan, N.; Hashemi, S.A.; Mambou, E.N.; Tonnellier, T.; Gross, W.J. Neural belief propagation decoding of CRC-polar concatenated codes. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–6. [Google Scholar]
  54. Yang, C.; Zhou, Y.; Si, Z.; Dai, J. Learning to decode protograph LDPC codes over fadings with imperfect CSIs. In Proceedings of the 2023 IEEE Wireless Communications and Networking Conference (WCNC), Glasgow, UK, 26–29 March 2023; pp. 1–6. [Google Scholar] [CrossRef]
  55. Wang, M.; Li, Y.; Liu, J.; Guo, T.; Wu, H.; Lau, F.C. Neural layered min-sum decoders for cyclic codes. Phys. Commun. 2023, 61, 102194. [Google Scholar] [CrossRef]
  56. Raviv, T.; Goldman, A.; Vayner, O.; Be’ery, Y.; Shlezinger, N. CRC-aided learned ensembles of belief-propagation polar decoders. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 8856–8860. [Google Scholar]
  57. Raviv, T.; Raviv, N.; Be’ery, Y. Data-driven ensembles for deep and hard-decision hybrid decoding. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 321–326. [Google Scholar] [CrossRef]
  58. Kwak, H.Y.; Yun, D.Y.; Kim, Y.; Kim, S.H.; No, J.S. Boosting learning for LDPC codes to improve the error-floor performance. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023; Volume 36, pp. 22115–22131. Available online: https://proceedings.neurips.cc/paper_files/paper/2023/file/463a91da3c832bd28912cd0d1b8d9974-Paper-Conference.pdf (accessed on 11 June 2025).
  59. Schmalen, L.; Suikat, D.; Rösener, D.; Aref, V.; Leven, A.; ten Brink, S. Spatially coupled codes and optical fiber communications: An ideal match? In Proceedings of the 2015 IEEE 16th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Stockholm, Sweden, 28 June–1 July 2015; pp. 460–464. [Google Scholar] [CrossRef]
  60. Kudekar, S.; Richardson, T.; Urbanke, R.L. Spatially coupled ensembles universally achieve capacity under belief propagation. IEEE Trans. Inf. Theory 2013, 59, 7761–7813. [Google Scholar] [CrossRef]
  61. Liga, G.; Alvarado, A.; Agrell, E.; Bayvel, P. Information rates of next-generation long-haul optical fiber systems using coded modulation. IEEE J. Lightw. Technol. 2017, 35, 113–123. [Google Scholar] [CrossRef]
  62. Feltstrom, A.J.; Truhachev, D.; Lentmaier, M.; Zigangirov, K.S. Braided block codes. IEEE Trans. Inf. Theory 2009, 55, 2640–2658. [Google Scholar] [CrossRef]
  63. Montorsi, G.; Benedetto, S. Design of spatially coupled Turbo product codes for optical communications. In Proceedings of the 2021 11th International Symposium on Topics in Coding (ISTC), Montreal, QC, Canada, 30 August–3 September 2021; pp. 1–5. [Google Scholar] [CrossRef]
  64. Smith, B.P.; Farhood, A.; Hunt, A.; Kschischang, F.R.; Lodge, J. Staircase codes: FEC for 100 Gb/s OTN. IEEE J. Lightw. Technol. 2012, 30, 110–117. [Google Scholar] [CrossRef]
  65. Zhang, L.M.; Kschischang, F.R. Staircase codes with 6% to 33% overhead. IEEE J. Lightw. Technol. 2014, 32, 1999–2002. [Google Scholar] [CrossRef]
  66. Zhang, L. Analysis and Design of Staircase Codes for High Bit-Rate Fibre-Optic Communication. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2017. Available online: https://tspace.library.utoronto.ca/bitstream/1807/79549/3/Zhang_Lei_201706_PhD_thesis.pdf (accessed on 11 June 2025).
  67. Shehadeh, M.; Kschischang, F.R.; Sukmadji, A.Y. Generalized staircase codes with arbitrary bit degree. In Proceedings of the 2024 Optical Fiber Communications Conference and Exhibition (OFC), San Diego, CA, USA, 24–28 March 2024; pp. 1–3. Available online: https://ieeexplore.ieee.org/abstract/document/10526860 (accessed on 11 June 2025).
  68. Sukmadji, A.Y.; Martínez-Peñas, U.; Kschischang, F.R. Zipper codes. IEEE J. Lightw. Technol. 2022, 40, 6397–6407. [Google Scholar] [CrossRef]
  69. Ryan, W.; Lin, S. Channel Codes: Classical and Modern; Cambridge University Press: Cambridge, UK, 2009; Available online: https://www.cambridge.org/fr/universitypress/subjects/engineering/communications-and-signal-processing/channel-codes-classical-and-modern?format=HB&isbn=9780521848688 (accessed on 11 June 2025).
  70. Ahmad, T. Polar Codes for Optical Communications. Ph.D. Thesis, Bilkent University, Ankara, Turkey, 2016. Available online: https://api.semanticscholar.org/CorpusID:116423770 (accessed on 11 June 2025).
  71. Barakatain, M.; Kschischang, F.R. Low-complexity concatenated LDPC-staircase codes. IEEE J. Lightw. Technol. 2018, 36, 2443–2449. [Google Scholar] [CrossRef]
  72. i Amat, A.G.; Liva, G.; Steiner, F. Coding for optical communications—Can we approach the Shannon limit with low complexity? In Proceedings of the 45th European Conference on Optical Communication (ECOC 2019), Dublin, Ireland, 22–26 September 2019; pp. 1–4. [Google Scholar] [CrossRef]
  73. Zhang, L.M.; Kschischang, F.R. Low-complexity soft-decision concatenated LDGM-staircase FEC for high-bit-rate fiber-optic communication. IEEE J. Lightw. Technol. 2017, 35, 3991–3999. [Google Scholar] [CrossRef]
  74. Agrawal, G.P. Nonlinear Fiber Optics, 6th ed.; Academic Press: San Francisco, CA, USA, 2019. [Google Scholar]
  75. Kramer, G.; Yousefi, M.I.; Kschischang, F. Upper bound on the capacity of a cascade of nonlinear and noisy channels. arXiv 2015, arXiv:1503.07652, 1–4. [Google Scholar]
  76. Secondini, M.; Rommel, S.; Meloni, G.; Fresi, F.; Forestieri, E.; Poti, L. Single-step digital backpropagation for nonlinearity mitigation. Photonic Netw. Commun. 2016, 31, 493–502. [Google Scholar] [CrossRef]
  77. Union, I.T. G.709: Interface for the Optical Transport Network (OTN). 2020. Available online: https://www.itu.int/rec/T-REC-G.709/ (accessed on 11 June 2025).
  78. Polyanskiy, Y.; Poor, H.V.; Verdu, S. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359. [Google Scholar] [CrossRef]
  79. Mezard, M.; Montanari, A. Information, Physics, and Computation; Oxford University Press: Oxford, UK, 2009. [Google Scholar]
  80. Dolecek, L.; Divsalar, D.; Sun, Y.; Amiri, B. Non-binary protograph-based LDPC codes: Enumerators, analysis, and designs. IEEE Trans. Inf. Theory 2014, 60, 3913–3941. [Google Scholar] [CrossRef]
  81. Boutillon, E.; Conde-Canencia, L.; Al Ghouwayel, A. Design of a GF(64)-LDPC decoder based on the EMS algorithm. IEEE Trans. Circ. Syst. I 2013, 60, 2644–2656. [Google Scholar] [CrossRef]
  82. Liang, Y.; Lam, C.T.; Wu, Q.; Ng, B.K.; Im, S.K. A model-driven deep learning-based non-binary LDPC decoding algorithm. TechRxiv 2024. [Google Scholar] [CrossRef]
  83. Fu, Y.; Zhu, X.; Li, B. A survey on instance selection for active learning. Knowl. Inf. Syst. 2013, 35, 249–283. [Google Scholar] [CrossRef]
  84. Noghrei, H.; Sadeghi, M.R.; Mow, W.H. Efficient active deep decoding of linear codes using importance sampling. IEEE Commun. Lett. 2024. [Google Scholar] [CrossRef]
  85. Helmling, M.; Scholl, S.; Gensheimer, F.; Dietz, T.; Kraft, K.; Ruzika, S.; Wehn, N. Database of Channel Codes and ML Simulation Results. 2024. Available online: https://rptu.de/channel-codes/ml-simulation-results (accessed on 11 June 2025).
  86. Hu, X.Y.; Eleftheriou, E.; Arnold, D.M. Progressive edge-growth Tanner graphs. In Proceedings of the GLOBECOM’01. IEEE Global Telecommunications Conference (Cat. No.01CH37270), San Antonio, TX, USA, 25–29 November 2001; Volume 2, pp. 995–1001. [Google Scholar] [CrossRef]
  87. Tasdighi, A.; Yousefi, M. The Repository for the Papers on the Adaptive Weighted Belief Propagation. 2025. Available online: https://github.com/comsys2/adaptive-wbp (accessed on 11 June 2025).
  88. Tal, I.; Vardy, A. List decoding of Polar codes. IEEE Trans. Inf. Theory 2015, 61, 2213–2226. [Google Scholar] [CrossRef]
  89. Süral, A.; Sezer, E.G.; Kolağasıoğlu, E.; Derudder, V.; Bertrand, K. Tb/s Polar successive cancellation decoder 16 nm ASIC implementation. arXiv 2020, arXiv:2009.09388. [Google Scholar]
  90. Cassagne, A.; Hartmann, O.; Léonardon, M.; He, K.; Leroux, C.; Tajan, R.; Aumage, O.; Barthou, D.; Tonnellier, T.; Pignoly, V.; et al. AFF3CT: A fast forward error correction toolbox. SoftwareX 2019, 10, 100345. [Google Scholar] [CrossRef]
  91. Tang, Y.; Zhou, L.; Zhang, S.; Chen, C. Normalized Neural Network for Belief Propagation LDPC Decoding. In Proceedings of the 2021 IEEE International Conference on Networking, Sensing and Control (ICNSC), Xiamen, China, 3–5 December 2021. [Google Scholar]
  92. Tazoe, K.; Kasai, K.; Sakaniwa, K. Efficient termination of spatially-coupled codes. In Proceedings of the 2012 IEEE Information Theory Workshop, Lausanne, Switzerland, 3–7 September 2012; pp. 30–34. [Google Scholar] [CrossRef]
  93. Takasu, T. PocketSDR. 2024. Available online: https://github.com/tomojitakasu/PocketSDR/tree/master/python (accessed on 11 June 2025).
  94. Li, Z.; Kumar, B.V. A class of good quasi-cyclic low-density parity check codes based on progressive edge growth graph. In Proceedings of the Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 7–10 November 2004; Volume 2, pp. 1990–1994. [Google Scholar] [CrossRef]
  95. Tasdighi, A.; Boutillon, E. Integer ring sieve for constructing compact QC-LDPC codes with girths 8, 10, and 12. IEEE Trans. Inf. Theory 2022, 68, 35–46. [Google Scholar] [CrossRef]
  96. Li, Z.; Chen, L.; Zeng, L.; Lin, S.; Fong, W. Efficient encoding of quasi-cyclic low-density parity-check codes. IEEE Trans. Commun. 2006, 54, 71–81. [Google Scholar] [CrossRef]
  97. Mitchell, D.G.; Rosnes, E. Edge spreading design of high rate array-based SC-LDPC codes. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 2940–2944. [Google Scholar] [CrossRef]
  98. Lentmaier, M.; Prenda, M.M.; Fettweis, G.P. Efficient message passing scheduling for terminated LDPC convolutional codes. In Proceedings of the 2011 IEEE International Symposium on Information Theory Proceedings, St. Petersburg, Russia, 31 July–5 August 2011; pp. 1826–1830. [Google Scholar] [CrossRef]
  99. Ali, I.; Kim, J.H.; Kim, S.H.; Kwak, H.; No, J.S. Improving windowed decoding of SC LDPC codes by effective decoding termination, message reuse, and amplification. IEEE Access 2018, 6, 9336–9346. [Google Scholar] [CrossRef]
  100. Land, I. Code Design with EXIT Charts. 2013. Available online: https://api.semanticscholar.org/CorpusID:61966354 (accessed on 11 June 2025).
  101. Koike-Akino, T.; Millar, D.S.; Kojima, K.; Parsons, K. Stochastic EXIT design for low-latency short-block LDPC codes. In Proceedings of the 2020 Optical Fiber Communications Conference and Exhibition (OFC), San Diego, CA, USA, 8–12 March 2020; pp. 1–3. Available online: https://ieeexplore.ieee.org/document/9083080 (accessed on 11 June 2025).
Figure 1. Block diagram of an optical fiber transmission system.
Entropy 27 00795 g001
Figure 2. Tanner graph unrolled to an RNN.
Entropy 27 00795 g002
Figure 3. Adaptive decoders. (a) Parallel decoders; (b) the two-stage decoder. WMS ( γ , L ( y ) ) refers to D γ ( y ) .
Entropy 27 00795 g003
Figure 4. BER versus SNR ρ, for the AWGN channel in the low-rate regime. (a) BCH code C 1 ( 63 , 36 ) . Here, the curve for WMS, Type T a & T b is from ([16] Figure 8) and the curve for WMS SS-PAN is from ([18] Figure 5a). (b) QC-LDPC code C 2 ( 3224 , 1612 ) , (c) QC-LDPC code C 3 ( 4016 , 2761 ) , (d) 5G-NR LDPC code C 4 ( 420 , 180 ) . Here, the curve for Graph NN is from ([33] Figure 5). (e) CCSDS LDPC code C 5 ( 128 , 64 ) . In this and the next sub-figure, the Autoregressive BP and Transformers curves are from [30] and [34], respectively. (f) BCH code C 1 ( 63 , 36 ) . Figures (d–f) show that adaptive decoders achieve the performance of the static decoders with less complexity.
Entropy 27 00795 g004
Figure 5. The scatter plot of ( e 1 , e 2 ) for C 9 at E b / N 0 = 4.25 dB, for the AWGN channel. The scaled Gaussian approximation curve is fitted per axis.
Entropy 27 00795 g005
Figure 6. Performance of the polar code C 9 ( 1024 , 854 ) versus QC-LDPC codes C 6 ( 1050 , 875 ) and C 7 ( 1050 , 850 ) , for the AWGN channel in the high-rate regime. The curve for the OSC decoder is from [89].
Entropy 27 00795 g006
Figure 7. Performance of the static and adaptive MS decoder for C 8 ( 4260 , 3834 ) at E b / N 0 = 4 dB, for the AWGN channel in the high-rate regime.
Entropy 27 00795 g007
Table 1. Codes in this paper.

AWGN Channel
Low rate | High rate
BCH C_1(63, 36), r = 0.57 | QC-LDPC C_6(1050, 875), 0.83
QC-LDPC C_2(3224, 1612), 0.5 | QC-LDPC C_7(1050, 850), 0.81
QC-LDPC C_3(4016, 2761), 0.69 | QC-LDPC C_8(4260, 3834), 0.9
Irregular LDPC C_4(420, 180), 0.43 | Polar C_9(1024, 854), 0.83
Irregular LDPC C_5(128, 64), 0.5 |

Optical Fiber Channel
Inner code | Outer code
Single-edge QC-LDPC C_10(4000, 3680), 0.92 | Multi-edge QC-LDPC C_11(3680, 3520), 0.96
Non-binary multi-edge C_12(800, 32) |
Table 2. The parameters of the fiber-optic link.

Parameter Name | Value
Transmitter parameters
WDM channels | 5
Symbol rate R_s | 32 Gbaud
RRC roll-off | 0.01
Channel frequency spacing | 33 GHz
Fiber channel parameters
Attenuation α | 0.2 dB/km
Dispersion parameter D | 17 ps/nm/km
Nonlinearity parameter γ | 1.2 1/(W·km)
Span configuration | 8 × 80 km
EDFA gain | 16 dB
EDFA noise figure | 5 dB
Table 3. The mean and variance (θ(t), σ(t)) of (x_1(t), …, x_4(t)) in WMS for the AWGN channel.

 | C_1 | C_2 | C_3
t = 1 | 0.99, 0.019; 0.96, 0.98, 0.99, 1.02 | 0.90, 0.026; 0.86, 0.89, 0.90, 0.94 | 0.91, 0.023; 0.87, 0.90, 0.92, 0.95
t = 2 | 0.97, 0.036; 0.91, 0.96, 0.98, 1.02 | 0.84, 0.029; 0.79, 0.83, 0.85, 0.89 | 0.86, 0.030; 0.81, 0.85, 0.87, 0.90
t = 3 | 0.91, 0.049; 0.83, 0.89, 0.92, 0.99 | 0.73, 0.032; 0.68, 0.72, 0.74, 0.78 | 0.75, 0.031; 0.69, 0.74, 0.76, 0.80
t = 4 | 0.70, 0.086; 0.56, 0.67, 0.73, 0.84 | 0.63, 0.036; 0.57, 0.62, 0.64, 0.69 | 0.63, 0.034; 0.57, 0.61, 0.64, 0.68
t = 5 | 0.40, 0.175; 0.12, 0.34, 0.46, 0.68 | — | —

In each cell, the first pair is (θ(t), σ(t)) and the following four values are (x_1(t), …, x_4(t)).
Table 4. Computational complexity of decoders, for the AWGN channel.

Decoder | γ*(t) | α*(t) | Average RM per iteration: C_1 | C_2 | C_3
No weight sharing
WMS [16] | γ_{v,c}(t) | | 1768 | 25,792 | 40,160
Weight sharing
WMS, Type T_a | γ_{v,c} | | 1768 | 25,792 | 40,160
WMS, Type T_a^{VC} | γ | 1 | 63 | 3226 | 4016
Parallel WMS, Type T_b^{VC}, ν = 16 | γ(t) | 1 | 1440 | 77,376 | 84,336
Parallel WMS, Type T_b^{VC}, ν = 64 | γ(t) | 1 | — | 3.09 × 10^5 | 3.37 × 10^5
Parallel WMS, Type T_b^{VC}, ν = 1024 | γ(t) | 1 | 92,340 | — | —
Two-stage decoder, Type T_b^{VC}, τ_prun = 0.001 | γ(t) | 1 | ≃300 | ≃14,093 | ≃17,558
WMS SS-PAN, Type T_a^{VC} [18] | γ(t) | 1 | 153 | 8060 | 9287
Table 5. Concatenated inner binary QC-LDPC code C_10 and outer SC-QC-LDPC code C_12 with r_total = 0.88 for the optical fiber channel. The sections for BER_i ≃ 0.012 and 0.025 correspond to average powers −10 and −11 dBm, respectively. NCGs are in dB.

BER_i | Inner-SD decoder | BER_o Inner | BER_o Total | NCG Inner | NCG Total | NCG_f Inner | NCG_f Total | Gap to NCG_f Inner | Gap to NCG_f Total
0.012 | AWMS | 3.29 × 10^−6 | 4.52 × 10^−8 | 5.64 | 6.93 | 9.38 | 9.44 | 3.74 | 2.51
 | NNMS, θ = 0.75 | 4.02 × 10^−6 | 5.43 × 10^−7 | 5.56 | 6.13 | 9.38 | 9.44 | 3.82 | 3.31
 | MS | 4.77 × 10^−6 | 7.75 × 10^−7 | 5.49 | 6.00 | 9.38 | 9.44 | 3.89 | 3.44
0.025 | AWMS | 0.019 | 0.017 | 0.13 | 0.12 | 10.20 | 10.28 | 10.07 | 10.16
 | NNMS, θ = 0.72 | 0.02 | 0.018 | 0.04 | 0.03 | 10.20 | 10.28 | 10.16 | 10.25
 | MS | 0.023 | 0.02 | −0.20 | −0.15 | 10.20 | 10.28 | 10.40 | 10.43
Table 6. Concatenated inner non-binary QC-LDPC code C_11 and outer SC-QC-LDPC code C_12 with r_total = 0.88 for the optical fiber channel. The sections for BER_i ≃ 0.012 and 0.025 correspond to average powers −10 and −11 dBm, respectively. NCGs are in dB.

BER_i | Inner-SD decoder | BER_o Inner | BER_o Total | NCG Inner | NCG Total | NCG_f Inner | NCG_f Total | Gap to NCG_f Inner | Gap to NCG_f Total
0.012 | AWEMS | 3.21 × 10^−8 | 2.74 × 10^−9 | 7.23 | 7.69 | 9.38 | 9.44 | 2.15 | 1.75
 | NNEMS, θ = 0.2 | 4.61 × 10^−8 | 2.11 × 10^−8 | 7.12 | 7.15 | 9.38 | 9.44 | 2.26 | 2.29
 | EMS | 2.44 × 10^−7 | 8.20 × 10^−8 | 6.60 | 6.75 | 9.38 | 9.44 | 2.78 | 2.69
0.025 | AWEMS | 0.0063 | 0.0051 | 1.73 | 1.80 | 10.20 | 10.28 | 8.47 | 8.48
 | NNEMS, θ = 0.25 | 0.0087 | 0.0075 | 1.32 | 1.32 | 10.20 | 10.28 | 8.88 | 8.96
 | EMS | 0.025 | 0.022 | −0.36 | −0.32 | 10.20 | 10.28 | 10.56 | 10.60
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
