Information-Bottleneck Decoding of High-Rate Irregular LDPC Codes for Optical Communication Using Message Alignment

Abstract: In high-throughput applications, low-complexity and low-latency channel decoders are indispensable. Hence, for low-density parity-check (LDPC) codes, message passing decoding has to be implemented with coarse quantization; that is, the exchanged beliefs are quantized with a small number of bits. This can result in a significant performance degradation with respect to decoding with high-precision messages. Recently, so-called information-bottleneck decoders were proposed which leverage a machine learning framework (i.e., the information bottleneck method) to design coarse-precision decoders with error-correction performance close to high-precision belief-propagation decoding. In these decoders, all conventional arithmetic operations are replaced by look-up operations. Irregular LDPC codes for next-generation fiber optical communication systems are characterized by high code rates and large maximum node degrees. Consequently, the implementation complexity is mainly influenced by the memory required to store the look-up tables. In this paper, we show that the complexity of information-bottleneck decoders remains manageable for irregular LDPC codes if our proposed construction approach is deployed. Furthermore, we reveal that in order to design information bottleneck decoders for arbitrary degree distributions, an intermediate construction step which we call message alignment has to be included. Exemplary numerical simulations show that incorporating message alignment in the construction yields a 4-bit information bottleneck decoder which performs only 0.15 dB worse than a double-precision belief propagation decoder and outperforms a min-sum decoder.


Introduction
It is well-known that the decoding of channel codes is one of the major bottlenecks in baseband signal processing. To realize next-generation fiber-optical systems with bit rates of 400 Gbit/s and above, high-speed decoding algorithms are indispensable. A very powerful and well-known class of error-correcting codes are the so-called low-density parity-check (LDPC) codes [1]. The design of LDPC codes for optical communications has been discussed in recent works [2-7]. In contrast to wireless communication, the coding schemes in optical communication use comparably high code rates $R > 0.8$ to keep the redundancy at a minimum. Furthermore, bit error rates of $10^{-15}$ are required in optical communication systems, which is often challenging to achieve with LDPC codes due to their characteristic error floor. Nevertheless, recent works [2-7] underlined the potential of LDPC codes as promising candidates in future optical communication systems. For a detailed summary of coding schemes for optical systems, we refer the interested reader to [7,8]. LDPC codes fully unfold their capacity-approaching error-correction capabilities under message passing decoding only if precise and computationally complex belief propagation decoding is performed. The high computational complexity is mainly induced by two properties of message passing decoding: On the one hand, to reliably exchange soft information and thus ensure optimum performance, the exchanged messages have to be represented precisely. On the other hand, elaborate arithmetic operations which combine incoming beliefs (e.g., at a check node) form a major performance bottleneck. The computational burden of high-precision message passing decoding limits the achievable throughput, increases the latency, and requires impractically complex implementations. Instead, practical hardware implementations use finite-precision message passing algorithms where the messages are quantized and the node operations are simplified by smart approximations. However, the error-rate performance of such finite-precision decoders deteriorates significantly with decreasing precision [9].
Recently, a fundamentally different approach was proposed: the so-called information bottleneck or look-up table decoder [9-14]. This decoder leverages ideas from information theory and machine learning and differs from conventional finite-precision decoders mainly in the following two points:
1. Instead of executing the conventional node arithmetic, exactly or approximately, on discrete values, the node operations are replaced by relevant-information-maximizing look-up tables which map discrete input messages onto discrete output messages. The required message mappings are designed using a relevant-information-preserving clustering technique, such as the information bottleneck method as shown in [10,11] or using similar algorithms [12,14].
2. The relevant-information-maximizing look-up tables render messages in the form of log-likelihood ratios (LLRs) obsolete. Instead, integer-valued pointers to look-up table entries, sometimes called cluster indices, are exchanged, which do not represent LLRs.
In our previous works, we have already shown that four bits are sufficient to represent the exchanged messages [10,11,15]: our proposed 4-bit information bottleneck decoder for regular LDPC codes approaches the performance of double-precision belief-propagation decoding to within 0.1 dB in $E_b/N_0$ with a vastly reduced implementation complexity [11]. Recently published investigations of FPGA implementations of similarly designed look-up table-based decoders in [16] showed that look-up table decoders for regular codes can achieve a throughput of 588 Gbit/s and are superior to conventional decoding techniques in terms of energy efficiency and area efficiency.
However, the existing work is mainly limited to regular LDPC codes. A first step towards information bottleneck decoders for irregular LDPC codes was described in [9], where the authors argue that existing LDPC codes are often ill-suited to information-bottleneck decoding and thus propose a joint optimization of the node-degree distribution and the look-up tables. That is, they proposed to adapt the code to the specific shortcomings of the available information bottleneck decoding method instead of fundamentally changing the decoder design.
In contrast, in [17] we devised a generalized design approach suitable for arbitrary irregular LDPC codes as defined in many standards (e.g., IEEE 802.11 or DVB-S2) without any modification of the LDPC code itself. We proposed an intermediate processing step called message alignment which we first applied in the context of information-bottleneck channel quantizers for higher-order modulation schemes [18] and distributed information-bottleneck sensor design [19].
Owing to the high code rates required in optical communication systems, the respective LDPC codes contain nodes with very high degrees. In this paper, we show that existing construction techniques cause an undesirable growth in memory demand when they are applied to high-rate codes, due to a large number of look-up tables.
In detail, this paper contains the following main contributions:

• We extend the decoder construction framework from [13,19] to be applicable to arbitrary irregular LDPC codes also with high code rates.
• We introduce a novel tree-like look-up pattern. With this strategy, the relation between the number of look-ups required per iteration and node degree changes from linear to logarithmic.
• We derive the underlying information-theoretic problem formulation and explain how the intermediate optimization technique called message alignment can be incorporated.
• We construct a 4-bit information bottleneck decoder for irregular LDPC codes with a code rate $R = 0.8$, where all conventional arithmetic in the nodes is replaced by simple look-up tables and only 4-bit integer-valued messages are passed.
• Our proposed decoder achieves error rates superior to min-sum decoding and only 0.15 dB away from double-precision belief propagation decoding.

The paper is organized as follows. The information bottleneck method and LDPC codes are briefly reviewed in Section 2. In Section 3, we propose message alignment, resulting in a general information bottleneck decoder design approach for arbitrary irregular LDPC codes. In Section 4, we provide detailed insights concerning the internal structure of the look-up tables replacing the arithmetic operations. Based on these considerations, a more efficient design strategy is proposed. In Section 5, numerical simulations comparing the performance of our proposed decoder with several reference systems are provided. Section 6 concludes the paper.

Prerequisites
This section briefly reviews LDPC codes and the information bottleneck method. Throughout the paper, we use the following notation: the elements $y \in \mathcal{Y}$ from the event space of a discrete random variable $Y$ occur with probability $\Pr(Y = y)$, and $p(y)$ defines the corresponding probability distribution. The cardinality or alphabet size of the random variable $Y$ is denoted by $|\mathcal{Y}|$. The joint distribution of $X$ and $Y$ is denoted $p(x, y)$.

Low-Density Parity-Check (LDPC) Codes
An LDPC code is defined by a sparse $N_c \times N_v$ parity check matrix $\mathbf{H}$. To analyze the structure of $\mathbf{H}$, an LDPC code can be visualized using a Tanner graph. A Tanner graph is a bipartite graph consisting of one set for the $N_v$ variable nodes and the other set for the $N_c$ check nodes. This is schematically illustrated in Figure 1. Irregular LDPC codes are typically described using their ensemble characteristics, and therefore, the connections between the two sets are characterized probabilistically by the edge-degree distributions for the variable nodes and the check nodes [20]:

$$\lambda(\zeta) = \sum_{d} \lambda_d \, \zeta^{d-1}, \qquad \rho(\zeta) = \sum_{d} \rho_d \, \zeta^{d-1},$$

where $\lambda_d$ denotes the fraction of edges connected to variable nodes with degree $d$ and $\rho_d$ denotes the fraction of edges connected to check nodes with degree $d$. The LDPC code is called regular if all variable nodes have the same degree $d_v$ and all check nodes have the same degree $d_c$, and otherwise the LDPC code is called irregular. The general derivations in the next sections hold for both node types (i.e., variable nodes and check nodes). Thus, we formally introduce a general edge-degree distribution

$$\omega(\zeta) = \sum_{d} \omega_d \, \zeta^{d-1}$$

to indicate when the derived equations are applicable for any node type.
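To make the edge perspective concrete, the following minimal sketch converts edge-degree fractions into node-degree fractions and evaluates the standard design rate $R = 1 - \frac{\sum_d \rho_d / d}{\sum_d \lambda_d / d}$. The function names and the toy ensemble are illustrative assumptions and are not taken from the codes considered in this paper.

```python
from fractions import Fraction

def node_fractions(edge_dist):
    """Convert edge-perspective fractions {d: w_d} into node-perspective fractions."""
    norm = sum(w / d for d, w in edge_dist.items())
    return {d: (w / d) / norm for d, w in edge_dist.items()}

def design_rate(lam, rho):
    """Design rate R = 1 - (sum_d rho_d / d) / (sum_d lambda_d / d)."""
    checks_per_edge = sum(w / d for d, w in rho.items())
    variables_per_edge = sum(w / d for d, w in lam.items())
    return 1 - checks_per_edge / variables_per_edge

# Hypothetical toy ensemble: half of the edges end in degree-2 variable
# nodes, half in degree-4 variable nodes; all check nodes have degree 15.
lam = {2: Fraction(1, 2), 4: Fraction(1, 2)}
rho = {15: Fraction(1)}

rate = design_rate(lam, rho)
fractions_vn = node_fractions(lam)
print(rate, fractions_vn)
```

For this toy ensemble, high check-node degrees go hand in hand with a high design rate, which is the regime relevant for optical communication.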

The Information Bottleneck Method
The information bottleneck method is a generic unsupervised clustering framework from the field of machine learning [21,22]. The so-called information bottleneck setup is visualized in Figure 2. Having defined a random variable $X$ termed the relevant random variable, the principal aim of the information bottleneck method is to extract all relevant information contained in an observation $Y$, by squeezing $Y$ through a compact bottleneck represented by the compression variable $T$. More precisely, the information bottleneck method attempts to design a compression mapping $p(t|y)$ which maps an observation $Y$ onto a compact compression variable $T$. The key idea is to design $p(t|y)$ such that the maximum possible amount of mutual information $I(T; X) \le I(X; Y)$ is preserved under a constraint for the cardinality $|\mathcal{T}|$. Several information bottleneck algorithms exist, intended to find a locally-optimum compression mapping $p(t|y)$ [22]. All information bottleneck algorithms take a joint distribution $p(x, y)$ as input and yield the compression mapping $p(t|y)$ and a joint distribution $p(x, t) = p(x|t)p(t)$. Typically, in the context of signal processing, one assumes a deterministic input-output mapping (e.g., at a quantizer). Hence, it is often sufficient to consider only deterministic mappings $p(t|y)$ (i.e., $p(t|y) \in \{0, 1\} \; \forall (t, y)$). Such mappings can be interpreted as look-up tables mapping $y$ onto $t$. More details related to the information bottleneck method can be found in [21,22]. Throughout the paper, we use the terms mapping and clustering synonymously. A more detailed overview and comparison of information bottleneck algorithms can be found in [22,23]. Please note that when performing a mapping $p(t|y)$, any physical meaning contained in $Y$ is lost (i.e., the compression variable $T$ on its own has no distinct meaning and the event space $\mathcal{T}$ can be chosen arbitrarily). Hence, a coupling between the relevant variable $X$ and $T$ is needed, given by $p(x|t)$. We sometimes refer to this distribution as the meaning of a cluster index $t$ with respect to $x$.
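The relation $I(T; X) \le I(X; Y)$ for a deterministic mapping can be checked numerically. The sketch below applies a hand-picked look-up table (not one produced by an information bottleneck algorithm) to a hypothetical four-output channel; all names and numbers are assumptions for illustration.

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information in bits for a joint distribution of shape (|X|, |Y|)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

def compress(p_xy, lut, n_clusters):
    """Apply the deterministic mapping t = lut[y] and return p(x, t)."""
    p_xt = np.zeros((p_xy.shape[0], n_clusters))
    for y, t in enumerate(lut):
        p_xt[:, t] += p_xy[:, y]
    return p_xt

# Hypothetical symmetric channel with four outputs and uniform p(x).
p_xy = 0.5 * np.array([[0.60, 0.30, 0.08, 0.02],
                       [0.02, 0.08, 0.30, 0.60]])
lut = [0, 0, 1, 1]                    # hand-picked table y -> t, |T| = 2
p_xt = compress(p_xy, lut, n_clusters=2)

i_xy = mutual_information(p_xy)
i_xt = mutual_information(p_xt)
print(f"I(X;Y) = {i_xy:.3f} bit, I(X;T) = {i_xt:.3f} bit")
```

The columns of `p_xt`, normalized by their sums, are the cluster meanings $p(x|t)$ referred to above.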

Information-Bottleneck Signal Processing and Information Bottleneck Graphs
Despite having its origin in machine learning and numerous classification problems, the information bottleneck has recently been used to design various communication systems. Application fields pairing the information bottleneck and signal processing include LDPC decoding, channel estimation, relaying, C-RAN, polar code construction, sensor networks, etc. [10,18,19,24-26].
Leveraging information-theoretic concepts to design practical systems is termed information-bottleneck signal processing [15]. Information-bottleneck signal processing using the information bottleneck method differs from conventional approaches mainly in three points. The first fundamental difference is the focus on the preservation of the so-called relevant information as a distortion measure in contrast to conventional measures like the Euclidean distance or the mean-squared error. As a result, it is possible to obtain compact but very informative representations of the processed samples. In a second step, these compression mappings or clusterings replace the actual arithmetic operations. Thus, in addition to a reduction of memory due to the compression, the computational complexity is also reduced by replacing arithmetic operations with look-up operations. Finally, information-bottleneck signal processing is a system-oriented design technique rather than a function-oriented technique. That is, instead of finding local approximations of certain functions, the overall aim of information-bottleneck signal processing is to optimize the flow of relevant information in the system from the data aggregation to the decision unit. For a more detailed review of information-bottleneck signal processing, we refer the reader to [15]. Due to the system-oriented design, finding the most efficient structure which optimizes the flow of relevant information can be quite cumbersome [15]. Therefore, information bottleneck graphs were introduced in [27] which extend factor graphs [28] by a compact description of the flow of relevant information to provide a vivid graphical representation of the system. Information bottleneck graphs indicate where the use of information-bottleneck clusterings is beneficial, and thus help to simplify the design [27]. For this purpose, a modified node symbol is used for all factor nodes representing compression mappings $p(t|y)$ that were designed using the information bottleneck method. In an information bottleneck graph, the typical square notation of factor nodes is replaced by a trapezoid symbol for all compression mappings. An example is shown in Figure 3. The relevant variable labels are written in the center of the trapezoid symbol to illustrate the information bottleneck with respect to this variable. The compression variable is connected to the shortest side of the trapezoid. All other connected variables correspond to a (possibly multivariate) observed variable. The example in Figure 3 illustrates the flow of relevant information on $x$ from the random vector $\mathbf{y} = [y_1, \ldots, y_M]^T$ to $t$. That is, $t = f(\mathbf{y})$ compresses $\mathbf{y}$ while preserving relevant information on $x$.

Information Bottleneck Decoders for Irregular LDPC Codes Using Message Alignment
This section contains our main contributions. After a brief review of information bottleneck decoder design for regular LDPC codes, we generalize this concept to arbitrary irregular LDPC codes. First, we derive the relevant-information-preserving message mappings, which replace the conventional arithmetic operations. Then, we present two perspectives, a graphical and an information-theoretical one, to motivate the need for an additional step in the decoder design. As a result, we propose a technique that we call message alignment.

Information-Bottleneck Channel Quantizer for Arbitrary Discrete Memoryless Channels
In every digital communication system, the received, possibly continuous channel output $y$ is fed into an analog-to-digital converter to obtain discrete-valued receive samples (cf. Figure 4a). In theoretical considerations, the effect of the quantizer is often ignored. This assumption is only valid if the resolution of the quantizer is very high. When transmitting at very high speed, the quantizer marks the first bottleneck in the receiver chain: coarse quantization of only a few bits is required for high-throughput implementations. In the context of information-bottleneck signal processing, the quantizer is the first unit in the receiver which can be optimized using the information bottleneck method. Figure 4b depicts the respective information bottleneck graph. As mentioned in the previous section, the information bottleneck method requires the joint distribution $p(x, y)$ of the relevant random variable $X$ and the observed random variable $Y$. In general, an assumed channel model implies the transition probabilities described by $p(y|x)$. The transition model $p(y|x)$ multiplied by the prior distribution $p(x)$ of the channel input yields the joint distribution $p(x, y)$. In the case of a channel output quantizer, an information bottleneck algorithm which delivers a deterministic mapping $p(t|y)$ is a natural choice. If designed using the information bottleneck, the quantization regions of an information-bottleneck channel quantizer are only described by a cluster index $t$, instead of a cluster representative which has to be represented with high resolution. These cluster indices, which are typically integers, are forwarded to subsequent units (e.g., to the channel decoder that is introduced in the next subsection). Information-bottleneck channel quantizers can also be generalized to support higher-order modulation schemes which require a de-mapper before the channel decoder [18].
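As an illustration of how $p(x, y) = p(y|x)p(x)$ feeds a quantizer design, the sketch below discretizes a BPSK/AWGN channel into fine cells and then greedily merges adjacent cells with the smallest loss of $I(X; T)$ until 16 clusters (4 bits) remain. This greedy adjacent-merge heuristic is a stand-in chosen for brevity and is not one of the information bottleneck algorithms from [22]; the noise level, grid, and function names are assumptions.

```python
import math
import numpy as np

def mutual_information(p_xy):
    """Mutual information in bits for a joint distribution of shape (|X|, |Y|)."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

def awgn_joint(sigma, n_bins=64, y_max=3.0):
    """p(x, y) for BPSK (x=0 -> +1, x=1 -> -1), uniform p(x), y discretized."""
    edges = np.linspace(-y_max, y_max, n_bins + 1)
    edges[0], edges[-1] = -np.inf, np.inf        # outer cells absorb the tails
    def cdf(v, mu):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = np.zeros((2, n_bins))
    for x, mu in enumerate((+1.0, -1.0)):
        for k in range(n_bins):
            p[x, k] = 0.5 * (cdf(edges[k + 1], mu) - cdf(edges[k], mu))
    return p

def contribution(col, p_x):
    """Contribution of one cluster column p(x, t) to I(X; T)."""
    m = col > 0
    return float(np.sum(col[m] * np.log2(col[m] / (p_x[m] * col.sum()))))

def greedy_quantizer(p_xy, n_clusters):
    """Merge adjacent cells with minimal information loss; return (lut, p(x, t))."""
    p_x = p_xy.sum(axis=1)
    cols = [p_xy[:, k].copy() for k in range(p_xy.shape[1])]
    members = [[k] for k in range(p_xy.shape[1])]
    while len(cols) > n_clusters:
        losses = [contribution(cols[i], p_x) + contribution(cols[i + 1], p_x)
                  - contribution(cols[i] + cols[i + 1], p_x)
                  for i in range(len(cols) - 1)]
        i = int(np.argmin(losses))
        cols[i:i + 2] = [cols[i] + cols[i + 1]]
        members[i:i + 2] = [members[i] + members[i + 1]]
    lut = np.empty(p_xy.shape[1], dtype=int)
    for t, ys in enumerate(members):
        lut[ys] = t                               # cluster index per fine cell
    return lut, np.stack(cols, axis=1)

p_xy = awgn_joint(sigma=0.8)
lut, p_xt = greedy_quantizer(p_xy, n_clusters=16)
print(mutual_information(p_xy), mutual_information(p_xt))
```

Only the integer array `lut` would be needed at run time; the cluster meanings in `p_xt` are used offline to design the subsequent units.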

Information Bottleneck Decoders for Regular LDPC Codes
As mentioned in the Introduction, the need for quasi-continuous LLRs and computationally complex node operations makes belief propagation decoding challenging. Information bottleneck decoders have been shown to overcome this impairment. That is, they pair low implementation complexity and near-optimal performance [10,12,13]. The fundamental idea of these decoders is to propagate compressed but highly informative integer-valued messages along the edges of a Tanner graph. In a second step, node operations optimized for discrete input alphabets are designed. These operations are look-up operations mapping a set of discrete incoming messages onto a discrete outgoing message, thereby dispensing with the original arithmetic operations. The vital task of this mapping is to ensure that as much relevant information as possible gained by the processing of the incoming messages is contained in the compactly represented output. A suitable information-theoretic framework for such problems is the information bottleneck method [21].
The information bottleneck method constructs relevant-information-maximizing clusterings given a joint probability distribution of the observed random variable $Y$ and relevant random variable $X$. In the context of LDPC decoder design, the observed random variables are the $M$ incoming discrete messages $\mathbf{y} = [y_1, \ldots, y_M]^T$, and the relevant random variable $X$ is a codeword bit. For a variable node, $X$ represents the underlying code bit $b_i$ of a particular node, whereas if the mapping is designed for a check node, $X$ represents the (mod 2)-sum of the connected code bits $b_1, \ldots, b_M$ (cf. Figure 1). The joint distribution $p(x, \mathbf{y})$ serving as input for the information bottleneck algorithms is determined using discrete density evolution [13]. Although classical density evolution was originally intended to find the decoding thresholds of an LDPC code ensemble, previous works have shown that the joint distributions exchanged in discrete density evolution equal exactly the necessary input distributions for the information bottleneck method [10-13]. The term discrete in discrete density evolution implies that instead of processing continuous joint distributions, the event space of the observed random variable $Y$ (i.e., $\mathcal{Y} = \{0, 1, \ldots, |\mathcal{Y}| - 1\}$) is also discrete, meaning that the realizations $y$ are from a finite alphabet with cardinality $|\mathcal{Y}|$. Hence, given an observation vector $\mathbf{y}$, $|\mathcal{Y}|^M$ possible input combinations exist. To prevent an exponential growth in the number of input combinations while passing $p(x, \mathbf{y})$ over the Tanner graph during discrete density evolution, $p(x, \mathbf{y})$ is squeezed through a compact information bottleneck. That is, a relevant-information-preserving clustering $p(t|\mathbf{y})$ is introduced such that the outgoing message $t \in \mathcal{T} = \{0, 1, \ldots, |\mathcal{T}| - 1\}$ is from a finite alphabet with cardinality $|\mathcal{T}| \ll |\mathcal{Y}|^M$. Once this clustering is found, the actual decoding simplifies to simple look-ups in offline generated tables, which map the sequence of incoming integers $\mathbf{y}$ onto an outgoing integer-valued message $t$. For a more mathematical analysis and a detailed description of information bottleneck decoders for regular codes, we refer the reader to [10-13].
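For two incoming messages, the joint distributions $p(x, y_1, y_2)$ that serve as input to the clustering step can be written down directly. The sketch below assumes independent, uniformly distributed code bits and messages that are conditionally independent given the relevant bit; the helper names are hypothetical and the code illustrates the principle rather than the full discrete density evolution of [13].

```python
import numpy as np

def check_node_joint(p1, p2):
    """p(x, y1, y2) with x = b1 XOR b2, given p1[b1, y1] and p2[b2, y2]."""
    out = np.zeros((2, p1.shape[1], p2.shape[1]))
    for b1 in range(2):
        for b2 in range(2):
            out[b1 ^ b2] += np.outer(p1[b1], p2[b2])
    return out

def variable_node_joint(p1, p2):
    """p(x, y1, y2) = p(x) p(y1|x) p(y2|x); both inputs share the same p(x)."""
    p_x = p1.sum(axis=1)
    return np.stack([np.outer(p1[x], p2[x]) / p_x[x] for x in range(2)])

# Hypothetical binary-valued incoming messages with 10% error probability.
p_in = 0.5 * np.array([[0.9, 0.1],
                       [0.1, 0.9]])

cn = check_node_joint(p_in, p_in)      # shape (2, |Y|, |Y|)
vn = variable_node_joint(p_in, p_in)
print(cn[0, 0, 0], vn[0, 0, 0])
```

Flattening the last two axes gives a joint distribution over $|\mathcal{Y}|^2$ input combinations, which a clustering $p(t|\mathbf{y})$ then compresses back to $|\mathcal{T}|$ outputs.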

Relevant-Information-Preserving Clusterings for Arbitrary Irregular LDPC Codes
In contrast to regular LDPC codes, irregular LDPC codes are characterized by nodes with various degrees (i.e., the number of incoming messages differs). Thus, the input joint distribution $p_d(x, \mathbf{y})$ for the information bottleneck depends on the node degree $d$. Consequently, it is not sufficient to design message mappings only for each node type, but for each node type considering the individual node degrees. For ease of notation, we introduce subscripts for the distributions indicating the node degree. That is, $p_d(t|\mathbf{y})$ is the relevant-information-preserving clustering at a node with degree $d$ found given the input joint distribution $p_d(x, \mathbf{y})$. In density evolution, a code ensemble is considered. That is, instead of a particular irregular LDPC code with a certain parity-check matrix, the connectivity between variable and check nodes is only known on average, defined by the degree distribution. To construct the required input joint distributions $p_d(x, \mathbf{y})$, discrete density evolution from [13] needs to be extended to consider the degree distribution of the code ensemble.
Conventional density evolution is a technique to analyze the error correction performance of the code ensemble. In the conventional density evolution scheme for irregular codes, one tracks the average output distributions $p_{\bar{d}}(x, \mathbf{y})$ of the variable and the check nodes. Therefore, the actual output distribution $p_d(x, \mathbf{y})$ for each node degree $d$ is weighted according to the edge-degree distributions. That is,

$$p_{\bar{d}}(x, \mathbf{y}) = \sum_{d} \omega_d \, p_d(x, \mathbf{y}),$$

where $\omega_d$ is from the edge-degree distribution [20].
In discrete density evolution, a relevant-information-preserving clustering is crucial to inhibit the exponential growth of the possible input combinations. Thus, $p_{\bar{d}}(x, t)$ instead of $p_{\bar{d}}(x, \mathbf{y})$ has to be tracked in discrete density evolution. We define the average distribution $p_{\bar{d}}(x, t)$ as

$$p_{\bar{d}}(x, t) = \sum_{d} \omega_d \sum_{\mathbf{y} \in \mathcal{Y}_d^{\mathrm{vec}}} p_d(t|\mathbf{y}) \, p_d(x, \mathbf{y}), \tag{4}$$

where $\mathcal{Y}_d^{\mathrm{vec}}$ denotes the set of all possible combinations of $\mathbf{y}$ for a node with degree $d$. Please note that, assuming a trivial straightforward generalization of information bottleneck decoders to irregular LDPC codes, one would stop at this point. In this case, look-ups in node-degree-specific tables $p_d(t|\mathbf{y})$ would replace the node operations, and only integer-valued cluster indices would be exchanged. However, in the next subsections, we demonstrate that such a design approach is not beneficial. Given these findings, we propose to include an additional integral step in the construction process.
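The averaging in (4) can be sketched in a few lines. Here each input vector $\mathbf{y}$ is represented by a flattened integer index, the degree-specific joint distributions are random placeholders, and the look-up tables are arbitrary; all of these are assumptions made purely to show the bookkeeping.

```python
import numpy as np

def cluster_joint(p_xy, lut, n_clusters):
    """p_d(x, t) = sum_y p_d(t|y) p_d(x, y) for a deterministic table t = lut[y]."""
    p_xt = np.zeros((p_xy.shape[0], n_clusters))
    for y, t in enumerate(lut):
        p_xt[:, t] += p_xy[:, y]
    return p_xt

def average_joint(weights, joints, luts, n_clusters):
    """Degree-averaged distribution according to (4)."""
    return sum(w * cluster_joint(joints[d], luts[d], n_clusters)
               for d, w in weights.items())

rng = np.random.default_rng(0)

def random_joint(n_inputs):
    """Placeholder joint distribution p_d(x, y) over n_inputs input combinations."""
    p = rng.random((2, n_inputs))
    return p / p.sum()

weights = {2: 0.4, 3: 0.6}                       # hypothetical omega_d
joints = {2: random_joint(4), 3: random_joint(8)}
luts = {2: np.array([0, 1, 1, 0]),
        3: np.array([0, 0, 0, 1, 0, 1, 1, 1])}

p_avg = average_joint(weights, joints, luts, n_clusters=2)
print(p_avg)
```

Note that the sum adds up entries with the same index $t$ across degrees, regardless of whether these entries convey the same belief about $x$.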

Message Alignment: A Graphical Perspective
In this subsection, some consequences of Definition (4) will be sketched, resulting in what we call the message alignment problem. Please recall that LLRs are not needed in the entire information bottleneck decoder during decoding, because only cluster indices $t$ are exchanged. However, analyzing the LLR interpretation of a particular cluster index $t$ in density evolution allows the limiting factor of the considered decoding method to be graphically illustrated for irregular codes. Figure 5a visualizes the meanings of certain cluster indices $t$ with respect to $x$ before and after averaging over the degree distribution according to (4). Here, $L_2(x|t) = \log \frac{p_2(X=0|t)}{p_2(X=1|t)}$ and $L_4(x|t) = \log \frac{p_4(X=0|t)}{p_4(X=1|t)}$ denote the conveyed meanings of a certain cluster $t$ with respect to $x$ of variable nodes with degree two and four, respectively, before (that is, without) averaging according to (4). Both compression variables have the same event space $\mathcal{T}$. However, the meaning conveyed by a certain cluster with respect to $X$ differs. This means that the cross and dot for the same index $t$ in Figure 5a do not superimpose. For instance, if $t = 13$ and $d = 2$, $L_2(x|t) \approx 2.88$, and if $d = 4$, $L_4(x|t) \approx 4.02$ (cf. arrows in Figure 5a). After application of (4), the averaged meaning, that is, $L_{\bar{d}}(x|t) = \log \frac{p_{\bar{d}}(X=0|t)}{p_{\bar{d}}(X=1|t)}$ (cf. black triangles in Figure 5a), lies somewhere in between and neither matches the originally intended belief of the variable node with degree two nor the one with degree four. Nevertheless, a closer look at Figure 5a leads to another interesting observation. Although the meanings corresponding to the same indices can differ significantly (i.e., $L_2(x|T = t) \neq L_4(x|T = t)$), quite similar meanings are expressed by different cluster indices. That is, $L_2(x|T = t_2) \approx L_4(x|T = t_4)$ for some $t_2 \neq t_4$. The explanation of this phenomenon is as follows. The clusterings $p_d(t|\mathbf{y})$ are determined independently for different $d$. Although by construction the realizations of the compression variables $t_d \in \mathcal{T}$ coincide, the realizations have no natural or physical interpretation which enables a meaningful order or relation (e.g., between $t_2 \in \mathcal{T}_2$ and $t_4 \in \mathcal{T}_4$). In other words, the messages are not aligned with respect to their belief on $x$. We argue that this misalignment inhibits a straightforward application of (4). Therefore, knowing the node degree $d$ is extremely important to recapture $L_d(x|t)$. However, this node degree is not available at a receiving node in message passing decoding. At this point, we defer further evidence to Section 5, where it is shown that information bottleneck decoders without message alignment exhibit a notable performance degradation.
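The effect described above can be reproduced with a toy example. The cluster meanings below are made-up numbers (they are not the values behind Figure 5a); the sketch only shows that the averaged LLR of an index $t$ always falls between the degree-specific LLRs and therefore matches neither when they differ.

```python
import numpy as np

def llr(p_xt):
    """Cluster meanings L(x|t) = log p(X=0|t) / p(X=1|t) from a joint p(x, t)."""
    return np.log(p_xt[0] / p_xt[1])

# Hypothetical joint distributions p_d(x, t) for degrees 2 and 4, |T| = 4.
p2 = 0.25 * np.array([[0.10, 0.30, 0.70, 0.90],
                      [0.90, 0.70, 0.30, 0.10]])
p4 = 0.25 * np.array([[0.02, 0.20, 0.80, 0.98],
                      [0.98, 0.80, 0.20, 0.02]])
weights = {2: 0.5, 4: 0.5}                 # hypothetical omega_d

p_avg = weights[2] * p2 + weights[4] * p4  # averaging as in (4)

L2, L4, L_avg = llr(p2), llr(p4), llr(p_avg)
for t in range(4):
    print(f"t={t}: L2={L2[t]:+.2f}  L4={L4[t]:+.2f}  averaged={L_avg[t]:+.2f}")
```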

Message Alignment: An Information-Theoretic Perspective
In Figure 5a we noted, by comparing the cross and dot markers, that providing the node degree $d$ together with the respective cluster index $t$ is required to recover the proper belief on the relevant bit $X$. However, this approach would result in a very impractical decoder that is tailored to the actual connections between variable nodes and check nodes in a particular code. We propose to solve the underlying message alignment problem using information-theoretical concepts. Therefore, in this subsection, we first derive an information-theoretic formulation of the message alignment problem.
From an information-theoretic point of view, $I(X; T, D)$ upper-bounds the information on $X$ of a node receiving an incoming discrete message $t$. Rewriting $I(X; T, D)$ using the chain rule of mutual information yields

$$I(X; T, D) = I(X; D|T) + I(X; T),$$

where $I(X; T)$ is the information about $X$ obtained by receiving $T$ alone and $I(X; D|T)$ is the additional information gained by also providing the node degree if the cluster index is already known. In turn, if $I(X; D|T) \approx 0$, from an information-theoretic point of view, conveying the delivering node degree $d$ in addition to the cluster index yields no information gain about $X$. Thus, we propose a message mapping construction which minimizes $I(X; D|T)$ such that exchanging the node degree in addition to the cluster index yields no information gain. Expanding the term $I(X; D|T)$ for a variable node yields

$$I(X; D|T) = \sum_{d} \sum_{t \in \mathcal{T}} p(d, t) \, D_{\mathrm{KL}}\{p_d(x|t) \,\|\, p_{\bar{d}}(x|t)\}, \tag{7}$$

where $p(d, t)$ is the joint probability of node degree and cluster index and $D_{\mathrm{KL}}\{\cdot\|\cdot\}$ denotes the Kullback-Leibler divergence.
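The chain rule and the Kullback-Leibler form of $I(X; D|T)$ are general identities, so they can be verified on any joint distribution of $(D, X, T)$. The sketch below uses a random placeholder distribution; the array layout `p[d, x, t]` and the function names are assumptions for illustration.

```python
import numpy as np

def mutual_information(p_ab):
    """Mutual information in bits for a two-dimensional joint distribution."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def conditional_mi_via_kl(p_dxt):
    """I(X; D|T) as the p(d, t)-weighted divergence between p(x|d, t) and p(x|t)."""
    p_dt = p_dxt.sum(axis=1)                 # p(d, t)
    p_xt = p_dxt.sum(axis=0)                 # p(x, t), averaged over degrees
    p_t = p_xt.sum(axis=0)
    total = 0.0
    for d in range(p_dxt.shape[0]):
        for t in range(p_dxt.shape[2]):
            total += p_dt[d, t] * kl_divergence(p_dxt[d, :, t] / p_dt[d, t],
                                                p_xt[:, t] / p_t[t])
    return total

rng = np.random.default_rng(1)
p_dxt = rng.random((3, 2, 4))                # placeholder: 3 degrees, 4 clusters
p_dxt /= p_dxt.sum()

i_x_td = mutual_information(p_dxt.transpose(1, 0, 2).reshape(2, -1))  # I(X; T, D)
i_x_t = mutual_information(p_dxt.sum(axis=0))                         # I(X; T)
i_x_d_given_t = conditional_mi_via_kl(p_dxt)
print(i_x_td, i_x_t + i_x_d_given_t)
```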
The result from (7) has a direct intuitive link to Figure 5a. If the cross and dot markers nearly superimposed for a particular cluster, the change in meaning introduced by averaging would be negligible. Similarly, in this case the Kullback-Leibler divergence between $p_d(x|t)$ and $p_{\bar{d}}(x|t)$ would also be small. If this holds on average for all clusters and node degrees, $I(X; D|T) \approx 0$. Motivated by the observation that similar meanings are expressed by different cluster indices, paired with (7), our general idea is to map the different $t_d$ onto a variable $z_d$ such that the meanings $p_d(x|z)$ are aligned. Consequently, we define message alignment as a reordering strategy described by a deterministic mapping $p_d(z|t)$ to obtain

$$\min_{p_d(z|t)} D_{\mathrm{KL}}\{p_d(x|t) \,\|\, p_{\bar{d}}(x|z)\}. \tag{8}$$

That is, message alignment assigns those indices $t_d$ with meanings $p_d(x|t)$ to a new cluster index $z_d$ which represents approximately the same meaning with respect to $X$. Figure 5b depicts the aligned meanings after application of the message alignment algorithm presented in the next subsection. As a consequence, $I(X; D|Z) \approx 0$ and now $z$ instead of $t$ is exchanged over the edges of the Tanner graph.

Message Alignment Algorithm
To the best of our knowledge, due to the structure of (8) and the restriction to deterministic mappings $p_d(z|t)$, deriving an optimal implicit solution for (8) is not possible. In the literature, iterative algorithms have been proposed to minimize problems involving the Kullback-Leibler divergence [29]. Thus, we propose an iterative algorithm as depicted in Figure 6. First, only the independently generated mappings for each node degree and the corresponding degree-specific cluster meanings $p_d(x|t)$ exist. The black squares in Figure 6 depict the actual message alignment steps. According to (8), in each step, a particular $p_d(x|t)$ is aligned with respect to the beliefs of the average distribution $p_{\bar{d}}(x|z)$ (cf. Figure 6). To start the algorithm, $p_{\bar{d}}(x|z)$ is initialized with an arbitrary mapping. To find a reordering $p_d(z|t)$ minimizing (8), a search over all candidates $t$ for the best new cluster index $z$ is performed. That is,

$$z^{*}(t) = \arg\min_{z} D_{\mathrm{KL}}\{p_d(x|t) \,\|\, p_{\bar{d}}(x|z)\}, \tag{9}$$

which yields $p_d(z|t)$. Afterwards, $p_d(x|z)$ is computed using the determined $p_d(z|t)$. Then, $p_{\bar{d}}(x, z)$ is updated similar to (4), but considering only the weights $\omega_d$ of the already-aligned nodes. As visualized in Figure 6, this averaged distribution $p_{\bar{d}}(x|z)$ is forwarded and serves as new input in the next alignment step, together with $p_{d'}(x|t)$, that is, the not-aligned cluster meanings of a different node with degree $d'$. Again, the search according to (9) is performed. Once $p_{d'}(z|t)$ is found, the averaged meaning is updated and forwarded. This process, as shown in Figure 6, is performed for all node degrees and is repeated iteratively.
Once stable solutions for all message alignment mappings $p_d(z|t)$ are determined, the final average joint distribution $p_{\bar{d}}(x, z)$ is computed:

$$p_{\bar{d}}(x, z) = \sum_{d} \omega_d \sum_{t \in \mathcal{T}} p_d(z|t) \, p_d(x, t).$$

This distribution is passed in discrete density evolution and leveraged to design the relevant-information-preserving mappings.
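The alignment procedure can be sketched as follows. This is a simplified reading of the algorithm in Figure 6 under several assumptions: all degrees use the same number of clusters, every cluster has nonzero probability, the arbitrary initialization is the identity mapping of the first degree, and the running average includes every mapping found so far. Function names and the toy meanings are hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in bits with clipping for stability."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log2(p / q)))

def remap(p_xt, z_of_t, n):
    """p_d(x, z) = sum_t p_d(z|t) p_d(x, t) for a deterministic mapping z = z_of_t[t]."""
    out = np.zeros((p_xt.shape[0], n))
    for t, z in enumerate(z_of_t):
        out[:, z] += p_xt[:, t]
    return out

def align(joints, weights, n_sweeps=5):
    """Return ({d: z_of_t}, averaged p(x, z)) for degree-specific joints p_d(x, t)."""
    degrees = list(joints)
    n = joints[degrees[0]].shape[1]
    maps = {degrees[0]: np.arange(n)}          # assumed arbitrary initialization
    for _ in range(n_sweeps):
        for d in degrees:
            # running average over all degrees aligned so far
            avg = sum(weights[k] * remap(joints[k], maps[k], n) for k in maps)
            z_of_t = np.zeros(n, dtype=int)
            for t in range(n):
                meaning = joints[d][:, t] / joints[d][:, t].sum()
                costs = [kl_divergence(meaning, avg[:, z] / avg[:, z].sum())
                         if avg[:, z].sum() > 0 else np.inf
                         for z in range(n)]
                z_of_t[t] = int(np.argmin(costs))   # search as in (9)
            maps[d] = z_of_t
    p_avg = sum(weights[d] * remap(joints[d], maps[d], n) for d in degrees)
    return maps, p_avg

# Toy meanings: degree 4 carries nearly the same beliefs as degree 2,
# but under permuted cluster indices.
p2 = 0.25 * np.array([[0.10, 0.30, 0.70, 0.90],
                      [0.90, 0.70, 0.30, 0.10]])
p4 = 0.25 * np.array([[0.88, 0.12, 0.32, 0.72],
                      [0.12, 0.88, 0.68, 0.28]])
maps, p_avg = align({2: p2, 4: p4}, {2: 0.5, 4: 0.5})
print(maps[2], maps[4])
```

In this toy case the mapping found for degree 4 undoes the permutation, so that equal indices $z$ carry nearly equal beliefs for both degrees.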
Figure 5b visualizes the LLRs $L_2(x|z)$ and $L_4(x|z)$, that is, $L_2(x|t)$ and $L_4(x|t)$ after applying the deterministic mappings $p_2(z|t)$ and $p_4(z|t)$ which were found using the algorithm described above. The LLRs are now aligned (i.e., $L_2(x|z) \approx L_4(x|z)$). Consequently, the average LLR $L_{\bar{d}}(x|z)$ is quite similar to $L_2(x|z)$ and $L_4(x|z)$ (i.e., the original beliefs can propagate through the Tanner graph without significant distortion). The next section investigates how this reduced distortion improves the decoding performance compared to information bottleneck decoders designed without message alignment. The look-up tables used while decoding are $p_d(z|\mathbf{y}) = \sum_{t \in \mathcal{T}} p_d(z|t) \, p_d(t|\mathbf{y})$.
We emphasize that message alignment does not affect the implementation complexity of an information bottleneck decoder. The look-up tables $p_d(z|\mathbf{y})$ have the same size as $p_d(t|\mathbf{y})$, and $z$ instead of $t$ is exchanged over the edges of the Tanner graph. Thus, introducing message alignment neither increases the number of needed look-up operations, nor does it increase the number of required look-up tables. However, when constructing the look-up tables offline, the message alignment algorithm has to be incorporated in the construction step. This results in slightly increased computational complexity of the construction process compared to the construction of information-bottleneck decoders without message alignment.
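For deterministic mappings stored as integer arrays, folding the alignment into the decoding table is a single indexing operation performed offline. The sketch below uses a made-up two-input table and a made-up alignment mapping, and cross-checks the result against the matrix form $\sum_t p_d(z|t)\,p_d(t|\mathbf{y})$.

```python
import numpy as np

# Hypothetical two-input table p_d(t|y): t = lut_t[y1, y2], with |Y| = |T| = 4.
lut_t = np.array([[0, 1, 2, 3],
                  [1, 0, 3, 2],
                  [2, 3, 0, 1],
                  [3, 2, 1, 0]])
# Hypothetical alignment mapping p_d(z|t): z = z_of_t[t].
z_of_t = np.array([3, 0, 1, 2])

# Offline composition: the aligned table has exactly the same shape as lut_t.
lut_z = z_of_t[lut_t]

# Cross-check with one-hot matrices P_zt (|Z| x |T|) and P_ty (|T| x |Y|^2).
P_ty = np.eye(4, dtype=int)[:, lut_t.ravel()]
P_zt = np.eye(4, dtype=int)[:, z_of_t]
P_zy = P_zt @ P_ty
print(lut_z)
```

At run time the decoder only ever reads `lut_z`, so neither the number of tables nor the number of look-ups changes.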

Optimizing the Node Structure
In this section, we devise an optimized structure of the information-bottleneck nodes. First, the general internal processing of the variable nodes and check nodes described in [10] is reviewed. We show that with this design approach, the number of required look-up tables and look-up operations depends linearly on the node degree. Especially for LDPC codes with a high code rate, where check nodes with very high degree exist, this linear relation increases the latency considerably. Our proposed tree-like look-up strategy enables more efficient decoding in terms of space complexity and latency.
Figure 1 depicts message passing decoding in a Tanner graph. Depending on the node type, the $M$ incoming discrete messages $\mathbf{y} = [y_1, \ldots, y_M]^T$ are processed in different ways to generate an outgoing message $t$. In general, it is possible to use the joint distribution $p(x, \mathbf{y})$ as input distribution of the information bottleneck algorithm. However, the vector $\mathbf{y}$ with $M$ entries, all taken from the discrete alphabet $\mathcal{Y}$, can take up to $|\mathcal{Y}|^M$ distinct values. In turn, a huge look-up table $p(t|\mathbf{y})$ with $|\mathcal{Y}|^M$ entries needs to be stored. Even for small node degrees, this exponential growth of entries prohibits a direct practical implementation. This problem is tackled by partitioning the equality constraint at the variable nodes and the (mod 2)-sum at the check nodes into a serial concatenation of simple partial operations with only two inputs. Such a so-called opened node is illustrated in Figure 7a. Due to the sequential concatenation of look-up tables, $M - 1$ different tables are required in each iteration and for each node degree. Since only two incoming discrete messages are used at a time, the size of each look-up table is $|\mathcal{Y}|^2$. Thus, instead of one large table with $|\mathcal{Y}|^M$ entries, in total $(M - 1) \cdot |\mathcal{Y}|^2$ entries need to be stored for a decoder designed as in [10]. While decoding using the sequential design, $M - 1$ look-up operations have to be performed.
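To make the difference between the two table counts concrete, the sketch below evaluates both expressions and chains two-input look-ups as in an opened node. The function names and the XOR toy table are illustrative assumptions. The sketch also assumes that intermediate results use the same alphabet size as the inputs.

```python
def direct_entries(card_y, m):
    """Entries of one table over all m inputs: |Y|**m."""
    return card_y ** m

def sequential_entries(card_y, m):
    """Entries of the m - 1 two-input tables of an opened node: (m - 1) * |Y|**2."""
    return (m - 1) * card_y ** 2

def sequential_lookup(tables, y):
    """Chain the look-ups: t_1 = tables[0][y_1][y_2], t_2 = tables[1][t_1][y_3], ..."""
    if len(tables) != len(y) - 1:
        raise ValueError("need exactly len(y) - 1 tables")
    t = y[0]
    for table, y_next in zip(tables, y[1:]):
        t = table[t][y_next]
    return t

# Example sizes for 4-bit messages and 22 inputs.
n_direct = direct_entries(16, 22)          # 16**22 = 2**88 entries
n_sequential = sequential_entries(16, 22)  # 21 * 256 = 5376 entries

# Toy check with binary messages and XOR tables (stand-ins for designed tables).
xor = [[0, 1], [1, 0]]
parity = sequential_lookup([xor, xor, xor], [1, 0, 1, 1])
```

In an actual information bottleneck decoder, each position in the chain would use its own offline-designed table rather than a shared XOR table.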
The design code rate of a regular LDPC code is $R = 1 - d_v/d_c$, where $d_v$ is the variable node degree and $d_c$ denotes the check node degree. Consequently, to achieve a high code rate, very large check node degrees are required. In optical communications, often $d_c > 20$. Hence, the sequential design is very inefficient in terms of memory demand and latency due to the large number of required look-up tables. Instead, we propose a more efficient look-up strategy that requires fewer tables. The proposed design is sketched in Figure 7b. We denote the stage of the look-up tree by $s$. We note that when using a tree-like pattern, the depth of the information bottleneck graph is reduced from $O(M - 1)$ to $O(2 \cdot \log_2(M))$. The proposed tree-like structure makes use of the following observation: since a look-up table depends only on the joint distribution, which is the same whenever the "type" of the incoming messages is the same, the table can be reused for the entire stage. That is, in the first stage (i.e., $s = 1$), always two incoming messages $y_{2i}, y_{2i+1}$ are combined which have the same probability distribution and thus yield the same joint distribution, no matter which particular incoming messages $y_i$ are considered. In the next stage (i.e., $s = 2$), the inputs to the look-up table are the results of the compression, that is, of the look-ups from the previous stage (i.e., $t_{1,i}$). Thus, these messages again have the same distribution and the same joint distribution. This procedure continues until the final stage $s = \log_2(M)$ is reached. In case $M$ is not a power of two, at most $2 \cdot \log_2(M)$ look-up stages are needed. See Appendix A for a detailed derivation. The algorithm to determine the actual number of needed stages if $M$ is not a power of two is also given in Appendix A. Table 1 contains an overview of the required memory depending on the chosen node structure.
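The reuse of one table per stage can be illustrated with a short sketch for the case that $M$ is a power of two. The function name `tree_lookup` and the XOR toy table are assumptions for illustration. The additional tables needed when $M$ is not a power of two (Appendix A) are deliberately not covered.

```python
def tree_lookup(stage_tables, y):
    """
    Combine len(y) = 2**S messages pairwise over S stages.
    stage_tables[s] is a single two-input table shared by all pairs of stage s + 1.
    """
    if len(y) != 2 ** len(stage_tables):
        raise ValueError("this sketch only handles len(y) == 2**len(stage_tables)")
    msgs = list(y)
    for table in stage_tables:
        # All pairs of one stage have the same joint distribution, so one table suffices.
        msgs = [table[msgs[i]][msgs[i + 1]] for i in range(0, len(msgs), 2)]
    return msgs[0]

# Toy check with binary messages: three stages combine eight inputs.
xor = [[0, 1], [1, 0]]
result = tree_lookup([xor, xor, xor], [1, 0, 1, 1, 0, 0, 1, 0])
```

Here eight inputs need three stored tables instead of the seven that a sequential chain would use. The look-ups within one stage are also independent of each other and could be carried out in parallel.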

Reusing Intermediate Results
In information-bottleneck decoders, the actual decoding simplifies to look-up operations in the offline-generated tables. To compute an outgoing message, all incoming messages are considered except for the one received over the edge on which the outgoing message is sent. Hence, each outgoing message is computed from a slightly different input vector $\mathbf{y}$. However, parts of the input vector do not change for several outgoing messages. Thus, the total number of look-up operations per node can be reduced by reusing intermediate results.
Assuming a sequential node structure, we note that, for example, if only $y_M$ is changed, all previous results $t_i$ for $i = 0, \ldots, M - 2$ can be reused. Thus, the total number of operations for all outgoing edges of a node with $M$ incoming messages is lower than when every outgoing message is computed from scratch. By exploiting the proposed tree structure, the number of look-up operations per node can be reduced even further compared to the sequential approach. The actual reuse potential depends largely on the internal structure of the tree. However, in the worst case, $O(M \log_2 M)$ operations per node are required. In the next section, the achievable gains in terms of latency and memory efficiency are investigated.

Investigation and Results
In this section, we apply message alignment and the proposed tree-like node structure during the construction of information-bottleneck decoders for irregular LDPC codes. The investigated irregular code is taken from [3], where several irregular LDPC codes relevant for optical communications were proposed and compared. We provide numerical simulations for our proposed information bottleneck decoder and several reference systems assuming transmission over an additive white Gaussian noise (AWGN) channel with BPSK modulation. We compare the decoders in terms of decoding performance, computational complexity, and memory demand.

Memory Demand
Given the degree distribution from [3], an information bottleneck decoder can be constructed as proposed in Section 3. We designed the decoder to perform at most $i = 50$ iterations. That is, 50 different tables for each node degree and node type have to be generated. For the considered code, Table 2 summarizes the memory demand. Clearly, the direct implementation with only one look-up table per iteration and node degree is not implementable due to the huge memory requirement of $104.6 \cdot 10^{24}$ kByte. The sequential approach proposed in [10] already yields a manageable complexity and allows the construction of an information bottleneck decoder that needs only 384 kByte of memory in total. Please note that, because decoding relies only on very simple look-up operations, and unlike in conventional LDPC decoders, the memory demand and the memory access times rather than the computational complexity are the parameters that determine the space complexity and the decoding throughput. Thus, it is crucial that the number of look-ups is reduced and that the look-ups are parallelized to simultaneously tackle decoding speed and memory demand. Table 2 shows that the proposed tree-like node structure also achieved notable improvements for practically relevant codes. In total, only 134.4 kByte of memory were required to realize the entire information bottleneck decoder. This is a reduction of 65%. Furthermore, the number of distinct look-up tables for the node with the highest degree (i.e., the check node with $d_c = 23$) could be reduced from 21 to 6, which is a reduction of 71%. Table 2 also includes the total number of look-ups needed to compute all outgoing messages per node and iteration. Clearly, the tree-like structure needs significantly fewer operations than the sequential structure, which enhances the decoding speed correspondingly.
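The two quoted reductions follow directly from the reported figures, as the short calculation below shows. The reading that the 21 sequential tables correspond to $M - 1$ with $M = d_c - 1 = 22$ extrinsic inputs is our interpretation and is not stated explicitly in the text.

```latex
\[
1 - \frac{134.4\ \text{kByte}}{384\ \text{kByte}} = 1 - 0.35 = 0.65,
\qquad
1 - \frac{6}{21} \approx 0.714,
\qquad
M - 1 = (d_c - 1) - 1 = 21 \quad \text{for } d_c = 23 .
\]
```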

Bit Error Rate (BER) Performance
For performance evaluation, BER curves for three conventional decoders and two information bottleneck decoders are shown in Figure 8. All compared systems performed at most $i = 50$ decoding iterations or stopped earlier if the syndrome check was satisfied. The properties of the decoders used are summarized in Table 3. The first reference decoder (cf. dash-dotted blue curve in Figure 8) was a belief propagation decoder which receives quasi-continuous LLRs from the channel and performs box-plus operations at the check nodes and summations at the variable nodes. Clearly, the processing of quasi-continuous LLRs in a digital implementation also requires very fine quantization. Standard processor architectures approximate the processing of continuous signals using single- or double-precision floating-point data types.
In a second step, we included a 4-bit channel output quantizer as described in [18] in our simulation. The belief propagation decoder then receives only 16 distinct LLRs from the quantizer. However, the internal processing still uses quasi-continuous LLRs. We denote this approach "belief propagation decoding with quantized channel output" (cf. dashed green curve in Figure 8).
The third considered decoding approach was min-sum decoding (cf. dotted red curve in Figure 8). Just like the belief propagation decoder, the min-sum decoder receives channel LLRs which are not quantized and need to be represented precisely. These LLRs are exchanged during message passing decoding. However, the box-plus operation at the check nodes is approximated in min-sum decoding, which makes the check node operation much simpler than the exact box-plus operation.
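For two incoming LLRs, the difference between the exact check node operation and its min-sum approximation can be written down directly. The sketch below uses the standard textbook definitions of both operations; the function names are chosen here for illustration and are not taken from the reference decoders of this paper.

```python
import math

def box_plus(a, b):
    """Exact combination of two LLRs at a check node: 2 * atanh(tanh(a/2) * tanh(b/2))."""
    # Note: for very large |a| and |b| the product rounds to +-1 and atanh fails;
    # practical implementations clip the LLRs beforehand.
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))

def min_sum(a, b):
    """Min-sum approximation: product of the signs times the smaller magnitude."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))

exact = box_plus(1.2, -0.8)
approx = min_sum(1.2, -0.8)
```

Both results carry the same sign, but the min-sum output is never smaller in magnitude than the exact value, so the approximation overstates the reliability of the outgoing message. This is one common explanation for the performance loss of min-sum decoding relative to belief propagation.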
The aim of information bottleneck decoders is to combine both domains (i.e., simple node operations and compressed messages) such that only a small number of bits has to be exchanged. We restricted the compression variable $T$ to a cardinality of 16 (i.e., 4 bits).
The solid magenta curve with "o" markers in Figure 8 corresponds to our proposed information bottleneck decoder, which uses message alignment during the construction. When constructed without message alignment, the information bottleneck decoder (cf. solid light-blue curve with "x" markers) diverged already at a BER of $5 \cdot 10^{-3}$. In contrast, the results obtained with our proposed information bottleneck decoder were remarkably close to the performance of a belief propagation decoder working with double-precision data types. Although only 16 different integers were processed by the information bottleneck decoder, the performance degradation was only 0.15 dB for BERs from $10^{-3}$ down to $10^{-5}$. The operations performed during information-bottleneck decoding were only simple look-ups in the offline-generated look-up tables, whereas the belief propagation reference simulation processed double-precision channel output values and also performed double-precision box-plus operations.
The performance degradation was even smaller if the belief propagation decoder with quantized channel LLRs was considered as reference. This decoder suffers from the same information loss caused by the channel quantizer as the information-bottleneck decoder. Finally, although the min-sum decoder exchanges double-precision LLRs, its performance was significantly worse than that of our proposed information-bottleneck decoder. As a consequence, the performance gain achieved by applying message alignment allows the construction of information bottleneck decoders for irregular LDPC codes which combine coarse quantization, low complexity, simple node operations, and close-to-optimum performance.

Conclusions
In this paper, we generalized the construction of information bottleneck decoders to arbitrary irregular LDPC codes with arbitrary rates. For this purpose, we first extended discrete density evolution to irregular codes and revealed that, due to the discrete and finite alphabets of the compression variables, a straightforward application of existing methods is not possible without an additional processing step. We derived the underlying information-theoretic optimization problem and devised a solution, which we call message alignment. By adding this extra step, we were able to build a 4-bit information bottleneck decoder in which all arithmetic operations were replaced by look-up tables mapping incoming 4-bit messages onto outgoing 4-bit messages. Our presented decoder is characterized by a significantly lower implementation complexity than conventional decoders. At the same time, the information-bottleneck decoder achieved a performance comparable to double-precision belief propagation (0.15 dB gap) and significantly better than that of min-sum decoders. To achieve a design with low latency and low memory demand, a tree-like node structure proved superior. With this structure, the memory required to store the look-up tables depends only logarithmically ($O(2 \cdot \log_2(M))$) on the number of processed messages $M$ instead of linearly ($O(M - 1)$). A similar improvement was achieved for the number of performed look-up operations. We believe that the demonstrated applicability to practically relevant irregular LDPC codes with arbitrarily high rates increases the relevance of information bottleneck decoders as promising candidates for finite-precision LDPC decoding in the context of optical communications.

Figure 1. Illustration of a Tanner graph and message passing between variable nodes (circles) and check nodes (squares).

Figure 2. Information bottleneck setup, where I(X; T) is the relevant information, I(X; Y) is the original mutual information, and I(Y; T) is the compression information.

Figure 3. Example of a factor graph (left) and an information bottleneck graph (right) of $p(t|\mathbf{y})$, where $\mathbf{y} = [y_1, \ldots, y_M]^T$. The information bottleneck graph compactly describes that $t$ shall keep relevant information on $x$ while compressing all variables in the vector $\mathbf{y}$.

Figure 4. (a) Discrete memoryless channel (DMC) with subsequent quantizer; (b) Information bottleneck graph of the respective information optimum quantizer.

Figure 5. Averaged and original cluster meaning of variable nodes with degree two and four (a) without and (b) with message alignment.

Figure 6. Flow graph of iterative message alignment.

Figure 7. Illustration of an opened node, where all $M$ incoming messages $y_1, \ldots, y_M$ are clustered (a) sequentially or (b) as proposed, in a tree-like manner.

Figure 8. Bit error rate (BER) performance of our proposed decoder and reference systems with properties summarized in Table 3. LLR: log-likelihood ratio.

Figure A1. Illustration of a tree-like structure if $M$ (a) is a power of two; (b) is not a power of two.

Table 1. Overview of the maximum number of required look-up tables and their sizes depending on the node structure.

Table 2. Overview of the number of look-up table entries and the total memory demand depending on the node structure.

Table 3. Properties of the compared decoding algorithms.