An LDPC Decoder Architecture for Wireless Sensor Network Applications

The pervasive use of wireless sensors in a growing spectrum of human activities reinforces the need for devices with low energy dissipation. In this work, coded communication between a pair of wireless sensor devices is considered as a method to reduce the dissipated energy per transmitted bit with respect to uncoded communication. Different Low Density Parity Check (LDPC) codes are considered for this purpose and post-layout results are shown for a low-area, low-energy decoder, which offers energy savings with respect to the uncoded solution in the range of 40%–80%, depending on the considered environment, distance and bit error rate.


Introduction
Wireless Sensor Networks (WSNs) have gained growing research interest in recent years. The possibility to monitor different physical quantities even in dangerous and hard-to-reach areas has found applications in several fields, including medical, industrial and surveillance environments [1]. WSNs are made of small nodes, where each node often relies on small, lightweight batteries. As a consequence, both energy consumption and area occupation are important aspects in the design of nodes. Although nodes feature a limited energy budget, they embody not only sensing but also computational and transmit/receive circuits. Thus, energy consumption is critical and ought to be minimized at every design level. As an example, in [2] several system level techniques, including modulation, Media Access Control (MAC) protocols and channel coding techniques, are analyzed to achieve energy efficiency in WSNs.
In [3] it is shown that in WSNs the transmission energy can be lowered if the receiver accepts error-affected data. In this case the receiver should embed error correction strategies to recover the original data. For this approach to be worthwhile, the energy spent to perform error correction should be significantly lower than the energy saved at the transmitter side. As an example, in [4,5] an energy efficient error correction scheme for WSNs is proposed. In particular, in [5] the physical layer of the IEEE 802.15.4 standard [6] is augmented by introducing interleaving and forward error correction. In [2,3] several classes of codes are investigated, including Reed-Solomon codes, convolutional codes, turbo codes and Low-Density-Parity-Check (LDPC) codes [7,8]. Experimental results in [3] show that LDPC codes are good candidates for WSN applications as they feature a significant coding gain compared with other codes. However, they consume about one order of magnitude more energy than simpler codes such as the extended Hamming ones. Most previous works proposing error correction codes for WSNs assume that networks contain at least two classes of nodes: sensing nodes and central nodes. Sensing nodes feature lower computational capabilities and less available energy than central nodes. Thus, sensing nodes send coded information to a central node, which performs the decoding operations. In contrast, this work investigates homogeneous WSNs where each node can both transmit and receive coded information. A similar idea is proposed in [9] with a focus on turbo codes. In particular, in [9] it is shown that the energy consumption of a homogeneous WSN is reduced by about 70% by resorting to turbo codes. In this work we show that even higher energy savings and smaller area can be achieved with LDPC codes. In particular, this work shows that small block length LDPC codes are adequate for the typical throughput and data transmission requirements of WSNs.
The paper is structured as follows: Section 2 deals with LDPC coding and decoding algorithms whereas Section 3 concentrates on modeling the WSN environment. Section 4 details the proposed LDPC decoder architecture and Section 5 shows the experimental results. Finally, in Section 6 conclusions are drawn.

Coding and Decoding Algorithms for LDPC Codes
LDPC codes are a class of linear block codes, characterized by a very sparse $M \times N$ parity-check matrix $H$, where valid codewords $x$ satisfy $H \cdot x' = 0$ and $(\cdot)'$ represents the transposition operator. Each LDPC code can be represented as a bipartite graph, known as Tanner graph [10], containing two sets of nodes: Variable Nodes (VNs) and Check Nodes (CNs). VNs are associated with the $N$ bits of the codeword, whereas CNs correspond to the $M$ parity-check constraints. Edges in the graph correspond to ones in $H$, and most decoding algorithms imply the exchange of information along the edges of the Tanner graph. The most common algorithm to decode LDPC codes is the Belief Propagation (BP) algorithm. The VNs receive the intrinsic information $\lambda$ (likelihood functions, i.e., probabilities) from the channel and update it depending on the results of the parity check equations computed at the CNs. This process is iterated several times until either the maximum number of iterations is reached or a convergence criterion is met. This criterion may be that a codeword was successfully decoded.
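The relation between the parity-check matrix, the codeword test and the Tanner graph can be illustrated with a minimal sketch. The 3 × 6 matrix below is a hypothetical toy example, not one of the codes studied in this work, and the helper names `syndrome` and `tanner_edges` are introduced here for illustration only.

```python
import numpy as np

# Toy parity-check matrix (hypothetical, far smaller and denser than a real
# LDPC code): M = 3 check nodes (rows), N = 6 variable nodes (columns).
H = np.array([
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
], dtype=np.uint8)

def syndrome(H, x):
    """Return H . x' over GF(2); an all-zero result means x is a codeword."""
    return (H @ x) % 2

def tanner_edges(H):
    """List the (check node, variable node) pairs, one per 1 in H."""
    return [(int(m), int(j)) for m, j in zip(*np.nonzero(H))]

codeword = np.array([1, 1, 0, 0, 1, 1], dtype=np.uint8)
corrupted = codeword.copy()
corrupted[0] ^= 1  # flip the first bit

print(syndrome(H, codeword))   # all parity checks satisfied
print(syndrome(H, corrupted))  # checks touching bit 0 fail
print(len(tanner_edges(H)))    # number of edges in the Tanner graph
```

Each failed entry of the syndrome identifies a CN whose parity constraint is violated, which is the information the decoder feeds back to the connected VNs.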
There are two main scheduling schemes for the BP [11]: two-phase scheduling and layered scheduling [12]. The latter nearly doubles the convergence speed compared with two-phase scheduling. In a layered decoder, parity-check constraints are grouped in layers, each of which is associated with a component code. Then, layers are decoded in sequence by propagating extrinsic information from one layer to the following one [12]. When all layers have been decoded, one iteration is complete and the overall process can be repeated up to the desired level of reliability.
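Let $S_j$ represent the Log-Likelihood-Ratio (LLR) of the bit in column $j$ of $H$. Bit LLR $S_j$ is initialized to the corresponding received soft value. Then, for each parity constraint $m$ in a given layer, the following operations are executed:

$$Q_{mj} = S_j^{(old)} - R_{mj}^{(old)} \tag{1}$$

$$A_{mj} = \sum_{n \in N(m),\, n \neq j} \Psi(Q_{mn}) \tag{2}$$

$$s_{mj} = \prod_{n \in N(m),\, n \neq j} \mathrm{sgn}(Q_{mn}) \tag{3}$$

$$R_{mj}^{(new)} = -s_{mj} \cdot \Psi^{-1}(A_{mj}) \tag{4}$$

$$S_j^{(new)} = Q_{mj} + R_{mj}^{(new)} \tag{5}$$

where $N(m)$ is the set of columns of $H$ with a one in row $m$ and $S_j^{(old)}$ is the extrinsic information received from the previous layer and updated in Equation (5) to be propagated to the succeeding layer. Term $R_{mj}^{(old)}$ is the message stored for entry $(m, j)$ of $H$ at the previous iteration and refreshed in Equation (4). Unfortunately, the computation of Equations (2) and (4) is complex, as $\Psi(\cdot)$ is a non-linear function. According to [13], Equation (2) can be simplified with a limited Bit-Error-Rate (BER) performance loss as

$$R_{mj}^{(new)} \approx -s'_{mj} \cdot \min_{n \in N(m),\, n \neq j} |Q_{mn}| \tag{6}$$

usually referred to as normalized-min-sum approximation, where $s'_{mj} = \sigma \cdot s_{mj}$ and $\sigma \leq 1$. For further details the reader can refer to [8,10].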
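The layered schedule with the normalized-min-sum approximation can be sketched in a few lines of floating-point Python. This is an illustrative sketch under stated assumptions, not the decoder implemented in this work: each layer contains a single parity check, the LLR convention is "positive means bit 0" (so the sign product enters with a plus sign, whereas the text uses a −s term), and the toy matrix and the function name `layered_nms_decode` are hypothetical.

```python
import numpy as np

def layered_nms_decode(H, llr, sigma=0.875, max_iter=10):
    """Layered normalized-min-sum decoding, one parity check per layer.

    Assumed convention (not taken from the paper): LLR > 0 means bit 0.
    Every row of H must contain at least two ones.
    """
    M, N = H.shape
    rows = [np.flatnonzero(H[m]) for m in range(M)]  # N(m) for each check
    S = np.asarray(llr, dtype=float).copy()           # bit LLRs S_j
    R = np.zeros((M, N))                              # check messages R_mj
    hard = (S < 0).astype(np.uint8)
    for _ in range(max_iter):
        for m, cols in enumerate(rows):
            Q = S[cols] - R[m, cols]                  # remove old contribution
            mag = np.abs(Q)
            order = np.argsort(mag)
            m1, m2 = mag[order[0]], mag[order[1]]     # two smallest magnitudes
            sign = np.where(Q < 0, -1.0, 1.0)
            total_sign = np.prod(sign)
            # Exclude each edge's own input: use m2 on the edge that holds m1.
            min_excl = np.where(np.arange(len(cols)) == order[0], m2, m1)
            # total_sign * sign is the product of all the *other* signs.
            R_new = sigma * total_sign * sign * min_excl
            S[cols] = Q + R_new                       # propagate to next layer
            R[m, cols] = R_new
        hard = (S < 0).astype(np.uint8)
        if not ((H @ hard) % 2).any():                # all checks satisfied
            break
    return hard

# Toy example (hypothetical 3 x 6 matrix, all-zero codeword transmitted).
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]], dtype=np.uint8)
llr = [2.0, 1.5, -0.5, 1.8, 2.2, 1.7]   # bit 2 received with the wrong sign
decoded = layered_nms_decode(H, llr)
print(decoded)
```

Because the updated $S_j$ values are written back immediately, the second and third checks already see the corrections produced by the first one within the same iteration, which is what gives layered scheduling its faster convergence.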
A key concern in the design of high-throughput LDPC decoders comes from the communication structure that must be allocated to support message passing among VNs and CNs. Three approaches can be followed in the high-level organization of the decoder: fully parallel, partially parallel and serial. The first approach leads to very high throughput, large implementation cost and severe congestion problems in the routing of interconnects [14]. For these reasons it is not adopted in practical implementations. The partially parallel architecture requires a large bandwidth between processing units and the memories where messages are stored. Moreover, special attention is necessary to avoid collisions in the memory access [15]. However, the partially parallel organization allows the degree of parallelism to be tuned precisely to the addressed throughput, and it has proved to be the best solution for the implementation of efficient decoders [15-19]. The serial approach leads to low-cost and low-power implementations and it also offers a high level of flexibility with respect to the supported code. However, serial architectures have not received much attention, because sequential processing does not achieve large throughput. This solution is particularly suitable for software implementations on Digital Signal Processors [20]. As throughput requirements in WSN applications are usually much lower than in wireless communications, the serial approach appears to be the best solution to implement low-cost and low-energy decoding in a sensor node.

Wireless Sensor Network Environment and Modeling
Required throughput and energy budget are important parameters to model the environment of a WSN. Although the throughput depends on the application, several recent works [21-24] as well as off-the-shelf products for the IEEE 802.15.4 standard target a throughput $T$ of 250 kb/s. According to [3], the amount of energy per bit saved due to the use of a correcting code ($\Delta E$) can be expressed as

$$\Delta E = E_{TX,U} - \left(E_{TX,C} + E_{enc} + E_{dec}\right) \tag{7}$$

where $E_{TX,U}$ and $E_{TX,C}$ are the amounts of energy per information bit spent to transmit one bit in an uncoded and coded system respectively. $E_{enc}$ and $E_{dec}$ are the amounts of energy per bit spent by the LDPC encoder and decoder. Assuming a Binary-Phase-Shift-Keying (BPSK) modulation, each $E$ term in Equation (7) can be written as the ratio between the power consumption $P$ and the throughput $T$ of the corresponding task. For a fair comparison we assume that the throughput sustained by the transmitter is the same for both the uncoded and coded case. As a consequence, Equation (7) can be rewritten as

$$\Delta E = \frac{P_{TX,U} - P_{TX,C} - P_{enc} - P_{dec}}{T} \tag{8}$$

However, as shown in [25] and [26], the complexity and the power consumption of LDPC encoding are negligible with respect to decoding. As a consequence, in the following the $P_{enc}$ term will be neglected. Moreover, as highlighted in [3], each $P_{TX}$ term can be written as a function of the path loss $A(d)$ at a given distance $d$, the thermal noise $N_0 \cdot B$ (where $B$ is the signal bandwidth and $N_0$ is the noise power spectral density), the Signal-to-Noise-Ratio (SNR) at the receiver and the receiver noise figure $F$:

$$P_{TX} = A(d) \cdot N_0 \cdot B \cdot SNR \cdot F \tag{9}$$

According to [27],

$$A(d) = \left(\frac{4\pi}{\lambda}\right)^2 d^n \tag{10}$$

where $\lambda$ is the wavelength of the corresponding carrier frequency $f$ and $n$ is the path loss exponent; $n = 2$ and $n = 4$ are good approximations for free space and dense environment propagation respectively. Assuming the same $A(d)$ and $F$ values for both uncoded and coded systems, Equation (8) can be rewritten as

$$\Delta E = \frac{A(d) \cdot N_0 \cdot B \cdot F \cdot \left(SNR_U - SNR_C\right) - P_{dec}}{T} \tag{11}$$

where $SNR_U$ and $SNR_C$ are the SNR at the receiver in the uncoded and coded systems respectively.
Thus, given the curves representing the BER of each system as a function of the SNR, we obtain for each BER value the amounts $SNR_U$ and $SNR_C$, with $SNR_G = SNR_U - SNR_C$ representing the SNR gain achieved using error correction. So Equation (11) can be rewritten as

$$\Delta E = \frac{A(d) \cdot N_0 \cdot B \cdot F \cdot SNR_G - P_{dec}}{T} \tag{12}$$

The expression obtained in Equation (12) will be used in Section 5 to show the effectiveness of the proposed LDPC architecture.
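The trade-off between the SNR gain and the decoder energy can be made concrete with a short sketch. It assumes that the transmit energy per bit is proportional to the SNR required at the receiver, that the encoder energy is negligible, and that the coding gain is supplied in dB and converted to a linear ratio; the function name `saved_fraction` and all numerical inputs in the example are hypothetical and are not results of this work.

```python
def saved_fraction(snr_gain_db, p_dec_w, e_tx_uncoded_j, throughput_bps=250e3):
    """Fraction of the uncoded transmit energy per bit saved by coding.

    Assumptions (illustrative): transmit energy scales linearly with the
    required SNR, encoder energy is neglected, and the decoder energy per
    bit is its power divided by the throughput.
    """
    tx_saving = 1.0 - 10.0 ** (-snr_gain_db / 10.0)   # 1 - SNR_C / SNR_U
    decoder_cost = (p_dec_w / throughput_bps) / e_tx_uncoded_j
    return tx_saving - decoder_cost

# Hypothetical operating point: 6 dB of coding gain, a 100 uW decoder and
# 27 nJ/bit of uncoded transmit energy.
print(saved_fraction(6.0, 100e-6, 27e-9))
```

A negative return value means that the decoder spends more energy than the transmitter saves, which is the situation the coded system has to avoid.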

LDPC Decoder Architecture Design
LDPC codes are known to nearly achieve the Shannon limit when the block of data is very large ($N \to \infty$) [10]. However, in WSN applications the number of bits exchanged by nodes is limited, leading to small $N$ values. Nevertheless, in [28,29] it is shown that LDPC codes can achieve excellent performance even when $N$ is small. In this work, we analyze the shortest LDPC code from the IEEE 802.16e standard [30], which has $N = 576$ coded bits and $K = R \cdot N = 288$ uncoded bits ($R = 0.5$). Moreover, we considered the two best performing regular codes with $N = 96$ and $N = 204$ ($K = 48$ and $K = 102$ respectively), taken from MacKay's database [31] and referred to as 96.33.966 and 204.33.484 ($R = 0.5$ for both).
In order to size the LDPC decoder architecture, a finite precision analysis ought to be performed. Given that $p_S$ and $p_R$ are the numbers of bits used to represent the $S_j$ and $R_{mj}$ metrics respectively, as in Equations (1)-(6), simulations have been carried out for $p_S \in \{5, 6\}$ and $p_R \in \{3, 4\}$; the normalized-min-sum approximation with $\sigma = 0.875$ has been employed. The performance of the three considered codes is shown in Figures 1-3, both in the floating point and fixed point cases, together with the performance of the corresponding uncoded system. Furthermore, it has been observed that targeting a BER of $10^{-4}$ as in [3,9] and imposing a maximum of ten iterations ($I = 10$), the performance loss is negligible. Due to the low throughput required, we assume that a fully serial processor architecture, which executes the decoding algorithm on one CN at a time, is a reasonable solution. In this case the throughput sustained by the architecture, defined as the number of decoded bits over the decoding time, is

$$T = \frac{N \cdot R \cdot f_{clk}}{I \cdot M \cdot d_c^{max} + D} \tag{13}$$

where $f_{clk}$ is the decoder clock frequency, $I$ is the maximum number of iterations, $d_c^{max}$ is the maximum degree of a CN, i.e., the maximum number of edges on a CN, and $D$ is the latency of the architecture. It is worth noting that Equation (13) can be adapted to parallel and partially parallel architectures by substituting $M$ with $M/W$, where $W$ is the number of rows (in $H$) processed in one clock cycle. The latency $D$ in Equation (13) can be minimized by avoiding idle cycles between iterations, so that $D = d_c^{max}$. Thus, since $M = N \cdot (1 - R)$, the throughput can be approximated as

$$T \approx \frac{R \cdot f_{clk}}{(1 - R) \cdot I \cdot d_c^{max}} \tag{14}$$

As can be observed, the throughput increases with $R$, so low-rate codes are a conservative choice to achieve the target throughput. Moreover, if we fix $N$ we observe that increasing the rate has the effect of reducing the BER performance of the code. Thus, we considered the $N = 204$, $R = 0.5$ code and tried to increase both $N$ and $R$.
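The sizing of the clock frequency can be checked numerically with a small sketch. It assumes that a serial decoder spends $I \cdot M \cdot d_c^{max} + D$ clock cycles to deliver $N \cdot R$ information bits and that $M = N \cdot (1 - R)$, which is one reading of the throughput expression discussed above; the function names are hypothetical. The example uses $d_c^{max} = 7$, the value quoted in this section for the IEEE 802.16e $N = 576$, $R = 0.5$ code.

```python
def serial_throughput(n, rate, f_clk_hz, iters, dc_max, latency=None):
    """Decoded information bits per second for a one-check-at-a-time decoder."""
    m = round(n * (1 - rate))          # number of parity checks (assumed)
    if latency is None:
        latency = dc_max               # no idle cycles between iterations
    cycles = iters * m * dc_max + latency
    return n * rate * f_clk_hz / cycles

def min_clock(target_bps, rate, iters, dc_max):
    """Clock frequency needed to reach target_bps when the latency is neglected."""
    return target_bps * iters * dc_max * (1 - rate) / rate

# IEEE 802.16e code considered in the text: N = 576, R = 0.5, d_c^max = 7.
print(min_clock(250e3, 0.5, 10, 7))                 # minimum clock in Hz
print(serial_throughput(576, 0.5, 20e6, 10, 7))     # throughput at 20 MHz
```

With these inputs the minimum clock evaluates to 17.5 MHz, and a 20 MHz clock leaves some margin over the 250 kb/s target.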
From MacKay's database [31] we considered the following two high-rate codes where $N > 204$: $N = 273$, $R = 0.7$ and $N = 495$, $R = 0.87$, referred to as 273.82.3.353 and 495.62.3.2915 respectively. As shown in Figure 4, the BER performance of both codes is worse than the one obtained for $N = 204$, $R = 0.5$. Furthermore, codes with $N > 204$ require a larger amount of memory than the $N = 204$, $R = 0.5$ code. From this analysis we infer that for the most complex code among the ones considered in this work, i.e., $d_c^{max} = 7$ for the IEEE 802.16e $N = 576$, $R = 0.5$ code, and given the target throughput $T = 250$ kb/s and $I = 10$, Equation (14) leads to $f_{clk} \geq 17.5$ MHz. In this work we fix $f_{clk} = 20$ MHz as a conservative value. Thus, the proposed architecture, inspired by the data-path of the solution proposed in [32], is made of four blocks, as shown in the bottom part of Figure 5(a): a processing element (PE) devoted to implementing the computation described in Equations (1)-(6) with the normalized-min-sum approximation; the S and R memories, where the $S_j$ and $R_{mj}$ metrics are stored; and an address generator. As depicted in the upper part of Figure 5 [...] The MEU, detailed in the upper part of Figure 5 [...] The CMP block and the multiplication unit are shown in the bottom part of Figure 5(b). The CMP block compares $|Q_{mj}|$ with $M_1$. If they are equal, $M_2$ is passed to the multiplication unit. The multiplication unit does not contain a real multiplier, as $\sigma = 0.875 = 1 - 1/8$ requires only a subtractor and a hard-wired three-bit right shift (>> 3). In order to take into account the $-s_{mj}$ term, two multiplexers driven by $-s_{mj}$ are added to obtain $R_{mj}^{(new)}$ as in Equation (6).
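The behavior of the CMP block and of the multiplier-free scaling can be mimicked with integer arithmetic. The sketch below is a behavioral illustration, not the VHDL of the proposed PE: it assumes that $M_1$ and $M_2$ are the smallest and second smallest input magnitudes of the check node, as is usual in min-sum decoders, and it applies the sign product with a plus sign (the text uses a $-s_{mj}$ term, which depends on the LLR convention). The function names are hypothetical.

```python
def scale_0875(x):
    """Multiply a non-negative integer by 0.875 = 1 - 1/8 without a multiplier."""
    return x - (x >> 3)   # subtractor plus hard-wired 3-bit right shift

def check_node_outputs(q_values):
    """Normalized-min-sum outputs of one check node for integer inputs Q_mj."""
    mags = [abs(q) for q in q_values]
    m1 = min(mags)                               # smallest magnitude (assumed M1)
    idx = mags.index(m1)
    m2 = min(mags[:idx] + mags[idx + 1:])        # second smallest (assumed M2)
    sign_prod = 1
    for q in q_values:
        if q < 0:
            sign_prod = -sign_prod
    outputs = []
    for q, mag in zip(q_values, mags):
        chosen = m2 if mag == m1 else m1         # CMP: compare |Q_mj| with M1
        own_sign = -1 if q < 0 else 1
        # sign_prod * own_sign equals the product of the other inputs' signs.
        outputs.append(sign_prod * own_sign * scale_0875(chosen))
    return outputs

print(scale_0875(16))                        # 16 - 2 = 14
print(check_node_outputs([12, -5, 8, 20]))   # [-5, 7, -5, -5]
```

Comparing magnitudes for equality, rather than tracking the position of the minimum, remains correct when two inputs share the smallest magnitude, because in that case $M_2 = M_1$.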

Experimental Results
The proposed architecture has been described in VHDL. The complete design flow, including synthesis, place and route, has been performed with Synopsys Design Compiler and Cadence Encounter on a 90 nm CMOS standard cell technology with 9 levels of metal and a supply voltage of 1 V. Post place and route simulations were run to obtain accurate capacitances and switching activities [33], which are necessary for estimating the power consumption. Area and power consumption results for the three codes analyzed in Section 4 with $p_S \in \{5, 6\}$, $p_R \in \{3, 4\}$ and $f_{clk} = 20$ MHz are shown in Table 1. It is worth noting that it is difficult to make a fair comparison of the proposed architectures with other solutions proposed in the literature because the target applications are different. However, for the sake of completeness, in Table 2 several LDPC decoder architectures are compared with the most area demanding and power consuming solution among the proposed ones ($N = 576$, $p_S = 6$, $p_R = 4$, last row of Table 1).
As can be observed, most solutions proposed in the literature address partially parallel architectures designed for wireless communications and broadcasting applications. As a consequence, they are sized to obtain a throughput of hundreds of Mb/s or even Gb/s with large blocks of data. In contrast, the proposed serial architecture is specifically tailored for WSN applications where throughput and block length are much smaller; we assume here $T \leq 250$ kb/s and $N \leq 576$. Since the considered architectures have been designed on different technologies, we scale them all to the 90 nm technology node ($A_{90}$) for the sake of fairness. The scaling is obtained by multiplying the area (fifth column in Table 2) by $(90/F)^2$, where $F$ is the feature size in nm shown in the fourth column of Table 2. As expected, the proposed architecture is about one order of magnitude smaller than the other ones (fifth and sixth columns in Table 2). On the other hand, partially parallel architectures consume less energy per bit and less energy per bit per iteration than serial solutions (eleventh and twelfth columns in Table 2). Assuming that area and energy consumption are the most important metrics to choose a decoder architecture for WSN applications, we introduce two figures of merit. The first one is the normalized area $\Phi_A(k) = A_{90}(k) / \min_k \{A_{90}(k)\}$, where $A_{90}(k)$ is the area of the $k$-th architecture scaled to the 90 nm technology node. The second one is the normalized energy per bit per iteration $\Phi_E(k) = E_I(k) / \min_k \{E_I(k)\}$, where $E_I(k)$ is the energy per bit per iteration of the $k$-th architecture. These two figures of merit represent how far an architecture is from the minimum area and the minimum energy per bit per iteration respectively. Assuming that $\Phi_A$ and $\Phi_E$ are equally important, their product shows which architecture is best suited for WSN applications among the compared ones. As shown in the last column of Table 2, the proposed architecture is the one with minimum $\Phi = \Phi_A \cdot \Phi_E$.
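The two figures of merit and their product are straightforward to compute once the scaled areas and the energies per bit per iteration are known. The sketch below uses made-up numbers purely to show the mechanics; the design names and values are hypothetical and do not reproduce Table 2.

```python
def figures_of_merit(designs):
    """Return Phi = Phi_A * Phi_E for each design.

    designs maps a name to (area scaled to 90 nm, energy per bit per iteration);
    any consistent units can be used because both terms are normalized.
    """
    min_area = min(area for area, _ in designs.values())
    min_energy = min(energy for _, energy in designs.values())
    return {
        name: (area / min_area) * (energy / min_energy)
        for name, (area, energy) in designs.items()
    }

# Hypothetical inputs: one small serial design and two larger parallel ones.
designs = {
    "serial": (0.2, 3.0),
    "partially_parallel_a": (3.0, 0.5),
    "partially_parallel_b": (6.0, 0.4),
}
phi = figures_of_merit(designs)
best = min(phi, key=phi.get)
print(phi, best)
```

In this made-up case the serial design loses on energy per bit per iteration but wins on the product, because its area advantage is larger than its energy penalty.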
It is worth noting that, as shown in the last two rows of Table 2, the proposed architecture shows better area and energy figures than the recently proposed turbo decoder architecture for WSN applications described in [9]. As highlighted in [36], several standards have been proposed for WSNs. Interestingly, most of them rely on the physical layer of the IEEE 802.15.4 standard. Thus, to evaluate the gain of the proposed architecture in a WSN environment we assume typical parameters taken from the IEEE 802.15.4 standard, namely $f = 2.4$ GHz and $B = 80$ MHz, and we fix $d = 50$ m. Moreover, employing an ultra-low-power low-noise amplifier such as the one proposed in [37], we can fix $F = 3.8$ dB.
In the following we investigate the energy saving obtained for a path loss exponent equal to three and four respectively, to model either typical indoor environments and outdoor urban/suburban foliated areas [38] or dense outdoor urban environments [39]. From Equation (12) the energy per bit required by an uncoded system ranges from tens of nJ/bit to a few µJ/bit depending on the considered path loss exponent value. As a consequence, to obtain a more meaningful figure we compute the percentage of saved energy per bit with respect to the energy per bit of an uncoded system ($\Delta E / E_{TX,U}$) as a function of the BER. The percentage of saved energy as a function of the BER for all the results shown in Table 1 is depicted in Figures 6 and 7 for $n = 3$ and $n = 4$ respectively.
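The order of magnitude of the uncoded energy per bit can be reproduced with a short calculation. This sketch relies on several assumptions that are not stated in the text: a path loss of the form $(4\pi/\lambda)^2 \cdot d^n$, a noise density $N_0 = kT_0$ with $T_0 = 290$ K, a transmit power equal to path loss × noise power × noise figure × required SNR, and a required SNR of about 8.4 dB, which is the $E_b/N_0$ of uncoded BPSK at a BER of $10^{-4}$. The remaining parameters ($f$, $B$, $d$, $F$, $T$) are the ones quoted in this section.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def uncoded_energy_per_bit(d_m, n, snr_db, f_hz=2.4e9, b_hz=80e6,
                           nf_db=3.8, throughput_bps=250e3, temp_k=290.0):
    """Transmit energy per bit of the uncoded link under the assumed model."""
    wavelength = SPEED_OF_LIGHT / f_hz
    path_loss = (4 * math.pi / wavelength) ** 2 * d_m ** n   # assumed A(d)
    noise_power = K_BOLTZMANN * temp_k * b_hz                # assumed N0 * B
    p_tx = path_loss * noise_power * 10 ** (nf_db / 10) * 10 ** (snr_db / 10)
    return p_tx / throughput_bps

e3 = uncoded_energy_per_bit(50.0, 3, 8.4)
e4 = uncoded_energy_per_bit(50.0, 4, 8.4)
print(f"n = 3: {e3 * 1e9:.1f} nJ/bit, n = 4: {e4 * 1e6:.2f} uJ/bit")
```

Under these assumptions the result is a few tens of nJ/bit for $n = 3$ and roughly one µJ/bit for $n = 4$, in line with the range quoted above; the ratio between the two cases is simply $d = 50$.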
As can be observed, both for $n = 3$ and $n = 4$ at a BER of $10^{-4}$ the percentage of saved energy is more than 50% and, in the best case, it reaches 80%. It is worth pointing out that when a code reaches the error floor region, the percentage of saved energy is at its maximum and then decreases. Thus, the best energy saving performance is achieved in the waterfall region of the code.

Conclusions
Notwithstanding continuous progress in the capacity of batteries, minimizing the energy dissipation is still one of the key objectives in the design of most sensor devices. In particular, transmission energy is a relevant component of the overall energy budget of a wireless sensor. This paper explores the use of LDPC codes to protect sent information against channel errors, thus allowing for a lower transmission energy. The energy that is saved at the transmission side depends on the coding gain of the selected code: the more powerful the code, the larger the saved energy. However, a decoder is required at the receiver side to reconstruct the original information. The node-to-node communication throughput is low in wireless sensor applications, and this enables the design of a fully serial decoding architecture with limited implementation complexity and extremely low dissipated power. The additional energy consumed by the decoder has been evaluated by means of logic synthesis and layout generation. Final results show that energy savings as high as 80% can be achieved with the coded approach with respect to the usual uncoded transmission.