Capacity Bounds and Mapping Design for Binary Symmetric Relay Channels

Capacity bounds for a three-node binary symmetric relay channel with orthogonal components at the destination are studied. The cut-set upper bound and the rates achievable using decode-and-forward (DF), partial DF and compress-and-forward (CF) relaying are first evaluated. Then relaying strategies with finite memory-length are considered. An efficient algorithm for optimizing the relay functions is presented. The Boolean Fourier transform is then employed to unveil the structure of the optimized mappings. Interestingly, the optimized relay functions exhibit a simple structure. Numerical results illustrate that the rates achieved using the optimized low-dimensional functions are either comparable to those achieved by CF or superior to those achieved by DF relaying. In particular, the optimized low-dimensional relaying scheme can improve on DF relaying when the quality of the source-relay link is worse than or comparable to that of other links.


Introduction
In this paper we consider a relay network consisting of a source, a relay, and a destination, as illustrated in Figure 1. This channel is considered in [1,2]. The communication task is to reproduce the transmitted message $M$, uniformly chosen from the set $\mathcal{M} = \{1, 2, \ldots, 2^{nR}\}$, at the destination such that the probability of error is arbitrarily small. The transmission of a message consumes $n$ channel uses. We wish to quantify the supremum of the set of rates $R$ for which the average message error probability at the destination can be made to approach zero as the number of channel uses $n$ goes to infinity. This number is the capacity, $C$, of the communication between the source and the destination. The capacity of the general relay channel is still an open problem. One special case of the general relay channel is the Gaussian relay channel, which is studied, e.g., in [2-9]. In this paper, however, we focus on a relay network in which each link is a binary symmetric channel. Some instances of the binary symmetric relay channel (BSRC) are considered in [10-14]. The work of [10,11] considers BSRCs with correlated noises, and [12,13] focus on the detect-and-forward protocol for multi-hop relaying and the relay channel, respectively. In this paper, we consider a model that is also studied in an independent work [14]. We study decode-and-forward (DF), partial decode-and-forward (PDF), compress-and-forward (CF) and general finite-memory relay mappings. One of our main contributions is to propose a general structure for finite-dimensional relaying. We show that one can improve on the result presented in [14] and illustrate that it is possible to approach the capacity upper bound in some cases by using the proposed low-dimensional mappings. Interestingly, we recover the lower bound in [14] by a simple one-dimensional mapping at the relay. We also illustrate that one can obtain higher reliable rates by employing optimized relaying functions with larger memories. The rates obtained via optimized finite-length mappings can be superior to those achieved by the DF protocol in some cases and are comparable to those achieved by the CF protocol in general.
The remainder of the paper is organized as follows. In Section 2, we present the three-node BSRC with orthogonal components at the destination. Section 3 derives capacity bounds for the BSRC when the relay is assumed to have infinite memory. Section 4 investigates capacity bounds for the BSRC when the relay is assumed to have a finite memory, and presents an algorithm to optimize the relay mapping. In Section 5 we compare the achievable rates as a function of the memory length. Finally, Section 6 concludes the paper.
Notation: We use the following notation for brevity.
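The definitions that follow this sentence did not survive in this copy. As a hedged aid to reading the later expressions, the sketch below assumes that $H_b(\cdot)$ denotes the binary entropy function and that $a * b$ denotes the binary convolution $a(1-b) + b(1-a)$, which is consistent with $Z_r \oplus Z_2 \sim \mathrm{Ber}(\epsilon_r * \epsilon_2)$ in Appendix E. The function names are illustrative only.

```python
import math

def binary_entropy(p: float) -> float:
    """Assumed meaning of H_b(p): binary entropy in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def binary_conv(a: float, b: float) -> float:
    """Assumed meaning of a * b: crossover probability of two cascaded BSCs."""
    return a * (1.0 - b) + b * (1.0 - a)

# Two independent Bernoulli noises added modulo two: Ber(0.1) xor Ber(0.2)
print(binary_conv(0.1, 0.2), binary_entropy(binary_conv(0.1, 0.2)))
```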

Binary Symmetric Relay Channel
In this section, we introduce the channel model that we consider in the paper. Figure 2 shows a BSRC consisting of three nodes: a source, a relay and a destination. In this model, we assume that the received signals at the destination are orthogonal; that is, the signals transmitted from the source and the relay do not interfere with each other. We further assume that all links are corrupted by modulo-two additive noise distributed according to the Bernoulli distribution and that all quantities are binary, i.e., $X, X_r, Y_1, Y_2, Y_r \in \{0, 1\}$.
The received signal at the relay, $Y_r$, is given by

$$Y_r = X \oplus Z_r, \tag{1}$$

where $X$ is the symbol transmitted from the source and $Z_r \sim \mathrm{Ber}(\epsilon_r)$ is the additive Bernoulli noise. (The Bernoulli noise models, for example, the conventional communication set-up using BPSK modulation followed by a hard decision; see also [14].) The signal received from the source at the destination, $Y_1$, is given by

$$Y_1 = X \oplus Z_1, \tag{2}$$

where $Z_1 \sim \mathrm{Ber}(\epsilon_1)$ is the additive Bernoulli noise. Similarly, the signal received from the relay at the destination, $Y_2$, is given by

$$Y_2 = X_r \oplus Z_2, \tag{3}$$

where $X_r$ is the symbol transmitted from the relay and $Z_2 \sim \mathrm{Ber}(\epsilon_2)$ is the additive Bernoulli noise. We assume that the random variables $Z_r$, $Z_1$, and $Z_2$ are mutually independent. Note that the addition in Equations (1)-(3) is performed in GF(2).
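The channel model above can be sketched as a small simulation. This is only an illustration under the stated model (three independent binary symmetric links, orthogonal reception); the names `bsc`, `simulate_bsrc` and `relay_fn` are hypothetical and not taken from the paper.

```python
import random

def bsc(bits, eps, rng):
    """Pass bits through a binary symmetric channel with crossover probability eps."""
    return [b ^ int(rng.random() < eps) for b in bits]

def simulate_bsrc(x, relay_fn, eps_r, eps_1, eps_2, rng):
    """Send block x over the orthogonal BSRC; returns (y_r, y_1, y_2)."""
    y_r = bsc(x, eps_r, rng)    # source -> relay link
    y_1 = bsc(x, eps_1, rng)    # source -> destination link
    x_r = relay_fn(y_r)         # relay processing (hypothetical callable)
    y_2 = bsc(x_r, eps_2, rng)  # relay -> destination link, orthogonal to y_1
    return y_r, y_1, y_2

rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(8)]
# Memoryless forwarding relay: x_r = y_r
y_r, y_1, y_2 = simulate_bsrc(x, list, 0.1, 0.1, 0.1, rng)
print(x, y_r, y_1, y_2)
```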
Remark 1. Figure 3 shows a BSRC with non-orthogonal reception at the destination. The received signal at the destination is given by

$$Y = X \oplus X_r \oplus Z, \tag{4}$$

where $X$ and $X_r$ respectively denote the symbols transmitted by the source and the relay. The random variable $Z \sim \mathrm{Ber}(\epsilon)$ is the additive Bernoulli noise and is independent of $Z_r$. The capacity of this channel is

$$C_{\mathrm{no}} = 1 - H_b(\epsilon). \tag{5}$$

By setting $X_r = 0$, we have $Y = X \oplus Z$. Achievability then follows since $\max_{p(x)} I(X; Y | X_r = 0) = C_{\mathrm{no}}$. The converse follows from the cut-set bound. Invoking the multiple-access bound, we have

$$C \le \max_{p(x, x_r)} I(X, X_r; Y) \le 1 - H_b(\epsilon). \tag{6}$$

In the sequel, we present various capacity bounds for the orthogonal BSRC. In this section, we consider the cut-set upper bound and three lower bounds on the capacity: DF, PDF and CF. These bounds are evaluated based on the results in [2] for the general relay channel, where the relay has infinite memory and unlimited processing capability.
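Proposition 1 (cut-set bound). For the relay channel in Figure 2, the capacity is upper bounded by

$$C \le \min\{1 + H_b(\epsilon_1 * \epsilon_r) - H_b(\epsilon_1) - H_b(\epsilon_r),\; 2 - H_b(\epsilon_1) - H_b(\epsilon_2)\}. \tag{7}$$

Proof. See Appendix A and also [14].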
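As a numerical companion to Proposition 1, the sketch below evaluates the two cuts of the cut-set bound for uniform, independent inputs: the broadcast cut $I(X; Y_r, Y_1 | X_r)$ and the multiple-access cut $I(X, X_r; Y_1, Y_2)$. The closed forms are our own evaluation under the assumption that $H_b$ is the binary entropy and $*$ the binary convolution; the function names are illustrative.

```python
import math

def hb(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

def cut_set_bound(eps_r, eps_1, eps_2):
    # Broadcast cut: the source is observed through two independent BSCs.
    broadcast = 1 + hb(conv(eps_1, eps_r)) - hb(eps_1) - hb(eps_r)
    # Multiple-access cut: two orthogonal BSCs into the destination.
    multiple_access = 2 - hb(eps_1) - hb(eps_2)
    return min(broadcast, multiple_access)

print(cut_set_bound(0.01, 0.01, 0.01))
```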
Proposition 2 (DF lower bound). For the relay channel in Figure 2, the capacity is lower bounded by

$$C \ge \max\{\min\{1 - H_b(\epsilon_r),\; 2 - H_b(\epsilon_1) - H_b(\epsilon_2)\},\; 1 - H_b(\epsilon_1)\}, \tag{8}$$

where the bound is achieved using either the block Markov encoding scheme in [2] (known as DF relaying) or direct transmission only (i.e., with the relay off).

Proof. See Appendix B.
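The DF bound can be evaluated numerically as follows. This is a sketch under the assumption that, for uniform independent inputs, the two DF constraints evaluate to $1 - H_b(\epsilon_r)$ and $2 - H_b(\epsilon_1) - H_b(\epsilon_2)$, and that the direct link alone supports $1 - H_b(\epsilon_1)$; `df_rate` is an illustrative name.

```python
import math

def hb(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def df_rate(eps_r, eps_1, eps_2):
    # The relay must decode the message: limited by the source-relay link.
    relay_decodes = 1 - hb(eps_r)
    # The destination decodes from both orthogonal links.
    destination_decodes = 2 - hb(eps_1) - hb(eps_2)
    direct_only = 1 - hb(eps_1)  # relay switched off
    return max(min(relay_decodes, destination_decodes), direct_only)

# With eps_1 = eps_2 = 0.01, the relay only helps when eps_r < eps_1.
for eps_r in (0.001, 0.005, 0.01, 0.05):
    print(eps_r, df_rate(eps_r, 0.01, 0.01))
```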
Note that in Equation (8), we take the maximum of the rates achieved using conventional DF [2] and direct-link-only transmission. In some cases it is possible to improve on DF by using PDF at the relay; that is, the relay decodes only a part of the transmitted message. The achievable rate of PDF is given by

$$R_{\mathrm{PDF}} = \max_{p(u, x, x_r)} \min\{ I(X, X_r; Y_1, Y_2),\; I(U; Y_r | X_r) + I(X; Y_1, Y_2 | X_r, U) \}, \tag{9}$$

where $U$ denotes the part of the transmitted message that the relay decodes (see Theorem 7 in [2] and also [15]).

Proposition 3 (PDF lower bound). For the relay channel in Figure 2, generalized block Markov encoding attains the same rate as the modified DF given in Equation (8).

Proof. See Appendix C.
Corollary 1. DF is an optimal relaying strategy if

Proof. The proof follows from Propositions 1 and 2.
Proposition 4 (CF lower bound). For the relay channel in Figure 2, the capacity is lower bounded by

where $\epsilon_q$ satisfies

Proof. See Appendix D.

The bound in Equation (10) is achieved using the side-information encoding scheme in [2], known as CF relaying.
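Since the displayed expressions of Proposition 4 are not legible in this copy, the following sketch re-derives a CF rate numerically from the outline in Appendix D rather than quoting Equation (10). It assumes uniform inputs and the binary test channel $\hat{Y}_r = Y_r \oplus Z_q$ with $Z_q \sim \mathrm{Ber}(\epsilon_q)$, and our own reading of the two rate constraints, called `r1` and `r2` here; all names and closed forms are assumptions.

```python
import math

def hb(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

def cf_rate(eps_r, eps_1, eps_2, iters=60):
    def r1(q):  # constraint from the relay-destination link (assumed form)
        return 2 - hb(eps_1) - hb(eps_2) - hb(conv(eps_r, q)) + hb(q)

    def r2(q):  # I(X; Y_1, Yhat_r) for uniform X (assumed form)
        d = conv(eps_r, q)
        return 1 + hb(conv(eps_1, d)) - hb(eps_1) - hb(d)

    if r1(0.0) >= r2(0.0):
        return r2(0.0)  # the relay can describe Y_r without extra distortion
    lo, hi = 0.0, 0.5   # otherwise bisect for the crossing r1(q) = r2(q)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r1(mid) < r2(mid):
            lo = mid
        else:
            hi = mid
    return r2(hi)

print(cf_rate(0.01, 0.01, 0.01), cf_rate(0.1, 0.1, 0.2))
```

Under these assumptions, the first branch corresponds to the regime in which CF meets the broadcast cut of the cut-set bound, which matches the behavior reported for $\epsilon_1 = \epsilon_2 = 0.01$ in the numerical examples.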
Corollary 2. CF is an optimal relaying strategy if

Proof. The proof follows from Propositions 1 and 4.

Figures 4 and 5 show the regions of optimality in the plane $[0, 1] \times [0, 1]$ of the parameters $\epsilon_r$ and $\epsilon_1$, for $\epsilon_2$ set to 0.05 and 0.2, respectively. One can see that for some values of $\epsilon_r$ and $\epsilon_1$, CF or DF is optimal.

From Figures 4 and 5, we see that DF is optimal when $\epsilon_r$ is small and $\epsilon_1$ is close to 0.5. When $\epsilon_1 = 0.5$, the BSRC reduces to a two-hop channel, for which DF is capacity achieving. DF is, however, also optimal when the relay is relatively strong compared with the direct link.

From Figures 4 and 5, we similarly see that CF is optimal at the corner of the plane, i.e., when $\epsilon_1$ and $\epsilon_r$ are small. As $\epsilon_2$ increases, the region in which CF is optimal shrinks. This is because a small value of $\epsilon_2$ allows the relay to use a higher rate to describe its received signal to the destination. A small value of $\epsilon_1$ means that the destination has better side information for decoding the reproduction sequence generated at the relay, and a small value of $\epsilon_r$ means that the relay receives a less noisy sequence on average, which makes compression of the "true" signal easier.

Capacity Bounds for the Orthogonal BSRC: Finite Memory Relay Case
We next consider the case when the relay has a finite memory length. For the Gaussian relay channel, optimized memoryless relaying is investigated in [16-19]. Here we consider higher-dimensional mappings for the BSRC.

If the relay has a storage memory of $k$ bits, it can process the last $k - 1$ received symbols together with the presently received symbol to generate $k$ new symbols using $k$ possibly different $k$-dimensional functions. This results in a low-complexity relaying protocol suitable for delay-sensitive or inexpensive applications. In the following, we denote the relay functions by

$$x_{r,i} = f_i(y_{r,1}^{k}), \qquad y_{r,1}^{k} := (y_{r,1}, \ldots, y_{r,k}),$$

for $i \in \{1, 2, \ldots, k\}$. Note that we here use the whole sub-block of $k$ received symbols to generate the $k$ new symbols to be transmitted to the destination. This differs from the classical definition with strictly causal relaying. We are allowed to do this, without any particular condition on signal reception at the relay, since the relay has an orthogonal link to the destination.

Achievable Rate
For a given set of relay functions $\{f_i\}_{i=1}^{k}$, the channel is parameterized by the pmf $p(\mathbf{y}|\mathbf{x})$, by defining $\mathbf{y} \equiv (y_{1,1}^{k}, y_{2,1}^{k})$ and $\mathbf{x} \equiv x_1^{k}$, where

$$p(\mathbf{y}|\mathbf{x}) = p(y_{1,1}^{k} | x_1^{k}) \sum_{y_{r,1}^{k}} p(y_{r,1}^{k} | x_1^{k})\, p\big(y_{2,1}^{k} \,\big|\, f_1(y_{r,1}^{k}), \ldots, f_k(y_{r,1}^{k})\big).$$

Now one can apply the standard random coding argument for the equivalent discrete memoryless point-to-point communication link with input $\mathbf{x}$ and output $\mathbf{y}$, whose relation is governed by the pmf $p(\mathbf{y}|\mathbf{x})$, as follows. Generate $2^{nkC_k}$ i.i.d. codewords, each of length $kn$, in which every block of $k$ consecutive symbols is distributed according to $p(x_1^{k})$ (i.e., $p(x_1^{kn}) = \prod_{i=1}^{n} p(x_1^{k})$). Thus, the achievable rate using the finite-memory relay is given by

$$C_k = \sup \frac{1}{k} I\big(X_1^{k}; Y_{1,1}^{k}, Y_{2,1}^{k}\big), \tag{14}$$

where the supremum is taken over the set of Boolean functions $\{f_i\}_{i=1}^{k}$ and the joint pmf $p(x_1^{k})$ of $k$ symbols at the source. Since the channel is used $k$ times, the mutual information in Equation (14) is divided by $k$ (see also [1,4]).
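For small $k$, the normalized mutual information in Equation (14) can be evaluated for a fixed mapping by direct enumeration. The sketch below assumes i.i.d. uniform source bits (so it evaluates one candidate, not the supremum) and represents the $k$ relay functions jointly as a lookup table `F` whose entry `F[y_r]` is the $k$-bit output word, encoded as an integer, for the received word `y_r`. The representation and names are illustrative, not the paper's.

```python
import math

def bsc_block_prob(a, b, k, eps):
    """P(output word b | input word a) over k uses of a BSC(eps); words are k-bit ints."""
    d = bin(a ^ b).count("1")  # number of flipped positions
    return (eps ** d) * ((1.0 - eps) ** (k - d))

def finite_memory_rate(F, k, eps_r, eps_1, eps_2):
    """(1/k) I(X^k; Y_1^k, Y_2^k) for uniform X^k and the relay lookup table F."""
    n = 1 << k
    # Direct link: p(y_1 | x)
    p1 = [[bsc_block_prob(x, y1, k, eps_1) for y1 in range(n)] for x in range(n)]
    # Relay path: p(y_2 | x) = sum over y_r of p(y_r | x) p(y_2 | F[y_r])
    p2 = [[sum(bsc_block_prob(x, yr, k, eps_r) * bsc_block_prob(F[yr], y2, k, eps_2)
               for yr in range(n)) for y2 in range(n)] for x in range(n)]
    info = 0.0
    for y1 in range(n):
        for y2 in range(n):
            joint = [p1[x][y1] * p2[x][y2] / n for x in range(n)]  # p(x, y_1, y_2)
            p_y = sum(joint)
            for p_xy in joint:
                if p_xy > 0.0:
                    info += p_xy * math.log2(p_xy * n / p_y)
    return info / k

# Forwarding the received bits unchanged (identity table) for k = 2
print(finite_memory_rate([0, 1, 2, 3], 2, 0.01, 0.01, 0.01))
```

The cost grows as $2^{3k}$ per evaluation, so this brute-force form is only practical for the small memory lengths considered here.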
Achievable Rate for $k = 1$: The simplest case is the memoryless relay, in which the relay just transmits the received noisy bit to the destination without any further processing. That is, $x_{r,i} = y_{r,i}$ for $1 \le i \le n$. For this relay function, the optimal input distribution is $X \sim \mathrm{Ber}(\tfrac{1}{2})$.

Proposition 5. For the relay channel in Figure 2, the rate

$$C_1 = 1 + H_b(\epsilon_1 * \epsilon_r * \epsilon_2) - H_b(\epsilon_1) - H_b(\epsilon_r * \epsilon_2)$$

is achievable.

Proof. See Appendix E.

We note that the rate $C_1$ is also derived in [14] via a suboptimal evaluation of the CF lower bound. Here, however, we arrive at this rate using a one-dimensional mapping, without any need for a compression codebook at the relay.
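The rate of the memoryless forwarding relay can be evaluated in closed form. The sketch below follows the argument of Appendix E: with $x_r = y_r$, the relay path is an equivalent BSC with crossover probability $\epsilon_r * \epsilon_2$, so the destination observes a uniform $X$ through two independent BSCs. The formula is our evaluation under the assumed meanings of $H_b$ and $*$; `c1_rate` is an illustrative name.

```python
import math

def hb(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

def c1_rate(eps_r, eps_1, eps_2):
    eps_eq = conv(eps_r, eps_2)  # Z_eq = Z_r xor Z_2
    # I(X; Y_1, Y_2) = H(Y_1, Y_2) - H(Z_1) - H(Z_eq) for uniform X
    return 1 + hb(conv(eps_1, eps_eq)) - hb(eps_1) - hb(eps_eq)

print(c1_rate(0.01, 0.01, 0.01))
```

With $\epsilon_2 = 0$ this expression reduces to $1 + H_b(\epsilon_1 * \epsilon_r) - H_b(\epsilon_1) - H_b(\epsilon_r)$, the broadcast cut, which is why plain forwarding is optimal in that case.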
Computing $C_k$ for $k \ge 2$ is a cumbersome task. Moreover, there is no unique set of relay functions that is optimal for all channel parameters $\epsilon_r$, $\epsilon_1$, $\epsilon_2$. To see this, consider the case with $\epsilon_2 = 0$. For this case, the simple strategy with $k = 1$ used in Proposition 5 is optimal. However, this relay function is not necessarily optimal for cases with $\epsilon_2 \neq 0$, since one can potentially provide error protection on the relay-destination link by using functions of higher dimension.

Mapping Optimization for an Arbitrary k
In the following, we confine the pmf at the source to be $p(x_1^{k}) = \prod_{i=1}^{k} p(x_i)$, and p

Lemma 1. The mutual information given in Equation (14) can be written as

where

and

Proof. See Appendix F.
To compute the rate in Equation (14), one needs to select the best functions among $2^{k 2^{k}}$ possible choices, a number that grows doubly exponentially in $k$. (For $k = 4$, there are approximately $1.8 \times 10^{19}$ possible mappings.) In order to cope with this complexity, we implement an efficient hill-climbing search algorithm as follows. For a given $k$, we first initialize the relay functions with a random mapping and compute the rate using Lemma 1. Then we randomly select one function and one corresponding dimension, flip that entry of the mapping, and recompute the rate. If the new mapping provides a higher rate, we accept the change; otherwise we discard it. We repeat this process until the mapping converges. Since the algorithm by construction may terminate in a local optimum, we repeat the whole algorithm with different initializations and pick the mapping that attains the highest rate.
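The bit-flipping search described above can be sketched as follows. The rate evaluation is passed in as a callable `rate_fn`, a hypothetical stand-in for the computation of Lemma 1, and convergence is approximated by a fixed number of consecutive rejected flips (`patience`); both choices, like the table representation `table[j]` (a $k$-bit output word for input configuration `j`), are assumptions of this sketch.

```python
import random

def hill_climb(k, rate_fn, rng, patience=500):
    """Single-bit-flip hill climbing over relay lookup tables."""
    n = 1 << k
    table = [rng.randrange(n) for _ in range(n)]  # random initial mapping
    best = rate_fn(table)
    stalled = 0
    while stalled < patience:
        j = rng.randrange(n)   # input configuration (column)
        i = rng.randrange(k)   # relay function / output dimension (row)
        table[j] ^= 1 << i     # flip one entry of the mapping
        rate = rate_fn(table)
        if rate > best:
            best, stalled = rate, 0  # accept the change
        else:
            table[j] ^= 1 << i       # undo the flip
            stalled += 1
    return table, best

def optimize_mapping(k, rate_fn, restarts=5, seed=0):
    """Repeat the search from several random initializations; keep the best."""
    rng = random.Random(seed)
    runs = [hill_climb(k, rate_fn, rng) for _ in range(restarts)]
    return max(runs, key=lambda run: run[1])

# Toy objective standing in for the rate: closeness to the identity table
target = [0, 1, 2, 3]
toy_score = lambda t: -sum(bin(a ^ b).count("1") for a, b in zip(t, target))
print(optimize_mapping(2, toy_score))
```

The toy objective only exercises the search logic; in an actual design run, `rate_fn` would evaluate the mutual information of Equation (14) for the candidate table.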
One example of an optimized mapping for $k = 4$ when $\epsilon_1 = \epsilon_2 = \epsilon_r = 0.01$ is

Here $F_{ij}$ denotes the output of the relay along the $i$th dimension for the $j$th input configuration (we have $F_{ij} \in \{0, 1\}$ for $1 \le i \le k$ and $1 \le j \le 2^{k}$). For a given combination of received bits, the relay finds its decimal representation and transmits the bits in the column whose index is the decimal representation plus one. For example, for the received string 0000, the relay transmits the bits in the first column, i.e., 0111.

By studying the matrix $F$, we can gain some insight into the underlying structure of the mapping. Finding an efficient structure at the relay simplifies the design and implementation of relaying. We employ the Fourier transform to accomplish this task. Our use of the binary Fourier (or Hadamard) transform is related to how it was used in, e.g., [20,21], to analyze the performance of quantizers over noisy channels.
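The column lookup performed by the relay can be sketched as below. The table `F_demo` is a hypothetical $k = 2$ example, not the optimized matrix from the paper, and reading the received string with the first bit as the most significant one is an assumption of this sketch.

```python
def relay_output(received_bits, F):
    """Return the column of F selected by the received bit string.

    F[i][j] is the bit sent on dimension i for input configuration j (0-based),
    so 0-based column j corresponds to column j + 1 in the text.
    """
    k = len(received_bits)
    column = int("".join(str(b) for b in received_bits), 2)  # decimal representation
    return [F[i][column] for i in range(k)]

# Hypothetical table with k = 2 rows and 2**k = 4 columns
F_demo = [[0, 1, 1, 0],
          [1, 1, 0, 0]]

print(relay_output([0, 0], F_demo))  # first column
print(relay_output([1, 0], F_demo))  # third column
```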

Fourier Spectrum of the Optimized Mappings
In order to define the Fourier transform, we need an orthonormal basis [22]. Consider the following set of functions

$$\chi_S(x) := \prod_{i \in S} x_i,$$

where $S \subseteq \{1, 2, \ldots, k\}$. Then, any function $f : \{0, 1\}^{k} \to \{0, 1\}$ can uniquely be represented as (we use the one-to-one mapping $0 \Leftrightarrow +1$ and $1 \Leftrightarrow -1$)

$$f(x) = \sum_{S \subseteq \{1, \ldots, k\}} \hat{f}(S)\, \chi_S(x),$$

where $\hat{f}(S)$ is the Fourier coefficient of $f$ and is given by

$$\hat{f}(S) = \mathbb{E}\left[f(x)\, \chi_S(x)\right]. \tag{21}$$

The expectation in Equation (21) is taken uniformly over $x \in \{+1, -1\}^{k}$. Note that the Fourier expansion of $f$ can potentially have up to $2^{k}$ terms. As an example, consider the following randomly chosen function.
This function can be expanded as

That is, the Fourier spectrum has eight terms and the function is not linear. In general, we would like a sparse Fourier spectrum for efficient implementation. (Sparsity allows the mappings to be realized with far fewer multiplications and additions. This can also be compared with codes with low-density generator or parity-check matrices, which in general allow simpler encoding and decoding.) Table 1 presents the Fourier expansion of the optimized relay functions for $k = 2, 3, 4, 6$ when $\epsilon_1 = \epsilon_2 = \epsilon_r = 0.01$. Interestingly, we see that the Fourier expansion of the optimized functions is indeed sparse. Using the results in Table 1, we can rewrite the functions in the following form

$$\mathbf{x}_r = A_k \mathbf{y}_r + b_k, \tag{23}$$

where $\mathbf{x}_r = [x_{r,1}, \ldots, x_{r,k}]^T$ and $\mathbf{y}_r = [y_{r,1}, \ldots, y_{r,k}]^T$, and $A_k \in \{0, 1\}^{k \times k}$ and $b_k \in \{0, 1\}^{k \times 1}$. For example, for $k = 6$, we have

Note that the mapping given in Equation (23) is not linear in the binary field when $b_k \neq 0$. However, the linear mapping $\mathbf{x}_r = A_k \mathbf{y}_r$ gives the same performance as $\mathbf{x}_r = A_k \mathbf{y}_r + b_k$. In other words, the bias term $b_k$ does not improve the rate. This essentially follows from the data processing inequality, since adding a fixed, known vector is an invertible operation that the destination can undo. Therefore, the underlying relay functions define a linear code of rate one on the noisy received bits at the relay. Additionally, the code used at the relay performs joint source-channel coding; it should therefore be good for both source and channel coding.
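The Fourier coefficients of a Boolean function can be computed by direct enumeration of the expectation in Equation (21). The sketch below assumes the standard parity basis $\chi_S(x) = \prod_{i \in S} x_i$ on $\{+1, -1\}^k$; the two test functions are illustrative and are not the optimized mappings of Table 1.

```python
from itertools import combinations, product

def fourier_spectrum(f, k):
    """Return {S: f_hat(S)} for f: {+1,-1}^k -> {+1,-1}, with S a tuple of 1-based indices."""
    points = list(product((+1, -1), repeat=k))
    spectrum = {}
    for size in range(k + 1):
        for S in combinations(range(1, k + 1), size):
            total = 0
            for x in points:
                chi = 1
                for i in S:
                    chi *= x[i - 1]    # parity function chi_S(x)
                total += f(x) * chi
            spectrum[S] = total / len(points)  # uniform expectation
    return spectrum

# XOR of two bits is the product x1*x2 under the map 0 -> +1, 1 -> -1:
xor2 = lambda x: x[0] * x[1]
# AND of two bits outputs bit 1 (i.e. -1) only when both inputs are -1:
and2 = lambda x: -1 if x == (-1, -1) else +1

print(fourier_spectrum(xor2, 2))  # a single nonzero coefficient (sparse)
print(fourier_spectrum(and2, 2))  # all four coefficients are nonzero
```

A function that is a modulo-two sum of input bits, possibly complemented, has exactly one nonzero coefficient of magnitude one, which is the kind of sparsity the text refers to.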
Table 1. Fourier expansion of the optimized relay functions for the orthogonal BSRC with $\epsilon_1 = \epsilon_2 = \epsilon_r = 0.01$.

Effect of Channel Parameters on the Structure of the Optimized Mappings
In this section, we investigate the structure of the optimized mappings for different channel parameters. Our numerical search indicates that the linear mapping $\mathbf{x}_r = A_k \mathbf{y}_r$ is an efficient strategy among all classes of mappings for low-dimensional relaying. That is, the relay employs the binary matrix $A_k$ to generate the relay outputs from $k$ received bits.

Table 2 shows the optimized generator matrices $A_k$ for various values of $\epsilon_r$ when $k = 6$ and $\epsilon_1 = \epsilon_2 = 0.05$. In particular, for $\epsilon_r = 0.25$ the optimized generator matrix is the identity matrix, i.e., $A_6 = I_6$. In this case, the relay is better off transmitting the received noisy bits without any further processing. However, as $\epsilon_r$ decreases, the relay starts combining the received bits before transmitting. The number of ones in a row of the generator matrix indicates the number of inputs that the relay combines. The density of ones in each generator matrix, $\rho$ (the number of ones relative to the size of the matrix), is also shown in Table 2. We see that as $\epsilon_r$ decreases, $\rho$ increases. That is, the relay starts to transmit combinations of more bits in a single channel use. This occurs for two main reasons: first, when $\epsilon_r$ decreases the relay receives less noisy bits on average, and second, the destination has some partial knowledge of the individual bits via the signal received from the source.

Table 3 shows the optimized generator matrices for various values of $\epsilon_1$ when $k = 6$, $\epsilon_r = 0.01$ and $\epsilon_2 = 0.1$. We similarly see that as $\epsilon_1$ decreases, $\rho$ increases. This is because, when $\epsilon_1$ decreases, the destination receives better descriptions of the transmitted bits via the source-destination link. The relay then forwards combinations of several incoming bits, since the destination has access to more reliable side information.
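The matrix form of the relay mapping can be sketched as a matrix-vector product over GF(2). The matrix `A_demo` below is a hypothetical $3 \times 3$ example, not one of the optimized generator matrices, and the optional bias `b` is included only to show the affine variant.

```python
def affine_relay(A, y, b=None):
    """Compute x_r = A y (+ b) over GF(2); A is a list of rows, y and b are bit lists."""
    if b is None:
        b = [0] * len(A)
    return [(sum(a_ij & y_j for a_ij, y_j in zip(row, y)) + b_i) % 2
            for row, b_i in zip(A, b)]

# Hypothetical generator matrix: row i lists the received bits combined in output i
A_demo = [[1, 1, 0],
          [0, 1, 1],
          [0, 0, 1]]

print(affine_relay(A_demo, [1, 0, 1]))             # linear mapping
print(affine_relay(A_demo, [1, 0, 1], [1, 0, 1]))  # same mapping with a bias
```

The number of ones in each row of `A_demo` is the number of received bits that the relay adds modulo two for that output.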

Numerical Examples
Figure 6 shows the capacity results for the orthogonal BSRC shown in Figure 2 as a function of $\epsilon_r$ when $\epsilon_1 = \epsilon_2 = 0.01$. In this figure, we have plotted the cut-set upper bound (UB) (Equation (7)) and the rates achieved using decode-and-forward (DF) (Equation (8)), compress-and-forward (CF) (Equation (10)), and the optimized finite-memory relay (Equation (14)) for different memory sizes. The relay functions are optimized for the channel parameters $\epsilon_1 = \epsilon_2 = \epsilon_r = 0.01$ and are given in Table 1.

From Figure 6, we see that the achievable rate of DF decreases as $\epsilon_r$ increases to 0.01. For $\epsilon_r \ge 0.01$, the achievable rate of DF coincides with that of direct transmission (i.e., with the relay off). On the other hand, the rates achieved by CF coincide with the upper bound for the chosen channel parameters, since the condition in Corollary 2 is satisfied. More interestingly, optimized low-dimensional relaying with $k = 6$ achieves rates close to those achieved by CF and operates close to the capacity.

Summary and Concluding Remarks
We introduced a binary symmetric relay channel with orthogonal components at the destination, and investigated three main relaying strategies: decode-and-forward (DF) relaying, compress-and-forward (CF) relaying, and optimized low-dimensional relaying. We used a bit-switching numerical algorithm to find optimized mappings. We initialized our algorithm with arbitrary random nonlinear mappings and, after optimization, observed by means of Fourier analysis that all the optimized mappings we found were linear. We also illustrated that one can obtain rates very close to the upper bound by using the proposed optimized low-dimensional relaying scheme. It is worth noting that DF and CF require codebooks with infinite block-length codewords at the relay. This stands in sharp contrast to the proposed low-dimensional relaying scheme. Additionally, the suggested relaying protocol has low-delay processing and paves the way for the implementation of inexpensive relaying protocols. We finally note that the sufficiency of linear mappings for the problem of optimal relaying remains open.
Next consider

We finally note that the upper bound can be achieved by choosing $X \sim \mathrm{Ber}(0.5)$.

Proof of Proposition 1: Using the cut-set bound [2], we have

$$C \le \max_{p(x, x_r)} \min\{ I(X, X_r; Y_1, Y_2),\; I(X; Y_r, Y_1, Y_2 | X_r) \}.$$

Now we bound each term.

Similarly, we have

where the last inequality follows from Lemma 2. Combining (27) and (28), and noting that both are maximized when $X$ and $X_r$ are independent and uniformly distributed, yields the result.

B. Proof of Proposition 2
Using Theorem 1 in [2], the rate

$$R = \max_{p(x, x_r)} \min\{ I(X, X_r; Y_1, Y_2),\; I(X; Y_r | X_r) \}$$

is achievable. The first term is evaluated in Appendix A and is maximized when $X$ and $X_r$ are independent and uniformly distributed. One can show that the same distribution maximizes the second term. This yields the DF rate in Equation (8).

C. Proof of Proposition 3
Using Theorem 7 in [2], the rate in Equation (9) is achievable (see also [15]). The first term is evaluated in Appendix A and is bounded as

Next consider

Now we bound $H(Y_1 | X_r, U) - H(Y_r | X_r, U)$. First, define $V := (X_r, U)$, where $p(V = v_i) = p_i$ and

In the following, let $h(\delta) := H_b(\delta * \epsilon_1) - H_b(\delta * \epsilon_r)$ and $\max\{\epsilon_1, \epsilon_r\} \le 0.5$. We then obtain

One can show that $\partial g / \partial \epsilon \le 0$ if $\epsilon \le 0.5$, and hence $g(\epsilon)$ is a non-increasing function. Thus we conclude that

where $\delta$ is the solution of $\partial h / \partial \delta = 0$. Finally, we obtain the following bound

Combining Equations (30), (31) and (36) proves that partial DF does not improve on DF for the BSRC.

D. Proof of Proposition 4
We use the equivalent formulation of the original CF (Theorem 6 in [2]) given in [4]. CF achieves the rate

where the upper bound can be achieved by choosing $X \sim \mathrm{Ber}(0.5)$ and $X_r \sim \mathrm{Ber}(0.5)$.

In order to proceed, we choose the following binary test channel:

$$\hat{Y}_r = Y_r \oplus Z_q,$$

where $Z_q \sim \mathrm{Ber}(\epsilon_q)$ is independent of the other random variables. This yields

where the last inequality follows from Lemma 2 and is achieved by choosing $X \sim \mathrm{Ber}(0.5)$. Putting everything together, the following rate is achievable

Now define

Using the fact that f(

Since $R_2(0.5) \le R_1(0.5)$, we only need to consider the following two cases:

• Case 1: $R_1(0) < R_2(0)$. In this case we have $1 - H_b(\epsilon_2) < H_b(\epsilon_1 * \epsilon_r)$ and there exists $\epsilon_q$ such that $R_1(\epsilon_q) = R_2(\epsilon_q)$. Thus

is achievable, where $\epsilon_q$ satisfies

• Case 2: $R_1(0) \ge R_2(0)$. In this case the rate in (47) is achievable.

E. Proof of Proposition 5
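For the memoryless relay, we have $x_r = y_r$ and hence

$$Y_2 = X \oplus Z_r \oplus Z_2 = X \oplus Z_{\mathrm{eq}},$$

where $Z_{\mathrm{eq}} := Z_r \oplus Z_2 \sim \mathrm{Ber}(\epsilon_r * \epsilon_2)$. Then the achievable rate is given by $C_1 = \max_{p(x)} I(X; Y_1, Y_2)$. Now using Lemma 2, we obtain the rate stated in Proposition 5.

F. Proof of Lemma 1
Consider

We similarly obtain (52)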

Figure 6. Capacity results for the binary symmetric relay channel as a function of $\epsilon_r$ when $\epsilon_1 = \epsilon_2 = 0.01$.

Table 2. Optimized generator matrices as a function of $\epsilon_r$ for $k = 6$ and $\epsilon_1 = \epsilon_2 = 0.05$.